All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/19] netfilter: IPv6 NAT
@ 2012-08-09 20:08 kaber
  2012-08-09 20:08 ` [PATCH 01/19] netfilter: nf_ct_sip: fix helper name kaber
                   ` (22 more replies)
  0 siblings, 23 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

The following patches contain an updated version of IPv6 NAT against
Linus' current tree.

The series is organized as follows:

- Patches 01-03 contain bugfixes for SIP helper bugs/regressions
  present in the current kernel

- Patches 04-06 improve conntrack fragmentation handling, the IPv6
  parts are also a precondition for IPv6 NAT

- Patches 07 and 08 prepare the current NAT code for conversion to
  an address family independant core, but contain no functional
  changes

- Patch 09 adds the address family independant NAT core and converts
  the existing IPv4-only NAT code to an AF-specific module

- Patches 10 and 11 add some infrastructure for IPv6 NAT

- Patch 12 adds IPv6 NAT support

- Patches 13-15 add IPv6 specific NAT targets

- Patches 16-19 add some IPv6-capable ports of existing NAT helpers

- Patch 19 is independant of the IPv6 NAT code and adds support for
  stateless IPv6 prefix translation, just to relieve my conscience ;)


Since the last posting numerous bugs have been fixed, I don't remember
all of them, the more important ones include:

- automatic NAT module loading in ctnetlink

- address selection when mapping to IPv6 ranges

- handling of IPv6 fragments

- NAT handling of ICMPv6 error messages

Besides implementing IPv6 NAT, there are no known bugs left. Userspace
patches will follow shortly.

The entire patchset is also available at

git://github.com/kaber/nf-nat-ipv6.git master

Comments, questions and test results welcome.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 01/19] netfilter: nf_ct_sip: fix helper name
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-14  0:00   ` Pablo Neira Ayuso
  2012-08-09 20:08 ` [PATCH 02/19] netfilter: nf_ct_sip: fix IPv6 address parsing kaber
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Commit 3a8fc53a (netfilter: nf_ct_helper: allocate 16 bytes for the helper
and policy names) introduced a bug in the SIP helper, the helper name is
sprinted to the sip_names array instead of instead of into the helper
structure. This breaks the helper match and the /proc/net/nf_conntrack_expect
output.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/netfilter/nf_conntrack_sip.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 758a1ba..2fb6669 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -1515,7 +1515,6 @@ static int sip_help_udp(struct sk_buff *skb, unsigned int protoff,
 }
 
 static struct nf_conntrack_helper sip[MAX_PORTS][4] __read_mostly;
-static char sip_names[MAX_PORTS][4][sizeof("sip-65535")] __read_mostly;
 
 static const struct nf_conntrack_expect_policy sip_exp_policy[SIP_EXPECT_MAX + 1] = {
 	[SIP_EXPECT_SIGNALLING] = {
@@ -1585,9 +1584,9 @@ static int __init nf_conntrack_sip_init(void)
 			sip[i][j].me = THIS_MODULE;
 
 			if (ports[i] == SIP_PORT)
-				sprintf(sip_names[i][j], "sip");
+				sprintf(sip[i][j].name, "sip");
 			else
-				sprintf(sip_names[i][j], "sip-%u", i);
+				sprintf(sip[i][j].name, "sip-%u", i);
 
 			pr_debug("port #%u: %u\n", i, ports[i]);
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 02/19] netfilter: nf_ct_sip: fix IPv6 address parsing
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
  2012-08-09 20:08 ` [PATCH 01/19] netfilter: nf_ct_sip: fix helper name kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-14  0:19   ` Pablo Neira Ayuso
  2012-08-09 20:08 ` [PATCH 03/19] netfilter: nf_nat_sip: fix via header translation with multiple parameters kaber
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Within SIP messages IPv6 addresses are enclosed in square brackets in most
cases, with the exception of the "received=" header parameter. Currently
the helper fails to parse enclosed addresses.

This patch:

- changes the SIP address parsing function to enforce square brackets
  when required, and accept them when not required but present, as
  recommended by RFC 5118.

- adds a new SDP address parsing function that never accepts square
  brackets since SDP doesn't use them.

With these changes, the SIP helper correctly parses all test messages
from RFC 5118 (Session Initiation Protocol (SIP) Torture Test Messages
for Internet Protocol Version 6 (IPv6)).

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter/nf_conntrack_sip.h |    2 +-
 net/ipv4/netfilter/nf_nat_sip.c            |    4 +-
 net/netfilter/nf_conntrack_sip.c           |   87 ++++++++++++++++++++++------
 3 files changed, 73 insertions(+), 20 deletions(-)

diff --git a/include/linux/netfilter/nf_conntrack_sip.h b/include/linux/netfilter/nf_conntrack_sip.h
index 0dfc8b7..89f2a62 100644
--- a/include/linux/netfilter/nf_conntrack_sip.h
+++ b/include/linux/netfilter/nf_conntrack_sip.h
@@ -164,7 +164,7 @@ extern int ct_sip_parse_address_param(const struct nf_conn *ct, const char *dptr
 				      unsigned int dataoff, unsigned int datalen,
 				      const char *name,
 				      unsigned int *matchoff, unsigned int *matchlen,
-				      union nf_inet_addr *addr);
+				      union nf_inet_addr *addr, bool delim);
 extern int ct_sip_parse_numerical_param(const struct nf_conn *ct, const char *dptr,
 					unsigned int off, unsigned int datalen,
 					const char *name,
diff --git a/net/ipv4/netfilter/nf_nat_sip.c b/net/ipv4/netfilter/nf_nat_sip.c
index ea4a238..eef8f29 100644
--- a/net/ipv4/netfilter/nf_nat_sip.c
+++ b/net/ipv4/netfilter/nf_nat_sip.c
@@ -173,7 +173,7 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 		 * the reply. */
 		if (ct_sip_parse_address_param(ct, *dptr, matchend, *datalen,
 					       "maddr=", &poff, &plen,
-					       &addr) > 0 &&
+					       &addr, true) > 0 &&
 		    addr.ip == ct->tuplehash[dir].tuple.src.u3.ip &&
 		    addr.ip != ct->tuplehash[!dir].tuple.dst.u3.ip) {
 			buflen = sprintf(buffer, "%pI4",
@@ -187,7 +187,7 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 		 * from which the server received the request. */
 		if (ct_sip_parse_address_param(ct, *dptr, matchend, *datalen,
 					       "received=", &poff, &plen,
-					       &addr) > 0 &&
+					       &addr, false) > 0 &&
 		    addr.ip == ct->tuplehash[dir].tuple.dst.u3.ip &&
 		    addr.ip != ct->tuplehash[!dir].tuple.src.u3.ip) {
 			buflen = sprintf(buffer, "%pI4",
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 2fb6669..5c0a112 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -183,12 +183,12 @@ static int media_len(const struct nf_conn *ct, const char *dptr,
 	return len + digits_len(ct, dptr, limit, shift);
 }
 
-static int parse_addr(const struct nf_conn *ct, const char *cp,
-                      const char **endp, union nf_inet_addr *addr,
-                      const char *limit)
+static int sip_parse_addr(const struct nf_conn *ct, const char *cp,
+			  const char **endp, union nf_inet_addr *addr,
+			  const char *limit, bool delim)
 {
 	const char *end;
-	int ret = 0;
+	int ret;
 
 	if (!ct)
 		return 0;
@@ -197,16 +197,28 @@ static int parse_addr(const struct nf_conn *ct, const char *cp,
 	switch (nf_ct_l3num(ct)) {
 	case AF_INET:
 		ret = in4_pton(cp, limit - cp, (u8 *)&addr->ip, -1, &end);
+		if (ret == 0)
+			return 0;
 		break;
 	case AF_INET6:
+		if (cp < limit && *cp == '[')
+			cp++;
+		else if (delim)
+			return 0;
+
 		ret = in6_pton(cp, limit - cp, (u8 *)&addr->ip6, -1, &end);
+		if (ret == 0)
+			return 0;
+
+		if (end < limit && *end == ']')
+			end++;
+		else if (delim)
+			return 0;
 		break;
 	default:
 		BUG();
 	}
 
-	if (ret == 0 || end == cp)
-		return 0;
 	if (endp)
 		*endp = end;
 	return 1;
@@ -219,7 +231,7 @@ static int epaddr_len(const struct nf_conn *ct, const char *dptr,
 	union nf_inet_addr addr;
 	const char *aux = dptr;
 
-	if (!parse_addr(ct, dptr, &dptr, &addr, limit)) {
+	if (!sip_parse_addr(ct, dptr, &dptr, &addr, limit, true)) {
 		pr_debug("ip: %s parse failed.!\n", dptr);
 		return 0;
 	}
@@ -296,7 +308,7 @@ int ct_sip_parse_request(const struct nf_conn *ct,
 		return 0;
 	dptr += shift;
 
-	if (!parse_addr(ct, dptr, &end, addr, limit))
+	if (!sip_parse_addr(ct, dptr, &end, addr, limit, true))
 		return -1;
 	if (end < limit && *end == ':') {
 		end++;
@@ -550,7 +562,7 @@ int ct_sip_parse_header_uri(const struct nf_conn *ct, const char *dptr,
 	if (ret == 0)
 		return ret;
 
-	if (!parse_addr(ct, dptr + *matchoff, &c, addr, limit))
+	if (!sip_parse_addr(ct, dptr + *matchoff, &c, addr, limit, true))
 		return -1;
 	if (*c == ':') {
 		c++;
@@ -599,7 +611,7 @@ int ct_sip_parse_address_param(const struct nf_conn *ct, const char *dptr,
 			       unsigned int dataoff, unsigned int datalen,
 			       const char *name,
 			       unsigned int *matchoff, unsigned int *matchlen,
-			       union nf_inet_addr *addr)
+			       union nf_inet_addr *addr, bool delim)
 {
 	const char *limit = dptr + datalen;
 	const char *start, *end;
@@ -613,7 +625,7 @@ int ct_sip_parse_address_param(const struct nf_conn *ct, const char *dptr,
 		return 0;
 
 	start += strlen(name);
-	if (!parse_addr(ct, start, &end, addr, limit))
+	if (!sip_parse_addr(ct, start, &end, addr, limit, delim))
 		return 0;
 	*matchoff = start - dptr;
 	*matchlen = end - start;
@@ -675,6 +687,47 @@ static int ct_sip_parse_transport(struct nf_conn *ct, const char *dptr,
 	return 1;
 }
 
+static int sdp_parse_addr(const struct nf_conn *ct, const char *cp,
+			  const char **endp, union nf_inet_addr *addr,
+			  const char *limit)
+{
+	const char *end;
+	int ret;
+
+	memset(addr, 0, sizeof(*addr));
+	switch (nf_ct_l3num(ct)) {
+	case AF_INET:
+		ret = in4_pton(cp, limit - cp, (u8 *)&addr->ip, -1, &end);
+		break;
+	case AF_INET6:
+		ret = in6_pton(cp, limit - cp, (u8 *)&addr->ip6, -1, &end);
+		break;
+	default:
+		BUG();
+	}
+
+	if (ret == 0)
+		return 0;
+	if (endp)
+		*endp = end;
+	return 1;
+}
+
+/* skip ip address. returns its length. */
+static int sdp_addr_len(const struct nf_conn *ct, const char *dptr,
+			const char *limit, int *shift)
+{
+	union nf_inet_addr addr;
+	const char *aux = dptr;
+
+	if (!sdp_parse_addr(ct, dptr, &dptr, &addr, limit)) {
+		pr_debug("ip: %s parse failed.!\n", dptr);
+		return 0;
+	}
+
+	return dptr - aux;
+}
+
 /* SDP header parsing: a SDP session description contains an ordered set of
  * headers, starting with a section containing general session parameters,
  * optionally followed by multiple media descriptions.
@@ -686,10 +739,10 @@ static int ct_sip_parse_transport(struct nf_conn *ct, const char *dptr,
  */
 static const struct sip_header ct_sdp_hdrs[] = {
 	[SDP_HDR_VERSION]		= SDP_HDR("v=", NULL, digits_len),
-	[SDP_HDR_OWNER_IP4]		= SDP_HDR("o=", "IN IP4 ", epaddr_len),
-	[SDP_HDR_CONNECTION_IP4]	= SDP_HDR("c=", "IN IP4 ", epaddr_len),
-	[SDP_HDR_OWNER_IP6]		= SDP_HDR("o=", "IN IP6 ", epaddr_len),
-	[SDP_HDR_CONNECTION_IP6]	= SDP_HDR("c=", "IN IP6 ", epaddr_len),
+	[SDP_HDR_OWNER_IP4]		= SDP_HDR("o=", "IN IP4 ", sdp_addr_len),
+	[SDP_HDR_CONNECTION_IP4]	= SDP_HDR("c=", "IN IP4 ", sdp_addr_len),
+	[SDP_HDR_OWNER_IP6]		= SDP_HDR("o=", "IN IP6 ", sdp_addr_len),
+	[SDP_HDR_CONNECTION_IP6]	= SDP_HDR("c=", "IN IP6 ", sdp_addr_len),
 	[SDP_HDR_MEDIA]			= SDP_HDR("m=", NULL, media_len),
 };
 
@@ -775,8 +828,8 @@ static int ct_sip_parse_sdp_addr(const struct nf_conn *ct, const char *dptr,
 	if (ret <= 0)
 		return ret;
 
-	if (!parse_addr(ct, dptr + *matchoff, NULL, addr,
-			dptr + *matchoff + *matchlen))
+	if (!sdp_parse_addr(ct, dptr + *matchoff, NULL, addr,
+			    dptr + *matchoff + *matchlen))
 		return -1;
 	return 1;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 03/19] netfilter: nf_nat_sip: fix via header translation with multiple parameters
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
  2012-08-09 20:08 ` [PATCH 01/19] netfilter: nf_ct_sip: fix helper name kaber
  2012-08-09 20:08 ` [PATCH 02/19] netfilter: nf_ct_sip: fix IPv6 address parsing kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-14  0:28   ` Pablo Neira Ayuso
  2012-08-09 20:08 ` [PATCH 04/19] ipv4: fix path MTU discovery with connection tracking kaber
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Via-headers are parsed beginning at the first character after the Via-address.
When the address is translated first and its length decreases, the offset to
start parsing at is incorrect and header parameters might be missed.

Update the offset after translating the Via-address to fix this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/ipv4/netfilter/nf_nat_sip.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/nf_nat_sip.c b/net/ipv4/netfilter/nf_nat_sip.c
index eef8f29..4ad9cf1 100644
--- a/net/ipv4/netfilter/nf_nat_sip.c
+++ b/net/ipv4/netfilter/nf_nat_sip.c
@@ -148,7 +148,7 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 	if (ct_sip_parse_header_uri(ct, *dptr, NULL, *datalen,
 				    hdr, NULL, &matchoff, &matchlen,
 				    &addr, &port) > 0) {
-		unsigned int matchend, poff, plen, buflen, n;
+		unsigned int olen, matchend, poff, plen, buflen, n;
 		char buffer[sizeof("nnn.nnn.nnn.nnn:nnnnn")];
 
 		/* We're only interested in headers related to this
@@ -163,11 +163,12 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 				goto next;
 		}
 
+		olen = *datalen;
 		if (!map_addr(skb, dataoff, dptr, datalen, matchoff, matchlen,
 			      &addr, port))
 			return NF_DROP;
 
-		matchend = matchoff + matchlen;
+		matchend = matchoff + matchlen + *datalen - olen;
 
 		/* The maddr= parameter (RFC 2361) specifies where to send
 		 * the reply. */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 04/19] ipv4: fix path MTU discovery with connection tracking
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (2 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 03/19] netfilter: nf_nat_sip: fix via header translation with multiple parameters kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-09 20:08 ` [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling kaber
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and
(in case of forwarded packets) refragments them at POST_ROUTING
independant of the IP_DF flag. Refragmentation uses the dst_mtu() of
the local route without caring about the original fragment sizes,
thereby breaking PMTUD.

This patch fixes this by keeping track of the largest received fragment
with IP_DF set and generates an ICMP fragmentation required error during
refragmentation if that size exceeds the MTU.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/inet_frag.h |    2 ++
 include/net/ip.h        |    2 ++
 net/ipv4/ip_fragment.c  |    8 +++++++-
 net/ipv4/ip_output.c    |    4 +++-
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 2431cf8..5098ee7 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -29,6 +29,8 @@ struct inet_frag_queue {
 #define INET_FRAG_COMPLETE	4
 #define INET_FRAG_FIRST_IN	2
 #define INET_FRAG_LAST_IN	1
+
+	u16			max_size;
 };
 
 #define INETFRAGS_HASHSZ		64
diff --git a/include/net/ip.h b/include/net/ip.h
index bd5e444..613053e 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -42,6 +42,8 @@ struct inet_skb_parm {
 #define IPSKB_XFRM_TRANSFORMED	4
 #define IPSKB_FRAG_COMPLETE	8
 #define IPSKB_REROUTED		16
+
+	u16			frag_max_size;
 };
 
 static inline unsigned int ip_hdrlen(const struct sk_buff *skb)
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 8d07c97..fa6a12c 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -523,6 +523,10 @@ found:
 	if (offset == 0)
 		qp->q.last_in |= INET_FRAG_FIRST_IN;
 
+	if (ip_hdr(skb)->frag_off & htons(IP_DF) &&
+	    skb->len + ihl > qp->q.max_size)
+		qp->q.max_size = skb->len + ihl;
+
 	if (qp->q.last_in == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
 	    qp->q.meat == qp->q.len)
 		return ip_frag_reasm(qp, prev, dev);
@@ -646,9 +650,11 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 	head->next = NULL;
 	head->dev = dev;
 	head->tstamp = qp->q.stamp;
+	IPCB(head)->frag_max_size = qp->q.max_size;
 
 	iph = ip_hdr(head);
-	iph->frag_off = 0;
+	/* max_size != 0 implies at least one fragment had IP_DF set */
+	iph->frag_off = qp->q.max_size ? htons(IP_DF) : 0;
 	iph->tot_len = htons(len);
 	iph->tos |= ecn;
 	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMOKS);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 76dde25..ae33b99 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -467,7 +467,9 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 
 	iph = ip_hdr(skb);
 
-	if (unlikely((iph->frag_off & htons(IP_DF)) && !skb->local_df)) {
+	if (unlikely(((iph->frag_off & htons(IP_DF)) && !skb->local_df) ||
+		     (IPCB(skb)->frag_max_size &&
+		      IPCB(skb)->frag_max_size > dst_mtu(&rt->dst)))) {
 		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 			  htonl(ip_skb_dst_mtu(skb)));
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (3 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 04/19] ipv4: fix path MTU discovery with connection tracking kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-17  8:06   ` Jesper Dangaard Brouer
  2012-08-17 13:36   ` [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling Pablo Neira Ayuso
  2012-08-09 20:08 ` [PATCH 06/19] netfilter: nf_conntrack_ipv6: fix tracking of ICMPv6 error messages containing fragments kaber
                   ` (17 subsequent siblings)
  22 siblings, 2 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.

Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.

This patch improves the situation in the following way:

- If a reassembled packet belongs to a connection that has a helper
  assigned, the reassembled packet is passed through the stack instead
  of the original fragments.

- During defragmentation, the largest received fragment size is stored.
  On output, the packet is refragmented if required. If the largest
  received fragment size exceeds the outgoing MTU, a "packet too big"
  message is generated, thus behaving as if the original fragments
  were passed through the stack from an outside point of view.

- The ipv6_helper() hook function can't receive fragments anymore for
  connections using a helper, so it is switched to use ipv6_skip_exthdr()
  instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
  reassembled packets are passed to connection tracking helpers.

The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.

This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/ipv6.h                           |    1 +
 net/ipv6/ip6_output.c                          |    7 +++-
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |   37 ++++++++++++++++++------
 net/ipv6/netfilter/nf_conntrack_reasm.c        |   19 ++++++++++--
 4 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 879db26..0b94e91 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -256,6 +256,7 @@ struct inet6_skb_parm {
 #if defined(CONFIG_IPV6_MIP6) || defined(CONFIG_IPV6_MIP6_MODULE)
 	__u16			dsthao;
 #endif
+	__u16			frag_max_size;
 
 #define IP6SKB_XFRM_TRANSFORMED	1
 #define IP6SKB_FORWARDED	2
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 5b2d63e..a4f6263 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -493,7 +493,8 @@ int ip6_forward(struct sk_buff *skb)
 	if (mtu < IPV6_MIN_MTU)
 		mtu = IPV6_MIN_MTU;
 
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if ((!skb->local_df && skb->len > mtu && !skb_is_gso(skb)) ||
+	    (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu)) {
 		/* Again, force OUTPUT device used as source address */
 		skb->dev = dst->dev;
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
@@ -636,7 +637,9 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 	/* We must not fragment if the socket is set to force MTU discovery
 	 * or if the skb it not generated by a local socket.
 	 */
-	if (unlikely(!skb->local_df && skb->len > mtu)) {
+	if (unlikely(!skb->local_df && skb->len > mtu) ||
+		     (IP6CB(skb)->frag_max_size &&
+		      IP6CB(skb)->frag_max_size > mtu)) {
 		if (skb->sk && dst_allfrag(skb_dst(skb)))
 			sk_nocaps_add(skb->sk, NETIF_F_GSO_MASK);
 
diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index 4794f96..560d823 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -153,10 +153,10 @@ static unsigned int ipv6_helper(unsigned int hooknum,
 	const struct nf_conn_help *help;
 	const struct nf_conntrack_helper *helper;
 	enum ip_conntrack_info ctinfo;
-	unsigned int ret, protoff;
-	unsigned int extoff = (u8 *)(ipv6_hdr(skb) + 1) - skb->data;
-	unsigned char pnum = ipv6_hdr(skb)->nexthdr;
-
+	unsigned int ret;
+	__be16 frag_off;
+	int protoff;
+	u8 nexthdr;
 
 	/* This is where we call the helper: as the packet goes out. */
 	ct = nf_ct_get(skb, &ctinfo);
@@ -171,9 +171,10 @@ static unsigned int ipv6_helper(unsigned int hooknum,
 	if (!helper)
 		return NF_ACCEPT;
 
-	protoff = nf_ct_ipv6_skip_exthdr(skb, extoff, &pnum,
-					 skb->len - extoff);
-	if (protoff > skb->len || pnum == NEXTHDR_FRAGMENT) {
+	nexthdr = ipv6_hdr(skb)->nexthdr;
+	protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr,
+				   &frag_off);
+	if (protoff < 0 || (frag_off & ntohs(~0x7)) != 0) {
 		pr_debug("proto header not found\n");
 		return NF_ACCEPT;
 	}
@@ -199,9 +200,13 @@ static unsigned int ipv6_confirm(unsigned int hooknum,
 static unsigned int __ipv6_conntrack_in(struct net *net,
 					unsigned int hooknum,
 					struct sk_buff *skb,
+					const struct net_device *in,
+					const struct net_device *out,
 					int (*okfn)(struct sk_buff *))
 {
 	struct sk_buff *reasm = skb->nfct_reasm;
+	struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
 
 	/* This packet is fragmented and has reassembled packet. */
 	if (reasm) {
@@ -213,6 +218,20 @@ static unsigned int __ipv6_conntrack_in(struct net *net,
 			if (ret != NF_ACCEPT)
 				return ret;
 		}
+
+		/* Conntrack helpers need the entire reassembled packet in the
+		 * POST_ROUTING hook.
+		 */
+		ct = nf_ct_get(reasm, &ctinfo);
+		if (ct != NULL && test_bit(IPS_HELPER_BIT, &ct->status)) {
+			nf_conntrack_get_reasm(skb);
+			NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm,
+				       (struct net_device *)in,
+				       (struct net_device *)out,
+				       okfn, NF_IP6_PRI_CONNTRACK + 1);
+			return NF_DROP_ERR(-ECANCELED);
+		}
+
 		nf_conntrack_get(reasm->nfct);
 		skb->nfct = reasm->nfct;
 		skb->nfctinfo = reasm->nfctinfo;
@@ -228,7 +247,7 @@ static unsigned int ipv6_conntrack_in(unsigned int hooknum,
 				      const struct net_device *out,
 				      int (*okfn)(struct sk_buff *))
 {
-	return __ipv6_conntrack_in(dev_net(in), hooknum, skb, okfn);
+	return __ipv6_conntrack_in(dev_net(in), hooknum, skb, in, out, okfn);
 }
 
 static unsigned int ipv6_conntrack_local(unsigned int hooknum,
@@ -242,7 +261,7 @@ static unsigned int ipv6_conntrack_local(unsigned int hooknum,
 		net_notice_ratelimited("ipv6_conntrack_local: packet too short\n");
 		return NF_ACCEPT;
 	}
-	return __ipv6_conntrack_in(dev_net(out), hooknum, skb, okfn);
+	return __ipv6_conntrack_in(dev_net(out), hooknum, skb, in, out, okfn);
 }
 
 static struct nf_hook_ops ipv6_conntrack_ops[] __read_mostly = {
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index c9c78c2..f94fb3a 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -190,6 +190,7 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
 			     const struct frag_hdr *fhdr, int nhoff)
 {
 	struct sk_buff *prev, *next;
+	unsigned int payload_len;
 	int offset, end;
 
 	if (fq->q.last_in & INET_FRAG_COMPLETE) {
@@ -197,8 +198,10 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
 		goto err;
 	}
 
+	payload_len = ntohs(ipv6_hdr(skb)->payload_len);
+
 	offset = ntohs(fhdr->frag_off) & ~0x7;
-	end = offset + (ntohs(ipv6_hdr(skb)->payload_len) -
+	end = offset + (payload_len -
 			((u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1)));
 
 	if ((unsigned int)end > IPV6_MAXPLEN) {
@@ -307,6 +310,8 @@ found:
 	skb->dev = NULL;
 	fq->q.stamp = skb->tstamp;
 	fq->q.meat += skb->len;
+	if (payload_len > fq->q.max_size)
+		fq->q.max_size = payload_len;
 	atomic_add(skb->truesize, &nf_init_frags.mem);
 
 	/* The first fragment.
@@ -412,10 +417,12 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
 	}
 	atomic_sub(head->truesize, &nf_init_frags.mem);
 
+	head->local_df = 1;
 	head->next = NULL;
 	head->dev = dev;
 	head->tstamp = fq->q.stamp;
 	ipv6_hdr(head)->payload_len = htons(payload_len);
+	IP6CB(head)->frag_max_size = sizeof(struct ipv6hdr) + fq->q.max_size;
 
 	/* Yes, and fold redundant checksum back. 8) */
 	if (head->ip_summed == CHECKSUM_COMPLETE)
@@ -592,6 +599,7 @@ void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
 			int (*okfn)(struct sk_buff *))
 {
 	struct sk_buff *s, *s2;
+	unsigned int ret = 0;
 
 	for (s = NFCT_FRAG6_CB(skb)->orig; s;) {
 		nf_conntrack_put_reasm(s->nfct_reasm);
@@ -601,8 +609,13 @@ void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
 		s2 = s->next;
 		s->next = NULL;
 
-		NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, s, in, out, okfn,
-			       NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
+		if (ret != -ECANCELED)
+			ret = NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, s,
+					     in, out, okfn,
+					     NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
+		else
+			kfree_skb(s);
+
 		s = s2;
 	}
 	nf_conntrack_put_reasm(skb);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 06/19] netfilter: nf_conntrack_ipv6: fix tracking of ICMPv6 error messages containing fragments
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (4 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-09 20:08 ` [PATCH 07/19] netfilter: nf_conntrack: restrict NAT helper invocation to IPv4 kaber
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

ICMPv6 error messages are tracked by extracting the conntrack tuple of
the inner packet and looking up the corresponding conntrack entry. Tuple
extraction uses the ->get_l4proto() callback, which in case of fragments
returns NEXTHDR_FRAGMENT instead of the upper protocol, even for the
first fragment when the entire next header is present, resulting in a
failure to find the correct connection tracking entry.

This patch changes ipv6_get_l4proto() to use ipv6_skip_exthdr() instead
of nf_ct_ipv6_skip_exthdr() in order to skip fragment headers when the
fragment offset is zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |   63 ++---------------------
 1 files changed, 6 insertions(+), 57 deletions(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index 560d823..4d92ea6 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -64,82 +64,31 @@ static int ipv6_print_tuple(struct seq_file *s,
 			  tuple->src.u3.ip6, tuple->dst.u3.ip6);
 }
 
-/*
- * Based on ipv6_skip_exthdr() in net/ipv6/exthdr.c
- *
- * This function parses (probably truncated) exthdr set "hdr"
- * of length "len". "nexthdrp" initially points to some place,
- * where type of the first header can be found.
- *
- * It skips all well-known exthdrs, and returns pointer to the start
- * of unparsable area i.e. the first header with unknown type.
- * if success, *nexthdr is updated by type/protocol of this header.
- *
- * NOTES: - it may return pointer pointing beyond end of packet,
- *          if the last recognized header is truncated in the middle.
- *        - if packet is truncated, so that all parsed headers are skipped,
- *          it returns -1.
- *        - if packet is fragmented, return pointer of the fragment header.
- *        - ESP is unparsable for now and considered like
- *          normal payload protocol.
- *        - Note also special handling of AUTH header. Thanks to IPsec wizards.
- */
-
-static int nf_ct_ipv6_skip_exthdr(const struct sk_buff *skb, int start,
-				  u8 *nexthdrp, int len)
-{
-	u8 nexthdr = *nexthdrp;
-
-	while (ipv6_ext_hdr(nexthdr)) {
-		struct ipv6_opt_hdr hdr;
-		int hdrlen;
-
-		if (len < (int)sizeof(struct ipv6_opt_hdr))
-			return -1;
-		if (nexthdr == NEXTHDR_NONE)
-			break;
-		if (nexthdr == NEXTHDR_FRAGMENT)
-			break;
-		if (skb_copy_bits(skb, start, &hdr, sizeof(hdr)))
-			BUG();
-		if (nexthdr == NEXTHDR_AUTH)
-			hdrlen = (hdr.hdrlen+2)<<2;
-		else
-			hdrlen = ipv6_optlen(&hdr);
-
-		nexthdr = hdr.nexthdr;
-		len -= hdrlen;
-		start += hdrlen;
-	}
-
-	*nexthdrp = nexthdr;
-	return start;
-}
-
 static int ipv6_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
 			    unsigned int *dataoff, u_int8_t *protonum)
 {
 	unsigned int extoff = nhoff + sizeof(struct ipv6hdr);
-	unsigned char pnum;
+	__be16 frag_off;
 	int protoff;
+	u8 nexthdr;
 
 	if (skb_copy_bits(skb, nhoff + offsetof(struct ipv6hdr, nexthdr),
-			  &pnum, sizeof(pnum)) != 0) {
+			  &nexthdr, sizeof(nexthdr)) != 0) {
 		pr_debug("ip6_conntrack_core: can't get nexthdr\n");
 		return -NF_ACCEPT;
 	}
-	protoff = nf_ct_ipv6_skip_exthdr(skb, extoff, &pnum, skb->len - extoff);
+	protoff = ipv6_skip_exthdr(skb, extoff, &nexthdr, &frag_off);
 	/*
 	 * (protoff == skb->len) mean that the packet doesn't have no data
 	 * except of IPv6 & ext headers. but it's tracked anyway. - YK
 	 */
-	if ((protoff < 0) || (protoff > skb->len)) {
+	if (protoff < 0 || (frag_off & ntohs(~0x7)) != 0) {
 		pr_debug("ip6_conntrack_core: can't find proto in pkt\n");
 		return -NF_ACCEPT;
 	}
 
 	*dataoff = protoff;
-	*protonum = pnum;
+	*protonum = nexthdr;
 	return NF_ACCEPT;
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 07/19] netfilter: nf_conntrack: restrict NAT helper invocation to IPv4
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (5 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 06/19] netfilter: nf_conntrack_ipv6: fix tracking of ICMPv6 error messages containing fragments kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-09 20:08 ` [PATCH 08/19] netfilter: nf_nat: add protoff argument to packet mangling functions kaber
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

The NAT helpers currently only handle IPv4 packets correctly. Restrict
invocation of the helpers to IPv4 in preparation of IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/netfilter/nf_conntrack_amanda.c    |    3 +-
 net/netfilter/nf_conntrack_ftp.c       |    3 +-
 net/netfilter/nf_conntrack_h323_main.c |   41 ++++++++++++++++++++++---------
 net/netfilter/nf_conntrack_irc.c       |    3 +-
 net/netfilter/nf_conntrack_sip.c       |   18 +++++++++----
 net/netfilter/nf_conntrack_tftp.c      |    3 +-
 6 files changed, 49 insertions(+), 22 deletions(-)

diff --git a/net/netfilter/nf_conntrack_amanda.c b/net/netfilter/nf_conntrack_amanda.c
index f2de8c5..184c0dc 100644
--- a/net/netfilter/nf_conntrack_amanda.c
+++ b/net/netfilter/nf_conntrack_amanda.c
@@ -154,7 +154,8 @@ static int amanda_help(struct sk_buff *skb,
 				  IPPROTO_TCP, NULL, &port);
 
 		nf_nat_amanda = rcu_dereference(nf_nat_amanda_hook);
-		if (nf_nat_amanda && ct->status & IPS_NAT_MASK)
+		if (nf_nat_amanda && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+		    ct->status & IPS_NAT_MASK)
 			ret = nf_nat_amanda(skb, ctinfo, off - dataoff,
 					    len, exp);
 		else if (nf_ct_expect_related(exp) != 0)
diff --git a/net/netfilter/nf_conntrack_ftp.c b/net/netfilter/nf_conntrack_ftp.c
index 4bb771d..3e1587e 100644
--- a/net/netfilter/nf_conntrack_ftp.c
+++ b/net/netfilter/nf_conntrack_ftp.c
@@ -487,7 +487,8 @@ static int help(struct sk_buff *skb,
 	/* Now, NAT might want to mangle the packet, and register the
 	 * (possibly changed) expectation itself. */
 	nf_nat_ftp = rcu_dereference(nf_nat_ftp_hook);
-	if (nf_nat_ftp && ct->status & IPS_NAT_MASK)
+	if (nf_nat_ftp && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK)
 		ret = nf_nat_ftp(skb, ctinfo, search[dir][i].ftptype,
 				 matchoff, matchlen, exp);
 	else {
diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c
index 4283b20..517c5e3 100644
--- a/net/netfilter/nf_conntrack_h323_main.c
+++ b/net/netfilter/nf_conntrack_h323_main.c
@@ -295,6 +295,7 @@ static int expect_rtp_rtcp(struct sk_buff *skb, struct nf_conn *ct,
 		   &ct->tuplehash[!dir].tuple.dst.u3,
 		   sizeof(ct->tuplehash[dir].tuple.src.u3)) &&
 		   (nat_rtp_rtcp = rcu_dereference(nat_rtp_rtcp_hook)) &&
+		   nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 		   ct->status & IPS_NAT_MASK) {
 		/* NAT needed */
 		ret = nat_rtp_rtcp(skb, ct, ctinfo, data, dataoff,
@@ -353,6 +354,7 @@ static int expect_t120(struct sk_buff *skb,
 		   &ct->tuplehash[!dir].tuple.dst.u3,
 		   sizeof(ct->tuplehash[dir].tuple.src.u3)) &&
 	    (nat_t120 = rcu_dereference(nat_t120_hook)) &&
+	    nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
 		/* NAT needed */
 		ret = nat_t120(skb, ct, ctinfo, data, dataoff, taddr,
@@ -688,6 +690,7 @@ static int expect_h245(struct sk_buff *skb, struct nf_conn *ct,
 		   &ct->tuplehash[!dir].tuple.dst.u3,
 		   sizeof(ct->tuplehash[dir].tuple.src.u3)) &&
 	    (nat_h245 = rcu_dereference(nat_h245_hook)) &&
+	    nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
 		/* NAT needed */
 		ret = nat_h245(skb, ct, ctinfo, data, dataoff, taddr,
@@ -811,6 +814,7 @@ static int expect_callforwarding(struct sk_buff *skb,
 		   &ct->tuplehash[!dir].tuple.dst.u3,
 		   sizeof(ct->tuplehash[dir].tuple.src.u3)) &&
 	    (nat_callforwarding = rcu_dereference(nat_callforwarding_hook)) &&
+	    nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
 		/* Need NAT */
 		ret = nat_callforwarding(skb, ct, ctinfo, data, dataoff,
@@ -852,7 +856,8 @@ static int process_setup(struct sk_buff *skb, struct nf_conn *ct,
 
 	set_h225_addr = rcu_dereference(set_h225_addr_hook);
 	if ((setup->options & eSetup_UUIE_destCallSignalAddress) &&
-	    (set_h225_addr) && ct->status & IPS_NAT_MASK &&
+	    (set_h225_addr) && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK &&
 	    get_h225_addr(ct, *data, &setup->destCallSignalAddress,
 			  &addr, &port) &&
 	    memcmp(&addr, &ct->tuplehash[!dir].tuple.src.u3, sizeof(addr))) {
@@ -868,7 +873,8 @@ static int process_setup(struct sk_buff *skb, struct nf_conn *ct,
 	}
 
 	if ((setup->options & eSetup_UUIE_sourceCallSignalAddress) &&
-	    (set_h225_addr) && ct->status & IPS_NAT_MASK &&
+	    (set_h225_addr) && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK &&
 	    get_h225_addr(ct, *data, &setup->sourceCallSignalAddress,
 			  &addr, &port) &&
 	    memcmp(&addr, &ct->tuplehash[!dir].tuple.dst.u3, sizeof(addr))) {
@@ -1278,7 +1284,8 @@ static int expect_q931(struct sk_buff *skb, struct nf_conn *ct,
 	exp->flags = NF_CT_EXPECT_PERMANENT;	/* Accept multiple calls */
 
 	nat_q931 = rcu_dereference(nat_q931_hook);
-	if (nat_q931 && ct->status & IPS_NAT_MASK) {	/* Need NAT */
+	if (nat_q931 && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK) {	/* Need NAT */
 		ret = nat_q931(skb, ct, ctinfo, data, taddr, i, port, exp);
 	} else {		/* Conntrack only */
 		if (nf_ct_expect_related(exp) == 0) {
@@ -1306,7 +1313,8 @@ static int process_grq(struct sk_buff *skb, struct nf_conn *ct,
 	pr_debug("nf_ct_ras: GRQ\n");
 
 	set_ras_addr = rcu_dereference(set_ras_addr_hook);
-	if (set_ras_addr && ct->status & IPS_NAT_MASK)	/* NATed */
+	if (set_ras_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK)	/* NATed */
 		return set_ras_addr(skb, ct, ctinfo, data,
 				    &grq->rasAddress, 1);
 	return 0;
@@ -1374,7 +1382,8 @@ static int process_rrq(struct sk_buff *skb, struct nf_conn *ct,
 		return -1;
 
 	set_ras_addr = rcu_dereference(set_ras_addr_hook);
-	if (set_ras_addr && ct->status & IPS_NAT_MASK) {
+	if (set_ras_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK) {
 		ret = set_ras_addr(skb, ct, ctinfo, data,
 				   rrq->rasAddress.item,
 				   rrq->rasAddress.count);
@@ -1405,7 +1414,8 @@ static int process_rcf(struct sk_buff *skb, struct nf_conn *ct,
 	pr_debug("nf_ct_ras: RCF\n");
 
 	set_sig_addr = rcu_dereference(set_sig_addr_hook);
-	if (set_sig_addr && ct->status & IPS_NAT_MASK) {
+	if (set_sig_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK) {
 		ret = set_sig_addr(skb, ct, ctinfo, data,
 					rcf->callSignalAddress.item,
 					rcf->callSignalAddress.count);
@@ -1453,7 +1463,8 @@ static int process_urq(struct sk_buff *skb, struct nf_conn *ct,
 	pr_debug("nf_ct_ras: URQ\n");
 
 	set_sig_addr = rcu_dereference(set_sig_addr_hook);
-	if (set_sig_addr && ct->status & IPS_NAT_MASK) {
+	if (set_sig_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK) {
 		ret = set_sig_addr(skb, ct, ctinfo, data,
 				   urq->callSignalAddress.item,
 				   urq->callSignalAddress.count);
@@ -1491,6 +1502,7 @@ static int process_arq(struct sk_buff *skb, struct nf_conn *ct,
 			  &addr, &port) &&
 	    !memcmp(&addr, &ct->tuplehash[dir].tuple.src.u3, sizeof(addr)) &&
 	    port == info->sig_port[dir] &&
+	    nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    set_h225_addr && ct->status & IPS_NAT_MASK) {
 		/* Answering ARQ */
 		return set_h225_addr(skb, data, 0,
@@ -1503,7 +1515,8 @@ static int process_arq(struct sk_buff *skb, struct nf_conn *ct,
 	    get_h225_addr(ct, *data, &arq->srcCallSignalAddress,
 			  &addr, &port) &&
 	    !memcmp(&addr, &ct->tuplehash[dir].tuple.src.u3, sizeof(addr)) &&
-	    set_h225_addr && ct->status & IPS_NAT_MASK) {
+	    set_h225_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK) {
 		/* Calling ARQ */
 		return set_h225_addr(skb, data, 0,
 				     &arq->srcCallSignalAddress,
@@ -1535,7 +1548,8 @@ static int process_acf(struct sk_buff *skb, struct nf_conn *ct,
 	if (!memcmp(&addr, &ct->tuplehash[dir].tuple.dst.u3, sizeof(addr))) {
 		/* Answering ACF */
 		set_sig_addr = rcu_dereference(set_sig_addr_hook);
-		if (set_sig_addr && ct->status & IPS_NAT_MASK)
+		if (set_sig_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+		    ct->status & IPS_NAT_MASK)
 			return set_sig_addr(skb, ct, ctinfo, data,
 					    &acf->destCallSignalAddress, 1);
 		return 0;
@@ -1571,7 +1585,8 @@ static int process_lrq(struct sk_buff *skb, struct nf_conn *ct,
 	pr_debug("nf_ct_ras: LRQ\n");
 
 	set_ras_addr = rcu_dereference(set_ras_addr_hook);
-	if (set_ras_addr && ct->status & IPS_NAT_MASK)
+	if (set_ras_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK)
 		return set_ras_addr(skb, ct, ctinfo, data,
 				    &lrq->replyAddress, 1);
 	return 0;
@@ -1628,7 +1643,8 @@ static int process_irr(struct sk_buff *skb, struct nf_conn *ct,
 	pr_debug("nf_ct_ras: IRR\n");
 
 	set_ras_addr = rcu_dereference(set_ras_addr_hook);
-	if (set_ras_addr && ct->status & IPS_NAT_MASK) {
+	if (set_ras_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK) {
 		ret = set_ras_addr(skb, ct, ctinfo, data,
 				   &irr->rasAddress, 1);
 		if (ret < 0)
@@ -1636,7 +1652,8 @@ static int process_irr(struct sk_buff *skb, struct nf_conn *ct,
 	}
 
 	set_sig_addr = rcu_dereference(set_sig_addr_hook);
-	if (set_sig_addr && ct->status & IPS_NAT_MASK) {
+	if (set_sig_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK) {
 		ret = set_sig_addr(skb, ct, ctinfo, data,
 					irr->callSignalAddress.item,
 					irr->callSignalAddress.count);
diff --git a/net/netfilter/nf_conntrack_irc.c b/net/netfilter/nf_conntrack_irc.c
index 009c52c..e06dc2f 100644
--- a/net/netfilter/nf_conntrack_irc.c
+++ b/net/netfilter/nf_conntrack_irc.c
@@ -204,7 +204,8 @@ static int help(struct sk_buff *skb, unsigned int protoff,
 					  IPPROTO_TCP, NULL, &port);
 
 			nf_nat_irc = rcu_dereference(nf_nat_irc_hook);
-			if (nf_nat_irc && ct->status & IPS_NAT_MASK)
+			if (nf_nat_irc && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+			    ct->status & IPS_NAT_MASK)
 				ret = nf_nat_irc(skb, ctinfo,
 						 addr_beg_p - ib_ptr,
 						 addr_end_p - addr_beg_p,
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 5c0a112..d08e0ba 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -981,7 +981,8 @@ static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int dataoff,
 			  IPPROTO_UDP, NULL, &rtcp_port);
 
 	nf_nat_sdp_media = rcu_dereference(nf_nat_sdp_media_hook);
-	if (nf_nat_sdp_media && ct->status & IPS_NAT_MASK && !direct_rtp)
+	if (nf_nat_sdp_media && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK && !direct_rtp)
 		ret = nf_nat_sdp_media(skb, dataoff, dptr, datalen,
 				       rtp_exp, rtcp_exp,
 				       mediaoff, medialen, daddr);
@@ -1104,7 +1105,8 @@ static int process_sdp(struct sk_buff *skb, unsigned int dataoff,
 			return ret;
 
 		/* Update media connection address if present */
-		if (maddr_len && nf_nat_sdp_addr && ct->status & IPS_NAT_MASK) {
+		if (maddr_len && nf_nat_sdp_addr &&
+		    nf_ct_l3num(ct) == NFPROTO_IPV4 && ct->status & IPS_NAT_MASK) {
 			ret = nf_nat_sdp_addr(skb, dataoff, dptr, datalen,
 					      mediaoff, c_hdr, SDP_HDR_MEDIA,
 					      &rtp_addr);
@@ -1116,7 +1118,8 @@ static int process_sdp(struct sk_buff *skb, unsigned int dataoff,
 
 	/* Update session connection and owner addresses */
 	nf_nat_sdp_session = rcu_dereference(nf_nat_sdp_session_hook);
-	if (nf_nat_sdp_session && ct->status & IPS_NAT_MASK)
+	if (nf_nat_sdp_session && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK)
 		ret = nf_nat_sdp_session(skb, dataoff, dptr, datalen, sdpoff,
 					 &rtp_addr);
 
@@ -1275,7 +1278,8 @@ static int process_register_request(struct sk_buff *skb, unsigned int dataoff,
 	exp->flags = NF_CT_EXPECT_PERMANENT | NF_CT_EXPECT_INACTIVE;
 
 	nf_nat_sip_expect = rcu_dereference(nf_nat_sip_expect_hook);
-	if (nf_nat_sip_expect && ct->status & IPS_NAT_MASK)
+	if (nf_nat_sip_expect && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK)
 		ret = nf_nat_sip_expect(skb, dataoff, dptr, datalen, exp,
 					matchoff, matchlen);
 	else {
@@ -1453,7 +1457,8 @@ static int process_sip_msg(struct sk_buff *skb, struct nf_conn *ct,
 	else
 		ret = process_sip_response(skb, dataoff, dptr, datalen);
 
-	if (ret == NF_ACCEPT && ct->status & IPS_NAT_MASK) {
+	if (ret == NF_ACCEPT && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK) {
 		nf_nat_sip = rcu_dereference(nf_nat_sip_hook);
 		if (nf_nat_sip && !nf_nat_sip(skb, dataoff, dptr, datalen))
 			ret = NF_DROP;
@@ -1534,7 +1539,8 @@ static int sip_help_tcp(struct sk_buff *skb, unsigned int protoff,
 		datalen  = datalen + diff - msglen;
 	}
 
-	if (ret == NF_ACCEPT && ct->status & IPS_NAT_MASK) {
+	if (ret == NF_ACCEPT && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+	    ct->status & IPS_NAT_MASK) {
 		nf_nat_sip_seq_adjust = rcu_dereference(nf_nat_sip_seq_adjust_hook);
 		if (nf_nat_sip_seq_adjust)
 			nf_nat_sip_seq_adjust(skb, tdiff);
diff --git a/net/netfilter/nf_conntrack_tftp.c b/net/netfilter/nf_conntrack_tftp.c
index 81fc61c..9363e1c 100644
--- a/net/netfilter/nf_conntrack_tftp.c
+++ b/net/netfilter/nf_conntrack_tftp.c
@@ -72,7 +72,8 @@ static int tftp_help(struct sk_buff *skb,
 		nf_ct_dump_tuple(&exp->tuple);
 
 		nf_nat_tftp = rcu_dereference(nf_nat_tftp_hook);
-		if (nf_nat_tftp && ct->status & IPS_NAT_MASK)
+		if (nf_nat_tftp && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
+		    ct->status & IPS_NAT_MASK)
 			ret = nf_nat_tftp(skb, ctinfo, exp);
 		else if (nf_ct_expect_related(exp) != 0)
 			ret = NF_DROP;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 08/19] netfilter: nf_nat: add protoff argument to packet mangling functions
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (6 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 07/19] netfilter: nf_conntrack: restrict NAT helper invocation to IPv4 kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-09 20:08 ` [PATCH 09/19] netfilter: add protocol independant NAT core kaber
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

For mangling IPv6 packets the protocol header offset needs to be known
by the NAT packet mangling functions. Add a so far unused protoff argument
and convert the conntrack and NAT helpers to use it in preparation of
IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter/nf_conntrack_amanda.h  |    1 +
 include/linux/netfilter/nf_conntrack_ftp.h     |    1 +
 include/linux/netfilter/nf_conntrack_h323.h    |   15 ++-
 include/linux/netfilter/nf_conntrack_irc.h     |    1 +
 include/linux/netfilter/nf_conntrack_pptp.h    |    2 +
 include/linux/netfilter/nf_conntrack_sip.h     |   12 ++-
 include/net/netfilter/nf_nat_helper.h          |   11 +-
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |    6 +-
 net/ipv4/netfilter/nf_nat_amanda.c             |    3 +-
 net/ipv4/netfilter/nf_nat_ftp.c                |    3 +-
 net/ipv4/netfilter/nf_nat_h323.c               |   48 ++++---
 net/ipv4/netfilter/nf_nat_helper.c             |    9 +-
 net/ipv4/netfilter/nf_nat_irc.c                |    3 +-
 net/ipv4/netfilter/nf_nat_pptp.c               |    6 +-
 net/ipv4/netfilter/nf_nat_sip.c                |   96 +++++++-----
 net/netfilter/ipvs/ip_vs_ftp.c                 |    1 +
 net/netfilter/nf_conntrack_amanda.c            |    5 +-
 net/netfilter/nf_conntrack_ftp.c               |    3 +-
 net/netfilter/nf_conntrack_h323_main.c         |  191 +++++++++++++++---------
 net/netfilter/nf_conntrack_irc.c               |    3 +-
 net/netfilter/nf_conntrack_pptp.c              |   18 ++-
 net/netfilter/nf_conntrack_sip.c               |   95 +++++++-----
 22 files changed, 328 insertions(+), 205 deletions(-)

diff --git a/include/linux/netfilter/nf_conntrack_amanda.h b/include/linux/netfilter/nf_conntrack_amanda.h
index 0bb5a69..4b59a15 100644
--- a/include/linux/netfilter/nf_conntrack_amanda.h
+++ b/include/linux/netfilter/nf_conntrack_amanda.h
@@ -4,6 +4,7 @@
 
 extern unsigned int (*nf_nat_amanda_hook)(struct sk_buff *skb,
 					  enum ip_conntrack_info ctinfo,
+					  unsigned int protoff,
 					  unsigned int matchoff,
 					  unsigned int matchlen,
 					  struct nf_conntrack_expect *exp);
diff --git a/include/linux/netfilter/nf_conntrack_ftp.h b/include/linux/netfilter/nf_conntrack_ftp.h
index 3e3aa08..28f18df 100644
--- a/include/linux/netfilter/nf_conntrack_ftp.h
+++ b/include/linux/netfilter/nf_conntrack_ftp.h
@@ -34,6 +34,7 @@ struct nf_conntrack_expect;
 extern unsigned int (*nf_nat_ftp_hook)(struct sk_buff *skb,
 				       enum ip_conntrack_info ctinfo,
 				       enum nf_ct_ftp_type type,
+				       unsigned int protoff,
 				       unsigned int matchoff,
 				       unsigned int matchlen,
 				       struct nf_conntrack_expect *exp);
diff --git a/include/linux/netfilter/nf_conntrack_h323.h b/include/linux/netfilter/nf_conntrack_h323.h
index 26f9226..f381020 100644
--- a/include/linux/netfilter/nf_conntrack_h323.h
+++ b/include/linux/netfilter/nf_conntrack_h323.h
@@ -36,12 +36,12 @@ extern void nf_conntrack_h245_expect(struct nf_conn *new,
 				     struct nf_conntrack_expect *this);
 extern void nf_conntrack_q931_expect(struct nf_conn *new,
 				     struct nf_conntrack_expect *this);
-extern int (*set_h245_addr_hook) (struct sk_buff *skb,
+extern int (*set_h245_addr_hook) (struct sk_buff *skb, unsigned int protoff,
 				  unsigned char **data, int dataoff,
 				  H245_TransportAddress *taddr,
 				  union nf_inet_addr *addr,
 				  __be16 port);
-extern int (*set_h225_addr_hook) (struct sk_buff *skb,
+extern int (*set_h225_addr_hook) (struct sk_buff *skb, unsigned int protoff,
 				  unsigned char **data, int dataoff,
 				  TransportAddress *taddr,
 				  union nf_inet_addr *addr,
@@ -49,40 +49,45 @@ extern int (*set_h225_addr_hook) (struct sk_buff *skb,
 extern int (*set_sig_addr_hook) (struct sk_buff *skb,
 				 struct nf_conn *ct,
 				 enum ip_conntrack_info ctinfo,
-				 unsigned char **data,
+				 unsigned int protoff, unsigned char **data,
 				 TransportAddress *taddr, int count);
 extern int (*set_ras_addr_hook) (struct sk_buff *skb,
 				 struct nf_conn *ct,
 				 enum ip_conntrack_info ctinfo,
-				 unsigned char **data,
+				 unsigned int protoff, unsigned char **data,
 				 TransportAddress *taddr, int count);
 extern int (*nat_rtp_rtcp_hook) (struct sk_buff *skb,
 				 struct nf_conn *ct,
 				 enum ip_conntrack_info ctinfo,
-				 unsigned char **data, int dataoff,
+				 unsigned int protoff, unsigned char **data,
+				 int dataoff,
 				 H245_TransportAddress *taddr,
 				 __be16 port, __be16 rtp_port,
 				 struct nf_conntrack_expect *rtp_exp,
 				 struct nf_conntrack_expect *rtcp_exp);
 extern int (*nat_t120_hook) (struct sk_buff *skb, struct nf_conn *ct,
 			     enum ip_conntrack_info ctinfo,
+			     unsigned int protoff,
 			     unsigned char **data, int dataoff,
 			     H245_TransportAddress *taddr, __be16 port,
 			     struct nf_conntrack_expect *exp);
 extern int (*nat_h245_hook) (struct sk_buff *skb, struct nf_conn *ct,
 			     enum ip_conntrack_info ctinfo,
+			     unsigned int protoff,
 			     unsigned char **data, int dataoff,
 			     TransportAddress *taddr, __be16 port,
 			     struct nf_conntrack_expect *exp);
 extern int (*nat_callforwarding_hook) (struct sk_buff *skb,
 				       struct nf_conn *ct,
 				       enum ip_conntrack_info ctinfo,
+				       unsigned int protoff,
 				       unsigned char **data, int dataoff,
 				       TransportAddress *taddr,
 				       __be16 port,
 				       struct nf_conntrack_expect *exp);
 extern int (*nat_q931_hook) (struct sk_buff *skb, struct nf_conn *ct,
 			     enum ip_conntrack_info ctinfo,
+			     unsigned int protoff,
 			     unsigned char **data, TransportAddress *taddr,
 			     int idx, __be16 port,
 			     struct nf_conntrack_expect *exp);
diff --git a/include/linux/netfilter/nf_conntrack_irc.h b/include/linux/netfilter/nf_conntrack_irc.h
index 36282bf..4bb9bae 100644
--- a/include/linux/netfilter/nf_conntrack_irc.h
+++ b/include/linux/netfilter/nf_conntrack_irc.h
@@ -7,6 +7,7 @@
 
 extern unsigned int (*nf_nat_irc_hook)(struct sk_buff *skb,
 				       enum ip_conntrack_info ctinfo,
+				       unsigned int protoff,
 				       unsigned int matchoff,
 				       unsigned int matchlen,
 				       struct nf_conntrack_expect *exp);
diff --git a/include/linux/netfilter/nf_conntrack_pptp.h b/include/linux/netfilter/nf_conntrack_pptp.h
index 3bbde0c..2ab2830 100644
--- a/include/linux/netfilter/nf_conntrack_pptp.h
+++ b/include/linux/netfilter/nf_conntrack_pptp.h
@@ -303,12 +303,14 @@ struct nf_conntrack_expect;
 extern int
 (*nf_nat_pptp_hook_outbound)(struct sk_buff *skb,
 			     struct nf_conn *ct, enum ip_conntrack_info ctinfo,
+			     unsigned int protoff,
 			     struct PptpControlHeader *ctlh,
 			     union pptp_ctrl_union *pptpReq);
 
 extern int
 (*nf_nat_pptp_hook_inbound)(struct sk_buff *skb,
 			    struct nf_conn *ct, enum ip_conntrack_info ctinfo,
+			    unsigned int protoff,
 			    struct PptpControlHeader *ctlh,
 			    union pptp_ctrl_union *pptpReq);
 
diff --git a/include/linux/netfilter/nf_conntrack_sip.h b/include/linux/netfilter/nf_conntrack_sip.h
index 89f2a62..1afc669 100644
--- a/include/linux/netfilter/nf_conntrack_sip.h
+++ b/include/linux/netfilter/nf_conntrack_sip.h
@@ -37,10 +37,12 @@ struct sdp_media_type {
 struct sip_handler {
 	const char	*method;
 	unsigned int	len;
-	int		(*request)(struct sk_buff *skb, unsigned int dataoff,
+	int		(*request)(struct sk_buff *skb, unsigned int protoff,
+				   unsigned int dataoff,
 				   const char **dptr, unsigned int *datalen,
 				   unsigned int cseq);
-	int		(*response)(struct sk_buff *skb, unsigned int dataoff,
+	int		(*response)(struct sk_buff *skb, unsigned int protoff,
+				    unsigned int dataoff,
 				    const char **dptr, unsigned int *datalen,
 				    unsigned int cseq, unsigned int code);
 };
@@ -105,11 +107,13 @@ enum sdp_header_types {
 };
 
 extern unsigned int (*nf_nat_sip_hook)(struct sk_buff *skb,
+				       unsigned int protoff,
 				       unsigned int dataoff,
 				       const char **dptr,
 				       unsigned int *datalen);
 extern void (*nf_nat_sip_seq_adjust_hook)(struct sk_buff *skb, s16 off);
 extern unsigned int (*nf_nat_sip_expect_hook)(struct sk_buff *skb,
+					      unsigned int protoff,
 					      unsigned int dataoff,
 					      const char **dptr,
 					      unsigned int *datalen,
@@ -117,6 +121,7 @@ extern unsigned int (*nf_nat_sip_expect_hook)(struct sk_buff *skb,
 					      unsigned int matchoff,
 					      unsigned int matchlen);
 extern unsigned int (*nf_nat_sdp_addr_hook)(struct sk_buff *skb,
+					    unsigned int protoff,
 					    unsigned int dataoff,
 					    const char **dptr,
 					    unsigned int *datalen,
@@ -125,6 +130,7 @@ extern unsigned int (*nf_nat_sdp_addr_hook)(struct sk_buff *skb,
 					    enum sdp_header_types term,
 					    const union nf_inet_addr *addr);
 extern unsigned int (*nf_nat_sdp_port_hook)(struct sk_buff *skb,
+					    unsigned int protoff,
 					    unsigned int dataoff,
 					    const char **dptr,
 					    unsigned int *datalen,
@@ -132,12 +138,14 @@ extern unsigned int (*nf_nat_sdp_port_hook)(struct sk_buff *skb,
 					    unsigned int matchlen,
 					    u_int16_t port);
 extern unsigned int (*nf_nat_sdp_session_hook)(struct sk_buff *skb,
+					       unsigned int protoff,
 					       unsigned int dataoff,
 					       const char **dptr,
 					       unsigned int *datalen,
 					       unsigned int sdpoff,
 					       const union nf_inet_addr *addr);
 extern unsigned int (*nf_nat_sdp_media_hook)(struct sk_buff *skb,
+					     unsigned int protoff,
 					     unsigned int dataoff,
 					     const char **dptr,
 					     unsigned int *datalen,
diff --git a/include/net/netfilter/nf_nat_helper.h b/include/net/netfilter/nf_nat_helper.h
index 7d8fb7b..b4d6bfc 100644
--- a/include/net/netfilter/nf_nat_helper.h
+++ b/include/net/netfilter/nf_nat_helper.h
@@ -10,6 +10,7 @@ struct sk_buff;
 extern int __nf_nat_mangle_tcp_packet(struct sk_buff *skb,
 				      struct nf_conn *ct,
 				      enum ip_conntrack_info ctinfo,
+				      unsigned int protoff,
 				      unsigned int match_offset,
 				      unsigned int match_len,
 				      const char *rep_buffer,
@@ -18,12 +19,13 @@ extern int __nf_nat_mangle_tcp_packet(struct sk_buff *skb,
 static inline int nf_nat_mangle_tcp_packet(struct sk_buff *skb,
 					   struct nf_conn *ct,
 					   enum ip_conntrack_info ctinfo,
+					   unsigned int protoff,
 					   unsigned int match_offset,
 					   unsigned int match_len,
 					   const char *rep_buffer,
 					   unsigned int rep_len)
 {
-	return __nf_nat_mangle_tcp_packet(skb, ct, ctinfo,
+	return __nf_nat_mangle_tcp_packet(skb, ct, ctinfo, protoff,
 					  match_offset, match_len,
 					  rep_buffer, rep_len, true);
 }
@@ -31,6 +33,7 @@ static inline int nf_nat_mangle_tcp_packet(struct sk_buff *skb,
 extern int nf_nat_mangle_udp_packet(struct sk_buff *skb,
 				    struct nf_conn *ct,
 				    enum ip_conntrack_info ctinfo,
+				    unsigned int protoff,
 				    unsigned int match_offset,
 				    unsigned int match_len,
 				    const char *rep_buffer,
@@ -41,10 +44,12 @@ extern void nf_nat_set_seq_adjust(struct nf_conn *ct,
 				  __be32 seq, s16 off);
 extern int nf_nat_seq_adjust(struct sk_buff *skb,
 			     struct nf_conn *ct,
-			     enum ip_conntrack_info ctinfo);
+			     enum ip_conntrack_info ctinfo,
+			     unsigned int protoff);
 extern int (*nf_nat_seq_adjust_hook)(struct sk_buff *skb,
 				     struct nf_conn *ct,
-				     enum ip_conntrack_info ctinfo);
+				     enum ip_conntrack_info ctinfo,
+				     unsigned int protoff);
 
 /* Setup NAT on this expected conntrack so it follows master, but goes
  * to port ct->master->saved_proto. */
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index e7ff2dc..4ada329 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -31,7 +31,8 @@
 
 int (*nf_nat_seq_adjust_hook)(struct sk_buff *skb,
 			      struct nf_conn *ct,
-			      enum ip_conntrack_info ctinfo);
+			      enum ip_conntrack_info ctinfo,
+			      unsigned int protoff);
 EXPORT_SYMBOL_GPL(nf_nat_seq_adjust_hook);
 
 static bool ipv4_pkt_to_tuple(const struct sk_buff *skb, unsigned int nhoff,
@@ -149,7 +150,8 @@ static unsigned int ipv4_confirm(unsigned int hooknum,
 		typeof(nf_nat_seq_adjust_hook) seq_adjust;
 
 		seq_adjust = rcu_dereference(nf_nat_seq_adjust_hook);
-		if (!seq_adjust || !seq_adjust(skb, ct, ctinfo)) {
+		if (!seq_adjust ||
+		    !seq_adjust(skb, ct, ctinfo, ip_hdrlen(skb))) {
 			NF_CT_STAT_INC_ATOMIC(nf_ct_net(ct), drop);
 			return NF_DROP;
 		}
diff --git a/net/ipv4/netfilter/nf_nat_amanda.c b/net/ipv4/netfilter/nf_nat_amanda.c
index 3c04d24..75464b6 100644
--- a/net/ipv4/netfilter/nf_nat_amanda.c
+++ b/net/ipv4/netfilter/nf_nat_amanda.c
@@ -26,6 +26,7 @@ MODULE_ALIAS("ip_nat_amanda");
 
 static unsigned int help(struct sk_buff *skb,
 			 enum ip_conntrack_info ctinfo,
+			 unsigned int protoff,
 			 unsigned int matchoff,
 			 unsigned int matchlen,
 			 struct nf_conntrack_expect *exp)
@@ -61,7 +62,7 @@ static unsigned int help(struct sk_buff *skb,
 
 	sprintf(buffer, "%u", port);
 	ret = nf_nat_mangle_udp_packet(skb, exp->master, ctinfo,
-				       matchoff, matchlen,
+				       protoff, matchoff, matchlen,
 				       buffer, strlen(buffer));
 	if (ret != NF_ACCEPT)
 		nf_ct_unexpect_related(exp);
diff --git a/net/ipv4/netfilter/nf_nat_ftp.c b/net/ipv4/netfilter/nf_nat_ftp.c
index e462a95..5589f3a 100644
--- a/net/ipv4/netfilter/nf_nat_ftp.c
+++ b/net/ipv4/netfilter/nf_nat_ftp.c
@@ -55,6 +55,7 @@ static int nf_nat_ftp_fmt_cmd(enum nf_ct_ftp_type type,
 static unsigned int nf_nat_ftp(struct sk_buff *skb,
 			       enum ip_conntrack_info ctinfo,
 			       enum nf_ct_ftp_type type,
+			       unsigned int protoff,
 			       unsigned int matchoff,
 			       unsigned int matchlen,
 			       struct nf_conntrack_expect *exp)
@@ -100,7 +101,7 @@ static unsigned int nf_nat_ftp(struct sk_buff *skb,
 
 	pr_debug("calling nf_nat_mangle_tcp_packet\n");
 
-	if (!nf_nat_mangle_tcp_packet(skb, ct, ctinfo, matchoff,
+	if (!nf_nat_mangle_tcp_packet(skb, ct, ctinfo, protoff, matchoff,
 				      matchlen, buffer, buflen))
 		goto out;
 
diff --git a/net/ipv4/netfilter/nf_nat_h323.c b/net/ipv4/netfilter/nf_nat_h323.c
index c6784a1..d2c228d 100644
--- a/net/ipv4/netfilter/nf_nat_h323.c
+++ b/net/ipv4/netfilter/nf_nat_h323.c
@@ -21,7 +21,7 @@
 #include <linux/netfilter/nf_conntrack_h323.h>
 
 /****************************************************************************/
-static int set_addr(struct sk_buff *skb,
+static int set_addr(struct sk_buff *skb, unsigned int protoff,
 		    unsigned char **data, int dataoff,
 		    unsigned int addroff, __be32 ip, __be16 port)
 {
@@ -40,7 +40,7 @@ static int set_addr(struct sk_buff *skb,
 
 	if (ip_hdr(skb)->protocol == IPPROTO_TCP) {
 		if (!nf_nat_mangle_tcp_packet(skb, ct, ctinfo,
-					      addroff, sizeof(buf),
+					      protoff, addroff, sizeof(buf),
 					      (char *) &buf, sizeof(buf))) {
 			net_notice_ratelimited("nf_nat_h323: nf_nat_mangle_tcp_packet error\n");
 			return -1;
@@ -54,7 +54,7 @@ static int set_addr(struct sk_buff *skb,
 		*data = skb->data + ip_hdrlen(skb) + th->doff * 4 + dataoff;
 	} else {
 		if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo,
-					      addroff, sizeof(buf),
+					      protoff, addroff, sizeof(buf),
 					      (char *) &buf, sizeof(buf))) {
 			net_notice_ratelimited("nf_nat_h323: nf_nat_mangle_udp_packet error\n");
 			return -1;
@@ -69,22 +69,22 @@ static int set_addr(struct sk_buff *skb,
 }
 
 /****************************************************************************/
-static int set_h225_addr(struct sk_buff *skb,
+static int set_h225_addr(struct sk_buff *skb, unsigned int protoff,
 			 unsigned char **data, int dataoff,
 			 TransportAddress *taddr,
 			 union nf_inet_addr *addr, __be16 port)
 {
-	return set_addr(skb, data, dataoff, taddr->ipAddress.ip,
+	return set_addr(skb, protoff, data, dataoff, taddr->ipAddress.ip,
 			addr->ip, port);
 }
 
 /****************************************************************************/
-static int set_h245_addr(struct sk_buff *skb,
+static int set_h245_addr(struct sk_buff *skb, unsigned protoff,
 			 unsigned char **data, int dataoff,
 			 H245_TransportAddress *taddr,
 			 union nf_inet_addr *addr, __be16 port)
 {
-	return set_addr(skb, data, dataoff,
+	return set_addr(skb, protoff, data, dataoff,
 			taddr->unicastAddress.iPAddress.network,
 			addr->ip, port);
 }
@@ -92,7 +92,7 @@ static int set_h245_addr(struct sk_buff *skb,
 /****************************************************************************/
 static int set_sig_addr(struct sk_buff *skb, struct nf_conn *ct,
 			enum ip_conntrack_info ctinfo,
-			unsigned char **data,
+			unsigned int protoff, unsigned char **data,
 			TransportAddress *taddr, int count)
 {
 	const struct nf_ct_h323_master *info = nfct_help_data(ct);
@@ -118,7 +118,8 @@ static int set_sig_addr(struct sk_buff *skb, struct nf_conn *ct,
 					 &addr.ip, port,
 					 &ct->tuplehash[!dir].tuple.dst.u3.ip,
 					 info->sig_port[!dir]);
-				return set_h225_addr(skb, data, 0, &taddr[i],
+				return set_h225_addr(skb, protoff, data, 0,
+						     &taddr[i],
 						     &ct->tuplehash[!dir].
 						     tuple.dst.u3,
 						     info->sig_port[!dir]);
@@ -129,7 +130,8 @@ static int set_sig_addr(struct sk_buff *skb, struct nf_conn *ct,
 					 &addr.ip, port,
 					 &ct->tuplehash[!dir].tuple.src.u3.ip,
 					 info->sig_port[!dir]);
-				return set_h225_addr(skb, data, 0, &taddr[i],
+				return set_h225_addr(skb, protoff, data, 0,
+						     &taddr[i],
 						     &ct->tuplehash[!dir].
 						     tuple.src.u3,
 						     info->sig_port[!dir]);
@@ -143,7 +145,7 @@ static int set_sig_addr(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int set_ras_addr(struct sk_buff *skb, struct nf_conn *ct,
 			enum ip_conntrack_info ctinfo,
-			unsigned char **data,
+			unsigned int protoff, unsigned char **data,
 			TransportAddress *taddr, int count)
 {
 	int dir = CTINFO2DIR(ctinfo);
@@ -159,7 +161,7 @@ static int set_ras_addr(struct sk_buff *skb, struct nf_conn *ct,
 				 &addr.ip, ntohs(port),
 				 &ct->tuplehash[!dir].tuple.dst.u3.ip,
 				 ntohs(ct->tuplehash[!dir].tuple.dst.u.udp.port));
-			return set_h225_addr(skb, data, 0, &taddr[i],
+			return set_h225_addr(skb, protoff, data, 0, &taddr[i],
 					     &ct->tuplehash[!dir].tuple.dst.u3,
 					     ct->tuplehash[!dir].tuple.
 								dst.u.udp.port);
@@ -172,7 +174,7 @@ static int set_ras_addr(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int nat_rtp_rtcp(struct sk_buff *skb, struct nf_conn *ct,
 			enum ip_conntrack_info ctinfo,
-			unsigned char **data, int dataoff,
+			unsigned int protoff, unsigned char **data, int dataoff,
 			H245_TransportAddress *taddr,
 			__be16 port, __be16 rtp_port,
 			struct nf_conntrack_expect *rtp_exp,
@@ -244,7 +246,7 @@ static int nat_rtp_rtcp(struct sk_buff *skb, struct nf_conn *ct,
 	}
 
 	/* Modify signal */
-	if (set_h245_addr(skb, data, dataoff, taddr,
+	if (set_h245_addr(skb, protoff, data, dataoff, taddr,
 			  &ct->tuplehash[!dir].tuple.dst.u3,
 			  htons((port & htons(1)) ? nated_port + 1 :
 						    nated_port)) == 0) {
@@ -275,7 +277,7 @@ static int nat_rtp_rtcp(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int nat_t120(struct sk_buff *skb, struct nf_conn *ct,
 		    enum ip_conntrack_info ctinfo,
-		    unsigned char **data, int dataoff,
+		    unsigned int protoff, unsigned char **data, int dataoff,
 		    H245_TransportAddress *taddr, __be16 port,
 		    struct nf_conntrack_expect *exp)
 {
@@ -307,7 +309,7 @@ static int nat_t120(struct sk_buff *skb, struct nf_conn *ct,
 	}
 
 	/* Modify signal */
-	if (set_h245_addr(skb, data, dataoff, taddr,
+	if (set_h245_addr(skb, protoff, data, dataoff, taddr,
 			  &ct->tuplehash[!dir].tuple.dst.u3,
 			  htons(nated_port)) < 0) {
 		nf_ct_unexpect_related(exp);
@@ -326,7 +328,7 @@ static int nat_t120(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int nat_h245(struct sk_buff *skb, struct nf_conn *ct,
 		    enum ip_conntrack_info ctinfo,
-		    unsigned char **data, int dataoff,
+		    unsigned int protoff, unsigned char **data, int dataoff,
 		    TransportAddress *taddr, __be16 port,
 		    struct nf_conntrack_expect *exp)
 {
@@ -363,7 +365,7 @@ static int nat_h245(struct sk_buff *skb, struct nf_conn *ct,
 	}
 
 	/* Modify signal */
-	if (set_h225_addr(skb, data, dataoff, taddr,
+	if (set_h225_addr(skb, protoff, data, dataoff, taddr,
 			  &ct->tuplehash[!dir].tuple.dst.u3,
 			  htons(nated_port)) == 0) {
 		/* Save ports */
@@ -416,7 +418,8 @@ static void ip_nat_q931_expect(struct nf_conn *new,
 /****************************************************************************/
 static int nat_q931(struct sk_buff *skb, struct nf_conn *ct,
 		    enum ip_conntrack_info ctinfo,
-		    unsigned char **data, TransportAddress *taddr, int idx,
+		    unsigned int protoff, unsigned char **data,
+		    TransportAddress *taddr, int idx,
 		    __be16 port, struct nf_conntrack_expect *exp)
 {
 	struct nf_ct_h323_master *info = nfct_help_data(ct);
@@ -453,7 +456,7 @@ static int nat_q931(struct sk_buff *skb, struct nf_conn *ct,
 	}
 
 	/* Modify signal */
-	if (set_h225_addr(skb, data, 0, &taddr[idx],
+	if (set_h225_addr(skb, protoff, data, 0, &taddr[idx],
 			  &ct->tuplehash[!dir].tuple.dst.u3,
 			  htons(nated_port)) == 0) {
 		/* Save ports */
@@ -464,7 +467,7 @@ static int nat_q931(struct sk_buff *skb, struct nf_conn *ct,
 		if (idx > 0 &&
 		    get_h225_addr(ct, *data, &taddr[0], &addr, &port) &&
 		    (ntohl(addr.ip) & 0xff000000) == 0x7f000000) {
-			set_h225_addr(skb, data, 0, &taddr[0],
+			set_h225_addr(skb, protoff, data, 0, &taddr[0],
 				      &ct->tuplehash[!dir].tuple.dst.u3,
 				      info->sig_port[!dir]);
 		}
@@ -507,6 +510,7 @@ static void ip_nat_callforwarding_expect(struct nf_conn *new,
 /****************************************************************************/
 static int nat_callforwarding(struct sk_buff *skb, struct nf_conn *ct,
 			      enum ip_conntrack_info ctinfo,
+			      unsigned int protoff,
 			      unsigned char **data, int dataoff,
 			      TransportAddress *taddr, __be16 port,
 			      struct nf_conntrack_expect *exp)
@@ -541,7 +545,7 @@ static int nat_callforwarding(struct sk_buff *skb, struct nf_conn *ct,
 	}
 
 	/* Modify signal */
-	if (!set_h225_addr(skb, data, dataoff, taddr,
+	if (!set_h225_addr(skb, protoff, data, dataoff, taddr,
 			   &ct->tuplehash[!dir].tuple.dst.u3,
 			   htons(nated_port)) == 0) {
 		nf_ct_unexpect_related(exp);
diff --git a/net/ipv4/netfilter/nf_nat_helper.c b/net/ipv4/netfilter/nf_nat_helper.c
index 2e59ad0..2fefec5 100644
--- a/net/ipv4/netfilter/nf_nat_helper.c
+++ b/net/ipv4/netfilter/nf_nat_helper.c
@@ -206,6 +206,7 @@ static void nf_nat_csum(struct sk_buff *skb, const struct iphdr *iph, void *data
 int __nf_nat_mangle_tcp_packet(struct sk_buff *skb,
 			       struct nf_conn *ct,
 			       enum ip_conntrack_info ctinfo,
+			       unsigned int protoff,
 			       unsigned int match_offset,
 			       unsigned int match_len,
 			       const char *rep_buffer,
@@ -257,6 +258,7 @@ int
 nf_nat_mangle_udp_packet(struct sk_buff *skb,
 			 struct nf_conn *ct,
 			 enum ip_conntrack_info ctinfo,
+			 unsigned int protoff,
 			 unsigned int match_offset,
 			 unsigned int match_len,
 			 const char *rep_buffer,
@@ -387,7 +389,8 @@ nf_nat_sack_adjust(struct sk_buff *skb,
 int
 nf_nat_seq_adjust(struct sk_buff *skb,
 		  struct nf_conn *ct,
-		  enum ip_conntrack_info ctinfo)
+		  enum ip_conntrack_info ctinfo,
+		  unsigned int protoff)
 {
 	struct tcphdr *tcph;
 	int dir;
@@ -401,10 +404,10 @@ nf_nat_seq_adjust(struct sk_buff *skb,
 	this_way = &nat->seq[dir];
 	other_way = &nat->seq[!dir];
 
-	if (!skb_make_writable(skb, ip_hdrlen(skb) + sizeof(*tcph)))
+	if (!skb_make_writable(skb, protoff + sizeof(*tcph)))
 		return 0;
 
-	tcph = (void *)skb->data + ip_hdrlen(skb);
+	tcph = (void *)skb->data + protoff;
 	if (after(ntohl(tcph->seq), this_way->correction_pos))
 		seqoff = this_way->offset_after;
 	else
diff --git a/net/ipv4/netfilter/nf_nat_irc.c b/net/ipv4/netfilter/nf_nat_irc.c
index 979ae16..5b0c20a 100644
--- a/net/ipv4/netfilter/nf_nat_irc.c
+++ b/net/ipv4/netfilter/nf_nat_irc.c
@@ -29,6 +29,7 @@ MODULE_ALIAS("ip_nat_irc");
 
 static unsigned int help(struct sk_buff *skb,
 			 enum ip_conntrack_info ctinfo,
+			 unsigned int protoff,
 			 unsigned int matchoff,
 			 unsigned int matchlen,
 			 struct nf_conntrack_expect *exp)
@@ -66,7 +67,7 @@ static unsigned int help(struct sk_buff *skb,
 		 buffer, &ip, port);
 
 	ret = nf_nat_mangle_tcp_packet(skb, exp->master, ctinfo,
-				       matchoff, matchlen, buffer,
+				       protoff, matchoff, matchlen, buffer,
 				       strlen(buffer));
 	if (ret != NF_ACCEPT)
 		nf_ct_unexpect_related(exp);
diff --git a/net/ipv4/netfilter/nf_nat_pptp.c b/net/ipv4/netfilter/nf_nat_pptp.c
index 3881408..31ef890 100644
--- a/net/ipv4/netfilter/nf_nat_pptp.c
+++ b/net/ipv4/netfilter/nf_nat_pptp.c
@@ -113,6 +113,7 @@ static int
 pptp_outbound_pkt(struct sk_buff *skb,
 		  struct nf_conn *ct,
 		  enum ip_conntrack_info ctinfo,
+		  unsigned int protoff,
 		  struct PptpControlHeader *ctlh,
 		  union pptp_ctrl_union *pptpReq)
 
@@ -175,7 +176,7 @@ pptp_outbound_pkt(struct sk_buff *skb,
 		 ntohs(REQ_CID(pptpReq, cid_off)), ntohs(new_callid));
 
 	/* mangle packet */
-	if (nf_nat_mangle_tcp_packet(skb, ct, ctinfo,
+	if (nf_nat_mangle_tcp_packet(skb, ct, ctinfo, protoff,
 				     cid_off + sizeof(struct pptp_pkt_hdr) +
 				     sizeof(struct PptpControlHeader),
 				     sizeof(new_callid), (char *)&new_callid,
@@ -216,6 +217,7 @@ static int
 pptp_inbound_pkt(struct sk_buff *skb,
 		 struct nf_conn *ct,
 		 enum ip_conntrack_info ctinfo,
+		 unsigned int protoff,
 		 struct PptpControlHeader *ctlh,
 		 union pptp_ctrl_union *pptpReq)
 {
@@ -268,7 +270,7 @@ pptp_inbound_pkt(struct sk_buff *skb,
 	pr_debug("altering peer call id from 0x%04x to 0x%04x\n",
 		 ntohs(REQ_CID(pptpReq, pcid_off)), ntohs(new_pcid));
 
-	if (nf_nat_mangle_tcp_packet(skb, ct, ctinfo,
+	if (nf_nat_mangle_tcp_packet(skb, ct, ctinfo, protoff,
 				     pcid_off + sizeof(struct pptp_pkt_hdr) +
 				     sizeof(struct PptpControlHeader),
 				     sizeof(new_pcid), (char *)&new_pcid,
diff --git a/net/ipv4/netfilter/nf_nat_sip.c b/net/ipv4/netfilter/nf_nat_sip.c
index 4ad9cf1..df626af 100644
--- a/net/ipv4/netfilter/nf_nat_sip.c
+++ b/net/ipv4/netfilter/nf_nat_sip.c
@@ -30,7 +30,8 @@ MODULE_DESCRIPTION("SIP NAT helper");
 MODULE_ALIAS("ip_nat_sip");
 
 
-static unsigned int mangle_packet(struct sk_buff *skb, unsigned int dataoff,
+static unsigned int mangle_packet(struct sk_buff *skb, unsigned int protoff,
+				  unsigned int dataoff,
 				  const char **dptr, unsigned int *datalen,
 				  unsigned int matchoff, unsigned int matchlen,
 				  const char *buffer, unsigned int buflen)
@@ -46,7 +47,7 @@ static unsigned int mangle_packet(struct sk_buff *skb, unsigned int dataoff,
 		matchoff += dataoff - baseoff;
 
 		if (!__nf_nat_mangle_tcp_packet(skb, ct, ctinfo,
-						matchoff, matchlen,
+						protoff, matchoff, matchlen,
 						buffer, buflen, false))
 			return 0;
 	} else {
@@ -54,7 +55,7 @@ static unsigned int mangle_packet(struct sk_buff *skb, unsigned int dataoff,
 		matchoff += dataoff - baseoff;
 
 		if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo,
-					      matchoff, matchlen,
+					      protoff, matchoff, matchlen,
 					      buffer, buflen))
 			return 0;
 	}
@@ -65,7 +66,8 @@ static unsigned int mangle_packet(struct sk_buff *skb, unsigned int dataoff,
 	return 1;
 }
 
-static int map_addr(struct sk_buff *skb, unsigned int dataoff,
+static int map_addr(struct sk_buff *skb, unsigned int protoff,
+		    unsigned int dataoff,
 		    const char **dptr, unsigned int *datalen,
 		    unsigned int matchoff, unsigned int matchlen,
 		    union nf_inet_addr *addr, __be16 port)
@@ -94,11 +96,12 @@ static int map_addr(struct sk_buff *skb, unsigned int dataoff,
 
 	buflen = sprintf(buffer, "%pI4:%u", &newaddr, ntohs(newport));
 
-	return mangle_packet(skb, dataoff, dptr, datalen, matchoff, matchlen,
-			     buffer, buflen);
+	return mangle_packet(skb, protoff, dataoff, dptr, datalen,
+			     matchoff, matchlen, buffer, buflen);
 }
 
-static int map_sip_addr(struct sk_buff *skb, unsigned int dataoff,
+static int map_sip_addr(struct sk_buff *skb, unsigned int protoff,
+			unsigned int dataoff,
 			const char **dptr, unsigned int *datalen,
 			enum sip_header_types type)
 {
@@ -111,11 +114,12 @@ static int map_sip_addr(struct sk_buff *skb, unsigned int dataoff,
 	if (ct_sip_parse_header_uri(ct, *dptr, NULL, *datalen, type, NULL,
 				    &matchoff, &matchlen, &addr, &port) <= 0)
 		return 1;
-	return map_addr(skb, dataoff, dptr, datalen, matchoff, matchlen,
-			&addr, port);
+	return map_addr(skb, protoff, dataoff, dptr, datalen,
+			matchoff, matchlen, &addr, port);
 }
 
-static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
+static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int protoff,
+			       unsigned int dataoff,
 			       const char **dptr, unsigned int *datalen)
 {
 	enum ip_conntrack_info ctinfo;
@@ -132,8 +136,8 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 		if (ct_sip_parse_request(ct, *dptr, *datalen,
 					 &matchoff, &matchlen,
 					 &addr, &port) > 0 &&
-		    !map_addr(skb, dataoff, dptr, datalen, matchoff, matchlen,
-			      &addr, port))
+		    !map_addr(skb, protoff, dataoff, dptr, datalen,
+			      matchoff, matchlen, &addr, port))
 			return NF_DROP;
 		request = 1;
 	} else
@@ -164,8 +168,8 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 		}
 
 		olen = *datalen;
-		if (!map_addr(skb, dataoff, dptr, datalen, matchoff, matchlen,
-			      &addr, port))
+		if (!map_addr(skb, protoff, dataoff, dptr, datalen,
+			      matchoff, matchlen, &addr, port))
 			return NF_DROP;
 
 		matchend = matchoff + matchlen + *datalen - olen;
@@ -179,7 +183,7 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 		    addr.ip != ct->tuplehash[!dir].tuple.dst.u3.ip) {
 			buflen = sprintf(buffer, "%pI4",
 					&ct->tuplehash[!dir].tuple.dst.u3.ip);
-			if (!mangle_packet(skb, dataoff, dptr, datalen,
+			if (!mangle_packet(skb, protoff, dataoff, dptr, datalen,
 					   poff, plen, buffer, buflen))
 				return NF_DROP;
 		}
@@ -193,7 +197,7 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 		    addr.ip != ct->tuplehash[!dir].tuple.src.u3.ip) {
 			buflen = sprintf(buffer, "%pI4",
 					&ct->tuplehash[!dir].tuple.src.u3.ip);
-			if (!mangle_packet(skb, dataoff, dptr, datalen,
+			if (!mangle_packet(skb, protoff, dataoff, dptr, datalen,
 					   poff, plen, buffer, buflen))
 				return NF_DROP;
 		}
@@ -207,7 +211,7 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 		    htons(n) != ct->tuplehash[!dir].tuple.src.u.udp.port) {
 			__be16 p = ct->tuplehash[!dir].tuple.src.u.udp.port;
 			buflen = sprintf(buffer, "%u", ntohs(p));
-			if (!mangle_packet(skb, dataoff, dptr, datalen,
+			if (!mangle_packet(skb, protoff, dataoff, dptr, datalen,
 					   poff, plen, buffer, buflen))
 				return NF_DROP;
 		}
@@ -221,13 +225,14 @@ next:
 				       SIP_HDR_CONTACT, &in_header,
 				       &matchoff, &matchlen,
 				       &addr, &port) > 0) {
-		if (!map_addr(skb, dataoff, dptr, datalen, matchoff, matchlen,
+		if (!map_addr(skb, protoff, dataoff, dptr, datalen,
+			      matchoff, matchlen,
 			      &addr, port))
 			return NF_DROP;
 	}
 
-	if (!map_sip_addr(skb, dataoff, dptr, datalen, SIP_HDR_FROM) ||
-	    !map_sip_addr(skb, dataoff, dptr, datalen, SIP_HDR_TO))
+	if (!map_sip_addr(skb, protoff, dataoff, dptr, datalen, SIP_HDR_FROM) ||
+	    !map_sip_addr(skb, protoff, dataoff, dptr, datalen, SIP_HDR_TO))
 		return NF_DROP;
 
 	return NF_ACCEPT;
@@ -272,7 +277,8 @@ static void ip_nat_sip_expected(struct nf_conn *ct,
 	}
 }
 
-static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int dataoff,
+static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int protoff,
+				      unsigned int dataoff,
 				      const char **dptr, unsigned int *datalen,
 				      struct nf_conntrack_expect *exp,
 				      unsigned int matchoff,
@@ -326,7 +332,7 @@ static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int dataoff,
 	if (exp->tuple.dst.u3.ip != exp->saved_ip ||
 	    exp->tuple.dst.u.udp.port != exp->saved_proto.udp.port) {
 		buflen = sprintf(buffer, "%pI4:%u", &newip, port);
-		if (!mangle_packet(skb, dataoff, dptr, datalen,
+		if (!mangle_packet(skb, protoff, dataoff, dptr, datalen,
 				   matchoff, matchlen, buffer, buflen))
 			goto err;
 	}
@@ -337,7 +343,8 @@ err:
 	return NF_DROP;
 }
 
-static int mangle_content_len(struct sk_buff *skb, unsigned int dataoff,
+static int mangle_content_len(struct sk_buff *skb, unsigned int protoff,
+			      unsigned int dataoff,
 			      const char **dptr, unsigned int *datalen)
 {
 	enum ip_conntrack_info ctinfo;
@@ -359,11 +366,12 @@ static int mangle_content_len(struct sk_buff *skb, unsigned int dataoff,
 		return 0;
 
 	buflen = sprintf(buffer, "%u", c_len);
-	return mangle_packet(skb, dataoff, dptr, datalen, matchoff, matchlen,
-			     buffer, buflen);
+	return mangle_packet(skb, protoff, dataoff, dptr, datalen,
+			     matchoff, matchlen, buffer, buflen);
 }
 
-static int mangle_sdp_packet(struct sk_buff *skb, unsigned int dataoff,
+static int mangle_sdp_packet(struct sk_buff *skb, unsigned int protoff,
+			     unsigned int dataoff,
 			     const char **dptr, unsigned int *datalen,
 			     unsigned int sdpoff,
 			     enum sdp_header_types type,
@@ -377,11 +385,12 @@ static int mangle_sdp_packet(struct sk_buff *skb, unsigned int dataoff,
 	if (ct_sip_get_sdp_header(ct, *dptr, sdpoff, *datalen, type, term,
 				  &matchoff, &matchlen) <= 0)
 		return -ENOENT;
-	return mangle_packet(skb, dataoff, dptr, datalen, matchoff, matchlen,
-			     buffer, buflen) ? 0 : -EINVAL;
+	return mangle_packet(skb, protoff, dataoff, dptr, datalen,
+			     matchoff, matchlen, buffer, buflen) ? 0 : -EINVAL;
 }
 
-static unsigned int ip_nat_sdp_addr(struct sk_buff *skb, unsigned int dataoff,
+static unsigned int ip_nat_sdp_addr(struct sk_buff *skb, unsigned int protoff,
+				    unsigned int dataoff,
 				    const char **dptr, unsigned int *datalen,
 				    unsigned int sdpoff,
 				    enum sdp_header_types type,
@@ -392,14 +401,15 @@ static unsigned int ip_nat_sdp_addr(struct sk_buff *skb, unsigned int dataoff,
 	unsigned int buflen;
 
 	buflen = sprintf(buffer, "%pI4", &addr->ip);
-	if (mangle_sdp_packet(skb, dataoff, dptr, datalen, sdpoff, type, term,
-			      buffer, buflen))
+	if (mangle_sdp_packet(skb, protoff, dataoff, dptr, datalen,
+			      sdpoff, type, term, buffer, buflen))
 		return 0;
 
-	return mangle_content_len(skb, dataoff, dptr, datalen);
+	return mangle_content_len(skb, protoff, dataoff, dptr, datalen);
 }
 
-static unsigned int ip_nat_sdp_port(struct sk_buff *skb, unsigned int dataoff,
+static unsigned int ip_nat_sdp_port(struct sk_buff *skb, unsigned int protoff,
+				    unsigned int dataoff,
 				    const char **dptr, unsigned int *datalen,
 				    unsigned int matchoff,
 				    unsigned int matchlen,
@@ -409,14 +419,15 @@ static unsigned int ip_nat_sdp_port(struct sk_buff *skb, unsigned int dataoff,
 	unsigned int buflen;
 
 	buflen = sprintf(buffer, "%u", port);
-	if (!mangle_packet(skb, dataoff, dptr, datalen, matchoff, matchlen,
-			   buffer, buflen))
+	if (!mangle_packet(skb, protoff, dataoff, dptr, datalen,
+			   matchoff, matchlen, buffer, buflen))
 		return 0;
 
-	return mangle_content_len(skb, dataoff, dptr, datalen);
+	return mangle_content_len(skb, protoff, dataoff, dptr, datalen);
 }
 
-static unsigned int ip_nat_sdp_session(struct sk_buff *skb, unsigned int dataoff,
+static unsigned int ip_nat_sdp_session(struct sk_buff *skb, unsigned int protoff,
+				       unsigned int dataoff,
 				       const char **dptr, unsigned int *datalen,
 				       unsigned int sdpoff,
 				       const union nf_inet_addr *addr)
@@ -426,12 +437,12 @@ static unsigned int ip_nat_sdp_session(struct sk_buff *skb, unsigned int dataoff
 
 	/* Mangle session description owner and contact addresses */
 	buflen = sprintf(buffer, "%pI4", &addr->ip);
-	if (mangle_sdp_packet(skb, dataoff, dptr, datalen, sdpoff,
+	if (mangle_sdp_packet(skb, protoff, dataoff, dptr, datalen, sdpoff,
 			       SDP_HDR_OWNER_IP4, SDP_HDR_MEDIA,
 			       buffer, buflen))
 		return 0;
 
-	switch (mangle_sdp_packet(skb, dataoff, dptr, datalen, sdpoff,
+	switch (mangle_sdp_packet(skb, protoff, dataoff, dptr, datalen, sdpoff,
 				  SDP_HDR_CONNECTION_IP4, SDP_HDR_MEDIA,
 				  buffer, buflen)) {
 	case 0:
@@ -448,12 +459,13 @@ static unsigned int ip_nat_sdp_session(struct sk_buff *skb, unsigned int dataoff
 		return 0;
 	}
 
-	return mangle_content_len(skb, dataoff, dptr, datalen);
+	return mangle_content_len(skb, protoff, dataoff, dptr, datalen);
 }
 
 /* So, this packet has hit the connection tracking matching code.
    Mangle it, and change the expectation to match the new version. */
-static unsigned int ip_nat_sdp_media(struct sk_buff *skb, unsigned int dataoff,
+static unsigned int ip_nat_sdp_media(struct sk_buff *skb, unsigned int protoff,
+				     unsigned int dataoff,
 				     const char **dptr, unsigned int *datalen,
 				     struct nf_conntrack_expect *rtp_exp,
 				     struct nf_conntrack_expect *rtcp_exp,
@@ -514,7 +526,7 @@ static unsigned int ip_nat_sdp_media(struct sk_buff *skb, unsigned int dataoff,
 
 	/* Update media port. */
 	if (rtp_exp->tuple.dst.u.udp.port != rtp_exp->saved_proto.udp.port &&
-	    !ip_nat_sdp_port(skb, dataoff, dptr, datalen,
+	    !ip_nat_sdp_port(skb, protoff, dataoff, dptr, datalen,
 			     mediaoff, medialen, port))
 		goto err2;
 
diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index b20b29c..390c195 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -268,6 +268,7 @@ static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
 			 * packet.
 			 */
 			ret = nf_nat_mangle_tcp_packet(skb, ct, ctinfo,
+						       iph->ihl * 4,
 						       start-data, end-start,
 						       buf, buf_len);
 			if (ret) {
diff --git a/net/netfilter/nf_conntrack_amanda.c b/net/netfilter/nf_conntrack_amanda.c
index 184c0dc..e0212b5 100644
--- a/net/netfilter/nf_conntrack_amanda.c
+++ b/net/netfilter/nf_conntrack_amanda.c
@@ -40,6 +40,7 @@ MODULE_PARM_DESC(ts_algo, "textsearch algorithm to use (default kmp)");
 
 unsigned int (*nf_nat_amanda_hook)(struct sk_buff *skb,
 				   enum ip_conntrack_info ctinfo,
+				   unsigned int protoff,
 				   unsigned int matchoff,
 				   unsigned int matchlen,
 				   struct nf_conntrack_expect *exp)
@@ -156,8 +157,8 @@ static int amanda_help(struct sk_buff *skb,
 		nf_nat_amanda = rcu_dereference(nf_nat_amanda_hook);
 		if (nf_nat_amanda && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 		    ct->status & IPS_NAT_MASK)
-			ret = nf_nat_amanda(skb, ctinfo, off - dataoff,
-					    len, exp);
+			ret = nf_nat_amanda(skb, ctinfo, protoff,
+					    off - dataoff, len, exp);
 		else if (nf_ct_expect_related(exp) != 0)
 			ret = NF_DROP;
 		nf_ct_expect_put(exp);
diff --git a/net/netfilter/nf_conntrack_ftp.c b/net/netfilter/nf_conntrack_ftp.c
index 3e1587e..c0f4a5b 100644
--- a/net/netfilter/nf_conntrack_ftp.c
+++ b/net/netfilter/nf_conntrack_ftp.c
@@ -48,6 +48,7 @@ module_param(loose, bool, 0600);
 unsigned int (*nf_nat_ftp_hook)(struct sk_buff *skb,
 				enum ip_conntrack_info ctinfo,
 				enum nf_ct_ftp_type type,
+				unsigned int protoff,
 				unsigned int matchoff,
 				unsigned int matchlen,
 				struct nf_conntrack_expect *exp);
@@ -490,7 +491,7 @@ static int help(struct sk_buff *skb,
 	if (nf_nat_ftp && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK)
 		ret = nf_nat_ftp(skb, ctinfo, search[dir][i].ftptype,
-				 matchoff, matchlen, exp);
+				 protoff, matchoff, matchlen, exp);
 	else {
 		/* Can't expect this?  Best to drop packet now. */
 		if (nf_ct_expect_related(exp) != 0)
diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c
index 517c5e3..1b30b0d 100644
--- a/net/netfilter/nf_conntrack_h323_main.c
+++ b/net/netfilter/nf_conntrack_h323_main.c
@@ -49,12 +49,12 @@ MODULE_PARM_DESC(callforward_filter, "only create call forwarding expectations "
 				     "(determined by routing information)");
 
 /* Hooks for NAT */
-int (*set_h245_addr_hook) (struct sk_buff *skb,
+int (*set_h245_addr_hook) (struct sk_buff *skb, unsigned int protoff,
 			   unsigned char **data, int dataoff,
 			   H245_TransportAddress *taddr,
 			   union nf_inet_addr *addr, __be16 port)
 			   __read_mostly;
-int (*set_h225_addr_hook) (struct sk_buff *skb,
+int (*set_h225_addr_hook) (struct sk_buff *skb, unsigned int protoff,
 			   unsigned char **data, int dataoff,
 			   TransportAddress *taddr,
 			   union nf_inet_addr *addr, __be16 port)
@@ -62,16 +62,17 @@ int (*set_h225_addr_hook) (struct sk_buff *skb,
 int (*set_sig_addr_hook) (struct sk_buff *skb,
 			  struct nf_conn *ct,
 			  enum ip_conntrack_info ctinfo,
-			  unsigned char **data,
+			  unsigned int protoff, unsigned char **data,
 			  TransportAddress *taddr, int count) __read_mostly;
 int (*set_ras_addr_hook) (struct sk_buff *skb,
 			  struct nf_conn *ct,
 			  enum ip_conntrack_info ctinfo,
-			  unsigned char **data,
+			  unsigned int protoff, unsigned char **data,
 			  TransportAddress *taddr, int count) __read_mostly;
 int (*nat_rtp_rtcp_hook) (struct sk_buff *skb,
 			  struct nf_conn *ct,
 			  enum ip_conntrack_info ctinfo,
+			  unsigned int protoff,
 			  unsigned char **data, int dataoff,
 			  H245_TransportAddress *taddr,
 			  __be16 port, __be16 rtp_port,
@@ -80,24 +81,28 @@ int (*nat_rtp_rtcp_hook) (struct sk_buff *skb,
 int (*nat_t120_hook) (struct sk_buff *skb,
 		      struct nf_conn *ct,
 		      enum ip_conntrack_info ctinfo,
+		      unsigned int protoff,
 		      unsigned char **data, int dataoff,
 		      H245_TransportAddress *taddr, __be16 port,
 		      struct nf_conntrack_expect *exp) __read_mostly;
 int (*nat_h245_hook) (struct sk_buff *skb,
 		      struct nf_conn *ct,
 		      enum ip_conntrack_info ctinfo,
+		      unsigned int protoff,
 		      unsigned char **data, int dataoff,
 		      TransportAddress *taddr, __be16 port,
 		      struct nf_conntrack_expect *exp) __read_mostly;
 int (*nat_callforwarding_hook) (struct sk_buff *skb,
 				struct nf_conn *ct,
 				enum ip_conntrack_info ctinfo,
+				unsigned int protoff,
 				unsigned char **data, int dataoff,
 				TransportAddress *taddr, __be16 port,
 				struct nf_conntrack_expect *exp) __read_mostly;
 int (*nat_q931_hook) (struct sk_buff *skb,
 		      struct nf_conn *ct,
 		      enum ip_conntrack_info ctinfo,
+		      unsigned int protoff,
 		      unsigned char **data, TransportAddress *taddr, int idx,
 		      __be16 port, struct nf_conntrack_expect *exp)
 		      __read_mostly;
@@ -251,6 +256,7 @@ static int get_h245_addr(struct nf_conn *ct, const unsigned char *data,
 /****************************************************************************/
 static int expect_rtp_rtcp(struct sk_buff *skb, struct nf_conn *ct,
 			   enum ip_conntrack_info ctinfo,
+			   unsigned int protoff,
 			   unsigned char **data, int dataoff,
 			   H245_TransportAddress *taddr)
 {
@@ -298,7 +304,7 @@ static int expect_rtp_rtcp(struct sk_buff *skb, struct nf_conn *ct,
 		   nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 		   ct->status & IPS_NAT_MASK) {
 		/* NAT needed */
-		ret = nat_rtp_rtcp(skb, ct, ctinfo, data, dataoff,
+		ret = nat_rtp_rtcp(skb, ct, ctinfo, protoff, data, dataoff,
 				   taddr, port, rtp_port, rtp_exp, rtcp_exp);
 	} else {		/* Conntrack only */
 		if (nf_ct_expect_related(rtp_exp) == 0) {
@@ -325,6 +331,7 @@ static int expect_rtp_rtcp(struct sk_buff *skb, struct nf_conn *ct,
 static int expect_t120(struct sk_buff *skb,
 		       struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, int dataoff,
 		       H245_TransportAddress *taddr)
 {
@@ -357,7 +364,7 @@ static int expect_t120(struct sk_buff *skb,
 	    nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
 		/* NAT needed */
-		ret = nat_t120(skb, ct, ctinfo, data, dataoff, taddr,
+		ret = nat_t120(skb, ct, ctinfo, protoff, data, dataoff, taddr,
 			       port, exp);
 	} else {		/* Conntrack only */
 		if (nf_ct_expect_related(exp) == 0) {
@@ -376,6 +383,7 @@ static int expect_t120(struct sk_buff *skb,
 static int process_h245_channel(struct sk_buff *skb,
 				struct nf_conn *ct,
 				enum ip_conntrack_info ctinfo,
+				unsigned int protoff,
 				unsigned char **data, int dataoff,
 				H2250LogicalChannelParameters *channel)
 {
@@ -383,7 +391,7 @@ static int process_h245_channel(struct sk_buff *skb,
 
 	if (channel->options & eH2250LogicalChannelParameters_mediaChannel) {
 		/* RTP */
-		ret = expect_rtp_rtcp(skb, ct, ctinfo, data, dataoff,
+		ret = expect_rtp_rtcp(skb, ct, ctinfo, protoff, data, dataoff,
 				      &channel->mediaChannel);
 		if (ret < 0)
 			return -1;
@@ -392,7 +400,7 @@ static int process_h245_channel(struct sk_buff *skb,
 	if (channel->
 	    options & eH2250LogicalChannelParameters_mediaControlChannel) {
 		/* RTCP */
-		ret = expect_rtp_rtcp(skb, ct, ctinfo, data, dataoff,
+		ret = expect_rtp_rtcp(skb, ct, ctinfo, protoff, data, dataoff,
 				      &channel->mediaControlChannel);
 		if (ret < 0)
 			return -1;
@@ -404,6 +412,7 @@ static int process_h245_channel(struct sk_buff *skb,
 /****************************************************************************/
 static int process_olc(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, int dataoff,
 		       OpenLogicalChannel *olc)
 {
@@ -414,7 +423,8 @@ static int process_olc(struct sk_buff *skb, struct nf_conn *ct,
 	if (olc->forwardLogicalChannelParameters.multiplexParameters.choice ==
 	    eOpenLogicalChannel_forwardLogicalChannelParameters_multiplexParameters_h2250LogicalChannelParameters)
 	{
-		ret = process_h245_channel(skb, ct, ctinfo, data, dataoff,
+		ret = process_h245_channel(skb, ct, ctinfo,
+					   protoff, data, dataoff,
 					   &olc->
 					   forwardLogicalChannelParameters.
 					   multiplexParameters.
@@ -432,7 +442,8 @@ static int process_olc(struct sk_buff *skb, struct nf_conn *ct,
 		eOpenLogicalChannel_reverseLogicalChannelParameters_multiplexParameters_h2250LogicalChannelParameters))
 	{
 		ret =
-		    process_h245_channel(skb, ct, ctinfo, data, dataoff,
+		    process_h245_channel(skb, ct, ctinfo,
+					 protoff, data, dataoff,
 					 &olc->
 					 reverseLogicalChannelParameters.
 					 multiplexParameters.
@@ -450,7 +461,7 @@ static int process_olc(struct sk_buff *skb, struct nf_conn *ct,
 	    t120.choice == eDataProtocolCapability_separateLANStack &&
 	    olc->separateStack.networkAddress.choice ==
 	    eNetworkAccessParameters_networkAddress_localAreaAddress) {
-		ret = expect_t120(skb, ct, ctinfo, data, dataoff,
+		ret = expect_t120(skb, ct, ctinfo, protoff, data, dataoff,
 				  &olc->separateStack.networkAddress.
 				  localAreaAddress);
 		if (ret < 0)
@@ -463,7 +474,7 @@ static int process_olc(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_olca(struct sk_buff *skb, struct nf_conn *ct,
 			enum ip_conntrack_info ctinfo,
-			unsigned char **data, int dataoff,
+			unsigned int protoff, unsigned char **data, int dataoff,
 			OpenLogicalChannelAck *olca)
 {
 	H2250LogicalChannelAckParameters *ack;
@@ -479,7 +490,8 @@ static int process_olca(struct sk_buff *skb, struct nf_conn *ct,
 		choice ==
 		eOpenLogicalChannelAck_reverseLogicalChannelParameters_multiplexParameters_h2250LogicalChannelParameters))
 	{
-		ret = process_h245_channel(skb, ct, ctinfo, data, dataoff,
+		ret = process_h245_channel(skb, ct, ctinfo,
+					   protoff, data, dataoff,
 					   &olca->
 					   reverseLogicalChannelParameters.
 					   multiplexParameters.
@@ -498,7 +510,8 @@ static int process_olca(struct sk_buff *skb, struct nf_conn *ct,
 		if (ack->options &
 		    eH2250LogicalChannelAckParameters_mediaChannel) {
 			/* RTP */
-			ret = expect_rtp_rtcp(skb, ct, ctinfo, data, dataoff,
+			ret = expect_rtp_rtcp(skb, ct, ctinfo,
+					      protoff, data, dataoff,
 					      &ack->mediaChannel);
 			if (ret < 0)
 				return -1;
@@ -507,7 +520,8 @@ static int process_olca(struct sk_buff *skb, struct nf_conn *ct,
 		if (ack->options &
 		    eH2250LogicalChannelAckParameters_mediaControlChannel) {
 			/* RTCP */
-			ret = expect_rtp_rtcp(skb, ct, ctinfo, data, dataoff,
+			ret = expect_rtp_rtcp(skb, ct, ctinfo,
+					      protoff, data, dataoff,
 					      &ack->mediaControlChannel);
 			if (ret < 0)
 				return -1;
@@ -517,7 +531,7 @@ static int process_olca(struct sk_buff *skb, struct nf_conn *ct,
 	if ((olca->options & eOpenLogicalChannelAck_separateStack) &&
 		olca->separateStack.networkAddress.choice ==
 		eNetworkAccessParameters_networkAddress_localAreaAddress) {
-		ret = expect_t120(skb, ct, ctinfo, data, dataoff,
+		ret = expect_t120(skb, ct, ctinfo, protoff, data, dataoff,
 				  &olca->separateStack.networkAddress.
 				  localAreaAddress);
 		if (ret < 0)
@@ -530,14 +544,15 @@ static int process_olca(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_h245(struct sk_buff *skb, struct nf_conn *ct,
 			enum ip_conntrack_info ctinfo,
-			unsigned char **data, int dataoff,
+			unsigned int protoff, unsigned char **data, int dataoff,
 			MultimediaSystemControlMessage *mscm)
 {
 	switch (mscm->choice) {
 	case eMultimediaSystemControlMessage_request:
 		if (mscm->request.choice ==
 		    eRequestMessage_openLogicalChannel) {
-			return process_olc(skb, ct, ctinfo, data, dataoff,
+			return process_olc(skb, ct, ctinfo,
+					   protoff, data, dataoff,
 					   &mscm->request.openLogicalChannel);
 		}
 		pr_debug("nf_ct_h323: H.245 Request %d\n",
@@ -546,7 +561,8 @@ static int process_h245(struct sk_buff *skb, struct nf_conn *ct,
 	case eMultimediaSystemControlMessage_response:
 		if (mscm->response.choice ==
 		    eResponseMessage_openLogicalChannelAck) {
-			return process_olca(skb, ct, ctinfo, data, dataoff,
+			return process_olca(skb, ct, ctinfo,
+					    protoff, data, dataoff,
 					    &mscm->response.
 					    openLogicalChannelAck);
 		}
@@ -597,7 +613,8 @@ static int h245_help(struct sk_buff *skb, unsigned int protoff,
 		}
 
 		/* Process H.245 signal */
-		if (process_h245(skb, ct, ctinfo, &data, dataoff, &mscm) < 0)
+		if (process_h245(skb, ct, ctinfo, protoff,
+				 &data, dataoff, &mscm) < 0)
 			goto drop;
 	}
 
@@ -661,7 +678,7 @@ int get_h225_addr(struct nf_conn *ct, unsigned char *data,
 /****************************************************************************/
 static int expect_h245(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
-		       unsigned char **data, int dataoff,
+		       unsigned int protoff, unsigned char **data, int dataoff,
 		       TransportAddress *taddr)
 {
 	int dir = CTINFO2DIR(ctinfo);
@@ -693,7 +710,7 @@ static int expect_h245(struct sk_buff *skb, struct nf_conn *ct,
 	    nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
 		/* NAT needed */
-		ret = nat_h245(skb, ct, ctinfo, data, dataoff, taddr,
+		ret = nat_h245(skb, ct, ctinfo, protoff, data, dataoff, taddr,
 			       port, exp);
 	} else {		/* Conntrack only */
 		if (nf_ct_expect_related(exp) == 0) {
@@ -779,6 +796,7 @@ static int callforward_do_filter(const union nf_inet_addr *src,
 static int expect_callforwarding(struct sk_buff *skb,
 				 struct nf_conn *ct,
 				 enum ip_conntrack_info ctinfo,
+				 unsigned int protoff,
 				 unsigned char **data, int dataoff,
 				 TransportAddress *taddr)
 {
@@ -817,7 +835,8 @@ static int expect_callforwarding(struct sk_buff *skb,
 	    nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
 		/* Need NAT */
-		ret = nat_callforwarding(skb, ct, ctinfo, data, dataoff,
+		ret = nat_callforwarding(skb, ct, ctinfo,
+					 protoff, data, dataoff,
 					 taddr, port, exp);
 	} else {		/* Conntrack only */
 		if (nf_ct_expect_related(exp) == 0) {
@@ -835,6 +854,7 @@ static int expect_callforwarding(struct sk_buff *skb,
 /****************************************************************************/
 static int process_setup(struct sk_buff *skb, struct nf_conn *ct,
 			 enum ip_conntrack_info ctinfo,
+			 unsigned int protoff,
 			 unsigned char **data, int dataoff,
 			 Setup_UUIE *setup)
 {
@@ -848,7 +868,7 @@ static int process_setup(struct sk_buff *skb, struct nf_conn *ct,
 	pr_debug("nf_ct_q931: Setup\n");
 
 	if (setup->options & eSetup_UUIE_h245Address) {
-		ret = expect_h245(skb, ct, ctinfo, data, dataoff,
+		ret = expect_h245(skb, ct, ctinfo, protoff, data, dataoff,
 				  &setup->h245Address);
 		if (ret < 0)
 			return -1;
@@ -864,7 +884,7 @@ static int process_setup(struct sk_buff *skb, struct nf_conn *ct,
 		pr_debug("nf_ct_q931: set destCallSignalAddress %pI6:%hu->%pI6:%hu\n",
 			 &addr, ntohs(port), &ct->tuplehash[!dir].tuple.src.u3,
 			 ntohs(ct->tuplehash[!dir].tuple.src.u.tcp.port));
-		ret = set_h225_addr(skb, data, dataoff,
+		ret = set_h225_addr(skb, protoff, data, dataoff,
 				    &setup->destCallSignalAddress,
 				    &ct->tuplehash[!dir].tuple.src.u3,
 				    ct->tuplehash[!dir].tuple.src.u.tcp.port);
@@ -881,7 +901,7 @@ static int process_setup(struct sk_buff *skb, struct nf_conn *ct,
 		pr_debug("nf_ct_q931: set sourceCallSignalAddress %pI6:%hu->%pI6:%hu\n",
 			 &addr, ntohs(port), &ct->tuplehash[!dir].tuple.dst.u3,
 			 ntohs(ct->tuplehash[!dir].tuple.dst.u.tcp.port));
-		ret = set_h225_addr(skb, data, dataoff,
+		ret = set_h225_addr(skb, protoff, data, dataoff,
 				    &setup->sourceCallSignalAddress,
 				    &ct->tuplehash[!dir].tuple.dst.u3,
 				    ct->tuplehash[!dir].tuple.dst.u.tcp.port);
@@ -891,7 +911,8 @@ static int process_setup(struct sk_buff *skb, struct nf_conn *ct,
 
 	if (setup->options & eSetup_UUIE_fastStart) {
 		for (i = 0; i < setup->fastStart.count; i++) {
-			ret = process_olc(skb, ct, ctinfo, data, dataoff,
+			ret = process_olc(skb, ct, ctinfo,
+					  protoff, data, dataoff,
 					  &setup->fastStart.item[i]);
 			if (ret < 0)
 				return -1;
@@ -905,6 +926,7 @@ static int process_setup(struct sk_buff *skb, struct nf_conn *ct,
 static int process_callproceeding(struct sk_buff *skb,
 				  struct nf_conn *ct,
 				  enum ip_conntrack_info ctinfo,
+				  unsigned int protoff,
 				  unsigned char **data, int dataoff,
 				  CallProceeding_UUIE *callproc)
 {
@@ -914,7 +936,7 @@ static int process_callproceeding(struct sk_buff *skb,
 	pr_debug("nf_ct_q931: CallProceeding\n");
 
 	if (callproc->options & eCallProceeding_UUIE_h245Address) {
-		ret = expect_h245(skb, ct, ctinfo, data, dataoff,
+		ret = expect_h245(skb, ct, ctinfo, protoff, data, dataoff,
 				  &callproc->h245Address);
 		if (ret < 0)
 			return -1;
@@ -922,7 +944,8 @@ static int process_callproceeding(struct sk_buff *skb,
 
 	if (callproc->options & eCallProceeding_UUIE_fastStart) {
 		for (i = 0; i < callproc->fastStart.count; i++) {
-			ret = process_olc(skb, ct, ctinfo, data, dataoff,
+			ret = process_olc(skb, ct, ctinfo,
+					  protoff, data, dataoff,
 					  &callproc->fastStart.item[i]);
 			if (ret < 0)
 				return -1;
@@ -935,6 +958,7 @@ static int process_callproceeding(struct sk_buff *skb,
 /****************************************************************************/
 static int process_connect(struct sk_buff *skb, struct nf_conn *ct,
 			   enum ip_conntrack_info ctinfo,
+			   unsigned int protoff,
 			   unsigned char **data, int dataoff,
 			   Connect_UUIE *connect)
 {
@@ -944,7 +968,7 @@ static int process_connect(struct sk_buff *skb, struct nf_conn *ct,
 	pr_debug("nf_ct_q931: Connect\n");
 
 	if (connect->options & eConnect_UUIE_h245Address) {
-		ret = expect_h245(skb, ct, ctinfo, data, dataoff,
+		ret = expect_h245(skb, ct, ctinfo, protoff, data, dataoff,
 				  &connect->h245Address);
 		if (ret < 0)
 			return -1;
@@ -952,7 +976,8 @@ static int process_connect(struct sk_buff *skb, struct nf_conn *ct,
 
 	if (connect->options & eConnect_UUIE_fastStart) {
 		for (i = 0; i < connect->fastStart.count; i++) {
-			ret = process_olc(skb, ct, ctinfo, data, dataoff,
+			ret = process_olc(skb, ct, ctinfo,
+					  protoff, data, dataoff,
 					  &connect->fastStart.item[i]);
 			if (ret < 0)
 				return -1;
@@ -965,6 +990,7 @@ static int process_connect(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_alerting(struct sk_buff *skb, struct nf_conn *ct,
 			    enum ip_conntrack_info ctinfo,
+			    unsigned int protoff,
 			    unsigned char **data, int dataoff,
 			    Alerting_UUIE *alert)
 {
@@ -974,7 +1000,7 @@ static int process_alerting(struct sk_buff *skb, struct nf_conn *ct,
 	pr_debug("nf_ct_q931: Alerting\n");
 
 	if (alert->options & eAlerting_UUIE_h245Address) {
-		ret = expect_h245(skb, ct, ctinfo, data, dataoff,
+		ret = expect_h245(skb, ct, ctinfo, protoff, data, dataoff,
 				  &alert->h245Address);
 		if (ret < 0)
 			return -1;
@@ -982,7 +1008,8 @@ static int process_alerting(struct sk_buff *skb, struct nf_conn *ct,
 
 	if (alert->options & eAlerting_UUIE_fastStart) {
 		for (i = 0; i < alert->fastStart.count; i++) {
-			ret = process_olc(skb, ct, ctinfo, data, dataoff,
+			ret = process_olc(skb, ct, ctinfo,
+					  protoff, data, dataoff,
 					  &alert->fastStart.item[i]);
 			if (ret < 0)
 				return -1;
@@ -995,6 +1022,7 @@ static int process_alerting(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_facility(struct sk_buff *skb, struct nf_conn *ct,
 			    enum ip_conntrack_info ctinfo,
+			    unsigned int protoff,
 			    unsigned char **data, int dataoff,
 			    Facility_UUIE *facility)
 {
@@ -1005,15 +1033,15 @@ static int process_facility(struct sk_buff *skb, struct nf_conn *ct,
 
 	if (facility->reason.choice == eFacilityReason_callForwarded) {
 		if (facility->options & eFacility_UUIE_alternativeAddress)
-			return expect_callforwarding(skb, ct, ctinfo, data,
-						     dataoff,
+			return expect_callforwarding(skb, ct, ctinfo,
+						     protoff, data, dataoff,
 						     &facility->
 						     alternativeAddress);
 		return 0;
 	}
 
 	if (facility->options & eFacility_UUIE_h245Address) {
-		ret = expect_h245(skb, ct, ctinfo, data, dataoff,
+		ret = expect_h245(skb, ct, ctinfo, protoff, data, dataoff,
 				  &facility->h245Address);
 		if (ret < 0)
 			return -1;
@@ -1021,7 +1049,8 @@ static int process_facility(struct sk_buff *skb, struct nf_conn *ct,
 
 	if (facility->options & eFacility_UUIE_fastStart) {
 		for (i = 0; i < facility->fastStart.count; i++) {
-			ret = process_olc(skb, ct, ctinfo, data, dataoff,
+			ret = process_olc(skb, ct, ctinfo,
+					  protoff, data, dataoff,
 					  &facility->fastStart.item[i]);
 			if (ret < 0)
 				return -1;
@@ -1034,6 +1063,7 @@ static int process_facility(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_progress(struct sk_buff *skb, struct nf_conn *ct,
 			    enum ip_conntrack_info ctinfo,
+			    unsigned int protoff,
 			    unsigned char **data, int dataoff,
 			    Progress_UUIE *progress)
 {
@@ -1043,7 +1073,7 @@ static int process_progress(struct sk_buff *skb, struct nf_conn *ct,
 	pr_debug("nf_ct_q931: Progress\n");
 
 	if (progress->options & eProgress_UUIE_h245Address) {
-		ret = expect_h245(skb, ct, ctinfo, data, dataoff,
+		ret = expect_h245(skb, ct, ctinfo, protoff, data, dataoff,
 				  &progress->h245Address);
 		if (ret < 0)
 			return -1;
@@ -1051,7 +1081,8 @@ static int process_progress(struct sk_buff *skb, struct nf_conn *ct,
 
 	if (progress->options & eProgress_UUIE_fastStart) {
 		for (i = 0; i < progress->fastStart.count; i++) {
-			ret = process_olc(skb, ct, ctinfo, data, dataoff,
+			ret = process_olc(skb, ct, ctinfo,
+					  protoff, data, dataoff,
 					  &progress->fastStart.item[i]);
 			if (ret < 0)
 				return -1;
@@ -1064,7 +1095,8 @@ static int process_progress(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_q931(struct sk_buff *skb, struct nf_conn *ct,
 			enum ip_conntrack_info ctinfo,
-			unsigned char **data, int dataoff, Q931 *q931)
+			unsigned int protoff, unsigned char **data, int dataoff,
+			Q931 *q931)
 {
 	H323_UU_PDU *pdu = &q931->UUIE.h323_uu_pdu;
 	int i;
@@ -1072,28 +1104,29 @@ static int process_q931(struct sk_buff *skb, struct nf_conn *ct,
 
 	switch (pdu->h323_message_body.choice) {
 	case eH323_UU_PDU_h323_message_body_setup:
-		ret = process_setup(skb, ct, ctinfo, data, dataoff,
+		ret = process_setup(skb, ct, ctinfo, protoff, data, dataoff,
 				    &pdu->h323_message_body.setup);
 		break;
 	case eH323_UU_PDU_h323_message_body_callProceeding:
-		ret = process_callproceeding(skb, ct, ctinfo, data, dataoff,
+		ret = process_callproceeding(skb, ct, ctinfo,
+					     protoff, data, dataoff,
 					     &pdu->h323_message_body.
 					     callProceeding);
 		break;
 	case eH323_UU_PDU_h323_message_body_connect:
-		ret = process_connect(skb, ct, ctinfo, data, dataoff,
+		ret = process_connect(skb, ct, ctinfo, protoff, data, dataoff,
 				      &pdu->h323_message_body.connect);
 		break;
 	case eH323_UU_PDU_h323_message_body_alerting:
-		ret = process_alerting(skb, ct, ctinfo, data, dataoff,
+		ret = process_alerting(skb, ct, ctinfo, protoff, data, dataoff,
 				       &pdu->h323_message_body.alerting);
 		break;
 	case eH323_UU_PDU_h323_message_body_facility:
-		ret = process_facility(skb, ct, ctinfo, data, dataoff,
+		ret = process_facility(skb, ct, ctinfo, protoff, data, dataoff,
 				       &pdu->h323_message_body.facility);
 		break;
 	case eH323_UU_PDU_h323_message_body_progress:
-		ret = process_progress(skb, ct, ctinfo, data, dataoff,
+		ret = process_progress(skb, ct, ctinfo, protoff, data, dataoff,
 				       &pdu->h323_message_body.progress);
 		break;
 	default:
@@ -1107,7 +1140,8 @@ static int process_q931(struct sk_buff *skb, struct nf_conn *ct,
 
 	if (pdu->options & eH323_UU_PDU_h245Control) {
 		for (i = 0; i < pdu->h245Control.count; i++) {
-			ret = process_h245(skb, ct, ctinfo, data, dataoff,
+			ret = process_h245(skb, ct, ctinfo,
+					   protoff, data, dataoff,
 					   &pdu->h245Control.item[i]);
 			if (ret < 0)
 				return -1;
@@ -1152,7 +1186,8 @@ static int q931_help(struct sk_buff *skb, unsigned int protoff,
 		}
 
 		/* Process Q.931 signal */
-		if (process_q931(skb, ct, ctinfo, &data, dataoff, &q931) < 0)
+		if (process_q931(skb, ct, ctinfo, protoff,
+				 &data, dataoff, &q931) < 0)
 			goto drop;
 	}
 
@@ -1249,7 +1284,7 @@ static int set_expect_timeout(struct nf_conntrack_expect *exp,
 /****************************************************************************/
 static int expect_q931(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
-		       unsigned char **data,
+		       unsigned int protoff, unsigned char **data,
 		       TransportAddress *taddr, int count)
 {
 	struct nf_ct_h323_master *info = nfct_help_data(ct);
@@ -1286,7 +1321,8 @@ static int expect_q931(struct sk_buff *skb, struct nf_conn *ct,
 	nat_q931 = rcu_dereference(nat_q931_hook);
 	if (nat_q931 && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {	/* Need NAT */
-		ret = nat_q931(skb, ct, ctinfo, data, taddr, i, port, exp);
+		ret = nat_q931(skb, ct, ctinfo, protoff, data,
+			       taddr, i, port, exp);
 	} else {		/* Conntrack only */
 		if (nf_ct_expect_related(exp) == 0) {
 			pr_debug("nf_ct_ras: expect Q.931 ");
@@ -1306,6 +1342,7 @@ static int expect_q931(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_grq(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, GatekeeperRequest *grq)
 {
 	typeof(set_ras_addr_hook) set_ras_addr;
@@ -1315,7 +1352,7 @@ static int process_grq(struct sk_buff *skb, struct nf_conn *ct,
 	set_ras_addr = rcu_dereference(set_ras_addr_hook);
 	if (set_ras_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK)	/* NATed */
-		return set_ras_addr(skb, ct, ctinfo, data,
+		return set_ras_addr(skb, ct, ctinfo, protoff, data,
 				    &grq->rasAddress, 1);
 	return 0;
 }
@@ -1323,6 +1360,7 @@ static int process_grq(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_gcf(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, GatekeeperConfirm *gcf)
 {
 	int dir = CTINFO2DIR(ctinfo);
@@ -1367,6 +1405,7 @@ static int process_gcf(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_rrq(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, RegistrationRequest *rrq)
 {
 	struct nf_ct_h323_master *info = nfct_help_data(ct);
@@ -1375,7 +1414,7 @@ static int process_rrq(struct sk_buff *skb, struct nf_conn *ct,
 
 	pr_debug("nf_ct_ras: RRQ\n");
 
-	ret = expect_q931(skb, ct, ctinfo, data,
+	ret = expect_q931(skb, ct, ctinfo, protoff, data,
 			  rrq->callSignalAddress.item,
 			  rrq->callSignalAddress.count);
 	if (ret < 0)
@@ -1384,7 +1423,7 @@ static int process_rrq(struct sk_buff *skb, struct nf_conn *ct,
 	set_ras_addr = rcu_dereference(set_ras_addr_hook);
 	if (set_ras_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
-		ret = set_ras_addr(skb, ct, ctinfo, data,
+		ret = set_ras_addr(skb, ct, ctinfo, protoff, data,
 				   rrq->rasAddress.item,
 				   rrq->rasAddress.count);
 		if (ret < 0)
@@ -1403,6 +1442,7 @@ static int process_rrq(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_rcf(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, RegistrationConfirm *rcf)
 {
 	struct nf_ct_h323_master *info = nfct_help_data(ct);
@@ -1416,7 +1456,7 @@ static int process_rcf(struct sk_buff *skb, struct nf_conn *ct,
 	set_sig_addr = rcu_dereference(set_sig_addr_hook);
 	if (set_sig_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
-		ret = set_sig_addr(skb, ct, ctinfo, data,
+		ret = set_sig_addr(skb, ct, ctinfo, protoff, data,
 					rcf->callSignalAddress.item,
 					rcf->callSignalAddress.count);
 		if (ret < 0)
@@ -1453,6 +1493,7 @@ static int process_rcf(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_urq(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, UnregistrationRequest *urq)
 {
 	struct nf_ct_h323_master *info = nfct_help_data(ct);
@@ -1465,7 +1506,7 @@ static int process_urq(struct sk_buff *skb, struct nf_conn *ct,
 	set_sig_addr = rcu_dereference(set_sig_addr_hook);
 	if (set_sig_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
-		ret = set_sig_addr(skb, ct, ctinfo, data,
+		ret = set_sig_addr(skb, ct, ctinfo, protoff, data,
 				   urq->callSignalAddress.item,
 				   urq->callSignalAddress.count);
 		if (ret < 0)
@@ -1486,6 +1527,7 @@ static int process_urq(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_arq(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, AdmissionRequest *arq)
 {
 	const struct nf_ct_h323_master *info = nfct_help_data(ct);
@@ -1505,7 +1547,7 @@ static int process_arq(struct sk_buff *skb, struct nf_conn *ct,
 	    nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    set_h225_addr && ct->status & IPS_NAT_MASK) {
 		/* Answering ARQ */
-		return set_h225_addr(skb, data, 0,
+		return set_h225_addr(skb, protoff, data, 0,
 				     &arq->destCallSignalAddress,
 				     &ct->tuplehash[!dir].tuple.dst.u3,
 				     info->sig_port[!dir]);
@@ -1518,7 +1560,7 @@ static int process_arq(struct sk_buff *skb, struct nf_conn *ct,
 	    set_h225_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
 		/* Calling ARQ */
-		return set_h225_addr(skb, data, 0,
+		return set_h225_addr(skb, protoff, data, 0,
 				     &arq->srcCallSignalAddress,
 				     &ct->tuplehash[!dir].tuple.dst.u3,
 				     port);
@@ -1530,6 +1572,7 @@ static int process_arq(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_acf(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, AdmissionConfirm *acf)
 {
 	int dir = CTINFO2DIR(ctinfo);
@@ -1550,7 +1593,7 @@ static int process_acf(struct sk_buff *skb, struct nf_conn *ct,
 		set_sig_addr = rcu_dereference(set_sig_addr_hook);
 		if (set_sig_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 		    ct->status & IPS_NAT_MASK)
-			return set_sig_addr(skb, ct, ctinfo, data,
+			return set_sig_addr(skb, ct, ctinfo, protoff, data,
 					    &acf->destCallSignalAddress, 1);
 		return 0;
 	}
@@ -1578,6 +1621,7 @@ static int process_acf(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_lrq(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, LocationRequest *lrq)
 {
 	typeof(set_ras_addr_hook) set_ras_addr;
@@ -1587,7 +1631,7 @@ static int process_lrq(struct sk_buff *skb, struct nf_conn *ct,
 	set_ras_addr = rcu_dereference(set_ras_addr_hook);
 	if (set_ras_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK)
-		return set_ras_addr(skb, ct, ctinfo, data,
+		return set_ras_addr(skb, ct, ctinfo, protoff, data,
 				    &lrq->replyAddress, 1);
 	return 0;
 }
@@ -1595,6 +1639,7 @@ static int process_lrq(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_lcf(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, LocationConfirm *lcf)
 {
 	int dir = CTINFO2DIR(ctinfo);
@@ -1634,6 +1679,7 @@ static int process_lcf(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_irr(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, InfoRequestResponse *irr)
 {
 	int ret;
@@ -1645,7 +1691,7 @@ static int process_irr(struct sk_buff *skb, struct nf_conn *ct,
 	set_ras_addr = rcu_dereference(set_ras_addr_hook);
 	if (set_ras_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
-		ret = set_ras_addr(skb, ct, ctinfo, data,
+		ret = set_ras_addr(skb, ct, ctinfo, protoff, data,
 				   &irr->rasAddress, 1);
 		if (ret < 0)
 			return -1;
@@ -1654,7 +1700,7 @@ static int process_irr(struct sk_buff *skb, struct nf_conn *ct,
 	set_sig_addr = rcu_dereference(set_sig_addr_hook);
 	if (set_sig_addr && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
-		ret = set_sig_addr(skb, ct, ctinfo, data,
+		ret = set_sig_addr(skb, ct, ctinfo, protoff, data,
 					irr->callSignalAddress.item,
 					irr->callSignalAddress.count);
 		if (ret < 0)
@@ -1667,38 +1713,39 @@ static int process_irr(struct sk_buff *skb, struct nf_conn *ct,
 /****************************************************************************/
 static int process_ras(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
+		       unsigned int protoff,
 		       unsigned char **data, RasMessage *ras)
 {
 	switch (ras->choice) {
 	case eRasMessage_gatekeeperRequest:
-		return process_grq(skb, ct, ctinfo, data,
+		return process_grq(skb, ct, ctinfo, protoff, data,
 				   &ras->gatekeeperRequest);
 	case eRasMessage_gatekeeperConfirm:
-		return process_gcf(skb, ct, ctinfo, data,
+		return process_gcf(skb, ct, ctinfo, protoff, data,
 				   &ras->gatekeeperConfirm);
 	case eRasMessage_registrationRequest:
-		return process_rrq(skb, ct, ctinfo, data,
+		return process_rrq(skb, ct, ctinfo, protoff, data,
 				   &ras->registrationRequest);
 	case eRasMessage_registrationConfirm:
-		return process_rcf(skb, ct, ctinfo, data,
+		return process_rcf(skb, ct, ctinfo, protoff, data,
 				   &ras->registrationConfirm);
 	case eRasMessage_unregistrationRequest:
-		return process_urq(skb, ct, ctinfo, data,
+		return process_urq(skb, ct, ctinfo, protoff, data,
 				   &ras->unregistrationRequest);
 	case eRasMessage_admissionRequest:
-		return process_arq(skb, ct, ctinfo, data,
+		return process_arq(skb, ct, ctinfo, protoff, data,
 				   &ras->admissionRequest);
 	case eRasMessage_admissionConfirm:
-		return process_acf(skb, ct, ctinfo, data,
+		return process_acf(skb, ct, ctinfo, protoff, data,
 				   &ras->admissionConfirm);
 	case eRasMessage_locationRequest:
-		return process_lrq(skb, ct, ctinfo, data,
+		return process_lrq(skb, ct, ctinfo, protoff, data,
 				   &ras->locationRequest);
 	case eRasMessage_locationConfirm:
-		return process_lcf(skb, ct, ctinfo, data,
+		return process_lcf(skb, ct, ctinfo, protoff, data,
 				   &ras->locationConfirm);
 	case eRasMessage_infoRequestResponse:
-		return process_irr(skb, ct, ctinfo, data,
+		return process_irr(skb, ct, ctinfo, protoff, data,
 				   &ras->infoRequestResponse);
 	default:
 		pr_debug("nf_ct_ras: RAS message %d\n", ras->choice);
@@ -1738,7 +1785,7 @@ static int ras_help(struct sk_buff *skb, unsigned int protoff,
 	}
 
 	/* Process RAS message */
-	if (process_ras(skb, ct, ctinfo, &data, &ras) < 0)
+	if (process_ras(skb, ct, ctinfo, protoff, &data, &ras) < 0)
 		goto drop;
 
       accept:
diff --git a/net/netfilter/nf_conntrack_irc.c b/net/netfilter/nf_conntrack_irc.c
index e06dc2f..95d097c 100644
--- a/net/netfilter/nf_conntrack_irc.c
+++ b/net/netfilter/nf_conntrack_irc.c
@@ -33,6 +33,7 @@ static DEFINE_SPINLOCK(irc_buffer_lock);
 
 unsigned int (*nf_nat_irc_hook)(struct sk_buff *skb,
 				enum ip_conntrack_info ctinfo,
+				unsigned int protoff,
 				unsigned int matchoff,
 				unsigned int matchlen,
 				struct nf_conntrack_expect *exp) __read_mostly;
@@ -206,7 +207,7 @@ static int help(struct sk_buff *skb, unsigned int protoff,
 			nf_nat_irc = rcu_dereference(nf_nat_irc_hook);
 			if (nf_nat_irc && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 			    ct->status & IPS_NAT_MASK)
-				ret = nf_nat_irc(skb, ctinfo,
+				ret = nf_nat_irc(skb, ctinfo, protoff,
 						 addr_beg_p - ib_ptr,
 						 addr_end_p - addr_beg_p,
 						 exp);
diff --git a/net/netfilter/nf_conntrack_pptp.c b/net/netfilter/nf_conntrack_pptp.c
index 6fed9ec..cc7669e 100644
--- a/net/netfilter/nf_conntrack_pptp.c
+++ b/net/netfilter/nf_conntrack_pptp.c
@@ -45,14 +45,14 @@ static DEFINE_SPINLOCK(nf_pptp_lock);
 int
 (*nf_nat_pptp_hook_outbound)(struct sk_buff *skb,
 			     struct nf_conn *ct, enum ip_conntrack_info ctinfo,
-			     struct PptpControlHeader *ctlh,
+			     unsigned int protoff, struct PptpControlHeader *ctlh,
 			     union pptp_ctrl_union *pptpReq) __read_mostly;
 EXPORT_SYMBOL_GPL(nf_nat_pptp_hook_outbound);
 
 int
 (*nf_nat_pptp_hook_inbound)(struct sk_buff *skb,
 			    struct nf_conn *ct, enum ip_conntrack_info ctinfo,
-			    struct PptpControlHeader *ctlh,
+			    unsigned int protoff, struct PptpControlHeader *ctlh,
 			    union pptp_ctrl_union *pptpReq) __read_mostly;
 EXPORT_SYMBOL_GPL(nf_nat_pptp_hook_inbound);
 
@@ -262,7 +262,7 @@ out_unexpect_orig:
 }
 
 static inline int
-pptp_inbound_pkt(struct sk_buff *skb,
+pptp_inbound_pkt(struct sk_buff *skb, unsigned int protoff,
 		 struct PptpControlHeader *ctlh,
 		 union pptp_ctrl_union *pptpReq,
 		 unsigned int reqlen,
@@ -376,7 +376,8 @@ pptp_inbound_pkt(struct sk_buff *skb,
 
 	nf_nat_pptp_inbound = rcu_dereference(nf_nat_pptp_hook_inbound);
 	if (nf_nat_pptp_inbound && ct->status & IPS_NAT_MASK)
-		return nf_nat_pptp_inbound(skb, ct, ctinfo, ctlh, pptpReq);
+		return nf_nat_pptp_inbound(skb, ct, ctinfo,
+					   protoff, ctlh, pptpReq);
 	return NF_ACCEPT;
 
 invalid:
@@ -389,7 +390,7 @@ invalid:
 }
 
 static inline int
-pptp_outbound_pkt(struct sk_buff *skb,
+pptp_outbound_pkt(struct sk_buff *skb, unsigned int protoff,
 		  struct PptpControlHeader *ctlh,
 		  union pptp_ctrl_union *pptpReq,
 		  unsigned int reqlen,
@@ -471,7 +472,8 @@ pptp_outbound_pkt(struct sk_buff *skb,
 
 	nf_nat_pptp_outbound = rcu_dereference(nf_nat_pptp_hook_outbound);
 	if (nf_nat_pptp_outbound && ct->status & IPS_NAT_MASK)
-		return nf_nat_pptp_outbound(skb, ct, ctinfo, ctlh, pptpReq);
+		return nf_nat_pptp_outbound(skb, ct, ctinfo,
+					    protoff, ctlh, pptpReq);
 	return NF_ACCEPT;
 
 invalid:
@@ -570,11 +572,11 @@ conntrack_pptp_help(struct sk_buff *skb, unsigned int protoff,
 	 * established from PNS->PAC.  However, RFC makes no guarantee */
 	if (dir == IP_CT_DIR_ORIGINAL)
 		/* client -> server (PNS -> PAC) */
-		ret = pptp_outbound_pkt(skb, ctlh, pptpReq, reqlen, ct,
+		ret = pptp_outbound_pkt(skb, protoff, ctlh, pptpReq, reqlen, ct,
 					ctinfo);
 	else
 		/* server -> client (PAC -> PNS) */
-		ret = pptp_inbound_pkt(skb, ctlh, pptpReq, reqlen, ct,
+		ret = pptp_inbound_pkt(skb, protoff, ctlh, pptpReq, reqlen, ct,
 				       ctinfo);
 	pr_debug("sstate: %d->%d, cstate: %d->%d\n",
 		 oldsstate, info->sstate, oldcstate, info->cstate);
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index d08e0ba..590f0ab 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -52,8 +52,8 @@ module_param(sip_direct_media, int, 0600);
 MODULE_PARM_DESC(sip_direct_media, "Expect Media streams between signalling "
 				   "endpoints only (default 1)");
 
-unsigned int (*nf_nat_sip_hook)(struct sk_buff *skb, unsigned int dataoff,
-				const char **dptr,
+unsigned int (*nf_nat_sip_hook)(struct sk_buff *skb, unsigned int protoff,
+				unsigned int dataoff, const char **dptr,
 				unsigned int *datalen) __read_mostly;
 EXPORT_SYMBOL_GPL(nf_nat_sip_hook);
 
@@ -61,6 +61,7 @@ void (*nf_nat_sip_seq_adjust_hook)(struct sk_buff *skb, s16 off) __read_mostly;
 EXPORT_SYMBOL_GPL(nf_nat_sip_seq_adjust_hook);
 
 unsigned int (*nf_nat_sip_expect_hook)(struct sk_buff *skb,
+				       unsigned int protoff,
 				       unsigned int dataoff,
 				       const char **dptr,
 				       unsigned int *datalen,
@@ -69,7 +70,8 @@ unsigned int (*nf_nat_sip_expect_hook)(struct sk_buff *skb,
 				       unsigned int matchlen) __read_mostly;
 EXPORT_SYMBOL_GPL(nf_nat_sip_expect_hook);
 
-unsigned int (*nf_nat_sdp_addr_hook)(struct sk_buff *skb, unsigned int dataoff,
+unsigned int (*nf_nat_sdp_addr_hook)(struct sk_buff *skb, unsigned int protoff,
+				     unsigned int dataoff,
 				     const char **dptr,
 				     unsigned int *datalen,
 				     unsigned int sdpoff,
@@ -79,7 +81,8 @@ unsigned int (*nf_nat_sdp_addr_hook)(struct sk_buff *skb, unsigned int dataoff,
 				     __read_mostly;
 EXPORT_SYMBOL_GPL(nf_nat_sdp_addr_hook);
 
-unsigned int (*nf_nat_sdp_port_hook)(struct sk_buff *skb, unsigned int dataoff,
+unsigned int (*nf_nat_sdp_port_hook)(struct sk_buff *skb, unsigned int protoff,
+				     unsigned int dataoff,
 				     const char **dptr,
 				     unsigned int *datalen,
 				     unsigned int matchoff,
@@ -88,6 +91,7 @@ unsigned int (*nf_nat_sdp_port_hook)(struct sk_buff *skb, unsigned int dataoff,
 EXPORT_SYMBOL_GPL(nf_nat_sdp_port_hook);
 
 unsigned int (*nf_nat_sdp_session_hook)(struct sk_buff *skb,
+					unsigned int protoff,
 					unsigned int dataoff,
 					const char **dptr,
 					unsigned int *datalen,
@@ -96,7 +100,8 @@ unsigned int (*nf_nat_sdp_session_hook)(struct sk_buff *skb,
 					__read_mostly;
 EXPORT_SYMBOL_GPL(nf_nat_sdp_session_hook);
 
-unsigned int (*nf_nat_sdp_media_hook)(struct sk_buff *skb, unsigned int dataoff,
+unsigned int (*nf_nat_sdp_media_hook)(struct sk_buff *skb, unsigned int protoff,
+				      unsigned int dataoff,
 				      const char **dptr,
 				      unsigned int *datalen,
 				      struct nf_conntrack_expect *rtp_exp,
@@ -883,7 +888,8 @@ static void flush_expectations(struct nf_conn *ct, bool media)
 	spin_unlock_bh(&nf_conntrack_lock);
 }
 
-static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int dataoff,
+static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int protoff,
+				 unsigned int dataoff,
 				 const char **dptr, unsigned int *datalen,
 				 union nf_inet_addr *daddr, __be16 port,
 				 enum sip_expectation_classes class,
@@ -960,7 +966,7 @@ static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int dataoff,
 	if (direct_rtp) {
 		nf_nat_sdp_port = rcu_dereference(nf_nat_sdp_port_hook);
 		if (nf_nat_sdp_port &&
-		    !nf_nat_sdp_port(skb, dataoff, dptr, datalen,
+		    !nf_nat_sdp_port(skb, protoff, dataoff, dptr, datalen,
 				     mediaoff, medialen, ntohs(rtp_port)))
 			goto err1;
 	}
@@ -983,7 +989,7 @@ static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int dataoff,
 	nf_nat_sdp_media = rcu_dereference(nf_nat_sdp_media_hook);
 	if (nf_nat_sdp_media && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK && !direct_rtp)
-		ret = nf_nat_sdp_media(skb, dataoff, dptr, datalen,
+		ret = nf_nat_sdp_media(skb, protoff, dataoff, dptr, datalen,
 				       rtp_exp, rtcp_exp,
 				       mediaoff, medialen, daddr);
 	else {
@@ -1024,7 +1030,8 @@ static const struct sdp_media_type *sdp_media_type(const char *dptr,
 	return NULL;
 }
 
-static int process_sdp(struct sk_buff *skb, unsigned int dataoff,
+static int process_sdp(struct sk_buff *skb, unsigned int protoff,
+		       unsigned int dataoff,
 		       const char **dptr, unsigned int *datalen,
 		       unsigned int cseq)
 {
@@ -1098,7 +1105,8 @@ static int process_sdp(struct sk_buff *skb, unsigned int dataoff,
 		else
 			return NF_DROP;
 
-		ret = set_expected_rtp_rtcp(skb, dataoff, dptr, datalen,
+		ret = set_expected_rtp_rtcp(skb, protoff, dataoff,
+					    dptr, datalen,
 					    &rtp_addr, htons(port), t->class,
 					    mediaoff, medialen);
 		if (ret != NF_ACCEPT)
@@ -1107,7 +1115,8 @@ static int process_sdp(struct sk_buff *skb, unsigned int dataoff,
 		/* Update media connection address if present */
 		if (maddr_len && nf_nat_sdp_addr &&
 		    nf_ct_l3num(ct) == NFPROTO_IPV4 && ct->status & IPS_NAT_MASK) {
-			ret = nf_nat_sdp_addr(skb, dataoff, dptr, datalen,
+			ret = nf_nat_sdp_addr(skb, protoff, dataoff,
+					      dptr, datalen,
 					      mediaoff, c_hdr, SDP_HDR_MEDIA,
 					      &rtp_addr);
 			if (ret != NF_ACCEPT)
@@ -1120,12 +1129,13 @@ static int process_sdp(struct sk_buff *skb, unsigned int dataoff,
 	nf_nat_sdp_session = rcu_dereference(nf_nat_sdp_session_hook);
 	if (nf_nat_sdp_session && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK)
-		ret = nf_nat_sdp_session(skb, dataoff, dptr, datalen, sdpoff,
-					 &rtp_addr);
+		ret = nf_nat_sdp_session(skb, protoff, dataoff,
+					 dptr, datalen, sdpoff, &rtp_addr);
 
 	return ret;
 }
-static int process_invite_response(struct sk_buff *skb, unsigned int dataoff,
+static int process_invite_response(struct sk_buff *skb, unsigned int protoff,
+				   unsigned int dataoff,
 				   const char **dptr, unsigned int *datalen,
 				   unsigned int cseq, unsigned int code)
 {
@@ -1135,13 +1145,14 @@ static int process_invite_response(struct sk_buff *skb, unsigned int dataoff,
 
 	if ((code >= 100 && code <= 199) ||
 	    (code >= 200 && code <= 299))
-		return process_sdp(skb, dataoff, dptr, datalen, cseq);
+		return process_sdp(skb, protoff, dataoff, dptr, datalen, cseq);
 	else if (ct_sip_info->invite_cseq == cseq)
 		flush_expectations(ct, true);
 	return NF_ACCEPT;
 }
 
-static int process_update_response(struct sk_buff *skb, unsigned int dataoff,
+static int process_update_response(struct sk_buff *skb, unsigned int protoff,
+				   unsigned int dataoff,
 				   const char **dptr, unsigned int *datalen,
 				   unsigned int cseq, unsigned int code)
 {
@@ -1151,13 +1162,14 @@ static int process_update_response(struct sk_buff *skb, unsigned int dataoff,
 
 	if ((code >= 100 && code <= 199) ||
 	    (code >= 200 && code <= 299))
-		return process_sdp(skb, dataoff, dptr, datalen, cseq);
+		return process_sdp(skb, protoff, dataoff, dptr, datalen, cseq);
 	else if (ct_sip_info->invite_cseq == cseq)
 		flush_expectations(ct, true);
 	return NF_ACCEPT;
 }
 
-static int process_prack_response(struct sk_buff *skb, unsigned int dataoff,
+static int process_prack_response(struct sk_buff *skb, unsigned int protoff,
+				  unsigned int dataoff,
 				  const char **dptr, unsigned int *datalen,
 				  unsigned int cseq, unsigned int code)
 {
@@ -1167,13 +1179,14 @@ static int process_prack_response(struct sk_buff *skb, unsigned int dataoff,
 
 	if ((code >= 100 && code <= 199) ||
 	    (code >= 200 && code <= 299))
-		return process_sdp(skb, dataoff, dptr, datalen, cseq);
+		return process_sdp(skb, protoff, dataoff, dptr, datalen, cseq);
 	else if (ct_sip_info->invite_cseq == cseq)
 		flush_expectations(ct, true);
 	return NF_ACCEPT;
 }
 
-static int process_invite_request(struct sk_buff *skb, unsigned int dataoff,
+static int process_invite_request(struct sk_buff *skb, unsigned int protoff,
+				  unsigned int dataoff,
 				  const char **dptr, unsigned int *datalen,
 				  unsigned int cseq)
 {
@@ -1183,13 +1196,14 @@ static int process_invite_request(struct sk_buff *skb, unsigned int dataoff,
 	unsigned int ret;
 
 	flush_expectations(ct, true);
-	ret = process_sdp(skb, dataoff, dptr, datalen, cseq);
+	ret = process_sdp(skb, protoff, dataoff, dptr, datalen, cseq);
 	if (ret == NF_ACCEPT)
 		ct_sip_info->invite_cseq = cseq;
 	return ret;
 }
 
-static int process_bye_request(struct sk_buff *skb, unsigned int dataoff,
+static int process_bye_request(struct sk_buff *skb, unsigned int protoff,
+			       unsigned int dataoff,
 			       const char **dptr, unsigned int *datalen,
 			       unsigned int cseq)
 {
@@ -1204,7 +1218,8 @@ static int process_bye_request(struct sk_buff *skb, unsigned int dataoff,
  * signalling connections. The expectation is marked inactive and is activated
  * when receiving a response indicating success from the registrar.
  */
-static int process_register_request(struct sk_buff *skb, unsigned int dataoff,
+static int process_register_request(struct sk_buff *skb, unsigned int protoff,
+				    unsigned int dataoff,
 				    const char **dptr, unsigned int *datalen,
 				    unsigned int cseq)
 {
@@ -1280,8 +1295,8 @@ static int process_register_request(struct sk_buff *skb, unsigned int dataoff,
 	nf_nat_sip_expect = rcu_dereference(nf_nat_sip_expect_hook);
 	if (nf_nat_sip_expect && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK)
-		ret = nf_nat_sip_expect(skb, dataoff, dptr, datalen, exp,
-					matchoff, matchlen);
+		ret = nf_nat_sip_expect(skb, protoff, dataoff, dptr, datalen,
+					exp, matchoff, matchlen);
 	else {
 		if (nf_ct_expect_related(exp) != 0)
 			ret = NF_DROP;
@@ -1296,7 +1311,8 @@ store_cseq:
 	return ret;
 }
 
-static int process_register_response(struct sk_buff *skb, unsigned int dataoff,
+static int process_register_response(struct sk_buff *skb, unsigned int protoff,
+				     unsigned int dataoff,
 				     const char **dptr, unsigned int *datalen,
 				     unsigned int cseq, unsigned int code)
 {
@@ -1378,7 +1394,8 @@ static const struct sip_handler sip_handlers[] = {
 	SIP_HANDLER("REGISTER", process_register_request, process_register_response),
 };
 
-static int process_sip_response(struct sk_buff *skb, unsigned int dataoff,
+static int process_sip_response(struct sk_buff *skb, unsigned int protoff,
+				unsigned int dataoff,
 				const char **dptr, unsigned int *datalen)
 {
 	enum ip_conntrack_info ctinfo;
@@ -1409,13 +1426,14 @@ static int process_sip_response(struct sk_buff *skb, unsigned int dataoff,
 		if (*datalen < matchend + handler->len ||
 		    strnicmp(*dptr + matchend, handler->method, handler->len))
 			continue;
-		return handler->response(skb, dataoff, dptr, datalen,
+		return handler->response(skb, protoff, dataoff, dptr, datalen,
 					 cseq, code);
 	}
 	return NF_ACCEPT;
 }
 
-static int process_sip_request(struct sk_buff *skb, unsigned int dataoff,
+static int process_sip_request(struct sk_buff *skb, unsigned int protoff,
+			       unsigned int dataoff,
 			       const char **dptr, unsigned int *datalen)
 {
 	enum ip_conntrack_info ctinfo;
@@ -1440,27 +1458,29 @@ static int process_sip_request(struct sk_buff *skb, unsigned int dataoff,
 		if (!cseq)
 			return NF_DROP;
 
-		return handler->request(skb, dataoff, dptr, datalen, cseq);
+		return handler->request(skb, protoff, dataoff, dptr, datalen,
+					cseq);
 	}
 	return NF_ACCEPT;
 }
 
 static int process_sip_msg(struct sk_buff *skb, struct nf_conn *ct,
-			   unsigned int dataoff, const char **dptr,
-			   unsigned int *datalen)
+			   unsigned int protoff, unsigned int dataoff,
+			   const char **dptr, unsigned int *datalen)
 {
 	typeof(nf_nat_sip_hook) nf_nat_sip;
 	int ret;
 
 	if (strnicmp(*dptr, "SIP/2.0 ", strlen("SIP/2.0 ")) != 0)
-		ret = process_sip_request(skb, dataoff, dptr, datalen);
+		ret = process_sip_request(skb, protoff, dataoff, dptr, datalen);
 	else
-		ret = process_sip_response(skb, dataoff, dptr, datalen);
+		ret = process_sip_response(skb, protoff, dataoff, dptr, datalen);
 
 	if (ret == NF_ACCEPT && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
 	    ct->status & IPS_NAT_MASK) {
 		nf_nat_sip = rcu_dereference(nf_nat_sip_hook);
-		if (nf_nat_sip && !nf_nat_sip(skb, dataoff, dptr, datalen))
+		if (nf_nat_sip && !nf_nat_sip(skb, protoff, dataoff,
+					      dptr, datalen))
 			ret = NF_DROP;
 	}
 
@@ -1528,7 +1548,8 @@ static int sip_help_tcp(struct sk_buff *skb, unsigned int protoff,
 		if (msglen > datalen)
 			return NF_DROP;
 
-		ret = process_sip_msg(skb, ct, dataoff, &dptr, &msglen);
+		ret = process_sip_msg(skb, ct, protoff, dataoff,
+				      &dptr, &msglen);
 		if (ret != NF_ACCEPT)
 			break;
 		diff     = msglen - origlen;
@@ -1570,7 +1591,7 @@ static int sip_help_udp(struct sk_buff *skb, unsigned int protoff,
 	if (datalen < strlen("SIP/2.0 200"))
 		return NF_ACCEPT;
 
-	return process_sip_msg(skb, ct, dataoff, &dptr, &datalen);
+	return process_sip_msg(skb, ct, protoff, dataoff, &dptr, &datalen);
 }
 
 static struct nf_conntrack_helper sip[MAX_PORTS][4] __read_mostly;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 09/19] netfilter: add protocol independant NAT core
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (7 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 08/19] netfilter: nf_nat: add protoff argument to packet mangling functions kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-09 20:08 ` [PATCH 10/19] netfilter: ipv6: expand skb head in ip6_route_me_harder after oif change kaber
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Convert the IPv4 NAT implementation to a protocol independant core and
address family specific modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter.h                          |   14 +-
 include/linux/netfilter/nf_nat.h                   |    8 +
 include/linux/netfilter/nfnetlink_conntrack.h      |    6 +-
 include/linux/netfilter_ipv4.h                     |    1 -
 include/net/netfilter/nf_conntrack_expect.h        |    2 +-
 include/net/netfilter/nf_nat.h                     |    2 +-
 include/net/netfilter/nf_nat_core.h                |    5 +-
 include/net/netfilter/nf_nat_l3proto.h             |   47 ++
 include/net/netfilter/nf_nat_l4proto.h             |   71 +++
 include/net/netfilter/nf_nat_protocol.h            |   67 ---
 include/net/netfilter/nf_nat_rule.h                |   15 -
 include/net/netns/conntrack.h                      |    4 +
 include/net/netns/ipv4.h                           |    2 -
 net/ipv4/netfilter.c                               |   37 --
 net/ipv4/netfilter/Kconfig                         |   65 +--
 net/ipv4/netfilter/Makefile                        |   11 +-
 net/ipv4/netfilter/ipt_MASQUERADE.c                |   15 +-
 net/ipv4/netfilter/ipt_NETMAP.c                    |   15 +-
 net/ipv4/netfilter/ipt_REDIRECT.c                  |   15 +-
 .../{nf_nat_standalone.c => iptable_nat.c}         |  264 +++++-----
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c     |    6 -
 net/ipv4/netfilter/nf_nat_amanda.c                 |    1 -
 net/ipv4/netfilter/nf_nat_ftp.c                    |    1 -
 net/ipv4/netfilter/nf_nat_h323.c                   |   23 +-
 net/ipv4/netfilter/nf_nat_irc.c                    |    1 -
 net/ipv4/netfilter/nf_nat_l3proto_ipv4.c           |  281 +++++++++
 net/ipv4/netfilter/nf_nat_pptp.c                   |   15 +-
 net/ipv4/netfilter/nf_nat_proto_gre.c              |   30 +-
 net/ipv4/netfilter/nf_nat_proto_icmp.c             |   24 +-
 net/ipv4/netfilter/nf_nat_rule.c                   |  214 -------
 net/ipv4/netfilter/nf_nat_sip.c                    |   19 +-
 net/ipv4/netfilter/nf_nat_tftp.c                   |    1 -
 net/netfilter/Kconfig                              |   24 +
 net/netfilter/Makefile                             |   11 +
 net/netfilter/core.c                               |    5 +
 net/netfilter/nf_conntrack_core.c                  |    6 +
 net/netfilter/nf_conntrack_netlink.c               |   35 +-
 net/netfilter/nf_conntrack_proto_tcp.c             |    8 +-
 net/netfilter/nf_conntrack_sip.c                   |    6 +-
 net/{ipv4 => }/netfilter/nf_nat_core.c             |  599 ++++++++++----------
 net/{ipv4 => }/netfilter/nf_nat_helper.c           |  100 ++--
 net/{ipv4 => }/netfilter/nf_nat_proto_common.c     |   54 +-
 net/{ipv4 => }/netfilter/nf_nat_proto_dccp.c       |   56 ++-
 net/{ipv4 => }/netfilter/nf_nat_proto_sctp.c       |   53 +-
 net/{ipv4 => }/netfilter/nf_nat_proto_tcp.c        |   40 +-
 net/{ipv4 => }/netfilter/nf_nat_proto_udp.c        |   42 +-
 net/{ipv4 => }/netfilter/nf_nat_proto_udplite.c    |   58 ++-
 net/{ipv4 => }/netfilter/nf_nat_proto_unknown.c    |   16 +-
 net/netfilter/xt_nat.c                             |  167 ++++++
 49 files changed, 1427 insertions(+), 1135 deletions(-)
 create mode 100644 include/net/netfilter/nf_nat_l3proto.h
 create mode 100644 include/net/netfilter/nf_nat_l4proto.h
 delete mode 100644 include/net/netfilter/nf_nat_protocol.h
 delete mode 100644 include/net/netfilter/nf_nat_rule.h
 rename net/ipv4/netfilter/{nf_nat_standalone.c => iptable_nat.c} (52%)
 create mode 100644 net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
 delete mode 100644 net/ipv4/netfilter/nf_nat_rule.c
 rename net/{ipv4 => }/netfilter/nf_nat_core.c (56%)
 rename net/{ipv4 => }/netfilter/nf_nat_helper.c (83%)
 rename net/{ipv4 => }/netfilter/nf_nat_proto_common.c (62%)
 rename net/{ipv4 => }/netfilter/nf_nat_proto_dccp.c (61%)
 rename net/{ipv4 => }/netfilter/nf_nat_proto_sctp.c (61%)
 rename net/{ipv4 => }/netfilter/nf_nat_proto_tcp.c (65%)
 rename net/{ipv4 => }/netfilter/nf_nat_proto_udp.c (60%)
 rename net/{ipv4 => }/netfilter/nf_nat_proto_udplite.c (58%)
 rename net/{ipv4 => }/netfilter/nf_nat_proto_unknown.c (76%)
 create mode 100644 net/netfilter/xt_nat.c

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index c613cf0..1dcf2a3 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -342,7 +342,7 @@ extern int nf_register_afinfo(const struct nf_afinfo *afinfo);
 extern void nf_unregister_afinfo(const struct nf_afinfo *afinfo);
 
 #include <net/flow.h>
-extern void (*ip_nat_decode_session)(struct sk_buff *, struct flowi *);
+extern void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *);
 
 static inline void
 nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, u_int8_t family)
@@ -350,13 +350,11 @@ nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, u_int8_t family)
 #ifdef CONFIG_NF_NAT_NEEDED
 	void (*decodefn)(struct sk_buff *, struct flowi *);
 
-	if (family == AF_INET) {
-		rcu_read_lock();
-		decodefn = rcu_dereference(ip_nat_decode_session);
-		if (decodefn)
-			decodefn(skb, fl);
-		rcu_read_unlock();
-	}
+	rcu_read_lock();
+	decodefn = rcu_dereference(nf_nat_decode_session_hook);
+	if (decodefn)
+		decodefn(skb, fl);
+	rcu_read_unlock();
 #endif
 }
 
diff --git a/include/linux/netfilter/nf_nat.h b/include/linux/netfilter/nf_nat.h
index 8df2d13..bf0cc37 100644
--- a/include/linux/netfilter/nf_nat.h
+++ b/include/linux/netfilter/nf_nat.h
@@ -22,4 +22,12 @@ struct nf_nat_ipv4_multi_range_compat {
 	struct nf_nat_ipv4_range	range[1];
 };
 
+struct nf_nat_range {
+	unsigned int			flags;
+	union nf_inet_addr		min_addr;
+	union nf_inet_addr		max_addr;
+	union nf_conntrack_man_proto	min_proto;
+	union nf_conntrack_man_proto	max_proto;
+};
+
 #endif /* _NETFILTER_NF_NAT_H */
diff --git a/include/linux/netfilter/nfnetlink_conntrack.h b/include/linux/netfilter/nfnetlink_conntrack.h
index f649f74..68920ea 100644
--- a/include/linux/netfilter/nfnetlink_conntrack.h
+++ b/include/linux/netfilter/nfnetlink_conntrack.h
@@ -142,8 +142,10 @@ enum ctattr_tstamp {
 
 enum ctattr_nat {
 	CTA_NAT_UNSPEC,
-	CTA_NAT_MINIP,
-	CTA_NAT_MAXIP,
+	CTA_NAT_V4_MINIP,
+#define CTA_NAT_MINIP CTA_NAT_V4_MINIP
+	CTA_NAT_V4_MAXIP,
+#define CTA_NAT_MAXIP CTA_NAT_V4_MAXIP
 	CTA_NAT_PROTO,
 	__CTA_NAT_MAX
 };
diff --git a/include/linux/netfilter_ipv4.h b/include/linux/netfilter_ipv4.h
index e2b1280..b962dfc 100644
--- a/include/linux/netfilter_ipv4.h
+++ b/include/linux/netfilter_ipv4.h
@@ -79,7 +79,6 @@ enum nf_ip_hook_priorities {
 
 #ifdef __KERNEL__
 extern int ip_route_me_harder(struct sk_buff *skb, unsigned addr_type);
-extern int ip_xfrm_me_harder(struct sk_buff *skb);
 extern __sum16 nf_ip_checksum(struct sk_buff *skb, unsigned int hook,
 				   unsigned int dataoff, u_int8_t protocol);
 #endif /*__KERNEL__*/
diff --git a/include/net/netfilter/nf_conntrack_expect.h b/include/net/netfilter/nf_conntrack_expect.h
index 983f002..cc13f37 100644
--- a/include/net/netfilter/nf_conntrack_expect.h
+++ b/include/net/netfilter/nf_conntrack_expect.h
@@ -43,7 +43,7 @@ struct nf_conntrack_expect {
 	unsigned int class;
 
 #ifdef CONFIG_NF_NAT_NEEDED
-	__be32 saved_ip;
+	union nf_inet_addr saved_addr;
 	/* This is the original per-proto part, used to map the
 	 * expected connection the way the recipient expects. */
 	union nf_conntrack_man_proto saved_proto;
diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index b4de990..1752f13 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -50,7 +50,7 @@ struct nf_conn_nat {
 
 /* Set up the info structure to map into this range. */
 extern unsigned int nf_nat_setup_info(struct nf_conn *ct,
-				      const struct nf_nat_ipv4_range *range,
+				      const struct nf_nat_range *range,
 				      enum nf_nat_manip_type maniptype);
 
 /* Is this tuple already taken? (not by us)*/
diff --git a/include/net/netfilter/nf_nat_core.h b/include/net/netfilter/nf_nat_core.h
index b13d8d1..972e1e4 100644
--- a/include/net/netfilter/nf_nat_core.h
+++ b/include/net/netfilter/nf_nat_core.h
@@ -12,10 +12,7 @@ extern unsigned int nf_nat_packet(struct nf_conn *ct,
 				  unsigned int hooknum,
 				  struct sk_buff *skb);
 
-extern int nf_nat_icmp_reply_translation(struct nf_conn *ct,
-					 enum ip_conntrack_info ctinfo,
-					 unsigned int hooknum,
-					 struct sk_buff *skb);
+extern int nf_xfrm_me_harder(struct sk_buff *skb, unsigned int family);
 
 static inline int nf_nat_initialized(struct nf_conn *ct,
 				     enum nf_nat_manip_type manip)
diff --git a/include/net/netfilter/nf_nat_l3proto.h b/include/net/netfilter/nf_nat_l3proto.h
new file mode 100644
index 0000000..3df4a67
--- /dev/null
+++ b/include/net/netfilter/nf_nat_l3proto.h
@@ -0,0 +1,47 @@
+#ifndef _NF_NAT_L3PROTO_H
+#define _NF_NAT_L3PROTO_H
+
+struct nf_nat_l4proto;
+struct nf_nat_l3proto {
+	u8	l3proto;
+
+	int	(*in_range)(const struct nf_conntrack_tuple *t,
+			    const struct nf_nat_range *range);
+
+	u32 	(*secure_port)(const struct nf_conntrack_tuple *t, __be16);
+
+	bool	(*manip_pkt)(struct sk_buff *skb,
+			     unsigned int iphdroff,
+			     const struct nf_nat_l4proto *l4proto,
+			     const struct nf_conntrack_tuple *target,
+			     enum nf_nat_manip_type maniptype);
+
+	void	(*csum_update)(struct sk_buff *skb, unsigned int iphdroff,
+			       __sum16 *check,
+			       const struct nf_conntrack_tuple *t,
+			       enum nf_nat_manip_type maniptype);
+
+	void	(*csum_recalc)(struct sk_buff *skb, u8 proto,
+			       void *data, __sum16 *check,
+			       int datalen, int oldlen);
+
+	void	(*decode_session)(struct sk_buff *skb,
+				  const struct nf_conn *ct,
+				  enum ip_conntrack_dir dir,
+				  unsigned long statusbit,
+				  struct flowi *fl);
+
+	int	(*nlattr_to_range)(struct nlattr *tb[],
+				   struct nf_nat_range *range);
+};
+
+extern int nf_nat_l3proto_register(const struct nf_nat_l3proto *);
+extern void nf_nat_l3proto_unregister(const struct nf_nat_l3proto *);
+extern const struct nf_nat_l3proto *__nf_nat_l3proto_find(u8 l3proto);
+
+extern int nf_nat_icmp_reply_translation(struct sk_buff *skb,
+					 struct nf_conn *ct,
+					 enum ip_conntrack_info ctinfo,
+					 unsigned int hooknum);
+
+#endif /* _NF_NAT_L3PROTO_H */
diff --git a/include/net/netfilter/nf_nat_l4proto.h b/include/net/netfilter/nf_nat_l4proto.h
new file mode 100644
index 0000000..1f0a4f0
--- /dev/null
+++ b/include/net/netfilter/nf_nat_l4proto.h
@@ -0,0 +1,71 @@
+/* Header for use in defining a given protocol. */
+#ifndef _NF_NAT_L4PROTO_H
+#define _NF_NAT_L4PROTO_H
+#include <net/netfilter/nf_nat.h>
+#include <linux/netfilter/nfnetlink_conntrack.h>
+
+struct nf_nat_range;
+struct nf_nat_l3proto;
+
+struct nf_nat_l4proto {
+	/* Protocol number. */
+	u8 l4proto;
+
+	/* Translate a packet to the target according to manip type.
+	 * Return true if succeeded.
+	 */
+	bool (*manip_pkt)(struct sk_buff *skb,
+			  const struct nf_nat_l3proto *l3proto,
+			  unsigned int iphdroff, unsigned int hdroff,
+			  const struct nf_conntrack_tuple *tuple,
+			  enum nf_nat_manip_type maniptype);
+
+	/* Is the manipable part of the tuple between min and max incl? */
+	bool (*in_range)(const struct nf_conntrack_tuple *tuple,
+			 enum nf_nat_manip_type maniptype,
+			 const union nf_conntrack_man_proto *min,
+			 const union nf_conntrack_man_proto *max);
+
+	/* Alter the per-proto part of the tuple (depending on
+	 * maniptype), to give a unique tuple in the given range if
+	 * possible.  Per-protocol part of tuple is initialized to the
+	 * incoming packet.
+	 */
+	void (*unique_tuple)(const struct nf_nat_l3proto *l3proto,
+			     struct nf_conntrack_tuple *tuple,
+			     const struct nf_nat_range *range,
+			     enum nf_nat_manip_type maniptype,
+			     const struct nf_conn *ct);
+
+	int (*nlattr_to_range)(struct nlattr *tb[],
+			       struct nf_nat_range *range);
+};
+
+/* Protocol registration. */
+extern int nf_nat_l4proto_register(u8 l3proto, const struct nf_nat_l4proto *l4proto);
+extern void nf_nat_l4proto_unregister(u8 l3proto, const struct nf_nat_l4proto *l4proto);
+
+extern const struct nf_nat_l4proto *__nf_nat_l4proto_find(u8 l3proto, u8 l4proto);
+
+/* Built-in protocols. */
+extern const struct nf_nat_l4proto nf_nat_l4proto_tcp;
+extern const struct nf_nat_l4proto nf_nat_l4proto_udp;
+extern const struct nf_nat_l4proto nf_nat_l4proto_icmp;
+extern const struct nf_nat_l4proto nf_nat_l4proto_unknown;
+
+extern bool nf_nat_l4proto_in_range(const struct nf_conntrack_tuple *tuple,
+				    enum nf_nat_manip_type maniptype,
+				    const union nf_conntrack_man_proto *min,
+				    const union nf_conntrack_man_proto *max);
+
+extern void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto,
+					struct nf_conntrack_tuple *tuple,
+					const struct nf_nat_range *range,
+					enum nf_nat_manip_type maniptype,
+					const struct nf_conn *ct,
+					u16 *rover);
+
+extern int nf_nat_l4proto_nlattr_to_range(struct nlattr *tb[],
+					  struct nf_nat_range *range);
+
+#endif /*_NF_NAT_L4PROTO_H*/
diff --git a/include/net/netfilter/nf_nat_protocol.h b/include/net/netfilter/nf_nat_protocol.h
deleted file mode 100644
index 7b0b511..0000000
--- a/include/net/netfilter/nf_nat_protocol.h
+++ /dev/null
@@ -1,67 +0,0 @@
-/* Header for use in defining a given protocol. */
-#ifndef _NF_NAT_PROTOCOL_H
-#define _NF_NAT_PROTOCOL_H
-#include <net/netfilter/nf_nat.h>
-#include <linux/netfilter/nfnetlink_conntrack.h>
-
-struct nf_nat_ipv4_range;
-
-struct nf_nat_protocol {
-	/* Protocol number. */
-	unsigned int protonum;
-
-	/* Translate a packet to the target according to manip type.
-	   Return true if succeeded. */
-	bool (*manip_pkt)(struct sk_buff *skb,
-			  unsigned int iphdroff,
-			  const struct nf_conntrack_tuple *tuple,
-			  enum nf_nat_manip_type maniptype);
-
-	/* Is the manipable part of the tuple between min and max incl? */
-	bool (*in_range)(const struct nf_conntrack_tuple *tuple,
-			 enum nf_nat_manip_type maniptype,
-			 const union nf_conntrack_man_proto *min,
-			 const union nf_conntrack_man_proto *max);
-
-	/* Alter the per-proto part of the tuple (depending on
-	   maniptype), to give a unique tuple in the given range if
-	   possible.  Per-protocol part of tuple is initialized to the
-	   incoming packet. */
-	void (*unique_tuple)(struct nf_conntrack_tuple *tuple,
-			     const struct nf_nat_ipv4_range *range,
-			     enum nf_nat_manip_type maniptype,
-			     const struct nf_conn *ct);
-
-	int (*nlattr_to_range)(struct nlattr *tb[],
-			       struct nf_nat_ipv4_range *range);
-};
-
-/* Protocol registration. */
-extern int nf_nat_protocol_register(const struct nf_nat_protocol *proto);
-extern void nf_nat_protocol_unregister(const struct nf_nat_protocol *proto);
-
-/* Built-in protocols. */
-extern const struct nf_nat_protocol nf_nat_protocol_tcp;
-extern const struct nf_nat_protocol nf_nat_protocol_udp;
-extern const struct nf_nat_protocol nf_nat_protocol_icmp;
-extern const struct nf_nat_protocol nf_nat_unknown_protocol;
-
-extern int init_protocols(void) __init;
-extern void cleanup_protocols(void);
-extern const struct nf_nat_protocol *find_nat_proto(u_int16_t protonum);
-
-extern bool nf_nat_proto_in_range(const struct nf_conntrack_tuple *tuple,
-				  enum nf_nat_manip_type maniptype,
-				  const union nf_conntrack_man_proto *min,
-				  const union nf_conntrack_man_proto *max);
-
-extern void nf_nat_proto_unique_tuple(struct nf_conntrack_tuple *tuple,
-				      const struct nf_nat_ipv4_range *range,
-				      enum nf_nat_manip_type maniptype,
-				      const struct nf_conn *ct,
-				      u_int16_t *rover);
-
-extern int nf_nat_proto_nlattr_to_range(struct nlattr *tb[],
-					struct nf_nat_ipv4_range *range);
-
-#endif /*_NF_NAT_PROTO_H*/
diff --git a/include/net/netfilter/nf_nat_rule.h b/include/net/netfilter/nf_nat_rule.h
deleted file mode 100644
index 2890bdc..0000000
--- a/include/net/netfilter/nf_nat_rule.h
+++ /dev/null
@@ -1,15 +0,0 @@
-#ifndef _NF_NAT_RULE_H
-#define _NF_NAT_RULE_H
-#include <net/netfilter/nf_conntrack.h>
-#include <net/netfilter/nf_nat.h>
-#include <linux/netfilter_ipv4/ip_tables.h>
-
-extern int nf_nat_rule_init(void) __init;
-extern void nf_nat_rule_cleanup(void);
-extern int nf_nat_rule_find(struct sk_buff *skb,
-			    unsigned int hooknum,
-			    const struct net_device *in,
-			    const struct net_device *out,
-			    struct nf_conn *ct);
-
-#endif /* _NF_NAT_RULE_H */
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 3aecdc7..a1d83cc 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -83,6 +83,10 @@ struct netns_ct {
 	int			sysctl_auto_assign_helper;
 	bool			auto_assign_helper_warned;
 	struct nf_ip_net	nf_ct_proto;
+#ifdef CONFIG_NF_NAT_NEEDED
+	struct hlist_head	*nat_bysource;
+	unsigned int		nat_htable_size;
+#endif
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header	*sysctl_header;
 	struct ctl_table_header	*acct_sysctl_header;
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 1474dd6..ace280d 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -51,8 +51,6 @@ struct netns_ipv4 {
 	struct xt_table		*iptable_security;
 #endif
 	struct xt_table		*nat_table;
-	struct hlist_head	*nat_bysource;
-	unsigned int		nat_htable_size;
 #endif
 
 	int sysctl_icmp_echo_ignore_all;
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index ed1b367..f1643c0 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -72,43 +72,6 @@ int ip_route_me_harder(struct sk_buff *skb, unsigned int addr_type)
 }
 EXPORT_SYMBOL(ip_route_me_harder);
 
-#ifdef CONFIG_XFRM
-int ip_xfrm_me_harder(struct sk_buff *skb)
-{
-	struct flowi fl;
-	unsigned int hh_len;
-	struct dst_entry *dst;
-
-	if (IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED)
-		return 0;
-	if (xfrm_decode_session(skb, &fl, AF_INET) < 0)
-		return -1;
-
-	dst = skb_dst(skb);
-	if (dst->xfrm)
-		dst = ((struct xfrm_dst *)dst)->route;
-	dst_hold(dst);
-
-	dst = xfrm_lookup(dev_net(dst->dev), dst, &fl, skb->sk, 0);
-	if (IS_ERR(dst))
-		return -1;
-
-	skb_dst_drop(skb);
-	skb_dst_set(skb, dst);
-
-	/* Change in oif may mean change in hh_len. */
-	hh_len = skb_dst(skb)->dev->hard_header_len;
-	if (skb_headroom(skb) < hh_len &&
-	    pskb_expand_head(skb, hh_len - skb_headroom(skb), 0, GFP_ATOMIC))
-		return -1;
-	return 0;
-}
-EXPORT_SYMBOL(ip_xfrm_me_harder);
-#endif
-
-void (*ip_nat_decode_session)(struct sk_buff *, struct flowi *);
-EXPORT_SYMBOL(ip_nat_decode_session);
-
 /*
  * Extra routing may needed on local out, as the QUEUE target never
  * returns control to the table.
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index fcc543c..33372a1 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -143,25 +143,21 @@ config IP_NF_TARGET_ULOG
 	  To compile it as a module, choose M here.  If unsure, say N.
 
 # NAT + specific targets: nf_conntrack
-config NF_NAT
-	tristate "Full NAT"
+config NF_NAT_IPV4
+	tristate "IPv4 NAT"
 	depends on NF_CONNTRACK_IPV4
 	default m if NETFILTER_ADVANCED=n
+	select NF_NAT
 	help
-	  The Full NAT option allows masquerading, port forwarding and other
+	  The IPv4 NAT option allows masquerading, port forwarding and other
 	  forms of full Network Address Port Translation.  It is controlled by
 	  the `nat' table in iptables: see the man page for iptables(8).
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
-config NF_NAT_NEEDED
-	bool
-	depends on NF_NAT
-	default y
-
 config IP_NF_TARGET_MASQUERADE
 	tristate "MASQUERADE target support"
-	depends on NF_NAT
+	depends on NF_NAT_IPV4
 	default m if NETFILTER_ADVANCED=n
 	help
 	  Masquerading is a special case of NAT: all outgoing connections are
@@ -174,7 +170,7 @@ config IP_NF_TARGET_MASQUERADE
 
 config IP_NF_TARGET_NETMAP
 	tristate "NETMAP target support"
-	depends on NF_NAT
+	depends on NF_NAT_IPV4
 	depends on NETFILTER_ADVANCED
 	help
 	  NETMAP is an implementation of static 1:1 NAT mapping of network
@@ -185,7 +181,7 @@ config IP_NF_TARGET_NETMAP
 
 config IP_NF_TARGET_REDIRECT
 	tristate "REDIRECT target support"
-	depends on NF_NAT
+	depends on NF_NAT_IPV4
 	depends on NETFILTER_ADVANCED
 	help
 	  REDIRECT is a special case of NAT: all incoming connections are
@@ -197,7 +193,7 @@ config IP_NF_TARGET_REDIRECT
 
 config NF_NAT_SNMP_BASIC
 	tristate "Basic SNMP-ALG support"
-	depends on NF_CONNTRACK_SNMP && NF_NAT
+	depends on NF_CONNTRACK_SNMP && NF_NAT_IPV4
 	depends on NETFILTER_ADVANCED
 	default NF_NAT && NF_CONNTRACK_SNMP
 	---help---
@@ -219,61 +215,46 @@ config NF_NAT_SNMP_BASIC
 #           <expr> '&&' <expr>                   (6)
 #
 # (6) Returns the result of min(/expr/, /expr/).
-config NF_NAT_PROTO_DCCP
-	tristate
-	depends on NF_NAT && NF_CT_PROTO_DCCP
-	default NF_NAT && NF_CT_PROTO_DCCP
 
 config NF_NAT_PROTO_GRE
 	tristate
-	depends on NF_NAT && NF_CT_PROTO_GRE
-
-config NF_NAT_PROTO_UDPLITE
-	tristate
-	depends on NF_NAT && NF_CT_PROTO_UDPLITE
-	default NF_NAT && NF_CT_PROTO_UDPLITE
-
-config NF_NAT_PROTO_SCTP
-	tristate
-	default NF_NAT && NF_CT_PROTO_SCTP
-	depends on NF_NAT && NF_CT_PROTO_SCTP
-	select LIBCRC32C
+	depends on NF_NAT_IPV4 && NF_CT_PROTO_GRE
 
 config NF_NAT_FTP
 	tristate
-	depends on NF_CONNTRACK && NF_NAT
-	default NF_NAT && NF_CONNTRACK_FTP
+	depends on NF_CONNTRACK && NF_NAT_IPV4
+	default NF_NAT_IPV4 && NF_CONNTRACK_FTP
 
 config NF_NAT_IRC
 	tristate
-	depends on NF_CONNTRACK && NF_NAT
-	default NF_NAT && NF_CONNTRACK_IRC
+	depends on NF_CONNTRACK && NF_NAT_IPV4
+	default NF_NAT_IPV4 && NF_CONNTRACK_IRC
 
 config NF_NAT_TFTP
 	tristate
-	depends on NF_CONNTRACK && NF_NAT
-	default NF_NAT && NF_CONNTRACK_TFTP
+	depends on NF_CONNTRACK && NF_NAT_IPV4
+	default NF_NAT_IPV4 && NF_CONNTRACK_TFTP
 
 config NF_NAT_AMANDA
 	tristate
-	depends on NF_CONNTRACK && NF_NAT
-	default NF_NAT && NF_CONNTRACK_AMANDA
+	depends on NF_CONNTRACK && NF_NAT_IPV4
+	default NF_NAT_IPV4 && NF_CONNTRACK_AMANDA
 
 config NF_NAT_PPTP
 	tristate
-	depends on NF_CONNTRACK && NF_NAT
-	default NF_NAT && NF_CONNTRACK_PPTP
+	depends on NF_CONNTRACK && NF_NAT_IPV4
+	default NF_NAT_IPV4 && NF_CONNTRACK_PPTP
 	select NF_NAT_PROTO_GRE
 
 config NF_NAT_H323
 	tristate
-	depends on NF_CONNTRACK && NF_NAT
-	default NF_NAT && NF_CONNTRACK_H323
+	depends on NF_CONNTRACK && NF_NAT_IPV4
+	default NF_NAT_IPV4 && NF_CONNTRACK_H323
 
 config NF_NAT_SIP
 	tristate
-	depends on NF_CONNTRACK && NF_NAT
-	default NF_NAT && NF_CONNTRACK_SIP
+	depends on NF_CONNTRACK && NF_NAT_IPV4
+	default NF_NAT_IPV4 && NF_CONNTRACK_SIP
 
 # mangle + specific targets
 config IP_NF_MANGLE
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index c20674d..0ea3acc 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -10,13 +10,11 @@ nf_conntrack_ipv4-objs	+= nf_conntrack_l3proto_ipv4_compat.o
 endif
 endif
 
-nf_nat-y		:= nf_nat_core.o nf_nat_helper.o nf_nat_proto_unknown.o nf_nat_proto_common.o nf_nat_proto_tcp.o nf_nat_proto_udp.o nf_nat_proto_icmp.o
-iptable_nat-y	:= nf_nat_rule.o nf_nat_standalone.o
-
 # connection tracking
 obj-$(CONFIG_NF_CONNTRACK_IPV4) += nf_conntrack_ipv4.o
 
-obj-$(CONFIG_NF_NAT) += nf_nat.o
+nf_nat_ipv4-y		:= nf_nat_l3proto_ipv4.o nf_nat_proto_icmp.o
+obj-$(CONFIG_NF_NAT_IPV4) += nf_nat_ipv4.o
 
 # defrag
 obj-$(CONFIG_NF_DEFRAG_IPV4) += nf_defrag_ipv4.o
@@ -32,10 +30,7 @@ obj-$(CONFIG_NF_NAT_SNMP_BASIC) += nf_nat_snmp_basic.o
 obj-$(CONFIG_NF_NAT_TFTP) += nf_nat_tftp.o
 
 # NAT protocols (nf_nat)
-obj-$(CONFIG_NF_NAT_PROTO_DCCP) += nf_nat_proto_dccp.o
 obj-$(CONFIG_NF_NAT_PROTO_GRE) += nf_nat_proto_gre.o
-obj-$(CONFIG_NF_NAT_PROTO_UDPLITE) += nf_nat_proto_udplite.o
-obj-$(CONFIG_NF_NAT_PROTO_SCTP) += nf_nat_proto_sctp.o
 
 # generic IP tables 
 obj-$(CONFIG_IP_NF_IPTABLES) += ip_tables.o
@@ -43,7 +38,7 @@ obj-$(CONFIG_IP_NF_IPTABLES) += ip_tables.o
 # the three instances of ip_tables
 obj-$(CONFIG_IP_NF_FILTER) += iptable_filter.o
 obj-$(CONFIG_IP_NF_MANGLE) += iptable_mangle.o
-obj-$(CONFIG_NF_NAT) += iptable_nat.o
+obj-$(CONFIG_NF_NAT_IPV4) += iptable_nat.o
 obj-$(CONFIG_IP_NF_RAW) += iptable_raw.o
 obj-$(CONFIG_IP_NF_SECURITY) += iptable_security.o
 
diff --git a/net/ipv4/netfilter/ipt_MASQUERADE.c b/net/ipv4/netfilter/ipt_MASQUERADE.c
index cbb6a1a..1c3aa28 100644
--- a/net/ipv4/netfilter/ipt_MASQUERADE.c
+++ b/net/ipv4/netfilter/ipt_MASQUERADE.c
@@ -19,9 +19,9 @@
 #include <net/ip.h>
 #include <net/checksum.h>
 #include <net/route.h>
-#include <net/netfilter/nf_nat_rule.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_nat.h>
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Netfilter Core Team <coreteam@netfilter.org>");
@@ -49,7 +49,7 @@ masquerade_tg(struct sk_buff *skb, const struct xt_action_param *par)
 	struct nf_conn *ct;
 	struct nf_conn_nat *nat;
 	enum ip_conntrack_info ctinfo;
-	struct nf_nat_ipv4_range newrange;
+	struct nf_nat_range newrange;
 	const struct nf_nat_ipv4_multi_range_compat *mr;
 	const struct rtable *rt;
 	__be32 newsrc, nh;
@@ -80,10 +80,13 @@ masquerade_tg(struct sk_buff *skb, const struct xt_action_param *par)
 	nat->masq_index = par->out->ifindex;
 
 	/* Transfer from original range. */
-	newrange = ((struct nf_nat_ipv4_range)
-		{ mr->range[0].flags | NF_NAT_RANGE_MAP_IPS,
-		  newsrc, newsrc,
-		  mr->range[0].min, mr->range[0].max });
+	memset(&newrange.min_addr, 0, sizeof(newrange.min_addr));
+	memset(&newrange.max_addr, 0, sizeof(newrange.max_addr));
+	newrange.flags       = mr->range[0].flags | NF_NAT_RANGE_MAP_IPS;
+	newrange.min_addr.ip = newsrc;
+	newrange.max_addr.ip = newsrc;
+	newrange.min_proto   = mr->range[0].min;
+	newrange.max_proto   = mr->range[0].max;
 
 	/* Hand modified range to generic setup. */
 	return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_SRC);
diff --git a/net/ipv4/netfilter/ipt_NETMAP.c b/net/ipv4/netfilter/ipt_NETMAP.c
index b5bfbba..85028dc 100644
--- a/net/ipv4/netfilter/ipt_NETMAP.c
+++ b/net/ipv4/netfilter/ipt_NETMAP.c
@@ -16,7 +16,7 @@
 #include <linux/netfilter.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter/x_tables.h>
-#include <net/netfilter/nf_nat_rule.h>
+#include <net/netfilter/nf_nat.h>
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Svenning Soerensen <svenning@post5.tele.dk>");
@@ -44,7 +44,7 @@ netmap_tg(struct sk_buff *skb, const struct xt_action_param *par)
 	enum ip_conntrack_info ctinfo;
 	__be32 new_ip, netmask;
 	const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo;
-	struct nf_nat_ipv4_range newrange;
+	struct nf_nat_range newrange;
 
 	NF_CT_ASSERT(par->hooknum == NF_INET_PRE_ROUTING ||
 		     par->hooknum == NF_INET_POST_ROUTING ||
@@ -61,10 +61,13 @@ netmap_tg(struct sk_buff *skb, const struct xt_action_param *par)
 		new_ip = ip_hdr(skb)->saddr & ~netmask;
 	new_ip |= mr->range[0].min_ip & netmask;
 
-	newrange = ((struct nf_nat_ipv4_range)
-		{ mr->range[0].flags | NF_NAT_RANGE_MAP_IPS,
-		  new_ip, new_ip,
-		  mr->range[0].min, mr->range[0].max });
+	memset(&newrange.min_addr, 0, sizeof(newrange.min_addr));
+	memset(&newrange.max_addr, 0, sizeof(newrange.max_addr));
+	newrange.flags	     = mr->range[0].flags | NF_NAT_RANGE_MAP_IPS;
+	newrange.min_addr.ip = new_ip;
+	newrange.max_addr.ip = new_ip;
+	newrange.min_proto   = mr->range[0].min;
+	newrange.max_proto   = mr->range[0].max;
 
 	/* Hand modified range to generic setup. */
 	return nf_nat_setup_info(ct, &newrange, HOOK2MANIP(par->hooknum));
diff --git a/net/ipv4/netfilter/ipt_REDIRECT.c b/net/ipv4/netfilter/ipt_REDIRECT.c
index 7c0103a..11407d7 100644
--- a/net/ipv4/netfilter/ipt_REDIRECT.c
+++ b/net/ipv4/netfilter/ipt_REDIRECT.c
@@ -19,7 +19,7 @@
 #include <net/checksum.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter/x_tables.h>
-#include <net/netfilter/nf_nat_rule.h>
+#include <net/netfilter/nf_nat.h>
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Netfilter Core Team <coreteam@netfilter.org>");
@@ -48,7 +48,7 @@ redirect_tg(struct sk_buff *skb, const struct xt_action_param *par)
 	enum ip_conntrack_info ctinfo;
 	__be32 newdst;
 	const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo;
-	struct nf_nat_ipv4_range newrange;
+	struct nf_nat_range newrange;
 
 	NF_CT_ASSERT(par->hooknum == NF_INET_PRE_ROUTING ||
 		     par->hooknum == NF_INET_LOCAL_OUT);
@@ -76,10 +76,13 @@ redirect_tg(struct sk_buff *skb, const struct xt_action_param *par)
 	}
 
 	/* Transfer from original range. */
-	newrange = ((struct nf_nat_ipv4_range)
-		{ mr->range[0].flags | NF_NAT_RANGE_MAP_IPS,
-		  newdst, newdst,
-		  mr->range[0].min, mr->range[0].max });
+	memset(&newrange.min_addr, 0, sizeof(newrange.min_addr));
+	memset(&newrange.max_addr, 0, sizeof(newrange.max_addr));
+	newrange.flags	     = mr->range[0].flags | NF_NAT_RANGE_MAP_IPS;
+	newrange.min_addr.ip = newdst;
+	newrange.max_addr.ip = newdst;
+	newrange.min_proto   = mr->range[0].min;
+	newrange.max_proto   = mr->range[0].max;
 
 	/* Hand modified range to generic setup. */
 	return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_DST);
diff --git a/net/ipv4/netfilter/nf_nat_standalone.c b/net/ipv4/netfilter/iptable_nat.c
similarity index 52%
rename from net/ipv4/netfilter/nf_nat_standalone.c
rename to net/ipv4/netfilter/iptable_nat.c
index 3828a42..9e0ffaf 100644
--- a/net/ipv4/netfilter/nf_nat_standalone.c
+++ b/net/ipv4/netfilter/iptable_nat.c
@@ -1,84 +1,71 @@
 /* (C) 1999-2001 Paul `Rusty' Russell
  * (C) 2002-2006 Netfilter Core Team <coreteam@netfilter.org>
+ * (C) 2011 Patrick McHardy <kaber@trash.net>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-#include <linux/types.h>
-#include <linux/icmp.h>
-#include <linux/gfp.h>
-#include <linux/ip.h>
+
+#include <linux/module.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter_ipv4.h>
-#include <linux/module.h>
-#include <linux/skbuff.h>
-#include <linux/proc_fs.h>
+#include <linux/netfilter_ipv4/ip_tables.h>
+#include <linux/ip.h>
 #include <net/ip.h>
-#include <net/checksum.h>
-#include <linux/spinlock.h>
 
-#include <net/netfilter/nf_conntrack.h>
-#include <net/netfilter/nf_conntrack_core.h>
-#include <net/netfilter/nf_conntrack_extend.h>
 #include <net/netfilter/nf_nat.h>
-#include <net/netfilter/nf_nat_rule.h>
-#include <net/netfilter/nf_nat_protocol.h>
 #include <net/netfilter/nf_nat_core.h>
-#include <net/netfilter/nf_nat_helper.h>
-#include <linux/netfilter_ipv4/ip_tables.h>
+#include <net/netfilter/nf_nat_l3proto.h>
+
+static const struct xt_table nf_nat_ipv4_table = {
+	.name		= "nat",
+	.valid_hooks	= (1 << NF_INET_PRE_ROUTING) |
+			  (1 << NF_INET_POST_ROUTING) |
+			  (1 << NF_INET_LOCAL_OUT) |
+			  (1 << NF_INET_LOCAL_IN),
+	.me		= THIS_MODULE,
+	.af		= NFPROTO_IPV4,
+};
 
-#ifdef CONFIG_XFRM
-static void nat_decode_session(struct sk_buff *skb, struct flowi *fl)
+static unsigned int alloc_null_binding(struct nf_conn *ct, unsigned int hooknum)
 {
-	struct flowi4 *fl4 = &fl->u.ip4;
-	const struct nf_conn *ct;
-	const struct nf_conntrack_tuple *t;
-	enum ip_conntrack_info ctinfo;
-	enum ip_conntrack_dir dir;
-	unsigned long statusbit;
-
-	ct = nf_ct_get(skb, &ctinfo);
-	if (ct == NULL)
-		return;
-	dir = CTINFO2DIR(ctinfo);
-	t = &ct->tuplehash[dir].tuple;
-
-	if (dir == IP_CT_DIR_ORIGINAL)
-		statusbit = IPS_DST_NAT;
-	else
-		statusbit = IPS_SRC_NAT;
-
-	if (ct->status & statusbit) {
-		fl4->daddr = t->dst.u3.ip;
-		if (t->dst.protonum == IPPROTO_TCP ||
-		    t->dst.protonum == IPPROTO_UDP ||
-		    t->dst.protonum == IPPROTO_UDPLITE ||
-		    t->dst.protonum == IPPROTO_DCCP ||
-		    t->dst.protonum == IPPROTO_SCTP)
-			fl4->fl4_dport = t->dst.u.tcp.port;
-	}
+	/* Force range to this IP; let proto decide mapping for
+	 * per-proto parts (hence not IP_NAT_RANGE_PROTO_SPECIFIED).
+	 */
+	struct nf_nat_range range;
+
+	range.flags = 0;
+	pr_debug("Allocating NULL binding for %p (%pI4)\n", ct,
+		 HOOK2MANIP(hooknum) == NF_NAT_MANIP_SRC ?
+		 &ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.ip :
+		 &ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.ip);
+
+	return nf_nat_setup_info(ct, &range, HOOK2MANIP(hooknum));
+}
 
-	statusbit ^= IPS_NAT_MASK;
+static unsigned int nf_nat_rule_find(struct sk_buff *skb, unsigned int hooknum,
+				     const struct net_device *in,
+				     const struct net_device *out,
+				     struct nf_conn *ct)
+{
+	struct net *net = nf_ct_net(ct);
+	unsigned int ret;
 
-	if (ct->status & statusbit) {
-		fl4->saddr = t->src.u3.ip;
-		if (t->dst.protonum == IPPROTO_TCP ||
-		    t->dst.protonum == IPPROTO_UDP ||
-		    t->dst.protonum == IPPROTO_UDPLITE ||
-		    t->dst.protonum == IPPROTO_DCCP ||
-		    t->dst.protonum == IPPROTO_SCTP)
-			fl4->fl4_sport = t->src.u.tcp.port;
+	ret = ipt_do_table(skb, hooknum, in, out, net->ipv4.nat_table);
+	if (ret == NF_ACCEPT) {
+		if (!nf_nat_initialized(ct, HOOK2MANIP(hooknum)))
+			ret = alloc_null_binding(ct, hooknum);
 	}
+	return ret;
 }
-#endif
 
 static unsigned int
-nf_nat_fn(unsigned int hooknum,
-	  struct sk_buff *skb,
-	  const struct net_device *in,
-	  const struct net_device *out,
-	  int (*okfn)(struct sk_buff *))
+nf_nat_ipv4_fn(unsigned int hooknum,
+	       struct sk_buff *skb,
+	       const struct net_device *in,
+	       const struct net_device *out,
+	       int (*okfn)(struct sk_buff *))
 {
 	struct nf_conn *ct;
 	enum ip_conntrack_info ctinfo;
@@ -87,14 +74,16 @@ nf_nat_fn(unsigned int hooknum,
 	enum nf_nat_manip_type maniptype = HOOK2MANIP(hooknum);
 
 	/* We never see fragments: conntrack defrags on pre-routing
-	   and local-out, and nf_nat_out protects post-routing. */
+	 * and local-out, and nf_nat_out protects post-routing.
+	 */
 	NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb)));
 
 	ct = nf_ct_get(skb, &ctinfo);
 	/* Can't track?  It's not due to stress, or conntrack would
-	   have dropped it.  Hence it's the user's responsibilty to
-	   packet filter it out, or implement conntrack/NAT for that
-	   protocol. 8) --RR */
+	 * have dropped it.  Hence it's the user's responsibilty to
+	 * packet filter it out, or implement conntrack/NAT for that
+	 * protocol. 8) --RR
+	 */
 	if (!ct)
 		return NF_ACCEPT;
 
@@ -118,17 +107,17 @@ nf_nat_fn(unsigned int hooknum,
 	case IP_CT_RELATED:
 	case IP_CT_RELATED_REPLY:
 		if (ip_hdr(skb)->protocol == IPPROTO_ICMP) {
-			if (!nf_nat_icmp_reply_translation(ct, ctinfo,
-							   hooknum, skb))
+			if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
+							   hooknum))
 				return NF_DROP;
 			else
 				return NF_ACCEPT;
 		}
 		/* Fall thru... (Only ICMPs can be IP_CT_IS_REPLY) */
 	case IP_CT_NEW:
-
 		/* Seen it before?  This can happen for loopback, retrans,
-		   or local packets.. */
+		 * or local packets.
+		 */
 		if (!nf_nat_initialized(ct, maniptype)) {
 			unsigned int ret;
 
@@ -151,16 +140,16 @@ nf_nat_fn(unsigned int hooknum,
 }
 
 static unsigned int
-nf_nat_in(unsigned int hooknum,
-	  struct sk_buff *skb,
-	  const struct net_device *in,
-	  const struct net_device *out,
-	  int (*okfn)(struct sk_buff *))
+nf_nat_ipv4_in(unsigned int hooknum,
+	       struct sk_buff *skb,
+	       const struct net_device *in,
+	       const struct net_device *out,
+	       int (*okfn)(struct sk_buff *))
 {
 	unsigned int ret;
 	__be32 daddr = ip_hdr(skb)->daddr;
 
-	ret = nf_nat_fn(hooknum, skb, in, out, okfn);
+	ret = nf_nat_ipv4_fn(hooknum, skb, in, out, okfn);
 	if (ret != NF_DROP && ret != NF_STOLEN &&
 	    daddr != ip_hdr(skb)->daddr)
 		skb_dst_drop(skb);
@@ -169,11 +158,11 @@ nf_nat_in(unsigned int hooknum,
 }
 
 static unsigned int
-nf_nat_out(unsigned int hooknum,
-	   struct sk_buff *skb,
-	   const struct net_device *in,
-	   const struct net_device *out,
-	   int (*okfn)(struct sk_buff *))
+nf_nat_ipv4_out(unsigned int hooknum,
+		struct sk_buff *skb,
+		const struct net_device *in,
+		const struct net_device *out,
+		int (*okfn)(struct sk_buff *))
 {
 #ifdef CONFIG_XFRM
 	const struct nf_conn *ct;
@@ -186,29 +175,30 @@ nf_nat_out(unsigned int hooknum,
 	    ip_hdrlen(skb) < sizeof(struct iphdr))
 		return NF_ACCEPT;
 
-	ret = nf_nat_fn(hooknum, skb, in, out, okfn);
+	ret = nf_nat_ipv4_fn(hooknum, skb, in, out, okfn);
 #ifdef CONFIG_XFRM
 	if (ret != NF_DROP && ret != NF_STOLEN &&
+	    !(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
 	    (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
 		enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
 
 		if ((ct->tuplehash[dir].tuple.src.u3.ip !=
 		     ct->tuplehash[!dir].tuple.dst.u3.ip) ||
 		    (ct->tuplehash[dir].tuple.src.u.all !=
-		     ct->tuplehash[!dir].tuple.dst.u.all)
-		   )
-			return ip_xfrm_me_harder(skb) == 0 ? ret : NF_DROP;
+		     ct->tuplehash[!dir].tuple.dst.u.all))
+			if (nf_xfrm_me_harder(skb, AF_INET) < 0)
+				ret = NF_DROP;
 	}
 #endif
 	return ret;
 }
 
 static unsigned int
-nf_nat_local_fn(unsigned int hooknum,
-		struct sk_buff *skb,
-		const struct net_device *in,
-		const struct net_device *out,
-		int (*okfn)(struct sk_buff *))
+nf_nat_ipv4_local_fn(unsigned int hooknum,
+		     struct sk_buff *skb,
+		     const struct net_device *in,
+		     const struct net_device *out,
+		     int (*okfn)(struct sk_buff *))
 {
 	const struct nf_conn *ct;
 	enum ip_conntrack_info ctinfo;
@@ -219,7 +209,7 @@ nf_nat_local_fn(unsigned int hooknum,
 	    ip_hdrlen(skb) < sizeof(struct iphdr))
 		return NF_ACCEPT;
 
-	ret = nf_nat_fn(hooknum, skb, in, out, okfn);
+	ret = nf_nat_ipv4_fn(hooknum, skb, in, out, okfn);
 	if (ret != NF_DROP && ret != NF_STOLEN &&
 	    (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
 		enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
@@ -230,21 +220,20 @@ nf_nat_local_fn(unsigned int hooknum,
 				ret = NF_DROP;
 		}
 #ifdef CONFIG_XFRM
-		else if (ct->tuplehash[dir].tuple.dst.u.all !=
+		else if (!(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
+			 ct->tuplehash[dir].tuple.dst.u.all !=
 			 ct->tuplehash[!dir].tuple.src.u.all)
-			if (ip_xfrm_me_harder(skb))
+			if (nf_xfrm_me_harder(skb, AF_INET) < 0)
 				ret = NF_DROP;
 #endif
 	}
 	return ret;
 }
 
-/* We must be after connection tracking and before packet filtering. */
-
-static struct nf_hook_ops nf_nat_ops[] __read_mostly = {
+static struct nf_hook_ops nf_nat_ipv4_ops[] __read_mostly = {
 	/* Before packet filtering, change destination */
 	{
-		.hook		= nf_nat_in,
+		.hook		= nf_nat_ipv4_in,
 		.owner		= THIS_MODULE,
 		.pf		= NFPROTO_IPV4,
 		.hooknum	= NF_INET_PRE_ROUTING,
@@ -252,7 +241,7 @@ static struct nf_hook_ops nf_nat_ops[] __read_mostly = {
 	},
 	/* After packet filtering, change source */
 	{
-		.hook		= nf_nat_out,
+		.hook		= nf_nat_ipv4_out,
 		.owner		= THIS_MODULE,
 		.pf		= NFPROTO_IPV4,
 		.hooknum	= NF_INET_POST_ROUTING,
@@ -260,7 +249,7 @@ static struct nf_hook_ops nf_nat_ops[] __read_mostly = {
 	},
 	/* Before packet filtering, change destination */
 	{
-		.hook		= nf_nat_local_fn,
+		.hook		= nf_nat_ipv4_local_fn,
 		.owner		= THIS_MODULE,
 		.pf		= NFPROTO_IPV4,
 		.hooknum	= NF_INET_LOCAL_OUT,
@@ -268,7 +257,7 @@ static struct nf_hook_ops nf_nat_ops[] __read_mostly = {
 	},
 	/* After packet filtering, change source */
 	{
-		.hook		= nf_nat_fn,
+		.hook		= nf_nat_ipv4_fn,
 		.owner		= THIS_MODULE,
 		.pf		= NFPROTO_IPV4,
 		.hooknum	= NF_INET_LOCAL_IN,
@@ -276,51 +265,56 @@ static struct nf_hook_ops nf_nat_ops[] __read_mostly = {
 	},
 };
 
-static int __init nf_nat_standalone_init(void)
+static int __net_init iptable_nat_net_init(struct net *net)
 {
-	int ret = 0;
+	struct ipt_replace *repl;
+
+	repl = ipt_alloc_initial_table(&nf_nat_ipv4_table);
+	if (repl == NULL)
+		return -ENOMEM;
+	net->ipv4.nat_table = ipt_register_table(net, &nf_nat_ipv4_table, repl);
+	kfree(repl);
+	if (IS_ERR(net->ipv4.nat_table))
+		return PTR_ERR(net->ipv4.nat_table);
+	return 0;
+}
 
-	need_ipv4_conntrack();
+static void __net_exit iptable_nat_net_exit(struct net *net)
+{
+	ipt_unregister_table(net, net->ipv4.nat_table);
+}
 
-#ifdef CONFIG_XFRM
-	BUG_ON(ip_nat_decode_session != NULL);
-	RCU_INIT_POINTER(ip_nat_decode_session, nat_decode_session);
-#endif
-	ret = nf_nat_rule_init();
-	if (ret < 0) {
-		pr_err("nf_nat_init: can't setup rules.\n");
-		goto cleanup_decode_session;
-	}
-	ret = nf_register_hooks(nf_nat_ops, ARRAY_SIZE(nf_nat_ops));
-	if (ret < 0) {
-		pr_err("nf_nat_init: can't register hooks.\n");
-		goto cleanup_rule_init;
-	}
-	return ret;
+static struct pernet_operations iptable_nat_net_ops = {
+	.init	= iptable_nat_net_init,
+	.exit	= iptable_nat_net_exit,
+};
 
- cleanup_rule_init:
-	nf_nat_rule_cleanup();
- cleanup_decode_session:
-#ifdef CONFIG_XFRM
-	RCU_INIT_POINTER(ip_nat_decode_session, NULL);
-	synchronize_net();
-#endif
-	return ret;
+static int __init iptable_nat_init(void)
+{
+	int err;
+
+	err = register_pernet_subsys(&iptable_nat_net_ops);
+	if (err < 0)
+		goto err1;
+
+	err = nf_register_hooks(nf_nat_ipv4_ops, ARRAY_SIZE(nf_nat_ipv4_ops));
+	if (err < 0)
+		goto err2;
+	return 0;
+
+err2:
+	unregister_pernet_subsys(&iptable_nat_net_ops);
+err1:
+	return err;
 }
 
-static void __exit nf_nat_standalone_fini(void)
+static void __exit iptable_nat_exit(void)
 {
-	nf_unregister_hooks(nf_nat_ops, ARRAY_SIZE(nf_nat_ops));
-	nf_nat_rule_cleanup();
-#ifdef CONFIG_XFRM
-	RCU_INIT_POINTER(ip_nat_decode_session, NULL);
-	synchronize_net();
-#endif
-	/* Conntrack caches are unregistered in nf_conntrack_cleanup */
+	nf_unregister_hooks(nf_nat_ipv4_ops, ARRAY_SIZE(nf_nat_ipv4_ops));
+	unregister_pernet_subsys(&iptable_nat_net_ops);
 }
 
-module_init(nf_nat_standalone_init);
-module_exit(nf_nat_standalone_fini);
+module_init(iptable_nat_init);
+module_exit(iptable_nat_exit);
 
 MODULE_LICENSE("GPL");
-MODULE_ALIAS("ip_nat");
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 4ada329..fcdd0c2 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -29,12 +29,6 @@
 #include <net/netfilter/ipv4/nf_defrag_ipv4.h>
 #include <net/netfilter/nf_log.h>
 
-int (*nf_nat_seq_adjust_hook)(struct sk_buff *skb,
-			      struct nf_conn *ct,
-			      enum ip_conntrack_info ctinfo,
-			      unsigned int protoff);
-EXPORT_SYMBOL_GPL(nf_nat_seq_adjust_hook);
-
 static bool ipv4_pkt_to_tuple(const struct sk_buff *skb, unsigned int nhoff,
 			      struct nf_conntrack_tuple *tuple)
 {
diff --git a/net/ipv4/netfilter/nf_nat_amanda.c b/net/ipv4/netfilter/nf_nat_amanda.c
index 75464b6..42d3378 100644
--- a/net/ipv4/netfilter/nf_nat_amanda.c
+++ b/net/ipv4/netfilter/nf_nat_amanda.c
@@ -16,7 +16,6 @@
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <net/netfilter/nf_nat_helper.h>
-#include <net/netfilter/nf_nat_rule.h>
 #include <linux/netfilter/nf_conntrack_amanda.h>
 
 MODULE_AUTHOR("Brian J. Murrell <netfilter@interlinx.bc.ca>");
diff --git a/net/ipv4/netfilter/nf_nat_ftp.c b/net/ipv4/netfilter/nf_nat_ftp.c
index 5589f3a..dd5e387 100644
--- a/net/ipv4/netfilter/nf_nat_ftp.c
+++ b/net/ipv4/netfilter/nf_nat_ftp.c
@@ -15,7 +15,6 @@
 #include <linux/netfilter_ipv4.h>
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_helper.h>
-#include <net/netfilter/nf_nat_rule.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <linux/netfilter/nf_conntrack_ftp.h>
diff --git a/net/ipv4/netfilter/nf_nat_h323.c b/net/ipv4/netfilter/nf_nat_h323.c
index d2c228d..9c3db10 100644
--- a/net/ipv4/netfilter/nf_nat_h323.c
+++ b/net/ipv4/netfilter/nf_nat_h323.c
@@ -15,7 +15,6 @@
 
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_helper.h>
-#include <net/netfilter/nf_nat_rule.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <linux/netfilter/nf_conntrack_h323.h>
@@ -392,7 +391,7 @@ static int nat_h245(struct sk_buff *skb, struct nf_conn *ct,
 static void ip_nat_q931_expect(struct nf_conn *new,
 			       struct nf_conntrack_expect *this)
 {
-	struct nf_nat_ipv4_range range;
+	struct nf_nat_range range;
 
 	if (this->tuple.src.u3.ip != 0) {	/* Only accept calls from GK */
 		nf_nat_follow_master(new, this);
@@ -404,14 +403,15 @@ static void ip_nat_q931_expect(struct nf_conn *new,
 
 	/* Change src to where master sends to */
 	range.flags = NF_NAT_RANGE_MAP_IPS;
-	range.min_ip = range.max_ip = new->tuplehash[!this->dir].tuple.src.u3.ip;
+	range.min_addr = range.max_addr =
+	    new->tuplehash[!this->dir].tuple.src.u3;
 	nf_nat_setup_info(new, &range, NF_NAT_MANIP_SRC);
 
 	/* For DST manip, map port here to where it's expected. */
 	range.flags = (NF_NAT_RANGE_MAP_IPS | NF_NAT_RANGE_PROTO_SPECIFIED);
-	range.min = range.max = this->saved_proto;
-	range.min_ip = range.max_ip =
-	    new->master->tuplehash[!this->dir].tuple.src.u3.ip;
+	range.min_proto = range.max_proto = this->saved_proto;
+	range.min_addr = range.max_addr =
+	    new->master->tuplehash[!this->dir].tuple.src.u3;
 	nf_nat_setup_info(new, &range, NF_NAT_MANIP_DST);
 }
 
@@ -490,20 +490,21 @@ static int nat_q931(struct sk_buff *skb, struct nf_conn *ct,
 static void ip_nat_callforwarding_expect(struct nf_conn *new,
 					 struct nf_conntrack_expect *this)
 {
-	struct nf_nat_ipv4_range range;
+	struct nf_nat_range range;
 
 	/* This must be a fresh one. */
 	BUG_ON(new->status & IPS_NAT_DONE_MASK);
 
 	/* Change src to where master sends to */
 	range.flags = NF_NAT_RANGE_MAP_IPS;
-	range.min_ip = range.max_ip = new->tuplehash[!this->dir].tuple.src.u3.ip;
+	range.min_addr = range.max_addr =
+	    new->tuplehash[!this->dir].tuple.src.u3;
 	nf_nat_setup_info(new, &range, NF_NAT_MANIP_SRC);
 
 	/* For DST manip, map port here to where it's expected. */
 	range.flags = (NF_NAT_RANGE_MAP_IPS | NF_NAT_RANGE_PROTO_SPECIFIED);
-	range.min = range.max = this->saved_proto;
-	range.min_ip = range.max_ip = this->saved_ip;
+	range.min_proto = range.max_proto = this->saved_proto;
+	range.min_addr = range.max_addr = this->saved_addr;
 	nf_nat_setup_info(new, &range, NF_NAT_MANIP_DST);
 }
 
@@ -519,7 +520,7 @@ static int nat_callforwarding(struct sk_buff *skb, struct nf_conn *ct,
 	u_int16_t nated_port;
 
 	/* Set expectations for NAT */
-	exp->saved_ip = exp->tuple.dst.u3.ip;
+	exp->saved_addr = exp->tuple.dst.u3;
 	exp->tuple.dst.u3.ip = ct->tuplehash[!dir].tuple.dst.u3.ip;
 	exp->saved_proto.tcp.port = exp->tuple.dst.u.tcp.port;
 	exp->expectfn = ip_nat_callforwarding_expect;
diff --git a/net/ipv4/netfilter/nf_nat_irc.c b/net/ipv4/netfilter/nf_nat_irc.c
index 5b0c20a..1ce37f8 100644
--- a/net/ipv4/netfilter/nf_nat_irc.c
+++ b/net/ipv4/netfilter/nf_nat_irc.c
@@ -17,7 +17,6 @@
 
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_helper.h>
-#include <net/netfilter/nf_nat_rule.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <linux/netfilter/nf_conntrack_irc.h>
diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
new file mode 100644
index 0000000..ede45fb
--- /dev/null
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -0,0 +1,281 @@
+/*
+ * (C) 1999-2001 Paul `Rusty' Russell
+ * (C) 2002-2006 Netfilter Core Team <coreteam@netfilter.org>
+ * (C) 2011 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/ip.h>
+#include <linux/icmp.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
+#include <net/secure_seq.h>
+#include <net/checksum.h>
+#include <net/route.h>
+#include <net/ip.h>
+
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_nat_core.h>
+#include <net/netfilter/nf_nat_l3proto.h>
+#include <net/netfilter/nf_nat_l4proto.h>
+
+static const struct nf_nat_l3proto nf_nat_l3proto_ipv4;
+
+#ifdef CONFIG_XFRM
+static void nf_nat_ipv4_decode_session(struct sk_buff *skb,
+				       const struct nf_conn *ct,
+				       enum ip_conntrack_dir dir,
+				       unsigned long statusbit,
+				       struct flowi *fl)
+{
+	const struct nf_conntrack_tuple *t = &ct->tuplehash[dir].tuple;
+	struct flowi4 *fl4 = &fl->u.ip4;
+
+	if (ct->status & statusbit) {
+		fl4->daddr = t->dst.u3.ip;
+		if (t->dst.protonum == IPPROTO_TCP ||
+		    t->dst.protonum == IPPROTO_UDP ||
+		    t->dst.protonum == IPPROTO_UDPLITE ||
+		    t->dst.protonum == IPPROTO_DCCP ||
+		    t->dst.protonum == IPPROTO_SCTP)
+			fl4->fl4_dport = t->dst.u.all;
+	}
+
+	statusbit ^= IPS_NAT_MASK;
+
+	if (ct->status & statusbit) {
+		fl4->saddr = t->src.u3.ip;
+		if (t->dst.protonum == IPPROTO_TCP ||
+		    t->dst.protonum == IPPROTO_UDP ||
+		    t->dst.protonum == IPPROTO_UDPLITE ||
+		    t->dst.protonum == IPPROTO_DCCP ||
+		    t->dst.protonum == IPPROTO_SCTP)
+			fl4->fl4_sport = t->src.u.all;
+	}
+}
+#endif /* CONFIG_XFRM */
+
+static int nf_nat_ipv4_in_range(const struct nf_conntrack_tuple *t,
+				const struct nf_nat_range *range)
+{
+	return ntohl(t->src.u3.ip) >= ntohl(range->min_addr.ip) &&
+	       ntohl(t->src.u3.ip) <= ntohl(range->max_addr.ip);
+}
+
+static u32 nf_nat_ipv4_secure_port(const struct nf_conntrack_tuple *t,
+				   __be16 dport)
+{
+	return secure_ipv4_port_ephemeral(t->src.u3.ip, t->dst.u3.ip, dport);
+}
+
+static bool nf_nat_ipv4_manip_pkt(struct sk_buff *skb,
+				  unsigned int iphdroff,
+				  const struct nf_nat_l4proto *l4proto,
+				  const struct nf_conntrack_tuple *target,
+				  enum nf_nat_manip_type maniptype)
+{
+	struct iphdr *iph;
+	unsigned int hdroff;
+
+	if (!skb_make_writable(skb, iphdroff + sizeof(*iph)))
+		return false;
+
+	iph = (void *)skb->data + iphdroff;
+	hdroff = iphdroff + iph->ihl * 4;
+
+	if (!l4proto->manip_pkt(skb, &nf_nat_l3proto_ipv4, iphdroff, hdroff,
+				target, maniptype))
+		return false;
+	iph = (void *)skb->data + iphdroff;
+
+	if (maniptype == NF_NAT_MANIP_SRC) {
+		csum_replace4(&iph->check, iph->saddr, target->src.u3.ip);
+		iph->saddr = target->src.u3.ip;
+	} else {
+		csum_replace4(&iph->check, iph->daddr, target->dst.u3.ip);
+		iph->daddr = target->dst.u3.ip;
+	}
+	return true;
+}
+
+static void nf_nat_ipv4_csum_update(struct sk_buff *skb,
+				    unsigned int iphdroff, __sum16 *check,
+				    const struct nf_conntrack_tuple *t,
+				    enum nf_nat_manip_type maniptype)
+{
+	struct iphdr *iph = (struct iphdr *)(skb->data + iphdroff);
+	__be32 oldip, newip;
+
+	if (maniptype == NF_NAT_MANIP_SRC) {
+		oldip = iph->saddr;
+		newip = t->src.u3.ip;
+	} else {
+		oldip = iph->daddr;
+		newip = t->dst.u3.ip;
+	}
+	inet_proto_csum_replace4(check, skb, oldip, newip, 1);
+}
+
+static void nf_nat_ipv4_csum_recalc(struct sk_buff *skb,
+				    u8 proto, void *data, __sum16 *check,
+				    int datalen, int oldlen)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	struct rtable *rt = skb_rtable(skb);
+
+	if (skb->ip_summed != CHECKSUM_PARTIAL) {
+		if (!(rt->rt_flags & RTCF_LOCAL) &&
+		    (!skb->dev || skb->dev->features & NETIF_F_V4_CSUM)) {
+			skb->ip_summed = CHECKSUM_PARTIAL;
+			skb->csum_start = skb_headroom(skb) +
+					  skb_network_offset(skb) +
+					  ip_hdrlen(skb);
+			skb->csum_offset = (void *)check - data;
+			*check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
+						    datalen, proto, 0);
+		} else {
+			*check = 0;
+			*check = csum_tcpudp_magic(iph->saddr, iph->daddr,
+						   datalen, proto,
+						   csum_partial(data, datalen,
+								0));
+			if (proto == IPPROTO_UDP && !*check)
+				*check = CSUM_MANGLED_0;
+		}
+	} else
+		inet_proto_csum_replace2(check, skb,
+					 htons(oldlen), htons(datalen), 1);
+}
+
+static int nf_nat_ipv4_nlattr_to_range(struct nlattr *tb[],
+				       struct nf_nat_range *range)
+{
+	if (tb[CTA_NAT_V4_MINIP]) {
+		range->min_addr.ip = nla_get_be32(tb[CTA_NAT_V4_MINIP]);
+		range->flags |= NF_NAT_RANGE_MAP_IPS;
+	}
+
+	if (tb[CTA_NAT_V4_MAXIP])
+		range->max_addr.ip = nla_get_be32(tb[CTA_NAT_V4_MAXIP]);
+	else
+		range->max_addr.ip = range->min_addr.ip;
+
+	return 0;
+}
+
+static const struct nf_nat_l3proto nf_nat_l3proto_ipv4 = {
+	.l3proto		= NFPROTO_IPV4,
+	.in_range		= nf_nat_ipv4_in_range,
+	.secure_port		= nf_nat_ipv4_secure_port,
+	.manip_pkt		= nf_nat_ipv4_manip_pkt,
+	.csum_update		= nf_nat_ipv4_csum_update,
+	.csum_recalc		= nf_nat_ipv4_csum_recalc,
+	.nlattr_to_range	= nf_nat_ipv4_nlattr_to_range,
+#ifdef CONFIG_XFRM
+	.decode_session		= nf_nat_ipv4_decode_session,
+#endif
+};
+
+int nf_nat_icmp_reply_translation(struct sk_buff *skb,
+				  struct nf_conn *ct,
+				  enum ip_conntrack_info ctinfo,
+				  unsigned int hooknum)
+{
+	struct {
+		struct icmphdr	icmp;
+		struct iphdr	ip;
+	} *inside;
+	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	enum nf_nat_manip_type manip = HOOK2MANIP(hooknum);
+	unsigned int hdrlen = ip_hdrlen(skb);
+	const struct nf_nat_l4proto *l4proto;
+	struct nf_conntrack_tuple target;
+	unsigned long statusbit;
+
+	NF_CT_ASSERT(ctinfo == IP_CT_RELATED || ctinfo == IP_CT_RELATED_REPLY);
+
+	if (!skb_make_writable(skb, hdrlen + sizeof(*inside)))
+		return 0;
+	if (nf_ip_checksum(skb, hooknum, hdrlen, 0))
+		return 0;
+
+	inside = (void *)skb->data + hdrlen;
+	if (inside->icmp.type == ICMP_REDIRECT) {
+		if ((ct->status & IPS_NAT_DONE_MASK) != IPS_NAT_DONE_MASK)
+			return 0;
+		if (ct->status & IPS_NAT_MASK)
+			return 0;
+	}
+
+	if (manip == NF_NAT_MANIP_SRC)
+		statusbit = IPS_SRC_NAT;
+	else
+		statusbit = IPS_DST_NAT;
+
+	/* Invert if this is reply direction */
+	if (dir == IP_CT_DIR_REPLY)
+		statusbit ^= IPS_NAT_MASK;
+
+	if (!(ct->status & statusbit))
+		return 1;
+
+	l4proto = __nf_nat_l4proto_find(NFPROTO_IPV4, inside->ip.protocol);
+	if (!nf_nat_ipv4_manip_pkt(skb, hdrlen + sizeof(inside->icmp),
+				   l4proto, &ct->tuplehash[!dir].tuple, !manip))
+		return 0;
+
+	if (skb->ip_summed != CHECKSUM_PARTIAL) {
+		/* Reloading "inside" here since manip_pkt may reallocate */
+		inside = (void *)skb->data + hdrlen;
+		inside->icmp.checksum = 0;
+		inside->icmp.checksum =
+			csum_fold(skb_checksum(skb, hdrlen,
+					       skb->len - hdrlen, 0));
+	}
+
+	/* Change outer to look like the reply to an incoming packet */
+	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
+	l4proto = __nf_nat_l4proto_find(NFPROTO_IPV4, 0);
+	if (!nf_nat_ipv4_manip_pkt(skb, 0, l4proto, &target, manip))
+		return 0;
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(nf_nat_icmp_reply_translation);
+
+static int __init nf_nat_l3proto_ipv4_init(void)
+{
+	int err;
+
+	err = nf_nat_l4proto_register(NFPROTO_IPV4, &nf_nat_l4proto_icmp);
+	if (err < 0)
+		goto err1;
+	err = nf_nat_l3proto_register(&nf_nat_l3proto_ipv4);
+	if (err < 0)
+		goto err2;
+	return err;
+
+err2:
+	nf_nat_l4proto_unregister(NFPROTO_IPV4, &nf_nat_l4proto_icmp);
+err1:
+	return err;
+}
+
+static void __exit nf_nat_l3proto_ipv4_exit(void)
+{
+	nf_nat_l3proto_unregister(&nf_nat_l3proto_ipv4);
+	nf_nat_l4proto_unregister(NFPROTO_IPV4, &nf_nat_l4proto_icmp);
+}
+
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("nf-nat-" __stringify(AF_INET));
+
+module_init(nf_nat_l3proto_ipv4_init);
+module_exit(nf_nat_l3proto_ipv4_exit);
diff --git a/net/ipv4/netfilter/nf_nat_pptp.c b/net/ipv4/netfilter/nf_nat_pptp.c
index 31ef890..a06d7d7 100644
--- a/net/ipv4/netfilter/nf_nat_pptp.c
+++ b/net/ipv4/netfilter/nf_nat_pptp.c
@@ -22,7 +22,6 @@
 
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_helper.h>
-#include <net/netfilter/nf_nat_rule.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <net/netfilter/nf_conntrack_zones.h>
@@ -47,7 +46,7 @@ static void pptp_nat_expected(struct nf_conn *ct,
 	struct nf_conntrack_tuple t;
 	const struct nf_ct_pptp_master *ct_pptp_info;
 	const struct nf_nat_pptp *nat_pptp_info;
-	struct nf_nat_ipv4_range range;
+	struct nf_nat_range range;
 
 	ct_pptp_info = nfct_help_data(master);
 	nat_pptp_info = &nfct_nat(master)->help.nat_pptp_info;
@@ -89,21 +88,21 @@ static void pptp_nat_expected(struct nf_conn *ct,
 
 	/* Change src to where master sends to */
 	range.flags = NF_NAT_RANGE_MAP_IPS;
-	range.min_ip = range.max_ip
-		= ct->master->tuplehash[!exp->dir].tuple.dst.u3.ip;
+	range.min_addr = range.max_addr
+		= ct->master->tuplehash[!exp->dir].tuple.dst.u3;
 	if (exp->dir == IP_CT_DIR_ORIGINAL) {
 		range.flags |= NF_NAT_RANGE_PROTO_SPECIFIED;
-		range.min = range.max = exp->saved_proto;
+		range.min_proto = range.max_proto = exp->saved_proto;
 	}
 	nf_nat_setup_info(ct, &range, NF_NAT_MANIP_SRC);
 
 	/* For DST manip, map port here to where it's expected. */
 	range.flags = NF_NAT_RANGE_MAP_IPS;
-	range.min_ip = range.max_ip
-		= ct->master->tuplehash[!exp->dir].tuple.src.u3.ip;
+	range.min_addr = range.max_addr
+		= ct->master->tuplehash[!exp->dir].tuple.src.u3;
 	if (exp->dir == IP_CT_DIR_REPLY) {
 		range.flags |= NF_NAT_RANGE_PROTO_SPECIFIED;
-		range.min = range.max = exp->saved_proto;
+		range.min_proto = range.max_proto = exp->saved_proto;
 	}
 	nf_nat_setup_info(ct, &range, NF_NAT_MANIP_DST);
 }
diff --git a/net/ipv4/netfilter/nf_nat_proto_gre.c b/net/ipv4/netfilter/nf_nat_proto_gre.c
index 46ba0b9..ea44f02 100644
--- a/net/ipv4/netfilter/nf_nat_proto_gre.c
+++ b/net/ipv4/netfilter/nf_nat_proto_gre.c
@@ -28,8 +28,7 @@
 #include <linux/ip.h>
 
 #include <net/netfilter/nf_nat.h>
-#include <net/netfilter/nf_nat_rule.h>
-#include <net/netfilter/nf_nat_protocol.h>
+#include <net/netfilter/nf_nat_l4proto.h>
 #include <linux/netfilter/nf_conntrack_proto_gre.h>
 
 MODULE_LICENSE("GPL");
@@ -38,8 +37,9 @@ MODULE_DESCRIPTION("Netfilter NAT protocol helper module for GRE");
 
 /* generate unique tuple ... */
 static void
-gre_unique_tuple(struct nf_conntrack_tuple *tuple,
-		 const struct nf_nat_ipv4_range *range,
+gre_unique_tuple(const struct nf_nat_l3proto *l3proto,
+		 struct nf_conntrack_tuple *tuple,
+		 const struct nf_nat_range *range,
 		 enum nf_nat_manip_type maniptype,
 		 const struct nf_conn *ct)
 {
@@ -62,8 +62,8 @@ gre_unique_tuple(struct nf_conntrack_tuple *tuple,
 		min = 1;
 		range_size = 0xffff;
 	} else {
-		min = ntohs(range->min.gre.key);
-		range_size = ntohs(range->max.gre.key) - min + 1;
+		min = ntohs(range->min_proto.gre.key);
+		range_size = ntohs(range->max_proto.gre.key) - min + 1;
 	}
 
 	pr_debug("min = %u, range_size = %u\n", min, range_size);
@@ -80,14 +80,14 @@ gre_unique_tuple(struct nf_conntrack_tuple *tuple,
 
 /* manipulate a GRE packet according to maniptype */
 static bool
-gre_manip_pkt(struct sk_buff *skb, unsigned int iphdroff,
+gre_manip_pkt(struct sk_buff *skb,
+	      const struct nf_nat_l3proto *l3proto,
+	      unsigned int iphdroff, unsigned int hdroff,
 	      const struct nf_conntrack_tuple *tuple,
 	      enum nf_nat_manip_type maniptype)
 {
 	const struct gre_hdr *greh;
 	struct gre_hdr_pptp *pgreh;
-	const struct iphdr *iph = (struct iphdr *)(skb->data + iphdroff);
-	unsigned int hdroff = iphdroff + iph->ihl * 4;
 
 	/* pgreh includes two optional 32bit fields which are not required
 	 * to be there.  That's where the magic '8' comes from */
@@ -117,24 +117,24 @@ gre_manip_pkt(struct sk_buff *skb, unsigned int iphdroff,
 	return true;
 }
 
-static const struct nf_nat_protocol gre = {
-	.protonum		= IPPROTO_GRE,
+static const struct nf_nat_l4proto gre = {
+	.l4proto		= IPPROTO_GRE,
 	.manip_pkt		= gre_manip_pkt,
-	.in_range		= nf_nat_proto_in_range,
+	.in_range		= nf_nat_l4proto_in_range,
 	.unique_tuple		= gre_unique_tuple,
 #if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
-	.nlattr_to_range	= nf_nat_proto_nlattr_to_range,
+	.nlattr_to_range	= nf_nat_l4proto_nlattr_to_range,
 #endif
 };
 
 static int __init nf_nat_proto_gre_init(void)
 {
-	return nf_nat_protocol_register(&gre);
+	return nf_nat_l4proto_register(NFPROTO_IPV4, &gre);
 }
 
 static void __exit nf_nat_proto_gre_fini(void)
 {
-	nf_nat_protocol_unregister(&gre);
+	nf_nat_l4proto_unregister(NFPROTO_IPV4, &gre);
 }
 
 module_init(nf_nat_proto_gre_init);
diff --git a/net/ipv4/netfilter/nf_nat_proto_icmp.c b/net/ipv4/netfilter/nf_nat_proto_icmp.c
index b351728..eb30347 100644
--- a/net/ipv4/netfilter/nf_nat_proto_icmp.c
+++ b/net/ipv4/netfilter/nf_nat_proto_icmp.c
@@ -15,8 +15,7 @@
 #include <linux/netfilter.h>
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_core.h>
-#include <net/netfilter/nf_nat_rule.h>
-#include <net/netfilter/nf_nat_protocol.h>
+#include <net/netfilter/nf_nat_l4proto.h>
 
 static bool
 icmp_in_range(const struct nf_conntrack_tuple *tuple,
@@ -29,8 +28,9 @@ icmp_in_range(const struct nf_conntrack_tuple *tuple,
 }
 
 static void
-icmp_unique_tuple(struct nf_conntrack_tuple *tuple,
-		  const struct nf_nat_ipv4_range *range,
+icmp_unique_tuple(const struct nf_nat_l3proto *l3proto,
+		  struct nf_conntrack_tuple *tuple,
+		  const struct nf_nat_range *range,
 		  enum nf_nat_manip_type maniptype,
 		  const struct nf_conn *ct)
 {
@@ -38,13 +38,14 @@ icmp_unique_tuple(struct nf_conntrack_tuple *tuple,
 	unsigned int range_size;
 	unsigned int i;
 
-	range_size = ntohs(range->max.icmp.id) - ntohs(range->min.icmp.id) + 1;
+	range_size = ntohs(range->max_proto.icmp.id) -
+		     ntohs(range->min_proto.icmp.id) + 1;
 	/* If no range specified... */
 	if (!(range->flags & NF_NAT_RANGE_PROTO_SPECIFIED))
 		range_size = 0xFFFF;
 
 	for (i = 0; ; ++id) {
-		tuple->src.u.icmp.id = htons(ntohs(range->min.icmp.id) +
+		tuple->src.u.icmp.id = htons(ntohs(range->min_proto.icmp.id) +
 					     (id % range_size));
 		if (++i == range_size || !nf_nat_used_tuple(tuple, ct))
 			return;
@@ -54,13 +55,12 @@ icmp_unique_tuple(struct nf_conntrack_tuple *tuple,
 
 static bool
 icmp_manip_pkt(struct sk_buff *skb,
-	       unsigned int iphdroff,
+	       const struct nf_nat_l3proto *l3proto,
+	       unsigned int iphdroff, unsigned int hdroff,
 	       const struct nf_conntrack_tuple *tuple,
 	       enum nf_nat_manip_type maniptype)
 {
-	const struct iphdr *iph = (struct iphdr *)(skb->data + iphdroff);
 	struct icmphdr *hdr;
-	unsigned int hdroff = iphdroff + iph->ihl*4;
 
 	if (!skb_make_writable(skb, hdroff + sizeof(*hdr)))
 		return false;
@@ -72,12 +72,12 @@ icmp_manip_pkt(struct sk_buff *skb,
 	return true;
 }
 
-const struct nf_nat_protocol nf_nat_protocol_icmp = {
-	.protonum		= IPPROTO_ICMP,
+const struct nf_nat_l4proto nf_nat_l4proto_icmp = {
+	.l4proto		= IPPROTO_ICMP,
 	.manip_pkt		= icmp_manip_pkt,
 	.in_range		= icmp_in_range,
 	.unique_tuple		= icmp_unique_tuple,
 #if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
-	.nlattr_to_range	= nf_nat_proto_nlattr_to_range,
+	.nlattr_to_range	= nf_nat_l4proto_nlattr_to_range,
 #endif
 };
diff --git a/net/ipv4/netfilter/nf_nat_rule.c b/net/ipv4/netfilter/nf_nat_rule.c
deleted file mode 100644
index d2a9dc3..0000000
--- a/net/ipv4/netfilter/nf_nat_rule.c
+++ /dev/null
@@ -1,214 +0,0 @@
-/* (C) 1999-2001 Paul `Rusty' Russell
- * (C) 2002-2006 Netfilter Core Team <coreteam@netfilter.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-/* Everything about the rules for NAT. */
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#include <linux/types.h>
-#include <linux/ip.h>
-#include <linux/netfilter.h>
-#include <linux/netfilter_ipv4.h>
-#include <linux/module.h>
-#include <linux/kmod.h>
-#include <linux/skbuff.h>
-#include <linux/proc_fs.h>
-#include <linux/slab.h>
-#include <net/checksum.h>
-#include <net/route.h>
-#include <linux/bitops.h>
-
-#include <linux/netfilter_ipv4/ip_tables.h>
-#include <net/netfilter/nf_nat.h>
-#include <net/netfilter/nf_nat_core.h>
-#include <net/netfilter/nf_nat_rule.h>
-
-#define NAT_VALID_HOOKS ((1 << NF_INET_PRE_ROUTING) | \
-			 (1 << NF_INET_POST_ROUTING) | \
-			 (1 << NF_INET_LOCAL_OUT) | \
-			 (1 << NF_INET_LOCAL_IN))
-
-static const struct xt_table nat_table = {
-	.name		= "nat",
-	.valid_hooks	= NAT_VALID_HOOKS,
-	.me		= THIS_MODULE,
-	.af		= NFPROTO_IPV4,
-};
-
-/* Source NAT */
-static unsigned int
-ipt_snat_target(struct sk_buff *skb, const struct xt_action_param *par)
-{
-	struct nf_conn *ct;
-	enum ip_conntrack_info ctinfo;
-	const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo;
-
-	NF_CT_ASSERT(par->hooknum == NF_INET_POST_ROUTING ||
-		     par->hooknum == NF_INET_LOCAL_IN);
-
-	ct = nf_ct_get(skb, &ctinfo);
-
-	/* Connection must be valid and new. */
-	NF_CT_ASSERT(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED ||
-			    ctinfo == IP_CT_RELATED_REPLY));
-	NF_CT_ASSERT(par->out != NULL);
-
-	return nf_nat_setup_info(ct, &mr->range[0], NF_NAT_MANIP_SRC);
-}
-
-static unsigned int
-ipt_dnat_target(struct sk_buff *skb, const struct xt_action_param *par)
-{
-	struct nf_conn *ct;
-	enum ip_conntrack_info ctinfo;
-	const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo;
-
-	NF_CT_ASSERT(par->hooknum == NF_INET_PRE_ROUTING ||
-		     par->hooknum == NF_INET_LOCAL_OUT);
-
-	ct = nf_ct_get(skb, &ctinfo);
-
-	/* Connection must be valid and new. */
-	NF_CT_ASSERT(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED));
-
-	return nf_nat_setup_info(ct, &mr->range[0], NF_NAT_MANIP_DST);
-}
-
-static int ipt_snat_checkentry(const struct xt_tgchk_param *par)
-{
-	const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo;
-
-	/* Must be a valid range */
-	if (mr->rangesize != 1) {
-		pr_info("SNAT: multiple ranges no longer supported\n");
-		return -EINVAL;
-	}
-	return 0;
-}
-
-static int ipt_dnat_checkentry(const struct xt_tgchk_param *par)
-{
-	const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo;
-
-	/* Must be a valid range */
-	if (mr->rangesize != 1) {
-		pr_info("DNAT: multiple ranges no longer supported\n");
-		return -EINVAL;
-	}
-	return 0;
-}
-
-static unsigned int
-alloc_null_binding(struct nf_conn *ct, unsigned int hooknum)
-{
-	/* Force range to this IP; let proto decide mapping for
-	   per-proto parts (hence not NF_NAT_RANGE_PROTO_SPECIFIED).
-	*/
-	struct nf_nat_ipv4_range range;
-
-	range.flags = 0;
-	pr_debug("Allocating NULL binding for %p (%pI4)\n", ct,
-		 HOOK2MANIP(hooknum) == NF_NAT_MANIP_SRC ?
-		 &ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.ip :
-		 &ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.ip);
-
-	return nf_nat_setup_info(ct, &range, HOOK2MANIP(hooknum));
-}
-
-int nf_nat_rule_find(struct sk_buff *skb,
-		     unsigned int hooknum,
-		     const struct net_device *in,
-		     const struct net_device *out,
-		     struct nf_conn *ct)
-{
-	struct net *net = nf_ct_net(ct);
-	int ret;
-
-	ret = ipt_do_table(skb, hooknum, in, out, net->ipv4.nat_table);
-
-	if (ret == NF_ACCEPT) {
-		if (!nf_nat_initialized(ct, HOOK2MANIP(hooknum)))
-			/* NUL mapping */
-			ret = alloc_null_binding(ct, hooknum);
-	}
-	return ret;
-}
-
-static struct xt_target ipt_snat_reg __read_mostly = {
-	.name		= "SNAT",
-	.target		= ipt_snat_target,
-	.targetsize	= sizeof(struct nf_nat_ipv4_multi_range_compat),
-	.table		= "nat",
-	.hooks		= (1 << NF_INET_POST_ROUTING) | (1 << NF_INET_LOCAL_IN),
-	.checkentry	= ipt_snat_checkentry,
-	.family		= AF_INET,
-};
-
-static struct xt_target ipt_dnat_reg __read_mostly = {
-	.name		= "DNAT",
-	.target		= ipt_dnat_target,
-	.targetsize	= sizeof(struct nf_nat_ipv4_multi_range_compat),
-	.table		= "nat",
-	.hooks		= (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_OUT),
-	.checkentry	= ipt_dnat_checkentry,
-	.family		= AF_INET,
-};
-
-static int __net_init nf_nat_rule_net_init(struct net *net)
-{
-	struct ipt_replace *repl;
-
-	repl = ipt_alloc_initial_table(&nat_table);
-	if (repl == NULL)
-		return -ENOMEM;
-	net->ipv4.nat_table = ipt_register_table(net, &nat_table, repl);
-	kfree(repl);
-	if (IS_ERR(net->ipv4.nat_table))
-		return PTR_ERR(net->ipv4.nat_table);
-	return 0;
-}
-
-static void __net_exit nf_nat_rule_net_exit(struct net *net)
-{
-	ipt_unregister_table(net, net->ipv4.nat_table);
-}
-
-static struct pernet_operations nf_nat_rule_net_ops = {
-	.init = nf_nat_rule_net_init,
-	.exit = nf_nat_rule_net_exit,
-};
-
-int __init nf_nat_rule_init(void)
-{
-	int ret;
-
-	ret = register_pernet_subsys(&nf_nat_rule_net_ops);
-	if (ret != 0)
-		goto out;
-	ret = xt_register_target(&ipt_snat_reg);
-	if (ret != 0)
-		goto unregister_table;
-
-	ret = xt_register_target(&ipt_dnat_reg);
-	if (ret != 0)
-		goto unregister_snat;
-
-	return ret;
-
- unregister_snat:
-	xt_unregister_target(&ipt_snat_reg);
- unregister_table:
-	unregister_pernet_subsys(&nf_nat_rule_net_ops);
- out:
-	return ret;
-}
-
-void nf_nat_rule_cleanup(void)
-{
-	xt_unregister_target(&ipt_dnat_reg);
-	xt_unregister_target(&ipt_snat_reg);
-	unregister_pernet_subsys(&nf_nat_rule_net_ops);
-}
diff --git a/net/ipv4/netfilter/nf_nat_sip.c b/net/ipv4/netfilter/nf_nat_sip.c
index df626af..47a4718 100644
--- a/net/ipv4/netfilter/nf_nat_sip.c
+++ b/net/ipv4/netfilter/nf_nat_sip.c
@@ -19,7 +19,6 @@
 
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_helper.h>
-#include <net/netfilter/nf_nat_rule.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <linux/netfilter/nf_conntrack_sip.h>
@@ -255,15 +254,15 @@ static void ip_nat_sip_seq_adjust(struct sk_buff *skb, s16 off)
 static void ip_nat_sip_expected(struct nf_conn *ct,
 				struct nf_conntrack_expect *exp)
 {
-	struct nf_nat_ipv4_range range;
+	struct nf_nat_range range;
 
 	/* This must be a fresh one. */
 	BUG_ON(ct->status & IPS_NAT_DONE_MASK);
 
 	/* For DST manip, map port here to where it's expected. */
 	range.flags = (NF_NAT_RANGE_MAP_IPS | NF_NAT_RANGE_PROTO_SPECIFIED);
-	range.min = range.max = exp->saved_proto;
-	range.min_ip = range.max_ip = exp->saved_ip;
+	range.min_proto = range.max_proto = exp->saved_proto;
+	range.min_addr = range.max_addr = exp->saved_addr;
 	nf_nat_setup_info(ct, &range, NF_NAT_MANIP_DST);
 
 	/* Change src to where master sends to, but only if the connection
@@ -271,8 +270,8 @@ static void ip_nat_sip_expected(struct nf_conn *ct,
 	if (ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip ==
 	    ct->master->tuplehash[exp->dir].tuple.src.u3.ip) {
 		range.flags = NF_NAT_RANGE_MAP_IPS;
-		range.min_ip = range.max_ip
-			= ct->master->tuplehash[!exp->dir].tuple.dst.u3.ip;
+		range.min_addr = range.max_addr
+			= ct->master->tuplehash[!exp->dir].tuple.dst.u3;
 		nf_nat_setup_info(ct, &range, NF_NAT_MANIP_SRC);
 	}
 }
@@ -307,7 +306,7 @@ static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int protoff,
 	else
 		port = ntohs(exp->tuple.dst.u.udp.port);
 
-	exp->saved_ip = exp->tuple.dst.u3.ip;
+	exp->saved_addr = exp->tuple.dst.u3;
 	exp->tuple.dst.u3.ip = newip;
 	exp->saved_proto.udp.port = exp->tuple.dst.u.udp.port;
 	exp->dir = !dir;
@@ -329,7 +328,7 @@ static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int protoff,
 	if (port == 0)
 		return NF_DROP;
 
-	if (exp->tuple.dst.u3.ip != exp->saved_ip ||
+	if (exp->tuple.dst.u3.ip != exp->saved_addr.ip ||
 	    exp->tuple.dst.u.udp.port != exp->saved_proto.udp.port) {
 		buflen = sprintf(buffer, "%pI4:%u", &newip, port);
 		if (!mangle_packet(skb, protoff, dataoff, dptr, datalen,
@@ -485,13 +484,13 @@ static unsigned int ip_nat_sdp_media(struct sk_buff *skb, unsigned int protoff,
 	else
 		rtp_addr->ip = ct->tuplehash[!dir].tuple.dst.u3.ip;
 
-	rtp_exp->saved_ip = rtp_exp->tuple.dst.u3.ip;
+	rtp_exp->saved_addr = rtp_exp->tuple.dst.u3;
 	rtp_exp->tuple.dst.u3.ip = rtp_addr->ip;
 	rtp_exp->saved_proto.udp.port = rtp_exp->tuple.dst.u.udp.port;
 	rtp_exp->dir = !dir;
 	rtp_exp->expectfn = ip_nat_sip_expected;
 
-	rtcp_exp->saved_ip = rtcp_exp->tuple.dst.u3.ip;
+	rtcp_exp->saved_addr = rtcp_exp->tuple.dst.u3;
 	rtcp_exp->tuple.dst.u3.ip = rtp_addr->ip;
 	rtcp_exp->saved_proto.udp.port = rtcp_exp->tuple.dst.u.udp.port;
 	rtcp_exp->dir = !dir;
diff --git a/net/ipv4/netfilter/nf_nat_tftp.c b/net/ipv4/netfilter/nf_nat_tftp.c
index 9dbb8d2..ccabbda 100644
--- a/net/ipv4/netfilter/nf_nat_tftp.c
+++ b/net/ipv4/netfilter/nf_nat_tftp.c
@@ -11,7 +11,6 @@
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <net/netfilter/nf_nat_helper.h>
-#include <net/netfilter/nf_nat_rule.h>
 #include <linux/netfilter/nf_conntrack_tftp.h>
 
 MODULE_AUTHOR("Magnus Boden <mb@ozaba.mine.nu>");
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c19b214..91addda 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -356,6 +356,30 @@ config NETFILTER_NETLINK_QUEUE_CT
 	  If this option is enabled, NFQUEUE can include Connection Tracking
 	  information together with the packet is the enqueued via NFNETLINK.
 
+config NF_NAT
+	tristate
+
+config NF_NAT_NEEDED
+	bool
+	depends on NF_NAT
+	default y
+
+config NF_NAT_PROTO_DCCP
+	tristate
+	depends on NF_NAT && NF_CT_PROTO_DCCP
+	default NF_NAT && NF_CT_PROTO_DCCP
+
+config NF_NAT_PROTO_UDPLITE
+	tristate
+	depends on NF_NAT && NF_CT_PROTO_UDPLITE
+	default NF_NAT && NF_CT_PROTO_UDPLITE
+
+config NF_NAT_PROTO_SCTP
+	tristate
+	default NF_NAT && NF_CT_PROTO_SCTP
+	depends on NF_NAT && NF_CT_PROTO_SCTP
+	select LIBCRC32C
+
 endif # NF_CONNTRACK
 
 # transparent proxy support
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 1c5160f..09c9451 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -43,6 +43,17 @@ obj-$(CONFIG_NF_CONNTRACK_SANE) += nf_conntrack_sane.o
 obj-$(CONFIG_NF_CONNTRACK_SIP) += nf_conntrack_sip.o
 obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o
 
+nf_nat-y	:= nf_nat_core.o nf_nat_proto_unknown.o nf_nat_proto_common.o \
+		   nf_nat_proto_udp.o nf_nat_proto_tcp.o nf_nat_helper.o
+
+obj-$(CONFIG_NF_NAT) += nf_nat.o
+obj-$(CONFIG_NF_NAT) += xt_nat.o
+
+# NAT protocols (nf_nat)
+obj-$(CONFIG_NF_NAT_PROTO_DCCP) += nf_nat_proto_dccp.o
+obj-$(CONFIG_NF_NAT_PROTO_UDPLITE) += nf_nat_proto_udplite.o
+obj-$(CONFIG_NF_NAT_PROTO_SCTP) += nf_nat_proto_sctp.o
+
 # transparent proxy support
 obj-$(CONFIG_NETFILTER_TPROXY) += nf_tproxy_core.o
 
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 0bc6b60..4ff0a60 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -273,6 +273,11 @@ EXPORT_SYMBOL_GPL(nfq_ct_nat_hook);
 
 #endif /* CONFIG_NF_CONNTRACK */
 
+#ifdef CONFIG_NF_NAT_NEEDED
+void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *);
+EXPORT_SYMBOL(nf_nat_decode_session_hook);
+#endif
+
 #ifdef CONFIG_PROC_FS
 struct proc_dir_entry *proc_net_netfilter;
 EXPORT_SYMBOL(proc_net_netfilter);
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index cf48755..f83e79d 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -55,6 +55,12 @@ int (*nfnetlink_parse_nat_setup_hook)(struct nf_conn *ct,
 				      const struct nlattr *attr) __read_mostly;
 EXPORT_SYMBOL_GPL(nfnetlink_parse_nat_setup_hook);
 
+int (*nf_nat_seq_adjust_hook)(struct sk_buff *skb,
+			      struct nf_conn *ct,
+			      enum ip_conntrack_info ctinfo,
+			      unsigned int protoff);
+EXPORT_SYMBOL_GPL(nf_nat_seq_adjust_hook);
+
 DEFINE_SPINLOCK(nf_conntrack_lock);
 EXPORT_SYMBOL_GPL(nf_conntrack_lock);
 
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 14f67a2..7545506 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -45,7 +45,7 @@
 #include <net/netfilter/nf_conntrack_timestamp.h>
 #ifdef CONFIG_NF_NAT_NEEDED
 #include <net/netfilter/nf_nat_core.h>
-#include <net/netfilter/nf_nat_protocol.h>
+#include <net/netfilter/nf_nat_l4proto.h>
 #include <net/netfilter/nf_nat_helper.h>
 #endif
 
@@ -1096,13 +1096,14 @@ ctnetlink_parse_nat_setup(struct nf_conn *ct,
 			  const struct nlattr *attr)
 {
 	typeof(nfnetlink_parse_nat_setup_hook) parse_nat_setup;
+	int err;
 
 	parse_nat_setup = rcu_dereference(nfnetlink_parse_nat_setup_hook);
 	if (!parse_nat_setup) {
 #ifdef CONFIG_MODULES
 		rcu_read_unlock();
 		nfnl_unlock();
-		if (request_module("nf-nat-ipv4") < 0) {
+		if (request_module("nf-nat") < 0) {
 			nfnl_lock();
 			rcu_read_lock();
 			return -EOPNOTSUPP;
@@ -1115,7 +1116,26 @@ ctnetlink_parse_nat_setup(struct nf_conn *ct,
 		return -EOPNOTSUPP;
 	}
 
-	return parse_nat_setup(ct, manip, attr);
+	err = parse_nat_setup(ct, manip, attr);
+	if (err == -EAGAIN) {
+#ifdef CONFIG_MODULES
+		rcu_read_unlock();
+		spin_unlock_bh(&nf_conntrack_lock);
+		nfnl_unlock();
+		if (request_module("nf-nat-%u", nf_ct_l3num(ct)) < 0) {
+			nfnl_lock();
+			spin_lock_bh(&nf_conntrack_lock);
+			rcu_read_lock();
+			return -EOPNOTSUPP;
+		}
+		nfnl_lock();
+		spin_lock_bh(&nf_conntrack_lock);
+		rcu_read_lock();
+#else
+		err = -EOPNOTSUPP;
+#endif
+	}
+	return err;
 }
 #endif
 
@@ -1974,6 +1994,8 @@ nla_put_failure:
 	return -1;
 }
 
+static const union nf_inet_addr any_addr;
+
 static int
 ctnetlink_exp_dump_expect(struct sk_buff *skb,
 			  const struct nf_conntrack_expect *exp)
@@ -2000,7 +2022,8 @@ ctnetlink_exp_dump_expect(struct sk_buff *skb,
 		goto nla_put_failure;
 
 #ifdef CONFIG_NF_NAT_NEEDED
-	if (exp->saved_ip || exp->saved_proto.all) {
+	if (!nf_inet_addr_cmp(&exp->saved_addr, &any_addr) ||
+	    exp->saved_proto.all) {
 		nest_parms = nla_nest_start(skb, CTA_EXPECT_NAT | NLA_F_NESTED);
 		if (!nest_parms)
 			goto nla_put_failure;
@@ -2009,7 +2032,7 @@ ctnetlink_exp_dump_expect(struct sk_buff *skb,
 			goto nla_put_failure;
 
 		nat_tuple.src.l3num = nf_ct_l3num(master);
-		nat_tuple.src.u3.ip = exp->saved_ip;
+		nat_tuple.src.u3 = exp->saved_addr;
 		nat_tuple.dst.protonum = nf_ct_protonum(master);
 		nat_tuple.src.u = exp->saved_proto;
 
@@ -2405,7 +2428,7 @@ ctnetlink_parse_expect_nat(const struct nlattr *attr,
 	if (err < 0)
 		return err;
 
-	exp->saved_ip = nat_tuple.src.u3.ip;
+	exp->saved_addr = nat_tuple.src.u3;
 	exp->saved_proto = nat_tuple.src.u;
 	exp->dir = ntohl(nla_get_be32(tb[CTA_EXPECT_NAT_DIR]));
 
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index a5ac11e..9c2cc71 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -505,10 +505,10 @@ static inline s16 nat_offset(const struct nf_conn *ct,
 
 	return get_offset != NULL ? get_offset(ct, dir, seq) : 0;
 }
-#define NAT_OFFSET(pf, ct, dir, seq) \
-	(pf == NFPROTO_IPV4 ? nat_offset(ct, dir, seq) : 0)
+#define NAT_OFFSET(ct, dir, seq) \
+	(nat_offset(ct, dir, seq))
 #else
-#define NAT_OFFSET(pf, ct, dir, seq)	0
+#define NAT_OFFSET(ct, dir, seq)	0
 #endif
 
 static bool tcp_in_window(const struct nf_conn *ct,
@@ -541,7 +541,7 @@ static bool tcp_in_window(const struct nf_conn *ct,
 		tcp_sack(skb, dataoff, tcph, &sack);
 
 	/* Take into account NAT sequence number mangling */
-	receiver_offset = NAT_OFFSET(pf, ct, !dir, ack - 1);
+	receiver_offset = NAT_OFFSET(ct, !dir, ack - 1);
 	ack -= receiver_offset;
 	sack -= receiver_offset;
 
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 590f0ab..d517490 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -946,11 +946,11 @@ static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int protoff,
 			break;
 #ifdef CONFIG_NF_NAT_NEEDED
 		if (exp->tuple.src.l3num == AF_INET && !direct_rtp &&
-		    (exp->saved_ip != exp->tuple.dst.u3.ip ||
+		    (exp->saved_addr.ip != exp->tuple.dst.u3.ip ||
 		     exp->saved_proto.udp.port != exp->tuple.dst.u.udp.port) &&
 		    ct->status & IPS_NAT_MASK) {
-			daddr->ip		= exp->saved_ip;
-			tuple.dst.u3.ip		= exp->saved_ip;
+			daddr->ip		= exp->saved_addr.ip;
+			tuple.dst.u3.ip		= exp->saved_addr.ip;
 			tuple.dst.u.udp.port	= exp->saved_proto.udp.port;
 			direct_rtp = 1;
 		} else
diff --git a/net/ipv4/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
similarity index 56%
rename from net/ipv4/netfilter/nf_nat_core.c
rename to net/netfilter/nf_nat_core.c
index 44b082f..8e0910a 100644
--- a/net/ipv4/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -1,7 +1,7 @@
-/* NAT for netfilter; shared with compatibility layer. */
-
-/* (C) 1999-2001 Paul `Rusty' Russell
+/*
+ * (C) 1999-2001 Paul `Rusty' Russell
  * (C) 2002-2006 Netfilter Core Team <coreteam@netfilter.org>
+ * (C) 2011 Patrick McHardy <kaber@trash.net>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -13,38 +13,103 @@
 #include <linux/timer.h>
 #include <linux/skbuff.h>
 #include <linux/gfp.h>
-#include <net/checksum.h>
-#include <net/icmp.h>
-#include <net/ip.h>
-#include <net/tcp.h>  /* For tcp_prot in getorigdst */
-#include <linux/icmp.h>
-#include <linux/udp.h>
+#include <net/xfrm.h>
 #include <linux/jhash.h>
 
-#include <linux/netfilter_ipv4.h>
 #include <net/netfilter/nf_conntrack.h>
 #include <net/netfilter/nf_conntrack_core.h>
 #include <net/netfilter/nf_nat.h>
-#include <net/netfilter/nf_nat_protocol.h>
+#include <net/netfilter/nf_nat_l3proto.h>
+#include <net/netfilter/nf_nat_l4proto.h>
 #include <net/netfilter/nf_nat_core.h>
 #include <net/netfilter/nf_nat_helper.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_l3proto.h>
 #include <net/netfilter/nf_conntrack_zones.h>
+#include <linux/netfilter/nf_nat.h>
 
 static DEFINE_SPINLOCK(nf_nat_lock);
 
-static struct nf_conntrack_l3proto *l3proto __read_mostly;
-
-#define MAX_IP_NAT_PROTO 256
-static const struct nf_nat_protocol __rcu *nf_nat_protos[MAX_IP_NAT_PROTO]
+static const struct nf_nat_l3proto __rcu *nf_nat_l3protos[NFPROTO_NUMPROTO]
+						__read_mostly;
+static const struct nf_nat_l4proto __rcu **nf_nat_l4protos[NFPROTO_NUMPROTO]
 						__read_mostly;
 
-static inline const struct nf_nat_protocol *
-__nf_nat_proto_find(u_int8_t protonum)
+
+inline const struct nf_nat_l3proto *
+__nf_nat_l3proto_find(u8 family)
+{
+	return rcu_dereference(nf_nat_l3protos[family]);
+}
+
+inline const struct nf_nat_l4proto *
+__nf_nat_l4proto_find(u8 family, u8 protonum)
+{
+	return rcu_dereference(nf_nat_l4protos[family][protonum]);
+}
+EXPORT_SYMBOL_GPL(__nf_nat_l4proto_find);
+
+#ifdef CONFIG_XFRM
+static void __nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl)
+{
+	const struct nf_nat_l3proto *l3proto;
+	const struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+	enum ip_conntrack_dir dir;
+	unsigned  long statusbit;
+	u8 family;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct == NULL)
+		return;
+
+	family = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num;
+	rcu_read_lock();
+	l3proto = __nf_nat_l3proto_find(family);
+	if (l3proto == NULL)
+		goto out;
+
+	dir = CTINFO2DIR(ctinfo);
+	if (dir == IP_CT_DIR_ORIGINAL)
+		statusbit = IPS_DST_NAT;
+	else
+		statusbit = IPS_SRC_NAT;
+
+	l3proto->decode_session(skb, ct, dir, statusbit, fl);
+out:
+	rcu_read_unlock();
+}
+
+int nf_xfrm_me_harder(struct sk_buff *skb, unsigned int family)
 {
-	return rcu_dereference(nf_nat_protos[protonum]);
+	struct flowi fl;
+	unsigned int hh_len;
+	struct dst_entry *dst;
+
+	if (xfrm_decode_session(skb, &fl, family) < 0)
+		return -1;
+
+	dst = skb_dst(skb);
+	if (dst->xfrm)
+		dst = ((struct xfrm_dst *)dst)->route;
+	dst_hold(dst);
+
+	dst = xfrm_lookup(dev_net(dst->dev), dst, &fl, skb->sk, 0);
+	if (IS_ERR(dst))
+		return -1;
+
+	skb_dst_drop(skb);
+	skb_dst_set(skb, dst);
+
+	/* Change in oif may mean change in hh_len. */
+	hh_len = skb_dst(skb)->dev->hard_header_len;
+	if (skb_headroom(skb) < hh_len &&
+	    pskb_expand_head(skb, hh_len - skb_headroom(skb), 0, GFP_ATOMIC))
+		return -1;
+	return 0;
 }
+EXPORT_SYMBOL(nf_xfrm_me_harder);
+#endif /* CONFIG_XFRM */
 
 /* We keep an extra hash for each conntrack, for fast searching. */
 static inline unsigned int
@@ -54,10 +119,9 @@ hash_by_src(const struct net *net, u16 zone,
 	unsigned int hash;
 
 	/* Original src, to ensure we map it consistently if poss. */
-	hash = jhash_3words((__force u32)tuple->src.u3.ip,
-			    (__force u32)tuple->src.u.all ^ zone,
-			    tuple->dst.protonum, nf_conntrack_hash_rnd);
-	return ((u64)hash * net->ipv4.nat_htable_size) >> 32;
+	hash = jhash2((u32 *)&tuple->src, sizeof(tuple->src) / sizeof(u32),
+		      tuple->dst.protonum ^ zone ^ nf_conntrack_hash_rnd);
+	return ((u64)hash * net->ct.nat_htable_size) >> 32;
 }
 
 /* Is this tuple already taken? (not by us) */
@@ -66,10 +130,11 @@ nf_nat_used_tuple(const struct nf_conntrack_tuple *tuple,
 		  const struct nf_conn *ignored_conntrack)
 {
 	/* Conntrack tracking doesn't keep track of outgoing tuples; only
-	   incoming ones.  NAT means they don't have a fixed mapping,
-	   so we invert the tuple and look for the incoming reply.
-
-	   We could keep a separate hash if this proves too slow. */
+	 * incoming ones.  NAT means they don't have a fixed mapping,
+	 * so we invert the tuple and look for the incoming reply.
+	 *
+	 * We could keep a separate hash if this proves too slow.
+	 */
 	struct nf_conntrack_tuple reply;
 
 	nf_ct_invert_tuplepr(&reply, tuple);
@@ -78,31 +143,26 @@ nf_nat_used_tuple(const struct nf_conntrack_tuple *tuple,
 EXPORT_SYMBOL(nf_nat_used_tuple);
 
 /* If we source map this tuple so reply looks like reply_tuple, will
- * that meet the constraints of range. */
-static int
-in_range(const struct nf_conntrack_tuple *tuple,
-	 const struct nf_nat_ipv4_range *range)
+ * that meet the constraints of range.
+ */
+static int in_range(const struct nf_nat_l3proto *l3proto,
+		    const struct nf_nat_l4proto *l4proto,
+		    const struct nf_conntrack_tuple *tuple,
+		    const struct nf_nat_range *range)
 {
-	const struct nf_nat_protocol *proto;
-	int ret = 0;
-
 	/* If we are supposed to map IPs, then we must be in the
-	   range specified, otherwise let this drag us onto a new src IP. */
-	if (range->flags & NF_NAT_RANGE_MAP_IPS) {
-		if (ntohl(tuple->src.u3.ip) < ntohl(range->min_ip) ||
-		    ntohl(tuple->src.u3.ip) > ntohl(range->max_ip))
-			return 0;
-	}
+	 * range specified, otherwise let this drag us onto a new src IP.
+	 */
+	if (range->flags & NF_NAT_RANGE_MAP_IPS &&
+	    !l3proto->in_range(tuple, range))
+		return 0;
 
-	rcu_read_lock();
-	proto = __nf_nat_proto_find(tuple->dst.protonum);
 	if (!(range->flags & NF_NAT_RANGE_PROTO_SPECIFIED) ||
-	    proto->in_range(tuple, NF_NAT_MANIP_SRC,
-			    &range->min, &range->max))
-		ret = 1;
-	rcu_read_unlock();
+	    l4proto->in_range(tuple, NF_NAT_MANIP_SRC,
+			      &range->min_proto, &range->max_proto))
+		return 1;
 
-	return ret;
+	return 0;
 }
 
 static inline int
@@ -113,24 +173,25 @@ same_src(const struct nf_conn *ct,
 
 	t = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
 	return (t->dst.protonum == tuple->dst.protonum &&
-		t->src.u3.ip == tuple->src.u3.ip &&
+		nf_inet_addr_cmp(&t->src.u3, &tuple->src.u3) &&
 		t->src.u.all == tuple->src.u.all);
 }
 
 /* Only called for SRC manip */
 static int
 find_appropriate_src(struct net *net, u16 zone,
+		     const struct nf_nat_l3proto *l3proto,
+		     const struct nf_nat_l4proto *l4proto,
 		     const struct nf_conntrack_tuple *tuple,
 		     struct nf_conntrack_tuple *result,
-		     const struct nf_nat_ipv4_range *range)
+		     const struct nf_nat_range *range)
 {
 	unsigned int h = hash_by_src(net, zone, tuple);
 	const struct nf_conn_nat *nat;
 	const struct nf_conn *ct;
 	const struct hlist_node *n;
 
-	rcu_read_lock();
-	hlist_for_each_entry_rcu(nat, n, &net->ipv4.nat_bysource[h], bysource) {
+	hlist_for_each_entry_rcu(nat, n, &net->ct.nat_bysource[h], bysource) {
 		ct = nat->ct;
 		if (same_src(ct, tuple) && nf_ct_zone(ct) == zone) {
 			/* Copy source part from reply tuple. */
@@ -138,119 +199,149 @@ find_appropriate_src(struct net *net, u16 zone,
 				       &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
 			result->dst = tuple->dst;
 
-			if (in_range(result, range)) {
+			if (in_range(l3proto, l4proto, result, range)) {
 				rcu_read_unlock();
 				return 1;
 			}
 		}
 	}
-	rcu_read_unlock();
 	return 0;
 }
 
 /* For [FUTURE] fragmentation handling, we want the least-used
-   src-ip/dst-ip/proto triple.  Fairness doesn't come into it.  Thus
-   if the range specifies 1.2.3.4 ports 10000-10005 and 1.2.3.5 ports
-   1-65535, we don't do pro-rata allocation based on ports; we choose
-   the ip with the lowest src-ip/dst-ip/proto usage.
-*/
+ * src-ip/dst-ip/proto triple.  Fairness doesn't come into it.  Thus
+ * if the range specifies 1.2.3.4 ports 10000-10005 and 1.2.3.5 ports
+ * 1-65535, we don't do pro-rata allocation based on ports; we choose
+ * the ip with the lowest src-ip/dst-ip/proto usage.
+ */
 static void
 find_best_ips_proto(u16 zone, struct nf_conntrack_tuple *tuple,
-		    const struct nf_nat_ipv4_range *range,
+		    const struct nf_nat_range *range,
 		    const struct nf_conn *ct,
 		    enum nf_nat_manip_type maniptype)
 {
-	__be32 *var_ipp;
+	union nf_inet_addr *var_ipp;
+	unsigned int i, max;
 	/* Host order */
-	u_int32_t minip, maxip, j;
+	u32 minip, maxip, j, dist;
+	bool full_range;
 
 	/* No IP mapping?  Do nothing. */
 	if (!(range->flags & NF_NAT_RANGE_MAP_IPS))
 		return;
 
 	if (maniptype == NF_NAT_MANIP_SRC)
-		var_ipp = &tuple->src.u3.ip;
+		var_ipp = &tuple->src.u3;
 	else
-		var_ipp = &tuple->dst.u3.ip;
+		var_ipp = &tuple->dst.u3;
 
 	/* Fast path: only one choice. */
-	if (range->min_ip == range->max_ip) {
-		*var_ipp = range->min_ip;
+	if (nf_inet_addr_cmp(&range->min_addr, &range->max_addr)) {
+		*var_ipp = range->min_addr;
 		return;
 	}
 
+	if (nf_ct_l3num(ct) == NFPROTO_IPV4)
+		max = sizeof(var_ipp->ip) / sizeof(u32) - 1;
+	else
+		max = sizeof(var_ipp->ip6) / sizeof(u32) - 1;
+
 	/* Hashing source and destination IPs gives a fairly even
 	 * spread in practice (if there are a small number of IPs
 	 * involved, there usually aren't that many connections
 	 * anyway).  The consistency means that servers see the same
 	 * client coming from the same IP (some Internet Banking sites
-	 * like this), even across reboots. */
-	minip = ntohl(range->min_ip);
-	maxip = ntohl(range->max_ip);
-	j = jhash_2words((__force u32)tuple->src.u3.ip,
-			 range->flags & NF_NAT_RANGE_PERSISTENT ?
-				0 : (__force u32)tuple->dst.u3.ip ^ zone, 0);
-	j = ((u64)j * (maxip - minip + 1)) >> 32;
-	*var_ipp = htonl(minip + j);
+	 * like this), even across reboots.
+	 */
+	j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3),
+		   range->flags & NF_NAT_RANGE_PERSISTENT ?
+			0 : (__force u32)tuple->dst.u3.all[max] ^ zone);
+
+	full_range = false;
+	for (i = 0; i <= max; i++) {
+		/* If first bytes of the address are at the maximum, use the
+		 * distance. Otherwise use the full range.
+		 */
+		if (!full_range) {
+			minip = ntohl(range->min_addr.all[i]);
+			maxip = ntohl(range->max_addr.all[i]);
+			dist  = maxip - minip + 1;
+		} else {
+			minip = 0;
+			dist  = ~0;
+		}
+
+		var_ipp->all[i] = htonl(minip + (((u64)j * dist) >> 32));
+		if (var_ipp->all[i] != range->max_addr.all[i])
+			full_range = true;
+
+		if (!(range->flags & NF_NAT_RANGE_PERSISTENT))
+			j ^= (__force u32)tuple->dst.u3.all[i];
+	}
 }
 
-/* Manipulate the tuple into the range given.  For NF_INET_POST_ROUTING,
- * we change the source to map into the range.  For NF_INET_PRE_ROUTING
+/* Manipulate the tuple into the range given. For NF_INET_POST_ROUTING,
+ * we change the source to map into the range. For NF_INET_PRE_ROUTING
  * and NF_INET_LOCAL_OUT, we change the destination to map into the
- * range.  It might not be possible to get a unique tuple, but we try.
+ * range. It might not be possible to get a unique tuple, but we try.
  * At worst (or if we race), we will end up with a final duplicate in
  * __ip_conntrack_confirm and drop the packet. */
 static void
 get_unique_tuple(struct nf_conntrack_tuple *tuple,
 		 const struct nf_conntrack_tuple *orig_tuple,
-		 const struct nf_nat_ipv4_range *range,
+		 const struct nf_nat_range *range,
 		 struct nf_conn *ct,
 		 enum nf_nat_manip_type maniptype)
 {
+	const struct nf_nat_l3proto *l3proto;
+	const struct nf_nat_l4proto *l4proto;
 	struct net *net = nf_ct_net(ct);
-	const struct nf_nat_protocol *proto;
 	u16 zone = nf_ct_zone(ct);
 
-	/* 1) If this srcip/proto/src-proto-part is currently mapped,
-	   and that same mapping gives a unique tuple within the given
-	   range, use that.
+	rcu_read_lock();
+	l3proto = __nf_nat_l3proto_find(orig_tuple->src.l3num);
+	l4proto = __nf_nat_l4proto_find(orig_tuple->src.l3num,
+					orig_tuple->dst.protonum);
 
-	   This is only required for source (ie. NAT/masq) mappings.
-	   So far, we don't do local source mappings, so multiple
-	   manips not an issue.  */
+	/* 1) If this srcip/proto/src-proto-part is currently mapped,
+	 * and that same mapping gives a unique tuple within the given
+	 * range, use that.
+	 *
+	 * This is only required for source (ie. NAT/masq) mappings.
+	 * So far, we don't do local source mappings, so multiple
+	 * manips not an issue.
+	 */
 	if (maniptype == NF_NAT_MANIP_SRC &&
 	    !(range->flags & NF_NAT_RANGE_PROTO_RANDOM)) {
 		/* try the original tuple first */
-		if (in_range(orig_tuple, range)) {
+		if (in_range(l3proto, l4proto, orig_tuple, range)) {
 			if (!nf_nat_used_tuple(orig_tuple, ct)) {
 				*tuple = *orig_tuple;
-				return;
+				goto out;
 			}
-		} else if (find_appropriate_src(net, zone, orig_tuple, tuple,
-			   range)) {
+		} else if (find_appropriate_src(net, zone, l3proto, l4proto,
+						orig_tuple, tuple, range)) {
 			pr_debug("get_unique_tuple: Found current src map\n");
 			if (!nf_nat_used_tuple(tuple, ct))
-				return;
+				goto out;
 		}
 	}
 
-	/* 2) Select the least-used IP/proto combination in the given
-	   range. */
+	/* 2) Select the least-used IP/proto combination in the given range */
 	*tuple = *orig_tuple;
 	find_best_ips_proto(zone, tuple, range, ct, maniptype);
 
 	/* 3) The per-protocol part of the manip is made to map into
-	   the range to make a unique tuple. */
-
-	rcu_read_lock();
-	proto = __nf_nat_proto_find(orig_tuple->dst.protonum);
+	 * the range to make a unique tuple.
+	 */
 
 	/* Only bother mapping if it's not already in range and unique */
 	if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM)) {
 		if (range->flags & NF_NAT_RANGE_PROTO_SPECIFIED) {
-			if (proto->in_range(tuple, maniptype, &range->min,
-					    &range->max) &&
-			    (range->min.all == range->max.all ||
+			if (l4proto->in_range(tuple, maniptype,
+					      &range->min_proto,
+					      &range->max_proto) &&
+			    (range->min_proto.all == range->max_proto.all ||
 			     !nf_nat_used_tuple(tuple, ct)))
 				goto out;
 		} else if (!nf_nat_used_tuple(tuple, ct)) {
@@ -259,14 +350,14 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple,
 	}
 
 	/* Last change: get protocol to try to obtain unique tuple. */
-	proto->unique_tuple(tuple, range, maniptype, ct);
+	l4proto->unique_tuple(l3proto, tuple, range, maniptype, ct);
 out:
 	rcu_read_unlock();
 }
 
 unsigned int
 nf_nat_setup_info(struct nf_conn *ct,
-		  const struct nf_nat_ipv4_range *range,
+		  const struct nf_nat_range *range,
 		  enum nf_nat_manip_type maniptype)
 {
 	struct net *net = nf_ct_net(ct);
@@ -288,10 +379,10 @@ nf_nat_setup_info(struct nf_conn *ct,
 	BUG_ON(nf_nat_initialized(ct, maniptype));
 
 	/* What we've got will look like inverse of reply. Normally
-	   this is what is in the conntrack, except for prior
-	   manipulations (future optimization: if num_manips == 0,
-	   orig_tp =
-	   conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple) */
+	 * this is what is in the conntrack, except for prior
+	 * manipulations (future optimization: if num_manips == 0,
+	 * orig_tp = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple)
+	 */
 	nf_ct_invert_tuplepr(&curr_tuple,
 			     &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
 
@@ -317,11 +408,11 @@ nf_nat_setup_info(struct nf_conn *ct,
 		srchash = hash_by_src(net, nf_ct_zone(ct),
 				      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
 		spin_lock_bh(&nf_nat_lock);
-		/* nf_conntrack_alter_reply might re-allocate extension area */
+		/* nf_conntrack_alter_reply might re-allocate exntension aera */
 		nat = nfct_nat(ct);
 		nat->ct = ct;
 		hlist_add_head_rcu(&nat->bysource,
-				   &net->ipv4.nat_bysource[srchash]);
+				   &net->ct.nat_bysource[srchash]);
 		spin_unlock_bh(&nf_nat_lock);
 	}
 
@@ -335,47 +426,14 @@ nf_nat_setup_info(struct nf_conn *ct,
 }
 EXPORT_SYMBOL(nf_nat_setup_info);
 
-/* Returns true if succeeded. */
-static bool
-manip_pkt(u_int16_t proto,
-	  struct sk_buff *skb,
-	  unsigned int iphdroff,
-	  const struct nf_conntrack_tuple *target,
-	  enum nf_nat_manip_type maniptype)
-{
-	struct iphdr *iph;
-	const struct nf_nat_protocol *p;
-
-	if (!skb_make_writable(skb, iphdroff + sizeof(*iph)))
-		return false;
-
-	iph = (void *)skb->data + iphdroff;
-
-	/* Manipulate protcol part. */
-
-	/* rcu_read_lock()ed by nf_hook_slow */
-	p = __nf_nat_proto_find(proto);
-	if (!p->manip_pkt(skb, iphdroff, target, maniptype))
-		return false;
-
-	iph = (void *)skb->data + iphdroff;
-
-	if (maniptype == NF_NAT_MANIP_SRC) {
-		csum_replace4(&iph->check, iph->saddr, target->src.u3.ip);
-		iph->saddr = target->src.u3.ip;
-	} else {
-		csum_replace4(&iph->check, iph->daddr, target->dst.u3.ip);
-		iph->daddr = target->dst.u3.ip;
-	}
-	return true;
-}
-
 /* Do packet manipulations according to nf_nat_setup_info. */
 unsigned int nf_nat_packet(struct nf_conn *ct,
 			   enum ip_conntrack_info ctinfo,
 			   unsigned int hooknum,
 			   struct sk_buff *skb)
 {
+	const struct nf_nat_l3proto *l3proto;
+	const struct nf_nat_l4proto *l4proto;
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
 	unsigned long statusbit;
 	enum nf_nat_manip_type mtype = HOOK2MANIP(hooknum);
@@ -396,129 +454,94 @@ unsigned int nf_nat_packet(struct nf_conn *ct,
 		/* We are aiming to look like inverse of other direction. */
 		nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
 
-		if (!manip_pkt(target.dst.protonum, skb, 0, &target, mtype))
+		l3proto = __nf_nat_l3proto_find(target.src.l3num);
+		l4proto = __nf_nat_l4proto_find(target.src.l3num,
+						target.dst.protonum);
+		if (!l3proto->manip_pkt(skb, 0, l4proto, &target, mtype))
 			return NF_DROP;
 	}
 	return NF_ACCEPT;
 }
 EXPORT_SYMBOL_GPL(nf_nat_packet);
 
-/* Dir is direction ICMP is coming from (opposite to packet it contains) */
-int nf_nat_icmp_reply_translation(struct nf_conn *ct,
-				  enum ip_conntrack_info ctinfo,
-				  unsigned int hooknum,
-				  struct sk_buff *skb)
-{
-	struct {
-		struct icmphdr icmp;
-		struct iphdr ip;
-	} *inside;
-	struct nf_conntrack_tuple target;
-	int hdrlen = ip_hdrlen(skb);
-	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
-	unsigned long statusbit;
-	enum nf_nat_manip_type manip = HOOK2MANIP(hooknum);
-
-	if (!skb_make_writable(skb, hdrlen + sizeof(*inside)))
-		return 0;
-
-	inside = (void *)skb->data + hdrlen;
-
-	/* We're actually going to mangle it beyond trivial checksum
-	   adjustment, so make sure the current checksum is correct. */
-	if (nf_ip_checksum(skb, hooknum, hdrlen, 0))
-		return 0;
-
-	/* Must be RELATED */
-	NF_CT_ASSERT(skb->nfctinfo == IP_CT_RELATED ||
-		     skb->nfctinfo == IP_CT_RELATED_REPLY);
-
-	/* Redirects on non-null nats must be dropped, else they'll
-	   start talking to each other without our translation, and be
-	   confused... --RR */
-	if (inside->icmp.type == ICMP_REDIRECT) {
-		/* If NAT isn't finished, assume it and drop. */
-		if ((ct->status & IPS_NAT_DONE_MASK) != IPS_NAT_DONE_MASK)
-			return 0;
-
-		if (ct->status & IPS_NAT_MASK)
-			return 0;
-	}
-
-	if (manip == NF_NAT_MANIP_SRC)
-		statusbit = IPS_SRC_NAT;
-	else
-		statusbit = IPS_DST_NAT;
-
-	/* Invert if this is reply dir. */
-	if (dir == IP_CT_DIR_REPLY)
-		statusbit ^= IPS_NAT_MASK;
-
-	if (!(ct->status & statusbit))
-		return 1;
-
-	pr_debug("icmp_reply_translation: translating error %p manip %u "
-		 "dir %s\n", skb, manip,
-		 dir == IP_CT_DIR_ORIGINAL ? "ORIG" : "REPLY");
-
-	/* Change inner back to look like incoming packet.  We do the
-	   opposite manip on this hook to normal, because it might not
-	   pass all hooks (locally-generated ICMP).  Consider incoming
-	   packet: PREROUTING (DST manip), routing produces ICMP, goes
-	   through POSTROUTING (which must correct the DST manip). */
-	if (!manip_pkt(inside->ip.protocol, skb, hdrlen + sizeof(inside->icmp),
-		       &ct->tuplehash[!dir].tuple, !manip))
-		return 0;
-
-	if (skb->ip_summed != CHECKSUM_PARTIAL) {
-		/* Reloading "inside" here since manip_pkt inner. */
-		inside = (void *)skb->data + hdrlen;
-		inside->icmp.checksum = 0;
-		inside->icmp.checksum =
-			csum_fold(skb_checksum(skb, hdrlen,
-					       skb->len - hdrlen, 0));
-	}
-
-	/* Change outer to look the reply to an incoming packet
-	 * (proto 0 means don't invert per-proto part). */
-	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
-	if (!manip_pkt(0, skb, 0, &target, manip))
-		return 0;
-
-	return 1;
-}
-EXPORT_SYMBOL_GPL(nf_nat_icmp_reply_translation);
-
 /* Protocol registration. */
-int nf_nat_protocol_register(const struct nf_nat_protocol *proto)
+int nf_nat_l4proto_register(u8 l3proto, const struct nf_nat_l4proto *l4proto)
 {
+	const struct nf_nat_l4proto **l4protos;
+	unsigned int i;
 	int ret = 0;
 
 	spin_lock_bh(&nf_nat_lock);
+	if (nf_nat_l4protos[l3proto] == NULL) {
+		l4protos = kmalloc(IPPROTO_MAX * sizeof(struct nf_nat_l4proto *),
+				   GFP_ATOMIC);
+		if (l4protos == NULL) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		for (i = 0; i < IPPROTO_MAX; i++)
+			RCU_INIT_POINTER(l4protos[i], &nf_nat_l4proto_unknown);
+
+		nf_nat_l4protos[l3proto] = l4protos;
+	}
+
 	if (rcu_dereference_protected(
-			nf_nat_protos[proto->protonum],
+			nf_nat_l4protos[l3proto][l4proto->l4proto],
 			lockdep_is_held(&nf_nat_lock)
-			) != &nf_nat_unknown_protocol) {
+			) != &nf_nat_l4proto_unknown) {
 		ret = -EBUSY;
 		goto out;
 	}
-	RCU_INIT_POINTER(nf_nat_protos[proto->protonum], proto);
+	RCU_INIT_POINTER(nf_nat_l4protos[l3proto][l4proto->l4proto], l4proto);
  out:
 	spin_unlock_bh(&nf_nat_lock);
 	return ret;
 }
-EXPORT_SYMBOL(nf_nat_protocol_register);
+EXPORT_SYMBOL_GPL(nf_nat_l4proto_register);
 
 /* No one stores the protocol anywhere; simply delete it. */
-void nf_nat_protocol_unregister(const struct nf_nat_protocol *proto)
+void nf_nat_l4proto_unregister(u8 l3proto, const struct nf_nat_l4proto *l4proto)
 {
 	spin_lock_bh(&nf_nat_lock);
-	RCU_INIT_POINTER(nf_nat_protos[proto->protonum],
-			   &nf_nat_unknown_protocol);
+	RCU_INIT_POINTER(nf_nat_l4protos[l3proto][l4proto->l4proto],
+			 &nf_nat_l4proto_unknown);
 	spin_unlock_bh(&nf_nat_lock);
 	synchronize_rcu();
 }
-EXPORT_SYMBOL(nf_nat_protocol_unregister);
+EXPORT_SYMBOL_GPL(nf_nat_l4proto_unregister);
+
+int nf_nat_l3proto_register(const struct nf_nat_l3proto *l3proto)
+{
+	int err;
+
+	err = nf_ct_l3proto_try_module_get(l3proto->l3proto);
+	if (err < 0)
+		return err;
+
+	spin_lock_bh(&nf_nat_lock);
+	RCU_INIT_POINTER(nf_nat_l4protos[l3proto->l3proto][IPPROTO_TCP],
+			 &nf_nat_l4proto_tcp);
+	RCU_INIT_POINTER(nf_nat_l4protos[l3proto->l3proto][IPPROTO_UDP],
+			 &nf_nat_l4proto_udp);
+	spin_unlock_bh(&nf_nat_lock);
+
+	RCU_INIT_POINTER(nf_nat_l3protos[l3proto->l3proto], l3proto);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nf_nat_l3proto_register);
+
+static void nf_nat_clean_l3proto(u8 l3proto);
+
+void nf_nat_l3proto_unregister(const struct nf_nat_l3proto *l3proto)
+{
+	RCU_INIT_POINTER(nf_nat_l3protos[l3proto->l3proto], NULL);
+	synchronize_rcu();
+	kfree(nf_nat_l4protos[l3proto->l3proto]);
+	nf_nat_l4protos[l3proto->l3proto] = NULL;
+	nf_nat_clean_l3proto(l3proto->l3proto);
+	nf_ct_l3proto_module_put(l3proto->l3proto);
+}
+EXPORT_SYMBOL_GPL(nf_nat_l3proto_unregister);
 
 /* No one using conntrack by the time this called. */
 static void nf_nat_cleanup_conntrack(struct nf_conn *ct)
@@ -570,34 +593,34 @@ static const struct nla_policy protonat_nla_policy[CTA_PROTONAT_MAX+1] = {
 
 static int nfnetlink_parse_nat_proto(struct nlattr *attr,
 				     const struct nf_conn *ct,
-				     struct nf_nat_ipv4_range *range)
+				     struct nf_nat_range *range)
 {
 	struct nlattr *tb[CTA_PROTONAT_MAX+1];
-	const struct nf_nat_protocol *npt;
+	const struct nf_nat_l4proto *l4proto;
 	int err;
 
 	err = nla_parse_nested(tb, CTA_PROTONAT_MAX, attr, protonat_nla_policy);
 	if (err < 0)
 		return err;
 
-	rcu_read_lock();
-	npt = __nf_nat_proto_find(nf_ct_protonum(ct));
-	if (npt->nlattr_to_range)
-		err = npt->nlattr_to_range(tb, range);
-	rcu_read_unlock();
+	l4proto = __nf_nat_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
+	if (l4proto->nlattr_to_range)
+		err = l4proto->nlattr_to_range(tb, range);
+
 	return err;
 }
 
 static const struct nla_policy nat_nla_policy[CTA_NAT_MAX+1] = {
-	[CTA_NAT_MINIP]		= { .type = NLA_U32 },
-	[CTA_NAT_MAXIP]		= { .type = NLA_U32 },
+	[CTA_NAT_V4_MINIP]	= { .type = NLA_U32 },
+	[CTA_NAT_V4_MAXIP]	= { .type = NLA_U32 },
 	[CTA_NAT_PROTO]		= { .type = NLA_NESTED },
 };
 
 static int
 nfnetlink_parse_nat(const struct nlattr *nat,
-		    const struct nf_conn *ct, struct nf_nat_ipv4_range *range)
+		    const struct nf_conn *ct, struct nf_nat_range *range)
 {
+	const struct nf_nat_l3proto *l3proto;
 	struct nlattr *tb[CTA_NAT_MAX+1];
 	int err;
 
@@ -607,25 +630,23 @@ nfnetlink_parse_nat(const struct nlattr *nat,
 	if (err < 0)
 		return err;
 
-	if (tb[CTA_NAT_MINIP])
-		range->min_ip = nla_get_be32(tb[CTA_NAT_MINIP]);
-
-	if (!tb[CTA_NAT_MAXIP])
-		range->max_ip = range->min_ip;
-	else
-		range->max_ip = nla_get_be32(tb[CTA_NAT_MAXIP]);
-
-	if (range->min_ip)
-		range->flags |= NF_NAT_RANGE_MAP_IPS;
+	rcu_read_lock();
+	l3proto = __nf_nat_l3proto_find(nf_ct_l3num(ct));
+	if (l3proto == NULL) {
+		err = -EAGAIN;
+		goto out;
+	}
+	err = l3proto->nlattr_to_range(tb, range);
+	if (err < 0)
+		goto out;
 
 	if (!tb[CTA_NAT_PROTO])
-		return 0;
+		goto out;
 
 	err = nfnetlink_parse_nat_proto(tb[CTA_NAT_PROTO], ct, range);
-	if (err < 0)
-		return err;
-
-	return 0;
+out:
+	rcu_read_unlock();
+	return err;
 }
 
 static int
@@ -633,10 +654,12 @@ nfnetlink_parse_nat_setup(struct nf_conn *ct,
 			  enum nf_nat_manip_type manip,
 			  const struct nlattr *attr)
 {
-	struct nf_nat_ipv4_range range;
+	struct nf_nat_range range;
+	int err;
 
-	if (nfnetlink_parse_nat(attr, ct, &range) < 0)
-		return -EINVAL;
+	err = nfnetlink_parse_nat(attr, ct, &range);
+	if (err < 0)
+		return err;
 	if (nf_nat_initialized(ct, manip))
 		return -EEXIST;
 
@@ -655,9 +678,9 @@ nfnetlink_parse_nat_setup(struct nf_conn *ct,
 static int __net_init nf_nat_net_init(struct net *net)
 {
 	/* Leave them the same for the moment. */
-	net->ipv4.nat_htable_size = net->ct.htable_size;
-	net->ipv4.nat_bysource = nf_ct_alloc_hashtable(&net->ipv4.nat_htable_size, 0);
-	if (!net->ipv4.nat_bysource)
+	net->ct.nat_htable_size = net->ct.htable_size;
+	net->ct.nat_bysource = nf_ct_alloc_hashtable(&net->ct.nat_htable_size, 0);
+	if (!net->ct.nat_bysource)
 		return -ENOMEM;
 	return 0;
 }
@@ -666,19 +689,30 @@ static int __net_init nf_nat_net_init(struct net *net)
 static int clean_nat(struct nf_conn *i, void *data)
 {
 	struct nf_conn_nat *nat = nfct_nat(i);
+	u8 l3proto = (long)data;
 
 	if (!nat)
 		return 0;
+	if (l3proto && nf_ct_l3num(i) != l3proto)
+		return 0;
 	memset(nat, 0, sizeof(*nat));
 	i->status &= ~(IPS_NAT_MASK | IPS_NAT_DONE_MASK | IPS_SEQ_ADJUST);
 	return 0;
 }
 
+static void nf_nat_clean_l3proto(u8 l3proto)
+{
+	struct net *net;
+
+	for_each_net(net)
+		nf_ct_iterate_cleanup(net, clean_nat, (void *)(long)l3proto);
+}
+
 static void __net_exit nf_nat_net_exit(struct net *net)
 {
 	nf_ct_iterate_cleanup(net, &clean_nat, NULL);
 	synchronize_rcu();
-	nf_ct_free_hashtable(net->ipv4.nat_bysource, net->ipv4.nat_htable_size);
+	nf_ct_free_hashtable(net->ct.nat_bysource, net->ct.nat_htable_size);
 }
 
 static struct pernet_operations nf_nat_net_ops = {
@@ -697,11 +731,8 @@ static struct nfq_ct_nat_hook nfq_ct_nat = {
 
 static int __init nf_nat_init(void)
 {
-	size_t i;
 	int ret;
 
-	need_ipv4_conntrack();
-
 	ret = nf_ct_extend_register(&nat_extend);
 	if (ret < 0) {
 		printk(KERN_ERR "nf_nat_core: Unable to register extension\n");
@@ -712,22 +743,11 @@ static int __init nf_nat_init(void)
 	if (ret < 0)
 		goto cleanup_extend;
 
-	/* Sew in builtin protocols. */
-	spin_lock_bh(&nf_nat_lock);
-	for (i = 0; i < MAX_IP_NAT_PROTO; i++)
-		RCU_INIT_POINTER(nf_nat_protos[i], &nf_nat_unknown_protocol);
-	RCU_INIT_POINTER(nf_nat_protos[IPPROTO_TCP], &nf_nat_protocol_tcp);
-	RCU_INIT_POINTER(nf_nat_protos[IPPROTO_UDP], &nf_nat_protocol_udp);
-	RCU_INIT_POINTER(nf_nat_protos[IPPROTO_ICMP], &nf_nat_protocol_icmp);
-	spin_unlock_bh(&nf_nat_lock);
+	nf_ct_helper_expectfn_register(&follow_master_nat);
 
 	/* Initialize fake conntrack so that NAT will skip it */
 	nf_ct_untracked_status_or(IPS_NAT_DONE_MASK);
 
-	l3proto = nf_ct_l3proto_find_get((u_int16_t)AF_INET);
-
-	nf_ct_helper_expectfn_register(&follow_master_nat);
-
 	BUG_ON(nf_nat_seq_adjust_hook != NULL);
 	RCU_INIT_POINTER(nf_nat_seq_adjust_hook, nf_nat_seq_adjust);
 	BUG_ON(nfnetlink_parse_nat_setup_hook != NULL);
@@ -736,6 +756,10 @@ static int __init nf_nat_init(void)
 	BUG_ON(nf_ct_nat_offset != NULL);
 	RCU_INIT_POINTER(nf_ct_nat_offset, nf_nat_get_offset);
 	RCU_INIT_POINTER(nfq_ct_nat_hook, &nfq_ct_nat);
+#ifdef CONFIG_XFRM
+	BUG_ON(nf_nat_decode_session_hook != NULL);
+	RCU_INIT_POINTER(nf_nat_decode_session_hook, __nf_nat_decode_session);
+#endif
 	return 0;
 
  cleanup_extend:
@@ -745,19 +769,24 @@ static int __init nf_nat_init(void)
 
 static void __exit nf_nat_cleanup(void)
 {
+	unsigned int i;
+
 	unregister_pernet_subsys(&nf_nat_net_ops);
-	nf_ct_l3proto_put(l3proto);
 	nf_ct_extend_unregister(&nat_extend);
 	nf_ct_helper_expectfn_unregister(&follow_master_nat);
 	RCU_INIT_POINTER(nf_nat_seq_adjust_hook, NULL);
 	RCU_INIT_POINTER(nfnetlink_parse_nat_setup_hook, NULL);
 	RCU_INIT_POINTER(nf_ct_nat_offset, NULL);
 	RCU_INIT_POINTER(nfq_ct_nat_hook, NULL);
+#ifdef CONFIG_XFRM
+	RCU_INIT_POINTER(nf_nat_decode_session_hook, NULL);
+#endif
+	for (i = 0; i < NFPROTO_NUMPROTO; i++)
+		kfree(nf_nat_l4protos[i]);
 	synchronize_net();
 }
 
 MODULE_LICENSE("GPL");
-MODULE_ALIAS("nf-nat-ipv4");
 
 module_init(nf_nat_init);
 module_exit(nf_nat_cleanup);
diff --git a/net/ipv4/netfilter/nf_nat_helper.c b/net/netfilter/nf_nat_helper.c
similarity index 83%
rename from net/ipv4/netfilter/nf_nat_helper.c
rename to net/netfilter/nf_nat_helper.c
index 2fefec5..23c2b38 100644
--- a/net/ipv4/netfilter/nf_nat_helper.c
+++ b/net/netfilter/nf_nat_helper.c
@@ -1,4 +1,4 @@
-/* ip_nat_helper.c - generic support functions for NAT helpers
+/* nf_nat_helper.c - generic support functions for NAT helpers
  *
  * (C) 2000-2002 Harald Welte <laforge@netfilter.org>
  * (C) 2003-2006 Netfilter Core Team <coreteam@netfilter.org>
@@ -9,23 +9,19 @@
  */
 #include <linux/module.h>
 #include <linux/gfp.h>
-#include <linux/kmod.h>
 #include <linux/types.h>
-#include <linux/timer.h>
 #include <linux/skbuff.h>
 #include <linux/tcp.h>
 #include <linux/udp.h>
-#include <net/checksum.h>
 #include <net/tcp.h>
-#include <net/route.h>
 
-#include <linux/netfilter_ipv4.h>
 #include <net/netfilter/nf_conntrack.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_ecache.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <net/netfilter/nf_nat.h>
-#include <net/netfilter/nf_nat_protocol.h>
+#include <net/netfilter/nf_nat_l3proto.h>
+#include <net/netfilter/nf_nat_l4proto.h>
 #include <net/netfilter/nf_nat_core.h>
 #include <net/netfilter/nf_nat_helper.h>
 
@@ -90,7 +86,6 @@ s16 nf_nat_get_offset(const struct nf_conn *ct,
 
 	return offset;
 }
-EXPORT_SYMBOL_GPL(nf_nat_get_offset);
 
 /* Frobs data inside this packet, which is linear. */
 static void mangle_contents(struct sk_buff *skb,
@@ -125,9 +120,13 @@ static void mangle_contents(struct sk_buff *skb,
 		__skb_trim(skb, skb->len + rep_len - match_len);
 	}
 
-	/* fix IP hdr checksum information */
-	ip_hdr(skb)->tot_len = htons(skb->len);
-	ip_send_check(ip_hdr(skb));
+	if (nf_ct_l3num((struct nf_conn *)skb->nfct) == NFPROTO_IPV4) {
+		/* fix IP hdr checksum information */
+		ip_hdr(skb)->tot_len = htons(skb->len);
+		ip_send_check(ip_hdr(skb));
+	} else
+		ipv6_hdr(skb)->payload_len =
+			htons(skb->len - sizeof(struct ipv6hdr));
 }
 
 /* Unusual, but possible case. */
@@ -166,35 +165,6 @@ void nf_nat_tcp_seq_adjust(struct sk_buff *skb, struct nf_conn *ct,
 }
 EXPORT_SYMBOL_GPL(nf_nat_tcp_seq_adjust);
 
-static void nf_nat_csum(struct sk_buff *skb, const struct iphdr *iph, void *data,
-			int datalen, __sum16 *check, int oldlen)
-{
-	struct rtable *rt = skb_rtable(skb);
-
-	if (skb->ip_summed != CHECKSUM_PARTIAL) {
-		if (!(rt->rt_flags & RTCF_LOCAL) &&
-		    (!skb->dev || skb->dev->features & NETIF_F_V4_CSUM)) {
-			skb->ip_summed = CHECKSUM_PARTIAL;
-			skb->csum_start = skb_headroom(skb) +
-					  skb_network_offset(skb) +
-					  iph->ihl * 4;
-			skb->csum_offset = (void *)check - data;
-			*check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-						    datalen, iph->protocol, 0);
-		} else {
-			*check = 0;
-			*check = csum_tcpudp_magic(iph->saddr, iph->daddr,
-						   datalen, iph->protocol,
-						   csum_partial(data, datalen,
-								0));
-			if (iph->protocol == IPPROTO_UDP && !*check)
-				*check = CSUM_MANGLED_0;
-		}
-	} else
-		inet_proto_csum_replace2(check, skb,
-					 htons(oldlen), htons(datalen), 1);
-}
-
 /* Generic function for mangling variable-length address changes inside
  * NATed TCP connections (like the PORT XXX,XXX,XXX,XXX,XXX,XXX
  * command in FTP).
@@ -212,7 +182,7 @@ int __nf_nat_mangle_tcp_packet(struct sk_buff *skb,
 			       const char *rep_buffer,
 			       unsigned int rep_len, bool adjust)
 {
-	struct iphdr *iph;
+	const struct nf_nat_l3proto *l3proto;
 	struct tcphdr *tcph;
 	int oldlen, datalen;
 
@@ -226,15 +196,17 @@ int __nf_nat_mangle_tcp_packet(struct sk_buff *skb,
 
 	SKB_LINEAR_ASSERT(skb);
 
-	iph = ip_hdr(skb);
-	tcph = (void *)iph + iph->ihl*4;
+	tcph = (void *)skb->data + protoff;
 
-	oldlen = skb->len - iph->ihl*4;
-	mangle_contents(skb, iph->ihl*4 + tcph->doff*4,
+	oldlen = skb->len - protoff;
+	mangle_contents(skb, protoff + tcph->doff*4,
 			match_offset, match_len, rep_buffer, rep_len);
 
-	datalen = skb->len - iph->ihl*4;
-	nf_nat_csum(skb, iph, tcph, datalen, &tcph->check, oldlen);
+	datalen = skb->len - protoff;
+
+	l3proto = __nf_nat_l3proto_find(nf_ct_l3num(ct));
+	l3proto->csum_recalc(skb, IPPROTO_TCP, tcph, &tcph->check,
+			     datalen, oldlen);
 
 	if (adjust && rep_len != match_len)
 		nf_nat_set_seq_adjust(ct, ctinfo, tcph->seq,
@@ -264,7 +236,7 @@ nf_nat_mangle_udp_packet(struct sk_buff *skb,
 			 const char *rep_buffer,
 			 unsigned int rep_len)
 {
-	struct iphdr *iph;
+	const struct nf_nat_l3proto *l3proto;
 	struct udphdr *udph;
 	int datalen, oldlen;
 
@@ -276,22 +248,23 @@ nf_nat_mangle_udp_packet(struct sk_buff *skb,
 	    !enlarge_skb(skb, rep_len - match_len))
 		return 0;
 
-	iph = ip_hdr(skb);
-	udph = (void *)iph + iph->ihl*4;
+	udph = (void *)skb->data + protoff;
 
-	oldlen = skb->len - iph->ihl*4;
-	mangle_contents(skb, iph->ihl*4 + sizeof(*udph),
+	oldlen = skb->len - protoff;
+	mangle_contents(skb, protoff + sizeof(*udph),
 			match_offset, match_len, rep_buffer, rep_len);
 
 	/* update the length of the UDP packet */
-	datalen = skb->len - iph->ihl*4;
+	datalen = skb->len - protoff;
 	udph->len = htons(datalen);
 
 	/* fix udp checksum if udp checksum was previously calculated */
 	if (!udph->check && skb->ip_summed != CHECKSUM_PARTIAL)
 		return 1;
 
-	nf_nat_csum(skb, iph, udph, datalen, &udph->check, oldlen);
+	l3proto = __nf_nat_l3proto_find(nf_ct_l3num(ct));
+	l3proto->csum_recalc(skb, IPPROTO_UDP, udph, &udph->check,
+			     datalen, oldlen);
 
 	return 1;
 }
@@ -343,6 +316,7 @@ sack_adjust(struct sk_buff *skb,
 /* TCP SACK sequence number adjustment */
 static inline unsigned int
 nf_nat_sack_adjust(struct sk_buff *skb,
+		   unsigned int protoff,
 		   struct tcphdr *tcph,
 		   struct nf_conn *ct,
 		   enum ip_conntrack_info ctinfo)
@@ -350,8 +324,8 @@ nf_nat_sack_adjust(struct sk_buff *skb,
 	unsigned int dir, optoff, optend;
 	struct nf_conn_nat *nat = nfct_nat(ct);
 
-	optoff = ip_hdrlen(skb) + sizeof(struct tcphdr);
-	optend = ip_hdrlen(skb) + tcph->doff * 4;
+	optoff = protoff + sizeof(struct tcphdr);
+	optend = protoff + tcph->doff * 4;
 
 	if (!skb_make_writable(skb, optend))
 		return 0;
@@ -432,7 +406,7 @@ nf_nat_seq_adjust(struct sk_buff *skb,
 	tcph->seq = newseq;
 	tcph->ack_seq = newack;
 
-	return nf_nat_sack_adjust(skb, tcph, ct, ctinfo);
+	return nf_nat_sack_adjust(skb, protoff, tcph, ct, ctinfo);
 }
 
 /* Setup NAT on this expected conntrack so it follows master. */
@@ -440,22 +414,22 @@ nf_nat_seq_adjust(struct sk_buff *skb,
 void nf_nat_follow_master(struct nf_conn *ct,
 			  struct nf_conntrack_expect *exp)
 {
-	struct nf_nat_ipv4_range range;
+	struct nf_nat_range range;
 
 	/* This must be a fresh one. */
 	BUG_ON(ct->status & IPS_NAT_DONE_MASK);
 
 	/* Change src to where master sends to */
 	range.flags = NF_NAT_RANGE_MAP_IPS;
-	range.min_ip = range.max_ip
-		= ct->master->tuplehash[!exp->dir].tuple.dst.u3.ip;
+	range.min_addr = range.max_addr
+		= ct->master->tuplehash[!exp->dir].tuple.dst.u3;
 	nf_nat_setup_info(ct, &range, NF_NAT_MANIP_SRC);
 
 	/* For DST manip, map port here to where it's expected. */
 	range.flags = (NF_NAT_RANGE_MAP_IPS | NF_NAT_RANGE_PROTO_SPECIFIED);
-	range.min = range.max = exp->saved_proto;
-	range.min_ip = range.max_ip
-		= ct->master->tuplehash[!exp->dir].tuple.src.u3.ip;
+	range.min_proto = range.max_proto = exp->saved_proto;
+	range.min_addr = range.max_addr
+		= ct->master->tuplehash[!exp->dir].tuple.src.u3;
 	nf_nat_setup_info(ct, &range, NF_NAT_MANIP_DST);
 }
 EXPORT_SYMBOL(nf_nat_follow_master);
diff --git a/net/ipv4/netfilter/nf_nat_proto_common.c b/net/netfilter/nf_nat_proto_common.c
similarity index 62%
rename from net/ipv4/netfilter/nf_nat_proto_common.c
rename to net/netfilter/nf_nat_proto_common.c
index 9993bc9..9baaf73 100644
--- a/net/ipv4/netfilter/nf_nat_proto_common.c
+++ b/net/netfilter/nf_nat_proto_common.c
@@ -9,20 +9,18 @@
 
 #include <linux/types.h>
 #include <linux/random.h>
-#include <linux/ip.h>
-
 #include <linux/netfilter.h>
 #include <linux/export.h>
-#include <net/secure_seq.h>
+
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_core.h>
-#include <net/netfilter/nf_nat_rule.h>
-#include <net/netfilter/nf_nat_protocol.h>
+#include <net/netfilter/nf_nat_l3proto.h>
+#include <net/netfilter/nf_nat_l4proto.h>
 
-bool nf_nat_proto_in_range(const struct nf_conntrack_tuple *tuple,
-			   enum nf_nat_manip_type maniptype,
-			   const union nf_conntrack_man_proto *min,
-			   const union nf_conntrack_man_proto *max)
+bool nf_nat_l4proto_in_range(const struct nf_conntrack_tuple *tuple,
+			     enum nf_nat_manip_type maniptype,
+			     const union nf_conntrack_man_proto *min,
+			     const union nf_conntrack_man_proto *max)
 {
 	__be16 port;
 
@@ -34,13 +32,14 @@ bool nf_nat_proto_in_range(const struct nf_conntrack_tuple *tuple,
 	return ntohs(port) >= ntohs(min->all) &&
 	       ntohs(port) <= ntohs(max->all);
 }
-EXPORT_SYMBOL_GPL(nf_nat_proto_in_range);
+EXPORT_SYMBOL_GPL(nf_nat_l4proto_in_range);
 
-void nf_nat_proto_unique_tuple(struct nf_conntrack_tuple *tuple,
-			       const struct nf_nat_ipv4_range *range,
-			       enum nf_nat_manip_type maniptype,
-			       const struct nf_conn *ct,
-			       u_int16_t *rover)
+void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto,
+				 struct nf_conntrack_tuple *tuple,
+				 const struct nf_nat_range *range,
+				 enum nf_nat_manip_type maniptype,
+				 const struct nf_conn *ct,
+				 u16 *rover)
 {
 	unsigned int range_size, min, i;
 	__be16 *portptr;
@@ -71,15 +70,14 @@ void nf_nat_proto_unique_tuple(struct nf_conntrack_tuple *tuple,
 			range_size = 65535 - 1024 + 1;
 		}
 	} else {
-		min = ntohs(range->min.all);
-		range_size = ntohs(range->max.all) - min + 1;
+		min = ntohs(range->min_proto.all);
+		range_size = ntohs(range->max_proto.all) - min + 1;
 	}
 
 	if (range->flags & NF_NAT_RANGE_PROTO_RANDOM)
-		off = secure_ipv4_port_ephemeral(tuple->src.u3.ip, tuple->dst.u3.ip,
-						 maniptype == NF_NAT_MANIP_SRC
-						 ? tuple->dst.u.all
-						 : tuple->src.u.all);
+		off = l3proto->secure_port(tuple, maniptype == NF_NAT_MANIP_SRC
+						  ? tuple->dst.u.all
+						  : tuple->src.u.all);
 	else
 		off = *rover;
 
@@ -93,22 +91,22 @@ void nf_nat_proto_unique_tuple(struct nf_conntrack_tuple *tuple,
 	}
 	return;
 }
-EXPORT_SYMBOL_GPL(nf_nat_proto_unique_tuple);
+EXPORT_SYMBOL_GPL(nf_nat_l4proto_unique_tuple);
 
 #if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
-int nf_nat_proto_nlattr_to_range(struct nlattr *tb[],
-				 struct nf_nat_ipv4_range *range)
+int nf_nat_l4proto_nlattr_to_range(struct nlattr *tb[],
+				   struct nf_nat_range *range)
 {
 	if (tb[CTA_PROTONAT_PORT_MIN]) {
-		range->min.all = nla_get_be16(tb[CTA_PROTONAT_PORT_MIN]);
-		range->max.all = range->min.tcp.port;
+		range->min_proto.all = nla_get_be16(tb[CTA_PROTONAT_PORT_MIN]);
+		range->max_proto.all = range->min_proto.all;
 		range->flags |= NF_NAT_RANGE_PROTO_SPECIFIED;
 	}
 	if (tb[CTA_PROTONAT_PORT_MAX]) {
-		range->max.all = nla_get_be16(tb[CTA_PROTONAT_PORT_MAX]);
+		range->max_proto.all = nla_get_be16(tb[CTA_PROTONAT_PORT_MAX]);
 		range->flags |= NF_NAT_RANGE_PROTO_SPECIFIED;
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(nf_nat_proto_nlattr_to_range);
+EXPORT_SYMBOL_GPL(nf_nat_l4proto_nlattr_to_range);
 #endif
diff --git a/net/ipv4/netfilter/nf_nat_proto_dccp.c b/net/netfilter/nf_nat_proto_dccp.c
similarity index 61%
rename from net/ipv4/netfilter/nf_nat_proto_dccp.c
rename to net/netfilter/nf_nat_proto_dccp.c
index 3f67138..c8be2cd 100644
--- a/net/ipv4/netfilter/nf_nat_proto_dccp.c
+++ b/net/netfilter/nf_nat_proto_dccp.c
@@ -1,7 +1,7 @@
 /*
  * DCCP NAT protocol helper
  *
- * Copyright (c) 2005, 2006. 2008 Patrick McHardy <kaber@trash.net>
+ * Copyright (c) 2005, 2006, 2008 Patrick McHardy <kaber@trash.net>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -13,35 +13,34 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/skbuff.h>
-#include <linux/ip.h>
 #include <linux/dccp.h>
 
 #include <net/netfilter/nf_conntrack.h>
 #include <net/netfilter/nf_nat.h>
-#include <net/netfilter/nf_nat_protocol.h>
+#include <net/netfilter/nf_nat_l3proto.h>
+#include <net/netfilter/nf_nat_l4proto.h>
 
 static u_int16_t dccp_port_rover;
 
 static void
-dccp_unique_tuple(struct nf_conntrack_tuple *tuple,
-		  const struct nf_nat_ipv4_range *range,
+dccp_unique_tuple(const struct nf_nat_l3proto *l3proto,
+		  struct nf_conntrack_tuple *tuple,
+		  const struct nf_nat_range *range,
 		  enum nf_nat_manip_type maniptype,
 		  const struct nf_conn *ct)
 {
-	nf_nat_proto_unique_tuple(tuple, range, maniptype, ct,
-				  &dccp_port_rover);
+	nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct,
+				    &dccp_port_rover);
 }
 
 static bool
 dccp_manip_pkt(struct sk_buff *skb,
-	       unsigned int iphdroff,
+	       const struct nf_nat_l3proto *l3proto,
+	       unsigned int iphdroff, unsigned int hdroff,
 	       const struct nf_conntrack_tuple *tuple,
 	       enum nf_nat_manip_type maniptype)
 {
-	const struct iphdr *iph = (const void *)(skb->data + iphdroff);
 	struct dccp_hdr *hdr;
-	unsigned int hdroff = iphdroff + iph->ihl * 4;
-	__be32 oldip, newip;
 	__be16 *portptr, oldport, newport;
 	int hdrsize = 8; /* DCCP connection tracking guarantees this much */
 
@@ -51,17 +50,12 @@ dccp_manip_pkt(struct sk_buff *skb,
 	if (!skb_make_writable(skb, hdroff + hdrsize))
 		return false;
 
-	iph = (struct iphdr *)(skb->data + iphdroff);
 	hdr = (struct dccp_hdr *)(skb->data + hdroff);
 
 	if (maniptype == NF_NAT_MANIP_SRC) {
-		oldip = iph->saddr;
-		newip = tuple->src.u3.ip;
 		newport = tuple->src.u.dccp.port;
 		portptr = &hdr->dccph_sport;
 	} else {
-		oldip = iph->daddr;
-		newip = tuple->dst.u3.ip;
 		newport = tuple->dst.u.dccp.port;
 		portptr = &hdr->dccph_dport;
 	}
@@ -72,30 +66,46 @@ dccp_manip_pkt(struct sk_buff *skb,
 	if (hdrsize < sizeof(*hdr))
 		return true;
 
-	inet_proto_csum_replace4(&hdr->dccph_checksum, skb, oldip, newip, 1);
+	l3proto->csum_update(skb, iphdroff, &hdr->dccph_checksum,
+			     tuple, maniptype);
 	inet_proto_csum_replace2(&hdr->dccph_checksum, skb, oldport, newport,
 				 0);
 	return true;
 }
 
-static const struct nf_nat_protocol nf_nat_protocol_dccp = {
-	.protonum		= IPPROTO_DCCP,
+static const struct nf_nat_l4proto nf_nat_l4proto_dccp = {
+	.l4proto		= IPPROTO_DCCP,
 	.manip_pkt		= dccp_manip_pkt,
-	.in_range		= nf_nat_proto_in_range,
+	.in_range		= nf_nat_l4proto_in_range,
 	.unique_tuple		= dccp_unique_tuple,
 #if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
-	.nlattr_to_range	= nf_nat_proto_nlattr_to_range,
+	.nlattr_to_range	= nf_nat_l4proto_nlattr_to_range,
 #endif
 };
 
 static int __init nf_nat_proto_dccp_init(void)
 {
-	return nf_nat_protocol_register(&nf_nat_protocol_dccp);
+	int err;
+
+	err = nf_nat_l4proto_register(NFPROTO_IPV4, &nf_nat_l4proto_dccp);
+	if (err < 0)
+		goto err1;
+	err = nf_nat_l4proto_register(NFPROTO_IPV6, &nf_nat_l4proto_dccp);
+	if (err < 0)
+		goto err2;
+	return 0;
+
+err2:
+	nf_nat_l4proto_unregister(NFPROTO_IPV4, &nf_nat_l4proto_dccp);
+err1:
+	return err;
 }
 
 static void __exit nf_nat_proto_dccp_fini(void)
 {
-	nf_nat_protocol_unregister(&nf_nat_protocol_dccp);
+	nf_nat_l4proto_unregister(NFPROTO_IPV6, &nf_nat_l4proto_dccp);
+	nf_nat_l4proto_unregister(NFPROTO_IPV4, &nf_nat_l4proto_dccp);
+
 }
 
 module_init(nf_nat_proto_dccp_init);
diff --git a/net/ipv4/netfilter/nf_nat_proto_sctp.c b/net/netfilter/nf_nat_proto_sctp.c
similarity index 61%
rename from net/ipv4/netfilter/nf_nat_proto_sctp.c
rename to net/netfilter/nf_nat_proto_sctp.c
index 3cce9b6..e64faa5 100644
--- a/net/ipv4/netfilter/nf_nat_proto_sctp.c
+++ b/net/netfilter/nf_nat_proto_sctp.c
@@ -8,53 +8,46 @@
 
 #include <linux/types.h>
 #include <linux/init.h>
-#include <linux/ip.h>
 #include <linux/sctp.h>
 #include <linux/module.h>
 #include <net/sctp/checksum.h>
 
-#include <net/netfilter/nf_nat_protocol.h>
+#include <net/netfilter/nf_nat_l4proto.h>
 
 static u_int16_t nf_sctp_port_rover;
 
 static void
-sctp_unique_tuple(struct nf_conntrack_tuple *tuple,
-		  const struct nf_nat_ipv4_range *range,
+sctp_unique_tuple(const struct nf_nat_l3proto *l3proto,
+		  struct nf_conntrack_tuple *tuple,
+		  const struct nf_nat_range *range,
 		  enum nf_nat_manip_type maniptype,
 		  const struct nf_conn *ct)
 {
-	nf_nat_proto_unique_tuple(tuple, range, maniptype, ct,
-				  &nf_sctp_port_rover);
+	nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct,
+				    &nf_sctp_port_rover);
 }
 
 static bool
 sctp_manip_pkt(struct sk_buff *skb,
-	       unsigned int iphdroff,
+	       const struct nf_nat_l3proto *l3proto,
+	       unsigned int iphdroff, unsigned int hdroff,
 	       const struct nf_conntrack_tuple *tuple,
 	       enum nf_nat_manip_type maniptype)
 {
-	const struct iphdr *iph = (struct iphdr *)(skb->data + iphdroff);
 	struct sk_buff *frag;
 	sctp_sctphdr_t *hdr;
-	unsigned int hdroff = iphdroff + iph->ihl*4;
-	__be32 oldip, newip;
 	__be32 crc32;
 
 	if (!skb_make_writable(skb, hdroff + sizeof(*hdr)))
 		return false;
 
-	iph = (struct iphdr *)(skb->data + iphdroff);
 	hdr = (struct sctphdr *)(skb->data + hdroff);
 
 	if (maniptype == NF_NAT_MANIP_SRC) {
-		/* Get rid of src ip and src pt */
-		oldip = iph->saddr;
-		newip = tuple->src.u3.ip;
+		/* Get rid of src port */
 		hdr->source = tuple->src.u.sctp.port;
 	} else {
-		/* Get rid of dst ip and dst pt */
-		oldip = iph->daddr;
-		newip = tuple->dst.u3.ip;
+		/* Get rid of dst port */
 		hdr->dest = tuple->dst.u.sctp.port;
 	}
 
@@ -68,24 +61,38 @@ sctp_manip_pkt(struct sk_buff *skb,
 	return true;
 }
 
-static const struct nf_nat_protocol nf_nat_protocol_sctp = {
-	.protonum		= IPPROTO_SCTP,
+static const struct nf_nat_l4proto nf_nat_l4proto_sctp = {
+	.l4proto		= IPPROTO_SCTP,
 	.manip_pkt		= sctp_manip_pkt,
-	.in_range		= nf_nat_proto_in_range,
+	.in_range		= nf_nat_l4proto_in_range,
 	.unique_tuple		= sctp_unique_tuple,
 #if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
-	.nlattr_to_range	= nf_nat_proto_nlattr_to_range,
+	.nlattr_to_range	= nf_nat_l4proto_nlattr_to_range,
 #endif
 };
 
 static int __init nf_nat_proto_sctp_init(void)
 {
-	return nf_nat_protocol_register(&nf_nat_protocol_sctp);
+	int err;
+
+	err = nf_nat_l4proto_register(NFPROTO_IPV4, &nf_nat_l4proto_sctp);
+	if (err < 0)
+		goto err1;
+	err = nf_nat_l4proto_register(NFPROTO_IPV6, &nf_nat_l4proto_sctp);
+	if (err < 0)
+		goto err2;
+	return 0;
+
+err2:
+	nf_nat_l4proto_unregister(NFPROTO_IPV4, &nf_nat_l4proto_sctp);
+err1:
+	return err;
 }
 
 static void __exit nf_nat_proto_sctp_exit(void)
 {
-	nf_nat_protocol_unregister(&nf_nat_protocol_sctp);
+	nf_nat_l4proto_unregister(NFPROTO_IPV6, &nf_nat_l4proto_sctp);
+	nf_nat_l4proto_unregister(NFPROTO_IPV4, &nf_nat_l4proto_sctp);
 }
 
 module_init(nf_nat_proto_sctp_init);
diff --git a/net/ipv4/netfilter/nf_nat_proto_tcp.c b/net/netfilter/nf_nat_proto_tcp.c
similarity index 65%
rename from net/ipv4/netfilter/nf_nat_proto_tcp.c
rename to net/netfilter/nf_nat_proto_tcp.c
index 9fb4b4e..83ec8a6 100644
--- a/net/ipv4/netfilter/nf_nat_proto_tcp.c
+++ b/net/netfilter/nf_nat_proto_tcp.c
@@ -9,37 +9,36 @@
 #include <linux/types.h>
 #include <linux/init.h>
 #include <linux/export.h>
-#include <linux/ip.h>
 #include <linux/tcp.h>
 
 #include <linux/netfilter.h>
 #include <linux/netfilter/nfnetlink_conntrack.h>
 #include <net/netfilter/nf_nat.h>
-#include <net/netfilter/nf_nat_rule.h>
-#include <net/netfilter/nf_nat_protocol.h>
+#include <net/netfilter/nf_nat_l3proto.h>
+#include <net/netfilter/nf_nat_l4proto.h>
 #include <net/netfilter/nf_nat_core.h>
 
-static u_int16_t tcp_port_rover;
+static u16 tcp_port_rover;
 
 static void
-tcp_unique_tuple(struct nf_conntrack_tuple *tuple,
-		 const struct nf_nat_ipv4_range *range,
+tcp_unique_tuple(const struct nf_nat_l3proto *l3proto,
+		 struct nf_conntrack_tuple *tuple,
+		 const struct nf_nat_range *range,
 		 enum nf_nat_manip_type maniptype,
 		 const struct nf_conn *ct)
 {
-	nf_nat_proto_unique_tuple(tuple, range, maniptype, ct, &tcp_port_rover);
+	nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct,
+				    &tcp_port_rover);
 }
 
 static bool
 tcp_manip_pkt(struct sk_buff *skb,
-	      unsigned int iphdroff,
+	      const struct nf_nat_l3proto *l3proto,
+	      unsigned int iphdroff, unsigned int hdroff,
 	      const struct nf_conntrack_tuple *tuple,
 	      enum nf_nat_manip_type maniptype)
 {
-	const struct iphdr *iph = (struct iphdr *)(skb->data + iphdroff);
 	struct tcphdr *hdr;
-	unsigned int hdroff = iphdroff + iph->ihl*4;
-	__be32 oldip, newip;
 	__be16 *portptr, newport, oldport;
 	int hdrsize = 8; /* TCP connection tracking guarantees this much */
 
@@ -52,19 +51,14 @@ tcp_manip_pkt(struct sk_buff *skb,
 	if (!skb_make_writable(skb, hdroff + hdrsize))
 		return false;
 
-	iph = (struct iphdr *)(skb->data + iphdroff);
 	hdr = (struct tcphdr *)(skb->data + hdroff);
 
 	if (maniptype == NF_NAT_MANIP_SRC) {
-		/* Get rid of src ip and src pt */
-		oldip = iph->saddr;
-		newip = tuple->src.u3.ip;
+		/* Get rid of src port */
 		newport = tuple->src.u.tcp.port;
 		portptr = &hdr->source;
 	} else {
-		/* Get rid of dst ip and dst pt */
-		oldip = iph->daddr;
-		newip = tuple->dst.u3.ip;
+		/* Get rid of dst port */
 		newport = tuple->dst.u.tcp.port;
 		portptr = &hdr->dest;
 	}
@@ -75,17 +69,17 @@ tcp_manip_pkt(struct sk_buff *skb,
 	if (hdrsize < sizeof(*hdr))
 		return true;
 
-	inet_proto_csum_replace4(&hdr->check, skb, oldip, newip, 1);
+	l3proto->csum_update(skb, iphdroff, &hdr->check, tuple, maniptype);
 	inet_proto_csum_replace2(&hdr->check, skb, oldport, newport, 0);
 	return true;
 }
 
-const struct nf_nat_protocol nf_nat_protocol_tcp = {
-	.protonum		= IPPROTO_TCP,
+const struct nf_nat_l4proto nf_nat_l4proto_tcp = {
+	.l4proto		= IPPROTO_TCP,
 	.manip_pkt		= tcp_manip_pkt,
-	.in_range		= nf_nat_proto_in_range,
+	.in_range		= nf_nat_l4proto_in_range,
 	.unique_tuple		= tcp_unique_tuple,
 #if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
-	.nlattr_to_range	= nf_nat_proto_nlattr_to_range,
+	.nlattr_to_range	= nf_nat_l4proto_nlattr_to_range,
 #endif
 };
diff --git a/net/ipv4/netfilter/nf_nat_proto_udp.c b/net/netfilter/nf_nat_proto_udp.c
similarity index 60%
rename from net/ipv4/netfilter/nf_nat_proto_udp.c
rename to net/netfilter/nf_nat_proto_udp.c
index 9883336..7df613f 100644
--- a/net/ipv4/netfilter/nf_nat_proto_udp.c
+++ b/net/netfilter/nf_nat_proto_udp.c
@@ -9,59 +9,53 @@
 #include <linux/types.h>
 #include <linux/export.h>
 #include <linux/init.h>
-#include <linux/ip.h>
 #include <linux/udp.h>
 
 #include <linux/netfilter.h>
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_core.h>
-#include <net/netfilter/nf_nat_rule.h>
-#include <net/netfilter/nf_nat_protocol.h>
+#include <net/netfilter/nf_nat_l3proto.h>
+#include <net/netfilter/nf_nat_l4proto.h>
 
-static u_int16_t udp_port_rover;
+static u16 udp_port_rover;
 
 static void
-udp_unique_tuple(struct nf_conntrack_tuple *tuple,
-		 const struct nf_nat_ipv4_range *range,
+udp_unique_tuple(const struct nf_nat_l3proto *l3proto,
+		 struct nf_conntrack_tuple *tuple,
+		 const struct nf_nat_range *range,
 		 enum nf_nat_manip_type maniptype,
 		 const struct nf_conn *ct)
 {
-	nf_nat_proto_unique_tuple(tuple, range, maniptype, ct, &udp_port_rover);
+	nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct,
+				    &udp_port_rover);
 }
 
 static bool
 udp_manip_pkt(struct sk_buff *skb,
-	      unsigned int iphdroff,
+	      const struct nf_nat_l3proto *l3proto,
+	      unsigned int iphdroff, unsigned int hdroff,
 	      const struct nf_conntrack_tuple *tuple,
 	      enum nf_nat_manip_type maniptype)
 {
-	const struct iphdr *iph = (struct iphdr *)(skb->data + iphdroff);
 	struct udphdr *hdr;
-	unsigned int hdroff = iphdroff + iph->ihl*4;
-	__be32 oldip, newip;
 	__be16 *portptr, newport;
 
 	if (!skb_make_writable(skb, hdroff + sizeof(*hdr)))
 		return false;
-
-	iph = (struct iphdr *)(skb->data + iphdroff);
 	hdr = (struct udphdr *)(skb->data + hdroff);
 
 	if (maniptype == NF_NAT_MANIP_SRC) {
-		/* Get rid of src ip and src pt */
-		oldip = iph->saddr;
-		newip = tuple->src.u3.ip;
+		/* Get rid of src port */
 		newport = tuple->src.u.udp.port;
 		portptr = &hdr->source;
 	} else {
-		/* Get rid of dst ip and dst pt */
-		oldip = iph->daddr;
-		newip = tuple->dst.u3.ip;
+		/* Get rid of dst port */
 		newport = tuple->dst.u.udp.port;
 		portptr = &hdr->dest;
 	}
 	if (hdr->check || skb->ip_summed == CHECKSUM_PARTIAL) {
-		inet_proto_csum_replace4(&hdr->check, skb, oldip, newip, 1);
+		l3proto->csum_update(skb, iphdroff, &hdr->check,
+				     tuple, maniptype);
 		inet_proto_csum_replace2(&hdr->check, skb, *portptr, newport,
 					 0);
 		if (!hdr->check)
@@ -71,12 +65,12 @@ udp_manip_pkt(struct sk_buff *skb,
 	return true;
 }
 
-const struct nf_nat_protocol nf_nat_protocol_udp = {
-	.protonum		= IPPROTO_UDP,
+const struct nf_nat_l4proto nf_nat_l4proto_udp = {
+	.l4proto		= IPPROTO_UDP,
 	.manip_pkt		= udp_manip_pkt,
-	.in_range		= nf_nat_proto_in_range,
+	.in_range		= nf_nat_l4proto_in_range,
 	.unique_tuple		= udp_unique_tuple,
 #if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
-	.nlattr_to_range	= nf_nat_proto_nlattr_to_range,
+	.nlattr_to_range	= nf_nat_l4proto_nlattr_to_range,
 #endif
 };
diff --git a/net/ipv4/netfilter/nf_nat_proto_udplite.c b/net/netfilter/nf_nat_proto_udplite.c
similarity index 58%
rename from net/ipv4/netfilter/nf_nat_proto_udplite.c
rename to net/netfilter/nf_nat_proto_udplite.c
index d24d10a..776a0d1 100644
--- a/net/ipv4/netfilter/nf_nat_proto_udplite.c
+++ b/net/netfilter/nf_nat_proto_udplite.c
@@ -9,59 +9,53 @@
 
 #include <linux/types.h>
 #include <linux/init.h>
-#include <linux/ip.h>
 #include <linux/udp.h>
 
 #include <linux/netfilter.h>
 #include <linux/module.h>
 #include <net/netfilter/nf_nat.h>
-#include <net/netfilter/nf_nat_protocol.h>
+#include <net/netfilter/nf_nat_l3proto.h>
+#include <net/netfilter/nf_nat_l4proto.h>
 
-static u_int16_t udplite_port_rover;
+static u16 udplite_port_rover;
 
 static void
-udplite_unique_tuple(struct nf_conntrack_tuple *tuple,
-		     const struct nf_nat_ipv4_range *range,
+udplite_unique_tuple(const struct nf_nat_l3proto *l3proto,
+		     struct nf_conntrack_tuple *tuple,
+		     const struct nf_nat_range *range,
 		     enum nf_nat_manip_type maniptype,
 		     const struct nf_conn *ct)
 {
-	nf_nat_proto_unique_tuple(tuple, range, maniptype, ct,
-				  &udplite_port_rover);
+	nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct,
+				    &udplite_port_rover);
 }
 
 static bool
 udplite_manip_pkt(struct sk_buff *skb,
-		  unsigned int iphdroff,
+		  const struct nf_nat_l3proto *l3proto,
+		  unsigned int iphdroff, unsigned int hdroff,
 		  const struct nf_conntrack_tuple *tuple,
 		  enum nf_nat_manip_type maniptype)
 {
-	const struct iphdr *iph = (struct iphdr *)(skb->data + iphdroff);
 	struct udphdr *hdr;
-	unsigned int hdroff = iphdroff + iph->ihl*4;
-	__be32 oldip, newip;
 	__be16 *portptr, newport;
 
 	if (!skb_make_writable(skb, hdroff + sizeof(*hdr)))
 		return false;
 
-	iph = (struct iphdr *)(skb->data + iphdroff);
 	hdr = (struct udphdr *)(skb->data + hdroff);
 
 	if (maniptype == NF_NAT_MANIP_SRC) {
-		/* Get rid of src ip and src pt */
-		oldip = iph->saddr;
-		newip = tuple->src.u3.ip;
+		/* Get rid of source port */
 		newport = tuple->src.u.udp.port;
 		portptr = &hdr->source;
 	} else {
-		/* Get rid of dst ip and dst pt */
-		oldip = iph->daddr;
-		newip = tuple->dst.u3.ip;
+		/* Get rid of dst port */
 		newport = tuple->dst.u.udp.port;
 		portptr = &hdr->dest;
 	}
 
-	inet_proto_csum_replace4(&hdr->check, skb, oldip, newip, 1);
+	l3proto->csum_update(skb, iphdroff, &hdr->check, tuple, maniptype);
 	inet_proto_csum_replace2(&hdr->check, skb, *portptr, newport, 0);
 	if (!hdr->check)
 		hdr->check = CSUM_MANGLED_0;
@@ -70,24 +64,38 @@ udplite_manip_pkt(struct sk_buff *skb,
 	return true;
 }
 
-static const struct nf_nat_protocol nf_nat_protocol_udplite = {
-	.protonum		= IPPROTO_UDPLITE,
+static const struct nf_nat_l4proto nf_nat_l4proto_udplite = {
+	.l4proto		= IPPROTO_UDPLITE,
 	.manip_pkt		= udplite_manip_pkt,
-	.in_range		= nf_nat_proto_in_range,
+	.in_range		= nf_nat_l4proto_in_range,
 	.unique_tuple		= udplite_unique_tuple,
 #if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
-	.nlattr_to_range	= nf_nat_proto_nlattr_to_range,
+	.nlattr_to_range	= nf_nat_l4proto_nlattr_to_range,
 #endif
 };
 
 static int __init nf_nat_proto_udplite_init(void)
 {
-	return nf_nat_protocol_register(&nf_nat_protocol_udplite);
+	int err;
+
+	err = nf_nat_l4proto_register(NFPROTO_IPV4, &nf_nat_l4proto_udplite);
+	if (err < 0)
+		goto err1;
+	err = nf_nat_l4proto_register(NFPROTO_IPV6, &nf_nat_l4proto_udplite);
+	if (err < 0)
+		goto err2;
+	return 0;
+
+err2:
+	nf_nat_l4proto_unregister(NFPROTO_IPV4, &nf_nat_l4proto_udplite);
+err1:
+	return err;
 }
 
 static void __exit nf_nat_proto_udplite_fini(void)
 {
-	nf_nat_protocol_unregister(&nf_nat_protocol_udplite);
+	nf_nat_l4proto_unregister(NFPROTO_IPV6, &nf_nat_l4proto_udplite);
+	nf_nat_l4proto_unregister(NFPROTO_IPV4, &nf_nat_l4proto_udplite);
 }
 
 module_init(nf_nat_proto_udplite_init);
diff --git a/net/ipv4/netfilter/nf_nat_proto_unknown.c b/net/netfilter/nf_nat_proto_unknown.c
similarity index 76%
rename from net/ipv4/netfilter/nf_nat_proto_unknown.c
rename to net/netfilter/nf_nat_proto_unknown.c
index e0afe81..6e494d5 100644
--- a/net/ipv4/netfilter/nf_nat_proto_unknown.c
+++ b/net/netfilter/nf_nat_proto_unknown.c
@@ -15,8 +15,7 @@
 
 #include <linux/netfilter.h>
 #include <net/netfilter/nf_nat.h>
-#include <net/netfilter/nf_nat_rule.h>
-#include <net/netfilter/nf_nat_protocol.h>
+#include <net/netfilter/nf_nat_l4proto.h>
 
 static bool unknown_in_range(const struct nf_conntrack_tuple *tuple,
 			     enum nf_nat_manip_type manip_type,
@@ -26,26 +25,29 @@ static bool unknown_in_range(const struct nf_conntrack_tuple *tuple,
 	return true;
 }
 
-static void unknown_unique_tuple(struct nf_conntrack_tuple *tuple,
-				 const struct nf_nat_ipv4_range *range,
+static void unknown_unique_tuple(const struct nf_nat_l3proto *l3proto,
+				 struct nf_conntrack_tuple *tuple,
+				 const struct nf_nat_range *range,
 				 enum nf_nat_manip_type maniptype,
 				 const struct nf_conn *ct)
 {
 	/* Sorry: we can't help you; if it's not unique, we can't frob
-	   anything. */
+	 * anything.
+	 */
 	return;
 }
 
 static bool
 unknown_manip_pkt(struct sk_buff *skb,
-		  unsigned int iphdroff,
+		  const struct nf_nat_l3proto *l3proto,
+		  unsigned int iphdroff, unsigned int hdroff,
 		  const struct nf_conntrack_tuple *tuple,
 		  enum nf_nat_manip_type maniptype)
 {
 	return true;
 }
 
-const struct nf_nat_protocol nf_nat_unknown_protocol = {
+const struct nf_nat_l4proto nf_nat_l4proto_unknown = {
 	.manip_pkt		= unknown_manip_pkt,
 	.in_range		= unknown_in_range,
 	.unique_tuple		= unknown_unique_tuple,
diff --git a/net/netfilter/xt_nat.c b/net/netfilter/xt_nat.c
new file mode 100644
index 0000000..7521368
--- /dev/null
+++ b/net/netfilter/xt_nat.c
@@ -0,0 +1,167 @@
+/*
+ * (C) 1999-2001 Paul `Rusty' Russell
+ * (C) 2002-2006 Netfilter Core Team <coreteam@netfilter.org>
+ * (C) 2011 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_nat_core.h>
+
+static int xt_nat_checkentry_v0(const struct xt_tgchk_param *par)
+{
+	const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo;
+
+	if (mr->rangesize != 1) {
+		pr_info("%s: multiple ranges no longer supported\n",
+			par->target->name);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static void xt_nat_convert_range(struct nf_nat_range *dst,
+				 const struct nf_nat_ipv4_range *src)
+{
+	memset(&dst->min_addr, 0, sizeof(dst->min_addr));
+	memset(&dst->max_addr, 0, sizeof(dst->max_addr));
+
+	dst->flags	 = src->flags;
+	dst->min_addr.ip = src->min_ip;
+	dst->max_addr.ip = src->max_ip;
+	dst->min_proto	 = src->min;
+	dst->max_proto	 = src->max;
+}
+
+static unsigned int
+xt_snat_target_v0(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo;
+	struct nf_nat_range range;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	NF_CT_ASSERT(ct != NULL &&
+		     (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED ||
+		      ctinfo == IP_CT_RELATED_REPLY));
+
+	xt_nat_convert_range(&range, &mr->range[0]);
+	return nf_nat_setup_info(ct, &range, NF_NAT_MANIP_SRC);
+}
+
+static unsigned int
+xt_dnat_target_v0(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo;
+	struct nf_nat_range range;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	NF_CT_ASSERT(ct != NULL &&
+		     (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED));
+
+	xt_nat_convert_range(&range, &mr->range[0]);
+	return nf_nat_setup_info(ct, &range, NF_NAT_MANIP_DST);
+}
+
+static unsigned int
+xt_snat_target_v1(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct nf_nat_range *range = par->targinfo;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	NF_CT_ASSERT(ct != NULL &&
+		     (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED ||
+		      ctinfo == IP_CT_RELATED_REPLY));
+
+	return nf_nat_setup_info(ct, range, NF_NAT_MANIP_SRC);
+}
+
+static unsigned int
+xt_dnat_target_v1(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct nf_nat_range *range = par->targinfo;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	NF_CT_ASSERT(ct != NULL &&
+		     (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED));
+
+	return nf_nat_setup_info(ct, range, NF_NAT_MANIP_DST);
+}
+
+static struct xt_target xt_nat_target_reg[] __read_mostly = {
+	{
+		.name		= "SNAT",
+		.revision	= 0,
+		.checkentry	= xt_nat_checkentry_v0,
+		.target		= xt_snat_target_v0,
+		.targetsize	= sizeof(struct nf_nat_ipv4_multi_range_compat),
+		.family		= NFPROTO_IPV4,
+		.table		= "nat",
+		.hooks		= (1 << NF_INET_POST_ROUTING) |
+				  (1 << NF_INET_LOCAL_OUT),
+		.me		= THIS_MODULE,
+	},
+	{
+		.name		= "DNAT",
+		.revision	= 0,
+		.checkentry	= xt_nat_checkentry_v0,
+		.target		= xt_dnat_target_v0,
+		.targetsize	= sizeof(struct nf_nat_ipv4_multi_range_compat),
+		.family		= NFPROTO_IPV4,
+		.table		= "nat",
+		.hooks		= (1 << NF_INET_PRE_ROUTING) |
+				  (1 << NF_INET_LOCAL_IN),
+		.me		= THIS_MODULE,
+	},
+	{
+		.name		= "SNAT",
+		.revision	= 1,
+		.target		= xt_snat_target_v1,
+		.targetsize	= sizeof(struct nf_nat_range),
+		.table		= "nat",
+		.hooks		= (1 << NF_INET_POST_ROUTING) |
+				  (1 << NF_INET_LOCAL_OUT),
+		.me		= THIS_MODULE,
+	},
+	{
+		.name		= "DNAT",
+		.revision	= 1,
+		.target		= xt_dnat_target_v1,
+		.targetsize	= sizeof(struct nf_nat_range),
+		.table		= "nat",
+		.hooks		= (1 << NF_INET_PRE_ROUTING) |
+				  (1 << NF_INET_LOCAL_IN),
+		.me		= THIS_MODULE,
+	},
+};
+
+static int __init xt_nat_init(void)
+{
+	return xt_register_targets(xt_nat_target_reg,
+				   ARRAY_SIZE(xt_nat_target_reg));
+}
+
+static void __exit xt_nat_exit(void)
+{
+	xt_unregister_targets(xt_nat_target_reg, ARRAY_SIZE(xt_nat_target_reg));
+}
+
+module_init(xt_nat_init);
+module_exit(xt_nat_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_SNAT");
+MODULE_ALIAS("ipt_DNAT");
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 10/19] netfilter: ipv6: expand skb head in ip6_route_me_harder after oif change
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (8 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 09/19] netfilter: add protocol independant NAT core kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-09 20:08 ` [PATCH 11/19] net: core: add function for incremental IPv6 pseudo header checksum updates kaber
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Expand the skb headroom if the oif changed due to rerouting similar to
how IPv4 packets are handled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/ipv6/netfilter.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index db31561..429089c 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -15,6 +15,7 @@ int ip6_route_me_harder(struct sk_buff *skb)
 {
 	struct net *net = dev_net(skb_dst(skb)->dev);
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
+	unsigned int hh_len;
 	struct dst_entry *dst;
 	struct flowi6 fl6 = {
 		.flowi6_oif = skb->sk ? skb->sk->sk_bound_dev_if : 0,
@@ -47,6 +48,13 @@ int ip6_route_me_harder(struct sk_buff *skb)
 	}
 #endif
 
+	/* Change in oif may mean change in hh_len. */
+	hh_len = skb_dst(skb)->dev->hard_header_len;
+	if (skb_headroom(skb) < hh_len &&
+	    pskb_expand_head(skb, HH_DATA_ALIGN(hh_len - skb_headroom(skb)),
+			     0, GFP_ATOMIC))
+		return -1;
+
 	return 0;
 }
 EXPORT_SYMBOL(ip6_route_me_harder);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 11/19] net: core: add function for incremental IPv6 pseudo header checksum updates
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (9 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 10/19] netfilter: ipv6: expand skb head in ip6_route_me_harder after oif change kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-09 20:08 ` [PATCH 12/19] netfilter: ipv6: add IPv6 NAT support kaber
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Add inet_proto_csum_replace16 for incrementally updating IPv6 pseudo header
checksums for IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/checksum.h |    3 +++
 net/core/utils.c       |   20 ++++++++++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/net/checksum.h b/include/net/checksum.h
index ba55d8b..600d1d7 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -109,6 +109,9 @@ static inline void csum_replace2(__sum16 *sum, __be16 from, __be16 to)
 struct sk_buff;
 extern void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
 				     __be32 from, __be32 to, int pseudohdr);
+extern void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
+				      const __be32 *from, const __be32 *to,
+				      int pseudohdr);
 
 static inline void inet_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
 					    __be16 from, __be16 to,
diff --git a/net/core/utils.c b/net/core/utils.c
index 39895a6..f5613d5 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -294,6 +294,26 @@ void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
 }
 EXPORT_SYMBOL(inet_proto_csum_replace4);
 
+void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
+			       const __be32 *from, const __be32 *to,
+			       int pseudohdr)
+{
+	__be32 diff[] = {
+		~from[0], ~from[1], ~from[2], ~from[3],
+		to[0], to[1], to[2], to[3],
+	};
+	if (skb->ip_summed != CHECKSUM_PARTIAL) {
+		*sum = csum_fold(csum_partial(diff, sizeof(diff),
+				 ~csum_unfold(*sum)));
+		if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
+			skb->csum = ~csum_partial(diff, sizeof(diff),
+						  ~skb->csum);
+	} else if (pseudohdr)
+		*sum = ~csum_fold(csum_partial(diff, sizeof(diff),
+				  csum_unfold(*sum)));
+}
+EXPORT_SYMBOL(inet_proto_csum_replace16);
+
 int mac_pton(const char *s, u8 *mac)
 {
 	int i;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 12/19] netfilter: ipv6: add IPv6 NAT support
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (10 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 11/19] net: core: add function for incremental IPv6 pseudo header checksum updates kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-09 20:08 ` [PATCH 13/19] netfilter: ip6tables: add MASQUERADE target kaber
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter/nfnetlink_conntrack.h  |    2 +
 include/net/netfilter/nf_nat_l3proto.h         |    5 +
 include/net/netfilter/nf_nat_l4proto.h         |    1 +
 include/net/netns/ipv6.h                       |    1 +
 net/core/secure_seq.c                          |    1 +
 net/ipv6/netfilter/Kconfig                     |   12 +
 net/ipv6/netfilter/Makefile                    |    4 +
 net/ipv6/netfilter/ip6table_nat.c              |  321 ++++++++++++++++++++++++
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |   39 +++-
 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c       |  287 +++++++++++++++++++++
 net/ipv6/netfilter/nf_nat_proto_icmpv6.c       |   90 +++++++
 11 files changed, 761 insertions(+), 2 deletions(-)
 create mode 100644 net/ipv6/netfilter/ip6table_nat.c
 create mode 100644 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
 create mode 100644 net/ipv6/netfilter/nf_nat_proto_icmpv6.c

diff --git a/include/linux/netfilter/nfnetlink_conntrack.h b/include/linux/netfilter/nfnetlink_conntrack.h
index 68920ea..43bfe3e 100644
--- a/include/linux/netfilter/nfnetlink_conntrack.h
+++ b/include/linux/netfilter/nfnetlink_conntrack.h
@@ -147,6 +147,8 @@ enum ctattr_nat {
 	CTA_NAT_V4_MAXIP,
 #define CTA_NAT_MAXIP CTA_NAT_V4_MAXIP
 	CTA_NAT_PROTO,
+	CTA_NAT_V6_MINIP,
+	CTA_NAT_V6_MAXIP,
 	__CTA_NAT_MAX
 };
 #define CTA_NAT_MAX (__CTA_NAT_MAX - 1)
diff --git a/include/net/netfilter/nf_nat_l3proto.h b/include/net/netfilter/nf_nat_l3proto.h
index 3df4a67..57d9e27 100644
--- a/include/net/netfilter/nf_nat_l3proto.h
+++ b/include/net/netfilter/nf_nat_l3proto.h
@@ -43,5 +43,10 @@ extern int nf_nat_icmp_reply_translation(struct sk_buff *skb,
 					 struct nf_conn *ct,
 					 enum ip_conntrack_info ctinfo,
 					 unsigned int hooknum);
+extern int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
+					   struct nf_conn *ct,
+					   enum ip_conntrack_info ctinfo,
+					   unsigned int hooknum,
+					   unsigned int hdrlen);
 
 #endif /* _NF_NAT_L3PROTO_H */
diff --git a/include/net/netfilter/nf_nat_l4proto.h b/include/net/netfilter/nf_nat_l4proto.h
index 1f0a4f0..24feb68 100644
--- a/include/net/netfilter/nf_nat_l4proto.h
+++ b/include/net/netfilter/nf_nat_l4proto.h
@@ -51,6 +51,7 @@ extern const struct nf_nat_l4proto *__nf_nat_l4proto_find(u8 l3proto, u8 l4proto
 extern const struct nf_nat_l4proto nf_nat_l4proto_tcp;
 extern const struct nf_nat_l4proto nf_nat_l4proto_udp;
 extern const struct nf_nat_l4proto nf_nat_l4proto_icmp;
+extern const struct nf_nat_l4proto nf_nat_l4proto_icmpv6;
 extern const struct nf_nat_l4proto nf_nat_l4proto_unknown;
 
 extern bool nf_nat_l4proto_in_range(const struct nf_conntrack_tuple *tuple,
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index df0a545..0318104 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -42,6 +42,7 @@ struct netns_ipv6 {
 #ifdef CONFIG_SECURITY
 	struct xt_table		*ip6table_security;
 #endif
+	struct xt_table		*ip6table_nat;
 #endif
 	struct rt6_info         *ip6_null_entry;
 	struct rt6_statistics   *rt6_stats;
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 99b2596..e61a8bb 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -76,6 +76,7 @@ u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 
 	return hash[0];
 }
+EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 #endif
 
 #ifdef CONFIG_INET
diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index 1013534..b27e0ad 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -25,6 +25,18 @@ config NF_CONNTRACK_IPV6
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config NF_NAT_IPV6
+	tristate "IPv6 NAT"
+	depends on NF_CONNTRACK_IPV6
+	depends on NETFILTER_ADVANCED
+	select NF_NAT
+	help
+	  The IPv6 NAT option allows masquerading, port forwarding and other
+	  forms of full Network Address Port Translation. It is controlled by
+	  the `nat' table in ip6tables, see the man page for ip6tables(8).
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config IP6_NF_IPTABLES
 	tristate "IP6 tables support (required for filtering)"
 	depends on INET && IPV6
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index 534d3f2..7677937 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_IP6_NF_FILTER) += ip6table_filter.o
 obj-$(CONFIG_IP6_NF_MANGLE) += ip6table_mangle.o
 obj-$(CONFIG_IP6_NF_RAW) += ip6table_raw.o
 obj-$(CONFIG_IP6_NF_SECURITY) += ip6table_security.o
+obj-$(CONFIG_NF_NAT_IPV6) += ip6table_nat.o
 
 # objects for l3 independent conntrack
 nf_conntrack_ipv6-y  :=  nf_conntrack_l3proto_ipv6.o nf_conntrack_proto_icmpv6.o
@@ -15,6 +16,9 @@ nf_conntrack_ipv6-y  :=  nf_conntrack_l3proto_ipv6.o nf_conntrack_proto_icmpv6.o
 # l3 independent conntrack
 obj-$(CONFIG_NF_CONNTRACK_IPV6) += nf_conntrack_ipv6.o nf_defrag_ipv6.o
 
+nf_nat_ipv6-y		:= nf_nat_l3proto_ipv6.o nf_nat_proto_icmpv6.o
+obj-$(CONFIG_NF_NAT_IPV6) += nf_nat_ipv6.o
+
 # defrag
 nf_defrag_ipv6-y := nf_defrag_ipv6_hooks.o nf_conntrack_reasm.o
 obj-$(CONFIG_NF_DEFRAG_IPV6) += nf_defrag_ipv6.o
diff --git a/net/ipv6/netfilter/ip6table_nat.c b/net/ipv6/netfilter/ip6table_nat.c
new file mode 100644
index 0000000..b3f710e
--- /dev/null
+++ b/net/ipv6/netfilter/ip6table_nat.c
@@ -0,0 +1,321 @@
+/*
+ * Copyright (c) 2011 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on Rusty Russell's IPv4 NAT code. Development of IPv6 NAT
+ * funded by Astaro.
+ */
+
+#include <linux/module.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#include <linux/ipv6.h>
+#include <net/ipv6.h>
+
+#include <net/netfilter/nf_nat.h>
+#include <net/netfilter/nf_nat_core.h>
+#include <net/netfilter/nf_nat_l3proto.h>
+
+static const struct xt_table nf_nat_ipv6_table = {
+	.name		= "nat",
+	.valid_hooks	= (1 << NF_INET_PRE_ROUTING) |
+			  (1 << NF_INET_POST_ROUTING) |
+			  (1 << NF_INET_LOCAL_OUT) |
+			  (1 << NF_INET_LOCAL_IN),
+	.me		= THIS_MODULE,
+	.af		= NFPROTO_IPV6,
+};
+
+static unsigned int alloc_null_binding(struct nf_conn *ct, unsigned int hooknum)
+{
+	/* Force range to this IP; let proto decide mapping for
+	 * per-proto parts (hence not IP_NAT_RANGE_PROTO_SPECIFIED).
+	 */
+	struct nf_nat_range range;
+
+	range.flags = 0;
+	pr_debug("Allocating NULL binding for %p (%pI6)\n", ct,
+		 HOOK2MANIP(hooknum) == NF_NAT_MANIP_SRC ?
+		 &ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.ip6 :
+		 &ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.ip6);
+
+	return nf_nat_setup_info(ct, &range, HOOK2MANIP(hooknum));
+}
+
+static unsigned int nf_nat_rule_find(struct sk_buff *skb, unsigned int hooknum,
+				     const struct net_device *in,
+				     const struct net_device *out,
+				     struct nf_conn *ct)
+{
+	struct net *net = nf_ct_net(ct);
+	unsigned int ret;
+
+	ret = ip6t_do_table(skb, hooknum, in, out, net->ipv6.ip6table_nat);
+	if (ret == NF_ACCEPT) {
+		if (!nf_nat_initialized(ct, HOOK2MANIP(hooknum)))
+			ret = alloc_null_binding(ct, hooknum);
+	}
+	return ret;
+}
+
+static unsigned int
+nf_nat_ipv6_fn(unsigned int hooknum,
+	       struct sk_buff *skb,
+	       const struct net_device *in,
+	       const struct net_device *out,
+	       int (*okfn)(struct sk_buff *))
+{
+	struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn_nat *nat;
+	enum nf_nat_manip_type maniptype = HOOK2MANIP(hooknum);
+	__be16 frag_off;
+	int hdrlen;
+	u8 nexthdr;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	/* Can't track?  It's not due to stress, or conntrack would
+	 * have dropped it.  Hence it's the user's responsibilty to
+	 * packet filter it out, or implement conntrack/NAT for that
+	 * protocol. 8) --RR
+	 */
+	if (!ct)
+		return NF_ACCEPT;
+
+	/* Don't try to NAT if this packet is not conntracked */
+	if (nf_ct_is_untracked(ct))
+		return NF_ACCEPT;
+
+	nat = nfct_nat(ct);
+	if (!nat) {
+		/* NAT module was loaded late. */
+		if (nf_ct_is_confirmed(ct))
+			return NF_ACCEPT;
+		nat = nf_ct_ext_add(ct, NF_CT_EXT_NAT, GFP_ATOMIC);
+		if (nat == NULL) {
+			pr_debug("failed to add NAT extension\n");
+			return NF_ACCEPT;
+		}
+	}
+
+	switch (ctinfo) {
+	case IP_CT_RELATED:
+	case IP_CT_RELATED_REPLY:
+		nexthdr = ipv6_hdr(skb)->nexthdr;
+		hdrlen = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr),
+					  &nexthdr, &frag_off);
+
+		if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
+			if (!nf_nat_icmpv6_reply_translation(skb, ct, ctinfo,
+							     hooknum, hdrlen))
+				return NF_DROP;
+			else
+				return NF_ACCEPT;
+		}
+		/* Fall thru... (Only ICMPs can be IP_CT_IS_REPLY) */
+	case IP_CT_NEW:
+		/* Seen it before?  This can happen for loopback, retrans,
+		 * or local packets.
+		 */
+		if (!nf_nat_initialized(ct, maniptype)) {
+			unsigned int ret;
+
+			ret = nf_nat_rule_find(skb, hooknum, in, out, ct);
+			if (ret != NF_ACCEPT)
+				return ret;
+		} else
+			pr_debug("Already setup manip %s for ct %p\n",
+				 maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST",
+				 ct);
+		break;
+
+	default:
+		/* ESTABLISHED */
+		NF_CT_ASSERT(ctinfo == IP_CT_ESTABLISHED ||
+			     ctinfo == IP_CT_ESTABLISHED_REPLY);
+	}
+
+	return nf_nat_packet(ct, ctinfo, hooknum, skb);
+}
+
+static unsigned int
+nf_nat_ipv6_in(unsigned int hooknum,
+	       struct sk_buff *skb,
+	       const struct net_device *in,
+	       const struct net_device *out,
+	       int (*okfn)(struct sk_buff *))
+{
+	unsigned int ret;
+	struct in6_addr daddr = ipv6_hdr(skb)->daddr;
+
+	ret = nf_nat_ipv6_fn(hooknum, skb, in, out, okfn);
+	if (ret != NF_DROP && ret != NF_STOLEN &&
+	    ipv6_addr_cmp(&daddr, &ipv6_hdr(skb)->daddr))
+		skb_dst_drop(skb);
+
+	return ret;
+}
+
+static unsigned int
+nf_nat_ipv6_out(unsigned int hooknum,
+		struct sk_buff *skb,
+		const struct net_device *in,
+		const struct net_device *out,
+		int (*okfn)(struct sk_buff *))
+{
+#ifdef CONFIG_XFRM
+	const struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+#endif
+	unsigned int ret;
+
+	/* root is playing with raw sockets. */
+	if (skb->len < sizeof(struct ipv6hdr))
+		return NF_ACCEPT;
+
+	ret = nf_nat_ipv6_fn(hooknum, skb, in, out, okfn);
+#ifdef CONFIG_XFRM
+	if (ret != NF_DROP && ret != NF_STOLEN &&
+	    !(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) &&
+	    (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
+		enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+
+		if (!nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.src.u3,
+				      &ct->tuplehash[!dir].tuple.dst.u3) ||
+		    (ct->tuplehash[dir].tuple.src.u.all !=
+		     ct->tuplehash[!dir].tuple.dst.u.all))
+			if (nf_xfrm_me_harder(skb, AF_INET6) < 0)
+				ret = NF_DROP;
+	}
+#endif
+	return ret;
+}
+
+static unsigned int
+nf_nat_ipv6_local_fn(unsigned int hooknum,
+		     struct sk_buff *skb,
+		     const struct net_device *in,
+		     const struct net_device *out,
+		     int (*okfn)(struct sk_buff *))
+{
+	const struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+	unsigned int ret;
+
+	/* root is playing with raw sockets. */
+	if (skb->len < sizeof(struct ipv6hdr))
+		return NF_ACCEPT;
+
+	ret = nf_nat_ipv6_fn(hooknum, skb, in, out, okfn);
+	if (ret != NF_DROP && ret != NF_STOLEN &&
+	    (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
+		enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+
+		if (!nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.dst.u3,
+				      &ct->tuplehash[!dir].tuple.src.u3)) {
+			if (ip6_route_me_harder(skb))
+				ret = NF_DROP;
+		}
+#ifdef CONFIG_XFRM
+		else if (!(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) &&
+			 ct->tuplehash[dir].tuple.dst.u.all !=
+			 ct->tuplehash[!dir].tuple.src.u.all)
+			if (nf_xfrm_me_harder(skb, AF_INET6))
+				ret = NF_DROP;
+#endif
+	}
+	return ret;
+}
+
+static struct nf_hook_ops nf_nat_ipv6_ops[] __read_mostly = {
+	/* Before packet filtering, change destination */
+	{
+		.hook		= nf_nat_ipv6_in,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV6,
+		.hooknum	= NF_INET_PRE_ROUTING,
+		.priority	= NF_IP_PRI_NAT_DST,
+	},
+	/* After packet filtering, change source */
+	{
+		.hook		= nf_nat_ipv6_out,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV6,
+		.hooknum	= NF_INET_POST_ROUTING,
+		.priority	= NF_IP_PRI_NAT_SRC,
+	},
+	/* Before packet filtering, change destination */
+	{
+		.hook		= nf_nat_ipv6_local_fn,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV6,
+		.hooknum	= NF_INET_LOCAL_OUT,
+		.priority	= NF_IP_PRI_NAT_DST,
+	},
+	/* After packet filtering, change source */
+	{
+		.hook		= nf_nat_ipv6_fn,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV6,
+		.hooknum	= NF_INET_LOCAL_IN,
+		.priority	= NF_IP_PRI_NAT_SRC,
+	},
+};
+
+static int __net_init ip6table_nat_net_init(struct net *net)
+{
+	struct ip6t_replace *repl;
+
+	repl = ip6t_alloc_initial_table(&nf_nat_ipv6_table);
+	if (repl == NULL)
+		return -ENOMEM;
+	net->ipv6.ip6table_nat = ip6t_register_table(net, &nf_nat_ipv6_table, repl);
+	kfree(repl);
+	if (IS_ERR(net->ipv6.ip6table_nat))
+		return PTR_ERR(net->ipv6.ip6table_nat);
+	return 0;
+}
+
+static void __net_exit ip6table_nat_net_exit(struct net *net)
+{
+	ip6t_unregister_table(net, net->ipv6.ip6table_nat);
+}
+
+static struct pernet_operations ip6table_nat_net_ops = {
+	.init	= ip6table_nat_net_init,
+	.exit	= ip6table_nat_net_exit,
+};
+
+static int __init ip6table_nat_init(void)
+{
+	int err;
+
+	err = register_pernet_subsys(&ip6table_nat_net_ops);
+	if (err < 0)
+		goto err1;
+
+	err = nf_register_hooks(nf_nat_ipv6_ops, ARRAY_SIZE(nf_nat_ipv6_ops));
+	if (err < 0)
+		goto err2;
+	return 0;
+
+err2:
+	unregister_pernet_subsys(&ip6table_nat_net_ops);
+err1:
+	return err;
+}
+
+static void __exit ip6table_nat_exit(void)
+{
+	nf_unregister_hooks(nf_nat_ipv6_ops, ARRAY_SIZE(nf_nat_ipv6_ops));
+	unregister_pernet_subsys(&ip6table_nat_net_ops);
+}
+
+module_init(ip6table_nat_init);
+module_exit(ip6table_nat_exit);
+
+MODULE_LICENSE("GPL");
diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index 4d92ea6..9f427b5 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -28,6 +28,7 @@
 #include <net/netfilter/nf_conntrack_core.h>
 #include <net/netfilter/nf_conntrack_zones.h>
 #include <net/netfilter/ipv6/nf_conntrack_ipv6.h>
+#include <net/netfilter/nf_nat_helper.h>
 #include <net/netfilter/ipv6/nf_defrag_ipv6.h>
 #include <net/netfilter/nf_log.h>
 
@@ -142,6 +143,36 @@ static unsigned int ipv6_confirm(unsigned int hooknum,
 				 const struct net_device *out,
 				 int (*okfn)(struct sk_buff *))
 {
+	struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+	unsigned char pnum = ipv6_hdr(skb)->nexthdr;
+	int protoff;
+	__be16 frag_off;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct || ctinfo == IP_CT_RELATED_REPLY)
+		goto out;
+
+	protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &pnum,
+				   &frag_off);
+	if (protoff < 0 || (frag_off & ntohs(~0x7)) != 0) {
+		pr_debug("proto header not found\n");
+		goto out;
+	}
+
+	/* adjust seqs for loopback traffic only in outgoing direction */
+	if (test_bit(IPS_SEQ_ADJUST_BIT, &ct->status) &&
+	    !nf_is_loopback_packet(skb)) {
+		typeof(nf_nat_seq_adjust_hook) seq_adjust;
+
+		seq_adjust = rcu_dereference(nf_nat_seq_adjust_hook);
+		if (!seq_adjust ||
+		    !seq_adjust(skb, ct, ctinfo, protoff)) {
+			NF_CT_STAT_INC_ATOMIC(nf_ct_net(ct), drop);
+			return NF_DROP;
+		}
+	}
+out:
 	/* We've seen it coming out the other side: confirm it */
 	return nf_conntrack_confirm(skb);
 }
@@ -169,10 +200,14 @@ static unsigned int __ipv6_conntrack_in(struct net *net,
 		}
 
 		/* Conntrack helpers need the entire reassembled packet in the
-		 * POST_ROUTING hook.
+		 * POST_ROUTING hook. In case of unconfirmed connections NAT
+		 * might reassign a helper, so the entire packet is also
+		 * required.
 		 */
 		ct = nf_ct_get(reasm, &ctinfo);
-		if (ct != NULL && test_bit(IPS_HELPER_BIT, &ct->status)) {
+		if (ct != NULL &&
+		    (test_bit(IPS_HELPER_BIT, &ct->status) ||
+		     !nf_ct_is_confirmed(ct))) {
 			nf_conntrack_get_reasm(skb);
 			NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm,
 				       (struct net_device *)in,
diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
new file mode 100644
index 0000000..4d42eaa
--- /dev/null
+++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
@@ -0,0 +1,287 @@
+/*
+ * Copyright (c) 2011 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Development of IPv6 NAT funded by Astaro.
+ */
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/ipv6.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv6.h>
+#include <net/secure_seq.h>
+#include <net/checksum.h>
+#include <net/ip6_route.h>
+#include <net/ipv6.h>
+
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_nat_core.h>
+#include <net/netfilter/nf_nat_l3proto.h>
+#include <net/netfilter/nf_nat_l4proto.h>
+
+static const struct nf_nat_l3proto nf_nat_l3proto_ipv6;
+
+#ifdef CONFIG_XFRM
+static void nf_nat_ipv6_decode_session(struct sk_buff *skb,
+				       const struct nf_conn *ct,
+				       enum ip_conntrack_dir dir,
+				       unsigned long statusbit,
+				       struct flowi *fl)
+{
+	const struct nf_conntrack_tuple *t = &ct->tuplehash[dir].tuple;
+	struct flowi6 *fl6 = &fl->u.ip6;
+
+	if (ct->status & statusbit) {
+		fl6->daddr = t->dst.u3.in6;
+		if (t->dst.protonum == IPPROTO_TCP ||
+		    t->dst.protonum == IPPROTO_UDP ||
+		    t->dst.protonum == IPPROTO_UDPLITE ||
+		    t->dst.protonum == IPPROTO_DCCP ||
+		    t->dst.protonum == IPPROTO_SCTP)
+			fl6->fl6_dport = t->dst.u.all;
+	}
+
+	statusbit ^= IPS_NAT_MASK;
+
+	if (ct->status & statusbit) {
+		fl6->saddr = t->src.u3.in6;
+		if (t->dst.protonum == IPPROTO_TCP ||
+		    t->dst.protonum == IPPROTO_UDP ||
+		    t->dst.protonum == IPPROTO_UDPLITE ||
+		    t->dst.protonum == IPPROTO_DCCP ||
+		    t->dst.protonum == IPPROTO_SCTP)
+			fl6->fl6_sport = t->src.u.all;
+	}
+}
+#endif
+
+static int nf_nat_ipv6_in_range(const struct nf_conntrack_tuple *t,
+				const struct nf_nat_range *range)
+{
+	return ipv6_addr_cmp(&t->src.u3.in6, &range->min_addr.in6) >= 0 &&
+	       ipv6_addr_cmp(&t->src.u3.in6, &range->max_addr.in6) <= 0;
+}
+
+static u32 nf_nat_ipv6_secure_port(const struct nf_conntrack_tuple *t,
+				   __be16 dport)
+{
+	return secure_ipv6_port_ephemeral(t->src.u3.ip6, t->dst.u3.ip6, dport);
+}
+
+static bool nf_nat_ipv6_manip_pkt(struct sk_buff *skb,
+				  unsigned int iphdroff,
+				  const struct nf_nat_l4proto *l4proto,
+				  const struct nf_conntrack_tuple *target,
+				  enum nf_nat_manip_type maniptype)
+{
+	struct ipv6hdr *ipv6h;
+	__be16 frag_off;
+	int hdroff;
+	u8 nexthdr;
+
+	if (!skb_make_writable(skb, iphdroff + sizeof(*ipv6h)))
+		return false;
+
+	ipv6h = (void *)skb->data + iphdroff;
+	nexthdr = ipv6h->nexthdr;
+	hdroff = ipv6_skip_exthdr(skb, iphdroff + sizeof(*ipv6h),
+				  &nexthdr, &frag_off);
+	if (hdroff < 0)
+		goto manip_addr;
+
+	if ((frag_off & htons(~0x7)) == 0 &&
+	    !l4proto->manip_pkt(skb, &nf_nat_l3proto_ipv6, iphdroff, hdroff,
+				target, maniptype))
+		return false;
+manip_addr:
+	if (maniptype == NF_NAT_MANIP_SRC)
+		ipv6h->saddr = target->src.u3.in6;
+	else
+		ipv6h->daddr = target->dst.u3.in6;
+
+	return true;
+}
+
+static void nf_nat_ipv6_csum_update(struct sk_buff *skb,
+				    unsigned int iphdroff, __sum16 *check,
+				    const struct nf_conntrack_tuple *t,
+				    enum nf_nat_manip_type maniptype)
+{
+	const struct ipv6hdr *ipv6h = (struct ipv6hdr *)(skb->data + iphdroff);
+	const struct in6_addr *oldip, *newip;
+
+	if (maniptype == NF_NAT_MANIP_SRC) {
+		oldip = &ipv6h->saddr;
+		newip = &t->src.u3.in6;
+	} else {
+		oldip = &ipv6h->daddr;
+		newip = &t->dst.u3.in6;
+	}
+	inet_proto_csum_replace16(check, skb, oldip->s6_addr32,
+				  newip->s6_addr32, 1);
+}
+
+static void nf_nat_ipv6_csum_recalc(struct sk_buff *skb,
+				    u8 proto, void *data, __sum16 *check,
+				    int datalen, int oldlen)
+{
+	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+	struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
+
+	if (skb->ip_summed != CHECKSUM_PARTIAL) {
+		if (!(rt->rt6i_flags & RTF_LOCAL) &&
+		    (!skb->dev || skb->dev->features & NETIF_F_V6_CSUM)) {
+			skb->ip_summed = CHECKSUM_PARTIAL;
+			skb->csum_start = skb_headroom(skb) +
+					  skb_network_offset(skb) +
+					  (data - (void *)skb->data);
+			skb->csum_offset = (void *)check - data;
+			*check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
+						  datalen, proto, 0);
+		} else {
+			*check = 0;
+			*check = csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
+						 datalen, proto,
+						 csum_partial(data, datalen,
+							      0));
+			if (proto == IPPROTO_UDP && !*check)
+				*check = CSUM_MANGLED_0;
+		}
+	} else
+		inet_proto_csum_replace2(check, skb,
+					 htons(oldlen), htons(datalen), 1);
+}
+
+static int nf_nat_ipv6_nlattr_to_range(struct nlattr *tb[],
+				       struct nf_nat_range *range)
+{
+	if (tb[CTA_NAT_V6_MINIP]) {
+		nla_memcpy(&range->min_addr.ip6, tb[CTA_NAT_V6_MINIP],
+			   sizeof(struct in6_addr));
+		range->flags |= NF_NAT_RANGE_MAP_IPS;
+	}
+
+	if (tb[CTA_NAT_V6_MAXIP])
+		nla_memcpy(&range->max_addr.ip6, tb[CTA_NAT_V6_MAXIP],
+			   sizeof(struct in6_addr));
+	else
+		range->max_addr = range->min_addr;
+
+	return 0;
+}
+
+static const struct nf_nat_l3proto nf_nat_l3proto_ipv6 = {
+	.l3proto		= NFPROTO_IPV6,
+	.secure_port		= nf_nat_ipv6_secure_port,
+	.in_range		= nf_nat_ipv6_in_range,
+	.manip_pkt		= nf_nat_ipv6_manip_pkt,
+	.csum_update		= nf_nat_ipv6_csum_update,
+	.csum_recalc		= nf_nat_ipv6_csum_recalc,
+	.nlattr_to_range	= nf_nat_ipv6_nlattr_to_range,
+#ifdef CONFIG_XFRM
+	.decode_session	= nf_nat_ipv6_decode_session,
+#endif
+};
+
+int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
+				    struct nf_conn *ct,
+				    enum ip_conntrack_info ctinfo,
+				    unsigned int hooknum,
+				    unsigned int hdrlen)
+{
+	struct {
+		struct icmp6hdr	icmp6;
+		struct ipv6hdr	ip6;
+	} *inside;
+	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	enum nf_nat_manip_type manip = HOOK2MANIP(hooknum);
+	const struct nf_nat_l4proto *l4proto;
+	struct nf_conntrack_tuple target;
+	unsigned long statusbit;
+
+	NF_CT_ASSERT(ctinfo == IP_CT_RELATED || ctinfo == IP_CT_RELATED_REPLY);
+
+	if (!skb_make_writable(skb, hdrlen + sizeof(*inside)))
+		return 0;
+	if (nf_ip6_checksum(skb, hooknum, hdrlen, IPPROTO_ICMPV6))
+		return 0;
+
+	inside = (void *)skb->data + hdrlen;
+	if (inside->icmp6.icmp6_type == NDISC_REDIRECT) {
+		if ((ct->status & IPS_NAT_DONE_MASK) != IPS_NAT_DONE_MASK)
+			return 0;
+		if (ct->status & IPS_NAT_MASK)
+			return 0;
+	}
+
+	if (manip == NF_NAT_MANIP_SRC)
+		statusbit = IPS_SRC_NAT;
+	else
+		statusbit = IPS_DST_NAT;
+
+	/* Invert if this is reply direction */
+	if (dir == IP_CT_DIR_REPLY)
+		statusbit ^= IPS_NAT_MASK;
+
+	if (!(ct->status & statusbit))
+		return 1;
+
+	l4proto = __nf_nat_l4proto_find(NFPROTO_IPV6, inside->ip6.nexthdr);
+	if (!nf_nat_ipv6_manip_pkt(skb, hdrlen + sizeof(inside->icmp6),
+				   l4proto, &ct->tuplehash[!dir].tuple, !manip))
+		return 0;
+
+	if (skb->ip_summed != CHECKSUM_PARTIAL) {
+		struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+		inside = (void *)skb->data + hdrlen;
+		inside->icmp6.icmp6_cksum = 0;
+		inside->icmp6.icmp6_cksum =
+			csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
+					skb->len - hdrlen, IPPROTO_ICMPV6,
+					csum_partial(&inside->icmp6,
+						     skb->len - hdrlen, 0));
+	}
+
+	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
+	l4proto = __nf_nat_l4proto_find(NFPROTO_IPV6, IPPROTO_ICMPV6);
+	if (!nf_nat_ipv6_manip_pkt(skb, 0, l4proto, &target, manip))
+		return 0;
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(nf_nat_icmpv6_reply_translation);
+
+static int __init nf_nat_l3proto_ipv6_init(void)
+{
+	int err;
+
+	err = nf_nat_l4proto_register(NFPROTO_IPV6, &nf_nat_l4proto_icmpv6);
+	if (err < 0)
+		goto err1;
+	err = nf_nat_l3proto_register(&nf_nat_l3proto_ipv6);
+	if (err < 0)
+		goto err2;
+	return err;
+
+err2:
+	nf_nat_l4proto_unregister(NFPROTO_IPV6, &nf_nat_l4proto_icmpv6);
+err1:
+	return err;
+}
+
+static void __exit nf_nat_l3proto_ipv6_exit(void)
+{
+	nf_nat_l3proto_unregister(&nf_nat_l3proto_ipv6);
+	nf_nat_l4proto_unregister(NFPROTO_IPV6, &nf_nat_l4proto_icmpv6);
+}
+
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("nf-nat-" __stringify(AF_INET6));
+
+module_init(nf_nat_l3proto_ipv6_init);
+module_exit(nf_nat_l3proto_ipv6_exit);
diff --git a/net/ipv6/netfilter/nf_nat_proto_icmpv6.c b/net/ipv6/netfilter/nf_nat_proto_icmpv6.c
new file mode 100644
index 0000000..5d6da78
--- /dev/null
+++ b/net/ipv6/netfilter/nf_nat_proto_icmpv6.c
@@ -0,0 +1,90 @@
+/*
+ * Copyright (c) 2011 Patrick Mchardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on Rusty Russell's IPv4 ICMP NAT code. Development of IPv6
+ * NAT funded by Astaro.
+ */
+
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/icmpv6.h>
+
+#include <linux/netfilter.h>
+#include <net/netfilter/nf_nat.h>
+#include <net/netfilter/nf_nat_core.h>
+#include <net/netfilter/nf_nat_l3proto.h>
+#include <net/netfilter/nf_nat_l4proto.h>
+
+static bool
+icmpv6_in_range(const struct nf_conntrack_tuple *tuple,
+		enum nf_nat_manip_type maniptype,
+		const union nf_conntrack_man_proto *min,
+		const union nf_conntrack_man_proto *max)
+{
+	return ntohs(tuple->src.u.icmp.id) >= ntohs(min->icmp.id) &&
+	       ntohs(tuple->src.u.icmp.id) <= ntohs(max->icmp.id);
+}
+
+static void
+icmpv6_unique_tuple(const struct nf_nat_l3proto *l3proto,
+		    struct nf_conntrack_tuple *tuple,
+		    const struct nf_nat_range *range,
+		    enum nf_nat_manip_type maniptype,
+		    const struct nf_conn *ct)
+{
+	static u16 id;
+	unsigned int range_size;
+	unsigned int i;
+
+	range_size = ntohs(range->max_proto.icmp.id) -
+		     ntohs(range->min_proto.icmp.id) + 1;
+
+	if (!(range->flags & NF_NAT_RANGE_PROTO_SPECIFIED))
+		range_size = 0xffff;
+
+	for (i = 0; ; ++id) {
+		tuple->src.u.icmp.id = htons(ntohs(range->min_proto.icmp.id) +
+					     (id % range_size));
+		if (++i == range_size || !nf_nat_used_tuple(tuple, ct))
+			return;
+	}
+}
+
+static bool
+icmpv6_manip_pkt(struct sk_buff *skb,
+		 const struct nf_nat_l3proto *l3proto,
+		 unsigned int iphdroff, unsigned int hdroff,
+		 const struct nf_conntrack_tuple *tuple,
+		 enum nf_nat_manip_type maniptype)
+{
+	struct icmp6hdr *hdr;
+
+	if (!skb_make_writable(skb, hdroff + sizeof(*hdr)))
+		return false;
+
+	hdr = (struct icmp6hdr *)(skb->data + hdroff);
+	l3proto->csum_update(skb, iphdroff, &hdr->icmp6_cksum,
+			     tuple, maniptype);
+	if (hdr->icmp6_code == ICMPV6_ECHO_REQUEST ||
+	    hdr->icmp6_code == ICMPV6_ECHO_REPLY) {
+		inet_proto_csum_replace2(&hdr->icmp6_cksum, skb,
+					 hdr->icmp6_identifier,
+					 tuple->src.u.icmp.id, 0);
+		hdr->icmp6_identifier = tuple->src.u.icmp.id;
+	}
+	return true;
+}
+
+const struct nf_nat_l4proto nf_nat_l4proto_icmpv6 = {
+	.l4proto		= IPPROTO_ICMPV6,
+	.manip_pkt		= icmpv6_manip_pkt,
+	.in_range		= icmpv6_in_range,
+	.unique_tuple		= icmpv6_unique_tuple,
+#if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
+	.nlattr_to_range	= nf_nat_l4proto_nlattr_to_range,
+#endif
+};
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 13/19] netfilter: ip6tables: add MASQUERADE target
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (11 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 12/19] netfilter: ipv6: add IPv6 NAT support kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-17 13:11   ` Pablo Neira Ayuso
  2012-08-09 20:08 ` [PATCH 14/19] netfilter: ip6tables: add REDIRECT target kaber
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/addrconf.h               |    2 +-
 net/ipv4/netfilter/ipt_MASQUERADE.c  |    3 +-
 net/ipv6/addrconf.c                  |    2 +-
 net/ipv6/netfilter/Kconfig           |   12 +++
 net/ipv6/netfilter/Makefile          |    1 +
 net/ipv6/netfilter/ip6t_MASQUERADE.c |  135 ++++++++++++++++++++++++++++++++++
 6 files changed, 152 insertions(+), 3 deletions(-)
 create mode 100644 net/ipv6/netfilter/ip6t_MASQUERADE.c

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 089a09d..9e63e76 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -78,7 +78,7 @@ extern struct inet6_ifaddr      *ipv6_get_ifaddr(struct net *net,
 						 int strict);
 
 extern int			ipv6_dev_get_saddr(struct net *net,
-					       struct net_device *dev,
+					       const struct net_device *dev,
 					       const struct in6_addr *daddr,
 					       unsigned int srcprefs,
 					       struct in6_addr *saddr);
diff --git a/net/ipv4/netfilter/ipt_MASQUERADE.c b/net/ipv4/netfilter/ipt_MASQUERADE.c
index 1c3aa28..5d5d4d1 100644
--- a/net/ipv4/netfilter/ipt_MASQUERADE.c
+++ b/net/ipv4/netfilter/ipt_MASQUERADE.c
@@ -99,7 +99,8 @@ device_cmp(struct nf_conn *i, void *ifindex)
 
 	if (!nat)
 		return 0;
-
+	if (nf_ct_l3num(i) != NFPROTO_IPV4)
+		return 0;
 	return nat->masq_index == (int)(long)ifindex;
 }
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 7918181..6536404 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1095,7 +1095,7 @@ out:
 	return ret;
 }
 
-int ipv6_dev_get_saddr(struct net *net, struct net_device *dst_dev,
+int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 		       const struct in6_addr *daddr, unsigned int prefs,
 		       struct in6_addr *saddr)
 {
diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index b27e0ad..54a5032 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -144,6 +144,18 @@ config IP6_NF_TARGET_HL
 	(e.g. when running oldconfig). It selects
 	CONFIG_NETFILTER_XT_TARGET_HL.
 
+config IP6_NF_TARGET_MASQUERADE
+	tristate "MASQUERADE target support"
+	depends on NF_NAT_IPV6
+	help
+	  Masquerading is a special case of NAT: all outgoing connections are
+	  changed to seem to come from a particular interface's address, and
+	  if the interface goes down, those connections are lost.  This is
+	  only useful for dialup accounts with dynamic IP address (ie. your IP
+	  address will be different on next dialup).
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config IP6_NF_FILTER
 	tristate "Packet filtering"
 	default m if NETFILTER_ADVANCED=n
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index 7677937..068bad1 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -34,4 +34,5 @@ obj-$(CONFIG_IP6_NF_MATCH_RPFILTER) += ip6t_rpfilter.o
 obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o
 
 # targets
+obj-$(CONFIG_IP6_NF_TARGET_MASQUERADE) += ip6t_MASQUERADE.o
 obj-$(CONFIG_IP6_NF_TARGET_REJECT) += ip6t_REJECT.o
diff --git a/net/ipv6/netfilter/ip6t_MASQUERADE.c b/net/ipv6/netfilter/ip6t_MASQUERADE.c
new file mode 100644
index 0000000..60e9053
--- /dev/null
+++ b/net/ipv6/netfilter/ip6t_MASQUERADE.c
@@ -0,0 +1,135 @@
+/*
+ * Copyright (c) 2011 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on Rusty Russell's IPv6 MASQUERADE target. Development of IPv6
+ * NAT funded by Astaro.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/ipv6.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv6.h>
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_nat.h>
+#include <net/addrconf.h>
+#include <net/ipv6.h>
+
+static unsigned int
+masquerade_tg6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct nf_nat_range *range = par->targinfo;
+	enum ip_conntrack_info ctinfo;
+	struct in6_addr src;
+	struct nf_conn *ct;
+	struct nf_nat_range newrange;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	NF_CT_ASSERT(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED ||
+			    ctinfo == IP_CT_RELATED_REPLY));
+
+	if (ipv6_dev_get_saddr(dev_net(par->out), par->out,
+			       &ipv6_hdr(skb)->daddr, 0, &src) < 0)
+		return NF_DROP;
+
+	nfct_nat(ct)->masq_index = par->out->ifindex;
+
+	newrange.flags		= range->flags | NF_NAT_RANGE_MAP_IPS;
+	newrange.min_addr.in6	= src;
+	newrange.max_addr.in6	= src;
+	newrange.min_proto	= range->min_proto;
+	newrange.max_proto	= range->max_proto;
+
+	return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_SRC);
+}
+
+static int masquerade_tg6_checkentry(const struct xt_tgchk_param *par)
+{
+	const struct nf_nat_range *range = par->targinfo;
+
+	if (range->flags & NF_NAT_RANGE_MAP_IPS)
+		return -EINVAL;
+	return 0;
+}
+
+static int device_cmp(struct nf_conn *ct, void *ifindex)
+{
+	const struct nf_conn_nat *nat = nfct_nat(ct);
+
+	if (!nat)
+		return 0;
+	if (nf_ct_l3num(ct) != NFPROTO_IPV6)
+		return 0;
+	return nat->masq_index == (int)(long)ifindex;
+}
+
+static int masq_device_event(struct notifier_block *this,
+			     unsigned long event, void *ptr)
+{
+	const struct net_device *dev = ptr;
+	struct net *net = dev_net(dev);
+
+	if (event == NETDEV_DOWN)
+		nf_ct_iterate_cleanup(net, device_cmp,
+				      (void *)(long)dev->ifindex);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block masq_dev_notifier = {
+	.notifier_call	= masq_device_event,
+};
+
+static int masq_inet_event(struct notifier_block *this,
+			   unsigned long event, void *ptr)
+{
+	struct inet6_ifaddr *ifa = ptr;
+
+	return masq_device_event(this, event, ifa->idev->dev);
+}
+
+static struct notifier_block masq_inet_notifier = {
+	.notifier_call	= masq_inet_event,
+};
+
+static struct xt_target masquerade_tg6_reg __read_mostly = {
+	.name		= "MASQUERADE",
+	.family		= NFPROTO_IPV6,
+	.checkentry	= masquerade_tg6_checkentry,
+	.target		= masquerade_tg6,
+	.targetsize	= sizeof(struct nf_nat_range),
+	.table		= "nat",
+	.hooks		= 1 << NF_INET_POST_ROUTING,
+	.me		= THIS_MODULE,
+};
+
+static int __init masquerade_tg6_init(void)
+{
+	int err;
+
+	err = xt_register_target(&masquerade_tg6_reg);
+	if (err == 0) {
+		register_netdevice_notifier(&masq_dev_notifier);
+		register_inet6addr_notifier(&masq_inet_notifier);
+	}
+
+	return err;
+}
+static void __exit masquerade_tg6_exit(void)
+{
+	unregister_inet6addr_notifier(&masq_inet_notifier);
+	unregister_netdevice_notifier(&masq_dev_notifier);
+	xt_unregister_target(&masquerade_tg6_reg);
+}
+
+module_init(masquerade_tg6_init);
+module_exit(masquerade_tg6_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
+MODULE_DESCRIPTION("Xtables: automatic address SNAT");
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 14/19] netfilter: ip6tables: add REDIRECT target
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (12 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 13/19] netfilter: ip6tables: add MASQUERADE target kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-09 20:08 ` [PATCH 15/19] netfilter: ip6tables: add NETMAP target kaber
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/ipv6/netfilter/Kconfig         |   11 ++++
 net/ipv6/netfilter/Makefile        |    1 +
 net/ipv6/netfilter/ip6t_REDIRECT.c |   98 ++++++++++++++++++++++++++++++++++++
 3 files changed, 110 insertions(+), 0 deletions(-)
 create mode 100644 net/ipv6/netfilter/ip6t_REDIRECT.c

diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index 54a5032..585590f 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -156,6 +156,17 @@ config IP6_NF_TARGET_MASQUERADE
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP6_NF_TARGET_REDIRECT
+	tristate "REDIRECT target support"
+	depends on NF_NAT_IPV6
+	help
+	  REDIRECT is a special case of NAT: all incoming connections are
+	  mapped onto the incoming interface's address, causing the packets to
+	  come to the local machine instead of passing through.  This is
+	  useful for transparent proxies.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config IP6_NF_FILTER
 	tristate "Packet filtering"
 	default m if NETFILTER_ADVANCED=n
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index 068bad1..e30a531 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -35,4 +35,5 @@ obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o
 
 # targets
 obj-$(CONFIG_IP6_NF_TARGET_MASQUERADE) += ip6t_MASQUERADE.o
+obj-$(CONFIG_IP6_NF_TARGET_REDIRECT) += ip6t_REDIRECT.o
 obj-$(CONFIG_IP6_NF_TARGET_REJECT) += ip6t_REJECT.o
diff --git a/net/ipv6/netfilter/ip6t_REDIRECT.c b/net/ipv6/netfilter/ip6t_REDIRECT.c
new file mode 100644
index 0000000..60497a3
--- /dev/null
+++ b/net/ipv6/netfilter/ip6t_REDIRECT.c
@@ -0,0 +1,98 @@
+/*
+ * Copyright (c) 2011 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on Rusty Russell's IPv4 REDIRECT target. Development of IPv6
+ * NAT funded by Astaro.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv6.h>
+#include <linux/netfilter/x_tables.h>
+#include <net/addrconf.h>
+#include <net/netfilter/nf_nat.h>
+
+static const struct in6_addr loopback_addr = IN6ADDR_LOOPBACK_INIT;
+
+static unsigned int
+redirect_tg6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct nf_nat_range *range = par->targinfo;
+	struct nf_nat_range newrange;
+	struct in6_addr newdst;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (par->hooknum == NF_INET_LOCAL_OUT)
+		newdst = loopback_addr;
+	else {
+		struct inet6_dev *idev;
+		struct inet6_ifaddr *ifa;
+		bool addr = false;
+
+		rcu_read_lock();
+		idev = __in6_dev_get(skb->dev);
+		if (idev != NULL) {
+			list_for_each_entry(ifa, &idev->addr_list, if_list) {
+				newdst = ifa->addr;
+				addr = true;
+				break;
+			}
+		}
+		rcu_read_unlock();
+
+		if (!addr)
+			return NF_DROP;
+	}
+
+	newrange.flags		= range->flags | NF_NAT_RANGE_MAP_IPS;
+	newrange.min_addr.in6	= newdst;
+	newrange.max_addr.in6	= newdst;
+	newrange.min_proto	= range->min_proto;
+	newrange.max_proto	= range->max_proto;
+
+	return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_DST);
+}
+
+static int redirect_tg6_checkentry(const struct xt_tgchk_param *par)
+{
+	const struct nf_nat_range *range = par->targinfo;
+
+	if (range->flags & NF_NAT_RANGE_MAP_IPS)
+		return -EINVAL;
+	return 0;
+}
+
+static struct xt_target redirect_tg6_reg __read_mostly = {
+	.name		= "REDIRECT",
+	.family		= NFPROTO_IPV6,
+	.checkentry	= redirect_tg6_checkentry,
+	.target		= redirect_tg6,
+	.targetsize	= sizeof(struct nf_nat_range),
+	.table		= "nat",
+	.hooks		= (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_OUT),
+	.me		= THIS_MODULE,
+};
+
+static int __init redirect_tg6_init(void)
+{
+	return xt_register_target(&redirect_tg6_reg);
+}
+
+static void __exit redirect_tg6_exit(void)
+{
+	xt_unregister_target(&redirect_tg6_reg);
+}
+
+module_init(redirect_tg6_init);
+module_exit(redirect_tg6_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
+MODULE_DESCRIPTION("Xtables: Connection redirection to localhost");
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 15/19] netfilter: ip6tables: add NETMAP target
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (13 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 14/19] netfilter: ip6tables: add REDIRECT target kaber
@ 2012-08-09 20:08 ` kaber
  2012-08-09 20:09 ` [PATCH 16/19] netfilter: nf_nat: support IPv6 in FTP NAT helper kaber
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/ipv6/netfilter/Kconfig       |   10 ++++
 net/ipv6/netfilter/Makefile      |    1 +
 net/ipv6/netfilter/ip6t_NETMAP.c |   94 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 net/ipv6/netfilter/ip6t_NETMAP.c

diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index 585590f..7bdf73b 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -156,6 +156,16 @@ config IP6_NF_TARGET_MASQUERADE
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP6_NF_TARGET_NETMAP
+	tristate "NETMAP target support"
+	depends on NF_NAT_IPV6
+	help
+	  NETMAP is an implementation of static 1:1 NAT mapping of network
+	  addresses. It maps the network address part, while keeping the host
+	  address part intact.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config IP6_NF_TARGET_REDIRECT
 	tristate "REDIRECT target support"
 	depends on NF_NAT_IPV6
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index e30a531..0864ce6 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -35,5 +35,6 @@ obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o
 
 # targets
 obj-$(CONFIG_IP6_NF_TARGET_MASQUERADE) += ip6t_MASQUERADE.o
+obj-$(CONFIG_IP6_NF_TARGET_NETMAP) += ip6t_NETMAP.o
 obj-$(CONFIG_IP6_NF_TARGET_REDIRECT) += ip6t_REDIRECT.o
 obj-$(CONFIG_IP6_NF_TARGET_REJECT) += ip6t_REJECT.o
diff --git a/net/ipv6/netfilter/ip6t_NETMAP.c b/net/ipv6/netfilter/ip6t_NETMAP.c
new file mode 100644
index 0000000..4f3bf36
--- /dev/null
+++ b/net/ipv6/netfilter/ip6t_NETMAP.c
@@ -0,0 +1,94 @@
+/*
+ * Copyright (c) 2011 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on Svenning Soerensen's IPv4 NETMAP target. Development of IPv6
+ * NAT funded by Astaro.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/ipv6.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv6.h>
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_nat.h>
+
+static unsigned int
+netmap_tg6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct nf_nat_range *range = par->targinfo;
+	struct nf_nat_range newrange;
+	struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+	union nf_inet_addr new_addr, netmask;
+	unsigned int i;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	for (i = 0; i < ARRAY_SIZE(range->min_addr.ip6); i++)
+		netmask.ip6[i] = ~(range->min_addr.ip6[i] ^
+				   range->max_addr.ip6[i]);
+
+	if (par->hooknum == NF_INET_PRE_ROUTING ||
+	    par->hooknum == NF_INET_LOCAL_OUT)
+		new_addr.in6 = ipv6_hdr(skb)->daddr;
+	else
+		new_addr.in6 = ipv6_hdr(skb)->saddr;
+
+	for (i = 0; i < ARRAY_SIZE(new_addr.ip6); i++) {
+		new_addr.ip6[i] &= ~netmask.ip6[i];
+		new_addr.ip6[i] |= range->min_addr.ip6[i] &
+				   netmask.ip6[i];
+	}
+
+	newrange.flags	= range->flags | NF_NAT_RANGE_MAP_IPS;
+	newrange.min_addr	= new_addr;
+	newrange.max_addr	= new_addr;
+	newrange.min_proto	= range->min_proto;
+	newrange.max_proto	= range->max_proto;
+
+	return nf_nat_setup_info(ct, &newrange, HOOK2MANIP(par->hooknum));
+}
+
+static int netmap_tg6_checkentry(const struct xt_tgchk_param *par)
+{
+	const struct nf_nat_range *range = par->targinfo;
+
+	if (!(range->flags & NF_NAT_RANGE_MAP_IPS))
+		return -EINVAL;
+	return 0;
+}
+
+static struct xt_target netmap_tg6_reg __read_mostly = {
+	.name		= "NETMAP",
+	.family		= NFPROTO_IPV6,
+	.target		= netmap_tg6,
+	.targetsize	= sizeof(struct nf_nat_range),
+	.table		= "nat",
+	.hooks		= (1 << NF_INET_PRE_ROUTING) |
+			  (1 << NF_INET_POST_ROUTING) |
+			  (1 << NF_INET_LOCAL_OUT) |
+			  (1 << NF_INET_LOCAL_IN),
+	.checkentry	= netmap_tg6_checkentry,
+	.me		= THIS_MODULE,
+};
+
+static int __init netmap_tg6_init(void)
+{
+	return xt_register_target(&netmap_tg6_reg);
+}
+
+static void netmap_tg6_exit(void)
+{
+	xt_unregister_target(&netmap_tg6_reg);
+}
+
+module_init(netmap_tg6_init);
+module_exit(netmap_tg6_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Xtables: 1:1 NAT mapping of IPv6 subnets");
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 16/19] netfilter: nf_nat: support IPv6 in FTP NAT helper
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (14 preceding siblings ...)
  2012-08-09 20:08 ` [PATCH 15/19] netfilter: ip6tables: add NETMAP target kaber
@ 2012-08-09 20:09 ` kaber
  2012-08-09 20:09 ` [PATCH 17/19] netfilter: nf_nat: support IPv6 in amanda " kaber
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:09 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/ipv4/netfilter/Kconfig            |    5 -----
 net/ipv4/netfilter/Makefile           |    1 -
 net/netfilter/Kconfig                 |    5 +++++
 net/netfilter/Makefile                |    3 +++
 net/netfilter/nf_conntrack_ftp.c      |    3 +--
 net/{ipv4 => }/netfilter/nf_nat_ftp.c |   30 ++++++++++++++++++------------
 6 files changed, 27 insertions(+), 20 deletions(-)
 rename net/{ipv4 => }/netfilter/nf_nat_ftp.c (82%)

diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 33372a1..80272e7 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -220,11 +220,6 @@ config NF_NAT_PROTO_GRE
 	tristate
 	depends on NF_NAT_IPV4 && NF_CT_PROTO_GRE
 
-config NF_NAT_FTP
-	tristate
-	depends on NF_CONNTRACK && NF_NAT_IPV4
-	default NF_NAT_IPV4 && NF_CONNTRACK_FTP
-
 config NF_NAT_IRC
 	tristate
 	depends on NF_CONNTRACK && NF_NAT_IPV4
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 0ea3acc..4d8a4ad 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -21,7 +21,6 @@ obj-$(CONFIG_NF_DEFRAG_IPV4) += nf_defrag_ipv4.o
 
 # NAT helpers (nf_conntrack)
 obj-$(CONFIG_NF_NAT_AMANDA) += nf_nat_amanda.o
-obj-$(CONFIG_NF_NAT_FTP) += nf_nat_ftp.o
 obj-$(CONFIG_NF_NAT_H323) += nf_nat_h323.o
 obj-$(CONFIG_NF_NAT_IRC) += nf_nat_irc.o
 obj-$(CONFIG_NF_NAT_PPTP) += nf_nat_pptp.o
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 91addda..3104494 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -380,6 +380,11 @@ config NF_NAT_PROTO_SCTP
 	depends on NF_NAT && NF_CT_PROTO_SCTP
 	select LIBCRC32C
 
+config NF_NAT_FTP
+	tristate
+	depends on NF_CONNTRACK && NF_NAT
+	default NF_NAT && NF_CONNTRACK_FTP
+
 endif # NF_CONNTRACK
 
 # transparent proxy support
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 09c9451..16592b1 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -54,6 +54,9 @@ obj-$(CONFIG_NF_NAT_PROTO_DCCP) += nf_nat_proto_dccp.o
 obj-$(CONFIG_NF_NAT_PROTO_UDPLITE) += nf_nat_proto_udplite.o
 obj-$(CONFIG_NF_NAT_PROTO_SCTP) += nf_nat_proto_sctp.o
 
+# NAT helpers
+obj-$(CONFIG_NF_NAT_FTP) += nf_nat_ftp.o
+
 # transparent proxy support
 obj-$(CONFIG_NETFILTER_TPROXY) += nf_tproxy_core.o
 
diff --git a/net/netfilter/nf_conntrack_ftp.c b/net/netfilter/nf_conntrack_ftp.c
index c0f4a5b..f8cc26a 100644
--- a/net/netfilter/nf_conntrack_ftp.c
+++ b/net/netfilter/nf_conntrack_ftp.c
@@ -488,8 +488,7 @@ static int help(struct sk_buff *skb,
 	/* Now, NAT might want to mangle the packet, and register the
 	 * (possibly changed) expectation itself. */
 	nf_nat_ftp = rcu_dereference(nf_nat_ftp_hook);
-	if (nf_nat_ftp && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
-	    ct->status & IPS_NAT_MASK)
+	if (nf_nat_ftp && ct->status & IPS_NAT_MASK)
 		ret = nf_nat_ftp(skb, ctinfo, search[dir][i].ftptype,
 				 protoff, matchoff, matchlen, exp);
 	else {
diff --git a/net/ipv4/netfilter/nf_nat_ftp.c b/net/netfilter/nf_nat_ftp.c
similarity index 82%
rename from net/ipv4/netfilter/nf_nat_ftp.c
rename to net/netfilter/nf_nat_ftp.c
index dd5e387..e839b97 100644
--- a/net/ipv4/netfilter/nf_nat_ftp.c
+++ b/net/netfilter/nf_nat_ftp.c
@@ -10,7 +10,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
-#include <linux/ip.h>
+#include <linux/inet.h>
 #include <linux/tcp.h>
 #include <linux/netfilter_ipv4.h>
 #include <net/netfilter/nf_nat.h>
@@ -26,22 +26,27 @@ MODULE_ALIAS("ip_nat_ftp");
 
 /* FIXME: Time out? --RR */
 
-static int nf_nat_ftp_fmt_cmd(enum nf_ct_ftp_type type,
+static int nf_nat_ftp_fmt_cmd(struct nf_conn *ct, enum nf_ct_ftp_type type,
 			      char *buffer, size_t buflen,
-			      __be32 addr, u16 port)
+			      union nf_inet_addr *addr, u16 port)
 {
 	switch (type) {
 	case NF_CT_FTP_PORT:
 	case NF_CT_FTP_PASV:
 		return snprintf(buffer, buflen, "%u,%u,%u,%u,%u,%u",
-				((unsigned char *)&addr)[0],
-				((unsigned char *)&addr)[1],
-				((unsigned char *)&addr)[2],
-				((unsigned char *)&addr)[3],
+				((unsigned char *)&addr->ip)[0],
+				((unsigned char *)&addr->ip)[1],
+				((unsigned char *)&addr->ip)[2],
+				((unsigned char *)&addr->ip)[3],
 				port >> 8,
 				port & 0xFF);
 	case NF_CT_FTP_EPRT:
-		return snprintf(buffer, buflen, "|1|%pI4|%u|", &addr, port);
+		if (nf_ct_l3num(ct) == NFPROTO_IPV4)
+			return snprintf(buffer, buflen, "|1|%pI4|%u|",
+					&addr->ip, port);
+		else
+			return snprintf(buffer, buflen, "|2|%pI6|%u|",
+					&addr->ip6, port);
 	case NF_CT_FTP_EPSV:
 		return snprintf(buffer, buflen, "|||%u|", port);
 	}
@@ -59,17 +64,17 @@ static unsigned int nf_nat_ftp(struct sk_buff *skb,
 			       unsigned int matchlen,
 			       struct nf_conntrack_expect *exp)
 {
-	__be32 newip;
+	union nf_inet_addr newaddr;
 	u_int16_t port;
 	int dir = CTINFO2DIR(ctinfo);
 	struct nf_conn *ct = exp->master;
-	char buffer[sizeof("|1|255.255.255.255|65535|")];
+	char buffer[sizeof("|1||65535|") + INET6_ADDRSTRLEN];
 	unsigned int buflen;
 
 	pr_debug("FTP_NAT: type %i, off %u len %u\n", type, matchoff, matchlen);
 
 	/* Connection will come from wherever this packet goes, hence !dir */
-	newip = ct->tuplehash[!dir].tuple.dst.u3.ip;
+	newaddr = ct->tuplehash[!dir].tuple.dst.u3;
 	exp->saved_proto.tcp.port = exp->tuple.dst.u.tcp.port;
 	exp->dir = !dir;
 
@@ -94,7 +99,8 @@ static unsigned int nf_nat_ftp(struct sk_buff *skb,
 	if (port == 0)
 		return NF_DROP;
 
-	buflen = nf_nat_ftp_fmt_cmd(type, buffer, sizeof(buffer), newip, port);
+	buflen = nf_nat_ftp_fmt_cmd(ct, type, buffer, sizeof(buffer),
+				    &newaddr, port);
 	if (!buflen)
 		goto out;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 17/19] netfilter: nf_nat: support IPv6 in amanda NAT helper
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (15 preceding siblings ...)
  2012-08-09 20:09 ` [PATCH 16/19] netfilter: nf_nat: support IPv6 in FTP NAT helper kaber
@ 2012-08-09 20:09 ` kaber
  2012-08-09 20:09 ` [PATCH 18/19] netfilter: nf_nat: support IPv6 in SIP " kaber
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:09 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/ipv4/netfilter/Kconfig               |    5 -----
 net/ipv4/netfilter/Makefile              |    1 -
 net/netfilter/Kconfig                    |    5 +++++
 net/netfilter/Makefile                   |    1 +
 net/netfilter/nf_conntrack_amanda.c      |    3 +--
 net/{ipv4 => }/netfilter/nf_nat_amanda.c |    0
 6 files changed, 7 insertions(+), 8 deletions(-)
 rename net/{ipv4 => }/netfilter/nf_nat_amanda.c (100%)

diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 80272e7..df42031 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -230,11 +230,6 @@ config NF_NAT_TFTP
 	depends on NF_CONNTRACK && NF_NAT_IPV4
 	default NF_NAT_IPV4 && NF_CONNTRACK_TFTP
 
-config NF_NAT_AMANDA
-	tristate
-	depends on NF_CONNTRACK && NF_NAT_IPV4
-	default NF_NAT_IPV4 && NF_CONNTRACK_AMANDA
-
 config NF_NAT_PPTP
 	tristate
 	depends on NF_CONNTRACK && NF_NAT_IPV4
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 4d8a4ad..8baa496 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -20,7 +20,6 @@ obj-$(CONFIG_NF_NAT_IPV4) += nf_nat_ipv4.o
 obj-$(CONFIG_NF_DEFRAG_IPV4) += nf_defrag_ipv4.o
 
 # NAT helpers (nf_conntrack)
-obj-$(CONFIG_NF_NAT_AMANDA) += nf_nat_amanda.o
 obj-$(CONFIG_NF_NAT_H323) += nf_nat_h323.o
 obj-$(CONFIG_NF_NAT_IRC) += nf_nat_irc.o
 obj-$(CONFIG_NF_NAT_PPTP) += nf_nat_pptp.o
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 3104494..2eee9f1 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -380,6 +380,11 @@ config NF_NAT_PROTO_SCTP
 	depends on NF_NAT && NF_CT_PROTO_SCTP
 	select LIBCRC32C
 
+config NF_NAT_AMANDA
+	tristate
+	depends on NF_CONNTRACK && NF_NAT
+	default NF_NAT && NF_CONNTRACK_AMANDA
+
 config NF_NAT_FTP
 	tristate
 	depends on NF_CONNTRACK && NF_NAT
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 16592b1..7d6e1ea 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -55,6 +55,7 @@ obj-$(CONFIG_NF_NAT_PROTO_UDPLITE) += nf_nat_proto_udplite.o
 obj-$(CONFIG_NF_NAT_PROTO_SCTP) += nf_nat_proto_sctp.o
 
 # NAT helpers
+obj-$(CONFIG_NF_NAT_AMANDA) += nf_nat_amanda.o
 obj-$(CONFIG_NF_NAT_FTP) += nf_nat_ftp.o
 
 # transparent proxy support
diff --git a/net/netfilter/nf_conntrack_amanda.c b/net/netfilter/nf_conntrack_amanda.c
index e0212b5..c514fe6 100644
--- a/net/netfilter/nf_conntrack_amanda.c
+++ b/net/netfilter/nf_conntrack_amanda.c
@@ -155,8 +155,7 @@ static int amanda_help(struct sk_buff *skb,
 				  IPPROTO_TCP, NULL, &port);
 
 		nf_nat_amanda = rcu_dereference(nf_nat_amanda_hook);
-		if (nf_nat_amanda && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
-		    ct->status & IPS_NAT_MASK)
+		if (nf_nat_amanda && ct->status & IPS_NAT_MASK)
 			ret = nf_nat_amanda(skb, ctinfo, protoff,
 					    off - dataoff, len, exp);
 		else if (nf_ct_expect_related(exp) != 0)
diff --git a/net/ipv4/netfilter/nf_nat_amanda.c b/net/netfilter/nf_nat_amanda.c
similarity index 100%
rename from net/ipv4/netfilter/nf_nat_amanda.c
rename to net/netfilter/nf_nat_amanda.c
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 18/19] netfilter: nf_nat: support IPv6 in SIP NAT helper
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (16 preceding siblings ...)
  2012-08-09 20:09 ` [PATCH 17/19] netfilter: nf_nat: support IPv6 in amanda " kaber
@ 2012-08-09 20:09 ` kaber
  2012-08-09 20:09 ` [PATCH 19/19] netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target kaber
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: kaber @ 2012-08-09 20:09 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Add IPv6 support to the SIP NAT helper. There are no functional differences
to IPv4 NAT, just different formats for addresses.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter/nf_conntrack_sip.h |    9 +-
 net/ipv4/netfilter/Kconfig                 |    5 -
 net/ipv4/netfilter/Makefile                |    1 -
 net/netfilter/Kconfig                      |    5 +
 net/netfilter/Makefile                     |    1 +
 net/netfilter/nf_conntrack_sip.c           |   68 ++++++------
 net/{ipv4 => }/netfilter/nf_nat_sip.c      |  172 ++++++++++++++++------------
 7 files changed, 145 insertions(+), 116 deletions(-)
 rename net/{ipv4 => }/netfilter/nf_nat_sip.c (76%)

diff --git a/include/linux/netfilter/nf_conntrack_sip.h b/include/linux/netfilter/nf_conntrack_sip.h
index 1afc669..387bdd0 100644
--- a/include/linux/netfilter/nf_conntrack_sip.h
+++ b/include/linux/netfilter/nf_conntrack_sip.h
@@ -99,10 +99,8 @@ enum sip_header_types {
 enum sdp_header_types {
 	SDP_HDR_UNSPEC,
 	SDP_HDR_VERSION,
-	SDP_HDR_OWNER_IP4,
-	SDP_HDR_CONNECTION_IP4,
-	SDP_HDR_OWNER_IP6,
-	SDP_HDR_CONNECTION_IP6,
+	SDP_HDR_OWNER,
+	SDP_HDR_CONNECTION,
 	SDP_HDR_MEDIA,
 };
 
@@ -111,7 +109,8 @@ extern unsigned int (*nf_nat_sip_hook)(struct sk_buff *skb,
 				       unsigned int dataoff,
 				       const char **dptr,
 				       unsigned int *datalen);
-extern void (*nf_nat_sip_seq_adjust_hook)(struct sk_buff *skb, s16 off);
+extern void (*nf_nat_sip_seq_adjust_hook)(struct sk_buff *skb,
+					  unsigned int protoff, s16 off);
 extern unsigned int (*nf_nat_sip_expect_hook)(struct sk_buff *skb,
 					      unsigned int protoff,
 					      unsigned int dataoff,
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index df42031..27f9574 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -241,11 +241,6 @@ config NF_NAT_H323
 	depends on NF_CONNTRACK && NF_NAT_IPV4
 	default NF_NAT_IPV4 && NF_CONNTRACK_H323
 
-config NF_NAT_SIP
-	tristate
-	depends on NF_CONNTRACK && NF_NAT_IPV4
-	default NF_NAT_IPV4 && NF_CONNTRACK_SIP
-
 # mangle + specific targets
 config IP_NF_MANGLE
 	tristate "Packet mangling"
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 8baa496..8914abf 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -23,7 +23,6 @@ obj-$(CONFIG_NF_DEFRAG_IPV4) += nf_defrag_ipv4.o
 obj-$(CONFIG_NF_NAT_H323) += nf_nat_h323.o
 obj-$(CONFIG_NF_NAT_IRC) += nf_nat_irc.o
 obj-$(CONFIG_NF_NAT_PPTP) += nf_nat_pptp.o
-obj-$(CONFIG_NF_NAT_SIP) += nf_nat_sip.o
 obj-$(CONFIG_NF_NAT_SNMP_BASIC) += nf_nat_snmp_basic.o
 obj-$(CONFIG_NF_NAT_TFTP) += nf_nat_tftp.o
 
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 2eee9f1..bf3e464 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -390,6 +390,11 @@ config NF_NAT_FTP
 	depends on NF_CONNTRACK && NF_NAT
 	default NF_NAT && NF_CONNTRACK_FTP
 
+config NF_NAT_SIP
+	tristate
+	depends on NF_CONNTRACK && NF_NAT
+	default NF_NAT && NF_CONNTRACK_SIP
+
 endif # NF_CONNTRACK
 
 # transparent proxy support
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 7d6e1ea..7d6d1a0 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_NF_NAT_PROTO_SCTP) += nf_nat_proto_sctp.o
 # NAT helpers
 obj-$(CONFIG_NF_NAT_AMANDA) += nf_nat_amanda.o
 obj-$(CONFIG_NF_NAT_FTP) += nf_nat_ftp.o
+obj-$(CONFIG_NF_NAT_SIP) += nf_nat_sip.o
 
 # transparent proxy support
 obj-$(CONFIG_NETFILTER_TPROXY) += nf_tproxy_core.o
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index d517490..df8f4f2 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -57,7 +57,8 @@ unsigned int (*nf_nat_sip_hook)(struct sk_buff *skb, unsigned int protoff,
 				unsigned int *datalen) __read_mostly;
 EXPORT_SYMBOL_GPL(nf_nat_sip_hook);
 
-void (*nf_nat_sip_seq_adjust_hook)(struct sk_buff *skb, s16 off) __read_mostly;
+void (*nf_nat_sip_seq_adjust_hook)(struct sk_buff *skb, unsigned int protoff,
+				   s16 off) __read_mostly;
 EXPORT_SYMBOL_GPL(nf_nat_sip_seq_adjust_hook);
 
 unsigned int (*nf_nat_sip_expect_hook)(struct sk_buff *skb,
@@ -742,13 +743,18 @@ static int sdp_addr_len(const struct nf_conn *ct, const char *dptr,
  * be tolerant and also accept records terminated with a single newline
  * character". We handle both cases.
  */
-static const struct sip_header ct_sdp_hdrs[] = {
-	[SDP_HDR_VERSION]		= SDP_HDR("v=", NULL, digits_len),
-	[SDP_HDR_OWNER_IP4]		= SDP_HDR("o=", "IN IP4 ", sdp_addr_len),
-	[SDP_HDR_CONNECTION_IP4]	= SDP_HDR("c=", "IN IP4 ", sdp_addr_len),
-	[SDP_HDR_OWNER_IP6]		= SDP_HDR("o=", "IN IP6 ", sdp_addr_len),
-	[SDP_HDR_CONNECTION_IP6]	= SDP_HDR("c=", "IN IP6 ", sdp_addr_len),
-	[SDP_HDR_MEDIA]			= SDP_HDR("m=", NULL, media_len),
+static const struct sip_header ct_sdp_hdrs_v4[] = {
+	[SDP_HDR_VERSION]	= SDP_HDR("v=", NULL, digits_len),
+	[SDP_HDR_OWNER]		= SDP_HDR("o=", "IN IP4 ", sdp_addr_len),
+	[SDP_HDR_CONNECTION]	= SDP_HDR("c=", "IN IP4 ", sdp_addr_len),
+	[SDP_HDR_MEDIA]		= SDP_HDR("m=", NULL, media_len),
+};
+
+static const struct sip_header ct_sdp_hdrs_v6[] = {
+	[SDP_HDR_VERSION]	= SDP_HDR("v=", NULL, digits_len),
+	[SDP_HDR_OWNER]		= SDP_HDR("o=", "IN IP6 ", sdp_addr_len),
+	[SDP_HDR_CONNECTION]	= SDP_HDR("c=", "IN IP6 ", sdp_addr_len),
+	[SDP_HDR_MEDIA]		= SDP_HDR("m=", NULL, media_len),
 };
 
 /* Linear string search within SDP header values */
@@ -774,11 +780,14 @@ int ct_sip_get_sdp_header(const struct nf_conn *ct, const char *dptr,
 			  enum sdp_header_types term,
 			  unsigned int *matchoff, unsigned int *matchlen)
 {
-	const struct sip_header *hdr = &ct_sdp_hdrs[type];
-	const struct sip_header *thdr = &ct_sdp_hdrs[term];
+	const struct sip_header *hdrs, *hdr, *thdr;
 	const char *start = dptr, *limit = dptr + datalen;
 	int shift = 0;
 
+	hdrs = nf_ct_l3num(ct) == NFPROTO_IPV4 ? ct_sdp_hdrs_v4 : ct_sdp_hdrs_v6;
+	hdr = &hdrs[type];
+	thdr = &hdrs[term];
+
 	for (dptr += dataoff; dptr < limit; dptr++) {
 		/* Find beginning of line */
 		if (*dptr != '\r' && *dptr != '\n')
@@ -945,12 +954,12 @@ static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int protoff,
 		    exp->class != class)
 			break;
 #ifdef CONFIG_NF_NAT_NEEDED
-		if (exp->tuple.src.l3num == AF_INET && !direct_rtp &&
-		    (exp->saved_addr.ip != exp->tuple.dst.u3.ip ||
+		if (!direct_rtp &&
+		    (!nf_inet_addr_cmp(&exp->saved_addr, &exp->tuple.dst.u3) ||
 		     exp->saved_proto.udp.port != exp->tuple.dst.u.udp.port) &&
 		    ct->status & IPS_NAT_MASK) {
-			daddr->ip		= exp->saved_addr.ip;
-			tuple.dst.u3.ip		= exp->saved_addr.ip;
+			*daddr			= exp->saved_addr;
+			tuple.dst.u3		= exp->saved_addr;
 			tuple.dst.u.udp.port	= exp->saved_proto.udp.port;
 			direct_rtp = 1;
 		} else
@@ -987,8 +996,7 @@ static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int protoff,
 			  IPPROTO_UDP, NULL, &rtcp_port);
 
 	nf_nat_sdp_media = rcu_dereference(nf_nat_sdp_media_hook);
-	if (nf_nat_sdp_media && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
-	    ct->status & IPS_NAT_MASK && !direct_rtp)
+	if (nf_nat_sdp_media && ct->status & IPS_NAT_MASK && !direct_rtp)
 		ret = nf_nat_sdp_media(skb, protoff, dataoff, dptr, datalen,
 				       rtp_exp, rtcp_exp,
 				       mediaoff, medialen, daddr);
@@ -1044,15 +1052,12 @@ static int process_sdp(struct sk_buff *skb, unsigned int protoff,
 	unsigned int i;
 	union nf_inet_addr caddr, maddr, rtp_addr;
 	unsigned int port;
-	enum sdp_header_types c_hdr;
 	const struct sdp_media_type *t;
 	int ret = NF_ACCEPT;
 	typeof(nf_nat_sdp_addr_hook) nf_nat_sdp_addr;
 	typeof(nf_nat_sdp_session_hook) nf_nat_sdp_session;
 
 	nf_nat_sdp_addr = rcu_dereference(nf_nat_sdp_addr_hook);
-	c_hdr = nf_ct_l3num(ct) == AF_INET ? SDP_HDR_CONNECTION_IP4 :
-					     SDP_HDR_CONNECTION_IP6;
 
 	/* Find beginning of session description */
 	if (ct_sip_get_sdp_header(ct, *dptr, 0, *datalen,
@@ -1066,7 +1071,7 @@ static int process_sdp(struct sk_buff *skb, unsigned int protoff,
 	 * the end of the session description. */
 	caddr_len = 0;
 	if (ct_sip_parse_sdp_addr(ct, *dptr, sdpoff, *datalen,
-				  c_hdr, SDP_HDR_MEDIA,
+				  SDP_HDR_CONNECTION, SDP_HDR_MEDIA,
 				  &matchoff, &matchlen, &caddr) > 0)
 		caddr_len = matchlen;
 
@@ -1096,7 +1101,7 @@ static int process_sdp(struct sk_buff *skb, unsigned int protoff,
 		/* The media description overrides the session description. */
 		maddr_len = 0;
 		if (ct_sip_parse_sdp_addr(ct, *dptr, mediaoff, *datalen,
-					  c_hdr, SDP_HDR_MEDIA,
+					  SDP_HDR_CONNECTION, SDP_HDR_MEDIA,
 					  &matchoff, &matchlen, &maddr) > 0) {
 			maddr_len = matchlen;
 			memcpy(&rtp_addr, &maddr, sizeof(rtp_addr));
@@ -1113,11 +1118,10 @@ static int process_sdp(struct sk_buff *skb, unsigned int protoff,
 			return ret;
 
 		/* Update media connection address if present */
-		if (maddr_len && nf_nat_sdp_addr &&
-		    nf_ct_l3num(ct) == NFPROTO_IPV4 && ct->status & IPS_NAT_MASK) {
+		if (maddr_len && nf_nat_sdp_addr && ct->status & IPS_NAT_MASK) {
 			ret = nf_nat_sdp_addr(skb, protoff, dataoff,
-					      dptr, datalen,
-					      mediaoff, c_hdr, SDP_HDR_MEDIA,
+					      dptr, datalen, mediaoff,
+					      SDP_HDR_CONNECTION, SDP_HDR_MEDIA,
 					      &rtp_addr);
 			if (ret != NF_ACCEPT)
 				return ret;
@@ -1127,8 +1131,7 @@ static int process_sdp(struct sk_buff *skb, unsigned int protoff,
 
 	/* Update session connection and owner addresses */
 	nf_nat_sdp_session = rcu_dereference(nf_nat_sdp_session_hook);
-	if (nf_nat_sdp_session && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
-	    ct->status & IPS_NAT_MASK)
+	if (nf_nat_sdp_session && ct->status & IPS_NAT_MASK)
 		ret = nf_nat_sdp_session(skb, protoff, dataoff,
 					 dptr, datalen, sdpoff, &rtp_addr);
 
@@ -1293,8 +1296,7 @@ static int process_register_request(struct sk_buff *skb, unsigned int protoff,
 	exp->flags = NF_CT_EXPECT_PERMANENT | NF_CT_EXPECT_INACTIVE;
 
 	nf_nat_sip_expect = rcu_dereference(nf_nat_sip_expect_hook);
-	if (nf_nat_sip_expect && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
-	    ct->status & IPS_NAT_MASK)
+	if (nf_nat_sip_expect && ct->status & IPS_NAT_MASK)
 		ret = nf_nat_sip_expect(skb, protoff, dataoff, dptr, datalen,
 					exp, matchoff, matchlen);
 	else {
@@ -1476,8 +1478,7 @@ static int process_sip_msg(struct sk_buff *skb, struct nf_conn *ct,
 	else
 		ret = process_sip_response(skb, protoff, dataoff, dptr, datalen);
 
-	if (ret == NF_ACCEPT && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
-	    ct->status & IPS_NAT_MASK) {
+	if (ret == NF_ACCEPT && ct->status & IPS_NAT_MASK) {
 		nf_nat_sip = rcu_dereference(nf_nat_sip_hook);
 		if (nf_nat_sip && !nf_nat_sip(skb, protoff, dataoff,
 					      dptr, datalen))
@@ -1560,11 +1561,10 @@ static int sip_help_tcp(struct sk_buff *skb, unsigned int protoff,
 		datalen  = datalen + diff - msglen;
 	}
 
-	if (ret == NF_ACCEPT && nf_ct_l3num(ct) == NFPROTO_IPV4 &&
-	    ct->status & IPS_NAT_MASK) {
+	if (ret == NF_ACCEPT && ct->status & IPS_NAT_MASK) {
 		nf_nat_sip_seq_adjust = rcu_dereference(nf_nat_sip_seq_adjust_hook);
 		if (nf_nat_sip_seq_adjust)
-			nf_nat_sip_seq_adjust(skb, tdiff);
+			nf_nat_sip_seq_adjust(skb, protoff, tdiff);
 	}
 
 	return ret;
diff --git a/net/ipv4/netfilter/nf_nat_sip.c b/net/netfilter/nf_nat_sip.c
similarity index 76%
rename from net/ipv4/netfilter/nf_nat_sip.c
rename to net/netfilter/nf_nat_sip.c
index 47a4718..6c69b2b 100644
--- a/net/ipv4/netfilter/nf_nat_sip.c
+++ b/net/netfilter/nf_nat_sip.c
@@ -3,7 +3,7 @@
  * (C) 2005 by Christian Hentschel <chentschel@arnet.com.ar>
  * based on RR's ip_nat_ftp.c and other modules.
  * (C) 2007 United Security Providers
- * (C) 2007, 2008 Patrick McHardy <kaber@trash.net>
+ * (C) 2007, 2008, 2011, 2012 Patrick McHardy <kaber@trash.net>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,8 +12,7 @@
 
 #include <linux/module.h>
 #include <linux/skbuff.h>
-#include <linux/ip.h>
-#include <net/ip.h>
+#include <linux/inet.h>
 #include <linux/udp.h>
 #include <linux/tcp.h>
 
@@ -41,8 +40,8 @@ static unsigned int mangle_packet(struct sk_buff *skb, unsigned int protoff,
 	unsigned int baseoff;
 
 	if (nf_ct_protonum(ct) == IPPROTO_TCP) {
-		th = (struct tcphdr *)(skb->data + ip_hdrlen(skb));
-		baseoff = ip_hdrlen(skb) + th->doff * 4;
+		th = (struct tcphdr *)(skb->data + protoff);
+		baseoff = protoff + th->doff * 4;
 		matchoff += dataoff - baseoff;
 
 		if (!__nf_nat_mangle_tcp_packet(skb, ct, ctinfo,
@@ -50,7 +49,7 @@ static unsigned int mangle_packet(struct sk_buff *skb, unsigned int protoff,
 						buffer, buflen, false))
 			return 0;
 	} else {
-		baseoff = ip_hdrlen(skb) + sizeof(struct udphdr);
+		baseoff = protoff + sizeof(struct udphdr);
 		matchoff += dataoff - baseoff;
 
 		if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo,
@@ -65,6 +64,29 @@ static unsigned int mangle_packet(struct sk_buff *skb, unsigned int protoff,
 	return 1;
 }
 
+static int sip_sprintf_addr(const struct nf_conn *ct, char *buffer,
+			    const union nf_inet_addr *addr, bool delim)
+{
+	if (nf_ct_l3num(ct) == NFPROTO_IPV4)
+		return sprintf(buffer, "%pI4", &addr->ip);
+	else {
+		if (delim)
+			return sprintf(buffer, "[%pI6c]", &addr->ip6);
+		else
+			return sprintf(buffer, "%pI6c", &addr->ip6);
+	}
+
+}
+
+static int sip_sprintf_addr_port(const struct nf_conn *ct, char *buffer,
+				 const union nf_inet_addr *addr, u16 port)
+{
+	if (nf_ct_l3num(ct) == NFPROTO_IPV4)
+		return sprintf(buffer, "%pI4:%u", &addr->ip, port);
+	else
+		return sprintf(buffer, "[%pI6c]:%u", &addr->ip6, port);
+}
+
 static int map_addr(struct sk_buff *skb, unsigned int protoff,
 		    unsigned int dataoff,
 		    const char **dptr, unsigned int *datalen,
@@ -74,27 +96,26 @@ static int map_addr(struct sk_buff *skb, unsigned int protoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
-	char buffer[sizeof("nnn.nnn.nnn.nnn:nnnnn")];
+	char buffer[INET6_ADDRSTRLEN + sizeof("[]:nnnnn")];
 	unsigned int buflen;
-	__be32 newaddr;
+	union nf_inet_addr newaddr;
 	__be16 newport;
 
-	if (ct->tuplehash[dir].tuple.src.u3.ip == addr->ip &&
+	if (nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.src.u3, addr) &&
 	    ct->tuplehash[dir].tuple.src.u.udp.port == port) {
-		newaddr = ct->tuplehash[!dir].tuple.dst.u3.ip;
+		newaddr = ct->tuplehash[!dir].tuple.dst.u3;
 		newport = ct->tuplehash[!dir].tuple.dst.u.udp.port;
-	} else if (ct->tuplehash[dir].tuple.dst.u3.ip == addr->ip &&
+	} else if (nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.dst.u3, addr) &&
 		   ct->tuplehash[dir].tuple.dst.u.udp.port == port) {
-		newaddr = ct->tuplehash[!dir].tuple.src.u3.ip;
+		newaddr = ct->tuplehash[!dir].tuple.src.u3;
 		newport = ct->tuplehash[!dir].tuple.src.u.udp.port;
 	} else
 		return 1;
 
-	if (newaddr == addr->ip && newport == port)
+	if (nf_inet_addr_cmp(&newaddr, addr) && newport == port)
 		return 1;
 
-	buflen = sprintf(buffer, "%pI4:%u", &newaddr, ntohs(newport));
-
+	buflen = sip_sprintf_addr_port(ct, buffer, &newaddr, ntohs(newport));
 	return mangle_packet(skb, protoff, dataoff, dptr, datalen,
 			     matchoff, matchlen, buffer, buflen);
 }
@@ -117,7 +138,7 @@ static int map_sip_addr(struct sk_buff *skb, unsigned int protoff,
 			matchoff, matchlen, &addr, port);
 }
 
-static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int protoff,
+static unsigned int nf_nat_sip(struct sk_buff *skb, unsigned int protoff,
 			       unsigned int dataoff,
 			       const char **dptr, unsigned int *datalen)
 {
@@ -152,16 +173,18 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int protoff,
 				    hdr, NULL, &matchoff, &matchlen,
 				    &addr, &port) > 0) {
 		unsigned int olen, matchend, poff, plen, buflen, n;
-		char buffer[sizeof("nnn.nnn.nnn.nnn:nnnnn")];
+		char buffer[INET6_ADDRSTRLEN + sizeof("[]:nnnnn")];
 
 		/* We're only interested in headers related to this
 		 * connection */
 		if (request) {
-			if (addr.ip != ct->tuplehash[dir].tuple.src.u3.ip ||
+			if (!nf_inet_addr_cmp(&addr,
+					&ct->tuplehash[dir].tuple.src.u3) ||
 			    port != ct->tuplehash[dir].tuple.src.u.udp.port)
 				goto next;
 		} else {
-			if (addr.ip != ct->tuplehash[dir].tuple.dst.u3.ip ||
+			if (!nf_inet_addr_cmp(&addr,
+					&ct->tuplehash[dir].tuple.dst.u3) ||
 			    port != ct->tuplehash[dir].tuple.dst.u.udp.port)
 				goto next;
 		}
@@ -178,10 +201,11 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int protoff,
 		if (ct_sip_parse_address_param(ct, *dptr, matchend, *datalen,
 					       "maddr=", &poff, &plen,
 					       &addr, true) > 0 &&
-		    addr.ip == ct->tuplehash[dir].tuple.src.u3.ip &&
-		    addr.ip != ct->tuplehash[!dir].tuple.dst.u3.ip) {
-			buflen = sprintf(buffer, "%pI4",
-					&ct->tuplehash[!dir].tuple.dst.u3.ip);
+		    nf_inet_addr_cmp(&addr, &ct->tuplehash[dir].tuple.src.u3) &&
+		    !nf_inet_addr_cmp(&addr, &ct->tuplehash[!dir].tuple.dst.u3)) {
+			buflen = sip_sprintf_addr(ct, buffer,
+					&ct->tuplehash[!dir].tuple.dst.u3,
+					true);
 			if (!mangle_packet(skb, protoff, dataoff, dptr, datalen,
 					   poff, plen, buffer, buflen))
 				return NF_DROP;
@@ -192,10 +216,11 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int protoff,
 		if (ct_sip_parse_address_param(ct, *dptr, matchend, *datalen,
 					       "received=", &poff, &plen,
 					       &addr, false) > 0 &&
-		    addr.ip == ct->tuplehash[dir].tuple.dst.u3.ip &&
-		    addr.ip != ct->tuplehash[!dir].tuple.src.u3.ip) {
-			buflen = sprintf(buffer, "%pI4",
-					&ct->tuplehash[!dir].tuple.src.u3.ip);
+		    nf_inet_addr_cmp(&addr, &ct->tuplehash[dir].tuple.dst.u3) &&
+		    !nf_inet_addr_cmp(&addr, &ct->tuplehash[!dir].tuple.src.u3)) {
+			buflen = sip_sprintf_addr(ct, buffer,
+					&ct->tuplehash[!dir].tuple.src.u3,
+					false);
 			if (!mangle_packet(skb, protoff, dataoff, dptr, datalen,
 					   poff, plen, buffer, buflen))
 				return NF_DROP;
@@ -237,7 +262,8 @@ next:
 	return NF_ACCEPT;
 }
 
-static void ip_nat_sip_seq_adjust(struct sk_buff *skb, s16 off)
+static void nf_nat_sip_seq_adjust(struct sk_buff *skb, unsigned int protoff,
+				  s16 off)
 {
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
@@ -246,12 +272,12 @@ static void ip_nat_sip_seq_adjust(struct sk_buff *skb, s16 off)
 	if (nf_ct_protonum(ct) != IPPROTO_TCP || off == 0)
 		return;
 
-	th = (struct tcphdr *)(skb->data + ip_hdrlen(skb));
+	th = (struct tcphdr *)(skb->data + protoff);
 	nf_nat_set_seq_adjust(ct, ctinfo, th->seq, off);
 }
 
 /* Handles expected signalling connections and media streams */
-static void ip_nat_sip_expected(struct nf_conn *ct,
+static void nf_nat_sip_expected(struct nf_conn *ct,
 				struct nf_conntrack_expect *exp)
 {
 	struct nf_nat_range range;
@@ -267,8 +293,8 @@ static void ip_nat_sip_expected(struct nf_conn *ct,
 
 	/* Change src to where master sends to, but only if the connection
 	 * actually came from the same source. */
-	if (ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip ==
-	    ct->master->tuplehash[exp->dir].tuple.src.u3.ip) {
+	if (nf_inet_addr_cmp(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3,
+			     &ct->master->tuplehash[exp->dir].tuple.src.u3)) {
 		range.flags = NF_NAT_RANGE_MAP_IPS;
 		range.min_addr = range.max_addr
 			= ct->master->tuplehash[!exp->dir].tuple.dst.u3;
@@ -276,7 +302,7 @@ static void ip_nat_sip_expected(struct nf_conn *ct,
 	}
 }
 
-static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int protoff,
+static unsigned int nf_nat_sip_expect(struct sk_buff *skb, unsigned int protoff,
 				      unsigned int dataoff,
 				      const char **dptr, unsigned int *datalen,
 				      struct nf_conntrack_expect *exp,
@@ -286,16 +312,17 @@ static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int protoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
-	__be32 newip;
+	union nf_inet_addr newaddr;
 	u_int16_t port;
-	char buffer[sizeof("nnn.nnn.nnn.nnn:nnnnn")];
+	char buffer[INET6_ADDRSTRLEN + sizeof("[]:nnnnn")];
 	unsigned int buflen;
 
 	/* Connection will come from reply */
-	if (ct->tuplehash[dir].tuple.src.u3.ip == ct->tuplehash[!dir].tuple.dst.u3.ip)
-		newip = exp->tuple.dst.u3.ip;
+	if (nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.src.u3,
+			     &ct->tuplehash[!dir].tuple.dst.u3))
+		newaddr = exp->tuple.dst.u3;
 	else
-		newip = ct->tuplehash[!dir].tuple.dst.u3.ip;
+		newaddr = ct->tuplehash[!dir].tuple.dst.u3;
 
 	/* If the signalling port matches the connection's source port in the
 	 * original direction, try to use the destination port in the opposite
@@ -307,10 +334,10 @@ static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int protoff,
 		port = ntohs(exp->tuple.dst.u.udp.port);
 
 	exp->saved_addr = exp->tuple.dst.u3;
-	exp->tuple.dst.u3.ip = newip;
+	exp->tuple.dst.u3 = newaddr;
 	exp->saved_proto.udp.port = exp->tuple.dst.u.udp.port;
 	exp->dir = !dir;
-	exp->expectfn = ip_nat_sip_expected;
+	exp->expectfn = nf_nat_sip_expected;
 
 	for (; port != 0; port++) {
 		int ret;
@@ -328,9 +355,9 @@ static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int protoff,
 	if (port == 0)
 		return NF_DROP;
 
-	if (exp->tuple.dst.u3.ip != exp->saved_addr.ip ||
+	if (!nf_inet_addr_cmp(&exp->tuple.dst.u3, &exp->saved_addr) ||
 	    exp->tuple.dst.u.udp.port != exp->saved_proto.udp.port) {
-		buflen = sprintf(buffer, "%pI4:%u", &newip, port);
+		buflen = sip_sprintf_addr_port(ct, buffer, &newaddr, port);
 		if (!mangle_packet(skb, protoff, dataoff, dptr, datalen,
 				   matchoff, matchlen, buffer, buflen))
 			goto err;
@@ -388,7 +415,7 @@ static int mangle_sdp_packet(struct sk_buff *skb, unsigned int protoff,
 			     matchoff, matchlen, buffer, buflen) ? 0 : -EINVAL;
 }
 
-static unsigned int ip_nat_sdp_addr(struct sk_buff *skb, unsigned int protoff,
+static unsigned int nf_nat_sdp_addr(struct sk_buff *skb, unsigned int protoff,
 				    unsigned int dataoff,
 				    const char **dptr, unsigned int *datalen,
 				    unsigned int sdpoff,
@@ -396,10 +423,12 @@ static unsigned int ip_nat_sdp_addr(struct sk_buff *skb, unsigned int protoff,
 				    enum sdp_header_types term,
 				    const union nf_inet_addr *addr)
 {
-	char buffer[sizeof("nnn.nnn.nnn.nnn")];
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	char buffer[INET6_ADDRSTRLEN];
 	unsigned int buflen;
 
-	buflen = sprintf(buffer, "%pI4", &addr->ip);
+	buflen = sip_sprintf_addr(ct, buffer, addr, false);
 	if (mangle_sdp_packet(skb, protoff, dataoff, dptr, datalen,
 			      sdpoff, type, term, buffer, buflen))
 		return 0;
@@ -407,7 +436,7 @@ static unsigned int ip_nat_sdp_addr(struct sk_buff *skb, unsigned int protoff,
 	return mangle_content_len(skb, protoff, dataoff, dptr, datalen);
 }
 
-static unsigned int ip_nat_sdp_port(struct sk_buff *skb, unsigned int protoff,
+static unsigned int nf_nat_sdp_port(struct sk_buff *skb, unsigned int protoff,
 				    unsigned int dataoff,
 				    const char **dptr, unsigned int *datalen,
 				    unsigned int matchoff,
@@ -425,24 +454,25 @@ static unsigned int ip_nat_sdp_port(struct sk_buff *skb, unsigned int protoff,
 	return mangle_content_len(skb, protoff, dataoff, dptr, datalen);
 }
 
-static unsigned int ip_nat_sdp_session(struct sk_buff *skb, unsigned int protoff,
+static unsigned int nf_nat_sdp_session(struct sk_buff *skb, unsigned int protoff,
 				       unsigned int dataoff,
 				       const char **dptr, unsigned int *datalen,
 				       unsigned int sdpoff,
 				       const union nf_inet_addr *addr)
 {
-	char buffer[sizeof("nnn.nnn.nnn.nnn")];
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	char buffer[INET6_ADDRSTRLEN];
 	unsigned int buflen;
 
 	/* Mangle session description owner and contact addresses */
-	buflen = sprintf(buffer, "%pI4", &addr->ip);
+	buflen = sip_sprintf_addr(ct, buffer, addr, false);
 	if (mangle_sdp_packet(skb, protoff, dataoff, dptr, datalen, sdpoff,
-			       SDP_HDR_OWNER_IP4, SDP_HDR_MEDIA,
-			       buffer, buflen))
+			      SDP_HDR_OWNER, SDP_HDR_MEDIA, buffer, buflen))
 		return 0;
 
 	switch (mangle_sdp_packet(skb, protoff, dataoff, dptr, datalen, sdpoff,
-				  SDP_HDR_CONNECTION_IP4, SDP_HDR_MEDIA,
+				  SDP_HDR_CONNECTION, SDP_HDR_MEDIA,
 				  buffer, buflen)) {
 	case 0:
 	/*
@@ -463,7 +493,7 @@ static unsigned int ip_nat_sdp_session(struct sk_buff *skb, unsigned int protoff
 
 /* So, this packet has hit the connection tracking matching code.
    Mangle it, and change the expectation to match the new version. */
-static unsigned int ip_nat_sdp_media(struct sk_buff *skb, unsigned int protoff,
+static unsigned int nf_nat_sdp_media(struct sk_buff *skb, unsigned int protoff,
 				     unsigned int dataoff,
 				     const char **dptr, unsigned int *datalen,
 				     struct nf_conntrack_expect *rtp_exp,
@@ -478,23 +508,23 @@ static unsigned int ip_nat_sdp_media(struct sk_buff *skb, unsigned int protoff,
 	u_int16_t port;
 
 	/* Connection will come from reply */
-	if (ct->tuplehash[dir].tuple.src.u3.ip ==
-	    ct->tuplehash[!dir].tuple.dst.u3.ip)
-		rtp_addr->ip = rtp_exp->tuple.dst.u3.ip;
+	if (nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.src.u3,
+			     &ct->tuplehash[!dir].tuple.dst.u3))
+		*rtp_addr = rtp_exp->tuple.dst.u3;
 	else
-		rtp_addr->ip = ct->tuplehash[!dir].tuple.dst.u3.ip;
+		*rtp_addr = ct->tuplehash[!dir].tuple.dst.u3;
 
 	rtp_exp->saved_addr = rtp_exp->tuple.dst.u3;
-	rtp_exp->tuple.dst.u3.ip = rtp_addr->ip;
+	rtp_exp->tuple.dst.u3 = *rtp_addr;
 	rtp_exp->saved_proto.udp.port = rtp_exp->tuple.dst.u.udp.port;
 	rtp_exp->dir = !dir;
-	rtp_exp->expectfn = ip_nat_sip_expected;
+	rtp_exp->expectfn = nf_nat_sip_expected;
 
 	rtcp_exp->saved_addr = rtcp_exp->tuple.dst.u3;
-	rtcp_exp->tuple.dst.u3.ip = rtp_addr->ip;
+	rtcp_exp->tuple.dst.u3 = *rtp_addr;
 	rtcp_exp->saved_proto.udp.port = rtcp_exp->tuple.dst.u.udp.port;
 	rtcp_exp->dir = !dir;
-	rtcp_exp->expectfn = ip_nat_sip_expected;
+	rtcp_exp->expectfn = nf_nat_sip_expected;
 
 	/* Try to get same pair of ports: if not, try to change them. */
 	for (port = ntohs(rtp_exp->tuple.dst.u.udp.port);
@@ -525,7 +555,7 @@ static unsigned int ip_nat_sdp_media(struct sk_buff *skb, unsigned int protoff,
 
 	/* Update media port. */
 	if (rtp_exp->tuple.dst.u.udp.port != rtp_exp->saved_proto.udp.port &&
-	    !ip_nat_sdp_port(skb, protoff, dataoff, dptr, datalen,
+	    !nf_nat_sdp_port(skb, protoff, dataoff, dptr, datalen,
 			     mediaoff, medialen, port))
 		goto err2;
 
@@ -539,8 +569,8 @@ err1:
 }
 
 static struct nf_ct_helper_expectfn sip_nat = {
-        .name           = "sip",
-        .expectfn       = ip_nat_sip_expected,
+	.name		= "sip",
+	.expectfn	= nf_nat_sip_expected,
 };
 
 static void __exit nf_nat_sip_fini(void)
@@ -565,13 +595,13 @@ static int __init nf_nat_sip_init(void)
 	BUG_ON(nf_nat_sdp_port_hook != NULL);
 	BUG_ON(nf_nat_sdp_session_hook != NULL);
 	BUG_ON(nf_nat_sdp_media_hook != NULL);
-	RCU_INIT_POINTER(nf_nat_sip_hook, ip_nat_sip);
-	RCU_INIT_POINTER(nf_nat_sip_seq_adjust_hook, ip_nat_sip_seq_adjust);
-	RCU_INIT_POINTER(nf_nat_sip_expect_hook, ip_nat_sip_expect);
-	RCU_INIT_POINTER(nf_nat_sdp_addr_hook, ip_nat_sdp_addr);
-	RCU_INIT_POINTER(nf_nat_sdp_port_hook, ip_nat_sdp_port);
-	RCU_INIT_POINTER(nf_nat_sdp_session_hook, ip_nat_sdp_session);
-	RCU_INIT_POINTER(nf_nat_sdp_media_hook, ip_nat_sdp_media);
+	RCU_INIT_POINTER(nf_nat_sip_hook, nf_nat_sip);
+	RCU_INIT_POINTER(nf_nat_sip_seq_adjust_hook, nf_nat_sip_seq_adjust);
+	RCU_INIT_POINTER(nf_nat_sip_expect_hook, nf_nat_sip_expect);
+	RCU_INIT_POINTER(nf_nat_sdp_addr_hook, nf_nat_sdp_addr);
+	RCU_INIT_POINTER(nf_nat_sdp_port_hook, nf_nat_sdp_port);
+	RCU_INIT_POINTER(nf_nat_sdp_session_hook, nf_nat_sdp_session);
+	RCU_INIT_POINTER(nf_nat_sdp_media_hook, nf_nat_sdp_media);
 	nf_ct_helper_expectfn_register(&sip_nat);
 	return 0;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 19/19] netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (17 preceding siblings ...)
  2012-08-09 20:09 ` [PATCH 18/19] netfilter: nf_nat: support IPv6 in SIP " kaber
@ 2012-08-09 20:09 ` kaber
  2012-08-09 21:55   ` Jan Engelhardt
  2012-08-09 20:56 ` [PATCH 00/19] netfilter: IPv6 NAT Eric W. Biederman
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 68+ messages in thread
From: kaber @ 2012-08-09 20:09 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter_ipv6/Kbuild     |    1 +
 include/linux/netfilter_ipv6/ip6t_NPT.h |   16 +++
 net/ipv6/netfilter/Kconfig              |    9 ++
 net/ipv6/netfilter/Makefile             |    1 +
 net/ipv6/netfilter/ip6t_NPT.c           |  165 +++++++++++++++++++++++++++++++
 5 files changed, 192 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter_ipv6/ip6t_NPT.h
 create mode 100644 net/ipv6/netfilter/ip6t_NPT.c

diff --git a/include/linux/netfilter_ipv6/Kbuild b/include/linux/netfilter_ipv6/Kbuild
index bd095bc..b88c005 100644
--- a/include/linux/netfilter_ipv6/Kbuild
+++ b/include/linux/netfilter_ipv6/Kbuild
@@ -1,6 +1,7 @@
 header-y += ip6_tables.h
 header-y += ip6t_HL.h
 header-y += ip6t_LOG.h
+header-y += ip6t_NPT.h
 header-y += ip6t_REJECT.h
 header-y += ip6t_ah.h
 header-y += ip6t_frag.h
diff --git a/include/linux/netfilter_ipv6/ip6t_NPT.h b/include/linux/netfilter_ipv6/ip6t_NPT.h
new file mode 100644
index 0000000..f763355
--- /dev/null
+++ b/include/linux/netfilter_ipv6/ip6t_NPT.h
@@ -0,0 +1,16 @@
+#ifndef __NETFILTER_IP6T_NPT
+#define __NETFILTER_IP6T_NPT
+
+#include <linux/types.h>
+#include <linux/netfilter.h>
+
+struct ip6t_npt_tginfo {
+	union nf_inet_addr	src_pfx;
+	union nf_inet_addr	dst_pfx;
+	__u8			src_pfx_len;
+	__u8			dst_pfx_len;
+	/* Used internally by the kernel */
+	__sum16			adjustment;
+};
+
+#endif /* __NETFILTER_IP6T_NPT */
diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index 7bdf73b..44ae1bf 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -177,6 +177,15 @@ config IP6_NF_TARGET_REDIRECT
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP6_NF_TARGET_NPT
+	tristate "NPT (Network Prefix translation) target support"
+	depends on NETFILTER_ADVANCED
+	help
+	  This option adda `SNPT' and `DNPT' target, which perform stateless
+	  IPv6-to-IPv6 Network Prefix Translation (RFC 6296).
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config IP6_NF_FILTER
 	tristate "Packet filtering"
 	default m if NETFILTER_ADVANCED=n
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index 0864ce6..5752132 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -36,5 +36,6 @@ obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o
 # targets
 obj-$(CONFIG_IP6_NF_TARGET_MASQUERADE) += ip6t_MASQUERADE.o
 obj-$(CONFIG_IP6_NF_TARGET_NETMAP) += ip6t_NETMAP.o
+obj-$(CONFIG_IP6_NF_TARGET_NPT) += ip6t_NPT.o
 obj-$(CONFIG_IP6_NF_TARGET_REDIRECT) += ip6t_REDIRECT.o
 obj-$(CONFIG_IP6_NF_TARGET_REJECT) += ip6t_REJECT.o
diff --git a/net/ipv6/netfilter/ip6t_NPT.c b/net/ipv6/netfilter/ip6t_NPT.c
new file mode 100644
index 0000000..e948691
--- /dev/null
+++ b/net/ipv6/netfilter/ip6t_NPT.c
@@ -0,0 +1,165 @@
+/*
+ * Copyright (c) 2011, 2012 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/ipv6.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv6.h>
+#include <linux/netfilter_ipv6/ip6t_NPT.h>
+#include <linux/netfilter/x_tables.h>
+
+static __sum16 csum16_complement(__sum16 a)
+{
+	return (__force __sum16)(0xffff - (__force u16)a);
+}
+
+static __sum16 csum16_add(__sum16 a, __sum16 b)
+{
+	u16 sum;
+
+	sum = (__force u16)a + (__force u16)b;
+	sum += (__force u16)a < (__force u16)b;
+	return (__force __sum16)sum;
+}
+
+static __sum16 csum16_sub(__sum16 a, __sum16 b)
+{
+	return csum16_add(a, csum16_complement(b));
+}
+
+static int ip6t_npt_checkentry(const struct xt_tgchk_param *par)
+{
+	struct ip6t_npt_tginfo *npt = par->targinfo;
+	__sum16 src_sum = 0, dst_sum = 0;
+	unsigned int i;
+
+	if (npt->src_pfx_len > 64 || npt->dst_pfx_len > 64)
+		return -EINVAL;
+
+	for (i = 0; i < ARRAY_SIZE(npt->src_pfx.in6.s6_addr16); i++) {
+		src_sum = csum16_add(src_sum,
+				(__force __sum16)npt->src_pfx.in6.s6_addr16[i]);
+		dst_sum = csum16_add(dst_sum,
+				(__force __sum16)npt->dst_pfx.in6.s6_addr16[i]);
+	}
+
+	npt->adjustment = csum16_sub(src_sum, dst_sum);
+	return 0;
+}
+
+static bool ip6t_npt_map_pfx(const struct ip6t_npt_tginfo *npt,
+			     struct in6_addr *addr)
+{
+	unsigned int pfx_len;
+	unsigned int i, idx;
+	__be32 mask;
+	__sum16 sum;
+
+	pfx_len = max(npt->src_pfx_len, npt->dst_pfx_len);
+	for (i = 0; i < pfx_len; i += 32) {
+		if (pfx_len - i >= 32)
+			mask = 0;
+		else
+			mask = htonl(~((1 << (pfx_len - i)) - 1));
+
+		idx = i / 32;
+		addr->s6_addr32[idx] &= mask;
+		addr->s6_addr32[idx] |= npt->dst_pfx.in6.s6_addr32[idx];
+	}
+
+	if (pfx_len <= 48)
+		idx = 3;
+	else {
+		for (idx = 4; idx < ARRAY_SIZE(addr->s6_addr16); idx++) {
+			if ((__force __sum16)addr->s6_addr16[idx] !=
+			    CSUM_MANGLED_0)
+				break;
+		}
+		if (idx == ARRAY_SIZE(addr->s6_addr16))
+			return false;
+	}
+
+	sum = csum16_add((__force __sum16)addr->s6_addr16[idx],
+			 npt->adjustment);
+	if (sum == CSUM_MANGLED_0)
+		sum = 0;
+	*(__force __sum16 *)&addr->s6_addr16[idx] = sum;
+
+	return true;
+}
+
+static unsigned int
+ip6t_snpt_tg(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct ip6t_npt_tginfo *npt = par->targinfo;
+
+	if (!ip6t_npt_map_pfx(npt, &ipv6_hdr(skb)->saddr)) {
+		icmpv6_send(skb, ICMPV6_PARAMPROB, ICMPV6_HDR_FIELD,
+			    offsetof(struct ipv6hdr, saddr));
+		return NF_DROP;
+	}
+	return XT_CONTINUE;
+}
+
+static unsigned int
+ip6t_dnpt_tg(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct ip6t_npt_tginfo *npt = par->targinfo;
+
+	if (!ip6t_npt_map_pfx(npt, &ipv6_hdr(skb)->daddr)) {
+		icmpv6_send(skb, ICMPV6_PARAMPROB, ICMPV6_HDR_FIELD,
+			    offsetof(struct ipv6hdr, daddr));
+		return NF_DROP;
+	}
+	return XT_CONTINUE;
+}
+
+static struct xt_target ip6t_npt_target_reg[] __read_mostly = {
+	{
+		.name		= "SNPT",
+		.target		= ip6t_snpt_tg,
+		.targetsize	= sizeof(struct ip6t_npt_tginfo),
+		.checkentry	= ip6t_npt_checkentry,
+		.family		= NFPROTO_IPV6,
+		.hooks		= (1 << NF_INET_LOCAL_IN) |
+				  (1 << NF_INET_POST_ROUTING),
+		.me		= THIS_MODULE,
+	},
+	{
+		.name		= "DNPT",
+		.target		= ip6t_dnpt_tg,
+		.targetsize	= sizeof(struct ip6t_npt_tginfo),
+		.checkentry	= ip6t_npt_checkentry,
+		.family		= NFPROTO_IPV6,
+		.hooks		= (1 << NF_INET_PRE_ROUTING) |
+				  (1 << NF_INET_LOCAL_OUT),
+		.me		= THIS_MODULE,
+	},
+};
+
+static int __init ip6t_npt_init(void)
+{
+	return xt_register_targets(ip6t_npt_target_reg,
+				   ARRAY_SIZE(ip6t_npt_target_reg));
+}
+
+static void __exit ip6t_npt_exit(void)
+{
+	xt_unregister_targets(ip6t_npt_target_reg,
+			      ARRAY_SIZE(ip6t_npt_target_reg));
+}
+
+module_init(ip6t_npt_init);
+module_exit(ip6t_npt_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("IPv6-to-IPv6 Network Prefix Translation (RFC 6296)");
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
+MODULE_ALIAS("ip6t_SNPT");
+MODULE_ALIAS("ip6t_DNPT");
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/19] netfilter: IPv6 NAT
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (18 preceding siblings ...)
  2012-08-09 20:09 ` [PATCH 19/19] netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target kaber
@ 2012-08-09 20:56 ` Eric W. Biederman
  2012-08-09 21:52   ` Patrick McHardy
  2012-08-09 22:00 ` Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2012-08-09 20:56 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev

kaber@trash.net writes:

> The following patches contain an updated version of IPv6 NAT against
> Linus' current tree.
>
> The series is organized as follows:
>
> - Patches 01-03 contain bugfixes for SIP helper bugs/regressions
>   present in the current kernel

Why not just delete this code?  The current best practices are to
disable ALGs for SIP.  To the point in some circles people recommend
running SIP over TLS to avoid over helpful NAT ALGs.

Eric

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/19] netfilter: IPv6 NAT
  2012-08-09 20:56 ` [PATCH 00/19] netfilter: IPv6 NAT Eric W. Biederman
@ 2012-08-09 21:52   ` Patrick McHardy
  0 siblings, 0 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-09 21:52 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netfilter-devel, netdev

On Thu, 9 Aug 2012, Eric W. Biederman wrote:

> kaber@trash.net writes:
>
>> The following patches contain an updated version of IPv6 NAT against
>> Linus' current tree.
>>
>> The series is organized as follows:
>>
>> - Patches 01-03 contain bugfixes for SIP helper bugs/regressions
>>   present in the current kernel
>
> Why not just delete this code?  The current best practices are to
> disable ALGs for SIP.  To the point in some circles people recommend
> running SIP over TLS to avoid over helpful NAT ALGs.

And where can I read up on these best practices and how well they work?

In any case, these patches are all for the connection tracking helper,
which is needed unless you want to open up your firewall for every 
possible RTP source, in which case you can simply disable it. Some people 
are also using it to proritize RTP streams without any filtering.

Also, even if the NAT helper would not mangle packets, it is still needed 
to adjust expectations. so incoming connections can go to the correct
destination. That is, direct RTP connections between two endpoints
that didn't have any direct signalling communication before

You can of course also proxy everything through your SIP provider 
(including internal calls) and/or use STUN (which is unreliable under
Linux). I prefer not to.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 19/19] netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target
  2012-08-09 20:09 ` [PATCH 19/19] netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target kaber
@ 2012-08-09 21:55   ` Jan Engelhardt
  2012-08-09 22:25     ` Patrick McHardy
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Engelhardt @ 2012-08-09 21:55 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev


On Thursday 2012-08-09 22:09, kaber@trash.net wrote:
>+config IP6_NF_TARGET_NPT
>+	tristate "NPT (Network Prefix translation) target support"
>+	depends on NETFILTER_ADVANCED
>+	help
>+	  This option adda `SNPT' and `DNPT' target, which perform stateless
>+	  IPv6-to-IPv6 Network Prefix Translation (RFC 6296).

Fixes/suggestion in the help text (near "adda" and subsequent).

config IP6_NF_TARGET_NPT
	tristate "NPT (Network Prefix translation) target support"
	depends on NETFILTER_ADVANCED
	---help---
	This option adds the "SNPT" and "DNPT" targets, which perform stateless
	IPv6-to-IPv6 Network Prefix Translation per RFC 6296.

>+/*
>+ * Copyright (c) 2011, 2012 Patrick McHardy <kaber@trash.net>
>+ *
>+ * This program is free software; you can redistribute it and/or modify
>+ * it under the terms of the GNU General Public License version 2 as
>+ * published by the Free Software Foundation.
>+ */

GNU sometimes has a strange way of expressing their (C) lines,
listing all years separately, as if "1989-1991,1994-1999,2001-2005" was not
sufficient. In your case, would "2011-2012" work?

Any objection to adding a "(or later)" clause to the set?

>+static int ip6t_npt_checkentry(const struct xt_tgchk_param *par)
>+{
>+	struct ip6t_npt_tginfo *npt = par->targinfo;
>+	__sum16 src_sum = 0, dst_sum = 0;
>+	unsigned int i;
>+
>+	if (npt->src_pfx_len > 64 || npt->dst_pfx_len > 64)
>+		return -EINVAL;

While the RFC probably only specifies masks up to /64,
the general algorithm is sustainable up to at least /96.

Extending the RFC's section 3.6 ("/48 Prefix Mapping") example,
fd01:0203:0405:0001:0000:0000:0000:1234/112 (specification per `ip
addr`) would become 2001:0db8:0001:0000:0000:0000:d550:1234.

Not everybody runs exclusively on EUI64-SLAAC / PRIVACY addresses :)


>+static struct xt_target ip6t_npt_target_reg[] __read_mostly = {
>+	{
>+		.name		= "SNPT",

IETF did quite a job again... NPT as an acronym is quite
close to NAPT - too close.
Since it's essentially NETMAP, having something in that ballpark
may seem more fitting. NPMAP? (Translation is mapping -
henceforh "Network Prefix Mapping")



I would hint towards choosing a different name; the P in NPT
may be (mis)understood as port (due to the common "NAPT")
keyword.

>+		.target		= ip6t_snpt_tg,
>+		.targetsize	= sizeof(struct ip6t_npt_tginfo),
>+		.checkentry	= ip6t_npt_checkentry,
>+		.family		= NFPROTO_IPV6,
>+		.hooks		= (1 << NF_INET_LOCAL_IN) |
>+				  (1 << NF_INET_POST_ROUTING),
>+		.me		= THIS_MODULE,

Should perhaps a  .table = "mangle"  be added?


Are any tricks on the userspace side needed to use SNPT/SNPMAP?
Since I spot no code telling conntrack about the address mingling,
one would have to use -j CT --notrack or the rawpost table, like
it's done for RAWSNAT in xtables-addons, would he not?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/19] netfilter: IPv6 NAT
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (19 preceding siblings ...)
  2012-08-09 20:56 ` [PATCH 00/19] netfilter: IPv6 NAT Eric W. Biederman
@ 2012-08-09 22:00 ` Pablo Neira Ayuso
  2012-08-09 22:30   ` Patrick McHardy
  2012-08-17 13:42 ` Pablo Neira Ayuso
  2012-08-25  0:58 ` Andre Tomt
  22 siblings, 1 reply; 68+ messages in thread
From: Pablo Neira Ayuso @ 2012-08-09 22:00 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev

Hi Patrick,

On Thu, Aug 09, 2012 at 10:08:44PM +0200, kaber@trash.net wrote:
> The following patches contain an updated version of IPv6 NAT against
> Linus' current tree.
> 
> The series is organized as follows:
> 
> - Patches 01-03 contain bugfixes for SIP helper bugs/regressions
>   present in the current kernel

Thanks, I'll pass these to David.

I have also two more to fixes to oopses regarding SIP. I'm expecting
one user to finally confirm that their issues are fixed.

> - Patches 04-06 improve conntrack fragmentation handling, the IPv6
>   parts are also a precondition for IPv6 NAT
> 
> - Patches 07 and 08 prepare the current NAT code for conversion to
>   an address family independant core, but contain no functional
>   changes
> 
> - Patch 09 adds the address family independant NAT core and converts
>   the existing IPv4-only NAT code to an AF-specific module
> 
> - Patches 10 and 11 add some infrastructure for IPv6 NAT
> 
> - Patch 12 adds IPv6 NAT support
> 
> - Patches 13-15 add IPv6 specific NAT targets
> 
> - Patches 16-19 add some IPv6-capable ports of existing NAT helpers
> 
> - Patch 19 is independant of the IPv6 NAT code and adds support for
>   stateless IPv6 prefix translation, just to relieve my conscience ;)
> 
> 
> Since the last posting numerous bugs have been fixed, I don't remember
> all of them, the more important ones include:
> 
> - automatic NAT module loading in ctnetlink
> 
> - address selection when mapping to IPv6 ranges
> 
> - handling of IPv6 fragments
> 
> - NAT handling of ICMPv6 error messages

Thanks, I was keeping the previous patchset in one branch:

http://1984.lsi.us.es/git/nf-next/log/?h=nf-nat4

You can also find forward ports of netlink-mmap (from Florian Westpal)
and one for nftables from myself in that tree.

> Besides implementing IPv6 NAT, there are no known bugs left. Userspace
> patches will follow shortly.

We have this branch for iptables IPv6 NAT:

http://git.netfilter.org/cgi-bin/gitweb.cgi?p=iptables.git;a=shortlog;h=refs/heads/nf-nat

Let me know if you're OK with these.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 19/19] netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target
  2012-08-09 21:55   ` Jan Engelhardt
@ 2012-08-09 22:25     ` Patrick McHardy
  0 siblings, 0 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-09 22:25 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev

On Thu, 9 Aug 2012, Jan Engelhardt wrote:

>
> On Thursday 2012-08-09 22:09, kaber@trash.net wrote:
>> +config IP6_NF_TARGET_NPT
>> +	tristate "NPT (Network Prefix translation) target support"
>> +	depends on NETFILTER_ADVANCED
>> +	help
>> +	  This option adda `SNPT' and `DNPT' target, which perform stateless
>> +	  IPv6-to-IPv6 Network Prefix Translation (RFC 6296).
>
> Fixes/suggestion in the help text (near "adda" and subsequent).
>
> config IP6_NF_TARGET_NPT
> 	tristate "NPT (Network Prefix translation) target support"
> 	depends on NETFILTER_ADVANCED
> 	---help---
> 	This option adds the "SNPT" and "DNPT" targets, which perform stateless
> 	IPv6-to-IPv6 Network Prefix Translation per RFC 6296.

Thanks, changes.

>> +/*
>> + * Copyright (c) 2011, 2012 Patrick McHardy <kaber@trash.net>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>
> GNU sometimes has a strange way of expressing their (C) lines,
> listing all years separately, as if "1989-1991,1994-1999,2001-2005" was not
> sufficient. In your case, would "2011-2012" work?
>
> Any objection to adding a "(or later)" clause to the set?

Yeah this is a stupid habit that doesn't matter anyway, in all
jurisdictions I care about copyright duration is based on the
lifetime of the author. I'll try to find out whether just adding
a copyright statement without any specific year would be fine
for my purposes.

>> +static int ip6t_npt_checkentry(const struct xt_tgchk_param *par)
>> +{
>> +	struct ip6t_npt_tginfo *npt = par->targinfo;
>> +	__sum16 src_sum = 0, dst_sum = 0;
>> +	unsigned int i;
>> +
>> +	if (npt->src_pfx_len > 64 || npt->dst_pfx_len > 64)
>> +		return -EINVAL;
>
> While the RFC probably only specifies masks up to /64,
> the general algorithm is sustainable up to at least /96.
>
> Extending the RFC's section 3.6 ("/48 Prefix Mapping") example,
> fd01:0203:0405:0001:0000:0000:0000:1234/112 (specification per `ip
> addr`) would become 2001:0db8:0001:0000:0000:0000:d550:1234.
>
> Not everybody runs exclusively on EUI64-SLAAC / PRIVACY addresses :)

I'll think about it. If you want to send a patch, please go ahead ;)

>> +static struct xt_target ip6t_npt_target_reg[] __read_mostly = {
>> +	{
>> +		.name		= "SNPT",
>
> IETF did quite a job again... NPT as an acronym is quite
> close to NAPT - too close.
> Since it's essentially NETMAP, having something in that ballpark
> may seem more fitting. NPMAP? (Translation is mapping -
> henceforh "Network Prefix Mapping")

So we'd have SNPMAP and DNPMAP. I like NPMAP, but SNPMAP and DNPMAP
don't sound that much better, so I think I prefer keeping the currently
used abrevation of Network Prefix Translation. I don't care much though.

> I would hint towards choosing a different name; the P in NPT
> may be (mis)understood as port (due to the common "NAPT")
> keyword.

Well, I think people using computers are used to tons of similar
sounding acronyms and abrevations. They'll manage ;)

>> +		.target		= ip6t_snpt_tg,
>> +		.targetsize	= sizeof(struct ip6t_npt_tginfo),
>> +		.checkentry	= ip6t_npt_checkentry,
>> +		.family		= NFPROTO_IPV6,
>> +		.hooks		= (1 << NF_INET_LOCAL_IN) |
>> +				  (1 << NF_INET_POST_ROUTING),
>> +		.me		= THIS_MODULE,
>
> Should perhaps a  .table = "mangle"  be added?

I've been proposing to lift the mangle restriction on all targets
for years. Basically the only special thing about the mangle table
is rerouting on mark changes. Everything else works just fine in
other tables (with NAT being somewhat special as well), even marking
packets if you don't need rerouting, its even more performant in that
case since no extra routing lookups need to be done. So I don't see
a reason to impose artificial limitations.

> Are any tricks on the userspace side needed to use SNPT/SNPMAP?
> Since I spot no code telling conntrack about the address mingling,
> one would have to use -j CT --notrack or the rawpost table, like
> it's done for RAWSNAT in xtables-addons, would he not?

Yes, if connection tracking is used (which is not necessary of course)
its best to exclude the translated packets from tracking. There's
actually a lot of potential for improvement here. One thing is telling
connection tracking, if its used, about the translations, so it can
properly track packets. Another thing is, with stateless translation
you still need ALGs to take care of addresses in layer 7 protocols.
I've been thinking about how to make the existing NAT helpers work
without conntrack and NAT, but its not easy and requires a lot of
restructuring of the existing code. Its something we can still add
later, its something that just affects the kernel.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/19] netfilter: IPv6 NAT
  2012-08-09 22:00 ` Pablo Neira Ayuso
@ 2012-08-09 22:30   ` Patrick McHardy
  0 siblings, 0 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-09 22:30 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, netdev

On Fri, 10 Aug 2012, Pablo Neira Ayuso wrote:

> Hi Patrick,
>
> On Thu, Aug 09, 2012 at 10:08:44PM +0200, kaber@trash.net wrote:
>> The following patches contain an updated version of IPv6 NAT against
>> Linus' current tree.
>>
>> The series is organized as follows:
>>
>> - Patches 01-03 contain bugfixes for SIP helper bugs/regressions
>>   present in the current kernel
>
> Thanks, I'll pass these to David.
>
> I have also two more to fixes to oopses regarding SIP. I'm expecting
> one user to finally confirm that their issues are fixed.

If you want me to have a look as well, just send me an URL or the patches.

>> - Patches 04-06 improve conntrack fragmentation handling, the IPv6
>>   parts are also a precondition for IPv6 NAT
>>
>> - Patches 07 and 08 prepare the current NAT code for conversion to
>>   an address family independant core, but contain no functional
>>   changes
>>
>> - Patch 09 adds the address family independant NAT core and converts
>>   the existing IPv4-only NAT code to an AF-specific module
>>
>> - Patches 10 and 11 add some infrastructure for IPv6 NAT
>>
>> - Patch 12 adds IPv6 NAT support
>>
>> - Patches 13-15 add IPv6 specific NAT targets
>>
>> - Patches 16-19 add some IPv6-capable ports of existing NAT helpers
>>
>> - Patch 19 is independant of the IPv6 NAT code and adds support for
>>   stateless IPv6 prefix translation, just to relieve my conscience ;)
>>
>>
>> Since the last posting numerous bugs have been fixed, I don't remember
>> all of them, the more important ones include:
>>
>> - automatic NAT module loading in ctnetlink
>>
>> - address selection when mapping to IPv6 ranges
>>
>> - handling of IPv6 fragments
>>
>> - NAT handling of ICMPv6 error messages
>
> Thanks, I was keeping the previous patchset in one branch:
>
> http://1984.lsi.us.es/git/nf-next/log/?h=nf-nat4
>
> You can also find forward ports of netlink-mmap (from Florian Westpal)
> and one for nftables from myself in that tree.

Thanks, Florian just pointed me to these trees. Will have a look at
the changes compared to my tree. I'm actually intending to finish up
the mmaped netlink work once I'm done with IPv6 NAT.

>> Besides implementing IPv6 NAT, there are no known bugs left. Userspace
>> patches will follow shortly.
>
> We have this branch for iptables IPv6 NAT:
>
> http://git.netfilter.org/cgi-bin/gitweb.cgi?p=iptables.git;a=shortlog;h=refs/heads/nf-nat
>
> Let me know if you're OK with these.

For now I'll just accumulate feedback and will incorporate it into my 
tree. I'll also diff them against your tree and will then send the
final result once all feedback/fixes are included.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 01/19] netfilter: nf_ct_sip: fix helper name
  2012-08-09 20:08 ` [PATCH 01/19] netfilter: nf_ct_sip: fix helper name kaber
@ 2012-08-14  0:00   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 68+ messages in thread
From: Pablo Neira Ayuso @ 2012-08-14  0:00 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev

On Thu, Aug 09, 2012 at 10:08:45PM +0200, kaber@trash.net wrote:
> From: Patrick McHardy <kaber@trash.net>
> 
> Commit 3a8fc53a (netfilter: nf_ct_helper: allocate 16 bytes for the helper
> and policy names) introduced a bug in the SIP helper, the helper name is
> sprinted to the sip_names array instead of instead of into the helper
> structure. This breaks the helper match and the /proc/net/nf_conntrack_expect
> output.

Applied, thanks Patrick.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 02/19] netfilter: nf_ct_sip: fix IPv6 address parsing
  2012-08-09 20:08 ` [PATCH 02/19] netfilter: nf_ct_sip: fix IPv6 address parsing kaber
@ 2012-08-14  0:19   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 68+ messages in thread
From: Pablo Neira Ayuso @ 2012-08-14  0:19 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev

On Thu, Aug 09, 2012 at 10:08:46PM +0200, kaber@trash.net wrote:
> From: Patrick McHardy <kaber@trash.net>
> 
> Within SIP messages IPv6 addresses are enclosed in square brackets in most
> cases, with the exception of the "received=" header parameter. Currently
> the helper fails to parse enclosed addresses.
> 
> This patch:
> 
> - changes the SIP address parsing function to enforce square brackets
>   when required, and accept them when not required but present, as
>   recommended by RFC 5118.
> 
> - adds a new SDP address parsing function that never accepts square
>   brackets since SDP doesn't use them.
> 
> With these changes, the SIP helper correctly parses all test messages
> from RFC 5118 (Session Initiation Protocol (SIP) Torture Test Messages
> for Internet Protocol Version 6 (IPv6)).

Applied, thanks Patrick.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 03/19] netfilter: nf_nat_sip: fix via header translation with multiple parameters
  2012-08-09 20:08 ` [PATCH 03/19] netfilter: nf_nat_sip: fix via header translation with multiple parameters kaber
@ 2012-08-14  0:28   ` Pablo Neira Ayuso
  2012-08-14 12:23     ` Patrick McHardy
  0 siblings, 1 reply; 68+ messages in thread
From: Pablo Neira Ayuso @ 2012-08-14  0:28 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev

On Thu, Aug 09, 2012 at 10:08:47PM +0200, kaber@trash.net wrote:
> From: Patrick McHardy <kaber@trash.net>
> 
> Via-headers are parsed beginning at the first character after the Via-address.
> When the address is translated first and its length decreases, the offset to
> start parsing at is incorrect and header parameters might be missed.
> 
> Update the offset after translating the Via-address to fix this.

Applied, thanks Patrick.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 03/19] netfilter: nf_nat_sip: fix via header translation with multiple parameters
  2012-08-14  0:28   ` Pablo Neira Ayuso
@ 2012-08-14 12:23     ` Patrick McHardy
  0 siblings, 0 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-14 12:23 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, netdev

On Tue, 14 Aug 2012, Pablo Neira Ayuso wrote:

> On Thu, Aug 09, 2012 at 10:08:47PM +0200, kaber@trash.net wrote:
>> From: Patrick McHardy <kaber@trash.net>
>>
>> Via-headers are parsed beginning at the first character after the Via-address.
>> When the address is translated first and its length decreases, the offset to
>> start parsing at is incorrect and header parameters might be missed.
>>
>> Update the offset after translating the Via-address to fix this.
>
> Applied, thanks Patrick.

Thanks Pablo. I'll submit the final IPv6-NAT patches once these three
patches have made it into net-next/nf-next.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-09 20:08 ` [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling kaber
@ 2012-08-17  8:06   ` Jesper Dangaard Brouer
  2012-08-18 12:26     ` Patrick McHardy
  2012-08-17 13:36   ` [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling Pablo Neira Ayuso
  1 sibling, 1 reply; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-17  8:06 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev

On Thu, 2012-08-09 at 22:08 +0200, kaber@trash.net wrote:
> From: Patrick McHardy <kaber@trash.net>
> 
> The IPv6 conntrack fragmentation currently has a couple of shortcomings.
> Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
> defragmented packet is then passed to conntrack, the resulting conntrack
> information is attached to each original fragment and the fragments then
> continue their way through the stack.
> 
> Helper invocation occurs in the POSTROUTING hook, at which point only
> the original fragments are available. The result of this is that
> fragmented packets are never passed to helpers.
> 
> This patch improves the situation in the following way:
> 
> - If a reassembled packet belongs to a connection that has a helper
>   assigned, the reassembled packet is passed through the stack instead
>   of the original fragments.

I'm working on IPv6 fragment handling for IPVS, and are taking advantage
of the "replay" by nf_ct_frag6_output() at hook prio -399
(NF_IP6_PRI_CONNTRACK_DEFRAG + 1).
By making a hook at NF_INET_PRE_ROUTING at prio -99 (NF_IP6_PRI_NAT_DST
+ 1).

I can see that the code path can be changed (with this patch), if a
helper is assigned.  Then the "replay" starts at prio -199
(NF_IP6_PRI_CONNTRACK + 1), I guess I'm safe as I run at -99.

I have tested that your patchset works, with my ipvs patches, but would
like the trigger the changed code path, to make sure.

Could you provide an iptables command/rule, that trigger this code path?


[cut]

> @@ -199,9 +200,13 @@ static unsigned int ipv6_confirm(unsigned int hooknum,
>  static unsigned int __ipv6_conntrack_in(struct net *net,
>  					unsigned int hooknum,
>  					struct sk_buff *skb,
> +					const struct net_device *in,
> +					const struct net_device *out,
>  					int (*okfn)(struct sk_buff *))
>  {
>  	struct sk_buff *reasm = skb->nfct_reasm;
> +	struct nf_conn *ct;
> +	enum ip_conntrack_info ctinfo;
>  
>  	/* This packet is fragmented and has reassembled packet. */
>  	if (reasm) {
> @@ -213,6 +218,20 @@ static unsigned int __ipv6_conntrack_in(struct net *net,
>  			if (ret != NF_ACCEPT)
>  				return ret;
>  		}
> +
> +		/* Conntrack helpers need the entire reassembled packet in the
> +		 * POST_ROUTING hook.
> +		 */
> +		ct = nf_ct_get(reasm, &ctinfo);
> +		if (ct != NULL && test_bit(IPS_HELPER_BIT, &ct->status)) {
> +			nf_conntrack_get_reasm(skb);
> +			NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm,
> +				       (struct net_device *)in,
> +				       (struct net_device *)out,
> +				       okfn, NF_IP6_PRI_CONNTRACK + 1);

Hook prio change to NF_IP6_PRI_CONNTRACK + 1


> +			return NF_DROP_ERR(-ECANCELED);
> +		}

[cut]

> @@ -592,6 +599,7 @@ void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
>  			int (*okfn)(struct sk_buff *))
>  {
>  	struct sk_buff *s, *s2;
> +	unsigned int ret = 0;
>  
>  	for (s = NFCT_FRAG6_CB(skb)->orig; s;) {
>  		nf_conntrack_put_reasm(s->nfct_reasm);
> @@ -601,8 +609,13 @@ void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
>  		s2 = s->next;
>  		s->next = NULL;
>  
> -		NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, s, in, out, okfn,
> -			       NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
> +		if (ret != -ECANCELED)
> +			ret = NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, s,
> +					     in, out, okfn,
> +					     NF_IP6_PRI_CONNTRACK_DEFRAG + 1);

Old hook prio

> +		else
> +			kfree_skb(s);
> +
>  		s = s2;
>  	}
>  	nf_conntrack_put_reasm(skb);



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 13/19] netfilter: ip6tables: add MASQUERADE target
  2012-08-09 20:08 ` [PATCH 13/19] netfilter: ip6tables: add MASQUERADE target kaber
@ 2012-08-17 13:11   ` Pablo Neira Ayuso
  2012-08-18 12:31     ` Patrick McHardy
  0 siblings, 1 reply; 68+ messages in thread
From: Pablo Neira Ayuso @ 2012-08-17 13:11 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev

Hi Patrick,

On Thu, Aug 09, 2012 at 10:08:57PM +0200, kaber@trash.net wrote:
> From: Patrick McHardy <kaber@trash.net>
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>
> ---
>  include/net/addrconf.h               |    2 +-
>  net/ipv4/netfilter/ipt_MASQUERADE.c  |    3 +-
>  net/ipv6/addrconf.c                  |    2 +-
>  net/ipv6/netfilter/Kconfig           |   12 +++
>  net/ipv6/netfilter/Makefile          |    1 +
>  net/ipv6/netfilter/ip6t_MASQUERADE.c |  135 ++++++++++++++++++++++++++++++++++
>  6 files changed, 152 insertions(+), 3 deletions(-)
>  create mode 100644 net/ipv6/netfilter/ip6t_MASQUERADE.c

Please, add this chunk to this patch:

diff --git a/include/net/netfilter/nf_nat.h
b/include/net/netfilter/nf_nat.h
index 1752f133..bd8eea7 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -43,7 +43,9 @@ struct nf_conn_nat {
        struct nf_conn *ct;
        union nf_conntrack_nat_help help;
 #if defined(CONFIG_IP_NF_TARGET_MASQUERADE) || \
-    defined(CONFIG_IP_NF_TARGET_MASQUERADE_MODULE)
+    defined(CONFIG_IP_NF_TARGET_MASQUERADE_MODULE) || \
+    defined(CONFIG_IP6_NF_TARGET_MASQUERADE) || \
+    defined(CONFIG_IP6_NF_TARGET_MASQUERADE_MODULE)
        int masq_index;
 #endif
 };

Otherwise, compilation breaks with:

* IPv4 NAT is disabled
* IPv6 NAT enabled.

And yes, that pile of ifdefs is really ugly, I wonder if they are
worth for saving 4 bytes. I think most vendors usually include
MASQUERADE support if NAT is enabled.

It seems we have the tradition of keeping several similar compile time
options in Netfilter to optimize memory in several situations (at the
cost of polluting the code with ifdefs). Probably we can think of
getting rid of them.

> diff --git a/include/net/addrconf.h b/include/net/addrconf.h
> index 089a09d..9e63e76 100644
> --- a/include/net/addrconf.h
> +++ b/include/net/addrconf.h
> @@ -78,7 +78,7 @@ extern struct inet6_ifaddr      *ipv6_get_ifaddr(struct net *net,
>  						 int strict);
>  
>  extern int			ipv6_dev_get_saddr(struct net *net,
> -					       struct net_device *dev,
> +					       const struct net_device *dev,
>  					       const struct in6_addr *daddr,
>  					       unsigned int srcprefs,
>  					       struct in6_addr *saddr);
> diff --git a/net/ipv4/netfilter/ipt_MASQUERADE.c b/net/ipv4/netfilter/ipt_MASQUERADE.c
> index 1c3aa28..5d5d4d1 100644
> --- a/net/ipv4/netfilter/ipt_MASQUERADE.c
> +++ b/net/ipv4/netfilter/ipt_MASQUERADE.c
> @@ -99,7 +99,8 @@ device_cmp(struct nf_conn *i, void *ifindex)
>  
>  	if (!nat)
>  		return 0;
> -
> +	if (nf_ct_l3num(i) != NFPROTO_IPV4)
> +		return 0;
>  	return nat->masq_index == (int)(long)ifindex;
>  }
>  
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 7918181..6536404 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1095,7 +1095,7 @@ out:
>  	return ret;
>  }
>  
> -int ipv6_dev_get_saddr(struct net *net, struct net_device *dst_dev,
> +int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
>  		       const struct in6_addr *daddr, unsigned int prefs,
>  		       struct in6_addr *saddr)
>  {
> diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
> index b27e0ad..54a5032 100644
> --- a/net/ipv6/netfilter/Kconfig
> +++ b/net/ipv6/netfilter/Kconfig
> @@ -144,6 +144,18 @@ config IP6_NF_TARGET_HL
>  	(e.g. when running oldconfig). It selects
>  	CONFIG_NETFILTER_XT_TARGET_HL.
>  
> +config IP6_NF_TARGET_MASQUERADE
> +	tristate "MASQUERADE target support"
> +	depends on NF_NAT_IPV6
> +	help
> +	  Masquerading is a special case of NAT: all outgoing connections are
> +	  changed to seem to come from a particular interface's address, and
> +	  if the interface goes down, those connections are lost.  This is
> +	  only useful for dialup accounts with dynamic IP address (ie. your IP
> +	  address will be different on next dialup).
> +
> +	  To compile it as a module, choose M here.  If unsure, say N.
> +
>  config IP6_NF_FILTER
>  	tristate "Packet filtering"
>  	default m if NETFILTER_ADVANCED=n
> diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
> index 7677937..068bad1 100644
> --- a/net/ipv6/netfilter/Makefile
> +++ b/net/ipv6/netfilter/Makefile
> @@ -34,4 +34,5 @@ obj-$(CONFIG_IP6_NF_MATCH_RPFILTER) += ip6t_rpfilter.o
>  obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o
>  
>  # targets
> +obj-$(CONFIG_IP6_NF_TARGET_MASQUERADE) += ip6t_MASQUERADE.o
>  obj-$(CONFIG_IP6_NF_TARGET_REJECT) += ip6t_REJECT.o
> diff --git a/net/ipv6/netfilter/ip6t_MASQUERADE.c b/net/ipv6/netfilter/ip6t_MASQUERADE.c
> new file mode 100644
> index 0000000..60e9053
> --- /dev/null
> +++ b/net/ipv6/netfilter/ip6t_MASQUERADE.c
> @@ -0,0 +1,135 @@
> +/*
> + * Copyright (c) 2011 Patrick McHardy <kaber@trash.net>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Based on Rusty Russell's IPv6 MASQUERADE target. Development of IPv6
> + * NAT funded by Astaro.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/netdevice.h>
> +#include <linux/ipv6.h>
> +#include <linux/netfilter.h>
> +#include <linux/netfilter_ipv6.h>
> +#include <linux/netfilter/x_tables.h>
> +#include <net/netfilter/nf_nat.h>
> +#include <net/addrconf.h>
> +#include <net/ipv6.h>
> +
> +static unsigned int
> +masquerade_tg6(struct sk_buff *skb, const struct xt_action_param *par)
> +{
> +	const struct nf_nat_range *range = par->targinfo;
> +	enum ip_conntrack_info ctinfo;
> +	struct in6_addr src;
> +	struct nf_conn *ct;
> +	struct nf_nat_range newrange;
> +
> +	ct = nf_ct_get(skb, &ctinfo);
> +	NF_CT_ASSERT(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED ||
> +			    ctinfo == IP_CT_RELATED_REPLY));
> +
> +	if (ipv6_dev_get_saddr(dev_net(par->out), par->out,
> +			       &ipv6_hdr(skb)->daddr, 0, &src) < 0)
> +		return NF_DROP;
> +
> +	nfct_nat(ct)->masq_index = par->out->ifindex;
> +
> +	newrange.flags		= range->flags | NF_NAT_RANGE_MAP_IPS;
> +	newrange.min_addr.in6	= src;
> +	newrange.max_addr.in6	= src;
> +	newrange.min_proto	= range->min_proto;
> +	newrange.max_proto	= range->max_proto;
> +
> +	return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_SRC);
> +}
> +
> +static int masquerade_tg6_checkentry(const struct xt_tgchk_param *par)
> +{
> +	const struct nf_nat_range *range = par->targinfo;
> +
> +	if (range->flags & NF_NAT_RANGE_MAP_IPS)
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +static int device_cmp(struct nf_conn *ct, void *ifindex)
> +{
> +	const struct nf_conn_nat *nat = nfct_nat(ct);
> +
> +	if (!nat)
> +		return 0;
> +	if (nf_ct_l3num(ct) != NFPROTO_IPV6)
> +		return 0;
> +	return nat->masq_index == (int)(long)ifindex;
> +}
> +
> +static int masq_device_event(struct notifier_block *this,
> +			     unsigned long event, void *ptr)
> +{
> +	const struct net_device *dev = ptr;
> +	struct net *net = dev_net(dev);
> +
> +	if (event == NETDEV_DOWN)
> +		nf_ct_iterate_cleanup(net, device_cmp,
> +				      (void *)(long)dev->ifindex);
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block masq_dev_notifier = {
> +	.notifier_call	= masq_device_event,
> +};
> +
> +static int masq_inet_event(struct notifier_block *this,
> +			   unsigned long event, void *ptr)
> +{
> +	struct inet6_ifaddr *ifa = ptr;
> +
> +	return masq_device_event(this, event, ifa->idev->dev);
> +}
> +
> +static struct notifier_block masq_inet_notifier = {
> +	.notifier_call	= masq_inet_event,
> +};
> +
> +static struct xt_target masquerade_tg6_reg __read_mostly = {
> +	.name		= "MASQUERADE",
> +	.family		= NFPROTO_IPV6,
> +	.checkentry	= masquerade_tg6_checkentry,
> +	.target		= masquerade_tg6,
> +	.targetsize	= sizeof(struct nf_nat_range),
> +	.table		= "nat",
> +	.hooks		= 1 << NF_INET_POST_ROUTING,
> +	.me		= THIS_MODULE,
> +};
> +
> +static int __init masquerade_tg6_init(void)
> +{
> +	int err;
> +
> +	err = xt_register_target(&masquerade_tg6_reg);
> +	if (err == 0) {
> +		register_netdevice_notifier(&masq_dev_notifier);
> +		register_inet6addr_notifier(&masq_inet_notifier);
> +	}
> +
> +	return err;
> +}
> +static void __exit masquerade_tg6_exit(void)
> +{
> +	unregister_inet6addr_notifier(&masq_inet_notifier);
> +	unregister_netdevice_notifier(&masq_dev_notifier);
> +	xt_unregister_target(&masquerade_tg6_reg);
> +}
> +
> +module_init(masquerade_tg6_init);
> +module_exit(masquerade_tg6_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
> +MODULE_DESCRIPTION("Xtables: automatic address SNAT");
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-09 20:08 ` [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling kaber
  2012-08-17  8:06   ` Jesper Dangaard Brouer
@ 2012-08-17 13:36   ` Pablo Neira Ayuso
  2012-08-18 12:43     ` Patrick McHardy
  1 sibling, 1 reply; 68+ messages in thread
From: Pablo Neira Ayuso @ 2012-08-17 13:36 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev

On Thu, Aug 09, 2012 at 10:08:49PM +0200, kaber@trash.net wrote:
> From: Patrick McHardy <kaber@trash.net>
> 
> The IPv6 conntrack fragmentation currently has a couple of shortcomings.
> Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
> defragmented packet is then passed to conntrack, the resulting conntrack
> information is attached to each original fragment and the fragments then
> continue their way through the stack.
> 
> Helper invocation occurs in the POSTROUTING hook, at which point only
> the original fragments are available. The result of this is that
> fragmented packets are never passed to helpers.
> 
> This patch improves the situation in the following way:
> 
> - If a reassembled packet belongs to a connection that has a helper
>   assigned, the reassembled packet is passed through the stack instead
>   of the original fragments.
> 
> - During defragmentation, the largest received fragment size is stored.
>   On output, the packet is refragmented if required. If the largest
>   received fragment size exceeds the outgoing MTU, a "packet too big"
>   message is generated, thus behaving as if the original fragments
>   were passed through the stack from an outside point of view.
> 
> - The ipv6_helper() hook function can't receive fragments anymore for
>   connections using a helper, so it is switched to use ipv6_skip_exthdr()
>   instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
>   reassembled packets are passed to connection tracking helpers.
> 
> The result of this is that we can properly track fragmented packets, but
> still generate ICMPv6 Packet too big messages if we would have before.
> 
> This patch is also required as a precondition for IPv6 NAT, where NAT
> helpers might enlarge packets up to a point that they require
> fragmentation. In that case we can't generate Packet too big messages
> since the proper MTU can't be calculated in all cases (f.i. when
> changing textual representation of a variable amount of addresses),
> so the packet is transparently fragmented iff the original packet or
> fragments would have fit the outgoing MTU.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>
> ---
>  include/linux/ipv6.h                           |    1 +
>  net/ipv6/ip6_output.c                          |    7 +++-
>  net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |   37 ++++++++++++++++++------
>  net/ipv6/netfilter/nf_conntrack_reasm.c        |   19 ++++++++++--
>  4 files changed, 50 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 879db26..0b94e91 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -256,6 +256,7 @@ struct inet6_skb_parm {
>  #if defined(CONFIG_IPV6_MIP6) || defined(CONFIG_IPV6_MIP6_MODULE)
>  	__u16			dsthao;
>  #endif
> +	__u16			frag_max_size;
>  
>  #define IP6SKB_XFRM_TRANSFORMED	1
>  #define IP6SKB_FORWARDED	2
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 5b2d63e..a4f6263 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -493,7 +493,8 @@ int ip6_forward(struct sk_buff *skb)
>  	if (mtu < IPV6_MIN_MTU)
>  		mtu = IPV6_MIN_MTU;
>  
> -	if (skb->len > mtu && !skb_is_gso(skb)) {
> +	if ((!skb->local_df && skb->len > mtu && !skb_is_gso(skb)) ||
> +	    (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu)) {
>  		/* Again, force OUTPUT device used as source address */
>  		skb->dev = dst->dev;
>  		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
> @@ -636,7 +637,9 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
>  	/* We must not fragment if the socket is set to force MTU discovery
>  	 * or if the skb it not generated by a local socket.
>  	 */
> -	if (unlikely(!skb->local_df && skb->len > mtu)) {
> +	if (unlikely(!skb->local_df && skb->len > mtu) ||
> +		     (IP6CB(skb)->frag_max_size &&
> +		      IP6CB(skb)->frag_max_size > mtu)) {
>  		if (skb->sk && dst_allfrag(skb_dst(skb)))
>  			sk_nocaps_add(skb->sk, NETIF_F_GSO_MASK);
>  
> diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> index 4794f96..560d823 100644
> --- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> +++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> @@ -153,10 +153,10 @@ static unsigned int ipv6_helper(unsigned int hooknum,
>  	const struct nf_conn_help *help;
>  	const struct nf_conntrack_helper *helper;
>  	enum ip_conntrack_info ctinfo;
> -	unsigned int ret, protoff;
> -	unsigned int extoff = (u8 *)(ipv6_hdr(skb) + 1) - skb->data;
> -	unsigned char pnum = ipv6_hdr(skb)->nexthdr;
> -
> +	unsigned int ret;
> +	__be16 frag_off;
> +	int protoff;
> +	u8 nexthdr;
>  
>  	/* This is where we call the helper: as the packet goes out. */
>  	ct = nf_ct_get(skb, &ctinfo);
> @@ -171,9 +171,10 @@ static unsigned int ipv6_helper(unsigned int hooknum,
>  	if (!helper)
>  		return NF_ACCEPT;
>  
> -	protoff = nf_ct_ipv6_skip_exthdr(skb, extoff, &pnum,
> -					 skb->len - extoff);
> -	if (protoff > skb->len || pnum == NEXTHDR_FRAGMENT) {
> +	nexthdr = ipv6_hdr(skb)->nexthdr;
> +	protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr,
> +				   &frag_off);
> +	if (protoff < 0 || (frag_off & ntohs(~0x7)) != 0) {
>  		pr_debug("proto header not found\n");
>  		return NF_ACCEPT;
>  	}
> @@ -199,9 +200,13 @@ static unsigned int ipv6_confirm(unsigned int hooknum,
>  static unsigned int __ipv6_conntrack_in(struct net *net,
>  					unsigned int hooknum,
>  					struct sk_buff *skb,
> +					const struct net_device *in,
> +					const struct net_device *out,
>  					int (*okfn)(struct sk_buff *))
>  {
>  	struct sk_buff *reasm = skb->nfct_reasm;
> +	struct nf_conn *ct;
> +	enum ip_conntrack_info ctinfo;
>  
>  	/* This packet is fragmented and has reassembled packet. */
>  	if (reasm) {
> @@ -213,6 +218,20 @@ static unsigned int __ipv6_conntrack_in(struct net *net,
>  			if (ret != NF_ACCEPT)
>  				return ret;
>  		}
> +
> +		/* Conntrack helpers need the entire reassembled packet in the
> +		 * POST_ROUTING hook.
> +		 */
> +		ct = nf_ct_get(reasm, &ctinfo);
> +		if (ct != NULL && test_bit(IPS_HELPER_BIT, &ct->status)) {

Two things regarding the line above:

- I think this also need to check for !nf_ct_is_untracked(ct)

- IPS_HELPER_BIT is only set if the CT target is used to attach
  helpers. I know, this behaviour may seem confusing, but I didn't
  find any better way to avoid that NAT removes the helper
  explicitly attached via CT.

  So basically now that status bit means: "this helper has been attached
  via CT".

  Setting it inconditionally in __nf_ct_try_assign_helper would break
  the magic auto-assign helper code.

  On the other hand, the automatic helper assignment is scheduled to
  be removed (well, it would still take at least one 1.5/2 years
  before we do so). At that time, we'll be able to say that all
  conntrack with IPS_HELPER really has one helper. But now I think that
  you'll have to use for nfct_help instead to check if that ct has a
  helper.

> +			nf_conntrack_get_reasm(skb);
> +			NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm,
> +				       (struct net_device *)in,
> +				       (struct net_device *)out,
> +				       okfn, NF_IP6_PRI_CONNTRACK + 1);
> +			return NF_DROP_ERR(-ECANCELED);
> +		}
> +
>  		nf_conntrack_get(reasm->nfct);
>  		skb->nfct = reasm->nfct;
>  		skb->nfctinfo = reasm->nfctinfo;
> @@ -228,7 +247,7 @@ static unsigned int ipv6_conntrack_in(unsigned int hooknum,
>  				      const struct net_device *out,
>  				      int (*okfn)(struct sk_buff *))
>  {
> -	return __ipv6_conntrack_in(dev_net(in), hooknum, skb, okfn);
> +	return __ipv6_conntrack_in(dev_net(in), hooknum, skb, in, out, okfn);
>  }
>  
>  static unsigned int ipv6_conntrack_local(unsigned int hooknum,
> @@ -242,7 +261,7 @@ static unsigned int ipv6_conntrack_local(unsigned int hooknum,
>  		net_notice_ratelimited("ipv6_conntrack_local: packet too short\n");
>  		return NF_ACCEPT;
>  	}
> -	return __ipv6_conntrack_in(dev_net(out), hooknum, skb, okfn);
> +	return __ipv6_conntrack_in(dev_net(out), hooknum, skb, in, out, okfn);
>  }
>  
>  static struct nf_hook_ops ipv6_conntrack_ops[] __read_mostly = {
> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
> index c9c78c2..f94fb3a 100644
> --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
> +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
> @@ -190,6 +190,7 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
>  			     const struct frag_hdr *fhdr, int nhoff)
>  {
>  	struct sk_buff *prev, *next;
> +	unsigned int payload_len;
>  	int offset, end;
>  
>  	if (fq->q.last_in & INET_FRAG_COMPLETE) {
> @@ -197,8 +198,10 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
>  		goto err;
>  	}
>  
> +	payload_len = ntohs(ipv6_hdr(skb)->payload_len);
> +
>  	offset = ntohs(fhdr->frag_off) & ~0x7;
> -	end = offset + (ntohs(ipv6_hdr(skb)->payload_len) -
> +	end = offset + (payload_len -
>  			((u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1)));
>  
>  	if ((unsigned int)end > IPV6_MAXPLEN) {
> @@ -307,6 +310,8 @@ found:
>  	skb->dev = NULL;
>  	fq->q.stamp = skb->tstamp;
>  	fq->q.meat += skb->len;
> +	if (payload_len > fq->q.max_size)
> +		fq->q.max_size = payload_len;
>  	atomic_add(skb->truesize, &nf_init_frags.mem);
>  
>  	/* The first fragment.
> @@ -412,10 +417,12 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
>  	}
>  	atomic_sub(head->truesize, &nf_init_frags.mem);
>  
> +	head->local_df = 1;
>  	head->next = NULL;
>  	head->dev = dev;
>  	head->tstamp = fq->q.stamp;
>  	ipv6_hdr(head)->payload_len = htons(payload_len);
> +	IP6CB(head)->frag_max_size = sizeof(struct ipv6hdr) + fq->q.max_size;
>  
>  	/* Yes, and fold redundant checksum back. 8) */
>  	if (head->ip_summed == CHECKSUM_COMPLETE)
> @@ -592,6 +599,7 @@ void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
>  			int (*okfn)(struct sk_buff *))
>  {
>  	struct sk_buff *s, *s2;
> +	unsigned int ret = 0;
>  
>  	for (s = NFCT_FRAG6_CB(skb)->orig; s;) {
>  		nf_conntrack_put_reasm(s->nfct_reasm);
> @@ -601,8 +609,13 @@ void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
>  		s2 = s->next;
>  		s->next = NULL;
>  
> -		NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, s, in, out, okfn,
> -			       NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
> +		if (ret != -ECANCELED)
> +			ret = NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, s,
> +					     in, out, okfn,
> +					     NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
> +		else
> +			kfree_skb(s);
> +
>  		s = s2;
>  	}
>  	nf_conntrack_put_reasm(skb);
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/19] netfilter: IPv6 NAT
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (20 preceding siblings ...)
  2012-08-09 22:00 ` Pablo Neira Ayuso
@ 2012-08-17 13:42 ` Pablo Neira Ayuso
  2012-08-18 12:46   ` Patrick McHardy
  2012-08-25  0:58 ` Andre Tomt
  22 siblings, 1 reply; 68+ messages in thread
From: Pablo Neira Ayuso @ 2012-08-17 13:42 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev

On Thu, Aug 09, 2012 at 10:08:44PM +0200, kaber@trash.net wrote:
[...]
> - Patches 16-19 add some IPv6-capable ports of existing NAT helpers

There was one for IRC also, did it get lost?

I also made TFTP, it was more or less trivial, you can find it here:

http://1984.lsi.us.es/git/nf-next/commit/?h=nf-nat4&id=ac2a06ab790386c8c8333dbf84584fc3f80fcdc5

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-17  8:06   ` Jesper Dangaard Brouer
@ 2012-08-18 12:26     ` Patrick McHardy
  2012-08-19 19:37       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 68+ messages in thread
From: Patrick McHardy @ 2012-08-18 12:26 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netfilter-devel, netdev

On Fri, 17 Aug 2012, Jesper Dangaard Brouer wrote:

> On Thu, 2012-08-09 at 22:08 +0200, kaber@trash.net wrote:
>> From: Patrick McHardy <kaber@trash.net>
>>
>> The IPv6 conntrack fragmentation currently has a couple of shortcomings.
>> Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
>> defragmented packet is then passed to conntrack, the resulting conntrack
>> information is attached to each original fragment and the fragments then
>> continue their way through the stack.
>>
>> Helper invocation occurs in the POSTROUTING hook, at which point only
>> the original fragments are available. The result of this is that
>> fragmented packets are never passed to helpers.
>>
>> This patch improves the situation in the following way:
>>
>> - If a reassembled packet belongs to a connection that has a helper
>>   assigned, the reassembled packet is passed through the stack instead
>>   of the original fragments.
>
> I'm working on IPv6 fragment handling for IPVS, and are taking advantage
> of the "replay" by nf_ct_frag6_output() at hook prio -399
> (NF_IP6_PRI_CONNTRACK_DEFRAG + 1).
> By making a hook at NF_INET_PRE_ROUTING at prio -99 (NF_IP6_PRI_NAT_DST
> + 1).
>
> I can see that the code path can be changed (with this patch), if a
> helper is assigned.  Then the "replay" starts at prio -199
> (NF_IP6_PRI_CONNTRACK + 1), I guess I'm safe as I run at -99.
>
> I have tested that your patchset works, with my ipvs patches, but would
> like the trigger the changed code path, to make sure.
>
> Could you provide an iptables command/rule, that trigger this code path?

The easiest way is a large ping with the NAT patches also applied,
in that case we also pass the first packet of a connection through
the stack reassembled.

>> @@ -199,9 +200,13 @@ static unsigned int ipv6_confirm(unsigned int hooknum,
>>  static unsigned int __ipv6_conntrack_in(struct net *net,
>>  					unsigned int hooknum,
>>  					struct sk_buff *skb,
>> +					const struct net_device *in,
>> +					const struct net_device *out,
>>  					int (*okfn)(struct sk_buff *))
>>  {
>>  	struct sk_buff *reasm = skb->nfct_reasm;
>> +	struct nf_conn *ct;
>> +	enum ip_conntrack_info ctinfo;
>>
>>  	/* This packet is fragmented and has reassembled packet. */
>>  	if (reasm) {
>> @@ -213,6 +218,20 @@ static unsigned int __ipv6_conntrack_in(struct net *net,
>>  			if (ret != NF_ACCEPT)
>>  				return ret;
>>  		}
>> +
>> +		/* Conntrack helpers need the entire reassembled packet in the
>> +		 * POST_ROUTING hook.
>> +		 */
>> +		ct = nf_ct_get(reasm, &ctinfo);
>> +		if (ct != NULL && test_bit(IPS_HELPER_BIT, &ct->status)) {
>> +			nf_conntrack_get_reasm(skb);
>> +			NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm,
>> +				       (struct net_device *)in,
>> +				       (struct net_device *)out,
>> +				       okfn, NF_IP6_PRI_CONNTRACK + 1);
>
> Hook prio change to NF_IP6_PRI_CONNTRACK + 1

I didn't get this part, you want to change to PRE_CONNTRACK + 1? What
about raw and SELinux?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 13/19] netfilter: ip6tables: add MASQUERADE target
  2012-08-17 13:11   ` Pablo Neira Ayuso
@ 2012-08-18 12:31     ` Patrick McHardy
  0 siblings, 0 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-18 12:31 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, netdev

On Fri, 17 Aug 2012, Pablo Neira Ayuso wrote:

> Hi Patrick,
>
> On Thu, Aug 09, 2012 at 10:08:57PM +0200, kaber@trash.net wrote:
>> From: Patrick McHardy <kaber@trash.net>
>>
> Please, add this chunk to this patch:
>
> diff --git a/include/net/netfilter/nf_nat.h
> b/include/net/netfilter/nf_nat.h
> index 1752f133..bd8eea7 100644
> --- a/include/net/netfilter/nf_nat.h
> +++ b/include/net/netfilter/nf_nat.h
> @@ -43,7 +43,9 @@ struct nf_conn_nat {
>        struct nf_conn *ct;
>        union nf_conntrack_nat_help help;
> #if defined(CONFIG_IP_NF_TARGET_MASQUERADE) || \
> -    defined(CONFIG_IP_NF_TARGET_MASQUERADE_MODULE)
> +    defined(CONFIG_IP_NF_TARGET_MASQUERADE_MODULE) || \
> +    defined(CONFIG_IP6_NF_TARGET_MASQUERADE) || \
> +    defined(CONFIG_IP6_NF_TARGET_MASQUERADE_MODULE)
>        int masq_index;
> #endif
> };
>
> Otherwise, compilation breaks with:
>
> * IPv4 NAT is disabled
> * IPv6 NAT enabled.

Fixed, thanks.

>
> And yes, that pile of ifdefs is really ugly, I wonder if they are
> worth for saving 4 bytes. I think most vendors usually include
> MASQUERADE support if NAT is enabled.
>
> It seems we have the tradition of keeping several similar compile time
> options in Netfilter to optimize memory in several situations (at the
> cost of polluting the code with ifdefs). Probably we can think of
> getting rid of them.

Well, vendors maybe, but what about embedded systems? I have some which
are *really* short on memory. They limit conntrack to very few entries
though, so it doesn't make that much of a difference.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-17 13:36   ` [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling Pablo Neira Ayuso
@ 2012-08-18 12:43     ` Patrick McHardy
  0 siblings, 0 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-18 12:43 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1283 bytes --]

On Fri, 17 Aug 2012, Pablo Neira Ayuso wrote:

> On Thu, Aug 09, 2012 at 10:08:49PM +0200, kaber@trash.net wrote:
>> +
>> +		/* Conntrack helpers need the entire reassembled packet in the
>> +		 * POST_ROUTING hook.
>> +		 */
>> +		ct = nf_ct_get(reasm, &ctinfo);
>> +		if (ct != NULL && test_bit(IPS_HELPER_BIT, &ct->status)) {
>
> Two things regarding the line above:
>
> - I think this also need to check for !nf_ct_is_untracked(ct)

Fixed, thanks.

>
> - IPS_HELPER_BIT is only set if the CT target is used to attach
>  helpers. I know, this behaviour may seem confusing, but I didn't
>  find any better way to avoid that NAT removes the helper
>  explicitly attached via CT.
>
>  So basically now that status bit means: "this helper has been attached
>  via CT".
>
>  Setting it inconditionally in __nf_ct_try_assign_helper would break
>  the magic auto-assign helper code.
>
>  On the other hand, the automatic helper assignment is scheduled to
>  be removed (well, it would still take at least one 1.5/2 years
>  before we do so). At that time, we'll be able to say that all
>  conntrack with IPS_HELPER really has one helper. But now I think that
>  you'll have to use for nfct_help instead to check if that ct has a
>  helper.

I see. Also fixed, thanks. New patch attached.

[-- Attachment #2: Type: TEXT/PLAIN, Size: 9573 bytes --]

commit f4266409e20dc0d2d22447aa7935c997f38914a1
Author: Patrick McHardy <kaber@trash.net>
Date:   Tue Aug 14 15:27:55 2012 +0200

    netfilter: nf_conntrack_ipv6: improve fragmentation handling
    
    The IPv6 conntrack fragmentation currently has a couple of shortcomings.
    Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
    defragmented packet is then passed to conntrack, the resulting conntrack
    information is attached to each original fragment and the fragments then
    continue their way through the stack.
    
    Helper invocation occurs in the POSTROUTING hook, at which point only
    the original fragments are available. The result of this is that
    fragmented packets are never passed to helpers.
    
    This patch improves the situation in the following way:
    
    - If a reassembled packet belongs to a connection that has a helper
      assigned, the reassembled packet is passed through the stack instead
      of the original fragments.
    
    - During defragmentation, the largest received fragment size is stored.
      On output, the packet is refragmented if required. If the largest
      received fragment size exceeds the outgoing MTU, a "packet too big"
      message is generated, thus behaving as if the original fragments
      were passed through the stack from an outside point of view.
    
    - The ipv6_helper() hook function can't receive fragments anymore for
      connections using a helper, so it is switched to use ipv6_skip_exthdr()
      instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
      reassembled packets are passed to connection tracking helpers.
    
    The result of this is that we can properly track fragmented packets, but
    still generate ICMPv6 Packet too big messages if we would have before.
    
    This patch is also required as a precondition for IPv6 NAT, where NAT
    helpers might enlarge packets up to a point that they require
    fragmentation. In that case we can't generate Packet too big messages
    since the proper MTU can't be calculated in all cases (f.i. when
    changing textual representation of a variable amount of addresses),
    so the packet is transparently fragmented iff the original packet or
    fragments would have fit the outgoing MTU.
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 879db26..0b94e91 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -256,6 +256,7 @@ struct inet6_skb_parm {
 #if defined(CONFIG_IPV6_MIP6) || defined(CONFIG_IPV6_MIP6_MODULE)
 	__u16			dsthao;
 #endif
+	__u16			frag_max_size;
 
 #define IP6SKB_XFRM_TRANSFORMED	1
 #define IP6SKB_FORWARDED	2
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 5b2d63e..a4f6263 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -493,7 +493,8 @@ int ip6_forward(struct sk_buff *skb)
 	if (mtu < IPV6_MIN_MTU)
 		mtu = IPV6_MIN_MTU;
 
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if ((!skb->local_df && skb->len > mtu && !skb_is_gso(skb)) ||
+	    (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu)) {
 		/* Again, force OUTPUT device used as source address */
 		skb->dev = dst->dev;
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
@@ -636,7 +637,9 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 	/* We must not fragment if the socket is set to force MTU discovery
 	 * or if the skb it not generated by a local socket.
 	 */
-	if (unlikely(!skb->local_df && skb->len > mtu)) {
+	if (unlikely(!skb->local_df && skb->len > mtu) ||
+		     (IP6CB(skb)->frag_max_size &&
+		      IP6CB(skb)->frag_max_size > mtu)) {
 		if (skb->sk && dst_allfrag(skb_dst(skb)))
 			sk_nocaps_add(skb->sk, NETIF_F_GSO_MASK);
 
diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index 4794f96..9f36e38 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -153,10 +153,10 @@ static unsigned int ipv6_helper(unsigned int hooknum,
 	const struct nf_conn_help *help;
 	const struct nf_conntrack_helper *helper;
 	enum ip_conntrack_info ctinfo;
-	unsigned int ret, protoff;
-	unsigned int extoff = (u8 *)(ipv6_hdr(skb) + 1) - skb->data;
-	unsigned char pnum = ipv6_hdr(skb)->nexthdr;
-
+	unsigned int ret;
+	__be16 frag_off;
+	int protoff;
+	u8 nexthdr;
 
 	/* This is where we call the helper: as the packet goes out. */
 	ct = nf_ct_get(skb, &ctinfo);
@@ -171,9 +171,10 @@ static unsigned int ipv6_helper(unsigned int hooknum,
 	if (!helper)
 		return NF_ACCEPT;
 
-	protoff = nf_ct_ipv6_skip_exthdr(skb, extoff, &pnum,
-					 skb->len - extoff);
-	if (protoff > skb->len || pnum == NEXTHDR_FRAGMENT) {
+	nexthdr = ipv6_hdr(skb)->nexthdr;
+	protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr,
+				   &frag_off);
+	if (protoff < 0 || (frag_off & ntohs(~0x7)) != 0) {
 		pr_debug("proto header not found\n");
 		return NF_ACCEPT;
 	}
@@ -199,9 +200,14 @@ static unsigned int ipv6_confirm(unsigned int hooknum,
 static unsigned int __ipv6_conntrack_in(struct net *net,
 					unsigned int hooknum,
 					struct sk_buff *skb,
+					const struct net_device *in,
+					const struct net_device *out,
 					int (*okfn)(struct sk_buff *))
 {
 	struct sk_buff *reasm = skb->nfct_reasm;
+	const struct nf_conn_help *help;
+	struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
 
 	/* This packet is fragmented and has reassembled packet. */
 	if (reasm) {
@@ -213,6 +219,23 @@ static unsigned int __ipv6_conntrack_in(struct net *net,
 			if (ret != NF_ACCEPT)
 				return ret;
 		}
+
+		/* Conntrack helpers need the entire reassembled packet in the
+		 * POST_ROUTING hook.
+		 */
+		ct = nf_ct_get(reasm, &ctinfo);
+		if (ct != NULL && !nf_ct_is_untracked(ct)) {
+			help = nfct_help(ct);
+			if (help && help->helper) {
+				nf_conntrack_get_reasm(skb);
+				NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm,
+					       (struct net_device *)in,
+					       (struct net_device *)out,
+					       okfn, NF_IP6_PRI_CONNTRACK + 1);
+				return NF_DROP_ERR(-ECANCELED);
+			}
+		}
+
 		nf_conntrack_get(reasm->nfct);
 		skb->nfct = reasm->nfct;
 		skb->nfctinfo = reasm->nfctinfo;
@@ -228,7 +251,7 @@ static unsigned int ipv6_conntrack_in(unsigned int hooknum,
 				      const struct net_device *out,
 				      int (*okfn)(struct sk_buff *))
 {
-	return __ipv6_conntrack_in(dev_net(in), hooknum, skb, okfn);
+	return __ipv6_conntrack_in(dev_net(in), hooknum, skb, in, out, okfn);
 }
 
 static unsigned int ipv6_conntrack_local(unsigned int hooknum,
@@ -242,7 +265,7 @@ static unsigned int ipv6_conntrack_local(unsigned int hooknum,
 		net_notice_ratelimited("ipv6_conntrack_local: packet too short\n");
 		return NF_ACCEPT;
 	}
-	return __ipv6_conntrack_in(dev_net(out), hooknum, skb, okfn);
+	return __ipv6_conntrack_in(dev_net(out), hooknum, skb, in, out, okfn);
 }
 
 static struct nf_hook_ops ipv6_conntrack_ops[] __read_mostly = {
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index c9c78c2..f94fb3a 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -190,6 +190,7 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
 			     const struct frag_hdr *fhdr, int nhoff)
 {
 	struct sk_buff *prev, *next;
+	unsigned int payload_len;
 	int offset, end;
 
 	if (fq->q.last_in & INET_FRAG_COMPLETE) {
@@ -197,8 +198,10 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
 		goto err;
 	}
 
+	payload_len = ntohs(ipv6_hdr(skb)->payload_len);
+
 	offset = ntohs(fhdr->frag_off) & ~0x7;
-	end = offset + (ntohs(ipv6_hdr(skb)->payload_len) -
+	end = offset + (payload_len -
 			((u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1)));
 
 	if ((unsigned int)end > IPV6_MAXPLEN) {
@@ -307,6 +310,8 @@ found:
 	skb->dev = NULL;
 	fq->q.stamp = skb->tstamp;
 	fq->q.meat += skb->len;
+	if (payload_len > fq->q.max_size)
+		fq->q.max_size = payload_len;
 	atomic_add(skb->truesize, &nf_init_frags.mem);
 
 	/* The first fragment.
@@ -412,10 +417,12 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
 	}
 	atomic_sub(head->truesize, &nf_init_frags.mem);
 
+	head->local_df = 1;
 	head->next = NULL;
 	head->dev = dev;
 	head->tstamp = fq->q.stamp;
 	ipv6_hdr(head)->payload_len = htons(payload_len);
+	IP6CB(head)->frag_max_size = sizeof(struct ipv6hdr) + fq->q.max_size;
 
 	/* Yes, and fold redundant checksum back. 8) */
 	if (head->ip_summed == CHECKSUM_COMPLETE)
@@ -592,6 +599,7 @@ void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
 			int (*okfn)(struct sk_buff *))
 {
 	struct sk_buff *s, *s2;
+	unsigned int ret = 0;
 
 	for (s = NFCT_FRAG6_CB(skb)->orig; s;) {
 		nf_conntrack_put_reasm(s->nfct_reasm);
@@ -601,8 +609,13 @@ void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
 		s2 = s->next;
 		s->next = NULL;
 
-		NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, s, in, out, okfn,
-			       NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
+		if (ret != -ECANCELED)
+			ret = NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, s,
+					     in, out, okfn,
+					     NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
+		else
+			kfree_skb(s);
+
 		s = s2;
 	}
 	nf_conntrack_put_reasm(skb);

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/19] netfilter: IPv6 NAT
  2012-08-17 13:42 ` Pablo Neira Ayuso
@ 2012-08-18 12:46   ` Patrick McHardy
  0 siblings, 0 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-18 12:46 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, netdev

On Fri, 17 Aug 2012, Pablo Neira Ayuso wrote:

> On Thu, Aug 09, 2012 at 10:08:44PM +0200, kaber@trash.net wrote:
> [...]
>> - Patches 16-19 add some IPv6-capable ports of existing NAT helpers
>
> There was one for IRC also, did it get lost?
>
> I also made TFTP, it was more or less trivial, you can find it here:
>
> http://1984.lsi.us.es/git/nf-next/commit/?h=nf-nat4&id=ac2a06ab790386c8c8333dbf84584fc3f80fcdc5

I've already added your IRC and TFTP patches to my tree. Will send the
latest version later today.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-18 12:26     ` Patrick McHardy
@ 2012-08-19 19:37       ` Jesper Dangaard Brouer
  2012-08-19 19:44         ` Patrick McHardy
  0 siblings, 1 reply; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-19 19:37 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev

On Sat, 2012-08-18 at 14:26 +0200, Patrick McHardy wrote:
> On Fri, 17 Aug 2012, Jesper Dangaard Brouer wrote:
> 
> > On Thu, 2012-08-09 at 22:08 +0200, kaber@trash.net wrote:
> >> From: Patrick McHardy <kaber@trash.net>
> >>
> >> The IPv6 conntrack fragmentation currently has a couple of shortcomings.
> >> Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
> >> defragmented packet is then passed to conntrack, the resulting conntrack
> >> information is attached to each original fragment and the fragments then
> >> continue their way through the stack.
> >>
> >> Helper invocation occurs in the POSTROUTING hook, at which point only
> >> the original fragments are available. The result of this is that
> >> fragmented packets are never passed to helpers.
> >>
> >> This patch improves the situation in the following way:
> >>
> >> - If a reassembled packet belongs to a connection that has a helper
> >>   assigned, the reassembled packet is passed through the stack instead
> >>   of the original fragments.
> >
> > I'm working on IPv6 fragment handling for IPVS, and are taking advantage
> > of the "replay" by nf_ct_frag6_output() at hook prio -399
> > (NF_IP6_PRI_CONNTRACK_DEFRAG + 1).
> > By making a hook at NF_INET_PRE_ROUTING at prio -99 (NF_IP6_PRI_NAT_DST
> > + 1).
> >
> > I can see that the code path can be changed (with this patch), if a
> > helper is assigned.  Then the "replay" starts at prio -199
> > (NF_IP6_PRI_CONNTRACK + 1), I guess I'm safe as I run at -99.
> >
> > I have tested that your patchset works, with my ipvs patches, but would
> > like the trigger the changed code path, to make sure.
> >
> > Could you provide an iptables command/rule, that trigger this code path?
> 
> The easiest way is a large ping with the NAT patches also applied,
> in that case we also pass the first packet of a connection through
> the stack reassembled.

So, a fragmented IPv6 ICMPv6 packet, I assume?

Don't I need to load some of the helper modules, or just the
nf_conntrack_ipv6 module, or perhaps only nf_defrag_ipv6 ?


> >> @@ -199,9 +200,13 @@ static unsigned int ipv6_confirm(unsigned int hooknum,
> >>  static unsigned int __ipv6_conntrack_in(struct net *net,
> >>  					unsigned int hooknum,
> >>  					struct sk_buff *skb,
> >> +					const struct net_device *in,
> >> +					const struct net_device *out,
> >>  					int (*okfn)(struct sk_buff *))
> >>  {
> >>  	struct sk_buff *reasm = skb->nfct_reasm;
> >> +	struct nf_conn *ct;
> >> +	enum ip_conntrack_info ctinfo;
> >>
> >>  	/* This packet is fragmented and has reassembled packet. */
> >>  	if (reasm) {
> >> @@ -213,6 +218,20 @@ static unsigned int __ipv6_conntrack_in(struct net *net,
> >>  			if (ret != NF_ACCEPT)
> >>  				return ret;
> >>  		}
> >> +
> >> +		/* Conntrack helpers need the entire reassembled packet in the
> >> +		 * POST_ROUTING hook.
> >> +		 */
> >> +		ct = nf_ct_get(reasm, &ctinfo);
> >> +		if (ct != NULL && test_bit(IPS_HELPER_BIT, &ct->status)) {
> >> +			nf_conntrack_get_reasm(skb);
> >> +			NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm,
> >> +				       (struct net_device *)in,
> >> +				       (struct net_device *)out,
> >> +				       okfn, NF_IP6_PRI_CONNTRACK + 1);
> >
> > Hook prio change to NF_IP6_PRI_CONNTRACK + 1
> 
> I didn't get this part, you want to change to PRE_CONNTRACK + 1? What
> about raw and SELinux?

No - I don't want any changes.

I was just pointing out *where* the changes occur in your patch. This is
just a "service" to other email readers, so they can spot the changes
faster, I were referring to.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-19 19:37       ` Jesper Dangaard Brouer
@ 2012-08-19 19:44         ` Patrick McHardy
  2012-08-20 13:13           ` Jesper Dangaard Brouer
  2012-08-21 22:21           ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-19 19:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netfilter-devel, netdev

On Sun, 19 Aug 2012, Jesper Dangaard Brouer wrote:

> On Sat, 2012-08-18 at 14:26 +0200, Patrick McHardy wrote:
>>>
>>> Could you provide an iptables command/rule, that trigger this code path?
>>
>> The easiest way is a large ping with the NAT patches also applied,
>> in that case we also pass the first packet of a connection through
>> the stack reassembled.
>
> So, a fragmented IPv6 ICMPv6 packet, I assume?

Correct.

> Don't I need to load some of the helper modules, or just the
> nf_conntrack_ipv6 module, or perhaps only nf_defrag_ipv6 ?

Not with the entire patchset, just IPv6 conntrack is enough. Aith IPv6 NAT
the first packet of a connection must always be defragemented, independant
of an assigned helper.

>>>> @@ -199,9 +200,13 @@ static unsigned int ipv6_confirm(unsigned int hooknum,
>>>>  static unsigned int __ipv6_conntrack_in(struct net *net,
>>>>  					unsigned int hooknum,
>>>>  					struct sk_buff *skb,
>>>> +					const struct net_device *in,
>>>> +					const struct net_device *out,
>>>>  					int (*okfn)(struct sk_buff *))
>>>>  {
>>>>  	struct sk_buff *reasm = skb->nfct_reasm;
>>>> +	struct nf_conn *ct;
>>>> +	enum ip_conntrack_info ctinfo;
>>>>
>>>>  	/* This packet is fragmented and has reassembled packet. */
>>>>  	if (reasm) {
>>>> @@ -213,6 +218,20 @@ static unsigned int __ipv6_conntrack_in(struct net *net,
>>>>  			if (ret != NF_ACCEPT)
>>>>  				return ret;
>>>>  		}
>>>> +
>>>> +		/* Conntrack helpers need the entire reassembled packet in the
>>>> +		 * POST_ROUTING hook.
>>>> +		 */
>>>> +		ct = nf_ct_get(reasm, &ctinfo);
>>>> +		if (ct != NULL && test_bit(IPS_HELPER_BIT, &ct->status)) {
>>>> +			nf_conntrack_get_reasm(skb);
>>>> +			NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm,
>>>> +				       (struct net_device *)in,
>>>> +				       (struct net_device *)out,
>>>> +				       okfn, NF_IP6_PRI_CONNTRACK + 1);
>>>
>>> Hook prio change to NF_IP6_PRI_CONNTRACK + 1
>>
>> I didn't get this part, you want to change to PRE_CONNTRACK + 1? What
>> about raw and SELinux?
>
> No - I don't want any changes.
>
> I was just pointing out *where* the changes occur in your patch. This is
> just a "service" to other email readers, so they can spot the changes
> faster, I were referring to.

Could you send me your patch so I get a better picture of what you're 
doing exactly?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-19 19:44         ` Patrick McHardy
@ 2012-08-20 13:13           ` Jesper Dangaard Brouer
  2012-08-22 22:21             ` Patrick McHardy
  2012-08-21 22:21           ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-20 13:13 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev

On Sun, 2012-08-19 at 21:44 +0200, Patrick McHardy wrote:

> Could you send me your patch so I get a better picture of what you're 
> doing exactly?

Okay, just posted the patchset.

Specifically look at patch:
 [PATCH 3/3] ipvs: Complete IPv6 fragment handling for IPVS

Where I use the hook to copy the fw mark from the reasm SKB packet to
the SKB fragments. (Perhaps, this could be done else were in the
netfilter framework).

--Jesper Brouer

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-19 19:44         ` Patrick McHardy
  2012-08-20 13:13           ` Jesper Dangaard Brouer
@ 2012-08-21 22:21           ` Jesper Dangaard Brouer
  2012-08-26 21:20             ` Patrick McHardy
  1 sibling, 1 reply; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-21 22:21 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: netfilter-devel, netdev, Julian Anastasov, Hans Schillstrom,
	Hans Schillstrom

On Sun, 2012-08-19 at 21:44 +0200, Patrick McHardy wrote:
> On Sun, 19 Aug 2012, Jesper Dangaard Brouer wrote:
> > On Sat, 2012-08-18 at 14:26 +0200, Patrick McHardy wrote:
[...]

> > Don't I need to load some of the helper modules, or just the
> > nf_conntrack_ipv6 module, or perhaps only nf_defrag_ipv6 ?
> 
> Not with the entire patchset, just IPv6 conntrack is enough. Aith IPv6 NAT
> the first packet of a connection must always be defragemented, independant
> of an assigned helper.

When loading "nf_conntrack_ipv6" I run into issues.

When sending a fragmented UDP packet.  With these patches, the IPVS
stack will no longer see the fragmented packets, but instead see one
large SKB.  This will trigger a MTU path check in e.g.
ip_vs_dr_xmit_v6() and an ICMPv6 too big packet is send back.

  IPVS: ip_vs_dr_xmit_v6(): frag needed

Perhaps we could change/fix the MTU check in IPVS?
(This would also solve issues I've seen with TSO/GSO frames, hitting
this code path).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-20 13:13           ` Jesper Dangaard Brouer
@ 2012-08-22 22:21             ` Patrick McHardy
  0 siblings, 0 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-22 22:21 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netfilter-devel, netdev

On Mon, 20 Aug 2012, Jesper Dangaard Brouer wrote:

> On Sun, 2012-08-19 at 21:44 +0200, Patrick McHardy wrote:
>
>> Could you send me your patch so I get a better picture of what you're
>> doing exactly?
>
> Okay, just posted the patchset.
>
> Specifically look at patch:
> [PATCH 3/3] ipvs: Complete IPv6 fragment handling for IPVS
>
> Where I use the hook to copy the fw mark from the reasm SKB packet to
> the SKB fragments. (Perhaps, this could be done else were in the
> netfilter framework).

Thanks, I'll have a look at this tommorrow.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/19] netfilter: IPv6 NAT
  2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
                   ` (21 preceding siblings ...)
  2012-08-17 13:42 ` Pablo Neira Ayuso
@ 2012-08-25  0:58 ` Andre Tomt
  2012-08-25  1:16   ` Andre Tomt
  2012-08-27  7:33   ` Florian Weimer
  22 siblings, 2 replies; 68+ messages in thread
From: Andre Tomt @ 2012-08-25  0:58 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev

On 09. aug. 2012 22:08, kaber@trash.net wrote:
> The following patches contain an updated version of IPv6 NAT against
> Linus' current tree.

Hmmm. Looking in my crystal ball (hi #ipv6!), I predict that if this 
lands in mainline - and thus in consumer CPE/routers eventually - many 
ISP's will have little incentive to actually implement assigning of 
blocks to their consumer users like they "have to" today.

We have this wonderful chance of fixing a major problem with todays 
internet, but now we are going down this very slippery slope.

I do need this code for a experimental project myself, and acknowledge 
there may be some valid use cases, but I do not like the global 
implications one bit.

At least some big fat warnings please?

:-(

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/19] netfilter: IPv6 NAT
  2012-08-25  0:58 ` Andre Tomt
@ 2012-08-25  1:16   ` Andre Tomt
  2012-08-26 18:06     ` Patrick McHardy
  2012-08-27  7:33   ` Florian Weimer
  1 sibling, 1 reply; 68+ messages in thread
From: Andre Tomt @ 2012-08-25  1:16 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev

On 25. aug. 2012 02:58, Andre Tomt wrote:
> On 09. aug. 2012 22:08, kaber@trash.net wrote:
>> The following patches contain an updated version of IPv6 NAT against
>> Linus' current tree.
>
> Hmmm. Looking in my crystal ball (hi #ipv6!), I predict that if this
> lands in mainline - and thus in consumer CPE/routers eventually - many
> ISP's will have little incentive to actually implement assigning of
> blocks to their consumer users like they "have to" today.
>
> We have this wonderful chance of fixing a major problem with todays
> internet, but now we are going down this very slippery slope.
>
> I do need this code for a experimental project myself, and acknowledge
> there may be some valid use cases, but I do not like the global
> implications one bit.
>
> At least some big fat warnings please?

Clarification: This is about the NAT66 port-based 1:n NAT targets.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/19] netfilter: IPv6 NAT
  2012-08-25  1:16   ` Andre Tomt
@ 2012-08-26 18:06     ` Patrick McHardy
  0 siblings, 0 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-26 18:06 UTC (permalink / raw)
  To: Andre Tomt; +Cc: netfilter-devel, netdev

On Sat, 25 Aug 2012, Andre Tomt wrote:

> On 25. aug. 2012 02:58, Andre Tomt wrote:
>> On 09. aug. 2012 22:08, kaber@trash.net wrote:
>>> The following patches contain an updated version of IPv6 NAT against
>>> Linus' current tree.
>> 
>> Hmmm. Looking in my crystal ball (hi #ipv6!), I predict that if this
>> lands in mainline - and thus in consumer CPE/routers eventually - many
>> ISP's will have little incentive to actually implement assigning of
>> blocks to their consumer users like they "have to" today.
>> 
>> We have this wonderful chance of fixing a major problem with todays
>> internet, but now we are going down this very slippery slope.
>> 
>> I do need this code for a experimental project myself, and acknowledge
>> there may be some valid use cases, but I do not like the global
>> implications one bit.
>> 
>> At least some big fat warnings please?
>
> Clarification: This is about the NAT66 port-based 1:n NAT targets.

We can certainly add a warning to the Kconfig text or (better) the
iptables manpage. But only a very small percentage of people who
might end up (unknowingly) using this will ever see them.

Feel free to send a patch though.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-21 22:21           ` Jesper Dangaard Brouer
@ 2012-08-26 21:20             ` Patrick McHardy
  2012-08-27 10:13               ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 68+ messages in thread
From: Patrick McHardy @ 2012-08-26 21:20 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netfilter-devel, netdev, Julian Anastasov, Hans Schillstrom,
	Hans Schillstrom

On Wed, 22 Aug 2012, Jesper Dangaard Brouer wrote:

> On Sun, 2012-08-19 at 21:44 +0200, Patrick McHardy wrote:
>> On Sun, 19 Aug 2012, Jesper Dangaard Brouer wrote:
>>> On Sat, 2012-08-18 at 14:26 +0200, Patrick McHardy wrote:
> [...]
>
>>> Don't I need to load some of the helper modules, or just the
>>> nf_conntrack_ipv6 module, or perhaps only nf_defrag_ipv6 ?
>>
>> Not with the entire patchset, just IPv6 conntrack is enough. Aith IPv6 NAT
>> the first packet of a connection must always be defragemented, independant
>> of an assigned helper.
>
> When loading "nf_conntrack_ipv6" I run into issues.
>
> When sending a fragmented UDP packet.  With these patches, the IPVS
> stack will no longer see the fragmented packets, but instead see one
> large SKB.  This will trigger a MTU path check in e.g.
> ip_vs_dr_xmit_v6() and an ICMPv6 too big packet is send back.
>
>  IPVS: ip_vs_dr_xmit_v6(): frag needed
>
> Perhaps we could change/fix the MTU check in IPVS?
> (This would also solve issues I've seen with TSO/GSO frames, hitting
> this code path).

I guess this should use the same check as in IPv6 output, check
whether IP6CB(skb)->max_frag_size is != 0 and > MTU and only send
an ICMPv6 error in that case.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 00/19] netfilter: IPv6 NAT
  2012-08-25  0:58 ` Andre Tomt
  2012-08-25  1:16   ` Andre Tomt
@ 2012-08-27  7:33   ` Florian Weimer
  1 sibling, 0 replies; 68+ messages in thread
From: Florian Weimer @ 2012-08-27  7:33 UTC (permalink / raw)
  To: Andre Tomt; +Cc: kaber, netfilter-devel, netdev

On 08/25/2012 02:58 AM, Andre Tomt wrote:
> On 09. aug. 2012 22:08, kaber@trash.net wrote:
>> The following patches contain an updated version of IPv6 NAT against
>> Linus' current tree.
>
> Hmmm. Looking in my crystal ball (hi #ipv6!), I predict that if this
> lands in mainline - and thus in consumer CPE/routers eventually - many
> ISP's will have little incentive to actually implement assigning of
> blocks to their consumer users like they "have to" today.

I think the brokenness of IPv6 ND is still sufficient incentive to 
provide the customer with a prefix.  Whether this prefix is constant 
over an extended period of time is a separate question.  Even among 
consumers, preferences vary widely.

-- 
Florian Weimer / Red Hat Product Security Team

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-26 21:20             ` Patrick McHardy
@ 2012-08-27 10:13               ` Jesper Dangaard Brouer
  2012-08-27 10:41                 ` Patrick McHardy
  0 siblings, 1 reply; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-27 10:13 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: netfilter-devel, netdev, Julian Anastasov, Hans Schillstrom,
	Hans Schillstrom

On Sun, 2012-08-26 at 23:20 +0200, Patrick McHardy wrote:
> On Wed, 22 Aug 2012, Jesper Dangaard Brouer wrote:
> 
> > On Sun, 2012-08-19 at 21:44 +0200, Patrick McHardy wrote:
> >> On Sun, 19 Aug 2012, Jesper Dangaard Brouer wrote:
> >>> On Sat, 2012-08-18 at 14:26 +0200, Patrick McHardy wrote:
> > [...]
> >
> >>> Don't I need to load some of the helper modules, or just the
> >>> nf_conntrack_ipv6 module, or perhaps only nf_defrag_ipv6 ?
> >>
> >> Not with the entire patchset, just IPv6 conntrack is enough. Aith IPv6 NAT
> >> the first packet of a connection must always be defragemented, independant
> >> of an assigned helper.
> >
> > When loading "nf_conntrack_ipv6" I run into issues.
> >
> > When sending a fragmented UDP packet.  With these patches, the IPVS
> > stack will no longer see the fragmented packets, but instead see one
> > large SKB.  This will trigger a MTU path check in e.g.
> > ip_vs_dr_xmit_v6() and an ICMPv6 too big packet is send back.
> >
> >  IPVS: ip_vs_dr_xmit_v6(): frag needed
> >
> > Perhaps we could change/fix the MTU check in IPVS?
> > (This would also solve issues I've seen with TSO/GSO frames, hitting
> > this code path).
> 
> I guess this should use the same check as in IPv6 output, check
> whether IP6CB(skb)->max_frag_size is != 0 and > MTU and only send
> an ICMPv6 error in that case.

Hans have (already) proposed this solution:

  if ((!skb->local_df && skb->len > mtu && !skb_is_gso(skb)) ||
      (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu)) {

And I have tested it works.
But I'm not sure about, if we really need the "!skb->local_df" check ?


Thus, we should extend you patchset with a patch, that also address the
MTU checks in IPVS.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling
  2012-08-27 10:13               ` Jesper Dangaard Brouer
@ 2012-08-27 10:41                 ` Patrick McHardy
  2012-08-27 14:40                   ` [PATCH 0/2] net: ipvs and netfilter IPv6 defrag MTU handling Jesper Dangaard Brouer
  0 siblings, 1 reply; 68+ messages in thread
From: Patrick McHardy @ 2012-08-27 10:41 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netfilter-devel, netdev, Julian Anastasov, Hans Schillstrom,
	Hans Schillstrom

On Mon, 27 Aug 2012, Jesper Dangaard Brouer wrote:

>>> When loading "nf_conntrack_ipv6" I run into issues.
>>>
>>> When sending a fragmented UDP packet.  With these patches, the IPVS
>>> stack will no longer see the fragmented packets, but instead see one
>>> large SKB.  This will trigger a MTU path check in e.g.
>>> ip_vs_dr_xmit_v6() and an ICMPv6 too big packet is send back.
>>>
>>>  IPVS: ip_vs_dr_xmit_v6(): frag needed
>>>
>>> Perhaps we could change/fix the MTU check in IPVS?
>>> (This would also solve issues I've seen with TSO/GSO frames, hitting
>>> this code path).
>>
>> I guess this should use the same check as in IPv6 output, check
>> whether IP6CB(skb)->max_frag_size is != 0 and > MTU and only send
>> an ICMPv6 error in that case.
>
> Hans have (already) proposed this solution:
>
>  if ((!skb->local_df && skb->len > mtu && !skb_is_gso(skb)) ||
>      (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu)) {
>
> And I have tested it works.
> But I'm not sure about, if we really need the "!skb->local_df" check ?

Not necessarily, alternatively you could use:

if ((!IP6CB(skb)->frag_max_size &&
      (skb->len > mtu && !skb_is_gso(skb)) ||
     IP6CB(skb)->frag_max_size > mtu)

We just need to make sure that if frag_max_size is set that skb->len
is ignored. Either way is fine, but I'd suggest to keep it similar to
the variant used in IPv6.

> Thus, we should extend you patchset with a patch, that also address the
> MTU checks in IPVS.

Can you send over a patch for inclusion in the patchset? Besides this, its
ready for submission.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 0/2] net: ipvs and netfilter IPv6 defrag MTU handling
  2012-08-27 10:41                 ` Patrick McHardy
@ 2012-08-27 14:40                   ` Jesper Dangaard Brouer
  2012-08-27 14:40                     ` [PATCH 1/2] ipvs: IPv6 MTU checking cleanup and bugfix Jesper Dangaard Brouer
  2012-08-27 14:42                     ` [PATCH 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes Jesper Dangaard Brouer
  0 siblings, 2 replies; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-27 14:40 UTC (permalink / raw)
  To: netdev, Patrick McHardy, lvs-devel, Julian Anastasov,
	Simon Horman, Pablo Neira Ayuso
  Cc: Hans Schillstrom, Jesper Dangaard Brouer, Wensong Zhang, netfilter-devel

The following patchset makes IPVS compatible with Patrick McHardys
IPv6 NAT patchset: http://thread.gmane.org/gmane.linux.network/239615

Specifically the part that improves defragmentation.

 Patch01: Cleanup (and fixes a bug) in IPVS MTU IPv6 handling

 Patch02: Extend MTU check to account for IPv6 NAT defrag changes

This patchset is based upon:
 Homes ipvs-next tree:
  git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git

BUT with Patrick McHardy's IPv6 NAT patchset applied first, on top of
commit: 3654e61137db891f5312e6dd813b961484b5fdf3
title: "ipvs: add pmtu_disc option to disable IP DF for TUN packets"

Send due to request by Patrick:
 http://thread.gmane.org/gmane.linux.network/239615/focus=43870

---

Jesper Dangaard Brouer (2):
      ipvs: Extend MTU check to account for IPv6 NAT defrag changes
      ipvs: IPv6 MTU checking cleanup and bugfix


 net/netfilter/ipvs/ip_vs_xmit.c |   32 ++++++++++++++++++++++++++------
 1 files changed, 26 insertions(+), 6 deletions(-)


--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 1/2] ipvs: IPv6 MTU checking cleanup and bugfix
  2012-08-27 14:40                   ` [PATCH 0/2] net: ipvs and netfilter IPv6 defrag MTU handling Jesper Dangaard Brouer
@ 2012-08-27 14:40                     ` Jesper Dangaard Brouer
  2012-08-27 14:42                     ` [PATCH 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes Jesper Dangaard Brouer
  1 sibling, 0 replies; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-27 14:40 UTC (permalink / raw)
  To: netdev, Patrick McHardy, lvs-devel, Julian Anastasov,
	Simon Horman, Pablo Neira Ayuso
  Cc: Hans Schillstrom, Jesper Dangaard Brouer, Wensong Zhang, netfilter-devel

Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using
a common helper function __mtu_check_toobig_v6().  The MTU check
for tunnel mode is special, and does not use this helper.

Notice, this also fixes a bug, as the the MTU check in ip_vs_dr_xmit_v6()
were missing a check for skb_is_gso().

This e.g. caused issues for KVM IPVS setups, where different
Segmentation Offloading techniques are utilized, between guests,
via the virtio driver.  This resulted in very bad performance,
due to the ICMPv6 "too big" messages didn't affect the sender.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 net/netfilter/ipvs/ip_vs_xmit.c |   18 ++++++++++++++----
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 543a554..e56f0df 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -85,6 +85,15 @@ __ip_vs_dst_check(struct ip_vs_dest *dest, u32 rtos)
 	return dst;
 }
 
+static inline bool
+__mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
+{
+	if (skb->len > mtu && !skb_is_gso(skb)) {
+		return true; /* Packet size violate MTU size */
+	}
+	return false;
+}
+
 /* Get route to daddr, update *saddr, optionally bind route to saddr */
 static struct rtable *do_output_route4(struct net *net, __be32 daddr,
 				       u32 rtos, int rt_mode, __be32 *saddr)
@@ -491,7 +500,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 
@@ -712,7 +721,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 
@@ -946,6 +955,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	if (skb_dst(skb))
 		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
 
+	/* MTU checking: Special for tunnel mode */
 	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
 	    !skb_is_gso(skb)) {
 		if (!skb->dev) {
@@ -1113,7 +1123,7 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if (skb->len > mtu) {
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 
@@ -1349,7 +1359,7 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes
  2012-08-27 14:40                   ` [PATCH 0/2] net: ipvs and netfilter IPv6 defrag MTU handling Jesper Dangaard Brouer
  2012-08-27 14:40                     ` [PATCH 1/2] ipvs: IPv6 MTU checking cleanup and bugfix Jesper Dangaard Brouer
@ 2012-08-27 14:42                     ` Jesper Dangaard Brouer
  2012-08-27 15:20                       ` Julian Anastasov
  1 sibling, 1 reply; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-27 14:42 UTC (permalink / raw)
  To: netdev, Patrick McHardy, lvs-devel, Julian Anastasov,
	Simon Horman, Pablo Neira Ayuso
  Cc: Hans Schillstrom, Jesper Dangaard Brouer, Wensong Zhang, netfilter-devel

This patch is necessary, to make IPVS work, after Patrick McHardys
IPv6 NAT defragmentation changes.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

I would appriciate, if someone (e.g. Julian) double check the tunnel mode code.
--Jesper

 net/netfilter/ipvs/ip_vs_xmit.c |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index e56f0df..20763de 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -88,7 +88,14 @@ __ip_vs_dst_check(struct ip_vs_dest *dest, u32 rtos)
 static inline bool
 __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
 {
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if (IP6CB(skb)->frag_max_size) {
+		/* frag_max_size tell us that, this packet have been
+		 * defragmented by netfilter IPv6 conntrack module.
+		 */
+		if (IP6CB(skb)->frag_max_size > mtu)
+			return true; /* largest fragment violate MTU */
+	}
+	else if (skb->len > mtu && !skb_is_gso(skb)) {
 		return true; /* Packet size violate MTU size */
 	}
 	return false;
@@ -956,8 +963,11 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
 
 	/* MTU checking: Special for tunnel mode */
-	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
-	    !skb_is_gso(skb)) {
+	if ((!IP6CB(skb)->frag_max_size &&
+	     (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
+	      !skb_is_gso(skb)))
+	    || IP6CB(skb)->frag_max_size + sizeof(struct ipv6hdr) > mtu) {
+
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes
  2012-08-27 14:42                     ` [PATCH 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes Jesper Dangaard Brouer
@ 2012-08-27 15:20                       ` Julian Anastasov
  2012-08-28  8:22                         ` Patrick McHardy
  2012-08-28  9:03                         ` [PATCH " Jesper Dangaard Brouer
  0 siblings, 2 replies; 68+ messages in thread
From: Julian Anastasov @ 2012-08-27 15:20 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Patrick McHardy, lvs-devel, Simon Horman,
	Pablo Neira Ayuso, Hans Schillstrom, Wensong Zhang,
	netfilter-devel


	Hello,

On Mon, 27 Aug 2012, Jesper Dangaard Brouer wrote:

> This patch is necessary, to make IPVS work, after Patrick McHardys
> IPv6 NAT defragmentation changes.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
> 
> I would appriciate, if someone (e.g. Julian) double check the tunnel mode code.

	Sure

> --Jesper
> 
>  net/netfilter/ipvs/ip_vs_xmit.c |   16 +++++++++++++---
>  1 files changed, 13 insertions(+), 3 deletions(-)
> 
> @@ -956,8 +963,11 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
>  		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
>  
>  	/* MTU checking: Special for tunnel mode */
> -	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
> -	    !skb_is_gso(skb)) {
> +	if ((!IP6CB(skb)->frag_max_size &&
> +	     (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
> +	      !skb_is_gso(skb)))
> +	    || IP6CB(skb)->frag_max_size + sizeof(struct ipv6hdr) > mtu) {
> +

	mtu is already reduced with the new outer header size,
may be we can just call __mtu_check_toobig_v6 with mtu?

>  		if (!skb->dev) {
>  			struct net *net = dev_net(skb_dst(skb)->dev);

	All other changes in patch 1 and 2 look ok and
I'll ack them next time.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes
  2012-08-27 15:20                       ` Julian Anastasov
@ 2012-08-28  8:22                         ` Patrick McHardy
  2012-08-28  8:28                           ` Simon Horman
  2012-08-28 14:21                           ` [PATCH V2 0/2] net: ipvs and netfilter IPv6 defrag MTU handling Jesper Dangaard Brouer
  2012-08-28  9:03                         ` [PATCH " Jesper Dangaard Brouer
  1 sibling, 2 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-28  8:22 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Jesper Dangaard Brouer, netdev, lvs-devel, Simon Horman,
	Pablo Neira Ayuso, Hans Schillstrom, Wensong Zhang,
	netfilter-devel

On Mon, 27 Aug 2012, Julian Anastasov wrote:

>
> 	Hello,
>
> On Mon, 27 Aug 2012, Jesper Dangaard Brouer wrote:
>
>> This patch is necessary, to make IPVS work, after Patrick McHardys
>> IPv6 NAT defragmentation changes.
>>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> ---
>>
>> I would appriciate, if someone (e.g. Julian) double check the tunnel mode code.
>
> 	Sure
>
>> --Jesper
>>
>>  net/netfilter/ipvs/ip_vs_xmit.c |   16 +++++++++++++---
>>  1 files changed, 13 insertions(+), 3 deletions(-)
>>
>> @@ -956,8 +963,11 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
>>  		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
>>
>>  	/* MTU checking: Special for tunnel mode */
>> -	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
>> -	    !skb_is_gso(skb)) {
>> +	if ((!IP6CB(skb)->frag_max_size &&
>> +	     (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
>> +	      !skb_is_gso(skb)))
>> +	    || IP6CB(skb)->frag_max_size + sizeof(struct ipv6hdr) > mtu) {
>> +
>
> 	mtu is already reduced with the new outer header size,
> may be we can just call __mtu_check_toobig_v6 with mtu?
>
>>  		if (!skb->dev) {
>>  			struct net *net = dev_net(skb_dst(skb)->dev);
>
> 	All other changes in patch 1 and 2 look ok and
> I'll ack them next time.

I'd like to take the final patches through my nat-ipv6 tree in order
to avoid temporary regressions. Please let me know when they're ready
for applying.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes
  2012-08-28  8:22                         ` Patrick McHardy
@ 2012-08-28  8:28                           ` Simon Horman
  2012-08-28 14:21                           ` [PATCH V2 0/2] net: ipvs and netfilter IPv6 defrag MTU handling Jesper Dangaard Brouer
  1 sibling, 0 replies; 68+ messages in thread
From: Simon Horman @ 2012-08-28  8:28 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Julian Anastasov, Jesper Dangaard Brouer, netdev, lvs-devel,
	Pablo Neira Ayuso, Hans Schillstrom, Wensong Zhang,
	netfilter-devel

On Tue, Aug 28, 2012 at 10:22:05AM +0200, Patrick McHardy wrote:
> On Mon, 27 Aug 2012, Julian Anastasov wrote:
> 
> >
> >	Hello,
> >
> >On Mon, 27 Aug 2012, Jesper Dangaard Brouer wrote:
> >
> >>This patch is necessary, to make IPVS work, after Patrick McHardys
> >>IPv6 NAT defragmentation changes.
> >>
> >>Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> >>---
> >>
> >>I would appriciate, if someone (e.g. Julian) double check the tunnel mode code.
> >
> >	Sure
> >
> >>--Jesper
> >>
> >> net/netfilter/ipvs/ip_vs_xmit.c |   16 +++++++++++++---
> >> 1 files changed, 13 insertions(+), 3 deletions(-)
> >>
> >>@@ -956,8 +963,11 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
> >> 		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
> >>
> >> 	/* MTU checking: Special for tunnel mode */
> >>-	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
> >>-	    !skb_is_gso(skb)) {
> >>+	if ((!IP6CB(skb)->frag_max_size &&
> >>+	     (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
> >>+	      !skb_is_gso(skb)))
> >>+	    || IP6CB(skb)->frag_max_size + sizeof(struct ipv6hdr) > mtu) {
> >>+
> >
> >	mtu is already reduced with the new outer header size,
> >may be we can just call __mtu_check_toobig_v6 with mtu?
> >
> >> 		if (!skb->dev) {
> >> 			struct net *net = dev_net(skb_dst(skb)->dev);
> >
> >	All other changes in patch 1 and 2 look ok and
> >I'll ack them next time.
> 
> I'd like to take the final patches through my nat-ipv6 tree in order
> to avoid temporary regressions. Please let me know when they're ready
> for applying.

I'm fine with these changes going through your tree (rather than my IPVS tree)
once they are ready.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes
  2012-08-27 15:20                       ` Julian Anastasov
  2012-08-28  8:22                         ` Patrick McHardy
@ 2012-08-28  9:03                         ` Jesper Dangaard Brouer
  2012-08-28  9:47                           ` Julian Anastasov
  1 sibling, 1 reply; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-28  9:03 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: netdev, Patrick McHardy, lvs-devel, Simon Horman,
	Pablo Neira Ayuso, Hans Schillstrom, Wensong Zhang,
	netfilter-devel

On Mon, 2012-08-27 at 18:20 +0300, Julian Anastasov wrote:
> 	Hello,
> 
> On Mon, 27 Aug 2012, Jesper Dangaard Brouer wrote:
> 
> > This patch is necessary, to make IPVS work, after Patrick McHardys
> > IPv6 NAT defragmentation changes.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> > 
> > I would appriciate, if someone (e.g. Julian) double check the tunnel mode code.
> 
> 	Sure
> 
> > --Jesper
> > 
> >  net/netfilter/ipvs/ip_vs_xmit.c |   16 +++++++++++++---
> >  1 files changed, 13 insertions(+), 3 deletions(-)
> > 
> > @@ -956,8 +963,11 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
> >  		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
> >  
> >  	/* MTU checking: Special for tunnel mode */
> > -	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&

I guess:
 ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr)
Is the same as:
 skb->len

> > -	    !skb_is_gso(skb)) {
> > +	if ((!IP6CB(skb)->frag_max_size &&
> > +	     (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
> > +	      !skb_is_gso(skb)))
> > +	    || IP6CB(skb)->frag_max_size + sizeof(struct ipv6hdr) > mtu) {

> 
> 	mtu is already reduced with the new outer header size,
> may be we can just call __mtu_check_toobig_v6 with mtu?

To Julian, is the extra sizeof(struct ipv6hdr) addition to
frag_max_size, wrong? (as the mtu is already reduced)

If above statements hold, I think we can simply use
__mtu_check_toobig_v6() also for the tunnel case :-)

--Jesper


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes
  2012-08-28  9:03                         ` [PATCH " Jesper Dangaard Brouer
@ 2012-08-28  9:47                           ` Julian Anastasov
  0 siblings, 0 replies; 68+ messages in thread
From: Julian Anastasov @ 2012-08-28  9:47 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Patrick McHardy, lvs-devel, Simon Horman,
	Pablo Neira Ayuso, Hans Schillstrom, Wensong Zhang,
	netfilter-devel


	Hello,

On Tue, 28 Aug 2012, Jesper Dangaard Brouer wrote:

> On Mon, 2012-08-27 at 18:20 +0300, Julian Anastasov wrote:
> > 
> > > @@ -956,8 +963,11 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
> > >  		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
> > >  
> > >  	/* MTU checking: Special for tunnel mode */
> > > -	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
> 
> I guess:
>  ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr)
> Is the same as:
>  skb->len

	I think so. You can think of this in different way:
all transmitters are called from same place, there is no
difference in the packets we see. When we can use
__mtu_check_toobig_v6 for other methods relying on skb->len
being correct, we can do the same for tunnels, only that
tunnels have lower MTU, that is the only difference.

> > > -	    !skb_is_gso(skb)) {
> > > +	if ((!IP6CB(skb)->frag_max_size &&
> > > +	     (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
> > > +	      !skb_is_gso(skb)))
> > > +	    || IP6CB(skb)->frag_max_size + sizeof(struct ipv6hdr) > mtu) {
> 
> > 
> > 	mtu is already reduced with the new outer header size,
> > may be we can just call __mtu_check_toobig_v6 with mtu?
> 
> To Julian, is the extra sizeof(struct ipv6hdr) addition to
> frag_max_size, wrong? (as the mtu is already reduced)

	Yes, sizeof(struct ipv6hdr) is needed only together
with payload_len because payload_len does not include the
first header.

> If above statements hold, I think we can simply use
> __mtu_check_toobig_v6() also for the tunnel case :-)

	Yep

> --Jesper

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH V2 0/2] net: ipvs and netfilter IPv6 defrag MTU handling
  2012-08-28  8:22                         ` Patrick McHardy
  2012-08-28  8:28                           ` Simon Horman
@ 2012-08-28 14:21                           ` Jesper Dangaard Brouer
  2012-08-28 14:22                             ` [PATCH V2 1/2] ipvs: IPv6 MTU checking cleanup and bugfix Jesper Dangaard Brouer
  2012-08-28 14:23                             ` [PATCH V2 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes Jesper Dangaard Brouer
  1 sibling, 2 replies; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-28 14:21 UTC (permalink / raw)
  To: netdev, Patrick McHardy, lvs-devel, Julian Anastasov,
	Simon Horman, Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, Hans Schillstrom, Wensong Zhang, netfilter-devel

The following patchset V2 makes IPVS compatible with Patrick McHardys
IPv6 NAT patchset: http://thread.gmane.org/gmane.linux.network/239615

Specifically the part that improves defragmentation.

 Patch01: Cleanup (and fixes a bug) in IPVS MTU IPv6 handling
  In V2: we also handle tunnel mode, to use the common helper

 Patch02: Extend MTU check to account for IPv6 NAT defrag changes
  In V2: the tunnel mode is no longer a special case.

This patchset is based upon:
 Homes ipvs-next tree:
  git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git

BUT with Patrick McHardy's IPv6 NAT patchset applied first, on top of
commit: 3654e61137db891f5312e6dd813b961484b5fdf3
title: "ipvs: add pmtu_disc option to disable IP DF for TUN packets"

---

Jesper Dangaard Brouer (2):
      ipvs: Extend MTU check to account for IPv6 NAT defrag changes
      ipvs: IPv6 MTU checking cleanup and bugfix


 net/netfilter/ipvs/ip_vs_xmit.c |   28 ++++++++++++++++++++++------
 1 files changed, 22 insertions(+), 6 deletions(-)


--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH V2 1/2] ipvs: IPv6 MTU checking cleanup and bugfix
  2012-08-28 14:21                           ` [PATCH V2 0/2] net: ipvs and netfilter IPv6 defrag MTU handling Jesper Dangaard Brouer
@ 2012-08-28 14:22                             ` Jesper Dangaard Brouer
  2012-08-28 20:08                               ` Patrick McHardy
  2012-08-28 14:23                             ` [PATCH V2 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes Jesper Dangaard Brouer
  1 sibling, 1 reply; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-28 14:22 UTC (permalink / raw)
  To: netdev, Patrick McHardy, lvs-devel, Julian Anastasov,
	Simon Horman, Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, Hans Schillstrom, Wensong Zhang, netfilter-devel

Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using
a common helper function __mtu_check_toobig_v6().

The MTU check for tunnel mode can also use this helper as
ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is qual to
skb->len.  And the 'mtu' variable have been adjusted before
calling helper.

Notice, this also fixes a bug, as the the MTU check in ip_vs_dr_xmit_v6()
were missing a check for skb_is_gso().

This bug e.g. caused issues for KVM IPVS setups, where different
Segmentation Offloading techniques are utilized, between guests,
via the virtio driver.  This resulted in very bad performance,
due to the ICMPv6 "too big" messages didn't affect the sender.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
In V2: we also make tunnel mode use the common helper

 net/netfilter/ipvs/ip_vs_xmit.c |   21 +++++++++++++++------
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 543a554..67a3978 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -85,6 +85,15 @@ __ip_vs_dst_check(struct ip_vs_dest *dest, u32 rtos)
 	return dst;
 }
 
+static inline bool
+__mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
+{
+	if (skb->len > mtu && !skb_is_gso(skb)) {
+		return true; /* Packet size violate MTU size */
+	}
+	return false;
+}
+
 /* Get route to daddr, update *saddr, optionally bind route to saddr */
 static struct rtable *do_output_route4(struct net *net, __be32 daddr,
 				       u32 rtos, int rt_mode, __be32 *saddr)
@@ -491,7 +500,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 
@@ -712,7 +721,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 
@@ -946,8 +955,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	if (skb_dst(skb))
 		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
 
-	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
-	    !skb_is_gso(skb)) {
+	/* MTU checking: Notice that 'mtu' have been adjusted before hand */
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 
@@ -1113,7 +1122,7 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if (skb->len > mtu) {
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 
@@ -1349,7 +1358,7 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH V2 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes
  2012-08-28 14:21                           ` [PATCH V2 0/2] net: ipvs and netfilter IPv6 defrag MTU handling Jesper Dangaard Brouer
  2012-08-28 14:22                             ` [PATCH V2 1/2] ipvs: IPv6 MTU checking cleanup and bugfix Jesper Dangaard Brouer
@ 2012-08-28 14:23                             ` Jesper Dangaard Brouer
  2012-08-28 14:49                               ` Eric Dumazet
  2012-08-28 20:10                               ` Patrick McHardy
  1 sibling, 2 replies; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-28 14:23 UTC (permalink / raw)
  To: netdev, Patrick McHardy, lvs-devel, Julian Anastasov,
	Simon Horman, Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, Hans Schillstrom, Wensong Zhang, netfilter-devel

This patch is necessary, to make IPVS work, after Patrick McHardys
IPv6 NAT defragmentation changes.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
In V2: the tunnel mode is no longer a special case.

 net/netfilter/ipvs/ip_vs_xmit.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 67a3978..56f6d5d 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -88,7 +88,14 @@ __ip_vs_dst_check(struct ip_vs_dest *dest, u32 rtos)
 static inline bool
 __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
 {
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if (IP6CB(skb)->frag_max_size) {
+		/* frag_max_size tell us that, this packet have been
+		 * defragmented by netfilter IPv6 conntrack module.
+		 */
+		if (IP6CB(skb)->frag_max_size > mtu)
+			return true; /* largest fragment violate MTU */
+	}
+	else if (skb->len > mtu && !skb_is_gso(skb)) {
 		return true; /* Packet size violate MTU size */
 	}
 	return false;


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH V2 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes
  2012-08-28 14:23                             ` [PATCH V2 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes Jesper Dangaard Brouer
@ 2012-08-28 14:49                               ` Eric Dumazet
  2012-08-29  7:02                                 ` Jesper Dangaard Brouer
  2012-08-28 20:10                               ` Patrick McHardy
  1 sibling, 1 reply; 68+ messages in thread
From: Eric Dumazet @ 2012-08-28 14:49 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Patrick McHardy, lvs-devel, Julian Anastasov,
	Simon Horman, Pablo Neira Ayuso, Hans Schillstrom, Wensong Zhang,
	netfilter-devel

On Tue, 2012-08-28 at 16:23 +0200, Jesper Dangaard Brouer wrote:
> This patch is necessary, to make IPVS work, after Patrick McHardys
> IPv6 NAT defragmentation changes.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
> In V2: the tunnel mode is no longer a special case.
> 
>  net/netfilter/ipvs/ip_vs_xmit.c |    9 ++++++++-
>  1 files changed, 8 insertions(+), 1 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> index 67a3978..56f6d5d 100644
> --- a/net/netfilter/ipvs/ip_vs_xmit.c
> +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> @@ -88,7 +88,14 @@ __ip_vs_dst_check(struct ip_vs_dest *dest, u32 rtos)
>  static inline bool
>  __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
>  {
> -	if (skb->len > mtu && !skb_is_gso(skb)) {
> +	if (IP6CB(skb)->frag_max_size) {
> +		/* frag_max_size tell us that, this packet have been
> +		 * defragmented by netfilter IPv6 conntrack module.
> +		 */
> +		if (IP6CB(skb)->frag_max_size > mtu)
> +			return true; /* largest fragment violate MTU */
> +	}
> +	else if (skb->len > mtu && !skb_is_gso(skb)) {
>  		return true; /* Packet size violate MTU size */
>  	}

Couldnt you use a single test ?

if (IP6CB(skb)->frag_max_size > mtu)
	return true;

if (skb->len > mtu && !skb_is_gso(skb))
	return true;




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V2 1/2] ipvs: IPv6 MTU checking cleanup and bugfix
  2012-08-28 14:22                             ` [PATCH V2 1/2] ipvs: IPv6 MTU checking cleanup and bugfix Jesper Dangaard Brouer
@ 2012-08-28 20:08                               ` Patrick McHardy
  0 siblings, 0 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-28 20:08 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, lvs-devel, Julian Anastasov, Simon Horman,
	Pablo Neira Ayuso, Hans Schillstrom, Wensong Zhang,
	netfilter-devel

On Tue, 28 Aug 2012, Jesper Dangaard Brouer wrote:

> Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using
> a common helper function __mtu_check_toobig_v6().
>
> The MTU check for tunnel mode can also use this helper as
> ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is qual to
> skb->len.  And the 'mtu' variable have been adjusted before
> calling helper.
>
> Notice, this also fixes a bug, as the the MTU check in ip_vs_dr_xmit_v6()
> were missing a check for skb_is_gso().
>
> This bug e.g. caused issues for KVM IPVS setups, where different
> Segmentation Offloading techniques are utilized, between guests,
> via the virtio driver.  This resulted in very bad performance,
> due to the ICMPv6 "too big" messages didn't affect the sender.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Applied to my nf-nat-ipv6 tree.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V2 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes
  2012-08-28 14:23                             ` [PATCH V2 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes Jesper Dangaard Brouer
  2012-08-28 14:49                               ` Eric Dumazet
@ 2012-08-28 20:10                               ` Patrick McHardy
  1 sibling, 0 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-28 20:10 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, lvs-devel, Julian Anastasov, Simon Horman,
	Pablo Neira Ayuso, Hans Schillstrom, Wensong Zhang,
	netfilter-devel

On Tue, 28 Aug 2012, Jesper Dangaard Brouer wrote:

> This patch is necessary, to make IPVS work, after Patrick McHardys
> IPv6 NAT defragmentation changes.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

I've folded this into the defragmentation patch to avoid temporary 
breakage. Thanks Jesper.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V2 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes
  2012-08-28 14:49                               ` Eric Dumazet
@ 2012-08-29  7:02                                 ` Jesper Dangaard Brouer
  2012-08-29  8:43                                   ` Eric Dumazet
  0 siblings, 1 reply; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-29  7:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Patrick McHardy, lvs-devel, Julian Anastasov,
	Simon Horman, Pablo Neira Ayuso, Hans Schillstrom, Wensong Zhang,
	netfilter-devel

On Tue, 2012-08-28 at 07:49 -0700, Eric Dumazet wrote:
> On Tue, 2012-08-28 at 16:23 +0200, Jesper Dangaard Brouer wrote:
> > This patch is necessary, to make IPVS work, after Patrick McHardys
> > IPv6 NAT defragmentation changes.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> > In V2: the tunnel mode is no longer a special case.
> > 
> >  net/netfilter/ipvs/ip_vs_xmit.c |    9 ++++++++-
> >  1 files changed, 8 insertions(+), 1 deletions(-)
> > 
> > diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> > index 67a3978..56f6d5d 100644
> > --- a/net/netfilter/ipvs/ip_vs_xmit.c
> > +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> > @@ -88,7 +88,14 @@ __ip_vs_dst_check(struct ip_vs_dest *dest, u32 rtos)
> >  static inline bool
> >  __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
> >  {
> > -	if (skb->len > mtu && !skb_is_gso(skb)) {
> > +	if (IP6CB(skb)->frag_max_size) {
> > +		/* frag_max_size tell us that, this packet have been
> > +		 * defragmented by netfilter IPv6 conntrack module.
> > +		 */
> > +		if (IP6CB(skb)->frag_max_size > mtu)
> > +			return true; /* largest fragment violate MTU */
Implicit:         else
     			return false

(if it makes it more clear, not sure)
> > +	}
> > +	else if (skb->len > mtu && !skb_is_gso(skb)) {
> >  		return true; /* Packet size violate MTU size */
> >  	}
> 
> Couldnt you use a single test ?
> 
> if (IP6CB(skb)->frag_max_size > mtu)
> 	return true;
> 
> if (skb->len > mtu && !skb_is_gso(skb))
> 	return true;
> 

Nope, this will not work.

If (IP6CB(skb)->frag_max_size > 0) then we have a defragmented packet,
this means that skb->len cannot be used for MTU checking, because
skb->len is now the total length of all the fragments (which your
solution will fall-through to)

The unreadable version of the function is:

static inline bool
__mtu_check_toobig_v6_unreadable(const struct sk_buff *skb, u32 mtu)
{
       return !!((!IP6CB(skb)->frag_max_size &&
                  (skb->len > mtu && !skb_is_gso(skb)))
                 || IP6CB(skb)->frag_max_size > mtu);
}

Perhaps you like this version better (you seem to use constructions like
this), but I really dislike it, because I (personally) find it
harder/slower to read this kind of code, and I also believe its more
error prone when someone needs to extend this.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V2 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes
  2012-08-29  7:02                                 ` Jesper Dangaard Brouer
@ 2012-08-29  8:43                                   ` Eric Dumazet
  2012-08-29  9:04                                     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 68+ messages in thread
From: Eric Dumazet @ 2012-08-29  8:43 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Patrick McHardy, lvs-devel, Julian Anastasov,
	Simon Horman, Pablo Neira Ayuso, Hans Schillstrom, Wensong Zhang,
	netfilter-devel

On Wed, 2012-08-29 at 09:02 +0200, Jesper Dangaard Brouer wrote:
> On Tue, 2012-08-28 at 07:49 -0700, Eric Dumazet wrote:
> > On Tue, 2012-08-28 at 16:23 +0200, Jesper Dangaard Brouer wrote:
> > > This patch is necessary, to make IPVS work, after Patrick McHardys
> > > IPv6 NAT defragmentation changes.
> > > 
> > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > > ---
> > > In V2: the tunnel mode is no longer a special case.
> > > 
> > >  net/netfilter/ipvs/ip_vs_xmit.c |    9 ++++++++-
> > >  1 files changed, 8 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> > > index 67a3978..56f6d5d 100644
> > > --- a/net/netfilter/ipvs/ip_vs_xmit.c
> > > +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> > > @@ -88,7 +88,14 @@ __ip_vs_dst_check(struct ip_vs_dest *dest, u32 rtos)
> > >  static inline bool
> > >  __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
> > >  {
> > > -	if (skb->len > mtu && !skb_is_gso(skb)) {
> > > +	if (IP6CB(skb)->frag_max_size) {
> > > +		/* frag_max_size tell us that, this packet have been
> > > +		 * defragmented by netfilter IPv6 conntrack module.
> > > +		 */
> > > +		if (IP6CB(skb)->frag_max_size > mtu)
> > > +			return true; /* largest fragment violate MTU */
> Implicit:         else
>      			return false
> 
> (if it makes it more clear, not sure)
> > > +	}
> > > +	else if (skb->len > mtu && !skb_is_gso(skb)) {
> > >  		return true; /* Packet size violate MTU size */
> > >  	}
> > 
> > Couldnt you use a single test ?
> > 
> > if (IP6CB(skb)->frag_max_size > mtu)
> > 	return true;
> > 
> > if (skb->len > mtu && !skb_is_gso(skb))
> > 	return true;
> > 
> 
> Nope, this will not work.
> 
> If (IP6CB(skb)->frag_max_size > 0) then we have a defragmented packet,
> this means that skb->len cannot be used for MTU checking, because
> skb->len is now the total length of all the fragments (which your
> solution will fall-through to)
> 

If the packet was not fragmented, its was a single frame.

But if this frame length is above mtu, packet is not too big ?

Sorry if its a stupid question.




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH V2 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes
  2012-08-29  8:43                                   ` Eric Dumazet
@ 2012-08-29  9:04                                     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 68+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-29  9:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Patrick McHardy, lvs-devel, Julian Anastasov,
	Simon Horman, Pablo Neira Ayuso, Hans Schillstrom, Wensong Zhang,
	netfilter-devel

On Wed, 2012-08-29 at 01:43 -0700, Eric Dumazet wrote:
> On Wed, 2012-08-29 at 09:02 +0200, Jesper Dangaard Brouer wrote:
> > On Tue, 2012-08-28 at 07:49 -0700, Eric Dumazet wrote:
> > > On Tue, 2012-08-28 at 16:23 +0200, Jesper Dangaard Brouer wrote:
> > > > This patch is necessary, to make IPVS work, after Patrick McHardys
> > > > IPv6 NAT defragmentation changes.
> > > > 
> > > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > > > ---
> > > > In V2: the tunnel mode is no longer a special case.
> > > > 
> > > >  net/netfilter/ipvs/ip_vs_xmit.c |    9 ++++++++-
> > > >  1 files changed, 8 insertions(+), 1 deletions(-)
> > > > 
> > > > diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> > > > index 67a3978..56f6d5d 100644
> > > > --- a/net/netfilter/ipvs/ip_vs_xmit.c
> > > > +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> > > > @@ -88,7 +88,14 @@ __ip_vs_dst_check(struct ip_vs_dest *dest, u32 rtos)
> > > >  static inline bool
> > > >  __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
> > > >  {
> > > > -	if (skb->len > mtu && !skb_is_gso(skb)) {
> > > > +	if (IP6CB(skb)->frag_max_size) {
> > > > +		/* frag_max_size tell us that, this packet have been
> > > > +		 * defragmented by netfilter IPv6 conntrack module.
> > > > +		 */
> > > > +		if (IP6CB(skb)->frag_max_size > mtu)
> > > > +			return true; /* largest fragment violate MTU */
> > Implicit:         else
> >      			return false
> > 
> > (if it makes it more clear, not sure)
> > > > +	}
> > > > +	else if (skb->len > mtu && !skb_is_gso(skb)) {
> > > >  		return true; /* Packet size violate MTU size */
> > > >  	}
> > > 
> > > Couldnt you use a single test ?
> > > 
> > > if (IP6CB(skb)->frag_max_size > mtu)
> > > 	return true;
> > > 
> > > if (skb->len > mtu && !skb_is_gso(skb))
> > > 	return true;
> > > 
> > 
> > Nope, this will not work.
> > 
> > If (IP6CB(skb)->frag_max_size > 0) then we have a defragmented packet,
> > this means that skb->len cannot be used for MTU checking, because
> > skb->len is now the total length of all the fragments (which your
> > solution will fall-through to)
> > 
> 
> If the packet was not fragmented, its was a single frame.
> 
> But if this frame length is above mtu, packet is not too big ?

Nope... not if its a defragmented/reassembled packet.

> Sorry if its a stupid question.

These changes have to be seen together with Patrick's patch:
 "netfilter: nf_conntrack_ipv6: improve fragmentation handling"
 http://thread.gmane.org/gmane.linux.network/241517/focus=241518

The IPv6 packet arriving have been defragmented/reassembled by the
nf_conntrack_ipv6 module.  Thus, they look like a normal, but big,
packet to us.   We let it through, because it will be re-fragmented
again later, but first we need to check if the largest fragment would
violate the MTU.

Hope it makes it more clear.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 00/19] netfilter: IPv6 NAT
@ 2012-08-28 21:48 Patrick McHardy
  0 siblings, 0 replies; 68+ messages in thread
From: Patrick McHardy @ 2012-08-28 21:48 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev

The following patches contain the latest version of IPv6 NAT rebased on
top of nf-next.git. Since the last posting an issue with IPVS and the
IPv6 fragmentation improvements have been fixed by Jesper, as well as
some minor cosmetic changes done myself. I consider these patches ready
for merging now.

Please pull from:

git://github.com/kaber/nf-next-ipv6-nat.git master

Thanks.


Jesper Dangaard Brouer (1):
      Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using     a common helper function __mtu_check_toobig_v6().

Pablo Neira Ayuso (2):
      netfilter: nf_nat: support IPv6 in IRC NAT helper
      netfilter: nf_nat: support IPv6 in TFTP NAT helper

Patrick McHardy (16):
      ipv4: fix path MTU discovery with connection tracking
      netfilter: nf_conntrack_ipv6: improve fragmentation handling
      netfilter: nf_conntrack_ipv6: fix tracking of ICMPv6 error messages containing fragments
      netfilter: nf_conntrack: restrict NAT helper invocation to IPv4
      netfilter: nf_nat: add protoff argument to packet mangling functions
      netfilter: add protocol independent NAT core
      netfilter: ipv6: expand skb head in ip6_route_me_harder after oif change
      net: core: add function for incremental IPv6 pseudo header checksum updates
      netfilter: ipv6: add IPv6 NAT support
      netfilter: ip6tables: add MASQUERADE target
      netfilter: ip6tables: add REDIRECT target
      netfilter: ip6tables: add NETMAP target
      netfilter: nf_nat: support IPv6 in FTP NAT helper
      netfilter: nf_nat: support IPv6 in amanda NAT helper
      netfilter: nf_nat: support IPv6 in SIP NAT helper
      netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target

 include/linux/ipv6.h                               |    1 +
 include/linux/netfilter.h                          |   14 +-
 include/linux/netfilter/nf_conntrack_amanda.h      |    1 +
 include/linux/netfilter/nf_conntrack_ftp.h         |    1 +
 include/linux/netfilter/nf_conntrack_h323.h        |   15 +-
 include/linux/netfilter/nf_conntrack_irc.h         |    1 +
 include/linux/netfilter/nf_conntrack_pptp.h        |    2 +
 include/linux/netfilter/nf_conntrack_sip.h         |   21 +-
 include/linux/netfilter/nf_nat.h                   |    8 +
 include/linux/netfilter/nfnetlink_conntrack.h      |    8 +-
 include/linux/netfilter_ipv4.h                     |    1 -
 include/linux/netfilter_ipv6/Kbuild                |    1 +
 include/linux/netfilter_ipv6/ip6t_NPT.h            |   16 +
 include/net/addrconf.h                             |    2 +-
 include/net/checksum.h                             |    3 +
 include/net/inet_frag.h                            |    2 +
 include/net/ip.h                                   |    2 +
 include/net/netfilter/nf_conntrack_expect.h        |    2 +-
 include/net/netfilter/nf_nat.h                     |    6 +-
 include/net/netfilter/nf_nat_core.h                |    5 +-
 include/net/netfilter/nf_nat_helper.h              |   11 +-
 include/net/netfilter/nf_nat_l3proto.h             |   52 ++
 include/net/netfilter/nf_nat_l4proto.h             |   72 +++
 include/net/netfilter/nf_nat_protocol.h            |   67 --
 include/net/netfilter/nf_nat_rule.h                |   15 -
 include/net/netns/conntrack.h                      |    4 +
 include/net/netns/ipv4.h                           |    2 -
 include/net/netns/ipv6.h                           |    1 +
 net/core/secure_seq.c                              |    1 +
 net/core/utils.c                                   |   20 +
 net/ipv4/ip_fragment.c                             |    8 +-
 net/ipv4/ip_output.c                               |    4 +-
 net/ipv4/netfilter.c                               |   37 --
 net/ipv4/netfilter/Kconfig                         |   69 +--
 net/ipv4/netfilter/Makefile                        |   16 +-
 net/ipv4/netfilter/ipt_MASQUERADE.c                |   18 +-
 net/ipv4/netfilter/ipt_NETMAP.c                    |   15 +-
 net/ipv4/netfilter/ipt_REDIRECT.c                  |   15 +-
 .../{nf_nat_standalone.c => iptable_nat.c}         |  264 ++++----
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c     |    8 +-
 net/ipv4/netfilter/nf_nat_h323.c                   |   71 ++-
 net/ipv4/netfilter/nf_nat_l3proto_ipv4.c           |  281 ++++++++
 net/ipv4/netfilter/nf_nat_pptp.c                   |   21 +-
 net/ipv4/netfilter/nf_nat_proto_gre.c              |   30 +-
 net/ipv4/netfilter/nf_nat_proto_icmp.c             |   24 +-
 net/ipv4/netfilter/nf_nat_rule.c                   |  214 -------
 net/ipv6/addrconf.c                                |    2 +-
 net/ipv6/ip6_output.c                              |    7 +-
 net/ipv6/netfilter.c                               |    8 +
 net/ipv6/netfilter/Kconfig                         |   54 ++
 net/ipv6/netfilter/Makefile                        |    8 +
 net/ipv6/netfilter/ip6t_MASQUERADE.c               |  135 ++++
 net/ipv6/netfilter/ip6t_NETMAP.c                   |   94 +++
 net/ipv6/netfilter/ip6t_NPT.c                      |  165 +++++
 net/ipv6/netfilter/ip6t_REDIRECT.c                 |   98 +++
 net/ipv6/netfilter/ip6table_nat.c                  |  321 ++++++++++
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c     |  137 ++--
 net/ipv6/netfilter/nf_conntrack_reasm.c            |   19 +-
 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c           |  287 +++++++++
 net/ipv6/netfilter/nf_nat_proto_icmpv6.c           |   90 +++
 net/netfilter/Kconfig                              |   49 ++
 net/netfilter/Makefile                             |   18 +
 net/netfilter/core.c                               |    5 +
 net/netfilter/ipvs/ip_vs_ftp.c                     |    1 +
 net/netfilter/ipvs/ip_vs_xmit.c                    |   28 +-
 net/netfilter/nf_conntrack_amanda.c                |    5 +-
 net/netfilter/nf_conntrack_core.c                  |    6 +
 net/netfilter/nf_conntrack_ftp.c                   |    3 +-
 net/netfilter/nf_conntrack_h323_main.c             |  232 +++++---
 net/netfilter/nf_conntrack_irc.c                   |    3 +-
 net/netfilter/nf_conntrack_netlink.c               |   35 +-
 net/netfilter/nf_conntrack_pptp.c                  |   18 +-
 net/netfilter/nf_conntrack_proto_tcp.c             |    8 +-
 net/netfilter/nf_conntrack_sip.c                   |  143 +++--
 net/{ipv4 => }/netfilter/nf_nat_amanda.c           |    4 +-
 net/{ipv4 => }/netfilter/nf_nat_core.c             |  675 +++++++++++---------
 net/{ipv4 => }/netfilter/nf_nat_ftp.c              |   34 +-
 net/{ipv4 => }/netfilter/nf_nat_helper.c           |  109 ++--
 net/{ipv4 => }/netfilter/nf_nat_irc.c              |   10 +-
 net/{ipv4 => }/netfilter/nf_nat_proto_common.c     |   54 +-
 net/{ipv4 => }/netfilter/nf_nat_proto_dccp.c       |   56 +-
 net/{ipv4 => }/netfilter/nf_nat_proto_sctp.c       |   53 +-
 net/{ipv4 => }/netfilter/nf_nat_proto_tcp.c        |   40 +-
 net/{ipv4 => }/netfilter/nf_nat_proto_udp.c        |   42 +-
 net/{ipv4 => }/netfilter/nf_nat_proto_udplite.c    |   58 +-
 net/{ipv4 => }/netfilter/nf_nat_proto_unknown.c    |   16 +-
 net/{ipv4 => }/netfilter/nf_nat_sip.c              |  270 +++++----
 net/{ipv4 => }/netfilter/nf_nat_tftp.c             |    1 -
 net/netfilter/xt_nat.c                             |  170 +++++
 89 files changed, 3461 insertions(+), 1562 deletions(-)

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2012-08-29  9:04 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-09 20:08 [PATCH 00/19] netfilter: IPv6 NAT kaber
2012-08-09 20:08 ` [PATCH 01/19] netfilter: nf_ct_sip: fix helper name kaber
2012-08-14  0:00   ` Pablo Neira Ayuso
2012-08-09 20:08 ` [PATCH 02/19] netfilter: nf_ct_sip: fix IPv6 address parsing kaber
2012-08-14  0:19   ` Pablo Neira Ayuso
2012-08-09 20:08 ` [PATCH 03/19] netfilter: nf_nat_sip: fix via header translation with multiple parameters kaber
2012-08-14  0:28   ` Pablo Neira Ayuso
2012-08-14 12:23     ` Patrick McHardy
2012-08-09 20:08 ` [PATCH 04/19] ipv4: fix path MTU discovery with connection tracking kaber
2012-08-09 20:08 ` [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling kaber
2012-08-17  8:06   ` Jesper Dangaard Brouer
2012-08-18 12:26     ` Patrick McHardy
2012-08-19 19:37       ` Jesper Dangaard Brouer
2012-08-19 19:44         ` Patrick McHardy
2012-08-20 13:13           ` Jesper Dangaard Brouer
2012-08-22 22:21             ` Patrick McHardy
2012-08-21 22:21           ` Jesper Dangaard Brouer
2012-08-26 21:20             ` Patrick McHardy
2012-08-27 10:13               ` Jesper Dangaard Brouer
2012-08-27 10:41                 ` Patrick McHardy
2012-08-27 14:40                   ` [PATCH 0/2] net: ipvs and netfilter IPv6 defrag MTU handling Jesper Dangaard Brouer
2012-08-27 14:40                     ` [PATCH 1/2] ipvs: IPv6 MTU checking cleanup and bugfix Jesper Dangaard Brouer
2012-08-27 14:42                     ` [PATCH 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes Jesper Dangaard Brouer
2012-08-27 15:20                       ` Julian Anastasov
2012-08-28  8:22                         ` Patrick McHardy
2012-08-28  8:28                           ` Simon Horman
2012-08-28 14:21                           ` [PATCH V2 0/2] net: ipvs and netfilter IPv6 defrag MTU handling Jesper Dangaard Brouer
2012-08-28 14:22                             ` [PATCH V2 1/2] ipvs: IPv6 MTU checking cleanup and bugfix Jesper Dangaard Brouer
2012-08-28 20:08                               ` Patrick McHardy
2012-08-28 14:23                             ` [PATCH V2 2/2] ipvs: Extend MTU check to account for IPv6 NAT defrag changes Jesper Dangaard Brouer
2012-08-28 14:49                               ` Eric Dumazet
2012-08-29  7:02                                 ` Jesper Dangaard Brouer
2012-08-29  8:43                                   ` Eric Dumazet
2012-08-29  9:04                                     ` Jesper Dangaard Brouer
2012-08-28 20:10                               ` Patrick McHardy
2012-08-28  9:03                         ` [PATCH " Jesper Dangaard Brouer
2012-08-28  9:47                           ` Julian Anastasov
2012-08-17 13:36   ` [PATCH 05/19] netfilter: nf_conntrack_ipv6: improve fragmentation handling Pablo Neira Ayuso
2012-08-18 12:43     ` Patrick McHardy
2012-08-09 20:08 ` [PATCH 06/19] netfilter: nf_conntrack_ipv6: fix tracking of ICMPv6 error messages containing fragments kaber
2012-08-09 20:08 ` [PATCH 07/19] netfilter: nf_conntrack: restrict NAT helper invocation to IPv4 kaber
2012-08-09 20:08 ` [PATCH 08/19] netfilter: nf_nat: add protoff argument to packet mangling functions kaber
2012-08-09 20:08 ` [PATCH 09/19] netfilter: add protocol independant NAT core kaber
2012-08-09 20:08 ` [PATCH 10/19] netfilter: ipv6: expand skb head in ip6_route_me_harder after oif change kaber
2012-08-09 20:08 ` [PATCH 11/19] net: core: add function for incremental IPv6 pseudo header checksum updates kaber
2012-08-09 20:08 ` [PATCH 12/19] netfilter: ipv6: add IPv6 NAT support kaber
2012-08-09 20:08 ` [PATCH 13/19] netfilter: ip6tables: add MASQUERADE target kaber
2012-08-17 13:11   ` Pablo Neira Ayuso
2012-08-18 12:31     ` Patrick McHardy
2012-08-09 20:08 ` [PATCH 14/19] netfilter: ip6tables: add REDIRECT target kaber
2012-08-09 20:08 ` [PATCH 15/19] netfilter: ip6tables: add NETMAP target kaber
2012-08-09 20:09 ` [PATCH 16/19] netfilter: nf_nat: support IPv6 in FTP NAT helper kaber
2012-08-09 20:09 ` [PATCH 17/19] netfilter: nf_nat: support IPv6 in amanda " kaber
2012-08-09 20:09 ` [PATCH 18/19] netfilter: nf_nat: support IPv6 in SIP " kaber
2012-08-09 20:09 ` [PATCH 19/19] netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target kaber
2012-08-09 21:55   ` Jan Engelhardt
2012-08-09 22:25     ` Patrick McHardy
2012-08-09 20:56 ` [PATCH 00/19] netfilter: IPv6 NAT Eric W. Biederman
2012-08-09 21:52   ` Patrick McHardy
2012-08-09 22:00 ` Pablo Neira Ayuso
2012-08-09 22:30   ` Patrick McHardy
2012-08-17 13:42 ` Pablo Neira Ayuso
2012-08-18 12:46   ` Patrick McHardy
2012-08-25  0:58 ` Andre Tomt
2012-08-25  1:16   ` Andre Tomt
2012-08-26 18:06     ` Patrick McHardy
2012-08-27  7:33   ` Florian Weimer
2012-08-28 21:48 Patrick McHardy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.