New draft document on nat / traversal / hole punching (with Dan Kegel)

* New draft document on nat / traversal / hole punching (with Dan Kegel)
@ 2004-11-09 15:49 Harald Welte
  2004-11-09 18:22 ` Martin Josefsson
  2004-11-16  5:28 ` Rusty Russell
  0 siblings, 2 replies; 20+ messages in thread
From: Harald Welte @ 2004-11-09 15:49 UTC (permalink / raw)
  To: Netfilter Development Mailinglist; +Cc: coreteam

[-- Attachment #1: Type: text/plain, Size: 744 bytes --]

Hi!

On the IETF behave list, I received a link to an as yet unpublished draft
that can be seen as a successor to the old 'kegel draft' (now also
authored by a whole group).  It should be good reading for anyone who
wants to thoroughly understand the problem and see proposed solutions.

It now covers both UDP and TCP.

http://www.brynosaurus.com/pub/os/nat.pdf

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-09 15:49 New draft document on nat / traversal / hole punching (with Dan Kegel) Harald Welte
@ 2004-11-09 18:22 ` Martin Josefsson
  2004-11-09 23:42   ` Rusty Russell
  2004-11-16  5:28 ` Rusty Russell
  1 sibling, 1 reply; 20+ messages in thread
From: Martin Josefsson @ 2004-11-09 18:22 UTC (permalink / raw)
  To: Harald Welte; +Cc: Rusty Russell, Netfilter Development Mailinglist, coreteam

[-- Attachment #1: Type: text/plain, Size: 1060 bytes --]

On Tue, 2004-11-09 at 16:49, Harald Welte wrote:
> Hi!

Hi

> On the IETF behave list, I received a link to an as yet unpublished draft
> that can be seen as a successor to the old 'kegel draft' (now also
> authored by a whole group).  It should be good reading for anyone who
> wants to thoroughly understand the problem and see proposed solutions.
> 
> It now covers both UDP and TCP.
> 
> http://www.brynosaurus.com/pub/os/nat.pdf

Interesting read. Just one thing... the tcp-windowtracking patch
"breaks" TCP "hole punching" unless ip_conntrack_tcp_be_liberal is set
nonzero. Maybe we should introduce an option to enable/disable "hole
punching", a few modifications to the state-table, possibly some other
small things and it should work unless I'm missing something.

Rusty, iirc we have a problem in the NAT core when we SNAT and have to
remap ports: later SNAT'd connections from the same host:port pair
don't get the same mapping as the first one. You were working on a fix
for this? Was this ever made public?

-- 
/Martin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-09 18:22 ` Martin Josefsson
@ 2004-11-09 23:42   ` Rusty Russell
  2004-11-10  6:10     ` David S. Miller
  2004-11-10 16:25     ` Harald Welte
  0 siblings, 2 replies; 20+ messages in thread
From: Rusty Russell @ 2004-11-09 23:42 UTC (permalink / raw)
  To: Martin Josefsson
  Cc: Harald Welte, Netfilter development mailing list, coreteam

On Tue, 2004-11-09 at 19:22 +0100, Martin Josefsson wrote:
> Rusty, iirc we have a problem in the NAT core when we SNAT and have to
> remap ports: later SNAT'd connections from the same host:port pair
> don't get the same mapping as the first one. You were working on a fix
> for this? Was this ever made public?

Dave, please apply.  (Thanks for the reminder Martin).

Rusty.

Name: Fix find_appropriate_src() To Actually Work
Status: Tested under nfsim
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

We try to bind to the same source port when sending packets from the
same source IP/source port to the outside world.  Normally, this is
simple, since we always try to keep the same source port anyway, but
there are cases where that is not available.

This is a requirement for the Kegel Peer-to-Peer NAT paper:

	http://alumnus.caltech.edu/~dank/peer-nat.html

Unfortunately, our current implementation is useless.  It looks up a
hash to see if this srcip/srcpt has been used, but instead of
returning the mapping to use, it simply returns that same srcip/srcpt.
This is clearly wrong, as pointed out by Krisztian Kovacs.

Also, we are no longer using the netfilter-special list iterators, so
we can split src_cmp, in_range and the actual result manipulation for
much clearer code.
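
A minimal userspace sketch of the difference described above, assuming a
flat table instead of the kernel's bysource hash.  The names struct
endpoint, struct mapping, find_src_buggy and find_src_fixed are made up
for illustration and do not exist in ip_nat_core.c; addresses are kept in
host byte order for readability.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not the kernel's ip_conntrack structures. */
struct endpoint {
	uint32_t ip;	/* host byte order, for readability */
	uint16_t port;
};

struct mapping {
	struct endpoint orig;	/* source as sent by the internal host */
	struct endpoint mapped;	/* source as seen by the outside world */
};

static int same_endpoint(const struct endpoint *a, const struct endpoint *b)
{
	return a->ip == b->ip && a->port == b->port;
}

/* Old behaviour: a hit hands back the original source again. */
static const struct endpoint *
find_src_buggy(const struct mapping *tbl, size_t n, const struct endpoint *src)
{
	for (size_t i = 0; i < n; i++)
		if (same_endpoint(&tbl[i].orig, src))
			return &tbl[i].orig;
	return NULL;
}

/* Intended behaviour: a hit hands back the mapping already in use. */
static const struct endpoint *
find_src_fixed(const struct mapping *tbl, size_t n, const struct endpoint *src)
{
	for (size_t i = 0; i < n; i++)
		if (same_endpoint(&tbl[i].orig, src))
			return &tbl[i].mapped;
	return NULL;
}
```

With one entry that maps 10.0.0.2:1234 to public port 62000, the buggy
lookup hands back port 1234 and the fixed lookup hands back 62000.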

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .8770-linux-2.6.9-bk3/net/ipv4/netfilter/ip_nat_core.c .8770-linux-2.6.9-bk3.updated/net/ipv4/netfilter/ip_nat_core.c
--- .8770-linux-2.6.9-bk3/net/ipv4/netfilter/ip_nat_core.c	2004-10-19 14:34:25.000000000 +1000
+++ .8770-linux-2.6.9-bk3.updated/net/ipv4/netfilter/ip_nat_core.c	2004-10-20 11:07:45.000000000 +1000
@@ -118,31 +118,30 @@ ip_nat_used_tuple(const struct ip_conntr
 	return ip_conntrack_tuple_taken(&reply, ignored_conntrack);
 }
 
-/* Does tuple + the source manip come within the range mr */
+/* If we source map this tuple so reply looks like reply_tuple, will
+ * that meet the constraints of mr. */
 static int
 in_range(const struct ip_conntrack_tuple *tuple,
-	 const struct ip_conntrack_manip *manip,
 	 const struct ip_nat_multi_range *mr)
 {
 	struct ip_nat_protocol *proto = ip_nat_find_proto(tuple->dst.protonum);
 	unsigned int i;
-	struct ip_conntrack_tuple newtuple = { *manip, tuple->dst };
 
 	for (i = 0; i < mr->rangesize; i++) {
 		/* If we are allowed to map IPs, then we must be in the
 		   range specified, otherwise we must be unchanged. */
 		if (mr->range[i].flags & IP_NAT_RANGE_MAP_IPS) {
-			if (ntohl(newtuple.src.ip) < ntohl(mr->range[i].min_ip)
-			    || (ntohl(newtuple.src.ip)
+			if (ntohl(tuple->src.ip) < ntohl(mr->range[i].min_ip)
+			    || (ntohl(tuple->src.ip)
 				> ntohl(mr->range[i].max_ip)))
 				continue;
 		} else {
-			if (newtuple.src.ip != tuple->src.ip)
+			if (tuple->src.ip != tuple->src.ip)
 				continue;
 		}
 
 		if (!(mr->range[i].flags & IP_NAT_RANGE_PROTO_SPECIFIED)
-		    || proto->in_range(&newtuple, IP_NAT_MANIP_SRC,
+		    || proto->in_range(tuple, IP_NAT_MANIP_SRC,
 				       &mr->range[i].min, &mr->range[i].max))
 			return 1;
 	}
@@ -150,33 +149,40 @@ in_range(const struct ip_conntrack_tuple
 }
 
 static inline int
-src_cmp(const struct ip_conntrack *ct,
-	const struct ip_conntrack_tuple *tuple,
-	const struct ip_nat_multi_range *mr)
+same_src(const struct ip_conntrack *ct,
+	 const struct ip_conntrack_tuple *tuple)
 {
 	return (ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum
 		== tuple->dst.protonum
 		&& ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.ip
 		== tuple->src.ip
 		&& ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.all
-		== tuple->src.u.all
-		&& in_range(tuple,
-			    &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src, mr));
+		== tuple->src.u.all);
 }
 
 /* Only called for SRC manip */
-static struct ip_conntrack_manip *
+static int
 find_appropriate_src(const struct ip_conntrack_tuple *tuple,
+		     struct ip_conntrack_tuple *result,
 		     const struct ip_nat_multi_range *mr)
 {
 	unsigned int h = hash_by_src(&tuple->src, tuple->dst.protonum);
 	struct ip_conntrack *ct;
 
 	MUST_BE_READ_LOCKED(&ip_nat_lock);
-	list_for_each_entry(ct, &bysource[h], nat.info.bysource)
-		if (src_cmp(ct, tuple, mr))
-			return &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src;
-	return NULL;
+
+	list_for_each_entry(ct, &bysource[h], nat.info.bysource) {
+		if (same_src(ct, tuple)) {
+			/* Copy source part from reply tuple. */
+			invert_tuplepr(result,
+				       &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+			result->dst = tuple->dst;
+
+			if (in_range(result, mr))
+				return 1;
+		}
+	}
+	return 0;
 }
 
 #ifdef CONFIG_IP_NF_NAT_LOCAL
@@ -393,13 +399,7 @@ get_unique_tuple(struct ip_conntrack_tup
 	   So far, we don't do local source mappings, so multiple
 	   manips not an issue.  */
 	if (hooknum == NF_IP_POST_ROUTING) {
-		struct ip_conntrack_manip *manip;
-
-		manip = find_appropriate_src(orig_tuple, mr);
-		if (manip) {
-			/* Apply same source manipulation. */
-			*tuple = ((struct ip_conntrack_tuple)
-				  { *manip, orig_tuple->dst });
+		if (find_appropriate_src(orig_tuple, tuple, mr)) {
 			DEBUGP("get_unique_tuple: Found current src map\n");
 			if (!ip_nat_used_tuple(tuple, conntrack))
 				return 1;

-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-09 23:42   ` Rusty Russell
@ 2004-11-10  6:10     ` David S. Miller
  2004-11-10 12:26       ` KOVACS Krisztian
  2004-11-10 16:25     ` Harald Welte
  1 sibling, 1 reply; 20+ messages in thread
From: David S. Miller @ 2004-11-10  6:10 UTC (permalink / raw)
  To: Rusty Russell; +Cc: laforge, netfilter-devel, gandalf, coreteam

On Wed, 10 Nov 2004 10:42:04 +1100
Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Tue, 2004-11-09 at 19:22 +0100, Martin Josefsson wrote:
> > Rusty, iirc we have a problem in the NAT core when we SNAT and have to
> > remap ports: later SNAT'd connections from the same host:port pair
> > don't get the same mapping as the first one. You were working on a fix
> > for this? Was this ever made public?
> 
> Dave, please apply.  (Thanks for the reminder Martin).

Applied, thanks Rusty.

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-10  6:10     ` David S. Miller
@ 2004-11-10 12:26       ` KOVACS Krisztian
  2004-11-10 13:20         ` [netfilter-core] " Harald Welte
  0 siblings, 1 reply; 20+ messages in thread
From: KOVACS Krisztian @ 2004-11-10 12:26 UTC (permalink / raw)
  To: netfilter-devel; +Cc: coreteam


  Hi,

On Wed, 2004-11-10 at 07:10, David S. Miller wrote:
> On Wed, 10 Nov 2004 10:42:04 +1100
> Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > On Tue, 2004-11-09 at 19:22 +0100, Martin Josefsson wrote:
> > > Rusty, iirc we have a problem in the NAT core when we SNAT and have to
> > > remap ports: later SNAT'd connections from the same host:port pair
> > > don't get the same mapping as the first one. You were working on a fix
> > > for this? Was this ever made public?
> > 
> > Dave, please apply.  (Thanks for the reminder Martin).
> 
> Applied, thanks Rusty.

  BTW, are there any plans to apply this fix to 2.4 as well? (With the
in_range() fix included, of course.)

-- 
 Regards,
   Krisztian KOVACS

* Re: [netfilter-core] Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-10 12:26       ` KOVACS Krisztian
@ 2004-11-10 13:20         ` Harald Welte
  0 siblings, 0 replies; 20+ messages in thread
From: Harald Welte @ 2004-11-10 13:20 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: netfilter-devel, coreteam

[-- Attachment #1: Type: text/plain, Size: 1276 bytes --]

On Wed, Nov 10, 2004 at 01:26:16PM +0100, KOVACS Krisztian wrote:
> 
>   Hi,
> 
> On Wed, 2004-11-10 at 07:10, David S. Miller wrote:
> > On Wed, 10 Nov 2004 10:42:04 +1100
> > Rusty Russell <rusty@rustcorp.com.au> wrote:
> > 
> > > On Tue, 2004-11-09 at 19:22 +0100, Martin Josefsson wrote:
> > > > Rusty, iirc we have a problem in the NAT core when we SNAT and have to
> > > > remap ports: later SNAT'd connections from the same host:port pair
> > > > don't get the same mapping as the first one. You were working on a fix
> > > > for this? Was this ever made public?
> > > 
> > > Dave, please apply.  (Thanks for the reminder Martin).
> > 
> > Applied, thanks Rusty.
> 
>   BTW, are there any plans to apply this fix to 2.4 as well? (With the
> in_range() fix included, of course.)

Yes, certainly. I'm about to do the merge.

>  Regards,
>    Krisztian KOVACS

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-09 23:42   ` Rusty Russell
  2004-11-10  6:10     ` David S. Miller
@ 2004-11-10 16:25     ` Harald Welte
  2004-11-10 17:39       ` [PATCH 2.4] fix find_appropriate_src() to comply with kegel draft Harald Welte
  2004-11-11  0:30       ` New draft document on nat / traversal / hole punching (with Dan Kegel) Rusty Russell
  1 sibling, 2 replies; 20+ messages in thread
From: Harald Welte @ 2004-11-10 16:25 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Netfilter development mailing list, Martin Josefsson, coreteam

[-- Attachment #1: Type: text/plain, Size: 734 bytes --]

On Wed, Nov 10, 2004 at 10:42:04AM +1100, Rusty Russell wrote:
> Name: Fix find_appropriate_src() To Actually Work
> Status: Tested under nfsim
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

> [...]

> -			if (newtuple.src.ip != tuple->src.ip)
> +			if (tuple->src.ip != tuple->src.ip)

Rusty, are you sure this is really what you want?  ;)

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* [PATCH 2.4] fix find_appropriate_src() to comply with kegel draft
  2004-11-10 16:25     ` Harald Welte
@ 2004-11-10 17:39       ` Harald Welte
  2004-11-11  0:30       ` New draft document on nat / traversal / hole punching (with Dan Kegel) Rusty Russell
  1 sibling, 0 replies; 20+ messages in thread
From: Harald Welte @ 2004-11-10 17:39 UTC (permalink / raw)
  To: Rusty Russell, Martin Josefsson; +Cc: Netfilter development mailing list

[-- Attachment #1: Type: text/plain, Size: 5570 bytes --]

Hi!

This is my 2.4.x merge, as yet untested:

Name: Fix find_appropriate_src() To Actually Work
Status: Awaiting testing
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Harald Welte <laforge@netfilter.org>

We try to bind to the same source port when sending packets from the
same source IP/source port to the outside world.  Normally, this is
simple, since we always try to keep the same source port anyway, but
there are cases where that is not available.

This is a requirement for the Kegel Peer-to-Peer NAT paper:

	http://alumnus.caltech.edu/~dank/peer-nat.html

Unfortunately, our current implementation is useless.  It looks up a
hash to see if this srcip/srcpt has been used, but instead of
returning the mapping to use, it simply returns that same srcip/srcpt.
This is clearly wrong, as pointed out by Krisztian Kovacs.

Also, we fix a long-standing in_range() bug that was first fixed in 2.4.23, but
rolled back because it caused infinite loops.  However, this has since been
fixed and tested in 2.6.x by adding an additional ip_nat_used_tuple() in
get_unique_tuple().

diff -Nru --exclude .depend --exclude '*.o' --exclude '*.ko' --exclude '*.ver' --exclude '.*.flags' --exclude '*.orig' --exclude '*.rej' --exclude '*.cmd' --exclude '*.mod.c' --exclude '*~' linux-2.4.28-rc2-plain/net/ipv4/netfilter/ip_nat_core.c linux-2.4.28-rc2-kegel/net/ipv4/netfilter/ip_nat_core.c
--- linux-2.4.28-rc2-plain/net/ipv4/netfilter/ip_nat_core.c	2004-08-08 01:26:06.000000000 +0200
+++ linux-2.4.28-rc2-kegel/net/ipv4/netfilter/ip_nat_core.c	2004-11-10 18:32:45.189229469 +0100
@@ -134,31 +134,30 @@
 	return ip_conntrack_tuple_taken(&reply, ignored_conntrack);
 }
 
-/* Does tuple + the source manip come within the range mr */
+/* If we source map this tuple so reply looks like reply_tuple, will
+ * that meet the constraints of mr. */
 static int
 in_range(const struct ip_conntrack_tuple *tuple,
-	 const struct ip_conntrack_manip *manip,
 	 const struct ip_nat_multi_range *mr)
 {
 	struct ip_nat_protocol *proto = find_nat_proto(tuple->dst.protonum);
 	unsigned int i;
-	struct ip_conntrack_tuple newtuple = { *manip, tuple->dst };
 
 	for (i = 0; i < mr->rangesize; i++) {
 		/* If we are allowed to map IPs, then we must be in the
 		   range specified, otherwise we must be unchanged. */
 		if (mr->range[i].flags & IP_NAT_RANGE_MAP_IPS) {
-			if (ntohl(newtuple.src.ip) < ntohl(mr->range[i].min_ip)
-			    || (ntohl(newtuple.src.ip)
+			if (ntohl(tuple->src.ip) < ntohl(mr->range[i].min_ip)
+			    || (ntohl(tuple->src.ip)
 				> ntohl(mr->range[i].max_ip)))
 				continue;
 		} else {
-			if (newtuple.src.ip != tuple->src.ip)
+			if (tuple->src.ip != tuple->src.ip)
 				continue;
 		}
 
 		if ((mr->range[i].flags & IP_NAT_RANGE_PROTO_SPECIFIED)
-		    && proto->in_range(&newtuple, IP_NAT_MANIP_SRC,
+		    || proto->in_range(tuple, IP_NAT_MANIP_SRC,
 				       &mr->range[i].min, &mr->range[i].max))
 			return 1;
 	}
@@ -166,36 +165,38 @@
 }
 
 static inline int
-src_cmp(const struct ip_nat_hash *i,
-	const struct ip_conntrack_tuple *tuple,
-	const struct ip_nat_multi_range *mr)
+same_src(const struct ip_nat_hash *i,
+	 const struct ip_conntrack_tuple *tuple)
 {
 	return (i->conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum
 		== tuple->dst.protonum
 		&& i->conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.ip
 		== tuple->src.ip
 		&& i->conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.all
-		== tuple->src.u.all
-		&& in_range(tuple,
-			    &i->conntrack->tuplehash[IP_CT_DIR_ORIGINAL]
-			    .tuple.src,
-			    mr));
+		== tuple->src.u.all);
 }
 
 /* Only called for SRC manip */
-static struct ip_conntrack_manip *
+static int
 find_appropriate_src(const struct ip_conntrack_tuple *tuple,
+		     struct ip_conntrack_tuple *result,
 		     const struct ip_nat_multi_range *mr)
 {
 	unsigned int h = hash_by_src(&tuple->src, tuple->dst.protonum);
 	struct ip_nat_hash *i;
 
 	MUST_BE_READ_LOCKED(&ip_nat_lock);
-	i = LIST_FIND(&bysource[h], src_cmp, struct ip_nat_hash *, tuple, mr);
-	if (i)
-		return &i->conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src;
-	else
-		return NULL;
+	i = LIST_FIND(&bysource[h], same_src, struct ip_nat_hash *, tuple);
+	if (i) {
+		/* Copy source part from reply tuple */
+		invert_tuplepr(result, 
+			&i->conntrack->tuplehash[IP_CT_DIR_REPLY].tuple);
+		result->dst = tuple->dst;
+
+		if (in_range(result, mr))
+			return 1;
+	}
+	return 0;
 }
 
 #ifdef CONFIG_IP_NF_NAT_LOCAL
@@ -412,15 +413,10 @@
 	   So far, we don't do local source mappings, so multiple
 	   manips not an issue.  */
 	if (hooknum == NF_IP_POST_ROUTING) {
-		struct ip_conntrack_manip *manip;
-
-		manip = find_appropriate_src(orig_tuple, mr);
-		if (manip) {
-			/* Apply same source manipulation. */
-			*tuple = ((struct ip_conntrack_tuple)
-				  { *manip, orig_tuple->dst });
+		if (find_appropriate_src(orig_tuple, tuple, mr)) {
 			DEBUGP("get_unique_tuple: Found current src map\n");
-			return 1;
+			if (!ip_nat_used_tuple(tuple, conntrack))
+				return 1;
 		}
 	}
 

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-11  0:30       ` New draft document on nat / traversal / hole punching (with Dan Kegel) Rusty Russell
@ 2004-11-11  0:26         ` David S. Miller
  2004-11-11  4:23           ` Rusty Russell
  0 siblings, 1 reply; 20+ messages in thread
From: David S. Miller @ 2004-11-11  0:26 UTC (permalink / raw)
  To: Rusty Russell; +Cc: laforge, netfilter-devel, gandalf, coreteam

On Thu, 11 Nov 2004 11:30:56 +1100
Rusty Russell <rusty@rustcorp.com.au> wrote:

> Ew... Dave, please drop that patch as it was crap.
> 
> I merged two patches (a cleanup and the fix) and didn't re-test.
> 
> I'll fix, test and re-send.

Please send something relative, I just pushed your change
to Linus.

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-10 16:25     ` Harald Welte
  2004-11-10 17:39       ` [PATCH 2.4] fix find_appropriate_src() to comply with kegel draft Harald Welte
@ 2004-11-11  0:30       ` Rusty Russell
  2004-11-11  0:26         ` David S. Miller
  1 sibling, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2004-11-11  0:30 UTC (permalink / raw)
  To: Harald Welte
  Cc: Netfilter development mailing list, Martin Josefsson, coreteam

On Wed, 2004-11-10 at 17:25 +0100, Harald Welte wrote:
> On Wed, Nov 10, 2004 at 10:42:04AM +1100, Rusty Russell wrote:
> > Name: Fix find_appropriate_src() To Actually Work
> > Status: Tested under nfsim
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> > [...]
> 
> > -			if (newtuple.src.ip != tuple->src.ip)
> > +			if (tuple->src.ip != tuple->src.ip)
> 
> Rusty, are you sure this is really what you want?  ;)

Ew... Dave, please drop that patch as it was crap.

I merged two patches (a cleanup and the fix) and didn't re-test.

I'll fix, test and re-send.

Sorry,
Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-11  0:26         ` David S. Miller
@ 2004-11-11  4:23           ` Rusty Russell
  2004-11-15 23:09             ` David S. Miller
  0 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2004-11-11  4:23 UTC (permalink / raw)
  To: David S. Miller
  Cc: Harald Welte, Netfilter development mailing list, gandalf, coreteam

On Wed, 2004-11-10 at 16:26 -0800, David S. Miller wrote:
> On Thu, 11 Nov 2004 11:30:56 +1100
> Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > Ew... Dave, please drop that patch as it was crap.
> > 
> > I merged two patches (a cleanup and the fix) and didn't re-test.
> > 
> > I'll fix, test and re-send.
> 
> Please send something relative, I just pushed your change
> to Linus.

Here it is.  Turns out it's a noop change.

Name: Cleanup find_appropriate_src() Fix
Status: Tested under nfsim
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

The fix to find_appropriate_src left a stupid test, which has no
effect but is incoherent at best.  It covers a corner case: if a
previous connection from this source was mapped onto a different IP
address (because it was explicitly told to), should the next one be
mapped that way as well, if no explicit rule says to remap the src IP?

E.g. the admin says "NAT connections with DST 10.* to SRC 10.0.0.1", and a
machine sends out a UDP from SRC=a.b.c.d:p to 10.* (which gets
remapped to SRC=10.0.0.1:p), then sends from the same SRC=a.b.c.d:p to
a non-10.*, should we also map the second one to 10.0.0.1:p?

I think the answer is yes: unless the user explicitly tells us to map
into a particular range, we should follow the Kegel draft.
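
A rough userspace sketch of the IP part of the check as it stands after
this cleanup, assuming a single range rather than an ip_nat_multi_range.
RANGE_MAP_IPS, struct nat_range and ip_in_range are illustrative names,
not kernel identifiers.  Without the flag the range places no constraint
on the source IP, so a previously chosen mapping such as 10.0.0.1 is
accepted.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define RANGE_MAP_IPS 0x1	/* hypothetical, mirrors IP_NAT_RANGE_MAP_IPS */

struct nat_range {
	unsigned int flags;
	uint32_t min_ip;	/* network byte order */
	uint32_t max_ip;	/* network byte order */
};

/* Returns 1 if src_ip (network byte order) satisfies range r. */
static int ip_in_range(uint32_t src_ip, const struct nat_range *r)
{
	if (r->flags & RANGE_MAP_IPS) {
		/* Compare in host byte order so that < and > are meaningful. */
		if (ntohl(src_ip) < ntohl(r->min_ip) ||
		    ntohl(src_ip) > ntohl(r->max_ip))
			return 0;
	}
	/* No MAP_IPS flag: the rule does not constrain the source IP. */
	return 1;
}
```

A range with the flag set and min_ip = max_ip = 10.0.0.1 accepts only
that address, while a range without the flag accepts any source IP.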

--- working-2.6.10-rc1-bk19-fix-source-matching/net/ipv4/netfilter/ip_nat_core.c.~1~	2004-11-11 12:17:30.000000000 +1100
+++ working-2.6.10-rc1-bk19-fix-source-matching/net/ipv4/netfilter/ip_nat_core.c	2004-11-11 12:27:16.000000000 +1100
@@ -128,16 +128,13 @@
 	unsigned int i;
 
 	for (i = 0; i < mr->rangesize; i++) {
-		/* If we are allowed to map IPs, then we must be in the
-		   range specified, otherwise we must be unchanged. */
+		/* If we are supposed to map IPs, then we must be in the
+		   range specified. */
 		if (mr->range[i].flags & IP_NAT_RANGE_MAP_IPS) {
 			if (ntohl(tuple->src.ip) < ntohl(mr->range[i].min_ip)
 			    || (ntohl(tuple->src.ip)
 				> ntohl(mr->range[i].max_ip)))
 				continue;
-		} else {
-			if (tuple->src.ip != tuple->src.ip)
-				continue;
 		}
 
 		if (!(mr->range[i].flags & IP_NAT_RANGE_PROTO_SPECIFIED)

-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-11  4:23           ` Rusty Russell
@ 2004-11-15 23:09             ` David S. Miller
  0 siblings, 0 replies; 20+ messages in thread
From: David S. Miller @ 2004-11-15 23:09 UTC (permalink / raw)
  To: Rusty Russell; +Cc: laforge, netfilter-devel, gandalf, coreteam

On Thu, 11 Nov 2004 15:23:17 +1100
Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Wed, 2004-11-10 at 16:26 -0800, David S. Miller wrote:
> > On Thu, 11 Nov 2004 11:30:56 +1100
> > Rusty Russell <rusty@rustcorp.com.au> wrote:
> > 
> > > Ew... Dave, please drop that patch as it was crap.
> > > 
> > > I merged two patches (a cleanup and the fix) and didn't re-test.
> > > 
> > > I'll fix, test and re-send.
> > 
> > Please send something relative, I just pushed your change
> > to Linus.
> 
> Here it is.  Turns out it's a noop change.

Applied, thanks a lot Rusty.

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-09 15:49 New draft document on nat / traversal / hole punching (with Dan Kegel) Harald Welte
  2004-11-09 18:22 ` Martin Josefsson
@ 2004-11-16  5:28 ` Rusty Russell
  2004-11-16 20:38   ` Bryan Ford
  1 sibling, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2004-11-16  5:28 UTC (permalink / raw)
  To: Harald Welte
  Cc: srisures, baford, Netfilter development mailing list,
	Netfilter Core Team, dank

On Tue, 2004-11-09 at 16:49 +0100, Harald Welte wrote:
> Hi!
> 
> On the IETF behave list, I received a link to an as yet unpublished draft
> that can be seen as a successor to the old 'kegel draft' (now also
> authored by a whole group).  It should be good reading for anyone who
> wants to thoroughly understand the problem and see proposed solutions.
> 
> It now covers both UDP and TCP.
> 
> http://www.brynosaurus.com/pub/os/nat.pdf

OK, I actually like this draft (although the classic NAT terminology
still leaves me cold - no matter).  Kudos, Bryan/Pyda/Dan!

The exploration of the multiple layers problem was new to me, which was
a pleasant surprise (in my own exploration of these issues I never got
past "Ick, don't do that!").  I'm not sure whether we can implement
hairpin NAT sanely though.

In iptables-speak, assume:
	NAT A: does SNAT -> 10.0.1.1 on conns to external.
	NAT B: does SNAT -> 10.0.1.2 on conns to external.
	NAT C: does SNAT -> 155.99.25.11 on conns to external.

When Client A talks to B(public), NAT C will see:
	10.0.1.1:45000 -> 155.99.25.11:62005

It's a new connection, and without an explicit rule, it will simply be
sent to the NAT C box itself (assuming 155.99.25.11 is its real IP
address), in the assumption that it is an incoming local connection.
NAT C will send an ICMP error or TCP RST.  The packet from Client B to
A(public) will do the same thing.

Now, what do we want to happen?  NAT C should scour its tables and see
if anyone is using 155.99.25.11:62005 as an output address, and assume
that this connection should be DNATted towards them (the source-lookup
mapping should ensure that this will be SNATted to 155.99.25.11:62000
anyway).  "iptables -t nat -i eth0 -A PREROUTING -d 155.99.25.11 -j
HAIRPIN" anyone?  We don't have a hash we can search for this case,
though, so it'd be incredibly slow, or we'd need to add one.
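
The slow table scan could look roughly like this in userspace.  This is
only a sketch: struct endpoint, struct nat_entry and hairpin_target are
invented names, the table is a plain array rather than conntrack state,
and addresses are in host byte order for readability.

```c
#include <stddef.h>
#include <stdint.h>

struct endpoint {
	uint32_t ip;	/* host byte order, for readability */
	uint16_t port;
};

/* Hypothetical record of one source mapping held by the NAT. */
struct nat_entry {
	struct endpoint private_src;	/* address:port seen on the inside */
	struct endpoint public_src;	/* address:port allocated on the NAT */
};

/* Linear scan: which inside endpoint currently owns this public endpoint?
 * Returns NULL if nobody uses it as an output address. */
static const struct endpoint *
hairpin_target(const struct nat_entry *tbl, size_t n,
	       const struct endpoint *dst)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].public_src.ip == dst->ip &&
		    tbl[i].public_src.port == dst->port)
			return &tbl[i].private_src;
	return NULL;
}
```

Given entries for 10.0.1.1:45000 (public port 62000) and for a 10.0.1.2
source with an arbitrary port (public port 62005), a packet addressed to
155.99.25.11:62005 resolves to the 10.0.1.2 entry.  The scan visits every
entry, which is why a dedicated hash keyed on the public endpoint would be
needed for this to be fast.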

Most NAT explanations and implementations assume that each connection
gets exclusive use of the port in the NAT space (except connections
with the same SRC already).  Unfortunately I decided that being able to
preserve source ports was generally more important, but this is looking
more questionable now.  One benefit is that we can put thousands of
hosts behind a single IP address, but maybe that's worth sacrificing?

Thoughts welcome,
Rusty.
PS.  Even before our "lookup previous connections from same source" fix,
we usually got the desired "cone" behaviour because we try to preserve
source ports.  So in practice that bug doesn't bite very hard.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-16  5:28 ` Rusty Russell
@ 2004-11-16 20:38   ` Bryan Ford
  2004-11-17  3:58     ` Rusty Russell
  2004-11-17 16:04     ` Henrik Nordstrom
  0 siblings, 2 replies; 20+ messages in thread
From: Bryan Ford @ 2004-11-16 20:38 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Harald Welte, srisures, Netfilter development mailing list,
	Netfilter Core Team, dank

Hi Rusty,

On Tuesday 16 November 2004 00:28, Rusty Russell wrote:
> Most NAT explanations and implementations assume that each connection
> gets exclusive use of the port in the NAT space (except connections
> with the same SRC already).  Unfortunately I decided that being able to
> preserve source ports was generally more important, but this is looking
> more questionable now.  One benefit is that we can put thousands of
> hosts behind a single IP address, but maybe that's worth sacrificing?

I don't know much about the details of iptables or other NAT implementations, 
but from a high-level viewpoint it seems to me that implementing hairpin 
translation should only require facilities that must already be implemented 
somehow in any port-restricted cone NAT.  (Symmetric NAT is completely 
P2P-unfriendly by definition and thus moot for our purposes, and other types 
of cone NAT just have more permissive firewall policies.)  In particular, a 
port-restricted cone must somehow implement two different kinds of "tables", 
and be able to perform two correspondingly different kinds of "lookups":

1. First, the NAT has to have some kind of "session table" or "connection
table" keyed on the IP addresses and port numbers of _both_ ends of the
connection, recording whether each endpoint is internal or external, which
endpoint(s) need to be translated, and how.  Packets are filtered on a
connection-by-connection basis using this table (which is what makes the NAT 
port-restricted rather than full cone or something else), and the common-case 
translation path only uses this table.  In the common case, one endpoint of a 
connection is on each side of the NAT, and only one endpoint gets translated 
- but for hairpin translation, the NAT must have enough flexibility to allow 
_both_ endpoints of a connection to be on the same side of the NAT, and hence 
for both endpoints to be translated.  

2. Second, the NAT has to have a "port table" or "mapping table" of some kind, 
which records the association between the NAT's own public ports and the 
private endpoints (private IP, private port pairs) they currently represent.  
The important thing is that the port table is mapped by only _one_ of the 
communication endpoints - the internal endpoint - and thus there can be many 
active sessions corresponding to a single port table entry.  When an internal 
node initiates a new outgoing session, causing a "miss" in the NAT's session 
table, the NAT must look up the internal (source) endpoint in its port table 
to see if it's already allocated a public (translated) port at the NAT's IP 
address, and if so, reuse that existing public port as it sets up the new 
session.  Again, this "port table" functionality must somehow already be 
implemented by any cone NAT, independent of the hairpin translation issue; a 
NAT that doesn't do this is a symmetric NAT (or worse), and thus is 
completely broken from a P2P perspective.  Hairpin translation just relates 
to the special case where a new "outgoing" session is directed at one of the 
NAT's own public (translated) ports, in which case the NAT must set up a 
session in which both endpoints are translated.
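
A minimal sketch of the two tables described above, assuming a single-address NAT and ignoring protocols and timeouts. The names `ConeNat`, `NAT_IP`, and the starting port 62000 are hypothetical and chosen only for illustration; this is not how iptables works.

```python
from dataclasses import dataclass, field

NAT_IP = "155.99.25.11"  # hypothetical public address of the NAT


@dataclass
class ConeNat:
    next_port: int = 62000  # hypothetical first public port to hand out
    # Port table: internal endpoint -> public port (one entry per endpoint).
    port_table: dict = field(default_factory=dict)
    # Reverse view of the port table: public port -> internal endpoint.
    public_ports: dict = field(default_factory=dict)
    # Session table: (internal endpoint, remote endpoint) pairs allowed through.
    sessions: set = field(default_factory=set)

    def _public_port(self, internal):
        # Reuse the existing mapping if this endpoint already has one.
        if internal not in self.port_table:
            self.port_table[internal] = self.next_port
            self.public_ports[self.next_port] = internal
            self.next_port += 1
        return self.port_table[internal]

    def outbound(self, internal, dst):
        """Return (translated src, translated dst), or None if filtered."""
        src_pub = (NAT_IP, self._public_port(internal))
        self.sessions.add((internal, dst))
        if dst[0] == NAT_IP:
            # Hairpin: the destination is one of our own public ports, so
            # both endpoints are translated. The peer must already have
            # opened a session toward src_pub (port-restricted filtering).
            peer = self.public_ports.get(dst[1])
            if peer is None or (peer, src_pub) not in self.sessions:
                return None
            return src_pub, peer
        return src_pub, dst

    def inbound(self, remote, public_port):
        """Return the internal endpoint, or None if no session matches."""
        internal = self.public_ports.get(public_port)
        if internal is None or (internal, remote) not in self.sessions:
            return None
        return internal
```

In this sketch the hairpin case needs no extra table: it reuses the same port-table lookup that keeps the mapping stable across sessions.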

Again, I don't know the details of iptables, and I know this high-level 
viewpoint probably doesn't line up very closely with how iptables currently 
does translation.  The important point, however, is that hairpin translation 
"in theory" should require just a slight generalization of the hash tables or 
other lookup facilities that need to be implemented anyway in any 
port-restricted cone NAT.  Does this make sense?

> PS.  Even before our "lookup previous connections from same source" fix,
> we usually got the desired "cone" behaviour because we try to preserve
> source ports.  So in practice that bug doesn't bite very hard.

By trying to preserve source ports, I assume you mean that if a client on the 
internal network uses source port number 1234 for a connection, the NAT first 
tries to allocate port number 1234 at its IP address as the public port for 
that connection?  That's a sensible feature, but it of course has very 
limited benefit as soon as multiple hosts behind the same NAT make outgoing 
connections from the same source port number.  For P2P-friendliness the 
important requirement is that the NAT preserves the _identity_ of an 
internal-to-external translation across multiple sessions originating from 
the same internal port, regardless of whether it can preserve the actual 
_port number_ (which is often impossible).  I assume that by your "lookup 
previous connections from same source" fix you meant the latter, 
identity-preserving functionality.  In which case you should in theory be 
able to use approximately the same lookup functionality needed for this fix 
to implement hairpin translation.  Unless I'm missing something important, 
which of course is quite possible. :)

Anyway, I hope this helps.

Cheers,
Bryan

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-16 20:38   ` Bryan Ford
@ 2004-11-17  3:58     ` Rusty Russell
  2004-11-17 16:33       ` Bryan Ford
  2004-11-17 16:04     ` Henrik Nordstrom
  1 sibling, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2004-11-17  3:58 UTC (permalink / raw)
  To: Bryan Ford
  Cc: Harald Welte, srisures, Netfilter development mailing list,
	Netfilter Core Team, dank

On Tue, 2004-11-16 at 15:38 -0500, Bryan Ford wrote:
> Hi Rusty,
> 
> On Tuesday 16 November 2004 00:28, Rusty Russell wrote:
> > Most NAT explanations and implementations assume that each connection
> > gets exclusive use of the port in the NAT space (exception connections
> > with same SRC already).  Unfortunately I decided that being able to
> > preserve source ports was generally more important, but this is looking
> > more questionable now.  One benefit is that we can put thousands of
> > hosts behind a single IP address, but maybe that's worth sacrificing?
> 
> I don't know much about the details of iptables or other NAT implementations, 
> but from a high-level viewpoint it seems to me that implementing hairpin 
> translation should only require facilities that must already be implemented 
> somehow in any port-restricted cone NAT.  (Symmetric NAT is completely 
> P2P-unfriendly by definition and thus moot for our purposes, and other types 
> of cone NAT just have more permissive firewall policies.)  In particular, a 
> port-restricted cone must somehow implement two different kinds of "tables", 
> and be able to perform two correspondingly different kinds of "lookups":
> 
> 1. First, the NAT has to have some kind of "session table" or "connection 
> table" keyed on the IP addresses and port numbers of _both_ ends of the 
> connection, whether the endpoint is internal or external, which endpoint(s) 
> need to be translated, and how.  Packets are filtered on a 
> connection-by-connection basis using this table (which is what makes the NAT 
> port-restricted rather than full cone or something else), and the common-case 
> translation path only uses this table.  In the common case, one endpoint of a 
> connection is on each side of the NAT, and only one endpoint gets translated 
> - but for hairpin translation, the NAT must have enough flexibility to allow 
> _both_ endpoints of a connection to be on the same side of the NAT, and hence 
> for both endpoints to be translated.  

Indeed, we don't really have the concept of sides inherent in our NAT
implementation, but re-routing a packet out the same interface after
modification is indeed possible (note that we also suppress ICMP
redirect which normally gets sent when this happens).

We use a "conntrack" hash table of 5-tuples
(PROTO-SRCIP:SPT=>DSTIP:DPT); each connection has two of them
(symmetrical in the case of no NAT).

> 2. Second, the NAT has to have a "port table" or "mapping table" of some kind, 
> which records the association between the NAT's own public ports and the 
> private endpoints (private IP, private port pairs) they currently represent.  
> The important thing is that the port table is mapped by only _one_ of the 
> communication endpoints - the internal endpoint - and thus there can be many 
> active sessions corresponding to a single port table entry.  When an internal 
> node initiates a new outgoing session, causing a "miss" in the NAT's session 
> table, the NAT must look up the internal (source) endpoint in its port table 
> to see if it's already allocated a public (translated) port at the NAT's IP 
> address, and if so, reuse that existing public port as it sets up the new 
> session.  Again, this "port table" functionality must somehow already be 
> implemented by any cone NAT, independent of the hairpin translation issue; a 
> NAT that doesn't do this is a symmetric NAT (or worse), and thus is 
> completely broken from a P2P perspective.  Hairpin translation just relates 
> to the special case where a new "outgoing" session is directed at one of the 
> NAT's own public (translated) ports, in which case the NAT must set up a 
> session in which both endpoints are translated.

This is where we have problems.  We do not have the concept of "public
ports".  Our NAT algorithm for new connections is as follows.  It runs
twice, once for source NAT and once for destination, but we can ignore
that and concentrate on source mapping here:

For connection/packet PROTO-SRCIP:SPT=>DSTIP:DPT:

1) Look up user-defined rules to decide what range to NAT this to
(default is <SRCIP:anyport> if no rule).  We support ranges.

2) Look up a "bysource" hash table of (PROTO, SRCIP, SPT).  If another
connection from the same endpoint is found, and the same mapping gives a
connection within the range, try that.  If that mapping creates a unique
reply (ie. there's no PROTO-DSTIP:DPT=>mapped SRCIP:mapped SPT in our
"conntrack" hash table), we're done.

3) Try a mapping which doesn't change SPT, ie. just changing the SRCIP
to each IP address in the range.  If that mapping creates a unique
reply, we're done.

4) Try altering the SPT as well (we attempt to keep high ports high,
etc).  If this fails, we drop the connection.

Consider two machines A and B being NATed through N to servers Sa, Sb
and Sc respectively (rule says NAT source to N):

	A:45000 -> Sa:80  <=>  N:45000 -> Sa:80 (rule 3)
	B:45000 -> Sb:80  <=>  N:45000 -> Sb:80 (rule 3)
	A:45000 -> Sc:80  <=>  N:45000 -> Sc:80 (rule 2)
	B:45000 -> Sc:80  <=>  N:45001 -> Sc:80 (rule 2 clash, rule 4)
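
The four steps and the example above can be sketched roughly as follows. This is a simplified model, not the netfilter code: it assumes a single protocol, a single NAT address (so step 1 is fixed), remembers only the first mapping per source, and uses a hypothetical upward port scan for step 4. The class and method names are made up for illustration.

```python
class OverloadingNat:
    def __init__(self, nat_ip):
        self.nat_ip = nat_ip
        self.bysource = {}    # original src endpoint -> mapped src endpoint
        self.replies = set()  # (dst, mapped src) reply tuples in use

    def _try(self, src, dst, mapped, rule):
        # A mapping is acceptable only if the reply tuple is unique.
        if (dst, mapped) in self.replies:
            return None
        self.replies.add((dst, mapped))
        self.bysource.setdefault(src, mapped)
        return mapped, rule

    def new_connection(self, src, dst):
        # Step 1: rule lookup, fixed here to "NAT source to nat_ip".
        # Step 2: reuse the mapping of an earlier connection from src.
        if src in self.bysource:
            result = self._try(src, dst, self.bysource[src], "rule 2")
            if result:
                return result
        # Step 3: change only the source IP, keep the source port.
        result = self._try(src, dst, (self.nat_ip, src[1]), "rule 3")
        if result:
            return result
        # Step 4: alter the port too, keeping high ports high (assumed
        # ranges: 1024-65535 for high ports, 1-1023 for low ports).
        lo, hi = (1024, 65535) if src[1] >= 1024 else (1, 1023)
        span = hi - lo + 1
        for offset in range(1, span):
            port = lo + (src[1] - lo + offset) % span
            result = self._try(src, dst, (self.nat_ip, port), "rule 4")
            if result:
                return result
        return None  # no unique mapping found: drop the connection


nat = OverloadingNat("N")
A, B = ("A", 45000), ("B", 45000)
for src, dst in [(A, ("Sa", 80)), (B, ("Sb", 80)),
                 (A, ("Sc", 80)), (B, ("Sc", 80))]:
    print(src, "->", dst, "mapped to", nat.new_connection(src, dst))
```

Under these assumptions the last connection is the only one whose port changes, because only there does the reply tuple collide.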

The reason I implemented NAT this way (back in 1998/99) was that not
disturbing port numbers avoided breaking a number of protocols.  It also
turns out that overloading the ports also allows more connections than
65535 * number of IP addresses.

I hope that explains?  We're going to have to think hard about changing
to a port reservation architecture if we really want the hairpin
feature.

Cheers,
Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-16 20:38   ` Bryan Ford
  2004-11-17  3:58     ` Rusty Russell
@ 2004-11-17 16:04     ` Henrik Nordstrom
  2004-11-17 16:47       ` Bryan Ford
  1 sibling, 1 reply; 20+ messages in thread
From: Henrik Nordstrom @ 2004-11-17 16:04 UTC (permalink / raw)
  To: Bryan Ford
  Cc: srisures, Netfilter development mailing list, Harald Welte,
	Rusty Russell, dank, Netfilter Core Team

On Tue, 16 Nov 2004, Bryan Ford wrote:

> 2. Second, the NAT has to have a "port table" or "mapping table" of some kind,
> which records the association between the NAT's own public ports and the
> private endpoints (private IP, private port pairs) they currently represent.

And this is what you do not have in netfilter/iptables NAT.

The netfilter/iptables NAT always works with full tuples, one original 
(before NAT) and one reply (inverse after NAT). The port is not reserved 
and can be reused by any number of different connections from any number 
of clients; what is guaranteed is that the resulting tuples are always 
unique on both sides.

The netfilter/iptables NAT tries to preserve as much as possible of the 
original tuple, and almost always preserves the original source port used 
by the client. Only if there are conflicts after NAT with already 
established sessions is the source port rewritten to keep the NAT:ed tuple 
unique.

Also, netfilter/iptables NAT does not have a concept of internal or 
external. Only original and reply. The concept of internal vs external is 
mainly an administrative concept, not a technical one.
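
As a rough illustration of "original" and "reply" tuples, here is a minimal sketch. The names `Tuple5`, `invert`, and `make_connection` are hypothetical and do not correspond to kernel identifiers; the addresses are documentation examples.

```python
from collections import namedtuple

Tuple5 = namedtuple("Tuple5", "proto src sport dst dport")


def invert(t):
    """Swap the two endpoints of a tuple."""
    return Tuple5(t.proto, t.dst, t.dport, t.src, t.sport)


def make_connection(original, nat_src=None):
    """Return (original, reply); the reply is the inverse after NAT."""
    after_nat = original
    if nat_src is not None:
        after_nat = original._replace(src=nat_src[0], sport=nat_src[1])
    return original, invert(after_nat)


orig = Tuple5("tcp", "10.0.0.1", 1234, "192.0.2.1", 80)
original, reply = make_connection(orig, nat_src=("203.0.113.5", 1234))
print(original)
print(reply)
```

Without NAT the reply is simply the inverse of the original, so the two tuples are symmetrical; with source NAT only the reply carries the translated address.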

> Again, I don't know the details of iptables, and I know this high-level
> viewpoint probably doesn't line up very closely with how iptables currently
> does translation.  The important point, however, is that hairpin translation
> "in theory" should require just a slight generalization of the hash tables or
> other lookup facilities that need to be implemented anyway in any
> port-restricted cone NAT.  Does this make sense?

Yes.

And the issue here is that iptables is not port-restricted cone NAT and 
does not reserve any ports.

> By trying to preserve source ports, I assume you mean that if a client on the
> internal network uses source port number 1234 for a connection, the NAT first
> tries to allocate port number 1234 at its IP address as the public port for
> that connection?  That's a sensible feature, but it of course has very
> limited benefit as soon as multiple hosts behind the same NAT make outgoing
> connections from the same source port number.

As long as they make connections to different destinations their source 
ports are still preserved.

The source port needs to be rewritten only if you have two clients using 
the exact same source port going to the exact same destination ip:port and 
being masqueraded/NAT:ed to a single external IP address.

> For P2P-friendliness the important requirement is that the NAT 
> preserves the _identity_ of an internal-to-external translation across 
> multiple sessions originating from the same internal port, regardless of 
> whether it can preserve the actual _port number_ (which is often 
> impossible).

And as you might see from the above, netfilter NAT currently does not have 
this property. Instead it has many nice properties which make it work 
extremely well in high traffic situations with millions of 
concurrent connections, all being masqueraded/NAT:ed to the same IP 
address if you like (in such case limited to ca 65K connections per 
destination ip:port, per external IP NAT:ed to, or 1K connections per .. 
for low ports)

I think most of the Netfilter developers understand the problems and 
approaches quite well by now. The question is how much of the quite nice 
properties we have today should/need to be sacrificed in order to become 
more P2P friendly.

Regards
Henrik

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-17  3:58     ` Rusty Russell
@ 2004-11-17 16:33       ` Bryan Ford
  2004-11-18  2:47         ` Rusty Russell
  0 siblings, 1 reply; 20+ messages in thread
From: Bryan Ford @ 2004-11-17 16:33 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Harald Welte, srisures, Netfilter development mailing list,
	Netfilter Core Team, dank

On Tuesday 16 November 2004 22:58, Rusty Russell wrote:
> Consider two machines A and B being NATed through N to servers Sa, Sb
> and Sc respectively (rule says NAT source to N):
>
> 	A:45000 -> Sa:80  <=>  N:45000 -> Sa:80 (rule 3)
> 	B:45000 -> Sb:80  <=>  N:45000 -> Sb:80 (rule 3)
> 	A:45000 -> Sc:80  <=>  N:45000 -> Sc:80 (rule 2)
> 	B:45000 -> Sc:80  <=>  N:45001 -> Sc:80 (rule 2 clash, rule 4)

Ah!  Now I understand what you're getting at.  The problem is that this NAT 
design does not implement a true cone NAT at all, and therefore from a P2P 
perspective has more basic problems that essentially make the hairpin 
translation issue moot.  The fundamental problem is that after the NAT has 
opened the first two sessions above, allowing two different source endpoints 
to share the same translated NAT port, the NAT has already "painted itself 
into a corner" and made it impossible to preserve either source endpoint's 
identity when either endpoint attempts to communicate with other hosts - as 
your example eloquently illustrates.

Suppose for example that A:45000 and B:45000 are both P2P apps that have 
established connections to two different P2P "rendezvous servers" Sa and Sb, 
respectively, causing the first two connections to be opened just as above:

 	A:45000 -> Sa:80  <=>  N:45000 -> Sa:80 (rule 3)
 	B:45000 -> Sb:80  <=>  N:45000 -> Sb:80 (rule 3)

Now suppose A and B both want to open connections with a third host C, which 
may be NATted, and therefore hole punching is required.  Say A gets lucky and 
gets there first; the NAT establishes a third session, which according to 
your rules above will happen to preserve the source port's identity:

 	A:45000 -> C:45000  <=>  N:45000 -> C:45000 (rule 2)

C will similarly try to send a ping back to A at N:45000, and as soon as each 
client's ping has reached the respective NAT, a pair of corresponding holes 
is open and regular communication can proceed.

But now B also wants to talk to C, and because the NAT has already painted 
itself into a corner by making A's and B's rendezvous connections share the 
same NAT port, the NAT has no possibility of preserving the identity of B's 
source port.  Instead the NAT establishes the following session, which breaks 
the cone NAT rule:

 	B:45000 -> C:45000  <=>  N:45001 -> C:45000 (rule 2 clash, rule 4)

Now when B and C try to hole punch a communication session open to each other, 
C assumes that it can reach B at N:45000 because that is B's public port as 
far as B and Sb can determine.  But the NAT will only forward traffic from C 
if it comes to port N:45001, and there's no way B or Sb could determine this 
fact without direct knowledge or cooperation with the NAT.

In practice, this design may be even more "P2P-unfriendly" than a typical 
symmetric NAT that just always bumps the port number by 1 when a new session 
is opened.  In the always-bump-by-1 case, at least the change is relatively 
predictable, and more sophisticated hole punching algorithms are already able 
to detect this behavior and establish P2P connections across such NATs with 
moderate reliability.  The NAT behavior you describe, however, makes things 
even more unpredictable: your NAT will look to the application like a cone 
NAT sometimes, but can at times mysteriously and inexplicably switch to a 
symmetric NAT as a result of the independent actions of other nodes elsewhere 
on the private network.

> The reason I implemented NAT this way (back in 1998/99) was that not
> disturbing port numbers avoided breaking a number of protocols.  It also
> turns out that overloading the ports also allows more connections than
> 65535 * number of IP addresses.

Yes, I'm aware of these considerations.  As far as the first one goes, it's 
certainly a good thing to be able to make NAT-unaware legacy protocols keep 
working "most of the time" whenever possible, but as your example above 
demonstrates it's often simply not possible to preserve the source port 
number in real-world situations, so the "preserve source port" rule at best 
works some of the time and does not represent a viable long-term strategy for 
making NATs and apps get along together reliably.  One appropriate solution I 
can think of would be to migrate the NAT to a P2P-friendly, true cone NAT 
model, but retain the above, P2P-unfriendly but "source port preserving when 
possible" strategy only for the specific legacy protocols that we care about 
and are known to work better under that strategy.  The NAT could determine 
the connection strategy simply by looking up the destination port number in a 
table of well-known port numbers for these legacy protocols, for example.  
Ideally this would be configurable, of course, but any unknown port should 
default to providing true cone NAT behavior, because otherwise there will 
never be a fully reliable way new NAT-aware apps can make P2P connections 
across NATs.

As for the second consideration, first note that true cone NATs can still 
potentially support more than (65535 * # IP addrs) connections - the actual 
limitation is that they can only support (65535 * # IP addrs) separate 
_internal endpoints_ at once.  For example, if N is a true cone NAT, then 
A:45000 can use the port N:45000 to talk to as many external endpoints as it 
wants, and all those connections will re-use the same public port, and thus 
only consume one port in the NAT's port space.  Since this is the typical 
behavior of P2P applications, you could say that not only are cone NATs 
P2P-friendly, but P2P apps are also "cone-NAT-friendly" because they make 
reasonably sparing use of the NAT's public port space by re-using the same 
source port.
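
To make the counting argument concrete, here is a small sketch comparing public ports consumed under the two models. The function names are hypothetical, and the "one port per connection" model is a deliberately simple stand-in for a NAT that gives each session its own port.

```python
def ports_used_cone(connections):
    """Cone NAT: one public port per distinct internal endpoint."""
    return len({src for src, _dst in connections})


def ports_used_per_connection(connections):
    """A NAT that allocates a fresh public port for every session."""
    return len(connections)


# A:45000 talking to 1000 different external peers.
p2p = [(("A", 45000), ("peer%d" % i, 45000)) for i in range(1000)]
print(ports_used_cone(p2p), ports_used_per_connection(p2p))
```

A client that opens each connection from a new source port would instead consume one cone-NAT port per connection, which is the client/server case.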

Traditional client/server apps are another matter, of course: since the common 
behavior is for the client to allocate a new port number for every new 
connection, in practice a cone NAT is indeed limited to (65535 * # IP addrs) 
separate client/server connections at once, and I agree that that could be a 
problem in some cases.  One solution is, once again, to make the NAT's 
behavior depend on the protocol in a configurable fashion: for example, it 
might use the symmetric NAT behavior for popular, well-known client/server 
protocols such as HTTP that are known not to need the cone NAT property, but 
provide the cone NAT property by default for other, unknown protocols.  A 
cone NAT's limitation on number of simultaneously active internal ports is 
only ever likely to be a problem anyway for highly popular, well-known 
protocols using standard port numbers.
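
One way such a configurable policy might look, as a sketch only: a table keyed on destination port, with cone behavior as the default for anything not listed. The table contents and the names `STRATEGY_BY_DPORT` and `mapping_strategy` are hypothetical.

```python
# Hypothetical, configurable table: destination port -> mapping strategy.
STRATEGY_BY_DPORT = {
    80: "symmetric",  # HTTP: client/server, does not need the cone property
}


def mapping_strategy(dport, table=STRATEGY_BY_DPORT):
    """Unknown destination ports default to true cone NAT behavior."""
    return table.get(dport, "cone")


print(mapping_strategy(80), mapping_strategy(45000))
```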

On a related note - in the first public version of our P2P-over-middleboxes 
draft, we proposed a new IP header option that would allow apps to indicate 
explicitly to NATs in the path whether they required cone NAT behavior or 
whether symmetric NAT behavior would be acceptable.  Part of the motivation 
for proposing this IP option was specifically the resource management concern 
you described.  We dropped it from future drafts because we encountered 
immediate resistance to "inventing any new protocols" related to NATs, 
and it was seen as outside the scope of the draft.  But the idea could be 
resurrected in the future if there turns out to be a need for it.    You can 
find the original proposal in Section 4 at the following link:

http://www.brynosaurus.com/pub/net/draft-ford-midcom-p2p-00.txt

For now, though, I would propose that we first just try to make it possible 
for P2P apps to work reliably over NATs, and then deal later with any port 
space limitation issues if and when they actually arise.

> I hope that explains?  We're going to have to think hard about changing
> to a port reservation architecture if we really want the hairpin
> feature.

Yes, I agree - and I hope it's clear now that it's not just about the hairpin 
feature, but about reliably supporting hole-punching in general, even in the 
standard single-level NAT case.

Thanks,
Bryan

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-17 16:04     ` Henrik Nordstrom
@ 2004-11-17 16:47       ` Bryan Ford
  2004-11-19  7:37         ` New draft document on nat / traversal / hole punching (with DanKegel) jesse
  0 siblings, 1 reply; 20+ messages in thread
From: Bryan Ford @ 2004-11-17 16:47 UTC (permalink / raw)
  To: Henrik Nordstrom
  Cc: srisures, Netfilter development mailing list, Harald Welte,
	Rusty Russell, dank, Netfilter Core Team

On Wednesday 17 November 2004 11:04, Henrik Nordstrom wrote:
> I think most of the Netfilter developers understand the problems and
> approaches quite well by now.

Excellent - in that case perhaps the long message I sent just before receiving 
your message wasn't necessary after all; you made all my points much more 
succinctly and eloquently. :)

> The question is how much of the quite nice 
> properties we have today should/need to be sacrificed in order to become
> more P2P friendly.

As I pointed out in the previous message, in practice I don't think these nice 
properties need to be sacrificed - the NAT could keep the traditional 
symmetric NAT behavior for existing, well-known protocols such as HTTP that 
don't need the cone NAT property, while providing exclusive port allocation, 
and thus more predictable, P2P-friendly cone NAT behavior for new or unknown 
protocols.  Since it's only the standard, well-known protocols that are ever 
likely to push the port allocation limitations of cone NAT, it seems to me 
that this strategy should work quite well in practice even on a large 
industrial NAT that might handle millions of connections at once.

Thanks,
Bryan

* Re: New draft document on nat / traversal / hole punching (with Dan Kegel)
  2004-11-17 16:33       ` Bryan Ford
@ 2004-11-18  2:47         ` Rusty Russell
  0 siblings, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2004-11-18  2:47 UTC (permalink / raw)
  To: Bryan Ford
  Cc: Harald Welte, srisures, Netfilter development mailing list,
	Netfilter Core Team, dank

On Wed, 2004-11-17 at 11:33 -0500, Bryan Ford wrote:
> > I hope that explains?  We're going to have to think hard about changing
> > to a port reservation architecture if we really want the hairpin
> > feature.
> 
> Yes, I agree - and I hope it's clear now that it's not just about the hairpin 
> feature, but about reliably supporting hole-punching in general, even in the 
> standard single-level NAT case.

One issue that became clear to me in developing and receiving bug
reports for NAT was that NAT is *always* a "90%" solution, i.e. aim to be
best effort rather than reliable.  With unknown protocols,
fragmentation, embedded source addresses, address exhaustion, etc, we're
always imperfect.  Once I absorbed this epiphany, my life was
much easier 8)

For the common, simple case, our current behaviour works (which is good,
because those machines will stay out there for a long time yet!).

But I think there's evidence here that the balance is shifting: more
protocols are NAT-aware, and excellent (and hopefully to-be-widespread)
papers such as yours will hopefully accelerate this trend.  So we should
look to changing the implementation now, even if we risk becoming worse
in the short term.

Thanks,
Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman

* Re: New draft document on nat / traversal / hole punching (with DanKegel)
  2004-11-17 16:47       ` Bryan Ford
@ 2004-11-19  7:37         ` jesse
  0 siblings, 0 replies; 20+ messages in thread
From: jesse @ 2004-11-19  7:37 UTC (permalink / raw)
  To: Bryan Ford, Henrik Nordstrom
  Cc: srisures, Netfilter development mailing list, Harald Welte,
	Rusty Russell, dank, Netfilter Core Team


----- Original Message -----
From: "Bryan Ford" <baford@mit.edu>
To: "Henrik Nordstrom" <hno@marasystems.com>
Cc: <srisures@yahoo.com>; "Netfilter development mailing list"
<netfilter-devel@lists.netfilter.org>; "Harald Welte"
<laforge@netfilter.org>; "Rusty Russell" <rusty@rustcorp.com.au>;
<dank@kegel.com>; "Netfilter Core Team" <coreteam@netfilter.org>
Sent: Thursday, November 18, 2004 12:47 AM
Subject: Re: New draft document on nat / traversal / hole punching (with
DanKegel)


> On Wednesday 17 November 2004 11:04, Henrik Nordstrom wrote:
> > I think most of the Netfilter developers understand the problems and
> > approaches quite well by now.

Yes, I think most of us have bet too much on the good features Rusty
designed, and really can't afford changes like the last five proposed
patches, such as "remove do_extra_mangle".
>
> Excellent - in that case perhaps the long message I sent just before receiving
> your message wasn't necessary after all; you made all my points much more
> succinctly and eloquently. :)
>
> > The question is how much of the quite nice
> > properties we have today should/need to be sacrificed in order to become
> > more P2P friendly.

Based on my three years of experience with netfilter, I don't think any of
the existing nice properties need to be sacrificed in order to meet P2P
needs.
>
> As I pointed out in the previous message, in practice I don't think these nice
> properties need to be sacrificed - the NAT could keep the traditional
> symmetric NAT behavior for existing, well-known protocols such as HTTP that
> don't need the cone NAT property, while providing exclusive port allocation,
> and thus more predictable, P2P-friendly cone NAT behavior for new or unknown
> protocols.  Since it's only the standard, well-known protocols that are ever
> likely to push the port allocation limitations of cone NAT, it seems to me
> that this strategy should work quite well in practice even on a large
> industrial NAT that might handle millions of connections at once.
>
> Thanks,
> Bryan

Yes, but I would do the opposite: keep the current behaviour as the default
and treat P2P needs as the special case.  The following is a high-level
design in the netfilter context, on which Bryan can keep giving advice:

1. As in Rusty's earlier suggestion, build a hash keyed on the modified
source endpoint (source IP and port after NAT).  In the netfilter context,
that is a hash on the reply tuple's dst, like the source hash we already
have on the original tuple's src.
2. Add a flag called HOLEPUNCH to the conntrack entry (session table)
indicating whether the session is for P2P.
3. Add an iptables target, something like "P2PSNAT", as an alternative to
the existing SNAT, so that P2P policy can be expressed as iptables rules:
iptables -t nat -A POSTROUTING -s 10.0.1.1/32 -d 0.0.0.0/0 -j P2PSNAT --to 155.99.25.11
4. Any connection modified by P2PSNAT has its HOLEPUNCH flag set.
5. When a new connection request arrives, in addition to the existing source
NAT mechanism's IP and port allocation, look up the candidate NAT source IP
and port in the modified-source hash to see whether that mapping is already
in use.  If it is, there are two cases:
    a. The new connection's HOLEPUNCH flag is off, meaning it is a so-called
"standard protocol".  Check the HOLEPUNCH flag of the connection holding the
existing mapping.  If it is on, a P2P connection occupies the mapping, so
skip this candidate and find a new one.  If it is off, the mapping is free
to share, so adopt this candidate.
    b. The new connection's HOLEPUNCH flag is on, meaning it is a P2P
connection request and cannot share a mapping.  Skip this candidate and find
a new one.
If the mapping is not in use, adopt this candidate.
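
The candidate check in step 5 could be sketched as below. This is only an illustration of the decision table: the function name `accept_candidate` and the dict standing in for the modified-source hash are hypothetical, and P2PSNAT itself is a proposed target, not an existing one.

```python
def accept_candidate(candidate, new_is_holepunch, by_modified_src):
    """Decide whether a candidate (ip, port) mapping may be used.

    by_modified_src maps an in-use modified source endpoint to True if a
    HOLEPUNCH connection holds it, or False if only standard ones do.
    """
    if candidate not in by_modified_src:
        return True   # mapping unused: adopt it
    if new_is_holepunch:
        return False  # case b: a P2P connection cannot share a mapping
    # Case a: a standard connection may share only with standard ones.
    return not by_modified_src[candidate]


in_use = {
    ("155.99.25.11", 62005): True,   # held by a HOLEPUNCH connection
    ("155.99.25.11", 45000): False,  # held by standard connections
}
print(accept_candidate(("155.99.25.11", 45000), False, in_use))
```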

For hairpin, life is then easy.  When NATed host A (10.0.1.1:4500) connects
to NATed host B at NAT C's address 155.99.25.11:62005, then in addition to
the existing destination NAT mechanism, the modified-source hash should also
be looked up to see whether any outbound connection with the HOLEPUNCH flag
on is using 155.99.25.11:62005 as its modified source.  If one exists,
destination-NAT this connection's destination endpoint to the original
source endpoint of the existing mapping.  If not, behave as before (for TCP,
reset it).  For source NAT, follow the same mechanism as in the five steps
above.
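
The hairpin lookup might be sketched like this. The function name `hairpin_dnat`, the table layout, and host B's private address 10.0.1.2:4500 are assumptions made for the example.

```python
def hairpin_dnat(dst, by_modified_src):
    """Return the private endpoint to DNAT to, or None if there is none.

    by_modified_src maps a modified source endpoint to a pair of
    (original source endpoint, HOLEPUNCH flag).
    """
    entry = by_modified_src.get(dst)
    if entry is not None and entry[1]:
        return entry[0]
    return None  # no P2P mapping: handle as before (reset if TCP)


# Hypothetical: B's HOLEPUNCH connection was mapped to 155.99.25.11:62005.
table = {("155.99.25.11", 62005): (("10.0.1.2", 4500), True)}
print(hairpin_dnat(("155.99.25.11", 62005), table))
```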

I hope this design suggestion can help preserve what we have in netfilter.
If it is needed, we are willing to do what we can.

Thanks
Yours sincerely
Jesse Peng
