All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net, ip6_tunnel: enhance tunnel locate with link/type check
@ 2020-02-05 14:57 William Dauchy
  2020-02-05 15:04 ` Nicolas Dichtel
  0 siblings, 1 reply; 20+ messages in thread
From: William Dauchy @ 2020-02-05 14:57 UTC (permalink / raw)
  To: netdev; +Cc: Nicolas Dichtel, William Dauchy

With ipip, it is possible to create an extra interface explicitly
attached to a given physical interface:

  # ip link show tunl0
  4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
  # ip link add tunl1 type ipip dev eth0
  # ip link show tunl1
  6: tunl1@eth0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0

But it is not possible with ip6tnl:

  # ip link show ip6tnl0
  5: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/tunnel6 :: brd ::
  # ip link add ip6tnl1 type ip6tnl dev eth0
  RTNETLINK answers: File exists

This patch aims to make it possible by adding the comparaison of the
link device while trying to locate an existing tunnel.
This later permits to make use of x-netns communication by moving the
newly created tunnel in a given netns.

Take this opportunity to also compare dev->type value as it is done in
ip_tunnel. This is therefore adding a new type parameter to
`ip6_tnl_locate`.

Signed-off-by: William Dauchy <w.dauchy@criteo.com>
---
 net/ipv6/ip6_tunnel.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index b5dd20c4599b..0df3b3ca7608 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -339,7 +339,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
  **/
 
 static struct ip6_tnl *ip6_tnl_locate(struct net *net,
-		struct __ip6_tnl_parm *p, int create)
+		struct __ip6_tnl_parm *p, int create, int type)
 {
 	const struct in6_addr *remote = &p->raddr;
 	const struct in6_addr *local = &p->laddr;
@@ -351,7 +351,9 @@ static struct ip6_tnl *ip6_tnl_locate(struct net *net,
 	     (t = rtnl_dereference(*tp)) != NULL;
 	     tp = &t->next) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_equal(remote, &t->parms.raddr)) {
+		    ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    p->link == t->parms.link &&
+		    type == t->dev->type) {
 			if (create)
 				return ERR_PTR(-EEXIST);
 
@@ -1600,7 +1602,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 				break;
 			}
 			ip6_tnl_parm_from_user(&p1, &p);
-			t = ip6_tnl_locate(net, &p1, 0);
+			t = ip6_tnl_locate(net, &p1, 0, dev->type);
 			if (IS_ERR(t))
 				t = netdev_priv(dev);
 		} else {
@@ -1624,7 +1626,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		    p.proto != 0)
 			break;
 		ip6_tnl_parm_from_user(&p1, &p);
-		t = ip6_tnl_locate(net, &p1, cmd == SIOCADDTUNNEL);
+		t = ip6_tnl_locate(net, &p1, cmd == SIOCADDTUNNEL, dev->type);
 		if (cmd == SIOCCHGTUNNEL) {
 			if (!IS_ERR(t)) {
 				if (t->dev != dev) {
@@ -1659,7 +1661,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 				break;
 			err = -ENOENT;
 			ip6_tnl_parm_from_user(&p1, &p);
-			t = ip6_tnl_locate(net, &p1, 0);
+			t = ip6_tnl_locate(net, &p1, 0, dev->type);
 			if (IS_ERR(t))
 				break;
 			err = -EPERM;
@@ -2015,7 +2017,7 @@ static int ip6_tnl_newlink(struct net *src_net, struct net_device *dev,
 		if (rtnl_dereference(ip6n->collect_md_tun))
 			return -EEXIST;
 	} else {
-		t = ip6_tnl_locate(net, &nt->parms, 0);
+		t = ip6_tnl_locate(net, &nt->parms, 0, dev->type);
 		if (!IS_ERR(t))
 			return -EEXIST;
 	}
@@ -2050,7 +2052,7 @@ static int ip6_tnl_changelink(struct net_device *dev, struct nlattr *tb[],
 	if (p.collect_md)
 		return -EINVAL;
 
-	t = ip6_tnl_locate(net, &p, 0);
+	t = ip6_tnl_locate(net, &p, 0, dev->type);
 	if (!IS_ERR(t)) {
 		if (t->dev != dev)
 			return -EEXIST;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] net, ip6_tunnel: enhance tunnel locate with link/type check
  2020-02-05 14:57 [PATCH] net, ip6_tunnel: enhance tunnel locate with link/type check William Dauchy
@ 2020-02-05 15:04 ` Nicolas Dichtel
  2020-02-05 16:29   ` [PATCH v2 0/2] net, ip6_tunnel: enhance tunnel locate William Dauchy
                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Nicolas Dichtel @ 2020-02-05 15:04 UTC (permalink / raw)
  To: William Dauchy, netdev

Le 05/02/2020 à 15:57, William Dauchy a écrit :
> With ipip, it is possible to create an extra interface explicitly
> attached to a given physical interface:
> 
>   # ip link show tunl0
>   4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ipip 0.0.0.0 brd 0.0.0.0
>   # ip link add tunl1 type ipip dev eth0
>   # ip link show tunl1
>   6: tunl1@eth0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ipip 0.0.0.0 brd 0.0.0.0
> 
> But it is not possible with ip6tnl:
> 
>   # ip link show ip6tnl0
>   5: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>       link/tunnel6 :: brd ::
>   # ip link add ip6tnl1 type ip6tnl dev eth0
>   RTNETLINK answers: File exists
> 
> This patch aims to make it possible by adding the comparaison of the
> link device while trying to locate an existing tunnel.
> This later permits to make use of x-netns communication by moving the
> newly created tunnel in a given netns.
> 
> Take this opportunity to also compare dev->type value as it is done in
Please, don't mix patches and problems.
You will have to split this.

Thank you,
Nicolas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 0/2] net, ip6_tunnel: enhance tunnel locate
  2020-02-05 15:04 ` Nicolas Dichtel
@ 2020-02-05 16:29   ` William Dauchy
  2020-02-05 16:29   ` [PATCH v2 1/2] net, ip6_tunnel: enhance tunnel locate with link check William Dauchy
  2020-02-05 16:29   ` [PATCH v2 2/2] net, ip6_tunnel: enhance tunnel locate with type check William Dauchy
  2 siblings, 0 replies; 20+ messages in thread
From: William Dauchy @ 2020-02-05 16:29 UTC (permalink / raw)
  To: netdev; +Cc: Nicolas Dichtel, William Dauchy

While trying to create additional ip6tnl interfaces, I noted behavior
was different from ip_tunnel, especially when you need to create a new
tunnel in order to make use for x-netns communication between root
namespace and another one. It is an issue when you do not have specific
remote/local address.

changed in V2:
- splitted patch in two parts for link/type parameter

William Dauchy (2):
  net, ip6_tunnel: enhance tunnel locate with link check
  net, ip6_tunnel: enhance tunnel locate with type check

 net/ipv6/ip6_tunnel.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/2] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-05 15:04 ` Nicolas Dichtel
  2020-02-05 16:29   ` [PATCH v2 0/2] net, ip6_tunnel: enhance tunnel locate William Dauchy
@ 2020-02-05 16:29   ` William Dauchy
  2020-02-05 16:54     ` Nicolas Dichtel
  2020-02-05 16:29   ` [PATCH v2 2/2] net, ip6_tunnel: enhance tunnel locate with type check William Dauchy
  2 siblings, 1 reply; 20+ messages in thread
From: William Dauchy @ 2020-02-05 16:29 UTC (permalink / raw)
  To: netdev; +Cc: Nicolas Dichtel, William Dauchy

With ipip, it is possible to create an extra interface explicitly
attached to a given physical interface:

  # ip link show tunl0
  4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
  # ip link add tunl1 type ipip dev eth0
  # ip link show tunl1
  6: tunl1@eth0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0

But it is not possible with ip6tnl:

  # ip link show ip6tnl0
  5: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/tunnel6 :: brd ::
  # ip link add ip6tnl1 type ip6tnl dev eth0
  RTNETLINK answers: File exists

This patch aims to make it possible by adding the comparaison of the
link device while trying to locate an existing tunnel.
This later permits to make use of x-netns communication by moving the
newly created tunnel in a given netns.

Signed-off-by: William Dauchy <w.dauchy@criteo.com>
---
 net/ipv6/ip6_tunnel.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index b5dd20c4599b..053f44691cc6 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -351,7 +351,8 @@ static struct ip6_tnl *ip6_tnl_locate(struct net *net,
 	     (t = rtnl_dereference(*tp)) != NULL;
 	     tp = &t->next) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_equal(remote, &t->parms.raddr)) {
+		    ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    p->link == t->parms.link) {
 			if (create)
 				return ERR_PTR(-EEXIST);
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/2] net, ip6_tunnel: enhance tunnel locate with type check
  2020-02-05 15:04 ` Nicolas Dichtel
  2020-02-05 16:29   ` [PATCH v2 0/2] net, ip6_tunnel: enhance tunnel locate William Dauchy
  2020-02-05 16:29   ` [PATCH v2 1/2] net, ip6_tunnel: enhance tunnel locate with link check William Dauchy
@ 2020-02-05 16:29   ` William Dauchy
  2020-02-05 16:59     ` Nicolas Dichtel
  2 siblings, 1 reply; 20+ messages in thread
From: William Dauchy @ 2020-02-05 16:29 UTC (permalink / raw)
  To: netdev; +Cc: Nicolas Dichtel, William Dauchy

As it is done in ip_tunnel, compare dev->type when trying to locate an
existing tunnel.
This is therefore adding a new type parameter to `ip6_tnl_locate`.

Signed-off-by: William Dauchy <w.dauchy@criteo.com>
---
 net/ipv6/ip6_tunnel.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 053f44691cc6..94419b6479fd 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -339,7 +339,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
  **/
 
 static struct ip6_tnl *ip6_tnl_locate(struct net *net,
-		struct __ip6_tnl_parm *p, int create)
+		struct __ip6_tnl_parm *p, int create, int type)
 {
 	const struct in6_addr *remote = &p->raddr;
 	const struct in6_addr *local = &p->laddr;
@@ -352,7 +352,8 @@ static struct ip6_tnl *ip6_tnl_locate(struct net *net,
 	     tp = &t->next) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
 		    ipv6_addr_equal(remote, &t->parms.raddr) &&
-		    p->link == t->parms.link) {
+		    p->link == t->parms.link &&
+		    type == t->dev->type) {
 			if (create)
 				return ERR_PTR(-EEXIST);
 
@@ -1601,7 +1602,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 				break;
 			}
 			ip6_tnl_parm_from_user(&p1, &p);
-			t = ip6_tnl_locate(net, &p1, 0);
+			t = ip6_tnl_locate(net, &p1, 0, ip6n->fb_tnl_dev->type);
 			if (IS_ERR(t))
 				t = netdev_priv(dev);
 		} else {
@@ -1625,7 +1626,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		    p.proto != 0)
 			break;
 		ip6_tnl_parm_from_user(&p1, &p);
-		t = ip6_tnl_locate(net, &p1, cmd == SIOCADDTUNNEL);
+		t = ip6_tnl_locate(net, &p1, cmd == SIOCADDTUNNEL, dev->type);
 		if (cmd == SIOCCHGTUNNEL) {
 			if (!IS_ERR(t)) {
 				if (t->dev != dev) {
@@ -1660,7 +1661,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 				break;
 			err = -ENOENT;
 			ip6_tnl_parm_from_user(&p1, &p);
-			t = ip6_tnl_locate(net, &p1, 0);
+			t = ip6_tnl_locate(net, &p1, 0, ip6n->fb_tnl_dev->type);
 			if (IS_ERR(t))
 				break;
 			err = -EPERM;
@@ -2016,7 +2017,7 @@ static int ip6_tnl_newlink(struct net *src_net, struct net_device *dev,
 		if (rtnl_dereference(ip6n->collect_md_tun))
 			return -EEXIST;
 	} else {
-		t = ip6_tnl_locate(net, &nt->parms, 0);
+		t = ip6_tnl_locate(net, &nt->parms, 0, dev->type);
 		if (!IS_ERR(t))
 			return -EEXIST;
 	}
@@ -2051,7 +2052,7 @@ static int ip6_tnl_changelink(struct net_device *dev, struct nlattr *tb[],
 	if (p.collect_md)
 		return -EINVAL;
 
-	t = ip6_tnl_locate(net, &p, 0);
+	t = ip6_tnl_locate(net, &p, 0, dev->type);
 	if (!IS_ERR(t)) {
 		if (t->dev != dev)
 			return -EEXIST;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-05 16:29   ` [PATCH v2 1/2] net, ip6_tunnel: enhance tunnel locate with link check William Dauchy
@ 2020-02-05 16:54     ` Nicolas Dichtel
  2020-02-05 17:36       ` William Dauchy
  2020-02-12  8:30       ` [PATCH v3 net] " William Dauchy
  0 siblings, 2 replies; 20+ messages in thread
From: Nicolas Dichtel @ 2020-02-05 16:54 UTC (permalink / raw)
  To: William Dauchy, netdev

Le 05/02/2020 à 17:29, William Dauchy a écrit :
[snip]
> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
> index b5dd20c4599b..053f44691cc6 100644
> --- a/net/ipv6/ip6_tunnel.c
> +++ b/net/ipv6/ip6_tunnel.c
> @@ -351,7 +351,8 @@ static struct ip6_tnl *ip6_tnl_locate(struct net *net,
>  	     (t = rtnl_dereference(*tp)) != NULL;
>  	     tp = &t->next) {
>  		if (ipv6_addr_equal(local, &t->parms.laddr) &&
> -		    ipv6_addr_equal(remote, &t->parms.raddr)) {
> +		    ipv6_addr_equal(remote, &t->parms.raddr) &&
> +		    p->link == t->parms.link) {
>  			if (create)
>  				return ERR_PTR(-EEXIST);
This is probably not so easy. If link becomes part of the key, at least
ip6_tnl_lookup() should also be updated.
You can also look at ip_tunnel_bind_dev() to check how the mtu is calculated.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] net, ip6_tunnel: enhance tunnel locate with type check
  2020-02-05 16:29   ` [PATCH v2 2/2] net, ip6_tunnel: enhance tunnel locate with type check William Dauchy
@ 2020-02-05 16:59     ` Nicolas Dichtel
  2020-02-05 17:31       ` William Dauchy
  0 siblings, 1 reply; 20+ messages in thread
From: Nicolas Dichtel @ 2020-02-05 16:59 UTC (permalink / raw)
  To: William Dauchy, netdev

Le 05/02/2020 à 17:29, William Dauchy a écrit :
> As it is done in ip_tunnel, compare dev->type when trying to locate an
> existing tunnel.
Can you elaborate? What problem does this solve?
Which tree are you targeting? net or net-next?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] net, ip6_tunnel: enhance tunnel locate with type check
  2020-02-05 16:59     ` Nicolas Dichtel
@ 2020-02-05 17:31       ` William Dauchy
  2020-02-05 18:16         ` William Dauchy
  0 siblings, 1 reply; 20+ messages in thread
From: William Dauchy @ 2020-02-05 17:31 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: William Dauchy, NETDEV

On Wed, Feb 5, 2020 at 6:01 PM Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> Can you elaborate? What problem does this solve?
> Which tree are you targeting? net or net-next?

I was seeing that as a nice to have for coherency between ip6_tunnel
and ip_tunnel. But we can drop this one if we lack of practical
example, as I was more concerned about link check in the 1/2 patch.
It was more for net in my opinion as I considered it not too invasive.
-- 
William

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-05 16:54     ` Nicolas Dichtel
@ 2020-02-05 17:36       ` William Dauchy
  2020-02-12  8:30       ` [PATCH v3 net] " William Dauchy
  1 sibling, 0 replies; 20+ messages in thread
From: William Dauchy @ 2020-02-05 17:36 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: William Dauchy, NETDEV

On Wed, Feb 5, 2020 at 5:55 PM Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> Le 05/02/2020 à 17:29, William Dauchy a écrit :
> [snip]
> > diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
> > index b5dd20c4599b..053f44691cc6 100644
> > --- a/net/ipv6/ip6_tunnel.c
> > +++ b/net/ipv6/ip6_tunnel.c
> > @@ -351,7 +351,8 @@ static struct ip6_tnl *ip6_tnl_locate(struct net *net,
> >            (t = rtnl_dereference(*tp)) != NULL;
> >            tp = &t->next) {
> >               if (ipv6_addr_equal(local, &t->parms.laddr) &&
> > -                 ipv6_addr_equal(remote, &t->parms.raddr)) {
> > +                 ipv6_addr_equal(remote, &t->parms.raddr) &&
> > +                 p->link == t->parms.link) {
> >                       if (create)
> >                               return ERR_PTR(-EEXIST);
> This is probably not so easy. If link becomes part of the key, at least
> ip6_tnl_lookup() should also be updated.
> You can also look at ip_tunnel_bind_dev() to check how the mtu is calculated.

indeed, I will update that.
-- 
William

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] net, ip6_tunnel: enhance tunnel locate with type check
  2020-02-05 17:31       ` William Dauchy
@ 2020-02-05 18:16         ` William Dauchy
  0 siblings, 0 replies; 20+ messages in thread
From: William Dauchy @ 2020-02-05 18:16 UTC (permalink / raw)
  To: William Dauchy; +Cc: Nicolas Dichtel, NETDEV

On Wed, Feb 5, 2020 at 6:31 PM William Dauchy <w.dauchy@criteo.com> wrote:
> On Wed, Feb 5, 2020 at 6:01 PM Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
> > Can you elaborate? What problem does this solve?
> > Which tree are you targeting? net or net-next?
>
> I was seeing that as a nice to have for coherency between ip6_tunnel
> and ip_tunnel. But we can drop this one if we lack of practical
> example, as I was more concerned about link check in the 1/2 patch.
> It was more for net in my opinion as I considered it not too invasive.

Ok let's drop that one for now as there is no practical example as of
today, I will focus on 1/2 around link check.

Thanks,
-- 
William

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 net] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-05 16:54     ` Nicolas Dichtel
  2020-02-05 17:36       ` William Dauchy
@ 2020-02-12  8:30       ` William Dauchy
  2020-02-12 15:54         ` Nicolas Dichtel
  1 sibling, 1 reply; 20+ messages in thread
From: William Dauchy @ 2020-02-12  8:30 UTC (permalink / raw)
  To: netdev; +Cc: nicolas.dichtel, William Dauchy

With ipip, it is possible to create an extra interface explicitly
attached to a given physical interface:

  # ip link show tunl0
  4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
  # ip link add tunl1 type ipip dev eth0
  # ip link show tunl1
  6: tunl1@eth0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0

But it is not possible with ip6tnl:

  # ip link show ip6tnl0
  5: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/tunnel6 :: brd ::
  # ip link add ip6tnl1 type ip6tnl dev eth0
  RTNETLINK answers: File exists

This patch aims to make it possible by adding link comparaison in both
tunnel locate and lookup functions; we also modify mtu calculation when
attached to an interface with a lower mtu.

This permits to make use of x-netns communication by moving the newly
created tunnel in a given netns.

Signed-off-by: William Dauchy <w.dauchy@criteo.com>
---
 net/ipv6/ip6_tunnel.c | 55 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 41 insertions(+), 14 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index b5dd20c4599b..e57802f100fd 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -121,6 +121,7 @@ static struct net_device_stats *ip6_get_stats(struct net_device *dev)
 
 /**
  * ip6_tnl_lookup - fetch tunnel matching the end-point addresses
+ *   @link: ifindex of underlying interface
  *   @remote: the address of the tunnel exit-point
  *   @local: the address of the tunnel entry-point
  *
@@ -134,37 +135,56 @@ static struct net_device_stats *ip6_get_stats(struct net_device *dev)
 	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
 
 static struct ip6_tnl *
-ip6_tnl_lookup(struct net *net, const struct in6_addr *remote, const struct in6_addr *local)
+ip6_tnl_lookup(struct net *net, int link,
+	       const struct in6_addr *remote, const struct in6_addr *local)
 {
 	unsigned int hash = HASH(remote, local);
-	struct ip6_tnl *t;
+	struct ip6_tnl *t, *cand = NULL;
 	struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
 	struct in6_addr any;
 
 	for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
-		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_equal(remote, &t->parms.raddr) &&
-		    (t->dev->flags & IFF_UP))
+		if (!ipv6_addr_equal(local, &t->parms.laddr) ||
+		    !ipv6_addr_equal(remote, &t->parms.raddr) ||
+		    !(t->dev->flags & IFF_UP))
+			continue;
+
+		if (link == t->parms.link)
 			return t;
+		else
+			cand = t;
 	}
 
 	memset(&any, 0, sizeof(any));
 	hash = HASH(&any, local);
 	for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
-		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_any(&t->parms.raddr) &&
-		    (t->dev->flags & IFF_UP))
+		if (!ipv6_addr_equal(local, &t->parms.laddr) ||
+		    !ipv6_addr_any(&t->parms.raddr) ||
+		    !(t->dev->flags & IFF_UP))
+			continue;
+
+		if (link == t->parms.link)
 			return t;
+		else if (!cand)
+			cand = t;
 	}
 
 	hash = HASH(remote, &any);
 	for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
-		if (ipv6_addr_equal(remote, &t->parms.raddr) &&
-		    ipv6_addr_any(&t->parms.laddr) &&
-		    (t->dev->flags & IFF_UP))
+		if (!ipv6_addr_equal(remote, &t->parms.raddr) ||
+		    !ipv6_addr_any(&t->parms.laddr) ||
+		    !(t->dev->flags & IFF_UP))
+			continue;
+
+		if (link == t->parms.link)
 			return t;
+		else if (!cand)
+			cand = t;
 	}
 
+	if (cand)
+		return cand;
+
 	t = rcu_dereference(ip6n->collect_md_tun);
 	if (t && t->dev->flags & IFF_UP)
 		return t;
@@ -351,7 +371,8 @@ static struct ip6_tnl *ip6_tnl_locate(struct net *net,
 	     (t = rtnl_dereference(*tp)) != NULL;
 	     tp = &t->next) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_equal(remote, &t->parms.raddr)) {
+		    ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    p->link == t->parms.link) {
 			if (create)
 				return ERR_PTR(-EEXIST);
 
@@ -485,7 +506,7 @@ ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
 	   processing of the error. */
 
 	rcu_read_lock();
-	t = ip6_tnl_lookup(dev_net(skb->dev), &ipv6h->daddr, &ipv6h->saddr);
+	t = ip6_tnl_lookup(dev_net(skb->dev), skb->dev->ifindex, &ipv6h->daddr, &ipv6h->saddr);
 	if (!t)
 		goto out;
 
@@ -887,7 +908,7 @@ static int ipxip6_rcv(struct sk_buff *skb, u8 ipproto,
 	int ret = -1;
 
 	rcu_read_lock();
-	t = ip6_tnl_lookup(dev_net(skb->dev), &ipv6h->saddr, &ipv6h->daddr);
+	t = ip6_tnl_lookup(dev_net(skb->dev), skb->dev->ifindex, &ipv6h->saddr, &ipv6h->daddr);
 
 	if (t) {
 		u8 tproto = READ_ONCE(t->parms.proto);
@@ -1823,6 +1844,7 @@ static void ip6_tnl_dev_setup(struct net_device *dev)
 static inline int
 ip6_tnl_dev_init_gen(struct net_device *dev)
 {
+	struct net_device *tdev = NULL;
 	struct ip6_tnl *t = netdev_priv(dev);
 	int ret;
 	int t_hlen;
@@ -1848,6 +1870,11 @@ ip6_tnl_dev_init_gen(struct net_device *dev)
 	dev->type = ARPHRD_TUNNEL6;
 	dev->hard_header_len = LL_MAX_HEADER + t_hlen;
 	dev->mtu = ETH_DATA_LEN - t_hlen;
+	if (t->parms.link) {
+		tdev = __dev_get_by_index(t->net, t->parms.link);
+		if (tdev && tdev->mtu < dev->mtu)
+			dev->mtu = tdev->mtu;
+	}
 	if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
 		dev->mtu -= 8;
 	dev->min_mtu = ETH_MIN_MTU;
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 net] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-12  8:30       ` [PATCH v3 net] " William Dauchy
@ 2020-02-12 15:54         ` Nicolas Dichtel
  2020-02-12 21:06           ` William Dauchy
  0 siblings, 1 reply; 20+ messages in thread
From: Nicolas Dichtel @ 2020-02-12 15:54 UTC (permalink / raw)
  To: William Dauchy, netdev

Le 12/02/2020 à 09:30, William Dauchy a écrit :
[snip]
> @@ -1848,6 +1870,11 @@ ip6_tnl_dev_init_gen(struct net_device *dev)
>  	dev->type = ARPHRD_TUNNEL6;
>  	dev->hard_header_len = LL_MAX_HEADER + t_hlen;
>  	dev->mtu = ETH_DATA_LEN - t_hlen;
> +	if (t->parms.link) {
> +		tdev = __dev_get_by_index(t->net, t->parms.link);
> +		if (tdev && tdev->mtu < dev->mtu)
> +			dev->mtu = tdev->mtu;
Hmm, I was expecting 'tdev->mtu - t_hlen'. Am I wrong?

In fact, something like this:
dev->mtu = ETH_DATA_LEN - t_hlen;
if (t->parms.link) {
	tdev = __dev_get_by_index(t->net, t->parms.link);
	if (tdev)
		dev->mtu = tdev->mtu - t_hlen;
}

With ipip:
$ ip l s eth1 mtu 2000
$ ip link add ipip1 type ipip remote 10.16.0.121 local 10.16.0.249 dev eth1
$ ip l
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2000 qdisc pfifo_fast state UP
mode DEFAULT group default qlen 1000
...
10: ipip1@eth1: <POINTOPOINT,NOARP> mtu 1980 qdisc noop state DOWN mode DEFAULT
group default qlen 1000

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 net] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-12 15:54         ` Nicolas Dichtel
@ 2020-02-12 21:06           ` William Dauchy
  2020-02-13  9:34             ` Nicolas Dichtel
  0 siblings, 1 reply; 20+ messages in thread
From: William Dauchy @ 2020-02-12 21:06 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev

Hello Nicolas,

Thank you for your review.

On Wed, Feb 12, 2020 at 04:54:19PM +0100, Nicolas Dichtel wrote:
> Hmm, I was expecting 'tdev->mtu - t_hlen'. Am I wrong?
> 
> In fact, something like this:
> dev->mtu = ETH_DATA_LEN - t_hlen;
> if (t->parms.link) {
> 	tdev = __dev_get_by_index(t->net, t->parms.link);
> 	if (tdev)
> 		dev->mtu = tdev->mtu - t_hlen;
> }

true, I missed that one; I reworked to something like:

int mtu;

mtu = ETH_DATA_LEN;
if (t->parms.link) {
	tdev = __dev_get_by_index(t->net, t->parms.link);
	if (tdev && tdev->mtu < mtu)
		mtu = tdev->mtu;
}
dev->mtu = mtu - t_hlen;


However in ipip we do:

mtu -= (dev->hard_header_len + t_hlen);

Do I need to use hard_header_len as well?

Thanks,
-- 
William

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 net] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-12 21:06           ` William Dauchy
@ 2020-02-13  9:34             ` Nicolas Dichtel
  2020-02-13 15:26               ` William Dauchy
  2020-02-13 15:35               ` [PATCH v4 " William Dauchy
  0 siblings, 2 replies; 20+ messages in thread
From: Nicolas Dichtel @ 2020-02-13  9:34 UTC (permalink / raw)
  To: William Dauchy; +Cc: netdev

Le 12/02/2020 à 22:06, William Dauchy a écrit :
> Hello Nicolas,
> 
> Thank you for your review.
> 
> On Wed, Feb 12, 2020 at 04:54:19PM +0100, Nicolas Dichtel wrote:
>> Hmm, I was expecting 'tdev->mtu - t_hlen'. Am I wrong?
>>
>> In fact, something like this:
>> dev->mtu = ETH_DATA_LEN - t_hlen;
>> if (t->parms.link) {
>> 	tdev = __dev_get_by_index(t->net, t->parms.link);
>> 	if (tdev)
>> 		dev->mtu = tdev->mtu - t_hlen;
>> }
> 
> true, I missed that one; I reworked to something like:
> 
> int mtu;
> 
> mtu = ETH_DATA_LEN;
> if (t->parms.link) {
> 	tdev = __dev_get_by_index(t->net, t->parms.link);
> 	if (tdev && tdev->mtu < mtu)
Why this second condition? Why not allowing more than ETH_DATA_LEN (1500)?
ip_tunnels do:
        if (tdev) {

                hlen = tdev->hard_header_len + tdev->needed_headroom;

                mtu = min(tdev->mtu, IP_MAX_MTU);

        }
which seems better.

Note also that you patch ip6_tnl_dev_init_gen(), but ip6_tnl_link_config() is
called later and may adjust the mtu. I would suggest to take care of link mtu in
ip6_tnl_link_config().

> 		mtu = tdev->mtu;
> }
> dev->mtu = mtu - t_hlen;
> 
> 
> However in ipip we do:
> 
> mtu -= (dev->hard_header_len + t_hlen);
> 
> Do I need to use hard_header_len as well?
hard_header_len is not set for ipv4 tunnels, but for ipv6 tunnels:
        dev->hard_header_len = LL_MAX_HEADER + t_hlen;

This is not the real value, I don't think you can calculate the real mtu based
on this.


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 net] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-13  9:34             ` Nicolas Dichtel
@ 2020-02-13 15:26               ` William Dauchy
  2020-02-13 15:35               ` [PATCH v4 " William Dauchy
  1 sibling, 0 replies; 20+ messages in thread
From: William Dauchy @ 2020-02-13 15:26 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev

On Thu, Feb 13, 2020 at 10:34:19AM +0100, Nicolas Dichtel wrote:
> > mtu = ETH_DATA_LEN;
> > if (t->parms.link) {
> > 	tdev = __dev_get_by_index(t->net, t->parms.link);
> > 	if (tdev && tdev->mtu < mtu)
> Why this second condition? Why not allowing more than ETH_DATA_LEN (1500)?
> ip_tunnels do:
>         if (tdev) {
> 
>                 hlen = tdev->hard_header_len + tdev->needed_headroom;
> 
>                 mtu = min(tdev->mtu, IP_MAX_MTU);
> 
>         }
> which seems better.

I wrongly mixed the two codes in my head as I wanted to take the lowest
of the two values; will correct that; sorry for the confusion.

> Note also that you patch ip6_tnl_dev_init_gen(), but ip6_tnl_link_config() is
> called later and may adjust the mtu. I would suggest to take care of link mtu in
> ip6_tnl_link_config().

agreed, I overlooked it. Unsure whether I can put it in CAP_XMIT
condition as well.

> hard_header_len is not set for ipv4 tunnels, but for ipv6 tunnels:
>         dev->hard_header_len = LL_MAX_HEADER + t_hlen;
> 
> This is not the real value, I don't think you can calculate the real mtu based
> on this.

understood, I indeed did not noticed hard_header_len was not set in
ip_tunnel.

Best,
-- 
William

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4 net] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-13  9:34             ` Nicolas Dichtel
  2020-02-13 15:26               ` William Dauchy
@ 2020-02-13 15:35               ` William Dauchy
  2020-02-13 16:33                 ` Nicolas Dichtel
  1 sibling, 1 reply; 20+ messages in thread
From: William Dauchy @ 2020-02-13 15:35 UTC (permalink / raw)
  To: netdev; +Cc: William Dauchy, Nicolas Dichtel

With ipip, it is possible to create an extra interface explicitly
attached to a given physical interface:

  # ip link show tunl0
  4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
  # ip link add tunl1 type ipip dev eth0
  # ip link show tunl1
  6: tunl1@eth0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0

But it is not possible with ip6tnl:

  # ip link show ip6tnl0
  5: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/tunnel6 :: brd ::
  # ip link add ip6tnl1 type ip6tnl dev eth0
  RTNETLINK answers: File exists

This patch aims to make it possible by adding link comparaison in both
tunnel locate and lookup functions; we also modify mtu calculation when
attached to an interface with a lower mtu.

This permits to make use of x-netns communication by moving the newly
created tunnel in a given netns.

Signed-off-by: William Dauchy <w.dauchy@criteo.com>
---
changes in v2:
- splitted code to differenciate link/type check
changes in v3:
- abadon 2/2 with type check as we do not have any real use case as of
  today
- fix lookup function and mtu calculation
changes in v4:
- fix and move mtu calculation in ip6_tnl_link_config
---
 net/ipv6/ip6_tunnel.c | 69 +++++++++++++++++++++++++++++--------------
 1 file changed, 47 insertions(+), 22 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index b5dd20c4599b..284e24806eab 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -121,6 +121,7 @@ static struct net_device_stats *ip6_get_stats(struct net_device *dev)
 
 /**
  * ip6_tnl_lookup - fetch tunnel matching the end-point addresses
+ *   @link: ifindex of underlying interface
  *   @remote: the address of the tunnel exit-point
  *   @local: the address of the tunnel entry-point
  *
@@ -134,37 +135,56 @@ static struct net_device_stats *ip6_get_stats(struct net_device *dev)
 	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
 
 static struct ip6_tnl *
-ip6_tnl_lookup(struct net *net, const struct in6_addr *remote, const struct in6_addr *local)
+ip6_tnl_lookup(struct net *net, int link,
+	       const struct in6_addr *remote, const struct in6_addr *local)
 {
 	unsigned int hash = HASH(remote, local);
-	struct ip6_tnl *t;
+	struct ip6_tnl *t, *cand = NULL;
 	struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
 	struct in6_addr any;
 
 	for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
-		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_equal(remote, &t->parms.raddr) &&
-		    (t->dev->flags & IFF_UP))
+		if (!ipv6_addr_equal(local, &t->parms.laddr) ||
+		    !ipv6_addr_equal(remote, &t->parms.raddr) ||
+		    !(t->dev->flags & IFF_UP))
+			continue;
+
+		if (link == t->parms.link)
 			return t;
+		else
+			cand = t;
 	}
 
 	memset(&any, 0, sizeof(any));
 	hash = HASH(&any, local);
 	for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
-		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_any(&t->parms.raddr) &&
-		    (t->dev->flags & IFF_UP))
+		if (!ipv6_addr_equal(local, &t->parms.laddr) ||
+		    !ipv6_addr_any(&t->parms.raddr) ||
+		    !(t->dev->flags & IFF_UP))
+			continue;
+
+		if (link == t->parms.link)
 			return t;
+		else if (!cand)
+			cand = t;
 	}
 
 	hash = HASH(remote, &any);
 	for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
-		if (ipv6_addr_equal(remote, &t->parms.raddr) &&
-		    ipv6_addr_any(&t->parms.laddr) &&
-		    (t->dev->flags & IFF_UP))
+		if (!ipv6_addr_equal(remote, &t->parms.raddr) ||
+		    !ipv6_addr_any(&t->parms.laddr) ||
+		    !(t->dev->flags & IFF_UP))
+			continue;
+
+		if (link == t->parms.link)
 			return t;
+		else if (!cand)
+			cand = t;
 	}
 
+	if (cand)
+		return cand;
+
 	t = rcu_dereference(ip6n->collect_md_tun);
 	if (t && t->dev->flags & IFF_UP)
 		return t;
@@ -351,7 +371,8 @@ static struct ip6_tnl *ip6_tnl_locate(struct net *net,
 	     (t = rtnl_dereference(*tp)) != NULL;
 	     tp = &t->next) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_equal(remote, &t->parms.raddr)) {
+		    ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    p->link == t->parms.link) {
 			if (create)
 				return ERR_PTR(-EEXIST);
 
@@ -485,7 +506,7 @@ ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
 	   processing of the error. */
 
 	rcu_read_lock();
-	t = ip6_tnl_lookup(dev_net(skb->dev), &ipv6h->daddr, &ipv6h->saddr);
+	t = ip6_tnl_lookup(dev_net(skb->dev), skb->dev->ifindex, &ipv6h->daddr, &ipv6h->saddr);
 	if (!t)
 		goto out;
 
@@ -887,7 +908,7 @@ static int ipxip6_rcv(struct sk_buff *skb, u8 ipproto,
 	int ret = -1;
 
 	rcu_read_lock();
-	t = ip6_tnl_lookup(dev_net(skb->dev), &ipv6h->saddr, &ipv6h->daddr);
+	t = ip6_tnl_lookup(dev_net(skb->dev), skb->dev->ifindex, &ipv6h->saddr, &ipv6h->daddr);
 
 	if (t) {
 		u8 tproto = READ_ONCE(t->parms.proto);
@@ -1420,9 +1441,11 @@ ip6_tnl_start_xmit(struct sk_buff *skb, struct net_device *dev)
 static void ip6_tnl_link_config(struct ip6_tnl *t)
 {
 	struct net_device *dev = t->dev;
+	struct net_device *tdev = NULL;
 	struct __ip6_tnl_parm *p = &t->parms;
 	struct flowi6 *fl6 = &t->fl.u.ip6;
 	int t_hlen;
+	int mtu;
 
 	memcpy(dev->dev_addr, &p->laddr, sizeof(struct in6_addr));
 	memcpy(dev->broadcast, &p->raddr, sizeof(struct in6_addr));
@@ -1457,22 +1480,24 @@ static void ip6_tnl_link_config(struct ip6_tnl *t)
 		struct rt6_info *rt = rt6_lookup(t->net,
 						 &p->raddr, &p->laddr,
 						 p->link, NULL, strict);
+		if (!IS_ERR(rt)) {
+			tdev = rt->dst.dev;
+			ip6_rt_put(rt);
+		} else if (t->parms.link) {
+			tdev = __dev_get_by_index(t->net, t->parms.link);
+		}
 
-		if (!rt)
-			return;
-
-		if (rt->dst.dev) {
-			dev->hard_header_len = rt->dst.dev->hard_header_len +
-				t_hlen;
+		if (tdev) {
+			dev->hard_header_len = tdev->hard_header_len + t_hlen;
+			mtu = min(tdev->mtu, IP_MAX_MTU);
 
-			dev->mtu = rt->dst.dev->mtu - t_hlen;
+			dev->mtu = mtu - t_hlen;
 			if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
 				dev->mtu -= 8;
 
 			if (dev->mtu < IPV6_MIN_MTU)
 				dev->mtu = IPV6_MIN_MTU;
 		}
-		ip6_rt_put(rt);
 	}
 }
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-13 15:35               ` [PATCH v4 " William Dauchy
@ 2020-02-13 16:33                 ` Nicolas Dichtel
  2020-02-13 17:19                   ` [PATCH v5 " William Dauchy
  0 siblings, 1 reply; 20+ messages in thread
From: Nicolas Dichtel @ 2020-02-13 16:33 UTC (permalink / raw)
  To: William Dauchy, netdev

Le 13/02/2020 à 16:35, William Dauchy a écrit :
[snip]
> @@ -1420,9 +1441,11 @@ ip6_tnl_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  static void ip6_tnl_link_config(struct ip6_tnl *t)
>  {
>  	struct net_device *dev = t->dev;
> +	struct net_device *tdev = NULL;
>  	struct __ip6_tnl_parm *p = &t->parms;
>  	struct flowi6 *fl6 = &t->fl.u.ip6;
>  	int t_hlen;
> +	int mtu;
>  
>  	memcpy(dev->dev_addr, &p->laddr, sizeof(struct in6_addr));
>  	memcpy(dev->broadcast, &p->raddr, sizeof(struct in6_addr));
> @@ -1457,22 +1480,24 @@ static void ip6_tnl_link_config(struct ip6_tnl *t)
>  		struct rt6_info *rt = rt6_lookup(t->net,
>  						 &p->raddr, &p->laddr,
>  						 p->link, NULL, strict);
> +		if (!IS_ERR(rt)) {
Why IS_ERR()? rt6_lookup() returns a valid pointer or NULL.

> +			tdev = rt->dst.dev;
> +			ip6_rt_put(rt);
> +		} else if (t->parms.link) {
> +			tdev = __dev_get_by_index(t->net, t->parms.link);
p->link to be consistent with the rest of the function.

> +		}
>  
> -		if (!rt)
> -			return;
> -
> -		if (rt->dst.dev) {
But rt->dst.dev can be NULL.

> -			dev->hard_header_len = rt->dst.dev->hard_header_len +
> -				t_hlen;
> +		if (tdev) {
> +			dev->hard_header_len = tdev->hard_header_len + t_hlen;
> +			mtu = min(tdev->mtu, IP_MAX_MTU);
IP6_MAX_MTU?

>  
> -			dev->mtu = rt->dst.dev->mtu - t_hlen;
> +			dev->mtu = mtu - t_hlen;
>  			if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
>  				dev->mtu -= 8;
>  
>  			if (dev->mtu < IPV6_MIN_MTU)
>  				dev->mtu = IPV6_MIN_MTU;
>  		}
> -		ip6_rt_put(rt);
>  	}
>  }
>  
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v5 net] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-13 16:33                 ` Nicolas Dichtel
@ 2020-02-13 17:19                   ` William Dauchy
  2020-02-14  9:50                     ` Nicolas Dichtel
  2020-02-14 15:32                     ` David Miller
  0 siblings, 2 replies; 20+ messages in thread
From: William Dauchy @ 2020-02-13 17:19 UTC (permalink / raw)
  To: netdev; +Cc: William Dauchy, Nicolas Dichtel

With ipip, it is possible to create an extra interface explicitly
attached to a given physical interface:

  # ip link show tunl0
  4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
  # ip link add tunl1 type ipip dev eth0
  # ip link show tunl1
  6: tunl1@eth0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0

But it is not possible with ip6tnl:

  # ip link show ip6tnl0
  5: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/tunnel6 :: brd ::
  # ip link add ip6tnl1 type ip6tnl dev eth0
  RTNETLINK answers: File exists

This patch aims to make it possible by adding link comparaison in both
tunnel locate and lookup functions; we also modify mtu calculation when
attached to an interface with a lower mtu.

This permits to make use of x-netns communication by moving the newly
created tunnel in a given netns.

Signed-off-by: William Dauchy <w.dauchy@criteo.com>
---
changes in v2:
- splitted code to differenciate link/type check
changes in v3:
- abadon 2/2 with type check as we do not have any real use case as of
  today
- fix lookup function and mtu calculation
changes in v4:
- fix and move mtu calculation in ip6_tnl_link_config
changes in v5:
- mtu fixes: rt->dst.dev check + IP6_MAX_MTU
---
 net/ipv6/ip6_tunnel.c | 68 ++++++++++++++++++++++++++++++-------------
 1 file changed, 47 insertions(+), 21 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index b5dd20c4599b..5d65436ad5ad 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -121,6 +121,7 @@ static struct net_device_stats *ip6_get_stats(struct net_device *dev)
 
 /**
  * ip6_tnl_lookup - fetch tunnel matching the end-point addresses
+ *   @link: ifindex of underlying interface
  *   @remote: the address of the tunnel exit-point
  *   @local: the address of the tunnel entry-point
  *
@@ -134,37 +135,56 @@ static struct net_device_stats *ip6_get_stats(struct net_device *dev)
 	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
 
 static struct ip6_tnl *
-ip6_tnl_lookup(struct net *net, const struct in6_addr *remote, const struct in6_addr *local)
+ip6_tnl_lookup(struct net *net, int link,
+	       const struct in6_addr *remote, const struct in6_addr *local)
 {
 	unsigned int hash = HASH(remote, local);
-	struct ip6_tnl *t;
+	struct ip6_tnl *t, *cand = NULL;
 	struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
 	struct in6_addr any;
 
 	for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
-		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_equal(remote, &t->parms.raddr) &&
-		    (t->dev->flags & IFF_UP))
+		if (!ipv6_addr_equal(local, &t->parms.laddr) ||
+		    !ipv6_addr_equal(remote, &t->parms.raddr) ||
+		    !(t->dev->flags & IFF_UP))
+			continue;
+
+		if (link == t->parms.link)
 			return t;
+		else
+			cand = t;
 	}
 
 	memset(&any, 0, sizeof(any));
 	hash = HASH(&any, local);
 	for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
-		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_any(&t->parms.raddr) &&
-		    (t->dev->flags & IFF_UP))
+		if (!ipv6_addr_equal(local, &t->parms.laddr) ||
+		    !ipv6_addr_any(&t->parms.raddr) ||
+		    !(t->dev->flags & IFF_UP))
+			continue;
+
+		if (link == t->parms.link)
 			return t;
+		else if (!cand)
+			cand = t;
 	}
 
 	hash = HASH(remote, &any);
 	for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
-		if (ipv6_addr_equal(remote, &t->parms.raddr) &&
-		    ipv6_addr_any(&t->parms.laddr) &&
-		    (t->dev->flags & IFF_UP))
+		if (!ipv6_addr_equal(remote, &t->parms.raddr) ||
+		    !ipv6_addr_any(&t->parms.laddr) ||
+		    !(t->dev->flags & IFF_UP))
+			continue;
+
+		if (link == t->parms.link)
 			return t;
+		else if (!cand)
+			cand = t;
 	}
 
+	if (cand)
+		return cand;
+
 	t = rcu_dereference(ip6n->collect_md_tun);
 	if (t && t->dev->flags & IFF_UP)
 		return t;
@@ -351,7 +371,8 @@ static struct ip6_tnl *ip6_tnl_locate(struct net *net,
 	     (t = rtnl_dereference(*tp)) != NULL;
 	     tp = &t->next) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_equal(remote, &t->parms.raddr)) {
+		    ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    p->link == t->parms.link) {
 			if (create)
 				return ERR_PTR(-EEXIST);
 
@@ -485,7 +506,7 @@ ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
 	   processing of the error. */
 
 	rcu_read_lock();
-	t = ip6_tnl_lookup(dev_net(skb->dev), &ipv6h->daddr, &ipv6h->saddr);
+	t = ip6_tnl_lookup(dev_net(skb->dev), skb->dev->ifindex, &ipv6h->daddr, &ipv6h->saddr);
 	if (!t)
 		goto out;
 
@@ -887,7 +908,7 @@ static int ipxip6_rcv(struct sk_buff *skb, u8 ipproto,
 	int ret = -1;
 
 	rcu_read_lock();
-	t = ip6_tnl_lookup(dev_net(skb->dev), &ipv6h->saddr, &ipv6h->daddr);
+	t = ip6_tnl_lookup(dev_net(skb->dev), skb->dev->ifindex, &ipv6h->saddr, &ipv6h->daddr);
 
 	if (t) {
 		u8 tproto = READ_ONCE(t->parms.proto);
@@ -1420,8 +1441,10 @@ ip6_tnl_start_xmit(struct sk_buff *skb, struct net_device *dev)
 static void ip6_tnl_link_config(struct ip6_tnl *t)
 {
 	struct net_device *dev = t->dev;
+	struct net_device *tdev = NULL;
 	struct __ip6_tnl_parm *p = &t->parms;
 	struct flowi6 *fl6 = &t->fl.u.ip6;
+	unsigned int mtu;
 	int t_hlen;
 
 	memcpy(dev->dev_addr, &p->laddr, sizeof(struct in6_addr));
@@ -1457,22 +1480,25 @@ static void ip6_tnl_link_config(struct ip6_tnl *t)
 		struct rt6_info *rt = rt6_lookup(t->net,
 						 &p->raddr, &p->laddr,
 						 p->link, NULL, strict);
+		if (rt) {
+			tdev = rt->dst.dev;
+			ip6_rt_put(rt);
+		}
 
-		if (!rt)
-			return;
+		if (!tdev && p->link)
+			tdev = __dev_get_by_index(t->net, p->link);
 
-		if (rt->dst.dev) {
-			dev->hard_header_len = rt->dst.dev->hard_header_len +
-				t_hlen;
+		if (tdev) {
+			dev->hard_header_len = tdev->hard_header_len + t_hlen;
+			mtu = min_t(unsigned int, tdev->mtu, IP6_MAX_MTU);
 
-			dev->mtu = rt->dst.dev->mtu - t_hlen;
+			dev->mtu = mtu - t_hlen;
 			if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
 				dev->mtu -= 8;
 
 			if (dev->mtu < IPV6_MIN_MTU)
 				dev->mtu = IPV6_MIN_MTU;
 		}
-		ip6_rt_put(rt);
 	}
 }
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-13 17:19                   ` [PATCH v5 " William Dauchy
@ 2020-02-14  9:50                     ` Nicolas Dichtel
  2020-02-14 15:32                     ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: Nicolas Dichtel @ 2020-02-14  9:50 UTC (permalink / raw)
  To: William Dauchy, netdev

Le 13/02/2020 à 18:19, William Dauchy a écrit :
> With ipip, it is possible to create an extra interface explicitly
> attached to a given physical interface:
> 
>   # ip link show tunl0
>   4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ipip 0.0.0.0 brd 0.0.0.0
>   # ip link add tunl1 type ipip dev eth0
>   # ip link show tunl1
>   6: tunl1@eth0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ipip 0.0.0.0 brd 0.0.0.0
> 
> But it is not possible with ip6tnl:
> 
>   # ip link show ip6tnl0
>   5: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>       link/tunnel6 :: brd ::
>   # ip link add ip6tnl1 type ip6tnl dev eth0
>   RTNETLINK answers: File exists
> 
> This patch aims to make it possible by adding link comparaison in both
> tunnel locate and lookup functions; we also modify mtu calculation when
> attached to an interface with a lower mtu.
> 
> This permits to make use of x-netns communication by moving the newly
> created tunnel in a given netns.
> 
> Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net] net, ip6_tunnel: enhance tunnel locate with link check
  2020-02-13 17:19                   ` [PATCH v5 " William Dauchy
  2020-02-14  9:50                     ` Nicolas Dichtel
@ 2020-02-14 15:32                     ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2020-02-14 15:32 UTC (permalink / raw)
  To: w.dauchy; +Cc: netdev, nicolas.dichtel

From: William Dauchy <w.dauchy@criteo.com>
Date: Thu, 13 Feb 2020 18:19:22 +0100

> With ipip, it is possible to create an extra interface explicitly
> attached to a given physical interface:
> 
>   # ip link show tunl0
>   4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ipip 0.0.0.0 brd 0.0.0.0
>   # ip link add tunl1 type ipip dev eth0
>   # ip link show tunl1
>   6: tunl1@eth0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ipip 0.0.0.0 brd 0.0.0.0
> 
> But it is not possible with ip6tnl:
> 
>   # ip link show ip6tnl0
>   5: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>       link/tunnel6 :: brd ::
>   # ip link add ip6tnl1 type ip6tnl dev eth0
>   RTNETLINK answers: File exists
> 
> This patch aims to make it possible by adding link comparaison in both
> tunnel locate and lookup functions; we also modify mtu calculation when
> attached to an interface with a lower mtu.
> 
> This permits to make use of x-netns communication by moving the newly
> created tunnel in a given netns.
> 
> Signed-off-by: William Dauchy <w.dauchy@criteo.com>

Applied, thank you.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-02-14 15:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-05 14:57 [PATCH] net, ip6_tunnel: enhance tunnel locate with link/type check William Dauchy
2020-02-05 15:04 ` Nicolas Dichtel
2020-02-05 16:29   ` [PATCH v2 0/2] net, ip6_tunnel: enhance tunnel locate William Dauchy
2020-02-05 16:29   ` [PATCH v2 1/2] net, ip6_tunnel: enhance tunnel locate with link check William Dauchy
2020-02-05 16:54     ` Nicolas Dichtel
2020-02-05 17:36       ` William Dauchy
2020-02-12  8:30       ` [PATCH v3 net] " William Dauchy
2020-02-12 15:54         ` Nicolas Dichtel
2020-02-12 21:06           ` William Dauchy
2020-02-13  9:34             ` Nicolas Dichtel
2020-02-13 15:26               ` William Dauchy
2020-02-13 15:35               ` [PATCH v4 " William Dauchy
2020-02-13 16:33                 ` Nicolas Dichtel
2020-02-13 17:19                   ` [PATCH v5 " William Dauchy
2020-02-14  9:50                     ` Nicolas Dichtel
2020-02-14 15:32                     ` David Miller
2020-02-05 16:29   ` [PATCH v2 2/2] net, ip6_tunnel: enhance tunnel locate with type check William Dauchy
2020-02-05 16:59     ` Nicolas Dichtel
2020-02-05 17:31       ` William Dauchy
2020-02-05 18:16         ` William Dauchy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.