linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net-next 0/2] Add support for SRv6 End.DT4 action
@ 2020-02-13  1:09 Carmine Scarpitta
  2020-02-13  1:09 ` [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table Carmine Scarpitta
  2020-02-13  1:09 ` [net-next 2/2] Add support for SRv6 End.DT4 action Carmine Scarpitta
  0 siblings, 2 replies; 13+ messages in thread
From: Carmine Scarpitta @ 2020-02-13  1:09 UTC (permalink / raw)
  To: davem
  Cc: kuznet, yoshfuji, kuba, netdev, linux-kernel, ahmed.abdelsalam,
	david.lebrun, dav.lebrun, andrea.mayer, paolo.lungaroni,
	Carmine Scarpitta

Hi,

this patch series adds the support for SRv6 End.DT4 action.
End.DT4 decapsulates the received packets and does IPv4 routing
lookup in a specific routing table.

The IPv4 routing subsystem does not allow to specify the routing
table in which the lookup has to be performed.

Patch 1 enables to perform IPv4 FIB lookup in a predefined FIB table.
Patch 2 adds the support for SRv6 End.DT4 action.

Thanks,
Carmine Scarpitta

Carmine Scarpitta (2):
  Perform IPv4 FIB lookup in a predefined FIB table
  Add support for SRv6 End.DT4 action

 include/net/route.h   |  2 +-
 net/ipv4/route.c      | 22 ++++++++++++-------
 net/ipv6/seg6_local.c | 49 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+), 9 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table
  2020-02-13  1:09 [net-next 0/2] Add support for SRv6 End.DT4 action Carmine Scarpitta
@ 2020-02-13  1:09 ` Carmine Scarpitta
  2020-02-15 18:06   ` David Ahern
  2020-02-13  1:09 ` [net-next 2/2] Add support for SRv6 End.DT4 action Carmine Scarpitta
  1 sibling, 1 reply; 13+ messages in thread
From: Carmine Scarpitta @ 2020-02-13  1:09 UTC (permalink / raw)
  To: davem
  Cc: kuznet, yoshfuji, kuba, netdev, linux-kernel, ahmed.abdelsalam,
	david.lebrun, dav.lebrun, andrea.mayer, paolo.lungaroni,
	Carmine Scarpitta

In IPv4, the routing subsystem is invoked by calling ip_route_input_rcu()
which performs the recognition logic and calls ip_route_input_slow().

ip_route_input_slow() initialises both "fi" and "table" members
of the fib_result structure to null before calling fib_lookup().

fib_lookup() performs fib lookup in the routing table configured
by the policy routing rules.

In this patch, we allow invoking the ip4 routing subsystem
with known routing table. This is useful for use-cases implementing
a separate routing table per tenant.

The patch introduces a new flag named "tbl_known" to the definition of
ip_route_input_rcu() and ip_route_input_slow().

When the flag is set, ip_route_input_slow() will call fib_table_lookup()
using the defined table instead of using fib_lookup().

Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
Acked-by: Ahmed Abdelsalam <ahmed.abdelsalam@gssi.it>
Acked-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Acked-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
---
 include/net/route.h |  2 +-
 net/ipv4/route.c    | 22 ++++++++++++++--------
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index a9c60fc68e36..4ff977bd7029 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -183,7 +183,7 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
 			 u8 tos, struct net_device *devin);
 int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
 		       u8 tos, struct net_device *devin,
-		       struct fib_result *res);
+		       struct fib_result *res, bool tbl_known);
 
 int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
 		      u8 tos, struct net_device *devin,
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d5c57b3f77d5..39cec9883d6f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2077,7 +2077,7 @@ int ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 
 static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 			       u8 tos, struct net_device *dev,
-			       struct fib_result *res)
+			       struct fib_result *res, bool tbl_known)
 {
 	struct in_device *in_dev = __in_dev_get_rcu(dev);
 	struct flow_keys *flkeys = NULL, _flkeys;
@@ -2109,8 +2109,6 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
 		goto martian_source;
 
-	res->fi = NULL;
-	res->table = NULL;
 	if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0))
 		goto brd_input;
 
@@ -2155,7 +2153,14 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		fl4.fl4_dport = 0;
 	}
 
-	err = fib_lookup(net, &fl4, res, 0);
+	if (!tbl_known) {
+		res->fi = NULL;
+		res->table = NULL;
+		err = fib_lookup(net, &fl4, res, 0);
+	} else {
+		err = fib_table_lookup(res->table, &fl4, res, FIB_LOOKUP_NOREF);
+	}
+
 	if (err != 0) {
 		if (!IN_DEV_FORWARD(in_dev))
 			err = -EHOSTUNREACH;
@@ -2292,7 +2297,7 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 
 	tos &= IPTOS_RT_MASK;
 	rcu_read_lock();
-	err = ip_route_input_rcu(skb, daddr, saddr, tos, dev, &res);
+	err = ip_route_input_rcu(skb, daddr, saddr, tos, dev, &res, false);
 	rcu_read_unlock();
 
 	return err;
@@ -2301,7 +2306,8 @@ EXPORT_SYMBOL(ip_route_input_noref);
 
 /* called with rcu_read_lock held */
 int ip_route_input_rcu(struct sk_buff *skb, __be32 daddr, __be32 saddr,
-		       u8 tos, struct net_device *dev, struct fib_result *res)
+		       u8 tos, struct net_device *dev, struct fib_result *res,
+		       bool tbl_known)
 {
 	/* Multicast recognition logic is moved from route cache to here.
 	   The problem was that too many Ethernet cards have broken/missing
@@ -2347,7 +2353,7 @@ int ip_route_input_rcu(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		return err;
 	}
 
-	return ip_route_input_slow(skb, daddr, saddr, tos, dev, res);
+	return ip_route_input_slow(skb, daddr, saddr, tos, dev, res, tbl_known);
 }
 
 /* called with rcu_read_lock() */
@@ -3192,7 +3198,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 		skb->dev	= dev;
 		skb->mark	= mark;
 		err = ip_route_input_rcu(skb, dst, src, rtm->rtm_tos,
-					 dev, &res);
+					 dev, &res, false);
 
 		rt = skb_rtable(skb);
 		if (err == 0 && rt->dst.error)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next 2/2] Add support for SRv6 End.DT4 action
  2020-02-13  1:09 [net-next 0/2] Add support for SRv6 End.DT4 action Carmine Scarpitta
  2020-02-13  1:09 ` [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table Carmine Scarpitta
@ 2020-02-13  1:09 ` Carmine Scarpitta
  1 sibling, 0 replies; 13+ messages in thread
From: Carmine Scarpitta @ 2020-02-13  1:09 UTC (permalink / raw)
  To: davem
  Cc: kuznet, yoshfuji, kuba, netdev, linux-kernel, ahmed.abdelsalam,
	david.lebrun, dav.lebrun, andrea.mayer, paolo.lungaroni,
	Carmine Scarpitta

SRv6 End.DT4 is defined in the SRv6 Network Programming doc within IETF [1].

End.DT4 is used to implement IPv4 L3VPN use-cases in multi-tenants
environments. It decapsulates the received packets and does IPv4 routing
lookup in the routing table of the tenant.

At JANOG44, LINE corporation presented their multi-tenant DC architecture
using SRv6 [2]. In the slides, they reported that the linux kernel is
missing the support of SRv6 End.DT4 action.

This patch adds support to SRv6 End.DT4 action.

The iproute2 part required for this action was already implemented along
with the other supported SRv6 actions [3].

[1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming-08
[2] https://speakerdeck.com/line_developers/line-data-center-networking-with-srv6
[3] https://patchwork.ozlabs.org/patch/799837/

Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
Acked-by: Ahmed Abdelsalam <ahmed.abdelsalam@gssi.it>
Acked-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Acked-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
---
 net/ipv6/seg6_local.c | 49 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 7cbc19731997..d54a921ea96d 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -151,6 +151,26 @@ static void advance_nextseg(struct ipv6_sr_hdr *srh, struct in6_addr *daddr)
 	*daddr = *addr;
 }
 
+static int seg6_lookup_nexthop_v4(struct sk_buff *skb, u32 tbl_id)
+{
+	struct net *net = dev_net(skb->dev);
+	struct fib_result res;
+	struct iphdr *iph;
+	u8 tos;
+
+	iph = ip_hdr(skb);
+	tos = iph->tos & IPTOS_RT_MASK;
+
+	res.table = fib_get_table(net, tbl_id);
+	if (!res.table)
+		return -ENOENT;
+
+	skb_dst_drop(skb);
+
+	return ip_route_input_rcu(skb, iph->daddr, iph->saddr,
+				  tos, skb->dev, &res, true);
+}
+
 static int
 seg6_lookup_any_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
 			u32 tbl_id, bool local_delivery)
@@ -401,6 +421,30 @@ static int input_action_end_dx4(struct sk_buff *skb,
 	return -EINVAL;
 }
 
+static int input_action_end_dt4(struct sk_buff *skb,
+				struct seg6_local_lwt *slwt)
+{
+	int err;
+
+	if (!decap_and_validate(skb, IPPROTO_IPIP))
+		goto drop;
+
+	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+		goto drop;
+
+	skb_set_transport_header(skb, sizeof(struct iphdr));
+
+	err = seg6_lookup_nexthop_v4(skb, slwt->table);
+	if (err)
+		goto drop;
+
+	return dst_input(skb);
+
+drop:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+
 static int input_action_end_dt6(struct sk_buff *skb,
 				struct seg6_local_lwt *slwt)
 {
@@ -589,6 +633,11 @@ static struct seg6_action_desc seg6_action_table[] = {
 		.attrs		= (1 << SEG6_LOCAL_NH4),
 		.input		= input_action_end_dx4,
 	},
+	{
+		.action		= SEG6_LOCAL_ACTION_END_DT4,
+		.attrs		= (1 << SEG6_LOCAL_TABLE),
+		.input		= input_action_end_dt4,
+	},
 	{
 		.action		= SEG6_LOCAL_ACTION_END_DT6,
 		.attrs		= (1 << SEG6_LOCAL_TABLE),
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table
  2020-02-13  1:09 ` [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table Carmine Scarpitta
@ 2020-02-15 18:06   ` David Ahern
  2020-02-18 23:50     ` Carmine Scarpitta
  0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2020-02-15 18:06 UTC (permalink / raw)
  To: Carmine Scarpitta, davem
  Cc: kuznet, yoshfuji, kuba, netdev, linux-kernel, ahmed.abdelsalam,
	david.lebrun, dav.lebrun, andrea.mayer, paolo.lungaroni

On 2/12/20 6:09 PM, Carmine Scarpitta wrote:
> In IPv4, the routing subsystem is invoked by calling ip_route_input_rcu()
> which performs the recognition logic and calls ip_route_input_slow().
> 
> ip_route_input_slow() initialises both "fi" and "table" members
> of the fib_result structure to null before calling fib_lookup().
> 
> fib_lookup() performs fib lookup in the routing table configured
> by the policy routing rules.
> 
> In this patch, we allow invoking the ip4 routing subsystem
> with known routing table. This is useful for use-cases implementing
> a separate routing table per tenant.
> 
> The patch introduces a new flag named "tbl_known" to the definition of
> ip_route_input_rcu() and ip_route_input_slow().
> 
> When the flag is set, ip_route_input_slow() will call fib_table_lookup()
> using the defined table instead of using fib_lookup().

I do not like this change. If you want a specific table lookup, then why
just call fib_table_lookup directly? Both it and rt_dst_alloc are
exported for modules. Your next patch already does a fib table lookup.


> 
> Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
> Acked-by: Ahmed Abdelsalam <ahmed.abdelsalam@gssi.it>
> Acked-by: Andrea Mayer <andrea.mayer@uniroma2.it>
> Acked-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
> ---
>  include/net/route.h |  2 +-
>  net/ipv4/route.c    | 22 ++++++++++++++--------
>  2 files changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/include/net/route.h b/include/net/route.h
> index a9c60fc68e36..4ff977bd7029 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -183,7 +183,7 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
>  			 u8 tos, struct net_device *devin);
>  int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
>  		       u8 tos, struct net_device *devin,
> -		       struct fib_result *res);
> +		       struct fib_result *res, bool tbl_known);
>  
>  int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
>  		      u8 tos, struct net_device *devin,
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index d5c57b3f77d5..39cec9883d6f 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2077,7 +2077,7 @@ int ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr,
>  
>  static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
>  			       u8 tos, struct net_device *dev,
> -			       struct fib_result *res)
> +			       struct fib_result *res, bool tbl_known)
>  {
>  	struct in_device *in_dev = __in_dev_get_rcu(dev);
>  	struct flow_keys *flkeys = NULL, _flkeys;
> @@ -2109,8 +2109,6 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
>  	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
>  		goto martian_source;
>  
> -	res->fi = NULL;
> -	res->table = NULL;
>  	if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0))
>  		goto brd_input;

I believe this also introduces a potential bug. You remove the fi
initialization yet do not cover the goto case.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table
  2020-02-15 18:06   ` David Ahern
@ 2020-02-18 23:50     ` Carmine Scarpitta
  2020-02-19  1:05       ` David Ahern
  0 siblings, 1 reply; 13+ messages in thread
From: Carmine Scarpitta @ 2020-02-18 23:50 UTC (permalink / raw)
  To: David Ahern
  Cc: davem, kuznet, yoshfuji, kuba, netdev, linux-kernel,
	ahmed.abdelsalam, dav.lebrun, andrea.mayer, paolo.lungaroni

Hi David, 
Thanks for the review and sorry for the late reply 

Indeed both call fib_table_lookup and rt_dst_alloc are exported for modules. 
However, several functions defined in route.c are not exported:
- the two functions rt_cache_valid and rt_cache_route required to handle the routing cache
- find_exception, required to support fib exceptions.
This would require duplicating a lot of the IPv4 routing code. 
The reason behind this change is really to reuse the IPv4 routing code instead of doing a duplication. 

For the fi member of the struct fib_result, we will fix it by initializing before "if (!tbl_known)"

Thanks, 
Carmine 


On Sat, 15 Feb 2020 11:06:43 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 2/12/20 6:09 PM, Carmine Scarpitta wrote:
> > In IPv4, the routing subsystem is invoked by calling ip_route_input_rcu()
> > which performs the recognition logic and calls ip_route_input_slow().
> > 
> > ip_route_input_slow() initialises both "fi" and "table" members
> > of the fib_result structure to null before calling fib_lookup().
> > 
> > fib_lookup() performs fib lookup in the routing table configured
> > by the policy routing rules.
> > 
> > In this patch, we allow invoking the ip4 routing subsystem
> > with known routing table. This is useful for use-cases implementing
> > a separate routing table per tenant.
> > 
> > The patch introduces a new flag named "tbl_known" to the definition of
> > ip_route_input_rcu() and ip_route_input_slow().
> > 
> > When the flag is set, ip_route_input_slow() will call fib_table_lookup()
> > using the defined table instead of using fib_lookup().
> 
> I do not like this change. If you want a specific table lookup, then why
> just call fib_table_lookup directly? Both it and rt_dst_alloc are
> exported for modules. Your next patch already does a fib table lookup.
> 
> 
> > 
> > Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
> > Acked-by: Ahmed Abdelsalam <ahmed.abdelsalam@gssi.it>
> > Acked-by: Andrea Mayer <andrea.mayer@uniroma2.it>
> > Acked-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
> > ---
> >  include/net/route.h |  2 +-
> >  net/ipv4/route.c    | 22 ++++++++++++++--------
> >  2 files changed, 15 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/net/route.h b/include/net/route.h
> > index a9c60fc68e36..4ff977bd7029 100644
> > --- a/include/net/route.h
> > +++ b/include/net/route.h
> > @@ -183,7 +183,7 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
> >  			 u8 tos, struct net_device *devin);
> >  int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
> >  		       u8 tos, struct net_device *devin,
> > -		       struct fib_result *res);
> > +		       struct fib_result *res, bool tbl_known);
> >  
> >  int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
> >  		      u8 tos, struct net_device *devin,
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index d5c57b3f77d5..39cec9883d6f 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -2077,7 +2077,7 @@ int ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr,
> >  
> >  static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
> >  			       u8 tos, struct net_device *dev,
> > -			       struct fib_result *res)
> > +			       struct fib_result *res, bool tbl_known)
> >  {
> >  	struct in_device *in_dev = __in_dev_get_rcu(dev);
> >  	struct flow_keys *flkeys = NULL, _flkeys;
> > @@ -2109,8 +2109,6 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
> >  	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
> >  		goto martian_source;
> >  
> > -	res->fi = NULL;
> > -	res->table = NULL;
> >  	if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0))
> >  		goto brd_input;
> 
> I believe this also introduces a potential bug. You remove the fi
> initialization yet do not cover the goto case.
> 
> 


-- 
Carmine Scarpitta <carmine.scarpitta@uniroma2.it>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table
  2020-02-18 23:50     ` Carmine Scarpitta
@ 2020-02-19  1:05       ` David Ahern
  2020-02-19  2:49         ` Carmine Scarpitta
  0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2020-02-19  1:05 UTC (permalink / raw)
  To: Carmine Scarpitta
  Cc: davem, kuznet, yoshfuji, kuba, netdev, linux-kernel,
	ahmed.abdelsalam, dav.lebrun, andrea.mayer, paolo.lungaroni

On 2/18/20 4:50 PM, Carmine Scarpitta wrote:
> Indeed both call fib_table_lookup and rt_dst_alloc are exported for modules. 
> However, several functions defined in route.c are not exported:
> - the two functions rt_cache_valid and rt_cache_route required to handle the routing cache
> - find_exception, required to support fib exceptions.
> This would require duplicating a lot of the IPv4 routing code. 
> The reason behind this change is really to reuse the IPv4 routing code instead of doing a duplication. 
> 
> For the fi member of the struct fib_result, we will fix it by initializing before "if (!tbl_known)"

The route.c code does not need to know about the fib table or fib
policy. Why do all of the existing policy options (mark, L3 domains,
uid) to direct the lookup to the table of interest not work for this use
case?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table
  2020-02-19  1:05       ` David Ahern
@ 2020-02-19  2:49         ` Carmine Scarpitta
  2020-02-19  4:29           ` David Ahern
  0 siblings, 1 reply; 13+ messages in thread
From: Carmine Scarpitta @ 2020-02-19  2:49 UTC (permalink / raw)
  To: David Ahern
  Cc: davem, kuznet, yoshfuji, kuba, netdev, linux-kernel,
	ahmed.abdelsalam, dav.lebrun, andrea.mayer, paolo.lungaroni,
	hiroki.shirokura

Hi David,
Thanks for the reply.

The problem is not related to the table lookup. Calling fib_table_lookup and then rt_dst_alloc from seg6_local.c is good.

But after the lookup we need to forward the packet according the matched table entry. This requires to perform all the steps already implemented by ip_route_input_slow function. So we need to call the following functions (defined in route.c):
- rt_cache_valid
- find_exception
- rt_dst_alloc
- rt_set_nexthop
- rt_cache_route

Some of these functions are not exported and so we cannot call them from seg6_local.c
Consequently, we are not able to support all the functionalities implemented by IPv4 routing subsystem.

We are not harming the IPv4 FIB lookup. We introduce a new flag that allows us to re-use all the non exported functions. 

If the flag is not set, the normal IPv4 FIB lookup is the same with no change. 

Thanks,
Carmine


On Tue, 18 Feb 2020 18:05:58 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 2/18/20 4:50 PM, Carmine Scarpitta wrote:
> > Indeed both call fib_table_lookup and rt_dst_alloc are exported for modules. 
> > However, several functions defined in route.c are not exported:
> > - the two functions rt_cache_valid and rt_cache_route required to handle the routing cache
> > - find_exception, required to support fib exceptions.
> > This would require duplicating a lot of the IPv4 routing code. 
> > The reason behind this change is really to reuse the IPv4 routing code instead of doing a duplication. 
> > 
> > For the fi member of the struct fib_result, we will fix it by initializing before "if (!tbl_known)"
> 
> The route.c code does not need to know about the fib table or fib
> policy. Why do all of the existing policy options (mark, L3 domains,
> uid) to direct the lookup to the table of interest not work for this use
> case?


-- 
Carmine Scarpitta <carmine.scarpitta@uniroma2.it>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table
  2020-02-19  2:49         ` Carmine Scarpitta
@ 2020-02-19  4:29           ` David Ahern
  2020-02-20 22:33             ` Carmine Scarpitta
  2020-03-06 16:45             ` Ahmed Abdelsalam
  0 siblings, 2 replies; 13+ messages in thread
From: David Ahern @ 2020-02-19  4:29 UTC (permalink / raw)
  To: Carmine Scarpitta
  Cc: davem, kuznet, yoshfuji, kuba, netdev, linux-kernel,
	ahmed.abdelsalam, dav.lebrun, andrea.mayer, paolo.lungaroni,
	hiroki.shirokura

On 2/18/20 7:49 PM, Carmine Scarpitta wrote:
> Hi David,
> Thanks for the reply.
> 
> The problem is not related to the table lookup. Calling fib_table_lookup and then rt_dst_alloc from seg6_local.c is good.
> 

you did not answer my question. Why do all of the existing policy
options (mark, L3 domains, uid) to direct the lookup to the table of
interest not work for this use case?

What you want is not unique. There are many ways to make it happen.
Bleeding policy details to route.c and adding a flag that is always
present and checked even when not needed (e.g.,
CONFIG_IP_MULTIPLE_TABLES is disabled) is not the right way to do it.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table
  2020-02-19  4:29           ` David Ahern
@ 2020-02-20 22:33             ` Carmine Scarpitta
  2020-02-21 17:31               ` David Ahern
  2020-03-06 16:45             ` Ahmed Abdelsalam
  1 sibling, 1 reply; 13+ messages in thread
From: Carmine Scarpitta @ 2020-02-20 22:33 UTC (permalink / raw)
  To: David Ahern
  Cc: davem, kuznet, yoshfuji, kuba, netdev, linux-kernel,
	ahmed.abdelsalam, dav.lebrun, andrea.mayer, paolo.lungaroni,
	hiroki.shirokura

Hi David,

Regarding your question. 

Our use-case is more than doing lookup into a VRF. 

What we are working on a multi-tenant automated DC fabric that supports 
overlay, traffic engineering (TE) and service function chaining (SFC). 
We are leveraging the SRv6 implementation in Linux. 
 
For the overlay we leverage: 
- SRv6 T.Encaps to encapsulate both IPv4 and IPv6 traffic of the tenant 
   (T.Encaps is supported since kernel 4.10) 
- SRv6 End.DT4 to decapsulate the overlay encapsulation and does the 
lookup inside the tenants VRF (this is the only missing piece)

For TE we leverage: 
- SRv6 End and End.X functions to steer traffic through one or more midpoints
to avoid congested links, etc. (End and End.X are supported since kernel 4.14)

For SFC we leverage some network functions that supports SRv6: 
- iptables already supports matching SRv6 header since kernel 4.16. 
- There is some work in progress of adding support to nftables as well. 

On top of that we are using BGP as a control plane to advertise the VPN/Egress 
tunnel endpoints. 

Part of this is already running in production at LINE corporation [1]. 

As you can see, what is missing is having SRv6 End.DT4 supported to do 
decapsulation and VRF lookup.  

We introduced this flag to avoid duplicating the IPv4 FIB lookup code. 

For the "tbl_known" flag, we can wrap the check of the flag inside 
a "#ifdef CONFIG_IP_MULTIPLE_TABLES" directive. 
If CONFIG_IP_MULTIPLE_TABLES is not set, we won't do any check.  

Thanks, 
Carmine 


[1] https://speakerdeck.com/line_developers/line-data-center-networking-with-srv6


On Tue, 18 Feb 2020 21:29:31 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 2/18/20 7:49 PM, Carmine Scarpitta wrote:
> > Hi David,
> > Thanks for the reply.
> > 
> > The problem is not related to the table lookup. Calling fib_table_lookup and then rt_dst_alloc from seg6_local.c is good.
> > 
> 
> you did not answer my question. Why do all of the existing policy
> options (mark, L3 domains, uid) to direct the lookup to the table of
> interest not work for this use case?
> 
> What you want is not unique. There are many ways to make it happen.
> Bleeding policy details to route.c and adding a flag that is always
> present and checked even when not needed (e.g.,
> CONFIG_IP_MULTIPLE_TABLES is disabled) is not the right way to do it.


-- 
Carmine Scarpitta <carmine.scarpitta@uniroma2.it>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table
  2020-02-20 22:33             ` Carmine Scarpitta
@ 2020-02-21 17:31               ` David Ahern
  0 siblings, 0 replies; 13+ messages in thread
From: David Ahern @ 2020-02-21 17:31 UTC (permalink / raw)
  To: Carmine Scarpitta
  Cc: davem, kuznet, yoshfuji, kuba, netdev, linux-kernel,
	ahmed.abdelsalam, dav.lebrun, andrea.mayer, paolo.lungaroni,
	hiroki.shirokura

On 2/20/20 3:33 PM, Carmine Scarpitta wrote:
> As you can see, what is missing is having SRv6 End.DT4 supported to do 
> decapsulation and VRF lookup.  

Thanks for the detailed explanation and reference. Both this comment and
the slides reference VRF and per-tenant table lookup which is the same
as VRF.

If you allocated a VRF device for the tenant, the SRv6 header pop and
route lookup can be done very similar to how MPLS works when the goal is
to do a route lookup vs forwarding to a specific nexthop, be it main table:

    ip -f mpls route add 200 dev lo

or VRF table

    ip -f mpls route add 200 dev red

Take a look at the mpls code. Unless I am missing something this use
case is very similar.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table
  2020-02-19  4:29           ` David Ahern
  2020-02-20 22:33             ` Carmine Scarpitta
@ 2020-03-06 16:45             ` Ahmed Abdelsalam
  2020-03-09 15:36               ` David Ahern
  1 sibling, 1 reply; 13+ messages in thread
From: Ahmed Abdelsalam @ 2020-03-06 16:45 UTC (permalink / raw)
  To: David Ahern, Carmine Scarpitta
  Cc: davem, kuznet, yoshfuji, kuba, netdev, linux-kernel, dav.lebrun,
	andrea.mayer, paolo.lungaroni, hiroki.shirokura

Hi David,

Thanks for the pointers for the VRF with MPLS.

We have been looking at this for the last weeks also watched your videos 
on the VRF and l3mdev implementation at the different netdev conferences.

However, in the SRv6 we don’t really need a VRF device. The SRv6 
functions (the already supported ones as well as the End.DT4 submitted 
here) resides in the IPv6 FIB table.

The way it works is as follows:
1) create a table for the tenant
$ echo 100 tenant1 >> /etc/iproute2/rt_tables

You instantiate an SRv6 End.DT4 function at the Egress PE to decapsulate 
the SRv6 encapsulation and lookup the inner packet in the tenant1 table. 
The example iproute2 command to do so is as below.

$ ip -6 route add A::B encap seg6local action End.DT4 table tenant1 dev 
enp0s8

This installs an IPv6 FIB entry as shown below.
$ ip -6 r
a::b  encap seg6local action End.DT4 table 100 dev enp0s8 metric 1024 
pref medium

Then the BGP routing daemon at the Egress PE is used to advertise this 
VPN service. The BGP sub-TLV to support SRv6 IPv4 L3VPN is defined in [2].

The SRv6 BGP extensions to support IPv4/IPv6 L3VPN are now merged in in 
FRRouting/frr [3][4][5][6].

There is also a pull request for the CLI to configure SRv6-locator on 
zebra [7].

The BGP daemon at the Ingress PE receives the BGP update and installs an 
a FIB entry that this bound to SRv6 encapsulation.

$ ip r
30.0.0.0/24  encap seg6 mode encap segs 1 [ a::b ] dev enp0s9

Traffic destined to that tenant will get encapsulated at the ingress 
node and forwarded to the egress node on the IPv6 fabric.

The encapsulation is in the form of outer IPv6 header that has the 
destination address equal to the VPN service A::B instantiated at the 
Egress PE.

When the packet arrives at the Egress PE, the destination address 
matches the FIB entry associated with the End.DT4 function which does 
the decapsulation and the lookup inside the tenant table associated with 
it (tenant1).

Everything I explained is in the Linux kernel since a while. End.DT4 was 
missing and this the reason we submitted this patch.

In this multi-tenant DC fabric we leverage the IPv6 forwarding. No need 
for MPLS dataplane in the fabric.

We can submit a v2 of patch addressing your comments on the "tbl_known" 
flag.

Thanks,
Ahmed

[1] https://segment-routing.org/index.php/Implementation/AdvancedConf
[2] https://tools.ietf.org/html/draft-ietf-bess-srv6-services-02
[3] 
https://github.com/FRRouting/frr/commit/7f1ace03c78ca57c7f8b5df5796c66fddb47e5fe
[4] 
https://github.com/FRRouting/frr/commit/e496b4203055c50806dc7193b9762304261c4bbd
[5] 
https://github.com/FRRouting/frr/commit/63d02478b557011b8606668f1e3c2edbf263794d
[6] 
https://github.com/FRRouting/frr/commit/c6ca155d73585b1ca383facd74e9973c281f1f93
[7] https://github.com/FRRouting/frr/pull/5865


On 19/02/2020 05:29, David Ahern wrote:
> On 2/18/20 7:49 PM, Carmine Scarpitta wrote:
>> Hi David,
>> Thanks for the reply.
>>
>> The problem is not related to the table lookup. Calling fib_table_lookup and then rt_dst_alloc from seg6_local.c is good.
>>
> 
> you did not answer my question. Why do all of the existing policy
> options (mark, L3 domains, uid) to direct the lookup to the table of
> interest not work for this use case?
> 
> What you want is not unique. There are many ways to make it happen.
> Bleeding policy details to route.c and adding a flag that is always
> present and checked even when not needed (e.g.,
> CONFIG_IP_MULTIPLE_TABLES is disabled) is not the right way to do it.
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table
  2020-03-06 16:45             ` Ahmed Abdelsalam
@ 2020-03-09 15:36               ` David Ahern
  2020-03-10 16:26                 ` Ahmed Abdelsalam
  0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2020-03-09 15:36 UTC (permalink / raw)
  To: Ahmed Abdelsalam, Carmine Scarpitta
  Cc: davem, kuznet, yoshfuji, kuba, netdev, linux-kernel, dav.lebrun,
	andrea.mayer, paolo.lungaroni, hiroki.shirokura

On 3/6/20 9:45 AM, Ahmed Abdelsalam wrote:
> 
> However, in the SRv6 we don’t really need a VRF device. The SRv6
> functions (the already supported ones as well as the End.DT4 submitted
> here) resides in the IPv6 FIB table.
> 
> The way it works is as follows:
> 1) create a table for the tenant
> $ echo 100 tenant1 >> /etc/iproute2/rt_tables
> 
> You instantiate an SRv6 End.DT4 function at the Egress PE to decapsulate
> the SRv6 encapsulation and lookup the inner packet in the tenant1 table.
> The example iproute2 command to do so is as below.
> 
> $ ip -6 route add A::B encap seg6local action End.DT4 table tenant1 dev
> enp0s8
> 
> This installs an IPv6 FIB entry as shown below.
> $ ip -6 r
> a::b  encap seg6local action End.DT4 table 100 dev enp0s8 metric 1024
> pref medium
> 
> Then the BGP routing daemon at the Egress PE is used to advertise this
> VPN service. The BGP sub-TLV to support SRv6 IPv4 L3VPN is defined in [2].
> 
> The SRv6 BGP extensions to support IPv4/IPv6 L3VPN are now merged in in
> FRRouting/frr [3][4][5][6].
> 
> There is also a pull request for the CLI to configure SRv6-locator on
> zebra [7].
> 
> The BGP daemon at the Ingress PE receives the BGP update and installs an
> a FIB entry that this bound to SRv6 encapsulation.
> 
> $ ip r
> 30.0.0.0/24  encap seg6 mode encap segs 1 [ a::b ] dev enp0s9
> 
> Traffic destined to that tenant will get encapsulated at the ingress
> node and forwarded to the egress node on the IPv6 fabric.
> 
> The encapsulation is in the form of outer IPv6 header that has the
> destination address equal to the VPN service A::B instantiated at the
> Egress PE.
> 
> When the packet arrives at the Egress PE, the destination address
> matches the FIB entry associated with the End.DT4 function which does
> the decapsulation and the lookup inside the tenant table associated with
> it (tenant1).

And that is exactly how MPLS works. At ingress, a label is pushed to the
front of the packet encapping the original packet at the network header.
It traverses the label switched path and at egress the label is popped
and tenant table can be consulted.

IPv6 SR is not special with any of these steps. If you used a VRF device
and used syntax that mirrors MPLS, you would not need a kernel change.
The infrastructure to do what you need already exists, you are just
trying to go around and special case the code.

SR for IPv6 packets really should have been done this way already; the
implementation is leveraging a code path that should not exist.

> 
> Everything I explained is in the Linux kernel since a while. End.DT4 was
> missing and this the reason we submitted this patch.
> 
> In this multi-tenant DC fabric we leverage the IPv6 forwarding. No need
> for MPLS dataplane in the fabric.

My MPLS comment was only point out that MPLS encap and IPv6 SR encap is
doing the exact same thing.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table
  2020-03-09 15:36               ` David Ahern
@ 2020-03-10 16:26                 ` Ahmed Abdelsalam
  0 siblings, 0 replies; 13+ messages in thread
From: Ahmed Abdelsalam @ 2020-03-10 16:26 UTC (permalink / raw)
  To: David Ahern, Carmine Scarpitta
  Cc: davem, kuznet, yoshfuji, kuba, netdev, linux-kernel, dav.lebrun,
	andrea.mayer, paolo.lungaroni, hiroki.shirokura

Hi David,

Thank you for your reply.

Our goal is not really a work around what is available and what the VRF 
can do.
The main issue (you already scratched in you email) is the consistency 
between IPv4 L3VPN and IPv6 L3VPN.
We can use the VRF device for End.DT4 but this will lead to 
inconsistency with End.DT6 as it uses the table instead.
This becomes hard to operate as provisioning the IPv4 and IPv6 VPNs 
needs to be done in different ways.

We would appreciate to get get your thoughts/advices on that and how we 
should go forward.

Regards,
Ahmed


On 09/03/2020 16:36, David Ahern wrote:
> On 3/6/20 9:45 AM, Ahmed Abdelsalam wrote:
>>
>> However, in the SRv6 we don’t really need a VRF device. The SRv6
>> functions (the already supported ones as well as the End.DT4 submitted
>> here) resides in the IPv6 FIB table.
>>
>> The way it works is as follows:
>> 1) create a table for the tenant
>> $ echo 100 tenant1 >> /etc/iproute2/rt_tables
>>
>> You instantiate an SRv6 End.DT4 function at the Egress PE to decapsulate
>> the SRv6 encapsulation and lookup the inner packet in the tenant1 table.
>> The example iproute2 command to do so is as below.
>>
>> $ ip -6 route add A::B encap seg6local action End.DT4 table tenant1 dev
>> enp0s8
>>
>> This installs an IPv6 FIB entry as shown below.
>> $ ip -6 r
>> a::b  encap seg6local action End.DT4 table 100 dev enp0s8 metric 1024
>> pref medium
>>
>> Then the BGP routing daemon at the Egress PE is used to advertise this
>> VPN service. The BGP sub-TLV to support SRv6 IPv4 L3VPN is defined in [2].
>>
>> The SRv6 BGP extensions to support IPv4/IPv6 L3VPN are now merged in in
>> FRRouting/frr [3][4][5][6].
>>
>> There is also a pull request for the CLI to configure SRv6-locator on
>> zebra [7].
>>
>> The BGP daemon at the Ingress PE receives the BGP update and installs an
>> a FIB entry that this bound to SRv6 encapsulation.
>>
>> $ ip r
>> 30.0.0.0/24  encap seg6 mode encap segs 1 [ a::b ] dev enp0s9
>>
>> Traffic destined to that tenant will get encapsulated at the ingress
>> node and forwarded to the egress node on the IPv6 fabric.
>>
>> The encapsulation is in the form of outer IPv6 header that has the
>> destination address equal to the VPN service A::B instantiated at the
>> Egress PE.
>>
>> When the packet arrives at the Egress PE, the destination address
>> matches the FIB entry associated with the End.DT4 function which does
>> the decapsulation and the lookup inside the tenant table associated with
>> it (tenant1).
> 
> And that is exactly how MPLS works. At ingress, a label is pushed to the
> front of the packet encapping the original packet at the network header.
> It traverses the label switched path and at egress the label is popped
> and tenant table can be consulted.
> 
> IPv6 SR is not special with any of these steps. If you used a VRF device
> and used syntax that mirrors MPLS, you would not need a kernel change.
> The infrastructure to do what you need already exists, you are just
> trying to go around and special case the code.
> 
> SR for IPv6 packets really should have been done this way already; the
> implementation is leveraging a code path that should not exist.
> 
>>
>> Everything I explained is in the Linux kernel since a while. End.DT4 was
>> missing and this the reason we submitted this patch.
>>
>> In this multi-tenant DC fabric we leverage the IPv6 forwarding. No need
>> for MPLS dataplane in the fabric.
> 
> My MPLS comment was only point out that MPLS encap and IPv6 SR encap is
> doing the exact same thing.
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-03-10 16:26 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-13  1:09 [net-next 0/2] Add support for SRv6 End.DT4 action Carmine Scarpitta
2020-02-13  1:09 ` [net-next 1/2] Perform IPv4 FIB lookup in a predefined FIB table Carmine Scarpitta
2020-02-15 18:06   ` David Ahern
2020-02-18 23:50     ` Carmine Scarpitta
2020-02-19  1:05       ` David Ahern
2020-02-19  2:49         ` Carmine Scarpitta
2020-02-19  4:29           ` David Ahern
2020-02-20 22:33             ` Carmine Scarpitta
2020-02-21 17:31               ` David Ahern
2020-03-06 16:45             ` Ahmed Abdelsalam
2020-03-09 15:36               ` David Ahern
2020-03-10 16:26                 ` Ahmed Abdelsalam
2020-02-13  1:09 ` [net-next 2/2] Add support for SRv6 End.DT4 action Carmine Scarpitta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).