All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] Inverse of flowi{4,6}_oif: flowi{4,6}_not_oif
@ 2016-02-02 22:42 Jason A. Donenfeld
  2016-02-03 14:27 ` Jason A. Donenfeld
  2016-02-03 17:46 ` [PATCH] flowi: add concept of "not_oif" Jason A. Donenfeld
  0 siblings, 2 replies; 9+ messages in thread
From: Jason A. Donenfeld @ 2016-02-02 22:42 UTC (permalink / raw)
  To: Netdev, David Miller, Eric W. Biederman, dsa

Hi folks,

Sometimes it is useful to ask, "what is the route for 1.2.3.4/32 if we
*exclude* routes that go out through eth8?" Currently, the only way of
doing this is to read the entire routing table in userspace, and then
reimplement all of the logic for the various tables and metrics and
complex logic of the FIB, remove the routes you want, and then
calculate the answer. This is, obviously, far from satisfactory, as
it's not really feasible to accurate reimplement that. Of course,
another obviously flawed way would be to just remove those routes for
"dev eth8", do the lookup, and then re-add them, but this is
disruptive.

The best solution for this is to add a flowi4_not_oif and
flowi6_not_oif member which looks up a route that doesn't use the
specified netdev.

What are the use cases of this? Several.

In userspace, the most obvious usage is this: OpenVPN or OpenConnect
or any other similar application receives routes from a server. It
wants to add those routes to the routing table. But, it needs to make
sure that the OpenVPN endpoint is still accessible over the actual
network interface, especially in the case of being pushed "0/1 and
128/1". So, before adding those routes, it looks up what the existing
route is, and then adds that route explicitly: "ip route add
1.2.3.4/32 via <current default route>". Then it can add routes that
might potentially override this, while keeping the tunnel working.

However, there are big problems with this naive (yet "state of the
art") approach. What if the former default route changes (because of,
say, dhclient)? In this case, the explicit route to the endpoint is
not updated. Or worse, what if several complicated changes are made at
once to the routing table? The *only* way to reliably figure out the
new explicit route to the tunnel endpoint is to remove the tunnel's
existing routes (!), query the route for the endpoint, and then re-add
them. Not only does this affect availability due to its blatant lack
of atomicity, but it also is an issue from a network security
perspective. Another problem -- which affects me personally on a daily
basis -- is: what happens when the device that previously routed the
endpoint goes down, and then back up again? This happens with wireless
cards, for example, when a laptop suspends. On an OpenVPN laptop with
"0/1 and 128/1" routes, upon resuming from suspend and reconnecting to
a wireless network, one must manually reconfigure the explicit route
to the endpoint, since it has been automatically garbage collected
when the interface went down. No, this isn't a userspace problem: as
previously mentioned, userspace cannot reliably make the calculations
necessary to add such endpoint routes without affecting availability
and/or security.

There's another use case, inside the kernel. Geneve, vxlan, and many
other tunnel devices have this copy&pasted codeblock:

        if (rt->dst.dev == dev) { /* is this necessary? */
                netdev_dbg(dev, "circular route to %pI4\n", &fl4->daddr);
                ip_rt_put(rt);
                return ERR_PTR(-ELOOP);
        }

While it remains up for debate (and potential configuration flags)
whether one would want such an "automagical" solution, it is possible
to imagine "rt->dst" here being calculated with "flowi{4,6}_not_oif"
in mind, which would eliminate this loop detection need and generally
lead to having a happier network administrator.

In private discussions with several system admins and kernel
developers alike, the response has been, "oh my God, I know - I hate
this issue. What an elegant solution! Have you written to davem &
friends about this?" to which I respond, "maybe some day I'll have the
courage..." Well, this is it guys.

So, what I propose is adding this "flowi{4,6}_not_oif", for an
extremely common and only-properly-solved-by-the-kernel problem. The
first step would be augmenting fib4 and fib6, and the second step
would be adding support for this to ip-route(8) and the rtnetlink
layer.

I stress again: there is no feasible userspace solution to this problem.

So, this [RFC] is to determine the following:

(1) Would you merge a patch that adds this functionality?
(2) Is there someone intimately familiar with the FIB who would be
willing to write this patch?

- If 1 && 2, awesome! I owe you a steak dinner.
- If !1, why? You best have quite a good alternative solution for this
issue (that doesn't include the words "install NetworkManager").
- If 1 && !2, I'll do a thorough study of the FIB code and write it myself.
- If !1 and 2, um, well, join the cause I guess.

Hope to hear from you soon.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] Inverse of flowi{4,6}_oif: flowi{4,6}_not_oif
  2016-02-02 22:42 [RFC] Inverse of flowi{4,6}_oif: flowi{4,6}_not_oif Jason A. Donenfeld
@ 2016-02-03 14:27 ` Jason A. Donenfeld
  2016-02-03 16:28   ` David Ahern
  2016-02-03 17:46 ` [PATCH] flowi: add concept of "not_oif" Jason A. Donenfeld
  1 sibling, 1 reply; 9+ messages in thread
From: Jason A. Donenfeld @ 2016-02-03 14:27 UTC (permalink / raw)
  To: Netdev, David Miller, Eric W. Biederman, dsa

FYI, for v4 at least, it's ridiculously easy and simple to implement:

=~=~=~=~=~=~=~=~=

diff --git a/include/net/flow.h b/include/net/flow.h
index 83969ee..29967ad 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -26,6 +26,7 @@ struct flowi_tunnel {

 struct flowi_common {
  int flowic_oif;
+ int flowic_not_oif;
  int flowic_iif;
  __u32 flowic_mark;
  __u8 flowic_tos;
@@ -67,6 +68,7 @@ union flowi_uli {
 struct flowi4 {
  struct flowi_common __fl_common;
 #define flowi4_oif  __fl_common.flowic_oif
+#define flowi4_not_oif  __fl_common.flowic_not_oif
 #define flowi4_iif  __fl_common.flowic_iif
 #define flowi4_mark  __fl_common.flowic_mark
 #define flowi4_tos  __fl_common.flowic_tos
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 7aea0cc..d03e991 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1429,6 +1429,8 @@ found:
         flp->flowi4_oif != nh->nh_oif)
      continue;
    }
+   if (flp->flowi4_not_oif && flp->flowi4_not_oif == nh->nh_oif)
+    continue;

    if (!(fib_flags & FIB_LOOKUP_NOREF))
     atomic_inc(&fi->fib_clntref);

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC] Inverse of flowi{4,6}_oif: flowi{4,6}_not_oif
  2016-02-03 14:27 ` Jason A. Donenfeld
@ 2016-02-03 16:28   ` David Ahern
  2016-02-03 17:40     ` Jason A. Donenfeld
  0 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2016-02-03 16:28 UTC (permalink / raw)
  To: Jason A. Donenfeld, Netdev, David Miller, Eric W. Biederman

On 2/3/16 7:27 AM, Jason A. Donenfeld wrote:
> FYI, for v4 at least, it's ridiculously easy and simple to implement:
>
> =~=~=~=~=~=~=~=~=
>
> diff --git a/include/net/flow.h b/include/net/flow.h
> index 83969ee..29967ad 100644
> --- a/include/net/flow.h
> +++ b/include/net/flow.h
> @@ -26,6 +26,7 @@ struct flowi_tunnel {
>
>   struct flowi_common {
>    int flowic_oif;
> + int flowic_not_oif;
>    int flowic_iif;
>    __u32 flowic_mark;
>    __u8 flowic_tos;

I was going to suggest a flag:

@@ -36,6 +36,7 @@ struct flowi_common {
  #define FLOWI_FLAG_KNOWN_NH            0x02
  #define FLOWI_FLAG_L3MDEV_SRC          0x04
  #define FLOWI_FLAG_SKIP_NH_OIF         0x08
+#define FLOWI_FLAG_NOT_OIF             0x10
         __u32   flowic_secid;
         struct flowi_tunnel flowic_tun_key;
  };


but there are a number of oif checks that would have to be enhanced with 
the flag check. Adding a flowic_not_oif member is certainly simpler and 
there is a 4-byte hole in the struct.

> @@ -67,6 +68,7 @@ union flowi_uli {
>   struct flowi4 {
>    struct flowi_common __fl_common;
>   #define flowi4_oif  __fl_common.flowic_oif
> +#define flowi4_not_oif  __fl_common.flowic_not_oif
>   #define flowi4_iif  __fl_common.flowic_iif
>   #define flowi4_mark  __fl_common.flowic_mark
>   #define flowi4_tos  __fl_common.flowic_tos
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index 7aea0cc..d03e991 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -1429,6 +1429,8 @@ found:
>           flp->flowi4_oif != nh->nh_oif)
>        continue;
>      }
> +   if (flp->flowi4_not_oif && flp->flowi4_not_oif == nh->nh_oif)
> +    continue;
>
>      if (!(fib_flags & FIB_LOOKUP_NOREF))
>       atomic_inc(&fi->fib_clntref);
>

For IPv6 start with ip6_pol_route_lookup and modifying rt6_device_match

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] Inverse of flowi{4,6}_oif: flowi{4,6}_not_oif
  2016-02-03 16:28   ` David Ahern
@ 2016-02-03 17:40     ` Jason A. Donenfeld
  0 siblings, 0 replies; 9+ messages in thread
From: Jason A. Donenfeld @ 2016-02-03 17:40 UTC (permalink / raw)
  To: David Ahern; +Cc: Netdev, David Miller, Eric W. Biederman

On Wed, Feb 3, 2016 at 5:28 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
>
> For IPv6 start with ip6_pol_route_lookup and modifying rt6_device_match
>

If that's all it takes, then that turns out to be ridiculously easy
too. I'll get a patch together for this.

--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -467,6 +467,7 @@ static inline struct rt6_info
*rt6_device_match(struct net *net,
                                                    struct rt6_info *rt,
                                                    const struct
in6_addr *saddr,
                                                    int oif,
+                                                   int not_oif,
                                                    int flags)
 {
        struct rt6_info *local = NULL;
@@ -478,6 +479,9 @@ static inline struct rt6_info
*rt6_device_match(struct net *net,
        for (sprt = rt; sprt; sprt = sprt->dst.rt6_next) {
                struct net_device *dev = sprt->dst.dev;

+               if (not_oif && dev->ifindex == not_oif)
+                       continue;
+
                if (oif) {
                        if (dev->ifindex == oif)
                                return sprt;
@@ -856,7 +860,7 @@ static struct rt6_info
*ip6_pol_route_lookup(struct net *net,
        fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
 restart:
        rt = fn->leaf;
-       rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, flags);
+       rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif,
fl6->flowi6_not_oif, flags);
        if (rt->rt6i_nsiblings && fl6->flowi6_oif == 0)
                rt = rt6_multipath_select(rt, fl6, fl6->flowi6_oif, flags);
        if (rt == net->ipv6.ip6_null_entry) {

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] flowi: add concept of "not_oif"
  2016-02-02 22:42 [RFC] Inverse of flowi{4,6}_oif: flowi{4,6}_not_oif Jason A. Donenfeld
  2016-02-03 14:27 ` Jason A. Donenfeld
@ 2016-02-03 17:46 ` Jason A. Donenfeld
  2016-02-03 20:42   ` Julian Anastasov
  1 sibling, 1 reply; 9+ messages in thread
From: Jason A. Donenfeld @ 2016-02-03 17:46 UTC (permalink / raw)
  To: Netdev, David Miller, Eric W. Biederman, dsa; +Cc: Jason A. Donenfeld

This patch simply adds support for specifying a "not_oif" device in
flowi4 and flowi6 lookups, that will find a matching route that _isn't_
via the specified device.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 include/net/flow.h  | 3 +++
 net/ipv4/fib_trie.c | 2 ++
 net/ipv6/route.c    | 6 +++++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index 83969ee..29967ad 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -26,6 +26,7 @@ struct flowi_tunnel {
 
 struct flowi_common {
 	int	flowic_oif;
+	int	flowic_not_oif;
 	int	flowic_iif;
 	__u32	flowic_mark;
 	__u8	flowic_tos;
@@ -67,6 +68,7 @@ union flowi_uli {
 struct flowi4 {
 	struct flowi_common	__fl_common;
 #define flowi4_oif		__fl_common.flowic_oif
+#define flowi4_not_oif		__fl_common.flowic_not_oif
 #define flowi4_iif		__fl_common.flowic_iif
 #define flowi4_mark		__fl_common.flowic_mark
 #define flowi4_tos		__fl_common.flowic_tos
@@ -125,6 +127,7 @@ static inline void flowi4_update_output(struct flowi4 *fl4, int oif, __u8 tos,
 struct flowi6 {
 	struct flowi_common	__fl_common;
 #define flowi6_oif		__fl_common.flowic_oif
+#define flowi6_not_oif		__fl_common.flowic_not_oif
 #define flowi6_iif		__fl_common.flowic_iif
 #define flowi6_mark		__fl_common.flowic_mark
 #define flowi6_tos		__fl_common.flowic_tos
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 7aea0cc..d03e991 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1429,6 +1429,8 @@ found:
 				    flp->flowi4_oif != nh->nh_oif)
 					continue;
 			}
+			if (flp->flowi4_not_oif && flp->flowi4_not_oif == nh->nh_oif)
+				continue;
 
 			if (!(fib_flags & FIB_LOOKUP_NOREF))
 				atomic_inc(&fi->fib_clntref);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 3c8834b..2a793b5 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -467,6 +467,7 @@ static inline struct rt6_info *rt6_device_match(struct net *net,
 						    struct rt6_info *rt,
 						    const struct in6_addr *saddr,
 						    int oif,
+						    int not_oif,
 						    int flags)
 {
 	struct rt6_info *local = NULL;
@@ -478,6 +479,9 @@ static inline struct rt6_info *rt6_device_match(struct net *net,
 	for (sprt = rt; sprt; sprt = sprt->dst.rt6_next) {
 		struct net_device *dev = sprt->dst.dev;
 
+		if (not_oif && dev->ifindex == not_oif)
+			continue;
+
 		if (oif) {
 			if (dev->ifindex == oif)
 				return sprt;
@@ -856,7 +860,7 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
 	fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
 restart:
 	rt = fn->leaf;
-	rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, flags);
+	rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, fl6->flowi6_not_oif, flags);
 	if (rt->rt6i_nsiblings && fl6->flowi6_oif == 0)
 		rt = rt6_multipath_select(rt, fl6, fl6->flowi6_oif, flags);
 	if (rt == net->ipv6.ip6_null_entry) {
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] flowi: add concept of "not_oif"
  2016-02-03 17:46 ` [PATCH] flowi: add concept of "not_oif" Jason A. Donenfeld
@ 2016-02-03 20:42   ` Julian Anastasov
  2016-02-03 21:02     ` Jason A. Donenfeld
  0 siblings, 1 reply; 9+ messages in thread
From: Julian Anastasov @ 2016-02-03 20:42 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Netdev, David Miller, Eric W. Biederman, dsa


	Hello,

On Wed, 3 Feb 2016, Jason A. Donenfeld wrote:

> This patch simply adds support for specifying a "not_oif" device in
> flowi4 and flowi6 lookups, that will find a matching route that _isn't_
> via the specified device.

	If you check every flowi4_oif user you will notice
that some places can not fulfil this requirement:

- fib_select_path -> fib_select_multipath

	Other places like fib_select_default are called
for flowi4_oif=0 and there are no other checks for flowi4_oif
but they will be needed for the new field.

	I don't know about the particular problems with
tunnels but the scripts can use the route metric to order
the routes in a table. Your patch looks simple but misses
a dozen of problems. The first breakage should be from the
missing initialization of this new field because the flowi
structure is not initialized at some places. Random
stack can lead to skipped routes. If this feature has
fans, you have to check all places that use flowi4_oif and
flowi6_oif.

Regards

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] flowi: add concept of "not_oif"
  2016-02-03 20:42   ` Julian Anastasov
@ 2016-02-03 21:02     ` Jason A. Donenfeld
  2016-02-04  3:44       ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Jason A. Donenfeld @ 2016-02-03 21:02 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Netdev, David Miller, Eric W. Biederman, David Ahern

Hi Julian,

Thanks a lot for your review. Much appreciated.

On Wed, Feb 3, 2016 at 9:42 PM, Julian Anastasov <ja@ssi.bg> wrote:
>         If you check every flowi4_oif user you will notice
> that some places can not fulfil this requirement:
> - fib_select_path -> fib_select_multipath
>         Other places like fib_select_default are called
> for flowi4_oif=0 and there are no other checks for flowi4_oif
> but they will be needed for the new field.
> fans, you have to check all places that use flowi4_oif and
> flowi6_oif.
> missing initialization of this new field

Thanks for pointing these out. I will take this into account and send
an updated patch [series].



>         I don't know about the particular problems with
> tunnels but the scripts can use the route metric to order
> the routes in a table.

This unfortunately does not cut it with tunnels.


Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] flowi: add concept of "not_oif"
  2016-02-03 21:02     ` Jason A. Donenfeld
@ 2016-02-04  3:44       ` Eric Dumazet
  2016-02-05 18:38         ` Jason A. Donenfeld
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2016-02-04  3:44 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Julian Anastasov, Netdev, David Miller, Eric W. Biederman, David Ahern

On Wed, 2016-02-03 at 22:02 +0100, Jason A. Donenfeld wrote:

> 
> >         I don't know about the particular problems with
> > tunnels but the scripts can use the route metric to order
> > the routes in a table.
> 
> This unfortunately does not cut it with tunnels.

ip rule show

ip route show table 10

I am pretty sure that you could select/change skb mark when packets
traverse the tunnel : The second route lookup can then select a
completely different table.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] flowi: add concept of "not_oif"
  2016-02-04  3:44       ` Eric Dumazet
@ 2016-02-05 18:38         ` Jason A. Donenfeld
  0 siblings, 0 replies; 9+ messages in thread
From: Jason A. Donenfeld @ 2016-02-05 18:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Julian Anastasov, Netdev, David Miller, Eric W. Biederman, David Ahern

On Thu, Feb 4, 2016 at 4:44 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2016-02-03 at 22:02 +0100, Jason A. Donenfeld wrote:
>
>>
>> >         I don't know about the particular problems with
>> > tunnels but the scripts can use the route metric to order
>> > the routes in a table.
>>
>> This unfortunately does not cut it with tunnels.
>
> ip rule show
>
> ip route show table 10
>
> I am pretty sure that you could select/change skb mark when packets
> traverse the tunnel : The second route lookup can then select a
> completely different table.

This doesn't work. Not to mention the fact that ip-rules aren't
cleaned up when the interface is removed and the issues with having
multiple routing tables, the following only works for very narrow
cases:

(212.47.239.81 is the IP of a VPN endpoint, for example below)

$ ip rule add to 212.47.239.81 lookup main pref 30
$ ip rule add to all lookup 80 pref 40
$ ip route add default dev tun0 table 80

The problem is -- what happens when you have particular routes that
you'd like specifically to go over your original network connection,
not the tunnel? I am now not able to do this:

$ ip route add 1.2.3.4/32 dev eth0

Because it will examine the second rule. Moving everything to the
second routing table obviously isn't a solution either for all the
reasons listed in my first email. Everything is complicated, partially
broken, and all together unrobust this way.

Sorry, but the current routing facilities of Linux are woefully
insufficient for extremely common place modern day tunneling. The
solution I've offered here is extremely simple, easy, and
non-intrusive to implement. You will be the best friend of many
network administrators and ordinary daily users alike.

Just imagine if authors of userspace tunneling utilities could write
things like:

    setsockopt(fd, SO_NOTOIF, tun0_idx);

Or kernel tunneling utilities being able to write:

   .flowi4_not_oif = geneve0_idx;

And then never have to worry about routing loops or bizarre
situations? And have everything nicely cleaned up when the interface
goes away? And be able to continue using all the same old routing
tools and schemes as before, with no need for enormous
reimplementations and daemons and endless bloat?

This is a simple feature that will go a very long way. There is no
current solution that comes anywhere close to solving the real
problem. Please consider it.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-02-05 18:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-02 22:42 [RFC] Inverse of flowi{4,6}_oif: flowi{4,6}_not_oif Jason A. Donenfeld
2016-02-03 14:27 ` Jason A. Donenfeld
2016-02-03 16:28   ` David Ahern
2016-02-03 17:40     ` Jason A. Donenfeld
2016-02-03 17:46 ` [PATCH] flowi: add concept of "not_oif" Jason A. Donenfeld
2016-02-03 20:42   ` Julian Anastasov
2016-02-03 21:02     ` Jason A. Donenfeld
2016-02-04  3:44       ` Eric Dumazet
2016-02-05 18:38         ` Jason A. Donenfeld

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.