All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
@ 2015-10-12 16:33 David Ahern
  2015-10-14  1:45 ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: David Ahern @ 2015-10-12 16:33 UTC (permalink / raw)
  To: netdev; +Cc: hannes, nicolas.dichtel, David Ahern

Currently, all ipv6 addresses are flushed when the interface is configured
down, including global, static addresses:

    $ ip -6 addr add dev eth1 2000:11:1:1::1/64
    $ ip addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 02:04:11:22:33:01 brd ff:ff:ff:ff:ff:ff
        inet6 2000:11:1:1::1/64 scope global tentative
           valid_lft forever preferred_lft forever
    $ ip link set dev eth1 up
    $ ip link set dev eth1 down
    $ ip addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
        link/ether 02:04:11:22:33:01 brd ff:ff:ff:ff:ff:ff

Add a new sysctl to make this behavior optional. The new setting defaults to
flush all addresses to maintain backwards compatibility. When the setting is
reset global addresses with no expire times are not flushed:

    $ echo 0 > /proc/sys/net/ipv6/conf/eth1/flush_addr_on_down
    $ ip -6 addr add dev eth1 2000:11:1:1::1/64
    $ ip addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
        link/ether 02:04:11:22:33:01 brd ff:ff:ff:ff:ff:ff
        inet6 2000:11:1:1::1/64 scope global tentative
           valid_lft forever preferred_lft forever
    $ ip link set dev eth1 up
    $ ip link set dev eth1 down
    $ ip addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
        link/ether 02:04:11:22:33:01 brd ff:ff:ff:ff:ff:ff
        inet6 2000:11:1:1::1/64 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::4:11ff:fe22:3301/64 scope link
           valid_lft forever preferred_lft forever

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v5:
- renamed managed to user_managed as requested by Hannes
- handle addrconf_dst_alloc failure and cleanup ifp as noted by Dave
  -- tested by faking allocation failure
- minor ordering changes in addrconf_ifdown() to handle changes under lock

v4:
- rebased to top of tree
- updated to clear all routes on admin down and re-added on admin up
- verified the route tables (main and local) on a link down have *no*
  remnants of the configured, global address. On a link up all routes
  are restored -- multicast, linklocal, local routes and connected.

v3:
- fix local variable ordering and comment style per Dave's comment
- consistency in DEVCONF naming per Brian Haley's comment
- added entry to Documentation/networking/ip-sysctl.txt

v2:
- only keep static addresses as suggested by Hannes
- added new managed flag to track configured addresses
- on ifdown do not remove from configured address from inet6_addr_lst
- on ifdown reset the TENTATIVE flag and set state to DAD so that DAD is
  redone when link is brought up again

 Documentation/networking/ip-sysctl.txt |   6 ++
 include/linux/ipv6.h                   |   1 +
 include/net/if_inet6.h                 |   1 +
 include/uapi/linux/ipv6.h              |   1 +
 net/ipv6/addrconf.c                    | 124 +++++++++++++++++++++++++++++----
 5 files changed, 118 insertions(+), 15 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ebe94f2cab98..51c60f58f7ec 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1432,6 +1432,12 @@ dad_transmits - INTEGER
 	The amount of Duplicate Address Detection probes to send.
 	Default: 1
 
+flush_addr_on_down - BOOLEAN
+	Flush all IPv6 addresses on an interface down event. If disabled
+	static global addresses with no expiration time are not flushed.
+
+	Default: enabled
+
 forwarding - INTEGER
 	Configure interface-specific Host/Router behaviour.
 
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 0ef2a97ccdb5..112a18940ab2 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -60,6 +60,7 @@ struct ipv6_devconf {
 		struct in6_addr secret;
 	} stable_secret;
 	__s32		use_oif_addrs_only;
+	__s32		flush_addr_on_down;
 	void		*sysctl;
 };
 
diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index 1c8b6820b694..01ba6a286a4b 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -72,6 +72,7 @@ struct inet6_ifaddr {
 	int			regen_count;
 
 	bool			tokenized;
+	bool			user_managed;
 
 	struct rcu_head		rcu;
 	struct in6_addr		peer_addr;
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 38b4fef20219..7c514f7cd209 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -174,6 +174,7 @@ enum {
 	DEVCONF_USE_OIF_ADDRS_ONLY,
 	DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT,
 	DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+	DEVCONF_FLUSH_ADDR_ON_DOWN,
 	DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index c8380f1876f1..fae99057d94a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -215,6 +215,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
 	},
 	.use_oif_addrs_only	= 0,
 	.ignore_routes_with_linkdown = 0,
+	.flush_addr_on_down	= 1,
 };
 
 static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -259,6 +260,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	},
 	.use_oif_addrs_only	= 0,
 	.ignore_routes_with_linkdown = 0,
+	.flush_addr_on_down	= 1,
 };
 
 /* Check if a valid qdisc is available */
@@ -954,6 +956,7 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr,
 	ifa->prefered_lft = prefered_lft;
 	ifa->cstamp = ifa->tstamp = jiffies;
 	ifa->tokenized = false;
+	ifa->user_managed = false;
 
 	ifa->rt = rt;
 
@@ -2687,6 +2690,9 @@ static int inet6_addr_add(struct net *net, int ifindex,
 			    valid_lft, prefered_lft);
 
 	if (!IS_ERR(ifp)) {
+		if (!expires)
+			ifp->user_managed = true;
+
 		if (!(ifa_flags & IFA_F_NOPREFIXROUTE)) {
 			addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev,
 					      expires, flags);
@@ -3122,6 +3128,55 @@ static void addrconf_gre_config(struct net_device *dev)
 }
 #endif
 
+static int fixup_user_managed_addr(struct inet6_dev *idev,
+				   struct inet6_ifaddr *ifp)
+{
+	if (!ifp->rt) {
+		struct rt6_info *rt;
+
+		rt = addrconf_dst_alloc(idev, &ifp->addr, false);
+		if (unlikely(IS_ERR(rt)))
+			return PTR_ERR(rt);
+
+		ifp->rt = rt;
+	}
+
+	if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
+		addrconf_prefix_route(&ifp->addr, ifp->prefix_len,
+				      idev->dev, 0, 0);
+	}
+
+	addrconf_dad_start(ifp);
+
+	return 0;
+}
+
+static void addrconf_user_managed_addr(struct net_device *dev)
+{
+	struct inet6_ifaddr *ifp, *tmp;
+	struct inet6_dev *idev;
+
+	idev = __in6_dev_get(dev);
+	if (!idev)
+		return;
+
+	write_lock_bh(&idev->lock);
+
+	list_for_each_entry_safe(ifp, tmp, &idev->addr_list, if_list) {
+		if (ifp->user_managed &&
+		    fixup_user_managed_addr(idev, ifp) < 0) {
+			write_unlock_bh(&idev->lock);
+			ipv6_del_addr(ifp);
+			write_lock_bh(&idev->lock);
+
+			net_info_ratelimited("%s: Failed to add prefix route for address %pI6c; dropping\n",
+					     idev->dev->name, &ifp->addr);
+		}
+	}
+
+	write_unlock_bh(&idev->lock);
+}
+
 static int addrconf_notify(struct notifier_block *this, unsigned long event,
 			   void *ptr)
 {
@@ -3181,6 +3236,8 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 			run_pending = 1;
 		}
 
+		addrconf_user_managed_addr(dev);
+
 		switch (dev->type) {
 #if IS_ENABLED(CONFIG_IPV6_SIT)
 		case ARPHRD_SIT:
@@ -3301,7 +3358,8 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 {
 	struct net *net = dev_net(dev);
 	struct inet6_dev *idev;
-	struct inet6_ifaddr *ifa;
+	struct inet6_ifaddr *ifa, *tmp;
+	struct list_head del_list;
 	int state, i;
 
 	ASSERT_RTNL();
@@ -3336,9 +3394,13 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 restart:
 		hlist_for_each_entry_rcu(ifa, h, addr_lst) {
 			if (ifa->idev == idev) {
-				hlist_del_init_rcu(&ifa->addr_lst);
 				addrconf_del_dad_work(ifa);
-				goto restart;
+				if (how || idev->cnf.flush_addr_on_down ||
+				    !ifa->user_managed) {
+					hlist_del_init_rcu(&ifa->addr_lst);
+					goto restart;
+				}
+
 			}
 		}
 		spin_unlock_bh(&addrconf_hash_lock);
@@ -3372,31 +3434,52 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 		write_lock_bh(&idev->lock);
 	}
 
-	while (!list_empty(&idev->addr_list)) {
-		ifa = list_first_entry(&idev->addr_list,
-				       struct inet6_ifaddr, if_list);
-		addrconf_del_dad_work(ifa);
+	INIT_LIST_HEAD(&del_list);
+	list_for_each_entry_safe(ifa, tmp, &idev->addr_list, if_list) {
+		bool keep_ifa = false;
 
-		list_del(&ifa->if_list);
+		if (!how && !idev->cnf.flush_addr_on_down && ifa->user_managed)
+			keep_ifa = true;
 
-		write_unlock_bh(&idev->lock);
+		addrconf_del_dad_work(ifa);
 
+		write_unlock_bh(&idev->lock);
 		spin_lock_bh(&ifa->lock);
-		state = ifa->state;
-		ifa->state = INET6_IFADDR_STATE_DEAD;
+
+		if (unlikely(keep_ifa)) {
+			/* set state to skip the notifier below */
+			state = INET6_IFADDR_STATE_DEAD;
+			ifa->state = 0;
+			if (!(ifa->flags & IFA_F_NODAD))
+				ifa->flags |= IFA_F_TENTATIVE;
+		} else {
+			state = ifa->state;
+			ifa->state = INET6_IFADDR_STATE_DEAD;
+
+			list_del(&ifa->if_list);
+			list_add(&ifa->if_list, &del_list);
+		}
+
 		spin_unlock_bh(&ifa->lock);
 
 		if (state != INET6_IFADDR_STATE_DEAD) {
 			__ipv6_ifa_notify(RTM_DELADDR, ifa);
 			inet6addr_notifier_call_chain(NETDEV_DOWN, ifa);
 		}
-		in6_ifa_put(ifa);
 
 		write_lock_bh(&idev->lock);
 	}
 
 	write_unlock_bh(&idev->lock);
 
+	while (!list_empty(&del_list)) {
+		ifa = list_first_entry(&del_list,
+				       struct inet6_ifaddr, if_list);
+		list_del(&ifa->if_list);
+
+		in6_ifa_put(ifa);
+	}
+
 	/* Step 5: Discard anycast and multicast list */
 	if (how) {
 		ipv6_ac_destroy_dev(idev);
@@ -4656,6 +4739,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 	array[DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN] = cnf->ignore_routes_with_linkdown;
 	/* we omit DEVCONF_STABLE_SECRET for now */
 	array[DEVCONF_USE_OIF_ADDRS_ONLY] = cnf->use_oif_addrs_only;
+	array[DEVCONF_FLUSH_ADDR_ON_DOWN] = cnf->flush_addr_on_down;
 }
 
 static inline size_t inet6_ifla6_size(void)
@@ -5135,10 +5219,12 @@ static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp)
 			if (rt)
 				ip6_del_rt(rt);
 		}
-		dst_hold(&ifp->rt->dst);
-
-		ip6_del_rt(ifp->rt);
+		if (ifp->rt) {
+			dst_hold(&ifp->rt->dst);
 
+			ip6_del_rt(ifp->rt);
+			ifp->rt = NULL;
+		}
 		rt_genid_bump_ipv6(net);
 		break;
 	}
@@ -5717,6 +5803,14 @@ static struct addrconf_sysctl_table
 			.proc_handler	= addrconf_sysctl_ignore_routes_with_linkdown,
 		},
 		{
+			.procname       = "flush_addr_on_down",
+			.data           = &ipv6_devconf.flush_addr_on_down,
+			.maxlen         = sizeof(int),
+			.mode           = 0644,
+			.proc_handler   = proc_dointvec,
+
+		},
+		{
 			/* sentinel */
 		}
 	},
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
  2015-10-12 16:33 [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional David Ahern
@ 2015-10-14  1:45 ` David Miller
  2015-10-14  9:34   ` Hannes Frederic Sowa
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2015-10-14  1:45 UTC (permalink / raw)
  To: dsa; +Cc: netdev, hannes, nicolas.dichtel

From: David Ahern <dsa@cumulusnetworks.com>
Date: Mon, 12 Oct 2015 09:33:07 -0700

> Currently, all ipv6 addresses are flushed when the interface is configured
> down, including global, static addresses:
 ...
> Add a new sysctl to make this behavior optional. The new setting defaults to
> flush all addresses to maintain backwards compatibility. When the setting is
> reset global addresses with no expire times are not flushed:
 ...
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Here is what I really don't like about these changes: the failure
semantics are terrible.

If I add an address or a route, and some memory allocation fails, I get
a notification and see that my operation did not succeed.

But here, my routes can fail to be added during an ifup and all I will
get, at best, is a kernel log message.

This places a serious disconnect between what the user asked for and
making sure he finds out directly that his operation did not really
fully succeed.

In fact, this failure handling here during ifup leaves the interface in
a partially configured state.

There are really two ways to deal with this properly:

1) Propagate the failure back through the notifiers and fail the ifup
   completely when the addrconf_dst_alloc() fails.

2) On ifdown, stash the objects away somewhere so that memory allocation
   is not necessary on ifup.

Neither are really smooth approaches, but they have the attribute that
they give the user clean behavior and semantics.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
  2015-10-14  1:45 ` David Miller
@ 2015-10-14  9:34   ` Hannes Frederic Sowa
  2015-10-14 10:08     ` Nicolas Dichtel
  2015-10-14 16:09     ` David Ahern
  0 siblings, 2 replies; 11+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-14  9:34 UTC (permalink / raw)
  To: David Miller, dsa; +Cc: netdev, hannes, nicolas.dichtel

Hello,

On Wed, Oct 14, 2015, at 03:45, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Mon, 12 Oct 2015 09:33:07 -0700
> 
> > Currently, all ipv6 addresses are flushed when the interface is configured
> > down, including global, static addresses:
>  ...
> > Add a new sysctl to make this behavior optional. The new setting defaults to
> > flush all addresses to maintain backwards compatibility. When the setting is
> > reset global addresses with no expire times are not flushed:
>  ...
> > Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> 
> Here is what I really don't like about these changes: the failure
> semantics are terrible.
> 
> If I add an address or a route, and some memory allocation fails, I get
> a notification and see that my operation did not succeed.
> 
> But here, my routes can fail to be added during an ifup and all I will
> get, at best, is a kernel log message.
> 
> This places a serious disconnect between what the user asked for and
> making sure he finds out directly that his operation did not really
> fully succeed.
> 
> In fact, this failure handling here during ifup leaves the interface in
> a partially configured state.

Agreed, it would be better.

> There are really two ways to deal with this properly:
> 
> 1) Propagate the failure back through the notifiers and fail the ifup
>    completely when the addrconf_dst_alloc() fails.
> 
> 2) On ifdown, stash the objects away somewhere so that memory allocation
>    is not necessary on ifup.

In regard to error propagation I think 2) is the only viable option. I
don't think why you shouldn't be able to simply remove the ip6_rt_put
after the ip6_del_rt and keep the dst_entry hanging there and later
reinsert.

> Neither are really smooth approaches, but they have the attribute that
> they give the user clean behavior and semantics.

On another node:

This sysctl is on my list to be enabled soon by default by any systemd
based distribution. For that reason, could you maybe remove all the
unlikely annotations from your code. It really doesn't matter in
configuration code, because it is not performance critical and in my
opinion just distracts.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
  2015-10-14  9:34   ` Hannes Frederic Sowa
@ 2015-10-14 10:08     ` Nicolas Dichtel
  2015-10-14 11:03       ` Hannes Frederic Sowa
  2015-10-14 16:09     ` David Ahern
  1 sibling, 1 reply; 11+ messages in thread
From: Nicolas Dichtel @ 2015-10-14 10:08 UTC (permalink / raw)
  To: Hannes Frederic Sowa, David Miller, dsa; +Cc: netdev, hannes

Le 14/10/2015 11:34, Hannes Frederic Sowa a écrit :
[sni]
> This sysctl is on my list to be enabled soon by default by any systemd
> based distribution. For that reason, could you maybe remove all the
I'm not sure to understand why we add a sysctl then. Or at least, why the linux
default value is different from all standard distrib. I will be like
rp_filter :/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
  2015-10-14 10:08     ` Nicolas Dichtel
@ 2015-10-14 11:03       ` Hannes Frederic Sowa
  2015-10-14 12:18         ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-14 11:03 UTC (permalink / raw)
  To: Nicolas Dichtel, David Miller, dsa; +Cc: netdev, hannes

On Wed, Oct 14, 2015, at 12:08, Nicolas Dichtel wrote:
> Le 14/10/2015 11:34, Hannes Frederic Sowa a écrit :
> [sni]
> > This sysctl is on my list to be enabled soon by default by any systemd
> > based distribution. For that reason, could you maybe remove all the
> I'm not sure to understand why we add a sysctl then. Or at least, why the
> linux
> default value is different from all standard distrib. I will be like
> rp_filter :/

The difference is that people upgrade (in case of fedora they get a
.rpmnew file) or install a distribution and don't wonder or have
assumptions about old behavior. In case companies integrate kernel in
products/appliances without a way to manage those sysctls we cannot
simply change them as this would break assumptions for them. I think
those are two different cases.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
  2015-10-14 12:18         ` David Miller
@ 2015-10-14 12:14           ` Hannes Frederic Sowa
  2015-10-14 13:00             ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-14 12:14 UTC (permalink / raw)
  To: David Miller; +Cc: nicolas.dichtel, dsa, netdev, hannes



On Wed, Oct 14, 2015, at 14:18, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Wed, 14 Oct 2015 13:03:41 +0200
> > The difference is that people upgrade (in case of fedora they get a
> > .rpmnew file) or install a distribution and don't wonder or have
> > assumptions about old behavior. In case companies integrate kernel in
> > products/appliances without a way to manage those sysctls we cannot
> > simply change them as this would break assumptions for them. I think
> > those are two different cases.
> 
> The thing that is similar is that people set rp_filter inappropriately
> (no end host should have that knob enabled, ever, it's totally
> unnecesary).  And the risk here is similar, distribution X will set it
> so Y will say "we probably should set it too even though we really
> don't understand it fully".
> 
> I really hate situations like this.

I can bring up the rp_filter setting, too. It currently gets
unconditional set to strict mode in systemd on all interfaces.

The question is, if we should care about people enabling forwarding by
simply toggling the sysctl forwarding knob? Essentially in the kernel we
could provide two sysctl knobs, one for forwarding and one for local
reception. So people following the guidelines how to enable forwarding
could automatically have rp_filter enabled while host mode does not
because we leave  the forwarding rp_filter setting enabled. This at the
same time seems unnecessary complex and maybe we should simply talk to
distributions. ;)

What do you think?

Bye,
Hannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
  2015-10-14 11:03       ` Hannes Frederic Sowa
@ 2015-10-14 12:18         ` David Miller
  2015-10-14 12:14           ` Hannes Frederic Sowa
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2015-10-14 12:18 UTC (permalink / raw)
  To: hannes; +Cc: nicolas.dichtel, dsa, netdev, hannes

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed, 14 Oct 2015 13:03:41 +0200

> On Wed, Oct 14, 2015, at 12:08, Nicolas Dichtel wrote:
>> Le 14/10/2015 11:34, Hannes Frederic Sowa a écrit :
>> [sni]
>> > This sysctl is on my list to be enabled soon by default by any systemd
>> > based distribution. For that reason, could you maybe remove all the
>> I'm not sure to understand why we add a sysctl then. Or at least, why the
>> linux
>> default value is different from all standard distrib. I will be like
>> rp_filter :/
> 
> The difference is that people upgrade (in case of fedora they get a
> .rpmnew file) or install a distribution and don't wonder or have
> assumptions about old behavior. In case companies integrate kernel in
> products/appliances without a way to manage those sysctls we cannot
> simply change them as this would break assumptions for them. I think
> those are two different cases.

The thing that is similar is that people set rp_filter inappropriately
(no end host should have that knob enabled, ever, it's totally
unnecesary).  And the risk here is similar, distribution X will set it
so Y will say "we probably should set it too even though we really
don't understand it fully".

I really hate situations like this.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
  2015-10-14 12:14           ` Hannes Frederic Sowa
@ 2015-10-14 13:00             ` David Miller
  0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2015-10-14 13:00 UTC (permalink / raw)
  To: hannes; +Cc: nicolas.dichtel, dsa, netdev, hannes

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed, 14 Oct 2015 14:14:05 +0200

> I can bring up the rp_filter setting, too. It currently gets
> unconditional set to strict mode in systemd on all interfaces.

Sigh...

> The question is, if we should care about people enabling forwarding by
> simply toggling the sysctl forwarding knob? Essentially in the kernel we
> could provide two sysctl knobs, one for forwarding and one for local
> reception. So people following the guidelines how to enable forwarding
> could automatically have rp_filter enabled while host mode does not
> because we leave  the forwarding rp_filter setting enabled. This at the
> same time seems unnecessary complex and maybe we should simply talk to
> distributions. ;)
> 
> What do you think?

We could make rp_filter only apply when something more than default and
subnet routes are configured.  Another bypass might be when only one
interface other than loopback is up and enabled for ipv4.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
  2015-10-14  9:34   ` Hannes Frederic Sowa
  2015-10-14 10:08     ` Nicolas Dichtel
@ 2015-10-14 16:09     ` David Ahern
  2015-10-15  1:06       ` David Miller
  1 sibling, 1 reply; 11+ messages in thread
From: David Ahern @ 2015-10-14 16:09 UTC (permalink / raw)
  To: Hannes Frederic Sowa, David Miller; +Cc: netdev, hannes, nicolas.dichtel

On 10/14/15 3:34 AM, Hannes Frederic Sowa wrote:
> Hello,
>
> On Wed, Oct 14, 2015, at 03:45, David Miller wrote:
>> From: David Ahern <dsa@cumulusnetworks.com>
>> Date: Mon, 12 Oct 2015 09:33:07 -0700
>>
>>> Currently, all ipv6 addresses are flushed when the interface is configured
>>> down, including global, static addresses:
>>   ...
>>> Add a new sysctl to make this behavior optional. The new setting defaults to
>>> flush all addresses to maintain backwards compatibility. When the setting is
>>> reset global addresses with no expire times are not flushed:
>>   ...
>>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
>>
>> Here is what I really don't like about these changes: the failure
>> semantics are terrible.
>>
>> If I add an address or a route, and some memory allocation fails, I get
>> a notification and see that my operation did not succeed.
>>
>> But here, my routes can fail to be added during an ifup and all I will
>> get, at best, is a kernel log message.
>>
>> This places a serious disconnect between what the user asked for and
>> making sure he finds out directly that his operation did not really
>> fully succeed.
>>
>> In fact, this failure handling here during ifup leaves the interface in
>> a partially configured state.
>
> Agreed, it would be better.
>
>> There are really two ways to deal with this properly:
>>
>> 1) Propagate the failure back through the notifiers and fail the ifup
>>     completely when the addrconf_dst_alloc() fails.
>>
>> 2) On ifdown, stash the objects away somewhere so that memory allocation
>>     is not necessary on ifup.
>
> In regard to error propagation I think 2) is the only viable option. I
> don't think why you shouldn't be able to simply remove the ip6_rt_put
> after the ip6_del_rt and keep the dst_entry hanging there and later
> reinsert.

I don't agree with stashing it. IPv4 code does not do that and handling 
the stashed routes makes the implementation more complex than needed.

This latest patch makes IPv6 static addresses on par with IPv4, 
including error paths. On admin link down IPv4 code removes local and 
connected routes. On admin link up they are reinserted. Error paths are 
not handled. ie., any memory allocations or insertion failures are 
silently ignored.

As for this IPv6 patch the only real failure scenario is ENOMEM in the 
dst_alloc.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
  2015-10-14 16:09     ` David Ahern
@ 2015-10-15  1:06       ` David Miller
  2015-10-15  2:46         ` David Ahern
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2015-10-15  1:06 UTC (permalink / raw)
  To: dsa; +Cc: hannes, netdev, hannes, nicolas.dichtel

From: David Ahern <dsa@cumulusnetworks.com>
Date: Wed, 14 Oct 2015 10:09:59 -0600

> This latest patch makes IPv6 static addresses on par with IPv4,
> including error paths.

I don't agree with ipv4's behavior... and just because ipv4 does
something poorly doesn't mean we get a free pass to replicate that
lazyness in ipv6.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
  2015-10-15  1:06       ` David Miller
@ 2015-10-15  2:46         ` David Ahern
  0 siblings, 0 replies; 11+ messages in thread
From: David Ahern @ 2015-10-15  2:46 UTC (permalink / raw)
  To: David Miller; +Cc: hannes, netdev, hannes, nicolas.dichtel

On 10/14/15 7:06 PM, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Wed, 14 Oct 2015 10:09:59 -0600
>
>> This latest patch makes IPv6 static addresses on par with IPv4,
>> including error paths.
>
> I don't agree with ipv4's behavior... and just because ipv4 does
> something poorly doesn't mean we get a free pass to replicate that
> lazyness in ipv6.
>

As I stated this patch makes IPv6 on par with IPv4 with regards to 
saving the address and lack of error handling back to the user should a 
failure happen on a link up. Yes, it is best to give the user 
notification of a failure, but step back for a moment and look at the 
bigger picture:

At best the address is saved and restored on a link up (the expected 
outcome for 99.999999...% of the time). At worst the address is removed 
because the prefix route fails a memory allocation and the user is not 
notified. But that is exactly what happens today - the address is 
dropped and the user has to restore it.

As for the 1 failure path -- it's a GFP_ATOMIC memory allocation 
failure. Frankly if that happens lack of an address on an interface is 
the least of the user's problems.

As for the options to fix this existing shortcoming:

1. The existing call_netdevice_notifiers infra does not allow a notifier 
to 'fail' the transaction and roll it back or even to give the user an 
error message.

2. Stashing the prefix route has its merits but it has to deal with 
error paths as well. What if the address is deleted? What if the mask is 
changed while the device is a down state? What if the device is deleted? 
Sure, handle those cases but what other paths are missing from that list?

Both paths introduce a lot of complexity all b/c we want to save the 
address on a link and restore the route on a link up.

Why not take this as a start point that at least does the right thing 
almost every time?

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-10-15  2:46 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-12 16:33 [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional David Ahern
2015-10-14  1:45 ` David Miller
2015-10-14  9:34   ` Hannes Frederic Sowa
2015-10-14 10:08     ` Nicolas Dichtel
2015-10-14 11:03       ` Hannes Frederic Sowa
2015-10-14 12:18         ` David Miller
2015-10-14 12:14           ` Hannes Frederic Sowa
2015-10-14 13:00             ` David Miller
2015-10-14 16:09     ` David Ahern
2015-10-15  1:06       ` David Miller
2015-10-15  2:46         ` David Ahern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.