* [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
@ 2019-03-05  0:50 Si-Wei Liu
  2019-03-05  2:33 ` Michael S. Tsirkin
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Si-Wei Liu @ 2019-03-05  0:50 UTC
  To: Michael S. Tsirkin, Sridhar Samudrala, Stephen Hemminger,
	Jakub Kicinski, Jiri Pirko, David Miller, Netdev, virtualization
  Cc: liran.alon, boris.ostrovsky, vijay.balakrishna, si-wei liu

When a netdev appears through hot plug and is then enslaved by a failover
master that is already up and running, the slave is opened right away
after being enslaved. Today there is a race: userspace (udev) may fail
to rename the slave if the kernel (net_failover) opens the slave before
the userspace rename happens. Unlike bond or team, the primary slave of
failover can't be renamed by userspace ahead of time, since the
kernel-initiated auto-enslavement is unable to, or rather, is never
meant to, be synchronized with the rename request from userspace.

Failover slave interfaces are not designed to be operated directly by
userspace apps: IP configuration, filter rules for network traffic,
etc. should all be done on the master interface. In general, userspace
apps only care about the name of the master interface, while slave names
are less important as long as admin users can see reliable names that
may carry other information describing the netdev. For example, they can
infer that "ens3nsby" is a standby slave of "ens3", while for a name
like "eth0" they can't tell which master it belongs to.

Historically the name of an IFF_UP interface can't be changed because
there might be admin scripts or management software already relying on
that behavior and assuming that the slave name can't be changed once UP.
But failover is special: with the in-kernel auto-enslavement mechanism,
the userspace expectation for device enumeration and bring-up order is
already broken. Previously, initramfs and various userspace config tools
were modified to bypass failover slaves because of auto-enslavement and
duplicate MAC addresses. Similarly, if users care about seeing reliable
slave names, the new type of failover slave needs to be handled
specifically in userspace anyway.

For that to work, introduce a module-level tunable, "slave_rename_ok",
that allows users to lift the rename restriction on a failover slave
that is already UP. Although this change may break userspace components
(most likely configuration scripts or management software) that assume
the slave name can't be changed while UP, that is a relatively limited
and controllable set among all userspace components, and it can be fixed
specifically to work with the new naming behavior of the failover slave.
Userspace components interacting with slaves should be changed to
operate on the failover master instead, as the failover slave is dynamic
in nature and may come and go at any point. The goal is to make the role
of failover slaves less relevant, so that all userspace only deals with
the master in the long run. The default for "slave_rename_ok" is
true (1). If userspace doesn't have the right support in place and
users don't care about reliable userspace naming, the value can be set
to false (0).

Signed-off-by: Si-Wei.Liu@oracle.com
Reviewed-by: Liran Alon <liran.alon@oracle.com>
---
 include/linux/netdevice.h |  3 +++
 net/core/dev.c            |  3 ++-
 net/core/failover.c       | 11 +++++++++--
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 857f8ab..6d9e4e0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1487,6 +1487,7 @@ struct net_device_ops {
  * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
  * @IFF_FAILOVER: device is a failover master device
  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
+ * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
@@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
 	IFF_NO_RX_HANDLER		= 1<<26,
 	IFF_FAILOVER			= 1<<27,
 	IFF_FAILOVER_SLAVE		= 1<<28,
+	IFF_SLAVE_RENAME_OK		= 1<<29,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
 #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
 #define IFF_FAILOVER			IFF_FAILOVER
 #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
+#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
 
 /**
  *	struct net_device - The DEVICE structure.
diff --git a/net/core/dev.c b/net/core/dev.c
index 722d50d..ae070de 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
 	BUG_ON(!dev_net(dev));
 
 	net = dev_net(dev);
-	if (dev->flags & IFF_UP)
+	if (dev->flags & IFF_UP &&
+	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
 		return -EBUSY;
 
 	write_seqcount_begin(&devnet_rename_seq);
diff --git a/net/core/failover.c b/net/core/failover.c
index 4a92a98..1fd8bbb 100644
--- a/net/core/failover.c
+++ b/net/core/failover.c
@@ -16,6 +16,11 @@
 
 static LIST_HEAD(failover_list);
 static DEFINE_SPINLOCK(failover_lock);
+static bool slave_rename_ok = true;
+
+module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
+MODULE_PARM_DESC(slave_rename_ok,
+		 "If set allow renaming the slave when failover master is up");
 
 static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
 {
@@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
 	}
 
 	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
+	if (slave_rename_ok)
+		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
 
 	if (fops && fops->slave_register &&
 	    !fops->slave_register(slave_dev, failover_dev))
 		return NOTIFY_OK;
 
 	netdev_upper_dev_unlink(slave_dev, failover_dev);
-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
 err_upper_link:
 	netdev_rx_handler_unregister(slave_dev);
 done:
@@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
 
 	netdev_rx_handler_unregister(slave_dev);
 	netdev_upper_dev_unlink(slave_dev, failover_dev);
-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
 
 	if (fops && fops->slave_unregister &&
 	    !fops->slave_unregister(slave_dev, failover_dev))
-- 
1.8.3.1
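
The flag handling in the patch above can be modelled outside the kernel
as a small sketch. This is not the kernel code: the struct, the MODEL_*
constants and the model_* function names are hypothetical stand-ins, and
only the bit positions and the shape of the check are taken from the
diff. It may help to see the three touch points together: registration
sets the flag when the tunable is on, unregistration clears it, and the
rename check only refuses an IFF_UP device when the flag is absent.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model constants; bit positions mirror the patch. */
#define MODEL_IFF_UP               (1u << 0)
#define MODEL_IFF_FAILOVER_SLAVE   (1u << 28)
#define MODEL_IFF_SLAVE_RENAME_OK  (1u << 29)

/* Hypothetical stand-in for the two flag words of struct net_device. */
struct model_netdev {
	unsigned int flags;      /* user-visible flags, e.g. IFF_UP */
	unsigned int priv_flags; /* kernel-private flags */
};

/* Mirrors the failover_slave_register() hunk: mark the device as a
 * failover slave and, if the tunable is set, allow rename while UP. */
static void model_slave_register(struct model_netdev *dev,
				 bool slave_rename_ok)
{
	dev->priv_flags |= MODEL_IFF_FAILOVER_SLAVE;
	if (slave_rename_ok)
		dev->priv_flags |= MODEL_IFF_SLAVE_RENAME_OK;
}

/* Mirrors the failover_slave_unregister() hunk: drop both flags. */
static void model_slave_unregister(struct model_netdev *dev)
{
	dev->priv_flags &= ~(MODEL_IFF_FAILOVER_SLAVE |
			     MODEL_IFF_SLAVE_RENAME_OK);
}

/* Mirrors the dev_change_name() hunk: returns -EBUSY when the device
 * is UP and rename-while-UP has not been granted, 0 otherwise. */
static int model_rename_check(const struct model_netdev *dev)
{
	if ((dev->flags & MODEL_IFF_UP) &&
	    !(dev->priv_flags & MODEL_IFF_SLAVE_RENAME_OK))
		return -EBUSY;
	return 0;
}
```

In this model the tunable is sampled at registration time, so a device
registered while it was off keeps returning -EBUSY once UP, even if the
value passed to a later registration differs.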



* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-05  0:50 [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces Si-Wei Liu
  2019-03-05  2:33 ` Michael S. Tsirkin
@ 2019-03-05  2:33 ` Michael S. Tsirkin
  2019-03-05 19:19   ` si-wei liu
  2019-03-06 12:04 ` Jiri Pirko
  2019-03-06 12:04 ` Jiri Pirko
  3 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-03-05  2:33 UTC
  To: Si-Wei Liu
  Cc: Sridhar Samudrala, Stephen Hemminger, Jakub Kicinski, Jiri Pirko,
	David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna

On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
> When a netdev appears through hot plug then gets enslaved by a failover
> master that is already up and running, the slave will be opened
> right away after getting enslaved. Today there's a race that userspace
> (udev) may fail to rename the slave if the kernel (net_failover)
> opens the slave earlier than when the userspace rename happens.
> Unlike bond or team, the primary slave of failover can't be renamed by
> userspace ahead of time, since the kernel initiated auto-enslavement is
> unable to, or rather, is never meant to be synchronized with the rename
> request from userspace.
> 
> As the failover slave interfaces are not designed to be operated
> directly by userspace apps: IP configuration, filter rules with
> regard to network traffic passing and etc., should all be done on master
> interface. In general, userspace apps only care about the
> name of master interface, while slave names are less important as long
> as admin users can see reliable names that may carry
> other information describing the netdev. For e.g., they can infer that
> "ens3nsby" is a standby slave of "ens3", while for a
> name like "eth0" they can't tell which master it belongs to.
> 
> Historically the name of IFF_UP interface can't be changed because
> there might be admin script or management software that is already
> relying on such behavior and assumes that the slave name can't be
> changed once UP. But failover is special: with the in-kernel
> auto-enslavement mechanism, the userspace expectation for device
> enumeration and bring-up order is already broken. Previously initramfs
> and various userspace config tools were modified to bypass failover
> slaves because of auto-enslavement and duplicate MAC address. Similarly,
> in case that users care about seeing reliable slave name, the new type
> of failover slaves needs to be taken care of specifically in userspace
> anyway.
> 
> For that to work, now introduce a module-level tunable,
> "slave_rename_ok" that allows users to lift up the rename restriction on
> failover slave which is already UP. Although it's possible this change
> potentially break userspace component (most likely configuration scripts
> or management software) that assumes slave name can't be changed while
> UP, it's relatively a limited and controllable set among all userspace
> components, which can be fixed specifically to work with the new naming
> behavior of the failover slave. Userspace component interacting with
> slaves should be changed to operate on failover master instead, as the
> failover slave is dynamic in nature which may come and go at any point.
> The goal is to make the role of failover slaves less relevant, and
> all userspace should only deal with master in the long run. The default
> for the "slave_rename_ok" is set to true(1). If userspace doesn't have
> the right support in place meanwhile users don't care about reliable
> userspace naming, the value can be set to false(0).
> 
> Signed-off-by: Si-Wei.Liu@oracle.com
> Reviewed-by: Liran Alon <liran.alon@oracle.com>

Not sure which of the versions I should reply to.

I have a vague idea: would it work to *not* set
IFF_UP on slave devices at all?

Would this reduce the chances of existing scripts such as dracut being
confused?

And this leaves open the option for scripts to address
slaves by checking some custom attribute.

> ---
>  include/linux/netdevice.h |  3 +++
>  net/core/dev.c            |  3 ++-
>  net/core/failover.c       | 11 +++++++++--
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 857f8ab..6d9e4e0 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1487,6 +1487,7 @@ struct net_device_ops {
>   * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>   * @IFF_FAILOVER: device is a failover master device
>   * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
> + * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
>   */
>  enum netdev_priv_flags {
>  	IFF_802_1Q_VLAN			= 1<<0,
> @@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
>  	IFF_NO_RX_HANDLER		= 1<<26,
>  	IFF_FAILOVER			= 1<<27,
>  	IFF_FAILOVER_SLAVE		= 1<<28,
> +	IFF_SLAVE_RENAME_OK		= 1<<29,
>  };
>  
>  #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
> @@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
>  #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
>  #define IFF_FAILOVER			IFF_FAILOVER
>  #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
> +#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
>  
>  /**
>   *	struct net_device - The DEVICE structure.
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 722d50d..ae070de 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
>  	BUG_ON(!dev_net(dev));
>  
>  	net = dev_net(dev);
> -	if (dev->flags & IFF_UP)
> +	if (dev->flags & IFF_UP &&
> +	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
>  		return -EBUSY;
>  
>  	write_seqcount_begin(&devnet_rename_seq);
> diff --git a/net/core/failover.c b/net/core/failover.c
> index 4a92a98..1fd8bbb 100644
> --- a/net/core/failover.c
> +++ b/net/core/failover.c
> @@ -16,6 +16,11 @@
>  
>  static LIST_HEAD(failover_list);
>  static DEFINE_SPINLOCK(failover_lock);
> +static bool slave_rename_ok = true;
> +
> +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
> +MODULE_PARM_DESC(slave_rename_ok,
> +		 "If set allow renaming the slave when failover master is up");
>  
>  static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
>  {
> @@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
>  	}
>  
>  	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
> +	if (slave_rename_ok)
> +		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
>  
>  	if (fops && fops->slave_register &&
>  	    !fops->slave_register(slave_dev, failover_dev))
>  		return NOTIFY_OK;
>  
>  	netdev_upper_dev_unlink(slave_dev, failover_dev);
> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>  err_upper_link:
>  	netdev_rx_handler_unregister(slave_dev);
>  done:
> @@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
>  
>  	netdev_rx_handler_unregister(slave_dev);
>  	netdev_upper_dev_unlink(slave_dev, failover_dev);
> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>  
>  	if (fops && fops->slave_unregister &&
>  	    !fops->slave_unregister(slave_dev, failover_dev))
> -- 
> 1.8.3.1


* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-05  2:33 ` Michael S. Tsirkin
@ 2019-03-05 19:19   ` si-wei liu
  2019-03-05 19:24     ` Stephen Hemminger
                       ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: si-wei liu @ 2019-03-05 19:19 UTC
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Stephen Hemminger, Jakub Kicinski, Jiri Pirko,
	David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna



On 3/4/2019 6:33 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
>> When a netdev appears through hot plug then gets enslaved by a failover
>> master that is already up and running, the slave will be opened
>> right away after getting enslaved. Today there's a race that userspace
>> (udev) may fail to rename the slave if the kernel (net_failover)
>> opens the slave earlier than when the userspace rename happens.
>> Unlike bond or team, the primary slave of failover can't be renamed by
>> userspace ahead of time, since the kernel initiated auto-enslavement is
>> unable to, or rather, is never meant to be synchronized with the rename
>> request from userspace.
>>
>> As the failover slave interfaces are not designed to be operated
>> directly by userspace apps: IP configuration, filter rules with
>> regard to network traffic passing and etc., should all be done on master
>> interface. In general, userspace apps only care about the
>> name of master interface, while slave names are less important as long
>> as admin users can see reliable names that may carry
>> other information describing the netdev. For e.g., they can infer that
>> "ens3nsby" is a standby slave of "ens3", while for a
>> name like "eth0" they can't tell which master it belongs to.
>>
>> Historically the name of IFF_UP interface can't be changed because
>> there might be admin script or management software that is already
>> relying on such behavior and assumes that the slave name can't be
>> changed once UP. But failover is special: with the in-kernel
>> auto-enslavement mechanism, the userspace expectation for device
>> enumeration and bring-up order is already broken. Previously initramfs
>> and various userspace config tools were modified to bypass failover
>> slaves because of auto-enslavement and duplicate MAC address. Similarly,
>> in case that users care about seeing reliable slave name, the new type
>> of failover slaves needs to be taken care of specifically in userspace
>> anyway.
>>
>> For that to work, now introduce a module-level tunable,
>> "slave_rename_ok" that allows users to lift up the rename restriction on
>> failover slave which is already UP. Although it's possible this change
>> potentially break userspace component (most likely configuration scripts
>> or management software) that assumes slave name can't be changed while
>> UP, it's relatively a limited and controllable set among all userspace
>> components, which can be fixed specifically to work with the new naming
>> behavior of the failover slave. Userspace component interacting with
>> slaves should be changed to operate on failover master instead, as the
>> failover slave is dynamic in nature which may come and go at any point.
>> The goal is to make the role of failover slaves less relevant, and
>> all userspace should only deal with master in the long run. The default
>> for the "slave_rename_ok" is set to true(1). If userspace doesn't have
>> the right support in place meanwhile users don't care about reliable
>> userspace naming, the value can be set to false(0).
>>
>> Signed-off-by: Si-Wei.Liu@oracle.com
>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Not sure which of the versions I should reply to.
Sorry for sending multiple copies. It's fine to reply to this one.

>
> I have a vague idea: would it work to *not* set
> IFF_UP on slave devices at all?
Hmm, I did think about this option, and it appears more invasive than
what is required to convert existing scripts, not to mention the
controversy of introducing internal netdev state that differs from the
user-visible state. Either disallowing the slave from being brought up
by the user, or not setting the IFF_UP flag and using an internal one
instead, could end up as a substantial behavioral change that breaks
scripts. Consider any admin script that runs `ip link set dev ... up'
successfully and then simply assumes the link is up and that subsequent
operations can proceed as usual. While it *may* work for dracut (yet to
be verified), I'm a bit concerned that more scripts would need to be
converted than those that don't follow volatile failover slave names.
It's technically doable, but may not be worth the effort (in terms of
porting existing scripts/apps).

Thanks
-Siwei

>
> Would this reduce the chances of existing scripts such as dracut being
> confused?
>
> And this leaves open the option for scripts to address
> slaves by checking some custom attribute.
>
>> ---
>>   include/linux/netdevice.h |  3 +++
>>   net/core/dev.c            |  3 ++-
>>   net/core/failover.c       | 11 +++++++++--
>>   3 files changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 857f8ab..6d9e4e0 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1487,6 +1487,7 @@ struct net_device_ops {
>>    * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>>    * @IFF_FAILOVER: device is a failover master device
>>    * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>> + * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
>>    */
>>   enum netdev_priv_flags {
>>   	IFF_802_1Q_VLAN			= 1<<0,
>> @@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
>>   	IFF_NO_RX_HANDLER		= 1<<26,
>>   	IFF_FAILOVER			= 1<<27,
>>   	IFF_FAILOVER_SLAVE		= 1<<28,
>> +	IFF_SLAVE_RENAME_OK		= 1<<29,
>>   };
>>   
>>   #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
>> @@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
>>   #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
>>   #define IFF_FAILOVER			IFF_FAILOVER
>>   #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
>> +#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
>>   
>>   /**
>>    *	struct net_device - The DEVICE structure.
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 722d50d..ae070de 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
>>   	BUG_ON(!dev_net(dev));
>>   
>>   	net = dev_net(dev);
>> -	if (dev->flags & IFF_UP)
>> +	if (dev->flags & IFF_UP &&
>> +	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
>>   		return -EBUSY;
>>   
>>   	write_seqcount_begin(&devnet_rename_seq);
>> diff --git a/net/core/failover.c b/net/core/failover.c
>> index 4a92a98..1fd8bbb 100644
>> --- a/net/core/failover.c
>> +++ b/net/core/failover.c
>> @@ -16,6 +16,11 @@
>>   
>>   static LIST_HEAD(failover_list);
>>   static DEFINE_SPINLOCK(failover_lock);
>> +static bool slave_rename_ok = true;
>> +
>> +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
>> +MODULE_PARM_DESC(slave_rename_ok,
>> +		 "If set allow renaming the slave when failover master is up");
>>   
>>   static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
>>   {
>> @@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
>>   	}
>>   
>>   	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
>> +	if (slave_rename_ok)
>> +		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
>>   
>>   	if (fops && fops->slave_register &&
>>   	    !fops->slave_register(slave_dev, failover_dev))
>>   		return NOTIFY_OK;
>>   
>>   	netdev_upper_dev_unlink(slave_dev, failover_dev);
>> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>>   err_upper_link:
>>   	netdev_rx_handler_unregister(slave_dev);
>>   done:
>> @@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
>>   
>>   	netdev_rx_handler_unregister(slave_dev);
>>   	netdev_upper_dev_unlink(slave_dev, failover_dev);
>> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>>   
>>   	if (fops && fops->slave_unregister &&
>>   	    !fops->slave_unregister(slave_dev, failover_dev))
>> -- 
>> 1.8.3.1



* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-05 19:19   ` si-wei liu
  2019-03-05 19:24     ` Stephen Hemminger
@ 2019-03-05 19:24     ` Stephen Hemminger
  2019-03-05 19:35       ` si-wei liu
  2019-03-05 20:28     ` Michael S. Tsirkin
  2019-03-05 20:28     ` Michael S. Tsirkin
  3 siblings, 1 reply; 28+ messages in thread
From: Stephen Hemminger @ 2019-03-05 19:24 UTC
  To: si-wei liu
  Cc: Michael S. Tsirkin, Sridhar Samudrala, Jakub Kicinski,
	Jiri Pirko, David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna

On Tue, 5 Mar 2019 11:19:32 -0800
si-wei liu <si-wei.liu@oracle.com> wrote:

> > I have a vague idea: would it work to *not* set
> > IFF_UP on slave devices at all?  
> Hmm, I ever thought about this option, and it appears this solution is 
> more invasive than required to convert existing scripts, despite the 
> controversy of introducing internal netdev state to differentiate user 
> visible state. Either we disallow slave to be brought up by user, or to 
> not set IFF_UP flag but instead use the internal one, could end up with 
> substantial behavioral change that breaks scripts. Consider any admin 
> script that does `ip link set dev ... up' successfully just assumes the 
> link is up and subsequent operation can be done as usual. While it *may* 
> work for dracut (yet to be verified), I'm a bit concerned that there are 
> more scripts to be converted than those that don't follow volatile 
> failover slave names. It's technically doable, but may not worth the 
> effort (in terms of porting existing scripts/apps).
> 
> Thanks
> -Siwei

Won't work for most devices.  Many devices turn off PHY and link layer
if not IFF_UP


* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-05 19:24     ` Stephen Hemminger
@ 2019-03-05 19:35       ` si-wei liu
  2019-03-06  0:06           ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: si-wei liu @ 2019-03-05 19:35 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael S. Tsirkin, Sridhar Samudrala, Jakub Kicinski,
	Jiri Pirko, David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna



On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
> On Tue, 5 Mar 2019 11:19:32 -0800
> si-wei liu <si-wei.liu@oracle.com> wrote:
>
>>> I have a vague idea: would it work to *not* set
>>> IFF_UP on slave devices at all?
>> Hmm, I ever thought about this option, and it appears this solution is
>> more invasive than required to convert existing scripts, despite the
>> controversy of introducing internal netdev state to differentiate user
>> visible state. Either we disallow slave to be brought up by user, or to
>> not set IFF_UP flag but instead use the internal one, could end up with
>> substantial behavioral change that breaks scripts. Consider any admin
>> script that does `ip link set dev ... up' successfully just assumes the
>> link is up and subsequent operation can be done as usual. While it *may*
>> work for dracut (yet to be verified), I'm a bit concerned that there are
>> more scripts to be converted than those that don't follow volatile
>> failover slave names. It's technically doable, but may not worth the
>> effort (in terms of porting existing scripts/apps).
>>
>> Thanks
>> -Siwei
> Won't work for most devices.  Many devices turn off PHY and link layer
> if not IFF_UP
True, that's what I said about introducing internal state for those 
drivers and other kernel components. A very invasive change indeed.

-Siwei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-05 19:19   ` si-wei liu
                       ` (2 preceding siblings ...)
  2019-03-05 20:28     ` Michael S. Tsirkin
@ 2019-03-05 20:28     ` Michael S. Tsirkin
  2019-03-05 22:49       ` si-wei liu
  3 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-03-05 20:28 UTC (permalink / raw)
  To: si-wei liu
  Cc: Sridhar Samudrala, Stephen Hemminger, Jakub Kicinski, Jiri Pirko,
	David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna

On Tue, Mar 05, 2019 at 11:19:32AM -0800, si-wei liu wrote:
> 
> 
> On 3/4/2019 6:33 PM, Michael S. Tsirkin wrote:
> > On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
> > > When a netdev appears through hot plug then gets enslaved by a failover
> > > master that is already up and running, the slave will be opened
> > > right away after getting enslaved. Today there's a race that userspace
> > > (udev) may fail to rename the slave if the kernel (net_failover)
> > > opens the slave earlier than when the userspace rename happens.
> > > Unlike bond or team, the primary slave of failover can't be renamed by
> > > userspace ahead of time, since the kernel initiated auto-enslavement is
> > > unable to, or rather, is never meant to be synchronized with the rename
> > > request from userspace.
> > > 
> > > As the failover slave interfaces are not designed to be operated
> > > directly by userspace apps: IP configuration, filter rules with
> > > regard to network traffic passing and etc., should all be done on master
> > > interface. In general, userspace apps only care about the
> > > name of master interface, while slave names are less important as long
> > > as admin users can see reliable names that may carry
> > > other information describing the netdev. For e.g., they can infer that
> > > "ens3nsby" is a standby slave of "ens3", while for a
> > > name like "eth0" they can't tell which master it belongs to.
> > > 
> > > Historically the name of IFF_UP interface can't be changed because
> > > there might be admin script or management software that is already
> > > relying on such behavior and assumes that the slave name can't be
> > > changed once UP. But failover is special: with the in-kernel
> > > auto-enslavement mechanism, the userspace expectation for device
> > > enumeration and bring-up order is already broken. Previously initramfs
> > > and various userspace config tools were modified to bypass failover
> > > slaves because of auto-enslavement and duplicate MAC address. Similarly,
> > > in case that users care about seeing reliable slave name, the new type
> > > of failover slaves needs to be taken care of specifically in userspace
> > > anyway.
> > > 
> > > For that to work, now introduce a module-level tunable,
> > > "slave_rename_ok" that allows users to lift up the rename restriction on
> > > failover slave which is already UP. Although it's possible this change
> > > potentially break userspace component (most likely configuration scripts
> > > or management software) that assumes slave name can't be changed while
> > > UP, it's relatively a limited and controllable set among all userspace
> > > components, which can be fixed specifically to work with the new naming
> > > behavior of the failover slave. Userspace component interacting with
> > > slaves should be changed to operate on failover master instead, as the
> > > failover slave is dynamic in nature which may come and go at any point.
> > > The goal is to make the role of failover slaves less relevant, and
> > > all userspace should only deal with master in the long run. The default
> > > for the "slave_rename_ok" is set to true(1). If userspace doesn't have
> > > the right support in place meanwhile users don't care about reliable
> > > userspace naming, the value can be set to false(0).
> > > 
> > > Signed-off-by: Si-Wei.Liu@oracle.com
> > > Reviewed-by: Liran Alon <liran.alon@oracle.com>
> > Not sure which of the versions I should reply to.
> Sorry for multiple copies sent. It's fine to reply to this one.
> 
> > 
> > I have a vague idea: would it work to *not* set
> > IFF_UP on slave devices at all?
> Hmm, I ever thought about this option, and it appears this solution is more
> invasive than required to convert existing scripts, despite the controversy
> of introducing internal netdev state to differentiate user visible state.
> Either we disallow slave to be brought up by user, or to not set IFF_UP flag
> but instead use the internal one, could end up with substantial behavioral
> change that breaks scripts. Consider any admin script that does `ip link set
> dev ... up' successfully just assumes the link is up and subsequent
> operation can be done as usual. While it *may* work for dracut (yet to be
> verified), I'm a bit concerned that there are more scripts to be converted
> than those that don't follow volatile failover slave names. It's technically
> doable, but may not worth the effort (in terms of porting existing
> scripts/apps).
> 
> Thanks
> -Siwei


Right. The advantage could be that we prevent all kinds of
misconfigurations, e.g. when one has a route on a slave.

> > 
> > Would this reduce the chances of existing scripts such as dracut being
> > confused?
> > 
> > And this leaves open the option for scripts to address
> > slaves by checking some custom attribute.
> > 
> > > ---
> > >   include/linux/netdevice.h |  3 +++
> > >   net/core/dev.c            |  3 ++-
> > >   net/core/failover.c       | 11 +++++++++--
> > >   3 files changed, 14 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 857f8ab..6d9e4e0 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -1487,6 +1487,7 @@ struct net_device_ops {
> > >    * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
> > >    * @IFF_FAILOVER: device is a failover master device
> > >    * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
> > > + * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
> > >    */
> > >   enum netdev_priv_flags {
> > >   	IFF_802_1Q_VLAN			= 1<<0,
> > > @@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
> > >   	IFF_NO_RX_HANDLER		= 1<<26,
> > >   	IFF_FAILOVER			= 1<<27,
> > >   	IFF_FAILOVER_SLAVE		= 1<<28,
> > > +	IFF_SLAVE_RENAME_OK		= 1<<29,
> > >   };
> > >   #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
> > > @@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
> > >   #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
> > >   #define IFF_FAILOVER			IFF_FAILOVER
> > >   #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
> > > +#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
> > >   /**
> > >    *	struct net_device - The DEVICE structure.
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 722d50d..ae070de 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
> > >   	BUG_ON(!dev_net(dev));
> > >   	net = dev_net(dev);
> > > -	if (dev->flags & IFF_UP)
> > > +	if (dev->flags & IFF_UP &&
> > > +	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
> > >   		return -EBUSY;
> > >   	write_seqcount_begin(&devnet_rename_seq);
> > > diff --git a/net/core/failover.c b/net/core/failover.c
> > > index 4a92a98..1fd8bbb 100644
> > > --- a/net/core/failover.c
> > > +++ b/net/core/failover.c
> > > @@ -16,6 +16,11 @@
> > >   static LIST_HEAD(failover_list);
> > >   static DEFINE_SPINLOCK(failover_lock);
> > > +static bool slave_rename_ok = true;
> > > +
> > > +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
> > > +MODULE_PARM_DESC(slave_rename_ok,
> > > +		 "If set allow renaming the slave when failover master is up");
> > >   static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
> > >   {
> > > @@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
> > >   	}
> > >   	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
> > > +	if (slave_rename_ok)
> > > +		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
> > >   	if (fops && fops->slave_register &&
> > >   	    !fops->slave_register(slave_dev, failover_dev))
> > >   		return NOTIFY_OK;
> > >   	netdev_upper_dev_unlink(slave_dev, failover_dev);
> > > -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
> > > +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
> > >   err_upper_link:
> > >   	netdev_rx_handler_unregister(slave_dev);
> > >   done:
> > > @@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
> > >   	netdev_rx_handler_unregister(slave_dev);
> > >   	netdev_upper_dev_unlink(slave_dev, failover_dev);
> > > -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
> > > +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
> > >   	if (fops && fops->slave_unregister &&
> > >   	    !fops->slave_unregister(slave_dev, failover_dev))
> > > -- 
> > > 1.8.3.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-05 20:28     ` Michael S. Tsirkin
@ 2019-03-05 22:49       ` si-wei liu
  0 siblings, 0 replies; 28+ messages in thread
From: si-wei liu @ 2019-03-05 22:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Stephen Hemminger, Jakub Kicinski, Jiri Pirko,
	David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna



On 3/5/2019 12:28 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 11:19:32AM -0800, si-wei liu wrote:
>>
>> On 3/4/2019 6:33 PM, Michael S. Tsirkin wrote:
>>> On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
>>>> When a netdev appears through hot plug then gets enslaved by a failover
>>>> master that is already up and running, the slave will be opened
>>>> right away after getting enslaved. Today there's a race that userspace
>>>> (udev) may fail to rename the slave if the kernel (net_failover)
>>>> opens the slave earlier than when the userspace rename happens.
>>>> Unlike bond or team, the primary slave of failover can't be renamed by
>>>> userspace ahead of time, since the kernel initiated auto-enslavement is
>>>> unable to, or rather, is never meant to be synchronized with the rename
>>>> request from userspace.
>>>>
>>>> As the failover slave interfaces are not designed to be operated
>>>> directly by userspace apps: IP configuration, filter rules with
>>>> regard to network traffic passing and etc., should all be done on master
>>>> interface. In general, userspace apps only care about the
>>>> name of master interface, while slave names are less important as long
>>>> as admin users can see reliable names that may carry
>>>> other information describing the netdev. For e.g., they can infer that
>>>> "ens3nsby" is a standby slave of "ens3", while for a
>>>> name like "eth0" they can't tell which master it belongs to.
>>>>
>>>> Historically the name of IFF_UP interface can't be changed because
>>>> there might be admin script or management software that is already
>>>> relying on such behavior and assumes that the slave name can't be
>>>> changed once UP. But failover is special: with the in-kernel
>>>> auto-enslavement mechanism, the userspace expectation for device
>>>> enumeration and bring-up order is already broken. Previously initramfs
>>>> and various userspace config tools were modified to bypass failover
>>>> slaves because of auto-enslavement and duplicate MAC address. Similarly,
>>>> in case that users care about seeing reliable slave name, the new type
>>>> of failover slaves needs to be taken care of specifically in userspace
>>>> anyway.
>>>>
>>>> For that to work, now introduce a module-level tunable,
>>>> "slave_rename_ok" that allows users to lift up the rename restriction on
>>>> failover slave which is already UP. Although it's possible this change
>>>> potentially break userspace component (most likely configuration scripts
>>>> or management software) that assumes slave name can't be changed while
>>>> UP, it's relatively a limited and controllable set among all userspace
>>>> components, which can be fixed specifically to work with the new naming
>>>> behavior of the failover slave. Userspace component interacting with
>>>> slaves should be changed to operate on failover master instead, as the
>>>> failover slave is dynamic in nature which may come and go at any point.
>>>> The goal is to make the role of failover slaves less relevant, and
>>>> all userspace should only deal with master in the long run. The default
>>>> for the "slave_rename_ok" is set to true(1). If userspace doesn't have
>>>> the right support in place meanwhile users don't care about reliable
>>>> userspace naming, the value can be set to false(0).
>>>>
>>>> Signed-off-by: Si-Wei.Liu@oracle.com
>>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>> Not sure which of the versions I should reply to.
>> Sorry for multiple copies sent. It's fine to reply to this one.
>>
>>> I have a vague idea: would it work to *not* set
>>> IFF_UP on slave devices at all?
>> Hmm, I ever thought about this option, and it appears this solution is more
>> invasive than required to convert existing scripts, despite the controversy
>> of introducing internal netdev state to differentiate user visible state.
>> Either we disallow slave to be brought up by user, or to not set IFF_UP flag
>> but instead use the internal one, could end up with substantial behavioral
>> change that breaks scripts. Consider any admin script that does `ip link set
>> dev ... up' successfully just assumes the link is up and subsequent
>> operation can be done as usual. While it *may* work for dracut (yet to be
>> verified), I'm a bit concerned that there are more scripts to be converted
>> than those that don't follow volatile failover slave names. It's technically
>> doable, but may not worth the effort (in terms of porting existing
>> scripts/apps).
>>
>> Thanks
>> -Siwei
>
> Right. The advantage could be that we prevent all kinds of
> misconfigurations, e.g. when one has a route on a slave.
The fix for the slave route problem is already there in dracut. The ship 
has sailed, no matter how seamlessly upstream thought failover could work 
with the existing userspace. I would rather avoid introducing more 
breakage to userspace if there's a simple yet less intrusive way to fix 
the rename issue itself.

-Siwei

>
>>> Would this reduce the chances of existing scripts such as dracut being
>>> confused?
>>>
>>> And this leaves open the option for scripts to address
>>> slaves by checking some custom attribute.
>>>
>>>> ---
>>>>    include/linux/netdevice.h |  3 +++
>>>>    net/core/dev.c            |  3 ++-
>>>>    net/core/failover.c       | 11 +++++++++--
>>>>    3 files changed, 14 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 857f8ab..6d9e4e0 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -1487,6 +1487,7 @@ struct net_device_ops {
>>>>     * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>>>>     * @IFF_FAILOVER: device is a failover master device
>>>>     * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>>>> + * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
>>>>     */
>>>>    enum netdev_priv_flags {
>>>>    	IFF_802_1Q_VLAN			= 1<<0,
>>>> @@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
>>>>    	IFF_NO_RX_HANDLER		= 1<<26,
>>>>    	IFF_FAILOVER			= 1<<27,
>>>>    	IFF_FAILOVER_SLAVE		= 1<<28,
>>>> +	IFF_SLAVE_RENAME_OK		= 1<<29,
>>>>    };
>>>>    #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
>>>> @@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
>>>>    #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
>>>>    #define IFF_FAILOVER			IFF_FAILOVER
>>>>    #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
>>>> +#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
>>>>    /**
>>>>     *	struct net_device - The DEVICE structure.
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index 722d50d..ae070de 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
>>>>    	BUG_ON(!dev_net(dev));
>>>>    	net = dev_net(dev);
>>>> -	if (dev->flags & IFF_UP)
>>>> +	if (dev->flags & IFF_UP &&
>>>> +	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
>>>>    		return -EBUSY;
>>>>    	write_seqcount_begin(&devnet_rename_seq);
>>>> diff --git a/net/core/failover.c b/net/core/failover.c
>>>> index 4a92a98..1fd8bbb 100644
>>>> --- a/net/core/failover.c
>>>> +++ b/net/core/failover.c
>>>> @@ -16,6 +16,11 @@
>>>>    static LIST_HEAD(failover_list);
>>>>    static DEFINE_SPINLOCK(failover_lock);
>>>> +static bool slave_rename_ok = true;
>>>> +
>>>> +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
>>>> +MODULE_PARM_DESC(slave_rename_ok,
>>>> +		 "If set allow renaming the slave when failover master is up");
>>>>    static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
>>>>    {
>>>> @@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
>>>>    	}
>>>>    	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
>>>> +	if (slave_rename_ok)
>>>> +		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
>>>>    	if (fops && fops->slave_register &&
>>>>    	    !fops->slave_register(slave_dev, failover_dev))
>>>>    		return NOTIFY_OK;
>>>>    	netdev_upper_dev_unlink(slave_dev, failover_dev);
>>>> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>>>> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>>>>    err_upper_link:
>>>>    	netdev_rx_handler_unregister(slave_dev);
>>>>    done:
>>>> @@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
>>>>    	netdev_rx_handler_unregister(slave_dev);
>>>>    	netdev_upper_dev_unlink(slave_dev, failover_dev);
>>>> -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>>>> +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
>>>>    	if (fops && fops->slave_unregister &&
>>>>    	    !fops->slave_unregister(slave_dev, failover_dev))
>>>> -- 
>>>> 1.8.3.1
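
For reference, the behavioral change to dev_change_name() in the quoted
patch can be modeled in userspace as below. This is only a sketch under
assumptions: rename_model_dev and model_rename_check are hypothetical
names, the struct is a simplified stand-in for struct net_device, and the
bit value of IFF_SLAVE_RENAME_OK is the one proposed in this RFC.

```c
#include <errno.h>

/* IFF_UP matches the uapi value; IFF_SLAVE_RENAME_OK is the bit proposed in this RFC. */
#define IFF_UP               (1u << 0)
#define IFF_SLAVE_RENAME_OK  (1u << 29)

/* Simplified stand-in for struct net_device (hypothetical, illustration only). */
struct rename_model_dev {
	unsigned int flags;      /* user-visible flags such as IFF_UP */
	unsigned int priv_flags; /* kernel-private flags */
};

/*
 * Mirrors the check in the patched dev_change_name(): a rename is refused
 * with -EBUSY only when the device is up AND the exemption flag is absent.
 */
static int model_rename_check(const struct rename_model_dev *dev)
{
	if ((dev->flags & IFF_UP) &&
	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
		return -EBUSY;
	return 0;
}
```

A device that is down can always be renamed, an ordinary device that is
up still gets -EBUSY, and only a failover slave registered while
slave_rename_ok is set is exempt.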


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-05 19:35       ` si-wei liu
@ 2019-03-06  0:06           ` Michael S. Tsirkin
  0 siblings, 0 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-03-06  0:06 UTC (permalink / raw)
  To: si-wei liu
  Cc: Stephen Hemminger, Sridhar Samudrala, Jakub Kicinski, Jiri Pirko,
	David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna

On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
> 
> 
> On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
> > On Tue, 5 Mar 2019 11:19:32 -0800
> > si-wei liu <si-wei.liu@oracle.com> wrote:
> > 
> > > > I have a vague idea: would it work to *not* set
> > > > IFF_UP on slave devices at all?
> > > Hmm, I ever thought about this option, and it appears this solution is
> > > more invasive than required to convert existing scripts, despite the
> > > controversy of introducing internal netdev state to differentiate user
> > > visible state. Either we disallow slave to be brought up by user, or to
> > > not set IFF_UP flag but instead use the internal one, could end up with
> > > substantial behavioral change that breaks scripts. Consider any admin
> > > script that does `ip link set dev ... up' successfully just assumes the
> > > link is up and subsequent operation can be done as usual.

How would it work when carrier is off?

> While it *may*
> > > work for dracut (yet to be verified), I'm a bit concerned that there are
> > > more scripts to be converted than those that don't follow volatile
> > > failover slave names. It's technically doable, but may not worth the
> > > effort (in terms of porting existing scripts/apps).
> > > 
> > > Thanks
> > > -Siwei
> > Won't work for most devices.  Many devices turn off PHY and link layer
> > if not IFF_UP
> True, that's what I said about introducing internal state for those driver
> and other kernel component. Very invasive change indeed.
> 
> -Siwei

Well I did say it's vague.
How about hiding IFF_UP from dev_get_flags (and probably
__dev_change_flags)?



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-06  0:06           ` Michael S. Tsirkin
  (?)
@ 2019-03-06  0:20           ` si-wei liu
  2019-03-06  0:36             ` Michael S. Tsirkin
  2019-03-06  0:36             ` Michael S. Tsirkin
  -1 siblings, 2 replies; 28+ messages in thread
From: si-wei liu @ 2019-03-06  0:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stephen Hemminger, Sridhar Samudrala, Jakub Kicinski, Jiri Pirko,
	David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna



On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
>>
>> On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
>>> On Tue, 5 Mar 2019 11:19:32 -0800
>>> si-wei liu <si-wei.liu@oracle.com> wrote:
>>>
>>>>> I have a vague idea: would it work to *not* set
>>>>> IFF_UP on slave devices at all?
>>>> Hmm, I ever thought about this option, and it appears this solution is
>>>> more invasive than required to convert existing scripts, despite the
>>>> controversy of introducing internal netdev state to differentiate user
>>>> visible state. Either we disallow slave to be brought up by user, or to
>>>> not set IFF_UP flag but instead use the internal one, could end up with
>>>> substantial behavioral change that breaks scripts. Consider any admin
>>>> script that does `ip link set dev ... up' successfully just assumes the
>>>> link is up and subsequent operation can be done as usual.
> How would it work when carrier is off?
>
>> While it *may*
>>>> work for dracut (yet to be verified), I'm a bit concerned that there are
>>>> more scripts to be converted than those that don't follow volatile
>>>> failover slave names. It's technically doable, but may not worth the
>>>> effort (in terms of porting existing scripts/apps).
>>>>
>>>> Thanks
>>>> -Siwei
>>> Won't work for most devices.  Many devices turn off PHY and link layer
>>> if not IFF_UP
>> True, that's what I said about introducing internal state for those driver
>> and other kernel component. Very invasive change indeed.
>>
>> -Siwei
> Well I did say it's vague.
> How about hiding IFF_UP from dev_get_flags (and probably
> __dev_change_flags)?
>
Is that any different? The kernel change would have a small footprint
for sure, but the discrepancy is still there: anyone who writes code
that checks IFF_UP will not notice IFF_FAILOVER_SLAVE.

Not to mention that more userspace "fixup" work would be needed because
of this change.

-Siwei




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-06  0:20           ` si-wei liu
@ 2019-03-06  0:36             ` Michael S. Tsirkin
  2019-03-06  0:51               ` si-wei liu
  2019-03-06  0:36             ` Michael S. Tsirkin
  1 sibling, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-03-06  0:36 UTC (permalink / raw)
  To: si-wei liu
  Cc: Stephen Hemminger, Sridhar Samudrala, Jakub Kicinski, Jiri Pirko,
	David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna

On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
> 
> 
> On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
> > On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
> > > 
> > > On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
> > > > On Tue, 5 Mar 2019 11:19:32 -0800
> > > > si-wei liu <si-wei.liu@oracle.com> wrote:
> > > > 
> > > > > > I have a vague idea: would it work to *not* set
> > > > > > IFF_UP on slave devices at all?
> > > > > Hmm, I ever thought about this option, and it appears this solution is
> > > > > more invasive than required to convert existing scripts, despite the
> > > > > controversy of introducing internal netdev state to differentiate user
> > > > > visible state. Either we disallow slave to be brought up by user, or to
> > > > > not set IFF_UP flag but instead use the internal one, could end up with
> > > > > substantial behavioral change that breaks scripts. Consider any admin
> > > > > script that does `ip link set dev ... up' successfully just assumes the
> > > > > link is up and subsequent operation can be done as usual.
> > How would it work when carrier is off?
> > 
> > > While it *may*
> > > > > work for dracut (yet to be verified), I'm a bit concerned that there are
> > > > > more scripts to be converted than those that don't follow volatile
> > > > > failover slave names. It's technically doable, but may not worth the
> > > > > effort (in terms of porting existing scripts/apps).
> > > > > 
> > > > > Thanks
> > > > > -Siwei
> > > > Won't work for most devices.  Many devices turn off PHY and link layer
> > > > if not IFF_UP
> > > True, that's what I said about introducing internal state for those driver
> > > and other kernel component. Very invasive change indeed.
> > > 
> > > -Siwei
> > Well I did say it's vague.
> > How about hiding IFF_UP from dev_get_flags (and probably
> > __dev_change_flags)?
> > 
> Any different? This has small footprint for the kernel change for sure,
> while the discrepancy is still there. Anyone who writes code for IFF_UP will
> not notice IFF_FAILOVER_SLAVE.
> 
> Not to mention more userspace "fixup" work has to be done due to this
> change.
> 
> -Siwei
> 
> 

The point is that it's OK, since most userspace should just ignore
slaves: hopefully it will ignore them, since it already ignores
interfaces that are down.

-- 
MST

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-06  0:36             ` Michael S. Tsirkin
@ 2019-03-06  0:51               ` si-wei liu
  2019-03-06  6:43                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: si-wei liu @ 2019-03-06  0:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stephen Hemminger, Sridhar Samudrala, Jakub Kicinski, Jiri Pirko,
	David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna



On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
>>
>> On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
>>> On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
>>>> On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
>>>>> On Tue, 5 Mar 2019 11:19:32 -0800
>>>>> si-wei liu <si-wei.liu@oracle.com> wrote:
>>>>>
>>>>>>> I have a vague idea: would it work to *not* set
>>>>>>> IFF_UP on slave devices at all?
>>>>>> Hmm, I ever thought about this option, and it appears this solution is
>>>>>> more invasive than required to convert existing scripts, despite the
>>>>>> controversy of introducing internal netdev state to differentiate user
>>>>>> visible state. Either we disallow slave to be brought up by user, or to
>>>>>> not set IFF_UP flag but instead use the internal one, could end up with
>>>>>> substantial behavioral change that breaks scripts. Consider any admin
>>>>>> script that does `ip link set dev ... up' successfully just assumes the
>>>>>> link is up and subsequent operation can be done as usual.
>>> How would it work when carrier is off?
>>>
>>>> While it *may*
>>>>>> work for dracut (yet to be verified), I'm a bit concerned that there are
>>>>>> more scripts to be converted than those that don't follow volatile
>>>>>> failover slave names. It's technically doable, but may not worth the
>>>>>> effort (in terms of porting existing scripts/apps).
>>>>>>
>>>>>> Thanks
>>>>>> -Siwei
>>>>> Won't work for most devices.  Many devices turn off PHY and link layer
>>>>> if not IFF_UP
>>>> True, that's what I said about introducing internal state for those driver
>>>> and other kernel component. Very invasive change indeed.
>>>>
>>>> -Siwei
>>> Well I did say it's vague.
>>> How about hiding IFF_UP from dev_get_flags (and probably
>>> __dev_change_flags)?
>>>
>> Any different? This has small footprint for the kernel change for sure,
>> while the discrepancy is still there. Anyone who writes code for IFF_UP will
>> not notice IFF_FAILOVER_SLAVE.
>>
>> Not to mention more userspace "fixup" work has to be done due to this
>> change.
>>
>> -Siwei
>>
>>
> Point is it's ok since most userspace should just ignore slaves
> - hopefully it will just ignore it since it already
> ignores interfaces that are down.
An admin script may assume that the interface can be brought up and
carry on with further operations without checking the UP flag. This
doesn't look like a reliable way of prohibiting userspace from
operating on slaves.

-Siwei




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-06  0:51               ` si-wei liu
@ 2019-03-06  6:43                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-03-06  6:43 UTC (permalink / raw)
  To: si-wei liu
  Cc: Stephen Hemminger, Sridhar Samudrala, Jakub Kicinski, Jiri Pirko,
	David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna

On Tue, Mar 05, 2019 at 04:51:00PM -0800, si-wei liu wrote:
> 
> 
> On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:
> > On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
> > > 
> > > On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
> > > > > On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
> > > > > > On Tue, 5 Mar 2019 11:19:32 -0800
> > > > > > si-wei liu <si-wei.liu@oracle.com> wrote:
> > > > > > 
> > > > > > > > I have a vague idea: would it work to *not* set
> > > > > > > > IFF_UP on slave devices at all?
> > > > > > > Hmm, I ever thought about this option, and it appears this solution is
> > > > > > > more invasive than required to convert existing scripts, despite the
> > > > > > > controversy of introducing internal netdev state to differentiate user
> > > > > > > visible state. Either we disallow slave to be brought up by user, or to
> > > > > > > not set IFF_UP flag but instead use the internal one, could end up with
> > > > > > > substantial behavioral change that breaks scripts. Consider any admin
> > > > > > > script that does `ip link set dev ... up' successfully just assumes the
> > > > > > > link is up and subsequent operation can be done as usual.
> > > > How would it work when carrier is off?
> > > > 
> > > > > While it *may*
> > > > > > > work for dracut (yet to be verified), I'm a bit concerned that there are
> > > > > > > more scripts to be converted than those that don't follow volatile
> > > > > > > failover slave names. It's technically doable, but may not worth the
> > > > > > > effort (in terms of porting existing scripts/apps).
> > > > > > > 
> > > > > > > Thanks
> > > > > > > -Siwei
> > > > > > Won't work for most devices.  Many devices turn off PHY and link layer
> > > > > > if not IFF_UP
> > > > > True, that's what I said about introducing internal state for those driver
> > > > > and other kernel component. Very invasive change indeed.
> > > > > 
> > > > > -Siwei
> > > > Well I did say it's vague.
> > > > How about hiding IFF_UP from dev_get_flags (and probably
> > > > __dev_change_flags)?
> > > > 
> > > Any different? This has small footprint for the kernel change for sure,
> > > while the discrepancy is still there. Anyone who writes code for IFF_UP will
> > > not notice IFF_FAILOVER_SLAVE.
> > > 
> > > Not to mention more userspace "fixup" work has to be done due to this
> > > change.
> > > 
> > > -Siwei
> > > 
> > > 
> > Point is it's ok since most userspace should just ignore slaves
> > - hopefully it will just ignore it since it already
> > ignores interfaces that are down.
> Admin script thought the interface could be brought up and do further
> operations without checking the UP flag.

Such scripts would then be broken on any box with multiple interfaces,
since not all of them would have carrier.


> It doesn't look to be a reliable
> way of prohibit userspace from operating against slaves.
> 
> -Siwei
> 
> 

This does not mean we shouldn't make an effort to disable broken
configurations.

I am not arguing against your patch. Not at all. I see better
hiding of slaves as a separate enhancement.


Acked-by: Michael S. Tsirkin <mst@redhat.com>


-- 
MST

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-06  6:43                   ` Michael S. Tsirkin
  (?)
@ 2019-03-06  7:15                   ` si-wei liu
  2019-03-06  7:23                     ` Michael S. Tsirkin
  2019-03-06  7:23                     ` Michael S. Tsirkin
  -1 siblings, 2 replies; 28+ messages in thread
From: si-wei liu @ 2019-03-06  7:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stephen Hemminger, Sridhar Samudrala, Jakub Kicinski, Jiri Pirko,
	David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna



On 3/5/2019 10:43 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 04:51:00PM -0800, si-wei liu wrote:
>>
>> On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:
>>> On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
>>>> On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
>>>>>> On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
>>>>>>> On Tue, 5 Mar 2019 11:19:32 -0800
>>>>>>> si-wei liu <si-wei.liu@oracle.com> wrote:
>>>>>>>
>>>>>>>>> I have a vague idea: would it work to *not* set
>>>>>>>>> IFF_UP on slave devices at all?
>>>>>>>> Hmm, I ever thought about this option, and it appears this solution is
>>>>>>>> more invasive than required to convert existing scripts, despite the
>>>>>>>> controversy of introducing internal netdev state to differentiate user
>>>>>>>> visible state. Either we disallow slave to be brought up by user, or to
>>>>>>>> not set IFF_UP flag but instead use the internal one, could end up with
>>>>>>>> substantial behavioral change that breaks scripts. Consider any admin
>>>>>>>> script that does `ip link set dev ... up' successfully just assumes the
>>>>>>>> link is up and subsequent operation can be done as usual.
>>>>> How would it work when carrier is off?
>>>>>
>>>>>> While it *may*
>>>>>>>> work for dracut (yet to be verified), I'm a bit concerned that there are
>>>>>>>> more scripts to be converted than those that don't follow volatile
>>>>>>>> failover slave names. It's technically doable, but may not worth the
>>>>>>>> effort (in terms of porting existing scripts/apps).
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> -Siwei
>>>>>>> Won't work for most devices.  Many devices turn off PHY and link layer
>>>>>>> if not IFF_UP
>>>>>> True, that's what I said about introducing internal state for those driver
>>>>>> and other kernel component. Very invasive change indeed.
>>>>>>
>>>>>> -Siwei
>>>>> Well I did say it's vague.
>>>>> How about hiding IFF_UP from dev_get_flags (and probably
>>>>> __dev_change_flags)?
>>>>>
>>>> Any different? This has small footprint for the kernel change for sure,
>>>> while the discrepancy is still there. Anyone who writes code for IFF_UP will
>>>> not notice IFF_FAILOVER_SLAVE.
>>>>
>>>> Not to mention more userspace "fixup" work has to be done due to this
>>>> change.
>>>>
>>>> -Siwei
>>>>
>>>>
>>> Point is it's ok since most userspace should just ignore slaves
>>> - hopefully it will just ignore it since it already
>>> ignores interfaces that are down.
>> Admin script thought the interface could be brought up and do further
>> operations without checking the UP flag.
> These scripts then would be broken  on any box with multiple interfaces
> since not all of these would have carrier.
Consider a script that executes `ifconfig ... up' and, once that
succeeds, runs tcpdump or some other command that relies on the
interface being UP. It's quite common for such scripts not to check the
UP flag but to rely on the well-known fact that an exit status of 0
means the interface should be UP. This change might well break scripts
of that kind.
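
To illustrate the distinction being argued over: the administrative flag
(IFF_UP) and the operational state (IFF_RUNNING, which is normally clear
when there is no carrier) are separate bits, and a tool that wants the
real state has to read the flags back rather than trust the exit status
of the command that set them. The following is a sketch only. The helper
names and `enum link_state` are invented here; only the SIOCGIFFLAGS
ioctl, `struct ifreq` and the IFF_* constants come from the standard
headers.

```c
/* Needed for struct ifreq and IFF_* when building with strict -std=c11. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

enum link_state {
	LINK_QUERY_FAILED = -1,
	LINK_ADMIN_DOWN,      /* IFF_UP clear */
	LINK_UP_NO_CARRIER,   /* IFF_UP set, IFF_RUNNING clear */
	LINK_UP_RUNNING,      /* both set */
};

/* Pure classification of a flags word as returned by SIOCGIFFLAGS. */
static inline enum link_state classify_flags(unsigned int flags)
{
	if (!(flags & IFF_UP))
		return LINK_ADMIN_DOWN;
	if (!(flags & IFF_RUNNING))
		return LINK_UP_NO_CARRIER;
	return LINK_UP_RUNNING;
}

/* Read the flags of the named interface back from the kernel. */
static inline enum link_state query_link_state(const char *ifname)
{
	struct ifreq ifr;
	int fd, ret;

	assert(ifname);
	if (strlen(ifname) >= IFNAMSIZ)
		return LINK_QUERY_FAILED;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return LINK_QUERY_FAILED;

	memset(&ifr, 0, sizeof(ifr));
	strcpy(ifr.ifr_name, ifname);  /* length checked above */
	ret = ioctl(fd, SIOCGIFFLAGS, &ifr);
	close(fd);
	if (ret < 0)
		return LINK_QUERY_FAILED;

	/* ifr_flags is a short; widen it without sign extension. */
	return classify_flags((unsigned short)ifr.ifr_flags);
}
```

A script that only checks the exit status never performs a read-back like
this, which is why a reported IFF_UP that no longer matches the request
could go unnoticed.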

>
>
>> It doesn't look to be a reliable
>> way of prohibit userspace from operating against slaves.
>>
>> -Siwei
>>
>>
> This does not mean we shouldn't make an effort to disable broken
> configurations.
>
> I am not arguing against your patch. Not at all. I see better
> hiding of slaves as a separate enhancement.
I understand, but my point is that whatever "hiding" effort we make, we
should try to minimize unnecessary side effects on current usage. It's
sometimes hard to find the right tradeoff.

>
>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
>
Thank you.

-Siwei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-06  7:15                   ` si-wei liu
  2019-03-06  7:23                     ` Michael S. Tsirkin
@ 2019-03-06  7:23                     ` Michael S. Tsirkin
  2019-03-06  8:20                       ` si-wei liu
  1 sibling, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-03-06  7:23 UTC (permalink / raw)
  To: si-wei liu
  Cc: Stephen Hemminger, Sridhar Samudrala, Jakub Kicinski, Jiri Pirko,
	David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna

On Tue, Mar 05, 2019 at 11:15:06PM -0800, si-wei liu wrote:
> 
> 
> On 3/5/2019 10:43 PM, Michael S. Tsirkin wrote:
> > On Tue, Mar 05, 2019 at 04:51:00PM -0800, si-wei liu wrote:
> > > 
> > > On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
> > > > > On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
> > > > > > On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
> > > > > > > On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
> > > > > > > > On Tue, 5 Mar 2019 11:19:32 -0800
> > > > > > > > si-wei liu <si-wei.liu@oracle.com> wrote:
> > > > > > > > 
> > > > > > > > > > I have a vague idea: would it work to *not* set
> > > > > > > > > > IFF_UP on slave devices at all?
> > > > > > > > > Hmm, I ever thought about this option, and it appears this solution is
> > > > > > > > > more invasive than required to convert existing scripts, despite the
> > > > > > > > > controversy of introducing internal netdev state to differentiate user
> > > > > > > > > visible state. Either we disallow slave to be brought up by user, or to
> > > > > > > > > not set IFF_UP flag but instead use the internal one, could end up with
> > > > > > > > > substantial behavioral change that breaks scripts. Consider any admin
> > > > > > > > > script that does `ip link set dev ... up' successfully just assumes the
> > > > > > > > > link is up and subsequent operation can be done as usual.
> > > > > > How would it work when carrier is off?
> > > > > > 
> > > > > > > While it *may*
> > > > > > > > > work for dracut (yet to be verified), I'm a bit concerned that there are
> > > > > > > > > more scripts to be converted than those that don't follow volatile
> > > > > > > > > failover slave names. It's technically doable, but may not worth the
> > > > > > > > > effort (in terms of porting existing scripts/apps).
> > > > > > > > > 
> > > > > > > > > Thanks
> > > > > > > > > -Siwei
> > > > > > > > Won't work for most devices.  Many devices turn off PHY and link layer
> > > > > > > > if not IFF_UP
> > > > > > > True, that's what I said about introducing internal state for those driver
> > > > > > > and other kernel component. Very invasive change indeed.
> > > > > > > 
> > > > > > > -Siwei
> > > > > > Well I did say it's vague.
> > > > > > How about hiding IFF_UP from dev_get_flags (and probably
> > > > > > __dev_change_flags)?
> > > > > > 
> > > > > Any different? This has small footprint for the kernel change for sure,
> > > > > while the discrepancy is still there. Anyone who writes code for IFF_UP will
> > > > > not notice IFF_FAILOVER_SLAVE.
> > > > > 
> > > > > Not to mention more userspace "fixup" work has to be done due to this
> > > > > change.
> > > > > 
> > > > > -Siwei
> > > > > 
> > > > > 
> > > > Point is it's ok since most userspace should just ignore slaves
> > > > - hopefully it will just ignore it since it already
> > > > ignores interfaces that are down.
> > > Admin script thought the interface could be brought up and do further
> > > operations without checking the UP flag.
> > These scripts then would be broken  on any box with multiple interfaces
> > since not all of these would have carrier.
> Consider a script executing `ifconfig ... up' and once succeeds runs tcpdump
> or some other command relying on UP interface. It's quite common that those
> scripts don't check the UP flag but instead just rely on the well-known fact
> that the command exits with 0 meaning the interface should be UP. This
> change might well break scripts of that kind.

I'm sorry, I don't get it. Could you give an example
of a script that works now but would be broken?


> > 
> > 
> > > It doesn't look to be a reliable
> > > way of prohibit userspace from operating against slaves.
> > > 
> > > -Siwei
> > > 
> > > 
> > This does not mean we shouldn't make an effort to disable broken
> > configurations.
> > 
> > I am not arguing against your patch. Not at all. I see better
> > hiding of slaves as a separate enhancement.
> I understand, but my point is we should try to minimize unnecessary side
> impact to the current usage for whatever "hiding" effort we can make. It's
> hard to find a tradeoff sometimes.

Yes, if some userspace made an assumption and it worked, I think we
should keep it working. I don't necessarily agree that we should worry
too much about theoretical issues, though: in the half year since the
feature was merged, it's unlikely that millions of slightly different
scripts have come to depend on it.

> > 
> > 
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > 
> Thank you.
> 
> -Siwei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-06  7:23                     ` Michael S. Tsirkin
@ 2019-03-06  8:20                       ` si-wei liu
  0 siblings, 0 replies; 28+ messages in thread
From: si-wei liu @ 2019-03-06  8:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stephen Hemminger, Sridhar Samudrala, Jakub Kicinski, Jiri Pirko,
	David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna



On 3/5/2019 11:23 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 11:15:06PM -0800, si-wei liu wrote:
>>
>> On 3/5/2019 10:43 PM, Michael S. Tsirkin wrote:
>>> On Tue, Mar 05, 2019 at 04:51:00PM -0800, si-wei liu wrote:
>>>> On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:
>>>>>> On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:
>>>>>>> On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:
>>>>>>>> On 3/5/2019 11:24 AM, Stephen Hemminger wrote:
>>>>>>>>> On Tue, 5 Mar 2019 11:19:32 -0800
>>>>>>>>> si-wei liu <si-wei.liu@oracle.com> wrote:
>>>>>>>>>
>>>>>>>>>>> I have a vague idea: would it work to *not* set
>>>>>>>>>>> IFF_UP on slave devices at all?
>>>>>>>>>> Hmm, I ever thought about this option, and it appears this solution is
>>>>>>>>>> more invasive than required to convert existing scripts, despite the
>>>>>>>>>> controversy of introducing internal netdev state to differentiate user
>>>>>>>>>> visible state. Either we disallow slave to be brought up by user, or to
>>>>>>>>>> not set IFF_UP flag but instead use the internal one, could end up with
>>>>>>>>>> substantial behavioral change that breaks scripts. Consider any admin
>>>>>>>>>> script that does `ip link set dev ... up' successfully just assumes the
>>>>>>>>>> link is up and subsequent operation can be done as usual.
>>>>>>> How would it work when carrier is off?
>>>>>>>
>>>>>>>> While it *may*
>>>>>>>>>> work for dracut (yet to be verified), I'm a bit concerned that there are
>>>>>>>>>> more scripts to be converted than those that don't follow volatile
>>>>>>>>>> failover slave names. It's technically doable, but may not worth the
>>>>>>>>>> effort (in terms of porting existing scripts/apps).
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> -Siwei
>>>>>>>>> Won't work for most devices.  Many devices turn off PHY and link layer
>>>>>>>>> if not IFF_UP
>>>>>>>> True, that's what I said about introducing internal state for those driver
>>>>>>>> and other kernel component. Very invasive change indeed.
>>>>>>>>
>>>>>>>> -Siwei
>>>>>>> Well I did say it's vague.
>>>>>>> How about hiding IFF_UP from dev_get_flags (and probably
>>>>>>> __dev_change_flags)?
>>>>>>>
>>>>>> Any different? This has small footprint for the kernel change for sure,
>>>>>> while the discrepancy is still there. Anyone who writes code for IFF_UP will
>>>>>> not notice IFF_FAILOVER_SLAVE.
>>>>>>
>>>>>> Not to mention more userspace "fixup" work has to be done due to this
>>>>>> change.
>>>>>>
>>>>>> -Siwei
>>>>>>
>>>>>>
>>>>> Point is it's ok since most userspace should just ignore slaves
>>>>> - hopefully it will just ignore it since it already
>>>>> ignores interfaces that are down.
>>>> Admin script thought the interface could be brought up and do further
>>>> operations without checking the UP flag.
>>> These scripts then would be broken  on any box with multiple interfaces
>>> since not all of these would have carrier.
>> Consider a script executing `ifconfig ... up' and once succeeds runs tcpdump
>> or some other command relying on UP interface. It's quite common that those
>> scripts don't check the UP flag but instead just rely on the well-known fact
>> that the command exits with 0 meaning the interface should be UP. This
>> change might well break scripts of that kind.
> I am sorry I don't get it. Could you give an example
> of a script that works now but would be broken?

https://github.com/torvalds/linux/blob/master/tools/testing/selftests/net/netdevice.sh#L27
https://github.com/WPO-Foundation/wptagent/blob/master/internal/adb.py#L443
https://github.com/openstack/steth/blob/master/steth/agent/api.py#L134

There are more if you keep searching.

-Siwei

>
>
>>>
>>>> It doesn't look to be a reliable
>>>> way of prohibiting userspace from operating against slaves.
>>>>
>>>> -Siwei
>>>>
>>>>
>>> This does not mean we shouldn't make an effort to disable broken
>>> configurations.
>>>
>>> I am not arguing against your patch. Not at all. I see better
>>> hiding of slaves as a separate enhancement.
>> I understand, but my point is we should try to minimize unnecessary side
>> impact to the current usage for whatever "hiding" effort we can make. It's
>> hard to find a tradeoff sometimes.
> Yes if some userspace made an assumption and it worked, we should keep
> it working I think. I don't necessarily agree we should worry too much
> about theoretical issues. In half a year since the feature got merged
> it's unlikely there are millions of slightly different scripts using it.
>
>>>
>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>>
>>>
>> Thank you.
>>
>> -Siwei


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
  2019-03-05  0:50 [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces Si-Wei Liu
  2019-03-05  2:33 ` Michael S. Tsirkin
  2019-03-05  2:33 ` Michael S. Tsirkin
@ 2019-03-06 12:04 ` Jiri Pirko
       [not found]   ` <7d1e79f6-01ff-413d-dac0-ee34258aafec@oracle.com>
  2019-03-06 12:04 ` Jiri Pirko
  3 siblings, 1 reply; 28+ messages in thread
From: Jiri Pirko @ 2019-03-06 12:04 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Michael S. Tsirkin, Sridhar Samudrala, Stephen Hemminger,
	Jakub Kicinski, David Miller, Netdev, virtualization, liran.alon,
	boris.ostrovsky, vijay.balakrishna

Tue, Mar 05, 2019 at 01:50:59AM CET, si-wei.liu@oracle.com wrote:
>When a netdev appears through hot plug then gets enslaved by a failover
>master that is already up and running, the slave will be opened
>right away after getting enslaved. Today there's a race that userspace
>(udev) may fail to rename the slave if the kernel (net_failover)
>opens the slave earlier than when the userspace rename happens.
>Unlike bond or team, the primary slave of failover can't be renamed by
>userspace ahead of time, since the kernel initiated auto-enslavement is
>unable to, or rather, is never meant to be synchronized with the rename
>request from userspace.
>
>As the failover slave interfaces are not designed to be operated
>directly by userspace apps: IP configuration, filter rules with
>regard to network traffic passing and etc., should all be done on master
>interface. In general, userspace apps only care about the
>name of master interface, while slave names are less important as long
>as admin users can see reliable names that may carry
>other information describing the netdev. For e.g., they can infer that
>"ens3nsby" is a standby slave of "ens3", while for a
>name like "eth0" they can't tell which master it belongs to.
>
>Historically the name of IFF_UP interface can't be changed because
>there might be admin script or management software that is already
>relying on such behavior and assumes that the slave name can't be
>changed once UP. But failover is special: with the in-kernel
>auto-enslavement mechanism, the userspace expectation for device
>enumeration and bring-up order is already broken. Previously initramfs
>and various userspace config tools were modified to bypass failover
>slaves because of auto-enslavement and duplicate MAC address. Similarly,
>in case that users care about seeing reliable slave name, the new type
>of failover slaves needs to be taken care of specifically in userspace
>anyway.
>
>For that to work, now introduce a module-level tunable,
>"slave_rename_ok" that allows users to lift up the rename restriction on
>failover slave which is already UP. Although it's possible this change
>potentially break userspace component (most likely configuration scripts
>or management software) that assumes slave name can't be changed while
>UP, it's relatively a limited and controllable set among all userspace
>components, which can be fixed specifically to work with the new naming
>behavior of the failover slave. Userspace component interacting with
>slaves should be changed to operate on failover master instead, as the
>failover slave is dynamic in nature which may come and go at any point.
>The goal is to make the role of failover slaves less relevant, and
>all userspace should only deal with master in the long run. The default
>for the "slave_rename_ok" is set to true(1). If userspace doesn't have
>the right support in place meanwhile users don't care about reliable
>userspace naming, the value can be set to false(0).
>
>Signed-off-by: Si-Wei.Liu@oracle.com
>Reviewed-by: Liran Alon <liran.alon@oracle.com>
>---
> include/linux/netdevice.h |  3 +++
> net/core/dev.c            |  3 ++-
> net/core/failover.c       | 11 +++++++++--
> 3 files changed, 14 insertions(+), 3 deletions(-)
>
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 857f8ab..6d9e4e0 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -1487,6 +1487,7 @@ struct net_device_ops {
>  * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>  * @IFF_FAILOVER: device is a failover master device
>  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>+ * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
>  */
> enum netdev_priv_flags {
> 	IFF_802_1Q_VLAN			= 1<<0,
>@@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
> 	IFF_NO_RX_HANDLER		= 1<<26,
> 	IFF_FAILOVER			= 1<<27,
> 	IFF_FAILOVER_SLAVE		= 1<<28,
>+	IFF_SLAVE_RENAME_OK		= 1<<29,
> };
> 
> #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
>@@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
> #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
> #define IFF_FAILOVER			IFF_FAILOVER
> #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
>+#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
> 
> /**
>  *	struct net_device - The DEVICE structure.
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 722d50d..ae070de 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
> 	BUG_ON(!dev_net(dev));
> 
> 	net = dev_net(dev);
>-	if (dev->flags & IFF_UP)
>+	if (dev->flags & IFF_UP &&
>+	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
> 		return -EBUSY;
> 
> 	write_seqcount_begin(&devnet_rename_seq);
>diff --git a/net/core/failover.c b/net/core/failover.c
>index 4a92a98..1fd8bbb 100644
>--- a/net/core/failover.c
>+++ b/net/core/failover.c
>@@ -16,6 +16,11 @@
> 
> static LIST_HEAD(failover_list);
> static DEFINE_SPINLOCK(failover_lock);
>+static bool slave_rename_ok = true;
>+
>+module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
>+MODULE_PARM_DESC(slave_rename_ok,
>+		 "If set allow renaming the slave when failover master is up");

No module parameters please. If you need to set something do it using
rtnl_link_ops. Thanks.


> 
> static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
> {
>@@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
> 	}
> 
> 	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
>+	if (slave_rename_ok)
>+		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
> 
> 	if (fops && fops->slave_register &&
> 	    !fops->slave_register(slave_dev, failover_dev))
> 		return NOTIFY_OK;
> 
> 	netdev_upper_dev_unlink(slave_dev, failover_dev);
>-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
> err_upper_link:
> 	netdev_rx_handler_unregister(slave_dev);
> done:
>@@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
> 
> 	netdev_rx_handler_unregister(slave_dev);
> 	netdev_upper_dev_unlink(slave_dev, failover_dev);
>-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
>+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
> 
> 	if (fops && fops->slave_unregister &&
> 	    !fops->slave_unregister(slave_dev, failover_dev))
>-- 
>1.8.3.1
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
       [not found]   ` <7d1e79f6-01ff-413d-dac0-ee34258aafec@oracle.com>
@ 2019-03-06 21:36     ` Samudrala, Sridhar
       [not found]       ` <7bc9dc90-6597-4223-c192-55a314ff079f@oracle.com>
  0 siblings, 1 reply; 28+ messages in thread
From: Samudrala, Sridhar @ 2019-03-06 21:36 UTC (permalink / raw)
  To: si-wei liu, Jiri Pirko
  Cc: Michael S. Tsirkin, Jakub Kicinski, Netdev, virtualization,
	liran.alon, boris.ostrovsky, David Miller


On 3/6/2019 1:26 PM, si-wei liu wrote:
>
>
>
> On 3/6/2019 4:04 AM, Jiri Pirko wrote:
>>> --- a/net/core/failover.c
>>> +++ b/net/core/failover.c
>>> @@ -16,6 +16,11 @@
>>>
>>> static LIST_HEAD(failover_list);
>>> static DEFINE_SPINLOCK(failover_lock);
>>> +static bool slave_rename_ok = true;
>>> +
>>> +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
>>> +MODULE_PARM_DESC(slave_rename_ok,
>>> +		 "If set allow renaming the slave when failover master is up");
>> No module parameters please. If you need to set something do it using
>> rtnl_link_ops. Thanks.
>>
> I understand what you ask for, but without module parameters userspace 
> don't work. During boot (dracut) the virtio netdev gets enslaved 
> earlier than when userspace comes up, so failover has to determine the 
> setting during initialization/creation. This config is not dynamic, at 
> least for the life cycle of a particular failover link it shouldn't be 
> changed. Without module parameter, how does the userspace specify this 
> value during kernel initialization?
>
Can we enable this by default and not make it configurable via module 
parameter?
Is there any use case where someone expects rename to fail with failover
slaves?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
       [not found]       ` <7bc9dc90-6597-4223-c192-55a314ff079f@oracle.com>
@ 2019-03-06 23:36         ` Liran Alon
  2019-03-06 23:36         ` Liran Alon
  1 sibling, 0 replies; 28+ messages in thread
From: Liran Alon @ 2019-03-06 23:36 UTC (permalink / raw)
  To: si-wei liu
  Cc: Samudrala, Sridhar, Jiri Pirko, Michael S. Tsirkin,
	Stephen Hemminger, Jakub Kicinski, David Miller, Netdev,
	virtualization, boris.ostrovsky, vijay.balakrishna



> On 6 Mar 2019, at 23:42, si-wei liu <si-wei.liu@oracle.com> wrote:
> 
> 
> 
> On 3/6/2019 1:36 PM, Samudrala, Sridhar wrote:
>> 
>> On 3/6/2019 1:26 PM, si-wei liu wrote:
>>> 
>>> 
>>> On 3/6/2019 4:04 AM, Jiri Pirko wrote:
>>>>> --- a/net/core/failover.c
>>>>> +++ b/net/core/failover.c
>>>>> @@ -16,6 +16,11 @@
>>>>> 
>>>>> static LIST_HEAD(failover_list);
>>>>> static DEFINE_SPINLOCK(failover_lock);
>>>>> +static bool slave_rename_ok = true;
>>>>> +
>>>>> +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
>>>>> +MODULE_PARM_DESC(slave_rename_ok,
>>>>> +		 "If set allow renaming the slave when failover master is up");
>>>>> 
>>>> No module parameters please. If you need to set something do it using
>>>> rtnl_link_ops. Thanks.
>>>> 
>>>> 
>>> I understand what you ask for, but without module parameters userspace don't work. During boot (dracut) the virtio netdev gets enslaved earlier than when userspace comes up, so failover has to determine the setting during initialization/creation. This config is not dynamic, at least for the life cycle of a particular failover link it shouldn't be changed. Without module parameter, how does the userspace specify this value during kernel initialization? 
>>> 
>> Can we enable this by default and not make it configurable via module parameter?
>> Is there any  usecase where someone expects rename to fail with failover slaves?
> Probably just cater for those application that assumes fixed name on UP interface?
> 
> It's already the default for the configurable. I myself don't think that's a big problem for failover users. So far there's not even QEMU support I think everything can be changed. I don't feel strong to just fix it without introducing configurable. But maybe Michael or others think it differently...
> 
> If no one objects, I don't feel strong to make it fixed behavior.
> 
> -Siwei
> 

I agree we should just remove the module parameter.

-Liran



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
@ 2019-03-05  0:36 Si-Wei Liu
  0 siblings, 0 replies; 28+ messages in thread
From: Si-Wei Liu @ 2019-03-05  0:36 UTC (permalink / raw)
  To: Michael S. Tsirkin, Sridhar Samudrala, Stephen Hemminger,
	Jakub Kicinski, Jiri Pirko, David Miller, Netdev, virtualization

When a netdev appears through hot plug and is then enslaved by a failover
master that is already up and running, the slave is opened right away
after being enslaved. Today there is a race: userspace (udev) may fail
to rename the slave if the kernel (net_failover) opens the slave before
the userspace rename happens. Unlike bond or team, the primary slave of
failover can't be renamed by userspace ahead of time, since the
kernel-initiated auto-enslavement is unable to, or rather, is never
meant to be synchronized with the rename request from userspace.

Failover slave interfaces are not designed to be operated directly by
userspace apps: IP configuration, filter rules for network traffic,
etc. should all be done on the master interface. In general, userspace
apps only care about the name of the master interface, while slave names
are less important as long as admin users can see reliable names that
may carry other information describing the netdev. For example, they can
infer that "ens3nsby" is a standby slave of "ens3", while for a name
like "eth0" they can't tell which master it belongs to.

Historically, the name of an IFF_UP interface can't be changed because
there might be admin scripts or management software already relying on
that behavior and assuming that the slave name can't be changed once
UP. But failover is special: with the in-kernel auto-enslavement
mechanism, the userspace expectation for device enumeration and
bring-up order is already broken. Previously, initramfs and various
userspace config tools were modified to bypass failover slaves because
of auto-enslavement and duplicate MAC addresses. Similarly, if users
care about seeing reliable slave names, the new type of failover slave
needs to be handled specifically in userspace anyway.

For that to work, introduce a module-level tunable, "slave_rename_ok",
that allows users to lift the rename restriction on a failover slave
that is already UP. Although this change could potentially break
userspace components (most likely configuration scripts or management
software) that assume the slave name can't be changed while UP, that is
a relatively limited and controllable set among all userspace
components, which can be fixed specifically to work with the new naming
behavior of the failover slave. Userspace components interacting with
slaves should be changed to operate on the failover master instead, as
the failover slave is dynamic in nature and may come and go at any
point. The goal is to make the role of failover slaves less relevant,
so that all userspace deals only with the master in the long run. The
default for "slave_rename_ok" is true (1). If userspace doesn't have the
right support in place and users don't care about reliable userspace
naming, the value can be set to false (0).

Signed-off-by: Si-Wei.Liu@oracle.com
Reviewed-by: Liran Alon <liran.alon@oracle.com>
---
 include/linux/netdevice.h |  3 +++
 net/core/dev.c            |  3 ++-
 net/core/failover.c       | 11 +++++++++--
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 857f8ab..6d9e4e0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1487,6 +1487,7 @@ struct net_device_ops {
  * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
  * @IFF_FAILOVER: device is a failover master device
  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
+ * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
@@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
 	IFF_NO_RX_HANDLER		= 1<<26,
 	IFF_FAILOVER			= 1<<27,
 	IFF_FAILOVER_SLAVE		= 1<<28,
+	IFF_SLAVE_RENAME_OK		= 1<<29,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
 #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
 #define IFF_FAILOVER			IFF_FAILOVER
 #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
+#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
 
 /**
  *	struct net_device - The DEVICE structure.
diff --git a/net/core/dev.c b/net/core/dev.c
index 722d50d..ae070de 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
 	BUG_ON(!dev_net(dev));
 
 	net = dev_net(dev);
-	if (dev->flags & IFF_UP)
+	if (dev->flags & IFF_UP &&
+	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
 		return -EBUSY;
 
 	write_seqcount_begin(&devnet_rename_seq);
diff --git a/net/core/failover.c b/net/core/failover.c
index 4a92a98..1fd8bbb 100644
--- a/net/core/failover.c
+++ b/net/core/failover.c
@@ -16,6 +16,11 @@
 
 static LIST_HEAD(failover_list);
 static DEFINE_SPINLOCK(failover_lock);
+static bool slave_rename_ok = true;
+
+module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
+MODULE_PARM_DESC(slave_rename_ok,
+		 "If set allow renaming the slave when failover master is up");
 
 static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
 {
@@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
 	}
 
 	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
+	if (slave_rename_ok)
+		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
 
 	if (fops && fops->slave_register &&
 	    !fops->slave_register(slave_dev, failover_dev))
 		return NOTIFY_OK;
 
 	netdev_upper_dev_unlink(slave_dev, failover_dev);
-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
 err_upper_link:
 	netdev_rx_handler_unregister(slave_dev);
 done:
@@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
 
 	netdev_rx_handler_unregister(slave_dev);
 	netdev_upper_dev_unlink(slave_dev, failover_dev);
-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
 
 	if (fops && fops->slave_unregister &&
 	    !fops->slave_unregister(slave_dev, failover_dev))
-- 
1.8.3.1
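
To make the effect of the patch above easier to follow, here is a minimal
standalone sketch of the rename gate and the flag lifecycle. It is a
userspace model, not kernel code: struct sketch_dev and every sketch_*
name are hypothetical stand-ins introduced only for illustration. The
bit positions for the two private flags are the ones the patch assigns,
and IFF_UP is redefined locally with its value from <linux/if.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* IFF_UP as in <linux/if.h>, redefined so the sketch builds standalone. */
#define SKETCH_IFF_UP              (1u << 0)
/* Private flag bits, numbered as in the patch. */
#define SKETCH_IFF_FAILOVER_SLAVE  (1u << 28)
#define SKETCH_IFF_SLAVE_RENAME_OK (1u << 29)

/* Hypothetical stand-in for the two net_device fields the check reads. */
struct sketch_dev {
	unsigned int flags;
	unsigned int priv_flags;
};

/* Models the patched test in dev_change_name(): 0 if the rename may
 * proceed, -EBUSY if the device is UP and not exempted. */
static int sketch_rename_gate(const struct sketch_dev *dev)
{
	if ((dev->flags & SKETCH_IFF_UP) &&
	    !(dev->priv_flags & SKETCH_IFF_SLAVE_RENAME_OK))
		return -EBUSY;
	return 0;
}

/* Models failover_slave_register(): tag the slave and, if the tunable
 * is set, exempt it from the rename restriction. */
static void sketch_enslave(struct sketch_dev *dev, bool slave_rename_ok)
{
	dev->priv_flags |= SKETCH_IFF_FAILOVER_SLAVE;
	if (slave_rename_ok)
		dev->priv_flags |= SKETCH_IFF_SLAVE_RENAME_OK;
}

/* Models failover_slave_unregister(): both bits are cleared together. */
static void sketch_release(struct sketch_dev *dev)
{
	dev->priv_flags &= ~(SKETCH_IFF_FAILOVER_SLAVE |
			     SKETCH_IFF_SLAVE_RENAME_OK);
}

/* Walks one device through enslave, rename check, and release. */
static void sketch_walkthrough(void)
{
	struct sketch_dev dev = { .flags = SKETCH_IFF_UP, .priv_flags = 0 };

	assert(sketch_rename_gate(&dev) == -EBUSY); /* plain UP device */
	sketch_enslave(&dev, true);
	assert(sketch_rename_gate(&dev) == 0);      /* UP failover slave */
	sketch_release(&dev);
	assert(sketch_rename_gate(&dev) == -EBUSY); /* exemption is gone */
}
```

In this model the exemption is decided once, at enslave time, which
matches the patch reading slave_rename_ok only in
failover_slave_register(). Changing the tunable afterwards would
therefore not affect slaves that are already registered.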



end of thread, other threads:[~2019-03-06 23:36 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
2019-03-05  0:50 [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces Si-Wei Liu
2019-03-05  2:33 ` Michael S. Tsirkin
2019-03-05  2:33 ` Michael S. Tsirkin
2019-03-05 19:19   ` si-wei liu
2019-03-05 19:24     ` Stephen Hemminger
2019-03-05 19:24     ` Stephen Hemminger
2019-03-05 19:35       ` si-wei liu
2019-03-06  0:06         ` Michael S. Tsirkin
2019-03-06  0:06           ` Michael S. Tsirkin
2019-03-06  0:20           ` si-wei liu
2019-03-06  0:36             ` Michael S. Tsirkin
2019-03-06  0:51               ` si-wei liu
2019-03-06  6:43                 ` Michael S. Tsirkin
2019-03-06  6:43                   ` Michael S. Tsirkin
2019-03-06  7:15                   ` si-wei liu
2019-03-06  7:23                     ` Michael S. Tsirkin
2019-03-06  7:23                     ` Michael S. Tsirkin
2019-03-06  8:20                       ` si-wei liu
2019-03-06  0:36             ` Michael S. Tsirkin
2019-03-05 20:28     ` Michael S. Tsirkin
2019-03-05 20:28     ` Michael S. Tsirkin
2019-03-05 22:49       ` si-wei liu
2019-03-06 12:04 ` Jiri Pirko
     [not found]   ` <7d1e79f6-01ff-413d-dac0-ee34258aafec@oracle.com>
2019-03-06 21:36     ` Samudrala, Sridhar
     [not found]       ` <7bc9dc90-6597-4223-c192-55a314ff079f@oracle.com>
2019-03-06 23:36         ` Liran Alon
2019-03-06 23:36         ` Liran Alon
2019-03-06 12:04 ` Jiri Pirko
  -- strict thread matches above, loose matches on Subject: below --
2019-03-05  0:36 Si-Wei Liu
