* [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
@ 2019-03-26 23:48 Si-Wei Liu
2019-03-27 2:13 ` Stephen Hemminger
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: Si-Wei Liu @ 2019-03-26 23:48 UTC (permalink / raw)
To: mst, sridhar.samudrala, stephen, davem, kubakici,
alexander.duyck, jiri, netdev, virtualization
Cc: liran.alon, boris.ostrovsky, vijay.balakrishna, si-wei liu
When a netdev appears through hot plug then gets enslaved by a failover
master that is already up and running, the slave will be opened
right away after getting enslaved. Today there's a race that userspace
(udev) may fail to rename the slave if the kernel (net_failover)
opens the slave earlier than when the userspace rename happens.
Unlike bond or team, the primary slave of failover can't be renamed by
userspace ahead of time, since the kernel initiated auto-enslavement is
unable to, or rather, is never meant to be synchronized with the rename
request from userspace.
As the failover slave interfaces are not designed to be operated
directly by userspace apps: IP configuration, filter rules with
regard to network traffic passing and etc., should all be done on master
interface. In general, userspace apps only care about the
name of master interface, while slave names are less important as long
as admin users can see reliable names that may carry
other information describing the netdev. For e.g., they can infer that
"ens3nsby" is a standby slave of "ens3", while for a
name like "eth0" they can't tell which master it belongs to.
Historically the name of IFF_UP interface can't be changed because
there might be admin script or management software that is already
relying on such behavior and assumes that the slave name can't be
changed once UP. But failover is special: with the in-kernel
auto-enslavement mechanism, the userspace expectation for device
enumeration and bring-up order is already broken. Previously initramfs
and various userspace config tools were modified to bypass failover
slaves because of auto-enslavement and duplicate MAC address. Similarly,
in case that users care about seeing reliable slave name, the new type
of failover slaves needs to be taken care of specifically in userspace
anyway.
It's less risky to lift up the rename restriction on failover slave
which is already UP. Although it's possible this change may potentially
break userspace component (most likely configuration scripts or
management software) that assumes slave name can't be changed while
UP, it's relatively a limited and controllable set among all userspace
components, which can be fixed specifically to listen for the rename
and/or link down/up events on failover slaves. Userspace component
interacting with slaves is expected to be changed to operate on failover
master interface instead, as the failover slave is dynamic in nature
which may come and go at any point. The goal is to make the role of
failover slaves less relevant, and userspace components should only
deal with failover master in the long run.
Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
--
v1 -> v2:
- Drop configurable module parameter (Sridhar)
v2 -> v3:
- Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar)
- Send down and up events around rename (Michael S. Tsirkin)
---
net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++---
1 file changed, 34 insertions(+), 3 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 722d50d..3e0cd80 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev,
int dev_change_name(struct net_device *dev, const char *newname)
{
unsigned char old_assign_type;
+ bool reopen_needed = false;
char oldname[IFNAMSIZ];
int err = 0;
int ret;
@@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname)
BUG_ON(!dev_net(dev));
net = dev_net(dev);
- if (dev->flags & IFF_UP)
- return -EBUSY;
+
+ /* Allow failover slave to rename even when
+ * it is up and running.
+ *
+ * Failover slaves are special, since userspace
+ * might rename the slave after the interface
+ * has been brought up and running due to
+ * auto-enslavement.
+ *
+ * Failover users don't actually care about slave
+ * name change, as they are only expected to operate
+ * on master interface directly.
+ */
+ if (dev->flags & IFF_UP) {
+ if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
+ return -EBUSY;
+ reopen_needed = true;
+ }
write_seqcount_begin(&devnet_rename_seq);
@@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
return err;
}
+ if (reopen_needed)
+ dev_close(dev);
+
if (oldname[0] && !strchr(oldname, '%'))
netdev_info(dev, "renamed from %s\n", oldname);
@@ -1210,7 +1230,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
memcpy(dev->name, oldname, IFNAMSIZ);
dev->name_assign_type = old_assign_type;
write_seqcount_end(&devnet_rename_seq);
- return ret;
+ if (err >= 0)
+ err = ret;
+ goto reopen;
}
write_seqcount_end(&devnet_rename_seq);
@@ -1246,6 +1268,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
}
}
+reopen:
+ if (reopen_needed) {
+ ret = dev_open(dev);
+ if (ret) {
+ pr_err("%s: reopen device failed: %d\n",
+ dev->name, ret);
+ }
+ }
+
return err;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-26 23:48 [PATCH net v3] failover: allow name change on IFF_UP slave interfaces Si-Wei Liu
2019-03-27 2:13 ` Stephen Hemminger
@ 2019-03-27 2:13 ` Stephen Hemminger
2019-03-27 13:25 ` Michael S. Tsirkin
2019-03-27 13:25 ` Michael S. Tsirkin
2019-03-27 11:11 ` Jiri Pirko
` (3 subsequent siblings)
5 siblings, 2 replies; 18+ messages in thread
From: Stephen Hemminger @ 2019-03-27 2:13 UTC (permalink / raw)
To: Si-Wei Liu
Cc: mst, sridhar.samudrala, davem, kubakici, alexander.duyck, jiri,
netdev, virtualization, liran.alon, boris.ostrovsky,
vijay.balakrishna
On Tue, 26 Mar 2019 19:48:13 -0400
Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> When a netdev appears through hot plug then gets enslaved by a failover
> master that is already up and running, the slave will be opened
> right away after getting enslaved. Today there's a race that userspace
> (udev) may fail to rename the slave if the kernel (net_failover)
> opens the slave earlier than when the userspace rename happens.
> Unlike bond or team, the primary slave of failover can't be renamed by
> userspace ahead of time, since the kernel initiated auto-enslavement is
> unable to, or rather, is never meant to be synchronized with the rename
> request from userspace.
>
> As the failover slave interfaces are not designed to be operated
> directly by userspace apps: IP configuration, filter rules with
> regard to network traffic passing and etc., should all be done on master
> interface. In general, userspace apps only care about the
> name of master interface, while slave names are less important as long
> as admin users can see reliable names that may carry
> other information describing the netdev. For e.g., they can infer that
> "ens3nsby" is a standby slave of "ens3", while for a
> name like "eth0" they can't tell which master it belongs to.
>
> Historically the name of IFF_UP interface can't be changed because
> there might be admin script or management software that is already
> relying on such behavior and assumes that the slave name can't be
> changed once UP. But failover is special: with the in-kernel
> auto-enslavement mechanism, the userspace expectation for device
> enumeration and bring-up order is already broken. Previously initramfs
> and various userspace config tools were modified to bypass failover
> slaves because of auto-enslavement and duplicate MAC address. Similarly,
> in case that users care about seeing reliable slave name, the new type
> of failover slaves needs to be taken care of specifically in userspace
> anyway.
>
> It's less risky to lift up the rename restriction on failover slave
> which is already UP. Although it's possible this change may potentially
> break userspace component (most likely configuration scripts or
> management software) that assumes slave name can't be changed while
> UP, it's relatively a limited and controllable set among all userspace
> components, which can be fixed specifically to listen for the rename
> and/or link down/up events on failover slaves. Userspace component
> interacting with slaves is expected to be changed to operate on failover
> master interface instead, as the failover slave is dynamic in nature
> which may come and go at any point. The goal is to make the role of
> failover slaves less relevant, and userspace components should only
> deal with failover master in the long run.
>
> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
Why do you need to do dev_close/dev_open which will bounce
the link?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-26 23:48 [PATCH net v3] failover: allow name change on IFF_UP slave interfaces Si-Wei Liu
@ 2019-03-27 2:13 ` Stephen Hemminger
2019-03-27 2:13 ` Stephen Hemminger
` (4 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2019-03-27 2:13 UTC (permalink / raw)
To: Si-Wei Liu
Cc: jiri, mst, kubakici, sridhar.samudrala, alexander.duyck,
virtualization, liran.alon, netdev, boris.ostrovsky, davem
On Tue, 26 Mar 2019 19:48:13 -0400
Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> When a netdev appears through hot plug then gets enslaved by a failover
> master that is already up and running, the slave will be opened
> right away after getting enslaved. Today there's a race that userspace
> (udev) may fail to rename the slave if the kernel (net_failover)
> opens the slave earlier than when the userspace rename happens.
> Unlike bond or team, the primary slave of failover can't be renamed by
> userspace ahead of time, since the kernel initiated auto-enslavement is
> unable to, or rather, is never meant to be synchronized with the rename
> request from userspace.
>
> As the failover slave interfaces are not designed to be operated
> directly by userspace apps: IP configuration, filter rules with
> regard to network traffic passing and etc., should all be done on master
> interface. In general, userspace apps only care about the
> name of master interface, while slave names are less important as long
> as admin users can see reliable names that may carry
> other information describing the netdev. For e.g., they can infer that
> "ens3nsby" is a standby slave of "ens3", while for a
> name like "eth0" they can't tell which master it belongs to.
>
> Historically the name of IFF_UP interface can't be changed because
> there might be admin script or management software that is already
> relying on such behavior and assumes that the slave name can't be
> changed once UP. But failover is special: with the in-kernel
> auto-enslavement mechanism, the userspace expectation for device
> enumeration and bring-up order is already broken. Previously initramfs
> and various userspace config tools were modified to bypass failover
> slaves because of auto-enslavement and duplicate MAC address. Similarly,
> in case that users care about seeing reliable slave name, the new type
> of failover slaves needs to be taken care of specifically in userspace
> anyway.
>
> It's less risky to lift up the rename restriction on failover slave
> which is already UP. Although it's possible this change may potentially
> break userspace component (most likely configuration scripts or
> management software) that assumes slave name can't be changed while
> UP, it's relatively a limited and controllable set among all userspace
> components, which can be fixed specifically to listen for the rename
> and/or link down/up events on failover slaves. Userspace component
> interacting with slaves is expected to be changed to operate on failover
> master interface instead, as the failover slave is dynamic in nature
> which may come and go at any point. The goal is to make the role of
> failover slaves less relevant, and userspace components should only
> deal with failover master in the long run.
>
> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
Why do you need to do dev_close/dev_open which will bounce
the link?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-26 23:48 [PATCH net v3] failover: allow name change on IFF_UP slave interfaces Si-Wei Liu
2019-03-27 2:13 ` Stephen Hemminger
2019-03-27 2:13 ` Stephen Hemminger
@ 2019-03-27 11:11 ` Jiri Pirko
2019-03-27 23:44 ` si-wei liu
2019-03-27 11:11 ` Jiri Pirko
` (2 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Jiri Pirko @ 2019-03-27 11:11 UTC (permalink / raw)
To: Si-Wei Liu
Cc: mst, sridhar.samudrala, stephen, davem, kubakici,
alexander.duyck, netdev, virtualization, liran.alon,
boris.ostrovsky, vijay.balakrishna
Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote:
>When a netdev appears through hot plug then gets enslaved by a failover
>master that is already up and running, the slave will be opened
>right away after getting enslaved. Today there's a race that userspace
>(udev) may fail to rename the slave if the kernel (net_failover)
>opens the slave earlier than when the userspace rename happens.
>Unlike bond or team, the primary slave of failover can't be renamed by
>userspace ahead of time, since the kernel initiated auto-enslavement is
>unable to, or rather, is never meant to be synchronized with the rename
>request from userspace.
>
>As the failover slave interfaces are not designed to be operated
>directly by userspace apps: IP configuration, filter rules with
>regard to network traffic passing and etc., should all be done on master
>interface. In general, userspace apps only care about the
>name of master interface, while slave names are less important as long
>as admin users can see reliable names that may carry
>other information describing the netdev. For e.g., they can infer that
>"ens3nsby" is a standby slave of "ens3", while for a
>name like "eth0" they can't tell which master it belongs to.
>
>Historically the name of IFF_UP interface can't be changed because
>there might be admin script or management software that is already
>relying on such behavior and assumes that the slave name can't be
>changed once UP. But failover is special: with the in-kernel
>auto-enslavement mechanism, the userspace expectation for device
>enumeration and bring-up order is already broken. Previously initramfs
>and various userspace config tools were modified to bypass failover
>slaves because of auto-enslavement and duplicate MAC address. Similarly,
>in case that users care about seeing reliable slave name, the new type
>of failover slaves needs to be taken care of specifically in userspace
>anyway.
>
>It's less risky to lift up the rename restriction on failover slave
>which is already UP. Although it's possible this change may potentially
>break userspace component (most likely configuration scripts or
>management software) that assumes slave name can't be changed while
>UP, it's relatively a limited and controllable set among all userspace
>components, which can be fixed specifically to listen for the rename
>and/or link down/up events on failover slaves. Userspace component
>interacting with slaves is expected to be changed to operate on failover
>master interface instead, as the failover slave is dynamic in nature
>which may come and go at any point. The goal is to make the role of
>failover slaves less relevant, and userspace components should only
>deal with failover master in the long run.
>
>Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
>Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>Reviewed-by: Liran Alon <liran.alon@oracle.com>
>
>--
>v1 -> v2:
>- Drop configurable module parameter (Sridhar)
>
>v2 -> v3:
>- Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar)
>- Send down and up events around rename (Michael S. Tsirkin)
>---
> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++---
> 1 file changed, 34 insertions(+), 3 deletions(-)
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 722d50d..3e0cd80 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev,
> int dev_change_name(struct net_device *dev, const char *newname)
> {
> unsigned char old_assign_type;
>+ bool reopen_needed = false;
> char oldname[IFNAMSIZ];
> int err = 0;
> int ret;
>@@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname)
> BUG_ON(!dev_net(dev));
>
> net = dev_net(dev);
>- if (dev->flags & IFF_UP)
>- return -EBUSY;
>+
>+ /* Allow failover slave to rename even when
>+ * it is up and running.
>+ *
>+ * Failover slaves are special, since userspace
>+ * might rename the slave after the interface
>+ * has been brought up and running due to
>+ * auto-enslavement.
>+ *
>+ * Failover users don't actually care about slave
>+ * name change, as they are only expected to operate
>+ * on master interface directly.
>+ */
>+ if (dev->flags & IFF_UP) {
>+ if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
>+ return -EBUSY;
>+ reopen_needed = true;
>+ }
>
> write_seqcount_begin(&devnet_rename_seq);
>
>@@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
> return err;
> }
>
>+ if (reopen_needed)
>+ dev_close(dev);
Ugh. Don't dev_close/dev_open on name change.
>+
> if (oldname[0] && !strchr(oldname, '%'))
> netdev_info(dev, "renamed from %s\n", oldname);
>
>@@ -1210,7 +1230,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
> memcpy(dev->name, oldname, IFNAMSIZ);
> dev->name_assign_type = old_assign_type;
> write_seqcount_end(&devnet_rename_seq);
>- return ret;
>+ if (err >= 0)
>+ err = ret;
>+ goto reopen;
> }
>
> write_seqcount_end(&devnet_rename_seq);
>@@ -1246,6 +1268,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
> }
> }
>
>+reopen:
>+ if (reopen_needed) {
>+ ret = dev_open(dev);
>+ if (ret) {
>+ pr_err("%s: reopen device failed: %d\n",
>+ dev->name, ret);
>+ }
>+ }
>+
> return err;
> }
>
>--
>1.8.3.1
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-26 23:48 [PATCH net v3] failover: allow name change on IFF_UP slave interfaces Si-Wei Liu
` (2 preceding siblings ...)
2019-03-27 11:11 ` Jiri Pirko
@ 2019-03-27 11:11 ` Jiri Pirko
2019-03-28 19:49 ` kbuild test robot
2019-03-28 19:49 ` kbuild test robot
5 siblings, 0 replies; 18+ messages in thread
From: Jiri Pirko @ 2019-03-27 11:11 UTC (permalink / raw)
To: Si-Wei Liu
Cc: mst, kubakici, sridhar.samudrala, alexander.duyck,
virtualization, liran.alon, netdev, boris.ostrovsky, davem
Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote:
>When a netdev appears through hot plug then gets enslaved by a failover
>master that is already up and running, the slave will be opened
>right away after getting enslaved. Today there's a race that userspace
>(udev) may fail to rename the slave if the kernel (net_failover)
>opens the slave earlier than when the userspace rename happens.
>Unlike bond or team, the primary slave of failover can't be renamed by
>userspace ahead of time, since the kernel initiated auto-enslavement is
>unable to, or rather, is never meant to be synchronized with the rename
>request from userspace.
>
>As the failover slave interfaces are not designed to be operated
>directly by userspace apps: IP configuration, filter rules with
>regard to network traffic passing and etc., should all be done on master
>interface. In general, userspace apps only care about the
>name of master interface, while slave names are less important as long
>as admin users can see reliable names that may carry
>other information describing the netdev. For e.g., they can infer that
>"ens3nsby" is a standby slave of "ens3", while for a
>name like "eth0" they can't tell which master it belongs to.
>
>Historically the name of IFF_UP interface can't be changed because
>there might be admin script or management software that is already
>relying on such behavior and assumes that the slave name can't be
>changed once UP. But failover is special: with the in-kernel
>auto-enslavement mechanism, the userspace expectation for device
>enumeration and bring-up order is already broken. Previously initramfs
>and various userspace config tools were modified to bypass failover
>slaves because of auto-enslavement and duplicate MAC address. Similarly,
>in case that users care about seeing reliable slave name, the new type
>of failover slaves needs to be taken care of specifically in userspace
>anyway.
>
>It's less risky to lift up the rename restriction on failover slave
>which is already UP. Although it's possible this change may potentially
>break userspace component (most likely configuration scripts or
>management software) that assumes slave name can't be changed while
>UP, it's relatively a limited and controllable set among all userspace
>components, which can be fixed specifically to listen for the rename
>and/or link down/up events on failover slaves. Userspace component
>interacting with slaves is expected to be changed to operate on failover
>master interface instead, as the failover slave is dynamic in nature
>which may come and go at any point. The goal is to make the role of
>failover slaves less relevant, and userspace components should only
>deal with failover master in the long run.
>
>Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
>Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>Reviewed-by: Liran Alon <liran.alon@oracle.com>
>
>--
>v1 -> v2:
>- Drop configurable module parameter (Sridhar)
>
>v2 -> v3:
>- Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar)
>- Send down and up events around rename (Michael S. Tsirkin)
>---
> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++---
> 1 file changed, 34 insertions(+), 3 deletions(-)
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 722d50d..3e0cd80 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev,
> int dev_change_name(struct net_device *dev, const char *newname)
> {
> unsigned char old_assign_type;
>+ bool reopen_needed = false;
> char oldname[IFNAMSIZ];
> int err = 0;
> int ret;
>@@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname)
> BUG_ON(!dev_net(dev));
>
> net = dev_net(dev);
>- if (dev->flags & IFF_UP)
>- return -EBUSY;
>+
>+ /* Allow failover slave to rename even when
>+ * it is up and running.
>+ *
>+ * Failover slaves are special, since userspace
>+ * might rename the slave after the interface
>+ * has been brought up and running due to
>+ * auto-enslavement.
>+ *
>+ * Failover users don't actually care about slave
>+ * name change, as they are only expected to operate
>+ * on master interface directly.
>+ */
>+ if (dev->flags & IFF_UP) {
>+ if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
>+ return -EBUSY;
>+ reopen_needed = true;
>+ }
>
> write_seqcount_begin(&devnet_rename_seq);
>
>@@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
> return err;
> }
>
>+ if (reopen_needed)
>+ dev_close(dev);
Ugh. Don't dev_close/dev_open on name change.
>+
> if (oldname[0] && !strchr(oldname, '%'))
> netdev_info(dev, "renamed from %s\n", oldname);
>
>@@ -1210,7 +1230,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
> memcpy(dev->name, oldname, IFNAMSIZ);
> dev->name_assign_type = old_assign_type;
> write_seqcount_end(&devnet_rename_seq);
>- return ret;
>+ if (err >= 0)
>+ err = ret;
>+ goto reopen;
> }
>
> write_seqcount_end(&devnet_rename_seq);
>@@ -1246,6 +1268,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
> }
> }
>
>+reopen:
>+ if (reopen_needed) {
>+ ret = dev_open(dev);
>+ if (ret) {
>+ pr_err("%s: reopen device failed: %d\n",
>+ dev->name, ret);
>+ }
>+ }
>+
> return err;
> }
>
>--
>1.8.3.1
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-27 2:13 ` Stephen Hemminger
@ 2019-03-27 13:25 ` Michael S. Tsirkin
2019-03-27 20:10 ` si-wei liu
2019-03-27 13:25 ` Michael S. Tsirkin
1 sibling, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2019-03-27 13:25 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Si-Wei Liu, sridhar.samudrala, davem, kubakici, alexander.duyck,
jiri, netdev, virtualization, liran.alon, boris.ostrovsky,
vijay.balakrishna
On Tue, Mar 26, 2019 at 07:13:42PM -0700, Stephen Hemminger wrote:
> On Tue, 26 Mar 2019 19:48:13 -0400
> Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> > When a netdev appears through hot plug then gets enslaved by a failover
> > master that is already up and running, the slave will be opened
> > right away after getting enslaved. Today there's a race that userspace
> > (udev) may fail to rename the slave if the kernel (net_failover)
> > opens the slave earlier than when the userspace rename happens.
> > Unlike bond or team, the primary slave of failover can't be renamed by
> > userspace ahead of time, since the kernel initiated auto-enslavement is
> > unable to, or rather, is never meant to be synchronized with the rename
> > request from userspace.
> >
> > As the failover slave interfaces are not designed to be operated
> > directly by userspace apps: IP configuration, filter rules with
> > regard to network traffic passing and etc., should all be done on master
> > interface. In general, userspace apps only care about the
> > name of master interface, while slave names are less important as long
> > as admin users can see reliable names that may carry
> > other information describing the netdev. For e.g., they can infer that
> > "ens3nsby" is a standby slave of "ens3", while for a
> > name like "eth0" they can't tell which master it belongs to.
> >
> > Historically the name of IFF_UP interface can't be changed because
> > there might be admin script or management software that is already
> > relying on such behavior and assumes that the slave name can't be
> > changed once UP. But failover is special: with the in-kernel
> > auto-enslavement mechanism, the userspace expectation for device
> > enumeration and bring-up order is already broken. Previously initramfs
> > and various userspace config tools were modified to bypass failover
> > slaves because of auto-enslavement and duplicate MAC address. Similarly,
> > in case that users care about seeing reliable slave name, the new type
> > of failover slaves needs to be taken care of specifically in userspace
> > anyway.
> >
> > It's less risky to lift up the rename restriction on failover slave
> > which is already UP. Although it's possible this change may potentially
> > break userspace component (most likely configuration scripts or
> > management software) that assumes slave name can't be changed while
> > UP, it's relatively a limited and controllable set among all userspace
> > components, which can be fixed specifically to listen for the rename
> > and/or link down/up events on failover slaves. Userspace component
> > interacting with slaves is expected to be changed to operate on failover
> > master interface instead, as the failover slave is dynamic in nature
> > which may come and go at any point. The goal is to make the role of
> > failover slaves less relevant, and userspace components should only
> > deal with failover master in the long run.
> >
> > Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
> > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > Reviewed-by: Liran Alon <liran.alon@oracle.com>
>
>
> Why do you need to do dev_close/dev_open which will bounce
> the link?
What we need is notify userspace that link went up/down.
close/open will do that but just sending notifications
would do that as well without playing with link states.
--
MST
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-27 2:13 ` Stephen Hemminger
2019-03-27 13:25 ` Michael S. Tsirkin
@ 2019-03-27 13:25 ` Michael S. Tsirkin
1 sibling, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2019-03-27 13:25 UTC (permalink / raw)
To: Stephen Hemminger
Cc: jiri, kubakici, sridhar.samudrala, alexander.duyck,
virtualization, liran.alon, netdev, Si-Wei Liu, boris.ostrovsky,
davem
On Tue, Mar 26, 2019 at 07:13:42PM -0700, Stephen Hemminger wrote:
> On Tue, 26 Mar 2019 19:48:13 -0400
> Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> > When a netdev appears through hot plug then gets enslaved by a failover
> > master that is already up and running, the slave will be opened
> > right away after getting enslaved. Today there's a race that userspace
> > (udev) may fail to rename the slave if the kernel (net_failover)
> > opens the slave earlier than when the userspace rename happens.
> > Unlike bond or team, the primary slave of failover can't be renamed by
> > userspace ahead of time, since the kernel initiated auto-enslavement is
> > unable to, or rather, is never meant to be synchronized with the rename
> > request from userspace.
> >
> > As the failover slave interfaces are not designed to be operated
> > directly by userspace apps: IP configuration, filter rules with
> > regard to network traffic passing and etc., should all be done on master
> > interface. In general, userspace apps only care about the
> > name of master interface, while slave names are less important as long
> > as admin users can see reliable names that may carry
> > other information describing the netdev. For e.g., they can infer that
> > "ens3nsby" is a standby slave of "ens3", while for a
> > name like "eth0" they can't tell which master it belongs to.
> >
> > Historically the name of IFF_UP interface can't be changed because
> > there might be admin script or management software that is already
> > relying on such behavior and assumes that the slave name can't be
> > changed once UP. But failover is special: with the in-kernel
> > auto-enslavement mechanism, the userspace expectation for device
> > enumeration and bring-up order is already broken. Previously initramfs
> > and various userspace config tools were modified to bypass failover
> > slaves because of auto-enslavement and duplicate MAC address. Similarly,
> > in case that users care about seeing reliable slave name, the new type
> > of failover slaves needs to be taken care of specifically in userspace
> > anyway.
> >
> > It's less risky to lift up the rename restriction on failover slave
> > which is already UP. Although it's possible this change may potentially
> > break userspace component (most likely configuration scripts or
> > management software) that assumes slave name can't be changed while
> > UP, it's relatively a limited and controllable set among all userspace
> > components, which can be fixed specifically to listen for the rename
> > and/or link down/up events on failover slaves. Userspace component
> > interacting with slaves is expected to be changed to operate on failover
> > master interface instead, as the failover slave is dynamic in nature
> > which may come and go at any point. The goal is to make the role of
> > failover slaves less relevant, and userspace components should only
> > deal with failover master in the long run.
> >
> > Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
> > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > Reviewed-by: Liran Alon <liran.alon@oracle.com>
>
>
> Why do you need to do dev_close/dev_open which will bounce
> the link?
What we need is notify userspace that link went up/down.
close/open will do that but just sending notifications
would do that as well without playing with link states.
--
MST
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-27 13:25 ` Michael S. Tsirkin
@ 2019-03-27 20:10 ` si-wei liu
2019-03-27 22:16 ` Michael S. Tsirkin
2019-03-27 22:16 ` Michael S. Tsirkin
0 siblings, 2 replies; 18+ messages in thread
From: si-wei liu @ 2019-03-27 20:10 UTC (permalink / raw)
To: Michael S. Tsirkin, Stephen Hemminger
Cc: sridhar.samudrala, davem, kubakici, alexander.duyck, jiri,
netdev, virtualization, liran.alon, boris.ostrovsky,
vijay.balakrishna
On 3/27/2019 6:25 AM, Michael S. Tsirkin wrote:
> On Tue, Mar 26, 2019 at 07:13:42PM -0700, Stephen Hemminger wrote:
>> On Tue, 26 Mar 2019 19:48:13 -0400
>> Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>> When a netdev appears through hot plug then gets enslaved by a failover
>>> master that is already up and running, the slave will be opened
>>> right away after getting enslaved. Today there's a race that userspace
>>> (udev) may fail to rename the slave if the kernel (net_failover)
>>> opens the slave earlier than when the userspace rename happens.
>>> Unlike bond or team, the primary slave of failover can't be renamed by
>>> userspace ahead of time, since the kernel initiated auto-enslavement is
>>> unable to, or rather, is never meant to be synchronized with the rename
>>> request from userspace.
>>>
>>> As the failover slave interfaces are not designed to be operated
>>> directly by userspace apps: IP configuration, filter rules with
>>> regard to network traffic passing and etc., should all be done on master
>>> interface. In general, userspace apps only care about the
>>> name of master interface, while slave names are less important as long
>>> as admin users can see reliable names that may carry
>>> other information describing the netdev. For e.g., they can infer that
>>> "ens3nsby" is a standby slave of "ens3", while for a
>>> name like "eth0" they can't tell which master it belongs to.
>>>
>>> Historically the name of IFF_UP interface can't be changed because
>>> there might be admin script or management software that is already
>>> relying on such behavior and assumes that the slave name can't be
>>> changed once UP. But failover is special: with the in-kernel
>>> auto-enslavement mechanism, the userspace expectation for device
>>> enumeration and bring-up order is already broken. Previously initramfs
>>> and various userspace config tools were modified to bypass failover
>>> slaves because of auto-enslavement and duplicate MAC address. Similarly,
>>> in case that users care about seeing reliable slave name, the new type
>>> of failover slaves needs to be taken care of specifically in userspace
>>> anyway.
>>>
>>> It's less risky to lift up the rename restriction on failover slave
>>> which is already UP. Although it's possible this change may potentially
>>> break userspace component (most likely configuration scripts or
>>> management software) that assumes slave name can't be changed while
>>> UP, it's relatively a limited and controllable set among all userspace
>>> components, which can be fixed specifically to listen for the rename
>>> and/or link down/up events on failover slaves. Userspace component
>>> interacting with slaves is expected to be changed to operate on failover
>>> master interface instead, as the failover slave is dynamic in nature
>>> which may come and go at any point. The goal is to make the role of
>>> failover slaves less relevant, and userspace components should only
>>> deal with failover master in the long run.
>>>
>>> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>
>> Why do you need to do dev_close/dev_open which will bounce
>> the link?
> What we need is notify userspace that link went up/down.
> close/open will do that but just sending notifications
> would do that as well without playing with link states.
>
Since you were requesting to send fake link down/up events around
rename, so as to keep existing userspace intact with this behavioral
change, right? The thing is if you can't fake notification with just
IFF_UP or ~IFF_UP then claim everything is done. If you look at
rtnl_fill_ifinfo() where the notification payload is prepared, you'll
find a lot of states and flags are correlated:
ifi_flags
IFLA_OPERSTATE
IFLA_CARRIER
IFLA_CARRIER_CHANGES
which requires below states to be toggled or taken care of in between:
operstate
__LINK_STATE_START
__LINK_STATE_NOCARRIER
carrier_changes
for e.g. user mostly treats IFF_RUNNING as the indication of link
up/down as opposed to IFF_UP. That would require you to toggle
__LINK_STATE_START (and operstate as well) without doing a full
dev_close/open. Since __LINK_STATE_START is cleared, there's no sense to
let CARRIER_OK remain set, and then you'd need to take care of
carrier_changes... Since you don't really shutting down the device, the
link watchdog keeps running and may race with inconsistent carrier state
in between. dev_close/open may have done unneeded work, but it's the
safest option IMHO, as apparently the cost and ugly complexity to fake
link down/up events is not something worthwhile compared to simply
bouncing the link state.
Another point is kernel consumers of the NETDEV_CHANGENAME notifier
might well assume the link is already taken down by dev_close() before
the rename. I didn't check all those consumers in tree but thought it
might be safe to keep the current convention.
Now let me turn around and ask you what's your concerns if bouncing the
link state. While I can tweak a lightweight version of dev_close/open to
bypass ndo_stop and ndo_start while shutting down the link watchdog on
behalf of drivers, it's far more involved than make me think if that's
really what you had in mind.
Another less safer option is that we just notify userspace anyway
without sending down/up event around, as I don't see *any real
application* cares about the link state or whatsoever when it attempts
to detect rename. Given that the scope is limited to failover slave the
chance of breaking userspace app would be extremely low in practice.
Thanks,
-Siwei
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-27 20:10 ` si-wei liu
2019-03-27 22:16 ` Michael S. Tsirkin
@ 2019-03-27 22:16 ` Michael S. Tsirkin
2019-03-27 22:31 ` si-wei liu
1 sibling, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2019-03-27 22:16 UTC (permalink / raw)
To: si-wei liu
Cc: Stephen Hemminger, sridhar.samudrala, davem, kubakici,
alexander.duyck, jiri, netdev, virtualization, liran.alon,
boris.ostrovsky, vijay.balakrishna
On Wed, Mar 27, 2019 at 01:10:10PM -0700, si-wei liu wrote:
> Another less safer option is that we just notify userspace anyway without
> sending down/up event around, as I don't see *any real application* cares
> about the link state or whatsoever when it attempts to detect rename.
How do you write a race ree handler then? ATM just detecting link up is
sufficient and covers 100% of cases. Seems like a good idea to keep it
that way.
--
MST
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-27 20:10 ` si-wei liu
@ 2019-03-27 22:16 ` Michael S. Tsirkin
2019-03-27 22:16 ` Michael S. Tsirkin
1 sibling, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2019-03-27 22:16 UTC (permalink / raw)
To: si-wei liu
Cc: jiri, kubakici, sridhar.samudrala, alexander.duyck,
virtualization, liran.alon, netdev, boris.ostrovsky, davem
On Wed, Mar 27, 2019 at 01:10:10PM -0700, si-wei liu wrote:
> Another less safer option is that we just notify userspace anyway without
> sending down/up event around, as I don't see *any real application* cares
> about the link state or whatsoever when it attempts to detect rename.
How do you write a race ree handler then? ATM just detecting link up is
sufficient and covers 100% of cases. Seems like a good idea to keep it
that way.
--
MST
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-27 22:16 ` Michael S. Tsirkin
@ 2019-03-27 22:31 ` si-wei liu
2019-03-27 22:48 ` si-wei liu
0 siblings, 1 reply; 18+ messages in thread
From: si-wei liu @ 2019-03-27 22:31 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Stephen Hemminger, sridhar.samudrala, davem, kubakici,
alexander.duyck, jiri, netdev, virtualization, liran.alon,
boris.ostrovsky, vijay.balakrishna
On 3/27/2019 3:16 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 27, 2019 at 01:10:10PM -0700, si-wei liu wrote:
>> Another less safer option is that we just notify userspace anyway without
>> sending down/up event around, as I don't see *any real application* cares
>> about the link state or whatsoever when it attempts to detect rename.
> How do you write a race ree handler then? ATM just detecting link up is
> sufficient and covers 100% of cases. Seems like a good idea to keep it
> that way.
>
What do you mean? Which flag or attribute do you think 100% of the
userspace regard it as link up? And why userspace that cares about name
change needs to double check whether link is up or not? I'm sorry, but
are you sure this is the 100% case that every userspace app does it like
what you're claiming?
-Siwei
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-27 22:31 ` si-wei liu
@ 2019-03-27 22:48 ` si-wei liu
0 siblings, 0 replies; 18+ messages in thread
From: si-wei liu @ 2019-03-27 22:48 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Stephen Hemminger, sridhar.samudrala, davem, kubakici,
alexander.duyck, jiri, netdev, virtualization, liran.alon,
boris.ostrovsky, vijay.balakrishna
On 3/27/2019 3:31 PM, si-wei liu wrote:
>
>
> On 3/27/2019 3:16 PM, Michael S. Tsirkin wrote:
>> On Wed, Mar 27, 2019 at 01:10:10PM -0700, si-wei liu wrote:
>>> Another less safer option is that we just notify userspace anyway
>>> without
>>> sending down/up event around, as I don't see *any real application*
>>> cares
>>> about the link state or whatsoever when it attempts to detect rename.
>> How do you write a race ree handler then? ATM just detecting link up is
>> sufficient and covers 100% of cases. Seems like a good idea to keep it
>> that way.
>>
> What do you mean? Which flag or attribute do you think 100% of the
> userspace regard it as link up? And why userspace that cares about
> name change needs to double check whether link is up or not? I'm
> sorry, but are you sure this is the 100% case that every userspace app
> does it like what you're claiming?
>
> -Siwei
To me even the any flag checking regarding link state is unnecessary as
it's hard for userspace apps to follow precisely the name change
together with the link state. I'm not sure if you still persist in
sending link down/up event. What the userspace apps that care about the
name so far as I see chase the name through ifindex. I'm really doubtful
if link state is that important to them. But if you really need that
consistency I'd say bouncing the link is perhaps the only safe option.
-Siwei
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-27 11:11 ` Jiri Pirko
@ 2019-03-27 23:44 ` si-wei liu
2019-03-28 17:14 ` Stephen Hemminger
0 siblings, 1 reply; 18+ messages in thread
From: si-wei liu @ 2019-03-27 23:44 UTC (permalink / raw)
To: Jiri Pirko
Cc: mst, sridhar.samudrala, stephen, davem, kubakici,
alexander.duyck, netdev, virtualization, liran.alon,
boris.ostrovsky, vijay.balakrishna
On 3/27/2019 4:11 AM, Jiri Pirko wrote:
> Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote:
>> When a netdev appears through hot plug then gets enslaved by a failover
>> master that is already up and running, the slave will be opened
>> right away after getting enslaved. Today there's a race that userspace
>> (udev) may fail to rename the slave if the kernel (net_failover)
>> opens the slave earlier than when the userspace rename happens.
>> Unlike bond or team, the primary slave of failover can't be renamed by
>> userspace ahead of time, since the kernel initiated auto-enslavement is
>> unable to, or rather, is never meant to be synchronized with the rename
>> request from userspace.
>>
>> As the failover slave interfaces are not designed to be operated
>> directly by userspace apps: IP configuration, filter rules with
>> regard to network traffic passing and etc., should all be done on master
>> interface. In general, userspace apps only care about the
>> name of master interface, while slave names are less important as long
>> as admin users can see reliable names that may carry
>> other information describing the netdev. For e.g., they can infer that
>> "ens3nsby" is a standby slave of "ens3", while for a
>> name like "eth0" they can't tell which master it belongs to.
>>
>> Historically the name of IFF_UP interface can't be changed because
>> there might be admin script or management software that is already
>> relying on such behavior and assumes that the slave name can't be
>> changed once UP. But failover is special: with the in-kernel
>> auto-enslavement mechanism, the userspace expectation for device
>> enumeration and bring-up order is already broken. Previously initramfs
>> and various userspace config tools were modified to bypass failover
>> slaves because of auto-enslavement and duplicate MAC address. Similarly,
>> in case that users care about seeing reliable slave name, the new type
>> of failover slaves needs to be taken care of specifically in userspace
>> anyway.
>>
>> It's less risky to lift up the rename restriction on failover slave
>> which is already UP. Although it's possible this change may potentially
>> break userspace component (most likely configuration scripts or
>> management software) that assumes slave name can't be changed while
>> UP, it's relatively a limited and controllable set among all userspace
>> components, which can be fixed specifically to listen for the rename
>> and/or link down/up events on failover slaves. Userspace component
>> interacting with slaves is expected to be changed to operate on failover
>> master interface instead, as the failover slave is dynamic in nature
>> which may come and go at any point. The goal is to make the role of
>> failover slaves less relevant, and userspace components should only
>> deal with failover master in the long run.
>>
>> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>
>> --
>> v1 -> v2:
>> - Drop configurable module parameter (Sridhar)
>>
>> v2 -> v3:
>> - Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar)
>> - Send down and up events around rename (Michael S. Tsirkin)
>> ---
>> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++---
>> 1 file changed, 34 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 722d50d..3e0cd80 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev,
>> int dev_change_name(struct net_device *dev, const char *newname)
>> {
>> unsigned char old_assign_type;
>> + bool reopen_needed = false;
>> char oldname[IFNAMSIZ];
>> int err = 0;
>> int ret;
>> @@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname)
>> BUG_ON(!dev_net(dev));
>>
>> net = dev_net(dev);
>> - if (dev->flags & IFF_UP)
>> - return -EBUSY;
>> +
>> + /* Allow failover slave to rename even when
>> + * it is up and running.
>> + *
>> + * Failover slaves are special, since userspace
>> + * might rename the slave after the interface
>> + * has been brought up and running due to
>> + * auto-enslavement.
>> + *
>> + * Failover users don't actually care about slave
>> + * name change, as they are only expected to operate
>> + * on master interface directly.
>> + */
>> + if (dev->flags & IFF_UP) {
>> + if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
>> + return -EBUSY;
>> + reopen_needed = true;
>> + }
>>
>> write_seqcount_begin(&devnet_rename_seq);
>>
>> @@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
>> return err;
>> }
>>
>> + if (reopen_needed)
>> + dev_close(dev);
> Ugh. Don't dev_close/dev_open on name change.
See my response to Michael and Stephen. What's your suggestion then?
>
>> +
>> if (oldname[0] && !strchr(oldname, '%'))
>> netdev_info(dev, "renamed from %s\n", oldname);
>>
>> @@ -1210,7 +1230,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
>> memcpy(dev->name, oldname, IFNAMSIZ);
>> dev->name_assign_type = old_assign_type;
>> write_seqcount_end(&devnet_rename_seq);
>> - return ret;
>> + if (err >= 0)
>> + err = ret;
>> + goto reopen;
>> }
>>
>> write_seqcount_end(&devnet_rename_seq);
>> @@ -1246,6 +1268,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
>> }
>> }
>>
>> +reopen:
>> + if (reopen_needed) {
>> + ret = dev_open(dev);
>> + if (ret) {
>> + pr_err("%s: reopen device failed: %d\n",
>> + dev->name, ret);
>> + }
>> + }
>> +
>> return err;
>> }
>>
>> --
>> 1.8.3.1
>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-27 23:44 ` si-wei liu
@ 2019-03-28 17:14 ` Stephen Hemminger
0 siblings, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2019-03-28 17:14 UTC (permalink / raw)
To: si-wei liu
Cc: Jiri Pirko, mst, sridhar.samudrala, davem, kubakici,
alexander.duyck, netdev, virtualization, liran.alon,
boris.ostrovsky, vijay.balakrishna
On Wed, 27 Mar 2019 16:44:19 -0700
si-wei liu <si-wei.liu@oracle.com> wrote:
> On 3/27/2019 4:11 AM, Jiri Pirko wrote:
> > Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote:
> >> When a netdev appears through hot plug then gets enslaved by a failover
> >> master that is already up and running, the slave will be opened
> >> right away after getting enslaved. Today there's a race that userspace
> >> (udev) may fail to rename the slave if the kernel (net_failover)
> >> opens the slave earlier than when the userspace rename happens.
> >> Unlike bond or team, the primary slave of failover can't be renamed by
> >> userspace ahead of time, since the kernel initiated auto-enslavement is
> >> unable to, or rather, is never meant to be synchronized with the rename
> >> request from userspace.
> >>
> >> As the failover slave interfaces are not designed to be operated
> >> directly by userspace apps: IP configuration, filter rules with
> >> regard to network traffic passing and etc., should all be done on master
> >> interface. In general, userspace apps only care about the
> >> name of master interface, while slave names are less important as long
> >> as admin users can see reliable names that may carry
> >> other information describing the netdev. For e.g., they can infer that
> >> "ens3nsby" is a standby slave of "ens3", while for a
> >> name like "eth0" they can't tell which master it belongs to.
> >>
> >> Historically the name of IFF_UP interface can't be changed because
> >> there might be admin script or management software that is already
> >> relying on such behavior and assumes that the slave name can't be
> >> changed once UP. But failover is special: with the in-kernel
> >> auto-enslavement mechanism, the userspace expectation for device
> >> enumeration and bring-up order is already broken. Previously initramfs
> >> and various userspace config tools were modified to bypass failover
> >> slaves because of auto-enslavement and duplicate MAC address. Similarly,
> >> in case that users care about seeing reliable slave name, the new type
> >> of failover slaves needs to be taken care of specifically in userspace
> >> anyway.
> >>
> >> It's less risky to lift up the rename restriction on failover slave
> >> which is already UP. Although it's possible this change may potentially
> >> break userspace component (most likely configuration scripts or
> >> management software) that assumes slave name can't be changed while
> >> UP, it's relatively a limited and controllable set among all userspace
> >> components, which can be fixed specifically to listen for the rename
> >> and/or link down/up events on failover slaves. Userspace component
> >> interacting with slaves is expected to be changed to operate on failover
> >> master interface instead, as the failover slave is dynamic in nature
> >> which may come and go at any point. The goal is to make the role of
> >> failover slaves less relevant, and userspace components should only
> >> deal with failover master in the long run.
> >>
> >> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
> >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> >>
> >> --
> >> v1 -> v2:
> >> - Drop configurable module parameter (Sridhar)
> >>
> >> v2 -> v3:
> >> - Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar)
> >> - Send down and up events around rename (Michael S. Tsirkin)
> >> ---
> >> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++---
> >> 1 file changed, 34 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index 722d50d..3e0cd80 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev,
> >> int dev_change_name(struct net_device *dev, const char *newname)
> >> {
> >> unsigned char old_assign_type;
> >> + bool reopen_needed = false;
> >> char oldname[IFNAMSIZ];
> >> int err = 0;
> >> int ret;
> >> @@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname)
> >> BUG_ON(!dev_net(dev));
> >>
> >> net = dev_net(dev);
> >> - if (dev->flags & IFF_UP)
> >> - return -EBUSY;
> >> +
> >> + /* Allow failover slave to rename even when
> >> + * it is up and running.
> >> + *
> >> + * Failover slaves are special, since userspace
> >> + * might rename the slave after the interface
> >> + * has been brought up and running due to
> >> + * auto-enslavement.
> >> + *
> >> + * Failover users don't actually care about slave
> >> + * name change, as they are only expected to operate
> >> + * on master interface directly.
> >> + */
> >> + if (dev->flags & IFF_UP) {
> >> + if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
> >> + return -EBUSY;
> >> + reopen_needed = true;
> >> + }
> >>
> >> write_seqcount_begin(&devnet_rename_seq);
> >>
> >> @@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
> >> return err;
> >> }
> >>
> >> + if (reopen_needed)
> >> + dev_close(dev);
> > Ugh. Don't dev_close/dev_open on name change.
> See my response to Michael and Stephen. What's your suggestion then?
To a DEV_CHANGE notification instead?
My opinion is that allowing name change is not worth the doing.
Also, the kernel should never do the name change, it is up to userspace.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
@ 2019-03-28 17:14 ` Stephen Hemminger
0 siblings, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2019-03-28 17:14 UTC (permalink / raw)
To: si-wei liu
Cc: Jiri Pirko, mst, kubakici, sridhar.samudrala, alexander.duyck,
virtualization, liran.alon, netdev, boris.ostrovsky, davem
On Wed, 27 Mar 2019 16:44:19 -0700
si-wei liu <si-wei.liu@oracle.com> wrote:
> On 3/27/2019 4:11 AM, Jiri Pirko wrote:
> > Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote:
> >> When a netdev appears through hot plug then gets enslaved by a failover
> >> master that is already up and running, the slave will be opened
> >> right away after getting enslaved. Today there's a race that userspace
> >> (udev) may fail to rename the slave if the kernel (net_failover)
> >> opens the slave earlier than when the userspace rename happens.
> >> Unlike bond or team, the primary slave of failover can't be renamed by
> >> userspace ahead of time, since the kernel initiated auto-enslavement is
> >> unable to, or rather, is never meant to be synchronized with the rename
> >> request from userspace.
> >>
> >> As the failover slave interfaces are not designed to be operated
> >> directly by userspace apps: IP configuration, filter rules with
> >> regard to network traffic passing and etc., should all be done on master
> >> interface. In general, userspace apps only care about the
> >> name of master interface, while slave names are less important as long
> >> as admin users can see reliable names that may carry
> >> other information describing the netdev. For e.g., they can infer that
> >> "ens3nsby" is a standby slave of "ens3", while for a
> >> name like "eth0" they can't tell which master it belongs to.
> >>
> >> Historically the name of IFF_UP interface can't be changed because
> >> there might be admin script or management software that is already
> >> relying on such behavior and assumes that the slave name can't be
> >> changed once UP. But failover is special: with the in-kernel
> >> auto-enslavement mechanism, the userspace expectation for device
> >> enumeration and bring-up order is already broken. Previously initramfs
> >> and various userspace config tools were modified to bypass failover
> >> slaves because of auto-enslavement and duplicate MAC address. Similarly,
> >> in case that users care about seeing reliable slave name, the new type
> >> of failover slaves needs to be taken care of specifically in userspace
> >> anyway.
> >>
> >> It's less risky to lift up the rename restriction on failover slave
> >> which is already UP. Although it's possible this change may potentially
> >> break userspace component (most likely configuration scripts or
> >> management software) that assumes slave name can't be changed while
> >> UP, it's relatively a limited and controllable set among all userspace
> >> components, which can be fixed specifically to listen for the rename
> >> and/or link down/up events on failover slaves. Userspace component
> >> interacting with slaves is expected to be changed to operate on failover
> >> master interface instead, as the failover slave is dynamic in nature
> >> which may come and go at any point. The goal is to make the role of
> >> failover slaves less relevant, and userspace components should only
> >> deal with failover master in the long run.
> >>
> >> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
> >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> >>
> >> --
> >> v1 -> v2:
> >> - Drop configurable module parameter (Sridhar)
> >>
> >> v2 -> v3:
> >> - Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar)
> >> - Send down and up events around rename (Michael S. Tsirkin)
> >> ---
> >> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++---
> >> 1 file changed, 34 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index 722d50d..3e0cd80 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev,
> >> int dev_change_name(struct net_device *dev, const char *newname)
> >> {
> >> unsigned char old_assign_type;
> >> + bool reopen_needed = false;
> >> char oldname[IFNAMSIZ];
> >> int err = 0;
> >> int ret;
> >> @@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname)
> >> BUG_ON(!dev_net(dev));
> >>
> >> net = dev_net(dev);
> >> - if (dev->flags & IFF_UP)
> >> - return -EBUSY;
> >> +
> >> + /* Allow failover slave to rename even when
> >> + * it is up and running.
> >> + *
> >> + * Failover slaves are special, since userspace
> >> + * might rename the slave after the interface
> >> + * has been brought up and running due to
> >> + * auto-enslavement.
> >> + *
> >> + * Failover users don't actually care about slave
> >> + * name change, as they are only expected to operate
> >> + * on master interface directly.
> >> + */
> >> + if (dev->flags & IFF_UP) {
> >> + if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
> >> + return -EBUSY;
> >> + reopen_needed = true;
> >> + }
> >>
> >> write_seqcount_begin(&devnet_rename_seq);
> >>
> >> @@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
> >> return err;
> >> }
> >>
> >> + if (reopen_needed)
> >> + dev_close(dev);
> > Ugh. Don't dev_close/dev_open on name change.
> See my response to Michael and Stephen. What's your suggestion then?
To a DEV_CHANGE notification instead?
My opinion is that allowing name change is not worth the doing.
Also, the kernel should never do the name change, it is up to userspace.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-26 23:48 [PATCH net v3] failover: allow name change on IFF_UP slave interfaces Si-Wei Liu
` (4 preceding siblings ...)
2019-03-28 19:49 ` kbuild test robot
@ 2019-03-28 19:49 ` kbuild test robot
5 siblings, 0 replies; 18+ messages in thread
From: kbuild test robot @ 2019-03-28 19:49 UTC (permalink / raw)
To: Si-Wei Liu
Cc: kbuild-all, mst, sridhar.samudrala, stephen, davem, kubakici,
alexander.duyck, jiri, netdev, virtualization, liran.alon,
boris.ostrovsky, vijay.balakrishna, si-wei liu
[-- Attachment #1: Type: text/plain, Size: 4836 bytes --]
Hi Si-Wei,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on net/master]
url: https://github.com/0day-ci/linux/commits/Si-Wei-Liu/failover-allow-name-change-on-IFF_UP-slave-interfaces/20190329-020744
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 6.0.0 20160327 (experimental)
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=openrisc
All errors (new ones prefixed by >>):
net/core/dev.c: In function 'dev_change_name':
>> net/core/dev.c:1277:9: error: too few arguments to function 'dev_open'
ret = dev_open(dev);
^~~~~~~~
In file included from net/core/dev.c:93:0:
include/linux/netdevice.h:2636:5: note: declared here
int dev_open(struct net_device *dev, struct netlink_ext_ack *extack);
^~~~~~~~
vim +/dev_open +1277 net/core/dev.c
1166
1167 /**
1168 * dev_change_name - change name of a device
1169 * @dev: device
1170 * @newname: name (or format string) must be at least IFNAMSIZ
1171 *
1172 * Change name of a device, can pass format strings "eth%d".
1173 * for wildcarding.
1174 */
1175 int dev_change_name(struct net_device *dev, const char *newname)
1176 {
1177 unsigned char old_assign_type;
1178 bool reopen_needed = false;
1179 char oldname[IFNAMSIZ];
1180 int err = 0;
1181 int ret;
1182 struct net *net;
1183
1184 ASSERT_RTNL();
1185 BUG_ON(!dev_net(dev));
1186
1187 net = dev_net(dev);
1188
1189 /* Allow failover slave to rename even when
1190 * it is up and running.
1191 *
1192 * Failover slaves are special, since userspace
1193 * might rename the slave after the interface
1194 * has been brought up and running due to
1195 * auto-enslavement.
1196 *
1197 * Failover users don't actually care about slave
1198 * name change, as they are only expected to operate
1199 * on master interface directly.
1200 */
1201 if (dev->flags & IFF_UP) {
1202 if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
1203 return -EBUSY;
1204 reopen_needed = true;
1205 }
1206
1207 write_seqcount_begin(&devnet_rename_seq);
1208
1209 if (strncmp(newname, dev->name, IFNAMSIZ) == 0) {
1210 write_seqcount_end(&devnet_rename_seq);
1211 return 0;
1212 }
1213
1214 memcpy(oldname, dev->name, IFNAMSIZ);
1215
1216 err = dev_get_valid_name(net, dev, newname);
1217 if (err < 0) {
1218 write_seqcount_end(&devnet_rename_seq);
1219 return err;
1220 }
1221
1222 if (reopen_needed)
1223 dev_close(dev);
1224
1225 if (oldname[0] && !strchr(oldname, '%'))
1226 netdev_info(dev, "renamed from %s\n", oldname);
1227
1228 old_assign_type = dev->name_assign_type;
1229 dev->name_assign_type = NET_NAME_RENAMED;
1230
1231 rollback:
1232 ret = device_rename(&dev->dev, dev->name);
1233 if (ret) {
1234 memcpy(dev->name, oldname, IFNAMSIZ);
1235 dev->name_assign_type = old_assign_type;
1236 write_seqcount_end(&devnet_rename_seq);
1237 if (err >= 0)
1238 err = ret;
1239 goto reopen;
1240 }
1241
1242 write_seqcount_end(&devnet_rename_seq);
1243
1244 netdev_adjacent_rename_links(dev, oldname);
1245
1246 write_lock_bh(&dev_base_lock);
1247 hlist_del_rcu(&dev->name_hlist);
1248 write_unlock_bh(&dev_base_lock);
1249
1250 synchronize_rcu();
1251
1252 write_lock_bh(&dev_base_lock);
1253 hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name));
1254 write_unlock_bh(&dev_base_lock);
1255
1256 ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
1257 ret = notifier_to_errno(ret);
1258
1259 if (ret) {
1260 /* err >= 0 after dev_alloc_name() or stores the first errno */
1261 if (err >= 0) {
1262 err = ret;
1263 write_seqcount_begin(&devnet_rename_seq);
1264 memcpy(dev->name, oldname, IFNAMSIZ);
1265 memcpy(oldname, newname, IFNAMSIZ);
1266 dev->name_assign_type = old_assign_type;
1267 old_assign_type = NET_NAME_RENAMED;
1268 goto rollback;
1269 } else {
1270 pr_err("%s: name change rollback failed: %d\n",
1271 dev->name, ret);
1272 }
1273 }
1274
1275 reopen:
1276 if (reopen_needed) {
> 1277 ret = dev_open(dev);
1278 if (ret) {
1279 pr_err("%s: reopen device failed: %d\n",
1280 dev->name, ret);
1281 }
1282 }
1283
1284 return err;
1285 }
1286
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 8104 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-26 23:48 [PATCH net v3] failover: allow name change on IFF_UP slave interfaces Si-Wei Liu
` (3 preceding siblings ...)
2019-03-27 11:11 ` Jiri Pirko
@ 2019-03-28 19:49 ` kbuild test robot
2019-03-28 19:49 ` kbuild test robot
5 siblings, 0 replies; 18+ messages in thread
From: kbuild test robot @ 2019-03-28 19:49 UTC (permalink / raw)
Cc: jiri, mst, kubakici, sridhar.samudrala, alexander.duyck,
virtualization, liran.alon, kbuild-all, netdev, si-wei liu,
boris.ostrovsky, davem
[-- Attachment #1: Type: text/plain, Size: 4836 bytes --]
Hi Si-Wei,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on net/master]
url: https://github.com/0day-ci/linux/commits/Si-Wei-Liu/failover-allow-name-change-on-IFF_UP-slave-interfaces/20190329-020744
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 6.0.0 20160327 (experimental)
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=openrisc
All errors (new ones prefixed by >>):
net/core/dev.c: In function 'dev_change_name':
>> net/core/dev.c:1277:9: error: too few arguments to function 'dev_open'
ret = dev_open(dev);
^~~~~~~~
In file included from net/core/dev.c:93:0:
include/linux/netdevice.h:2636:5: note: declared here
int dev_open(struct net_device *dev, struct netlink_ext_ack *extack);
^~~~~~~~
vim +/dev_open +1277 net/core/dev.c
1166
1167 /**
1168 * dev_change_name - change name of a device
1169 * @dev: device
1170 * @newname: name (or format string) must be at least IFNAMSIZ
1171 *
1172 * Change name of a device, can pass format strings "eth%d".
1173 * for wildcarding.
1174 */
1175 int dev_change_name(struct net_device *dev, const char *newname)
1176 {
1177 unsigned char old_assign_type;
1178 bool reopen_needed = false;
1179 char oldname[IFNAMSIZ];
1180 int err = 0;
1181 int ret;
1182 struct net *net;
1183
1184 ASSERT_RTNL();
1185 BUG_ON(!dev_net(dev));
1186
1187 net = dev_net(dev);
1188
1189 /* Allow failover slave to rename even when
1190 * it is up and running.
1191 *
1192 * Failover slaves are special, since userspace
1193 * might rename the slave after the interface
1194 * has been brought up and running due to
1195 * auto-enslavement.
1196 *
1197 * Failover users don't actually care about slave
1198 * name change, as they are only expected to operate
1199 * on master interface directly.
1200 */
1201 if (dev->flags & IFF_UP) {
1202 if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
1203 return -EBUSY;
1204 reopen_needed = true;
1205 }
1206
1207 write_seqcount_begin(&devnet_rename_seq);
1208
1209 if (strncmp(newname, dev->name, IFNAMSIZ) == 0) {
1210 write_seqcount_end(&devnet_rename_seq);
1211 return 0;
1212 }
1213
1214 memcpy(oldname, dev->name, IFNAMSIZ);
1215
1216 err = dev_get_valid_name(net, dev, newname);
1217 if (err < 0) {
1218 write_seqcount_end(&devnet_rename_seq);
1219 return err;
1220 }
1221
1222 if (reopen_needed)
1223 dev_close(dev);
1224
1225 if (oldname[0] && !strchr(oldname, '%'))
1226 netdev_info(dev, "renamed from %s\n", oldname);
1227
1228 old_assign_type = dev->name_assign_type;
1229 dev->name_assign_type = NET_NAME_RENAMED;
1230
1231 rollback:
1232 ret = device_rename(&dev->dev, dev->name);
1233 if (ret) {
1234 memcpy(dev->name, oldname, IFNAMSIZ);
1235 dev->name_assign_type = old_assign_type;
1236 write_seqcount_end(&devnet_rename_seq);
1237 if (err >= 0)
1238 err = ret;
1239 goto reopen;
1240 }
1241
1242 write_seqcount_end(&devnet_rename_seq);
1243
1244 netdev_adjacent_rename_links(dev, oldname);
1245
1246 write_lock_bh(&dev_base_lock);
1247 hlist_del_rcu(&dev->name_hlist);
1248 write_unlock_bh(&dev_base_lock);
1249
1250 synchronize_rcu();
1251
1252 write_lock_bh(&dev_base_lock);
1253 hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name));
1254 write_unlock_bh(&dev_base_lock);
1255
1256 ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
1257 ret = notifier_to_errno(ret);
1258
1259 if (ret) {
1260 /* err >= 0 after dev_alloc_name() or stores the first errno */
1261 if (err >= 0) {
1262 err = ret;
1263 write_seqcount_begin(&devnet_rename_seq);
1264 memcpy(dev->name, oldname, IFNAMSIZ);
1265 memcpy(oldname, newname, IFNAMSIZ);
1266 dev->name_assign_type = old_assign_type;
1267 old_assign_type = NET_NAME_RENAMED;
1268 goto rollback;
1269 } else {
1270 pr_err("%s: name change rollback failed: %d\n",
1271 dev->name, ret);
1272 }
1273 }
1274
1275 reopen:
1276 if (reopen_needed) {
> 1277 ret = dev_open(dev);
1278 if (ret) {
1279 pr_err("%s: reopen device failed: %d\n",
1280 dev->name, ret);
1281 }
1282 }
1283
1284 return err;
1285 }
1286
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 8104 bytes --]
[-- Attachment #3: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3] failover: allow name change on IFF_UP slave interfaces
2019-03-28 17:14 ` Stephen Hemminger
(?)
@ 2019-03-28 20:15 ` si-wei liu
-1 siblings, 0 replies; 18+ messages in thread
From: si-wei liu @ 2019-03-28 20:15 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Jiri Pirko, mst, sridhar.samudrala, davem, kubakici,
alexander.duyck, netdev, virtualization, liran.alon,
boris.ostrovsky, vijay.balakrishna
On 3/28/2019 10:14 AM, Stephen Hemminger wrote:
> On Wed, 27 Mar 2019 16:44:19 -0700
> si-wei liu <si-wei.liu@oracle.com> wrote:
>
>> On 3/27/2019 4:11 AM, Jiri Pirko wrote:
>>> Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote:
>>>> When a netdev appears through hot plug then gets enslaved by a failover
>>>> master that is already up and running, the slave will be opened
>>>> right away after getting enslaved. Today there's a race that userspace
>>>> (udev) may fail to rename the slave if the kernel (net_failover)
>>>> opens the slave earlier than when the userspace rename happens.
>>>> Unlike bond or team, the primary slave of failover can't be renamed by
>>>> userspace ahead of time, since the kernel initiated auto-enslavement is
>>>> unable to, or rather, is never meant to be synchronized with the rename
>>>> request from userspace.
>>>>
>>>> As the failover slave interfaces are not designed to be operated
>>>> directly by userspace apps: IP configuration, filter rules with
>>>> regard to network traffic passing and etc., should all be done on master
>>>> interface. In general, userspace apps only care about the
>>>> name of master interface, while slave names are less important as long
>>>> as admin users can see reliable names that may carry
>>>> other information describing the netdev. For e.g., they can infer that
>>>> "ens3nsby" is a standby slave of "ens3", while for a
>>>> name like "eth0" they can't tell which master it belongs to.
>>>>
>>>> Historically the name of IFF_UP interface can't be changed because
>>>> there might be admin script or management software that is already
>>>> relying on such behavior and assumes that the slave name can't be
>>>> changed once UP. But failover is special: with the in-kernel
>>>> auto-enslavement mechanism, the userspace expectation for device
>>>> enumeration and bring-up order is already broken. Previously initramfs
>>>> and various userspace config tools were modified to bypass failover
>>>> slaves because of auto-enslavement and duplicate MAC address. Similarly,
>>>> in case that users care about seeing reliable slave name, the new type
>>>> of failover slaves needs to be taken care of specifically in userspace
>>>> anyway.
>>>>
>>>> It's less risky to lift up the rename restriction on failover slave
>>>> which is already UP. Although it's possible this change may potentially
>>>> break userspace component (most likely configuration scripts or
>>>> management software) that assumes slave name can't be changed while
>>>> UP, it's relatively a limited and controllable set among all userspace
>>>> components, which can be fixed specifically to listen for the rename
>>>> and/or link down/up events on failover slaves. Userspace component
>>>> interacting with slaves is expected to be changed to operate on failover
>>>> master interface instead, as the failover slave is dynamic in nature
>>>> which may come and go at any point. The goal is to make the role of
>>>> failover slaves less relevant, and userspace components should only
>>>> deal with failover master in the long run.
>>>>
>>>> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>>>
>>>> --
>>>> v1 -> v2:
>>>> - Drop configurable module parameter (Sridhar)
>>>>
>>>> v2 -> v3:
>>>> - Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar)
>>>> - Send down and up events around rename (Michael S. Tsirkin)
>>>> ---
>>>> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++---
>>>> 1 file changed, 34 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index 722d50d..3e0cd80 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev,
>>>> int dev_change_name(struct net_device *dev, const char *newname)
>>>> {
>>>> unsigned char old_assign_type;
>>>> + bool reopen_needed = false;
>>>> char oldname[IFNAMSIZ];
>>>> int err = 0;
>>>> int ret;
>>>> @@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname)
>>>> BUG_ON(!dev_net(dev));
>>>>
>>>> net = dev_net(dev);
>>>> - if (dev->flags & IFF_UP)
>>>> - return -EBUSY;
>>>> +
>>>> + /* Allow failover slave to rename even when
>>>> + * it is up and running.
>>>> + *
>>>> + * Failover slaves are special, since userspace
>>>> + * might rename the slave after the interface
>>>> + * has been brought up and running due to
>>>> + * auto-enslavement.
>>>> + *
>>>> + * Failover users don't actually care about slave
>>>> + * name change, as they are only expected to operate
>>>> + * on master interface directly.
>>>> + */
>>>> + if (dev->flags & IFF_UP) {
>>>> + if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
>>>> + return -EBUSY;
>>>> + reopen_needed = true;
>>>> + }
>>>>
>>>> write_seqcount_begin(&devnet_rename_seq);
>>>>
>>>> @@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
>>>> return err;
>>>> }
>>>>
>>>> + if (reopen_needed)
>>>> + dev_close(dev);
>>> Ugh. Don't dev_close/dev_open on name change.
>> See my response to Michael and Stephen. What's your suggestion then?
> To a DEV_CHANGE notification instead?
Thanks. That is what I was thinking. Will post a v4 to simplify the
notification. Not worth the effort to fine tune for cases in hypothesis.
-Siwei
>
>
> My opinion is that allowing name change is not worth the doing.
> Also, the kernel should never do the name change, it is up to userspace.
>
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2019-03-28 20:15 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-26 23:48 [PATCH net v3] failover: allow name change on IFF_UP slave interfaces Si-Wei Liu
2019-03-27 2:13 ` Stephen Hemminger
2019-03-27 2:13 ` Stephen Hemminger
2019-03-27 13:25 ` Michael S. Tsirkin
2019-03-27 20:10 ` si-wei liu
2019-03-27 22:16 ` Michael S. Tsirkin
2019-03-27 22:16 ` Michael S. Tsirkin
2019-03-27 22:31 ` si-wei liu
2019-03-27 22:48 ` si-wei liu
2019-03-27 13:25 ` Michael S. Tsirkin
2019-03-27 11:11 ` Jiri Pirko
2019-03-27 23:44 ` si-wei liu
2019-03-28 17:14 ` Stephen Hemminger
2019-03-28 17:14 ` Stephen Hemminger
2019-03-28 20:15 ` si-wei liu
2019-03-27 11:11 ` Jiri Pirko
2019-03-28 19:49 ` kbuild test robot
2019-03-28 19:49 ` kbuild test robot
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.