Subject: Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)
To: "Michael S. Tsirkin"
Cc: "Samudrala, Sridhar", Siwei Liu, Jiri Pirko, Stephen Hemminger, David Miller, Netdev, virtualization@lists.linux-foundation.org, virtio-dev, "Brandeburg, Jesse", Alexander Duyck, Jakub Kicinski, Jason Wang, liran.alon@oracle.com
References: <1523386790-12396-1-git-send-email-sridhar.samudrala@intel.com> <1523386790-12396-5-git-send-email-sridhar.samudrala@intel.com> <20180410142608.50f15b45@xeon-e3> <20180411075334.GK2028@nanopsycho> <20190221203808-mutt-send-email-mst@kernel.org> <581e4399-3969-aecd-e923-03bbc0880733@oracle.com> <91d4cbb1-be7a-b53c-6b2a-99bef07e7c53@intel.com> <20190222100753-mutt-send-email-mst@kernel.org>
From: si-wei liu
Organization: Oracle Corporation
Date: Mon, 25 Feb 2019 16:58:07 -0800
MIME-Version: 1.0
In-Reply-To: <20190222100753-mutt-send-email-mst@kernel.org>
Content-Type: multipart/mixed; boundary="------------10FAAC2262C0FD1C74C40294"
Content-Language: en-US
X-Mailing-List: netdev@vger.kernel.org

--------------10FAAC2262C0FD1C74C40294
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:
> On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:
>>
>> On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:
>>>
>>> On 2/21/2019 7:33 PM, si-wei liu wrote:
>>>>
>>>> On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
>>>>> On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:
>>>>>> Sorry for replying to this ancient thread. There was some remaining
>>>>>> issue that I don't think the initial net_failover patch got addressed
>>>>>> cleanly, see:
>>>>>>
>>>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
>>>>>>
>>>>>> The renaming of 'eth0' to 'ens4' fails because the udev userspace was
>>>>>> not specifically written for such kernel automatic enslavement.
>>>>>> Specifically, if it is a bond or team, the slave would typically get
>>>>>> renamed *before* virtual device gets created, that's what udev can
>>>>>> control (without getting netdev opened early by the other part of
>>>>>> kernel) and other userspace components for e.g. initramfs,
>>>>>> init-scripts can coordinate well in between. The in-kernel
>>>>>> auto-enslavement of net_failover breaks this userspace convention,
>>>>>> which don't provides a solution if user care about consistent naming
>>>>>> on the slave netdevs specifically.
>>>>>>
>>>>>> Previously this issue had been specifically called out when IFF_HIDDEN
>>>>>> and the 1-netdev was proposed, but no one gives out a solution to this
>>>>>> problem ever since. Please share your mind how to proceed and solve
>>>>>> this userspace issue if netdev does not welcome a 1-netdev model.
>>>>> Above says:
>>>>>
>>>>> there's no motivation in the systemd/udevd community at
>>>>> this point to refactor the rename logic and make it work well with
>>>>> 3-netdev.
>>>>>
>>>>> What would the fix be? Skip slave devices?
>>>>>
>>>> There's nothing user can get if just skipping slave devices - the
>>>> name is still unchanged and unpredictable e.g. eth0, or eth1 the
>>>> next reboot, while the rest may conform to the naming scheme (ens3
>>>> and such). There's no way one can fix this in userspace alone - when
>>>> the failover is created the enslaved netdev was opened by the kernel
>>>> earlier than the userspace is made aware of, and there's no
>>>> negotiation protocol for kernel to know when userspace has done
>>>> initial renaming of the interface. I would expect netdev list should
>>>> at least provide the direction in general for how this can be
>>>> solved...
>
> I was just wondering what did you mean when you said
> "refactor the rename logic and make it work well with 3-netdev" -
> was there a proposal udev rejected?

No. I never believed this particular issue could be fixed in userspace
alone. Someone had previously said it could be, but I have never seen
any work or relevant discussion happen in the various userspace
communities (e.g. dracut, initramfs-tools, systemd, udev, and
NetworkManager). IMHO the root of the issue lies in the kernel, so it
makes more sense to start from netdev and work out a solution there:
see what can be done in the kernel to fix it, and only after that
engage the userspace communities on feasibility...

> Anyway, can we write a time diagram for what happens in which order that
> leads to failure? That would help look for triggers that we can tie
> into, or add new ones.
>

See attached diagram.

>
>
>
>
>>> Is there an issue if slave device names are not predictable? The user/admin scripts are expected
>>> to only work with the master failover device.
>> Where does this expectation come from?
>>
>> Admin users may have ethtool or tc configurations that need to deal with
>> predictable interface name. Third-party app which was built upon specifying
>> certain interface name can't be modified to chase dynamic names.
>>
>> Specifically, we have pre-canned image that uses ethtool to fine tune VF
>> offload settings post boot for specific workload. Those images won't work
>> well if the name is constantly changing just after couple rounds of live
>> migration.
> It should be possible to specify the ethtool configuration on the
> master and have it automatically propagated to the slave.
>
> BTW this is something we should look at IMHO.

I was giving a few examples to show that the expectation, and the
assumption, that user/admin scripts only deal with the master failover
device is incorrect. This had never been properly taken care of,
although I did try to emphasize it from the very beginning.

Basically, what you said about propagating the ethtool configuration
down to the slave is the key goal of the 1-netdev model. However, what
I am looking for now is any alternative that can also fix this specific
udev rename problem, before concluding that 1-netdev is the only
solution. A 1-netdev scheme would generally take time to implement,
whereas I'm trying to find a way to fix this particular naming problem
under 3-netdev.

>
>>> Moreover, you were suggesting hiding the lower slave devices anyway. There was some discussion
>>> about moving them to a hidden network namespace so that they are not visible from the default namespace.
>>> I looked into this sometime back, but did not find the right kernel api to create a network namespace within
>>> kernel.
>>> If so, we could use this mechanism to simulate a 1-netdev model.
>> Yes, that's one possible implementation (IMHO the key is to make 1-netdev
>> model as much transparent to a real NIC as possible, while a hidden netns is
>> just the vehicle). However, I recall there was resistance around this
>> discussion that even the concept of hiding itself is a taboo for Linux
>> netdev. I would like to summon potential alternatives before concluding
>> 1-netdev is the only solution too soon.
>>
>> Thanks,
>> -Siwei
> Your scripts would not work at all then, right?

At this point we don't claim that images with such usage are SR-IOV live
migratable. We would not flag them as live migratable until this ethtool
config issue is fully addressed and a transparent live migration solution
eventually emerges upstream.

Thanks,
-Siwei

>
>
>>>> -Siwei
>>>>
>>>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>

--------------10FAAC2262C0FD1C74C40294
Content-Type: text/plain; charset=UTF-8; name="net_failover_rename_race.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="net_failover_rename_race.txt"

  net_failover(kernel)                            |    network.service (user)    |          systemd-udevd (user)
--------------------------------------------------+------------------------------+--------------------------------------
(standby virtio-net and net_failover              |                              |
devices created and initialized,                  |                              |
i.e. virtnet_probe()->                            |                              |
       net_failover_create()                      |                              |
was done.)                                        |                              |
                                                  |                              |
                                                  |  runs `ifup ens3' ->         |
                                                  |    ip link set dev ens3 up   |
net_failover_open()                               |                              |
  dev_open(virtnet_dev)                           |                              |
    virtnet_open(virtnet_dev)                     |                              |
  netif_carrier_on(failover_dev)                  |                              |
  ...                                             |                              |
                                                  |                              |
(VF hot plugged in)                               |                              |
ixgbevf_probe()                                   |                              |
 register_netdev(ixgbevf_netdev)                  |                              |
  netdev_register_kobject(ixgbevf_netdev)         |                              |
   kobject_add(ixgbevf_dev)                       |                              |
    device_add(ixgbevf_dev)                       |                              |
     kobject_uevent(&ixgbevf_dev->kobj, KOBJ_ADD) |                              |
      netlink_broadcast()                         |                              |
  ...                                             |                              |
  call_netdevice_notifiers(NETDEV_REGISTER)       |                              |
   failover_event(..., NETDEV_REGISTER, ...)      |                              |
    failover_slave_register(ixgbevf_netdev)       |                              |
     net_failover_slave_register(ixgbevf_netdev)  |                              |
      dev_open(ixgbevf_netdev)                    |                              |
                                                  |                              |
                                                  |                              |
                                                  |                              |   received ADD uevent from netlink fd
                                                  |                              |   ...
                                                  |                              |   udev-builtin-net_id.c:dev_pci_slot()
                                                  |                              |   (decided to rename 'eth0')
                                                  |                              |     ip link set dev eth0 name ens4
(dev_change_name() returns -EBUSY as              |                              |
ixgbevf_netdev->flags has IFF_UP)                 |                              |
                                                  |                              |
--------------10FAAC2262C0FD1C74C40294--