From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 22 Feb 2019 10:14:23 -0500
From: "Michael S. Tsirkin"
To: si-wei liu
Cc: "Samudrala, Sridhar", Siwei Liu, Jiri Pirko, Stephen Hemminger,
	David Miller, Netdev, virtualization@lists.linux-foundation.org,
	virtio-dev, "Brandeburg, Jesse", Alexander Duyck, Jakub Kicinski,
	Jason Wang, liran.alon@oracle.com
Subject: Re: [virtio-dev] Re: net_failover slave udev renaming (was Re:
	[RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling
	code to use the bypass framework)
Message-ID: <20190222100753-mutt-send-email-mst@kernel.org>
References: <1523386790-12396-1-git-send-email-sridhar.samudrala@intel.com>
	<1523386790-12396-5-git-send-email-sridhar.samudrala@intel.com>
	<20180410142608.50f15b45@xeon-e3>
	<20180411075334.GK2028@nanopsycho>
	<20190221203808-mutt-send-email-mst@kernel.org>
	<581e4399-3969-aecd-e923-03bbc0880733@oracle.com>
	<91d4cbb1-be7a-b53c-6b2a-99bef07e7c53@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: netdev-owner@vger.kernel.org
X-Mailing-List: netdev@vger.kernel.org

On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:
>
>
> On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:
> >
> >
> > On 2/21/2019 7:33 PM, si-wei liu wrote:
> > >
> > >
> > > On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
> > > > On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:
> > > > > Sorry for replying to this ancient thread. There was some remaining
> > > > > issue that I don't think the initial net_failover patch got addressed
> > > > > cleanly, see:
> > > > >
> > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
> > > > >
> > > > > The renaming of 'eth0' to 'ens4' fails because the udev userspace was
> > > > > not specifically written for such kernel automatic enslavement.
> > > > > Specifically, if it is a bond or team, the slave would typically get
> > > > > renamed *before* virtual device gets created, that's what udev can
> > > > > control (without getting netdev opened early by the other part of
> > > > > kernel) and other userspace components for e.g. initramfs,
> > > > > init-scripts can coordinate well in between. The in-kernel
> > > > > auto-enslavement of net_failover breaks this userspace convention,
> > > > > which doesn't provide a solution if user cares about consistent naming
> > > > > on the slave netdevs specifically.
> > > > >
> > > > > Previously this issue had been specifically called out when IFF_HIDDEN
> > > > > and the 1-netdev was proposed, but no one gives out a solution to this
> > > > > problem ever since. Please share your mind how to proceed and solve
> > > > > this userspace issue if netdev does not welcome a 1-netdev model.
> > > > Above says:
> > > >
> > > > there's no motivation in the systemd/udevd community at
> > > > this point to refactor the rename logic and make it work well with
> > > > 3-netdev.
> > > >
> > > > What would the fix be? Skip slave devices?
> > > >
> > > There's nothing user can get if just skipping slave devices - the
> > > name is still unchanged and unpredictable e.g. eth0, or eth1 the
> > > next reboot, while the rest may conform to the naming scheme (ens3
> > > and such). There's no way one can fix this in userspace alone - when
> > > the failover is created the enslaved netdev was opened by the kernel
> > > earlier than the userspace is made aware of, and there's no
> > > negotiation protocol for kernel to know when userspace has done
> > > initial renaming of the interface. I would expect netdev list should
> > > at least provide the direction in general for how this can be
> > > solved...

I was just wondering what did you mean when you said
"refactor the rename logic and make it work well with 3-netdev" -
was there a proposal udev rejected?

Anyway, can we write a time diagram for what happens in which order that
leads to failure? That would help look for triggers that we can tie
into, or add new ones.

> > >
> > Is there an issue if slave device names are not predictable? The user/admin scripts are expected
> > to only work with the master failover device.
> Where does this expectation come from?
>
> Admin users may have ethtool or tc configurations that need to deal with
> predictable interface name. Third-party app which was built upon specifying
> certain interface name can't be modified to chase dynamic names.
>
> Specifically, we have pre-canned image that uses ethtool to fine tune VF
> offload settings post boot for specific workload. Those images won't work
> well if the name is constantly changing just after couple rounds of live
> migration.

It should be possible to specify the ethtool configuration on the
master and have it automatically propagated to the slave.

BTW this is something we should look at IMHO.

> > Moreover, you were suggesting hiding the lower slave devices anyway. There was some discussion
> > about moving them to a hidden network namespace so that they are not visible from the default namespace.
> > I looked into this sometime back, but did not find the right kernel api to create a network namespace within
> > kernel. If so, we could use this mechanism to simulate a 1-netdev model.
> Yes, that's one possible implementation (IMHO the key is to make 1-netdev
> model as much transparent to a real NIC as possible, while a hidden netns is
> just the vehicle). However, I recall there was resistance around this
> discussion that even the concept of hiding itself is a taboo for Linux
> netdev. I would like to summon potential alternatives before concluding
> 1-netdev is the only solution too soon.
>
> Thanks,
> -Siwei

Your scripts would not work at all then, right?

> >
> > >
> > > -Siwei
> > >
> > >