netdev.vger.kernel.org archive mirror
From: Stephen Hemminger <stephen@networkplumber.org>
To: si-wei liu <si-wei.liu@oracle.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	"Samudrala, Sridhar" <sridhar.samudrala@intel.com>,
	Siwei Liu <loseweigh@gmail.com>, Jiri Pirko <jiri@resnulli.us>,
	David Miller <davem@davemloft.net>,
	Netdev <netdev@vger.kernel.org>,
	virtualization@lists.linux-foundation.org,
	virtio-dev <virtio-dev@lists.oasis-open.org>,
	"Brandeburg, Jesse" <jesse.brandeburg@intel.com>,
	Alexander Duyck <alexander.h.duyck@intel.com>,
	Jakub Kicinski <kubakici@wp.pl>, Jason Wang <jasowang@redhat.com>,
	liran.alon@oracle.com
Subject: Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)
Date: Mon, 25 Feb 2019 17:39:12 -0800
Message-ID: <20190225173912.26b93422@shemminger-XPS-13-9360>
In-Reply-To: <e6a53bd1-83ab-f170-406a-03276e8c87e2@oracle.com>

On Mon, 25 Feb 2019 16:58:07 -0800
si-wei liu <si-wei.liu@oracle.com> wrote:

> On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:
> > On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:  
> >>
> >> On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:  
> >>>
> >>> On 2/21/2019 7:33 PM, si-wei liu wrote:  
> >>>>
> >>>> On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:  
> >>>>> On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:  
> >>>>>> Sorry for replying to this ancient thread. There is a remaining
> >>>>>> issue that I don't think the initial net_failover patch addressed
> >>>>>> cleanly, see:
> >>>>>>
> >>>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
> >>>>>>
> >>>>>> The renaming of 'eth0' to 'ens4' fails because the udev userspace was
> >>>>>> not specifically written for such kernel automatic enslavement.
> >>>>>> Specifically, if it is a bond or team, the slave would typically get
> >>>>>> renamed *before* virtual device gets created, that's what udev can
> >>>>>> control (without getting netdev opened early by the other part of
> >>>>>> kernel) and other userspace components for e.g. initramfs,
> >>>>>> init-scripts can coordinate well in between. The in-kernel
> >>>>>> auto-enslavement of net_failover breaks this userspace convention
> >>>>>> and provides no solution if users care about consistent naming
> >>>>>> on the slave netdevs specifically.
> >>>>>>
> >>>>>> Previously this issue was specifically called out when IFF_HIDDEN
> >>>>>> and the 1-netdev model were proposed, but no one has offered a solution
> >>>>>> to this problem since. Please share your thoughts on how to proceed and
> >>>>>> solve this userspace issue if netdev does not welcome a 1-netdev model.  
> >>>>> Above says:
> >>>>>
> >>>>>      there's no motivation in the systemd/udevd community at
> >>>>>      this point to refactor the rename logic and make it work well with
> >>>>>      3-netdev.
> >>>>>
> >>>>> What would the fix be? Skip slave devices?
> >>>>>  
> >>>> There's nothing the user gains by just skipping slave devices - the
> >>>> name is still unchanged and unpredictable, e.g. eth0, or eth1 on the
> >>>> next reboot, while the rest may conform to the naming scheme (ens3
> >>>> and such). There's no way one can fix this in userspace alone - when
> >>>> the failover is created the enslaved netdev was opened by the kernel
> >>>> earlier than the userspace is made aware of, and there's no
> >>>> negotiation protocol for kernel to know when userspace has done
> >>>> initial renaming of the interface. I would expect netdev list should
> >>>> at least provide the direction in general for how this can be
> >>>> solved...  
> >
> > I was just wondering what did you mean when you said
> > "refactor the rename logic and make it work well with 3-netdev" -
> > was there a proposal udev rejected?  
> No. I never believed this particular issue can be fixed in userspace
> alone. Previously someone said it could be, but I have never seen any work
> or relevant discussion happen in the various userspace communities
> (e.g. dracut, initramfs-tools, systemd, udev, and NetworkManager).
> IMHO the root of the issue lies in the kernel, so it makes more sense
> to start from netdev and decide on a solution: see what can be
> done in the kernel to fix it, then engage the userspace
> community on feasibility...
> 
> > Anyway, can we write a time diagram for what happens in which order that
> > leads to failure?  That would help look for triggers that we can tie
> > into, or add new ones.
> >  
> 
> See attached diagram.
> 
> >
> >
> >
> >  
> >>> Is there an issue if slave device names are not predictable? The user/admin scripts are expected
> >>> to only work with the master failover device.  
> >> Where does this expectation come from?
> >>
> >> Admin users may have ethtool or tc configurations that need to deal with
> >> predictable interface name. Third-party app which was built upon specifying
> >> certain interface name can't be modified to chase dynamic names.
> >>
> >> Specifically, we have pre-canned image that uses ethtool to fine tune VF
> >> offload settings post boot for specific workload. Those images won't work
> >> well if the name is constantly changing just after couple rounds of live
> >> migration.  
> > It should be possible to specify the ethtool configuration on the
> > master and have it automatically propagated to the slave.
> >
> > BTW this is something we should look at IMHO.  
> I was elaborating a few examples that the expectation and assumption 
> that user/admin scripts only deal with master failover device is 
> incorrect. It had never been taken good care of, although I did try to 
> emphasize it from the very beginning.
> 
> Basically what you said about propagating the ethtool configuration down
> to the slave is the key goal of the 1-netdev model. However, what I am
> seeking now is any alternative that can also fix the specific udev 
> rename problem, before concluding that 1-netdev is the only solution. 
> Generally a 1-netdev scheme would take time to implement, while I'm 
> trying to find a way out to fix this particular naming problem under 
> 3-netdev.
> 
> >  
> >>> Moreover, you were suggesting hiding the lower slave devices anyway. There was some discussion
> >>> about moving them to a hidden network namespace so that they are not visible from the default namespace.
> >>> I looked into this sometime back, but did not find the right kernel api to create a network namespace within
> >>> kernel. If so, we could use this mechanism to simulate a 1-netdev model.  
> >> Yes, that's one possible implementation (IMHO the key is to make 1-netdev
> >> model as much transparent to a real NIC as possible, while a hidden netns is
> >> just the vehicle). However, I recall there was resistance around this
> >> discussion that even the concept of hiding itself is a taboo for Linux
> >> netdev. I would like to summon potential alternatives before concluding
> >> 1-netdev is the only solution too soon.
> >>
> >> Thanks,
> >> -Siwei  
> > Your scripts would not work at all then, right?  
> At this point we don't claim images with such usage as SR-IOV live
> migrate-able. We would not flag them as live migrate-able until this ethtool
> config issue is fully addressed and a transparent live migration
> solution eventually emerges upstream.

The Hyper-V netvsc driver, which follows the 1-netdev model, uses a timeout to
give udev a chance to do its rename. I proposed a patch to key the state change
off the udev rename instead, but that patch was rejected.
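
To make the ordering problem concrete, here is a toy user-space model of the
race. It is only a sketch and not the netvsc or net_failover code. ToyNetdev,
simulate() and the tick values are made up for illustration, and the model
assumes that a rename is refused with EBUSY while the device is up.

```python
import errno


class ToyNetdev:
    """Toy stand-in for a slave net device (hypothetical, not a kernel struct)."""

    def __init__(self, name):
        self.name = name
        self.up = False

    def open(self):
        # Models the kernel bringing the slave up when it is enslaved.
        self.up = True

    def rename(self, new_name):
        # Assumption of this model: renaming an interface that is up fails.
        if self.up:
            return -errno.EBUSY
        self.name = new_name
        return 0


def simulate(open_at, rename_at):
    """Replay the kernel open and the udev rename in time order.

    Times are arbitrary ticks; use distinct values for the two events.
    Returns (final_name, rename_return_code).
    """
    dev = ToyNetdev("eth0")
    rc = None
    for _, kind in sorted([(open_at, "open"), (rename_at, "rename")]):
        if kind == "open":
            dev.open()
        else:
            rc = dev.rename("ens4")
    return dev.name, rc


# Auto-enslavement opens the slave before udev runs: the rename fails.
print(simulate(open_at=0, rename_at=5))
# Open delayed by a long timeout: udev wins the race and the rename sticks.
print(simulate(open_at=2000, rename_at=5))
# Timeout shorter than udev's latency: still fails, so a timeout is a heuristic.
print(simulate(open_at=2, rename_at=5))
```

In this model the delayed open only helps when the timeout is longer than the
time udev takes, which is the reason for wanting to key the state change off
the rename itself.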


