From: Siwei Liu <loseweigh@gmail.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Sridhar Samudrala <sridhar.samudrala@intel.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	David Miller <davem@davemloft.net>,
	Netdev <netdev@vger.kernel.org>, Jiri Pirko <jiri@resnulli.us>,
	virtio-dev@lists.oasis-open.org, "Brandeburg,
	Jesse" <jesse.brandeburg@intel.com>,
	Alexander Duyck <alexander.h.duyck@intel.com>,
	Jakub Kicinski <kubakici@wp.pl>
Subject: Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
Date: Fri, 2 Mar 2018 15:56:31 -0800
Message-ID: <CADGSJ22VUgJzi6B=Bh4M6Bado1CQEEJvRR1VJ=oC47G2SJ0DEA@mail.gmail.com>
In-Reply-To: <20180302233443-mutt-send-email-mst@kernel.org>

On Fri, Mar 2, 2018 at 1:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Mar 02, 2018 at 01:11:56PM -0800, Siwei Liu wrote:
>> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
>> <sridhar.samudrala@intel.com> wrote:
>> > This patch enables virtio_net to switch over to a VF datapath when a VF
>> > netdev is present with the same MAC address. It allows live migration
>> > of a VM with a direct attached VF without the need to setup a bond/team
>> > between a VF and virtio net device in the guest.
>> >
>> > The hypervisor needs to enable only one datapath at any time so that
>> > packets don't get looped back to the VM over the other datapath. When a VF
>> > is plugged, the virtio datapath link state can be marked as down. The
>> > hypervisor needs to unplug the VF device from the guest on the source host
>> > and reset the MAC filter of the VF to initiate failover of datapath to
>> > virtio before starting the migration. After the migration is completed,
>> > the destination hypervisor sets the MAC filter on the VF and plugs it back
>> > to the guest to switch over to VF datapath.
>> >
>> > When BACKUP feature is enabled, an additional netdev(bypass netdev) is
>> > created that acts as a master device and tracks the state of the 2 lower
>> > netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
>> > passthru device with the same MAC is registered as 'active' netdev.
>> >
>> > This patch is based on the discussion initiated by Jesse on this thread.
>> > https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>> >
>> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> > Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> > Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> > ---
>> >  drivers/net/virtio_net.c | 683 ++++++++++++++++++++++++++++++++++++++++++++++-
>> >  1 file changed, 682 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> > index bcd13fe906ca..f2860d86c952 100644
>> > --- a/drivers/net/virtio_net.c
>> > +++ b/drivers/net/virtio_net.c
>> > @@ -30,6 +30,8 @@
>> >  #include <linux/cpu.h>
>> >  #include <linux/average.h>
>> >  #include <linux/filter.h>
>> > +#include <linux/netdevice.h>
>> > +#include <linux/pci.h>
>> >  #include <net/route.h>
>> >  #include <net/xdp.h>
>> >
>> > @@ -206,6 +208,9 @@ struct virtnet_info {
>> >         u32 speed;
>> >
>> >         unsigned long guest_offloads;
>> > +
>> > +       /* upper netdev created when BACKUP feature enabled */
>> > +       struct net_device *bypass_netdev;
>> >  };
>> >
>> >  struct padded_vnet_hdr {
>> > @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>> >         }
>> >  }
>> >
>> > +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>> > +                                     size_t len)
>> > +{
>> > +       struct virtnet_info *vi = netdev_priv(dev);
>> > +       int ret;
>> > +
>> > +       if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
>> > +               return -EOPNOTSUPP;
>> > +
>> > +       ret = snprintf(buf, len, "_bkup");
>> > +       if (ret >= len)
>> > +               return -EOPNOTSUPP;
>> > +
>> > +       return 0;
>> > +}
>> > +
>>
>> What if the systemd/udevd is not new enough to enforce the
>> n<phys_port_name> naming? Would virtio_bypass get a different name
>> than the original virtio_net?
>
> You mean people using ethX names? Any hardware config change breaks
> these, I don't think that can be helped.

I don't like relying on .ndo_get_phys_port_name - it's fragile
and it does not completely solve the problem it tries to address.
Imagine what we can end up with given an old udevd, or users who already
have existing explicit udev rules around phys_port_name. It does not
give you an ack saying "yes, I know you're the bypass and
you're the backup, please continue and I will give you both correct
names", or a nack saying "no, I don't know what these
extra interfaces are, please go back and leave the VF device alone".
We need a new udev API for both feature negotiation and naming, or we
may even completely hide the lower interfaces.
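
To make the naming concern concrete, here is a minimal userspace sketch. It
is not udev code: compose_ifname() is a hypothetical helper that only mimics
the n<phys_port_name> suffix convention mentioned above, and the base name
"ens3" is an assumed example. It also assumes the bypass netdev and the
virtio_net netdev would otherwise resolve to the same base name.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only, not part of udev or the patch).
 * Appends "n<phys_port_name>" to a base name when the attribute is
 * available; otherwise the base name is used unchanged, which is what a
 * naming policy unaware of phys_port_name would effectively produce.
 * Returns 0 on success, -1 if the result does not fit in 'out'.
 */
static int compose_ifname(const char *base, const char *phys_port_name,
                          char *out, size_t len)
{
        int ret;

        if (phys_port_name && phys_port_name[0] != '\0')
                ret = snprintf(out, len, "%sn%s", base, phys_port_name);
        else
                ret = snprintf(out, len, "%s", base);

        /* snprintf returns int; compare only after ruling out errors. */
        if (ret < 0 || (size_t)ret >= len)
                return -1;

        return 0;
}
```

With the suffix honored, the backup would be named "ens3n_bkup" and the
bypass netdev could keep "ens3". If the attribute is ignored (passed as NULL
here), both devices compute "ens3", and nothing tells the driver that the
distinction was lost.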

>
>> Should we detect this earlier and fall
>> back to legacy mode without creating the bypass netdev and enslaving
>> the VF?
>
> I don't think we can do this with existing kernel/userspace APIs.

That's why I said earlier that we should make udev aware of this new type
of combined device instead of scattering hacks here and there.

Regards,
-Siwei

>
> --
> MST
