netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: "Samudrala, Sridhar" <sridhar.samudrala@intel.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	David Miller <davem@davemloft.net>,
	Netdev <netdev@vger.kernel.org>,
	virtualization@lists.linux-foundation.org,
	virtio-dev@lists.oasis-open.org, "Brandeburg,
	Jesse" <jesse.brandeburg@intel.com>,
	"Duyck, Alexander H" <alexander.h.duyck@intel.com>,
	Jakub Kicinski <kubakici@wp.pl>,
	achiad shochat <achiad.mellanox@gmail.com>,
	Achiad Shochat <achiad@mellanox.com>
Subject: Re: [virtio-dev] [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit
Date: Mon, 22 Jan 2018 23:31:51 +0200	[thread overview]
Message-ID: <20180122231713-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CAKgT0UeyNvVQc11KXc3updJfa9p7a9NcfRC=gP6=ktkjrSkOag@mail.gmail.com>

On Wed, Jan 17, 2018 at 01:49:58PM -0800, Alexander Duyck wrote:
> On Wed, Jan 17, 2018 at 11:57 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Wed, Jan 17, 2018 at 11:25:41AM -0800, Samudrala, Sridhar wrote:
> >>
> >>
> >> On 1/17/2018 11:02 AM, Michael S. Tsirkin wrote:
> >> > On Wed, Jan 17, 2018 at 10:15:52AM -0800, Alexander Duyck wrote:
> >> > > On Thu, Jan 11, 2018 at 9:58 PM, Sridhar Samudrala
> >> > > <sridhar.samudrala@intel.com> wrote:
> >> > > > This feature bit can be used by hypervisor to indicate virtio_net device to
> >> > > > act as a backup for another device with the same MAC address.
> >> > > >
> >> > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> > > > ---
> >> > > >   drivers/net/virtio_net.c        | 2 +-
> >> > > >   include/uapi/linux/virtio_net.h | 3 +++
> >> > > >   2 files changed, 4 insertions(+), 1 deletion(-)
> >> > > >
> >> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> > > > index 12dfc5fee58e..f149a160a8c5 100644
> >> > > > --- a/drivers/net/virtio_net.c
> >> > > > +++ b/drivers/net/virtio_net.c
> >> > > > @@ -2829,7 +2829,7 @@ static struct virtio_device_id id_table[] = {
> >> > > >          VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
> >> > > >          VIRTIO_NET_F_CTRL_MAC_ADDR, \
> >> > > >          VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> >> > > > -       VIRTIO_NET_F_SPEED_DUPLEX
> >> > > > +       VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_BACKUP
> >> > > >
> >> > > >   static unsigned int features[] = {
> >> > > >          VIRTNET_FEATURES,
> >> > > > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> >> > > > index 5de6ed37695b..c7c35fd1a5ed 100644
> >> > > > --- a/include/uapi/linux/virtio_net.h
> >> > > > +++ b/include/uapi/linux/virtio_net.h
> >> > > > @@ -57,6 +57,9 @@
> >> > > >                                           * Steering */
> >> > > >   #define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */
> >> > > >
> >> > > > +#define VIRTIO_NET_F_BACKUP      62    /* Act as backup for another device
> >> > > > +                                        * with the same MAC.
> >> > > > +                                        */
> >> > > >   #define VIRTIO_NET_F_SPEED_DUPLEX 63   /* Device set linkspeed and duplex */
> >> > > >
> >> > > >   #ifndef VIRTIO_NET_NO_LEGACY
> >> > > I'm not a huge fan of the name "backup" since that implies that the
> >> > > Virtio interface is only used if the VF is not present, and there are
> >> > > multiple instances such as dealing with east/west or
> >> > > broadcast/multicast traffic where it may be desirable to use the
> >> > > para-virtual interface rather then deal with PCI overhead/bottleneck
> >> > > to send the packet.
> >> > Right now hypervisors mostly expect that yes, only one at a time is
> >> > used.  E.g. if you try to do multicast sending packets on both VF and
> >> > virtio then you will end up with two copies of each packet.
> >>
> >> I think we want to use only 1 interface to  send out any packet. In case of
> >> broadcast/multicasts it would be an optimization to send them via virtio and
> >> this patch series adds that optimization.
> >
> > Right that's what I think we should rather avoid for now.
> >
> > It's *not* an optimization if there's a single VM on this host,
> > or if a specific multicast group does not have any VMs on same
> > host.
> 
> Agreed. In my mind this is something that is controlled by the
> pass-thru interface once it is enslaved.

It would be pretty tricky to control through the PT
interface since a PT interface pretends to be a physical
device, which has no concept of VMs.

> > I'd rather we just sent everything out on the PT if that's
> > there. The reason we have virtio in the picture is just so
> > we can migrate without downtime.
> 
> I wasn't saying we do that in all cases. That would be something that
> would have to be decided by the pass-thru interface. Ideally the
> virtio would provide just enough information to get itself into the
> bond and I see this being the mechanism for it to do so. From there
> the complexity mostly lies in the pass-thru interface to configure the
> correct transmit modes if for example you have multiple pass-thru
> interfaces or a more complex traffic setup due to things like
> SwitchDev.
> 
> In my mind we go the bonding route and there are few use cases for all
> of this. First is the backup case that is being addressed here. That
> becomes your basic "copy netvsc" approach for this which would be
> default. It is how we would handle basic pass-thru back-up paths. If
> the host decides to send multicast/broadcast traffic from the host up
> through it that is a host side decision. I am okay with our default
> transmit behavior from the guest being to send everything through the
> pass-thru interface. All the virtio would be doing is advertising that
> it is a side channel for some sort of bond with another interface. By
> calling it a side channel it implies that it isn't the direct channel
> for the interface, but it is an alternative communications channel for
> things like backup, and other future uses.
> 
> There end up being a few different "phase 2" concepts I have floating
> around which I want to avoid limiting. Solving the east/west problem
> is an example. In my mind that just becomes a bonding Tx mode that is
> controlled via the pass-thru interface. The VF could black-list the
> local addresses so that they instead fall into the virtio interface.
> In addition I seem to recall watching a presentation from Mellanox
> where they were talking about a VF per NUMA node in a system which
> would imply multiple VFs with the same MAC address. I'm not sure how
> the current patch handles that. Obviously that would require a more
> complex load balancing setup. The bonding solution ends up being a way
> to resolve that so that they could just have it take care of picking
> the right Tx queue based on the NUMA affinity and fall back to the
> virtio/netvsc when those fail.

The way I see it, we'd need to pass a bunch of extra information
host to guest, and we'd have to use a PV interface for it.
When we do this, we'll need to have another
feature bit, and we can call it SIDE_CHANNEL or whatever.


> >> In the receive path,  the broadcasts should only go the PF and reach the VM
> >> via vitio so that the VM doesn't see duplicate broadcasts.
> >>
> >>
> >> >
> >> > To me the east/west scenario looks like you want something
> >> > more similar to a bridge on top of the virtio/PT pair.
> >> >
> >> > So I suspect that use-case will need a separate configuration bit,
> >> > and possibly that's when you will want something more powerful
> >> > such as a full bridge.
> >>
> >> east-west optimization of unicasts would be a harder problem to solve as
> >> somehow the hypervisor needs to indicate the VM about the local dest. macs
> >
> > Using a bridge with a dedicated device for east/west would let
> > bridge use standard learning techniques for that perhaps?
> 
> I'm not sure about having the guest have to learn.

It's certainly a way to do this, but certainly not the only one.

> In my mind the VF
> should know who is local and who isn't.

Right. But note that these things change.

> In the case of SwitchDev it
> should be possible for the port representors and the switch to provide
> data on which interfaces are bonded on the host side and which aren't.
> With that data it would be pretty easy to just put together a list of
> addresses that would prefer to go the para-virtual route instead of
> being transmitted through physical hardware.
> 
> In addition a bridge implies much more overhead since normally a
> bridge can receive a packet in on one interface and transmit it on
> another. We don't really need that. This is more of a VEPA type setup
> and doesn't need to be anything all that complex. You could probably
> even handle the Tx queue selection via a simple eBPF program and map
> since the input for whatever is used to select Tx should be pretty
> simple, destination MAC, source NUMA node, etc, and the data-set
> shouldn't be too large.

That sounds interesting. A separate device might make this kind of setup
a bit easier.  Sridhar, did you look into creating a separate device for
the virtual bond device at all?  It does not have to be in a separate
module, that kind of refactoring can come later, but once we commit to
using the same single device as virtio, we can't change that.


> >> > > What if instead of BACKUP we used the name SIDE_CHANNEL? Basically it
> >> > > is a bit of double entendre as we are using the physical MAC address
> >> > > to provide configuration information, and then in addition this
> >> > > interface acts as a secondary channel for passing frames to and from
> >> > > the guest rather than just using the VF.
> >> > >
> >> > > Just a thought.
> >> > >
> >> > > Thanks.
> >> > >
> >> > > - Alex
> >> > I just feel that's a very generic name, not conveying enough information
> >> > about how they hypervisor expects the pair of devices to be used.
> 
> I would argue that BACKUP is too specific. I can see many other uses
> other than just being a backup interface

True but the ones you listed above all seem to require virtio
changes anyway, we'll be able to add a new feature or
rename this one as we make them.

> and I don't want us to box
> ourselves in. I agree that it makes sense for active/backup to be the
> base mode, but I really don't think it conveys all of the
> possibilities since I see this as possibly being able to solve
> multiple issues as this evolves. I'm also open to other suggestions.
> We could go with PASSTHRU_SIDE_CHANNEL for instance, I just didn't
> suggest that since it seemed kind of wordy.

There's only so much info a single bit can confer :)
So we can always rename later, the point is to draw the line
and say "this is the functionality current hosts expect".

-- 
MST

  reply	other threads:[~2018-01-22 21:31 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-12  5:58 [RFC PATCH net-next v2 0/2] Enable virtio to act as a backup for a passthru device Sridhar Samudrala
2018-01-12  5:58 ` [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit Sridhar Samudrala
2018-01-17 18:15   ` Alexander Duyck
2018-01-17 19:02     ` [virtio-dev] " Michael S. Tsirkin
2018-01-17 19:25       ` Samudrala, Sridhar
2018-01-17 19:57         ` [virtio-dev] " Michael S. Tsirkin
2018-01-17 21:49           ` Alexander Duyck
2018-01-22 21:31             ` Michael S. Tsirkin [this message]
2018-01-22 23:27               ` [virtio-dev] " Samudrala, Sridhar
2018-01-23  0:02                 ` Stephen Hemminger
2018-01-23  1:37                   ` Samudrala, Sridhar
2018-01-23  0:05                 ` Michael S. Tsirkin
2018-01-23  0:16                   ` Jakub Kicinski
2018-01-23  0:47                     ` Michael S. Tsirkin
2018-01-23  1:13                       ` Jakub Kicinski
2018-01-23  1:23                         ` Michael S. Tsirkin
2018-01-23 19:21                           ` Jakub Kicinski
2018-01-23  1:34                   ` Samudrala, Sridhar
2018-01-23  2:04                     ` Michael S. Tsirkin
2018-01-23  3:36                       ` [virtio-dev] " Alexander Duyck
2018-01-23  5:54                         ` Samudrala, Sridhar
2018-01-23 23:01                         ` Michael S. Tsirkin
2018-01-12  5:58 ` [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available Sridhar Samudrala
2018-01-22 20:27   ` Siwei Liu
2018-01-22 21:05     ` Samudrala, Sridhar
2018-01-23 19:53       ` Laine Stump
2018-01-22 21:41     ` Michael S. Tsirkin
2018-01-23 20:24       ` Siwei Liu
2018-01-23 22:58         ` Michael S. Tsirkin
2018-01-26  8:14           ` Siwei Liu
2018-01-26 16:51             ` Samudrala, Sridhar
2018-01-26 21:46               ` Siwei Liu
2018-01-26 22:14                 ` [virtio-dev] " Michael S. Tsirkin
2018-01-26 22:47                   ` Jakub Kicinski
2018-01-26 23:30                     ` Samudrala, Sridhar
2018-01-27  2:30                       ` Jakub Kicinski
2018-01-27  5:33                         ` Samudrala, Sridhar
2018-01-27  5:58                           ` Jakub Kicinski
2018-01-28 17:35                             ` Alexander Duyck
2018-01-28 19:18                               ` [virtio-dev] " Samudrala, Sridhar
2018-01-28 20:18                                 ` Alexander Duyck
2018-01-28 21:01                                   ` [virtio-dev] " Samudrala, Sridhar
2018-01-29  0:58                                     ` Alexander Duyck
2018-01-28 23:02                         ` Stephen Hemminger
2018-01-29  4:26                           ` Alexander Duyck
2018-01-29 18:24                             ` Michael S. Tsirkin
2018-01-29 20:09                               ` Alexander Duyck
2018-01-23 10:33   ` Jason Wang
2018-01-23 16:03     ` Samudrala, Sridhar
2018-01-29  3:32       ` Jason Wang
2018-01-26 16:58   ` Michael S. Tsirkin
2018-01-26 18:15     ` Samudrala, Sridhar
2018-01-12  5:58 ` [RFC PATCH 1/1] qemu: Introduce VIRTIO_NET_F_BACKUP feature bit to virtio_net Sridhar Samudrala

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180122231713-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=achiad.mellanox@gmail.com \
    --cc=achiad@mellanox.com \
    --cc=alexander.duyck@gmail.com \
    --cc=alexander.h.duyck@intel.com \
    --cc=davem@davemloft.net \
    --cc=jesse.brandeburg@intel.com \
    --cc=kubakici@wp.pl \
    --cc=netdev@vger.kernel.org \
    --cc=sridhar.samudrala@intel.com \
    --cc=stephen@networkplumber.org \
    --cc=virtio-dev@lists.oasis-open.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).