All of lore.kernel.org
 help / color / mirror / Atom feed
From: Haiyang Zhang <haiyangz@microsoft.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	KY Srinivasan <kys@microsoft.com>,
	"olaf@aepfle.de" <olaf@aepfle.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"bjorn.helgaas@gmail.com" <bjorn.helgaas@gmail.com>,
	"apw@canonical.com" <apw@canonical.com>,
	"devel@linuxdriverproject.org" <devel@linuxdriverproject.org>,
	"leann.ogasawara@canonical.com" <leann.ogasawara@canonical.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>
Subject: RE: [PATCH 3/3] hv_netvsc: Implement VF matching based on serial numbers
Date: Fri, 9 Dec 2016 21:31:25 +0000	[thread overview]
Message-ID: <BLUPR03MB1412A251ACF16A543F20E254CA870@BLUPR03MB1412.namprd03.prod.outlook.com> (raw)
In-Reply-To: <20161209122935.2bd0779e@xeon-e3>



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Friday, December 9, 2016 3:30 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Greg KH <gregkh@linuxfoundation.org>; KY Srinivasan
> <kys@microsoft.com>; olaf@aepfle.de; linux-kernel@vger.kernel.org;
> bjorn.helgaas@gmail.com; apw@canonical.com; devel@linuxdriverproject.org;
> leann.ogasawara@canonical.com; jasowang@redhat.com
> Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on
> serial numbers
> 
> On Fri, 9 Dec 2016 20:09:49 +0000
> Haiyang Zhang <haiyangz@microsoft.com> wrote:
> 
> > > -----Original Message-----
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Friday, December 9, 2016 1:21 PM
> > > To: Greg KH <gregkh@linuxfoundation.org>
> > > Cc: KY Srinivasan <kys@microsoft.com>; olaf@aepfle.de; Haiyang Zhang
> > > <haiyangz@microsoft.com>; linux-kernel@vger.kernel.org;
> > > bjorn.helgaas@gmail.com; apw@canonical.com;
> devel@linuxdriverproject.org;
> > > leann.ogasawara@canonical.com; jasowang@redhat.com
> > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on
> > > serial numbers
> > >
> > > On Fri, 9 Dec 2016 08:31:22 +0100
> > > Greg KH <gregkh@linuxfoundation.org> wrote:
> > >
> > > > On Fri, Dec 09, 2016 at 12:05:53AM +0000, KY Srinivasan wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > > > > > Sent: Thursday, December 8, 2016 7:56 AM
> > > > > > To: KY Srinivasan <kys@microsoft.com>
> > > > > > Cc: linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> > > > > > olaf@aepfle.de; apw@canonical.com; vkuznets@redhat.com;
> > > > > > jasowang@redhat.com; leann.ogasawara@canonical.com;
> > > > > > bjorn.helgaas@gmail.com; Haiyang Zhang <haiyangz@microsoft.com>
> > > > > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching
> based on
> > > serial
> > > > > > numbers
> > > > > >
> > > > > > On Thu, Dec 08, 2016 at 12:33:43AM -0800,
> > > kys@exchange.microsoft.com
> > > > > > wrote:
> > > > > > > From: Haiyang Zhang <haiyangz@microsoft.com>
> > > > > > >
> > > > > > > We currently use MAC address to match VF and synthetic NICs.
> > > Hyper-V
> > > > > > > provides a serial number to both devices for this purpose.
> This
> > > patch
> > > > > > > implements the matching based on VF serial numbers. This is
> the
> > > way
> > > > > > > specified by the protocol and more reliable.
> > > > > > >
> > > > > > > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > > > > > > Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> > > > > > > ---
> > > > > > >  drivers/net/hyperv/netvsc_drv.c |   55
> > > > > > ++++++++++++++++++++++++++++++++++++---
> > > > > > >  1 files changed, 51 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/hyperv/netvsc_drv.c
> > > > > > b/drivers/net/hyperv/netvsc_drv.c
> > > > > > > index 9522763..c5778cf 100644
> > > > > > > --- a/drivers/net/hyperv/netvsc_drv.c
> > > > > > > +++ b/drivers/net/hyperv/netvsc_drv.c
> > > > > > > @@ -1165,9 +1165,10 @@ static void netvsc_free_netdev(struct
> > > > > > net_device *netdev)
> > > > > > >  	free_netdev(netdev);
> > > > > > >  }
> > > > > > >
> > > > > > > -static struct net_device *get_netvsc_bymac(const u8 *mac)
> > > > > > > +static struct net_device *get_netvsc_byvfser(u32 vfser)
> > > > > > >  {
> > > > > > >  	struct net_device *dev;
> > > > > > > +	struct net_device_context *ndev_ctx;
> > > > > > >
> > > > > > >  	ASSERT_RTNL();
> > > > > > >
> > > > > > > @@ -1175,7 +1176,8 @@ static void netvsc_free_netdev(struct
> > > net_device
> > > > > > *netdev)
> > > > > > >  		if (dev->netdev_ops != &device_ops)
> > > > > > >  			continue;	/* not a netvsc device */
> > > > > > >
> > > > > > > -		if (ether_addr_equal(mac, dev->perm_addr))
> > > > > > > +		ndev_ctx = netdev_priv(dev);
> > > > > > > +		if (ndev_ctx->vf_serial == vfser)
> > > > > > >  			return dev;
> > > > > > >  	}
> > > > > > >
> > > > > > > @@ -1205,21 +1207,66 @@ static void
> netvsc_free_netdev(struct
> > > > > > net_device *netdev)
> > > > > > >  	return NULL;
> > > > > > >  }
> > > > > > >
> > > > > > > +static u32 netvsc_get_vfser(struct net_device *vf_netdev)
> > > > > > > +{
> > > > > > > +	struct device *dev;
> > > > > > > +	struct hv_device *hdev;
> > > > > > > +	struct hv_pcibus_device *hbus = NULL;
> > > > > > > +	struct list_head *iter;
> > > > > > > +	struct hv_pci_dev *hpdev;
> > > > > > > +	unsigned long flags;
> > > > > > > +	u32 vfser = 0;
> > > > > > > +	u32 count = 0;
> > > > > > > +
> > > > > > > +	for (dev = &vf_netdev->dev; dev; dev = dev->parent) {
> > > > > >
> > > > > > You are going to walk the whole device tree backwards?  That's
> > > crazy.
> > > > > > And foolish.  And racy and broken (what happens if the tree
> > > changes
> > > > > > while you do this?)  Where is the lock being grabbed while
> this
> > > happens?
> > > > > > What about reference counts?  Do you see other drivers ever
> doing
> > > this
> > > > > > (if you do, point them out and I'll go yell at them too...)
> > > > >
> > > > > Greg,
> > > > >
> > > > > We are registering for netdev events. Coming into this function,
> the
> > > caller
> > > > > guarantees that the list of netdevs does not change - we assert
> this
> > > on entry:
> > > > > ASSERT_RTNL(). We are only walking up the device tree for the
> > > netdevs whose
> > > > > state change is being notified to us - the device tree being
> walked
> > > here is limited to
> > > > > netdevs under question.
> > > >
> > > > But a netdev is a child of some type of "real" device, and you are
> now
> > > > walking the tree of all devices up to the "root" parent device,
> which
> > > > means you will hit PCI bridges, USB controllers, and all sorts of
> fun
> > > > things if you are a child of those types of devices.
> > > >
> > > > And can't you tell if the netdev for this event, really is "your"
> > > > netdev?  Or are you getting called this for "all" netdevs?  Sorry,
> I
> > > > don't know this api, any pointers to it would be appreciated.
> > > >
> > > > > We have a reference to the device and we know the device is not
> > > going away. Is it not
> > > > > safe to dereference the parent pointer - after all the child has
> > > taken a reference on
> > > > > the parent as part of  device_add() call.
> > > >
> > > > It might be, and might not be.  There's a reason you don't see
> this
> > > > pattern anywhere in the kernel because of this...
> > > >
> > > > > > > +		if (!dev_is_vmbus(dev))
> > > > > > > +			continue;
> > > > > >
> > > > > > Ick.
> > > > > >
> > > > > > Why isn't your parent pointer a vmbus device all the time?
> How
> > > could
> > > > > > you get burried down in the device hierarchy when you are the
> > > driver for
> > > > > > a specific bus type in the first place?  How could this
> function
> > > ever be
> > > > > > called for a device that is NOT of this type?
> > > > >
> > > > > We get notified when state changes on any of the netdev devices
> in
> > > the system.
> > > > > Not all netdevs in the system belong to vmbus. Consider for
> instance
> > > the
> > > > > emulated NIC that can be configured. This is an emulated PCI NIC.
> We
> > > are only
> > > > > interested in netdevs that correspond to the VF instance that we
> are
> > > interested in.
> > > >
> > > > Can you "know" this is your netdev by some other way than having
> to
> > > walk
> > > > the device tree?  Name?  local device type?  Something else?  This
> > > seems
> > > > like an odd api in that everyone would have to do gyrations like
> this
> > > in
> > > > order to determine if the netdev is "theirs" or not...
> > >
> > > The scenario is SR-IOV on Hyper-V. In the case of VF device, the
> host
> > > hands the
> > > guest OS a PCI device for the virtual function device. The VF device
> is
> > > placed
> > > on a special synthetic PCI bus (ie not part of the other buses on
> the
> > > system).
> > > The VF device also has a synthetic network interface (netvsc) which
> > > lives
> > > on VMBUS.  This code is about managing the interaction between the
> two.
> > >
> > > The association between VF and synthetic NIC is done in response to
> the
> > > VF network device being registered. Initial version was based on MAC
> > > address
> > > which is the same.  Later refinement used permanent MAC address to
> > > avoid bugs if MAC address changed.  This version is to use serial
> number
> > > instead which is safer than MAC address.
> > >
> > > The code to walk up/down maybe not be needed to find serial number.
> > > Perhaps a more direct single set of conditions is possible?
> > >
> > > Something like:
> > >
> > > In pci-hyperv.c
> > >
> > > u32 hv_pcifront_get_serial(struct pci_bus *bus, unsigned int devfn)
> > > {
> > > 	struct hv_pcibus_device *hbus
> > > 		= container_of(bus->sysdata,
> > > 			       struct hv_pcibus_device, sysdata);
> > > 	struct hf_pci_dev *hpdev;
> > > 	u32 serial;
> > >
> > > 	hpdev = get_pcichild_wslot(hbus, devfn_to_wslot(pdev->devfn));
> > > 	if (!hpdev)
> > > 		return 0;
> > >
> > > 	serial = hpdev->devs.ser;
> > > 	put_pcichild(hpdev, hv_pcidev_ref_by_slot);
> > > 	return serial;
> > > }
> > >
> > > In netvsc_drv.c
> > >
> > > static u32 netvsc_get_vfser(struct net_device *vf_netdev)
> > > {
> > > 	struct device *dev = vf_netdev->dev.parent;
> > > 	struct pci_dev *pdev;
> > > 	u32 wslot;
> > >
> > > 	if (!dev || !dev_is_pci(dev))
> > > 		return 0;
> > >
> > > 	pdev = container_of(dev, struct pci_device, dev);
> > >
> > > 	return hv_pcifront_get_serial(pdev->bus, pdev->devfn);
> > > }
> > >
> > >
> > >
> > >
> > >
> > > P.S: it would be good to be able to get win_slot out through sysfs
> as
> > > well for systemd/udev.
> >
> > Stephen,
> >
> > Thanks for suggestion. Actually, in my earlier implementation of this
> > feature (VF serial based matching), I thought about export a function
> > from vPCI driver, then calling it from netvsc. So I don't need to
> > move structs between headers... But, it creates a dependency of netvsc
> > on vPCI driver's symbol. So, even if on a VM without SRIOV, we have to
> > load vPCI driver, which we don't want.
> >
> > Also, hv_vpci device is 3 parent layers above the vf_netdevice:
> > Here is the VF drv hierarchy --
> > Should we assume it's always 3 parents above vf_netdevice, or search
> for it?
> >
> > [  368.185259] HZINFO:NETDEV_REGISTER:
> > [  368.185261] HZINFO: dev:ffff88007c10d518, bus:          (null),
> busName:(null), drvName:(null)
> > [  368.185262] HZINFO: dev:ffff88007c10c0a0, bus:ffffffff81ce4b60,
> busName:pci, drvName:ixgbevf
> > [  368.185263] HZINFO: dev:ffff8800355c0000, bus:          (null),
> busName:(null), drvName:(null)
> > [  368.185264] HZINFO: dev:ffff8800355c5428, bus:ffffffffa0008160,
> busName:vmbus, drvName:hv_pci
> > [  368.185264] HZINFO: dev:ffff88007c49e268, bus:ffffffff81ce9800,
> busName:acpi, drvName:vmbus
> > [  368.185265] HZINFO: dev:ffff88007c48ea68, bus:ffffffff81ce9800,
> busName:acpi, drvName:(null)
> > [  368.185266] HZINFO: dev:ffff88007c48aa68, bus:ffffffff81ce9800,
> busName:acpi, drvName:(null)
> > [  368.185266] HZINFO: dev:ffff88007c48a268, bus:ffffffff81ce9800,
> busName:acpi, drvName:(null)
> > [  368.185267] HZINFO: dev:ffff88007c489a68, bus:ffffffff81ce9800,
> busName:acpi, drvName:(null)
> >
> > Thanks,
> > - Haiyang
> 
> Since this is a synthetic bus, the topology should not change unless
> host side
> software changes. The vf_netdev device has to be PCI device, so that is
> going to
> be certain.  After that there maybe intermediate up to hv_pci. The code
> in hyperv-pci
> already has similar stuff (ie for read_config).

Other netdevice, like emulated NIC can also trigger this notification.
They are not vPCI.

Thanks,
- Haiyang

  reply	other threads:[~2016-12-09 21:45 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-08  8:33 [PATCH 0/3] Drivers: hv: Implement VF association based on serial number kys
2016-12-08  8:33 ` [PATCH 1/3] hyperv: Move hv_pci_dev and related structs to hyperv.h kys
2016-12-08  8:33   ` [PATCH 2/3] hyperv: Add a function to detect if the device is a vmbus dev kys
2016-12-08  8:33   ` [PATCH 3/3] hv_netvsc: Implement VF matching based on serial numbers kys
2016-12-08 15:56     ` Greg KH
2016-12-09  0:05       ` KY Srinivasan
2016-12-09  7:31         ` Greg KH
2016-12-09 18:20           ` Stephen Hemminger
2016-12-09 20:09             ` Haiyang Zhang
2016-12-09 20:29               ` Stephen Hemminger
2016-12-09 21:31                 ` Haiyang Zhang [this message]
2016-12-09 21:45                   ` Stephen Hemminger
2016-12-09 21:53                     ` Haiyang Zhang
2016-12-09 22:05                       ` Stephen Hemminger
2016-12-09 22:35                         ` Haiyang Zhang
2016-12-10  0:21                           ` Stephen Hemminger
2016-12-10 12:20                             ` Greg KH
2016-12-14 23:18                               ` Haiyang Zhang
2016-12-14 23:27                                 ` Greg KH
2016-12-14 23:51                                   ` Stephen Hemminger
2016-12-16  1:11                                     ` KY Srinivasan
2016-12-16 15:20                                       ` Haiyang Zhang
2016-12-16 18:39                                         ` KY Srinivasan
2016-12-16 16:45                                     ` Greg KH
2016-12-08  9:49   ` [PATCH 1/3] hyperv: Move hv_pci_dev and related structs to hyperv.h kbuild test robot
2016-12-08 15:26     ` KY Srinivasan
2016-12-08 15:56       ` gregkh
2016-12-08 16:54         ` KY Srinivasan
2016-12-08 11:15   ` kbuild test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BLUPR03MB1412A251ACF16A543F20E254CA870@BLUPR03MB1412.namprd03.prod.outlook.com \
    --to=haiyangz@microsoft.com \
    --cc=apw@canonical.com \
    --cc=bjorn.helgaas@gmail.com \
    --cc=devel@linuxdriverproject.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jasowang@redhat.com \
    --cc=kys@microsoft.com \
    --cc=leann.ogasawara@canonical.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=olaf@aepfle.de \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.