From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932110AbcLIWFT (ORCPT ); Fri, 9 Dec 2016 17:05:19 -0500 Received: from mail-pg0-f41.google.com ([74.125.83.41]:34659 "EHLO mail-pg0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753079AbcLIWFS (ORCPT ); Fri, 9 Dec 2016 17:05:18 -0500 Date: Fri, 9 Dec 2016 14:05:09 -0800 From: Stephen Hemminger To: Haiyang Zhang Cc: Greg KH , KY Srinivasan , "olaf@aepfle.de" , "linux-kernel@vger.kernel.org" , "bjorn.helgaas@gmail.com" , "apw@canonical.com" , "devel@linuxdriverproject.org" , "leann.ogasawara@canonical.com" , "jasowang@redhat.com" Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on serial numbers Message-ID: <20161209140509.1dd7cad2@xeon-e3> In-Reply-To: References: <1481185988-30383-1-git-send-email-kys@exchange.microsoft.com> <1481186023-30429-1-git-send-email-kys@exchange.microsoft.com> <1481186023-30429-3-git-send-email-kys@exchange.microsoft.com> <20161208155604.GA4601@kroah.com> <20161209073122.GB1513@kroah.com> <20161209102030.1f550c5a@xeon-e3> <20161209122935.2bd0779e@xeon-e3> <20161209134510.22087f81@xeon-e3> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 9 Dec 2016 21:53:49 +0000 Haiyang Zhang wrote: > > -----Original Message----- > > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > > Sent: Friday, December 9, 2016 4:45 PM > > To: Haiyang Zhang > > Cc: Greg KH ; KY Srinivasan > > ; olaf@aepfle.de; linux-kernel@vger.kernel.org; > > bjorn.helgaas@gmail.com; apw@canonical.com; devel@linuxdriverproject.org; > > leann.ogasawara@canonical.com; jasowang@redhat.com > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on > > serial numbers > > > > On Fri, 9 Dec 2016 21:31:25 +0000 > > Haiyang Zhang wrote: > > > > > > -----Original Message----- > > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > > > > Sent: Friday, December 9, 2016 3:30 PM > > > > To: Haiyang Zhang > > > > Cc: Greg KH ; KY Srinivasan > > > > ; olaf@aepfle.de; linux-kernel@vger.kernel.org; > > > > bjorn.helgaas@gmail.com; apw@canonical.com; > > devel@linuxdriverproject.org; > > > > leann.ogasawara@canonical.com; jasowang@redhat.com > > > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on > > > > serial numbers > > > > > > > > On Fri, 9 Dec 2016 20:09:49 +0000 > > > > Haiyang Zhang wrote: > > > > > > > > > > -----Original Message----- > > > > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > > > > > > Sent: Friday, December 9, 2016 1:21 PM > > > > > > To: Greg KH > > > > > > Cc: KY Srinivasan ; olaf@aepfle.de; Haiyang > > Zhang > > > > > > ; linux-kernel@vger.kernel.org; > > > > > > bjorn.helgaas@gmail.com; apw@canonical.com; > > > > devel@linuxdriverproject.org; > > > > > > leann.ogasawara@canonical.com; jasowang@redhat.com > > > > > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based > > on > > > > > > serial numbers > > > > > > > > > > > > On Fri, 9 Dec 2016 08:31:22 +0100 > > > > > > Greg KH wrote: > > > > > > > > > > > > > On Fri, Dec 09, 2016 at 12:05:53AM +0000, KY Srinivasan wrote: > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > From: Greg KH [mailto:gregkh@linuxfoundation.org] > > > > > > > > > Sent: Thursday, December 8, 2016 7:56 AM > > > > > > > > > To: KY Srinivasan > > > > > > > > > Cc: linux-kernel@vger.kernel.org; > > devel@linuxdriverproject.org; > > > > > > > > > olaf@aepfle.de; apw@canonical.com; vkuznets@redhat.com; > > > > > > > > > jasowang@redhat.com; leann.ogasawara@canonical.com; > > > > > > > > > bjorn.helgaas@gmail.com; Haiyang Zhang > > > > > > > > > > > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching > > > > based on > > > > > > serial > > > > > > > > > numbers > > > > > > > > > > > > > > > > > > On Thu, Dec 08, 2016 at 12:33:43AM -0800, > > > > > > kys@exchange.microsoft.com > > > > > > > > > wrote: > > > > > > > > > > From: Haiyang Zhang > > > > > > > > > > > > > > > > > > > > We currently use MAC address to match VF and synthetic > > NICs. > > > > > > Hyper-V > > > > > > > > > > provides a serial number to both devices for this > > purpose. > > > > This > > > > > > patch > > > > > > > > > > implements the matching based on VF serial numbers. This > > is > > > > the > > > > > > way > > > > > > > > > > specified by the protocol and more reliable. > > > > > > > > > > > > > > > > > > > > Signed-off-by: Haiyang Zhang > > > > > > > > > > Signed-off-by: K. Y. Srinivasan > > > > > > > > > > --- > > > > > > > > > > drivers/net/hyperv/netvsc_drv.c | 55 > > > > > > > > > ++++++++++++++++++++++++++++++++++++--- > > > > > > > > > > 1 files changed, 51 insertions(+), 4 deletions(-) > > > > > > > > > > > > > > > > > > > > diff --git a/drivers/net/hyperv/netvsc_drv.c > > > > > > > > > b/drivers/net/hyperv/netvsc_drv.c > > > > > > > > > > index 9522763..c5778cf 100644 > > > > > > > > > > --- a/drivers/net/hyperv/netvsc_drv.c > > > > > > > > > > +++ b/drivers/net/hyperv/netvsc_drv.c > > > > > > > > > > @@ -1165,9 +1165,10 @@ static void > > netvsc_free_netdev(struct > > > > > > > > > net_device *netdev) > > > > > > > > > > free_netdev(netdev); > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > -static struct net_device *get_netvsc_bymac(const u8 > > *mac) > > > > > > > > > > +static struct net_device *get_netvsc_byvfser(u32 vfser) > > > > > > > > > > { > > > > > > > > > > struct net_device *dev; > > > > > > > > > > + struct net_device_context *ndev_ctx; > > > > > > > > > > > > > > > > > > > > ASSERT_RTNL(); > > > > > > > > > > > > > > > > > > > > @@ -1175,7 +1176,8 @@ static void > > netvsc_free_netdev(struct > > > > > > net_device > > > > > > > > > *netdev) > > > > > > > > > > if (dev->netdev_ops != &device_ops) > > > > > > > > > > continue; /* not a netvsc device */ > > > > > > > > > > > > > > > > > > > > - if (ether_addr_equal(mac, dev->perm_addr)) > > > > > > > > > > + ndev_ctx = netdev_priv(dev); > > > > > > > > > > + if (ndev_ctx->vf_serial == vfser) > > > > > > > > > > return dev; > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > @@ -1205,21 +1207,66 @@ static void > > > > netvsc_free_netdev(struct > > > > > > > > > net_device *netdev) > > > > > > > > > > return NULL; > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > +static u32 netvsc_get_vfser(struct net_device > > *vf_netdev) > > > > > > > > > > +{ > > > > > > > > > > + struct device *dev; > > > > > > > > > > + struct hv_device *hdev; > > > > > > > > > > + struct hv_pcibus_device *hbus = NULL; > > > > > > > > > > + struct list_head *iter; > > > > > > > > > > + struct hv_pci_dev *hpdev; > > > > > > > > > > + unsigned long flags; > > > > > > > > > > + u32 vfser = 0; > > > > > > > > > > + u32 count = 0; > > > > > > > > > > + > > > > > > > > > > + for (dev = &vf_netdev->dev; dev; dev = dev->parent) > > { > > > > > > > > > > > > > > > > > > You are going to walk the whole device tree backwards? > > That's > > > > > > crazy. > > > > > > > > > And foolish. And racy and broken (what happens if the > > tree > > > > > > changes > > > > > > > > > while you do this?) Where is the lock being grabbed while > > > > this > > > > > > happens? > > > > > > > > > What about reference counts? Do you see other drivers > > ever > > > > doing > > > > > > this > > > > > > > > > (if you do, point them out and I'll go yell at them too...) > > > > > > > > > > > > > > > > Greg, > > > > > > > > > > > > > > > > We are registering for netdev events. Coming into this > > function, > > > > the > > > > > > caller > > > > > > > > guarantees that the list of netdevs does not change - we > > assert > > > > this > > > > > > on entry: > > > > > > > > ASSERT_RTNL(). We are only walking up the device tree for > > the > > > > > > netdevs whose > > > > > > > > state change is being notified to us - the device tree being > > > > walked > > > > > > here is limited to > > > > > > > > netdevs under question. > > > > > > > > > > > > > > But a netdev is a child of some type of "real" device, and you > > are > > > > now > > > > > > > walking the tree of all devices up to the "root" parent device, > > > > which > > > > > > > means you will hit PCI bridges, USB controllers, and all sorts > > of > > > > fun > > > > > > > things if you are a child of those types of devices. > > > > > > > > > > > > > > And can't you tell if the netdev for this event, really is > > "your" > > > > > > > netdev? Or are you getting called this for "all" netdevs? > > Sorry, > > > > I > > > > > > > don't know this api, any pointers to it would be appreciated. > > > > > > > > > > > > > > > We have a reference to the device and we know the device is > > not > > > > > > going away. Is it not > > > > > > > > safe to dereference the parent pointer - after all the child > > has > > > > > > taken a reference on > > > > > > > > the parent as part of device_add() call. > > > > > > > > > > > > > > It might be, and might not be. There's a reason you don't see > > > > this > > > > > > > pattern anywhere in the kernel because of this... > > > > > > > > > > > > > > > > > + if (!dev_is_vmbus(dev)) > > > > > > > > > > + continue; > > > > > > > > > > > > > > > > > > Ick. > > > > > > > > > > > > > > > > > > Why isn't your parent pointer a vmbus device all the time? > > > > How > > > > > > could > > > > > > > > > you get burried down in the device hierarchy when you are > > the > > > > > > driver for > > > > > > > > > a specific bus type in the first place? How could this > > > > function > > > > > > ever be > > > > > > > > > called for a device that is NOT of this type? > > > > > > > > > > > > > > > > We get notified when state changes on any of the netdev > > devices > > > > in > > > > > > the system. > > > > > > > > Not all netdevs in the system belong to vmbus. Consider for > > > > instance > > > > > > the > > > > > > > > emulated NIC that can be configured. This is an emulated PCI > > NIC. > > > > We > > > > > > are only > > > > > > > > interested in netdevs that correspond to the VF instance > > that we > > > > are > > > > > > interested in. > > > > > > > > > > > > > > Can you "know" this is your netdev by some other way than > > having > > > > to > > > > > > walk > > > > > > > the device tree? Name? local device type? Something else? > > This > > > > > > seems > > > > > > > like an odd api in that everyone would have to do gyrations > > like > > > > this > > > > > > in > > > > > > > order to determine if the netdev is "theirs" or not... > > > > > > > > > > > > The scenario is SR-IOV on Hyper-V. In the case of VF device, the > > > > host > > > > > > hands the > > > > > > guest OS a PCI device for the virtual function device. The VF > > device > > > > is > > > > > > placed > > > > > > on a special synthetic PCI bus (ie not part of the other buses > > on > > > > the > > > > > > system). > > > > > > The VF device also has a synthetic network interface (netvsc) > > which > > > > > > lives > > > > > > on VMBUS. This code is about managing the interaction between > > the > > > > two. > > > > > > > > > > > > The association between VF and synthetic NIC is done in response > > to > > > > the > > > > > > VF network device being registered. Initial version was based on > > MAC > > > > > > address > > > > > > which is the same. Later refinement used permanent MAC address > > to > > > > > > avoid bugs if MAC address changed. This version is to use > > serial > > > > number > > > > > > instead which is safer than MAC address. > > > > > > > > > > > > The code to walk up/down maybe not be needed to find serial > > number. > > > > > > Perhaps a more direct single set of conditions is possible? > > > > > > > > > > > > Something like: > > > > > > > > > > > > In pci-hyperv.c > > > > > > > > > > > > u32 hv_pcifront_get_serial(struct pci_bus *bus, unsigned int > > devfn) > > > > > > { > > > > > > struct hv_pcibus_device *hbus > > > > > > = container_of(bus->sysdata, > > > > > > struct hv_pcibus_device, sysdata); > > > > > > struct hf_pci_dev *hpdev; > > > > > > u32 serial; > > > > > > > > > > > > hpdev = get_pcichild_wslot(hbus, devfn_to_wslot(pdev->devfn)); > > > > > > if (!hpdev) > > > > > > return 0; > > > > > > > > > > > > serial = hpdev->devs.ser; > > > > > > put_pcichild(hpdev, hv_pcidev_ref_by_slot); > > > > > > return serial; > > > > > > } > > > > > > > > > > > > In netvsc_drv.c > > > > > > > > > > > > static u32 netvsc_get_vfser(struct net_device *vf_netdev) > > > > > > { > > > > > > struct device *dev = vf_netdev->dev.parent; > > > > > > struct pci_dev *pdev; > > > > > > u32 wslot; > > > > > > > > > > > > if (!dev || !dev_is_pci(dev)) > > > > > > return 0; > > > > > > > > > > > > pdev = container_of(dev, struct pci_device, dev); > > > > > > > > > > > > return hv_pcifront_get_serial(pdev->bus, pdev->devfn); > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > P.S: it would be good to be able to get win_slot out through > > sysfs > > > > as > > > > > > well for systemd/udev. > > > > > > > > > > Stephen, > > > > > > > > > > Thanks for suggestion. Actually, in my earlier implementation of > > this > > > > > feature (VF serial based matching), I thought about export a > > function > > > > > from vPCI driver, then calling it from netvsc. So I don't need to > > > > > move structs between headers... But, it creates a dependency of > > netvsc > > > > > on vPCI driver's symbol. So, even if on a VM without SRIOV, we > > have to > > > > > load vPCI driver, which we don't want. > > > > > > > > > > Also, hv_vpci device is 3 parent layers above the vf_netdevice: > > > > > Here is the VF drv hierarchy -- > > > > > Should we assume it's always 3 parents above vf_netdevice, or > > search > > > > for it? > > > > > > > > > > [ 368.185259] HZINFO:NETDEV_REGISTER: > > > > > [ 368.185261] HZINFO: dev:ffff88007c10d518, bus: (null), > > > > busName:(null), drvName:(null) > > > > > [ 368.185262] HZINFO: dev:ffff88007c10c0a0, bus:ffffffff81ce4b60, > > > > busName:pci, drvName:ixgbevf > > > > > [ 368.185263] HZINFO: dev:ffff8800355c0000, bus: (null), > > > > busName:(null), drvName:(null) > > > > > [ 368.185264] HZINFO: dev:ffff8800355c5428, bus:ffffffffa0008160, > > > > busName:vmbus, drvName:hv_pci > > > > > [ 368.185264] HZINFO: dev:ffff88007c49e268, bus:ffffffff81ce9800, > > > > busName:acpi, drvName:vmbus > > > > > [ 368.185265] HZINFO: dev:ffff88007c48ea68, bus:ffffffff81ce9800, > > > > busName:acpi, drvName:(null) > > > > > [ 368.185266] HZINFO: dev:ffff88007c48aa68, bus:ffffffff81ce9800, > > > > busName:acpi, drvName:(null) > > > > > [ 368.185266] HZINFO: dev:ffff88007c48a268, bus:ffffffff81ce9800, > > > > busName:acpi, drvName:(null) > > > > > [ 368.185267] HZINFO: dev:ffff88007c489a68, bus:ffffffff81ce9800, > > > > busName:acpi, drvName:(null) > > > > > > > > > > Thanks, > > > > > - Haiyang > > > > > > > > Since this is a synthetic bus, the topology should not change unless > > > > host side > > > > software changes. The vf_netdev device has to be PCI device, so that > > is > > > > going to > > > > be certain. After that there maybe intermediate up to hv_pci. The > > code > > > > in hyperv-pci > > > > already has similar stuff (ie for read_config). > > > > > > Other netdevice, like emulated NIC can also trigger this notification. > > > They are not vPCI. > > > > > > Thanks, > > > - Haiyang > > > > Emulated NIC is already excluded in start of netvc notifier handler. > > > > static int netvsc_netdev_event(struct notifier_block *this, > > unsigned long event, void *ptr) > > { > > struct net_device *event_dev = netdev_notifier_info_to_dev(ptr); > > > > /* Skip our own events */ > > if (event_dev->netdev_ops == &device_ops) > > return NOTIFY_DONE; > > > > Emulated device is not based on netvsc. It's the native Linux (dec100M?) > Driver. So this line doesn't exclude it. And how about other NIC type > may be added in the future? Sorry, forgot about that haven't used emulated device in years. The emulated device should appear to be on a PCI bus, but the serial would not match??