From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tetsuya Mukawa Subject: Re: [PATCH v8 04/14] eal/pci: Consolidate pci address comparison APIs Date: Wed, 18 Feb 2015 21:20:08 +0900 Message-ID: <54E48378.8050707@igel.co.jp> References: <1423470639-15744-2-git-send-email-mukawa@igel.co.jp> <54E3F10A.4090303@igel.co.jp> <8CEF83825BEC744B83065625E567D7C2049EAA9F@IRSMSX108.ger.corp.intel.com> <2590241.Y3C96lux3o@xps13> <8CEF83825BEC744B83065625E567D7C2049EAB13@IRSMSX108.ger.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Cc: "dev-VfR2kkLFssw@public.gmane.org" To: "Iremonger, Bernard" , Thomas Monjalon Return-path: In-Reply-To: <8CEF83825BEC744B83065625E567D7C2049EAB13-kPTMFJFq+rEMvF1YICWikbfspsVTdybXVpNB7YpNyf8@public.gmane.org> List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org Sender: "dev" On 2015/02/18 20:39, Iremonger, Bernard wrote: > >> -----Original Message----- >> From: Thomas Monjalon [mailto:thomas.monjalon-pdR9zngts4EAvxtiuMwx3w@public.gmane.org] >> Sent: Wednesday, February 18, 2015 10:33 AM >> To: Iremonger, Bernard >> Cc: Tetsuya Mukawa; dev-VfR2kkLFssw@public.gmane.org; ivan.boule-pdR9zngts4EAvxtiuMwx3w@public.gmane.org >> Subject: Re: [dpdk-dev] [PATCH v8 04/14] eal/pci: Consolidate pci addr= ess comparison APIs >> >> Hi Bernard, >> >> 2015-02-18 10:26, Iremonger, Bernard: >>> From: dev [mailto:dev-bounces-VfR2kkLFssw@public.gmane.org] On Behalf Of Tetsuya Mukawa >>>> On 2015/02/18 10:02, Thomas Monjalon wrote: >>>>> 2015-02-17 15:14, Tetsuya Mukawa: >>>>>> On 2015/02/17 9:44, Thomas Monjalon wrote: >>>>>>> 2015-02-16 13:14, Tetsuya Mukawa: >>>>>>>> @@ -356,13 +342,24 @@ pci_scan_one(int dev_pci_fd, struct pci_co= nf *conf) >>>>>>>> } >>>>>>>> else { >>>>>>>> struct rte_pci_device *dev2 =3D NULL; >>>>>>>> + int ret; >>>>>>>> >>>>>>>> TAILQ_FOREACH(dev2, &pci_device_list, next) { >>>>>>>> - if (pci_addr_comparison(&dev->addr, &dev2->addr)) >>>>>>>> + ret =3D eal_compare_pci_addr(&dev->addr, &dev2->addr); >>>>>>>> + if (ret > 0) >>>>>>>> continue; >>>>>>>> - else { >>>>>>>> + else if (ret < 0) { >>>>>>>> TAILQ_INSERT_BEFORE(dev2, dev, next); >>>>>>>> return 0; >>>>>>>> + } else { /* already registered */ >>>>>>>> + /* update pt_driver */ >>>>>>>> + dev2->pt_driver =3D dev->pt_driver; >>>>>>>> + dev2->max_vfs =3D dev->max_vfs; >>>>>>>> + memmove(dev2->mem_resource, >>>>>>>> + dev->mem_resource, >>>>>>>> + sizeof(dev->mem_resource)); >>>>>>>> + free(dev); >>>>>>>> + return 0; >>>>>>> Could you comment this "else part" please? >>>>>> PCI device list is allocated when rte_eal_init() is called. At >>>>>> the time, to fill pci device information, sysfs value is used. >>>>>> If sysfs values written by kernel device driver will not be >>>>>> changed by igb_uio, vfio or pci_uio_genereic, above code isn't nee= ded. >>>>>> But actually above values are changed or added by them. >>>>>> >>>>>> Here is a step to cause issue. >>>>>> 1. Boot linux. >>>>>> 2. Start DPDK application without any physical NIC ports. >>>>>> - Here, some sysfs values are read, and store to pci device list.= >>>>>> 3. igb_uio starts managing a port. >>>>>> - Here, some sysfs values are changed. >>>>>> 4. Add a NIC port to DPDK application using hotplug functions. >>>>>> - Here, we need to replace some values. >>>>> I think that you are showing that something is wrongly designed in >>>>> these EAL structures. I suggest to try cleaning this mess instead o= f workarounding. >>> Hi Tetsuya, Thomas, >>> I think that redesigning the EAL structures is beyond the scope of th= is patchset and should be >> undertaken as a separate task. >> >> I strongly disagrees this opinion. We should never workaround design p= roblems and add more >> complex/weird code. >> I think that this kind of consideration is the heart of some design pr= oblems we have to face today. >> Please let's stop adding some code which just works without thinking t= he whole design. >> >>> I suspect there may be a problem in the original code when a device = which was using a kernel driver >> is bound to igb_uio. The igb_uio driver adds /sys/bus/pci/devices/00= 00\:05\:00.0/max_vfs. > Hi Tomas, Tetsuya, > > In general, I agree that we should not workaround design problems. > In this case I don't think there is a problem with the rte_pci_device a= nd pci_device_list structures. I agree with it. > The "already registered" device has been replaced. It would probably be= cleaner to remove the "already registered" device from the list and then= add the new device to the list rather than update the "already registere= d" device. > I guess "replacing" will not work, because rte_pci_device structure is also registered in rte_eth_dev structure. If we remove and free the pci device, I guess something goes wrong in ethdev library. Just removing is one more option, but it means there is a working pci device that is not registered in the pci_device_list. I guess it's weird.= I still think updating may be correct behavior. The pci_device_list is used like below when rte_eal_init() is called. 1. When rte_eal_pci_init() is called, all pci devices are registered in the pci_device_list. 2. When rte_eal_dev_init() is called, dev_driver_list is traversed, and if a PCI device for a driver is found in the pci_device_list, init() of the driver is called. I guess it's not so strange design. But this design assumes pci_device_list is latest while matching a driver registered in dev_driver_list. Now, we have hotplug patch. I guess we should do same thing. Before matching, we should update the pci_device_list. Thanks, Tetsuya > Regards, > > Bernard. >