From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Return-Path: Date: Thu, 13 Apr 2017 17:53:55 +1000 From: Gavin Shan To: Gavin Shan Cc: Bjorn Helgaas , linux-pci@vger.kernel.org, bhelgaas@google.com, clsoto@us.ibm.com Subject: Re: [PATCH] PCI: Disable IOV before pcibios_sriov_disable() Reply-To: Gavin Shan References: <1488154712-8354-1-git-send-email-gwshan@linux.vnet.ibm.com> <20170307201529.GF21358@bhelgaas-glaptop.roam.corp.google.com> <20170309003432.GA27263@gwshan> <20170318001907.GA16291@gwshan> <20170320154644.GA24572@bhelgaas-glaptop.roam.corp.google.com> <20170320225006.GA8591@gwshan> <20170330232406.GA14305@gwshan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <20170330232406.GA14305@gwshan> Message-Id: <20170413075355.GA21414@gwshan> List-ID: On Fri, Mar 31, 2017 at 10:24:06AM +1100, Gavin Shan wrote: >On Tue, Mar 21, 2017 at 09:50:06AM +1100, Gavin Shan wrote: >>On Mon, Mar 20, 2017 at 10:46:44AM -0500, Bjorn Helgaas wrote: >>>On Sat, Mar 18, 2017 at 11:19:07AM +1100, Gavin Shan wrote: >>>> On Thu, Mar 09, 2017 at 11:34:32AM +1100, Gavin Shan wrote: >>>> >On Tue, Mar 07, 2017 at 02:15:29PM -0600, Bjorn Helgaas wrote: >>>> >>On Mon, Feb 27, 2017 at 11:18:32AM +1100, Gavin Shan wrote: >>>> >>> The PowerNV platform is the only user of pcibios_sriov_disable(). >>>> >>> The IOV BAR could be shifted by pci_iov_update_resource(). The >>>> >>> warning message in the function is printed if the IOV capability >>>> >>> is in enabled (PCI_SRIOV_CTRL_VFE && PCI_SRIOV_CTRL_MSE) state. >>>> >>> >>>> >>> pci_disable_sriov >>>> >>> sriov_disable >>>> >>> pnv_pci_sriov_disable >>>> >>> pnv_pci_vf_resource_shift >>>> >>> pci_update_resource >>>> >>> pci_iov_update_resource >>>> >>> >>>> >>> This fixes the issue by disabling IOV capability before calling >>>> >>> pcibios_sriov_disable(). With it, the disabling path matches with >>>> >>> the enabling path: pcibios_sriov_enable() is called before the >>>> >>> IOV capability is enabled. >>>> >> >>>> >>I'm vaguely uncomfortable about this path: >>>> >> >>>> >> pci_disable_sriov >>>> >> sriov_disable >>>> >> pcibios_sriov_disable # powerpc version >>>> >> pnv_pci_sriov_disable >>>> >> pnv_pci_vf_resource_shift >>>> >> res = &dev->resource[i + PCI_IOV_RESOURCES] >>>> >> res->start += size * offset >>>> >> pci_update_resource >>>> >> pci_iov_update_resource >>>> >> pnv_pci_vf_release_m64 >>>> >> >>>> >>1) "res" is already in the resource tree, so we shouldn't be changing >>>> >> its start address, because that may make the tree inconsistent, >>>> >> e.g., the resource may no longer be completely contained in its >>>> >> parent, it may conflict with a sibling, etc. >>>> >> >>>> >>2) If we update "res->start", shouldn't we update "res->end" >>>> >> correspondingly? >>>> >> >>>> >>It seems like it'd be better if we didn't update the device resources >>>> >>in the enable/disable paths. If we could do the resource adjustments >>>> >>earlier, somewhere before we give the device to a driver, it seems >>>> >>like it would avoid these issues. >>>> >> >>>> >>We might have talked about these questions in the past, so I apologize >>>> >>if you've already explained this. If that's the case, maybe we just >>>> >>need some comments in the code to help the next confused reader. >>>> >> >>>> > >>>> >Bjorn, thanks for review. I agree it's not perfect. We discussed this long >>>> >time ago as I can remember. Let me try to make it a bit more clear: In our >>>> >PHB hardware, there are 16 MMIO BARs. Each of them can be shared by 256 PEs >>>> >(A) and owned exclusively by one PE (B). When VF BAR size is small enough, >>>> >we take (A). Otherwise, we have to take (B). Only when taking (A), we need >>>> >expand/move/shrink the IOV BAR. So lets stick to (A) for discussion here. >>>> > >>>> >Under (A), PF's IOV BAR size is extended to ((256 * (VF BAR size)) when the >>>> >PF is probed. Then the @res, which corresponds to the IOV BAR, is assigned >>>> >and put into the resource tree during resource sizing and assignment stage. >>>> >The IOV capability is going to be enabled by PF's driver or sysfs entry, it >>>> >calls into pnv_pci_sriov_enable() where number of contigous PEs (equal to >>>> >number of VFs to be enabled) are allocated. We shift the IOV BAR base according >>>> >to the starting PE number of the allocated block. Afterewards, the IOV BAR >>>> >is restored when the IOV capability is disabled. So it's all about the PE. >>>> >The IOV BAR's end address isn't touched, we needn't update @res->end when >>>> >restoring the IOV BAR. >>>> > >>>> >In order to avoid moving IOV BAR base address, I need know the the PEs >>>> >for the VFs before resourcd sizing and assignment stage. It means I need >>>> >to reserve PEs in advance, which isn't nice because we never enable the >>>> >VFs. In that case, the PEs are wasted. >>>> > >>>> >Yeah, it's nice to have add some comments in pnv_pci_vf_resource_shift() >>>> >where pci_update_resource() is called. I will post another patch to >>>> >linux-ppcdev and you'll be copied. If you agree, I think you can merge >>>> >this patch as none of the concerns are too much related. >>>> > >>>> >>>> Sorry, Bjorn, ping! Please let me know if there are more concerns you have. >>> >>>I think we had a misunderstanding -- you mentioned adding some >>>comments and wrote "I will post another patch", and I *thought* you >>>meant you were going to post another version of *this* patch with some >>>updated comments. So I've been waiting for that updated patch. But I >>>think you've been waiting for me to merge *this* patch as-is. >>> >>>To avoid having this discussion a third time in the future, I think >>>you should add some comments at the point where you update the >>>resource. Updating a resource after it's in the resource tree is >>>clearly dangerous, so we need some explanation of why it's sort of OK >>>in this particular case. >>> >>>If you can write a comment and dig up a URL to our previous >>>discussion, I'd like to incorporate that into *this* patch before I >>>merge it. The sooner we can document this, the less work it will be >>>in the future. >>> >> >>Ok. Sorry for the confusion and that I should looked into the code for more. >>We already had one comment like below in arch/powerpc/platforms/powernv/pci-ioda.c:: >>pnv_pci_vf_resource_shift(). I think it's exactly what you like to have, please >>help to confirm. I believe it was added based on your comments long time ago when >>you review the SRIOV (for PowerNV) patches. >> >> /* >> * After doing so, there would be a "hole" in the /proc/iomem when >> * offset is a positive value. It looks like the device return some >> * mmio back to the system, which actually no one could use it. >> */ >> >>http://www.spinics.net/lists/linux-pci/msg39424.html >> > >Bjorn, please let me know if you have concerns. > Bjorn, Sorry that I have to ping you again. I assume it's mergable. Please let me know if you have more concerns. Thanks, Gavin