From: manish jaggi
Subject: Re: [RFC + Queries] Flow of PCI passthrough in ARM
Date: Wed, 1 Oct 2014 16:07:23 +0530
To: Stefano Stabellini
Cc: Ian Campbell, Vijay Kilari, Prasun Kapoor, manish.jaggi@caviumnetworks.com, Julien Grall, xen-devel, psawargaonkar@linaro.org, Matt.Evans@arm.com, Dave.Martin@arm.com, Anup Patel

On 25 September 2014 15:57, Stefano Stabellini wrote:
> On Thu, 25 Sep 2014, manish jaggi wrote:
>> On 24 September 2014 19:40, Stefano Stabellini wrote:
>> > CC'ing Matt and Dave at ARM for an opinion about device tree, SMMUs and
>> > stream ids. See below.
>> >
>> > On Wed, 24 Sep 2014, manish jaggi wrote:
>> >> On 22 September 2014 16:15, Stefano Stabellini wrote:
>> >> > On Thu, 18 Sep 2014, manish jaggi wrote:
>> >> >> Hi,
>> >> >> Below is the flow I am working on. Please provide your comments; I
>> >> >> have a couple of queries as well.
>> >> >>
>> >> >> a) The device tree has smmu nodes and each smmu node has the
>> >> >> mmu-masters property. In our SoC DT the mmu-master is a pcie node
>> >> >> in the device tree.
>> >> >
>> >> > Do you mean that both the smmu nodes and the pcie node have the
>> >> > mmu-master property? The pcie node is the pcie root complex, right?
>> >> >
>> >> The pci node is the pcie root complex. The pci node is the mmu-master
>> >> in the smmu node:
>> >>
>> >>     smmu1@0x8310,00000000 {
>> >>         ...
>> >>         mmu-masters = <&pcie1 0x100>;
>> >>     };
>> >>
>> >> >> b) Xen parses the device tree and prepares a list which stores the
>> >> >> pci device tree node pointers. The order in the device tree is mapped
>> >> >> to the segment number in subsequent calls, e.g. the 1st pci node
>> >> >> found is segment 0, the 2nd is segment 1.
>> >> >
>> >> > What's a segment number? Something from the PCI spec?
>> >> > If you have several pci nodes on device tree, does that mean that you
>> >> > have several different pcie root complexes?
>> >> >
>> >> Yes. The segment is the pci root complex number.
>> >>
>> >> >> c) During SMMU init the pcie nodes in DT are saved as smmu masters.
>> >> >
>> >> > At this point you should also be able to find via DT the stream-id range
>> >> > supported by each SMMU and program the SMMU with them, assigning
>> >> > everything to dom0.
>> >>
>> >> Currently pcie enumeration is not done in xen, it is done in dom0.
>> >
>> > Yes, but we don't really need to walk any PCIe busses in order to
>> > program the SMMU, right? We only need the requestor id and the stream id
>> > ranges. We should be able to get them via device tree.
>> >
>> Yes, but I have a doubt here.
>> Before booting dom0, for each smmu the mask in the SMR can be set to
>> assign the stream ids to dom0. This can be fixed or read from the
>> device tree.
>> There are 2 points here:
>> a) PCI bus enumeration
>> b) Programming the SMMU for dom0
>> For (b) the enumeration is not required provided we set the mask.
>> So are you also saying that (a) should be done in Xen and not in dom0?
>> If yes, how would dom0 get to know about the PCIe EPs, from its device tree?
>
> No, I think that doing (a) via PHYSDEVOP_pci_device_add is OK.
> I am saying that we should consider doing (b) in Xen before booting
> dom0.
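
To make (b) concrete, below is a rough sketch of what claiming a whole
stream-id range for dom0 before it boots could look like. This is not
existing Xen code; the register offsets and field layout follow my reading
of the ARM SMMUv1/v2 spec, and smmu_base, idx and cbndx are placeholders:

/*
 * Sketch only, not Xen code: program one Stream Match Register and its
 * Stream-to-Context register so that a whole stream-id range is routed
 * to dom0's context bank before dom0 boots.
 * Assumed SMMUv1/v2 layout: SMR has VALID in bit 31, MASK in [30:16],
 * ID in [14:0]; S2CR has CBNDX in [7:0] and TYPE in [17:16], where
 * TYPE 0b00 means "translation context".
 */
#include <stdint.h>

#define SMMU_GR0_SMR(n)   (0x800 + (n) * 4)
#define SMMU_GR0_S2CR(n)  (0xc00 + (n) * 4)
#define SMR_VALID         (1U << 31)
#define SMR_MASK_SHIFT    16

static inline void mmio_write32(volatile uint8_t *base, uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(base + off) = val;
}

/*
 * A stream id matches when it equals 'id' on every bit not set in 'mask'
 * (masked bits are ignored), so a single entry can cover all the stream
 * ids of one root complex without enumerating the PCI bus first. The
 * matching streams are routed to context bank cbndx, assumed to be
 * already set up with dom0's stage-2 tables.
 */
static void smmu_claim_range_for_dom0(volatile uint8_t *smmu_base, unsigned int idx,
                                      uint16_t id, uint16_t mask, uint8_t cbndx)
{
    mmio_write32(smmu_base, SMMU_GR0_SMR(idx),
                 SMR_VALID | ((uint32_t)mask << SMR_MASK_SHIFT) | id);
    mmio_write32(smmu_base, SMMU_GR0_S2CR(idx), cbndx); /* TYPE=0: translation */
}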
>
>> >> >> d) Dom0 enumerates PCI devices and calls the hypercall
>> >> >> PHYSDEVOP_pci_device_add.
>> >> >> - In Xen the SMMU iommu_ops add_device is called. I have implemented
>> >> >> the add_device function.
>> >> >> - In the add_device function the segment number is used to locate the
>> >> >> device tree node pointer of the pcie node, which helps to find the
>> >> >> corresponding smmu.
>> >> >> - In the same PHYSDEVOP the BAR regions are mapped to Dom0.
>> >> >>
>> >> >> Note: the current SMMU driver maps the complete domain's address space
>> >> >> for the device in the SMMU hardware.
>> >> >>
>> >> >> The above flow works currently for us.
>> >> >
>> >> > It would be nice to be able to skip d): in a system where all dma capable
>> >> > devices are behind smmus, we should be capable of booting dom0 without
>> >> > the 1:1 mapping hack. If we do that, it would be better to program the
>> >> > smmus before booting dom0. Otherwise there is a risk that dom0 is going
>> >> > to start using these devices and doing dma before we manage to secure
>> >> > the devices via smmus.
>> >> >
>> >> In our current case we are programming the smmu in the
>> >> PHYSDEVOP_pci_device_add flow, so the device is mapped before dom0
>> >> accesses it; otherwise xen gets an SMMU fault.
>> >
>> > Good.
>> >
>> >
>> >> > Of course we can do that if there are no alternatives. But in our case
>> >> > we should be able to extract the stream-ids from device tree and program
>> >> > the smmus right away, right? Do we really need to wait for dom0 to call
>> >> > PHYSDEVOP_pci_device_add? We could just assign everything to dom0 for a
>> >> > start.
>> >> >
>> >> We cannot get the streamid from the device tree, as it is generated
>> >> during enumeration.
>> >
>> > I am not sure what the current state of the device tree spec is, but I
>> > am pretty sure that the intention is to express stream id and requestor
>> > id ranges directly in the dts so that the SMMU can be programmed right
>> > away without walking the PCI bus.
>> >
>> >
>> >> > I would like to know from the x86 guys, if this is really how it is
>> >> > supposed to work on PVH too. Do we rely on PHYSDEVOP_pci_device_add to
>> >> > program the IOMMU?
>> >> >
>> >> I was waiting, but no one has commented.
>> >
>> > Me too. Everybody is very busy at the moment with the 4.5 release.
>> >
>> >
>> >> >> Now when I call pci-assignable-add I see that the iommu_ops
>> >> >> remove_device in the smmu driver is not called. If that is not called,
>> >> >> the SMMU would still have the dom0 address space mappings for that
>> >> >> device.
>> >> >>
>> >> >> Can you please suggest the best place (kernel / xl tools) to put the
>> >> >> code which would call the remove_device in iommu_ops in the control
>> >> >> flow from pci-assignable-add.
>> >> >>
>> >> >> One way I see is to introduce a DOMCTL_iommu_remove_device in
>> >> >> pci-assignable-add / pci-detach and a DOMCTL_iommu_add_device in
>> >> >> pci-attach. Is that a valid approach?
>> >> >
>> >> > I am not 100% sure, but I think that before assigning a PCI device to
>> >> > another guest, you are supposed to bind the device to xen-pciback (see
>> >> > drivers/xen/xen-pciback, also see
>> >> > http://wiki.xen.org/wiki/Xen_PCI_Passthrough). The pciback driver is
>> >> > going to hide the device from dom0 and as a consequence
>> >> > drivers/xen/pci.c:xen_remove_device ends up being called, which issues a
>> >> > PHYSDEVOP_pci_device_remove hypercall.
>> >>
>> >> xen_remove_device is not called at all. In pci-attach
>> >> iommu_ops->assign_device is called.
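
For reference, this is roughly the dom0 side of the PHYSDEVOP_pci_device_add /
PHYSDEVOP_pci_device_remove pair discussed above. It is a sketch paraphrased
from memory of drivers/xen/pci.c and the physdev interface header, so the
exact structure and field names should be double-checked against the real
headers:

/*
 * Sketch, paraphrased from memory of drivers/xen/pci.c and
 * include/xen/interface/physdev.h -- verify field names against the
 * real headers. The point is that dom0 hands Xen a (seg, bus, devfn)
 * triple, and the seg value is what add_device uses to find the pcie
 * DT node and hence the right SMMU.
 */
#include <linux/pci.h>
#include <xen/interface/physdev.h>
#include <asm/xen/hypercall.h>

static int report_pci_device_add(struct pci_dev *pci_dev)
{
	struct {
		struct physdev_pci_device_add add;
		uint32_t pxm;	/* room for optarr[0], unused here */
	} op = {
		.add.seg   = pci_domain_nr(pci_dev->bus), /* segment == RC number */
		.add.bus   = pci_dev->bus->number,
		.add.devfn = pci_dev->devfn,
	};

	return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &op.add);
}

static int report_pci_device_remove(struct pci_dev *pci_dev)
{
	struct physdev_pci_device device = {
		.seg   = pci_domain_nr(pci_dev->bus),
		.bus   = pci_dev->bus->number,
		.devfn = pci_dev->devfn,
	};

	/* This is the call that never reaches Xen in the pci-assignable-add case. */
	return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_remove, &device);
}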
>> >> In Xen the nomenclature is confusing and there are no comments in iommu.h:
>> >> iommu_ops.add_device is called when dom0 issues the hypercall
>> >> iommu_ops.assign_dt_device is called when xen attaches a device tree device to dom0
>> >> iommu_ops.assign_device is called when xl pci-attach is called
>> >> iommu_ops.reassign_device is called when xl pci-detach is called
>> >>
>> >> As of now we are able to assign devices to domU and the driver in domU
>> >> is running, but we did some hacks, like:
>> >> a) in the xen pcifront driver, bus->msi is assigned to its msi_chip
>> >>
>> >> ---- pcifront_scan_root()
>> >> ...
>> >>         b = pci_scan_bus_parented(&pdev->xdev->dev, bus,
>> >>                                   &pcifront_bus_ops, sd);
>> >>         if (!b) {
>> >>                 dev_err(&pdev->xdev->dev,
>> >>                         "Error creating PCI Frontend Bus!\n");
>> >>                 err = -ENOMEM;
>> >>                 pci_unlock_rescan_remove();
>> >>                 goto err_out;
>> >>         }
>> >>
>> >>         bus_entry->bus = b;
>> >> +       msi_node = of_find_compatible_node(NULL, NULL, "arm,gic-v3-its");
>> >> +       if (msi_node) {
>> >> +               b->msi = of_pci_find_msi_chip_by_node(msi_node);
>> >> +               if (!b->msi) {
>> >> +                       printk(KERN_ERR "Unable to find bus->msi node \r\n");
>> >> +                       goto err_out;
>> >> +               }
>> >> +       } else {
>> >> +               printk(KERN_ERR "Unable to find arm,gic-v3-its compatible node \r\n");
>> >> +               goto err_out;
>> >> +       }
>> >
>> > It seems to me that of_pci_find_msi_chip_by_node should be called by
>> > common code somewhere else. Maybe people at linux-arm would know where
>> > to suggest this initialization should go.
>> >
>> This is a workaround to attach an msi-controller to the xen pcifront bus.
>> We are avoiding the xen frontend ops for msi.
>
> I think I would need to see a proper patch series to really evaluate this change.
>
>> >> ----
>> >>
>> >> Using this, the ITS emulation code in xen is able to trap ITS command
>> >> writes by the driver.
>> >> But we are facing a problem now where your help is needed.
>> >>
>> >> The StreamID is generated from the segment:bus:device:function number,
>> >> which is fed as the DevID in ITS commands. In Dom0 the StreamID is
>> >> correctly generated, but in domU the StreamID for a passthrough device
>> >> is 0:0:0:0. When emulating this in Xen it is a problem, as xen does not
>> >> know how to get the physical stream id.
>> >>
>> >> (E.g. after "xl pci-attach 1 0001:00:05.0" DomU has the device, but in
>> >> DomU the id is 0000:00:00.0.)
>> >>
>> >> Could you suggest how to go about this?
>> >
>> > I don't think that the ITS patches have been posted yet, so it is
>> > difficult for me to understand the problem and propose a solution.
>>
>> Put more simply, it is about which StreamID a driver running in domU
>> sees, since that is what gets programmed into the ITS commands, and
>> how to map the domU StreamID to the actual StreamID in Xen when the
>> ITS command write traps.
>
> Wouldn't it be possible to pass the correct StreamID to DomU via device
> tree? Does it really need to match the PCI BDF?

Device tree provides a static mapping; runtime attaching of a device
(using the xl tools) to a domU is what I am working on.

> Otherwise if the command traps into Xen, couldn't Xen do the translation?

Xen does not know how to map the BDF in domU to the actual StreamID.
I had thought of adding a hypercall, issued when xl pci-attach is called:

    PHYSDEVOP_map_streamid {
        dom_id,
        phys_streamid,   /* physical bdf */
        guest_streamid,
    }
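
To make the idea concrete, this is the kind of per-domain lookup I have in
mind. It is only a sketch, none of these names exist in Xen today: the
hypercall would populate the table at xl pci-attach time, and the trapped
ITS command emulation would use it to rewrite the guest DevID into the
physical stream id before issuing the real command.

/*
 * Sketch only -- hypothetical structures and functions, not Xen code.
 */
#include <stdint.h>

#define STREAMID_MAP_MAX 32   /* per-domain; size is arbitrary here */

struct streamid_map_entry {
    uint32_t guest_streamid;  /* DevID the domU driver writes (its virtual BDF) */
    uint32_t phys_streamid;   /* real stream id: physical segment:BDF */
};

struct streamid_map {
    unsigned int nr;
    struct streamid_map_entry entry[STREAMID_MAP_MAX];
};

/* Would be called from the (proposed) PHYSDEVOP_map_streamid handler. */
static int streamid_map_add(struct streamid_map *map,
                            uint32_t guest_sid, uint32_t phys_sid)
{
    if (map->nr >= STREAMID_MAP_MAX)
        return -1;
    map->entry[map->nr].guest_streamid = guest_sid;
    map->entry[map->nr].phys_streamid  = phys_sid;
    map->nr++;
    return 0;
}

/* Would be called when an ITS command write from the domU traps into Xen. */
static int streamid_map_lookup(const struct streamid_map *map,
                               uint32_t guest_sid, uint32_t *phys_sid)
{
    unsigned int i;

    for (i = 0; i < map->nr; i++) {
        if (map->entry[i].guest_streamid == guest_sid) {
            *phys_sid = map->entry[i].phys_streamid;
            return 0;
        }
    }
    return -1;   /* unknown DevID: fault or drop the command */
}

A hypercall keeps the mapping dynamic, which is why I prefer it over
encoding the StreamID statically in the domU device tree.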
But I am not able to get the correct BDF of domU. For instance, the logs
at two different places give different BDFs.

# xl pci-attach 1 '0002:01:00.1,permissive=1'

xen-pciback pci-1-0: xen_pcibk_export_device exporting dom 2 bus 1 slot 0 func 1
xen_pciback: vpci: 0002:01:00.1: assign to virtual slot 1
xen_pcibk_publish_pci_dev 0000:00:01.00

Code that generated the print:

static int xen_pcibk_publish_pci_dev(struct xen_pcibk_device *pdev,
                                     unsigned int domain, unsigned int bus,
                                     unsigned int devfn, unsigned int devid)
{
        ...
        printk(KERN_ERR "%s %04x:%02x:%02x.%02x", __func__, domain, bus,
               PCI_SLOT(devfn), PCI_FUNC(devfn));

While in xen_pcibk_do_op the print is:

xen_pcibk_do_op Guest SBDF=0:0:1.1 (this is what lspci shows in domU)

Code that generated the print:

void xen_pcibk_do_op(struct work_struct *data)
{
        ...
        if (dev == NULL)
                op->err = XEN_PCI_ERR_dev_not_found;
        else {
                printk(KERN_ERR "%s Guest SBDF=%d:%d:%d.%d \r\n", __func__,
                       op->domain, op->bus, op->devfn >> 3, op->devfn & 0x7);

Stefano, I need your help in this.

-Regards
Manish