On Tue, Mar 29, 2022 at 07:58:51PM +0000, Jag Raman wrote:
>
>
> > On Mar 29, 2022, at 10:48 AM, Stefan Hajnoczi wrote:
> >
> > On Tue, Mar 29, 2022 at 02:12:40PM +0000, Jag Raman wrote:
> >>> On Mar 29, 2022, at 8:35 AM, Stefan Hajnoczi wrote:
> >>> On Fri, Mar 25, 2022 at 03:19:41PM -0400, Jagannathan Raman wrote:
> >>>> +void remote_iommu_del_device(PCIDevice *pci_dev)
> >>>> +{
> >>>> +    int pci_bdf;
> >>>> +
> >>>> +    if (!remote_iommu_table.elem_by_bdf || !pci_dev) {
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
> >>>> +
> >>>> +    qemu_mutex_lock(&remote_iommu_table.lock);
> >>>> +    g_hash_table_remove(remote_iommu_table.elem_by_bdf, INT2VOIDP(pci_bdf));
> >>>> +    qemu_mutex_unlock(&remote_iommu_table.lock);
> >>>> +}
> >>>> +
> >>>> +void remote_configure_iommu(PCIBus *pci_bus)
> >>>> +{
> >>>> +    if (!remote_iommu_table.elem_by_bdf) {
> >>>> +        remote_iommu_table.elem_by_bdf =
> >>>> +            g_hash_table_new_full(NULL, NULL, NULL, remote_iommu_del_elem);
> >>>> +        qemu_mutex_init(&remote_iommu_table.lock);
> >>>> +    }
> >>>> +
> >>>> +    pci_setup_iommu(pci_bus, remote_iommu_find_add_as, &remote_iommu_table);
> >>>
> >>> Why is remote_iommu_table global? It could be per-PCIBus and indexed by
> >>> just devfn instead of the full BDF.
> >>
> >> It’s global because remote_iommu_del_device() needs it for cleanup.
> >
> > Can remote_iommu_del_device() use pci_get_bus(pci_dev)->irq_opaque to
> > get the per-bus table?
>
> pci_get_bus(pci_dev)->irq_opaque is used for interrupts.
>
> PCIBus already has an iommu_opaque, which is a private
> member of the bus structure. It’s passed as an argument
> to the iommu_fn().
>
> We could add a getter function to retrieve PCIBus->iommu_opaque
> in remote_iommu_del_device(). That way we could avoid the global variable.

I've CCed Michael, Peter, and Jason regarding IOMMUs.

This makes me wonder whether there is a deeper issue with the
pci_setup_iommu() API: the lack of per-device cleanup callbacks.
Per-device IOMMU resources should be freed when a device is hot
unplugged.

From what I can tell this is not the case today:

- hw/i386/intel_iommu.c:vtd_find_add_as() allocates and adds device
  address spaces but I can't find where they are removed and freed.
  VTDAddressSpace instances pointed to from vtd_bus->dev_as[] are leaked.

- hw/i386/amd_iommu.c has similar leaks.

Your patch introduces a custom remote_iommu_del_device() function, but I
think the pci_setup_iommu() API should take a device_del() callback so
IOMMUs have a standard interface for handling per-device cleanup.

BTW in your case remote_iommu_del_device() is sufficient because hot
unplug is blocked by the new unplug blocker mechanism you introduced.
For other IOMMUs unplug will not be blocked and therefore IOMMUs really
need a callback for per-device cleanup.

Stefan
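
P.S. To make the device_del() idea concrete, here is a rough standalone
sketch of what such an interface might look like. It is not QEMU code:
MockPCIBus, MockAddressSpace, MockIOMMUTable, mock_pci_setup_iommu() and
mock_pci_device_unplugged() are all made-up stand-ins, and the extra
del_fn parameter is an assumption about how the API could be extended,
not something that exists today. The per-bus table is indexed by devfn
only and reached through the bus's opaque pointer, so no global is
needed.

```c
/* Sketch only: mock types stand in for QEMU's PCIBus and AddressSpace. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_DEVFN_MAX 256

typedef struct MockAddressSpace { int devfn; } MockAddressSpace;
typedef struct MockPCIBus MockPCIBus;

/* Same shape as the existing lookup callback: find or create per-device state. */
typedef MockAddressSpace *(*MockIOMMUFunc)(MockPCIBus *bus, void *opaque, int devfn);
/* Hypothetical addition: release per-device state when the device goes away. */
typedef void (*MockIOMMUDelFunc)(MockPCIBus *bus, void *opaque, int devfn);

struct MockPCIBus {
    MockIOMMUFunc iommu_fn;
    MockIOMMUDelFunc iommu_del_fn;   /* hypothetical new member */
    void *iommu_opaque;
};

/* Hypothetical variant of pci_setup_iommu() that also takes a cleanup callback. */
void mock_pci_setup_iommu(MockPCIBus *bus, MockIOMMUFunc fn,
                          MockIOMMUDelFunc del_fn, void *opaque)
{
    bus->iommu_fn = fn;
    bus->iommu_del_fn = del_fn;
    bus->iommu_opaque = opaque;
}

/* What a generic unplug path could call, so every IOMMU gets the same hook. */
void mock_pci_device_unplugged(MockPCIBus *bus, int devfn)
{
    if (bus->iommu_del_fn) {
        bus->iommu_del_fn(bus, bus->iommu_opaque, devfn);
    }
}

/* Per-bus IOMMU state, indexed by devfn instead of the full BDF. */
typedef struct MockIOMMUTable {
    MockAddressSpace *as_by_devfn[MOCK_DEVFN_MAX];
    int live;                        /* number of allocated entries */
} MockIOMMUTable;

static MockAddressSpace *mock_find_add_as(MockPCIBus *bus, void *opaque, int devfn)
{
    MockIOMMUTable *table = opaque;

    (void)bus;
    assert(devfn >= 0 && devfn < MOCK_DEVFN_MAX);
    if (!table->as_by_devfn[devfn]) {
        MockAddressSpace *as = calloc(1, sizeof(*as));
        if (!as) {
            return NULL;
        }
        as->devfn = devfn;
        table->as_by_devfn[devfn] = as;
        table->live++;
    }
    return table->as_by_devfn[devfn];
}

static void mock_del_as(MockPCIBus *bus, void *opaque, int devfn)
{
    MockIOMMUTable *table = opaque;

    (void)bus;
    assert(devfn >= 0 && devfn < MOCK_DEVFN_MAX);
    if (table->as_by_devfn[devfn]) {
        free(table->as_by_devfn[devfn]);
        table->as_by_devfn[devfn] = NULL;
        table->live--;
    }
}

/* Add two devices, unplug both, and report how many entries are still live. */
int mock_demo_live_after_unplug(void)
{
    MockIOMMUTable table = {0};
    MockPCIBus bus = {0};

    mock_pci_setup_iommu(&bus, mock_find_add_as, mock_del_as, &table);
    bus.iommu_fn(&bus, bus.iommu_opaque, 8);
    bus.iommu_fn(&bus, bus.iommu_opaque, 8);   /* second lookup reuses the entry */
    bus.iommu_fn(&bus, bus.iommu_opaque, 16);
    mock_pci_device_unplugged(&bus, 8);
    mock_pci_device_unplugged(&bus, 16);
    return table.live;
}
```

With the callback registered alongside the lookup function, the unplug
path does not need to know which IOMMU is in use, and an IOMMU that
never registers a del_fn simply keeps today's behaviour.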