On Thu, Jun 03, 2021 at 09:28:32AM -0300, Jason Gunthorpe wrote:
> On Thu, Jun 03, 2021 at 03:23:17PM +1000, David Gibson wrote:
> > On Wed, Jun 02, 2021 at 01:37:53PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Jun 02, 2021 at 04:57:52PM +1000, David Gibson wrote:
> > >
> > > > I don't think presence or absence of a group fd makes a lot of
> > > > difference to this design. Having a group fd just means we attach
> > > > groups to the ioasid instead of individual devices, and we no longer
> > > > need the bookkeeping of "partial" devices.
> > >
> > > Oh, I think we really don't want to attach the group to an ioasid, or
> > > at least not as a first-class idea.
> > >
> > > The fundamental problem that got us here is we now live in a world
> > > where there are many ways to attach a device to an IOASID:
> >
> > I'm not seeing that that's necessarily a problem.
> >
> > > - A RID binding
> > > - A RID,PASID binding
> > > - A RID,PASID binding for ENQCMD
> >
> > I have to admit I haven't fully grasped the differences between these
> > modes. I'm hoping we can consolidate at least some of them into the
> > same sort of binding onto different IOASIDs (which may be linked in
> > parent/child relationships).
>
> What I would like is that the /dev/iommu side managing the IOASID
> doesn't really care much, but the device driver has to tell
> drivers/iommu what it is going to do when it attaches.

By the device driver, do you mean the userspace or guest device
driver?  Or do you mean the vfio_pci or mdev "shim" device driver?

> It makes sense, in PCI terms, only the driver knows what TLPs the
> device will generate. The IOMMU needs to know what TLPs it will
> receive to configure properly.
>
> PASID or not is major device specific variation, as is the ENQCMD/etc
>
> Having the device be explicit when it tells the IOMMU what it is going
> to be sending is a major plus to me. I actually don't want to see this
> part of the interface be made less strong.

Ok, if I'm understanding this right a PASID capable IOMMU will be able
to process *both* transactions with just a RID and transactions with a
RID+PASID.

So if we're thinking of this notional 84ish-bit address space, then
that includes "no PASID" as well as all the possible PASID values.
Yes?  Or am I confused?

>
> > > The selection of which mode to use is based on the specific
> > > driver/device operation. Ie the thing that implements the 'struct
> > > vfio_device' is the thing that has to select the binding mode.
> >
> > I thought userspace selected the binding mode - although not all modes
> > will be possible for all devices.
>
> /dev/iommu is concerned with setting up the IOAS and filling the IO
> page tables with information
>
> The driver behind "struct vfio_device" is responsible to "route" its
> HW into that IOAS.
>
> They are two halves of the problem, one is only the io page table, and one
> is the connection of a PCI TLP to a specific io page table.
>
> Only the driver knows what format of TLPs the device will generate so
> only the driver can specify the "route"

Ok.  I'd really like it if we could encode this in a way that doesn't
build PCI-specific structure into the API, though.

>
> > > eg if two PCI devices are in a group then it is perfectly fine that
> > > one device uses RID binding and the other device uses RID,PASID
> > > binding.
> >
> > Uhhhh... I don't see how that can be.  They could well be in the same
> > group because their RIDs cannot be distinguished from each other.
>
> Inability to match the RID is rare, certainly I would expect any IOMMU
> HW that can do PCIe PASID matching can also do RID matching.

It's not just up to the IOMMU.  The obvious case is a PCIe-to-PCI
bridge.  All transactions show the RID of the bridge, because vanilla
PCI doesn't have RIDs.  Same situation with a buggy multifunction
device which uses function 0's RID for all functions.

It may be rare, but we still have to deal with it one way or another.

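To make the aliasing case concrete, here is a rough sketch in C.
Everything in it (struct dev_model, rid_aliased(),
attachments_consistent()) is a made-up name for illustration, not an
existing kernel, VFIO or /dev/iommu interface.  It assumes the only
thing the IOMMU can key on is the RID a transaction carries, and it
treats "one aliased device attached, the other not" as inconsistent,
which is an assumption of the sketch rather than settled policy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one PCI function, as the IOMMU sees it. */
struct dev_model {
	uint16_t own_rid;     /* bus/dev/fn the function was assigned */
	uint16_t visible_rid; /* RID its DMA actually carries upstream,
	                       * e.g. the bridge's RID behind a
	                       * PCIe-to-PCI bridge */
	int ioasid;           /* IOASID it is attached to, -1 if none */
};

/* The IOMMU cannot tell two devices apart if their DMA shows the same RID. */
static bool rid_aliased(const struct dev_model *a, const struct dev_model *b)
{
	return a->visible_rid == b->visible_rid;
}

/*
 * Return true only if every pair of devices that share a visible RID
 * is attached to the same IOASID (assumption: -1 vs. attached counts
 * as a mismatch).
 */
static bool attachments_consistent(const struct dev_model *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		for (size_t j = i + 1; j < n; j++) {
			if (!rid_aliased(&devs[i], &devs[j]))
				continue;
			if (devs[i].ioasid != devs[j].ioasid)
				return false;
		}
	}
	return true;
}
```

With two functions behind one bridge (both showing the bridge's RID),
attachments_consistent() only passes when both point at the same
IOASID, whichever device the attach call was nominally made on.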
I really don't think we want to support multiple binding types for a
single group.

> With
> such HW the above is perfectly fine - the group may not be secure
> between members (eg !ACS), but the TLPs still carry valid RIDs and
> PASID and the IOMMU can still discriminate.

They carry RIDs; whether they're valid depends on how buggy your
hardware is.

> I think you are talking about really old IOMMU's that could only
> isolate based on ingress port or something.. I suppose modern PCIe has
> some cases like this in the NTB stuff too.

Depends what you mean by really old.  They may seem really old to
those working on new fancy IOMMU technology.  But I hit problems in
practice not long ago with awkward multi-device groups on a particular
Dell server without an ACS implementation.  Likewise I strongly
suspect non-PASID IOMMUs will remain common on low end hardware (like
people's laptops) for some time.

> Oh, I hadn't spent time thinking about any of those.. It is messy but
> it can still be forced to work, I guess. A device centric model means
> all the devices using the same routing ID have to be connected to the
> same IOASID by userspace. So some of the connections will be NOPs.

See, that's exactly what I thought the group checks were enforcing.

I'm really hoping we don't need two levels of granularity here: groups
of devices that can't be identified from each other, and then groups
of those that can't be isolated from each other.  That introduces a
huge amount of extra conceptual complexity.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson