On Thu, Oct 14, 2021 at 11:52:08AM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 14, 2021 at 03:53:33PM +1100, David Gibson wrote:
>
> > > My feeling is that qemu should be dealing with the host != target
> > > case, not the kernel.
> > >
> > > The kernel's job should be to expose the IOMMU HW it has, with all
> > > features accessible, to userspace.
> >
> > See... to me this is contrary to the point we agreed on above.
>
> I'm not thinking of these as exclusive ideas.
>
> The IOCTL interface in iommu can quite happily expose:
>   Create IOAS generically
>   Manipulate IOAS generically
>   Create IOAS with IOMMU driver specific attributes
>   HW specific Manipulate IOAS
>
> IOCTL commands all together.
>
> So long as everything is focused on a generic in-kernel IOAS object it
> is fine to have multiple ways in the uAPI to create and manipulate the
> objects.
>
> When I speak about a generic interface I mean "Create IOAS
> generically" - ie a set of IOCTLs that work on most IOMMU HW and can
> be relied upon by things like DPDK/etc to always work and be portable.
> This is why I like "hints" to provide some limited widely applicable
> micro-optimization.
>
> When I said "expose the IOMMU HW it has with all features accessible"
> I mean also providing "Create IOAS with IOMMU driver specific
> attributes".
>
> These other IOCTLs would allow the IOMMU driver to expose every
> configuration knob its HW has, in a natural HW centric language.
> There is no pretense of genericness here, no crazy foo=A, foo=B hidden
> device specific interface.
>
> Think of it as a high level/low level interface to the same thing.

Ok, I see what you mean.

> > Those are certainly wrong, but they came about explicitly by *not*
> > being generic rather than by being too generic.  So I'm really
> > confused as to what you're arguing for / against.
>
> IMHO it is not having a PPC specific interface that was the problem,
> it was making the PPC specific interface exclusive to the type 1
> interface.
> If type 1 continued to work on PPC then DPDK/etc would
> never have learned PPC specific code.

Ok, but the reason this happened is that the initial version of type 1
*could not* be used on PPC.  The original Type 1 implicitly promised a
"large" IOVA range beginning at IOVA 0 without any real way of
specifying or discovering how large that range was.  Since ppc could
typically only give a 2GiB range at IOVA 0, that wasn't usable.

That's why I say the problem was not making type1 generic enough.  I
believe the current version of Type1 has addressed this - at least
enough to be usable in common cases.  But by this time the ppc backend
is already out there, so no-one's had the capacity to go back and make
ppc work with Type1.

> For iommufd with the high/low interface each IOMMU HW should ask basic
> questions:
>
>  - What should the generic high level interface do on this HW?
>    For instance what should 'Create IOAS generically' do for PPC?
>    It should not fail, it should create *something*
>    What is the best thing for DPDK?
>    I guess the 64 bit window is most broadly useful.

Right, which means the kernel must (at least in the common case) have
the capacity to choose and report a non-zero base-IOVA.

Hrm... which makes me think... if we allow this for the common
kernel-managed case, do we even need to have capacity in the high-level
interface for reporting IO holes?  If the kernel can choose a non-zero
base, it could just choose on x86 to place its advertised window above
the IO hole.

>  - How to accurately describe the HW in terms of standard IOAS objects
>    and where to put HW specific structs to support this.
>
>    This is where PPC would decide how best to expose a control over
>    its low/high window (eg 1,2,3 IOAS). Whatever the IOMMU driver
>    wants, so long as it fits into the kernel IOAS model facing the
>    connected device driver.
>
> QEMU would have IOMMU userspace drivers. One would be the "generic
> driver" using only the high level generic interface.
> It should work as
> best it can on all HW devices. This is the fallback path you talked
> of.
>
> QEMU would also have HW specific IOMMU userspace drivers that know how
> to operate the exact HW. eg these drivers would know how to use
> userspace page tables, how to form IOPTEs and how to access the
> special features.
>
> This is how QEMU could use an optimized path with nested page tables,
> for instance.

The concept makes sense in general.  The devil's in the details, as
usual.

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson