On Thu, Jun 03, 2021 at 01:29:58AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Thursday, June 3, 2021 12:09 AM
> >
> > On Wed, Jun 02, 2021 at 01:33:22AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe
> > > > Sent: Wednesday, June 2, 2021 1:42 AM
> > > >
> > > > On Tue, Jun 01, 2021 at 08:10:14AM +0000, Tian, Kevin wrote:
> > > > > > From: Jason Gunthorpe
> > > > > > Sent: Saturday, May 29, 2021 1:36 AM
> > > > > >
> > > > > > On Thu, May 27, 2021 at 07:58:12AM +0000, Tian, Kevin wrote:
> > > > > >
> > > > > > > IOASID nesting can be implemented in two ways: hardware
> > > > > > > nesting and software nesting. With hardware support the
> > > > > > > child and parent I/O page tables are walked consecutively
> > > > > > > by the IOMMU to form a nested translation. When it's
> > > > > > > implemented in software, the ioasid driver is responsible
> > > > > > > for merging the two-level mappings into a single-level
> > > > > > > shadow I/O page table. Software nesting requires both
> > > > > > > child/parent page tables operated through the dma mapping
> > > > > > > protocol, so any change in either level can be captured
> > > > > > > by the kernel to update the corresponding shadow mapping.
> > > > > >
> > > > > > Why? A SW emulation could do this synchronization during
> > > > > > invalidation processing if invalidation contained an IOVA
> > > > > > range.
> > > > >
> > > > > In this proposal we differentiate between host-managed and
> > > > > user-managed I/O page tables. If host-managed, the user is
> > > > > expected to use map/unmap cmd explicitly upon any change
> > > > > required on the page table. If user-managed, the user first
> > > > > binds its page table to the IOMMU and then use invalidation
> > > > > cmd to flush iotlb when necessary (e.g. typically not required
> > > > > when changing a PTE from non-present to present).
> > > > >
> > > > > We expect user to use map+unmap and bind+invalidate
> > > > > respectively instead of mixing them together. Following this
> > > > > policy, map+unmap must be used in both levels for software
> > > > > nesting, so changes in either level are captured timely to
> > > > > synchronize the shadow mapping.
> > > >
> > > > map+unmap or bind+invalidate is a policy of the IOASID itself
> > > > set when it is created. If you put two different types in a
> > > > tree then each IOASID must continue to use its own operation
> > > > mode.
> > > >
> > > > I don't see a reason to force all IOASIDs in a tree to be
> > > > consistent??
> > >
> > > only for software nesting. With hardware support the parent uses
> > > map while the child uses bind.
> > >
> > > Yes, the policy is specified per IOASID. But if the policy
> > > violates the requirement in a specific nesting mode, then nesting
> > > should fail.
> >
> > I don't get it.
> >
> > If the IOASID is a page table then it is bind/invalidate. SW or not
> > SW doesn't matter at all.
> >
> > > >
> > > > A software emulated two level page table where the leaf level
> > > > is a bound page table in guest memory should continue to use
> > > > bind/invalidate to maintain the guest page table IOASID even
> > > > though it is a SW construct.
> > >
> > > with software nesting the leaf should be a host-managed page
> > > table (or metadata). A bind/invalidate protocol doesn't require
> > > the user to notify the kernel of every page table change.
> >
> > The purpose of invalidate is to inform the implementation that the
> > page table has changed so it can flush the caches. If the page table
> > is changed and invalidation is not issued then the implementation
> > is free to ignore the changes.
> >
> > In this way the SW mode is the same as a HW mode with an infinite
> > cache.
> >
> > The collapsed shadow page table is really just a cache.
> >
>
> OK.
> One additional thing is that we may need a 'caching_mode'
> thing reported by /dev/ioasid, indicating whether invalidation is
> required when changing non-present to present. For hardware
> nesting it's not reported as the hardware IOMMU will walk the
> guest page table in cases of iotlb miss. For software nesting
> caching_mode is reported so the user must issue invalidation
> upon any change in guest page table so the kernel can update
> the shadow page table timely.

For the first cut, I'd have the API assume that invalidates are
*always* required.  Some bypass to avoid them in cases where they're
not needed can be an additional extension.

> Following this and your other comment with David, we will mark
> host-managed vs. guest-managed explicitly for I/O page table
> of each IOASID. map+unmap or bind+invalidate is decided by
> which owner is specified by the user.
>
> Thanks
> Kevin
>

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
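
As a rough illustration of the "shadow page table is really just a
cache" view in the thread above, here is a toy Python model. It is only
a sketch under stated assumptions: the class and method names
(SoftwareNestedIoasid, invalidate, translate) are hypothetical and are
not the proposed /dev/ioasid uAPI, page tables are reduced to dicts
keyed by page number, and parent-side map/unmap is not modeled.

```python
class SoftwareNestedIoasid:
    """Toy model of software nesting (hypothetical names throughout).

    The shadow table is refreshed only when the user invalidates an
    IOVA range, so it behaves like a cache that never evicts.
    """

    def __init__(self, guest_table, parent_map):
        # User-managed table (bound, not copied): IOVA page -> GPA page.
        self.guest_table = guest_table
        # Host-managed table (map/unmap): GPA page -> HPA page.
        self.parent_map = parent_map
        # Collapsed shadow table: IOVA page -> HPA page.
        self.shadow = {}

    def invalidate(self, start, npages):
        """Re-walk both levels for every page in [start, start + npages)."""
        for iova in range(start, start + npages):
            self.shadow.pop(iova, None)
            gpa = self.guest_table.get(iova)
            hpa = self.parent_map.get(gpa) if gpa is not None else None
            if hpa is not None:
                self.shadow[iova] = hpa

    def translate(self, iova):
        """Return what the device would see: the shadow entry or None."""
        return self.shadow.get(iova)


guest = {}                           # guest page table, initially empty
parent = {0x10: 0x900, 0x11: 0x901}  # parent mappings, fixed in this sketch
ioasid = SoftwareNestedIoasid(guest, parent)

guest[0x1] = 0x10                    # non-present -> present, no invalidate
before = ioasid.translate(0x1)       # change is not visible yet
ioasid.invalidate(0x1, 1)
after = ioasid.translate(0x1)        # now resolved through both levels
print(before, hex(after))
```

In this model even a non-present to present change stays invisible
until invalidate() is called, which matches the "invalidates are always
required" assumption for a first cut. A caching_mode-style bypass would
be a later refinement on top of this.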