On Mon, Oct 11, 2021 at 09:49:57AM +0100, Jean-Philippe Brucker wrote: > On Mon, Oct 11, 2021 at 05:02:01PM +1100, David Gibson wrote: > > qemu wants to emulate a PAPR vIOMMU, so it says (via interfaces yet to > > be determined) that it needs an IOAS where things can be mapped in the > > range 0..2GiB (for the 32-bit window) and 2^59..2^59+1TiB (for the > > 64-bit window). > > > > Ideally the host /dev/iommu will say "ok!", since both those ranges > > are within the 0..2^60 translated range of the host IOMMU, and don't > > touch the IO hole. When the guest calls the IO mapping hypercalls, > > qemu translates those into DMA_MAP operations, and since they're all > > within the previously verified windows, they should work fine. > > Seems like we don't need the negotiation part? The host kernel > communicates available IOVA ranges to userspace including holes (patch > 17), and userspace can check that the ranges it needs are within the IOVA > space boundaries. That part is necessary for DPDK as well since it needs > to know about holes in the IOVA space where DMA wouldn't work as expected > (MSI doorbells for example). And there already is a negotiation happening, > when the host kernel rejects MAP ioctl outside the advertised area. The problem with the approach where the kernel advertises and userspace selects based on that, is that it locks us into a specific representation of what's possible. If we get new hardware with new weird constraints that can't be expressed with the representation we chose, we're kind of out of stuffed. Userspace will have to change to accomodate the new extension and have any chance of working on the new hardware. With the model where userspace requests, and the kernel acks or nacks, we can still support existing userspace if the only things it requests can still be accomodated in the new constraints. That's pretty likely if the majority of userspaces request very simple things (say a single IOVA block where it doesn't care about the base address). -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson