On Fri, Feb 11, 2022 at 12:15:44AM +0100, Janne Grunau wrote:
> On 2022-02-09 17:31:16 +0100, Thierry Reding wrote:
> > On Sun, Feb 06, 2022 at 11:27:00PM +0100, Janne Grunau wrote:
> > > On 2021-09-15 17:19:39 +0200, Thierry Reding wrote:
> > > > On Tue, Sep 07, 2021 at 07:44:44PM +0200, Thierry Reding wrote:
> > > > > On Tue, Sep 07, 2021 at 10:33:24AM -0500, Rob Herring wrote:
> > > > > > On Fri, Sep 3, 2021 at 10:36 AM Thierry Reding wrote:
> > > > > > >
> > > > > > > On Fri, Sep 03, 2021 at 09:36:33AM -0500, Rob Herring wrote:
> > > > > > > > On Fri, Sep 3, 2021 at 8:52 AM Thierry Reding wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Sep 03, 2021 at 08:20:55AM -0500, Rob Herring wrote:
> > > > > > > > > >
> > > > > > > > > > Couldn't we keep this all in /reserved-memory? Just add an iova
> > > > > > > > > > version of reg. Perhaps abuse 'assigned-address' for this purpose. The
> > > > > > > > > > issue I see would be handling reserved iova areas without a physical
> > > > > > > > > > area. That can be handled with just a iova and no reg. We already have
> > > > > > > > > > a no reg case.
> > > > > > > > >
> > > > > > > > > I had thought about that initially. One thing I'm worried about is that
> > > > > > > > > every child node in /reserved-memory will effectively cause the memory
> > > > > > > > > that it described to be reserved. But we don't want that for regions
> > > > > > > > > that are "virtual only" (i.e. IOMMU reservations).
> > > > > > > >
> > > > > > > > By virtual only, you mean no physical mapping, just a region of
> > > > > > > > virtual space, right? For that we'd have no 'reg' and therefore no
> > > > > > > > (physical) reservation by the OS. It's similar to non-static regions.
> > > > > > > > You need a specific handler for them. We'd probably want a compatible
> > > > > > > > as well for these virtual reservations.
> > > > > > > >
> > > > > > > Yeah, these would be purely used for reserving regions in the IOVA so
> > > > > > > that they won't be used by the IOVA allocator. Typically these would be
> > > > > > > used for cases where those addresses have some special meaning.
> > > > > > >
> > > > > > > Do we want something like:
> > > > > > >
> > > > > > >     compatible = "iommu-reserved";
> > > > > > >
> > > > > > > for these? Or would that need to be:
> > > > > > >
> > > > > > >     compatible = "linux,iommu-reserved";
> > > > > > >
> > > > > > > ? There seems to be a mix of vendor-prefix vs. non-vendor-prefix
> > > > > > > compatible strings in the reserved-memory DT bindings directory.
> > > > > >
> > > > > > I would not use 'linux,' here.
> > > > > >
> > > > > > >
> > > > > > > On the other hand, do we actually need the compatible string? Because we
> > > > > > > don't really want to associate much extra information with this like we
> > > > > > > do for example with "shared-dma-pool". The logic to handle this would
> > > > > > > all be within the IOMMU framework. All we really need is for the
> > > > > > > standard reservation code to skip nodes that don't have a reg property
> > > > > > > so we don't reserve memory for "virtual-only" allocations.
> > > > > >
> > > > > > It doesn't hurt to have one and I can imagine we might want to iterate
> > > > > > over all the nodes. It's slightly easier and more common to iterate
> > > > > > over compatible nodes rather than nodes with some property.
> > > > > >
> > > > > > > > Are these being global in DT going to be a problem? Presumably we have
> > > > > > > > a virtual space per IOMMU. We'd know which IOMMU based on a device's
> > > > > > > > 'iommus' and 'memory-region' properties, but within /reserved-memory
> > > > > > > > we wouldn't be able to distinguish overlapping addresses from separate
> > > > > > > > address spaces. Or we could have 2 different IOVAs for 1 physical
> > > > > > > > space. That could be solved with something like this:
> > > > > > > >
> > > > > > > >     iommu-addresses = <&iommu1 >;
> > > > > > >
> > > > > > > The only case that would be problematic would be if we have overlapping
> > > > > > > physical regions, because that will probably trip up the standard code.
> > > > > > >
> > > > > > > But this could also be worked around by looking at iommu-addresses. For
> > > > > > > example, if we had something like this:
> > > > > > >
> > > > > > >     reserved-memory {
> > > > > > >         fb_dc0: fb@80000000 {
> > > > > > >             reg = <0x80000000 0x01000000>;
> > > > > > >             iommu-addresses = <0xa0000000 0x01000000>;
> > > > > > >         };
> > > > > > >
> > > > > > >         fb_dc1: fb@80000000 {
> > > > > >
> > > > > > You can't have 2 nodes with the same name (actually, you can, they
> > > > > > just get merged together). Different names with the same unit-address
> > > > > > is a dtc warning. I'd really like to make that a full blown
> > > > > > overlapping region check.
> > > > >
> > > > > Right... so this would be a lot easier to deal with using that earlier
> > > > > proposal where the IOMMU regions were a separate thing and referencing
> > > > > the reserved-memory nodes. In those cases we could just have the
> > > > > physical reservation for the framebuffer once (so we don't get any
> > > > > duplicates or overlaps) and then have each IOVA reservation reference
> > > > > that to create the mapping.
> > > > >
> > > > > >
> > > > > > >             reg = <0x80000000 0x01000000>;
> > > > > > >             iommu-addresses = <0xb0000000 0x01000000>;
> > > > > > >         };
> > > > > > >     };
> > > > > > >
> > > > > > > We could make the code identify that this is for the same physical
> > > > > > > reservation (maybe make it so that reg needs to match exactly for this
> > > > > > > to be recognized) but with different virtual allocations.
> > > > > > >
> > > > > > > On a side-note: do we really need to repeat the size? I'd think if we
> > > > > > > want mappings then we'd likely want them for the whole reservation.
> > > > > >
> > > > > > Humm, I suppose not, but dropping it paints us into a corner if we
> > > > > > come up with wanting a different size later. You could have a carveout
> > > > > > for double/triple buffering your framebuffer, but the bootloader
> > > > > > framebuffer is only single buffered. So would you want actual size?
> > > > >
> > > > > Perhaps this needs to be a bit more verbose then. If we want the ability
> > > > > to create a mapping for only a partial reservation, I could imagine we
> > > > > may as well want one that doesn't start at the beginning. So perhaps an
> > > > > ever better solution would be to have a complete mapping, something that
> > > > > works similar to "ranges" perhaps, like so:
> > > > >
> > > > >     fb@80000000 {
> > > > >         reg = <0x80000000 0x01000000>;
> > > > >         iommu-ranges = <0x80000000 0x01000000 0x80000000>;
> > > > >     };
> > > > >
> > > > > That would be for a full identity mapping, but we could also have
> > > > > something along the lines of this:
> > > > >
> > > > >     fb@80000000 {
> > > > >         reg = <0x80000000 0x01000000>;
> > > > >         iommu-ranges = <0x80100000 0x00100000 0xa0000000>;
> > > > >     };
> > > > >
> > > > > So that would only map a 1 MiB chunk at offset 1 MiB (of the physical
> > > > > reservation) to I/O virtual address 0xa0000000.
> > > > >
> > > > > > >
> > > > > > > I'd like to keep references to IOMMUs out of this because they would be
> > > > > > > duplicated. We will only use these nodes if they are referenced by a
> > > > > > > device node that also has an iommus property. Also, the IOMMU reference
> > > > > > > itself isn't enough. We'd also need to support the complete specifier
> > > > > > > because you can have things like SIDs in there to specify the exact
> > > > > > > address space that a device uses.
> > > > > > >
> > > > > > > Also, for some of these they may be reused independently of the IOMMU
> > > > > > > address space. For example the Tegra framebuffer identity mapping can
> > > > > > > be used by either of the 2-4 display controllers, each with (at least
> > > > > > > potentially) their own address space. But we don't want to have to
> > > > > > > describe the identity mapping separately for each display controller.
> > > > > >
> > > > > > Okay, but I'd rather have to duplicate things in your case than not be
> > > > > > able to express some other case.
> > > > >
> > > > > The earlier "separate iov-reserved-memory" proposal would be a good
> > > > > compromise here. It'd allow us to duplicate only the necessary bits
> > > > > (i.e. the IOVA mappings) but keep the common bits simple. And even
> > > > > the IOVA mappings could be shared for cases like identity mappings.
> > > > > See below for more on that.
> > > > >
> > > > > > >
> > > > > > > Another thing to consider is that these nodes will often be added by
> > > > > > > firmware (e.g. firmware will allocate the framebuffer and set up the
> > > > > > > corresponding reserved memory region in DT). Wiring up references like
> > > > > > > this would get very complicated very quickly.
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > The using 'iommus' property option below can be optional and doesn't
> > > > > > have to be defined/supported now. Just trying to think ahead and not
> > > > > > be stuck with something that can't be extended.
> > > > >
> > > > > One other benefit of the separate iov-reserved-memory node would be that
> > > > > the iommus property could be simplified. If we have a physical
> > > > > reservation that needs to be accessed by multiple different display
> > > > > controllers, we'd end up with something fairly complex, such as this:
> > > > >
> > > > >     fb: fb@80000000 {
> > > > >         reg = <0x80000000 0x01000000>;
> > > > >         iommus = <&dc0_iommu 0xa0000000 0x01000000>,
> > > > >                  <&dc1_iommu 0xb0000000 0x01000000>,
> > > > >                  <&dc2_iommu 0xc0000000 0x01000000>;
> > > > >     };
> > > > >
> > > > > This would get even worse if we want to support partial mappings. Also,
> > > > > it'd become quite complicated to correlate this with the memory-region
> > > > > references:
> > > > >
> > > > >     dc0: dc@40000000 {
> > > > >         ...
> > > > >         memory-region = <&fb>;
> > > > >         iommus = <&dc0_iommu>;
> > > > >         ...
> > > > >     };
> > > > >
> > > > > So now you have to go match up the phandle (and potentially specifier)
> > > > > in the iommus property of the disp0 node with an entry in the fb node's
> > > > > iommus property. That's all fairly complicated stuff.
> > > > >
> > > > > With separate iov-reserved-memory, this would be a bit more verbose, but
> > > > > each individual node would be simpler:
> > > > >
> > > > >     reserved-memory {
> > > > >         fb: fb@80000000 {
> > > > >             reg = <0x80000000 0x01000000>;
> > > > >         };
> > > > >     };
> > > > >
> > > > >     iov-reserved-memory {
> > > > >         fb0: fb@80000000 {
> > > > >             /* identity mapping, "reg" optional? */
> > > > >             reg = <0x80000000 0x01000000>;
> > > > >             memory-region = <&fb>;
> > > > >         };
> > > > >
> > > > >         fb1: fb@90000000 {
> > > > >             /* but doesn't have to be */
> > > > >             reg = <0x90000000 0x01000000>;
> > > > >             memory-region = <&fb>;
> > > > >         };
> > > > >
> > > > >         fb2: fb@a0000000 {
> > > > >             /* can be partial, too */
> > > > >             ranges = <0x80000000 0x00800000 0xa0000000>;
> > > > >             memory-region = <&fb>;
> > > > >         };
> > > > >     }
> > > > >
> > > > >     dc0: dc@40000000 {
> > > > >         iov-memory-regions = <&fb0>;
> > > > >         /* optional? */
> > > > >         memory-region = <&fb>;
> > > > >         iommus = <&dc0_iommu>;
> > > > >     };
> > > > >
> > > > > Alternatively, if we want to support partial mappings, we could replace
> > > > > those reg properties by ranges properties that I showed earlier. We may
> > > > > even want to support both. Use "reg" for virtual-only reservations and
> > > > > identity mappings, or "simple partial mappings" (that map a sub-region
> > > > > starting from the beginning). Identity mappings could still be
> > > > > simplified by just omitting the "reg" property. For more complicated
> > > > > mappings, such as the ones on M1, the "ranges" property could be used.
> > > > >
> > > > > Note how this looks a bit boilerplate-y, but it's actually really quite
> > > > > simple to understand, even for humans, I think.
> > > > >
> > > > > Also, the phandles in this are comparatively easy to wire up because
> > > > > they can all be generated in a hierarchical way: generate physical
> > > > > reservation and store phandle, then generate I/O virtual reservation
> > > > > to reference that phandle and store the new phandle as well. Finally,
> > > > > wire this up to the display controller (using either the IOV phandle or
> > > > > both).
> > > > >
> > > > > Granted, this requires the addition of a new top-level node, but given
> > > > > how expressive this becomes, I think it might be worth a second
> > > > > consideration.
> > > >
> > > > I guess as a middle-ground between your suggestion and mine, we could
> > > > also move the IOV nodes back into reserved-memory. If we make sure the
> > > > names (together with unit-addresses) are unique, to support cases where
> > > > we want to identity map, or have multiple mappings at the same address.
> > > > So it'd look something like this:
> > > >
> > > >     reserved-memory {
> > > >         fb: fb@80000000 {
> > > >             reg = <0x80000000 0x01000000>;
> > > >         };
> > > >
> > > >         audio-firmware@ff000000 {
> > > >             /* perhaps add "iommu-reserved" for this case */
> > > >             compatible = "iommu-mapping";
> > > >             /*
> > > >              * no memory-region referencing a physical
> > > >              * reservation, indicates that this is an
> > > >              * IOMMU reservation, rather than a mapping
> > > >              */
> > > >             reg = <0xff000000 0x01000000>;
> > > >         };
> > > >
> > > >         fb0: fb-mapping@80000000 {
> > > >             compatible = "iommu-mapping";
> > > >             /* identity mapping, "reg" optional? */
> > > >             reg = <0x80000000 0x01000000>;
> > > >             memory-region = <&fb>;
> > > >         };
> > > >
> > > >         fb1: fb-mapping@90000000 {
> > > >             compatible = "iommu-mapping";
> > > >             /* but doesn't have to be */
> > > >             reg = <0x90000000 0x01000000>;
> > > >             memory-region = <&fb>;
> > > >         };
> > > >
> > > >         fb2: fb-mapping@a0000000 {
> > > >             compatible = "iommu-mapping";
> > > >             /* can be partial, too */
> > > >             ranges = <0xa0000000 0x00800000 0x80000000>;
> > > >             memory-region = <&fb>;
> > > >         };
> > > >     }
> > > >
> > > >     dc0: dc@40000000 {
> > > >         memory-region = <&fb0>;
> > > >         iommus = <&dc0_iommu>;
> > > >     };
> > > >
> > > > What do you think?
> > >
> > > I converted the Apple M1 display controller driver to using reserved
> > > regions using these bindings. It is sufficient for the needs of the M1
> > > display controller which is so far the only device requiring this.
> >
> > Thanks for trying this out. I've been meaning to resume this discussion
> > to finally get closure because we really want to enable this for various
> > Tegra SoCs.
> >
> > > I encountered two problems with this bindings proposal:
> > >
> > > 1) It is impossible to express which iommu needs to be used if a device
> > > has multiple "iommus" specified. This is on the M1 only a theoretical
> > > problem as the display co-processor devices use a single iommu.
> >
> > From what I recall this is something that we don't fully support either
> > way. If you've got a struct device and you want to allocate DMA'able
> > memory, you can only pass that struct device to the DMA API upon
> > allocation but you have no way of specifying separate instances
> > depending on use-case.
>
> Ok, let's us ignore then my complicated proposal. It is not a problem we
> need to solve for the M1.
>
> > > 2) The reserved regions can not easily looked up at iommu probe
> > > time. The Apple M1 iommu driver resets the iommu at probe. This
> > > breaks the framebuffer. The display controller appears to crash then
> > > an active scan-out framebuffer is unmapped. Resetting the iommu
> > > looks like a sensible approach though.
> > >
> > > To work around this I added custom property to the affected iommu node
> > > to avoid the reset. This doesn't feel correct since the reason to avoid
> > > the reset is that we have to maintain the reserved regions mapping until
> > > the display controller driver takes over.
> > > As far as I can see the only method to retrieve devices with reserved
> > > memory from the iommu is to iterate over all devices. This looks
> > > impractical. The M1 has over 20 distinct iommus.
> >
> > Do I understand correctly that on the M1, the firmware sets up a mapping
> > in the IOMMU already and then you want to recreate that mapping after
> > the IOMMU driver has reset the IOMMU?
>
> The mappings are already set up by firmware as it uses the frame buffer
> already itself. We need to make the kernel aware of the existing mapping
> so it can use the IOMMU. Using reserved memory regions and mappings
> seems to be clean way to do this. We want to reset IOMMUs without
> pre-existing mappings (the M1 has over 20 IOMMUs). We need a way to
> identify the two IOMMUs which must not be reseted at driver probe time.
> A simple property in the IOMMU node would be enough. It would duplicate
> information though since the only reason why we can't reset the IOMMU is
> the pre-existing mapping
>
> > In that case, how do you make sure that you atomically transition from
> > the firmware mapping to the kernel mapping? As soon as you reset the
> > IOMMU, the display controller will cause IOMMU faults because its now
> > scanning out from an unmapped buffer, right?
>
> We are replacing the entire firmware managed page table with a kernel
> managed one with a TTBR MMIO register write. The second IOMMU with
> pre-existing mapping has unfortunately the TTBR locked. Dealing with
> this is more complicated but the device using this IOMMU appears to
> sleep.
>
> > So that approach of avoiding the reset doesn't seem wrong to me.
> > Obviously that's not altogether trivial to do either. Typically the
> > IOMMU mappings would be contained in system memory, so you'd have to
> > reserve those via reserved-memory nodes as well, etc.
>
> The system memory is currently not expressed as reserved-memory but
> simply outside of the specified memory.
>
> > > One way to avoid both problems would be to move the mappings to the
> > > iommu node as sub nodes. The device would then reference those.
> > > This way the mapping is readily available at iommu probe time and
> > > adding iommu type specific parameters to map the region correctly is
> > > possible.
> > >
> > > The sample above would transfor to:
> > >
> > >     reserved-memory {
> > >         fb: fb@80000000 {
> > >             reg = <0x80000000 0x01000000>;
> > >         };
> > >     };
> > >
> > >     dc0_iommu: iommu@20000000 {
> > >         #iommu-cells = <1>;
> > >
> > >         fb0: fb-mapping@80000000 {
> > >             compatible = "iommu-mapping";
> > >             /* identity mapping, "reg" optional? */
> > >             reg = <0x80000000 0x01000000>;
> > >             memory-region = <&fb>;
> > >             device-id = <0>; /* for #iommu-cells*/
> > >         };
> > >
> > >         fb1: fb-mapping@90000000 {
> > >             compatible = "iommu-mapping";
> > >             /* but doesn't have to be */
> > >             reg = <0x90000000 0x01000000>;
> > >             memory-region = <&fb>;
> > >             device-id = <1>; /* for #iommu-cells*/
> > >         };
> > >     };
> > >
> > >     dc0: dc@40000000 {
> > >         iommu-region = <&fb0>;
> > >         iommus = <&dc0_iommu 0>;
> > >     };
> > >
> > > Does anyone see problems with this approach or can think of something
> > > better?
> >
> > The device tree description of this looks a bit weird because it
> > sprinkles things all around. For instance now we've got the "stream ID"
> > (i.e. what you seem to be referring to as "device-id") in two places,
> > once in the iommus property of the DC node and once in the mapping.
>
> Yes, stream_id would be the device-id. It is the term used in the
> apple-dart IOMMU driver. It is duplicated to deal with the multiple
> IOMMU problem. Let's ignore that and scrape my proposal.
>
> > Would it work if you added back-references to the devices that are
> > active on boot to the IOMMU node? Something along these lines:
> >
> >     reserved-memory {
> >         fb: fb@80000000 {
> >             reg = <0x80000000 0x01000000>;
> >         };
> >     };
> >
> >     dc0_iommu: iommu@20000000 {
> >         #iommu-cells = <1>;
> >
> >         mapped-devices = <&dc0>;
> >     };
> >
> >     dc0: dc@40000000 {
> >         memory-region = <&fb0>;
> >         iommus = <&dc0_iommu 0>;
> >     };
> >
> > Depending on how you look at it that's a circular dependency, but it
> > won't be in practice. It makes things a bit more compact and puts the
> > data where it belongs.
>
> Yes, this works for the Apple M1 display co-processor. I've changed the
> dts and my apple-dart private parsing code to use "mapped-devices"
> back-references and it works as before. We probably need an automated
> check to ensure the references between device and IOMMU remains
> consistent.

Circling back to this... again.

I've been thinking about this some more and have come up with a mix
between what Rob, Janne and I had proposed. This is how it would look
(based on Tegra210):

    reserved-memory {
        fb: framebuffer@80000000 {
            /*
             * Physical memory region that is reserved. If
             * this property is omitted, this region should
             * be treated as an IOVA reservation.
             */
            reg = <0x80000000 0x01000000>;

            /*
             * Create 1:1 mapping for display controller.
             *
             * Note how instead of the IOMMU reference we
             * actually pass the device reference here. This
             * combines the "mapped-devices" property that
             * was proposed earlier and makes it easier to
             * find the device that needs this mapping. The
             * IOMMU phandle and specifier can be obtained
             * via this backlink to the consumer device.
             *
             * More than one entry could be specified here
             * to allow mappings for multiple devices. This
             * avoids the problem of having multiple nodes
             * with the same name.
             *
             * Could also be "iommu-addresses" as Rob had
             * suggested earlier, but "iommu-mapping" seems
             * a bit more appropriate given that there's
             * also the phandle now.
             */
            iommu-mapping = <&dc 0x80000000 0x01000000>;
        };
    };

    mc: memory-controller@70019000 {
        ...
        #iommu-cells = <1>;
        ...
    };

    dc: dc@54200000 {
        ...
        iommus = <&mc TEGRA_SWGROUP_DC>;

        /*
         * As in earlier proposals, this could be optional if
         * all we need is the IOMMU mapping. It can be specified
         * if there's a need for the driver to use the physical
         * memory region (i.e. to copy out existing framebuffer
         * content and recycle memory).
         */
        memory-region = <&fb>;
        ...
    };

One remaining question that I have for this is whether we also need some
sort of #address-cells and #size-cells for the IOMMU, which we would
need in order to determine how many cells the addresses in iommu-mapping
have. I suppose we could derive that from the dma-ranges property
somehow, since that defines the addressable region of the device that
needs the mapping.

Thierry
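
To make the cell-count question concrete, here is a sketch of how the
same iommu-mapping entry might look if the IOVA address and size each
took two cells, as they would for a device with a 64-bit addressable
range. The two-cell layout is purely an assumption for illustration:
nothing in the proposal settles where these counts come from (a
dedicated property, or dma-ranges), and this is not an accepted
binding. The &dc phandle refers to the display controller node from the
Tegra210 example.

```dts
reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	fb: framebuffer@80000000 {
		/* physical reservation: 2 address cells, 2 size cells */
		reg = <0x0 0x80000000 0x0 0x01000000>;

		/*
		 * Assumed layout: consumer phandle, then 2 IOVA
		 * address cells and 2 size cells. Without a known
		 * cell count a parser cannot tell this apart from
		 * two one-cell entries.
		 */
		iommu-mapping = <&dc 0x0 0x80000000 0x0 0x01000000>;
	};
};
```

With one address cell and one size cell, as in the 32-bit example, each
entry is three cells long; with the layout assumed here it is five. A
parser needs the counts before it can split a multi-entry property
correctly.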