On Tue, Sep 07, 2021 at 07:44:44PM +0200, Thierry Reding wrote: > On Tue, Sep 07, 2021 at 10:33:24AM -0500, Rob Herring wrote: > > On Fri, Sep 3, 2021 at 10:36 AM Thierry Reding wrote: > > > > > > On Fri, Sep 03, 2021 at 09:36:33AM -0500, Rob Herring wrote: > > > > On Fri, Sep 3, 2021 at 8:52 AM Thierry Reding wrote: > > > > > > > > > > On Fri, Sep 03, 2021 at 08:20:55AM -0500, Rob Herring wrote: > > > > > > On Wed, Sep 1, 2021 at 9:13 AM Thierry Reding wrote: > > > > > > > > > > > > > > On Fri, Jul 02, 2021 at 05:16:25PM +0300, Dmitry Osipenko wrote: > > > > > > > > 01.07.2021 21:14, Thierry Reding пишет: > > > > > > > > > On Tue, Jun 08, 2021 at 06:51:40PM +0200, Thierry Reding wrote: > > > > > > > > >> On Fri, May 28, 2021 at 06:54:55PM +0200, Thierry Reding wrote: > > > > > > > > >>> On Thu, May 20, 2021 at 05:03:06PM -0500, Rob Herring wrote: > > > > > > > > >>>> On Fri, Apr 23, 2021 at 06:32:30PM +0200, Thierry Reding wrote: > > > > > > > > >>>>> From: Thierry Reding > > > > > > > > >>>>> > > > > > > > > >>>>> Reserved memory region phandle references can be accompanied by a > > > > > > > > >>>>> specifier that provides additional information about how that specific > > > > > > > > >>>>> reference should be treated. > > > > > > > > >>>>> > > > > > > > > >>>>> One use-case is to mark a memory region as needing an identity mapping > > > > > > > > >>>>> in the system's IOMMU for the device that references the region. This is > > > > > > > > >>>>> needed for example when the bootloader has set up hardware (such as a > > > > > > > > >>>>> display controller) to actively access a memory region (e.g. a boot > > > > > > > > >>>>> splash screen framebuffer) during boot. The operating system can use the > > > > > > > > >>>>> identity mapping flag from the specifier to make sure an IOMMU identity > > > > > > > > >>>>> mapping is set up for the framebuffer before IOMMU translations are > > > > > > > > >>>>> enabled for the display controller. > > > > > > > > >>>>> > > > > > > > > >>>>> Signed-off-by: Thierry Reding > > > > > > > > >>>>> --- > > > > > > > > >>>>> .../reserved-memory/reserved-memory.txt | 21 +++++++++++++++++++ > > > > > > > > >>>>> include/dt-bindings/reserved-memory.h | 8 +++++++ > > > > > > > > >>>>> 2 files changed, 29 insertions(+) > > > > > > > > >>>>> create mode 100644 include/dt-bindings/reserved-memory.h > > > > > > > > >>>> > > > > > > > > >>>> Sorry for being slow on this. I have 2 concerns. > > > > > > > > >>>> > > > > > > > > >>>> First, this creates an ABI issue. A DT with cells in 'memory-region' > > > > > > > > >>>> will not be understood by an existing OS. I'm less concerned about this > > > > > > > > >>>> if we address that with a stable fix. (Though I'm pretty sure we've > > > > > > > > >>>> naively added #?-cells in the past ignoring this issue.) > > > > > > > > >>> > > > > > > > > >>> A while ago I had proposed adding memory-region*s* as an alternative > > > > > > > > >>> name for memory-region to make the naming more consistent with other > > > > > > > > >>> types of properties (think clocks, resets, gpios, ...). If we added > > > > > > > > >>> that, we could easily differentiate between the "legacy" cases where > > > > > > > > >>> no #memory-region-cells was allowed and the new cases where it was. > > > > > > > > >>> > > > > > > > > >>>> Second, it could be the bootloader setting up the reserved region. If a > > > > > > > > >>>> node already has 'memory-region', then adding more regions is more > > > > > > > > >>>> complicated compared to adding new properties. And defining what each > > > > > > > > >>>> memory-region entry is or how many in schemas is impossible. > > > > > > > > >>> > > > > > > > > >>> It's true that updating the property gets a bit complicated, but it's > > > > > > > > >>> not exactly rocket science. We really just need to splice the array. I > > > > > > > > >>> have a working implemention for this in U-Boot. > > > > > > > > >>> > > > > > > > > >>> For what it's worth, we could run into the same issue with any new > > > > > > > > >>> property that we add. Even if we renamed this to iommu-memory-region, > > > > > > > > >>> it's still possible that a bootloader may have to update this property > > > > > > > > >>> if it already exists (it could be hard-coded in DT, or it could have > > > > > > > > >>> been added by some earlier bootloader or firmware). > > > > > > > > >>> > > > > > > > > >>>> Both could be addressed with a new property. Perhaps something like > > > > > > > > >>>> 'iommu-memory-region = <&phandle>;'. I think the 'iommu' prefix is > > > > > > > > >>>> appropriate given this is entirely because of the IOMMU being in the > > > > > > > > >>>> mix. I might feel differently if we had other uses for cells, but I > > > > > > > > >>>> don't really see it in this case. > > > > > > > > >>> > > > > > > > > >>> I'm afraid that down the road we'll end up with other cases and then we > > > > > > > > >>> might proliferate a number of *-memory-region properties with varying > > > > > > > > >>> prefixes. > > > > > > > > >>> > > > > > > > > >>> I am aware of one other case where we might need something like this: on > > > > > > > > >>> some Tegra SoCs we have audio processors that will access memory buffers > > > > > > > > >>> using a DMA engine. These processors are booted from early firmware > > > > > > > > >>> using firmware from system memory. In order to avoid trashing the > > > > > > > > >>> firmware, we need to reserve memory. We can do this using reserved > > > > > > > > >>> memory nodes. However, the audio DMA engine also uses the SMMU, so we > > > > > > > > >>> need to make sure that the firmware memory is marked as reserved within > > > > > > > > >>> the SMMU. This is similar to the identity mapping case, but not exactly > > > > > > > > >>> the same. Instead of creating a 1:1 mapping, we just want that IOVA > > > > > > > > >>> region to be reserved (i.e. IOMMU_RESV_RESERVED instead of > > > > > > > > >>> IOMMU_RESV_DIRECT{,_RELAXABLE}). > > > > > > > > >>> > > > > > > > > >>> That would also fall into the IOMMU domain, but we can't reuse the > > > > > > > > >>> iommu-memory-region property for that because then we don't have enough > > > > > > > > >>> information to decide which type of reservation we need. > > > > > > > > >>> > > > > > > > > >>> We could obviously make iommu-memory-region take a specifier, but we > > > > > > > > >>> could just as well use memory-regions in that case since we have > > > > > > > > >>> something more generic anyway. > > > > > > > > >>> > > > > > > > > >>> With the #memory-region-cells proposal, we can easily extend the cell in > > > > > > > > >>> the specifier with an additional MEMORY_REGION_IOMMU_RESERVE flag to > > > > > > > > >>> take that other use case into account. If we than also change to the new > > > > > > > > >>> memory-regions property name, we avoid the ABI issue (and we gain a bit > > > > > > > > >>> of consistency while at it). > > > > > > > > >> > > > > > > > > >> Ping? Rob, do you want me to add this second use-case to the patch > > > > > > > > >> series to make it more obvious that this isn't just a one-off thing? Or > > > > > > > > >> how do we proceed? > > > > > > > > > > > > > > > > > > Rob, given that additional use-case, do you want me to run with this > > > > > > > > > proposal and send out an updated series? > > > > > > > > > > > > > > > > > > > > > > > > What about variant with a "descriptor" properties that will describe > > > > > > > > each region: > > > > > > > > > > > > > > > > fb_desc: display-framebuffer-memory-descriptor { > > > > > > > > needs-identity-mapping; > > > > > > > > } > > > > > > > > > > > > > > > > display@52400000 { > > > > > > > > memory-region = <&fb ...>; > > > > > > > > memory-region-descriptor = <&fb_desc ...>; > > > > > > > > }; > > > > > > > > > > > > > > > > It could be a more flexible/extendible variant. > > > > > > > > > > > > > > This problem recently came up on #dri-devel again. Adding Alyssa and > > > > > > > Sven who are facing a similar challenge on their work on Apple M1 (if I > > > > > > > understood correctly). Also adding dri-devel for visibility since this > > > > > > > is a very common problem for display in particular. > > > > > > > > > > > > > > On M1 the situation is slightly more complicated: the firmware will > > > > > > > allocate a couple of buffers (including the framebuffer) in high memory > > > > > > > (> 4 GiB) and use the IOMMU to map that into an IOVA region below 4 GiB > > > > > > > so that the display hardware can access it. This makes it impossible to > > > > > > > bypass the IOMMU like we do on other chips (in particular to work around > > > > > > > the fault-by-default policy of the ARM SMMU driver). It also means that > > > > > > > in addition to the simple reserved regions I mentioned we need for audio > > > > > > > use-cases and identity mapping use-cases we need for display on Tegra, > > > > > > > we now also need to be able to convey physical to IOVA mappings. > > > > > > > > > > > > > > Fitting the latter into the original proposal sounds difficult. A quick > > > > > > > fix would've been to generate a mapping table in memory and pass that to > > > > > > > the kernel using a reserved-memory node (similar to what's done for > > > > > > > example on Tegra for the EMC frequency table on Tegra210) and mark it as > > > > > > > such using a special flag. But that then involves two layers of parsing, > > > > > > > which seems a bit suboptimal. Another way to shoehorn that into the > > > > > > > original proposal would've been to add flags for physical and virtual > > > > > > > address regions and use pairs to pass them using special flags. Again, > > > > > > > this is a bit wonky because it needs these to be carefully parsed and > > > > > > > matched up. > > > > > > > > > > > > > > Another downside is that we now have a situation where some of these > > > > > > > regions are no longer "reserved-memory regions" in the traditional > > > > > > > sense. This would require an additional flag in the reserved-memory > > > > > > > region nodes to prevent the IOVA regions from being reserved. By the > > > > > > > way, this is something that would also be needed for the audio use-case > > > > > > > I mentioned before, because the physical memory at that address can > > > > > > > still be used by an operating system. > > > > > > > > > > > > > > A more general solution would be to draw a bit from Dmitry's proposal > > > > > > > and introduce a new top-level "iov-reserved-memory" node. This could be > > > > > > > modelled on the existing reserved-memory node, except that the physical > > > > > > > memory pages for regions represented by child nodes would not be marked > > > > > > > as reserved. Only the IOVA range described by the region would be > > > > > > > reserved subsequently by the IOMMU framework and/or IOMMU driver. > > > > > > > > > > > > > > The simplest case where we just want to reserve some IOVA region could > > > > > > > then be done like this: > > > > > > > > > > > > > > iov-reserved-memory { > > > > > > > /* > > > > > > > * Probably safest to default to <2>, <2> here given > > > > > > > * that most IOMMUs support either > 32 bits of IAS > > > > > > > * or OAS. > > > > > > > */ > > > > > > > #address-cells = <2>; > > > > > > > #size-cells = <2>; > > > > > > > > > > > > > > firmware: firmware@80000000 { > > > > > > > reg = <0 0x80000000 0 0x01000000>; > > > > > > > }; > > > > > > > }; > > > > > > > > > > > > > > audio@30000000 { > > > > > > > ... > > > > > > > iov-memory-regions = <&firmware>; > > > > > > > ... > > > > > > > }; > > > > > > > > > > > > > > Mappings could be represented by an IOV reserved region taking a > > > > > > > reference to the reserved-region that they map: > > > > > > > > > > > > > > reserved-memory { > > > > > > > #address-cells = <2>; > > > > > > > #size-cells = <2>; > > > > > > > > > > > > > > /* 16 MiB of framebuffer at top-of-memory */ > > > > > > > framebuffer: framebuffer@1,ff000000 { > > > > > > > reg = <0x1 0xff000000 0 0x01000000>; > > > > > > > no-map; > > > > > > > }; > > > > > > > }; > > > > > > > > > > > > > > iov-reserved-memory { > > > > > > > /* IOMMU supports only 32-bit output address space */ > > > > > > > #address-cells = <1>; > > > > > > > #size-cells = <1>; > > > > > > > > > > > > > > /* 16 MiB of framebuffer mapped to top of IOVA */ > > > > > > > fb: fb@ff000000 { > > > > > > > reg = <0 0xff000000 0 0x01000000>; > > > > > > > memory-region = <&framebuffer>; > > > > > > > }; > > > > > > > }; > > > > > > > > > > > > > > display@40000000 { > > > > > > > ... > > > > > > > /* optional? */ > > > > > > > memory-region = <&framebuffer>; > > > > > > > iov-memory-regions = <&fb>; > > > > > > > ... > > > > > > > }; > > > > > > > > > > > > > > It's interesting how identity mapped regions now become a trivial > > > > > > > special case of mappings. All that is needed is to make the reg property > > > > > > > of the IOV reserved region correspond to the reg property of the normal > > > > > > > reserved region. Alternatively, as a small optimization for lazy people > > > > > > > like me, we could just allow these cases to omit the reg property and > > > > > > > instead inherit it from the referenced reserved region. > > > > > > > > > > > > > > As the second example shows it might be convenient if memory-region > > > > > > > could be derived from iov-memory-regions. This could be useful for cases > > > > > > > where the driver wants to do something with the physical pages of the > > > > > > > reserved region (such as mapping them and copying out the framebuffer > > > > > > > data to another buffer so that the reserved memory can be recycled). If > > > > > > > we have the IOV reserved region, we could provide an API to extract the > > > > > > > physical reserved region (if it exists). That way we could avoid > > > > > > > referencing it twice in DT. Then again, there's something elegant about > > > > > > > the explicit second reference to. It indicates the intent that we may > > > > > > > want to use the region for something other than just the IOV mapping. > > > > > > > > > > > > > > Anyway, this has been long enough. Let me know what you think. Alyssa, > > > > > > > Sven, it'd be interesting to hear if you think this could work as a > > > > > > > solution to the problem on M1. > > > > > > > > > > > > > > Rob, I think you might like this alternative because it basically gets > > > > > > > rid of all the points in the original proposal that you were concerned > > > > > > > about. Let me know what you think. > > > > > > > > > > > > Couldn't we keep this all in /reserved-memory? Just add an iova > > > > > > version of reg. Perhaps abuse 'assigned-address' for this purpose. The > > > > > > issue I see would be handling reserved iova areas without a physical > > > > > > area. That can be handled with just a iova and no reg. We already have > > > > > > a no reg case. > > > > > > > > > > I had thought about that initially. One thing I'm worried about is that > > > > > every child node in /reserved-memory will effectively cause the memory > > > > > that it described to be reserved. But we don't want that for regions > > > > > that are "virtual only" (i.e. IOMMU reservations). > > > > > > > > By virtual only, you mean no physical mapping, just a region of > > > > virtual space, right? For that we'd have no 'reg' and therefore no > > > > (physical) reservation by the OS. It's similar to non-static regions. > > > > You need a specific handler for them. We'd probably want a compatible > > > > as well for these virtual reservations. > > > > > > Yeah, these would be purely used for reserving regions in the IOVA so > > > that they won't be used by the IOVA allocator. Typically these would be > > > used for cases where those addresses have some special meaning. > > > > > > Do we want something like: > > > > > > compatible = "iommu-reserved"; > > > > > > for these? Or would that need to be: > > > > > > compatible = "linux,iommu-reserved"; > > > > > > ? There seems to be a mix of vendor-prefix vs. non-vendor-prefix > > > compatible strings in the reserved-memory DT bindings directory. > > > > I would not use 'linux,' here. > > > > > > > > On the other hand, do we actually need the compatible string? Because we > > > don't really want to associate much extra information with this like we > > > do for example with "shared-dma-pool". The logic to handle this would > > > all be within the IOMMU framework. All we really need is for the > > > standard reservation code to skip nodes that don't have a reg property > > > so we don't reserve memory for "virtual-only" allocations. > > > > It doesn't hurt to have one and I can imagine we might want to iterate > > over all the nodes. It's slightly easier and more common to iterate > > over compatible nodes rather than nodes with some property. > > > > > > Are these being global in DT going to be a problem? Presumably we have > > > > a virtual space per IOMMU. We'd know which IOMMU based on a device's > > > > 'iommus' and 'memory-region' properties, but within /reserved-memory > > > > we wouldn't be able to distinguish overlapping addresses from separate > > > > address spaces. Or we could have 2 different IOVAs for 1 physical > > > > space. That could be solved with something like this: > > > > > > > > iommu-addresses = <&iommu1
>; > > > > > > The only case that would be problematic would be if we have overlapping > > > physical regions, because that will probably trip up the standard code. > > > > > > But this could also be worked around by looking at iommu-addresses. For > > > example, if we had something like this: > > > > > > reserved-memory { > > > fb_dc0: fb@80000000 { > > > reg = <0x80000000 0x01000000>; > > > iommu-addresses = <0xa0000000 0x01000000>; > > > }; > > > > > > fb_dc1: fb@80000000 { > > > > You can't have 2 nodes with the same name (actually, you can, they > > just get merged together). Different names with the same unit-address > > is a dtc warning. I'd really like to make that a full blown > > overlapping region check. > > Right... so this would be a lot easier to deal with using that earlier > proposal where the IOMMU regions were a separate thing and referencing > the reserved-memory nodes. In those cases we could just have the > physical reservation for the framebuffer once (so we don't get any > duplicates or overlaps) and then have each IOVA reservation reference > that to create the mapping. > > > > > > reg = <0x80000000 0x01000000>; > > > iommu-addresses = <0xb0000000 0x01000000>; > > > }; > > > }; > > > > > > We could make the code identify that this is for the same physical > > > reservation (maybe make it so that reg needs to match exactly for this > > > to be recognized) but with different virtual allocations. > > > > > > On a side-note: do we really need to repeat the size? I'd think if we > > > want mappings then we'd likely want them for the whole reservation. > > > > Humm, I suppose not, but dropping it paints us into a corner if we > > come up with wanting a different size later. You could have a carveout > > for double/triple buffering your framebuffer, but the bootloader > > framebuffer is only single buffered. So would you want actual size? > > Perhaps this needs to be a bit more verbose then. If we want the ability > to create a mapping for only a partial reservation, I could imagine we > may as well want one that doesn't start at the beginning. So perhaps an > ever better solution would be to have a complete mapping, something that > works similar to "ranges" perhaps, like so: > > fb@80000000 { > reg = <0x80000000 0x01000000>; > iommu-ranges = <0x80000000 0x01000000 0x80000000>; > }; > > That would be for a full identity mapping, but we could also have > something along the lines of this: > > fb@80000000 { > reg = <0x80000000 0x01000000>; > iommu-ranges = <0x80100000 0x00100000 0xa0000000>; > }; > > So that would only map a 1 MiB chunk at offset 1 MiB (of the physical > reservation) to I/O virtual address 0xa0000000. > > > > I'd like to keep references to IOMMUs out of this because they would be > > > duplicated. We will only use these nodes if they are referenced by a > > > device node that also has an iommus property. Also, the IOMMU reference > > > itself isn't enough. We'd also need to support the complete specifier > > > because you can have things like SIDs in there to specify the exact > > > address space that a device uses. > > > > > > Also, for some of these they may be reused independently of the IOMMU > > > address space. For example the Tegra framebuffer identity mapping can > > > be used by either of the 2-4 display controllers, each with (at least > > > potentially) their own address space. But we don't want to have to > > > describe the identity mapping separately for each display controller. > > > > Okay, but I'd rather have to duplicate things in your case than not be > > able to express some other case. > > The earlier "separate iov-reserved-memory" proposal would be a good > compromise here. It'd allow us to duplicate only the necessary bits > (i.e. the IOVA mappings) but keep the common bits simple. And even > the IOVA mappings could be shared for cases like identity mappings. > See below for more on that. > > > > Another thing to consider is that these nodes will often be added by > > > firmware (e.g. firmware will allocate the framebuffer and set up the > > > corresponding reserved memory region in DT). Wiring up references like > > > this would get very complicated very quickly. > > > > Yes. > > > > The using 'iommus' property option below can be optional and doesn't > > have to be defined/supported now. Just trying to think ahead and not > > be stuck with something that can't be extended. > > One other benefit of the separate iov-reserved-memory node would be that > the iommus property could be simplified. If we have a physical > reservation that needs to be accessed by multiple different display > controllers, we'd end up with something fairly complex, such as this: > > fb: fb@80000000 { > reg = <0x80000000 0x01000000>; > iommus = <&dc0_iommu 0xa0000000 0x01000000>, > <&dc1_iommu 0xb0000000 0x01000000>, > <&dc2_iommu 0xc0000000 0x01000000>; > }; > > This would get even worse if we want to support partial mappings. Also, > it'd become quite complicated to correlate this with the memory-region > references: > > dc0: dc@40000000 { > ... > memory-region = <&fb>; > iommus = <&dc0_iommu>; > ... > }; > > So now you have to go match up the phandle (and potentially specifier) > in the iommus property of the disp0 node with an entry in the fb node's > iommus property. That's all fairly complicated stuff. > > With separate iov-reserved-memory, this would be a bit more verbose, but > each individual node would be simpler: > > reserved-memory { > fb: fb@80000000 { > reg = <0x80000000 0x01000000>; > }; > }; > > iov-reserved-memory { > fb0: fb@80000000 { > /* identity mapping, "reg" optional? */ > reg = <0x80000000 0x01000000>; > memory-region = <&fb>; > }; > > fb1: fb@90000000 { > /* but doesn't have to be */ > reg = <0x90000000 0x01000000>; > memory-region = <&fb>; > }; > > fb2: fb@a0000000 { > /* can be partial, too */ > ranges = <0x80000000 0x00800000 0xa0000000>; > memory-region = <&fb>; > }; > } > > dc0: dc@40000000 { > iov-memory-regions = <&fb0>; > /* optional? */ > memory-region = <&fb>; > iommus = <&dc0_iommu>; > }; > > Alternatively, if we want to support partial mappings, we could replace > those reg properties by ranges properties that I showed earlier. We may > even want to support both. Use "reg" for virtual-only reservations and > identity mappings, or "simple partial mappings" (that map a sub-region > starting from the beginning). Identity mappings could still be > simplified by just omitting the "reg" property. For more complicated > mappings, such as the ones on M1, the "ranges" property could be used. > > Note how this looks a bit boilerplate-y, but it's actually really quite > simple to understand, even for humans, I think. > > Also, the phandles in this are comparatively easy to wire up because > they can all be generated in a hierarchical way: generate physical > reservation and store phandle, then generate I/O virtual reservation > to reference that phandle and store the new phandle as well. Finally, > wire this up to the display controller (using either the IOV phandle or > both). > > Granted, this requires the addition of a new top-level node, but given > how expressive this becomes, I think it might be worth a second > consideration. I guess as a middle-ground between your suggestion and mine, we could also move the IOV nodes back into reserved-memory. If we make sure the names (together with unit-addresses) are unique, to support cases where we want to identity map, or have multiple mappings at the same address. So it'd look something like this: reserved-memory { fb: fb@80000000 { reg = <0x80000000 0x01000000>; }; audio-firmware@ff000000 { /* perhaps add "iommu-reserved" for this case */ compatible = "iommu-mapping"; /* * no memory-region referencing a physical * reservation, indicates that this is an * IOMMU reservation, rather than a mapping / reg = <0xff000000 0x01000000>; }; fb0: fb-mapping@80000000 { compatible = "iommu-mapping"; /* identity mapping, "reg" optional? */ reg = <0x80000000 0x01000000>; memory-region = <&fb>; }; fb1: fb-mapping@90000000 { compatible = "iommu-mapping"; /* but doesn't have to be */ reg = <0x90000000 0x01000000>; memory-region = <&fb>; }; fb2: fb-mapping@a0000000 { compatible = "iommu-mapping"; /* can be partial, too */ ranges = <0xa0000000 0x00800000 0x80000000>; memory-region = <&fb>; }; } dc0: dc@40000000 { memory-region = <&fb0>; iommus = <&dc0_iommu>; }; What do you think? Thierry