Re: [RFC] xen/arm: introduce XENFEAT_ARM_dom0_iommu

From: Julien Grall <julien@xen.org>
To: Stefano Stabellini <sstabellini@kernel.org>
Cc: xen-devel@lists.xenproject.org, Bertrand.Marquis@arm.com,
	Volodymyr_Babchuk@epam.com, rahul.singh@arm.com
Subject: Re: [RFC] xen/arm: introduce XENFEAT_ARM_dom0_iommu
Date: Fri, 19 Feb 2021 10:32:41 +0000
Message-ID: <98a15b6d-7460-31a0-0b4a-acf035571a17@xen.org>
In-Reply-To: <alpine.DEB.2.21.2102180920570.3234@sstabellini-ThinkPad-T480s>

Hi Stefano,

On 19/02/2021 01:42, Stefano Stabellini wrote:
> On Thu, 18 Feb 2021, Julien Grall wrote:
>> On 17/02/2021 23:54, Stefano Stabellini wrote:
>>> On Wed, 17 Feb 2021, Julien Grall wrote:
>>>> On 17/02/2021 02:00, Stefano Stabellini wrote:
>>> But actually it was always wrong for Linux to enable swiotlb-xen without
>>> checking whether it is 1:1 mapped or not. Today we enable swiotlb-xen in
>>> dom0 and disable it in domU, while we should have enabled swiotlb-xen if
>>> 1:1 mapped no matter dom0/domU. (swiotlb-xen could be useful in a 1:1
>>> mapped domU driver domain.)
>>>
>>>
>>> There is an argument (Andy was making on IRC) that being 1:1 mapped or
>>> not is an important information that Xen should provide to the domain
>>> regardless of anything else.
>>>
>>> So maybe we should add two flags:
>>>
>>> - XENFEAT_direct_mapped
>>> - XENFEAT_not_direct_mapped
>>
>> I am guessing the two flags are there to allow Linux to fall back to the
>> default behavior (depending on dom0 vs domU) on older hypervisors. On newer
>> hypervisors, one of these flags would always be set. Is that correct?
> 
> Yes. On a newer hypervisor one of the two would be present and Linux can
> make an informed decision. On an older hypervisor, neither flag would be
> present, so Linux will have to keep doing what it is currently doing.
> 
>   
>>> To all domains. This is not even ARM specific. Today dom0 would get
>>> XENFEAT_direct_mapped and domUs XENFEAT_not_direct_mapped. With cache
>>> coloring all domains will get XENFEAT_not_direct_mapped. With Bertrand's
>>> team work on 1:1 mapping domUs, some domUs might start to get
>>> XENFEAT_direct_mapped also one day soon.
>>>
>>> Now I think this is the best option because it is descriptive, doesn't
>>> imply anything about what Linux should or should not do, and doesn't
>>> depend on unreliable IOMMU information.
>>
>> That's a good first step but this still doesn't solve the problem on whether
>> the swiotlb can be disabled per-device or even disabling the expensive 1:1
>> mapping in the IOMMU page-tables.
>>
>> It feels to me we need to have a more complete solution (not necessarily
>> implemented) so we don't put ourselves in a corner again.
> 
> Yeah, XENFEAT_{not,}_direct_mapped help cleaning things up, but don't
> solve the issues you described. Those are difficult to solve, it would
> be nice to have some idea.
> 
> One issue is that we only have limited information passed via device
> tree, limited to the "iommus" property. If that's all we have, there
> isn't much we can do.

We can actually do a lot with that :). See more below.

> The device tree list is maybe the only option,
> although it feels a bit complex intuitively. We could maybe replace the
> real iommu node with a fake iommu node only to use it to "tag" devices
> protected by the real iommu.
> 
> I like the idea of rewarding well-designed boards; boards that have an
> IOMMU that works for all DMA-mastering devices. It would be great to be
> able to optimize those in a simple way, without breaking the others. But
> unfortunately due to the limited info on device tree, I cannot think of
> a way to do it automatically. And it is not great to rely on platform
> files.

We would not be able to automate this in Xen alone; however, we can ask
Linux for help.

Xen is able to tell whether it has protected the device with an IOMMU or
not. When creating the domain device-tree, it could replace the IOMMU
node with a Xen-specific one.

With the Xen IOMMU nodes, Linux could find out whether the device needs 
to use the swiotlb ops or not.
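
To make the idea concrete, here is a rough sketch of how the per-device
decision could look on the Linux side. It is only an illustration under
assumptions: struct xen_dma_info, xen_dev_needs_swiotlb() and the
dev_behind_xen_iommu parameter are hypothetical names, and the two
XENFEAT_* flags are the ones proposed earlier in this thread, not
something that exists today.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the guest learned from Xen. */
struct xen_dma_info {
    bool feat_direct_mapped;     /* XENFEAT_direct_mapped advertised */
    bool feat_not_direct_mapped; /* XENFEAT_not_direct_mapped advertised */
    bool is_dom0;                /* only used for the legacy fallback */
};

/* Returns true if the device should use the swiotlb-xen DMA ops. */
static bool xen_dev_needs_swiotlb(const struct xen_dma_info *info,
                                  bool dev_behind_xen_iommu)
{
    /* A newer hypervisor would set exactly one flag, never both. */
    assert(!(info->feat_direct_mapped && info->feat_not_direct_mapped));

    /* The device has a Xen IOMMU node: Xen protects it, no swiotlb. */
    if (dev_behind_xen_iommu)
        return false;

    /* Newer hypervisor: follow the explicit 1:1 mapping information. */
    if (info->feat_direct_mapped)
        return true;
    if (info->feat_not_direct_mapped)
        return false;

    /* Older hypervisor, neither flag set: keep the current behavior. */
    return info->is_dom0;
}
```

The device-tree check comes first so that a protected device can skip
swiotlb-xen even in a 1:1 mapped domain.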

Skipping extra mapping in the IOMMU is a bit trickier. I can see two 
solutions:
   - A per-domain toggle to skip the IOMMU mapping. This is assuming
that Linux is able to know that all DMA-capable devices are protected.
The problem is that a driver may be loaded later. Such drivers are
unlikely to use existing grants, so the toggle could be used to say "all
the grants after this point will require a mapping (or not)".

   - A per-grant flag to indicate whether an IOMMU mapping is necessary. 
This is assuming we are able to know whether a grant will be used for DMA.
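
The two options above could be sketched roughly as follows on the
hypervisor side. All the names here (GNT_NEEDS_IOMMU_MAP, struct
dom_iommu_policy and both helpers) are made up for illustration; they are
not existing Xen interfaces.

```c
#include <stdbool.h>

/* Hypothetical per-grant flag passed by the guest at map time. */
#define GNT_NEEDS_IOMMU_MAP (1u << 0)

/* Hypothetical per-domain state, changed by the guest via a toggle. */
struct dom_iommu_policy {
    bool map_new_grants; /* true: grants mapped from now on get an IOMMU mapping */
};

/*
 * Option 1: per-domain toggle. The decision is sampled when the grant
 * is mapped, so flipping the toggle later only affects new grants.
 */
static bool toggle_wants_iommu_map(const struct dom_iommu_policy *p)
{
    return p->map_new_grants;
}

/*
 * Option 2: per-grant flag. The guest states for each grant whether
 * it will be used for DMA by a device that needs the IOMMU mapping.
 */
static bool grant_wants_iommu_map(unsigned int map_flags)
{
    return (map_flags & GNT_NEEDS_IOMMU_MAP) != 0;
}
```

The first helper depends only on domain-wide state, the second only on
what the guest passes for the individual grant.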

>>> Instead, if we follow my original proposal of using
>>> XENFEAT_ARM_dom0_iommu and set it automatically when Dom0 is protected
>>> by IOMMU, we risk breaking PV drivers for platforms where that protection
>>> is incomplete. I have no idea how many there are out there today.
>>
>> This can virtually affect any platform as it is easy to disable an IOMMU in
>> the firmware table.
>>
>>> I have
>>> the feeling that there aren't that many but I am not sure. So yes, it
>>> could be that we start passing XENFEAT_ARM_dom0_iommu for a given
>>> platform, Linux skips the swiotlb-xen initialization, actually it is
>>> needed for a network/block device, and a PV driver breaks. I can see why
>>> you say this is a no-go.
>>>
>>>
>>> Third option. We still use XENFEAT_ARM_dom0_iommu but we never set
>>> XENFEAT_ARM_dom0_iommu automatically. It needs a platform specific flag
>>> to be set. We add the flag to xen/arch/arm/platforms/xilinx-zynqmp.c and
>>> any other platforms that qualify. Basically it is "opt in" instead of
>>> "opt out". We don't risk breaking anything because platforms would have
>>> XENFEAT_ARM_dom0_iommu disabled by default.
>> Well, yes you will not break other platforms. However, you are still at risk
>> of breaking your platform if the firmware table is updated and disables some
>> but not all IOMMUs (for performance concerns, brokenness...).
> 
> This is something we might be able to detect: we can detect if an IOMMU
> is disabled.

This is assuming that the node has not been removed... :) Anyway, as I
pointed out in my original answer, I don't think a platform quirk (or
enablement) is a viable solution here.

Cheers,

-- 
Julien Grall