Date: Thu, 18 Feb 2021 17:42:54 -0800 (PST)
From: Stefano Stabellini
To: Julien Grall
cc: Stefano Stabellini, xen-devel@lists.xenproject.org, Bertrand.Marquis@arm.com,
    Volodymyr_Babchuk@epam.com, rahul.singh@arm.com
Subject: Re: [RFC] xen/arm: introduce XENFEAT_ARM_dom0_iommu
In-Reply-To: <0be0196f-5b3f-73f9-5ab7-7a54faabec5c@xen.org>
References: <2d22d5b8-6cda-f27b-e938-4806b65794a5@xen.org>
 <0be0196f-5b3f-73f9-5ab7-7a54faabec5c@xen.org>

On Thu, 18 Feb 2021, Julien Grall wrote:
> On 17/02/2021 23:54, Stefano Stabellini wrote:
> > On Wed, 17 Feb 2021, Julien Grall wrote:
> > > On 17/02/2021 02:00, Stefano Stabellini wrote:
> > > > Hi all,
> > > >
> > > > Today Linux uses the swiotlb-xen driver (drivers/xen/swiotlb-xen.c)
> > > > to translate addresses for DMA operations in Dom0. Specifically,
> > > > swiotlb-xen is used to translate the address of a foreign page (a
> > > > page belonging to a domU) mapped into Dom0 before using it for DMA.
> > > >
> > > > This is important because although Dom0 is 1:1 mapped, DomUs are
> > > > not. On systems without an IOMMU, swiotlb-xen enables PV drivers to
> > > > work as long as the backends are in Dom0. Thanks to swiotlb-xen,
> > > > the DMA operation ends up using the MFN rather than the GFN.
> > > >
> > > > On systems with an IOMMU, this is not necessary: when a foreign
> > > > page is mapped in Dom0, it is added to the Dom0 p2m. A new GFN->MFN
> > > > translation is established for both MMU and SMMU. Dom0 could safely
> > > > use the GFN address (instead of the MFN) for DMA operations and
> > > > they would work. It would be more efficient than using swiotlb-xen.
> > > >
> > > > If you recall my presentation from Xen Summit 2020, Xilinx is
> > > > working on cache coloring. With cache coloring, no domain is 1:1
> > > > mapped, not even Dom0. In a scenario where Dom0 is not 1:1 mapped,
> > > > swiotlb-xen does not work as intended.
> > > >
> > > > The suggested solution for both these issues is to add a new
> > > > feature flag "XENFEAT_ARM_dom0_iommu" that tells Dom0 that it is
> > > > safe not to use swiotlb-xen because IOMMU translations are
> > > > available for Dom0. If XENFEAT_ARM_dom0_iommu is set, Linux should
> > > > skip the swiotlb-xen initialization. I have tested this scheme with
> > > > and without cache coloring (hence with and without 1:1 mapping of
> > > > Dom0) on ZCU102 and it works as expected: DMA operations succeed.
> > > >
> > > > What about systems where an IOMMU is present but not all devices
> > > > are protected?
> > > >
> > > > There is no way for Xen to know which devices are protected and
> > > > which ones are not: devices that do not have the "iommus" property
> > > > could or could not be DMA masters.
> > > >
> > > > Perhaps Xen could populate a whitelist of devices protected by the
> > > > IOMMU based on the "iommus" property. It would require some added
> > > > complexity in Xen and especially in the swiotlb-xen driver in
> > > > Linux to use it, which is not ideal.
> > >
> > > You are trading a bit more complexity in Xen and Linux with a user
> > > who may not be able to use the hypervisor on his/her platform
> > > without a quirk in Xen (see more below).
> > >
> > > > However, this approach would not work for cache coloring, where
> > > > dom0 is not 1:1 mapped, so the swiotlb-xen should not be used
> > > > either way.
> > >
> > > Not all Dom0 Linux kernels will be able to work with cache
> > > colouring. So you will need a way for the kernel to say "Hey, I can
> > > avoid using swiotlb".
> >
> > An unmodified dom0 Linux kernel can work with Xen cache coloring
> > enabled.
>
> Really? I was expecting Linux to configure the DMA transaction to use
> the MFN for foreign pages. So a mapping GFN == MFN would be necessary.
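
(As an aside, the dom0 Linux side of the proposal quoted at the top would
boil down to something like the sketch below, assuming the new bit has
been added to the features header. xen_feature() and xen_initial_domain()
are the existing Linux helpers; the function name is made up.)

#include <linux/types.h>
#include <linux/init.h>
#include <xen/xen.h>            /* xen_initial_domain() */
#include <xen/features.h>       /* xen_feature() */

/* Sketch only, not actual kernel code. */
static bool __init xen_arm_swiotlb_wanted(void)
{
        /*
         * Xen guarantees IOMMU translations for this domain, so DMA can
         * use the GFN directly and the swiotlb-xen bounce layer can be
         * skipped.
         */
        if (xen_feature(XENFEAT_ARM_dom0_iommu))
                return false;

        /* Current behaviour: only the 1:1 mapped dom0 sets up swiotlb-xen. */
        return xen_initial_domain();
}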

>
> > The swiotlb-xen is unneeded and also all the pfn_valid() checks to
> > detect if a page is local or foreign don't work as intended.
>
> I am not sure I understand this. Are you saying that Linux will
> "mistakenly" consider the foreign page as local?

The pfn_valid(addr) check is based on the idea that gfn == mfn for local
pages and gfn != mfn for foreign pages. If you take the mfn of a foreign
page, pfn_valid(mfn) should fail. However, this only works as long as
Dom0 is 1:1 mapped. If it is not 1:1 mapped, pfn_valid(mfn) could return
true for a foreign page: it could mistake a foreign page for a local one.
It could also mistake a local page for a foreign one if we called
pfn_valid() passing the mfn of one of our local pages.

> Therefore, it would always use the GFN rather than MFN?

For DMA operations:
- local pages use the GFN as dev_addr
- foreign pages use the MFN as dev_addr  <- requires 1:1 MFN mapping

For cache flush operations:
- local pages might flush locally, works as expected
- local pages might flush via hypercall, needlessly slow but works
  (this is what I was talking about)
- foreign pages might flush locally, requires 1:1 MFN mapping
- foreign pages might flush via hypercall, works as expected

The 1:1 MFN mapping of foreign pages is required and it is problematic as
it could conflict. Sorry for not mentioning it earlier. So yeah, it looks
like a change in Linux is required.
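
To make the breakage concrete, the local/foreign heuristic described
above amounts to something like this (an illustrative sketch, not the
actual swiotlb-xen code):

#include <linux/mm.h>           /* pfn_valid() */
#include <linux/pfn.h>          /* PFN_DOWN() */
#include <linux/types.h>        /* dma_addr_t */

/* Sketch only: dev_addr is GFN-based for local pages and MFN-based for
 * foreign pages. */
static bool sketch_dev_addr_is_local(dma_addr_t dev_addr)
{
        /*
         * Holds only while dom0 is 1:1 mapped (gfn == mfn for dom0's own
         * RAM): a foreign MFN then falls outside dom0's memory map, so
         * pfn_valid() fails for it. With cache coloring dom0 is not 1:1
         * mapped and this can misclassify pages in both directions.
         */
        return pfn_valid(PFN_DOWN(dev_addr));
}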

> > However, it still works on systems where the IOMMU can be relied
> > upon. That's because the IOMMU does the GFN->MFN translations, and
> > also because both swiotlb-xen code paths (cache flush local and cache
> > flush via hypercall) work.
> >
> > Also considering that somebody who enables cache coloring has to know
> > the entire system well, I don't think we need a runtime flag from
> > Linux to say "Hey, I can avoid using swiotlb".
>
> I think we should avoid hardcoding any decision again. Otherwise,
> sooner or later this will come back to bite us.

That's fair. Let's say that Linux needs some sort of change to work with
cache colored Xen. One important point is that a user cannot really
expect the system to enable cache coloring on its own, even if
appropriate. The user has to go in and manually set it up in the config
files of all domUs and even the Xen command line. So if the user turns it
on and Linux breaks at boot because it can't support it, I think that is
entirely fine. In other words, I don't think cache coloring needs to be
dynamically discoverable and transparently enableable. But certainly if
you turn it on, you don't want Linux to fail silently after a few hours.
If it is going to fail, we want it to fail straight away and clearly. For
example, a BUG_ON at boot in Linux or Xen would be fine.

> > > > How to set XENFEAT_ARM_dom0_iommu?
> > > >
> > > > We could set XENFEAT_ARM_dom0_iommu automatically when
> > > > is_iommu_enabled(d) for Dom0. We could also have a platform
> > > > specific (xen/arch/arm/platforms/) override so that a specific
> > > > platform can disable XENFEAT_ARM_dom0_iommu. For debugging
> > > > purposes and advanced users, it would also be useful to be able
> > > > to override it via a Xen command line parameter.
> > >
> > > Platform quirks should be limited to a small set of platforms.
> > >
> > > In this case, this would not be only per-platform but also
> > > per-firmware table, as a developer can decide to remove/add IOMMU
> > > nodes in the DT at any time.
> > >
> > > In addition to that, it means we are introducing a regression for
> > > those users, as Xen 4.14 would have worked on their platform but
> > > not anymore. They would need to go through all the nodes and find
> > > out which ones are not protected.
> > >
> > > This is a bit of a daunting task and we are going to end up having
> > > a lot of per-platform quirks in Xen.
> > >
> > > So this approach to selecting the flag is a no-go for me. FAOD, the
> > > inverted idea (i.e. only setting XENFEAT_ARM_dom0_iommu
> > > per-platform) is a no-go as well.
> > >
> > > I don't have a good idea how to set the flag automatically. My
> > > requirement is that newer Xen should continue to work on all
> > > supported platforms without any additional per-platform effort.
> >
> > Absolutely agreed.
> >
> > One option would be to rename the flag to XENFEAT_ARM_cache_coloring
> > and only set it when cache coloring is enabled. Obviously you need to
> > know what you are doing if you are enabling cache coloring. If we go
> > down that route, we don't risk breaking compatibility with any
> > platforms. Given that cache coloring is not upstream yet (upstreaming
> > should start very soon), maybe the only thing to do now would be to
> > reserve a XENFEAT number.
>
> At least in this context, I can't see how the problem described is
> cache coloring specific. Although, we may need to expose such a flag
> for other purposes in the future.

I agree.

> > But actually it was always wrong for Linux to enable swiotlb-xen
> > without checking whether it is 1:1 mapped or not. Today we enable
> > swiotlb-xen in dom0 and disable it in domU, while we should have
> > enabled swiotlb-xen if 1:1 mapped no matter dom0/domU. (swiotlb-xen
> > could be useful in a 1:1 mapped domU driver domain.)
> >
> > There is an argument (Andy was making on IRC) that being 1:1 mapped
> > or not is an important piece of information that Xen should provide
> > to the domain regardless of anything else.
> >
> > So maybe we should add two flags:
> >
> > - XENFEAT_direct_mapped
> > - XENFEAT_not_direct_mapped
>
> I am guessing the two flags are to allow Linux to fall back to the
> default behavior (depending on dom0 vs domU) on older hypervisors. On
> newer hypervisors, one of these flags would always be set. Is that
> correct?

Yes. On a newer hypervisor one of the two would be present and Linux can
make an informed decision. On an older hypervisor, neither flag would be
present, so Linux will have to keep doing what it is currently doing.

> > To all domains. This is not even ARM specific. Today dom0 would get
> > XENFEAT_direct_mapped and domUs XENFEAT_not_direct_mapped. With cache
> > coloring all domains will get XENFEAT_not_direct_mapped. With
> > Bertrand's team's work on 1:1 mapping domUs, some domUs might start
> > to get XENFEAT_direct_mapped too, one day soon.
> >
> > Now I think this is the best option because it is descriptive,
> > doesn't imply anything about what Linux should or should not do, and
> > doesn't depend on unreliable IOMMU information.
>
> That's a good first step but this still doesn't solve the problem of
> whether the swiotlb can be disabled per-device, or even of disabling
> the expensive 1:1 mapping in the IOMMU page-tables.
>
> It feels to me we need to have a more complete solution (not
> necessarily implemented) so we don't put ourselves in a corner again.

Yeah, XENFEAT_{not_,}direct_mapped helps clean things up, but doesn't
solve the issues you described. Those are difficult to solve; it would be
nice to have some ideas.
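
For what it's worth, advertising the two bits on the Xen side could be
very small. A sketch, assuming the two XENFEAT values have been reserved
and a predicate along the lines of ARM's is_domain_direct_mapped():

#include <xen/sched.h>

/* Sketch only: which of the two proposed bits a given domain would see
 * in its XENVER_get_features submap. is_domain_direct_mapped() stands in
 * for however Xen records that a domain's RAM is 1:1 mapped. */
static uint32_t directmap_feature_bit(const struct domain *d)
{
    return is_domain_direct_mapped(d) ? (1U << XENFEAT_direct_mapped)
                                      : (1U << XENFEAT_not_direct_mapped);
}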

One issue is that we only have limited information passed via device
tree, limited to the "iommus" property. If that's all we have, there
isn't much we can do. The device tree list is maybe the only option,
although it feels a bit complex intuitively. We could maybe replace the
real iommu node with a fake iommu node, only to use it to "tag" devices
protected by the real iommu.

I like the idea of rewarding well-designed boards: boards that have an
IOMMU that works for all DMA-mastering devices. It would be great to be
able to optimize those in a simple way, without breaking the others. But
unfortunately, due to the limited info in the device tree, I cannot think
of a way to do it automatically. And it is not great to rely on platform
files.

> > Instead, if we follow my original proposal of using
> > XENFEAT_ARM_dom0_iommu and set it automatically when Dom0 is
> > protected by the IOMMU, we risk breaking PV drivers for platforms
> > where that protection is incomplete. I have no idea how many there
> > are out there today.
>
> This can virtually affect any platform as it is easy to disable an
> IOMMU in the firmware table.

> > I have the feeling that there aren't that many, but I am not sure. So
> > yes, it could be that we start passing XENFEAT_ARM_dom0_iommu for a
> > given platform, Linux skips the swiotlb-xen initialization, actually
> > it is needed for a network/block device, and a PV driver breaks. I
> > can see why you say this is a no-go.
> >
> > Third option. We still use XENFEAT_ARM_dom0_iommu but we never set
> > XENFEAT_ARM_dom0_iommu automatically. It needs a platform specific
> > flag to be set. We add the flag to
> > xen/arch/arm/platforms/xilinx-zynqmp.c and any other platforms that
> > qualify. Basically it is "opt in" instead of "opt out". We don't risk
> > breaking anything because platforms would have XENFEAT_ARM_dom0_iommu
> > disabled by default.
>
> Well, yes, you will not break other platforms. However, you are still
> at risk of breaking your platform if the firmware table is updated and
> disables some but not all IOMMUs (for performance concerns,
> brokenness...).

This is something we might be able to detect: we can detect if an IOMMU
is disabled.
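
For completeness, the "opt in" third option above would look roughly like
this in the platform file. This is only a sketch modelled on the existing
xen/arch/arm/platforms/xilinx-zynqmp.c: the dom0_iommu field is made up,
and the platform's other existing fields are omitted.

static const char * const zynqmp_dt_compat[] __initconst =
{
    "xlnx,zynqmp",
    NULL
};

PLATFORM_START(xilinx_zynqmp, "Xilinx ZynqMP")
    .compatible = zynqmp_dt_compat,
    /* Hypothetical opt-in: this platform's IOMMU covers all DMA masters,
     * so Xen may advertise XENFEAT_ARM_dom0_iommu to dom0. */
    .dom0_iommu = true,
PLATFORM_END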