From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH v3 0/6] add non-strict mode support for arm-smmu-v3
To: Robin Murphy, Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
 linux-arm-kernel, iommu, linux-kernel, LinuxArm
References: <1531376312-2192-1-git-send-email-thunder.leizhen@huawei.com>
 <5B594399.1080404@huawei.com>
From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
Message-ID: <5B5A8857.9040907@huawei.com>
Date: Fri, 27 Jul 2018 10:49:59 +0800

On 2018/7/26 22:16, Robin Murphy wrote:
> On 2018-07-26 4:44 AM, Leizhen (ThunderTown) wrote:
>>
>> On 2018/7/25 5:51, Robin Murphy wrote:
>>> On 2018-07-12 7:18 AM, Zhen Lei wrote:
>>>> v2 -> v3: Add a bootup option "iommu_strict_mode" so that the
>>>> administrator can choose which mode to use. The first 5 patches
>>>> are unchanged.
>>>> +	iommu_strict_mode=	[arm-smmu-v3]
>>>> +		0 - strict mode (default)
>>>> +		1 - non-strict mode
>>>>
>>>> v1 -> v2: Use the lowest bit of the io_pgtable_ops.unmap's iova
>>>> parameter to pass the strict mode: 0, IOMMU_STRICT; 1,
>>>> IOMMU_NON_STRICT. Treat 0 as IOMMU_STRICT, so that the unmap
>>>> operation remains compatible with other IOMMUs which still use
>>>> strict mode; in other words, this patch series will not impact
>>>> other IOMMU drivers. I tried adding a new quirk
>>>> IO_PGTABLE_QUIRK_NON_STRICT in io_pgtable_cfg.quirks, but it could
>>>> not pass the strict mode of the domain from the SMMUv3 driver to
>>>> the io-pgtable module.
>>>
>>> What exactly is the issue there? We don't have any problem with
>>> other quirks like NO_DMA, and as I said before, by the time we're
>>> allocating the io-pgtable in arm_smmu_domain_finalise() we already
>>> know everything there is to know about the domain.
>>
>> Because userspace can map/unmap, and can start devices accessing
>> memory, through VFIO. So an attacker can:
>> 1. alloc memory
>> 2. map
>> 3. unmap
>> 4. free memory
>> 5.
repeatedly access the just-freed memory based on the just-unmapped
>> IOVA. This attack may succeed if the freed memory has been reused
>> by others and the mapping is still present in the TLB.
>
> Right, but that's why we don't set non-strict mode on unmanaged
> domains; what I was asking about was specifically why "it can not
> pass the strict mode of the domain from SMMUv3 driver to io-pgtable
> module", because we don't get anywhere near io-pgtable until we
> already know whether the domain in question can allow lazy unmaps
> or not.

Sorry, it's my fault. I had mistakenly assumed all along that
"data->iop.cfg" is shared by all domains, until you reminded me again.

>
>> But if only the root user can use VFIO, this is an unnecessary
>> worry. Then both normal use and VFIO will use the same strict mode,
>> so that the new quirk IO_PGTABLE_QUIRK_NON_STRICT can easily be
>> applied.
>>
>>>> Add a new member domain_non_strict in struct iommu_dma_cookie;
>>>> this member will only be initialized when the related domain and
>>>> IOMMU driver support non-strict mode.
>>>>
>>>> v1: In general, an IOMMU unmap operation follows the steps below:
>>>> 1. remove the mapping in the page table of the specified iova range
>>>> 2. execute a tlbi command to invalidate the mapping cached in the TLB
>>>> 3. wait for the above tlbi operation to finish
>>>> 4. free the IOVA resource
>>>> 5. free the physical memory resource
>>>>
>>>> This may be a problem when unmaps are very frequent: the
>>>> combination of the tlbi and wait operations consumes a lot of
>>>> time. A feasible method is to defer the tlbi and iova-free
>>>> operations; when a certain number has accumulated, or a specified
>>>> time has been reached, execute only one tlbi_all command to clean
>>>> up the TLB, then free the backed-up IOVAs. Mark this as
>>>> non-strict mode.
>>>>
>>>> But it must be noted that, although the mapping has already been
>>>> removed from the page table, it may still exist in the TLB, and
>>>> the freed physical memory may also be reused by others.
>>>> So an attacker can persistently access memory based on the
>>>> just-freed IOVA, to obtain sensitive data or corrupt memory. So
>>>> VFIO should always choose strict mode.
>>>>
>>>> Some may consider deferring the physical memory free as well;
>>>> that would still follow strict mode. But for the map_sg cases,
>>>> the memory allocation is not controlled by the IOMMU APIs, so it
>>>> is not enforceable.
>>>>
>>>> Fortunately, Intel and AMD have already applied the non-strict
>>>> mode, and put the queue_iova() operation into the common file
>>>> dma-iommu.c, and my work is based on it. The difference is that
>>>> the arm-smmu-v3 driver will call the common IOMMU APIs to unmap,
>>>> but the Intel and AMD IOMMU drivers do not.
>>>>
>>>> Below is the performance data of strict vs non-strict for an NVMe
>>>> device:
>>>> Random Read IOPS:  146K (strict) vs 573K (non-strict)
>>>> Random Write IOPS: 143K (strict) vs 513K (non-strict)
>>>
>>> How does that compare to passthrough performance? One thing I'm
>>> not entirely clear about is what the realistic use-case for this
>>> is - even if invalidation were infinitely fast, enabling
>>> translation still typically has a fair impact on overall system
>>> performance in terms of latency, power, memory bandwidth, etc., so
>>> I can't help wondering what devices exist today for which
>>> performance is critical and robustness* is unimportant, yet have
>>> crippled addressing capabilities such that they can't just use
>>> passthrough.
>>
>> I have no passthrough performance data yet; I will ask my team to
>> measure it. But we have tested global bypass: Random Read IOPS is
>> 744K, and Random Write IOPS is the same as non-strict.
>>
>> I'm also not clear. But I think in most cases the system does not
>> need to run at full capacity, yet it should have that ability. For
>> example, a system's daily load may be only 30-50%, but the load may
>> increase to 80%+ on a festival day.
>>
>> Passthrough is not enough to support VFIO, and virtualization needs
>> the latter.
>
> Huh? The whole point of passthrough mode is that the IOMMU can still
> be used for VFIO, but without imposing the overhead of dynamic
> mapping on host DMA.

I said that from my experience. Userspace does not know the PA, so I
think the user cannot fill in dma_map.iova correctly.

	/* Allocate some space and set up a DMA mapping */
	dma_map.vaddr = (__u64)mmap(0, 0x1000, PROT_READ | PROT_WRITE,
				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
	dma_map.size = 0x1000;
	dma_map.iova = 0x2f00000000UL;	/* user specified */
	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;

	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);

Furthermore, dma_map is also suitable for the iommu_map_sg usage
scenario.

>
> Robin.
>
>>> * I don't want to say "security" here, since I'm actually a lot
>>> less concerned about the theoretical malicious endpoint/wild write
>>> scenarios than the much more straightforward malfunctioning device
>>> and/or buggy driver causing use-after-free style memory corruption.
>>> Also, I'm sick of the word "security"...
>>
>> OK. We really have no need to consider buggy devices.
>>
>>>
>>>> Zhen Lei (6):
>>>>   iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook
>>>>   iommu/dma: add support for non-strict mode
>>>>   iommu/amd: use default branch to deal with all non-supported capabilities
>>>>   iommu/io-pgtable-arm: add support for non-strict mode
>>>>   iommu/arm-smmu-v3: add support for non-strict mode
>>>>   iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"
>>>>
>>>>  Documentation/admin-guide/kernel-parameters.txt | 12 +++++++
>>>>  drivers/iommu/amd_iommu.c                       |  4 +--
>>>>  drivers/iommu/arm-smmu-v3.c                     | 42 +++++++++++++++++++++++--
>>>>  drivers/iommu/dma-iommu.c                       | 25 +++++++++++++++
>>>>  drivers/iommu/io-pgtable-arm.c                  | 23 ++++++++------
>>>>  include/linux/iommu.h                           |  7 +++++
>>>>  6 files changed, 98 insertions(+), 15 deletions(-)
>>>>
>>>
>>> .
>>>
>>
>
> .
>

--
Thanks!
Best Regards