Date: Thu, 14 Jan 2021 17:41:44 +0100
From: Jean-Philippe Brucker
To: "Tian, Kevin"
Cc: Greg Kroah-Hartman, "vivek.gautam@arm.com", "guohanjun@huawei.com",
 "will@kernel.org", "lorenzo.pieralisi@arm.com", "joro@8bytes.org",
 Zhou Wang, "linux-acpi@vger.kernel.org", "zhangfei.gao@linaro.org",
 "lenb@kernel.org", "devicetree@vger.kernel.org", Arnd Bergmann,
 "eric.auger@redhat.com", "robh+dt@kernel.org",
 "Jonathan.Cameron@huawei.com", "linux-arm-kernel@lists.infradead.org",
 David Woodhouse, "rjw@rjwysocki.net",
 "shameerali.kolothum.thodi@huawei.com",
 "iommu@lists.linux-foundation.org", "sudeep.holla@arm.com",
 "robin.murphy@arm.com", "linux-accelerators@lists.ozlabs.org", Lu Baolu
Subject: Re: [PATCH v9 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA
References: <20210108145217.2254447-1-jean-philippe@linaro.org>
 <20210108145217.2254447-4-jean-philippe@linaro.org>
 <4de8ef03-a2ed-316e-d3e3-6b8474e20113@linux.intel.com>

On Wed, Jan 13, 2021 at 08:10:28AM +0000, Tian, Kevin wrote:
> > >> Is this only for SVA?
> > >> We may see more scenarios of using IOPF. For
> > >> example, when passing through devices to user level, the user's pages
> > >> could be managed dynamically instead of being allocated and pinned
> > >> statically.
> > >
> > > Hm, isn't that precisely what SVA does? I don't understand the
> > > difference. That said FEAT_IOPF doesn't have to be only for SVA. It could
> > > later be used as a prerequisite some another feature. For special cases
> > > device drivers can always use the iommu_register_device_fault_handler()
> > > API and handle faults themselves.
> >
> > From the perspective of IOMMU, there is a little difference between
> > these two. For SVA, the page table is from CPU side, so IOMMU only needs
> > to call handle_mm_fault(); For above pass-through case, the page table
> > is from IOMMU side, so the device driver (probably VFIO) needs to
> > register a fault handler and call iommu_map/unmap() to serve the page
> > faults.
> >
> > If we think about the nested mode (or dual-stage translation), it's more
> > complicated since the kernel (probably VFIO) handles the second level
> > page faults, while the first level page faults need to be delivered to
> > user-level guest. Obviously, this hasn't been fully implemented in any
> > IOMMU driver.
> >
>
> Thinking more the confusion might come from the fact that we mixed
> hardware capability with software capability. IOMMU_FEAT describes
> the hardware capability. When FEAT_IOPF is set, it purely means whatever
> page faults that are enabled by the software are routed through the IOMMU.
> Nothing more. Then the software (IOMMU drivers) may choose to support
> only limited faulting scenarios and then evolve to support more complex
> usages gradually. For example, the intel-iommu driver only supports 1st-level
> fault (thus SVA) for now, while FEAT_IOPF as a separate feature may give the
> impression that 2nd-level faults are also allowed.
> From this angle once we
> start to separate page fault from SVA, we may also need a way to report
> the software capability (e.g. a set of faulting categories) and also extend
> iommu_register_device_fault_handler to allow specifying which
> category is enabled respectively. The example categories could be:
>
> - IOPF_BIND, for page tables which are bound/linked to the IOMMU.
> Apply to bare metal SVA and guest SVA case;

These don't seem to fit in the same software capability, since the action
to perform on incoming page faults is very different. In the first case
the fault handling is entirely contained within the IOMMU driver; in the
second case the IOMMU driver only tracks page requests, and offloads
handling to VFIO.

> - IOPF_MAP, for page tables which are managed through explicit IOMMU
> map interfaces. Apply to removing VFIO pinning restriction;

From the IOMMU perspective this is the same as guest SVA, no? VFIO
registering a fault handler and doing the bulk of the work itself.

> Both categories can be enabled together in nested translation, with
> additional information provided to differentiate them in fault information.
> Using paging/staging level doesn't make much sense as it's IOMMU driver's
> internal knowledge, e.g. VT-d driver plans to use 1st level for GPA if no
> nesting and then turn to 2nd level when nesting is enabled.

I guess detailing what's needed for nested IOPF can help the discussion,
although I haven't seen any concrete plan about implementing it, and it
still seems a couple of years away. There are two important steps with
nested IOPF:

(1) Figuring out whether a fault comes from L1 or L2. A SMMU stall event
    comes with this information, but a PRI page request doesn't. The IOMMU
    driver has to first translate the IOVA to a GPA, injecting the fault
    into the guest if this translation fails by using the usual
    iommu_report_device_fault().

(2) Translating the faulting GPA to a HVA that can be fed to
    handle_mm_fault().
    That requires help from KVM, so another interface -
    either KVM registering GPA->HVA translation tables or IOMMU driver
    querying each translation. Either way it should be reusable by device
    drivers that implement IOPF themselves.

(1) could be enabled with iommu_dev_enable_feature(). (2) requires a more
complex interface. (2) alone might also be desirable - demand-paging for
level 2 only, no SVA for level 1.

Anyway, back to this patch. What I'm trying to convey is "can the IOMMU
receive incoming I/O page faults for this device and, when SVA is enabled,
feed them to the mm subsystem? Enable that or return an error." I'm stuck
on the name. IOPF alone is too vague. Not IOPF_L1 as Kevin noted, since L1
is also used in virtualization. IOPF_BIND and IOPF_SVA could also mean (2)
above. IOMMU_DEV_FEAT_IOPF_FLAT?

That leaves space for the nested extensions. (1) above could be
IOMMU_FEAT_IOPF_NESTED, and (2) requires some new interfacing with KVM (or
just an external fault handler) and could be used with either IOPF_FLAT or
IOPF_NESTED. We can figure out the details later. What do you think?

Thanks,
Jean
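
For readers following the thread, steps (1) and (2) of the nested IOPF
flow described above can be sketched as a small userspace model in C. This
is only an illustration under stated assumptions, not kernel code and not
part of the patch series: every sketch_* name is hypothetical, the two
lookup tables stand in for the guest's IOVA->GPA page table and for
whatever GPA->HVA interface KVM might eventually provide, and 4 KiB pages
are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_MASK (~(uint64_t)0xfff)  /* 4 KiB pages, an assumption */

/* One page-granular mapping from an input address to an output address. */
struct sketch_map {
	uint64_t in;   /* page-aligned input address */
	uint64_t out;  /* page-aligned output address */
};

/* Hypothetical stand-ins for the two translation stages. */
struct sketch_tables {
	const struct sketch_map *l1;  /* IOVA -> GPA, owned by the guest */
	size_t n_l1;
	const struct sketch_map *l2;  /* GPA -> HVA, as KVM would know it */
	size_t n_l2;
};

enum sketch_action {
	SKETCH_INJECT_GUEST,  /* step (1) failed: report the fault to the guest */
	SKETCH_HANDLE_MM,     /* step (2) succeeded: HVA is ready for the mm */
	SKETCH_FAIL,          /* GPA unknown to the host: cannot be served */
};

/* Linear page lookup; keeps the in-page offset of the faulting address. */
static bool sketch_lookup(const struct sketch_map *map, size_t n,
			  uint64_t addr, uint64_t *out)
{
	for (size_t i = 0; i < n; i++) {
		if (map[i].in == (addr & SKETCH_PAGE_MASK)) {
			*out = map[i].out | (addr & ~SKETCH_PAGE_MASK);
			return true;
		}
	}
	return false;
}

/* Decide what to do with a page request that carries only an IOVA. */
static enum sketch_action sketch_route_fault(const struct sketch_tables *t,
					     uint64_t iova, uint64_t *hva)
{
	uint64_t gpa;

	assert(t != NULL && hva != NULL);

	/* Step (1): a failed IOVA -> GPA walk means the fault belongs to L1. */
	if (!sketch_lookup(t->l1, t->n_l1, iova, &gpa))
		return SKETCH_INJECT_GUEST;

	/* Step (2): resolve the GPA to an HVA that the host mm can fault in. */
	if (!sketch_lookup(t->l2, t->n_l2, gpa, hva))
		return SKETCH_FAIL;

	return SKETCH_HANDLE_MM;
}
```

In this model an IOVA that is absent from the l1 table is routed back to
the guest, which corresponds to the iommu_report_device_fault() path in
(1), while an IOVA that resolves through both tables yields the HVA that
would be handed to handle_mm_fault() in (2). A real implementation would
walk hardware page tables and need locking, neither of which is modelled
here.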