From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,URIBL_SBL,URIBL_SBL_A, USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E8415C73C50 for ; Tue, 9 Jul 2019 14:37:12 +0000 (UTC) Received: from dpdk.org (dpdk.org [92.243.14.124]) by mail.kernel.org (Postfix) with ESMTP id 5FFEA20844 for ; Tue, 9 Jul 2019 14:37:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5FFEA20844 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=dev-bounces@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 0C9124C94; Tue, 9 Jul 2019 16:37:11 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 759F7322C for ; Tue, 9 Jul 2019 16:37:08 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 09 Jul 2019 07:37:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.63,470,1557212400"; d="scan'208";a="165778863" Received: from aburakov-mobl1.ger.corp.intel.com (HELO [10.237.220.82]) ([10.237.220.82]) by fmsmga008.fm.intel.com with ESMTP; 09 Jul 2019 07:37:06 -0700 To: Jerin Jacob Kollanukkaran , David Marchand Cc: dev , Thomas Monjalon , Ben Walker References: <20190708142450.51597-1-jerinj@marvell.com> <0947c33d-b3be-1acc-f98e-3635cc5658d2@intel.com> From: "Burakov, Anatoly" Message-ID: <553b3a91-7458-98d0-9df6-5b53010d326f@intel.com> Date: Tue, 9 Jul 2019 15:37:05 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 09-Jul-19 3:00 PM, Jerin Jacob Kollanukkaran wrote: >> -----Original Message----- >> From: Burakov, Anatoly >> Sent: Tuesday, July 9, 2019 7:00 PM >> To: Jerin Jacob Kollanukkaran ; David Marchand >> >> Cc: dev ; Thomas Monjalon ; Ben >> Walker >> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode >> selection >> >> On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote: >>>> -----Original Message----- >>>> From: Burakov, Anatoly >>>> Sent: Tuesday, July 9, 2019 5:10 PM >>>> To: Jerin Jacob Kollanukkaran ; David Marchand >>>> >>>> Cc: dev ; Thomas Monjalon ; >> Ben >>>> Walker >>>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA >>>> mode selection >>>>>>> ________________________________________ >>>>>>> >>>>>>> On Mon, Jul 8, 2019 at 4:25 PM wrote: >>>>>>> From: Jerin Jacob >>>>>>> >>>>>>> Existing logic fails to select IOVA mode as VA if driver request >>>>>>> to enable IOVA as VA. >>>>>>> >>>>>>> IOVA as VA has more strict requirement than other modes, so >>>>>>> enabling positive logic for IOVA as VA selection. >>>>>>> >>>>>>> This patch also updates the default IOVA mode as PA for PCI >>>>>>> devices as it has to deal with DMA engines unlike the virtual >>>>>>> devices that may need only IOVA as DC. >>>>>>> >>>>>>> We have three cases: >>>>>>> - driver/hw supports IOVA as PA only >>>>>>> >>>>>>> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs >>>>>>> non IOMMU). We are already addressing that case >>>>>> >>>>>> I don't get how this works. How does "system capability" affect >>>>>> what the device itself supports? Are we to assume that *all* >>>>>> hardware support IOVA as VA by default? "System capability" is more >>>>>> of a bus issue than an individual device issue, is it not? >>>>> >>>>> What I meant is, supporting VA vs PA is function of IOMMU(not the >>>>> device >>>> attribute). >>>>> Ie. Device makes the bus master request, if IOMMU available and >>>>> enabled in the SYSTEM , It goes over IOMMU and translate the IOVA >>>>> to >>>> physical address. >>>>> >>>>> Another way to put is, Is there any _PCIe_ device which >>>>> need/requires RTE_PCI_DRV_NEED_IOVA_AS_PA in >>>>> rte_pci_driver.drv_flags >>>>> >>>>> >>>> >>>> Previously, as far as i can tell, the flag was used to indicate >>>> support for IOVA as VA mode, not *requirement* for IOVA as VA mode. >>>> For example, there are multiple patches [1][2][3][4] (i'm sure i can >>>> find more!) that added IOVA as VA support to various drivers, and >>>> they all were worded it in this exact way >>>> - "support for IOVA as VA mode", not "require IOVA as VA mode". As >>>> far as i can tell, none of these drivers *require* IOVA as VA mode - >>>> they merely use this flag to indicate support for it. >>> >>> Some class of devices NEED IOVA as VA for performance reasons. >>> Specially the devices has HW mempool allocators. On those devices If >>> we don’t use IOVA as VA, Upon getting packet from device, It needs to >>> go over rte_mem_iova2virt() per packet see driver/net/dppa2. Which has >> real performance issue. >> >> I wouldn't classify this as "needing" IOVA. "Need" implies it cannot work >> without it, whereas in this case it's more of a "highly recommended" rather >> than "need". > > It is "need" as performance is horrible without it as is per packet SW translation. > A "need" for DPDK performance perspective. Would the driver fail to initialize if it detects running as IOVA as PA? > >> >>>> >>>> Now suddenly it turns out that someone somewhere "knew" that "IOVA >> as >>>> VA" flag in PCI drivers is supposed to indicate *requirement* and not >>>> support, and it appears that this knowledge was not communicated nor >>>> documented anywhere, and is now treated as common knowledge. >>> >>> I think, the confusion here is, I was under impression that # If >>> device supports IOVA as VA and system runs with IOMMU then the dpdk >>> should run in IOVA as VA mode. >>> If above statement true then we don’t really need a new flag. >> >> Exactly. And the flag used to indicate that the device *supports* IOVA as VA, >> not that it *requires* it. >> >>> >>> Couple of points to make forward progress: >>> # If we think, there is a use case where device is IOVA as VA And >>> system runs in IOMMU mode then for some reason DPDK needs to run in >> PA >>> mode. If so, we need to create two flags RTE_PCI_DRV_IOVA_AS_VA - it >>> can run either modes >> >> There are use cases - KNI and igb_uio come to mind. Whether IOMMU uses >> VA or PA is a different from whether IOMMU is in use - there is no law that >> states that, when using IOMMU, IOVA have to have 1:1 mapping with VA. >> IOMMU requirement does not necessarily imply IOVA as VA - it is perfectly >> legal to program IOMMU to use IOVA as PA (which we currently do when we >> e.g. use VFIO for some devices and igb_uio for others). > > For KNI, we already submitted a patch to support IOVA as VA. Yep, point being that it *didn't work* before, hence we may want to account for possible future use cases like this (however admittedly hacky they are). There are valid use cases to enforce IOVA as VA support only (such as for performance reasons, even though it would be technically possible to use IOVA as PA), and there could be valid use cases to enforce IOVA as PA support only (for example, i seem to remember that crypto/qat driver at one point didn't support VFIO driver, effectively rendering it not supporting IOVA as VA). > I don’t know about igb_uio, if IOVA as PA, we might as well disable IOMMU. > Is igb_uio enables IOMMU? I don’t see any reference. > grep -ri "iommu" kernel/linux/igb_uio/ igb_uio can work with IOMMU with pass-through mode. When Linux is booted up in pass-through, IOMMU is enabled and igb_uio will work, and VFIO can use both IOVA as PA and IOVA as VA, while igb_uio can only use IOVA as PA. So yes, igb_uio does enable IOMMU in a very limited way, but only to set up 1:1 mapping of IOVA with PA. Also, some other use cases will also require IOVA as PA while having full IOMMU support. An example of this would be systems with limited IOMMU width (such as VM's) - even though the IOMMU is technically supported, we may not have the necessary address width to run all devices in IOVA as VA mode, and would need to fall back to IOVA as PA. Since we cannot *require* IOVA as VA in current codebase, any driver that expects IOVA as VA to always be enabled will presumably not work. > > Again, it is not device attribute, it is system attribute. If it's a system attribute, why is it a device driver flag then? The system may or may not support IOMMU, the device itself probably doesn't care since bus address looks the same in both cases, *but the driver might* (such as would be in your case - requiring IOVA as VA and disallowing IOVA as PA for performance reasons). > In current KNI case it > Fall backs to PA irrespective of device capability so we don’t need any > separate flag from driver. > > Even if we introduce a flag, what it is supposed to do? The same thing you are suggesting for one of your HW mempool drivers: a "need" to only run in IOVA as VA mode. The logic in this case (as far as the driver is concerned, disregarding the kernel driver issue for now) would be: has_pa = (drv->flags & SUPPORTS_PA) != 0; has_va = (drv->flags & SUPPORTS_VA) != 0; if (has_va && has_pa) return RTE_IOVA_DC; // don't care, supports both if (has_va) return RTE_IOVA_VA; // only supports VA - your case return RTE_IOVA_PA; // only supports PA Currently (again, disregarding your interpretation of how IOVA as VA works and looking at the actual commit history), we always seem to imply that IOVA as PA works for all devices, and we use IOVA_AS_VA flag to indicate that the device *also* supports IOVA as VA mode. But we don't have any way to express a *requirement* for IOVA as VA mode - only for IOVA as PA mode. That is the purpose of the new flag. You are stating that the IOVA_AS_VA drv flag is an expression of that requirement, but that is not reflected in the codebase - our commit history indicates that we don't treat IOVA as VA as hard requirement whenever this flag is specified (and i would argue that we shouldn't). > >> >>> RTE_PCI_DRV_NEED_IOVA_AS_VA - it can run only on IOVA as VA >> >> If we're adding a flag, we might as well not create a confusion and do it >> consistently. If IOVA as PA is supported, have a flag to indicate that. If IOVA >> as VA is supported, have a flag to indicate that. Absence of either flag implies > > So in what category i40e driver comes? By default, pci bus should return PA for class. > If VA supported then return VA. > So how new flag will help? We seem to be in agreement that we need *two* flags to express all three of the above. The question is, which flags. You suggest to have "supports IOVA as VA" and "requires IOVA as VA" as two options, while i am suggesting to have "supports IOVA as PA" and "supports IOVA as VA" as flags. This requires modification to all existing drivers and is perhaps undesirable from that point of view (this isn't my decision), but it's less confusing than having two IOVA-as-VA flags that differ slightly in their meaning (supports VA vs. requires VA). Going back to your i40e example, AFAIK i40e supports both IOVA as VA and IOVA as PA - so in this case it should return RTE_IOVA_DC (i.e. use whatever's available). If other devices also don't care, then push the decision to the upper layers and not decide anything at the bus level [1]. [1] http://patchwork.dpdk.org/patch/54801/ > >> inability to work in that mode. I don't see how this is less clear and self- >> documenting than having two IOVA as VA-related flags that have slightly >> different meaning and imply things not otherwise stated explicitly. >> >>> # With top of tree, Currently it never runs in IOVA as VA mode. >>> That’s a separate problem to fix. Which effect all the devices >>> Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though Device >>> support RTE_PCI_DRV_IOVA_AS_VA, it is not running With IOMMU >>> protection and/or root privilege is required to run DPDK. > > What's your view on this existing problem? My view would be to always run in IOVA as VA by default and only falling back to IOVA as PA if there is a need to do that. Yet, it seems that whenever i try to bring this up, the response (not necessarily from you, so this is not directed at you specifically) seems to be that because of hotplug, we have to start in the "safest" (from device support point of view) mode - that is, in IOVA as PA. Seeing how, as you claim, some devices require IOVA as VA, then IOVA as PA is no longer the "safe" default that all devices will support. Perhaps we can use this opportunity to finally make IOVA as VA the default :) > > >>> >>> >>>> >>>> [1] http://patchwork.dpdk.org/patch/53206/ >>>> [2] http://patchwork.dpdk.org/patch/50274/ >>>> [3] http://patchwork.dpdk.org/patch/50991/ >>>> [4] http://patchwork.dpdk.org/patch/46134/ >>>> >>>> -- >>>> Thanks, >>>> Anatoly >> >> >> -- >> Thanks, >> Anatoly -- Thanks, Anatoly