From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.5 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F39DDC5DF63 for ; Wed, 6 Nov 2019 18:17:37 +0000 (UTC) Received: from dpdk.org (dpdk.org [92.243.14.124]) by mail.kernel.org (Postfix) with ESMTP id 8CAAF217D7 for ; Wed, 6 Nov 2019 18:17:37 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=broadcom.com header.i=@broadcom.com header.b="ckOxYYjj" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8CAAF217D7 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=broadcom.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=dev-bounces@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 63CC11DFE7; Wed, 6 Nov 2019 19:17:31 +0100 (CET) Received: from mail-il1-f193.google.com (mail-il1-f193.google.com [209.85.166.193]) by dpdk.org (Postfix) with ESMTP id 4BC701C1D3 for ; Wed, 6 Nov 2019 16:28:37 +0100 (CET) Received: by mail-il1-f193.google.com with SMTP id n18so15144882ilt.9 for ; Wed, 06 Nov 2019 07:28:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=2qmMlpE0nQ+qNAL6NspS287/QjFsGy9MTDhADTEnCG4=; b=ckOxYYjjzgu1fC8EttUhlwMOX2ZG0n4ptOoh0ryYI5QLRb63bpuVq+dqzjxGt75EHn 1Xz9gmoZHCK7ctxd3x+FiMwySLJmqrPTJzwDOCrCs0CdSi6aW3iFs7JYXNLmyRXwWnwd xNlPM3v99oaRso7TcVzq8LyAcLAg+SXcKOojs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=2qmMlpE0nQ+qNAL6NspS287/QjFsGy9MTDhADTEnCG4=; b=cxoKy+WTk4d68/otmz9f2A6mPU5yosy7Feq1uyBXD+dumQgLg7ck3YjopD8X8FvKda RE/px0VB8iQlIoLfEsNi8dCPlzaGrbuFldClDfkmFArJxFWw1uC1fYN81xWLVs2V+hMT y4ad2SlVNuk6kCdMcJt696otUjtkgByc6FJXghSUQDxILVSso6nZizQdSZsAr+RVsxus u/eYiO1TkoRHVzpsHVNu5SX8v5P6Znrxai9HLOj2X3RXaG0LBbGO1gEQ9R8ef3oLMjaA 2sSHtaEHAtoK+zFdEW+yzbh1LLeHa+zxHX7Uxc1fkf/Ty+rA6j+05kOd2xVf5KUw/0Hp 9g6Q== X-Gm-Message-State: APjAAAV6G3Z6VDHPWLKOtA7Ww4FXuQ2HSUJEL9QNa/EW04c7Kktz3uT9 xhj34xNrGEWUCMJqHUoklFIbWTbiU9o69x3pDOpaGQ== X-Google-Smtp-Source: APXvYqz7LAdCBJTDwpxDlrCXf8yJWqIXYc/C7me752HA1Jeg2iEcTYkF0pK/5pwEKch1CTCMLRtFFxT8K2io6mG13nc= X-Received: by 2002:a92:b109:: with SMTP id t9mr1480804ilh.144.1573054116474; Wed, 06 Nov 2019 07:28:36 -0800 (PST) MIME-Version: 1.0 References: <5dd669557fc499df5e345a14c9252c095eff6c07.1572966906.git.anatoly.burakov@intel.com> In-Reply-To: From: Rajesh Ravi Date: Wed, 6 Nov 2019 20:57:59 +0530 Message-ID: To: David Marchand Cc: "Burakov, Anatoly" , dev , Bruce Richardson , Ajit Khaparde , Jonathan Richardson , Scott Branden , Vikram Mysore Prakash , Srinath Mannam , Thomas Monjalon X-Mailman-Approved-At: Wed, 06 Nov 2019 19:17:27 +0100 Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.15 Subject: Re: [dpdk-dev] [PATCH 19.11] vfio: fix DMA mapping of externally allocated heaps X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Thanks Anatoly & David. The main patch provided by Anatoly is not compatible with DPDK 19.02. So I had to make following patch and verified it to be working with DPDK 19.02 & SPDK 19.07 combination. ------------------------------------------------------------------------------------------------------------------------------------------------------------------ diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h index 84aabe3..49934f5 100644 --- a/lib/librte_eal/common/include/rte_eal_memconfig.h +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h @@ -35,6 +35,7 @@ struct rte_memseg_list { volatile uint32_t version; /**< version number for multiprocess sync. */ size_t len; /**< Length of memory area covered by this memseg list. */ unsigned int external; /**< 1 if this list points to external memory */ + unsigned int heap; /**< 1 if this list points to a heap */ struct rte_fbarray memseg_arr; }; diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index b39de3c..4d31bf7 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -368,6 +368,7 @@ void rte_free(void *addr) rte_spinlock_lock(&heap->lock); ret = malloc_heap_add_external_memory(heap, msl); + msl->heap = 1; /* mark it as heap segment */ rte_spinlock_unlock(&heap->lock); unlock: diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index 1b96b57..33dd923 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -836,6 +836,7 @@ void numa_error(char *where) msl->page_sz = page_sz; msl->socket_id = socket_id; msl->base_va = NULL; + msl->heap = 1; /* mark it as a heap segment */ RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n", (size_t)page_sz >> 10, socket_id); @@ -1410,6 +1411,7 @@ void numa_error(char *where) msl->page_sz = page_sz; msl->socket_id = 0; msl->len = internal_config.memory; + msl->heap = 1; /* we're in single-file segments mode, so only the segment list * fd needs to be set up. @@ -1682,6 +1684,7 @@ void numa_error(char *where) mem_sz = msl->len; munmap(msl->base_va, mem_sz); msl->base_va = NULL; + msl->heap = 0; /* destroy backing fbarray */ rte_fbarray_destroy(&msl->memseg_arr); diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c index 61b54f9..3bb3e6b 100644 --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c @@ -1238,7 +1238,16 @@ static int vfio_dma_mem_map(struct vfio_config *vfio_cfg, uint64_t vaddr, { int *vfio_container_fd = arg; - if (msl->external & !internal_config.iso_cmem) + /* skip external memory that isn't a heap */ + if (msl->external && !msl->heap) + return 0; + + /* skip any segments with invalid IOVA addresses */ + if (ms->iova == RTE_BAD_IOVA) + return 0; + + /* if IOVA mode is VA, we've already mapped the internal segments */ + if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA) return 0; return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova, @@ -1362,9 +1371,19 @@ static int vfio_dma_mem_map(struct vfio_config *vfio_cfg, uint64_t vaddr, { int *vfio_container_fd = arg; - if (msl->external) + /* skip external memory that isn't a heap */ + if (msl->external && !msl->heap) return 0; + /* skip any segments with invalid IOVA addresses */ + if (ms->iova == RTE_BAD_IOVA) + return 0; + +#if 0 + if (ms->addr_64 == param->addr_64) + return 0; +#endif + return vfio_spapr_dma_do_map(*vfio_container_fd, ms->addr_64, ms->iova, ms->len, 1); } @@ -1381,6 +1400,12 @@ struct spapr_walk_param { uint64_t max = ms->iova + ms->len; if (msl->external) + /* skip external memory that isn't a heap */ + if (msl->external && !msl->heap) + return 0; + + /* skip any segments with invalid IOVA addresses */ + if (ms->iova == RTE_BAD_IOVA) return 0; if (max > param->window_size) { ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Regards, Rajesh On Wed, Nov 6, 2019 at 7:28 PM David Marchand wrote: > On Tue, Nov 5, 2019 at 6:15 PM Burakov, Anatoly > wrote: > > > > On 05-Nov-19 3:15 PM, Anatoly Burakov wrote: > > > Currently, externally created heaps are supposed to be automatically > > > mapped for VFIO DMA by EAL, however they only do so if, at the time of > > > heap creation, VFIO is initialized and has at least one device > > > available. If no devices are available at the time of heap creation (or > > > if devices were available, but were since hot-unplugged, thus dropping > > > all VFIO container mappings), then VFIO mapping code would have skipped > > > over externally allocated heaps. > > > > > > The fix is two-fold. First, we allow externally allocated memory > > > segments to be marked as "heap" segments. This allows us to distinguish > > > between external memory segments that were created via heap API, from > > > those that were created via rte_extmem_register() API. > > > > > > Then, we fix the VFIO code to only skip non-heap external segments. > > > Also, since external heaps are not guaranteed to have valid IOVA > > > addresses, we will skip those which have invalid IOVA addresses as > well. > > > > > > Fixes: 0f526d674f8e ("malloc: separate creating memseg list and malloc > heap") > > > > > > Signed-off-by: Anatoly Burakov > > > --- > > > > > > Notes: > > > This cannot be backported to older releases as it breaks the > > > API and ABI. A separate fix is in the works for stable. > > > > > > > Alternative, non-breaking implementation available (which will be slower > > due to O(N) memseg list heaps lookups): > > > > http://patches.dpdk.org/patch/62486/ > > > > I'm fine with either option being merged. > > > > The more perfect solution would've been to rename "msl->external" into > > "msl->flags" and have various flags for memseg lists, but it's too late > > to break the API now. > > Either way is fine for me (between the 18.11 and the master patches you > sent). > Breaking the ABI is acceptable, but I agree the API is another story. > > If the code seems better to you with the additional heap flag, let's go > for it. > > I would still like to hear from Rajesh though, since he seems to be > the first to hit this issue. > > > -- > David Marchand > > -- Regards, Rajesh