From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54AAACA9EA0 for ; Fri, 25 Oct 2019 15:02:34 +0000 (UTC) Received: from dpdk.org (dpdk.org [92.243.14.124]) by mail.kernel.org (Postfix) with ESMTP id BB7FA20684 for ; Fri, 25 Oct 2019 15:02:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BB7FA20684 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=dev-bounces@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 19D4C1C2A2; Fri, 25 Oct 2019 17:02:33 +0200 (CEST) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 3ECA71C11A for ; Fri, 25 Oct 2019 17:02:31 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 25 Oct 2019 08:02:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.68,229,1569308400"; d="scan'208";a="373577477" Received: from aburakov-mobl1.ger.corp.intel.com (HELO [10.237.220.104]) ([10.237.220.104]) by orsmga005.jf.intel.com with ESMTP; 25 Oct 2019 08:02:28 -0700 To: Rajesh Ravi Cc: Ajit Khaparde , dev@dpdk.org, Jonathan Richardson , Scott Branden , Vikram Mysore Prakash , Srinath Mannam References: <20191015053047.52260-1-ajit.khaparde@broadcom.com> <83009bb3-1e0c-a22e-eff8-41a437817cb7@intel.com> <64edebee-3686-beca-2b30-c6ec1f26c162@intel.com> From: "Burakov, Anatoly" Message-ID: <31bff5b7-169c-e158-3a87-6448272c571c@intel.com> Date: Fri, 25 Oct 2019 16:02:27 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [PATCH] eal: add option --iso-cmem for external custom memory X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 25-Oct-19 1:53 PM, Rajesh Ravi wrote: > Thanks Anatoly. > I think now we 're close to sync. You understood it right: the problem > is dma/iommu mappings are not getting created. > > 1)vfio_mem_event_callback() is getting called later when memory is > allocated from heap but not when external memory is added to heap Please correct me if i'm wrong, but i believe that's not true. vfio_mem_event_callback() is called every time memory is added to a heap. That includes internal and external memory [1] [2]. [1] http://git.dpdk.org/dpdk/tree/lib/librte_eal/common/rte_malloc.c#n397 [2] http://git.dpdk.org/dpdk/tree/lib/librte_eal/common/malloc_heap.c#n1231 The only way the callback doesn't get called is if VFIO doesn't get initialize and/or fails to register said callback. > > 2)vfio_dma_mem_map() is called inside vfio_mem_event_callback() but it's > skipped in case of external memory because it's invoked inside following > if condition: >     if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) >     [ I verified this also with gdb by stepping over line by line > inside vfio_mem_event_callback() ] >   We 're using DPDK v19.02 but I see almost same code and flow in DPDK > 19.08-rc4 as well. I believe that is not true either. The line you're referring to [1] does *not* skip external memory segments (or rather, that's not its intention). Instead, it is checking if we can assume that the region is IOVA-contiguous. For IOVA as VA mode, we can assume that, *but not for external memory*, which is why it is skipped. After this check, there is a loop that goes memseg by memseg [2], and it does *not* skip external memory. The memseg will only get skipped if it doesn't have a valid IOVA address [3], which will only happen if you don't provide an IOVA table at heap creation time (we can't make assumptions about desired IOVA addresses, so we just skip them). Perhaps it should output a debug message when it does that, but the important thing is, the code does nothing to skip external memory. [1] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n521 [2] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n533 [3] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n536 > --------------------------------------------------------------------------------------------------------------------------------- > [ _Further explanation of point 1) :_ >  I used gdb and set break point and checked back trace to verify the > call stack trace: first I set break point at rte_malloc_heap_memory_add. >   Then I hit break point and then set break point at > vfio_mem_event_callback. I see from back trace that it's from > malloc_heap_alloc () with event = RTE_MEM_EVENT_ALLOC as can be seen > from log at the end of this mail. > >  And also I don't see any event for memory ADD as below: > enum rte_mem_event { >          RTE_MEM_EVENT_ALLOC = 0, /**< Allocation event. */ >          RTE_MEM_EVENT_FREE,      /**< Deallocation event. */ >  }; ] > > *_Log_* > *_ > _* > (gdb) bt > #0  vfio_mem_event_callback (type=RTE_MEM_EVENT_ALLOC, > addr=0x200016c00000, len=4194304, arg=0x0) at > /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/linuxapp/eal/eal_vfio.c:512 > #1  0x00000000004b78f0 in eal_memalloc_mem_event_notify > (event=event@entry=RTE_MEM_EVENT_ALLOC, > start=start@entry=0x200016c00000, len=len@entry=4194304) >     at > /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/eal_common_memalloc.c:248 > #2  0x00000000004c7a90 in try_expand_heap_primary (contig=false, > bound=0, align=35184753770496, flags=0, socket=0, > elt_size=35184753770496, pg_sz=2097152, heap=0x569b3c > ) >     at > /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:412 > #3  try_expand_heap (heap=heap@entry=0x569b3c , > pg_sz=2097152, elt_size=elt_size@entry=8192, socket=socket@entry=0, > flags=flags@entry=0, align=align@entry=2097152, bound=bound@entry=0, > contig=contig@entry=false) >     at > /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:494 > #4  0x00000000004c7f28 in alloc_more_mem_on_socket > (heap=heap@entry=0x569b3c , > size=size@entry=8192, socket=0, flags=flags@entry=0, > align=align@entry=2097152, bound=bound@entry=0, contig=contig@entry=false) >     at > /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:622 > #5  0x00000000004c822c in malloc_heap_alloc_on_heap_id > (size=size@entry=8192, heap_id=heap_id@entry=0, flags=flags@entry=0, > align=, align@entry=2097152, bound=bound@entry=0, > contig=contig@entry=false, type=0x0) >     at > /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:676 > #6  0x00000000004c8464 in malloc_heap_alloc (type=type@entry=0x0, > size=size@entry=8192, socket_arg=, flags=flags@entry=0, > align=2097152, bound=bound@entry=0, contig=contig@entry=false) >     at > /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:714 > #7  0x00000000004c498c in rte_malloc_socket (type=type@entry=0x0, > size=size@entry=8192, align=align@entry=2097152, socket_arg= out>, socket_arg@entry=-1) >     at > /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/rte_malloc.c:58 > #8  0x000000000049fe24 in spdk_malloc (size=size@entry=8192, > align=align@entry=2097152, phys_addr=phys_addr@entry=0x0, > socket_id=socket_id@entry=-1, flags=flags@entry=3) >     at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/env.c:95 > #9  0x000000000049fe94 in spdk_zmalloc (size=8192, > align=align@entry=2097152, phys_addr=phys_addr@entry=0x0, > socket_id=socket_id@entry=-1, flags=flags@entry=3) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/env.c:108 > #10 0x0000000000431f64 in nvme_pcie_qpair_construct > (qpair=0x200016aed1d8, opts=opts@entry=0x0) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:1054 > #11 0x000000000043233c in nvme_pcie_ctrlr_construct_admin_qpair > (ctrlr=0x200016aed340) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:719 > #12 nvme_pcie_ctrlr_construct (trid=, opts= out>, devhandle=0x70f4b0) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:881 > #13 0x0000000000437674 in nvme_transport_ctrlr_construct > (trid=trid@entry=0xffffffffec98, opts=opts@entry=0xffffffffea00, > devhandle=devhandle@entry=0x70f4b0) >     at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_transport.c:99 > #14 0x0000000000436b08 in nvme_ctrlr_probe > (trid=trid@entry=0xffffffffec98, probe_ctx=0x70eae0, > devhandle=devhandle@entry=0x70f4b0) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:426 > #15 0x0000000000430d40 in pcie_nvme_enum_cb (ctx=0xffffffffefe8, > pci_dev=0x70f4b0) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:752 > #16 0x00000000004a2e20 in spdk_pci_device_init (_drv=0x51b660 > , _dev=0x61b5f0) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/pci.c:347 > #17 0x00000000004d5c68 in rte_pci_probe_one_driver (dev=0x61b5f0, > dr=0x51b660 ) at > /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:185 > #18 pci_probe_all_drivers (dev=0x61b5f0) at > /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:259 > #19 rte_pci_probe () at > /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:294 > #20 0x00000000004bb490 in rte_bus_probe () at > /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/eal_common_bus.c:100 > #21 0x00000000004a3030 in spdk_pci_enumerate (driver=0x51b660 > , enum_cb=enum_cb@entry=0x430c50 , > enum_ctx=enum_ctx@entry=0xffffffffefe8) >     at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/pci.c:502 > #22 0x00000000004319c4 in nvme_pcie_ctrlr_scan (probe_ctx=0x70eae0, > direct_connect=) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:780 > #23 0x00000000004376ac in nvme_transport_ctrlr_scan > (probe_ctx=probe_ctx@entry=0x70eae0, > direct_connect=direct_connect@entry=false) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_transport.c:106 > #24 0x0000000000436be4 in spdk_nvme_probe_internal > (probe_ctx=probe_ctx@entry=0x70eae0, > direct_connect=direct_connect@entry=false) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:553 > #25 0x0000000000436fc4 in spdk_nvme_probe_async (trid=0xfffffffff0d8, > cb_ctx=cb_ctx@entry=0x6402c0, probe_cb=probe_cb@entry=0x420b70 > , attach_cb=0x421b50 , remove_cb=0x0) >     at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:1078 > #26 0x00000000004372d0 in spdk_nvme_probe (trid=, > trid@entry=0x0, cb_ctx=cb_ctx@entry=0x6402c0, > probe_cb=probe_cb@entry=0x420b70 , > attach_cb=attach_cb@entry=0x421b50 , > remove_cb=remove_cb@entry=0x0) >     at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:625 > #27 0x00000000004223b0 in bdev_nvme_library_init () at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/nvme/bdev_nvme.c:1539 > #28 0x000000000048cf80 in spdk_bdev_modules_init () at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/bdev.c:963 > #29 spdk_bdev_initialize (cb_fn=, cb_arg=) > at /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/bdev.c:1084 > #30 0x00000000004811b8 in spdk_subsystem_init (cb_fn=, > cb_arg=) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/subsystem.c:178 > #31 0x0000000000484720 in _spdk_msg_queue_run_batch (max_msgs= out>, thread=0x60c600) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/thread/thread.c:406 > #32 spdk_thread_poll (thread=thread@entry=0x60c600, > max_msgs=max_msgs@entry=0, now=now@entry=2386134829398) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/thread/thread.c:462 > #33 0x000000000048040c in _spdk_reactor_run (arg=0x60bcc0) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/reactor.c:272 > #34 0x0000000000480990 in spdk_reactors_start () at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/reactor.c:381 > #35 0x000000000047f7d4 in spdk_app_start > (opts=opts@entry=0xfffffffffa78, start_fn=start_fn@entry=0x407350 > , arg1=arg1@entry=0x0) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/app.c:692 > #36 0x0000000000405b2c in main (argc=7, argv=0xfffffffffc78) at > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/app/nvmf_tgt/nvmf_main.c:75 > > Regards, > Rajesh > > On Thu, Oct 24, 2019 at 5:13 PM Burakov, Anatoly > > wrote: > > On 22-Oct-19 8:56 AM, Rajesh Ravi wrote: > > Hi Anatoly, > > I tried calling rte_vfio_dma_map() but, it  failed for me because > > > > vfio_dma_mem_map() failed with error:VFIO support not initialized > > because: default_vfio_cfg->vfio_iommu_type=NULL ( container_dma_map > > calls vfio_dma_mem_map  with vfio_cfg = default_vfio_cfg ) > > > > call sequence: rte_vfio_dma_map() -->container_dma_map() > > --> vfio_dma_mem_map > > > > Regards, > > Rajesh > > > > On Mon, Oct 21, 2019 at 9:16 PM Rajesh Ravi > > > >> wrote: > > > >     Thanks Anatoly for prompt response. Sorry for the delayed > response, > >     I took some time to reverify with SPDK. > > > >     Infact, I do want the iommu mapping to be performed. I don't > want it > >     to be bypassed by type1_map() > >     [lib/librte_eal/linuxapp/eal/eal_vfio.c] for external memory. > > > >     Then, if I understood correctly, you suggested to call > >     rte_vfio_dma_map() as an alternative. > > > >     _Context & clarification_ > > > >     1) We 're using DPDK to manage/allocate memory for SPDK > through heap > >     API. > > > >        The only DPDK APIs we 're calling are: > >        A)rte_malloc_heap_memory_add() to add external memory to heap. > >        B)rte_malloc_heap_get_socket() & rte_malloc_socket() to > allocate > >     memory > > > >     /Are you suggesting to make a call to rte_vfio_dma_map() from > spdk, > >     in addition to the APIs listed under 1)A & 1)B instead of > modifying > >     DPDK vfio functions?/ > >         Please confirm, Probably I missed the context and might > not have > >     understood fully. > > > > > >     2) .dma_user_map_func=vfio_type1_dma_mem_map() is called from 2 > >     paths in dpdk. In either case call to dma_user_map_func() is > skipped. > >         A) _During the startup, as you said before:_ > >           rte_vfio_setup_device() --> type1_map() > > > >         B)During allocation event: > >              vfio_mem_event_callback()  (type=RTE_MEM_EVENT_ALLOC,..) > >     -->vfio_dma_mem_map() -->dma_user_map_func() > > >  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- > >     _Conclusion_ > > > >     So we have 2 alternatives: > > > >     A) make additional call to rte_vfio_dma_map() API after adding > >     external memory using rte_malloc_heap_memory_add()  API. > > > >     B) remove msl->external check which bypasses call to > >     dma_user_map_func() in DPDK. > > > >     I modified DPDK functions [Option B) ]. I guess you 're > >     suggesting option A) > > > >     Please confirm. > > > >     Regards, > >     Rajesh > > Hi Rajesh, > > Apologies for delayed response. > > I'm still not sure i understand your problem. > > When adding memory with rte_malloc_heap_memory_add(), this memory > is, as > far as i can tell, automatically mapped for DMA mapping with VFIO. This > happens because: > > rte_malloc_heap_memory_add() -> malloc_heap_add_external_memory() -> > eal_memalloc_mem_event_notify() -> vfio_mem_event_callback() -> > vfio_dma_mem_map() > > VFIO registers itself for mem event callbacks in > eal_vfio.c:rte_vfio_setup_device():791. > > So, there is no need to map anything for DMA when you're creating > external heaps - it is being mapped automatically by EAL. > > For an example implementation have a look at testpmd, specifically > testpmd.c:mbuf_pool_create() for MP_ALLOC_XMEM type. This calls > testpmd.c:setup_extmem(), which creates an external heap. Notice how it > doesn't do anything to map the memory for DMA, because that is already > done by EAL. You can verify this working by adding --mp-alloc=xmem > argument to testpmd to force it to use external memory for its mempools. > > So, what is the actual issue here, given that the memory is mapped for > DMA by EAL automatically when it is added? > > -- > Thanks, > Anatoly > > > > -- > Regards, > Rajesh -- Thanks, Anatoly