From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shreyansh Jain
Subject: Re: [PATCH v2 00/41] Memory Hotplug for DPDK
Date: Wed, 21 Mar 2018 19:15:57 +0530
Message-ID:
References: <20180308101805.GA9526@ltp-pvn> <20180308111337.GA11638@ltp-pvn> <20180308133612.GA16647@ltp-pvn> <57c18da9-7377-3c0b-4aa2-9b97ef206f4f@intel.com> <55a2a182-27d5-b59a-0993-5b988f041e98@intel.com> <20180309091513.GA5781@ltp-pvn>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: dev@dpdk.org, Hemant Agrawal
To: "Burakov, Anatoly"
Return-path:
Received: from EUR01-DB5-obe.outbound.protection.outlook.com (mail-db5eur01on0088.outbound.protection.outlook.com [104.47.2.88]) by dpdk.org (Postfix) with ESMTP id 0C6BB2C12 for ; Wed, 21 Mar 2018 14:46:33 +0100 (CET)
Received: by mail-wm0-f50.google.com with SMTP id l16so9957720wmh.3 for ; Wed, 21 Mar 2018 06:46:31 -0700 (PDT)
In-Reply-To:
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Hello Anatoly,

This is not necessarily the right thread to reply to, but I am reusing this
email for another DPAA2 issue so that all the issues are in a single place.

On Thu, Mar 15, 2018 at 7:31 PM, Shreyansh Jain wrote:
> Hello Anatoly,
>
> On Tue, Mar 13, 2018 at 10:47 AM, Shreyansh Jain wrote:
>> Hello Anatoly,
>>
>> On Fri, Mar 9, 2018 at 4:12 PM, Burakov, Anatoly
>> wrote:
>>> On 09-Mar-18 9:15 AM, Pavan Nikhilesh wrote:
>>
>> [...]
>>
>>>>
>>>>
>>>> I have taken a look at the github tree the issues with VFIO are gone,
>>>> Although
>>>> compilation issues with dpaa/dpaa2 are still present due to their
>>>> dependency on
>>>> `rte_eal_get_physmem_layout`.
>>>
>>>
>>> I've fixed the dpaa compile issue and pushed it to github. I've tried to
>>> keep the semantics the same as before, but i can't compile-test (let alone
>>> test-test) them as i don't have access to a system with dpaa bus.
>>
>> Thanks. I will have a look at this.
>
> Just a heads-up, DPAA2 is broken on top-of-tree (github:
> 784e041f6b520) as of now:
>
> --->8---
> root@ls2088ardb:~/shreyansh/07_dpdk_memory#
> ./arm64-dpaa2-linuxapp-gcc/app/testpmd -c 0xE -n 1 --log-level=eal,8
> --log-level=mem,8 -- -i --portmask=0x3
> EAL: Detected lcore 0 as core 0 on socket 0
> EAL: Detected lcore 1 as core 1 on socket 0
> EAL: Detected lcore 2 as core 0 on socket 0
> EAL: Detected lcore 3 as core 1 on socket 0
> EAL: Detected lcore 4 as core 0 on socket 0
> EAL: Detected lcore 5 as core 1 on socket 0
> EAL: Detected lcore 6 as core 0 on socket 0
> EAL: Detected lcore 7 as core 1 on socket 0
> EAL: Support maximum 16 logical core(s) by configuration.
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: VFIO PCI modules not loaded
> EAL: DPAA Bus not present. Skipping.
> EAL: Container: dprc.2 has VFIO iommu group id = 4
> EAL: fslmc: Bus scan completed
> EAL: Module /sys/module/rte_kni not found! error 2 (No such file or directory)
> EAL: Multi-process socket /var/run/.rte_unix
> EAL: Probing VFIO support...
> EAL: IOMMU type 1 (Type 1) is supported
> EAL: IOMMU type 7 (sPAPR) is not supported
> EAL: IOMMU type 8 (No-IOMMU) is not supported
> EAL: VFIO support initialized
> EAL: Mem event callback 'vfio_mem_event_clb' registered
> EAL: Ask a virtual area of 0x2e000 bytes
> EAL: Virtual area found at 0xffff86cae000 (size = 0x2e000)
> EAL: Setting up physically contiguous memory...
> EAL: Ask a virtual area of 0x1000 bytes
> EAL: Virtual area found at 0xffff8873f000 (size = 0x1000)
> EAL: Memseg list allocated: 0x100000kB at socket 0
> EAL: Ask a virtual area of 0x800000000 bytes
> EAL: Virtual area found at 0xfff780000000 (size = 0x800000000)
> EAL: Ask a virtual area of 0x1000 bytes
> EAL: Virtual area found at 0xffff8873e000 (size = 0x1000)
> EAL: Memseg list allocated: 0x100000kB at socket 0
> EAL: Ask a virtual area of 0x800000000 bytes
> EAL: Virtual area found at 0xffef40000000 (size = 0x800000000)
> EAL: Ask a virtual area of 0x1000 bytes
> EAL: Virtual area found at 0xffff8873d000 (size = 0x1000)
> EAL: Memseg list allocated: 0x100000kB at socket 0
> EAL: Ask a virtual area of 0x800000000 bytes
> EAL: Virtual area found at 0xffe700000000 (size = 0x800000000)
> EAL: Ask a virtual area of 0x1000 bytes
> EAL: Virtual area found at 0xffff8873c000 (size = 0x1000)
> EAL: Memseg list allocated: 0x100000kB at socket 0
> EAL: Ask a virtual area of 0x800000000 bytes
> EAL: Virtual area found at 0xffdec0000000 (size = 0x800000000)
> EAL: TSC frequency is ~25000 KHz
> EAL: Master lcore 1 is ready (tid=88742110;cpuset=[1])
> EAL: lcore 3 is ready (tid=85cab910;cpuset=[3])
> EAL: lcore 2 is ready (tid=864ab910;cpuset=[2])
> EAL: eal_memalloc_alloc_page_bulk(): couldn't find suitable memseg_list
> error allocating rte services array
> EAL: FATAL: rte_service_init() failed
>
> EAL: rte_service_init() failed
>
> PANIC in main():
> Cannot init EAL
> 1: [./arm64-dpaa2-linuxapp-gcc/app/testpmd(rte_dump_stack+0x38) [0x4f37a8]]
> Aborted
> --->8--
>
> Above is an initial output - still investigating. I will keep you posted.
>

While working on the issue reported in [1], I have found another issue with
which I might need your help.
[1] http://dpdk.org/ml/archives/dev/2018-March/093202.html

For [1], I worked around the problem for the time being by changing the
mempool_add_elem code: it now lets non-contiguous allocations (those that did
not explicitly demand contiguous memory) go through
rte_mempool_populate_iova. With that, I was able to get DPAA2 working.

The problem is:
1. With 1GB pages, I/O works fine.
2. With 2MB pages (1024 of them), initialization fails somewhere after the
   VFIO layer.

Both cases use IOVA=VA mode.

Some logs:

This is the virtual memory layout requested by DPDK:

--->8---
EAL: Ask a virtual area of 0x2e000 bytes
EAL: Virtual area found at 0xffffb6561000 (size = 0x2e000)
EAL: Setting up physically contiguous memory...
EAL: Ask a virtual area of 0x59000 bytes
EAL: Virtual area found at 0xffffb6508000 (size = 0x59000)
EAL: Memseg list allocated: 0x800kB at socket 0
EAL: Ask a virtual area of 0x400000000 bytes
EAL: Virtual area found at 0xfffbb6400000 (size = 0x400000000)
EAL: Ask a virtual area of 0x59000 bytes
EAL: Virtual area found at 0xfffbb62af000 (size = 0x59000)
EAL: Memseg list allocated: 0x800kB at socket 0
EAL: Ask a virtual area of 0x400000000 bytes
EAL: Virtual area found at 0xfff7b6200000 (size = 0x400000000)
EAL: Ask a virtual area of 0x59000 bytes
EAL: Virtual area found at 0xfff7b6056000 (size = 0x59000)
EAL: Memseg list allocated: 0x800kB at socket 0
EAL: Ask a virtual area of 0x400000000 bytes
EAL: Virtual area found at 0xfff3b6000000 (size = 0x400000000)
EAL: Ask a virtual area of 0x59000 bytes
EAL: Virtual area found at 0xfff3b5dfd000 (size = 0x59000)
EAL: Memseg list allocated: 0x800kB at socket 0
EAL: Ask a virtual area of 0x400000000 bytes
EAL: Virtual area found at 0xffefb5c00000 (size = 0x400000000)
--->8---

Then, somehow, the VFIO mapping finds only a single page to map:

--->8---
EAL: Device (dpci.1) abstracted from VFIO
EAL: -->Initial SHM Virtual ADDR FFFBB6400000
EAL: -----> DMA size 0x200000
EAL: Total 1 segments found.
--->8---

Then these logs appear, probably when the DPAA2 code requests memory. I am
not sure why the same '...expanded by 10MB' message repeats.

--->8---
EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 10MB
EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 10MB
EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 10MB
EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 10MB
EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 10MB
EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 10MB
EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 2MB
EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 10MB
EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 10MB
LPM or EM none selected, default LPM on
Initializing port 0 ...
--->8---

l3fwd is stuck at this point. What I observe is that the DPAA2 driver has
gone ahead and registered the queues (queue_setup) with the hardware, and
either the memory has been overrun (the mapped region is smaller than the
requested size) or the addresses are corrupt (that is, not DMA-able). I get
SMMU faults, which indicate one of these cases.

There is a change from you in the fslmc/fslmc_vfio.c file
(rte_fslmc_vfio_dmamap()). Ideally, that code should have walked over all the
available pages for mapping, but that didn't happen and only a single virtual
area got DMA-mapped.
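
To make that expectation concrete, here is a minimal sketch of the kind of
walk I have in mind. It is not the actual fslmc code: the struct, the callback
type and every `sketch_` name are made up for illustration, and the callback
only stands in for the real VFIO DMA-map request. It assumes IOVA=VA, so the
virtual address is passed as the IOVA as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one hugepage-backed segment; not the DPDK type. */
struct sketch_seg {
    uint64_t vaddr;   /* virtual address; equals the IOVA in IOVA=VA mode */
    uint64_t len;     /* segment length in bytes */
    int      in_use;  /* non-zero when a page actually backs this slot */
};

/* Hypothetical callback standing in for the VFIO DMA-map request. */
typedef int (*sketch_map_fn)(uint64_t vaddr, uint64_t iova, uint64_t len,
                             void *ctx);

/*
 * Walk every segment and map each one that is in use.
 * Returns the number of segments mapped, or -1 on the first failure.
 */
static int
sketch_dmamap_all(const struct sketch_seg *segs, size_t n,
                  sketch_map_fn map_fn, void *ctx)
{
    int mapped = 0;

    assert(map_fn != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!segs[i].in_use || segs[i].len == 0)
            continue; /* skip holes instead of stopping at the first one */
        if (map_fn(segs[i].vaddr, segs[i].vaddr, segs[i].len, ctx) != 0)
            return -1;
        mapped++;
    }
    return mapped;
}

/* Example callback: only accumulates the total size it was asked to map. */
static int
sketch_count_bytes(uint64_t vaddr, uint64_t iova, uint64_t len, void *ctx)
{
    (void)vaddr;
    (void)iova;
    *(uint64_t *)ctx += len;
    return 0;
}
```

The point of the sketch is only that a hole in the segment table is skipped
rather than treated as the end of the walk, so pages allocated after
initialization would still be reached.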

--->8---
EAL: Device (dpci.1) abstracted from VFIO
EAL: -->Initial SHM Virtual ADDR FFFBB6400000
EAL: -----> DMA size 0x200000
EAL: Total 1 segments found.
--->8---

I am looking into this, but if any hint comes to mind, it would help.

Regards,
Shreyansh