From: Alexey Kardashevskiy
Subject: Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
Date: Fri, 21 Apr 2017 18:59:08 +1000
Message-ID: <41362370-9fbd-f1b8-156d-8073da3d80f7@ozlabs.ru>
References: <20170420072402.38106-1-aik@ozlabs.ru> <20170420072402.38106-6-aik@ozlabs.ru> <12566b0a-8f9a-4040-a37d-2a106e49adcf@ozlabs.ru> <6e669e2b-2cfd-078d-b6b0-5c3819fad796@ozlabs.ru> <3e20a6f7-a1b7-4b13-5659-5afb827563ca@linux.vnet.ibm.com> <4977b4e8-e63a-0621-2375-89066d8de10a@ozlabs.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: 7bit
Cc: "dev@dpdk.org", Adrian Schuepbach, Gowrishankar Muthukrishnan, gowrishankar muthukrishnan
To: Jonas Pfefferle1
Content-Language: en-AU
List-Id: DPDK patches and discussions

On 21/04/17 18:35, Jonas Pfefferle1 wrote:
> ----------------------------------------
> Jonas Pfefferle
> Cloud Storage & Analytics
> IBM Zurich Research Laboratory
> Saeumerstrasse 4
> CH-8803 Rueschlikon, Switzerland
> +41 44 724 8539
>
> Alexey Kardashevskiy wrote on 21/04/2017 05:42:35:
>
>> From: Alexey Kardashevskiy
>> To: gowrishankar muthukrishnan
>> Cc: Jonas Pfefferle1, Gowrishankar Muthukrishnan, Adrian Schuepbach, "dev@dpdk.org"
>> Date: 21/04/2017 05:42
>> Subject: Re: [dpdk-dev] [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use
>> correct bus addresses for DMA map
>>
>> On 21/04/17 05:16, gowrishankar muthukrishnan wrote:
>> > On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
>> >> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>> >>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>> >>>> Alexey Kardashevskiy wrote on 20/04/2017 09:24:02:
>> >>>>
>> >>>>> From: Alexey Kardashevskiy
>> >>>>> To: dev@dpdk.org
>> >>>>> Cc: Alexey Kardashevskiy, JPF@zurich.ibm.com,
>> >>>>> Gowrishankar Muthukrishnan
>> >>>>> Date: 20/04/2017 09:24
>> >>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>> >>>>> addresses for DMA map
>> >>>>>
>> >>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address of
>> >>>>> the just-created DMA window. It happens to start from zero because the
>> >>>>> default window is removed (leaving no windows) and the new window
>> >>>>> starts from zero. However, this is not guaranteed and the new window
>> >>>>> may start from another address, so this adds an error check.
>> >>>>>
>> >>>>> Another issue is that the IOVA passed to VFIO_IOMMU_MAP_DMA should be
>> >>>>> a PCI bus address, while in this case a physical address of a user
>> >>>>> page is used. This changes the IOVA to start from zero in the hope
>> >>>>> that the rest of DPDK expects this.
>> >>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use
>> >>>> the phys_addr of the memory segment it got from /proc/self/pagemap, cf.
>> >>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to
>> >>>> the actual iova, which basically makes the whole virtual-to-physical
>> >>>> mapping with pagemap unnecessary, which I believe should be the case
>> >>>> for VFIO anyway. Pagemap should only be needed when using pci_uio.
>> >>>
>> >>> Ah, ok, makes sense now. But it sure needs a big fat comment there, as
>> >>> it is not obvious why a host RAM address is used there when the DMA
>> >>> window start is not guaranteed.
>> >> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64
>> >> both have the exact same value; in my setup it is 3fffb33c0000, which is
>> >> a userspace address - at least ms[i].phys_addr must be a physical
>> >> address.
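For reference, the /proc/self/pagemap lookup that eal_memory.c performs boils down to reading one 64-bit entry per page: bits 0-54 hold the page frame number and bit 63 says whether the page is present. A minimal sketch, with helper names that are mine rather than DPDK API:

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1) /* bits 0-54: page frame number */
#define PAGEMAP_PRESENT  (1ULL << 63)       /* bit 63: page present in RAM */

/* Decode one pagemap entry into a physical address;
 * returns (uint64_t)-1 if the page is not present. */
static uint64_t pagemap_entry_to_phys(uint64_t entry, uintptr_t va, long pgsz)
{
	if (!(entry & PAGEMAP_PRESENT))
		return (uint64_t)-1;
	return (entry & PAGEMAP_PFN_MASK) * pgsz + va % pgsz;
}

/* Read and decode the pagemap entry covering 'va'. Note: without
 * CAP_SYS_ADMIN, recent kernels report the PFN as 0. */
static uint64_t virt_to_phys(const void *va)
{
	long pgsz = sysconf(_SC_PAGESIZE);
	uint64_t entry;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return (uint64_t)-1;
	if (pread(fd, &entry, sizeof(entry),
		  ((uintptr_t)va / pgsz) * sizeof(entry)) != (ssize_t)sizeof(entry)) {
		close(fd);
		return (uint64_t)-1;
	}
	close(fd);
	return pagemap_entry_to_phys(entry, (uintptr_t)va, pgsz);
}
```

For example, with 64K pages an entry carrying PFN 0x1e0b00 decodes to physical address 0x1e0b000000.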
>> >
>> > This patch breaks i40e_dev_init() on my server.
>> >
>> > EAL: PCI device 0004:01:00.0 on NUMA socket 1
>> > EAL: probe driver: 8086:1583 net_i40e
>> > EAL: using IOMMU type 7 (sPAPR)
>> > eth_i40e_dev_init(): Failed to init adminq: -32
>> > EAL: Releasing pci mapped resource for 0004:01:00.0
>> > EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
>> > EAL: Requested device 0004:01:00.0 cannot be used
>> > EAL: PCI device 0004:01:00.1 on NUMA socket 1
>> > EAL: probe driver: 8086:1583 net_i40e
>> > EAL: using IOMMU type 7 (sPAPR)
>> > eth_i40e_dev_init(): Failed to init adminq: -32
>> > EAL: Releasing pci mapped resource for 0004:01:00.1
>> > EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
>> > EAL: Requested device 0004:01:00.1 cannot be used
>> > EAL: No probed ethernet devices
>> >
>> > I have two memsegs, each of 1G size. Their mapped PA and VA are also
>> > different.
>> >
>> > (gdb) p /x ms[0]
>> > $3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000, addr_64 =
>> > 0x3effaf000000}, len = 0x40000000, hugepage_sz = 0x1000000,
>> > socket_id = 0x1, nchannel = 0x0, nrank = 0x0}
>> > (gdb) p /x ms[1]
>> > $4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000, addr_64 =
>> > 0x3efbaf000000}, len = 0x40000000, hugepage_sz = 0x1000000,
>> > socket_id = 0x0, nchannel = 0x0, nrank = 0x0}
>> >
>> > Could you please recheck this? Maybe reset dma_map.iova by this offset
>> > only when the new DMA window does not start from bus address 0?
>>
>> As we figured out, it is a --no-huge effect.
>>
>> Another thing - as I read the code - the window size comes from
>> rte_eal_get_physmem_size(). On my 512GB machine, DPDK allocates only a
>> 16GB window, so it is far away from the 1:1 mapping which is believed to
>> be the DPDK expectation. Looking now for a better version of
>> rte_eal_get_physmem_size()...
>
> You can try specifying the size with -m or --socket-mem.

Oh, right. Thanks.
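One wrinkle with sizing the window from the total memseg size: the sPAPR TCE window passed to VFIO_IOMMU_SPAPR_TCE_CREATE has to be a power-of-two size, so whatever value is used needs rounding up first. A sketch of the rounding (hypothetical helper; DPDK's rte_align64pow2() does the same job):

```c
#include <stdint.h>

/* VFIO_IOMMU_SPAPR_TCE_CREATE requires a power-of-two window size, so
 * round the requested size up to the next power of two before creating
 * the window. (Illustrative helper, not DPDK API.) */
static uint64_t spapr_window_size(uint64_t physmem_size)
{
	uint64_t sz = 1;

	while (sz < physmem_size)
		sz <<= 1;
	return sz;
}
```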
>>
>> And another problem - after a few unsuccessful starts of app/testpmd, all
>> huge pages are gone:
>>
>> aik@stratton2:~$ cat /proc/meminfo
>> MemTotal: 535527296 kB
>> MemFree: 516662272 kB
>> MemAvailable: 515501696 kB
>> ...
>> HugePages_Total: 1024
>> HugePages_Free: 0
>> HugePages_Rsvd: 0
>> HugePages_Surp: 0
>> Hugepagesize: 16384 kB
>>
>> How is that possible? What is pinning these pages so that testpmd process
>> exit does not clear them up?
>
> I've also seen this. I think it happens if it does not shut down cleanly.
> I regularly clean /dev/hugepages ...

Oh, I am learning new things about hugepages as we speak :) I think the
mappings not being anonymous has this effect. Anyway, this is a bug - pages
stay allocated after every run of testpmd, even if it does not crash but
just does exit() :-/

I still cannot get it working, with Intel 40G ethernet now; this is how far
I get:

USER1: create a new mbuf pool : n=1419456, size=2176, socket=1
EAL: Error - exiting with code: 1
  Cause: Creation of mbuf pool for socket 1 failed: Cannot allocate memory
aik@stratton2:~$

I have put more details in another email.
>
>> >
>> > Thanks,
>> > Gowrishankar
>> >
>>
>> >>>
>> >>>>> Signed-off-by: Alexey Kardashevskiy
>> >>>>> ---
>> >>>>>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>> >>>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>> >>>>>
>> >>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>> b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>> index 46f951f4d..8b8e75c4f 100644
>> >>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>  {
>> >>>>>  	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>> >>>>>  	int i, ret;
>> >>>>> -
>> >>>>> +	phys_addr_t io_offset;
>> >>>>>  	struct vfio_iommu_spapr_register_memory reg = {
>> >>>>>  		.argsz = sizeof(reg),
>> >>>>>  		.flags = 0
>> >>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>  		return -1;
>> >>>>>  	}
>> >>>>> +	io_offset = create.start_addr;
>> >>>>> +	if (io_offset) {
>> >>>>> +		RTE_LOG(ERR, EAL, " DMA offsets other than zero is not supported, "
>> >>>>> +			"new window is created at %lx\n", io_offset);
>> >>>>> +		return -1;
>> >>>>> +	}
>> >>>>> +
>> >>>>>  	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>> >>>>>  	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>> >>>>>  		struct vfio_iommu_type1_dma_map dma_map;
>> >>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>  		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>> >>>>>  		dma_map.vaddr = ms[i].addr_64;
>> >>>>>  		dma_map.size = ms[i].len;
>> >>>>> -		dma_map.iova = ms[i].phys_addr;
>> >>>>> +		dma_map.iova = io_offset;
>> >>>>>  		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>> >>>>>  				VFIO_DMA_MAP_FLAG_WRITE;
>> >>>>> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>  			return -1;
>> >>>>>  		}
>> >>>>> +		io_offset += dma_map.size;
>> >>>>>  	}
>> >>>>>  	return 0;
>> >>>>> --
>> >>>>> 2.11.0
>>
>> --
>> Alexey

--
Alexey