From mboxrd@z Thu Jan 1 00:00:00 1970 From: Shreyansh Jain Subject: Re: [PATCH 3/5] common/dpaax: add library for PA VA translation table Date: Thu, 11 Oct 2018 16:09:34 +0530 Message-ID: <2fdcf7ac-6bed-00b6-1601-b816cffc613a@nxp.com> References: <20180925125423.7505-1-shreyansh.jain@nxp.com> <20180925125423.7505-4-shreyansh.jain@nxp.com> <894130a9-017c-348d-31f8-c4c23f517f25@nxp.com> <2d2e9008-fb5b-3ecd-2d2c-e86250f5d363@nxp.com> <5e153c76-6eaa-e6a5-28ed-7cec191d4581@intel.com> <2c377b57-c418-5c03-a23d-5da91ef898d0@nxp.com> <16940ff0-278c-2376-1a86-6beaf0770a09@nxp.com> <4f10749d-b6e8-e257-0606-2fbb3c1e3d76@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Cc: ferruh.yigit@intel.com, dev@dpdk.org To: "Burakov, Anatoly" Return-path: Received: from EUR02-HE1-obe.outbound.protection.outlook.com (mail-eopbgr10085.outbound.protection.outlook.com [40.107.1.85]) by dpdk.org (Postfix) with ESMTP id 602DB1B439 for ; Thu, 11 Oct 2018 12:40:03 +0200 (CEST) In-Reply-To: <4f10749d-b6e8-e257-0606-2fbb3c1e3d76@intel.com> Content-Language: en-US List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Thursday 11 October 2018 03:43 PM, Burakov, Anatoly wrote: > On 11-Oct-18 11:07 AM, Shreyansh Jain wrote: >> On Thursday 11 October 2018 03:32 PM, Shreyansh Jain wrote: >>> On Thursday 11 October 2018 02:33 PM, Burakov, Anatoly wrote: >>>> On 09-Oct-18 11:45 AM, Shreyansh Jain wrote: >>>>> On Tuesday 25 September 2018 07:09 PM, Shreyansh Jain wrote: >>>>>> Hello Anatoly, >>>>>> >>>>>> On Tuesday 25 September 2018 06:58 PM, Burakov, Anatoly wrote: >>>>>>> On 25-Sep-18 1:54 PM, Shreyansh Jain wrote: >>>>>>>> A common library, valid for dpaaX drivers, which is used to >>>>>>>> maintain >>>>>>>> a local copy of PA->VA translations. >>>>>>>> >>>>>>>> In case of physical addressing mode (one of the option for >>>>>>>> FSLMC, and >>>>>>>> only option for DPAA bus), the addresses of descriptors Rx'd are >>>>>>>> physical. These need to be converted into equivalent VA for >>>>>>>> rte_mbuf >>>>>>>> and other similar calls. >>>>>>>> >>>>>>>> Using the rte_mem_virt2iova or rte_mem_virt2phy is expensive. This >>>>>>>> library is an attempt to reduce the overall cost associated with >>>>>>>> this translation. >>>>>>>> >>>>>>>> A small table is maintained, containing continuous entries >>>>>>>> representing a continguous physical range. Each of these entries >>>>>>>> stores the equivalent VA, which is fed during mempool creation, or >>>>>>>> memory allocation/deallocation callbacks. >>>>>>>> >>>>> >>>>> [...] >>>>> >>>>>> >>>>>>> >>>>>>> Also, a couple of nitpicks below. >>>>>>> >>>>>>>>   cosnfig/common_base                            |   5 + >>>>>>>>   config/common_linuxapp                        |   5 + >>>>>>>>   drivers/common/Makefile                       |   4 + >>>>>>>>   drivers/common/dpaax/Makefile                 |  31 ++ >>>>>>>>   drivers/common/dpaax/dpaax_iova_table.c       | 509 >>>>>>>> ++++++++++++++++++ >>>>>>>>   drivers/common/dpaax/dpaax_iova_table.h       | 104 ++++ >>>>>>>>   drivers/common/dpaax/dpaax_logs.h             |  39 ++ >>>>>>>>   drivers/common/dpaax/meson.build              |  12 + >>>>>>> >>>>>>> >>>>>>> >>>>>>>> +    DPAAX_DEBUG("Add: Found slot at (%"PRIu64")[(%zu)] for >>>>>>>> vaddr:(%p)," >>>>>>>> +            " phy(%"PRIu64"), len(%zu)", entry[i].start, e_offset, >>>>>>>> +            vaddr, paddr, length); >>>>>>>> +    return 0; >>>>>>>> +} >>>>>>>> + >>>>>>>> +int >>>>>>>> +dpaax_iova_table_del(phys_addr_t paddr, size_t len __rte_unused) >>>>>>> >>>>>>> len is not unused. >>>>>> >>>>>> I will fix this. >>>>>> Actually, this function itself is useless - more for symmetry reason. >>>>>> Callers would be either simply updating the table, or ignoring it >>>>>> completely. But, yes, this is indeed wrong that I set that unused. >>>>>> >>>>> >>>>> Actually, I was wrong in my first reply. In case of >>>>> dpaax_iova_table_del(), len is indeed redundant. This is because >>>>> the mapping is for a complete page (min of 2MB size), even if the >>>>> request is for lesser length. So, removal of a single entry (of >>>>> fixed size) would be done. >>>>> >>>>> In fact, while on this, I think deleting a PA->VA entry itself is >>>>> incorrect (not just useless). A single entry (~2MB equivalent) can >>>>> represent multiple users (working on a rte_malloc'd area, for >>>>> example). So, effectively, its always an update - not an add or del. >>>> >>>> I'm not sure what you mean here. If you got a mem event about memory >>>> area being freed, it's guaranteed to *not* have any users - neither >>>> malloc, nor any other memory. And len is always page-aligned. >>> >>> ok. Maybe I am getting this wrong, but consider this: >>> >>> 1) hugepage size=2MB >>> 2) a = malloc(1M) >>>    this will pin an entry in table for a block starting at VA=(a) and >>> PA=(a'). Each entry is of 2MB length - that means, even if someone >>> were to access a+1048577 for an equivalent PA, they would get it >>> (though, that is a incorrect access). >>> 3) b = malloc(1M) >>>    this *might* lead to a case where same 2MB page is used and >>> VA=(b==(a+1MB)). Being hugepage backed, PA=(b=PA(a)+1M). >>> = After b, the PA-VA table has a single entry of 2MB, representing >>> two mallocs. It can be used for translation for any thread requesting >>> PAs of a or b. >>> 4) Free(a) >>>   - this would attempt to remove one 2MB entry from PA-VA table. But, >>> 'b' is already valid. Access to get_pa(VA(b)) should return me the >>> PA(b). >>>   - 'len' is not even used as the entry in PA-VA table is of a fixed >>> size. >> >> Just to add to this: >> - if talking about the mem_event callback, it definitely won't be a >> case where same page is still being served under another rte_malloc >> - But, calls can come to delete from users of PA-VA table based on >> their own rte_free(). >> >> And, your comment makes me think - I should probably del entry from >> the table only when mem_event callback is received. > > Mem events are not triggered on rte_free(), they're triggered on page > deallocation. A call to rte_free/rte_memzone_free/rte_mempool_free etc. > *might* trigger a page deallocation, but *only* if the memory area being > freed encompasses an entire page. If you rte_malloc() 64 bytes and then > rte_free() those 64 bytes, you won't get a mem event *unless* these were > the only 64 bytes allocated on a particular page, and the entire page is > no longer used by anything else. My understanding is same. But, it seems my explanation wasn't well written: For a rte_free(), I am not expecting that mem_event is raised - but, the caller of rte_free() (the eth or crypto drivers, or applications) may call the PA-VA table del function to remove the entries. This voluntary delete of table entry from the drivers or applications using PA-VA calling del of PA-VA table - is not correct. The path from mem_event callback clearing the PA-VA table entry is correct (which I removed in v2) - that time the page (len) would definitely not be used by anyone and can be removed from PA-VA table. And, yes, I agree that mem-event may not be on an rte_free(). > >> >>> >>> In the above, (3) is an assumption I am making based on my >>> understanding how mem allocator is working. Is that wrong? >>> >>> Basically, this is a restriction of this table - it has a min chunk >>> of 2MB - even for 1G hugepages - and hence, it is not possible to >>> honor deletes. I know this is convoluted logic - but, this keeps it >>> simple and use-able without much performance impact. >>> >>> [...] >>> >> >> > >