From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shreyansh Jain
Subject: Re: [PATCH v2 23/41] mempool: add support for the new allocation methods
Date: Tue, 20 Mar 2018 17:05:17 +0530
To: Anatoly Burakov
Cc: dev@dpdk.org, Olivier Matz, keith.wiles@intel.com, jianfeng.tan@intel.com,
 andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
 benjamin.walker@intel.com, Bruce Richardson, Thomas Monjalon,
 "Ananyev, Konstantin", kuralamudhan.ramakrishnan@intel.com,
 louise.m.daly@intel.com, nelio.laranjeiro@6wind.com, yskoh@mellanox.com,
 pepperjo@japf.ch, jerin.jacob@caviumnetworks.com, Hemant Agrawal
List-Id: DPDK patches and discussions

Hello Anatoly,

On Wed, Mar 7, 2018 at 10:26 PM, Anatoly Burakov wrote:
> If a user has specified that the zone should have contiguous memory,
> use the new _contig allocation APIs instead of the normal ones.
> Otherwise, account for the fact that unless we're in IOVA_AS_VA
> mode, we cannot guarantee that the pages would be physically
> contiguous, so we calculate the memzone size and alignments as if
> we were getting the smallest page size available.
>
> Signed-off-by: Anatoly Burakov
> ---

[...]
>  static void
>  mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
>  {
> @@ -549,6 +570,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  	unsigned mz_id, n;
>  	unsigned int mp_flags;
>  	int ret;
> +	bool force_contig, no_contig;
>
>  	/* mempool must not be populated */
>  	if (mp->nb_mem_chunks != 0)
> @@ -563,10 +585,46 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  	/* update mempool capabilities */
>  	mp->flags |= mp_flags;
>
> -	if (rte_eal_has_hugepages()) {
> -		pg_shift = 0; /* not needed, zone is physically contiguous */
> +	no_contig = mp->flags & MEMPOOL_F_NO_PHYS_CONTIG;
> +	force_contig = mp->flags & MEMPOOL_F_CAPA_PHYS_CONTIG;
> +
> +	/*
> +	 * there are several considerations for page size and page shift here.
> +	 *
> +	 * if we don't need our mempools to have physically contiguous objects,
> +	 * then just set page shift and page size to 0, because the user has
> +	 * indicated that there's no need to care about anything.

I think the above case is not handled properly here. Reason below...

> +	 *
> +	 * if we do need contiguous objects, there is also an option to reserve
> +	 * the entire mempool memory as one contiguous block of memory, in
> +	 * which case the page shift and alignment wouldn't matter as well.
> +	 *
> +	 * if we require contiguous objects, but not necessarily the entire
> +	 * mempool reserved space to be contiguous, then there are two options.
> +	 *
> +	 * if our IO addresses are virtual, not actual physical (IOVA as VA
> +	 * case), then no page shift needed - our memory allocation will give us
> +	 * contiguous physical memory as far as the hardware is concerned, so
> +	 * act as if we're getting contiguous memory.
> +	 *
> +	 * if our IO addresses are physical, we may get memory from bigger
> +	 * pages, or we might get memory from smaller pages, and how much of it
> +	 * we require depends on whether we want bigger or smaller pages.
> +	 * However, requesting each and every memory size is too much work, so
> +	 * what we'll do instead is walk through the page sizes available, pick
> +	 * the smallest one and set up page shift to match that one. We will be
> +	 * wasting some space this way, but it's much nicer than looping around
> +	 * trying to reserve each and every page size.
> +	 */
> +
> +	if (no_contig || force_contig || rte_eal_iova_mode() == RTE_IOVA_VA) {
>  		pg_sz = 0;
> +		pg_shift = 0;
>  		align = RTE_CACHE_LINE_SIZE;

So, taking dpaa2 as an example, I ran testpmd. IOVA=VA is the mode, so
pg_sz = 0 is set - the same as before applying the hotplug patchset,
except that earlier this decision was based purely on the availability of
hugepages (rte_eal_has_hugepages()). Moving on...

> +	} else if (rte_eal_has_hugepages()) {
> +		pg_sz = get_min_page_size();
> +		pg_shift = rte_bsf32(pg_sz);
> +		align = pg_sz;
>  	} else {
>  		pg_sz = getpagesize();
>  		pg_shift = rte_bsf32(pg_sz);
> @@ -585,23 +643,34 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  			goto fail;
>  		}
>
> -		mz = rte_memzone_reserve_aligned(mz_name, size,
> -			mp->socket_id, mz_flags, align);
> -		/* not enough memory, retry with the biggest zone we have */
> -		if (mz == NULL)
> -			mz = rte_memzone_reserve_aligned(mz_name, 0,
> +		if (force_contig) {
> +			/*
> +			 * if contiguous memory for entire mempool memory was
> +			 * requested, don't try reserving again if we fail.
> +			 */
> +			mz = rte_memzone_reserve_aligned_contig(mz_name, size,
> +				mp->socket_id, mz_flags, align);
> +		} else {
> +			mz = rte_memzone_reserve_aligned(mz_name, size,
>  				mp->socket_id, mz_flags, align);
> +			/* not enough memory, retry with the biggest zone we
> +			 * have
> +			 */
> +			if (mz == NULL)
> +				mz = rte_memzone_reserve_aligned(mz_name, 0,
> +					mp->socket_id, mz_flags, align);
> +		}
>  		if (mz == NULL) {
>  			ret = -rte_errno;
>  			goto fail;
>  		}
>
> -		if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
> +		if (no_contig)
>  			iova = RTE_BAD_IOVA;
>  		else
>  			iova = mz->iova;
>
> -		if (rte_eal_has_hugepages())
> +		if (rte_eal_has_hugepages() && force_contig)

So, pre-hotplug-patchset, this call used to enter
rte_mempool_populate_iova(). But with 'force_contig' not set while calling
rte_pktmbuf_pool_create() (in app/test-pmd/testpmd.c:521),
rte_mempool_populate_virt() is called instead.

>  			ret = rte_mempool_populate_iova(mp, mz->addr,
>  				iova, mz->len,
>  				rte_mempool_memchunk_mz_free,
> --
> 2.7.4

That function is called with pg_sz = 0:

    678		else
>># 679			ret = rte_mempool_populate_virt(mp, mz->addr,
    680				mz->len, pg_sz,
    681				rte_mempool_memchunk_mz_free,
    682				(void *)(uintptr_t)mz);

Inside that function:

    512		/* address and len must be page-aligned */
    513		if (RTE_PTR_ALIGN_CEIL(addr, pg_sz) != addr)
    514			return -EINVAL;

This is where the error is returned. I don't think RTE_PTR_ALIGN_CEIL is
designed to handle pg_sz = 0. It is roughly equivalent to:

	RTE_PTR_ALIGN_FLOOR(((uintptr_t)addr - 1), pg_sz)

which returns NULL when pg_sz is 0 (the alignment mask collapses to 0).
Basically, this ends up failing rte_mempool_populate_default().

I think the underlying assumption - that rte_mempool_populate_virt() can
handle a page size of 0 - is wrong. Besides the RTE_PTR_ALIGN_CEIL check
above, there would be other issues as well, such as the for-loop iterating
over off + pg_sz. The function needs a valid (non-zero) page size to work
with.

So, basically, DPAA2 is stuck with this patch because of the above issue,
if I am comprehending it correctly.

Regards,
Shreyansh