From mboxrd@z Thu Jan 1 00:00:00 1970 From: Sergio Gonzalez Monroy Subject: Re: [PATCH v2] eal: make hugetlb initialization more robust Date: Wed, 4 May 2016 12:07:03 +0100 Message-ID: <5729D7D7.3020400@intel.com> References: <1457089092-4128-1-git-send-email-jianfeng.tan@intel.com> <1457401359-132260-1-git-send-email-jianfeng.tan@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: david.marchand@6wind.com, nhorman@tuxdriver.com, konstantin.ananyev@intel.com To: Jianfeng Tan , dev@dpdk.org Return-path: Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id 017F82B8B for ; Wed, 4 May 2016 15:15:27 +0200 (CEST) In-Reply-To: <1457401359-132260-1-git-send-email-jianfeng.tan@intel.com> List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 08/03/2016 01:42, Jianfeng Tan wrote: > This patch adds an option, --huge-trybest, to use a recover mechanism to > the case that there are not so many hugepages (declared in sysfs), which > can be used. It relys on a mem access to fault-in hugepages, and if fails > with SIGBUS, recover to previously saved stack environment with > siglongjmp(). > > Test example: > a. cgcreate -g hugetlb:/test-subgroup > b. cgset -r hugetlb.1GB.limit_in_bytes=2147483648 test-subgroup > c. cgexec -g hugetlb:test-subgroup \ > ./examples/helloworld/build/helloworld -c 0x2 -n 4 --huge-trybest I think you should mention in the commit message that this option also covers the case of hugetlbfs mount with quota. > > +static sigjmp_buf jmpenv; > + > +static void sigbus_handler(int signo __rte_unused) > +{ > + siglongjmp(jmpenv, 1); > +} > + > +/* Put setjmp into a wrap method to avoid compiling error. Any non-volatile, > + * non-static local variable in the stack frame calling setjmp might be > + * clobbered by a call to longjmp. > + */ > +static int wrap_setjmp(void) > +{ > + return setjmp(jmpenv); > +} Use sigsetjmp instead of setjmp and restore the signal masks. > /* > * Mmap all hugepages of hugepage table: it first open a file in > * hugetlbfs, then mmap() hugepage_sz data in it. If orig is set, the > @@ -396,7 +413,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl, > if (fd < 0) { > RTE_LOG(ERR, EAL, "%s(): open failed: %s\n", __func__, > strerror(errno)); > - return -1; > + return i; When using --try-best, we could get an error and still work as expected. It can be confusing for users to see an error when it is expected behavior. Any thoughts? > } > > /* map the segment, and populate page tables, > @@ -407,7 +424,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl, > RTE_LOG(ERR, EAL, "%s(): mmap failed: %s\n", __func__, > strerror(errno)); > close(fd); > - return -1; > + return i; > } > Same comment as above > /* set shared flock on the file. */ > if (flock(fd, LOCK_SH | LOCK_NB) == -1) { > RTE_LOG(ERR, EAL, "%s(): Locking file failed:%s \n", > __func__, strerror(errno)); > close(fd); > - return -1; > + return i; Same comment as above > @@ -1137,10 +1206,24 @@ rte_eal_hugepage_init(void) > continue; > > /* map all hugepages available */ > - if (map_all_hugepages(&tmp_hp[hp_offset], hpi, 1) < 0){ > - RTE_LOG(DEBUG, EAL, "Failed to mmap %u MB hugepages\n", > - (unsigned)(hpi->hugepage_sz / 0x100000)); > - goto fail; > + pages_old = hpi->num_pages[0]; > + pages_new = map_all_hugepages(&tmp_hp[hp_offset], hpi, 1); > + if (pages_new < pages_old) { > + RTE_LOG(DEBUG, EAL, > + "%d not %d hugepages of size %u MB allocated\n", > + pages_new, pages_old, > + (unsigned)(hpi->hugepage_sz / 0x100000)); > + if (internal_config.huge_trybest) { > + int pages = pages_old - pages_new; > + > + internal_config.memory -= > + hpi->hugepage_sz * pages; > + nr_hugepages -= pages; > + hpi->num_pages[0] = pages_new; > + if (pages_new == 0) > + continue; > + } else > + goto fail; > } There is another call to map_all_hugepages that you are not updating the check of the return value. Sergio