Date: Tue, 29 Oct 2019 18:34:42 +0100
From: Olivier Matz
To: Andrew Rybchenko
Cc: dev@dpdk.org, Anatoly Burakov, Ferruh Yigit, "Giridharan, Ganesan",
 Jerin Jacob Kollanukkaran, Kiran Kumar Kokkilagadda, Stephen Hemminger,
 Thomas Monjalon, Vamsi Krishna Attunuru
Message-ID: <20191029173442.4bzpefjs2ivp5yk4@platinum>
References: <20190719133845.32432-1-olivier.matz@6wind.com>
 <20191028140122.9592-1-olivier.matz@6wind.com>
 <20191028140122.9592-6-olivier.matz@6wind.com>
 <08a69641-9876-1f28-0f43-06f5d858d4c7@solarflare.com>
In-Reply-To: <08a69641-9876-1f28-0f43-06f5d858d4c7@solarflare.com>
Subject: Re: [dpdk-dev] [PATCH 5/5] mempool: prevent objects from being across pages

On Tue, Oct 29, 2019 at 01:59:00PM +0300, Andrew Rybchenko wrote:
> On 10/28/19 5:01 PM, Olivier Matz wrote:
> > When populating a mempool, ensure that objects are not located across
> > several pages, except if user did not request iova contiguous objects.
>
> I think it breaks distribution across memory channels which could
> affect performance significantly.

With 2M hugepages, there are ~900 mbufs per page, and all of them will be
distributed across memory channels. For larger objects, I don't think the
distribution is that important.

With small pages, that may be true, but I think the problem was already
there, except in IOVA=VA mode. This should be fixable, but I'm not sure
there is a real use case where we would see a regression.
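Just to show where the ~900 figure comes from, here is a back-of-the-envelope
sketch. The 2304-byte footprint used below is an assumption (roughly the mbuf
header plus headroom plus a 2048-byte data room), not a value taken from this
series:

#include <stdio.h>

int main(void)
{
	size_t pg_sz = 2 * 1024 * 1024;	/* 2M hugepage */
	size_t total_elt_sz = 2304;	/* assumed per-mbuf footprint */

	/* with this patch an object never crosses a page boundary, so a
	 * page simply holds the integer ratio of objects */
	printf("%zu mbufs per 2M page\n", pg_sz / total_elt_sz);
	return 0;
}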
> > Signed-off-by: Vamsi Krishna Attunuru
> > Signed-off-by: Olivier Matz
> > ---
> >  lib/librte_mempool/rte_mempool.c             | 23 +++++-----------
> >  lib/librte_mempool/rte_mempool_ops_default.c | 29 ++++++++++++++++++--
> >  2 files changed, 33 insertions(+), 19 deletions(-)
> >
> > diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> > index 7664764e5..b23fd1b06 100644
> > --- a/lib/librte_mempool/rte_mempool.c
> > +++ b/lib/librte_mempool/rte_mempool.c
> > @@ -428,8 +428,6 @@ rte_mempool_get_page_size(struct rte_mempool *mp, size_t *pg_sz)
> >  	if (!need_iova_contig_obj)
> >  		*pg_sz = 0;
> > -	else if (!alloc_in_ext_mem && rte_eal_iova_mode() == RTE_IOVA_VA)
> > -		*pg_sz = 0;
> >  	else if (rte_eal_has_hugepages() || alloc_in_ext_mem)
> >  		*pg_sz = get_min_page_size(mp->socket_id);
> >  	else
> > @@ -478,17 +476,15 @@ rte_mempool_populate_default(struct rte_mempool *mp)
> >  	 * then just set page shift and page size to 0, because the user has
> >  	 * indicated that there's no need to care about anything.
> >  	 *
> > -	 * if we do need contiguous objects, there is also an option to reserve
> > -	 * the entire mempool memory as one contiguous block of memory, in
> > -	 * which case the page shift and alignment wouldn't matter as well.
> > +	 * if we do need contiguous objects (if a mempool driver has its
> > +	 * own calc_size() method returning min_chunk_size = mem_size),
> > +	 * there is also an option to reserve the entire mempool memory
> > +	 * as one contiguous block of memory.
> >  	 *
> >  	 * if we require contiguous objects, but not necessarily the entire
> > -	 * mempool reserved space to be contiguous, then there are two options.
> > -	 *
> > -	 * if our IO addresses are virtual, not actual physical (IOVA as VA
> > -	 * case), then no page shift needed - our memory allocation will give us
> > -	 * contiguous IO memory as far as the hardware is concerned, so
> > -	 * act as if we're getting contiguous memory.
> > +	 * mempool reserved space to be contiguous, pg_sz will be != 0,
> > +	 * and the default ops->populate() will take care of not placing
> > +	 * objects across pages.
> >  	 *
> >  	 * if our IO addresses are physical, we may get memory from bigger
> >  	 * pages, or we might get memory from smaller pages, and how much of it
> > @@ -501,11 +497,6 @@ rte_mempool_populate_default(struct rte_mempool *mp)
> >  	 *
> >  	 * If we fail to get enough contiguous memory, then we'll go and
> >  	 * reserve space in smaller chunks.
> > -	 *
> > -	 * We also have to take into account the fact that memory that we're
> > -	 * going to allocate from can belong to an externally allocated memory
> > -	 * area, in which case the assumption of IOVA as VA mode being
> > -	 * synonymous with IOVA contiguousness will not hold.
> >  	 */
> >  	need_iova_contig_obj = !(mp->flags & MEMPOOL_F_NO_IOVA_CONTIG);
> > diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
> > index f6aea7662..dd09a0a32 100644
> > --- a/lib/librte_mempool/rte_mempool_ops_default.c
> > +++ b/lib/librte_mempool/rte_mempool_ops_default.c
> > @@ -61,21 +61,44 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
> >  	return mem_size;
> >  }
> > +/* Returns -1 if object crosses a page boundary, else returns 0 */
> > +static int
> > +check_obj_bounds(char *obj, size_t pg_sz, size_t elt_sz)
> > +{
> > +	if (pg_sz == 0)
> > +		return 0;
> > +	if (elt_sz > pg_sz)
> > +		return 0;
> > +	if (RTE_PTR_ALIGN(obj, pg_sz) != RTE_PTR_ALIGN(obj + elt_sz - 1, pg_sz))
> > +		return -1;
> > +	return 0;
> > +}
> > +
> >  int
> >  rte_mempool_op_populate_default(struct rte_mempool *mp, unsigned int max_objs,
> >  		void *vaddr, rte_iova_t iova, size_t len,
> >  		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
> >  {
> > -	size_t total_elt_sz;
> > +	char *va = vaddr;
> > +	size_t total_elt_sz, pg_sz;
> >  	size_t off;
> >  	unsigned int i;
> >  	void *obj;
> > +	rte_mempool_get_page_size(mp, &pg_sz);
> > +
>
> The function may return an error which should be taken into account here.

That would be better, indeed.

> >  	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
> > -	for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
> > +	for (off = 0, i = 0; i < max_objs; i++) {
> > +		/* align offset to next page start if required */
> > +		if (check_obj_bounds(va + off, pg_sz, total_elt_sz) < 0)
> > +			off += RTE_PTR_ALIGN_CEIL(va + off, pg_sz) - (va + off);
> > +
> > +		if (off + total_elt_sz > len)
> > +			break;
> > +
> >  		off += mp->header_size;
> > -		obj = (char *)vaddr + off;
> > +		obj = va + off;
> >  		obj_cb(mp, obj_cb_arg, obj,
> >  			(iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off));
> >  		rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
>

Thanks for all the comments! I will send a v2 tomorrow.

Olivier
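PS: regarding the missing check on the rte_mempool_get_page_size() return
value, here is the kind of change I have in mind for the v2. It is only an
untested sketch of the function prologue, not the final code:

int
rte_mempool_op_populate_default(struct rte_mempool *mp, unsigned int max_objs,
		void *vaddr, rte_iova_t iova, size_t len,
		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
{
	char *va = vaddr;
	size_t total_elt_sz, pg_sz;
	size_t off;
	unsigned int i;
	void *obj;
	int ret;

	/* propagate the error instead of silently using an unset pg_sz */
	ret = rte_mempool_get_page_size(mp, &pg_sz);
	if (ret < 0)
		return ret;

	/* ... rest of the function as in the patch above ... */
}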