Date: Tue, 23 Mar 2021 12:08:51 +0100
From: Jesper Dangaard Brouer
To: Mel Gorman
Cc: Chuck Lever III, Andrew Morton, Vlastimil Babka, Christoph Hellwig, Alexander Duyck, Matthew Wilcox, LKML, Linux-Net, Linux-MM, Linux NFS Mailing List, brouer@redhat.com
Subject: Re: [PATCH 0/3 v5] Introduce a bulk order-0 page allocator
Message-ID: <20210323120851.18d430cf@carbon>
In-Reply-To: <20210322205827.GJ3697@techsingularity.net>
References: <20210322091845.16437-1-mgorman@techsingularity.net> <20210322194948.GI3697@techsingularity.net> <0E0B33DE-9413-4849-8E78-06B0CDF2D503@oracle.com> <20210322205827.GJ3697@techsingularity.net>

On Mon, 22 Mar 2021 20:58:27 +0000
Mel Gorman wrote:

> On Mon, Mar 22, 2021 at 08:32:54PM +0000, Chuck Lever III wrote:
> > >> It is returning some confusing (to me) results. I'd like
> > >> to get these resolved before posting any benchmark
> > >> results.
> > >>
> > >> 1. When it has visited every array element, it returns the
> > >> same value as was passed in @nr_pages. That's the N + 1th
> > >> array element, which shouldn't be touched. Should the
> > >> allocator return nr_pages - 1 in the fully successful case?
> > >> Or should the documentation describe the return value as
> > >> "the number of elements visited" ?
> > >>
> > >
> > > I phrased it as "the known number of populated elements in the
> > > page_array".
> >
> > The comment you added states:
> >
> > + * For lists, nr_pages is the number of pages that should be allocated.
> > + *
> > + * For arrays, only NULL elements are populated with pages and nr_pages
> > + * is the maximum number of pages that will be stored in the array.
> > + *
> > + * Returns the number of pages added to the page_list or the index of the
> > + * last known populated element of page_array.
> >
> >
> > > I did not want to write it as "the number of valid elements
> > > in the array" because that is not necessarily the case if an array is
> > > passed in with holes in the middle. I'm open to any suggestions on how
> > > the __alloc_pages_bulk description can be improved.
> >
> > The comments states that, for the array case, a /count/ of
> > pages is passed in, and an /index/ is returned. If you want
> > to return the same type for lists and arrays, it should be
> > documented as a count in both cases, to match @nr_pages.
> > Consumers will want to compare @nr_pages with the return
> > value to see if they need to call again.
> >
>
> Then I'll just say it's the known count of pages in the array. That
> might still be less than the number of requested pages if holes are
> encountered.
>
> > > The definition of the return value as-is makes sense for either a list
> > > or an array. Returning "nr_pages - 1" suits an array because it's the
> > > last valid index but it makes less sense when returning a list.
> > >
> > >> 2. Frequently the allocator returns a number smaller than
> > >> the total number of elements. As you may recall, sunrpc
> > >> will delay a bit (via a call to schedule_timeout) then call
> > >> again. This is supposed to be a rare event, and the delay
> > >> is substantial. But with the array-based API, a not-fully-
> > >> successful allocator call seems to happen more than half
> > >> the time. Is that expected? I'm calling with GFP_KERNEL,
> > >> seems like the allocator should be trying harder.
> > >>
> > >
> > > It's not expected that the array implementation would be worse *unless*
> > > you are passing in arrays with holes in the middle. Otherwise, the success
> > > rate should be similar.
> >
> > Essentially, sunrpc will always pass an array with a hole.
> > Each RPC consumes the first N elements in the rq_pages array.
> > Sometimes N == ARRAY_SIZE(rq_pages). AFAIK sunrpc will not
> > pass in an array with more than one hole. Typically:
> >
> > .....PPPP
> >
> > My results show that, because svc_alloc_arg() ends up calling
> > __alloc_pages_bulk() twice in this case, it ends up being
> > twice as expensive as the list case, on average, for the same
> > workload.
> >
>
> Ok, so in this case the caller knows that holes are always at the
> start. If the API returns an index that is a valid index and populated,
> it can check the next index and if it is valid then the whole array
> must be populated.
>
> Right now, the implementation checks for populated elements at the *start*
> because it is required for calling prep_new_page starting at the correct
> index and the API cannot make assumptions about the location of the hole.
>
> The patch below would check the rest of the array but note that it's
> slower for the API to do this check because it has to check every element
> while the sunrpc user could check one element. Let me know if a) this
> hunk helps and b) is desired behaviour.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c83d38dfe936..4bf20650e5f5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5107,6 +5107,9 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  	} else {
>  		while (prep_index < nr_populated)
>  			prep_new_page(page_array[prep_index++], 0, gfp, 0);
> +
> +		while (nr_populated < nr_pages && page_array[nr_populated])
> +			nr_populated++;
>  	}
>
>  	return nr_populated;

I do know that I suggested moving prep_new_page() out of the
IRQ-disabled loop, but maybe that was a bad idea, for several reasons.

All prep_new_page() does is write into struct page, unless some
debugging feature (like KASAN) is enabled. That cache-line is hot
because the LRU-list update has just written to it. As the bulk size
goes up, as Matthew pointed out, the cache-line might be pushed out to
L2-cache and then have to be fetched again when prep_new_page() is
called.

Another observation is that moving prep_new_page() into the loop
reduced the function size by 253 bytes (which affects I-cache usage):

./scripts/bloat-o-meter mm/page_alloc.o-prep_new_page-outside mm/page_alloc.o-prep_new_page-inside
add/remove: 18/18 grow/shrink: 0/1 up/down: 144/-397 (-253)
Function                                     old     new   delta
__alloc_pages_bulk                          1965    1712    -253
Total: Before=60799, After=60546, chg -0.42%

Maybe it is better to keep prep_new_page() inside the loop. This also
allows the list and array variants to share the call, and it should
simplify the array variant code.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[PATCH] mm: move prep_new_page inside IRQ disabled loop

From: Jesper Dangaard Brouer

./scripts/bloat-o-meter mm/page_alloc.o-prep_new_page-outside mm/page_alloc.o-prep_new_page-inside
add/remove: 18/18 grow/shrink: 0/1 up/down: 144/-397 (-253)
Function                                     old     new   delta
__alloc_pages_bulk                          1965    1712    -253
Total: Before=60799, After=60546, chg -0.42%

Signed-off-by: Jesper Dangaard Brouer
---
 mm/page_alloc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 88a5c1ce5b87..b4ff09b320bc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5096,11 +5096,13 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 		else
 			page_array[nr_populated] = page;
 		nr_populated++;
+		prep_new_page(page, 0, gfp, 0);
 	}
 
 	local_irq_restore(flags);
 
 	/* Prep pages with IRQs enabled. */
+/*
 	if (page_list) {
 		list_for_each_entry(page, page_list, lru)
 			prep_new_page(page, 0, gfp, 0);
@@ -5108,7 +5110,7 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 		while (prep_index < nr_populated)
 			prep_new_page(page_array[prep_index++], 0, gfp, 0);
 	}
-
+*/
 	return nr_populated;
 
 failed_irq:
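
To make the return-value discussion concrete, here is a minimal
userspace sketch in C. It is not the kernel code: struct fake_page,
fake_prep(), bulk_fill_model(), demo_retry_calls() and the per-call
"budget" parameter are all made up for illustration, and calloc()
stands in for taking a page off the per-cpu list. It only models three
points from the thread: skipping populated elements at the start,
prepping each page inside the allocation loop, and the tail scan from
Mel's hunk, so that the return value is a count the caller can compare
against nr_pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct page. */
struct fake_page {
	int prepped;
};

/* Hypothetical stand-in for prep_new_page(): it only writes to the page. */
static void fake_prep(struct fake_page *page)
{
	page->prepped = 1;
}

/*
 * Model of an array-based bulk fill. "budget" is an assumption that
 * simulates the allocator running dry part-way through a call.
 * Returns the count of populated leading elements of page_array.
 */
static int bulk_fill_model(struct fake_page **page_array, int nr_pages,
			   int budget)
{
	int nr_populated = 0;

	/* Skip elements the caller already populated at the start. */
	while (nr_populated < nr_pages && page_array[nr_populated])
		nr_populated++;

	/* Fill consecutive empty slots, prepping inside the loop. */
	while (nr_populated < nr_pages && !page_array[nr_populated] &&
	       budget > 0) {
		struct fake_page *page = calloc(1, sizeof(*page));

		if (!page)
			break;
		fake_prep(page);
		page_array[nr_populated++] = page;
		budget--;
	}

	/* Tail scan, as in the quoted hunk: count populated elements after the hole. */
	while (nr_populated < nr_pages && page_array[nr_populated])
		nr_populated++;

	return nr_populated;
}

/* Caller-side retry loop for the ".....PPPP" layout; returns calls made. */
static int demo_retry_calls(void)
{
	struct fake_page *slots[9] = { NULL };
	struct fake_page existing[4] = { {1}, {1}, {1}, {1} };
	int calls = 0;
	int i;

	for (i = 0; i < 4; i++)
		slots[5 + i] = &existing[i];

	/* Retry while the returned count is short; cap the attempts. */
	do {
		calls++;
	} while (bulk_fill_model(slots, 9, 3) < 9 && calls < 16);

	for (i = 0; i < 5; i++) {
		if (slots[i]) {
			assert(slots[i]->prepped);
			free(slots[i]);
		}
	}
	return calls;
}
```

With the ".....PPPP" layout and a budget of 3 pages per call, the first
call returns 3 and the second returns 9, so the demo needs two calls.
Without the tail scan, the second call in this model would return 5
even though the array is full, and the caller would call again for
nothing.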