From: Chuck Lever III <chuck.lever@oracle.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-Net <netdev@vger.kernel.org>, Linux-MM <linux-mm@kvack.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 0/3 v5] Introduce a bulk order-0 page allocator
Date: Mon, 22 Mar 2021 20:32:54 +0000
Message-ID: <0E0B33DE-9413-4849-8E78-06B0CDF2D503@oracle.com>
In-Reply-To: <20210322194948.GI3697@techsingularity.net>



> On Mar 22, 2021, at 3:49 PM, Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> On Mon, Mar 22, 2021 at 06:25:03PM +0000, Chuck Lever III wrote:
>> 
>> 
>>> On Mar 22, 2021, at 5:18 AM, Mel Gorman <mgorman@techsingularity.net> wrote:
>>> 
>>> This series is based on top of Matthew Wilcox's series "Rationalise
>>> __alloc_pages wrapper" and does not apply to 5.12-rc2. If you want to
>>> test and are not using Andrew's tree as a baseline, I suggest using the
>>> following git tree
>>> 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-bulk-rebase-v5r9
>>> 
>>> The users of the API have been dropped in this version as the callers
>>> need to check whether they prefer an array or list interface (whether
>>> preference is based on convenience or performance).
>> 
>> I now have a consumer implementation that uses the array
>> API. If I understand the contract correctly, the return
>> value is the last array index that __alloc_pages_bulk()
>> visits. My consumer uses the return value to determine
>> if it needs to call the allocator again.
>> 
> 
> For either arrays or lists, the return value is the number of valid
> pages. For arrays, the pattern is expected to be
> 
> nr_pages = alloc_pages_bulk(gfp, nr_requested, page_array);
> for (i = 0; i < nr_pages; i++) {
> 	do something with page_array[i] 
> }
> 
> There *could* be valid populated elements at and after index nr_pages,
> but the implementation did not visit them. The implementation can abort
> early if the array looks like this
> 
> PPP....PPP
> 
> Where P is a populated element and . is NULL. The implementation would
> skip the first three populated elements, allocate four pages, and then
> abort when the next populated element was encountered. This is an
> implementation detail around how I handled prep_new_page. It could be
> addressed if many callers expect to pass in an array that has holes in
> the middle.
> 
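To make sure I understand: a caller that may pass in arrays with
holes would then need a loop like this (an untested sketch, using
the simplified calling convention from your example above rather
than the full __alloc_pages_bulk() signature):

    /*
     * Sketch only: each call skips leading populated elements,
     * fills the next run of holes, and returns the count of
     * elements known to be populated, so loop until the whole
     * array is populated or no forward progress is made.
     */
    int nr_populated = 0;

    while (nr_populated < nr_requested) {
            int nr = alloc_pages_bulk(gfp, nr_requested, page_array);

            if (nr <= nr_populated)
                    break;  /* no progress; back off and retry later */
            nr_populated = nr;
    }
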
>> It is returning some confusing (to me) results. I'd like
>> to get these resolved before posting any benchmark
>> results.
>> 
>> 1. When it has visited every array element, it returns the
>> same value that was passed in @nr_pages. Interpreted as an
>> index, that is the N + 1th array element, which shouldn't be
>> touched. Should the allocator return nr_pages - 1 in the
>> fully successful case? Or should the documentation describe
>> the return value as "the number of elements visited"?
>> 
> 
> I phrased it as "the known number of populated elements in the
> page_array".

The comment you added states:

+ * For lists, nr_pages is the number of pages that should be allocated.
+ *
+ * For arrays, only NULL elements are populated with pages and nr_pages
+ * is the maximum number of pages that will be stored in the array.
+ *
+ * Returns the number of pages added to the page_list or the index of the
+ * last known populated element of page_array.


> I did not want to write it as "the number of valid elements
> in the array" because that is not necessarily the case if an array is
> passed in with holes in the middle. I'm open to any suggestions on how
> the __alloc_pages_bulk description can be improved.

The comment states that, for the array case, a /count/ of
pages is passed in, and an /index/ is returned. If you want
to return the same type for lists and arrays, it should be
documented as a count in both cases, to match @nr_pages.
Consumers will want to compare @nr_pages with the return
value to see whether they need to call again.

Comparing a count to an index is a notorious source of
off-by-one errors.
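For example (another untested sketch, with the simplified
convention from above), a consumer treating the return value as a
count would naturally write:

    nr = alloc_pages_bulk(gfp, nr_requested, page_array);
    if (nr < nr_requested)          /* correct only if nr is a count */
            wait_and_retry();       /* hypothetical back-off helper */

    /*
     * If nr is instead the last visited index, a fully successful
     * call returns nr_requested - 1 and this test retries
     * spuriously every time.
     */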


> The definition of the return value as-is makes sense for either a list
> or an array. Returning "nr_pages - 1" suits an array because it's the
> last valid index but it makes less sense when returning a list.
> 
>> 2. Frequently the allocator returns a number smaller than
>> the total number of elements. As you may recall, sunrpc
>> will delay a bit (via a call to schedule_timeout) then call
>> again. This is supposed to be a rare event, and the delay
>> is substantial. But with the array-based API, a not-fully-
>> successful allocator call seems to happen more than half
>> the time. Is that expected? I'm calling with GFP_KERNEL;
>> it seems like the allocator should be trying harder.
>> 
> 
> It's not expected that the array implementation would be worse *unless*
> you are passing in arrays with holes in the middle. Otherwise, the success
> rate should be similar.

Essentially, sunrpc will always pass an array with a hole.
Each RPC consumes the first N elements in the rq_pages array.
Sometimes N == ARRAY_SIZE(rq_pages). AFAIK sunrpc will not
pass in an array with more than one hole. Typically:

.....PPPP

My results show that, because svc_alloc_arg() ends up calling
__alloc_pages_bulk() twice in this case, the array API is, on
average, twice as expensive as the list case for the same
workload.
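
Roughly this pattern (a simplified sketch of the cost, not the
actual svc_alloc_arg() code):

    /*
     * Sketch: with a ".....PPPP" array, the first bulk call fills
     * the leading holes, aborts at the first populated element,
     * and returns less than the requested count, so sunrpc calls
     * a second time even though the array is already full.
     */
    int pages = ARRAY_SIZE(rqstp->rq_pages);
    int filled = alloc_pages_bulk(GFP_KERNEL, pages, rqstp->rq_pages);

    if (filled < pages)
            filled = alloc_pages_bulk(GFP_KERNEL, pages, rqstp->rq_pages);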


>> 3. Is the current design intended such that, if the consumer
>> does call again, it should pass in the array address + the
>> returned index (and @nr_pages reduced by the returned
>> index)?
>> 
> 
> The caller does not have to pass in array address + returned index but
> it's more efficient if it does.
> 
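Something like this, presumably (an untested sketch, simplified
convention as above, assuming no holes beyond the returned index):

    /*
     * Sketch: resume a partial fill by advancing the base pointer
     * and shrinking the request, so the allocator does not rescan
     * the elements already known to be populated.
     */
    int nr = alloc_pages_bulk(gfp, nr_requested, page_array);

    while (nr < nr_requested) {
            int more = alloc_pages_bulk(gfp, nr_requested - nr,
                                        page_array + nr);
            if (!more)
                    break;  /* allocation failed; back off and retry */
            nr += more;
    }
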
> If you are passing in arrays with holes in the middle then the following
> might work (not tested)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c83d38dfe936..4dc38516a5bd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5002,6 +5002,7 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> 	gfp_t alloc_gfp;
> 	unsigned int alloc_flags;
> 	int nr_populated = 0, prep_index = 0;
> +	bool hole = false;
> 
> 	if (WARN_ON_ONCE(nr_pages <= 0))
> 		return 0;
> @@ -5057,6 +5058,7 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> 	if (!zone)
> 		goto failed;
> 
> +retry_hole:
> 	/* Attempt the batch allocation */
> 	local_irq_save(flags);
> 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
> @@ -5069,6 +5071,7 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> 		 * IRQs are enabled.
> 		 */
> 		if (page_array && page_array[nr_populated]) {
> +			hole = true;
> 			nr_populated++;
> 			break;
> 		}
> @@ -5109,6 +5112,9 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> 			prep_new_page(page_array[prep_index++], 0, gfp, 0);
> 	}
> 
> +	if (hole && nr_populated < nr_pages)
> +		goto retry_hole;
> +
> 	return nr_populated;
> 
> failed_irq:
> 
> -- 
> Mel Gorman
> SUSE Labs

If local_irq_save() is done more than once in this case, I don't
expect the result to be much better.

To make the array API as performant as the list API, the sunrpc
consumer will have to check, upon return, whether the N + 1th
element is populated, rather than comparing the return value
against @nr_pages.
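
That is, something like this sketch (simplified convention again):

    /*
     * Sketch: treat the call as successful when the final array
     * element is populated, instead of comparing the return value
     * with the requested count.
     */
    filled = alloc_pages_bulk(GFP_KERNEL, pages, rqstp->rq_pages);
    if (!rqstp->rq_pages[pages - 1]) {
            /* genuinely short; wait and retry */
    }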

--
Chuck Lever



