From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: Daniel Jordan <daniel.m.jordan@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Herbert Xu <herbert@gondor.apana.org.au>,
Steffen Klassert <steffen.klassert@secunet.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
Dan Williams <dan.j.williams@intel.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Hildenbrand <david@redhat.com>,
Jason Gunthorpe <jgg@ziepe.ca>, Jonathan Corbet <corbet@lwn.net>,
Josh Triplett <josh@joshtriplett.org>,
Kirill Tkhai <ktkhai@virtuozzo.com>,
Michal Hocko <mhocko@kernel.org>, Pavel Machek <pavel@ucw.cz>,
Pavel Tatashin <pasha.tatashin@soleen.com>,
Peter Zijlstra <peterz@infradead.org>,
Randy Dunlap <rdunlap@infradead.org>,
Shile Zhang <shile.zhang@linux.alibaba.com>,
Tejun Heo <tj@kernel.org>, Zi Yan <ziy@nvidia.com>,
linux-crypto@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/7] mm: move zone iterator outside of deferred_init_maxorder()
Date: Thu, 30 Apr 2020 14:43:28 -0700
Message-ID: <deadac9a-fbef-6c66-207c-83d251d2ef50@linux.intel.com>
In-Reply-To: <20200430201125.532129-6-daniel.m.jordan@oracle.com>
On 4/30/2020 1:11 PM, Daniel Jordan wrote:
> padata will soon divide up pfn ranges between threads when parallelizing
> deferred init, and deferred_init_maxorder() complicates that by using an
> opaque index in addition to start and end pfns. Move the index outside
> the function to make splitting the job easier, and simplify the code
> while at it.
>
> deferred_init_maxorder() now always iterates within a single pfn range
> instead of potentially multiple ranges, and advances start_pfn to the
> end of that range instead of the max-order block so partial pfn ranges
> in the block aren't skipped in a later iteration. The section alignment
> check in deferred_grow_zone() is removed as well since this alignment is
> no longer guaranteed. It's not clear what value the alignment provided
> originally.
>
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
So part of the reason for splitting the work along section-aligned
boundaries was that deferred_grow_zone already has existing functionality
that goes in, pulls out a section-aligned chunk, and processes it in
order to prepare enough memory for other threads to keep running. I
suspect the section alignment was chosen because, as I understand it,
that is also the alignment used for memory onlining.
With this change already breaking things up over multiple threads, how
does this interact with deferred_grow_zone? Which thread is it trying to
allocate from if it needs to allocate some memory for itself?
Also, what is to stop deferred_grow_zone from bailing out in the middle
of a max-order page block if there is a hole in the middle of the block?
> ---
> mm/page_alloc.c | 88 +++++++++++++++----------------------------------
> 1 file changed, 27 insertions(+), 61 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 68669d3a5a665..990514d8f0d94 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1708,55 +1708,23 @@ deferred_init_mem_pfn_range_in_zone(u64 *i, struct zone *zone,
> }
>
> /*
> - * Initialize and free pages. We do it in two loops: first we initialize
> - * struct page, then free to buddy allocator, because while we are
> - * freeing pages we can access pages that are ahead (computing buddy
> - * page in __free_one_page()).
> - *
> - * In order to try and keep some memory in the cache we have the loop
> - * broken along max page order boundaries. This way we will not cause
> - * any issues with the buddy page computation.
> + * Initialize the struct pages and then free them to the buddy allocator at
> + * most a max order block at a time because while we are freeing pages we can
> + * access pages that are ahead (computing buddy page in __free_one_page()).
> + * It's also cache friendly.
> */
> static unsigned long __init
> -deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn,
> - unsigned long *end_pfn)
> +deferred_init_maxorder(struct zone *zone, unsigned long *start_pfn,
> + unsigned long end_pfn)
> {
> - unsigned long mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES);
> - unsigned long spfn = *start_pfn, epfn = *end_pfn;
> - unsigned long nr_pages = 0;
> - u64 j = *i;
> -
> - /* First we loop through and initialize the page values */
> - for_each_free_mem_pfn_range_in_zone_from(j, zone, start_pfn, end_pfn) {
> - unsigned long t;
> -
> - if (mo_pfn <= *start_pfn)
> - break;
> -
> - t = min(mo_pfn, *end_pfn);
> - nr_pages += deferred_init_pages(zone, *start_pfn, t);
> -
> - if (mo_pfn < *end_pfn) {
> - *start_pfn = mo_pfn;
> - break;
> - }
> - }
> -
> - /* Reset values and now loop through freeing pages as needed */
> - swap(j, *i);
> -
> - for_each_free_mem_pfn_range_in_zone_from(j, zone, &spfn, &epfn) {
> - unsigned long t;
> -
> - if (mo_pfn <= spfn)
> - break;
> + unsigned long nr_pages, pfn;
>
> - t = min(mo_pfn, epfn);
> - deferred_free_pages(spfn, t);
> + pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES);
> + pfn = min(pfn, end_pfn);
>
> - if (mo_pfn <= epfn)
> - break;
> - }
> + nr_pages = deferred_init_pages(zone, *start_pfn, pfn);
> + deferred_free_pages(*start_pfn, pfn);
> + *start_pfn = pfn;
>
> return nr_pages;
> }
> @@ -1814,9 +1782,11 @@ static int __init deferred_init_memmap(void *data)
> * that we can avoid introducing any issues with the buddy
> * allocator.
> */
> - while (spfn < epfn) {
> - nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> - cond_resched();
> + for_each_free_mem_pfn_range_in_zone_from(i, zone, &spfn, &epfn) {
> + while (spfn < epfn) {
> + nr_pages += deferred_init_maxorder(zone, &spfn, epfn);
> + cond_resched();
> + }
> }
> zone_empty:
> /* Sanity check that the next zone really is unpopulated */
> @@ -1883,22 +1853,18 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
> * that we can avoid introducing any issues with the buddy
> * allocator.
> */
> - while (spfn < epfn) {
> - /* update our first deferred PFN for this section */
> - first_deferred_pfn = spfn;
> -
> - nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> - touch_nmi_watchdog();
> -
> - /* We should only stop along section boundaries */
> - if ((first_deferred_pfn ^ spfn) < PAGES_PER_SECTION)
> - continue;
> -
> - /* If our quota has been met we can stop here */
> - if (nr_pages >= nr_pages_needed)
> - break;
> + for_each_free_mem_pfn_range_in_zone_from(i, zone, &spfn, &epfn) {
> + while (spfn < epfn) {
> + nr_pages += deferred_init_maxorder(zone, &spfn, epfn);
> + touch_nmi_watchdog();
> +
> + /* If our quota has been met we can stop here */
> + if (nr_pages >= nr_pages_needed)
> + goto out;
> + }
> }
>
> +out:
> pgdat->first_deferred_pfn = spfn;
> pgdat_resize_unlock(pgdat, &flags);
>
>