linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexander Duyck <alexander.duyck@gmail.com>
To: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Steffen Klassert <steffen.klassert@secunet.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, Jonathan Corbet <corbet@lwn.net>,
	Josh Triplett <josh@joshtriplett.org>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	Michal Hocko <mhocko@kernel.org>, Pavel Machek <pavel@ucw.cz>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Shile Zhang <shile.zhang@linux.alibaba.com>,
	Tejun Heo <tj@kernel.org>, Zi Yan <ziy@nvidia.com>,
	linux-crypto@vger.kernel.org, linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/7] mm: move zone iterator outside of deferred_init_maxorder()
Date: Mon, 4 May 2020 15:10:46 -0700	[thread overview]
Message-ID: <CAKgT0Uctro3+PWeJTi=O3Yc2qUF8Oy+HrypzCUzkaCt=XH0Lkg@mail.gmail.com> (raw)
In-Reply-To: <20200501024539.tnjuybydwe3r4u2x@ca-dmjordan1.us.oracle.com>

On Thu, Apr 30, 2020 at 7:45 PM Daniel Jordan
<daniel.m.jordan@oracle.com> wrote:
>
> Hi Alex,
>
> On Thu, Apr 30, 2020 at 02:43:28PM -0700, Alexander Duyck wrote:
> > On 4/30/2020 1:11 PM, Daniel Jordan wrote:
> > > padata will soon divide up pfn ranges between threads when parallelizing
> > > deferred init, and deferred_init_maxorder() complicates that by using an
> > > opaque index in addition to start and end pfns.  Move the index outside
> > > the function to make splitting the job easier, and simplify the code
> > > while at it.
> > >
> > > deferred_init_maxorder() now always iterates within a single pfn range
> > > instead of potentially multiple ranges, and advances start_pfn to the
> > > end of that range instead of the max-order block so partial pfn ranges
> > > in the block aren't skipped in a later iteration.  The section alignment
> > > check in deferred_grow_zone() is removed as well since this alignment is
> > > no longer guaranteed.  It's not clear what value the alignment provided
> > > originally.
> > >
> > > Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> >
> > So part of the reason for splitting it up along section aligned boundaries
> > was because we already had an existing functionality in deferred_grow_zone
> > that was going in and pulling out a section aligned chunk and processing it
> > to prepare enough memory for other threads to keep running. I suspect that
> > the section alignment was done because normally I believe that is also the
> > alignment for memory onlining.
>
> I think Pavel added that functionality, maybe he could confirm.
>
> My impression was that the reason deferred_grow_zone aligned the requested
> order up to a section was to make enough memory available to avoid being called
> on every allocation.
>
> > With this already breaking things up over multiple threads how does this
> > work with deferred_grow_zone? Which thread is it trying to allocate from if
> > it needs to allocate some memory for itself?
>
> I may not be following your question, but deferred_grow_zone doesn't allocate
> memory during the multithreading in deferred_init_memmap because the latter
> sets first_deferred_pfn so that deferred_grow_zone bails early.

It has been a while since I looked at this code so I forgot that
deferred_grow_zone is essentially blocked out once we start the
per-node init.

> > Also what is to prevent a worker from stop deferred_grow_zone from bailing
> > out in the middle of a max order page block if there is a hole in the middle
> > of the block?
>
> deferred_grow_zone remains singlethreaded.  It could stop in the middle of a
> max order block, but it can't run concurrently with deferred_init_memmap, as
> per above, so if deferred_init_memmap were to init 'n free the remaining part
> of the block, the previous portion would have already been initialized.

So we cannot stop in the middle of a max order block. That shouldn't
be possible as part of the issue is that the buddy allocator will
attempt to access the buddy for the page which could cause issues if
it tries to merge the page with one that is not initialized. So if
your code supports that then it is definitely broken. That was one of
the reasons for all of the variable weirdness in
deferred_init_maxorder. I was going through and making certain that
while we were initializing the range we were freeing the pages in
MAX_ORDER aligned blocks and skipping over whatever reserved blocks
were there. Basically it was handling the case where a single
MAX_ORDER block could span multiple ranges.

On x86 this was all pretty straightforward and I don't believe we
needed the code, but I seem to recall there were some other
architectures that had more complex memory layouts at the time and
that was one of the reasons why I had to be careful to wait until I
had processed the full MAX_ORDER block before I could start freeing
the pages, otherwise it would start triggering memory corruptions.

  reply	other threads:[~2020-05-04 22:11 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-30 20:11 [PATCH 0/7] padata: parallelize deferred page init Daniel Jordan
2020-04-30 20:11 ` [PATCH 1/7] padata: remove exit routine Daniel Jordan
2020-04-30 20:11 ` [PATCH 2/7] padata: initialize earlier Daniel Jordan
2020-04-30 20:11 ` [PATCH 3/7] padata: allocate work structures for parallel jobs from a pool Daniel Jordan
2020-04-30 20:11 ` [PATCH 4/7] padata: add basic support for multithreaded jobs Daniel Jordan
2020-04-30 20:11 ` [PATCH 5/7] mm: move zone iterator outside of deferred_init_maxorder() Daniel Jordan
2020-04-30 21:43   ` Alexander Duyck
2020-05-01  2:45     ` Daniel Jordan
2020-05-04 22:10       ` Alexander Duyck [this message]
2020-05-05  0:54         ` Daniel Jordan
2020-05-05 15:27           ` Alexander Duyck
2020-05-06 22:39             ` Daniel Jordan
2020-05-07 15:26               ` Alexander Duyck
2020-05-07 20:20                 ` Daniel Jordan
2020-05-07 21:18                   ` Alexander Duyck
2020-05-07 22:15                     ` Daniel Jordan
2020-04-30 20:11 ` [PATCH 6/7] mm: parallelize deferred_init_memmap() Daniel Jordan
2020-05-04 22:33   ` Alexander Duyck
2020-05-04 23:38     ` Josh Triplett
2020-05-05  0:40       ` Alexander Duyck
2020-05-05  1:48         ` Daniel Jordan
2020-05-05  2:09           ` Daniel Jordan
2020-05-05 14:55             ` Alexander Duyck
2020-05-06 22:21               ` Daniel Jordan
2020-05-06 22:36                 ` Alexander Duyck
2020-05-06 22:43                   ` Daniel Jordan
2020-05-06 23:01                     ` Daniel Jordan
2020-05-05  1:26     ` Daniel Jordan
2020-04-30 20:11 ` [PATCH 7/7] padata: document multithreaded jobs Daniel Jordan
2020-04-30 21:31 ` [PATCH 0/7] padata: parallelize deferred page init Andrew Morton
2020-04-30 21:40   ` Pavel Tatashin
2020-05-01  2:40     ` Daniel Jordan
2020-05-01  0:50   ` Josh Triplett
2020-05-01  1:09 ` Josh Triplett
2020-05-01  2:48   ` Daniel Jordan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKgT0Uctro3+PWeJTi=O3Yc2qUF8Oy+HrypzCUzkaCt=XH0Lkg@mail.gmail.com' \
    --to=alexander.duyck@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=jgg@ziepe.ca \
    --cc=josh@joshtriplett.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=pavel@ucw.cz \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=shile.zhang@linux.alibaba.com \
    --cc=steffen.klassert@secunet.com \
    --cc=tj@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).