linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Alexander Duyck <alexander.duyck@gmail.com>
To: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	 Steffen Klassert <steffen.klassert@secunet.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	 Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	 David Hildenbrand <david@redhat.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, Jonathan Corbet <corbet@lwn.net>,
	 Josh Triplett <josh@joshtriplett.org>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	 Michal Hocko <mhocko@kernel.org>, Pavel Machek <pavel@ucw.cz>,
	 Pavel Tatashin <pasha.tatashin@soleen.com>,
	Peter Zijlstra <peterz@infradead.org>,
	 Randy Dunlap <rdunlap@infradead.org>,
	Shile Zhang <shile.zhang@linux.alibaba.com>,
	 Tejun Heo <tj@kernel.org>, Zi Yan <ziy@nvidia.com>,
	linux-crypto@vger.kernel.org,  linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/7] mm: move zone iterator outside of deferred_init_maxorder()
Date: Tue, 5 May 2020 08:27:52 -0700	[thread overview]
Message-ID: <CAKgT0Uegw2vFSCOcsCMATfDu0Q8NP2ZVi-2Fgm8P2RwU_B2c3A@mail.gmail.com> (raw)
In-Reply-To: <20200505005432.bohmaa6zeffhdkgn@ca-dmjordan1.us.oracle.com>

On Mon, May 4, 2020 at 5:54 PM Daniel Jordan <daniel.m.jordan@oracle.com> wrote:
>
> On Mon, May 04, 2020 at 03:10:46PM -0700, Alexander Duyck wrote:
> > So we cannot stop in the middle of a max order block. That shouldn't
> > be possible as part of the issue is that the buddy allocator will
> > attempt to access the buddy for the page which could cause issues if
> > it tries to merge the page with one that is not initialized. So if
> > your code supports that then it is definitely broken. That was one of
> > the reasons for all of the variable weirdness in
> > deferred_init_maxorder. I was going through and making certain that
> > while we were initializing the range we were freeing the pages in
> > MAX_ORDER aligned blocks and skipping over whatever reserved blocks
> > were there. Basically it was handling the case where a single
> > MAX_ORDER block could span multiple ranges.
> >
> > On x86 this was all pretty straightforward and I don't believe we
> > needed the code, but I seem to recall there were some other
> > architectures that had more complex memory layouts at the time and
> > that was one of the reasons why I had to be careful to wait until I
> > had processed the full MAX_ORDER block before I could start freeing
> > the pages, otherwise it would start triggering memory corruptions.
>
> Yes, thanks, I missed the case where deferred_grow_zone could stop
> mid-max-order-block.

As it turns out that deferred_free_range will be setting the
migratetype for the page. In a sparse config the migratetype bits are
stored in the section bitmap. So to avoid cacheline bouncing it would
make sense to section align the tasks so that they only have one
thread touching one section rather than having the pageblock_flags
getting bounced between threads. It should also reduce the overhead
for having to parallelize the work in the first place since a section
is several times larger than a MAX_ORDER page and allows for more
batching of the work.

> Maybe it's better to leave deferred_init_maxorder alone and adapt the
> multithreading to the existing implementation.  That'd mean dealing with the
> pesky opaque index somehow, so deferred_init_mem_pfn_range_in_zone() could be
> generalized to find it in the thread function based on the start/end range, or
> it could be maintained as part of the range that padata passes to the thread
> function.

You may be better off just implementing your threads to operate like
deferred_grow_zone does. All your worker thread really needs then is
to know where to start performing the page initialization and then it
could go through and process an entire section worth of pages. The
other bit that would have to be changed is patch 6 so that you combine
any ranges that might span a single section instead of just splitting
the work up based on the ranges.

If you are referring to the mo_pfn you shouldn't even need to think
about it. All it is doing is guaranteeing you are processing at least
a full max order worth of pages. Without that the logic before was
either process a whole section, or just process all of memory
initializing it before it started freeing it. I found it made things
much more efficient to process only up to MAX_ORDER at a time as you
could squeeze that into the L2 cache for most x86 processors at least
and it reduced the memory bandwidth by quite a bit. If you update the
code to only provide section aligned/sized ranges of of PFNs to
initialize then it can pretty much be ignored since all it is doing is
defining the break point for single MAX_ORDER chunks which would be
smaller than a section anyway.


  reply	other threads:[~2020-05-05 15:28 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-30 20:11 [PATCH 0/7] padata: parallelize deferred page init Daniel Jordan
2020-04-30 20:11 ` [PATCH 1/7] padata: remove exit routine Daniel Jordan
2020-04-30 20:11 ` [PATCH 2/7] padata: initialize earlier Daniel Jordan
2020-04-30 20:11 ` [PATCH 3/7] padata: allocate work structures for parallel jobs from a pool Daniel Jordan
2020-04-30 20:11 ` [PATCH 4/7] padata: add basic support for multithreaded jobs Daniel Jordan
2020-04-30 20:11 ` [PATCH 5/7] mm: move zone iterator outside of deferred_init_maxorder() Daniel Jordan
2020-04-30 21:43   ` Alexander Duyck
2020-05-01  2:45     ` Daniel Jordan
2020-05-04 22:10       ` Alexander Duyck
2020-05-05  0:54         ` Daniel Jordan
2020-05-05 15:27           ` Alexander Duyck [this message]
2020-05-06 22:39             ` Daniel Jordan
2020-05-07 15:26               ` Alexander Duyck
2020-05-07 20:20                 ` Daniel Jordan
2020-05-07 21:18                   ` Alexander Duyck
2020-05-07 22:15                     ` Daniel Jordan
2020-04-30 20:11 ` [PATCH 6/7] mm: parallelize deferred_init_memmap() Daniel Jordan
2020-05-04 22:33   ` Alexander Duyck
2020-05-04 23:38     ` Josh Triplett
2020-05-05  0:40       ` Alexander Duyck
2020-05-05  1:48         ` Daniel Jordan
2020-05-05  2:09           ` Daniel Jordan
2020-05-05 14:55             ` Alexander Duyck
2020-05-06 22:21               ` Daniel Jordan
2020-05-06 22:36                 ` Alexander Duyck
2020-05-06 22:43                   ` Daniel Jordan
2020-05-06 23:01                     ` Daniel Jordan
2020-05-05  1:26     ` Daniel Jordan
2020-04-30 20:11 ` [PATCH 7/7] padata: document multithreaded jobs Daniel Jordan
2020-04-30 21:31 ` [PATCH 0/7] padata: parallelize deferred page init Andrew Morton
2020-04-30 21:40   ` Pavel Tatashin
2020-05-01  2:40     ` Daniel Jordan
2020-05-01  0:50   ` Josh Triplett
2020-05-01  1:09 ` Josh Triplett
2020-05-01  2:48   ` Daniel Jordan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKgT0Uegw2vFSCOcsCMATfDu0Q8NP2ZVi-2Fgm8P2RwU_B2c3A@mail.gmail.com \
    --to=alexander.duyck@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=jgg@ziepe.ca \
    --cc=josh@joshtriplett.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=pavel@ucw.cz \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=shile.zhang@linux.alibaba.com \
    --cc=steffen.klassert@secunet.com \
    --cc=tj@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).