linux-kernel.vger.kernel.org archive mirror
From: Pavel Tatashin <pasha.tatashin@oracle.com>
To: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
	pasha.tatashin@oracle.com, m.mizuma@jp.fujitsu.com,
	akpm@linux-foundation.org, mhocko@suse.com,
	catalin.marinas@arm.com, takahiro.akashi@linaro.org,
	gi-oh.kim@profitbricks.com, heiko.carstens@de.ibm.com,
	baiyaowei@cmss.chinamobile.com, richard.weiyang@gmail.com,
	paul.burton@mips.com, miles.chen@mediatek.com, vbabka@suse.cz,
	mgorman@suse.de, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [v5 0/2] initialize pages on demand during boot
Date: Fri,  9 Mar 2018 17:08:05 -0500
Message-ID: <20180309220807.24961-1-pasha.tatashin@oracle.com>

Change log:
	v4 - v5
	- Fix an issue reported by Vlastimil Babka:
	  > I've noticed that this function first disables the
	  > on-demand initialization, and then runs the kthreads.
	  > Doesn't that leave a window where allocations can fail? The
	  > chances are probably small, but I think it would be better
	  > to avoid it completely, rare failures suck.
	  >
	  > Fixing that probably means rethinking the whole
	  > synchronization more dramatically though :/
	- Introduced a new patch that uses the node resize lock to
	  synchronize on-demand deferred page initialization with regular
	  deferred page initialization.

	v3 - v4
	- Fix !CONFIG_NUMA issue.

	v2 - v3
	Andrew Morton's comments:
	- Moved the read of pgdat->first_deferred_pfn under
	  deferred_zone_grow_lock, which removed the need for
	  READ_ONCE()/WRITE_ONCE()
	- Replaced spin_lock() with spin_lock_irqsave() in
	  deferred_grow_zone()
	- Updated comments for deferred_zone_grow_lock
	- Updated the comment before deferred_grow_zone() to explain the
	  return value and the noinline specifier.
	- Fixed the comment before _deferred_grow_zone().

	v1 - v2
	Added Tested-by: Masayoshi Mizuma
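
The synchronization introduced in v5 can be pictured with a small
user-space model. This is a hypothetical sketch, not code from the
patch: struct node_model, grow_zone_on_demand(), deferred_init_rest()
and the NO_DEFERRED_PAGES sentinel are invented for illustration, and a
pthread mutex stands in for the node resize lock. Because the full pass
initializes the remaining pages and marks the node as done inside one
critical section, an on-demand caller that takes the same lock sees
either pages still left to grow into or a fully initialized node, never
a state where growth is disabled but memory is not yet usable.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdbool.h>

/* Model-only sentinel meaning "no deferred pages remain on this node". */
#define NO_DEFERRED_PAGES ULONG_MAX

struct node_model {
	pthread_mutex_t resize_lock;      /* stand-in for the node resize lock */
	unsigned long first_deferred_pfn; /* first pfn not yet initialized */
	unsigned long end_pfn;            /* one past the node's last pfn */
};

/*
 * On-demand path: try to initialize 'nr' more pages.
 * Returns true if the caller should retry its allocation.
 */
static bool grow_zone_on_demand(struct node_model *n, unsigned long nr)
{
	bool grown = false;

	pthread_mutex_lock(&n->resize_lock);
	if (n->first_deferred_pfn == NO_DEFERRED_PAGES) {
		/* The full pass completed while we waited: all pages are ready. */
		grown = true;
	} else if (n->first_deferred_pfn < n->end_pfn) {
		unsigned long left = n->end_pfn - n->first_deferred_pfn;
		unsigned long step = nr < left ? nr : left;

		/* A real implementation would initialize struct pages for this range here. */
		n->first_deferred_pfn += step;
		grown = step > 0;
	}
	pthread_mutex_unlock(&n->resize_lock);
	return grown;
}

/*
 * Full pass: initialize everything that is left and mark the node done,
 * all while holding the same lock. Returns the number of pages handled.
 */
static unsigned long deferred_init_rest(struct node_model *n)
{
	unsigned long done = 0;

	pthread_mutex_lock(&n->resize_lock);
	if (n->first_deferred_pfn != NO_DEFERRED_PAGES) {
		assert(n->first_deferred_pfn <= n->end_pfn);
		done = n->end_pfn - n->first_deferred_pfn;
		/* A real implementation would initialize the remaining range here. */
		n->first_deferred_pfn = NO_DEFERRED_PAGES;
	}
	pthread_mutex_unlock(&n->resize_lock);
	return done;
}
```

Called on one thread, growing by 64 pages from pfn 100 of a node ending
at pfn 1000 advances first_deferred_pfn to 164; the full pass then
handles the remaining 836 pages, and any later on-demand call simply
returns true so the caller retries.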

This change is useful for three reasons:

1. Insufficient reserved memory due to arguments provided by the
user. The user may request extra buffers, larger hash tables, etc.
Currently, the machine panics during boot if it cannot allocate memory
because not enough memory was reserved. With this change, the kernel
can grow the zone before deferred pages are initialized.

One observed example is described in the linked discussion [1]. Mel
Gorman writes:

"
Yasuaki Ishimatsu reported a premature OOM when trace_buf_size=100m was
specified on a machine with many CPUs. The kernel tried to allocate 38.4GB
but only 16GB was available due to deferred memory initialisation.
"

The allocations in the above scenario happen per-CPU in smp_init(),
before deferred pages are initialized. So there is no way to predict
how much memory we should set aside to boot successfully with the
deferred page initialization feature compiled in.

2. The second reason is future-proofing. The kernel's memory
requirements may change, and we do not want to constantly update
reset_deferred_meminit() to satisfy the new requirements. In addition,
this function is currently in common code, but would potentially need
to be split into arch-specific variants as more architectures start
taking advantage of the deferred page initialization feature.

3. On-demand initialization of reserved pages guarantees that, early
in boot and using only one thread, we initialize only as many pages
as are needed; the rest are initialized efficiently in parallel.
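
The on-demand growth described in points 1 and 3 can be modeled
roughly as follows. This is a hypothetical, single-threaded user-space
sketch: struct zone_model, grow_zone(), alloc_pages_model() and
CHUNK_PAGES are invented for illustration, CHUNK_PAGES is an arbitrary
value chosen for readability rather than the kernel's actual growth
granularity, and the real allocator path is considerably more involved.
When an allocation finds too few initialized free pages, it initializes
just enough whole chunks to satisfy the request and retries instead of
failing; whatever remains in nr_deferred is left for the parallel
deferred initialization.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical growth granularity, in pages (illustrative value only). */
#define CHUNK_PAGES 512UL

struct zone_model {
	unsigned long nr_free;        /* initialized pages currently free */
	unsigned long nr_deferred;    /* pages not yet initialized */
	unsigned long nr_initialized; /* pages initialized so far */
};

/* Round n up to a whole number of chunks. */
static unsigned long round_up_chunk(unsigned long n)
{
	return (n + CHUNK_PAGES - 1) / CHUNK_PAGES * CHUNK_PAGES;
}

/* Initialize just enough deferred pages (whole chunks) to cover 'need'. */
static bool grow_zone(struct zone_model *z, unsigned long need)
{
	unsigned long want = round_up_chunk(need);
	unsigned long step = want < z->nr_deferred ? want : z->nr_deferred;

	if (step == 0)
		return false; /* nothing left to grow into */

	/* Growth is chunk-aligned except for the final tail of the zone. */
	assert(step % CHUNK_PAGES == 0 || step == z->nr_deferred);
	z->nr_deferred -= step;
	z->nr_initialized += step;
	z->nr_free += step;
	return true;
}

/* Allocate 'nr' pages; on shortage, grow the zone and retry. */
static bool alloc_pages_model(struct zone_model *z, unsigned long nr)
{
	while (z->nr_free < nr) {
		if (!grow_zone(z, nr - z->nr_free))
			return false; /* genuinely out of memory */
	}
	z->nr_free -= nr;
	return true;
}
```

For example, starting from 100 free pages and 10000 deferred pages, a
600-page request initializes exactly one 512-page chunk and leaves the
other 9488 pages for the parallel pass.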

[1] https://www.spinics.net/lists/linux-mm/msg139087.html

Pavel Tatashin (2):
  mm: disable interrupts while initializing deferred pages
  mm: initialize pages on demand during boot

 include/linux/memblock.h       |  10 --
 include/linux/memory_hotplug.h |  73 ++++++++++-----
 include/linux/mmzone.h         |   5 +-
 mm/memblock.c                  |  23 -----
 mm/page_alloc.c                | 205 +++++++++++++++++++++++++++++++----------
 5 files changed, 206 insertions(+), 110 deletions(-)

-- 
2.16.2

Thread overview: 11+ messages
2018-03-09 22:08 Pavel Tatashin [this message]
2018-03-09 22:08 ` [v5 1/2] mm: disable interrupts while initializing deferred pages Pavel Tatashin
2018-03-12 20:04   ` Andrew Morton
2018-03-13 16:04     ` Pavel Tatashin
2018-03-13 18:55       ` Andrew Morton
2018-03-13 19:45         ` Pavel Tatashin
2018-03-13 20:11           ` Andrew Morton
2018-03-13 20:43             ` Pavel Tatashin
2018-03-13 21:24               ` Andrew Morton
2018-03-14  0:59                 ` Pavel Tatashin
2018-03-09 22:08 ` [v5 2/2] mm: initialize pages on demand during boot Pavel Tatashin
