From: Linus Torvalds <torvalds@linux-foundation.org>
To: Mike Travis <travis@sgi.com>
Cc: Nathan Zimmer <nzimmer@sgi.com>, Peter Anvin <hpa@zytor.com>,
	Ingo Molnar <mingo@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, Robin Holt <holt@sgi.com>,
	Rob Landley <rob@landley.net>,
	Daniel J Blueman <daniel@numascale-asia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Yinghai Lu <yinghai@kernel.org>, Mel Gorman <mgorman@suse.de>
Subject: Re: [RFC v3 0/5] Transparent on-demand struct page initialization embedded in the buddy allocator
Date: Tue, 13 Aug 2013 10:51:37 -0700
Message-ID: <CA+55aFwRHdQ_f6ryUU1yWkW1Qz8cG958jLZuyhd_YdOq4-rfRA@mail.gmail.com>
In-Reply-To: <520A6DFC.1070201@sgi.com>

On Tue, Aug 13, 2013 at 10:33 AM, Mike Travis <travis@sgi.com> wrote:
>
> Initially this patch set consisted of diverting a major portion of the
> memory to an "absent" list during e820 processing.  A very late initcall
> was then used to dispatch a cpu per node to add that node's absent
> memory.  By nature these ran in parallel so Nathan did the work to
> "parallelize" various global resource locks to become per node locks.

So quite frankly, I'm not sure how worthwhile it even is to
parallelize the thing. I realize that some environments may care about
getting up to full memory population very quickly, but I think it would
be very rare and specialized, and shouldn't necessarily be part of the
initial patches.

And it really doesn't have to be an initcall at all - at least not a
synchronous one. A late initcall to get the process *started*, but the
process itself could easily be done with a separate thread
asynchronously, and let the machine boot up while that thread is
going.

And in fact, I'd argue that instead of trying to make it fast and
parallelize things excessively, you might want to make the memory
initialization *slow*, and make all the rest of the bootup have higher
priority.

At that point, who cares if it takes 400 seconds to get all memory
initialized? In fact, who cares if it takes twice that? Let's assume
that the rest of the boot takes 30s (which is pretty aggressive for
some big server with terabytes of memory). Even if the memory
initialization was running in the background and only during idle time
for probing, I'm sure you'd have a few hundred gigs of RAM initialized
by the time you can log in. And if it then takes another ten minutes
until you have the full 16TB initialized, and some things might be a
tad slower early on, does anybody really care?  The machine will be up
and running with plenty of memory, even if it may not be *all* the
memory yet.
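
A rough back-of-the-envelope check of the numbers above, as a minimal sketch. The 16TB, 400 second and 30 second figures come from the message; the `duty_cycle` parameter (the fraction of boot time the background initializer actually gets to run) and its 0.25 value are purely illustrative assumptions, not measurements.

```python
def init_rate_gb_per_s(total_gb: float, full_init_s: float) -> float:
    """Average initialization rate if all memory takes full_init_s seconds."""
    return total_gb / full_init_s


def initialized_gb(total_gb: float, full_init_s: float,
                   elapsed_s: float, duty_cycle: float = 1.0) -> float:
    """Memory initialized after elapsed_s seconds.

    duty_cycle is a hypothetical knob: the share of wall-clock time the
    low-priority initializer runs (1.0 means it is never preempted).
    """
    rate = init_rate_gb_per_s(total_gb, full_init_s)
    return min(total_gb, rate * elapsed_s * duty_cycle)


TOTAL_GB = 16 * 1024   # 16TB machine, from the message
FULL_INIT_S = 400      # time to initialize everything at full speed
BOOT_S = 30            # rest of the boot

# Assumed: the initializer only gets a quarter of the boot window.
ready_at_login_gb = initialized_gb(TOTAL_GB, FULL_INIT_S, BOOT_S, duty_cycle=0.25)
```

Under these assumptions the initializer averages about 41GB/s at full speed, so even at a quarter of that rate roughly 300GB would be ready after a 30 second boot, which is in line with the "few hundred gigs" estimate.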

I realize that benchmarking cares, and yes, I also realize that some
benchmarks actually want to reboot the machine between some runs just
to get repeatability, but if you're benchmarking a 16TB machine I'm
guessing any serious benchmark that actually uses that much memory is
going to take many hours to a few days to run anyway? Having some way
to wait until the memory is all done (which might even be just a silly
shell script that does "ps" and waits for the kernel threads to all go
away) isn't going to kill the benchmark - and the benchmark itself
will then not have to worry about hitting the "oops, I need to
initialize 2GB of RAM now because I hit an uninitialized page".
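
The "wait until the memory is all done" helper could be sketched roughly as follows. This is an illustration only: the thread name prefix `meminit` is hypothetical (no such kernel thread name is given in this thread), and the script simply polls `ps -e -o comm=` until no matching name remains.

```python
import subprocess
import time
from typing import Callable, List


def list_command_names() -> List[str]:
    """Return the command name of every process, kernel threads included."""
    out = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def wait_for_threads_to_exit(
    prefix: str,
    list_names: Callable[[], List[str]] = list_command_names,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until no running task name starts with prefix.

    Returns the number of polls that still saw a matching task.
    list_names and sleep are injectable so the loop can be tested
    without a real ps or real delays.
    """
    polls = 0
    while any(name.startswith(prefix) for name in list_names()):
        polls += 1
        sleep(poll_s)
    return polls
```

A benchmark harness would call `wait_for_threads_to_exit("meminit")` once after boot, substituting whatever name the initialization threads actually carry, and only then start the timed run.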

Ok, so I don't know all the issues, and in many ways I don't even
really care. You could do it other ways, I don't think this is a big
deal. The part I hate is the runtime hook into the core MM page
allocation code, so I'm just throwing out any random thing that comes
to my mind that could be used to avoid that part.

                    Linus
