From: "Luck, Tony" <tony.luck@intel.com>
To: "Martin J. Bligh" <mbligh@aracnet.com>
Cc: <linux-kernel@vger.kernel.org>
Subject: Re: memory hotremove prototype, take 3
Date: Tue, 9 Dec 2003 16:45:10 -0800
Message-ID: <B8E391BBE9FE384DAA4C5C003888BE6F4FAF4A@scsmsx401.sc.intel.com>
> If your target is NUMA, then you really, really need CONFIG_NONLINEAR.
> We don't support multiple pgdats per node, nor do I wish to, as it'll
> make an unholy mess ;-). With CONFIG_NONLINEAR, the discontiguities
> within a node are buried down further, so we have much less complexity
> to deal with from the main VM. The abstraction also keeps the poor
> VM engineers trying to read / write the code saner via simplicity ;-)
>
> WRT generic discontigmem support (not NUMA), doing that via pgdats
> should really go away, as there's no real difference between the
> chunks of physical memory as far as the page allocator is concerned.
> The plan is to use Daniel's nonlinear stuff to replace that, and keep
> the pgdats strictly for NUMA. Same would apply to hotpluggable zones -
> I'd hate to end up with 512 pgdats of stuff that are really all the
> same memory types underneath.
I guess this all depends on whether you allow bits of memory on
nodes to be hot-plugged ... or insist on the whole node being
added/removed in one fell swoop. I'd expect the latter to be
a more common model, and in that case the "pgdat-for-the-node" is
the same as the "pgdat-for-the-hot-plug-zone", so you don't have
a proliferation of pgdats to support hotplug.
> The real issue you have is the mapping of the struct pages - if we can
> achieve a non-contig mapping of the mem_map / lmem_map array, we should
> be able to take memory on and offline reasonably easily. If you're willing
> for a first implementation to pre-allocate the struct page array for
> every possible virtual address, it makes life a lot easier.
On 64-bit systems with CONFIG_VIRTUAL_MEMMAP, this would be trivial,
and avoids the need for the extra level of indirection in the psection[]
and vsection[] arrays in CONFIG_NONLINEAR (ok ... it doesn't really get
rid of the indirection, as the page table lookup to access the virtual
mem_map effectively ends up doing the same thing).
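
To make the "extra level of indirection" concrete, here is a minimal
sketch of a table-based pfn-to-page lookup. It is not the
CONFIG_NONLINEAR code: every name below (toy_page, toy_section_map,
toy_pfn_to_page, and so on) is hypothetical, and the section size is a
toy value chosen for illustration. With a virtual mem_map the same
step would be hidden inside the page table walk rather than written
out as an array load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel's definition. */
struct toy_page { unsigned long flags; };

#define TOY_SECTION_SHIFT 4u                      /* toy value: 16 pages per section */
#define TOY_SECTION_PAGES (1ul << TOY_SECTION_SHIFT)
#define TOY_NR_SECTIONS   8u

/* One slot per physical section; NULL means the section is not present. */
static struct toy_page *toy_section_map[TOY_NR_SECTIONS];

/* Table-based lookup: one explicit load to find the section's chunk,
 * then an offset within that chunk. */
struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
    unsigned long sec = pfn >> TOY_SECTION_SHIFT;

    assert(sec < TOY_NR_SECTIONS);
    if (toy_section_map[sec] == NULL)
        return NULL;
    return toy_section_map[sec] + (pfn & (TOY_SECTION_PAGES - 1));
}

/* Bring a section online by plugging in its page-descriptor chunk. */
void toy_section_online(unsigned long sec, struct toy_page *chunk)
{
    assert(sec < TOY_NR_SECTIONS);
    toy_section_map[sec] = chunk;
}

/* Take a section offline; later lookups in it return NULL. */
void toy_section_offline(unsigned long sec)
{
    assert(sec < TOY_NR_SECTIONS);
    toy_section_map[sec] = NULL;
}
```

Onlining section 2 with a 16-entry chunk makes toy_pfn_to_page(37)
return the chunk's entry 5; after offlining it the same call returns
NULL.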
> Adding the other layer of indirection for accessing the struct page array
> should fix up most of that, and is very easily abstracted out via the
> pfn_to_page macros and friends. I ripped out all the direct references
> to mem_map indexing already in 2.6, so it should all be nicely
> abstracted out.
I did go back and look at the CONFIG_NONLINEAR patch again, and I
still can't see how to make it useful on 64-bit machines. Jack
Steiner asked a bunch of questions on how it would work for an
architecture like the SGI:
http://marc.theaimsgroup.com/?l=lse-tech&m=101828803506249&w=2
I don't remember seeing any answers on the list. Assuming he
were to use a section size of 64MB (a convenient number for ia64)
he'd end up with psection[]/vsection[] tables with 8 million
entries each (@ 4 bytes/entry -> 64MB for the pair).
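
The arithmetic behind that estimate can be checked with a short
sketch. It assumes "8 million" means 2^23 entries, and the helper
names are made up for illustration. The address span it computes
(2^49 bytes) is only what follows from those two numbers; it is not a
figure quoted from SGI.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the message, with "8 million" read as 2^23 (an assumption). */
#define SECTION_SHIFT 26u                  /* 64MB sections = 2^26 bytes */
#define NR_SECTIONS   (UINT64_C(1) << 23)  /* entries per table */
#define ENTRY_BYTES   UINT64_C(4)          /* bytes per table entry */

/* Bytes needed for one table (psection[] or vsection[]). */
uint64_t table_bytes(uint64_t nr_entries, uint64_t entry_bytes)
{
    return nr_entries * entry_bytes;
}

/* Bytes of address space spanned by nr_entries sections. */
uint64_t covered_bytes(uint64_t nr_entries, unsigned section_shift)
{
    assert(section_shift < 64);
    return nr_entries << section_shift;
}
```

With these values each table is 32MB, so the pair costs 64MB, and the
sections together span 2^49 bytes of address space.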
-Tony Luck