From: "Martin J. Bligh" <mbligh@aracnet.com>
To: IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: memory hotremove prototype, take 3
Date: Wed, 03 Dec 2003 21:38:54 -0800
Message-ID: <152440000.1070516333@[10.10.2.4]>
In-Reply-To: <20031204035842.72C9A7007A@sv1.valinux.co.jp>

> I used the discontigmem code because this is what we have now.
> My hacks such as zone_active[] will go away when the memory hot add
> code (on which Goto-san is working) is ready.

Understood, but it'd be much cleaner (and more likely to get
accepted) to do it the other way.
 
>> Have you looked at Daniel's CONFIG_NONLINEAR stuff? That provides a much
>> cleaner abstraction for getting rid of discontiguous memory in the non
>> truly-NUMA case, and should work really well for doing mem hot add / remove
>> as well.
> 
> Thanks for pointing that out.  I looked at the patch.
> It should be doable to make my patch work with the CONFIG_NONLINEAR
> code.  For my code to work, basically the following functionalities
> are necessary:
> 1. disabling alloc_page from the area being hot-removed
> and
> 2. enumerating pages in use in the area being hot-removed.
> 
> My target is somewhat NUMA-ish and fairly large.  So I'm not sure if
> CONFIG_NONLINEAR fits, but CONFIG_NUMA isn't perfect either.

If your target is NUMA, then you really, really need CONFIG_NONLINEAR.
We don't support multiple pgdats per node, nor do I wish to, as it'll
make an unholy mess ;-). With CONFIG_NONLINEAR, the discontiguities
within a node are buried down further, so we have much less complexity
to deal with from the main VM. The abstraction also keeps the poor
VM engineers trying to read / write the code saner via simplicity ;-)

WRT generic discontigmem support (not NUMA), doing that via pgdats
should really go away, as there's no real difference between the 
chunks of physical memory as far as the page allocator is concerned.
The plan is to use Daniel's nonlinear stuff to replace that, and keep
the pgdats strictly for NUMA. Same would apply to hotpluggable zones - 
I'd hate to end up with 512 pgdats of stuff that are really all the
same memory types underneath.

The real issue you have is the mapping of the struct pages - if we can
achieve a non-contig mapping of the mem_map / lmem_map array, we should
be able to take memory on and offline reasonably easily. If you're willing
for a first implementation to pre-allocate the struct page array for
every possible virtual address, it makes life a lot easier.

Adding the other layer of indirection for accessing the struct page array
should fix up most of that, and is very easily abstracted out via the
pfn_to_page macros and friends. I ripped out all the direct references
to mem_map indexing already in 2.6, so it should all be nicely 
abstracted out.
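
To make the indirection idea concrete, here is a rough userspace sketch,
not kernel code. Every name in it (section_mem_map, demo_pfn_to_page,
SECTION_SHIFT and so on) is made up for illustration, and the section size
is arbitrary. It only shows the shape of the thing: the struct page array
is split into per-section chunks, a small table points at each chunk, and
the pfn <-> page helpers are the only code that knows about the split.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names and sizes throughout; a userspace model only. */
#define SECTION_SHIFT      4                      /* 16 pages per section, demo value */
#define PAGES_PER_SECTION  (1UL << SECTION_SHIFT)
#define NR_SECTIONS        8

struct page {
    unsigned long section;   /* which section this page belongs to */
    unsigned long flags;
};

/* One pointer per section; NULL means that section is offline. */
static struct page *section_mem_map[NR_SECTIONS];

/* Bring a section online by allocating its chunk of the struct page array. */
static int section_online(unsigned long nr)
{
    if (nr >= NR_SECTIONS || section_mem_map[nr])
        return -1;
    struct page *map = calloc(PAGES_PER_SECTION, sizeof(*map));
    if (!map)
        return -1;
    for (unsigned long i = 0; i < PAGES_PER_SECTION; i++)
        map[i].section = nr;
    section_mem_map[nr] = map;
    return 0;
}

/* Take a section offline; later lookups into it return NULL. */
static void section_offline(unsigned long nr)
{
    if (nr < NR_SECTIONS) {
        free(section_mem_map[nr]);
        section_mem_map[nr] = NULL;
    }
}

/* The extra layer of indirection: section table first, then the offset. */
static struct page *demo_pfn_to_page(unsigned long pfn)
{
    unsigned long nr = pfn >> SECTION_SHIFT;
    if (nr >= NR_SECTIONS || !section_mem_map[nr])
        return NULL;
    return &section_mem_map[nr][pfn & (PAGES_PER_SECTION - 1)];
}

/* Reverse lookup using the section number stored in the page itself. */
static unsigned long demo_page_to_pfn(const struct page *page)
{
    const struct page *base = section_mem_map[page->section];
    return (page->section << SECTION_SHIFT) + (unsigned long)(page - base);
}
```

Callers that only ever go through the two lookup helpers never notice
whether the underlying array is contiguous, which is the property that
makes onlining and offlining a section a local change.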

>> PS. What's this bit of the patch for?
>> 
>>  void *vmalloc(unsigned long size)
>>  {
>> +#ifdef CONFIG_MEMHOTPLUGTEST
>> +       return __vmalloc(size, GFP_KERNEL, PAGE_KERNEL);
>> +#else
>>         return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL);
>> +#endif
>>  }
> 
> This is necessary because kernel memory cannot be swapped out.
> Only highmem can be hot removed for now, though the removable zone
> doesn't need to be highmem.
> We can define another zone attribute such as GFP_HOTPLUGGABLE.

You could just lock the pages, I'd think? I don't see at a glance
exactly what you were using this for, but would that work?
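
For what it's worth, the separate-attribute idea quoted above might look
something like the sketch below. This is a userspace toy with invented
flag and zone names (DEMO_GFP_*, DEMO_ZONE_*), not the kernel's GFP
machinery; it assumes a caller sets the hot-removable bit only when its
pages can be migrated or swapped out, so a vmalloc-style caller could keep
asking for highmem without landing in the removable zone.

```c
/* Hypothetical flags and zones for illustration; not the kernel's values. */
#define DEMO_GFP_HIGHMEM      0x1u  /* caller can cope with highmem pages */
#define DEMO_GFP_HOTREMOVABLE 0x2u  /* caller's pages may be moved or swapped out */

enum demo_zone {
    DEMO_ZONE_NORMAL,
    DEMO_ZONE_HIGHMEM,
    DEMO_ZONE_HOTREMOVABLE
};

/* Pick the zone to try first for a given allocation mask. */
static enum demo_zone demo_pick_zone(unsigned int gfp)
{
    /* Only callers that explicitly opt in get hot-removable memory. */
    if (gfp & DEMO_GFP_HOTREMOVABLE)
        return DEMO_ZONE_HOTREMOVABLE;
    if (gfp & DEMO_GFP_HIGHMEM)
        return DEMO_ZONE_HIGHMEM;
    return DEMO_ZONE_NORMAL;
}
```

With removability decoupled from highmem like this, the #ifdef in
vmalloc() would presumably not be needed, since asking for highmem would
no longer imply asking for removable memory.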

M.


