linux-kernel.vger.kernel.org archive mirror
From: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: "zaslonko@linux.ibm.com" <zaslonko@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	"osalvador@suse.de" <osalvador@suse.de>,
	"gerald.schaefer@de.ibm.com" <gerald.schaefer@de.ibm.com>
Subject: Re: [PATCH] memory_hotplug: fix the panic when memory end is not on the section boundary
Date: Mon, 10 Sep 2018 15:26:55 +0000
Message-ID: <abf84f61-82f3-e3d5-2e6e-82a11cb5dcf5@microsoft.com>
In-Reply-To: <20180910144152.GL10951@dhcp22.suse.cz>



On 9/10/18 10:41 AM, Michal Hocko wrote:
> On Mon 10-09-18 14:32:16, Pavel Tatashin wrote:
>> On Mon, Sep 10, 2018 at 10:19 AM Michal Hocko <mhocko@kernel.org> wrote:
>>>
>>> On Mon 10-09-18 14:11:45, Pavel Tatashin wrote:
>>>> Hi Michal,
>>>>
>>>> It is tricky, but probably can be done. Either change
>>>> memmap_init_zone() or its caller to also cover the ends and starts of
>>>> unaligned sections to initialize and reserve pages.
>>>>
>>>> The same thing would also need to be done in deferred_init_memmap() to
>>>> cover the deferred init case.
>>>
>>> Well, I am not sure TBH. I have to think about that much more. Maybe it
>>> would be much more simple to make sure that we will never add incomplete
>>> memblocks and simply refuse them during the discovery. At least for now.
>>
>> On x86 memblocks can be up to 2G on machines with over 64G of RAM.
> 
> sorry I meant pageblock_nr_pages rather than memblocks.

OK. This sounds reasonable but, to be honest, I am not sure how to
achieve it yet; I need to think more about this. In theory, if we have
the sparse memory model, it makes sense to enforce memory alignment to
section sizes; that sounds a lot safer.
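
To make the alignment point concrete, here is a minimal user-space sketch
in Python (not kernel code) of how a physical range whose start or end is
not on a section boundary splits into whole sections plus unaligned
leftover bytes. The 128M section size is the value quoted later in this
message; the function name count_sections is hypothetical.

```python
SECTION_SIZE = 128 << 20  # assumed sparse section size: 128M


def count_sections(start_pa, end_pa, section_size=SECTION_SIZE):
    """Return (whole_sections, leftover_bytes) for the range [start_pa, end_pa).

    whole_sections counts sections that lie entirely inside the range.
    leftover_bytes is the memory in partially covered sections, i.e. the
    part that section-granular alignment would have to handle specially.
    """
    if end_pa < start_pa:
        raise ValueError("end_pa must not be below start_pa")
    first = -(-start_pa // section_size) * section_size  # round start up
    last = (end_pa // section_size) * section_size       # round end down
    whole = max(0, last - first) // section_size
    leftover = (end_pa - start_pa) - whole * section_size
    return whole, leftover


if __name__ == "__main__":
    # A 4G + 64M range starting at 0: the last 64M is a partial section.
    whole, leftover = count_sections(0, (4 << 30) + (64 << 20))
    print(whole, "whole sections,", leftover >> 20, "MiB unaligned")
```

A nonzero leftover is the case under discussion: memory that exists but
does not fill a complete section.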

> 
>> Also, memory size is way too easy to change via qemu arguments when a VM
>> starts. If we simply disable unaligned trailing memblocks, I am sure
>> we would get tons of noise about missing memory.
>>
>> I think adding check_hotplug_memory_range() would work to fix the
>> immediate problem. But we do need to figure out a better solution.
>>
>> memblock design is based on the archaic assumption that hotplug units are
>> physical dimms. VMs and hypervisors changed all of that, and we can
>> have much finer hotplug requests on machines with huge DIMMs. Yet, we
>> do not want to pollute sysfs with millions of tiny memory devices. I
>> am not sure what a long term proper solution for this problem should
>> be, but I see that linux hotplug/hotremove subsystems must be
>> redesigned based on the new requirements.
> 
> Not an easy task though. Anyway, the sparse memory model is heavily based on
> memory sections so it makes some sense to have hotplug section based as
> well. Memblocks as a higher logical unit on top of that is kind of a hack.
> The userspace API has never been properly thought through I am afraid.

I agree that memoryblock is a hack; it fails at both things it was
designed to do:

1. On bare metal you cannot free a physical DIMM of memory using
memoryblock granularity because memory devices do not correspond to
physical DIMMs. Thus, if for some reason a particular DIMM must be
removed/replaced, memoryblock does not help us.

2. On machines with hypervisors it fails to provide an adequate
granularity to add/remove memory.

We should define a new user interface where memory can be added/removed
at a finer granularity: sparse section size, but without a memory
device for each section. We should also provide optional access to the
legacy interface where memory devices are exported, but each is of
section size.

So, when the legacy interface is enabled, the current way would still work:

echo offline > /sys/devices/system/memory/memoryXXX/state

And the new interface would allow us to do something like this:

echo offline 256M > /sys/devices/system/node/nodeXXX/memory

With an optional start address for the memory to offline:
echo offline [start_pa] size > /sys/devices/system/node/nodeXXX/memory
start_pa and size must be section size aligned (128M).
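
As a rough illustration of the validation such an interface would need,
here is a minimal user-space sketch in Python. It only models parsing and
the alignment check; the per-node "memory" file does not exist today, and
the names parse_size and parse_request, the K/M/G suffix handling, and the
acceptance of "online" alongside "offline" are assumptions made for the
example.

```python
SECTION_SIZE = 128 << 20  # section size quoted above: 128M

_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Parse '256M', '2G', '0x8000000' or a plain byte count into bytes."""
    text = text.strip()
    if text and text[-1].upper() in _UNITS:
        return int(text[:-1], 0) * _UNITS[text[-1].upper()]
    return int(text, 0)


def parse_request(line, section_size=SECTION_SIZE):
    """Parse '<action> [start_pa] <size>' into (action, start_pa, size).

    start_pa is None when omitted. Raises ValueError if the request is
    malformed or if start_pa or size is not section aligned.
    """
    fields = line.split()
    if len(fields) == 2:
        action, start_pa, size = fields[0], None, parse_size(fields[1])
    elif len(fields) == 3:
        action = fields[0]
        start_pa, size = parse_size(fields[1]), parse_size(fields[2])
    else:
        raise ValueError("expected: <action> [start_pa] <size>")
    if action not in ("online", "offline"):
        raise ValueError("unknown action: " + action)
    if size <= 0 or size % section_size:
        raise ValueError("size must be a positive multiple of the section size")
    if start_pa is not None and start_pa % section_size:
        raise ValueError("start_pa must be section aligned")
    return action, start_pa, size


if __name__ == "__main__":
    print(parse_request("offline 256M"))
    print(parse_request("offline 0x100000000 128M"))
```

Under these assumptions, a request such as "offline 100M" would be
rejected up front rather than leaving a partially offlined section.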

How to solve the current memory hotplug interface limitations would
probably be a good discussion topic for the next MM Summit.

Pavel
