From: Michal Hocko <mhocko@kernel.org>
To: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: "zaslonko@linux.ibm.com" <zaslonko@linux.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>,
"osalvador@suse.de" <osalvador@suse.de>,
"gerald.schaefer@de.ibm.com" <gerald.schaefer@de.ibm.com>
Subject: Re: [PATCH] memory_hotplug: fix the panic when memory end is not on the section boundary
Date: Tue, 11 Sep 2018 11:16:08 +0200
Message-ID: <20180911091608.GQ10951@dhcp22.suse.cz>
In-Reply-To: <abf84f61-82f3-e3d5-2e6e-82a11cb5dcf5@microsoft.com>
On Mon 10-09-18 15:26:55, Pavel Tatashin wrote:
>
>
> On 9/10/18 10:41 AM, Michal Hocko wrote:
> > On Mon 10-09-18 14:32:16, Pavel Tatashin wrote:
> >> On Mon, Sep 10, 2018 at 10:19 AM Michal Hocko <mhocko@kernel.org> wrote:
> >>>
> >>> On Mon 10-09-18 14:11:45, Pavel Tatashin wrote:
> >>>> Hi Michal,
> >>>>
> >>>> It is tricky, but probably can be done. Either change
> >>>> memmap_init_zone() or its caller to also cover the ends and starts of
> >>>> unaligned sections to initialize and reserve pages.
> >>>>
> >>>> The same thing would also need to be done in deferred_init_memmap() to
> >>>> cover the deferred init case.
> >>>
> >>> Well, I am not sure TBH. I have to think about that much more. Maybe it
> >>> would be much more simple to make sure that we will never add incomplete
> >>> memblocks and simply refuse them during the discovery. At least for now.
> >>
> >> On x86 memblocks can be up to 2G on machines with over 64G of RAM.
> >
> > sorry I meant pageblock_nr_pages rather than memblocks.
>
> OK. This sounds reasonable, but to be honest I am not sure how to
> achieve this yet; I need to think more about this. In theory, if we have
> the sparse memory model, it makes sense to enforce memory alignment to
> section sizes, which sounds a lot safer.
Memory hotplug is sparsemem only. You do not have to think about other
memory models fortunately.
> >> Also, memory size is way too easy to change via qemu arguments when VM
> >> starts. If we simply disable unaligned trailing memblocks, I am sure
> >> we would get tons of noise of missing memory.
> >>
> >> I think, adding check_hotplug_memory_range() would work to fix the
> >> immediate problem. But, we do need to figure out a better solution.
> >>
> >> memblock design is based on archaic assumption that hotplug units are
> >> physical dimms. VMs and hypervisors changed all of that, and we can
> >> have much finer hotplug requests on machines with huge DIMMs. Yet, we
> >> do not want to pollute sysfs with millions of tiny memory devices. I
> >> am not sure what a long term proper solution for this problem should
> >> be, but I see that linux hotplug/hotremove subsystems must be
> >> redesigned based on the new requirements.
> >
> > Not an easy task though. Anyway, sparse memory model is highly based on
> > memory sections so it makes some sense to have hotplug section based as
> > well. Memblocks as a higher logical unit on top of that is kinda hack.
> > The userspace API has never been properly thought through I am afraid.
>
> I agree memoryblock is a hack, it fails to do both things it was
> designed to do:
>
> 1. On bare metal you cannot free a physical dimm of memory using
> memoryblock granularity because memory devices do not correspond to
> physical dimms. Thus, if for some reason a particular dimm must be
> removed/replaced, memoryblock does not help us.
agreed
> 2. On machines with hypervisors it fails to provide an adequate
> granularity to add/remove memory.
>
> We should define a new user interface where memory can be added/removed
> at a finer granularity: sparse section size, but without a memory
> device for each section. We should also provide optional access to the
> legacy interface where memory devices are exported but each is of
> section size.
>
> So, when legacy interface is enabled, current way would work:
>
> echo offline > /sys/devices/system/memory/memoryXXX/state
>
> And new interface would allow us to do something like this:
>
> echo offline 256M > /sys/devices/system/node/nodeXXX/memory
>
> With optional start address for offline memory.
> echo offline [start_pa] size > /sys/devices/system/node/nodeXXX/memory
> start_pa and size must be section size aligned (128M).
I am not sure what the expected semantics of the version without
start_pa are.
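
For illustration only, the alignment rule in the proposal quoted above
(start_pa and size must both be section size aligned) could be sketched in
plain userspace C as below. This is not kernel code: the helper name
range_is_section_aligned() is hypothetical, and the 128M section size is
taken from the figure quoted in the proposal rather than from any
particular kernel configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 128M sections, the figure quoted in the proposal above. */
#define SECTION_SIZE_BYTES (UINT64_C(128) << 20)

/*
 * Hypothetical helper: returns true only when the range is non-empty and
 * both its start address and its size are multiples of the section size.
 */
static bool range_is_section_aligned(uint64_t start_pa, uint64_t size)
{
	if (size == 0)
		return false;

	return (start_pa % SECTION_SIZE_BYTES) == 0 &&
	       (size % SECTION_SIZE_BYTES) == 0;
}
```

Under these assumptions a 256M request starting at address 0 would pass,
while one starting at a 64M offset, or one whose size is not a multiple of
128M, would be refused.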
> It would probably be a good discussion for the next MM Summit how to
> solve the current memory hotplug interface limitations.
Yes, sounds good to me. In any case let's not pollute this email thread
with this discussion now.
--
Michal Hocko
SUSE Labs