linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: "zaslonko@linux.ibm.com" <zaslonko@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	"osalvador@suse.de" <osalvador@suse.de>,
	"gerald.schaefer@de.ibm.com" <gerald.schaefer@de.ibm.com>
Subject: Re: [PATCH] memory_hotplug: fix the panic when memory end is not on the section boundary
Date: Tue, 11 Sep 2018 11:16:08 +0200	[thread overview]
Message-ID: <20180911091608.GQ10951@dhcp22.suse.cz> (raw)
In-Reply-To: <abf84f61-82f3-e3d5-2e6e-82a11cb5dcf5@microsoft.com>

On Mon 10-09-18 15:26:55, Pavel Tatashin wrote:
> 
> 
> On 9/10/18 10:41 AM, Michal Hocko wrote:
> > On Mon 10-09-18 14:32:16, Pavel Tatashin wrote:
> >> On Mon, Sep 10, 2018 at 10:19 AM Michal Hocko <mhocko@kernel.org> wrote:
> >>>
> >>> On Mon 10-09-18 14:11:45, Pavel Tatashin wrote:
> >>>> Hi Michal,
> >>>>
> >>>> It is tricky, but probably can be done. Either change
> >>>> memmap_init_zone() or its caller to also cover the ends and starts of
> >>>> unaligned sections to initialize and reserve pages.
> >>>>
> >>>> The same thing would also need to be done in deferred_init_memmap() to
> >>>> cover the deferred init case.
> >>>
> >>> Well, I am not sure TBH. I have to think about that much more. Maybe it
> >>> would be much more simple to make sure that we will never add incomplete
> >>> memblocks and simply refuse them during the discovery. At least for now.
> >>
> >> On x86 memblocks can be upto 2G on machines with over 64G of RAM.
> > 
> > sorry I meant pageblock_nr_pages rather than memblocks.
> 
> OK. This sound reasonable, but, to be honest I am not sure how to
> achieve this yet, I need to think more about this. In theory, if we have
> sparse memory model, it makes sense to enforce memory alignment to
> section sizes, sounds a lot safer.

Memory hotplug is sparsemem only. You do not have to think about other
memory models fortunately.
 
> >> Also, memory size is way to easy too change via qemu arguments when VM
> >> starts. If we simply disable unaligned trailing memblocks, I am sure
> >> we would get tons of noise of missing memory.
> >>
> >> I think, adding check_hotplug_memory_range() would work to fix the
> >> immediate problem. But, we do need to figure out  a better solution.
> >>
> >> memblock design is based on archaic assumption that hotplug units are
> >> physical dimms. VMs and hypervisors changed all of that, and we can
> >> have much finer hotplug requests on machines with huge DIMMs. Yet, we
> >> do not want to pollute sysfs with millions of tiny memory devices. I
> >> am not sure what a long term proper solution for this problem should
> >> be, but I see that linux hotplug/hotremove subsystems must be
> >> redesigned based on the new requirements.
> > 
> > Not an easy task though. Anyway, sparse memory modely is highly based on
> > memory sections so it makes some sense to have hotplug section based as
> > well. Memblocks as a higher logical unit on top of that is kinda hack.
> > The userspace API has never been properly thought through I am afraid.
> 
> I agree memoryblock is a hack, it fails to do both things it was
> designed to do:
> 
> 1. On bare metal you cannot free a physical dimm of memory using
> memoryblock granularity because memory devices do not equal to physical
> dimms. Thus, if for some reason a particular dimm must be
> remove/replaced, memoryblock does not help us.

agreed

> 2. On machines with hypervisors it fails to provide an adequate
> granularity to add/remove memory.
> 
> We should define a new user interface where memory can be added/removed
> at a finer granularity: sparse section size, but without a memory
> devices for each section. We should also provide an optional access to
> legacy interface where memory devices are exported but each is of
> section size.
> 
> So, when legacy interface is enabled, current way would work:
> 
> echo offline > /sys/devices/system/memory/memoryXXX/state
> 
> And new interface would allow us to do something like this:
> 
> echo offline 256M > /sys/devices/system/node/nodeXXX/memory
> 
> With optional start address for offline memory.
> echo offline [start_pa] size > /sys/devices/system/node/nodeXXX/memory
> start_pa and size must be section size aligned (128M).

I am not sure what is the expected semantic of the version without
start_pa.

> It would probably be a good discussion for the next MM Summit how to
> solve the current memory hotplug interface limitations.

Yes, sounds good to me. In any case let's not pollute this email thread
with this discussion now.
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2018-09-11  9:16 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-10 12:35 [PATCH] memory_hotplug: fix the panic when memory end is not on the section boundary Mikhail Zaslonko
2018-09-10 13:17 ` Michal Hocko
2018-09-10 13:46   ` Pasha Tatashin
2018-09-10 13:59     ` Michal Hocko
2018-09-10 14:11       ` Pasha Tatashin
2018-09-10 14:19         ` Michal Hocko
2018-09-10 14:32           ` Pasha Tatashin
2018-09-10 14:41             ` Michal Hocko
2018-09-10 15:26               ` Pasha Tatashin
2018-09-11  9:16                 ` Michal Hocko [this message]
2018-09-12 14:28                 ` Gerald Schaefer
2018-09-11 14:08     ` Zaslonko Mikhail
2018-09-11 14:06   ` Zaslonko Mikhail
2018-09-12 12:21     ` Michal Hocko
2018-09-12 13:03   ` Gerald Schaefer
2018-09-12 13:39     ` Michal Hocko
2018-09-12 14:27       ` Gerald Schaefer
2018-09-12 14:40         ` Pasha Tatashin
2018-09-12 15:51           ` Gerald Schaefer
2018-10-24 19:28           ` Zaslonko Mikhail

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180911091608.GQ10951@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=Pavel.Tatashin@microsoft.com \
    --cc=akpm@linux-foundation.org \
    --cc=gerald.schaefer@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=osalvador@suse.de \
    --cc=zaslonko@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).