All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: akpm@linux-foundation.org, mhocko@suse.com,
	dan.j.williams@intel.com, pasha.tatashin@soleen.com,
	Jonathan.Cameron@huawei.com, anshuman.khandual@arm.com,
	vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/5] Allocate memmap from hotadded memory
Date: Wed, 26 Jun 2019 10:03:03 +0200	[thread overview]
Message-ID: <20190626080249.GA30863@linux> (raw)
In-Reply-To: <2ebfbd36-11bd-9576-e373-2964c458185b@redhat.com>

On Tue, Jun 25, 2019 at 10:25:48AM +0200, David Hildenbrand wrote:
> > [Coverletter]
> > 
> > This is another step to make memory hotplug more usable. The primary
> > goal of this patchset is to reduce memory overhead of the hot-added
> > memory (at least for SPARSEMEM_VMEMMAP memory model). The current way we use
> > to populate memmap (struct page array) has two main drawbacks:

First off, thanks for looking into this :-)

> 
> Mental note: How will it be handled if a caller specifies "Allocate
> memmap from hotadded memory", but we are running under SPARSEMEM where
> we can't do this.

In add_memory_resource(), we have a call to mhp_check_correct_flags(), which is
in charge of checking if the flags passed are compliant with our configuration
among other things.
It also checks if both flags were passed (_MEMBLOCK|_DEVICE).

If a) any of the flags were specified and we are not on CONFIG_SPARSEMEM_VMEMMAP,
b) the flags are colliding with each other or c) the flags just do not make sense,
we print out a warning and drop the flags to 0, so we just ignore them.

I just realized that I can adjust the check even more (something for the next
version).

But to answer your question, flags are ignored under !CONFIG_SPARSEMEM_VMEMMAP.

> 
> > 
> > a) it consumes an additional memory until the hotadded memory itself is
> >    onlined and
> > b) memmap might end up on a different numa node which is especially true
> >    for movable_node configuration.
> > 
> > a) it is a problem especially for memory hotplug based memory "ballooning"
> >    solutions when the delay between physical memory hotplug and the
> >    onlining can lead to OOM and that led to introduction of hacks like auto
> >    onlining (see 31bc3858ea3e ("memory-hotplug: add automatic onlining
> >    policy for the newly added memory")).
> > 
> > b) can have performance drawbacks.
> > 
> > Another minor case is that I have seen hot-add operations failing on archs
> > because they were running out of order-x pages.
> > E.g On powerpc, in certain configurations, we use order-8 pages,
> > and given 64KB base pagesize, that is 16MB.
> > If we run out of those, we just fail the operation and we cannot add
> > more memory.
> 
> At least for SPARSEMEM, we fallback to vmalloc() to work around this
> issue. I haven't looked into the populate_section_memmap() internals
> yet. Can you point me at the code that performs this allocation?

Yes, on SPARSEMEM we first try to allocate the pages physical configuous, and
then fallback to vmalloc.
This is because on CONFIG_SPARSEMEM memory model, the translations pfn_to_page/
page_to_pfn do not expect the memory to be contiguous.

But that is not the case on CONFIG_SPARSEMEM_VMEMMAP.
We do expect the memory to be physical contigous there, that is why a simply
pfn_to_page/page_to_pfn is a matter of adding/substracting vmemmap/pfn.

Powerpc code is at:

https://elixir.bootlin.com/linux/v5.2-rc6/source/arch/powerpc/mm/init_64.c#L175



> So, assuming we add_memory(1GB, MHP_MEMMAP_DEVICE) and then
> remove_memory(128MB) of the added memory, this will work?

No, MHP_MEMMAP_DEVICE is meant to be used when hot-adding and hot-removing work
in the same granularity.
This is because all memmap pages will be stored at the beginning of the memory
range.
Allowing hot-removing in a different granularity on MHP_MEMMAP_DEVICE would imply
a lot of extra work.
For example, we would have to parse the vmemmap-head of the hot-removed range,
and punch a hole in there to clear the vmemmap pages, and then be very carefull
when deleting those pagetables.

So I followed Michal's advice, and I decided to let the caller specify if he
either wants to allocate per memory block or per hot-added range(device).
Where per memory block, allows us to do:

add_memory(1GB, MHP_MEMMAP_MEMBLOCK)
remove_memory(128MB)


> add_memory(8GB, MHP_MEMMAP_DEVICE)
> 
> For 8GB, we will need exactly 128MB of memmap if I did the math right.
> So exactly one section. This section will still be marked as being
> online (although not pages on it are actually online)?

Yap, 8GB will fill the first section with vmemmap pages.
It will be marked as online, yes.
This is not to diverge too much from what we have right now, and starting
treat some sections different than others.
E.g: Early sections that are used for memmap pages on early boot stage.

> > 
> > What we do is that since when we hot-remove a memory-range, sections are being
> > removed sequentially, we wait until we hit the last section, and then we free
> > the hole range to vmemmap_free backwards.
> > We know that it is the last section because in every pass we
> > decrease head->_refcount, and when it reaches 0, we got our last section.
> > 
> > We also have to be careful about those pages during online and offline
> > operations. They are simply skipped, so online will keep them
> > reserved and so unusable for any other purpose and offline ignores them
> > so they do not block the offline operation.
> 
> I assume that they will still be dumped normally by user space. (as they
> are described by a "memory resource" and not PG_Offline)

They are PG_Reserved.
Anyway, you mean by crash-tool?


-- 
Oscar Salvador
SUSE L3

  parent reply	other threads:[~2019-06-26  8:03 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-25  7:52 [PATCH v2 0/5] Allocate memmap from hotadded memory Oscar Salvador
2019-06-25  7:52 ` [PATCH v2 1/5] drivers/base/memory: Remove unneeded check in remove_memory_block_devices Oscar Salvador
2019-06-25  8:01   ` David Hildenbrand
2019-06-25  8:03     ` David Hildenbrand
2019-06-25  8:09       ` Oscar Salvador
2019-06-25  8:27         ` David Hildenbrand
2019-06-25  7:52 ` [PATCH v2 2/5] mm,memory_hotplug: Introduce MHP_VMEMMAP_FLAGS Oscar Salvador
2019-06-25  8:31   ` David Hildenbrand
2019-07-24 20:11   ` Dan Williams
2019-07-24 20:11     ` Dan Williams
2019-07-24 21:36     ` osalvador
2019-07-25  9:27     ` Oscar Salvador
2019-07-25  9:30       ` David Hildenbrand
2019-07-25  9:40         ` Oscar Salvador
2019-07-25 10:04           ` David Hildenbrand
2019-07-25 10:13             ` Oscar Salvador
2019-07-25 10:15               ` David Hildenbrand
2019-06-25  7:52 ` [PATCH v2 3/5] mm,memory_hotplug: Introduce Vmemmap page helpers Oscar Salvador
2019-06-25 10:28   ` David Hildenbrand
2019-06-26  9:48     ` Oscar Salvador
2019-06-25  7:52 ` [PATCH v2 4/5] mm,memory_hotplug: allocate memmap from the added memory range for sparse-vmemmap Oscar Salvador
2019-06-25  8:49   ` David Hildenbrand
2019-06-26  8:13     ` Oscar Salvador
2019-06-26  8:15       ` David Hildenbrand
2019-06-26  8:17   ` Anshuman Khandual
2019-06-26  8:28     ` Oscar Salvador
2019-07-24 21:49   ` Dan Williams
2019-07-24 21:49     ` Dan Williams
2019-06-25  7:52 ` [PATCH v2 5/5] mm,memory_hotplug: Allow userspace to enable/disable vmemmap Oscar Salvador
2019-06-25  8:25 ` [PATCH v2 0/5] Allocate memmap from hotadded memory David Hildenbrand
2019-06-25  8:33   ` David Hildenbrand
2019-06-26  8:03   ` Oscar Salvador [this message]
2019-06-26  8:11     ` David Hildenbrand
2019-06-26  8:15       ` Oscar Salvador
2019-06-26  8:27         ` Oscar Salvador
2019-06-26  8:37           ` David Hildenbrand
2019-06-26  8:28         ` David Hildenbrand
2019-07-02  6:42           ` Rashmica Gupta
2019-07-02  6:42             ` Rashmica Gupta
2019-07-02  7:48             ` Oscar Salvador
2019-07-02  8:52               ` Rashmica Gupta
2019-07-10  1:14                 ` Rashmica Gupta
2019-07-10  1:14                   ` Rashmica Gupta
2019-07-31 12:08                 ` Michal Hocko
2019-07-31 23:06                   ` Rashmica Gupta
2019-07-31 23:06                     ` Rashmica Gupta
2019-08-01  7:17                     ` Michal Hocko
2019-08-01  7:18                       ` David Hildenbrand
2019-08-01  7:24                         ` Michal Hocko
2019-08-01  7:26                           ` David Hildenbrand
2019-08-01  7:31                             ` David Hildenbrand
2019-08-01  7:39                               ` Michal Hocko
2019-08-01  7:48                                 ` Michal Hocko
2019-08-01  9:18                                   ` David Hildenbrand
2019-08-01  7:34                             ` Michal Hocko
2019-08-01  7:50                               ` David Hildenbrand
2019-08-01  8:04                                 ` Michal Hocko
2019-07-16 12:28             ` David Hildenbrand
2019-07-29  5:42               ` Rashmica Gupta
2019-07-29  5:42                 ` Rashmica Gupta
2019-07-29  8:06                 ` David Hildenbrand
2019-07-30  7:08                   ` Rashmica Gupta
2019-07-30  7:08                     ` Rashmica Gupta
2019-07-31  2:21                   ` Rashmica Gupta
2019-07-31  2:21                     ` Rashmica Gupta
2019-07-31  9:39                     ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190626080249.GA30863@linux \
    --to=osalvador@suse.de \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.