All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Oscar Salvador <osalvador@suse.com>
Cc: linux-mm@kvack.org, mhocko@suse.com, rppt@linux.vnet.ibm.com,
	akpm@linux-foundation.org, arunks@codeaurora.org, bhe@redhat.com,
	dan.j.williams@intel.com, Pavel.Tatashin@microsoft.com,
	Jonathan.Cameron@huawei.com, jglisse@redhat.com,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/4] mm, memory_hotplug: allocate memmap from hotadded memory
Date: Fri, 23 Nov 2018 13:11:29 +0100	[thread overview]
Message-ID: <729f2126-c4ba-e764-3c71-7bd711e44187@redhat.com> (raw)
In-Reply-To: <20181123115519.2dnzscmmgv63fdub@d104.suse.de>

On 23.11.18 12:55, Oscar Salvador wrote:
> On Thu, Nov 22, 2018 at 10:21:24AM +0100, David Hildenbrand wrote:
>> 1. How are we going to present such memory to the system statistics?
>>
>> In my opinion, this vmemmap memory should
>> a) still account to total memory
>> b) show up as allocated
>>
>> So just like before.
> 
> No, it does not show up under total memory and neither as allocated memory.
> This memory is not for use for anything but for creating the pagetables
> for the memmap array for the section/s.
> 
> It is not memory that the system can use.
> 
> I also guess that if there is a strong opinion on this, we could create
> a counter, something like NR_VMEMMAP_PAGES, and show it under /proc/meminfo.

It's a change if we "hide" such memory. E.g. in a cloud environment you
request to add XGB to your system. You will not see XGB, that can be
"problematic" with some costumers :) - "But I am paying for additional
XGB". (Showing XGB but YMB as allocated is easier to argue with - "your
OS is using it").

> 
>> 2. Is this optional, in other words, can a device driver decide to not
>> to it like that?
> 
> Right now, is a per arch setup.
> For example, x86_64/powerpc/arm64 will do it inconditionally.

That could indeed break Hyper-V/XEN (if the granularity in which you can
add memory can be smaller than 2MB). Or you have bigger memory blocks.

> 
> If we want to restrict this a per device-driver thing, I guess that we could
> allow to pass a flag to add_memory()->add_memory_resource(), and there
> unset MHP_MEMMAP_FROM_RANGE in case that flag is enabled.
> 
>> You mention ballooning. Now, both XEN and Hyper-V (the only balloon
>> drivers that add new memory as of now), usually add e.g. a 128MB segment
>> to only actually some part of it (e.g. 64MB, but could vary). Now, going
>> ahead and assuming that all memory of a section can be read/written is
>> wrong. A device driver will indicate which pages may actually be used
>> via set_online_page_callback() when new memory is added. But at that
>> point you already happily accessed some memory for vmmap - which might
>> lead to crashes.
>>
>> For now the rule was: Memory that was not onlined will not be
>> read/written, that's why it works for XEN and Hyper-V.
> 
> We do not write all memory of the hot-added section, we just write the
> first 2MB (first 512 pages), the other 126MB are left untouched.

Then that has to be made a rule and we have to make sure that all users
(Hyper-V/XEN) can cope with that.

But it is more problematic because we could have 2GB memory blocks. Then
the 2MB rule does no longer strike. Other archs have other sizes (e.g.
s390x 256MB).

> 
> Assuming that you add a memory-chunk section aligned (128MB), but you only present
> the first 64MB or 32MB to the guest as onlined, we still need to allocate the memmap
> for the whole section.

Yes, that's the right thing to do. (the section will be online but some
parts "fake offline")

> 
> I do not really know the tricks behind Hyper-V/Xen, could you expand on that?

Let's say you want to add 64MB on Hyper-V. What Linux will do is add a
new section (128MB) but only actually online, say the first 64MB (I have
no idea if it has to be the first 64MB actually!).

It will keep the other pages "fake-offline" and online them later on
when e.g. adding another 64MB.

See drivers/hv/hv_balloon.c:
- set_online_page_callback(&hv_online_page);
- hv_bring_pgs_online() -> hv_page_online_one() -> has_pfn_is_backed()

The other 64MB must not be written (otherwise GP!) but eventually be
read for e.g. dumping (although that is also shaky and I am fixing that
right now to make it more reliable).

Long story short: It is better to allow device drivers to make use of
the old behavior until they eventually can make sure that the "altmap?"
can be read/written when adding memory.

It presents a major change in the add_memory() interface.

> 
> So far I only tested this with qemu simulating large machines, but I plan
> to try the balloning thing on Xen.
> 
> At this moment I am working on a second version of this patchset
> to address Dave's feedback.

Cool, keep me tuned :)

> 
> ----
> Oscar Salvador
> SUSE L3 
> 


-- 

Thanks,

David / dhildenb

  reply	other threads:[~2018-11-23 12:11 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-16 10:12 [RFC PATCH 0/4] mm, memory_hotplug: allocate memmap from hotadded memory Oscar Salvador
2018-11-16 10:12 ` [RFC PATCH 1/4] mm, memory_hotplug: cleanup memory offline path Oscar Salvador
2018-11-16 10:12 ` [RFC PATCH 2/4] mm, memory_hotplug: provide a more generic restrictions for memory hotplug Oscar Salvador
2018-11-16 21:39   ` Dave Hansen
2018-11-23 13:00   ` Michal Hocko
2019-01-14 13:17     ` Oscar Salvador
2018-11-16 10:12 ` [RFC PATCH 3/4] mm, memory_hotplug: allocate memmap from the added memory range for sparse-vmemmap Oscar Salvador
2018-11-16 22:41   ` Dave Hansen
2018-11-16 22:41     ` Dave Hansen
2018-11-18 22:54     ` osalvador
2018-11-16 10:12 ` [RFC PATCH 4/4] mm, sparse: rename kmalloc_section_memmap, __kfree_section_memmap Oscar Salvador
2018-11-22  9:21 ` [RFC PATCH 0/4] mm, memory_hotplug: allocate memmap from hotadded memory David Hildenbrand
2018-11-23 11:55   ` Oscar Salvador
2018-11-23 12:11     ` David Hildenbrand [this message]
2018-11-23 12:42     ` Michal Hocko
2018-11-23 12:51       ` David Hildenbrand
2018-11-23 13:12         ` Michal Hocko
2018-11-26 13:05     ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=729f2126-c4ba-e764-3c71-7bd711e44187@redhat.com \
    --to=david@redhat.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=Pavel.Tatashin@microsoft.com \
    --cc=akpm@linux-foundation.org \
    --cc=arunks@codeaurora.org \
    --cc=bhe@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=jglisse@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.com \
    --cc=rppt@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.