From: David Hildenbrand <david@redhat.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Marek Kedzierski <mkedzier@redhat.com>,
Hui Zhu <teawater@gmail.com>,
Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
Wei Yang <richard.weiyang@linux.alibaba.com>,
Michal Hocko <mhocko@kernel.org>,
Dan Williams <dan.j.williams@intel.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
Len Brown <lenb@kernel.org>,
Pavel Tatashin <pasha.tatashin@soleen.com>,
virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
linux-acpi@vger.kernel.org
Subject: Re: [PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory groups
Date: Tue, 8 Jun 2021 12:12:09 +0200
Message-ID: <9ab50bc0-1714-67c4-ea9a-79e7d315315b@redhat.com>
In-Reply-To: <20210608094244.GA22894@linux>
On 08.06.21 11:42, Oscar Salvador wrote:
> On Mon, Jun 07, 2021 at 09:54:18PM +0200, David Hildenbrand wrote:
>> Hi,
>>
>> this series aims at improving in-kernel auto-online support. It tackles the
>> fundamental problems that:
>
> Hi David,
>
> the idea sounds good to me, and I like that this series takes away part of the
> responsibility from the user to know where the memory should go.
> I think the kernel is a much better fit for that as it has all the required
> information to balance things.
>
> I also glanced over the series and besides some things here and there the
> whole approach looks sane.
> I plan to have a look into it in a few days, just have some high level questions
> for the time being:
Hi Oscar,
>
>> 1) We can create zone imbalances when onlining all memory blindly to
>> ZONE_MOVABLE, in the worst case crashing the system. We have to know
>> upfront how much memory we are going to hotplug such that we can
>> safely enable auto-onlining of all hotplugged memory to ZONE_MOVABLE
>> via "online_movable". This is far from practical and only applicable in
>> limited setups -- like inside VMs under the RHV/oVirt hypervisor which
>> will never hotplug more than 3 times the boot memory (and the
>> limitation is only in place due to the Linux limitation).
>
> Could you give more insight about the problems created by zone imbalances
> (e.g., a lot of movable memory and little kernel memory)?
I just updated memory-hotplug.rst exactly for that purpose :)
https://lkml.kernel.org/r/20210525102604.8770-1-david@redhat.com
There, also safe zone ratios and "usually well known values" are given.
I can link it in the next cover letter.
>
>> 2) We see more setups that implement dynamic VM resizing, hot(un)plugging
>> memory to resize VM memory. In these setups, we might hotplug a lot of
>> memory, but it might happen in various small steps in both directions
>> (e.g., 2 GiB -> 8 GiB -> 4 GiB -> 16 GiB ...). virtio-mem is the
>> primary driver of this upstream right now, performing such dynamic
>> resizing NUMA-aware via multiple virtio-mem devices.
>>
>> Onlining all hotplugged memory to ZONE_NORMAL means we basically have
>> no hotunplug guarantees. Onlining all to ZONE_MOVABLE means we can
>> easily run into zone imbalances when growing a VM. We want a mixture,
>> and we want as much memory as reasonable/configured in ZONE_MOVABLE.
>>
>> 3) Memory devices consist of 1..X memory block devices, however, the
>> kernel doesn't really track the relationship. Consequently, also user
>> space has no idea. We want to make per-device decisions. As one
>> example, for memory hotunplug it doesn't make sense to use a mixture of
>> zones within a single DIMM: we want all MOVABLE if possible, otherwise
>> all !MOVABLE, because any !MOVABLE part will easily block the DIMM from
>> getting hotunplugged. As another example, virtio-mem operates on
>> individual units that span 1..X memory blocks. Similar to a DIMM, we
>> want a unit to either be all MOVABLE or !MOVABLE. Further, we want
>> as much memory of a virtio-mem device to be MOVABLE as possible.
>
> So, a virtio-mem unit could be seen as DIMM right?
It's a bit more complicated. Each individual unit (e.g., a 128 MiB
memory block) is the smallest granularity we can add/remove of that
device. So such a unit is somewhat like a DIMM. However, all "units" of
the device can interact -- it's a single memory device.
>
>> 4) We want memory onlining to be done right from the kernel while adding
>> memory; for example, this is required for fast memory hotplug for
>> drivers that add individual memory blocks, like virtio-mem. We want a
>> way to configure a policy in the kernel and avoid implementing advanced
>> policies in user space.
>
> "we want memory onlining to be done right from the kernel while adding memory"
>
> is not that always the case when a driver adds memory? User has no interaction
> with that right?
Well, with auto-onlining in the kernel disabled, user space has to do
the onlining -- for example via udev rules right now in major distributions.
But there are also users that always want to online manually in user
space to select a zone. Most prominently standby memory on s390x, but
also in some cases dax/kmem memory. But these two are really corner
cases. In general, we want hotplugged memory to be onlined immediately.
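
To illustrate what "user space has to do the onlining" means in practice, here is a minimal Python sketch of what such a helper might do. It relies on the per-block "state" files under /sys/devices/system/memory; the function name and the idea of passing the sysfs root as a parameter are only assumptions made for illustration, and real udev rules look different.

```python
from pathlib import Path


def online_offline_blocks(sysfs_root, online_type):
    """Request onlining of every memory block that is currently offline.

    sysfs_root is normally /sys/devices/system/memory; it is a parameter
    here so the sketch can be tried against a scratch directory.
    online_type is one of "online", "online_kernel" or "online_movable".
    """
    onlined = []
    for state in sorted(Path(sysfs_root).glob("memory*/state")):
        if state.read_text().strip() == "offline":
            # On a real system this write can fail (OSError), for example
            # when the requested zone is not possible for this block.
            state.write_text(online_type)
            onlined.append(state.parent.name)
    return onlined
```

A user that wants to pick the zone manually would pass "online_movable" or "online_kernel" per block instead of treating all blocks the same way.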
>
>> The auto-onlining support we have in the kernel is not sufficient. All we
>> have is a) online everything movable (online_movable) b) online everything
>> !movable (online_kernel) c) keep zones contiguous (online). This series
>> allows configuring c) to mean instead "online movable if possible according
>> to the configuration, driven by a maximum MOVABLE:KERNEL ratio" -- a new
>> onlining policy.
>>
>> This series does 3 things:
>>
>> 1) Introduces the "auto-movable" online policy that initially operates on
>> individual memory blocks only. It uses a maximum MOVABLE:KERNEL ratio
>> to make a decision whether a memory block will be onlined to
>> ZONE_MOVABLE or not. However, in the basic form, hotplugged KERNEL
>> memory does not allow for more MOVABLE memory (details in the
>> patches). CMA memory is treated like MOVABLE memory.
>
> How would a user know which ratio is sane? Could we add some info in the
> Docu part that kinda sets some "basic" rules?
Again, currently resides in the memory-hotplug.rst overhaul.
>
>> 2) Introduces static (e.g., DIMM) and dynamic (e.g., virtio-mem) memory
>> groups and uses group information to make decisions in the
>> "auto-movable" online policy across memory blocks of a single memory
>> device (modeled as memory group).
>
> So, the distinction being that a DIMM cannot grow larger but we can add more
> memory to a virtio-mem unit? I feel I am missing some insight here.
Right, the relevant patch contains more info.
You either plug or unplug a DIMM (or a NUMA node which spans multiple
DIMMs) -- both are ACPI memory devices that span multiple physical
regions. You cannot unplug parts of a DIMM or grow it. That is what
"static" means, and it matches the ACPI code, which adds and removes all
memory of a memory device in one go.
virtio-mem behaves differently, as it's a single physical memory region
in which we dynamically add or remove memory. The granularity in which
we add/remove memory from Linux is a "unit". In the simplest case, it's
just a single memory block (e.g., 128 MiB). So it's a memory device that
can grow/shrink in the given unit -- "dynamic".
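
A toy Python model of the two group types may make the difference more concrete. This is only an illustration of the semantics described above, not the data structures from the patches; all class and method names are hypothetical.

```python
from dataclasses import dataclass, field

MEMORY_BLOCK_MIB = 128  # memory block size used in the example above


@dataclass
class StaticGroup:
    """Hypothetical model of a DIMM: all memory is added or removed in one go."""
    size_mib: int
    plugged: bool = False

    def plug(self):
        self.plugged = True

    def unplug(self):
        self.plugged = False

    def plugged_mib(self):
        return self.size_mib if self.plugged else 0


@dataclass
class DynamicGroup:
    """Hypothetical model of a virtio-mem device: one region, resized per unit."""
    region_mib: int
    unit_mib: int = MEMORY_BLOCK_MIB
    plugged_units: set = field(default_factory=set)

    def nr_units(self):
        return self.region_mib // self.unit_mib

    def plug_unit(self, idx):
        if not 0 <= idx < self.nr_units():
            raise IndexError("unit outside of the device region")
        self.plugged_units.add(idx)

    def unplug_unit(self, idx):
        self.plugged_units.discard(idx)

    def plugged_mib(self):
        return len(self.plugged_units) * self.unit_mib
```

The static group only knows "everything or nothing", while the dynamic group can grow and shrink unit by unit within its region.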
>
>> 3) Maximizes ZONE_MOVABLE memory within dynamic memory groups, by
>> allowing ZONE_NORMAL memory within a dynamic memory group to allow for
>> more ZONE_MOVABLE memory within the same memory group. The target use
>> case is dynamic VM resizing using virtio-mem.
>
> Sorry, I got lost in this one. Care to explain a bit more?
The virtio-mem example below should make this a bit clearer (in
addition to the relevant patch), especially in contrast to static memory
devices like DIMMs. Key is that a single virtio-mem device is a "dynamic
memory group" in which memory can get added/removed dynamically in a
given unit granularity. And we want to special-case that type of device
so that as much memory of a virtio-mem device as possible (and as
configured) ends up MOVABLE.
>
>> The target usage will be:
>>
>> 1) Linux boots with "mhp_default_online_type=offline"
>>
>> 2) User space (e.g., systemd unit) configures memory onlining (according
>> to a config file and system properties), for example:
>> * Setting memory_hotplug.online_policy=auto-movable
>> * Setting memory_hotplug.auto_movable_ratio=301
>> * Setting memory_hotplug.auto_movable_numa_aware=true
>
> I think we would need to document those in order to let the user know what
> it is best for them. e.g: when do we want to enable auto_movable_numa_aware etc.
Yes, as mentioned below, a memory-hotplug.rst update will follow once
the overhaul is done. The respective patch contains more information.
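
Until that documentation exists, a rough Python sketch of the user space configuration step quoted above could look as follows. The three parameter names and values are the ones from the cover letter; the assumption is that they show up as module parameters in a directory such as /sys/module/memory_hotplug/parameters, and the helper name is made up for illustration.

```python
from pathlib import Path

# Values taken from the cover letter; the sysfs location is an assumption.
POLICY = {
    "online_policy": "auto-movable",
    "auto_movable_ratio": "301",
    "auto_movable_numa_aware": "true",
}


def apply_policy(params_dir, policy=POLICY):
    """Write each policy setting to the file of the same name in params_dir."""
    params_dir = Path(params_dir)
    for name, value in policy.items():
        (params_dir / name).write_text(value)
```

Taking the directory as an argument keeps the sketch independent of where the parameters finally end up.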
>
>> For DIMMs, hotplugging 4 GiB DIMMs to a 4 GiB VM with a configured ratio of
>> 301% results in the following layout:
>> Memory block 1-15: DMA32 (early)
>> Memory block 32-47: Normal (early)
>> Memory block 48-79: Movable (DIMM 0)
>> Memory block 80-111: Movable (DIMM 1)
>> Memory block 112-143: Movable (DIMM 2)
>> Memory block 144-175: Normal (DIMM 3)
>> Memory block 176-207: Normal (DIMM 4)
>> ... all Normal
>> (-> hotplugged Normal memory does not allow for more Movable memory)
>
> Uhm, I am sorry for being dense here:
>
> On x86_64, 4GB = 32 sections (of 128MB each). Why the memblock span from #1 to #47?
Sorry, it's actually "Memory block 0-15", which gives us 0-15 and 32-47
== 32 memory blocks corresponding to boot memory. Note that the absent
memory blocks 16-31 should correspond to the PCI hole.
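
To make the arithmetic behind the DIMM example concrete, here is a rough Python sketch (not code from the series). It counts in 128 MiB memory blocks instead of pages and assumes that only boot memory counts as KERNEL memory for the ratio, which is a simplification of what the patches do. Each 4 GiB DIMM spans 32 blocks, so with 32 boot blocks and a ratio of 301% only the first three DIMMs fit into ZONE_MOVABLE.

```python
BLOCK_MIB = 128                          # x86-64 memory block size from the thread
BLOCKS_PER_DIMM = 4 * 1024 // BLOCK_MIB  # a 4 GiB DIMM spans 32 blocks


def dimm_zones(boot_blocks, ratio_percent, nr_dimms):
    """Decide a zone per hotplugged DIMM under a MOVABLE:KERNEL ratio.

    Simplification: only boot ("early") kernel memory counts towards the
    ratio, so DIMMs onlined to Normal do not allow for more Movable memory.
    """
    movable = 0
    zones = []
    for _ in range(nr_dimms):
        # Online the whole DIMM Movable only if the ratio still holds afterwards.
        if (movable + BLOCKS_PER_DIMM) * 100 <= ratio_percent * boot_blocks:
            movable += BLOCKS_PER_DIMM
            zones.append("Movable")
        else:
            zones.append("Normal")
    return zones


def block_ranges(first_block, zones):
    """Map each DIMM to its inclusive memory block range."""
    out = []
    for i, zone in enumerate(zones):
        start = first_block + i * BLOCKS_PER_DIMM
        out.append((start, start + BLOCKS_PER_DIMM - 1, zone))
    return out


# Boot memory: blocks 0-15 and 32-47, i.e. 32 blocks; DIMMs start at block 48.
layout = block_ranges(48, dimm_zones(boot_blocks=32, ratio_percent=301, nr_dimms=5))
for dimm, (start, end, zone) in enumerate(layout):
    print(f"Memory block {start}-{end}: {zone} (DIMM {dimm})")
```

In this block-based model, three Movable DIMMs (96 blocks) against 32 boot blocks sit just under the 301% limit, and a fourth one (128 blocks) would exceed it.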
Thanks Oscar!
--
Thanks,
David / dhildenb
Thread overview: 23+ messages
2021-06-07 19:54 [PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory groups David Hildenbrand
2021-06-07 19:54 ` [PATCH v1 01/12] mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() David Hildenbrand
2021-06-07 19:54 ` [PATCH v1 02/12] mm: track present early pages per zone David Hildenbrand
2021-06-07 19:54 ` [PATCH v1 03/12] mm/memory_hotplug: introduce "auto-movable" online policy David Hildenbrand
2021-06-07 19:54 ` [PATCH v1 04/12] mm/memory_hotplug: remove nid parameter from arch_remove_memory() David Hildenbrand
2021-06-08 8:32 ` Catalin Marinas
2021-06-08 10:50 ` Michael Ellerman
2021-06-09 5:51 ` Heiko Carstens
2021-06-07 19:54 ` [PATCH v1 05/12] mm/memory_hotplug: remove nid parameter from remove_memory() and friends David Hildenbrand
2021-06-08 11:11 ` Michael Ellerman
2021-06-08 11:18 ` David Hildenbrand
2021-06-09 10:05 ` David Hildenbrand
2021-06-07 19:54 ` [PATCH v1 06/12] drivers/base/memory: "memory groups" to logically group memory blocks David Hildenbrand
2021-06-07 19:54 ` [PATCH v1 07/12] mm/memory_hotplug: track present pages in memory groups David Hildenbrand
2021-06-07 19:54 ` [PATCH v1 08/12] ACPI: memhotplug: memory resources cannot be enabled yet David Hildenbrand
2021-06-08 12:20 ` Rafael J. Wysocki
2021-06-07 19:54 ` [PATCH v1 09/12] ACPI: memhotplug: use a single static memory group for a single memory device David Hildenbrand
2021-06-08 12:20 ` Rafael J. Wysocki
2021-06-07 19:54 ` [PATCH v1 10/12] virtio-mem: use a single dynamic memory group for a single virtio-mem device David Hildenbrand
2021-06-07 19:54 ` [PATCH v1 11/12] mm/memory_hotplug: memory group aware "auto-movable" online policy David Hildenbrand
2021-06-07 19:54 ` [PATCH v1 12/12] mm/memory_hotplug: improved dynamic " David Hildenbrand
2021-06-08 9:42 ` [PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory groups Oscar Salvador
2021-06-08 10:12 ` David Hildenbrand [this message]