Date: Tue, 5 Oct 2021 17:35:46 -0700
From: Mike Rapoport
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Jonathan Corbet,
    Michal Hocko, Oscar Salvador, linux-doc@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [PATCH v1 3/3] memory-hotplug.rst: document the "auto-movable" online policy
References: <20210930144117.23641-1-david@redhat.com> <20210930144117.23641-4-david@redhat.com>
In-Reply-To: <20210930144117.23641-4-david@redhat.com>

On Thu, Sep 30, 2021 at 04:41:17PM +0200, David Hildenbrand wrote:
> In commit e83a437faa62 ("mm/memory_hotplug: introduce "auto-movable" online
> policy") we introduced a new memory online policy to automatically
> select a zone for memory blocks to be onlined. We added a way to
> set the active online policy and tunables for the auto-movable online
> policy. In follow-up commits we tweaked the "auto-movable" policy to also
> consider memory device details when selecting zones for memory blocks to
> be onlined.
>
> Let's document the new toggles and how the two online policies we have
> work.
>
> Signed-off-by: David Hildenbrand
> ---
>  .../admin-guide/mm/memory-hotplug.rst | 128 +++++++++++++++---
>  1 file changed, 108 insertions(+), 20 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
> index ee00b70dedde..c20a2c0031cf 100644
> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
> @@ -165,9 +165,8 @@ Or alternatively::
>
>  	% echo 1 > /sys/devices/system/memory/memoryXXX/online
>
> -The kernel will select the target zone automatically, usually defaulting to
> -``ZONE_NORMAL`` unless ``movable_node`` has been specified on the kernel
> -command line or if the memory block would intersect the ZONE_MOVABLE already.
> +The kernel will select the target zone automatically, depending on the
> +configured ``online_policy``.
>
>  One can explicitly request to associate an offline memory block with
>  ZONE_MOVABLE by::
> @@ -198,6 +197,9 @@ Auto-onlining can be enabled by writing ``online``, ``online_kernel`` or
>
>  	% echo online > /sys/devices/system/memory/auto_online_blocks
>
> +Similarly to manual onlining, with ``online`` the kernel will select the
> +target zone automatically, depending on the configured ``online_policy``.
> +
>  Modifying the auto-online behavior will only affect all subsequently added
>  memory blocks only.
>
> @@ -393,9 +395,11 @@ command line parameters are relevant:
>  ======================== =======================================================
>  ``memhp_default_state``  configure auto-onlining by essentially setting
>                           ``/sys/devices/system/memory/auto_online_blocks``.
> -``movable_node``         configure automatic zone selection in the kernel. When
> -                         set, the kernel will default to ZONE_MOVABLE, unless
> -                         other zones can be kept contiguous.
> +``movable_node``         configure automatic zone selection in the kernel when
> +                         using the ``contig-zones`` online policy. When
> +                         set, the kernel will default to ZONE_MOVABLE when
> +                         onlining a memory block, unless other zones can be kept
> +                         contiguous.

The main purpose of movable_node is to allow unplugging an entire node;
zone selection is a consequence of this. You may want to cite the
description of movable_node in kernel-parameters.txt here.

And, pardon my ignorance, how will movable_node play with the
auto-movable policy?

>  ======================== =======================================================
>
>  Module Parameters
> @@ -414,20 +418,104 @@ and they can be observed (and some even modified at runtime) via::
>
>  The following module parameters are currently defined:
>
> -======================== =======================================================
> -``memmap_on_memory``     read-write: Allocate memory for the memmap from the
> -                         added memory block itself. Even if enabled, actual
> -                         support depends on various other system properties and
> -                         should only be regarded as a hint whether the behavior
> -                         would be desired.
> -
> -                         While allocating the memmap from the memory block
> -                         itself makes memory hotplug less likely to fail and
> -                         keeps the memmap on the same NUMA node in any case, it
> -                         can fragment physical memory in a way that huge pages
> -                         in bigger granularity cannot be formed on hotplugged
> -                         memory.
> -======================== =======================================================
> +================================ ===============================================
> +``memmap_on_memory``             read-write: Allocate memory for the memmap from
> +                                 the added memory block itself. Even if enabled,
> +                                 actual support depends on various other system
> +                                 properties and should only be regarded as a
> +                                 hint whether the behavior would be desired.
> +
> +                                 While allocating the memmap from the memory
> +                                 block itself makes memory hotplug less likely
> +                                 to fail and keeps the memmap on the same NUMA
> +                                 node in any case, it can fragment physical
> +                                 memory in a way that huge pages in bigger
> +                                 granularity cannot be formed on hotplugged
> +                                 memory.
> +``online_policy``                read-write: Set the basic policy used for
> +                                 automatic zone selection when onlining memory
> +                                 blocks without specifying a target zone.
> +                                 ``contig-zones`` has been the kernel default
> +                                 before this parameter was added. After an
> +                                 online policy was configured and memory was
> +                                 online, the policy should not be changed
> +                                 anymore.
> +
> +                                 When set to ``contig-zones``, the kernel will
> +                                 try keeping zones contiguous. If a memory block
> +                                 intersects multiple zones or no zone, the
> +                                 behavior depends on the ``movable_node`` kernel
> +                                 command line parameter: default to ZONE_MOVABLE
> +                                 if set, default to the applicable kernel zone
> +                                 (usually ZONE_NORMAL) if not set.
> +
> +                                 When set to ``auto-movable``, the kernel will
> +                                 try onlining memory blocks to ZONE_MOVABLE if
> +                                 possible according to the configuration and
> +                                 memory device details. With this policy, one
> +                                 can avoid zone imbalances when eventually
> +                                 hotplugging a lot of memory later and still
> +                                 wanting to be able to hotunplug as much as
> +                                 possible reliably, very desirable in
> +                                 virtualized environments. As one example, a
> +                                 hotplugged DIMM will be onlined either
> +                                 completely to ZONE_MOVABLE or completely to
> +                                 ZONE_NORMAL, not a mixture.
> +                                 As another example, as many memory blocks
> +                                 belonging to a virtio-mem device will be
> +                                 onlined to ZONE_MOVABLE as possible,
> +                                 special-casing units of memory blocks that can
> +                                 only get hotunplugged together. *This policy
> +                                 does not protect from setups that are
> +                                 problematic with ZONE_MOVABLE and does not
> +                                 change the zone of memory blocks dynamically
> +                                 after they were onlined.*
> +``auto_movable_ratio``           read-write: Set the maximum MOVABLE:KERNEL
> +                                 memory ratio in % for the ``auto-movable``
> +                                 online policy. Whether the ratio applies only
> +                                 for the system across all NUMA nodes or also
> +                                 per NUMA nodes depends on the
> +                                 ``auto_movable_numa_aware`` configuration.
> +
> +                                 All accounting is based on present memory pages
> +                                 in the zones combined with accounting per
> +                                 memory device. Memory dedicated to the CMA
> +                                 allocator is accounted as MOVABLE, although
> +                                 residing on one of the kernel zones. The
> +                                 possible ratio depends on the actual workload.
> +                                 The kernel default is "301" %, for example,
> +                                 allowing for hotplugging 24 GiB to a 8 GiB VM
> +                                 and automatically onlining all hotplugged
> +                                 memory to ZONE_MOVABLE in many setups. The
> +                                 additional 1% deals with some pages being not
> +                                 present, for example, because of some firmware
> +                                 allocations.
> +
> +                                 Note that ZONE_NORMAL memory provided by one
> +                                 memory device does not allow for more
> +                                 ZONE_MOVABLE memory for a different memory
> +                                 device. As one example, onlining memory of a
> +                                 hotplugged DIMM to ZONE_NORMAL will not allow
> +                                 for another hotplugged DIMM to get onlined to
> +                                 ZONE_MOVABLE automatically. In contrast, memory
> +                                 hotplugged by a virtio-mem device that got
> +                                 onlined to ZONE_NORMAL will allow for more
> +                                 ZONE_MOVABLE memory within *the same*
> +                                 virtio-mem device.
> +``auto_movable_numa_aware``      read-write: Configure whether the
> +                                 ``auto_movable_ratio`` in the ``auto-movable``
> +                                 online policy also applies per NUMA
> +                                 node in addition to the whole system across all
> +                                 NUMA nodes. The kernel default is "Y".
> +
> +                                 Disabling NUMA awareness can be helpful when
> +                                 dealing with NUMA nodes that should be
> +                                 completely hotunpluggable, onlining the memory
> +                                 completely to ZONE_MOVABLE automatically if
> +                                 possible.
> +
> +                                 Parameter availability depends on CONFIG_NUMA.
> +================================ ===============================================
>
>  ZONE_MOVABLE
>  ============
> --
> 2.31.1
>

-- 
Sincerely yours,
Mike.
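
As an illustration of the ``auto_movable_ratio`` arithmetic in the quoted
documentation (default "301" %, 24 GiB hotplugged to an 8 GiB VM), here is a
minimal Python sketch. It is a simplified model, not the kernel
implementation: the helper name ``can_online_movable`` is hypothetical, the
16 MiB of non-present memory is an assumed figure, and CMA accounting,
per-device accounting and NUMA awareness are all ignored.

```python
GIB = 1 << 30
MIB = 1 << 20


def can_online_movable(kernel_bytes, movable_bytes, new_bytes, ratio_percent=301):
    """Simplified model (hypothetical helper, not kernel code).

    Return True if onlining new_bytes to ZONE_MOVABLE would keep the
    MOVABLE:KERNEL ratio at or below ratio_percent.
    """
    # Largest amount of MOVABLE memory the ratio permits, rounded down.
    limit = ratio_percent * kernel_bytes // 100
    return movable_bytes + new_bytes <= limit


if __name__ == "__main__":
    # Assumed setup: an 8 GiB VM where 16 MiB is not present
    # (for example, taken by firmware allocations).
    kernel = 8 * GIB - 16 * MIB

    # Default ratio of 301 %: hotplugging 24 GiB to ZONE_MOVABLE.
    print(can_online_movable(kernel, 0, 24 * GIB))
    # Exact 300 %: the same request with no slack for missing pages.
    print(can_online_movable(kernel, 0, 24 * GIB, ratio_percent=300))
```

Under these assumed figures the first check passes and the second does not,
which matches the role the quoted text gives to the additional 1%.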