linux-kernel.vger.kernel.org archive mirror
From: Doug Berger <opendmb@gmail.com>
To: David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Rob Herring <robh+dt@kernel.org>,
	Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>,
	Frank Rowand <frowand.list@gmail.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Mike Rapoport <rppt@kernel.org>, Christoph Hellwig <hch@lst.de>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>, Borislav Petkov <bp@suse.de>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Neeraj Upadhyay <quic_neeraju@quicinc.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Damien Le Moal <damien.lemoal@opensource.wdc.com>,
	Florian Fainelli <f.fainelli@gmail.com>, Zi Yan <ziy@nvidia.com>,
	Oscar Salvador <osalvador@suse.de>,
	Hari Bathini <hbathini@linux.ibm.com>,
	Kees Cook <keescook@chromium.org>,
	- <devicetree-spec@vger.kernel.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Mel Gorman <mgorman@suse.de>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linux-mm@kvack.org,
	iommu@lists.linux.dev
Subject: Re: [PATCH 00/21] mm: introduce Designated Movable Blocks
Date: Mon, 19 Sep 2022 18:03:55 -0700
Message-ID: <02561695-df44-4df6-c486-1431bf152650@gmail.com>
In-Reply-To: <b610a7b3-d740-8d45-c270-4c638deb1cfa@redhat.com>

On 9/19/2022 2:00 AM, David Hildenbrand wrote:
> Hi Dough,
> 
> I have some high-level questions.
Thanks for your interest. I will attempt to answer them.

> 
>> MOTIVATION:
>> Some Broadcom devices (e.g. 7445, 7278) contain multiple memory
>> controllers with each mapped in a different address range within
>> a Uniform Memory Architecture. Some users of these systems have
> 
> How large are these areas typically?
> 
> How large are they in comparison to other memory in the system?
> 
> How is this memory currently presented to the system?
I'm not certain what is typical because these systems are highly 
configurable and Broadcom's customers have different ideas about 
application processing.

The 7278 device has four ARMv8 CPU cores in an SMP cluster and two 
memory controllers (MEMCs). Each MEMC is capable of controlling up to 
8GB of DRAM. An example 7278 system might have 1GB on each controller, 
so an arm64 kernel might see 1GB on MEMC0 at 0x40000000-0x7FFFFFFF and 
1GB on MEMC1 at 0x300000000-0x33FFFFFFF.
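
As a quick arithmetic check of the example layout, a minimal C sketch
follows. The struct and helper names are illustrative only and are not
kernel code; the addresses are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive physical address range, matching the notation in the text. */
struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Example 7278 layout from the text: 1GB on each memory controller. */
static const struct mem_range memc0 = { 0x40000000ULL, 0x7FFFFFFFULL };
static const struct mem_range memc1 = { 0x300000000ULL, 0x33FFFFFFFULL };

/* Number of bytes covered by an inclusive range. */
static uint64_t range_size(struct mem_range r)
{
	assert(r.end >= r.start);
	return r.end - r.start + 1;
}

/* Bytes of address space lying strictly between range a and range b. */
static uint64_t gap_between(struct mem_range a, struct mem_range b)
{
	assert(b.start > a.end);
	return b.start - a.end - 1;
}
```

Each range works out to 0x40000000 bytes (1GB), and the two controllers
are separated by 0x280000000 bytes (10GB) of address space that holds no
DRAM from these two ranges.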

The Designated Movable Block concept introduced here has the potential
to offer useful services to different constituencies. I tried to
highlight this in my V1 patch set in the hope of attracting some
interest, but it can complicate the overall discussion, so I would like
to narrow the discussion here. It may be good to keep those other uses
in mind when assessing the overall value, but perhaps the "other
opportunities" can be covered in a follow-on discussion.

The base capability described in commits 7-15 of this V1 patch set is to 
allow a 'movablecore' block to be created at a particular base address 
rather than solely at the end of addressable memory.

> 
>> expressed the desire to locate ZONE_MOVABLE memory on each
>> memory controller to allow user space intensive processing to
>> make better use of the additional memory bandwidth.
> 
> Can you share some more about how exactly ZONE_MOVABLE would help
> here to make better use of the memory bandwidth?
ZONE_MOVABLE memory is effectively unusable by the kernel for its own
(unmovable) allocations. It can be used by user space applications
through both the page allocator and hugetlbfs. If a large 'movablecore'
allocation is defined and it can only be located at the end of
addressable memory, then it will always be located on MEMC1 of a 7278
system. This will create a tendency for user space accesses to consume
more bandwidth on the MEMC1 memory controller and kernel space accesses
to consume more bandwidth on MEMC0. A more even distribution of
ZONE_MOVABLE memory between the available memory controllers in theory
makes more memory bandwidth available to user space intensive loads.
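
The placement argument can be illustrated with a small C sketch that
counts how many movable bytes land on each controller. This is a toy
model, not kernel code: the helper names, the 1GB movable size and the
base addresses of the split blocks are assumptions chosen for the
example 7278 layout described earlier.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/* Half-open physical address range [start, end). */
struct span {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* Example layout: 1GB on MEMC0 and 1GB on MEMC1. */
static const struct span memc0 = { 0x40000000ULL, 0x80000000ULL };
static const struct span memc1 = { 0x300000000ULL, 0x340000000ULL };

/* Case 1: a single 1GB movable block at the end of addressable memory. */
static const struct span tail_blocks[] = {
	{ 0x300000000ULL, 0x340000000ULL },
};

/* Case 2 (hypothetical bases): two 512MB blocks, one per controller. */
static const struct span split_blocks[] = {
	{ 0x60000000ULL, 0x80000000ULL },
	{ 0x320000000ULL, 0x340000000ULL },
};

/* Bytes shared by two half-open ranges. */
static uint64_t overlap(struct span a, struct span b)
{
	assert(a.end >= a.start && b.end >= b.start);
	uint64_t lo = a.start > b.start ? a.start : b.start;
	uint64_t hi = a.end < b.end ? a.end : b.end;
	return hi > lo ? hi - lo : 0;
}

/* Total movable bytes that a set of blocks places on one controller. */
static uint64_t movable_on(struct span memc, const struct span *blocks, int nr)
{
	uint64_t total = 0;

	for (int i = 0; i < nr; i++)
		total += overlap(memc, blocks[i]);
	return total;
}
```

With the tail placement, MEMC0 carries no movable memory and MEMC1
carries all of it; with the split placement each controller carries
512MB, which is the more even distribution argued for above.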

> 
>> Unfortunately, the historical monotonic layout of zones would
>> mean that if the lowest addressed memory controller contains
>> ZONE_MOVABLE memory then all of the memory available from
>> memory controllers at higher addresses must also be in the
>> ZONE_MOVABLE zone. This would force all kernel memory accesses
>> onto the lowest addressed memory controller and significantly
>> reduce the amount of memory available for non-movable
>> allocations.
> 
> We do have code that relies on zones during boot to not overlap within a 
> single node.
I believe my changes address all such reliance, but if you are aware of 
something I missed please let me know.

> 
>>
>> The main objective of this patch set is therefore to allow a
>> block of memory to be designated as part of the ZONE_MOVABLE
>> zone where it will always only be used by the kernel page
>> allocator to satisfy requests for movable pages. The term
>> Designated Movable Block is introduced here to represent such a
>> block. The favored implementation allows modification of the
> 
> Sorry to say, but that term is rather suboptimal to describe what you 
> are doing here. You simply have some system RAM you'd want to have 
> managed by ZONE_MOVABLE, no?
That may be true, but I found it superior to the 'sticky' movable 
terminology put forth by Mel Gorman ;). I'm happy to entertain 
alternatives, but they may not be as easy to find as you think.

> 
>> 'movablecore' kernel parameter to allow specification of a base
>> address and support for multiple blocks. The existing
>> 'movablecore' mechanisms are retained. Other mechanisms based on
>> device tree are also included in this set.
>>
>> BACKGROUND:
>> NUMA architectures support distributing movablecore memory
>> across each node, but it is undesirable to introduce the
>> overhead and complexities of NUMA on systems that don't have a
>> Non-Uniform Memory Architecture.
> 
> What exactly would that look like? I think I am missing something :)
The notion would be to consider each memory controller as a separate 
node, but as stated it is not desirable.

> 
>>
>> Commit 342332e6a925 ("mm/page_alloc.c: introduce kernelcore=mirror 
>> option")
>> also depends on zone overlap to support systems with multiple
>> mirrored ranges.
> 
> IIRC, zones will not overlap within a single node.
I believe the implementation for kernelcore=mirror allows for the 
possibility of multiple non-adjacent mirrored ranges in a single node 
and accommodates the zone overlap.
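
One way to picture zone overlap is through the difference between the
pages a zone spans and the pages present in it. The sketch below is a
toy model in C, not the kernel implementation: the type and helper
names, the 4KB page size, the bank PFNs and the two block locations are
all assumptions made for illustration. It follows the cover letter's
description, where ZONE_MOVABLE spans from the lowest Designated
Movable Block to the end of the node and the blocks are counted as
absent from the zone they overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open page frame number range [start_pfn, end_pfn). */
struct pfn_range {
	uint64_t start_pfn;
	uint64_t end_pfn;	/* exclusive */
};

/* Toy layout with 4KB pages: two 1GB banks, two 512MB movable blocks. */
static const struct pfn_range toy_banks[] = {
	{ 0x40000, 0x80000 },
	{ 0x300000, 0x340000 },
};
static const struct pfn_range toy_dmbs[] = {
	{ 0x60000, 0x80000 },
	{ 0x320000, 0x340000 },
};

/* Pages shared by two ranges. */
static uint64_t overlap_pages(struct pfn_range a, struct pfn_range b)
{
	uint64_t lo = a.start_pfn > b.start_pfn ? a.start_pfn : b.start_pfn;
	uint64_t hi = a.end_pfn < b.end_pfn ? a.end_pfn : b.end_pfn;

	return hi > lo ? hi - lo : 0;
}

/* Pages of the listed ranges that fall inside a zone's span. */
static uint64_t sum_overlap(struct pfn_range span,
			    const struct pfn_range *r, int nr)
{
	uint64_t total = 0;

	for (int i = 0; i < nr; i++)
		total += overlap_pages(span, r[i]);
	return total;
}

/* Movable zone span: lowest block start up to the end of the node. */
static struct pfn_range movable_span(const struct pfn_range *dmbs, int nr,
				     uint64_t node_end_pfn)
{
	struct pfn_range span = { node_end_pfn, node_end_pfn };

	for (int i = 0; i < nr; i++)
		if (dmbs[i].start_pfn < span.start_pfn)
			span.start_pfn = dmbs[i].start_pfn;
	return span;
}

/* Present pages of an overlapped zone: populated pages minus block pages. */
static uint64_t present_in_overlapped_zone(struct pfn_range span,
					   const struct pfn_range *banks,
					   int nr_banks,
					   const struct pfn_range *dmbs,
					   int nr_dmbs)
{
	uint64_t populated = sum_overlap(span, banks, nr_banks);
	uint64_t designated = sum_overlap(span, dmbs, nr_dmbs);

	assert(designated <= populated);
	return populated - designated;
}
```

In this toy layout the movable zone spans 0x2E0000 pages but only
0x40000 of them are present, and a zone spanning both banks likewise
ends up with 0x40000 present pages once the blocks are excluded.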

> 
>>
>> Commit c6f03e2903c9 ("mm, memory_hotplug: remove zone restrictions")
>> embraced overlapped zones for memory hotplug.
> 
> Yes, after boot.
> 
>>
>> This commit set follows their lead to allow the ZONE_MOVABLE
>> zone to overlap other zones while spanning the pages from the
>> lowest Designated Movable Block to the end of the node.
>> Designated Movable Blocks are made absent from overlapping zones
>> and present within the ZONE_MOVABLE zone.
>>
>> I initially investigated an implementation using a Designated
>> Movable migrate type in line with comments[1] made by Mel Gorman
>> regarding a "sticky" MIGRATE_MOVABLE type to avoid using
>> ZONE_MOVABLE. However, this approach was riskier since it was
>> much more intrusive on the allocation paths. Ultimately, the
>> progress made by the memory hotplug folks to expand the
>> ZONE_MOVABLE functionality convinced me to follow this approach.
>>
>> OPPORTUNITIES:
>> There have been many attempts to modify the behavior of the
>> kernel page allocator's use of CMA regions. This implementation
>> of Designated Movable Blocks creates an opportunity to repurpose
>> the CMA allocator to operate on ZONE_MOVABLE memory that the
>> kernel page allocator can use more aggressively, without
>> affecting the existing CMA implementation. It is hoped that the
>> "shared-dmb-pool" approach included here will be useful in cases
>> where memory sharing is more important than allocation latency.
>>
>> CMA introduced a paradigm where multiple allocators could
>> operate on the same region of memory, and that paradigm can be
>> extended to Designated Movable Blocks as well. I was interested
>> in using kernel resource management as a mechanism for exposing
>> Designated Movable Block resources (e.g. /proc/iomem) that would
>> be used by the kernel page allocator like any other ZONE_MOVABLE
>> memory, but could be claimed by an alternative allocator (e.g.
>> CMA). Unfortunately, this becomes complicated because the kernel
>> resource implementation varies materially across different
>> architectures and I do not require this capability so I have
>> deferred that.
> 
> Why can't we simply designate these regions as CMA regions?
We and others have encountered significant performance issues when large 
CMA regions are used. There are significant restrictions on the page 
allocator's use of MIGRATE_CMA pages and the memory subsystem works very 
hard to keep about half of the memory in the CMA region free. There have 
been attempts to patch the CMA implementation to alter this behavior 
(for example the set I referenced Mel's response to in [1]), but there 
are users that desire the current behavior.

> 
> Why do we have to start using ZONE_MOVABLE for them?
One of the "other opportunities" for Designated Movable Blocks is to 
allow CMA to allocate from a DMB as an alternative. This would allow 
current users to continue using CMA as they want, but would allow users 
(e.g. hugetlb_cma) that are not sensitive to the allocation latency to 
let the kernel page allocator make more complete use (i.e. waste less) 
of the shared memory. ZONE_MOVABLE pageblocks are always MIGRATE_MOVABLE 
so the restrictions placed on MIGRATE_CMA pageblocks are lifted within a 
DMB.

> 
Thanks for your consideration,
Dough Baker ... I mean Doug Berger :).
