linux-kernel.vger.kernel.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: David Hildenbrand <david@redhat.com>, Zi Yan <ziy@nvidia.com>,
	linux-mm@kvack.org
Cc: Matthew Wilcox <willy@infradead.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Michal Hocko <mhocko@kernel.org>,
	John Hubbard <jhubbard@nvidia.com>,
	linux-kernel@vger.kernel.org, Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [RFC PATCH 00/15] Make MAX_ORDER adjustable as a kernel boot time parameter.
Date: Fri, 6 Aug 2021 18:54:14 +0200	[thread overview]
Message-ID: <3417eb98-36c8-5459-c83e-52f90e42a146@suse.cz> (raw)
In-Reply-To: <72b317e5-c78a-f0bc-fe69-f82261ec252e@redhat.com>

On 8/6/21 6:16 PM, David Hildenbrand wrote:
> On 06.08.21 17:36, Vlastimil Babka wrote:
>> On 8/5/21 9:02 PM, Zi Yan wrote:
>>> From: Zi Yan <ziy@nvidia.com>
>>
>>> Patch 3 restores the pfn_valid_within() check for when the buddy allocator
>>> can merge pages across memory sections. The check was removed when ARM64 got
>>> rid of holes in zones, but holes can appear in zones again after this patchset.
>>
>> To me that's a most unwelcome resurrection. I kinda missed that it was going
>> away, and now I can't even rejoice? I assume the systems that will be bumping
>> max_order have a lot of memory. Are they going to have many holes? What if we
>> just sacrificed the memory that would have a hole and didn't add it to buddy
>> at all?
> 
> I think the old implementation was just horrible and the description we have
> here still suffers from that old crap: "but holes can appear in zones again".
> No, it's not related to holes in zones at all. We can have MAX_ORDER -1 pages
> that are partially a hole.
> 
> And to be precise, "hole" here means "there is no memmap" and not "there is a
> hole but it has a valid memmap".

Yes.

> But IIRC, under SPARSEMEM we now always have a complete memmap for complete
> memory sections (when talking about system RAM; ZONE_DEVICE is different, but
> we don't really care about it for now, I think).
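
(For a sense of scale: with 4K pages and the usual x86_64 SECTION_SIZE_BITS of
27, a section is 128 MiB, i.e. 2^15 pages, so once MAX_ORDER grows past 16, a
MAX_ORDER - 1 block necessarily spans more than one 128 MiB section.)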
> 
> So instead of re-introducing what we had before, I think we should look into
> something that doesn't confuse every person who stumbles over it out there.
> What does pfn_valid_within() even mean in the new context? pfn_valid() is most
> probably no longer what we really want, as we're dealing with multiple sections
> that might be online or offline; in the old world this was different, as a
> MAX_ORDER - 1 page was completely contained in a memory section that was either
> online or offline.
> 
> I'd imagine something that expresses something different in the context of
> sparsemem:
> 
> "Some page orders, such as MAX_ORDER -1, might span multiple memory sections.
> Each memory section has a completely valid memmap if online. Memory sections
> might either be completely online or completely offline. pfn_to_online_page()
> might succeed on one part of a MAX_ORDER - 1 page, but not on another part. But
> it will certainly be consistent within one memory section."
> 
> Further, as we know that MAX_ORDER - 1 blocks and memory sections are both
> power-of-two sized, we can actually do a binary search to identify boundaries,
> instead of having to check each and every page in the range.
> 
> Is what I describe the actual reason why we reintroduce pfn_valid_within()?
> (and might we better introduce something new, with a better fitting name?)
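
Just to make the semantics concrete for myself: something like the untested
sketch below, with a made-up name, walking a MAX_ORDER - 1 block one section
at a time instead of one pfn at a time (assuming the block is MAX_ORDER-aligned
and sections are either fully online or fully offline, as you describe)?

/*
 * Sketch only, untested; would live somewhere in mm/ with linux/mmzone.h
 * and linux/memory_hotplug.h available. The name is made up.
 */
static bool max_order_block_fully_online(unsigned long block_start_pfn)
{
        unsigned long pfn = block_start_pfn;
        unsigned long end_pfn = block_start_pfn + MAX_ORDER_NR_PAGES;

        /*
         * Memory sections are either completely online or completely
         * offline, so one pfn_to_online_page() per section is enough;
         * no need to test every single pfn in the block.
         */
        for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
                if (!pfn_to_online_page(pfn))
                        return false;
        }

        return true;
}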

What I don't like is mainly the re-addition of pfn_valid_within() (or whatever
we'd call it) into __free_one_page(), for performance reasons, and also into
various pfn scanners (compaction), both for performance and for the "do I have
to remember to check this here, or don't I?" kind of confusion. It would be
really great if we could keep the guarantee that a memmap exists for whole
MAX_ORDER blocks. I see two ways to achieve that:

1. we create a memmap for whole MAX_ORDER blocks; pages in sections that are
not online are marked as reserved, or put in some other state that allows us to
do checks such as "is there a buddy here? no" without touching a memmap that
doesn't exist (rough sketch below)
2. blocks smaller than MAX_ORDER are not released to the buddy allocator at all

I think 1 would be more work, but less wasteful in the end?
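
For 1, I imagine something along the lines of the untested sketch below, done
at init/hotplug time. The function name is made up; it just shows the intent:
give every pfn in a MAX_ORDER block a memmap entry that is safe to look at.

/*
 * Sketch for option 1 (made-up name, untested): assuming the memmap for
 * the whole MAX_ORDER block has already been allocated, mark every page
 * that is not backed by an online section as reserved, so buddy merging
 * and the pfn scanners can test page state ("is there a buddy here? no")
 * without a pfn_valid_within()-style check. Reserved pages never have
 * PageBuddy set, so the merge check safely answers "no".
 */
static void mark_unavailable_pages_in_block(unsigned long block_start_pfn)
{
        unsigned long pfn;
        unsigned long end_pfn = block_start_pfn + MAX_ORDER_NR_PAGES;

        for (pfn = block_start_pfn; pfn < end_pfn; pfn++) {
                if (pfn_to_online_page(pfn))
                        continue;
                /* memmap exists for the whole block by construction */
                __SetPageReserved(pfn_to_page(pfn));
        }
}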


Thread overview: 50+ messages
2021-08-05 19:02 [RFC PATCH 00/15] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
2021-08-05 19:02 ` [RFC PATCH 01/15] arch: x86: remove MAX_ORDER exceeding SECTION_SIZE check for 32bit vdso Zi Yan
2021-08-05 19:02 ` [RFC PATCH 02/15] arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER Zi Yan
2021-08-05 19:02 ` [RFC PATCH 03/15] mm: check pfn validity when buddy allocator can merge pages across mem sections Zi Yan
2021-08-05 19:02 ` [RFC PATCH 04/15] mm: prevent pageblock size being larger than section size Zi Yan
2021-08-05 19:02 ` [RFC PATCH 05/15] mm/memory_hotplug: online pages at " Zi Yan
2021-08-05 19:02 ` [RFC PATCH 06/15] mm: use PAGES_PER_SECTION instead for mem_map_offset/next() Zi Yan
2021-08-05 19:02 ` [RFC PATCH 07/15] mm: hugetlb: use PAGES_PER_SECTION to check mem_map discontiguity Zi Yan
2021-08-05 19:02 ` [RFC PATCH 08/15] fs: proc: use PAGES_PER_SECTION for page offline checking period Zi Yan
2021-08-07 10:32   ` Mike Rapoport
2021-08-09 15:45     ` [RFC PATCH 08/15] " Zi Yan
2021-08-05 19:02 ` [RFC PATCH 09/15] virtio: virtio_mem: use PAGES_PER_SECTION instead of MAX_ORDER_NR_PAGES Zi Yan
2021-08-09  7:35   ` David Hildenbrand
2021-08-05 19:02 ` [RFC PATCH 10/15] virtio: virtio_balloon: " Zi Yan
2021-08-09  7:42   ` David Hildenbrand
2021-08-05 19:02 ` [RFC PATCH 11/15] mm/page_reporting: report pages at section size instead of MAX_ORDER Zi Yan
2021-08-09  7:25   ` David Hildenbrand
2021-08-09 14:12     ` Alexander Duyck
2021-08-09 15:08       ` Zi Yan
2021-08-09 16:51         ` Alexander Duyck
2021-08-09 14:08   ` Alexander Duyck
2021-08-05 19:02 ` [RFC PATCH 12/15] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER Zi Yan
2021-08-06 15:16   ` Vlastimil Babka
2021-08-06 15:23     ` Zi Yan
2021-08-05 19:02 ` [RFC PATCH 13/15] mm: convert MAX_ORDER sized static arrays to dynamic ones Zi Yan
2021-08-05 19:16   ` Christian König
2021-08-05 19:58     ` Zi Yan
2021-08-06  9:37       ` Christian König
2021-08-06 14:00         ` Zi Yan
2021-08-05 19:02 ` [RFC PATCH 14/15] mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time constant Zi Yan
2021-08-08  8:23   ` Mike Rapoport
2021-08-09 15:35     ` Zi Yan
2021-08-05 19:02 ` [RFC PATCH 15/15] mm: make MAX_ORDER a kernel boot time parameter Zi Yan
2021-08-06 15:36 ` [RFC PATCH 00/15] Make MAX_ORDER adjustable as " Vlastimil Babka
2021-08-06 16:16   ` David Hildenbrand
2021-08-06 16:54     ` Vlastimil Babka [this message]
2021-08-06 17:08       ` David Hildenbrand
2021-08-06 18:24         ` Zi Yan
2021-08-09  7:20           ` David Hildenbrand
2021-08-08  7:41       ` Mike Rapoport
2021-08-06 16:32 ` Vlastimil Babka
2021-08-06 17:19   ` Zi Yan
2021-08-06 20:27     ` Hugh Dickins
2021-08-06 21:26       ` Zi Yan
2021-08-09  4:04         ` Hugh Dickins
2021-08-07  1:10       ` Matthew Wilcox
2021-08-07 21:23         ` Matthew Wilcox
2021-08-09  4:29         ` Hugh Dickins
2021-08-09 11:22           ` Matthew Wilcox
2021-08-09  7:41 ` David Hildenbrand
