Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Michal Hocko <mhocko@suse.com>, Rik van Riel <riel@surriel.com>,
	Roman Gushchin <guro@fb.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-mm@kvack.org,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Shakeel Butt <shakeelb@google.com>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	David Nellans <dnellans@nvidia.com>,
	linux-kernel@vger.kernel.org, Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@suse.de>
Subject: Re: [RFC PATCH 00/16] 1GB THP support on x86_64
Date: Thu, 10 Sep 2020 17:15:09 +0200
Message-ID: <4b3006cf-3391-6839-904e-b415613198cb@redhat.com> (raw)
In-Reply-To: <3684BEAF-C8A2-4EEC-8FC2-55EA5F8F7DA5@nvidia.com>

On 10.09.20 16:41, Zi Yan wrote:
> On 10 Sep 2020, at 10:34, David Hildenbrand wrote:
> 
>>>> As long as we stay in safe zone boundaries you get a benefit in most
>>>> scenarios. As soon as we would have a (temporary) workload that would
>>>> require more unmovable allocations we would fallback to polluting some
>>>> pageblocks only.
>>>
>>> The idea would work well until unmoveable pages begin to overflow into
>>> ZONE_PREFER_MOVABLE or we move the boundary of ZONE_PREFER_MOVABLE to
>>> avoid unmoveable page overflow. The issue comes from the lifetime of
>>> the unmoveable pages. Since some long-live ones can be around the boundary,
>>> there is no guarantee that ZONE_PREFER_MOVABLE cannot grow back
>>> even if other unmoveable pages are deallocated. Ultimately,
>>> ZONE_PREFER_MOVABLE would be shrink to a small size and the situation is
>>> back to what we have now.
>>
>> As discussed this would not happen in the usual case in case we size it
>> reasonable. Of course, if you push it to the extreme (which was never
>> suggested!), you would create mess. There is always a way to create a
>> mess if you abuse such mechanism. Also see Rik's reply regarding reclaim.
>>
>>>
>>> OK. I have a stupid question here. Why not just grow pageblock to a larger
>>> size, like 1GB? So the fragmentation of unmoveable pages will be at larger
>>> granularity. But it is less likely unmoveable pages will be allocated at
>>> a movable pageblock, since the kernel has 1GB pageblock for them after
>>> a pageblock stealing. If other kinds of pageblocks run out, moveable and
>>> reclaimable pages can fall back to unmoveable pageblocks.
>>> What am I missing here?
>>
>> Oh no. For example pageblocks have to completely fit into a single
>> section (that's where metadata is maintained). Please refrain from
>> suggesting to increase the section size ;)
> 
> Thank you for the explanation. I have no idea about the restrictions on
> pageblock and section. Out of curiosity, what prevents the growth of
> the section size?

The section size (and based on that the Linux memory block size) defines
- the minimum size in which we can add_memory()
- the alignment requirement in which we can add_memory()

This is applicable
- in physical environments, where the bios will decide where to place
  DIMMs/NVDIMMs. The coarser the granularity, the less memory we might
  be able to make use of in corner cases.
- in virtualized environments, where we want to add memory in fairly
  small granularity. The coarser the granularity, the less flexibility
  we have.

arm64 has a section size of 1GB (and a THP/MAX_ORDER - 1 size of 512MB
with 64k base pages :/ ). That already turned out to be a problem - see
[1] regarding thoughts on how to shrink the section size. I once read
about thoughts of switching to 2MB THP on arm64 with any base page size,
not sure if that will become real at one point (and we might be able to
reduce the pageblock size there as well ... )

[1]
https://lkml.kernel.org/r/AM6PR08MB40690714A2E77A7128B2B2ADF7700@AM6PR08MB4069.eurprd08.prod.outlook.com
See [1] as

> 
> —
> Best Regards,
> Yan Zi
> 


-- 
Thanks,

David / dhildenb



  reply index

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-02 18:06 Zi Yan
2020-09-02 18:06 ` [RFC PATCH 01/16] mm: add pagechain container for storing multiple pages Zi Yan
2020-09-02 20:29   ` Randy Dunlap
2020-09-02 20:48     ` Zi Yan
2020-09-03  3:15   ` Matthew Wilcox
2020-09-07 12:22   ` Kirill A. Shutemov
2020-09-07 15:11     ` Zi Yan
2020-09-09 13:46       ` Kirill A. Shutemov
2020-09-09 14:15         ` Zi Yan
2020-09-02 18:06 ` [RFC PATCH 02/16] mm: thp: 1GB anonymous page implementation Zi Yan
2020-09-02 18:06 ` [RFC PATCH 03/16] mm: proc: add 1GB THP kpageflag Zi Yan
2020-09-09 13:46   ` Kirill A. Shutemov
2020-09-02 18:06 ` [RFC PATCH 04/16] mm: thp: 1GB THP copy on write implementation Zi Yan
2020-09-02 18:06 ` [RFC PATCH 05/16] mm: thp: handling 1GB THP reference bit Zi Yan
2020-09-09 14:09   ` Kirill A. Shutemov
2020-09-09 14:36     ` Zi Yan
2020-09-02 18:06 ` [RFC PATCH 06/16] mm: thp: add 1GB THP split_huge_pud_page() function Zi Yan
2020-09-09 14:18   ` Kirill A. Shutemov
2020-09-09 14:19     ` Zi Yan
2020-09-02 18:06 ` [RFC PATCH 07/16] mm: stats: make smap stats understand PUD THPs Zi Yan
2020-09-02 18:06 ` [RFC PATCH 08/16] mm: page_vma_walk: teach it about PMD-mapped PUD THP Zi Yan
2020-09-02 18:06 ` [RFC PATCH 09/16] mm: thp: 1GB THP support in try_to_unmap() Zi Yan
2020-09-02 18:06 ` [RFC PATCH 10/16] mm: thp: split 1GB THPs at page reclaim Zi Yan
2020-09-02 18:06 ` [RFC PATCH 11/16] mm: thp: 1GB THP follow_p*d_page() support Zi Yan
2020-09-02 18:06 ` [RFC PATCH 12/16] mm: support 1GB THP pagemap support Zi Yan
2020-09-02 18:06 ` [RFC PATCH 13/16] mm: thp: add a knob to enable/disable 1GB THPs Zi Yan
2020-09-02 18:06 ` [RFC PATCH 14/16] mm: page_alloc: >=MAX_ORDER pages allocation an deallocation Zi Yan
2020-09-02 18:06 ` [RFC PATCH 15/16] hugetlb: cma: move cma reserve function to cma.c Zi Yan
2020-09-02 18:06 ` [RFC PATCH 16/16] mm: thp: use cma reservation for pud thp allocation Zi Yan
2020-09-02 18:40 ` [RFC PATCH 00/16] 1GB THP support on x86_64 Jason Gunthorpe
2020-09-02 18:45   ` Zi Yan
2020-09-02 18:48     ` Jason Gunthorpe
2020-09-02 19:05       ` Zi Yan
2020-09-02 19:57         ` Jason Gunthorpe
2020-09-02 20:29           ` Zi Yan
2020-09-03 16:40             ` Jason Gunthorpe
2020-09-03 16:55               ` Matthew Wilcox
2020-09-03 17:08                 ` Jason Gunthorpe
2020-09-03  7:32 ` Michal Hocko
2020-09-03 16:25   ` Roman Gushchin
2020-09-03 16:50     ` Jason Gunthorpe
2020-09-03 17:01       ` Matthew Wilcox
2020-09-03 17:18         ` Jason Gunthorpe
2020-09-03 20:57     ` Mike Kravetz
2020-09-03 21:06       ` Roman Gushchin
2020-09-04  7:42     ` Michal Hocko
2020-09-04 21:10       ` Roman Gushchin
2020-09-07  7:20         ` Michal Hocko
2020-09-08 15:09           ` Zi Yan
2020-09-08 19:58             ` Roman Gushchin
2020-09-09  4:01               ` John Hubbard
2020-09-09  7:15               ` Michal Hocko
2020-09-03 14:23 ` Kirill A. Shutemov
2020-09-03 16:30   ` Roman Gushchin
2020-09-08 11:57     ` David Hildenbrand
2020-09-08 14:05       ` Zi Yan
2020-09-08 14:22         ` David Hildenbrand
2020-09-08 15:36           ` Zi Yan
2020-09-08 14:27         ` Matthew Wilcox
2020-09-08 15:50           ` Zi Yan
2020-09-09 12:11           ` Jason Gunthorpe
2020-09-09 12:32             ` Matthew Wilcox
2020-09-09 13:14               ` Jason Gunthorpe
2020-09-09 13:27                 ` David Hildenbrand
2020-09-10 10:02                   ` William Kucharski
2020-09-08 14:35         ` Michal Hocko
2020-09-08 14:41           ` Rik van Riel
2020-09-08 15:02             ` David Hildenbrand
2020-09-09  7:04             ` Michal Hocko
2020-09-09 13:19               ` Rik van Riel
2020-09-09 13:43                 ` David Hildenbrand
2020-09-09 13:49                   ` Rik van Riel
2020-09-09 13:54                     ` David Hildenbrand
2020-09-10  7:32                   ` Michal Hocko
2020-09-10  8:27                     ` David Hildenbrand
2020-09-10 14:21                       ` Zi Yan
2020-09-10 14:34                         ` David Hildenbrand
2020-09-10 14:41                           ` Zi Yan
2020-09-10 15:15                             ` David Hildenbrand [this message]
2020-09-10 13:32                     ` Rik van Riel
2020-09-10 14:30                       ` Zi Yan
2020-09-09 13:59                 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4b3006cf-3391-6839-904e-b415613198cb@redhat.com \
    --to=david@redhat.com \
    --cc=dnellans@nvidia.com \
    --cc=guro@fb.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.com \
    --cc=riel@surriel.com \
    --cc=shakeelb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yang.shi@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git