From: David Hildenbrand <david@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Michal Hocko <mhocko@suse.com>, Rik van Riel <riel@surriel.com>,
	Roman Gushchin <guro@fb.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>, linux-mm@kvack.org,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Shakeel Butt <shakeelb@google.com>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	David Nellans <dnellans@nvidia.com>, linux-kernel@vger.kernel.org,
	Vlastimil Babka <vbabka@suse.cz>, Mel Gorman <mgorman@suse.de>
Subject: Re: [RFC PATCH 00/16] 1GB THP support on x86_64
Date: Thu, 10 Sep 2020 17:15:09 +0200
Message-ID: <4b3006cf-3391-6839-904e-b415613198cb@redhat.com>
In-Reply-To: <3684BEAF-C8A2-4EEC-8FC2-55EA5F8F7DA5@nvidia.com>

On 10.09.20 16:41, Zi Yan wrote:
> On 10 Sep 2020, at 10:34, David Hildenbrand wrote:
>
>>>> As long as we stay in safe zone boundaries you get a benefit in most
>>>> scenarios. As soon as we would have a (temporary) workload that would
>>>> require more unmovable allocations we would fallback to polluting some
>>>> pageblocks only.
>>>
>>> The idea would work well until unmoveable pages begin to overflow into
>>> ZONE_PREFER_MOVABLE or we move the boundary of ZONE_PREFER_MOVABLE to
>>> avoid unmoveable page overflow. The issue comes from the lifetime of
>>> the unmoveable pages. Since some long-live ones can be around the boundary,
>>> there is no guarantee that ZONE_PREFER_MOVABLE cannot grow back
>>> even if other unmoveable pages are deallocated. Ultimately,
>>> ZONE_PREFER_MOVABLE would be shrink to a small size and the situation is
>>> back to what we have now.
>>
>> As discussed this would not happen in the usual case in case we size it
>> reasonable. Of course, if you push it to the extreme (which was never
>> suggested!), you would create mess. There is always a way to create a
>> mess if you abuse such mechanism. Also see Rik's reply regarding reclaim.
>>
>>>
>>> OK. I have a stupid question here. Why not just grow pageblock to a larger
>>> size, like 1GB? So the fragmentation of unmoveable pages will be at larger
>>> granularity. But it is less likely unmoveable pages will be allocated at
>>> a movable pageblock, since the kernel has 1GB pageblock for them after
>>> a pageblock stealing. If other kinds of pageblocks run out, moveable and
>>> reclaimable pages can fall back to unmoveable pageblocks.
>>> What am I missing here?
>>
>> Oh no. For example pageblocks have to completely fit into a single
>> section (that's where metadata is maintained). Please refrain from
>> suggesting to increase the section size ;)
>
> Thank you for the explanation. I have no idea about the restrictions on
> pageblock and section. Out of curiosity, what prevents the growth of
> the section size?

The section size (and based on that the Linux memory block size) defines
- the minimum size in which we can add_memory()
- the alignment requirement in which we can add_memory()

This is applicable
- in physical environments, where the bios will decide where to place
  DIMMs/NVDIMMs. The coarser the granularity, the less memory we might
  be able to make use of in corner cases.
- in virtualized environments, where we want to add memory in fairly
  small granularity. The coarser the granularity, the less flexibility
  we have.

arm64 has a section size of 1GB (and a THP/MAX_ORDER - 1 size of 512MB
with 64k base pages :/ ). That already turned out to be a problem - see
[1] regarding thoughts on how to shrink the section size. I once read
about thoughts of switching to 2MB THP on arm64 with any base page size,
not sure if that will become real at one point (and we might be able to
reduce the pageblock size there as well ... )

[1] https://lkml.kernel.org/r/AM6PR08MB40690714A2E77A7128B2B2ADF7700@AM6PR08MB4069.eurprd08.prod.outlook.com

See [1] as
>
> —
> Best Regards,
> Yan Zi
>

-- 
Thanks,

David / dhildenb
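
The two numbers in the reply above (a 512MB THP with 64k base pages, and the section size acting as both minimum size and alignment for add_memory()) can be checked with a little arithmetic. The following is only an illustrative sketch, not kernel code. The function names are hypothetical, the 8-byte page table entry size is an assumption, and the 128 MiB section size is used purely as a finer-grained comparison point against the 1GB arm64 value mentioned in the message.

```python
MIB = 1 << 20
GIB = 1 << 30


def pmd_huge_size(base_page_size: int, entry_size: int = 8) -> int:
    """Bytes mapped by one PMD-level huge page.

    Assumes one page table page holds base_page_size / entry_size
    entries and each entry maps one base page.
    """
    return (base_page_size // entry_size) * base_page_size


def can_add(start: int, size: int, section_size: int) -> bool:
    """Simplified model of the constraint described in the message:
    start and size must both be multiples of the section size."""
    return size > 0 and start % section_size == 0 and size % section_size == 0


def usable_bytes(start: int, size: int, section_size: int) -> int:
    """Bytes left after trimming [start, start + size) to section boundaries."""
    first = -(-start // section_size) * section_size      # round start up
    last = (start + size) // section_size * section_size  # round end down
    return max(0, last - first)


if __name__ == "__main__":
    for page in (4 * 1024, 16 * 1024, 64 * 1024):
        print(f"{page // 1024}k base pages: "
              f"{pmd_huge_size(page) // MIB} MiB per PMD huge page")

    # A 2 GiB DIMM that firmware placed at a 512 MiB offset.
    start, size = 512 * MIB, 2 * GIB
    for section in (128 * MIB, GIB):
        print(f"section {section // MIB} MiB: "
              f"{usable_bytes(start, size, section) // MIB} MiB usable")
```

In this model the misplaced 2 GiB range is fully usable with 128 MiB sections but loses half its capacity with 1 GiB sections, which is the "less memory we might be able to make use of in corner cases" effect the message describes.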