On 2 Sep 2020, at 14:40, Jason Gunthorpe wrote:

> On Wed, Sep 02, 2020 at 02:06:12PM -0400, Zi Yan wrote:
>> From: Zi Yan
>>
>> Hi all,
>>
>> This patchset adds support for 1GB THP on x86_64. It is on top of
>> v5.9-rc2-mmots-2020-08-25-21-13.
>>
>> Compared to hugetlb, 1GB THP is more flexible for reducing translation
>> overhead and increasing the performance of applications with large memory
>> footprints, without requiring application changes.
>>
>> Design
>> =======
>>
>> The 1GB THP implementation looks similar to the existing THP code, except for
>> some new designs for the additional page table level.
>>
>> 1. Page table deposit and withdraw using a new pagechain data structure:
>>    instead of one PTE page table page, a 1GB THP requires 513 page table pages
>>    (one PMD page table page and 512 PTE page table pages) to be deposited
>>    at page allocation time, so that the page can be split later. Currently,
>>    the page table deposit uses ->lru, so only one page can be deposited.
>>    A new pagechain data structure is added to enable multi-page deposit.
>>
>> 2. Triple-mapped 1GB THP: a 1GB THP can be mapped by a combination of PUD,
>>    PMD, and PTE entries. Mixing PUD and PTE mappings can be achieved with the
>>    existing PageDoubleMap mechanism. To add PMD mapping, PMDPageInPUD and
>>    sub_compound_mapcount are introduced. PMDPageInPUD is the 512-aligned base
>>    page in a 1GB THP, and sub_compound_mapcount counts the PMD mappings using
>>    page[N*512 + 3].compound_mapcount.
>>
>> 3. Using CMA allocation for 1GB THP: instead of bumping MAX_ORDER, it is
>>    saner to use something less intrusive. So all 1GB THPs are allocated from
>>    reserved CMA areas shared with hugetlb. At page splitting time, the bitmap
>>    for the 1GB THP is cleared, as the resulting pages can be freed via the
>>    normal page free path. We can fall back to alloc_contig_pages for 1GB THP
>>    if necessary.
>>
>>
>> Patch Organization
>> =======
>>
>> Patch 01 adds the new pagechain data structure.
>>
>> Patches 02 to 13 add 1GB THP support in various places.
>>
>> Patch 14 tries to use alloc_contig_pages for 1GB THP allocation.
>>
>> Patch 15 moves the hugetlb_cma reservation to cma.c and renames it to
>> hugepage_cma.
>>
>> Patch 16 uses the hugepage_cma reservation for 1GB THP allocation.
>>
>>
>> Any suggestions and comments are welcome.
>>
>>
>> Zi Yan (16):
>>   mm: add pagechain container for storing multiple pages.
>>   mm: thp: 1GB anonymous page implementation.
>>   mm: proc: add 1GB THP kpageflag.
>>   mm: thp: 1GB THP copy on write implementation.
>>   mm: thp: handling 1GB THP reference bit.
>>   mm: thp: add 1GB THP split_huge_pud_page() function.
>>   mm: stats: make smap stats understand PUD THPs.
>>   mm: page_vma_walk: teach it about PMD-mapped PUD THP.
>>   mm: thp: 1GB THP support in try_to_unmap().
>>   mm: thp: split 1GB THPs at page reclaim.
>>   mm: thp: 1GB THP follow_p*d_page() support.
>>   mm: support 1GB THP pagemap support.
>>   mm: thp: add a knob to enable/disable 1GB THPs.
>>   mm: page_alloc: >=MAX_ORDER pages allocation and deallocation.
>>   hugetlb: cma: move cma reserve function to cma.c.
>>   mm: thp: use cma reservation for pud thp allocation.
>
> Surprised this doesn't touch mm/pagewalk.c ?

1GB PUD page support is present for DAX purposes, so the code is already there
in mm/pagewalk.c. I only needed to supply ops->pud_entry when using the
functions in mm/pagewalk.c. :)

--
Best Regards,
Yan Zi