On 11 Feb 2021, at 18:44, Mike Kravetz wrote:

> On 2/11/21 12:47 PM, Zi Yan wrote:
>> On 28 Jan 2021, at 16:53, Mike Kravetz wrote:
>>
>>> On 1/28/21 10:26 AM, Joao Martins wrote:
>>>> For a given hugepage backing a VA, there's a rather inefficient
>>>> loop which is solely responsible for storing subpages in the GUP
>>>> @pages/@vmas array. For each subpage we check whether it's within
>>>> range or size of @pages and keep incrementing @pfn_offset and a couple
>>>> of other variables per subpage iteration.
>>>>
>>>> Simplify this logic and minimize the cost of each iteration to just
>>>> storing the output page/vma. Instead of incrementing the number of
>>>> @refs iteratively, pre-calculate @refs and use only a tight loop for
>>>> storing the pinned subpages/vmas.
>>>>
>>>> Additionally, retain the existing behaviour of using mem_map_offset()
>>>> when recording the subpages for configurations that don't have a
>>>> contiguous mem_map.
>>>>
>>>> Pinning consequently improves, bringing us close to
>>>> {pin,get}_user_pages_fast:
>>>>
>>>> - 16G with 1G huge page size
>>>> gup_test -f /mnt/huge/file -m 16384 -r 30 -L -S -n 512 -w
>>>>
>>>> PIN_LONGTERM_BENCHMARK: ~12.8k us -> ~5.8k us
>>>> PIN_FAST_BENCHMARK: ~3.7k us
>>>>
>>>> Signed-off-by: Joao Martins
>>>> ---
>>>>  mm/hugetlb.c | 49 ++++++++++++++++++++++++++++---------------------
>>>>  1 file changed, 28 insertions(+), 21 deletions(-)
>>>
>>> Thanks for updating this.
>>>
>>> Reviewed-by: Mike Kravetz
>>>
>>> I think there still is an open general question about whether we can always
>>> assume page structs are contiguous for really big pages. That is outside
>>
>> I do not think page structs need to be contiguous, but PFNs within a big page
>> need to be contiguous, at least based on existing code like the mem_map_offset()
>> we have.
>
> Thanks for looking, Zi.
> Yes, PFNs need to be contiguous. Also, as you say, page structs do not need
> to be contiguous.
> The issue is that there is code that assumes page structs are contiguous
> for gigantic pages. hugetlb code does not make this assumption and does a
> pfn_to_page() when looping through page structs for gigantic pages.
>
> I do not believe this to be a huge issue. In most cases CONFIG_SPARSEMEM_VMEMMAP
> is defined and struct pages can be accessed contiguously. I 'think' we could
> run into problems with CONFIG_SPARSEMEM without CONFIG_SPARSEMEM_VMEMMAP
> when doing hotplug operations. However, I still need to look into it more.

Yeah, you are right about this. The combination of CONFIG_SPARSEMEM,
!CONFIG_SPARSEMEM_VMEMMAP, and memory hotplug does cause errors. It can be
as simple as dynamically reserving gigantic hugetlb pages and then freeing
them on a system with CONFIG_SPARSEMEM_VMEMMAP not set and some hotplugged
memory. Here are the steps to reproduce:

0. Configure a kernel with CONFIG_SPARSEMEM_VMEMMAP not set.
1. Create a VM using qemu with "-m size=8g,slots=16,maxmem=16g" to enable hotplug.
2. After the machine boots, add large enough memory using
   "object_add memory-backend-ram,id=mem1,size=7g" and
   "device_add pc-dimm,id=dimm1,memdev=mem1".
3. In the guest OS, online all hot-plugged memory. My VM has a 128MB memory
   block size; if you have a larger memory block size, I think you will need
   to plug in more memory.
4. Reserve gigantic hugetlb pages so that the hot-plugged memory will be used.
   I reserved 12GB, like
   "echo 12 | sudo tee /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages".
5. Free all gigantic hugetlb pages:
   "echo 0 | sudo tee /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages".
6. You will get "BUG: Bad page state in process ..." errors.

The patch below can fix the error, but I suspect there might be other places
missing the necessary mem_map_next()/mem_map_offset() handling too.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..aae99c6984f3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1319,7 +1319,8 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 	h->nr_huge_pages--;
 	h->nr_huge_pages_node[page_to_nid(page)]--;
 	for (i = 0; i < pages_per_huge_page(h); i++) {
-		page[i].flags &= ~(1 << PG_locked | 1 << PG_error |
+		struct page *subpage = mem_map_offset(page, i);
+		subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
 				1 << PG_referenced | 1 << PG_dirty |
 				1 << PG_active | 1 << PG_private |
 				1 << PG_writeback);

--
Best Regards,
Yan Zi