From: James Houghton
Date: Thu, 2 Mar 2023 08:43:43 -0800
Subject: Re: [PATCH v2 05/46] rmap: hugetlb: switch from page_dup_file_rmap to page_add_file_rmap
To: Jiaqi Yan
Cc: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton, David Hildenbrand,
 David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra,
 Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)",
 Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20230218002819.1486479-1-jthoughton@google.com> <20230218002819.1486479-6-jthoughton@google.com>

On Thu, Mar 2, 2023 at 7:44 AM James Houghton wrote:
>
> On Wed, Mar 1, 2023 at 5:06 PM Jiaqi Yan wrote:
> >
> > On Fri, Feb 17, 2023 at 4:28 PM James Houghton wrote:
> > >
> > > This only applies to file-backed HugeTLB, and it should be a no-op until
> > > high-granularity mapping is possible. Also update page_remove_rmap to
> > > support the eventual case where !compound && folio_test_hugetlb().
> > >
> > > HugeTLB doesn't use LRU or mlock, so we avoid those bits. This also
> > > means we don't need to use subpage_mapcount; if we did, it would
> > > overflow with only a few mappings.
>
> This is wrong; I guess I misunderstood the code when I wrote this
> commit. subpages_mapcount (now called _nr_pages_mapped) won't overflow
> (unless HugeTLB pages could be greater than 16G). It is indeed a bug
> not to update _nr_pages_mapped the same way THPs do.
>
> > >
> > > There is still one caller of page_dup_file_rmap left: copy_present_pte,
> > > and it is always called with compound=false in this case.
> > >
> > > Signed-off-by: James Houghton
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 08004371cfed..6c008c9de80e 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -5077,7 +5077,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> > >                          * sleep during the process.
> > >                          */
> > >                         if (!PageAnon(ptepage)) {
> > > -                               page_dup_file_rmap(ptepage, true);
> > > +                               page_add_file_rmap(ptepage, src_vma, true);
> > >                         } else if (page_try_dup_anon_rmap(ptepage, true,
> > >                                                           src_vma)) {
> > >                                 pte_t src_pte_old = entry;
> > > @@ -5910,7 +5910,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> > >         if (anon_rmap)
> > >                 hugepage_add_new_anon_rmap(folio, vma, haddr);
> > >         else
> > > -               page_dup_file_rmap(&folio->page, true);
> > > +               page_add_file_rmap(&folio->page, vma, true);
> > >         new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
> > >                                 && (vma->vm_flags & VM_SHARED)));
> > >         /*
> > > @@ -6301,7 +6301,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> > >                 goto out_release_unlock;
> > >
> > >         if (folio_in_pagecache)
> > > -               page_dup_file_rmap(&folio->page, true);
> > > +               page_add_file_rmap(&folio->page, dst_vma, true);
> > >         else
> > >                 hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr);
> > >
> > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > index d3964c414010..b0f87f19b536 100644
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -254,7 +254,7 @@ static bool remove_migration_pte(struct folio *folio,
> > >                         hugepage_add_anon_rmap(new, vma, pvmw.address,
> > >                                                rmap_flags);
> > >                 else
> > > -                       page_dup_file_rmap(new, true);
> > > +                       page_add_file_rmap(new, vma, true);
> > >                 set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
> > >         } else
> > > #endif
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 15ae24585fc4..c010d0af3a82 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> >
> > Given you are making hugetlb's ref/mapcount mechanism consistent with
> > THP, I think the special folio_test_hugetlb checks you added in this
> > commit will break page_mapped() and folio_mapped() if the page/folio
> > is HGMed. With these checks, folio->_nr_pages_mapped is not properly
> > increased/decreased.
>
> Thank you, Jiaqi! I didn't realize I broke
> folio_mapped()/page_mapped(). The end result is that page_mapped() may
> report that an HGMed page isn't mapped when it is. Not good!
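
To make the breakage concrete, here is a tiny user-space model of the
mapcount scheme. It is only a sketch, not kernel code: the struct and
helper names are invented, and it assumes the rule that a large folio
counts as mapped when either _entire_mapcount >= 0 or its count of
PTE-mapped subpages is non-zero.

/*
 * Toy model of large-folio mapcounts: shows why installing a PTE
 * mapping without updating the PTE-mapped-subpage counter makes the
 * folio look unmapped.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_folio {
	int entire_mapcount;	/* -1 means "no entire mapping", like _entire_mapcount */
	int nr_pages_mapped;	/* count of PTE-mapped subpages */
};

static bool toy_folio_mapped(const struct toy_folio *f)
{
	return f->entire_mapcount >= 0 || f->nr_pages_mapped > 0;
}

int main(void)
{
	struct toy_folio f = { .entire_mapcount = -1, .nr_pages_mapped = 0 };
	bool hugetlb_skips_update = true;	/* what the v2 add path effectively does */

	/* HGM installs a PTE mapping one small piece of the hugepage ... */
	if (!hugetlb_skips_update)
		f.nr_pages_mapped++;	/* ... but the counter is never bumped */

	/* Prints "false" even though a PTE mapping exists. */
	printf("toy_folio_mapped() -> %s\n", toy_folio_mapped(&f) ? "true" : "false");
	return 0;
}

With the counter update skipped on the add side, the folio reports as
unmapped even though HGM has just mapped part of it.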
> >
> > > @@ -1318,21 +1318,21 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> > >         int nr = 0, nr_pmdmapped = 0;
> > >         bool first;
> > >
> > > -       VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
> > > +       VM_BUG_ON_PAGE(compound && !PageTransHuge(page)
> > > +                      && !folio_test_hugetlb(folio), page);
> > >
> > >         /* Is page being mapped by PTE? Is this its first map to be added? */
> > >         if (likely(!compound)) {
> > >                 first = atomic_inc_and_test(&page->_mapcount);
> > >                 nr = first;
> > > -               if (first && folio_test_large(folio)) {
> > > +               if (first && folio_test_large(folio)
> > > +                   && !folio_test_hugetlb(folio)) {
> >
> > So we should still increment _nr_pages_mapped for the hugetlb case
> > here, and decrement it in the corresponding place in page_remove_rmap.
> >
> > >                         nr = atomic_inc_return_relaxed(mapped);
> > >                         nr = (nr < COMPOUND_MAPPED);
> > >                 }
> > > -       } else if (folio_test_pmd_mappable(folio)) {
> > > -               /* That test is redundant: it's for safety or to optimize out */
> > > -
> > > +       } else {
> > >                 first = atomic_inc_and_test(&folio->_entire_mapcount);
> > > -               if (first) {
> > > +               if (first && !folio_test_hugetlb(folio)) {
> >
> > Same here: we should still increase _nr_pages_mapped by
> > COMPOUND_MAPPED and decrease it by COMPOUND_MAPPED in the
> > corresponding place in page_remove_rmap.
> >
> > >                         nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> > >                         if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
> > >                                 nr_pmdmapped = folio_nr_pages(folio);
> > > @@ -1347,6 +1347,9 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> > >                 }
> > >         }
> > >
> > > +       if (folio_test_hugetlb(folio))
> > > +               return;
> > > +
> > >         if (nr_pmdmapped)
> > >                 __lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
> > >                                 NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
> > > @@ -1376,8 +1379,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
> > >         VM_BUG_ON_PAGE(compound && !PageHead(page), page);
> > >
> > >         /* Hugetlb pages are not counted in NR_*MAPPED */
> > > -       if (unlikely(folio_test_hugetlb(folio))) {
> > > -               /* hugetlb pages are always mapped with pmds */
> > > +       if (unlikely(folio_test_hugetlb(folio)) && compound) {
> > >                 atomic_dec(&folio->_entire_mapcount);
> > >                 return;
> > >         }
> >
> > This entire if-block should be removed after you remove the
> > !folio_test_hugetlb checks in page_add_file_rmap.
>
> This is the not-so-obvious change that is needed. Thank you!
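
For anyone following the counter arithmetic, the packing that these
suggestions rely on can be modelled in a few lines of user-space C.
This is a sketch only: TOY_COMPOUND_MAPPED and the helpers are invented
names, and the exact bit position of the real COMPOUND_MAPPED is an
assumption of the model.

/*
 * Sketch of the _nr_pages_mapped packing: the low bits count PTE-mapped
 * subpages and each entire (hugepage-sized) mapping adds one
 * COMPOUND_MAPPED unit, so add and remove must adjust it symmetrically.
 */
#include <stdio.h>

#define TOY_COMPOUND_MAPPED	(1 << 23)	/* assumed bit position */

static int pte_mapped_subpages(int packed) { return packed & (TOY_COMPOUND_MAPPED - 1); }
static int entire_mappings(int packed)     { return packed / TOY_COMPOUND_MAPPED; }

int main(void)
{
	int packed = 0;

	/* add side: one entire mapping, plus three subpages PTE-mapped by HGM */
	packed += TOY_COMPOUND_MAPPED;
	packed += 3;

	/* remove side: must undo exactly what the add side did ... */
	packed -= 3;
	/* ... but suppose the COMPOUND_MAPPED decrement is skipped: */
	/* packed -= TOY_COMPOUND_MAPPED; */

	/* Prints 0 PTE-mapped subpages but still 1 entire mapping. */
	printf("PTE-mapped subpages: %d, entire mappings: %d\n",
	       pte_mapped_subpages(packed), entire_mappings(packed));
	return 0;
}

In other words, whichever side starts updating _nr_pages_mapped, the
other side has to undo it, which is why the page_remove_rmap changes
have to go in together with the page_add_file_rmap ones.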
>
> > > @@ -1386,15 +1388,14 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
> > >         if (likely(!compound)) {
> > >                 last = atomic_add_negative(-1, &page->_mapcount);
> > >                 nr = last;
> > > -               if (last && folio_test_large(folio)) {
> > > +               if (last && folio_test_large(folio)
> > > +                   && !folio_test_hugetlb(folio)) {
> >
> > ditto.
> >
> > >                         nr = atomic_dec_return_relaxed(mapped);
> > >                         nr = (nr < COMPOUND_MAPPED);
> > >                 }
> > > -       } else if (folio_test_pmd_mappable(folio)) {
> > > -               /* That test is redundant: it's for safety or to optimize out */
> > > -
> > > +       } else {
> > >                 last = atomic_add_negative(-1, &folio->_entire_mapcount);
> > > -               if (last) {
> > > +               if (last && !folio_test_hugetlb(folio)) {
> >
> > ditto.
>
> I agree with all of your suggestions. Testing with the hugetlb-hgm
> selftest, nothing seems to break. :)
>
> Given that this is at least the third or fourth major bug in this
> version of the series, I'll go ahead and send a v3 sooner rather than
> later.

This solution limits the size of a HugeTLB page to 16G. I'm not sure
if there are any architectures that support HugeTLB pages larger than
16G (it looks like powerpc supports 16G). If they do, I think we can
just increase the value of COMPOUND_MAPPED. If that's not possible, we
would need a solution other than participating in _nr_pages_mapped the
way THPs do.
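
For completeness, the size-limit arithmetic above is easy to check with
a stand-alone sketch. The COMPOUND_MAPPED bit position below is an
assumption of the model: the low bits of _nr_pages_mapped count
PTE-mapped subpages, so the number of base pages in a hugepage has to
stay below that bit.

/*
 * Back-of-the-envelope check: how many base pages fit in a hugepage,
 * and does that count stay below the (assumed) COMPOUND_MAPPED bit?
 */
#include <stdio.h>

#define ASSUMED_COMPOUND_MAPPED	(1UL << 23)

int main(void)
{
	unsigned long base_sizes[] = { 4UL << 10, 64UL << 10 };			/* 4K, 64K */
	unsigned long huge_sizes[] = { 1UL << 30, 16UL << 30, 64UL << 30 };	/* 1G, 16G, 64G */

	for (int b = 0; b < 2; b++) {
		for (int h = 0; h < 3; h++) {
			unsigned long subpages = huge_sizes[h] / base_sizes[b];

			printf("%2luK base, %2lu GiB hugepage: %8lu subpages (%s)\n",
			       base_sizes[b] >> 10, huge_sizes[h] >> 30, subpages,
			       subpages < ASSUMED_COMPOUND_MAPPED ? "fits" : "overflows");
		}
	}
	return 0;
}

Under that assumption, today's largest HugeTLB sizes still fit with
either base page size; it is only sizes well beyond those that would
force COMPOUND_MAPPED to move up, or a different scheme entirely.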