Date: Fri, 24 Jun 2022 17:36:49 +0000
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>
Message-Id: <20220624173656.2033256-20-jthoughton@google.com>
Mime-Version: 1.0
References: <20220624173656.2033256-1-jthoughton@google.com>
X-Mailer: git-send-email 2.37.0.rc0.161.g10f37bed90-goog
Subject: [RFC PATCH 19/26] hugetlb: add HGM support for copy_hugetlb_page_range
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, "Dr . David Alan Gilbert",
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
Content-Type: text/plain; charset="UTF-8"
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

This allows fork() to work with high-granularity mappings. The page
table structure is copied such that partially mapped regions will remain
partially mapped in the same way for the new process.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 74 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 59 insertions(+), 15 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index aadfcee947cf..0ec2f231524e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4851,7 +4851,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			    struct vm_area_struct *src_vma)
 {
 	pte_t *src_pte, *dst_pte, entry, dst_entry;
-	struct page *ptepage;
+	struct hugetlb_pte src_hpte, dst_hpte;
+	struct page *ptepage, *hpage;
 	unsigned long addr;
 	bool cow = is_cow_mapping(src_vma->vm_flags);
 	struct hstate *h = hstate_vma(src_vma);
@@ -4878,17 +4879,44 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		i_mmap_lock_read(mapping);
 	}
 
-	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
+	addr = src_vma->vm_start;
+	while (addr < src_vma->vm_end) {
 		spinlock_t *src_ptl, *dst_ptl;
+		unsigned long hpte_sz;
 		src_pte = huge_pte_offset(src, addr, sz);
-		if (!src_pte)
+		if (!src_pte) {
+			addr += sz;
 			continue;
+		}
 		dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
 		if (!dst_pte) {
 			ret = -ENOMEM;
 			break;
 		}
 
+		hugetlb_pte_populate(&src_hpte, src_pte, huge_page_shift(h));
+		hugetlb_pte_populate(&dst_hpte, dst_pte, huge_page_shift(h));
+
+		if (hugetlb_hgm_enabled(src_vma)) {
+			BUG_ON(!hugetlb_hgm_enabled(dst_vma));
+			ret = hugetlb_walk_to(src, &src_hpte, addr,
+					      PAGE_SIZE, /*stop_at_none=*/true);
+			if (ret)
+				break;
+			ret = huge_pte_alloc_high_granularity(
+					&dst_hpte, dst, dst_vma, addr,
+					hugetlb_pte_shift(&src_hpte),
+					HUGETLB_SPLIT_NONE,
+					/*write_locked=*/false);
+			if (ret)
+				break;
+
+			src_pte = src_hpte.ptep;
+			dst_pte = dst_hpte.ptep;
+		}
+
+		hpte_sz = hugetlb_pte_size(&src_hpte);
+
 		/*
 		 * If the pagetables are shared don't copy or take references.
 		 * dst_pte == src_pte is the common case of src/dest sharing.
@@ -4899,16 +4927,19 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		 * after taking the lock below.
 		 */
 		dst_entry = huge_ptep_get(dst_pte);
-		if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
+		if ((dst_pte == src_pte) || !hugetlb_pte_none(&dst_hpte)) {
+			addr += hugetlb_pte_size(&src_hpte);
 			continue;
+		}
 
-		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
+		dst_ptl = hugetlb_pte_lock(dst, &dst_hpte);
+		src_ptl = hugetlb_pte_lockptr(src, &src_hpte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
 		dst_entry = huge_ptep_get(dst_pte);
 again:
-		if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) {
+		if (hugetlb_pte_none(&src_hpte) ||
+		    !hugetlb_pte_none(&dst_hpte)) {
 			/*
 			 * Skip if src entry none. Also, skip in the
 			 * unlikely case dst entry !none as this implies
@@ -4931,11 +4962,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				if (userfaultfd_wp(src_vma) && uffd_wp)
 					entry = huge_pte_mkuffd_wp(entry);
 				set_huge_swap_pte_at(src, addr, src_pte,
-						     entry, sz);
+						     entry, hpte_sz);
 			}
 			if (!userfaultfd_wp(dst_vma) && uffd_wp)
 				entry = huge_pte_clear_uffd_wp(entry);
-			set_huge_swap_pte_at(dst, addr, dst_pte, entry, sz);
+			set_huge_swap_pte_at(dst, addr, dst_pte, entry,
+					     hpte_sz);
 		} else if (unlikely(is_pte_marker(entry))) {
 			/*
 			 * We copy the pte marker only if the dst vma has
@@ -4946,7 +4978,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		} else {
 			entry = huge_ptep_get(src_pte);
 			ptepage = pte_page(entry);
-			get_page(ptepage);
+			hpage = compound_head(ptepage);
+			get_page(hpage);
 
 			/*
 			 * Failing to duplicate the anon rmap is a rare case
@@ -4959,9 +4992,16 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 * sleep during the process.
 			 */
 			if (!PageAnon(ptepage)) {
-				page_dup_file_rmap(ptepage, true);
+				/* Only dup_rmap once for a page */
+				if (IS_ALIGNED(addr, sz))
+					page_dup_file_rmap(hpage, true);
 			} else if (page_try_dup_anon_rmap(ptepage, true,
 							  src_vma)) {
+				if (hugetlb_hgm_enabled(src_vma)) {
+					ret = -EINVAL;
+					break;
+				}
+				BUG_ON(!IS_ALIGNED(addr, hugetlb_pte_size(&src_hpte)));
 				pte_t src_pte_old = entry;
 				struct page *new;
 
@@ -4970,13 +5010,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				/* Do not use reserve as it's private owned */
 				new = alloc_huge_page(dst_vma, addr, 1);
 				if (IS_ERR(new)) {
-					put_page(ptepage);
+					put_page(hpage);
 					ret = PTR_ERR(new);
 					break;
 				}
-				copy_user_huge_page(new, ptepage, addr, dst_vma,
+				copy_user_huge_page(new, hpage, addr, dst_vma,
 						    npages);
-				put_page(ptepage);
+				put_page(hpage);
 
 				/* Install the new huge page if src pte stable */
 				dst_ptl = huge_pte_lock(h, dst, dst_pte);
@@ -4994,6 +5034,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				hugetlb_install_page(dst_vma, dst_pte, addr, new);
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
+				addr += hugetlb_pte_size(&src_hpte);
 				continue;
 			}
 
@@ -5010,10 +5051,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			}
 
 			set_huge_pte_at(dst, addr, dst_pte, entry);
-			hugetlb_count_add(npages, dst);
+			hugetlb_count_add(
+					hugetlb_pte_size(&dst_hpte) / PAGE_SIZE,
+					dst);
 		}
 		spin_unlock(src_ptl);
 		spin_unlock(dst_ptl);
+		addr += hugetlb_pte_size(&src_hpte);
 	}
 
 	if (cow) {
-- 
2.37.0.rc0.161.g10f37bed90-goog