From: Mina Almasry <almasrymina@google.com>
Date: Thu, 13 May 2021 16:49:33 -0700
Subject: Re: [PATCH] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO_COPY
In-Reply-To: <20210513234309.366727-1-almasrymina@google.com>
Cc: Axel Rasmussen, Peter Xu, Linux-MM, Mike Kravetz, Andrew Morton, open list
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 13, 2021 at 4:43 PM Mina Almasry wrote:
>
> When hugetlb_mcopy_atomic_pte() is called with:
> - mode==MCOPY_ATOMIC_NORMAL and,
> - we already have a page in the page cache corresponding to the
>   associated address,
>
> We will allocate a huge page from the reserves, and then fail to insert it
> into the cache and return -EEXIST. In this case, we need to return -EEXIST
> without allocating a new page as the page already exists in the cache.
> Allocating the extra page causes resv_huge_pages to underflow temporarily
> until the extra page is freed.
>
> To fix this we check if a page already exists in the cache, and if it does
> not, we allocate a page and insert it into the cache immediately while
> holding the lock. Only after that do we copy the contents into the page.
>
> As a side effect of this, pages may exist in the cache for which the
> copy failed, and for these pages PageUptodate(page) == false. Modify code
> that queries the cache to handle this correctly.
>
> To be honest, I'm not sure I've done this bit correctly. Please take a look
> and let me know what you think. It may be overly complicated to have
> !PageUptodate() pages in the cache and ask the rest of the code to handle
> that edge case correctly, but I'm not sure how else to fix this issue.
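
To make the change of ordering easier to follow, here is a rough sketch of the
shared-memory UFFDIO_COPY path in hugetlb_mcopy_atomic_pte(). This is heavily
simplified: the MCOPY_ATOMIC_CONTINUE and *pagep retry paths, the locking and
the i_size check are all elided, and the call arguments are abbreviated.

Before the patch:

        page = alloc_huge_page(dst_vma, dst_addr, 0);
                                        /* consumes a reserve; if the cache
                                         * already has a page for this offset,
                                         * resv_huge_pages underflows here */
        copy_huge_page_from_user(page, ...);
        ret = huge_add_to_page_cache(page, mapping, idx);
                                        /* fails with -EEXIST */
        put_page(page);                 /* extra page freed, counter recovers */

After the patch:

        if (hugetlbfs_pagecache_present(h, dst_vma, dst_addr))
                return -EEXIST;         /* bail out before touching reserves */
        page = alloc_huge_page(dst_vma, dst_addr, 0);
        huge_add_to_page_cache(page, mapping, idx);
                                        /* inserted while still holding the
                                         * hugetlb fault mutex */
        copy_huge_page_from_user(page, ...);
                                        /* if this fails, the page stays in the
                                         * cache with !PageUptodate */
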
> Tested using:
> ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 1024 200 \
>         /mnt/huge
>
> Test passes, and dmesg shows no underflow warnings.
>
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> Cc: Axel Rasmussen
> Cc: Peter Xu
> Cc: linux-mm@kvack.org
> Cc: Mike Kravetz
> Cc: Andrew Morton
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
>
> ---
>  fs/hugetlbfs/inode.c |   2 +-
>  mm/hugetlb.c         | 109 +++++++++++++++++++++++--------------------
>  2 files changed, 60 insertions(+), 51 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index a2a42335e8fd..cc027c335242 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -346,7 +346,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
>
>                  /* Find the page */
>                  page = find_lock_page(mapping, index);
> -                if (unlikely(page == NULL)) {
> +                if (unlikely(page == NULL || !PageUptodate(page))) {
>                          /*
>                           * We have a HOLE, zero out the user-buffer for the
>                           * length of the hole or request.
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 629aa4c2259c..a5a5fbf7ac25 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4543,7 +4543,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>
>  retry:
>          page = find_lock_page(mapping, idx);
> -        if (!page) {
> +        if (!page || !PageUptodate(page)) {
>                  /* Check for page in userfault range */
>                  if (userfaultfd_missing(vma)) {
>                          ret = hugetlb_handle_userfault(vma, mapping, idx,
> @@ -4552,26 +4552,30 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>                          goto out;
>                  }
>
> -                page = alloc_huge_page(vma, haddr, 0);
> -                if (IS_ERR(page)) {
> -                        /*
> -                         * Returning error will result in faulting task being
> -                         * sent SIGBUS. The hugetlb fault mutex prevents two
> -                         * tasks from racing to fault in the same page which
> -                         * could result in false unable to allocate errors.
> -                         * Page migration does not take the fault mutex, but
> -                         * does a clear then write of pte's under page table
> -                         * lock. Page fault code could race with migration,
> -                         * notice the clear pte and try to allocate a page
> -                         * here. Before returning error, get ptl and make
> -                         * sure there really is no pte entry.
> -                         */
> -                        ptl = huge_pte_lock(h, mm, ptep);
> -                        ret = 0;
> -                        if (huge_pte_none(huge_ptep_get(ptep)))
> -                                ret = vmf_error(PTR_ERR(page));
> -                        spin_unlock(ptl);
> -                        goto out;
> +                if (!page) {
> +                        page = alloc_huge_page(vma, haddr, 0);
> +                        if (IS_ERR(page)) {
> +                                /*
> +                                 * Returning error will result in faulting task
> +                                 * being sent SIGBUS. The hugetlb fault mutex
> +                                 * prevents two tasks from racing to fault in
> +                                 * the same page which could result in false
> +                                 * unable to allocate errors. Page migration
> +                                 * does not take the fault mutex, but does a
> +                                 * clear then write of pte's under page table
> +                                 * lock. Page fault code could race with
> +                                 * migration, notice the clear pte and try to
> +                                 * allocate a page here. Before returning
> +                                 * error, get ptl and make sure there really is
> +                                 * no pte entry.
> +                                 */
> +                                ptl = huge_pte_lock(h, mm, ptep);
> +                                ret = 0;
> +                                if (huge_pte_none(huge_ptep_get(ptep)))
> +                                        ret = vmf_error(PTR_ERR(page));
> +                                spin_unlock(ptl);
> +                                goto out;
> +                        }
>                  }
>                  clear_huge_page(page, address, pages_per_huge_page(h));
>                  __SetPageUptodate(page);
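
Said another way: after this change, code that looks a page up in the hugetlb
page cache has to distinguish three states instead of two. The pattern the
hugetlbfs_read_iter() and hugetlb_no_page() hunks above follow is roughly this
(illustrative sketch only, not literal kernel code):

        page = find_lock_page(mapping, idx);
        if (!page) {
                /* true hole: nothing in the cache for this offset */
        } else if (!PageUptodate(page)) {
                /* a UFFDIO_COPY inserted this page but the copy from
                 * userspace failed; treat it like a hole */
        } else {
                /* normal, fully populated page */
        }
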
> @@ -4868,31 +4872,55 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>                              struct page **pagep)
>  {
>          bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
> -        struct address_space *mapping;
> -        pgoff_t idx;
> +        struct hstate *h = hstate_vma(dst_vma);
> +        struct address_space *mapping = dst_vma->vm_file->f_mapping;
> +        pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
>          unsigned long size;
>          int vm_shared = dst_vma->vm_flags & VM_SHARED;
> -        struct hstate *h = hstate_vma(dst_vma);
>          pte_t _dst_pte;
>          spinlock_t *ptl;
> -        int ret;
> +        int ret = -ENOMEM;
>          struct page *page;
>          int writable;
>
> -        mapping = dst_vma->vm_file->f_mapping;
> -        idx = vma_hugecache_offset(h, dst_vma, dst_addr);
> -
>          if (is_continue) {
>                  ret = -EFAULT;
> -                page = find_lock_page(mapping, idx);
> -                if (!page)
> +                page = hugetlbfs_pagecache_page(h, dst_vma, dst_addr);
> +                if (!page) {
> +                        ret = -ENOMEM;
>                          goto out;
> +                }
>          } else if (!*pagep) {
> -                ret = -ENOMEM;
> +                /* If a page already exists, then it's UFFDIO_COPY for
> +                 * a non-missing case. Return -EEXIST.
> +                 */
> +                if (hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
> +                        ret = -EEXIST;
> +                        goto out;
> +                }
> +
>                  page = alloc_huge_page(dst_vma, dst_addr, 0);
>                  if (IS_ERR(page))
>                          goto out;
>
> +                /* Add shared, newly allocated pages to the page cache. */
> +                if (vm_shared) {
> +                        size = i_size_read(mapping->host) >> huge_page_shift(h);
> +                        ret = -EFAULT;
> +                        if (idx >= size)
> +                                goto out;
> +
> +                        /*
> +                         * Serialization between remove_inode_hugepages() and
> +                         * huge_add_to_page_cache() below happens through the
> +                         * hugetlb_fault_mutex_table that here must be hold by
> +                         * the caller.
> +                         */
> +                        ret = huge_add_to_page_cache(page, mapping, idx);
> +                        if (ret)
> +                                goto out;
> +                }
> +
>                  ret = copy_huge_page_from_user(page,
>                                                 (const void __user *) src_addr,
>                                                 pages_per_huge_page(h), false);
> @@ -4916,24 +4944,6 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>           */
>          __SetPageUptodate(page);
>
> -        /* Add shared, newly allocated pages to the page cache. */
> -        if (vm_shared && !is_continue) {
> -                size = i_size_read(mapping->host) >> huge_page_shift(h);
> -                ret = -EFAULT;
> -                if (idx >= size)
> -                        goto out_release_nounlock;
> -
> -                /*
> -                 * Serialization between remove_inode_hugepages() and
> -                 * huge_add_to_page_cache() below happens through the
> -                 * hugetlb_fault_mutex_table that here must be hold by
> -                 * the caller.
> -                 */
> -                ret = huge_add_to_page_cache(page, mapping, idx);
> -                if (ret)
> -                        goto out_release_nounlock;
> -        }
> -
>          ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
>          spin_lock(ptl);
>
> @@ -4994,7 +5004,6 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>          spin_unlock(ptl);
>          if (vm_shared || is_continue)
>                  unlock_page(page);
> -out_release_nounlock:
>          put_page(page);
>          goto out;
>  }
> --
> 2.31.1.751.gd2f1c929bd-goog
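
One more note on reproducing this outside the selftest: the problematic case
is simply a UFFDIO_COPY issued for a hugetlbfs offset that already has a page
in the page cache. Roughly, from userspace (error handling omitted; uffd,
dst_addr, src_buf and hpage_size are placeholders for a userfaultfd descriptor
and a MAP_SHARED hugetlbfs range set up as in the selftest; struct uffdio_copy
and UFFDIO_COPY come from linux/userfaultfd.h):

        struct uffdio_copy copy = {
                .dst = dst_addr,                /* offset already populated in the cache */
                .src = (unsigned long) src_buf,
                .len = hpage_size,
                .mode = 0,                      /* plain copy; this is the
                                                 * MCOPY_ATOMIC_NORMAL path */
        };

        if (ioctl(uffd, UFFDIO_COPY, &copy))
                /* expected: the ioctl fails with EEXIST, and no hugetlb
                 * reserve underflow warning shows up in dmesg */;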