Date: Thu, 8 Apr 2021 16:43:21 -0700
In-Reply-To: <20210408234327.624367-1-axelrasmussen@google.com>
Message-Id: <20210408234327.624367-4-axelrasmussen@google.com>
Subject: [PATCH 3/9] userfaultfd/shmem: support minor fault registration for shmem
From: Axel Rasmussen
To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Daniel Colascione,
    Hugh Dickins, Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
    Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan, Stephen Rothwell,
    Wang Qing
Cc: linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    linux-mm@kvack.org, Axel Rasmussen, Brian Geffon,
    "Dr . David Alan Gilbert", Mina Almasry, Oliver Upton
X-Mailing-List: linux-kernel@vger.kernel.org

This patch allows shmem-backed VMAs to be registered for minor faults.
Minor faults are appropriately relayed to userspace in the fault path,
for VMAs with the relevant flag.

This commit doesn't hook up the UFFDIO_CONTINUE ioctl for shmem-backed
minor faults, though, so userspace doesn't yet have a way to resolve
such faults.

Signed-off-by: Axel Rasmussen
---
 fs/userfaultfd.c                 |  6 +++---
 include/uapi/linux/userfaultfd.h |  7 ++++++-
 mm/memory.c                      |  8 +++++---
 mm/shmem.c                       | 20 ++++++++++++++++----
 4 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 14f92285d04f..9f3b8684cf3c 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1267,8 +1267,7 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 	}
 
 	if (vm_flags & VM_UFFD_MINOR) {
-		/* FIXME: Add minor fault interception for shmem. */
-		if (!is_vm_hugetlb_page(vma))
+		if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
 			return false;
 	}
 
@@ -1941,7 +1940,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 	/* report all available features and ioctls to userland */
 	uffdio_api.features = UFFD_API_FEATURES;
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-	uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS;
+	uffdio_api.features &=
+		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
 #endif
 	uffdio_api.ioctls = UFFD_API_IOCTLS;
 	ret = -EFAULT;
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index bafbeb1a2624..159a74e9564f 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -31,7 +31,8 @@
 			   UFFD_FEATURE_MISSING_SHMEM |		\
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
-			   UFFD_FEATURE_MINOR_HUGETLBFS)
+			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
+			   UFFD_FEATURE_MINOR_SHMEM)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -185,6 +186,9 @@ struct uffdio_api {
 	 * UFFD_FEATURE_MINOR_HUGETLBFS indicates that minor faults
 	 * can be intercepted (via REGISTER_MODE_MINOR) for
 	 * hugetlbfs-backed pages.
+	 *
+	 * UFFD_FEATURE_MINOR_SHMEM indicates the same support as
+	 * UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -196,6 +200,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_SIGBUS			(1<<7)
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
+#define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/mm/memory.c b/mm/memory.c
index c8e357627318..a1e5ff55027e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3929,9 +3929,11 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 	 * something).
 	 */
 	if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
-		ret = do_fault_around(vmf);
-		if (ret)
-			return ret;
+		if (likely(!userfaultfd_minor(vmf->vma))) {
+			ret = do_fault_around(vmf);
+			if (ret)
+				return ret;
+		}
 	}
 
 	ret = __do_fault(vmf);
diff --git a/mm/shmem.c b/mm/shmem.c
index c21f20cc4204..99c54b165c16 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1785,7 +1785,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
  * vm. If we swap it in we mark it dirty since we also free the swap
  * entry since a page cannot live in both the swap and page cache.
  *
- * vmf and fault_type are only supplied by shmem_fault:
+ * vma, vmf, and fault_type are only supplied by shmem_fault:
  * otherwise they are NULL.
  */
 static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
@@ -1802,6 +1802,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	pgoff_t hindex = index;
 	gfp_t huge_gfp;
 	int error;
+	bool swapped;
 	int once = 0;
 	int alloced = 0;
 
@@ -1820,16 +1821,27 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 
 	page = pagecache_get_page(mapping, index,
 					FGP_ENTRY | FGP_HEAD | FGP_LOCK, 0);
-	if (xa_is_value(page)) {
+	swapped = xa_is_value(page);
+	if (swapped) {
 		error = shmem_swapin_page(inode, index, &page,
 					  sgp, gfp, vma, fault_type);
 		if (error == -EEXIST)
 			goto repeat;
 
-		*pagep = page;
-		return error;
+		if (error)
+			return error;
 	}
 
+	if (page && vma && userfaultfd_minor(vma)) {
+		unlock_page(page);
+		put_page(page);
+		*fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
+		return 0;
+	}
+
+	if (swapped)
+		return 0;
+
 	if (page)
 		hindex = page->index;
 	if (page && sgp == SGP_WRITE)
-- 
2.31.1.295.g9ea45b61b8-goog
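
For readers following the feature-bit change in userfaultfd_api(), here is
a minimal standalone sketch of the masking logic and of the check userspace
would make against the reported mask. It is a model only, not kernel code:
the two bit values are copied from the uapi hunk above, while
report_features(), can_register_shmem_minor(), feature_mask_self_check()
and the have_arch_minor parameter (standing in for
CONFIG_HAVE_ARCH_USERFAULTFD_MINOR) are hypothetical names introduced for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined by the uapi header hunk in this patch. */
#define UFFD_FEATURE_MINOR_HUGETLBFS	(1 << 9)
#define UFFD_FEATURE_MINOR_SHMEM	(1 << 10)

/*
 * Hypothetical model of the masking in userfaultfd_api(): when the
 * architecture lacks minor fault support, both minor-fault feature
 * bits are cleared from the mask reported to userspace.
 */
uint64_t report_features(uint64_t all_features, bool have_arch_minor)
{
	if (!have_arch_minor)
		all_features &= ~(uint64_t)(UFFD_FEATURE_MINOR_HUGETLBFS |
					    UFFD_FEATURE_MINOR_SHMEM);
	return all_features;
}

/*
 * Hypothetical userspace-side check: only attempt minor-mode
 * registration of a shmem-backed range if the kernel reported the
 * shmem minor fault feature bit.
 */
bool can_register_shmem_minor(uint64_t reported_features)
{
	return (reported_features & UFFD_FEATURE_MINOR_SHMEM) != 0;
}

/* Small self-check exercising both configurations. */
void feature_mask_self_check(void)
{
	uint64_t all = UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM;

	assert(can_register_shmem_minor(report_features(all, true)));
	assert(!can_register_shmem_minor(report_features(all, false)));
}
```

In a real program the reported mask would come from the features field
filled in by the UFFDIO_API ioctl rather than from report_features().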