linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: "Axel Rasmussen" <axelrasmussen@google.com>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Alexey Dobriyan" <adobriyan@gmail.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Anshuman Khandual" <anshuman.khandual@arm.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Chinwen Chang" <chinwen.chang@mediatek.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Ingo Molnar" <mingo@redhat.com>, "Jann Horn" <jannh@google.com>,
	"Jerome Glisse" <jglisse@redhat.com>,
	"Lokesh Gidra" <lokeshgidra@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Michel Lespinasse" <walken@google.com>,
	"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Peter Xu" <peterx@redhat.com>, "Shaohua Li" <shli@fb.com>,
	"Shawn Anastasio" <shawn@anastas.io>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Steven Price" <steven.price@arm.com>,
	"Vlastimil Babka" <vbabka@suse.cz>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Adam Ruprecht <ruprecht@google.com>,
	Cannon Matthews <cannonmatthews@google.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Oliver Upton <oupton@google.com>
Subject: Re: [PATCH v5 05/10] userfaultfd: add minor fault registration mode
Date: Fri, 12 Feb 2021 11:17:04 -0800	[thread overview]
Message-ID: <efb3173f-811c-5792-a193-6476905ecdb2@oracle.com> (raw)
In-Reply-To: <20210210212200.1097784-6-axelrasmussen@google.com>

On 2/10/21 1:21 PM, Axel Rasmussen wrote:
> This feature allows userspace to intercept "minor" faults. By "minor"
> faults, I mean the following situation:
> 
> Let there exist two mappings (i.e., VMAs) to the same page(s). One of
> the mappings is registered with userfaultfd (in minor mode), and the
> other is not. Via the non-UFFD mapping, the underlying pages have
> already been allocated & filled with some contents. The UFFD mapping
> has not yet been faulted in; when it is touched for the first time,
> this results in what I'm calling a "minor" fault. As a concrete
> example, when working with hugetlbfs, we have huge_pte_none(), but
> find_lock_page() finds an existing page.

Do we want to intercept the fault if it is for a private mapping that
will COW the page in the page cache?  I think 'yes' but just want to
confirm.  The code added to hugetlb_no_page will intercept these COW
accesses.

<snip>

> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e41b77cf6cc2..f150b10981a8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4366,6 +4366,38 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>  				VM_FAULT_SET_HINDEX(hstate_index(h));
>  			goto backout_unlocked;
>  		}
> +
> +		/* Check for page in userfault range. */
> +		if (userfaultfd_minor(vma)) {
> +			u32 hash;
> +			struct vm_fault vmf = {
> +				.vma = vma,
> +				.address = haddr,
> +				.flags = flags,
> +				/*
> +				 * Hard to debug if it ends up being used by a
> +				 * callee that assumes something about the
> +				 * other uninitialized fields... same as in
> +				 * memory.c
> +				 */
> +			};
> +
> +			unlock_page(page);
> +
> +			/*
> +			 * hugetlb_fault_mutex and i_mmap_rwsem must be dropped
> +			 * before handling userfault.  Reacquire after handling
> +			 * fault to make calling code simpler.
> +			 */
> +
> +			hash = hugetlb_fault_mutex_hash(mapping, idx);
> +			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
> +			i_mmap_unlock_read(mapping);

After dropping all the locks, we only hold a reference to the page in the
page cache.  I 'think' someone else could hole punch the page and remove it
from the cache.  IIUC, state changing while processing uffd faults is something
that users need to deal with?  Just need to make sure there are no assumptions
in the kernel code.

> +			ret = handle_userfault(&vmf, VM_UFFD_MINOR);
> +			i_mmap_lock_read(mapping);
> +			mutex_lock(&hugetlb_fault_mutex_table[hash]);
> +			goto out;
> +		}
>  	}
>  
>  	/*
> 

-- 
Mike Kravetz

  parent reply	other threads:[~2021-02-12 19:20 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-10 21:21 [PATCH v5 00/10] userfaultfd: add minor fault handling Axel Rasmussen
2021-02-10 21:21 ` [PATCH v5 01/10] hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share() Axel Rasmussen
2021-02-11 23:59   ` Mike Kravetz
2021-02-10 21:21 ` [PATCH v5 02/10] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Axel Rasmussen
2021-02-12  0:19   ` Mike Kravetz
2021-02-12 20:40     ` Peter Xu
2021-02-12 20:47       ` Axel Rasmussen
2021-02-12 21:27         ` Mike Kravetz
2021-02-10 21:21 ` [PATCH v5 03/10] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h Axel Rasmussen
2021-02-10 21:21 ` [PATCH v5 04/10] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp Axel Rasmussen
2021-02-12 18:11   ` Mike Kravetz
2021-02-12 21:18     ` Peter Xu
2021-02-12 21:34       ` Mike Kravetz
2021-02-12 22:14         ` Peter Xu
2021-02-10 21:21 ` [PATCH v5 05/10] userfaultfd: add minor fault registration mode Axel Rasmussen
2021-02-11 19:28   ` Axel Rasmussen
2021-02-11 20:58     ` Axel Rasmussen
2021-02-12 22:21     ` Matthew Wilcox
2021-02-12 22:44       ` Peter Xu
2021-02-12 22:51         ` Axel Rasmussen
2021-02-12 23:01           ` Peter Xu
2021-02-12 19:17   ` Mike Kravetz [this message]
2021-02-12 19:33     ` Axel Rasmussen
2021-02-10 21:21 ` [PATCH v5 06/10] userfaultfd: disable huge PMD sharing for MINOR registered VMAs Axel Rasmussen
2021-02-10 21:21 ` [PATCH v5 07/10] userfaultfd: hugetlbfs: only compile UFFD helpers if config enabled Axel Rasmussen
2021-02-10 21:21 ` [PATCH v5 08/10] userfaultfd: add UFFDIO_CONTINUE ioctl Axel Rasmussen
2021-02-10 22:21   ` Axel Rasmussen
2021-02-10 21:21 ` [PATCH v5 09/10] userfaultfd: update documentation to describe minor fault handling Axel Rasmussen
2021-02-10 21:22 ` [PATCH v5 10/10] userfaultfd/selftests: add test exercising " Axel Rasmussen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=efb3173f-811c-5792-a193-6476905ecdb2@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=aarcange@redhat.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=anshuman.khandual@arm.com \
    --cc=axelrasmussen@google.com \
    --cc=cannonmatthews@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=chinwen.chang@mediatek.com \
    --cc=dgilbert@redhat.com \
    --cc=jannh@google.com \
    --cc=jglisse@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lokeshgidra@google.com \
    --cc=mingo@redhat.com \
    --cc=mkoutny@suse.com \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=oupton@google.com \
    --cc=peterx@redhat.com \
    --cc=rientjes@google.com \
    --cc=rostedt@goodmis.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=ruprecht@google.com \
    --cc=shawn@anastas.io \
    --cc=shli@fb.com \
    --cc=steven.price@arm.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=walken@google.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).