From: Ming Lin <mlin@kernel.org>
To: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Simon Ser <contact@emersion.fr>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Subject: Re: [PATCH 2/2] mm: adds NOSIGBUS extension for out-of-band shmem read
Date: Thu, 3 Jun 2021 12:57:01 -0700
Message-ID: <e46d1453-b1ff-e665-7312-1b97f2f44f4f@kernel.org>
In-Reply-To: <alpine.LSU.2.11.2106021719500.8333@eggly.anvils>

On 6/2/2021 5:46 PM, Hugh Dickins wrote:
> 
> It's do_anonymous_page()'s business to map in the zero page on
> read fault (see "my_zero_pfn(vmf->address)" in there), or fill
> a freshly allocated page with zeroes on write fault - and now
> you're sticking to MAP_PRIVATE, write faults in VM_WRITE areas
> are okay for VM_NOSIGBUS.
> 
> Ideally you can simply call do_anonymous_page() from __do_fault()
> in the VM_FAULT_SIGBUS on VM_NOSIGBUS case.  That's what to start
> from anyway: but look to see if there's state to be adjusted to
> achieve that; and it won't be surprising if somewhere down in
> do_anonymous_page() or something it calls, there's a BUG on it
> being called when vma->vm_file is set, or something like that.
> May need some tweaking.

do_anonymous_page() works nicely for both read faults and write faults.
I didn't hit any BUG() in my testing.

But I'm still struggling with how to make "punch hole should remove the
mapping of the zero page" work. Here is the hack I have now:

diff --git a/mm/memory.c b/mm/memory.c
index 46ecda5..6b5a897 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1241,7 +1241,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
                         struct page *page;
  
                         page = vm_normal_page(vma, addr, ptent);
-                       if (unlikely(details) && page) {
+                       if (unlikely(details) && page && !(vma->vm_flags & VM_NOSIGBUS)) {
                                 /*
                                  * unmap_shared_mapping_pages() wants to
                                  * invalidate cache without truncating:


The other parts of the patch follow:

----

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e9d67bc..af9e277 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -373,6 +373,8 @@ int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
  # define VM_UFFD_MINOR		VM_NONE
  #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
  
+#define VM_NOSIGBUS		VM_FLAGS_BIT(38)	/* Do not SIGBUS on fault */
+
  /* Bits set in the VMA until the stack is in its final location */
  #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
  
diff --git a/include/linux/mman.h b/include/linux/mman.h
index b2cbae9..c966b08 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -154,6 +154,7 @@ static inline bool arch_validate_flags(unsigned long flags)
  	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
  	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
  	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
+	       _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
  	       arch_calc_vm_flag_bits(flags);
  }
  
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index f94f65d..a2a5333 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -29,6 +29,7 @@
  #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
  #define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
  #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+#define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on fault */
  
  #define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
  					 * uninitialized */
diff --git a/mm/memory.c b/mm/memory.c
index eff2a47..46ecda5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3676,6 +3676,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
  	}
  
  	ret = vma->vm_ops->fault(vmf);
+	if (unlikely(ret & VM_FAULT_SIGBUS) && (vma->vm_flags & VM_NOSIGBUS)) {
+		/*
+		 * For MAP_NOSIGBUS mapping, map in the zero page on read fault
+		 * or fill a freshly allocated page with zeroes on write fault
+		 */
+		ret = do_anonymous_page(vmf);
+		if (!ret)
+			ret = VM_FAULT_NOPAGE;
+		return ret;
+	}
+
  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
  			    VM_FAULT_DONE_COW)))
  		return ret;
diff --git a/mm/mmap.c b/mm/mmap.c
index 096bba4..74fb49a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1419,6 +1419,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
  	if (!len)
  		return -EINVAL;
  
+	/* Restrict MAP_NOSIGBUS to MAP_PRIVATE mapping */
+	if ((flags & MAP_NOSIGBUS) && !(flags & MAP_PRIVATE))
+		return -EINVAL;
+
  	/*
  	 * Does the application expect PROT_READ to imply PROT_EXEC?
  	 *
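
As a rough userspace model of the flag plumbing in the hunks above, the
sketch below shows how the MAP_NOSIGBUS bit would be translated to
VM_NOSIGBUS and how the do_mmap() restriction behaves. All SKETCH_*
names and both helper functions are hypothetical stand-ins, not kernel
code. It assumes VM_FLAGS_BIT(38) expands to a 64-bit value with bit 38
set, and that _calc_vm_trans() scales a single flag bit from one
position to another.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-ins for the values used in the patch; assumptions for this sketch. */
#define SKETCH_MAP_SHARED	0x01ULL
#define SKETCH_MAP_PRIVATE	0x02ULL
#define SKETCH_MAP_NOSIGBUS	0x200000ULL
#define SKETCH_VM_NOSIGBUS	(1ULL << 38)	/* VM_FLAGS_BIT(38) */

/*
 * Function form of a single-bit translation: if bit1 is set in x,
 * return bit2, otherwise 0. Both bits must be powers of two.
 */
static uint64_t sketch_calc_vm_trans(uint64_t x, uint64_t bit1, uint64_t bit2)
{
	if (!bit1 || !bit2)
		return 0;
	if (bit1 <= bit2)
		return (x & bit1) * (bit2 / bit1);
	return (x & bit1) / (bit1 / bit2);
}

/* Models the new do_mmap() check: MAP_NOSIGBUS requires MAP_PRIVATE. */
static int sketch_check_nosigbus(uint64_t flags)
{
	if ((flags & SKETCH_MAP_NOSIGBUS) && !(flags & SKETCH_MAP_PRIVATE))
		return -EINVAL;
	return 0;
}
```

In this model, MAP_PRIVATE | MAP_NOSIGBUS passes the check and maps to
bit 38, while MAP_SHARED | MAP_NOSIGBUS is rejected with -EINVAL. Bit 38
does not fit in a 32-bit unsigned long, hence the 64-bit type here.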
