* [PATCH v2 0/2] mm: support NOSIGBUS on fault of mmap
@ 2021-06-04  7:43 Ming Lin
  2021-06-04  7:43 ` [PATCH v2 1/2] mm: make "vm_flags" be an u64 Ming Lin
  2021-06-04  7:43 ` [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap() Ming Lin
  0 siblings, 2 replies; 9+ messages in thread
From: Ming Lin @ 2021-06-04  7:43 UTC (permalink / raw)
  To: Linus Torvalds, Hugh Dickins, Simon Ser, Matthew Wilcox
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-api

These two patches follow from the discussion "Sealed memfd & no-fault mmap"
at https://bit.ly/3pdwOGR

v2:
- make MAP_NOSIGBUS generic instead of being restricted to shmem
- use do_anonymous_page() to insert zero page
- fix build warnings/errors reported by LKP test robot

v1:
https://lkml.org/lkml/2021/6/1/1076

Ming Lin (2):
  mm: make "vm_flags" be an u64
  mm: adds NOSIGBUS extension to mmap()

 arch/arm64/Kconfig                           |   1 -
 arch/parisc/include/uapi/asm/mman.h          |   1 +
 arch/powerpc/Kconfig                         |   1 -
 arch/x86/Kconfig                             |   1 -
 drivers/android/binder.c                     |   6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c     |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c    |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c      |   2 +-
 drivers/infiniband/hw/hfi1/file_ops.c        |   2 +-
 drivers/infiniband/hw/qib/qib_file_ops.c     |   4 +-
 fs/exec.c                                    |   2 +-
 fs/userfaultfd.c                             |   6 +-
 include/linux/huge_mm.h                      |   4 +-
 include/linux/ksm.h                          |   4 +-
 include/linux/mm.h                           | 108 +++++++++++++--------------
 include/linux/mm_types.h                     |   6 +-
 include/linux/mman.h                         |   5 +-
 include/uapi/asm-generic/mman-common.h       |   1 +
 mm/Kconfig                                   |   2 -
 mm/debug.c                                   |   4 +-
 mm/khugepaged.c                              |   2 +-
 mm/ksm.c                                     |   2 +-
 mm/madvise.c                                 |   2 +-
 mm/memory.c                                  |  15 +++-
 mm/mmap.c                                    |  14 ++--
 mm/mprotect.c                                |   4 +-
 mm/mremap.c                                  |   2 +-
 tools/include/uapi/asm-generic/mman-common.h |   1 +
 28 files changed, 108 insertions(+), 98 deletions(-)

-- 
1.8.3.1



* [PATCH v2 1/2] mm: make "vm_flags" be an u64
  2021-06-04  7:43 [PATCH v2 0/2] mm: support NOSIGBUS on fault of mmap Ming Lin
@ 2021-06-04  7:43 ` Ming Lin
  2021-06-04  7:43 ` [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap() Ming Lin
  1 sibling, 0 replies; 9+ messages in thread
From: Ming Lin @ 2021-06-04  7:43 UTC (permalink / raw)
  To: Linus Torvalds, Hugh Dickins, Simon Ser, Matthew Wilcox
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-api

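Make vm_flags_t a u64 so that enough flag bits are available on 32-bit
architectures.

Use vm_flags_t instead of "unsigned long" wherever VMA flags are declared
or passed. Also update the format specifiers in the affected print
statements (%lx to %llx) to fix the resulting build warnings.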
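
As a rough illustration of why the wider type matters, here is a standalone
userspace sketch (not kernel code). It assumes uint32_t as a stand-in for a
32-bit "unsigned long"; store_in_32(), store_in_64(), format_flags() and
check_flag_width() are hypothetical names used only for this example.
VM_FLAGS_BIT() mirrors the macro added by this patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for vm_flags_t when "unsigned long" is only 32 bits wide. */
typedef uint32_t flags32_t;
/* Stand-in for vm_flags_t after the change (64 bits everywhere). */
typedef uint64_t flags64_t;

/* Mirrors VM_FLAGS_BIT(N) from the patch: always a 64-bit constant. */
#define VM_FLAGS_BIT(N)	(1ULL << (N))

/* Storing a flag in a 32-bit field silently drops bits 32 and above. */
static flags32_t store_in_32(unsigned int bit)
{
	return (flags32_t)VM_FLAGS_BIT(bit);
}

/* A 64-bit field keeps every bit the macro can produce. */
static flags64_t store_in_64(unsigned int bit)
{
	return (flags64_t)VM_FLAGS_BIT(bit);
}

/*
 * A 64-bit value needs a 64-bit conversion. PRIx64 is the portable
 * userspace counterpart of the %llx used in the kernel hunks.
 */
static int format_flags(char *buf, size_t len, flags64_t flags)
{
	return snprintf(buf, len, "%#" PRIx64, flags);
}

/* Self-check of the behaviour described above. */
static void check_flag_width(void)
{
	char buf[32];

	assert(store_in_32(31) == 0x80000000u);	/* bit 31 still fits */
	assert(store_in_32(38) == 0);		/* bit 38 is lost */
	assert(store_in_64(38) == VM_FLAGS_BIT(38));

	format_flags(buf, sizeof(buf), store_in_64(38));
	assert(strcmp(buf, "0x4000000000") == 0);
}
```

Calling check_flag_width() from a test program exercises both cases. The
point of the sketch is only that any flag above bit 31 needs a 64-bit
container and a matching format specifier.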

Signed-off-by: Ming Lin <mlin@kernel.org>
---
 arch/arm64/Kconfig                        |   1 -
 arch/powerpc/Kconfig                      |   1 -
 arch/x86/Kconfig                          |   1 -
 drivers/android/binder.c                  |   6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |   2 +-
 drivers/infiniband/hw/hfi1/file_ops.c     |   2 +-
 drivers/infiniband/hw/qib/qib_file_ops.c  |   4 +-
 fs/exec.c                                 |   2 +-
 fs/userfaultfd.c                          |   6 +-
 include/linux/huge_mm.h                   |   4 +-
 include/linux/ksm.h                       |   4 +-
 include/linux/mm.h                        | 106 ++++++++++++++----------------
 include/linux/mm_types.h                  |   6 +-
 include/linux/mman.h                      |   4 +-
 mm/Kconfig                                |   2 -
 mm/debug.c                                |   4 +-
 mm/khugepaged.c                           |   2 +-
 mm/ksm.c                                  |   2 +-
 mm/madvise.c                              |   2 +-
 mm/memory.c                               |   4 +-
 mm/mmap.c                                 |  10 +--
 mm/mprotect.c                             |   4 +-
 mm/mremap.c                               |   2 +-
 25 files changed, 87 insertions(+), 98 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9f1d856..c6960ea 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1658,7 +1658,6 @@ config ARM64_MTE
 	depends on AS_HAS_LSE_ATOMICS
 	# Required for tag checking in the uaccess routines
 	depends on ARM64_PAN
-	select ARCH_USES_HIGH_VMA_FLAGS
 	help
 	  Memory Tagging (part of the ARMv8.5 Extensions) provides
 	  architectural support for run-time, always-on detection of
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 088dd2a..5c1b49e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -940,7 +940,6 @@ config PPC_MEM_KEYS
 	prompt "PowerPC Memory Protection Keys"
 	def_bool y
 	depends on PPC_BOOK3S_64
-	select ARCH_USES_HIGH_VMA_FLAGS
 	select ARCH_HAS_PKEYS
 	help
 	  Memory Protection Keys provides a mechanism for enforcing
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0045e1b..a885336 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1874,7 +1874,6 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 	def_bool y
 	# Note: only available in 64-bit mode
 	depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
-	select ARCH_USES_HIGH_VMA_FLAGS
 	select ARCH_HAS_PKEYS
 	help
 	  Memory Protection Keys provides a mechanism for enforcing
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index bcec598..2a56b8b 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -4947,7 +4947,7 @@ static void binder_vma_open(struct vm_area_struct *vma)
 	struct binder_proc *proc = vma->vm_private_data;
 
 	binder_debug(BINDER_DEBUG_OPEN_CLOSE,
-		     "%d open vm area %lx-%lx (%ld K) vma %lx pagep %lx\n",
+		     "%d open vm area %lx-%lx (%ld K) vma %llx pagep %lx\n",
 		     proc->pid, vma->vm_start, vma->vm_end,
 		     (vma->vm_end - vma->vm_start) / SZ_1K, vma->vm_flags,
 		     (unsigned long)pgprot_val(vma->vm_page_prot));
@@ -4958,7 +4958,7 @@ static void binder_vma_close(struct vm_area_struct *vma)
 	struct binder_proc *proc = vma->vm_private_data;
 
 	binder_debug(BINDER_DEBUG_OPEN_CLOSE,
-		     "%d close vm area %lx-%lx (%ld K) vma %lx pagep %lx\n",
+		     "%d close vm area %lx-%lx (%ld K) vma %llx pagep %lx\n",
 		     proc->pid, vma->vm_start, vma->vm_end,
 		     (vma->vm_end - vma->vm_start) / SZ_1K, vma->vm_flags,
 		     (unsigned long)pgprot_val(vma->vm_page_prot));
@@ -4984,7 +4984,7 @@ static int binder_mmap(struct file *filp, struct vm_area_struct *vma)
 		return -EINVAL;
 
 	binder_debug(BINDER_DEBUG_OPEN_CLOSE,
-		     "%s: %d %lx-%lx (%ld K) vma %lx pagep %lx\n",
+		     "%s: %d %lx-%lx (%ld K) vma %llx pagep %lx\n",
 		     __func__, proc->pid, vma->vm_start, vma->vm_end,
 		     (vma->vm_end - vma->vm_start) / SZ_1K, vma->vm_flags,
 		     (unsigned long)pgprot_val(vma->vm_page_prot));
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 43de260..3a1726b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1957,7 +1957,7 @@ static int kfd_mmio_mmap(struct kfd_dev *dev, struct kfd_process *process,
 	pr_debug("pasid 0x%x mapping mmio page\n"
 		 "     target user address == 0x%08llX\n"
 		 "     physical address    == 0x%08llX\n"
-		 "     vm_flags            == 0x%04lX\n"
+		 "     vm_flags            == 0x%08llX\n"
 		 "     size                == 0x%04lX\n",
 		 process->pasid, (unsigned long long) vma->vm_start,
 		 address, vma->vm_flags, PAGE_SIZE);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index 768d153..002462b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -150,7 +150,7 @@ int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process *process,
 	pr_debug("Mapping doorbell page\n"
 		 "     target user address == 0x%08llX\n"
 		 "     physical address    == 0x%08llX\n"
-		 "     vm_flags            == 0x%04lX\n"
+		 "     vm_flags            == 0x%08llX\n"
 		 "     size                == 0x%04lX\n",
 		 (unsigned long long) vma->vm_start, address, vma->vm_flags,
 		 kfd_doorbell_process_slice(dev));
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index ba2c2ce..e25ff04 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -808,7 +808,7 @@ int kfd_event_mmap(struct kfd_process *p, struct vm_area_struct *vma)
 	pr_debug("     start user address  == 0x%08lx\n", vma->vm_start);
 	pr_debug("     end user address    == 0x%08lx\n", vma->vm_end);
 	pr_debug("     pfn                 == 0x%016lX\n", pfn);
-	pr_debug("     vm_flags            == 0x%08lX\n", vma->vm_flags);
+	pr_debug("     vm_flags            == 0x%08llX\n", vma->vm_flags);
 	pr_debug("     size                == 0x%08lX\n",
 			vma->vm_end - vma->vm_start);
 
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 3b7bbc7..a40410f 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -569,7 +569,7 @@ static int hfi1_file_mmap(struct file *fp, struct vm_area_struct *vma)
 
 	vma->vm_flags = flags;
 	hfi1_cdbg(PROC,
-		  "%u:%u type:%u io/vf:%d/%d, addr:0x%llx, len:%lu(%lu), flags:0x%lx\n",
+		  "%u:%u type:%u io/vf:%d/%d, addr:0x%llx, len:%lu(%lu), flags:0x%llx\n",
 		    ctxt, subctxt, type, mapio, vmf, memaddr, memlen,
 		    vma->vm_end - vma->vm_start, vma->vm_flags);
 	if (vmf) {
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index c60e79d..9bd34e6 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -846,7 +846,7 @@ static int mmap_rcvegrbufs(struct vm_area_struct *vma,
 
 	if (vma->vm_flags & VM_WRITE) {
 		qib_devinfo(dd->pcidev,
-			"Can't map eager buffers as writable (flags=%lx)\n",
+			"Can't map eager buffers as writable (flags=%llx)\n",
 			vma->vm_flags);
 		ret = -EPERM;
 		goto bail;
@@ -935,7 +935,7 @@ static int mmap_kvaddr(struct vm_area_struct *vma, u64 pgaddr,
 		/* rcvegrbufs are read-only on the slave */
 		if (vma->vm_flags & VM_WRITE) {
 			qib_devinfo(dd->pcidev,
-				 "Can't map eager buffers as writable (flags=%lx)\n",
+				 "Can't map eager buffers as writable (flags=%llx)\n",
 				 vma->vm_flags);
 			ret = -EPERM;
 			goto bail;
diff --git a/fs/exec.c b/fs/exec.c
index 18594f1..8dcf8a5 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -748,7 +748,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma = bprm->vma;
 	struct vm_area_struct *prev = NULL;
-	unsigned long vm_flags;
+	vm_flags_t vm_flags;
 	unsigned long stack_base;
 	unsigned long stack_size;
 	unsigned long stack_expand;
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 14f9228..b958055 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -846,7 +846,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 	struct vm_area_struct *vma, *prev;
 	/* len == 0 means wake all */
 	struct userfaultfd_wake_range range = { .len = 0, };
-	unsigned long new_flags;
+	vm_flags_t new_flags;
 
 	WRITE_ONCE(ctx->released, true);
 
@@ -1284,7 +1284,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	int ret;
 	struct uffdio_register uffdio_register;
 	struct uffdio_register __user *user_uffdio_register;
-	unsigned long vm_flags, new_flags;
+	vm_flags_t vm_flags, new_flags;
 	bool found;
 	bool basic_ioctls;
 	unsigned long start, end, vma_end;
@@ -1510,7 +1510,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	struct vm_area_struct *vma, *prev, *cur;
 	int ret;
 	struct uffdio_range uffdio_unregister;
-	unsigned long new_flags;
+	vm_flags_t new_flags;
 	bool found;
 	unsigned long start, end, vma_end;
 	const void __user *buf = (void __user *)arg;
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 9626fda..2f524f0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -215,7 +215,7 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 			__split_huge_pud(__vma, __pud, __address);	\
 	}  while (0)
 
-int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags,
+int hugepage_madvise(struct vm_area_struct *vma, vm_flags_t *vm_flags,
 		     int advice);
 void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start,
 			   unsigned long end, long adjust_next);
@@ -403,7 +403,7 @@ static inline void split_huge_pmd_address(struct vm_area_struct *vma,
 	do { } while (0)
 
 static inline int hugepage_madvise(struct vm_area_struct *vma,
-				   unsigned long *vm_flags, int advice)
+				   vm_flags_t *vm_flags, int advice)
 {
 	BUG();
 	return 0;
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 161e816..9f57409 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -20,7 +20,7 @@
 
 #ifdef CONFIG_KSM
 int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
-		unsigned long end, int advice, unsigned long *vm_flags);
+		unsigned long end, int advice, vm_flags_t *vm_flags);
 int __ksm_enter(struct mm_struct *mm);
 void __ksm_exit(struct mm_struct *mm);
 
@@ -67,7 +67,7 @@ static inline void ksm_exit(struct mm_struct *mm)
 
 #ifdef CONFIG_MMU
 static inline int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
-		unsigned long end, int advice, unsigned long *vm_flags)
+		unsigned long end, int advice, vm_flags_t *vm_flags)
 {
 	return 0;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c274f75..9e86ca1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -264,73 +264,68 @@ int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 extern unsigned int kobjsize(const void *objp);
 #endif
 
+#define VM_FLAGS_BIT(N)	(1ULL << (N))
+
 /*
  * vm_flags in vm_area_struct, see mm_types.h.
  * When changing, update also include/trace/events/mmflags.h
  */
 #define VM_NONE		0x00000000
 
-#define VM_READ		0x00000001	/* currently active flags */
-#define VM_WRITE	0x00000002
-#define VM_EXEC		0x00000004
-#define VM_SHARED	0x00000008
+#define VM_READ		VM_FLAGS_BIT(0)	 /* currently active flags */
+#define VM_WRITE	VM_FLAGS_BIT(1)
+#define VM_EXEC		VM_FLAGS_BIT(2)
+#define VM_SHARED	VM_FLAGS_BIT(3)
 
 /* mprotect() hardcodes VM_MAYREAD >> 4 == VM_READ, and so for r/w/x bits. */
-#define VM_MAYREAD	0x00000010	/* limits for mprotect() etc */
-#define VM_MAYWRITE	0x00000020
-#define VM_MAYEXEC	0x00000040
-#define VM_MAYSHARE	0x00000080
-
-#define VM_GROWSDOWN	0x00000100	/* general info on the segment */
-#define VM_UFFD_MISSING	0x00000200	/* missing pages tracking */
-#define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
-#define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
-#define VM_UFFD_WP	0x00001000	/* wrprotect pages tracking */
-
-#define VM_LOCKED	0x00002000
-#define VM_IO           0x00004000	/* Memory mapped I/O or similar */
-
-					/* Used by sys_madvise() */
-#define VM_SEQ_READ	0x00008000	/* App will access data sequentially */
-#define VM_RAND_READ	0x00010000	/* App will not benefit from clustered reads */
-
-#define VM_DONTCOPY	0x00020000      /* Do not copy this vma on fork */
-#define VM_DONTEXPAND	0x00040000	/* Cannot expand with mremap() */
-#define VM_LOCKONFAULT	0x00080000	/* Lock the pages covered when they are faulted in */
-#define VM_ACCOUNT	0x00100000	/* Is a VM accounted object */
-#define VM_NORESERVE	0x00200000	/* should the VM suppress accounting */
-#define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
-#define VM_SYNC		0x00800000	/* Synchronous page faults */
-#define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
-#define VM_WIPEONFORK	0x02000000	/* Wipe VMA contents in child. */
-#define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
+#define VM_MAYREAD	VM_FLAGS_BIT(4)	 /* limits for mprotect() etc */
+#define VM_MAYWRITE	VM_FLAGS_BIT(5)
+#define VM_MAYEXEC	VM_FLAGS_BIT(6)
+#define VM_MAYSHARE	VM_FLAGS_BIT(7)
+
+#define VM_GROWSDOWN	VM_FLAGS_BIT(8)	 /* general info on the segment */
+#define VM_UFFD_MISSING	VM_FLAGS_BIT(9)	 /* missing pages tracking */
+#define VM_PFNMAP	VM_FLAGS_BIT(10) /* Page-ranges managed without "struct page", just pure PFN */
+#define VM_DENYWRITE	VM_FLAGS_BIT(11) /* ETXTBSY on write attempts.. */
+#define VM_UFFD_WP	VM_FLAGS_BIT(12) /* wrprotect pages tracking */
+
+#define VM_LOCKED	VM_FLAGS_BIT(13)
+#define VM_IO           VM_FLAGS_BIT(14) /* Memory mapped I/O or similar */
+
+					 /* Used by sys_madvise() */
+#define VM_SEQ_READ	VM_FLAGS_BIT(15) /* App will access data sequentially */
+#define VM_RAND_READ	VM_FLAGS_BIT(16) /* App will not benefit from clustered reads */
+
+#define VM_DONTCOPY	VM_FLAGS_BIT(17) /* Do not copy this vma on fork */
+#define VM_DONTEXPAND	VM_FLAGS_BIT(18) /* Cannot expand with mremap() */
+#define VM_LOCKONFAULT	VM_FLAGS_BIT(19) /* Lock the pages covered when they are faulted in */
+#define VM_ACCOUNT	VM_FLAGS_BIT(20) /* Is a VM accounted object */
+#define VM_NORESERVE	VM_FLAGS_BIT(21) /* should the VM suppress accounting */
+#define VM_HUGETLB	VM_FLAGS_BIT(22) /* Huge TLB Page VM */
+#define VM_SYNC		VM_FLAGS_BIT(23) /* Synchronous page faults */
+#define VM_ARCH_1	VM_FLAGS_BIT(24) /* Architecture-specific flag */
+#define VM_WIPEONFORK	VM_FLAGS_BIT(25) /* Wipe VMA contents in child. */
+#define VM_DONTDUMP	VM_FLAGS_BIT(26) /* Do not include in the core dump */
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
-# define VM_SOFTDIRTY	0x08000000	/* Not soft dirty clean area */
+# define VM_SOFTDIRTY	VM_FLAGS_BIT(27) /* Not soft dirty clean area */
 #else
 # define VM_SOFTDIRTY	0
 #endif
 
-#define VM_MIXEDMAP	0x10000000	/* Can contain "struct page" and pure PFN pages */
-#define VM_HUGEPAGE	0x20000000	/* MADV_HUGEPAGE marked this vma */
-#define VM_NOHUGEPAGE	0x40000000	/* MADV_NOHUGEPAGE marked this vma */
-#define VM_MERGEABLE	0x80000000	/* KSM may merge identical pages */
-
-#ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
-#define VM_HIGH_ARCH_BIT_0	32	/* bit only usable on 64-bit architectures */
-#define VM_HIGH_ARCH_BIT_1	33	/* bit only usable on 64-bit architectures */
-#define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
-#define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
-#define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
-#define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
-#define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
-#define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
-#define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
-#define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
-#endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
+#define VM_MIXEDMAP	VM_FLAGS_BIT(28) /* Can contain "struct page" and pure PFN pages */
+#define VM_HUGEPAGE	VM_FLAGS_BIT(29) /* MADV_HUGEPAGE marked this vma */
+#define VM_NOHUGEPAGE	VM_FLAGS_BIT(30) /* MADV_NOHUGEPAGE marked this vma */
+#define VM_MERGEABLE	VM_FLAGS_BIT(31) /* KSM may merge identical pages */
+
+#define VM_HIGH_ARCH_0	VM_FLAGS_BIT(32)
+#define VM_HIGH_ARCH_1	VM_FLAGS_BIT(33)
+#define VM_HIGH_ARCH_2	VM_FLAGS_BIT(34)
+#define VM_HIGH_ARCH_3	VM_FLAGS_BIT(35)
+#define VM_HIGH_ARCH_4	VM_FLAGS_BIT(36)
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
-# define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
+# define VM_PKEY_SHIFT	32
 # define VM_PKEY_BIT0	VM_HIGH_ARCH_0	/* A protection key is a 4-bit value */
 # define VM_PKEY_BIT1	VM_HIGH_ARCH_1	/* on x86 and 5-bit value on ppc64   */
 # define VM_PKEY_BIT2	VM_HIGH_ARCH_2
@@ -373,8 +368,7 @@ int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-# define VM_UFFD_MINOR_BIT	37
-# define VM_UFFD_MINOR		BIT(VM_UFFD_MINOR_BIT)	/* UFFD minor faults */
+# define VM_UFFD_MINOR		VM_FLAGS_BIT(37)	/* UFFD minor faults */
 #else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
 # define VM_UFFD_MINOR		VM_NONE
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
@@ -1894,7 +1888,7 @@ extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long
 			      unsigned long cp_flags);
 extern int mprotect_fixup(struct vm_area_struct *vma,
 			  struct vm_area_struct **pprev, unsigned long start,
-			  unsigned long end, unsigned long newflags);
+			  unsigned long end, vm_flags_t newflags);
 
 /*
  * doesn't attempt to fault and will return short.
@@ -2545,7 +2539,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
 }
 extern struct vm_area_struct *vma_merge(struct mm_struct *,
 	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
-	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
+	vm_flags_t vm_flags, struct anon_vma *, struct file *, pgoff_t,
 	struct mempolicy *, struct vm_userfaultfd_ctx);
 extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
 extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
@@ -2626,7 +2620,7 @@ static inline void mm_populate(unsigned long addr, unsigned long len) {}
 
 /* These take the mm semaphore themselves */
 extern int __must_check vm_brk(unsigned long, unsigned long);
-extern int __must_check vm_brk_flags(unsigned long, unsigned long, unsigned long);
+extern int __must_check vm_brk_flags(unsigned long, unsigned long, vm_flags_t);
 extern int vm_munmap(unsigned long, size_t);
 extern unsigned long __must_check vm_mmap(struct file *, unsigned long,
         unsigned long, unsigned long,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5aacc1c..cb612d0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -264,7 +264,7 @@ struct page_frag_cache {
 	bool pfmemalloc;
 };
 
-typedef unsigned long vm_flags_t;
+typedef u64 vm_flags_t;
 
 /*
  * A region containing a mapping of a non-memory backed file under NOMMU
@@ -330,7 +330,7 @@ struct vm_area_struct {
 	 * See vmf_insert_mixed_prot() for discussion.
 	 */
 	pgprot_t vm_page_prot;
-	unsigned long vm_flags;		/* Flags, see mm.h. */
+	vm_flags_t vm_flags;			/* Flags, see mm.h. */
 
 	/*
 	 * For areas with an address space and backing store,
@@ -478,7 +478,7 @@ struct mm_struct {
 		unsigned long data_vm;	   /* VM_WRITE & ~VM_SHARED & ~VM_STACK */
 		unsigned long exec_vm;	   /* VM_EXEC & ~VM_WRITE & ~VM_STACK */
 		unsigned long stack_vm;	   /* VM_STACK */
-		unsigned long def_flags;
+		vm_flags_t def_flags;
 
 		spinlock_t arg_lock; /* protect the below fields */
 		unsigned long start_code, end_code, start_data, end_data;
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 629cefc..b2cbae9 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -135,7 +135,7 @@ static inline bool arch_validate_flags(unsigned long flags)
 /*
  * Combine the mmap "prot" argument into "vm_flags" used internally.
  */
-static inline unsigned long
+static inline vm_flags_t
 calc_vm_prot_bits(unsigned long prot, unsigned long pkey)
 {
 	return _calc_vm_trans(prot, PROT_READ,  VM_READ ) |
@@ -147,7 +147,7 @@ static inline bool arch_validate_flags(unsigned long flags)
 /*
  * Combine the mmap "flags" argument into "vm_flags" used internally.
  */
-static inline unsigned long
+static inline vm_flags_t
 calc_vm_flag_bits(unsigned long flags)
 {
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
diff --git a/mm/Kconfig b/mm/Kconfig
index 02d44e3..aa8efba 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -830,8 +830,6 @@ config DEVICE_PRIVATE
 config VMAP_PFN
 	bool
 
-config ARCH_USES_HIGH_VMA_FLAGS
-	bool
 config ARCH_HAS_PKEYS
 	bool
 
diff --git a/mm/debug.c b/mm/debug.c
index 0bdda84..6165b5f 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -202,7 +202,7 @@ void dump_vma(const struct vm_area_struct *vma)
 		"next %px prev %px mm %px\n"
 		"prot %lx anon_vma %px vm_ops %px\n"
 		"pgoff %lx file %px private_data %px\n"
-		"flags: %#lx(%pGv)\n",
+		"flags: %#llx(%pGv)\n",
 		vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_next,
 		vma->vm_prev, vma->vm_mm,
 		(unsigned long)pgprot_val(vma->vm_page_prot),
@@ -240,7 +240,7 @@ void dump_mm(const struct mm_struct *mm)
 		"numa_next_scan %lu numa_scan_offset %lu numa_scan_seq %d\n"
 #endif
 		"tlb_flush_pending %d\n"
-		"def_flags: %#lx(%pGv)\n",
+		"def_flags: %#llx(%pGv)\n",
 
 		mm, mm->mmap, (long long) mm->vmacache_seqnum, mm->task_size,
 #ifdef CONFIG_MMU
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6c0185f..ad76bde 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -345,7 +345,7 @@ struct attribute_group khugepaged_attr_group = {
 #endif /* CONFIG_SYSFS */
 
 int hugepage_madvise(struct vm_area_struct *vma,
-		     unsigned long *vm_flags, int advice)
+		     vm_flags_t *vm_flags, int advice)
 {
 	switch (advice) {
 	case MADV_HUGEPAGE:
diff --git a/mm/ksm.c b/mm/ksm.c
index 2f3aaeb..257147c 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2431,7 +2431,7 @@ static int ksm_scan_thread(void *nothing)
 }
 
 int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
-		unsigned long end, int advice, unsigned long *vm_flags)
+		unsigned long end, int advice, vm_flags_t *vm_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	int err;
diff --git a/mm/madvise.c b/mm/madvise.c
index 63e489e..5105393 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -71,7 +71,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	int error = 0;
 	pgoff_t pgoff;
-	unsigned long new_flags = vma->vm_flags;
+	vm_flags_t new_flags = vma->vm_flags;
 
 	switch (behavior) {
 	case MADV_NORMAL:
diff --git a/mm/memory.c b/mm/memory.c
index 730daa0..8d5e583 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -550,7 +550,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 		 (long long)pte_val(pte), (long long)pmd_val(*pmd));
 	if (page)
 		dump_page(page, "bad pte");
-	pr_alert("addr:%px vm_flags:%08lx anon_vma:%px mapping:%px index:%lx\n",
+	pr_alert("addr:%px vm_flags:%08llx anon_vma:%px mapping:%px index:%lx\n",
 		 (void *)addr, vma->vm_flags, vma->anon_vma, mapping, index);
 	pr_alert("file:%pD fault:%ps mmap:%ps readpage:%ps\n",
 		 vma->vm_file,
@@ -1241,7 +1241,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			struct page *page;
 
 			page = vm_normal_page(vma, addr, ptent);
-			if (unlikely(details) && page) {
+			if (unlikely(details) && page && !(vma->vm_flags & VM_NOSIGBUS)) {
 				/*
 				 * unmap_shared_mapping_pages() wants to
 				 * invalidate cache without truncating:
diff --git a/mm/mmap.c b/mm/mmap.c
index 0584e54..8bed547 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -191,7 +191,7 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
 	return next;
 }
 
-static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags,
+static int do_brk_flags(unsigned long addr, unsigned long request, vm_flags_t flags,
 		struct list_head *uf);
 SYSCALL_DEFINE1(brk, unsigned long, brk)
 {
@@ -1160,7 +1160,7 @@ static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1,
  */
 struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			struct vm_area_struct *prev, unsigned long addr,
-			unsigned long end, unsigned long vm_flags,
+			unsigned long end, vm_flags_t vm_flags,
 			struct anon_vma *anon_vma, struct file *file,
 			pgoff_t pgoff, struct mempolicy *policy,
 			struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
@@ -1353,7 +1353,7 @@ static inline unsigned long round_hint_to_min(unsigned long hint)
 }
 
 static inline int mlock_future_check(struct mm_struct *mm,
-				     unsigned long flags,
+				     vm_flags_t flags,
 				     unsigned long len)
 {
 	unsigned long locked, lock_limit;
@@ -3050,7 +3050,7 @@ int vm_munmap(unsigned long start, size_t len)
  *  anonymous maps.  eventually we may be able to do some
  *  brk-specific accounting here.
  */
-static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long flags, struct list_head *uf)
+static int do_brk_flags(unsigned long addr, unsigned long len, vm_flags_t flags, struct list_head *uf)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma, *prev;
@@ -3118,7 +3118,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
 	return 0;
 }
 
-int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
+int vm_brk_flags(unsigned long addr, unsigned long request, vm_flags_t flags)
 {
 	struct mm_struct *mm = current->mm;
 	unsigned long len;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index e7a4431..0433db7 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -397,10 +397,10 @@ static int prot_none_test(unsigned long addr, unsigned long next,
 
 int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
-	unsigned long start, unsigned long end, unsigned long newflags)
+	unsigned long start, unsigned long end, vm_flags_t newflags)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	unsigned long oldflags = vma->vm_flags;
+	vm_flags_t oldflags = vma->vm_flags;
 	long nrpages = (end - start) >> PAGE_SHIFT;
 	unsigned long charged = 0;
 	pgoff_t pgoff;
diff --git a/mm/mremap.c b/mm/mremap.c
index 47c255b..bf9a661 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -489,7 +489,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *new_vma;
-	unsigned long vm_flags = vma->vm_flags;
+	vm_flags_t vm_flags = vma->vm_flags;
 	unsigned long new_pgoff;
 	unsigned long moved_len;
 	unsigned long excess = 0;
-- 
1.8.3.1



* [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap()
  2021-06-04  7:43 [PATCH v2 0/2] mm: support NOSIGBUS on fault of mmap Ming Lin
  2021-06-04  7:43 ` [PATCH v2 1/2] mm: make "vm_flags" be an u64 Ming Lin
@ 2021-06-04  7:43 ` Ming Lin
  2021-06-04 15:24   ` Kirill A. Shutemov
  2021-06-28 14:27     ` [Virtio-fs] " Vivek Goyal
  1 sibling, 2 replies; 9+ messages in thread
From: Ming Lin @ 2021-06-04  7:43 UTC (permalink / raw)
  To: Linus Torvalds, Hugh Dickins, Simon Ser, Matthew Wilcox
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-api

Add a new mmap() flag, MAP_NOSIGBUS, that requests "don't SIGBUS on
fault" behavior. For now, the flag is only allowed for private
mappings.

For a MAP_NOSIGBUS mapping, a read fault maps in the zero page and a
write fault gets a freshly allocated page filled with zeroes.
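
The mman.h hunk of this patch passes the new flag through
calc_vm_flag_bits() using _calc_vm_trans(). Below is a simplified userspace
sketch of that translation step only; it does not model fault handling. The
MAP_NOSIGBUS and VM_NOSIGBUS values are the ones proposed in this series,
MAP_LOCKED is the asm-generic value, and trans_flag(),
calc_flag_bits_sketch() and check_translation() are hypothetical stand-ins
rather than kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit flag word, as introduced by patch 1/2. */
typedef uint64_t vm_flags_t;

#define VM_FLAGS_BIT(N)	(1ULL << (N))
#define VM_LOCKED	VM_FLAGS_BIT(13)
#define VM_NOSIGBUS	VM_FLAGS_BIT(38)	/* proposed by this patch */

#define MAP_LOCKED	0x2000			/* asm-generic value */
#define MAP_NOSIGBUS	0x200000		/* proposed by this patch */

/*
 * If the single bit "from" is set in the mmap() flags, return the
 * single bit "to" of the internal flag word, otherwise 0.
 */
static vm_flags_t trans_flag(unsigned long flags, unsigned long from,
			     vm_flags_t to)
{
	return (flags & from) ? to : 0;
}

/* Translate the subset of mmap() flags covered by this sketch. */
static vm_flags_t calc_flag_bits_sketch(unsigned long flags)
{
	return trans_flag(flags, MAP_LOCKED, VM_LOCKED) |
	       trans_flag(flags, MAP_NOSIGBUS, VM_NOSIGBUS);
}

/* Self-check of the mapping from userspace bits to internal bits. */
static void check_translation(void)
{
	assert(calc_flag_bits_sketch(0) == 0);
	assert(calc_flag_bits_sketch(MAP_NOSIGBUS) == VM_NOSIGBUS);
	assert(calc_flag_bits_sketch(MAP_LOCKED | MAP_NOSIGBUS) ==
	       (VM_LOCKED | VM_NOSIGBUS));
}
```

The sketch returns vm_flags_t rather than "unsigned long" because the
destination bit (38) does not fit in a 32-bit word, which is the reason
patch 1/2 widens the type first.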

Signed-off-by: Ming Lin <mlin@kernel.org>
---
 arch/parisc/include/uapi/asm/mman.h          |  1 +
 include/linux/mm.h                           |  2 ++
 include/linux/mman.h                         |  1 +
 include/uapi/asm-generic/mman-common.h       |  1 +
 mm/memory.c                                  | 11 +++++++++++
 mm/mmap.c                                    |  4 ++++
 tools/include/uapi/asm-generic/mman-common.h |  1 +
 7 files changed, 21 insertions(+)

diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index ab78cba..eecf9af 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -25,6 +25,7 @@
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
 #define MAP_FIXED_NOREPLACE 0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+#define MAP_NOSIGBUS	0x200000	/* do not SIGBUS on fault */
 #define MAP_UNINITIALIZED 0		/* uninitialized anonymous mmap */
 
 #define MS_SYNC		1		/* synchronous memory sync */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9e86ca1..100d122 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -373,6 +373,8 @@ int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 # define VM_UFFD_MINOR		VM_NONE
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
 
+#define VM_NOSIGBUS		VM_FLAGS_BIT(38)	/* Do not SIGBUS on fault */
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
 
diff --git a/include/linux/mman.h b/include/linux/mman.h
index b2cbae9..c966b08 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -154,6 +154,7 @@ static inline bool arch_validate_flags(unsigned long flags)
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
 	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
 	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
+	       _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
 	       arch_calc_vm_flag_bits(flags);
 }
 
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index f94f65d..a2a5333 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -29,6 +29,7 @@
 #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
 #define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
 #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+#define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on fault */
 
 #define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
 					 * uninitialized */
diff --git a/mm/memory.c b/mm/memory.c
index 8d5e583..6b5a897 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3676,6 +3676,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 	}
 
 	ret = vma->vm_ops->fault(vmf);
+	if (unlikely(ret & VM_FAULT_SIGBUS) && (vma->vm_flags & VM_NOSIGBUS)) {
+		/*
+		 * For MAP_NOSIGBUS mapping, map in the zero page on read fault
+		 * or fill a freshly allocated page with zeroes on write fault
+		 */
+		ret = do_anonymous_page(vmf);
+		if (!ret)
+			ret = VM_FAULT_NOPAGE;
+		return ret;
+	}
+
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
 			    VM_FAULT_DONE_COW)))
 		return ret;
diff --git a/mm/mmap.c b/mm/mmap.c
index 8bed547..d5c9fb5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1419,6 +1419,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (!len)
 		return -EINVAL;
 
+	/* Restrict MAP_NOSIGBUS to MAP_PRIVATE mapping */
+	if ((flags & MAP_NOSIGBUS) && !(flags & MAP_PRIVATE))
+		return -EINVAL;
+
 	/*
 	 * Does the application expect PROT_READ to imply PROT_EXEC?
 	 *
diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h
index f94f65d..a2a5333 100644
--- a/tools/include/uapi/asm-generic/mman-common.h
+++ b/tools/include/uapi/asm-generic/mman-common.h
@@ -29,6 +29,7 @@
 #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
 #define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
 #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+#define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on fault */
 
 #define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
 					 * uninitialized */
-- 
1.8.3.1



* Re: [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap()
  2021-06-04  7:43 ` [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap() Ming Lin
@ 2021-06-04 15:24   ` Kirill A. Shutemov
  2021-06-04 16:22     ` Ming Lin
  2021-06-28 14:27     ` [Virtio-fs] " Vivek Goyal
  1 sibling, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2021-06-04 15:24 UTC (permalink / raw)
  To: Ming Lin
  Cc: Linus Torvalds, Hugh Dickins, Simon Ser, Matthew Wilcox,
	linux-mm, linux-kernel, linux-fsdevel, linux-api

On Fri, Jun 04, 2021 at 12:43:22AM -0700, Ming Lin wrote:
> Adds new flag MAP_NOSIGBUS of mmap() to specify the behavior of
> "don't SIGBUS on fault". Right now, this flag is only allowed
> for private mapping.

That's not what your use case asks for.

SIGBUS can be generated for a number of reasons, not only on fault beyond
end-of-file. vmf_error() would convert any errno, except ENOMEM to
VM_FAULT_SIGBUS.

Do you want to ignore -EIO or -ENOSPC? I don't think so.
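
A minimal user-space sketch of that point, assuming simplified stand-ins for
the kernel definitions (the MODEL_ constants, model_vmf_error() and
model_same_fault() are hypothetical names for illustration; only the
"ENOMEM versus everything else" split is taken from the description above).
Once the errno has been folded into a fault code, a check on the SIGBUS bit
alone can no longer tell one failure from another:

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins, not the kernel's definitions. */
enum model_fault {
	MODEL_VM_FAULT_OOM    = 0x1,
	MODEL_VM_FAULT_SIGBUS = 0x2,
};

/*
 * Model of the conversion described above: -ENOMEM becomes an OOM
 * fault, every other errno becomes a SIGBUS fault.
 */
static enum model_fault model_vmf_error(int err)
{
	assert(err < 0);	/* kernel convention: negative errno */
	if (err == -ENOMEM)
		return MODEL_VM_FAULT_OOM;
	return MODEL_VM_FAULT_SIGBUS;
}

/*
 * Returns 1 when two different failures are indistinguishable to a
 * caller that only looks at the resulting fault code.
 */
static int model_same_fault(int err_a, int err_b)
{
	return model_vmf_error(err_a) == model_vmf_error(err_b);
}
```

In this model, model_same_fault(-EIO, -ENOSPC) is true, so a fallback keyed
only on the SIGBUS code would also hide both of those errors.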

> For MAP_NOSIGBUS mapping, map in the zero page on read fault
> or fill a freshly allocated page with zeroes on write fault.

I don't like the resulting semantics: if you had a read fault beyond EOF
and got zero page, you will still see zero page even if the file grows.
Yes, it's allowed by POSIX for MAP_PRIVATE to get out-of-sync with the
file, but it's not what users are used to.
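
The divergence that MAP_PRIVATE already permits can be seen without the
patch. A minimal sketch, assuming a temporary file under /tmp
(private_byte_after_file_update() and the file name template are
hypothetical, chosen for illustration): once a private mapping holds its own
copy of a page, later writes to the file no longer show through. This is the
existing copy-on-write case only; the zero-page case discussed here needs the
patched kernel and is not exercised.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns the byte seen through a MAP_PRIVATE mapping after
 *   1. the mapping wrote 'P' to it (forcing copy-on-write), and
 *   2. the file itself was then updated to 'F' with pwrite().
 * Returns -1 on setup failure.
 */
static int private_byte_after_file_update(void)
{
	char path[] = "/tmp/private-demo-XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	int seen = -1;
	char *map;
	int fd;

	assert(page > 0);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the open fd keeps the file alive */

	if (ftruncate(fd, page) < 0) {
		close(fd);
		return -1;
	}

	map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	map[0] = 'P';				/* private copy is made here */
	if (pwrite(fd, "F", 1, 0) == 1)		/* file changes afterwards */
		seen = (unsigned char)map[0];

	munmap(map, (size_t)page);
	close(fd);
	return seen;
}
```

The helper returns 'P': the mapping keeps its private copy even though the
file now starts with 'F'.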

It might be enough for the use case, but I would rather avoid one-user
features.

-- 
 Kirill A. Shutemov


* Re: [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap()
  2021-06-04 15:24   ` Kirill A. Shutemov
@ 2021-06-04 16:22     ` Ming Lin
  0 siblings, 0 replies; 9+ messages in thread
From: Ming Lin @ 2021-06-04 16:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Hugh Dickins, Simon Ser, Matthew Wilcox,
	linux-mm, linux-kernel, linux-fsdevel, linux-api

On Fri, Jun 04, 2021 at 06:24:07PM +0300, Kirill A. Shutemov wrote:
> On Fri, Jun 04, 2021 at 12:43:22AM -0700, Ming Lin wrote:
> > Adds new flag MAP_NOSIGBUS of mmap() to specify the behavior of
> > "don't SIGBUS on fault". Right now, this flag is only allowed
> > for private mapping.
> 
> That's not what your use case asks for.

Simon explained the use case here: https://bit.ly/3wR85Lc

FYI, I copied it here too.

------begin-------------------------------------------------------------------
Regarding the requirements for Wayland:

- The baseline requirement is being able to avoid SIGBUS for read-only mappings
  of shm files.
- Wayland clients can expand their shm files. However the compositor doesn't
  need to immediately access the new expanded region. The client will tell the
  compositor what the new shm file size is, and the compositor will re-map it.
- Ideally, MAP_NOSIGBUS would work on PROT_WRITE + MAP_SHARED mappings (of
  course, the no-SIGBUS behavior would be restricted to that mapping). The
  use-case is writing back to client buffers e.g. for screen capture. From the
  earlier discussions it seems like this would be complicated to implement.
  This means we'll need to come up with a new libwayland API to allow
  compositors to opt-in to the read-only mappings. This is sub-optimal but
  seems doable.
- Ideally, MAP_NOSIGBUS wouldn't be restricted to shm. There are use-cases for
  using it on ordinary files too, e.g. for sharing ICC profiles. But from the
  earlier replies it seems very unlikely that this will become possible, and
  making it work only on shm files would already be fantastic.
------end-------------------------------------------------------------------

> 
> SIGBUS can be generated for a number of reasons, not only on fault beyond
> end-of-file. vmf_error() would convert any errno, except ENOMEM to
> VM_FAULT_SIGBUS.
> 
> Do you want to ignore -EIO or -ENOSPC? I don't think so.
> 
> > For MAP_NOSIGBUS mapping, map in the zero page on read fault
> > or fill a freshly allocated page with zeroes on write fault.
> 
> I don't like the resulting semantics: if you had a read fault beyond EOF
> and got zero page, you will still see zero page even if the file grows.
> Yes, it's allowed by POSIX for MAP_PRIVATE to get out-of-sync with the
> file, but it's not what users are used to.

Actually, an older version did support a growing file.
https://github.com/minggr/linux/commit/77f3722b94ff33cafe0a72c1bf1b8fa374adb29f

We can support this if there is a real use case.

> 
> It might be enough for the use case, but I would rather avoid one-user
> features.


* Re: [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap()
  2021-06-04  7:43 ` [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap() Ming Lin
@ 2021-06-28 14:27     ` Vivek Goyal
  2021-06-28 14:27     ` [Virtio-fs] " Vivek Goyal
  1 sibling, 0 replies; 9+ messages in thread
From: Vivek Goyal @ 2021-06-28 14:27 UTC (permalink / raw)
  To: Ming Lin
  Cc: Linus Torvalds, Hugh Dickins, Simon Ser, Matthew Wilcox,
	linux-mm, linux-kernel, linux-fsdevel, linux-api, virtio-fs-list,
	Dr. David Alan Gilbert, Miklos Szeredi

On Fri, Jun 04, 2021 at 12:43:22AM -0700, Ming Lin wrote:
> Adds new flag MAP_NOSIGBUS of mmap() to specify the behavior of
> "don't SIGBUS on fault". Right now, this flag is only allowed
> for private mapping.
> 
> For MAP_NOSIGBUS mapping, map in the zero page on read fault
> or fill a freshly allocated page with zeroes on write fault.

I am wondering if this could be of limited use for me if MAP_NOSIGBUS
were to be supported for shared mappings as well.

When virtiofs is run with dax enabled, then it is possible that if
a file is shared between two guests, then one guest truncates the
file and second guest tries to do load/store operation. Given current
kvm architecture, there is no mechanism to propagate SIGBUS to guest
process, instead KVM retries page fault infinitely and guest cpu/process
hangs.

Ideally we want this error to propagate all the way back into the
guest and to the guest process but that solution is not in place yet.

https://lore.kernel.org/kvm/20200406190951.GA19259@redhat.com/

In the absence of a proper solution, one could think of mapping
the shared file on the host with MAP_NOSIGBUS, and hopefully that means
kvm will be able to resolve the fault to a zero-filled page and the guest
will not hang. But this means that data sharing between two processes
is now broken. Writes by process A will not be visible to process B
in another guest once this situation happens, IIUC.

So if we were to use MAP_NOSIGBUS, the guest will not hang but failures
resulting from ftruncate will be silent and will be noticed sometime later.
I guess not exactly a very pleasant scenario...

Thanks
Vivek




* Re: [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap()
  2021-06-28 14:27     ` [Virtio-fs] " Vivek Goyal
@ 2021-06-30 16:37       ` Ming Lin
  -1 siblings, 0 replies; 9+ messages in thread
From: Ming Lin @ 2021-06-30 16:37 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Linus Torvalds, Hugh Dickins, Simon Ser, Matthew Wilcox,
	linux-mm, linux-kernel, linux-fsdevel, linux-api, virtio-fs-list,
	Dr. David Alan Gilbert, Miklos Szeredi

On Mon, Jun 28, 2021 at 10:27:23AM -0400, Vivek Goyal wrote:
> On Fri, Jun 04, 2021 at 12:43:22AM -0700, Ming Lin wrote:
> > Adds new flag MAP_NOSIGBUS of mmap() to specify the behavior of
> > "don't SIGBUS on fault". Right now, this flag is only allowed
> > for private mapping.
> > 
> > For MAP_NOSIGBUS mapping, map in the zero page on read fault
> > or fill a freshly allocated page with zeroes on write fault.
> 
> I am wondering if this could be of limited use for me if MAP_NOSIGBUS
> were to be supported for shared mappings as well.

V1 did support shared mapping.
https://lkml.org/lkml/2021/6/1/1078

And V0 even supported unmapping the zero page for later write.
https://github.com/minggr/linux/commit/77f3722b94ff33cafe0a72c1bf1b8fa374adb29f

We may support shared mapping if there is a real use case.
As Hugh mentioned:
> And by restricting to MAP_PRIVATE, you would allow for adding a
> proper MAP_SHARED implementation later, if it's thought useful
> (that being the implementation which can subsequently unmap a
> zero page to let new page cache be mapped).

See https://lkml.org/lkml/2021/6/1/1258

Ming

> 
> When virtiofs is run with dax enabled, then it is possible that if
> a file is shared between two guests, then one guest truncates the
> file and second guest tries to do load/store operation. Given current
> kvm architecture, there is no mechanism to propagate SIGBUS to guest
> process, instead KVM retries page fault infinitely and guest cpu/process
> hangs.
> 
> Ideally we want this error to propagate all the way back into the
> guest and to the guest process but that solution is not in place yet.
> 
> https://lore.kernel.org/kvm/20200406190951.GA19259@redhat.com/
> 
> In the absence of a proper solution, one could think of mapping
> the shared file on the host with MAP_NOSIGBUS, and hopefully that means
> kvm will be able to resolve the fault to a zero-filled page and the guest
> will not hang. But this means that data sharing between two processes
> is now broken. Writes by process A will not be visible to process B
> in another guest once this situation happens, IIUC.
> 
> So if we were to use MAP_NOSIGBUS, the guest will not hang but failures
> resulting from ftruncate will be silent and will be noticed sometime later.
> I guess not exactly a very pleasant scenario...
> 
> Thanks
> Vivek


end of thread, other threads:[~2021-06-30 16:37 UTC | newest]

Thread overview: 9+ messages
2021-06-04  7:43 [PATCH v2 0/2] mm: support NOSIGBUS on fault of mmap Ming Lin
2021-06-04  7:43 ` [PATCH v2 1/2] mm: make "vm_flags" be an u64 Ming Lin
2021-06-04  7:43 ` [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap() Ming Lin
2021-06-04 15:24   ` Kirill A. Shutemov
2021-06-04 16:22     ` Ming Lin
2021-06-28 14:27   ` Vivek Goyal
2021-06-28 14:27     ` [Virtio-fs] " Vivek Goyal
2021-06-30 16:37     ` Ming Lin
2021-06-30 16:37       ` [Virtio-fs] " Ming Lin
