linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] Allow user to request memory to be locked on page fault
@ 2015-05-08 19:33 Eric B Munson
  2015-05-08 19:33 ` [PATCH 1/3] Add flag to request pages are locked after " Eric B Munson
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Eric B Munson @ 2015-05-08 19:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric B Munson, Shuah Khan, linux-alpha, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm,
	linux-arch, linux-api

mlock() allows a user to control page out of program memory, but this
comes at the cost of faulting in the entire mapping when it is
allocated.  For large mappings where the entire area is not necessary
this is not ideal.

This series introduces new flags for mmap() and mlockall() that allow a
user to specify that the covered area should not be paged out, but only
after the memory has been used the first time.

The performance cost of these patches is minimal on the two benchmarks
I have tested (stream and kernbench).

Avg throughput in MB/s from stream using 1000000 element arrays
Test     4.1-rc2      4.1-rc2+lock-on-fault
Copy:    10,979.08    10,917.34
Scale:   11,094.45    11,023.01
Add:     12,487.29    12,388.65
Triad:   12,505.77    12,418.78

Kernbench optimal load
                 4.1-rc2  4.1-rc2+lock-on-fault
Elapsed Time     71.046   71.324
User Time        62.117   62.352
System Time      8.926    8.969
Context Switches 14531.9  14542.5
Sleeps           14935.9  14939
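
To put the figures above in relative terms, a small sketch of the
arithmetic follows.  The helper name percent_change() is made up for
illustration; the inputs are the Copy throughput and Elapsed Time rows
from the tables.  By this calculation Copy throughput drops by roughly
0.56% and kernbench elapsed time rises by roughly 0.39%.

```c
/* Hypothetical helper: relative change of a patched result against its
 * baseline, in percent.  Negative means the patched value is lower. */
static double percent_change(double baseline, double patched)
{
	return (patched - baseline) / baseline * 100.0;
}
```

For example, percent_change(10979.08, 10917.34) covers the stream Copy
row and percent_change(71.046, 71.324) the kernbench Elapsed Time row.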

Eric B Munson (3):
  Add flag to request pages are locked after page fault
  Add mlockall flag for locking pages on fault
  Add tests for lock on fault

 arch/alpha/include/uapi/asm/mman.h          |   2 +
 arch/mips/include/uapi/asm/mman.h           |   2 +
 arch/parisc/include/uapi/asm/mman.h         |   2 +
 arch/powerpc/include/uapi/asm/mman.h        |   2 +
 arch/sparc/include/uapi/asm/mman.h          |   2 +
 arch/tile/include/uapi/asm/mman.h           |   2 +
 arch/xtensa/include/uapi/asm/mman.h         |   2 +
 include/linux/mm.h                          |   1 +
 include/linux/mman.h                        |   3 +-
 include/uapi/asm-generic/mman.h             |   2 +
 mm/mlock.c                                  |  13 ++-
 mm/mmap.c                                   |   4 +-
 mm/swap.c                                   |   3 +-
 tools/testing/selftests/vm/Makefile         |   8 +-
 tools/testing/selftests/vm/lock-on-fault.c  | 145 ++++++++++++++++++++++++++++
 tools/testing/selftests/vm/on-fault-limit.c |  47 +++++++++
 tools/testing/selftests/vm/run_vmtests      |  23 +++++
 17 files changed, 254 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/vm/lock-on-fault.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-mm@kvack.org
Cc: linux-arch@vger.kernel.org
Cc: linux-api@vger.kernel.org

-- 
1.9.1



* [PATCH 1/3] Add flag to request pages are locked after page fault
  2015-05-08 19:33 [PATCH 0/3] Allow user to request memory to be locked on page fault Eric B Munson
@ 2015-05-08 19:33 ` Eric B Munson
  2015-05-08 19:33 ` [PATCH 2/3] Add mlockall flag for locking pages on fault Eric B Munson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Eric B Munson @ 2015-05-08 19:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric B Munson, linux-alpha, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm,
	linux-arch, linux-api

The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be
used this can incur a high penalty for locking.  This patch introduces
the ability to request that pages are not pre-faulted, but are placed on
the unevictable LRU when they are finally faulted in.
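
As a rough userspace model of the decision this patch changes in
lru_cache_add_active_or_unevictable(), the sketch below shows which
VMA flag combinations send a newly faulted page to the unevictable LRU.
The helper name is hypothetical.  VM_LOCKONFAULT and VM_LOCKED use the
values from this patch; the remaining values are what I believe
include/linux/mm.h used at the time and should be treated as
assumptions.

```c
#include <stdbool.h>

/* VM_LOCKONFAULT and VM_LOCKED as in this patch; the other values are
 * assumed from 4.1-era include/linux/mm.h and are illustrative only. */
#define VM_PFNMAP	0x00000400UL
#define VM_LOCKONFAULT	0x00001000UL
#define VM_LOCKED	0x00002000UL
#define VM_IO		0x00004000UL
#define VM_DONTEXPAND	0x00040000UL
#define VM_MIXEDMAP	0x10000000UL
#define VM_SPECIAL	(VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)

/* Hypothetical helper: true if a page faulted into a VMA with these
 * flags would be placed on the unevictable LRU rather than the active
 * LRU. */
static bool page_goes_unevictable(unsigned long vm_flags)
{
	/* Neither lock flag set: ordinary active page. */
	if ((vm_flags & (VM_LOCKED | VM_LOCKONFAULT)) == 0)
		return false;
	/* Special mappings are never treated as mlocked. */
	if (vm_flags & VM_SPECIAL)
		return false;
	return true;
}
```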

To keep accounting checks out of the page fault path, users are billed
for locking the entire mapping, as if MAP_LOCKED were used.
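
A minimal sketch of what this accounting means in practice, modelled
loosely on mlock_future_check(): the whole length of the mapping is
counted against RLIMIT_MEMLOCK at mmap() time, whether or not any page
has been touched.  The function and parameter names are hypothetical,
and the CAP_IPC_LOCK override in the kernel is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the limit check.  All of len is
 * charged up front, so a lock-on-fault mapping can be refused even if
 * none of its pages would ever be faulted in. */
static bool exceeds_memlock_limit(size_t len, unsigned long locked_vm_pages,
				  unsigned long rlimit_bytes,
				  unsigned int page_shift)
{
	unsigned long locked = (len >> page_shift) + locked_vm_pages;
	unsigned long lock_limit = rlimit_bytes >> page_shift;

	return locked > lock_limit;
}
```

With 4 KiB pages and a 64 KiB limit, a 1 MiB lock-on-fault mapping is
over the limit before any of it has been used.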

Signed-off-by: Eric B Munson <emunson@akamai.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-mm@kvack.org
Cc: linux-arch@vger.kernel.org
Cc: linux-api@vger.kernel.org
---
 arch/alpha/include/uapi/asm/mman.h   | 1 +
 arch/mips/include/uapi/asm/mman.h    | 1 +
 arch/parisc/include/uapi/asm/mman.h  | 1 +
 arch/powerpc/include/uapi/asm/mman.h | 1 +
 arch/sparc/include/uapi/asm/mman.h   | 1 +
 arch/tile/include/uapi/asm/mman.h    | 1 +
 arch/xtensa/include/uapi/asm/mman.h  | 1 +
 include/linux/mm.h                   | 1 +
 include/linux/mman.h                 | 3 ++-
 include/uapi/asm-generic/mman.h      | 1 +
 mm/mmap.c                            | 4 ++--
 mm/swap.c                            | 3 ++-
 12 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..15e96e1 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -30,6 +30,7 @@
 #define MAP_NONBLOCK	0x40000		/* do not block on IO */
 #define MAP_STACK	0x80000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x100000	/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x200000	/* Lock pages after they are faulted in, do not prefault */
 
 #define MS_ASYNC	1		/* sync memory asynchronously */
 #define MS_SYNC		2		/* synchronous memory sync */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index cfcb876..47846a5 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -48,6 +48,7 @@
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x100000	/* Lock pages after they are faulted in, do not prefault */
 
 /*
  * Flags for msync
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 294d251..1514cd7 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -24,6 +24,7 @@
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x100000	/* Lock pages after they are faulted in, do not prefault */
 
 #define MS_SYNC		1		/* synchronous memory sync */
 #define MS_ASYNC	2		/* sync memory asynchronously */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..fce74fe 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -27,5 +27,6 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x80000		/* Lock pages after they are faulted in, do not prefault */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 0b14df3..12425d8 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x80000		/* Lock pages after they are faulted in, do not prefault */
 
 
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
index 81b8fc3..ec04eaf 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -29,6 +29,7 @@
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_HUGETLB	0x4000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x8000		/* Lock pages after they are faulted in, do not prefault */
 
 
 /*
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 201aec0..42d43cc 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -55,6 +55,7 @@
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x100000	/* Lock pages after they are faulted in, do not prefault */
 #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
 # define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
 					 * uninitialized */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0755b9f..3e31457 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -126,6 +126,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
 #define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
 
+#define VM_LOCKONFAULT	0x00001000	/* Lock the pages covered when they are faulted in */
 #define VM_LOCKED	0x00002000
 #define VM_IO           0x00004000	/* Memory mapped I/O or similar */
 
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 16373c8..437264b 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -86,7 +86,8 @@ calc_vm_flag_bits(unsigned long flags)
 {
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
-	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    );
+	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
+	       _calc_vm_trans(flags, MAP_LOCKONFAULT,VM_LOCKONFAULT);
 }
 
 unsigned long vm_commit_limit(void);
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index e9fe6fd..fc4e586 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -12,6 +12,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x80000		/* Lock pages after they are faulted in, do not prefault */
 
 /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
 
diff --git a/mm/mmap.c b/mm/mmap.c
index bb50cac..ba1a6bf 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1233,7 +1233,7 @@ static inline int mlock_future_check(struct mm_struct *mm,
 	unsigned long locked, lock_limit;
 
 	/*  mlock MCL_FUTURE? */
-	if (flags & VM_LOCKED) {
+	if (flags & (VM_LOCKED | VM_LOCKONFAULT)) {
 		locked = len >> PAGE_SHIFT;
 		locked += mm->locked_vm;
 		lock_limit = rlimit(RLIMIT_MEMLOCK);
@@ -1301,7 +1301,7 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 	vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) |
 			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
 
-	if (flags & MAP_LOCKED)
+	if (flags & (MAP_LOCKED | MAP_LOCKONFAULT))
 		if (!can_do_mlock())
 			return -EPERM;
 
diff --git a/mm/swap.c b/mm/swap.c
index a7251a8..07c905e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -711,7 +711,8 @@ void lru_cache_add_active_or_unevictable(struct page *page,
 {
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
-	if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) {
+	if (likely((vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT)) == 0) ||
+		   (vma->vm_flags & VM_SPECIAL)) {
 		SetPageActive(page);
 		lru_cache_add(page);
 		return;
-- 
1.9.1



* [PATCH 2/3] Add mlockall flag for locking pages on fault
  2015-05-08 19:33 [PATCH 0/3] Allow user to request memory to be locked on page fault Eric B Munson
  2015-05-08 19:33 ` [PATCH 1/3] Add flag to request pages are locked after " Eric B Munson
@ 2015-05-08 19:33 ` Eric B Munson
  2015-05-08 19:33 ` [PATCH 3/3] Add tests for lock " Eric B Munson
  2015-05-08 19:42 ` [PATCH 0/3] Allow user to request memory to be locked on page fault Andrew Morton
  3 siblings, 0 replies; 18+ messages in thread
From: Eric B Munson @ 2015-05-08 19:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric B Munson, linux-alpha, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-arch,
	linux-api, linux-mm

Building on the previous patch, extend mlockall() to give a process a
way to specify that pages should be locked when they are faulted in, but
that pre-faulting is not needed.
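
The flag combinations accepted by mlockall() after this patch can be
sketched in userspace as follows.  The helper name is hypothetical and
the MCL_* values are the asm-generic ones from this series (several
architectures use different values).  Note that MCL_FUTURE and
MCL_ON_FAULT are rejected when passed together.

```c
#include <errno.h>

/* asm-generic values; MCL_ON_FAULT is the value proposed by this patch. */
#define MCL_CURRENT	1
#define MCL_FUTURE	2
#define MCL_ON_FAULT	4

/* Hypothetical model of the argument check at the top of
 * sys_mlockall(): returns 0 if flags would be accepted, -EINVAL
 * otherwise. */
static int check_mlockall_flags(int flags)
{
	if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE | MCL_ON_FAULT)))
		return -EINVAL;
	if ((flags & MCL_FUTURE) && (flags & MCL_ON_FAULT))
		return -EINVAL;
	return 0;
}
```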

Signed-off-by: Eric B Munson <emunson@akamai.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
---
 arch/alpha/include/uapi/asm/mman.h   |  1 +
 arch/mips/include/uapi/asm/mman.h    |  1 +
 arch/parisc/include/uapi/asm/mman.h  |  1 +
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h    |  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  1 +
 include/uapi/asm-generic/mman.h      |  1 +
 mm/mlock.c                           | 13 +++++++++----
 9 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 15e96e1..3120dfb 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -38,6 +38,7 @@
 
 #define MCL_CURRENT	 8192		/* lock all currently mapped pages */
 #define MCL_FUTURE	16384		/* lock all additions to address space */
+#define MCL_ON_FAULT	32768		/* lock all pages that are faulted in */
 
 #define MADV_NORMAL	0		/* no further special treatment */
 #define MADV_RANDOM	1		/* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index 47846a5..82aec3c 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -62,6 +62,7 @@
  */
 #define MCL_CURRENT	1		/* lock all current mappings */
 #define MCL_FUTURE	2		/* lock all future mappings */
+#define MCL_ON_FAULT	4		/* lock all pages that are faulted in */
 
 #define MADV_NORMAL	0		/* no further special treatment */
 #define MADV_RANDOM	1		/* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 1514cd7..f4601f3 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -32,6 +32,7 @@
 
 #define MCL_CURRENT	1		/* lock all current mappings */
 #define MCL_FUTURE	2		/* lock all future mappings */
+#define MCL_ON_FAULT	4		/* lock all pages that are faulted in */
 
 #define MADV_NORMAL     0               /* no further special treatment */
 #define MADV_RANDOM     1               /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index fce74fe..0a28efc 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 
 #define MCL_CURRENT     0x2000          /* lock all currently mapped pages */
 #define MCL_FUTURE      0x4000          /* lock all additions to address space */
+#define MCL_ON_FAULT	0x80000		/* lock all pages that are faulted in */
 
 #define MAP_POPULATE	0x8000		/* populate (prefault) pagetables */
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 12425d8..119be80 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -17,6 +17,7 @@
 
 #define MCL_CURRENT     0x2000          /* lock all currently mapped pages */
 #define MCL_FUTURE      0x4000          /* lock all additions to address space */
+#define MCL_ON_FAULT	0x80000		/* lock all pages that are faulted in */
 
 #define MAP_POPULATE	0x8000		/* populate (prefault) pagetables */
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
index ec04eaf..66ea935 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -37,6 +37,7 @@
  */
 #define MCL_CURRENT	1		/* lock all current mappings */
 #define MCL_FUTURE	2		/* lock all future mappings */
+#define MCL_ON_FAULT	4		/* lock all pages that are faulted in */
 
 
 #endif /* _ASM_TILE_MMAN_H */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 42d43cc..9abcc29 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -75,6 +75,7 @@
  */
 #define MCL_CURRENT	1		/* lock all current mappings */
 #define MCL_FUTURE	2		/* lock all future mappings */
+#define MCL_ON_FAULT	4		/* lock all pages that are faulted in */
 
 #define MADV_NORMAL	0		/* no further special treatment */
 #define MADV_RANDOM	1		/* expect random page references */
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index fc4e586..6ac7a7b 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -18,5 +18,6 @@
 
 #define MCL_CURRENT	1		/* lock all current mappings */
 #define MCL_FUTURE	2		/* lock all future mappings */
+#define MCL_ON_FAULT	4		/* lock all pages that are faulted in */
 
 #endif /* __ASM_GENERIC_MMAN_H */
diff --git a/mm/mlock.c b/mm/mlock.c
index 6fd2cf1..1406835 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -579,7 +579,7 @@ static int do_mlock(unsigned long start, size_t len, int on)
 
 		/* Here we know that  vma->vm_start <= nstart < vma->vm_end. */
 
-		newflags = vma->vm_flags & ~VM_LOCKED;
+		newflags = vma->vm_flags & ~(VM_LOCKED | VM_LOCKONFAULT);
 		if (on)
 			newflags |= VM_LOCKED;
 
@@ -662,13 +662,17 @@ static int do_mlockall(int flags)
 		current->mm->def_flags |= VM_LOCKED;
 	else
 		current->mm->def_flags &= ~VM_LOCKED;
-	if (flags == MCL_FUTURE)
+	if (flags & MCL_ON_FAULT)
+		current->mm->def_flags |= VM_LOCKONFAULT;
+	else
+		current->mm->def_flags &= ~VM_LOCKONFAULT;
+	if (flags == MCL_FUTURE || flags == MCL_ON_FAULT)
 		goto out;
 
 	for (vma = current->mm->mmap; vma ; vma = prev->vm_next) {
 		vm_flags_t newflags;
 
-		newflags = vma->vm_flags & ~VM_LOCKED;
+		newflags = vma->vm_flags & ~(VM_LOCKED | VM_LOCKONFAULT);
 		if (flags & MCL_CURRENT)
 			newflags |= VM_LOCKED;
 
@@ -685,7 +689,8 @@ SYSCALL_DEFINE1(mlockall, int, flags)
 	unsigned long lock_limit;
 	int ret = -EINVAL;
 
-	if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE)))
+	if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE | MCL_ON_FAULT)) ||
+	    ((flags & MCL_FUTURE) && (flags & MCL_ON_FAULT)))
 		goto out;
 
 	ret = -EPERM;
-- 
1.9.1



* [PATCH 3/3] Add tests for lock on fault
  2015-05-08 19:33 [PATCH 0/3] Allow user to request memory to be locked on page fault Eric B Munson
  2015-05-08 19:33 ` [PATCH 1/3] Add flag to request pages are locked after " Eric B Munson
  2015-05-08 19:33 ` [PATCH 2/3] Add mlockall flag for locking pages on fault Eric B Munson
@ 2015-05-08 19:33 ` Eric B Munson
  2015-05-08 19:42 ` [PATCH 0/3] Allow user to request memory to be locked on page fault Andrew Morton
  3 siblings, 0 replies; 18+ messages in thread
From: Eric B Munson @ 2015-05-08 19:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric B Munson, Shuah Khan, linux-mm, linux-kernel, linux-api

Test the mmap() flag, the mlockall() flag, and ensure that mlock limits
are respected.  Note that the limit test needs to be run as a normal user.
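
For readers unfamiliar with the /proc interfaces the test relies on,
here is a short sketch of the decoding involved: one 64-bit entry per
virtual page in /proc/self/pagemap (bit 63 is "present", the low 55
bits hold the PFN), and one 64-bit flags word per PFN in
/proc/kpageflags (bit 18 is "unevictable").  The helper and macro names
below are made up for illustration; the bit positions match the
constants used in lock-on-fault.c.

```c
#include <stdint.h>

#define PM_PRESENT	(1ULL << 63)		/* page is present in RAM */
#define PM_PFN_MASK	((1ULL << 55) - 1)	/* bits 0-54: page frame number */
#define KPF_UNEVICTABLE	(1ULL << 18)		/* page is on the unevictable LRU */

/* Byte offset of the pagemap entry describing virtual address addr. */
static uint64_t pagemap_offset(uintptr_t addr, uint64_t page_size)
{
	return (uint64_t)addr / page_size * sizeof(uint64_t);
}

/* Nonzero if the pagemap entry says the page is present. */
static int pagemap_present(uint64_t entry)
{
	return (entry & PM_PRESENT) != 0;
}

/* Page frame number stored in a pagemap entry. */
static uint64_t pagemap_pfn(uint64_t entry)
{
	return entry & PM_PFN_MASK;
}

/* Nonzero if a kpageflags word has the unevictable bit set. */
static int kpageflags_unevictable(uint64_t flags)
{
	return (flags & KPF_UNEVICTABLE) != 0;
}
```

The PFN from pagemap_pfn() multiplied by sizeof(uint64_t) gives the
offset to read in /proc/kpageflags.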

Signed-off-by: Eric B Munson <emunson@akamai.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-api@vger.kernel.org
---
 tools/testing/selftests/vm/Makefile         |   8 +-
 tools/testing/selftests/vm/lock-on-fault.c  | 145 ++++++++++++++++++++++++++++
 tools/testing/selftests/vm/on-fault-limit.c |  47 +++++++++
 tools/testing/selftests/vm/run_vmtests      |  23 +++++
 4 files changed, 222 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/vm/lock-on-fault.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index a5ce953..32f3d20 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -1,7 +1,13 @@
 # Makefile for vm selftests
 
 CFLAGS = -Wall
-BINARIES = hugepage-mmap hugepage-shm map_hugetlb thuge-gen hugetlbfstest
+BINARIES = hugepage-mmap
+BINARIES += hugepage-shm
+BINARIES += hugetlbfstest
+BINARIES += lock-on-fault
+BINARIES += map_hugetlb
+BINARIES += on-fault-limit
+BINARIES += thuge-gen
 BINARIES += transhuge-stress
 
 all: $(BINARIES)
diff --git a/tools/testing/selftests/vm/lock-on-fault.c b/tools/testing/selftests/vm/lock-on-fault.c
new file mode 100644
index 0000000..e6a9688
--- /dev/null
+++ b/tools/testing/selftests/vm/lock-on-fault.c
@@ -0,0 +1,145 @@
+#include <sys/mman.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+
+#ifndef MCL_ON_FAULT
+#define MCL_ON_FAULT 4
+#endif
+
+#define PRESENT_BIT	0x8000000000000000
+#define PFN_MASK	0x007FFFFFFFFFFFFF
+#define UNEVICTABLE_BIT	(1UL << 18)
+
+static int check_pageflags(void *map)
+{
+	FILE *file;
+	unsigned long pfn1;
+	unsigned long pfn2;
+	unsigned long offset1;
+	unsigned long offset2;
+	int ret = 1;
+
+	file = fopen("/proc/self/pagemap", "r");
+	if (!file) {
+		perror("fopen");
+		return ret;
+	}
+	offset1 = (unsigned long)map / getpagesize() * sizeof(unsigned long);
+	offset2 = ((unsigned long)map + getpagesize()) / getpagesize() * sizeof(unsigned long);
+	if (fseek(file, offset1, SEEK_SET)) {
+		perror("fseek");
+		goto out;
+	}
+
+	if (fread(&pfn1, sizeof(unsigned long), 1, file) != 1) {
+		perror("fread");
+		goto out;
+	}
+
+	if (fseek(file, offset2, SEEK_SET)) {
+		perror("fseek");
+		goto out;
+	}
+
+	if (fread(&pfn2, sizeof(unsigned long), 1, file) != 1) {
+		perror("fread");
+		goto out;
+	}
+
+	/* pfn2 should not be present */
+	if (pfn2 & PRESENT_BIT) {
+		printf("page map says 0x%lx\n", pfn2);
+		printf("present is    0x%lx\n", PRESENT_BIT);
+		goto out;
+	}
+
+	/* pfn1 should be present */
+	if ((pfn1 & PRESENT_BIT) == 0) {
+		printf("page map says 0x%lx\n", pfn1);
+		printf("present is    0x%lx\n", PRESENT_BIT);
+		goto out;
+	}
+
+	pfn1 &= PFN_MASK;
+	fclose(file);
+	file = fopen("/proc/kpageflags", "r");
+	if (!file) {
+		perror("fopen");
+		munmap(map, 2 * getpagesize());
+		return ret;
+	}
+
+	if (fseek(file, pfn1 * sizeof(unsigned long), SEEK_SET)) {
+		perror("fseek");
+		goto out;
+	}
+
+	if (fread(&pfn2, sizeof(unsigned long), 1, file) != 1) {
+		perror("fread");
+		goto out;
+	}
+
+	/* pfn2 now contains the entry from kpageflags for the first page, the
+	 * unevictable bit should be set */
+	if ((pfn2 & UNEVICTABLE_BIT) == 0) {
+		printf("kpageflags says 0x%lx\n", pfn2);
+		printf("unevictable is  0x%lx\n", UNEVICTABLE_BIT);
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	fclose(file);
+	return ret;
+}
+
+static int test_mmap(int flags)
+{
+	int ret = 1;
+	void *map;
+
+	map = mmap(NULL, 2 * getpagesize(), PROT_READ | PROT_WRITE, flags, 0, 0);
+	if (map == MAP_FAILED) {
+		perror("mmap()");
+		return ret;
+	}
+
+	/* Write something into the first page to ensure it is present */
+	*(char *)map = 1;
+
+	ret = check_pageflags(map);
+
+	munmap(map, 2 * getpagesize());
+	return ret;
+}
+
+static int test_mlockall(void)
+{
+	int ret = 1;
+
+	if (mlockall(MCL_ON_FAULT)) {
+		perror("mlockall");
+		return ret;
+	}
+
+	ret = test_mmap(MAP_PRIVATE | MAP_ANONYMOUS);
+	munlockall();
+	return ret;
+}
+
+#ifndef MAP_LOCKONFAULT
+#define MAP_LOCKONFAULT (MAP_HUGETLB << 1)
+#endif
+
+int main(int argc, char **argv)
+{
+	int ret = 0;
+
+	ret += test_mmap(MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKONFAULT);
+	ret += test_mlockall();
+	return ret;
+}
diff --git a/tools/testing/selftests/vm/on-fault-limit.c b/tools/testing/selftests/vm/on-fault-limit.c
new file mode 100644
index 0000000..bd70078
--- /dev/null
+++ b/tools/testing/selftests/vm/on-fault-limit.c
@@ -0,0 +1,47 @@
+#include <sys/mman.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+
+#ifndef MCL_ON_FAULT
+#define MCL_ON_FAULT 4
+#endif
+
+static int test_limit(void)
+{
+	int ret = 1;
+	struct rlimit lims;
+	void *map;
+
+	if (getrlimit(RLIMIT_MEMLOCK, &lims)) {
+		perror("getrlimit");
+		return ret;
+	}
+
+	if (mlockall(MCL_ON_FAULT)) {
+		perror("mlockall");
+		return ret;
+	}
+
+	map = mmap(NULL, 2 * lims.rlim_max, PROT_READ | PROT_WRITE,
+		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, 0, 0);
+	if (map != MAP_FAILED) {
+		printf("mmap should have failed, but didn't\n");
+		munmap(map, 2 * lims.rlim_max);
+	} else {
+		ret = 0;
+	}
+
+	munlockall();
+	return ret;
+}
+
+int main(int argc, char **argv)
+{
+	int ret = 0;
+
+	ret += test_limit();
+	return ret;
+}
diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests
index c87b681..c1aecce 100755
--- a/tools/testing/selftests/vm/run_vmtests
+++ b/tools/testing/selftests/vm/run_vmtests
@@ -90,4 +90,27 @@ fi
 umount $mnt
 rm -rf $mnt
 echo $nr_hugepgs > /proc/sys/vm/nr_hugepages
+
+echo "--------------------"
+echo "running lock-on-fault"
+echo "--------------------"
+./lock-on-fault
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+	exitcode=1
+else
+	echo "[PASS]"
+fi
+
+echo "--------------------"
+echo "running on-fault-limit"
+echo "--------------------"
+sudo -u nobody ./on-fault-limit
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+	exitcode=1
+else
+	echo "[PASS]"
+fi
+
 exit $exitcode
-- 
1.9.1



* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-08 19:33 [PATCH 0/3] Allow user to request memory to be locked on page fault Eric B Munson
                   ` (2 preceding siblings ...)
  2015-05-08 19:33 ` [PATCH 3/3] Add tests for lock " Eric B Munson
@ 2015-05-08 19:42 ` Andrew Morton
  2015-05-08 20:06   ` Eric B Munson
  2015-05-11 18:06   ` Eric B Munson
  3 siblings, 2 replies; 18+ messages in thread
From: Andrew Morton @ 2015-05-08 19:42 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Shuah Khan, linux-alpha, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch,
	linux-api

On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson <emunson@akamai.com> wrote:

> mlock() allows a user to control page out of program memory, but this
> comes at the cost of faulting in the entire mapping when it is
> allocated.  For large mappings where the entire area is not necessary
> this is not ideal.
> 
> This series introduces new flags for mmap() and mlockall() that allow a
> user to specify that the covered area should not be paged out, but only
> after the memory has been used the first time.

Please tell us much much more about the value of these changes: the use
cases, the behavioural improvements and performance results which the
patchset brings to those use cases, etc.



* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-08 19:42 ` [PATCH 0/3] Allow user to request memory to be locked on page fault Andrew Morton
@ 2015-05-08 20:06   ` Eric B Munson
  2015-05-08 20:15     ` Andrew Morton
  2015-05-13 13:58     ` Michal Hocko
  2015-05-11 18:06   ` Eric B Munson
  1 sibling, 2 replies; 18+ messages in thread
From: Eric B Munson @ 2015-05-08 20:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Shuah Khan, linux-alpha, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch,
	linux-api

[-- Attachment #1: Type: text/plain, Size: 1523 bytes --]

On Fri, 08 May 2015, Andrew Morton wrote:

> On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson <emunson@akamai.com> wrote:
> 
> > mlock() allows a user to control page out of program memory, but this
> > comes at the cost of faulting in the entire mapping when it is
> > allocated.  For large mappings where the entire area is not necessary
> > this is not ideal.
> > 
> > This series introduces new flags for mmap() and mlockall() that allow a
> > user to specify that the covered area should not be paged out, but only
> > after the memory has been used the first time.
> 
> Please tell us much much more about the value of these changes: the use
> cases, the behavioural improvements and performance results which the
> patchset brings to those use cases, etc.
> 

The primary use case is for mmaping large files read only.  The process
knows that some of the data is necessary, but it is unlikely that the
entire file will be needed.  The developer only wants to pay the cost to
read the data in once.  Unfortunately the developer must choose between
allowing the kernel to page in the memory as needed and guaranteeing
that the data will only be read from disk once.  The first option runs
the risk of having the memory reclaimed if the system is under memory
pressure, the second forces the memory usage and startup delay when
faulting in the entire file.

I am working on getting startup times with and without this change for
an application, I will post them as soon as I have them.

Eric

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-08 20:06   ` Eric B Munson
@ 2015-05-08 20:15     ` Andrew Morton
  2015-05-11 14:36       ` Eric B Munson
  2015-05-13 13:58     ` Michal Hocko
  1 sibling, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2015-05-08 20:15 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Shuah Khan, linux-alpha, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch,
	linux-api

On Fri, 8 May 2015 16:06:10 -0400 Eric B Munson <emunson@akamai.com> wrote:

> On Fri, 08 May 2015, Andrew Morton wrote:
> 
> > On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson <emunson@akamai.com> wrote:
> > 
> > > mlock() allows a user to control page out of program memory, but this
> > > comes at the cost of faulting in the entire mapping when it is
> > > allocated.  For large mappings where the entire area is not necessary
> > > this is not ideal.
> > > 
> > > This series introduces new flags for mmap() and mlockall() that allow a
> > > user to specify that the covered area should not be paged out, but only
> > > after the memory has been used the first time.
> > 
> > Please tell us much much more about the value of these changes: the use
> > cases, the behavioural improvements and performance results which the
> > patchset brings to those use cases, etc.
> > 
> 
> The primary use case is for mmaping large files read only.  The process
> knows that some of the data is necessary, but it is unlikely that the
> entire file will be needed.  The developer only wants to pay the cost to
> read the data in once.  Unfortunately the developer must choose between
> allowing the kernel to page in the memory as needed and guaranteeing
> that the data will only be read from disk once.  The first option runs
> the risk of having the memory reclaimed if the system is under memory
> pressure, the second forces the memory usage and startup delay when
> faulting in the entire file.

Why can't the application mmap only those parts of the file which it
wants and mlock those?

> I am working on getting startup times with and without this change for
> an application, I will post them as soon as I have them.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-08 20:15     ` Andrew Morton
@ 2015-05-11 14:36       ` Eric B Munson
  2015-05-11 19:12         ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Eric B Munson @ 2015-05-11 14:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Shuah Khan, linux-alpha, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch,
	linux-api

[-- Attachment #1: Type: text/plain, Size: 3082 bytes --]

On Fri, 08 May 2015, Andrew Morton wrote:

> On Fri, 8 May 2015 16:06:10 -0400 Eric B Munson <emunson@akamai.com> wrote:
> 
> > On Fri, 08 May 2015, Andrew Morton wrote:
> > 
> > > On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson <emunson@akamai.com> wrote:
> > > 
> > > > mlock() allows a user to control page out of program memory, but this
> > > > comes at the cost of faulting in the entire mapping when it is
> > > > allocated.  For large mappings where the entire area is not necessary
> > > > this is not ideal.
> > > > 
> > > > This series introduces new flags for mmap() and mlockall() that allow a
> > > > user to specify that the covered area should not be paged out, but only
> > > > after the memory has been used the first time.
> > > 
> > > Please tell us much much more about the value of these changes: the use
> > > cases, the behavioural improvements and performance results which the
> > > patchset brings to those use cases, etc.
> > > 
> > 
> > The primary use case is for mmaping large files read only.  The process
> > knows that some of the data is necessary, but it is unlikely that the
> > entire file will be needed.  The developer only wants to pay the cost to
> > read the data in once.  Unfortunately the developer must choose between
> > allowing the kernel to page in the memory as needed and guaranteeing
> > that the data will only be read from disk once.  The first option runs
> > the risk of having the memory reclaimed if the system is under memory
> > pressure, the second forces the memory usage and startup delay when
> > faulting in the entire file.
> 
> Why can't the application mmap only those parts of the file which it
> wants and mlock those?

There are a number of problems with this approach.  The first is that it
presumes the program will know what portions are needed ahead of time.
In many cases this is simply not true.  The second problem is the number
of syscalls required.  With my patches, a single mmap() or mlockall()
call is needed to set up the required locking.  Without them, a separate
mmap() call must be made for each piece of data that is needed.  This also
opens up problems for data that is arranged assuming it is contiguous in
memory.  With the single mmap() call, the user gets a contiguous VMA
without having to know about it.  mmap() with MAP_FIXED could address
the problem, but this introduces a new failure mode of your map
colliding with another that was placed by the kernel.
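
To make the per-piece cost concrete, here is a minimal sketch of the
bookkeeping an application would need for every piece it maps and locks
separately.  The struct and function names are hypothetical and are not
taken from any real program; the only rule encoded is that the file
offset passed to mmap() has to be page aligned.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper type: a page-aligned window around one piece of a file. */
struct piece_span {
	off_t  map_offset;	/* page-aligned file offset to hand to mmap() */
	size_t map_length;	/* whole pages that cover the piece */
	size_t delta;		/* where the piece starts inside the window */
};

/* Round [offset, offset + len) out to page boundaries. */
static struct piece_span piece_to_span(off_t offset, size_t len, size_t page_size)
{
	struct piece_span s;
	off_t page = (off_t)page_size;
	off_t start = offset - (offset % page);
	off_t end = offset + (off_t)len;
	off_t rounded_end = ((end + page - 1) / page) * page;

	s.map_offset = start;
	s.map_length = (size_t)(rounded_end - start);
	s.delta = (size_t)(offset - start);
	return s;
}
```

Each piece then still costs one mmap(NULL, s.map_length, PROT_READ,
MAP_PRIVATE | MAP_LOCKED, fd, s.map_offset) call, and the data lives at
the returned address plus s.delta rather than at one contiguous base.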

Another use case for the LOCKONFAULT flag is the security use of
mlock().  Consider an application that will be using data that cannot be
written to swap, where the exact size is unknown until run time (all we
have at build time is the maximum size the buffer can be).  The
LOCKONFAULT flag allows the developer to create the buffer and guarantee
that the contents are never written to swap without ever consuming more
memory than is actually needed.
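
A minimal sketch of that buffer setup, assuming a kernel and headers
with this series applied.  MAP_LOCKONFAULT is the flag proposed here and
is not defined elsewhere, so it is guarded; without it the sketch falls
back to a plain, unlocked mapping.  The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Reserve max_size bytes of anonymous memory without touching any of it.
 * Returns NULL on failure.  Only pages that are later written get
 * faulted in (and, with the proposed flag, locked).
 */
static void *reserve_secret_buffer(size_t max_size)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	void *buf;

#ifdef MAP_LOCKONFAULT
	/* Assumption: only available with the proposed patches applied. */
	flags |= MAP_LOCKONFAULT;
#endif
	buf = mmap(NULL, max_size, PROT_READ | PROT_WRITE, flags, -1, 0);
	return buf == MAP_FAILED ? NULL : buf;
}
```

The caller sizes the mapping for the build-time maximum and simply
writes as much as it needs at run time.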

> 
> > I am working on getting startup times with and without this change for
> > an application, I will post them as soon as I have them.
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-08 19:42 ` [PATCH 0/3] Allow user to request memory to be locked on page fault Andrew Morton
  2015-05-08 20:06   ` Eric B Munson
@ 2015-05-11 18:06   ` Eric B Munson
  2015-05-13 15:00     ` Eric B Munson
  1 sibling, 1 reply; 18+ messages in thread
From: Eric B Munson @ 2015-05-11 18:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Shuah Khan, linux-alpha, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch,
	linux-api

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

On Fri, 08 May 2015, Andrew Morton wrote:

> On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson <emunson@akamai.com> wrote:
> 
> > mlock() allows a user to control page out of program memory, but this
> > comes at the cost of faulting in the entire mapping when it is
> > allocated.  For large mappings where the entire area is not necessary
> > this is not ideal.
> > 
> > This series introduces new flags for mmap() and mlockall() that allow a
> > user to specify that the covered area should not be paged out, but only
> > after the memory has been used the first time.
> 
> Please tell us much much more about the value of these changes: the use
> cases, the behavioural improvements and performance results which the
> patchset brings to those use cases, etc.
> 

To illustrate the proposed use case I wrote a quick program that mmaps
a 5GB file which is filled with random data and accesses 150,000 pages
from that mapping.  Setup and processing were timed separately to
illustrate the differences between the three tested approaches.  The
setup portion is simply the call to mmap(); the processing portion is the
accessing of the various locations in that mapping.  The following
values are in milliseconds and are the averages of 20 runs each, with a
call to echo 3 > /proc/sys/vm/drop_caches between each run.

The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
Startup average:    9476.506
Processing average: 3.573

The second mapping was simply MAP_PRIVATE but each page was passed to
mlock() before being read:
Startup average:    0.051
Processing average: 721.859

The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
Startup average:    0.084
Processing average: 42.125
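
For reference, the processing phase timed above can be sketched roughly
as follows.  This is not the actual test program: the function names are
made up, and it assumes the accesses are spread over the mapping at a
regular stride.

```c
#include <stddef.h>
#include <time.h>

/* Milliseconds between two clock_gettime(CLOCK_MONOTONIC, ...) samples. */
static double elapsed_ms(struct timespec start, struct timespec end)
{
	return (double)(end.tv_sec - start.tv_sec) * 1000.0 +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/*
 * Read one byte from each of `accesses` pages spread evenly over a
 * mapping of `total_pages` pages.  The sum is returned so the reads
 * cannot be optimised away.
 */
static unsigned long touch_pages(const volatile unsigned char *base,
				 size_t total_pages, size_t accesses,
				 size_t page_size)
{
	unsigned long sum = 0;
	size_t stride, i;

	if (accesses == 0)
		return 0;
	stride = total_pages / accesses;
	if (stride == 0)
		stride = 1;
	for (i = 0; i < accesses && i * stride < total_pages; i++)
		sum += base[i * stride * page_size];
	return sum;
}
```

Setup time is then the interval around the mmap() call and processing
time is the interval around touch_pages(), each converted with
elapsed_ms().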



[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-11 14:36       ` Eric B Munson
@ 2015-05-11 19:12         ` Andrew Morton
  2015-05-11 21:05           ` Eric B Munson
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2015-05-11 19:12 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Shuah Khan, linux-alpha, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch,
	linux-api

On Mon, 11 May 2015 10:36:18 -0400 Eric B Munson <emunson@akamai.com> wrote:

> On Fri, 08 May 2015, Andrew Morton wrote:
> ...
>
> > 
> > Why can't the application mmap only those parts of the file which it
> > wants and mlock those?
> 
> There are a number of problems with this approach.  The first is that it
> presumes the program will know what portions are needed ahead of time.
> In many cases this is simply not true.  The second problem is the number
> of syscalls required.  With my patches, a single mmap() or mlockall()
> call is needed to set up the required locking.  Without them, a separate
> mmap() call must be made for each piece of data that is needed.  This also
> opens up problems for data that is arranged assuming it is contiguous in
> memory.  With the single mmap() call, the user gets a contiguous VMA
> without having to know about it.  mmap() with MAP_FIXED could address
> the problem, but this introduces a new failure mode of your map
> colliding with another that was placed by the kernel.
> 
> Another use case for the LOCKONFAULT flag is the security use of
> mlock().  Consider an application that will be using data that cannot be
> written to swap, where the exact size is unknown until run time (all we
> have at build time is the maximum size the buffer can be).  The
> LOCKONFAULT flag allows the developer to create the buffer and guarantee
> that the contents are never written to swap without ever consuming more
> memory than is actually needed.

What application(s) or class of applications are we talking about here?

IOW, how generally applicable is this?  It sounds rather specialized.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-11 19:12         ` Andrew Morton
@ 2015-05-11 21:05           ` Eric B Munson
  0 siblings, 0 replies; 18+ messages in thread
From: Eric B Munson @ 2015-05-11 21:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Shuah Khan, linux-alpha, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch,
	linux-api

[-- Attachment #1: Type: text/plain, Size: 2059 bytes --]

On Mon, 11 May 2015, Andrew Morton wrote:

> On Mon, 11 May 2015 10:36:18 -0400 Eric B Munson <emunson@akamai.com> wrote:
> 
> > On Fri, 08 May 2015, Andrew Morton wrote:
> > ...
> >
> > > 
> > > Why can't the application mmap only those parts of the file which it
> > > wants and mlock those?
> > 
> > There are a number of problems with this approach.  The first is that it
> > presumes the program will know what portions are needed ahead of time.
> > In many cases this is simply not true.  The second problem is the number
> > of syscalls required.  With my patches, a single mmap() or mlockall()
> > call is needed to set up the required locking.  Without them, a separate
> > mmap() call must be made for each piece of data that is needed.  This also
> > opens up problems for data that is arranged assuming it is contiguous in
> > memory.  With the single mmap() call, the user gets a contiguous VMA
> > without having to know about it.  mmap() with MAP_FIXED could address
> > the problem, but this introduces a new failure mode of your map
> > colliding with another that was placed by the kernel.
> > 
> > Another use case for the LOCKONFAULT flag is the security use of
> > mlock().  Consider an application that will be using data that cannot be
> > written to swap, where the exact size is unknown until run time (all we
> > have at build time is the maximum size the buffer can be).  The
> > LOCKONFAULT flag allows the developer to create the buffer and guarantee
> > that the contents are never written to swap without ever consuming more
> > memory than is actually needed.
> 
> What application(s) or class of applications are we talking about here?
> 
> IOW, how generally applicable is this?  It sounds rather specialized.
> 

For the example of a large file, this is the usage pattern for a large
statistical language model (it probably applies to other statistical or
graphical models as well).  For the security example, any application
transacting in data that cannot be swapped out (credit card data, medical
records, etc).


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-08 20:06   ` Eric B Munson
  2015-05-08 20:15     ` Andrew Morton
@ 2015-05-13 13:58     ` Michal Hocko
  2015-05-13 14:14       ` Eric B Munson
  1 sibling, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2015-05-13 13:58 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Andrew Morton, Shuah Khan, linux-alpha, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm,
	linux-arch, linux-api

On Fri 08-05-15 16:06:10, Eric B Munson wrote:
> On Fri, 08 May 2015, Andrew Morton wrote:
> 
> > On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson <emunson@akamai.com> wrote:
> > 
> > > mlock() allows a user to control page out of program memory, but this
> > > comes at the cost of faulting in the entire mapping when it is
> > > allocated.  For large mappings where the entire area is not necessary
> > > this is not ideal.
> > > 
> > > This series introduces new flags for mmap() and mlockall() that allow a
> > > user to specify that the covered area should not be paged out, but only
> > > after the memory has been used the first time.
> > 
> > Please tell us much much more about the value of these changes: the use
> > cases, the behavioural improvements and performance results which the
> > patchset brings to those use cases, etc.
> > 
> 
> The primary use case is for mmaping large files read only.  The process
> knows that some of the data is necessary, but it is unlikely that the
> entire file will be needed.  The developer only wants to pay the cost to
> read the data in once.  Unfortunately the developer must choose between
> allowing the kernel to page in the memory as needed and guaranteeing
> that the data will only be read from disk once.  The first option runs
> the risk of having the memory reclaimed if the system is under memory
> pressure, the second forces the memory usage and startup delay when
> faulting in the entire file.

Is there any reason you cannot do this from userspace? Start with
mmap(PROT_NONE) and do mmap(PROT_READ, MAP_FIXED|MAP_LOCKED|other_flags_you_need)
from the SIGSEGV handler?
You can generate a lot of vmas that way but you can mitigate that to a
certain level by mapping larger than PAGE_SIZE chunks in the fault
handler. Would that work in your usecase?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-13 13:58     ` Michal Hocko
@ 2015-05-13 14:14       ` Eric B Munson
  0 siblings, 0 replies; 18+ messages in thread
From: Eric B Munson @ 2015-05-13 14:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Shuah Khan, linux-alpha, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm,
	linux-arch, linux-api

[-- Attachment #1: Type: text/plain, Size: 2818 bytes --]

On Wed, 13 May 2015, Michal Hocko wrote:

> On Fri 08-05-15 16:06:10, Eric B Munson wrote:
> > On Fri, 08 May 2015, Andrew Morton wrote:
> > 
> > > On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson <emunson@akamai.com> wrote:
> > > 
> > > > mlock() allows a user to control page out of program memory, but this
> > > > comes at the cost of faulting in the entire mapping when it is
> > > > allocated.  For large mappings where the entire area is not necessary
> > > > this is not ideal.
> > > > 
> > > > This series introduces new flags for mmap() and mlockall() that allow a
> > > > user to specify that the covered area should not be paged out, but only
> > > > after the memory has been used the first time.
> > > 
> > > Please tell us much much more about the value of these changes: the use
> > > cases, the behavioural improvements and performance results which the
> > > patchset brings to those use cases, etc.
> > > 
> > 
> > The primary use case is for mmaping large files read only.  The process
> > knows that some of the data is necessary, but it is unlikely that the
> > entire file will be needed.  The developer only wants to pay the cost to
> > read the data in once.  Unfortunately the developer must choose between
> > allowing the kernel to page in the memory as needed and guaranteeing
> > that the data will only be read from disk once.  The first option runs
> > the risk of having the memory reclaimed if the system is under memory
> > pressure, the second forces the memory usage and startup delay when
> > faulting in the entire file.
> 
> Is there any reason you cannot do this from userspace? Start with
> mmap(PROT_NONE) and do mmap(PROT_READ, MAP_FIXED|MAP_LOCKED|other_flags_you_need)
> from the SIGSEGV handler?
> You can generate a lot of vmas that way but you can mitigate that to a
> certain level by mapping larger than PAGE_SIZE chunks in the fault
> handler. Would that work in your usecase?

This might work for the use cases I have laid out (I am not sure about
the anonymous mmap one, but I will try it).  I am concerned about how
much memory management policy these suggestions push into userspace.
I am also concerned about the number of system calls required to do the
same thing.  This will require a new call to mmap() for every new page
accessed in the file (or up to file_size/map_size calls when mapping in
multiple-page chunks).  The simple case of calling mlock() every time the
file was accessed was significantly slower than the LOCKONFAULT flag.
Your suggestion will be better in that it avoids the extra mlock() call
for pages already locked, but there are still significantly more system
calls.  I will add this to the program I have been using to measure
execution times and see how it compares to the other options.

Eric


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-11 18:06   ` Eric B Munson
@ 2015-05-13 15:00     ` Eric B Munson
  2015-05-14  8:08       ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: Eric B Munson @ 2015-05-13 15:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Shuah Khan, linux-alpha, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm,
	linux-arch, linux-api

[-- Attachment #1: Type: text/plain, Size: 2108 bytes --]

On Mon, 11 May 2015, Eric B Munson wrote:

> On Fri, 08 May 2015, Andrew Morton wrote:
> 
> > On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson <emunson@akamai.com> wrote:
> > 
> > > mlock() allows a user to control page out of program memory, but this
> > > comes at the cost of faulting in the entire mapping when it is
> > > allocated.  For large mappings where the entire area is not necessary
> > > this is not ideal.
> > > 
> > > This series introduces new flags for mmap() and mlockall() that allow a
> > > user to specify that the covered area should not be paged out, but only
> > > after the memory has been used the first time.
> > 
> > Please tell us much much more about the value of these changes: the use
> > cases, the behavioural improvements and performance results which the
> > patchset brings to those use cases, etc.
> > 
> 
> To illustrate the proposed use case I wrote a quick program that mmaps
> a 5GB file which is filled with random data and accesses 150,000 pages
> from that mapping.  Setup and processing were timed separately to
> illustrate the differences between the three tested approaches.  The
> setup portion is simply the call to mmap(); the processing portion is the
> accessing of the various locations in that mapping.  The following
> values are in milliseconds and are the averages of 20 runs each with a
> call to echo 3 > /proc/sys/vm/drop_caches between each run.
> 
> The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
> Startup average:    9476.506
> Processing average: 3.573
> 
> The second mapping was simply MAP_PRIVATE but each page was passed to
> mlock() before being read:
> Startup average:    0.051
> Processing average: 721.859
> 
> The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
> Startup average:    0.084
> Processing average: 42.125
> 

Michal's suggestion of changing protections and locking in a signal
handler was faster than locking each page as needed, but it still took
significantly more processing time than the LOCKONFAULT case.

Startup average:    0.047
Processing average: 86.431


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-13 15:00     ` Eric B Munson
@ 2015-05-14  8:08       ` Michal Hocko
  2015-05-14 13:58         ` Eric B Munson
  2015-05-15 15:35         ` Eric B Munson
  0 siblings, 2 replies; 18+ messages in thread
From: Michal Hocko @ 2015-05-14  8:08 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Andrew Morton, Shuah Khan, linux-alpha, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm,
	linux-arch, linux-api

On Wed 13-05-15 11:00:36, Eric B Munson wrote:
> On Mon, 11 May 2015, Eric B Munson wrote:
> 
> > On Fri, 08 May 2015, Andrew Morton wrote:
> > 
> > > On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson <emunson@akamai.com> wrote:
> > > 
> > > > mlock() allows a user to control page out of program memory, but this
> > > > comes at the cost of faulting in the entire mapping when it is
> > > > allocated.  For large mappings where the entire area is not necessary
> > > > this is not ideal.
> > > > 
> > > > This series introduces new flags for mmap() and mlockall() that allow a
> > > > user to specify that the covered area should not be paged out, but only
> > > > after the memory has been used the first time.
> > > 
> > > Please tell us much much more about the value of these changes: the use
> > > cases, the behavioural improvements and performance results which the
> > > patchset brings to those use cases, etc.
> > > 
> > 
> > To illustrate the proposed use case I wrote a quick program that mmaps
> > a 5GB file which is filled with random data and accesses 150,000 pages
> > from that mapping.  Setup and processing were timed separately to
> > illustrate the differences between the three tested approaches.  The
> > setup portion is simply the call to mmap(); the processing portion is the
> > accessing of the various locations in that mapping.  The following
> > values are in milliseconds and are the averages of 20 runs each with a
> > call to echo 3 > /proc/sys/vm/drop_caches between each run.
> > 
> > The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
> > Startup average:    9476.506
> > Processing average: 3.573
> > 
> > The second mapping was simply MAP_PRIVATE but each page was passed to
> > mlock() before being read:
> > Startup average:    0.051
> > Processing average: 721.859
> > 
> > The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
> > Startup average:    0.084
> > Processing average: 42.125
> > 
> 
> Michal's suggestion of changing protections and locking in a signal
> handler was better than the locking as needed, but still significantly
> more work required than the LOCKONFAULT case.
> 
> Startup average:    0.047
> Processing average: 86.431

Have you played with batching? Has it helped? Anyway it is to be
expected that the overhead will be higher than a single mmap call. The
question is whether you can live with it because adding a new semantic
to mlock sounds trickier and MAP_LOCKED is tricky enough already...

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-14  8:08       ` Michal Hocko
@ 2015-05-14 13:58         ` Eric B Munson
  2015-05-15 15:35         ` Eric B Munson
  1 sibling, 0 replies; 18+ messages in thread
From: Eric B Munson @ 2015-05-14 13:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Shuah Khan, linux-alpha, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm,
	linux-arch, linux-api

[-- Attachment #1: Type: text/plain, Size: 3514 bytes --]

On Thu, 14 May 2015, Michal Hocko wrote:

> On Wed 13-05-15 11:00:36, Eric B Munson wrote:
> > On Mon, 11 May 2015, Eric B Munson wrote:
> > 
> > > On Fri, 08 May 2015, Andrew Morton wrote:
> > > 
> > > > On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson <emunson@akamai.com> wrote:
> > > > 
> > > > > mlock() allows a user to control page out of program memory, but this
> > > > > comes at the cost of faulting in the entire mapping when it is
> > > > > allocated.  For large mappings where the entire area is not necessary
> > > > > this is not ideal.
> > > > > 
> > > > > This series introduces new flags for mmap() and mlockall() that allow a
> > > > > user to specify that the covered area should not be paged out, but only
> > > > > after the memory has been used the first time.
> > > > 
> > > > Please tell us much much more about the value of these changes: the use
> > > > cases, the behavioural improvements and performance results which the
> > > > patchset brings to those use cases, etc.
> > > > 
> > > 
> > > To illustrate the proposed use case I wrote a quick program that mmaps
> > > a 5GB file which is filled with random data and accesses 150,000 pages
> > > from that mapping.  Setup and processing were timed separately to
> > > illustrate the differences between the three tested approaches.  The
> > > setup portion is simply the call to mmap(); the processing portion is the
> > > accessing of the various locations in that mapping.  The following
> > > values are in milliseconds and are the averages of 20 runs each with a
> > > call to echo 3 > /proc/sys/vm/drop_caches between each run.
> > > 
> > > The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
> > > Startup average:    9476.506
> > > Processing average: 3.573
> > > 
> > > The second mapping was simply MAP_PRIVATE but each page was passed to
> > > mlock() before being read:
> > > Startup average:    0.051
> > > Processing average: 721.859
> > > 
> > > The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
> > > Startup average:    0.084
> > > Processing average: 42.125
> > > 
> > 
> > Michal's suggestion of changing protections and locking in a signal
> > handler was better than the locking as needed, but still significantly
> > more work required than the LOCKONFAULT case.
> > 
> > Startup average:    0.047
> > Processing average: 86.431
> 
> Have you played with batching? Has it helped? Anyway it is to be
> expected that the overhead will be higher than a single mmap call. The
> question is whether you can live with it because adding a new semantic
> to mlock sounds trickier and MAP_LOCKED is tricky enough already...
> 

The test code I have been using is a pathological test case that only
touches pages once and they are fairly far apart.

On its face batching sounds like a good idea, but I have a couple of
questions.  In order to batch fault in pages the seg fault handler needs
to know about the mapping in question.  Specifically it needs to know
where it ends so that it doesn't try to mprotect()/mlock() past the
end.  So now the program has to start tracking its maps in some globally
accessible structure and this sounds more like implementing memory
management in userspace.  How could this batching be implemented without
requiring the signal handler to know about the mapping that is being
accessed?  Also, how much memory management policy is it reasonable to
expect user space to implement in these cases?
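
To show the kind of bookkeeping I mean, here is a minimal sketch of such
a globally accessible structure.  All names and the fixed table size are
hypothetical; the point is only that the handler has to look up the
faulting address and clamp its batch to the end of the mapping.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 16	/* arbitrary limit for the sketch */

struct tracked_map {
	uintptr_t start;
	size_t    length;
};

static struct tracked_map tracked[MAX_TRACKED];
static size_t tracked_count;

/* Called by the program after every mmap() the handler should cover. */
static int track_mapping(void *addr, size_t length)
{
	if (tracked_count == MAX_TRACKED)
		return -1;
	tracked[tracked_count].start = (uintptr_t)addr;
	tracked[tracked_count].length = length;
	tracked_count++;
	return 0;
}

/*
 * Bytes the handler may safely cover starting at fault_addr: at most
 * `want`, never past the end of the mapping, 0 if the address is not
 * inside any tracked mapping.
 */
static size_t clamp_batch(const void *fault_addr, size_t want)
{
	uintptr_t a = (uintptr_t)fault_addr;
	size_t i;

	for (i = 0; i < tracked_count; i++) {
		uintptr_t s = tracked[i].start;

		if (a >= s && a - s < tracked[i].length) {
			size_t left = tracked[i].length - (size_t)(a - s);

			return left < want ? left : want;
		}
	}
	return 0;
}
```

Every mmap() and munmap() in the program then has to keep this table in
sync, which is the policy burden I am asking about.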

Eric


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-14  8:08       ` Michal Hocko
  2015-05-14 13:58         ` Eric B Munson
@ 2015-05-15 15:35         ` Eric B Munson
  2015-05-19 20:30           ` Eric B Munson
  1 sibling, 1 reply; 18+ messages in thread
From: Eric B Munson @ 2015-05-15 15:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Shuah Khan, linux-alpha, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm,
	linux-arch, linux-api

[-- Attachment #1: Type: text/plain, Size: 4446 bytes --]

On Thu, 14 May 2015, Michal Hocko wrote:

> On Wed 13-05-15 11:00:36, Eric B Munson wrote:
> > On Mon, 11 May 2015, Eric B Munson wrote:
> > 
> > > On Fri, 08 May 2015, Andrew Morton wrote:
> > > 
> > > > On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson <emunson@akamai.com> wrote:
> > > > 
> > > > > mlock() allows a user to control page out of program memory, but this
> > > > > comes at the cost of faulting in the entire mapping when it is
> > > > > allocated.  For large mappings where the entire area is not necessary
> > > > > this is not ideal.
> > > > > 
> > > > > This series introduces new flags for mmap() and mlockall() that allow a
> > > > > user to specify that the covered area should not be paged out, but only
> > > > > after the memory has been used the first time.
> > > > 
> > > > Please tell us much much more about the value of these changes: the use
> > > > cases, the behavioural improvements and performance results which the
> > > > patchset brings to those use cases, etc.
> > > > 
> > > 
> > > To illustrate the proposed use case I wrote a quick program that mmaps
> > > a 5GB file which is filled with random data and accesses 150,000 pages
> > > from that mapping.  Setup and processing were timed separately to
> > > illustrate the differences between the three tested approaches.  The
> > > setup portion is simply the call to mmap(); the processing portion is the
> > > accessing of the various locations in that mapping.  The following
> > > values are in milliseconds and are the averages of 20 runs each with a
> > > call to echo 3 > /proc/sys/vm/drop_caches between each run.
> > > 
> > > The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
> > > Startup average:    9476.506
> > > Processing average: 3.573
> > > 
> > > The second mapping was simply MAP_PRIVATE but each page was passed to
> > > mlock() before being read:
> > > Startup average:    0.051
> > > Processing average: 721.859
> > > 
> > > The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
> > > Startup average:    0.084
> > > Processing average: 42.125
> > > 
> > 
> > Michal's suggestion of changing protections and locking in a signal
> > handler was better than the locking as needed, but still significantly
> > more work required than the LOCKONFAULT case.
> > 
> > Startup average:    0.047
> > Processing average: 86.431
> 
> Have you played with batching? Has it helped? Anyway it is to be
> expected that the overhead will be higher than a single mmap call. The
> question is whether you can live with it because adding a new semantic
> to mlock sounds trickier and MAP_LOCKED is tricky enough already...
> 

I reworked the experiment to better cover the batching solution.  The
same 5GB data file is used, however instead of 150,000 accesses at
regular intervals, the test program now does 15,000,000 accesses to
random pages in the mapping.  The rest of the setup remains the same.

mmap with MAP_LOCKED:
Setup avg:      11821.193
Processing avg: 3404.286

mmap with mlock() before each access:
Setup avg:      0.054
Processing avg: 34263.201

mmap with PROT_NONE and signal handler and batch size of 1 page:
With the default value of vm.max_map_count, this gets ENOMEM as I attempt
to change the permissions; after raising the sysctl significantly I get:
Setup avg:      0.050
Processing avg: 67690.625

mmap with PROT_NONE and signal handler and batch size of 8 pages:
Setup avg:      0.098
Processing avg: 37344.197

mmap with PROT_NONE and signal handler and batch size of 16 pages:
Setup avg:      0.0548
Processing avg: 29295.669

mmap with MAP_LOCKONFAULT:
Setup avg:      0.073
Processing avg: 18392.136

The signal handler in the batch cases faulted in memory in two steps to
avoid having to know the start and end of the faulting mapping.  The
first step covers the page that caused the fault as we know that it will
be possible to lock.  The second step speculatively tries to mlock and
mprotect the batch size - 1 pages that follow.  There may be a clever
way to avoid this without having the program track each mapping to be
covered by this handler in a globally accessible structure, but I could
not find it.
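
The two-step handler described above could be sketched roughly as follows.
This is not the handler used for the measurements, which was not posted; the
lof_* names, the counters, and the decision to ignore mlock() failures in
the speculative step are assumptions.  It also assumes Linux, where calling
mprotect() and mlock() from a signal handler works in practice even though
POSIX does not list them as async-signal-safe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t lof_page_size;		/* set by lof_install() */
static size_t lof_batch_pages;		/* pages made usable per fault */
static volatile sig_atomic_t lof_faults;	/* faults handled */
static volatile sig_atomic_t lof_lock_failures;	/* failed mlock() calls */

static void lof_handler(int sig, siginfo_t *info, void *ctx)
{
	int saved_errno = errno;
	uintptr_t addr = (uintptr_t)info->si_addr &
			 ~((uintptr_t)lof_page_size - 1);
	char *page = (char *)addr;

	(void)sig;
	(void)ctx;

	/*
	 * Step 1: the faulting page.  If mprotect() fails, the address is
	 * not in any mapping: restore the default action so that the
	 * retried access terminates the process as usual.
	 */
	if (mprotect(page, lof_page_size, PROT_READ | PROT_WRITE) != 0) {
		signal(SIGSEGV, SIG_DFL);
		errno = saved_errno;
		return;
	}
	if (mlock(page, lof_page_size) != 0)
		lof_lock_failures++;

	/*
	 * Step 2: speculatively cover the batch size - 1 pages that follow.
	 * This may run past the end of the mapping, so failure is ignored.
	 * Caveat: it can also change the protection of an unrelated
	 * neighbouring mapping, which a real program would have to rule out.
	 */
	if (lof_batch_pages > 1) {
		size_t len = (lof_batch_pages - 1) * lof_page_size;

		if (mprotect(page + lof_page_size, len,
			     PROT_READ | PROT_WRITE) == 0)
			(void)mlock(page + lof_page_size, len);
	}

	lof_faults++;
	errno = saved_errno;
}

/* Install the handler; mappings to be covered are created PROT_NONE. */
static int lof_install(size_t batch_pages)
{
	struct sigaction sa;

	assert(batch_pages >= 1);
	lof_page_size = (size_t)sysconf(_SC_PAGESIZE);
	lof_batch_pages = batch_pages;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = lof_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}
```

A caller would run lof_install(16) once and then mmap() the file with
PROT_NONE.  Each mprotect() or mlock() of a sub-range can split the
mapping into more VMAs, which is the likely reason the 1 page batch ran
into max_map_count.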

These results show that if the developer knows that a majority of the
mapping will be used, it is better to fault it in all at once;
otherwise MAP_LOCKONFAULT is significantly faster.
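
To make the comparison concrete, setup and processing can be summed per
approach.  The figures in the table below are the 15,000,000 access
averages reported above, in milliseconds; the struct, the labels, and the
helper names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct run {
	const char *name;
	double setup_ms;
	double processing_ms;
};

/* Averages from the 15,000,000 random access experiment, in ms. */
static const struct run runs[] = {
	{ "MAP_LOCKED",                    11821.193,  3404.286 },
	{ "mlock() before each access",        0.054, 34263.201 },
	{ "PROT_NONE + handler, batch 1",      0.050, 67690.625 },
	{ "PROT_NONE + handler, batch 8",      0.098, 37344.197 },
	{ "PROT_NONE + handler, batch 16",     0.0548, 29295.669 },
	{ "MAP_LOCKONFAULT",                   0.073, 18392.136 },
};

/* Total cost of one approach: setup plus processing. */
static double total_ms(const struct run *r)
{
	assert(r != NULL);
	return r->setup_ms + r->processing_ms;
}

/* Print one line per approach with its combined time. */
static void print_totals(void)
{
	for (size_t i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
		printf("%-32s %12.3f ms\n", runs[i].name, total_ms(&runs[i]));
}
```

In this heavily used mapping, MAP_LOCKED has the lowest total at roughly
15.2 seconds, MAP_LOCKONFAULT follows at roughly 18.4 seconds, and every
userspace alternative totals more than 29 seconds.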

Eric


* Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
  2015-05-15 15:35         ` Eric B Munson
@ 2015-05-19 20:30           ` Eric B Munson
  0 siblings, 0 replies; 18+ messages in thread
From: Eric B Munson @ 2015-05-19 20:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Shuah Khan, linux-alpha, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm,
	linux-arch, linux-api

[-- Attachment #1: Type: text/plain, Size: 4761 bytes --]

On Fri, 15 May 2015, Eric B Munson wrote:

> On Thu, 14 May 2015, Michal Hocko wrote:
> 
> > On Wed 13-05-15 11:00:36, Eric B Munson wrote:
> > > On Mon, 11 May 2015, Eric B Munson wrote:
> > > 
> > > > On Fri, 08 May 2015, Andrew Morton wrote:
> > > > 
> > > > > On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson <emunson@akamai.com> wrote:
> > > > > 
> > > > > > mlock() allows a user to control page out of program memory, but this
> > > > > > comes at the cost of faulting in the entire mapping when it is
> > > > > > allocated.  For large mappings where the entire area is not necessary
> > > > > > this is not ideal.
> > > > > > 
> > > > > > This series introduces new flags for mmap() and mlockall() that allow a
> > > > > > user to specify that the covered area should not be paged out, but only
> > > > > > after the memory has been used the first time.
> > > > > 
> > > > > Please tell us much much more about the value of these changes: the use
> > > > > cases, the behavioural improvements and performance results which the
> > > > > patchset brings to those use cases, etc.
> > > > > 
> > > > 
> > > > To illustrate the proposed use case I wrote a quick program that mmaps
> > > > a 5GB file which is filled with random data and accesses 150,000 pages
> > > > from that mapping.  Setup and processing were timed separately to
> > > > illustrate the differences between the three tested approaches.  The
> > > > setup portion is simply the call to mmap, the processing is the
> > > > accessing of the various locations in that mapping.  The following
> > > > values are in milliseconds and are the averages of 20 runs each with a
> > > > call to echo 3 > /proc/sys/vm/drop_caches between each run.
> > > > 
> > > > The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
> > > > Startup average:    9476.506
> > > > Processing average: 3.573
> > > > 
> > > > The second mapping was simply MAP_PRIVATE but each page was passed to
> > > > mlock() before being read:
> > > > Startup average:    0.051
> > > > Processing average: 721.859
> > > > 
> > > > The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
> > > > Startup average:    0.084
> > > > Processing average: 42.125
> > > > 
> > > 
> > > Michal's suggestion of changing protections and locking in a signal
> > > handler was better than the locking as needed, but still significantly
> > > more work required than the LOCKONFAULT case.
> > > 
> > > Startup average:    0.047
> > > Processing average: 86.431
> > 
> > Have you played with batching? Has it helped? Anyway it is to be
> > expected that the overhead will be higher than a single mmap call. The
> > question is whether you can live with it because adding a new semantic
> > to mlock sounds trickier and MAP_LOCKED is tricky enough already...
> > 
> 
> I reworked the experiment to better cover the batching solution.  The
> same 5GB data file is used, however instead of 150,000 accesses at
> regular intervals, the test program now does 15,000,000 accesses to
> random pages in the mapping.  The rest of the setup remains the same.
> 
> mmap with MAP_LOCKED:
> Setup avg:      11821.193
> Processing avg: 3404.286
> 
> mmap with mlock() before each access:
> Setup avg:      0.054
> Processing avg: 34263.201
> 
> mmap with PROT_NONE and signal handler and batch size of 1 page:
> With the default value in max_map_count, this gets ENOMEM as I attempt
> to change the permissions, after upping the sysctl significantly I get:
> Setup avg:      0.050
> Processing avg: 67690.625
> 
> mmap with PROT_NONE and signal handler and batch size of 8 pages:
> Setup avg:      0.098
> Processing avg: 37344.197
> 
> mmap with PROT_NONE and signal handler and batch size of 16 pages:
> Setup avg:      0.0548
> Processing avg: 29295.669
> 
> mmap with MAP_LOCKONFAULT:
> Setup avg:      0.073
> Processing avg: 18392.136
> 
> The signal handler in the batch cases faulted in memory in two steps to
> avoid having to know the start and end of the faulting mapping.  The
> first step covers the page that caused the fault as we know that it will
> be possible to lock.  The second step speculatively tries to mlock and
> mprotect the batch size - 1 pages that follow.  There may be a clever
> way to avoid this without having the program track each mapping to be
> covered by this handler in a globally accessible structure, but I could
> not find it.
> 
> These results show that if the developer knows that a majority of the
> mapping will be used, it is better to try and fault it in at once,
> otherwise MAP_LOCKONFAULT is significantly faster.
> 
> Eric

Is there anything else I can add to the discussion here?



Thread overview: 18+ messages
2015-05-08 19:33 [PATCH 0/3] Allow user to request memory to be locked on page fault Eric B Munson
2015-05-08 19:33 ` [PATCH 1/3] Add flag to request pages are locked after " Eric B Munson
2015-05-08 19:33 ` [PATCH 2/3] Add mlockall flag for locking pages on fault Eric B Munson
2015-05-08 19:33 ` [PATCH 3/3] Add tests for lock " Eric B Munson
2015-05-08 19:42 ` [PATCH 0/3] Allow user to request memory to be locked on page fault Andrew Morton
2015-05-08 20:06   ` Eric B Munson
2015-05-08 20:15     ` Andrew Morton
2015-05-11 14:36       ` Eric B Munson
2015-05-11 19:12         ` Andrew Morton
2015-05-11 21:05           ` Eric B Munson
2015-05-13 13:58     ` Michal Hocko
2015-05-13 14:14       ` Eric B Munson
2015-05-11 18:06   ` Eric B Munson
2015-05-13 15:00     ` Eric B Munson
2015-05-14  8:08       ` Michal Hocko
2015-05-14 13:58         ` Eric B Munson
2015-05-15 15:35         ` Eric B Munson
2015-05-19 20:30           ` Eric B Munson
