linux-kernel.vger.kernel.org archive mirror
* [PATCHv3 0/6] mm: faster get user pages
@ 2018-09-21 22:39 Keith Busch
  2018-09-21 22:39 ` [PATCHv3 1/6] mm/gup_benchmark: Time put_page Keith Busch
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Keith Busch @ 2018-09-21 22:39 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch

Changes since v2:

  Combine only the output parameters that need tracking into a struct,
  and squash the result into a single final kernel patch.

  Fix compile errors for all configs.

Keith Busch (6):
  mm/gup_benchmark: Time put_page
  mm/gup_benchmark: Add additional pinning methods
  tools/gup_benchmark: Fix 'write' flag usage
  tools/gup_benchmark: Allow user specified file
  tools/gup_benchmark: Add parameter for hugetlb
  mm/gup: Cache dev_pagemap while pinning pages

 include/linux/huge_mm.h                    |  8 +--
 include/linux/mm.h                         | 19 ++++++-
 mm/gup.c                                   | 90 +++++++++++++++++-------------
 mm/gup_benchmark.c                         | 36 ++++++++++--
 mm/huge_memory.c                           | 38 ++++++-------
 mm/nommu.c                                 |  4 +-
 tools/testing/selftests/vm/gup_benchmark.c | 40 +++++++++++--
 7 files changed, 154 insertions(+), 81 deletions(-)

-- 
2.14.4



* [PATCHv3 1/6] mm/gup_benchmark: Time put_page
  2018-09-21 22:39 [PATCHv3 0/6] mm: faster get user pages Keith Busch
@ 2018-09-21 22:39 ` Keith Busch
  2018-10-02 10:54   ` Kirill A. Shutemov
  2018-09-21 22:39 ` [PATCHv3 2/6] mm/gup_benchmark: Add additional pinning methods Keith Busch
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2018-09-21 22:39 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch

We'd like to measure time to unpin user pages, so this adds a second
benchmark timer on put_page, separate from get_page.

Adding the field breaks this ioctl ABI, but that should be okay since
this is an in-tree kernel selftest.
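
As an illustration of the two-timer idea only, here is a minimal userspace
sketch that times an acquire phase and a release phase separately. The
names phase_times, usec_delta, time_phases, fake_get and fake_put are
hypothetical and not part of this patch; the kernel code uses ktime_get(),
whereas this sketch assumes the C11 timespec_get() wall clock for
portability.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical result record, loosely mirroring the two timer fields. */
struct phase_times {
	uint64_t get_delta_usec;
	uint64_t put_delta_usec;
};

/* Microseconds between two samples; clamps to 0 if the clock stepped back. */
uint64_t usec_delta(const struct timespec *start, const struct timespec *end)
{
	int64_t ns = ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000LL
		   + ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);

	return ns > 0 ? (uint64_t)(ns / 1000) : 0;
}

/* Stand-in workloads; a real benchmark would pin and unpin pages here. */
void fake_get(void *arg) { *(int *)arg += 1; }
void fake_put(void *arg) { *(int *)arg -= 1; }

/* Time the acquire and release phases with separate start/end samples. */
void time_phases(void (*acquire)(void *), void (*release)(void *),
		 void *arg, struct phase_times *out)
{
	struct timespec start, end;

	assert(acquire && release && out);

	timespec_get(&start, TIME_UTC);
	acquire(arg);
	timespec_get(&end, TIME_UTC);
	out->get_delta_usec = usec_delta(&start, &end);

	timespec_get(&start, TIME_UTC);
	release(arg);
	timespec_get(&end, TIME_UTC);
	out->put_delta_usec = usec_delta(&start, &end);
}
```

Restarting the timer before the release loop keeps the two measurements
independent, so a slow unpin path does not inflate the reported get time.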

Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 mm/gup_benchmark.c                         | 8 ++++++--
 tools/testing/selftests/vm/gup_benchmark.c | 6 ++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
index 6a473709e9b6..76cd35e477af 100644
--- a/mm/gup_benchmark.c
+++ b/mm/gup_benchmark.c
@@ -8,7 +8,8 @@
 #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
 
 struct gup_benchmark {
-	__u64 delta_usec;
+	__u64 get_delta_usec;
+	__u64 put_delta_usec;
 	__u64 addr;
 	__u64 size;
 	__u32 nr_pages_per_call;
@@ -47,14 +48,17 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
 	}
 	end_time = ktime_get();
 
-	gup->delta_usec = ktime_us_delta(end_time, start_time);
+	gup->get_delta_usec = ktime_us_delta(end_time, start_time);
 	gup->size = addr - gup->addr;
 
+	start_time = ktime_get();
 	for (i = 0; i < nr_pages; i++) {
 		if (!pages[i])
 			break;
 		put_page(pages[i]);
 	}
+	end_time = ktime_get();
+	gup->put_delta_usec = ktime_us_delta(end_time, start_time);
 
 	kvfree(pages);
 	return 0;
diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
index 36df55132036..bdcb97acd0ac 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -17,7 +17,8 @@
 #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
 
 struct gup_benchmark {
-	__u64 delta_usec;
+	__u64 get_delta_usec;
+	__u64 put_delta_usec;
 	__u64 addr;
 	__u64 size;
 	__u32 nr_pages_per_call;
@@ -81,7 +82,8 @@ int main(int argc, char **argv)
 		if (ioctl(fd, GUP_FAST_BENCHMARK, &gup))
 			perror("ioctl"), exit(1);
 
-		printf("Time: %lld us", gup.delta_usec);
+		printf("Time: get:%lld put:%lld us", gup.get_delta_usec,
+			gup.put_delta_usec);
 		if (gup.size != size)
 			printf(", truncated (size: %lld)", gup.size);
 		printf("\n");
-- 
2.14.4



* [PATCHv3 2/6] mm/gup_benchmark: Add additional pinning methods
  2018-09-21 22:39 [PATCHv3 0/6] mm: faster get user pages Keith Busch
  2018-09-21 22:39 ` [PATCHv3 1/6] mm/gup_benchmark: Time put_page Keith Busch
@ 2018-09-21 22:39 ` Keith Busch
  2018-10-02 10:56   ` Kirill A. Shutemov
  2018-09-21 22:39 ` [PATCHv3 3/6] tools/gup_benchmark: Fix 'write' flag usage Keith Busch
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2018-09-21 22:39 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch

This patch provides new gup benchmark ioctl commands to run different
user page pinning methods, get_user_pages_longterm and get_user_pages,
in addition to the existing get_user_pages_fast.
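
For illustration, the sketch below shows how a userspace tool might map
its options onto the three commands. It is not the patch itself:
cmd_for_option and method_name are hypothetical helpers, the trailing
flags member of the struct is assumed from the way gup.flags is used, and
a Linux <sys/ioctl.h> that provides _IOWR is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Layout follows the patch; the 'flags' member is an assumption. */
struct gup_benchmark {
	uint64_t get_delta_usec;
	uint64_t put_delta_usec;
	uint64_t addr;
	uint64_t size;
	uint32_t nr_pages_per_call;
	uint32_t flags;
};

/* _IOWR encodes the struct size, so both sides must agree on the layout. */
static_assert(sizeof(struct gup_benchmark) == 40, "unexpected struct size");

#define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)

/* Hypothetical helper: 0 means the option does not select a method. */
unsigned long cmd_for_option(int opt)
{
	switch (opt) {
	case 'L':
		return GUP_LONGTERM_BENCHMARK;
	case 'U':
		return GUP_BENCHMARK;
	default:
		return 0;
	}
}

/* Hypothetical helper: name of the pinning call a command exercises. */
const char *method_name(unsigned long cmd)
{
	switch (cmd) {
	case GUP_FAST_BENCHMARK:
		return "get_user_pages_fast";
	case GUP_LONGTERM_BENCHMARK:
		return "get_user_pages_longterm";
	case GUP_BENCHMARK:
		return "get_user_pages";
	default:
		return NULL;
	}
}
```

Because the command number is part of the encoded value, the three
commands stay distinct even though they share one argument struct.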

Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 mm/gup_benchmark.c                         | 28 ++++++++++++++++++++++++++--
 tools/testing/selftests/vm/gup_benchmark.c | 13 +++++++++++--
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
index 76cd35e477af..e6d9ce001ffa 100644
--- a/mm/gup_benchmark.c
+++ b/mm/gup_benchmark.c
@@ -6,6 +6,8 @@
 #include <linux/debugfs.h>
 
 #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
+#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
+#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
 
 struct gup_benchmark {
 	__u64 get_delta_usec;
@@ -41,7 +43,23 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
 			nr = (next - addr) / PAGE_SIZE;
 		}
 
-		nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i);
+		switch (cmd) {
+		case GUP_FAST_BENCHMARK:
+			nr = get_user_pages_fast(addr, nr, gup->flags & 1,
+						 pages + i);
+			break;
+		case GUP_LONGTERM_BENCHMARK:
+			nr = get_user_pages_longterm(addr, nr, gup->flags & 1,
+						     pages + i, NULL);
+			break;
+		case GUP_BENCHMARK:
+			nr = get_user_pages(addr, nr, gup->flags & 1, pages + i,
+					    NULL);
+			break;
+		default:
+			return -1;
+		}
+
 		if (nr <= 0)
 			break;
 		i += nr;
@@ -70,8 +88,14 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd,
 	struct gup_benchmark gup;
 	int ret;
 
-	if (cmd != GUP_FAST_BENCHMARK)
+	switch (cmd) {
+	case GUP_FAST_BENCHMARK:
+	case GUP_LONGTERM_BENCHMARK:
+	case GUP_BENCHMARK:
+		break;
+	default:
 		return -EINVAL;
+	}
 
 	if (copy_from_user(&gup, (void __user *)arg, sizeof(gup)))
 		return -EFAULT;
diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
index bdcb97acd0ac..c2f785ded9b9 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -15,6 +15,8 @@
 #define PAGE_SIZE sysconf(_SC_PAGESIZE)
 
 #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
+#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
+#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
 
 struct gup_benchmark {
 	__u64 get_delta_usec;
@@ -30,9 +32,10 @@ int main(int argc, char **argv)
 	struct gup_benchmark gup;
 	unsigned long size = 128 * MB;
 	int i, fd, opt, nr_pages = 1, thp = -1, repeats = 1, write = 0;
+	int cmd = GUP_FAST_BENCHMARK;
 	char *p;
 
-	while ((opt = getopt(argc, argv, "m:r:n:tT")) != -1) {
+	while ((opt = getopt(argc, argv, "m:r:n:tTLU")) != -1) {
 		switch (opt) {
 		case 'm':
 			size = atoi(optarg) * MB;
@@ -49,6 +52,12 @@ int main(int argc, char **argv)
 		case 'T':
 			thp = 0;
 			break;
+		case 'L':
+			cmd = GUP_LONGTERM_BENCHMARK;
+			break;
+		case 'U':
+			cmd = GUP_BENCHMARK;
+			break;
 		case 'w':
 			write = 1;
 		default:
@@ -79,7 +88,7 @@ int main(int argc, char **argv)
 
 	for (i = 0; i < repeats; i++) {
 		gup.size = size;
-		if (ioctl(fd, GUP_FAST_BENCHMARK, &gup))
+		if (ioctl(fd, cmd, &gup))
 			perror("ioctl"), exit(1);
 
 		printf("Time: get:%lld put:%lld us", gup.get_delta_usec,
-- 
2.14.4



* [PATCHv3 3/6] tools/gup_benchmark: Fix 'write' flag usage
  2018-09-21 22:39 [PATCHv3 0/6] mm: faster get user pages Keith Busch
  2018-09-21 22:39 ` [PATCHv3 1/6] mm/gup_benchmark: Time put_page Keith Busch
  2018-09-21 22:39 ` [PATCHv3 2/6] mm/gup_benchmark: Add additional pinning methods Keith Busch
@ 2018-09-21 22:39 ` Keith Busch
  2018-10-02 10:57   ` Kirill A. Shutemov
  2018-09-21 22:39 ` [PATCHv3 4/6] tools/gup_benchmark: Allow user specified file Keith Busch
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2018-09-21 22:39 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch

If the '-w' parameter was provided, the benchmark would exit due to a
missing 'break'.
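
The effect of the missing 'break' can be reproduced in isolation. The
sketch below is a simplified stand-in for the option loop, with
hypothetical function names parse_opt_buggy and parse_opt_fixed; it only
demonstrates the fall-through and is not the selftest code.

```c
#include <assert.h>
#include <stddef.h>

/* Pre-fix shape: 'w' sets the flag, then runs into the default label. */
int parse_opt_buggy(int opt, int *write)
{
	assert(write != NULL);

	switch (opt) {
	case 'T':
		break;
	case 'w':
		*write = 1;
		/* falls through - this is the bug */
	default:
		return -1;
	}
	return 0;
}

/* Post-fix shape: the added break keeps 'w' out of the error path. */
int parse_opt_fixed(int opt, int *write)
{
	assert(write != NULL);

	switch (opt) {
	case 'T':
		break;
	case 'w':
		*write = 1;
		break;
	default:
		return -1;
	}
	return 0;
}
```

In the buggy variant the flag is set correctly, but the caller never gets
to use it because the error return from the default label ends the run.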

Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 tools/testing/selftests/vm/gup_benchmark.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
index c2f785ded9b9..b2082df8beb4 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -60,6 +60,7 @@ int main(int argc, char **argv)
 			break;
 		case 'w':
 			write = 1;
+			break;
 		default:
 			return -1;
 		}
-- 
2.14.4



* [PATCHv3 4/6] tools/gup_benchmark: Allow user specified file
  2018-09-21 22:39 [PATCHv3 0/6] mm: faster get user pages Keith Busch
                   ` (2 preceding siblings ...)
  2018-09-21 22:39 ` [PATCHv3 3/6] tools/gup_benchmark: Fix 'write' flag usage Keith Busch
@ 2018-09-21 22:39 ` Keith Busch
  2018-10-02 11:03   ` Kirill A. Shutemov
  2018-09-21 22:39 ` [PATCHv3 5/6] tools/gup_benchmark: Add parameter for hugetlb Keith Busch
  2018-09-21 22:39 ` [PATCHv3 6/6] mm/gup: Cache dev_pagemap while pinning pages Keith Busch
  5 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2018-09-21 22:39 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch

The gup benchmark by default maps anonymous memory. This patch allows a
user to specify a file to map, providing a means to test various
file backings, like device and filesystem DAX.

Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 tools/testing/selftests/vm/gup_benchmark.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
index b2082df8beb4..f2c99e2436f8 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -33,9 +33,12 @@ int main(int argc, char **argv)
 	unsigned long size = 128 * MB;
 	int i, fd, opt, nr_pages = 1, thp = -1, repeats = 1, write = 0;
 	int cmd = GUP_FAST_BENCHMARK;
+	int file_map = -1;
+	int flags = MAP_ANONYMOUS | MAP_PRIVATE;
+	char *file = NULL;
 	char *p;
 
-	while ((opt = getopt(argc, argv, "m:r:n:tTLU")) != -1) {
+	while ((opt = getopt(argc, argv, "m:r:n:f:tTLU")) != -1) {
 		switch (opt) {
 		case 'm':
 			size = atoi(optarg) * MB;
@@ -61,11 +64,22 @@ int main(int argc, char **argv)
 		case 'w':
 			write = 1;
 			break;
+		case 'f':
+			file = optarg;
+			flags &= ~(MAP_PRIVATE | MAP_ANONYMOUS);
+			flags |= MAP_SHARED;
+			break;
 		default:
 			return -1;
 		}
 	}
 
+	if (file) {
+		file_map = open(file, O_RDWR|O_CREAT);
+		if (file_map < 0)
+			perror("open"), exit(file_map);
+	}
+
 	gup.nr_pages_per_call = nr_pages;
 	gup.flags = write;
 
@@ -73,8 +87,7 @@ int main(int argc, char **argv)
 	if (fd == -1)
 		perror("open"), exit(1);
 
-	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
-			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, file_map, 0);
 	if (p == MAP_FAILED)
 		perror("mmap"), exit(1);
 	gup.addr = (unsigned long)p;
-- 
2.14.4



* [PATCHv3 5/6] tools/gup_benchmark: Add parameter for hugetlb
  2018-09-21 22:39 [PATCHv3 0/6] mm: faster get user pages Keith Busch
                   ` (3 preceding siblings ...)
  2018-09-21 22:39 ` [PATCHv3 4/6] tools/gup_benchmark: Allow user specified file Keith Busch
@ 2018-09-21 22:39 ` Keith Busch
  2018-10-02 11:05   ` Kirill A. Shutemov
  2018-09-21 22:39 ` [PATCHv3 6/6] mm/gup: Cache dev_pagemap while pinning pages Keith Busch
  5 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2018-09-21 22:39 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch

Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 tools/testing/selftests/vm/gup_benchmark.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
index f2c99e2436f8..5d96e2b3d2f1 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -38,7 +38,7 @@ int main(int argc, char **argv)
 	char *file = NULL;
 	char *p;
 
-	while ((opt = getopt(argc, argv, "m:r:n:f:tTLU")) != -1) {
+	while ((opt = getopt(argc, argv, "m:r:n:f:tTLUH")) != -1) {
 		switch (opt) {
 		case 'm':
 			size = atoi(optarg) * MB;
@@ -64,6 +64,9 @@ int main(int argc, char **argv)
 		case 'w':
 			write = 1;
 			break;
+		case 'H':
+			flags |= MAP_HUGETLB;
+			break;
 		case 'f':
 			file = optarg;
 			flags &= ~(MAP_PRIVATE | MAP_ANONYMOUS);
-- 
2.14.4



* [PATCHv3 6/6] mm/gup: Cache dev_pagemap while pinning pages
  2018-09-21 22:39 [PATCHv3 0/6] mm: faster get user pages Keith Busch
                   ` (4 preceding siblings ...)
  2018-09-21 22:39 ` [PATCHv3 5/6] tools/gup_benchmark: Add parameter for hugetlb Keith Busch
@ 2018-09-21 22:39 ` Keith Busch
  2018-10-02 11:26   ` Kirill A. Shutemov
  5 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2018-09-21 22:39 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch

Pinning pages from ZONE_DEVICE memory needs to check the backing device's
liveness, which is tracked in the device's dev_pagemap metadata. This
metadata is stored in a radix tree and looking it up adds measurable
software overhead.

This patch avoids repeating this relatively costly operation when
dev_pagemap is used by caching the last dev_pagemap when getting user
pages. The gup_benchmark reports this reduces the time to get user pages
to as low as 1/3 of the previous time.

The cached value is combined with other output parameters into a context
struct to keep the number of parameters down.
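
The caching pattern can be illustrated outside the kernel. The sketch
below is a userspace model under stated assumptions: range_map,
slow_lookup, get_map, put_map and walk are hypothetical names, a linear
table scan stands in for the radix tree lookup, and a plain counter stands
in for the dev_pagemap reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device range with a reference count. */
struct range_map {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	int refs;
};

static struct range_map maps[] = {
	{ 0x1000, 0x2000, 0 },
	{ 0x8000, 0x9000, 0 },
};

unsigned long slow_lookups;	/* counts trips through the costly path */

/* The costly path: search the table and take a reference on a hit. */
struct range_map *slow_lookup(unsigned long pfn)
{
	size_t i;

	slow_lookups++;
	for (i = 0; i < sizeof(maps) / sizeof(maps[0]); i++) {
		if (pfn >= maps[i].start_pfn && pfn < maps[i].end_pfn) {
			maps[i].refs++;
			return &maps[i];
		}
	}
	return NULL;
}

void put_map(struct range_map *map)
{
	assert(map->refs > 0);
	map->refs--;
}

/* Reuse 'cached' if it covers pfn; otherwise drop it and look up afresh. */
struct range_map *get_map(unsigned long pfn, struct range_map *cached)
{
	if (cached) {
		if (pfn >= cached->start_pfn && pfn < cached->end_pfn)
			return cached;
		put_map(cached);
	}
	return slow_lookup(pfn);
}

/* Resolve n consecutive pfns while holding at most one cached reference. */
unsigned long walk(unsigned long pfn, unsigned long n)
{
	struct range_map *cached = NULL;
	unsigned long i, done = 0;

	for (i = 0; i < n; i++) {
		cached = get_map(pfn + i, cached);
		if (!cached)
			break;
		done++;
	}
	if (cached)
		put_map(cached);	/* single release on the way out */
	return done;
}
```

In this model a walk that stays inside one range pays for a single slow
lookup regardless of length, and the reference is dropped once at the end
rather than once per page.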

Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 include/linux/huge_mm.h |  8 ++---
 include/linux/mm.h      | 19 +++++++++--
 mm/gup.c                | 90 +++++++++++++++++++++++++++----------------------
 mm/huge_memory.c        | 38 +++++++++------------
 mm/nommu.c              |  4 +--
 5 files changed, 88 insertions(+), 71 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 99c19b06d9a4..5cbabdebe9af 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -213,9 +213,9 @@ static inline int hpage_nr_pages(struct page *page)
 }
 
 struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
-		pmd_t *pmd, int flags);
+		pmd_t *pmd, int flags, struct dev_pagemap **pgmap);
 struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
-		pud_t *pud, int flags);
+		pud_t *pud, int flags, struct dev_pagemap **pgmap);
 
 extern vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t orig_pmd);
 
@@ -344,13 +344,13 @@ static inline void mm_put_huge_zero_page(struct mm_struct *mm)
 }
 
 static inline struct page *follow_devmap_pmd(struct vm_area_struct *vma,
-		unsigned long addr, pmd_t *pmd, int flags)
+	unsigned long addr, pmd_t *pmd, int flags, struct dev_pagemap **pgmap)
 {
 	return NULL;
 }
 
 static inline struct page *follow_devmap_pud(struct vm_area_struct *vma,
-		unsigned long addr, pud_t *pud, int flags)
+	unsigned long addr, pud_t *pud, int flags, struct dev_pagemap **pgmap)
 {
 	return NULL;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8ad4ca..79c80496dd50 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2534,15 +2534,28 @@ static inline vm_fault_t vmf_error(int err)
 	return VM_FAULT_SIGBUS;
 }
 
+struct follow_page_context {
+	struct dev_pagemap *pgmap;
+	unsigned int page_mask;
+};
+
 struct page *follow_page_mask(struct vm_area_struct *vma,
 			      unsigned long address, unsigned int foll_flags,
-			      unsigned int *page_mask);
+			      struct follow_page_context *ctx);
 
 static inline struct page *follow_page(struct vm_area_struct *vma,
 		unsigned long address, unsigned int foll_flags)
 {
-	unsigned int unused_page_mask;
-	return follow_page_mask(vma, address, foll_flags, &unused_page_mask);
+	struct page *page;
+	struct follow_page_context ctx = {
+		.pgmap = NULL,
+		.page_mask = 0,
+	};
+
+	page = follow_page_mask(vma, address, foll_flags, &ctx);
+	if (ctx.pgmap)
+		put_dev_pagemap(ctx.pgmap);
+	return page;
 }
 
 #define FOLL_WRITE	0x01	/* check pte is writable */
diff --git a/mm/gup.c b/mm/gup.c
index 1abc8b4afff6..124e7293e381 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -71,10 +71,10 @@ static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
 }
 
 static struct page *follow_page_pte(struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd, unsigned int flags)
+		unsigned long address, pmd_t *pmd, unsigned int flags,
+		struct dev_pagemap **pgmap)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	struct dev_pagemap *pgmap = NULL;
 	struct page *page;
 	spinlock_t *ptl;
 	pte_t *ptep, pte;
@@ -116,8 +116,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		 * Only return device mapping pages in the FOLL_GET case since
 		 * they are only valid while holding the pgmap reference.
 		 */
-		pgmap = get_dev_pagemap(pte_pfn(pte), NULL);
-		if (pgmap)
+		*pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap);
+		if (*pgmap)
 			page = pte_page(pte);
 		else
 			goto no_page;
@@ -156,9 +156,9 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		get_page(page);
 
 		/* drop the pgmap reference now that we hold the page */
-		if (pgmap) {
-			put_dev_pagemap(pgmap);
-			pgmap = NULL;
+		if (*pgmap) {
+			put_dev_pagemap(*pgmap);
+			*pgmap = NULL;
 		}
 	}
 	if (flags & FOLL_TOUCH) {
@@ -210,7 +210,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 
 static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 				    unsigned long address, pud_t *pudp,
-				    unsigned int flags, unsigned int *page_mask)
+				    unsigned int flags,
+				    struct follow_page_context *ctx)
 {
 	pmd_t *pmd, pmdval;
 	spinlock_t *ptl;
@@ -258,13 +259,13 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	}
 	if (pmd_devmap(pmdval)) {
 		ptl = pmd_lock(mm, pmd);
-		page = follow_devmap_pmd(vma, address, pmd, flags);
+		page = follow_devmap_pmd(vma, address, pmd, flags, &ctx->pgmap);
 		spin_unlock(ptl);
 		if (page)
 			return page;
 	}
 	if (likely(!pmd_trans_huge(pmdval)))
-		return follow_page_pte(vma, address, pmd, flags);
+		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 
 	if ((flags & FOLL_NUMA) && pmd_protnone(pmdval))
 		return no_page_table(vma, flags);
@@ -284,7 +285,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	}
 	if (unlikely(!pmd_trans_huge(*pmd))) {
 		spin_unlock(ptl);
-		return follow_page_pte(vma, address, pmd, flags);
+		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 	}
 	if (flags & FOLL_SPLIT) {
 		int ret;
@@ -307,18 +308,18 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		}
 
 		return ret ? ERR_PTR(ret) :
-			follow_page_pte(vma, address, pmd, flags);
+			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 	}
 	page = follow_trans_huge_pmd(vma, address, pmd, flags);
 	spin_unlock(ptl);
-	*page_mask = HPAGE_PMD_NR - 1;
+	ctx->page_mask = HPAGE_PMD_NR - 1;
 	return page;
 }
 
-
 static struct page *follow_pud_mask(struct vm_area_struct *vma,
 				    unsigned long address, p4d_t *p4dp,
-				    unsigned int flags, unsigned int *page_mask)
+				    unsigned int flags,
+				    struct follow_page_context *ctx)
 {
 	pud_t *pud;
 	spinlock_t *ptl;
@@ -344,7 +345,7 @@ static struct page *follow_pud_mask(struct vm_area_struct *vma,
 	}
 	if (pud_devmap(*pud)) {
 		ptl = pud_lock(mm, pud);
-		page = follow_devmap_pud(vma, address, pud, flags);
+		page = follow_devmap_pud(vma, address, pud, flags, &ctx->pgmap);
 		spin_unlock(ptl);
 		if (page)
 			return page;
@@ -352,13 +353,13 @@ static struct page *follow_pud_mask(struct vm_area_struct *vma,
 	if (unlikely(pud_bad(*pud)))
 		return no_page_table(vma, flags);
 
-	return follow_pmd_mask(vma, address, pud, flags, page_mask);
+	return follow_pmd_mask(vma, address, pud, flags, ctx);
 }
 
-
 static struct page *follow_p4d_mask(struct vm_area_struct *vma,
 				    unsigned long address, pgd_t *pgdp,
-				    unsigned int flags, unsigned int *page_mask)
+				    unsigned int flags,
+				    struct follow_page_context *ctx)
 {
 	p4d_t *p4d;
 	struct page *page;
@@ -378,7 +379,7 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,
 			return page;
 		return no_page_table(vma, flags);
 	}
-	return follow_pud_mask(vma, address, p4d, flags, page_mask);
+	return follow_pud_mask(vma, address, p4d, flags, ctx);
 }
 
 /**
@@ -396,13 +397,13 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,
  */
 struct page *follow_page_mask(struct vm_area_struct *vma,
 			      unsigned long address, unsigned int flags,
-			      unsigned int *page_mask)
+			      struct follow_page_context *ctx)
 {
 	pgd_t *pgd;
 	struct page *page;
 	struct mm_struct *mm = vma->vm_mm;
 
-	*page_mask = 0;
+	ctx->page_mask = 0;
 
 	/* make this handle hugepd */
 	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
@@ -431,7 +432,7 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
 		return no_page_table(vma, flags);
 	}
 
-	return follow_p4d_mask(vma, address, pgd, flags, page_mask);
+	return follow_p4d_mask(vma, address, pgd, flags, ctx);
 }
 
 static int get_gate_page(struct mm_struct *mm, unsigned long address,
@@ -659,9 +660,9 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned int gup_flags, struct page **pages,
 		struct vm_area_struct **vmas, int *nonblocking)
 {
-	long i = 0;
-	unsigned int page_mask;
+	long ret = 0, i = 0;
 	struct vm_area_struct *vma = NULL;
+	struct follow_page_context ctx = {};
 
 	if (!nr_pages)
 		return 0;
@@ -691,12 +692,14 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 						pages ? &pages[i] : NULL);
 				if (ret)
 					return i ? : ret;
-				page_mask = 0;
+				ctx.page_mask = 0;
 				goto next_page;
 			}
 
-			if (!vma || check_vma_flags(vma, gup_flags))
-				return i ? : -EFAULT;
+			if (!vma || check_vma_flags(vma, gup_flags)) {
+				ret = -EFAULT;
+				goto out;
+			}
 			if (is_vm_hugetlb_page(vma)) {
 				i = follow_hugetlb_page(mm, vma, pages, vmas,
 						&start, &nr_pages, i,
@@ -709,23 +712,26 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		 * If we have a pending SIGKILL, don't keep faulting pages and
 		 * potentially allocating memory.
 		 */
-		if (unlikely(fatal_signal_pending(current)))
-			return i ? i : -ERESTARTSYS;
+		if (unlikely(fatal_signal_pending(current))) {
+			ret = -ERESTARTSYS;
+			goto out;
+		}
 		cond_resched();
-		page = follow_page_mask(vma, start, foll_flags, &page_mask);
+
+		page = follow_page_mask(vma, start, foll_flags, &ctx);
 		if (!page) {
-			int ret;
 			ret = faultin_page(tsk, vma, start, &foll_flags,
 					nonblocking);
 			switch (ret) {
 			case 0:
 				goto retry;
+			case -EBUSY:
+				ret = 0;
+				/* FALLTHRU */
 			case -EFAULT:
 			case -ENOMEM:
 			case -EHWPOISON:
-				return i ? i : ret;
-			case -EBUSY:
-				return i;
+				goto out;
 			case -ENOENT:
 				goto next_page;
 			}
@@ -737,27 +743,31 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			 */
 			goto next_page;
 		} else if (IS_ERR(page)) {
-			return i ? i : PTR_ERR(page);
+			ret = PTR_ERR(page);
+			goto out;
 		}
 		if (pages) {
 			pages[i] = page;
 			flush_anon_page(vma, page, start);
 			flush_dcache_page(page);
-			page_mask = 0;
+			ctx.page_mask = 0;
 		}
 next_page:
 		if (vmas) {
 			vmas[i] = vma;
-			page_mask = 0;
+			ctx.page_mask = 0;
 		}
-		page_increm = 1 + (~(start >> PAGE_SHIFT) & page_mask);
+		page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
 		if (page_increm > nr_pages)
 			page_increm = nr_pages;
 		i += page_increm;
 		start += page_increm * PAGE_SIZE;
 		nr_pages -= page_increm;
 	} while (nr_pages);
-	return i;
+out:
+	if (ctx.pgmap)
+		put_dev_pagemap(ctx.pgmap);
+	return i ? i : ret;
 }
 
 static bool vma_permits_fault(struct vm_area_struct *vma,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 533f9b00147d..9839bf91b057 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -851,13 +851,23 @@ static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
 		update_mmu_cache_pmd(vma, addr, pmd);
 }
 
+static struct page *pagemap_page(unsigned long pfn, struct dev_pagemap **pgmap)
+{
+	struct page *page;
+
+	*pgmap = get_dev_pagemap(pfn, *pgmap);
+	if (!*pgmap)
+		return ERR_PTR(-EFAULT);
+	page = pfn_to_page(pfn);
+	get_page(page);
+	return page;
+}
+
 struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
-		pmd_t *pmd, int flags)
+		pmd_t *pmd, int flags, struct dev_pagemap **pgmap)
 {
 	unsigned long pfn = pmd_pfn(*pmd);
 	struct mm_struct *mm = vma->vm_mm;
-	struct dev_pagemap *pgmap;
-	struct page *page;
 
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
@@ -886,14 +896,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 		return ERR_PTR(-EEXIST);
 
 	pfn += (addr & ~PMD_MASK) >> PAGE_SHIFT;
-	pgmap = get_dev_pagemap(pfn, NULL);
-	if (!pgmap)
-		return ERR_PTR(-EFAULT);
-	page = pfn_to_page(pfn);
-	get_page(page);
-	put_dev_pagemap(pgmap);
-
-	return page;
+	return pagemap_page(pfn, pgmap);
 }
 
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
@@ -1000,12 +1003,10 @@ static void touch_pud(struct vm_area_struct *vma, unsigned long addr,
 }
 
 struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
-		pud_t *pud, int flags)
+               pud_t *pud, int flags, struct dev_pagemap **pgmap)
 {
 	unsigned long pfn = pud_pfn(*pud);
 	struct mm_struct *mm = vma->vm_mm;
-	struct dev_pagemap *pgmap;
-	struct page *page;
 
 	assert_spin_locked(pud_lockptr(mm, pud));
 
@@ -1028,14 +1029,7 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
 		return ERR_PTR(-EEXIST);
 
 	pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
-	pgmap = get_dev_pagemap(pfn, NULL);
-	if (!pgmap)
-		return ERR_PTR(-EFAULT);
-	page = pfn_to_page(pfn);
-	get_page(page);
-	put_dev_pagemap(pgmap);
-
-	return page;
+	return pagemap_page(pfn, pgmap);
 }
 
 int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
diff --git a/mm/nommu.c b/mm/nommu.c
index e4aac33216ae..a795c70cf21e 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1711,9 +1711,9 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 
 struct page *follow_page_mask(struct vm_area_struct *vma,
 			      unsigned long address, unsigned int flags,
-			      unsigned int *page_mask)
+			      struct follow_page_context *ctx)
 {
-	*page_mask = 0;
+	ctx->page_mask = 0;
 	return NULL;
 }
 
-- 
2.14.4



* Re: [PATCHv3 1/6] mm/gup_benchmark: Time put_page
  2018-09-21 22:39 ` [PATCHv3 1/6] mm/gup_benchmark: Time put_page Keith Busch
@ 2018-10-02 10:54   ` Kirill A. Shutemov
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2018-10-02 10:54 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-mm, linux-kernel, Dave Hansen, Dan Williams

On Fri, Sep 21, 2018 at 10:39:51PM +0000, Keith Busch wrote:
> We'd like to measure time to unpin user pages, so this adds a second
> benchmark timer on put_page, separate from get_page.
> 
> Adding the field breaks this ioctl ABI, but that should be okay since
> this is an in-tree kernel selftest.
> 
> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Keith Busch <keith.busch@intel.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


* Re: [PATCHv3 2/6] mm/gup_benchmark: Add additional pinning methods
  2018-09-21 22:39 ` [PATCHv3 2/6] mm/gup_benchmark: Add additional pinning methods Keith Busch
@ 2018-10-02 10:56   ` Kirill A. Shutemov
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2018-10-02 10:56 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-mm, linux-kernel, Dave Hansen, Dan Williams

On Fri, Sep 21, 2018 at 10:39:52PM +0000, Keith Busch wrote:
> This patch provides new gup benchmark ioctl commands to run different
> user page pinning methods, get_user_pages_longterm and get_user_pages,
> in addition to the existing get_user_pages_fast.
> 
> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Keith Busch <keith.busch@intel.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


* Re: [PATCHv3 3/6] tools/gup_benchmark: Fix 'write' flag usage
  2018-09-21 22:39 ` [PATCHv3 3/6] tools/gup_benchmark: Fix 'write' flag usage Keith Busch
@ 2018-10-02 10:57   ` Kirill A. Shutemov
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2018-10-02 10:57 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-mm, linux-kernel, Dave Hansen, Dan Williams

On Fri, Sep 21, 2018 at 10:39:53PM +0000, Keith Busch wrote:
> If the '-w' parameter was provided, the benchmark would exit due to a
> missing 'break'.
> 
> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Keith Busch <keith.busch@intel.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


* Re: [PATCHv3 4/6] tools/gup_benchmark: Allow user specified file
  2018-09-21 22:39 ` [PATCHv3 4/6] tools/gup_benchmark: Allow user specified file Keith Busch
@ 2018-10-02 11:03   ` Kirill A. Shutemov
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2018-10-02 11:03 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-mm, linux-kernel, Dave Hansen, Dan Williams

On Fri, Sep 21, 2018 at 10:39:54PM +0000, Keith Busch wrote:
> The gup benchmark by default maps anonymous memory. This patch allows a
> user to specify a file to map, providing a means to test various
> file backings, like device and filesystem DAX.
> 
> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  tools/testing/selftests/vm/gup_benchmark.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
> index b2082df8beb4..f2c99e2436f8 100644
> --- a/tools/testing/selftests/vm/gup_benchmark.c
> +++ b/tools/testing/selftests/vm/gup_benchmark.c
> @@ -33,9 +33,12 @@ int main(int argc, char **argv)
>  	unsigned long size = 128 * MB;
>  	int i, fd, opt, nr_pages = 1, thp = -1, repeats = 1, write = 0;
>  	int cmd = GUP_FAST_BENCHMARK;
> +	int file_map = -1;
> +	int flags = MAP_ANONYMOUS | MAP_PRIVATE;
> +	char *file = NULL;
>  	char *p;
>  
> -	while ((opt = getopt(argc, argv, "m:r:n:tTLU")) != -1) {
> +	while ((opt = getopt(argc, argv, "m:r:n:f:tTLU")) != -1) {
>  		switch (opt) {
>  		case 'm':
>  			size = atoi(optarg) * MB;
> @@ -61,11 +64,22 @@ int main(int argc, char **argv)
>  		case 'w':
>  			write = 1;
>  			break;
> +		case 'f':
> +			file = optarg;
> +			flags &= ~(MAP_PRIVATE | MAP_ANONYMOUS);
> +			flags |= MAP_SHARED;

Why do we want to assume a shared mapping if a file is passed? A private
file mapping is also a valid target for the benchmark.

Maybe add a separate option for shared? It would keep the options more
independent.

BTW, we can make /dev/zero the default file and drop MAP_ANONYMOUS from
the flags: a private mapping of /dev/zero produces an anonymous mapping.
Then there is no need to mask out MAP_ANONYMOUS on -f and no need for the
'if (file)' branch below.

> +			break;
>  		default:
>  			return -1;
>  		}
>  	}
>  
> +	if (file) {
> +		file_map = open(file, O_RDWR|O_CREAT);
> +		if (file_map < 0)
> +			perror("open"), exit(file_map);
> +	}
> +
>  	gup.nr_pages_per_call = nr_pages;
>  	gup.flags = write;
>  
> @@ -73,8 +87,7 @@ int main(int argc, char **argv)
>  	if (fd == -1)
>  		perror("open"), exit(1);
>  
> -	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
> -			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> +	p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, file_map, 0);
>  	if (p == MAP_FAILED)
>  		perror("mmap"), exit(1);
>  	gup.addr = (unsigned long)p;
> -- 
> 2.14.4
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 5/6] tools/gup_benchmark: Add parameter for hugetlb
  2018-09-21 22:39 ` [PATCHv3 5/6] tools/gup_benchmark: Add parameter for hugetlb Keith Busch
@ 2018-10-02 11:05   ` Kirill A. Shutemov
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2018-10-02 11:05 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-mm, linux-kernel, Dave Hansen, Dan Williams

On Fri, Sep 21, 2018 at 10:39:55PM +0000, Keith Busch wrote:

-ENOMSG

> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  tools/testing/selftests/vm/gup_benchmark.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
> index f2c99e2436f8..5d96e2b3d2f1 100644
> --- a/tools/testing/selftests/vm/gup_benchmark.c
> +++ b/tools/testing/selftests/vm/gup_benchmark.c
> @@ -38,7 +38,7 @@ int main(int argc, char **argv)
>  	char *file = NULL;
>  	char *p;
>  
> -	while ((opt = getopt(argc, argv, "m:r:n:f:tTLU")) != -1) {
> +	while ((opt = getopt(argc, argv, "m:r:n:f:tTLUH")) != -1) {
>  		switch (opt) {
>  		case 'm':
>  			size = atoi(optarg) * MB;
> @@ -64,6 +64,9 @@ int main(int argc, char **argv)
>  		case 'w':
>  			write = 1;
>  			break;
> +		case 'H':
> +			flags |= MAP_HUGETLB;
> +			break;
>  		case 'f':
>  			file = optarg;
>  			flags &= ~(MAP_PRIVATE | MAP_ANONYMOUS);
> -- 
> 2.14.4
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 6/6] mm/gup: Cache dev_pagemap while pinning pages
  2018-09-21 22:39 ` [PATCHv3 6/6] mm/gup: Cache dev_pagemap while pinning pages Keith Busch
@ 2018-10-02 11:26   ` Kirill A. Shutemov
  2018-10-02 15:49     ` Dave Hansen
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2018-10-02 11:26 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-mm, linux-kernel, Dave Hansen, Dan Williams

On Fri, Sep 21, 2018 at 10:39:56PM +0000, Keith Busch wrote:
> Pinning pages from ZONE_DEVICE memory needs to check the backing device's
> live-ness, which is tracked in the device's dev_pagemap metadata. This
> metadata is stored in a radix tree and looking it up adds measurable
> software overhead.
> 
> This patch avoids repeating this relatively costly operation when
> dev_pagemap is used by caching the last dev_pagemap when getting user
> pages. The gup_benchmark reports this reduces the time to get user pages
> to as low as 1/3 of the previous time.
> 
> The cached value is combined with other output parameters into a context
> struct to keep the number of parameters down.
> 
> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---

....

> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a61ebe8ad4ca..79c80496dd50 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2534,15 +2534,28 @@ static inline vm_fault_t vmf_error(int err)
>  	return VM_FAULT_SIGBUS;
>  }
>  
> +struct follow_page_context {
> +	struct dev_pagemap *pgmap;
> +	unsigned int page_mask;
> +};
> +
>  struct page *follow_page_mask(struct vm_area_struct *vma,
>  			      unsigned long address, unsigned int foll_flags,
> -			      unsigned int *page_mask);
> +			      struct follow_page_context *ctx);
>  
>  static inline struct page *follow_page(struct vm_area_struct *vma,
>  		unsigned long address, unsigned int foll_flags)
>  {
> -	unsigned int unused_page_mask;
> -	return follow_page_mask(vma, address, foll_flags, &unused_page_mask);
> +	struct page *page;
> +	struct follow_page_context ctx = {
> +		.pgmap = NULL,
> +		.page_mask = 0,
> +	};
> +
> +	page = follow_page_mask(vma, address, foll_flags, &ctx);
> +	if (ctx.pgmap)
> +		put_dev_pagemap(ctx.pgmap);
> +	return page;
>  }

Do we still want to keep the function as inline? I don't think so.
Let's move it into mm/gup.c and make struct follow_page_context private to
the file.

>  
>  #define FOLL_WRITE	0x01	/* check pte is writable */
> diff --git a/mm/gup.c b/mm/gup.c
> index 1abc8b4afff6..124e7293e381 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -71,10 +71,10 @@ static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
>  }
>  
>  static struct page *follow_page_pte(struct vm_area_struct *vma,
> -		unsigned long address, pmd_t *pmd, unsigned int flags)
> +		unsigned long address, pmd_t *pmd, unsigned int flags,
> +		struct dev_pagemap **pgmap)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
> -	struct dev_pagemap *pgmap = NULL;
>  	struct page *page;
>  	spinlock_t *ptl;
>  	pte_t *ptep, pte;
> @@ -116,8 +116,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  		 * Only return device mapping pages in the FOLL_GET case since
>  		 * they are only valid while holding the pgmap reference.
>  		 */
> -		pgmap = get_dev_pagemap(pte_pfn(pte), NULL);
> -		if (pgmap)
> +		*pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap);
> +		if (*pgmap)
>  			page = pte_page(pte);
>  		else
>  			goto no_page;

Hm. Shouldn't the get_dev_pagemap() call be under if (!*pgmap)?

... ah, never mind. I got confused by the get_dev_pagemap() interface.
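
For readers with the same question, a simplified user-space model of
that interface as I understand it: when the cached pgmap passed in
already covers the pfn it is returned as-is, otherwise its reference is
dropped and a fresh lookup is done. Everything below is a mock. The
struct layout, mock_maps[] and slow_lookups are assumptions made for
illustration, and the real slow path takes a live percpu reference
rather than bumping a counter.

```c
#include <stddef.h>

/* User-space stand-in for the kernel structure; only the covered
 * pfn range and a reference count are modeled. */
struct dev_pagemap {
	unsigned long start_pfn, end_pfn;
	int refs;
};

static struct dev_pagemap mock_maps[] = {
	{ .start_pfn = 0,   .end_pfn = 99 },
	{ .start_pfn = 100, .end_pfn = 199 },
};
static int slow_lookups;	/* counts simulated slow-path lookups */

static void put_dev_pagemap(struct dev_pagemap *pgmap)
{
	if (pgmap)
		pgmap->refs--;
}

/* Mock of the caching contract: reuse the cached pgmap when it covers
 * pfn, otherwise drop it and do the slow lookup, taking a new
 * reference on whatever is found. */
static struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
					   struct dev_pagemap *pgmap)
{
	size_t i;

	if (pgmap) {
		if (pfn >= pgmap->start_pfn && pfn <= pgmap->end_pfn)
			return pgmap;	/* cache hit: reference already held */
		put_dev_pagemap(pgmap);
	}

	slow_lookups++;
	for (i = 0; i < sizeof(mock_maps) / sizeof(mock_maps[0]); i++) {
		if (pfn >= mock_maps[i].start_pfn &&
		    pfn <= mock_maps[i].end_pfn) {
			mock_maps[i].refs++;
			return &mock_maps[i];
		}
	}
	return NULL;	/* pfn is not device memory */
}
```

So calling it unconditionally with the previous result is the intended
usage: consecutive pfns in the same device range skip the lookup, and a
pfn outside the cached range swaps the reference.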

>  static bool vma_permits_fault(struct vm_area_struct *vma,
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 533f9b00147d..9839bf91b057 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -851,13 +851,23 @@ static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
>  		update_mmu_cache_pmd(vma, addr, pmd);
>  }
>  
> +static struct page *pagemap_page(unsigned long pfn, struct dev_pagemap **pgmap)

The function name doesn't reflect the fact that it takes a pin on the page.
Maybe pagemap_get_page()?

> +{
> +	struct page *page;
> +
> +	*pgmap = get_dev_pagemap(pfn, *pgmap);
> +	if (!*pgmap)
> +		return ERR_PTR(-EFAULT);
> +	page = pfn_to_page(pfn);
> +	get_page(page);
> +	return page;
> +}
> +

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 6/6] mm/gup: Cache dev_pagemap while pinning pages
  2018-10-02 11:26   ` Kirill A. Shutemov
@ 2018-10-02 15:49     ` Dave Hansen
  2018-10-02 16:05       ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2018-10-02 15:49 UTC (permalink / raw)
  To: Kirill A. Shutemov, Keith Busch; +Cc: linux-mm, linux-kernel, Dan Williams

On 10/02/2018 04:26 AM, Kirill A. Shutemov wrote:
>> +	page = follow_page_mask(vma, address, foll_flags, &ctx);
>> +	if (ctx.pgmap)
>> +		put_dev_pagemap(ctx.pgmap);
>> +	return page;
>>  }
> Do we still want to keep the function as inline? I don't think so.
> Let's move it into mm/gup.c and make struct follow_page_context private to
> the file.

Yeah, and let's have a put_follow_page_context() that does the
put_dev_pagemap() rather than spreading that if() to each call site.
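
A minimal user-space sketch of what such a helper might look like. The
struct follow_page_context fields are taken from the patch; the
dev_pagemap mock with a plain counter and the choice to clear ctx->pgmap
after the put are assumptions for illustration, not something settled in
this thread.

```c
#include <stddef.h>

struct dev_pagemap {
	int refs;	/* mock reference count */
};

struct follow_page_context {
	struct dev_pagemap *pgmap;
	unsigned int page_mask;
};

/* Mock: the kernel version drops a percpu reference instead. */
static void put_dev_pagemap(struct dev_pagemap *pgmap)
{
	pgmap->refs--;
}

/* Release whatever the context still holds, so callers need no if().
 * Clearing the pointer makes a second call harmless. */
static void put_follow_page_context(struct follow_page_context *ctx)
{
	if (ctx->pgmap) {
		put_dev_pagemap(ctx->pgmap);
		ctx->pgmap = NULL;
	}
}
```

Each call site would then end with a single
put_follow_page_context(&ctx) regardless of whether a pgmap was cached.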


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 6/6] mm/gup: Cache dev_pagemap while pinning pages
  2018-10-02 15:49     ` Dave Hansen
@ 2018-10-02 16:05       ` Keith Busch
  0 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2018-10-02 16:05 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Kirill A. Shutemov, linux-mm, linux-kernel, Dan Williams

On Tue, Oct 02, 2018 at 08:49:39AM -0700, Dave Hansen wrote:
> On 10/02/2018 04:26 AM, Kirill A. Shutemov wrote:
> >> +	page = follow_page_mask(vma, address, foll_flags, &ctx);
> >> +	if (ctx.pgmap)
> >> +		put_dev_pagemap(ctx.pgmap);
> >> +	return page;
> >>  }
> > Do we still want to keep the function as inline? I don't think so.
> > Let's move it into mm/gup.c and make struct follow_page_context private to
> > the file.
> 
> Yeah, and let's have a put_follow_page_context() that does the
> put_dev_pagemap() rather than spreading that if() to each call site.

Thanks for all the feedback. I will make a new version, but with the
gup_benchmark part split into an independent set since it is logically
separate from the final patch.

^ permalink raw reply	[flat|nested] 15+ messages in thread

