[PATCH -V2 -mm 0/4] mm, huge page: Copy target sub-page last when copy huge page


* [PATCH -V2 -mm 0/4] mm, huge page: Copy target sub-page last when copy huge page
@ 2018-05-24  0:58 Huang, Ying
From: Huang, Ying @ 2018-05-24  0:58 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Andi Kleen, Jan Kara,
	Michal Hocko, Andrea Arcangeli, Kirill A. Shutemov,
	Matthew Wilcox, Hugh Dickins, Minchan Kim, Shaohua Li,
	Christopher Lameter, Mike Kravetz

From: Huang Ying <ying.huang@intel.com>

Huge pages help reduce the TLB miss rate, but they have a larger cache
footprint, which can sometimes cause problems.  For example, copying a
huge page on the x86_64 platform has a cache footprint of 4M (the 2M
source plus the 2M destination).  But a Xeon E5 v3 2699 CPU has 18
cores, 36 threads, and only 45M of LLC (last level cache).  That is, on
average, there are 2.5M of LLC per core and 1.25M per thread.

If cache contention is heavy while the huge page is being copied, and
we copy it from beginning to end, the beginning of the huge page may
already have been evicted from the cache by the time we finish copying
its end.  And the application may access the beginning of the huge
page right after the copy completes.

In commit c79b57e462b5d ("mm: hugetlb: clear target sub-page last when
clearing huge page"), to keep the cache lines of the target subpage
hot, clear_huge_page() was changed to clear the subpage furthest from
the target subpage first and the target subpage last.  The same
ordering helps huge page copying too, and this patchset implements it.

The patchset is a generic optimization that should benefit many
workloads, not just a specific use case.  To demonstrate its
performance benefit, we tested it with vm-scalability running on
transparent huge pages.

With this patchset, throughput increases by ~16.6% in the
vm-scalability anon-cow-seq test case with 36 processes on a 2-socket
Xeon E5 v3 2699 system (36 cores, 72 threads).  The test case sets
/sys/kernel/mm/transparent_hugepage/enabled to "always", mmap()s a
large anonymous memory area and populates it, then forks 36 child
processes, each of which writes to the anonymous memory area from
beginning to end, causing copy-on-write.  For each child process, the
other child processes can be seen as other workloads that generate
heavy cache pressure.  At the same time, IPC (instructions per cycle)
increased from 0.63 to 0.78, and the time spent in user space was
reduced by ~7.2%.
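
The anon-cow-seq scenario described above can be approximated in
userspace.  The C sketch below is not the vm-scalability source, and
the function run_cow_seq_sketch() and its parameters are hypothetical.
Instead of writing "always" to the sysfs knob (which needs root), it
asks for huge pages on the range with madvise(MADV_HUGEPAGE), which
only has an effect when that knob is "always" or "madvise".  Whether a
child's write fault copies a whole huge page also depends on the kernel
version and THP configuration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical approximation of an anon-cow-seq style run: populate a
 * private anonymous area, then fork children that each write it from
 * beginning to end, so each child's first write to a shared page
 * triggers copy-on-write.  Returns 0 if every child exited with
 * status 0, -1 otherwise.
 */
static int run_cow_seq_sketch(size_t size, int nr_children)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *area;
	int i, started = 0, failed = 0;

	assert(size > 0 && nr_children > 0);
	if (page_size <= 0)
		page_size = 4096;	/* fallback assumption */

	area = mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return -1;

#ifdef MADV_HUGEPAGE
	/* Stand-in for enabled=always; failure (e.g. no THP) is not fatal. */
	(void)madvise(area, size, MADV_HUGEPAGE);
#endif

	/* Populate in the parent so the children start out sharing pages. */
	memset(area, 1, size);

	for (i = 0; i < nr_children; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			failed = 1;
			break;
		}
		if (pid == 0) {
			/* Child: one write per base page, beginning to end. */
			for (size_t off = 0; off < size; off += (size_t)page_size)
				area[off] = 2;
			_exit(0);
		}
		started++;
	}

	/* Reap every child that was started and check its exit status. */
	for (i = 0; i < started; i++) {
		int status;

		if (wait(&status) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed = 1;
	}

	munmap(area, size);
	return failed ? -1 : 0;
}
```

For example, run_cow_seq_sketch(8UL << 20, 4) maps 8 MiB, forks four
writers and returns 0 once all of them have exited cleanly; the test
case described above uses 36 processes and a big area.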

Changelog:

V2:

- As suggested by Mike Kravetz, put the subpage ordering algorithm into
  a separate patch to avoid code duplication and reduce maintenance
  overhead.

- Add hugetlbfs support

Best Regards,
Huang, Ying


* [PATCH -V2 -mm 1/4] mm, clear_huge_page: Move order algorithm into a separate function
@ 2018-05-24  0:58 ` Huang, Ying
From: Huang, Ying @ 2018-05-24  0:58 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Andi Kleen, Jan Kara,
	Michal Hocko, Andrea Arcangeli, Kirill A. Shutemov,
	Matthew Wilcox, Hugh Dickins, Minchan Kim, Shaohua Li,
	Christopher Lameter, Mike Kravetz

From: Huang Ying <ying.huang@intel.com>

In commit c79b57e462b5d ("mm: hugetlb: clear target sub-page last when
clearing huge page"), to keep the cache lines of the target subpage
hot, clear_huge_page() was changed to clear the subpage furthest from
the target subpage first and the target subpage last.  The same
ordering algorithm can be applied to huge page copying too.  To avoid
code duplication and reduce maintenance overhead, this patch moves the
ordering algorithm out of clear_huge_page() into a separate function,
process_huge_page(), so that it can be reused for copying huge pages.
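
To see how the extracted ordering loop and its callback fit together,
here is a userspace port of process_huge_page() from the diff below.
It is a sketch under stated assumptions: 4 KiB base pages, a
power-of-two number of subpages, no cond_resched()/might_sleep(), and
hypothetical names (process_huge_page_sketch(), clear_subpage_sketch(),
record_subpage(), struct order_log) standing in for the kernel
functions.  memset() on a plain buffer replaces clear_user_highpage().

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB base pages */

/* Same shape as the process_subpage callback in the patch. */
typedef void (*process_subpage_fn)(unsigned long addr, int idx, void *arg);

/*
 * Userspace port of the ordering loop in process_huge_page(): subpages
 * far from the target are processed first and the target subpage last.
 * cond_resched() and might_sleep() are dropped (kernel-only).
 */
static void process_huge_page_sketch(unsigned long addr_hint,
				     int pages_per_huge_page,
				     process_subpage_fn process_subpage,
				     void *arg)
{
	unsigned long size = (unsigned long)pages_per_huge_page * SKETCH_PAGE_SIZE;
	unsigned long addr = addr_hint & ~(size - 1);	/* huge page base */
	int n = (int)((addr_hint - addr) / SKETCH_PAGE_SIZE);	/* target index */
	int i, base, l;

	/* The alignment mask above is only valid for power-of-two sizes. */
	assert(pages_per_huge_page > 0 &&
	       (pages_per_huge_page & (pages_per_huge_page - 1)) == 0);

	if (2 * n <= pages_per_huge_page) {
		/* Target in the first half: process the tail, last to first. */
		base = 0;
		l = n;
		for (i = pages_per_huge_page - 1; i >= 2 * n; i--)
			process_subpage(addr + i * SKETCH_PAGE_SIZE, i, arg);
	} else {
		/* Target in the second half: process the head, first to last. */
		base = pages_per_huge_page - 2 * (pages_per_huge_page - n);
		l = pages_per_huge_page - n;
		for (i = 0; i < base; i++)
			process_subpage(addr + i * SKETCH_PAGE_SIZE, i, arg);
	}
	/* Remaining 2 * l subpages: alternate left and right toward the target. */
	for (i = 0; i < l; i++) {
		int left_idx = base + i;
		int right_idx = base + 2 * l - 1 - i;

		process_subpage(addr + left_idx * SKETCH_PAGE_SIZE, left_idx, arg);
		process_subpage(addr + right_idx * SKETCH_PAGE_SIZE, right_idx, arg);
	}
}

/* Analog of clear_subpage(): @arg is the start of a userspace buffer. */
static void clear_subpage_sketch(unsigned long addr, int idx, void *arg)
{
	unsigned char *buf = arg;

	(void)addr;	/* the kernel passes the user virtual address on */
	memset(buf + (unsigned long)idx * SKETCH_PAGE_SIZE, 0, SKETCH_PAGE_SIZE);
}

/* Hypothetical demonstration callback: records the processing order. */
struct order_log {
	int idx[64];
	int count;
};

static void record_subpage(unsigned long addr, int idx, void *arg)
{
	struct order_log *log = arg;

	(void)addr;
	assert(log->count < 64);
	log->idx[log->count++] = idx;
}
```

With 8 subpages and a fault in subpage 2, record_subpage() sees the
order 7, 6, 5, 4, 0, 3, 1, 2: the far end first, then a left/right
walk that converges on the target.  Swapping the callback (and its
void *arg) is all a second user such as copying needs.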

This changes the direct calls to clear_user_highpage() into indirect
calls.  But because process_huge_page() is static inline and the
callback is known at compile time, the compiler can turn the indirect
calls back into direct calls.  Our tests show no performance change
with the patch.

This patch is a code cleanup without functional change.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Christopher Lameter <cl@linux.com>
---
 mm/memory.c | 90 ++++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 56 insertions(+), 34 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 14578158ed20..b9f573a81bbd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4569,71 +4569,93 @@ EXPORT_SYMBOL(__might_fault);
 #endif
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
-static void clear_gigantic_page(struct page *page,
-				unsigned long addr,
-				unsigned int pages_per_huge_page)
-{
-	int i;
-	struct page *p = page;
-
-	might_sleep();
-	for (i = 0; i < pages_per_huge_page;
-	     i++, p = mem_map_next(p, page, i)) {
-		cond_resched();
-		clear_user_highpage(p, addr + i * PAGE_SIZE);
-	}
-}
-void clear_huge_page(struct page *page,
-		     unsigned long addr_hint, unsigned int pages_per_huge_page)
+/*
+ * Process all subpages of the specified huge page with the specified
+ * operation.  The target subpage will be processed last to keep its
+ * cache lines hot.
+ */
+static inline void process_huge_page(
+	unsigned long addr_hint, unsigned int pages_per_huge_page,
+	void (*process_subpage)(unsigned long addr, int idx, void *arg),
+	void *arg)
 {
 	int i, n, base, l;
 	unsigned long addr = addr_hint &
 		~(((unsigned long)pages_per_huge_page << PAGE_SHIFT) - 1);
 
-	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
-		clear_gigantic_page(page, addr, pages_per_huge_page);
-		return;
-	}
-
-	/* Clear sub-page to access last to keep its cache lines hot */
+	/* Process target subpage last to keep its cache lines hot */
 	might_sleep();
 	n = (addr_hint - addr) / PAGE_SIZE;
 	if (2 * n <= pages_per_huge_page) {
-		/* If sub-page to access in first half of huge page */
+		/* If target subpage in first half of huge page */
 		base = 0;
 		l = n;
-		/* Clear sub-pages at the end of huge page */
+		/* Process subpages at the end of huge page */
 		for (i = pages_per_huge_page - 1; i >= 2 * n; i--) {
 			cond_resched();
-			clear_user_highpage(page + i, addr + i * PAGE_SIZE);
+			process_subpage(addr + i * PAGE_SIZE, i, arg);
 		}
 	} else {
-		/* If sub-page to access in second half of huge page */
+		/* If target subpage in second half of huge page */
 		base = pages_per_huge_page - 2 * (pages_per_huge_page - n);
 		l = pages_per_huge_page - n;
-		/* Clear sub-pages at the begin of huge page */
+		/* Process subpages at the begin of huge page */
 		for (i = 0; i < base; i++) {
 			cond_resched();
-			clear_user_highpage(page + i, addr + i * PAGE_SIZE);
+			process_subpage(addr + i * PAGE_SIZE, i, arg);
 		}
 	}
 	/*
-	 * Clear remaining sub-pages in left-right-left-right pattern
-	 * towards the sub-page to access
+	 * Process remaining subpages in left-right-left-right pattern
+	 * towards the target subpage
 	 */
 	for (i = 0; i < l; i++) {
 		int left_idx = base + i;
 		int right_idx = base + 2 * l - 1 - i;
 
 		cond_resched();
-		clear_user_highpage(page + left_idx,
-				    addr + left_idx * PAGE_SIZE);
+		process_subpage(addr + left_idx * PAGE_SIZE, left_idx, arg);
 		cond_resched();
-		clear_user_highpage(page + right_idx,
-				    addr + right_idx * PAGE_SIZE);
+		process_subpage(addr + right_idx * PAGE_SIZE, right_idx, arg);
 	}
 }
 
+static void clear_gigantic_page(struct page *page,
+				unsigned long addr,
+				unsigned int pages_per_huge_page)
+{
+	int i;
+	struct page *p = page;
+
+	might_sleep();
+	for (i = 0; i < pages_per_huge_page;
+	     i++, p = mem_map_next(p, page, i)) {
+		cond_resched();
+		clear_user_highpage(p, addr + i * PAGE_SIZE);
+	}
+}
+
+static void clear_subpage(unsigned long addr, int idx, void *arg)
+{
+	struct page *page = arg;
+
+	clear_user_highpage(page + idx, addr);
+}
+
+void clear_huge_page(struct page *page,
+		     unsigned long addr_hint, unsigned int pages_per_huge_page)
+{
+	unsigned long addr = addr_hint &
+		~(((unsigned long)pages_per_huge_page << PAGE_SHIFT) - 1);
+
+	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
+		clear_gigantic_page(page, addr, pages_per_huge_page);
+		return;
+	}
+
+	process_huge_page(addr_hint, pages_per_huge_page, clear_subpage, page);
+}
+
 static void copy_user_gigantic_page(struct page *dst, struct page *src,
 				    unsigned long addr,
 				    struct vm_area_struct *vma,
-- 
2.16.1


* Re: [PATCH -V2 -mm 1/4] mm, clear_huge_page: Move order algorithm into a separate function
@ 2018-05-24 20:55   ` Mike Kravetz
From: Mike Kravetz @ 2018-05-24 20:55 UTC
  To: Huang, Ying, Andrew Morton
  Cc: linux-mm, linux-kernel, Andi Kleen, Jan Kara, Michal Hocko,
	Andrea Arcangeli, Kirill A. Shutemov, Matthew Wilcox,
	Hugh Dickins, Minchan Kim, Shaohua Li, Christopher Lameter

On 05/23/2018 05:58 PM, Huang, Ying wrote:
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>

Thanks for doing this.

The extra level of indirection does make this a bit more difficult to
read.  However, I believe this is offset by the reuse of the algorithm
in subsequent copy_huge_page support.

> +/*
> + * Process all subpages of the specified huge page with the specified
> + * operation.  The target subpage will be processed last to keep its
> + * cache lines hot.
> + */
> +static inline void process_huge_page(
> +	unsigned long addr_hint, unsigned int pages_per_huge_page,
> +	void (*process_subpage)(unsigned long addr, int idx, void *arg),
> +	void *arg)

There could be a bit more information in the comment about the
function.  But it is not a requirement, unless the patch needs to be
redone for some other reason.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz


* [PATCH -V2 -mm 2/4] mm, huge page: Copy target sub-page last when copy huge page
@ 2018-05-24  0:58 ` Huang, Ying
From: Huang, Ying @ 2018-05-24  0:58 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Andi Kleen, Jan Kara,
	Michal Hocko, Andrea Arcangeli, Kirill A. Shutemov,
	Matthew Wilcox, Hugh Dickins, Minchan Kim, Shaohua Li,
	Christopher Lameter, Mike Kravetz

From: Huang Ying <ying.huang@intel.com>

Huge pages help reduce the TLB miss rate, but they have a larger cache
footprint, which can sometimes cause problems.  For example, copying a
huge page on the x86_64 platform has a cache footprint of 4M (the 2M
source plus the 2M destination).  But a Xeon E5 v3 2699 CPU has 18
cores, 36 threads, and only 45M of LLC (last level cache).  That is, on
average, there are 2.5M of LLC per core and 1.25M per thread.

If cache contention is heavy while the huge page is being copied, and
we copy it from beginning to end, the beginning of the huge page may
already have been evicted from the cache by the time we finish copying
its end.  And the application may access the beginning of the huge
page right after the copy completes.

In commit c79b57e462b5d ("mm: hugetlb: clear target sub-page last when
clearing huge page"), to keep the cache lines of the target subpage
hot, clear_huge_page() was changed to clear the subpage furthest from
the target subpage first and the target subpage last.  The same
ordering helps huge page copying too, and this patch implements it.
Because the ordering algorithm has already been moved into a separate
function, the implementation is simple.
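
Reusing the loop for copying mainly requires packing several arguments
into the single void *arg, which is what struct copy_subpage_arg does
in the diff below.  This userspace sketch mirrors that under the same
assumptions as a plain port (4 KiB base pages, power-of-two subpage
count, hypothetical *_sketch names, memcpy() instead of
copy_user_highpage()).  The last_idx field is purely for demonstration,
so a caller can check that the target subpage is copied last; the
ordering loop is repeated in compact form so the block stands alone.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB base pages */

typedef void (*process_subpage_fn)(unsigned long addr, int idx, void *arg);

/* Compact port of the process_huge_page() order: target subpage last. */
static void process_huge_page_sketch(unsigned long addr_hint, int nr,
				     process_subpage_fn fn, void *arg)
{
	unsigned long addr = addr_hint & ~((unsigned long)nr * SKETCH_PAGE_SIZE - 1);
	int n = (int)((addr_hint - addr) / SKETCH_PAGE_SIZE);
	int i, base, l;

	assert(nr > 0 && (nr & (nr - 1)) == 0);	/* mask needs a power of two */
	if (2 * n <= nr) {
		base = 0;
		l = n;
		for (i = nr - 1; i >= 2 * n; i--)
			fn(addr + i * SKETCH_PAGE_SIZE, i, arg);
	} else {
		base = nr - 2 * (nr - n);
		l = nr - n;
		for (i = 0; i < base; i++)
			fn(addr + i * SKETCH_PAGE_SIZE, i, arg);
	}
	for (i = 0; i < l; i++) {
		int left = base + i, right = base + 2 * l - 1 - i;

		fn(addr + left * SKETCH_PAGE_SIZE, left, arg);
		fn(addr + right * SKETCH_PAGE_SIZE, right, arg);
	}
}

/* Analog of struct copy_subpage_arg: several arguments behind one pointer. */
struct copy_subpage_arg_sketch {
	unsigned char *dst;
	const unsigned char *src;
	int last_idx;	/* demonstration only: last subpage copied */
};

static void copy_subpage_sketch(unsigned long addr, int idx, void *arg)
{
	struct copy_subpage_arg_sketch *copy_arg = arg;
	unsigned long off = (unsigned long)idx * SKETCH_PAGE_SIZE;

	(void)addr;	/* the kernel forwards it to copy_user_highpage() */
	memcpy(copy_arg->dst + off, copy_arg->src + off, SKETCH_PAGE_SIZE);
	copy_arg->last_idx = idx;
}

/*
 * Analog of copy_user_huge_page() after this patch: it takes the fault
 * address (addr_hint), not the aligned base.  Returns the index of the
 * subpage copied last.
 */
static int copy_huge_page_sketch(unsigned char *dst, const unsigned char *src,
				 unsigned long addr_hint, int nr)
{
	struct copy_subpage_arg_sketch arg = {
		.dst = dst, .src = src, .last_idx = -1,
	};

	process_huge_page_sketch(addr_hint, nr, copy_subpage_sketch, &arg);
	return arg.last_idx;
}
```

For 8 subpages and a fault in subpage 5, the copy visits 0 and 1, then
the pairs (2, 7), (3, 6), (4, 5), so subpage 5 is copied last and is
the most likely to still be in cache when the application touches it.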

The patch is a generic optimization that should benefit many
workloads, not just a specific use case.  To demonstrate its
performance benefit, we tested it with vm-scalability running on
transparent huge pages.

With this patch, throughput increases by ~16.6% in the vm-scalability
anon-cow-seq test case with 36 processes on a 2-socket Xeon E5 v3 2699
system (36 cores, 72 threads).  The test case sets
/sys/kernel/mm/transparent_hugepage/enabled to "always", mmap()s a
large anonymous memory area and populates it, then forks 36 child
processes, each of which writes to the anonymous memory area from
beginning to end, causing copy-on-write.  For each child process, the
other child processes can be seen as other workloads that generate
heavy cache pressure.  At the same time, IPC (instructions per cycle)
increased from 0.63 to 0.78, and the time spent in user space was
reduced by ~7.2%.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/mm.h |  3 ++-
 mm/huge_memory.c   |  3 ++-
 mm/memory.c        | 30 +++++++++++++++++++++++-------
 3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7cdd8b7f62e5..d227aadaa964 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2734,7 +2734,8 @@ extern void clear_huge_page(struct page *page,
 			    unsigned long addr_hint,
 			    unsigned int pages_per_huge_page);
 extern void copy_user_huge_page(struct page *dst, struct page *src,
-				unsigned long addr, struct vm_area_struct *vma,
+				unsigned long addr_hint,
+				struct vm_area_struct *vma,
 				unsigned int pages_per_huge_page);
 extern long copy_huge_page_from_user(struct page *dst_page,
 				const void __user *usr_src,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e9177363fe2e..1b7fd9bda1dc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1328,7 +1328,8 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	if (!page)
 		clear_huge_page(new_page, vmf->address, HPAGE_PMD_NR);
 	else
-		copy_user_huge_page(new_page, page, haddr, vma, HPAGE_PMD_NR);
+		copy_user_huge_page(new_page, page, vmf->address,
+				    vma, HPAGE_PMD_NR);
 	__SetPageUptodate(new_page);

 	mmun_start = haddr;
diff --git a/mm/memory.c b/mm/memory.c
index b9f573a81bbd..5d432f833d19 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4675,11 +4675,31 @@ static void copy_user_gigantic_page(struct page *dst, struct page *src,
 	}
 }

+struct copy_subpage_arg {
+	struct page *dst;
+	struct page *src;
+	struct vm_area_struct *vma;
+};
+
+static void copy_subpage(unsigned long addr, int idx, void *arg)
+{
+	struct copy_subpage_arg *copy_arg = arg;
+
+	copy_user_highpage(copy_arg->dst + idx, copy_arg->src + idx,
+			   addr, copy_arg->vma);
+}
+
 void copy_user_huge_page(struct page *dst, struct page *src,
-			 unsigned long addr, struct vm_area_struct *vma,
+			 unsigned long addr_hint, struct vm_area_struct *vma,
 			 unsigned int pages_per_huge_page)
 {
-	int i;
+	unsigned long addr = addr_hint &
+		~(((unsigned long)pages_per_huge_page << PAGE_SHIFT) - 1);
+	struct copy_subpage_arg arg = {
+		.dst = dst,
+		.src = src,
+		.vma = vma,
+	};

 	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
 		copy_user_gigantic_page(dst, src, addr, vma,
@@ -4687,11 +4707,7 @@ void copy_user_huge_page(struct page *dst, struct page *src,
 		return;
 	}

-	might_sleep();
-	for (i = 0; i < pages_per_huge_page; i++) {
-		cond_resched();
-		copy_user_highpage(dst + i, src + i, addr + i*PAGE_SIZE, vma);
-	}
+	process_huge_page(addr_hint, pages_per_huge_page, copy_subpage, &arg);
 }

 long copy_huge_page_from_user(struct page *dst_page,
-- 
2.16.1


* Re: [PATCH -V2 -mm 2/4] mm, huge page: Copy target sub-page last when copy huge page
@ 2018-05-24 21:25   ` Mike Kravetz
From: Mike Kravetz @ 2018-05-24 21:25 UTC
  To: Huang, Ying, Andrew Morton
  Cc: linux-mm, linux-kernel, Andi Kleen, Jan Kara, Michal Hocko,
	Andrea Arcangeli, Kirill A. Shutemov, Matthew Wilcox,
	Hugh Dickins, Minchan Kim, Shaohua Li, Christopher Lameter

On 05/23/2018 05:58 PM, Huang, Ying wrote:
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz


* [PATCH -V2 -mm 3/4] mm, hugetlbfs: Rename address to haddr in hugetlb_cow()
@ 2018-05-24  0:58 ` Huang, Ying
From: Huang, Ying @ 2018-05-24  0:58 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, David Rientjes,
	Andrea Arcangeli, Kirill A. Shutemov, Andi Kleen, Jan Kara,
	Matthew Wilcox, Hugh Dickins, Minchan Kim, Shaohua Li,
	Christopher Lameter, Aneesh Kumar K.V, Punit Agrawal,
	Anshuman Khandual, Mike Kravetz, Michal Hocko

From: Huang Ying <ying.huang@intel.com>

To take better advantage of the general huge page copying
optimization, the target subpage address will be passed to
hugetlb_cow() and then on to copy_user_huge_page().  So hugetlb_cow()
will use both the target subpage address and the huge page size
aligned address.  To distinguish between them, "haddr" is used for the
huge page size aligned address, consistent with the Transparent Huge
Page naming convention.

Currently, only the huge page size aligned address is used in
hugetlb_cow(), so this patch renames "address" to "haddr" there.  The
next patch will use the target subpage address in hugetlb_cow() too.
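
The relationship between the two addresses is plain mask arithmetic,
as the original "address & huge_page_mask(h)" expression suggests.  The
C sketch below assumes 4 KiB base pages and 2 MiB huge pages (hugetlbfs
supports other sizes too); hpage_mask_sketch() plays the role of
huge_page_mask(h), assuming it clears the offset-within-huge-page bits,
and all function names are hypothetical.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT	12		/* assumption: 4 KiB base pages */
#define SKETCH_HPAGE_SIZE	(2UL << 20)	/* assumption: 2 MiB huge pages */

/* Plays the role of huge_page_mask(h): keeps only the huge page base bits. */
static unsigned long hpage_mask_sketch(unsigned long hpage_size)
{
	assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);
	return ~(hpage_size - 1);
}

/* "haddr": the huge page size aligned address containing @address. */
static unsigned long haddr_of(unsigned long address)
{
	return address & hpage_mask_sketch(SKETCH_HPAGE_SIZE);
}

/* Index of the target subpage, i.e. the one containing @address. */
static unsigned long target_subpage_idx(unsigned long address)
{
	return (address - haddr_of(address)) >> SKETCH_PAGE_SHIFT;
}
```

For a fault at 0x40234567, haddr is 0x40200000 and the target is
subpage 52 of the 512 subpages in the 2 MiB huge page.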

This patch is just a code cleanup without any functional changes.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 mm/hugetlb.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 696befffe6f7..ad3bec2ed269 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3500,7 +3500,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
  * Keep the pte_same checks anyway to make transition from the mutex easier.
  */
 static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
-		       unsigned long address, pte_t *ptep,
+		       unsigned long haddr, pte_t *ptep,
 		       struct page *pagecache_page, spinlock_t *ptl)
 {
 	pte_t pte;
@@ -3518,7 +3518,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * and just make the page writable */
 	if (page_mapcount(old_page) == 1 && PageAnon(old_page)) {
 		page_move_anon_rmap(old_page, vma);
-		set_huge_ptep_writable(vma, address, ptep);
+		set_huge_ptep_writable(vma, haddr, ptep);
 		return 0;
 	}
 
@@ -3542,7 +3542,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * be acquired again before returning to the caller, as expected.
 	 */
 	spin_unlock(ptl);
-	new_page = alloc_huge_page(vma, address, outside_reserve);
+	new_page = alloc_huge_page(vma, haddr, outside_reserve);
 
 	if (IS_ERR(new_page)) {
 		/*
@@ -3555,11 +3555,10 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (outside_reserve) {
 			put_page(old_page);
 			BUG_ON(huge_pte_none(pte));
-			unmap_ref_private(mm, vma, old_page, address);
+			unmap_ref_private(mm, vma, old_page, haddr);
 			BUG_ON(huge_pte_none(pte));
 			spin_lock(ptl);
-			ptep = huge_pte_offset(mm, address & huge_page_mask(h),
-					       huge_page_size(h));
+			ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 			if (likely(ptep &&
 				   pte_same(huge_ptep_get(ptep), pte)))
 				goto retry_avoidcopy;
@@ -3584,12 +3583,12 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 		goto out_release_all;
 	}
 
-	copy_user_huge_page(new_page, old_page, address, vma,
+	copy_user_huge_page(new_page, old_page, haddr, vma,
 			    pages_per_huge_page(h));
 	__SetPageUptodate(new_page);
 	set_page_huge_active(new_page);
 
-	mmun_start = address & huge_page_mask(h);
+	mmun_start = haddr;
 	mmun_end = mmun_start + huge_page_size(h);
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 
@@ -3598,25 +3597,24 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * before the page tables are altered
 	 */
 	spin_lock(ptl);
-	ptep = huge_pte_offset(mm, address & huge_page_mask(h),
-			       huge_page_size(h));
+	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
 		ClearPagePrivate(new_page);
 
 		/* Break COW */
-		huge_ptep_clear_flush(vma, address, ptep);
+		huge_ptep_clear_flush(vma, haddr, ptep);
 		mmu_notifier_invalidate_range(mm, mmun_start, mmun_end);
-		set_huge_pte_at(mm, address, ptep,
+		set_huge_pte_at(mm, haddr, ptep,
 				make_huge_pte(vma, new_page, 1));
 		page_remove_rmap(old_page, true);
-		hugepage_add_new_anon_rmap(new_page, vma, address);
+		hugepage_add_new_anon_rmap(new_page, vma, haddr);
 		/* Make the old page be freed below */
 		new_page = old_page;
 	}
 	spin_unlock(ptl);
 	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 out_release_all:
-	restore_reserve_on_error(h, vma, address, new_page);
+	restore_reserve_on_error(h, vma, haddr, new_page);
 	put_page(new_page);
 out_release_old:
 	put_page(old_page);
-- 
2.16.1

* Re: [PATCH -V2 -mm 3/4] mm, hugetlbfs: Rename address to haddr in hugetlb_cow()
  2018-05-24  0:58 ` [PATCH -V2 -mm 3/4] mm, hugetlbfs: Rename address to haddr in hugetlb_cow() Huang, Ying
@ 2018-05-24 21:42   ` Mike Kravetz
  2018-05-25  0:34     ` Huang, Ying
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Kravetz @ 2018-05-24 21:42 UTC
  To: Huang, Ying, Andrew Morton
  Cc: linux-mm, linux-kernel, David Rientjes, Andrea Arcangeli,
	Kirill A. Shutemov, Andi Kleen, Jan Kara, Matthew Wilcox,
	Hugh Dickins, Minchan Kim, Shaohua Li, Christopher Lameter,
	Aneesh Kumar K.V, Punit Agrawal, Anshuman Khandual, Michal Hocko

On 05/23/2018 05:58 PM, Huang, Ying wrote:
> From: Huang Ying <ying.huang@intel.com>
> 
> To take better advantage of general huge page copying optimization,
> the target subpage address will be passed to hugetlb_cow(), then
> copy_user_huge_page().  So we will use both target subpage address and
> huge page size aligned address in hugetlb_cow().  To distinguish
> between them, "haddr" is used for huge page size aligned address to be
> consistent with Transparent Huge Page naming convention.
> 
> Now, only huge page size aligned address is used in hugetlb_cow(), so
> the "address" is renamed to "haddr" in hugetlb_cow() in this patch.
> Next patch will use target subpage address in hugetlb_cow() too.
> 
> The patch is just code cleanup without any functionality changes.
> 
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
> Suggested-by: Michal Hocko <mhocko@suse.com>

I believe Kirill may have been the one who suggested using haddr to be
consistent with usage in huge_memory.c.

> Cc: David Rientjes <rientjes@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Andi Kleen <andi.kleen@intel.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Shaohua Li <shli@fb.com>
> Cc: Christopher Lameter <cl@linux.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  mm/hugetlb.c | 26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 696befffe6f7..ad3bec2ed269 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3500,7 +3500,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
>   * Keep the pte_same checks anyway to make transition from the mutex easier.
>   */
>  static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
> -		       unsigned long address, pte_t *ptep,
> +		       unsigned long haddr, pte_t *ptep,
>  		       struct page *pagecache_page, spinlock_t *ptl)
>  {
>  	pte_t pte;
> @@ -3518,7 +3518,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 * and just make the page writable */
>  	if (page_mapcount(old_page) == 1 && PageAnon(old_page)) {
>  		page_move_anon_rmap(old_page, vma);
> -		set_huge_ptep_writable(vma, address, ptep);
> +		set_huge_ptep_writable(vma, haddr, ptep);
>  		return 0;
>  	}
>  
> @@ -3542,7 +3542,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 * be acquired again before returning to the caller, as expected.
>  	 */
>  	spin_unlock(ptl);
> -	new_page = alloc_huge_page(vma, address, outside_reserve);
> +	new_page = alloc_huge_page(vma, haddr, outside_reserve);
>  
>  	if (IS_ERR(new_page)) {
>  		/*
> @@ -3555,11 +3555,10 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  		if (outside_reserve) {
>  			put_page(old_page);
>  			BUG_ON(huge_pte_none(pte));
> -			unmap_ref_private(mm, vma, old_page, address);
> +			unmap_ref_private(mm, vma, old_page, haddr);
>  			BUG_ON(huge_pte_none(pte));
>  			spin_lock(ptl);
> -			ptep = huge_pte_offset(mm, address & huge_page_mask(h),
> -					       huge_page_size(h));
> +			ptep = huge_pte_offset(mm, haddr, huge_page_size(h));

Ha!  The name change points out an obviously unnecessary conversion in
the existing code.  Yes, hugetlb_cow is always passed a hpage aligned
address today.

>  			if (likely(ptep &&
>  				   pte_same(huge_ptep_get(ptep), pte)))
>  				goto retry_avoidcopy;
> @@ -3584,12 +3583,12 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  		goto out_release_all;
>  	}
>  
> -	copy_user_huge_page(new_page, old_page, address, vma,
> +	copy_user_huge_page(new_page, old_page, haddr, vma,
>  			    pages_per_huge_page(h));
>  	__SetPageUptodate(new_page);
>  	set_page_huge_active(new_page);
>  
> -	mmun_start = address & huge_page_mask(h);
> +	mmun_start = haddr;

And another one.

>  	mmun_end = mmun_start + huge_page_size(h);
>  	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
>  
> @@ -3598,25 +3597,24 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 * before the page tables are altered
>  	 */
>  	spin_lock(ptl);
> -	ptep = huge_pte_offset(mm, address & huge_page_mask(h),
> -			       huge_page_size(h));
> +	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));

And yet another.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz

>  	if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
>  		ClearPagePrivate(new_page);
>  
>  		/* Break COW */
> -		huge_ptep_clear_flush(vma, address, ptep);
> +		huge_ptep_clear_flush(vma, haddr, ptep);
>  		mmu_notifier_invalidate_range(mm, mmun_start, mmun_end);
> -		set_huge_pte_at(mm, address, ptep,
> +		set_huge_pte_at(mm, haddr, ptep,
>  				make_huge_pte(vma, new_page, 1));
>  		page_remove_rmap(old_page, true);
> -		hugepage_add_new_anon_rmap(new_page, vma, address);
> +		hugepage_add_new_anon_rmap(new_page, vma, haddr);
>  		/* Make the old page be freed below */
>  		new_page = old_page;
>  	}
>  	spin_unlock(ptl);
>  	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
>  out_release_all:
> -	restore_reserve_on_error(h, vma, address, new_page);
> +	restore_reserve_on_error(h, vma, haddr, new_page);
>  	put_page(new_page);
>  out_release_old:
>  	put_page(old_page);
> 

* Re: [PATCH -V2 -mm 3/4] mm, hugetlbfs: Rename address to haddr in hugetlb_cow()
  2018-05-24 21:42   ` Mike Kravetz
@ 2018-05-25  0:34     ` Huang, Ying
  0 siblings, 0 replies; 11+ messages in thread
From: Huang, Ying @ 2018-05-25  0:34 UTC
  To: Mike Kravetz
  Cc: Andrew Morton, linux-mm, linux-kernel, David Rientjes,
	Andrea Arcangeli, Kirill A. Shutemov, Andi Kleen, Jan Kara,
	Matthew Wilcox, Hugh Dickins, Minchan Kim, Shaohua Li,
	Christopher Lameter, Aneesh Kumar K.V, Punit Agrawal,
	Anshuman Khandual, Michal Hocko

Mike Kravetz <mike.kravetz@oracle.com> writes:

> On 05/23/2018 05:58 PM, Huang, Ying wrote:
>> From: Huang Ying <ying.huang@intel.com>
>> 
>> To take better advantage of general huge page copying optimization,
>> the target subpage address will be passed to hugetlb_cow(), then
>> copy_user_huge_page().  So we will use both target subpage address and
>> huge page size aligned address in hugetlb_cow().  To distinguish
>> between them, "haddr" is used for huge page size aligned address to be
>> consistent with Transparent Huge Page naming convention.
>> 
>> Now, only huge page size aligned address is used in hugetlb_cow(), so
>> the "address" is renamed to "haddr" in hugetlb_cow() in this patch.
>> Next patch will use target subpage address in hugetlb_cow() too.
>> 
>> The patch is just code cleanup without any functionality changes.
>> 
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
>> Suggested-by: Michal Hocko <mhocko@suse.com>
>
> I believe Kirill may have been the one who suggested using haddr to be
> consistent with usage in huge_memory.c.

Yes.  I should have added

Suggested-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Best Regards,
Huang, Ying

>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>> Cc: Andi Kleen <andi.kleen@intel.com>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Matthew Wilcox <mawilcox@microsoft.com>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Shaohua Li <shli@fb.com>
>> Cc: Christopher Lameter <cl@linux.com>
>> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> Cc: Punit Agrawal <punit.agrawal@arm.com>
>> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>> ---
>>  mm/hugetlb.c | 26 ++++++++++++--------------
>>  1 file changed, 12 insertions(+), 14 deletions(-)
>> 
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 696befffe6f7..ad3bec2ed269 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -3500,7 +3500,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
>>   * Keep the pte_same checks anyway to make transition from the mutex easier.
>>   */
>>  static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>> -		       unsigned long address, pte_t *ptep,
>> +		       unsigned long haddr, pte_t *ptep,
>>  		       struct page *pagecache_page, spinlock_t *ptl)
>>  {
>>  	pte_t pte;
>> @@ -3518,7 +3518,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>>  	 * and just make the page writable */
>>  	if (page_mapcount(old_page) == 1 && PageAnon(old_page)) {
>>  		page_move_anon_rmap(old_page, vma);
>> -		set_huge_ptep_writable(vma, address, ptep);
>> +		set_huge_ptep_writable(vma, haddr, ptep);
>>  		return 0;
>>  	}
>>  
>> @@ -3542,7 +3542,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>>  	 * be acquired again before returning to the caller, as expected.
>>  	 */
>>  	spin_unlock(ptl);
>> -	new_page = alloc_huge_page(vma, address, outside_reserve);
>> +	new_page = alloc_huge_page(vma, haddr, outside_reserve);
>>  
>>  	if (IS_ERR(new_page)) {
>>  		/*
>> @@ -3555,11 +3555,10 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>>  		if (outside_reserve) {
>>  			put_page(old_page);
>>  			BUG_ON(huge_pte_none(pte));
>> -			unmap_ref_private(mm, vma, old_page, address);
>> +			unmap_ref_private(mm, vma, old_page, haddr);
>>  			BUG_ON(huge_pte_none(pte));
>>  			spin_lock(ptl);
>> -			ptep = huge_pte_offset(mm, address & huge_page_mask(h),
>> -					       huge_page_size(h));
>> +			ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
>
> Ha!  The name change points out an obviously unnecessary conversion in
> the existing code.  Yes, hugetlb_cow is always passed a hpage aligned
> address today.
>
>>  			if (likely(ptep &&
>>  				   pte_same(huge_ptep_get(ptep), pte)))
>>  				goto retry_avoidcopy;
>> @@ -3584,12 +3583,12 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>>  		goto out_release_all;
>>  	}
>>  
>> -	copy_user_huge_page(new_page, old_page, address, vma,
>> +	copy_user_huge_page(new_page, old_page, haddr, vma,
>>  			    pages_per_huge_page(h));
>>  	__SetPageUptodate(new_page);
>>  	set_page_huge_active(new_page);
>>  
>> -	mmun_start = address & huge_page_mask(h);
>> +	mmun_start = haddr;
>
> And another one.
>
>>  	mmun_end = mmun_start + huge_page_size(h);
>>  	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
>>  
>> @@ -3598,25 +3597,24 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>>  	 * before the page tables are altered
>>  	 */
>>  	spin_lock(ptl);
>> -	ptep = huge_pte_offset(mm, address & huge_page_mask(h),
>> -			       huge_page_size(h));
>> +	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
>
> And yet another.
>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

* [PATCH -V2 -mm 4/4] mm, hugetlbfs: Pass fault address to cow handler
  2018-05-24  0:58 [PATCH -V2 -mm 0/4] mm, huge page: Copy target sub-page last when copy huge page Huang, Ying
                   ` (2 preceding siblings ...)
  2018-05-24  0:58 ` [PATCH -V2 -mm 3/4] mm, hugetlbfs: Rename address to haddr in hugetlb_cow() Huang, Ying
@ 2018-05-24  0:58 ` Huang, Ying
  2018-05-24 22:27   ` Mike Kravetz
  2018-05-25 15:38 ` [PATCH -V2 -mm 0/4] mm, huge page: Copy target sub-page last when copy huge page Christopher Lameter
  4 siblings, 1 reply; 11+ messages in thread
From: Huang, Ying @ 2018-05-24  0:58 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Mike Kravetz, Michal Hocko,
	David Rientjes, Andrea Arcangeli, Kirill A. Shutemov, Andi Kleen,
	Jan Kara, Matthew Wilcox, Hugh Dickins, Minchan Kim, Shaohua Li,
	Christopher Lameter, Aneesh Kumar K.V, Punit Agrawal,
	Anshuman Khandual

From: Huang Ying <ying.huang@intel.com>

This patch takes better advantage of the general huge page copying
optimization, in which the target subpage is copied last so that its
cache lines are not evicted while the other subpages are copied.  The
optimization works better when the address of the target subpage is
available while copying the huge page, so the hugetlbfs page fault
handlers are changed to pass the fault address to hugetlb_cow().  This
will benefit workloads that, under heavy cache contention, do not
access the beginning of the hugetlbfs huge page after the page fault.
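The target-last copy order can be illustrated with a simplified userspace sketch. It assumes 4 KiB subpages and one memcpy() per subpage; target_last_order() and copy_huge_target_last() are hypothetical names, and ordering strictly by distance from the target (furthest first) is a simplification, not the exact order used by the kernel code from patch 1/4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SUBPAGE_SIZE 4096UL	/* assumed base page size */

/*
 * Fill out[0..nr-1] with subpage indices ordered by decreasing distance
 * from 'target', so that 'target' itself is visited last.
 */
static void target_last_order(size_t nr, size_t target, size_t *out)
{
	size_t k = 0, d, right_span;

	assert(nr > 0 && target < nr);
	right_span = nr - 1 - target;
	/* Start from whichever end of the huge page is farther away. */
	d = target > right_span ? target : right_span;
	for (;;) {
		if (d <= target)		/* subpage left of the target */
			out[k++] = target - d;
		if (d > 0 && d <= right_span)	/* subpage right of the target */
			out[k++] = target + d;
		if (d == 0)
			break;
		d--;
	}
	assert(k == nr);
}

/*
 * Copy 'nr' subpages from src to dst in target-last order.
 * 'scratch' must have room for nr indices.
 */
static void copy_huge_target_last(unsigned char *dst,
				  const unsigned char *src,
				  size_t nr, size_t target, size_t *scratch)
{
	size_t i;

	target_last_order(nr, target, scratch);
	for (i = 0; i < nr; i++) {
		size_t off = scratch[i] * SKETCH_SUBPAGE_SIZE;

		memcpy(dst + off, src + off, SKETCH_SUBPAGE_SIZE);
	}
}
```

For nr = 8 and target = 2 the order is 7, 6, 5, 0, 4, 1, 3, 2: the subpages nearest the target are copied just before it, so they are the most likely to still be cached when the faulting task resumes.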

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 mm/hugetlb.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ad3bec2ed269..1df974af34c1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3500,7 +3500,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
  * Keep the pte_same checks anyway to make transition from the mutex easier.
  */
 static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
-		       unsigned long haddr, pte_t *ptep,
+		       unsigned long address, pte_t *ptep,
 		       struct page *pagecache_page, spinlock_t *ptl)
 {
 	pte_t pte;
@@ -3509,6 +3509,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	int ret = 0, outside_reserve = 0;
 	unsigned long mmun_start;	/* For mmu_notifiers */
 	unsigned long mmun_end;		/* For mmu_notifiers */
+	unsigned long haddr = address & huge_page_mask(h);
 
 	pte = huge_ptep_get(ptep);
 	old_page = pte_page(pte);
@@ -3583,7 +3584,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 		goto out_release_all;
 	}
 
-	copy_user_huge_page(new_page, old_page, haddr, vma,
+	copy_user_huge_page(new_page, old_page, address, vma,
 			    pages_per_huge_page(h));
 	__SetPageUptodate(new_page);
 	set_page_huge_active(new_page);
@@ -3817,7 +3818,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	hugetlb_count_add(pages_per_huge_page(h), mm);
 	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
 		/* Optimization, do the COW without a second fault */
-		ret = hugetlb_cow(mm, vma, haddr, ptep, page, ptl);
+		ret = hugetlb_cow(mm, vma, address, ptep, page, ptl);
 	}
 
 	spin_unlock(ptl);
@@ -3971,7 +3972,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	if (flags & FAULT_FLAG_WRITE) {
 		if (!huge_pte_write(entry)) {
-			ret = hugetlb_cow(mm, vma, haddr, ptep,
+			ret = hugetlb_cow(mm, vma, address, ptep,
 					  pagecache_page, ptl);
 			goto out_put_page;
 		}
-- 
2.16.1

* Re: [PATCH -V2 -mm 4/4] mm, hugetlbfs: Pass fault address to cow handler
  2018-05-24  0:58 ` [PATCH -V2 -mm 4/4] mm, hugetlbfs: Pass fault address to cow handler Huang, Ying
@ 2018-05-24 22:27   ` Mike Kravetz
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Kravetz @ 2018-05-24 22:27 UTC
  To: Huang, Ying, Andrew Morton
  Cc: linux-mm, linux-kernel, Michal Hocko, David Rientjes,
	Andrea Arcangeli, Kirill A. Shutemov, Andi Kleen, Jan Kara,
	Matthew Wilcox, Hugh Dickins, Minchan Kim, Shaohua Li,
	Christopher Lameter, Aneesh Kumar K.V, Punit Agrawal,
	Anshuman Khandual

On 05/23/2018 05:58 PM, Huang, Ying wrote:
> From: Huang Ying <ying.huang@intel.com>
> 
> This is to take better advantage of the general huge page copying
> optimization.  Where, the target subpage will be copied last to avoid
> the cache lines of target subpage to be evicted when copying other
> subpages.  This works better if the address of the target subpage is
> available when copying huge page.  So hugetlbfs page fault handlers
> are changed to pass that information to hugetlb_cow().  This will
> benefit workloads which don't access the begin of the hugetlbfs huge
> page after the page fault under heavy cache contention.
> 
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz

> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Andi Kleen <andi.kleen@intel.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Shaohua Li <shli@fb.com>
> Cc: Christopher Lameter <cl@linux.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  mm/hugetlb.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ad3bec2ed269..1df974af34c1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3500,7 +3500,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
>   * Keep the pte_same checks anyway to make transition from the mutex easier.
>   */
>  static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
> -		       unsigned long haddr, pte_t *ptep,
> +		       unsigned long address, pte_t *ptep,
>  		       struct page *pagecache_page, spinlock_t *ptl)
>  {
>  	pte_t pte;
> @@ -3509,6 +3509,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  	int ret = 0, outside_reserve = 0;
>  	unsigned long mmun_start;	/* For mmu_notifiers */
>  	unsigned long mmun_end;		/* For mmu_notifiers */
> +	unsigned long haddr = address & huge_page_mask(h);
>  
>  	pte = huge_ptep_get(ptep);
>  	old_page = pte_page(pte);
> @@ -3583,7 +3584,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  		goto out_release_all;
>  	}
>  
> -	copy_user_huge_page(new_page, old_page, haddr, vma,
> +	copy_user_huge_page(new_page, old_page, address, vma,
>  			    pages_per_huge_page(h));
>  	__SetPageUptodate(new_page);
>  	set_page_huge_active(new_page);
> @@ -3817,7 +3818,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	hugetlb_count_add(pages_per_huge_page(h), mm);
>  	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
>  		/* Optimization, do the COW without a second fault */
> -		ret = hugetlb_cow(mm, vma, haddr, ptep, page, ptl);
> +		ret = hugetlb_cow(mm, vma, address, ptep, page, ptl);
>  	}
>  
>  	spin_unlock(ptl);
> @@ -3971,7 +3972,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  
>  	if (flags & FAULT_FLAG_WRITE) {
>  		if (!huge_pte_write(entry)) {
> -			ret = hugetlb_cow(mm, vma, haddr, ptep,
> +			ret = hugetlb_cow(mm, vma, address, ptep,
>  					  pagecache_page, ptl);
>  			goto out_put_page;
>  		}
> 

* Re: [PATCH -V2 -mm 0/4] mm, huge page: Copy target sub-page last when copy huge page
  2018-05-24  0:58 [PATCH -V2 -mm 0/4] mm, huge page: Copy target sub-page last when copy huge page Huang, Ying
                   ` (3 preceding siblings ...)
  2018-05-24  0:58 ` [PATCH -V2 -mm 4/4] mm, hugetlbfs: Pass fault address to cow handler Huang, Ying
@ 2018-05-25 15:38 ` Christopher Lameter
  4 siblings, 0 replies; 11+ messages in thread
From: Christopher Lameter @ 2018-05-25 15:38 UTC
  To: Huang, Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Andi Kleen, Jan Kara,
	Michal Hocko, Andrea Arcangeli, Kirill A. Shutemov,
	Matthew Wilcox, Hugh Dickins, Minchan Kim, Shaohua Li,
	Mike Kravetz

On Thu, 24 May 2018, Huang, Ying wrote:

> If the cache contention is heavy when copying the huge page, and we
> copy the huge page from the begin to the end, it is possible that the
> begin of huge page is evicted from the cache after we finishing
> copying the end of the huge page.  And it is possible for the
> application to access the begin of the huge page after copying the
> huge page.

Isn't there a better way to zero the remaining pages? Something that has
no cache impact, like a non-temporal store, so that the rest of the cache
is not evicted?

https://www.felixcloutier.com/x86/MOVNTI.html
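Non-temporal stores of the kind suggested here can be sketched in userspace C with the SSE2 intrinsic _mm_stream_si32(), which compiles to MOVNTI, followed by _mm_sfence() to order the weakly ordered stores. This only illustrates the store side: zero_nontemporal() is a hypothetical helper, it assumes an x86 build with SSE2 (falling back to memset() elsewhere), and the thread does not establish whether such stores would help the huge page copy case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __SSE2__
#include <immintrin.h>
#endif

/*
 * Zero 'len' bytes at 'buf' using non-temporal 32-bit stores where
 * available.  'buf' must be int-aligned and 'len' a multiple of
 * sizeof(int).
 */
static void zero_nontemporal(void *buf, size_t len)
{
	assert((uintptr_t)buf % sizeof(int) == 0);
	assert(len % sizeof(int) == 0);
#ifdef __SSE2__
	int *p = buf;
	size_t i;

	for (i = 0; i < len / sizeof(int); i++)
		_mm_stream_si32(&p[i], 0);	/* MOVNTI: hint to minimize cache pollution */
	_mm_sfence();	/* order the weakly ordered NT stores before later stores */
#else
	memset(buf, 0, len);	/* no SSE2: ordinary cached stores */
#endif
}
```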

end of thread, other threads:[~2018-05-25 15:38 UTC | newest]

Thread overview: 11+ messages
2018-05-24  0:58 [PATCH -V2 -mm 0/4] mm, huge page: Copy target sub-page last when copy huge page Huang, Ying
2018-05-24  0:58 ` [PATCH -V2 -mm 1/4] mm, clear_huge_page: Move order algorithm into a separate function Huang, Ying
2018-05-24 20:55   ` Mike Kravetz
2018-05-24  0:58 ` [PATCH -V2 -mm 2/4] mm, huge page: Copy target sub-page last when copy huge page Huang, Ying
2018-05-24 21:25   ` Mike Kravetz
2018-05-24  0:58 ` [PATCH -V2 -mm 3/4] mm, hugetlbfs: Rename address to haddr in hugetlb_cow() Huang, Ying
2018-05-24 21:42   ` Mike Kravetz
2018-05-25  0:34     ` Huang, Ying
2018-05-24  0:58 ` [PATCH -V2 -mm 4/4] mm, hugetlbfs: Pass fault address to cow handler Huang, Ying
2018-05-24 22:27   ` Mike Kravetz
2018-05-25 15:38 ` [PATCH -V2 -mm 0/4] mm, huge page: Copy target sub-page last when copy huge page Christopher Lameter