linux-kernel.vger.kernel.org archive mirror
* [v3][PATCH 0/2] v3: fix hugetlb vs. anon-thp copy page
@ 2013-11-15 22:55 Dave Hansen
  2013-11-15 22:55 ` [v3][PATCH 1/2] mm: hugetlbfs: Add VM_BUG_ON()s to catch non-hugetlbfs pages Dave Hansen
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Dave Hansen @ 2013-11-15 22:55 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, dave.jiang, akpm, dhillf, Naoya Horiguchi,
	Mel Gorman, Dave Hansen

This took some of Mel's comments into consideration.  Dave
Jiang, could you retest this if you get a chance?  These have
only been lightly compile-tested.



* [v3][PATCH 1/2] mm: hugetlbfs: Add VM_BUG_ON()s to catch non-hugetlbfs pages
  2013-11-15 22:55 [v3][PATCH 0/2] v3: fix hugetlb vs. anon-thp copy page Dave Hansen
@ 2013-11-15 22:55 ` Dave Hansen
  2013-11-15 22:55 ` [v3][PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Dave Hansen
  2013-11-18 19:23 ` [v3][PATCH 0/2] v3: fix hugetlb vs. anon-thp copy page Jiang, Dave
  2 siblings, 0 replies; 13+ messages in thread
From: Dave Hansen @ 2013-11-15 22:55 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, dave.jiang, akpm, dhillf, Naoya Horiguchi,
	Mel Gorman, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

Changes from v2:
 * Removed the VM_BUG_ON() from copy_huge_page() since the
   next patch makes it able to handle non-hugetlbfs pages

--

Dave Jiang reported that he was seeing oopses when running
NUMA systems and default_hugepagesz=1G.  I traced the issue down
to migrate_page_copy() trying to use the same code for hugetlb
pages and transparent hugepages.  It should not have been trying
to pass thp pages in there.

So, add a VM_BUG_ON() for the next hapless developer who tries
to use page_hstate() on a non-hugetlbfs page.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Mel Gorman <mgorman@suse.de>

---

 linux.git-davehans/include/linux/hugetlb.h |    1 +
 1 file changed, 1 insertion(+)

diff -puN include/linux/hugetlb.h~bug-not-hugetlbfs-in-copy_huge_page include/linux/hugetlb.h
--- linux.git/include/linux/hugetlb.h~bug-not-hugetlbfs-in-copy_huge_page	2013-11-15 14:44:41.550357120 -0800
+++ linux.git-davehans/include/linux/hugetlb.h	2013-11-15 14:44:41.553357255 -0800
@@ -355,6 +355,7 @@ static inline pte_t arch_make_huge_pte(p
 
 static inline struct hstate *page_hstate(struct page *page)
 {
+	VM_BUG_ON(!PageHuge(page));
 	return size_to_hstate(PAGE_SIZE << compound_order(page));
 }
 
_


* [v3][PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page
  2013-11-15 22:55 [v3][PATCH 0/2] v3: fix hugetlb vs. anon-thp copy page Dave Hansen
  2013-11-15 22:55 ` [v3][PATCH 1/2] mm: hugetlbfs: Add VM_BUG_ON()s to catch non-hugetlbfs pages Dave Hansen
@ 2013-11-15 22:55 ` Dave Hansen
  2013-11-18 10:32   ` Mel Gorman
  2013-11-18 18:51   ` [v3][PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Naoya Horiguchi
  2013-11-18 19:23 ` [v3][PATCH 0/2] v3: fix hugetlb vs. anon-thp copy page Jiang, Dave
  2 siblings, 2 replies; 13+ messages in thread
From: Dave Hansen @ 2013-11-15 22:55 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, dave.jiang, akpm, dhillf, Naoya Horiguchi,
	Mel Gorman, Dave Hansen


Changes from v2:
 * 
Changes from v1:
 * removed explicit might_sleep() in favor of the one that we
   get from the cond_resched();

--

From: Dave Hansen <dave.hansen@linux.intel.com>

Right now, the migration code in migrate_page_copy() uses
copy_huge_page() for hugetlbfs and thp pages:

       if (PageHuge(page) || PageTransHuge(page))
                copy_huge_page(newpage, page);

So, yay for code reuse.  But:

void copy_huge_page(struct page *dst, struct page *src)
{
        struct hstate *h = page_hstate(src);

and a non-hugetlbfs page has no page_hstate().  This works 99% of
the time because page_hstate() determines the hstate from the
page order alone.  Since the page order of a THP page matches the
default hugetlbfs page order, it works.
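
(As a reference, the lookup underneath page_hstate() is roughly the
following -- a paraphrase of that era's mm/hugetlb.c, not the exact
source.  It scans the registered hstates by size and returns NULL when
nothing matches, which is exactly the failure mode described below:)

struct hstate *size_to_hstate(unsigned long size)
{
        struct hstate *h;

        /* walk every hstate registered at boot */
        for_each_hstate(h) {
                if (huge_page_size(h) == size)
                        return h;
        }
        return NULL;    /* no hstate of this size exists */
}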

But, if you change the default huge page size on the boot
command-line (say default_hugepagesz=1G), then we might not even
*have* a 2MB hstate so page_hstate() returns null and
copy_huge_page() oopses pretty fast since copy_huge_page()
dereferences the hstate:

void copy_huge_page(struct page *dst, struct page *src)
{
        struct hstate *h = page_hstate(src);
        if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
...

Mel noticed that the migration code is really the only user of
these functions.  This moves all the copy code over to migrate.c
and makes copy_huge_page() work for THP by checking for it
explicitly.

I believe the bug was introduced in b32967ff101:
Author: Mel Gorman <mgorman@suse.de>
Date:   Mon Nov 19 12:35:47 2012 +0000
mm: numa: Add THP migration for the NUMA working set scanning fault case.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/include/linux/hugetlb.h |    4 --
 linux.git-davehans/mm/hugetlb.c            |   34 --------------------
 linux.git-davehans/mm/migrate.c            |   48 +++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+), 38 deletions(-)

diff -puN mm/migrate.c~copy-huge-separate-from-copy-transhuge mm/migrate.c
--- linux.git/mm/migrate.c~copy-huge-separate-from-copy-transhuge	2013-11-15 14:44:55.256970259 -0800
+++ linux.git-davehans/mm/migrate.c	2013-11-15 14:45:17.457963844 -0800
@@ -442,6 +442,54 @@ int migrate_huge_page_move_mapping(struc
 }
 
 /*
+ * Gigantic pages are so large that the we do not guarantee
+ * that page++ pointer arithmetic will work across the
+ * entire page.  We need something more specialized.
+ */
+static void __copy_gigantic_page(struct page *dst, struct page *src,
+				int nr_pages)
+{
+	int i;
+	struct page *dst_base = dst;
+	struct page *src_base = src;
+
+	for (i = 0; i < nr_pages; ) {
+		cond_resched();
+		copy_highpage(dst, src);
+
+		i++;
+		dst = mem_map_next(dst, dst_base, i);
+		src = mem_map_next(src, src_base, i);
+	}
+}
+
+static void copy_huge_page(struct page *dst, struct page *src)
+{
+	int i;
+	int nr_pages;
+
+	if (PageHuge(src)) {
+		/* hugetlbfs page */
+		struct hstate *h = page_hstate(src);
+		nr_pages = pages_per_huge_page(h);
+
+		if (unlikely(nr_pages > MAX_ORDER_NR_PAGES)) {
+			__copy_gigantic_page(dst, src, nr_pages);
+			return;
+		}
+	} else {
+		/* thp page */
+		BUG_ON(!PageTransHuge(src));
+		nr_pages = hpage_nr_pages(src);
+	}
+
+	for (i = 0; i < nr_pages; i++ ) {
+		cond_resched();
+		copy_highpage(dst + i, src + i);
+	}
+}
+
+/*
  * Copy the page to its new location
  */
 void migrate_page_copy(struct page *newpage, struct page *page)
diff -puN mm/hugetlb.c~copy-huge-separate-from-copy-transhuge mm/hugetlb.c
--- linux.git/mm/hugetlb.c~copy-huge-separate-from-copy-transhuge	2013-11-15 14:44:55.261970484 -0800
+++ linux.git-davehans/mm/hugetlb.c	2013-11-15 14:44:55.389976227 -0800
@@ -476,40 +476,6 @@ static int vma_has_reserves(struct vm_ar
 	return 0;
 }
 
-static void copy_gigantic_page(struct page *dst, struct page *src)
-{
-	int i;
-	struct hstate *h = page_hstate(src);
-	struct page *dst_base = dst;
-	struct page *src_base = src;
-
-	for (i = 0; i < pages_per_huge_page(h); ) {
-		cond_resched();
-		copy_highpage(dst, src);
-
-		i++;
-		dst = mem_map_next(dst, dst_base, i);
-		src = mem_map_next(src, src_base, i);
-	}
-}
-
-void copy_huge_page(struct page *dst, struct page *src)
-{
-	int i;
-	struct hstate *h = page_hstate(src);
-
-	if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
-		copy_gigantic_page(dst, src);
-		return;
-	}
-
-	might_sleep();
-	for (i = 0; i < pages_per_huge_page(h); i++) {
-		cond_resched();
-		copy_highpage(dst + i, src + i);
-	}
-}
-
 static void enqueue_huge_page(struct hstate *h, struct page *page)
 {
 	int nid = page_to_nid(page);
diff -puN include/linux/hugetlb.h~copy-huge-separate-from-copy-transhuge include/linux/hugetlb.h
--- linux.git/include/linux/hugetlb.h~copy-huge-separate-from-copy-transhuge	2013-11-15 14:44:55.263970574 -0800
+++ linux.git-davehans/include/linux/hugetlb.h	2013-11-15 14:44:55.325973356 -0800
@@ -69,7 +69,6 @@ int dequeue_hwpoisoned_huge_page(struct
 bool isolate_huge_page(struct page *page, struct list_head *list);
 void putback_active_hugepage(struct page *page);
 bool is_hugepage_active(struct page *page);
-void copy_huge_page(struct page *dst, struct page *src);
 
 #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
 pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud);
@@ -140,9 +139,6 @@ static inline int dequeue_hwpoisoned_hug
 #define isolate_huge_page(p, l) false
 #define putback_active_hugepage(p)	do {} while (0)
 #define is_hugepage_active(x)	false
-static inline void copy_huge_page(struct page *dst, struct page *src)
-{
-}
 
 static inline unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end, pgprot_t newprot)
_


* Re: [v3][PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page
  2013-11-15 22:55 ` [v3][PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Dave Hansen
@ 2013-11-18 10:32   ` Mel Gorman
  2013-11-18 18:51   ` [v3][PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Naoya Horiguchi
  1 sibling, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2013-11-18 10:32 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, linux-kernel, dave.jiang, akpm, dhillf, Naoya Horiguchi

On Fri, Nov 15, 2013 at 02:55:53PM -0800, Dave Hansen wrote:
> 
> Changes from v2:
>  * 
> Changes from v1:
>  * removed explicit might_sleep() in favor of the one that we
>    get from the cond_resched();
> 
> --
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> Right now, the migration code in migrate_page_copy() uses
> copy_huge_page() for hugetlbfs and thp pages:
> 
>        if (PageHuge(page) || PageTransHuge(page))
>                 copy_huge_page(newpage, page);
> 
> So, yay for code reuse.  But:
> 
> void copy_huge_page(struct page *dst, struct page *src)
> {
>         struct hstate *h = page_hstate(src);
> 
> and a non-hugetlbfs page has no page_hstate().  This works 99% of
> the time because page_hstate() determines the hstate from the
> page order alone.  Since the page order of a THP page matches the
> default hugetlbfs page order, it works.
> 
> But, if you change the default huge page size on the boot
> command-line (say default_hugepagesz=1G), then we might not even
> *have* a 2MB hstate so page_hstate() returns null and
> copy_huge_page() oopses pretty fast since copy_huge_page()
> dereferences the hstate:
> 
> void copy_huge_page(struct page *dst, struct page *src)
> {
>         struct hstate *h = page_hstate(src);
>         if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
> ...
> 
> Mel noticed that the migration code is really the only user of
> these functions.  This moves all the copy code over to migrate.c
> and makes copy_huge_page() work for THP by checking for it
> explicitly.
> 
> I believe the bug was introduced in b32967ff101:
> Author: Mel Gorman <mgorman@suse.de>
> Date:   Mon Nov 19 12:35:47 2012 +0000
> mm: numa: Add THP migration for the NUMA working set scanning fault case.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs


* Re: [v3][PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page
  2013-11-15 22:55 ` [v3][PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Dave Hansen
  2013-11-18 10:32   ` Mel Gorman
@ 2013-11-18 18:51   ` Naoya Horiguchi
  2013-11-18 18:54     ` [PATCH] mm: call cond_resched() per MAX_ORDER_NR_PAGES pages copy Naoya Horiguchi
  1 sibling, 1 reply; 13+ messages in thread
From: Naoya Horiguchi @ 2013-11-18 18:51 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, dave.jiang, akpm, dhillf, Mel Gorman

On Fri, Nov 15, 2013 at 02:55:53PM -0800, Dave Hansen wrote:
> 
> Changes from v2:
>  * 
> Changes from v1:
>  * removed explicit might_sleep() in favor of the one that we
>    get from the cond_resched();
> 
> --
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> Right now, the migration code in migrate_page_copy() uses
> copy_huge_page() for hugetlbfs and thp pages:
> 
>        if (PageHuge(page) || PageTransHuge(page))
>                 copy_huge_page(newpage, page);
> 
> So, yay for code reuse.  But:
> 
> void copy_huge_page(struct page *dst, struct page *src)
> {
>         struct hstate *h = page_hstate(src);
> 
> and a non-hugetlbfs page has no page_hstate().  This works 99% of
> the time because page_hstate() determines the hstate from the
> page order alone.  Since the page order of a THP page matches the
> default hugetlbfs page order, it works.
> 
> But, if you change the default huge page size on the boot
> command-line (say default_hugepagesz=1G), then we might not even
> *have* a 2MB hstate so page_hstate() returns null and
> copy_huge_page() oopses pretty fast since copy_huge_page()
> dereferences the hstate:
> 
> void copy_huge_page(struct page *dst, struct page *src)
> {
>         struct hstate *h = page_hstate(src);
>         if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
> ...
> 
> Mel noticed that the migration code is really the only user of
> these functions.  This moves all the copy code over to migrate.c
> and makes copy_huge_page() work for THP by checking for it
> explicitly.
> 
> I believe the bug was introduced in b32967ff101:
> Author: Mel Gorman <mgorman@suse.de>
> Date:   Mon Nov 19 12:35:47 2012 +0000
> mm: numa: Add THP migration for the NUMA working set scanning fault case.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>

Looks good to me with a few comments below. Thanks.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

> ---
> 
>  linux.git-davehans/include/linux/hugetlb.h |    4 --
>  linux.git-davehans/mm/hugetlb.c            |   34 --------------------
>  linux.git-davehans/mm/migrate.c            |   48 +++++++++++++++++++++++++++++
>  3 files changed, 48 insertions(+), 38 deletions(-)
> 
> diff -puN mm/migrate.c~copy-huge-separate-from-copy-transhuge mm/migrate.c
> --- linux.git/mm/migrate.c~copy-huge-separate-from-copy-transhuge	2013-11-15 14:44:55.256970259 -0800
> +++ linux.git-davehans/mm/migrate.c	2013-11-15 14:45:17.457963844 -0800
> @@ -442,6 +442,54 @@ int migrate_huge_page_move_mapping(struc
>  }
>  
>  /*
> + * Gigantic pages are so large that the we do not guarantee

s/the // ?

> + * that page++ pointer arithmetic will work across the
> + * entire page.  We need something more specialized.
> + */
> +static void __copy_gigantic_page(struct page *dst, struct page *src,
> +				int nr_pages)
> +{
> +	int i;
> +	struct page *dst_base = dst;
> +	struct page *src_base = src;
> +
> +	for (i = 0; i < nr_pages; ) {
> +		cond_resched();

I think this cond_resched() is called too often.  One
cond_resched() per MAX_ORDER_NR_PAGES pages copied looks better
than one per single page copy, so I'll post a separate patch for
this which is applied on top of your patches.

> +		copy_highpage(dst, src);
> +
> +		i++;
> +		dst = mem_map_next(dst, dst_base, i);
> +		src = mem_map_next(src, src_base, i);
> +	}
> +}
> +
> +static void copy_huge_page(struct page *dst, struct page *src)
> +{
> +	int i;
> +	int nr_pages;
> +
> +	if (PageHuge(src)) {
> +		/* hugetlbfs page */
> +		struct hstate *h = page_hstate(src);
> +		nr_pages = pages_per_huge_page(h);
> +
> +		if (unlikely(nr_pages > MAX_ORDER_NR_PAGES)) {
> +			__copy_gigantic_page(dst, src, nr_pages);
> +			return;
> +		}
> +	} else {
> +		/* thp page */
> +		BUG_ON(!PageTransHuge(src));
> +		nr_pages = hpage_nr_pages(src);
> +	}
> +
> +	for (i = 0; i < nr_pages; i++ ) {

Coding style violation?

  ERROR: space prohibited before that close parenthesis ')'
  #177: FILE: mm/migrate.c:486:                            
  +       for (i = 0; i < nr_pages; i++ ) {                

Thanks,
Naoya Horiguchi

> +		cond_resched();
> +		copy_highpage(dst + i, src + i);
> +	}
> +}
> +
> +/*
>   * Copy the page to its new location
>   */
>  void migrate_page_copy(struct page *newpage, struct page *page)
> diff -puN mm/hugetlb.c~copy-huge-separate-from-copy-transhuge mm/hugetlb.c
> --- linux.git/mm/hugetlb.c~copy-huge-separate-from-copy-transhuge	2013-11-15 14:44:55.261970484 -0800
> +++ linux.git-davehans/mm/hugetlb.c	2013-11-15 14:44:55.389976227 -0800
> @@ -476,40 +476,6 @@ static int vma_has_reserves(struct vm_ar
>  	return 0;
>  }
>  
> -static void copy_gigantic_page(struct page *dst, struct page *src)
> -{
> -	int i;
> -	struct hstate *h = page_hstate(src);
> -	struct page *dst_base = dst;
> -	struct page *src_base = src;
> -
> -	for (i = 0; i < pages_per_huge_page(h); ) {
> -		cond_resched();
> -		copy_highpage(dst, src);
> -
> -		i++;
> -		dst = mem_map_next(dst, dst_base, i);
> -		src = mem_map_next(src, src_base, i);
> -	}
> -}
> -
> -void copy_huge_page(struct page *dst, struct page *src)
> -{
> -	int i;
> -	struct hstate *h = page_hstate(src);
> -
> -	if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
> -		copy_gigantic_page(dst, src);
> -		return;
> -	}
> -
> -	might_sleep();
> -	for (i = 0; i < pages_per_huge_page(h); i++) {
> -		cond_resched();
> -		copy_highpage(dst + i, src + i);
> -	}
> -}
> -
>  static void enqueue_huge_page(struct hstate *h, struct page *page)
>  {
>  	int nid = page_to_nid(page);
> diff -puN include/linux/hugetlb.h~copy-huge-separate-from-copy-transhuge include/linux/hugetlb.h
> --- linux.git/include/linux/hugetlb.h~copy-huge-separate-from-copy-transhuge	2013-11-15 14:44:55.263970574 -0800
> +++ linux.git-davehans/include/linux/hugetlb.h	2013-11-15 14:44:55.325973356 -0800
> @@ -69,7 +69,6 @@ int dequeue_hwpoisoned_huge_page(struct
>  bool isolate_huge_page(struct page *page, struct list_head *list);
>  void putback_active_hugepage(struct page *page);
>  bool is_hugepage_active(struct page *page);
> -void copy_huge_page(struct page *dst, struct page *src);
>  
>  #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
>  pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud);
> @@ -140,9 +139,6 @@ static inline int dequeue_hwpoisoned_hug
>  #define isolate_huge_page(p, l) false
>  #define putback_active_hugepage(p)	do {} while (0)
>  #define is_hugepage_active(x)	false
> -static inline void copy_huge_page(struct page *dst, struct page *src)
> -{
> -}
>  
>  static inline unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
>  		unsigned long address, unsigned long end, pgprot_t newprot)
> _
> 


* [PATCH] mm: call cond_resched() per MAX_ORDER_NR_PAGES pages copy
  2013-11-18 18:51   ` [v3][PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Naoya Horiguchi
@ 2013-11-18 18:54     ` Naoya Horiguchi
  2013-11-18 19:02       ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Naoya Horiguchi @ 2013-11-18 18:54 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, dave.jiang, akpm, dhillf, Mel Gorman

In copy_huge_page() we call cond_resched() before every single page copy.
This is overkill because a single page copy is not a heavy operation.
This patch changes it to call cond_resched() once per MAX_ORDER_NR_PAGES pages.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/migrate.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index cb5d152b58bc..661ff5f66591 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -454,7 +454,8 @@ static void __copy_gigantic_page(struct page *dst, struct page *src,
 	struct page *src_base = src;
 
 	for (i = 0; i < nr_pages; ) {
-		cond_resched();
+		if (i % MAX_ORDER_NR_PAGES == 0)
+			cond_resched();
 		copy_highpage(dst, src);
 
 		i++;
@@ -483,10 +484,9 @@ static void copy_huge_page(struct page *dst, struct page *src)
 		nr_pages = hpage_nr_pages(src);
 	}
 
-	for (i = 0; i < nr_pages; i++ ) {
-		cond_resched();
+	cond_resched();
+	for (i = 0; i < nr_pages; i++)
 		copy_highpage(dst + i, src + i);
-	}
 }
 
 /*
-- 
1.8.3.1



* Re: [PATCH] mm: call cond_resched() per MAX_ORDER_NR_PAGES pages copy
  2013-11-18 18:54     ` [PATCH] mm: call cond_resched() per MAX_ORDER_NR_PAGES pages copy Naoya Horiguchi
@ 2013-11-18 19:02       ` Dave Hansen
  2013-11-18 20:20         ` Naoya Horiguchi
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2013-11-18 19:02 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, linux-kernel, dave.jiang, akpm, dhillf, Mel Gorman

On 11/18/2013 10:54 AM, Naoya Horiguchi wrote:
> diff --git a/mm/migrate.c b/mm/migrate.c
> index cb5d152b58bc..661ff5f66591 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -454,7 +454,8 @@ static void __copy_gigantic_page(struct page *dst, struct page *src,
>  	struct page *src_base = src;
>  
>  	for (i = 0; i < nr_pages; ) {
> -		cond_resched();
> +		if (i % MAX_ORDER_NR_PAGES == 0)
> +			cond_resched();
>  		copy_highpage(dst, src);

This is certainly OK on x86, but remember that MAX_ORDER can be
overridden by a config variable.  Just picking one at random:

config FORCE_MAX_ZONEORDER
        int "Maximum zone order"
        range 9 64 if PPC64 && PPC_64K_PAGES
...

Would it be OK to only resched once every 2^63 pages? ;)

Really, though, a lot of things seem to have MAX_ORDER set up so that
it's at 256MB or 512MB.  That's an awful lot to do between rescheds.
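
To put rough numbers on that (my arithmetic, assuming 4k base pages;
MAX_ORDER_NR_PAGES is 1UL << (MAX_ORDER - 1)):

        MAX_ORDER=11 (common x86 default):  1024 pages =   4MB
        MAX_ORDER=17:                      65536 pages = 256MB
        MAX_ORDER=64:                       2^63 pages = not in this lifetime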


* Re: [v3][PATCH 0/2] v3: fix hugetlb vs. anon-thp copy page
  2013-11-15 22:55 [v3][PATCH 0/2] v3: fix hugetlb vs. anon-thp copy page Dave Hansen
  2013-11-15 22:55 ` [v3][PATCH 1/2] mm: hugetlbfs: Add VM_BUG_ON()s to catch non-hugetlbfs pages Dave Hansen
  2013-11-15 22:55 ` [v3][PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Dave Hansen
@ 2013-11-18 19:23 ` Jiang, Dave
  2 siblings, 0 replies; 13+ messages in thread
From: Jiang, Dave @ 2013-11-18 19:23 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, linux-kernel, akpm, dhillf, Naoya Horiguchi, Mel Gorman


On Fri, 2013-11-15 at 14:55 -0800, Dave Hansen wrote:
> This took some of Mel's comments into consideration.  Dave
> Jiang, could you retest this if you get a chance?  These have
> only been lightly compile-tested.
> 

Everything looks good. 

Tested-by: Dave Jiang <dave.jiang@intel.com>


* Re: [PATCH] mm: call cond_resched() per MAX_ORDER_NR_PAGES pages copy
  2013-11-18 19:02       ` Dave Hansen
@ 2013-11-18 20:20         ` Naoya Horiguchi
  2013-11-18 20:48           ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Naoya Horiguchi @ 2013-11-18 20:20 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, dave.jiang, akpm, dhillf, Mel Gorman

On Mon, Nov 18, 2013 at 11:02:32AM -0800, Dave Hansen wrote:
> On 11/18/2013 10:54 AM, Naoya Horiguchi wrote:
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index cb5d152b58bc..661ff5f66591 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -454,7 +454,8 @@ static void __copy_gigantic_page(struct page *dst, struct page *src,
> >  	struct page *src_base = src;
> >  
> >  	for (i = 0; i < nr_pages; ) {
> > -		cond_resched();
> > +		if (i % MAX_ORDER_NR_PAGES == 0)
> > +			cond_resched();
> >  		copy_highpage(dst, src);
> 
> This is certainly OK on x86, but remember that MAX_ORDER can be
> overridden by a config variable.  Just picking one at random:
> 
> config FORCE_MAX_ZONEORDER
>         int "Maximum zone order"
>         range 9 64 if PPC64 && PPC_64K_PAGES
> ...
> 
> Would it be OK to only resched once every 2^63 pages? ;)

You're right.  We need to use a more reliable value here.
HPAGE_SIZE/PAGE_SIZE looks better to me.

> Really, though, a lot of things seem to have MAX_ORDER set up so that
> it's at 256MB or 512MB.  That's an awful lot to do between rescheds.

Yes.

BTW, I found that we have the same problem in other functions like
copy_user_gigantic_page, copy_user_huge_page, and maybe clear_gigantic_page,
so we had better handle them too.

Thanks,
Naoya


* Re: [PATCH] mm: call cond_resched() per MAX_ORDER_NR_PAGES pages copy
  2013-11-18 20:20         ` Naoya Horiguchi
@ 2013-11-18 20:48           ` Dave Hansen
  2013-11-18 21:56             ` Naoya Horiguchi
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2013-11-18 20:48 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, linux-kernel, dave.jiang, akpm, dhillf, Mel Gorman

On 11/18/2013 12:20 PM, Naoya Horiguchi wrote:
>> > Really, though, a lot of things seem to have MAX_ORDER set up so that
>> > it's at 256MB or 512MB.  That's an awful lot to do between rescheds.
> Yes.
> 
> BTW, I found that we have the same problem for other functions like
> copy_user_gigantic_page, copy_user_huge_page, and maybe clear_gigantic_page.
> So we had better handle them too.

Is there a problem you're trying to solve here?  The common case of the
cond_resched() call boils down to a read of a percpu variable which will
surely be in the L1 cache after the first run around the loop.  In other
words, it's about as cheap of an operation as we're going to get.

Why bother trying to "optimize" it?
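
(For reference, the fast path here looks roughly like this --
paraphrased from kernel/sched/core.c and include/linux/sched.h of that
vintage; details vary by kernel version and preemption config:)

static inline int should_resched(void)
{
        /* a couple of cheap flag reads; almost always false */
        return need_resched() && !(preempt_count() & PREEMPT_ACTIVE);
}

int _cond_resched(void)
{
        if (should_resched()) {
                __cond_resched();       /* actually calls schedule() */
                return 1;
        }
        return 0;
}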


* Re: [PATCH] mm: call cond_resched() per MAX_ORDER_NR_PAGES pages copy
  2013-11-18 20:48           ` Dave Hansen
@ 2013-11-18 21:56             ` Naoya Horiguchi
  2013-11-18 22:29               ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Naoya Horiguchi @ 2013-11-18 21:56 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, dave.jiang, akpm, dhillf, Mel Gorman

On Mon, Nov 18, 2013 at 12:48:54PM -0800, Dave Hansen wrote:
> On 11/18/2013 12:20 PM, Naoya Horiguchi wrote:
> >> > Really, though, a lot of things seem to have MAX_ORDER set up so that
> >> > it's at 256MB or 512MB.  That's an awful lot to do between rescheds.
> > Yes.
> > 
> > BTW, I found that we have the same problem for other functions like
> > copy_user_gigantic_page, copy_user_huge_page, and maybe clear_gigantic_page.
> > So we had better handle them too.
> 
> Is there a problem you're trying to solve here?  The common case of the
> cond_resched() call boils down to a read of a percpu variable which will
> surely be in the L1 cache after the first run around the loop.  In other
> words, it's about as cheap of an operation as we're going to get.

Yes, cond_resched() is cheap if should_resched() is false (and it is in
the common case).

> Why bother trying to "optimize" it?

I thought that if we call cond_resched() too often, the copying thread
could take too long on a heavily loaded system, because it would yield
the CPU on every loop iteration.

But it seems to be an extreme case, so I can't push it strongly.

Thanks,
Naoya


* Re: [PATCH] mm: call cond_resched() per MAX_ORDER_NR_PAGES pages copy
  2013-11-18 21:56             ` Naoya Horiguchi
@ 2013-11-18 22:29               ` Dave Hansen
  2013-11-19  0:34                 ` Naoya Horiguchi
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2013-11-18 22:29 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, linux-kernel, dave.jiang, akpm, dhillf, Mel Gorman

On 11/18/2013 01:56 PM, Naoya Horiguchi wrote:
>> > Why bother trying to "optimize" it?
> I thought that if we call cond_resched() too often, the copying thread can
> take too long in a heavy load system, because the copying thread always
> yields the CPU in every loop.

I think you're confusing cond_resched() and yield().  The way I look at it:

yield() means: "Hey scheduler, go run something else right now; I'm
done running."

cond_resched() means: "Schedule me off if the scheduler has already
decided something else _should_ be running"

I'm sure I'm missing some of the subtleties, but as I see it, yield()
actively goes off and finds something else to run.  cond_resched() only
schedules you off if you've *already* run too long.
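
(Roughly, and paraphrased rather than quoted -- the era's yield() is
little more than:)

void yield(void)
{
        set_current_state(TASK_RUNNING);
        sys_sched_yield();      /* requeue behind peers, then schedule() */
}

whereas _cond_resched() (sketched earlier in the thread) only calls
schedule() when TIF_NEED_RESCHED is already set.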




* Re: [PATCH] mm: call cond_resched() per MAX_ORDER_NR_PAGES pages copy
  2013-11-18 22:29               ` Dave Hansen
@ 2013-11-19  0:34                 ` Naoya Horiguchi
  0 siblings, 0 replies; 13+ messages in thread
From: Naoya Horiguchi @ 2013-11-19  0:34 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, dave.jiang, akpm, dhillf, Mel Gorman

On Mon, Nov 18, 2013 at 02:29:24PM -0800, Dave Hansen wrote:
> On 11/18/2013 01:56 PM, Naoya Horiguchi wrote:
> >> > Why bother trying to "optimize" it?
> > I thought that if we call cond_resched() too often, the copying thread can
> > take too long in a heavy load system, because the copying thread always
> > yields the CPU in every loop.
> 
> I think you're confusing cond_resched() and yield().  The way I look at it:
> 
> yield() means: "Hey scheduler, go right now and run something else I'm
> done running"
> 
> cond_resched() means: "Schedule me off if the scheduler has already
> decided something else _should_ be running"
> 
> I'm sure I'm missing some of the subtleties, but as I see it, yield()
> actively goes off and finds something else to run.  cond_resched() only
> schedules you off if you've *already* run too long.

I see.
Thanks for the explanation!

Naoya


