linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page
  2013-10-28 22:16 ` [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Dave Hansen
@ 2013-10-28 22:11   ` Kirill A. Shutemov
  2013-10-30 15:31     ` [PATCH 2/2] mm: thp: give transparent hugepage code a separatecopy_page Naoya Horiguchi
  2013-11-05 18:47     ` [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Dave Hansen
  2013-11-06 13:46   ` Hillf Danton
  1 sibling, 2 replies; 8+ messages in thread
From: Kirill A. Shutemov @ 2013-10-28 22:11 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, dave.jiang, Mel Gorman, akpm, dhillf

On Mon, Oct 28, 2013 at 03:16:20PM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> Right now, the migration code in migrate_page_copy() uses 
> copy_huge_page() for hugetlbfs and thp pages:
> 
>        if (PageHuge(page) || PageTransHuge(page))
>                 copy_huge_page(newpage, page);
> 
> So, yay for code reuse.  But:
> 
> void copy_huge_page(struct page *dst, struct page *src)
> {
>         struct hstate *h = page_hstate(src);
> 
> and a non-hugetlbfs page has no page_hstate().  This
> works 99% of the time because page_hstate() determines
> the hstate from the page order alone.  Since the page
> order of a THP page matches the default hugetlbfs page
> order, it works.
> 
> But, if you change the default huge page size on the
> boot command-line (say default_hugepagesz=1G), then
> we might not even *have* a 2MB hstate so page_hstate()
> returns null and copy_huge_page() oopses pretty fast
> since copy_huge_page() dereferences the hstate:
> 
> void copy_huge_page(struct page *dst, struct page *src)
> {
>         struct hstate *h = page_hstate(src);
>         if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
> ...
> 
> This patch creates a copy_high_order_page() which can
> be used on THP pages.

We already have copy_user_huge_page() and copy_user_gigantic_page() in
generic code (mm/memory.c). I think copy_gigantic_page() and
copy_huge_page() should be moved there too.

BTW, I think pages_per_huge_page in copy_user_huge_page() is redunand:
compound_order(page) should be enough, right?

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] mm: hugetlbfs: Add some VM_BUG_ON()s to catch non-hugetlbfs pages
@ 2013-10-28 22:16 Dave Hansen
  2013-10-28 22:16 ` [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Dave Hansen
  2013-10-30 15:10 ` [PATCH 1/2] mm: hugetlbfs: Add some VM_BUG_ON()s to catchnon-hugetlbfs pages Naoya Horiguchi
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Hansen @ 2013-10-28 22:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, dave.jiang, Mel Gorman, akpm, dhillf, Dave Hansen


Dave Jiang reported that he was seeing oopses when running
NUMA systems and default_hugepagesz=1G.  I traced the issue down
to migrate_page_copy() trying to use the same code for hugetlb
pages and transparent hugepages.  It should not have been trying
to pass thp pages in there.

So, add some VM_BUG_ON()s for the next hapless VM developer that
tries the same thing.

---

 linux.git-davehans/include/linux/hugetlb.h |    1 +
 linux.git-davehans/mm/hugetlb.c            |    1 +
 2 files changed, 2 insertions(+)

diff -puN include/linux/hugetlb.h~bug-not-hugetlbfs-in-copy_huge_page include/linux/hugetlb.h
--- linux.git/include/linux/hugetlb.h~bug-not-hugetlbfs-in-copy_huge_page	2013-10-28 15:06:12.888828815 -0700
+++ linux.git-davehans/include/linux/hugetlb.h	2013-10-28 15:06:12.893829038 -0700
@@ -355,6 +355,7 @@ static inline pte_t arch_make_huge_pte(p
 
 static inline struct hstate *page_hstate(struct page *page)
 {
+	VM_BUG_ON(!PageHuge(page));
 	return size_to_hstate(PAGE_SIZE << compound_order(page));
 }
 
diff -puN mm/hugetlb.c~bug-not-hugetlbfs-in-copy_huge_page mm/hugetlb.c
--- linux.git/mm/hugetlb.c~bug-not-hugetlbfs-in-copy_huge_page	2013-10-28 15:06:12.890828904 -0700
+++ linux.git-davehans/mm/hugetlb.c	2013-10-28 15:06:12.894829082 -0700
@@ -498,6 +498,7 @@ void copy_huge_page(struct page *dst, st
 	int i;
 	struct hstate *h = page_hstate(src);
 
+	VM_BUG_ON(!h);
 	if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
 		copy_gigantic_page(dst, src);
 		return;
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page
  2013-10-28 22:16 [PATCH 1/2] mm: hugetlbfs: Add some VM_BUG_ON()s to catch non-hugetlbfs pages Dave Hansen
@ 2013-10-28 22:16 ` Dave Hansen
  2013-10-28 22:11   ` Kirill A. Shutemov
  2013-11-06 13:46   ` Hillf Danton
  2013-10-30 15:10 ` [PATCH 1/2] mm: hugetlbfs: Add some VM_BUG_ON()s to catchnon-hugetlbfs pages Naoya Horiguchi
  1 sibling, 2 replies; 8+ messages in thread
From: Dave Hansen @ 2013-10-28 22:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, dave.jiang, Mel Gorman, akpm, dhillf, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

Right now, the migration code in migrate_page_copy() uses 
copy_huge_page() for hugetlbfs and thp pages:

       if (PageHuge(page) || PageTransHuge(page))
                copy_huge_page(newpage, page);

So, yay for code reuse.  But:

void copy_huge_page(struct page *dst, struct page *src)
{
        struct hstate *h = page_hstate(src);

and a non-hugetlbfs page has no page_hstate().  This
works 99% of the time because page_hstate() determines
the hstate from the page order alone.  Since the page
order of a THP page matches the default hugetlbfs page
order, it works.

But, if you change the default huge page size on the
boot command-line (say default_hugepagesz=1G), then
we might not even *have* a 2MB hstate so page_hstate()
returns null and copy_huge_page() oopses pretty fast
since copy_huge_page() dereferences the hstate:

void copy_huge_page(struct page *dst, struct page *src)
{
        struct hstate *h = page_hstate(src);
        if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
...

This patch creates a copy_high_order_page() which can
be used on THP pages.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/include/linux/huge_mm.h |   16 ++++++++++++++++
 linux.git-davehans/mm/huge_memory.c        |   13 +++++++++++++
 linux.git-davehans/mm/migrate.c            |    4 +++-
 3 files changed, 32 insertions(+), 1 deletion(-)

diff -puN include/linux/huge_mm.h~copy-huge-separate-from-copy-transhuge include/linux/huge_mm.h
--- linux.git/include/linux/huge_mm.h~copy-huge-separate-from-copy-transhuge	2013-10-28 15:10:28.294220490 -0700
+++ linux.git-davehans/include/linux/huge_mm.h	2013-10-28 15:10:28.301220803 -0700
@@ -177,6 +177,10 @@ static inline struct page *compound_tran
 extern int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
 				unsigned long addr, pmd_t pmd, pmd_t *pmdp);
 
+extern void copy_high_order_page(struct page *newpage,
+				 struct page *oldpage,
+				 int order);
+
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -227,6 +231,18 @@ static inline int do_huge_pmd_numa_page(
 	return 0;
 }
 
+/*
+ * The non-stub version of this code is probably usable
+ * generically but its only user is thp at the moment,
+ * so enforce that with a BUG()
+ */
+static inline  void copy_high_order_page(struct page *newpage,
+					 struct page *oldpage,
+					 int order)
+{
+	BUG();
+}
+
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #endif /* _LINUX_HUGE_MM_H */
diff -puN mm/huge_memory.c~copy-huge-separate-from-copy-transhuge mm/huge_memory.c
--- linux.git/mm/huge_memory.c~copy-huge-separate-from-copy-transhuge	2013-10-28 15:10:28.296220580 -0700
+++ linux.git-davehans/mm/huge_memory.c	2013-10-28 15:10:28.302220848 -0700
@@ -2789,3 +2789,16 @@ void __vma_adjust_trans_huge(struct vm_a
 			split_huge_page_address(next->vm_mm, nstart);
 	}
 }
+
+void copy_high_order_page(struct page *newpage,
+			  struct page *oldpage,
+			  int order)
+{
+	int i;
+
+	might_sleep();
+	for (i = 0; i < (1<<order); i++) {
+		cond_resched();
+		copy_highpage(newpage + i, oldpage + i);
+	}
+}
diff -puN mm/migrate.c~copy-huge-separate-from-copy-transhuge mm/migrate.c
--- linux.git/mm/migrate.c~copy-huge-separate-from-copy-transhuge	2013-10-28 15:10:28.298220669 -0700
+++ linux.git-davehans/mm/migrate.c	2013-10-28 15:10:28.303220893 -0700
@@ -443,8 +443,10 @@ int migrate_huge_page_move_mapping(struc
  */
 void migrate_page_copy(struct page *newpage, struct page *page)
 {
-	if (PageHuge(page) || PageTransHuge(page))
+	if (PageHuge(page))
 		copy_huge_page(newpage, page);
+	else if(PageTransHuge(page))
+		copy_high_order_page(newpage, page, HPAGE_PMD_ORDER);
 	else
 		copy_highpage(newpage, page);
 
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] mm: hugetlbfs: Add some VM_BUG_ON()s to catchnon-hugetlbfs pages
  2013-10-28 22:16 [PATCH 1/2] mm: hugetlbfs: Add some VM_BUG_ON()s to catch non-hugetlbfs pages Dave Hansen
  2013-10-28 22:16 ` [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Dave Hansen
@ 2013-10-30 15:10 ` Naoya Horiguchi
  1 sibling, 0 replies; 8+ messages in thread
From: Naoya Horiguchi @ 2013-10-30 15:10 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, dave.jiang, Mel Gorman, akpm, dhillf

On Mon, Oct 28, 2013 at 03:16:18PM -0700, Dave Hansen wrote:
> 
> Dave Jiang reported that he was seeing oopses when running
> NUMA systems and default_hugepagesz=1G.  I traced the issue down
> to migrate_page_copy() trying to use the same code for hugetlb
> pages and transparent hugepages.  It should not have been trying
> to pass thp pages in there.
> 
> So, add some VM_BUG_ON()s for the next hapless VM developer that

Looks good to me, thanks.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>


> 
> ---
> 
>  linux.git-davehans/include/linux/hugetlb.h |    1 +
>  linux.git-davehans/mm/hugetlb.c            |    1 +
>  2 files changed, 2 insertions(+)
> 
> diff -puN include/linux/hugetlb.h~bug-not-hugetlbfs-in-copy_huge_page include/linux/hugetlb.h
> --- linux.git/include/linux/hugetlb.h~bug-not-hugetlbfs-in-copy_huge_page	2013-10-28 15:06:12.888828815 -0700
> +++ linux.git-davehans/include/linux/hugetlb.h	2013-10-28 15:06:12.893829038 -0700
> @@ -355,6 +355,7 @@ static inline pte_t arch_make_huge_pte(p
>  
>  static inline struct hstate *page_hstate(struct page *page)
>  {
> +	VM_BUG_ON(!PageHuge(page));
>  	return size_to_hstate(PAGE_SIZE << compound_order(page));
>  }
>  
> diff -puN mm/hugetlb.c~bug-not-hugetlbfs-in-copy_huge_page mm/hugetlb.c
> --- linux.git/mm/hugetlb.c~bug-not-hugetlbfs-in-copy_huge_page	2013-10-28 15:06:12.890828904 -0700
> +++ linux.git-davehans/mm/hugetlb.c	2013-10-28 15:06:12.894829082 -0700
> @@ -498,6 +498,7 @@ void copy_huge_page(struct page *dst, st
>  	int i;
>  	struct hstate *h = page_hstate(src);
>  
> +	VM_BUG_ON(!h);
>  	if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
>  		copy_gigantic_page(dst, src);
>  		return;
> _
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] mm: thp: give transparent hugepage code a separatecopy_page
  2013-10-28 22:11   ` Kirill A. Shutemov
@ 2013-10-30 15:31     ` Naoya Horiguchi
  2013-11-05 18:47     ` [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Dave Hansen
  1 sibling, 0 replies; 8+ messages in thread
From: Naoya Horiguchi @ 2013-10-30 15:31 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, linux-kernel, linux-mm, dave.jiang, Mel Gorman,
	akpm, dhillf

On Tue, Oct 29, 2013 at 12:11:26AM +0200, Kirill A. Shutemov wrote:
> On Mon, Oct 28, 2013 at 03:16:20PM -0700, Dave Hansen wrote:
> > 
> > From: Dave Hansen <dave.hansen@linux.intel.com>
> > 
> > Right now, the migration code in migrate_page_copy() uses 
> > copy_huge_page() for hugetlbfs and thp pages:
> > 
> >        if (PageHuge(page) || PageTransHuge(page))
> >                 copy_huge_page(newpage, page);
> > 
> > So, yay for code reuse.  But:
> > 
> > void copy_huge_page(struct page *dst, struct page *src)
> > {
> >         struct hstate *h = page_hstate(src);
> > 
> > and a non-hugetlbfs page has no page_hstate().  This
> > works 99% of the time because page_hstate() determines
> > the hstate from the page order alone.  Since the page
> > order of a THP page matches the default hugetlbfs page
> > order, it works.
> > 
> > But, if you change the default huge page size on the
> > boot command-line (say default_hugepagesz=1G), then
> > we might not even *have* a 2MB hstate so page_hstate()
> > returns null and copy_huge_page() oopses pretty fast
> > since copy_huge_page() dereferences the hstate:
> > 
> > void copy_huge_page(struct page *dst, struct page *src)
> > {
> >         struct hstate *h = page_hstate(src);
> >         if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
> > ...
> > 
> > This patch creates a copy_high_order_page() which can
> > be used on THP pages.
> 
> We already have copy_user_huge_page() and copy_user_gigantic_page() in
> generic code (mm/memory.c). I think copy_gigantic_page() and
> copy_huge_page() should be moved there too.

I agree this.

> BTW, I think pages_per_huge_page in copy_user_huge_page() is redunand:
> compound_order(page) should be enough, right?

I guess that thp code is very strict on performance, so developers chose
to pass it as an argument instead of calculating compound_order in each call.
I think the performance gain is small (maybe invisible),
but it's not a bad idea to me.

Thanks,
Naoya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page
  2013-10-28 22:11   ` Kirill A. Shutemov
  2013-10-30 15:31     ` [PATCH 2/2] mm: thp: give transparent hugepage code a separatecopy_page Naoya Horiguchi
@ 2013-11-05 18:47     ` Dave Hansen
  1 sibling, 0 replies; 8+ messages in thread
From: Dave Hansen @ 2013-11-05 18:47 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-kernel, linux-mm, dave.jiang, Mel Gorman, akpm, dhillf

On 10/28/2013 03:11 PM, Kirill A. Shutemov wrote:
> On Mon, Oct 28, 2013 at 03:16:20PM -0700, Dave Hansen wrote:
>> void copy_huge_page(struct page *dst, struct page *src)
>> {
>>         struct hstate *h = page_hstate(src);
>>         if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
>> ...
>>
>> This patch creates a copy_high_order_page() which can
>> be used on THP pages.
> 
> We already have copy_user_huge_page() and copy_user_gigantic_page() in
> generic code (mm/memory.c). I think copy_gigantic_page() and
> copy_huge_page() should be moved there too.

That would be fine I guesss... in another patch. :)

> BTW, I think pages_per_huge_page in copy_user_huge_page() is redunand:
> compound_order(page) should be enough, right?

The way it is now, the compiler can optimize for the !HUGETLBFS case.
Also, pages_per_huge_page() works for gigantic pages.  compound_order()
wouldn't work for those.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page
  2013-10-28 22:16 ` [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Dave Hansen
  2013-10-28 22:11   ` Kirill A. Shutemov
@ 2013-11-06 13:46   ` Hillf Danton
  2013-11-06 16:34     ` Dave Hansen
  1 sibling, 1 reply; 8+ messages in thread
From: Hillf Danton @ 2013-11-06 13:46 UTC (permalink / raw)
  To: Dave Hansen; +Cc: LKML, Linux-MM, dave.jiang, Mel Gorman, Andrew Morton

On Tue, Oct 29, 2013 at 6:16 AM, Dave Hansen <dave@sr71.net> wrote:
> +
> +void copy_high_order_page(struct page *newpage,
> +                         struct page *oldpage,
> +                         int order)
> +{
> +       int i;
> +
> +       might_sleep();
> +       for (i = 0; i < (1<<order); i++) {
> +               cond_resched();
> +               copy_highpage(newpage + i, oldpage + i);
> +       }
> +}

Can we make no  use of might_sleep here with cond_resched in loop?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page
  2013-11-06 13:46   ` Hillf Danton
@ 2013-11-06 16:34     ` Dave Hansen
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Hansen @ 2013-11-06 16:34 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Linux-MM, dave.jiang, Mel Gorman, Andrew Morton

On 11/06/2013 05:46 AM, Hillf Danton wrote:
> On Tue, Oct 29, 2013 at 6:16 AM, Dave Hansen <dave@sr71.net> wrote:
>> +
>> +void copy_high_order_page(struct page *newpage,
>> +                         struct page *oldpage,
>> +                         int order)
>> +{
>> +       int i;
>> +
>> +       might_sleep();
>> +       for (i = 0; i < (1<<order); i++) {
>> +               cond_resched();
>> +               copy_highpage(newpage + i, oldpage + i);
>> +       }
>> +}
> 
> Can we make no  use of might_sleep here with cond_resched in loop?

I'm not sure what you're saying.

Are you pointing out that cond_resched() actually calls might_sleep() so
the might_sleep() is redundant?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2013-11-06 16:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-28 22:16 [PATCH 1/2] mm: hugetlbfs: Add some VM_BUG_ON()s to catch non-hugetlbfs pages Dave Hansen
2013-10-28 22:16 ` [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Dave Hansen
2013-10-28 22:11   ` Kirill A. Shutemov
2013-10-30 15:31     ` [PATCH 2/2] mm: thp: give transparent hugepage code a separatecopy_page Naoya Horiguchi
2013-11-05 18:47     ` [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page Dave Hansen
2013-11-06 13:46   ` Hillf Danton
2013-11-06 16:34     ` Dave Hansen
2013-10-30 15:10 ` [PATCH 1/2] mm: hugetlbfs: Add some VM_BUG_ON()s to catchnon-hugetlbfs pages Naoya Horiguchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).