linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 0/3] pagemap handles transparent hugepage
@ 2011-12-19 18:38 Naoya Horiguchi
  2011-12-19 18:38 ` [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: Andi Kleen, Wu Fengguang, Andrea Arcangeli, KOSAKI Motohiro,
	KAMEZAWA Hiroyuki, linux-kernel, Naoya Horiguchi

Transparent hugepage (thp) splits itself when kernel cannot work on it
in hugepage unit. Thp split can reduce performance merit of thp, so we
are now making efforts to handle thp as it is if possible.
As one of such efforts, I suggest to make pagemap aware of thp.

Could you review and comment on them?

Thanks,
Naoya

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap
  2011-12-19 18:38 [RFC][PATCH 0/3] pagemap handles transparent hugepage Naoya Horiguchi
@ 2011-12-19 18:38 ` Naoya Horiguchi
  2011-12-19 18:45   ` Andi Kleen
  2011-12-19 18:48   ` David Rientjes
  2011-12-19 18:38 ` [RFC][PATCH 2/3] pagemap: export KPF_THP Naoya Horiguchi
  2011-12-19 18:38 ` [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it Naoya Horiguchi
  2 siblings, 2 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: Andi Kleen, Wu Fengguang, Andrea Arcangeli, KOSAKI Motohiro,
	KAMEZAWA Hiroyuki, linux-kernel, Naoya Horiguchi

Thp split is not necessary if we explicitly check whether pmds are
mapping thps or not. This patch introduces the check and the code
to generate pagemap entries for pmds mapping thps, which results in
less performance impact of pagemap on thp.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 fs/proc/task_mmu.c |   48 ++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 44 insertions(+), 4 deletions(-)

diff --git 3.2-rc5.orig/fs/proc/task_mmu.c 3.2-rc5/fs/proc/task_mmu.c
index e418c5a..90c4b7a 100644
--- 3.2-rc5.orig/fs/proc/task_mmu.c
+++ 3.2-rc5/fs/proc/task_mmu.c
@@ -600,6 +600,9 @@ struct pagemapread {
 	u64 *buffer;
 };
 
+#define PAGEMAP_WALK_SIZE	(PMD_SIZE)
+#define PAGEMAP_WALK_MASK	(PMD_MASK)
+
 #define PM_ENTRY_BYTES      sizeof(u64)
 #define PM_STATUS_BITS      3
 #define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
@@ -658,6 +661,22 @@ static u64 pte_to_pagemap_entry(pte_t pte)
 	return pme;
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static u64 thp_pte_to_pagemap_entry(pte_t pte, int offset)
+{
+	u64 pme = 0;
+	if (pte_present(pte))
+		pme = PM_PFRAME(pte_pfn(pte) + offset)
+			| PM_PSHIFT(PAGE_SHIFT) | PM_PRESENT;
+	return pme;
+}
+#else
+static inline u64 thp_pte_to_pagemap_entry(pte_t pte, int offset)
+{
+	return 0;
+}
+#endif
+
 static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			     struct mm_walk *walk)
 {
@@ -666,10 +685,33 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	pte_t *pte;
 	int err = 0;
 
-	split_huge_page_pmd(walk->mm, pmd);
-
 	/* find the first VMA at or above 'addr' */
 	vma = find_vma(walk->mm, addr);
+
+	spin_lock(&walk->mm->page_table_lock);
+	if (pmd_trans_huge(*pmd)) {
+		if (pmd_trans_splitting(*pmd)) {
+			spin_unlock(&walk->mm->page_table_lock);
+			wait_split_huge_page(vma->anon_vma, pmd);
+		} else {
+			u64 pfn = PM_NOT_PRESENT;
+
+			for (; addr != end; addr += PAGE_SIZE) {
+				int offset = (addr & ~PAGEMAP_WALK_MASK)
+					>> PAGE_SHIFT;
+				pfn = thp_pte_to_pagemap_entry(*(pte_t *)pmd,
+							       offset);
+				err = add_to_pagemap(addr, pfn, pm);
+				if (err)
+					break;
+			}
+			spin_unlock(&walk->mm->page_table_lock);
+			return err;
+		}
+	} else {
+		spin_unlock(&walk->mm->page_table_lock);
+	}
+
 	for (; addr != end; addr += PAGE_SIZE) {
 		u64 pfn = PM_NOT_PRESENT;
 
@@ -754,8 +796,6 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
  * determine which areas of memory are actually mapped and llseek to
  * skip over unmapped regions.
  */
-#define PAGEMAP_WALK_SIZE	(PMD_SIZE)
-#define PAGEMAP_WALK_MASK	(PMD_MASK)
 static ssize_t pagemap_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
 {
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 18:38 [RFC][PATCH 0/3] pagemap handles transparent hugepage Naoya Horiguchi
  2011-12-19 18:38 ` [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
@ 2011-12-19 18:38 ` Naoya Horiguchi
  2011-12-19 18:40   ` Andi Kleen
                     ` (2 more replies)
  2011-12-19 18:38 ` [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it Naoya Horiguchi
  2 siblings, 3 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: Andi Kleen, Wu Fengguang, Andrea Arcangeli, KOSAKI Motohiro,
	KAMEZAWA Hiroyuki, linux-kernel, Naoya Horiguchi

This flag shows that a given pages is a subpage of transparent hugepage.
It does not care about whether it is a head page or a tail page, because
it's clear from pfn of the target page which you should know when you read
/proc/kpageflags.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 fs/proc/page.c                    |    5 +++++
 include/linux/kernel-page-flags.h |    4 ++++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git 3.2-rc5.orig/fs/proc/page.c 3.2-rc5/fs/proc/page.c
index 6d8e6a9..d436fc6 100644
--- 3.2-rc5.orig/fs/proc/page.c
+++ 3.2-rc5/fs/proc/page.c
@@ -116,6 +116,11 @@ u64 stable_page_flags(struct page *page)
 	if (PageHuge(page))
 		u |= 1 << KPF_HUGE;
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if (PageTransCompound(page))
+		u |= 1 << KPF_THP;
+#endif
+
 	/*
 	 * Caveats on high order pages: page->_count will only be set
 	 * -1 on the head page; SLUB/SLQB do the same for PG_slab;
diff --git 3.2-rc5.orig/include/linux/kernel-page-flags.h 3.2-rc5/include/linux/kernel-page-flags.h
index bd92a89..7b83ee7 100644
--- 3.2-rc5.orig/include/linux/kernel-page-flags.h
+++ 3.2-rc5/include/linux/kernel-page-flags.h
@@ -31,6 +31,10 @@
 
 #define KPF_KSM			21
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define KPF_THP			22
+#endif
+
 /* kernel hacking assistances
  * WARNING: subject to change, never rely on them!
  */
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it
  2011-12-19 18:38 [RFC][PATCH 0/3] pagemap handles transparent hugepage Naoya Horiguchi
  2011-12-19 18:38 ` [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
  2011-12-19 18:38 ` [RFC][PATCH 2/3] pagemap: export KPF_THP Naoya Horiguchi
@ 2011-12-19 18:38 ` Naoya Horiguchi
  2011-12-19 18:51   ` David Rientjes
  2011-12-20  3:41   ` Wu Fengguang
  2 siblings, 2 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: Andi Kleen, Wu Fengguang, Andrea Arcangeli, KOSAKI Motohiro,
	KAMEZAWA Hiroyuki, linux-kernel, Naoya Horiguchi

page-types, which is a common user of pagemap, gets aware of thp
with this patch. This helps system admins and kernel hackers to know
about how thp works.
Here is a sample output of page-types over a thp:

  voffset offset  len     flags
  ...
  7f9d40200       3fd200  1       ___U_lA____Ma_b_______t____________
  7f9d40201       3fd201  1ff     ______________________t____________
  ...

  flags      page-count       MB  symbolic-flags                     long-symbolic-flags
  0x0000000000400000             511        1  ______________________t____________        thp
  0x0000000000405868               1        0  ___U_lA____Ma_b_______t____________        uptodate,lru,active,mmap,anonymous,swapbacked,thp

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 Documentation/vm/page-types.c |    2 ++
 Documentation/vm/pagemap.txt  |    4 ++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git 3.2-rc5.orig/Documentation/vm/page-types.c 3.2-rc5/Documentation/vm/page-types.c
index 7445caa..0b13f02 100644
--- 3.2-rc5.orig/Documentation/vm/page-types.c
+++ 3.2-rc5/Documentation/vm/page-types.c
@@ -98,6 +98,7 @@
 #define KPF_HWPOISON		19
 #define KPF_NOPAGE		20
 #define KPF_KSM			21
+#define KPF_THP			22
 
 /* [32-] kernel hacking assistances */
 #define KPF_RESERVED		32
@@ -147,6 +148,7 @@ static const char *page_flag_names[] = {
 	[KPF_HWPOISON]		= "X:hwpoison",
 	[KPF_NOPAGE]		= "n:nopage",
 	[KPF_KSM]		= "x:ksm",
+	[KPF_THP]		= "t:thp",
 
 	[KPF_RESERVED]		= "r:reserved",
 	[KPF_MLOCKED]		= "m:mlocked",
diff --git 3.2-rc5.orig/Documentation/vm/pagemap.txt 3.2-rc5/Documentation/vm/pagemap.txt
index df09b96..666d12b 100644
--- 3.2-rc5.orig/Documentation/vm/pagemap.txt
+++ 3.2-rc5/Documentation/vm/pagemap.txt
@@ -60,6 +60,7 @@ There are three components to pagemap:
     19. HWPOISON
     20. NOPAGE
     21. KSM
+    22. THP
 
 Short descriptions to the page flags:
 
@@ -97,6 +98,9 @@ Short descriptions to the page flags:
 21. KSM
     identical memory pages dynamically shared between one or more processes
 
+22. THP
+    continuous pages which construct transparent hugepages
+
     [IO related page flags]
  1. ERROR     IO error occurred
  3. UPTODATE  page has up-to-date data
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 18:38 ` [RFC][PATCH 2/3] pagemap: export KPF_THP Naoya Horiguchi
@ 2011-12-19 18:40   ` Andi Kleen
  2011-12-19 18:49     ` David Rientjes
                       ` (2 more replies)
  2011-12-19 19:24   ` KOSAKI Motohiro
  2011-12-20  3:35   ` Wu Fengguang
  2 siblings, 3 replies; 23+ messages in thread
From: Andi Kleen @ 2011-12-19 18:40 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andi Kleen, Wu Fengguang, Andrea Arcangeli,
	KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel

> diff --git 3.2-rc5.orig/fs/proc/page.c 3.2-rc5/fs/proc/page.c
> index 6d8e6a9..d436fc6 100644
> --- 3.2-rc5.orig/fs/proc/page.c
> +++ 3.2-rc5/fs/proc/page.c
> @@ -116,6 +116,11 @@ u64 stable_page_flags(struct page *page)
>  	if (PageHuge(page))
>  		u |= 1 << KPF_HUGE;
>  
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	if (PageTransCompound(page))
> +		u |= 1 << KPF_THP;
> +#endif

It would be better to have PageTransCompound be a dummy (always 0) 
for !CONFIG_TRANSPARENT_HUGEPAGE and KPF_THP always defined.
This would keep ifdefery in the headers.

-Andi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap
  2011-12-19 18:38 ` [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
@ 2011-12-19 18:45   ` Andi Kleen
  2011-12-19 18:48   ` David Rientjes
  1 sibling, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2011-12-19 18:45 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andi Kleen, Wu Fengguang, Andrea Arcangeli,
	KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel

On Mon, Dec 19, 2011 at 01:38:37PM -0500, Naoya Horiguchi wrote:
> Thp split is not necessary if we explicitly check whether pmds are
> mapping thps or not. This patch introduces the check and the code
> to generate pagemap entries for pmds mapping thps, which results in
> less performance impact of pagemap on thp.

Looks good.

Reviewed-by: Andi Kleen <ak@linux.intel.com>

-Andi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap
  2011-12-19 18:38 ` [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
  2011-12-19 18:45   ` Andi Kleen
@ 2011-12-19 18:48   ` David Rientjes
  2011-12-19 19:16     ` Naoya Horiguchi
  1 sibling, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-12-19 18:48 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andi Kleen, Wu Fengguang, Andrea Arcangeli,
	KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel

On Mon, 19 Dec 2011, Naoya Horiguchi wrote:

> diff --git 3.2-rc5.orig/fs/proc/task_mmu.c 3.2-rc5/fs/proc/task_mmu.c
> index e418c5a..90c4b7a 100644
> --- 3.2-rc5.orig/fs/proc/task_mmu.c
> +++ 3.2-rc5/fs/proc/task_mmu.c
> @@ -600,6 +600,9 @@ struct pagemapread {
>  	u64 *buffer;
>  };
>  
> +#define PAGEMAP_WALK_SIZE	(PMD_SIZE)
> +#define PAGEMAP_WALK_MASK	(PMD_MASK)
> +
>  #define PM_ENTRY_BYTES      sizeof(u64)
>  #define PM_STATUS_BITS      3
>  #define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
> @@ -658,6 +661,22 @@ static u64 pte_to_pagemap_entry(pte_t pte)
>  	return pme;
>  }
>  
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static u64 thp_pte_to_pagemap_entry(pte_t pte, int offset)
> +{
> +	u64 pme = 0;
> +	if (pte_present(pte))
> +		pme = PM_PFRAME(pte_pfn(pte) + offset)
> +			| PM_PSHIFT(PAGE_SHIFT) | PM_PRESENT;
> +	return pme;
> +}
> +#else
> +static inline u64 thp_pte_to_pagemap_entry(pte_t pte, int offset)
> +{
> +	return 0;
> +}
> +#endif
> +
>  static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  			     struct mm_walk *walk)
>  {
> @@ -666,10 +685,33 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  	pte_t *pte;
>  	int err = 0;
>  
> -	split_huge_page_pmd(walk->mm, pmd);
> -
>  	/* find the first VMA at or above 'addr' */
>  	vma = find_vma(walk->mm, addr);
> +
> +	spin_lock(&walk->mm->page_table_lock);

pagemap_pte_range() could potentially be called a _lot_, so I'd recommend 
optimizing this by checking for pmd_trans_huge() prior to taking 
page_table_lock and then rechecking after grabbing it with likely().

> +	if (pmd_trans_huge(*pmd)) {
> +		if (pmd_trans_splitting(*pmd)) {
> +			spin_unlock(&walk->mm->page_table_lock);
> +			wait_split_huge_page(vma->anon_vma, pmd);
> +		} else {
> +			u64 pfn = PM_NOT_PRESENT;

This doesn't need to be initialized and it would probably be better to 
declare "pfn" at the top-level of this function since it's later used for 
the non-thp case.

> +
> +			for (; addr != end; addr += PAGE_SIZE) {
> +				int offset = (addr & ~PAGEMAP_WALK_MASK)
> +					>> PAGE_SHIFT;
> +				pfn = thp_pte_to_pagemap_entry(*(pte_t *)pmd,
> +							       offset);
> +				err = add_to_pagemap(addr, pfn, pm);
> +				if (err)
> +					break;
> +			}
> +			spin_unlock(&walk->mm->page_table_lock);
> +			return err;
> +		}
> +	} else {
> +		spin_unlock(&walk->mm->page_table_lock);
> +	}
> +
>  	for (; addr != end; addr += PAGE_SIZE) {
>  		u64 pfn = PM_NOT_PRESENT;
>  
> @@ -754,8 +796,6 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
>   * determine which areas of memory are actually mapped and llseek to
>   * skip over unmapped regions.
>   */
> -#define PAGEMAP_WALK_SIZE	(PMD_SIZE)
> -#define PAGEMAP_WALK_MASK	(PMD_MASK)
>  static ssize_t pagemap_read(struct file *file, char __user *buf,
>  			    size_t count, loff_t *ppos)
>  {

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 18:40   ` Andi Kleen
@ 2011-12-19 18:49     ` David Rientjes
  2011-12-19 18:55     ` Naoya Horiguchi
  2011-12-19 19:05     ` Andrea Arcangeli
  2 siblings, 0 replies; 23+ messages in thread
From: David Rientjes @ 2011-12-19 18:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Naoya Horiguchi, linux-mm, Wu Fengguang, Andrea Arcangeli,
	KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel

On Mon, 19 Dec 2011, Andi Kleen wrote:

> It would be better to have PageTransCompound be a dummy (always 0) 
> for !CONFIG_TRANSPARENT_HUGEPAGE

It already is.

> and KPF_THP always defined.
> This would keep ifdefery in the headers.
> 

Agreed.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it
  2011-12-19 18:38 ` [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it Naoya Horiguchi
@ 2011-12-19 18:51   ` David Rientjes
  2011-12-20  3:41   ` Wu Fengguang
  1 sibling, 0 replies; 23+ messages in thread
From: David Rientjes @ 2011-12-19 18:51 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andi Kleen, Wu Fengguang, Andrea Arcangeli,
	KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel

On Mon, 19 Dec 2011, Naoya Horiguchi wrote:

> @@ -97,6 +98,9 @@ Short descriptions to the page flags:
>  21. KSM
>      identical memory pages dynamically shared between one or more processes
>  
> +22. THP
> +    continuous pages which construct transparent hugepages
> +
>      [IO related page flags]
>   1. ERROR     IO error occurred
>   3. UPTODATE  page has up-to-date data

s/continuous/contiguous/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 18:40   ` Andi Kleen
  2011-12-19 18:49     ` David Rientjes
@ 2011-12-19 18:55     ` Naoya Horiguchi
  2011-12-19 19:05     ` Andrea Arcangeli
  2 siblings, 0 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 18:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Naoya Horiguchi, linux-mm, Wu Fengguang, Andrea Arcangeli,
	KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel

Hi,

On Mon, Dec 19, 2011 at 07:40:47PM +0100, Andi Kleen wrote:
> > diff --git 3.2-rc5.orig/fs/proc/page.c 3.2-rc5/fs/proc/page.c
> > index 6d8e6a9..d436fc6 100644
> > --- 3.2-rc5.orig/fs/proc/page.c
> > +++ 3.2-rc5/fs/proc/page.c
> > @@ -116,6 +116,11 @@ u64 stable_page_flags(struct page *page)
> >  	if (PageHuge(page))
> >  		u |= 1 << KPF_HUGE;
> >  
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +	if (PageTransCompound(page))
> > +		u |= 1 << KPF_THP;
> > +#endif
> 
> It would be better to have PageTransCompound be a dummy (always 0) 
> for !CONFIG_TRANSPARENT_HUGEPAGE and KPF_THP always defined.
> This would keep ifdefery in the headers.

OK, I'll do it in the next post.
Thanks for the feedback.

Naoya

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 18:40   ` Andi Kleen
  2011-12-19 18:49     ` David Rientjes
  2011-12-19 18:55     ` Naoya Horiguchi
@ 2011-12-19 19:05     ` Andrea Arcangeli
  2 siblings, 0 replies; 23+ messages in thread
From: Andrea Arcangeli @ 2011-12-19 19:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Naoya Horiguchi, linux-mm, Wu Fengguang, KOSAKI Motohiro,
	KAMEZAWA Hiroyuki, linux-kernel

On Mon, Dec 19, 2011 at 07:40:47PM +0100, Andi Kleen wrote:
> > diff --git 3.2-rc5.orig/fs/proc/page.c 3.2-rc5/fs/proc/page.c
> > index 6d8e6a9..d436fc6 100644
> > --- 3.2-rc5.orig/fs/proc/page.c
> > +++ 3.2-rc5/fs/proc/page.c
> > @@ -116,6 +116,11 @@ u64 stable_page_flags(struct page *page)
> >  	if (PageHuge(page))
> >  		u |= 1 << KPF_HUGE;
> >  
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +	if (PageTransCompound(page))
> > +		u |= 1 << KPF_THP;
> > +#endif
> 
> It would be better to have PageTransCompound be a dummy (always 0) 
> for !CONFIG_TRANSPARENT_HUGEPAGE and KPF_THP always defined.

It's already the case, that's the whole point of using
PageTransCompound instead of PageCompound (the former defines to 0 is
the config option is disabled).

> This would keep ifdefery in the headers.

Yes the #ifdef can go already.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap
  2011-12-19 18:48   ` David Rientjes
@ 2011-12-19 19:16     ` Naoya Horiguchi
  0 siblings, 0 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 19:16 UTC (permalink / raw)
  To: David Rientjes
  Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Wu Fengguang,
	Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
	linux-kernel

On Mon, Dec 19, 2011 at 10:48:17AM -0800, David Rientjes wrote:
> On Mon, 19 Dec 2011, Naoya Horiguchi wrote:
...
> > @@ -666,10 +685,33 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> >  	pte_t *pte;
> >  	int err = 0;
> >  
> > -	split_huge_page_pmd(walk->mm, pmd);
> > -
> >  	/* find the first VMA at or above 'addr' */
> >  	vma = find_vma(walk->mm, addr);
> > +
> > +	spin_lock(&walk->mm->page_table_lock);
> 
> pagemap_pte_range() could potentially be called a _lot_, so I'd recommend 
> optimizing this by checking for pmd_trans_huge() prior to taking 
> page_table_lock and then rechecking after grabbing it with likely().

OK, I'll try it.
Similar logic is used on smaps_pte_range() and gather_pte_stats(),
so I think it's better to write a separate patch for this optimization.

> > +	if (pmd_trans_huge(*pmd)) {
> > +		if (pmd_trans_splitting(*pmd)) {
> > +			spin_unlock(&walk->mm->page_table_lock);
> > +			wait_split_huge_page(vma->anon_vma, pmd);
> > +		} else {
> > +			u64 pfn = PM_NOT_PRESENT;
> 
> This doesn't need to be initialized and it would probably be better to 
> declare "pfn" at the top-level of this function since it's later used for 
> the non-thp case.

Agreed.
I unify two declarations of "pfn" at the beginning of the function.

Thank you for nice feedback.
Naoya

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 18:38 ` [RFC][PATCH 2/3] pagemap: export KPF_THP Naoya Horiguchi
  2011-12-19 18:40   ` Andi Kleen
@ 2011-12-19 19:24   ` KOSAKI Motohiro
  2011-12-19 20:26     ` Naoya Horiguchi
  2011-12-19 20:31     ` Dave Hansen
  2011-12-20  3:35   ` Wu Fengguang
  2 siblings, 2 replies; 23+ messages in thread
From: KOSAKI Motohiro @ 2011-12-19 19:24 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andi Kleen, Wu Fengguang, Andrea Arcangeli,
	KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel

(12/19/11 1:38 PM), Naoya Horiguchi wrote:
> This flag shows that a given pages is a subpage of transparent hugepage.
> It does not care about whether it is a head page or a tail page, because
> it's clear from pfn of the target page which you should know when you read
> /proc/kpageflags.
> 
> Signed-off-by: Naoya Horiguchi<n-horiguchi@ah.jp.nec.com>

NAK.

The detail of transparent hugepage are hidden by design. We hope it
keep 'transparent'.
Until any explain why we should expose KPF_THP, we don't agree it.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 19:24   ` KOSAKI Motohiro
@ 2011-12-19 20:26     ` Naoya Horiguchi
  2011-12-19 20:48       ` KOSAKI Motohiro
  2011-12-19 20:31     ` Dave Hansen
  1 sibling, 1 reply; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 20:26 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Wu Fengguang,
	Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
	linux-kernel

Hi,

On Mon, Dec 19, 2011 at 02:24:53PM -0500, KOSAKI Motohiro wrote:
> (12/19/11 1:38 PM), Naoya Horiguchi wrote:
> > This flag shows that a given pages is a subpage of transparent hugepage.
> > It does not care about whether it is a head page or a tail page, because
> > it's clear from pfn of the target page which you should know when you read
> > /proc/kpageflags.
> > 
> > Signed-off-by: Naoya Horiguchi<n-horiguchi@ah.jp.nec.com>
> 
> NAK.
> 
> The detail of transparent hugepage are hidden by design. We hope it
> keep 'transparent'.
> Until any explain why we should expose KPF_THP, we don't agree it.

The reason why I want to know physical address of thp is testing.
I'm working on memory error recovery and writing test code to confirm
that memory recovery really works when an error occurs on thps.
There I need to locate thps on the physical memory.

IMO, transparency in thp means that we need no manual setup to use
it (as a contrast with hugetlbfs,) so it seems to me that exporting
pageflag of thp does not break the design of thp.

Anyway, I should have written the purpose in the patch description.
Thanks for the comment.

Naoya

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 19:24   ` KOSAKI Motohiro
  2011-12-19 20:26     ` Naoya Horiguchi
@ 2011-12-19 20:31     ` Dave Hansen
  2011-12-19 20:45       ` KOSAKI Motohiro
  1 sibling, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2011-12-19 20:31 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Wu Fengguang,
	Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
	linux-kernel

On 12/19/2011 11:24 AM, KOSAKI Motohiro wrote:
> (12/19/11 1:38 PM), Naoya Horiguchi wrote:
>> This flag shows that a given pages is a subpage of transparent hugepage.
>> It does not care about whether it is a head page or a tail page, because
>> it's clear from pfn of the target page which you should know when you read
>> /proc/kpageflags.
>>
>> Signed-off-by: Naoya Horiguchi<n-horiguchi@ah.jp.nec.com>
> 
> NAK.
> 
> The detail of transparent hugepage are hidden by design. We hope it
> keep 'transparent'.
> Until any explain why we should expose KPF_THP, we don't agree it.

Transparent shouldn't mean "undebuggable", though. :)

Let's say you profiled a application and the data shows you're missing
the TLB a bunch, but you're also using THP.  This might give you a shot
at figuring out which parts of your application are *TRULY* THP-backed
instead of just the areas you *think* are backed.

I'm not sure there's another way to figure it out at the moment.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 20:31     ` Dave Hansen
@ 2011-12-19 20:45       ` KOSAKI Motohiro
  2011-12-19 20:57         ` Dave Hansen
  0 siblings, 1 reply; 23+ messages in thread
From: KOSAKI Motohiro @ 2011-12-19 20:45 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Wu Fengguang,
	Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
	linux-kernel

(12/19/11 3:31 PM), Dave Hansen wrote:
> On 12/19/2011 11:24 AM, KOSAKI Motohiro wrote:
>> (12/19/11 1:38 PM), Naoya Horiguchi wrote:
>>> This flag shows that a given pages is a subpage of transparent hugepage.
>>> It does not care about whether it is a head page or a tail page, because
>>> it's clear from pfn of the target page which you should know when you read
>>> /proc/kpageflags.
>>>
>>> Signed-off-by: Naoya Horiguchi<n-horiguchi@ah.jp.nec.com>
>>
>> NAK.
>>
>> The detail of transparent hugepage are hidden by design. We hope it
>> keep 'transparent'.
>> Until any explain why we should expose KPF_THP, we don't agree it.
> 
> Transparent shouldn't mean "undebuggable", though. :)
> 
> Let's say you profiled a application and the data shows you're missing
> the TLB a bunch, but you're also using THP.  This might give you a shot
> at figuring out which parts of your application are *TRULY* THP-backed
> instead of just the areas you *think* are backed.
> 
> I'm not sure there's another way to figure it out at the moment.

A snapshot status of THP doesn't help your purpose. I think you need
perf or similar profiling subsystem enhancement.

Because of, if you've seen KPF_THP at once, It has no guarantee to keep
hugepages until applications run. Opposite, If you only need rough
statistics, the best way is to add some new stat to
/sys/kernel/mm/transparent_hugepage.

I don't think your usecase and current proposal are matched.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 20:26     ` Naoya Horiguchi
@ 2011-12-19 20:48       ` KOSAKI Motohiro
  2011-12-19 21:20         ` Naoya Horiguchi
  0 siblings, 1 reply; 23+ messages in thread
From: KOSAKI Motohiro @ 2011-12-19 20:48 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andi Kleen, Wu Fengguang, Andrea Arcangeli,
	KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel

>> The detail of transparent hugepage are hidden by design. We hope it
>> keep 'transparent'.
>> Until any explain why we should expose KPF_THP, we don't agree it.
> 
> The reason why I want to know physical address of thp is testing.
> I'm working on memory error recovery and writing test code to confirm
> that memory recovery really works when an error occurs on thps.
> There I need to locate thps on the physical memory.

I'm sorry, I simply don't understand what you say. Why do you think
memory recovery and thp are related feature?



> 
> IMO, transparency in thp means that we need no manual setup to use
> it (as a contrast with hugetlbfs,) so it seems to me that exporting
> pageflag of thp does not break the design of thp.
> 
> Anyway, I should have written the purpose in the patch description.
> Thanks for the comment.
> 
> Naoya


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 20:45       ` KOSAKI Motohiro
@ 2011-12-19 20:57         ` Dave Hansen
  2011-12-19 21:23           ` Andrea Arcangeli
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2011-12-19 20:57 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Wu Fengguang,
	Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
	linux-kernel

On 12/19/2011 12:45 PM, KOSAKI Motohiro wrote:
> (12/19/11 3:31 PM), Dave Hansen wrote:
>> Let's say you profiled a application and the data shows you're missing
>> the TLB a bunch, but you're also using THP.  This might give you a shot
>> at figuring out which parts of your application are *TRULY* THP-backed
>> instead of just the areas you *think* are backed.
>>
>> I'm not sure there's another way to figure it out at the moment.
> 
> A snapshot status of THP doesn't help your purpose. I think you need
> perf or similar profiling subsystem enhancement.
>
> Because of, if you've seen KPF_THP at once, It has no guarantee to keep
> hugepages until applications run. Opposite, If you only need rough
> statistics, the best way is to add some new stat to
> /sys/kernel/mm/transparent_hugepage.

But, every single one of the pagemap flags is really just a snapshot
KPF_DIRTY, KPF_LOCKED, etc...  The entire interface is inherently a racy
snapshot, and there's not a whole lot you can do about it.
sys_mincore() has the exact same issues.  But, that does not make them
useless, nor mean they shouldn't be in the kernel.

A tracepoint or something similar to watch for THP promotions or
demotions would be a great addition to this interface.  That way, you at
least have a concept if the data you got has become stale.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 20:48       ` KOSAKI Motohiro
@ 2011-12-19 21:20         ` Naoya Horiguchi
  0 siblings, 0 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 21:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Wu Fengguang,
	Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
	linux-kernel

On Mon, Dec 19, 2011 at 03:48:08PM -0500, KOSAKI Motohiro wrote:
> >> The detail of transparent hugepage are hidden by design. We hope it
> >> keep 'transparent'.
> >> Until any explain why we should expose KPF_THP, we don't agree it.
> > 
> > The reason why I want to know physical address of thp is testing.
> > I'm working on memory error recovery and writing test code to confirm
> > that memory recovery really works when an error occurs on thps.
> > There I need to locate thps on the physical memory.
> 
> I'm sorry, I simply don't understand what you say. Why do you think
> memory recovery and thp are related feature?

This is because memory error can occur on thp and then we should be
able to handle them (otherwise we can use corrupted date) and
verify that it works.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 20:57         ` Dave Hansen
@ 2011-12-19 21:23           ` Andrea Arcangeli
  0 siblings, 0 replies; 23+ messages in thread
From: Andrea Arcangeli @ 2011-12-19 21:23 UTC (permalink / raw)
  To: Dave Hansen
  Cc: KOSAKI Motohiro, Naoya Horiguchi, linux-mm, Andi Kleen,
	Wu Fengguang, KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel

On Mon, Dec 19, 2011 at 12:57:01PM -0800, Dave Hansen wrote:
> But, every single one of the pagemap flags is really just a snapshot
> KPF_DIRTY, KPF_LOCKED, etc...  The entire interface is inherently a racy
> snapshot, and there's not a whole lot you can do about it.

Having read the discussion, while I don't see a big need of the
KPF_THP, I also see how it in certain corner cases it can be used to
test memory failure injection and I agree with you on the above. Maybe
it can also be used to check if at certain virtual offsets
(pid/pagemap lookup followed by a kpageflags lookup) we always fail to
find THP inside big vmas, maybe out of not aligned mprotect that may
be optimized by aligning it.

The other kernel internal bits may also be stale and go away quicker
than the KPF_THP, so I don't see a problem in exposing it. We also
provide THP related info in meminfo/smaps, if they were supposed to be
invisible that wouldn't be allowed too.

A bigger concern to me is that the new bitfield alters the protocol,
but old code by adding one more bit (if sanely coded...) shouldn't break.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-19 18:38 ` [RFC][PATCH 2/3] pagemap: export KPF_THP Naoya Horiguchi
  2011-12-19 18:40   ` Andi Kleen
  2011-12-19 19:24   ` KOSAKI Motohiro
@ 2011-12-20  3:35   ` Wu Fengguang
  2011-12-20 18:10     ` Naoya Horiguchi
  2 siblings, 1 reply; 23+ messages in thread
From: Wu Fengguang @ 2011-12-20  3:35 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andi Kleen, Andrea Arcangeli, KOSAKI Motohiro,
	KAMEZAWA Hiroyuki, linux-kernel

On Tue, Dec 20, 2011 at 02:38:38AM +0800, Naoya Horiguchi wrote:
> This flag shows that a given pages is a subpage of transparent hugepage.
> It does not care about whether it is a head page or a tail page, because
> it's clear from pfn of the target page which you should know when you read
> /proc/kpageflags.

OK, this is aligning with KPF_HUGE. For those who only care about
head/tail pages, will the KPF_COMPOUND_HEAD/KPF_COMPOUND_TAIL flags be
set automatically for thp? Which may be more convenient to test/filter
than the page address.

> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>

As you already discussed, the below #ifdef should be removed.
In fact, kernel-page-flags.h is intended for direct inclusion by
user space tools, so must not have any conditional defines.

> --- 3.2-rc5.orig/include/linux/kernel-page-flags.h
> +++ 3.2-rc5/include/linux/kernel-page-flags.h
> @@ -31,6 +31,10 @@
>  
>  #define KPF_KSM			21
>  
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#define KPF_THP			22
> +#endif
> +
>  /* kernel hacking assistances
>   * WARNING: subject to change, never rely on them!
>   */
> -- 
> 1.7.7.3

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it
  2011-12-19 18:38 ` [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it Naoya Horiguchi
  2011-12-19 18:51   ` David Rientjes
@ 2011-12-20  3:41   ` Wu Fengguang
  1 sibling, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2011-12-20  3:41 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andi Kleen, Andrea Arcangeli, KOSAKI Motohiro,
	KAMEZAWA Hiroyuki, linux-kernel

Acked-by: Wu Fengguang <fengguang.wu@intel.com>

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
  2011-12-20  3:35   ` Wu Fengguang
@ 2011-12-20 18:10     ` Naoya Horiguchi
  0 siblings, 0 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-20 18:10 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Andrea Arcangeli,
	KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel

On Tue, Dec 20, 2011 at 11:35:37AM +0800, Wu Fengguang wrote:
> On Tue, Dec 20, 2011 at 02:38:38AM +0800, Naoya Horiguchi wrote:
> > This flag shows that a given pages is a subpage of transparent hugepage.
> > It does not care about whether it is a head page or a tail page, because
> > it's clear from pfn of the target page which you should know when you read
> > /proc/kpageflags.
> 
> OK, this is aligning with KPF_HUGE. For those who only care about
> head/tail pages, will the KPF_COMPOUND_HEAD/KPF_COMPOUND_TAIL flags be
> set automatically for thp? Which may be more convenient to test/filter
> than the page address.

Yes, both of KPF_COMPOUND_HEAD/TAIL flags are automatically set for thp.
So above patch description was wrong and should be fixed.
(I didn't notice that because page-types hid compound_head/tail flags
without --raw flag. Sorry for confusion.)

> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> 
> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>

Thank you.

Naoya

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2011-12-20 18:11 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-19 18:38 [RFC][PATCH 0/3] pagemap handles transparent hugepage Naoya Horiguchi
2011-12-19 18:38 ` [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
2011-12-19 18:45   ` Andi Kleen
2011-12-19 18:48   ` David Rientjes
2011-12-19 19:16     ` Naoya Horiguchi
2011-12-19 18:38 ` [RFC][PATCH 2/3] pagemap: export KPF_THP Naoya Horiguchi
2011-12-19 18:40   ` Andi Kleen
2011-12-19 18:49     ` David Rientjes
2011-12-19 18:55     ` Naoya Horiguchi
2011-12-19 19:05     ` Andrea Arcangeli
2011-12-19 19:24   ` KOSAKI Motohiro
2011-12-19 20:26     ` Naoya Horiguchi
2011-12-19 20:48       ` KOSAKI Motohiro
2011-12-19 21:20         ` Naoya Horiguchi
2011-12-19 20:31     ` Dave Hansen
2011-12-19 20:45       ` KOSAKI Motohiro
2011-12-19 20:57         ` Dave Hansen
2011-12-19 21:23           ` Andrea Arcangeli
2011-12-20  3:35   ` Wu Fengguang
2011-12-20 18:10     ` Naoya Horiguchi
2011-12-19 18:38 ` [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it Naoya Horiguchi
2011-12-19 18:51   ` David Rientjes
2011-12-20  3:41   ` Wu Fengguang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).