* [RFC][PATCH 0/3] pagemap handles transparent hugepage
@ 2011-12-19 18:38 Naoya Horiguchi
2011-12-19 18:38 ` [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 18:38 UTC (permalink / raw)
To: linux-mm
Cc: Andi Kleen, Wu Fengguang, Andrea Arcangeli, KOSAKI Motohiro,
KAMEZAWA Hiroyuki, linux-kernel, Naoya Horiguchi
Transparent hugepage (thp) splits itself when kernel cannot work on it
in hugepage unit. Thp split can reduce performance merit of thp, so we
are now making efforts to handle thp as it is if possible.
As one of such efforts, I suggest to make pagemap aware of thp.
Could you review and comment on them?
Thanks,
Naoya
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap
2011-12-19 18:38 [RFC][PATCH 0/3] pagemap handles transparent hugepage Naoya Horiguchi
@ 2011-12-19 18:38 ` Naoya Horiguchi
2011-12-19 18:45 ` Andi Kleen
2011-12-19 18:48 ` David Rientjes
2011-12-19 18:38 ` [RFC][PATCH 2/3] pagemap: export KPF_THP Naoya Horiguchi
2011-12-19 18:38 ` [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it Naoya Horiguchi
2 siblings, 2 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 18:38 UTC (permalink / raw)
To: linux-mm
Cc: Andi Kleen, Wu Fengguang, Andrea Arcangeli, KOSAKI Motohiro,
KAMEZAWA Hiroyuki, linux-kernel, Naoya Horiguchi
Thp split is not necessary if we explicitly check whether pmds are
mapping thps or not. This patch introduces the check and the code
to generate pagemap entries for pmds mapping thps, which results in
less performance impact of pagemap on thp.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
fs/proc/task_mmu.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 44 insertions(+), 4 deletions(-)
diff --git 3.2-rc5.orig/fs/proc/task_mmu.c 3.2-rc5/fs/proc/task_mmu.c
index e418c5a..90c4b7a 100644
--- 3.2-rc5.orig/fs/proc/task_mmu.c
+++ 3.2-rc5/fs/proc/task_mmu.c
@@ -600,6 +600,9 @@ struct pagemapread {
u64 *buffer;
};
+#define PAGEMAP_WALK_SIZE (PMD_SIZE)
+#define PAGEMAP_WALK_MASK (PMD_MASK)
+
#define PM_ENTRY_BYTES sizeof(u64)
#define PM_STATUS_BITS 3
#define PM_STATUS_OFFSET (64 - PM_STATUS_BITS)
@@ -658,6 +661,22 @@ static u64 pte_to_pagemap_entry(pte_t pte)
return pme;
}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static u64 thp_pte_to_pagemap_entry(pte_t pte, int offset)
+{
+ u64 pme = 0;
+ if (pte_present(pte))
+ pme = PM_PFRAME(pte_pfn(pte) + offset)
+ | PM_PSHIFT(PAGE_SHIFT) | PM_PRESENT;
+ return pme;
+}
+#else
+static inline u64 thp_pte_to_pagemap_entry(pte_t pte, int offset)
+{
+ return 0;
+}
+#endif
+
static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
struct mm_walk *walk)
{
@@ -666,10 +685,33 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
pte_t *pte;
int err = 0;
- split_huge_page_pmd(walk->mm, pmd);
-
/* find the first VMA at or above 'addr' */
vma = find_vma(walk->mm, addr);
+
+ spin_lock(&walk->mm->page_table_lock);
+ if (pmd_trans_huge(*pmd)) {
+ if (pmd_trans_splitting(*pmd)) {
+ spin_unlock(&walk->mm->page_table_lock);
+ wait_split_huge_page(vma->anon_vma, pmd);
+ } else {
+ u64 pfn = PM_NOT_PRESENT;
+
+ for (; addr != end; addr += PAGE_SIZE) {
+ int offset = (addr & ~PAGEMAP_WALK_MASK)
+ >> PAGE_SHIFT;
+ pfn = thp_pte_to_pagemap_entry(*(pte_t *)pmd,
+ offset);
+ err = add_to_pagemap(addr, pfn, pm);
+ if (err)
+ break;
+ }
+ spin_unlock(&walk->mm->page_table_lock);
+ return err;
+ }
+ } else {
+ spin_unlock(&walk->mm->page_table_lock);
+ }
+
for (; addr != end; addr += PAGE_SIZE) {
u64 pfn = PM_NOT_PRESENT;
@@ -754,8 +796,6 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
* determine which areas of memory are actually mapped and llseek to
* skip over unmapped regions.
*/
-#define PAGEMAP_WALK_SIZE (PMD_SIZE)
-#define PAGEMAP_WALK_MASK (PMD_MASK)
static ssize_t pagemap_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
--
1.7.7.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 18:38 [RFC][PATCH 0/3] pagemap handles transparent hugepage Naoya Horiguchi
2011-12-19 18:38 ` [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
@ 2011-12-19 18:38 ` Naoya Horiguchi
2011-12-19 18:40 ` Andi Kleen
` (2 more replies)
2011-12-19 18:38 ` [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it Naoya Horiguchi
2 siblings, 3 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 18:38 UTC (permalink / raw)
To: linux-mm
Cc: Andi Kleen, Wu Fengguang, Andrea Arcangeli, KOSAKI Motohiro,
KAMEZAWA Hiroyuki, linux-kernel, Naoya Horiguchi
This flag shows that a given pages is a subpage of transparent hugepage.
It does not care about whether it is a head page or a tail page, because
it's clear from pfn of the target page which you should know when you read
/proc/kpageflags.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
fs/proc/page.c | 5 +++++
include/linux/kernel-page-flags.h | 4 ++++
2 files changed, 9 insertions(+), 0 deletions(-)
diff --git 3.2-rc5.orig/fs/proc/page.c 3.2-rc5/fs/proc/page.c
index 6d8e6a9..d436fc6 100644
--- 3.2-rc5.orig/fs/proc/page.c
+++ 3.2-rc5/fs/proc/page.c
@@ -116,6 +116,11 @@ u64 stable_page_flags(struct page *page)
if (PageHuge(page))
u |= 1 << KPF_HUGE;
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ if (PageTransCompound(page))
+ u |= 1 << KPF_THP;
+#endif
+
/*
* Caveats on high order pages: page->_count will only be set
* -1 on the head page; SLUB/SLQB do the same for PG_slab;
diff --git 3.2-rc5.orig/include/linux/kernel-page-flags.h 3.2-rc5/include/linux/kernel-page-flags.h
index bd92a89..7b83ee7 100644
--- 3.2-rc5.orig/include/linux/kernel-page-flags.h
+++ 3.2-rc5/include/linux/kernel-page-flags.h
@@ -31,6 +31,10 @@
#define KPF_KSM 21
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define KPF_THP 22
+#endif
+
/* kernel hacking assistances
* WARNING: subject to change, never rely on them!
*/
--
1.7.7.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it
2011-12-19 18:38 [RFC][PATCH 0/3] pagemap handles transparent hugepage Naoya Horiguchi
2011-12-19 18:38 ` [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
2011-12-19 18:38 ` [RFC][PATCH 2/3] pagemap: export KPF_THP Naoya Horiguchi
@ 2011-12-19 18:38 ` Naoya Horiguchi
2011-12-19 18:51 ` David Rientjes
2011-12-20 3:41 ` Wu Fengguang
2 siblings, 2 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 18:38 UTC (permalink / raw)
To: linux-mm
Cc: Andi Kleen, Wu Fengguang, Andrea Arcangeli, KOSAKI Motohiro,
KAMEZAWA Hiroyuki, linux-kernel, Naoya Horiguchi
page-types, which is a common user of pagemap, gets aware of thp
with this patch. This helps system admins and kernel hackers to know
about how thp works.
Here is a sample output of page-types over a thp:
voffset offset len flags
...
7f9d40200 3fd200 1 ___U_lA____Ma_b_______t____________
7f9d40201 3fd201 1ff ______________________t____________
...
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000000400000 511 1 ______________________t____________ thp
0x0000000000405868 1 0 ___U_lA____Ma_b_______t____________ uptodate,lru,active,mmap,anonymous,swapbacked,thp
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
Documentation/vm/page-types.c | 2 ++
Documentation/vm/pagemap.txt | 4 ++++
2 files changed, 6 insertions(+), 0 deletions(-)
diff --git 3.2-rc5.orig/Documentation/vm/page-types.c 3.2-rc5/Documentation/vm/page-types.c
index 7445caa..0b13f02 100644
--- 3.2-rc5.orig/Documentation/vm/page-types.c
+++ 3.2-rc5/Documentation/vm/page-types.c
@@ -98,6 +98,7 @@
#define KPF_HWPOISON 19
#define KPF_NOPAGE 20
#define KPF_KSM 21
+#define KPF_THP 22
/* [32-] kernel hacking assistances */
#define KPF_RESERVED 32
@@ -147,6 +148,7 @@ static const char *page_flag_names[] = {
[KPF_HWPOISON] = "X:hwpoison",
[KPF_NOPAGE] = "n:nopage",
[KPF_KSM] = "x:ksm",
+ [KPF_THP] = "t:thp",
[KPF_RESERVED] = "r:reserved",
[KPF_MLOCKED] = "m:mlocked",
diff --git 3.2-rc5.orig/Documentation/vm/pagemap.txt 3.2-rc5/Documentation/vm/pagemap.txt
index df09b96..666d12b 100644
--- 3.2-rc5.orig/Documentation/vm/pagemap.txt
+++ 3.2-rc5/Documentation/vm/pagemap.txt
@@ -60,6 +60,7 @@ There are three components to pagemap:
19. HWPOISON
20. NOPAGE
21. KSM
+ 22. THP
Short descriptions to the page flags:
@@ -97,6 +98,9 @@ Short descriptions to the page flags:
21. KSM
identical memory pages dynamically shared between one or more processes
+22. THP
+ continuous pages which construct transparent hugepages
+
[IO related page flags]
1. ERROR IO error occurred
3. UPTODATE page has up-to-date data
--
1.7.7.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 18:38 ` [RFC][PATCH 2/3] pagemap: export KPF_THP Naoya Horiguchi
@ 2011-12-19 18:40 ` Andi Kleen
2011-12-19 18:49 ` David Rientjes
` (2 more replies)
2011-12-19 19:24 ` KOSAKI Motohiro
2011-12-20 3:35 ` Wu Fengguang
2 siblings, 3 replies; 23+ messages in thread
From: Andi Kleen @ 2011-12-19 18:40 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: linux-mm, Andi Kleen, Wu Fengguang, Andrea Arcangeli,
KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel
> diff --git 3.2-rc5.orig/fs/proc/page.c 3.2-rc5/fs/proc/page.c
> index 6d8e6a9..d436fc6 100644
> --- 3.2-rc5.orig/fs/proc/page.c
> +++ 3.2-rc5/fs/proc/page.c
> @@ -116,6 +116,11 @@ u64 stable_page_flags(struct page *page)
> if (PageHuge(page))
> u |= 1 << KPF_HUGE;
>
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + if (PageTransCompound(page))
> + u |= 1 << KPF_THP;
> +#endif
It would be better to have PageTransCompound be a dummy (always 0)
for !CONFIG_TRANSPARENT_HUGEPAGE and KPF_THP always defined.
This would keep ifdefery in the headers.
-Andi
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap
2011-12-19 18:38 ` [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
@ 2011-12-19 18:45 ` Andi Kleen
2011-12-19 18:48 ` David Rientjes
1 sibling, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2011-12-19 18:45 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: linux-mm, Andi Kleen, Wu Fengguang, Andrea Arcangeli,
KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel
On Mon, Dec 19, 2011 at 01:38:37PM -0500, Naoya Horiguchi wrote:
> Thp split is not necessary if we explicitly check whether pmds are
> mapping thps or not. This patch introduces the check and the code
> to generate pagemap entries for pmds mapping thps, which results in
> less performance impact of pagemap on thp.
Looks good.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
-Andi
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap
2011-12-19 18:38 ` [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
2011-12-19 18:45 ` Andi Kleen
@ 2011-12-19 18:48 ` David Rientjes
2011-12-19 19:16 ` Naoya Horiguchi
1 sibling, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-12-19 18:48 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: linux-mm, Andi Kleen, Wu Fengguang, Andrea Arcangeli,
KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel
On Mon, 19 Dec 2011, Naoya Horiguchi wrote:
> diff --git 3.2-rc5.orig/fs/proc/task_mmu.c 3.2-rc5/fs/proc/task_mmu.c
> index e418c5a..90c4b7a 100644
> --- 3.2-rc5.orig/fs/proc/task_mmu.c
> +++ 3.2-rc5/fs/proc/task_mmu.c
> @@ -600,6 +600,9 @@ struct pagemapread {
> u64 *buffer;
> };
>
> +#define PAGEMAP_WALK_SIZE (PMD_SIZE)
> +#define PAGEMAP_WALK_MASK (PMD_MASK)
> +
> #define PM_ENTRY_BYTES sizeof(u64)
> #define PM_STATUS_BITS 3
> #define PM_STATUS_OFFSET (64 - PM_STATUS_BITS)
> @@ -658,6 +661,22 @@ static u64 pte_to_pagemap_entry(pte_t pte)
> return pme;
> }
>
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static u64 thp_pte_to_pagemap_entry(pte_t pte, int offset)
> +{
> + u64 pme = 0;
> + if (pte_present(pte))
> + pme = PM_PFRAME(pte_pfn(pte) + offset)
> + | PM_PSHIFT(PAGE_SHIFT) | PM_PRESENT;
> + return pme;
> +}
> +#else
> +static inline u64 thp_pte_to_pagemap_entry(pte_t pte, int offset)
> +{
> + return 0;
> +}
> +#endif
> +
> static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> struct mm_walk *walk)
> {
> @@ -666,10 +685,33 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> pte_t *pte;
> int err = 0;
>
> - split_huge_page_pmd(walk->mm, pmd);
> -
> /* find the first VMA at or above 'addr' */
> vma = find_vma(walk->mm, addr);
> +
> + spin_lock(&walk->mm->page_table_lock);
pagemap_pte_range() could potentially be called a _lot_, so I'd recommend
optimizing this by checking for pmd_trans_huge() prior to taking
page_table_lock and then rechecking after grabbing it with likely().
> + if (pmd_trans_huge(*pmd)) {
> + if (pmd_trans_splitting(*pmd)) {
> + spin_unlock(&walk->mm->page_table_lock);
> + wait_split_huge_page(vma->anon_vma, pmd);
> + } else {
> + u64 pfn = PM_NOT_PRESENT;
This doesn't need to be initialized and it would probably be better to
declare "pfn" at the top-level of this function since it's later used for
the non-thp case.
> +
> + for (; addr != end; addr += PAGE_SIZE) {
> + int offset = (addr & ~PAGEMAP_WALK_MASK)
> + >> PAGE_SHIFT;
> + pfn = thp_pte_to_pagemap_entry(*(pte_t *)pmd,
> + offset);
> + err = add_to_pagemap(addr, pfn, pm);
> + if (err)
> + break;
> + }
> + spin_unlock(&walk->mm->page_table_lock);
> + return err;
> + }
> + } else {
> + spin_unlock(&walk->mm->page_table_lock);
> + }
> +
> for (; addr != end; addr += PAGE_SIZE) {
> u64 pfn = PM_NOT_PRESENT;
>
> @@ -754,8 +796,6 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
> * determine which areas of memory are actually mapped and llseek to
> * skip over unmapped regions.
> */
> -#define PAGEMAP_WALK_SIZE (PMD_SIZE)
> -#define PAGEMAP_WALK_MASK (PMD_MASK)
> static ssize_t pagemap_read(struct file *file, char __user *buf,
> size_t count, loff_t *ppos)
> {
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 18:40 ` Andi Kleen
@ 2011-12-19 18:49 ` David Rientjes
2011-12-19 18:55 ` Naoya Horiguchi
2011-12-19 19:05 ` Andrea Arcangeli
2 siblings, 0 replies; 23+ messages in thread
From: David Rientjes @ 2011-12-19 18:49 UTC (permalink / raw)
To: Andi Kleen
Cc: Naoya Horiguchi, linux-mm, Wu Fengguang, Andrea Arcangeli,
KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel
On Mon, 19 Dec 2011, Andi Kleen wrote:
> It would be better to have PageTransCompound be a dummy (always 0)
> for !CONFIG_TRANSPARENT_HUGEPAGE
It already is.
> and KPF_THP always defined.
> This would keep ifdefery in the headers.
>
Agreed.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it
2011-12-19 18:38 ` [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it Naoya Horiguchi
@ 2011-12-19 18:51 ` David Rientjes
2011-12-20 3:41 ` Wu Fengguang
1 sibling, 0 replies; 23+ messages in thread
From: David Rientjes @ 2011-12-19 18:51 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: linux-mm, Andi Kleen, Wu Fengguang, Andrea Arcangeli,
KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel
On Mon, 19 Dec 2011, Naoya Horiguchi wrote:
> @@ -97,6 +98,9 @@ Short descriptions to the page flags:
> 21. KSM
> identical memory pages dynamically shared between one or more processes
>
> +22. THP
> + continuous pages which construct transparent hugepages
> +
> [IO related page flags]
> 1. ERROR IO error occurred
> 3. UPTODATE page has up-to-date data
s/continuous/contiguous/
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 18:40 ` Andi Kleen
2011-12-19 18:49 ` David Rientjes
@ 2011-12-19 18:55 ` Naoya Horiguchi
2011-12-19 19:05 ` Andrea Arcangeli
2 siblings, 0 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 18:55 UTC (permalink / raw)
To: Andi Kleen
Cc: Naoya Horiguchi, linux-mm, Wu Fengguang, Andrea Arcangeli,
KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel
Hi,
On Mon, Dec 19, 2011 at 07:40:47PM +0100, Andi Kleen wrote:
> > diff --git 3.2-rc5.orig/fs/proc/page.c 3.2-rc5/fs/proc/page.c
> > index 6d8e6a9..d436fc6 100644
> > --- 3.2-rc5.orig/fs/proc/page.c
> > +++ 3.2-rc5/fs/proc/page.c
> > @@ -116,6 +116,11 @@ u64 stable_page_flags(struct page *page)
> > if (PageHuge(page))
> > u |= 1 << KPF_HUGE;
> >
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > + if (PageTransCompound(page))
> > + u |= 1 << KPF_THP;
> > +#endif
>
> It would be better to have PageTransCompound be a dummy (always 0)
> for !CONFIG_TRANSPARENT_HUGEPAGE and KPF_THP always defined.
> This would keep ifdefery in the headers.
OK, I'll do it in the next post.
Thanks for the feedback.
Naoya
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 18:40 ` Andi Kleen
2011-12-19 18:49 ` David Rientjes
2011-12-19 18:55 ` Naoya Horiguchi
@ 2011-12-19 19:05 ` Andrea Arcangeli
2 siblings, 0 replies; 23+ messages in thread
From: Andrea Arcangeli @ 2011-12-19 19:05 UTC (permalink / raw)
To: Andi Kleen
Cc: Naoya Horiguchi, linux-mm, Wu Fengguang, KOSAKI Motohiro,
KAMEZAWA Hiroyuki, linux-kernel
On Mon, Dec 19, 2011 at 07:40:47PM +0100, Andi Kleen wrote:
> > diff --git 3.2-rc5.orig/fs/proc/page.c 3.2-rc5/fs/proc/page.c
> > index 6d8e6a9..d436fc6 100644
> > --- 3.2-rc5.orig/fs/proc/page.c
> > +++ 3.2-rc5/fs/proc/page.c
> > @@ -116,6 +116,11 @@ u64 stable_page_flags(struct page *page)
> > if (PageHuge(page))
> > u |= 1 << KPF_HUGE;
> >
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > + if (PageTransCompound(page))
> > + u |= 1 << KPF_THP;
> > +#endif
>
> It would be better to have PageTransCompound be a dummy (always 0)
> for !CONFIG_TRANSPARENT_HUGEPAGE and KPF_THP always defined.
It's already the case, that's the whole point of using
PageTransCompound instead of PageCompound (the former defines to 0 is
the config option is disabled).
> This would keep ifdefery in the headers.
Yes the #ifdef can go already.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap
2011-12-19 18:48 ` David Rientjes
@ 2011-12-19 19:16 ` Naoya Horiguchi
0 siblings, 0 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 19:16 UTC (permalink / raw)
To: David Rientjes
Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Wu Fengguang,
Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
linux-kernel
On Mon, Dec 19, 2011 at 10:48:17AM -0800, David Rientjes wrote:
> On Mon, 19 Dec 2011, Naoya Horiguchi wrote:
...
> > @@ -666,10 +685,33 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> > pte_t *pte;
> > int err = 0;
> >
> > - split_huge_page_pmd(walk->mm, pmd);
> > -
> > /* find the first VMA at or above 'addr' */
> > vma = find_vma(walk->mm, addr);
> > +
> > + spin_lock(&walk->mm->page_table_lock);
>
> pagemap_pte_range() could potentially be called a _lot_, so I'd recommend
> optimizing this by checking for pmd_trans_huge() prior to taking
> page_table_lock and then rechecking after grabbing it with likely().
OK, I'll try it.
Similar logic is used on smaps_pte_range() and gather_pte_stats(),
so I think it's better to write a separate patch for this optimization.
> > + if (pmd_trans_huge(*pmd)) {
> > + if (pmd_trans_splitting(*pmd)) {
> > + spin_unlock(&walk->mm->page_table_lock);
> > + wait_split_huge_page(vma->anon_vma, pmd);
> > + } else {
> > + u64 pfn = PM_NOT_PRESENT;
>
> This doesn't need to be initialized and it would probably be better to
> declare "pfn" at the top-level of this function since it's later used for
> the non-thp case.
Agreed.
I unify two declarations of "pfn" at the beginning of the function.
Thank you for nice feedback.
Naoya
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 18:38 ` [RFC][PATCH 2/3] pagemap: export KPF_THP Naoya Horiguchi
2011-12-19 18:40 ` Andi Kleen
@ 2011-12-19 19:24 ` KOSAKI Motohiro
2011-12-19 20:26 ` Naoya Horiguchi
2011-12-19 20:31 ` Dave Hansen
2011-12-20 3:35 ` Wu Fengguang
2 siblings, 2 replies; 23+ messages in thread
From: KOSAKI Motohiro @ 2011-12-19 19:24 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: linux-mm, Andi Kleen, Wu Fengguang, Andrea Arcangeli,
KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel
(12/19/11 1:38 PM), Naoya Horiguchi wrote:
> This flag shows that a given pages is a subpage of transparent hugepage.
> It does not care about whether it is a head page or a tail page, because
> it's clear from pfn of the target page which you should know when you read
> /proc/kpageflags.
>
> Signed-off-by: Naoya Horiguchi<n-horiguchi@ah.jp.nec.com>
NAK.
The detail of transparent hugepage are hidden by design. We hope it
keep 'transparent'.
Until any explain why we should expose KPF_THP, we don't agree it.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 19:24 ` KOSAKI Motohiro
@ 2011-12-19 20:26 ` Naoya Horiguchi
2011-12-19 20:48 ` KOSAKI Motohiro
2011-12-19 20:31 ` Dave Hansen
1 sibling, 1 reply; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 20:26 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Wu Fengguang,
Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
linux-kernel
Hi,
On Mon, Dec 19, 2011 at 02:24:53PM -0500, KOSAKI Motohiro wrote:
> (12/19/11 1:38 PM), Naoya Horiguchi wrote:
> > This flag shows that a given pages is a subpage of transparent hugepage.
> > It does not care about whether it is a head page or a tail page, because
> > it's clear from pfn of the target page which you should know when you read
> > /proc/kpageflags.
> >
> > Signed-off-by: Naoya Horiguchi<n-horiguchi@ah.jp.nec.com>
>
> NAK.
>
> The detail of transparent hugepage are hidden by design. We hope it
> keep 'transparent'.
> Until any explain why we should expose KPF_THP, we don't agree it.
The reason why I want to know physical address of thp is testing.
I'm working on memory error recovery and writing test code to confirm
that memory recovery really works when an error occurs on thps.
There I need to locate thps on the physical memory.
IMO, transparency in thp means that we need no manual setup to use
it (as a contrast with hugetlbfs,) so it seems to me that exporting
pageflag of thp does not break the design of thp.
Anyway, I should have written the purpose in the patch description.
Thanks for the comment.
Naoya
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 19:24 ` KOSAKI Motohiro
2011-12-19 20:26 ` Naoya Horiguchi
@ 2011-12-19 20:31 ` Dave Hansen
2011-12-19 20:45 ` KOSAKI Motohiro
1 sibling, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2011-12-19 20:31 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Wu Fengguang,
Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
linux-kernel
On 12/19/2011 11:24 AM, KOSAKI Motohiro wrote:
> (12/19/11 1:38 PM), Naoya Horiguchi wrote:
>> This flag shows that a given pages is a subpage of transparent hugepage.
>> It does not care about whether it is a head page or a tail page, because
>> it's clear from pfn of the target page which you should know when you read
>> /proc/kpageflags.
>>
>> Signed-off-by: Naoya Horiguchi<n-horiguchi@ah.jp.nec.com>
>
> NAK.
>
> The detail of transparent hugepage are hidden by design. We hope it
> keep 'transparent'.
> Until any explain why we should expose KPF_THP, we don't agree it.
Transparent shouldn't mean "undebuggable", though. :)
Let's say you profiled a application and the data shows you're missing
the TLB a bunch, but you're also using THP. This might give you a shot
at figuring out which parts of your application are *TRULY* THP-backed
instead of just the areas you *think* are backed.
I'm not sure there's another way to figure it out at the moment.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 20:31 ` Dave Hansen
@ 2011-12-19 20:45 ` KOSAKI Motohiro
2011-12-19 20:57 ` Dave Hansen
0 siblings, 1 reply; 23+ messages in thread
From: KOSAKI Motohiro @ 2011-12-19 20:45 UTC (permalink / raw)
To: Dave Hansen
Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Wu Fengguang,
Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
linux-kernel
(12/19/11 3:31 PM), Dave Hansen wrote:
> On 12/19/2011 11:24 AM, KOSAKI Motohiro wrote:
>> (12/19/11 1:38 PM), Naoya Horiguchi wrote:
>>> This flag shows that a given pages is a subpage of transparent hugepage.
>>> It does not care about whether it is a head page or a tail page, because
>>> it's clear from pfn of the target page which you should know when you read
>>> /proc/kpageflags.
>>>
>>> Signed-off-by: Naoya Horiguchi<n-horiguchi@ah.jp.nec.com>
>>
>> NAK.
>>
>> The detail of transparent hugepage are hidden by design. We hope it
>> keep 'transparent'.
>> Until any explain why we should expose KPF_THP, we don't agree it.
>
> Transparent shouldn't mean "undebuggable", though. :)
>
> Let's say you profiled a application and the data shows you're missing
> the TLB a bunch, but you're also using THP. This might give you a shot
> at figuring out which parts of your application are *TRULY* THP-backed
> instead of just the areas you *think* are backed.
>
> I'm not sure there's another way to figure it out at the moment.
A snapshot status of THP doesn't help your purpose. I think you need
perf or similar profiling subsystem enhancement.
Because of, if you've seen KPF_THP at once, It has no guarantee to keep
hugepages until applications run. Opposite, If you only need rough
statistics, the best way is to add some new stat to
/sys/kernel/mm/transparent_hugepage.
I don't think your usecase and current proposal are matched.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 20:26 ` Naoya Horiguchi
@ 2011-12-19 20:48 ` KOSAKI Motohiro
2011-12-19 21:20 ` Naoya Horiguchi
0 siblings, 1 reply; 23+ messages in thread
From: KOSAKI Motohiro @ 2011-12-19 20:48 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: linux-mm, Andi Kleen, Wu Fengguang, Andrea Arcangeli,
KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel
>> The detail of transparent hugepage are hidden by design. We hope it
>> keep 'transparent'.
>> Until any explain why we should expose KPF_THP, we don't agree it.
>
> The reason why I want to know physical address of thp is testing.
> I'm working on memory error recovery and writing test code to confirm
> that memory recovery really works when an error occurs on thps.
> There I need to locate thps on the physical memory.
I'm sorry, I simply don't understand what you say. Why do you think
memory recovery and thp are related feature?
>
> IMO, transparency in thp means that we need no manual setup to use
> it (as a contrast with hugetlbfs,) so it seems to me that exporting
> pageflag of thp does not break the design of thp.
>
> Anyway, I should have written the purpose in the patch description.
> Thanks for the comment.
>
> Naoya
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 20:45 ` KOSAKI Motohiro
@ 2011-12-19 20:57 ` Dave Hansen
2011-12-19 21:23 ` Andrea Arcangeli
0 siblings, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2011-12-19 20:57 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Wu Fengguang,
Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
linux-kernel
On 12/19/2011 12:45 PM, KOSAKI Motohiro wrote:
> (12/19/11 3:31 PM), Dave Hansen wrote:
>> Let's say you profiled a application and the data shows you're missing
>> the TLB a bunch, but you're also using THP. This might give you a shot
>> at figuring out which parts of your application are *TRULY* THP-backed
>> instead of just the areas you *think* are backed.
>>
>> I'm not sure there's another way to figure it out at the moment.
>
> A snapshot status of THP doesn't help your purpose. I think you need
> perf or similar profiling subsystem enhancement.
>
> Because of, if you've seen KPF_THP at once, It has no guarantee to keep
> hugepages until applications run. Opposite, If you only need rough
> statistics, the best way is to add some new stat to
> /sys/kernel/mm/transparent_hugepage.
But, every single one of the pagemap flags is really just a snapshot
KPF_DIRTY, KPF_LOCKED, etc... The entire interface is inherently a racy
snapshot, and there's not a whole lot you can do about it.
sys_mincore() has the exact same issues. But, that does not make them
useless, nor mean they shouldn't be in the kernel.
A tracepoint or something similar to watch for THP promotions or
demotions would be a great addition to this interface. That way, you at
least have a concept if the data you got has become stale.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 20:48 ` KOSAKI Motohiro
@ 2011-12-19 21:20 ` Naoya Horiguchi
0 siblings, 0 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-19 21:20 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Wu Fengguang,
Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
linux-kernel
On Mon, Dec 19, 2011 at 03:48:08PM -0500, KOSAKI Motohiro wrote:
> >> The detail of transparent hugepage are hidden by design. We hope it
> >> keep 'transparent'.
> >> Until any explain why we should expose KPF_THP, we don't agree it.
> >
> > The reason why I want to know physical address of thp is testing.
> > I'm working on memory error recovery and writing test code to confirm
> > that memory recovery really works when an error occurs on thps.
> > There I need to locate thps on the physical memory.
>
> I'm sorry, I simply don't understand what you say. Why do you think
> memory recovery and thp are related feature?
This is because memory error can occur on thp and then we should be
able to handle them (otherwise we can use corrupted date) and
verify that it works.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 20:57 ` Dave Hansen
@ 2011-12-19 21:23 ` Andrea Arcangeli
0 siblings, 0 replies; 23+ messages in thread
From: Andrea Arcangeli @ 2011-12-19 21:23 UTC (permalink / raw)
To: Dave Hansen
Cc: KOSAKI Motohiro, Naoya Horiguchi, linux-mm, Andi Kleen,
Wu Fengguang, KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel
On Mon, Dec 19, 2011 at 12:57:01PM -0800, Dave Hansen wrote:
> But, every single one of the pagemap flags is really just a snapshot
> KPF_DIRTY, KPF_LOCKED, etc... The entire interface is inherently a racy
> snapshot, and there's not a whole lot you can do about it.
Having read the discussion, while I don't see a big need of the
KPF_THP, I also see how it in certain corner cases it can be used to
test memory failure injection and I agree with you on the above. Maybe
it can also be used to check if at certain virtual offsets
(pid/pagemap lookup followed by a kpageflags lookup) we always fail to
find THP inside big vmas, maybe out of not aligned mprotect that may
be optimized by aligning it.
The other kernel internal bits may also be stale and go away quicker
than the KPF_THP, so I don't see a problem in exposing it. We also
provide THP related info in meminfo/smaps, if they were supposed to be
invisible that wouldn't be allowed too.
A bigger concern to me is that the new bitfield alters the protocol,
but old code by adding one more bit (if sanely coded...) shouldn't break.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-19 18:38 ` [RFC][PATCH 2/3] pagemap: export KPF_THP Naoya Horiguchi
2011-12-19 18:40 ` Andi Kleen
2011-12-19 19:24 ` KOSAKI Motohiro
@ 2011-12-20 3:35 ` Wu Fengguang
2011-12-20 18:10 ` Naoya Horiguchi
2 siblings, 1 reply; 23+ messages in thread
From: Wu Fengguang @ 2011-12-20 3:35 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: linux-mm, Andi Kleen, Andrea Arcangeli, KOSAKI Motohiro,
KAMEZAWA Hiroyuki, linux-kernel
On Tue, Dec 20, 2011 at 02:38:38AM +0800, Naoya Horiguchi wrote:
> This flag shows that a given pages is a subpage of transparent hugepage.
> It does not care about whether it is a head page or a tail page, because
> it's clear from pfn of the target page which you should know when you read
> /proc/kpageflags.
OK, this is aligning with KPF_HUGE. For those who only care about
head/tail pages, will the KPF_COMPOUND_HEAD/KPF_COMPOUND_TAIL flags be
set automatically for thp? Which may be more convenient to test/filter
than the page address.
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
As you already discussed, the below #ifdef should be removed.
In fact, kernel-page-flags.h is intended for direct inclusion by
user space tools, so must not have any conditional defines.
> --- 3.2-rc5.orig/include/linux/kernel-page-flags.h
> +++ 3.2-rc5/include/linux/kernel-page-flags.h
> @@ -31,6 +31,10 @@
>
> #define KPF_KSM 21
>
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#define KPF_THP 22
> +#endif
> +
> /* kernel hacking assistances
> * WARNING: subject to change, never rely on them!
> */
> --
> 1.7.7.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it
2011-12-19 18:38 ` [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it Naoya Horiguchi
2011-12-19 18:51 ` David Rientjes
@ 2011-12-20 3:41 ` Wu Fengguang
1 sibling, 0 replies; 23+ messages in thread
From: Wu Fengguang @ 2011-12-20 3:41 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: linux-mm, Andi Kleen, Andrea Arcangeli, KOSAKI Motohiro,
KAMEZAWA Hiroyuki, linux-kernel
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Thanks!
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC][PATCH 2/3] pagemap: export KPF_THP
2011-12-20 3:35 ` Wu Fengguang
@ 2011-12-20 18:10 ` Naoya Horiguchi
0 siblings, 0 replies; 23+ messages in thread
From: Naoya Horiguchi @ 2011-12-20 18:10 UTC (permalink / raw)
To: Wu Fengguang
Cc: Naoya Horiguchi, linux-mm, Andi Kleen, Andrea Arcangeli,
KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel
On Tue, Dec 20, 2011 at 11:35:37AM +0800, Wu Fengguang wrote:
> On Tue, Dec 20, 2011 at 02:38:38AM +0800, Naoya Horiguchi wrote:
> > This flag shows that a given pages is a subpage of transparent hugepage.
> > It does not care about whether it is a head page or a tail page, because
> > it's clear from pfn of the target page which you should know when you read
> > /proc/kpageflags.
>
> OK, this is aligning with KPF_HUGE. For those who only care about
> head/tail pages, will the KPF_COMPOUND_HEAD/KPF_COMPOUND_TAIL flags be
> set automatically for thp? Which may be more convenient to test/filter
> than the page address.
Yes, both of KPF_COMPOUND_HEAD/TAIL flags are automatically set for thp.
So above patch description was wrong and should be fixed.
(I didn't notice that because page-types hid compound_head/tail flags
without --raw flag. Sorry for confusion.)
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>
> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Thank you.
Naoya
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2011-12-20 18:11 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-19 18:38 [RFC][PATCH 0/3] pagemap handles transparent hugepage Naoya Horiguchi
2011-12-19 18:38 ` [RFC][PATCH 1/3] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
2011-12-19 18:45 ` Andi Kleen
2011-12-19 18:48 ` David Rientjes
2011-12-19 19:16 ` Naoya Horiguchi
2011-12-19 18:38 ` [RFC][PATCH 2/3] pagemap: export KPF_THP Naoya Horiguchi
2011-12-19 18:40 ` Andi Kleen
2011-12-19 18:49 ` David Rientjes
2011-12-19 18:55 ` Naoya Horiguchi
2011-12-19 19:05 ` Andrea Arcangeli
2011-12-19 19:24 ` KOSAKI Motohiro
2011-12-19 20:26 ` Naoya Horiguchi
2011-12-19 20:48 ` KOSAKI Motohiro
2011-12-19 21:20 ` Naoya Horiguchi
2011-12-19 20:31 ` Dave Hansen
2011-12-19 20:45 ` KOSAKI Motohiro
2011-12-19 20:57 ` Dave Hansen
2011-12-19 21:23 ` Andrea Arcangeli
2011-12-20 3:35 ` Wu Fengguang
2011-12-20 18:10 ` Naoya Horiguchi
2011-12-19 18:38 ` [RFC][PATCH 3/3] pagemap: document KPF_THP and show make page-types aware of it Naoya Horiguchi
2011-12-19 18:51 ` David Rientjes
2011-12-20 3:41 ` Wu Fengguang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).