linux-kernel.vger.kernel.org archive mirror
* [PATCH, RFC 0/9] Introduce huge zero page
@ 2012-08-09  9:08 Kirill A. Shutemov
  2012-08-09  9:08 ` [PATCH, RFC 1/9] thp: huge zero page: basic preparation Kirill A. Shutemov
                   ` (10 more replies)
  0 siblings, 11 replies; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-09  9:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, linux-mm
  Cc: Andi Kleen, H. Peter Anvin, linux-kernel, Kirill A. Shutemov,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

During testing I noticed big (up to 2.5 times) memory consumption overhead
on some workloads (e.g. ft.A from NPB) if THP is enabled.

The main reason for the big difference is the lack of a zero page in the
THP case: we have to allocate a real huge page on a read page fault.

A program to demonstrate the issue:
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

#define MB (1024*1024)

int main(int argc, char **argv)
{
        char *p;
        int i;

        assert(posix_memalign((void **)&p, 2 * MB, 200 * MB) == 0);
        for (i = 0; i < 200 * MB; i += 4096)
                assert(p[i] == 0);
        pause();
        return 0;
}

With thp-never RSS is about 400k, but with thp-always it's 200M.
After the patchset, thp-always RSS is 400k too.

H. Peter Anvin proposed to use a "virtual huge zero page" -- a pmd table
with all ptes set to the 4k zero page. I haven't tried that approach and I'm
not sure it's a good idea (cache vs. TLB thrashing), and I guess it would
require more code to handle.
For now, I just allocate a real 2M page and use it.

Kirill A. Shutemov (9):
  thp: huge zero page: basic preparation
  thp: zap_huge_pmd(): zap huge zero pmd
  thp: copy_huge_pmd(): copy huge zero page
  thp: do_huge_pmd_wp_page(): handle huge zero page
  thp: change_huge_pmd(): keep huge zero page write-protected
  thp: add address parameter to split_huge_page_pmd()
  thp: implement splitting pmd for huge zero page
  thp: setup huge zero page on non-write page fault
  thp: lazy huge zero page allocation

 Documentation/vm/transhuge.txt |    4 +-
 arch/x86/kernel/vm86_32.c      |    2 +-
 fs/proc/task_mmu.c             |    2 +-
 include/linux/huge_mm.h        |   10 +-
 include/linux/mm.h             |    8 ++
 mm/huge_memory.c               |  228 +++++++++++++++++++++++++++++++++++-----
 mm/memory.c                    |   11 +--
 mm/mempolicy.c                 |    2 +-
 mm/mprotect.c                  |    2 +-
 mm/mremap.c                    |    3 +-
 mm/pagewalk.c                  |    2 +-
 11 files changed, 226 insertions(+), 48 deletions(-)

-- 
1.7.7.6


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH, RFC 1/9] thp: huge zero page: basic preparation
  2012-08-09  9:08 [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
@ 2012-08-09  9:08 ` Kirill A. Shutemov
  2012-08-09  9:08 ` [PATCH, RFC 2/9] thp: zap_huge_pmd(): zap huge zero pmd Kirill A. Shutemov
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-09  9:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, linux-mm
  Cc: Andi Kleen, H. Peter Anvin, linux-kernel, Kirill A. Shutemov,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

For now, let's allocate the page in hugepage_init(). We'll switch to lazy
allocation later.

We are not going to map the huge zero page until we can handle it
properly on all code paths.

The is_huge_zero_{pfn,pmd}() helpers will be used by the following patches
to check whether a given pfn/pmd is the huge zero page.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 57c4b93..88e0a7a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -46,6 +46,7 @@ static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
 /* during fragmentation poll the hugepage allocator once every minute */
 static unsigned int khugepaged_alloc_sleep_millisecs __read_mostly = 60000;
 static struct task_struct *khugepaged_thread __read_mostly;
+static unsigned long huge_zero_pfn __read_mostly;
 static DEFINE_MUTEX(khugepaged_mutex);
 static DEFINE_SPINLOCK(khugepaged_mm_lock);
 static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
@@ -167,6 +168,28 @@ out:
 	return err;
 }
 
+static int init_huge_zero_page(void)
+{
+	struct page *hpage;
+
+	hpage = alloc_pages(GFP_TRANSHUGE | __GFP_ZERO, HPAGE_PMD_ORDER);
+	if (!hpage)
+		return -ENOMEM;
+
+	huge_zero_pfn = page_to_pfn(hpage);
+	return 0;
+}
+
+static inline bool is_huge_zero_pfn(unsigned long pfn)
+{
+	return pfn == huge_zero_pfn;
+}
+
+static inline bool is_huge_zero_pmd(pmd_t pmd)
+{
+	return is_huge_zero_pfn(pmd_pfn(pmd));
+}
+
 #ifdef CONFIG_SYSFS
 
 static ssize_t double_flag_show(struct kobject *kobj,
@@ -550,6 +573,10 @@ static int __init hugepage_init(void)
 	if (err)
 		return err;
 
+	err = init_huge_zero_page();
+	if (err)
+		goto out;
+
 	err = khugepaged_slab_init();
 	if (err)
 		goto out;
@@ -574,6 +601,8 @@ static int __init hugepage_init(void)
 
 	return 0;
 out:
+	if (huge_zero_pfn)
+		__free_page(pfn_to_page(huge_zero_pfn));
 	hugepage_exit_sysfs(hugepage_kobj);
 	return err;
 }
-- 
1.7.7.6



* [PATCH, RFC 2/9] thp: zap_huge_pmd(): zap huge zero pmd
  2012-08-09  9:08 [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
  2012-08-09  9:08 ` [PATCH, RFC 1/9] thp: huge zero page: basic preparation Kirill A. Shutemov
@ 2012-08-09  9:08 ` Kirill A. Shutemov
  2012-08-09  9:08 ` [PATCH, RFC 3/9] thp: copy_huge_pmd(): copy huge zero page Kirill A. Shutemov
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-09  9:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, linux-mm
  Cc: Andi Kleen, H. Peter Anvin, linux-kernel, Kirill A. Shutemov,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

We don't have a real page to zap in the huge zero page case. Let's just
clear the pmd and remove it from the TLB.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |   27 +++++++++++++++++----------
 1 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 88e0a7a..9dcb9e6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1071,16 +1071,23 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		struct page *page;
 		pgtable_t pgtable;
 		pgtable = get_pmd_huge_pte(tlb->mm);
-		page = pmd_page(*pmd);
-		pmd_clear(pmd);
-		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
-		page_remove_rmap(page);
-		VM_BUG_ON(page_mapcount(page) < 0);
-		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
-		VM_BUG_ON(!PageHead(page));
-		tlb->mm->nr_ptes--;
-		spin_unlock(&tlb->mm->page_table_lock);
-		tlb_remove_page(tlb, page);
+		if (is_huge_zero_pmd(*pmd)) {
+			pmd_clear(pmd);
+			tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+			tlb->mm->nr_ptes--;
+			spin_unlock(&tlb->mm->page_table_lock);
+		} else {
+			page = pmd_page(*pmd);
+			pmd_clear(pmd);
+			tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+			page_remove_rmap(page);
+			VM_BUG_ON(page_mapcount(page) < 0);
+			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+			VM_BUG_ON(!PageHead(page));
+			tlb->mm->nr_ptes--;
+			spin_unlock(&tlb->mm->page_table_lock);
+			tlb_remove_page(tlb, page);
+		}
 		pte_free(tlb->mm, pgtable);
 		ret = 1;
 	}
-- 
1.7.7.6



* [PATCH, RFC 3/9] thp: copy_huge_pmd(): copy huge zero page
  2012-08-09  9:08 [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
  2012-08-09  9:08 ` [PATCH, RFC 1/9] thp: huge zero page: basic preparation Kirill A. Shutemov
  2012-08-09  9:08 ` [PATCH, RFC 2/9] thp: zap_huge_pmd(): zap huge zero pmd Kirill A. Shutemov
@ 2012-08-09  9:08 ` Kirill A. Shutemov
  2012-08-09  9:08 ` [PATCH, RFC 4/9] thp: do_huge_pmd_wp_page(): handle " Kirill A. Shutemov
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-09  9:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, linux-mm
  Cc: Andi Kleen, H. Peter Anvin, linux-kernel, Kirill A. Shutemov,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Copying the huge zero page is easy: just set the destination pmd to the
huge zero page.

It's safe to add this handling now, since nothing maps the huge zero page
yet. :-p

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9dcb9e6..a534f84 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -725,6 +725,18 @@ static inline struct page *alloc_hugepage(int defrag)
 }
 #endif
 
+static void set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
+		struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd)
+{
+	pmd_t entry;
+	entry = pfn_pmd(huge_zero_pfn, vma->vm_page_prot);
+	entry = pmd_wrprotect(entry);
+	entry = pmd_mkhuge(entry);
+	set_pmd_at(mm, haddr, pmd, entry);
+	prepare_pmd_huge_pte(pgtable, mm);
+	mm->nr_ptes++;
+}
+
 int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			       unsigned long address, pmd_t *pmd,
 			       unsigned int flags)
@@ -802,6 +814,11 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pte_free(dst_mm, pgtable);
 		goto out_unlock;
 	}
+	if (is_huge_zero_pmd(pmd)) {
+		set_huge_zero_page(pgtable, dst_mm, vma, addr, dst_pmd);
+		ret = 0;
+		goto out_unlock;
+	}
 	if (unlikely(pmd_trans_splitting(pmd))) {
 		/* split huge page running from under us */
 		spin_unlock(&src_mm->page_table_lock);
-- 
1.7.7.6



* [PATCH, RFC 4/9] thp: do_huge_pmd_wp_page(): handle huge zero page
  2012-08-09  9:08 [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2012-08-09  9:08 ` [PATCH, RFC 3/9] thp: copy_huge_pmd(): copy huge zero page Kirill A. Shutemov
@ 2012-08-09  9:08 ` Kirill A. Shutemov
  2012-08-09  9:08 ` [PATCH, RFC 5/9] thp: change_huge_pmd(): keep huge zero page write-protected Kirill A. Shutemov
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-09  9:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, linux-mm
  Cc: Andi Kleen, H. Peter Anvin, linux-kernel, Kirill A. Shutemov,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

On write access to the huge zero page we allocate a new huge page and
clear it.

In the fallback path we create a new page table and point the pte for the
fault address at the newly allocated page. All other ptes are set to the
normal zero page.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h |    8 ++++
 mm/huge_memory.c   |  102 ++++++++++++++++++++++++++++++++++++++++++++--------
 mm/memory.c        |    7 ----
 3 files changed, 95 insertions(+), 22 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b36d08c..c6eef63 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -514,6 +514,14 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 }
 #endif
 
+#ifndef my_zero_pfn
+static inline unsigned long my_zero_pfn(unsigned long addr)
+{
+	extern unsigned long zero_pfn;
+	return zero_pfn;
+}
+#endif
+
 /*
  * Multiple processes may "see" the same page. E.g. for untouched
  * mappings of /dev/null, all processes see the same page full of
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a534f84..f5029d4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -867,6 +867,61 @@ pgtable_t get_pmd_huge_pte(struct mm_struct *mm)
 	return pgtable;
 }
 
+static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
+		struct vm_area_struct *vma, unsigned long address,
+		pmd_t *pmd, unsigned long haddr)
+{
+	pgtable_t pgtable;
+	pmd_t _pmd;
+	struct page *page;
+	int i, ret = 0;
+
+	page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+	if (!page) {
+		ret |= VM_FAULT_OOM;
+		goto out;
+	}
+
+	if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL)) {
+		put_page(page);
+		ret |= VM_FAULT_OOM;
+		goto out;
+	}
+
+	clear_user_highpage(page, address);
+	__SetPageUptodate(page);
+
+	spin_lock(&mm->page_table_lock);
+	pmdp_clear_flush_notify(vma, haddr, pmd);
+	/* leave pmd empty until pte is filled */
+
+	pgtable = get_pmd_huge_pte(mm);
+	pmd_populate(mm, &_pmd, pgtable);
+
+	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+		pte_t *pte, entry;
+		if (haddr == (address & PAGE_MASK)) {
+			entry = mk_pte(page, vma->vm_page_prot);
+			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+			page_add_new_anon_rmap(page, vma, haddr);
+		} else {
+			entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
+			entry = pte_mkspecial(entry);
+		}
+		pte = pte_offset_map(&_pmd, haddr);
+		VM_BUG_ON(!pte_none(*pte));
+		set_pte_at(mm, haddr, pte, entry);
+		pte_unmap(pte);
+	}
+	smp_wmb(); /* make pte visible before pmd */
+	pmd_populate(mm, pmd, pgtable);
+	spin_unlock(&mm->page_table_lock);
+
+	ret |= VM_FAULT_WRITE;
+out:
+	return ret;
+}
+
 static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
 					struct vm_area_struct *vma,
 					unsigned long address,
@@ -964,17 +1019,19 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, pmd_t *pmd, pmd_t orig_pmd)
 {
 	int ret = 0;
-	struct page *page, *new_page;
+	struct page *page = NULL, *new_page;
 	unsigned long haddr;
 
 	VM_BUG_ON(!vma->anon_vma);
+	haddr = address & HPAGE_PMD_MASK;
+	if (is_huge_zero_pmd(orig_pmd))
+		goto alloc;
 	spin_lock(&mm->page_table_lock);
 	if (unlikely(!pmd_same(*pmd, orig_pmd)))
 		goto out_unlock;
 
 	page = pmd_page(orig_pmd);
 	VM_BUG_ON(!PageCompound(page) || !PageHead(page));
-	haddr = address & HPAGE_PMD_MASK;
 	if (page_mapcount(page) == 1) {
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
@@ -986,7 +1043,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	}
 	get_page(page);
 	spin_unlock(&mm->page_table_lock);
-
+alloc:
 	if (transparent_hugepage_enabled(vma) &&
 	    !transparent_hugepage_debug_cow())
 		new_page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
@@ -996,28 +1053,39 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	if (unlikely(!new_page)) {
 		count_vm_event(THP_FAULT_FALLBACK);
-		ret = do_huge_pmd_wp_page_fallback(mm, vma, address,
-						   pmd, orig_pmd, page, haddr);
-		if (ret & VM_FAULT_OOM)
-			split_huge_page(page);
-		put_page(page);
+		if (is_huge_zero_pmd(orig_pmd)) {
+			ret = do_huge_pmd_wp_zero_page_fallback(mm, vma,
+					address, pmd, haddr);
+		} else {
+			ret = do_huge_pmd_wp_page_fallback(mm, vma, address,
+					pmd, orig_pmd, page, haddr);
+			if (ret & VM_FAULT_OOM)
+				split_huge_page(page);
+			put_page(page);
+		}
 		goto out;
 	}
 	count_vm_event(THP_FAULT_ALLOC);
 
 	if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))) {
 		put_page(new_page);
-		split_huge_page(page);
-		put_page(page);
+		if (page) {
+			split_huge_page(page);
+			put_page(page);
+		}
 		ret |= VM_FAULT_OOM;
 		goto out;
 	}
 
-	copy_user_huge_page(new_page, page, haddr, vma, HPAGE_PMD_NR);
+	if (is_huge_zero_pmd(orig_pmd))
+		clear_huge_page(new_page, haddr, HPAGE_PMD_NR);
+	else
+		copy_user_huge_page(new_page, page, haddr, vma, HPAGE_PMD_NR);
 	__SetPageUptodate(new_page);
 
 	spin_lock(&mm->page_table_lock);
-	put_page(page);
+	if (page)
+		put_page(page);
 	if (unlikely(!pmd_same(*pmd, orig_pmd))) {
 		spin_unlock(&mm->page_table_lock);
 		mem_cgroup_uncharge_page(new_page);
@@ -1025,7 +1093,6 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		goto out;
 	} else {
 		pmd_t entry;
-		VM_BUG_ON(!PageHead(page));
 		entry = mk_pmd(new_page, vma->vm_page_prot);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
 		entry = pmd_mkhuge(entry);
@@ -1033,8 +1100,13 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		page_add_new_anon_rmap(new_page, vma, haddr);
 		set_pmd_at(mm, haddr, pmd, entry);
 		update_mmu_cache(vma, address, entry);
-		page_remove_rmap(page);
-		put_page(page);
+		if (is_huge_zero_pmd(orig_pmd))
+			add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
+		if (page) {
+			VM_BUG_ON(!PageHead(page));
+			page_remove_rmap(page);
+			put_page(page);
+		}
 		ret |= VM_FAULT_WRITE;
 	}
 out_unlock:
diff --git a/mm/memory.c b/mm/memory.c
index 2466d12..c281847 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -720,13 +720,6 @@ static inline int is_zero_pfn(unsigned long pfn)
 }
 #endif
 
-#ifndef my_zero_pfn
-static inline unsigned long my_zero_pfn(unsigned long addr)
-{
-	return zero_pfn;
-}
-#endif
-
 /*
  * vm_normal_page -- This function gets the "struct page" associated with a pte.
  *
-- 
1.7.7.6



* [PATCH, RFC 5/9] thp: change_huge_pmd(): keep huge zero page write-protected
  2012-08-09  9:08 [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2012-08-09  9:08 ` [PATCH, RFC 4/9] thp: do_huge_pmd_wp_page(): handle " Kirill A. Shutemov
@ 2012-08-09  9:08 ` Kirill A. Shutemov
  2012-08-09  9:08 ` [PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd() Kirill A. Shutemov
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-09  9:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, linux-mm
  Cc: Andi Kleen, H. Peter Anvin, linux-kernel, Kirill A. Shutemov,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

We want to get a page fault on any write attempt to the huge zero page,
so let's keep it write-protected.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f5029d4..4001f1a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1248,6 +1248,8 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		pmd_t entry;
 		entry = pmdp_get_and_clear(mm, addr, pmd);
 		entry = pmd_modify(entry, newprot);
+		if (is_huge_zero_pmd(entry))
+			entry = pmd_wrprotect(entry);
 		set_pmd_at(mm, addr, pmd, entry);
 		spin_unlock(&vma->vm_mm->page_table_lock);
 		ret = 1;
-- 
1.7.7.6



* [PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd()
  2012-08-09  9:08 [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2012-08-09  9:08 ` [PATCH, RFC 5/9] thp: change_huge_pmd(): keep huge zero page write-protected Kirill A. Shutemov
@ 2012-08-09  9:08 ` Kirill A. Shutemov
  2012-08-16 19:42   ` Andrea Arcangeli
  2012-08-09  9:08 ` [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page Kirill A. Shutemov
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-09  9:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, linux-mm
  Cc: Andi Kleen, H. Peter Anvin, linux-kernel, Kirill A. Shutemov,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

The address parameter is required to implement huge zero pmd splitting.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/vm/transhuge.txt |    4 ++--
 arch/x86/kernel/vm86_32.c      |    2 +-
 fs/proc/task_mmu.c             |    2 +-
 include/linux/huge_mm.h        |   10 ++++++----
 mm/huge_memory.c               |    5 +++--
 mm/memory.c                    |    4 ++--
 mm/mempolicy.c                 |    2 +-
 mm/mprotect.c                  |    2 +-
 mm/mremap.c                    |    3 ++-
 mm/pagewalk.c                  |    2 +-
 10 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index f734bb2..b1fe2ca 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -276,7 +276,7 @@ unaffected. libhugetlbfs will also work fine as usual.
 == Graceful fallback ==
 
 Code walking pagetables but unware about huge pmds can simply call
-split_huge_page_pmd(mm, pmd) where the pmd is the one returned by
+split_huge_page_pmd(mm, pmd, addr) where the pmd is the one returned by
 pmd_offset. It's trivial to make the code transparent hugepage aware
 by just grepping for "pmd_offset" and adding split_huge_page_pmd where
 missing after pmd_offset returns the pmd. Thanks to the graceful
@@ -299,7 +299,7 @@ diff --git a/mm/mremap.c b/mm/mremap.c
 		return NULL;
 
 	pmd = pmd_offset(pud, addr);
-+	split_huge_page_pmd(mm, pmd);
++	split_huge_page_pmd(mm, pmd, addr);
 	if (pmd_none_or_clear_bad(pmd))
 		return NULL;
 
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 255f58a..719ba0c 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -180,7 +180,7 @@ static void mark_screen_rdonly(struct mm_struct *mm)
 	if (pud_none_or_clear_bad(pud))
 		goto out;
 	pmd = pmd_offset(pud, 0xA0000);
-	split_huge_page_pmd(mm, pmd);
+	split_huge_page_pmd(mm, pmd, 0xA0000);
 	if (pmd_none_or_clear_bad(pmd))
 		goto out;
 	pte = pte_offset_map_lock(mm, pmd, 0xA0000, &ptl);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 4540b8f..27c1827 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -597,7 +597,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 	spinlock_t *ptl;
 	struct page *page;
 
-	split_huge_page_pmd(walk->mm, pmd);
+	split_huge_page_pmd(walk->mm, pmd, addr);
 	if (pmd_trans_unstable(pmd))
 		return 0;
 
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4c59b11..ce91199 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -92,12 +92,14 @@ extern int handle_pte_fault(struct mm_struct *mm,
 			    struct vm_area_struct *vma, unsigned long address,
 			    pte_t *pte, pmd_t *pmd, unsigned int flags);
 extern int split_huge_page(struct page *page);
-extern void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd);
-#define split_huge_page_pmd(__mm, __pmd)				\
+extern void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd,
+		unsigned long address);
+#define split_huge_page_pmd(__mm, __pmd, __address)			\
 	do {								\
 		pmd_t *____pmd = (__pmd);				\
 		if (unlikely(pmd_trans_huge(*____pmd)))			\
-			__split_huge_page_pmd(__mm, ____pmd);		\
+			__split_huge_page_pmd(__mm, ____pmd,		\
+					__address);			\
 	}  while (0)
 #define wait_split_huge_page(__anon_vma, __pmd)				\
 	do {								\
@@ -174,7 +176,7 @@ static inline int split_huge_page(struct page *page)
 {
 	return 0;
 }
-#define split_huge_page_pmd(__mm, __pmd)	\
+#define split_huge_page_pmd(__mm, __pmd, __address)	\
 	do { } while (0)
 #define wait_split_huge_page(__anon_vma, __pmd)	\
 	do { } while (0)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4001f1a..c8948d6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2503,7 +2503,8 @@ static int khugepaged(void *none)
 	return 0;
 }
 
-void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd)
+void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd,
+		unsigned long address)
 {
 	struct page *page;
 
@@ -2547,7 +2548,7 @@ static void split_huge_page_address(struct mm_struct *mm,
 	 * Caller holds the mmap_sem write mode, so a huge pmd cannot
 	 * materialize from under us.
 	 */
-	split_huge_page_pmd(mm, pmd);
+	split_huge_page_pmd(mm, pmd, address);
 }
 
 void __vma_adjust_trans_huge(struct vm_area_struct *vma,
diff --git a/mm/memory.c b/mm/memory.c
index c281847..e202392 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1227,7 +1227,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 					BUG();
 				}
 #endif
-				split_huge_page_pmd(vma->vm_mm, pmd);
+				split_huge_page_pmd(vma->vm_mm, pmd, addr);
 			} else if (zap_huge_pmd(tlb, vma, pmd, addr))
 				goto next;
 			/* fall through */
@@ -1493,7 +1493,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 	}
 	if (pmd_trans_huge(*pmd)) {
 		if (flags & FOLL_SPLIT) {
-			split_huge_page_pmd(mm, pmd);
+			split_huge_page_pmd(mm, pmd, address);
 			goto split_fallthrough;
 		}
 		spin_lock(&mm->page_table_lock);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1d771e4..44c818d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -511,7 +511,7 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		split_huge_page_pmd(vma->vm_mm, pmd);
+		split_huge_page_pmd(vma->vm_mm, pmd, addr);
 		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
 			continue;
 		if (check_pte_range(vma, pmd, addr, next, nodes,
diff --git a/mm/mprotect.c b/mm/mprotect.c
index a409926..e85e29d 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -90,7 +90,7 @@ static inline void change_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 		next = pmd_addr_end(addr, end);
 		if (pmd_trans_huge(*pmd)) {
 			if (next - addr != HPAGE_PMD_SIZE)
-				split_huge_page_pmd(vma->vm_mm, pmd);
+				split_huge_page_pmd(vma->vm_mm, pmd, addr);
 			else if (change_huge_pmd(vma, pmd, addr, newprot))
 				continue;
 			/* fall through */
diff --git a/mm/mremap.c b/mm/mremap.c
index 21fed20..790df47 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -156,7 +156,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 				need_flush = true;
 				continue;
 			} else if (!err) {
-				split_huge_page_pmd(vma->vm_mm, old_pmd);
+				split_huge_page_pmd(vma->vm_mm, old_pmd,
+						old_addr);
 			}
 			VM_BUG_ON(pmd_trans_huge(*old_pmd));
 		}
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 6c118d0..7e92ebd 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -58,7 +58,7 @@ again:
 		if (!walk->pte_entry)
 			continue;
 
-		split_huge_page_pmd(walk->mm, pmd);
+		split_huge_page_pmd(walk->mm, pmd, addr);
 		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
 			goto again;
 		err = walk_pte_range(pmd, addr, next, walk);
-- 
1.7.7.6



* [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page
  2012-08-09  9:08 [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2012-08-09  9:08 ` [PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd() Kirill A. Shutemov
@ 2012-08-09  9:08 ` Kirill A. Shutemov
  2012-08-16 19:27   ` Andrea Arcangeli
  2012-08-09  9:08 ` [PATCH, RFC 8/9] thp: setup huge zero page on non-write page fault Kirill A. Shutemov
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-09  9:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, linux-mm
  Cc: Andi Kleen, H. Peter Anvin, linux-kernel, Kirill A. Shutemov,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

We can't split the huge zero page itself, but we can split a pmd that
points to it.

On splitting such a pmd we create a page table with all ptes set to the
normal zero page.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |   36 ++++++++++++++++++++++++++++++++++++
 1 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c8948d6..4df5841 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1599,6 +1599,7 @@ int split_huge_page(struct page *page)
 	struct anon_vma *anon_vma;
 	int ret = 1;
 
+	BUG_ON(is_huge_zero_pfn(page_to_pfn(page)));
 	BUG_ON(!PageAnon(page));
 	anon_vma = page_lock_anon_vma(page);
 	if (!anon_vma)
@@ -2503,6 +2504,36 @@ static int khugepaged(void *none)
 	return 0;
 }
 
+static void __split_huge_zero_page_pmd(struct mm_struct *mm, pmd_t *pmd,
+		unsigned long address)
+{
+	pgtable_t pgtable;
+	pmd_t _pmd;
+	unsigned long haddr = address & HPAGE_PMD_MASK;
+	struct vm_area_struct *vma;
+	int i;
+
+	vma = find_vma(mm, address);
+	VM_BUG_ON(vma == NULL);
+	pmdp_clear_flush_notify(vma, haddr, pmd);
+	/* leave pmd empty until pte is filled */
+
+	pgtable = get_pmd_huge_pte(mm);
+	pmd_populate(mm, &_pmd, pgtable);
+
+	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+		pte_t *pte, entry;
+		entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
+		entry = pte_mkspecial(entry);
+		pte = pte_offset_map(&_pmd, haddr);
+		VM_BUG_ON(!pte_none(*pte));
+		set_pte_at(mm, haddr, pte, entry);
+		pte_unmap(pte);
+	}
+	smp_wmb(); /* make pte visible before pmd */
+	pmd_populate(mm, pmd, pgtable);
+}
+
 void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd,
 		unsigned long address)
 {
@@ -2513,6 +2544,11 @@ void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd,
 		spin_unlock(&mm->page_table_lock);
 		return;
 	}
+	if (is_huge_zero_pmd(*pmd)) {
+		__split_huge_zero_page_pmd(mm, pmd, address);
+		spin_unlock(&mm->page_table_lock);
+		return;
+	}
 	page = pmd_page(*pmd);
 	VM_BUG_ON(!page_count(page));
 	get_page(page);
-- 
1.7.7.6



* [PATCH, RFC 8/9] thp: setup huge zero page on non-write page fault
  2012-08-09  9:08 [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2012-08-09  9:08 ` [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page Kirill A. Shutemov
@ 2012-08-09  9:08 ` Kirill A. Shutemov
  2012-08-09  9:08 ` [PATCH, RFC 9/9] thp: lazy huge zero page allocation Kirill A. Shutemov
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-09  9:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, linux-mm
  Cc: Andi Kleen, H. Peter Anvin, linux-kernel, Kirill A. Shutemov,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

All code paths now seem to be covered, so we can map the huge zero page
on a read page fault.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4df5841..3a78677 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -750,6 +750,16 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			return VM_FAULT_OOM;
 		if (unlikely(khugepaged_enter(vma)))
 			return VM_FAULT_OOM;
+		if (!(flags & FAULT_FLAG_WRITE)) {
+			pgtable_t pgtable;
+			pgtable = pte_alloc_one(mm, haddr);
+			if (unlikely(!pgtable))
+				goto out;
+			spin_lock(&mm->page_table_lock);
+			set_huge_zero_page(pgtable, mm, vma, haddr, pmd);
+			spin_unlock(&mm->page_table_lock);
+			return 0;
+		}
 		page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
 					  vma, haddr, numa_node_id(), 0);
 		if (unlikely(!page)) {
-- 
1.7.7.6



* [PATCH, RFC 9/9] thp: lazy huge zero page allocation
  2012-08-09  9:08 [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
                   ` (7 preceding siblings ...)
  2012-08-09  9:08 ` [PATCH, RFC 8/9] thp: setup huge zero page on non-write page fault Kirill A. Shutemov
@ 2012-08-09  9:08 ` Kirill A. Shutemov
       [not found] ` <20120810034912.GA31071@hacker.(null)>
  2012-08-16 19:20 ` Andrew Morton
  10 siblings, 0 replies; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-09  9:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, linux-mm
  Cc: Andi Kleen, H. Peter Anvin, linux-kernel, Kirill A. Shutemov,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Instead of allocating the huge zero page in hugepage_init(), we can postpone
allocation until the first huge zero page mapping. This saves memory if THP
is not in use.

cmpxchg() is used to avoid a race on huge_zero_pfn initialization.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3a78677..6861230 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -168,21 +168,23 @@ out:
 	return err;
 }
 
-static int init_huge_zero_page(void)
+static int init_huge_zero_pfn(void)
 {
 	struct page *hpage;
+	unsigned long pfn;
 
 	hpage = alloc_pages(GFP_TRANSHUGE | __GFP_ZERO, HPAGE_PMD_ORDER);
 	if (!hpage)
 		return -ENOMEM;
-
-	huge_zero_pfn = page_to_pfn(hpage);
+	pfn = page_to_pfn(hpage);
+	if (cmpxchg(&huge_zero_pfn, 0, pfn))
+		__free_page(hpage);
 	return 0;
 }
 
 static inline bool is_huge_zero_pfn(unsigned long pfn)
 {
-	return pfn == huge_zero_pfn;
+	return huge_zero_pfn && pfn == huge_zero_pfn;
 }
 
 static inline bool is_huge_zero_pmd(pmd_t pmd)
@@ -573,10 +575,6 @@ static int __init hugepage_init(void)
 	if (err)
 		return err;
 
-	err = init_huge_zero_page();
-	if (err)
-		goto out;
-
 	err = khugepaged_slab_init();
 	if (err)
 		goto out;
@@ -601,8 +599,6 @@ static int __init hugepage_init(void)
 
 	return 0;
 out:
-	if (huge_zero_pfn)
-		__free_page(pfn_to_page(huge_zero_pfn));
 	hugepage_exit_sysfs(hugepage_kobj);
 	return err;
 }
@@ -752,6 +748,10 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			return VM_FAULT_OOM;
 		if (!(flags & FAULT_FLAG_WRITE)) {
 			pgtable_t pgtable;
+			if (unlikely(!huge_zero_pfn && init_huge_zero_pfn())) {
+				count_vm_event(THP_FAULT_FALLBACK);
+				goto out;
+			}
 			pgtable = pte_alloc_one(mm, haddr);
 			if (unlikely(!pgtable))
 				goto out;
-- 
1.7.7.6


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, RFC 0/9] Introduce huge zero page
       [not found] ` <20120810034912.GA31071@hacker.(null)>
@ 2012-08-10 10:33   ` Kirill A. Shutemov
  0 siblings, 0 replies; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-10 10:33 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, Andrea Arcangeli, linux-mm, Andi Kleen,
	H. Peter Anvin, linux-kernel, Kirill A. Shutemov, Gavin Shan

[-- Attachment #1: Type: text/plain, Size: 3209 bytes --]

On Fri, Aug 10, 2012 at 11:49:12AM +0800, Wanpeng Li wrote:
> On Thu, Aug 09, 2012 at 12:08:11PM +0300, Kirill A. Shutemov wrote:
> >From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >
> >During testing I noticed big (up to 2.5 times) memory consumption overhead
> >on some workloads (e.g. ft.A from NPB) if THP is enabled.
> >
> >The main reason for that big difference is lacking zero page in THP case.
> >We have to allocate a real page on read page fault.
> >
> >A program to demonstrate the issue:
> >#include <assert.h>
> >#include <stdlib.h>
> >#include <unistd.h>
> >
> >#define MB 1024*1024
> >
> >int main(int argc, char **argv)
> >{
> >        char *p;
> >        int i;
> >
> >        posix_memalign((void **)&p, 2 * MB, 200 * MB);
> >        for (i = 0; i < 200 * MB; i+= 4096)
> >                assert(p[i] == 0);
> >        pause();
> >        return 0;
> >}
> >
> >With thp-never RSS is about 400k, but with thp-always it's 200M.
> >After the patchset thp-always RSS is 400k too.
> >
> Hi Kirill, 
> 
> Thank you for your patchset, I have some questions to ask.
> 
> 1. In your patchset, on a read page fault the pmd will be populated by the
> huge zero page. IIUC, assert(p[i] == 0) is a read operation, so why is the
> thp-always RSS 400K? You allocate 100 pages; why does each cost 4K? I think
> the right overhead should be 2MB for the huge zero page instead of 400K.
> What am I missing?

The 400k comes not from the allocation, but from the libc runtime. The test
program consumes about the same without any allocation at all.

The zero page is a global resource owned by the system. It's not accounted
to any process.

> 
> 2. If the user hopes to allocate 200MB, 100 huge pages are needed in total.
> In your patchset logic the code will allocate one 2MB huge zero page and
> populate all the associated pmds with it. When the user attempts to write
> to the pages, wp will be triggered, and if huge page allocation fails it
> will fall back to do_huge_pmd_wp_zero_page_fallback in your patch logic,
> but you just create a new page table and set the pte around the fault
> address to the newly allocated page; all other ptes are set to the normal
> zero page. In this scenario the user only gets one 4K page and all other
> zero pages; how can the code continue to work? Why not fall back to
> allocating normal pages, even if they are not physically contiguous?

Since we allocate a 4k page around the fault address, the fault is handled
and userspace can use it.

If the process tries to write to any other 4k page of this area, a new
fault will be triggered and do_wp_page() will allocate a real page.

It's not reasonable to allocate all the 4k pages in the fallback path. We
can postpone that until userspace really wants to use them. This way we
reduce memory pressure in the fallback path.

> 3. In your patchset logic:
> "In fallback path we create a new table and set pte around fault address
> to the newly allocated page. All other ptes set to normal zero page."
> When will these zero pages be replaced by real pages and charged to memcg?

I guess I've answered the question above.

> Look forward to your detail response, thank you! :)

Thanks for your questions.

-- 
 Kirill A. Shutemov

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, RFC 0/9] Introduce huge zero page
  2012-08-09  9:08 [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
                   ` (9 preceding siblings ...)
       [not found] ` <20120810034912.GA31071@hacker.(null)>
@ 2012-08-16 19:20 ` Andrew Morton
  2012-08-16 19:40   ` Andrea Arcangeli
  10 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2012-08-16 19:20 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, linux-mm, Andi Kleen, H. Peter Anvin,
	linux-kernel, Kirill A. Shutemov

On Thu,  9 Aug 2012 12:08:11 +0300
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> During testing I noticed big (up to 2.5 times) memory consumption overhead
> on some workloads (e.g. ft.A from NPB) if THP is enabled.
> 
> The main reason for that big difference is lacking zero page in THP case.
> We have to allocate a real page on read page fault.
> 
> A program to demonstrate the issue:
> #include <assert.h>
> #include <stdlib.h>
> #include <unistd.h>
> 
> #define MB 1024*1024
> 
> int main(int argc, char **argv)
> {
>         char *p;
>         int i;
> 
>         posix_memalign((void **)&p, 2 * MB, 200 * MB);
>         for (i = 0; i < 200 * MB; i+= 4096)
>                 assert(p[i] == 0);
>         pause();
>         return 0;
> }
> 
> With thp-never RSS is about 400k, but with thp-always it's 200M.
> After the patchset thp-always RSS is 400k too.

That's a pretty big improvement for a rather fake test case.  I wonder
how much benefit we'd see with real workloads?

Things are rather quiet at present, with summer and beaches and Kernel
Summit coming up.  Please resend these patches early next month and
let's see if we can get a bit of action happening?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page
  2012-08-09  9:08 ` [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page Kirill A. Shutemov
@ 2012-08-16 19:27   ` Andrea Arcangeli
  2012-08-17  8:12     ` Kirill A. Shutemov
  2012-08-31 14:06     ` Kirill A. Shutemov
  0 siblings, 2 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2012-08-16 19:27 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, linux-mm, Andi Kleen, H. Peter Anvin,
	linux-kernel, Kirill A. Shutemov

On Thu, Aug 09, 2012 at 12:08:18PM +0300, Kirill A. Shutemov wrote:
> +static void __split_huge_zero_page_pmd(struct mm_struct *mm, pmd_t *pmd,
> +		unsigned long address)
> +{
> +	pgtable_t pgtable;
> +	pmd_t _pmd;
> +	unsigned long haddr = address & HPAGE_PMD_MASK;
> +	struct vm_area_struct *vma;
> +	int i;
> +
> +	vma = find_vma(mm, address);
> +	VM_BUG_ON(vma == NULL);

I think you can use BUG_ON here just in case but see below how I would
change it.

> +	pmdp_clear_flush_notify(vma, haddr, pmd);
> +	/* leave pmd empty until pte is filled */
> +
> +	pgtable = get_pmd_huge_pte(mm);
> +	pmd_populate(mm, &_pmd, pgtable);
> +
> +	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> +		pte_t *pte, entry;
> +		entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
> +		entry = pte_mkspecial(entry);
> +		pte = pte_offset_map(&_pmd, haddr);
> +		VM_BUG_ON(!pte_none(*pte));
> +		set_pte_at(mm, haddr, pte, entry);
> +		pte_unmap(pte);
> +	}
> +	smp_wmb(); /* make pte visible before pmd */
> +	pmd_populate(mm, pmd, pgtable);
> +}
> +

The last pmd_populate will corrupt memory.

See the comment in __split_huge_page_splitting. If you set it to none
at any given time, a new page fault will instantiate a hugepmd
thinking it's the first fault and then you'll overwrite it leaking
memory and corrupting userland.

The caller may be holding the mmap_sem in read mode too (pagewalk is
an example). The PSE bit must also remain on at all times.

The non present bit must be clear and a tlb flush must happen before
the final pmd_populate with the regular pmd to avoid tripping machine
checks on some CPU (to avoid a 4k and 2m tlb to appear for the same
vaddr).

I think you should replace pmdp_clear_flush_notify with:

  	    	pmdp_splitting_flush_notify(vma, haddr, pmd);

then build the 4k zero pages in the loop using the temporary _pmd set with
pmd_populate(&_pmd) and then:


		/*
		 * Up to this point the pmd is present and huge and
		 * userland has the whole access to the hugepage
		 * during the split (which happens in place). If we
		 * overwrite the pmd with the not-huge version
		 * pointing to the pte here (which of course we could
		 * if all CPUs were bug free), userland could trigger
		 * a small page size TLB miss on the small sized TLB
		 * while the hugepage TLB entry is still established
		 * in the huge TLB. Some CPU doesn't like that. See
		 * http://support.amd.com/us/Processor_TechDocs/41322.pdf,
		 * Erratum 383 on page 93. Intel should be safe but is
		 * also warns that it's only safe if the permission
		 * and cache attributes of the two entries loaded in
		 * the two TLB is identical (which should be the case
		 * here). But it is generally safer to never allow
		 * small and huge TLB entries for the same virtual
		 * address to be loaded simultaneously. So instead of
		 * doing "pmd_populate(); flush_tlb_range();" we first
		 * mark the current pmd notpresent (atomically because
		 * here the pmd_trans_huge and pmd_trans_splitting
		 * must remain set at all times on the pmd until the
		 * split is complete for this pmd), then we flush the
		 * SMP TLB and finally we write the non-huge version
		 * of the pmd entry with pmd_populate.
		 */
		set_pmd_at(mm, address, pmd, pmd_mknotpresent(*pmd));
		flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
		pmd_populate(mm, pmd, pgtable);

note address above is actually haddr aligned (generated by
vma_address(page, vma) where page is a thp page)

> +	if (is_huge_zero_pmd(*pmd)) {
> +		__split_huge_zero_page_pmd(mm, pmd, address);

This will work fine but it's a bit sad having to add "address" at
every call, just to run a find_vma(). The only place that doesn't have
a vma already on the caller stack is actually pagewalk, all other
places already have a vma on the stack without having to find it with
the rbtree.

I think it may be better to change the param to
split_huge_page_pmd(vma, pmd).

Then have the standard split_huge_page_pmd obtain the mm with vma->vm_mm
(most callers already call it with split_huge_page_pmd(vma->vm_mm),
so it won't alter the cost to do vma->vm_mm in caller or callee).

split_huge_page_address also should take the vma (all callers are
invoking it as split_huge_page_address(vma->vm_mm) so it'll be zero
cost change).

Then we can add a split_huge_page_pmd_mm(mm, address, pmd) or
split_huge_page_pmd_address(mm, address, pmd) (call it as you
prefer...) only for the pagewalk caller that will do the find_vma and
BUG_ON if it's not found.

In that new split_huge_page_pmd_mm you can also add a BUG_ON checking
vma->vm_start to be <= haddr and vma->vm_end >= haddr+HPAGE_PMD_SIZE
in addition to the BUG_ON(!vma) above, for more robustness. I'm not aware
of any place calling it without mmap_sem held at least for reading, and
the vma must be stable, but more debug checks won't hurt.

Thanks!
Andrea

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, RFC 0/9] Introduce huge zero page
  2012-08-16 19:20 ` Andrew Morton
@ 2012-08-16 19:40   ` Andrea Arcangeli
  2012-08-16 23:08     ` H. Peter Anvin
  2012-08-16 23:12     ` Andi Kleen
  0 siblings, 2 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2012-08-16 19:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, linux-mm, Andi Kleen, H. Peter Anvin,
	linux-kernel, Kirill A. Shutemov

Hi Andrew,

On Thu, Aug 16, 2012 at 12:20:23PM -0700, Andrew Morton wrote:
> That's a pretty big improvement for a rather fake test case.  I wonder
> how much benefit we'd see with real workloads?

The same discussion happened about the zero page in general, and
there's no easy answer. I seem to recall that it was dropped at some
point and then we reintroduced the zero page later.

Most of the time it won't be worth it; it's just a few pathological
compute loads that benefit, IIRC. So I'm overall positive about it
(after it's stable).

Because this is done the right way (i.e. allocating a hugepage at
the first wp fault, and falling back exclusively if compaction fails),
it will help much less than the 4k zero pages if the zero pages are
scattered over the address space and not contiguous (it only helps if
there are 512 of them in a row). OTOH if they're contiguous, the huge
zero pages will perform better than the 4k zero pages.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd()
  2012-08-09  9:08 ` [PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd() Kirill A. Shutemov
@ 2012-08-16 19:42   ` Andrea Arcangeli
  2012-08-17  7:49     ` Kirill A. Shutemov
  0 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2012-08-16 19:42 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, linux-mm, Andi Kleen, H. Peter Anvin,
	linux-kernel, Kirill A. Shutemov

On Thu, Aug 09, 2012 at 12:08:17PM +0300, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> It's required to implement huge zero pmd splitting.
> 

This isn't bisectable with the next one; it'd fail on wfg's 0-DAY kernel
build testing backend. However, this is clearly meant to separate this patch
from the next and keep the size small, so I don't mind.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, RFC 0/9] Introduce huge zero page
  2012-08-16 19:40   ` Andrea Arcangeli
@ 2012-08-16 23:08     ` H. Peter Anvin
  2012-08-16 23:12     ` Andi Kleen
  1 sibling, 0 replies; 21+ messages in thread
From: H. Peter Anvin @ 2012-08-16 23:08 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, Kirill A. Shutemov, linux-mm, Andi Kleen,
	linux-kernel, Kirill A. Shutemov

On 08/16/2012 12:40 PM, Andrea Arcangeli wrote:
> Hi Andrew,
> 
> On Thu, Aug 16, 2012 at 12:20:23PM -0700, Andrew Morton wrote:
>> That's a pretty big improvement for a rather fake test case.  I wonder
>> how much benefit we'd see with real workloads?
> 
> The same discussion happened about the zero page in general and
> there's no easy answer. I seem to recall that it was dropped at some
> point and then we reintroduced the zero page later.
> 
> Most of the time it won't be worth it, it's just a few pathological
> compute loads that benefits IIRC. So I'm overall positive about it
> (after it's stable).
> 
> Because this is done the right way (i.e. to allocate an hugepage at
> the first wp fault, and to fallback exclusively if compaction fails)
> it will help much less than the 4k zero pages if the zero pages are
> scattered over the address space and not contiguous (it only helps if
> there are 512 of them in a row). OTOH if they're contiguous, the huge
> zero pages will perform better than the 4k zero pages.
> 

One thing that I asked for was testing a "virtual zero page", where the same
page (or N pages for N-way page coloring) is reused across a page table.
It would have worse TLB performance but likely *much* better cache
behavior.

	-hpa



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, RFC 0/9] Introduce huge zero page
  2012-08-16 19:40   ` Andrea Arcangeli
  2012-08-16 23:08     ` H. Peter Anvin
@ 2012-08-16 23:12     ` Andi Kleen
  1 sibling, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2012-08-16 23:12 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, Kirill A. Shutemov, linux-mm, H. Peter Anvin,
	linux-kernel, Kirill A. Shutemov

> Because this is done the right way (i.e. to allocate an hugepage at
> the first wp fault, and to fallback exclusively if compaction fails)
> it will help much less than the 4k zero pages if the zero pages are

The main benefit is that you have a zero page with THP enabled, so it
lowers the cost of having THP on (for workloads that benefit from a zero
page).

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd()
  2012-08-16 19:42   ` Andrea Arcangeli
@ 2012-08-17  7:49     ` Kirill A. Shutemov
  0 siblings, 0 replies; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-17  7:49 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-mm, Andi Kleen, H. Peter Anvin,
	linux-kernel, Kirill A. Shutemov

[-- Attachment #1: Type: text/plain, Size: 764 bytes --]

On Thu, Aug 16, 2012 at 09:42:01PM +0200, Andrea Arcangeli wrote:
> On Thu, Aug 09, 2012 at 12:08:17PM +0300, Kirill A. Shutemov wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > 
> > It's required to implement huge zero pmd splitting.
> > 
> 
> This isn't bisectable with the next one, it'd fail on wfg 0-DAY kernel
> build testing backend, however this is clearly to separate this patch
> from the next, to keep the size small so I don't mind.

Hm. I don't see why it's not bisectable. It only adds a new parameter to
the function. The parameter is unused until the next patch.

Actually, I've checked build bisectability with aiaiai[1].

[1] http://git.infradead.org/users/dedekind/aiaiai.git

-- 
 Kirill A. Shutemov

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page
  2012-08-16 19:27   ` Andrea Arcangeli
@ 2012-08-17  8:12     ` Kirill A. Shutemov
  2012-08-17 16:33       ` Andrea Arcangeli
  2012-08-31 14:06     ` Kirill A. Shutemov
  1 sibling, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-17  8:12 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-mm, Andi Kleen, H. Peter Anvin,
	linux-kernel, Kirill A. Shutemov

[-- Attachment #1: Type: text/plain, Size: 2999 bytes --]

On Thu, Aug 16, 2012 at 09:27:38PM +0200, Andrea Arcangeli wrote:
> On Thu, Aug 09, 2012 at 12:08:18PM +0300, Kirill A. Shutemov wrote:
> > +static void __split_huge_zero_page_pmd(struct mm_struct *mm, pmd_t *pmd,
> > +		unsigned long address)
> > +{
> > +	pgtable_t pgtable;
> > +	pmd_t _pmd;
> > +	unsigned long haddr = address & HPAGE_PMD_MASK;
> > +	struct vm_area_struct *vma;
> > +	int i;
> > +
> > +	vma = find_vma(mm, address);
> > +	VM_BUG_ON(vma == NULL);
> 
> I think you can use BUG_ON here just in case but see below how I would
> change it.
> 
> > +	pmdp_clear_flush_notify(vma, haddr, pmd);
> > +	/* leave pmd empty until pte is filled */
> > +
> > +	pgtable = get_pmd_huge_pte(mm);
> > +	pmd_populate(mm, &_pmd, pgtable);
> > +
> > +	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> > +		pte_t *pte, entry;
> > +		entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
> > +		entry = pte_mkspecial(entry);
> > +		pte = pte_offset_map(&_pmd, haddr);
> > +		VM_BUG_ON(!pte_none(*pte));
> > +		set_pte_at(mm, haddr, pte, entry);
> > +		pte_unmap(pte);
> > +	}
> > +	smp_wmb(); /* make pte visible before pmd */
> > +	pmd_populate(mm, pmd, pgtable);
> > +}
> > +
> 
> The last pmd_populate will corrupt memory.

Nice catch, thank you.

I used do_huge_pmd_wp_page_fallback() as a template for my code.
What's the difference between these two code paths?
Why is do_huge_pmd_wp_page_fallback() safe?

> > +	if (is_huge_zero_pmd(*pmd)) {
> > +		__split_huge_zero_page_pmd(mm, pmd, address);
> 
> This will work fine but it's a bit sad having to add "address" at
> every call, just to run a find_vma(). The only place that doesn't have
> a vma already on the caller stack is actually pagewalk, all other
> places already have a vma on the stack without having to find it with
> the rbtree.
> 
> I think it may be better to change the param to
> split_huge_page_pmd(vma, pmd).
> 
> Then have standard split_huge_page_pmd obtain the mm with vma->vm_mm
> (most callers already calles it with split_huge_page_pmd(vma->vm_mm)
> so it won't alter the cost to do vma->vm_mm in caller or callee).
> 
> split_huge_page_address also should take the vma (all callers are
> invoking it as split_huge_page_address(vma->vm_mm) so it'll be zero
> cost change).
> 
> Then we can add a split_huge_page_pmd_mm(mm, address, pmd) or
> split_huge_page_pmd_address(mm, address, pmd) (call it as you
> prefer...) only for the pagewalk caller that will do the find_vma and
> BUG_ON if it's not found.
> 
> In that new split_huge_page_pmd_mm you can also add a BUG_ON checking
> vma->vm_start to be <= haddr and vma->vm_end >= haddr+HPAGE_PMD_SIZE
> in addition to BUG_ON(!vma) above, for more robustness. I'm not aware
> of any place calling it without mmap_sem hold at least for reading
> and the vma must be stable, but more debug checks won't hurt.

Looks reasonable. I'll update it in the next revision.

-- 
 Kirill A. Shutemov

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page
  2012-08-17  8:12     ` Kirill A. Shutemov
@ 2012-08-17 16:33       ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2012-08-17 16:33 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, linux-mm, Andi Kleen, H. Peter Anvin,
	linux-kernel, Kirill A. Shutemov

On Fri, Aug 17, 2012 at 11:12:33AM +0300, Kirill A. Shutemov wrote:
> I've used do_huge_pmd_wp_page_fallback() as template for my code.
> What's difference between these two code paths?
> Why is do_huge_pmd_wp_page_fallback() safe?

Good point. do_huge_pmd_wp_page_fallback works only on the current
"mm" so it doesn't need the splitting transition, but thinking twice,
split_huge_zero_page_pmd also works only on the local "mm", because
you're not really splitting the zero page there (you're not affecting
other mms). As long as you keep holding the page_table_lock of the "mm"
that you're altering, your current version is safe.

I was mistaken because I'm very used to thinking of split_huge_page as
something that cannot rely on the page_table_lock, but this is a
simpler case that isn't splitting the "page" but only the "pmd" of a
single "mm", so you can safely rely on the mm->page_table_lock :).

> Looks resonable. I'll update it in next revision.

Thanks. Of course, my comments about the function parameters (avoiding
unnecessary calls of find_vma) weren't related to the above locking issues.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page
  2012-08-16 19:27   ` Andrea Arcangeli
  2012-08-17  8:12     ` Kirill A. Shutemov
@ 2012-08-31 14:06     ` Kirill A. Shutemov
  1 sibling, 0 replies; 21+ messages in thread
From: Kirill A. Shutemov @ 2012-08-31 14:06 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-mm, Andi Kleen, H. Peter Anvin,
	linux-kernel, Kirill A. Shutemov

[-- Attachment #1: Type: text/plain, Size: 483 bytes --]

On Thu, Aug 16, 2012 at 09:27:38PM +0200, Andrea Arcangeli wrote:
> On Thu, Aug 09, 2012 at 12:08:18PM +0300, Kirill A. Shutemov wrote:
> > +	if (is_huge_zero_pmd(*pmd)) {
> > +		__split_huge_zero_page_pmd(mm, pmd, address);
> 
> This will work fine but it's a bit sad having to add "address" at
> every call, just to run a find_vma().

Hm, address is also used to calculate haddr.

It seems we need to pass address anyway; I mean vma + address.

-- 
 Kirill A. Shutemov

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-08-31 14:06 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-09  9:08 [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 1/9] thp: huge zero page: basic preparation Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 2/9] thp: zap_huge_pmd(): zap huge zero pmd Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 3/9] thp: copy_huge_pmd(): copy huge zero page Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 4/9] thp: do_huge_pmd_wp_page(): handle " Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 5/9] thp: change_huge_pmd(): keep huge zero page write-protected Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd() Kirill A. Shutemov
2012-08-16 19:42   ` Andrea Arcangeli
2012-08-17  7:49     ` Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page Kirill A. Shutemov
2012-08-16 19:27   ` Andrea Arcangeli
2012-08-17  8:12     ` Kirill A. Shutemov
2012-08-17 16:33       ` Andrea Arcangeli
2012-08-31 14:06     ` Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 8/9] thp: setup huge zero page on non-write page fault Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 9/9] thp: lazy huge zero page allocation Kirill A. Shutemov
     [not found] ` <20120810034912.GA31071@hacker.(null)>
2012-08-10 10:33   ` [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
2012-08-16 19:20 ` Andrew Morton
2012-08-16 19:40   ` Andrea Arcangeli
2012-08-16 23:08     ` H. Peter Anvin
2012-08-16 23:12     ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).