* Transparent Hugepage Support #30
@ 2010-09-01 19:08 ` Andrea Arcangeli
  0 siblings, 0 replies; 22+ messages in thread
From: Andrea Arcangeli @ 2010-09-01 19:08 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, linux-kernel
  Cc: Marcelo Tosatti, Adam Litke, Avi Kivity, Izik Eidus,
	Hugh Dickins, Nick Piggin, Rik van Riel, Mel Gorman, Dave Hansen,
	Benjamin Herrenschmidt, Ingo Molnar, Mike Travis,
	KAMEZAWA Hiroyuki, Christoph Lameter, Chris Wright, bpicco,
	KOSAKI Motohiro, Balbir Singh, Michael S. Tsirkin,
	Peter Zijlstra, Johannes Weiner, Daisuke Nishimura, Chris Mason,
	Borislav Petkov

http://www.linux-kvm.org/wiki/images/9/9e/2010-forum-thp.pdf

http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=shortlog

first: git clone git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
or first: git clone --reference linux-2.6 git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
later: git fetch; git checkout -f origin/master

The tree is rebased, so git pull won't work; update it with the fetch
and checkout above instead.

http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.36-rc3/transparent_hugepage-30/
http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.36-rc3/transparent_hugepage-30.gz

Diff #29 -> #30:

 b/compaction-migration-warning               |   25 +++

Avoid the MIGRATION config warning when COMPACTION is selected but NUMA
and memory hotplug aren't.

 do_swap_page-VM_FAULT_WRITE                  |   21 --
 kvm-huge-spte-wrprotect                      |   48 ------
 kvm-mmu-notifier-huge-spte                   |   29 ---
 root_anon_vma-anon_vma_lock                  |  208 ---------------------------
 root_anon_vma-avoid-ksm-hang                 |   30 ---
 root_anon_vma-bugchecks                      |   37 ----
 root_anon_vma-in_vma                         |   27 ---
 root_anon_vma-ksm_refcount                   |  169 ---------------------
 root_anon_vma-lock_root                      |  127 ----------------
 root_anon_vma-memory-compaction              |   36 ----
 root_anon_vma-mm_take_all_locks              |   81 ----------
 root_anon_vma-oldest_root                    |   81 ----------
 root_anon_vma-refcount                       |   29 ---
 root_anon_vma-swapin                         |   91 -----------
 root_anon_vma-use-root                       |   66 --------
 root_anon_vma-vma_lock_anon_vma              |   94 ------------

The patches above were merged upstream.

 b/memcg_compound                             |  166 ++++++++++-----------
 b/memcg_compound_tail                        |   31 +---
 b/memcg_consume_stock                        |   31 ++--
 memcg_check_room                             |   88 -----------
 memcg_oom                                    |   34 ----

These had heavy rejects; the last two patches and some other bits were
removed. The memcg code is being rewritten so fast that it's hard to
justify keeping up with it. It's simpler and less time consuming to fix
it once than over and over again. memcg in this release is likely not
too stable with THP on (it will definitely work fine if you disable THP
at compile time, or at boot time with the kernel parameter). In
particular, all get_css/put_css calls will have to be re-audited after
these new changes. For now it builds fine, and the basics to support
THP and to show the direction are in. Nevertheless I welcome patches to
fix this up.

btw, memcg developers could already support THP inside memcg without
any problem, even though THP is not merged yet, so it's partly up to
them to want to support THP in memcg. It's also perfectly ok for me to
catch up with memcg externally, but it would be nice to know when memcg
reaches a milestone, and so when it's time to re-audit it all for THP.

Full diffstat:

 Documentation/vm/transhuge.txt        |  283 ++++
 arch/alpha/include/asm/mman.h         |    2 
 arch/mips/include/asm/mman.h          |    2 
 arch/parisc/include/asm/mman.h        |    2 
 arch/powerpc/mm/gup.c                 |   12 
 arch/x86/include/asm/kvm_host.h       |    1 
 arch/x86/include/asm/paravirt.h       |   23 
 arch/x86/include/asm/paravirt_types.h |    6 
 arch/x86/include/asm/pgtable-2level.h |    9 
 arch/x86/include/asm/pgtable-3level.h |   23 
 arch/x86/include/asm/pgtable.h        |  149 ++
 arch/x86/include/asm/pgtable_64.h     |   28 
 arch/x86/include/asm/pgtable_types.h  |    3 
 arch/x86/kernel/paravirt.c            |    3 
 arch/x86/kernel/vm86_32.c             |    1 
 arch/x86/kvm/mmu.c                    |   60 
 arch/x86/kvm/paging_tmpl.h            |    4 
 arch/x86/mm/gup.c                     |   28 
 arch/x86/mm/pgtable.c                 |   66 +
 arch/xtensa/include/asm/mman.h        |    2 
 fs/Kconfig                            |    2 
 fs/exec.c                             |   44 
 fs/proc/meminfo.c                     |   14 
 fs/proc/page.c                        |   14 
 include/asm-generic/mman-common.h     |    2 
 include/asm-generic/pgtable.h         |  130 +
 include/linux/compaction.h            |   13 
 include/linux/gfp.h                   |   14 
 include/linux/huge_mm.h               |  151 ++
 include/linux/khugepaged.h            |   66 +
 include/linux/ksm.h                   |   20 
 include/linux/kvm_host.h              |    4 
 include/linux/memory_hotplug.h        |   14 
 include/linux/mm.h                    |  114 +
 include/linux/mm_inline.h             |   19 
 include/linux/mm_types.h              |    3 
 include/linux/mmu_notifier.h          |   66 +
 include/linux/mmzone.h                |    1 
 include/linux/page-flags.h            |   36 
 include/linux/sched.h                 |    1 
 include/linux/swap.h                  |    2 
 include/linux/vmstat.h                |    4 
 kernel/fork.c                         |   12 
 kernel/futex.c                        |   55 
 mm/Kconfig                            |   40 
 mm/Makefile                           |    1 
 mm/compaction.c                       |   48 
 mm/huge_memory.c                      | 2212 ++++++++++++++++++++++++++++++++++
 mm/hugetlb.c                          |   69 -
 mm/ksm.c                              |   53 
 mm/madvise.c                          |    8 
 mm/memcontrol.c                       |  138 +-
 mm/memory-failure.c                   |    2 
 mm/memory.c                           |  235 +++
 mm/memory_hotplug.c                   |   14 
 mm/mempolicy.c                        |   14 
 mm/migrate.c                          |   12 
 mm/mincore.c                          |    7 
 mm/mmap.c                             |    5 
 mm/mmu_notifier.c                     |   20 
 mm/mprotect.c                         |   20 
 mm/mremap.c                           |    8 
 mm/oom_kill.c                         |    1 
 mm/page_alloc.c                       |   58 
 mm/pagewalk.c                         |    1 
 mm/rmap.c                             |  115 -
 mm/sparse.c                           |    4 
 mm/swap.c                             |  117 +
 mm/swap_state.c                       |    6 
 mm/swapfile.c                         |    2 
 mm/vmscan.c                           |   98 -
 mm/vmstat.c                           |   31 
 virt/kvm/iommu.c                      |    2 
 virt/kvm/kvm_main.c                   |   56 
 74 files changed, 4468 insertions(+), 437 deletions(-)

* [patch] transparent hugepage sysfs meminfo
  2010-09-01 19:08 ` Andrea Arcangeli
@ 2010-09-01 19:44   ` David Rientjes
  -1 siblings, 0 replies; 22+ messages in thread
From: David Rientjes @ 2010-09-01 19:44 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Andrew Morton, linux-kernel, Marcelo Tosatti,
	Adam Litke, Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin,
	Rik van Riel, Mel Gorman, Dave Hansen, Benjamin Herrenschmidt,
	Ingo Molnar, Mike Travis, KAMEZAWA Hiroyuki, Christoph Lameter,
	Chris Wright, bpicco, KOSAKI Motohiro, Balbir Singh,
	Michael S. Tsirkin, Peter Zijlstra, Johannes Weiner,
	Daisuke Nishimura, Chris Mason, Borislav Petkov

Add hugepage statistics to per-node sysfs meminfo

Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 drivers/base/node.c |   21 ++++++++++++++++++---
 1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -117,12 +117,21 @@ static ssize_t node_read_meminfo(struct sys_device * dev,
 		       "Node %d WritebackTmp:   %8lu kB\n"
 		       "Node %d Slab:           %8lu kB\n"
 		       "Node %d SReclaimable:   %8lu kB\n"
-		       "Node %d SUnreclaim:     %8lu kB\n",
+		       "Node %d SUnreclaim:     %8lu kB\n"
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		       "Node %d AnonHugePages:  %8lu kB\n"
+#endif
+			,
 		       nid, K(node_page_state(nid, NR_FILE_DIRTY)),
 		       nid, K(node_page_state(nid, NR_WRITEBACK)),
 		       nid, K(node_page_state(nid, NR_FILE_PAGES)),
 		       nid, K(node_page_state(nid, NR_FILE_MAPPED)),
-		       nid, K(node_page_state(nid, NR_ANON_PAGES)),
+		       nid, K(node_page_state(nid, NR_ANON_PAGES)
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			+ node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) *
+			HPAGE_PMD_NR
+#endif
+		       ),
 		       nid, K(node_page_state(nid, NR_SHMEM)),
 		       nid, node_page_state(nid, NR_KERNEL_STACK) *
 				THREAD_SIZE / 1024,
@@ -133,7 +142,13 @@ static ssize_t node_read_meminfo(struct sys_device * dev,
 		       nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE) +
 				node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
 		       nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE)),
-		       nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE)));
+		       nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE))
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			, nid,
+			K(node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) *
+			HPAGE_PMD_NR)
+#endif
+		       );
 	n += hugetlb_report_node_meminfo(nid, buf + n);
 	return n;
 }
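
As an illustration of the arithmetic in the patch above, here is a
small userspace sketch in C. It is not kernel code: PAGE_SHIFT and
HPAGE_PMD_SHIFT are assumed to be the usual x86-64 values (4 kB base
pages, 2 MB huge pages), K() has the same shape as the helper of that
name in drivers/base/node.c, and anon_huge_kb(), anon_pages_kb() and
format_node_lines() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed x86-64 values: 4 kB base pages, 2 MB PMD-sized huge pages. */
#define PAGE_SHIFT	12
#define HPAGE_PMD_SHIFT	21
#define HPAGE_PMD_NR	(1UL << (HPAGE_PMD_SHIFT - PAGE_SHIFT))	/* 512 */

/* Convert a count of base pages to kB. */
#define K(x) ((x) << (PAGE_SHIFT - 10))

/* kB shown on the AnonHugePages line: huge pages scaled to base pages. */
static unsigned long anon_huge_kb(unsigned long nr_huge)
{
	return K(nr_huge * HPAGE_PMD_NR);
}

/*
 * kB shown on the AnonPages line when THP is built in: regular
 * anonymous pages plus the huge pages counted in base page units.
 */
static unsigned long anon_pages_kb(unsigned long nr_anon,
				   unsigned long nr_huge)
{
	return K(nr_anon + nr_huge * HPAGE_PMD_NR);
}

/* Format the two affected lines the way the per-node meminfo does. */
static int format_node_lines(char *buf, size_t len, int nid,
			     unsigned long nr_anon, unsigned long nr_huge)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"Node %d AnonPages:      %8lu kB\n"
			"Node %d AnonHugePages:  %8lu kB\n",
			nid, anon_pages_kb(nr_anon, nr_huge),
			nid, anon_huge_kb(nr_huge));
}
```

Under these assumptions, 3 huge pages and 100 regular anonymous pages
give anon_huge_kb(3) == 6144 and anon_pages_kb(100, 3) == 6544.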

* Re: [patch] transparent hugepage sysfs meminfo
  2010-09-01 19:44   ` David Rientjes
@ 2010-09-01 19:50     ` Andrea Arcangeli
  -1 siblings, 0 replies; 22+ messages in thread
From: Andrea Arcangeli @ 2010-09-01 19:50 UTC (permalink / raw)
  To: David Rientjes; +Cc: linux-mm, linux-kernel

On Wed, Sep 01, 2010 at 12:44:35PM -0700, David Rientjes wrote:
> Add hugepage statistics to per-node sysfs meminfo

Applied now.

Thanks for the resend. The last version arrived while I was on
vacation and it slipped through, sorry.

* Re: Transparent Hugepage Support #30
  2010-09-01 19:08 ` Andrea Arcangeli
@ 2010-09-09 10:46   ` Balbir Singh
  -1 siblings, 0 replies; 22+ messages in thread
From: Balbir Singh @ 2010-09-09 10:46 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Andrew Morton, linux-kernel, Marcelo Tosatti,
	Adam Litke, Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin,
	Rik van Riel, Mel Gorman, Dave Hansen, Benjamin Herrenschmidt,
	Ingo Molnar, Mike Travis, KAMEZAWA Hiroyuki, Christoph Lameter,
	Chris Wright, bpicco, KOSAKI Motohiro, Michael S. Tsirkin,
	Peter Zijlstra, Johannes Weiner, Daisuke Nishimura, Chris Mason,
	Borislav Petkov

* Andrea Arcangeli <aarcange@redhat.com> [2010-09-01 21:08:59]:

> http://www.linux-kvm.org/wiki/images/9/9e/2010-forum-thp.pdf
> 
> http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=shortlog
> 
> first: git clone git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
> or first: git clone --reference linux-2.6 git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
> later: git fetch; git checkout -f origin/master
> 
> The tree is rebased and git pull won't work.
> 
> http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.36-rc3/transparent_hugepage-30/
> http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.36-rc3/transparent_hugepage-30.gz
> 
> Diff #29 -> #30:
> 
>  b/compaction-migration-warning               |   25 +++
> 
> Avoid MIGRATION config warning when COMPACTION is selected but numa
> and memhotplug aren't.
> 
>  do_swap_page-VM_FAULT_WRITE                  |   21 --
>  kvm-huge-spte-wrprotect                      |   48 ------
>  kvm-mmu-notifier-huge-spte                   |   29 ---
>  root_anon_vma-anon_vma_lock                  |  208 ---------------------------
>  root_anon_vma-avoid-ksm-hang                 |   30 ---
>  root_anon_vma-bugchecks                      |   37 ----
>  root_anon_vma-in_vma                         |   27 ---
>  root_anon_vma-ksm_refcount                   |  169 ---------------------
>  root_anon_vma-lock_root                      |  127 ----------------
>  root_anon_vma-memory-compaction              |   36 ----
>  root_anon_vma-mm_take_all_locks              |   81 ----------
>  root_anon_vma-oldest_root                    |   81 ----------
>  root_anon_vma-refcount                       |   29 ---
>  root_anon_vma-swapin                         |   91 -----------
>  root_anon_vma-use-root                       |   66 --------
>  root_anon_vma-vma_lock_anon_vma              |   94 ------------
> 
> merged upstream.
> 
>  b/memcg_compound                             |  166 ++++++++++-----------
>  b/memcg_compound_tail                        |   31 +---
>  b/memcg_consume_stock                        |   31 ++--
>  memcg_check_room                             |   88 -----------
>  memcg_oom                                    |   34 ----
> 
> These had heavy rejects, the last two patches and other bits got
> removed. memcg code is rewritten so fast it's hard to justify to keep
> up with it. It's simpler and less time consuming to fix it just once
> than over and over again. Likely memcg in this release isn't too
> stable with THP on (it'll definitely work fine if you disable THP at
> compile time or at boot time with the kernel parameter). Especially
> all get_css/put_css will have to be re-audited after these new
> changes. For now it builds just fine and the basics to support THP and
> to show the direction are in. Nevertheless I welcome patches to fix
> this up.
> 
> btw, memcg developers could already support THP inside memcg even if
> THP is not included yet without any sort of problem, so it's also

Could you elaborate on what you mean here?

> partly up to them to want to support THP in memcg, but it's also
> perfectly ok to catch up with memcg externally, but it'd be also nice
> to know when memcg reaches a milestone and so when it's time to
> re-audit it all for THP.
>

We try not to change things too drastically, but several of the
current changes are fixes, and we are contemplating some more changes
to support I/O control. Some of the recent changes have been driven by
tracing. We will pay closer attention to the THP changes; thanks for
bringing your concern to our notice.

-- 
	Three Cheers,
	Balbir

* Re: Transparent Hugepage Support #30
  2010-09-09 10:46   ` Balbir Singh
@ 2010-09-09 23:40     ` Andrea Arcangeli
  -1 siblings, 0 replies; 22+ messages in thread
From: Andrea Arcangeli @ 2010-09-09 23:40 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Andrew Morton, linux-kernel, Marcelo Tosatti,
	Adam Litke, Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin,
	Rik van Riel, Mel Gorman, Dave Hansen, Benjamin Herrenschmidt,
	Ingo Molnar, Mike Travis, KAMEZAWA Hiroyuki, Christoph Lameter,
	Chris Wright, bpicco, KOSAKI Motohiro, Michael S. Tsirkin,
	Peter Zijlstra, Johannes Weiner, Daisuke Nishimura, Chris Mason,
	Borislav Petkov

Hello,

On Thu, Sep 09, 2010 at 04:16:30PM +0530, Balbir Singh wrote:
> * Andrea Arcangeli <aarcange@redhat.com> [2010-09-01 21:08:59]:
> > btw, memcg developers could already support THP inside memcg even if
> > THP is not included yet without any sort of problem, so it's also
> 
> Could you elaborate on what you mean here?

Ok, what I mean is that you could already stop assuming the "page"
passed as parameter to memcg is PAGE_SIZE in size. It would still work
fine. The check should later be done with PageTransCompound, as that
will be optimized away at compile time when
CONFIG_TRANSPARENT_HUGEPAGE=n. But in the meantime PageCompound will
work fine.
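
As a rough userspace illustration of the point above (not kernel code):
the sketch below derives the size to account from the page itself
instead of assuming PAGE_SIZE. Every model_* name, the MODEL_THP_ENABLED
switch and the order field are hypothetical stand-ins for
PageTransCompound, CONFIG_TRANSPARENT_HUGEPAGE and the compound order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins only; none of these names exist in the kernel. */
#define MODEL_PAGE_SIZE 4096UL

/* Set to 0 to model CONFIG_TRANSPARENT_HUGEPAGE=n. */
#ifndef MODEL_THP_ENABLED
#define MODEL_THP_ENABLED 1
#endif

struct model_page {
	bool compound;       /* models PageCompound(page) */
	unsigned int order;  /* models the compound order of the page */
};

/*
 * Models PageTransCompound(): with THP compiled out it is constant
 * false, so the compiler can drop the hugepage branch entirely.
 */
static inline bool model_page_trans_compound(const struct model_page *page)
{
#if MODEL_THP_ENABLED
	return page->compound;
#else
	(void)page;
	return false;
#endif
}

/* Bytes to account for @page, without assuming PAGE_SIZE. */
static inline size_t model_charge_size(const struct model_page *page)
{
	assert(page != NULL);
	if (model_page_trans_compound(page))
		return MODEL_PAGE_SIZE << page->order;
	return MODEL_PAGE_SIZE;
}
```

With MODEL_THP_ENABLED set to 0 the function collapses to a constant
MODEL_PAGE_SIZE return, which is the compile-time effect described
above.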

One nasty detail to pay attention to later (it isn't possible to
implement until compound_lock is defined) is that at times we may
also need to take the compound_lock to prevent the size of the page
from changing under us. It should only be needed if PageTransCompound
returns true, so it won't affect the regular paths and it won't be
built if THP is off at compile time. The collapsing takes the
mmap_sem in write mode, so it normally can't run in parallel;
furthermore the collapsing isn't done in place, so it's unlikely to
cause issues. So only the transition from transcompound to regular
page is likely to require special care.

> We try not to change too drastically, but several of the current
> changes are fixes, we are currently contemplating some more changes to
> support the I/O control. Some of the recent changes have been driven
> by tracing. We will pay closer attention to THP changes, thanks for
> bring your concern to our notice.

Thanks a lot. I can already start looking more closely into the memcg
of current upstream myself, if this is a good time and there are no
more big changes planned or already queued in some git tree waiting to
be pulled.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Transparent Hugepage Support #30
  2010-09-09 23:40     ` Andrea Arcangeli
@ 2010-09-13  9:34       ` Balbir Singh
  -1 siblings, 0 replies; 22+ messages in thread
From: Balbir Singh @ 2010-09-13  9:34 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Andrew Morton, linux-kernel, Marcelo Tosatti,
	Adam Litke, Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin,
	Rik van Riel, Mel Gorman, Dave Hansen, Benjamin Herrenschmidt,
	Ingo Molnar, Mike Travis, KAMEZAWA Hiroyuki, Christoph Lameter,
	Chris Wright, bpicco, KOSAKI Motohiro, Michael S. Tsirkin,
	Peter Zijlstra, Johannes Weiner, Daisuke Nishimura, Chris Mason,
	Borislav Petkov

* Andrea Arcangeli <aarcange@redhat.com> [2010-09-10 01:40:08]:

> Hello,
> 
> On Thu, Sep 09, 2010 at 04:16:30PM +0530, Balbir Singh wrote:
> > * Andrea Arcangeli <aarcange@redhat.com> [2010-09-01 21:08:59]:
> > > btw, memcg developers could already support THP inside memcg even if
> > > THP is not included yet without any sort of problem, so it's also
> > 
> > Could you elaborate by what you mean here?
> 
> Ok, what I mean is that you could already stop assuming the "page"
> passed as parameter to memcg is PAGE_SIZE in size. It would still work
> fine. The check should later be done with PageTransCompound as that
> will be optimized away at compile time when
> CONFIG_TRANSPARENT_HUGEPAGE=n. But in the meantime PageCompound shall
> work fine.
> 

OK, when the code is touched next and from now on, we'll stop making
that assumption.

> One nasty detail to pay attention to later (which isn't possible to
> implement until compound_lock is defined), is that at times we may
> also need to take the compound_lock to avoid the size of the page to
> change from under us (it should only be needed if PageTransCompound
> returns true so it won't affect the regular paths and it won't be
> built if THP is off at compile time). The collapsing takes the
> mmap_sem write mode which normally won't risk to run in parallel,
> furthermore the collapsing isn't done in place so it's unlikely to
> give issues. So only the transition from transcompound to regular
> page, is likely to require special care.
>

Thanks, is there an overhead of the compound_lock that will show up?

 
> > We try not to change too drastically, but several of the current
> > changes are fixes, we are currently contemplating some more changes to
> > support the I/O control. Some of the recent changes have been driven
> > by tracing. We will pay closer attention to THP changes, thanks for
> > bring your concern to our notice.
> 
> Thanks a lot. I can already start looking more closely into the memcg
> of current upstream myself, if this is a good time and there are no
> more big changes planned or already queued in some git tree waiting to
> be pulled.

Please do look at it, most of the churn is not controllable since it
is bug fixes and feature enhancements for newer subsystems and
performance. We'll try not to break anything fundamental.


-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Transparent Hugepage Support #30
  2010-09-13  9:34       ` Balbir Singh
@ 2010-09-15 13:42         ` Andrea Arcangeli
  -1 siblings, 0 replies; 22+ messages in thread
From: Andrea Arcangeli @ 2010-09-15 13:42 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Andrew Morton, linux-kernel, Marcelo Tosatti,
	Adam Litke, Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin,
	Rik van Riel, Mel Gorman, Dave Hansen, Benjamin Herrenschmidt,
	Ingo Molnar, Mike Travis, KAMEZAWA Hiroyuki, Christoph Lameter,
	Chris Wright, bpicco, KOSAKI Motohiro, Michael S. Tsirkin,
	Peter Zijlstra, Johannes Weiner, Daisuke Nishimura, Chris Mason,
	Borislav Petkov

Hello,

On Mon, Sep 13, 2010 at 03:04:09PM +0530, Balbir Singh wrote:
> OK, when the code is touched next and from now on, we'll stop making
> that assumption.

Great, thanks!

> Thanks, is there an overhead of the compound_lock that will show up?

The compound lock is a per-page bit spinlock, so it should scale
well. There is certainly a locked op overhead associated with it, but
it is only paid for hugepages, not normal pages.
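
To make the "per-page bit spinlock" and "locked op" remarks concrete,
here is a minimal userspace sketch using C11 atomics. It is only a model
of the idea: the model_* names and the bit value are hypothetical, and
the real compound_lock may differ in detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit in the per-page flags word; not the kernel's value. */
#define MODEL_PG_COMPOUND_LOCK 0x1UL

struct model_page {
	atomic_ulong flags;  /* one flags word per page, so one lock per page */
};

/* One atomic read-modify-write per attempt: this is the "locked op" cost. */
static inline bool model_compound_trylock(struct model_page *page)
{
	unsigned long old = atomic_fetch_or_explicit(&page->flags,
			MODEL_PG_COMPOUND_LOCK, memory_order_acquire);
	return !(old & MODEL_PG_COMPOUND_LOCK);
}

/* Spin until the lock bit was observed clear and is now set by us. */
static inline void model_compound_lock(struct model_page *page)
{
	while (!model_compound_trylock(page))
		; /* busy-wait */
}

/* Clear the lock bit; the caller must currently hold the lock. */
static inline void model_compound_unlock(struct model_page *page)
{
	unsigned long old = atomic_fetch_and_explicit(&page->flags,
			~MODEL_PG_COMPOUND_LOCK, memory_order_release);
	assert(old & MODEL_PG_COMPOUND_LOCK);
	(void)old;
}
```

Because the lock lives in each page's own flags word, two CPUs working
on different hugepages never contend, which is why it should scale; the
price is the atomic operation on every lock and unlock.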

Hugepages can't be collapsed in place, and they can only be collapsed
with the mmap_sem held in write mode (so holding the mmap_sem in read
or write mode is enough to protect against it). The same can't be said
for the split of a hugepage: hugepages can be split under the mmap_sem
just fine. The ways to protect against a split are the compound_lock
or the anon_vma_lock. Yet another way to prevent the page from being
split under us is to local_irq_disable and then call
__get_user_pages_fast like futex.c does: the page can't be split until
local_irq_enable is called, the same guarantee as in gup_fast, because
pmd_splitting_flush_notify will wait. (The tlb flush for the splitting
is really useless; it's just there to send an IPI and wait for any
gup_fast to finish.) It's not entirely clear right now what kind of
protection we need in memcg.

> Please do look at it, most of the churn is not controllable since it
> is bug fixes and feature enhancements for newer subsystems and
> performance. We'll try not to break anything fundamental.

Looking at it right now!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Transparent Hugepage Support #30
  2010-09-01 19:08 ` Andrea Arcangeli
@ 2010-10-04  3:24   ` Naoya Horiguchi
  -1 siblings, 0 replies; 22+ messages in thread
From: Naoya Horiguchi @ 2010-10-04  3:24 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Andrew Morton, linux-kernel, Marcelo Tosatti,
	Adam Litke, Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin,
	Rik van Riel, Mel Gorman, Dave Hansen, Benjamin Herrenschmidt,
	Ingo Molnar, Mike Travis, KAMEZAWA Hiroyuki, Christoph Lameter,
	Chris Wright, bpicco, KOSAKI Motohiro, Balbir Singh,
	Michael S. Tsirkin, Peter Zijlstra, Johannes Weiner,
	Daisuke Nishimura, Chris Mason, Borislav Petkov

Hi,

I hit a build error, "calling pte_alloc_map() with 3 parameters
while it's defined to have 4 parameters", in arch/x86/kernel/tboot.c etc.
Is the following chunk in the patch "pte alloc trans splitting" necessary?

@@ -1167,16 +1168,18 @@ static inline void pgtable_page_dtor(struct page *page)
        pte_unmap(pte);                                 \
 } while (0)
 
-#define pte_alloc_map(mm, pmd, address)                        \
-       ((unlikely(!pmd_present(*(pmd))) && __pte_alloc(mm, pmd, address))? \
-               NULL: pte_offset_map(pmd, address))
+#define pte_alloc_map(mm, vma, pmd, address)                           \
+       ((unlikely(pmd_none(*(pmd))) && __pte_alloc(mm, vma,    \
+                                                       pmd, address))? \
+        NULL: pte_offset_map(pmd, address))
 
Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Transparent Hugepage Support #30
  2010-10-04  3:24   ` Naoya Horiguchi
@ 2010-10-05 19:18     ` Andrea Arcangeli
  -1 siblings, 0 replies; 22+ messages in thread
From: Andrea Arcangeli @ 2010-10-05 19:18 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, linux-kernel, Marcelo Tosatti,
	Adam Litke, Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin,
	Rik van Riel, Mel Gorman, Dave Hansen, Benjamin Herrenschmidt,
	Ingo Molnar, Mike Travis, KAMEZAWA Hiroyuki, Christoph Lameter,
	Chris Wright, bpicco, KOSAKI Motohiro, Balbir Singh,
	Michael S. Tsirkin, Peter Zijlstra, Johannes Weiner,
	Daisuke Nishimura, Chris Mason, Borislav Petkov

Hi Naoya,

On Mon, Oct 04, 2010 at 12:24:51PM +0900, Naoya Horiguchi wrote:
> Hi,
> 
> I experienced build error of "calling pte_alloc_map() with 3 parameters, 
> while it's defined to have 4 parameters" in arch/x86/kernel/tboot.c etc.
> Is the following chunk in patch "pte alloc trans splitting" necessary?
> 
> @@ -1167,16 +1168,18 @@ static inline void pgtable_page_dtor(struct page *page)
>         pte_unmap(pte);                                 \
>  } while (0)
>  
> -#define pte_alloc_map(mm, pmd, address)                        \
> -       ((unlikely(!pmd_present(*(pmd))) && __pte_alloc(mm, pmd, address))? \
> -               NULL: pte_offset_map(pmd, address))
> +#define pte_alloc_map(mm, vma, pmd, address)                           \
> +       ((unlikely(pmd_none(*(pmd))) && __pte_alloc(mm, vma,    \
> +                                                       pmd, address))? \
> +        NULL: pte_offset_map(pmd, address))

Sure it's necessary.

Can you try again with current aa.git origin/master?
(84c5ce35cf221ed0e561dec279df6985a388a080) Thanks a lot.

Andrea
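
For readers following the macro in the quoted chunk, the sketch below
is a userspace model of its control flow: allocate only when the pmd is
empty, return NULL if that allocation fails, otherwise return the entry.
All model_* names are hypothetical, and the vma argument is merely
threaded through here to mirror the 4-parameter form, which is why every
3-parameter caller has to be updated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins; these are not the kernel types. */
struct model_mm  { int fail_alloc; int allocs; };
struct model_vma { int unused; };
struct model_pmd { long *table; };  /* table == NULL models pmd_none() */

/* Models __pte_alloc(): returns nonzero on allocation failure. */
static inline int model_pte_alloc(struct model_mm *mm, struct model_vma *vma,
				  struct model_pmd *pmd)
{
	(void)vma;  /* only passed along in this model */
	if (mm->fail_alloc)
		return 1;
	pmd->table = calloc(512, sizeof(long));
	if (!pmd->table)
		return 1;
	mm->allocs++;
	return 0;
}

/*
 * Same short-circuit shape as the patched pte_alloc_map(): the
 * allocation runs only if the pmd is empty, and NULL is returned
 * only if that allocation fails.
 */
#define model_pte_alloc_map(mm, vma, pmd, index)                      \
	((((pmd)->table == NULL) && model_pte_alloc(mm, vma, pmd)) ?  \
		NULL : &(pmd)->table[index])
```

A caller still written against the old 3-parameter form fails at
preprocessing time with exactly the kind of error reported above.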

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch] transparent hugepage sysfs meminfo
  2010-08-17 20:56   ` David Rientjes
@ 2010-08-17 21:01     ` Rik van Riel
  -1 siblings, 0 replies; 22+ messages in thread
From: Rik van Riel @ 2010-08-17 21:01 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andrea Arcangeli, linux-mm, linux-kernel

On 08/17/2010 04:56 PM, David Rientjes wrote:
> Add hugepage statistics to per-node sysfs meminfo
>
> Cc: Rik van Riel<riel@redhat.com>
> Signed-off-by: David Rientjes<rientjes@google.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch] transparent hugepage sysfs meminfo
  2010-08-03 13:56 Transparent Hugepage Support #29 Andrea Arcangeli
@ 2010-08-17 20:56   ` David Rientjes
  0 siblings, 0 replies; 22+ messages in thread
From: David Rientjes @ 2010-08-17 20:56 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, linux-mm, linux-kernel

Add hugepage statistics to per-node sysfs meminfo

Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 drivers/base/node.c |   21 ++++++++++++++++++---
 1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -98,7 +98,11 @@ static ssize_t node_read_meminfo(struct sys_device * dev,
 		       "Node %d WritebackTmp:   %8lu kB\n"
 		       "Node %d Slab:           %8lu kB\n"
 		       "Node %d SReclaimable:   %8lu kB\n"
-		       "Node %d SUnreclaim:     %8lu kB\n",
+		       "Node %d SUnreclaim:     %8lu kB\n"
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		       "Node %d AnonHugePages:  %8lu kB\n"
+#endif
+			,
 		       nid, K(i.totalram),
 		       nid, K(i.freeram),
 		       nid, K(i.totalram - i.freeram),
@@ -122,7 +126,12 @@ static ssize_t node_read_meminfo(struct sys_device * dev,
 		       nid, K(node_page_state(nid, NR_WRITEBACK)),
 		       nid, K(node_page_state(nid, NR_FILE_PAGES)),
 		       nid, K(node_page_state(nid, NR_FILE_MAPPED)),
-		       nid, K(node_page_state(nid, NR_ANON_PAGES)),
+		       nid, K(node_page_state(nid, NR_ANON_PAGES)
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			+ node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) *
+			HPAGE_PMD_NR
+#endif
+		       ),
 		       nid, K(node_page_state(nid, NR_SHMEM)),
 		       nid, node_page_state(nid, NR_KERNEL_STACK) *
 				THREAD_SIZE / 1024,
@@ -133,7 +142,13 @@ static ssize_t node_read_meminfo(struct sys_device * dev,
 		       nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE) +
 				node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
 		       nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE)),
-		       nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE)));
+		       nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE))
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			, nid,
+			K(node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) *
+				HPAGE_PMD_NR)
+#endif
+		       );
 	n += hugetlb_report_node_meminfo(nid, buf + n);
 	return n;
 }
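
A worked example of the arithmetic in the patch above, as a userspace
sketch: the counter holds a number of hugepages, so it is multiplied by
HPAGE_PMD_NR to get base pages before K() converts pages to kB. The
MODEL_* constants assume 4 kB base pages and 2 MB hugepages (as on
x86-64) and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_PAGE_SHIFT   12     /* assumption: 4 kB base pages */
#define MODEL_HPAGE_PMD_NR 512UL  /* assumption: 2 MB hugepages */

/* pages -> kB, same shape as the K() helper used in node.c */
#define MODEL_K(x) ((x) << (MODEL_PAGE_SHIFT - 10))

/* Number of base pages covered by @nr_thp hugepages. */
static inline unsigned long model_thp_base_pages(unsigned long nr_thp)
{
	assert(nr_thp <= ULONG_MAX / MODEL_HPAGE_PMD_NR);
	return nr_thp * MODEL_HPAGE_PMD_NR;
}

/* Value of the new "AnonHugePages" line, in kB. */
static inline unsigned long model_anon_huge_kb(unsigned long nr_thp)
{
	return MODEL_K(model_thp_base_pages(nr_thp));
}

/* Value of the "AnonPages" line once hugepages are added in, in kB. */
static inline unsigned long model_anon_pages_kb(unsigned long nr_anon,
						unsigned long nr_thp)
{
	return MODEL_K(nr_anon + model_thp_base_pages(nr_thp));
}
```

For instance, 3 hugepages are 3 * 512 = 1536 base pages, i.e. 6144 kB;
with 100 regular anonymous pages on top, the AnonPages line would read
(100 + 1536) * 4 = 6544 kB.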

^ permalink raw reply	[flat|nested] 22+ messages in thread

Thread overview: 22+ messages
2010-09-01 19:08 Transparent Hugepage Support #30 Andrea Arcangeli
2010-09-01 19:08 ` Andrea Arcangeli
2010-09-01 19:44 ` [patch] transparent hugepage sysfs meminfo David Rientjes
2010-09-01 19:44   ` David Rientjes
2010-09-01 19:50   ` Andrea Arcangeli
2010-09-01 19:50     ` Andrea Arcangeli
2010-09-09 10:46 ` Transparent Hugepage Support #30 Balbir Singh
2010-09-09 10:46   ` Balbir Singh
2010-09-09 23:40   ` Andrea Arcangeli
2010-09-09 23:40     ` Andrea Arcangeli
2010-09-13  9:34     ` Balbir Singh
2010-09-13  9:34       ` Balbir Singh
2010-09-15 13:42       ` Andrea Arcangeli
2010-09-15 13:42         ` Andrea Arcangeli
2010-10-04  3:24 ` Naoya Horiguchi
2010-10-04  3:24   ` Naoya Horiguchi
2010-10-05 19:18   ` Andrea Arcangeli
2010-10-05 19:18     ` Andrea Arcangeli
  -- strict thread matches above, loose matches on Subject: below --
2010-08-03 13:56 Transparent Hugepage Support #29 Andrea Arcangeli
2010-08-17 20:56 ` [patch] transparent hugepage sysfs meminfo David Rientjes
2010-08-17 20:56   ` David Rientjes
2010-08-17 21:01   ` Rik van Riel
2010-08-17 21:01     ` Rik van Riel
