linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/36] AutoNUMA24
@ 2012-08-22 14:58 Andrea Arcangeli
From: Andrea Arcangeli @ 2012-08-22 14:58 UTC
  To: linux-kernel, linux-mm
  Cc: Hillf Danton, Dan Smith, Linus Torvalds, Andrew Morton,
	Thomas Gleixner, Ingo Molnar, Paul Turner, Suresh Siddha,
	Mike Galbraith, Paul E. McKenney, Lai Jiangshan, Bharata B Rao,
	Lee Schermerhorn, Rik van Riel, Johannes Weiner,
	Srivatsa Vaddagiri, Christoph Lameter, Alex Shi,
	Mauricio Faria de Oliveira, Konrad Rzeszutek Wilk, Don Morris,
	Benjamin Herrenschmidt

Hello everyone,

Before the Kernel Summit, I think it's a good idea to post a new
AutoNUMA24 and to go through a new review cycle. The last review cycle
has been fundamental in improving the patchset. Thanks!

The objective of AutoNUMA is to perform as close as possible to (and
sometimes faster than) NUMA hard CPU/memory binding setups, without
requiring the administrator to manually set up any NUMA hard
binding.

I hope everyone sees this is a hard problem, and that what one thinks
will work great in theory may not run so great when tested in
practice. But I'd like to remind everyone that all research is good
and valuable. All approaches to solving the problem are worthwhile,
regardless of whether they work better or worse. The sched-numa
rewrite is also a very interesting approach and I hope everyone agrees
that it's wonderful that both ways to solve the problem are being
researched. Whatever is merged upstream in the end won't change the
fact that all work done to try to solve this hard problem is very
valuable and worthwhile.

git clone --reference linux -b autonuma24 git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git

Development autonuma branch:

git clone --reference linux -b autonuma git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git

To update:

git fetch
git checkout -f origin/autonuma

PDF with some benchmark results:

http://www.kernel.org/pub/linux/kernel/people/andrea/autonuma/autonuma-vs-sched-numa-rewrite-20120817.pdf
http://www.kernel.org/pub/linux/kernel/people/andrea/autonuma/autonuma_bench-20120530.pdf

Changelog from AutoNUMA19 to AutoNUMA24:

o Improved lots of comments and header commit messages.

o Rewrote from scratch the comment at the top of kernel/sched/numa.c,
  as the old comment wasn't well received in upstream reviews. It now
  tries to describe the algorithm from a global view.

o Added ppc64 support.

o Improved patch splitup.

o Lots of code cleanups and variable renames to make the code more readable.

o Try to take advantage of task_autonuma_nid before the knuma_scand is
  complete.

o Moved some performance tuning sysfs tweaks under DEBUG_VM so they
  won't be visible on production kernels.

o Enabled by default the working set mode for the mm_autonuma data
  collection.

o Halved the size of the mm_autonuma structure.

o scan_sleep_pass_millisecs is now more intuitive (you can set it to
  10000 to mean one pass every 10 sec; in the previous release it had
  to be set to 5000 to get one pass every 10 sec).

o Removed PF_THREAD_BOUND to allow CPU isolation. Turned the VM_BUG_ON
  verifying the hard binding into a WARN_ON_ONCE so the knuma_migrated
  can be moved by root anywhere safely.

o Optimized autonuma_possible() to avoid checking num_possible_nodes()
  every time.

o Added the math on the last_nid statistical effects from sched-numa
  rewrite which also introduced the last_nid logic of AutoNUMA.

o Now handle systems with holes in the NUMA nodemask. Lots of
  num_possible_nodes() calls were replaced with nr_node_ids
  (nr_node_ids is not such a nice name for this information).

o Fixed a bug affecting KSM, found by LTP: KSM failed to merge pages
  mapped with a pte_numa pte. It now passes LTP fine.

o Fixed repeated CPU scheduler migration in sched_autonuma_balance()
  (the idle load balancing was sometimes faster and put the task
  back on its previous CPU before it had a chance to be scheduled on
  the destination CPU).

o More...
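
To illustrate the nodemask-holes item above, here is a minimal userspace
sketch (not kernel code) of why a per-node table has to be sized by the
highest possible node ID plus one rather than by the number of possible
nodes. The node layout {0, 1, 4, 5} and the helper names per_node_table()
and record_fault() are hypothetical, chosen only for this example.

```python
# Hypothetical node layout: node IDs 2 and 3 are absent (a "hole").
possible_nodes = {0, 1, 4, 5}

# Analogue of num_possible_nodes(): how many nodes may exist.
num_possible = len(possible_nodes)

# Analogue of nr_node_ids: highest possible node ID plus one.
nr_ids = max(possible_nodes) + 1


def per_node_table(size):
    """Allocate one counter slot per index, like a per-node stats array."""
    return [0] * size


def record_fault(table, nid):
    """Account one event against node `nid`, indexing the table by node ID."""
    table[nid] += 1


# Sized by count: node ID 5 does not fit in a 4-entry table.
too_small = per_node_table(num_possible)
try:
    record_fault(too_small, 5)
    overflowed = False
except IndexError:
    overflowed = True

# Sized by highest ID + 1: every possible node ID is a valid index,
# at the cost of unused slots for the missing IDs.
big_enough = per_node_table(nr_ids)
for nid in sorted(possible_nodes):
    record_fault(big_enough, nid)

print(num_possible, nr_ids, overflowed, big_enough)
```

In Python the undersized table raises IndexError; in C the same access
would silently write past the end of the array, which is why the
distinction matters for anything indexed by node ID.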

Changelog from AutoNUMA-alpha14 to AutoNUMA19:

o sched_autonuma_balance callout location removed from schedule(); it now
  runs in the softirq along with CFS load_balancing

o lots of documentation about the math in the sched_autonuma_balance algorithm

o fixed a bug in the fast path detection in sched_autonuma_balance that could
  decrease performance with many nodes

o reduced the page_autonuma memory overhead from 32 to 12 bytes per page

o fixed a crash in __pmd_numa_fixup

o knuma_numad won't scan VM_MIXEDMAP|PFNMAP (it never touched those ptes
  anyway)

o fixed a crash in autonuma_exit

o fixed a crash when split_huge_page returns 0 in knuma_migratedN as the page
  has been freed already

o assorted cleanups and probably more
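
To put the page_autonuma reduction from 32 to 12 bytes per page in
perspective, the arithmetic can be sketched as below. The 4 KiB page
size and the 64 GiB machine are assumptions made only for this example;
they are not figures from the patchset.

```python
PAGE_SIZE = 4096         # assumption: 4 KiB base pages
RAM_BYTES = 64 * 2**30   # assumption: a hypothetical 64 GiB machine


def overhead_bytes(per_page_bytes, ram_bytes=RAM_BYTES, page_size=PAGE_SIZE):
    """Total metadata size when every page carries per_page_bytes."""
    return (ram_bytes // page_size) * per_page_bytes


def overhead_percent(per_page_bytes, page_size=PAGE_SIZE):
    """Metadata size as a percentage of the memory it describes."""
    return 100.0 * per_page_bytes / page_size


old = overhead_bytes(32)   # before: 32 bytes per page
new = overhead_bytes(12)   # after: 12 bytes per page

print(f"old: {old // 2**20} MiB ({overhead_percent(32):.2f}% of RAM)")
print(f"new: {new // 2**20} MiB ({overhead_percent(12):.2f}% of RAM)")
```

Under these assumptions the per-page metadata shrinks from 512 MiB to
192 MiB, that is from about 0.78% to about 0.29% of RAM.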

Changelog from alpha13 to alpha14:

o page_autonuma introduction, no memory wasted if the kernel is booted
  on non-NUMA hardware. Tested with flatmem/sparsemem on x86
  autonuma=y/n and sparsemem/vsparsemem on x86_64 with autonuma=y/n.
  "noautonuma" kernel param disables autonuma permanently also when
  booted on NUMA hardware (no /sys/kernel/mm/autonuma, and no
  page_autonuma allocations, like cgroup_disable=memory)

o autonuma_balance only runs along with run_rebalance_domains, to
  avoid altering the usual scheduler runtime. autonuma_balance gives a
  "kick" to the scheduler after a rebalance (it overrides the load
  balance activity if needed). It's not yet tested on specjbb or more
  scheduling-intensive benchmarks; hopefully there's no NUMA
  regression. For intensive compute loads not involving a flood of
  scheduling activity this doesn't show any performance regression,
  and it avoids altering the strict schedule performance. It goes in
  the direction of being less intrusive with the stock scheduler
  runtime.

  Note: autonuma_balance still runs from normal context (not softirq
  context like run_rebalance_domains) to be able to wait on process
  migration (avoid _nowait), but most of the time it does nothing at
  all.

Changelog from alpha11 to alpha13:

o autonuma_balance optimization (take the fast path when the process is
  in the preferred NUMA node)

TODO:

o THP native migration (orthogonal and also needed for
  cpuset/migrate_pages(2)/numa/sched).

Andrea Arcangeli (35):
  autonuma: make set_pmd_at always available
  autonuma: export is_vma_temporary_stack() even if
    CONFIG_TRANSPARENT_HUGEPAGE=n
  autonuma: define _PAGE_NUMA_PTE and _PAGE_NUMA_PMD
  autonuma: pte_numa() and pmd_numa()
  autonuma: teach gup_fast about pmd_numa
  autonuma: introduce kthread_bind_node()
  autonuma: mm_autonuma and task_autonuma data structures
  autonuma: define the autonuma flags
  autonuma: core autonuma.h header
  autonuma: CPU follows memory algorithm
  autonuma: add page structure fields
  autonuma: knuma_migrated per NUMA node queues
  autonuma: autonuma_enter/exit
  autonuma: call autonuma_setup_new_exec()
  autonuma: alloc/free/init task_autonuma
  autonuma: alloc/free/init mm_autonuma
  autonuma: prevent select_task_rq_fair to return -1
  autonuma: teach CFS about autonuma affinity
  autonuma: memory follows CPU algorithm and task/mm_autonuma stats
    collection
  autonuma: default mempolicy follow AutoNUMA
  autonuma: call autonuma_split_huge_page()
  autonuma: make khugepaged pte_numa aware
  autonuma: retain page last_nid information in khugepaged
  autonuma: numa hinting page faults entry points
  autonuma: reset autonuma page data when pages are freed
  autonuma: link mm/autonuma.o and kernel/sched/numa.o
  autonuma: add CONFIG_AUTONUMA and CONFIG_AUTONUMA_DEFAULT_ENABLED
  autonuma: page_autonuma
  autonuma: autonuma_migrate_head[0] dynamic size
  autonuma: bugcheck page_autonuma fields on newly allocated pages
  autonuma: shrink the per-page page_autonuma struct size
  autonuma: boost khugepaged scanning rate
  autonuma: make the AUTONUMA_SCAN_PMD_FLAG conditional to
    CONFIG_HAVE_ARCH_AUTONUMA_SCAN_PMD
  autonuma: add knuma_migrated/allow_first_fault in sysfs
  autonuma: add mm_autonuma working set estimation

Vaidyanathan Srinivasan (1):
  autonuma: powerpc port

 arch/Kconfig                              |    6 +
 arch/powerpc/Kconfig                      |    6 +
 arch/powerpc/include/asm/pgtable.h        |   48 +-
 arch/powerpc/include/asm/pte-hash64-64k.h |    4 +-
 arch/powerpc/mm/numa.c                    |    3 +-
 arch/x86/Kconfig                          |    2 +
 arch/x86/include/asm/paravirt.h           |    2 -
 arch/x86/include/asm/pgtable.h            |   65 ++-
 arch/x86/include/asm/pgtable_types.h      |   28 +
 arch/x86/mm/gup.c                         |   13 +-
 arch/x86/mm/numa.c                        |    6 +-
 arch/x86/mm/numa_32.c                     |    3 +-
 fs/exec.c                                 |    7 +
 include/asm-generic/pgtable.h             |   12 +
 include/linux/autonuma.h                  |   72 ++
 include/linux/autonuma_flags.h            |  168 +++
 include/linux/autonuma_list.h             |  100 ++
 include/linux/autonuma_sched.h            |   50 +
 include/linux/autonuma_types.h            |  169 +++
 include/linux/huge_mm.h                   |    6 +-
 include/linux/kthread.h                   |    1 +
 include/linux/memory_hotplug.h            |    3 +-
 include/linux/mm_types.h                  |    5 +
 include/linux/mmzone.h                    |   38 +
 include/linux/page_autonuma.h             |   59 +
 include/linux/sched.h                     |    3 +
 init/main.c                               |    2 +
 kernel/fork.c                             |   18 +
 kernel/kthread.c                          |   21 +
 kernel/sched/Makefile                     |    1 +
 kernel/sched/core.c                       |    1 +
 kernel/sched/fair.c                       |   86 ++-
 kernel/sched/numa.c                       |  604 ++++++++++
 kernel/sched/sched.h                      |   19 +
 mm/Kconfig                                |   17 +
 mm/Makefile                               |    1 +
 mm/autonuma.c                             | 1727 +++++++++++++++++++++++++++++
 mm/autonuma_list.c                        |  169 +++
 mm/huge_memory.c                          |   78 ++-
 mm/memory.c                               |   31 +
 mm/memory_hotplug.c                       |    2 +-
 mm/mempolicy.c                            |   12 +-
 mm/mmu_context.c                          |    3 +
 mm/page_alloc.c                           |    7 +-
 mm/page_autonuma.c                        |  248 +++++
 mm/sparse.c                               |  126 ++-
 46 files changed, 4014 insertions(+), 38 deletions(-)
 create mode 100644 include/linux/autonuma.h
 create mode 100644 include/linux/autonuma_flags.h
 create mode 100644 include/linux/autonuma_list.h
 create mode 100644 include/linux/autonuma_sched.h
 create mode 100644 include/linux/autonuma_types.h
 create mode 100644 include/linux/page_autonuma.h
 create mode 100644 kernel/sched/numa.c
 create mode 100644 mm/autonuma.c
 create mode 100644 mm/autonuma_list.c
 create mode 100644 mm/page_autonuma.c


Thread overview: 54+ messages
2012-08-22 14:58 [PATCH 00/36] AutoNUMA24 Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 01/36] autonuma: make set_pmd_at always available Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 02/36] autonuma: export is_vma_temporary_stack() even if CONFIG_TRANSPARENT_HUGEPAGE=n Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 03/36] autonuma: define _PAGE_NUMA_PTE and _PAGE_NUMA_PMD Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 04/36] autonuma: pte_numa() and pmd_numa() Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 05/36] autonuma: teach gup_fast about pmd_numa Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 06/36] autonuma: introduce kthread_bind_node() Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 07/36] autonuma: mm_autonuma and task_autonuma data structures Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 08/36] autonuma: define the autonuma flags Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 09/36] autonuma: core autonuma.h header Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 10/36] autonuma: CPU follows memory algorithm Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 11/36] autonuma: add page structure fields Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 12/36] autonuma: knuma_migrated per NUMA node queues Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 13/36] autonuma: autonuma_enter/exit Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 14/36] autonuma: call autonuma_setup_new_exec() Andrea Arcangeli
2012-08-22 14:58 ` [PATCH 15/36] autonuma: alloc/free/init task_autonuma Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 16/36] autonuma: alloc/free/init mm_autonuma Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 17/36] autonuma: prevent select_task_rq_fair to return -1 Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 18/36] autonuma: teach CFS about autonuma affinity Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 19/36] autonuma: memory follows CPU algorithm and task/mm_autonuma stats collection Andrea Arcangeli
2012-08-22 20:19   ` Andi Kleen
2012-08-22 21:22     ` Hugh Dickins
2012-08-22 21:24     ` Andrea Arcangeli
2012-08-22 22:37       ` Andi Kleen
2012-08-22 22:46         ` Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 20/36] autonuma: default mempolicy follow AutoNUMA Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 21/36] autonuma: call autonuma_split_huge_page() Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 22/36] autonuma: make khugepaged pte_numa aware Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 23/36] autonuma: retain page last_nid information in khugepaged Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 24/36] autonuma: numa hinting page faults entry points Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 25/36] autonuma: reset autonuma page data when pages are freed Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 26/36] autonuma: link mm/autonuma.o and kernel/sched/numa.o Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 27/36] autonuma: add CONFIG_AUTONUMA and CONFIG_AUTONUMA_DEFAULT_ENABLED Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 28/36] autonuma: page_autonuma Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 29/36] autonuma: autonuma_migrate_head[0] dynamic size Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 30/36] autonuma: bugcheck page_autonuma fields on newly allocated pages Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 31/36] autonuma: shrink the per-page page_autonuma struct size Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 32/36] autonuma: boost khugepaged scanning rate Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 33/36] autonuma: powerpc port Andrea Arcangeli
2012-08-22 22:01   ` Benjamin Herrenschmidt
2012-08-22 22:35     ` Andrea Arcangeli
2012-08-23  5:11       ` Benjamin Herrenschmidt
2012-08-23 15:23         ` Andrea Arcangeli
2012-08-23 22:13         ` Benjamin Herrenschmidt
2012-08-22 22:56     ` Benjamin Herrenschmidt
2012-08-22 23:06       ` Andrea Arcangeli
2012-08-23  4:15       ` Vaidyanathan Srinivasan
2012-08-22 14:59 ` [PATCH 34/36] autonuma: make the AUTONUMA_SCAN_PMD_FLAG conditional to CONFIG_HAVE_ARCH_AUTONUMA_SCAN_PMD Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 35/36] autonuma: add knuma_migrated/allow_first_fault in sysfs Andrea Arcangeli
2012-08-22 14:59 ` [PATCH 36/36] autonuma: add mm_autonuma working set estimation Andrea Arcangeli
2012-08-22 19:26 ` [PATCH 00/36] AutoNUMA24 Rik van Riel
2012-08-22 21:40   ` Ingo Molnar
2012-08-22 22:19     ` Andrea Arcangeli
2012-08-23  8:42       ` Ingo Molnar
