* [RFC PATCH 00/20] Cleanup and optimise the page allocator
From: Mel Gorman @ 2009-02-22 23:17 UTC (permalink / raw)
  To: Mel Gorman, Linux Memory Management List
  Cc: Pekka Enberg, Rik van Riel, KOSAKI Motohiro, Christoph Lameter,
	Johannes Weiner, Nick Piggin, Linux Kernel Mailing List,
	Lin Ming, Zhang Yanmin

The complexity of the page allocator has been increasing for some time
and it has now reached the point where the SLUB allocator is doing strange
tricks to avoid the page allocator. This is obviously bad as it may encourage
other subsystems to try avoiding the page allocator as well.

This series of patches is intended to reduce the cost of the page
allocator by doing the following.

Patches 1-3 iron out the entry paths slightly and remove stupid sanity
checks from the fast path.

Patch 4 uses a lookup table instead of a number of branches to decide what
zones are usable given the GFP flags.
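
As a rough illustration of the idea only (this is not the code from the
patch), the sketch below precalculates the answer for every combination of
zone-selecting flag bits and then indexes the table instead of branching.
The flag values, the zone names and the helpers zone_by_branches(),
init_zone_table() and zone_by_table() are all made up for the example and
do not match the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical zone identifiers and flag bits; not the kernel's real values. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

#define X_GFP_DMA      0x01u
#define X_GFP_HIGHMEM  0x02u
#define X_GFP_DMA32    0x04u
#define X_GFP_MOVABLE  0x08u
#define X_GFP_ZONEMASK 0x0fu

/* Reference version: a chain of branches evaluated on every call. */
static enum zone_type zone_by_branches(unsigned int flags)
{
	if (flags & X_GFP_DMA)
		return ZONE_DMA;
	if (flags & X_GFP_DMA32)
		return ZONE_DMA32;
	if ((flags & (X_GFP_HIGHMEM | X_GFP_MOVABLE)) ==
	    (X_GFP_HIGHMEM | X_GFP_MOVABLE))
		return ZONE_MOVABLE;
	if (flags & X_GFP_HIGHMEM)
		return ZONE_HIGHMEM;
	return ZONE_NORMAL;
}

/* One entry per combination of the zone-selecting bits. */
static uint8_t zone_table[X_GFP_ZONEMASK + 1];

/* Fill the table once, e.g. during startup, using the branchy version. */
static void init_zone_table(void)
{
	for (unsigned int f = 0; f <= X_GFP_ZONEMASK; f++)
		zone_table[f] = (uint8_t)zone_by_branches(f);
}

/* Lookup version: mask off the relevant bits and index the table. */
static inline enum zone_type zone_by_table(unsigned int flags)
{
	return (enum zone_type)zone_table[flags & X_GFP_ZONEMASK];
}
```

The table costs one byte per flag combination here, so the trade is a small
amount of data cache for fewer branches in the hot path.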

Patch 5 avoids repeated checks of the zonelist.

Patch 6 breaks the allocator up into a fast and slow path where the fast
path later becomes one long inlined function.
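
To show the general shape of such a split, here is a toy sketch, not the
allocator itself. The common case is a short inline function and everything
rare is pushed into a separate out-of-line function. struct toy_pool,
toy_alloc() and toy_alloc_slowpath() are hypothetical names, and the
noinline attribute and __builtin_expect() assume GCC or a compatible
compiler.

```c
#include <stddef.h>

#define POOL_SLOTS 4

/* Toy pool: a stack of free slots plus a counter for slow-path entries. */
struct toy_pool {
	void *free[POOL_SLOTS];
	size_t nr_free;
	unsigned long slow_calls;
};

static char backing[POOL_SLOTS][64];

/*
 * Slow path: kept out of line so it adds nothing to the callers of the
 * fast path. In this toy it simply refills the pool from static storage,
 * handing the same memory out again, which a real allocator would not do.
 */
__attribute__((noinline))
static void *toy_alloc_slowpath(struct toy_pool *p)
{
	p->slow_calls++;
	for (size_t i = 0; i < POOL_SLOTS; i++)
		p->free[i] = backing[i];
	p->nr_free = POOL_SLOTS;
	return p->free[--p->nr_free];
}

/* Fast path: one predictable branch and a pop, inlined into the caller. */
static inline void *toy_alloc(struct toy_pool *p)
{
	if (__builtin_expect(p->nr_free > 0, 1))
		return p->free[--p->nr_free];
	return toy_alloc_slowpath(p);
}
```

The cost of this arrangement is that every call site carries a copy of the
fast path, which is where text growth comes from.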

Patches 7-10 avoid calculating the same things repeatedly and instead
calculate them once.

Patches 11-13 inline the whole allocator fast path.

Patch 14 avoids calling get_pageblock_migratetype() potentially twice on
every page free.

Patch 15 reduces the number of times interrupts are disabled by reworking
what free_page_mlock() does. However, I notice that the cost of calling
TestClearPageMlocked() is still quite high and I'm guessing it's because
it's a locked bit operation. It'd be nice if it could be established whether
it's safe to use an unlocked version here. Rik, can you comment?

Patch 16 avoids using the zonelist cache on non-NUMA machines.

Patch 17 removes an expensive and excessively paranoid check in the
allocator fast path

Patch 18 avoids a list search in the allocator fast path.
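
A minimal sketch of how a list search can be avoided, assuming the search
in question is a walk of one mixed list looking for an entry of a given
type. Keeping one list per type turns the walk into a constant-time pop.
The types and helpers below (struct toy_page, struct toy_pcp,
take_from_mixed(), put_typed(), take_typed()) are hypothetical and much
simpler than the real per-cpu structures.

```c
#include <stddef.h>

enum toy_type { TYPE_UNMOVABLE, TYPE_RECLAIMABLE, TYPE_MOVABLE, NR_TYPES };

struct toy_page {
	enum toy_type type;
	struct toy_page *next;
};

/* One mixed list: walk until an entry of the wanted type turns up. */
static struct toy_page *take_from_mixed(struct toy_page **head,
					enum toy_type want)
{
	for (struct toy_page **pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->type == want) {
			struct toy_page *page = *pp;

			*pp = page->next;	/* unlink the match */
			page->next = NULL;
			return page;
		}
	}
	return NULL;
}

/* One list per type: no walk, just index by type and pop the head. */
struct toy_pcp {
	struct toy_page *lists[NR_TYPES];
};

static void put_typed(struct toy_pcp *pcp, struct toy_page *page)
{
	page->next = pcp->lists[page->type];
	pcp->lists[page->type] = page;
}

static struct toy_page *take_typed(struct toy_pcp *pcp, enum toy_type want)
{
	struct toy_page *page = pcp->lists[want];

	if (page) {
		pcp->lists[want] = page->next;
		page->next = NULL;
	}
	return page;
}
```

The per-type version needs one list head per type in each per-cpu
structure, so it trades a little memory for the removed walk.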

Patch 19 avoids repeated checking of an empty list.

Patch 20 gets rid of hot/cold freeing of pages because it incurs cost for
what I believe to be very dubious gain. I'm not sure we currently gain
anything by it but it's further discussed in the patch itself.

Running all of these through a profiler shows me the cost of page allocation
and freeing is reduced by a nice amount without drastically altering how the
allocator actually works. Excluding the cost of zeroing pages, the cost of
allocation is reduced by 25% and the cost of freeing by 12%.  Again excluding
zeroing a page, much of the remaining cost is due to counters, debugging
checks and interrupt disabling.  Of course when a page has to be zeroed,
the dominant cost of a page allocation is zeroing it.

Counters are surprisingly expensive; we spend a good chunk of our time in
functions like __dec_zone_page_state and __dec_zone_state. In a profiled
run of kernbench, the time spent in __dec_zone_state was roughly equal to
the combined cost of the rest of the page free path. A quick check showed
that almost half of the time in that function is spent on line 233 alone
which for me is:

	(*p)--;

That's worth a separate investigation but it might be the case that
manipulating int8_t on the machine I was using for profiling is unusually
expensive. Converting this to an int might be faster but the increased
memory consumption and cache footprint might be a problem. Opinions?
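
To make the trade-off concrete, here is a simplified sketch of a per-cpu
delta counter that is folded into a global count once it drifts past a
threshold, alongside the same array widened to int. NR_ITEMS, THRESHOLD,
struct pcp_narrow, struct pcp_wide and dec_narrow() are made up for the
example and the folding logic is deliberately simpler than what the kernel
does.

```c
#include <stdint.h>

#define NR_ITEMS  16	/* hypothetical number of counters */
#define THRESHOLD 32	/* hypothetical fold threshold */

/* Narrow per-cpu deltas: one byte per counter. */
struct pcp_narrow {
	int8_t diff[NR_ITEMS];
};

/* Same deltas widened to int: typically four times the footprint. */
struct pcp_wide {
	int diff[NR_ITEMS];
};

static long global_count[NR_ITEMS];

/*
 * Decrement the local delta and only touch the shared global counter
 * once the delta has moved past the threshold.
 */
static void dec_narrow(struct pcp_narrow *pcp, int item)
{
	int8_t *p = &pcp->diff[item];

	(*p)--;
	if (*p < -THRESHOLD) {
		global_count[item] += *p;	/* fold the delta in */
		*p = 0;
	}
}
```

With these made-up sizes the narrow array is 16 bytes and the int array is
64 bytes where int is 4 bytes wide, so on a machine with 64-byte cache
lines the former can share a line with other per-cpu data while the latter
fills one by itself.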

The downside is that the patches do increase text size because of the
splitting of the fast path into one inlined blob and the slow path into a
number of other functions. On my test machine, text increased by 1.2K so
I might revisit that again and see how much of a difference it really made.

That all said, I'm seeing good results on actual benchmarks with these
patches.

o On many machines, I'm seeing a 0-2% improvement on kernbench. The dominant
  cost in kernbench is the compiler and zeroing allocated pages for
  pagetables.

o For tbench, I have seen an 8-12% improvement on two x86-64 machines (elm3b6
  on test.kernel.org gained 8%) but generally it was less dramatic on
  x86-64 in the range of 0-4%. On one PPC64, the difference was also in the
  range of 0-4%. Generally there were gains, but one specific ppc64 showed a
  regression of 7% for one client but a negligible difference for 8 clients.
  It's not clear why this machine regressed and others didn't.

o hackbench is harder to conclude anything from. Most machines showed
  performance gains in the 5-11% range but one machine in particular showed
  a mix of gains and losses depending on the number of clients. Might be
  a caching thing.

o One machine in particular was a major surprise for sysbench with gains
  of 4-8% there which was drastically higher than I was expecting. However,
  on other machines, it was in the more reasonable 0-4% range, still pretty
  respectable. It's not guaranteed though. While most machines showed some
  sort of gain, one ppc64 showed no difference at all.

So, by and large it's an improvement of some sort.

I haven't run a page-allocator micro-benchmark to see what sort of figures
that gives. Christoph, I recall you had some sort of page allocator
micro-benchmark. Do you want to give it a shot or remind me how to use
it please?

All other reviews, comments, alternative benchmark reports are welcome.

 arch/ia64/hp/common/sba_iommu.c   |    2 +-
 arch/ia64/kernel/mca.c            |    3 +-
 arch/ia64/kernel/uncached.c       |    3 +-
 arch/ia64/sn/pci/pci_dma.c        |    3 +-
 arch/powerpc/platforms/cell/ras.c |    2 +-
 arch/x86/kvm/vmx.c                |    2 +-
 drivers/misc/sgi-gru/grufile.c    |    2 +-
 drivers/misc/sgi-xp/xpc_uv.c      |    2 +-
 fs/afs/write.c                    |    4 +-
 fs/btrfs/compression.c            |    2 +-
 fs/btrfs/extent_io.c              |    4 +-
 fs/btrfs/ordered-data.c           |    2 +-
 fs/cifs/file.c                    |    4 +-
 fs/gfs2/ops_address.c             |    2 +-
 fs/hugetlbfs/inode.c              |    2 +-
 fs/nfs/dir.c                      |    2 +-
 fs/ntfs/file.c                    |    2 +-
 fs/ramfs/file-nommu.c             |    2 +-
 fs/xfs/linux-2.6/xfs_aops.c       |    4 +-
 include/linux/gfp.h               |   58 ++--
 include/linux/mm.h                |    1 -
 include/linux/mmzone.h            |    8 +-
 include/linux/pagemap.h           |    2 +-
 include/linux/pagevec.h           |    4 +-
 include/linux/swap.h              |    2 +-
 init/main.c                       |    1 +
 kernel/profile.c                  |    8 +-
 mm/filemap.c                      |    4 +-
 mm/hugetlb.c                      |    4 +-
 mm/internal.h                     |   10 +-
 mm/mempolicy.c                    |    2 +-
 mm/migrate.c                      |    2 +-
 mm/page-writeback.c               |    2 +-
 mm/page_alloc.c                   |  646 ++++++++++++++++++++++-----------
 mm/slab.c                         |    4 +-
 mm/slob.c                         |    4 +-
 mm/slub.c                         |    5 +-
 mm/swap.c                         |   12 +-
 mm/swap_state.c                   |    2 +-
 mm/truncate.c                     |    6 +-
 mm/vmalloc.c                      |    6 +-
 mm/vmscan.c                       |    8 +-
 42 files changed, 517 insertions(+), 333 deletions(-)

