linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/19] Misc page alloc, shmem, mark_page_accessed and page_waitqueue optimisations v3r33
@ 2014-05-13  9:45 Mel Gorman
From: Mel Gorman @ 2014-05-13  9:45 UTC
  To: Andrew Morton
  Cc: Johannes Weiner, Vlastimil Babka, Jan Kara, Michal Hocko,
	Hugh Dickins, Peter Zijlstra, Dave Hansen, Mel Gorman,
	Linux Kernel, Linux-MM, Linux-FSDevel

Changelog since V2
o Fewer atomic operations in buffer discards				(mgorman)
o Remove number_of_cpusets and use ref count in jump labels		(peterz)
o Optimise set loop for pageblock flags further				(peterz)
o Remove unnecessary parameters when setting pageblock flags		(vbabka)
o Rework how PG_waiters are set/cleared to avoid changing wait.c	(mgorman)

I was investigating a performance bug that looked like dd to tmpfs
had regressed.  The bulk of the problem turned out to be a difference
in Kconfig but it got me looking at the unnecessary overhead in tmpfs,
mark_page_accessed and parts of the allocator. This series is the result.
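
Several changelog entries and patch subjects in this series are about
dropping atomic operations where they are not needed. As an illustration
only, here is a userspace C11 sketch of that general idea: while an object
is still private to the thread that allocated it, a flag can be set with a
separate load and store, and the atomic read-modify-write is only needed
once other threads can see the object. This is not kernel code and does not
reproduce any patch in the series; struct demo_page, the DEMO_* bits and all
function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag bits; they do not correspond to real page flags. */
#define DEMO_REFERENCED (1UL << 0)
#define DEMO_UPTODATE   (1UL << 1)

struct demo_page {
	atomic_ulong flags;
};

/* Pointer through which other threads would find the object. */
_Atomic(struct demo_page *) demo_published = NULL;

/*
 * The object is not yet visible to anyone else, so a separate relaxed
 * load and store is enough; no locked read-modify-write is issued.
 */
void demo_set_flag_private(struct demo_page *p, unsigned long bit)
{
	unsigned long v = atomic_load_explicit(&p->flags, memory_order_relaxed);

	atomic_store_explicit(&p->flags, v | bit, memory_order_relaxed);
}

/*
 * The object may be updated concurrently, so the update must be a
 * single atomic read-modify-write to avoid losing another thread's bits.
 */
void demo_set_flag_shared(struct demo_page *p, unsigned long bit)
{
	atomic_fetch_or_explicit(&p->flags, bit, memory_order_relaxed);
}

/*
 * Make the object visible. The release store orders the earlier private
 * flag updates before the pointer becomes observable to readers that
 * load it with acquire semantics.
 */
void demo_publish(struct demo_page *p)
{
	atomic_store_explicit(&demo_published, p, memory_order_release);
}
```

The saving only applies between allocation and publication. After
demo_publish() every flag update has to go through demo_set_flag_shared().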

The patches themselves have details of the performance results but here
are a few showing the impact of the whole series. This is the result of
dd'ing to a file multiple times on tmpfs

sync DD to tmpfs
Throughput           3.15.0-rc4            3.15.0-rc4
                        vanilla         fullseries-v3
Min         4096.0000 (  0.00%)   4300.8000 (  5.00%)
Mean        4785.4933 (  0.00%)   5003.9467 (  4.56%)
TrimMean    4812.8000 (  0.00%)   5028.5714 (  4.48%)
Stddev       147.0509 (  0.00%)    191.9981 ( 30.57%)
Max         5017.6000 (  0.00%)   5324.8000 (  6.12%)

sync DD to tmpfs
Elapsed Time                3.15.0-rc4            3.15.0-rc4
                               vanilla         fullseries-v3
Min      elapsed      0.4200 (  0.00%)      0.3900 (  7.14%)
Mean     elapsed      0.4947 (  0.00%)      0.4527 (  8.49%)
TrimMean elapsed      0.4968 (  0.00%)      0.4539 (  8.63%)
Stddev   elapsed      0.0255 (  0.00%)      0.0340 (-33.02%)
Max      elapsed      0.5200 (  0.00%)      0.4800 (  7.69%)

TrimMean elapsed      0.4796 (  0.00%)      0.4179 ( 12.88%)
Stddev   elapsed      0.0353 (  0.00%)      0.0379 ( -7.23%)
Max      elapsed      0.5100 (  0.00%)      0.4800 (  5.88%)

sync DD to ext4
Throughput           3.15.0-rc4            3.15.0-rc4
                        vanilla         fullseries-v3
Min          113.0000 (  0.00%)    117.0000 (  3.54%)
Mean         116.3000 (  0.00%)    119.6667 (  2.89%)
TrimMean     116.2857 (  0.00%)    119.5714 (  2.83%)
Stddev         1.6961 (  0.00%)      1.1643 (-31.35%)
Max          120.0000 (  0.00%)    122.0000 (  1.67%)

sync DD to ext4
Elapsed time                3.15.0-rc4            3.15.0-rc4
                               vanilla         fullseries-v3
Min      elapsed     13.9500 (  0.00%)     13.6900 (  1.86%)
Mean     elapsed     14.4253 (  0.00%)     14.0010 (  2.94%)
TrimMean elapsed     14.4321 (  0.00%)     14.0161 (  2.88%)
Stddev   elapsed      0.2047 (  0.00%)      0.1423 ( 30.46%)
Max      elapsed     14.8300 (  0.00%)     14.3100 (  3.51%)

async DD to ext4 
Elapsed time                3.15.0-rc4            3.15.0-rc4
                               vanilla         fullseries-v3
Min      elapsed      0.7900 (  0.00%)      0.7800 (  1.27%)
Mean     elapsed     12.4023 (  0.00%)     12.2957 (  0.86%)
TrimMean elapsed     13.2036 (  0.00%)     13.0918 (  0.85%)
Stddev   elapsed      3.3286 (  0.00%)      2.9842 ( 10.35%)
Max      elapsed     18.6000 (  0.00%)     13.4300 ( 27.80%)
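
A note on reading the bracketed percentages in the tables above. The
figures appear consistent with a relative change against the vanilla column,
with the sign chosen so that a positive number is a gain for the metric:
(new - old) / old for throughput, where higher is better, and
(old - new) / old for elapsed time, where lower is better. The Stddev rows
seem to follow the same formula as the table they sit in, so in a throughput
table a positive Stddev percentage means more variance, not less. The sketch
below is an assumption about how the numbers relate, checked against the
tmpfs Mean rows (4785.4933 vs 5003.9467 and 0.4947 vs 0.4527); it is not the
reporting tool's code and the function names are made up.

```c
#include <assert.h>

/* Relative gain in percent when a higher value is better (throughput). */
double demo_gain_higher_is_better(double vanilla, double patched)
{
	assert(vanilla != 0.0);
	return (patched - vanilla) / vanilla * 100.0;
}

/* Relative gain in percent when a lower value is better (elapsed time). */
double demo_gain_lower_is_better(double vanilla, double patched)
{
	assert(vanilla != 0.0);
	return (vanilla - patched) / vanilla * 100.0;
}
```

With the tmpfs Mean values, the first function gives roughly 4.56 and the
second roughly 8.49, matching the tables. Rows computed from the rounded
values shown here can differ from the printed percentage in the last digits.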



This table shows the latency in usecs of accessing ext4-backed
mappings of various sizes

lat_mmap
                       3.15.0-rc4            3.15.0-rc4
                          vanilla         fullseries-v3
Procs 107M     564.0000 (  0.00%)    546.0000 (  3.19%)
Procs 214M    1123.0000 (  0.00%)   1090.0000 (  2.94%)
Procs 322M    1636.0000 (  0.00%)   1395.0000 ( 14.73%)
Procs 429M    2076.0000 (  0.00%)   2051.0000 (  1.20%)
Procs 536M    2518.0000 (  0.00%)   2482.0000 (  1.43%)
Procs 644M    3008.0000 (  0.00%)   2978.0000 (  1.00%)
Procs 751M    3506.0000 (  0.00%)   3450.0000 (  1.60%)
Procs 859M    3988.0000 (  0.00%)   3756.0000 (  5.82%)
Procs 966M    4544.0000 (  0.00%)   4310.0000 (  5.15%)
Procs 1073M   4960.0000 (  0.00%)   4928.0000 (  0.65%)
Procs 1181M   5342.0000 (  0.00%)   5144.0000 (  3.71%)
Procs 1288M   5573.0000 (  0.00%)   5427.0000 (  2.62%)
Procs 1395M   5777.0000 (  0.00%)   6056.0000 ( -4.83%)
Procs 1503M   6141.0000 (  0.00%)   5963.0000 (  2.90%)
Procs 1610M   6689.0000 (  0.00%)   6331.0000 (  5.35%)
Procs 1717M   8839.0000 (  0.00%)   6807.0000 ( 22.99%)
Procs 1825M   8399.0000 (  0.00%)   9062.0000 ( -7.89%)
Procs 1932M   7871.0000 (  0.00%)   8778.0000 (-11.52%)
Procs 2040M   8235.0000 (  0.00%)   8081.0000 (  1.87%)
Procs 2147M   8861.0000 (  0.00%)   8337.0000 (  5.91%)
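
For readers unfamiliar with this kind of measurement, the following is a
rough sketch of timing access to a file-backed mapping of a given size: map
the file, touch one byte per page, unmap, and report the elapsed
microseconds. It is not the lat_mmap source and will not reproduce the
numbers above; the function name, the one-byte-per-page access pattern and
the choice of clock are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Map len bytes of the file at path (created and sized if needed),
 * read one byte from every page, then unmap.
 * Returns the elapsed time in microseconds, or -1 on error.
 */
long demo_map_touch_usecs(const char *path, size_t len)
{
	struct timespec start, end;
	long page = sysconf(_SC_PAGESIZE);
	volatile unsigned char sum = 0;	/* keeps the reads from being elided */
	unsigned char *map;
	size_t off;
	int fd;

	assert(len > 0 && page > 0);

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}

	clock_gettime(CLOCK_MONOTONIC, &start);
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}
	for (off = 0; off < len; off += (size_t)page)
		sum += map[off];
	munmap(map, len);
	clock_gettime(CLOCK_MONOTONIC, &end);

	close(fd);
	return (long)(end.tv_sec - start.tv_sec) * 1000000L +
	       (end.tv_nsec - start.tv_nsec) / 1000L;
}
```

To approximate the setup described above, the file would need to live on an
ext4 filesystem and the call would be repeated for each mapping size.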

In general the system CPU overhead is lower.

 arch/tile/mm/homecache.c        |   2 +-
 fs/btrfs/extent_io.c            |  11 +-
 fs/btrfs/file.c                 |   5 +-
 fs/buffer.c                     |  21 ++-
 fs/ext4/mballoc.c               |  14 +-
 fs/f2fs/checkpoint.c            |   3 -
 fs/f2fs/node.c                  |   2 -
 fs/fuse/dev.c                   |   2 +-
 fs/fuse/file.c                  |   2 -
 fs/gfs2/aops.c                  |   1 -
 fs/gfs2/meta_io.c               |   4 +-
 fs/ntfs/attrib.c                |   1 -
 fs/ntfs/file.c                  |   1 -
 include/linux/buffer_head.h     |   5 +
 include/linux/cpuset.h          |  46 +++++
 include/linux/gfp.h             |   4 +-
 include/linux/jump_label.h      |  20 ++-
 include/linux/mmzone.h          |  21 ++-
 include/linux/page-flags.h      |  20 +++
 include/linux/pageblock-flags.h |  30 +++-
 include/linux/pagemap.h         | 115 +++++++++++-
 include/linux/swap.h            |   9 +-
 kernel/cpuset.c                 |  10 +-
 mm/filemap.c                    | 380 +++++++++++++++++++++++++---------------
 mm/page_alloc.c                 | 229 ++++++++++++++----------
 mm/shmem.c                      |   8 +-
 mm/swap.c                       |  27 ++-
 mm/swap_state.c                 |   2 +-
 mm/vmscan.c                     |   9 +-
 29 files changed, 686 insertions(+), 318 deletions(-)

-- 
1.8.4.5


Thread overview: 103+ messages
2014-05-13  9:45 [PATCH 00/19] Misc page alloc, shmem, mark_page_accessed and page_waitqueue optimisations v3r33 Mel Gorman
2014-05-13  9:45 ` [PATCH 01/19] mm: page_alloc: Do not update zlc unless the zlc is active Mel Gorman
2014-05-13  9:45 ` [PATCH 02/19] mm: page_alloc: Do not treat a zone that cannot be used for dirty pages as "full" Mel Gorman
2014-05-13  9:45 ` [PATCH 03/19] jump_label: Expose the reference count Mel Gorman
2014-05-13  9:45 ` [PATCH 04/19] mm: page_alloc: Use jump labels to avoid checking number_of_cpusets Mel Gorman
2014-05-13 10:58   ` Peter Zijlstra
2014-05-13 12:28     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 05/19] mm: page_alloc: Calculate classzone_idx once from the zonelist ref Mel Gorman
2014-05-13 22:25   ` Andrew Morton
2014-05-14  6:32     ` Mel Gorman
2014-05-14 20:29     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 06/19] mm: page_alloc: Only check the zone id check if pages are buddies Mel Gorman
2014-05-13  9:45 ` [PATCH 07/19] mm: page_alloc: Only check the alloc flags and gfp_mask for dirty once Mel Gorman
2014-05-13  9:45 ` [PATCH 08/19] mm: page_alloc: Take the ALLOC_NO_WATERMARK check out of the fast path Mel Gorman
2014-05-13  9:45 ` [PATCH 09/19] mm: page_alloc: Use word-based accesses for get/set pageblock bitmaps Mel Gorman
2014-05-22  9:24   ` Vlastimil Babka
2014-05-22 18:23     ` Andrew Morton
2014-05-22 18:45       ` Vlastimil Babka
2014-05-13  9:45 ` [PATCH 10/19] mm: page_alloc: Reduce number of times page_to_pfn is called Mel Gorman
2014-05-13 13:27   ` Vlastimil Babka
2014-05-13 14:09     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 11/19] mm: page_alloc: Lookup pageblock migratetype with IRQs enabled during free Mel Gorman
2014-05-13 13:36   ` Vlastimil Babka
2014-05-13 14:23     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 12/19] mm: page_alloc: Use unsigned int for order in more places Mel Gorman
2014-05-13  9:45 ` [PATCH 13/19] mm: page_alloc: Convert hot/cold parameter and immediate callers to bool Mel Gorman
2014-05-13  9:45 ` [PATCH 14/19] mm: shmem: Avoid atomic operation during shmem_getpage_gfp Mel Gorman
2014-05-13  9:45 ` [PATCH 15/19] mm: Do not use atomic operations when releasing pages Mel Gorman
2014-05-13  9:45 ` [PATCH 16/19] mm: Do not use unnecessary atomic operations when adding pages to the LRU Mel Gorman
2014-05-13  9:45 ` [PATCH 17/19] fs: buffer: Do not use unnecessary atomic operations when discarding buffers Mel Gorman
2014-05-13 11:09   ` Peter Zijlstra
2014-05-13 12:50     ` Mel Gorman
2014-05-13 13:49       ` Jan Kara
2014-05-13 14:30         ` Mel Gorman
2014-05-13 14:01       ` Peter Zijlstra
2014-05-13 14:46         ` Mel Gorman
2014-05-13 13:50   ` Jan Kara
2014-05-13 22:29   ` Andrew Morton
2014-05-14  6:12     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 18/19] mm: Non-atomically mark page accessed during page cache allocation where possible Mel Gorman
2014-05-13 14:29   ` Theodore Ts'o
2014-05-20 15:49   ` [PATCH] mm: non-atomically mark page accessed during page cache allocation where possible -fix Mel Gorman
2014-05-20 19:34     ` Andrew Morton
2014-05-21 12:09       ` Mel Gorman
2014-05-21 22:11         ` Andrew Morton
2014-05-22  0:07           ` Mel Gorman
2014-05-22  5:35       ` Prabhakar Lad
2014-05-13  9:45 ` [PATCH 19/19] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath Mel Gorman
2014-05-13 12:53   ` Mel Gorman
2014-05-13 14:17     ` Peter Zijlstra
2014-05-13 15:27       ` Paul E. McKenney
2014-05-13 15:44         ` Peter Zijlstra
2014-05-13 16:14           ` Paul E. McKenney
2014-05-13 18:57             ` Oleg Nesterov
2014-05-13 20:24               ` Paul E. McKenney
2014-05-14 14:25                 ` Oleg Nesterov
2014-05-13 18:22           ` Oleg Nesterov
2014-05-13 18:18         ` Oleg Nesterov
2014-05-13 18:24           ` Peter Zijlstra
2014-05-13 18:52           ` Paul E. McKenney
2014-05-13 19:31             ` Oleg Nesterov
2014-05-13 20:32               ` Paul E. McKenney
2014-05-14 16:11       ` Oleg Nesterov
2014-05-14 16:17         ` Peter Zijlstra
2014-05-16 13:51           ` [PATCH 0/1] ptrace: task_clear_jobctl_trapping()->wake_up_bit() needs mb() Oleg Nesterov
2014-05-16 13:51             ` [PATCH 1/1] " Oleg Nesterov
2014-05-21  9:29               ` Peter Zijlstra
2014-05-21 19:19                 ` Andrew Morton
2014-05-21 19:18             ` [PATCH 0/1] " Andrew Morton
2014-05-14 19:29         ` [PATCH 19/19] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath Oleg Nesterov
2014-05-14 20:53           ` Mel Gorman
2014-05-15 10:48           ` [PATCH] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath v4 Mel Gorman
2014-05-15 13:20             ` Peter Zijlstra
2014-05-15 13:29               ` Peter Zijlstra
2014-05-15 15:34               ` Oleg Nesterov
2014-05-15 15:45                 ` Peter Zijlstra
2014-05-15 16:18               ` Mel Gorman
2014-05-15 15:03             ` Oleg Nesterov
2014-05-15 21:24             ` Andrew Morton
2014-05-21 12:15               ` [PATCH] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath v5 Mel Gorman
2014-05-21 13:02                 ` Peter Zijlstra
2014-05-21 15:33                   ` Mel Gorman
2014-05-21 16:08                     ` Peter Zijlstra
2014-05-21 21:26                 ` Andrew Morton
2014-05-21 21:33                   ` Peter Zijlstra
2014-05-21 21:50                     ` Andrew Morton
2014-05-22  0:07                       ` Mel Gorman
2014-05-22  7:20                         ` Peter Zijlstra
2014-05-22 10:40                           ` [PATCH] mm: filemap: Avoid unnecessary barriers and waitqueue lookups in unlock_page fastpath v7 Mel Gorman
2014-05-22 10:56                             ` Peter Zijlstra
2014-05-22 13:00                               ` Mel Gorman
2014-05-22 14:40                               ` Mel Gorman
2014-05-22 15:04                                 ` Peter Zijlstra
2014-05-22 15:36                                   ` Mel Gorman
2014-05-22 16:58                                   ` [PATCH] mm: filemap: Avoid unnecessary barriers and waitqueue lookups in unlock_page fastpath v8 Mel Gorman
2014-05-22  6:45                       ` [PATCH] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath v5 Peter Zijlstra
2014-05-22  8:46                         ` Mel Gorman
2014-05-22 17:47                           ` Andrew Morton
2014-05-22 19:53                             ` Mel Gorman
2014-05-21 23:35                   ` Mel Gorman
2014-05-13 16:52   ` [PATCH 19/19] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath Peter Zijlstra
2014-05-14  7:31     ` Mel Gorman
2014-05-19  8:57 ` [PATCH] mm: Avoid unnecessary atomic operations during end_page_writeback Mel Gorman
