* [PATCH 0/5] Improve sequential read throughput v4r8
@ 2014-06-30 16:47 Mel Gorman
  2014-06-30 16:48 ` [PATCH 1/4] mm: pagemap: Avoid unnecessary overhead when tracepoints are deactivated Mel Gorman
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Mel Gorman @ 2014-06-30 16:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel, Linux-MM, Linux-FSDevel, Johannes Weiner, Mel Gorman

Changelog since V3
o Push down kswapd changes to cover the balance gap
o Drop the page distribution patch

Changelog since V2
o Simplify fair zone policy cost reduction
o Drop CFQ patch

Changelog since v1
o Rebase to v3.16-rc2
o Move CFQ patch to end of series where it can be rejected easier if necessary
o Introduce a page-reclaim patch related to kswapd/fairzone interactions
o Rework fast zone policy patch

IO performance since 3.0 has been a mixed bag. In many respects we are
better and in some we are worse and one of those places is sequential
read throughput. This is visible in a number of benchmarks but I looked
at tiobench the closest. This is using ext3 on a mid-range desktop with
the series applied.

                                      3.16.0-rc2                 3.0.0            3.16.0-rc2
                                         vanilla               vanilla         fairzone-v4r5
Min    SeqRead-MB/sec-1         120.92 (  0.00%)      133.65 ( 10.53%)      140.68 ( 16.34%)
Min    SeqRead-MB/sec-2         100.25 (  0.00%)      121.74 ( 21.44%)      118.13 ( 17.84%)
Min    SeqRead-MB/sec-4          96.27 (  0.00%)      113.48 ( 17.88%)      109.84 ( 14.10%)
Min    SeqRead-MB/sec-8          83.55 (  0.00%)       97.87 ( 17.14%)       89.62 (  7.27%)
Min    SeqRead-MB/sec-16         66.77 (  0.00%)       82.59 ( 23.69%)       70.49 (  5.57%)

Overall system CPU usage is reduced

          3.16.0-rc2       3.0.0  3.16.0-rc2
             vanilla     vanilla fairzone-v4
User          390.13      251.45      396.13
System        404.41      295.13      389.61
Elapsed      5412.45     5072.42     5163.49

This series does not fully restore throughput performance to 3.0 levels
but it brings it close for lower thread counts. Higher thread counts are
known to be worse than 3.0 due to CFQ changes but there is no appetite
for changing the defaults there.

 include/linux/mmzone.h         | 207 ++++++++++++++++++++++-------------------
 include/linux/swap.h           |   9 --
 include/trace/events/pagemap.h |  16 ++--
 mm/page_alloc.c                | 126 ++++++++++++++-----------
 mm/swap.c                      |   4 +-
 mm/vmscan.c                    |  46 ++++-----
 mm/vmstat.c                    |   4 +-
 7 files changed, 208 insertions(+), 204 deletions(-)

-- 
1.8.4.5



* [PATCH 1/4] mm: pagemap: Avoid unnecessary overhead when tracepoints are deactivated
  2014-06-30 16:47 [PATCH 0/5] Improve sequential read throughput v4r8 Mel Gorman
@ 2014-06-30 16:48 ` Mel Gorman
  2014-06-30 16:48 ` [PATCH 2/4] mm: Rearrange zone fields into read-only, page alloc, statistics and page reclaim lines Mel Gorman
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2014-06-30 16:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel, Linux-MM, Linux-FSDevel, Johannes Weiner, Mel Gorman

The LRU insertion and activate tracepoints take a PFN as a parameter,
which forces the page-to-PFN (and page flag) lookups onto the caller even
when the tracepoints are inactive.  Move that work into the tracepoint
fast-assign methods so the cost is only incurred when a tracepoint is
active.
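
For illustration, this is the shape of the pattern being applied. It is a
sketch only: "mm_example" is a made-up event name and the real definitions
are in the diff below.

TRACE_EVENT(mm_example,

	TP_PROTO(struct page *page),

	TP_ARGS(page),

	TP_STRUCT__entry(
		__field(struct page *,	page	)
		__field(unsigned long,	pfn	)
	),

	TP_fast_assign(
		__entry->page	= page;
		/* Computed only when the tracepoint actually fires */
		__entry->pfn	= page_to_pfn(page);
	),

	TP_printk("page=%p pfn=%lu", __entry->page, __entry->pfn)
);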

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/trace/events/pagemap.h | 16 +++++++---------
 mm/swap.c                      |  4 ++--
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/include/trace/events/pagemap.h b/include/trace/events/pagemap.h
index 1c9fabd..ce0803b 100644
--- a/include/trace/events/pagemap.h
+++ b/include/trace/events/pagemap.h
@@ -28,12 +28,10 @@ TRACE_EVENT(mm_lru_insertion,
 
 	TP_PROTO(
 		struct page *page,
-		unsigned long pfn,
-		int lru,
-		unsigned long flags
+		int lru
 	),
 
-	TP_ARGS(page, pfn, lru, flags),
+	TP_ARGS(page, lru),
 
 	TP_STRUCT__entry(
 		__field(struct page *,	page	)
@@ -44,9 +42,9 @@ TRACE_EVENT(mm_lru_insertion,
 
 	TP_fast_assign(
 		__entry->page	= page;
-		__entry->pfn	= pfn;
+		__entry->pfn	= page_to_pfn(page);
 		__entry->lru	= lru;
-		__entry->flags	= flags;
+		__entry->flags	= trace_pagemap_flags(page);
 	),
 
 	/* Flag format is based on page-types.c formatting for pagemap */
@@ -64,9 +62,9 @@ TRACE_EVENT(mm_lru_insertion,
 
 TRACE_EVENT(mm_lru_activate,
 
-	TP_PROTO(struct page *page, unsigned long pfn),
+	TP_PROTO(struct page *page),
 
-	TP_ARGS(page, pfn),
+	TP_ARGS(page),
 
 	TP_STRUCT__entry(
 		__field(struct page *,	page	)
@@ -75,7 +73,7 @@ TRACE_EVENT(mm_lru_activate,
 
 	TP_fast_assign(
 		__entry->page	= page;
-		__entry->pfn	= pfn;
+		__entry->pfn	= page_to_pfn(page);
 	),
 
 	/* Flag format is based on page-types.c formatting for pagemap */
diff --git a/mm/swap.c b/mm/swap.c
index 9e8e347..d10be45 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -501,7 +501,7 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
 		SetPageActive(page);
 		lru += LRU_ACTIVE;
 		add_page_to_lru_list(page, lruvec, lru);
-		trace_mm_lru_activate(page, page_to_pfn(page));
+		trace_mm_lru_activate(page);
 
 		__count_vm_event(PGACTIVATE);
 		update_page_reclaim_stat(lruvec, file, 1);
@@ -996,7 +996,7 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
 	SetPageLRU(page);
 	add_page_to_lru_list(page, lruvec, lru);
 	update_page_reclaim_stat(lruvec, file, active);
-	trace_mm_lru_insertion(page, page_to_pfn(page), lru, trace_pagemap_flags(page));
+	trace_mm_lru_insertion(page, lru);
 }
 
 /*
-- 
1.8.4.5



* [PATCH 2/4] mm: Rearrange zone fields into read-only, page alloc, statistics and page reclaim lines
  2014-06-30 16:47 [PATCH 0/5] Improve sequential read throughput v4r8 Mel Gorman
  2014-06-30 16:48 ` [PATCH 1/4] mm: pagemap: Avoid unnecessary overhead when tracepoints are deactivated Mel Gorman
@ 2014-06-30 16:48 ` Mel Gorman
  2014-06-30 16:48 ` [PATCH 3/4] mm: vmscan: Do not reclaim from lower zones if they are balanced Mel Gorman
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2014-06-30 16:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel, Linux-MM, Linux-FSDevel, Johannes Weiner, Mel Gorman

The arrangement of struct zone has changed over time and now it has reached the
point where there is some inappropriate sharing going on. On x86-64, for
example:

o The zone->node field shares a cache line with the zone lock and zone->node
  is accessed frequently from the page allocator due to the fair zone
  allocation policy.
o span_seqlock is almost never used but shares a cache line with free_area
o Some zone statistics share a cache line with the LRU lock so reclaim-intensive
  and allocator-intensive workloads can bounce the cache line on a stat update

This patch rearranges struct zone to put read-only and read-mostly fields
together and then splits the page allocator intensive fields, the zone
statistics and the page reclaim intensive fields into their own cache
lines. Note that arguably the biggest change is reducing the size of the
lowmem_reserve type (unsigned long to unsigned int). The smaller type should
still be large enough, and shrinking it lets all the fields used by the page
allocator fast path fit in one cache line.

On the test configuration I used, the overall size of struct zone shrank
by one cache line.
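
As a rough sketch of the layout idiom used in the diff below (simplified,
with a made-up struct name; ZONE_PADDING and
____cacheline_internodealigned_in_smp are the existing helpers), the
padding markers push each group of fields onto its own cache line so the
allocator-heavy and reclaim-heavy parts do not false-share:

struct zone_layout_sketch {
	/* Read-mostly fields: watermarks, lowmem_reserve, pageset, ... */
	unsigned long		watermark[NR_WMARK];

	ZONE_PADDING(_pad1_)

	/* Write-intensive page allocator fields */
	spinlock_t		lock;
	struct free_area	free_area[MAX_ORDER];

	ZONE_PADDING(_pad2_)

	/* Write-intensive page reclaim fields */
	spinlock_t		lru_lock;
	struct lruvec		lruvec;

	ZONE_PADDING(_pad3_)

	/* Zone statistics get a cache line of their own */
	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
} ____cacheline_internodealigned_in_smp;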

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/mmzone.h | 201 +++++++++++++++++++++++++------------------------
 mm/page_alloc.c        |  13 ++--
 mm/vmstat.c            |   4 +-
 3 files changed, 113 insertions(+), 105 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6cbd1b6..a2f6443 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -324,19 +324,12 @@ enum zone_type {
 #ifndef __GENERATING_BOUNDS_H
 
 struct zone {
-	/* Fields commonly accessed by the page allocator */
+	/* Read-mostly fields */
 
 	/* zone watermarks, access with *_wmark_pages(zone) macros */
 	unsigned long watermark[NR_WMARK];
 
 	/*
-	 * When free pages are below this point, additional steps are taken
-	 * when reading the number of free pages to avoid per-cpu counter
-	 * drift allowing watermarks to be breached
-	 */
-	unsigned long percpu_drift_mark;
-
-	/*
 	 * We don't know if the memory that we're going to allocate will be freeable
 	 * or/and it will be released eventually, so to avoid totally wasting several
 	 * GB of ram we must reserve some of the lower zone memory (otherwise we risk
@@ -344,41 +337,17 @@ struct zone {
 	 * on the higher zones). This array is recalculated at runtime if the
 	 * sysctl_lowmem_reserve_ratio sysctl changes.
 	 */
-	unsigned long		lowmem_reserve[MAX_NR_ZONES];
-
-	/*
-	 * This is a per-zone reserve of pages that should not be
-	 * considered dirtyable memory.
-	 */
-	unsigned long		dirty_balance_reserve;
+	unsigned int lowmem_reserve[MAX_NR_ZONES];
 
+	struct per_cpu_pageset __percpu *pageset;
 #ifdef CONFIG_NUMA
 	int node;
-	/*
-	 * zone reclaim becomes active if more unmapped pages exist.
-	 */
-	unsigned long		min_unmapped_pages;
-	unsigned long		min_slab_pages;
 #endif
-	struct per_cpu_pageset __percpu *pageset;
 	/*
-	 * free areas of different sizes
+	 * The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
+	 * this zone's LRU.  Maintained by the pageout code.
 	 */
-	spinlock_t		lock;
-#if defined CONFIG_COMPACTION || defined CONFIG_CMA
-	/* Set to true when the PG_migrate_skip bits should be cleared */
-	bool			compact_blockskip_flush;
-
-	/* pfn where compaction free scanner should start */
-	unsigned long		compact_cached_free_pfn;
-	/* pfn where async and sync compaction migration scanner should start */
-	unsigned long		compact_cached_migrate_pfn[2];
-#endif
-#ifdef CONFIG_MEMORY_HOTPLUG
-	/* see spanned/present_pages for more description */
-	seqlock_t		span_seqlock;
-#endif
-	struct free_area	free_area[MAX_ORDER];
+	unsigned int inactive_ratio;
 
 #ifndef CONFIG_SPARSEMEM
 	/*
@@ -388,74 +357,37 @@ struct zone {
 	unsigned long		*pageblock_flags;
 #endif /* CONFIG_SPARSEMEM */
 
-#ifdef CONFIG_COMPACTION
 	/*
-	 * On compaction failure, 1<<compact_defer_shift compactions
-	 * are skipped before trying again. The number attempted since
-	 * last failure is tracked with compact_considered.
+	 * This is a per-zone reserve of pages that should not be
+	 * considered dirtyable memory.
 	 */
-	unsigned int		compact_considered;
-	unsigned int		compact_defer_shift;
-	int			compact_order_failed;
-#endif
-
-	ZONE_PADDING(_pad1_)
-
-	/* Fields commonly accessed by the page reclaim scanner */
-	spinlock_t		lru_lock;
-	struct lruvec		lruvec;
-
-	/* Evictions & activations on the inactive file list */
-	atomic_long_t		inactive_age;
-
-	unsigned long		pages_scanned;	   /* since last reclaim */
-	unsigned long		flags;		   /* zone flags, see below */
-
-	/* Zone statistics */
-	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
+	unsigned long		dirty_balance_reserve;
 
 	/*
-	 * The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
-	 * this zone's LRU.  Maintained by the pageout code.
+	 * When free pages are below this point, additional steps are taken
+	 * when reading the number of free pages to avoid per-cpu counter
+	 * drift allowing watermarks to be breached
 	 */
-	unsigned int inactive_ratio;
-
-
-	ZONE_PADDING(_pad2_)
-	/* Rarely used or read-mostly fields */
+	unsigned long percpu_drift_mark;
 
+#ifdef CONFIG_NUMA
 	/*
-	 * wait_table		-- the array holding the hash table
-	 * wait_table_hash_nr_entries	-- the size of the hash table array
-	 * wait_table_bits	-- wait_table_size == (1 << wait_table_bits)
-	 *
-	 * The purpose of all these is to keep track of the people
-	 * waiting for a page to become available and make them
-	 * runnable again when possible. The trouble is that this
-	 * consumes a lot of space, especially when so few things
-	 * wait on pages at a given time. So instead of using
-	 * per-page waitqueues, we use a waitqueue hash table.
-	 *
-	 * The bucket discipline is to sleep on the same queue when
-	 * colliding and wake all in that wait queue when removing.
-	 * When something wakes, it must check to be sure its page is
-	 * truly available, a la thundering herd. The cost of a
-	 * collision is great, but given the expected load of the
-	 * table, they should be so rare as to be outweighed by the
-	 * benefits from the saved space.
-	 *
-	 * __wait_on_page_locked() and unlock_page() in mm/filemap.c, are the
-	 * primary users of these fields, and in mm/page_alloc.c
-	 * free_area_init_core() performs the initialization of them.
+	 * zone reclaim becomes active if more unmapped pages exist.
 	 */
-	wait_queue_head_t	* wait_table;
-	unsigned long		wait_table_hash_nr_entries;
-	unsigned long		wait_table_bits;
+	unsigned long		min_unmapped_pages;
+	unsigned long		min_slab_pages;
+#endif /* CONFIG_NUMA */
+
+	const char		*name;
 
 	/*
-	 * Discontig memory support fields.
+	 * Number of MIGRATE_RESEVE page block. To maintain for just
+	 * optimization. Protected by zone->lock.
 	 */
+	int			nr_migrate_reserve_block;
+
 	struct pglist_data	*zone_pgdat;
+
 	/* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
 	unsigned long		zone_start_pfn;
 
@@ -504,16 +436,89 @@ struct zone {
 	unsigned long		present_pages;
 	unsigned long		managed_pages;
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+	/* see spanned/present_pages for more description */
+	seqlock_t		span_seqlock;
+#endif
+
 	/*
-	 * Number of MIGRATE_RESEVE page block. To maintain for just
-	 * optimization. Protected by zone->lock.
+	 * wait_table		-- the array holding the hash table
+	 * wait_table_hash_nr_entries	-- the size of the hash table array
+	 * wait_table_bits	-- wait_table_size == (1 << wait_table_bits)
+	 *
+	 * The purpose of all these is to keep track of the people
+	 * waiting for a page to become available and make them
+	 * runnable again when possible. The trouble is that this
+	 * consumes a lot of space, especially when so few things
+	 * wait on pages at a given time. So instead of using
+	 * per-page waitqueues, we use a waitqueue hash table.
+	 *
+	 * The bucket discipline is to sleep on the same queue when
+	 * colliding and wake all in that wait queue when removing.
+	 * When something wakes, it must check to be sure its page is
+	 * truly available, a la thundering herd. The cost of a
+	 * collision is great, but given the expected load of the
+	 * table, they should be so rare as to be outweighed by the
+	 * benefits from the saved space.
+	 *
+	 * __wait_on_page_locked() and unlock_page() in mm/filemap.c, are the
+	 * primary users of these fields, and in mm/page_alloc.c
+	 * free_area_init_core() performs the initialization of them.
 	 */
-	int			nr_migrate_reserve_block;
+	wait_queue_head_t	*wait_table;
+	unsigned long		wait_table_hash_nr_entries;
+	unsigned long		wait_table_bits;
+
+	ZONE_PADDING(_pad1_)
+
+	/* Write-intensive fields used from the page allocator */
+	spinlock_t		lock;
+
+	/* free areas of different sizes */
+	struct free_area	free_area[MAX_ORDER];
+
+	/* zone flags, see below */
+	unsigned long		flags;
+
+	ZONE_PADDING(_pad2_)
+
+	/* Write-intensive fields used by page reclaim */
+
+	/* Fields commonly accessed by the page reclaim scanner */
+	spinlock_t		lru_lock;
+	struct lruvec		lruvec;
+
+	/* Evictions & activations on the inactive file list */
+	atomic_long_t		inactive_age;
+
+	unsigned long		pages_scanned;	   /* since last reclaim */
+
+#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+	/* pfn where compaction free scanner should start */
+	unsigned long		compact_cached_free_pfn;
+	/* pfn where async and sync compaction migration scanner should start */
+	unsigned long		compact_cached_migrate_pfn[2];
+#endif
 
+#ifdef CONFIG_COMPACTION
 	/*
-	 * rarely used fields:
+	 * On compaction failure, 1<<compact_defer_shift compactions
+	 * are skipped before trying again. The number attempted since
+	 * last failure is tracked with compact_considered.
 	 */
-	const char		*name;
+	unsigned int		compact_considered;
+	unsigned int		compact_defer_shift;
+	int			compact_order_failed;
+#endif
+
+#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+	/* Set to true when the PG_migrate_skip bits should be cleared */
+	bool			compact_blockskip_flush;
+#endif
+
+	ZONE_PADDING(_pad3_)
+	/* Zone statistics */
+	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
 } ____cacheline_internodealigned_in_smp;
 
 typedef enum {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4f59fa2..ebbdbcd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1699,7 +1699,6 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
 {
 	/* free_pages my go negative - that's OK */
 	long min = mark;
-	long lowmem_reserve = z->lowmem_reserve[classzone_idx];
 	int o;
 	long free_cma = 0;
 
@@ -1714,7 +1713,7 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
 		free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
 #endif
 
-	if (free_pages - free_cma <= min + lowmem_reserve)
+	if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
 		return false;
 	for (o = 0; o < order; o++) {
 		/* At the next order, this order's pages become unavailable */
@@ -3245,7 +3244,7 @@ void show_free_areas(unsigned int filter)
 			);
 		printk("lowmem_reserve[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)
-			printk(" %lu", zone->lowmem_reserve[i]);
+			printk(" %u", zone->lowmem_reserve[i]);
 		printk("\n");
 	}
 
@@ -5566,7 +5565,7 @@ static void calculate_totalreserve_pages(void)
 	for_each_online_pgdat(pgdat) {
 		for (i = 0; i < MAX_NR_ZONES; i++) {
 			struct zone *zone = pgdat->node_zones + i;
-			unsigned long max = 0;
+			unsigned int max = 0;
 
 			/* Find valid and maximum lowmem_reserve in the zone */
 			for (j = i; j < MAX_NR_ZONES; j++) {
@@ -5617,6 +5616,7 @@ static void setup_per_zone_lowmem_reserve(void)
 			idx = j;
 			while (idx) {
 				struct zone *lower_zone;
+				unsigned long reserve;
 
 				idx--;
 
@@ -5624,8 +5624,11 @@ static void setup_per_zone_lowmem_reserve(void)
 					sysctl_lowmem_reserve_ratio[idx] = 1;
 
 				lower_zone = pgdat->node_zones + idx;
-				lower_zone->lowmem_reserve[j] = managed_pages /
+				reserve = managed_pages /
 					sysctl_lowmem_reserve_ratio[idx];
+				if (WARN_ON(reserve > UINT_MAX))
+					reserve = UINT_MAX;
+				lower_zone->lowmem_reserve[j] = reserve;
 				managed_pages += lower_zone->managed_pages;
 			}
 		}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b37bd49..c6d6fae 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1077,10 +1077,10 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 				zone_page_state(zone, i));
 
 	seq_printf(m,
-		   "\n        protection: (%lu",
+		   "\n        protection: (%u",
 		   zone->lowmem_reserve[0]);
 	for (i = 1; i < ARRAY_SIZE(zone->lowmem_reserve); i++)
-		seq_printf(m, ", %lu", zone->lowmem_reserve[i]);
+		seq_printf(m, ", %u", zone->lowmem_reserve[i]);
 	seq_printf(m,
 		   ")"
 		   "\n  pagesets");
-- 
1.8.4.5



* [PATCH 3/4] mm: vmscan: Do not reclaim from lower zones if they are balanced
  2014-06-30 16:47 [PATCH 0/5] Improve sequential read throughput v4r8 Mel Gorman
  2014-06-30 16:48 ` [PATCH 1/4] mm: pagemap: Avoid unnecessary overhead when tracepoints are deactivated Mel Gorman
  2014-06-30 16:48 ` [PATCH 2/4] mm: Rearrange zone fields into read-only, page alloc, statistics and page reclaim lines Mel Gorman
@ 2014-06-30 16:48 ` Mel Gorman
  2014-06-30 16:48 ` [PATCH 4/4] mm: page_alloc: Reduce cost of the fair zone allocation policy Mel Gorman
  2014-07-01 17:16 ` [PATCH 0/5] Improve sequential read throughput v4r8 Johannes Weiner
  4 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2014-06-30 16:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel, Linux-MM, Linux-FSDevel, Johannes Weiner, Mel Gorman

Historically kswapd scanned zones from DMA->Movable, in the opposite
direction to the page allocator, to avoid the allocator allocating behind
kswapd's direction of progress. The fair zone allocation policy altered
this in a non-obvious manner.

Traditionally, the page allocator prefers to use the highest eligible
zones in order until the low watermarks are reached and then wakes kswapd.
Once kswapd is awake, it scans zones in the opposite direction so the
scanning lists on 64-bit look like this:

Page alloc		Kswapd
----------              ------
Movable			DMA
Normal			DMA32
DMA32			Normal
DMA			Movable

If kswapd scanned in the same direction as the page allocator, it could
end up proportionally reclaiming lower zones that were never used because
the page allocator was always allocating behind the reclaim. This would
work as follows:

	pgalloc hits Normal low wmark
					kswapd reclaims Normal
					kswapd reclaims DMA32
	pgalloc hits Normal low wmark
					kswapd reclaims Normal
					kswapd reclaims DMA32

The introduction of the fair zone allocation policy fundamentally altered
this problem by interleaving between zones until the low watermark is
reached. There are at least two issues with this:

o The page allocator can allocate behind kswapd's progress (kswapd
  scans/reclaims a lower zone and the fair zone allocation policy then
  uses those pages)
o When the low watermark of the higher zone is reached there may be
  recently allocated pages from the lower zone, but as kswapd scans
  dma->highmem up to the highest zone needing balancing it will reclaim
  the lower zone even if it is already balanced.

Let N = high_wmark(Normal) + high_wmark(DMA32). Of the last N allocations,
some percentage will be allocated from Normal and some from DMA32. The
percentage depends on the ratio of the zone sizes and when their watermarks
were hit. If Normal is unbalanced, DMA32 will be shrunk by kswapd. If DMA32
is unbalanced, only DMA32 will be shrunk. This leads to a difference in
ages between DMA32 and Normal. Relatively young pages are then continually
rotated and reclaimed from DMA32 due to the higher zone being unbalanced.
Some of these pages may be recently read-ahead pages that must then be
re-read from disk, impacting overall performance.

The problem is fundamental to the fact we have per-zone LRU and allocation
policies and ideally we would only have per-node allocation and LRU lists.
This would avoid the need for the fair zone allocation policy but the
low-memory-starvation issue would have to be addressed again from scratch.

Currently kswapd shrinks all zones equally, up to the high watermark plus
a balance gap and the lowmem reserves. This patch removes the balance gap
and the additional reclaim from lower zones that are already balanced, on
the grounds that the fair zone allocation policy will typically be
interleaving allocations between the zones.  This should not break normal
page aging as the proportional allocations due to the fair zone allocation
policy should compensate.
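
Condensed from the diff below, the balance test becomes a bare high
watermark check with no balance gap and no classzone index (the
compaction_suitable() check for order > 0 requests is omitted from this
sketch):

static bool zone_balanced(struct zone *zone, int order)
{
	return zone_watermark_ok_safe(zone, order, high_wmark_pages(zone),
				      0, 0);
}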

tiobench was used to evaluate this because it includes a simple
sequential reader, which shows the most obvious regression. It also has
threaded readers that produce reasonably steady figures.

                                      3.16.0-rc2                 3.0.0            3.16.0-rc2
                                         vanilla               vanilla           checklow-v4
Min    SeqRead-MB/sec-1         120.92 (  0.00%)      133.65 ( 10.53%)      140.64 ( 16.31%)
Min    SeqRead-MB/sec-2         100.25 (  0.00%)      121.74 ( 21.44%)      117.67 ( 17.38%)
Min    SeqRead-MB/sec-4          96.27 (  0.00%)      113.48 ( 17.88%)      107.56 ( 11.73%)
Min    SeqRead-MB/sec-8          83.55 (  0.00%)       97.87 ( 17.14%)       88.08 (  5.42%)
Min    SeqRead-MB/sec-16         66.77 (  0.00%)       82.59 ( 23.69%)       71.04 (  6.40%)

There are still regressions for higher numbers of threads but those are
related to changes in the CFQ IO scheduler.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/swap.h |  9 ---------
 mm/vmscan.c          | 46 ++++++++++++++++------------------------------
 2 files changed, 16 insertions(+), 39 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4bdbee8..1680307 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -165,15 +165,6 @@ enum {
 #define SWAP_CLUSTER_MAX 32UL
 #define COMPACT_CLUSTER_MAX SWAP_CLUSTER_MAX
 
-/*
- * Ratio between zone->managed_pages and the "gap" that above the per-zone
- * "high_wmark". While balancing nodes, We allow kswapd to shrink zones that
- * do not meet the (high_wmark + gap) watermark, even which already met the
- * high_wmark, in order to provide better per-zone lru behavior. We are ok to
- * spend not more than 1% of the memory for this zone balancing "gap".
- */
-#define KSWAPD_ZONE_BALANCE_GAP_RATIO 100
-
 #define SWAP_MAP_MAX	0x3e	/* Max duplication count, in first swap_map */
 #define SWAP_MAP_BAD	0x3f	/* Note pageblock is bad, in first swap_map */
 #define SWAP_HAS_CACHE	0x40	/* Flag page is cached, in first swap_map */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0f16ffe..3e315c7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2294,7 +2294,7 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
 /* Returns true if compaction should go ahead for a high-order request */
 static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 {
-	unsigned long balance_gap, watermark;
+	unsigned long watermark;
 	bool watermark_ok;
 
 	/* Do not consider compaction for orders reclaim is meant to satisfy */
@@ -2307,9 +2307,7 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 	 * there is a buffer of free pages available to give compaction
 	 * a reasonable chance of completing and allocating the page
 	 */
-	balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
-			zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
-	watermark = high_wmark_pages(zone) + balance_gap + (2UL << sc->order);
+	watermark = high_wmark_pages(zone) + (2UL << sc->order);
 	watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 0, 0);
 
 	/*
@@ -2816,11 +2814,9 @@ static void age_active_anon(struct zone *zone, struct scan_control *sc)
 	} while (memcg);
 }
 
-static bool zone_balanced(struct zone *zone, int order,
-			  unsigned long balance_gap, int classzone_idx)
+static bool zone_balanced(struct zone *zone, int order)
 {
-	if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone) +
-				    balance_gap, classzone_idx, 0))
+	if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone), 0, 0))
 		return false;
 
 	if (IS_ENABLED(CONFIG_COMPACTION) && order &&
@@ -2877,7 +2873,7 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int classzone_idx)
 			continue;
 		}
 
-		if (zone_balanced(zone, order, 0, i))
+		if (zone_balanced(zone, order))
 			balanced_pages += zone->managed_pages;
 		else if (!order)
 			return false;
@@ -2934,7 +2930,6 @@ static bool kswapd_shrink_zone(struct zone *zone,
 			       unsigned long *nr_attempted)
 {
 	int testorder = sc->order;
-	unsigned long balance_gap;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct shrink_control shrink = {
 		.gfp_mask = sc->gfp_mask,
@@ -2956,21 +2951,11 @@ static bool kswapd_shrink_zone(struct zone *zone,
 		testorder = 0;
 
 	/*
-	 * We put equal pressure on every zone, unless one zone has way too
-	 * many pages free already. The "too many pages" is defined as the
-	 * high wmark plus a "gap" where the gap is either the low
-	 * watermark or 1% of the zone, whichever is smaller.
-	 */
-	balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
-			zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
-
-	/*
 	 * If there is no low memory pressure or the zone is balanced then no
 	 * reclaim is necessary
 	 */
 	lowmem_pressure = (buffer_heads_over_limit && is_highmem(zone));
-	if (!lowmem_pressure && zone_balanced(zone, testorder,
-						balance_gap, classzone_idx))
+	if (!lowmem_pressure && zone_balanced(zone, testorder))
 		return true;
 
 	shrink_zone(zone, sc);
@@ -2993,7 +2978,7 @@ static bool kswapd_shrink_zone(struct zone *zone,
 	 * waits.
 	 */
 	if (zone_reclaimable(zone) &&
-	    zone_balanced(zone, testorder, 0, classzone_idx)) {
+	    zone_balanced(zone, testorder)) {
 		zone_clear_flag(zone, ZONE_CONGESTED);
 		zone_clear_flag(zone, ZONE_TAIL_LRU_DIRTY);
 	}
@@ -3079,7 +3064,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 				break;
 			}
 
-			if (!zone_balanced(zone, order, 0, 0)) {
+			if (!zone_balanced(zone, order)) {
 				end_zone = i;
 				break;
 			} else {
@@ -3124,12 +3109,13 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 
 		/*
 		 * Now scan the zone in the dma->highmem direction, stopping
-		 * at the last zone which needs scanning.
-		 *
-		 * We do this because the page allocator works in the opposite
-		 * direction.  This prevents the page allocator from allocating
-		 * pages behind kswapd's direction of progress, which would
-		 * cause too much scanning of the lower zones.
+		 * at the last zone which needs scanning. We do this because
+		 * the page allocators prefers to work in the opposite
+		 * direction and we want to avoid the page allocator reclaiming
+		 * behind kswapd's direction of progress. Due to the fair zone
+		 * allocation policy interleaving allocations between zones
+		 * we no longer proportionally scan the lower zones if the
+		 * watermarks are ok.
 		 */
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;
@@ -3397,7 +3383,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
 	}
 	if (!waitqueue_active(&pgdat->kswapd_wait))
 		return;
-	if (zone_balanced(zone, order, 0, 0))
+	if (zone_balanced(zone, order))
 		return;
 
 	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
-- 
1.8.4.5



* [PATCH 4/4] mm: page_alloc: Reduce cost of the fair zone allocation policy
  2014-06-30 16:47 [PATCH 0/5] Improve sequential read throughput v4r8 Mel Gorman
                   ` (2 preceding siblings ...)
  2014-06-30 16:48 ` [PATCH 3/4] mm: vmscan: Do not reclaim from lower zones if they are balanced Mel Gorman
@ 2014-06-30 16:48 ` Mel Gorman
  2014-06-30 21:14   ` Andrew Morton
  2014-07-01 17:16 ` [PATCH 0/5] Improve sequential read throughput v4r8 Johannes Weiner
  4 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2014-06-30 16:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel, Linux-MM, Linux-FSDevel, Johannes Weiner, Mel Gorman

The fair zone allocation policy round-robins allocations between zones
within a node to avoid age inversion problems during reclaim. If the
first allocation fails, the batch counts are reset and a second attempt
made before entering the slow path.

One assumption made with this scheme is that batches expire at roughly the
same time and the resets each time are justified. This assumption does not
hold when zones reach their low watermark as the batches will be consumed
at uneven rates.  Allocation failure due to watermark depletion results in
additional zonelist scans for the reset and another watermark check before
hitting the slowpath.

This patch makes a number of changes that should reduce the overall cost:

o Abort the fair zone allocation policy once remote zones are encountered
o Use a simpler scan when resetting NR_ALLOC_BATCH
o Use a simple flag to identify depleted zones instead of accessing a
  potentially write-intensive cache line for counters (see the sketch below)
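
A condensed sketch of the resulting fast path in get_page_from_freelist(),
taken from the diff below with the surrounding checks omitted:

	if (alloc_flags & ALLOC_FAIR) {
		/* The fair pass stops at the first remote zone */
		if (!zone_local(preferred_zone, zone))
			break;

		/* Flag test instead of reading the NR_ALLOC_BATCH counter */
		if (zone_is_fair_depleted(zone)) {
			nr_fair_skipped++;
			continue;
		}
	}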

On UMA machines, the effect on overall performance is marginal. The main
impact is on system CPU usage which is small enough on UMA to begin with.
This comparison shows the system CPU usage between vanilla, the previous
patch and this patch.

          3.16.0-rc2  3.16.0-rc2  3.16.0-rc2
             vanilla checklow-v4 fairzone-v4
User          390.13      400.85      396.13
System        404.41      393.60      389.61
Elapsed      5412.45     5166.12     5163.49

There is a small reduction and it appears consistent.

On NUMA machines, the scanning overhead is higher as zones are scanned
that are ineligible for use by the fair zone allocation policy. This patch
fixes the zone-order zonelist policy and reduces the number of zones
scanned by the allocator, leading to an overall reduction in CPU usage.

          3.16.0-rc2  3.16.0-rc2  3.16.0-rc2
             vanilla checklow-v4 fairzone-v4
User          744.05      763.26      778.53
System      70148.60    49331.48    44905.73
Elapsed     28094.08    27476.72    27378.98

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/mmzone.h |   6 +++
 mm/page_alloc.c        | 113 ++++++++++++++++++++++++++++---------------------
 2 files changed, 70 insertions(+), 49 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a2f6443..9e9ddc4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -534,6 +534,7 @@ typedef enum {
 	ZONE_WRITEBACK,			/* reclaim scanning has recently found
 					 * many pages under writeback
 					 */
+	ZONE_FAIR_DEPLETED,		/* fair zone policy batch depleted */
 } zone_flags_t;
 
 static inline void zone_set_flag(struct zone *zone, zone_flags_t flag)
@@ -571,6 +572,11 @@ static inline int zone_is_reclaim_locked(const struct zone *zone)
 	return test_bit(ZONE_RECLAIM_LOCKED, &zone->flags);
 }
 
+static inline int zone_is_fair_depleted(const struct zone *zone)
+{
+	return test_bit(ZONE_FAIR_DEPLETED, &zone->flags);
+}
+
 static inline int zone_is_oom_locked(const struct zone *zone)
 {
 	return test_bit(ZONE_OOM_LOCKED, &zone->flags);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ebbdbcd..3fbe995 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1597,6 +1597,8 @@ again:
 	}
 
 	__mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
+	if (zone_page_state(zone, NR_ALLOC_BATCH) == 0)
+		zone_set_flag(zone, ZONE_FAIR_DEPLETED);
 
 	__count_zone_vm_events(PGALLOC, zone, 1 << order);
 	zone_statistics(preferred_zone, zone, gfp_flags);
@@ -1908,6 +1910,20 @@ static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
 
 #endif	/* CONFIG_NUMA */
 
+static void reset_alloc_batches(struct zone *preferred_zone)
+{
+	struct zone *zone = preferred_zone->zone_pgdat->node_zones;
+
+	do {
+		if (!zone_is_fair_depleted(zone))
+			continue;
+		mod_zone_page_state(zone, NR_ALLOC_BATCH,
+			high_wmark_pages(zone) - low_wmark_pages(zone) -
+			atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]));
+		zone_clear_flag(zone, ZONE_FAIR_DEPLETED);
+	} while (zone++ != preferred_zone);
+}
+
 /*
  * get_page_from_freelist goes through the zonelist trying to allocate
  * a page.
@@ -1925,8 +1941,11 @@ get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
 	int did_zlc_setup = 0;		/* just call zlc_setup() one time */
 	bool consider_zone_dirty = (alloc_flags & ALLOC_WMARK_LOW) &&
 				(gfp_mask & __GFP_WRITE);
+	int nr_fair_skipped = 0;
+	bool zonelist_rescan;
 
 zonelist_scan:
+	zonelist_rescan = false;
 	/*
 	 * Scan zonelist, looking for a zone with enough free.
 	 * See also __cpuset_node_allowed_softwall() comment in kernel/cpuset.c.
@@ -1950,9 +1969,12 @@ zonelist_scan:
 		 */
 		if (alloc_flags & ALLOC_FAIR) {
 			if (!zone_local(preferred_zone, zone))
+				break;
+
+			if (zone_is_fair_depleted(zone)) {
+				nr_fair_skipped++;
 				continue;
-			if (zone_page_state(zone, NR_ALLOC_BATCH) <= 0)
-				continue;
+			}
 		}
 		/*
 		 * When allocating a page cache page for writing, we
@@ -2058,13 +2080,7 @@ this_zone_full:
 			zlc_mark_zone_full(zonelist, z);
 	}
 
-	if (unlikely(IS_ENABLED(CONFIG_NUMA) && page == NULL && zlc_active)) {
-		/* Disable zlc cache for second zonelist scan */
-		zlc_active = 0;
-		goto zonelist_scan;
-	}
-
-	if (page)
+	if (page) {
 		/*
 		 * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
 		 * necessary to allocate the page. The expectation is
@@ -2073,8 +2089,37 @@ this_zone_full:
 		 * for !PFMEMALLOC purposes.
 		 */
 		page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);
+		return page;
+	}
 
-	return page;
+	/*
+	 * The first pass makes sure allocations are spread fairly within the
+	 * local node.  However, the local node might have free pages left
+	 * after the fairness batches are exhausted, and remote zones haven't
+	 * even been considered yet.  Try once more without fairness, and
+	 * include remote zones now, before entering the slowpath and waking
+	 * kswapd: prefer spilling to a remote zone over swapping locally.
+	 */
+	if (alloc_flags & ALLOC_FAIR) {
+		alloc_flags &= ~ALLOC_FAIR;
+		if (nr_fair_skipped) {
+			zonelist_rescan = true;
+			reset_alloc_batches(preferred_zone);
+		}
+		if (nr_online_nodes > 1)
+			zonelist_rescan = true;
+	}
+
+	if (unlikely(IS_ENABLED(CONFIG_NUMA) && zlc_active)) {
+		/* Disable zlc cache for second zonelist scan */
+		zlc_active = 0;
+		zonelist_rescan = true;
+	}
+
+	if (zonelist_rescan)
+		goto zonelist_scan;
+
+	return NULL;
 }
 
 /*
@@ -2395,28 +2440,6 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
 	return page;
 }
 
-static void reset_alloc_batches(struct zonelist *zonelist,
-				enum zone_type high_zoneidx,
-				struct zone *preferred_zone)
-{
-	struct zoneref *z;
-	struct zone *zone;
-
-	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
-		/*
-		 * Only reset the batches of zones that were actually
-		 * considered in the fairness pass, we don't want to
-		 * trash fairness information for zones that are not
-		 * actually part of this zonelist's round-robin cycle.
-		 */
-		if (!zone_local(preferred_zone, zone))
-			continue;
-		mod_zone_page_state(zone, NR_ALLOC_BATCH,
-			high_wmark_pages(zone) - low_wmark_pages(zone) -
-			atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]));
-	}
-}
-
 static void wake_all_kswapds(unsigned int order,
 			     struct zonelist *zonelist,
 			     enum zone_type high_zoneidx,
@@ -2714,6 +2737,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	struct zone *preferred_zone;
 	struct zoneref *preferred_zoneref;
+	struct zoneref *second_zoneref;
 	struct page *page = NULL;
 	int migratetype = allocflags_to_migratetype(gfp_mask);
 	unsigned int cpuset_mems_cookie;
@@ -2748,33 +2772,24 @@ retry_cpuset:
 		goto out;
 	classzone_idx = zonelist_zone_idx(preferred_zoneref);
 
+	/*
+	 * Only apply the fair zone allocation policy if there is more than
+	 * one local eligible zone
+	 */
+	second_zoneref = preferred_zoneref + 1;
+	if (zonelist_zone(second_zoneref) &&
+	    zone_local(preferred_zone, zonelist_zone(second_zoneref)))
+		alloc_flags |= ALLOC_FAIR;
 #ifdef CONFIG_CMA
 	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
 		alloc_flags |= ALLOC_CMA;
 #endif
-retry:
 	/* First allocation attempt */
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
 			zonelist, high_zoneidx, alloc_flags,
 			preferred_zone, classzone_idx, migratetype);
 	if (unlikely(!page)) {
 		/*
-		 * The first pass makes sure allocations are spread
-		 * fairly within the local node.  However, the local
-		 * node might have free pages left after the fairness
-		 * batches are exhausted, and remote zones haven't
-		 * even been considered yet.  Try once more without
-		 * fairness, and include remote zones now, before
-		 * entering the slowpath and waking kswapd: prefer
-		 * spilling to a remote zone over swapping locally.
-		 */
-		if (alloc_flags & ALLOC_FAIR) {
-			reset_alloc_batches(zonelist, high_zoneidx,
-					    preferred_zone);
-			alloc_flags &= ~ALLOC_FAIR;
-			goto retry;
-		}
-		/*
 		 * Runtime PM, block IO and its error handling path
 		 * can deadlock because I/O on the device might not
 		 * complete.
-- 
1.8.4.5



* Re: [PATCH 4/4] mm: page_alloc: Reduce cost of the fair zone allocation policy
  2014-06-30 16:48 ` [PATCH 4/4] mm: page_alloc: Reduce cost of the fair zone allocation policy Mel Gorman
@ 2014-06-30 21:14   ` Andrew Morton
  2014-06-30 21:51     ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2014-06-30 21:14 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux Kernel, Linux-MM, Linux-FSDevel, Johannes Weiner

On Mon, 30 Jun 2014 17:48:03 +0100 Mel Gorman <mgorman@suse.de> wrote:

> The fair zone allocation policy round-robins allocations between zones
> within a node to avoid age inversion problems during reclaim. If the
> first allocation fails, the batch counts are reset and a second attempt
> made before entering the slow path.
> 
> One assumption made with this scheme is that batches expire at roughly the
> same time and the resets each time are justified. This assumption does not
> hold when zones reach their low watermark as the batches will be consumed
> at uneven rates.  Allocation failure due to watermark depletion results in
> additional zonelist scans for the reset and another watermark check before
> hitting the slowpath.
> 
> This patch makes a number of changes that should reduce the overall cost
> 
> o Abort the fair zone allocation policy once remote zones are encountered
> o Use a simpler scan when resetting NR_ALLOC_BATCH
> o Use a simple flag to identify depleted zones instead of accessing a
>   potentially write-intensive cache line for counters
> 
> On UMA machines, the effect on overall performance is marginal. The main
> impact is on system CPU usage which is small enough on UMA to begin with.
> This comparison shows the system CPU usage between vanilla, the previous
> patch and this patch.
> 
>           3.16.0-rc2  3.16.0-rc2  3.16.0-rc2
>              vanilla checklow-v4 fairzone-v4
> User          390.13      400.85      396.13
> System        404.41      393.60      389.61
> Elapsed      5412.45     5166.12     5163.49
> 
> There is a small reduction and it appears consistent.
> 
> On NUMA machines, the scanning overhead is higher as zones are scanned
> that are ineligible for use by the fair zone allocation policy. This patch
> fixes the zone-order zonelist policy and reduces the number of zones
> scanned by the allocator, leading to an overall reduction in CPU usage.
> 
>           3.16.0-rc2  3.16.0-rc2  3.16.0-rc2
>              vanilla checklow-v4 fairzone-v4
> User          744.05      763.26      778.53
> System      70148.60    49331.48    44905.73
> Elapsed     28094.08    27476.72    27378.98

That's a large change in system time.  Does this all include kswapd
activity?



* Re: [PATCH 4/4] mm: page_alloc: Reduce cost of the fair zone allocation policy
  2014-06-30 21:14   ` Andrew Morton
@ 2014-06-30 21:51     ` Mel Gorman
  2014-06-30 22:09       ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2014-06-30 21:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel, Linux-MM, Linux-FSDevel, Johannes Weiner

On Mon, Jun 30, 2014 at 02:14:04PM -0700, Andrew Morton wrote:
> On Mon, 30 Jun 2014 17:48:03 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > The fair zone allocation policy round-robins allocations between zones
> > within a node to avoid age inversion problems during reclaim. If the
> > first allocation fails, the batch counts are reset and a second attempt
> > made before entering the slow path.
> > 
> > One assumption made with this scheme is that batches expire at roughly the
> > same time and the resets each time are justified. This assumption does not
> > hold when zones reach their low watermark as the batches will be consumed
> > at uneven rates.  Allocation failure due to watermark depletion results in
> > additional zonelist scans for the reset and another watermark check before
> > hitting the slowpath.
> > 
> > This patch makes a number of changes that should reduce the overall cost
> > 
> > o Abort the fair zone allocation policy once remote zones are encountered
> > o Use a simpler scan when resetting NR_ALLOC_BATCH
> > o Use a simple flag to identify depleted zones instead of accessing a
> >   potentially write-intensive cache line for counters
> > 
> > On UMA machines, the effect on overall performance is marginal. The main
> > impact is on system CPU usage which is small enough on UMA to begin with.
> > This comparison shows the system CPU usage between vanilla, the previous
> > patch and this patch.
> > 
> >           3.16.0-rc2  3.16.0-rc2  3.16.0-rc2
> >              vanilla checklow-v4 fairzone-v4
> > User          390.13      400.85      396.13
> > System        404.41      393.60      389.61
> > Elapsed      5412.45     5166.12     5163.49
> > 
> > There is a small reduction and it appears consistent.
> > 
> > On NUMA machines, the scanning overhead is higher as zones are scanned
> > that are ineligible for use by the fair zone allocation policy. This patch
> > fixes the zone-order zonelist policy and reduces the number of zones
> > scanned by the allocator, leading to an overall reduction in CPU usage.
> > 
> >           3.16.0-rc2  3.16.0-rc2  3.16.0-rc2
> >              vanilla checklow-v4 fairzone-v4
> > User          744.05      763.26      778.53
> > System      70148.60    49331.48    44905.73
> > Elapsed     28094.08    27476.72    27378.98
> 
> That's a large change in system time.  Does this all include kswapd
> activity?
> 

I don't have a profile to quantify that exactly. It takes 7 hours to
complete a test on that machine in this configuration and it would take
longer with profiling. I was not testing with profiling enabled as that
invalidates performance tests. I'd expect it'd take the guts of two days
to gather full profiles for it and even then it would be masked by remote
access costs and other factors. It'd be worse considering that automatic
NUMA balancing is enabled and I normally test with that turned on.

However, without the kswapd change there are a lot of retries and
reallocations for pages recently reclaimed. For the fairzone patch there
are far fewer scans of unusable zones to find the lower zones. Considering
the number of allocations required there is simply a lot of overhead that
builds up.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 4/4] mm: page_alloc: Reduce cost of the fair zone allocation policy
  2014-06-30 21:51     ` Mel Gorman
@ 2014-06-30 22:09       ` Andrew Morton
  2014-07-01  8:02         ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2014-06-30 22:09 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux Kernel, Linux-MM, Linux-FSDevel, Johannes Weiner

On Mon, 30 Jun 2014 22:51:21 +0100 Mel Gorman <mgorman@suse.de> wrote:

> > That's a large change in system time.  Does this all include kswapd
> > activity?
> > 
> 
> I don't have a profile to quantify that exactly. It takes 7 hours to
> complete a test on that machine in this configuration

That's nuts.  Why should measuring this require more than a few minutes?


* Re: [PATCH 4/4] mm: page_alloc: Reduce cost of the fair zone allocation policy
  2014-06-30 22:09       ` Andrew Morton
@ 2014-07-01  8:02         ` Mel Gorman
  0 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2014-07-01  8:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel, Linux-MM, Linux-FSDevel, Johannes Weiner

On Mon, Jun 30, 2014 at 03:09:14PM -0700, Andrew Morton wrote:
> On Mon, 30 Jun 2014 22:51:21 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > > That's a large change in system time.  Does this all include kswapd
> > > activity?
> > > 
> > 
> > I don't have a profile to quantify that exactly. It takes 7 hours to
> > complete a test on that machine in this configuration
> 
> That's nuts.  Why should measuring this require more than a few minutes?

That's how long the full test takes to complete for each part of the IO
test. Profiling a subsection of it will miss some parts with no
guarantee the sampled subset is representative. Profiling for smaller
amounts of IO so the test completes quickly does not guarantee that the
sample is representative. Reducing the size of memory of the machine
using any tricks is also not representative etc.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 0/5] Improve sequential read throughput v4r8
  2014-06-30 16:47 [PATCH 0/5] Improve sequential read throughput v4r8 Mel Gorman
                   ` (3 preceding siblings ...)
  2014-06-30 16:48 ` [PATCH 4/4] mm: page_alloc: Reduce cost of the fair zone allocation policy Mel Gorman
@ 2014-07-01 17:16 ` Johannes Weiner
  2014-07-01 18:39   ` Mel Gorman
  4 siblings, 1 reply; 17+ messages in thread
From: Johannes Weiner @ 2014-07-01 17:16 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Linux Kernel, Linux-MM, Linux-FSDevel

On Mon, Jun 30, 2014 at 05:47:59PM +0100, Mel Gorman wrote:
> Changelog since V3
> o Push down kswapd changes to cover the balance gap
> o Drop the page distribution patch
> 
> Changelog since V2
> o Simplify fair zone policy cost reduction
> o Drop CFQ patch
> 
> Changelog since v1
> o Rebase to v3.16-rc2
> o Move CFQ patch to end of series where it can be rejected easier if necessary
> o Introduce a page-reclaim patch related to kswapd/fairzone interactions
> o Rework fast zone policy patch
> 
> IO performance since 3.0 has been a mixed bag. In many respects we are
> better and in some we are worse and one of those places is sequential
> read throughput. This is visible in a number of benchmarks but I looked
> at tiobench the closest. This is using ext3 on a mid-range desktop with
> the series applied.
> 
>                                       3.16.0-rc2                 3.0.0            3.16.0-rc2
>                                          vanilla               vanilla         fairzone-v4r5
> Min    SeqRead-MB/sec-1         120.92 (  0.00%)      133.65 ( 10.53%)      140.68 ( 16.34%)
> Min    SeqRead-MB/sec-2         100.25 (  0.00%)      121.74 ( 21.44%)      118.13 ( 17.84%)
> Min    SeqRead-MB/sec-4          96.27 (  0.00%)      113.48 ( 17.88%)      109.84 ( 14.10%)
> Min    SeqRead-MB/sec-8          83.55 (  0.00%)       97.87 ( 17.14%)       89.62 (  7.27%)
> Min    SeqRead-MB/sec-16         66.77 (  0.00%)       82.59 ( 23.69%)       70.49 (  5.57%)
> 
> Overall system CPU usage is reduced
> 
>           3.16.0-rc2       3.0.0  3.16.0-rc2
>              vanilla     vanilla fairzone-v4
> User          390.13      251.45      396.13
> System        404.41      295.13      389.61
> Elapsed      5412.45     5072.42     5163.49
> 
> This series does not fully restore throughput performance to 3.0 levels
> but it brings it close for lower thread counts. Higher thread counts are
> known to be worse than 3.0 due to CFQ changes but there is no appetite
> for changing the defaults there.

I ran tiobench locally and here are the results:

tiobench MB/sec
                                        3.16-rc1              3.16-rc1
                                                           seqreadv4r8
Mean   SeqRead-MB/sec-1         129.66 (  0.00%)      156.16 ( 20.44%)
Mean   SeqRead-MB/sec-2         115.74 (  0.00%)      138.50 ( 19.66%)
Mean   SeqRead-MB/sec-4         110.21 (  0.00%)      127.08 ( 15.31%)
Mean   SeqRead-MB/sec-8         101.70 (  0.00%)      108.47 (  6.65%)
Mean   SeqRead-MB/sec-16         86.45 (  0.00%)       91.57 (  5.92%)
Mean   RandRead-MB/sec-1          1.14 (  0.00%)        1.11 ( -2.35%)
Mean   RandRead-MB/sec-2          1.30 (  0.00%)        1.25 ( -3.85%)
Mean   RandRead-MB/sec-4          1.50 (  0.00%)        1.46 ( -2.23%)
Mean   RandRead-MB/sec-8          1.72 (  0.00%)        1.60 ( -6.96%)
Mean   RandRead-MB/sec-16         1.72 (  0.00%)        1.69 ( -2.13%)

Seqread throughput is up, randread takes a small hit.  But allocation
latency is badly screwed at higher concurrency levels:

tiobench Maximum Latency
                                            3.16-rc1              3.16-rc1
                                                               seqreadv4r8
Mean   SeqRead-MaxLatency-1          77.23 (  0.00%)       57.69 ( 25.30%)
Mean   SeqRead-MaxLatency-2         228.80 (  0.00%)      218.50 (  4.50%)
Mean   SeqRead-MaxLatency-4         329.58 (  0.00%)      325.93 (  1.11%)
Mean   SeqRead-MaxLatency-8         485.13 (  0.00%)      475.35 (  2.02%)
Mean   SeqRead-MaxLatency-16        599.10 (  0.00%)      637.89 ( -6.47%)
Mean   RandRead-MaxLatency-1         66.98 (  0.00%)       18.21 ( 72.81%)
Mean   RandRead-MaxLatency-2        132.88 (  0.00%)      119.61 (  9.98%)
Mean   RandRead-MaxLatency-4        222.95 (  0.00%)      213.82 (  4.10%)
Mean   RandRead-MaxLatency-8        982.99 (  0.00%)     1009.71 ( -2.72%)
Mean   RandRead-MaxLatency-16       515.24 (  0.00%)     1883.82 (-265.62%)
Mean   SeqWrite-MaxLatency-1        239.78 (  0.00%)      233.61 (  2.57%)
Mean   SeqWrite-MaxLatency-2        517.85 (  0.00%)      413.39 ( 20.17%)
Mean   SeqWrite-MaxLatency-4        249.10 (  0.00%)      416.33 (-67.14%)
Mean   SeqWrite-MaxLatency-8        629.31 (  0.00%)      851.62 (-35.33%)
Mean   SeqWrite-MaxLatency-16       987.05 (  0.00%)     1080.92 ( -9.51%)
Mean   RandWrite-MaxLatency-1         0.01 (  0.00%)        0.01 (  0.00%)
Mean   RandWrite-MaxLatency-2         0.02 (  0.00%)        0.02 (  0.00%)
Mean   RandWrite-MaxLatency-4         0.02 (  0.00%)        0.02 (  0.00%)
Mean   RandWrite-MaxLatency-8         1.83 (  0.00%)        1.96 ( -6.73%)
Mean   RandWrite-MaxLatency-16        1.52 (  0.00%)        1.33 ( 12.72%)

Zone fairness is completely gone.  The overall allocation distribution
on this system goes from 40%/60% to 10%/90%, and during the workload
the DMA32 zone is not used *at all*:

                              3.16-rc1    3.16-rc1
                                       seqreadv4r8
Zone normal velocity         11358.492   17996.733
Zone dma32 velocity           8213.852       0.000

Both negative effects stem from kswapd suddenly ignoring the classzone
index while the page allocator respects it: the page allocator will
keep the low wmark + lowmem reserves in DMA32 free, but kswapd won't
reclaim in there until it drops down to the high watermark.  The low
watermark + lowmem reserve is usually bigger than the high watermark,
so you effectively disable kswapd service in DMA32 for user requests.
The zone is then no longer used until it fills with enough kernel
pages to trigger kswapd, or the workload goes into direct reclaim.
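
To put invented numbers on it: if DMA32 had a low watermark of 16M, a high
watermark of 20M and a lowmem reserve of 500M against these requests, the
allocator would stop taking pages from DMA32 once free memory there fell
below roughly 516M, while kswapd would see nothing to do until it fell
below 20M, leaving most of that headroom unused.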

The classzone change is a non-sensical change IMO, and there is no
useful description of it to be found in the changelog.  But for the
given tests it appears to be the only change in the entire series to
make a measurable difference; reverting it gets me back to baseline:

tiobench MB/sec
                                        3.16-rc1              3.16-rc1              3.16-rc1
                                                           seqreadv4r8  seqreadv4r8classzone
Mean   SeqRead-MB/sec-1         129.66 (  0.00%)      156.16 ( 20.44%)      129.72 (  0.05%)
Mean   SeqRead-MB/sec-2         115.74 (  0.00%)      138.50 ( 19.66%)      115.61 ( -0.11%)
Mean   SeqRead-MB/sec-4         110.21 (  0.00%)      127.08 ( 15.31%)      110.15 ( -0.06%)
Mean   SeqRead-MB/sec-8         101.70 (  0.00%)      108.47 (  6.65%)      102.15 (  0.44%)
Mean   SeqRead-MB/sec-16         86.45 (  0.00%)       91.57 (  5.92%)       86.63 (  0.20%)

            3.16-rc1    3.16-rc1    3.16-rc1
                     seqreadv4r8  seqreadv4r8classzone
User          272.45      277.17      272.23
System        197.89      186.30      193.73
Elapsed      4589.17     4356.23     4584.57

                              3.16-rc1    3.16-rc1    3.16-rc1
                                        seqreadv4r8  seqreadv4r8classzone
Zone normal velocity         11358.492   17996.733   12695.547
Zone dma32 velocity           8213.852       0.000    6891.421

Please stop making multiple logical changes in a single patch/testing
unit.  This will make it easier to verify them, and hopefully make it
also more obvious if individual changes are underdocumented.  As it
stands, it's hard to impossible to verify the implementation when the
intentions are not fully documented.  Performance results can only do
so much.  They are meant to corroborate the model, not replace it.

And again, if you change the way zone fairness works, please always
include the zone velocity numbers or allocation numbers to show that
your throughput improvements don't just come from completely wrecking
fairness - or in this case from disabling an entire zone.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Improve sequential read throughput v4r8
  2014-07-01 17:16 ` [PATCH 0/5] Improve sequential read throughput v4r8 Johannes Weiner
@ 2014-07-01 18:39   ` Mel Gorman
  2014-07-01 20:58     ` Mel Gorman
                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Mel Gorman @ 2014-07-01 18:39 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Andrew Morton, Linux Kernel, Linux-MM, Linux-FSDevel

On Tue, Jul 01, 2014 at 01:16:11PM -0400, Johannes Weiner wrote:
> On Mon, Jun 30, 2014 at 05:47:59PM +0100, Mel Gorman wrote:
> > Changelog since V3
> > o Push down kwapd changes to cover the balance gap
> > o Drop drop page distribution patch
> > 
> > Changelog since V2
> > o Simply fair zone policy cost reduction
> > o Drop CFQ patch
> > 
> > Changelog since v1
> > o Rebase to v3.16-rc2
> > o Move CFQ patch to end of series where it can be rejected easier if necessary
> > o Introduce page-reclaim related patch related to kswapd/fairzone interactions
> > o Rework fast zone policy patch
> > 
> > IO performance since 3.0 has been a mixed bag. In many respects we are
> > better and in some we are worse and one of those places is sequential
> > read throughput. This is visible in a number of benchmarks but I looked
> > at tiobench the closest. This is using ext3 on a mid-range desktop and
> > the series applied.
> > 
> >                                       3.16.0-rc2                 3.0.0            3.16.0-rc2
> >                                          vanilla               vanilla         fairzone-v4r5
> > Min    SeqRead-MB/sec-1         120.92 (  0.00%)      133.65 ( 10.53%)      140.68 ( 16.34%)
> > Min    SeqRead-MB/sec-2         100.25 (  0.00%)      121.74 ( 21.44%)      118.13 ( 17.84%)
> > Min    SeqRead-MB/sec-4          96.27 (  0.00%)      113.48 ( 17.88%)      109.84 ( 14.10%)
> > Min    SeqRead-MB/sec-8          83.55 (  0.00%)       97.87 ( 17.14%)       89.62 (  7.27%)
> > Min    SeqRead-MB/sec-16         66.77 (  0.00%)       82.59 ( 23.69%)       70.49 (  5.57%)
> > 
> > Overall system CPU usage is reduced
> > 
> >           3.16.0-rc2       3.0.0  3.16.0-rc2
> >              vanilla     vanilla fairzone-v4
> > User          390.13      251.45      396.13
> > System        404.41      295.13      389.61
> > Elapsed      5412.45     5072.42     5163.49
> > 
> > This series does not fully restore throughput performance to 3.0 levels
> > but it brings it close for lower thread counts. Higher thread counts are
> > known to be worse than 3.0 due to CFQ changes but there is no appetite
> > for changing the defaults there.
> 
> I ran tiobench locally and here are the results:
> 
> tiobench MB/sec
>                                         3.16-rc1              3.16-rc1
>                                                            seqreadv4r8
> Mean   SeqRead-MB/sec-1         129.66 (  0.00%)      156.16 ( 20.44%)
> Mean   SeqRead-MB/sec-2         115.74 (  0.00%)      138.50 ( 19.66%)
> Mean   SeqRead-MB/sec-4         110.21 (  0.00%)      127.08 ( 15.31%)
> Mean   SeqRead-MB/sec-8         101.70 (  0.00%)      108.47 (  6.65%)
> Mean   SeqRead-MB/sec-16         86.45 (  0.00%)       91.57 (  5.92%)
> Mean   RandRead-MB/sec-1          1.14 (  0.00%)        1.11 ( -2.35%)
> Mean   RandRead-MB/sec-2          1.30 (  0.00%)        1.25 ( -3.85%)
> Mean   RandRead-MB/sec-4          1.50 (  0.00%)        1.46 ( -2.23%)
> Mean   RandRead-MB/sec-8          1.72 (  0.00%)        1.60 ( -6.96%)
> Mean   RandRead-MB/sec-16         1.72 (  0.00%)        1.69 ( -2.13%)
> 
> Seqread throughput is up, randread takes a small hit.  But allocation
> latency is badly screwed at higher concurrency levels:
> 

So the results are roughly similar. You don't state which filesystem it
is, but FWIW, if it's the ext3 filesystem using the ext4 driver then
throughput at higher concurrency levels is also affected by filesystem
fragmentation. That problem is outside the scope of this series.

> tiobench Maximum Latency
>                                             3.16-rc1              3.16-rc1
>                                                                seqreadv4r8
> Mean   SeqRead-MaxLatency-1          77.23 (  0.00%)       57.69 ( 25.30%)
> Mean   SeqRead-MaxLatency-2         228.80 (  0.00%)      218.50 (  4.50%)
> Mean   SeqRead-MaxLatency-4         329.58 (  0.00%)      325.93 (  1.11%)
> Mean   SeqRead-MaxLatency-8         485.13 (  0.00%)      475.35 (  2.02%)
> Mean   SeqRead-MaxLatency-16        599.10 (  0.00%)      637.89 ( -6.47%)
> Mean   RandRead-MaxLatency-1         66.98 (  0.00%)       18.21 ( 72.81%)
> Mean   RandRead-MaxLatency-2        132.88 (  0.00%)      119.61 (  9.98%)
> Mean   RandRead-MaxLatency-4        222.95 (  0.00%)      213.82 (  4.10%)
> Mean   RandRead-MaxLatency-8        982.99 (  0.00%)     1009.71 ( -2.72%)
> Mean   RandRead-MaxLatency-16       515.24 (  0.00%)     1883.82 (-265.62%)
> Mean   SeqWrite-MaxLatency-1        239.78 (  0.00%)      233.61 (  2.57%)
> Mean   SeqWrite-MaxLatency-2        517.85 (  0.00%)      413.39 ( 20.17%)
> Mean   SeqWrite-MaxLatency-4        249.10 (  0.00%)      416.33 (-67.14%)
> Mean   SeqWrite-MaxLatency-8        629.31 (  0.00%)      851.62 (-35.33%)
> Mean   SeqWrite-MaxLatency-16       987.05 (  0.00%)     1080.92 ( -9.51%)
> Mean   RandWrite-MaxLatency-1         0.01 (  0.00%)        0.01 (  0.00%)
> Mean   RandWrite-MaxLatency-2         0.02 (  0.00%)        0.02 (  0.00%)
> Mean   RandWrite-MaxLatency-4         0.02 (  0.00%)        0.02 (  0.00%)
> Mean   RandWrite-MaxLatency-8         1.83 (  0.00%)        1.96 ( -6.73%)
> Mean   RandWrite-MaxLatency-16        1.52 (  0.00%)        1.33 ( 12.72%)
> 
> Zone fairness is completely gone.  The overall allocation distribution
> on this system goes from 40%/60% to 10%/90%, and during the workload
> the DMA32 zone is not used *at all*:
> 

Zone fairness gets effectively disabled when the streaming workload is
using all of physical memory and kswapd is reclaiming behind it anyway.
The allocator uses the preferred zone while reclaim scans behind it. If
you run tiobench with a size that fits within memory then the IO results
themselves are valid, but it should also show that zone allocations are
still spread fairly.

This is from a tiobench configuration that fits within memory.

                            3.16.0-rc2  3.16.0-rc2
                               vanilla fairzone-v4
DMA32 allocs                  10809658    10904632
Normal allocs                 18401594    18342985

In this case there was no reclaim activity.

>                               3.16-rc1    3.16-rc1
>                                        seqreadv4r8
> Zone normal velocity         11358.492   17996.733
> Zone dma32 velocity           8213.852       0.000
> 

This shows that when the IO workload is twice the size of memory, the
allocations stay confined within one zone. Given that this is mostly a
streaming workload and we are discarding behind it, that was of less
concern than the interleaving causing the wrong reclaim decisions to be
made.

> Both negative effects stem from kswapd suddenly ignoring the classzone
> index while the page allocator respects it: the page allocator will
> keep the low wmark + lowmem reserves in DMA32 free, but kswapd won't
> reclaim in there until it drops down to the high watermark.  The low
> watermark + lowmem reserve is usually bigger than the high watermark,
> so you effectively disable kswapd service in DMA32 for user requests.
> The zone is then no longer used until it fills with enough kernel
> pages to trigger kswapd, or the workload goes into direct reclaim.
> 

Yes. If either the classzone index or the balance gap is preserved then
the same regression exists. The interleaving from the allocator, combined
with the ordering of kswapd activity on the lower zones, reclaimed pages
before they were finished with.

> The classzone change is a non-sensical change IMO, and there is no
> useful description of it to be found in the changelog.  But for the
> given tests it appears to be the only change in the entire series to
> make a measurable difference; reverting it gets me back to baseline:
> 
> tiobench MB/sec
>                                         3.16-rc1              3.16-rc1              3.16-rc1
>                                                            seqreadv4r8  seqreadv4r8classzone
> Mean   SeqRead-MB/sec-1         129.66 (  0.00%)      156.16 ( 20.44%)      129.72 (  0.05%)
> Mean   SeqRead-MB/sec-2         115.74 (  0.00%)      138.50 ( 19.66%)      115.61 ( -0.11%)
> Mean   SeqRead-MB/sec-4         110.21 (  0.00%)      127.08 ( 15.31%)      110.15 ( -0.06%)
> Mean   SeqRead-MB/sec-8         101.70 (  0.00%)      108.47 (  6.65%)      102.15 (  0.44%)
> Mean   SeqRead-MB/sec-16         86.45 (  0.00%)       91.57 (  5.92%)       86.63 (  0.20%)
> 

That is consistent with my own tests. The single patch that remained was
the logical change.

>             3.16-rc1    3.16-rc1    3.16-rc1
>                      seqreadv4r8seqreadv4r8classzone
> User          272.45      277.17      272.23
> System        197.89      186.30      193.73
> Elapsed      4589.17     4356.23     4584.57
> 
>                               3.16-rc1    3.16-rc1    3.16-rc1
>                                        seqreadv4r8seqreadv4r8classzone
> Zone normal velocity         11358.492   17996.733   12695.547
> Zone dma32 velocity           8213.852       0.000    6891.421
> 
> Please stop making multiple logical changes in a single patch/testing
> unit. 

In this case you would end up with two patches:

Removal of balance gap -- no major difference measured
Removal of classzone_idx -- removes the lowmem reserve

The first patch on its own would have no useful documentation attached,
which is why it was not split out.

> This will make it easier to verify them, and hopefully make it
> also more obvious if individual changes are underdocumented.  As it
> stands, it's hard to impossible to verify the implementation when the
> intentions are not fully documented.  Performance results can only do
> so much.  They are meant to corroborate the model, not replace it.
> 

The fair zone policy itself partially works against the lowmem reserve
idea. The point of the lowmem reserve was to preserve the lower zones
when an upper zone can be used, and the fair zone policy breaks that.
The fair zone policy ignores it and the two were never reconciled. The
dirty page distribution does a different interleaving again and was never
reconciled with the fair zone policy or lowmem reserves either. kswapd
itself was not using the classzone_idx it was actually woken for, although
in this case it may not matter. The end result is that the model is fairly
inconsistent, which makes comparison against it a difficult exercise at
best. About all that was left was the observation that, from a performance
perspective, the fair zone allocation policy is not doing the right thing
for streaming workloads.

> And again, if you change the way zone fairness works, please always
> include the zone velocity numbers or allocation numbers to show that
> your throughput improvements don't just come from completely wrecking
> fairness - or in this case from disabling an entire zone.

The fair zone policy is preserved until such time as the workload is
continually streaming data in and reclaiming it out. The original fair
zone allocation policy patch (81c0a2bb515fd4daae8cab64352877480792b515)
did not describe what workload it measurably benefitted. It noted that
pages can get activated and live longer than they should, which is
completely true, but it did not document why that mattered for streaming
workloads or notice that performance for those workloads got completely
shot.

There is a concern that the pages in the lower zone potentially get
preserved forever. However, the interleaving from the fair zone policy
would hit the low watermark again, and pages up to the high watermark
would still get rotated and reclaimed, so it did not seem like it would
be an issue.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Improve sequential read throughput v4r8
  2014-07-01 18:39   ` Mel Gorman
@ 2014-07-01 20:58     ` Mel Gorman
  2014-07-01 21:25     ` Johannes Weiner
  2014-07-01 22:38     ` Dave Chinner
  2 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2014-07-01 20:58 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Andrew Morton, Linux Kernel, Linux-MM, Linux-FSDevel

On Tue, Jul 01, 2014 at 07:39:15PM +0100, Mel Gorman wrote:
> The fair zone policy itself is partially working against the lowmem
> reserve idea. The point of the lowmem reserve was to preserve the lower
> zones when an upper zone can be used and the fair zone policy breaks
> that. The fair zone policy ignores that and it was never reconciled. The
> dirty page distribution does a different interleaving again and was never
> reconciled with the fair zone policy or lowmem reserves. kswapd itself was
> not using the classzone_idx it actually woken for although in this case
> it may not matter. The end result is that the model is fairly inconsistent
> which makes comparison against it a difficult exercise at best. About all
> that was left was that from a performance perspective that the fair zone
> allocation policy is not doing the right thing for streaming workloads.
> 

The inevitable feedback will be to reconcile those differences, so I
redid the series and queued it for testing. The patch list currently
looks like this:

mm: pagemap: Avoid unnecessary overhead when tracepoints are deactivated
mm: Rearrange zone fields into read-only, page alloc, statistics and page reclaim lines
mm: page_alloc: Add ALLOC_DIRTY for dirty page distribution
mm: page_alloc: Only apply the fair zone allocation policy if it's eligible
mm: page_alloc: Only apply either the fair zone or dirty page distribution policy, not both
mm: page_alloc: Reduce cost of the fair zone allocation policy
mm: page_alloc: Reconcile lowmem reserves with fair zone allocation policy
mm: vmscan: Fix oddities with classzone and zone balancing
mm: vmscan: Reconcile balance gap lowmem reclaim with fair zone allocation policy
mm: vmscan: Remove classzone considerations from kswapd decisions

It takes about 13 hours to test ext3 on the small machine and 3 days on
the larger machine. The test could be accelerated by reducing either the
iterations or the memory size of the machine, but that would distort the
results too badly.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Improve sequential read throughput v4r8
  2014-07-01 18:39   ` Mel Gorman
  2014-07-01 20:58     ` Mel Gorman
@ 2014-07-01 21:25     ` Johannes Weiner
  2014-07-02 15:44       ` Johannes Weiner
  2014-07-01 22:38     ` Dave Chinner
  2 siblings, 1 reply; 17+ messages in thread
From: Johannes Weiner @ 2014-07-01 21:25 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Linux Kernel, Linux-MM, Linux-FSDevel

On Tue, Jul 01, 2014 at 07:39:15PM +0100, Mel Gorman wrote:
> On Tue, Jul 01, 2014 at 01:16:11PM -0400, Johannes Weiner wrote:
> > On Mon, Jun 30, 2014 at 05:47:59PM +0100, Mel Gorman wrote:
> > > Changelog since V3
> > > o Push down kwapd changes to cover the balance gap
> > > o Drop drop page distribution patch
> > > 
> > > Changelog since V2
> > > o Simply fair zone policy cost reduction
> > > o Drop CFQ patch
> > > 
> > > Changelog since v1
> > > o Rebase to v3.16-rc2
> > > o Move CFQ patch to end of series where it can be rejected easier if necessary
> > > o Introduce page-reclaim related patch related to kswapd/fairzone interactions
> > > o Rework fast zone policy patch
> > > 
> > > IO performance since 3.0 has been a mixed bag. In many respects we are
> > > better and in some we are worse and one of those places is sequential
> > > read throughput. This is visible in a number of benchmarks but I looked
> > > at tiobench the closest. This is using ext3 on a mid-range desktop and
> > > the series applied.
> > > 
> > >                                       3.16.0-rc2                 3.0.0            3.16.0-rc2
> > >                                          vanilla               vanilla         fairzone-v4r5
> > > Min    SeqRead-MB/sec-1         120.92 (  0.00%)      133.65 ( 10.53%)      140.68 ( 16.34%)
> > > Min    SeqRead-MB/sec-2         100.25 (  0.00%)      121.74 ( 21.44%)      118.13 ( 17.84%)
> > > Min    SeqRead-MB/sec-4          96.27 (  0.00%)      113.48 ( 17.88%)      109.84 ( 14.10%)
> > > Min    SeqRead-MB/sec-8          83.55 (  0.00%)       97.87 ( 17.14%)       89.62 (  7.27%)
> > > Min    SeqRead-MB/sec-16         66.77 (  0.00%)       82.59 ( 23.69%)       70.49 (  5.57%)
> > > 
> > > Overall system CPU usage is reduced
> > > 
> > >           3.16.0-rc2       3.0.0  3.16.0-rc2
> > >              vanilla     vanilla fairzone-v4
> > > User          390.13      251.45      396.13
> > > System        404.41      295.13      389.61
> > > Elapsed      5412.45     5072.42     5163.49
> > > 
> > > This series does not fully restore throughput performance to 3.0 levels
> > > but it brings it close for lower thread counts. Higher thread counts are
> > > known to be worse than 3.0 due to CFQ changes but there is no appetite
> > > for changing the defaults there.
> > 
> > I ran tiobench locally and here are the results:
> > 
> > tiobench MB/sec
> >                                         3.16-rc1              3.16-rc1
> >                                                            seqreadv4r8
> > Mean   SeqRead-MB/sec-1         129.66 (  0.00%)      156.16 ( 20.44%)
> > Mean   SeqRead-MB/sec-2         115.74 (  0.00%)      138.50 ( 19.66%)
> > Mean   SeqRead-MB/sec-4         110.21 (  0.00%)      127.08 ( 15.31%)
> > Mean   SeqRead-MB/sec-8         101.70 (  0.00%)      108.47 (  6.65%)
> > Mean   SeqRead-MB/sec-16         86.45 (  0.00%)       91.57 (  5.92%)
> > Mean   RandRead-MB/sec-1          1.14 (  0.00%)        1.11 ( -2.35%)
> > Mean   RandRead-MB/sec-2          1.30 (  0.00%)        1.25 ( -3.85%)
> > Mean   RandRead-MB/sec-4          1.50 (  0.00%)        1.46 ( -2.23%)
> > Mean   RandRead-MB/sec-8          1.72 (  0.00%)        1.60 ( -6.96%)
> > Mean   RandRead-MB/sec-16         1.72 (  0.00%)        1.69 ( -2.13%)
> > 
> > Seqread throughput is up, randread takes a small hit.  But allocation
> > latency is badly screwed at higher concurrency levels:
> > 
> 
> So the results are roughly similar. You don't state which filesystem it is
> but FWIW if it's the ext3 filesystem using the ext4 driver then throughput
> at higher levels is also affected by filesystem fragmentation. The problem
> was outside the scope of the series.

It's an ext4 filesystem.

> > tiobench Maximum Latency
> >                                             3.16-rc1              3.16-rc1
> >                                                                seqreadv4r8
> > Mean   SeqRead-MaxLatency-1          77.23 (  0.00%)       57.69 ( 25.30%)
> > Mean   SeqRead-MaxLatency-2         228.80 (  0.00%)      218.50 (  4.50%)
> > Mean   SeqRead-MaxLatency-4         329.58 (  0.00%)      325.93 (  1.11%)
> > Mean   SeqRead-MaxLatency-8         485.13 (  0.00%)      475.35 (  2.02%)
> > Mean   SeqRead-MaxLatency-16        599.10 (  0.00%)      637.89 ( -6.47%)
> > Mean   RandRead-MaxLatency-1         66.98 (  0.00%)       18.21 ( 72.81%)
> > Mean   RandRead-MaxLatency-2        132.88 (  0.00%)      119.61 (  9.98%)
> > Mean   RandRead-MaxLatency-4        222.95 (  0.00%)      213.82 (  4.10%)
> > Mean   RandRead-MaxLatency-8        982.99 (  0.00%)     1009.71 ( -2.72%)
> > Mean   RandRead-MaxLatency-16       515.24 (  0.00%)     1883.82 (-265.62%)
> > Mean   SeqWrite-MaxLatency-1        239.78 (  0.00%)      233.61 (  2.57%)
> > Mean   SeqWrite-MaxLatency-2        517.85 (  0.00%)      413.39 ( 20.17%)
> > Mean   SeqWrite-MaxLatency-4        249.10 (  0.00%)      416.33 (-67.14%)
> > Mean   SeqWrite-MaxLatency-8        629.31 (  0.00%)      851.62 (-35.33%)
> > Mean   SeqWrite-MaxLatency-16       987.05 (  0.00%)     1080.92 ( -9.51%)
> > Mean   RandWrite-MaxLatency-1         0.01 (  0.00%)        0.01 (  0.00%)
> > Mean   RandWrite-MaxLatency-2         0.02 (  0.00%)        0.02 (  0.00%)
> > Mean   RandWrite-MaxLatency-4         0.02 (  0.00%)        0.02 (  0.00%)
> > Mean   RandWrite-MaxLatency-8         1.83 (  0.00%)        1.96 ( -6.73%)
> > Mean   RandWrite-MaxLatency-16        1.52 (  0.00%)        1.33 ( 12.72%)
> > 
> > Zone fairness is completely gone.  The overall allocation distribution
> > on this system goes from 40%/60% to 10%/90%, and during the workload
> > the DMA32 zone is not used *at all*:
> > 
> 
> The zone fairness gets effectively disabled when the streaming is using all
> of physical memory and reclaiming behind anyway as kswapd. The allocator is
> using the preferred zone while reclaim scans behind it. If you run tiobench
> with a size that fits within memory then the IO results themselves are
> valid but it should show that the zone allocation is still spread fairly.
> 
> This is from a tiobench configuration that fits within memory.
> 
>                             3.16.0-rc2  3.16.0-rc2
>                                vanilla fairzone-v4
> DMA32 allocs                  10809658    10904632
> Normal allocs                 18401594    18342985
> 
> In this case there was no reclaim activity.
> 
> >                               3.16-rc1    3.16-rc1
> >                                        seqreadv4r8
> > Zone normal velocity         11358.492   17996.733
> > Zone dma32 velocity           8213.852       0.000
> > 
> 
> Showing that when the IO workload is twice memory that it stays confined
> within one zone. Considering that this is a streaming workload for the
> most part and we're discarding behind it was of less concern considering
> that interleaving results in the wrong reclaim decisions being made.

How can we tell a streaming workload from a thrashing one?  The VM can
only recognize multiple accesses within an LRU cycle, and you just cut
the LRU cycle in half.

Workingset adaptiveness is back in the toilet with your changes; you
can verify that easily by trying to cache one file slightly bigger
than memory through sequential reads, then another file of the same
size.  The second file never gets cached because it's thrashing in the
Normal zone while the unused file-1 gunk in DMA32 never gets the boot.

This is a correctness issue, which means that the other side of the
20% improvement in tiobench is a regression that scales with runtime
of file-2 access.
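
For reference, checking residency after such a run can be done with a
quick mincore() hack along these lines (a sketch with minimal error
handling, not a polished tool; run it against file-1 and file-2 after
the reads):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	static char buf[1 << 20];
	struct stat st;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(argv[1]);
		return 1;
	}

	/* sequential read to populate (or thrash) the page cache */
	while (read(fd, buf, sizeof(buf)) > 0)
		;

	long page = sysconf(_SC_PAGESIZE);
	size_t pages = (st.st_size + page - 1) / page;
	unsigned char *vec = malloc(pages);
	void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);

	if (!vec || map == MAP_FAILED || mincore(map, st.st_size, vec) < 0) {
		perror("mincore");
		return 1;
	}

	size_t resident = 0;
	for (size_t i = 0; i < pages; i++)
		resident += vec[i] & 1;

	printf("%zu of %zu pages resident (%.1f%%)\n",
	       resident, pages, 100.0 * resident / pages);
	return 0;
}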

> > Both negative effects stem from kswapd suddenly ignoring the classzone
> > index while the page allocator respects it: the page allocator will
> > keep the low wmark + lowmem reserves in DMA32 free, but kswapd won't
> > reclaim in there until it drops down to the high watermark.  The low
> > watermark + lowmem reserve is usually bigger than the high watermark,
> > so you effectively disable kswapd service in DMA32 for user requests.
> > The zone is then no longer used until it fills with enough kernel
> > pages to trigger kswapd, or the workload goes into direct reclaim.
> > 
> 
> Yes. If the classzone index was preserved or the balance gap then the same
> regression exists. The interleaving from the allocator and ordering of kswapd
> activity on the lower zones reclaimed pages before they were finished with.

Is "readahead pages getting trashed before they are used" the main
explanation for this particular regression?

> > The classzone change is a non-sensical change IMO, and there is no
> > useful description of it to be found in the changelog.  But for the
> > given tests it appears to be the only change in the entire series to
> > make a measurable difference; reverting it gets me back to baseline:
> > 
> > tiobench MB/sec
> >                                         3.16-rc1              3.16-rc1              3.16-rc1
> >                                                            seqreadv4r8  seqreadv4r8classzone
> > Mean   SeqRead-MB/sec-1         129.66 (  0.00%)      156.16 ( 20.44%)      129.72 (  0.05%)
> > Mean   SeqRead-MB/sec-2         115.74 (  0.00%)      138.50 ( 19.66%)      115.61 ( -0.11%)
> > Mean   SeqRead-MB/sec-4         110.21 (  0.00%)      127.08 ( 15.31%)      110.15 ( -0.06%)
> > Mean   SeqRead-MB/sec-8         101.70 (  0.00%)      108.47 (  6.65%)      102.15 (  0.44%)
> > Mean   SeqRead-MB/sec-16         86.45 (  0.00%)       91.57 (  5.92%)       86.63 (  0.20%)
> > 
> 
> That is consistent with my own tests. The single patch that remained was
> the logical change.
> 
> >             3.16-rc1    3.16-rc1    3.16-rc1
> >                      seqreadv4r8seqreadv4r8classzone
> > User          272.45      277.17      272.23
> > System        197.89      186.30      193.73
> > Elapsed      4589.17     4356.23     4584.57
> > 
> >                               3.16-rc1    3.16-rc1    3.16-rc1
> >                                        seqreadv4r8seqreadv4r8classzone
> > Zone normal velocity         11358.492   17996.733   12695.547
> > Zone dma32 velocity           8213.852       0.000    6891.421
> > 
> > Please stop making multiple logical changes in a single patch/testing
> > unit. 
> 
> In this case you would end up with two patches
> 
> Removal of balance gap -- no major difference measured
> Removal of classzone_idx -- removes the lowmem reserve
> 
> The first patch on its own would have no useful documentation attached
> which is why it was not split out.

The balance gap is a self-contained concept that was introduced for a
specific reason, which I'm sure has nothing to do with lowmem
reserves.  If that reason doesn't exist anymore, removing it can be a
separate change that documents when and how the gap became obsolete.

Sure, there is only one motivation why you are actually removing both
things, which can be mentioned in the cover letter, but they are still
different logical changes in the reclaim/placement model.

> > This will make it easier to verify them, and hopefully make it
> > also more obvious if individual changes are underdocumented.  As it
> > stands, it's hard to impossible to verify the implementation when the
> > intentions are not fully documented.  Performance results can only do
> > so much.  They are meant to corroborate the model, not replace it.
> > 
> 
> The fair zone policy itself is partially working against the lowmem
> reserve idea. The point of the lowmem reserve was to preserve the lower
> zones when an upper zone can be used and the fair zone policy breaks
> that.

The allocator always filled all zones minus their lowmem reserves
before initiating reclaim; the fair policy just makes sure they fill
up at the same rate, but it still respects the chunks reserved for
non-user allocations, the lowmem reserves.

It's arguable whether there was an intentional best-effort mechanism
that preferred higher zones, in addition to the lowmem reserves, but
the upsides are not clear to me (nobody complained about lowmem
allocation problems after the change) and it doesn't integrate into
our multi-zone LRU aging model, so I got rid of it.

So no, I don't see conflicting concepts here.
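
To spell out what "fill up at the same rate" means, here is a toy model
of the proportional batches (illustrative only, not the mm/page_alloc.c
implementation; the zone sizes and allocation count are invented):

#include <stdio.h>

#define NR_ZONES 2

int main(void)
{
	const char *name[NR_ZONES] = { "Normal", "DMA32" };
	const long size[NR_ZONES] = { 3072, 1024 };	/* relative zone sizes */
	long batch[NR_ZONES] = { 0, 0 };
	long allocated[NR_ZONES] = { 0, 0 };

	for (long page = 0; page < 400000; page++) {
		int z;

		/* prefer the highest zone that still has fair quota left */
		for (z = 0; z < NR_ZONES; z++)
			if (batch[z] > 0)
				break;

		/* all quotas spent: refill them in proportion to zone size */
		if (z == NR_ZONES) {
			for (z = 0; z < NR_ZONES; z++)
				batch[z] = size[z];
			z = 0;
		}

		batch[z]--;
		allocated[z]++;
	}

	/* prints a ~3:1 split: both zones fill at the same relative rate */
	for (int z = 0; z < NR_ZONES; z++)
		printf("%-6s allocations: %ld\n", name[z], allocated[z]);
	return 0;
}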

> The fair zone policy ignores that and it was never reconciled. The
> dirty page distribution does a different interleaving again and was never
> reconciled with the fair zone policy or lowmem reserves.

A write will pick the zone that meets the watermark (low+reserve) and
has available fair quota - they should be roughly in sync - and hasn't
exhausted its zone dirty limit.  If there is no eligible zone, reclaim
is preferred over breaching the dirty limit (rather have clean cache
reclaimed than the reclaimer running into dirty cache), so we enter
the slowpath.  The only reason why the slowpath ultimately ignores the
dirty limit (after waking kswapd) is because the flushers are not NUMA
aware, but this is explicitly documented.  But kswapd is awake at
that point, so any setbacks in fairness should be temporary.
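
As a sketch of that placement decision (the struct and numbers are
invented for illustration; this is not the actual code path):

#include <stdbool.h>
#include <stdio.h>

struct zone_state {
	long free_pages, low_wmark, lowmem_reserve;
	long fair_quota;		/* remaining fair allocation batch */
	long dirty_pages, dirty_limit;	/* this zone's share of the dirty limit */
};

static bool zone_eligible_for_write(const struct zone_state *z)
{
	return z->free_pages > z->low_wmark + z->lowmem_reserve &&
	       z->fair_quota > 0 &&
	       z->dirty_pages < z->dirty_limit;
}

int main(void)
{
	/* watermark and fair quota are fine, but the dirty limit is hit */
	struct zone_state normal = {
		.free_pages = 50000, .low_wmark = 10000, .lowmem_reserve = 0,
		.fair_quota = 512,
		.dirty_pages = 30000, .dirty_limit = 30000,
	};

	/*
	 * Not eligible: the allocator tries the next zone, and if no zone
	 * qualifies it enters the slowpath and wakes kswapd rather than
	 * breach the dirty limit.
	 */
	printf("eligible for dirtying write: %d\n",
	       zone_eligible_for_write(&normal));
	return 0;
}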

> kswapd itself was
> not using the classzone_idx it actually woken for although in this case
> it may not matter. The end result is that the model is fairly inconsistent
> which makes comparison against it a difficult exercise at best. About all
> that was left was that from a performance perspective that the fair zone
> allocation policy is not doing the right thing for streaming workloads.

But your changes are not doing the right thing for in-core workloads
and working set changes where predictable aging matters, plus they
take away whatever consistency we have in the placement model.  It's
not a good trade-off.

> > And again, if you change the way zone fairness works, please always
> > include the zone velocity numbers or allocation numbers to show that
> > your throughput improvements don't just come from completely wrecking
> > fairness - or in this case from disabling an entire zone.
> 
> The fair zone policy is preserved until such time as the workload is
> continually streaming data in and reclaiming out. The original fair zone
> allocation policy patch (81c0a2bb515fd4daae8cab64352877480792b515) did not
> describe what workload it measurably benefitted. It noted that pages can
> get activated and live longer than they should which is completely true
> but did not document why that mattered for streaming workloads or notice
> that performance for those workloads got completely shot.

It's also still not clear what's causing the regression.  Your first
theory was allocator overhead, but it couldn't get profiled, and
reducing the overhead and pointless zonelist walks significantly
didn't make a real difference in the end result.  The second theory of
lower zones being scanned excessively when they are balanced turned
out to not even match the code.  Lastly, the balance gap was suspected
as the reason for unfavorable lower zone reclaim, but removing it
didn't help, either.

These explanations make no sense.  If pages of a streaming writer have
enough time in memory to not thrash with a single zone, the fair
policy should make even MORE time in memory available to them and not
thrash them.  The fair policy is a necessity for multi-zone aging to
make any sense and having predictable reclaim and activation behavior.
That's why it's obviously not meant to benefit streaming workloads,
but it shouldn't harm them, either.  Certainly not 20%.  If streaming
pages thrash, something is up, the solution isn't to just disable the
second zone or otherwise work around the issue.

> There is a concern that the pages on the lower zone potentially get preserved
> forever. However, the interleaving from the fair zone policy would reach
> the low watermark again and pages up to the high watermark would still
> get rotated and reclaimed so it did not seem like it would be an issue.

There is no interleaving because the page allocator recognizes the
lowmem reserve and doesn't reach the now much lower kswapd trigger
point.  The zone is full from a page allocator point of view, and
balanced from a kswapd point of view.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Improve sequential read throughput v4r8
  2014-07-01 18:39   ` Mel Gorman
  2014-07-01 20:58     ` Mel Gorman
  2014-07-01 21:25     ` Johannes Weiner
@ 2014-07-01 22:38     ` Dave Chinner
  2014-07-01 23:09       ` Mel Gorman
  2 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2014-07-01 22:38 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Johannes Weiner, Andrew Morton, Linux Kernel, Linux-MM, Linux-FSDevel

On Tue, Jul 01, 2014 at 07:39:15PM +0100, Mel Gorman wrote:
> On Tue, Jul 01, 2014 at 01:16:11PM -0400, Johannes Weiner wrote:
> > On Mon, Jun 30, 2014 at 05:47:59PM +0100, Mel Gorman wrote:
> > Seqread throughput is up, randread takes a small hit.  But allocation
> > latency is badly screwed at higher concurrency levels:
> 
> So the results are roughly similar. You don't state which filesystem it is
> but FWIW if it's the ext3 filesystem using the ext4 driver then throughput
> at higher levels is also affected by filesystem fragmentation. The problem
> was outside the scope of the series.

I'd suggest you're both going wrong at the "using ext3" point.

Use ext4 or XFS for your performance measurements because that's
what everyone is using on their systems these days. Not to mention
they don't have all the crappy allocation artifacts that ext3 has,
nor the throughput limitations caused by the ext3 journal, and so
on.

Fundamentally, ext3 performance is simply not a relevant performance
metric anymore - it's a legacy filesystem in maintenance mode and
has been for a few years now...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Improve sequential read throughput v4r8
  2014-07-01 22:38     ` Dave Chinner
@ 2014-07-01 23:09       ` Mel Gorman
  0 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2014-07-01 23:09 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Johannes Weiner, Andrew Morton, Linux Kernel, Linux-MM, Linux-FSDevel

On Wed, Jul 02, 2014 at 08:38:17AM +1000, Dave Chinner wrote:
> On Tue, Jul 01, 2014 at 07:39:15PM +0100, Mel Gorman wrote:
> > On Tue, Jul 01, 2014 at 01:16:11PM -0400, Johannes Weiner wrote:
> > > On Mon, Jun 30, 2014 at 05:47:59PM +0100, Mel Gorman wrote:
> > > Seqread throughput is up, randread takes a small hit.  But allocation
> > > latency is badly screwed at higher concurrency levels:
> > 
> > So the results are roughly similar. You don't state which filesystem it is
> > but FWIW if it's the ext3 filesystem using the ext4 driver then throughput
> > at higher levels is also affected by filesystem fragmentation. The problem
> > was outside the scope of the series.
> 
> I'd suggest you're both going wrong that the "using ext3" point.
> 
> Use ext4 or XFS for your performance measurements because that's
> what everyone is using for the systems these days. iNot to mention
> they don'thave all the crappy allocation artifacts that ext3 has,
> nor the throughput limitations caused by the ext3 journal, and so
> on.
> 
> Fundamentally, ext3 performance is simply not a relevant performance
> metric anymore - it's a legacy filesystem in maintenance mode and
> has been for a few years now...
> 

The problem crosses filesystems. ext3 is simply the first in the queue
because by and large it behaved the worst.  Covering the rest of them
simply takes more time, with different results as you might expect. Here
are the xfs results from the smaller of the machines, as it managed to
get that far before it was reset:

                                      3.16.0-rc2                 3.0.0            3.16.0-rc2
                                         vanilla               vanilla           fairzone-v4
Min    SeqRead-MB/sec-1          92.69 (  0.00%)       99.68 (  7.54%)      104.47 ( 12.71%)
Min    SeqRead-MB/sec-2         106.81 (  0.00%)      123.43 ( 15.56%)      123.24 ( 15.38%)
Min    SeqRead-MB/sec-4         101.89 (  0.00%)      113.78 ( 11.67%)      116.85 ( 14.68%)
Min    SeqRead-MB/sec-8          95.31 (  0.00%)       91.40 ( -4.10%)      101.68 (  6.68%)
Min    SeqRead-MB/sec-16         81.84 (  0.00%)       88.53 (  8.17%)       86.63 (  5.85%)

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Improve sequential read throughput v4r8
  2014-07-01 21:25     ` Johannes Weiner
@ 2014-07-02 15:44       ` Johannes Weiner
  2014-07-02 15:53         ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Weiner @ 2014-07-02 15:44 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Linux Kernel, Linux-MM, Linux-FSDevel

On Tue, Jul 01, 2014 at 05:25:38PM -0400, Johannes Weiner wrote:
> These explanations make no sense.  If pages of a streaming writer have
> enough time in memory to not thrash with a single zone, the fair
> policy should make even MORE time in memory available to them and not
> thrash them.  The fair policy is a necessity for multi-zone aging to
> make any sense and having predictable reclaim and activation behavior.
> That's why it's obviously not meant to benefit streaming workloads,
> but it shouldn't harm them, either.  Certainly not 20%.  If streaming
> pages thrash, something is up, the solution isn't to just disable the
> second zone or otherwise work around the issue.

Hey, funny story.

I tried reproducing this with an isolated tester just to be sure,
stealing tiobench's do_read_test(), but I could not reproduce the results.

I compared the original fair policy commit with its parent, I compared
a current vanilla kernel to a crude #ifdef'd policy disabling, and I
compared vanilla to your patch series - every kernel yields 132MB/s.

Then I realized, 132MB/s is the disk limit anyway - how the hell did I
get 150MB/s peak speeds for sequential cold cache IO with seqreadv4?

So I looked at the tiobench source code and it turns out, it's not
cold cache at all: it first does the write test, then the read test on
the same file!

The file is bigger than memory, so you would expect the last X percent
of the file to be cached after the seq write and the subsequent seq
read to push the tail out before getting to it - standard working set
bigger than memory behavior.

But without fairness, a chunk from the beginning of the file gets
stuck in the DMA32 zone and never pushed out while writing, so when
the reader comes along, it gets random parts from cache!

All patches that showed "major improvements" ruined fairness and led
to non-linear caching of the test file during the write, and the read
speedups came from the file being partially served from cache.

Sequential IO is fine.  This benchmark needs a whack over the head.
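
FWIW, the isolated tester boils down to something like this (a rough
sketch, not the exact program I ran; it assumes the file has already
been written back, so POSIX_FADV_DONTNEED can actually drop its pages):

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	static char buf[1 << 20];
	struct timespec t0, t1;
	long long bytes = 0;
	ssize_t ret;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror(argv[1]);
		return 1;
	}

	/* drop this file's clean pages so the read really hits the disk */
	posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while ((ret = read(fd, buf, sizeof(buf))) > 0)
		bytes += ret;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%.2f MB/sec\n", bytes / secs / (1 << 20));
	return ret < 0 ? 1 : 0;
}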

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Improve sequential read throughput v4r8
  2014-07-02 15:44       ` Johannes Weiner
@ 2014-07-02 15:53         ` Mel Gorman
  0 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2014-07-02 15:53 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Andrew Morton, Linux Kernel, Linux-MM, Linux-FSDevel

On Wed, Jul 02, 2014 at 11:44:39AM -0400, Johannes Weiner wrote:
> On Tue, Jul 01, 2014 at 05:25:38PM -0400, Johannes Weiner wrote:
> > These explanations make no sense.  If pages of a streaming writer have
> > enough time in memory to not thrash with a single zone, the fair
> > policy should make even MORE time in memory available to them and not
> > thrash them.  The fair policy is a necessity for multi-zone aging to
> > make any sense and having predictable reclaim and activation behavior.
> > That's why it's obviously not meant to benefit streaming workloads,
> > but it shouldn't harm them, either.  Certainly not 20%.  If streaming
> > pages thrash, something is up, the solution isn't to just disable the
> > second zone or otherwise work around the issue.
> 
> Hey, funny story.
> 
> I tried reproducing this with an isolated tester just to be sure,
> stealing tiobench's do_read_test(), but I wouldn't get any results.
> 
> I compared the original fair policy commit with its parent, I compared
> a current vanilla kernel to a crude #ifdef'd policy disabling, and I
> compared vanilla to your patch series - every kernel yields 132MB/s.
> 
> Then I realized, 132MB/s is the disk limit anyway - how the hell did I
> get 150MB/s peak speeds for sequential cold cache IO with seqreadv4?
> 
> So I looked at the tiobench source code and it turns out, it's not
> cold cache at all: it first does the write test, then the read test on
> the same file!
> 
> The file is bigger than memory, so you would expect the last X percent
> of the file to be cached after the seq write and the subsequent seq
> read to push the tail out before getting to it - standard working set
> bigger than memory behavior.
> 
> But without fairness, a chunk from the beginning of the file gets
> stuck in the DMA32 zone and never pushed out while writing, so when
> the reader comes along, it gets random parts from cache!
> 
> All patches that showed "major improvements" ruined fairness and led
> to non-linear caching of the test file during the write, and the read
> speedups came from the file being partially served from cache.
> 

Well, shit. Yes, artificially preserving part of the file in cache would
give an apparent boost but would be the completely wrong thing to do.

Andrew, please drop the series from mmotm. I'll pick apart whatever is
left that makes sense and resubmit what's left over if necessary.

Thanks

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-07-02 15:53 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-30 16:47 [PATCH 0/5] Improve sequential read throughput v4r8 Mel Gorman
2014-06-30 16:48 ` [PATCH 1/4] mm: pagemap: Avoid unnecessary overhead when tracepoints are deactivated Mel Gorman
2014-06-30 16:48 ` [PATCH 2/4] mm: Rearrange zone fields into read-only, page alloc, statistics and page reclaim lines Mel Gorman
2014-06-30 16:48 ` [PATCH 3/4] mm: vmscan: Do not reclaim from lower zones if they are balanced Mel Gorman
2014-06-30 16:48 ` [PATCH 4/4] mm: page_alloc: Reduce cost of the fair zone allocation policy Mel Gorman
2014-06-30 21:14   ` Andrew Morton
2014-06-30 21:51     ` Mel Gorman
2014-06-30 22:09       ` Andrew Morton
2014-07-01  8:02         ` Mel Gorman
2014-07-01 17:16 ` [PATCH 0/5] Improve sequential read throughput v4r8 Johannes Weiner
2014-07-01 18:39   ` Mel Gorman
2014-07-01 20:58     ` Mel Gorman
2014-07-01 21:25     ` Johannes Weiner
2014-07-02 15:44       ` Johannes Weiner
2014-07-02 15:53         ` Mel Gorman
2014-07-01 22:38     ` Dave Chinner
2014-07-01 23:09       ` Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).