* [PATCH 0/6] Improve sequential read throughput v2
@ 2014-06-25  7:58 ` Mel Gorman
  0 siblings, 0 replies; 41+ messages in thread
From: Mel Gorman @ 2014-06-25  7:58 UTC (permalink / raw)
  To: Linux Kernel, Linux-MM, Linux-FSDevel
  Cc: Johannes Weiner, Jens Axboe, Jeff Moyer, Dave Chinner, Mel Gorman

Changelog since v1
o Rebase to v3.16-rc2
o Move CFQ patch to the end of the series where it can be rejected more easily if necessary
o Introduce a page-reclaim patch addressing kswapd/fair-zone interactions
o Rework fast zone policy patch

IO performance since 3.0 has been a mixed bag. In many respects we are
better and in some we are worse, and one of the places we are worse is
sequential read throughput. This is visible in a number of benchmarks,
but I looked at tiobench the closest. This is using ext3 on a mid-range
desktop, comparing against 3.0.

                                      3.16.0-rc2            3.16.0-rc2                 3.0.0
                                         vanilla                cfq600               vanilla
Min    SeqRead-MB/sec-1         120.96 (  0.00%)      140.43 ( 16.10%)      134.04 ( 10.81%)
Min    SeqRead-MB/sec-2         100.73 (  0.00%)      118.18 ( 17.32%)      120.76 ( 19.88%)
Min    SeqRead-MB/sec-4          96.05 (  0.00%)      110.84 ( 15.40%)      114.49 ( 19.20%)
Min    SeqRead-MB/sec-8          82.46 (  0.00%)       92.40 ( 12.05%)       98.04 ( 18.89%)
Min    SeqRead-MB/sec-16         66.37 (  0.00%)       76.68 ( 15.53%)       79.49 ( 19.77%)

This series does not fully restore throughput to 3.0 levels but it brings
it acceptably close. While throughput for higher thread counts is lower,
it is known that this can be tuned by increasing target_latency or by
disabling low_latency, giving higher overall throughput at the cost of
latency and IO fairness.
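For reference, the target_latency and low_latency tuning mentioned above is
done through CFQ's per-device sysfs knobs. A minimal sketch, assuming a
CFQ-scheduled device at /dev/sda (the value 600 matches the "cfq600" kernel
in the table above; the stock default is 300ms):

```shell
# The iosched/ knobs below only exist while CFQ is the active
# scheduler for the device, so check that first.
cat /sys/block/sda/queue/scheduler

# Raise target_latency from the default 300ms to 600ms, relaxing the
# per-process latency target in favour of sequential throughput.
echo 600 > /sys/block/sda/queue/iosched/target_latency

# Alternatively, disable low_latency mode entirely, which favours
# overall throughput at the cost of latency and IO fairness.
echo 0 > /sys/block/sda/queue/iosched/low_latency
```

Note these settings are per-device and revert to the defaults on reboot.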

This series is ordered in ascending likelihood to cause controversy so
that a partial series can still potentially be merged even if parts of it
are NAKed (e.g. the CFQ patch). For reference, here is the series without
the CFQ patch at the end.

                                      3.16.0-rc2            3.16.0-rc2                 3.0.0
                                         vanilla             lessdirty               vanilla
Min    SeqRead-MB/sec-1         120.96 (  0.00%)      141.04 ( 16.60%)      134.04 ( 10.81%)
Min    SeqRead-MB/sec-2         100.73 (  0.00%)      116.26 ( 15.42%)      120.76 ( 19.88%)
Min    SeqRead-MB/sec-4          96.05 (  0.00%)      109.52 ( 14.02%)      114.49 ( 19.20%)
Min    SeqRead-MB/sec-8          82.46 (  0.00%)       88.60 (  7.45%)       98.04 ( 18.89%)
Min    SeqRead-MB/sec-16         66.37 (  0.00%)       69.87 (  5.27%)       79.49 ( 19.77%)


 block/cfq-iosched.c            |   2 +-
 include/linux/mmzone.h         | 210 ++++++++++++++++++++++-------------------
 include/linux/writeback.h      |   1 +
 include/trace/events/pagemap.h |  16 ++--
 mm/internal.h                  |   1 +
 mm/mm_init.c                   |   5 +-
 mm/page-writeback.c            |  15 +--
 mm/page_alloc.c                | 206 ++++++++++++++++++++++++++--------------
 mm/swap.c                      |   4 +-
 mm/vmscan.c                    |  16 ++--
 mm/vmstat.c                    |   4 +-
 11 files changed, 285 insertions(+), 195 deletions(-)

-- 
1.8.4.5


* [PATCH 0/5] Reduce sequential read overhead
@ 2014-07-09  8:13 Mel Gorman
  2014-07-09  8:13   ` Mel Gorman
  0 siblings, 1 reply; 41+ messages in thread
From: Mel Gorman @ 2014-07-09  8:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel, Linux-MM, Linux-FSDevel, Johannes Weiner, Mel Gorman

This was formerly the series "Improve sequential read throughput" which
noted some major differences in performance of tiobench since 3.0. While
there are a number of factors, two that dominated were the introduction
of the fair zone allocation policy and changes to CFQ.

The behaviour of the fair zone allocation policy makes more sense than
tiobench does as a benchmark, and the CFQ defaults were not changed due
to insufficient benchmarking.

This series is what's left: one functional fix to the fair zone
allocation policy when used on NUMA machines, plus a general reduction of
overhead. tiobench was used for the comparison despite its flaws as an
IO benchmark because in this case we are primarily interested in the
overhead of page allocator and page reclaim activity.

On UMA, it makes little difference to overhead:

          3.16.0-rc3   3.16.0-rc3
             vanilla lowercost-v5
User          383.61      386.77
System        403.83      401.74
Elapsed      5411.50     5413.11

On a 4-socket NUMA machine the difference is more noticeable:

          3.16.0-rc3   3.16.0-rc3
             vanilla lowercost-v5
User          746.94      802.00
System      65336.22    40852.33
Elapsed     27553.52    27368.46

 include/linux/mmzone.h         | 217 ++++++++++++++++++++++-------------------
 include/trace/events/pagemap.h |  16 ++-
 mm/page_alloc.c                | 122 ++++++++++++-----------
 mm/swap.c                      |   4 +-
 mm/vmscan.c                    |   7 +-
 mm/vmstat.c                    |   9 +-
 6 files changed, 198 insertions(+), 177 deletions(-)

-- 
1.8.4.5



end of thread, other threads:[~2014-07-10 12:06 UTC | newest]

Thread overview: 41+ messages
2014-06-25  7:58 [PATCH 0/6] Improve sequential read throughput v2 Mel Gorman
2014-06-25  7:58 ` [PATCH 1/6] mm: pagemap: Avoid unnecessary overhead when tracepoints are deactivated Mel Gorman
2014-06-25  7:58 ` [PATCH 2/6] mm: Rearrange zone fields into read-only, page alloc, statistics and page reclaim lines Mel Gorman
2014-06-25  7:58 ` [PATCH 3/6] mm: vmscan: Do not reclaim from lower zones if they are balanced Mel Gorman
2014-06-25 23:32   ` Andrew Morton
2014-06-26 10:17     ` Mel Gorman
2014-06-25  7:58 ` [PATCH 4/6] mm: page_alloc: Reduce cost of the fair zone allocation policy Mel Gorman
2014-06-25  7:58 ` [PATCH 5/6] mm: page_alloc: Reduce cost of dirty zone balancing Mel Gorman
2014-06-25 23:35   ` Andrew Morton
2014-06-26  8:43     ` Mel Gorman
2014-06-26 14:37       ` Johannes Weiner
2014-06-26 14:56         ` Mel Gorman
2014-06-26 15:11           ` Johannes Weiner
2014-06-25  7:58 ` [PATCH 6/6] cfq: Increase default value of target_latency Mel Gorman
2014-06-26 15:36   ` Jeff Moyer
2014-06-26 16:19     ` Mel Gorman
2014-06-26 16:50       ` Jeff Moyer
2014-06-26 17:45         ` Mel Gorman
2014-06-26 18:04           ` Jeff Moyer
2014-07-09  8:13 [PATCH 0/5] Reduce sequential read overhead Mel Gorman
2014-07-09  8:13 ` [PATCH 2/6] mm: Rearrange zone fields into read-only, page alloc, statistics and page reclaim lines Mel Gorman
2014-07-10 12:06   ` Johannes Weiner
