All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: [MMTests] Page reclaim performance on ext4
Date: Wed, 4 Jul 2012 16:53:16 +0100	[thread overview]
Message-ID: <20120704155316.GK14154@suse.de> (raw)
In-Reply-To: <20120629111932.GA14154@suse.de>

Configuration:	global-dhp__pagereclaim-performance-ext4
Result: 	http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__pagereclaim-performance-ext4
Benchmarks:	postmark largedd fsmark-single fsmark-threaded micro

Summary
=======

fsmark is showing a performance dip in 3.4 on a 32-bit machine that is
not matched by results elsewhere.

A number of tests show that there was swapping activity in 3.1 and 3.2
which should have been completely unnecessary. It has been fixed but the
fixes should have been backported as some were based on reports of poor
interactivity performance during IO.

One largedd test on hydra showed that a number of dirty pages are reaching
the end of the LRU. This is not visible on other tests but should be
monitored.

I added linux-ext4 to the list because there are some performance drops
that are not visible elsewhere so may be filesystem-specific.

Benchmark notes
===============

Each of the benchmarks trigger page reclaim in a fairly simple manner. The
intention is not to be exhaustive but to test basic reclaim patterns that
the VM should never get wrong. Regressions may also be due to changes in
the IO scheduler or underlying filesystem.

The workloads are predominately file-based. Anonymous page reclaim stress
testing is covered by another test.

mkfs was run on system startup. No attempt was made to age it. No
special mkfs or mount options were used.

postmark
  o 15000 transactions
  o File size ranged 3096 bytes to 5M
  o 100 subdirectories
  o Total footprint approximately 4*TOTAL_RAM

  This workload is a single-threaded benchmark intended to measure
  filesystem performance for many short-lived and small files. It's
  primary weakness is that it does no application processing and
  so the page aging is basically on per-file granularity. 

largedd
  o Target size 8*TOTAL_RAM

  This downloads a large file and makes copies with dd until the
  target footprint size is reached.

fsmark
  o Parallel directories were used
  o 1 Thread per CPU
  o 30M Filesize
  o 16 directories
  o 256 files per directory
  o TOTAL_RAM_IN_BYTES/FILESIZE files per iteration
  o 15 iterations
  Single: ./fs_mark  -d  /tmp/fsmark-25458/1  -D  16  -N  256  -n  532  -L  15  -S0  -s  31457280
  Thread: ./fs_mark  -d  /tmp/fsmark-28217/1  -d  /tmp/fsmark-28217/2  -d  /tmp/fsmark-28217/3  -d
/tmp/fsmark-28217/4  -d  /tmp/fsmark-28217/5  -d  /tmp/fsmark-28217/6  -d  /tmp/fsmark-28217/7  -d
/tmp/fsmark-28217/8  -d  /tmp/fsmark-28217/9  -d  /tmp/fsmark-28217/10  -d  /tmp/fsmark-28217/11  -d
/tmp/fsmark-28217/12  -d  /tmp/fsmark-28217/13  -d  /tmp/fsmark-28217/14  -d  /tmp/fsmark-28217/15  -d
/tmp/fsmark-28217/16  -d  /tmp/fsmark-28217/17  -d  /tmp/fsmark-28217/18  -d  /tmp/fsmark-28217/19  -d
/tmp/fsmark-28217/20  -d  /tmp/fsmark-28217/21  -d  /tmp/fsmark-28217/22  -d  /tmp/fsmark-28217/23  -d
/tmp/fsmark-28217/24  -d  /tmp/fsmark-28217/25  -d  /tmp/fsmark-28217/26  -d  /tmp/fsmark-28217/27  -d
/tmp/fsmark-28217/28  -d  /tmp/fsmark-28217/29  -d  /tmp/fsmark-28217/30  -d  /tmp/fsmark-28217/31  -d
/tmp/fsmark-28217/32  -D  16  -N  256  -n  16  -L  15  -S0  -s  31457280

micro
  o Total mapping size 10*TOTAL_RAM
  o NR_CPU threads
  o 5 iterations

  This creates one process per CPU and creates a large file-backed mapping
  up to the total mapping size. Each of the threads does a streaming
  read of the mapping for a number of iterations. It then restarts with
  a streaming write.

  Completion time is the primary factor here but be careful as a good
  completion time can be due to excessive reclaim which has an adverse
  effect on many other workloads.

===========================================================
Machine:	arnold
Result:		http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__pagereclaim-performance-ext4/arnold/comparison.html
Arch:		x86
CPUs:		1 socket, 2 threads
Model:		Pentium 4
Disk:		Single Rotary Disk
Status:		Generally good but swapping in 3.1 and 3.2
===========================================================

fsmark-single
-------------
  There was a performance dip in 3.4 that is not visible for ext3 and so
  may be filesystem-specific. From a reclaim perspective the figures look ok.

fsmark-threaded
---------------
  This also shows a large performance dip in 3.4 and an increase in variance
  but again the reclaim stats look ok.

postmark
--------
  As seen on ext3 there was a performance dip for kernels 3.1 to 3.3 that
  is not quite been recovered in 3.4. There was also swapping activity
  for the 3.1 and 3.2 kernels which may partially explain the problem.

largedd
-------
  Completion times are mixed. Again some swapping activity is visible in
  3.1 and 3.2 which has been resolved but not backported.

micro
-----
  Completion figures are not looking bad for 3.4
   
==========================================================
Machine:	hydra
Result:		http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__pagereclaim-performance-ext4/hydra/comparison.html
Arch:		x86-64
CPUs:		1 socket, 4 threads
Model:		AMD Phenom II X4 940
Disk:		Single Rotary Disk
Status:		Ok but postmark shows high inode steals
==========================================================

fsmark-single
-------------
  Performance was declining until 3.1 and been steady every since.
  Some minor amounts of swapping is visible in 3.1

fsmark-threaded
---------------
 Surprisingly stead performance but recent kernels show some direct
 reclaim is going on. It's a tiny percentage and lower than it has
 been historically but keep an eye on it.

postmark
-------
  As with other tests 3.1 saw a performance dip and swapping activity. 3.2
  was also swapping although performance was not affected. While the reclaim
  figures currently look ok, the actual performance sucks. As the same is
  not visible on ext3, this may be a filesystem problem.

largedd
-------
  Completion figures look good but as before, swapping in 3.1 and 3.2.
  What is of concern is the pages reclaimed by PageReclaim are excessively
  high in 3.3 and 3.4. This implies that a large number of dirty pages
  are reaching the end of the LRU and this can be a problem. Minimally
  it increases kswapd CPU usage but can also be indicate a flushing
  problem.

micro
-----
  Looks ok.

==========================================================
Machine:	sandy
Result:		http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__pagereclaim-performance-ext4/sandy/comparison.html
Arch:		x86-64
CPUs:		1 socket, 8 threads
Model:		Intel Core i7-2600
Disk:		Single Rotary Disk
Status:		Generally ok, but swapping in 3.1 and 3.2
==========================================================

fsmark-single
-------------
  Steady performance throughout. Tiny swap activity visible on 3.1 and 3.2
  which caused no harm but does correlate with other tests.

fsmark-threaded
---------------
  Also steady with the exception of 3.2 which is bizzare from a reclaim
  perspective. There was no direct reclaim scanning but a lot of inodes
  were reclaimed.

postmark
--------
  Same performance dip in 3.1 and 3.2 and accompanied by the same swapping
  problem.

largedd
-------
  Completion figures generally looking ok although again 3.1 is bad from
  a swapping and direct reclaim perspective.

micro
-----
  Looks ok

-- 
Mel Gorman
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mgorman@suse.de>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: [MMTests] Page reclaim performance on ext4
Date: Wed, 4 Jul 2012 16:53:16 +0100	[thread overview]
Message-ID: <20120704155316.GK14154@suse.de> (raw)
In-Reply-To: <20120629111932.GA14154@suse.de>

Configuration:	global-dhp__pagereclaim-performance-ext4
Result: 	http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__pagereclaim-performance-ext4
Benchmarks:	postmark largedd fsmark-single fsmark-threaded micro

Summary
=======

fsmark is showing a performance dip in 3.4 on a 32-bit machine that is
not matched by results elsewhere.

A number of tests show that there was swapping activity in 3.1 and 3.2
which should have been completely unnecessary. It has been fixed but the
fixes should have been backported as some were based on reports of poor
interactivity performance during IO.

One largedd test on hydra showed that a number of dirty pages are reaching
the end of the LRU. This is not visible on other tests but should be
monitored.

I added linux-ext4 to the list because there are some performance drops
that are not visible elsewhere so may be filesystem-specific.

Benchmark notes
===============

Each of the benchmarks trigger page reclaim in a fairly simple manner. The
intention is not to be exhaustive but to test basic reclaim patterns that
the VM should never get wrong. Regressions may also be due to changes in
the IO scheduler or underlying filesystem.

The workloads are predominately file-based. Anonymous page reclaim stress
testing is covered by another test.

mkfs was run on system startup. No attempt was made to age it. No
special mkfs or mount options were used.

postmark
  o 15000 transactions
  o File size ranged 3096 bytes to 5M
  o 100 subdirectories
  o Total footprint approximately 4*TOTAL_RAM

  This workload is a single-threaded benchmark intended to measure
  filesystem performance for many short-lived and small files. It's
  primary weakness is that it does no application processing and
  so the page aging is basically on per-file granularity. 

largedd
  o Target size 8*TOTAL_RAM

  This downloads a large file and makes copies with dd until the
  target footprint size is reached.

fsmark
  o Parallel directories were used
  o 1 Thread per CPU
  o 30M Filesize
  o 16 directories
  o 256 files per directory
  o TOTAL_RAM_IN_BYTES/FILESIZE files per iteration
  o 15 iterations
  Single: ./fs_mark  -d  /tmp/fsmark-25458/1  -D  16  -N  256  -n  532  -L  15  -S0  -s  31457280
  Thread: ./fs_mark  -d  /tmp/fsmark-28217/1  -d  /tmp/fsmark-28217/2  -d  /tmp/fsmark-28217/3  -d
/tmp/fsmark-28217/4  -d  /tmp/fsmark-28217/5  -d  /tmp/fsmark-28217/6  -d  /tmp/fsmark-28217/7  -d
/tmp/fsmark-28217/8  -d  /tmp/fsmark-28217/9  -d  /tmp/fsmark-28217/10  -d  /tmp/fsmark-28217/11  -d
/tmp/fsmark-28217/12  -d  /tmp/fsmark-28217/13  -d  /tmp/fsmark-28217/14  -d  /tmp/fsmark-28217/15  -d
/tmp/fsmark-28217/16  -d  /tmp/fsmark-28217/17  -d  /tmp/fsmark-28217/18  -d  /tmp/fsmark-28217/19  -d
/tmp/fsmark-28217/20  -d  /tmp/fsmark-28217/21  -d  /tmp/fsmark-28217/22  -d  /tmp/fsmark-28217/23  -d
/tmp/fsmark-28217/24  -d  /tmp/fsmark-28217/25  -d  /tmp/fsmark-28217/26  -d  /tmp/fsmark-28217/27  -d
/tmp/fsmark-28217/28  -d  /tmp/fsmark-28217/29  -d  /tmp/fsmark-28217/30  -d  /tmp/fsmark-28217/31  -d
/tmp/fsmark-28217/32  -D  16  -N  256  -n  16  -L  15  -S0  -s  31457280

micro
  o Total mapping size 10*TOTAL_RAM
  o NR_CPU threads
  o 5 iterations

  This creates one process per CPU and creates a large file-backed mapping
  up to the total mapping size. Each of the threads does a streaming
  read of the mapping for a number of iterations. It then restarts with
  a streaming write.

  Completion time is the primary factor here but be careful as a good
  completion time can be due to excessive reclaim which has an adverse
  effect on many other workloads.

===========================================================
Machine:	arnold
Result:		http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__pagereclaim-performance-ext4/arnold/comparison.html
Arch:		x86
CPUs:		1 socket, 2 threads
Model:		Pentium 4
Disk:		Single Rotary Disk
Status:		Generally good but swapping in 3.1 and 3.2
===========================================================

fsmark-single
-------------
  There was a performance dip in 3.4 that is not visible for ext3 and so
  may be filesystem-specific. From a reclaim perspective the figures look ok.

fsmark-threaded
---------------
  This also shows a large performance dip in 3.4 and an increase in variance
  but again the reclaim stats look ok.

postmark
--------
  As seen on ext3 there was a performance dip for kernels 3.1 to 3.3 that
  is not quite been recovered in 3.4. There was also swapping activity
  for the 3.1 and 3.2 kernels which may partially explain the problem.

largedd
-------
  Completion times are mixed. Again some swapping activity is visible in
  3.1 and 3.2 which has been resolved but not backported.

micro
-----
  Completion figures are not looking bad for 3.4
   
==========================================================
Machine:	hydra
Result:		http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__pagereclaim-performance-ext4/hydra/comparison.html
Arch:		x86-64
CPUs:		1 socket, 4 threads
Model:		AMD Phenom II X4 940
Disk:		Single Rotary Disk
Status:		Ok but postmark shows high inode steals
==========================================================

fsmark-single
-------------
  Performance was declining until 3.1 and been steady every since.
  Some minor amounts of swapping is visible in 3.1

fsmark-threaded
---------------
 Surprisingly stead performance but recent kernels show some direct
 reclaim is going on. It's a tiny percentage and lower than it has
 been historically but keep an eye on it.

postmark
-------
  As with other tests 3.1 saw a performance dip and swapping activity. 3.2
  was also swapping although performance was not affected. While the reclaim
  figures currently look ok, the actual performance sucks. As the same is
  not visible on ext3, this may be a filesystem problem.

largedd
-------
  Completion figures look good but as before, swapping in 3.1 and 3.2.
  What is of concern is the pages reclaimed by PageReclaim are excessively
  high in 3.3 and 3.4. This implies that a large number of dirty pages
  are reaching the end of the LRU and this can be a problem. Minimally
  it increases kswapd CPU usage but can also be indicate a flushing
  problem.

micro
-----
  Looks ok.

==========================================================
Machine:	sandy
Result:		http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__pagereclaim-performance-ext4/sandy/comparison.html
Arch:		x86-64
CPUs:		1 socket, 8 threads
Model:		Intel Core i7-2600
Disk:		Single Rotary Disk
Status:		Generally ok, but swapping in 3.1 and 3.2
==========================================================

fsmark-single
-------------
  Steady performance throughout. Tiny swap activity visible on 3.1 and 3.2
  which caused no harm but does correlate with other tests.

fsmark-threaded
---------------
  Also steady with the exception of 3.2 which is bizzare from a reclaim
  perspective. There was no direct reclaim scanning but a lot of inodes
  were reclaimed.

postmark
--------
  Same performance dip in 3.1 and 3.2 and accompanied by the same swapping
  problem.

largedd
-------
  Completion figures generally looking ok although again 3.1 is bad from
  a swapping and direct reclaim perspective.

micro
-----
  Looks ok

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2012-07-04 15:53 UTC|newest]

Thread overview: 108+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-20 11:32 MMTests 0.04 Mel Gorman
2012-06-20 11:32 ` Mel Gorman
2012-06-29 11:19 ` Mel Gorman
2012-06-29 11:19   ` Mel Gorman
2012-06-29 11:21   ` [MMTests] Page allocator Mel Gorman
2012-06-29 11:21     ` Mel Gorman
2012-06-29 11:22   ` [MMTests] Network performance Mel Gorman
2012-06-29 11:22     ` Mel Gorman
2012-06-29 11:23   ` [MMTests] IO metadata on ext3 Mel Gorman
2012-06-29 11:23     ` Mel Gorman
2012-06-29 11:24   ` [MMTests] IO metadata on ext4 Mel Gorman
2012-06-29 11:24     ` Mel Gorman
2012-06-29 11:25   ` [MMTests] IO metadata on XFS Mel Gorman
2012-06-29 11:25     ` Mel Gorman
2012-06-29 11:25     ` Mel Gorman
2012-07-01 23:54     ` Dave Chinner
2012-07-01 23:54       ` Dave Chinner
2012-07-01 23:54       ` Dave Chinner
2012-07-02  6:32       ` Christoph Hellwig
2012-07-02  6:32         ` Christoph Hellwig
2012-07-02  6:32         ` Christoph Hellwig
2012-07-02 14:32         ` Mel Gorman
2012-07-02 14:32           ` Mel Gorman
2012-07-02 14:32           ` Mel Gorman
2012-07-02 19:35           ` Mel Gorman
2012-07-02 19:35             ` Mel Gorman
2012-07-02 19:35             ` Mel Gorman
2012-07-03  0:19             ` Dave Chinner
2012-07-03  0:19               ` Dave Chinner
2012-07-03  0:19               ` Dave Chinner
2012-07-03 10:59               ` Mel Gorman
2012-07-03 10:59                 ` Mel Gorman
2012-07-03 10:59                 ` Mel Gorman
2012-07-03 11:44                 ` Mel Gorman
2012-07-03 11:44                   ` Mel Gorman
2012-07-03 11:44                   ` Mel Gorman
2012-07-03 12:31                 ` Daniel Vetter
2012-07-03 12:31                   ` Daniel Vetter
2012-07-03 12:31                   ` Daniel Vetter
2012-07-03 13:08                   ` Mel Gorman
2012-07-03 13:08                     ` Mel Gorman
2012-07-03 13:08                     ` Mel Gorman
2012-07-03 13:28                   ` Eugeni Dodonov
2012-07-03 13:28                     ` Eugeni Dodonov
2012-07-04  0:47                 ` Dave Chinner
2012-07-04  0:47                   ` Dave Chinner
2012-07-04  0:47                   ` Dave Chinner
2012-07-04  9:51                   ` Mel Gorman
2012-07-04  9:51                     ` Mel Gorman
2012-07-04  9:51                     ` Mel Gorman
2012-07-03 13:04             ` Mel Gorman
2012-07-03 13:04               ` Mel Gorman
2012-07-03 13:04               ` Mel Gorman
2012-07-03 14:04               ` Daniel Vetter
2012-07-03 14:04                 ` Daniel Vetter
2012-07-03 14:04                 ` Daniel Vetter
2012-07-02 13:30       ` Mel Gorman
2012-07-02 13:30         ` Mel Gorman
2012-07-02 13:30         ` Mel Gorman
2012-07-04 15:52   ` [MMTests] Page reclaim performance on ext3 Mel Gorman
2012-07-04 15:52     ` Mel Gorman
2012-07-04 15:53   ` Mel Gorman [this message]
2012-07-04 15:53     ` [MMTests] Page reclaim performance on ext4 Mel Gorman
2012-07-04 15:53   ` [MMTests] Page reclaim performance on xfs Mel Gorman
2012-07-04 15:53     ` Mel Gorman
2012-07-05 14:56   ` [MMTests] Interactivity during IO on ext3 Mel Gorman
2012-07-05 14:56     ` Mel Gorman
2012-07-10  9:49     ` Jan Kara
2012-07-10  9:49       ` Jan Kara
2012-07-10 11:30       ` Mel Gorman
2012-07-10 11:30         ` Mel Gorman
2012-07-05 14:57   ` [MMTests] Interactivity during IO on ext4 Mel Gorman
2012-07-05 14:57     ` Mel Gorman
2012-07-23 21:12   ` [MMTests] Scheduler Mel Gorman
2012-07-23 21:12     ` Mel Gorman
2012-07-23 21:13   ` [MMTests] Sysbench read-only on ext3 Mel Gorman
2012-07-23 21:13     ` Mel Gorman
2012-07-24  2:29     ` Mike Galbraith
2012-07-24  2:29       ` Mike Galbraith
2012-07-24  8:19       ` Mel Gorman
2012-07-24  8:19         ` Mel Gorman
2012-07-24  8:32         ` Mike Galbraith
2012-07-24  8:32           ` Mike Galbraith
2012-07-23 21:14   ` [MMTests] Sysbench read-only on ext4 Mel Gorman
2012-07-23 21:14     ` Mel Gorman
2012-07-23 21:15   ` [MMTests] Sysbench read-only on xfs Mel Gorman
2012-07-23 21:15     ` Mel Gorman
2012-07-23 21:17   ` [MMTests] memcachetest and parallel IO on ext3 Mel Gorman
2012-07-23 21:17     ` Mel Gorman
2012-07-23 21:19   ` [MMTests] memcachetest and parallel IO on xfs Mel Gorman
2012-07-23 21:19     ` Mel Gorman
2012-07-23 21:20   ` [MMTests] Stress high-order allocations on ext3 Mel Gorman
2012-07-23 21:20     ` Mel Gorman
2012-07-23 21:21   ` [MMTests] dbench4 async " Mel Gorman
2012-07-23 21:21     ` Mel Gorman
2012-08-16 14:52     ` Jan Kara
2012-08-16 14:52       ` Jan Kara
2012-08-21 22:00     ` Jan Kara
2012-08-21 22:00       ` Jan Kara
2012-08-22 10:48       ` Mel Gorman
2012-08-22 10:48         ` Mel Gorman
2012-07-23 21:23   ` [MMTests] dbench4 async on ext4 Mel Gorman
2012-07-23 21:23     ` Mel Gorman
2012-07-23 21:24   ` [MMTests] Threaded IO Performance on ext3 Mel Gorman
2012-07-23 21:24     ` Mel Gorman
2012-07-23 21:25   ` [MMTests] Threaded IO Performance on xfs Mel Gorman
2012-07-23 21:25     ` Mel Gorman
2012-07-23 21:25     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120704155316.GK14154@suse.de \
    --to=mgorman@suse.de \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.