* [PATCH 0/7] Reduce filesystem writeback from page reclaim v3
@ 2011-08-10 10:47 ` Mel Gorman
  0 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-10 10:47 UTC (permalink / raw)
  To: Linux-MM
  Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman

Changelog since V2
  o Drop patch eliminating all writes from kswapd until such time as
    particular pages can be prioritised for writeback. Eliminating
    all writes led to stalls on NUMA
  o Lumpy synchronous reclaim now waits for pages currently under
    writeback but can no longer queue pages itself
  o Dropped btrfs warning when filesystems are called from direct
    reclaim. The fallback method for migration looks indistinguishable
    from direct reclaim.
  o Throttle based on pages under writeback rather than dirty pages.
    Throttling based on dirty pages alone is too aggressive and can
    end up trying to stall even when the underlying device is not
    congested

Changelog since v1
  o Drop prio-inode patch. There is now a dependency that the flusher
    threads find these dirty pages quickly.
  o Drop nr_vmscan_throttled counter
  o SetPageReclaim instead of deactivate_page which was wrong
  o Add warning to main filesystems if called from direct reclaim context
  o Add patch to completely disable filesystem writeback from reclaim

Testing from the XFS folk revealed that there is still too much
I/O from the end of the LRU in kswapd. Previously, VM people
considered it acceptable for a small number of pages to be written
back from reclaim, with testing generally showing that about 0.3% of
pages reclaimed were written back (higher if memory was low). The
claim that writing back a small number of pages is ok has been
heavily disputed for quite some time, and Dave Chinner explained the
objection well:

	It doesn't have to be a very high number to be a problem. IO
	is orders of magnitude slower than the CPU time it takes to
	flush a page, so the cost of making a bad flush decision is
	very high. And single page writeback from the LRU is almost
	always a bad flush decision.

To complicate matters, filesystems respond very differently to
requests from reclaim, according to Christoph Hellwig:

	xfs tries to write it back if the requester is kswapd
	ext4 ignores the request if it's a delayed allocation
	btrfs ignores the request

As a result, each filesystem has different performance
characteristics when under memory pressure and many pages are being
dirtied. In some cases, the request is ignored entirely, so the VM
cannot depend on the IO being dispatched.

The objective of this series is to reduce writing of
filesystem-backed pages from reclaim, play nicely with writeback
that is already in progress and throttle reclaim appropriately when
pages under writeback are encountered. The assumption is that the
flushers will always write pages faster than if reclaim issues the
IO. The new problem is that reclaim has very little control over how
long it takes before a page in a particular zone or container is
cleaned, which is discussed later. A secondary goal is to avoid the
problem whereby direct reclaim splices two potentially deep call
stacks together.

Patch 1 disables writeback of filesystem pages from direct reclaim
	entirely. Anonymous pages are still written.

Patch 2 removes dead code in lumpy reclaim as it is no longer able
	to synchronously write pages. This hurts lumpy reclaim, but
	there is an expectation that compaction is used for hugepage
	allocations these days and lumpy reclaim's days are numbered.

Patches 3-4 add warnings to XFS and ext4 if called from
	direct reclaim. With patch 1, this "never happens" and is
	intended to catch regressions in this logic in the future.

Patch 5 disables writeback of filesystem pages from kswapd unless
	the priority is raised to the point where kswapd is considered
	to be in trouble.

Patch 6 throttles reclaimers if too many dirty pages are being
	encountered and the zones or backing devices are congested.

Patch 7 invalidates dirty pages found at the end of the LRU so they
	are reclaimed quickly after being written back rather than
	waiting for a reclaimer to find them.

I consider this series to be orthogonal to the writeback work but
it is worth noting that the writeback work affects the viability of
patch 7 in particular.

I tested this on ext4 and xfs using fs_mark, a simple writeback test
based on dd and a micro benchmark that does a streaming write to a
large mapping (exercises use-once LRU logic) followed by streaming
writes to a mix of anonymous and file-backed mappings. The command
line for fs_mark when booted with 512M looked something like

./fs_mark -d  /tmp/fsmark-2676  -D  100  -N  150  -n  150  -L  25  -t  1  -S0  -s  10485760

The number of files was adjusted depending on the amount of
available memory so that the data set created was about 3xRAM. For
multiple threads, the -d switch is specified multiple times.

The test machine is x86-64 with an older generation of AMD processor
with 4 cores. The underlying storage was 4 disks configured as RAID-0
as this was the best configuration of storage I had available. Swap
is on a separate disk. Dirty ratio was tuned to 40% instead of the
default of 20%.

Testing was run with and without monitors to both verify that the
patches were operating as expected and that any performance gain was
real and not due to interference from monitors.

Here is a summary of results based on testing XFS. In each row, the
first column of figures is the vanilla kernel and the second is with
the series applied.

512M1P-xfs           Files/s  mean                 32.69 ( 0.00%)     34.44 ( 5.08%)
512M1P-xfs           Elapsed Time fsmark                    51.41     48.29
512M1P-xfs           Elapsed Time simple-wb                114.09    108.61
512M1P-xfs           Elapsed Time mmap-strm                113.46    109.34
512M1P-xfs           Kswapd efficiency fsmark                 62%       63%
512M1P-xfs           Kswapd efficiency simple-wb              56%       61%
512M1P-xfs           Kswapd efficiency mmap-strm              44%       42%
512M-xfs             Files/s  mean                 30.78 ( 0.00%)     35.94 (14.36%)
512M-xfs             Elapsed Time fsmark                    56.08     48.90
512M-xfs             Elapsed Time simple-wb                112.22     98.13
512M-xfs             Elapsed Time mmap-strm                219.15    196.67
512M-xfs             Kswapd efficiency fsmark                 54%       56%
512M-xfs             Kswapd efficiency simple-wb              54%       55%
512M-xfs             Kswapd efficiency mmap-strm              45%       44%
512M-4X-xfs          Files/s  mean                 30.31 ( 0.00%)     33.33 ( 9.06%)
512M-4X-xfs          Elapsed Time fsmark                    63.26     55.88
512M-4X-xfs          Elapsed Time simple-wb                100.90     90.25
512M-4X-xfs          Elapsed Time mmap-strm                261.73    255.38
512M-4X-xfs          Kswapd efficiency fsmark                 49%       50%
512M-4X-xfs          Kswapd efficiency simple-wb              54%       56%
512M-4X-xfs          Kswapd efficiency mmap-strm              37%       36%
512M-16X-xfs         Files/s  mean                 60.89 ( 0.00%)     65.22 ( 6.64%)
512M-16X-xfs         Elapsed Time fsmark                    67.47     58.25
512M-16X-xfs         Elapsed Time simple-wb                103.22     90.89
512M-16X-xfs         Elapsed Time mmap-strm                237.09    198.82
512M-16X-xfs         Kswapd efficiency fsmark                 45%       46%
512M-16X-xfs         Kswapd efficiency simple-wb              53%       55%
512M-16X-xfs         Kswapd efficiency mmap-strm              33%       33%

Up until 512M-4X, the FSmark improvements were statistically
significant. For the 4X and 16X tests the results were within
standard deviations, but only barely. The time to completion for all
tests is improved, which is an important result. In general, kswapd
efficiency (pages reclaimed as a percentage of pages scanned) is not
affected by skipping dirty pages.

1024M1P-xfs          Files/s  mean                 39.09 ( 0.00%)     41.15 ( 5.01%)
1024M1P-xfs          Elapsed Time fsmark                    84.14     80.41
1024M1P-xfs          Elapsed Time simple-wb                210.77    184.78
1024M1P-xfs          Elapsed Time mmap-strm                162.00    160.34
1024M1P-xfs          Kswapd efficiency fsmark                 69%       75%
1024M1P-xfs          Kswapd efficiency simple-wb              71%       77%
1024M1P-xfs          Kswapd efficiency mmap-strm              43%       44%
1024M-xfs            Files/s  mean                 35.45 ( 0.00%)     37.00 ( 4.19%)
1024M-xfs            Elapsed Time fsmark                    94.59     91.00
1024M-xfs            Elapsed Time simple-wb                229.84    195.08
1024M-xfs            Elapsed Time mmap-strm                405.38    440.29
1024M-xfs            Kswapd efficiency fsmark                 79%       71%
1024M-xfs            Kswapd efficiency simple-wb              74%       74%
1024M-xfs            Kswapd efficiency mmap-strm              39%       42%
1024M-4X-xfs         Files/s  mean                 32.63 ( 0.00%)     35.05 ( 6.90%)
1024M-4X-xfs         Elapsed Time fsmark                   103.33     97.74
1024M-4X-xfs         Elapsed Time simple-wb                204.48    178.57
1024M-4X-xfs         Elapsed Time mmap-strm                528.38    511.88
1024M-4X-xfs         Kswapd efficiency fsmark                 81%       70%
1024M-4X-xfs         Kswapd efficiency simple-wb              73%       72%
1024M-4X-xfs         Kswapd efficiency mmap-strm              39%       38%
1024M-16X-xfs        Files/s  mean                 42.65 ( 0.00%)     42.97 ( 0.74%)
1024M-16X-xfs        Elapsed Time fsmark                   103.11     99.11
1024M-16X-xfs        Elapsed Time simple-wb                200.83    178.24
1024M-16X-xfs        Elapsed Time mmap-strm                397.35    459.82
1024M-16X-xfs        Kswapd efficiency fsmark                 84%       69%
1024M-16X-xfs        Kswapd efficiency simple-wb              74%       73%
1024M-16X-xfs        Kswapd efficiency mmap-strm              39%       40%

All FSMark tests up to 16X had statistically significant
improvements. For the most part, tests completed faster, with the
exception of the streaming writes to a mixture of anonymous and
file-backed mappings, which were slower in two cases.

In the cases where the mmap-strm tests were slower, there was more
swapping due to dirty pages being skipped. The number of additional
pages swapped is almost identical to the reduction in the number of
pages written from reclaim. In other words, roughly the same number
of pages were reclaimed, but swapping was slower. As the test is a
bit unrealistic and stresses memory heavily, the small shift is
acceptable.

4608M1P-xfs          Files/s  mean                 29.75 ( 0.00%)     30.96 ( 3.91%)
4608M1P-xfs          Elapsed Time fsmark                   512.01    492.15
4608M1P-xfs          Elapsed Time simple-wb                618.18    566.24
4608M1P-xfs          Elapsed Time mmap-strm                488.05    465.07
4608M1P-xfs          Kswapd efficiency fsmark                 93%       86%
4608M1P-xfs          Kswapd efficiency simple-wb              88%       84%
4608M1P-xfs          Kswapd efficiency mmap-strm              46%       45%
4608M-xfs            Files/s  mean                 27.60 ( 0.00%)     28.85 ( 4.33%)
4608M-xfs            Elapsed Time fsmark                   555.96    532.34
4608M-xfs            Elapsed Time simple-wb                659.72    571.85
4608M-xfs            Elapsed Time mmap-strm               1082.57   1146.38
4608M-xfs            Kswapd efficiency fsmark                 89%       91%
4608M-xfs            Kswapd efficiency simple-wb              88%       82%
4608M-xfs            Kswapd efficiency mmap-strm              48%       46%
4608M-4X-xfs         Files/s  mean                 26.00 ( 0.00%)     27.47 ( 5.35%)
4608M-4X-xfs         Elapsed Time fsmark                   592.91    564.00
4608M-4X-xfs         Elapsed Time simple-wb                616.65    575.07
4608M-4X-xfs         Elapsed Time mmap-strm               1773.02   1631.53
4608M-4X-xfs         Kswapd efficiency fsmark                 90%       94%
4608M-4X-xfs         Kswapd efficiency simple-wb              87%       82%
4608M-4X-xfs         Kswapd efficiency mmap-strm              43%       43%
4608M-16X-xfs        Files/s  mean                 26.07 ( 0.00%)     26.42 ( 1.32%)
4608M-16X-xfs        Elapsed Time fsmark                   602.69    585.78
4608M-16X-xfs        Elapsed Time simple-wb                606.60    573.81
4608M-16X-xfs        Elapsed Time mmap-strm               1549.75   1441.86
4608M-16X-xfs        Kswapd efficiency fsmark                 98%       98%
4608M-16X-xfs        Kswapd efficiency simple-wb              88%       82%
4608M-16X-xfs        Kswapd efficiency mmap-strm              44%       42%

Unlike the other tests, the fsmark results are not statistically
significant, but the min and max times are both improved and, for
the most part, tests completed faster.

There are other indications that this is an improvement as well. For
example, in the vast majority of cases, there were fewer pages
scanned by direct reclaim, implying that in many cases stalls due to
direct reclaim are reduced. kswapd is scanning more due to skipping
dirty pages, which is unfortunate, but the CPU usage is still
acceptable.

In an earlier set of tests, I used blktrace and in almost all cases
throughput throughout the entire test was higher. However, I ended
up discarding those results as recording blktrace data was too heavy
for my liking.

On a laptop, I plugged in a USB stick and ran a similar set of tests
using it as backing storage. A desktop environment was running and for
the entire duration of the tests, firefox and gnome terminal were
launching and exiting to vaguely simulate a user.

1024M-xfs            Files/s  mean               0.41 ( 0.00%)        0.44 ( 6.82%)
1024M-xfs            Elapsed Time fsmark               2053.52   1641.03
1024M-xfs            Elapsed Time simple-wb            1229.53    768.05
1024M-xfs            Elapsed Time mmap-strm            4126.44   4597.03
1024M-xfs            Kswapd efficiency fsmark              84%       85%
1024M-xfs            Kswapd efficiency simple-wb           92%       81%
1024M-xfs            Kswapd efficiency mmap-strm           60%       51%
1024M-xfs            Avg wait ms fsmark                5404.53     4473.87
1024M-xfs            Avg wait ms simple-wb             2541.35     1453.54
1024M-xfs            Avg wait ms mmap-strm             3400.25     3852.53

The mmap-strm results were hurt because firefox launching had
a tendency to push the test out of memory. On the positive side,
firefox launched marginally faster with the patches applied.  Time to
completion for many tests was faster but more importantly - the "Avg
wait" time as measured by iostat was far lower implying the system
would be more responsive. It was also the case that "Avg wait ms"
on the root filesystem was lower. I tested it manually and while the
system felt slightly more responsive while copying data to a USB stick,
it was marginal enough that it could be my imagination.

For the most part, this series has a positive impact. Is there
anything else that should be done before I send this to Andrew
requesting that it be merged?

 fs/ext4/inode.c             |    6 +++-
 fs/xfs/linux-2.6/xfs_aops.c |    7 ++--
 include/linux/mmzone.h      |    1 +
 mm/vmscan.c                 |   67 ++++++++++++++++++++++++++++++------------
 mm/vmstat.c                 |    1 +
 5 files changed, 58 insertions(+), 24 deletions(-)

-- 
1.7.3.4



* [PATCH 0/7] Reduce filesystem writeback from page reclaim v3
@ 2011-08-10 10:47 ` Mel Gorman
  0 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-10 10:47 UTC (permalink / raw)
  To: Linux-MM
  Cc: Rik van Riel, Jan Kara, LKML, XFS, Christoph Hellwig,
	Minchan Kim, Wu Fengguang, Johannes Weiner, Mel Gorman

Changelog since V2
  o Drop patch eliminating all writes from kswapd until such time as
    particular pages can be prioritised for writeback. Eliminating
    all writes led to stalls on NUMA
  o Lumpy synchronous reclaim now waits for pages currently under
    writeback but can no longer queue pages itself
  o Dropped btrfs warning when filesystems are called from direct
    reclaim. The fallback method for migration looks indistinguishable
    from direct reclaim.
  o Throttle based on pages writeback rather than pages dirty. Throttling
    based on just dirty is too aggressive and can end up trying to stall
    even when the underlying device is not congested

Changelog since v1
  o Drop prio-inode patch. There is now a dependency that the flusher
    threads find these dirty pages quickly.
  o Drop nr_vmscan_throttled counter
  o SetPageReclaim instead of deactivate_page which was wrong
  o Add warning to main filesystems if called from direct reclaim context
  o Add patch to completely disable filesystem writeback from reclaim

Testing from the XFS folk revealed that there is still too much
I/O from the end of the LRU in kswapd. Previously it was considered
acceptable by VM people for a small number of pages to be written
back from reclaim with testing generally showing about 0.3% of pages
reclaimed were written back (higher if memory was low). That writing
back a small number of pages is ok has been heavily disputed for
quite some time and Dave Chinner explained it well;

	It doesn't have to be a very high number to be a problem. IO
	is orders of magnitude slower than the CPU time it takes to
	flush a page, so the cost of making a bad flush decision is
	very high. And single page writeback from the LRU is almost
	always a bad flush decision.

To complicate matters, filesystems respond very differently to requests
from reclaim according to Christoph Hellwig;

	xfs tries to write it back if the requester is kswapd
	ext4 ignores the request if it's a delayed allocation
	btrfs ignores the request

As a result, each filesystem has different performance characteristics
when under memory pressure and there are many pages being dirties. In
some cases, the request is ignored entirely so the VM cannot depend
on the IO being dispatched.

The objective of this series is to reduce writing of filesystem-backed
pages from reclaim, play nicely with writeback that is already in
progress and throttle reclaim appropriately when writeback pages are
encountered. The assumption is that the flushers will always write
pages faster than if reclaim issues the IO. The new problem is that
reclaim has very little control over how long before a page in a
particular zone or container is cleaned which is discussed later. A
secondary goal is to avoid the problem whereby direct reclaim splices
two potentially deep call stacks together.

Patch 1 disables writeback of filesystem pages from direct reclaim
	entirely. Anonymous pages are still written.

Patch 2 removes dead code in lumpy reclaim as it is no longer able
	to synchronously write pages. This hurts lumpy reclaim but
	there is an expectation that compaction is used for hugepage
	allocations these days and lumpy reclaims days are numbered.

Patches 3-4 add warnings to XFS and ext4 if called from
	direct reclaim. With patch 1, this "never happens" and is
	intended to catch regressions in this logic in the future.

Patch 5 disables writeback of filesystem pages from kswapd unless
	the priority is raised to the point where kswapd is considered
	to be in trouble.

Patch 6 throttles reclaimers if too many dirty pages are being
	encountered and the zones or backing devices are congested.

Patch 7 invalidates dirty pages found at the end of the LRU so they
	are reclaimed quickly after being written back rather than
	waiting for a reclaimer to find them

I consider this series to be orthogonal to the writeback work but
it is worth noting that the writeback work affects the viability of
patch 8 in particular.

I tested this on ext4 and xfs using fs_mark, a simple writeback test
based on dd and a micro benchmark that does a streaming write to a
large mapping (exercises use-once LRU logic) followed by streaming
writes to a mix of anonymous and file-backed mappings. The command
line for fs_mark when botted with 512M looked something like

./fs_mark -d  /tmp/fsmark-2676  -D  100  -N  150  -n  150  -L  25  -t  1  -S0  -s  10485760

The number of files was adjusted depending on the amount of available
memory so that the files created was about 3xRAM. For multiple threads,
the -d switch is specified multiple times.

The test machine is x86-64 with an older generation of AMD processor
with 4 cores. The underlying storage was 4 disks configured as RAID-0
as this was the best configuration of storage I had available. Swap
is on a separate disk. Dirty ratio was tuned to 40% instead of the
default of 20%.

Testing was run with and without monitors to both verify that the
patches were operating as expected and that any performance gain was
real and not due to interference from monitors.

Here is a summary of results based on testing XFS.

512M1P-xfs           Files/s  mean                 32.69 ( 0.00%)     34.44 ( 5.08%)
512M1P-xfs           Elapsed Time fsmark                    51.41     48.29
512M1P-xfs           Elapsed Time simple-wb                114.09    108.61
512M1P-xfs           Elapsed Time mmap-strm                113.46    109.34
512M1P-xfs           Kswapd efficiency fsmark                 62%       63%
512M1P-xfs           Kswapd efficiency simple-wb              56%       61%
512M1P-xfs           Kswapd efficiency mmap-strm              44%       42%
512M-xfs             Files/s  mean                 30.78 ( 0.00%)     35.94 (14.36%)
512M-xfs             Elapsed Time fsmark                    56.08     48.90
512M-xfs             Elapsed Time simple-wb                112.22     98.13
512M-xfs             Elapsed Time mmap-strm                219.15    196.67
512M-xfs             Kswapd efficiency fsmark                 54%       56%
512M-xfs             Kswapd efficiency simple-wb              54%       55%
512M-xfs             Kswapd efficiency mmap-strm              45%       44%
512M-4X-xfs          Files/s  mean                 30.31 ( 0.00%)     33.33 ( 9.06%)
512M-4X-xfs          Elapsed Time fsmark                    63.26     55.88
512M-4X-xfs          Elapsed Time simple-wb                100.90     90.25
512M-4X-xfs          Elapsed Time mmap-strm                261.73    255.38
512M-4X-xfs          Kswapd efficiency fsmark                 49%       50%
512M-4X-xfs          Kswapd efficiency simple-wb              54%       56%
512M-4X-xfs          Kswapd efficiency mmap-strm              37%       36%
512M-16X-xfs         Files/s  mean                 60.89 ( 0.00%)     65.22 ( 6.64%)
512M-16X-xfs         Elapsed Time fsmark                    67.47     58.25
512M-16X-xfs         Elapsed Time simple-wb                103.22     90.89
512M-16X-xfs         Elapsed Time mmap-strm                237.09    198.82
512M-16X-xfs         Kswapd efficiency fsmark                 45%       46%
512M-16X-xfs         Kswapd efficiency simple-wb              53%       55%
512M-16X-xfs         Kswapd efficiency mmap-strm              33%       33%

Up until 512-4X, the FSmark improvements were statistically
significant. For the 4X and 16X tests the results were within standard
deviations but just barely. The time to completion for all tests is
improved which is an important result. In general, kswapd efficiency
is not affected by skipping dirty pages.

1024M1P-xfs          Files/s  mean                 39.09 ( 0.00%)     41.15 ( 5.01%)
1024M1P-xfs          Elapsed Time fsmark                    84.14     80.41
1024M1P-xfs          Elapsed Time simple-wb                210.77    184.78
1024M1P-xfs          Elapsed Time mmap-strm                162.00    160.34
1024M1P-xfs          Kswapd efficiency fsmark                 69%       75%
1024M1P-xfs          Kswapd efficiency simple-wb              71%       77%
1024M1P-xfs          Kswapd efficiency mmap-strm              43%       44%
1024M-xfs            Files/s  mean                 35.45 ( 0.00%)     37.00 ( 4.19%)
1024M-xfs            Elapsed Time fsmark                    94.59     91.00
1024M-xfs            Elapsed Time simple-wb                229.84    195.08
1024M-xfs            Elapsed Time mmap-strm                405.38    440.29
1024M-xfs            Kswapd efficiency fsmark                 79%       71%
1024M-xfs            Kswapd efficiency simple-wb              74%       74%
1024M-xfs            Kswapd efficiency mmap-strm              39%       42%
1024M-4X-xfs         Files/s  mean                 32.63 ( 0.00%)     35.05 ( 6.90%)
1024M-4X-xfs         Elapsed Time fsmark                   103.33     97.74
1024M-4X-xfs         Elapsed Time simple-wb                204.48    178.57
1024M-4X-xfs         Elapsed Time mmap-strm                528.38    511.88
1024M-4X-xfs         Kswapd efficiency fsmark                 81%       70%
1024M-4X-xfs         Kswapd efficiency simple-wb              73%       72%
1024M-4X-xfs         Kswapd efficiency mmap-strm              39%       38%
1024M-16X-xfs        Files/s  mean                 42.65 ( 0.00%)     42.97 ( 0.74%)
1024M-16X-xfs        Elapsed Time fsmark                   103.11     99.11
1024M-16X-xfs        Elapsed Time simple-wb                200.83    178.24
1024M-16X-xfs        Elapsed Time mmap-strm                397.35    459.82
1024M-16X-xfs        Kswapd efficiency fsmark                 84%       69%
1024M-16X-xfs        Kswapd efficiency simple-wb              74%       73%
1024M-16X-xfs        Kswapd efficiency mmap-strm              39%       40%

All FSMark tests up to 16X had statistically significant
improvements. For the most part, tests are completing faster with
the exception of the streaming writes to a mixture of anonymous and
file-backed mappings which were slower in two cases

In the cases where the mmap-strm tests were slower, there was more
swapping due to dirty pages being skipped. The number of additional
pages swapped is almost identical to the fewer number of pages written
from reclaim. In other words, roughly the same number of pages were
reclaimed but swapping was slower. As the test is a bit unrealistic
and stresses memory heavily, the small shift is acceptable.

4608M1P-xfs          Files/s  mean                 29.75 ( 0.00%)     30.96 ( 3.91%)
4608M1P-xfs          Elapsed Time fsmark                   512.01    492.15
4608M1P-xfs          Elapsed Time simple-wb                618.18    566.24
4608M1P-xfs          Elapsed Time mmap-strm                488.05    465.07
4608M1P-xfs          Kswapd efficiency fsmark                 93%       86%
4608M1P-xfs          Kswapd efficiency simple-wb              88%       84%
4608M1P-xfs          Kswapd efficiency mmap-strm              46%       45%
4608M-xfs            Files/s  mean                 27.60 ( 0.00%)     28.85 ( 4.33%)
4608M-xfs            Elapsed Time fsmark                   555.96    532.34
4608M-xfs            Elapsed Time simple-wb                659.72    571.85
4608M-xfs            Elapsed Time mmap-strm               1082.57   1146.38
4608M-xfs            Kswapd efficiency fsmark                 89%       91%
4608M-xfs            Kswapd efficiency simple-wb              88%       82%
4608M-xfs            Kswapd efficiency mmap-strm              48%       46%
4608M-4X-xfs         Files/s  mean                 26.00 ( 0.00%)     27.47 ( 5.35%)
4608M-4X-xfs         Elapsed Time fsmark                   592.91    564.00
4608M-4X-xfs         Elapsed Time simple-wb                616.65    575.07
4608M-4X-xfs         Elapsed Time mmap-strm               1773.02   1631.53
4608M-4X-xfs         Kswapd efficiency fsmark                 90%       94%
4608M-4X-xfs         Kswapd efficiency simple-wb              87%       82%
4608M-4X-xfs         Kswapd efficiency mmap-strm              43%       43%
4608M-16X-xfs        Files/s  mean                 26.07 ( 0.00%)     26.42 ( 1.32%)
4608M-16X-xfs        Elapsed Time fsmark                   602.69    585.78
4608M-16X-xfs        Elapsed Time simple-wb                606.60    573.81
4608M-16X-xfs        Elapsed Time mmap-strm               1549.75   1441.86
4608M-16X-xfs        Kswapd efficiency fsmark                 98%       98%
4608M-16X-xfs        Kswapd efficiency simple-wb              88%       82%
4608M-16X-xfs        Kswapd efficiency mmap-strm              44%       42%

Unlike the other tests, the fsmark results are not statistically
significant but the min and max times are both improved and for the
most part, tests completed faster.

There are other indications that this is an improvement as well. For
example, in the vast majority of cases, there were fewer pages scanned
by direct reclaim implying in many cases that stalls due to direct
reclaim are reduced. KSwapd is scanning more due to skipping dirty
pages which is unfortunate but the CPU usage is still acceptable

In an earlier set of tests, I used blktrace and in almost all cases
throughput throughout the entire test was higher. However, I ended
up discarding those results as recording blktrace data was too heavy
for my liking.

On a laptop, I plugged in a USB stick and ran a similar tests of tests
using it as backing storage. A desktop environment was running and for
the entire duration of the tests, firefox and gnome terminal were
launching and exiting to vaguely simulate a user.

1024M-xfs            Files/s  mean               0.41 ( 0.00%)        0.44 ( 6.82%)
1024M-xfs            Elapsed Time fsmark               2053.52   1641.03
1024M-xfs            Elapsed Time simple-wb            1229.53    768.05
1024M-xfs            Elapsed Time mmap-strm            4126.44   4597.03
1024M-xfs            Kswapd efficiency fsmark              84%       85%
1024M-xfs            Kswapd efficiency simple-wb           92%       81%
1024M-xfs            Kswapd efficiency mmap-strm           60%       51%
1024M-xfs            Avg wait ms fsmark                5404.53     4473.87
1024M-xfs            Avg wait ms simple-wb             2541.35     1453.54
1024M-xfs            Avg wait ms mmap-strm             3400.25     3852.53

The mmap-strm results were hurt because firefox launching had
a tendency to push the test out of memory. On the postive side,
firefox launched marginally faster with the patches applied.  Time to
completion for many tests was faster but more importantly - the "Avg
wait" time as measured by iostat was far lower implying the system
would be more responsive. It was also the case that "Avg wait ms"
on the root filesystem was lower. I tested it manually and while the
system felt slightly more responsive while copying data to a USB stick,
it was marginal enough that it could be my imagination.

For the most part, this series has a positive impact. Is there anything
else that should be done before I send this to Andrew requested it
be merged?

 fs/ext4/inode.c             |    6 +++-
 fs/xfs/linux-2.6/xfs_aops.c |    7 ++--
 include/linux/mmzone.h      |    1 +
 mm/vmscan.c                 |   67 ++++++++++++++++++++++++++++++------------
 mm/vmstat.c                 |    1 +
 5 files changed, 58 insertions(+), 24 deletions(-)

-- 
1.7.3.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 120+ messages in thread


* [PATCH 1/7] mm: vmscan: Do not writeback filesystem pages in direct reclaim
  2011-08-10 10:47 ` Mel Gorman
  (?)
@ 2011-08-10 10:47   ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-10 10:47 UTC (permalink / raw)
  To: Linux-MM
  Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman

From: Mel Gorman <mel@csn.ul.ie>

When kswapd is failing to keep zones above the min watermark, a process
will enter direct reclaim in the same manner kswapd does. If a dirty
page is encountered during the scan, this page is written to backing
storage using mapping->writepage.

This causes two problems. First, it can result in very deep call
stacks, particularly if the target storage or filesystem are complex.
Some filesystems ignore write requests from direct reclaim as a result.
The second is that a single-page flush is inefficient in terms of IO.
While there is an expectation that the elevator will merge requests,
this does not always happen. Quoting Christoph Hellwig;

	The elevator has a relatively small window it can operate on,
	and can never fix up a bad large scale writeback pattern.

This patch prevents direct reclaim from writing back filesystem pages
by checking whether current is kswapd. Anonymous pages are still
written to swap as there is no equivalent of a flusher thread for
anonymous pages. If the dirty pages cannot be written back, they are
placed back on the LRU lists. There is now a direct dependency on
dirty page balancing to limit the number of dirty pages in the system,
as too many would prevent reclaim from making forward progress.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
---
 include/linux/mmzone.h |    1 +
 mm/vmscan.c            |    9 +++++++++
 mm/vmstat.c            |    1 +
 3 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9f7c3eb..b70a0c0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -100,6 +100,7 @@ enum zone_stat_item {
 	NR_UNSTABLE_NFS,	/* NFS unstable pages */
 	NR_BOUNCE,
 	NR_VMSCAN_WRITE,
+	NR_VMSCAN_WRITE_SKIP,
 	NR_WRITEBACK_TEMP,	/* Writeback using temporary buffers */
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d036e59..1522b0f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -825,6 +825,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (PageDirty(page)) {
 			nr_dirty++;
 
+			/*
+			 * Only kswapd can writeback filesystem pages to
+			 * avoid risk of stack overflow
+			 */
+			if (page_is_file_cache(page) && !current_is_kswapd()) {
+				inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
+				goto keep_locked;
+			}
+
 			if (references == PAGEREF_RECLAIM_CLEAN)
 				goto keep_locked;
 			if (!may_enter_fs)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 20c18b7..fd109f3 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -702,6 +702,7 @@ const char * const vmstat_text[] = {
 	"nr_unstable",
 	"nr_bounce",
 	"nr_vmscan_write",
+	"nr_vmscan_write_skip",
 	"nr_writeback_temp",
 	"nr_isolated_anon",
 	"nr_isolated_file",
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 120+ messages in thread



* [PATCH 2/7] mm: vmscan: Remove dead code related to lumpy reclaim waiting on pages under writeback
  2011-08-10 10:47 ` Mel Gorman
  (?)
@ 2011-08-10 10:47   ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-10 10:47 UTC (permalink / raw)
  To: Linux-MM
  Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman

Lumpy reclaim worked in two passes: the first queued pages for IO and
the second waited on writeback. As direct reclaim can no longer write
pages, some of this code is now dead. This patch removes it, but
direct reclaim will continue to wait on pages under writeback while in
synchronous reclaim mode.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c |   21 +++++----------------
 1 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1522b0f..7863f8e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -455,15 +455,6 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 			return PAGE_ACTIVATE;
 		}
 
-		/*
-		 * Wait on writeback if requested to. This happens when
-		 * direct reclaiming a large contiguous area and the
-		 * first attempt to free a range of pages fails.
-		 */
-		if (PageWriteback(page) &&
-		    (sc->reclaim_mode & RECLAIM_MODE_SYNC))
-			wait_on_page_writeback(page);
-
 		if (!PageWriteback(page)) {
 			/* synchronous write or broken a_ops? */
 			ClearPageReclaim(page);
@@ -764,12 +755,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 		if (PageWriteback(page)) {
 			/*
-			 * Synchronous reclaim is performed in two passes,
-			 * first an asynchronous pass over the list to
-			 * start parallel writeback, and a second synchronous
-			 * pass to wait for the IO to complete.  Wait here
-			 * for any page for which writeback has already
-			 * started.
+			 * Synchronous reclaim cannot queue pages for
+			 * writeback due to the possibility of stack overflow
+			 * but if it encounters a page under writeback, wait
+			 * for the IO to complete.
 			 */
 			if ((sc->reclaim_mode & RECLAIM_MODE_SYNC) &&
 			    may_enter_fs)
@@ -1363,7 +1352,7 @@ static noinline_for_stack void update_isolated_counts(struct zone *zone,
 }
 
 /*
- * Returns true if the caller should wait to clean dirty/writeback pages.
+ * Returns true if a direct reclaim should wait on pages under writeback.
  *
  * If we are direct reclaiming for contiguous pages and we do not reclaim
  * everything in the list, try again and wait for writeback IO to complete.
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 120+ messages in thread



* [PATCH 3/7] xfs: Warn if direct reclaim tries to writeback pages
  2011-08-10 10:47 ` Mel Gorman
  (?)
@ 2011-08-10 10:47   ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-10 10:47 UTC (permalink / raw)
  To: Linux-MM
  Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman

Direct reclaim should never writeback pages. For now, handle the
situation and warn about it. Ultimately, this will be a BUG_ON.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/xfs/linux-2.6/xfs_aops.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index 79ce38b..afea9cd 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -930,11 +930,10 @@ xfs_vm_writepage(
 	 * random callers for direct reclaim or memcg reclaim.  We explicitly
 	 * allow reclaim from kswapd as the stack usage there is relatively low.
 	 *
-	 * This should really be done by the core VM, but until that happens
-	 * filesystems like XFS, btrfs and ext4 have to take care of this
-	 * by themselves.
+	 * This should never happen except in the case of a VM regression so
+	 * warn about it.
 	 */
-	if ((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC)
+	if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC))
 		goto redirty;
 
 	/*
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 120+ messages in thread


* [PATCH 4/7] ext4: Warn if direct reclaim tries to writeback pages
  2011-08-10 10:47 ` Mel Gorman
  (?)
@ 2011-08-10 10:47   ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-10 10:47 UTC (permalink / raw)
  To: Linux-MM
  Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman

Direct reclaim should never writeback pages. Warn if an attempt
is made.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/ext4/inode.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e3126c0..95bb179 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2663,8 +2663,12 @@ static int ext4_writepage(struct page *page,
 		 * We don't want to do block allocation, so redirty
 		 * the page and return.  We may reach here when we do
 		 * a journal commit via journal_submit_inode_data_buffers.
-		 * We can also reach here via shrink_page_list
+		 * We can also reach here via shrink_page_list but it
+		 * should never be for direct reclaim so warn if that
+		 * happens
 		 */
+		WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
+								PF_MEMALLOC);
 		goto redirty_page;
 	}
 	if (commit_write)
-- 
1.7.3.4




* [PATCH 5/7] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority
  2011-08-10 10:47 ` Mel Gorman
  (?)
@ 2011-08-10 10:47   ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-10 10:47 UTC (permalink / raw)
  To: Linux-MM
  Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman

It is preferable that no dirty pages are dispatched for cleaning from
the page reclaim path. At normal priorities, this patch prevents kswapd
writing pages.

However, page reclaim does have a requirement that pages be freed
in a particular zone. If it is failing to make sufficient progress
(reclaiming < SWAP_CLUSTER_MAX at any priority), the priority
is raised to scan more pages. A priority of DEF_PRIORITY - 3 is
considered to be the point where kswapd is getting into trouble
reclaiming pages. If this priority is reached, kswapd will dispatch
pages for writing.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
---
 mm/vmscan.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7863f8e..6d7c696 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -710,7 +710,8 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
  */
 static unsigned long shrink_page_list(struct list_head *page_list,
 				      struct zone *zone,
-				      struct scan_control *sc)
+				      struct scan_control *sc,
+				      int priority)
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
@@ -816,9 +817,11 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 			/*
 			 * Only kswapd can writeback filesystem pages to
-			 * avoid risk of stack overflow
+			 * avoid risk of stack overflow but do not writeback
+			 * unless under significant pressure.
 			 */
-			if (page_is_file_cache(page) && !current_is_kswapd()) {
+			if (page_is_file_cache(page) &&
+					(!current_is_kswapd() || priority >= DEF_PRIORITY - 2)) {
 				inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
 				goto keep_locked;
 			}
@@ -1454,12 +1457,12 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	spin_unlock_irq(&zone->lru_lock);
 
-	nr_reclaimed = shrink_page_list(&page_list, zone, sc);
+	nr_reclaimed = shrink_page_list(&page_list, zone, sc, priority);
 
 	/* Check if we should syncronously wait for writeback */
 	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
 		set_reclaim_mode(priority, sc, true);
-		nr_reclaimed += shrink_page_list(&page_list, zone, sc);
+		nr_reclaimed += shrink_page_list(&page_list, zone, sc, priority);
 	}
 
 	local_irq_disable();
-- 
1.7.3.4




* [PATCH 6/7] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
  2011-08-10 10:47 ` Mel Gorman
  (?)
@ 2011-08-10 10:47   ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-10 10:47 UTC (permalink / raw)
  To: Linux-MM
  Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman

Workloads that are allocating frequently and writing files place a
large number of dirty pages on the LRU. With use-once logic, it is
possible for them to reach the end of the LRU quickly requiring the
reclaimer to scan more to find clean pages. Ordinarily, processes that
are dirtying memory will get throttled by dirty balancing but this
is a global heuristic and does not take into account that LRUs are
maintained on a per-zone basis. This can lead to a situation whereby
reclaim is scanning heavily, skipping over a large number of pages
under writeback and recycling them around the LRU consuming CPU.

This patch checks how many of the pages isolated from the LRU were
dirty and under writeback. If a sufficient percentage of them are
under writeback, the process will be throttled if a backing device
or the zone is congested. Note that this applies whether the pages
under writeback are anonymous or file-backed, meaning that swapping
is potentially throttled. This is intentional because if the swap
device is congested, scanning more pages and dispatching more IO is
not going to help matters.

The percentage that must be under writeback depends on the priority.
At default priority, all of the isolated pages must be under
writeback. At DEF_PRIORITY-1, 50% of them must be, at DEF_PRIORITY-2,
25% and so on, i.e. the greater the pressure, the greater the
likelihood the process will get throttled to allow the flusher
threads to make some progress.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
---
 mm/vmscan.c |   26 +++++++++++++++++++++++---
 1 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6d7c696..b0437f2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -711,7 +711,9 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
 static unsigned long shrink_page_list(struct list_head *page_list,
 				      struct zone *zone,
 				      struct scan_control *sc,
-				      int priority)
+				      int priority,
+				      unsigned long *ret_nr_dirty,
+				      unsigned long *ret_nr_writeback)
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
@@ -719,6 +721,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 	unsigned long nr_dirty = 0;
 	unsigned long nr_congested = 0;
 	unsigned long nr_reclaimed = 0;
+	unsigned long nr_writeback = 0;
 
 	cond_resched();
 
@@ -755,6 +758,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			(PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));
 
 		if (PageWriteback(page)) {
+			nr_writeback++;
 			/*
 			 * Synchronous reclaim cannot queue pages for
 			 * writeback due to the possibility of stack overflow
@@ -960,6 +964,8 @@ keep_lumpy:
 
 	list_splice(&ret_pages, page_list);
 	count_vm_events(PGACTIVATE, pgactivate);
+	*ret_nr_dirty += nr_dirty;
+	*ret_nr_writeback += nr_writeback;
 	return nr_reclaimed;
 }
 
@@ -1409,6 +1415,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	unsigned long nr_taken;
 	unsigned long nr_anon;
 	unsigned long nr_file;
+	unsigned long nr_dirty = 0;
+	unsigned long nr_writeback = 0;
 
 	while (unlikely(too_many_isolated(zone, file, sc))) {
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -1457,12 +1465,14 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	spin_unlock_irq(&zone->lru_lock);
 
-	nr_reclaimed = shrink_page_list(&page_list, zone, sc, priority);
+	nr_reclaimed = shrink_page_list(&page_list, zone, sc, priority,
+						&nr_dirty, &nr_writeback);
 
 	/* Check if we should syncronously wait for writeback */
 	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
 		set_reclaim_mode(priority, sc, true);
-		nr_reclaimed += shrink_page_list(&page_list, zone, sc, priority);
+		nr_reclaimed += shrink_page_list(&page_list, zone, sc,
+					priority, &nr_dirty, &nr_writeback);
 	}
 
 	local_irq_disable();
@@ -1472,6 +1482,16 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	putback_lru_pages(zone, sc, nr_anon, nr_file, &page_list);
 
+	/*
+	 * If we have encountered a high number of dirty pages under writeback
+	 * then we are reaching the end of the LRU too quickly and global
+	 * limits are not enough to throttle processes due to the page
+	 * distribution throughout zones. Scale the number of dirty pages that
+	 * must be under writeback before being throttled to priority.
+	 */
+	if (nr_writeback && nr_writeback >= (nr_taken >> (DEF_PRIORITY-priority)))
+		wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
+
 	trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id,
 		zone_idx(zone),
 		nr_scanned, nr_reclaimed,
-- 
1.7.3.4




* [PATCH 7/7] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes
  2011-08-10 10:47 ` Mel Gorman
  (?)
@ 2011-08-10 10:47   ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-10 10:47 UTC (permalink / raw)
  To: Linux-MM
  Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman

When direct reclaim encounters a dirty page, the page is recycled around
the LRU for another cycle. This patch marks the page PageReclaim,
as deactivate_page() does, so that the page is reclaimed almost
immediately after it is cleaned. This avoids reclaiming clean pages
that are younger than a dirty page encountered at the end of the
LRU, which might have been something like a use-once page.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <jweiner@redhat.com>
---
 include/linux/mmzone.h |    2 +-
 mm/vmscan.c            |   10 +++++++++-
 mm/vmstat.c            |    2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b70a0c0..c9c5797 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -100,7 +100,7 @@ enum zone_stat_item {
 	NR_UNSTABLE_NFS,	/* NFS unstable pages */
 	NR_BOUNCE,
 	NR_VMSCAN_WRITE,
-	NR_VMSCAN_WRITE_SKIP,
+	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
 	NR_WRITEBACK_TEMP,	/* Writeback using temporary buffers */
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b0437f2..33882a3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -826,7 +826,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 */
 			if (page_is_file_cache(page) &&
 					(!current_is_kswapd() || priority >= DEF_PRIORITY - 2)) {
-				inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
+				/*
+				 * Immediately reclaim when written back.
+				 * Similar in principle to deactivate_page()
+				 * except we already have the page isolated
+				 * and know it's dirty
+				 */
+				inc_zone_page_state(page, NR_VMSCAN_IMMEDIATE);
+				SetPageReclaim(page);
+
 				goto keep_locked;
 			}
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index fd109f3..471b20b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -702,7 +702,7 @@ const char * const vmstat_text[] = {
 	"nr_unstable",
 	"nr_bounce",
 	"nr_vmscan_write",
-	"nr_vmscan_write_skip",
+	"nr_vmscan_immediate_reclaim",
 	"nr_writeback_temp",
 	"nr_isolated_anon",
 	"nr_isolated_file",
-- 
1.7.3.4
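
The behaviour the patch above describes can be modelled in a few lines: a dirty page that reclaim may not write gets a reclaim hint, and the end-of-writeback path then moves the page to where it will be reclaimed next instead of letting it survive another full LRU cycle. This is a hypothetical userspace model; the struct and function names are not the kernel's:

```c
#include <stdbool.h>

/* Stand-in for a page with a PG_reclaim-style hint.  Illustrative only. */
struct page_model {
	bool dirty;
	bool reclaim_hint;	/* stands in for PG_reclaim */
};

/* What shrink_page_list() now does for a dirty page it may not write:
 * flag it so it is prioritised once the flusher cleans it. */
static void defer_to_flusher(struct page_model *page)
{
	page->reclaim_hint = true;	/* SetPageReclaim() */
}

/* What end-of-writeback does in this model: a hinted page is rotated to
 * the reclaim end of the LRU (returns true); an unhinted page keeps its
 * position (returns false).  Either way the page is now clean. */
static bool end_writeback_rotates(struct page_model *page)
{
	bool rotate = page->reclaim_hint;

	page->dirty = false;
	page->reclaim_hint = false;
	return rotate;
}
```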


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* Re: [PATCH 0/7] Reduce filesystem writeback from page reclaim v3
  2011-08-10 10:47 ` Mel Gorman
  (?)
@ 2011-08-10 11:00   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2011-08-10 11:00 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Wed, Aug 10, 2011 at 11:47:13AM +0100, Mel Gorman wrote:
>   o Dropped btrfs warning when filesystems are called from direct
>     reclaim. The fallback method for migration looks indistinguishable
>     from direct reclaim.

The right fix is to simply remove that fallback, possibly in combination
with implementing real migration support for btrfs.


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 0/7] Reduce filesystem writeback from page reclaim v3
  2011-08-10 11:00   ` Christoph Hellwig
  (?)
@ 2011-08-10 11:15     ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-10 11:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Johannes Weiner, Wu Fengguang,
	Jan Kara, Rik van Riel, Minchan Kim

On Wed, Aug 10, 2011 at 07:00:56AM -0400, Christoph Hellwig wrote:
> On Wed, Aug 10, 2011 at 11:47:13AM +0100, Mel Gorman wrote:
> >   o Dropped btrfs warning when filesystems are called from direct
> >     reclaim. The fallback method for migration looks indistinguishable
> >     from direct reclaim.
> 
> The right fix is to simply remove that fallback, possibly in combination
> with implementating real migration support for btrfs.
> 

Removing the fallback entirely is overkill as proper migration support
is not going to get 100% coverage but I agree that btrfs should have
real migration support. I didn't think it belonged in this series
though.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 1/7] mm: vmscan: Do not writeback filesystem pages in direct reclaim
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-10 12:40     ` Johannes Weiner
  -1 siblings, 0 replies; 120+ messages in thread
From: Johannes Weiner @ 2011-08-10 12:40 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim

On Wed, Aug 10, 2011 at 11:47:14AM +0100, Mel Gorman wrote:
> From: Mel Gorman <mel@csn.ul.ie>
> 
> When kswapd is failing to keep zones above the min watermark, a process
> will enter direct reclaim in the same manner kswapd does. If a dirty
> page is encountered during the scan, this page is written to backing
> storage using mapping->writepage.
> 
> This causes two problems. First, it can result in very deep call
> stacks, particularly if the target storage or filesystem are complex.
> Some filesystems ignore write requests from direct reclaim as a result.
> The second is that a single-page flush is inefficient in terms of IO.
> While there is an expectation that the elevator will merge requests,
> this does not always happen. Quoting Christoph Hellwig;
> 
> 	The elevator has a relatively small window it can operate on,
> 	and can never fix up a bad large scale writeback pattern.
> 
> This patch prevents direct reclaim writing back filesystem pages by
> checking if current is kswapd. Anonymous pages are still written to
> swap as there is not the equivalent of a flusher thread for anonymous
> pages. If the dirty pages cannot be written back, they are placed
> back on the LRU lists. There is now a direct dependency on dirty page
> balancing to prevent too many pages in the system being dirtied which
> would prevent reclaim making forward progress.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

Acked-by: Johannes Weiner <jweiner@redhat.com>
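
The policy described in the quoted changelog reduces to a two-input decision: file-backed pages may only be written back by kswapd, while anonymous pages may still go to swap from either context, since no flusher-thread equivalent exists for swap. A minimal sketch under that reading (the function name is illustrative, not a kernel API):

```c
#include <stdbool.h>

/* Sketch of the writeback policy the quoted patch describes: direct
 * reclaim no longer issues ->writepage for file-backed pages, leaving
 * them to kswapd and the flusher threads, but anonymous pages can still
 * be written to swap from any reclaim context. */
static bool reclaim_may_writeback(bool file_backed, bool is_kswapd)
{
	if (!file_backed)
		return true;	/* anon: no flusher thread for swap */
	return is_kswapd;	/* file: only kswapd may write */
}
```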

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 2/7] mm: vmscan: Remove dead code related to lumpy reclaim waiting on pages under writeback
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-10 12:41     ` Johannes Weiner
  -1 siblings, 0 replies; 120+ messages in thread
From: Johannes Weiner @ 2011-08-10 12:41 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim

On Wed, Aug 10, 2011 at 11:47:15AM +0100, Mel Gorman wrote:
> Lumpy reclaim worked with two passes - the first which queued pages for
> IO and the second which waited on writeback. As direct reclaim can no
> longer write pages there is some dead code. This patch removes it but
> direct reclaim will continue to wait on pages under writeback while in
> synchronous reclaim mode.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Acked-by: Johannes Weiner <jweiner@redhat.com>

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 5/7] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-10 12:44     ` Johannes Weiner
  -1 siblings, 0 replies; 120+ messages in thread
From: Johannes Weiner @ 2011-08-10 12:44 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim

On Wed, Aug 10, 2011 at 11:47:18AM +0100, Mel Gorman wrote:
> It is preferable that no dirty pages are dispatched for cleaning from
> the page reclaim path. At normal priorities, this patch prevents kswapd
> writing pages.
> 
> However, page reclaim does have a requirement that pages be freed
> in a particular zone. If it is failing to make sufficient progress
> (reclaiming < SWAP_CLUSTER_MAX at any priority), the priority
> is raised to scan more pages. A priority of DEF_PRIORITY - 3 is
> considered to be the point where kswapd is getting into trouble
> reclaiming pages. If this priority is reached, kswapd will dispatch
> pages for writing.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

Acked-by: Johannes Weiner <jweiner@redhat.com>
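
The priority gate in the quoted changelog can be stated in one line: kswapd dispatches writes only once the scan priority has been raised to DEF_PRIORITY - 3 or below, i.e. after several rounds of insufficient progress. A sketch under that reading (illustrative name; DEF_PRIORITY assumed 12, and note that numerically lower priority means higher reclaim pressure):

```c
#define DEF_PRIORITY 12

/* kswapd may dispatch pages for writing only when reclaim pressure has
 * raised the scan priority to DEF_PRIORITY - 3 or lower, the point the
 * quoted patch treats as "kswapd is getting into trouble". */
static int kswapd_may_write(int priority)
{
	return priority <= DEF_PRIORITY - 3;
}
```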

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 2/7] mm: vmscan: Remove dead code related to lumpy reclaim waiting on pages under writeback
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-10 23:19     ` Minchan Kim
  -1 siblings, 0 replies; 120+ messages in thread
From: Minchan Kim @ 2011-08-10 23:19 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel

On Wed, Aug 10, 2011 at 7:47 PM, Mel Gorman <mgorman@suse.de> wrote:
> Lumpy reclaim worked with two passes - the first which queued pages for
> IO and the second which waited on writeback. As direct reclaim can no
> longer write pages there is some dead code. This patch removes it but
> direct reclaim will continue to wait on pages under writeback while in
> synchronous reclaim mode.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 7/7] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-10 23:22     ` Minchan Kim
  -1 siblings, 0 replies; 120+ messages in thread
From: Minchan Kim @ 2011-08-10 23:22 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel

On Wed, Aug 10, 2011 at 7:47 PM, Mel Gorman <mgorman@suse.de> wrote:
> When direct reclaim encounters a dirty page, it gets recycled around
> the LRU for another cycle. This patch marks the page PageReclaim
> similar to deactivate_page() so that the page gets reclaimed almost
> immediately after the page gets cleaned. This is to avoid reclaiming
> clean pages that are younger than a dirty page encountered at the
> end of the LRU that might have been something like a use-once page.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Acked-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 1/7] mm: vmscan: Do not writeback filesystem pages in direct reclaim
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-11  9:03     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 120+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-11  9:03 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Wed, 10 Aug 2011 11:47:14 +0100
Mel Gorman <mgorman@suse.de> wrote:

> From: Mel Gorman <mel@csn.ul.ie>
> 
> When kswapd is failing to keep zones above the min watermark, a process
> will enter direct reclaim in the same manner kswapd does. If a dirty
> page is encountered during the scan, this page is written to backing
> storage using mapping->writepage.
> 
> This causes two problems. First, it can result in very deep call
> stacks, particularly if the target storage or filesystem is complex.
> Some filesystems ignore write requests from direct reclaim as a result.
> The second is that a single-page flush is inefficient in terms of IO.
> While there is an expectation that the elevator will merge requests,
> this does not always happen. Quoting Christoph Hellwig;
> 
> 	The elevator has a relatively small window it can operate on,
> 	and can never fix up a bad large scale writeback pattern.
> 
> This patch prevents direct reclaim writing back filesystem pages by
> checking if current is kswapd. Anonymous pages are still written to
> swap, as there is no equivalent of a flusher thread for anonymous
> pages. If the dirty pages cannot be written back, they are placed
> back on the LRU lists. There is now a direct dependency on dirty page
> balancing to prevent too many pages in the system being dirtied which
> would prevent reclaim making forward progress.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 2/7] mm: vmscan: Remove dead code related to lumpy reclaim waiting on pages under writeback
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-11  9:05     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 120+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-11  9:05 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Wed, 10 Aug 2011 11:47:15 +0100
Mel Gorman <mgorman@suse.de> wrote:

> Lumpy reclaim worked with two passes - the first which queued pages for
> IO and the second which waited on writeback. As direct reclaim can no
> longer write pages there is some dead code. This patch removes it but
> direct reclaim will continue to wait on pages under writeback while in
> synchronous reclaim mode.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 5/7] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-11  9:10     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 120+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-11  9:10 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Wed, 10 Aug 2011 11:47:18 +0100
Mel Gorman <mgorman@suse.de> wrote:

> It is preferable that no dirty pages are dispatched for cleaning from
> the page reclaim path. At normal priorities, this patch prevents kswapd
> writing pages.
> 
> However, page reclaim does have a requirement that pages be freed
> in a particular zone. If it is failing to make sufficient progress
> (reclaiming < SWAP_CLUSTER_MAX at any priority), the priority
> is raised to scan more pages. A priority of DEF_PRIORITY - 3 is
> considered to be the point where kswapd is getting into trouble
> reclaiming pages. If this priority is reached, kswapd will dispatch
> pages for writing.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

BTW, I'd like to see a summary of the effect of priority..


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 6/7] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-11  9:18     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 120+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-11  9:18 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Wed, 10 Aug 2011 11:47:19 +0100
Mel Gorman <mgorman@suse.de> wrote:

> Workloads that are allocating frequently and writing files place a
> large number of dirty pages on the LRU. With use-once logic, it is
> possible for them to reach the end of the LRU quickly requiring the
> reclaimer to scan more to find clean pages. Ordinarily, processes that
> are dirtying memory will get throttled by dirty balancing but this
> is a global heuristic and does not take into account that LRUs are
> maintained on a per-zone basis. This can lead to a situation whereby
> reclaim is scanning heavily, skipping over a large number of pages
> under writeback and recycling them around the LRU consuming CPU.
> 
> This patch checks how many of the pages isolated from the LRU were
> dirty and under writeback. If a sufficient percentage of them are
> under writeback, the process will be throttled if a backing device or the
> zone is congested. Note that this applies whether it is anonymous or
> file-backed pages that are under writeback, meaning that swapping is
> potentially throttled. This is intentional because if the
> swap device is congested, scanning more pages and dispatching more
> IO is not going to help matters.
> 
> The percentage that must be in writeback depends on the priority. At
> default priority, all of them must be under writeback. At
> DEF_PRIORITY-1, 50% must be; at DEF_PRIORITY-2, 25%, and so on. As
> pressure increases, it becomes more likely the process will be
> throttled, allowing the flusher threads to make some progress.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> Acked-by: Johannes Weiner <jweiner@redhat.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Maybe I need to add memcg_is_congested() ...


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 7/7] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-11  9:19     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 120+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-11  9:19 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Wed, 10 Aug 2011 11:47:20 +0100
Mel Gorman <mgorman@suse.de> wrote:

> When direct reclaim encounters a dirty page, it gets recycled around
> the LRU for another cycle. This patch marks the page PageReclaim
> similar to deactivate_page() so that the page gets reclaimed almost
> immediately after the page gets cleaned. This is to avoid reclaiming
> clean pages that are younger than a dirty page encountered at the
> end of the LRU that might have been something like a use-once page.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Acked-by: Johannes Weiner <jweiner@redhat.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 1/7] mm: vmscan: Do not writeback filesystem pages in direct reclaim
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-11 15:57     ` Rik van Riel
  -1 siblings, 0 replies; 120+ messages in thread
From: Rik van Riel @ 2011-08-11 15:57 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Minchan Kim

On 08/10/2011 06:47 AM, Mel Gorman wrote:
> From: Mel Gorman <mel@csn.ul.ie>
>
> When kswapd is failing to keep zones above the min watermark, a process
> will enter direct reclaim in the same manner kswapd does. If a dirty
> page is encountered during the scan, this page is written to backing
> storage using mapping->writepage.
>
> This causes two problems. First, it can result in very deep call
> stacks, particularly if the target storage or filesystem are complex.
> Some filesystems ignore write requests from direct reclaim as a result.
> The second is that a single-page flush is inefficient in terms of IO.
> While there is an expectation that the elevator will merge requests,
> this does not always happen. Quoting Christoph Hellwig;
>
> 	The elevator has a relatively small window it can operate on,
> 	and can never fix up a bad large scale writeback pattern.
>
> This patch prevents direct reclaim writing back filesystem pages by
> checking if current is kswapd. Anonymous pages are still written to
> swap as there is not the equivalent of a flusher thread for anonymous
> pages. If the dirty pages cannot be written back, they are placed
> back on the LRU lists. There is now a direct dependency on dirty page
> balancing to prevent too many pages in the system being dirtied which
> would prevent reclaim making forward progress.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

^ permalink raw reply	[flat|nested] 120+ messages in thread


* Re: [PATCH 2/7] mm: vmscan: Remove dead code related to lumpy reclaim waiting on pages under writeback
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-11 16:52     ` Rik van Riel
  -1 siblings, 0 replies; 120+ messages in thread
From: Rik van Riel @ 2011-08-11 16:52 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Minchan Kim

On 08/10/2011 06:47 AM, Mel Gorman wrote:
> Lumpy reclaim worked with two passes - the first which queued pages for
> IO and the second which waited on writeback. As direct reclaim can no
> longer write pages there is some dead code. This patch removes it but
> direct reclaim will continue to wait on pages under writeback while in
> synchronous reclaim mode.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Reviewed-by: Rik van Riel <riel@redhat.com>


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 3/7] xfs: Warn if direct reclaim tries to writeback pages
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-11 16:53     ` Rik van Riel
  -1 siblings, 0 replies; 120+ messages in thread
From: Rik van Riel @ 2011-08-11 16:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Minchan Kim

On 08/10/2011 06:47 AM, Mel Gorman wrote:
> Direct reclaim should never writeback pages. For now, handle the
> situation and warn about it. Ultimately, this will be a BUG_ON.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Reviewed-by: Rik van Riel <riel@redhat.com>


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 4/7] ext4: Warn if direct reclaim tries to writeback pages
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-11 17:07     ` Rik van Riel
  -1 siblings, 0 replies; 120+ messages in thread
From: Rik van Riel @ 2011-08-11 17:07 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Minchan Kim

On 08/10/2011 06:47 AM, Mel Gorman wrote:
> Direct reclaim should never writeback pages. Warn if an attempt
> is made.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Reviewed-by: Rik van Riel <riel@redhat.com>


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 5/7] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-11 18:18     ` Rik van Riel
  -1 siblings, 0 replies; 120+ messages in thread
From: Rik van Riel @ 2011-08-11 18:18 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Minchan Kim

On 08/10/2011 06:47 AM, Mel Gorman wrote:
> It is preferable that no dirty pages are dispatched for cleaning from
> the page reclaim path. At normal priorities, this patch prevents kswapd
> writing pages.
>
> However, page reclaim does have a requirement that pages be freed
> in a particular zone. If it is failing to make sufficient progress
> (reclaiming < SWAP_CLUSTER_MAX at any priority), the priority
> is raised to scan more pages. A priority of DEF_PRIORITY - 3 is
> considered to be the point where kswapd is getting into trouble
> reclaiming pages. If this priority is reached, kswapd will dispatch
> pages for writing.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

My only worry with this patch is that maybe we'll burn too
much CPU time freeing pages from a zone.  However, chances
are we'll have freed pages from other zones when scanning
one zone multiple times (the page cache dirty limit is global,
the clean pages have to be _somewhere_).

Since the bulk of the allocators are not too picky about
which zone they get their pages from, I suspect this patch
will be an overall improvement pretty much all the time.

Acked-by: Rik van Riel <riel@redhat.com>

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 5/7] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority
  2011-08-11  9:10     ` KAMEZAWA Hiroyuki
  (?)
@ 2011-08-11 20:25       ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-11 20:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Thu, Aug 11, 2011 at 06:10:29PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 10 Aug 2011 11:47:18 +0100
> Mel Gorman <mgorman@suse.de> wrote:
> 
> > It is preferable that no dirty pages are dispatched for cleaning from
> > the page reclaim path. At normal priorities, this patch prevents kswapd
> > writing pages.
> > 
> > However, page reclaim does have a requirement that pages be freed
> > in a particular zone. If it is failing to make sufficient progress
> > (reclaiming < SWAP_CLUSTER_MAX at any priority), the priority
> > is raised to scan more pages. A priority of DEF_PRIORITY - 3 is
> > considered to be the point where kswapd is getting into trouble
> > reclaiming pages. If this priority is reached, kswapd will dispatch
> > pages for writing.
> > 
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> 
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 

Thanks

> BTW, I'd like to see summary of the effect of priority..
> 

What sort of summary are you looking for? If pressure is high enough,
writes start happening from reclaim. On NUMA, it can be particularly
pronounced. Here is a summary of page writes from reclaim over a range
of tests

512M1P-xfs           Page writes file fsmark                                 8113        74
512M1P-xfs           Page writes file simple-wb                             19895         1
512M1P-xfs           Page writes file mmap-strm                               997        95
512M-xfs             Page writes file fsmark                                12071         9
512M-xfs             Page writes file simple-wb                             31709         1
512M-xfs             Page writes file mmap-strm                            148274      2448
512M-4X-xfs          Page writes file fsmark                                12828         0
512M-4X-xfs          Page writes file simple-wb                             32168         5
512M-4X-xfs          Page writes file mmap-strm                            346460      4405
512M-16X-xfs         Page writes file fsmark                                11566        29
512M-16X-xfs         Page writes file simple-wb                             31935         4
512M-16X-xfs         Page writes file mmap-strm                             38085      4371

With 1 processor (512M1P), very few writes occur as for the most part
flushers are keeping up. With four times more processes than there are
CPUs (512M-4X), there are more writes by kswapd.

1024M1P-xfs          Page writes file fsmark                                 3446         1
1024M1P-xfs          Page writes file simple-wb                             11697         6
1024M1P-xfs          Page writes file mmap-strm                              4077       446
1024M-xfs            Page writes file fsmark                                 5159         0
1024M-xfs            Page writes file simple-wb                             12785         5
1024M-xfs            Page writes file mmap-strm                            251153      8108
1024M-4X-xfs         Page writes file fsmark                                 4781         0
1024M-4X-xfs         Page writes file simple-wb                             12486         6
1024M-4X-xfs         Page writes file mmap-strm                           1627122     15000
1024M-16X-xfs        Page writes file fsmark                                 3777         1
1024M-16X-xfs        Page writes file simple-wb                             11856         2
1024M-16X-xfs        Page writes file mmap-strm                              6563      2638
4608M1P-xfs          Page writes file fsmark                                 1497         0
4608M1P-xfs          Page writes file simple-wb                              4305         0
4608M1P-xfs          Page writes file mmap-strm                             17586     10153
4608M-xfs            Page writes file fsmark                                 3380         0
4608M-xfs            Page writes file simple-wb                              5528         0
4608M-4X-xfs         Page writes file fsmark                                 4650         0
4608M-4X-xfs         Page writes file simple-wb                              5621         0
4608M-4X-xfs         Page writes file mmap-strm                            149751     18395
4608M-16X-xfs        Page writes file fsmark                                  388         0
4608M-16X-xfs        Page writes file simple-wb                              5466         0
4608M-16X-xfs        Page writes file mmap-strm                           3349772     19307

This is the same type of tests just with more memory. If enough
processes are running, kswapd will start writing pages as it tries
to reclaim memory.

4096M8N-xfs          Page writes file fsmark                                11571      8163
4096M8N-xfs          Page writes file simple-wb                             28979     11460
4096M8N-xfs          Page writes file mmap-strm                            178999     12181
4096M8N-4X-xfs       Page writes file fsmark                                14421      7487
4096M8N-4X-xfs       Page writes file simple-wb                             26474     10529
4096M8N-4X-xfs       Page writes file mmap-strm                            163770     58765
4096M8N-16X-xfs      Page writes file fsmark                                16726      9265
4096M8N-16X-xfs      Page writes file simple-wb                             28800     11129
4096M8N-16X-xfs      Page writes file mmap-strm                             73303     48267

This is with 8 NUMA nodes, each 512M in size. As the flusher threads are
not targeting a specific node, kswapd writing pages happens more
frequently.

Is this what you are looking for?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 5/7] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority
@ 2011-08-11 20:25       ` Mel Gorman
  0 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-11 20:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Rik van Riel, Jan Kara, LKML, XFS, Christoph Hellwig, Linux-MM,
	Minchan Kim, Wu Fengguang, Johannes Weiner

On Thu, Aug 11, 2011 at 06:10:29PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 10 Aug 2011 11:47:18 +0100
> Mel Gorman <mgorman@suse.de> wrote:
> 
> > It is preferable that no dirty pages are dispatched for cleaning from
> > the page reclaim path. At normal priorities, this patch prevents kswapd
> > writing pages.
> > 
> > However, page reclaim does have a requirement that pages be freed
> > in a particular zone. If it is failing to make sufficient progress
> > (reclaiming < SWAP_CLUSTER_MAX at any priority priority), the priority
> > is raised to scan more pages. A priority of DEF_PRIORITY - 3 is
> > considered to be the point where kswapd is getting into trouble
> > reclaiming pages. If this priority is reached, kswapd will dispatch
> > pages for writing.
> > 
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> 
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 

Thanks

> BTW, I'd like to see summary of the effect of priority..
> 

What sort of summary are you looking for? If pressure is high enough,
writes start happening from reclaim. On NUMA, it can be particularly
pronounced. Here is a summary of page writes from reclaim over a range
of tests

512M1P-xfs           Page writes file fsmark                                 8113        74
512M1P-xfs           Page writes file simple-wb                             19895         1
512M1P-xfs           Page writes file mmap-strm                               997        95
512M-xfs             Page writes file fsmark                                12071         9
512M-xfs             Page writes file simple-wb                             31709         1
512M-xfs             Page writes file mmap-strm                            148274      2448
512M-4X-xfs          Page writes file fsmark                                12828         0
512M-4X-xfs          Page writes file simple-wb                             32168         5
512M-4X-xfs          Page writes file mmap-strm                            346460      4405
512M-16X-xfs         Page writes file fsmark                                11566        29
512M-16X-xfs         Page writes file simple-wb                             31935         4
512M-16X-xfs         Page writes file mmap-strm                             38085      4371

With 1 processor (512M1P), very few writes occur as for the most part
flushers are keeping up. With 4x times more processors than there are
CPUs (512M-4X), there are more writes by kswapd..

1024M1P-xfs          Page writes file fsmark                                 3446         1
1024M1P-xfs          Page writes file simple-wb                             11697         6
1024M1P-xfs          Page writes file mmap-strm                              4077       446
1024M-xfs            Page writes file fsmark                                 5159         0
1024M-xfs            Page writes file simple-wb                             12785         5
1024M-xfs            Page writes file mmap-strm                            251153      8108
1024M-4X-xfs         Page writes file fsmark                                 4781         0
1024M-4X-xfs         Page writes file simple-wb                             12486         6
1024M-4X-xfs         Page writes file mmap-strm                           1627122     15000
1024M-16X-xfs        Page writes file fsmark                                 3777         1
1024M-16X-xfs        Page writes file simple-wb                             11856         2
1024M-16X-xfs        Page writes file mmap-strm                              6563      2638
4608M1P-xfs          Page writes file fsmark                                 1497         0
4608M1P-xfs          Page writes file simple-wb                              4305         0
4608M1P-xfs          Page writes file mmap-strm                             17586     10153
4608M-xfs            Page writes file fsmark                                 3380         0
4608M-xfs            Page writes file simple-wb                              5528         0
4608M-4X-xfs         Page writes file fsmark                                 4650         0
4608M-4X-xfs         Page writes file simple-wb                              5621         0
4608M-4X-xfs         Page writes file mmap-strm                            149751     18395
4608M-16X-xfs        Page writes file fsmark                                  388         0
4608M-16X-xfs        Page writes file simple-wb                              5466         0
4608M-16X-xfs        Page writes file mmap-strm                           3349772     19307

These are the same type of tests, just with more memory. If enough
processes are running, kswapd will start writing pages as it tries
to reclaim memory.

4096M8N-xfs          Page writes file fsmark                                11571      8163
4096M8N-xfs          Page writes file simple-wb                             28979     11460
4096M8N-xfs          Page writes file mmap-strm                            178999     12181
4096M8N-4X-xfs       Page writes file fsmark                                14421      7487
4096M8N-4X-xfs       Page writes file simple-wb                             26474     10529
4096M8N-4X-xfs       Page writes file mmap-strm                            163770     58765
4096M8N-16X-xfs      Page writes file fsmark                                16726      9265
4096M8N-16X-xfs      Page writes file simple-wb                             28800     11129
4096M8N-16X-xfs      Page writes file mmap-strm                             73303     48267

This is with 8 NUMA nodes, each 512M in size. As the flusher threads
do not target a specific node, kswapd writes pages more
frequently.

Is this what you are looking for?

-- 
Mel Gorman
SUSE Labs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 5/7] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority
  2011-08-11 18:18     ` Rik van Riel
  (?)
@ 2011-08-11 20:38       ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-11 20:38 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Minchan Kim

On Thu, Aug 11, 2011 at 02:18:54PM -0400, Rik van Riel wrote:
> On 08/10/2011 06:47 AM, Mel Gorman wrote:
> >It is preferable that no dirty pages are dispatched for cleaning from
> >the page reclaim path. At normal priorities, this patch prevents kswapd
> >writing pages.
> >
> >However, page reclaim does have a requirement that pages be freed
> >in a particular zone. If it is failing to make sufficient progress
> >(reclaiming < SWAP_CLUSTER_MAX at any priority), the priority
> >is raised to scan more pages. A priority of DEF_PRIORITY - 3 is
> >considered to be the point where kswapd is getting into trouble
> >reclaiming pages. If this priority is reached, kswapd will dispatch
> >pages for writing.
> >
> >Signed-off-by: Mel Gorman <mgorman@suse.de>
> >Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> 
> My only worry with this patch is that maybe we'll burn too
> much CPU time freeing pages from a zone. 

The throttling patch prevents too much CPU being used if pages under
writeback are being encountered during scanning. That said, I shared
your concern and recorded kswapd CPU usage over time.

> However, chances
> are we'll have freed pages from other zones when scanning
> one zone multiple times (the page cache dirty limit is global,
> the clean pages have to be _somewhere_).
> 
> Since the bulk of the allocators are not too picky about
> which zone they get their pages from, I suspect this patch
> will be an overall improvement pretty much all the time.
> 

This is roughly similar to my own reasoning.

I uploaded all the kswapd CPU usage charts to
http://www.csn.ul.ie/~mel/postings/riel-20110811

These are smoothed, as the raw figures are barely readable. If you
go through them, you'll see that kswapd CPU usage is sometimes higher
but generally within 2-3%.
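
For reference, the gating described in the patch (kswapd only
dispatching dirty pages once priority has been raised to
DEF_PRIORITY - 3) amounts to something like the following. This is a
standalone illustration with an invented helper name, not the kernel
code itself; only the DEF_PRIORITY value matches the kernel's.

```c
#include <stdbool.h>

/* Reclaim priority counts down from DEF_PRIORITY (12 in the kernel)
 * as reclaim keeps failing to make sufficient progress. */
#define DEF_PRIORITY 12

/* Illustrative helper: kswapd may queue dirty pages for writeback
 * only once priority has dropped to DEF_PRIORITY - 3 or below. */
static bool kswapd_may_writepage(int priority)
{
    return priority <= DEF_PRIORITY - 3;
}
```

So at priorities 12 down to 10 kswapd skips dirty pages entirely, and
from priority 9 downward it may dispatch them for writing.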

> Acked-by: Rik van Riel <riel@redhat.com>

Thanks.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 0/7] Reduce filesystem writeback from page reclaim v3
  2011-08-10 11:15     ` Mel Gorman
  (?)
@ 2011-08-11 23:45       ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2011-08-11 23:45 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Hellwig, Rik van Riel, Jan Kara, LKML, XFS, Linux-MM,
	Minchan Kim, Wu Fengguang, Johannes Weiner

On Wed, Aug 10, 2011 at 12:15:47PM +0100, Mel Gorman wrote:
> > The right fix is to simply remove that fallback, possibly in combination
> > with implementing real migration support for btrfs.
> > 
> 
> Removing the fallback entirely is overkill as proper migration support
> is not going to get 100% coverage.

It seems like btrfs is indeed the only important one missing.


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 6/7] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-12  2:47     ` Rik van Riel
  -1 siblings, 0 replies; 120+ messages in thread
From: Rik van Riel @ 2011-08-12  2:47 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Minchan Kim

On 08/10/2011 06:47 AM, Mel Gorman wrote:
> Workloads that are allocating frequently and writing files place a
> large number of dirty pages on the LRU. With use-once logic, it is
> possible for them to reach the end of the LRU quickly requiring the
> reclaimer to scan more to find clean pages. Ordinarily, processes that
> are dirtying memory will get throttled by dirty balancing but this
> is a global heuristic and does not take into account that LRUs are
> maintained on a per-zone basis. This can lead to a situation whereby
> reclaim is scanning heavily, skipping over a large number of pages
> under writeback and recycling them around the LRU consuming CPU.
>
> This patch checks how many of the pages isolated from the
> LRU were dirty and under writeback. If a percentage of them are under
> writeback, the process will be throttled if a backing device or the
> zone is congested. Note that this applies whether it is anonymous or
> file-backed pages that are under writeback, meaning that swapping is
> potentially throttled. This is intentional because, if the
> swap device is congested, scanning more pages and dispatching more
> IO is not going to help matters.
>
> The percentage that must be in writeback depends on the priority. At
> default priority, all of them must be dirty. At DEF_PRIORITY-1, 50%
> of them must be, DEF_PRIORITY-2, 25% etc. i.e. as pressure increases
> the greater the likelihood the process will get throttled to allow
> the flusher threads to make some progress.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> Acked-by: Johannes Weiner <jweiner@redhat.com>
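
The priority-scaled threshold described above can be modelled as
follows. This is a hedged sketch with an invented function name, not
the patch itself, assuming the required writeback fraction halves each
time priority drops by one, and that a lone writeback page is enough
to throttle once the threshold shifts down to zero.

```c
#include <stdbool.h>

#define DEF_PRIORITY 12

/* Sketch of the throttling condition for 0 <= priority <= DEF_PRIORITY:
 * at DEF_PRIORITY all isolated pages must be under writeback, at
 * DEF_PRIORITY-1 half of them, at DEF_PRIORITY-2 a quarter, etc. */
static bool reclaim_should_throttle(unsigned long nr_writeback,
                                    unsigned long nr_taken, int priority)
{
    unsigned long threshold = nr_taken >> (DEF_PRIORITY - priority);

    return nr_writeback != 0 && nr_writeback >= threshold;
}
```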

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 7/7] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-12 15:27     ` Rik van Riel
  -1 siblings, 0 replies; 120+ messages in thread
From: Rik van Riel @ 2011-08-12 15:27 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Minchan Kim

On 08/10/2011 06:47 AM, Mel Gorman wrote:
> When direct reclaim encounters a dirty page, it gets recycled around
> the LRU for another cycle. This patch marks the page PageReclaim
> similar to deactivate_page() so that the page gets reclaimed almost
> immediately after the page gets cleaned. This is to avoid reclaiming
> clean pages that are younger than a dirty page encountered at the
> end of the LRU that might have been something like a use-once page.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Acked-by: Johannes Weiner <jweiner@redhat.com>
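
As a standalone model of the behaviour quoted above (simplified flags
and invented helper names, not kernel code): a dirty page met at the
end of the LRU is tagged PG_reclaim so that, once writeback completes,
it is reclaimed immediately instead of cycling past younger clean
pages again.

```c
#include <stdbool.h>

/* Simplified page flags for illustration only. */
enum { PG_dirty = 1 << 0, PG_writeback = 1 << 1, PG_reclaim = 1 << 2 };

/* Reclaim meets a dirty page at the end of the LRU: tag it so it is
 * freed as soon as it has been cleaned. */
static void mark_for_immediate_reclaim(unsigned int *flags)
{
    if (*flags & PG_dirty)
        *flags |= PG_reclaim;
}

/* Writeback completion: report whether the page should be reclaimed
 * immediately instead of surviving another trip around the LRU. */
static bool writeback_done(unsigned int *flags)
{
    bool reclaim_now = (*flags & PG_reclaim) != 0;

    *flags &= ~(unsigned int)(PG_dirty | PG_writeback | PG_reclaim);
    return reclaim_now;
}
```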

I'm thinking we may need to add a ClearPageReclaim
call to mark_page_accessed, but that would be
completely independent of these patches, so ...

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 6/7] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-16 14:06     ` Wu Fengguang
  -1 siblings, 0 replies; 120+ messages in thread
From: Wu Fengguang @ 2011-08-16 14:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Jan Kara, Rik van Riel, Minchan Kim

Mel,

I tend to agree with the whole patchset except for this one.

The worry comes from the fact that dirty pages may well be unevenly
distributed throughout the LRU lists. This patch works on local
information and may unnecessarily throttle page reclaim when running
into small spans of dirty pages.

One possible scheme of global throttling is to first tag the skipped
page with PG_reclaim (as you already do). And to throttle page reclaim
only when running into pages with both PG_dirty and PG_reclaim set,
which means we have cycled through the _whole_ LRU list (which is the
global and adaptive feedback we want) and run into that dirty page for
the second time.
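
In rough, illustrative C (standalone flag bits rather than real page
flags), the scheme sketched above would be:

```c
#include <stdbool.h>

enum { PG_dirty = 1 << 0, PG_reclaim = 1 << 1 };

/* Illustration of the proposed global feedback: throttle only when a
 * dirty page is met that was already tagged PG_reclaim on an earlier
 * pass, i.e. reclaim has cycled through the whole LRU since then. */
static bool scan_page_should_throttle(unsigned int *flags)
{
    if (!(*flags & PG_dirty))
        return false;          /* clean pages never throttle */
    if (*flags & PG_reclaim)
        return true;           /* second encounter: throttle */
    *flags |= PG_reclaim;      /* first encounter: tag, move on */
    return false;
}
```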

One test scheme would be to read/write a sparse file fast with some
average 5:1 or 10:1 or whatever read:write ratio. This can effectively
spread dirty pages all over the LRU list. It's a practical test since
it mimics the typical file server workload with concurrent downloads
and uploads.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 6/7] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
@ 2011-08-16 14:06     ` Wu Fengguang
  0 siblings, 0 replies; 120+ messages in thread
From: Wu Fengguang @ 2011-08-16 14:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Jan Kara, Rik van Riel, Minchan Kim

Mel,

I tend to agree with the whole patchset except for this one.

The worry comes from the fact that there are always the very possible
unevenly distribution of dirty pages throughout the LRU lists. This
patch works on local information and may unnecessarily throttle page
reclaim when running into small spans of dirty pages.

One possible scheme of global throttling is to first tag the skipped
page with PG_reclaim (as you already do). And to throttle page reclaim
only when running into pages with both PG_dirty and PG_reclaim set,
which means we have cycled through the _whole_ LRU list (which is the
global and adaptive feedback we want) and run into that dirty page for
the second time.

One test scheme would be to read/write a sparse file fast with some
average 5:1 or 10:1 or whatever read:write ratio. This can effectively
spread dirty pages all over the LRU list. It's a practical test since
it mimics the typical file server workload with concurrent downloads
and uploads.

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 6/7] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
  2011-08-16 14:06     ` Wu Fengguang
  (?)
@ 2011-08-16 15:02       ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-16 15:02 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Jan Kara, Rik van Riel, Minchan Kim

On Tue, Aug 16, 2011 at 10:06:52PM +0800, Wu Fengguang wrote:
> Mel,
> 
> I tend to agree with the whole patchset except for this one.
> 
> The worry comes from the fact that dirty pages are very likely to be
> unevenly distributed throughout the LRU lists.

It is pages under writeback, not dirty pages, that determine whether
throttling is considered. The distinction is important. I agree with you
that if it were dirty pages, throttling would be considered too regularly.

> This
> patch works on local information and may unnecessarily throttle page
> reclaim when running into small spans of dirty pages.
> 

It's also calling wait_iff_congested(), not congestion_wait(). This
takes BDI congestion and zone congestion into account with this check:

        /*
         * If there is no congestion, or heavy congestion is not being
         * encountered in the current zone, yield if necessary instead
         * of sleeping on the congestion queue
         */
        if (atomic_read(&nr_bdi_congested[sync]) == 0 ||
                        !zone_is_reclaim_congested(zone)) {

So global information is being taken into account.

> One possible scheme of global throttling is to first tag the skipped
> page with PG_reclaim (as you already do). And to throttle page reclaim
> only when running into pages with both PG_dirty and PG_reclaim set,

It's PG_writeback that is looked at, not PG_dirty.

> which means we have cycled through the _whole_ LRU list (which is the
> global and adaptive feedback we want) and run into that dirty page for
> the second time.
> 

This potentially results in more scanning from kswapd before it starts
throttling which could consume a lot of CPU. If pages under writeback
are reaching the end of the LRU, it's already the case that kswapd is
scanning faster than pages can be cleaned. Even then, it only really
throttles if the zone or a BDI is congested.

Taking that into consideration, do you still think there is a big
advantage to having writeback pages take another lap around the LRU
that justifies the expected increase in CPU usage?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 5/7] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority
  2011-08-11 20:25       ` Mel Gorman
  (?)
@ 2011-08-17  1:06         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 120+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-17  1:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Thu, 11 Aug 2011 21:25:04 +0100
Mel Gorman <mgorman@suse.de> wrote:

> On Thu, Aug 11, 2011 at 06:10:29PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 10 Aug 2011 11:47:18 +0100
> > Mel Gorman <mgorman@suse.de> wrote:
> > 
> > > It is preferable that no dirty pages are dispatched for cleaning from
> > > the page reclaim path. At normal priorities, this patch prevents kswapd
> > > writing pages.
> > > 
> > > However, page reclaim does have a requirement that pages be freed
> > > in a particular zone. If it is failing to make sufficient progress
> > > (reclaiming < SWAP_CLUSTER_MAX at any priority), the priority
> > > is raised to scan more pages. A priority of DEF_PRIORITY - 3 is
> > > considered to be the point where kswapd is getting into trouble
> > > reclaiming pages. If this priority is reached, kswapd will dispatch
> > > pages for writing.
> > > 
> > > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > > Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> > 
> > 
> > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> 
> Thanks
> 
> > BTW, I'd like to see summary of the effect of priority..
> > 
> 
> What sort of summary are you looking for? If pressure is high enough,
> writes start happening from reclaim. On NUMA, it can be particularly
> pronounced. Here is a summary of page writes from reclaim over a range
> of tests
> 
> 512M1P-xfs           Page writes file fsmark                                 8113        74
> 512M1P-xfs           Page writes file simple-wb                             19895         1
> 512M1P-xfs           Page writes file mmap-strm                               997        95
> 512M-xfs             Page writes file fsmark                                12071         9
> 512M-xfs             Page writes file simple-wb                             31709         1
> 512M-xfs             Page writes file mmap-strm                            148274      2448
> 512M-4X-xfs          Page writes file fsmark                                12828         0
> 512M-4X-xfs          Page writes file simple-wb                             32168         5
> 512M-4X-xfs          Page writes file mmap-strm                            346460      4405
> 512M-16X-xfs         Page writes file fsmark                                11566        29
> 512M-16X-xfs         Page writes file simple-wb                             31935         4
> 512M-16X-xfs         Page writes file mmap-strm                             38085      4371
> 
> With a single process (512M1P), very few writes occur as the flushers
> are mostly keeping up. With 4 times more processes than there are
> CPUs (512M-4X), there are more writes by kswapd.
> 
> 1024M1P-xfs          Page writes file fsmark                                 3446         1
> 1024M1P-xfs          Page writes file simple-wb                             11697         6
> 1024M1P-xfs          Page writes file mmap-strm                              4077       446
> 1024M-xfs            Page writes file fsmark                                 5159         0
> 1024M-xfs            Page writes file simple-wb                             12785         5
> 1024M-xfs            Page writes file mmap-strm                            251153      8108
> 1024M-4X-xfs         Page writes file fsmark                                 4781         0
> 1024M-4X-xfs         Page writes file simple-wb                             12486         6
> 1024M-4X-xfs         Page writes file mmap-strm                           1627122     15000
> 1024M-16X-xfs        Page writes file fsmark                                 3777         1
> 1024M-16X-xfs        Page writes file simple-wb                             11856         2
> 1024M-16X-xfs        Page writes file mmap-strm                              6563      2638
> 4608M1P-xfs          Page writes file fsmark                                 1497         0
> 4608M1P-xfs          Page writes file simple-wb                              4305         0
> 4608M1P-xfs          Page writes file mmap-strm                             17586     10153
> 4608M-xfs            Page writes file fsmark                                 3380         0
> 4608M-xfs            Page writes file simple-wb                              5528         0
> 4608M-4X-xfs         Page writes file fsmark                                 4650         0
> 4608M-4X-xfs         Page writes file simple-wb                              5621         0
> 4608M-4X-xfs         Page writes file mmap-strm                            149751     18395
> 4608M-16X-xfs        Page writes file fsmark                                  388         0
> 4608M-16X-xfs        Page writes file simple-wb                              5466         0
> 4608M-16X-xfs        Page writes file mmap-strm                           3349772     19307
> 
> These are the same tests, just with more memory. If enough
> processes are running, kswapd will start writing pages as it tries
> to reclaim memory.
> 
> 4096M8N-xfs          Page writes file fsmark                                11571      8163
> 4096M8N-xfs          Page writes file simple-wb                             28979     11460
> 4096M8N-xfs          Page writes file mmap-strm                            178999     12181
> 4096M8N-4X-xfs       Page writes file fsmark                                14421      7487
> 4096M8N-4X-xfs       Page writes file simple-wb                             26474     10529
> 4096M8N-4X-xfs       Page writes file mmap-strm                            163770     58765
> 4096M8N-16X-xfs      Page writes file fsmark                                16726      9265
> 4096M8N-16X-xfs      Page writes file simple-wb                             28800     11129
> 4096M8N-16X-xfs      Page writes file mmap-strm                             73303     48267
> 
> This is with 8 NUMA nodes, each 512M in size. As the flusher threads
> do not target a specific node, kswapd writes pages more
> frequently.
> 

Thank you for the illustration.

> Is this what you are looking for?
> 

I just wondered how 'priority' is used throughout vmscan.c.

It's used for
  - calculate # of pages to be scanned.
  - sleep(congestion_wait())
  - change reclaim mode
  - reclaim stall detection
  - quit scan loop 
  - all_unreclaimable detection
  - swap token
  - write back skip <----- New!

To me, it seems one value is used for many purposes,
and I wonder whether that is good or not.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 5/7] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority
@ 2011-08-17  1:06         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 120+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-17  1:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Rik van Riel, Jan Kara, LKML, XFS, Christoph Hellwig, Linux-MM,
	Minchan Kim, Wu Fengguang, Johannes Weiner

On Thu, 11 Aug 2011 21:25:04 +0100
Mel Gorman <mgorman@suse.de> wrote:

> On Thu, Aug 11, 2011 at 06:10:29PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 10 Aug 2011 11:47:18 +0100
> > Mel Gorman <mgorman@suse.de> wrote:
> > 
> > > It is preferable that no dirty pages are dispatched for cleaning from
> > > the page reclaim path. At normal priorities, this patch prevents kswapd
> > > writing pages.
> > > 
> > > However, page reclaim does have a requirement that pages be freed
> > > in a particular zone. If it is failing to make sufficient progress
> > > (reclaiming < SWAP_CLUSTER_MAX at any priority priority), the priority
> > > is raised to scan more pages. A priority of DEF_PRIORITY - 3 is
> > > considered to be the point where kswapd is getting into trouble
> > > reclaiming pages. If this priority is reached, kswapd will dispatch
> > > pages for writing.
> > > 
> > > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > > Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> > 
> > 
> > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> 
> Thanks
> 
> > BTW, I'd like to see summary of the effect of priority..
> > 
> 
> What sort of summary are you looking for? If pressure is high enough,
> writes start happening from reclaim. On NUMA, it can be particularly
> pronounced. Here is a summary of page writes from reclaim over a range
> of tests
> 
> 512M1P-xfs           Page writes file fsmark                                 8113        74
> 512M1P-xfs           Page writes file simple-wb                             19895         1
> 512M1P-xfs           Page writes file mmap-strm                               997        95
> 512M-xfs             Page writes file fsmark                                12071         9
> 512M-xfs             Page writes file simple-wb                             31709         1
> 512M-xfs             Page writes file mmap-strm                            148274      2448
> 512M-4X-xfs          Page writes file fsmark                                12828         0
> 512M-4X-xfs          Page writes file simple-wb                             32168         5
> 512M-4X-xfs          Page writes file mmap-strm                            346460      4405
> 512M-16X-xfs         Page writes file fsmark                                11566        29
> 512M-16X-xfs         Page writes file simple-wb                             31935         4
> 512M-16X-xfs         Page writes file mmap-strm                             38085      4371
> 
> With 1 processor (512M1P), very few writes occur as for the most part
> flushers are keeping up. With 4x times more processors than there are
> CPUs (512M-4X), there are more writes by kswapd..
> 
> 1024M1P-xfs          Page writes file fsmark                                 3446         1
> 1024M1P-xfs          Page writes file simple-wb                             11697         6
> 1024M1P-xfs          Page writes file mmap-strm                              4077       446
> 1024M-xfs            Page writes file fsmark                                 5159         0
> 1024M-xfs            Page writes file simple-wb                             12785         5
> 1024M-xfs            Page writes file mmap-strm                            251153      8108
> 1024M-4X-xfs         Page writes file fsmark                                 4781         0
> 1024M-4X-xfs         Page writes file simple-wb                             12486         6
> 1024M-4X-xfs         Page writes file mmap-strm                           1627122     15000
> 1024M-16X-xfs        Page writes file fsmark                                 3777         1
> 1024M-16X-xfs        Page writes file simple-wb                             11856         2
> 1024M-16X-xfs        Page writes file mmap-strm                              6563      2638
> 4608M1P-xfs          Page writes file fsmark                                 1497         0
> 4608M1P-xfs          Page writes file simple-wb                              4305         0
> 4608M1P-xfs          Page writes file mmap-strm                             17586     10153
> 4608M-xfs            Page writes file fsmark                                 3380         0
> 4608M-xfs            Page writes file simple-wb                              5528         0
> 4608M-4X-xfs         Page writes file fsmark                                 4650         0
> 4608M-4X-xfs         Page writes file simple-wb                              5621         0
> 4608M-4X-xfs         Page writes file mmap-strm                            149751     18395
> 4608M-16X-xfs        Page writes file fsmark                                  388         0
> 4608M-16X-xfs        Page writes file simple-wb                              5466         0
> 4608M-16X-xfs        Page writes file mmap-strm                           3349772     19307
> 
> This is the same type of tests just with more memory. If enough
> processes are running, kswapd will start writing pages as it tries
> to reclaim memory.
> 
> 4096M8N-xfs          Page writes file fsmark                                11571      8163
> 4096M8N-xfs          Page writes file simple-wb                             28979     11460
> 4096M8N-xfs          Page writes file mmap-strm                            178999     12181
> 4096M8N-4X-xfs       Page writes file fsmark                                14421      7487
> 4096M8N-4X-xfs       Page writes file simple-wb                             26474     10529
> 4096M8N-4X-xfs       Page writes file mmap-strm                            163770     58765
> 4096M8N-16X-xfs      Page writes file fsmark                                16726      9265
> 4096M8N-16X-xfs      Page writes file simple-wb                             28800     11129
> 4096M8N-16X-xfs      Page writes file mmap-strm                             73303     48267
> 
> This is with 8 NUMA nodes, each 512M in size. As the flusher threads are
> not targetting a specific ndoe, kswapd writing pages happens more
> frequently.
> 

Thank you for illustration.

> Is this what you are looking for?
> 

I just wondered how 'priority' is used over vmscan.c

It's used for
  - calculate # of pages to be scanned.
  - sleep(congestion_wait())
  - change reclaim mode
  - reclaim stall detection
  - quit scan loop 
  - all_unreclaimable detection
  - swap token
  - write back skip <----- New!

To me, it seems a value is used for many purpose.
And I wonder whether this is good or not..

Thanks,
-Kame








_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 5/7] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority
@ 2011-08-17  1:06         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 120+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-17  1:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Thu, 11 Aug 2011 21:25:04 +0100
Mel Gorman <mgorman@suse.de> wrote:

> On Thu, Aug 11, 2011 at 06:10:29PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 10 Aug 2011 11:47:18 +0100
> > Mel Gorman <mgorman@suse.de> wrote:
> > 
> > > It is preferable that no dirty pages are dispatched for cleaning from
> > > the page reclaim path. At normal priorities, this patch prevents kswapd
> > > writing pages.
> > > 
> > > However, page reclaim does have a requirement that pages be freed
> > > in a particular zone. If it is failing to make sufficient progress
> > > (reclaiming < SWAP_CLUSTER_MAX at any priority priority), the priority
> > > is raised to scan more pages. A priority of DEF_PRIORITY - 3 is
> > > considered to be the point where kswapd is getting into trouble
> > > reclaiming pages. If this priority is reached, kswapd will dispatch
> > > pages for writing.
> > > 
> > > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > > Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> > 
> > 
> > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> 
> Thanks
> 
> > BTW, I'd like to see summary of the effect of priority..
> > 
> 
> What sort of summary are you looking for? If pressure is high enough,
> writes start happening from reclaim. On NUMA, it can be particularly
> pronounced. Here is a summary of page writes from reclaim over a range
> of tests
> 
> 512M1P-xfs           Page writes file fsmark                                 8113        74
> 512M1P-xfs           Page writes file simple-wb                             19895         1
> 512M1P-xfs           Page writes file mmap-strm                               997        95
> 512M-xfs             Page writes file fsmark                                12071         9
> 512M-xfs             Page writes file simple-wb                             31709         1
> 512M-xfs             Page writes file mmap-strm                            148274      2448
> 512M-4X-xfs          Page writes file fsmark                                12828         0
> 512M-4X-xfs          Page writes file simple-wb                             32168         5
> 512M-4X-xfs          Page writes file mmap-strm                            346460      4405
> 512M-16X-xfs         Page writes file fsmark                                11566        29
> 512M-16X-xfs         Page writes file simple-wb                             31935         4
> 512M-16X-xfs         Page writes file mmap-strm                             38085      4371
> 
> With 1 processor (512M1P), very few writes occur as for the most part
> flushers are keeping up. With 4x times more processors than there are
> CPUs (512M-4X), there are more writes by kswapd..
> 
> 1024M1P-xfs          Page writes file fsmark                                 3446         1
> 1024M1P-xfs          Page writes file simple-wb                             11697         6
> 1024M1P-xfs          Page writes file mmap-strm                              4077       446
> 1024M-xfs            Page writes file fsmark                                 5159         0
> 1024M-xfs            Page writes file simple-wb                             12785         5
> 1024M-xfs            Page writes file mmap-strm                            251153      8108
> 1024M-4X-xfs         Page writes file fsmark                                 4781         0
> 1024M-4X-xfs         Page writes file simple-wb                             12486         6
> 1024M-4X-xfs         Page writes file mmap-strm                           1627122     15000
> 1024M-16X-xfs        Page writes file fsmark                                 3777         1
> 1024M-16X-xfs        Page writes file simple-wb                             11856         2
> 1024M-16X-xfs        Page writes file mmap-strm                              6563      2638
> 4608M1P-xfs          Page writes file fsmark                                 1497         0
> 4608M1P-xfs          Page writes file simple-wb                              4305         0
> 4608M1P-xfs          Page writes file mmap-strm                             17586     10153
> 4608M-xfs            Page writes file fsmark                                 3380         0
> 4608M-xfs            Page writes file simple-wb                              5528         0
> 4608M-4X-xfs         Page writes file fsmark                                 4650         0
> 4608M-4X-xfs         Page writes file simple-wb                              5621         0
> 4608M-4X-xfs         Page writes file mmap-strm                            149751     18395
> 4608M-16X-xfs        Page writes file fsmark                                  388         0
> 4608M-16X-xfs        Page writes file simple-wb                              5466         0
> 4608M-16X-xfs        Page writes file mmap-strm                           3349772     19307
> 
> This is the same type of tests just with more memory. If enough
> processes are running, kswapd will start writing pages as it tries
> to reclaim memory.
> 
> 4096M8N-xfs          Page writes file fsmark                                11571      8163
> 4096M8N-xfs          Page writes file simple-wb                             28979     11460
> 4096M8N-xfs          Page writes file mmap-strm                            178999     12181
> 4096M8N-4X-xfs       Page writes file fsmark                                14421      7487
> 4096M8N-4X-xfs       Page writes file simple-wb                             26474     10529
> 4096M8N-4X-xfs       Page writes file mmap-strm                            163770     58765
> 4096M8N-16X-xfs      Page writes file fsmark                                16726      9265
> 4096M8N-16X-xfs      Page writes file simple-wb                             28800     11129
> 4096M8N-16X-xfs      Page writes file mmap-strm                             73303     48267
> 
> This is with 8 NUMA nodes, each 512M in size. As the flusher threads are
> not targetting a specific ndoe, kswapd writing pages happens more
> frequently.
> 

Thank you for illustration.

> Is this what you are looking for?
> 

I just wondered how 'priority' is used over vmscan.c

It's used for
  - calculate # of pages to be scanned.
  - sleep(congestion_wait())
  - change reclaim mode
  - reclaim stall detection
  - quit scan loop 
  - all_unreclaimable detection
  - swap token
  - write back skip <----- New!

To me, it seems one value is used for many purposes,
and I wonder whether that is good or not.

Thanks,
-Kame








--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 6/7] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
  2011-08-16 15:02       ` Mel Gorman
  (?)
@ 2011-08-18 14:02         ` Wu Fengguang
  -1 siblings, 0 replies; 120+ messages in thread
From: Wu Fengguang @ 2011-08-18 14:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Jan Kara, Rik van Riel, Minchan Kim

On Tue, Aug 16, 2011 at 11:02:08PM +0800, Mel Gorman wrote:
> On Tue, Aug 16, 2011 at 10:06:52PM +0800, Wu Fengguang wrote:
> > Mel,
> > 
> > I tend to agree with the whole patchset except for this one.
> > 
> > The worry comes from the fact that there is always a very possible
> > uneven distribution of dirty pages throughout the LRU lists.
> 
> It is pages under writeback, not dirty pages, that determine whether
> throttling is considered. The distinction is important. I agree with
> you that if it were dirty pages, throttling would be considered too
> regularly.

Ah right, sorry for the rushed conclusion!

btw, I guess vmscan will now progress faster due to the reduced
->pageout() calls and the implicit blocking in get_request_wait() on a
congested IO queue.

> > This
> > patch works on local information and may unnecessarily throttle page
> > reclaim when running into small spans of dirty pages.
> > 
> 
> It's also calling wait_iff_congested() not congestion_wait(). This
> takes BDI congestion and zone congestion into account with this check.
> 
>        /*
>          * If there is no congestion, or heavy congestion is not being
>          * encountered in the current zone, yield if necessary instead
>          * of sleeping on the congestion queue
>          */
>         if (atomic_read(&nr_bdi_congested[sync]) == 0 ||
>                         !zone_is_reclaim_congested(zone)) {
> 
> So global information is being taken into account.

That's right.

> > One possible scheme of global throttling is to first tag the skipped
> > page with PG_reclaim (as you already do). And to throttle page reclaim
> > only when running into pages with both PG_dirty and PG_reclaim set,
> 
> It's PG_writeback that is looked at, not PG_dirty.
> 
> > which means we have cycled through the _whole_ LRU list (which is the
> > global and adaptive feedback we want) and run into that dirty page for
> > the second time.
> > 
> 
> This potentially results in more scanning from kswapd before it starts
> throttling which could consume a lot of CPU. If pages under writeback
> are reaching the end of the LRU, it's already the case that kswapd is
> scanning faster than pages can be cleaned. Even then, it only really
> throttles if the zone or a BDI is congested.

Yeah, the first round may already eat a lot of CPU power..

> Taking that into consideration, do you still think there is a big
> advantage to having writeback pages take another lap around the LRU
> that justifies the expected increase in CPU usage?

Given that there are typically much fewer PG_writeback than PG_dirty
(except for btrfs which probably should be fixed), the current
throttle condition should be strong enough to avoid false positives.

I even start to worry about the opposite side -- reclaim could be less
throttled than necessary when some LRU is full of dirty pages and
somehow the flusher failed to focus on those pages (hence there are not
enough PG_writeback pages to wait upon at all).

In this case it may help to wait on PG_dirty&PG_reclaim and/or
PG_writeback&PG_reclaim. But the most essential task is always to make
the flusher focus more on those pages, rather than the question of
to-sleep-or-not-to-sleep, which will either block the direct reclaim
tasks for an arbitrarily long time, or act even worse by busily burning
CPU in the meantime.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 0/7] Reduce filesystem writeback from page reclaim v3
  2011-08-10 10:47 ` Mel Gorman
  (?)
@ 2011-08-18 23:54   ` Andrew Morton
  -1 siblings, 0 replies; 120+ messages in thread
From: Andrew Morton @ 2011-08-18 23:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Wed, 10 Aug 2011 11:47:13 +0100
Mel Gorman <mgorman@suse.de> wrote:

> The new problem is that
> reclaim has very little control over how long before a page in a
> particular zone or container is cleaned which is discussed later.

Confused - where was this discussed?  Please tell us more about
this problem and how it was addressed.


Another (and somewhat interrelated) potential problem I see with this
work is that it throws a big dependency onto kswapd.  If kswapd gets
stuck somewhere for extended periods, there's nothing there to perform
direct writeback.  This has happened in the past in weird situations
such as kswapd getting blocked on ext3 journal commits which are
themselves stuck for ages behind lots of writeout which itself is stuck
behind lots of reads.  That's an advantage of direct reclaim: more
threads available.

How forcefully has this stuff been tested with multiple disks per
kswapd?  Where one disk is overloaded-ext3-on-usb-stick?


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 6/7] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
  2011-08-10 10:47   ` Mel Gorman
  (?)
@ 2011-08-18 23:54     ` Andrew Morton
  -1 siblings, 0 replies; 120+ messages in thread
From: Andrew Morton @ 2011-08-18 23:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Wed, 10 Aug 2011 11:47:19 +0100
Mel Gorman <mgorman@suse.de> wrote:

> The percentage that must be in writeback depends on the priority. At
> default priority, all of them must be dirty. At DEF_PRIORITY-1, 50%
> of them must be, DEF_PRIORITY-2, 25% etc. i.e. as pressure increases
> the greater the likelihood the process will get throttled to allow
> the flusher threads to make some progress.

It'd be nice if the code comment were to capture this piece of implicit
arithmetic.  After all, it's a magic number and magic numbers should
stick out like sore thumbs.

And.. how do we know that the chosen magic numbers were optimal?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 0/7] Reduce filesystem writeback from page reclaim v3
  2011-08-18 23:54   ` Andrew Morton
  (?)
@ 2011-08-20 19:33     ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-20 19:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Thu, Aug 18, 2011 at 04:54:20PM -0700, Andrew Morton wrote:
> On Wed, 10 Aug 2011 11:47:13 +0100
> Mel Gorman <mgorman@suse.de> wrote:
> 
> > The new problem is that
> > reclaim has very little control over how long before a page in a
> > particular zone or container is cleaned which is discussed later.
> 
> Confused - where was this discussed?  Please tell us more about
> this problem and how it was addressed.
> 

I'm currently on holiday. I am only online checking train timetables.
I'll be back online properly on August 30th.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 0/7] Reduce filesystem writeback from page reclaim v3
  2011-08-18 23:54   ` Andrew Morton
  (?)
@ 2011-08-30 13:19     ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-30 13:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Thu, Aug 18, 2011 at 04:54:20PM -0700, Andrew Morton wrote:
> On Wed, 10 Aug 2011 11:47:13 +0100
> Mel Gorman <mgorman@suse.de> wrote:
> 
> > The new problem is that
> > reclaim has very little control over how long before a page in a
> > particular zone or container is cleaned which is discussed later.
> 
> Confused - where was this discussed?  Please tell us more about
> this problem and how it was addressed.
> 

This text really referred to V2 of the series where kswapd was not
writing back pages. This led to problems on NUMA as described in
https://lkml.org/lkml/2011/7/21/242 . I should have updated the text to
read

"There is a potential new problem as reclaim has less control over
how long before a page in a particular zone or container is cleaned,
and direct reclaimers depend on kswapd or flusher threads to do
the necessary work. However, as filesystems sometimes ignore direct
reclaim requests already, it is not expected to be a serious issue"

> Another (and somewhat interrelated) potential problem I see with this
> work is that it throws a big dependency onto kswapd.  If kswapd gets
> stuck somewhere for extended periods, there's nothing there to perform
> direct writeback. 

In theory, this is true. In practice, btrfs and ext4 are already
ignoring requests from direct reclaim and have been for some
time. btrfs is particularly bad in that it also ignores requests
from kswapd, leading me to believe that we are eventually going to
see stall-related bug reports on large NUMA machines with btrfs.

> This has happened in the past in weird situations
> such as kswpad getting blocked on ext3 journal commits which are
> themselves stuck for ages behind lots of writeout which itself is stuck
> behind lots of reads.  That's an advantage of direct reclaim: more
> threads available.

I do not know what these situations were, but was it possible that it was
due to too many direct reclaimers starving kswapd of access to the
journal?

> How forcefully has this stuff been tested with multiple disks per
> kswapd? 

As heavily as I could on the machine I had available. This was 4 disks
for one kswapd instance. I did not spot major problems.

> Where one disk is overloaded-ext3-on-usb-stick?
> 

I tested with ext4 on a USB stick, not ext3. It completed faster and the
interactive performance felt roughly the same.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 0/7] Reduce filesystem writeback from page reclaim v3
@ 2011-08-30 13:19     ` Mel Gorman
  0 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-30 13:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Jan Kara, LKML, XFS, Christoph Hellwig, Linux-MM,
	Minchan Kim, Wu Fengguang, Johannes Weiner

On Thu, Aug 18, 2011 at 04:54:20PM -0700, Andrew Morton wrote:
> On Wed, 10 Aug 2011 11:47:13 +0100
> Mel Gorman <mgorman@suse.de> wrote:
> 
> > The new problem is that
> > reclaim has very little control over how long before a page in a
> > particular zone or container is cleaned which is discussed later.
> 
> Confused - where was this discussed?  Please tell us more about
> this problem and how it was addressed.
> 

This text really referred to V2 of the series where kswapd was not
writing back pages. This lead to problems on NUMA as described in
https://lkml.org/lkml/2011/7/21/242 . I should have updated the text to
read

"There is a potential new problem as reclaim has less control over
how long before a page in a particularly zone or container is cleaned
and direct reclaimers depend on kswapd or flusher threads to do
the necessary work. However, as filesystems sometimes ignore direct
reclaim requests already, it is not expected to be a serious issue"

> Another (and somewhat interrelated) potential problem I see with this
> work is that it throws a big dependency onto kswapd.  If kswapd gets
> stuck somewhere for extended periods, there's nothing there to perform
> direct writeback. 

In theory, this is true. In practice, btrfs and ext4 are already
ignoring requests from direct reclaim and have been for some
time. btrfs is particularly bad in that is also ignores requests
from kswapd leading me to believe that we are eventually going to
see stall-related bug reports on large NUMA machines with btrfs.

> This has happened in the past in weird situations
> such as kswpad getting blocked on ext3 journal commits which are
> themselves stuck for ages behind lots of writeout which itself is stuck
> behind lots of reads.  That's an advantage of direct reclaim: more
> threads available.

I do not know what these situations were but was it possible that it was
due to too many direct reclaimers starving kswapd of access to the
journal?

> How forcefully has this stuff been tested with multiple disks per
> kswapd? 

As heavily as I could on the machine I had available. This was 4 disks
for one kswapd instance. I did not spot major problems.

> Where one disk is overloaded-ext3-on-usb-stick?
> 

I tested with ext4 on a USB stick, not ext3. It completed faster and the
interactive performance felt roughly the same.

-- 
Mel Gorman
SUSE Labs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 0/7] Reduce filesystem writeback from page reclaim v3
@ 2011-08-30 13:19     ` Mel Gorman
  0 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-30 13:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Thu, Aug 18, 2011 at 04:54:20PM -0700, Andrew Morton wrote:
> On Wed, 10 Aug 2011 11:47:13 +0100
> Mel Gorman <mgorman@suse.de> wrote:
> 
> > The new problem is that
> > reclaim has very little control over how long before a page in a
> > particular zone or container is cleaned which is discussed later.
> 
> Confused - where was this discussed?  Please tell us more about
> this problem and how it was addressed.
> 

This text really referred to V2 of the series where kswapd was not
writing back pages. This lead to problems on NUMA as described in
https://lkml.org/lkml/2011/7/21/242 . I should have updated the text to
read

"There is a potential new problem as reclaim has less control over
how long before a page in a particularly zone or container is cleaned
and direct reclaimers depend on kswapd or flusher threads to do
the necessary work. However, as filesystems sometimes ignore direct
reclaim requests already, it is not expected to be a serious issue"

> Another (and somewhat interrelated) potential problem I see with this
> work is that it throws a big dependency onto kswapd.  If kswapd gets
> stuck somewhere for extended periods, there's nothing there to perform
> direct writeback. 

In theory, this is true. In practice, btrfs and ext4 are already
ignoring requests from direct reclaim and have been for some
time. btrfs is particularly bad in that is also ignores requests
from kswapd leading me to believe that we are eventually going to
see stall-related bug reports on large NUMA machines with btrfs.

> This has happened in the past in weird situations
> such as kswpad getting blocked on ext3 journal commits which are
> themselves stuck for ages behind lots of writeout which itself is stuck
> behind lots of reads.  That's an advantage of direct reclaim: more
> threads available.

I do not know what these situations were, but is it possible that it
was due to too many direct reclaimers starving kswapd of access to the
journal?

> How forcefully has this stuff been tested with multiple disks per
> kswapd? 

As heavily as I could on the machine I had available. This was 4 disks
for one kswapd instance. I did not spot major problems.

> Where one disk is overloaded-ext3-on-usb-stick?
> 

I tested with ext4 on a USB stick, not ext3. It completed faster and the
interactive performance felt roughly the same.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 6/7] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
  2011-08-18 23:54     ` Andrew Morton
  (?)
@ 2011-08-30 13:49       ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-30 13:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Thu, Aug 18, 2011 at 04:54:28PM -0700, Andrew Morton wrote:
> On Wed, 10 Aug 2011 11:47:19 +0100
> Mel Gorman <mgorman@suse.de> wrote:
> 
> > The percentage that must be in writeback depends on the priority. At
> > default priority, all of them must be dirty. At DEF_PRIORITY-1, 50%
> > of them must be, DEF_PRIORITY-2, 25% etc. i.e. as pressure increases
> > the greater the likelihood the process will get throttled to allow
> > the flusher threads to make some progress.
> 
> It'd be nice if the code comment were to capture this piece of implicit
> arithmetic.  After all, it's a magic number and magic numbers should
> stick out like sore thumbs.
> 
> And.. how do we know that the chosen magic numbers were optimal?

Good question. The short answer is "we don't know, but it's not important
to get this particular decision perfect because the real throttling
should happen earlier".

Now the long answer:

For the value to be used, pages under writeback must be reaching the
end of the LRU. This implies that the rate of page consumption is
exceeding the writing speed of the backing storage. Regardless of
what decision is made, the rate of page allocation must be reduced
as the system is already in a sub-optimal state of requiring more
resources than are available.

The values are based on a simple exponential backoff function with useful
ranges of DEF_PRIORITY to DEF_PRIORITY-2, which is the point where
"kswapd is getting into trouble". However, any decreasing function
within that range is sufficient because while there might be an optimal
choice, it makes little difference overall as the decision is made
too late with no guarantee the process doing the dirtying is throttled.

The truly optimal decision is to throttle writers to slow storage
earlier in balance_dirty_pages() and have dirty_ratio scaled
proportionally to the estimated writeback speed of the underlying
storage, but we do not have that yet. This patch's throttling decision
is fairly close to the best we can do from reclaim context.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 6/7] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
  2011-08-18 23:54     ` Andrew Morton
  (?)
@ 2011-08-31  9:53       ` Mel Gorman
  -1 siblings, 0 replies; 120+ messages in thread
From: Mel Gorman @ 2011-08-31  9:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
	Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
	Minchan Kim

On Thu, Aug 18, 2011 at 04:54:28PM -0700, Andrew Morton wrote:
> On Wed, 10 Aug 2011 11:47:19 +0100
> Mel Gorman <mgorman@suse.de> wrote:
> 
> > The percentage that must be in writeback depends on the priority. At
> > default priority, all of them must be dirty. At DEF_PRIORITY-1, 50%
> > of them must be, DEF_PRIORITY-2, 25% etc. i.e. as pressure increases
> > the greater the likelihood the process will get throttled to allow
> > the flusher threads to make some progress.
> 
> It'd be nice if the code comment were to capture this piece of implicit
> arithmetic.

How about this?

==== CUT HERE ====
mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback -fix1

This patch expands on a comment on how we throttle from reclaim context.
It should be merged with
mm-vmscan-throttle-reclaim-if-encountering-too-many-dirty-pages-under-writeback.patch

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c |   26 +++++++++++++++++++++-----
 1 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33882a3..5ff3e26 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1491,11 +1491,27 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	putback_lru_pages(zone, sc, nr_anon, nr_file, &page_list);
 
 	/*
-	 * If we have encountered a high number of dirty pages under writeback
-	 * then we are reaching the end of the LRU too quickly and global
-	 * limits are not enough to throttle processes due to the page
-	 * distribution throughout zones. Scale the number of dirty pages that
-	 * must be under writeback before being throttled to priority.
+	 * If reclaim is isolating dirty pages under writeback, it implies
+	 * that the long-lived page allocation rate is exceeding the page
+	 * laundering rate. Either the global limits are not being effective
+	 * at throttling processes due to the page distribution throughout
+	 * zones or there is heavy usage of a slow backing device. The
+	 * only option is to throttle from reclaim context which is not ideal
+	 * as there is no guarantee the dirtying process is throttled in the
+	 * same way balance_dirty_pages() manages.
+	 *
+	 * This scales the number of dirty pages that must be under writeback
+	 * before throttling depending on priority. It is a simple backoff
+	 * function that has the most effect in the range DEF_PRIORITY to
+	 * DEF_PRIORITY-2, which is the point at which reclaim is
+	 * considered to be in trouble.
+	 *
+	 * DEF_PRIORITY   100% isolated pages must be PageWriteback to throttle
+	 * DEF_PRIORITY-1  50% must be PageWriteback
+	 * DEF_PRIORITY-2  25% must be PageWriteback, kswapd in trouble
+	 * ...
+	 * DEF_PRIORITY-6 For SWAP_CLUSTER_MAX isolated pages, throttle if any
+	 *                     isolated page is PageWriteback
 	 */
 	if (nr_writeback && nr_writeback >= (nr_taken >> (DEF_PRIORITY-priority)))
 		wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);

^ permalink raw reply related	[flat|nested] 120+ messages in thread

end of thread, other threads:[~2011-08-31  9:53 UTC | newest]

Thread overview: 120+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-10 10:47 [PATCH 0/7] Reduce filesystem writeback from page reclaim v3 Mel Gorman
2011-08-10 10:47 ` [PATCH 1/7] mm: vmscan: Do not writeback filesystem pages in direct reclaim Mel Gorman
2011-08-10 12:40   ` Johannes Weiner
2011-08-11  9:03   ` KAMEZAWA Hiroyuki
2011-08-11 15:57   ` Rik van Riel
2011-08-10 10:47 ` [PATCH 2/7] mm: vmscan: Remove dead code related to lumpy reclaim waiting on pages under writeback Mel Gorman
2011-08-10 12:41   ` Johannes Weiner
2011-08-10 23:19   ` Minchan Kim
2011-08-11  9:05   ` KAMEZAWA Hiroyuki
2011-08-11 16:52   ` Rik van Riel
2011-08-10 10:47 ` [PATCH 3/7] xfs: Warn if direct reclaim tries to writeback pages Mel Gorman
2011-08-11 16:53   ` Rik van Riel
2011-08-10 10:47 ` [PATCH 4/7] ext4: " Mel Gorman
2011-08-11 17:07   ` Rik van Riel
2011-08-10 10:47 ` [PATCH 5/7] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority Mel Gorman
2011-08-10 12:44   ` Johannes Weiner
2011-08-11  9:10   ` KAMEZAWA Hiroyuki
2011-08-11 20:25     ` Mel Gorman
2011-08-17  1:06       ` KAMEZAWA Hiroyuki
2011-08-11 18:18   ` Rik van Riel
2011-08-11 20:38     ` Mel Gorman
2011-08-10 10:47 ` [PATCH 6/7] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback Mel Gorman
2011-08-11  9:18   ` KAMEZAWA Hiroyuki
2011-08-12  2:47   ` Rik van Riel
2011-08-16 14:06   ` Wu Fengguang
2011-08-16 15:02     ` Mel Gorman
2011-08-18 14:02       ` Wu Fengguang
2011-08-18 23:54   ` Andrew Morton
2011-08-30 13:49     ` Mel Gorman
2011-08-31  9:53     ` Mel Gorman
2011-08-10 10:47 ` [PATCH 7/7] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes Mel Gorman
2011-08-10 23:22   ` Minchan Kim
2011-08-11  9:19   ` KAMEZAWA Hiroyuki
2011-08-12 15:27   ` Rik van Riel
2011-08-10 11:00 ` [PATCH 0/7] Reduce filesystem writeback from page reclaim v3 Christoph Hellwig
2011-08-10 11:15   ` Mel Gorman
2011-08-11 23:45     ` Christoph Hellwig
2011-08-18 23:54 ` Andrew Morton
2011-08-20 19:33   ` Mel Gorman
2011-08-30 13:19   ` Mel Gorman
