All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Use up free swap space before reaching OOM kill
@ 2013-01-09  6:21 Minchan Kim
  2013-01-09  6:21   ` Minchan Kim
  2013-01-09  6:21   ` Minchan Kim
  0 siblings, 2 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-09  6:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Minchan Kim

Recently, Luigi reported there are lots of free swap space when
OOM happens. It's easily reproduced on zram-over-swap, where
many instance of memory hogs are running and laptop_mode = 2.
http://marc.info/?l=linux-mm&m=135421750914807&w=2

This patchset fixes the problem. In fact, if we apply one of two,
we can fix the problem but I send two all because it's separate
issue although each of them solves this issues.

Andrew, Could you replace [1] with this patchset in mmotm?
I think this patchset is better than [1].

[1] mm-swap-out-anonymous-page-regardless-of-laptop_mode.patch

Minchan Kim (2):
  [1/2] mm: prevent to add a page to swap if may_writepage is unset
  [2/2] mm: forcely swapout when we are out of page cache

 mm/vmscan.c |    8 ++++++++
 1 file changed, 8 insertions(+)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-09  6:21 [PATCH 0/2] Use up free swap space before reaching OOM kill Minchan Kim
@ 2013-01-09  6:21   ` Minchan Kim
  2013-01-09  6:21   ` Minchan Kim
  1 sibling, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-09  6:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Minchan Kim

Recently, Luigi reported there are lots of free swap space when
OOM happens. It's easily reproduced on zram-over-swap, where
many instance of memory hogs are running and laptop_mode is enabled.

Luigi reported there was no problem when he disabled laptop_mode.
The problem when I investigate problem is following as.

try_to_free_pages disable may_writepage if laptop_mode is enabled.
shrink_page_list adds lots of anon pages in swap cache by
add_to_swap, which makes pages Dirty and rotate them to head of
inactive LRU without pageout. If it is repeated, inactive anon LRU
is full of Dirty and SwapCache pages.

In case of that, isolate_lru_pages fails because it try to isolate
clean page due to may_writepage == 0.

The may_writepage could be 1 only if total_scanned is higher than
writeback_threshold in do_try_to_free_pages but unfortunately,
VM can't isolate anon pages from inactive anon lru list by
above reason and we already reclaimed all file-backed pages.
So it ends up OOM killing.

This patch prevents to add a page to swap cache unnecessary when
may_writepage is unset so anoymous lru list isn't full of
Dirty/Swapcache page. So VM can isolate pages from anon lru list,
which ends up setting may_writepage to 1 and could swap out
anon lru pages. When OOM triggers, I confirmed swap space was full.

Reported-by: Luigi Semenzato <semenzato@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/vmscan.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ff869d2..439cc47 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (PageAnon(page) && !PageSwapCache(page)) {
 			if (!(sc->gfp_mask & __GFP_IO))
 				goto keep_locked;
+			if (!sc->may_writepage)
+				goto keep_locked;
 			if (!add_to_swap(page))
 				goto activate_locked;
 			may_enter_fs = 1;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-09  6:21   ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-09  6:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Minchan Kim

Recently, Luigi reported there are lots of free swap space when
OOM happens. It's easily reproduced on zram-over-swap, where
many instance of memory hogs are running and laptop_mode is enabled.

Luigi reported there was no problem when he disabled laptop_mode.
The problem when I investigate problem is following as.

try_to_free_pages disable may_writepage if laptop_mode is enabled.
shrink_page_list adds lots of anon pages in swap cache by
add_to_swap, which makes pages Dirty and rotate them to head of
inactive LRU without pageout. If it is repeated, inactive anon LRU
is full of Dirty and SwapCache pages.

In case of that, isolate_lru_pages fails because it try to isolate
clean page due to may_writepage == 0.

The may_writepage could be 1 only if total_scanned is higher than
writeback_threshold in do_try_to_free_pages but unfortunately,
VM can't isolate anon pages from inactive anon lru list by
above reason and we already reclaimed all file-backed pages.
So it ends up OOM killing.

This patch prevents to add a page to swap cache unnecessary when
may_writepage is unset so anoymous lru list isn't full of
Dirty/Swapcache page. So VM can isolate pages from anon lru list,
which ends up setting may_writepage to 1 and could swap out
anon lru pages. When OOM triggers, I confirmed swap space was full.

Reported-by: Luigi Semenzato <semenzato@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/vmscan.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ff869d2..439cc47 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (PageAnon(page) && !PageSwapCache(page)) {
 			if (!(sc->gfp_mask & __GFP_IO))
 				goto keep_locked;
+			if (!sc->may_writepage)
+				goto keep_locked;
 			if (!add_to_swap(page))
 				goto activate_locked;
 			may_enter_fs = 1;
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 2/2] mm: forcely swapout when we are out of page cache
  2013-01-09  6:21 [PATCH 0/2] Use up free swap space before reaching OOM kill Minchan Kim
@ 2013-01-09  6:21   ` Minchan Kim
  2013-01-09  6:21   ` Minchan Kim
  1 sibling, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-09  6:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Minchan Kim

If laptop_mode is enable, VM try to avoid I/O for saving the power.
But if there isn't reclaimable memory without I/O, we should do I/O
for preventing unnecessary OOM kill although we sacrifices power.

One of example is that we are out of page cache. Remained one is
only anonymous pages, for swapping out, we needs may_writepage = 1.

Reported-by: Luigi Semenzato <semenzato@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/vmscan.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 439cc47..624c816 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 		free = zone_page_state(zone, NR_FREE_PAGES);
 		if (unlikely(file + free <= high_wmark_pages(zone))) {
 			scan_balance = SCAN_ANON;
+			/*
+			 * From now on, we have to swap out
+			 * for peventing OOM kill although
+			 * we sacrifice power consumption.
+			 */
+			sc->may_writepage = 1;
 			goto out;
 		}
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 2/2] mm: forcely swapout when we are out of page cache
@ 2013-01-09  6:21   ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-09  6:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Minchan Kim

If laptop_mode is enable, VM try to avoid I/O for saving the power.
But if there isn't reclaimable memory without I/O, we should do I/O
for preventing unnecessary OOM kill although we sacrifices power.

One of example is that we are out of page cache. Remained one is
only anonymous pages, for swapping out, we needs may_writepage = 1.

Reported-by: Luigi Semenzato <semenzato@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/vmscan.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 439cc47..624c816 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 		free = zone_page_state(zone, NR_FREE_PAGES);
 		if (unlikely(file + free <= high_wmark_pages(zone))) {
 			scan_balance = SCAN_ANON;
+			/*
+			 * From now on, we have to swap out
+			 * for peventing OOM kill although
+			 * we sacrifice power consumption.
+			 */
+			sc->may_writepage = 1;
 			goto out;
 		}
 	}
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-09  6:21   ` Minchan Kim
@ 2013-01-09  6:56     ` Johannes Weiner
  -1 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2013-01-09  6:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Rik van Riel

On Wed, Jan 09, 2013 at 03:21:13PM +0900, Minchan Kim wrote:
> Recently, Luigi reported there are lots of free swap space when
> OOM happens. It's easily reproduced on zram-over-swap, where
> many instance of memory hogs are running and laptop_mode is enabled.
> 
> Luigi reported there was no problem when he disabled laptop_mode.
> The problem when I investigate problem is following as.
> 
> try_to_free_pages disable may_writepage if laptop_mode is enabled.
> shrink_page_list adds lots of anon pages in swap cache by
> add_to_swap, which makes pages Dirty and rotate them to head of
> inactive LRU without pageout. If it is repeated, inactive anon LRU
> is full of Dirty and SwapCache pages.
> 
> In case of that, isolate_lru_pages fails because it try to isolate
> clean page due to may_writepage == 0.
> 
> The may_writepage could be 1 only if total_scanned is higher than
> writeback_threshold in do_try_to_free_pages but unfortunately,
> VM can't isolate anon pages from inactive anon lru list by
> above reason and we already reclaimed all file-backed pages.
> So it ends up OOM killing.
> 
> This patch prevents to add a page to swap cache unnecessary when
> may_writepage is unset so anoymous lru list isn't full of
> Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> which ends up setting may_writepage to 1 and could swap out
> anon lru pages. When OOM triggers, I confirmed swap space was full.
> 
> Reported-by: Luigi Semenzato <semenzato@google.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

We used to ignore the page's writeback state on isolation in the past,
could you include a reference to since when this problem has been in
the tree?  Also, would it make sense to tag it for one of the stable
trees?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-09  6:56     ` Johannes Weiner
  0 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2013-01-09  6:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Rik van Riel

On Wed, Jan 09, 2013 at 03:21:13PM +0900, Minchan Kim wrote:
> Recently, Luigi reported there are lots of free swap space when
> OOM happens. It's easily reproduced on zram-over-swap, where
> many instance of memory hogs are running and laptop_mode is enabled.
> 
> Luigi reported there was no problem when he disabled laptop_mode.
> The problem when I investigate problem is following as.
> 
> try_to_free_pages disable may_writepage if laptop_mode is enabled.
> shrink_page_list adds lots of anon pages in swap cache by
> add_to_swap, which makes pages Dirty and rotate them to head of
> inactive LRU without pageout. If it is repeated, inactive anon LRU
> is full of Dirty and SwapCache pages.
> 
> In case of that, isolate_lru_pages fails because it try to isolate
> clean page due to may_writepage == 0.
> 
> The may_writepage could be 1 only if total_scanned is higher than
> writeback_threshold in do_try_to_free_pages but unfortunately,
> VM can't isolate anon pages from inactive anon lru list by
> above reason and we already reclaimed all file-backed pages.
> So it ends up OOM killing.
> 
> This patch prevents to add a page to swap cache unnecessary when
> may_writepage is unset so anoymous lru list isn't full of
> Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> which ends up setting may_writepage to 1 and could swap out
> anon lru pages. When OOM triggers, I confirmed swap space was full.
> 
> Reported-by: Luigi Semenzato <semenzato@google.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

We used to ignore the page's writeback state on isolation in the past,
could you include a reference to since when this problem has been in
the tree?  Also, would it make sense to tag it for one of the stable
trees?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-09  6:56     ` Johannes Weiner
@ 2013-01-09  7:10       ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-09  7:10 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Rik van Riel

Hi Hannes,

On Wed, Jan 09, 2013 at 01:56:12AM -0500, Johannes Weiner wrote:
> On Wed, Jan 09, 2013 at 03:21:13PM +0900, Minchan Kim wrote:
> > Recently, Luigi reported there are lots of free swap space when
> > OOM happens. It's easily reproduced on zram-over-swap, where
> > many instance of memory hogs are running and laptop_mode is enabled.
> > 
> > Luigi reported there was no problem when he disabled laptop_mode.
> > The problem when I investigate problem is following as.
> > 
> > try_to_free_pages disable may_writepage if laptop_mode is enabled.
> > shrink_page_list adds lots of anon pages in swap cache by
> > add_to_swap, which makes pages Dirty and rotate them to head of
> > inactive LRU without pageout. If it is repeated, inactive anon LRU
> > is full of Dirty and SwapCache pages.
> > 
> > In case of that, isolate_lru_pages fails because it try to isolate
> > clean page due to may_writepage == 0.
> > 
> > The may_writepage could be 1 only if total_scanned is higher than
> > writeback_threshold in do_try_to_free_pages but unfortunately,
> > VM can't isolate anon pages from inactive anon lru list by
> > above reason and we already reclaimed all file-backed pages.
> > So it ends up OOM killing.
> > 
> > This patch prevents to add a page to swap cache unnecessary when
> > may_writepage is unset so anoymous lru list isn't full of
> > Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> > which ends up setting may_writepage to 1 and could swap out
> > anon lru pages. When OOM triggers, I confirmed swap space was full.
> > 
> > Reported-by: Luigi Semenzato <semenzato@google.com>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> We used to ignore the page's writeback state on isolation in the past,
> could you include a reference to since when this problem has been in

Good idea.
It has existed since f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]
I will write down it in changelog.

> the tree?  Also, would it make sense to tag it for one of the stable
> trees?

If Luigi confirmed it, I will Cc stable@vger.kernel.org in next spin.
Thanks!

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-09  7:10       ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-09  7:10 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Rik van Riel

Hi Hannes,

On Wed, Jan 09, 2013 at 01:56:12AM -0500, Johannes Weiner wrote:
> On Wed, Jan 09, 2013 at 03:21:13PM +0900, Minchan Kim wrote:
> > Recently, Luigi reported there are lots of free swap space when
> > OOM happens. It's easily reproduced on zram-over-swap, where
> > many instance of memory hogs are running and laptop_mode is enabled.
> > 
> > Luigi reported there was no problem when he disabled laptop_mode.
> > The problem when I investigate problem is following as.
> > 
> > try_to_free_pages disable may_writepage if laptop_mode is enabled.
> > shrink_page_list adds lots of anon pages in swap cache by
> > add_to_swap, which makes pages Dirty and rotate them to head of
> > inactive LRU without pageout. If it is repeated, inactive anon LRU
> > is full of Dirty and SwapCache pages.
> > 
> > In case of that, isolate_lru_pages fails because it try to isolate
> > clean page due to may_writepage == 0.
> > 
> > The may_writepage could be 1 only if total_scanned is higher than
> > writeback_threshold in do_try_to_free_pages but unfortunately,
> > VM can't isolate anon pages from inactive anon lru list by
> > above reason and we already reclaimed all file-backed pages.
> > So it ends up OOM killing.
> > 
> > This patch prevents to add a page to swap cache unnecessary when
> > may_writepage is unset so anoymous lru list isn't full of
> > Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> > which ends up setting may_writepage to 1 and could swap out
> > anon lru pages. When OOM triggers, I confirmed swap space was full.
> > 
> > Reported-by: Luigi Semenzato <semenzato@google.com>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> We used to ignore the page's writeback state on isolation in the past,
> could you include a reference to since when this problem has been in

Good idea.
It has existed since f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]
I will write down it in changelog.

> the tree?  Also, would it make sense to tag it for one of the stable
> trees?

If Luigi confirmed it, I will Cc stable@vger.kernel.org in next spin.
Thanks!

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-09  6:21   ` Minchan Kim
@ 2013-01-10  0:18     ` Andrew Morton
  -1 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-10  0:18 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Wed,  9 Jan 2013 15:21:13 +0900
Minchan Kim <minchan@kernel.org> wrote:

> Recently, Luigi reported there are lots of free swap space when
> OOM happens. It's easily reproduced on zram-over-swap, where
> many instance of memory hogs are running and laptop_mode is enabled.
> 
> Luigi reported there was no problem when he disabled laptop_mode.
> The problem when I investigate problem is following as.
> 
> try_to_free_pages disable may_writepage if laptop_mode is enabled.
> shrink_page_list adds lots of anon pages in swap cache by
> add_to_swap, which makes pages Dirty and rotate them to head of
> inactive LRU without pageout. If it is repeated, inactive anon LRU
> is full of Dirty and SwapCache pages.
> 
> In case of that, isolate_lru_pages fails because it try to isolate
> clean page due to may_writepage == 0.
> 
> The may_writepage could be 1 only if total_scanned is higher than
> writeback_threshold in do_try_to_free_pages but unfortunately,
> VM can't isolate anon pages from inactive anon lru list by
> above reason and we already reclaimed all file-backed pages.
> So it ends up OOM killing.
> 
> This patch prevents to add a page to swap cache unnecessary when
> may_writepage is unset so anoymous lru list isn't full of
> Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> which ends up setting may_writepage to 1 and could swap out
> anon lru pages. When OOM triggers, I confirmed swap space was full.
> 
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		if (PageAnon(page) && !PageSwapCache(page)) {
>  			if (!(sc->gfp_mask & __GFP_IO))
>  				goto keep_locked;
> +			if (!sc->may_writepage)
> +				goto keep_locked;
>  			if (!add_to_swap(page))
>  				goto activate_locked;
>  			may_enter_fs = 1;

I'm not really getting it, and the description is rather hard to follow :(

We should be adding anon pages to swapcache even when laptop_mode is
set.  And we should be writing them to swap as well, then reclaiming
them.  The only thing laptop_mode shouild do is make the disk spin up
less frequently - that doesn't mean "not at all"!

So something seems screwed up here and the patch looks like a
heavy-handed workaround.  Why aren't these anon pages getting written
out in laptop_mode?



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-10  0:18     ` Andrew Morton
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-10  0:18 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Wed,  9 Jan 2013 15:21:13 +0900
Minchan Kim <minchan@kernel.org> wrote:

> Recently, Luigi reported there are lots of free swap space when
> OOM happens. It's easily reproduced on zram-over-swap, where
> many instance of memory hogs are running and laptop_mode is enabled.
> 
> Luigi reported there was no problem when he disabled laptop_mode.
> The problem when I investigate problem is following as.
> 
> try_to_free_pages disable may_writepage if laptop_mode is enabled.
> shrink_page_list adds lots of anon pages in swap cache by
> add_to_swap, which makes pages Dirty and rotate them to head of
> inactive LRU without pageout. If it is repeated, inactive anon LRU
> is full of Dirty and SwapCache pages.
> 
> In case of that, isolate_lru_pages fails because it try to isolate
> clean page due to may_writepage == 0.
> 
> The may_writepage could be 1 only if total_scanned is higher than
> writeback_threshold in do_try_to_free_pages but unfortunately,
> VM can't isolate anon pages from inactive anon lru list by
> above reason and we already reclaimed all file-backed pages.
> So it ends up OOM killing.
> 
> This patch prevents to add a page to swap cache unnecessary when
> may_writepage is unset so anoymous lru list isn't full of
> Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> which ends up setting may_writepage to 1 and could swap out
> anon lru pages. When OOM triggers, I confirmed swap space was full.
> 
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		if (PageAnon(page) && !PageSwapCache(page)) {
>  			if (!(sc->gfp_mask & __GFP_IO))
>  				goto keep_locked;
> +			if (!sc->may_writepage)
> +				goto keep_locked;
>  			if (!add_to_swap(page))
>  				goto activate_locked;
>  			may_enter_fs = 1;

I'm not really getting it, and the description is rather hard to follow :(

We should be adding anon pages to swapcache even when laptop_mode is
set.  And we should be writing them to swap as well, then reclaiming
them.  The only thing laptop_mode shouild do is make the disk spin up
less frequently - that doesn't mean "not at all"!

So something seems screwed up here and the patch looks like a
heavy-handed workaround.  Why aren't these anon pages getting written
out in laptop_mode?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-09  6:21   ` Minchan Kim
@ 2013-01-10  0:20     ` Andrew Morton
  -1 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-10  0:20 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Wed,  9 Jan 2013 15:21:13 +0900
Minchan Kim <minchan@kernel.org> wrote:

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		if (PageAnon(page) && !PageSwapCache(page)) {
>  			if (!(sc->gfp_mask & __GFP_IO))
>  				goto keep_locked;
> +			if (!sc->may_writepage)
> +				goto keep_locked;
>  			if (!add_to_swap(page))
>  				goto activate_locked;
>  			may_enter_fs = 1;

We should add a comment here explaining what's going on.  But I can't
suggest anything which sounds rational because this looks so wrong :(


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-10  0:20     ` Andrew Morton
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-10  0:20 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Wed,  9 Jan 2013 15:21:13 +0900
Minchan Kim <minchan@kernel.org> wrote:

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		if (PageAnon(page) && !PageSwapCache(page)) {
>  			if (!(sc->gfp_mask & __GFP_IO))
>  				goto keep_locked;
> +			if (!sc->may_writepage)
> +				goto keep_locked;
>  			if (!add_to_swap(page))
>  				goto activate_locked;
>  			may_enter_fs = 1;

We should add a comment here explaining what's going on.  But I can't
suggest anything which sounds rational because this looks so wrong :(

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
  2013-01-09  6:21   ` Minchan Kim
@ 2013-01-10  0:26     ` Andrew Morton
  -1 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-10  0:26 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Wed,  9 Jan 2013 15:21:14 +0900
Minchan Kim <minchan@kernel.org> wrote:

> If laptop_mode is enable, VM try to avoid I/O for saving the power.
> But if there isn't reclaimable memory without I/O, we should do I/O
> for preventing unnecessary OOM kill although we sacrifices power.
> 
> One of example is that we are out of page cache. Remained one is
> only anonymous pages, for swapping out, we needs may_writepage = 1.
> 
> Reported-by: Luigi Semenzato <semenzato@google.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  mm/vmscan.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 439cc47..624c816 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  		free = zone_page_state(zone, NR_FREE_PAGES);
>  		if (unlikely(file + free <= high_wmark_pages(zone))) {
>  			scan_balance = SCAN_ANON;
> +			/*
> +			 * From now on, we have to swap out
> +			 * for peventing OOM kill although
> +			 * we sacrifice power consumption.
> +			 */
> +			sc->may_writepage = 1;
>  			goto out;
>  		}
>  	}

This is pretty ugly.  get_scan_count() is, as its name implies, an
idempotent function which inspects the state of things and returns a
result.  As such, it has no business going in and altering the state of
the scan_control.

We have code in both direct reclaim and in kswapd to set may_writepage
if vmscan is getting into trouble.  I don't see why adding another
instance is necessary if the existing instances are working correctly.



(Is it correct that __zone_reclaim() ignores laptop_mode?)


I have a feeling that laptop mode has bitrotted and these patches are
kinda hacking around as-yet-not-understood failures...

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
@ 2013-01-10  0:26     ` Andrew Morton
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-10  0:26 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Wed,  9 Jan 2013 15:21:14 +0900
Minchan Kim <minchan@kernel.org> wrote:

> If laptop_mode is enable, VM try to avoid I/O for saving the power.
> But if there isn't reclaimable memory without I/O, we should do I/O
> for preventing unnecessary OOM kill although we sacrifices power.
> 
> One of example is that we are out of page cache. Remained one is
> only anonymous pages, for swapping out, we needs may_writepage = 1.
> 
> Reported-by: Luigi Semenzato <semenzato@google.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  mm/vmscan.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 439cc47..624c816 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  		free = zone_page_state(zone, NR_FREE_PAGES);
>  		if (unlikely(file + free <= high_wmark_pages(zone))) {
>  			scan_balance = SCAN_ANON;
> +			/*
> +			 * From now on, we have to swap out
> +			 * for peventing OOM kill although
> +			 * we sacrifice power consumption.
> +			 */
> +			sc->may_writepage = 1;
>  			goto out;
>  		}
>  	}

This is pretty ugly.  get_scan_count() is, as its name implies, an
idempotent function which inspects the state of things and returns a
result.  As such, it has no business going in and altering the state of
the scan_control.

We have code in both direct reclaim and in kswapd to set may_writepage
if vmscan is getting into trouble.  I don't see why adding another
instance is necessary if the existing instances are working correctly.



(Is it correct that __zone_reclaim() ignores laptop_mode?)


I have a feeling that laptop mode has bitrotted and these patches are
kinda hacking around as-yet-not-understood failures...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-10  0:18     ` Andrew Morton
@ 2013-01-10  2:03       ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-10  2:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

Hi Andrew,

On Wed, Jan 09, 2013 at 04:18:54PM -0800, Andrew Morton wrote:
> On Wed,  9 Jan 2013 15:21:13 +0900
> Minchan Kim <minchan@kernel.org> wrote:
> 
> > Recently, Luigi reported there are lots of free swap space when
> > OOM happens. It's easily reproduced on zram-over-swap, where
> > many instance of memory hogs are running and laptop_mode is enabled.
> > 
> > Luigi reported there was no problem when he disabled laptop_mode.
> > The problem when I investigate problem is following as.
> > 
> > try_to_free_pages disable may_writepage if laptop_mode is enabled.
> > shrink_page_list adds lots of anon pages in swap cache by
> > add_to_swap, which makes pages Dirty and rotate them to head of
> > inactive LRU without pageout. If it is repeated, inactive anon LRU
> > is full of Dirty and SwapCache pages.
> > 
> > In case of that, isolate_lru_pages fails because it try to isolate
> > clean page due to may_writepage == 0.
> > 
> > The may_writepage could be 1 only if total_scanned is higher than
> > writeback_threshold in do_try_to_free_pages but unfortunately,
> > VM can't isolate anon pages from inactive anon lru list by
> > above reason and we already reclaimed all file-backed pages.
> > So it ends up OOM killing.
> > 
> > This patch prevents to add a page to swap cache unnecessary when
> > may_writepage is unset so anoymous lru list isn't full of
> > Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> > which ends up setting may_writepage to 1 and could swap out
> > anon lru pages. When OOM triggers, I confirmed swap space was full.
> > 
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  		if (PageAnon(page) && !PageSwapCache(page)) {
> >  			if (!(sc->gfp_mask & __GFP_IO))
> >  				goto keep_locked;
> > +			if (!sc->may_writepage)
> > +				goto keep_locked;
> >  			if (!add_to_swap(page))
> >  				goto activate_locked;
> >  			may_enter_fs = 1;
> 
> I'm not really getting it, and the description is rather hard to follow :(

It seems I don't have a talent about description. :(
I hope it would be better this year. :)

> 
> We should be adding anon pages to swapcache even when laptop_mode is
> set.  And we should be writing them to swap as well, then reclaiming
> them.  The only thing laptop_mode shouild do is make the disk spin up
> less frequently - that doesn't mean "not at all"!

So it seems your rationale is that let's save power in only system has
enough memory so let's remove may_writepage in reclaim path?

If it is, I love it because I didn't see any number about power saving
through reclaiming throttling(But surely there was reason to add it)
and not sure it works well during long time because we have tweaked
reclaim part too many.

> 
> So something seems screwed up here and the patch looks like a
> heavy-handed workaround.  Why aren't these anon pages getting written
> out in laptop_mode?

Don't know. It was there long time and I don't want to screw it up.
If we decide paging out in reclaim path regardless of laptop_mode,
it makes the problem easy without ugly workaround.

Remove may_writepage? If it's too agressive, we can remove it in only
direct reclaim path.

> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-10  2:03       ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-10  2:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

Hi Andrew,

On Wed, Jan 09, 2013 at 04:18:54PM -0800, Andrew Morton wrote:
> On Wed,  9 Jan 2013 15:21:13 +0900
> Minchan Kim <minchan@kernel.org> wrote:
> 
> > Recently, Luigi reported there are lots of free swap space when
> > OOM happens. It's easily reproduced on zram-over-swap, where
> > many instance of memory hogs are running and laptop_mode is enabled.
> > 
> > Luigi reported there was no problem when he disabled laptop_mode.
> > The problem when I investigate problem is following as.
> > 
> > try_to_free_pages disable may_writepage if laptop_mode is enabled.
> > shrink_page_list adds lots of anon pages in swap cache by
> > add_to_swap, which makes pages Dirty and rotate them to head of
> > inactive LRU without pageout. If it is repeated, inactive anon LRU
> > is full of Dirty and SwapCache pages.
> > 
> > In case of that, isolate_lru_pages fails because it try to isolate
> > clean page due to may_writepage == 0.
> > 
> > The may_writepage could be 1 only if total_scanned is higher than
> > writeback_threshold in do_try_to_free_pages but unfortunately,
> > VM can't isolate anon pages from inactive anon lru list by
> > above reason and we already reclaimed all file-backed pages.
> > So it ends up OOM killing.
> > 
> > This patch prevents to add a page to swap cache unnecessary when
> > may_writepage is unset so anoymous lru list isn't full of
> > Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> > which ends up setting may_writepage to 1 and could swap out
> > anon lru pages. When OOM triggers, I confirmed swap space was full.
> > 
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  		if (PageAnon(page) && !PageSwapCache(page)) {
> >  			if (!(sc->gfp_mask & __GFP_IO))
> >  				goto keep_locked;
> > +			if (!sc->may_writepage)
> > +				goto keep_locked;
> >  			if (!add_to_swap(page))
> >  				goto activate_locked;
> >  			may_enter_fs = 1;
> 
> I'm not really getting it, and the description is rather hard to follow :(

It seems I don't have a talent about description. :(
I hope it would be better this year. :)

> 
> We should be adding anon pages to swapcache even when laptop_mode is
> set.  And we should be writing them to swap as well, then reclaiming
> them.  The only thing laptop_mode shouild do is make the disk spin up
> less frequently - that doesn't mean "not at all"!

So it seems your rationale is that let's save power in only system has
enough memory so let's remove may_writepage in reclaim path?

If it is, I love it because I didn't see any number about power saving
through reclaiming throttling(But surely there was reason to add it)
and not sure it works well during long time because we have tweaked
reclaim part too many.

> 
> So something seems screwed up here and the patch looks like a
> heavy-handed workaround.  Why aren't these anon pages getting written
> out in laptop_mode?

Don't know. It was there long time and I don't want to screw it up.
If we decide paging out in reclaim path regardless of laptop_mode,
it makes the problem easy without ugly workaround.

Remove may_writepage? If it's too agressive, we can remove it in only
direct reclaim path.

> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
  2013-01-10  0:26     ` Andrew Morton
@ 2013-01-10  2:23       ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-10  2:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Wed, Jan 09, 2013 at 04:26:02PM -0800, Andrew Morton wrote:
> On Wed,  9 Jan 2013 15:21:14 +0900
> Minchan Kim <minchan@kernel.org> wrote:
> 
> > If laptop_mode is enable, VM try to avoid I/O for saving the power.
> > But if there isn't reclaimable memory without I/O, we should do I/O
> > for preventing unnecessary OOM kill although we sacrifices power.
> > 
> > One of example is that we are out of page cache. Remained one is
> > only anonymous pages, for swapping out, we needs may_writepage = 1.
> > 
> > Reported-by: Luigi Semenzato <semenzato@google.com>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  mm/vmscan.c |    6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 439cc47..624c816 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> >  		free = zone_page_state(zone, NR_FREE_PAGES);
> >  		if (unlikely(file + free <= high_wmark_pages(zone))) {
> >  			scan_balance = SCAN_ANON;
> > +			/*
> > +			 * From now on, we have to swap out
> > +			 * for peventing OOM kill although
> > +			 * we sacrifice power consumption.
> > +			 */
> > +			sc->may_writepage = 1;
> >  			goto out;
> >  		}
> >  	}
> 
> This is pretty ugly.  get_scan_count() is, as its name implies, an
> idempotent function which inspects the state of things and returns a
> result.  As such, it has no business going in and altering the state of
> the scan_control.
> 
> We have code in both direct reclaim and in kswapd to set may_writepage
> if vmscan is getting into trouble.  I don't see why adding another
> instance is necessary if the existing instances are working correctly.
> 
> 
> 
> (Is it correct that __zone_reclaim() ignores laptop_mode?)
> 
> 
> I have a feeling that laptop mode has bitrotted and these patches are
> kinda hacking around as-yet-not-understood failures...

Absolutely, this patch is last guard for unexpectable behavior.
As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
or [2/2] but I wanted to add this as last resort in case of unexpected
emergency. But you're right. It's not good to hide the problem like this path
so let's drop [2/2].

Also, I absolutely agree it has bitrotted so for correcting it, we need a
volunteer who have to inverstigate power saveing experiment with long time.
So [1/2] would be band-aid until that.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
@ 2013-01-10  2:23       ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-10  2:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Wed, Jan 09, 2013 at 04:26:02PM -0800, Andrew Morton wrote:
> On Wed,  9 Jan 2013 15:21:14 +0900
> Minchan Kim <minchan@kernel.org> wrote:
> 
> > If laptop_mode is enable, VM try to avoid I/O for saving the power.
> > But if there isn't reclaimable memory without I/O, we should do I/O
> > for preventing unnecessary OOM kill although we sacrifices power.
> > 
> > One of example is that we are out of page cache. Remained one is
> > only anonymous pages, for swapping out, we needs may_writepage = 1.
> > 
> > Reported-by: Luigi Semenzato <semenzato@google.com>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  mm/vmscan.c |    6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 439cc47..624c816 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> >  		free = zone_page_state(zone, NR_FREE_PAGES);
> >  		if (unlikely(file + free <= high_wmark_pages(zone))) {
> >  			scan_balance = SCAN_ANON;
> > +			/*
> > +			 * From now on, we have to swap out
> > +			 * for peventing OOM kill although
> > +			 * we sacrifice power consumption.
> > +			 */
> > +			sc->may_writepage = 1;
> >  			goto out;
> >  		}
> >  	}
> 
> This is pretty ugly.  get_scan_count() is, as its name implies, an
> idempotent function which inspects the state of things and returns a
> result.  As such, it has no business going in and altering the state of
> the scan_control.
> 
> We have code in both direct reclaim and in kswapd to set may_writepage
> if vmscan is getting into trouble.  I don't see why adding another
> instance is necessary if the existing instances are working correctly.
> 
> 
> 
> (Is it correct that __zone_reclaim() ignores laptop_mode?)
> 
> 
> I have a feeling that laptop mode has bitrotted and these patches are
> kinda hacking around as-yet-not-understood failures...

Absolutely, this patch is last guard for unexpectable behavior.
As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
or [2/2] but I wanted to add this as last resort in case of unexpected
emergency. But you're right. It's not good to hide the problem like this path
so let's drop [2/2].

Also, I absolutely agree it has bitrotted so for correcting it, we need a
volunteer who have to inverstigate power saveing experiment with long time.
So [1/2] would be band-aid until that.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
  2013-01-10  2:23       ` Minchan Kim
@ 2013-01-10 21:58         ` Andrew Morton
  -1 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-10 21:58 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Thu, 10 Jan 2013 11:23:06 +0900
Minchan Kim <minchan@kernel.org> wrote:

> > I have a feeling that laptop mode has bitrotted and these patches are
> > kinda hacking around as-yet-not-understood failures...
> 
> Absolutely, this patch is last guard for unexpectable behavior.
> As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
> or [2/2] but I wanted to add this as last resort in case of unexpected
> emergency. But you're right. It's not good to hide the problem like this path
> so let's drop [2/2].
> 
> Also, I absolutely agree it has bitrotted so for correcting it, we need a
> volunteer who have to inverstigate power saveing experiment with long time.
> So [1/2] would be band-aid until that.

I'm inclined to hold off on 1/2 as well, really.

The point of laptop_mode isn't to save power btw - it is to minimise
the frequency with which the disk drive is spun up.  By deferring and
then batching writeout operations, basically.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
@ 2013-01-10 21:58         ` Andrew Morton
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-10 21:58 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Thu, 10 Jan 2013 11:23:06 +0900
Minchan Kim <minchan@kernel.org> wrote:

> > I have a feeling that laptop mode has bitrotted and these patches are
> > kinda hacking around as-yet-not-understood failures...
> 
> Absolutely, this patch is last guard for unexpectable behavior.
> As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
> or [2/2] but I wanted to add this as last resort in case of unexpected
> emergency. But you're right. It's not good to hide the problem like this path
> so let's drop [2/2].
> 
> Also, I absolutely agree it has bitrotted so for correcting it, we need a
> volunteer who have to inverstigate power saveing experiment with long time.
> So [1/2] would be band-aid until that.

I'm inclined to hold off on 1/2 as well, really.

The point of laptop_mode isn't to save power btw - it is to minimise
the frequency with which the disk drive is spun up.  By deferring and
then batching writeout operations, basically.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-10  2:03       ` Minchan Kim
  (?)
@ 2013-01-10 23:24       ` Luigi Semenzato
  2013-01-10 23:27           ` Luigi Semenzato
  2013-01-11  4:03           ` Minchan Kim
  -1 siblings, 2 replies; 58+ messages in thread
From: Luigi Semenzato @ 2013-01-10 23:24 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Rik van Riel,
	Johannes Weiner

[-- Attachment #1: Type: text/plain, Size: 4989 bytes --]

For what it's worth, I tested this patch on my 3.4 kernel, and it works as
advertised.  Here's my setup.

- 2 GB RAM
- a 3 GB zram disk for swapping
- start one "hog" process per second (each hog process mallocs and touches
200 MB of memory).
- watch /proc/meminfo

1. I verified that the problem still exists on my current 3.4 kernel.  With
laptop_mode = 2, hog processes are oom-killed when about 1.8-1.9 (out of 3)
GB of swap space are still left

2. I double-checked that the problem does not exist with laptop_mode = 0:
hog processes are oom-killed when swap space is exhausted (with good
approximation).

3. I added the two-line patch, put back laptop_mode = 2, and verified that
hog processes are oom-killed when swap space is exhausted, same as case 2.

Let me know if I can run any more tests for you, and thanks for all the
support so far!



On Wed, Jan 9, 2013 at 6:03 PM, Minchan Kim <minchan@kernel.org> wrote:

> Hi Andrew,
>
> On Wed, Jan 09, 2013 at 04:18:54PM -0800, Andrew Morton wrote:
> > On Wed,  9 Jan 2013 15:21:13 +0900
> > Minchan Kim <minchan@kernel.org> wrote:
> >
> > > Recently, Luigi reported there are lots of free swap space when
> > > OOM happens. It's easily reproduced on zram-over-swap, where
> > > many instance of memory hogs are running and laptop_mode is enabled.
> > >
> > > Luigi reported there was no problem when he disabled laptop_mode.
> > > The problem when I investigate problem is following as.
> > >
> > > try_to_free_pages disable may_writepage if laptop_mode is enabled.
> > > shrink_page_list adds lots of anon pages in swap cache by
> > > add_to_swap, which makes pages Dirty and rotate them to head of
> > > inactive LRU without pageout. If it is repeated, inactive anon LRU
> > > is full of Dirty and SwapCache pages.
> > >
> > > In case of that, isolate_lru_pages fails because it try to isolate
> > > clean page due to may_writepage == 0.
> > >
> > > The may_writepage could be 1 only if total_scanned is higher than
> > > writeback_threshold in do_try_to_free_pages but unfortunately,
> > > VM can't isolate anon pages from inactive anon lru list by
> > > above reason and we already reclaimed all file-backed pages.
> > > So it ends up OOM killing.
> > >
> > > This patch prevents to add a page to swap cache unnecessary when
> > > may_writepage is unset so anoymous lru list isn't full of
> > > Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> > > which ends up setting may_writepage to 1 and could swap out
> > > anon lru pages. When OOM triggers, I confirmed swap space was full.
> > >
> > > ...
> > >
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct
> list_head *page_list,
> > >             if (PageAnon(page) && !PageSwapCache(page)) {
> > >                     if (!(sc->gfp_mask & __GFP_IO))
> > >                             goto keep_locked;
> > > +                   if (!sc->may_writepage)
> > > +                           goto keep_locked;
> > >                     if (!add_to_swap(page))
> > >                             goto activate_locked;
> > >                     may_enter_fs = 1;
> >
> > I'm not really getting it, and the description is rather hard to follow
> :(
>
> It seems I don't have a talent about description. :(
> I hope it would be better this year. :)
>
> >
> > We should be adding anon pages to swapcache even when laptop_mode is
> > set.  And we should be writing them to swap as well, then reclaiming
> > them.  The only thing laptop_mode shouild do is make the disk spin up
> > less frequently - that doesn't mean "not at all"!
>
> So it seems your rationale is that let's save power in only system has
> enough memory so let's remove may_writepage in reclaim path?
>
> If it is, I love it because I didn't see any number about power saving
> through reclaiming throttling(But surely there was reason to add it)
> and not sure it works well during long time because we have tweaked
> reclaim part too many.
>
> >
> > So something seems screwed up here and the patch looks like a
> > heavy-handed workaround.  Why aren't these anon pages getting written
> > out in laptop_mode?
>
> Don't know. It was there long time and I don't want to screw it up.
> If we decide paging out in reclaim path regardless of laptop_mode,
> it makes the problem easy without ugly workaround.
>
> Remove may_writepage? If it's too agressive, we can remove it in only
> direct reclaim path.
>
> >
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>

[-- Attachment #2: Type: text/html, Size: 7234 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-10 23:24       ` Luigi Semenzato
@ 2013-01-10 23:27           ` Luigi Semenzato
  2013-01-11  4:03           ` Minchan Kim
  1 sibling, 0 replies; 58+ messages in thread
From: Luigi Semenzato @ 2013-01-10 23:27 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Rik van Riel,
	Johannes Weiner

[I may have screwed up my previous message, sorry if this is a
duplicate.  (Content-Policy reject msg: The message contains HTML
subpart, therefore we consider it SPAM or Outlook Virus.)]

------------------------------------------

For what it's worth, I tested this patch on my 3.4 kernel, and it
works as advertised.  Here's my setup.

- 2 GB RAM
- a 3 GB zram disk for swapping
- start one "hog" process per second (each hog process mallocs and
touches 200 MB of memory).
- watch /proc/meminfo

1. I verified that the problem still exists on my current 3.4 kernel.
With laptop_mode = 2, hog processes are oom-killed when about 1.8-1.9
(out of 3) GB of swap space are still left

2. I double-checked that the problem does not exist with laptop_mode =
0: hog processes are oom-killed when swap space is exhausted (with
good approximation).

3. I added the two-line patch, put back laptop_mode = 2, and verified
that hog processes are oom-killed when swap space is exhausted, same
as case 2.

Let me know if I can run any more tests for you, and thanks for all
the support so far!

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-10 23:27           ` Luigi Semenzato
  0 siblings, 0 replies; 58+ messages in thread
From: Luigi Semenzato @ 2013-01-10 23:27 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Rik van Riel,
	Johannes Weiner

[I may have screwed up my previous message, sorry if this is a
duplicate.  (Content-Policy reject msg: The message contains HTML
subpart, therefore we consider it SPAM or Outlook Virus.)]

------------------------------------------

For what it's worth, I tested this patch on my 3.4 kernel, and it
works as advertised.  Here's my setup.

- 2 GB RAM
- a 3 GB zram disk for swapping
- start one "hog" process per second (each hog process mallocs and
touches 200 MB of memory).
- watch /proc/meminfo

1. I verified that the problem still exists on my current 3.4 kernel.
With laptop_mode = 2, hog processes are oom-killed when about 1.8-1.9
(out of 3) GB of swap space are still left

2. I double-checked that the problem does not exist with laptop_mode =
0: hog processes are oom-killed when swap space is exhausted (with
good approximation).

3. I added the two-line patch, put back laptop_mode = 2, and verified
that hog processes are oom-killed when swap space is exhausted, same
as case 2.

Let me know if I can run any more tests for you, and thanks for all
the support so far!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-10 23:24       ` Luigi Semenzato
@ 2013-01-11  4:03           ` Minchan Kim
  2013-01-11  4:03           ` Minchan Kim
  1 sibling, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-11  4:03 UTC (permalink / raw)
  To: Luigi Semenzato
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Rik van Riel,
	Johannes Weiner

Hi Luigi,

On Thu, Jan 10, 2013 at 03:24:21PM -0800, Luigi Semenzato wrote:
> For what it's worth, I tested this patch on my 3.4 kernel, and it works as
> advertised.  Here's my setup.
> 
> - 2 GB RAM
> - a 3 GB zram disk for swapping
> - start one "hog" process per second (each hog process mallocs and touches
> 200 MB of memory).
> - watch /proc/meminfo
> 
> 1. I verified that the problem still exists on my current 3.4 kernel.  With
> laptop_mode = 2, hog processes are oom-killed when about 1.8-1.9 (out of 3)
> GB of swap space are still left
> 
> 2. I double-checked that the problem does not exist with laptop_mode = 0:
> hog processes are oom-killed when swap space is exhausted (with good
> approximation).
> 
> 3. I added the two-line patch, put back laptop_mode = 2, and verified that
> hog processes are oom-killed when swap space is exhausted, same as case 2.
> 
> Let me know if I can run any more tests for you, and thanks for all the
> support so far!

Thanks very much! But it seems Andrew doesn't like this version.
I will discuss more with him and ask again with confimred version to you.

Thanks, again.!

FYI)
After I resolves this issue, will dive into min_filelist_kbytes patch. :)
> 
> 
> 
> On Wed, Jan 9, 2013 at 6:03 PM, Minchan Kim <minchan@kernel.org> wrote:
> 
> > Hi Andrew,
> >
> > On Wed, Jan 09, 2013 at 04:18:54PM -0800, Andrew Morton wrote:
> > > On Wed,  9 Jan 2013 15:21:13 +0900
> > > Minchan Kim <minchan@kernel.org> wrote:
> > >
> > > > Recently, Luigi reported there are lots of free swap space when
> > > > OOM happens. It's easily reproduced on zram-over-swap, where
> > > > many instance of memory hogs are running and laptop_mode is enabled.
> > > >
> > > > Luigi reported there was no problem when he disabled laptop_mode.
> > > > The problem when I investigate problem is following as.
> > > >
> > > > try_to_free_pages disable may_writepage if laptop_mode is enabled.
> > > > shrink_page_list adds lots of anon pages in swap cache by
> > > > add_to_swap, which makes pages Dirty and rotate them to head of
> > > > inactive LRU without pageout. If it is repeated, inactive anon LRU
> > > > is full of Dirty and SwapCache pages.
> > > >
> > > > In case of that, isolate_lru_pages fails because it try to isolate
> > > > clean page due to may_writepage == 0.
> > > >
> > > > The may_writepage could be 1 only if total_scanned is higher than
> > > > writeback_threshold in do_try_to_free_pages but unfortunately,
> > > > VM can't isolate anon pages from inactive anon lru list by
> > > > above reason and we already reclaimed all file-backed pages.
> > > > So it ends up OOM killing.
> > > >
> > > > This patch prevents to add a page to swap cache unnecessary when
> > > > may_writepage is unset so anoymous lru list isn't full of
> > > > Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> > > > which ends up setting may_writepage to 1 and could swap out
> > > > anon lru pages. When OOM triggers, I confirmed swap space was full.
> > > >
> > > > ...
> > > >
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct
> > list_head *page_list,
> > > >             if (PageAnon(page) && !PageSwapCache(page)) {
> > > >                     if (!(sc->gfp_mask & __GFP_IO))
> > > >                             goto keep_locked;
> > > > +                   if (!sc->may_writepage)
> > > > +                           goto keep_locked;
> > > >                     if (!add_to_swap(page))
> > > >                             goto activate_locked;
> > > >                     may_enter_fs = 1;
> > >
> > > I'm not really getting it, and the description is rather hard to follow
> > :(
> >
> > It seems I don't have a talent about description. :(
> > I hope it would be better this year. :)
> >
> > >
> > > We should be adding anon pages to swapcache even when laptop_mode is
> > > set.  And we should be writing them to swap as well, then reclaiming
> > > them.  The only thing laptop_mode shouild do is make the disk spin up
> > > less frequently - that doesn't mean "not at all"!
> >
> > So it seems your rationale is that let's save power in only system has
> > enough memory so let's remove may_writepage in reclaim path?
> >
> > If it is, I love it because I didn't see any number about power saving
> > through reclaiming throttling(But surely there was reason to add it)
> > and not sure it works well during long time because we have tweaked
> > reclaim part too many.
> >
> > >
> > > So something seems screwed up here and the patch looks like a
> > > heavy-handed workaround.  Why aren't these anon pages getting written
> > > out in laptop_mode?
> >
> > Don't know. It was there long time and I don't want to screw it up.
> > If we decide paging out in reclaim path regardless of laptop_mode,
> > it makes the problem easy without ugly workaround.
> >
> > Remove may_writepage? If it's too agressive, we can remove it in only
> > direct reclaim path.
> >
> > >
> > >
> > > --
> > > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > > the body to majordomo@kvack.org.  For more info on Linux MM,
> > > see: http://www.linux-mm.org/ .
> > > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >
> > --
> > Kind regards,
> > Minchan Kim
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-11  4:03           ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-11  4:03 UTC (permalink / raw)
  To: Luigi Semenzato
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Rik van Riel,
	Johannes Weiner

Hi Luigi,

On Thu, Jan 10, 2013 at 03:24:21PM -0800, Luigi Semenzato wrote:
> For what it's worth, I tested this patch on my 3.4 kernel, and it works as
> advertised.  Here's my setup.
> 
> - 2 GB RAM
> - a 3 GB zram disk for swapping
> - start one "hog" process per second (each hog process mallocs and touches
> 200 MB of memory).
> - watch /proc/meminfo
> 
> 1. I verified that the problem still exists on my current 3.4 kernel.  With
> laptop_mode = 2, hog processes are oom-killed when about 1.8-1.9 (out of 3)
> GB of swap space are still left
> 
> 2. I double-checked that the problem does not exist with laptop_mode = 0:
> hog processes are oom-killed when swap space is exhausted (with good
> approximation).
> 
> 3. I added the two-line patch, put back laptop_mode = 2, and verified that
> hog processes are oom-killed when swap space is exhausted, same as case 2.
> 
> Let me know if I can run any more tests for you, and thanks for all the
> support so far!

Thanks very much! But it seems Andrew doesn't like this version.
I will discuss more with him and ask again with confimred version to you.

Thanks, again.!

FYI)
After I resolves this issue, will dive into min_filelist_kbytes patch. :)
> 
> 
> 
> On Wed, Jan 9, 2013 at 6:03 PM, Minchan Kim <minchan@kernel.org> wrote:
> 
> > Hi Andrew,
> >
> > On Wed, Jan 09, 2013 at 04:18:54PM -0800, Andrew Morton wrote:
> > > On Wed,  9 Jan 2013 15:21:13 +0900
> > > Minchan Kim <minchan@kernel.org> wrote:
> > >
> > > > Recently, Luigi reported there are lots of free swap space when
> > > > OOM happens. It's easily reproduced on zram-over-swap, where
> > > > many instance of memory hogs are running and laptop_mode is enabled.
> > > >
> > > > Luigi reported there was no problem when he disabled laptop_mode.
> > > > The problem when I investigate problem is following as.
> > > >
> > > > try_to_free_pages disable may_writepage if laptop_mode is enabled.
> > > > shrink_page_list adds lots of anon pages in swap cache by
> > > > add_to_swap, which makes pages Dirty and rotate them to head of
> > > > inactive LRU without pageout. If it is repeated, inactive anon LRU
> > > > is full of Dirty and SwapCache pages.
> > > >
> > > > In case of that, isolate_lru_pages fails because it try to isolate
> > > > clean page due to may_writepage == 0.
> > > >
> > > > The may_writepage could be 1 only if total_scanned is higher than
> > > > writeback_threshold in do_try_to_free_pages but unfortunately,
> > > > VM can't isolate anon pages from inactive anon lru list by
> > > > above reason and we already reclaimed all file-backed pages.
> > > > So it ends up OOM killing.
> > > >
> > > > This patch prevents to add a page to swap cache unnecessary when
> > > > may_writepage is unset so anoymous lru list isn't full of
> > > > Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> > > > which ends up setting may_writepage to 1 and could swap out
> > > > anon lru pages. When OOM triggers, I confirmed swap space was full.
> > > >
> > > > ...
> > > >
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct
> > list_head *page_list,
> > > >             if (PageAnon(page) && !PageSwapCache(page)) {
> > > >                     if (!(sc->gfp_mask & __GFP_IO))
> > > >                             goto keep_locked;
> > > > +                   if (!sc->may_writepage)
> > > > +                           goto keep_locked;
> > > >                     if (!add_to_swap(page))
> > > >                             goto activate_locked;
> > > >                     may_enter_fs = 1;
> > >
> > > I'm not really getting it, and the description is rather hard to follow
> > :(
> >
> > It seems I don't have a talent about description. :(
> > I hope it would be better this year. :)
> >
> > >
> > > We should be adding anon pages to swapcache even when laptop_mode is
> > > set.  And we should be writing them to swap as well, then reclaiming
> > > them.  The only thing laptop_mode shouild do is make the disk spin up
> > > less frequently - that doesn't mean "not at all"!
> >
> > So it seems your rationale is that let's save power in only system has
> > enough memory so let's remove may_writepage in reclaim path?
> >
> > If it is, I love it because I didn't see any number about power saving
> > through reclaiming throttling(But surely there was reason to add it)
> > and not sure it works well during long time because we have tweaked
> > reclaim part too many.
> >
> > >
> > > So something seems screwed up here and the patch looks like a
> > > heavy-handed workaround.  Why aren't these anon pages getting written
> > > out in laptop_mode?
> >
> > Don't know. It was there long time and I don't want to screw it up.
> > If we decide paging out in reclaim path regardless of laptop_mode,
> > it makes the problem easy without ugly workaround.
> >
> > Remove may_writepage? If it's too agressive, we can remove it in only
> > direct reclaim path.
> >
> > >
> > >
> > > --
> > > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > > the body to majordomo@kvack.org.  For more info on Linux MM,
> > > see: http://www.linux-mm.org/ .
> > > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >
> > --
> > Kind regards,
> > Minchan Kim
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
  2013-01-10 21:58         ` Andrew Morton
@ 2013-01-11  4:43           ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-11  4:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

Hi Andrew,

On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
> On Thu, 10 Jan 2013 11:23:06 +0900
> Minchan Kim <minchan@kernel.org> wrote:
> 
> > > I have a feeling that laptop mode has bitrotted and these patches are
> > > kinda hacking around as-yet-not-understood failures...
> > 
> > Absolutely, this patch is last guard for unexpectable behavior.
> > As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
> > or [2/2] but I wanted to add this as last resort in case of unexpected
> > emergency. But you're right. It's not good to hide the problem like this path
> > so let's drop [2/2].
> > 
> > Also, I absolutely agree it has bitrotted so for correcting it, we need a
> > volunteer who have to inverstigate power saveing experiment with long time.
> > So [1/2] would be band-aid until that.
> 
> I'm inclined to hold off on 1/2 as well, really.

Then, what's your plan?

It's real bug since f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]
was introduced. Some portable device could use laptop_mode to save batter power.
AFAIK, the usecase was trial of ChromeOS and Luigi reported this problem although
they decided to disable laptop_mode due to other reason which laptop_mode burns out
power for a very long time in their some workload.

Another problem of laptop_mode isn't aware of in-memory swap, like zram.
So unconditionally, prevent to swap out. :( Yeb. it's another story to be fixed.

If you hate this version, how about this?
This version does following as.

1. get_scan_count forces only file-backed pages reclaiming if may_writepage is false.
   It prevents unnecessary CPU consumption and LRU churing with anon pages.
2. If memory reclaim suffers(ie, below DEF_PRIORITY - 2), may_writepage would be true
   in only direct reclaim path.

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 73b64a3..695b907 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -550,6 +550,8 @@ static inline int zone_is_oom_locked(const struct zone *zone)
  */
 #define DEF_PRIORITY 12
 
+#define HARD_TO_RECLAIM_PRIO (DEF_PRIORITY - 2)
+
 /* Maximum number of zones on a zonelist */
 #define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ff869d2..4c63bda 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -814,7 +814,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 */
 			if (page_is_file_cache(page) &&
 					(!current_is_kswapd() ||
-					 sc->priority >= DEF_PRIORITY - 2)) {
+					 sc->priority >= HARD_TO_RECLAIM_PRIO)) {
 				/*
 				 * Immediately reclaim when written back.
 				 * Similar in principal to deactivate_page()
@@ -1683,8 +1683,11 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	if (!global_reclaim(sc))
 		force_scan = true;
 
-	/* If we have no swap space, do not bother scanning anon pages. */
-	if (!sc->may_swap || (nr_swap_pages <= 0)) {
+	/*
+	 * If we have no swap space or may_writepage is false,
+	 * do not bother scanning anon pages.
+	 */
+	if (!sc->may_swap || !sc->may_writepage || (nr_swap_pages <= 0)) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
@@ -1879,7 +1882,7 @@ static bool in_reclaim_compaction(struct scan_control *sc)
 {
 	if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
 			(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
-			 sc->priority < DEF_PRIORITY - 2))
+			 sc->priority < HARD_TO_RECLAIM_PRIO))
 		return true;
 
 	return false;
@@ -2215,9 +2218,16 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 			sc->may_writepage = 1;
 		}
 
+		/*
+		 * This is a safety belt to prevent OOM kill through reclaiming
+		 * pages with sacrificing the power.
+		 */
+		if (sc->priority < HARD_TO_RECLAIM_PRIO)
+			sc->may_writepage = 1;
+
 		/* Take a nap, wait for some writeback to complete */
 		if (!sc->hibernation_mode && sc->nr_scanned &&
-		    sc->priority < DEF_PRIORITY - 2) {
+		    sc->priority < HARD_TO_RECLAIM_PRIO) {
 			struct zone *preferred_zone;
 
 			first_zones_zonelist(zonelist, gfp_zone(sc->gfp_mask),
@@ -2824,7 +2834,7 @@ loop_again:
 		 * OK, kswapd is getting into trouble.  Take a nap, then take
 		 * another pass across the zones.
 		 */
-		if (total_scanned && (sc.priority < DEF_PRIORITY - 2)) {
+		if (total_scanned && (sc.priority < HARD_TO_RECLAIM_PRIO)) {
 			if (has_under_min_watermark_zone)
 				count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
 			else if (unbalanced_zone)

> 
> The point of laptop_mode isn't to save power btw - it is to minimise
> the frequency with which the disk drive is spun up.  By deferring and
> then batching writeout operations, basically.

I don't get it. Why should we minimise such frequency?
It's for saving the power to increase batter life.
As I real all document about laptop_mode, they all said about the power
or battery life saving.

1. Documentation/laptops/laptop-mode.txt
2. http://linux.die.net/man/8/laptop_mode
3. http://samwel.tk/laptop_mode/
3. http://www.thinkwiki.org/wiki/Laptop-mode 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
@ 2013-01-11  4:43           ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-11  4:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

Hi Andrew,

On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
> On Thu, 10 Jan 2013 11:23:06 +0900
> Minchan Kim <minchan@kernel.org> wrote:
> 
> > > I have a feeling that laptop mode has bitrotted and these patches are
> > > kinda hacking around as-yet-not-understood failures...
> > 
> > Absolutely, this patch is last guard for unexpectable behavior.
> > As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
> > or [2/2] but I wanted to add this as last resort in case of unexpected
> > emergency. But you're right. It's not good to hide the problem like this path
> > so let's drop [2/2].
> > 
> > Also, I absolutely agree it has bitrotted so for correcting it, we need a
> > volunteer who have to inverstigate power saveing experiment with long time.
> > So [1/2] would be band-aid until that.
> 
> I'm inclined to hold off on 1/2 as well, really.

Then, what's your plan?

It's real bug since f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]
was introduced. Some portable device could use laptop_mode to save batter power.
AFAIK, the usecase was trial of ChromeOS and Luigi reported this problem although
they decided to disable laptop_mode due to other reason which laptop_mode burns out
power for a very long time in their some workload.

Another problem of laptop_mode isn't aware of in-memory swap, like zram.
So unconditionally, prevent to swap out. :( Yeb. it's another story to be fixed.

If you hate this version, how about this?
This version does following as.

1. get_scan_count forces only file-backed pages reclaiming if may_writepage is false.
   It prevents unnecessary CPU consumption and LRU churing with anon pages.
2. If memory reclaim suffers(ie, below DEF_PRIORITY - 2), may_writepage would be true
   in only direct reclaim path.

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 73b64a3..695b907 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -550,6 +550,8 @@ static inline int zone_is_oom_locked(const struct zone *zone)
  */
 #define DEF_PRIORITY 12
 
+#define HARD_TO_RECLAIM_PRIO (DEF_PRIORITY - 2)
+
 /* Maximum number of zones on a zonelist */
 #define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ff869d2..4c63bda 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -814,7 +814,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 */
 			if (page_is_file_cache(page) &&
 					(!current_is_kswapd() ||
-					 sc->priority >= DEF_PRIORITY - 2)) {
+					 sc->priority >= HARD_TO_RECLAIM_PRIO)) {
 				/*
 				 * Immediately reclaim when written back.
 				 * Similar in principal to deactivate_page()
@@ -1683,8 +1683,11 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	if (!global_reclaim(sc))
 		force_scan = true;
 
-	/* If we have no swap space, do not bother scanning anon pages. */
-	if (!sc->may_swap || (nr_swap_pages <= 0)) {
+	/*
+	 * If we have no swap space or may_writepage is false,
+	 * do not bother scanning anon pages.
+	 */
+	if (!sc->may_swap || !sc->may_writepage || (nr_swap_pages <= 0)) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
@@ -1879,7 +1882,7 @@ static bool in_reclaim_compaction(struct scan_control *sc)
 {
 	if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
 			(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
-			 sc->priority < DEF_PRIORITY - 2))
+			 sc->priority < HARD_TO_RECLAIM_PRIO))
 		return true;
 
 	return false;
@@ -2215,9 +2218,16 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 			sc->may_writepage = 1;
 		}
 
+		/*
+		 * This is a safety belt to prevent OOM kill through reclaiming
+		 * pages with sacrificing the power.
+		 */
+		if (sc->priority < HARD_TO_RECLAIM_PRIO)
+			sc->may_writepage = 1;
+
 		/* Take a nap, wait for some writeback to complete */
 		if (!sc->hibernation_mode && sc->nr_scanned &&
-		    sc->priority < DEF_PRIORITY - 2) {
+		    sc->priority < HARD_TO_RECLAIM_PRIO) {
 			struct zone *preferred_zone;
 
 			first_zones_zonelist(zonelist, gfp_zone(sc->gfp_mask),
@@ -2824,7 +2834,7 @@ loop_again:
 		 * OK, kswapd is getting into trouble.  Take a nap, then take
 		 * another pass across the zones.
 		 */
-		if (total_scanned && (sc.priority < DEF_PRIORITY - 2)) {
+		if (total_scanned && (sc.priority < HARD_TO_RECLAIM_PRIO)) {
 			if (has_under_min_watermark_zone)
 				count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
 			else if (unbalanced_zone)

> 
> The point of laptop_mode isn't to save power btw - it is to minimise
> the frequency with which the disk drive is spun up.  By deferring and
> then batching writeout operations, basically.

I don't get it. Why should we minimise such frequency?
It's for saving the power to increase batter life.
As I real all document about laptop_mode, they all said about the power
or battery life saving.

1. Documentation/laptops/laptop-mode.txt
2. http://linux.die.net/man/8/laptop_mode
3. http://samwel.tk/laptop_mode/
3. http://www.thinkwiki.org/wiki/Laptop-mode 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
  2013-01-11  4:43           ` Minchan Kim
@ 2013-01-16  0:09             ` Andrew Morton
  -1 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-16  0:09 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Fri, 11 Jan 2013 13:43:27 +0900
Minchan Kim <minchan@kernel.org> wrote:

> Hi Andrew,
> 
> On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
> > On Thu, 10 Jan 2013 11:23:06 +0900
> > Minchan Kim <minchan@kernel.org> wrote:
> > 
> > > > I have a feeling that laptop mode has bitrotted and these patches are
> > > > kinda hacking around as-yet-not-understood failures...
> > > 
> > > Absolutely, this patch is last guard for unexpectable behavior.
> > > As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
> > > or [2/2] but I wanted to add this as last resort in case of unexpected
> > > emergency. But you're right. It's not good to hide the problem like this path
> > > so let's drop [2/2].
> > > 
> > > Also, I absolutely agree it has bitrotted so for correcting it, we need a
> > > volunteer who have to inverstigate power saveing experiment with long time.
> > > So [1/2] would be band-aid until that.
> > 
> > I'm inclined to hold off on 1/2 as well, really.
> 
> Then, what's your plan?

My plan is to sit here until someone gets down and fully tests and
fixes laptop-mode.  Making it work properly, reliably and as-designed.

Or perhaps someone wants to make the case that we just don't need it
any more (SSDs are silent!) and removes it all.

> > 
> > The point of laptop_mode isn't to save power btw - it is to minimise
> > the frequency with which the disk drive is spun up.  By deferring and
> > then batching writeout operations, basically.
> 
> I don't get it. Why should we minimise such frequency?

Because my laptop was going clickety every minute and was keeping me
awake.

> It's for saving the power to increase batter life.

It might well have that effect, dunno.  That wasn't my intent.  Testing
needed!

> As I real all document about laptop_mode, they all said about the power
> or battery life saving.
> 
> 1. Documentation/laptops/laptop-mode.txt
> 2. http://linux.die.net/man/8/laptop_mode
> 3. http://samwel.tk/laptop_mode/
> 3. http://www.thinkwiki.org/wiki/Laptop-mode 

Documentation creep ;)

Ten years ago, gad: http://lwn.net/Articles/1652/

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
@ 2013-01-16  0:09             ` Andrew Morton
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-16  0:09 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Fri, 11 Jan 2013 13:43:27 +0900
Minchan Kim <minchan@kernel.org> wrote:

> Hi Andrew,
> 
> On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
> > On Thu, 10 Jan 2013 11:23:06 +0900
> > Minchan Kim <minchan@kernel.org> wrote:
> > 
> > > > I have a feeling that laptop mode has bitrotted and these patches are
> > > > kinda hacking around as-yet-not-understood failures...
> > > 
> > > Absolutely, this patch is last guard for unexpectable behavior.
> > > As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
> > > or [2/2] but I wanted to add this as last resort in case of unexpected
> > > emergency. But you're right. It's not good to hide the problem like this path
> > > so let's drop [2/2].
> > > 
> > > Also, I absolutely agree it has bitrotted so for correcting it, we need a
> > > volunteer who have to inverstigate power saveing experiment with long time.
> > > So [1/2] would be band-aid until that.
> > 
> > I'm inclined to hold off on 1/2 as well, really.
> 
> Then, what's your plan?

My plan is to sit here until someone gets down and fully tests and
fixes laptop-mode.  Making it work properly, reliably and as-designed.

Or perhaps someone wants to make the case that we just don't need it
any more (SSDs are silent!) and removes it all.

> > 
> > The point of laptop_mode isn't to save power btw - it is to minimise
> > the frequency with which the disk drive is spun up.  By deferring and
> > then batching writeout operations, basically.
> 
> I don't get it. Why should we minimise such frequency?

Because my laptop was going clickety every minute and was keeping me
awake.

> It's for saving the power to increase batter life.

It might well have that effect, dunno.  That wasn't my intent.  Testing
needed!

> As I real all document about laptop_mode, they all said about the power
> or battery life saving.
> 
> 1. Documentation/laptops/laptop-mode.txt
> 2. http://linux.die.net/man/8/laptop_mode
> 3. http://samwel.tk/laptop_mode/
> 3. http://www.thinkwiki.org/wiki/Laptop-mode 

Documentation creep ;)

Ten years ago, gad: http://lwn.net/Articles/1652/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
  2013-01-16  0:09             ` Andrew Morton
@ 2013-01-16  0:32               ` Sonny Rao
  -1 siblings, 0 replies; 58+ messages in thread
From: Sonny Rao @ 2013-01-16  0:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, linux-mm, linux-kernel, Dan Magenheimer,
	Bryan Freed, Hugh Dickins, Rik van Riel, Johannes Weiner,
	Sameer Nanda

On Tue, Jan 15, 2013 at 4:09 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Fri, 11 Jan 2013 13:43:27 +0900
> Minchan Kim <minchan@kernel.org> wrote:
>
>> Hi Andrew,
>>
>> On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
>> > On Thu, 10 Jan 2013 11:23:06 +0900
>> > Minchan Kim <minchan@kernel.org> wrote:
>> >
>> > > > I have a feeling that laptop mode has bitrotted and these patches are
>> > > > kinda hacking around as-yet-not-understood failures...
>> > >
>> > > Absolutely, this patch is last guard for unexpectable behavior.
>> > > As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
>> > > or [2/2] but I wanted to add this as last resort in case of unexpected
>> > > emergency. But you're right. It's not good to hide the problem like this path
>> > > so let's drop [2/2].
>> > >
>> > > Also, I absolutely agree it has bitrotted so for correcting it, we need a
>> > > volunteer who have to inverstigate power saveing experiment with long time.
>> > > So [1/2] would be band-aid until that.
>> >
>> > I'm inclined to hold off on 1/2 as well, really.
>>
>> Then, what's your plan?
>
> My plan is to sit here until someone gets down and fully tests and
> fixes laptop-mode.  Making it work properly, reliably and as-designed.
>

I think we should agree on what the goals are first.

> Or perhaps someone wants to make the case that we just don't need it
> any more (SSDs are silent!) and removes it all.
>
>> >
>> > The point of laptop_mode isn't to save power btw - it is to minimise
>> > the frequency with which the disk drive is spun up.  By deferring and
>> > then batching writeout operations, basically.
>>
>> I don't get it. Why should we minimise such frequency?
>
> Because my laptop was going clickety every minute and was keeping me
> awake.
>

Very interesting, I don't know if anyone realized that (or we just forgot) :-)

>> It's for saving the power to increase batter life.
>
> It might well have that effect, dunno.  That wasn't my intent.  Testing
> needed!
>

Power saving is certainly why we had it on originally for ChromeOS,
but we turned it off due to misbehavior.

Specifically, we saw a pathological behavior where we'd end up writing
to the disk every few seconds when laptop mode was turned on.  This
turned out to be because laptop-mode sets a timer which is used to
check for new dirty data after the initial flush and writes that out
before spinning the disk down, and on ChromeOS various chatty daemons
on the system were logging and dirtying data more or less constantly
so there was almost always something there to be written out.  So what
ended up happening was that we'd need to do a read, then wake up the
disk, and then keep writing every few seconds for a long period of
time, which had the opposite effect from what we wanted.  The issues
with zram swap just confirmed that we didn't want laptop mode.

Most of our devices have had SSDs rather than spinning disks, so noise
wasn't an issue, although when we finally did support an official
device with a spinning disk people certainly complained when the disk
started clicking all the time (due to the underflow in the writeback
code).   We do know that current SSDs save a significant amount of
power when they go into standby, so minimizing disk writes is still
useful on these devices.

A very simple laptop mode which only does a single sync when we spin
up the disk, and didn't bother with the timer behavior or muck with
swap behavior might be something that is more useful for us, and I
suspect it might simplify the writeback code somewhat as well.

>> As I real all document about laptop_mode, they all said about the power
>> or battery life saving.
>>
>> 1. Documentation/laptops/laptop-mode.txt
>> 2. http://linux.die.net/man/8/laptop_mode
>> 3. http://samwel.tk/laptop_mode/
>> 3. http://www.thinkwiki.org/wiki/Laptop-mode
>
> Documentation creep ;)
>
> Ten years ago, gad: http://lwn.net/Articles/1652/

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
@ 2013-01-16  0:32               ` Sonny Rao
  0 siblings, 0 replies; 58+ messages in thread
From: Sonny Rao @ 2013-01-16  0:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, linux-mm, linux-kernel, Dan Magenheimer,
	Bryan Freed, Hugh Dickins, Rik van Riel, Johannes Weiner,
	Sameer Nanda

On Tue, Jan 15, 2013 at 4:09 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Fri, 11 Jan 2013 13:43:27 +0900
> Minchan Kim <minchan@kernel.org> wrote:
>
>> Hi Andrew,
>>
>> On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
>> > On Thu, 10 Jan 2013 11:23:06 +0900
>> > Minchan Kim <minchan@kernel.org> wrote:
>> >
>> > > > I have a feeling that laptop mode has bitrotted and these patches are
>> > > > kinda hacking around as-yet-not-understood failures...
>> > >
>> > > Absolutely, this patch is last guard for unexpectable behavior.
>> > > As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
>> > > or [2/2] but I wanted to add this as last resort in case of unexpected
>> > > emergency. But you're right. It's not good to hide the problem like this path
>> > > so let's drop [2/2].
>> > >
>> > > Also, I absolutely agree it has bitrotted so for correcting it, we need a
>> > > volunteer who have to inverstigate power saveing experiment with long time.
>> > > So [1/2] would be band-aid until that.
>> >
>> > I'm inclined to hold off on 1/2 as well, really.
>>
>> Then, what's your plan?
>
> My plan is to sit here until someone gets down and fully tests and
> fixes laptop-mode.  Making it work properly, reliably and as-designed.
>

I think we should agree on what the goals are first.

> Or perhaps someone wants to make the case that we just don't need it
> any more (SSDs are silent!) and removes it all.
>
>> >
>> > The point of laptop_mode isn't to save power btw - it is to minimise
>> > the frequency with which the disk drive is spun up.  By deferring and
>> > then batching writeout operations, basically.
>>
>> I don't get it. Why should we minimise such frequency?
>
> Because my laptop was going clickety every minute and was keeping me
> awake.
>

Very interesting, I don't know if anyone realized that (or we just forgot) :-)

>> It's for saving the power to increase batter life.
>
> It might well have that effect, dunno.  That wasn't my intent.  Testing
> needed!
>

Power saving is certainly why we had it on originally for ChromeOS,
but we turned it off due to misbehavior.

Specifically, we saw a pathological behavior where we'd end up writing
to the disk every few seconds when laptop mode was turned on.  This
turned out to be because laptop-mode sets a timer which is used to
check for new dirty data after the initial flush and writes that out
before spinning the disk down, and on ChromeOS various chatty daemons
on the system were logging and dirtying data more or less constantly
so there was almost always something there to be written out.  So what
ended up happening was that we'd need to do a read, then wake up the
disk, and then keep writing every few seconds for a long period of
time, which had the opposite effect from what we wanted.  The issues
with zram swap just confirmed that we didn't want laptop mode.

Most of our devices have had SSDs rather than spinning disks, so noise
wasn't an issue, although when we finally did support an official
device with a spinning disk people certainly complained when the disk
started clicking all the time (due to the underflow in the writeback
code).   We do know that current SSDs save a significant amount of
power when they go into standby, so minimizing disk writes is still
useful on these devices.

A very simple laptop mode which only does a single sync when we spin
up the disk, and didn't bother with the timer behavior or muck with
swap behavior might be something that is more useful for us, and I
suspect it might simplify the writeback code somewhat as well.

>> As I real all document about laptop_mode, they all said about the power
>> or battery life saving.
>>
>> 1. Documentation/laptops/laptop-mode.txt
>> 2. http://linux.die.net/man/8/laptop_mode
>> 3. http://samwel.tk/laptop_mode/
>> 3. http://www.thinkwiki.org/wiki/Laptop-mode
>
> Documentation creep ;)
>
> Ten years ago, gad: http://lwn.net/Articles/1652/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
  2013-01-16  0:32               ` Sonny Rao
@ 2013-01-16  0:50                 ` Andrew Morton
  -1 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-16  0:50 UTC (permalink / raw)
  To: Sonny Rao
  Cc: Minchan Kim, linux-mm, linux-kernel, Dan Magenheimer,
	Bryan Freed, Hugh Dickins, Rik van Riel, Johannes Weiner,
	Sameer Nanda

On Tue, 15 Jan 2013 16:32:38 -0800
Sonny Rao <sonnyrao@google.com> wrote:

> >> It's for saving the power to increase batter life.
> >
> > It might well have that effect, dunno.  That wasn't my intent.  Testing
> > needed!
> >
> 
> Power saving is certainly why we had it on originally for ChromeOS,
> but we turned it off due to misbehavior.
> 
> Specifically, we saw a pathological behavior where we'd end up writing
> to the disk every few seconds when laptop mode was turned on.  This
> turned out to be because laptop-mode sets a timer which is used to
> check for new dirty data after the initial flush and writes that out
> before spinning the disk down, and on ChromeOS various chatty daemons
> on the system were logging and dirtying data more or less constantly
> so there was almost always something there to be written out.  So what
> ended up happening was that we'd need to do a read, then wake up the
> disk, and then keep writing every few seconds for a long period of
> time, which had the opposite effect from what we wanted.

So after the read, the disk would chatter away doing a dribble of
writes?  That sounds like plain brokenness (and why did the chrome guys
not tell anyone about it?!?!?).  The idea is that when the physical
read occurs, we should opportunistically flush out all pending writes,
while the disk is running.  Then go back into
buffer-writes-for-a-long-time mode.

I forget what we did with fsync() and friends.  Quite a lot of
pestiferous applications like to do fsync quite frequently.  I had a
special kernel in which fsync() consisted of "return 0;", but ISTR
there being some resistance to productizing that idea.

>  The issues
> with zram swap just confirmed that we didn't want laptop mode.
>
> Most of our devices have had SSDs rather than spinning disks, so noise
> wasn't an issue, although when we finally did support an official
> device with a spinning disk people certainly complained when the disk
> started clicking all the time

hm, it's interesting that the general idea still has vailidity.  It
would be a fun project for someone to sniff out all the requirements,
fixup/enhance/rewrite the current implementation and generally make it
all spiffy and nice.

> (due to the underflow in the writeback code).

To what underflow do you refer?

> We do know that current SSDs save a significant amount of
> power when they go into standby, so minimizing disk writes is still
> useful on these devices.
> 
> A very simple laptop mode which only does a single sync when we spin
> up the disk, and didn't bother with the timer behavior or muck with
> swap behavior might be something that is more useful for us, and I
> suspect it might simplify the writeback code somewhat as well.

I don't think I understand the problem with the timer.  My original RFC
said

: laptop_writeback_centisecs
: --------------------------
: 
: This tunable determines the maximum age of dirty data when the machine
: is operating in Laptop mode.  The default value is 30000 - five
: minutes.  This means that if applications are generating a small amount
: of write traffic, the disk will spin up once per five minutes.
: 
: If the disk is spun up for any other reason (such as for a read) then
: all dirty data will be flushed anyway, and this timer is reset to zero.

which all sounds very sensible and shouldn't exhibit the behavior you
observed.




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
@ 2013-01-16  0:50                 ` Andrew Morton
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-16  0:50 UTC (permalink / raw)
  To: Sonny Rao
  Cc: Minchan Kim, linux-mm, linux-kernel, Dan Magenheimer,
	Bryan Freed, Hugh Dickins, Rik van Riel, Johannes Weiner,
	Sameer Nanda

On Tue, 15 Jan 2013 16:32:38 -0800
Sonny Rao <sonnyrao@google.com> wrote:

> >> It's for saving the power to increase batter life.
> >
> > It might well have that effect, dunno.  That wasn't my intent.  Testing
> > needed!
> >
> 
> Power saving is certainly why we had it on originally for ChromeOS,
> but we turned it off due to misbehavior.
> 
> Specifically, we saw a pathological behavior where we'd end up writing
> to the disk every few seconds when laptop mode was turned on.  This
> turned out to be because laptop-mode sets a timer which is used to
> check for new dirty data after the initial flush and writes that out
> before spinning the disk down, and on ChromeOS various chatty daemons
> on the system were logging and dirtying data more or less constantly
> so there was almost always something there to be written out.  So what
> ended up happening was that we'd need to do a read, then wake up the
> disk, and then keep writing every few seconds for a long period of
> time, which had the opposite effect from what we wanted.

So after the read, the disk would chatter away doing a dribble of
writes?  That sounds like plain brokenness (and why did the chrome guys
not tell anyone about it?!?!?).  The idea is that when the physical
read occurs, we should opportunistically flush out all pending writes,
while the disk is running.  Then go back into
buffer-writes-for-a-long-time mode.

I forget what we did with fsync() and friends.  Quite a lot of
pestiferous applications like to do fsync quite frequently.  I had a
special kernel in which fsync() consisted of "return 0;", but ISTR
there being some resistance to productizing that idea.

>  The issues
> with zram swap just confirmed that we didn't want laptop mode.
>
> Most of our devices have had SSDs rather than spinning disks, so noise
> wasn't an issue, although when we finally did support an official
> device with a spinning disk people certainly complained when the disk
> started clicking all the time

hm, it's interesting that the general idea still has vailidity.  It
would be a fun project for someone to sniff out all the requirements,
fixup/enhance/rewrite the current implementation and generally make it
all spiffy and nice.

> (due to the underflow in the writeback code).

To what underflow do you refer?

> We do know that current SSDs save a significant amount of
> power when they go into standby, so minimizing disk writes is still
> useful on these devices.
> 
> A very simple laptop mode which only does a single sync when we spin
> up the disk, and didn't bother with the timer behavior or muck with
> swap behavior might be something that is more useful for us, and I
> suspect it might simplify the writeback code somewhat as well.

I don't think I understand the problem with the timer.  My original RFC
said

: laptop_writeback_centisecs
: --------------------------
: 
: This tunable determines the maximum age of dirty data when the machine
: is operating in Laptop mode.  The default value is 30000 - five
: minutes.  This means that if applications are generating a small amount
: of write traffic, the disk will spin up once per five minutes.
: 
: If the disk is spun up for any other reason (such as for a read) then
: all dirty data will be flushed anyway, and this timer is reset to zero.

which all sounds very sensible and shouldn't exhibit the behavior you
observed.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
  2013-01-16  0:50                 ` Andrew Morton
@ 2013-01-16  1:21                   ` Sonny Rao
  -1 siblings, 0 replies; 58+ messages in thread
From: Sonny Rao @ 2013-01-16  1:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, linux-mm, linux-kernel, Dan Magenheimer,
	Bryan Freed, Hugh Dickins, Rik van Riel, Johannes Weiner,
	Sameer Nanda

On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 15 Jan 2013 16:32:38 -0800
> Sonny Rao <sonnyrao@google.com> wrote:
>
>> >> It's for saving the power to increase batter life.
>> >
>> > It might well have that effect, dunno.  That wasn't my intent.  Testing
>> > needed!
>> >
>>
>> Power saving is certainly why we had it on originally for ChromeOS,
>> but we turned it off due to misbehavior.
>>
>> Specifically, we saw a pathological behavior where we'd end up writing
>> to the disk every few seconds when laptop mode was turned on.  This
>> turned out to be because laptop-mode sets a timer which is used to
>> check for new dirty data after the initial flush and writes that out
>> before spinning the disk down, and on ChromeOS various chatty daemons
>> on the system were logging and dirtying data more or less constantly
>> so there was almost always something there to be written out.  So what
>> ended up happening was that we'd need to do a read, then wake up the
>> disk, and then keep writing every few seconds for a long period of
>> time, which had the opposite effect from what we wanted.
>
> So after the read, the disk would chatter away doing a dribble of
> writes?  That sounds like plain brokenness (and why did the chrome guys
> not tell anyone about it?!?!?).

Yes, either read or fsync.  I ranted about it a little (here:
http://marc.info/?l=linux-mm&m=135422986220016&w=4), but mostly
assumed it was working as expected, and that ChromeOS was just
dirtying data at an absurd pace.  Might have been a bad assumption and
I could have been more explicit about reporting it, sorry about that.

> The idea is that when the physical
> read occurs, we should opportunistically flush out all pending writes,
> while the disk is running.  Then go back into
> buffer-writes-for-a-long-time mode.
>

See the comment in page-writeback.c above laptop_io_completion():

/*
 * We've spun up the disk and we're in laptop mode: schedule writeback
 * of all dirty data a few seconds from now.  If the flush is already
scheduled
 * then push it back - the user is still using the disk.
 */
void laptop_io_completion(struct backing_dev_info *info)

What ends up happening fairly often is that there's always something
dirty with that few seconds (or even one second) on our system.

> I forget what we did with fsync() and friends.  Quite a lot of
> pestiferous applications like to do fsync quite frequently.  I had a
> special kernel in which fsync() consisted of "return 0;", but ISTR
> there being some resistance to productizing that idea.
>

Yeah, we have this problem and we try to fix up users of fsync() as we
find them but it's a bit of a never-ending battle.  Such a feature
would be useful.

>>  The issues
>> with zram swap just confirmed that we didn't want laptop mode.
>>
>> Most of our devices have had SSDs rather than spinning disks, so noise
>> wasn't an issue, although when we finally did support an official
>> device with a spinning disk people certainly complained when the disk
>> started clicking all the time
>
> hm, it's interesting that the general idea still has vailidity.  It
> would be a fun project for someone to sniff out all the requirements,
> fixup/enhance/rewrite the current implementation and generally make it
> all spiffy and nice.
>
>> (due to the underflow in the writeback code).
>
> To what underflow do you refer?
>
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754

That particular bug caused writes to happen almost instantly after the
underflow ocurred, and consequently slowed write throughput to a crawl
because there was no chance for contiguous writes to gather.

>> We do know that current SSDs save a significant amount of
>> power when they go into standby, so minimizing disk writes is still
>> useful on these devices.
>>
>> A very simple laptop mode which only does a single sync when we spin
>> up the disk, and didn't bother with the timer behavior or muck with
>> swap behavior might be something that is more useful for us, and I
>> suspect it might simplify the writeback code somewhat as well.
>
> I don't think I understand the problem with the timer.  My original RFC
> said
>
> : laptop_writeback_centisecs
> : --------------------------
> :
> : This tunable determines the maximum age of dirty data when the machine
> : is operating in Laptop mode.  The default value is 30000 - five
> : minutes.  This means that if applications are generating a small amount
> : of write traffic, the disk will spin up once per five minutes.
> :
> : If the disk is spun up for any other reason (such as for a read) then
> : all dirty data will be flushed anyway, and this timer is reset to zero.
>
> which all sounds very sensible and shouldn't exhibit the behavior you
> observed.
>

The laptop-mode timer get re-armed after each writeback (see above
laptop_io_completion function), even if it was caused by laptop-mode
itself.  So, if something is continually dirtying a little bit of
data, we end up getting a chain of small writes which keeps the disk
awake for long periods of time.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
@ 2013-01-16  1:21                   ` Sonny Rao
  0 siblings, 0 replies; 58+ messages in thread
From: Sonny Rao @ 2013-01-16  1:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, linux-mm, linux-kernel, Dan Magenheimer,
	Bryan Freed, Hugh Dickins, Rik van Riel, Johannes Weiner,
	Sameer Nanda

On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 15 Jan 2013 16:32:38 -0800
> Sonny Rao <sonnyrao@google.com> wrote:
>
>> >> It's for saving the power to increase batter life.
>> >
>> > It might well have that effect, dunno.  That wasn't my intent.  Testing
>> > needed!
>> >
>>
>> Power saving is certainly why we had it on originally for ChromeOS,
>> but we turned it off due to misbehavior.
>>
>> Specifically, we saw a pathological behavior where we'd end up writing
>> to the disk every few seconds when laptop mode was turned on.  This
>> turned out to be because laptop-mode sets a timer which is used to
>> check for new dirty data after the initial flush and writes that out
>> before spinning the disk down, and on ChromeOS various chatty daemons
>> on the system were logging and dirtying data more or less constantly
>> so there was almost always something there to be written out.  So what
>> ended up happening was that we'd need to do a read, then wake up the
>> disk, and then keep writing every few seconds for a long period of
>> time, which had the opposite effect from what we wanted.
>
> So after the read, the disk would chatter away doing a dribble of
> writes?  That sounds like plain brokenness (and why did the chrome guys
> not tell anyone about it?!?!?).

Yes, either read or fsync.  I ranted about it a little (here:
http://marc.info/?l=linux-mm&m=135422986220016&w=4), but mostly
assumed it was working as expected, and that ChromeOS was just
dirtying data at an absurd pace.  Might have been a bad assumption and
I could have been more explicit about reporting it, sorry about that.

> The idea is that when the physical
> read occurs, we should opportunistically flush out all pending writes,
> while the disk is running.  Then go back into
> buffer-writes-for-a-long-time mode.
>

See the comment in page-writeback.c above laptop_io_completion():

/*
 * We've spun up the disk and we're in laptop mode: schedule writeback
 * of all dirty data a few seconds from now.  If the flush is already
scheduled
 * then push it back - the user is still using the disk.
 */
void laptop_io_completion(struct backing_dev_info *info)

What ends up happening fairly often is that there's always something
dirty with that few seconds (or even one second) on our system.

> I forget what we did with fsync() and friends.  Quite a lot of
> pestiferous applications like to do fsync quite frequently.  I had a
> special kernel in which fsync() consisted of "return 0;", but ISTR
> there being some resistance to productizing that idea.
>

Yeah, we have this problem and we try to fix up users of fsync() as we
find them but it's a bit of a never-ending battle.  Such a feature
would be useful.

>>  The issues
>> with zram swap just confirmed that we didn't want laptop mode.
>>
>> Most of our devices have had SSDs rather than spinning disks, so noise
>> wasn't an issue, although when we finally did support an official
>> device with a spinning disk people certainly complained when the disk
>> started clicking all the time
>
> hm, it's interesting that the general idea still has vailidity.  It
> would be a fun project for someone to sniff out all the requirements,
> fixup/enhance/rewrite the current implementation and generally make it
> all spiffy and nice.
>
>> (due to the underflow in the writeback code).
>
> To what underflow do you refer?
>
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754

That particular bug caused writes to happen almost instantly after the
underflow ocurred, and consequently slowed write throughput to a crawl
because there was no chance for contiguous writes to gather.

>> We do know that current SSDs save a significant amount of
>> power when they go into standby, so minimizing disk writes is still
>> useful on these devices.
>>
>> A very simple laptop mode which only does a single sync when we spin
>> up the disk, and didn't bother with the timer behavior or muck with
>> swap behavior might be something that is more useful for us, and I
>> suspect it might simplify the writeback code somewhat as well.
>
> I don't think I understand the problem with the timer.  My original RFC
> said
>
> : laptop_writeback_centisecs
> : --------------------------
> :
> : This tunable determines the maximum age of dirty data when the machine
> : is operating in Laptop mode.  The default value is 30000 - five
> : minutes.  This means that if applications are generating a small amount
> : of write traffic, the disk will spin up once per five minutes.
> :
> : If the disk is spun up for any other reason (such as for a read) then
> : all dirty data will be flushed anyway, and this timer is reset to zero.
>
> which all sounds very sensible and shouldn't exhibit the behavior you
> observed.
>

The laptop-mode timer get re-armed after each writeback (see above
laptop_io_completion function), even if it was caused by laptop-mode
itself.  So, if something is continually dirtying a little bit of
data, we end up getting a chain of small writes which keeps the disk
awake for long periods of time.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
  2013-01-16  0:09             ` Andrew Morton
@ 2013-01-16  4:43               ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-16  4:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Tue, Jan 15, 2013 at 04:09:57PM -0800, Andrew Morton wrote:
> On Fri, 11 Jan 2013 13:43:27 +0900
> Minchan Kim <minchan@kernel.org> wrote:
> 
> > Hi Andrew,
> > 
> > On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
> > > On Thu, 10 Jan 2013 11:23:06 +0900
> > > Minchan Kim <minchan@kernel.org> wrote:
> > > 
> > > > > I have a feeling that laptop mode has bitrotted and these patches are
> > > > > kinda hacking around as-yet-not-understood failures...
> > > > 
> > > > Absolutely, this patch is last guard for unexpectable behavior.
> > > > As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
> > > > or [2/2] but I wanted to add this as last resort in case of unexpected
> > > > emergency. But you're right. It's not good to hide the problem like this path
> > > > so let's drop [2/2].
> > > > 
> > > > Also, I absolutely agree it has bitrotted so for correcting it, we need a
> > > > volunteer who have to inverstigate power saveing experiment with long time.
> > > > So [1/2] would be band-aid until that.
> > > 
> > > I'm inclined to hold off on 1/2 as well, really.
> > 
> > Then, what's your plan?
> 
> My plan is to sit here until someone gets down and fully tests and
> fixes laptop-mode.  Making it work properly, reliably and as-designed.
> 
> Or perhaps someone wants to make the case that we just don't need it
> any more (SSDs are silent!) and removes it all.
> 
> > > 
> > > The point of laptop_mode isn't to save power btw - it is to minimise
> > > the frequency with which the disk drive is spun up.  By deferring and
> > > then batching writeout operations, basically.
> > 
> > I don't get it. Why should we minimise such frequency?
> 
> Because my laptop was going clickety every minute and was keeping me
> awake.
> 
> > It's for saving the power to increase batter life.
> 
> It might well have that effect, dunno.  That wasn't my intent.  Testing
> needed!
> 
> > As I real all document about laptop_mode, they all said about the power
> > or battery life saving.
> > 
> > 1. Documentation/laptops/laptop-mode.txt
> > 2. http://linux.die.net/man/8/laptop_mode
> > 3. http://samwel.tk/laptop_mode/
> > 3. http://www.thinkwiki.org/wiki/Laptop-mode 
> 
> Documentation creep ;)
> 
> Ten years ago, gad: http://lwn.net/Articles/1652/

Odd, I grep it in linux-history.git and found this.
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=93d33a4885a483c708ccb7d24b56e0d5fef7bcab

It seem to be first commit about laptop_mode but it still said about battery life
, NOT clickety. But unfortunately, it had no number, measure method and even no
side-effect when the memory pressure is severe so we couldn't sure how it helped
about batter life without reclaim problem so the VM problem have been exported
since we apply f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware].

So let's apply [1/2] in mainline and even stable to fix the problem.
After that, we can add warning to laptop_mode so user who have used it will claim their
requirements. With it, we can know they need it for power saving, clickety or
, both so we can make requirement lists. From then, we can start to do someting.
If we are luck, we can remove it totally if any user doesn't claim.

What do you think about it?

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
@ 2013-01-16  4:43               ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-16  4:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Tue, Jan 15, 2013 at 04:09:57PM -0800, Andrew Morton wrote:
> On Fri, 11 Jan 2013 13:43:27 +0900
> Minchan Kim <minchan@kernel.org> wrote:
> 
> > Hi Andrew,
> > 
> > On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
> > > On Thu, 10 Jan 2013 11:23:06 +0900
> > > Minchan Kim <minchan@kernel.org> wrote:
> > > 
> > > > > I have a feeling that laptop mode has bitrotted and these patches are
> > > > > kinda hacking around as-yet-not-understood failures...
> > > > 
> > > > Absolutely, this patch is last guard for unexpectable behavior.
> > > > As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
> > > > or [2/2] but I wanted to add this as last resort in case of unexpected
> > > > emergency. But you're right. It's not good to hide the problem like this path
> > > > so let's drop [2/2].
> > > > 
> > > > Also, I absolutely agree it has bitrotted so for correcting it, we need a
> > > > volunteer who have to inverstigate power saveing experiment with long time.
> > > > So [1/2] would be band-aid until that.
> > > 
> > > I'm inclined to hold off on 1/2 as well, really.
> > 
> > Then, what's your plan?
> 
> My plan is to sit here until someone gets down and fully tests and
> fixes laptop-mode.  Making it work properly, reliably and as-designed.
> 
> Or perhaps someone wants to make the case that we just don't need it
> any more (SSDs are silent!) and removes it all.
> 
> > > 
> > > The point of laptop_mode isn't to save power btw - it is to minimise
> > > the frequency with which the disk drive is spun up.  By deferring and
> > > then batching writeout operations, basically.
> > 
> > I don't get it. Why should we minimise such frequency?
> 
> Because my laptop was going clickety every minute and was keeping me
> awake.
> 
> > It's for saving the power to increase batter life.
> 
> It might well have that effect, dunno.  That wasn't my intent.  Testing
> needed!
> 
> > As I real all document about laptop_mode, they all said about the power
> > or battery life saving.
> > 
> > 1. Documentation/laptops/laptop-mode.txt
> > 2. http://linux.die.net/man/8/laptop_mode
> > 3. http://samwel.tk/laptop_mode/
> > 3. http://www.thinkwiki.org/wiki/Laptop-mode 
> 
> Documentation creep ;)
> 
> Ten years ago, gad: http://lwn.net/Articles/1652/

Odd, I grep it in linux-history.git and found this.
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=93d33a4885a483c708ccb7d24b56e0d5fef7bcab

It seem to be first commit about laptop_mode but it still said about battery life
, NOT clickety. But unfortunately, it had no number, measure method and even no
side-effect when the memory pressure is severe so we couldn't sure how it helped
about batter life without reclaim problem so the VM problem have been exported
since we apply f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware].

So let's apply [1/2] in mainline and even stable to fix the problem.
After that, we can add warning to laptop_mode so user who have used it will claim their
requirements. With it, we can know they need it for power saving, clickety or
, both so we can make requirement lists. From then, we can start to do someting.
If we are luck, we can remove it totally if any user doesn't claim.

What do you think about it?

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
  2013-01-16  1:21                   ` Sonny Rao
@ 2013-01-16  4:47                     ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-16  4:47 UTC (permalink / raw)
  To: Sonny Rao
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Bryan Freed, Hugh Dickins, Rik van Riel, Johannes Weiner,
	Sameer Nanda

On Tue, Jan 15, 2013 at 05:21:15PM -0800, Sonny Rao wrote:
> On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Tue, 15 Jan 2013 16:32:38 -0800
> > Sonny Rao <sonnyrao@google.com> wrote:
> >
> >> >> It's for saving the power to increase batter life.
> >> >
> >> > It might well have that effect, dunno.  That wasn't my intent.  Testing
> >> > needed!
> >> >
> >>
> >> Power saving is certainly why we had it on originally for ChromeOS,
> >> but we turned it off due to misbehavior.
> >>
> >> Specifically, we saw a pathological behavior where we'd end up writing
> >> to the disk every few seconds when laptop mode was turned on.  This
> >> turned out to be because laptop-mode sets a timer which is used to
> >> check for new dirty data after the initial flush and writes that out
> >> before spinning the disk down, and on ChromeOS various chatty daemons
> >> on the system were logging and dirtying data more or less constantly
> >> so there was almost always something there to be written out.  So what
> >> ended up happening was that we'd need to do a read, then wake up the
> >> disk, and then keep writing every few seconds for a long period of
> >> time, which had the opposite effect from what we wanted.
> >
> > So after the read, the disk would chatter away doing a dribble of
> > writes?  That sounds like plain brokenness (and why did the chrome guys
> > not tell anyone about it?!?!?).
> 
> Yes, either read or fsync.  I ranted about it a little (here:
> http://marc.info/?l=linux-mm&m=135422986220016&w=4), but mostly
> assumed it was working as expected, and that ChromeOS was just
> dirtying data at an absurd pace.  Might have been a bad assumption and
> I could have been more explicit about reporting it, sorry about that.
> 
> > The idea is that when the physical
> > read occurs, we should opportunistically flush out all pending writes,
> > while the disk is running.  Then go back into
> > buffer-writes-for-a-long-time mode.
> >
> 
> See the comment in page-writeback.c above laptop_io_completion():
> 
> /*
>  * We've spun up the disk and we're in laptop mode: schedule writeback
>  * of all dirty data a few seconds from now.  If the flush is already
> scheduled
>  * then push it back - the user is still using the disk.
>  */
> void laptop_io_completion(struct backing_dev_info *info)
> 
> What ends up happening fairly often is that there's always something
> dirty with that few seconds (or even one second) on our system.
> 
> > I forget what we did with fsync() and friends.  Quite a lot of
> > pestiferous applications like to do fsync quite frequently.  I had a
> > special kernel in which fsync() consisted of "return 0;", but ISTR
> > there being some resistance to productizing that idea.
> >
> 
> Yeah, we have this problem and we try to fix up users of fsync() as we
> find them but it's a bit of a never-ending battle.  Such a feature
> would be useful.
> 
> >>  The issues
> >> with zram swap just confirmed that we didn't want laptop mode.
> >>
> >> Most of our devices have had SSDs rather than spinning disks, so noise
> >> wasn't an issue, although when we finally did support an official
> >> device with a spinning disk people certainly complained when the disk
> >> started clicking all the time
> >
> > hm, it's interesting that the general idea still has vailidity.  It
> > would be a fun project for someone to sniff out all the requirements,
> > fixup/enhance/rewrite the current implementation and generally make it
> > all spiffy and nice.
> >
> >> (due to the underflow in the writeback code).
> >
> > To what underflow do you refer?
> >
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754
> 
> That particular bug caused writes to happen almost instantly after the
> underflow ocurred, and consequently slowed write throughput to a crawl
> because there was no chance for contiguous writes to gather.
> 
> >> We do know that current SSDs save a significant amount of
> >> power when they go into standby, so minimizing disk writes is still
> >> useful on these devices.
> >>
> >> A very simple laptop mode which only does a single sync when we spin
> >> up the disk, and didn't bother with the timer behavior or muck with
> >> swap behavior might be something that is more useful for us, and I
> >> suspect it might simplify the writeback code somewhat as well.
> >
> > I don't think I understand the problem with the timer.  My original RFC
> > said
> >
> > : laptop_writeback_centisecs
> > : --------------------------
> > :
> > : This tunable determines the maximum age of dirty data when the machine
> > : is operating in Laptop mode.  The default value is 30000 - five
> > : minutes.  This means that if applications are generating a small amount
> > : of write traffic, the disk will spin up once per five minutes.
> > :
> > : If the disk is spun up for any other reason (such as for a read) then
> > : all dirty data will be flushed anyway, and this timer is reset to zero.
> >
> > which all sounds very sensible and shouldn't exhibit the behavior you
> > observed.
> >
> 
> The laptop-mode timer get re-armed after each writeback (see above
> laptop_io_completion function), even if it was caused by laptop-mode
> itself.  So, if something is continually dirtying a little bit of
> data, we end up getting a chain of small writes which keeps the disk
> awake for long periods of time.

Out of curiosity, for saving the power, why don' you increase the value for
laptop_mode?

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
@ 2013-01-16  4:47                     ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-16  4:47 UTC (permalink / raw)
  To: Sonny Rao
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Bryan Freed, Hugh Dickins, Rik van Riel, Johannes Weiner,
	Sameer Nanda

On Tue, Jan 15, 2013 at 05:21:15PM -0800, Sonny Rao wrote:
> On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Tue, 15 Jan 2013 16:32:38 -0800
> > Sonny Rao <sonnyrao@google.com> wrote:
> >
> >> >> It's for saving the power to increase batter life.
> >> >
> >> > It might well have that effect, dunno.  That wasn't my intent.  Testing
> >> > needed!
> >> >
> >>
> >> Power saving is certainly why we had it on originally for ChromeOS,
> >> but we turned it off due to misbehavior.
> >>
> >> Specifically, we saw a pathological behavior where we'd end up writing
> >> to the disk every few seconds when laptop mode was turned on.  This
> >> turned out to be because laptop-mode sets a timer which is used to
> >> check for new dirty data after the initial flush and writes that out
> >> before spinning the disk down, and on ChromeOS various chatty daemons
> >> on the system were logging and dirtying data more or less constantly
> >> so there was almost always something there to be written out.  So what
> >> ended up happening was that we'd need to do a read, then wake up the
> >> disk, and then keep writing every few seconds for a long period of
> >> time, which had the opposite effect from what we wanted.
> >
> > So after the read, the disk would chatter away doing a dribble of
> > writes?  That sounds like plain brokenness (and why did the chrome guys
> > not tell anyone about it?!?!?).
> 
> Yes, either read or fsync.  I ranted about it a little (here:
> http://marc.info/?l=linux-mm&m=135422986220016&w=4), but mostly
> assumed it was working as expected, and that ChromeOS was just
> dirtying data at an absurd pace.  Might have been a bad assumption and
> I could have been more explicit about reporting it, sorry about that.
> 
> > The idea is that when the physical
> > read occurs, we should opportunistically flush out all pending writes,
> > while the disk is running.  Then go back into
> > buffer-writes-for-a-long-time mode.
> >
> 
> See the comment in page-writeback.c above laptop_io_completion():
> 
> /*
>  * We've spun up the disk and we're in laptop mode: schedule writeback
>  * of all dirty data a few seconds from now.  If the flush is already
> scheduled
>  * then push it back - the user is still using the disk.
>  */
> void laptop_io_completion(struct backing_dev_info *info)
> 
> What ends up happening fairly often is that there's always something
> dirty with that few seconds (or even one second) on our system.
> 
> > I forget what we did with fsync() and friends.  Quite a lot of
> > pestiferous applications like to do fsync quite frequently.  I had a
> > special kernel in which fsync() consisted of "return 0;", but ISTR
> > there being some resistance to productizing that idea.
> >
> 
> Yeah, we have this problem and we try to fix up users of fsync() as we
> find them but it's a bit of a never-ending battle.  Such a feature
> would be useful.
> 
> >>  The issues
> >> with zram swap just confirmed that we didn't want laptop mode.
> >>
> >> Most of our devices have had SSDs rather than spinning disks, so noise
> >> wasn't an issue, although when we finally did support an official
> >> device with a spinning disk people certainly complained when the disk
> >> started clicking all the time
> >
> > hm, it's interesting that the general idea still has vailidity.  It
> > would be a fun project for someone to sniff out all the requirements,
> > fixup/enhance/rewrite the current implementation and generally make it
> > all spiffy and nice.
> >
> >> (due to the underflow in the writeback code).
> >
> > To what underflow do you refer?
> >
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754
> 
> That particular bug caused writes to happen almost instantly after the
> underflow ocurred, and consequently slowed write throughput to a crawl
> because there was no chance for contiguous writes to gather.
> 
> >> We do know that current SSDs save a significant amount of
> >> power when they go into standby, so minimizing disk writes is still
> >> useful on these devices.
> >>
> >> A very simple laptop mode which only does a single sync when we spin
> >> up the disk, and didn't bother with the timer behavior or muck with
> >> swap behavior might be something that is more useful for us, and I
> >> suspect it might simplify the writeback code somewhat as well.
> >
> > I don't think I understand the problem with the timer.  My original RFC
> > said
> >
> > : laptop_writeback_centisecs
> > : --------------------------
> > :
> > : This tunable determines the maximum age of dirty data when the machine
> > : is operating in Laptop mode.  The default value is 30000 - five
> > : minutes.  This means that if applications are generating a small amount
> > : of write traffic, the disk will spin up once per five minutes.
> > :
> > : If the disk is spun up for any other reason (such as for a read) then
> > : all dirty data will be flushed anyway, and this timer is reset to zero.
> >
> > which all sounds very sensible and shouldn't exhibit the behavior you
> > observed.
> >
> 
> The laptop-mode timer get re-armed after each writeback (see above
> laptop_io_completion function), even if it was caused by laptop-mode
> itself.  So, if something is continually dirtying a little bit of
> data, we end up getting a chain of small writes which keeps the disk
> awake for long periods of time.

Out of curiosity, for saving the power, why don' you increase the value for
laptop_mode?

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
  2013-01-16  4:47                     ` Minchan Kim
@ 2013-01-16 20:08                       ` Sonny Rao
  -1 siblings, 0 replies; 58+ messages in thread
From: Sonny Rao @ 2013-01-16 20:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Bryan Freed, Hugh Dickins, Rik van Riel, Johannes Weiner,
	Sameer Nanda

On Tue, Jan 15, 2013 at 8:47 PM, Minchan Kim <minchan@kernel.org> wrote:
> On Tue, Jan 15, 2013 at 05:21:15PM -0800, Sonny Rao wrote:
>> On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>> > On Tue, 15 Jan 2013 16:32:38 -0800
>> > Sonny Rao <sonnyrao@google.com> wrote:
>> >
>> >> >> It's for saving the power to increase batter life.
>> >> >
>> >> > It might well have that effect, dunno.  That wasn't my intent.  Testing
>> >> > needed!
>> >> >
>> >>
>> >> Power saving is certainly why we had it on originally for ChromeOS,
>> >> but we turned it off due to misbehavior.
>> >>
>> >> Specifically, we saw a pathological behavior where we'd end up writing
>> >> to the disk every few seconds when laptop mode was turned on.  This
>> >> turned out to be because laptop-mode sets a timer which is used to
>> >> check for new dirty data after the initial flush and writes that out
>> >> before spinning the disk down, and on ChromeOS various chatty daemons
>> >> on the system were logging and dirtying data more or less constantly
>> >> so there was almost always something there to be written out.  So what
>> >> ended up happening was that we'd need to do a read, then wake up the
>> >> disk, and then keep writing every few seconds for a long period of
>> >> time, which had the opposite effect from what we wanted.
>> >
>> > So after the read, the disk would chatter away doing a dribble of
>> > writes?  That sounds like plain brokenness (and why did the chrome guys
>> > not tell anyone about it?!?!?).
>>
>> Yes, either read or fsync.  I ranted about it a little (here:
>> http://marc.info/?l=linux-mm&m=135422986220016&w=4), but mostly
>> assumed it was working as expected, and that ChromeOS was just
>> dirtying data at an absurd pace.  Might have been a bad assumption and
>> I could have been more explicit about reporting it, sorry about that.
>>
>> > The idea is that when the physical
>> > read occurs, we should opportunistically flush out all pending writes,
>> > while the disk is running.  Then go back into
>> > buffer-writes-for-a-long-time mode.
>> >
>>
>> See the comment in page-writeback.c above laptop_io_completion():
>>
>> /*
>>  * We've spun up the disk and we're in laptop mode: schedule writeback
>>  * of all dirty data a few seconds from now.  If the flush is already
>> scheduled
>>  * then push it back - the user is still using the disk.
>>  */
>> void laptop_io_completion(struct backing_dev_info *info)
>>
>> What ends up happening fairly often is that there's always something
>> dirty with that few seconds (or even one second) on our system.
>>
>> > I forget what we did with fsync() and friends.  Quite a lot of
>> > pestiferous applications like to do fsync quite frequently.  I had a
>> > special kernel in which fsync() consisted of "return 0;", but ISTR
>> > there being some resistance to productizing that idea.
>> >
>>
>> Yeah, we have this problem and we try to fix up users of fsync() as we
>> find them but it's a bit of a never-ending battle.  Such a feature
>> would be useful.
>>
>> >>  The issues
>> >> with zram swap just confirmed that we didn't want laptop mode.
>> >>
>> >> Most of our devices have had SSDs rather than spinning disks, so noise
>> >> wasn't an issue, although when we finally did support an official
>> >> device with a spinning disk people certainly complained when the disk
>> >> started clicking all the time
>> >
>> > hm, it's interesting that the general idea still has vailidity.  It
>> > would be a fun project for someone to sniff out all the requirements,
>> > fixup/enhance/rewrite the current implementation and generally make it
>> > all spiffy and nice.
>> >
>> >> (due to the underflow in the writeback code).
>> >
>> > To what underflow do you refer?
>> >
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754
>>
>> That particular bug caused writes to happen almost instantly after the
>> underflow ocurred, and consequently slowed write throughput to a crawl
>> because there was no chance for contiguous writes to gather.
>>
>> >> We do know that current SSDs save a significant amount of
>> >> power when they go into standby, so minimizing disk writes is still
>> >> useful on these devices.
>> >>
>> >> A very simple laptop mode which only does a single sync when we spin
>> >> up the disk, and didn't bother with the timer behavior or muck with
>> >> swap behavior might be something that is more useful for us, and I
>> >> suspect it might simplify the writeback code somewhat as well.
>> >
>> > I don't think I understand the problem with the timer.  My original RFC
>> > said
>> >
>> > : laptop_writeback_centisecs
>> > : --------------------------
>> > :
>> > : This tunable determines the maximum age of dirty data when the machine
>> > : is operating in Laptop mode.  The default value is 30000 - five
>> > : minutes.  This means that if applications are generating a small amount
>> > : of write traffic, the disk will spin up once per five minutes.
>> > :
>> > : If the disk is spun up for any other reason (such as for a read) then
>> > : all dirty data will be flushed anyway, and this timer is reset to zero.
>> >
>> > which all sounds very sensible and shouldn't exhibit the behavior you
>> > observed.
>> >
>>
>> The laptop-mode timer get re-armed after each writeback (see above
>> laptop_io_completion function), even if it was caused by laptop-mode
>> itself.  So, if something is continually dirtying a little bit of
>> data, we end up getting a chain of small writes which keeps the disk
>> awake for long periods of time.
>
> Out of curiosity, for saving the power, why don' you increase the value for
> laptop_mode?
>

We want to keep the disk in the active state for as short a time as
possible, in order to save power.  The minimum time before we can go
to standby is 5 seconds, so that would be the upper limit on where
we'd want to set the laptop mode timer.  But the real issue here is
that the longer we wait, the greater the chance that something on the
system has dirtyed something which will get flushed, and cause the
chain of writes I mentioned above.  So, really we want to minimize
this time (down to 0), not maximize it.

>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
@ 2013-01-16 20:08                       ` Sonny Rao
  0 siblings, 0 replies; 58+ messages in thread
From: Sonny Rao @ 2013-01-16 20:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Bryan Freed, Hugh Dickins, Rik van Riel, Johannes Weiner,
	Sameer Nanda

On Tue, Jan 15, 2013 at 8:47 PM, Minchan Kim <minchan@kernel.org> wrote:
> On Tue, Jan 15, 2013 at 05:21:15PM -0800, Sonny Rao wrote:
>> On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>> > On Tue, 15 Jan 2013 16:32:38 -0800
>> > Sonny Rao <sonnyrao@google.com> wrote:
>> >
>> >> >> It's for saving the power to increase batter life.
>> >> >
>> >> > It might well have that effect, dunno.  That wasn't my intent.  Testing
>> >> > needed!
>> >> >
>> >>
>> >> Power saving is certainly why we had it on originally for ChromeOS,
>> >> but we turned it off due to misbehavior.
>> >>
>> >> Specifically, we saw a pathological behavior where we'd end up writing
>> >> to the disk every few seconds when laptop mode was turned on.  This
>> >> turned out to be because laptop-mode sets a timer which is used to
>> >> check for new dirty data after the initial flush and writes that out
>> >> before spinning the disk down, and on ChromeOS various chatty daemons
>> >> on the system were logging and dirtying data more or less constantly
>> >> so there was almost always something there to be written out.  So what
>> >> ended up happening was that we'd need to do a read, then wake up the
>> >> disk, and then keep writing every few seconds for a long period of
>> >> time, which had the opposite effect from what we wanted.
>> >
>> > So after the read, the disk would chatter away doing a dribble of
>> > writes?  That sounds like plain brokenness (and why did the chrome guys
>> > not tell anyone about it?!?!?).
>>
>> Yes, either read or fsync.  I ranted about it a little (here:
>> http://marc.info/?l=linux-mm&m=135422986220016&w=4), but mostly
>> assumed it was working as expected, and that ChromeOS was just
>> dirtying data at an absurd pace.  Might have been a bad assumption and
>> I could have been more explicit about reporting it, sorry about that.
>>
>> > The idea is that when the physical
>> > read occurs, we should opportunistically flush out all pending writes,
>> > while the disk is running.  Then go back into
>> > buffer-writes-for-a-long-time mode.
>> >
>>
>> See the comment in page-writeback.c above laptop_io_completion():
>>
>> /*
>>  * We've spun up the disk and we're in laptop mode: schedule writeback
>>  * of all dirty data a few seconds from now.  If the flush is already
>> scheduled
>>  * then push it back - the user is still using the disk.
>>  */
>> void laptop_io_completion(struct backing_dev_info *info)
>>
>> What ends up happening fairly often is that there's always something
>> dirty with that few seconds (or even one second) on our system.
>>
>> > I forget what we did with fsync() and friends.  Quite a lot of
>> > pestiferous applications like to do fsync quite frequently.  I had a
>> > special kernel in which fsync() consisted of "return 0;", but ISTR
>> > there being some resistance to productizing that idea.
>> >
>>
>> Yeah, we have this problem and we try to fix up users of fsync() as we
>> find them but it's a bit of a never-ending battle.  Such a feature
>> would be useful.
>>
>> >>  The issues
>> >> with zram swap just confirmed that we didn't want laptop mode.
>> >>
>> >> Most of our devices have had SSDs rather than spinning disks, so noise
>> >> wasn't an issue, although when we finally did support an official
>> >> device with a spinning disk people certainly complained when the disk
>> >> started clicking all the time
>> >
>> > hm, it's interesting that the general idea still has vailidity.  It
>> > would be a fun project for someone to sniff out all the requirements,
>> > fixup/enhance/rewrite the current implementation and generally make it
>> > all spiffy and nice.
>> >
>> >> (due to the underflow in the writeback code).
>> >
>> > To what underflow do you refer?
>> >
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754
>>
>> That particular bug caused writes to happen almost instantly after the
>> underflow ocurred, and consequently slowed write throughput to a crawl
>> because there was no chance for contiguous writes to gather.
>>
>> >> We do know that current SSDs save a significant amount of
>> >> power when they go into standby, so minimizing disk writes is still
>> >> useful on these devices.
>> >>
>> >> A very simple laptop mode which only does a single sync when we spin
>> >> up the disk, and didn't bother with the timer behavior or muck with
>> >> swap behavior might be something that is more useful for us, and I
>> >> suspect it might simplify the writeback code somewhat as well.
>> >
>> > I don't think I understand the problem with the timer.  My original RFC
>> > said
>> >
>> > : laptop_writeback_centisecs
>> > : --------------------------
>> > :
>> > : This tunable determines the maximum age of dirty data when the machine
>> > : is operating in Laptop mode.  The default value is 30000 - five
>> > : minutes.  This means that if applications are generating a small amount
>> > : of write traffic, the disk will spin up once per five minutes.
>> > :
>> > : If the disk is spun up for any other reason (such as for a read) then
>> > : all dirty data will be flushed anyway, and this timer is reset to zero.
>> >
>> > which all sounds very sensible and shouldn't exhibit the behavior you
>> > observed.
>> >
>>
>> The laptop-mode timer get re-armed after each writeback (see above
>> laptop_io_completion function), even if it was caused by laptop-mode
>> itself.  So, if something is continually dirtying a little bit of
>> data, we end up getting a chain of small writes which keeps the disk
>> awake for long periods of time.
>
> Out of curiosity, for saving the power, why don' you increase the value for
> laptop_mode?
>

We want to keep the disk in the active state for as short a time as
possible, in order to save power.  The minimum time before we can go
to standby is 5 seconds, so that would be the upper limit on where
we'd want to set the laptop mode timer.  But the real issue here is
that the longer we wait, the greater the chance that something on the
system has dirtyed something which will get flushed, and cause the
chain of writes I mentioned above.  So, really we want to minimize
this time (down to 0), not maximize it.

>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-09  6:21   ` Minchan Kim
@ 2013-01-16 21:41     ` Andrew Morton
  -1 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-16 21:41 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Wed,  9 Jan 2013 15:21:13 +0900
Minchan Kim <minchan@kernel.org> wrote:
>

This changelog is quite hard to understand :(

> Recently, Luigi reported there are lots of free swap space when
> OOM happens. It's easily reproduced on zram-over-swap, where
> many instance of memory hogs are running and laptop_mode is enabled.
> 
> Luigi reported there was no problem when he disabled laptop_mode.
> The problem when I investigate problem is following as.
> 
> try_to_free_pages disable may_writepage if laptop_mode is enabled.
> shrink_page_list adds lots of anon pages in swap cache by
> add_to_swap, which makes pages Dirty and rotate them to head of
> inactive LRU without pageout. If it is repeated, inactive anon LRU
> is full of Dirty and SwapCache pages.

"Dirty and SwapCache" is ambigious.  Does it mean "dirty pages and
swapcache pages" or does it mean "dirty swapcache pages".  The latter,
I expect.

> 
> In case of that, isolate_lru_pages fails because it try to isolate
> clean page due to may_writepage == 0.
> 
> The may_writepage could be 1 only if total_scanned is higher than
> writeback_threshold in do_try_to_free_pages but unfortunately,
> VM can't isolate anon pages from inactive anon lru list by
> above reason and we already reclaimed all file-backed pages.
> So it ends up OOM killing.

Here, please expand upon "by above reason".  Explain here exactly why
scanning is unsuccessful.

> This patch prevents to add a page to swap cache unnecessary when
> may_writepage is unset so anoymous lru list isn't full of
> Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> which ends up setting may_writepage to 1 and could swap out
> anon lru pages. When OOM triggers, I confirmed swap space was full.
> 
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		if (PageAnon(page) && !PageSwapCache(page)) {
>  			if (!(sc->gfp_mask & __GFP_IO))
>  				goto keep_locked;
> +			if (!sc->may_writepage)
> +				goto keep_locked;
>  			if (!add_to_swap(page))
>  				goto activate_locked;
>  			may_enter_fs = 1;

Needs a comment explaining why we bale out in this case, please.

If I'm understanding it correctly, this change causes the kernel to
move less anonymous memory onto the inactive anon LRU and thereby
causes the scanner to be more successful in locating clean swapcache
pages on that list?  But that makes no sense, because from your
description it appears the intent of the patch is to use *more* swap.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-16 21:41     ` Andrew Morton
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-16 21:41 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Wed,  9 Jan 2013 15:21:13 +0900
Minchan Kim <minchan@kernel.org> wrote:
>

This changelog is quite hard to understand :(

> Recently, Luigi reported there are lots of free swap space when
> OOM happens. It's easily reproduced on zram-over-swap, where
> many instance of memory hogs are running and laptop_mode is enabled.
> 
> Luigi reported there was no problem when he disabled laptop_mode.
> The problem when I investigate problem is following as.
> 
> try_to_free_pages disable may_writepage if laptop_mode is enabled.
> shrink_page_list adds lots of anon pages in swap cache by
> add_to_swap, which makes pages Dirty and rotate them to head of
> inactive LRU without pageout. If it is repeated, inactive anon LRU
> is full of Dirty and SwapCache pages.

"Dirty and SwapCache" is ambigious.  Does it mean "dirty pages and
swapcache pages" or does it mean "dirty swapcache pages".  The latter,
I expect.

> 
> In case of that, isolate_lru_pages fails because it try to isolate
> clean page due to may_writepage == 0.
> 
> The may_writepage could be 1 only if total_scanned is higher than
> writeback_threshold in do_try_to_free_pages but unfortunately,
> VM can't isolate anon pages from inactive anon lru list by
> above reason and we already reclaimed all file-backed pages.
> So it ends up OOM killing.

Here, please expand upon "by above reason".  Explain here exactly why
scanning is unsuccessful.

> This patch prevents to add a page to swap cache unnecessary when
> may_writepage is unset so anoymous lru list isn't full of
> Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> which ends up setting may_writepage to 1 and could swap out
> anon lru pages. When OOM triggers, I confirmed swap space was full.
> 
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		if (PageAnon(page) && !PageSwapCache(page)) {
>  			if (!(sc->gfp_mask & __GFP_IO))
>  				goto keep_locked;
> +			if (!sc->may_writepage)
> +				goto keep_locked;
>  			if (!add_to_swap(page))
>  				goto activate_locked;
>  			may_enter_fs = 1;

Needs a comment explaining why we bale out in this case, please.

If I'm understanding it correctly, this change causes the kernel to
move less anonymous memory onto the inactive anon LRU and thereby
causes the scanner to be more successful in locating clean swapcache
pages on that list?  But that makes no sense, because from your
description it appears the intent of the patch is to use *more* swap.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-16 21:41     ` Andrew Morton
@ 2013-01-17  0:53       ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-17  0:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Wed, Jan 16, 2013 at 01:41:55PM -0800, Andrew Morton wrote:
> On Wed,  9 Jan 2013 15:21:13 +0900
> Minchan Kim <minchan@kernel.org> wrote:
> >
> 
> This changelog is quite hard to understand :(
> 
> > Recently, Luigi reported there are lots of free swap space when
> > OOM happens. It's easily reproduced on zram-over-swap, where
> > many instance of memory hogs are running and laptop_mode is enabled.
> > 
> > Luigi reported there was no problem when he disabled laptop_mode.
> > The problem when I investigate problem is following as.
> > 
> > try_to_free_pages disable may_writepage if laptop_mode is enabled.
> > shrink_page_list adds lots of anon pages in swap cache by
> > add_to_swap, which makes pages Dirty and rotate them to head of
> > inactive LRU without pageout. If it is repeated, inactive anon LRU
> > is full of Dirty and SwapCache pages.
> 
> "Dirty and SwapCache" is ambigious.  Does it mean "dirty pages and
> swapcache pages" or does it mean "dirty swapcache pages".  The latter,
> I expect.

Yeb.

> 
> > 
> > In case of that, isolate_lru_pages fails because it try to isolate
> > clean page due to may_writepage == 0.
> > 
> > The may_writepage could be 1 only if total_scanned is higher than
> > writeback_threshold in do_try_to_free_pages but unfortunately,
> > VM can't isolate anon pages from inactive anon lru list by
> > above reason and we already reclaimed all file-backed pages.
> > So it ends up OOM killing.
> 
> Here, please expand upon "by above reason".  Explain here exactly why
> scanning is unsuccessful.

Let me try again ;)

============================  &< ============================

Recently, Luigi reported there are lots of free swap space when
OOM happens. It's easily reproduced on zram-over-swap, where
many instance of memory hogs are running and laptop_mode is enabled.
He said there was no problem when he disabled laptop_mode.

The problem when I investigate problem is following as.

Assumption for easy explanation: There are no page cache page in system
because they all are already reclaimed.

1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
2. shrink_inactive_list isolates victim pages from inactive anon lru list.
3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
   pageout because sc->may_writepage is 0 so the page is rotated back into
   inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty.
4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
   retry reclaim with higher priority.
5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
   but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
   inactive anon lru list is full of dirty pages by 3 so it just returns
   without  any reclaim progress.
6. do_try_to_free_pages doesn't set may_write due to zero total_scanned.
   Because sc->nr_scanned is increased by shrink_page_list but we don't call
   shrink_page_list in 5 due to short of isolated pages.

Above loop is continued until OOM happens.
The problem didn't happen before [1] was merged because old logic's isolatation
in shrink_inactive_list was successful and tried to call shrink_page_list
to pageout them but it still ends up failed to page out by may_writepage.
But important point is that sc->nr_scanned was increased althoug we couldn't
swap out them so do_try_to_free_pages could set may_writepages.
So this patch need to go stable tree althoug it's a band-aid.
Then, for latest linus tree, we should fix laptop_mode's fundamental
problem.

[1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]

> 
> > This patch prevents to add a page to swap cache unnecessary when
> > may_writepage is unset so anoymous lru list isn't full of
> > Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> > which ends up setting may_writepage to 1 and could swap out
> > anon lru pages. When OOM triggers, I confirmed swap space was full.
> > 
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  		if (PageAnon(page) && !PageSwapCache(page)) {
> >  			if (!(sc->gfp_mask & __GFP_IO))
> >  				goto keep_locked;
> > +			if (!sc->may_writepage)
> > +				goto keep_locked;
> >  			if (!add_to_swap(page))
> >  				goto activate_locked;
> >  			may_enter_fs = 1;
> 
> Needs a comment explaining why we bale out in this case, please.


Okay. How about this?

/*
 * There is no point to add a page to swap cache if we can't swap out.
 */

> 
> If I'm understanding it correctly, this change causes the kernel to
> move less anonymous memory onto the inactive anon LRU and thereby

No. The amount of inactive anon LRU is same. Patch just prevent to add
page to swapcache unnecessary.

> causes the scanner to be more successful in locating clean swapcache
> pages on that list?  But that makes no sense, because from your
> description it appears the intent of the patch is to use *more* swap.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-17  0:53       ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-17  0:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Wed, Jan 16, 2013 at 01:41:55PM -0800, Andrew Morton wrote:
> On Wed,  9 Jan 2013 15:21:13 +0900
> Minchan Kim <minchan@kernel.org> wrote:
> >
> 
> This changelog is quite hard to understand :(
> 
> > Recently, Luigi reported there are lots of free swap space when
> > OOM happens. It's easily reproduced on zram-over-swap, where
> > many instance of memory hogs are running and laptop_mode is enabled.
> > 
> > Luigi reported there was no problem when he disabled laptop_mode.
> > The problem when I investigate problem is following as.
> > 
> > try_to_free_pages disable may_writepage if laptop_mode is enabled.
> > shrink_page_list adds lots of anon pages in swap cache by
> > add_to_swap, which makes pages Dirty and rotate them to head of
> > inactive LRU without pageout. If it is repeated, inactive anon LRU
> > is full of Dirty and SwapCache pages.
> 
> "Dirty and SwapCache" is ambigious.  Does it mean "dirty pages and
> swapcache pages" or does it mean "dirty swapcache pages".  The latter,
> I expect.

Yeb.

> 
> > 
> > In case of that, isolate_lru_pages fails because it try to isolate
> > clean page due to may_writepage == 0.
> > 
> > The may_writepage could be 1 only if total_scanned is higher than
> > writeback_threshold in do_try_to_free_pages but unfortunately,
> > VM can't isolate anon pages from inactive anon lru list by
> > above reason and we already reclaimed all file-backed pages.
> > So it ends up OOM killing.
> 
> Here, please expand upon "by above reason".  Explain here exactly why
> scanning is unsuccessful.

Let me try again ;)

============================  &< ============================

Recently, Luigi reported there are lots of free swap space when
OOM happens. It's easily reproduced on zram-over-swap, where
many instance of memory hogs are running and laptop_mode is enabled.
He said there was no problem when he disabled laptop_mode.

The problem when I investigate problem is following as.

Assumption for easy explanation: There are no page cache page in system
because they all are already reclaimed.

1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
2. shrink_inactive_list isolates victim pages from inactive anon lru list.
3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
   pageout because sc->may_writepage is 0 so the page is rotated back into
   inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty.
4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
   retry reclaim with higher priority.
5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
   but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
   inactive anon lru list is full of dirty pages by 3 so it just returns
   without  any reclaim progress.
6. do_try_to_free_pages doesn't set may_write due to zero total_scanned.
   Because sc->nr_scanned is increased by shrink_page_list but we don't call
   shrink_page_list in 5 due to short of isolated pages.

Above loop is continued until OOM happens.
The problem didn't happen before [1] was merged because old logic's isolatation
in shrink_inactive_list was successful and tried to call shrink_page_list
to pageout them but it still ends up failed to page out by may_writepage.
But important point is that sc->nr_scanned was increased althoug we couldn't
swap out them so do_try_to_free_pages could set may_writepages.
So this patch need to go stable tree althoug it's a band-aid.
Then, for latest linus tree, we should fix laptop_mode's fundamental
problem.

[1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]

> 
> > This patch prevents to add a page to swap cache unnecessary when
> > may_writepage is unset so anoymous lru list isn't full of
> > Dirty/Swapcache page. So VM can isolate pages from anon lru list,
> > which ends up setting may_writepage to 1 and could swap out
> > anon lru pages. When OOM triggers, I confirmed swap space was full.
> > 
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  		if (PageAnon(page) && !PageSwapCache(page)) {
> >  			if (!(sc->gfp_mask & __GFP_IO))
> >  				goto keep_locked;
> > +			if (!sc->may_writepage)
> > +				goto keep_locked;
> >  			if (!add_to_swap(page))
> >  				goto activate_locked;
> >  			may_enter_fs = 1;
> 
> Needs a comment explaining why we bale out in this case, please.


Okay. How about this?

/*
 * There is no point to add a page to swap cache if we can't swap out.
 */

> 
> If I'm understanding it correctly, this change causes the kernel to
> move less anonymous memory onto the inactive anon LRU and thereby

No. The amount of inactive anon LRU is same. Patch just prevent to add
page to swapcache unnecessary.

> causes the scanner to be more successful in locating clean swapcache
> pages on that list?  But that makes no sense, because from your
> description it appears the intent of the patch is to use *more* swap.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-17  0:53       ` Minchan Kim
@ 2013-01-17 22:22         ` Andrew Morton
  -1 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-17 22:22 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Thu, 17 Jan 2013 09:53:14 +0900
Minchan Kim <minchan@kernel.org> wrote:

> Recently, Luigi reported there are lots of free swap space when
> OOM happens. It's easily reproduced on zram-over-swap, where
> many instance of memory hogs are running and laptop_mode is enabled.
> He said there was no problem when he disabled laptop_mode.
> 
> The problem when I investigate problem is following as.
> 
> Assumption for easy explanation: There are no page cache page in system
> because they all are already reclaimed.
> 
> 1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
> 2. shrink_inactive_list isolates victim pages from inactive anon lru list.
> 3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
>    pageout because sc->may_writepage is 0 so the page is rotated back into
>    inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty
> 4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
>    retry reclaim with higher priority.
> 5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
>    but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
>    inactive anon lru list is full of dirty pages by 3 so it just returns
>    without  any reclaim progress.
> 6. do_try_to_free_pages doesn't set may_write due to zero total_scanned.

s/may_write/may_writepage/

>    Because sc->nr_scanned is increased by shrink_page_list but we don't call
>    shrink_page_list in 5 due to short of isolated pages.

This is the bug, is it not?

In laptop mode, we still need to write out dirty swapcache at some
point.  An appropriate time to do this is when the scanning priority is
getting high.  But it seems that this ISOLATE_CLEAN->total_scanned
interaction is preventing that.

(An enhancement to laptop mode would be to opportunistically write out
dirty swapcache in or around laptop_mode_timer_fn()).

> Above loop is continued until OOM happens.
> The problem didn't happen before [1] was merged because old logic's isolatation
> in shrink_inactive_list was successful and tried to call shrink_page_list
> to pageout them but it still ends up failed to page out by may_writepage.
> But important point is that sc->nr_scanned was increased althoug we couldn't
> swap out them so do_try_to_free_pages could set may_writepages.
> So this patch need to go stable tree althoug it's a band-aid.
> Then, for latest linus tree, we should fix laptop_mode's fundamental
> problem.

Well.  Perhaps we can do that now.

> [1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-17 22:22         ` Andrew Morton
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2013-01-17 22:22 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Thu, 17 Jan 2013 09:53:14 +0900
Minchan Kim <minchan@kernel.org> wrote:

> Recently, Luigi reported there are lots of free swap space when
> OOM happens. It's easily reproduced on zram-over-swap, where
> many instance of memory hogs are running and laptop_mode is enabled.
> He said there was no problem when he disabled laptop_mode.
> 
> The problem when I investigate problem is following as.
> 
> Assumption for easy explanation: There are no page cache page in system
> because they all are already reclaimed.
> 
> 1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
> 2. shrink_inactive_list isolates victim pages from inactive anon lru list.
> 3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
>    pageout because sc->may_writepage is 0 so the page is rotated back into
>    inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty
> 4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
>    retry reclaim with higher priority.
> 5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
>    but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
>    inactive anon lru list is full of dirty pages by 3 so it just returns
>    without  any reclaim progress.
> 6. do_try_to_free_pages doesn't set may_write due to zero total_scanned.

s/may_write/may_writepage/

>    Because sc->nr_scanned is increased by shrink_page_list but we don't call
>    shrink_page_list in 5 due to short of isolated pages.

This is the bug, is it not?

In laptop mode, we still need to write out dirty swapcache at some
point.  An appropriate time to do this is when the scanning priority is
getting high.  But it seems that this ISOLATE_CLEAN->total_scanned
interaction is preventing that.

(An enhancement to laptop mode would be to opportunistically write out
dirty swapcache in or around laptop_mode_timer_fn()).

> Above loop is continued until OOM happens.
> The problem didn't happen before [1] was merged because old logic's isolatation
> in shrink_inactive_list was successful and tried to call shrink_page_list
> to pageout them but it still ends up failed to page out by may_writepage.
> But important point is that sc->nr_scanned was increased althoug we couldn't
> swap out them so do_try_to_free_pages could set may_writepages.
> So this patch need to go stable tree althoug it's a band-aid.
> Then, for latest linus tree, we should fix laptop_mode's fundamental
> problem.

Well.  Perhaps we can do that now.

> [1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-17 22:22         ` Andrew Morton
@ 2013-01-17 23:36           ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-17 23:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Thu, Jan 17, 2013 at 02:22:38PM -0800, Andrew Morton wrote:
> On Thu, 17 Jan 2013 09:53:14 +0900
> Minchan Kim <minchan@kernel.org> wrote:
> 
> > Recently, Luigi reported there are lots of free swap space when
> > OOM happens. It's easily reproduced on zram-over-swap, where
> > many instance of memory hogs are running and laptop_mode is enabled.
> > He said there was no problem when he disabled laptop_mode.
> > 
> > The problem when I investigate problem is following as.
> > 
> > Assumption for easy explanation: There are no page cache page in system
> > because they all are already reclaimed.
> > 
> > 1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
> > 2. shrink_inactive_list isolates victim pages from inactive anon lru list.
> > 3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
> >    pageout because sc->may_writepage is 0 so the page is rotated back into
> >    inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty
> > 4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
> >    retry reclaim with higher priority.
> > 5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
> >    but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
> >    inactive anon lru list is full of dirty pages by 3 so it just returns
> >    without  any reclaim progress.
> > 6. do_try_to_free_pages doesn't set may_write due to zero total_scanned.
> 
> s/may_write/may_writepage/

Thanks!

> 
> >    Because sc->nr_scanned is increased by shrink_page_list but we don't call
> >    shrink_page_list in 5 due to short of isolated pages.
> 
> This is the bug, is it not?
> 
> In laptop mode, we still need to write out dirty swapcache at some
> point.  An appropriate time to do this is when the scanning priority is

Yes and when to some point is really important. Now, the point for that is
depends on on the number of scanned pages by shrink_page_list. It means we
must isolate victim pages from inactive LRU list and call shrink_page_list
to increase sc->nr_scanned but unfortunately, we have various filters to
decrease CPU consumption and LRU churning when VM try to isolate victim pages
so it could prevent isolating victim pages from LRU list.

> getting high.  But it seems that this ISOLATE_CLEAN->total_scanned

Yes. I absolutely agree on that some point should depend on priority, NOT
the number of scanned pages. And I already said to you about that.
https://lkml.org/lkml/2013/1/10/643

We used to use such heuristic in several places in VM, ie DEF_PRIORITY - 2
But why I hesitate with the patch is that I think this patch should go to
stable tree so the patch should be really small and have no side effect so
I don't wanted to change laptop_mode behavior heavily caused by changing
condition for may_writepage trigger point.

> interaction is preventing that.
> 
> (An enhancement to laptop mode would be to opportunistically write out
> dirty swapcache in or around laptop_mode_timer_fn()).

It could but it should be another patch and VM shouldn't rely on ONLY
laptop_mode_timer_fn, IMHO. VM should have own rule to reclaim pages
regardless of laptop_mode's help to prevent OOM kill.

> 
> > Above loop is continued until OOM happens.
> > The problem didn't happen before [1] was merged because old logic's isolatation
> > in shrink_inactive_list was successful and tried to call shrink_page_list
> > to pageout them but it still ends up failed to page out by may_writepage.
> > But important point is that sc->nr_scanned was increased althoug we couldn't
> > swap out them so do_try_to_free_pages could set may_writepages.
> > So this patch need to go stable tree althoug it's a band-aid.
> > Then, for latest linus tree, we should fix laptop_mode's fundamental
> > problem.
> 
> Well.  Perhaps we can do that now.

Okay. If you don't object my suggestion, I will send patches next week.
Thanks for the review, Andrew!

> 
> > [1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-17 23:36           ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-17 23:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

On Thu, Jan 17, 2013 at 02:22:38PM -0800, Andrew Morton wrote:
> On Thu, 17 Jan 2013 09:53:14 +0900
> Minchan Kim <minchan@kernel.org> wrote:
> 
> > Recently, Luigi reported there are lots of free swap space when
> > OOM happens. It's easily reproduced on zram-over-swap, where
> > many instance of memory hogs are running and laptop_mode is enabled.
> > He said there was no problem when he disabled laptop_mode.
> > 
> > The problem when I investigate problem is following as.
> > 
> > Assumption for easy explanation: There are no page cache page in system
> > because they all are already reclaimed.
> > 
> > 1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
> > 2. shrink_inactive_list isolates victim pages from inactive anon lru list.
> > 3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
> >    pageout because sc->may_writepage is 0 so the page is rotated back into
> >    inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty
> > 4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
> >    retry reclaim with higher priority.
> > 5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
> >    but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
> >    inactive anon lru list is full of dirty pages by 3 so it just returns
> >    without  any reclaim progress.
> > 6. do_try_to_free_pages doesn't set may_write due to zero total_scanned.
> 
> s/may_write/may_writepage/

Thanks!

> 
> >    Because sc->nr_scanned is increased by shrink_page_list but we don't call
> >    shrink_page_list in 5 due to short of isolated pages.
> 
> This is the bug, is it not?
> 
> In laptop mode, we still need to write out dirty swapcache at some
> point.  An appropriate time to do this is when the scanning priority is

Yes and when to some point is really important. Now, the point for that is
depends on on the number of scanned pages by shrink_page_list. It means we
must isolate victim pages from inactive LRU list and call shrink_page_list
to increase sc->nr_scanned but unfortunately, we have various filters to
decrease CPU consumption and LRU churning when VM try to isolate victim pages
so it could prevent isolating victim pages from LRU list.

> getting high.  But it seems that this ISOLATE_CLEAN->total_scanned

Yes. I absolutely agree on that some point should depend on priority, NOT
the number of scanned pages. And I already said to you about that.
https://lkml.org/lkml/2013/1/10/643

We used to use such heuristic in several places in VM, ie DEF_PRIORITY - 2
But why I hesitate with the patch is that I think this patch should go to
stable tree so the patch should be really small and have no side effect so
I don't wanted to change laptop_mode behavior heavily caused by changing
condition for may_writepage trigger point.

> interaction is preventing that.
> 
> (An enhancement to laptop mode would be to opportunistically write out
> dirty swapcache in or around laptop_mode_timer_fn()).

It could but it should be another patch and VM shouldn't rely on ONLY
laptop_mode_timer_fn, IMHO. VM should have own rule to reclaim pages
regardless of laptop_mode's help to prevent OOM kill.

> 
> > Above loop is continued until OOM happens.
> > The problem didn't happen before [1] was merged because old logic's isolatation
> > in shrink_inactive_list was successful and tried to call shrink_page_list
> > to pageout them but it still ends up failed to page out by may_writepage.
> > But important point is that sc->nr_scanned was increased althoug we couldn't
> > swap out them so do_try_to_free_pages could set may_writepages.
> > So this patch need to go stable tree althoug it's a band-aid.
> > Then, for latest linus tree, we should fix laptop_mode's fundamental
> > problem.
> 
> Well.  Perhaps we can do that now.

Okay. If you don't object my suggestion, I will send patches next week.
Thanks for the review, Andrew!

> 
> > [1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-17 23:36           ` Minchan Kim
@ 2013-01-21  1:52             ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-21  1:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

Hi,

On Fri, Jan 18, 2013 at 08:36:42AM +0900, Minchan Kim wrote:
> On Thu, Jan 17, 2013 at 02:22:38PM -0800, Andrew Morton wrote:
> > On Thu, 17 Jan 2013 09:53:14 +0900
> > Minchan Kim <minchan@kernel.org> wrote:
> > 
> > > Recently, Luigi reported there are lots of free swap space when
> > > OOM happens. It's easily reproduced on zram-over-swap, where
> > > many instance of memory hogs are running and laptop_mode is enabled.
> > > He said there was no problem when he disabled laptop_mode.
> > > 
> > > The problem when I investigate problem is following as.
> > > 
> > > Assumption for easy explanation: There are no page cache page in system
> > > because they all are already reclaimed.
> > > 
> > > 1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
> > > 2. shrink_inactive_list isolates victim pages from inactive anon lru list.
> > > 3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
> > >    pageout because sc->may_writepage is 0 so the page is rotated back into
> > >    inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty
> > > 4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
> > >    retry reclaim with higher priority.
> > > 5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
> > >    but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
> > >    inactive anon lru list is full of dirty pages by 3 so it just returns
> > >    without  any reclaim progress.
> > > 6. do_try_to_free_pages doesn't set may_write due to zero total_scanned.
> > 
> > s/may_write/may_writepage/
> 
> Thanks!
> 
> > 
> > >    Because sc->nr_scanned is increased by shrink_page_list but we don't call
> > >    shrink_page_list in 5 due to short of isolated pages.
> > 
> > This is the bug, is it not?
> > 
> > In laptop mode, we still need to write out dirty swapcache at some
> > point.  An appropriate time to do this is when the scanning priority is
> 
> Yes and when to some point is really important. Now, the point for that is
> depends on on the number of scanned pages by shrink_page_list. It means we
> must isolate victim pages from inactive LRU list and call shrink_page_list
> to increase sc->nr_scanned but unfortunately, we have various filters to
> decrease CPU consumption and LRU churning when VM try to isolate victim pages
> so it could prevent isolating victim pages from LRU list.
> 
> > getting high.  But it seems that this ISOLATE_CLEAN->total_scanned
> 
> Yes. I absolutely agree on that some point should depend on priority, NOT
> the number of scanned pages. And I already said to you about that.
> https://lkml.org/lkml/2013/1/10/643
> 
> We used to use such heuristic in several places in VM, ie DEF_PRIORITY - 2
> But why I hesitate with the patch is that I think this patch should go to
> stable tree so the patch should be really small and have no side effect so
> I don't wanted to change laptop_mode behavior heavily caused by changing
> condition for may_writepage trigger point.
> 
> > interaction is preventing that.
> > 
> > (An enhancement to laptop mode would be to opportunistically write out
> > dirty swapcache in or around laptop_mode_timer_fn()).
> 
> It could but it should be another patch and VM shouldn't rely on ONLY
> laptop_mode_timer_fn, IMHO. VM should have own rule to reclaim pages
> regardless of laptop_mode's help to prevent OOM kill.
> 
> > 
> > > Above loop is continued until OOM happens.
> > > The problem didn't happen before [1] was merged because old logic's isolatation
> > > in shrink_inactive_list was successful and tried to call shrink_page_list
> > > to pageout them but it still ends up failed to page out by may_writepage.
> > > But important point is that sc->nr_scanned was increased althoug we couldn't
> > > swap out them so do_try_to_free_pages could set may_writepages.
> > > So this patch need to go stable tree althoug it's a band-aid.
> > > Then, for latest linus tree, we should fix laptop_mode's fundamental
> > > problem.
> > 
> > Well.  Perhaps we can do that now.
> 
> Okay. If you don't object my suggestion, I will send patches next week.
> Thanks for the review, Andrew!

Andrew, If nobody objects, I would like to drop [1] and add ths patch instead of [1].
Luigi, below patch passed my test. If anybody doesn't object, could you test this
patch?

Thanks!

[1] mm: prevent addition of pages to swap if may_writepage is unset

-------------- &< --------------------

>From 94086dc7152359d052802c55c82ef19509fe8cce Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Mon, 21 Jan 2013 10:43:43 +0900
Subject: [PATCH] mm: Use up free swap space before reaching OOM kill

Recently, Luigi reported there are lots of free swap space when
OOM happens. It's easily reproduced on zram-over-swap, where
many instance of memory hogs are running and laptop_mode is enabled.
He said there was no problem when he disabled laptop_mode.
The problem when I investigate problem is following as.

Assumption for easy explanation: There are no page cache page in system
because they all are already reclaimed.

1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
2. shrink_inactive_list isolates victim pages from inactive anon lru list.
3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
   pageout because sc->may_writepage is 0 so the page is rotated back into
   inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty.
4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
   retry reclaim with higher priority.
5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
   but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
   inactive anon lru list is full of dirty pages by 3 so it just returns
   without  any reclaim progress.
6. do_try_to_free_pages doesn't set may_writepage due to zero total_scanned.
   Because sc->nr_scanned is increased by shrink_page_list but we don't call
   shrink_page_list in 5 due to short of isolated pages.

Above loop is continued until OOM happens.
The problem didn't happen before [1] was merged because old logic's
isolatation in shrink_inactive_list was successful and tried to call
shrink_page_list to pageout them but it still ends up failed to page out
by may_writepage. But important point is that sc->nr_scanned was increased
although we couldn't swap out them so do_try_to_free_pages could set
may_writepages.

Since [1] was introduced, it's not a good idea any more to depends on
only the number of scanned pages for setting may_writepage. So this patch
adds new trigger point of setting may_writepage as below DEF_PRIOIRTY - 2
which is used to show the significant memory pressure in VM so it's good
fit for our purpose which would be better to lose power saving or clickety
rather than OOM killing.

[1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]

Reported-by: Luigi Semenzato <semenzato@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/vmscan.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 740bad9..d333df0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2204,6 +2204,13 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 			goto out;
 
 		/*
+		 * If we're getting trouble reclaiming, start doing
+		 * writepage even in laptop mode.
+		 */
+		if (sc->priority < DEF_PRIORITY - 2)
+			sc->may_writepage = 1;
+
+		/*
 		 * Try to write back as many pages as we just scanned.  This
 		 * tends to cause slow streaming writers to write data to the
 		 * disk smoothly, at the dirtying rate, which is nice.   But
@@ -2774,12 +2781,10 @@ loop_again:
 			}
 
 			/*
-			 * If we've done a decent amount of scanning and
-			 * the reclaim ratio is low, start doing writepage
-			 * even in laptop mode
+			 * If we're getting trouble reclaiming, start doing
+			 * writepage even in laptop mode.
 			 */
-			if (total_scanned > SWAP_CLUSTER_MAX * 2 &&
-			    total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
+			if (sc.priority < DEF_PRIORITY - 2)
 				sc.may_writepage = 1;
 
 			if (zone->all_unreclaimable) {
-- 
1.7.9.5

>- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-21  1:52             ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-21  1:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Rik van Riel, Johannes Weiner

Hi,

On Fri, Jan 18, 2013 at 08:36:42AM +0900, Minchan Kim wrote:
> On Thu, Jan 17, 2013 at 02:22:38PM -0800, Andrew Morton wrote:
> > On Thu, 17 Jan 2013 09:53:14 +0900
> > Minchan Kim <minchan@kernel.org> wrote:
> > 
> > > Recently, Luigi reported there are lots of free swap space when
> > > OOM happens. It's easily reproduced on zram-over-swap, where
> > > many instance of memory hogs are running and laptop_mode is enabled.
> > > He said there was no problem when he disabled laptop_mode.
> > > 
> > > The problem when I investigate problem is following as.
> > > 
> > > Assumption for easy explanation: There are no page cache page in system
> > > because they all are already reclaimed.
> > > 
> > > 1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
> > > 2. shrink_inactive_list isolates victim pages from inactive anon lru list.
> > > 3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
> > >    pageout because sc->may_writepage is 0 so the page is rotated back into
> > >    inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty
> > > 4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
> > >    retry reclaim with higher priority.
> > > 5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
> > >    but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
> > >    inactive anon lru list is full of dirty pages by 3 so it just returns
> > >    without  any reclaim progress.
> > > 6. do_try_to_free_pages doesn't set may_write due to zero total_scanned.
> > 
> > s/may_write/may_writepage/
> 
> Thanks!
> 
> > 
> > >    Because sc->nr_scanned is increased by shrink_page_list but we don't call
> > >    shrink_page_list in 5 due to short of isolated pages.
> > 
> > This is the bug, is it not?
> > 
> > In laptop mode, we still need to write out dirty swapcache at some
> > point.  An appropriate time to do this is when the scanning priority is
> 
> Yes and when to some point is really important. Now, the point for that is
> depends on on the number of scanned pages by shrink_page_list. It means we
> must isolate victim pages from inactive LRU list and call shrink_page_list
> to increase sc->nr_scanned but unfortunately, we have various filters to
> decrease CPU consumption and LRU churning when VM try to isolate victim pages
> so it could prevent isolating victim pages from LRU list.
> 
> > getting high.  But it seems that this ISOLATE_CLEAN->total_scanned
> 
> Yes. I absolutely agree on that some point should depend on priority, NOT
> the number of scanned pages. And I already said to you about that.
> https://lkml.org/lkml/2013/1/10/643
> 
> We used to use such heuristic in several places in VM, ie DEF_PRIORITY - 2
> But why I hesitate with the patch is that I think this patch should go to
> stable tree so the patch should be really small and have no side effect so
> I don't wanted to change laptop_mode behavior heavily caused by changing
> condition for may_writepage trigger point.
> 
> > interaction is preventing that.
> > 
> > (An enhancement to laptop mode would be to opportunistically write out
> > dirty swapcache in or around laptop_mode_timer_fn()).
> 
> It could but it should be another patch and VM shouldn't rely on ONLY
> laptop_mode_timer_fn, IMHO. VM should have own rule to reclaim pages
> regardless of laptop_mode's help to prevent OOM kill.
> 
> > 
> > > Above loop is continued until OOM happens.
> > > The problem didn't happen before [1] was merged because old logic's isolatation
> > > in shrink_inactive_list was successful and tried to call shrink_page_list
> > > to pageout them but it still ends up failed to page out by may_writepage.
> > > But important point is that sc->nr_scanned was increased althoug we couldn't
> > > swap out them so do_try_to_free_pages could set may_writepages.
> > > So this patch need to go stable tree althoug it's a band-aid.
> > > Then, for latest linus tree, we should fix laptop_mode's fundamental
> > > problem.
> > 
> > Well.  Perhaps we can do that now.
> 
> Okay. If you don't object my suggestion, I will send patches next week.
> Thanks for the review, Andrew!

Andrew, If nobody objects, I would like to drop [1] and add ths patch instead of [1].
Luigi, below patch passed my test. If anybody doesn't object, could you test this
patch?

Thanks!

[1] mm: prevent addition of pages to swap if may_writepage is unset

-------------- &< --------------------

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-21  1:52             ` Minchan Kim
@ 2013-01-21 14:39               ` Rik van Riel
  -1 siblings, 0 replies; 58+ messages in thread
From: Rik van Riel @ 2013-01-21 14:39 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Johannes Weiner

On 01/20/2013 08:52 PM, Minchan Kim wrote:

>  From 94086dc7152359d052802c55c82ef19509fe8cce Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Mon, 21 Jan 2013 10:43:43 +0900
> Subject: [PATCH] mm: Use up free swap space before reaching OOM kill
>
> Recently, Luigi reported there are lots of free swap space when
> OOM happens. It's easily reproduced on zram-over-swap, where
> many instance of memory hogs are running and laptop_mode is enabled.
> He said there was no problem when he disabled laptop_mode.
> The problem when I investigate problem is following as.
>
> Assumption for easy explanation: There are no page cache page in system
> because they all are already reclaimed.
>
> 1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
> 2. shrink_inactive_list isolates victim pages from inactive anon lru list.
> 3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
>     pageout because sc->may_writepage is 0 so the page is rotated back into
>     inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty.
> 4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
>     retry reclaim with higher priority.
> 5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
>     but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
>     inactive anon lru list is full of dirty pages by 3 so it just returns
>     without  any reclaim progress.
> 6. do_try_to_free_pages doesn't set may_writepage due to zero total_scanned.
>     Because sc->nr_scanned is increased by shrink_page_list but we don't call
>     shrink_page_list in 5 due to short of isolated pages.
>
> Above loop is continued until OOM happens.
> The problem didn't happen before [1] was merged because old logic's
> isolatation in shrink_inactive_list was successful and tried to call
> shrink_page_list to pageout them but it still ends up failed to page out
> by may_writepage. But important point is that sc->nr_scanned was increased
> although we couldn't swap out them so do_try_to_free_pages could set
> may_writepages.
>
> Since [1] was introduced, it's not a good idea any more to depends on
> only the number of scanned pages for setting may_writepage. So this patch
> adds new trigger point of setting may_writepage as below DEF_PRIOIRTY - 2
> which is used to show the significant memory pressure in VM so it's good
> fit for our purpose which would be better to lose power saving or clickety
> rather than OOM killing.
>
> [1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]
>
> Reported-by: Luigi Semenzato <semenzato@google.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Your patch is a nice simplification.  I am ok with the
change, provided it works for Luigi :)

Acked-by: Rik van Riel <riel@redhat.com>


-- 
All rights reversed

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-21 14:39               ` Rik van Riel
  0 siblings, 0 replies; 58+ messages in thread
From: Rik van Riel @ 2013-01-21 14:39 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Johannes Weiner

On 01/20/2013 08:52 PM, Minchan Kim wrote:

>  From 94086dc7152359d052802c55c82ef19509fe8cce Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Mon, 21 Jan 2013 10:43:43 +0900
> Subject: [PATCH] mm: Use up free swap space before reaching OOM kill
>
> Recently, Luigi reported there are lots of free swap space when
> OOM happens. It's easily reproduced on zram-over-swap, where
> many instance of memory hogs are running and laptop_mode is enabled.
> He said there was no problem when he disabled laptop_mode.
> The problem when I investigate problem is following as.
>
> Assumption for easy explanation: There are no page cache page in system
> because they all are already reclaimed.
>
> 1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
> 2. shrink_inactive_list isolates victim pages from inactive anon lru list.
> 3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
>     pageout because sc->may_writepage is 0 so the page is rotated back into
>     inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty.
> 4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
>     retry reclaim with higher priority.
> 5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
>     but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
>     inactive anon lru list is full of dirty pages by 3 so it just returns
>     without  any reclaim progress.
> 6. do_try_to_free_pages doesn't set may_writepage due to zero total_scanned.
>     Because sc->nr_scanned is increased by shrink_page_list but we don't call
>     shrink_page_list in 5 due to short of isolated pages.
>
> Above loop is continued until OOM happens.
> The problem didn't happen before [1] was merged because old logic's
> isolatation in shrink_inactive_list was successful and tried to call
> shrink_page_list to pageout them but it still ends up failed to page out
> by may_writepage. But important point is that sc->nr_scanned was increased
> although we couldn't swap out them so do_try_to_free_pages could set
> may_writepages.
>
> Since [1] was introduced, it's not a good idea any more to depends on
> only the number of scanned pages for setting may_writepage. So this patch
> adds new trigger point of setting may_writepage as below DEF_PRIOIRTY - 2
> which is used to show the significant memory pressure in VM so it's good
> fit for our purpose which would be better to lose power saving or clickety
> rather than OOM killing.
>
> [1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]
>
> Reported-by: Luigi Semenzato <semenzato@google.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Your patch is a nice simplification.  I am ok with the
change, provided it works for Luigi :)

Acked-by: Rik van Riel <riel@redhat.com>


-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-21 14:39               ` Rik van Riel
@ 2013-01-22  0:09                 ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-22  0:09 UTC (permalink / raw)
  To: Rik van Riel, Luigi Semenzato
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Johannes Weiner

On Mon, Jan 21, 2013 at 09:39:06AM -0500, Rik van Riel wrote:
> On 01/20/2013 08:52 PM, Minchan Kim wrote:
> 
> > From 94086dc7152359d052802c55c82ef19509fe8cce Mon Sep 17 00:00:00 2001
> >From: Minchan Kim <minchan@kernel.org>
> >Date: Mon, 21 Jan 2013 10:43:43 +0900
> >Subject: [PATCH] mm: Use up free swap space before reaching OOM kill
> >
> >Recently, Luigi reported there are lots of free swap space when
> >OOM happens. It's easily reproduced on zram-over-swap, where
> >many instance of memory hogs are running and laptop_mode is enabled.
> >He said there was no problem when he disabled laptop_mode.
> >The problem when I investigate problem is following as.
> >
> >Assumption for easy explanation: There are no page cache page in system
> >because they all are already reclaimed.
> >
> >1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
> >2. shrink_inactive_list isolates victim pages from inactive anon lru list.
> >3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
> >    pageout because sc->may_writepage is 0 so the page is rotated back into
> >    inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty.
> >4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
> >    retry reclaim with higher priority.
> >5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
> >    but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
> >    inactive anon lru list is full of dirty pages by 3 so it just returns
> >    without  any reclaim progress.
> >6. do_try_to_free_pages doesn't set may_writepage due to zero total_scanned.
> >    Because sc->nr_scanned is increased by shrink_page_list but we don't call
> >    shrink_page_list in 5 due to short of isolated pages.
> >
> >Above loop is continued until OOM happens.
> >The problem didn't happen before [1] was merged because old logic's
> >isolatation in shrink_inactive_list was successful and tried to call
> >shrink_page_list to pageout them but it still ends up failed to page out
> >by may_writepage. But important point is that sc->nr_scanned was increased
> >although we couldn't swap out them so do_try_to_free_pages could set
> >may_writepages.
> >
> >Since [1] was introduced, it's not a good idea any more to depends on
> >only the number of scanned pages for setting may_writepage. So this patch
> >adds new trigger point of setting may_writepage as below DEF_PRIOIRTY - 2
> >which is used to show the significant memory pressure in VM so it's good
> >fit for our purpose which would be better to lose power saving or clickety
> >rather than OOM killing.
> >
> >[1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]
> >
> >Reported-by: Luigi Semenzato <semenzato@google.com>
> >Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> Your patch is a nice simplification.  I am ok with the
> change, provided it works for Luigi :)

Thanks, Rik.

Oops, I missed to Ccing Luigi. Add him again.
Luigi, Could you test this patch?
Thanks for your endless effort.

> 
> Acked-by: Rik van Riel <riel@redhat.com>
> 
> 
> -- 
> All rights reversed
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-01-22  0:09                 ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-01-22  0:09 UTC (permalink / raw)
  To: Rik van Riel, Luigi Semenzato
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Johannes Weiner

On Mon, Jan 21, 2013 at 09:39:06AM -0500, Rik van Riel wrote:
> On 01/20/2013 08:52 PM, Minchan Kim wrote:
> 
> > From 94086dc7152359d052802c55c82ef19509fe8cce Mon Sep 17 00:00:00 2001
> >From: Minchan Kim <minchan@kernel.org>
> >Date: Mon, 21 Jan 2013 10:43:43 +0900
> >Subject: [PATCH] mm: Use up free swap space before reaching OOM kill
> >
> >Recently, Luigi reported there are lots of free swap space when
> >OOM happens. It's easily reproduced on zram-over-swap, where
> >many instance of memory hogs are running and laptop_mode is enabled.
> >He said there was no problem when he disabled laptop_mode.
> >The problem when I investigate problem is following as.
> >
> >Assumption for easy explanation: There are no page cache page in system
> >because they all are already reclaimed.
> >
> >1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
> >2. shrink_inactive_list isolates victim pages from inactive anon lru list.
> >3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
> >    pageout because sc->may_writepage is 0 so the page is rotated back into
> >    inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty.
> >4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
> >    retry reclaim with higher priority.
> >5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
> >    but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
> >    inactive anon lru list is full of dirty pages by 3 so it just returns
> >    without  any reclaim progress.
> >6. do_try_to_free_pages doesn't set may_writepage due to zero total_scanned.
> >    Because sc->nr_scanned is increased by shrink_page_list but we don't call
> >    shrink_page_list in 5 due to short of isolated pages.
> >
> >Above loop is continued until OOM happens.
> >The problem didn't happen before [1] was merged because old logic's
> >isolatation in shrink_inactive_list was successful and tried to call
> >shrink_page_list to pageout them but it still ends up failed to page out
> >by may_writepage. But important point is that sc->nr_scanned was increased
> >although we couldn't swap out them so do_try_to_free_pages could set
> >may_writepages.
> >
> >Since [1] was introduced, it's not a good idea any more to depends on
> >only the number of scanned pages for setting may_writepage. So this patch
> >adds new trigger point of setting may_writepage as below DEF_PRIOIRTY - 2
> >which is used to show the significant memory pressure in VM so it's good
> >fit for our purpose which would be better to lose power saving or clickety
> >rather than OOM killing.
> >
> >[1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]
> >
> >Reported-by: Luigi Semenzato <semenzato@google.com>
> >Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> Your patch is a nice simplification.  I am ok with the
> change, provided it works for Luigi :)

Thanks, Rik.

Oops, I missed to Ccing Luigi. Add him again.
Luigi, Could you test this patch?
Thanks for your endless effort.

> 
> Acked-by: Rik van Riel <riel@redhat.com>
> 
> 
> -- 
> All rights reversed
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
  2013-01-22  0:09                 ` Minchan Kim
@ 2013-02-05  1:43                   ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-02-05  1:43 UTC (permalink / raw)
  To: Rik van Riel, Luigi Semenzato, Andrew Morton
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Magenheimer,
	Sonny Rao, Bryan Freed, Hugh Dickins, Johannes Weiner

On Tue, Jan 22, 2013 at 09:09:54AM +0900, Minchan Kim wrote:
> On Mon, Jan 21, 2013 at 09:39:06AM -0500, Rik van Riel wrote:
> > On 01/20/2013 08:52 PM, Minchan Kim wrote:
> > 
> > > From 94086dc7152359d052802c55c82ef19509fe8cce Mon Sep 17 00:00:00 2001
> > >From: Minchan Kim <minchan@kernel.org>
> > >Date: Mon, 21 Jan 2013 10:43:43 +0900
> > >Subject: [PATCH] mm: Use up free swap space before reaching OOM kill
> > >
> > >Recently, Luigi reported there are lots of free swap space when
> > >OOM happens. It's easily reproduced on zram-over-swap, where
> > >many instance of memory hogs are running and laptop_mode is enabled.
> > >He said there was no problem when he disabled laptop_mode.
> > >The problem when I investigate problem is following as.
> > >
> > >Assumption for easy explanation: There are no page cache page in system
> > >because they all are already reclaimed.
> > >
> > >1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
> > >2. shrink_inactive_list isolates victim pages from inactive anon lru list.
> > >3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
> > >    pageout because sc->may_writepage is 0 so the page is rotated back into
> > >    inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty.
> > >4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
> > >    retry reclaim with higher priority.
> > >5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
> > >    but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
> > >    inactive anon lru list is full of dirty pages by 3 so it just returns
> > >    without  any reclaim progress.
> > >6. do_try_to_free_pages doesn't set may_writepage due to zero total_scanned.
> > >    Because sc->nr_scanned is increased by shrink_page_list but we don't call
> > >    shrink_page_list in 5 due to short of isolated pages.
> > >
> > >Above loop is continued until OOM happens.
> > >The problem didn't happen before [1] was merged because old logic's
> > >isolatation in shrink_inactive_list was successful and tried to call
> > >shrink_page_list to pageout them but it still ends up failed to page out
> > >by may_writepage. But important point is that sc->nr_scanned was increased
> > >although we couldn't swap out them so do_try_to_free_pages could set
> > >may_writepages.
> > >
> > >Since [1] was introduced, it's not a good idea any more to depends on
> > >only the number of scanned pages for setting may_writepage. So this patch
> > >adds new trigger point of setting may_writepage as below DEF_PRIOIRTY - 2
> > >which is used to show the significant memory pressure in VM so it's good
> > >fit for our purpose which would be better to lose power saving or clickety
> > >rather than OOM killing.
> > >
> > >[1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]
> > >
> > >Reported-by: Luigi Semenzato <semenzato@google.com>
> > >Signed-off-by: Minchan Kim <minchan@kernel.org>
> > 
> > Your patch is a nice simplification.  I am ok with the
> > change, provided it works for Luigi :)
> 
> Thanks, Rik.
> 
> Oops, I missed to Ccing Luigi. Add him again.
> Luigi, Could you test this patch?
> Thanks for your endless effort.
> 
> > 
> > Acked-by: Rik van Riel <riel@redhat.com>
> > 


Andrew,
I hope Luigi confirms this patch but he seems to be very busy.
At a minimum, I tested this patch and passed my test.
Could you apply this and remove [2]?
Otherwise, should I wait for Luigi?

[2] mm: prevent addition of pages to swap if may_writepage is unset

>From 72cdf4159427c1ecdbd21a40b8bd1f13d5b8d5e2 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Mon, 21 Jan 2013 10:52:22 +0900
Subject: [PATCH] mm: Use up free swap space before reaching OOM kill

Recently, Luigi reported there are lots of free swap space when
OOM happens. It's easily reproduced on zram-over-swap, where
many instance of memory hogs are running and laptop_mode is enabled.
He said there was no problem when he disabled laptop_mode.
The problem when I investigate problem is following as.

Assumption for easy explanation: There are no page cache page in system
because they all are already reclaimed.

1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
2. shrink_inactive_list isolates victim pages from inactive anon lru list.
3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
   pageout because sc->may_writepage is 0 so the page is rotated back into
   inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty.
4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
   retry reclaim with higher priority.
5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
   but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
   inactive anon lru list is full of dirty pages by 3 so it just returns
   without  any reclaim progress.
6. do_try_to_free_pages doesn't set may_writepage due to zero total_scanned.
   Because sc->nr_scanned is increased by shrink_page_list but we don't call
   shrink_page_list in 5 due to short of isolated pages.

Above loop is continued until OOM happens.
The problem didn't happen before [1] was merged because old logic's
isolatation in shrink_inactive_list was successful and tried to call
shrink_page_list to pageout them but it still ends up failed to page out
by may_writepage. But important point is that sc->nr_scanned was increased
although we couldn't swap out them so do_try_to_free_pages could set
may_writepages.

Since [1] was introduced, it's not a good idea any more to depends on
only the number of scanned pages for setting may_writepage. So this patch
adds new trigger point of setting may_writepage as below DEF_PRIOIRTY - 2
which is used to show the significant memory pressure in VM so it's good
fit for our purpose which would be better to lose power saving or clickety
rather than OOM killing.

[1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]

Reported-by: Luigi Semenzato <semenzato@google.com>
[Rik is ok if the patch works for Luigi]
Not-yet-Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/vmscan.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d75c1ec..4fb3a6d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2204,6 +2204,13 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 			goto out;
 
 		/*
+		 * If we're getting trouble reclaiming, start doing
+		 * writepage even in laptop mode.
+		 */
+		if (sc->priority < DEF_PRIORITY - 2)
+			sc->may_writepage = 1;
+
+		/*
 		 * Try to write back as many pages as we just scanned.  This
 		 * tends to cause slow streaming writers to write data to the
 		 * disk smoothly, at the dirtying rate, which is nice.   But
@@ -2774,12 +2781,10 @@ loop_again:
 			}
 
 			/*
-			 * If we've done a decent amount of scanning and
-			 * the reclaim ratio is low, start doing writepage
-			 * even in laptop mode
+			 * If we're getting trouble reclaiming, start doing
+			 * writepage even in laptop mode.
 			 */
-			if (total_scanned > SWAP_CLUSTER_MAX * 2 &&
-			    total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
+			if (sc.priority < DEF_PRIORITY - 2)
 				sc.may_writepage = 1;
 
 			if (zone->all_unreclaimable) {
-- 
1.8.1.1

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
@ 2013-02-05  1:43                   ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2013-02-05  1:43 UTC (permalink / raw)
  To: Rik van Riel, Luigi Semenzato, Andrew Morton
  Cc: linux-mm, linux-kernel, Dan Magenheimer, Sonny Rao, Bryan Freed,
	Hugh Dickins, Johannes Weiner

On Tue, Jan 22, 2013 at 09:09:54AM +0900, Minchan Kim wrote:
> On Mon, Jan 21, 2013 at 09:39:06AM -0500, Rik van Riel wrote:
> > On 01/20/2013 08:52 PM, Minchan Kim wrote:
> > 
> > > From 94086dc7152359d052802c55c82ef19509fe8cce Mon Sep 17 00:00:00 2001
> > >From: Minchan Kim <minchan@kernel.org>
> > >Date: Mon, 21 Jan 2013 10:43:43 +0900
> > >Subject: [PATCH] mm: Use up free swap space before reaching OOM kill
> > >
> > >Recently, Luigi reported there are lots of free swap space when
> > >OOM happens. It's easily reproduced on zram-over-swap, where
> > >many instance of memory hogs are running and laptop_mode is enabled.
> > >He said there was no problem when he disabled laptop_mode.
> > >The problem when I investigate problem is following as.
> > >
> > >Assumption for easy explanation: There are no page cache page in system
> > >because they all are already reclaimed.
> > >
> > >1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
> > >2. shrink_inactive_list isolates victim pages from inactive anon lru list.
> > >3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
> > >    pageout because sc->may_writepage is 0 so the page is rotated back into
> > >    inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty.
> > >4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
> > >    retry reclaim with higher priority.
> > >5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
> > >    but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
> > >    inactive anon lru list is full of dirty pages by 3 so it just returns
> > >    without  any reclaim progress.
> > >6. do_try_to_free_pages doesn't set may_writepage due to zero total_scanned.
> > >    Because sc->nr_scanned is increased by shrink_page_list but we don't call
> > >    shrink_page_list in 5 due to short of isolated pages.
> > >
> > >Above loop is continued until OOM happens.
> > >The problem didn't happen before [1] was merged because old logic's
> > >isolatation in shrink_inactive_list was successful and tried to call
> > >shrink_page_list to pageout them but it still ends up failed to page out
> > >by may_writepage. But important point is that sc->nr_scanned was increased
> > >although we couldn't swap out them so do_try_to_free_pages could set
> > >may_writepages.
> > >
> > >Since [1] was introduced, it's not a good idea any more to depends on
> > >only the number of scanned pages for setting may_writepage. So this patch
> > >adds new trigger point of setting may_writepage as below DEF_PRIOIRTY - 2
> > >which is used to show the significant memory pressure in VM so it's good
> > >fit for our purpose which would be better to lose power saving or clickety
> > >rather than OOM killing.
> > >
> > >[1] f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware]
> > >
> > >Reported-by: Luigi Semenzato <semenzato@google.com>
> > >Signed-off-by: Minchan Kim <minchan@kernel.org>
> > 
> > Your patch is a nice simplification.  I am ok with the
> > change, provided it works for Luigi :)
> 
> Thanks, Rik.
> 
> Oops, I missed to Ccing Luigi. Add him again.
> Luigi, Could you test this patch?
> Thanks for your endless effort.
> 
> > 
> > Acked-by: Rik van Riel <riel@redhat.com>
> > 


Andrew,
I hope Luigi confirms this patch but he seems to be very busy.
At a minimum, I tested this patch and passed my test.
Could you apply this and remove [2]?
Otherwise, should I wait for Luigi?

[2] mm: prevent addition of pages to swap if may_writepage is unset

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2013-02-05  1:43 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-09  6:21 [PATCH 0/2] Use up free swap space before reaching OOM kill Minchan Kim
2013-01-09  6:21 ` [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset Minchan Kim
2013-01-09  6:21   ` Minchan Kim
2013-01-09  6:56   ` Johannes Weiner
2013-01-09  6:56     ` Johannes Weiner
2013-01-09  7:10     ` Minchan Kim
2013-01-09  7:10       ` Minchan Kim
2013-01-10  0:18   ` Andrew Morton
2013-01-10  0:18     ` Andrew Morton
2013-01-10  2:03     ` Minchan Kim
2013-01-10  2:03       ` Minchan Kim
2013-01-10 23:24       ` Luigi Semenzato
2013-01-10 23:27         ` Luigi Semenzato
2013-01-10 23:27           ` Luigi Semenzato
2013-01-11  4:03         ` Minchan Kim
2013-01-11  4:03           ` Minchan Kim
2013-01-10  0:20   ` Andrew Morton
2013-01-10  0:20     ` Andrew Morton
2013-01-16 21:41   ` Andrew Morton
2013-01-16 21:41     ` Andrew Morton
2013-01-17  0:53     ` Minchan Kim
2013-01-17  0:53       ` Minchan Kim
2013-01-17 22:22       ` Andrew Morton
2013-01-17 22:22         ` Andrew Morton
2013-01-17 23:36         ` Minchan Kim
2013-01-17 23:36           ` Minchan Kim
2013-01-21  1:52           ` Minchan Kim
2013-01-21  1:52             ` Minchan Kim
2013-01-21 14:39             ` Rik van Riel
2013-01-21 14:39               ` Rik van Riel
2013-01-22  0:09               ` Minchan Kim
2013-01-22  0:09                 ` Minchan Kim
2013-02-05  1:43                 ` Minchan Kim
2013-02-05  1:43                   ` Minchan Kim
2013-01-09  6:21 ` [PATCH 2/2] mm: forcely swapout when we are out of page cache Minchan Kim
2013-01-09  6:21   ` Minchan Kim
2013-01-10  0:26   ` Andrew Morton
2013-01-10  0:26     ` Andrew Morton
2013-01-10  2:23     ` Minchan Kim
2013-01-10  2:23       ` Minchan Kim
2013-01-10 21:58       ` Andrew Morton
2013-01-10 21:58         ` Andrew Morton
2013-01-11  4:43         ` Minchan Kim
2013-01-11  4:43           ` Minchan Kim
2013-01-16  0:09           ` Andrew Morton
2013-01-16  0:09             ` Andrew Morton
2013-01-16  0:32             ` Sonny Rao
2013-01-16  0:32               ` Sonny Rao
2013-01-16  0:50               ` Andrew Morton
2013-01-16  0:50                 ` Andrew Morton
2013-01-16  1:21                 ` Sonny Rao
2013-01-16  1:21                   ` Sonny Rao
2013-01-16  4:47                   ` Minchan Kim
2013-01-16  4:47                     ` Minchan Kim
2013-01-16 20:08                     ` Sonny Rao
2013-01-16 20:08                       ` Sonny Rao
2013-01-16  4:43             ` Minchan Kim
2013-01-16  4:43               ` Minchan Kim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.