* RFC:  Easy-Reclaimable LRU list
@ 2012-06-19  5:49 ` Minchan Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Minchan Kim @ 2012-06-19  5:49 UTC (permalink / raw)
  To: linux-mm, LKML
  Cc: Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Johannes Weiner, Andrea Arcangeli, Andrew Morton,
	Anton Vorontsov, John Stultz, Pekka Enberg, Wu Fengguang,
	Hugh Dickins

Hi everybody!

Recently, there have been several efforts to handle system memory pressure.

1) low memory notification - [1]
2) fallocate(VOLATILE) - [2]
3) fadvise(NOREUSE) - [3]

For them, I would like to add a new LRU list, aka "Ereclaimable", which is the opposite of "unevictable".
The Ereclaimable LRU list holds _easily_ reclaimable pages.
For example, the following pages are easily reclaimable:

1. invalidated pages that still remain on the LRU list
2. pages under pageout for reclaim (PG_reclaim pages)
3. fadvise(NOREUSE) pages
4. fallocate(VOLATILE) pages

These pages shouldn't stir the normal LRU lists, and compaction might not even need to migrate them.
The reclaimer can reclaim Ereclaimable pages before the normal LRU lists and can avoid unnecessary
swapout of anon pages that sit on the easy-reclaimable LRU list.
It also lets an admin measure, without latency, how many pages are available at the moment.
That is very important on recent mobile systems, because page reclaim/writeback is critical
for application latency. Of course, it could affect normal desktops, too.
With it, we can calculate fast-available pages more exactly, for example as NR_FREE_PAGES + NR_ERECLAIMABLE_PAGES.
If that falls below a threshold we define, we could trigger a 1st-level notification,
if we really need to prototype low memory notification.
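
As a rough illustration of the calculation above, here is a minimal userspace sketch. The struct, the function names and the threshold parameter are hypothetical; only the NR_FREE_PAGES + NR_ERECLAIMABLE_PAGES sum comes from the proposal, and nothing like this exists in the kernel yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical snapshot of the two counters named in the proposal.
 * This struct is an illustration only, not a kernel structure.
 */
struct vm_counters {
	unsigned long nr_free_pages;
	unsigned long nr_ereclaimable_pages;
};

/* Pages that could be handed out without waiting for writeback or swap I/O. */
static unsigned long fast_available_pages(const struct vm_counters *c)
{
	return c->nr_free_pages + c->nr_ereclaimable_pages;
}

/* True when a first-level low-memory notification should be triggered. */
static bool should_notify_low_memory(const struct vm_counters *c,
				     unsigned long threshold_pages)
{
	assert(c != NULL);
	return fast_available_pages(c) < threshold_pages;
}
```

With 1000 free pages and 500 Ereclaimable pages, a threshold of 1501 pages would trigger the notification while a threshold of 1500 would not.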

We may also change the madvise(DONTNEED) implementation so that it doesn't zap pages immediately.
If memory pressure doesn't happen, the pages stay in memory, so we can avoid many minor faults.
Of course, if system memory pressure happens, we can discard them instead of swapping them out.
We might implement it as madvise(VOLATILE) instead of DONTNEED, but anyway that's off-topic in this thread.

As another example, we can implement CFLRU (Clean-First LRU), which reclaims unmapped clean cache pages first.
The rationale is that on non-rotational devices, read/write cost is very asymmetric.
Reads are very fast while writes are very slow, so it would be a gain if we can avoid writeback of dirty pages
where possible, even though we need several more reads. It can be implemented easily with Ereclaimable pages, too.
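
A clean-first victim choice could look roughly like the sketch below. It is a userspace simulation under stated assumptions: `struct sim_page`, `cflru_pick_victim()` and the search window are hypothetical names invented for illustration, and a real implementation would work on the kernel's LRU lists rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a page on an LRU list (illustration only). */
struct sim_page {
	bool dirty;	/* would need writeback before it can be freed */
	bool mapped;	/* still mapped by some process */
};

/*
 * lru[0] is the coldest page. Search the first `window` pages for one
 * that is clean and unmapped, so it can be dropped without any write.
 * If none is found, fall back to plain LRU order (index 0).
 * Returns the victim index, or -1 when the list is empty.
 */
static long cflru_pick_victim(const struct sim_page *lru, size_t n,
			      size_t window)
{
	assert(lru != NULL || n == 0);

	if (n == 0)
		return -1;
	if (window > n)
		window = n;

	for (size_t i = 0; i < window; i++) {
		if (!lru[i].dirty && !lru[i].mapped)
			return (long)i;
	}
	return 0;
}
```

The window bounds how far the search may deviate from strict LRU order; a larger window avoids more writes at the price of evicting warmer pages.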

Anyway, this is just at the brainstorming phase and nothing is implemented yet, but I decided to post before it's too late.
I hope to hear others' opinions before getting into the code.

Any comments are welcome.
Thanks.

[1] http://lkml.org/lkml/2012/5/1/97
[2] https://lkml.org/lkml/2012/6/1/322
[3] https://lkml.org/lkml/2011/6/24/136

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC:  Easy-Reclaimable LRU list
  2012-06-19  5:49 ` Minchan Kim
@ 2012-06-21 19:21   ` John Stultz
  -1 siblings, 0 replies; 26+ messages in thread
From: John Stultz @ 2012-06-21 19:21 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, LKML, Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki,
	KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli,
	Andrew Morton, Anton Vorontsov, Pekka Enberg, Wu Fengguang,
	Hugh Dickins

On 06/18/2012 10:49 PM, Minchan Kim wrote:
> Hi everybody!
>
> Recently, there are some efforts to handle system memory pressure.
>
> 1) low memory notification - [1]
> 2) fallocate(VOLATILE) - [2]
> 3) fadvise(NOREUSE) - [3]
>
> For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable".
> Reclaimable LRU list includes _easy_ reclaimable pages.
> For example, easy reclaimable pages are following as.
>
> 1. invalidated but remained LRU list.
> 2. pageout pages for reclaim(PG_reclaim pages)
> 3. fadvise(NOREUSE)
> 4. fallocate(VOLATILE)
>
> Their pages shouldn't stir normal LRU list and compaction might not migrate them, even.
> Reclaimer can reclaim Ereclaimable pages before normal lru list and will avoid unnecessary
> swapout in anon pages in easy-reclaimable LRU list.

I was hoping there would be further comment on this by more core VM 
devs, but so far things have been quiet (is everyone on vacation?).

Overall this seems reasonable for the volatile ranges functionality.  
The one down-side being that dealing with the ranges on a per-page basis 
can make marking and unmarking larger ranges as volatile fairly 
expensive. In my tests with my last patchset, it was over 75x slower
(~1.5ms) marking and unmarking a 1meg range when we deactivate and
activate all of the pages, instead of just inserting the volatile range
into an interval tree and purging via the shrinker (~20us).  Granted, my
initial approach is somewhat naive, and some pagevec batching has
improved things three-fold (down to ~500us), but I'm still ~25x slower
when iterating over all the pages.
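
To make the cost difference concrete, the sketch below counts the bookkeeping operations each approach needs. It assumes 4 KiB pages (not stated in the mail) and the helper names are hypothetical; it only counts operations and does not model their individual cost.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_PAGE_SIZE 4096UL	/* assumed page size, not stated in the mail */

/*
 * Per-page approach: every page overlapping [start, start + len) has to
 * be moved between LRU lists, so the work grows with the range length.
 */
static unsigned long per_page_ops(unsigned long start, unsigned long len)
{
	unsigned long first, last;

	assert(len > 0);
	first = start / SIM_PAGE_SIZE;
	last = (start + len - 1) / SIM_PAGE_SIZE;
	return last - first + 1;
}

/*
 * Range approach: the whole range is recorded with a single interval
 * tree insertion, independent of how many pages it covers.
 */
static unsigned long range_ops(unsigned long start, unsigned long len)
{
	(void)start;
	(void)len;
	return 1;
}
```

Under that page-size assumption a 1meg range covers 256 pages, so the per-page path performs 256 list moves where the range path performs one insertion.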

There are surely further improvements to be made, but this added cost
worries me, as users are unlikely to generously volunteer up memory to 
the kernel as volatile if doing so frequently adds significant overhead.

This makes me wonder if having something like an early-shrinker which 
gets called prior to shrinking the lrus might be a better approach for 
volatile ranges. It would still be numa-unaware, but would keep the 
overhead very light to both volatile users and non users.

Even so, I'd be interested in seeing more about your approach, in the 
hopes that it might not be as costly as my initial attempt. Do you have 
any plans to start prototyping this?

thanks
-john


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC:  Easy-Reclaimable LRU list
  2012-06-21 19:21   ` John Stultz
@ 2012-06-22  6:57     ` Minchan Kim
  -1 siblings, 0 replies; 26+ messages in thread
From: Minchan Kim @ 2012-06-22  6:57 UTC (permalink / raw)
  To: John Stultz
  Cc: linux-mm, LKML, Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki,
	KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli,
	Andrew Morton, Anton Vorontsov, Pekka Enberg, Wu Fengguang,
	Hugh Dickins

Hi John,

On 06/22/2012 04:21 AM, John Stultz wrote:

> On 06/18/2012 10:49 PM, Minchan Kim wrote:
>> Hi everybody!
>>
>> Recently, there are some efforts to handle system memory pressure.
>>
>> 1) low memory notification - [1]
>> 2) fallocate(VOLATILE) - [2]
>> 3) fadvise(NOREUSE) - [3]
>>
>> For them, I would like to add new LRU list, aka "Ereclaimable" which
>> is opposite of "unevictable".
>> Reclaimable LRU list includes _easy_ reclaimable pages.
>> For example, easy reclaimable pages are following as.
>>
>> 1. invalidated but remained LRU list.
>> 2. pageout pages for reclaim(PG_reclaim pages)
>> 3. fadvise(NOREUSE)
>> 4. fallocate(VOLATILE)
>>
>> Their pages shouldn't stir normal LRU list and compaction might not
>> migrate them, even.
>> Reclaimer can reclaim Ereclaimable pages before normal lru list and
>> will avoid unnecessary
>> swapout in anon pages in easy-reclaimable LRU list.
> 
> I was hoping there would be further comment on this by more core VM
> devs, but so far things have been quiet (is everyone on vacation?).


At least there have been no dissenting comments so far.
Let's be positive. :)

> 
> Overall this seems reasonable for the volatile ranges functionality. 
> The one down-side being that dealing with the ranges on a per-page basis
> can make marking and unmarking larger ranges as volatile fairly
> expensive. In my tests with my last patchset, it was over 75x slower
> (~1.5ms) marking and umarking a 1meg range when we deactivate and
> activate all of the pages, instead of just inserting the volatile range
> into an interval tree and purge via the shrinker (~20us).  Granted, my
> initial approach is somewhat naive, and some pagevec batching has
> improved things three-fold (down to ~500us) , but I'm still ~25x slower
> when iterating over all the pages.
> 
> There's surely further improvements to be made, but this added cost
> worries me, as users are unlikely to generously volunteer up memory to
> the kernel as volatile if doing so frequently adds significant overhead.
> 
> This makes me wonder if having something like an early-shrinker which
> gets called prior to shrinking the lrus might be a better approach for
> volatile ranges. It would still be numa-unaware, but would keep the
> overhead very light to both volatile users and non users.


How about doing it in the background?
From your process context, you can schedule the work on a workqueue, and when the work is executed,
it can move the pages onto the LRU list you want.
Just an idea.
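
The deferral idea can be sketched in userspace as follows. This is only a simulation: the fixed-size queue, `defer_mark_volatile()` and `defer_drain()` are hypothetical names, and the drain function merely stands in for whatever a workqueue item would do in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFER_MAX 16	/* arbitrary queue depth for the sketch */

struct vrange {
	unsigned long first_page;
	unsigned long nr_pages;
};

struct defer_queue {
	struct vrange pending[DEFER_MAX];
	size_t count;
	unsigned long pages_moved;	/* total pages handled by the drain */
};

/*
 * Fast path, run in the caller's context: only record the range.
 * Returns false when the queue is full and the caller must fall back.
 */
static bool defer_mark_volatile(struct defer_queue *q,
				unsigned long first_page,
				unsigned long nr_pages)
{
	assert(q != NULL);
	if (q->count == DEFER_MAX)
		return false;
	q->pending[q->count++] = (struct vrange){ first_page, nr_pages };
	return true;
}

/*
 * Slow path, standing in for the deferred work item: walk the queued
 * ranges and account for each page that would be moved to the new list.
 */
static void defer_drain(struct defer_queue *q)
{
	assert(q != NULL);
	for (size_t i = 0; i < q->count; i++)
		q->pages_moved += q->pending[i].nr_pages;
	q->count = 0;
}
```

The caller pays only for one queue entry per range; the per-page cost is still paid, just later and outside the caller's latency path.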

> 
> Even so, I'd be interested in seeing more about your approach, in the
> hopes that it might not be as costly as my initial attempt. Do you have
> any plans to start prototyping this?


I will wait a few days for responses, and if nobody raises critical problems, I will start.
But please keep in mind that I guess it's far from trivial, so you shouldn't depend on my schedule.
Thanks.

> 
> thanks
> -john
> 



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC:  Easy-Reclaimable LRU list
  2012-06-22  6:57     ` Minchan Kim
@ 2012-06-23  4:45       ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 26+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-23  4:45 UTC (permalink / raw)
  To: Minchan Kim
  Cc: John Stultz, linux-mm, LKML, Rik van Riel, Mel Gorman,
	KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli,
	Andrew Morton, Anton Vorontsov, Pekka Enberg, Wu Fengguang,
	Hugh Dickins

(2012/06/22 15:57), Minchan Kim wrote:
> Hi John,
>
> On 06/22/2012 04:21 AM, John Stultz wrote:
>
>> On 06/18/2012 10:49 PM, Minchan Kim wrote:
>>> Hi everybody!
>>>
>>> Recently, there are some efforts to handle system memory pressure.
>>>
>>> 1) low memory notification - [1]
>>> 2) fallocate(VOLATILE) - [2]
>>> 3) fadvise(NOREUSE) - [3]
>>>
>>> For them, I would like to add new LRU list, aka "Ereclaimable" which
>>> is opposite of "unevictable".
>>> Reclaimable LRU list includes _easy_ reclaimable pages.
>>> For example, easy reclaimable pages are following as.
>>>
>>> 1. invalidated but remained LRU list.
>>> 2. pageout pages for reclaim(PG_reclaim pages)
>>> 3. fadvise(NOREUSE)
>>> 4. fallocate(VOLATILE)
>>>
>>> Their pages shouldn't stir normal LRU list and compaction might not
>>> migrate them, even.
>>> Reclaimer can reclaim Ereclaimable pages before normal lru list and
>>> will avoid unnecessary
>>> swapout in anon pages in easy-reclaimable LRU list.
>>
>> I was hoping there would be further comment on this by more core VM
>> devs, but so far things have been quiet (is everyone on vacation?).
>
>
> At least, there are no dissent comment until now.
> Let be a positive. :)

I think this is an interesting approach. My major concern is how to guarantee that EReclaimable
pages are really EReclaimable... Do you have any ideas? Are madvise()d pages really
EReclaimable?

A (very) small concern: will you use one more page flag for this? ;)

Thanks,
-Kame






^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC:  Easy-Reclaimable LRU list
  2012-06-23  4:45       ` Kamezawa Hiroyuki
@ 2012-06-23 15:53         ` Rik van Riel
  -1 siblings, 0 replies; 26+ messages in thread
From: Rik van Riel @ 2012-06-23 15:53 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Minchan Kim, John Stultz, linux-mm, LKML, Mel Gorman,
	KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli,
	Andrew Morton, Anton Vorontsov, Pekka Enberg, Wu Fengguang,
	Hugh Dickins

On 06/23/2012 12:45 AM, Kamezawa Hiroyuki wrote:

> I think this is interesting approach. Major concern is how to guarantee
> EReclaimable
> pages are really EReclaimable...Do you have any idea ? madviced pages
> are really EReclaimable ?

I suspect the EReclaimable pages can only be clean page
cache pages that are not mapped by any processes.

Once somebody tries to use the page, mark_page_accessed
will move it to another list.

> A (very) small concern is will you use one more page-flags for this ? ;)

This could be an issue on a 32 bit system, true.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC: Easy-Reclaimable LRU list
  2012-06-23 15:53         ` Rik van Riel
@ 2012-06-24 11:09           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 26+ messages in thread
From: KOSAKI Motohiro @ 2012-06-24 11:09 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Kamezawa Hiroyuki, Minchan Kim, John Stultz, linux-mm, LKML,
	Mel Gorman, Johannes Weiner, Andrea Arcangeli, Andrew Morton,
	Anton Vorontsov, Pekka Enberg, Wu Fengguang, Hugh Dickins

On Sat, Jun 23, 2012 at 11:53 AM, Rik van Riel <riel@redhat.com> wrote:
> On 06/23/2012 12:45 AM, Kamezawa Hiroyuki wrote:
>
>> I think this is interesting approach. Major concern is how to guarantee
>> EReclaimable
>> pages are really EReclaimable...Do you have any idea ? madviced pages
>> are really EReclaimable ?
>
> I suspect the EReclaimable pages can only be clean page
> cache pages that are not mapped by any processes.
>
> Once somebody tries to use the page, mark_page_accessed
> will move it to another list.

100% agree.


>> A (very) small concern is will you use one more page-flags for this ? ;)
>
> This could be an issue on a 32 bit system, true.

Do we really need the SwapBacked bit? Actually, swap-backed is a
per-superblock attribute and doesn't change dynamically (i.e. no race
happens). Thus this bit
might be able to move into page->mapping or page->mapping->host.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC:  Easy-Reclaimable LRU list
  2012-06-23  4:45       ` Kamezawa Hiroyuki
@ 2012-06-25  0:15         ` Minchan Kim
  -1 siblings, 0 replies; 26+ messages in thread
From: Minchan Kim @ 2012-06-25  0:15 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: John Stultz, linux-mm, LKML, Rik van Riel, Mel Gorman,
	KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli,
	Andrew Morton, Anton Vorontsov, Pekka Enberg, Wu Fengguang,
	Hugh Dickins

Hi Kame,

On 06/23/2012 01:45 PM, Kamezawa Hiroyuki wrote:

> (2012/06/22 15:57), Minchan Kim wrote:
>> Hi John,
>>
>> On 06/22/2012 04:21 AM, John Stultz wrote:
>>
>>> On 06/18/2012 10:49 PM, Minchan Kim wrote:
>>>> Hi everybody!
>>>>
>>>> Recently, there are some efforts to handle system memory pressure.
>>>>
>>>> 1) low memory notification - [1]
>>>> 2) fallocate(VOLATILE) - [2]
>>>> 3) fadvise(NOREUSE) - [3]
>>>>
>>>> For them, I would like to add new LRU list, aka "Ereclaimable" which
>>>> is opposite of "unevictable".
>>>> Reclaimable LRU list includes _easy_ reclaimable pages.
>>>> For example, easy reclaimable pages are following as.
>>>>
>>>> 1. invalidated but remained LRU list.
>>>> 2. pageout pages for reclaim(PG_reclaim pages)
>>>> 3. fadvise(NOREUSE)
>>>> 4. fallocate(VOLATILE)
>>>>
>>>> Their pages shouldn't stir normal LRU list and compaction might not
>>>> migrate them, even.
>>>> Reclaimer can reclaim Ereclaimable pages before normal lru list and
>>>> will avoid unnecessary
>>>> swapout in anon pages in easy-reclaimable LRU list.
>>>
>>> I was hoping there would be further comment on this by more core VM
>>> devs, but so far things have been quiet (is everyone on vacation?).
>>
>>
>> At least, there are no dissent comment until now.
>> Let be a positive. :)
> 
> I think this is interesting approach. Major concern is how to guarantee
> EReclaimable
> pages are really EReclaimable...Do you have any idea ? madviced pages
> are really
> EReclaimable ?


I would like to select only discardable pages:

1. unmapped file pages
2. PG_reclaim pages - (those pages are no longer mapped and are candidates
   for reclaim ASAP)
3. fallocate(VOLATILE) - (we can just discard them without swapout)
4. madvise(MADV_DONTNEED)/fadvise(NOREUSE) -
   (these could be more difficult than (1,2,3), but they are still likely to be easier to reclaim than other pages)

> 
> A (very) small concern is will you use one more page-flags for this ? ;)


Maybe, and it could be a serious problem on 32-bit machines.
I haven't dived into that, but I guess we can reuse the PG_reclaim bit.
PG_reclaim is always used together with !PageActive, and the Ereclaimable LRU has no
active list, so we can change it as follows:

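- #define PG_reclaim
+ #define PG_Ereclaim

/* The old PG_reclaim meaning is encoded as PG_Ereclaim plus PG_active. */
SetPageReclaim(page)
{
	page->flags |= (PG_Ereclaim|PG_active);
}

/* True only when both bits are set. */
TestPageReclaim(page)
{
	if ((page->flags & (PG_Ereclaim|PG_active)) == (PG_Ereclaim|PG_active))
		return true;
	return false;
}

/* PG_Ereclaim without PG_active marks a page on the Ereclaimable LRU. */
SetPageEreclaim(page)
{
	page->flags |= PG_Ereclaim;
}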
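
For anyone who wants to try the encoding outside the kernel, here is a self-contained userspace sketch of the same idea. The bit positions and the `sim_` names are made up for illustration, the flags are treated as plain masks rather than the kernel's bit numbers, and `test_page_ereclaim()` is an extra helper that is not part of the pseudocode above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative masks only; not the kernel's real page-flag layout. */
#define SIM_PG_active	(1UL << 0)
#define SIM_PG_Ereclaim	(1UL << 1)

struct sim_page {
	unsigned long flags;
};

/* Old PG_reclaim meaning: both bits set at once. */
static void set_page_reclaim(struct sim_page *page)
{
	page->flags |= SIM_PG_Ereclaim | SIM_PG_active;
}

static bool test_page_reclaim(const struct sim_page *page)
{
	const unsigned long both = SIM_PG_Ereclaim | SIM_PG_active;

	return (page->flags & both) == both;
}

/* New meaning: the Ereclaim bit alone, which requires an inactive page. */
static void set_page_ereclaim(struct sim_page *page)
{
	assert(!(page->flags & SIM_PG_active));
	page->flags |= SIM_PG_Ereclaim;
}

/* Hypothetical helper: Ereclaim bit set and active bit clear. */
static bool test_page_ereclaim(const struct sim_page *page)
{
	const unsigned long both = SIM_PG_Ereclaim | SIM_PG_active;

	return (page->flags & both) == SIM_PG_Ereclaim;
}
```

The scheme works only as long as no page on the Ereclaimable LRU is ever marked active, which is the assumption the mail relies on.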

Thanks for the comment, Kame.
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC:  Easy-Reclaimable LRU list
  2012-06-19  5:49 ` Minchan Kim
@ 2012-06-25  8:46   ` Glauber Costa
  -1 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2012-06-25  8:46 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, LKML, Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki,
	KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli,
	Andrew Morton, Anton Vorontsov, John Stultz, Pekka Enberg,
	Wu Fengguang, Hugh Dickins

On 06/19/2012 09:49 AM, Minchan Kim wrote:
> Hi everybody!
>
> Recently, there are some efforts to handle system memory pressure.
>
> 1) low memory notification - [1]
> 2) fallocate(VOLATILE) - [2]
> 3) fadvise(NOREUSE) - [3]
>
> For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable".
> Reclaimable LRU list includes _easy_ reclaimable pages.
> For example, easy reclaimable pages are following as.
>
> 1. invalidated but remained LRU list.
> 2. pageout pages for reclaim(PG_reclaim pages)
> 3. fadvise(NOREUSE)
> 4. fallocate(VOLATILE)
>
> Their pages shouldn't stir normal LRU list and compaction might not migrate them, even.

What about other things that move memory, like CMA?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC:  Easy-Reclaimable LRU list
  2012-06-19  5:49 ` Minchan Kim
@ 2012-06-25 10:24   ` Mel Gorman
  -1 siblings, 0 replies; 26+ messages in thread
From: Mel Gorman @ 2012-06-25 10:24 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, LKML, Rik van Riel, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Johannes Weiner, Andrea Arcangeli, Andrew Morton,
	Anton Vorontsov, John Stultz, Pekka Enberg, Wu Fengguang,
	Hugh Dickins

On Tue, Jun 19, 2012 at 02:49:01PM +0900, Minchan Kim wrote:
> Hi everybody!
> 
> Recently, there are some efforts to handle system memory pressure.
> 
> 1) low memory notification - [1]
> 2) fallocate(VOLATILE) - [2]
> 3) fadvise(NOREUSE) - [3]
> 
> For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable".
> Reclaimable LRU list includes _easy_ reclaimable pages.
> For example, easy reclaimable pages are following as. 
> 
> 1. invalidated but remained LRU list.
> 2. pageout pages for reclaim(PG_reclaim pages)
> 3. fadvise(NOREUSE)
> 4. fallocate(VOLATILE)
> 
> Their pages shouldn't stir normal LRU list and compaction might not migrate them, even.

Why would compaction not migrate them? We might still want to migrate
NOREUSE or VOLATILE pages.

> Reclaimer can reclaim Ereclaimable pages before normal lru list and will avoid unnecessary
> swapout in anon pages in easy-reclaimable LRU list.
> It also can make admin measure how many we have available pages at the moment without latency.

That's not true for PG_reclaim pages as those pages cannot be discarded
until writeback completes.

One reason why I tried moving PG_reclaim pages to a separate list was
to avoid excessive scanning when writing back to slow devices. If those
pages were moved to an "easy-reclaimable" LRU list then the value would
be reduced as scanning would still occur. It might make it worse because
the whole Ereclaimable list would be scanned for pages that cannot be
reclaimed at all before moving to another LRU list.
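
To illustrate the concern, here is a toy model (plain C, not kernel code;
struct scan_page and scan_list() are hypothetical names). It assumes the
reclaimer walks the list from the front and has to skip every page that is
still under writeback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page on the proposed list. */
struct scan_page {
	bool under_writeback;	/* cannot be discarded yet */
};

struct scan_result {
	size_t scanned;		/* pages looked at */
	size_t reclaimed;	/* pages actually freed */
};

/*
 * Walk the list front to back until nr_to_reclaim pages have been
 * freed or the list ends. Pages under writeback are scanned but skipped.
 */
static inline struct scan_result scan_list(const struct scan_page *list,
					   size_t len, size_t nr_to_reclaim)
{
	struct scan_result res = { 0, 0 };

	for (size_t i = 0; i < len && res.reclaimed < nr_to_reclaim; i++) {
		res.scanned++;
		if (!list[i].under_writeback)
			res.reclaimed++;
	}
	return res;
}
```

With four writeback pages ahead of one clean page, freeing a single page costs
five scans; if every page is under writeback, the whole list is scanned for
nothing.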

This separate list does not exist today because it required a page bit to
implement and I did not want it to be a 64-bit only feature. You will
probably hit the same problem.

The setting of the page bit is also going to be a problem but you may be
able to lazily move pages to the EReclaimable list in the same way
unevictable pages are handled.

> It's very important in recent mobile systems because page reclaim/writeback is very critical
> of application latency. Of course, it could affect normal desktop, too.
> With it, we can calculate fast-available pages more exactly with NR_FREE_PAGES + NR_ERECLAIMABLE_PAGES,
> for example. If it's below threshold we defined, we could trigger 1st level notification
> if we really need prototying low memory notification.
> 

If PG_reclaim pages are on this list, then that calculation will not be
helpful.
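
A rough illustration of why, as a toy calculation in plain C. The counter
names are hypothetical, including the split between discardable pages and
pages still under writeback, which no existing counter provides.

```c
#include <stdbool.h>

/* Hypothetical per-zone counters for this sketch only. */
struct zone_counters {
	unsigned long nr_free;			/* free pages */
	unsigned long nr_ereclaimable;		/* pages on the proposed list */
	unsigned long nr_ereclaimable_writeback;/* of those, under writeback */
};

/* The estimate from the proposal: free pages plus everything on the list. */
static inline unsigned long naive_fast_available(const struct zone_counters *z)
{
	return z->nr_free + z->nr_ereclaimable;
}

/*
 * What can be had without waiting for I/O. Assumes the writeback count
 * never exceeds the list count.
 */
static inline unsigned long usable_fast_available(const struct zone_counters *z)
{
	return z->nr_free + z->nr_ereclaimable - z->nr_ereclaimable_writeback;
}

/* First-level notification check against an admin-defined threshold. */
static inline bool below_threshold(unsigned long avail, unsigned long threshold)
{
	return avail < threshold;
}
```

With 100 free pages, 50 pages on the list and 40 of those under writeback, the
naive sum reports 150 while only 110 are available without waiting, so a
notification threshold of 120 would be missed.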

> We may change madvise(DONTNEED) implementation instead of zapping page immediately.
> If memory pressure doesn't happen, pages are in memory so we can avoid so many minor fault.
> Of course, we can discard instead of swap out if system memory pressure happens.
> We might implement it madvise(VOLATILE) instead of DONTNEED, but anyway it's off-topic in this thread.
> 
> As a another example, we can implement CFLRU(Clean-First LRU) which reclaims unmapped-clean cache page firstly.

That alters ageing of pages significantly. It means that workloads that
are using read heavily will have their pages discarded first.

> The rationale is that in non-rotation device, read/write cost is much asynchronous.

While this is true that does not justify throwing away unmapped clean
page cache first every time.

> Read is very fast while write is very slow so it would be a gain while we can avoid writeback of dirty pages
> if possible although we need several reads. It can be implemented easily with Ereclaimable pages, too.
> 
> Anyway, it's just a brain-storming phase and never implemented yet but decide posting before it's too late.
> I hope listen others opinion before get into the code.
> 

Care is needed. I think you'll only be able to use this list for
NOREUSE, VOLATILE and invalidated pages. If you add PG_reclaim pages, the list
will not be "easily-reclaimable", and if you add clean unmapped pages then there
will be regressions in workloads that are read-intensive.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC:  Easy-Reclaimable LRU list
  2012-06-25  8:46   ` Glauber Costa
@ 2012-06-26  0:12     ` Minchan Kim
  -1 siblings, 0 replies; 26+ messages in thread
From: Minchan Kim @ 2012-06-26  0:12 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, LKML, Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki,
	KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli,
	Andrew Morton, Anton Vorontsov, John Stultz, Pekka Enberg,
	Wu Fengguang, Hugh Dickins

On 06/25/2012 05:46 PM, Glauber Costa wrote:

> On 06/19/2012 09:49 AM, Minchan Kim wrote:
>> Hi everybody!
>>
>> Recently, there are some efforts to handle system memory pressure.
>>
>> 1) low memory notification - [1]
>> 2) fallocate(VOLATILE) - [2]
>> 3) fadvise(NOREUSE) - [3]
>>
>> For them, I would like to add new LRU list, aka "Ereclaimable" which
>> is opposite of "unevictable".
>> Reclaimable LRU list includes _easy_ reclaimable pages.
>> For example, easy reclaimable pages are following as.
>>
>> 1. invalidated but remained LRU list.
>> 2. pageout pages for reclaim(PG_reclaim pages)
>> 3. fadvise(NOREUSE)
>> 4. fallocate(VOLATILE)
>>
>> Their pages shouldn't stir normal LRU list and compaction might not
>> migrate them, even.
> What about other things moving memory like CMA ?


Sorry for not being able to understand your point.
Can you elaborate a bit more?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC:  Easy-Reclaimable LRU list
  2012-06-25 10:24   ` Mel Gorman
@ 2012-06-26  0:26     ` Minchan Kim
  -1 siblings, 0 replies; 26+ messages in thread
From: Minchan Kim @ 2012-06-26  0:26 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, LKML, Rik van Riel, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Johannes Weiner, Andrea Arcangeli, Andrew Morton,
	Anton Vorontsov, John Stultz, Pekka Enberg, Wu Fengguang,
	Hugh Dickins

On 06/25/2012 07:24 PM, Mel Gorman wrote:

> On Tue, Jun 19, 2012 at 02:49:01PM +0900, Minchan Kim wrote:
>> Hi everybody!
>>
>> Recently, there are some efforts to handle system memory pressure.
>>
>> 1) low memory notification - [1]
>> 2) fallocate(VOLATILE) - [2]
>> 3) fadvise(NOREUSE) - [3]
>>
>> For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable".
>> Reclaimable LRU list includes _easy_ reclaimable pages.
>> For example, easy reclaimable pages are following as. 
>>
>> 1. invalidated but remained LRU list.
>> 2. pageout pages for reclaim(PG_reclaim pages)
>> 3. fadvise(NOREUSE)
>> 4. fallocate(VOLATILE)
>>
>> Their pages shouldn't stir normal LRU list and compaction might not migrate them, even.
> 
> Why would compaction not migrate them? We might still want to migrate
> NOREUSE or VOLATILE pages.


It might.

> 
>> Reclaimer can reclaim Ereclaimable pages before normal lru list and will avoid unnecessary
>> swapout in anon pages in easy-reclaimable LRU list.
>> It also can make admin measure how many we have available pages at the moment without latency.
> 
> That's not true for PG_reclaim pages as those pages cannot be discarded
> until writeback completes.
>
> One reason why I tried moving PG_reclaim pages to a separate list was
> to avoid excessive scanning when writing back to slow devices. If those
> pages were moved to an "easy-reclaimable" LRU list then the value would
> be reduced as scanning would still occur. It might make it worse because
> the whole Ereclaimable list would be scanned for pages that cannot be
> reclaimed at all before moving to another LRU list.


I should have been clearer.
I mean something like the following:

end_page_writeback(struct page *page)
{
	/* Writeback is done, so the page can now be discarded. */
	if (PageReclaim(page))
		move_ereclaim_lru_list(page);
}

So the pages on the Ereclaimable LRU list would be discardable.
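
A slightly fuller toy model of the same idea, in user-space C. Everything here
is hypothetical (struct wb_page, wb_end_writeback(), the counter); it only
shows the intended ordering: a page tagged for reclaim joins the Ereclaimable
list when its writeback ends, not before.

```c
#include <stdbool.h>

/* Hypothetical LRU identifiers for this sketch. */
enum wb_lru {
	WB_LRU_INACTIVE_FILE,
	WB_LRU_ERECLAIM,
};

struct wb_page {
	bool reclaim;		/* models PageReclaim() */
	bool writeback;		/* I/O still in flight */
	enum wb_lru lru;	/* list the page currently sits on */
};

struct wb_zone {
	unsigned long nr_ereclaimable;	/* pages on the Ereclaimable list */
};

/*
 * Called when writeback of the page completes. Only pages that reclaim
 * tagged earlier are moved; the counter grows only at this point, so it
 * never includes pages that are still waiting for I/O.
 */
static inline void wb_end_writeback(struct wb_zone *zone, struct wb_page *page)
{
	page->writeback = false;
	if (page->reclaim && page->lru != WB_LRU_ERECLAIM) {
		page->lru = WB_LRU_ERECLAIM;
		zone->nr_ereclaimable++;
	}
}
```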
> 

> This separate list does not exist today because it required a page bit to
> implement and I did not want it to be a 64-bit only feature. You will
> probably hit the same problem.


True. Others have already pointed that out in this thread.
And I posted an idea.

Copy/Paste

"
Maybe, and that could be a serious problem on 32-bit machines.
I haven't dug into it, but I guess we can reuse the PG_reclaim bit.
PG_reclaim is always used together with !PageActive, and the Ereclaimable LRU list
has no active list, so we could change it as follows:

- #define PG_reclaim
+ #define PG_Ereclaim

/* Pseudocode: PG_* are treated as bit masks here. */

/* Old PG_reclaim meaning: Ereclaim and active bits both set. */
SetPageReclaim(page)
{
	page->flags |= (PG_Ereclaim|PG_active);
}

TestPageReclaim(page)
{
	/* Both bits must be set. */
	if ((page->flags & (PG_Ereclaim|PG_active)) == (PG_Ereclaim|PG_active))
		return true;
	return false;
}

/* New meaning: Ereclaim bit alone. */
SetPageEreclaim(page)
{
	page->flags |= PG_Ereclaim;
}
"

> 
> The setting of the page bit is also going to be a problem but you may be
> able to lazily move pages to the EReclaimable list in the same way
> unevictable pages are handled.


First of all, I am not considering lazy moving like the unevictable case.
We can move VOLATILE/NOREUSE pages onto the EReclaimable LRU list in the background using a workqueue.
Please tell me the scenario in which we would need lazy moving.

> 
>> It's very important in recent mobile systems because page reclaim/writeback is very critical
>> of application latency. Of course, it could affect normal desktop, too.
>> With it, we can calculate fast-available pages more exactly with NR_FREE_PAGES + NR_ERECLAIMABLE_PAGES,
>> for example. If it's below threshold we defined, we could trigger 1st level notification
>> if we really need prototying low memory notification.
>>
> 
> If PG_reclaim pages are on this list, then that calculation will not be
> helpful.


As I mentioned above, PG_reclaim pages would not be on the Ereclaimable LRU list while they are under writeback.

> 
>> We may change madvise(DONTNEED) implementation instead of zapping page immediately.
>> If memory pressure doesn't happen, pages are in memory so we can avoid so many minor fault.
>> Of course, we can discard instead of swap out if system memory pressure happens.
>> We might implement it madvise(VOLATILE) instead of DONTNEED, but anyway it's off-topic in this thread.
>>
>> As a another example, we can implement CFLRU(Clean-First LRU) which reclaims unmapped-clean cache page firstly.
> 
> That alters ageing of pages significantly. It means that workloads that
> are using read heavily will have their pages discarded first.
>
>> The rationale is that in non-rotation device, read/write cost is much asynchronous.
> 
> While this is true that does not justify throwing away unmapped clean
> page cache first every time.


That's true. That is the workload I am concerned about.
We need to balance unmapped and mapped pages, so sometimes some mapped pages would be moved into
the unevictable LRU list after unmapping all of their PTEs. I believe that could mitigate the problem,
though not perfectly, I admit. Maybe we need a knob for the admin to tune it.
Anyway, it's a big concern for me and something that needs careful regression testing.

> 
>> Read is very fast while write is very slow so it would be a gain while we can avoid writeback of dirty pages
>> if possible although we need several reads. It can be implemented easily with Ereclaimable pages, too.
>>
>> Anyway, it's just a brain-storming phase and never implemented yet but decide posting before it's too late.
>> I hope listen others opinion before get into the code.
>>
> 
> Care is needed. I think you'll only be able to use this list for
> NOREUSE, VOLATILE and invalidated pages. If you add PG_reclaim pages, the list
> will not be "easily-reclaimable", and if you add clean unmapped pages then there
> will be regressions in workloads that are read-intensive.
> 


Thanks for the feedback, Mel.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC:  Easy-Reclaimable LRU list
  2012-06-26  0:12     ` Minchan Kim
@ 2012-06-26  8:07       ` Glauber Costa
  -1 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2012-06-26  8:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, LKML, Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki,
	KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli,
	Andrew Morton, Anton Vorontsov, John Stultz, Pekka Enberg,
	Wu Fengguang, Hugh Dickins

On 06/26/2012 04:12 AM, Minchan Kim wrote:
> On 06/25/2012 05:46 PM, Glauber Costa wrote:
>
>> On 06/19/2012 09:49 AM, Minchan Kim wrote:
>>> Hi everybody!
>>>
>>> Recently, there are some efforts to handle system memory pressure.
>>>
>>> 1) low memory notification - [1]
>>> 2) fallocate(VOLATILE) - [2]
>>> 3) fadvise(NOREUSE) - [3]
>>>
>>> For them, I would like to add new LRU list, aka "Ereclaimable" which
>>> is opposite of "unevictable".
>>> Reclaimable LRU list includes _easy_ reclaimable pages.
>>> For example, easy reclaimable pages are following as.
>>>
>>> 1. invalidated but remained LRU list.
>>> 2. pageout pages for reclaim(PG_reclaim pages)
>>> 3. fadvise(NOREUSE)
>>> 4. fallocate(VOLATILE)
>>>
>>> Their pages shouldn't stir normal LRU list and compaction might not
>>> migrate them, even.
>> What about other things moving memory like CMA ?
>
>
> Sorry for not being able to understand your point.
> Can you elaborate a bit more?
>

Well, maybe I didn't =)
I was just wondering what exactly it is about compaction that troubles your
scheme, and whether such a restriction would also apply to memory movement
schemes like CMA.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC:  Easy-Reclaimable LRU list
  2012-06-25 10:24   ` Mel Gorman
@ 2012-07-17 16:03     ` Zheng Liu
  -1 siblings, 0 replies; 26+ messages in thread
From: Zheng Liu @ 2012-07-17 16:03 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Minchan Kim, linux-mm, LKML, Rik van Riel, KAMEZAWA Hiroyuki,
	KOSAKI Motohiro, Johannes Weiner, Andrea Arcangeli,
	Andrew Morton, Anton Vorontsov, John Stultz, Pekka Enberg,
	Wu Fengguang, Hugh Dickins

On Mon, Jun 25, 2012 at 11:24:35AM +0100, Mel Gorman wrote:
> On Tue, Jun 19, 2012 at 02:49:01PM +0900, Minchan Kim wrote:
> > Hi everybody!
> > 
> > Recently, there are some efforts to handle system memory pressure.
> > 
> > 1) low memory notification - [1]
> > 2) fallocate(VOLATILE) - [2]
> > 3) fadvise(NOREUSE) - [3]
> > 
> > For them, I would like to add new LRU list, aka "Ereclaimable" which is opposite of "unevictable".
> > Reclaimable LRU list includes _easy_ reclaimable pages.
> > For example, easy reclaimable pages are following as. 
> > 
> > 1. invalidated but remained LRU list.
> > 2. pageout pages for reclaim(PG_reclaim pages)
> > 3. fadvise(NOREUSE)
> > 4. fallocate(VOLATILE)
> > 
> > Their pages shouldn't stir normal LRU list and compaction might not migrate them, even.
> 
> Why would compaction not migrate them? We might still want to migrate
> NOREUSE or VOLATILE pages.
> 
> > Reclaimer can reclaim Ereclaimable pages before normal lru list and will avoid unnecessary
> > swapout in anon pages in easy-reclaimable LRU list.
> > It also can make admin measure how many we have available pages at the moment without latency.
> 
> That's not true for PG_reclaim pages as those pages cannot be discarded
> until writeback completes.
> 
> One reason why I tried moving PG_reclaim pages to a separate list was
> to avoid excessive scanning when writing back to slow devices. If those
> pages were moved to an "easy-reclaimable" LRU list then the value would
> be reduced as scanning would still occur. It might make it worse because
> the whole Ereclaimable list would be scanned for pages that cannot be
> reclaimed at all before moving to another LRU list.
> 
> This separate list does not exist today because it required a page bit to
> implement and I did not want it to be a 64-bit only feature. You will
> probably hit the same problem.
> 
> The setting of the page bit is also going to be a problem but you may be
> able to lazily move pages to the EReclaimable list in the same way
> unevictable pages are handled.
> 
> > It's very important in recent mobile systems because page reclaim/writeback is very critical
> > of application latency. Of course, it could affect normal desktop, too.
> > With it, we can calculate fast-available pages more exactly with NR_FREE_PAGES + NR_ERECLAIMABLE_PAGES,
> > for example. If it's below threshold we defined, we could trigger 1st level notification
> > if we really need prototyping low memory notification.
> > 
> 
> If PG_reclaim pages are on this list, then that calculation will not be
> helpful.
> 
> > We may change madvise(DONTNEED) implementation instead of zapping page immediately.
> > If memory pressure doesn't happen, pages are in memory so we can avoid so many minor fault.
> > Of course, we can discard instead of swap out if system memory pressure happens.
> > We might implement it madvise(VOLATILE) instead of DONTNEED, but anyway it's off-topic in this thread.
> > 
> > As another example, we can implement CFLRU (Clean-First LRU), which reclaims unmapped clean cache pages first.
> 
> That alters ageing of pages significantly. It means that workloads that
> are using read heavily will have their pages discarded first.

Hi Mel,

Sorry, I only noticed this thread today.  The key issue is that we need
to balance page cache pages against mapped file pages.  AFAIK, in the
latest kernel, a page cache page gets a higher priority than a mapped
file page because it is easily activated and promoted to the active
list.  For example, when an application reads some data twice at the
same offset, mark_page_accessed will be called twice, and the page will
be activated.  However, when an application accesses a mapped file page
twice, the page stays on the inactive list and only its accessed bit is
set.  Only when we try to free pages will this page be given a chance to
stay on the inactive list.  That is unfair to mapped file pages.  In old
kernels, such as 2.6.18, a mapped file page is treated like an anonymous
page, which has a higher priority.  Meanwhile, most developers think
that there is no difference between page cache and mapped file pages.
So IMHO we need to reduce the priority of page cache, or at least we
need to count accesses to mapped file pages correctly.  As discussed in
this thread [1], we met this problem in our production system.

1. http://www.spinics.net/lists/linux-mm/msg34642.html
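
To make the asymmetry described above concrete, here is a minimal sketch
in Python.  It is a toy model, not kernel code: the Page class and the
read_access/mmap_access helpers are hypothetical names, and read_access
only mirrors the two-touch rule of mark_page_accessed as described in
this message (first touch sets the referenced flag, second touch
activates the page).

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy stand-in for struct page; only the bits relevant here."""
    referenced: bool = False  # models PG_referenced
    active: bool = False      # models membership of the active list
    pte_young: bool = False   # models the accessed bit in the PTE


def read_access(page: Page) -> None:
    """Model of a read() hit: the two-touch rule of mark_page_accessed."""
    if not page.active and page.referenced:
        # Second touch on an inactive page: promote it.
        page.active = True
        page.referenced = False
    elif not page.referenced:
        # First touch: only remember that the page was referenced.
        page.referenced = True


def mmap_access(page: Page) -> None:
    """Model of a load/store through a mapping: only the PTE bit is set,
    no matter how many times the page is touched."""
    page.pte_young = True


read_page, mapped_page = Page(), Page()
for _ in range(2):
    read_access(read_page)
    mmap_access(mapped_page)

print("read() page on active list:", read_page.active)
print("mmap page on active list:  ", mapped_page.active)
```

In this model the page touched twice via read() ends up active, while
the page touched twice through a mapping is still inactive with just one
accessed bit set, so reclaim cannot tell two accesses from one.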

Regards,
Zheng

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2012-07-17 15:55 UTC | newest]

Thread overview: 26+ messages
2012-06-19  5:49 RFC: Easy-Reclaimable LRU list Minchan Kim
2012-06-21 19:21 ` John Stultz
2012-06-22  6:57   ` Minchan Kim
2012-06-23  4:45     ` Kamezawa Hiroyuki
2012-06-23 15:53       ` Rik van Riel
2012-06-24 11:09         ` KOSAKI Motohiro
2012-06-25  0:15       ` Minchan Kim
2012-06-25  8:46 ` Glauber Costa
2012-06-26  0:12   ` Minchan Kim
2012-06-26  8:07     ` Glauber Costa
2012-06-25 10:24 ` Mel Gorman
2012-06-26  0:26   ` Minchan Kim
2012-07-17 16:03   ` Zheng Liu
