linux-kernel.vger.kernel.org archive mirror
* mapped pagecache pages vs unmapped pages
@ 2012-04-06  0:29 Alexey Ivanov
  2012-04-06  7:16 ` gnehzuil.lzheng
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Ivanov @ 2012-04-06  0:29 UTC (permalink / raw)
  To: linux-kernel

We are in the process of migrating from FreeBSD to Linux and have found some strange behavior: periodically running tasks (like rsync/p2p deployment) evict mapped pages from memory.

From my brief research, I found the following lkml thread:
https://lkml.org/lkml/2008/6/11/278
And, more precisely, this commit: https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4f98a2fee8acdb4ac84545df98cccecfd130f8db
which, along with splitting the LRU into "anon" and "file" lists, removed support for reclaim_mapped.

Is there a knob to prioritize mapped memory over unmapped memory (without modifying all apps to use O_DIRECT/fadvise/madvise, or mlocking our data in memory), or at least some way to change the proportion of Active(file)/Inactive(file)?
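
For reference, the per-application route mentioned above looks roughly like the sketch below: a bulk copy that asks the kernel to drop the source pages it has just read, so that a one-off streaming job is less likely to push other file pages out. This is only an illustration. `copy_without_caching` and the 1 MiB chunk size are made up for the example, only the source side is handled, and `POSIX_FADV_DONTNEED` is advisory, so the kernel is free to ignore it.

```python
import os

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def copy_without_caching(src_path, dst_path, chunk=CHUNK):
    """Copy src to dst, hinting that the source pages will not be reused."""
    copied = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                buf = os.read(src, chunk)
                if not buf:
                    break
                # os.write() may write less than requested, so loop.
                view = memoryview(buf)
                while view:
                    written = os.write(dst, view)
                    view = view[written:]
                # Advisory hint: the range just read is not needed again.
                os.posix_fadvise(src, copied, len(buf), os.POSIX_FADV_DONTNEED)
                copied += len(buf)
        finally:
            os.close(dst)
    finally:
        os.close(src)
    return copied
```

Every tool that streams large files would need a change of this kind, which is exactly the cost the question is trying to avoid.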

-- 
Sincerely,
Alexey Ivanov


* Re: mapped pagecache pages vs unmapped pages
  2012-04-06  0:29 mapped pagecache pages vs unmapped pages Alexey Ivanov
@ 2012-04-06  7:16 ` gnehzuil.lzheng
  2012-04-09 17:11   ` Alexey Ivanov
  0 siblings, 1 reply; 9+ messages in thread
From: gnehzuil.lzheng @ 2012-04-06  7:16 UTC (permalink / raw)
  To: Alexey Ivanov; +Cc: linux-kernel, linux-mm

On 04/06/2012 08:29 AM, Alexey Ivanov wrote:

> We are in the process of migrating from FreeBSD to Linux and have found some strange behavior: periodically running tasks (like rsync/p2p deployment) evict mapped pages from memory.
> 
> From my brief research, I found the following lkml thread:
> https://lkml.org/lkml/2008/6/11/278
> And, more precisely, this commit: https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4f98a2fee8acdb4ac84545df98cccecfd130f8db
> which, along with splitting the LRU into "anon" and "file" lists, removed support for reclaim_mapped.
> 
> Is there a knob to prioritize mapped memory over unmapped memory (without modifying all apps to use O_DIRECT/fadvise/madvise, or mlocking our data in memory), or at least some way to change the proportion of Active(file)/Inactive(file)?
> 


Hi Alexey,

Cc'ing the linux-mm mailing list.

I ran into a similar problem and sent a mail to discuss it.
Maybe it can help you
(http://marc.info/?l=linux-mm&m=132947026019538&w=2).

Konstantin has now sent a patch set that tries to expand vm_flags from 32
bits to 64 bits.  Then we could add a new flag to vm_flags and
prioritize mmapped pages via madvise(2).

Regards,
Zheng


* Re: mapped pagecache pages vs unmapped pages
  2012-04-06  7:16 ` gnehzuil.lzheng
@ 2012-04-09 17:11   ` Alexey Ivanov
  2012-04-09 18:14     ` Rik van Riel
  2012-04-09 18:17     ` Ying Han
  0 siblings, 2 replies; 9+ messages in thread
From: Alexey Ivanov @ 2012-04-09 17:11 UTC (permalink / raw)
  To: gnehzuil.lzheng; +Cc: linux-kernel, linux-mm

Thanks for the hint!

Can anyone clarify the reason for not using zone->inactive_ratio in inactive_file_is_low_global()?


-- 
Alexey Ivanov
Yandex Search Admin Team


* Re: mapped pagecache pages vs unmapped pages
  2012-04-09 17:11   ` Alexey Ivanov
@ 2012-04-09 18:14     ` Rik van Riel
  2012-04-09 23:50       ` Alexey Ivanov
  2012-04-09 18:17     ` Ying Han
  1 sibling, 1 reply; 9+ messages in thread
From: Rik van Riel @ 2012-04-09 18:14 UTC (permalink / raw)
  To: Alexey Ivanov; +Cc: gnehzuil.lzheng, linux-kernel, linux-mm

On 04/09/2012 01:11 PM, Alexey Ivanov wrote:
> Thanks for the hint!
>
> Can anyone clarify the reason for not using zone->inactive_ratio in inactive_file_is_low_global()?

New anonymous pages start out on the active anon list, and
are always referenced.  If memory fills up, they may end
up getting moved to the inactive anon list; being referenced
while on the inactive anon list is enough to get them promoted
back to the active list.

New file pages start out on the INACTIVE file list, and
start their lives not referenced at all. Because readahead
reads extra pages, many file pages may never be referenced.

Only file pages that are referenced twice make it onto
the active list.

This means the inactive file list has to be large enough
for all the readahead buffers, and give pages enough time
on the list that frequently accessed ones can get accessed
twice and promoted.
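
The effect of the inactive list size on promotion can be illustrated with a toy model. The sketch below is not kernel code: `FileLru` and `hot_page_promoted` are made-up names, the inactive list is a plain FIFO, and the access that brings a page in is counted as its first reference. Under those assumptions, a page touched twice is promoted only if fewer streaming pages than the inactive list can hold are read between the two touches.

```python
from collections import OrderedDict


class FileLru:
    """Toy model of the inactive/active file lists (not kernel code)."""

    def __init__(self, inactive_size):
        self.inactive_size = inactive_size
        self.inactive = OrderedDict()  # page -> reference count, oldest first
        self.active = set()

    def access(self, page):
        if page in self.active:
            return
        if page in self.inactive:
            # Second reference while still on the inactive list: promote.
            del self.inactive[page]
            self.active.add(page)
            return
        # First reference: the page enters the inactive list as the newest entry.
        self.inactive[page] = 1
        if len(self.inactive) > self.inactive_size:
            self.inactive.popitem(last=False)  # evict the oldest page


def hot_page_promoted(inactive_size, gap):
    """Touch "hot" twice with `gap` streaming pages read in between."""
    lru = FileLru(inactive_size)
    lru.access("hot")
    for i in range(gap):
        lru.access(("stream", i))
    lru.access("hot")
    return "hot" in lru.active
```

In this model `hot_page_promoted(100, 50)` is true and `hot_page_promoted(100, 200)` is false: with a short inactive list, a streaming job pushes the page out before its second access can promote it.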

http://linux-mm.org/PageReplacementDesign

-- 
All rights reversed


* Re: mapped pagecache pages vs unmapped pages
  2012-04-09 17:11   ` Alexey Ivanov
  2012-04-09 18:14     ` Rik van Riel
@ 2012-04-09 18:17     ` Ying Han
  1 sibling, 0 replies; 9+ messages in thread
From: Ying Han @ 2012-04-09 18:17 UTC (permalink / raw)
  To: Alexey Ivanov; +Cc: gnehzuil.lzheng, linux-kernel, linux-mm, Rik van Riel

On Mon, Apr 9, 2012 at 10:11 AM, Alexey Ivanov <rbtz@yandex-team.ru> wrote:
> Thanks for the hint!
>
> Can anyone clarify the reason for not using zone->inactive_ratio in inactive_file_is_low_global()?

Anonymous pages start out referenced on the active list, and scanning the
whole active list would only rotate those pages. So we would like to
limit the size of the inactive anon list to save scanning.
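
For comparison, here is a rough Python paraphrase of the two balance checks and of how zone->inactive_ratio is derived, following mm/vmscan.c and mm/page_alloc.c in kernels of this period. It is a sketch rather than the actual source, so treat the details as approximate: the 4 KiB page size and any page counts used with it are assumptions for illustration.

```python
import math

PAGE_SHIFT = 12  # assumes 4 KiB pages


def zone_inactive_ratio(present_pages):
    """Paraphrase of calculate_zone_inactive_ratio(): sqrt(10 * zone size in GB)."""
    gb = present_pages >> (30 - PAGE_SHIFT)
    return math.isqrt(10 * gb) if gb else 1


def inactive_anon_is_low(active, inactive, inactive_ratio):
    """Anon lists: the inactive list may be inactive_ratio times smaller."""
    return inactive * inactive_ratio < active


def inactive_file_is_low(active, inactive):
    """File lists: no ratio, the inactive list should not be the smaller one."""
    return active > inactive
```

With these definitions a 1 GB zone gets a ratio of 3, so the anon check tolerates an active list up to three times the size of the inactive one, whereas the file check asks for the inactive file list to be at least as large as the active one.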

--Ying




* Re: mapped pagecache pages vs unmapped pages
  2012-04-09 18:14     ` Rik van Riel
@ 2012-04-09 23:50       ` Alexey Ivanov
  2012-04-10  0:31         ` Minchan Kim
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Ivanov @ 2012-04-09 23:50 UTC (permalink / raw)
  To: Rik van Riel; +Cc: gnehzuil.lzheng, linux-kernel, linux-mm, yinghan

Did you consider making this ratio tunable, at least manually (e.g. via sysctl)?
I suppose we are not the only ones with a workload that mmaps almost all of RAM.


-- 
Alexey Ivanov
Yandex Search Admin Team


* Re: mapped pagecache pages vs unmapped pages
  2012-04-09 23:50       ` Alexey Ivanov
@ 2012-04-10  0:31         ` Minchan Kim
  2012-04-10  0:49           ` Rik van Riel
  0 siblings, 1 reply; 9+ messages in thread
From: Minchan Kim @ 2012-04-10  0:31 UTC (permalink / raw)
  To: Alexey Ivanov
  Cc: Rik van Riel, gnehzuil.lzheng, linux-kernel, linux-mm, yinghan

On 2012-04-10 8:50 AM, Alexey Ivanov wrote:

> Did you consider making this ratio tunable, at least manually (e.g. via sysctl)?
> I suppose we are not the only ones with a workload that mmaps almost all of RAM.


Personally, I don't think that is a good approach.
It depends on the kernel's internal implementation, which may change
in the future, as it did in 2.6.28.

In my opinion, the kernel should just make a best effort to keep the active
working set, apart from some critical pages such as code pages. If pages are
not part of the active working set but the user wants to keep them, we have
to add a new feature like fadvise/madvise(WORKING_SET) to give that hint to
the kernel. Although it requires changing legacy programs, it is not coupled
to the kernel's reclaim algorithm, and I think it's the way to go.




-- 
Kind regards,
Minchan Kim


* Re: mapped pagecache pages vs unmapped pages
  2012-04-10  0:31         ` Minchan Kim
@ 2012-04-10  0:49           ` Rik van Riel
  2012-04-10  1:25             ` Minchan Kim
  0 siblings, 1 reply; 9+ messages in thread
From: Rik van Riel @ 2012-04-10  0:49 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Alexey Ivanov, gnehzuil.lzheng, linux-kernel, linux-mm, yinghan

On 04/09/2012 08:31 PM, Minchan Kim wrote:
> On 2012-04-10 8:50 AM, Alexey Ivanov wrote:
>
>> Did you consider making this ratio tunable, at least manually (e.g. via sysctl)?
>> I suppose we are not the only ones with a workload that mmaps almost all of RAM.
>
> Personally, I don't think that is a good approach.
> It depends on the kernel's internal implementation, which may change
> in the future, as it did in 2.6.28.

I also believe that a tunable for this is not going to be
a very workable approach, for the simple reason that changing
the value does not make a predictable change in the effectiveness
of working set detection or protection.

> In my opinion, the kernel should just make a best effort to keep the active
> working set, apart from some critical pages such as code pages.

Johannes has some experimental code to measure refaults, and
calculate their distance in a multi-zone, multi-cgroup environment.

That would allow us to predictably place things in the working set
as required.
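
The thread does not show that code, so the following is only a toy model of what a refault distance could look like, under stated assumptions: a single LRU with one global eviction counter, the distance defined as the number of evictions that happened after a page was evicted, and a refault treated as a working set candidate when that distance does not exceed the size of the active list. `RefaultTracker` and `should_activate` are hypothetical names.

```python
class RefaultTracker:
    """Toy refault-distance bookkeeping for a single LRU (hypothetical)."""

    def __init__(self):
        self.evictions = 0  # running count of pages evicted so far
        self.shadow = {}    # page -> counter value recorded at its eviction

    def evict(self, page):
        self.evictions += 1
        self.shadow[page] = self.evictions

    def refault_distance(self, page):
        """Evictions since `page` was evicted, or None if it was never seen."""
        stamp = self.shadow.pop(page, None)
        if stamp is None:
            return None
        return self.evictions - stamp


def should_activate(distance, active_size):
    """Assumed policy: the page would have stayed resident had it been active."""
    return distance is not None and distance <= active_size
```

The point of such a measure is that placement no longer depends on a fixed ratio between the lists: a page that comes back quickly enough can go straight into the working set.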

-- 
All rights reversed


* Re: mapped pagecache pages vs unmapped pages
  2012-04-10  0:49           ` Rik van Riel
@ 2012-04-10  1:25             ` Minchan Kim
  0 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2012-04-10  1:25 UTC (permalink / raw)
  To: Rik van Riel, hannes
  Cc: Alexey Ivanov, gnehzuil.lzheng, linux-kernel, linux-mm, yinghan

On 2012-04-10 9:49 AM, Rik van Riel wrote:

> Johannes has some experimental code to measure refaults, and
> calculate their distance in a multi-zone, multi-cgroup environment.
> 
> That would allow us to predictably place things in the working set
> as required.
> 


Hannes, it would help many people if you posted your code. ;)


-- 
Kind regards,
Minchan Kim
