* thrashing on file pages
@ 2017-04-05  1:01 Luigi Semenzato
  2017-04-11 19:25 ` Luigi Semenzato
  2017-04-13  5:42 ` Minchan Kim
  0 siblings, 2 replies; 5+ messages in thread
From: Luigi Semenzato @ 2017-04-05  1:01 UTC
  To: Linux Memory Management List

Greetings MM community, and apologies for being out of touch.

We're running into a MM problem which we encountered in the early
versions of Chrome OS, about 7 years ago, which is that under certain
interactive loads we thrash on executable pages.

At the time, Mandeep Baines solved this problem by introducing a
min_filelist_kbytes parameter, which simply stops the scanning of the
file list whenever the number of pages in it is below that threshold.
This works surprisingly well for Chrome OS because the Chrome browser
has a known text size and is the only large user program.
Additionally we use Feedback-Directed Optimization to keep the hot
code together in the same pages.
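
To make the mechanism concrete, here is a minimal sketch of the threshold
check in plain C.  It is not the actual Chrome OS patch: the helper name
file_scan_blocked(), the page_size_kb parameter, and the standalone
(non-kernel) form are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from the Chrome OS patch.
 * Returns true when the file LRU holds fewer pages than the
 * min_filelist_kbytes threshold, i.e. when reclaim should stop
 * scanning the file list and leave those pages alone.
 */
static bool file_scan_blocked(unsigned long nr_file_pages,
                              unsigned long min_filelist_kbytes,
                              unsigned long page_size_kb)
{
    assert(page_size_kb > 0);   /* guard the division below */

    /* Convert the sysctl value from kilobytes to pages. */
    unsigned long min_pages = min_filelist_kbytes / page_size_kb;

    return nr_file_pages < min_pages;
}
```

With 4 KiB pages and a threshold of 50000 kB (12500 pages), a file list of
10000 pages is protected and one of 20000 pages is still scanned.  A
threshold of 0 disables the protection.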

But given that Chromebooks can run Android apps, the picture is
changing.  We can bump min_filelist_kbytes, but we no longer have an
upper bound for the working set of a workflow which cycles through
multiple Android apps.  Tab/app switching is more natural and
therefore more frequent on laptops than it is on phones, and it puts a
bigger strain on the MM.

I should mention that we manage memory also by OOM-killing Android
apps and discarding Chrome tabs before the system runs out of memory.
We also reassign kernel-OOM-kill priorities for the cases in which our
user-level killing code isn't quick enough.

In our attempts to avoid the thrashing, we played around with
swappiness.  Dmitry Torokhov (three desks down from mine) suggested
shifting the upper bound of 100 to 200, which makes sense because we
use zram to reclaim anonymous pages, and paging back from zram is a
lot faster than reading from SSD.  So I have played around with
swappiness up to 190 but I can still reproduce the thrashing.  I have
noticed this code in vmscan.c:

        if (!sc->priority && swappiness) {
                scan_balance = SCAN_EQUAL;
                goto out;
        }

which suggests that under heavy pressure, swappiness is ignored.  I
removed this code, but that didn't help either.  I am not fully
convinced that my experiments are fully repeatable (quite the
opposite), and there may be variations in the point at which thrashing
starts, but the bottom line is that it still starts.
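
For context on why an upper bound of 200 matters: in kernels of this era,
get_scan_count() derives the anon and file scan weights from swappiness as
anon = swappiness and file = 200 - swappiness, so 100 weighs both lists
equally and values near 200 leave almost no weight on the file list.  The
sketch below restates that arithmetic together with the priority-0 check
quoted above.  The struct and function names are hypothetical, accepting
values above 100 assumes the sysctl limit has been raised, and the kernel's
further scaling of these weights by recent reclaim statistics is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two scan weights; not a kernel type. */
struct scan_weights {
    int anon;   /* weight for scanning anonymous pages */
    int file;   /* weight for scanning file-backed pages */
};

/* Split the fixed budget of 200 between the anon and file lists. */
static struct scan_weights swappiness_to_weights(int swappiness)
{
    /* Assumption: the upper bound has been raised from 100 to 200. */
    assert(swappiness >= 0 && swappiness <= 200);

    struct scan_weights w = { .anon = swappiness, .file = 200 - swappiness };
    return w;
}

/*
 * Mirrors the quoted check: at reclaim priority 0 with nonzero
 * swappiness, both lists are scanned equally and the weights
 * computed above are not used.
 */
static bool swappiness_bypassed(int priority, int swappiness)
{
    return priority == 0 && swappiness != 0;
}
```

At swappiness 190 the file list keeps a weight of only 10 out of 200, yet
once the priority drops to 0 the check returns true and the split no longer
applies.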

Are we the only ones with this problem?  It's possible, since Android
by design can be aggressive in killing processes, and conversely
Chrome OS is popular in the low-end of the market, where devices with
2GB of RAM are still common, and memory exhaustion can be reached
pretty easily.  I noticed that vmscan.c has code which tries to
protect pages with the VM_EXEC flag from premature eviction, so the
problem might have been seen before in some form.

I'll be grateful for any suggestion, advice, or other information.  Thanks!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: thrashing on file pages
  2017-04-05  1:01 thrashing on file pages Luigi Semenzato
@ 2017-04-11 19:25 ` Luigi Semenzato
  2017-04-13  5:42 ` Minchan Kim
  1 sibling, 0 replies; 5+ messages in thread
From: Luigi Semenzato @ 2017-04-11 19:25 UTC
  To: Linux Memory Management List

Maybe this message was too long.  Quick summary:

Are we (Chrome OS) the only ones who experience thrashing from
excessive eviction of code pages?

Chrome OS added a mechanism (also called "the hacky patch")
https://codereview.chromium.org/4128001 which stops the scanning of
file lists below a fixed threshold (configurable with sysctl).  This
has worked very well.  Would it be worth upstreaming?  Are there
alternatives?

We have other ways of freeing up memory---specifically we close Chrome
tabs (and Android apps, now).  But, depending on allocation speed, we
may get behind with the freeing, and end up thrashing to the point
that even OOM kills are seriously delayed.

And furthermore: are we the only ones who would like to see the max
value for swappiness be raised from 100 to 200?  This seems reasonable
when the swap device is much faster than the file backing device.

These may not be issues on servers, where the load is carefully
controlled.  But they seem hard to avoid on consumer devices.

Your reply will help millions of people!  (Us too, but that's just a
side effect.)

Thanks :)







* Re: thrashing on file pages
  2017-04-05  1:01 thrashing on file pages Luigi Semenzato
  2017-04-11 19:25 ` Luigi Semenzato
@ 2017-04-13  5:42 ` Minchan Kim
  2017-04-21 18:15   ` Luigi Semenzato
  1 sibling, 1 reply; 5+ messages in thread
From: Minchan Kim @ 2017-04-13  5:42 UTC
  To: Luigi Semenzato
  Cc: Linux Memory Management List, timmurray, Johannes Weiner, vinmenon

Hi Luigi,

On Tue, Apr 04, 2017 at 06:01:50PM -0700, Luigi Semenzato wrote:
> Greetings MM community, and apologies for being out of touch.
> 
> We're running into a MM problem which we encountered in the early
> versions of Chrome OS, about 7 years ago, which is that under certain
> interactive loads we thrash on executable pages.
> 
> At the time, Mandeep Baines solved this problem by introducing a
> min_filelist_kbytes parameter, which simply stops the scanning of the
> file list whenever the number of pages in it is below that threshold.
> This works surprisingly well for Chrome OS because the Chrome browser
> has a known text size and is the only large user program.
> Additionally we use Feedback-Directed Optimization to keep the hot
> code together in the same pages.
> 
> But given that Chromebooks can run Android apps, the picture is
> changing.  We can bump min_filelist_kbytes, but we no longer have an
> upper bound for the working set of a workflow which cycles through
> multiple Android apps.  Tab/app switching is more natural and
> therefore more frequent on laptops than it is on phones, and it puts a
> bigger strain on the MM.
> 
> I should mention that we manage memory also by OOM-killing Android
> apps and discarding Chrome tabs before the system runs out of memory.
> We also reassign kernel-OOM-kill priorities for the cases in which our
> user-level killing code isn't quick enough.
> 
> In our attempts to avoid the thrashing, we played around with
> swappiness.  Dmitry Torokhov (three desks down from mine) suggested
> shifting the upper bound of 100 to 200, which makes sense because we

It does make sense, but see below.

> use zram to reclaim anonymous pages, and paging back from zram is a
> lot faster than reading from SSD.  So I have played around with
> swappiness up to 190 but I can still reproduce the thrashing.  I have
> noticed this code in vmscan.c:
> 
>         if (!sc->priority && swappiness) {
>                 scan_balance = SCAN_EQUAL;
>                 goto out;
>         }
> 
> which suggests that under heavy pressure, swappiness is ignored.  I
> removed this code, but that didn't help either.  I am not fully
> convinced that my experiments are fully repeatable (quite the
> opposite), and there may be variations in the point at which thrashing
> starts, but the bottom line is that it still starts.

If sc->priority is zero, it probably means the VM has already reclaimed
much of the working set. That might be one reason you cannot see a
difference.

I think the more likely culprit is the following:

get_scan_count:

        if (!inactive_file_is_low(lruvec) && lruvec_lru_size() >> sc->priority) {
                scan_balance = SCAN_FILE;
                goto out;
        }

This works together with
shrink_list:

        if (is_active_lru(lru))
                if (inactive_list_is_low(lru))
                        shrink_active_list(lru);

This means the VM prefers to reclaim file-backed pages over anonymous pages
until the condition below holds.

get_scan_count:

        if (global_reclaim(sc)) {
                if (zonefile + zonefree <= high_wmark_pages(zone))
                        scan_balance = SCAN_ANON;
        }

This means the VM will protect some amount of file-backed pages, but
the amount protected depends on the high watermark, which is derived from
min_free_kbytes. In recent kernels you can also control that size via
watermark_scale_factor, without changing min_free_kbytes. So you can mimic
min_filelist_kbytes that way, although the high watermark is capped (20%).
(795ae7a0de6b, mm: scale kswapd watermarks in proportion to memory)
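
A rough sketch of the arithmetic, assuming the watermark setup introduced
by that commit: the distance between watermarks is the larger of min/4 and
managed_pages * watermark_scale_factor / 10000, and the high watermark sits
two such steps above the min watermark.  With the factor at its maximum of
1000 this gives roughly 20% of memory.  The function name and the example
figures are hypothetical, and this is a userspace restatement, not kernel
code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical restatement of the per-zone high watermark, in pages.
 *   managed_pages: pages managed by the zone
 *   min_wmark:     the zone's min watermark (derived from min_free_kbytes)
 *   scale_factor:  vm.watermark_scale_factor, in units of 0.01%
 */
static uint64_t high_wmark_sketch(uint64_t managed_pages,
                                  uint64_t min_wmark,
                                  unsigned int scale_factor)
{
    /* The sysctl is assumed to accept values from 1 to 1000. */
    assert(scale_factor >= 1 && scale_factor <= 1000);

    /* Step between watermarks, proportional to zone size. */
    uint64_t step = managed_pages * scale_factor / 10000;

    /* Small systems keep at least a quarter of the min watermark. */
    uint64_t floor = min_wmark / 4;
    if (step < floor)
        step = floor;

    /* low = min + step, high = min + 2 * step */
    return min_wmark + 2 * step;
}
```

For a 2 GiB zone of 4 KiB pages (524288 pages) with a min watermark of 1024
pages, a factor of 1000 yields a high watermark of 105880 pages, about 20%
of the zone, while a factor of 10 yields 2072 pages.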

> 
> Are we the only ones with this problem?  It's possible, since Android

No, you're not alone.
http://lkml.kernel.org/r/20170317231636.142311-1-timmurray@google.com

Johannes is preparing some patches (aggressive anonymous page reclaim
+ thrashing detection).

https://lwn.net/Articles/690069/
https://marc.info/?l=linux-mm&m=148351203826308

I hope the discussion makes progress toward a solution.
Please join it if you are interested. :)

Thanks.


* Re: thrashing on file pages
  2017-04-13  5:42 ` Minchan Kim
@ 2017-04-21 18:15   ` Luigi Semenzato
  2017-04-24  7:05     ` Minchan Kim
  0 siblings, 1 reply; 5+ messages in thread
From: Luigi Semenzato @ 2017-04-21 18:15 UTC
  To: Minchan Kim
  Cc: Linux Memory Management List, Tim Murray, Johannes Weiner, vinmenon

Thank you very much Minchan.

I took a look at Johannes's proposal.  It all makes sense but I'd like
to point out one additional issue, which is partly a time scale issue.

In Chrome OS (and this potentially applies to Android) one common use
pattern is to do some work in one browser tab, then switch to another
tab and do some work there and so on (think of apps instead of tabs on
Android).  Thus there is a loose notion of a "working set of tabs".

For Chrome OS, it is important that the tab working set fit in memory
(RAM + swap).  If it does not, some tabs in the set get "discarded"
while using the others: i.e. the browser releases most of their
resources, including their JavaScript and DOM state.

Thus, swapping is *much* better than discarding, and usually faster.
Then it is quite all right for a renderer process (a process backing
one or more tabs) to make very little progress for some time, while it
pages in its code and data (mostly data in the case of Chrome OS).
The length of "some time" depends on the application, but in this case
(interactive application) could be as long as a small number of
seconds.

Thus there should be a way of nullifying any actions that may be taken
as a result of thrashing detection, because in these cases the
thrashing is expected and preferable to the alternatives.





* Re: thrashing on file pages
  2017-04-21 18:15   ` Luigi Semenzato
@ 2017-04-24  7:05     ` Minchan Kim
  0 siblings, 0 replies; 5+ messages in thread
From: Minchan Kim @ 2017-04-24  7:05 UTC
  To: Luigi Semenzato
  Cc: Linux Memory Management List, Tim Murray, Johannes Weiner, vinmenon

On Fri, Apr 21, 2017 at 11:15:11AM -0700, Luigi Semenzato wrote:
> Thank you very much Minchan.
> 
> I took a look at Johannes's proposal.  It all makes sense but I'd like
> to point out one additional issue, which is partly a time scale issue.
> 
> In Chrome OS (and this potentially applies to Android) one common use
> pattern is to do some work in one browser tab, then switch to another
> tab and do some work there and so on (think of apps instead of tabs on
> Android).  Thus there is a loose notion of a "working set of tabs".
> 
> For Chrome OS, it is important that the tab working set fit in memory
> (RAM + swap).  If it does not, some tabs in the set get "discarded"
> while using the others: i.e. the browser releases most of their
> resources, including their JavaScript and DOM state.
> 
> Thus, swapping is *much* better than discarding, and usually faster.
> Then it is quite all right for a renderer process (a process backing
> one or more tabs) to make very little progress for some time, while it
> pages in its code and data (mostly data in the case of Chrome OS).
> The length of "some time" depends on the application, but in this case
> (interactive application) could be as long as a small number of
> seconds.
> 
> Thus there should be a way of nullifying any actions that may be taken
> as a result of thrashing detection, because in these cases the
> thrashing is expected and preferable to the alternatives.

Once we are able to quantify memory pressure, it will be easier to
discriminate between levels of memory pressure on a relative scale, as
Johannes mentioned.

Building on that idea, we could implement Tim's "reclaiming priorities per
mem cgroup" on a more scientific basis, IMHO. With that, you could exempt
some groups from reclaim even though thrashing happens.

