* [RFC] Using "page credits" as a solution for common thrashing scenarios
From: Eyal Lotem @ 2009-11-17 20:52 UTC (permalink / raw)
  To: LKML

I apologize for not sending a patch, but I am not yet skilled enough with
Linux kernel hacking to stop with the cheap talk and show you the code.

I have encountered numerous times the following situation:

  * Small processes A, B, C are happily functioning on the system (e.g.
    bash, xterm).

  * A large misbehaving process D (e.g. firefox) is launched and, due to a
    bug or other issue, begins to allocate and write to an extremely large
    memory working set.

  * The entire operating system and all processes become highly
    unresponsive, sometimes to the point where I cannot even kill the
    problematic process.  SysRq keys still work, but those are usually too
    coarse a tool.  Sometimes the OOM killer will start firing at random.

My analysis of this:

  * I believe that process D in this scenario is essentially causing the
    kernel MM to evict all of the pages of the well-behaved A, B, C
    processes.

  * I think it is wrong for the kernel to evict the 15 pages of the bash,
    xterm, X server's working set, as an example, in order for a
    misbehaving process to have 1000015 instead of 1000000 pages in its
    working set. EVEN if that misbehaving process is accessing its working
    set far more aggressively.

Suggested solution:

 If my analysis isn't off, then I believe the following solution might
 mitigate the problem elegantly.

 1. Maintain a per-process Most-Recently-Used (MRU) page listing. This
    listing is allowed to be approximate (e.g. by using the page table's
    Accessed and Dirty bits).

 2. Assign a number of "page credits" to each process (a sort of memory
    "niceness" level), which the kernel automatically disperses over the
    MRU pages of that process.  When the MRU set changes, the "page
    credits" are *moved* from pages that drop out of the set to the newly
    added MRU entries.

 3. Each physical page accumulates all of the "page credits" of the various
    processes that have it in their MRU sets.  This allows shared pages to
    accumulate more credit when they are useful to more processes.

 4. Page eviction should still be global (not per-process) but should evict
    pages whose accumulated "page credits" count is lowest.

 5. Per-process "page credit" levels can be maintained by user-space (using
    setrlimit or a similar interface). It probably makes sense for fork()
    to split the "page credit" levels rather than duplicate them (which
    would spawn new page credits out of nowhere).  While having a good way
    to specify per-process "page credits" could improve things, I don't
    believe it has to be done accurately for this scheme to effectively
    prevent thrashing.  A rough user-space sketch of the accounting in
    points 1-4 follows this list.
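
Here is that rough sketch: a minimal user-space C model of the accounting in
points 1-4.  It is only an illustration -- the structure names, the credit
values and the tiny page/process tables are all made up, and none of this is
kernel code.  Each process spreads its credits evenly over its MRU pages, a
shared page sums the contributions of every process that maps it, and
reclaim simply picks the physical page with the lowest total.

/* Toy user-space model of the "page credits" accounting in points 1-4.
 * All names, sizes and the page/process tables below are made up for
 * illustration; this is not kernel code. */
#include <stdio.h>

#define NPAGES 8        /* physical pages in the toy machine */
#define NPROCS 2

struct task {
    long credits;       /* per-process budget (point 2) */
    int  mru[NPAGES];   /* approximate MRU set: 1 if the page is in it (point 1) */
};

int main(void)
{
    /* page 0 is shared by both processes; the rest are private */
    struct task tasks[NPROCS] = {
        { .credits = 1000000, .mru = { 1, 1, 1, 0, 0, 0, 0, 0 } },
        { .credits = 1000000, .mru = { 1, 0, 0, 1, 1, 1, 1, 1 } },
    };
    long page_credits[NPAGES] = { 0 };
    int i, p, victim;

    /* Disperse each process's credits evenly over its MRU pages (point 2),
     * accumulating the contributions on shared physical pages (point 3). */
    for (p = 0; p < NPROCS; p++) {
        int mru_size = 0;

        for (i = 0; i < NPAGES; i++)
            mru_size += tasks[p].mru[i];
        for (i = 0; i < NPAGES; i++)
            if (tasks[p].mru[i])
                page_credits[i] += tasks[p].credits / mru_size;
    }

    /* Global eviction picks the physical page with the lowest total (point 4). */
    victim = 0;
    for (i = 1; i < NPAGES; i++)
        if (page_credits[i] < page_credits[victim])
            victim = i;

    for (i = 0; i < NPAGES; i++)
        printf("page %d: %ld credits\n", i, page_credits[i]);
    printf("evict page %d first\n", victim);
    return 0;
}

Of course, the hard part in a real kernel would be maintaining these sums
incrementally and cheaply as MRU sets change, which this toy model glosses
over entirely.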

Rationale: The usefulness of a page also stems from how large a fraction of
  its owning process's working set it represents.  If a single page is the
  entire working set of a process, evicting that single page is more costly
  than evicting even 100 pages of another process whose working set is
  extremely large.

Highly simplified example of suggested solution (numbers are made up):

NOTE: To avoid clutter, the example does not include shared library memory
use and various other details (which I do believe this solution handles
elegantly).

 1. 3 bash processes are assigned 1 million credits each, and each of them
    uses 10 shared pages (e.g. mmap'd code) plus another 40 unique pages in
    its MRU working set.

    The kernel automatically assigns 20,000 credits (1M / 50) to each of
    the pages in each bash process.  The 10 shared physical pages will
    accumulate 60,000 credits each.  Each of the 120 (40 * 3) unique
    physical pages accumulates 20,000 credits.

 2. xterm and X processes are assigned 1 million credits each, and each of
    them uses 200 unique pages.

    The kernel automatically assigns 5,000 credits to each of the unique
    pages, so the physical pages accumulate 5,000 credits each.

 3. A firefox process is likewise assigned 1 million credits. It
    aggressively allocates and writes to as many pages as the kernel
    allows.  Instead of starting to evict pages of the above processes
    (bash's, xterm's, X's), which would hinder responsiveness to the user,
    the kernel will find the physical pages with the fewest "credits" to
    evict.

 4. Assuming firefox has already allocated 1 million pages before eviction
    is required, the eviction decision is faced with the following data
    (the short program after step 5 reproduces these numbers):

    * 10 physical pages shared by the bash processes, each have 60,000
      credits.

    * 120 physical pages (40 x 3 unique pages of the bash processes), each
      with 20,000 credits.

    * 400 physical pages (of xterm and X) with 5,000 credits each.

    * 1 million physical pages (of firefox) with 1 credit each.

 5. The kernel has a "no-brainer" choice here. Instead of effectively
    pausing all of the well-behaved processes in order to allocate a few
    more pages for firefox, it can and should make firefox pay for its own
    wastefulness.  It will evict firefox's old pages to make room for
    firefox's new pages.

    In effect, firefox's DoS attack here (on the physical page resource)
    will affect only firefox itself, and not the rest of the system.
    Firefox's DoS attack on disk I/O and other resources is still underway
    (perhaps requiring a different solution).
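
As promised above, the made-up numbers in the example can be reproduced
with a few lines of C -- nothing beyond the divisions already spelled out:

/* Reproduce the (made-up) credit counts from the example above. */
#include <stdio.h>

int main(void)
{
    long budget = 1000000;                  /* credits per process */

    long bash_per_page = budget / 50;       /* 10 shared + 40 unique pages */
    long bash_shared   = 3 * bash_per_page; /* 3 bash processes share 10 pages */
    long x_per_page    = budget / 200;      /* xterm and X, 200 unique pages each */
    long ff_per_page   = budget / 1000000;  /* firefox, 1 million pages */

    printf("shared bash pages:  %ld credits each\n", bash_shared);    /* 60000 */
    printf("unique bash pages:  %ld credits each\n", bash_per_page);  /* 20000 */
    printf("xterm/X pages:      %ld credits each\n", x_per_page);     /* 5000 */
    printf("firefox pages:      %ld credits each\n", ff_per_page);    /* 1 */
    return 0;
}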


* Re: [RFC] Using "page credits" as a solution for common thrashing scenarios
From: Rik van Riel @ 2009-11-17 21:08 UTC (permalink / raw)
  To: Eyal Lotem; +Cc: LKML

On 11/17/2009 03:52 PM, Eyal Lotem wrote:
> Highly simplified example of suggested solution (numbers are made up):
>
> NOTE: To avoid clutter, the example does not include shared library memory
> use and various other details (which I do believe this solution handles
> elegantly).
>
>    
The real problematic details are in the data structures.

How to organize the pages in-kernel?

How does the kernel find the pages with the lowest
score?  In the memory zone where memory needs
to be freed...

How can the kernel efficiently change the score of
tens of thousands (or even millions) of pages at once,
moving them to the right LRU list cheaply?

Making page replacement scale to systems with
hundreds of gigabytes of memory is a necessity, as
is evicting the right kind of page (filesystem cache
vs process page, etc).

Within those constraints, it becomes very hard to
implement a solution along the lines of your ideas.


* Re: [RFC] Using "page credits" as a solution for common thrashing scenarios
From: Eyal Lotem @ 2009-11-17 21:19 UTC (permalink / raw)
  To: Rik van Riel; +Cc: LKML

> The real problematic details are in the data structures.
>
> How to organize the pages in-kernel?
>
> How does the kernel find the pages with the lowest
> score?  In the memory zone where memory needs
> to be freed...
>
> How can the kernel efficiently change the score of
> tens of thousands (or even millions) of pages at once,
> moving them to the right LRU list cheaply?
>
> Making page replacement scale to systems with
> hundreds of gigabytes of memory is a necessity, as
> is evicting the right kind of page (filesystem cache
> vs process page, etc).
>
> Within those constraints, it becomes very hard to
> implement a solution along the lines of your ideas.

I agree an implementation that reasonably addresses these concerns is far
from trivial.  I wouldn't give up on finding a way to implement it (or a
good approximation of it) so soon.

Before diving into the exact implementation details, I was wondering if I'm
getting some basic fact wrong -- would this solution (if possible to
implement with reasonable performance) truly mitigate this kind of
thrashing?

Eyal


* Re: [RFC] Using "page credits" as a solution for common thrashing scenarios
From: Andi Kleen @ 2009-11-24 21:15 UTC (permalink / raw)
  To: Eyal Lotem; +Cc: LKML

Eyal Lotem <eyal.lotem@gmail.com> writes:

Replying to an old email.

>   * I think it is wrong for the kernel to evict the 15 pages of the bash,
>     xterm, X server's working set, as an example, in order for a
>     misbehaving process to have 1000015 instead of 1000000 pages in its
>     working set. EVEN if that misbehaving process is accessing its working
>     set far more aggressively.

One problem in practice tends to be that it's hard to reliably detect
that a process is misbehaving. The 1000000 page process might be your
critical database, while the 15 page process is something very
unimportant.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [RFC] Using "page credits" as a solution for common thrashing scenarios
From: Chris Friesen @ 2009-11-24 22:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Eyal Lotem, LKML

On 11/24/2009 03:15 PM, Andi Kleen wrote:
> Eyal Lotem <eyal.lotem@gmail.com> writes:
> 
> Replying to an old email.
> 
>>   * I think it is wrong for the kernel to evict the 15 pages of the bash,
>>     xterm, X server's working set, as an example, in order for a
>>     misbehaving process to have 1000015 instead of 1000000 pages in its
>>     working set. EVEN if that misbehaving process is accessing its working
>>     set far more aggressively.
> 
> One problem in practice tends to be that it's hard to reliably detect
> that a process is misbehaving. The 1000000 page process might be your
> critical database, while the 15 page process is something very
> unimportant.

Quite a while ago now I proposed the ability for an app (with suitable
privileges) to register with the system the amount of memory it expected
to use.  As long as it was under that amount it would be immune to the
oom-killer.

We've been using this in production for some time now, as we have several
apps with very large memory footprints that otherwise become quite
attractive to the oom killer.

The oom_adj proc entry made this less of an issue so I never bothered
pushing it to mainline.
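
For reference, the mainline knob I mean is the per-process
/proc/<pid>/oom_adj file; writing -17 (OOM_DISABLE) exempts a task from the
oom killer.  A minimal sketch, assuming the process marks itself and runs
with CAP_SYS_RESOURCE (lowering the value is a privileged operation):

/* Exempt the calling process from the OOM killer via /proc/self/oom_adj.
 * -17 is OOM_DISABLE; lowering the value requires CAP_SYS_RESOURCE. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/self/oom_adj", "w");

    if (!f) {
        perror("fopen /proc/self/oom_adj");
        return 1;
    }
    fprintf(f, "-17\n");
    fclose(f);
    return 0;
}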

Chris


* Re: [RFC] Using "page credits" as a solution for common thrashing scenarios
From: Eyal Lotem @ 2010-06-08  9:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: LKML

Replying to a very old email :-)

On Wed, Nov 25, 2009 at 12:15 AM, Andi Kleen <andi@firstfloor.org> wrote:
> Eyal Lotem <eyal.lotem@gmail.com> writes:
>
> Replying to an old email.
>
>>   * I think it is wrong for the kernel to evict the 15 pages of the bash,
>>     xterm, X server's working set, as an example, in order for a
>>     misbehaving process to have 1000015 instead of 1000000 pages in its
>>     working set. EVEN if that misbehaving process is accessing its working
>>     set far more aggressively.
>
> One problem in practice tends to be that it's hard to reliably detect
> that a process is misbehaving. The 1000000 page process might be your
> critical database, while the 15 page process is something very
> unimportant.

Well, this solution doesn't really depend on any detection of
"misbehaving"; it simply goes about defining page importance more
accurately.  A simple answer to the problem you describe is to assign far
more "credits" to the database than to the 15-page process.

Eyal

>
> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only.
>

