From: Rik van Riel <riel@redhat.com>
To: Johannes Weiner <hannes@cmpxchg.org>,
	ksummit-discuss@lists.linuxfoundation.org
Subject: Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re:  Self nomination
Date: Thu, 28 Jul 2016 20:25:45 -0400	[thread overview]
Message-ID: <1469751945.13905.6.camel@redhat.com> (raw)
In-Reply-To: <20160728185523.GA16390@cmpxchg.org>


On Thu, 2016-07-28 at 14:55 -0400, Johannes Weiner wrote:
> On Mon, Jul 25, 2016 at 01:11:42PM -0400, Johannes Weiner wrote:
> > Most recently I have been working on reviving swap for SSDs and
> > persistent memory devices (https://lwn.net/Articles/690079/) as
> > part of a bigger anti-thrashing effort to make the VM recover
> > swiftly and predictably from load spikes.
> 
> A bit of context, in case we want to discuss this at KS:
> 
> We frequently have machines hang and stop responding indefinitely
> after they experience memory load spikes. On closer look, we find
> most tasks either in page reclaim or majorfaulting parts of an
> executable or library. It's a typical thrashing pattern, where
> everybody cannibalizes everybody else. The problem is that with
> fast storage the cache reloads can be fast enough that there are
> never enough in-flight pages at a time to cause page reclaim to
> fail and trigger the OOM killer. The livelock persists until
> external remediation reboots the box or we get lucky and non-cache
> allocations eventually suck up the remaining page cache and trigger
> the OOM killer.
> 
> To avoid hitting this situation, we currently have to keep a generous
> memory reserve for occasional spikes, which sucks for utilization the
> rest of the time. Swap would be useful here, but the swapout code
> basically only triggers when memory pressure rises - which, again,
> doesn't happen - so I've been working on the swap code to balance
> cache reclaim vs. swap based on relative thrashing between the two.
> 
> There is usually some cold/unused anonymous memory lying around
> that can be unloaded into swap during workload spikes, so that
> allows us to drive up the average memory utilization without
> increasing the risk, at least. But if we screw up and there are not
> enough unused anon pages, we are back to thrashing - only now it
> involves swapping too.
> 
> So how do we address this?
> 
> A pathological thrashing situation is very obvious to any user, but
> it's not quite clear how to quantify it inside the kernel and have
> it trigger the OOM killer. It might be useful to talk about
> metrics. Could we quantify application progress? Could we quantify
> the amount of time a task or the system spends thrashing, and
> somehow express it as a percentage of overall execution time? Maybe
> something comparable to IO wait time, except tracking the time
> spent performing reclaim and waiting on IO that is refetching
> recently evicted pages?
> 
> This question seems to go beyond the memory subsystem and potentially
> involve the scheduler and the block layer, so it might be a good tech
> topic for KS.

I would like to discuss this topic, as well.

This is a very fundamental issue, one whose solution
used to be hard-coded in the BSDs (in the 1980s and
1990s), but hard coding is totally inappropriate with
today's memory sizes and the variation in I/O
subsystem speeds.

Solving this, even if only on the detection side, could
make a real difference in having systems survive load
spikes.
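
To make the metric idea concrete, here is a rough userspace sketch
of the kind of accounting Johannes describes - not kernel code, and
all names (thrash_account, thrash_stall_begin, and so on) are
hypothetical - charging the time a task spends stalled in reclaim or
on refault IO and reporting it as a share of wall-clock time, the
way iowait is reported:

/*
 * Hypothetical sketch, not kernel code: accumulate the time spent
 * stalled on reclaim or on refaults of recently evicted pages, and
 * report it as a percentage of wall-clock time over a window.
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

struct thrash_account {
	uint64_t stalled_ns;		/* total time spent stalled */
	uint64_t window_start_ns;	/* start of the observation window */
	uint64_t stall_start_ns;	/* nonzero while a stall is in progress */
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Hook where reclaim begins or a refault of a recently evicted page blocks. */
static void thrash_stall_begin(struct thrash_account *ta)
{
	ta->stall_start_ns = now_ns();
}

/* Hook where the stall ends; charge the elapsed time to the counter. */
static void thrash_stall_end(struct thrash_account *ta)
{
	ta->stalled_ns += now_ns() - ta->stall_start_ns;
	ta->stall_start_ns = 0;
}

/* Fraction of the window spent stalled, comparable to an iowait percentage. */
static double thrash_percent(const struct thrash_account *ta)
{
	uint64_t window = now_ns() - ta->window_start_ns;

	return window ? 100.0 * ta->stalled_ns / window : 0.0;
}

int main(void)
{
	struct thrash_account ta = { .window_start_ns = now_ns() };

	/* Pretend the task stalls on refault IO for ~200ms out of ~1s. */
	thrash_stall_begin(&ta);
	usleep(200 * 1000);
	thrash_stall_end(&ta);
	usleep(800 * 1000);

	printf("thrashing: %.1f%% of wall-clock time\n", thrash_percent(&ta));
	return 0;
}

In a real implementation the stall windows would presumably be
charged per task at the points where reclaim and refault waits
already happen, and could be aggregated system-wide the same way
per-CPU iowait is.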

-- 

All Rights Reversed.


Thread overview: 20+ messages
2016-07-25 17:11 [Ksummit-discuss] Self nomination Johannes Weiner
2016-07-25 18:15 ` Rik van Riel
2016-07-26 10:56   ` Jan Kara
2016-07-26 13:10     ` Vlastimil Babka
2016-07-28 18:55 ` [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was " Johannes Weiner
2016-07-28 21:41   ` James Bottomley
2016-08-01 15:46     ` Johannes Weiner
2016-08-01 16:06       ` James Bottomley
2016-08-01 16:11         ` Dave Hansen
2016-08-01 16:33           ` James Bottomley
2016-08-01 18:13             ` Rik van Riel
2016-08-01 19:51             ` Dave Hansen
2016-08-01 17:08           ` Johannes Weiner
2016-08-01 18:19             ` Johannes Weiner
2016-07-29  0:25   ` Rik van Riel [this message]
2016-07-29 11:07   ` Mel Gorman
2016-07-29 16:26     ` Luck, Tony
2016-08-01 15:17       ` Rik van Riel
2016-08-01 16:55     ` Johannes Weiner
2016-08-02  9:18   ` Jan Kara
