From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 55AFE9C
	for ; Fri, 29 Jul 2016 00:25:49 +0000 (UTC)
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id BC895236
	for ; Fri, 29 Jul 2016 00:25:48 +0000 (UTC)
Message-ID: <1469751945.13905.6.camel@redhat.com>
From: Rik van Riel
To: Johannes Weiner, ksummit-discuss@lists.linuxfoundation.org
Date: Thu, 28 Jul 2016 20:25:45 -0400
In-Reply-To: <20160728185523.GA16390@cmpxchg.org>
References: <20160725171142.GA26006@cmpxchg.org>
	<20160728185523.GA16390@cmpxchg.org>
Content-Type: multipart/signed; micalg="pgp-sha256";
	protocol="application/pgp-signature"; boundary="=-Xgud52SBzhin9JoXZ3CV"
Mime-Version: 1.0
Subject: Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re: Self nomination

--=-Xgud52SBzhin9JoXZ3CV
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, 2016-07-28 at 14:55 -0400, Johannes Weiner wrote:
> On Mon, Jul 25, 2016 at 01:11:42PM -0400, Johannes Weiner wrote:
> > Most recently I have been working on reviving swap for SSDs and
> > persistent memory devices (https://lwn.net/Articles/690079/) as part
> > of a bigger anti-thrashing effort to make the VM recover swiftly and
> > predictably from load spikes.
>
> A bit of context, in case we want to discuss this at KS:
>
> We frequently have machines hang and stop responding indefinitely
> after they experience memory load spikes. On closer look, we find most
> tasks either in page reclaim or majorfaulting parts of an executable
> or library. It's a typical thrashing pattern, where everybody
> cannibalizes everybody else.
> The problem is that with fast storage the cache reloads can be fast
> enough that there are never enough in-flight pages at a time to cause
> page reclaim to fail and trigger the OOM killer. The livelock persists
> until external remediation reboots the box or we get lucky and
> non-cache allocations eventually suck up the remaining page cache and
> trigger the OOM killer.
>
> To avoid hitting this situation, we currently have to keep a generous
> memory reserve for occasional spikes, which sucks for utilization the
> rest of the time. Swap would be useful here, but the swapout code is
> basically only triggering when memory pressure rises - which again
> doesn't happen - so I've been working on the swap code to balance
> cache reclaim vs. swap based on relative thrashing between the two.
>
> There is usually some cold/unused anonymous memory lying around that
> can be unloaded into swap during workload spikes, so that allows us to
> drive up the average memory utilization without increasing the risk at
> least. But if we screw up and there are not enough unused anon pages,
> we are back to thrashing - only now it involves swapping too.
>
> So how do we address this?
>
> A pathological thrashing situation is very obvious to any user, but
> it's not quite clear how to quantify it inside the kernel and have it
> trigger the OOM killer. It might be useful to talk about
> metrics. Could we quantify application progress? Could we quantify the
> amount of time a task or the system spends thrashing, and somehow
> express it as a percentage of overall execution time? Maybe something
> comparable to IO wait time, except tracking the time spent performing
> reclaim and waiting on IO that is refetching recently evicted pages?
>
> This question seems to go beyond the memory subsystem and potentially
> involve the scheduler and the block layer, so it might be a good tech
> topic for KS.

I would like to discuss this topic, as well.
This is a very fundamental issue that used to be hard coded in the
BSDs (in the 1980s & 1990s), but where hard coding is totally
inappropriate with today's memory sizes, and variation in I/O
subsystem speeds.

Solving this, even if only on the detection side, could make a real
difference in having systems survive load spikes.

--
All Rights Reversed.

--=-Xgud52SBzhin9JoXZ3CV
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAABCAAGBQJXmqKJAAoJEM553pKExN6DF4AH/ilMFBwpePbH6c9oS5EO7QhI
IyyihgYTM7NQASDCFWXF0jf67SbNNK7dQjPnv11ybw5TMKb79VfbN93MbwMljY6U
NuIXEoPNdFixc0g8LMYwr301JdooYtQJ424xejEvwCKvY1rNrqU9S2dtCJ8dk0nb
k7IqBIJPa6WYuKxsjx1c1QT4Xp+wMhA95G3pBD2FPI1hv4dusnh/gBE2GSNk0M38
KBSSsSuVvvsLjIoKJxdY6Y1jLfwSf2PW2IJh0v1L9R6qt30R/243bUTeMCqloozA
c40mbo541JQC19aOpmCAc2VMmQcvEK1wiqF0HMNP0nw/dg1Ui/4KPBXeKFMu1SY=
=hFp8
-----END PGP SIGNATURE-----

--=-Xgud52SBzhin9JoXZ3CV--