linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: "Artem S. Tashkinov" <aros@gmx.com>,
	linux-kernel@vger.kernel.org, linux-mm <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Suren Baghdasaryan <surenb@google.com>
Subject: Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure
Date: Mon, 5 Aug 2019 15:31:19 +0200	[thread overview]
Message-ID: <20190805133119.GO7597@dhcp22.suse.cz> (raw)
In-Reply-To: <ce102f29-3adc-d0fd-41ee-e32c1bcd7e8d@suse.cz>

On Mon 05-08-19 14:13:16, Vlastimil Babka wrote:
> On 8/4/19 11:23 AM, Artem S. Tashkinov wrote:
> > Hello,
> > 
> > There's this bug which has been bugging many people for many years
> > already and which is reproducible in less than a few minutes under the
> > latest and greatest kernel, 5.2.6. All the kernel parameters are set to
> > defaults.
> > 
> > Steps to reproduce:
> > 
> > 1) Boot with mem=4G
> > 2) Disable swap to make everything faster (sudo swapoff -a)
> > 3) Launch a web browser, e.g. Chrome/Chromium or/and Firefox
> > 4) Start opening tabs in either of them and watch your free RAM decrease
> > 
> > Once you hit a situation when opening a new tab requires more RAM than
> > is currently available, the system will stall hard. You will barely  be
> > able to move the mouse pointer. Your disk LED will be flashing
> > incessantly (I'm not entirely sure why). You will not be able to run new
> > applications or close currently running ones.
> 
> > This little crisis may continue for minutes or even longer. I think
> > that's not how the system should behave in this situation. I believe
> > something must be done about that to avoid this stall.
> 
> Yeah that's a known problem, made worse SSD's in fact, as they are able
> to keep refaulting the last remaining file pages fast enough, so there
> is still apparent progress in reclaim and OOM doesn't kick in.
> 
> At this point, the likely solution will be probably based on pressure
> stall monitoring (PSI). I don't know how far we are from a built-in
> monitor with reasonable defaults for a desktop workload, so CCing
> relevant folks.

Another potential approach would be to consider the refault information
we have already for file backed pages. Once we start reclaiming only
workingset pages then we should be trashing, right? It cannot be as
precise as the cost model which can be defined around PSI but it might
give us at least a fallback measure.

This is a really just an idea for a primitive detection. Most likely
incorrect one but it shows an idea at least. It is completely untested
and might be completely broken so unless somebody is really brave and
doesn't run anything that would be missed then I do not recommend to run
it.

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 70394cabaf4e..7f30c78b4fbc 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -300,6 +300,7 @@ struct lruvec {
 	atomic_long_t			inactive_age;
 	/* Refaults at the time of last reclaim cycle */
 	unsigned long			refaults;
+	atomic_t			workingset_refaults;
 #ifdef CONFIG_MEMCG
 	struct pglist_data *pgdat;
 #endif
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4bfb5c4ac108..4401753c3912 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -311,6 +311,15 @@ void *workingset_eviction(struct page *page);
 void workingset_refault(struct page *page, void *shadow);
 void workingset_activation(struct page *page);
 
+bool lruvec_trashing(struct lruvec *lruvec)
+{
+	/*
+	 * One quarter of the inactive list is constantly refaulting.
+	 * This suggests that we are trashing.
+	 */
+	return 4 * atomic_read(&lruvec->workingset_refaults) > lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
+}
+
 /* Only track the nodes of mappings with shadow entries */
 void workingset_update_node(struct xa_node *node);
 #define mapping_set_update(xas, mapping) do {				\
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7889f583ced9..d198594af0cd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2381,6 +2381,11 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 						  denominator);
 			break;
 		case SCAN_FILE:
+			if (lruvec_trashing(lruvec)) {
+				size = 0;
+				scan = 0;
+				break;
+			}
 		case SCAN_ANON:
 			/* Scan one type exclusively */
 			if ((scan_balance == SCAN_FILE) != file) {
diff --git a/mm/workingset.c b/mm/workingset.c
index e0b4edcb88c8..ee4c45b27e34 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -309,17 +309,25 @@ void workingset_refault(struct page *page, void *shadow)
 	 * don't act on pages that couldn't stay resident even if all
 	 * the memory was available to the page cache.
 	 */
-	if (refault_distance > active_file)
+	if (refault_distance > active_file) {
+		atomic_set(&lruvec->workingset_refaults, 0);
 		goto out;
+	}
 
 	SetPageActive(page);
 	atomic_long_inc(&lruvec->inactive_age);
 	inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
+	atomic_inc(&lruvec->workingset_refaults);
 
 	/* Page was active prior to eviction */
 	if (workingset) {
 		SetPageWorkingset(page);
 		inc_lruvec_state(lruvec, WORKINGSET_RESTORE);
+		/*
+		 * Double the trashing numbers for the actual working set.
+		 * refaults
+		 */
+		atomic_inc(&lruvec->workingset_refaults);
 	}
 out:
 	rcu_read_unlock();
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2019-08-05 13:31 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-04  9:23 Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure Artem S. Tashkinov
2019-08-05 12:13 ` Vlastimil Babka
2019-08-05 13:31   ` Michal Hocko [this message]
2019-08-05 16:47     ` Suren Baghdasaryan
2019-08-05 18:55     ` Johannes Weiner
2019-08-06  9:29       ` Michal Hocko
2019-08-05 19:31   ` Johannes Weiner
2019-08-06  1:08     ` Suren Baghdasaryan
2019-08-06  9:36       ` Vlastimil Babka
2019-08-06 14:27         ` Johannes Weiner
2019-08-06 14:36           ` Michal Hocko
2019-08-06 16:27             ` Suren Baghdasaryan
2019-08-06 22:01               ` Johannes Weiner
2019-08-07  7:59                 ` Michal Hocko
2019-08-07 20:51                   ` Johannes Weiner
2019-08-07 21:01                     ` Andrew Morton
2019-08-07 21:34                       ` Johannes Weiner
2019-08-07 21:12                     ` Johannes Weiner
2019-08-08 11:48                     ` Michal Hocko
2019-08-08 15:10                       ` ndrw.xf
2019-08-08 16:32                         ` Michal Hocko
2019-08-08 17:57                           ` ndrw.xf
2019-08-08 18:59                             ` Michal Hocko
2019-08-08 21:59                               ` ndrw
2019-08-09  8:57                                 ` Michal Hocko
2019-08-09 10:09                                   ` ndrw
2019-08-09 10:50                                     ` Michal Hocko
2019-08-09 14:18                                       ` Pintu Agarwal
2019-08-10 12:34                                       ` ndrw
2019-08-12  8:24                                         ` Michal Hocko
2019-08-10 21:07                                   ` ndrw
2021-07-24 17:32                         ` Alexey Avramov
2019-08-08 14:47                     ` Vlastimil Babka
2019-08-08 17:27                       ` Johannes Weiner
2019-08-09 14:56                         ` Vlastimil Babka
2019-08-09 17:31                           ` Johannes Weiner
2019-08-13 13:47                             ` Vlastimil Babka
2019-08-06 21:43       ` James Courtier-Dutton
2019-08-06 19:00 ` Florian Weimer
2019-08-20  6:46 ` Daniel Drake
2019-08-21 21:42   ` James Courtier-Dutton
2019-08-29 12:29     ` Michal Hocko
2019-09-02 20:15     ` Pavel Machek
2019-08-23  1:54   ` ndrw
2019-08-23  2:14     ` Daniel Drake
     [not found] <20190805090514.5992-1-hdanton@sina.com>
2019-08-05 12:01 ` Artem S. Tashkinov
2019-08-06  8:57 Johannes Buchner
2019-08-06 19:43 Remi Gauvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190805133119.GO7597@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=aros@gmx.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).