From: Johannes Weiner <hannes@cmpxchg.org>
To: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>, Suren Baghdasaryan <surenb@google.com>,
	Vinayak Menon <vinmenon@codeaurora.org>,
	Christopher Lameter <cl@linux.com>,
	Mike Galbraith <efault@gmx.de>,
	Shakeel Butt <shakeelb@google.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [RFC PATCH 10/10] psi: aggregate ongoing stall events when somebody reads pressure
Date: Thu, 12 Jul 2018 13:29:42 -0400
Message-ID: <20180712172942.10094-11-hannes@cmpxchg.org>
In-Reply-To: <20180712172942.10094-1-hannes@cmpxchg.org>

Right now, psi reports pressure and stall times of already concluded
stall events. For most use cases this is current enough, but certain
highly latency-sensitive applications, like the Android OOM killer,
might want to know about and react to stall states before they have
even concluded (e.g. a prolonged reclaim cycle).

This patch changes the procfs/cgroupfs interface such that when the
pressure metrics are read, the current per-cpu states, if any, are
taken into account as well.

Any ongoing states are concluded, their time snapshotted, and then
restarted. This requires holding the rq lock of each CPU to avoid
corrupting the per-cpu counters. The read path could use some form of
rq lock ratelimiting or avoidance.
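
As a rough illustration of the conclude/snapshot/restart step described
above, here is a minimal userspace sketch. It is not the kernel code:
struct stall_clock and the stall_*() helpers are hypothetical names
made up for this example, timestamps are plain nanosecond counters, and
the caller is assumed to hold whatever lock serializes readers against
state changes (the rq lock in this patch). Unlike psi_update_stats(),
the sketch keeps a running total instead of flushing the bucket after
each read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one per-cpu stall state; not the kernel's struct. */
struct stall_clock {
	bool     active;  /* is a stall interval currently open? */
	uint64_t start;   /* timestamp (ns) at which the open interval began */
	uint64_t total;   /* stall time (ns) accumulated so far */
};

/* A stall state begins: open an interval at 'now'. */
static void stall_begin(struct stall_clock *c, uint64_t now)
{
	c->active = true;
	c->start = now;
}

/* The stall state concludes: account the open interval and close it. */
static void stall_end(struct stall_clock *c, uint64_t now)
{
	assert(c->active && now >= c->start);
	c->total += now - c->start;
	c->active = false;
}

/*
 * Reader side: fold the elapsed part of an open interval into the
 * total, then restart the interval at 'now' so that the same time is
 * not counted a second time when the stall eventually ends.
 */
static uint64_t stall_snapshot(struct stall_clock *c, uint64_t now)
{
	if (c->active) {
		assert(now >= c->start);
		c->total += now - c->start;
		c->start = now;
	}
	return c->total;
}
```

With this shape, a reader that polls in the middle of a long stall sees
the time accrued so far, and the later stall_end() only adds what
elapsed since the last snapshot.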

Requested-by: Suren Baghdasaryan <surenb@google.com>
Not-yet-signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 kernel/sched/psi.c | 56 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 46 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 53e0b7b83e2e..5a6c6057f775 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -190,7 +190,7 @@ static void calc_avgs(unsigned long avg[3], u64 time, int missed_periods)
 	}
 }
 
-static bool psi_update_stats(struct psi_group *group)
+static bool psi_update_stats(struct psi_group *group, bool ondemand)
 {
 	u64 some[NR_PSI_RESOURCES] = { 0, };
 	u64 full[NR_PSI_RESOURCES] = { 0, };
@@ -200,8 +200,6 @@ static bool psi_update_stats(struct psi_group *group)
 	int cpu;
 	int r;
 
-	mutex_lock(&group->stat_lock);
-
 	/*
 	 * Collect the per-cpu time buckets and average them into a
 	 * single time sample that is normalized to wallclock time.
@@ -218,10 +216,36 @@ static bool psi_update_stats(struct psi_group *group)
 	for_each_online_cpu(cpu) {
 		struct psi_group_cpu *groupc = per_cpu_ptr(group->cpus, cpu);
 		unsigned long nonidle;
+		struct rq_flags rf;
+		struct rq *rq;
+		u64 now;
 
-		if (!groupc->nonidle_time)
+		if (!groupc->nonidle_time && !groupc->nonidle)
 			continue;
 
+		/*
+		 * We come here for two things: 1) periodic per-cpu
+		 * bucket flushing and averaging and 2) when the user
+		 * wants to read a pressure file. For flushing and
+		 * averaging, which is relatively infrequent, we can
+		 * be lazy and tolerate some raciness with concurrent
+		 * updates to the per-cpu counters. However, if a user
+		 * polls the pressure state, we want to give them the
+		 * most up-to-date information we have, including any
+		 * currently active state which hasn't been timed yet,
+		 * because in case of an iowait or a reclaim run, that
+		 * can be significant.
+		 */
+		if (ondemand) {
+			rq = cpu_rq(cpu);
+			rq_lock_irq(rq, &rf);
+
+			now = cpu_clock(cpu);
+
+			groupc->nonidle_time += now - groupc->nonidle_start;
+			groupc->nonidle_start = now;
+		}
+
 		nonidle = nsecs_to_jiffies(groupc->nonidle_time);
 		groupc->nonidle_time = 0;
 		nonidle_total += nonidle;
@@ -229,13 +253,27 @@ static bool psi_update_stats(struct psi_group *group)
 		for (r = 0; r < NR_PSI_RESOURCES; r++) {
 			struct psi_resource *res = &groupc->res[r];
 
+			if (ondemand && res->state != PSI_NONE) {
+				bool is_full = res->state == PSI_FULL;
+
+				res->times[is_full] += now - res->state_start;
+				res->state_start = now;
+			}
+
 			some[r] += (res->times[0] + res->times[1]) * nonidle;
 			full[r] += res->times[1] * nonidle;
 
-			/* It's racy, but we can tolerate some error */
 			res->times[0] = 0;
 			res->times[1] = 0;
 		}
+
+		if (ondemand)
+			rq_unlock_irq(rq, &rf);
+	}
+
+	for (r = 0; r < NR_PSI_RESOURCES; r++) {
+		do_div(some[r], max(nonidle_total, 1UL));
+		do_div(full[r], max(nonidle_total, 1UL));
 	}
 
 	/*
@@ -249,12 +287,10 @@ static bool psi_update_stats(struct psi_group *group)
 	 * activity, thus no data, and clock ticks are sporadic. The
 	 * below handles both.
 	 */
+	mutex_lock(&group->stat_lock);
 
 	/* total= */
 	for (r = 0; r < NR_PSI_RESOURCES; r++) {
-		do_div(some[r], max(nonidle_total, 1UL));
-		do_div(full[r], max(nonidle_total, 1UL));
-
 		group->some[r] += some[r];
 		group->full[r] += full[r];
 	}
@@ -301,7 +337,7 @@ static void psi_clock(struct work_struct *work)
 	 * go - see calc_avgs() and missed_periods.
 	 */
 
-	nonidle = psi_update_stats(group);
+	nonidle = psi_update_stats(group, false);
 
 	if (nonidle) {
 		unsigned long delay = 0;
@@ -570,7 +606,7 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 	if (psi_disabled)
 		return -EOPNOTSUPP;
 
-	psi_update_stats(group);
+	psi_update_stats(group, true);
 
 	for (w = 0; w < 3; w++) {
 		avg[0][w] = group->avg_some[res][w];
-- 
2.18.0


