All of lore.kernel.org
 help / color / mirror / Atom feed
From: Viresh Kumar <viresh.kumar@linaro.org>
To: akpm@linux-foundation.org, hannes@cmpxchg.org, cl@linux.com
Cc: linaro-kernel@lists.linaro.org, linux-kernel@vger.kernel.org,
	vinmenon@codeaurora.org, shashim@codeaurora.org, mhocko@suse.cz,
	mgorman@suse.de, dave@stgolabs.net, koct9i@gmail.com,
	linux-mm@kvack.org, Viresh Kumar <viresh.kumar@linaro.org>
Subject: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
Date: Thu, 26 Mar 2015 11:09:01 +0530	[thread overview]
Message-ID: <359c926bc85cdf79650e39f2344c2083002545bb.1427347966.git.viresh.kumar@linaro.org> (raw)

A delayed work to schedule vmstat_shepherd() is queued at periodic intervals for
internal working of vmstat core. This work and its timer end up waking an idle
cpu sometimes, as this always stays on CPU0.

Because we re-queue the work from its handler, idle_cpu() returns false and so
the timer (used by delayed work) never migrates to any other CPU.

This may not be the desired behavior always as waking up an idle CPU to queue
work on few other CPUs isn't good from power-consumption point of view.

In order to avoid waking up an idle core, we can replace schedule_delayed_work()
with a normal work plus a separate timer. The timer handler will then queue the
work after re-arming the timer. If the CPU was idle before the timer fired,
idle_cpu() will mostly return true and the next timer shall be migrated to a
non-idle CPU.

But the timer core has a limitation, when the timer is re-armed from its
handler, timer core disables migration of that timer to other cores. Details of
that limitation are present in kernel/time/timer.c:__mod_timer() routine.

Another simple yet effective solution can be to keep two timers with same
handler and keep toggling between them, so that the above limitation doesn't
hold true anymore.

This patch replaces schedule_delayed_work() with schedule_work() plus two
timers. After this, it was seen that the timer and its do get migrated to other
non-idle CPUs, when the local cpu is idle.

Tested-by: Vinayak Menon <vinmenon@codeaurora.org>
Tested-by: Shiraz Hashim <shashim@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
This patch isn't sent to say its the best way forward, but to get a discussion
started on the same.

 mm/vmstat.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4f5cd974e11a..d45e4243a046 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1424,8 +1424,18 @@ static bool need_update(int cpu)
  * inactivity.
  */
 static void vmstat_shepherd(struct work_struct *w);
+static DECLARE_WORK(shepherd, vmstat_shepherd);
 
-static DECLARE_DELAYED_WORK(shepherd, vmstat_shepherd);
+/*
+ * Two timers are used here to avoid waking up an idle CPU. If a single timer is
+ * kept, then re-arming the timer from its handler doesn't let us migrate it.
+ */
+static struct timer_list shepherd_timer[2];
+#define toggle_timer() (shepherd_timer_index = !shepherd_timer_index,	\
+			&shepherd_timer[shepherd_timer_index])
+
+static void vmstat_shepherd_timer(unsigned long data);
+static int shepherd_timer_index;
 
 static void vmstat_shepherd(struct work_struct *w)
 {
@@ -1441,15 +1451,19 @@ static void vmstat_shepherd(struct work_struct *w)
 				&per_cpu(vmstat_work, cpu), 0);
 
 	put_online_cpus();
+}
 
-	schedule_delayed_work(&shepherd,
-		round_jiffies_relative(sysctl_stat_interval));
+static void vmstat_shepherd_timer(unsigned long data)
+{
+	mod_timer(toggle_timer(),
+		  jiffies + round_jiffies_relative(sysctl_stat_interval));
+	schedule_work(&shepherd);
 
 }
 
 static void __init start_shepherd_timer(void)
 {
-	int cpu;
+	int cpu, i = -1;
 
 	for_each_possible_cpu(cpu)
 		INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu),
@@ -1459,8 +1473,13 @@ static void __init start_shepherd_timer(void)
 		BUG();
 	cpumask_copy(cpu_stat_off, cpu_online_mask);
 
-	schedule_delayed_work(&shepherd,
-		round_jiffies_relative(sysctl_stat_interval));
+	while (++i < 2) {
+		init_timer(&shepherd_timer[i]);
+		shepherd_timer[i].function = vmstat_shepherd_timer;
+	};
+
+	mod_timer(toggle_timer(),
+		  jiffies + round_jiffies_relative(sysctl_stat_interval));
 }
 
 static void vmstat_cpu_dead(int node)
-- 
2.3.0.rc0.44.ga94655d


WARNING: multiple messages have this Message-ID (diff)
From: Viresh Kumar <viresh.kumar@linaro.org>
To: akpm@linux-foundation.org, hannes@cmpxchg.org, cl@linux.com
Cc: linaro-kernel@lists.linaro.org, linux-kernel@vger.kernel.org,
	vinmenon@codeaurora.org, shashim@codeaurora.org, mhocko@suse.cz,
	mgorman@suse.de, dave@stgolabs.net, koct9i@gmail.com,
	linux-mm@kvack.org, Viresh Kumar <viresh.kumar@linaro.org>
Subject: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
Date: Thu, 26 Mar 2015 11:09:01 +0530	[thread overview]
Message-ID: <359c926bc85cdf79650e39f2344c2083002545bb.1427347966.git.viresh.kumar@linaro.org> (raw)

A delayed work to schedule vmstat_shepherd() is queued at periodic intervals for
internal working of vmstat core. This work and its timer end up waking an idle
cpu sometimes, as this always stays on CPU0.

Because we re-queue the work from its handler, idle_cpu() returns false and so
the timer (used by delayed work) never migrates to any other CPU.

This may not be the desired behavior always as waking up an idle CPU to queue
work on few other CPUs isn't good from power-consumption point of view.

In order to avoid waking up an idle core, we can replace schedule_delayed_work()
with a normal work plus a separate timer. The timer handler will then queue the
work after re-arming the timer. If the CPU was idle before the timer fired,
idle_cpu() will mostly return true and the next timer shall be migrated to a
non-idle CPU.

But the timer core has a limitation, when the timer is re-armed from its
handler, timer core disables migration of that timer to other cores. Details of
that limitation are present in kernel/time/timer.c:__mod_timer() routine.

Another simple yet effective solution can be to keep two timers with same
handler and keep toggling between them, so that the above limitation doesn't
hold true anymore.

This patch replaces schedule_delayed_work() with schedule_work() plus two
timers. After this, it was seen that the timer and its do get migrated to other
non-idle CPUs, when the local cpu is idle.

Tested-by: Vinayak Menon <vinmenon@codeaurora.org>
Tested-by: Shiraz Hashim <shashim@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
This patch isn't sent to say its the best way forward, but to get a discussion
started on the same.

 mm/vmstat.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4f5cd974e11a..d45e4243a046 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1424,8 +1424,18 @@ static bool need_update(int cpu)
  * inactivity.
  */
 static void vmstat_shepherd(struct work_struct *w);
+static DECLARE_WORK(shepherd, vmstat_shepherd);
 
-static DECLARE_DELAYED_WORK(shepherd, vmstat_shepherd);
+/*
+ * Two timers are used here to avoid waking up an idle CPU. If a single timer is
+ * kept, then re-arming the timer from its handler doesn't let us migrate it.
+ */
+static struct timer_list shepherd_timer[2];
+#define toggle_timer() (shepherd_timer_index = !shepherd_timer_index,	\
+			&shepherd_timer[shepherd_timer_index])
+
+static void vmstat_shepherd_timer(unsigned long data);
+static int shepherd_timer_index;
 
 static void vmstat_shepherd(struct work_struct *w)
 {
@@ -1441,15 +1451,19 @@ static void vmstat_shepherd(struct work_struct *w)
 				&per_cpu(vmstat_work, cpu), 0);
 
 	put_online_cpus();
+}
 
-	schedule_delayed_work(&shepherd,
-		round_jiffies_relative(sysctl_stat_interval));
+static void vmstat_shepherd_timer(unsigned long data)
+{
+	mod_timer(toggle_timer(),
+		  jiffies + round_jiffies_relative(sysctl_stat_interval));
+	schedule_work(&shepherd);
 
 }
 
 static void __init start_shepherd_timer(void)
 {
-	int cpu;
+	int cpu, i = -1;
 
 	for_each_possible_cpu(cpu)
 		INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu),
@@ -1459,8 +1473,13 @@ static void __init start_shepherd_timer(void)
 		BUG();
 	cpumask_copy(cpu_stat_off, cpu_online_mask);
 
-	schedule_delayed_work(&shepherd,
-		round_jiffies_relative(sysctl_stat_interval));
+	while (++i < 2) {
+		init_timer(&shepherd_timer[i]);
+		shepherd_timer[i].function = vmstat_shepherd_timer;
+	};
+
+	mod_timer(toggle_timer(),
+		  jiffies + round_jiffies_relative(sysctl_stat_interval));
 }
 
 static void vmstat_cpu_dead(int node)
-- 
2.3.0.rc0.44.ga94655d

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

             reply	other threads:[~2015-03-26  5:39 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-26  5:39 Viresh Kumar [this message]
2015-03-26  5:39 ` [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work Viresh Kumar
2015-03-26 20:18 ` Andrew Morton
2015-03-26 20:18   ` Andrew Morton
2015-03-27  4:49   ` Viresh Kumar
2015-03-27  4:49     ` Viresh Kumar
2015-03-27  9:16     ` Peter Zijlstra
2015-03-27  9:16       ` Peter Zijlstra
2015-03-27  9:30       ` Peter Zijlstra
2015-03-27  9:30         ` Peter Zijlstra
2015-03-27 11:11         ` Christoph Lameter
2015-03-27 11:11           ` Christoph Lameter
2015-03-27 12:02           ` Peter Zijlstra
2015-03-27 12:02             ` Peter Zijlstra
2015-03-27 19:45             ` Christoph Lameter
2015-03-27 19:45               ` Christoph Lameter
2015-03-28  4:28             ` Viresh Kumar
2015-03-28  4:28               ` Viresh Kumar
2015-03-28 11:41               ` Peter Zijlstra
2015-03-28 11:41                 ` Peter Zijlstra
2015-03-28  4:18         ` Viresh Kumar
2015-03-28  4:18           ` Viresh Kumar
2015-03-28  9:53           ` Peter Zijlstra
2015-03-28  9:53             ` Peter Zijlstra
2015-03-28 11:57             ` viresh kumar
2015-03-28 11:57               ` viresh kumar
2015-03-28 12:04               ` Viresh Kumar
2015-03-28 12:04                 ` Viresh Kumar
2015-03-28 13:44               ` Peter Zijlstra
2015-03-28 13:44                 ` Peter Zijlstra
2015-03-29 10:24                 ` Peter Zijlstra
2015-03-29 10:24                   ` Peter Zijlstra
2015-03-30 12:02                   ` Viresh Kumar
2015-03-30 12:02                     ` Viresh Kumar
2015-03-30 12:47                     ` Peter Zijlstra
2015-03-30 12:47                       ` Peter Zijlstra
2015-03-30 13:14                       ` Viresh Kumar
2015-03-30 13:14                         ` Viresh Kumar
2015-03-30 13:59                         ` Peter Zijlstra
2015-03-30 13:59                           ` Peter Zijlstra
2015-03-30 16:17                           ` Viresh Kumar
2015-03-30 16:17                             ` Viresh Kumar
2015-03-30 16:25                             ` Peter Zijlstra
2015-03-30 16:25                               ` Peter Zijlstra
2015-03-29 12:01                 ` Viresh Kumar
2015-03-29 12:01                   ` Viresh Kumar
2015-03-29 17:24                   ` Peter Zijlstra
2015-03-29 17:24                     ` Peter Zijlstra
2015-03-30 15:08             ` Michal Hocko
2015-03-30 15:08               ` Michal Hocko
2015-03-30 15:14               ` Peter Zijlstra
2015-03-30 15:14                 ` Peter Zijlstra
2015-03-30 15:42               ` Christoph Lameter
2015-03-30 15:42                 ` Christoph Lameter
2015-03-27 14:19 ` Michal Hocko
2015-03-27 14:19   ` Michal Hocko
2015-03-28  4:34   ` Viresh Kumar
2015-03-28  4:34     ` Viresh Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=359c926bc85cdf79650e39f2344c2083002545bb.1427347966.git.viresh.kumar@linaro.org \
    --to=viresh.kumar@linaro.org \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=dave@stgolabs.net \
    --cc=hannes@cmpxchg.org \
    --cc=koct9i@gmail.com \
    --cc=linaro-kernel@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=shashim@codeaurora.org \
    --cc=vinmenon@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.