All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-26  5:39 ` Viresh Kumar
  0 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-26  5:39 UTC (permalink / raw)
  To: akpm, hannes, cl
  Cc: linaro-kernel, linux-kernel, vinmenon, shashim, mhocko, mgorman,
	dave, koct9i, linux-mm, Viresh Kumar

A delayed work to schedule vmstat_shepherd() is queued at periodic intervals for
internal working of vmstat core. This work and its timer end up waking an idle
cpu sometimes, as this always stays on CPU0.

Because we re-queue the work from its handler, idle_cpu() returns false and so
the timer (used by delayed work) never migrates to any other CPU.

This may not be the desired behavior always as waking up an idle CPU to queue
work on few other CPUs isn't good from power-consumption point of view.

In order to avoid waking up an idle core, we can replace schedule_delayed_work()
with a normal work plus a separate timer. The timer handler will then queue the
work after re-arming the timer. If the CPU was idle before the timer fired,
idle_cpu() will mostly return true and the next timer shall be migrated to a
non-idle CPU.

But the timer core has a limitation, when the timer is re-armed from its
handler, timer core disables migration of that timer to other cores. Details of
that limitation are present in kernel/time/timer.c:__mod_timer() routine.

Another simple yet effective solution can be to keep two timers with same
handler and keep toggling between them, so that the above limitation doesn't
hold true anymore.

This patch replaces schedule_delayed_work() with schedule_work() plus two
timers. After this, it was seen that the timer and its do get migrated to other
non-idle CPUs, when the local cpu is idle.

Tested-by: Vinayak Menon <vinmenon@codeaurora.org>
Tested-by: Shiraz Hashim <shashim@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
This patch isn't sent to say its the best way forward, but to get a discussion
started on the same.

 mm/vmstat.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4f5cd974e11a..d45e4243a046 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1424,8 +1424,18 @@ static bool need_update(int cpu)
  * inactivity.
  */
 static void vmstat_shepherd(struct work_struct *w);
+static DECLARE_WORK(shepherd, vmstat_shepherd);
 
-static DECLARE_DELAYED_WORK(shepherd, vmstat_shepherd);
+/*
+ * Two timers are used here to avoid waking up an idle CPU. If a single timer is
+ * kept, then re-arming the timer from its handler doesn't let us migrate it.
+ */
+static struct timer_list shepherd_timer[2];
+#define toggle_timer() (shepherd_timer_index = !shepherd_timer_index,	\
+			&shepherd_timer[shepherd_timer_index])
+
+static void vmstat_shepherd_timer(unsigned long data);
+static int shepherd_timer_index;
 
 static void vmstat_shepherd(struct work_struct *w)
 {
@@ -1441,15 +1451,19 @@ static void vmstat_shepherd(struct work_struct *w)
 				&per_cpu(vmstat_work, cpu), 0);
 
 	put_online_cpus();
+}
 
-	schedule_delayed_work(&shepherd,
-		round_jiffies_relative(sysctl_stat_interval));
+static void vmstat_shepherd_timer(unsigned long data)
+{
+	mod_timer(toggle_timer(),
+		  jiffies + round_jiffies_relative(sysctl_stat_interval));
+	schedule_work(&shepherd);
 
 }
 
 static void __init start_shepherd_timer(void)
 {
-	int cpu;
+	int cpu, i = -1;
 
 	for_each_possible_cpu(cpu)
 		INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu),
@@ -1459,8 +1473,13 @@ static void __init start_shepherd_timer(void)
 		BUG();
 	cpumask_copy(cpu_stat_off, cpu_online_mask);
 
-	schedule_delayed_work(&shepherd,
-		round_jiffies_relative(sysctl_stat_interval));
+	while (++i < 2) {
+		init_timer(&shepherd_timer[i]);
+		shepherd_timer[i].function = vmstat_shepherd_timer;
+	};
+
+	mod_timer(toggle_timer(),
+		  jiffies + round_jiffies_relative(sysctl_stat_interval));
 }
 
 static void vmstat_cpu_dead(int node)
-- 
2.3.0.rc0.44.ga94655d


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-26  5:39 ` Viresh Kumar
  0 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-26  5:39 UTC (permalink / raw)
  To: akpm, hannes, cl
  Cc: linaro-kernel, linux-kernel, vinmenon, shashim, mhocko, mgorman,
	dave, koct9i, linux-mm, Viresh Kumar

A delayed work to schedule vmstat_shepherd() is queued at periodic intervals for
internal working of vmstat core. This work and its timer end up waking an idle
cpu sometimes, as this always stays on CPU0.

Because we re-queue the work from its handler, idle_cpu() returns false and so
the timer (used by delayed work) never migrates to any other CPU.

This may not be the desired behavior always as waking up an idle CPU to queue
work on few other CPUs isn't good from power-consumption point of view.

In order to avoid waking up an idle core, we can replace schedule_delayed_work()
with a normal work plus a separate timer. The timer handler will then queue the
work after re-arming the timer. If the CPU was idle before the timer fired,
idle_cpu() will mostly return true and the next timer shall be migrated to a
non-idle CPU.

But the timer core has a limitation, when the timer is re-armed from its
handler, timer core disables migration of that timer to other cores. Details of
that limitation are present in kernel/time/timer.c:__mod_timer() routine.

Another simple yet effective solution can be to keep two timers with same
handler and keep toggling between them, so that the above limitation doesn't
hold true anymore.

This patch replaces schedule_delayed_work() with schedule_work() plus two
timers. After this, it was seen that the timer and its do get migrated to other
non-idle CPUs, when the local cpu is idle.

Tested-by: Vinayak Menon <vinmenon@codeaurora.org>
Tested-by: Shiraz Hashim <shashim@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
This patch isn't sent to say its the best way forward, but to get a discussion
started on the same.

 mm/vmstat.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4f5cd974e11a..d45e4243a046 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1424,8 +1424,18 @@ static bool need_update(int cpu)
  * inactivity.
  */
 static void vmstat_shepherd(struct work_struct *w);
+static DECLARE_WORK(shepherd, vmstat_shepherd);
 
-static DECLARE_DELAYED_WORK(shepherd, vmstat_shepherd);
+/*
+ * Two timers are used here to avoid waking up an idle CPU. If a single timer is
+ * kept, then re-arming the timer from its handler doesn't let us migrate it.
+ */
+static struct timer_list shepherd_timer[2];
+#define toggle_timer() (shepherd_timer_index = !shepherd_timer_index,	\
+			&shepherd_timer[shepherd_timer_index])
+
+static void vmstat_shepherd_timer(unsigned long data);
+static int shepherd_timer_index;
 
 static void vmstat_shepherd(struct work_struct *w)
 {
@@ -1441,15 +1451,19 @@ static void vmstat_shepherd(struct work_struct *w)
 				&per_cpu(vmstat_work, cpu), 0);
 
 	put_online_cpus();
+}
 
-	schedule_delayed_work(&shepherd,
-		round_jiffies_relative(sysctl_stat_interval));
+static void vmstat_shepherd_timer(unsigned long data)
+{
+	mod_timer(toggle_timer(),
+		  jiffies + round_jiffies_relative(sysctl_stat_interval));
+	schedule_work(&shepherd);
 
 }
 
 static void __init start_shepherd_timer(void)
 {
-	int cpu;
+	int cpu, i = -1;
 
 	for_each_possible_cpu(cpu)
 		INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu),
@@ -1459,8 +1473,13 @@ static void __init start_shepherd_timer(void)
 		BUG();
 	cpumask_copy(cpu_stat_off, cpu_online_mask);
 
-	schedule_delayed_work(&shepherd,
-		round_jiffies_relative(sysctl_stat_interval));
+	while (++i < 2) {
+		init_timer(&shepherd_timer[i]);
+		shepherd_timer[i].function = vmstat_shepherd_timer;
+	};
+
+	mod_timer(toggle_timer(),
+		  jiffies + round_jiffies_relative(sysctl_stat_interval));
 }
 
 static void vmstat_cpu_dead(int node)
-- 
2.3.0.rc0.44.ga94655d

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-26  5:39 ` Viresh Kumar
@ 2015-03-26 20:18   ` Andrew Morton
  -1 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2015-03-26 20:18 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: hannes, cl, linaro-kernel, linux-kernel, vinmenon, shashim,
	mhocko, mgorman, dave, koct9i, linux-mm, Suresh Siddha,
	Peter Zijlstra, Thomas Gleixner

On Thu, 26 Mar 2015 11:09:01 +0530 Viresh Kumar <viresh.kumar@linaro.org> wrote:

> A delayed work to schedule vmstat_shepherd() is queued at periodic intervals for
> internal working of vmstat core. This work and its timer end up waking an idle
> cpu sometimes, as this always stays on CPU0.
> 
> Because we re-queue the work from its handler, idle_cpu() returns false and so
> the timer (used by delayed work) never migrates to any other CPU.
> 
> This may not be the desired behavior always as waking up an idle CPU to queue
> work on few other CPUs isn't good from power-consumption point of view.
> 
> In order to avoid waking up an idle core, we can replace schedule_delayed_work()
> with a normal work plus a separate timer. The timer handler will then queue the
> work after re-arming the timer. If the CPU was idle before the timer fired,
> idle_cpu() will mostly return true and the next timer shall be migrated to a
> non-idle CPU.
> 
> But the timer core has a limitation, when the timer is re-armed from its
> handler, timer core disables migration of that timer to other cores. Details of
> that limitation are present in kernel/time/timer.c:__mod_timer() routine.
> 
> Another simple yet effective solution can be to keep two timers with same
> handler and keep toggling between them, so that the above limitation doesn't
> hold true anymore.
> 
> This patch replaces schedule_delayed_work() with schedule_work() plus two
> timers. After this, it was seen that the timer and its do get migrated to other
> non-idle CPUs, when the local cpu is idle.

Shouldn't this be viewed as a shortcoming of the core timer code? 

vmstat_shepherd() is merely rescheduling itself with
schedule_delayed_work().  That's a dead bog simple operation and if
it's producing suboptimal behaviour then we shouldn't be fixing it with
elaborate workarounds in the caller?


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-26 20:18   ` Andrew Morton
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2015-03-26 20:18 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: hannes, cl, linaro-kernel, linux-kernel, vinmenon, shashim,
	mhocko, mgorman, dave, koct9i, linux-mm, Suresh Siddha,
	Peter Zijlstra, Thomas Gleixner

On Thu, 26 Mar 2015 11:09:01 +0530 Viresh Kumar <viresh.kumar@linaro.org> wrote:

> A delayed work to schedule vmstat_shepherd() is queued at periodic intervals for
> internal working of vmstat core. This work and its timer end up waking an idle
> cpu sometimes, as this always stays on CPU0.
> 
> Because we re-queue the work from its handler, idle_cpu() returns false and so
> the timer (used by delayed work) never migrates to any other CPU.
> 
> This may not be the desired behavior always as waking up an idle CPU to queue
> work on few other CPUs isn't good from power-consumption point of view.
> 
> In order to avoid waking up an idle core, we can replace schedule_delayed_work()
> with a normal work plus a separate timer. The timer handler will then queue the
> work after re-arming the timer. If the CPU was idle before the timer fired,
> idle_cpu() will mostly return true and the next timer shall be migrated to a
> non-idle CPU.
> 
> But the timer core has a limitation, when the timer is re-armed from its
> handler, timer core disables migration of that timer to other cores. Details of
> that limitation are present in kernel/time/timer.c:__mod_timer() routine.
> 
> Another simple yet effective solution can be to keep two timers with same
> handler and keep toggling between them, so that the above limitation doesn't
> hold true anymore.
> 
> This patch replaces schedule_delayed_work() with schedule_work() plus two
> timers. After this, it was seen that the timer and its do get migrated to other
> non-idle CPUs, when the local cpu is idle.

Shouldn't this be viewed as a shortcoming of the core timer code? 

vmstat_shepherd() is merely rescheduling itself with
schedule_delayed_work().  That's a dead bog simple operation and if
it's producing suboptimal behaviour then we shouldn't be fixing it with
elaborate workarounds in the caller?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-26 20:18   ` Andrew Morton
@ 2015-03-27  4:49     ` Viresh Kumar
  -1 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-27  4:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hannes, Christoph Lameter, Linaro Kernel Mailman List,
	Linux Kernel Mailing List, vinmenon, shashim, Michal Hocko,
	mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Peter Zijlstra, Thomas Gleixner

On 27 March 2015 at 01:48, Andrew Morton <akpm@linux-foundation.org> wrote:
> Shouldn't this be viewed as a shortcoming of the core timer code?

Yeah, it is. Some (not so pretty) solutions were tried earlier to fix that, but
they are rejected for obviously reasons [1].

> vmstat_shepherd() is merely rescheduling itself with
> schedule_delayed_work().  That's a dead bog simple operation and if
> it's producing suboptimal behaviour then we shouldn't be fixing it with
> elaborate workarounds in the caller?

I understand that, and that's why I sent it as an RFC to get the discussion
started. Does anyone else have got another (acceptable) idea to get this
resolved ?

--
viresh

[1] http://lists.linaro.org/pipermail/linaro-kernel/2013-November/008866.html

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-27  4:49     ` Viresh Kumar
  0 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-27  4:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hannes, Christoph Lameter, Linaro Kernel Mailman List,
	Linux Kernel Mailing List, vinmenon, shashim, Michal Hocko,
	mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Peter Zijlstra, Thomas Gleixner

On 27 March 2015 at 01:48, Andrew Morton <akpm@linux-foundation.org> wrote:
> Shouldn't this be viewed as a shortcoming of the core timer code?

Yeah, it is. Some (not so pretty) solutions were tried earlier to fix that, but
they are rejected for obviously reasons [1].

> vmstat_shepherd() is merely rescheduling itself with
> schedule_delayed_work().  That's a dead bog simple operation and if
> it's producing suboptimal behaviour then we shouldn't be fixing it with
> elaborate workarounds in the caller?

I understand that, and that's why I sent it as an RFC to get the discussion
started. Does anyone else have got another (acceptable) idea to get this
resolved ?

--
viresh

[1] http://lists.linaro.org/pipermail/linaro-kernel/2013-November/008866.html

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-27  4:49     ` Viresh Kumar
@ 2015-03-27  9:16       ` Peter Zijlstra
  -1 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-27  9:16 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, hannes, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, mgorman, dave, koct9i,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Fri, Mar 27, 2015 at 10:19:54AM +0530, Viresh Kumar wrote:
> On 27 March 2015 at 01:48, Andrew Morton <akpm@linux-foundation.org> wrote:
> > Shouldn't this be viewed as a shortcoming of the core timer code?
> 
> Yeah, it is. Some (not so pretty) solutions were tried earlier to fix that, but
> they are rejected for obviously reasons [1].
> 
> > vmstat_shepherd() is merely rescheduling itself with
> > schedule_delayed_work().  That's a dead bog simple operation and if
> > it's producing suboptimal behaviour then we shouldn't be fixing it with
> > elaborate workarounds in the caller?
> 
> I understand that, and that's why I sent it as an RFC to get the discussion
> started. Does anyone else have got another (acceptable) idea to get this
> resolved ?

So the issue seems to be that we need base->running_timer in order to
tell if a callback is running, right?

We could align the base on 8 bytes to gain an extra bit in the pointer
and use that bit to indicate the running state. Then these sites can
spin on that bit while we can change the actual base pointer.

Since the timer->base pointer is locked through the base->lock and
hand-over is safe vs lock_timer_base, this should all work.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-27  9:16       ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-27  9:16 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, hannes, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, mgorman, dave, koct9i,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Fri, Mar 27, 2015 at 10:19:54AM +0530, Viresh Kumar wrote:
> On 27 March 2015 at 01:48, Andrew Morton <akpm@linux-foundation.org> wrote:
> > Shouldn't this be viewed as a shortcoming of the core timer code?
> 
> Yeah, it is. Some (not so pretty) solutions were tried earlier to fix that, but
> they are rejected for obviously reasons [1].
> 
> > vmstat_shepherd() is merely rescheduling itself with
> > schedule_delayed_work().  That's a dead bog simple operation and if
> > it's producing suboptimal behaviour then we shouldn't be fixing it with
> > elaborate workarounds in the caller?
> 
> I understand that, and that's why I sent it as an RFC to get the discussion
> started. Does anyone else have got another (acceptable) idea to get this
> resolved ?

So the issue seems to be that we need base->running_timer in order to
tell if a callback is running, right?

We could align the base on 8 bytes to gain an extra bit in the pointer
and use that bit to indicate the running state. Then these sites can
spin on that bit while we can change the actual base pointer.

Since the timer->base pointer is locked through the base->lock and
hand-over is safe vs lock_timer_base, this should all work.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-27  9:16       ` Peter Zijlstra
@ 2015-03-27  9:30         ` Peter Zijlstra
  -1 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-27  9:30 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, hannes, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, mgorman, dave, koct9i,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Fri, Mar 27, 2015 at 10:16:13AM +0100, Peter Zijlstra wrote:
> On Fri, Mar 27, 2015 at 10:19:54AM +0530, Viresh Kumar wrote:
> > On 27 March 2015 at 01:48, Andrew Morton <akpm@linux-foundation.org> wrote:
> > > Shouldn't this be viewed as a shortcoming of the core timer code?
> > 
> > Yeah, it is. Some (not so pretty) solutions were tried earlier to fix that, but
> > they are rejected for obviously reasons [1].
> > 
> > > vmstat_shepherd() is merely rescheduling itself with
> > > schedule_delayed_work().  That's a dead bog simple operation and if
> > > it's producing suboptimal behaviour then we shouldn't be fixing it with
> > > elaborate workarounds in the caller?
> > 
> > I understand that, and that's why I sent it as an RFC to get the discussion
> > started. Does anyone else have got another (acceptable) idea to get this
> > resolved ?
> 
> So the issue seems to be that we need base->running_timer in order to
> tell if a callback is running, right?
> 
> We could align the base on 8 bytes to gain an extra bit in the pointer
> and use that bit to indicate the running state. Then these sites can
> spin on that bit while we can change the actual base pointer.

Even though tvec_base has ____cacheline_aligned stuck on, most are
allocated using kzalloc_node() which does not actually respect that but
already guarantees a minimum u64 alignment, so I think we can use that
third bit without too much magic.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-27  9:30         ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-27  9:30 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, hannes, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, mgorman, dave, koct9i,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Fri, Mar 27, 2015 at 10:16:13AM +0100, Peter Zijlstra wrote:
> On Fri, Mar 27, 2015 at 10:19:54AM +0530, Viresh Kumar wrote:
> > On 27 March 2015 at 01:48, Andrew Morton <akpm@linux-foundation.org> wrote:
> > > Shouldn't this be viewed as a shortcoming of the core timer code?
> > 
> > Yeah, it is. Some (not so pretty) solutions were tried earlier to fix that, but
> > they are rejected for obviously reasons [1].
> > 
> > > vmstat_shepherd() is merely rescheduling itself with
> > > schedule_delayed_work().  That's a dead bog simple operation and if
> > > it's producing suboptimal behaviour then we shouldn't be fixing it with
> > > elaborate workarounds in the caller?
> > 
> > I understand that, and that's why I sent it as an RFC to get the discussion
> > started. Does anyone else have got another (acceptable) idea to get this
> > resolved ?
> 
> So the issue seems to be that we need base->running_timer in order to
> tell if a callback is running, right?
> 
> We could align the base on 8 bytes to gain an extra bit in the pointer
> and use that bit to indicate the running state. Then these sites can
> spin on that bit while we can change the actual base pointer.

Even though tvec_base has ____cacheline_aligned stuck on, most are
allocated using kzalloc_node() which does not actually respect that but
already guarantees a minimum u64 alignment, so I think we can use that
third bit without too much magic.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-27  9:30         ` Peter Zijlstra
@ 2015-03-27 11:11           ` Christoph Lameter
  -1 siblings, 0 replies; 58+ messages in thread
From: Christoph Lameter @ 2015-03-27 11:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Viresh Kumar, Andrew Morton, hannes, Linaro Kernel Mailman List,
	Linux Kernel Mailing List, vinmenon, shashim, Michal Hocko,
	mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Thomas Gleixner

On Fri, 27 Mar 2015, Peter Zijlstra wrote:

> > We could align the base on 8 bytes to gain an extra bit in the pointer
> > and use that bit to indicate the running state. Then these sites can
> > spin on that bit while we can change the actual base pointer.
>
> Even though tvec_base has ____cacheline_aligned stuck on, most are
> allocated using kzalloc_node() which does not actually respect that but
> already guarantees a minimum u64 alignment, so I think we can use that
> third bit without too much magic.

Create a new slab cache for this purpose that does the proper aligning?


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-27 11:11           ` Christoph Lameter
  0 siblings, 0 replies; 58+ messages in thread
From: Christoph Lameter @ 2015-03-27 11:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Viresh Kumar, Andrew Morton, hannes, Linaro Kernel Mailman List,
	Linux Kernel Mailing List, vinmenon, shashim, Michal Hocko,
	mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Thomas Gleixner

On Fri, 27 Mar 2015, Peter Zijlstra wrote:

> > We could align the base on 8 bytes to gain an extra bit in the pointer
> > and use that bit to indicate the running state. Then these sites can
> > spin on that bit while we can change the actual base pointer.
>
> Even though tvec_base has ____cacheline_aligned stuck on, most are
> allocated using kzalloc_node() which does not actually respect that but
> already guarantees a minimum u64 alignment, so I think we can use that
> third bit without too much magic.

Create a new slab cache for this purpose that does the proper aligning?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-27 11:11           ` Christoph Lameter
@ 2015-03-27 12:02             ` Peter Zijlstra
  -1 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-27 12:02 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Viresh Kumar, Andrew Morton, hannes, Linaro Kernel Mailman List,
	Linux Kernel Mailing List, vinmenon, shashim, Michal Hocko,
	mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Thomas Gleixner

On Fri, Mar 27, 2015 at 06:11:44AM -0500, Christoph Lameter wrote:
> On Fri, 27 Mar 2015, Peter Zijlstra wrote:
> 
> > > We could align the base on 8 bytes to gain an extra bit in the pointer
> > > and use that bit to indicate the running state. Then these sites can
> > > spin on that bit while we can change the actual base pointer.
> >
> > Even though tvec_base has ____cacheline_aligned stuck on, most are
> > allocated using kzalloc_node() which does not actually respect that but
> > already guarantees a minimum u64 alignment, so I think we can use that
> > third bit without too much magic.
> 
> Create a new slab cache for this purpose that does the proper aligning?

That is certainly a possibility, but we'll only ever allocate nr_cpus-1
entries from it, a whole new slab cache might be overkill.

What's not clear to me is why that thing is allocated at all, AFAICT
something like:

static DEFINE_PER_CPU(struct tvec_base, tvec_bases);

Should do the right thing and be much simpler.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-27 12:02             ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-27 12:02 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Viresh Kumar, Andrew Morton, hannes, Linaro Kernel Mailman List,
	Linux Kernel Mailing List, vinmenon, shashim, Michal Hocko,
	mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Thomas Gleixner

On Fri, Mar 27, 2015 at 06:11:44AM -0500, Christoph Lameter wrote:
> On Fri, 27 Mar 2015, Peter Zijlstra wrote:
> 
> > > We could align the base on 8 bytes to gain an extra bit in the pointer
> > > and use that bit to indicate the running state. Then these sites can
> > > spin on that bit while we can change the actual base pointer.
> >
> > Even though tvec_base has ____cacheline_aligned stuck on, most are
> > allocated using kzalloc_node() which does not actually respect that but
> > already guarantees a minimum u64 alignment, so I think we can use that
> > third bit without too much magic.
> 
> Create a new slab cache for this purpose that does the proper aligning?

That is certainly a possibility, but we'll only ever allocate nr_cpus-1
entries from it, a whole new slab cache might be overkill.

What's not clear to me is why that thing is allocated at all, AFAICT
something like:

static DEFINE_PER_CPU(struct tvec_base, tvec_bases);

Should do the right thing and be much simpler.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-26  5:39 ` Viresh Kumar
@ 2015-03-27 14:19   ` Michal Hocko
  -1 siblings, 0 replies; 58+ messages in thread
From: Michal Hocko @ 2015-03-27 14:19 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: akpm, hannes, cl, linaro-kernel, linux-kernel, vinmenon, shashim,
	mgorman, dave, koct9i, linux-mm

On Thu 26-03-15 11:09:01, Viresh Kumar wrote:
> A delayed work to schedule vmstat_shepherd() is queued at periodic intervals for
> internal working of vmstat core. This work and its timer end up waking an idle
> cpu sometimes, as this always stays on CPU0.
> 
> Because we re-queue the work from its handler, idle_cpu() returns false and so
> the timer (used by delayed work) never migrates to any other CPU.
> 
> This may not be the desired behavior always as waking up an idle CPU to queue
> work on few other CPUs isn't good from power-consumption point of view.

Wouldn't something like I was suggesting few months back
(http://article.gmane.org/gmane.linux.kernel.mm/127569) solve this
problem as well? Scheduler should be idle aware, no? I mean it shouldn't
wake up an idle CPU if the task might run on another one.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-27 14:19   ` Michal Hocko
  0 siblings, 0 replies; 58+ messages in thread
From: Michal Hocko @ 2015-03-27 14:19 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: akpm, hannes, cl, linaro-kernel, linux-kernel, vinmenon, shashim,
	mgorman, dave, koct9i, linux-mm

On Thu 26-03-15 11:09:01, Viresh Kumar wrote:
> A delayed work to schedule vmstat_shepherd() is queued at periodic intervals for
> internal working of vmstat core. This work and its timer end up waking an idle
> cpu sometimes, as this always stays on CPU0.
> 
> Because we re-queue the work from its handler, idle_cpu() returns false and so
> the timer (used by delayed work) never migrates to any other CPU.
> 
> This may not be the desired behavior always as waking up an idle CPU to queue
> work on few other CPUs isn't good from power-consumption point of view.

Wouldn't something like I was suggesting few months back
(http://article.gmane.org/gmane.linux.kernel.mm/127569) solve this
problem as well? Scheduler should be idle aware, no? I mean it shouldn't
wake up an idle CPU if the task might run on another one.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-27 12:02             ` Peter Zijlstra
@ 2015-03-27 19:45               ` Christoph Lameter
  -1 siblings, 0 replies; 58+ messages in thread
From: Christoph Lameter @ 2015-03-27 19:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Viresh Kumar, Andrew Morton, hannes, Linaro Kernel Mailman List,
	Linux Kernel Mailing List, vinmenon, shashim, Michal Hocko,
	mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Thomas Gleixner

On Fri, 27 Mar 2015, Peter Zijlstra wrote:

> On Fri, Mar 27, 2015 at 06:11:44AM -0500, Christoph Lameter wrote:
> >
> > Create a new slab cache for this purpose that does the proper aligning?
>
> That is certainly a possibility, but we'll only ever allocate nr_cpus-1
> entries from it, a whole new slab cache might be overkill.

This will certainly be aliased to some other slab cache so not much
overhead is created.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-27 19:45               ` Christoph Lameter
  0 siblings, 0 replies; 58+ messages in thread
From: Christoph Lameter @ 2015-03-27 19:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Viresh Kumar, Andrew Morton, hannes, Linaro Kernel Mailman List,
	Linux Kernel Mailing List, vinmenon, shashim, Michal Hocko,
	mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Thomas Gleixner

On Fri, 27 Mar 2015, Peter Zijlstra wrote:

> On Fri, Mar 27, 2015 at 06:11:44AM -0500, Christoph Lameter wrote:
> >
> > Create a new slab cache for this purpose that does the proper aligning?
>
> That is certainly a possibility, but we'll only ever allocate nr_cpus-1
> entries from it, a whole new slab cache might be overkill.

This will certainly be aliased to some other slab cache so not much
overhead is created.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-27  9:30         ` Peter Zijlstra
@ 2015-03-28  4:18           ` Viresh Kumar
  -1 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-28  4:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, hannes, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, mgorman, dave, koct9i,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Fri, Mar 27, 2015 at 3:00 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Mar 27, 2015 at 10:16:13AM +0100, Peter Zijlstra wrote:

>> So the issue seems to be that we need base->running_timer in order to
>> tell if a callback is running, right?
>>
>> We could align the base on 8 bytes to gain an extra bit in the pointer
>> and use that bit to indicate the running state. Then these sites can
>> spin on that bit while we can change the actual base pointer.
>
> Even though tvec_base has ____cacheline_aligned stuck on, most are
> allocated using kzalloc_node() which does not actually respect that but
> already guarantees a minimum u64 alignment, so I think we can use that
> third bit without too much magic.

Right. So what I tried earlier was very much similar to you are suggesting.
The only difference was that I haven't made much attempts towards
saving memory.

But Thomas didn't like it for sure and I believe that Rant will hold true for
what you are suggesting as well, isn't it ?

http://lists.linaro.org/pipermail/linaro-kernel/2013-November/008866.html

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-28  4:18           ` Viresh Kumar
  0 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-28  4:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, hannes, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, mgorman, dave, koct9i,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Fri, Mar 27, 2015 at 3:00 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Mar 27, 2015 at 10:16:13AM +0100, Peter Zijlstra wrote:

>> So the issue seems to be that we need base->running_timer in order to
>> tell if a callback is running, right?
>>
>> We could align the base on 8 bytes to gain an extra bit in the pointer
>> and use that bit to indicate the running state. Then these sites can
>> spin on that bit while we can change the actual base pointer.
>
> Even though tvec_base has ____cacheline_aligned stuck on, most are
> allocated using kzalloc_node() which does not actually respect that but
> already guarantees a minimum u64 alignment, so I think we can use that
> third bit without too much magic.

Right. So what I tried earlier was very much similar to you are suggesting.
The only difference was that I haven't made much attempts towards
saving memory.

But Thomas didn't like it for sure and I believe that Rant will hold true for
what you are suggesting as well, isn't it ?

http://lists.linaro.org/pipermail/linaro-kernel/2013-November/008866.html

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-27 12:02             ` Peter Zijlstra
@ 2015-03-28  4:28               ` Viresh Kumar
  -1 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-28  4:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Lameter, Andrew Morton, Johannes Weiner,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 27 March 2015 at 17:32, Peter Zijlstra <peterz@infradead.org> wrote:
> What's not clear to me is why that thing is allocated at all, AFAICT
> something like:
>
> static DEFINE_PER_CPU(struct tvec_base, tvec_bases);
>
> Should do the right thing and be much simpler.

Does this comment from timers.c answers your query ?

                        /*
                         * This is for the boot CPU - we use compile-time
                         * static initialisation because per-cpu memory isn't
                         * ready yet and because the memory allocators are not
                         * initialised either.
                         */

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-28  4:28               ` Viresh Kumar
  0 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-28  4:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Lameter, Andrew Morton, Johannes Weiner,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 27 March 2015 at 17:32, Peter Zijlstra <peterz@infradead.org> wrote:
> What's not clear to me is why that thing is allocated at all, AFAICT
> something like:
>
> static DEFINE_PER_CPU(struct tvec_base, tvec_bases);
>
> Should do the right thing and be much simpler.

Does this comment from timers.c answers your query ?

                        /*
                         * This is for the boot CPU - we use compile-time
                         * static initialisation because per-cpu memory isn't
                         * ready yet and because the memory allocators are not
                         * initialised either.
                         */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-27 14:19   ` Michal Hocko
@ 2015-03-28  4:34     ` Viresh Kumar
  -1 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-28  4:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List

On 27 March 2015 at 19:49, Michal Hocko <mhocko@suse.cz> wrote:

> Wouldn't something like I was suggesting few months back
> (http://article.gmane.org/gmane.linux.kernel.mm/127569) solve this
> problem as well? Scheduler should be idle aware, no? I mean it shouldn't
> wake up an idle CPU if the task might run on another one.

Probably yes. Lets see what others have to say about it..

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-28  4:34     ` Viresh Kumar
  0 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-28  4:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List

On 27 March 2015 at 19:49, Michal Hocko <mhocko@suse.cz> wrote:

> Wouldn't something like I was suggesting few months back
> (http://article.gmane.org/gmane.linux.kernel.mm/127569) solve this
> problem as well? Scheduler should be idle aware, no? I mean it shouldn't
> wake up an idle CPU if the task might run on another one.

Probably yes. Lets see what others have to say about it..

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-28  4:18           ` Viresh Kumar
@ 2015-03-28  9:53             ` Peter Zijlstra
  -1 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-28  9:53 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, hannes, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, mgorman, dave, koct9i,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Sat, Mar 28, 2015 at 09:48:28AM +0530, Viresh Kumar wrote:
> On Fri, Mar 27, 2015 at 3:00 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, Mar 27, 2015 at 10:16:13AM +0100, Peter Zijlstra wrote:
> 
> >> So the issue seems to be that we need base->running_timer in order to
> >> tell if a callback is running, right?
> >>
> >> We could align the base on 8 bytes to gain an extra bit in the pointer
> >> and use that bit to indicate the running state. Then these sites can
> >> spin on that bit while we can change the actual base pointer.
> >
> > Even though tvec_base has ____cacheline_aligned stuck on, most are
> > allocated using kzalloc_node() which does not actually respect that but
> > already guarantees a minimum u64 alignment, so I think we can use that
> > third bit without too much magic.
> 
> Right. So what I tried earlier was very much similar to you are suggesting.
> The only difference was that I haven't made much attempts towards
> saving memory.
> 
> But Thomas didn't like it for sure and I believe that Rant will hold true for
> what you are suggesting as well, isn't it ?
> 
> http://lists.linaro.org/pipermail/linaro-kernel/2013-November/008866.html

Well, for one your patch is indeed disgusting. But yes I'm aware Thomas
wants to rewrite the timer thing. But Thomas is away for a little while
and if this really needs to happen then it does.

Alternatively the thing hocko suggests is an utter fail too. You cannot
stuff that into hardirq context, that's insane.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-28  9:53             ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-28  9:53 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, hannes, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, mgorman, dave, koct9i,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Sat, Mar 28, 2015 at 09:48:28AM +0530, Viresh Kumar wrote:
> On Fri, Mar 27, 2015 at 3:00 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, Mar 27, 2015 at 10:16:13AM +0100, Peter Zijlstra wrote:
> 
> >> So the issue seems to be that we need base->running_timer in order to
> >> tell if a callback is running, right?
> >>
> >> We could align the base on 8 bytes to gain an extra bit in the pointer
> >> and use that bit to indicate the running state. Then these sites can
> >> spin on that bit while we can change the actual base pointer.
> >
> > Even though tvec_base has ____cacheline_aligned stuck on, most are
> > allocated using kzalloc_node() which does not actually respect that but
> > already guarantees a minimum u64 alignment, so I think we can use that
> > third bit without too much magic.
> 
> Right. So what I tried earlier was very much similar to you are suggesting.
> The only difference was that I haven't made much attempts towards
> saving memory.
> 
> But Thomas didn't like it for sure and I believe that Rant will hold true for
> what you are suggesting as well, isn't it ?
> 
> http://lists.linaro.org/pipermail/linaro-kernel/2013-November/008866.html

Well, for one your patch is indeed disgusting. But yes I'm aware Thomas
wants to rewrite the timer thing. But Thomas is away for a little while
and if this really needs to happen then it does.

Alternatively the thing hocko suggests is an utter fail too. You cannot
stuff that into hardirq context, that's insane.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-28  4:28               ` Viresh Kumar
@ 2015-03-28 11:41                 ` Peter Zijlstra
  -1 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-28 11:41 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Christoph Lameter, Andrew Morton, Johannes Weiner,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Sat, Mar 28, 2015 at 09:58:38AM +0530, Viresh Kumar wrote:
> On 27 March 2015 at 17:32, Peter Zijlstra <peterz@infradead.org> wrote:
> > What's not clear to me is why that thing is allocated at all, AFAICT
> > something like:
> >
> > static DEFINE_PER_CPU(struct tvec_base, tvec_bases);
> >
> > Should do the right thing and be much simpler.
> 
> Does this comment from timers.c answers your query ?
> 
>                         /*
>                          * This is for the boot CPU - we use compile-time
>                          * static initialisation because per-cpu memory isn't
>                          * ready yet and because the memory allocators are not
>                          * initialised either.
>                          */

No. The reason is because __TIMER_INITIALIZER() needs to set ->base to a
valid pointer and we cannot get a compile time pointer to per-cpu
entries because we don't know where we'll map the section, even for the
boot cpu.

This in turn means we need that boot_tvec_bases thing.

Now the reason we need to set ->base to a valid pointer is because we've
made NULL special.

Arguably we could create another special pointer, but since that's init
time only we get to add code for that will 'never' be used, more special
cases.

Its all a bit of a bother, but it makes sense.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-28 11:41                 ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-28 11:41 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Christoph Lameter, Andrew Morton, Johannes Weiner,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Sat, Mar 28, 2015 at 09:58:38AM +0530, Viresh Kumar wrote:
> On 27 March 2015 at 17:32, Peter Zijlstra <peterz@infradead.org> wrote:
> > What's not clear to me is why that thing is allocated at all, AFAICT
> > something like:
> >
> > static DEFINE_PER_CPU(struct tvec_base, tvec_bases);
> >
> > Should do the right thing and be much simpler.
> 
> Does this comment from timers.c answers your query ?
> 
>                         /*
>                          * This is for the boot CPU - we use compile-time
>                          * static initialisation because per-cpu memory isn't
>                          * ready yet and because the memory allocators are not
>                          * initialised either.
>                          */

No. The reason is because __TIMER_INITIALIZER() needs to set ->base to a
valid pointer and we cannot get a compile time pointer to per-cpu
entries because we don't know where we'll map the section, even for the
boot cpu.

This in turn means we need that boot_tvec_bases thing.

Now the reason we need to set ->base to a valid pointer is because we've
made NULL special.

Arguably we could create another special pointer, but since that's init
time only we get to add code for that will 'never' be used, more special
cases.

Its all a bit of a bother, but it makes sense.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-28  9:53             ` Peter Zijlstra
@ 2015-03-28 11:57               ` viresh kumar
  -1 siblings, 0 replies; 58+ messages in thread
From: viresh kumar @ 2015-03-28 11:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 28 March 2015 at 15:23, Peter Zijlstra <peterz@infradead.org> wrote:

> Well, for one your patch is indeed disgusting.

Yeah, I agree :)

> But yes I'm aware Thomas
> wants to rewrite the timer thing. But Thomas is away for a little while
> and if this really needs to happen then it does.

Sometime back I was trying to use another bit from base pointer for
marking a timer as PINNED:

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..e7184f57449c 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -67,8 +67,9 @@ extern struct tvec_base boot_tvec_bases;
  */
 #define TIMER_DEFERRABLE               0x1LU
 #define TIMER_IRQSAFE                  0x2LU
+#define TIMER_PINNED                   0x4LU
 -#define TIMER_FLAG_MASK                        0x3LU
+#define TIMER_FLAG_MASK                        0x7LU


And Fenguang's build-bot showed the problem (only) on blackfin [1].

        config: make ARCH=blackfin allyesconfig

        All error/warnings:

           kernel/timer.c: In function 'init_timers':
        >> kernel/timer.c:1683:2: error: call to '__compiletime_assert_1683'
        >> declared with attribute error: BUILD_BUG_ON failed:
        >> __alignof__(struct tvec_base) & TIMER_FLAG_MASK


So probably we need to make 'base' aligned to 8 bytes ?



So, what you are suggesting is something like this (untested):

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..68bf09d69352 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -67,8 +67,9 @@ extern struct tvec_base boot_tvec_bases;
  */
 #define TIMER_DEFERRABLE               0x1LU
 #define TIMER_IRQSAFE                  0x2LU
+#define TIMER_RUNNING                  0x4LU

-#define TIMER_FLAG_MASK                        0x3LU
+#define TIMER_FLAG_MASK                        0x7LU

 #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
                .entry = { .prev = TIMER_ENTRY_STATIC },        \
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..8f9efa64bd34 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -105,6 +105,21 @@ static inline unsigned int tbase_get_irqsafe(struct tvec_base *base)
        return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE);
 }

+static inline unsigned int tbase_get_running(struct tvec_base *base)
+{
+       return ((unsigned int)(unsigned long)base & TIMER_RUNNING);
+}
+
+static inline unsigned int tbase_set_running(struct tvec_base *base)
+{
+       return ((unsigned int)(unsigned long)base | TIMER_RUNNING);
+}
+
+static inline unsigned int tbase_clear_running(struct tvec_base *base)
+{
+       return ((unsigned int)(unsigned long)base & ~TIMER_RUNNING);
+}
+
 static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
 {
        return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
@@ -781,21 +796,12 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
        new_base = per_cpu(tvec_bases, cpu);

        if (base != new_base) {
-               /*
-                * We are trying to schedule the timer on the local CPU.
-                * However we can't change timer's base while it is running,
-                * otherwise del_timer_sync() can't detect that the timer's
-                * handler yet has not finished. This also guarantees that
-                * the timer is serialized wrt itself.
-                */
-               if (likely(base->running_timer != timer)) {
-                       /* See the comment in lock_timer_base() */
-                       timer_set_base(timer, NULL);
-                       spin_unlock(&base->lock);
-                       base = new_base;
-                       spin_lock(&base->lock);
-                       timer_set_base(timer, base);
-               }
+               /* See the comment in lock_timer_base() */
+               timer_set_base(timer, NULL);
+               spin_unlock(&base->lock);
+               base = new_base;
+               spin_lock(&base->lock);
+               timer_set_base(timer, base);
        }

        timer->expires = expires;
@@ -1016,7 +1022,7 @@ int try_to_del_timer_sync(struct timer_list *timer)

        base = lock_timer_base(timer, &flags);

-       if (base->running_timer != timer) {
+       if (tbase_get_running(timer->base)) {
                timer_stats_timer_clear_start_info(timer);
                ret = detach_if_pending(timer, base, true);
        }
@@ -1202,6 +1208,7 @@ static inline void __run_timers(struct tvec_base *base)
                        timer_stats_account_timer(timer);

                        base->running_timer = timer;
+                       tbase_set_running(timer->base);
                        detach_expired_timer(timer, base);

                        if (irqsafe) {
@@ -1216,6 +1223,7 @@ static inline void __run_timers(struct tvec_base *base)
                }
        }
        base->running_timer = NULL;
+       tbase_clear_running(timer->base);
        spin_unlock_irq(&base->lock);
 }

------------x--------------------x----------------------

Right?


Now there are few issues I see here (Sorry if they are all imaginary):
- In case a timer re-arms itself from its handler and is migrated from CPU A to B, what
  happens if the re-armed timer fires before the first handler finishes ? i.e. timer->fn()
  hasn't finished running on CPU A and it has fired again on CPU B. Wouldn't this expose
  us to a lot of other problems? It wouldn't be serialized to itself anymore ?

- Because the timer has migrated to another CPU, the locking in __run_timers()
  needs to be fixed. And that will make it complicated ..

  - __run_timer() doesn't lock bases of other CPUs, and it has to do it now..
  - We probably need to take locks of both local CPU and the one to which timer migrated.

- Its possible now that there can be more than one running timer for a base, which wasn't
  true earlier. Not sure if it will break something.


Thanks for your continuous support to reply to my (sometimes stupid) queries.

--
viresh

[1] https://lists.01.org/pipermail/kbuild-all/2014-April/003982.html

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-28 11:57               ` viresh kumar
  0 siblings, 0 replies; 58+ messages in thread
From: viresh kumar @ 2015-03-28 11:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 28 March 2015 at 15:23, Peter Zijlstra <peterz@infradead.org> wrote:

> Well, for one your patch is indeed disgusting.

Yeah, I agree :)

> But yes I'm aware Thomas
> wants to rewrite the timer thing. But Thomas is away for a little while
> and if this really needs to happen then it does.

Sometime back I was trying to use another bit from base pointer for
marking a timer as PINNED:

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..e7184f57449c 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -67,8 +67,9 @@ extern struct tvec_base boot_tvec_bases;
  */
 #define TIMER_DEFERRABLE               0x1LU
 #define TIMER_IRQSAFE                  0x2LU
+#define TIMER_PINNED                   0x4LU
 -#define TIMER_FLAG_MASK                        0x3LU
+#define TIMER_FLAG_MASK                        0x7LU


And Fenguang's build-bot showed the problem (only) on blackfin [1].

        config: make ARCH=blackfin allyesconfig

        All error/warnings:

           kernel/timer.c: In function 'init_timers':
        >> kernel/timer.c:1683:2: error: call to '__compiletime_assert_1683'
        >> declared with attribute error: BUILD_BUG_ON failed:
        >> __alignof__(struct tvec_base) & TIMER_FLAG_MASK


So probably we need to make 'base' aligned to 8 bytes ?



So, what you are suggesting is something like this (untested):

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..68bf09d69352 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -67,8 +67,9 @@ extern struct tvec_base boot_tvec_bases;
  */
 #define TIMER_DEFERRABLE               0x1LU
 #define TIMER_IRQSAFE                  0x2LU
+#define TIMER_RUNNING                  0x4LU

-#define TIMER_FLAG_MASK                        0x3LU
+#define TIMER_FLAG_MASK                        0x7LU

 #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
                .entry = { .prev = TIMER_ENTRY_STATIC },        \
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..8f9efa64bd34 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -105,6 +105,21 @@ static inline unsigned int tbase_get_irqsafe(struct tvec_base *base)
        return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE);
 }

+static inline unsigned int tbase_get_running(struct tvec_base *base)
+{
+       return ((unsigned int)(unsigned long)base & TIMER_RUNNING);
+}
+
+static inline unsigned int tbase_set_running(struct tvec_base *base)
+{
+       return ((unsigned int)(unsigned long)base | TIMER_RUNNING);
+}
+
+static inline unsigned int tbase_clear_running(struct tvec_base *base)
+{
+       return ((unsigned int)(unsigned long)base & ~TIMER_RUNNING);
+}
+
 static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
 {
        return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
@@ -781,21 +796,12 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
        new_base = per_cpu(tvec_bases, cpu);

        if (base != new_base) {
-               /*
-                * We are trying to schedule the timer on the local CPU.
-                * However we can't change timer's base while it is running,
-                * otherwise del_timer_sync() can't detect that the timer's
-                * handler yet has not finished. This also guarantees that
-                * the timer is serialized wrt itself.
-                */
-               if (likely(base->running_timer != timer)) {
-                       /* See the comment in lock_timer_base() */
-                       timer_set_base(timer, NULL);
-                       spin_unlock(&base->lock);
-                       base = new_base;
-                       spin_lock(&base->lock);
-                       timer_set_base(timer, base);
-               }
+               /* See the comment in lock_timer_base() */
+               timer_set_base(timer, NULL);
+               spin_unlock(&base->lock);
+               base = new_base;
+               spin_lock(&base->lock);
+               timer_set_base(timer, base);
        }

        timer->expires = expires;
@@ -1016,7 +1022,7 @@ int try_to_del_timer_sync(struct timer_list *timer)

        base = lock_timer_base(timer, &flags);

-       if (base->running_timer != timer) {
+       if (tbase_get_running(timer->base)) {
                timer_stats_timer_clear_start_info(timer);
                ret = detach_if_pending(timer, base, true);
        }
@@ -1202,6 +1208,7 @@ static inline void __run_timers(struct tvec_base *base)
                        timer_stats_account_timer(timer);

                        base->running_timer = timer;
+                       tbase_set_running(timer->base);
                        detach_expired_timer(timer, base);

                        if (irqsafe) {
@@ -1216,6 +1223,7 @@ static inline void __run_timers(struct tvec_base *base)
                }
        }
        base->running_timer = NULL;
+       tbase_clear_running(timer->base);
        spin_unlock_irq(&base->lock);
 }

------------x--------------------x----------------------

Right?


Now there are few issues I see here (Sorry if they are all imaginary):
- In case a timer re-arms itself from its handler and is migrated from CPU A to B, what
  happens if the re-armed timer fires before the first handler finishes ? i.e. timer->fn()
  hasn't finished running on CPU A and it has fired again on CPU B. Wouldn't this expose
  us to a lot of other problems? It wouldn't be serialized to itself anymore ?

- Because the timer has migrated to another CPU, the locking in __run_timers()
  needs to be fixed. And that will make it complicated ..

  - __run_timer() doesn't lock bases of other CPUs, and it has to do it now..
  - We probably need to take locks of both local CPU and the one to which timer migrated.

- Its possible now that there can be more than one running timer for a base, which wasn't
  true earlier. Not sure if it will break something.


Thanks for your continuous support to reply to my (sometimes stupid) queries.

--
viresh

[1] https://lists.01.org/pipermail/kbuild-all/2014-April/003982.html

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-28 11:57               ` viresh kumar
@ 2015-03-28 12:04                 ` Viresh Kumar
  -1 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-28 12:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 28 March 2015 at 17:27, viresh kumar <viresh.kumar@linaro.org> wrote:
> On 28 March 2015 at 15:23, Peter Zijlstra <peterz@infradead.org> wrote:
>
>> Well, for one your patch is indeed disgusting.
>
> Yeah, I agree :)

Sigh..

Sorry for the series of *nonsense* mails before the last one.

Its some thunderbird *BUG* which did that, I was accessing my
mail from both gmail's interface and thunderbird and somehow
this happened. I have hit the send button only once.

Really sorry for the noise.

(The last mail has few inquiries towards the end and a thanks note,
just to identify it..)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-28 12:04                 ` Viresh Kumar
  0 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-28 12:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 28 March 2015 at 17:27, viresh kumar <viresh.kumar@linaro.org> wrote:
> On 28 March 2015 at 15:23, Peter Zijlstra <peterz@infradead.org> wrote:
>
>> Well, for one your patch is indeed disgusting.
>
> Yeah, I agree :)

Sigh..

Sorry for the series of *nonsense* mails before the last one.

Its some thunderbird *BUG* which did that, I was accessing my
mail from both gmail's interface and thunderbird and somehow
this happened. I have hit the send button only once.

Really sorry for the noise.

(The last mail has few inquiries towards the end and a thanks note,
just to identify it..)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-28 11:57               ` viresh kumar
@ 2015-03-28 13:44                 ` Peter Zijlstra
  -1 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-28 13:44 UTC (permalink / raw)
  To: viresh kumar
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Sat, Mar 28, 2015 at 05:27:23PM +0530, viresh kumar wrote:
> So probably we need to make 'base' aligned to 8 bytes ?

Yeah, something like the below (at the very end) should ensure the thing
is cacheline aligned, that should give us a fair few bits.

> So, what you are suggesting is something like this (untested):

> @@ -1202,6 +1208,7 @@ static inline void __run_timers(struct tvec_base *base)
>                         timer_stats_account_timer(timer);
> 
>                         base->running_timer = timer;
> +                       tbase_set_running(timer->base);
>                         detach_expired_timer(timer, base);
> 
>                         if (irqsafe) {
> @@ -1216,6 +1223,7 @@ static inline void __run_timers(struct tvec_base *base)
>                 }
>         }
>         base->running_timer = NULL;
> +       tbase_clear_running(timer->base);
>         spin_unlock_irq(&base->lock);
>  }

That's broken. You need to clear running on all the timers you set it
on. Furthermore, you need to revalidate timer->base == base after
call_timer_fn().

Something like so:

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..489ce182f8ec 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1213,6 +1213,21 @@ static inline void __run_timers(struct tvec_base *base)
 				call_timer_fn(timer, fn, data);
 				spin_lock_irq(&base->lock);
 			}
+
+			if (unlikely(timer->base != base)) {
+				unsigned long flags;
+				struct tvec_base *tbase;
+
+				spin_unlock(&base->lock);
+
+				tbase = lock_timer_base(timer, &flags);
+				tbase_clear_running(timer->base);
+				spin_unlock(&tbase->lock);
+
+				spin_lock(&base->lock);
+			} else {
+				tbase_clear_running(timer->base);
+			}
 		}
 	}
 	base->running_timer = NULL;

Also, once you have tbase_running, we can take base->running_timer out
altogether.

> Now there are few issues I see here (Sorry if they are all imaginary):
> - In case a timer re-arms itself from its handler and is migrated from CPU A to B, what
>   happens if the re-armed timer fires before the first handler finishes ? i.e. timer->fn()
>   hasn't finished running on CPU A and it has fired again on CPU B. Wouldn't this expose
>   us to a lot of other problems? It wouldn't be serialized to itself anymore ?

What I said above.

> - Because the timer has migrated to another CPU, the locking in __run_timers()
>   needs to be fixed. And that will make it complicated ..

Hardly.

>   - __run_timer() doesn't lock bases of other CPUs, and it has to do it now..

Yep, but rarely.

>   - We probably need to take locks of both local CPU and the one to which timer migrated.

Nope, or rather, not at the same time. That's what the NULL magic buys
us.

> - Its possible now that there can be more than one running timer for a base, which wasn't
>   true earlier. Not sure if it will break something.

Only if you messed it up real bad :-)

---
 kernel/time/timer.c | 36 ++++++++----------------------------
 1 file changed, 8 insertions(+), 28 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..c8c45bf50b2e 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -93,6 +93,7 @@ struct tvec_base {
 struct tvec_base boot_tvec_bases;
 EXPORT_SYMBOL(boot_tvec_bases);
 static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;
+static DEFINE_PER_CPU(struct tvec_base, __tvec_bases);
 
 /* Functions below help us manage 'deferrable' flag */
 static inline unsigned int tbase_get_deferrable(struct tvec_base *base)
@@ -1534,46 +1535,25 @@ EXPORT_SYMBOL(schedule_timeout_uninterruptible);
 
 static int init_timers_cpu(int cpu)
 {
-	int j;
-	struct tvec_base *base;
+	struct tvec_base *base = per_cpu(tvec_bases, cpu);
 	static char tvec_base_done[NR_CPUS];
+	int j;
 
 	if (!tvec_base_done[cpu]) {
 		static char boot_done;
 
-		if (boot_done) {
-			/*
-			 * The APs use this path later in boot
-			 */
-			base = kzalloc_node(sizeof(*base), GFP_KERNEL,
-					    cpu_to_node(cpu));
-			if (!base)
-				return -ENOMEM;
-
-			/* Make sure tvec_base has TIMER_FLAG_MASK bits free */
-			if (WARN_ON(base != tbase_get_base(base))) {
-				kfree(base);
-				return -ENOMEM;
-			}
-			per_cpu(tvec_bases, cpu) = base;
+		if (!boot_done) {
+			boot_done = 1; /* skip the boot cpu */
 		} else {
-			/*
-			 * This is for the boot CPU - we use compile-time
-			 * static initialisation because per-cpu memory isn't
-			 * ready yet and because the memory allocators are not
-			 * initialised either.
-			 */
-			boot_done = 1;
-			base = &boot_tvec_bases;
+			base = per_cpu_ptr(&__tvec_bases);
+			per_cpu(tvec_bases, cpu) = base;
 		}
+
 		spin_lock_init(&base->lock);
 		tvec_base_done[cpu] = 1;
 		base->cpu = cpu;
-	} else {
-		base = per_cpu(tvec_bases, cpu);
 	}
 
-
 	for (j = 0; j < TVN_SIZE; j++) {
 		INIT_LIST_HEAD(base->tv5.vec + j);
 		INIT_LIST_HEAD(base->tv4.vec + j);

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-28 13:44                 ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-28 13:44 UTC (permalink / raw)
  To: viresh kumar
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Sat, Mar 28, 2015 at 05:27:23PM +0530, viresh kumar wrote:
> So probably we need to make 'base' aligned to 8 bytes ?

Yeah, something like the below (at the very end) should ensure the thing
is cacheline aligned, that should give us a fair few bits.

> So, what you are suggesting is something like this (untested):

> @@ -1202,6 +1208,7 @@ static inline void __run_timers(struct tvec_base *base)
>                         timer_stats_account_timer(timer);
> 
>                         base->running_timer = timer;
> +                       tbase_set_running(timer->base);
>                         detach_expired_timer(timer, base);
> 
>                         if (irqsafe) {
> @@ -1216,6 +1223,7 @@ static inline void __run_timers(struct tvec_base *base)
>                 }
>         }
>         base->running_timer = NULL;
> +       tbase_clear_running(timer->base);
>         spin_unlock_irq(&base->lock);
>  }

That's broken. You need to clear running on all the timers you set it
on. Furthermore, you need to revalidate timer->base == base after
call_timer_fn().

Something like so:

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..489ce182f8ec 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1213,6 +1213,21 @@ static inline void __run_timers(struct tvec_base *base)
 				call_timer_fn(timer, fn, data);
 				spin_lock_irq(&base->lock);
 			}
+
+			if (unlikely(timer->base != base)) {
+				unsigned long flags;
+				struct tvec_base *tbase;
+
+				spin_unlock(&base->lock);
+
+				tbase = lock_timer_base(timer, &flags);
+				tbase_clear_running(timer->base);
+				spin_unlock(&tbase->lock);
+
+				spin_lock(&base->lock);
+			} else {
+				tbase_clear_running(timer->base);
+			}
 		}
 	}
 	base->running_timer = NULL;

Also, once you have tbase_running, we can take base->running_timer out
altogether.

> Now there are few issues I see here (Sorry if they are all imaginary):
> - In case a timer re-arms itself from its handler and is migrated from CPU A to B, what
>   happens if the re-armed timer fires before the first handler finishes ? i.e. timer->fn()
>   hasn't finished running on CPU A and it has fired again on CPU B. Wouldn't this expose
>   us to a lot of other problems? It wouldn't be serialized to itself anymore ?

What I said above.

> - Because the timer has migrated to another CPU, the locking in __run_timers()
>   needs to be fixed. And that will make it complicated ..

Hardly.

>   - __run_timer() doesn't lock bases of other CPUs, and it has to do it now..

Yep, but rarely.

>   - We probably need to take locks of both local CPU and the one to which timer migrated.

Nope, or rather, not at the same time. That's what the NULL magic buys
us.

> - Its possible now that there can be more than one running timer for a base, which wasn't
>   true earlier. Not sure if it will break something.

Only if you messed it up real bad :-)

---
 kernel/time/timer.c | 36 ++++++++----------------------------
 1 file changed, 8 insertions(+), 28 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..c8c45bf50b2e 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -93,6 +93,7 @@ struct tvec_base {
 struct tvec_base boot_tvec_bases;
 EXPORT_SYMBOL(boot_tvec_bases);
 static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;
+static DEFINE_PER_CPU(struct tvec_base, __tvec_bases);
 
 /* Functions below help us manage 'deferrable' flag */
 static inline unsigned int tbase_get_deferrable(struct tvec_base *base)
@@ -1534,46 +1535,25 @@ EXPORT_SYMBOL(schedule_timeout_uninterruptible);
 
 static int init_timers_cpu(int cpu)
 {
-	int j;
-	struct tvec_base *base;
+	struct tvec_base *base = per_cpu(tvec_bases, cpu);
 	static char tvec_base_done[NR_CPUS];
+	int j;
 
 	if (!tvec_base_done[cpu]) {
 		static char boot_done;
 
-		if (boot_done) {
-			/*
-			 * The APs use this path later in boot
-			 */
-			base = kzalloc_node(sizeof(*base), GFP_KERNEL,
-					    cpu_to_node(cpu));
-			if (!base)
-				return -ENOMEM;
-
-			/* Make sure tvec_base has TIMER_FLAG_MASK bits free */
-			if (WARN_ON(base != tbase_get_base(base))) {
-				kfree(base);
-				return -ENOMEM;
-			}
-			per_cpu(tvec_bases, cpu) = base;
+		if (!boot_done) {
+			boot_done = 1; /* skip the boot cpu */
 		} else {
-			/*
-			 * This is for the boot CPU - we use compile-time
-			 * static initialisation because per-cpu memory isn't
-			 * ready yet and because the memory allocators are not
-			 * initialised either.
-			 */
-			boot_done = 1;
-			base = &boot_tvec_bases;
+			base = per_cpu_ptr(&__tvec_bases);
+			per_cpu(tvec_bases, cpu) = base;
 		}
+
 		spin_lock_init(&base->lock);
 		tvec_base_done[cpu] = 1;
 		base->cpu = cpu;
-	} else {
-		base = per_cpu(tvec_bases, cpu);
 	}
 
-
 	for (j = 0; j < TVN_SIZE; j++) {
 		INIT_LIST_HEAD(base->tv5.vec + j);
 		INIT_LIST_HEAD(base->tv4.vec + j);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-28 13:44                 ` Peter Zijlstra
@ 2015-03-29 10:24                   ` Peter Zijlstra
  -1 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-29 10:24 UTC (permalink / raw)
  To: viresh kumar
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Sat, Mar 28, 2015 at 02:44:57PM +0100, Peter Zijlstra wrote:
> > Now there are few issues I see here (Sorry if they are all imaginary):
> > - In case a timer re-arms itself from its handler and is migrated from CPU A to B, what
> >   happens if the re-armed timer fires before the first handler finishes ? i.e. timer->fn()
> >   hasn't finished running on CPU A and it has fired again on CPU B. Wouldn't this expose
> >   us to a lot of other problems? It wouldn't be serialized to itself anymore ?
> 
> What I said above.

What I didn't say, but had thought of is that __run_timer() should skip
any timer that has RUNNING set -- for obvious reasons :-)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-29 10:24                   ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-29 10:24 UTC (permalink / raw)
  To: viresh kumar
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Sat, Mar 28, 2015 at 02:44:57PM +0100, Peter Zijlstra wrote:
> > Now there are few issues I see here (Sorry if they are all imaginary):
> > - In case a timer re-arms itself from its handler and is migrated from CPU A to B, what
> >   happens if the re-armed timer fires before the first handler finishes ? i.e. timer->fn()
> >   hasn't finished running on CPU A and it has fired again on CPU B. Wouldn't this expose
> >   us to a lot of other problems? It wouldn't be serialized to itself anymore ?
> 
> What I said above.

What I didn't say, but had thought of is that __run_timer() should skip
any timer that has RUNNING set -- for obvious reasons :-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-28 13:44                 ` Peter Zijlstra
@ 2015-03-29 12:01                   ` Viresh Kumar
  -1 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-29 12:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 28 March 2015 at 19:14, Peter Zijlstra <peterz@infradead.org> wrote:
> Yeah, something like the below (at the very end) should ensure the thing
> is cacheline aligned, that should give us a fair few bits.

> ---
>  kernel/time/timer.c | 36 ++++++++----------------------------
>  1 file changed, 8 insertions(+), 28 deletions(-)
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 2d3f5c504939..c8c45bf50b2e 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -93,6 +93,7 @@ struct tvec_base {
>  struct tvec_base boot_tvec_bases;
>  EXPORT_SYMBOL(boot_tvec_bases);
>  static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;
> +static DEFINE_PER_CPU(struct tvec_base, __tvec_bases);
>
>  /* Functions below help us manage 'deferrable' flag */
>  static inline unsigned int tbase_get_deferrable(struct tvec_base *base)
> @@ -1534,46 +1535,25 @@ EXPORT_SYMBOL(schedule_timeout_uninterruptible);
>
>  static int init_timers_cpu(int cpu)
>  {
> -       int j;
> -       struct tvec_base *base;
> +       struct tvec_base *base = per_cpu(tvec_bases, cpu);
>         static char tvec_base_done[NR_CPUS];
> +       int j;
>
>         if (!tvec_base_done[cpu]) {
>                 static char boot_done;
>
> -               if (boot_done) {
> -                       /*
> -                        * The APs use this path later in boot
> -                        */
> -                       base = kzalloc_node(sizeof(*base), GFP_KERNEL,
> -                                           cpu_to_node(cpu));
> -                       if (!base)
> -                               return -ENOMEM;
> -
> -                       /* Make sure tvec_base has TIMER_FLAG_MASK bits free */
> -                       if (WARN_ON(base != tbase_get_base(base))) {
> -                               kfree(base);
> -                               return -ENOMEM;
> -                       }
> -                       per_cpu(tvec_bases, cpu) = base;
> +               if (!boot_done) {
> +                       boot_done = 1; /* skip the boot cpu */
>                 } else {
> -                       /*
> -                        * This is for the boot CPU - we use compile-time
> -                        * static initialisation because per-cpu memory isn't
> -                        * ready yet and because the memory allocators are not
> -                        * initialised either.
> -                        */
> -                       boot_done = 1;
> -                       base = &boot_tvec_bases;
> +                       base = per_cpu_ptr(&__tvec_bases);
> +                       per_cpu(tvec_bases, cpu) = base;
>                 }
> +
>                 spin_lock_init(&base->lock);
>                 tvec_base_done[cpu] = 1;
>                 base->cpu = cpu;
> -       } else {
> -               base = per_cpu(tvec_bases, cpu);
>         }
>
> -
>         for (j = 0; j < TVN_SIZE; j++) {
>                 INIT_LIST_HEAD(base->tv5.vec + j);
>                 INIT_LIST_HEAD(base->tv4.vec + j);

Even after this with following diff gives me the same warning on blackfin..

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..58bc28d9cef2 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -68,7 +68,7 @@ extern struct tvec_base boot_tvec_bases;
 #define TIMER_DEFERRABLE               0x1LU
 #define TIMER_IRQSAFE                  0x2LU

-#define TIMER_FLAG_MASK                        0x3LU
+#define TIMER_FLAG_MASK                        0x7LU

 #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
                .entry = { .prev = TIMER_ENTRY_STATIC },        \


---------x--------------------x----------------------

Warning:

config: blackfin-allyesconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
-O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout ca713e393c6eceb54e803df204772a3d6e6c7981
  # save the attached .config to linux build tree
  make.cross ARCH=blackfin

All error/warnings:

   kernel/time/timer.c: In function 'init_timers':
>> kernel/time/timer.c:1648:2: error: call to '__compiletime_assert_1648' declared with attribute error: BUILD_BUG_ON failed: __alignof__(struct tvec_base) & TIMER_FLAG_MASK

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-29 12:01                   ` Viresh Kumar
  0 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-29 12:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 28 March 2015 at 19:14, Peter Zijlstra <peterz@infradead.org> wrote:
> Yeah, something like the below (at the very end) should ensure the thing
> is cacheline aligned, that should give us a fair few bits.

> ---
>  kernel/time/timer.c | 36 ++++++++----------------------------
>  1 file changed, 8 insertions(+), 28 deletions(-)
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 2d3f5c504939..c8c45bf50b2e 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -93,6 +93,7 @@ struct tvec_base {
>  struct tvec_base boot_tvec_bases;
>  EXPORT_SYMBOL(boot_tvec_bases);
>  static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;
> +static DEFINE_PER_CPU(struct tvec_base, __tvec_bases);
>
>  /* Functions below help us manage 'deferrable' flag */
>  static inline unsigned int tbase_get_deferrable(struct tvec_base *base)
> @@ -1534,46 +1535,25 @@ EXPORT_SYMBOL(schedule_timeout_uninterruptible);
>
>  static int init_timers_cpu(int cpu)
>  {
> -       int j;
> -       struct tvec_base *base;
> +       struct tvec_base *base = per_cpu(tvec_bases, cpu);
>         static char tvec_base_done[NR_CPUS];
> +       int j;
>
>         if (!tvec_base_done[cpu]) {
>                 static char boot_done;
>
> -               if (boot_done) {
> -                       /*
> -                        * The APs use this path later in boot
> -                        */
> -                       base = kzalloc_node(sizeof(*base), GFP_KERNEL,
> -                                           cpu_to_node(cpu));
> -                       if (!base)
> -                               return -ENOMEM;
> -
> -                       /* Make sure tvec_base has TIMER_FLAG_MASK bits free */
> -                       if (WARN_ON(base != tbase_get_base(base))) {
> -                               kfree(base);
> -                               return -ENOMEM;
> -                       }
> -                       per_cpu(tvec_bases, cpu) = base;
> +               if (!boot_done) {
> +                       boot_done = 1; /* skip the boot cpu */
>                 } else {
> -                       /*
> -                        * This is for the boot CPU - we use compile-time
> -                        * static initialisation because per-cpu memory isn't
> -                        * ready yet and because the memory allocators are not
> -                        * initialised either.
> -                        */
> -                       boot_done = 1;
> -                       base = &boot_tvec_bases;
> +                       base = per_cpu_ptr(&__tvec_bases);
> +                       per_cpu(tvec_bases, cpu) = base;
>                 }
> +
>                 spin_lock_init(&base->lock);
>                 tvec_base_done[cpu] = 1;
>                 base->cpu = cpu;
> -       } else {
> -               base = per_cpu(tvec_bases, cpu);
>         }
>
> -
>         for (j = 0; j < TVN_SIZE; j++) {
>                 INIT_LIST_HEAD(base->tv5.vec + j);
>                 INIT_LIST_HEAD(base->tv4.vec + j);

Even after this with following diff gives me the same warning on blackfin..

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..58bc28d9cef2 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -68,7 +68,7 @@ extern struct tvec_base boot_tvec_bases;
 #define TIMER_DEFERRABLE               0x1LU
 #define TIMER_IRQSAFE                  0x2LU

-#define TIMER_FLAG_MASK                        0x3LU
+#define TIMER_FLAG_MASK                        0x7LU

 #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
                .entry = { .prev = TIMER_ENTRY_STATIC },        \


---------x--------------------x----------------------

Warning:

config: blackfin-allyesconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
-O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout ca713e393c6eceb54e803df204772a3d6e6c7981
  # save the attached .config to linux build tree
  make.cross ARCH=blackfin

All error/warnings:

   kernel/time/timer.c: In function 'init_timers':
>> kernel/time/timer.c:1648:2: error: call to '__compiletime_assert_1648' declared with attribute error: BUILD_BUG_ON failed: __alignof__(struct tvec_base) & TIMER_FLAG_MASK

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-29 12:01                   ` Viresh Kumar
@ 2015-03-29 17:24                     ` Peter Zijlstra
  -1 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-29 17:24 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner,
	realmz6

On Sun, Mar 29, 2015 at 05:31:32PM +0530, Viresh Kumar wrote:
> Warning:
> 
> config: blackfin-allyesconfig (attached as .config)
> reproduce:
>   wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
> -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout ca713e393c6eceb54e803df204772a3d6e6c7981
>   # save the attached .config to linux build tree
>   make.cross ARCH=blackfin
> 
> All error/warnings:
> 
>    kernel/time/timer.c: In function 'init_timers':
> >> kernel/time/timer.c:1648:2: error: call to '__compiletime_assert_1648' declared with attribute error: BUILD_BUG_ON failed: __alignof__(struct tvec_base) & TIMER_FLAG_MASK

Ha, this is because blackfin is broken.

blackfin doesn't respect ____cacheline_aligned and NOPs it for UP
builds. Why it thinks {__,}__cacheline_aligned semantics should differ
between SMP/UP is a mystery to me, we have the &_in_smp primitives for
that.

So just ignore this, let the blackfin people deal with it.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-29 17:24                     ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-29 17:24 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner,
	realmz6

On Sun, Mar 29, 2015 at 05:31:32PM +0530, Viresh Kumar wrote:
> Warning:
> 
> config: blackfin-allyesconfig (attached as .config)
> reproduce:
>   wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
> -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout ca713e393c6eceb54e803df204772a3d6e6c7981
>   # save the attached .config to linux build tree
>   make.cross ARCH=blackfin
> 
> All error/warnings:
> 
>    kernel/time/timer.c: In function 'init_timers':
> >> kernel/time/timer.c:1648:2: error: call to '__compiletime_assert_1648' declared with attribute error: BUILD_BUG_ON failed: __alignof__(struct tvec_base) & TIMER_FLAG_MASK

Ha, this is because blackfin is broken.

blackfin doesn't respect ____cacheline_aligned and NOPs it for UP
builds. Why it thinks {__,}__cacheline_aligned semantics should differ
between SMP/UP is a mystery to me, we have the &_in_smp primitives for
that.

So just ignore this, let the blackfin people deal with it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-29 10:24                   ` Peter Zijlstra
@ 2015-03-30 12:02                     ` Viresh Kumar
  -1 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-30 12:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 29 March 2015 at 15:54, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sat, Mar 28, 2015 at 02:44:57PM +0100, Peter Zijlstra wrote:
>> > Now there are few issues I see here (Sorry if they are all imaginary):
>> > - In case a timer re-arms itself from its handler and is migrated from CPU A to B, what
>> >   happens if the re-armed timer fires before the first handler finishes ? i.e. timer->fn()
>> >   hasn't finished running on CPU A and it has fired again on CPU B. Wouldn't this expose
>> >   us to a lot of other problems? It wouldn't be serialized to itself anymore ?
>>
>> What I said above.
>
> What I didn't say, but had thought of is that __run_timer() should skip
> any timer that has RUNNING set -- for obvious reasons :-)

Below is copied from your first reply, and so you probably already
said that ? :)

> Also, once you have tbase_running, we can take base->running_timer out
> altogether.

I wanted to clarify if I understood it correctly..

Are you saying that:

Case 1.) if we find tbase_running on cpuY (because it was rearmed
from its handler on cpuX and has got migrated to cpuY), then we should drop the
timer from the list without calling its handler (as that is already
running in parallel) ?

Or

Case 2.) we keep retrying for it, until the time the other handler finishes?


I have few queries for both the cases.

Case 1.) Will that be fair to the timer user as the timer may get lost
completely.
If we skip the timer on cpuY here, it wouldn't be enqueued again and
so will be lost.

Case 2.) We kept waiting for the first handler to finish ..
- cpuY may waste some cycles as it kept waiting for handler to finish on cpuX ..
- We may need to perform base unlock/lock on cpuY, so that cpuX can take cpuY's
lock to reset tbase_running. And that might be racy, not sure.

--
viresh

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-30 12:02                     ` Viresh Kumar
  0 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-30 12:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 29 March 2015 at 15:54, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sat, Mar 28, 2015 at 02:44:57PM +0100, Peter Zijlstra wrote:
>> > Now there are few issues I see here (Sorry if they are all imaginary):
>> > - In case a timer re-arms itself from its handler and is migrated from CPU A to B, what
>> >   happens if the re-armed timer fires before the first handler finishes ? i.e. timer->fn()
>> >   hasn't finished running on CPU A and it has fired again on CPU B. Wouldn't this expose
>> >   us to a lot of other problems? It wouldn't be serialized to itself anymore ?
>>
>> What I said above.
>
> What I didn't say, but had thought of is that __run_timer() should skip
> any timer that has RUNNING set -- for obvious reasons :-)

Below is copied from your first reply, and so you probably already
said that ? :)

> Also, once you have tbase_running, we can take base->running_timer out
> altogether.

I wanted to clarify if I understood it correctly..

Are you saying that:

Case 1.) if we find tbase_running on cpuY (because it was rearmed
from its handler on cpuX and has got migrated to cpuY), then we should drop the
timer from the list without calling its handler (as that is already
running in parallel) ?

Or

Case 2.) we keep retrying for it, until the time the other handler finishes?


I have few queries for both the cases.

Case 1.) Will that be fair to the timer user as the timer may get lost
completely.
If we skip the timer on cpuY here, it wouldn't be enqueued again and
so will be lost.

Case 2.) We kept waiting for the first handler to finish ..
- cpuY may waste some cycles as it kept waiting for handler to finish on cpuX ..
- We may need to perform base unlock/lock on cpuY, so that cpuX can take cpuY's
lock to reset tbase_running. And that might be racy, not sure.

--
viresh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-30 12:02                     ` Viresh Kumar
@ 2015-03-30 12:47                       ` Peter Zijlstra
  -1 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-30 12:47 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Mon, Mar 30, 2015 at 05:32:16PM +0530, Viresh Kumar wrote:
> On 29 March 2015 at 15:54, Peter Zijlstra <peterz@infradead.org> wrote:

> > What I didn't say, but had thought of is that __run_timer() should skip
> > any timer that has RUNNING set -- for obvious reasons :-)

> Below is copied from your first reply, and so you probably already
> said that ? :)
> 
> > Also, once you have tbase_running, we can take base->running_timer out
> > altogether.

No, I means something else with that. We can remove the
tvec_base::running_timer field. Everything that uses that can use
tbase_running() AFAICT.

> I wanted to clarify if I understood it correctly..
> 
> Are you saying that:

> Case 2.) we keep retrying for it, until the time the other handler finishes?

That.

If we remove it from the list before we call ->fn. Therefore, even if
migrate happens, it will not see a RUNNING timer entry, seeing how its
not actually on any lists.

The only way to get on a list while running is if ->fn() requeues itself
_on_another_base_. When that happens, we need to wait for it to complete
running.

> Case 2.) We kept waiting for the first handler to finish ..
> - cpuY may waste some cycles as it kept waiting for handler to finish on cpuX ..

True, rather silly to requeue a timer on the same jiffy as its already
running through, but yes, an unlikely possibility.

You can run another timer while we wait -- if there is another of
course.

> - We may need to perform base unlock/lock on cpuY, so that cpuX can take cpuY's
> lock to reset tbase_running. And that might be racy, not sure.

Drop yes, racy not so much I think.


diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..1394f9540348 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1189,12 +1189,39 @@ static inline void __run_timers(struct tvec_base *base)
 			cascade(base, &base->tv5, INDEX(3));
 		++base->timer_jiffies;
 		list_replace_init(base->tv1.vec + index, head);
+
+again:
 		while (!list_empty(head)) {
 			void (*fn)(unsigned long);
 			unsigned long data;
 			bool irqsafe;
 
-			timer = list_first_entry(head, struct timer_list,entry);
+			timer = list_first_entry(head, struct timer_list, entry);
+			if (unlikely(tbase_running(timer))) {
+				/* Only one timer on the list, force wait. */
+				if (unlikely(head->next == head->prev)) {
+					spin_unlock(&base->lock);
+
+					/*
+					 * The only way to get here is if the
+					 * handler requeued itself on another
+					 * base, this guarantees the timer will
+					 * not go away.
+					 */
+					while (tbase_running(timer))
+						cpu_relax();
+
+					spin_lock(&base->lock);
+				} else  {
+					/*
+					 * Otherwise, rotate the list and try
+					 * someone else.
+					 */
+					list_move_tail(&timer->entry, head);
+				}
+				goto again;
+			}
+
 			fn = timer->function;
 			data = timer->data;
 			irqsafe = tbase_get_irqsafe(timer->base);

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-30 12:47                       ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-30 12:47 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Mon, Mar 30, 2015 at 05:32:16PM +0530, Viresh Kumar wrote:
> On 29 March 2015 at 15:54, Peter Zijlstra <peterz@infradead.org> wrote:

> > What I didn't say, but had thought of is that __run_timer() should skip
> > any timer that has RUNNING set -- for obvious reasons :-)

> Below is copied from your first reply, and so you probably already
> said that ? :)
> 
> > Also, once you have tbase_running, we can take base->running_timer out
> > altogether.

No, I means something else with that. We can remove the
tvec_base::running_timer field. Everything that uses that can use
tbase_running() AFAICT.

> I wanted to clarify if I understood it correctly..
> 
> Are you saying that:

> Case 2.) we keep retrying for it, until the time the other handler finishes?

That.

If we remove it from the list before we call ->fn. Therefore, even if
migrate happens, it will not see a RUNNING timer entry, seeing how its
not actually on any lists.

The only way to get on a list while running is if ->fn() requeues itself
_on_another_base_. When that happens, we need to wait for it to complete
running.

> Case 2.) We kept waiting for the first handler to finish ..
> - cpuY may waste some cycles as it kept waiting for handler to finish on cpuX ..

True, rather silly to requeue a timer on the same jiffy as its already
running through, but yes, an unlikely possibility.

You can run another timer while we wait -- if there is another of
course.

> - We may need to perform base unlock/lock on cpuY, so that cpuX can take cpuY's
> lock to reset tbase_running. And that might be racy, not sure.

Drop yes, racy not so much I think.


diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..1394f9540348 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1189,12 +1189,39 @@ static inline void __run_timers(struct tvec_base *base)
 			cascade(base, &base->tv5, INDEX(3));
 		++base->timer_jiffies;
 		list_replace_init(base->tv1.vec + index, head);
+
+again:
 		while (!list_empty(head)) {
 			void (*fn)(unsigned long);
 			unsigned long data;
 			bool irqsafe;
 
-			timer = list_first_entry(head, struct timer_list,entry);
+			timer = list_first_entry(head, struct timer_list, entry);
+			if (unlikely(tbase_running(timer))) {
+				/* Only one timer on the list, force wait. */
+				if (unlikely(head->next == head->prev)) {
+					spin_unlock(&base->lock);
+
+					/*
+					 * The only way to get here is if the
+					 * handler requeued itself on another
+					 * base, this guarantees the timer will
+					 * not go away.
+					 */
+					while (tbase_running(timer))
+						cpu_relax();
+
+					spin_lock(&base->lock);
+				} else  {
+					/*
+					 * Otherwise, rotate the list and try
+					 * someone else.
+					 */
+					list_move_tail(&timer->entry, head);
+				}
+				goto again;
+			}
+
 			fn = timer->function;
 			data = timer->data;
 			irqsafe = tbase_get_irqsafe(timer->base);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-30 12:47                       ` Peter Zijlstra
@ 2015-03-30 13:14                         ` Viresh Kumar
  -1 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-30 13:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 30 March 2015 at 18:17, Peter Zijlstra <peterz@infradead.org> wrote:
> No, I means something else with that. We can remove the
> tvec_base::running_timer field. Everything that uses that can use
> tbase_running() AFAICT.

Okay, there is one instance which still needs it.

migrate_timers():

        BUG_ON(old_base->running_timer);

What I wasn't sure about it is if we get can drop this statement or not.
If we decide not to drop it, then we can convert running_timer into a bool.

> Drop yes, racy not so much I think.
>
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 2d3f5c504939..1394f9540348 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1189,12 +1189,39 @@ static inline void __run_timers(struct tvec_base *base)
>                         cascade(base, &base->tv5, INDEX(3));
>                 ++base->timer_jiffies;
>                 list_replace_init(base->tv1.vec + index, head);
> +
> +again:
>                 while (!list_empty(head)) {
>                         void (*fn)(unsigned long);
>                         unsigned long data;
>                         bool irqsafe;
>
> -                       timer = list_first_entry(head, struct timer_list,entry);
> +                       timer = list_first_entry(head, struct timer_list, entry);
> +                       if (unlikely(tbase_running(timer))) {
> +                               /* Only one timer on the list, force wait. */
> +                               if (unlikely(head->next == head->prev)) {
> +                                       spin_unlock(&base->lock);
> +
> +                                       /*
> +                                        * The only way to get here is if the
> +                                        * handler requeued itself on another
> +                                        * base, this guarantees the timer will
> +                                        * not go away.
> +                                        */
> +                                       while (tbase_running(timer))
> +                                               cpu_relax();
> +
> +                                       spin_lock(&base->lock);
> +                               } else  {
> +                                       /*
> +                                        * Otherwise, rotate the list and try
> +                                        * someone else.
> +                                        */
> +                                       list_move_tail(&timer->entry, head);
> +                               }
> +                               goto again;
> +                       }
> +
>                         fn = timer->function;
>                         data = timer->data;
>                         irqsafe = tbase_get_irqsafe(timer->base);

Yeah, so I have written something similar only. Wasn't sure about what you wrote
earlier. Thanks for the clarification.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-30 13:14                         ` Viresh Kumar
  0 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-30 13:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 30 March 2015 at 18:17, Peter Zijlstra <peterz@infradead.org> wrote:
> No, I means something else with that. We can remove the
> tvec_base::running_timer field. Everything that uses that can use
> tbase_running() AFAICT.

Okay, there is one instance which still needs it.

migrate_timers():

        BUG_ON(old_base->running_timer);

What I wasn't sure about it is if we get can drop this statement or not.
If we decide not to drop it, then we can convert running_timer into a bool.

> Drop yes, racy not so much I think.
>
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 2d3f5c504939..1394f9540348 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1189,12 +1189,39 @@ static inline void __run_timers(struct tvec_base *base)
>                         cascade(base, &base->tv5, INDEX(3));
>                 ++base->timer_jiffies;
>                 list_replace_init(base->tv1.vec + index, head);
> +
> +again:
>                 while (!list_empty(head)) {
>                         void (*fn)(unsigned long);
>                         unsigned long data;
>                         bool irqsafe;
>
> -                       timer = list_first_entry(head, struct timer_list,entry);
> +                       timer = list_first_entry(head, struct timer_list, entry);
> +                       if (unlikely(tbase_running(timer))) {
> +                               /* Only one timer on the list, force wait. */
> +                               if (unlikely(head->next == head->prev)) {
> +                                       spin_unlock(&base->lock);
> +
> +                                       /*
> +                                        * The only way to get here is if the
> +                                        * handler requeued itself on another
> +                                        * base, this guarantees the timer will
> +                                        * not go away.
> +                                        */
> +                                       while (tbase_running(timer))
> +                                               cpu_relax();
> +
> +                                       spin_lock(&base->lock);
> +                               } else  {
> +                                       /*
> +                                        * Otherwise, rotate the list and try
> +                                        * someone else.
> +                                        */
> +                                       list_move_tail(&timer->entry, head);
> +                               }
> +                               goto again;
> +                       }
> +
>                         fn = timer->function;
>                         data = timer->data;
>                         irqsafe = tbase_get_irqsafe(timer->base);

Yeah, so I have written something similar only. Wasn't sure about what you wrote
earlier. Thanks for the clarification.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-30 13:14                         ` Viresh Kumar
@ 2015-03-30 13:59                           ` Peter Zijlstra
  -1 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-30 13:59 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Mon, Mar 30, 2015 at 06:44:22PM +0530, Viresh Kumar wrote:
> On 30 March 2015 at 18:17, Peter Zijlstra <peterz@infradead.org> wrote:
> > No, I means something else with that. We can remove the
> > tvec_base::running_timer field. Everything that uses that can use
> > tbase_running() AFAICT.
> 
> Okay, there is one instance which still needs it.
> 
> migrate_timers():
> 
>         BUG_ON(old_base->running_timer);
> 
> What I wasn't sure about it is if we get can drop this statement or not.
> If we decide not to drop it, then we can convert running_timer into a bool.

Yeah, so that _should_ not trigger (obviously), and while I agree with
the sentiment of sanity checks, I'm not sure its worth keeping that
variable around just for that.

Anyway, while I'm looking at struct tvec_base I notice the cpu member
should be second after the lock, that'll save 8 bytes on the structure
on 64bit machines.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-30 13:59                           ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-30 13:59 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Mon, Mar 30, 2015 at 06:44:22PM +0530, Viresh Kumar wrote:
> On 30 March 2015 at 18:17, Peter Zijlstra <peterz@infradead.org> wrote:
> > No, I means something else with that. We can remove the
> > tvec_base::running_timer field. Everything that uses that can use
> > tbase_running() AFAICT.
> 
> Okay, there is one instance which still needs it.
> 
> migrate_timers():
> 
>         BUG_ON(old_base->running_timer);
> 
> What I wasn't sure about it is if we get can drop this statement or not.
> If we decide not to drop it, then we can convert running_timer into a bool.

Yeah, so that _should_ not trigger (obviously), and while I agree with
the sentiment of sanity checks, I'm not sure its worth keeping that
variable around just for that.

Anyway, while I'm looking at struct tvec_base I notice the cpu member
should be second after the lock, that'll save 8 bytes on the structure
on 64bit machines.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-28  9:53             ` Peter Zijlstra
@ 2015-03-30 15:08               ` Michal Hocko
  -1 siblings, 0 replies; 58+ messages in thread
From: Michal Hocko @ 2015-03-30 15:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Viresh Kumar, Andrew Morton, hannes, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Thomas Gleixner

On Sat 28-03-15 10:53:22, Peter Zijlstra wrote:
[...]
> Alternatively the thing hocko suggests is an utter fail too. You cannot
> stuff that into hardirq context, that's insane.

I guess you are referring to
http://article.gmane.org/gmane.linux.kernel.mm/127569, right?

Why cannot we do something like refresh_cpu_vm_stats from the IRQ
context?  Especially the first zone stat part. The per-cpu pagesets is
more costly and it would need a special treatment, alright. A simple
way would be to splice the lists from the per-cpu context and then free
those pages from the kthread context.

I am still wondering why those two things were squashed into a single
place. Why kswapd is not doing the pcp cleanup?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-30 15:08               ` Michal Hocko
  0 siblings, 0 replies; 58+ messages in thread
From: Michal Hocko @ 2015-03-30 15:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Viresh Kumar, Andrew Morton, hannes, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Thomas Gleixner

On Sat 28-03-15 10:53:22, Peter Zijlstra wrote:
[...]
> Alternatively the thing hocko suggests is an utter fail too. You cannot
> stuff that into hardirq context, that's insane.

I guess you are referring to
http://article.gmane.org/gmane.linux.kernel.mm/127569, right?

Why cannot we do something like refresh_cpu_vm_stats from the IRQ
context?  Especially the first zone stat part. The per-cpu pagesets is
more costly and it would need a special treatment, alright. A simple
way would be to splice the lists from the per-cpu context and then free
those pages from the kthread context.

I am still wondering why those two things were squashed into a single
place. Why kswapd is not doing the pcp cleanup?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-30 15:08               ` Michal Hocko
@ 2015-03-30 15:14                 ` Peter Zijlstra
  -1 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-30 15:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Viresh Kumar, Andrew Morton, hannes, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Thomas Gleixner

On Mon, Mar 30, 2015 at 05:08:18PM +0200, Michal Hocko wrote:
> On Sat 28-03-15 10:53:22, Peter Zijlstra wrote:
> [...]
> > Alternatively the thing hocko suggests is an utter fail too. You cannot
> > stuff that into hardirq context, that's insane.
> 
> I guess you are referring to
> http://article.gmane.org/gmane.linux.kernel.mm/127569, right?
> 
> Why cannot we do something like refresh_cpu_vm_stats from the IRQ
> context?  Especially the first zone stat part.

Big machines have big zone counts. There are machines with >200 nodes.
Although with the current trend of bigger nodes, the number of nodes
seems to come down as well. Still.

> The per-cpu pagesets is
> more costly and it would need a special treatment, alright. A simple
> way would be to splice the lists from the per-cpu context and then free
> those pages from the kthread context.
> 
> I am still wondering why those two things were squashed into a single
> place. Why kswapd is not doing the pcp cleanup?

Probably because they could be. The problem with kswapd is that its per
node, not per cpu.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-30 15:14                 ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-30 15:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Viresh Kumar, Andrew Morton, hannes, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Thomas Gleixner

On Mon, Mar 30, 2015 at 05:08:18PM +0200, Michal Hocko wrote:
> On Sat 28-03-15 10:53:22, Peter Zijlstra wrote:
> [...]
> > Alternatively the thing hocko suggests is an utter fail too. You cannot
> > stuff that into hardirq context, that's insane.
> 
> I guess you are referring to
> http://article.gmane.org/gmane.linux.kernel.mm/127569, right?
> 
> Why cannot we do something like refresh_cpu_vm_stats from the IRQ
> context?  Especially the first zone stat part.

Big machines have big zone counts. There are machines with >200 nodes.
Although with the current trend of bigger nodes, the number of nodes
seems to come down as well. Still.

> The per-cpu pagesets is
> more costly and it would need a special treatment, alright. A simple
> way would be to splice the lists from the per-cpu context and then free
> those pages from the kthread context.
> 
> I am still wondering why those two things were squashed into a single
> place. Why kswapd is not doing the pcp cleanup?

Probably because they could be. The problem with kswapd is that its per
node, not per cpu.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-30 15:08               ` Michal Hocko
@ 2015-03-30 15:42                 ` Christoph Lameter
  -1 siblings, 0 replies; 58+ messages in thread
From: Christoph Lameter @ 2015-03-30 15:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Peter Zijlstra, Viresh Kumar, Andrew Morton, hannes,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Thomas Gleixner

On Mon, 30 Mar 2015, Michal Hocko wrote:

> Why cannot we do something like refresh_cpu_vm_stats from the IRQ
> context?  Especially the first zone stat part. The per-cpu pagesets is
> more costly and it would need a special treatment, alright. A simple
> way would be to splice the lists from the per-cpu context and then free
> those pages from the kthread context.

That would work.

> I am still wondering why those two things were squashed into a single
> place. Why kswapd is not doing the pcp cleanup?

They were squashed together by me for conveniences sake. They could be
separated. AFAICT the pcp cleanup could be done only on demand and we
already have logic for that when flushihng via IPI.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-30 15:42                 ` Christoph Lameter
  0 siblings, 0 replies; 58+ messages in thread
From: Christoph Lameter @ 2015-03-30 15:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Peter Zijlstra, Viresh Kumar, Andrew Morton, hannes,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, mgorman, dave, koct9i, Linux Memory Management List,
	Suresh Siddha, Thomas Gleixner

On Mon, 30 Mar 2015, Michal Hocko wrote:

> Why cannot we do something like refresh_cpu_vm_stats from the IRQ
> context?  Especially the first zone stat part. The per-cpu pagesets is
> more costly and it would need a special treatment, alright. A simple
> way would be to splice the lists from the per-cpu context and then free
> those pages from the kthread context.

That would work.

> I am still wondering why those two things were squashed into a single
> place. Why kswapd is not doing the pcp cleanup?

They were squashed together by me for conveniences sake. They could be
separated. AFAICT the pcp cleanup could be done only on demand and we
already have logic for that when flushihng via IPI.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-30 13:59                           ` Peter Zijlstra
@ 2015-03-30 16:17                             ` Viresh Kumar
  -1 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-30 16:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 30 March 2015 at 19:29, Peter Zijlstra <peterz@infradead.org> wrote:
> Yeah, so that _should_ not trigger (obviously), and while I agree with
> the sentiment of sanity checks, I'm not sure its worth keeping that
> variable around just for that.

I read it as I can remove it then ? :)

> Anyway, while I'm looking at struct tvec_base I notice the cpu member
> should be second after the lock, that'll save 8 bytes on the structure
> on 64bit machines.

Hmm, I tried it on my macbook-pro.

$ uname -a
Linux vireshk 3.13.0-46-generic #79-Ubuntu SMP Tue Mar 10 20:06:50 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux

$ gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2

config: arch/x86/configs/x86_64_defconfig

And all I get it is 8256 bytes, with or without the change.

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..afc5d74678df 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -77,12 +77,12 @@ struct tvec_root {

 struct tvec_base {
        spinlock_t lock;
+       int cpu;
        struct timer_list *running_timer;
        unsigned long timer_jiffies;
        unsigned long next_timer;
        unsigned long active_timers;
        unsigned long all_timers;
-       int cpu;
        struct tvec_root tv1;
        struct tvec tv2;
        struct tvec tv3;

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-30 16:17                             ` Viresh Kumar
  0 siblings, 0 replies; 58+ messages in thread
From: Viresh Kumar @ 2015-03-30 16:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On 30 March 2015 at 19:29, Peter Zijlstra <peterz@infradead.org> wrote:
> Yeah, so that _should_ not trigger (obviously), and while I agree with
> the sentiment of sanity checks, I'm not sure its worth keeping that
> variable around just for that.

I read it as I can remove it then ? :)

> Anyway, while I'm looking at struct tvec_base I notice the cpu member
> should be second after the lock, that'll save 8 bytes on the structure
> on 64bit machines.

Hmm, I tried it on my macbook-pro.

$ uname -a
Linux vireshk 3.13.0-46-generic #79-Ubuntu SMP Tue Mar 10 20:06:50 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux

$ gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2

config: arch/x86/configs/x86_64_defconfig

And all I get it is 8256 bytes, with or without the change.

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..afc5d74678df 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -77,12 +77,12 @@ struct tvec_root {

 struct tvec_base {
        spinlock_t lock;
+       int cpu;
        struct timer_list *running_timer;
        unsigned long timer_jiffies;
        unsigned long next_timer;
        unsigned long active_timers;
        unsigned long all_timers;
-       int cpu;
        struct tvec_root tv1;
        struct tvec tv2;
        struct tvec tv3;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
  2015-03-30 16:17                             ` Viresh Kumar
@ 2015-03-30 16:25                               ` Peter Zijlstra
  -1 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-30 16:25 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Mon, Mar 30, 2015 at 09:47:01PM +0530, Viresh Kumar wrote:
> And all I get it is 8256 bytes, with or without the change.

Duh, rounded up to cacheline boundary ;-)

Trades two 4 byte holes at the start for a bigger 'hole' at the end.

struct tvec_base {
        spinlock_t                 lock;                 /*     0     2 */

        /* XXX 6 bytes hole, try to pack */

        struct timer_list *        running_timer;        /*     8     8 */
        long unsigned int          timer_jiffies;        /*    16     8 */
        long unsigned int          next_timer;           /*    24     8 */
        long unsigned int          active_timers;        /*    32     8 */
        long unsigned int          all_timers;           /*    40     8 */
        int                        cpu;                  /*    48     4 */

        /* XXX 4 bytes hole, try to pack */

        struct tvec_root           tv1;                  /*    56  4096 */
        /* --- cacheline 64 boundary (4096 bytes) was 56 bytes ago --- */
        struct tvec                tv2;                  /*  4152  1024 */
        /* --- cacheline 80 boundary (5120 bytes) was 56 bytes ago --- */
        struct tvec                tv3;                  /*  5176  1024 */
        /* --- cacheline 96 boundary (6144 bytes) was 56 bytes ago --- */
        struct tvec                tv4;                  /*  6200  1024 */
        /* --- cacheline 112 boundary (7168 bytes) was 56 bytes ago --- */
        struct tvec                tv5;                  /*  7224  1024 */
        /* --- cacheline 128 boundary (8192 bytes) was 56 bytes ago --- */

        /* size: 8256, cachelines: 129, members: 12 */
        /* sum members: 8238, holes: 2, sum holes: 10 */
        /* padding: 8 */
};

vs

struct tvec_base {
	spinlock_t                 lock;                 /*     0     2 */

	/* XXX 2 bytes hole, try to pack */

	int                        cpu;                  /*     4     4 */
	struct timer_list *        running_timer;        /*     8     8 */
	long unsigned int          timer_jiffies;        /*    16     8 */
	long unsigned int          next_timer;           /*    24     8 */
	long unsigned int          active_timers;        /*    32     8 */
	long unsigned int          all_timers;           /*    40     8 */
	struct tvec_root           tv1;                  /*    48  4096 */
	/* --- cacheline 64 boundary (4096 bytes) was 48 bytes ago --- */
	struct tvec                tv2;                  /*  4144  1024 */
	/* --- cacheline 80 boundary (5120 bytes) was 48 bytes ago --- */
	struct tvec                tv3;                  /*  5168  1024 */
	/* --- cacheline 96 boundary (6144 bytes) was 48 bytes ago --- */
	struct tvec                tv4;                  /*  6192  1024 */
	/* --- cacheline 112 boundary (7168 bytes) was 48 bytes ago --- */
	struct tvec                tv5;                  /*  7216  1024 */
	/* --- cacheline 128 boundary (8192 bytes) was 48 bytes ago --- */

	/* size: 8256, cachelines: 129, members: 12 */
	/* sum members: 8238, holes: 1, sum holes: 2 */
	/* padding: 16 */
};

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
@ 2015-03-30 16:25                               ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2015-03-30 16:25 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Andrew Morton, Johannes Weiner, Christoph Lameter,
	Linaro Kernel Mailman List, Linux Kernel Mailing List, vinmenon,
	shashim, Michal Hocko, Mel Gorman, dave, Konstantin Khlebnikov,
	Linux Memory Management List, Suresh Siddha, Thomas Gleixner

On Mon, Mar 30, 2015 at 09:47:01PM +0530, Viresh Kumar wrote:
> And all I get it is 8256 bytes, with or without the change.

Duh, rounded up to cacheline boundary ;-)

Trades two 4 byte holes at the start for a bigger 'hole' at the end.

struct tvec_base {
        spinlock_t                 lock;                 /*     0     2 */

        /* XXX 6 bytes hole, try to pack */

        struct timer_list *        running_timer;        /*     8     8 */
        long unsigned int          timer_jiffies;        /*    16     8 */
        long unsigned int          next_timer;           /*    24     8 */
        long unsigned int          active_timers;        /*    32     8 */
        long unsigned int          all_timers;           /*    40     8 */
        int                        cpu;                  /*    48     4 */

        /* XXX 4 bytes hole, try to pack */

        struct tvec_root           tv1;                  /*    56  4096 */
        /* --- cacheline 64 boundary (4096 bytes) was 56 bytes ago --- */
        struct tvec                tv2;                  /*  4152  1024 */
        /* --- cacheline 80 boundary (5120 bytes) was 56 bytes ago --- */
        struct tvec                tv3;                  /*  5176  1024 */
        /* --- cacheline 96 boundary (6144 bytes) was 56 bytes ago --- */
        struct tvec                tv4;                  /*  6200  1024 */
        /* --- cacheline 112 boundary (7168 bytes) was 56 bytes ago --- */
        struct tvec                tv5;                  /*  7224  1024 */
        /* --- cacheline 128 boundary (8192 bytes) was 56 bytes ago --- */

        /* size: 8256, cachelines: 129, members: 12 */
        /* sum members: 8238, holes: 2, sum holes: 10 */
        /* padding: 8 */
};

vs

struct tvec_base {
	spinlock_t                 lock;                 /*     0     2 */

	/* XXX 2 bytes hole, try to pack */

	int                        cpu;                  /*     4     4 */
	struct timer_list *        running_timer;        /*     8     8 */
	long unsigned int          timer_jiffies;        /*    16     8 */
	long unsigned int          next_timer;           /*    24     8 */
	long unsigned int          active_timers;        /*    32     8 */
	long unsigned int          all_timers;           /*    40     8 */
	struct tvec_root           tv1;                  /*    48  4096 */
	/* --- cacheline 64 boundary (4096 bytes) was 48 bytes ago --- */
	struct tvec                tv2;                  /*  4144  1024 */
	/* --- cacheline 80 boundary (5120 bytes) was 48 bytes ago --- */
	struct tvec                tv3;                  /*  5168  1024 */
	/* --- cacheline 96 boundary (6144 bytes) was 48 bytes ago --- */
	struct tvec                tv4;                  /*  6192  1024 */
	/* --- cacheline 112 boundary (7168 bytes) was 48 bytes ago --- */
	struct tvec                tv5;                  /*  7216  1024 */
	/* --- cacheline 128 boundary (8192 bytes) was 48 bytes ago --- */

	/* size: 8256, cachelines: 129, members: 12 */
	/* sum members: 8238, holes: 1, sum holes: 2 */
	/* padding: 16 */
};

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2015-03-30 16:25 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-26  5:39 [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work Viresh Kumar
2015-03-26  5:39 ` Viresh Kumar
2015-03-26 20:18 ` Andrew Morton
2015-03-26 20:18   ` Andrew Morton
2015-03-27  4:49   ` Viresh Kumar
2015-03-27  4:49     ` Viresh Kumar
2015-03-27  9:16     ` Peter Zijlstra
2015-03-27  9:16       ` Peter Zijlstra
2015-03-27  9:30       ` Peter Zijlstra
2015-03-27  9:30         ` Peter Zijlstra
2015-03-27 11:11         ` Christoph Lameter
2015-03-27 11:11           ` Christoph Lameter
2015-03-27 12:02           ` Peter Zijlstra
2015-03-27 12:02             ` Peter Zijlstra
2015-03-27 19:45             ` Christoph Lameter
2015-03-27 19:45               ` Christoph Lameter
2015-03-28  4:28             ` Viresh Kumar
2015-03-28  4:28               ` Viresh Kumar
2015-03-28 11:41               ` Peter Zijlstra
2015-03-28 11:41                 ` Peter Zijlstra
2015-03-28  4:18         ` Viresh Kumar
2015-03-28  4:18           ` Viresh Kumar
2015-03-28  9:53           ` Peter Zijlstra
2015-03-28  9:53             ` Peter Zijlstra
2015-03-28 11:57             ` viresh kumar
2015-03-28 11:57               ` viresh kumar
2015-03-28 12:04               ` Viresh Kumar
2015-03-28 12:04                 ` Viresh Kumar
2015-03-28 13:44               ` Peter Zijlstra
2015-03-28 13:44                 ` Peter Zijlstra
2015-03-29 10:24                 ` Peter Zijlstra
2015-03-29 10:24                   ` Peter Zijlstra
2015-03-30 12:02                   ` Viresh Kumar
2015-03-30 12:02                     ` Viresh Kumar
2015-03-30 12:47                     ` Peter Zijlstra
2015-03-30 12:47                       ` Peter Zijlstra
2015-03-30 13:14                       ` Viresh Kumar
2015-03-30 13:14                         ` Viresh Kumar
2015-03-30 13:59                         ` Peter Zijlstra
2015-03-30 13:59                           ` Peter Zijlstra
2015-03-30 16:17                           ` Viresh Kumar
2015-03-30 16:17                             ` Viresh Kumar
2015-03-30 16:25                             ` Peter Zijlstra
2015-03-30 16:25                               ` Peter Zijlstra
2015-03-29 12:01                 ` Viresh Kumar
2015-03-29 12:01                   ` Viresh Kumar
2015-03-29 17:24                   ` Peter Zijlstra
2015-03-29 17:24                     ` Peter Zijlstra
2015-03-30 15:08             ` Michal Hocko
2015-03-30 15:08               ` Michal Hocko
2015-03-30 15:14               ` Peter Zijlstra
2015-03-30 15:14                 ` Peter Zijlstra
2015-03-30 15:42               ` Christoph Lameter
2015-03-30 15:42                 ` Christoph Lameter
2015-03-27 14:19 ` Michal Hocko
2015-03-27 14:19   ` Michal Hocko
2015-03-28  4:34   ` Viresh Kumar
2015-03-28  4:34     ` Viresh Kumar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.