From: Peter Zijlstra <peterz@infradead.org>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Christoph Lameter <cl@linux.com>,
	Linaro Kernel Mailman List <linaro-kernel@lists.linaro.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	vinmenon@codeaurora.org, shashim@codeaurora.org,
	Michal Hocko <mhocko@suse.cz>, Mel Gorman <mgorman@suse.de>,
	dave@stgolabs.net, Konstantin Khlebnikov <koct9i@gmail.com>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Suresh Siddha <suresh.b.siddha@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
Date: Mon, 30 Mar 2015 14:47:46 +0200	[thread overview]
Message-ID: <20150330124746.GI21418@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <CAKohpon2GSpk+6pNuHEsDC55hHtowwfGJivPM0Gh0wt1A2cd-w@mail.gmail.com>

On Mon, Mar 30, 2015 at 05:32:16PM +0530, Viresh Kumar wrote:
> On 29 March 2015 at 15:54, Peter Zijlstra <peterz@infradead.org> wrote:

> > What I didn't say, but had thought of is that __run_timer() should skip
> > any timer that has RUNNING set -- for obvious reasons :-)

> Below is copied from your first reply, and so you probably already
> said that ? :)
> 
> > Also, once you have tbase_running, we can take base->running_timer out
> > altogether.

No, I meant something else with that. We can remove the
tvec_base::running_timer field. Everything that uses it can use
tbase_running() instead, AFAICT.
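A minimal userspace sketch of why the per-base field becomes redundant, assuming the RUNNING state is kept per timer in a spare low bit of timer->base, next to the deferrable/irqsafe bits that timer.c already packs there. TIMER_RUNNING, tbase_set_running() and the struct layouts are hypothetical illustrations, not mainline code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits folded into the low bits of timer->base. The first two
 * mirror the existing deferrable/irqsafe bits; TIMER_RUNNING is an
 * assumed extra bit for this sketch only. */
#define TIMER_DEFERRABLE 0x1UL
#define TIMER_IRQSAFE    0x2UL
#define TIMER_RUNNING    0x4UL   /* hypothetical */
#define TIMER_FLAG_MASK  0x7UL

struct tvec_base {
	_Alignas(8) int cpu;     /* 8-byte alignment leaves 3 free low bits */
};

struct timer_list {
	struct tvec_base *base;  /* base pointer with flag bits folded in */
};

/* Strip the flag bits to recover the real base pointer. */
static struct tvec_base *tbase_get_base(const struct timer_list *t)
{
	return (struct tvec_base *)((uintptr_t)t->base & ~TIMER_FLAG_MASK);
}

/* Per-timer RUNNING test: no tvec_base::running_timer needed. */
static bool tbase_running(const struct timer_list *t)
{
	return ((uintptr_t)t->base & TIMER_RUNNING) != 0;
}

/* Set or clear RUNNING without disturbing the pointer or other flags. */
static void tbase_set_running(struct timer_list *t, bool on)
{
	uintptr_t v = (uintptr_t)t->base;

	v = on ? (v | TIMER_RUNNING) : (v & ~TIMER_RUNNING);
	t->base = (struct tvec_base *)v;
}
```

Because the bit travels with the timer rather than with the base, any CPU that can see the timer can ask whether its handler is in flight, which is what the running_timer field was otherwise used for.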

> I wanted to clarify if I understood it correctly..
> 
> Are you saying that:

> Case 2.) we keep retrying for it, until the time the other handler finishes?

That.

We remove the timer from the list before we call ->fn(). Therefore,
even if a migration happens, it will not see a RUNNING timer entry,
since the timer is not actually on any list.
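To illustrate the ordering described above (detach, mark running, call the handler, clear running), here is a minimal userspace model rather than kernel code. The list helpers imitate <linux/list.h>, the `running` field stands in for the proposed RUNNING bit, and check_detached() is a hypothetical handler that records whether it ran while detached:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct timer {
	struct list_head entry;          /* first member, so a cast recovers the timer */
	bool running;                    /* stand-in for the RUNNING bit */
	void (*fn)(struct timer *);
};

/* Expire everything on @head: the timer leaves the list before its
 * handler runs, so no list ever holds a RUNNING entry here. */
static void run_expired(struct list_head *head)
{
	while (!list_empty(head)) {
		struct timer *t = (struct timer *)head->next;

		list_del_init(&t->entry);
		t->running = true;
		t->fn(t);
		t->running = false;
	}
}

/* Hypothetical handler: counts calls that observed "running and detached". */
static int detached_calls;

static void check_detached(struct timer *t)
{
	if (t->running && list_empty(&t->entry))
		detached_calls++;
}
```

A migration walking the lists while a handler runs therefore only finds timers that are not running, unless the handler itself put one back.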

The only way for a timer to get on a list while running is if ->fn()
requeues itself _on_another_base_. When that happens, we need to wait
for it to complete running.
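A small userspace model of that single exception, under the same assumptions as a plain list_head imitation. The `requeue_to` field, requeue_self() and must_wait() are hypothetical names for illustration: the handler re-arms itself on a second base, and only during that window is the timer both RUNNING and queued:

```c
#include <assert.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct timer {
	struct list_head entry;          /* first member */
	bool running;                    /* stand-in for the RUNNING bit */
	struct list_head *requeue_to;    /* hypothetical: base the handler re-arms on */
	void (*fn)(struct timer *);
};

/* Queued while still running: the case where the expiring CPU must wait. */
static bool must_wait(const struct timer *t)
{
	return t->running && !list_empty(&t->entry);
}

static int must_wait_seen;

/* Hypothetical handler that requeues itself on another base. */
static void requeue_self(struct timer *t)
{
	list_add_tail(&t->entry, t->requeue_to);
	if (must_wait(t))
		must_wait_seen++;
}

/* Expire the first timer on @head (detach, run, clear RUNNING). */
static void run_one(struct list_head *head)
{
	struct timer *t = (struct timer *)head->next;

	list_del_init(&t->entry);
	t->running = true;
	t->fn(t);
	t->running = false;
}
```

Once the handler returns and RUNNING clears, the timer sits on the second base as an ordinary pending timer.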

> Case 2.) We kept waiting for the first handler to finish ..
> - cpuY may waste some cycles as it kept waiting for handler to finish on cpuX ..

True, it is rather silly to requeue a timer on the same jiffy it is
already running through, but yes, it is an unlikely possibility.

We can run another timer while we wait, if there is another one, of
course.
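The "run another timer while we wait" idea corresponds to the list_move_tail() branch of the patch. Below is a userspace sketch of that selection step, assuming list_head-style helpers and a `running` flag in place of tbase_running(). pick_runnable() is a hypothetical name, and bounding the rotation to one pass over the list (returning NULL when every queued timer is running) is an addition of this sketch, not something the patch does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static void list_move_tail(struct list_head *n, struct list_head *h)
{
	list_del_init(n);
	list_add_tail(n, h);
}

struct timer {
	struct list_head entry;          /* first member */
	bool running;                    /* stand-in for tbase_running() */
};

/* Return the first non-running timer, rotating running ones to the tail.
 * NULL means only running timers remain, so the caller has to wait. */
static struct timer *pick_runnable(struct list_head *head)
{
	size_t n = 0;

	for (struct list_head *p = head->next; p != head; p = p->next)
		n++;

	for (size_t i = 0; i < n; i++) {
		struct timer *t = (struct timer *)head->next;

		if (!t->running)
			return t;
		list_move_tail(&t->entry, head);
	}
	return NULL;
}
```

For a list holding a running timer followed by an idle one, the running timer is rotated to the tail and the idle one is returned for expiry.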

> - We may need to perform base unlock/lock on cpuY, so that cpuX can take cpuY's
> lock to reset tbase_running. And that might be racy, not sure.

We do need to drop the lock, yes; racy, not so much, I think.
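Following Viresh's point that cpuX may need cpuY's base lock to clear the RUNNING state, the waiter must not hold its lock while spinning. Here is a userspace sketch of the unlock, spin, relock sequence with C11 atomics. struct toy_lock, toy_lock()/toy_unlock() and wait_for_running_timer() are hypothetical stand-ins for base->lock, spin_lock()/spin_unlock() and the loop in the patch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock standing in for base->lock. */
struct toy_lock { atomic_flag locked; };

static void toy_lock(struct toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		;	/* busy-wait; the kernel would use cpu_relax() */
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

struct timer { atomic_bool running; };

/*
 * Called with @l held. Drop it so the CPU running the handler can take
 * this base's lock if it needs to, spin until RUNNING clears, then retake
 * the lock before looking at the list again (the caller restarts its scan).
 */
static void wait_for_running_timer(struct toy_lock *l, struct timer *t)
{
	toy_unlock(l);
	while (atomic_load_explicit(&t->running, memory_order_acquire))
		;	/* cpu_relax() in the kernel */
	toy_lock(l);
}
```

Since the lock is retaken before the list is touched again and the scan restarts from the head, whatever changed while the lock was dropped is simply re-examined.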


diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..1394f9540348 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1189,12 +1189,39 @@ static inline void __run_timers(struct tvec_base *base)
 			cascade(base, &base->tv5, INDEX(3));
 		++base->timer_jiffies;
 		list_replace_init(base->tv1.vec + index, head);
+
+again:
 		while (!list_empty(head)) {
 			void (*fn)(unsigned long);
 			unsigned long data;
 			bool irqsafe;
 
-			timer = list_first_entry(head, struct timer_list,entry);
+			timer = list_first_entry(head, struct timer_list, entry);
+			if (unlikely(tbase_running(timer))) {
+				/* Only one timer on the list, force wait. */
+				if (unlikely(head->next == head->prev)) {
+					spin_unlock(&base->lock);
+
+					/*
+					 * The only way to get here is if the
+					 * handler requeued itself on another
+					 * base; this guarantees the timer will
+					 * not go away.
+					 */
+					while (tbase_running(timer))
+						cpu_relax();
+
+					spin_lock(&base->lock);
+				} else {
+					/*
+					 * Otherwise, rotate the list and try
+					 * someone else.
+					 */
+					list_move_tail(&timer->entry, head);
+				}
+				goto again;
+			}
+
 			fn = timer->function;
 			data = timer->data;
 			irqsafe = tbase_get_irqsafe(timer->base);
