From: Viresh Kumar <viresh.kumar@linaro.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linaro-kernel@lists.linaro.org, linux-kernel@vger.kernel.org,
	Steven Miao <realmz6@gmail.com>,
	shashim@codeaurora.org
Subject: Re: [PATCH 1/2] timer: Avoid waking up an idle-core by migrate running timer
Date: Fri, 17 Apr 2015 13:42:54 +0530
Message-ID: <5530C086.2020700@linaro.org>
In-Reply-To: <alpine.DEB.2.11.1504150008470.3845@nanos>


Hi Thomas,

On Wednesday 15 April 2015 04:43 AM, Thomas Gleixner wrote:
> No new flags in the timer base, no concurrent expiry, no extra
> conditionals in the expiry path. Just a single extra check at the end
> of the softirq handler for this rare condition instead of imposing all
> that nonsense to the hotpath.

Here is what I could get to based on your suggestions:

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..c412511d34b8 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -78,6 +78,8 @@ struct tvec_root {
 struct tvec_base {
        spinlock_t lock;
        struct timer_list *running_timer;
+       struct list_head migration_list;
+       struct tvec_base *preferred_target;
        unsigned long timer_jiffies;
        unsigned long next_timer;
        unsigned long active_timers;
@@ -785,10 +787,18 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
                 * We are trying to schedule the timer on the local CPU.
                 * However we can't change timer's base while it is running,
                 * otherwise del_timer_sync() can't detect that the timer's
-                * handler yet has not finished. This also guarantees that
-                * the timer is serialized wrt itself.
+                * handler has not finished yet.
+                *
+                * Move the timer to migration_list, which is processed after
+                * all timers in the current cycle are serviced.  This also
+                * guarantees that the timer is serialized wrt itself.
                 */
-               if (likely(base->running_timer != timer)) {
+               if (unlikely(base->running_timer == timer)) {
+                       timer->expires = expires;
+                       base->preferred_target = new_base;
+                       list_add_tail(&timer->entry, &base->migration_list);
+                       goto out_unlock;
+               } else {
                        /* See the comment in lock_timer_base() */
                        timer_set_base(timer, NULL);
                        spin_unlock(&base->lock);
@@ -1214,6 +1224,33 @@ static inline void __run_timers(struct tvec_base *base)
                                spin_lock_irq(&base->lock);
                        }
                }
+
+               /*
+                * The timer's handler re-armed it, and we didn't want to
+                * queue it on an idle local CPU. Its handler has finished
+                * now, so let's enqueue it again.
+                */
+               if (unlikely(!list_empty(&base->migration_list))) {
+                       struct tvec_base *new_base = base->preferred_target;
+
+                       do {
+                               timer = list_first_entry(&base->migration_list,
+                                                        struct timer_list, entry);
+
+                               __list_del(timer->entry.prev, timer->entry.next);
+
+                               /* See the comment in lock_timer_base() */
+                               timer_set_base(timer, NULL);
+                               spin_unlock(&base->lock);
+
+                               spin_lock(&new_base->lock);
+                               timer_set_base(timer, new_base);
+                               internal_add_timer(new_base, timer);
+                               spin_unlock(&new_base->lock);
+
+                               spin_lock(&base->lock);
+                       } while (!list_empty(&base->migration_list));
+               }
        }
        base->running_timer = NULL;
        spin_unlock_irq(&base->lock);
@@ -1583,6 +1620,7 @@ static int init_timers_cpu(int cpu)
        for (j = 0; j < TVR_SIZE; j++)
                INIT_LIST_HEAD(base->tv1.vec + j);

+       INIT_LIST_HEAD(&base->migration_list);
        base->timer_jiffies = jiffies;
        base->next_timer = base->timer_jiffies;
        base->active_timers = 0;

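For reference, the case this targets is a timer handler re-arming its own
timer while it runs; a hypothetical example (poll_timer/poll_fn are made
up names):

	/* somewhere in init: setup_timer(&poll_timer, poll_fn, 0); */
	static struct timer_list poll_timer;

	static void poll_fn(unsigned long data)
	{
		/* ... periodic work ... */

		/*
		 * While poll_fn() runs, base->running_timer == &poll_timer,
		 * so if get_nohz_timer_target() picks another CPU, this
		 * mod_timer() now takes the migration_list path instead of
		 * queueing on the (possibly idle) old base.
		 */
		mod_timer(&poll_timer, jiffies + HZ);
	}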

Does this look any better?

I tested this on an Exynos (dual ARM Cortex-A9) board running Ubuntu.
The system was fairly idle, and I ran 'dmesg > /dev/null' in an infinite
loop on one of the CPUs to bring the CPUs out of idle.


I have a few more concerns/diffs on top of this to optimize things
further, but I have kept them separate so that I can drop them if they
aren't correct.

1.) Should the above list_empty(migration_list) block be moved out of the

	while (time_after_eq(jiffies, base->timer_jiffies))

    loop, so that we check it only once per timer interrupt?

    Also, ->running_timer is still set to the last serviced timer, and a
    del_timer_sync() might be waiting to remove it. But we process the
    migration list first, without clearing ->running_timer. Not sure if
    this is important at all. A rough sketch of what I mean follows.
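
    Something like this (an untested sketch; migrate_pending_timers() is
    a hypothetical helper that would hold the dequeue/retarget loop from
    the diff above):

	static inline void __run_timers(struct tvec_base *base)
	{
		spin_lock_irq(&base->lock);
		while (time_after_eq(jiffies, base->timer_jiffies)) {
			/* ... existing cascade and expiry code ... */
		}

		/*
		 * Process re-armed timers once per softirq instead of
		 * once per jiffy. Open question: should ->running_timer
		 * be cleared before this, so that a waiting
		 * del_timer_sync() isn't held off while we re-enqueue?
		 */
		if (unlikely(!list_empty(&base->migration_list)))
			migrate_pending_timers(base);

		base->running_timer = NULL;
		spin_unlock_irq(&base->lock);
	}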


2.) By the time we finish servicing all pending timers, the local CPU
    might not be idle anymore, OR the target CPU may have gone idle.

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index c412511d34b8..d75d31e9dc49 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1233,6 +1233,14 @@ static inline void __run_timers(struct tvec_base *base)
                if (unlikely(!list_empty(&base->migration_list))) {
                        struct tvec_base *new_base = base->preferred_target;

+                       if (!idle_cpu(base->cpu)) {
+                               /* Local CPU not idle anymore */
+                               new_base = base;
+                       } else if (idle_cpu(new_base->cpu)) {
+                               /* Re-evaluate base, target CPU has gone idle */
+                               new_base = per_cpu(tvec_bases, get_nohz_timer_target(false));
+                       }
+
                        do {
                                timer = list_first_entry(&base->migration_list,
                                                         struct timer_list, entry);


The first if block gets hit a lot of times in my tests, but the second
one has never been hit (probably because there are only two CPUs, not
sure).


3.) It might be better to update preferred_target every time we choose a
    different base. This may help us avoid calling get_nohz_timer_target()
    in the second if block above.

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index d75d31e9dc49..558cd98abd87 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -783,6 +783,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
        new_base = per_cpu(tvec_bases, cpu);

        if (base != new_base) {
+               base->preferred_target = new_base;
+
                /*
                 * We are trying to schedule the timer on the local CPU.
                 * However we can't change timer's base while it is running,
@@ -795,7 +797,6 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
                 */
                if (unlikely(base->running_timer == timer)) {
                        timer->expires = expires;
-                       base->preferred_target = new_base;
                        list_add_tail(&timer->entry, &base->migration_list);
                        goto out_unlock;
                } else {



--
viresh
