From: Yong Zhang <yong.zhang0@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Charles Wang" <muming.wq@gmail.com>,
	linux-kernel@vger.kernel.org, "Ingo Molnar" <mingo@redhat.com>,
	"Tao Ma" <tm@tao.ma>, 含黛 <handai.szj@taobao.com>,
	"Doug Smythies" <dsmythies@telus.net>,
	"Thomas Gleixner" <tglx@linutronix.de>
Subject: Re: [PATCH] sched: Folding nohz load accounting more accurate
Date: Tue, 19 Jun 2012 14:08:24 +0800
Message-ID: <20120619060824.GA31684@zhy>
In-Reply-To: <1340035417.15222.95.camel@twins>

On Mon, Jun 18, 2012 at 06:03:37PM +0200, Peter Zijlstra wrote:
> > Nohz exit is always caused
> > by processes woken up--non-idle model. It's not fair here, idle
> > calculated to non-idle.
> > 
> >      time-expect-sampling
> >                    |    time-do-sampling
> >                    |         |
> >                    V         V
> > -|-------------------------|--
> > start_nohz              stop_nohz
> 
> I don't think the delay in sampling is the biggest problem, I think the
> problem is the direct interaction between a cpu going idle and another
> cpu taking a sample.

IIUC, your hook into tick_nohz_idle_exit() will cure Charles's problem.

Some comments below.

> ---
>  kernel/sched/core.c      |  290 ++++++++++++++++++++++++++++++++++------------
>  kernel/sched/idle_task.c |    1 -
>  kernel/sched/sched.h     |    2 -
>  kernel/time/tick-sched.c |    2 +
>  4 files changed, 220 insertions(+), 75 deletions(-)
> 
> + *  - When we go NO_HZ idle during the window, we can negate our sample
> + *    contribution, causing under-accounting.
> + *
> + *    We avoid this by keeping two idle-delta counters and flipping them
> + *    when the window starts, thus separating old and new NO_HZ load.
> + *
> + *    The only trick is the slight shift in index flip for read vs write.
> + *
> + *       0             5             10            15
> + *         +10           +10           +10           +10
> + *       |-|-----------|-|-----------|-|-----------|-|
> + *    r:001           110           001           110
> + *    w:011           100           011           100

I'm confused by this comment. Looking at your code, the index is increased
by 1 for each sample window.

> + *
> + *    This ensures we'll fold the old idle contribution in this window while
> + *    accumlating the new one.
> + *
> + *  - When we wake up from NO_HZ idle during the window, we push up our
> + *    contribution, since we effectively move our sample point to a known
> + *    busy state.
> + *
> + *    This is solved by pushing the window forward, and thus skipping the
> + *    sample, for this cpu (effectively using the idle-delta for this cpu which
> + *    was in effect at the time the window opened). This also solves the issue
> + *    of having to deal with a cpu having been in NOHZ idle for multiple
> + *    LOAD_FREQ intervals.
>   *
>   * When making the ILB scale, we should try to pull this in as well.
>   */
> -static long calc_load_fold_idle(void)
> +void calc_load_exit_idle(void)
>  {
> -	long delta = 0;
> +	struct rq *this_rq = this_rq();
>  
>  	/*
> -	 * Its got a race, we don't care...
> +	 * If we're still outside the sample window, we're done.
>  	 */
> -	if (atomic_long_read(&calc_load_tasks_idle))
> -		delta = atomic_long_xchg(&calc_load_tasks_idle, 0);
> +	if (time_before(jiffies, this_rq->calc_load_update))
> +		return;
	else if (time_before(jiffies, calc_load_update + 10))
		this_rq->calc_load_update = calc_load_update + LOAD_FREQ;
	else
		this_rq->calc_load_update = calc_load_update;

Otherwise, if we wake up after the sample window, don't we lose one sample?
And maybe we need a local variable to cache calc_load_update.

Thanks,
Yong
