xen-devel.lists.xenproject.org archive mirror
From: George Dunlap <george.dunlap@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Dario Faggioli <dario.faggioli@citrix.com>,
	Anshul Makkar <anshul.makkar@citrix.com>,
	David Vrabel <david.vrabel@citrix.com>
Subject: Re: [PATCH 07/19] xen: credit2: prevent load balancing to go mad if time goes backwards
Date: Wed, 6 Jul 2016 17:21:13 +0100
Message-ID: <CAFLBxZZdS21UGW4TSDGvaDtn3LzD3X6L3f-rpDP4ySrHN0r7RQ@mail.gmail.com>
In-Reply-To: <5767BF3102000078000F6802@prv-mh.provo.novell.com>

On Mon, Jun 20, 2016 at 9:02 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 18.06.16 at 01:12, <dario.faggioli@citrix.com> wrote:
>> This really should not happen, but:
>>  1. it does happen! Investigation is ongoing here:
>>     http://lists.xen.org/archives/html/xen-devel/2016-06/msg00922.html
>>  2. even when 1 will be fixed it makes sense and is easy enough
>>     to have a 'safety catch' for it.
>>
>> The reason why this is particularly bad for Credit2 is that
>> negative values of delta mean out of scale high load (because
>> of the conversion to unsigned). This, for instance in the
>> case of runqueue load, results in a runqueue having its load
>> updated to values of the order of 10000% or so, which in turn
>> means that the load balancer will migrate everything off from
>> the pCPUs in the runqueue, and leave them idle until the load
>> gets back to something sane... which may indeed take a while!
>>
>> This is not a fix for the problem of time going backwards. In
>> fact, if that happens a lot, load tracking accuracy is still
>> compromised, but at least the effect is a lot less bad than
>> before.
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> ---
>> Cc: George Dunlap <george.dunlap@citrix.com>
>> Cc: Anshul Makkar <anshul.makkar@citrix.com>
>> Cc: David Vrabel <david.vrabel@citrix.com>
>> ---
>>  xen/common/sched_credit2.c |   12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index 50f8dfd..b73d034 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -404,6 +404,12 @@ __update_runq_load(const struct scheduler *ops,
>>      else
>>      {
>>          delta = now - rqd->load_last_update;
>> +        if ( unlikely(delta < 0) )
>> +        {
>> +            d2printk("%s: Time went backwards? now %"PRI_stime" llu %"PRI_stime"\n",
>> +                     __func__, now, rqd->load_last_update);
>> +            delta = 0;
>> +        }
>>
>>          rqd->avgload =
>>              ( ( delta * ( (unsigned long long)rqd->load << prv->load_window_shift ) )
>> @@ -455,6 +461,12 @@ __update_svc_load(const struct scheduler *ops,
>>      else
>>      {
>>          delta = now - svc->load_last_update;
>> +        if ( unlikely(delta < 0) )
>> +        {
>> +            d2printk("%s: Time went backwards? now %"PRI_stime" llu %"PRI_stime"\n",
>> +                     __func__, now, svc->load_last_update);
>> +            delta = 0;
>> +        }
>>
>>          svc->avgload =
>>              ( ( delta * ( (unsigned long long)vcpu_load << prv->load_window_shift ) )
>
> Do the absolute times really matter here? I.e. wouldn't it be more
> useful to simply log the value of delta?
>
> Also, may I ask you to use the L modifier in favor of the ll one, for
> being one byte shorter (and hence, even if just very slightly,
> reducing both image size and cache pressure)?
>
> And finally, instead of logging function names, could the two
> messages be made distinguishable by other means resulting in less
> data issued to the log (and potentially needing transmission over
> a slow serial line)?

This is under a "d2printk" because it's really only there to help
developers with debugging.  In-tree, this warning isn't even enabled
with debug=y; you have to go to the top of the file and change the
#define for it to exist at all.
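
To illustrate the point about the message not existing unless a #define
is changed, here is a minimal sketch of a compile-time-gated debug
print.  It is not the code from sched_credit2.c: the names
DEBUG_LOAD_TRACKING, dbg_printk, debug_messages_enabled and
report_backwards_time are hypothetical, and fprintf stands in for the
hypervisor's printk.

```c
#include <stdio.h>

/* Hypothetical switch: change to 1 to compile the debug messages in. */
#ifndef DEBUG_LOAD_TRACKING
#define DEBUG_LOAD_TRACKING 0
#endif

#if DEBUG_LOAD_TRACKING
/* Enabled: forward the format string and arguments to the logger. */
#define dbg_printk(...) fprintf(stderr, __VA_ARGS__)
#else
/* Disabled: the call expands to a no-op, so neither the format string
 * nor the argument expressions end up in the built image. */
#define dbg_printk(...) ((void)0)
#endif

/* Reports whether the debug messages were compiled in. */
int debug_messages_enabled(void)
{
    return DEBUG_LOAD_TRACKING;
}

/* Example call site: costs nothing when the switch above is 0. */
void report_backwards_time(long long now, long long last_update)
{
    dbg_printk("%s: Time went backwards? now %lld llu %lld\n",
               __func__, now, last_update);
    (void)now;          /* avoid unused-parameter warnings when disabled */
    (void)last_update;
}
```

With the switch left at 0, the size of the format string and the choice
of length modifier have no effect on the resulting binary.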

Given that, I don't think the quibbles over the code size or the
length of what's logged really matter.  I think we should just take it
as it is.
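
For reference, the behaviour the quoted patch guards against can be
sketched as follows.  This is only an illustration under stated
assumptions: s_time_t is taken to be a signed 64-bit nanosecond
timestamp, and the helper names clamped_delta and delta_as_unsigned are
hypothetical, not functions from the scheduler.

```c
#include <stdint.h>

/* Assumption: timestamps are signed 64-bit nanosecond counts. */
typedef int64_t s_time_t;

/* The safety catch: a negative interval is treated as zero elapsed
 * time, so the load average is simply not advanced. */
s_time_t clamped_delta(s_time_t now, s_time_t last_update)
{
    s_time_t delta = now - last_update;

    if (delta < 0)
        delta = 0;
    return delta;
}

/* Without the clamp, a negative interval that ends up in unsigned
 * arithmetic wraps around to a value close to 2^64. */
uint64_t delta_as_unsigned(s_time_t now, s_time_t last_update)
{
    return (uint64_t)(now - last_update);
}
```

If time appears to step back by 50ns, clamped_delta(100, 150) yields 0,
whereas delta_as_unsigned(100, 150) yields 2^64 - 50, which is the
"out of scale" load described in the commit message.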

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

 -George
