From: George Dunlap <george.dunlap@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Anshul Makkar <anshul.makkar@citrix.com>,
	Dario Faggioli <dario.faggioli@citrix.com>,
	David Vrabel <david.vrabel@citrix.com>
Subject: Re: [PATCH 07/19] xen: credit2: prevent load balancing to go mad if time goes backwards
Date: Thu, 7 Jul 2016 10:09:16 +0100
Message-ID: <a5b9e1c0-1d25-4cda-b1b6-a546189b43b0@citrix.com>
In-Reply-To: <577E20F902000078000FBD9E@prv-mh.provo.novell.com>

On 07/07/16 08:29, Jan Beulich wrote:
>>>> On 06.07.16 at 18:21, <george.dunlap@citrix.com> wrote:
>> On Mon, Jun 20, 2016 at 9:02 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 18.06.16 at 01:12, <dario.faggioli@citrix.com> wrote:
>>>> This really should not happen, but:
>>>>  1. it does happen! Investigation is ongoing here:
>>>>     http://lists.xen.org/archives/html/xen-devel/2016-06/msg00922.html 
>>>>  2. even when 1 will be fixed it makes sense and is easy enough
>>>>     to have a 'safety catch' for it.
>>>>
>>>> The reason why this is particularly bad for Credit2 is that
>>>> negative values of delta mean out of scale high load (because
>>>> of the conversion to unsigned). This, for instance in the
>>>> case of runqueue load, results in a runqueue having its load
>>>> updated to values of the order of 10000% or so, which in turn
>>>> means that the load balancer will migrate everything off from
>>>> the pCPUs in the runqueue, and leave them idle until the load
>>>> gets back to something sane... which may indeed take a while!
>>>>
>>>> This is not a fix for the problem of time going backwards. In
>>>> fact, if that happens a lot, load tracking accuracy is still
>>>> compromised, but at least the effect is a lot less bad than
>>>> before.
>>>>
>>>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>>>> ---
>>>> Cc: George Dunlap <george.dunlap@citrix.com>
>>>> Cc: Anshul Makkar <anshul.makkar@citrix.com>
>>>> Cc: David Vrabel <david.vrabel@citrix.com>
>>>> ---
>>>>  xen/common/sched_credit2.c |   12 ++++++++++++
>>>>  1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>>>> index 50f8dfd..b73d034 100644
>>>> --- a/xen/common/sched_credit2.c
>>>> +++ b/xen/common/sched_credit2.c
>>>> @@ -404,6 +404,12 @@ __update_runq_load(const struct scheduler *ops,
>>>>      else
>>>>      {
>>>>          delta = now - rqd->load_last_update;
>>>> +        if ( unlikely(delta < 0) )
>>>> +        {
>>>> +            d2printk("%s: Time went backwards? now %"PRI_stime" llu %"PRI_stime"\n",
>>>> +                     __func__, now, rqd->load_last_update);
>>>> +            delta = 0;
>>>> +        }
>>>>
>>>>          rqd->avgload =
>>>>              ( ( delta * ( (unsigned long long)rqd->load << prv->load_window_shift ) )
>>>> @@ -455,6 +461,12 @@ __update_svc_load(const struct scheduler *ops,
>>>>      else
>>>>      {
>>>>          delta = now - svc->load_last_update;
>>>> +        if ( unlikely(delta < 0) )
>>>> +        {
>>>> +            d2printk("%s: Time went backwards? now %"PRI_stime" llu %"PRI_stime"\n",
>>>> +                     __func__, now, svc->load_last_update);
>>>> +            delta = 0;
>>>> +        }
>>>>
>>>>          svc->avgload =
>>>>              ( ( delta * ( (unsigned long long)vcpu_load << prv->load_window_shift ) )
>>>
>>> Do the absolute times really matter here? I.e. wouldn't it be more
>>> useful to simply log the value of delta?
>>>
>>> Also, may I ask you to use the L modifier in favor of the ll one, for
>>> being one byte shorter (and hence, even if just very slightly,
>>> reducing both image size and cache pressure)?
>>>
>>> And finally, instead of logging function names, could the two
>>> messages be made distinguishable by other means resulting in less
>>> data issued to the log (and potentially needing transmission over
>>> a slow serial line)?
>>
>> The reason this is under a "d2printk" is because it's really only to
>> help developers in debugging.  In-tree this warning isn't even on with
>> debug=y; you have to go to the top of the file and change the #define
>> to make it even exist.
>>
>> Given that, I don't think the quibbles over the code size or the
>> length of what's logged really matter.  I think we should just take it
>> as it is.
>>
>> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
> 
> Oh, okay - I agree on those two parts then. But the question on the
> usefulness of absolute vs relative times remains.

What is the usefulness of printing the relative time?  If you have the
absolute time, you have some chance of catching mistakes like one of the
times being 0 or something like that; or of being able to correlate it
with another time printed somewhere else (for instance, a timestamp from
a trace record).

In any case, I think it's really a bike shed.  Dario is the one who has
used this error message to find an actual bug recently, so I'll let him
decide what he thinks the most useful thing to print here is.

 -George
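
For readers following the thread, here is a minimal standalone sketch of the failure mode the quoted patch guards against, and of the clamp it adds. It is not the Xen code: `demo_time_t` is a hypothetical stand-in for Xen's signed time type, `fprintf` stands in for `d2printk`, and both helper names are invented for illustration. The log line prints both absolute timestamps, as the patch does, so the delta can still be derived from it.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a signed 64-bit nanosecond timestamp. */
typedef int64_t demo_time_t;

/*
 * What a load calculation sees if "now - last_update" is evaluated in
 * unsigned arithmetic: a small negative delta wraps to a huge value.
 */
static uint64_t delta_as_unsigned(demo_time_t now, demo_time_t last_update)
{
    return (uint64_t)now - (uint64_t)last_update;
}

/*
 * The safety catch from the quoted patch, in sketch form: if time appears
 * to have gone backwards, log both absolute timestamps and use a delta of 0.
 */
static demo_time_t clamped_delta(demo_time_t now, demo_time_t last_update)
{
    demo_time_t delta = now - last_update;

    if (delta < 0) {
        fprintf(stderr,
                "%s: Time went backwards? now %" PRId64 " llu %" PRId64 "\n",
                __func__, now, last_update);
        delta = 0;
    }
    return delta;
}
```

With `now = 1000` and `last_update = 1500`, the unsigned form yields a value just below 2^64, while the clamped form yields 0, so the load average is simply not advanced for that update.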