From: "Jürgen Groß" <jgross@suse.com>
To: Dario Faggioli <dfaggioli@suse.com>, xen-devel@lists.xenproject.org
Cc: Charles Arnold <carnold@suse.com>,
	Jan Beulich <jbeulich@suse.com>, Glen <glenbarney@gmail.com>,
	George Dunlap <george.dunlap@citrix.com>,
	Tomas Mozes <hydrapolic@gmail.com>, Sarah Newman <srn@prgmr.com>
Subject: Re: [Xen-devel] [PATCH 0/2] xen: credit2: fix vcpu starvation due to too few credits
Date: Thu, 12 Mar 2020 16:51:31 +0100
Message-ID: <03f34120-8420-a526-1b03-03601c169be1@suse.com>
In-Reply-To: <158402056376.753.7091379488590272336.stgit@Palanthas>

On 12.03.20 14:44, Dario Faggioli wrote:
> Hello everyone,
> 
> There have been reports of a Credit2 issue due to which vCPUs were
> being starved, to the point that the guest kernel would complain or
> even crash.
> 
> See the following xen-users and xen-devel threads:
> https://lists.xenproject.org/archives/html/xen-users/2020-02/msg00018.html
> https://lists.xenproject.org/archives/html/xen-users/2020-02/msg00015.html
> https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01158.html
> 
> I did some investigations, and figured out that the vCPUs in question
> are not scheduled for long time intervals because they somehow manage to
> be given an amount of credits which is less than the credit the idle
> vCPU has.
> 
> An example of this situation is shown here. We can see d0v1 sitting
> in the runqueue while all the CPUs are idle, because it has
> -1254238270 credits, which is smaller than the idle vCPU's credit of
> -2^30 = -1073741824:
> 
>      (XEN) Runqueue 0:
>      (XEN)   ncpus              = 28
>      (XEN)   cpus               = 0-27
>      (XEN)   max_weight         = 256
>      (XEN)   pick_bias          = 22
>      (XEN)   instload           = 1
>      (XEN)   aveload            = 293391 (~111%)
>      (XEN)   idlers: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
>      (XEN)   tickled: 00,00000000,00000000,00000000,00000000,00000000,00000000
>      (XEN)   fully idle cores: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
>      [...]
>      (XEN) Runqueue 0:
>      (XEN) CPU[00] runq=0, sibling=00,..., core=00,...
>      (XEN) CPU[01] runq=0, sibling=00,..., core=00,...
>      [...]
>      (XEN) CPU[26] runq=0, sibling=00,..., core=00,...
>      (XEN) CPU[27] runq=0, sibling=00,..., core=00,...
>      (XEN) RUNQ:
>      (XEN)     0: [0.1] flags=0 cpu=5 credit=-1254238270 [w=256] load=262144 (~100%)
> 
> This happens because --although very rarely-- vCPUs are allowed to
> execute for much longer than the scheduler would want them to.
> 
> For example, I have a trace showing that csched2_schedule() is invoked at
> t=57970746155ns. At t=57970747658ns (+1503ns) the s_timer is set to
> fire at t=57979485083ns, i.e., 8738928ns after the csched2_schedule()
> invocation. That's because the credit of snext is exactly that 8738928ns.
> Then, what I see is that the next call to burn_credits(), coming from
> csched2_schedule() for the same vCPU, happens at t=60083283617ns. That is
> *a lot* (2103798534ns) later than what we expected and asked for. Of
> course, that also means that delta is 2112537462ns, and therefore credits
> will sink to -2103798534!
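
The numbers in the trace above can be re-derived with a short script. This
is only a sketch of the arithmetic, not Xen code: the constant and function
names are made up for illustration, and it assumes one credit is burned per
nanosecond of runtime, which is a simplification that fits the dump above
only because the vCPU's weight (w=256) equals max_weight (256). The idle
credit of -2^30 is taken from the quoted message.

```python
# Timestamps in ns, copied from the trace described in the quoted message.
T_SCHEDULE = 57_970_746_155    # csched2_schedule() invoked
T_TIMER_SET = 57_970_747_658   # s_timer programmed
T_TIMER_FIRE = 57_979_485_083  # time at which s_timer was asked to fire
T_NEXT_BURN = 60_083_283_617   # next burn_credits() for the same vCPU

# Credit of the idle vCPU as stated in the message: -2^30.
IDLE_CREDIT = -(2 ** 30)


def credit_after_burn(credit: int, start_ns: int, now_ns: int) -> int:
    """Simplified model (assumption): burn one credit per elapsed ns."""
    return credit - (now_ns - start_ns)


# The timer deadline is the schedule time plus the credit of snext.
initial_credit = T_TIMER_FIRE - T_SCHEDULE
# How much later than the deadline the vCPU was actually descheduled.
lateness = T_NEXT_BURN - T_TIMER_FIRE
# Total runtime charged to the vCPU by burn_credits().
delta = T_NEXT_BURN - T_SCHEDULE
final_credit = credit_after_burn(initial_credit, T_SCHEDULE, T_NEXT_BURN)

print("timer set after   :", T_TIMER_SET - T_SCHEDULE, "ns")
print("initial credit    :", initial_credit)
print("lateness          :", lateness, "ns")
print("delta             :", delta, "ns")
print("final credit      :", final_credit)
print("below idle credit :", final_credit < IDLE_CREDIT)
```

The final credit ends up below -2^30, so under this model the vCPU compares
lower than the idle vCPU, which matches the starvation seen in the runqueue
dump.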

Current ideas are:

- The vcpu could be busy in the hypervisor for a very long time, e.g.
   fighting with another vcpu for a lock or doing a long-running
   hypercall.

- The timer used is not reliable.

- The time base is not reliable (the TSC, or whatever is used for
   getting the time, has jumped 2 seconds into the future).

- System management mode has kicked in.


Juergen
