xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Dario Faggioli <dfaggioli@suse.com>
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross <jgross@suse.com>,
	Charles Arnold <carnold@suse.com>,
	Jan Beulich <jbeulich@suse.com>, Glen <glenbarney@gmail.com>,
	George Dunlap <george.dunlap@citrix.com>,
	Tomas Mozes <hydrapolic@gmail.com>, Sarah Newman <srn@prgmr.com>
Subject: [Xen-devel] [PATCH 0/2] xen: credit2: fix vcpu starvation due to too few credits
Date: Thu, 12 Mar 2020 14:44:07 +0100	[thread overview]
Message-ID: <158402056376.753.7091379488590272336.stgit@Palanthas> (raw)

Hello everyone,

There have been reports of a Credit2 issue due to which vCPUs where
being starved, to the point that guest kernel would complain or even
crash.

See the following xen-users and xen-devel threads:
https://lists.xenproject.org/archives/html/xen-users/2020-02/msg00018.html
https://lists.xenproject.org/archives/html/xen-users/2020-02/msg00015.html
https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01158.html

I did some investigations, and figured out that the vCPUs in question
are not scheduled for long time intervals because they somehow manage to
be given an amount of credits which is less than the credit the idle
vCPU has.

An example of this situation is shown here. In fact, we can see d0v1
sitting in the runqueue while all the CPUs are idle, as it has
-1254238270 credits, which is smaller than -2^30 = −1073741824:

    (XEN) Runqueue 0:
    (XEN)   ncpus              = 28
    (XEN)   cpus               = 0-27
    (XEN)   max_weight         = 256
    (XEN)   pick_bias          = 22
    (XEN)   instload           = 1
    (XEN)   aveload            = 293391 (~111%)
    (XEN)   idlers: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
    (XEN)   tickled: 00,00000000,00000000,00000000,00000000,00000000,00000000
    (XEN)   fully idle cores: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
    [...]
    (XEN) Runqueue 0:
    (XEN) CPU[00] runq=0, sibling=00,..., core=00,...
    (XEN) CPU[01] runq=0, sibling=00,..., core=00,...
    [...]
    (XEN) CPU[26] runq=0, sibling=00,..., core=00,...
    (XEN) CPU[27] runq=0, sibling=00,..., core=00,...
    (XEN) RUNQ:
    (XEN)     0: [0.1] flags=0 cpu=5 credit=-1254238270 [w=256] load=262144 (~100%)

This happens bacause --although very rarely-- vCPUs are allowed to
execute for much more than the scheduler would want them to.

For example, I have a trace showing that csched2_schedule() is invoked at
t=57970746155ns. At t=57970747658ns (+1503ns) the s_timer is set to
fire at t=57979485083ns, i.e., 8738928ns in future. That's because credit
of snext is exactly that 8738928ns. Then, what I see is that the next
call to burn_credits(), coming from csched2_schedule() for the same vCPU
happens at t=60083283617ns. That is *a lot* (2103798534ns) later than
when we expected and asked. Of course, that also means that delta is
2112537462ns, and therefore credits will sink to -2103798534!

Also, to the best of my current knowledge, this does not look like
Credit2 related, as I've observed it when running with Credit1 as well.
I personally don't think it would be scheduling related, in general, but
I need to do more investigation to be sure about that (and/or to figure
out what the real root cause is).

The reason why Credit2 is affected much more than Credit1 is because of
how time accounting is done. Basically, there's very rudimental time
accounting in Credit1, which is a very bad thing, IMO, but indeed that
is also what prevented for this issue to cause severe stalls.

One more thing is that Credit2 gives -2^30 credits to the idle vCPU, which
was considered to be low enough, which is true. But it's not a robust
choice, should an issue like the one we're discussing occur, which is
happening. :-) Therefore, I think we should lower the credits of the
idle vCPU to the minimum possible value, so that even under whatever
unusual or weird or buggy situations like this one, we will never pick
idle instead of an actual vCPU that is ready to run.

This is what is done in the first patch of this series. This is a
robustness improvement and a fix (or at least the best way we can deal
with the it within the scheduler) for the issue at hand. It therefore
should be backported.

While looking into this, I also have found out that there is an actual
bug in Credit2 code. It is something I introduced myself with commit
5e4b4199667b9 ("xen: credit2: only reset credit on reset condition").
In fact, while it was and still is a good idea to avoid resetting
credits too often, the implementation of this was just wrong.

A fix for this bug is what is contained in patch 2. And it also should
be backported.

Note that patch 2 alone was also already mitigating the stall/starvation
issue quite substantially. Nevertheless, the proper fix for the issue
itself is making Credit2 more robust against similar problem, as done in
patch 1, while this other bug just happens to be something which
interact with the sympthoms.

This to say that, although both patches will be bugported, asboth are
actual bugfixes, if there is the need to apply something "in emergency"
to fix the starvation problem, applying only patch 1 is enough.

Thanks and Regards
---
Dario Faggioli (2):
      xen: credit2: avoid vCPUs to ever reach lower credits than idle
      xen: credit2: fix credit reset happening too few times

 tools/xentrace/formats     |    2 +-
 tools/xentrace/xenalyze.c  |    8 +++-----
 xen/common/sched/credit2.c |   32 ++++++++++++++------------------
 3 files changed, 18 insertions(+), 24 deletions(-)
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

             reply	other threads:[~2020-03-12 13:44 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-12 13:44 Dario Faggioli [this message]
2020-03-12 13:44 ` [Xen-devel] [PATCH 1/2] xen: credit2: avoid vCPUs to ever reach lower credits than idle Dario Faggioli
2020-03-12 13:55   ` Andrew Cooper
2020-03-12 14:40     ` George Dunlap
2020-03-12 15:10       ` Dario Faggioli
2020-03-12 14:58     ` Dario Faggioli
2020-03-12 14:45   ` George Dunlap
2020-03-12 17:03     ` Dario Faggioli
2020-03-12 15:26   ` Jan Beulich
2020-03-12 16:00     ` Jürgen Groß
2020-03-12 16:59       ` Dario Faggioli
2020-03-12 16:11     ` Dario Faggioli
2020-03-12 16:36       ` Jan Beulich
2020-03-12 13:44 ` [Xen-devel] [PATCH 2/2] xen: credit2: fix credit reset happening too few times Dario Faggioli
2020-03-12 15:08 ` [Xen-devel] [PATCH 0/2] xen: credit2: fix vcpu starvation due to too few credits Roger Pau Monné
2020-03-12 17:02   ` Dario Faggioli
2020-03-12 17:59     ` Roger Pau Monné
2020-03-13  6:19       ` Dario Faggioli
2020-03-12 15:51 ` Jürgen Groß
2020-03-12 16:27   ` Andrew Cooper
2020-03-13  7:26     ` Dario Faggioli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=158402056376.753.7091379488590272336.stgit@Palanthas \
    --to=dfaggioli@suse.com \
    --cc=carnold@suse.com \
    --cc=george.dunlap@citrix.com \
    --cc=glenbarney@gmail.com \
    --cc=hydrapolic@gmail.com \
    --cc=jbeulich@suse.com \
    --cc=jgross@suse.com \
    --cc=srn@prgmr.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).