From: Jan Beulich <jbeulich@suse.com>
To: Dario Faggioli <dfaggioli@suse.com>
Cc: committers@xenproject.org,
"Tamas K Lengyel" <tamas@tklengyel.com>,
"Andrew Cooper" <andrew.cooper3@citrix.com>,
"Michał Leszczyński" <michal.leszczynski@cert.pl>,
"Ian Jackson" <iwj@xenproject.org>,
xen-devel@lists.xenproject.org
Subject: Re: [ANNOUNCE] Xen 4.15 release schedule and feature tracking
Date: Fri, 29 Jan 2021 09:38:19 +0100
Message-ID: <78cb6825-c5db-4613-3fd6-e7fc98441b41@suse.com>
In-Reply-To: <8c4b30f5f16824124e50922c871d440bf39991ba.camel@suse.com>
On 28.01.2021 19:26, Dario Faggioli wrote:
> On Thu, 2021-01-14 at 19:02 +0000, Andrew Cooper wrote:
>> 2) "scheduler broken" bugs. We've had 4 or 5 reports of Xen not
>> working, and very little investigation on what's going on. Suspicion
>> is that there might be two bugs, one with smt=0 on recent AMD
>> hardware, and one more general "some workloads cause negative credit"
>> and might or might not be specific to credit2 (debugging feedback
>> differs - also might be 3 underlying issues).
>>
> Yep, so, let's try to summarize/collect the ones I think you may be
> referring to:
>
> 1) There is one report about Credit2 not working, while Credit1 was
> fine. It's this one:
>
> https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01561.html
>
> It's the one where it somehow happens that one or more vCPUs manage to
> run for a really really long timeslice, much more than the scheduler
> would have allowed them to, and this causes problems. _If_ that's it, my
> investigation so far seems to show that this happens despite the
> scheduler code trying to enforce (via timers) the proper timeslice
> limits. When it happens, it makes the scheduler very unhappy. I've seen
> reports of it occurring both on Credit and Credit2, but Credit2
> definitely seems to be more sensitive to it.
>
> I've actually been trying to track it down for a while now, but I can't
> easily reproduce it, so it's proving to be challenging.
>
> 2) Then there has been this one:
>
> https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01005.html
>
> Here, the reporter said that "[credit1] results is an observable
> delay, unusable performance; credit2 seems to be the only usable
> scheduler". This is the one that Andrew also mentioned, happening on
> Ryzen and with SMT disabled (as this is on QubesOS, IIRC).
>
> Here, doing "dom0_max_vcpus=1 dom0_vcpus_pin" seemed to mitigate the
> problem but, of course, with obvious limitations. I don't have a Ryzen
> handy, but I have a Zen and a Zen2. I checked there and again could not
> reproduce (although, what I tried was upstream Xen, not QubesOS).
>
> 3) Then I recall this one:
>
> https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01800.html
>
> This also started as a "scheduler, probably Credit2" bug. But it then
> turned out to manifest on both Credit1 and Credit2, and it started to
> happen in 4.14, while it was not there in 4.13... And nothing major
> changed in scheduling between these two releases, I think.
>
> During the analysis, we thought we identified a livelock, but then
> could not pinpoint what was exactly going on. Oh, and then it was also
> discovered that Credit2 + PVH dom0 seemed to be a working
> configuration, and it's weird for a scheduling issue to have a (dom0)
> domain type dependency, I think. But that could be anything really...
> and I'm certainly happy to keep digging.
>
> 4) There's the NULL scheduler + ARM + vwfi=native issue:
>
> https://lists.xenproject.org/archives/html/xen-devel/2021-01/msg01634.html
>
> This looks like something that we saw before and that remained unfixed,
> although not exactly the same. If it's that one, analysis is done, and
> we're working on a patch. If it's something else or even something
> similar but slightly different... Well, we'll have to see when we have
> the patch.
>
> 5) We're also dealing with this bug report, although this is being
> reported against Xen 4.13 (openSUSE's packaged version of it):
>
> https://bugzilla.opensuse.org/show_bug.cgi?id=1179246
>
> This is again on recent AMD hardware and here, "dom0_max_vcpus=4
> dom0_vcpus_pin" works ok, but only until a (Windows) HVM guest is
> started. When that happens, though, we have crashes/hangs.
>
> If guests are PV, things are apparently fine. If the HVM guests use a
> different set of CPUs than dom0 (e.g., vm.cpumask="4-63" in xl.conf),
> things are fine as well.
>
> Again, a scheduler issue and a scheduling algorithm dependency were
> theorized and will be investigated (if the user can come back with
> answers, which may take some time, as explained in the report). The
> different behavior with different kinds of guests is a little weird for
> an issue of this kind, IME, but let's see.
>
> 6) If we want, we can include this too (hopefully just for reference):
>
> https://lists.xenproject.org/archives/html/xen-devel/2021-01/msg01376.html
>
> The symptoms were indeed similar, such as hanging during boot but
> all fine with dom0_max_vcpus=1. However, Jan is currently investigating
> this one, and the analysis seems to be heading toward problems with TSC
> reliability reporting and rendezvous, but let's see.
>
> Did I forget any?
Going just from my mailbox, where I kept some but not all of the still
unaddressed reports (one more I have there is among the ones you've
mentioned above):
https://lists.xen.org/archives/html/xen-devel/2020-03/msg01251.html
https://lists.xen.org/archives/html/xen-devel/2020-05/msg01985.html
Jan