xen-devel.lists.xenproject.org archive mirror
From: Jan Beulich <jbeulich@suse.com>
To: Dario Faggioli <dfaggioli@suse.com>
Cc: committers@xenproject.org,
	"Tamas K Lengyel" <tamas@tklengyel.com>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Michał Leszczyński" <michal.leszczynski@cert.pl>,
	"Ian Jackson" <iwj@xenproject.org>,
	xen-devel@lists.xenproject.org
Subject: Re: [ANNOUNCE] Xen 4.15 release schedule and feature tracking
Date: Fri, 29 Jan 2021 09:38:19 +0100
Message-ID: <78cb6825-c5db-4613-3fd6-e7fc98441b41@suse.com>
In-Reply-To: <8c4b30f5f16824124e50922c871d440bf39991ba.camel@suse.com>

On 28.01.2021 19:26, Dario Faggioli wrote:
> On Thu, 2021-01-14 at 19:02 +0000, Andrew Cooper wrote:
>> 2) "scheduler broken" bugs.  We've had 4 or 5 reports of Xen not
>> working, and very little investigation on what's going on.  Suspicion
>> is that there might be two bugs, one with smt=0 on recent AMD
>> hardware, and one more general "some workloads cause negative credit"
>> and might or might not be specific to credit2 (debugging feedback
>> differs - also might be 3 underlying issues).
>>
> Yep, so, let's try to summarize/collect the ones I think you may be
> referring to:
> 
> 1) There is one report about Credit2 not working, while Credit1 was
> fine. It's this one:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01561.html
> 
> It's the one where it somehow happens that one or more vCPUs manage to
> run for a really really long timeslice, much more than the scheduler
> would have allowed them to, and this causes problems. _If_ that's it, my
> investigation so far seems to show that this happens even though the
> scheduler code tries to enforce (via timers) the proper timeslice
> limits. When it happens, it makes the scheduler very unhappy. I've seen
> reports of it occurring both on Credit and Credit2, but definitely
> Credit2 seems to be more sensitive to it.
> 
> I've actually been trying to track it down for a while now, but I can't
> easily reproduce it, so it's proving to be challenging.
> 
> 2) Then there has been this one:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01005.html
> 
> Here, the reporter said that "[credit1] results is an observable
> delay, unusable performance; credit2 seems to be the only usable
> scheduler". This is the one that Andrew also mentions, happening on
> Ryzen and with SMT disabled (as this is on QubesOS, IIRC).
> 
> Here, doing "dom0_max_vcpus=1 dom0_vcpus_pin" seemed to mitigate the
> problem but, of course, with obvious limitations. I don't have a Ryzen
> handy, but I have a Zen and a Zen2. I checked there and again could not
> reproduce (although, what I tried was upstream Xen, not QubesOS).
> 
> 3) Then I recall this one:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01800.html
> 
> This also started as a "scheduler, probably Credit2" bug. But it then
> turned out that it manifests on both Credit1 and Credit2 and that it
> started to happen on 4.14, while it was not there in 4.13... And
> nothing major changed in scheduling between these two releases, I think.
> 
> During the analysis, we thought we identified a livelock, but then
> could not pinpoint what was exactly going on. Oh, and then it was also
> discovered that Credit2 + PVH dom0 seemed to be a working
> configuration, and it's weird for a scheduling issue to have a (dom0)
> domain type dependency, I think. But that could be anything really...
> and I'm sure happy to keep digging.
> 
> 4) There's the NULL scheduler + ARM + vwfi=native issue:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2021-01/msg01634.html
> 
> This looks like something that we saw before, but remained unfixed,
> although not exactly like that. If it's that one, analysis is done, and
> we're working on a patch. If it's something else or even something
> similar but slightly different... Well, we'll have to see when we have
> the patch.
> 
> 5) We're also dealing with this bug report, although this is being
> reported against Xen 4.13 (openSUSE's packaged version of it):
> 
> https://bugzilla.opensuse.org/show_bug.cgi?id=1179246
> 
> This is again on recent AMD hardware and here, "dom0_max_vcpus=4
> dom0_vcpus_pin" works ok, but only until a (Windows) HVM guest is
> started. When that happens, though, we have crashes/hangs.
> 
> If guests are PV, things are apparently fine. If the HVM guests use a
> different set of CPUs than dom0 (e.g., vm.cpumask="4-63" in xl.conf),
> things are fine as well.
> 
> Again a scheduler issue and a scheduling algorithm dependency was
> theorized and will be investigated (if the user can come back with
> answers, which may take some time, as explained in the report). The
> different behavior with different kinds of guests is a little weird for
> an issue of this kind, IME, but let's see.
> 
> 6) If we want, we can include this too (hopefully just for reference):
> 
> https://lists.xenproject.org/archives/html/xen-devel/2021-01/msg01376.html
> 
> As indeed the symptoms were similar, such as hanging during boot, but
> all fine with dom0_max_vcpus=1. However, Jan is currently investigating
> this one, and they're heading toward problems with TSC reliability
> reporting and rendezvous, but let's see.
> 
> Did I forget any?

Going just from my mailbox, where I didn't keep all of the still
unaddressed reports, but some (another one I have there is among
the ones you've mentioned above):

https://lists.xen.org/archives/html/xen-devel/2020-03/msg01251.html
https://lists.xen.org/archives/html/xen-devel/2020-05/msg01985.html

Jan

