On Fri, 2021-01-15 at 15:14 +0000, Lengyel, Tamas wrote:
> > 2) "scheduler broken" bugs.  We've had 4 or 5 reports of Xen not
> > working, and very little investigation on whats going on.
> > Suspicion is that there might be two bugs, one with smt=0 on
> > recent AMD hardware, and one more general "some workloads cause
> > negative credit" and might or might not be specific to credit2
> > (debugging feedback differs - also might be 3 underlying issue).
>
> We've also ran into intermittent Xen lockups requiring power-cycling
> servers. We switched back to credit1 and had no issues since.
>
Ah, that's interesting...

Among the issues that I listed in my other email, when trying to do a
quick summary, "only" number 1 is about Credit working when Credit2
does not. The one you're mentioning here may be the second... or it
may be the same! :-O

As said there, my theory so far is that there's a bug somewhere, not
necessarily in the scheduling code, to which the two algorithms react
differently.

Of course this is only a theory, and I've not been able to confirm it
yet (otherwise I would also have fixed the problem. :-P).

But really, it would be interesting to double check whether at least
the symptoms are the same as those of the issue reported here.

> Hard to tell if it was related to the scheduler or the pile of other
> experimental stuff we are running with but right now we have stable
> systems across the board with credit1.
>
Well, sure, that's understandable. :-)

Which is why it's tricky at times to debug these issues. In fact, I
cannot reproduce them myself, and users, rightfully, move on once
they find a workaround.

Anyway, if at some point you decide to investigate, I'll be happy to
help.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/