xen-devel.lists.xenproject.org archive mirror
* Null scheduler and vwfi native problem
@ 2021-01-21 10:54 Anders Törnqvist
  2021-01-21 18:32 ` Dario Faggioli
  2021-01-21 19:16 ` Julien Grall
  0 siblings, 2 replies; 31+ messages in thread
From: Anders Törnqvist @ 2021-01-21 10:54 UTC (permalink / raw)
  To: xen-devel

Hi,

I see a problem with destroy and restart of a domain. Interrupts are not 
available when trying to restart a domain.

The situation seems very similar to the thread "null scheduler bug" 
https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html.

The target system is an iMX8-based ARM board and Xen is a 4.13.0 version 
built from https://source.codeaurora.org/external/imx/imx-xen.git.

Xen is booted with sched=null vwfi=native.
One physical CPU core is pinned to the domu.
Some interrupts are passed through to the domu.

When destroying the domain with "xl destroy" etc. it does not complain, but 
then when trying to restart the domain
again with an "xl create <domain cfg>" I get:
(XEN) IRQ 210 is already used by domain 1

"xl list" does not contain the domain.

Repeating the "xl create" command 5-10 times eventually starts the 
domain without complaining about the IRQ.

Inspired by the discussion in the thread above I have put printks in 
the xen/common/domain.c file.
In the function domain_destroy I have a printk("End of domain_destroy 
function\n") at the end.
In the function complete_domain_destroy I have a printk("Begin of 
complete_domain_destroy function\n") at the beginning.
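
For reference, the printks sit roughly like this relative to the RCU 
deferral in domain_destroy (a simplified sketch from memory of the 4.13 
sources, not the exact code):

/* xen/common/domain.c -- simplified sketch, added printks marked */

static void complete_domain_destroy(struct rcu_head *head)
{
    printk("Begin of complete_domain_destroy function\n");  /* added */
    /* ... final teardown; per the discussion below, this is also where
     * passed-through IRQs end up being released ... */
}

void domain_destroy(struct domain *d)
{
    /* ... earlier tear-down steps ... */

    /* Final teardown is deferred until an RCU grace period elapses. */
    call_rcu(&d->rcu, complete_domain_destroy);

    printk("End of domain_destroy function\n");  /* added */
}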

With these printouts I get at "xl destroy":
(XEN) End of domain_destroy function

So it seems like the function complete_domain_destroy is not called.

"xl create" results in:
(XEN) IRQ 210 is already used by domain 1
(XEN) End of domain_destroy function

Then repeated "xl create" looks the same until after a few tries I also get:
(XEN) Begin of complete_domain_destroy function

After that the next "xl create" creates the domain.


I have also applied the patch from 
https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html. 
This does not seem to change the results.

Starting the system without "sched=null vwfi=native" does not result in 
the problem.

BR
Anders




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-21 10:54 Null scheduler and vwfi native problem Anders Törnqvist
@ 2021-01-21 18:32 ` Dario Faggioli
  2021-01-21 19:40   ` Julien Grall
  2021-01-22  8:07   ` Anders Törnqvist
  2021-01-21 19:16 ` Julien Grall
  1 sibling, 2 replies; 31+ messages in thread
From: Dario Faggioli @ 2021-01-21 18:32 UTC (permalink / raw)
  To: Anders Törnqvist, xen-devel

On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
> Hi,
> 
Hello,

> I see a problem with destroy and restart of a domain. Interrupts are
> not 
> available when trying to restart a domain.
> 
> The situation seems very similar to the thread "null scheduler bug" 
>  
> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
> .
> 
Right. Back then, PCI passthrough was involved, if I remember
correctly. Is it the case for you as well?

> The target system is a iMX8-based ARM board and Xen is a 4.13.0
> version 
> built from https://source.codeaurora.org/external/imx/imx-xen.git.
> 
Mmm, perhaps it's me, but neither going to that URL with a browser nor
trying to clone it shows me anything. What am I doing wrong?

> Xen is booted with sched=null vwfi=native.
> One physical CPU core is pinned to the domu.
> Some interrupts are passed through to the domu.
> 
Ok, I guess it is involved, since you say "some interrupts are passed
through..."

> When destroying the domain with xl destroy etc it does not complain
> but 
> then when trying to restart the domain
> again with a "xl create <domain cfg>" I get:
> (XEN) IRQ 210 is already used by domain 1
> 
> "xl list" does not contain the domain.
> 
> Repeating the "xl create" command 5-10 times eventually starts the 
> domain without complaining about the IRQ.
> 
> Inspired from the discussion in the thread above I have put printks
> in 
> the xen/common/domain.c file.
> In the function domain_destroy I have a printk("End of domain_destroy
> function\n") in the end.
> In the function complete_domain_destroy have a printk("Begin of 
> complete_domain_destroy function\n") in the beginning.
> 
> With these printouts I get at "xl destroy":
> (XEN) End of domain_destroy function
> 
> So it seems like the function complete_domain_destroy is not called.
> 
Ok, thanks for making these tests. It's helpful to have this
information right away.

> "xl create" results in:
> (XEN) IRQ 210 is already used by domain 1
> (XEN) End of domain_destroy function
> 
> Then repeated "xl create" looks the same until after a few tries I
> also get:
> (XEN) Begin of complete_domain_destroy function
> 
> After that the next "xl create" creates the domain.
> 
> 
> I have also applied the patch from 
>     
> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html
> . 
> This does not seem to change the results.
> 
Ah... Really? That's a bit unexpected, TBH.

Well, I'll think about it.

> Starting the system without "sched=null vwfi=native" does not result
> in 
> the problem.
>
Ok, how about, if you're up for some more testing:

 - booting with "sched=null" but not with "vwfi=native"
 - booting with "sched=null vwfi=native" but not doing the IRQ 
   passthrough that you mentioned above

?

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-21 10:54 Null scheduler and vwfi native problem Anders Törnqvist
  2021-01-21 18:32 ` Dario Faggioli
@ 2021-01-21 19:16 ` Julien Grall
  1 sibling, 0 replies; 31+ messages in thread
From: Julien Grall @ 2021-01-21 19:16 UTC (permalink / raw)
  To: Anders Törnqvist, xen-devel, Dario Faggioli, Stefano Stabellini



On 21/01/2021 10:54, Anders Törnqvist wrote:
> Hi,

Hi Anders,

Thank you for reporting the bug. I am adding Stefano and Dario as IIRC 
they were going to work on a solution.

Cheers,

> I see a problem with destroy and restart of a domain. Interrupts are not 
> available when trying to restart a domain.
> 
> The situation seems very similar to the thread "null scheduler bug" 
> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html.
> 
> The target system is a iMX8-based ARM board and Xen is a 4.13.0 version 
> built from https://source.codeaurora.org/external/imx/imx-xen.git.
> 
> Xen is booted with sched=null vwfi=native.
> One physical CPU core is pinned to the domu.
> Some interrupts are passed through to the domu.
> 
> When destroying the domain with xl destroy etc it does not complain but 
> then when trying to restart the domain
> again with a "xl create <domain cfg>" I get:
> (XEN) IRQ 210 is already used by domain 1
> 
> "xl list" does not contain the domain.
> 
> Repeating the "xl create" command 5-10 times eventually starts the 
> domain without complaining about the IRQ.
> 
> Inspired from the discussion in the thread above I have put printks in 
> the xen/common/domain.c file.
> In the function domain_destroy I have a printk("End of domain_destroy 
> function\n") in the end.
> In the function complete_domain_destroy have a printk("Begin of 
> complete_domain_destroy function\n") in the beginning.
> 
> With these printouts I get at "xl destroy":
> (XEN) End of domain_destroy function
> 
> So it seems like the function complete_domain_destroy is not called.
> 
> "xl create" results in:
> (XEN) IRQ 210 is already used by domain 1
> (XEN) End of domain_destroy function
> 
> Then repeated "xl create" looks the same until after a few tries I also 
> get:
> (XEN) Begin of complete_domain_destroy function
> 
> After that the next "xl create" creates the domain.
> 
> 
> I have also applied the patch from 
> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html. 
> This does not seem to change the results. 
> 
> Starting the system without "sched=null vwfi=native" does not result in 
> the problem.
> 
> BR
> Anders
> 
> 
> 

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-21 18:32 ` Dario Faggioli
@ 2021-01-21 19:40   ` Julien Grall
  2021-01-21 23:35     ` Dario Faggioli
  2021-01-22  8:07   ` Anders Törnqvist
  1 sibling, 1 reply; 31+ messages in thread
From: Julien Grall @ 2021-01-21 19:40 UTC (permalink / raw)
  To: Dario Faggioli, Anders Törnqvist, xen-devel, Stefano Stabellini

Hi Dario,

On 21/01/2021 18:32, Dario Faggioli wrote:
> On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
>> Hi,
>> I see a problem with destroy and restart of a domain. Interrupts are
>> not
>> available when trying to restart a domain.
>>
>> The situation seems very similar to the thread "null scheduler bug"
>>   
>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
>> .
>>
> Right. Back then, PCI passthrough was involved, if I remember
> correctly. Is it the case for you as well?

PCI passthrough is not yet supported on Arm :). However, the bug was 
reported with platform device passthrough.

[...]

>> "xl create" results in:
>> (XEN) IRQ 210 is already used by domain 1
>> (XEN) End of domain_destroy function
>>
>> Then repeated "xl create" looks the same until after a few tries I
>> also get:
>> (XEN) Begin of complete_domain_destroy function
>>
>> After that the next "xl create" creates the domain.
>>
>>
>> I have also applied the patch from
>>      
>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html
>> .
>> This does not seem to change the results.
>>
> Ah... Really? That's a bit unexpected, TBH.
> 
> Well, I'll think about it. >
>> Starting the system without "sched=null vwfi=native" does not result
>> in
>> the problem.
>>
> Ok, how about, if you're up for some more testing:
> 
>   - booting with "sched=null" but not with "vwfi=native"
>   - booting with "sched=null vwfi=native" but not doing the IRQ
>     passthrough that you mentioned above
> 
> ?

I think we can skip the testing as the bug was fully diagnosed back 
then. Unfortunately, I don't think a patch was ever posted. The 
interesting bits start at [1]. Let me try to summarize here.

This has nothing to do with device passthrough, but the bug is easier to 
spot as interrupts are only going to be released when the domain is 
fully destroyed (we should really release them during the relinquish 
period...).

The last step of the domain destruction (complete_domain_destroy()) will 
*only* happen when all the CPUs are considered quiescent from the RCU PoV.

As you pointed out on that thread, the RCU implementation in Xen 
requires the pCPU to enter the hypervisor (via hypercalls, 
interrupts...) from time to time.

This assumption doesn't hold anymore when using "sched=null vwfi=native" 
because a vCPU will not exit when it is idling (vwfi=native) and there 
may not be any other source of interrupt on that vCPU.

Therefore the quiescent state will never be reached on the pCPU running 
that vCPU.

From Xen's PoV, any pCPU executing guest context can be considered 
quiescent. So one way to solve the problem would be to mark the pCPU 
when entering the guest.
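
A very rough sketch of the idea (the function names here are only 
illustrative; the Arm guest entry/exit paths live in arch/arm/traps.c, 
and Xen's RCU code already provides rcu_idle_enter()/rcu_idle_exit() 
for a similar purpose, IIRC):

/* Illustrative sketch only -- not a tested patch. */

/* Right before resuming the guest: treat this pCPU as quiescent. */
static void going_to_guest_sketch(void)
{
    rcu_idle_enter(smp_processor_id());
}

/* On any trap back into the hypervisor: take part in grace periods
 * again. */
static void back_in_hypervisor_sketch(void)
{
    rcu_idle_exit(smp_processor_id());
}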

Cheers,

[1] 
https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528ecfc@arm.com/

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-21 19:40   ` Julien Grall
@ 2021-01-21 23:35     ` Dario Faggioli
  2021-01-22  8:06       ` Anders Törnqvist
  2021-01-22 14:02       ` Julien Grall
  0 siblings, 2 replies; 31+ messages in thread
From: Dario Faggioli @ 2021-01-21 23:35 UTC (permalink / raw)
  To: Julien Grall, Anders Törnqvist, xen-devel, Stefano Stabellini

On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
> Hi Dario,
> 
Hi!

> On 21/01/2021 18:32, Dario Faggioli wrote:
> > On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
> > >   
> > > https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
> > > .
> > > 
> > Right. Back then, PCI passthrough was involved, if I remember
> > correctly. Is it the case for you as well?
> 
> PCI passthrough is not yet supported on Arm :). However, the bug was 
> reported with platform device passthrough.
> 
Yeah, well... That! Which indeed is not PCI. Sorry for the terminology
mismatch. :-)

> > Well, I'll think about it. >
> > > Starting the system without "sched=null vwfi=native" does not
> > > result
> > > in
> > > the problem.
> > > 
> > Ok, how about, if you're up for some more testing:
> > 
> >   - booting with "sched=null" but not with "vwfi=native"
> >   - booting with "sched=null vwfi=native" but not doing the IRQ
> >     passthrough that you mentioned above
> > 
> > ?
> 
> I think we can skip the testing as the bug was fully diagnostics back
> then. Unfortunately, I don't think a patch was ever posted.
>
True. But a hackish debug patch was provided and, back then, it
worked.

OTOH, Anders seems to be reporting that such a patch did not work here.
I also continue to think that we're facing the same or a very similar
problem... But I'm curious why applying the patch did not help this
time. And that's why I asked for more testing.

Anyway, it's true that we left the issue pending, so something like
this:

>  From Xen PoV, any pCPU executing guest context can be considered 
> quiescent. So one way to solve the problem would be to mark the pCPU 
> when entering to the guest.
> 
Should be done anyway.

We'll then see if it actually solves this problem too, or if this is
really something else.

Thanks for the summary, BTW. :-)

I'll try to work on a patch.

Regards

> [1] 
>     
> https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528ecfc@arm.com/
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-21 23:35     ` Dario Faggioli
@ 2021-01-22  8:06       ` Anders Törnqvist
  2021-01-22  9:05         ` Dario Faggioli
  2021-01-22 14:26         ` Julien Grall
  2021-01-22 14:02       ` Julien Grall
  1 sibling, 2 replies; 31+ messages in thread
From: Anders Törnqvist @ 2021-01-22  8:06 UTC (permalink / raw)
  To: Dario Faggioli, Julien Grall, xen-devel, Stefano Stabellini

Thanks for the responses.

On 1/22/21 12:35 AM, Dario Faggioli wrote:
> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>> Hi Dario,
>>
> Hi!
>
>> On 21/01/2021 18:32, Dario Faggioli wrote:
>>> On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
>>>>    
>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
>>>> .
>>>>
>>> Right. Back then, PCI passthrough was involved, if I remember
>>> correctly. Is it the case for you as well?
>> PCI passthrough is not yet supported on Arm :). However, the bug was
>> reported with platform device passthrough.
>>
> Yeah, well... That! Which indeed is not PCI. Sorry for the terminology
> mismatch. :-)
>
>>> Well, I'll think about it. >
>>>> Starting the system without "sched=null vwfi=native" does not
>>>> result
>>>> in
>>>> the problem.
>>>>
>>> Ok, how about, if you're up for some more testing:
>>>
>>>    - booting with "sched=null" but not with "vwfi=native"
>>>    - booting with "sched=null vwfi=native" but not doing the IRQ
>>>      passthrough that you mentioned above
>>>
>>> ?
>> I think we can skip the testing as the bug was fully diagnostics back
>> then. Unfortunately, I don't think a patch was ever posted.
>>
> True. But an hackish debug patch was provided and, back then, it
> worked.
>
> OTOH, Anders seems to be reporting that such a patch did not work here.
> I also continue to think that we're facing the same or a very similar
> problem... But I'm curious why applying the patch did not help this
> time. And that's why I asked for more testing.
I ran the tests as suggested, to shed some more light.

- booting with "sched=null" but not with "vwfi=native"
Without "vwfi=native" it works fine to destroy and to re-create the domain.
Both printouts come after a destroy:
(XEN) End of domain_destroy function
(XEN) End of complete_domain_destroy function


- booting with "sched=null vwfi=native" but not doing the IRQ 
passthrough that you mentioned above
"xl destroy" gives
(XEN) End of domain_destroy function

Then a "xl create" says nothing but the domain has not started correct. 
"xl list" look like this for the domain:
mydomu                                   2   512     1 ------       0.0

>
> Anyway, it's true that we left the issue pending, so something like
> this:
>
>>   From Xen PoV, any pCPU executing guest context can be considered
>> quiescent. So one way to solve the problem would be to mark the pCPU
>> when entering to the guest.
>>
> Should be done anyway.
>
> We'll then see if it actually solves this problem too, or if this is
> really something else.
>
> Thanks for the summary, BTW. :-)
>
> I'll try to work on a patch.
Thanks, just let me know if I can do some testing to assist.
>
> Regards
>
>> [1]
>>      
>> https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528ecfc@arm.com/



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-21 18:32 ` Dario Faggioli
  2021-01-21 19:40   ` Julien Grall
@ 2021-01-22  8:07   ` Anders Törnqvist
  1 sibling, 0 replies; 31+ messages in thread
From: Anders Törnqvist @ 2021-01-22  8:07 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel

On 1/21/21 7:32 PM, Dario Faggioli wrote:
> On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
>> Hi,
>>
> Hello,
>
>> I see a problem with destroy and restart of a domain. Interrupts are
>> not
>> available when trying to restart a domain.
>>
>> The situation seems very similar to the thread "null scheduler bug"
>>   
>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
>> .
>>
> Right. Back then, PCI passthrough was involved, if I remember
> correctly. Is it the case for you as well?
>
>> The target system is a iMX8-based ARM board and Xen is a 4.13.0
>> version
>> built from https://source.codeaurora.org/external/imx/imx-xen.git.
>>
> Mmm, perhaps it's me, but neither going to that URL with a browser nor
> trying to clone it shows me anything. What am I doing wrong?
Sorry. The link is https://source.codeaurora.org/external/imx/imx-xen.
>
>> Xen is booted with sched=null vwfi=native.
>> One physical CPU core is pinned to the domu.
>> Some interrupts are passed through to the domu.
>>
> Ok, I guess it is involved, since you say "some interrupts are passed
> through..."
>
>> When destroying the domain with xl destroy etc it does not complain
>> but
>> then when trying to restart the domain
>> again with a "xl create <domain cfg>" I get:
>> (XEN) IRQ 210 is already used by domain 1
>>
>> "xl list" does not contain the domain.
>>
>> Repeating the "xl create" command 5-10 times eventually starts the
>> domain without complaining about the IRQ.
>>
>> Inspired from the discussion in the thread above I have put printks
>> in
>> the xen/common/domain.c file.
>> In the function domain_destroy I have a printk("End of domain_destroy
>> function\n") in the end.
>> In the function complete_domain_destroy have a printk("Begin of
>> complete_domain_destroy function\n") in the beginning.
>>
>> With these printouts I get at "xl destroy":
>> (XEN) End of domain_destroy function
>>
>> So it seems like the function complete_domain_destroy is not called.
>>
> Ok, thanks for making these tests. It's helpful to have this
> information right away.
>
>> "xl create" results in:
>> (XEN) IRQ 210 is already used by domain 1
>> (XEN) End of domain_destroy function
>>
>> Then repeated "xl create" looks the same until after a few tries I
>> also get:
>> (XEN) Begin of complete_domain_destroy function
>>
>> After that the next "xl create" creates the domain.
>>
>>
>> I have also applied the patch from
>>      
>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html
>> .
>> This does not seem to change the results.
>>
> Ah... Really? That's a bit unexpected, TBH.
>
> Well, I'll think about it.
>
>> Starting the system without "sched=null vwfi=native" does not result
>> in
>> the problem.
>>
> Ok, how about, if you're up for some more testing:
>
>   - booting with "sched=null" but not with "vwfi=native"
>   - booting with "sched=null vwfi=native" but not doing the IRQ
>     passthrough that you mentioned above
>
> ?
>
> Regards




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-22  8:06       ` Anders Törnqvist
@ 2021-01-22  9:05         ` Dario Faggioli
  2021-01-22 14:26         ` Julien Grall
  1 sibling, 0 replies; 31+ messages in thread
From: Dario Faggioli @ 2021-01-22  9:05 UTC (permalink / raw)
  To: Anders Törnqvist, Julien Grall, xen-devel, Stefano Stabellini

On Fri, 2021-01-22 at 09:06 +0100, Anders Törnqvist wrote:
> On 1/22/21 12:35 AM, Dario Faggioli wrote:
> 
> 
> - booting with "sched=null" but not with "vwfi=native"
> Without "vwfi=native" it works fine to destroy and to re-create the
> domain.
> Both printouts comes after a destroy:
> (XEN) End of domain_destroy function
> (XEN) End of complete_domain_destroy function
> 
Ok, thanks for doing these tests.

The fact that not using "vwfi=native" makes things work seems to point
in the direction that myself and Julien (and you as well!) were
suspecting. I.e., it is the same issue as the one in the old xen-
devel thread.

I'm still a bit puzzled why the debug patch posted back then does not
work for you... but that's not really super important. Let's try to
come up with a new debug patch and, this time, a proper fix. :-)

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-21 23:35     ` Dario Faggioli
  2021-01-22  8:06       ` Anders Törnqvist
@ 2021-01-22 14:02       ` Julien Grall
  2021-01-22 17:30         ` Anders Törnqvist
  1 sibling, 1 reply; 31+ messages in thread
From: Julien Grall @ 2021-01-22 14:02 UTC (permalink / raw)
  To: Dario Faggioli, Anders Törnqvist, xen-devel, Stefano Stabellini

Hi Dario,

On 21/01/2021 23:35, Dario Faggioli wrote:
> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>> Hi Dario,
>>
> Hi!
> 
>> On 21/01/2021 18:32, Dario Faggioli wrote:
>>> On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
>>>>    
>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
>>>> .
>>>>
>>> Right. Back then, PCI passthrough was involved, if I remember
>>> correctly. Is it the case for you as well?
>>
>> PCI passthrough is not yet supported on Arm :). However, the bug was
>> reported with platform device passthrough.
>>
> Yeah, well... That! Which indeed is not PCI. Sorry for the terminology
> mismatch. :-)
> 
>>> Well, I'll think about it. >
>>>> Starting the system without "sched=null vwfi=native" does not
>>>> result
>>>> in
>>>> the problem.
>>>>
>>> Ok, how about, if you're up for some more testing:
>>>
>>>    - booting with "sched=null" but not with "vwfi=native"
>>>    - booting with "sched=null vwfi=native" but not doing the IRQ
>>>      passthrough that you mentioned above
>>>
>>> ?
>>
>> I think we can skip the testing as the bug was fully diagnostics back
>> then. Unfortunately, I don't think a patch was ever posted.
>>
> True. But an hackish debug patch was provided and, back then, it
> worked.
> 
> OTOH, Anders seems to be reporting that such a patch did not work here.
> I also continue to think that we're facing the same or a very similar
> problem... But I'm curious why applying the patch did not help this
> time. And that's why I asked for more testing.

I wonder if this is because your patch doesn't modify rsinterval. So 
even if we call force_quiescent_state(), the softirq would only be 
raised for the current CPU.

I guess the following HACK could confirm the theory:

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index a5a27af3def0..50020bc34ddf 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -250,7 +250,7 @@ static void force_quiescent_state(struct rcu_data *rdp,
  {
      cpumask_t cpumask;
      raise_softirq(RCU_SOFTIRQ);
-    if (unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
+    if (1 || unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
          rdp->last_rs_qlen = rdp->qlen;
          /*
           * Don't send IPI to itself. With irqs disabled,

Cheers,

-- 
Julien Grall


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-22  8:06       ` Anders Törnqvist
  2021-01-22  9:05         ` Dario Faggioli
@ 2021-01-22 14:26         ` Julien Grall
  2021-01-22 17:44           ` Anders Törnqvist
  2021-01-25 16:11           ` Dario Faggioli
  1 sibling, 2 replies; 31+ messages in thread
From: Julien Grall @ 2021-01-22 14:26 UTC (permalink / raw)
  To: Anders Törnqvist, Dario Faggioli, xen-devel, Stefano Stabellini

Hi Anders,

On 22/01/2021 08:06, Anders Törnqvist wrote:
> On 1/22/21 12:35 AM, Dario Faggioli wrote:
>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
> - booting with "sched=null vwfi=native" but not doing the IRQ 
> passthrough that you mentioned above
> "xl destroy" gives
> (XEN) End of domain_destroy function
> 
> Then a "xl create" says nothing but the domain has not started correct. 
> "xl list" look like this for the domain:
> mydomu                                   2   512     1 ------       0.0

This is odd. I would have expected ``xl create`` to fail if something 
went wrong with the domain creation.

The list of dashes suggests that the domain is:
    - Not running
    - Not blocked (i.e cannot run)
    - Not paused
    - Not shutdown

So this suggests the NULL scheduler didn't schedule the vCPU. Would it be 
possible to describe your setup:
   - How many pCPUs?
   - How many vCPUs did you give to dom0?
   - What was the number of the vCPUs given to the previous guest?

One possibility is the NULL scheduler doesn't release the pCPUs until 
the domain is fully destroyed. So if there is no pCPU free, it wouldn't 
be able to schedule the new domain.

However, I would have expected the NULL scheduler to refuse to create the 
domain if there is no pCPU available.

@Dario, @Stefano, do you know when the NULL scheduler decides to 
allocate the pCPU?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-22 14:02       ` Julien Grall
@ 2021-01-22 17:30         ` Anders Törnqvist
  0 siblings, 0 replies; 31+ messages in thread
From: Anders Törnqvist @ 2021-01-22 17:30 UTC (permalink / raw)
  To: Julien Grall, Dario Faggioli, xen-devel, Stefano Stabellini

On 1/22/21 3:02 PM, Julien Grall wrote:
> Hi Dario,
>
> On 21/01/2021 23:35, Dario Faggioli wrote:
>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>>> Hi Dario,
>>>
>> Hi!
>>
>>> On 21/01/2021 18:32, Dario Faggioli wrote:
>>>> On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
>>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
>>>>> .
>>>>>
>>>> Right. Back then, PCI passthrough was involved, if I remember
>>>> correctly. Is it the case for you as well?
>>>
>>> PCI passthrough is not yet supported on Arm :). However, the bug was
>>> reported with platform device passthrough.
>>>
>> Yeah, well... That! Which indeed is not PCI. Sorry for the terminology
>> mismatch. :-)
>>
>>>> Well, I'll think about it. >
>>>>> Starting the system without "sched=null vwfi=native" does not
>>>>> result
>>>>> in
>>>>> the problem.
>>>>>
>>>> Ok, how about, if you're up for some more testing:
>>>>
>>>>    - booting with "sched=null" but not with "vwfi=native"
>>>>    - booting with "sched=null vwfi=native" but not doing the IRQ
>>>>      passthrough that you mentioned above
>>>>
>>>> ?
>>>
>>> I think we can skip the testing as the bug was fully diagnostics back
>>> then. Unfortunately, I don't think a patch was ever posted.
>>>
>> True. But an hackish debug patch was provided and, back then, it
>> worked.
>>
>> OTOH, Anders seems to be reporting that such a patch did not work here.
>> I also continue to think that we're facing the same or a very similar
>> problem... But I'm curious why applying the patch did not help this
>> time. And that's why I asked for more testing.
>
> I wonder if this is because your patch doesn't modify rsinterval. So 
> even if we call force_quiescent_state(), the softirq would only be 
> raised for the current CPU.
>
> I guess the following HACK could confirm the theory:
>
> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
> index a5a27af3def0..50020bc34ddf 100644
> --- a/xen/common/rcupdate.c
> +++ b/xen/common/rcupdate.c
> @@ -250,7 +250,7 @@ static void force_quiescent_state(struct rcu_data *rdp,
>  {
>      cpumask_t cpumask;
>      raise_softirq(RCU_SOFTIRQ);
> -    if (unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
> +    if (1 || unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
>          rdp->last_rs_qlen = rdp->qlen;
>          /*
>           * Don't send IPI to itself. With irqs disabled,
>
> Cheers,
>
I applied the patch above. No change. The complete_domain_destroy 
function is not called when I destroy the domain.

/Anders



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-22 14:26         ` Julien Grall
@ 2021-01-22 17:44           ` Anders Törnqvist
  2021-01-25 15:45             ` Dario Faggioli
  2021-01-25 16:11           ` Dario Faggioli
  1 sibling, 1 reply; 31+ messages in thread
From: Anders Törnqvist @ 2021-01-22 17:44 UTC (permalink / raw)
  To: Julien Grall, Dario Faggioli, xen-devel, Stefano Stabellini

On 1/22/21 3:26 PM, Julien Grall wrote:
> Hi Anders,
>
> On 22/01/2021 08:06, Anders Törnqvist wrote:
>> On 1/22/21 12:35 AM, Dario Faggioli wrote:
>>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>> - booting with "sched=null vwfi=native" but not doing the IRQ 
>> passthrough that you mentioned above
>> "xl destroy" gives
>> (XEN) End of domain_destroy function
>>
>> Then a "xl create" says nothing but the domain has not started 
>> correct. "xl list" look like this for the domain:
>> mydomu                                   2   512     1 ------       0.0
>
> This is odd. I would have expected ``xl create`` to fail if something 
> went wrong with the domain creation.
>
> The list of dash, suggests that the domain is:
>    - Not running
>    - Not blocked (i.e cannot run)
>    - Not paused
>    - Not shutdown
>
> So this suggest the NULL scheduler didn't schedule the vCPU. Would it 
> be possible to describe your setup:
>   - How many pCPUs?
There are 6 pCPUs
>   - How many vCPUs did you give to dom0?
I gave it 5
>   - What was the number of the vCPUs given to the previous guest?

Nr 0.

Listing vcpus looks like this when the domain is running:

xl vcpu-list
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   r--     101.7 0 / all
Domain-0                             0     1    1   r--     101.0 1 / all
Domain-0                             0     2    2   r--     101.0 2 / all
Domain-0                             0     3    3   r--     100.9 3 / all
Domain-0                             0     4    4   r--     100.9 4 / all
mydomu                              1     0    5   r--      89.5 5 / all

vCPU nr 0 is also for dom0. Is that normal?

>
> One possibility is the NULL scheduler doesn't release the pCPUs until 
> the domain is fully destroyed. So if there is no pCPU free, it 
> wouldn't be able to schedule the new domain.
>
> However, I would have expected the NULL scheduler to refuse the domain 
> to create if there is no pCPU available.
>
> @Dario, @Stefano, do you know when the NULL scheduler decides to 
> allocate the pCPU?
>
> Cheers,
>




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-22 17:44           ` Anders Törnqvist
@ 2021-01-25 15:45             ` Dario Faggioli
  0 siblings, 0 replies; 31+ messages in thread
From: Dario Faggioli @ 2021-01-25 15:45 UTC (permalink / raw)
  To: Anders Törnqvist, Julien Grall, xen-devel, Stefano Stabellini

On Fri, 2021-01-22 at 18:44 +0100, Anders Törnqvist wrote:
> Listing vcpus looks like this when the domain is running:
> 
> xl vcpu-list
> Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
> Domain-0                             0     0    0   r--     101.7 0 / all
> Domain-0                             0     1    1   r--     101.0 1 / all
> Domain-0                             0     2    2   r--     101.0 2 / all
> Domain-0                             0     3    3   r--     100.9 3 / all
> Domain-0                             0     4    4   r--     100.9 4 / all
> mydomu                              1     0    5   r--      89.5 5 / all
> 
> vCPU nr 0 is also for dom0. Is that normal?
> 
Yeah, that's the vCPU ID numbering. Each VM/guest (including dom0) has
its own vCPUs, with IDs starting from 0.

What counts here, to make sure that the NULL scheduler "configuration"
is correct, is that each vCPU is associated with one and only one pCPU.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-22 14:26         ` Julien Grall
  2021-01-22 17:44           ` Anders Törnqvist
@ 2021-01-25 16:11           ` Dario Faggioli
  2021-01-26 17:03             ` Anders Törnqvist
  1 sibling, 1 reply; 31+ messages in thread
From: Dario Faggioli @ 2021-01-25 16:11 UTC (permalink / raw)
  To: Julien Grall, Anders Törnqvist, xen-devel, Stefano Stabellini

On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
> Hi Anders,
> 
> On 22/01/2021 08:06, Anders Törnqvist wrote:
> > On 1/22/21 12:35 AM, Dario Faggioli wrote:
> > > On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
> > - booting with "sched=null vwfi=native" but not doing the IRQ 
> > passthrough that you mentioned above
> > "xl destroy" gives
> > (XEN) End of domain_destroy function
> > 
> > Then a "xl create" says nothing but the domain has not started
> > correct. 
> > "xl list" look like this for the domain:
> > mydomu                                   2   512     1 ------      
> > 0.0
> 
> This is odd. I would have expected ``xl create`` to fail if something
> went wrong with the domain creation.
>
So, Anders, would it be possible to issue:

# xl debug-keys r
# xl dmesg

and send the output to us?

Ideally, you'd do it:
 - with Julien's patch (the one he sent the other day, and that you 
   have already given a try to) applied
 - while you are in the state above, i.e., after having tried to 
   destroy a domain and failing
 - and maybe again after having tried to start a new domain

> One possibility is the NULL scheduler doesn't release the pCPUs until
> the domain is fully destroyed. So if there is no pCPU free, it
> wouldn't 
> be able to schedule the new domain.
> 
> However, I would have expected the NULL scheduler to refuse the
> domain 
> to create if there is no pCPU available.
> 
Yeah but, unfortunately, it is not easy for the scheduler to fail
domain creation at this stage (i.e., when we realize there are no
available pCPUs). That's the reason why the NULL scheduler has a
waitqueue, where vCPUs that cannot be put on any pCPU are parked.

Of course, this is a configuration error (or a bug, like maybe in this
case :-/), and we print warnings when it happens.

> @Dario, @Stefano, do you know when the NULL scheduler decides to 
> allocate the pCPU?
> 
On which pCPU to allocate a vCPU is decided in null_unit_insert(),
called from sched_alloc_unit() and sched_init_vcpu().

On the other hand, a vCPU is properly removed from its pCPU, hence
making the pCPU free to be assigned to some other vCPU, in
unit_deassign(), called from null_unit_remove(), which in turn is
called from sched_destroy_vcpu(), which is indeed called from
complete_domain_destroy().
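
Conceptually, the placement logic behaves like this (just a sketch to
illustrate the behaviour; the helper names below are made up, this is
not the actual sched_null.c code):

/* Illustrative sketch of NULL scheduler vCPU placement. */
static void null_place_unit_sketch(struct sched_unit *unit)
{
    int cpu = pick_free_pcpu(unit);      /* hypothetical helper */

    if ( cpu >= 0 )
        assign_unit_to_pcpu(unit, cpu);  /* strict 1:1 vCPU <-> pCPU */
    else
        park_on_waitqueue(unit);         /* no free pCPU available */
}

So, if the old domain's pCPU has not been released yet (because
complete_domain_destroy() has not run), the new domain's vCPU would end
up parked on that waitqueue.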

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-25 16:11           ` Dario Faggioli
@ 2021-01-26 17:03             ` Anders Törnqvist
  2021-01-26 22:31               ` Dario Faggioli
  0 siblings, 1 reply; 31+ messages in thread
From: Anders Törnqvist @ 2021-01-26 17:03 UTC (permalink / raw)
  To: Dario Faggioli, Julien Grall, xen-devel, Stefano Stabellini

On 1/25/21 5:11 PM, Dario Faggioli wrote:
> On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
>> Hi Anders,
>>
>> On 22/01/2021 08:06, Anders Törnqvist wrote:
>>> On 1/22/21 12:35 AM, Dario Faggioli wrote:
>>>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>>> - booting with "sched=null vwfi=native" but not doing the IRQ
>>> passthrough that you mentioned above
>>> "xl destroy" gives
>>> (XEN) End of domain_destroy function
>>>
>>> Then a "xl create" says nothing but the domain has not started
>>> correct.
>>> "xl list" look like this for the domain:
>>> mydomu                                   2   512     1 ------
>>> 0.0
>> This is odd. I would have expected ``xl create`` to fail if something
>> went wrong with the domain creation.
>>
> So, Anders, would it be possible to issue a:
>
> # xl debug-keys r
> # xl dmesg
>
> And send it to us ?
>
> Ideally, you'd do it:
>   - with Julien's patch (the one he sent the other day, and that you
>     have already given a try to) applied
>   - while you are in the state above, i.e., after having tried to
>     destroy a domain and failing
>   - and maybe again after having tried to start a new domain
Here are some logs.

The system is booted as before with the patch and the domu config does 
not have the IRQs.


# xl list
Name                                        ID   Mem VCPUs State    Time(s)
Domain-0                                     0  3000     5 r-----     820.1
mydomu                                       1   511     1 r-----     157.0

# xl debug-keys r
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=191793008000
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN) Waitqueue:
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0
(XEN)     run: [1.0] pcpu=5

# xl dmesg
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000080200000 - 00000000ffffffff
(XEN) RAM: 0000000880000000 - 00000008ffffffff
(XEN)
(XEN) MODULE[0]: 0000000080400000 - 000000008054d848 Xen
(XEN) MODULE[1]: 0000000083000000 - 0000000083018000 Device Tree
(XEN) MODULE[2]: 0000000088000000 - 0000000089701200 Kernel
(XEN)  RESVD[0]: 0000000088000000 - 0000000090000000
(XEN)  RESVD[1]: 0000000083000000 - 0000000083018000
(XEN)  RESVD[2]: 0000000084000000 - 0000000085ffffff
(XEN)  RESVD[3]: 0000000086000000 - 00000000863fffff
(XEN)  RESVD[4]: 0000000090000000 - 00000000903fffff
(XEN)  RESVD[5]: 0000000090400000 - 0000000091ffffff
(XEN)  RESVD[6]: 0000000092000000 - 00000000921fffff
(XEN)  RESVD[7]: 0000000092200000 - 00000000923fffff
(XEN)  RESVD[8]: 0000000092400000 - 00000000943fffff
(XEN)  RESVD[9]: 0000000094400000 - 0000000094bfffff
(XEN)
(XEN) CMDLINE[0000000088000000]:chosen console=hvc0 earlycon=xen 
root=/dev/mmcblk0p3 mem=3000M hostname=myhost 
video=HDMI-A-1:1920x1080@60 imxdrm.legacyfb_depth=32   quiet loglevel=3 
logo.nologo vt.global_cursor_default=0
(XEN)
(XEN) Command line: console=dtuart dtuart=/serial@5a060000 
dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin 
sched=null vwfi=native
(XEN) Domain heap initialised
(XEN) Booting using Device Tree
(XEN) partition id 4
(XEN) Domain name mydomu
(XEN) *****Initialized MU
(XEN) Looking for dtuart at "/serial@5a060000", options ""
  Xen 4.13.1-pre
(XEN) Xen version 4.13.1-pre (anders@builder.local) 
(aarch64-poky-linux-gcc (GCC) 8.3.0) debug=n  Fri Jan 22 17:32:33 UTC 2021
(XEN) Latest ChangeSet: Wed Feb 27 17:56:28 2019 +0800 git:b64b8df-dirty
(XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4
(XEN) 64-bit Execution:
(XEN)   Processor Features: 0000000001002222 0000000000000000
(XEN)     Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN)     Extensions: FloatingPoint AdvancedSIMD GICv3-SysReg
(XEN)   Debug Features: 0000000010305106 0000000000000000
(XEN)   Auxiliary Features: 0000000000000000 0000000000000000
(XEN)   Memory Model Features: 0000000000001122 0000000000000000
(XEN)   ISA Features:  0000000000011120 0000000000000000
(XEN) 32-bit Execution:
(XEN)   Processor Features: 00000131:10011011
(XEN)     Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN)     Extensions: GenericTimer Security
(XEN)   Debug Features: 03010066
(XEN)   Auxiliary Features: 00000000
(XEN)   Memory Model Features: 10201105 40000000 01260000 02102211
(XEN)  ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 8000 KHz
(XEN) GICv3 initialization:
(XEN)       gic_dist_addr=0x00000051a00000
(XEN)       gic_maintenance_irq=25
(XEN)       gic_rdist_stride=0
(XEN)       gic_rdist_regions=1
(XEN)       redistributor regions:
(XEN)         - region 0: 0x00000051b00000 - 0x00000051bc0000
(XEN) GICv3 compatible with GICv2 cbase 0x00000052000000 vbase 
0x00000052020000
(XEN) GICv3: 544 lines, (IID 0001143b).
(XEN) GICv3: CPU0: Found redistributor in region 0 @000000004002d000
(XEN) XSM Framework v1.0.0 initialized
(XEN) Initialising XSM SILO mode
(XEN) Using scheduler: null Scheduler (null)
(XEN) Initializing null scheduler
(XEN) WARNING: This is experimental software in development.
(XEN) Use at your own risk.
(XEN) Allocated console ring of 16 KiB.
(XEN) Bringing up CPU1
(XEN) GICv3: CPU1: Found redistributor in region 0 @00000000400ad000
(XEN) Bringing up CPU2
(XEN) GICv3: CPU2: Found redistributor in region 0 @00000000400cd000
(XEN) Bringing up CPU3
(XEN) GICv3: CPU3: Found redistributor in region 0 @000000004004d000
(XEN) Bringing up CPU4
(XEN) GICv3: CPU4: Found redistributor in region 0 @000000004006d000
(XEN) Bringing up CPU5
(XEN) GICv3: CPU5: Found redistributor in region 0 @000000004008d000
(XEN) Brought up 6 CPUs
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) P2M: 40-bit IPA with 40-bit PA and 8-bit VMID
(XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading d0 kernel from boot module @ 0000000088000000
(XEN) Allocating 1:1 mappings totalling 3000MB for dom0:
(XEN) BANK[0] 0x00000098000000-0x00000100000000 (1664MB)
(XEN) BANK[1] 0x00000880000000-0x000008c0000000 (1024MB)
(XEN) BANK[2] 0x000008d0000000-0x000008e0000000 (256MB)
(XEN) BANK[3] 0x000008ec800000-0x000008f0000000 (56MB)
(XEN) Grant table range: 0x00000080400000-0x00000080440000
(XEN) HACK: skip /imx8_gpu_ss setup!
(XEN) Allocating PPI 16 for event channel interrupt
(XEN) Loading zImage from 0000000088000000 to 
0000000098080000-0000000099781200
(XEN) Loading d0 DTB to 0x00000000a0000000-0x00000000a001772e
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Scrubbing Free RAM in background
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) ***************************************************
(XEN) WARNING: HMP COMPUTING HAS BEEN ENABLED.
(XEN) It has implications on the security and stability of the system,
(XEN) unless the cpu affinity of all domains is specified.
(XEN) ***************************************************
(XEN) 3... 2... 1...
(XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
(XEN) Freed 336kB init memory.
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER4
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER8
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER12
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER16
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER20
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER24
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER28
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER32
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER36
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER40
(XEN) Power on resource 215
(XEN) printk: 11 messages suppressed.
(XEN) d1v0: vGICR: SGI: unhandled word write 0x000000ffffffff to ICACTIVER0
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=191793008000
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN) Waitqueue:
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0
(XEN)     run: [1.0] pcpu=5


# xl destroy mydomu
(XEN) End of domain_destroy function

# xl list
Name                                        ID   Mem VCPUs State    Time(s)
Domain-0                                     0  3000     5 r-----    1057.9

# xl debug-keys r
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=223871439875
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN) Waitqueue:
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0


# xl create mydomu.cfg
Parsing config from mydomu.cfg
(XEN) Power on resource 215

# xl list
Name                                        ID   Mem VCPUs State    Time(s)
Domain-0                                     0  3000     5 r-----    1152.1
mydomu                                       2   512     1 ------       0.0

# xl debug-keys r
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=241210530250
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN)     Domain: 2
(XEN)       7: [2.0] pcpu=-1
(XEN) Waitqueue: d2v0
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0

# xl dmesg
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000080200000 - 00000000ffffffff
(XEN) RAM: 0000000880000000 - 00000008ffffffff
(XEN)
(XEN) MODULE[0]: 0000000080400000 - 000000008054d848 Xen
(XEN) MODULE[1]: 0000000083000000 - 0000000083018000 Device Tree
(XEN) MODULE[2]: 0000000088000000 - 0000000089701200 Kernel
(XEN)  RESVD[0]: 0000000088000000 - 0000000090000000
(XEN)  RESVD[1]: 0000000083000000 - 0000000083018000
(XEN)  RESVD[2]: 0000000084000000 - 0000000085ffffff
(XEN)  RESVD[3]: 0000000086000000 - 00000000863fffff
(XEN)  RESVD[4]: 0000000090000000 - 00000000903fffff
(XEN)  RESVD[5]: 0000000090400000 - 0000000091ffffff
(XEN)  RESVD[6]: 0000000092000000 - 00000000921fffff
(XEN)  RESVD[7]: 0000000092200000 - 00000000923fffff
(XEN)  RESVD[8]: 0000000092400000 - 00000000943fffff
(XEN)  RESVD[9]: 0000000094400000 - 0000000094bfffff
(XEN)
(XEN) CMDLINE[0000000088000000]:chosen console=hvc0 earlycon=xen 
root=/dev/mmcblk0p3 mem=3000M hostname=myhost 
video=HDMI-A-1:1920x1080@60 imxdrm.legacyfb_depth=32   quiet loglevel=3 
logo.nologo vt.global_cursor_default=0
(XEN)
(XEN) Command line: console=dtuart dtuart=/serial@5a060000 
dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin 
sched=null vwfi=native
(XEN) Domain heap initialised
(XEN) Booting using Device Tree
(XEN) partition id 4
(XEN) Domain name mydomu
(XEN) *****Initialized MU
(XEN) Looking for dtuart at "/serial@5a060000", options ""
  Xen 4.13.1-pre
(XEN) Xen version 4.13.1-pre (anders@builder.local) 
(aarch64-poky-linux-gcc (GCC) 8.3.0) debug=n  Fri Jan 22 17:32:33 UTC 2021
(XEN) Latest ChangeSet: Wed Feb 27 17:56:28 2019 +0800 git:b64b8df-dirty
(XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4
(XEN) 64-bit Execution:
(XEN)   Processor Features: 0000000001002222 0000000000000000
(XEN)     Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN)     Extensions: FloatingPoint AdvancedSIMD GICv3-SysReg
(XEN)   Debug Features: 0000000010305106 0000000000000000
(XEN)   Auxiliary Features: 0000000000000000 0000000000000000
(XEN)   Memory Model Features: 0000000000001122 0000000000000000
(XEN)   ISA Features:  0000000000011120 0000000000000000
(XEN) 32-bit Execution:
(XEN)   Processor Features: 00000131:10011011
(XEN)     Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN)     Extensions: GenericTimer Security
(XEN)   Debug Features: 03010066
(XEN)   Auxiliary Features: 00000000
(XEN)   Memory Model Features: 10201105 40000000 01260000 02102211
(XEN)  ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 8000 KHz
(XEN) GICv3 initialization:
(XEN)       gic_dist_addr=0x00000051a00000
(XEN)       gic_maintenance_irq=25
(XEN)       gic_rdist_stride=0
(XEN)       gic_rdist_regions=1
(XEN)       redistributor regions:
(XEN)         - region 0: 0x00000051b00000 - 0x00000051bc0000
(XEN) GICv3 compatible with GICv2 cbase 0x00000052000000 vbase 
0x00000052020000
(XEN) GICv3: 544 lines, (IID 0001143b).
(XEN) GICv3: CPU0: Found redistributor in region 0 @000000004002d000
(XEN) XSM Framework v1.0.0 initialized
(XEN) Initialising XSM SILO mode
(XEN) Using scheduler: null Scheduler (null)
(XEN) Initializing null scheduler
(XEN) WARNING: This is experimental software in development.
(XEN) Use at your own risk.
(XEN) Allocated console ring of 16 KiB.
(XEN) Bringing up CPU1
(XEN) GICv3: CPU1: Found redistributor in region 0 @00000000400ad000
(XEN) Bringing up CPU2
(XEN) GICv3: CPU2: Found redistributor in region 0 @00000000400cd000
(XEN) Bringing up CPU3
(XEN) GICv3: CPU3: Found redistributor in region 0 @000000004004d000
(XEN) Bringing up CPU4
(XEN) GICv3: CPU4: Found redistributor in region 0 @000000004006d000
(XEN) Bringing up CPU5
(XEN) GICv3: CPU5: Found redistributor in region 0 @000000004008d000
(XEN) Brought up 6 CPUs
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) P2M: 40-bit IPA with 40-bit PA and 8-bit VMID
(XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading d0 kernel from boot module @ 0000000088000000
(XEN) Allocating 1:1 mappings totalling 3000MB for dom0:
(XEN) BANK[0] 0x00000098000000-0x00000100000000 (1664MB)
(XEN) BANK[1] 0x00000880000000-0x000008c0000000 (1024MB)
(XEN) BANK[2] 0x000008d0000000-0x000008e0000000 (256MB)
(XEN) BANK[3] 0x000008ec800000-0x000008f0000000 (56MB)
(XEN) Grant table range: 0x00000080400000-0x00000080440000
(XEN) HACK: skip /imx8_gpu_ss setup!
(XEN) Allocating PPI 16 for event channel interrupt
(XEN) Loading zImage from 0000000088000000 to 
0000000098080000-0000000099781200
(XEN) Loading d0 DTB to 0x00000000a0000000-0x00000000a001772e
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Scrubbing Free RAM in background
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) ***************************************************
(XEN) WARNING: HMP COMPUTING HAS BEEN ENABLED.
(XEN) It has implications on the security and stability of the system,
(XEN) unless the cpu affinity of all domains is specified.
(XEN) ***************************************************
(XEN) 3... 2... 1...
(XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
(XEN) Freed 336kB init memory.
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER4
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER8
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER12
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER16
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER20
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER24
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER28
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER32
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER36
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER40
(XEN) Power on resource 215
(XEN) printk: 11 messages suppressed.
(XEN) d1v0: vGICR: SGI: unhandled word write 0x000000ffffffff to ICACTIVER0
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=191793008000
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN) Waitqueue:
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0
(XEN)     run: [1.0] pcpu=5
(XEN) End of domain_destroy function
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=223871439875
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN) Waitqueue:
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0
(XEN) Power on resource 215
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=241210530250
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN)     Domain: 2
(XEN)       7: [2.0] pcpu=-1
(XEN) Waitqueue: d2v0
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0


I then repeated the "xl create" a few times until it caused the 
complete_domain_destroy function to be called.
Then the information looked like this:

# xl debug-keys r
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=850134473000
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 2
(XEN)       6: [2.0] pcpu=5
(XEN)     Domain: 3
(XEN)     Domain: 4
(XEN) Waitqueue:
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d2v0
(XEN)     run: [2.0] pcpu=5

# xl list
Name                                        ID   Mem VCPUs State    Time(s)
Domain-0                                     0  3000     5 r-----    4277.7
mydomu                                       2   511     1 r-----      15.6


>
>> One possibility is the NULL scheduler doesn't release the pCPUs until
>> the domain is fully destroyed. So if there is no pCPU free, it
>> wouldn't
>> be able to schedule the new domain.
>>
>> However, I would have expected the NULL scheduler to refuse the
>> domain
>> to create if there is no pCPU available.
>>
> Yeah but, unfortunately, the scheduler does not have it easy to fail
> domain creation at this stage (i.e., when we realize there are no
> available pCPUs). That's the reason why the NULL scheduler has a
> waitqueue, where vCPUs that cannot be put on any pCPU are put.
>
> Of course, this is a configuration error (or a bug, like maybe in this
> case :-/), and we print warnings when it happens.
>
>> @Dario, @Stefano, do you know when the NULL scheduler decides to
>> allocate the pCPU?
>>
> On which pCPU to allocate a vCPU is decided in null_unit_insert(),
> called from sched_alloc_unit() and sched_init_vcpu().
>
> On the other hand, a vCPU is properly removed from its pCPU, hence
> making the pCPU free for being assigned to some other vCPU, in
> unit_deassign(), called from null_unit_remove(), which in turn is
> called from sched_destroy_vcpu() Which is indeed called from
> complete_domain_destroy().
>
> Regards




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-26 17:03             ` Anders Törnqvist
@ 2021-01-26 22:31               ` Dario Faggioli
  2021-01-29  8:08                 ` Anders Törnqvist
  0 siblings, 1 reply; 31+ messages in thread
From: Dario Faggioli @ 2021-01-26 22:31 UTC (permalink / raw)
  To: Anders Törnqvist, Julien Grall, xen-devel, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 6402 bytes --]

On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote:
> On 1/25/21 5:11 PM, Dario Faggioli wrote:
> > On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
> > > Hi Anders,
> > > 
> > > On 22/01/2021 08:06, Anders Törnqvist wrote:
> > > > On 1/22/21 12:35 AM, Dario Faggioli wrote:
> > > > > On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
> > > > - booting with "sched=null vwfi=native" but not doing the IRQ
> > > > passthrough that you mentioned above
> > > > "xl destroy" gives
> > > > (XEN) End of domain_destroy function
> > > > 
> > > > Then a "xl create" says nothing but the domain has not started
> > > > correct.
> > > > "xl list" look like this for the domain:
> > > > mydomu                                   2   512     1 ------
> > > > 0.0
> > > This is odd. I would have expected ``xl create`` to fail if
> > > something
> > > went wrong with the domain creation.
> > > 
> > So, Anders, would it be possible to issue a:
> > 
> > # xl debug-keys r
> > # xl dmesg
> > 
> > And send it to us ?
> > 
> > Ideally, you'd do it:
> >   - with Julien's patch (the one he sent the other day, and that
> > you
> >     have already given a try to) applied
> >   - while you are in the state above, i.e., after having tried to
> >     destroy a domain and failing
> >   - and maybe again after having tried to start a new domain
> Here are some logs.
> 
Great, thanks a lot!

> The system is booted as before with the patch and the domu config
> does 
> not have the IRQs.
> 
Ok.

> # xl list
> Name                                        ID   Mem VCPUs State   
> Time(s)
> Domain-0                                     0  3000     5 r-----    
> 820.1
> mydomu                                       1   511     1 r-----    
> 157.0
> 
> # xl debug-keys r
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=191793008000
> (XEN) Online Cpus: 0-5
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-5
> (XEN) Scheduler: null Scheduler (null)
> (XEN)     cpus_free =
> (XEN) Domain info:
> (XEN)     Domain: 0
> (XEN)       1: [0.0] pcpu=0
> (XEN)       2: [0.1] pcpu=1
> (XEN)       3: [0.2] pcpu=2
> (XEN)       4: [0.3] pcpu=3
> (XEN)       5: [0.4] pcpu=4
> (XEN)     Domain: 1
> (XEN)       6: [1.0] pcpu=5
> (XEN) Waitqueue:
>
So far, so good. All vCPUs are running on their assigned pCPU, and
there is no vCPU wanting to run but not having a pCPU where to do so.

> (XEN) Command line: console=dtuart dtuart=/serial@5a060000 
> dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin 
> sched=null vwfi=native
>
Oh, just as a side note (and most likely unrelated to the problem we're
discussing), you should be able to get rid of dom0_vcpus_pin.

The NULL scheduler will do something similar to what that option itself
does anyway. And with the benefit that, if you want, you can actually
change to what pCPUs the dom0's vCPU are pinned. While, if you use
dom0_vcpus_pin, you can't.

So using it has only downsides (and that's true in general, if you
ask me, but particularly so if using NULL).
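
(In case it's useful: if you ever do want an explicit 1:1 pinning for dom0's
vCPUs without the boot parameter, you can always set it up at runtime with
something like the following -- purely illustrative, adjust to your actual
vCPU/pCPU numbering:

# xl vcpu-pin Domain-0 0 0
# xl vcpu-pin Domain-0 1 1
# xl vcpu-pin Domain-0 2 2
# xl vcpu-pin Domain-0 3 3
# xl vcpu-pin Domain-0 4 4

...and, contrary to dom0_vcpus_pin, you can change or undo it later.)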

> # xl destroy mydomu
> (XEN) End of domain_destroy function
> 
> # xl list
> Name                                        ID   Mem VCPUs State   
> Time(s)
> Domain-0                                     0  3000     5 r-----   
> 1057.9
> 
> # xl debug-keys r
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=223871439875
> (XEN) Online Cpus: 0-5
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-5
> (XEN) Scheduler: null Scheduler (null)
> (XEN)     cpus_free =
> (XEN) Domain info:
> (XEN)     Domain: 0
> (XEN)       1: [0.0] pcpu=0
> (XEN)       2: [0.1] pcpu=1
> (XEN)       3: [0.2] pcpu=2
> (XEN)       4: [0.3] pcpu=3
> (XEN)       5: [0.4] pcpu=4
> (XEN)     Domain: 1
> (XEN)       6: [1.0] pcpu=5
>
Right. And from the fact that: 1) we only see the "End of
domain_destroy function" line in the logs, and 2) we see that the vCPU
is still listed here, we have our confirmation (like there was the
need for it :-/) that domain destruction is done only partially.

> # xl create mydomu.cfg
> Parsing config from mydomu.cfg
> (XEN) Power on resource 215
> 
> # xl list
> Name                                        ID   Mem VCPUs State   
> Time(s)
> Domain-0                                     0  3000     5 r-----   
> 1152.1
> mydomu                                       2   512     1 ------
>        0.0
> 
> # xl debug-keys r
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=241210530250
> (XEN) Online Cpus: 0-5
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-5
> (XEN) Scheduler: null Scheduler (null)
> (XEN)     cpus_free =
> (XEN) Domain info:
> (XEN)     Domain: 0
> (XEN)       1: [0.0] pcpu=0
> (XEN)       2: [0.1] pcpu=1
> (XEN)       3: [0.2] pcpu=2
> (XEN)       4: [0.3] pcpu=3
> (XEN)       5: [0.4] pcpu=4
> (XEN)     Domain: 1
> (XEN)       6: [1.0] pcpu=5
> (XEN)     Domain: 2
> (XEN)       7: [2.0] pcpu=-1
> (XEN) Waitqueue: d2v0
>
Yep, so, as we were suspecting, domain 1 was not destroyed properly.
Specifically, we did not get to the point where the vCPU is deallocated
and the pCPU to which such vCPU has been assigned by the NULL
scheduler is released.

This means that the new vCPU (i.e., d2v0) has, from the point of view
of the NULL scheduler, no pCPU where to run. And it's therefore parked
in the waitqueue.

There should be a warning about that, which I don't see... but perhaps
I'm just misremembering.

Anyway, cool, this makes things even more clear.

Thanks again for letting us see these logs.
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-26 22:31               ` Dario Faggioli
@ 2021-01-29  8:08                 ` Anders Törnqvist
  2021-01-29  8:18                   ` Jürgen Groß
  2021-01-30 17:59                   ` Dario Faggioli
  0 siblings, 2 replies; 31+ messages in thread
From: Anders Törnqvist @ 2021-01-29  8:08 UTC (permalink / raw)
  To: Dario Faggioli, Julien Grall, xen-devel, Stefano Stabellini

On 1/26/21 11:31 PM, Dario Faggioli wrote:
> On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote:
>> On 1/25/21 5:11 PM, Dario Faggioli wrote:
>>> On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
>>>> Hi Anders,
>>>>
>>>> On 22/01/2021 08:06, Anders Törnqvist wrote:
>>>>> On 1/22/21 12:35 AM, Dario Faggioli wrote:
>>>>>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>>>>> - booting with "sched=null vwfi=native" but not doing the IRQ
>>>>> passthrough that you mentioned above
>>>>> "xl destroy" gives
>>>>> (XEN) End of domain_destroy function
>>>>>
>>>>> Then a "xl create" says nothing but the domain has not started
>>>>> correct.
>>>>> "xl list" look like this for the domain:
>>>>> mydomu                                   2   512     1 ------
>>>>> 0.0
>>>> This is odd. I would have expected ``xl create`` to fail if
>>>> something
>>>> went wrong with the domain creation.
>>>>
>>> So, Anders, would it be possible to issue a:
>>>
>>> # xl debug-keys r
>>> # xl dmesg
>>>
>>> And send it to us ?
>>>
>>> Ideally, you'd do it:
>>>    - with Julien's patch (the one he sent the other day, and that
>>> you
>>>      have already given a try to) applied
>>>    - while you are in the state above, i.e., after having tried to
>>>      destroy a domain and failing
>>>    - and maybe again after having tried to start a new domain
>> Here are some logs.
>>
> Great, thanks a lot!
>
>> The system is booted as before with the patch and the domu config
>> does
>> not have the IRQs.
>>
> Ok.
>
>> # xl list
>> Name                                        ID   Mem VCPUs State
>> Time(s)
>> Domain-0                                     0  3000     5 r-----
>> 820.1
>> mydomu                                       1   511     1 r-----
>> 157.0
>>
>> # xl debug-keys r
>> (XEN) sched_smt_power_savings: disabled
>> (XEN) NOW=191793008000
>> (XEN) Online Cpus: 0-5
>> (XEN) Cpupool 0:
>> (XEN) Cpus: 0-5
>> (XEN) Scheduler: null Scheduler (null)
>> (XEN)     cpus_free =
>> (XEN) Domain info:
>> (XEN)     Domain: 0
>> (XEN)       1: [0.0] pcpu=0
>> (XEN)       2: [0.1] pcpu=1
>> (XEN)       3: [0.2] pcpu=2
>> (XEN)       4: [0.3] pcpu=3
>> (XEN)       5: [0.4] pcpu=4
>> (XEN)     Domain: 1
>> (XEN)       6: [1.0] pcpu=5
>> (XEN) Waitqueue:
>>
> So far, so good. All vCPUs are running on their assigned pCPU, and
> there is no vCPU wanting to run but not having a pCPU where to do so.
>
>> (XEN) Command line: console=dtuart dtuart=/serial@5a060000
>> dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin
>> sched=null vwfi=native
>>
> Oh, just as a side note (and most likely unrelated to the problem we're
> discussing), you should be able to get rid of dom0_vcpus_pin.
>
> The NULL scheduler will do something similar to what that option itself
> does anyway. And with the benefit that, if you want, you can actually
> change to what pCPUs the dom0's vCPU are pinned. While, if you use
> dom0_vcpus_pin, you can't.
>
> So using it has only downsides (and that's true in general, if you
> ask me, but particularly so if using NULL).
Thanks for the feedback.
I removed dom0_vcpus_pin. And, as you said, it seems to be unrelated to 
the problem we're discussing. The system still behaves the same.

When the dom0_vcpus_pin is removed. xl vcpu-list looks like this:

Name                                ID  VCPU   CPU State Time(s) 
Affinity (Hard / Soft)
Domain-0                             0     0    0   r--      29.4 all / all
Domain-0                             0     1    1   r--      28.7 all / all
Domain-0                             0     2    2   r--      28.7 all / all
Domain-0                             0     3    3   r--      28.6 all / all
Domain-0                             0     4    4   r--      28.6 all / all
mydomu                              1     0    5   r--      21.6 5 / all

 From this listing (with "all" as hard affinity for dom0) one might read 
it like dom0 is not pinned with hard affinity to any specific pCPUs at 
all but mydomu is pinned to pCPU 5.
Will the dom0_max_vcpus=5 in this case guarantee that dom0 only will run 
on pCPU 0-4 so that mydomu always will have pCPU 5 for itself only?

What if I would like mydomu to be the only domain that uses pCPU 2?

>
>> # xl destroy mydomu
>> (XEN) End of domain_destroy function
>>
>> # xl list
>> Name                                        ID   Mem VCPUs State
>> Time(s)
>> Domain-0                                     0  3000     5 r-----
>> 1057.9
>>
>> # xl debug-keys r
>> (XEN) sched_smt_power_savings: disabled
>> (XEN) NOW=223871439875
>> (XEN) Online Cpus: 0-5
>> (XEN) Cpupool 0:
>> (XEN) Cpus: 0-5
>> (XEN) Scheduler: null Scheduler (null)
>> (XEN)     cpus_free =
>> (XEN) Domain info:
>> (XEN)     Domain: 0
>> (XEN)       1: [0.0] pcpu=0
>> (XEN)       2: [0.1] pcpu=1
>> (XEN)       3: [0.2] pcpu=2
>> (XEN)       4: [0.3] pcpu=3
>> (XEN)       5: [0.4] pcpu=4
>> (XEN)     Domain: 1
>> (XEN)       6: [1.0] pcpu=5
>>
> Right. And from the fact that: 1) we only see the "End of
> domain_destroy function" line in the logs, and 2) we see that the vCPU
> is still listed here, we have our confirmation (like there was the
> need for it :-/) that domain destruction is done only partially.
Yes it looks like that.
>
>> # xl create mydomu.cfg
>> Parsing config from mydomu.cfg
>> (XEN) Power on resource 215
>>
>> # xl list
>> Name                                        ID   Mem VCPUs State
>> Time(s)
>> Domain-0                                     0  3000     5 r-----
>> 1152.1
>> mydomu                                       2   512     1 ------
>>         0.0
>>
>> # xl debug-keys r
>> (XEN) sched_smt_power_savings: disabled
>> (XEN) NOW=241210530250
>> (XEN) Online Cpus: 0-5
>> (XEN) Cpupool 0:
>> (XEN) Cpus: 0-5
>> (XEN) Scheduler: null Scheduler (null)
>> (XEN)     cpus_free =
>> (XEN) Domain info:
>> (XEN)     Domain: 0
>> (XEN)       1: [0.0] pcpu=0
>> (XEN)       2: [0.1] pcpu=1
>> (XEN)       3: [0.2] pcpu=2
>> (XEN)       4: [0.3] pcpu=3
>> (XEN)       5: [0.4] pcpu=4
>> (XEN)     Domain: 1
>> (XEN)       6: [1.0] pcpu=5
>> (XEN)     Domain: 2
>> (XEN)       7: [2.0] pcpu=-1
>> (XEN) Waitqueue: d2v0
>>
> Yep, so, as we were suspecting, domain 1 was not destroyed properly.
> Specifically, we did not get to the point where the vCPU is deallocated
> and the pCPU to which such vCPU has been assigned by the NULL
> scheduler is released.
>
> This means that the new vCPU (i.e., d2v0) has, from the point of view
> of the NULL scheduler, no pCPU where to run. And it's therefore parked
> in the waitqueue.
>
> There should be a warning about that, which I don't see... but perhaps
> I'm just misremembering.
>
> Anyway, cool, this makes things even more clear.
>
> Thanks again for letting us see these logs.

Thanks for the attention to this :-)

Any ideas for how to solve it?




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-29  8:08                 ` Anders Törnqvist
@ 2021-01-29  8:18                   ` Jürgen Groß
  2021-01-29 10:16                     ` Dario Faggioli
  2021-01-30 17:59                   ` Dario Faggioli
  1 sibling, 1 reply; 31+ messages in thread
From: Jürgen Groß @ 2021-01-29  8:18 UTC (permalink / raw)
  To: Anders Törnqvist, Dario Faggioli, Julien Grall, xen-devel,
	Stefano Stabellini


[-- Attachment #1.1.1: Type: text/plain, Size: 5145 bytes --]

On 29.01.21 09:08, Anders Törnqvist wrote:
> On 1/26/21 11:31 PM, Dario Faggioli wrote:
>> On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote:
>>> On 1/25/21 5:11 PM, Dario Faggioli wrote:
>>>> On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
>>>>> Hi Anders,
>>>>>
>>>>> On 22/01/2021 08:06, Anders Törnqvist wrote:
>>>>>> On 1/22/21 12:35 AM, Dario Faggioli wrote:
>>>>>>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>>>>>> - booting with "sched=null vwfi=native" but not doing the IRQ
>>>>>> passthrough that you mentioned above
>>>>>> "xl destroy" gives
>>>>>> (XEN) End of domain_destroy function
>>>>>>
>>>>>> Then a "xl create" says nothing but the domain has not started
>>>>>> correct.
>>>>>> "xl list" look like this for the domain:
>>>>>> mydomu                                   2   512     1 ------
>>>>>> 0.0
>>>>> This is odd. I would have expected ``xl create`` to fail if
>>>>> something
>>>>> went wrong with the domain creation.
>>>>>
>>>> So, Anders, would it be possible to issue a:
>>>>
>>>> # xl debug-keys r
>>>> # xl dmesg
>>>>
>>>> And send it to us ?
>>>>
>>>> Ideally, you'd do it:
>>>>    - with Julien's patch (the one he sent the other day, and that
>>>> you
>>>>      have already given a try to) applied
>>>>    - while you are in the state above, i.e., after having tried to
>>>>      destroy a domain and failing
>>>>    - and maybe again after having tried to start a new domain
>>> Here are some logs.
>>>
>> Great, thanks a lot!
>>
>>> The system is booted as before with the patch and the domu config
>>> does
>>> not have the IRQs.
>>>
>> Ok.
>>
>>> # xl list
>>> Name                                        ID   Mem VCPUs State
>>> Time(s)
>>> Domain-0                                     0  3000     5 r-----
>>> 820.1
>>> mydomu                                       1   511     1 r-----
>>> 157.0
>>>
>>> # xl debug-keys r
>>> (XEN) sched_smt_power_savings: disabled
>>> (XEN) NOW=191793008000
>>> (XEN) Online Cpus: 0-5
>>> (XEN) Cpupool 0:
>>> (XEN) Cpus: 0-5
>>> (XEN) Scheduler: null Scheduler (null)
>>> (XEN)     cpus_free =
>>> (XEN) Domain info:
>>> (XEN)     Domain: 0
>>> (XEN)       1: [0.0] pcpu=0
>>> (XEN)       2: [0.1] pcpu=1
>>> (XEN)       3: [0.2] pcpu=2
>>> (XEN)       4: [0.3] pcpu=3
>>> (XEN)       5: [0.4] pcpu=4
>>> (XEN)     Domain: 1
>>> (XEN)       6: [1.0] pcpu=5
>>> (XEN) Waitqueue:
>>>
>> So far, so good. All vCPUs are running on their assigned pCPU, and
>> there is no vCPU wanting to run but not having a pCPU where to do so.
>>
>>> (XEN) Command line: console=dtuart dtuart=/serial@5a060000
>>> dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin
>>> sched=null vwfi=native
>>>
>> Oh, just as a side note (and most likely unrelated to the problem we're
>> discussing), you should be able to get rid of dom0_vcpus_pin.
>>
>> The NULL scheduler will do something similar to what that option itself
>> does anyway. And with the benefit that, if you want, you can actually
>> change to what pCPUs the dom0's vCPU are pinned. While, if you use
>> dom0_vcpus_pin, you can't.
>>
>> So using it has only downsides (and that's true in general, if you
>> ask me, but particularly so if using NULL).
> Thanks for the feedback.
> I removed dom0_vcpus_pin. And, as you said, it seems to be unrelated to 
> the problem we're discussing. The system still behaves the same.
> 
> When the dom0_vcpus_pin is removed. xl vcpu-list looks like this:
> 
> Name                                ID  VCPU   CPU State Time(s) 
> Affinity (Hard / Soft)
> Domain-0                             0     0    0   r--      29.4 all / all
> Domain-0                             0     1    1   r--      28.7 all / all
> Domain-0                             0     2    2   r--      28.7 all / all
> Domain-0                             0     3    3   r--      28.6 all / all
> Domain-0                             0     4    4   r--      28.6 all / all
> mydomu                              1     0    5   r--      21.6 5 / all
> 
>  From this listing (with "all" as hard affinity for dom0) one might read 
> it like dom0 is not pinned with hard affinity to any specific pCPUs at 
> all but mydomu is pinned to pCPU 5.
> Will the dom0_max_vcpus=5 in this case guarantee that dom0 only will run 
> on pCPU 0-4 so that mydomu always will have pCPU 5 for itself only?

No.

> 
> What if I would like mydomu to be the only domain that uses pCPU 2?

Setup a cpupool with that pcpu assigned to it and put your domain into
that cpupool.
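
For example, something along these lines should do it (the pool name is just
an example, and the exact syntax is from memory, so double check it against
your xl):

# cat pool-rt.cfg
name = "pool-rt"
sched = "null"
cpus = ["2"]
# xl cpupool-cpu-remove Pool-0 2
# xl cpupool-create pool-rt.cfg
# xl cpupool-migrate mydomu pool-rt

After that, pCPU 2 belongs to the new pool and only mydomu can be scheduled
on it. You can also put pool = "pool-rt" in the domain's config file, so
that it starts in that pool directly.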


Juergen

[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-29  8:18                   ` Jürgen Groß
@ 2021-01-29 10:16                     ` Dario Faggioli
  2021-02-01  6:53                       ` Anders Törnqvist
  0 siblings, 1 reply; 31+ messages in thread
From: Dario Faggioli @ 2021-01-29 10:16 UTC (permalink / raw)
  To: Jürgen Groß,
	Anders Törnqvist, Julien Grall, xen-devel,
	Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 6388 bytes --]

On Fri, 2021-01-29 at 09:18 +0100, Jürgen Groß wrote:
> On 29.01.21 09:08, Anders Törnqvist wrote:
> > > 
> > > So using it has only downsides (and that's true in general, if
> > > you
> > > ask me, but particularly so if using NULL).
> > Thanks for the feedback.
> > I removed dom0_vcpus_pin. And, as you said, it seems to be
> > unrelated to 
> > the problem we're discussing. 
>
Right. Don't put it back, and stay away from it, if you accept a bit of
advice. :-)

> > The system still behaves the same.
> > 
Yeah, that was expected.

> > When the dom0_vcpus_pin is removed. xl vcpu-list looks like this:
> > 
> > Name                                ID  VCPU   CPU State Time(s) 
> > Affinity (Hard / Soft)
> > Domain-0                             0     0    0   r--      29.4
> > all / all
> > Domain-0                             0     1    1   r--      28.7
> > all / all
> > Domain-0                             0     2    2   r--      28.7
> > all / all
> > Domain-0                             0     3    3   r--      28.6
> > all / all
> > Domain-0                             0     4    4   r--      28.6
> > all / all
> > mydomu                              1     0    5   r--      21.6 5
> > / all
> > 
Right, and it makes sense for it to look like this.

> >  From this listing (with "all" as hard affinity for dom0) one might
> > read 
> > it like dom0 is not pinned with hard affinity to any specific pCPUs
> > at 
> > all but mydomu is pinned to pCPU 5.
> > Will the dom0_max_vcpus=5 in this case guarantee that dom0 only
> > will run 
> > on pCPU 0-4 so that mydomu always will have pCPU 5 for itself only?
> 
> No.
>
Well, yes... if you use the NULL scheduler. Which is in use here. :-)

Basically, the NULL scheduler _always_ assigns one and only one vCPU to
each pCPU. This happens at domain (well, at the vCPU) creation time.
And it _never_ moves a vCPU away from the pCPU to which it has assigned
it.

And it also _never_ changes this vCPU-->pCPU assignment/relationship,
unless some special event happens (such as, either the vCPU and/or the
pCPU goes offline, is removed from the cpupool, you change the affinity
[as I'll explain below], etc).

This is the NULL scheduler's mission and only job, so it does that by
default, _without_ any need for an affinity to be specified.

So, how can affinity be useful in the NULL scheduler? Well, it's useful
if you want to control and decide to what pCPU a certain vCPU should
go.

So, let's make an example. Let's say you are in this situation:

Name                                ID  VCPU   CPU State Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   r--     29.4   all / all
Domain-0                             0     1    1   r--     28.7   all / all
Domain-0                             0     2    2   r--     28.7   all / all
Domain-0                             0     3    3   r--     28.6   all / all
Domain-0                             0     4    4   r--     28.6   all / all

I.e., you have 6 CPUs, you have only dom0, dom0 has 5 vCPUs and you are
not using dom0_vcpus_pin.

The NULL scheduler has put d0v0 on pCPU 0. And d0v0 is the only vCPU
that can run on pCPU 0, despite its affinities being "all"... because
it's what the NULL scheduler does for you and it's the reason why one
uses it! :-)

Similarly, it has put d0v1 on pCPU 1, d0v2 on pCPU 2, d0v3 on pCPU 3
and d0v4 on pCPU 4. And the "exclusivity guarantee" explained above
for d0v0 and pCPU 0, applies to all these other vCPUs and pCPUs as
well.

With no affinity being specified, which vCPU is assigned to which pCPU
is entirely under the NULL scheduler control. It has its heuristics
inside, to try to do that in a smart way, but that's an
internal/implementation detail and is not relevant here.

If you now create a domU with 1 vCPU, that vCPU will be assigned to
pCPU 5.

Now, let's say that, for whatever reason, you absolutely want that d0v2
to run on pCPU 5, instead of being assigned and run on pCPU 2 (which is
what the NULL scheduler decided to pick for it). Well, what you do is
use xl, set the affinity of d0v2 to pCPU 5, and you will get something
like this as a result:

Name                                ID  VCPU   CPU State Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   r--     29.4   all / all
Domain-0                             0     1    1   r--     28.7   all / all
Domain-0                             0     2    5   r--     28.7     5 / all
Domain-0                             0     3    3   r--     28.6   all / all
Domain-0                             0     4    4   r--     28.6   all / all
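
(For reference, the actual command for that would be something like:

# xl vcpu-pin Domain-0 2 5

i.e., hard-pin d0v2 to pCPU 5. For a domU you can get the same effect at
creation time with, e.g., cpus="5" in its config file.)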

So, affinity is indeed useful, even when using NULL, if you want to
diverge from the default behavior and enact a certain policy, maybe due
to the nature of your workload, the characteristics of your hardware,
or whatever.

It is not, however, necessary to set the affinity to:
 - have a vCPU always stay on one --and always the same one too--
   pCPU;
 - prevent any other vCPU from ever running on that pCPU.

That is guaranteed by the NULL scheduler itself. It just can't happen
that it behaves otherwise, because the whole point of doing it was to
make it simple (and fast :-)) *exactly* by avoiding teaching it how to
do such things. It can't do it, because the code for doing it is not
there... by design! :-D

And, BTW, if you now create a domU with 1 vCPU, that vCPU will be
assigned to pCPU  2.

> >
> > What if I would like mydomu to be the only domain that uses pCPU 2?
>
> Setup a cpupool with that pcpu assigned to it and put your domain into
> that cpupool.

Yes, with any other scheduler that is not NULL, that's the proper way
of doing it.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-29  8:08                 ` Anders Törnqvist
  2021-01-29  8:18                   ` Jürgen Groß
@ 2021-01-30 17:59                   ` Dario Faggioli
  2021-02-01  6:55                     ` Anders Törnqvist
                                       ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: Dario Faggioli @ 2021-01-30 17:59 UTC (permalink / raw)
  To: Anders Törnqvist, Julien Grall, xen-devel, Stefano Stabellini


[-- Attachment #1.1: Type: text/plain, Size: 1772 bytes --]

On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
> On 1/26/21 11:31 PM, Dario Faggioli wrote:
> > Thanks again for letting us see these logs.
> 
> Thanks for the attention to this :-)
> 
> Any ideas for how to solve it?
> 
So, you're up for testing patches, right?

How about applying these two, and letting me know what happens? :-D

They are on top of current staging. I can try to rebase on something
else, if it's easier for you to test.

Besides being attached, they're also available here:

https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix

I could not test them properly on ARM, as I don't have an ARM system
handy, so everything is possible really... just let me know.

It should at least build fine, AFAICT from here:

https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213
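
(If fetching the branch is easier for you than applying the attached
patches, something like this should work, assuming the repo and branch
names match the URL above:

$ git fetch https://gitlab.com/xen-project/people/dfaggioli/xen.git rcu-quiet-fix
$ git checkout -b rcu-quiet-fix FETCH_HEAD

but, again, the attachments are the authoritative version.)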

Julien, back in:

 https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36445@arm.com/


you said I should hook in enter_hypervisor_head(),
leave_hypervisor_tail(). Those functions are gone now and looking at
how the code changed, this is where I figured I should put the calls
(see the second patch). But feel free to educate me otherwise.

For x86 people that are listening... Do we have, in our beloved arch,
equally handy places (i.e., right before leaving Xen for a guest and
right after entering Xen from one), preferably in a C file, and for
all guests... like it seems to be the case on ARM?

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #1.2: 1-xen-rcu-rename-idle-ignore.patch --]
[-- Type: text/x-patch, Size: 10925 bytes --]

commit 2c38754fa733333a81e8dfab8abdfb18b9896e00
Author: Dario Faggioli <dfaggioli@suse.com>
Date:   Sat Jan 30 07:50:22 2021 +0000

    xen: rename RCU idle timer and cpumask
    
    Both the cpumask and the timer will be used in more generic
    circumstances, not only for CPUs that go idle. Change their names to
    reflect that.
    
    No functional change.
    
    Signed-off-by: Dario Faggioli <dfaggioli@suse.com>

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index a5a27af3de..e0bf842f13 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -55,8 +55,8 @@ static struct rcu_ctrlblk {
     int  next_pending;  /* Is the next batch already waiting?         */
 
     spinlock_t  lock __cacheline_aligned;
-    cpumask_t   cpumask; /* CPUs that need to switch in order ... */
-    cpumask_t   idle_cpumask; /* ... unless they are already idle */
+    cpumask_t   cpumask; /* CPUs that need to switch in order ...   */
+    cpumask_t   ignore_cpumask; /* ... unless they are already idle */
     /* for current batch to proceed.        */
 } __cacheline_aligned rcu_ctrlblk = {
     .cur = -300,
@@ -88,8 +88,8 @@ struct rcu_data {
     long            last_rs_qlen;     /* qlen during the last resched */
 
     /* 3) idle CPUs handling */
-    struct timer idle_timer;
-    bool idle_timer_active;
+    struct timer cb_timer;
+    bool cb_timer_active;
 
     bool            process_callbacks;
     bool            barrier_active;
@@ -121,22 +121,22 @@ struct rcu_data {
  * CPU that is going idle. The user can change this, via a boot time
  * parameter, but only up to 100ms.
  */
-#define IDLE_TIMER_PERIOD_MAX     MILLISECS(100)
-#define IDLE_TIMER_PERIOD_DEFAULT MILLISECS(10)
-#define IDLE_TIMER_PERIOD_MIN     MICROSECS(100)
+#define CB_TIMER_PERIOD_MAX     MILLISECS(100)
+#define CB_TIMER_PERIOD_DEFAULT MILLISECS(10)
+#define CB_TIMER_PERIOD_MIN     MICROSECS(100)
 
-static s_time_t __read_mostly idle_timer_period;
+static s_time_t __read_mostly cb_timer_period;
 
 /*
- * Increment and decrement values for the idle timer handler. The algorithm
+ * Increment and decrement values for the callback timer handler. The algorithm
  * works as follows:
  * - if the timer actually fires, and it finds out that the grace period isn't
- *   over yet, we add IDLE_TIMER_PERIOD_INCR to the timer's period;
+ *   over yet, we add CB_TIMER_PERIOD_INCR to the timer's period;
  * - if the timer actually fires and it finds the grace period over, we
  *   subtract IDLE_TIMER_PERIOD_DECR from the timer's period.
  */
-#define IDLE_TIMER_PERIOD_INCR    MILLISECS(10)
-#define IDLE_TIMER_PERIOD_DECR    MICROSECS(100)
+#define CB_TIMER_PERIOD_INCR    MILLISECS(10)
+#define CB_TIMER_PERIOD_DECR    MICROSECS(100)
 
 static DEFINE_PER_CPU(struct rcu_data, rcu_data);
 
@@ -364,7 +364,7 @@ static void rcu_start_batch(struct rcu_ctrlblk *rcp)
         * This barrier is paired with the one in rcu_idle_enter().
         */
         smp_mb();
-        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->idle_cpumask);
+        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->ignore_cpumask);
     }
 }
 
@@ -523,7 +523,7 @@ int rcu_needs_cpu(int cpu)
 {
     struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
 
-    return (rdp->curlist && !rdp->idle_timer_active) || rcu_pending(cpu);
+    return (rdp->curlist && !rdp->cb_timer_active) || rcu_pending(cpu);
 }
 
 /*
@@ -531,7 +531,7 @@ int rcu_needs_cpu(int cpu)
  * periodically poke rcu_pedning(), so that it will invoke the callback
  * not too late after the end of the grace period.
  */
-static void rcu_idle_timer_start(void)
+static void cb_timer_start(void)
 {
     struct rcu_data *rdp = &this_cpu(rcu_data);
 
@@ -543,48 +543,48 @@ static void rcu_idle_timer_start(void)
     if (likely(!rdp->curlist))
         return;
 
-    set_timer(&rdp->idle_timer, NOW() + idle_timer_period);
-    rdp->idle_timer_active = true;
+    set_timer(&rdp->cb_timer, NOW() + cb_timer_period);
+    rdp->cb_timer_active = true;
 }
 
-static void rcu_idle_timer_stop(void)
+static void cb_timer_stop(void)
 {
     struct rcu_data *rdp = &this_cpu(rcu_data);
 
-    if (likely(!rdp->idle_timer_active))
+    if (likely(!rdp->cb_timer_active))
         return;
 
-    rdp->idle_timer_active = false;
+    rdp->cb_timer_active = false;
 
     /*
      * In general, as the CPU is becoming active again, we don't need the
-     * idle timer, and so we want to stop it.
+     * callback timer, and so we want to stop it.
      *
-     * However, in case we are here because idle_timer has (just) fired and
+     * However, in case we are here because cb_timer has (just) fired and
      * has woken up the CPU, we skip stop_timer() now. In fact, when a CPU
      * wakes up from idle, this code always runs before do_softirq() has the
      * chance to check and deal with TIMER_SOFTIRQ. And if we stop the timer
      * now, the TIMER_SOFTIRQ handler will see it as inactive, and will not
-     * call rcu_idle_timer_handler().
+     * call cb_timer_handler().
      *
      * Therefore, if we see that the timer is expired already, we leave it
      * alone. The TIMER_SOFTIRQ handler will then run the timer routine, and
      * deactivate it.
      */
-    if ( !timer_is_expired(&rdp->idle_timer) )
-        stop_timer(&rdp->idle_timer);
+    if ( !timer_is_expired(&rdp->cb_timer) )
+        stop_timer(&rdp->cb_timer);
 }
 
-static void rcu_idle_timer_handler(void* data)
+static void cb_timer_handler(void* data)
 {
-    perfc_incr(rcu_idle_timer);
+    perfc_incr(rcu_callback_timer);
 
     if ( !cpumask_empty(&rcu_ctrlblk.cpumask) )
-        idle_timer_period = min(idle_timer_period + IDLE_TIMER_PERIOD_INCR,
-                                IDLE_TIMER_PERIOD_MAX);
+        cb_timer_period = min(cb_timer_period + CB_TIMER_PERIOD_INCR,
+                                CB_TIMER_PERIOD_MAX);
     else
-        idle_timer_period = max(idle_timer_period - IDLE_TIMER_PERIOD_DECR,
-                                IDLE_TIMER_PERIOD_MIN);
+        cb_timer_period = max(cb_timer_period - CB_TIMER_PERIOD_DECR,
+                                CB_TIMER_PERIOD_MIN);
 }
 
 void rcu_check_callbacks(int cpu)
@@ -608,7 +608,7 @@ static void rcu_move_batch(struct rcu_data *this_rdp, struct rcu_head *list,
 static void rcu_offline_cpu(struct rcu_data *this_rdp,
                             struct rcu_ctrlblk *rcp, struct rcu_data *rdp)
 {
-    kill_timer(&rdp->idle_timer);
+    kill_timer(&rdp->cb_timer);
 
     /* If the cpu going offline owns the grace period we can block
      * indefinitely waiting for it, so flush it here.
@@ -638,7 +638,7 @@ static void rcu_init_percpu_data(int cpu, struct rcu_ctrlblk *rcp,
     rdp->qs_pending = 0;
     rdp->cpu = cpu;
     rdp->blimit = blimit;
-    init_timer(&rdp->idle_timer, rcu_idle_timer_handler, rdp, cpu);
+    init_timer(&rdp->cb_timer, cb_timer_handler, rdp, cpu);
 }
 
 static int cpu_callback(
@@ -667,25 +667,39 @@ static struct notifier_block cpu_nfb = {
     .notifier_call = cpu_callback
 };
 
+/*
+ * We're changing the name of the parameter, to better reflect the fact that
+ * the timer is used for callbacks in general, when the CPU is either idle
+ * or executing guest code. We still accept the old parameter but, if both
+ * are specified, the new one ("rcu-callback-timer-period-ms") has priority.
+ */
+#define CB_TIMER_PERIOD_DEFAULT_MS ( CB_TIMER_PERIOD_DEFAULT / MILLISECS(1) )
+static unsigned int __initdata cb_timer_period_ms = CB_TIMER_PERIOD_DEFAULT_MS;
+integer_param("rcu-callback-timer-period-ms", cb_timer_period_ms);
+
+static unsigned int __initdata idle_timer_period_ms = CB_TIMER_PERIOD_DEFAULT_MS;
+integer_param("rcu-idle-timer-period-ms", idle_timer_period_ms);
+
 void __init rcu_init(void)
 {
     void *cpu = (void *)(long)smp_processor_id();
-    static unsigned int __initdata idle_timer_period_ms =
-                                    IDLE_TIMER_PERIOD_DEFAULT / MILLISECS(1);
-    integer_param("rcu-idle-timer-period-ms", idle_timer_period_ms);
+
+    if (idle_timer_period_ms != CB_TIMER_PERIOD_DEFAULT_MS &&
+        cb_timer_period_ms == CB_TIMER_PERIOD_DEFAULT_MS)
+        cb_timer_period_ms = idle_timer_period_ms;
 
     /* We don't allow 0, or anything higher than IDLE_TIMER_PERIOD_MAX */
-    if ( idle_timer_period_ms == 0 ||
-         idle_timer_period_ms > IDLE_TIMER_PERIOD_MAX / MILLISECS(1) )
+    if ( cb_timer_period_ms == 0 ||
+         cb_timer_period_ms > CB_TIMER_PERIOD_MAX / MILLISECS(1) )
     {
-        idle_timer_period_ms = IDLE_TIMER_PERIOD_DEFAULT / MILLISECS(1);
-        printk("WARNING: rcu-idle-timer-period-ms outside of "
+        cb_timer_period_ms = CB_TIMER_PERIOD_DEFAULT / MILLISECS(1);
+        printk("WARNING: rcu-callback-timer-period-ms outside of "
                "(0,%"PRI_stime"]. Resetting it to %u.\n",
-               IDLE_TIMER_PERIOD_MAX / MILLISECS(1), idle_timer_period_ms);
+               CB_TIMER_PERIOD_MAX / MILLISECS(1), cb_timer_period_ms);
     }
-    idle_timer_period = MILLISECS(idle_timer_period_ms);
+    cb_timer_period = MILLISECS(cb_timer_period_ms);
 
-    cpumask_clear(&rcu_ctrlblk.idle_cpumask);
+    cpumask_clear(&rcu_ctrlblk.ignore_cpumask);
     cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
     register_cpu_notifier(&cpu_nfb);
     open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
@@ -697,8 +711,8 @@ void __init rcu_init(void)
  */
 void rcu_idle_enter(unsigned int cpu)
 {
-    ASSERT(!cpumask_test_cpu(cpu, &rcu_ctrlblk.idle_cpumask));
-    cpumask_set_cpu(cpu, &rcu_ctrlblk.idle_cpumask);
+    ASSERT(!cpumask_test_cpu(cpu, &rcu_ctrlblk.ignore_cpumask));
+    cpumask_set_cpu(cpu, &rcu_ctrlblk.ignore_cpumask);
     /*
      * If some other CPU is starting a new grace period, we'll notice that
      * by seeing a new value in rcp->cur (different than our quiescbatch).
@@ -709,12 +723,12 @@ void rcu_idle_enter(unsigned int cpu)
      */
     smp_mb();
 
-    rcu_idle_timer_start();
+    cb_timer_start();
 }
 
 void rcu_idle_exit(unsigned int cpu)
 {
-    rcu_idle_timer_stop();
-    ASSERT(cpumask_test_cpu(cpu, &rcu_ctrlblk.idle_cpumask));
-    cpumask_clear_cpu(cpu, &rcu_ctrlblk.idle_cpumask);
+    cb_timer_stop();
+    ASSERT(cpumask_test_cpu(cpu, &rcu_ctrlblk.ignore_cpumask));
+    cpumask_clear_cpu(cpu, &rcu_ctrlblk.ignore_cpumask);
 }
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 08b182ccd9..d142534383 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -12,7 +12,7 @@ PERFCOUNTER(calls_from_multicall,       "calls from multicall")
 PERFCOUNTER(irqs,                   "#interrupts")
 PERFCOUNTER(ipis,                   "#IPIs")
 
-PERFCOUNTER(rcu_idle_timer,         "RCU: idle_timer")
+PERFCOUNTER(rcu_callback_timer,     "RCU: callback_timer")
 
 /* Generic scheduler counters (applicable to all schedulers) */
 PERFCOUNTER(sched_irq,              "sched: timer")

[-- Attachment #1.3: 2-rcu-idle-guest.patch --]
[-- Type: text/x-patch, Size: 10488 bytes --]

commit 770539f499e02010ffa26b99973a4120eba5827c
Author: Dario Faggioli <dfaggioli@suse.com>
Date:   Sat Jan 30 08:16:59 2021 +0000

    xen: deal with vCPUs that do not yield when idle
    
    Our RCU implementation needs that a CPU goes through Xen, from time to
    time, e.g., for a context switch, to properly mark the end of a grace
    period. This usually happens often enough, and CPUs that go idle and stay
    like that for a while are handled specially (so that they are recorded
    as quiescent and "leave" the grace period before starting idling).
    
    In principle, even a CPU that starts executing guest code may/should be
    marked as quiescent (it certainly can't be in the middle of a read side
    RCU critical section if it's leaving Xen and entering the guest!). This
    isn't done and in general does not cause problems. However, if the NULL
    scheduler is used and the guest is configured to not go back in Xen when
    its vCPUs become idle (e.g., with "vwfi=native" on ARM), grace periods
    may extend for a very long time and RCU callbacks be delayed to the
    point that, for instance, a domain is not properly destroyed.
    
    To fix that, we must start marking a CPU as quiescent as soon as it
    enters the guest (and, vice versa, register it back to the current grace
    period when it enters Xen). In order to do that, some changes to the API
    of rcu_idle_enter/exit were necessary (and the functions were renamed
    too).
    
    Note that, exactly like in the case where the CPU goes idle, we need to
    arm the callback timer when we enter guest context. In fact, if a CPU
    enters a guest with an RCU callback queued and then stays in that context
    for long enough, we still risk not executing the callback for long
    enough to cause problems.
    
    XXX ARM only for now.
    
    Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
    ---
    - Implemented for ARM only for now. Julien, is this where I should put
      the calls to rcu_quiet_enter/exit?
    - x86 people, do we have an equally handy place to do the same on
      our lovely arch? :-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index bdd3d3e5b5..90726afe15 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -47,7 +47,7 @@ static void do_idle(void)
 {
     unsigned int cpu = smp_processor_id();
 
-    rcu_idle_enter(cpu);
+    rcu_quiet_enter();
     /* rcu_idle_enter() can raise TIMER_SOFTIRQ. Process it now. */
     process_pending_softirqs();
 
@@ -59,7 +59,7 @@ static void do_idle(void)
     }
     local_irq_enable();
 
-    rcu_idle_exit(cpu);
+    rcu_quiet_exit();
 }
 
 void idle_loop(void)
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 6fa135050b..806870a38f 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2048,6 +2048,8 @@ void enter_hypervisor_from_guest(void)
 {
     struct vcpu *v = current;
 
+    rcu_quiet_exit();
+
     /*
      * If we pended a virtual abort, preserve it until it gets cleared.
      * See ARM ARM DDI 0487A.j D1.14.3 (Virtual Interrupts) for details,
@@ -2326,6 +2328,8 @@ static bool check_for_vcpu_work(void)
  */
 void leave_hypervisor_to_guest(void)
 {
+    rcu_quiet_enter();
+
     local_irq_disable();
 
     /*
diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index c092086b33..473a89e212 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -714,8 +714,8 @@ static void acpi_processor_idle(void)
 
     cpufreq_dbs_timer_suspend();
 
-    rcu_idle_enter(cpu);
-    /* rcu_idle_enter() can raise TIMER_SOFTIRQ. Process it now. */
+    rcu_quiet_enter();
+    /* rcu_quiet_enter() can raise TIMER_SOFTIRQ. Process it now. */
     process_pending_softirqs();
 
     /*
@@ -727,7 +727,7 @@ static void acpi_processor_idle(void)
     if ( !cpu_is_haltable(cpu) )
     {
         local_irq_enable();
-        rcu_idle_exit(cpu);
+        rcu_quiet_exit();
         cpufreq_dbs_timer_resume();
         return;
     }
@@ -852,7 +852,7 @@ static void acpi_processor_idle(void)
         /* Now in C0 */
         power->last_state = &power->states[0];
         local_irq_enable();
-        rcu_idle_exit(cpu);
+        rcu_quiet_exit();
         cpufreq_dbs_timer_resume();
         return;
     }
@@ -860,7 +860,7 @@ static void acpi_processor_idle(void)
     /* Now in C0 */
     power->last_state = &power->states[0];
 
-    rcu_idle_exit(cpu);
+    rcu_quiet_exit();
     cpufreq_dbs_timer_resume();
 
     if ( cpuidle_current_governor->reflect )
diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index f0c6ff9d52..dc12559396 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -778,8 +778,8 @@ static void mwait_idle(void)
 
 	cpufreq_dbs_timer_suspend();
 
-	rcu_idle_enter(cpu);
-	/* rcu_idle_enter() can raise TIMER_SOFTIRQ. Process it now. */
+	rcu_quiet_enter();
+	/* rcu_quiet_enter() can raise TIMER_SOFTIRQ. Process it now. */
 	process_pending_softirqs();
 
 	/* Interrupts must be disabled for C2 and higher transitions. */
@@ -787,7 +787,7 @@ static void mwait_idle(void)
 
 	if (!cpu_is_haltable(cpu)) {
 		local_irq_enable();
-		rcu_idle_exit(cpu);
+		rcu_quiet_exit();
 		cpufreq_dbs_timer_resume();
 		return;
 	}
@@ -829,7 +829,7 @@ static void mwait_idle(void)
 	if (!(lapic_timer_reliable_states & (1 << cx->type)))
 		lapic_timer_on();
 
-	rcu_idle_exit(cpu);
+	rcu_quiet_exit();
 	cpufreq_dbs_timer_resume();
 
 	if ( cpuidle_current_governor->reflect )
diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index e0bf842f13..54a292ae75 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -55,8 +55,8 @@ static struct rcu_ctrlblk {
     int  next_pending;  /* Is the next batch already waiting?         */
 
     spinlock_t  lock __cacheline_aligned;
-    cpumask_t   cpumask; /* CPUs that need to switch in order ...   */
-    cpumask_t   ignore_cpumask; /* ... unless they are already idle */
+    cpumask_t   cpumask; /* CPUs that need to switch in order...      */
+    cpumask_t   ignore_cpumask; /* ...unless already idle or in guest */
     /* for current batch to proceed.        */
 } __cacheline_aligned rcu_ctrlblk = {
     .cur = -300,
@@ -87,7 +87,7 @@ struct rcu_data {
     int cpu;
     long            last_rs_qlen;     /* qlen during the last resched */
 
-    /* 3) idle CPUs handling */
+    /* 3) idle (or in guest mode) CPUs handling */
     struct timer cb_timer;
     bool cb_timer_active;
 
@@ -112,6 +112,12 @@ struct rcu_data {
  * 3) it is stopped immediately, if the CPU wakes up from idle and
  *    resumes 'normal' execution.
  *
+ * Note also that the same happens if a CPU starts executing a guest that
+ * (almost) never comes back into the hypervisor. This may be the case if
+ * the guest uses "idle=poll" / "vwfi=native". Therefore, we need to handle
+ * guest entry events in the same way as the CPU going idle, i.e., consider
+ * it quiesced and arm the timer.
+ *
  * About how far in the future the timer should be programmed each time,
  * it's hard to tell (guess!!). Since this mimics Linux's periodic timer
  * tick, take values used there as an indication. In Linux 2.6.21, tick
@@ -359,9 +365,10 @@ static void rcu_start_batch(struct rcu_ctrlblk *rcp)
         * Make sure the increment of rcp->cur is visible so, even if a
         * CPU that is about to go idle, is captured inside rcp->cpumask,
         * rcu_pending() will return false, which then means cpu_quiet()
-        * will be invoked, before the CPU would actually enter idle.
+        * will be invoked, before the CPU would actually go idle (or
+	* enter a guest).
         *
-        * This barrier is paired with the one in rcu_idle_enter().
+        * This barrier is paired with the one in rcu_quiet_enter().
         */
         smp_mb();
         cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->ignore_cpumask);
@@ -531,14 +538,15 @@ int rcu_needs_cpu(int cpu)
  * periodically poke rcu_pedning(), so that it will invoke the callback
  * not too late after the end of the grace period.
  */
-static void cb_timer_start(void)
+static void cb_timer_start(unsigned int cpu)
 {
-    struct rcu_data *rdp = &this_cpu(rcu_data);
+    struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
 
     /*
      * Note that we don't check rcu_pending() here. In fact, we don't want
      * the timer armed on CPUs that are in the process of quiescing while
-     * going idle, unless they really are the ones with a queued callback.
+     * going idle or entering guest mode, unless they really have queued
+     * callbacks.
      */
     if (likely(!rdp->curlist))
         return;
@@ -547,9 +555,9 @@ static void cb_timer_start(void)
     rdp->cb_timer_active = true;
 }
 
-static void cb_timer_stop(void)
+static void cb_timer_stop(unsigned int cpu)
 {
-    struct rcu_data *rdp = &this_cpu(rcu_data);
+    struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
 
     if (likely(!rdp->cb_timer_active))
         return;
@@ -706,11 +714,14 @@ void __init rcu_init(void)
 }
 
 /*
- * The CPU is becoming idle, so no more read side critical
- * sections, and one more step toward grace period.
+ * The CPU is becoming about to either idle or enter the guest. In any of
+ * these cases, it can't have any outstanding read side critical sections
+ * so this is one step toward the end of the grace period.
  */
-void rcu_idle_enter(unsigned int cpu)
+void rcu_quiet_enter()
 {
+    unsigned int cpu = smp_processor_id();
+
     ASSERT(!cpumask_test_cpu(cpu, &rcu_ctrlblk.ignore_cpumask));
     cpumask_set_cpu(cpu, &rcu_ctrlblk.ignore_cpumask);
     /*
@@ -723,12 +734,14 @@ void rcu_idle_enter(unsigned int cpu)
      */
     smp_mb();
 
-    cb_timer_start();
+    cb_timer_start(cpu);
 }
 
-void rcu_idle_exit(unsigned int cpu)
+void rcu_quiet_exit()
 {
-    cb_timer_stop();
+    unsigned int cpu = smp_processor_id();
+
+    cb_timer_stop(cpu);
     ASSERT(cpumask_test_cpu(cpu, &rcu_ctrlblk.ignore_cpumask));
     cpumask_clear_cpu(cpu, &rcu_ctrlblk.ignore_cpumask);
 }
diff --git a/xen/include/xen/rcupdate.h b/xen/include/xen/rcupdate.h
index 6f2587058e..f378cc2aa2 100644
--- a/xen/include/xen/rcupdate.h
+++ b/xen/include/xen/rcupdate.h
@@ -177,7 +177,7 @@ void call_rcu(struct rcu_head *head,
 
 void rcu_barrier(void);
 
-void rcu_idle_enter(unsigned int cpu);
-void rcu_idle_exit(unsigned int cpu);
+void rcu_quiet_enter(void);
+void rcu_quiet_exit(void);
 
 #endif /* __XEN_RCUPDATE_H */

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-29 10:16                     ` Dario Faggioli
@ 2021-02-01  6:53                       ` Anders Törnqvist
  0 siblings, 0 replies; 31+ messages in thread
From: Anders Törnqvist @ 2021-02-01  6:53 UTC (permalink / raw)
  To: Dario Faggioli, Jürgen Groß,
	Julien Grall, xen-devel, Stefano Stabellini

On 1/29/21 11:16 AM, Dario Faggioli wrote:
> On Fri, 2021-01-29 at 09:18 +0100, Jürgen Groß wrote:
>> On 29.01.21 09:08, Anders Törnqvist wrote:
>>>> So using it has only downsides (and that's true in general, if
>>>> you
>>>> ask me, but particularly so if using NULL).
>>> Thanks for the feedback.
>>> I removed dom0_vcpus_pin. And, as you said, it seems to be
>>> unrelated to
>>> the problem we're discussing.
> Right. Don't put it back, and stay away from it, if you accept a bit of
> advice. :-)
>
>>> The system still behaves the same.
>>>
> Yeah, that was expected.
>
>>> When the dom0_vcpus_pin is removed. xl vcpu-list looks like this:
>>>
>>> Name                                ID  VCPU   CPU State Time(s)
>>> Affinity (Hard / Soft)
>>> Domain-0                             0     0    0   r--      29.4
>>> all / all
>>> Domain-0                             0     1    1   r--      28.7
>>> all / all
>>> Domain-0                             0     2    2   r--      28.7
>>> all / all
>>> Domain-0                             0     3    3   r--      28.6
>>> all / all
>>> Domain-0                             0     4    4   r--      28.6
>>> all / all
>>> mydomu                              1     0    5   r--      21.6 5
>>> / all
>>>
> Right, and it makes sense for it to look like this.
>
>>>   From this listing (with "all" as hard affinity for dom0) one might
>>> read it as dom0 not being pinned with hard affinity to any specific
>>> pCPUs at all, while mydomu is pinned to pCPU 5.
>>> Will dom0_max_vcpus=5 in this case guarantee that dom0 will only run
>>> on pCPUs 0-4, so that mydomu always has pCPU 5 for itself only?
>> No.
>>
> Well, yes... if you use the NULL scheduler. Which is in use here. :-)
>
> Basically, the NULL scheduler _always_ assigns one and only one vCPU to
> each pCPU. This happens at domain (well, at the vCPU) creation time.
> And it _never_ moves a vCPU away from the pCPU to which it has been
> assigned.
>
> And it also _never_ changes this vCPU-->pCPU assignment/relationship,
> unless some special event happens (such as the vCPU and/or the pCPU
> going offline, being removed from the cpupool, you changing the
> affinity [as I'll explain below], etc).
>
> This is the NULL scheduler's mission and only job, so it does that by
> default, _without_ any need for an affinity to be specified.
>
> So, how can affinity be useful in the NULL scheduler? Well, it's useful
> if you want to control and decide to what pCPU a certain vCPU should
> go.
>
> So, let's make an example. Let's say you are in this situation:
>
> Name                                ID  VCPU   CPU State Time(s) Affinity (Hard / Soft)
> Domain-0                             0     0    0   r--     29.4   all / all
> Domain-0                             0     1    1   r--     28.7   all / all
> Domain-0                             0     2    2   r--     28.7   all / all
> Domain-0                             0     3    3   r--     28.6   all / all
> Domain-0                             0     4    4   r--     28.6   all / all
>
> I.e., you have 6 CPUs, you have only dom0, dom0 has 5 vCPUs and you are
> not using dom0_vcpus_pin.
>
> The NULL scheduler has put d0v0 on pCPU 0. And d0v0 is the only vCPU
> that can run on pCPU 0, despite its affinities being "all"... because
> it's what the NULL scheduler does for you and it's the reason why one
> uses it! :-)
>
> Similarly, it has put d0v1 on pCPU 1, d0v2 on pCPU 2, d0v3 on pCPU 3
> and d0v4 on pCPU 4. And the "exclusivity guarantee" explained above
> for d0v0 and pCPU 0, applies to all these other vCPUs and pCPUs as
> well.
>
> With no affinity being specified, which vCPU is assigned to which pCPU
> is entirely under the NULL scheduler control. It has its heuristics
> inside, to try to do that in a smart way, but that's an
> internal/implementation detail and is not relevant here.
>
> If you now create a domU with 1 vCPU, that vCPU will be assigned to
> pCPU 5.
>
> Now, let's say that, for whatever reason, you absolutely want that d0v2
> to run on pCPU 5, instead of being assigned and run on pCPU 2 (which is
> what the NULL scheduler decided to pick for it). Well, what you do is
> use xl, set the affinity of d0v2 to pCPU 5, and you will get something
> like this as a result:
>
> Name                                ID  VCPU   CPU State Time(s) Affinity (Hard / Soft)
> Domain-0                             0     0    0   r--     29.4   all / all
> Domain-0                             0     1    1   r--     28.7   all / all
> Domain-0                             0     2    5   r--     28.7     5 / all
> Domain-0                             0     3    3   r--     28.6   all / all
> Domain-0                             0     4    4   r--     28.6   all / all
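>
> (For instance, with something like "xl vcpu-pin Domain-0 2 5"; the
> domain name/ID and numbers are just from this example, adjust them to
> your setup.)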
>
> So, affinity is indeed useful, even when using NULL, if you want to
> diverge from the default behavior and enact a certain policy, maybe due
> to the nature of your workload, the characteristics of your hardware,
> or whatever.
>
> It is not, however, necessary to set the affinity in order to:
>   - have a vCPU always stay on one --and always the same one too--
>     pCPU;
>   - prevent any other vCPU from ever running on that pCPU.
>
> That is guaranteed by the NULL scheduler itself. It just can't happen
> that it behaves otherwise, because the whole point of doing it was to
> make it simple (and fast :-)) *exactly* by not teaching it how to
> do such things. It can't do it, because the code for doing it is not
> there... by design! :-D
>
> And, BTW, if you now create a domU with 1 vCPU, that vCPU will be
> assigned to pCPU  2.
Wow, what a great explanation. Thank you very much!
>> What if I would like mydomu to be the only domain that uses pCPU 2?
> Set up a cpupool with that pcpu assigned to it and put your domain into
> that cpupool.
>
> Yes, with any other scheduler that is not NULL, that's the proper way
> of doing it.
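>
> A rough sketch of what that can look like (the pool name and scheduler
> are made up here, adjust to taste):
>
>   # mypool.cfg -- hypothetical cpupool config file
>   name  = "mypool"
>   sched = "credit2"
>   cpus  = ["2"]
>
>   xl cpupool-cpu-remove Pool-0 2
>   xl cpupool-create mypool.cfg
>   xl cpupool-migrate mydomu mypool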
>
> Regards




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-30 17:59                   ` Dario Faggioli
@ 2021-02-01  6:55                     ` Anders Törnqvist
  2021-02-02  7:59                     ` Julien Grall
  2021-02-15  7:15                     ` Anders Törnqvist
  2 siblings, 0 replies; 31+ messages in thread
From: Anders Törnqvist @ 2021-02-01  6:55 UTC (permalink / raw)
  To: Dario Faggioli, Julien Grall, xen-devel, Stefano Stabellini

On 1/30/21 6:59 PM, Dario Faggioli wrote:
> On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
>> On 1/26/21 11:31 PM, Dario Faggioli wrote:
>>> Thanks again for letting us see these logs.
>> Thanks for the attention to this :-)
>>
>> Any ideas for how to solve it?
>>
> So, you're up for testing patches, right?
Absolutely. I will apply them and be back with the results. :-)

>
> How about applying these two, and letting me know what happens? :-D
>
> They are on top of current staging. I can try to rebase on something
> else, if it's easier for you to test.
>
> Besides being attached, they're also available here:
>
> https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix
>
> I could not test them properly on ARM, as I don't have an ARM system
> handy, so everything is possible really... just let me know.
>
> It should at least build fine, AFAICT from here:
>
> https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213
>
> Julien, back in:
>
>   https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36445@arm.com/
>
>
> you said I should hook in enter_hypervisor_head(),
> leave_hypervisor_tail(). Those functions are gone now and looking at
> how the code changed, this is where I figured I should put the calls
> (see the second patch). But feel free to educate me otherwise.
>
> For x86 people that are listening... Do we have, in our beloved arch,
> equally handy places (i.e., right before leaving Xen for a guest and
> right after entering Xen from one), preferably in a C file, and for
> all guests... like it seems to be the case on ARM?
>
> Regards




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-30 17:59                   ` Dario Faggioli
  2021-02-01  6:55                     ` Anders Törnqvist
@ 2021-02-02  7:59                     ` Julien Grall
  2021-02-02 15:03                       ` Dario Faggioli
  2021-02-15  7:15                     ` Anders Törnqvist
  2 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2021-02-02  7:59 UTC (permalink / raw)
  To: Dario Faggioli, Anders Törnqvist, xen-devel, Stefano Stabellini

Hi Dario,

On 30/01/2021 17:59, Dario Faggioli wrote:
> On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
>> On 1/26/21 11:31 PM, Dario Faggioli wrote:
>>> Thanks again for letting us see these logs.
>>
>> Thanks for the attention to this :-)
>>
>> Any ideas for how to solve it?
>>
> So, you're up for testing patches, right?
> 
> How about applying these two, and letting me know what happens? :-D
> 
> They are on top of current staging. I can try to rebase on something
> else, if it's easier for you to test.
> 
> Besides being attached, they're also available here:
> 
> https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix
> 
> I could not test them properly on ARM, as I don't have an ARM system
> handy, so everything is possible really... just let me know.
> 
> It should at least build fine, AFAICT from here:
> 
> https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213
> 
> Julien, back in:
> 
>   https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36445@arm.com/
> 
> 
> you said I should hook in enter_hypervisor_head(),
> leave_hypervisor_tail(). Those functions are gone now and looking at
> how the code changed, this is where I figured I should put the calls
> (see the second patch). But feel free to educate me otherwise.

enter_hypervisor_from_guest() and leave_hypervisor_to_guest() are the 
new functions.

I have had a quick look at your patch. The RCU call in 
leave_hypervisor_to_guest() needs to be placed just after the last call 
to check_for_pcpu_work().

Otherwise, you may be preempted while the pCPU is still marked as quiet for RCU.

The placement in enter_hypervisor_from_guest() doesn't matter too much, 
although I would consider calling it as late as possible.
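
In other words, something along these lines (heavily simplified sketch
of the two paths in xen/arch/arm/traps.c, not the literal code):

    void leave_hypervisor_to_guest(void)
    {
        local_irq_disable();

        check_for_vcpu_work();
        check_for_pcpu_work();

        /*
         * Quiesce only after the last check_for_pcpu_work(): doing it
         * any earlier would let the CPU pick up and run more hypervisor
         * work while already being marked as quiet for RCU.
         */
        rcu_quiet_enter();

        /* ... sync state with the vGIC etc. and return to the guest ... */
    }

    void enter_hypervisor_from_guest(void)
    {
        /* ... save/sanitise the guest state ... */

        /* As late as is convenient on this path. */
        rcu_quiet_exit();
    }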

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-02-02  7:59                     ` Julien Grall
@ 2021-02-02 15:03                       ` Dario Faggioli
  2021-02-02 15:23                         ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Dario Faggioli @ 2021-02-02 15:03 UTC (permalink / raw)
  To: Julien Grall, Anders Törnqvist, xen-devel, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 861 bytes --]

On Tue, 2021-02-02 at 07:59 +0000, Julien Grall wrote:
> Hi Dario,
> 
Hi!

> I have had a quick look at your place. The RCU call in 
> leave_hypervisor_to_guest() needs to be placed just after the last
> call 
> to check_for_pcpu_work().
> 
> Otherwise, you may be preempted and keep the RCU quiet.
> 
Ok, makes sense. I'll move it.

> The placement in enter_hypervisor_from_guest() doesn't matter too
> much, 
> although I would consider to call it as a late as possible.
> 
Mmmm... Can I ask why? In fact, I would have said "as soon as
possible".

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-02-02 15:03                       ` Dario Faggioli
@ 2021-02-02 15:23                         ` Julien Grall
  2021-02-03  7:31                           ` Dario Faggioli
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2021-02-02 15:23 UTC (permalink / raw)
  To: Dario Faggioli, Anders Törnqvist, xen-devel, Stefano Stabellini
  Cc: Jan Beulich, Andrew Cooper, Juergen Gross

(Adding Andrew, Jan, Juergen for visibility)

Hi Dario,

On 02/02/2021 15:03, Dario Faggioli wrote:
> On Tue, 2021-02-02 at 07:59 +0000, Julien Grall wrote:
>> Hi Dario,
>>
>> I have had a quick look at your place. The RCU call in
>> leave_hypervisor_to_guest() needs to be placed just after the last
>> call
>> to check_for_pcpu_work().
>>
>> Otherwise, you may be preempted and keep the RCU quiet.
>>
> Ok, makes sense. I'll move it.
> 
>> The placement in enter_hypervisor_from_guest() doesn't matter too
>> much,
>> although I would consider to call it as a late as possible.
>>
> Mmmm... Can I ask why? In fact, I would have said "as soon as
> possible".

Because those functions only access data for the current vCPU/domain. 
This is already protected by the fact that the domain is running.

By leaving the "quiesce" mode later, you give an opportunity to the RCU 
to release memory earlier.

In reality, it is probably still too early as a pCPU can be considered 
quiesced until a call to rcu_lock*() (such as rcu_lock_domain()).

But this would require some investigation to check if we effectively 
protect all the regions with the RCU helpers. This is likely too 
complicated for 4.15.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-02-02 15:23                         ` Julien Grall
@ 2021-02-03  7:31                           ` Dario Faggioli
  2021-02-03  9:19                             ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Dario Faggioli @ 2021-02-03  7:31 UTC (permalink / raw)
  To: Julien Grall, Anders Törnqvist, xen-devel, Stefano Stabellini
  Cc: Jan Beulich, Andrew Cooper, Juergen Gross

[-- Attachment #1: Type: text/plain, Size: 2471 bytes --]

Hi again, 

On Tue, 2021-02-02 at 15:23 +0000, Julien Grall wrote:
> (Adding Andrew, Jan, Juergen for visibility)
> 
Thanks! :-)

> On 02/02/2021 15:03, Dario Faggioli wrote:
> > On Tue, 2021-02-02 at 07:59 +0000, Julien Grall wrote:
> > > The placement in enter_hypervisor_from_guest() doesn't matter too
> > > much,
> > > although I would consider to call it as a late as possible.
> > > 
> > Mmmm... Can I ask why? In fact, I would have said "as soon as
> > possible".
> 
> Because those functions only access data for the current vCPU/domain.
> This is already protected by the fact that the domain is running.
> 
Mmm.. ok, yes, I think it makes sense.

> By leaving the "quiesce" mode later, you give an opportunity to the
> RCU 
> to release memory earlier.
> 
Yeah. What I wanted to be sure of is that we put the CPU "back in the
race" :-) before any current or future use of RCUs.

> In reality, it is probably still too early as a pCPU can be
> considered 
> quiesced until a call to rcu_lock*() (such rcu_lock_domain()).
> 
Well, yes, in theory, we could track down which is the first RCU read
side crit. section on this path, and put the call right before that (if
I understood what you mean).

To me, however, this indeed looks too complex and difficult to
maintain, not only for 4.15 but in general. E.g., suppose we find such
a use of RCUs in function foo() called by bar() called by
enter_hypervisor_from_guest().

If someone at some point wants to use RCUs in bar(), how does she know
that she should also move the call to rcu_quiet_enter() from foo() to
there?

So, yes, I'll move it a little down, but still within
enter_hypervisor_from_guest().

In the meantime, I had a quick chat with Juergen about x86. In fact, I
had a look and could not find a place to put the
rcu_quiet_{exit,enter}() calls that is as convenient as what you have
here on ARM. I.e., two nice C functions that we traverse for all kinds
of guests, for HVM and SVM, etc.

Actually, I was quite skeptical about it but, you know, one can hope!
Juergen confirmed that there's no such thing, so I'll look at the
various entry.S files for the proper spots.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-02-03  7:31                           ` Dario Faggioli
@ 2021-02-03  9:19                             ` Julien Grall
  2021-02-03 11:00                               ` Jürgen Groß
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2021-02-03  9:19 UTC (permalink / raw)
  To: Dario Faggioli, Anders Törnqvist, xen-devel, Stefano Stabellini
  Cc: Jan Beulich, Andrew Cooper, Juergen Gross

Hi,

On 03/02/2021 07:31, Dario Faggioli wrote:
> On Tue, 2021-02-02 at 15:23 +0000, Julien Grall wrote:
>> In reality, it is probably still too early as a pCPU can be
>> considered
>> quiesced until a call to rcu_lock*() (such rcu_lock_domain()).
>>
> Well, yes, in theory, we could track down which is the first RCU read
> side crit. section on this path, and put the call right before that (if
> I understood what you mean).

Oh, that's not what I meant. This will indeed be far more complex than I 
originally had in mind.

AFAIU, the RCU uses critical sections to protect data. So "entering" a
critical section could be used as "the pCPU is not quiesced" and
"exiting" could be used as "the pCPU is quiesced".

The concern with my approach is that we would need to make sure that Xen
correctly uses the rcu helpers. I know Juergen worked on that recently,
but I don't know whether this is fully complete.
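
Roughly, the idea would be something like this (a hypothetical sketch
only, with a made-up per-CPU nesting counter; the real read-side
helpers, e.g. rcu_lock_domain(), obviously look different):

    static DEFINE_PER_CPU(unsigned int, rcu_crit_nesting);

    /* What every read-side "lock" helper would do on entry. */
    static void rcu_crit_enter(void)
    {
        if ( this_cpu(rcu_crit_nesting)++ == 0 )
            rcu_quiet_exit();   /* first read side: pCPU not quiesced */
    }

    /* What every read-side "unlock" helper would do on exit. */
    static void rcu_crit_exit(void)
    {
        if ( --this_cpu(rcu_crit_nesting) == 0 )
            rcu_quiet_enter();  /* no outstanding read sides: quiesced */
    }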

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-02-03  9:19                             ` Julien Grall
@ 2021-02-03 11:00                               ` Jürgen Groß
  2021-02-03 11:20                                 ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Jürgen Groß @ 2021-02-03 11:00 UTC (permalink / raw)
  To: Julien Grall, Dario Faggioli, Anders Törnqvist, xen-devel,
	Stefano Stabellini
  Cc: Jan Beulich, Andrew Cooper


[-- Attachment #1.1.1: Type: text/plain, Size: 1312 bytes --]

On 03.02.21 10:19, Julien Grall wrote:
> Hi,
> 
> On 03/02/2021 07:31, Dario Faggioli wrote:
>> On Tue, 2021-02-02 at 15:23 +0000, Julien Grall wrote:
>>> In reality, it is probably still too early as a pCPU can be
>>> considered
>>> quiesced until a call to rcu_lock*() (such rcu_lock_domain()).
>>>
>> Well, yes, in theory, we could track down which is the first RCU read
>> side crit. section on this path, and put the call right before that (if
>> I understood what you mean).
> 
> Oh, that's not what I meant. This will indeed be far more complex than I 
> originally had in mind.
> 
> AFAIU, the RCU uses critical section to protect data. So the "entering" 
> could be used as "the pCPU is not quiesced" and "exiting" could be used 
> as "the pCPU is quiesced".
> 
> The concern with my approach is we would need to make sure that Xen 
> correctly uses the rcu helpers. I know Juergen worked on that recently, 
> but I don't know whether this is fully complete.

I think it is complete, but I can't be sure, of course.

One bit missing (for catching some wrong uses of the helpers) is this
patch:

https://lists.xen.org/archives/html/xen-devel/2020-03/msg01759.html

I don't remember why it hasn't been taken, but I think there was a
specific reason for that.


Juergen

[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-02-03 11:00                               ` Jürgen Groß
@ 2021-02-03 11:20                                 ` Julien Grall
  2021-02-03 12:02                                   ` Jürgen Groß
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2021-02-03 11:20 UTC (permalink / raw)
  To: Jürgen Groß,
	Dario Faggioli, Anders Törnqvist, xen-devel,
	Stefano Stabellini
  Cc: Jan Beulich, Andrew Cooper

Hi Juergen,

On 03/02/2021 11:00, Jürgen Groß wrote:
> On 03.02.21 10:19, Julien Grall wrote:
>> Hi,
>>
>> On 03/02/2021 07:31, Dario Faggioli wrote:
>>> On Tue, 2021-02-02 at 15:23 +0000, Julien Grall wrote:
>>>> In reality, it is probably still too early as a pCPU can be
>>>> considered
>>>> quiesced until a call to rcu_lock*() (such rcu_lock_domain()).
>>>>
>>> Well, yes, in theory, we could track down which is the first RCU read
>>> side crit. section on this path, and put the call right before that (if
>>> I understood what you mean).
>>
>> Oh, that's not what I meant. This will indeed be far more complex than 
>> I originally had in mind.
>>
>> AFAIU, the RCU uses critical section to protect data. So the 
>> "entering" could be used as "the pCPU is not quiesced" and "exiting" 
>> could be used as "the pCPU is quiesced".
>>
>> The concern with my approach is we would need to make sure that Xen 
>> correctly uses the rcu helpers. I know Juergen worked on that 
>> recently, but I don't know whether this is fully complete.
> 
> I think it is complete, but I can't be sure, of course.
> 
> One bit missing (for catching some wrong uses of the helpers) is this
> patch:
> 
> https://lists.xen.org/archives/html/xen-devel/2020-03/msg01759.html
> 
> I don't remember why it hasn't been taken, but I think there was a
> specific reason for that.

Looking at v8, the patch is suitably reviewed by Jan. So I am a bit
puzzled as to why this wasn't committed... I had to go back to v6 to
notice the following message:

"albeit to be honest I'm not fully convinced we need to go this far."

Was the implication that his Reviewed-by was conditional on someone else
answering the e-mail?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-02-03 11:20                                 ` Julien Grall
@ 2021-02-03 12:02                                   ` Jürgen Groß
  0 siblings, 0 replies; 31+ messages in thread
From: Jürgen Groß @ 2021-02-03 12:02 UTC (permalink / raw)
  To: Julien Grall, Dario Faggioli, Anders Törnqvist, xen-devel,
	Stefano Stabellini
  Cc: Jan Beulich, Andrew Cooper


[-- Attachment #1.1.1: Type: text/plain, Size: 2611 bytes --]

On 03.02.21 12:20, Julien Grall wrote:
> Hi Juergen,
> 
> On 03/02/2021 11:00, Jürgen Groß wrote:
>> On 03.02.21 10:19, Julien Grall wrote:
>>> Hi,
>>>
>>> On 03/02/2021 07:31, Dario Faggioli wrote:
>>>> On Tue, 2021-02-02 at 15:23 +0000, Julien Grall wrote:
>>>>> In reality, it is probably still too early as a pCPU can be
>>>>> considered
>>>>> quiesced until a call to rcu_lock*() (such rcu_lock_domain()).
>>>>>
>>>> Well, yes, in theory, we could track down which is the first RCU read
>>>> side crit. section on this path, and put the call right before that (if
>>>> I understood what you mean).
>>>
>>> Oh, that's not what I meant. This will indeed be far more complex 
>>> than I originally had in mind.
>>>
>>> AFAIU, the RCU uses critical section to protect data. So the 
>>> "entering" could be used as "the pCPU is not quiesced" and "exiting" 
>>> could be used as "the pCPU is quiesced".
>>>
>>> The concern with my approach is we would need to make sure that Xen 
>>> correctly uses the rcu helpers. I know Juergen worked on that 
>>> recently, but I don't know whether this is fully complete.
>>
>> I think it is complete, but I can't be sure, of course.
>>
>> One bit missing (for catching some wrong uses of the helpers) is this
>> patch:
>>
>> https://lists.xen.org/archives/html/xen-devel/2020-03/msg01759.html
>>
>> I don't remember why it hasn't been taken, but I think there was a
>> specific reason for that.
> 
> Looking at v8 and the patch is suitably reviewed by Jan. So I am a bit 
> puzzled to why this wasn't committed... I had to go to v6 and notice the 
> following message:
> 
> "albeit to be honest I'm not fully convinced we need to go this far."
> 
> Was the implication that his reviewed-by was conditional to someone else 
> answering to the e-mail?

I have no record of that being the case.

Patches 1-3 of that series were needed for getting rid of
stop_machine_run() in rcu handling and to fix other problems. Patch 4
was adding some additional ASSERT()s for making sure no potential
deadlocks due to wrong rcu usage could creep in again.

Patch 5 was more a "nice to have" addition in order to avoid any
wrong usage of rcu which should have no real negative impact on the
system stability.

So I believe Jan as the committer didn't want to commit it himself, but
was fine with the overall idea and implementation.

I still think for code sanity it would be nice, but I was rather busy
with Xenstore and event channel security work at that time, so I didn't
urge anyone to take this patch.


Juergen

[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Null scheduler and vwfi native problem
  2021-01-30 17:59                   ` Dario Faggioli
  2021-02-01  6:55                     ` Anders Törnqvist
  2021-02-02  7:59                     ` Julien Grall
@ 2021-02-15  7:15                     ` Anders Törnqvist
  2 siblings, 0 replies; 31+ messages in thread
From: Anders Törnqvist @ 2021-02-15  7:15 UTC (permalink / raw)
  To: Dario Faggioli, Julien Grall, xen-devel, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 2408 bytes --]

On 1/30/21 6:59 PM, Dario Faggioli wrote:
> On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
>> On 1/26/21 11:31 PM, Dario Faggioli wrote:
>>> Thanks again for letting us see these logs.
>> Thanks for the attention to this :-)
>>
>> Any ideas for how to solve it?
>>
> So, you're up for testing patches, right?
>
> How about applying these two, and letting me know what happens? :-D

Great work guys!

Hi. I have now had the time to test the patches.

They could not be applied cleanly to the code version I am
using, which is commit b64b8df622963accf85b227e468fe12b2d56c128 from
https://source.codeaurora.org/external/imx/imx-xen.

I did some editing to get them into my code. I think I should have 
removed some sched_tick_suspend/sched_tick_resume calls also.
See the attached patches for what I have applied to the code.

Anyway, after applying the patches, including the original
rcu-quiesc-patch.patch, destroying the domu seems to work.
I have rebooted, done destroy/create only, and used the Xen watchdog to
reboot the domu, about 20 times in total, and so far the domu has always
been destroyed cleanly and a new instance of it could be started.

So it looks promising although my edited patches probably need some fixing.


>
> They are on top of current staging. I can try to rebase on something
> else, if it's easier for you to test.
>
> Besides being attached, they're also available here:
>
> https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix
>
> I could not test them properly on ARM, as I don't have an ARM system
> handy, so everything is possible really... just let me know.
>
> It should at least build fine, AFAICT from here:
>
> https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213
>
> Julien, back in:
>
>   https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36445@arm.com/
>
>
> you said I should hook in enter_hypervisor_head(),
> leave_hypervisor_tail(). Those functions are gone now and looking at
> how the code changed, this is where I figured I should put the calls
> (see the second patch). But feel free to educate me otherwise.
>
> For x86 people that are listening... Do we have, in our beloved arch,
> equally handy places (i.e., right before leaving Xen for a guest and
> right after entering Xen from one), preferably in a C file, and for
> all guests... like it seems to be the case on ARM?
>
> Regards



[-- Attachment #2: 1_1-xen-rcu-rename-idle-ignore.patch --]
[-- Type: text/x-patch, Size: 10406 bytes --]

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index d6dc4b48db..42ab9dbbd6 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -52,8 +52,8 @@ static struct rcu_ctrlblk {
     int  next_pending;  /* Is the next batch already waiting?         */
 
     spinlock_t  lock __cacheline_aligned;
-    cpumask_t   cpumask; /* CPUs that need to switch in order ... */
-    cpumask_t   idle_cpumask; /* ... unless they are already idle */
+    cpumask_t   cpumask; /* CPUs that need to switch in order ...   */
+    cpumask_t   ignore_cpumask; /* ... unless they are already idle */
     /* for current batch to proceed.        */
 } __cacheline_aligned rcu_ctrlblk = {
     .cur = -300,
@@ -86,8 +86,8 @@ struct rcu_data {
     long            last_rs_qlen;     /* qlen during the last resched */
 
     /* 3) idle CPUs handling */
-    struct timer idle_timer;
-    bool idle_timer_active;
+    struct timer cb_timer;
+    bool cb_timer_active;
 };
 
 /*
@@ -116,22 +116,22 @@ struct rcu_data {
  * CPU that is going idle. The user can change this, via a boot time
  * parameter, but only up to 100ms.
  */
-#define IDLE_TIMER_PERIOD_MAX     MILLISECS(100)
-#define IDLE_TIMER_PERIOD_DEFAULT MILLISECS(10)
-#define IDLE_TIMER_PERIOD_MIN     MICROSECS(100)
+#define CB_TIMER_PERIOD_MAX     MILLISECS(100)
+#define CB_TIMER_PERIOD_DEFAULT MILLISECS(10)
+#define CB_TIMER_PERIOD_MIN     MICROSECS(100)
 
-static s_time_t __read_mostly idle_timer_period;
+static s_time_t __read_mostly cb_timer_period;
 
 /*
- * Increment and decrement values for the idle timer handler. The algorithm
+ * Increment and decrement values for the callback timer handler. The algorithm
  * works as follows:
  * - if the timer actually fires, and it finds out that the grace period isn't
- *   over yet, we add IDLE_TIMER_PERIOD_INCR to the timer's period;
+ *   over yet, we add CB_TIMER_PERIOD_INCR to the timer's period;
  * - if the timer actually fires and it finds the grace period over, we
  *   subtract IDLE_TIMER_PERIOD_DECR from the timer's period.
  */
-#define IDLE_TIMER_PERIOD_INCR    MILLISECS(10)
-#define IDLE_TIMER_PERIOD_DECR    MICROSECS(100)
+#define CB_TIMER_PERIOD_INCR    MILLISECS(10)
+#define CB_TIMER_PERIOD_DECR    MICROSECS(100)
 
 static DEFINE_PER_CPU(struct rcu_data, rcu_data);
 
@@ -309,7 +309,7 @@ static void rcu_start_batch(struct rcu_ctrlblk *rcp)
         * This barrier is paired with the one in rcu_idle_enter().
         */
         smp_mb();
-        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->idle_cpumask);
+        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->ignore_cpumask);
     }
 }
 
@@ -455,7 +455,7 @@ int rcu_needs_cpu(int cpu)
 {
     struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
 
-    return (rdp->curlist && !rdp->idle_timer_active) || rcu_pending(cpu);
+    return (rdp->curlist && !rdp->cb_timer_active) || rcu_pending(cpu);
 }
 
 /*
@@ -463,7 +463,7 @@ int rcu_needs_cpu(int cpu)
  * periodically poke rcu_pedning(), so that it will invoke the callback
  * not too late after the end of the grace period.
  */
-void rcu_idle_timer_start()
+static void cb_timer_start(void)
 {
     struct rcu_data *rdp = &this_cpu(rcu_data);
 
@@ -475,48 +475,48 @@ void rcu_idle_timer_start()
     if (likely(!rdp->curlist))
         return;
 
-    set_timer(&rdp->idle_timer, NOW() + idle_timer_period);
-    rdp->idle_timer_active = true;
+    set_timer(&rdp->cb_timer, NOW() + cb_timer_period);
+    rdp->cb_timer_active = true;
 }
 
-void rcu_idle_timer_stop()
+static void cb_timer_stop(void)
 {
     struct rcu_data *rdp = &this_cpu(rcu_data);
 
-    if (likely(!rdp->idle_timer_active))
+    if (likely(!rdp->cb_timer_active))
         return;
 
-    rdp->idle_timer_active = false;
+    rdp->cb_timer_active = false;
 
     /*
      * In general, as the CPU is becoming active again, we don't need the
-     * idle timer, and so we want to stop it.
+     * callback timer, and so we want to stop it.
      *
-     * However, in case we are here because idle_timer has (just) fired and
+     * However, in case we are here because cb_timer has (just) fired and
      * has woken up the CPU, we skip stop_timer() now. In fact, when a CPU
      * wakes up from idle, this code always runs before do_softirq() has the
      * chance to check and deal with TIMER_SOFTIRQ. And if we stop the timer
      * now, the TIMER_SOFTIRQ handler will see it as inactive, and will not
-     * call rcu_idle_timer_handler().
+     * call cb_timer_handler().
      *
      * Therefore, if we see that the timer is expired already, we leave it
      * alone. The TIMER_SOFTIRQ handler will then run the timer routine, and
      * deactivate it.
      */
-    if ( !timer_is_expired(&rdp->idle_timer) )
-        stop_timer(&rdp->idle_timer);
+    if ( !timer_is_expired(&rdp->cb_timer) )
+        stop_timer(&rdp->cb_timer);
 }
 
-static void rcu_idle_timer_handler(void* data)
+static void cb_timer_handler(void* data)
 {
-    perfc_incr(rcu_idle_timer);
+    perfc_incr(rcu_callback_timer);
 
     if ( !cpumask_empty(&rcu_ctrlblk.cpumask) )
-        idle_timer_period = min(idle_timer_period + IDLE_TIMER_PERIOD_INCR,
-                                IDLE_TIMER_PERIOD_MAX);
+        cb_timer_period = min(cb_timer_period + CB_TIMER_PERIOD_INCR,
+                                CB_TIMER_PERIOD_MAX);
     else
-        idle_timer_period = max(idle_timer_period - IDLE_TIMER_PERIOD_DECR,
-                                IDLE_TIMER_PERIOD_MIN);
+        cb_timer_period = max(cb_timer_period - CB_TIMER_PERIOD_DECR,
+                                CB_TIMER_PERIOD_MIN);
 }
 
 void rcu_check_callbacks(int cpu)
@@ -537,7 +537,7 @@ static void rcu_move_batch(struct rcu_data *this_rdp, struct rcu_head *list,
 static void rcu_offline_cpu(struct rcu_data *this_rdp,
                             struct rcu_ctrlblk *rcp, struct rcu_data *rdp)
 {
-    kill_timer(&rdp->idle_timer);
+    kill_timer(&rdp->cb_timer);
 
     /* If the cpu going offline owns the grace period we can block
      * indefinitely waiting for it, so flush it here.
@@ -567,7 +567,7 @@ static void rcu_init_percpu_data(int cpu, struct rcu_ctrlblk *rcp,
     rdp->qs_pending = 0;
     rdp->cpu = cpu;
     rdp->blimit = blimit;
-    init_timer(&rdp->idle_timer, rcu_idle_timer_handler, rdp, cpu);
+    init_timer(&rdp->cb_timer, cb_timer_handler, rdp, cpu);
 }
 
 static int cpu_callback(
@@ -596,25 +596,39 @@ static struct notifier_block cpu_nfb = {
     .notifier_call = cpu_callback
 };
 
+/*
+ * We're changing the name of the parameter, to better reflect the fact that
+ * the timer is used for callbacks in general, when the CPU is either idle
+ * or executing guest code. We still accept the old parameter but, if both
+ * are specified, the new one ("rcu-callback-timer-period-ms") has priority.
+ */
+#define CB_TIMER_PERIOD_DEFAULT_MS ( CB_TIMER_PERIOD_DEFAULT / MILLISECS(1) )
+static unsigned int __initdata cb_timer_period_ms = CB_TIMER_PERIOD_DEFAULT_MS;
+integer_param("rcu-callback-timer-period-ms", cb_timer_period_ms);
+
+static unsigned int __initdata idle_timer_period_ms = CB_TIMER_PERIOD_DEFAULT_MS;
+integer_param("rcu-idle-timer-period-ms", idle_timer_period_ms);
+
 void __init rcu_init(void)
 {
     void *cpu = (void *)(long)smp_processor_id();
-    static unsigned int __initdata idle_timer_period_ms =
-                                    IDLE_TIMER_PERIOD_DEFAULT / MILLISECS(1);
-    integer_param("rcu-idle-timer-period-ms", idle_timer_period_ms);
+
+    if (idle_timer_period_ms != CB_TIMER_PERIOD_DEFAULT_MS &&
+        cb_timer_period_ms == CB_TIMER_PERIOD_DEFAULT_MS)
+        cb_timer_period_ms = idle_timer_period_ms;
 
     /* We don't allow 0, or anything higher than IDLE_TIMER_PERIOD_MAX */
-    if ( idle_timer_period_ms == 0 ||
-         idle_timer_period_ms > IDLE_TIMER_PERIOD_MAX / MILLISECS(1) )
+    if ( cb_timer_period_ms == 0 ||
+         cb_timer_period_ms > CB_TIMER_PERIOD_MAX / MILLISECS(1) )
     {
-        idle_timer_period_ms = IDLE_TIMER_PERIOD_DEFAULT / MILLISECS(1);
-        printk("WARNING: rcu-idle-timer-period-ms outside of "
+        cb_timer_period_ms = CB_TIMER_PERIOD_DEFAULT / MILLISECS(1);
+        printk("WARNING: rcu-callback-timer-period-ms outside of "
                "(0,%"PRI_stime"]. Resetting it to %u.\n",
-               IDLE_TIMER_PERIOD_MAX / MILLISECS(1), idle_timer_period_ms);
+               CB_TIMER_PERIOD_MAX / MILLISECS(1), cb_timer_period_ms);
     }
-    idle_timer_period = MILLISECS(idle_timer_period_ms);
+    cb_timer_period = MILLISECS(cb_timer_period_ms);
 
-    cpumask_clear(&rcu_ctrlblk.idle_cpumask);
+    cpumask_clear(&rcu_ctrlblk.ignore_cpumask);
     cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
     register_cpu_notifier(&cpu_nfb);
     open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
@@ -626,8 +640,8 @@ void __init rcu_init(void)
  */
 void rcu_idle_enter(unsigned int cpu)
 {
-    ASSERT(!cpumask_test_cpu(cpu, &rcu_ctrlblk.idle_cpumask));
-    cpumask_set_cpu(cpu, &rcu_ctrlblk.idle_cpumask);
+    ASSERT(!cpumask_test_cpu(cpu, &rcu_ctrlblk.ignore_cpumask));
+    cpumask_set_cpu(cpu, &rcu_ctrlblk.ignore_cpumask);
     /*
      * If some other CPU is starting a new grace period, we'll notice that
      * by seeing a new value in rcp->cur (different than our quiescbatch).
@@ -637,10 +651,12 @@ void rcu_idle_enter(unsigned int cpu)
      * Se the comment before cpumask_andnot() in  rcu_start_batch().
      */
     smp_mb();
+    cb_timer_start();
 }
 
 void rcu_idle_exit(unsigned int cpu)
 {
-    ASSERT(cpumask_test_cpu(cpu, &rcu_ctrlblk.idle_cpumask));
-    cpumask_clear_cpu(cpu, &rcu_ctrlblk.idle_cpumask);
+    cb_timer_stop();
+    ASSERT(cpumask_test_cpu(cpu, &rcu_ctrlblk.ignore_cpumask));
+    cpumask_clear_cpu(cpu, &rcu_ctrlblk.ignore_cpumask);
 }
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 08b182ccd9..d142534383 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -12,7 +12,7 @@ PERFCOUNTER(calls_from_multicall,       "calls from multicall")
 PERFCOUNTER(irqs,                   "#interrupts")
 PERFCOUNTER(ipis,                   "#IPIs")
 
-PERFCOUNTER(rcu_idle_timer,         "RCU: idle_timer")
+PERFCOUNTER(rcu_callback_timer,     "RCU: callback_timer")
 
 /* Generic scheduler counters (applicable to all schedulers) */
 PERFCOUNTER(sched_irq,              "sched: timer")

[-- Attachment #3: 2_1-rcu-idle-guest.patch --]
[-- Type: text/x-patch, Size: 8321 bytes --]

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index a9ca09acb2..e4439b2397 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -46,6 +46,8 @@ static void do_idle(void)
 {
     unsigned int cpu = smp_processor_id();
 
+    rcu_quiet_enter();
+
     sched_tick_suspend();
     /* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */
     process_pending_softirqs();
@@ -59,6 +61,8 @@ static void do_idle(void)
     local_irq_enable();
 
     sched_tick_resume();
+
+    rcu_quiet_exit();
 }
 
 void idle_loop(void)
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 1d2b762e22..5158a03746 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2007,6 +2007,8 @@ void enter_hypervisor_from_guest(void)
 {
     struct vcpu *v = current;
 
+    rcu_quiet_exit();
+
     /*
      * If we pended a virtual abort, preserve it until it gets cleared.
      * See ARM ARM DDI 0487A.j D1.14.3 (Virtual Interrupts) for details,
@@ -2264,6 +2266,8 @@ static void check_for_vcpu_work(void)
  */
 void leave_hypervisor_to_guest(void)
 {
+    rcu_quiet_enter();
+
     local_irq_disable();
 
     check_for_vcpu_work();
diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 836f524ef4..3d8dcec143 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -647,7 +647,8 @@ static void acpi_processor_idle(void)
     cpufreq_dbs_timer_suspend();
 
     sched_tick_suspend();
-    /* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */
+    rcu_quiet_enter();
+    /* rcu_quiet_enter() can raise TIMER_SOFTIRQ. Process it now. */
     process_pending_softirqs();
 
     /*
@@ -660,6 +661,7 @@ static void acpi_processor_idle(void)
     {
         local_irq_enable();
         sched_tick_resume();
+        rcu_quiet_exit();
         cpufreq_dbs_timer_resume();
         return;
     }
@@ -785,6 +787,7 @@ static void acpi_processor_idle(void)
         power->last_state = &power->states[0];
         local_irq_enable();
         sched_tick_resume();
+        rcu_quiet_exit();
         cpufreq_dbs_timer_resume();
         return;
     }
@@ -793,6 +796,7 @@ static void acpi_processor_idle(void)
     power->last_state = &power->states[0];
 
     sched_tick_resume();
+    rcu_quiet_exit();
     cpufreq_dbs_timer_resume();
 
     if ( cpuidle_current_governor->reflect )
diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index 52413e6da1..2657ec76f4 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -756,7 +756,8 @@ static void mwait_idle(void)
 	cpufreq_dbs_timer_suspend();
 
 	sched_tick_suspend();
-	/* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */
+	rcu_quiet_enter();
+	/* rcu_quiet_enter() can raise TIMER_SOFTIRQ. Process it now. */
 	process_pending_softirqs();
 
 	/* Interrupts must be disabled for C2 and higher transitions. */
@@ -765,6 +766,7 @@ static void mwait_idle(void)
 	if (!cpu_is_haltable(cpu)) {
 		local_irq_enable();
 		sched_tick_resume();
+		rcu_quiet_exit();
 		cpufreq_dbs_timer_resume();
 		return;
 	}
@@ -807,6 +809,7 @@ static void mwait_idle(void)
 		lapic_timer_on();
 
 	sched_tick_resume();
+	rcu_quiet_exit();
 	cpufreq_dbs_timer_resume();
 
 	if ( cpuidle_current_governor->reflect )
diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index 42ab9dbbd6..a9c24b5889 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -52,8 +52,8 @@ static struct rcu_ctrlblk {
     int  next_pending;  /* Is the next batch already waiting?         */
 
     spinlock_t  lock __cacheline_aligned;
-    cpumask_t   cpumask; /* CPUs that need to switch in order ...   */
-    cpumask_t   ignore_cpumask; /* ... unless they are already idle */
+    cpumask_t   cpumask; /* CPUs that need to switch in order...      */
+    cpumask_t   ignore_cpumask; /* ...unless already idle or in guest */
     /* for current batch to proceed.        */
 } __cacheline_aligned rcu_ctrlblk = {
     .cur = -300,
@@ -85,7 +85,7 @@ struct rcu_data {
     struct rcu_head barrier;
     long            last_rs_qlen;     /* qlen during the last resched */
 
-    /* 3) idle CPUs handling */
+    /* 3) idle (or in guest mode) CPUs handling */
     struct timer cb_timer;
     bool cb_timer_active;
 };
@@ -107,6 +107,12 @@ struct rcu_data {
  * 3) it is stopped immediately, if the CPU wakes up from idle and
  *    resumes 'normal' execution.
  *
+ * Note also that the same happens if a CPU starts executing a guest that
+ * (almost) never comes back into the hypervisor. This may be the case if
+ * the guest uses "idle=poll" / "vwfi=native". Therefore, we need to handle
+ * guest entry events in the same way as the CPU going idle, i.e., consider
+ * it quiesced and arm the timer.
+ *
  * About how far in the future the timer should be programmed each time,
  * it's hard to tell (guess!!). Since this mimics Linux's periodic timer
  * tick, take values used there as an indication. In Linux 2.6.21, tick
@@ -304,9 +310,10 @@ static void rcu_start_batch(struct rcu_ctrlblk *rcp)
         * Make sure the increment of rcp->cur is visible so, even if a
         * CPU that is about to go idle, is captured inside rcp->cpumask,
         * rcu_pending() will return false, which then means cpu_quiet()
-        * will be invoked, before the CPU would actually enter idle.
+        * will be invoked, before the CPU would actually go idle (or
+	* enter a guest).
         *
-        * This barrier is paired with the one in rcu_idle_enter().
+        * This barrier is paired with the one in rcu_quiet_enter().
         */
         smp_mb();
         cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->ignore_cpumask);
@@ -463,14 +470,15 @@ int rcu_needs_cpu(int cpu)
  * periodically poke rcu_pedning(), so that it will invoke the callback
  * not too late after the end of the grace period.
  */
-static void cb_timer_start(void)
+static void cb_timer_start(unsigned int cpu)
 {
-    struct rcu_data *rdp = &this_cpu(rcu_data);
+    struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
 
     /*
      * Note that we don't check rcu_pending() here. In fact, we don't want
      * the timer armed on CPUs that are in the process of quiescing while
-     * going idle, unless they really are the ones with a queued callback.
+     * going idle or entering guest mode, unless they really have queued
+     * callbacks.
      */
     if (likely(!rdp->curlist))
         return;
@@ -479,9 +487,9 @@ static void cb_timer_start(void)
     rdp->cb_timer_active = true;
 }
 
-static void cb_timer_stop(void)
+static void cb_timer_stop(unsigned int cpu)
 {
-    struct rcu_data *rdp = &this_cpu(rcu_data);
+    struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
 
     if (likely(!rdp->cb_timer_active))
         return;
@@ -635,11 +643,14 @@ void __init rcu_init(void)
 }
 
 /*
- * The CPU is becoming idle, so no more read side critical
- * sections, and one more step toward grace period.
+ * The CPU is about to either go idle or enter the guest. In any of
+ * these cases, it can't have any outstanding read side critical sections
+ * so this is one step toward the end of the grace period.
  */
-void rcu_idle_enter(unsigned int cpu)
+void rcu_quiet_enter()
 {
+    unsigned int cpu = smp_processor_id();
+
     ASSERT(!cpumask_test_cpu(cpu, &rcu_ctrlblk.ignore_cpumask));
     cpumask_set_cpu(cpu, &rcu_ctrlblk.ignore_cpumask);
     /*
@@ -652,11 +663,15 @@ void rcu_idle_enter(unsigned int cpu)
      */
     smp_mb();
     cb_timer_start();
+    cb_timer_start(cpu);
 }
 
-void rcu_idle_exit(unsigned int cpu)
+
+void rcu_quiet_exit()
 {
-    cb_timer_stop();
+    unsigned int cpu = smp_processor_id();
+
+    cb_timer_stop(cpu);
     ASSERT(cpumask_test_cpu(cpu, &rcu_ctrlblk.ignore_cpumask));
     cpumask_clear_cpu(cpu, &rcu_ctrlblk.ignore_cpumask);
 }
diff --git a/xen/include/xen/rcupdate.h b/xen/include/xen/rcupdate.h
index 13850865ed..63db0f9887 100644
--- a/xen/include/xen/rcupdate.h
+++ b/xen/include/xen/rcupdate.h
@@ -145,8 +145,8 @@ void call_rcu(struct rcu_head *head,
 
 int rcu_barrier(void);
 
-void rcu_idle_enter(unsigned int cpu);
-void rcu_idle_exit(unsigned int cpu);
+void rcu_quiet_enter(void);
+void rcu_quiet_exit(void);
 
 void rcu_idle_timer_start(void);
 void rcu_idle_timer_stop(void);

[-- Attachment #4: 3_1-rcu-adaptations.patch --]
[-- Type: text/x-patch, Size: 1610 bytes --]

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 0902a15e8d..a8e203a1c1 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -935,7 +935,7 @@ static void complete_domain_destroy(struct rcu_head *head)
     struct domain *d = container_of(head, struct domain, rcu);
     struct vcpu *v;
     int i;
-
+    printk("complete_domain_destroy BEGIN\n");
     /*
      * Flush all state for the vCPU previously having run on the current CPU.
      * This is in particular relevant for x86 HVM ones on VMX, so that this
@@ -991,6 +991,7 @@ static void complete_domain_destroy(struct rcu_head *head)
     _domain_destroy(d);
 
     send_global_virq(VIRQ_DOM_EXC);
+    printk("complete_domain_destroy END\n");
 }
 
 /* Release resources belonging to task @p. */
diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index a9c24b5889..1bdf4ecc53 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -662,7 +662,6 @@ void rcu_quiet_enter()
      * Se the comment before cpumask_andnot() in  rcu_start_batch().
      */
     smp_mb();
-    cb_timer_start();
     cb_timer_start(cpu);
 }
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 6d24a3a135..4a63c11ed1 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -3111,14 +3111,12 @@ void schedule_dump(struct cpupool *c)
 
 void sched_tick_suspend(void)
 {
-    rcu_idle_enter(smp_processor_id());
-    rcu_idle_timer_start();
+    rcu_quiet_enter();
 }
 
 void sched_tick_resume(void)
 {
-    rcu_idle_timer_stop();
-    rcu_idle_exit(smp_processor_id());
+    rcu_quiet_exit();
 }
 
 void wait(void)

[-- Attachment #5: rcu-quiesc-patch.patch --]
[-- Type: text/x-patch, Size: 1755 bytes --]

commit 0d2beb3d4125d65c415860d66974db9a5532ac84
Author: Dario Faggioli <dfaggioli@suse.com>
Date:   Wed Sep 26 11:47:06 2018 +0200

    xen: RCU: bootparam to force quiescence at every call.
    
    Signed-off-by: Dario Faggioli <dfaggioli@suse.com>

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 0f4b1f2a5d..536eb17017 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -110,7 +110,10 @@ static enum {
 static int __init parse_vwfi(const char *s)
 {
 	if ( !strcmp(s, "native") )
+	{
+		rcu_always_quiesc = true;
 		vwfi = NATIVE;
+	}
 	else
 		vwfi = TRAP;
 
diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index 3517790913..219dd2884f 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -140,6 +140,9 @@ static int qhimark = 10000;
 static int qlowmark = 100;
 static int rsinterval = 1000;
 
+bool rcu_always_quiesc = false;
+boolean_param("rcu_force_quiesc", rcu_always_quiesc);
+
 struct rcu_barrier_data {
     struct rcu_head head;
     atomic_t *cpu_count;
@@ -562,6 +565,13 @@ static void rcu_init_percpu_data(int cpu, struct rcu_ctrlblk *rcp,
     rdp->quiescbatch = rcp->completed;
     rdp->qs_pending = 0;
     rdp->cpu = cpu;
+    if ( rcu_always_quiesc )
+    {
+        blimit = INT_MAX;
+        qhimark = 0;
+        qlowmark = 0;
+        //rsinterval = 0;
+    }
     rdp->blimit = blimit;
     init_timer(&rdp->idle_timer, rcu_idle_timer_handler, rdp, cpu);
 }
diff --git a/xen/include/xen/rcupdate.h b/xen/include/xen/rcupdate.h
index 3402eb5caf..274a01acf6 100644
--- a/xen/include/xen/rcupdate.h
+++ b/xen/include/xen/rcupdate.h
@@ -56,6 +56,8 @@ struct rcu_head {
 } while (0)
 
 
+extern bool rcu_always_quiesc;
+
 int rcu_pending(int cpu);
 int rcu_needs_cpu(int cpu);
 

^ permalink raw reply related	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2021-02-15  7:15 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-21 10:54 Null scheduler and vwfi native problem Anders Törnqvist
2021-01-21 18:32 ` Dario Faggioli
2021-01-21 19:40   ` Julien Grall
2021-01-21 23:35     ` Dario Faggioli
2021-01-22  8:06       ` Anders Törnqvist
2021-01-22  9:05         ` Dario Faggioli
2021-01-22 14:26         ` Julien Grall
2021-01-22 17:44           ` Anders Törnqvist
2021-01-25 15:45             ` Dario Faggioli
2021-01-25 16:11           ` Dario Faggioli
2021-01-26 17:03             ` Anders Törnqvist
2021-01-26 22:31               ` Dario Faggioli
2021-01-29  8:08                 ` Anders Törnqvist
2021-01-29  8:18                   ` Jürgen Groß
2021-01-29 10:16                     ` Dario Faggioli
2021-02-01  6:53                       ` Anders Törnqvist
2021-01-30 17:59                   ` Dario Faggioli
2021-02-01  6:55                     ` Anders Törnqvist
2021-02-02  7:59                     ` Julien Grall
2021-02-02 15:03                       ` Dario Faggioli
2021-02-02 15:23                         ` Julien Grall
2021-02-03  7:31                           ` Dario Faggioli
2021-02-03  9:19                             ` Julien Grall
2021-02-03 11:00                               ` Jürgen Groß
2021-02-03 11:20                                 ` Julien Grall
2021-02-03 12:02                                   ` Jürgen Groß
2021-02-15  7:15                     ` Anders Törnqvist
2021-01-22 14:02       ` Julien Grall
2021-01-22 17:30         ` Anders Törnqvist
2021-01-22  8:07   ` Anders Törnqvist
2021-01-21 19:16 ` Julien Grall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).