ITS emulation race conditions

* ITS emulation race conditions
@ 2017-04-09 23:12 André Przywara
  2017-04-10 16:58 ` Andre Przywara
  0 siblings, 1 reply; 6+ messages in thread
From: André Przywara @ 2017-04-09 23:12 UTC (permalink / raw)
  To: Stefano Stabellini, Julien Grall; +Cc: xen-devel

Hi,

I wanted to run some ideas on how to prevent the race conditions we are
facing with the ITS emulation and removing devices and/or LPIs.
I think Stefano's idea of tagging a discarded LPI is the key, but still
some details are left to be solved.
I think we are dealing with two issues:
1) A guest's DISCARD can race with in incoming LPI.
Ideally DISCARD would remove the host LPI -> vLPI connection (in the
host_lpis table), so any new incoming (host) LPI would simply be
discarded very early in gicv3_do_LPI() without ever resolving into a
virtual LPI. Now while this removal is atomic, we could have just missed
an incoming LPI, so the old virtual LPI would still traverse down the
VGIC with a "doomed" virtual LPI ID.
I wonder if that could be solved by a "crosswise" check:
- The DISCARD handler *first* tags the pending_irq as "UNMAPPED", *then*
removes the host_lpis connectin.
- do_LPI() reads the host_lpis() array, *then* checks the UNMAPPED tag
in the corresponding pending_irq (or lets vgic_vcpu_inject_irq() do that).
With this setup the DISCARD handler can assume that no new virtual LPI
instances enter the VGIC afterwards.

2) An unmapped LPI might still be in use by the VGIC, foremost might
still be in an LR.
Tagging the pending_irq should solve this in general. Whenever a VGIC
function finds the UNMAPPED tag, it does not process the vIRQ, but
retires it. For simplicity we might limit this to handling when a VCPU
exists and an LR gets cleaned up: If we hit a tagged pending_irq here,
we clean the LR, remove the pending_irq from all lists and signal the
ITS emulation that this pending_irq is now ready to be removed (by
calling some kind of cleanup_lpi() function, for instance).
The ITS code can then remove the struct pending_irq from the radix tree.

MAPD(V=0) is now using this to tag all still mapped events as
"UNMAPPED", the counting down the "still alive" pending_irqs in
cleanup_lpi() until they reach zero. At this point it should be safe to
free the pend_irq array.

Does that sound like a plan?
Do I miss something here? Probably yes, hence I am asking ;-)

Cheers,
Andre.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread