* ITS emulation race conditions
@ 2017-04-09 23:12 André Przywara
  2017-04-10 16:58 ` Andre Przywara
From: André Przywara @ 2017-04-09 23:12 UTC
  To: Stefano Stabellini, Julien Grall; +Cc: xen-devel

Hi,

I wanted to run some ideas past you on how to prevent the race conditions we are
facing with the ITS emulation and removing devices and/or LPIs.
I think Stefano's idea of tagging a discarded LPI is the key, but still
some details are left to be solved.
I think we are dealing with two issues:
1) A guest's DISCARD can race with an incoming LPI.
Ideally DISCARD would remove the host LPI -> vLPI connection (in the
host_lpis table), so any new incoming (host) LPI would simply be
discarded very early in gicv3_do_LPI() without ever resolving into a
virtual LPI. Now while this removal is atomic, we could have just missed
an incoming LPI, so the old virtual LPI would still traverse down the
VGIC with a "doomed" virtual LPI ID.
I wonder if that could be solved by a "crosswise" check:
- The DISCARD handler *first* tags the pending_irq as "UNMAPPED", *then*
removes the host_lpis connection.
- do_LPI() reads the host_lpis array, *then* checks the UNMAPPED tag
in the corresponding pending_irq (or lets vgic_vcpu_inject_irq() do that).
With this setup the DISCARD handler can assume that no new virtual LPI
instances enter the VGIC afterwards.
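
In code, that ordering would look roughly like this (a sketch only: the
UNMAPPED flag name is made up, the accessors are simplified, and any
locking is left out):

    /* DISCARD emulation: tag first, then remove the route. */
    set_bit(GIC_IRQ_GUEST_UNMAPPED, &p->status);
    smp_wmb();
    write_atomic(&host_lpis[host_lpi], INVALID_LPI);

    /* gicv3_do_LPI(): resolve the route first, then check the tag. */
    vlpi = read_atomic(&host_lpis[host_lpi]);
    if ( vlpi == INVALID_LPI )
        return;                      /* unmapped: drop the LPI early */
    smp_rmb();
    p = irq_to_pending(v, vlpi);     /* v: the vLPI's target VCPU */
    if ( test_bit(GIC_IRQ_GUEST_UNMAPPED, &p->status) )
        return;                      /* lost the race: retire the doomed vLPI */
    vgic_vcpu_inject_irq(v, vlpi);

(Whether plain barriers are enough here or we need to hold the relevant
VGIC lock is exactly one of those details still to be solved.)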

2) An unmapped LPI might still be in use by the VGIC; most importantly,
it might still be in an LR.
Tagging the pending_irq should solve this in general. Whenever a VGIC
function finds the UNMAPPED tag, it does not process the vIRQ, but
retires it. For simplicity we might limit this to the point when a VCPU
exits and an LR gets cleaned up: If we hit a tagged pending_irq here,
we clean the LR, remove the pending_irq from all lists and signal the
ITS emulation that this pending_irq is now ready to be removed (by
calling some kind of cleanup_lpi() function, for instance).
The ITS code can then remove the struct pending_irq from the radix tree.

MAPD(V=0) is now using this to tag all still mapped events as
"UNMAPPED", then counting down the "still alive" pending_irqs in
cleanup_lpi() until they reach zero. At this point it should be safe to
free the pend_irq array.
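
To illustrate the counting part, a minimal sketch (the alive_lpis
counter and the its_dev back-pointer are inventions of mine; events that
are not in use anywhere could be cleaned up right away instead):

    /* MAPD(V=0): tag all still mapped events of this device. */
    atomic_set(&dev->alive_lpis, dev->eventids);
    for ( i = 0; i < dev->eventids; i++ )
        set_bit(GIC_IRQ_GUEST_UNMAPPED, &dev->pend_irqs[i].status);

    /* Called by the VGIC whenever it has retired one tagged pending_irq. */
    void cleanup_lpi(struct domain *d, struct pending_irq *p)
    {
        radix_tree_delete(&d->arch.vgic.pend_lpi_tree, p->irq);
        if ( atomic_dec_and_test(&p->its_dev->alive_lpis) )
            xfree(p->its_dev->pend_irqs);   /* the last one frees the array */
    }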

Does that sound like a plan?
Am I missing something here? Probably yes, hence I am asking ;-)

Cheers,
Andre.


* Re: ITS emulation race conditions
  2017-04-09 23:12 ITS emulation race conditions André Przywara
@ 2017-04-10 16:58 ` Andre Przywara
  2017-04-10 23:01   ` Stefano Stabellini
From: Andre Przywara @ 2017-04-10 16:58 UTC
  To: Stefano Stabellini, Julien Grall; +Cc: xen-devel

Hi,

On 10/04/17 00:12, André Przywara wrote:
> Hi,
> 
> I wanted to run some ideas past you on how to prevent the race conditions we are
> facing with the ITS emulation and removing devices and/or LPIs.
> I think Stefano's idea of tagging a discarded LPI is the key, but still
> some details are left to be solved.
> I think we are dealing with two issues:
> 1) A guest's DISCARD can race with an incoming LPI.
> Ideally DISCARD would remove the host LPI -> vLPI connection (in the
> host_lpis table), so any new incoming (host) LPI would simply be
> discarded very early in gicv3_do_LPI() without ever resolving into a
> virtual LPI. Now while this removal is atomic, we could have just missed
> an incoming LPI, so the old virtual LPI would still traverse down the
> VGIC with a "doomed" virtual LPI ID.
> I wonder if that could be solved by a "crosswise" check:
> - The DISCARD handler *first* tags the pending_irq as "UNMAPPED", *then*
> removes the host_lpis connection.
> - do_LPI() reads the host_lpis array, *then* checks the UNMAPPED tag
> in the corresponding pending_irq (or lets vgic_vcpu_inject_irq() do that).
> With this setup the DISCARD handler can assume that no new virtual LPI
> instances enter the VGIC afterwards.
> 
> 2) An unmapped LPI might still be in use by the VGIC; most importantly,
> it might still be in an LR.
> Tagging the pending_irq should solve this in general. Whenever a VGIC
> function finds the UNMAPPED tag, it does not process the vIRQ, but
> retires it. For simplicity we might limit this to the point when a VCPU
> exits and an LR gets cleaned up: If we hit a tagged pending_irq here,
> we clean the LR, remove the pending_irq from all lists and signal the
> ITS emulation that this pending_irq is now ready to be removed (by
> calling some kind of cleanup_lpi() function, for instance).
> The ITS code can then remove the struct pending_irq from the radix tree.
> MAPD(V=0) is now using this to tag all still mapped events as
> "UNMAPPED", then counting down the "still alive" pending_irqs in
> cleanup_lpi() until they reach zero. At this point it should be safe to
> free the pend_irq array.
> 
> Does that sound like a plan?
> Am I missing something here? Probably yes, hence I am asking ;-)

So there are two issues with this:
- For doing the LPI cleanup, we would need to call a virtual ITS or
virtual LPI specific function directly from gic.c. This looks like bad
coding style, as it breaks the abstraction between the generic GIC code
and the virtual LPI/ITS emulation.

- If we have a "DISCARD; MAPTI" sequence, targeting the same vLPI, while
this vLPI is still in an LR (but not cleaned up until both commands have
been handled), we end up using the wrong pending_irq struct (the
new one). (I assume the UNMAPPED tag would be cleared upon the new MAPTI).

Can this create problems?
I see two possibilities:
a) the old vLPI is still pending in the LR: as the new pending_irq
would be pristine, the code in update_one_lr() would just clear the
QUEUED bit,
but not do anything further. The LR would be kept as used by this vLPI,
with the state still set to GICH_LR_PENDING. Upon re-entering the VCPU
this would make the new vLPI pending.
b) the old vLPI has been EOIed: the new pending_irq would be pristine,
the code
in update_one_lr() would try to clean it up anyway, which should not
hurt, AFAICT.

So if there is no new LPI triggered, a) would be wrong, but b) right. Is
that correct so far?

Now if there is a new LPI, a) would be correct, though the priority
might be different (though I am not sure this is an issue). b) should
work anyway, since the cleanup code checks for a new pending condition,
so would (re-)inject the (new) vLPI.

Julien does not seem to be a fan of this tagging idea, as this might
create more subtle failures that we don't see at the moment (which I can
understand).

So I would like to know how to proceed here:
I) stick to the tag, and fix that particular case above by checking for
the "inflight && ENABLED" condition and clearing the LR if the
pending_irq no longer claims to be pending
II) introduce a NO_LONGER_PENDING tag in addition to the UNMAPPED tag,
where a new MAPTI just clears UNMAPPED_LPI, but keeps NO_LONGER_PENDING.
NO_LONGER_PENDING would be checked and cleared by update_one_lr(),
UNMAPPED by vgic_vcpu_inject_irq().
That would leave the issue of how to call the pend_irq_cleanup()
function from update_one_lr() without breaking the abstraction (see the
sketch below).
III) Revert to the "check for NULL" solution.
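
For II), the split would look roughly like this (again just a sketch,
with made-up flag names):

    /* DISCARD: doom both the mapping and the current LR occupant. */
    set_bit(GIC_IRQ_GUEST_UNMAPPED, &p->status);
    set_bit(GIC_IRQ_GUEST_NO_LONGER_PENDING, &p->status);

    /* A new MAPTI on the same vLPI: revive the mapping only. */
    clear_bit(GIC_IRQ_GUEST_UNMAPPED, &p->status);

    /* vgic_vcpu_inject_irq(): drop late LPIs for unmapped events. */
    if ( test_bit(GIC_IRQ_GUEST_UNMAPPED, &p->status) )
        return;

    /* update_one_lr(): retire a stale LR occupant. */
    if ( test_and_clear_bit(GIC_IRQ_GUEST_NO_LONGER_PENDING, &p->status) )
    {
        gic_hw_ops->clear_lr(i);
        return;     /* this is where pend_irq_cleanup() would be needed */
    }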

Mmh, this is nasty stuff, any suggestions?

Cheers,
Andre.


* Re: ITS emulation race conditions
  2017-04-10 16:58 ` Andre Przywara
@ 2017-04-10 23:01   ` Stefano Stabellini
  2017-04-11  9:24     ` Julien Grall
From: Stefano Stabellini @ 2017-04-10 23:01 UTC
  To: Andre Przywara; +Cc: xen-devel, Julien Grall, Stefano Stabellini


On Mon, 10 Apr 2017, Andre Przywara wrote:
> Hi,
> 
> On 10/04/17 00:12, André Przywara wrote:
> > Hi,
> > 
> > I wanted to run some ideas past you on how to prevent the race conditions we are
> > facing with the ITS emulation and removing devices and/or LPIs.
> > I think Stefano's idea of tagging a discarded LPI is the key, but still
> > some details are left to be solved.
> > I think we are dealing with two issues:
> > 1) A guest's DISCARD can race with an incoming LPI.
> > Ideally DISCARD would remove the host LPI -> vLPI connection (in the
> > host_lpis table), so any new incoming (host) LPI would simply be
> > discarded very early in gicv3_do_LPI() without ever resolving into a
> > virtual LPI. Now while this removal is atomic, we could have just missed
> > an incoming LPI, so the old virtual LPI would still traverse down the
> > VGIC with a "doomed" virtual LPI ID.
> > I wonder if that could be solved by a "crosswise" check:
> > - The DISCARD handler *first* tags the pending_irq as "UNMAPPED", *then*
> > removes the host_lpis connection.
> > - do_LPI() reads the host_lpis array, *then* checks the UNMAPPED tag
> > in the corresponding pending_irq (or lets vgic_vcpu_inject_irq() do that).
> > With this setup the DISCARD handler can assume that no new virtual LPI
> > instances enter the VGIC afterwards.
> >
> > 2) An unmapped LPI might still be in use by the VGIC; most importantly,
> > it might still be in an LR.
> > Tagging the pending_irq should solve this in general. Whenever a VGIC
> > function finds the UNMAPPED tag, it does not process the vIRQ, but
> > retires it. For simplicity we might limit this to the point when a VCPU
> > exits and an LR gets cleaned up: If we hit a tagged pending_irq here,
> > we clean the LR, remove the pending_irq from all lists and signal the
> > ITS emulation that this pending_irq is now ready to be removed (by
> > calling some kind of cleanup_lpi() function, for instance).
> > The ITS code can then remove the struct pending_irq from the radix tree.
> > MAPD(V=0) is now using this to tag all still mapped events as
> > "UNMAPPED", then counting down the "still alive" pending_irqs in
> > cleanup_lpi() until they reach zero. At this point it should be safe to
> > free the pend_irq array.
> > 
> > Does that sound like a plan?
> > Am I missing something here? Probably yes, hence I am asking ;-)
> 
> So there are two issues with this:
> - For doing the LPI cleanup, we would need to call a virtual ITS or
> virtual LPI specific function directly from gic.c. This looks like bad
> coding style, as it breaks the abstraction between the generic GIC code
> and the virtual LPI/ITS emulation.

This is just code organization, I am not worried about it. We might have
to register cleanup functions. The real problem to solve is below.


> - If we have a "DISCARD; MAPTI" sequence, targeting the same vLPI, while
> this vLPI is still in an LR (but not cleaned up until both commands have
> been handled), we end up using the wrong pending_irq struct (the
> new one). (I assume the UNMAPPED tag would be cleared upon the new MAPTI).

It looks like "DISCARD; MAPTI" would be a problem even if we go with
"GIC: Add checks for NULL pointer pending_irq's", because instead of a
NULL pending_irq, we could get the new pending_irq, the one backing the
same vlpi but a different eventid.


> Can this create problems?
> I see two possibilities:
> a) the old vLPI is still pending in the LR: as the new pending_irq
> would be pristine, the code in update_one_lr() would just clear the
> QUEUED bit,
> but not do anything further. The LR would be kept as used by this vLPI,
> with the state still set to GICH_LR_PENDING. Upon re-entering the VCPU
> this would make the new vLPI pending.
> b) the old vLPI has been EOIed: the new pending_irq would be pristine,
> the code
> in update_one_lr() would try to clean it up anyway, which should not
> hurt, AFAICT.

This cannot be allowed to happen. We have to keep a consistent internal
state at all times: we cannot change the vlpi mapping in pend_lpi_tree
before the old vlpi is completely discarded. We cannot have a vlpi marked
as "UNMAPPED", but still alive in an LR, while vlpi_to_pending already
returns the new vlpi.

We have a number of possible approaches, I'll mark them with [letter].


[a] On DISCARD we do everything we can to remove the vlpi from
everywhere:

- if pending_irq is in lr_queue, remove it from the list
- if the vlpi is in an LR register as pending (not active, see below),
  remove it from the LR (see the code in gic_restore_pending_irqs, under
  "No more free LRs: find a lower priority irq to evict")
- remove pending_irq from inflight
- remove pending_irq from pend_lpi_tree

At this stage there should be no more traces of the vlpi/pending_irq
anywhere.
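
In code, [a] would be something like this (a sketch; gic_evict_lr is a
made-up helper modeled on the eviction code in gic_restore_pending_irqs,
and this ignores the cross-vcpu problem discussed below):

    /* DISCARD, with the appropriate VGIC lock held: */
    if ( !list_empty(&p->lr_queue) )
        list_del_init(&p->lr_queue);
    if ( test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
        gic_evict_lr(v, p);                  /* pending only, see below */
    if ( !list_empty(&p->inflight) )
        list_del_init(&p->inflight);
    radix_tree_delete(&d->arch.vgic.pend_lpi_tree, p->irq);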

What do we do if a "DISCARD; MAPTI" pair of commands is issued while the
old vlpi is still ACTIVE in an LR?  ACTIVE means that the guest is still
handling the irq and hasn't even EOIed it yet. I know that physical LPIs
have no active state, but what about virtual LPIs? I am tempted to say
that in any case for simplicity we could just remove the vlpi from the
LR ("evict") anyway. I don't think this case should happen with a well
behaved guest anyhow.

But the other issue is that "DISCARD; MAPTI" could be done on a
different vcpu, compared to the one having the old vlpi in an LR. In
this case, we would have to send an SGI to the right pcpu and clear the
LR, which is a royal pain in the neck because the other vcpu might not
even be running. Also another MAPTI could come in on a different pcpu
while we haven't completed the first LR removal. There might be races.
Let's explore other options.


[b] On DISCARD we would have to:
- if pending_irq is in lr_queue, remove it from the list
- if the vlpi is in an LR register:
  - mark as UNMAPPED (this check can be done from another vcpu)
  - remove from inflight (it could also be done in gic_update_one_lr)

Then when the guest EOIs the interrupt, in gic_update_one_lr:
- clear LR
- if UNMAPPED
  - clear UNMAPPED and return
  - remove old pending_irq from pend_lpi_tree (may not be necessary?)
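
As a sketch (assuming a GIC_IRQ_GUEST_UNMAPPED status bit):

    /* DISCARD: */
    if ( !list_empty(&p->lr_queue) )
        list_del_init(&p->lr_queue);
    if ( test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
    {
        set_bit(GIC_IRQ_GUEST_UNMAPPED, &p->status);
        list_del_init(&p->inflight);
    }

    /* gic_update_one_lr(), when the guest EOIs: */
    gic_hw_ops->clear_lr(i);
    if ( test_and_clear_bit(GIC_IRQ_GUEST_UNMAPPED, &p->status) )
    {
        /* maybe: radix_tree_delete(&d->arch.vgic.pend_lpi_tree, p->irq); */
        return;
    }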


Now the problem with this approach, like you pointed out, is: what do we
do if the guest issues a MAPTI before EOIing the interrupt?


[c] The simplest thing to do is ignore it (or report an error back to the
guest) and printk a warning if the old vlpi is still UNMAPPED. It could
be a decent solution for Xen 4.9.


[d] Otherwise, we can try to handle it properly. On DISCARD:
- if pending_irq is in lr_queue, remove it from the list
- if vlpi is in LR:
  - mark as UNMAPPED
  - remove from inflight

On MAPTI:
if old pending_irq is UNMAPPED:
  - clear UNMAPPED in old pending_irq
  - add UNMAPPED to new pending_irq
- remove old pending_irq from pend_lpi_tree
- add new pending_irq to pend_lpi_tree

Then when the guest EOIs the interrupt, in gic_update_one_lr:
- clear LR
- if UNMAPPED
  - clear UNMAPPED and return
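
The MAPTI side of [d] would be roughly (sketch; old_p/new_p are the old
and the new pending_irq):

    old_p = radix_tree_lookup(&d->arch.vgic.pend_lpi_tree, vlpi);
    if ( old_p && test_and_clear_bit(GIC_IRQ_GUEST_UNMAPPED, &old_p->status) )
        set_bit(GIC_IRQ_GUEST_UNMAPPED, &new_p->status); /* hand the tag over */
    radix_tree_delete(&d->arch.vgic.pend_lpi_tree, vlpi);
    radix_tree_insert(&d->arch.vgic.pend_lpi_tree, vlpi, new_p);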


Among these approaches, [b]+[c] would be OK for 4.9. Or [d]. [a] looks
simple but it is actually difficult to do correctly.


> So I would like to know how to proceed here:
> I) stick to the tag, and fix that particular case above by checking for
> the "inflight && ENABLED" condition and clearing the LR if the
> pending_irq no longer claims to be pending
> II) introduce a NO_LONGER_PENDING tag in addition to the UNMAPPED tag,
> where a new MAPTI just clears UNMAPPED_LPI, but keeps NO_LONGER_PENDING.
> NO_LONGER_PENDING would be checked and cleared by update_one_lr(),
> UNMAPPED by vgic_vcpu_inject_irq().
> That would leave the issue of how to call the pend_irq_cleanup() function
> from update_one_lr() without breaking the abstraction

If MAPTI is called with a different eventid but the same vlpi, wouldn't
vlpi_to_pending return the wrong pending_irq struct in
gic_update_one_lr, causing problems for both I) and II)? The flags would
not be set.


> III) Revert to the "check for NULL" solution.

Wouldn't irq_to_pending potentially return the new pending_irq instead
of NULL in gic_update_one_lr? This is a problem. For example, the new
pending_irq is not in the inflight list, but gic_update_one_lr will try
to remove it from it.

I think this approach is error prone: we should always get the right
pending_irq struct, or one that is appropriately marked with a flag, or
NULL. In this case, we would still have to mark the new pending_irq as
"UNMAPPED" to tell gic_update_one_lr to do nothing and return. It would
end up very similar to [d]. We might as well do [d].



* Re: ITS emulation race conditions
  2017-04-10 23:01   ` Stefano Stabellini
@ 2017-04-11  9:24     ` Julien Grall
  2017-04-11 11:13       ` Julien Grall
From: Julien Grall @ 2017-04-11  9:24 UTC
  To: Stefano Stabellini, Andre Przywara; +Cc: xen-devel

Hi Stefano,

On 04/11/2017 12:01 AM, Stefano Stabellini wrote:
> On Mon, 10 Apr 2017, Andre Przywara wrote:
>> Hi,
>>
>> On 10/04/17 00:12, André Przywara wrote:
>>> Hi,
>>>
>>> I wanted to run some ideas past you on how to prevent the race conditions we are
>>> facing with the ITS emulation and removing devices and/or LPIs.
>>> I think Stefano's idea of tagging a discarded LPI is the key, but still
>>> some details are left to be solved.
>>> I think we are dealing with two issues:
>>> 1) A guest's DISCARD can race with an incoming LPI.
>>> Ideally DISCARD would remove the host LPI -> vLPI connection (in the
>>> host_lpis table), so any new incoming (host) LPI would simply be
>>> discarded very early in gicv3_do_LPI() without ever resolving into a
>>> virtual LPI. Now while this removal is atomic, we could have just missed
>>> an incoming LPI, so the old virtual LPI would still traverse down the
>>> VGIC with a "doomed" virtual LPI ID.
>>> I wonder if that could be solved by a "crosswise" check:
>>> - The DISCARD handler *first* tags the pending_irq as "UNMAPPED", *then*
>>> removes the host_lpis connection.
>>> - do_LPI() reads the host_lpis array, *then* checks the UNMAPPED tag
>>> in the corresponding pending_irq (or lets vgic_vcpu_inject_irq() do that).
>>> With this setup the DISCARD handler can assume that no new virtual LPI
>>> instances enter the VGIC afterwards.
>>>
>>> 2) An unmapped LPI might still be in use by the VGIC; most importantly,
>>> it might still be in an LR.
>>> Tagging the pending_irq should solve this in general. Whenever a VGIC
>>> function finds the UNMAPPED tag, it does not process the vIRQ, but
>>> retires it. For simplicity we might limit this to the point when a VCPU
>>> exits and an LR gets cleaned up: If we hit a tagged pending_irq here,
>>> we clean the LR, remove the pending_irq from all lists and signal the
>>> ITS emulation that this pending_irq is now ready to be removed (by
>>> calling some kind of cleanup_lpi() function, for instance).
>>> The ITS code can then remove the struct pending_irq from the radix tree.
>>> MAPD(V=0) is now using this to tag all still mapped events as
>>> "UNMAPPED", then counting down the "still alive" pending_irqs in
>>> cleanup_lpi() until they reach zero. At this point it should be safe to
>>> free the pend_irq array.
>>>
>>> Does that sound like a plan?
>>> Am I missing something here? Probably yes, hence I am asking ;-)
>>
>> So there are two issues with this:
>> - For doing the LPI cleanup, we would need to call a virtual ITS or
>> virtual LPI specific function directly from gic.c. This looks like bad
>> coding style, as it breaks the abstraction between the generic GIC code
>> and the virtual LPI/ITS emulation.
>
> This is just code organization, I am not worried about it. We might have
> to register cleanup functions. The real problem to solve is below.
>
>
>> - If we have a "DISCARD; MAPTI" sequence, targeting the same vLPI, while
>> this vLPI is still in an LR (but not cleaned up until both commands have
>> been handled), we end up using the wrong pending_irq struct (the
>> new one). (I assume the UNMAPPED tag would be cleared upon the new MAPTI).
>
> It looks like "DISCARD; MAPTI" would be a problem even if we go with
> "GIC: Add checks for NULL pointer pending_irq's", because instead of a
> NULL pending_irq, we could get the new pending_irq, the one backing the
> same vlpi but a different eventid.
>
>
>> Can this create problems?
>> I see two possibilities:
>> a) the old vLPI is still pending in the LR: as the new pending_irq
>> would be pristine, the code in update_one_lr() would just clear the
>> QUEUED bit,
>> but not do anything further. The LR would be kept as used by this vLPI,
>> with the state still set to GICH_LR_PENDING. Upon re-entering the VCPU
>> this would make the new vLPI pending.
>> b) the old vLPI has been EOIed: the new pending_irq would be pristine,
>> the code
>> in update_one_lr() would try to clean it up anyway, which should not
>> hurt, AFAICT.
>
> This cannot be allowed to happen. We have to keep a consistent internal
> state at all times: we cannot change the vlpi mapping in pend_lpi_tree
> before the old vlpi is completely discarded. We cannot have a vlpi marked
> as "UNMAPPED", but still alive in an LR, while vlpi_to_pending already
> returns the new vlpi.
>
> We have a number of possible approaches, I'll mark them with [letter].
>
>
> [a] On DISCARD we do everything we can to remove the vlpi from
> everywhere:
>
> - if pending_irq is in lr_queue, remove it from the list
> - if the vlpi is in an LR register as pending (not active, see below),
>   remove it from the LR (see the code in gic_restore_pending_irqs, under
>   "No more free LRs: find a lower priority irq to evict")
> - remove pending_irq from inflight
> - remove pending_irq from pend_lpi_tree
>
> At this stage there should be no more traces of the vlpi/pending_irq
> anywhere.
>
> What do we do if a "DISCARD; MAPTI" pair of commands is issued while the
> old vlpi is still ACTIVE in an LR?  ACTIVE means that the guest is still
> handling the irq and hasn't even EOIed it yet. I know that physical LPIs
> have no active state, but what about virtual LPIs? I am tempted to say
> that in any case for simplicity we could just remove the vlpi from the
> LR ("evict") anyway. I don't think this case should happen with a well
> behaved guest anyhow.
>
> But the other issue is that "DISCARD; MAPTI" could be done on a
> different vcpu, compared to the one having the old vlpi in an LR. In
> this case, we would have to send an SGI to the right pcpu and clear the
> LR, which is a royal pain in the neck because the other vcpu might not
> even be running. Also another MAPTI could come in on a different pcpu
> while we haven't completed the first LR removal. There might be races.
> Let's explore other options.
>
>
> [b] On DISCARD we would have to:
> - if pending_irq is in lr_queue, remove it from the list
> - if the vlpi is in an LR register:
>   - mark as UNMAPPED (this check can be done from another vcpu)
>   - remove from inflight (it could also be done in gic_update_one_lr)
>
> Then when the guest EOIs the interrupt, in gic_update_one_lr:
> - clear LR
> - if UNMAPPED
>   - clear UNMAPPED and return
>   - remove old pending_irq from pend_lpi_tree (may not be necessary?)
>
>
> Now the problem with this approach, like you pointed out, is: what do we
> do if the guest issues a MAPTI before EOIing the interrupt?
>
>
> [c] The simplest thing to do is ignore it (or report an error back to the
> guest) and printk a warning if the old vlpi is still UNMAPPED. It could
> be a decent solution for Xen 4.9.

I don't think this is a decent solution for Xen 4.9. It could be easily
triggered, and it is possible to have "DISCARD; MAPTI" in the command
queue.

>
>
> [d] Otherwise, we can try to handle it properly. On DISCARD:
> - if pending_irq is in lr_queue, remove it from the list
> - if vlpi is in LR:
>   - mark as UNMAPPED
>   - remove from inflight
>
> On MAPTI:
> if old pending_irq is UNMAPPED:
>   - clear UNMAPPED in old pending_irq
>   - add UNMAPPED to new pending_irq
> - remove old pending_irq from pend_lpi_tree
> - add new pending_irq to pend_lpi_tree
>
> Then when the guest EOIs the interrupt, in gic_update_one_lr:
> - clear LR
> - if UNMAPPED
>   - clear UNMAPPED and return

So who is removing the pending_irq from the radix tree? If you do it in
both MAPTI and gic_update_one_lr, you will end up with a potential race
condition where MAPTI adds the new one to the radix tree and
gic_update_one_lr removes it from the radix tree. So the mapping will
disappear.

It was not easily solvable in the migration case, and I don't think this
is different here. Also, the new MAPTI may use a different collection 
(vCPU ID) so how do you protect that correctly?

>
> Among these approaches, [b]+[c] would be OK for 4.9. Or [d]. [a] looks
> simple but it is actually difficult to do correctly.
>
>
>> So I would like to know how to proceed here:
>> I) stick to the tag, and fix that particular case above by checking for
>> the "inflight && ENABLED" condition and clearing the LR if the
>> pending_irq no longer claims to be pending
>> II) introduce a NO_LONGER_PENDING tag in addition to the UNMAPPED tag,
>> where a new MAPTI just clears UNMAPPED_LPI, but keeps NO_LONGER_PENDING.
>> NO_LONGER_PENDING would be checked and cleared by update_one_lr(),
>> UNMAPPED by vgic_vcpu_inject_irq().
>> That would leave the issue of how to call the pend_irq_cleanup() function
>> from update_one_lr() without breaking the abstraction
>
> If MAPTI is called with a different eventid but the same vlpi, wouldn't
> vlpi_to_pending return the wrong pending_irq struct in
> gic_update_one_lr, causing problems for both I) and II)? The flags would
> not be set.
>
>
>> III) Revert to the "check for NULL" solution.
>
> Wouldn't irq_to_pending potentially return the new pending_irq instead
> of NULL in gic_update_one_lr? This is a problem. For example, the new
> pending_irq is not in the inflight list, but gic_update_one_lr will try
> to remove it from it.
>
> I think this approach is error prone: we should always get the right
> pending_irq struct, or one that is appropriately marked with a flag, or
> NULL. In this case, we would still have to mark the new pending_irq as
> "UNMAPPED" to tell gic_update_one_lr to do nothing and return. It would
> end up very similar to [d]. We might as well do [d].

I think you still have to handle the NULL case because you may receive a 
spurious interrupt from the host whilst cleaning up the mapping.

The pLPI <-> vLPI mapping may still exist, but when you get into
vgic_vcpu_inject_irq, irq_to_pending will return NULL as the pending_irq
does not exist anymore in the radix tree.
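
I.e. something like this at the top of vgic_vcpu_inject_irq() (sketch):

    struct pending_irq *n = irq_to_pending(v, virq);

    /* A late host LPI can race with the removal from the radix tree. */
    if ( unlikely(!n) )
        return;     /* spurious: the LPI has been unmapped meanwhile */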

Cheers,

-- 
Julien Grall


* Re: ITS emulation race conditions
  2017-04-11  9:24     ` Julien Grall
@ 2017-04-11 11:13       ` Julien Grall
From: Julien Grall @ 2017-04-11 11:13 UTC
  To: Stefano Stabellini, Andre Przywara; +Cc: xen-devel



On 11/04/17 10:24, Julien Grall wrote:
> Hi Stefano,
>
> On 04/11/2017 12:01 AM, Stefano Stabellini wrote:
>> On Mon, 10 Apr 2017, Andre Przywara wrote:
>>> Hi,
>>>
>>> On 10/04/17 00:12, André Przywara wrote:
>>>> Hi,
>>>>
>>>> I wanted to run some ideas past you on how to prevent the race conditions we are
>>>> facing with the ITS emulation and removing devices and/or LPIs.
>>>> I think Stefano's idea of tagging a discarded LPI is the key, but still
>>>> some details are left to be solved.
>>>> I think we are dealing with two issues:
>>>> 1) A guest's DISCARD can race with an incoming LPI.
>>>> Ideally DISCARD would remove the host LPI -> vLPI connection (in the
>>>> host_lpis table), so any new incoming (host) LPI would simply be
>>>> discarded very early in gicv3_do_LPI() without ever resolving into a
>>>> virtual LPI. Now while this removal is atomic, we could have just
>>>> missed
>>>> an incoming LPI, so the old virtual LPI would still traverse down the
>>>> VGIC with a "doomed" virtual LPI ID.
>>>> I wonder if that could be solved by a "crosswise" check:
>>>> - The DISCARD handler *first* tags the pending_irq as "UNMAPPED",
>>>> *then*
>>>> removes the host_lpis connection.
>>>> - do_LPI() reads the host_lpis array, *then* checks the UNMAPPED tag
>>>> in the corresponding pending_irq (or lets vgic_vcpu_inject_irq() do
>>>> that).
>>>> With this setup the DISCARD handler can assume that no new virtual LPI
>>>> instances enter the VGIC afterwards.
>>>>
>>>> 2) An unmapped LPI might still be in use by the VGIC; most
>>>> importantly, it might still be in an LR.
>>>> Tagging the pending_irq should solve this in general. Whenever a VGIC
>>>> function finds the UNMAPPED tag, it does not process the vIRQ, but
>>>> retires it. For simplicity we might limit this to the point when a VCPU
>>>> exits and an LR gets cleaned up: If we hit a tagged pending_irq here,
>>>> we clean the LR, remove the pending_irq from all lists and signal the
>>>> ITS emulation that this pending_irq is now ready to be removed (by
>>>> calling some kind of cleanup_lpi() function, for instance).
>>>> The ITS code can then remove the struct pending_irq from the radix
>>>> tree.
>>>> MAPD(V=0) is now using this to tag all still mapped events as
>>>> "UNMAPPED", then counting down the "still alive" pending_irqs in
>>>> cleanup_lpi() until they reach zero. At this point it should be safe to
>>>> free the pend_irq array.
>>>>
>>>> Does that sound like a plan?
>>>> Am I missing something here? Probably yes, hence I am asking ;-)
>>>
>>> So there are two issues with this:
>>> - For doing the LPI cleanup, we would need to call a virtual ITS or
>>> virtual LPI specific function directly from gic.c. This looks like bad
>>> coding style, as it breaks the abstraction between the generic GIC code
>>> and the virtual LPI/ITS emulation.
>>
>> This is just code organization, I am not worried about it. We might have
>> to register cleanup functions. The real problem to solve is below.
>>
>>
>>> - If we have a "DISCARD; MAPTI" sequence, targeting the same vLPI, while
>>> this vLPI is still in an LR (but not cleaned up until both commands have
>>> been handled), we end up using the wrong pending_irq struct (the
>>> new one). (I assume the UNMAPPED tag would be cleared upon the new
>>> MAPTI).
>>
>> It looks like "DISCARD; MAPTI" would be a problem even if we go with
>> "GIC: Add checks for NULL pointer pending_irq's", because instead of a
>> NULL pending_irq, we could get the new pending_irq, the one backing the
>> same vlpi but a different eventid.
>>
>>
>>> Can this create problems?
>>> I see two possibilities:
>>> a) the old vLPI is still pending in the LR: as the new pending_irq
>>> would be pristine, the code in update_one_lr() would just clear the
>>> QUEUED bit,
>>> but not do anything further. The LR would be kept as used by this vLPI,
>>> with the state still set to GICH_LR_PENDING. Upon re-entering the VCPU
>>> this would make the new vLPI pending.
>>> b) the old vLPI has been EOIed: the new pending_irq would be
>>> pristine, the code
>>> in update_one_lr() would try to clean it up anyway, which should not
>>> hurt, AFAICT.
>>
>> This cannot be allowed to happen. We have to keep a consistent internal
>> state at all times: we cannot change the vlpi mapping in pend_lpi_tree
>> before the old vlpi is completely discarded. We cannot have a vlpi marked
>> as "UNMAPPED", but still alive in an LR, while vlpi_to_pending already
>> returns the new vlpi.

Well, in hardware LPIs are handled by the re-distributor, and the ITS is
only acting as a walker to find the (DeviceID, EventID) -> LPI mapping.

In real hardware, an LPI may still be received after the unmap if the
re-distributor has already injected it. The OS will receive the LPI and
have to deal with it, potentially re-directing it to the new interrupt
handler and not the old one.

The problem is the same here: an LPI may be in an LR whilst the DISCARD
happens. I think this is too late, and the guest should receive it.

>>
>> We have a number of possible approaches, I'll mark them with [letter].
>>
>>
>> [a] On DISCARD we do everything we can to remove the vlpi from
>> everywhere:
>>
>> - if pending_irq is in lr_queue, remove it from the list
>> - if the vlpi is in an LR register as pending (not active, see below),
>>   remove it from the LR (see the code in gic_restore_pending_irqs, under
>>   "No more free LRs: find a lower priority irq to evict")
>> - remove pending_irq from inflight
>> - remove pending_irq from pend_lpi_tree
>>
>> At this stage there should be no more traces of the vlpi/pending_irq
>> anywhere.
>>
>> What do we do if a "DISCARD; MAPTI" pair of commands is issued while the
>> old vlpi is still ACTIVE in an LR?  ACTIVE means that the guest is still
>> handling the irq and hasn't even EOIed it yet. I know that physical LPIs
>> have no active state, but what about virtual LPIs? I am tempted to say
>> that in any case for simplicity we could just remove the vlpi from the
>> LR ("evict") anyway. I don't think this case should happen with a well
>> behaved guest anyhow.
>>
>> But the other issue is that "DISCARD; MAPTI" could be done on a
>> different vcpu, compared to the one having the old vlpi in an LR. In
>> this case, we would have to send an SGI to the right pcpu and clear the
>> LR, which is a royal pain in the neck because the other vcpu might not
>> even be running. Also another MAPTI could come in on a different pcpu
>> while we haven't completed the first LR removal. There might be races.
>> Let's explore other options.
>>
>>
>> [b] On DISCARD we would have to:
>> - if pending_irq is in lr_queue, remove it from the list
>> - if the vlpi is in an LR register:
>>   - mark as UNMAPPED (this check can be done from another vcpu)
>>   - remove from inflight (it could also be done in gic_update_one_lr)
>>
>> Then when the guest EOIs the interrupt, in gic_update_one_lr:
>> - clear LR
>> - if UNMAPPED
>>   - clear UNMAPPED and return
>>   - remove old pending_irq from pend_lpi_tree (may not be necessary?)
>>
>>
>> Now the problem with this approach, like you pointed out, is: what do we
>> do if the guest issues a MAPTI before EOIing the interrupt?
>>
>>
>> [c] The simplest thing to do is ignore it (or report an error back to the
>> guest) and printk a warning if the old vlpi is still UNMAPPED. It could
>> be a decent solution for Xen 4.9.
>
> I don't think this is a decent solution for Xen 4.9. It could be easily
> triggered, and it is possible to have "DISCARD; MAPTI" in the command
> queue.
>
>>
>>
>> [d] Otherwise, we can try to handle it properly. On DISCARD:
>> - if pending_irq is in lr_queue, remove it from the list
>> - if vlpi is in LR:
>>   - mark as UNMAPPED
>>   - remove from inflight
>>
>> On MAPTI:
>> if old pending_irq is UNMAPPED:
>>   - clear UNMAPPED in old pending_irq
>>   - add UNMAPPED to new pending_irq
>> - remove old pending_irq from pend_lpi_tree
>> - add new pending_irq to pend_lpi_tree
>>
>> Then when the guest EOIs the interrupt, in gic_update_one_lr:
>> - clear LR
>> - if UNMAPPED
>>   - clear UNMAPPED and return
>
> So who is removing the pending_irq from the radix tree? If you do it in
> both MAPTI and gic_update_one_lr, you will end up with a potential race
> condition where MAPTI adds the new one to the radix tree and
> gic_update_one_lr removes it from the radix tree. So the mapping will
> disappear.
>
> It was not easily solvable in the migration case, and I don't think this
> is different here. Also, the new MAPTI may use a different collection
> (vCPU ID) so how do you protect that correctly?

Thinking a bit more, I think there is another race with your suggestion,
between gic_update_one_lr and vgic_vcpu_inject_irq.

If I understand your suggestion correctly, on UNMAPPED we clear the LRs
and return. However, we may have received a new interrupt just before
clearing the LRs.

Do you expect vgic_vcpu_inject_irq to clear UNMAPPED? If so, how will
the function know whether this is the new pending_irq or the previous
one that should be ignored?

>
>>
>> Among these approaches, [b]+[c] would be OK for 4.9. Or [d]. [a] looks
>> simple but it is actually difficult to do correctly.
>>
>>
>>> So I would like to know how to proceed here:
>>> I) stick to the tag, and fix that particular case above by checking for
>>> the "inflight && ENABLED" condition and clearing the LR if the
>>> pending_irq no longer claims to be pending
>>> II) introduce a NO_LONGER_PENDING tag in addition to the UNMAPPED tag,
>>> where a new MAPTI just clears UNMAPPED_LPI, but keeps NO_LONGER_PENDING.
>>> NO_LONGER_PENDING would be checked and cleared by update_one_lr(),
>>> UNMAPPED by vgic_vcpu_inject_irq().
>>> That would leave the issue of how to call the pend_irq_cleanup() function
>>> from update_one_lr() without breaking the abstraction
>>
>> If MAPTI is called with a different eventid but the same vlpi, wouldn't
>> vlpi_to_pending return the wrong pending_irq struct in
>> gic_update_one_lr, causing problems for both I) and II)? The flags would
>> not be set.
>>
>>
>>> III) Revert to the "check for NULL" solution.
>>
>> Wouldn't irq_to_pending potentially return the new pending_irq instead
>> of NULL in gic_update_one_lr? This is a problem. For example, the new
>> pending_irq is not in the inflight list, but gic_update_one_lr will try
>> to remove it from it.
>>
>> I think this approach is error prone: we should always get the right
>> pending_irq struct, or one that is appropriately marked with a flag, or
>> NULL. In this case, we would still have to mark the new pending_irq as
>> "UNMAPPED" to tell gic_update_one_lr to do nothing and return. It would
>> end up very similar to [d]. We might as well do [d].
>
> I think you still have to handle the NULL case because you may receive a
> spurious interrupt from the host whilst cleaning up the mapping.
>
> The pLPI <-> vLPI mapping may still exist, but when you get into
> vgic_vcpu_inject_irq, irq_to_pending will return NULL as the pending_irq
> does not exist anymore in the radix tree.
>
> Cheers,
>

-- 
Julien Grall


