All of lore.kernel.org
 help / color / mirror / Atom feed
* x86/HVM: Linux'es apic_pending_intr_clear() warns about stale IRR
@ 2022-10-31 15:55 Jan Beulich
  2022-10-31 18:37 ` Andrew Cooper
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2022-10-31 15:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monné, Andrew Cooper

Hello,

quite likely this isn't new, but I've ended up noticing it only recently:
On an oldish system where I hand a HVM guest an SR-IOV NIC (not sure yet
whether that actually matters) all APs have that warning issued, with all
reported values zero except for the very first IRR one - that's 00080000.
Which is suspicious by itself, for naming vector 0x13, i.e. below 0x20
and hence within CPU exception range.

For one I wonder about their logic: The function is called after setting
TPR to 0x10, which prevents the handling of vectors below 0x20 (and in
particular their propagation from ISR to IRR, if my understanding of the
process is right and the convoluted and imo partly incomplete SDM
description hasn't confused me). Plus the function runs when IRQs are
still off, which is another reason why nothing would ever propagate from
IRR to ISR while the function performs it work. Yet a comment there says

	/*
	 * If the ISR map is not empty. ACK the APIC and run another round
	 * to verify whether a pending IRR has been unblocked and turned
	 * into a ISR.
	 */

suggesting IRR bits could "promote" to ISR ones. And this, to me, is the
only justification for warning about leftover IRR bits (whereas I
certainly agree that the logic should result in all clear ISR bits, and
hence warning when one is still set is appropriate).

And then I got puzzled by our logic: vlapic_get_ppr() is called only by
vlapic_set_ppr(), vlapic_lowest_prio(), and vlapic_read_aligned(). Yet
in particular not by vlapic_has_pending_irq(). While it looks like we
don't really ignore TPR during delivery, this appears to be a strange
split approach: hvm_interrupt_blocked() checks TPR, whereas
vlapic_has_pending_irq() checks ISR. I wonder if subtle issues can't
result from that ...

Of course I'm yet to figure out how IRR bit 0x13 ends up being set in
the first place.

Any correction to my understanding as well as any useful insight would
be appreciated.

Jan


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: x86/HVM: Linux'es apic_pending_intr_clear() warns about stale IRR
  2022-10-31 15:55 x86/HVM: Linux'es apic_pending_intr_clear() warns about stale IRR Jan Beulich
@ 2022-10-31 18:37 ` Andrew Cooper
  2022-11-03  8:48   ` Jan Beulich
  2022-11-03 13:48   ` Jan Beulich
  0 siblings, 2 replies; 4+ messages in thread
From: Andrew Cooper @ 2022-10-31 18:37 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Roger Pau Monne

On 31/10/2022 15:55, Jan Beulich wrote:
> Hello,
>
> quite likely this isn't new, but I've ended up noticing it only recently:
> On an oldish system where I hand a HVM guest an SR-IOV NIC (not sure yet
> whether that actually matters) all APs have that warning issued, with all
> reported values zero except for the very first IRR one - that's 00080000.
> Which is suspicious by itself, for naming vector 0x13, i.e. below 0x20
> and hence within CPU exception range.

To be clear, these are the VM's APs ?

> For one I wonder about their logic: The function is called after setting
> TPR to 0x10, which prevents the handling of vectors below 0x20 (and in
> particular their propagation from ISR to IRR, if my understanding of the
> process is right and the convoluted and imo partly incomplete SDM
> description hasn't confused me). Plus the function runs when IRQs are
> still off, which is another reason why nothing would ever propagate from
> IRR to ISR while the function performs it work. Yet a comment there says
>
> 	/*
> 	 * If the ISR map is not empty. ACK the APIC and run another round
> 	 * to verify whether a pending IRR has been unblocked and turned
> 	 * into a ISR.
> 	 */
>
> suggesting IRR bits could "promote" to ISR ones. And this, to me, is the
> only justification for warning about leftover IRR bits (whereas I
> certainly agree that the logic should result in all clear ISR bits, and
> hence warning when one is still set is appropriate).

Both the SDM and APM are fairly clear that IRR only moves to ISR when
the core accepts the interrupt.

So I agree that nothing in IRR will actually move to ISR as described by
the comment.

> And then I got puzzled by our logic: vlapic_get_ppr() is called only by
> vlapic_set_ppr(), vlapic_lowest_prio(), and vlapic_read_aligned(). Yet
> in particular not by vlapic_has_pending_irq(). While it looks like we
> don't really ignore TPR during delivery, this appears to be a strange
> split approach: hvm_interrupt_blocked() checks TPR, whereas
> vlapic_has_pending_irq() checks ISR. I wonder if subtle issues can't
> result from that ...

This is precisely why want the fine grain settings for APIC
acceleration.  I know for certain there's at least one bug here, because
it still causes windows to explode on migrate.

> Of course I'm yet to figure out how IRR bit 0x13 ends up being set in
> the first place.

0x13 is a legal vector for incoming interrupts (for reasons of Windows
using 0x1f for self-IPI.)

Exception wise, it's #XF, which isn't very common.  Xen could in
principle have had an event delivery type error and tried to deliver an
exception as an IRQ, and I don't think any of the safety assertions in
hvm_inject_event() would have triggered in this case.

I don't expect Linux will have deliberately IPI'd itself with that
vector, but I suppose it's not impossible if it constructed an IPI from
a badly initialised variable.  Alternatively, that vector is in the
PIC's default range, so we could have an emulation issue there.


If I were you, I'd ensure the VM has 4 or fewer vCPUs, and set up %dr
pointing at ISR[31:0] in the regs page.  That will catch whatever
happens to be writing into ISR, and the backtrace will probably be
interesting.  Just make sure you've disabled interrupt posting first,
because that is the one source that will bypass the debugregs.

~Andrew

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: x86/HVM: Linux'es apic_pending_intr_clear() warns about stale IRR
  2022-10-31 18:37 ` Andrew Cooper
@ 2022-11-03  8:48   ` Jan Beulich
  2022-11-03 13:48   ` Jan Beulich
  1 sibling, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2022-11-03  8:48 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monne, xen-devel

On 31.10.2022 19:37, Andrew Cooper wrote:
> On 31/10/2022 15:55, Jan Beulich wrote:
>> Hello,
>>
>> quite likely this isn't new, but I've ended up noticing it only recently:
>> On an oldish system where I hand a HVM guest an SR-IOV NIC (not sure yet
>> whether that actually matters) all APs have that warning issued, with all
>> reported values zero except for the very first IRR one - that's 00080000.
>> Which is suspicious by itself, for naming vector 0x13, i.e. below 0x20
>> and hence within CPU exception range.
> 
> To be clear, these are the VM's APs ?

Yes. I'm now also pretty sure this is a Linux side issue, as I've verified
it to be new in 6.0. Debugging is complicated some by the host not being
very reliable anymore - the SR-IOV card has been producing floods of
corrected errors for a long time, but recently apparently uncorrected
errors have been occurring every now and then, resulting in a host reset.
But I'll keep trying to see if I can spot where this behavior is coming
from.

Jan


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: x86/HVM: Linux'es apic_pending_intr_clear() warns about stale IRR
  2022-10-31 18:37 ` Andrew Cooper
  2022-11-03  8:48   ` Jan Beulich
@ 2022-11-03 13:48   ` Jan Beulich
  1 sibling, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2022-11-03 13:48 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monne, xen-devel

On 31.10.2022 19:37, Andrew Cooper wrote:
> On 31/10/2022 15:55, Jan Beulich wrote:
>> Of course I'm yet to figure out how IRR bit 0x13 ends up being set in
>> the first place.
> 
> 0x13 is a legal vector for incoming interrupts (for reasons of Windows
> using 0x1f for self-IPI.)

I was wrong here, mislead by the way Linux logs the IRR and ISR: About
any other separator than comma (and perhaps a blank) would likely have
made me check the order of elements. With commas it was pretty natural
to assume the logged items are in array element order. Turns out they
are not, so the bit in question corresponds to vector 0xf3. Which then
pretty cleanly connects to a new feature in 6.0 - see
https://lists.xen.org/archives/html/xen-devel/2022-11/msg00199.html.

Jan


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-11-03 13:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-31 15:55 x86/HVM: Linux'es apic_pending_intr_clear() warns about stale IRR Jan Beulich
2022-10-31 18:37 ` Andrew Cooper
2022-11-03  8:48   ` Jan Beulich
2022-11-03 13:48   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.