kvmarm.lists.cs.columbia.edu archive mirror
 help / color / mirror / Atom feed
* kvm-unit-tests: Question about the "no interrupt when timer is disabled" case
@ 2020-07-23  8:56 Zenghui Yu
  2020-07-24 11:08 ` Alexandru Elisei
  0 siblings, 1 reply; 4+ messages in thread
From: Zenghui Yu @ 2020-07-23  8:56 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: Marc Zyngier, kvmarm

Hi Alexandru,

I've noticed that the timer case will fail in the -stable 4.19 kernel.
The log is as follows:

FAIL: vtimer-busy-loop: no interrupt when timer is disabled
FAIL: vtimer-busy-loop: interrupt signal no longer pending

And it's because the related fix [16e604a437c8, "KVM: arm/arm64: vgic:
Reevaluate level sensitive interrupts on enable"] hasn't been backported
to the stable tree.

Just out of curiosity, _without_ this fix, had you actually seen the
guest getting into trouble due to an un-retired level-sensitive
interrupt and your patch fixed it? Or this was found by code inspection?

Take the exact vtimer case as an example, is it possible that the Linux
guest would disable the vtimer (the input interrupt line is driven to 0
but the old KVM doesn't take this into account) and potentially hit this
issue? I'm not familiar with it.

One of our internal tree is based on the stable 4.19 and I'm sure this
fix is not included. But I havn't received any bad reports from our
users yet. But if there's any potential problem without this fix, it'd
good to get it properly backported. I can help to send a backport
request if so.


Thanks,
Zenghui
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: kvm-unit-tests: Question about the "no interrupt when timer is disabled" case
  2020-07-23  8:56 kvm-unit-tests: Question about the "no interrupt when timer is disabled" case Zenghui Yu
@ 2020-07-24 11:08 ` Alexandru Elisei
  2020-07-25  4:50   ` Zenghui Yu
  0 siblings, 1 reply; 4+ messages in thread
From: Alexandru Elisei @ 2020-07-24 11:08 UTC (permalink / raw)
  To: Zenghui Yu; +Cc: Marc Zyngier, kvmarm

Hi Zenghui,

I don't believe this issue can be triggered by a Linux guest. Details below.

On 7/23/20 9:56 AM, Zenghui Yu wrote:
> Hi Alexandru,
>
> I've noticed that the timer case will fail in the -stable 4.19 kernel.
> The log is as follows:
>
> FAIL: vtimer-busy-loop: no interrupt when timer is disabled
> FAIL: vtimer-busy-loop: interrupt signal no longer pending
>
> And it's because the related fix [16e604a437c8, "KVM: arm/arm64: vgic:
> Reevaluate level sensitive interrupts on enable"] hasn't been backported
> to the stable tree.

This is not an actual fix (hence no "Fixes" tag), this is more like an improvement
of the behaviour of the GIC. Like the patch description says, this can happen even
on hardware if the GIC hasn't sampled the device interrupt state (or the device
itself hasn't updated it) before the CPU re-enables the interrupt.

>
> Just out of curiosity, _without_ this fix, had you actually seen the
> guest getting into trouble due to an un-retired level-sensitive
> interrupt and your patch fixed it? Or this was found by code inspection?

This issue was found when running kvm-unit-tests on the model.

>
> Take the exact vtimer case as an example, is it possible that the Linux
> guest would disable the vtimer (the input interrupt line is driven to 0
> but the old KVM doesn't take this into account) and potentially hit this
> issue? I'm not familiar with it.

To trigger this, a guest has to do the following steps:

1. Disable the timer interrupt at the Redistributor level.
2. Trigger the timer interrupt in the timer.
3. Disable the timer entirely (CNT{P,V}_CTL_EL0.ENABLE = 0), which also disables
the timer interrupt.
4. Enable the timer interrupt at the Redistributor level.

I believe there are two reasons why this will never happen for a Linux guest:

- This isn't the way Linux handles interrupts. Furthermore, I don't believe Linux
will ever disable a specific interrupt at the irqchip level.
- The timer IRQ handler checks the ISTATUS flag in the timer control register
before handling the interrupt. The flag is unset if the timer is disabled.

I hope my explanation made sense, please chime in if I missed something or you
want more details.

Thanks,
Alex
>
> One of our internal tree is based on the stable 4.19 and I'm sure this
> fix is not included. But I havn't received any bad reports from our
> users yet. But if there's any potential problem without this fix, it'd
> good to get it properly backported. I can help to send a backport
> request if so.
>
>
> Thanks,
> Zenghui
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: kvm-unit-tests: Question about the "no interrupt when timer is disabled" case
  2020-07-24 11:08 ` Alexandru Elisei
@ 2020-07-25  4:50   ` Zenghui Yu
  2020-07-27 11:27     ` Alexandru Elisei
  0 siblings, 1 reply; 4+ messages in thread
From: Zenghui Yu @ 2020-07-25  4:50 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: Marc Zyngier, kvmarm

Hi Alex,

[+cc Nianyao]

On 2020/7/24 19:08, Alexandru Elisei wrote:
> Hi Zenghui,
> 
> I don't believe this issue can be triggered by a Linux guest. Details below.
> 
> On 7/23/20 9:56 AM, Zenghui Yu wrote:
>> Hi Alexandru,
>>
>> I've noticed that the timer case will fail in the -stable 4.19 kernel.
>> The log is as follows:
>>
>> FAIL: vtimer-busy-loop: no interrupt when timer is disabled
>> FAIL: vtimer-busy-loop: interrupt signal no longer pending
>>
>> And it's because the related fix [16e604a437c8, "KVM: arm/arm64: vgic:
>> Reevaluate level sensitive interrupts on enable"] hasn't been backported
>> to the stable tree.
> 
> This is not an actual fix (hence no "Fixes" tag), this is more like an improvement
> of the behaviour of the GIC. Like the patch description says, this can happen even
> on hardware if the GIC hasn't sampled the device interrupt state (or the device
> itself hasn't updated it) before the CPU re-enables the interrupt.

Fair enough.

>>
>> Just out of curiosity, _without_ this fix, had you actually seen the
>> guest getting into trouble due to an un-retired level-sensitive
>> interrupt and your patch fixed it? Or this was found by code inspection?
> 
> This issue was found when running kvm-unit-tests on the model.
> 
>>
>> Take the exact vtimer case as an example, is it possible that the Linux
>> guest would disable the vtimer (the input interrupt line is driven to 0
>> but the old KVM doesn't take this into account) and potentially hit this
>> issue? I'm not familiar with it.
> 
> To trigger this, a guest has to do the following steps:
> 
> 1. Disable the timer interrupt at the Redistributor level.
> 2. Trigger the timer interrupt in the timer.
> 3. Disable the timer entirely (CNT{P,V}_CTL_EL0.ENABLE = 0), which also disables
> the timer interrupt.
> 4. Enable the timer interrupt at the Redistributor level.
> 
> I believe there are two reasons why this will never happen for a Linux guest:
> 
> - This isn't the way Linux handles interrupts. Furthermore, I don't believe Linux
> will ever disable a specific interrupt at the irqchip level.

This can at least happen in arch_timer_stop() [arm_arch_timer.c], where
the disable_percpu_irq() API will disable the interrupt (via irq_mask()
callback which will in turn disable the interrupt at GIC level by
programming the ICENABLER0).

What I'm worried is something like:

1. Disable the timer interrupt (at RDist level by poking the ICENABLER0,
    or at CPU level by masking PSTATE.I)

   [ timer interrupt is made pending, and remains pending in (v)GIC. ]

2. Disable the timer
3. Enable the timer interrupt (at RDist level by poking the ISENABLER0,
    or at CPU level by unmasking PSTATE.I)

   [ The interrupt is forwarded to (v)CPU, and we end-up re-enabling the
     timer inside the timer IRQ handler, which may not be as expected. ]

I'm just not sure if this will be a possible scenario in the Linux, and
is it harmful if this would happen.

> - The timer IRQ handler checks the ISTATUS flag in the timer control register
> before handling the interrupt. The flag is unset if the timer is disabled.

This actually doesn't match the spec. Arm ARM D13.8.25 has a description
about the ISTATUS field as below,

"When the value of the ENABLE bit is 0, the ISTATUS field is UNKNOWN."

I guess what Nianyao had posted [*] may address the concern above...

[*] 
http://lore.kernel.org/r/1595584037-6877-1-git-send-email-zhangshaokun@hisilicon.com/

> 
> I hope my explanation made sense, please chime in if I missed something or you
> want more details.

Thanks Alex,
Zenghui
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: kvm-unit-tests: Question about the "no interrupt when timer is disabled" case
  2020-07-25  4:50   ` Zenghui Yu
@ 2020-07-27 11:27     ` Alexandru Elisei
  0 siblings, 0 replies; 4+ messages in thread
From: Alexandru Elisei @ 2020-07-27 11:27 UTC (permalink / raw)
  To: Zenghui Yu; +Cc: Marc Zyngier, kvmarm

Hi Zenghui,

On 7/25/20 5:50 AM, Zenghui Yu wrote:
> Hi Alex,
>
> [+cc Nianyao]
>
> On 2020/7/24 19:08, Alexandru Elisei wrote:
>> Hi Zenghui,
>>
>> I don't believe this issue can be triggered by a Linux guest. Details below.
>>
>> On 7/23/20 9:56 AM, Zenghui Yu wrote:
>>> Hi Alexandru,
>>>
>>> I've noticed that the timer case will fail in the -stable 4.19 kernel.
>>> The log is as follows:
>>>
>>> FAIL: vtimer-busy-loop: no interrupt when timer is disabled
>>> FAIL: vtimer-busy-loop: interrupt signal no longer pending
>>>
>>> And it's because the related fix [16e604a437c8, "KVM: arm/arm64: vgic:
>>> Reevaluate level sensitive interrupts on enable"] hasn't been backported
>>> to the stable tree.
>>
>> This is not an actual fix (hence no "Fixes" tag), this is more like an improvement
>> of the behaviour of the GIC. Like the patch description says, this can happen even
>> on hardware if the GIC hasn't sampled the device interrupt state (or the device
>> itself hasn't updated it) before the CPU re-enables the interrupt.
>
> Fair enough.
>
>>>
>>> Just out of curiosity, _without_ this fix, had you actually seen the
>>> guest getting into trouble due to an un-retired level-sensitive
>>> interrupt and your patch fixed it? Or this was found by code inspection?
>>
>> This issue was found when running kvm-unit-tests on the model.
>>
>>>
>>> Take the exact vtimer case as an example, is it possible that the Linux
>>> guest would disable the vtimer (the input interrupt line is driven to 0
>>> but the old KVM doesn't take this into account) and potentially hit this
>>> issue? I'm not familiar with it.
>>
>> To trigger this, a guest has to do the following steps:
>>
>> 1. Disable the timer interrupt at the Redistributor level.
>> 2. Trigger the timer interrupt in the timer.
>> 3. Disable the timer entirely (CNT{P,V}_CTL_EL0.ENABLE = 0), which also disables
>> the timer interrupt.
>> 4. Enable the timer interrupt at the Redistributor level.
>>
>> I believe there are two reasons why this will never happen for a Linux guest:
>>
>> - This isn't the way Linux handles interrupts. Furthermore, I don't believe Linux
>> will ever disable a specific interrupt at the irqchip level.
>
> This can at least happen in arch_timer_stop() [arm_arch_timer.c], where
> the disable_percpu_irq() API will disable the interrupt (via irq_mask()
> callback which will in turn disable the interrupt at GIC level by
> programming the ICENABLER0).

Sorry, I missed that. Did a grep for arch_timer_stop(), the function is called
only when a CPU is offlined.

>
> What I'm worried is something like:
>
> 1. Disable the timer interrupt (at RDist level by poking the ICENABLER0,
>    or at CPU level by masking PSTATE.I)

A CPU masking interrupts with PSTATE.I will not trigger the spurious interrupt.
KVM doesn't care if the guest is masking interrupts when it fills the LR
registers, and the interrupt state is updated for level-sensitive hardware
interrupts that are in the LR registers on a guest exit. As long as KVM keeps the
interrupt in a LR register, the interrupt state will be updated at guest exit
(writing to I{C,S}ENABLER0 triggers one).

>
>   [ timer interrupt is made pending, and remains pending in (v)GIC. ]
>
> 2. Disable the timer
> 3. Enable the timer interrupt (at RDist level by poking the ISENABLER0,
>    or at CPU level by unmasking PSTATE.I)
>
>   [ The interrupt is forwarded to (v)CPU, and we end-up re-enabling the
>     timer inside the timer IRQ handler, which may not be as expected. ]
>
> I'm just not sure if this will be a possible scenario in the Linux, and
> is it harmful if this would happen.

I'm not familiar with hotplug, but I assume that a CPU is offlined with a
PSCI_CPU_OFF call. KVM reset the vcpu when it's brought online again, and that
means updating the timer state in kvm_timer_vcpu_load() -> kvm_timer_vcpu_load_gic().

I've been thinking about it, and getting spurious timer interrupts can happen in
another, more common case, when the GIC retires the physical interrupt between
guest exit (when the interrupt state is sampled by KVM) and guest entry. There's
nothing that KVM can do in this case.

>
>> - The timer IRQ handler checks the ISTATUS flag in the timer control register
>> before handling the interrupt. The flag is unset if the timer is disabled.
>
> This actually doesn't match the spec. Arm ARM D13.8.25 has a description
> about the ISTATUS field as below,
>
> "When the value of the ENABLE bit is 0, the ISTATUS field is UNKNOWN."

My bad, when I looked at the Arm ARM I missed that part.

>
> I guess what Nianyao had posted [*] may address the concern above...
>
> [*]
> http://lore.kernel.org/r/1595584037-6877-1-git-send-email-zhangshaokun@hisilicon.com/

I've read Marc's explanation, and I tend to agree. Spurious interrupts can happen
on real hardware too, and there's really nothing that KVM can do about it.

Thanks,

Alex

>
>>
>> I hope my explanation made sense, please chime in if I missed something or you
>> want more details.
>
> Thanks Alex,
> Zenghui
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-07-27 11:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-23  8:56 kvm-unit-tests: Question about the "no interrupt when timer is disabled" case Zenghui Yu
2020-07-24 11:08 ` Alexandru Elisei
2020-07-25  4:50   ` Zenghui Yu
2020-07-27 11:27     ` Alexandru Elisei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).