* kvm-unit-tests: Question about the "no interrupt when timer is disabled" case @ 2020-07-23 8:56 Zenghui Yu 2020-07-24 11:08 ` Alexandru Elisei 0 siblings, 1 reply; 4+ messages in thread From: Zenghui Yu @ 2020-07-23 8:56 UTC (permalink / raw) To: Alexandru Elisei; +Cc: Marc Zyngier, kvmarm Hi Alexandru, I've noticed that the timer case will fail in the -stable 4.19 kernel. The log is as follows: FAIL: vtimer-busy-loop: no interrupt when timer is disabled FAIL: vtimer-busy-loop: interrupt signal no longer pending And it's because the related fix [16e604a437c8, "KVM: arm/arm64: vgic: Reevaluate level sensitive interrupts on enable"] hasn't been backported to the stable tree. Just out of curiosity, _without_ this fix, had you actually seen the guest getting into trouble due to an un-retired level-sensitive interrupt and your patch fixed it? Or this was found by code inspection? Take the exact vtimer case as an example, is it possible that the Linux guest would disable the vtimer (the input interrupt line is driven to 0 but the old KVM doesn't take this into account) and potentially hit this issue? I'm not familiar with it. One of our internal tree is based on the stable 4.19 and I'm sure this fix is not included. But I havn't received any bad reports from our users yet. But if there's any potential problem without this fix, it'd good to get it properly backported. I can help to send a backport request if so. Thanks, Zenghui _______________________________________________ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: kvm-unit-tests: Question about the "no interrupt when timer is disabled" case 2020-07-23 8:56 kvm-unit-tests: Question about the "no interrupt when timer is disabled" case Zenghui Yu @ 2020-07-24 11:08 ` Alexandru Elisei 2020-07-25 4:50 ` Zenghui Yu 0 siblings, 1 reply; 4+ messages in thread From: Alexandru Elisei @ 2020-07-24 11:08 UTC (permalink / raw) To: Zenghui Yu; +Cc: Marc Zyngier, kvmarm Hi Zenghui, I don't believe this issue can be triggered by a Linux guest. Details below. On 7/23/20 9:56 AM, Zenghui Yu wrote: > Hi Alexandru, > > I've noticed that the timer case will fail in the -stable 4.19 kernel. > The log is as follows: > > FAIL: vtimer-busy-loop: no interrupt when timer is disabled > FAIL: vtimer-busy-loop: interrupt signal no longer pending > > And it's because the related fix [16e604a437c8, "KVM: arm/arm64: vgic: > Reevaluate level sensitive interrupts on enable"] hasn't been backported > to the stable tree. This is not an actual fix (hence no "Fixes" tag), this is more like an improvement of the behaviour of the GIC. Like the patch description says, this can happen even on hardware if the GIC hasn't sampled the device interrupt state (or the device itself hasn't updated it) before the CPU re-enables the interrupt. > > Just out of curiosity, _without_ this fix, had you actually seen the > guest getting into trouble due to an un-retired level-sensitive > interrupt and your patch fixed it? Or this was found by code inspection? This issue was found when running kvm-unit-tests on the model. > > Take the exact vtimer case as an example, is it possible that the Linux > guest would disable the vtimer (the input interrupt line is driven to 0 > but the old KVM doesn't take this into account) and potentially hit this > issue? I'm not familiar with it. To trigger this, a guest has to do the following steps: 1. Disable the timer interrupt at the Redistributor level. 2. Trigger the timer interrupt in the timer. 3. Disable the timer entirely (CNT{P,V}_CTL_EL0.ENABLE = 0), which also disables the timer interrupt. 4. Enable the timer interrupt at the Redistributor level. I believe there are two reasons why this will never happen for a Linux guest: - This isn't the way Linux handles interrupts. Furthermore, I don't believe Linux will ever disable a specific interrupt at the irqchip level. - The timer IRQ handler checks the ISTATUS flag in the timer control register before handling the interrupt. The flag is unset if the timer is disabled. I hope my explanation made sense, please chime in if I missed something or you want more details. Thanks, Alex > > One of our internal tree is based on the stable 4.19 and I'm sure this > fix is not included. But I havn't received any bad reports from our > users yet. But if there's any potential problem without this fix, it'd > good to get it properly backported. I can help to send a backport > request if so. > > > Thanks, > Zenghui _______________________________________________ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: kvm-unit-tests: Question about the "no interrupt when timer is disabled" case 2020-07-24 11:08 ` Alexandru Elisei @ 2020-07-25 4:50 ` Zenghui Yu 2020-07-27 11:27 ` Alexandru Elisei 0 siblings, 1 reply; 4+ messages in thread From: Zenghui Yu @ 2020-07-25 4:50 UTC (permalink / raw) To: Alexandru Elisei; +Cc: Marc Zyngier, kvmarm Hi Alex, [+cc Nianyao] On 2020/7/24 19:08, Alexandru Elisei wrote: > Hi Zenghui, > > I don't believe this issue can be triggered by a Linux guest. Details below. > > On 7/23/20 9:56 AM, Zenghui Yu wrote: >> Hi Alexandru, >> >> I've noticed that the timer case will fail in the -stable 4.19 kernel. >> The log is as follows: >> >> FAIL: vtimer-busy-loop: no interrupt when timer is disabled >> FAIL: vtimer-busy-loop: interrupt signal no longer pending >> >> And it's because the related fix [16e604a437c8, "KVM: arm/arm64: vgic: >> Reevaluate level sensitive interrupts on enable"] hasn't been backported >> to the stable tree. > > This is not an actual fix (hence no "Fixes" tag), this is more like an improvement > of the behaviour of the GIC. Like the patch description says, this can happen even > on hardware if the GIC hasn't sampled the device interrupt state (or the device > itself hasn't updated it) before the CPU re-enables the interrupt. Fair enough. >> >> Just out of curiosity, _without_ this fix, had you actually seen the >> guest getting into trouble due to an un-retired level-sensitive >> interrupt and your patch fixed it? Or this was found by code inspection? > > This issue was found when running kvm-unit-tests on the model. > >> >> Take the exact vtimer case as an example, is it possible that the Linux >> guest would disable the vtimer (the input interrupt line is driven to 0 >> but the old KVM doesn't take this into account) and potentially hit this >> issue? I'm not familiar with it. > > To trigger this, a guest has to do the following steps: > > 1. Disable the timer interrupt at the Redistributor level. > 2. Trigger the timer interrupt in the timer. > 3. Disable the timer entirely (CNT{P,V}_CTL_EL0.ENABLE = 0), which also disables > the timer interrupt. > 4. Enable the timer interrupt at the Redistributor level. > > I believe there are two reasons why this will never happen for a Linux guest: > > - This isn't the way Linux handles interrupts. Furthermore, I don't believe Linux > will ever disable a specific interrupt at the irqchip level. This can at least happen in arch_timer_stop() [arm_arch_timer.c], where the disable_percpu_irq() API will disable the interrupt (via irq_mask() callback which will in turn disable the interrupt at GIC level by programming the ICENABLER0). What I'm worried is something like: 1. Disable the timer interrupt (at RDist level by poking the ICENABLER0, or at CPU level by masking PSTATE.I) [ timer interrupt is made pending, and remains pending in (v)GIC. ] 2. Disable the timer 3. Enable the timer interrupt (at RDist level by poking the ISENABLER0, or at CPU level by unmasking PSTATE.I) [ The interrupt is forwarded to (v)CPU, and we end-up re-enabling the timer inside the timer IRQ handler, which may not be as expected. ] I'm just not sure if this will be a possible scenario in the Linux, and is it harmful if this would happen. > - The timer IRQ handler checks the ISTATUS flag in the timer control register > before handling the interrupt. The flag is unset if the timer is disabled. This actually doesn't match the spec. Arm ARM D13.8.25 has a description about the ISTATUS field as below, "When the value of the ENABLE bit is 0, the ISTATUS field is UNKNOWN." I guess what Nianyao had posted [*] may address the concern above... [*] http://lore.kernel.org/r/1595584037-6877-1-git-send-email-zhangshaokun@hisilicon.com/ > > I hope my explanation made sense, please chime in if I missed something or you > want more details. Thanks Alex, Zenghui _______________________________________________ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: kvm-unit-tests: Question about the "no interrupt when timer is disabled" case 2020-07-25 4:50 ` Zenghui Yu @ 2020-07-27 11:27 ` Alexandru Elisei 0 siblings, 0 replies; 4+ messages in thread From: Alexandru Elisei @ 2020-07-27 11:27 UTC (permalink / raw) To: Zenghui Yu; +Cc: Marc Zyngier, kvmarm Hi Zenghui, On 7/25/20 5:50 AM, Zenghui Yu wrote: > Hi Alex, > > [+cc Nianyao] > > On 2020/7/24 19:08, Alexandru Elisei wrote: >> Hi Zenghui, >> >> I don't believe this issue can be triggered by a Linux guest. Details below. >> >> On 7/23/20 9:56 AM, Zenghui Yu wrote: >>> Hi Alexandru, >>> >>> I've noticed that the timer case will fail in the -stable 4.19 kernel. >>> The log is as follows: >>> >>> FAIL: vtimer-busy-loop: no interrupt when timer is disabled >>> FAIL: vtimer-busy-loop: interrupt signal no longer pending >>> >>> And it's because the related fix [16e604a437c8, "KVM: arm/arm64: vgic: >>> Reevaluate level sensitive interrupts on enable"] hasn't been backported >>> to the stable tree. >> >> This is not an actual fix (hence no "Fixes" tag), this is more like an improvement >> of the behaviour of the GIC. Like the patch description says, this can happen even >> on hardware if the GIC hasn't sampled the device interrupt state (or the device >> itself hasn't updated it) before the CPU re-enables the interrupt. > > Fair enough. > >>> >>> Just out of curiosity, _without_ this fix, had you actually seen the >>> guest getting into trouble due to an un-retired level-sensitive >>> interrupt and your patch fixed it? Or this was found by code inspection? >> >> This issue was found when running kvm-unit-tests on the model. >> >>> >>> Take the exact vtimer case as an example, is it possible that the Linux >>> guest would disable the vtimer (the input interrupt line is driven to 0 >>> but the old KVM doesn't take this into account) and potentially hit this >>> issue? I'm not familiar with it. >> >> To trigger this, a guest has to do the following steps: >> >> 1. Disable the timer interrupt at the Redistributor level. >> 2. Trigger the timer interrupt in the timer. >> 3. Disable the timer entirely (CNT{P,V}_CTL_EL0.ENABLE = 0), which also disables >> the timer interrupt. >> 4. Enable the timer interrupt at the Redistributor level. >> >> I believe there are two reasons why this will never happen for a Linux guest: >> >> - This isn't the way Linux handles interrupts. Furthermore, I don't believe Linux >> will ever disable a specific interrupt at the irqchip level. > > This can at least happen in arch_timer_stop() [arm_arch_timer.c], where > the disable_percpu_irq() API will disable the interrupt (via irq_mask() > callback which will in turn disable the interrupt at GIC level by > programming the ICENABLER0). Sorry, I missed that. Did a grep for arch_timer_stop(), the function is called only when a CPU is offlined. > > What I'm worried is something like: > > 1. Disable the timer interrupt (at RDist level by poking the ICENABLER0, > or at CPU level by masking PSTATE.I) A CPU masking interrupts with PSTATE.I will not trigger the spurious interrupt. KVM doesn't care if the guest is masking interrupts when it fills the LR registers, and the interrupt state is updated for level-sensitive hardware interrupts that are in the LR registers on a guest exit. As long as KVM keeps the interrupt in a LR register, the interrupt state will be updated at guest exit (writing to I{C,S}ENABLER0 triggers one). > > [ timer interrupt is made pending, and remains pending in (v)GIC. ] > > 2. Disable the timer > 3. Enable the timer interrupt (at RDist level by poking the ISENABLER0, > or at CPU level by unmasking PSTATE.I) > > [ The interrupt is forwarded to (v)CPU, and we end-up re-enabling the > timer inside the timer IRQ handler, which may not be as expected. ] > > I'm just not sure if this will be a possible scenario in the Linux, and > is it harmful if this would happen. I'm not familiar with hotplug, but I assume that a CPU is offlined with a PSCI_CPU_OFF call. KVM reset the vcpu when it's brought online again, and that means updating the timer state in kvm_timer_vcpu_load() -> kvm_timer_vcpu_load_gic(). I've been thinking about it, and getting spurious timer interrupts can happen in another, more common case, when the GIC retires the physical interrupt between guest exit (when the interrupt state is sampled by KVM) and guest entry. There's nothing that KVM can do in this case. > >> - The timer IRQ handler checks the ISTATUS flag in the timer control register >> before handling the interrupt. The flag is unset if the timer is disabled. > > This actually doesn't match the spec. Arm ARM D13.8.25 has a description > about the ISTATUS field as below, > > "When the value of the ENABLE bit is 0, the ISTATUS field is UNKNOWN." My bad, when I looked at the Arm ARM I missed that part. > > I guess what Nianyao had posted [*] may address the concern above... > > [*] > http://lore.kernel.org/r/1595584037-6877-1-git-send-email-zhangshaokun@hisilicon.com/ I've read Marc's explanation, and I tend to agree. Spurious interrupts can happen on real hardware too, and there's really nothing that KVM can do about it. Thanks, Alex > >> >> I hope my explanation made sense, please chime in if I missed something or you >> want more details. > > Thanks Alex, > Zenghui _______________________________________________ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-07-27 11:27 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-07-23 8:56 kvm-unit-tests: Question about the "no interrupt when timer is disabled" case Zenghui Yu 2020-07-24 11:08 ` Alexandru Elisei 2020-07-25 4:50 ` Zenghui Yu 2020-07-27 11:27 ` Alexandru Elisei
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).