* [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling
From: Shunyong Yang @ 2018-03-08 7:01 UTC
To: christoffer.dall
Cc: marc.zyngier, ard.biesheuvel, will.deacon, eric.auger, david.daney, linux-arm-kernel, kvmarm, linux-kernel, Shunyong Yang, Joey Zheng

When resampling irqfds is enabled, a level interrupt should be
de-asserted when resampling happens. Page 4-47 of the GIC v3
specification (IHI0069D) says:

"When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
interface, the IRI changes the status of the interrupt to active
and pending if:
• It is an edge-triggered interrupt, and another edge has been
  detected since the interrupt was acknowledged.
• It is a level-sensitive interrupt, and the level has not been
  deasserted since the interrupt was acknowledged."

The GIC v2 specification (IHI0048B.b) has a similar description of
the state machine transitions on page 3-42.

When a VFIO device such as mtty (the 8250 VFIO mdev emulation driver
in samples/vfio-mdev) triggers a level interrupt, the state
transition in the LR is pending --> active --> active and pending.
The interrupt then waits for resampling to de-assert it.

The current lr_signals_eoi_mi() returns false if the LR state is not
invalid (inactive). As a result, resampling never happens in the
mtty case.

This causes the interrupt to fire continuously into the guest even
though the 8250 IIR reports no pending interrupt. When the 8250
interrupt is configured in shared mode, the interrupt is passed on to
other drivers to handle. However, no other driver is involved, so the
kernel complains that "nobody cared":
/ # cat /dev/ttyS0
[    4.826836] random: crng init done
[    6.373620] irq 41: nobody cared (try booting with the "irqpoll" option)
[    6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
[    6.378927] Hardware name: linux,dummy-virt (DT)
[    6.380876] Call trace:
[    6.381937]  dump_backtrace+0x0/0x180
[    6.383495]  show_stack+0x14/0x1c
[    6.384902]  dump_stack+0x90/0xb4
[    6.386312]  __report_bad_irq+0x38/0xe0
[    6.387944]  note_interrupt+0x1f4/0x2b8
[    6.389568]  handle_irq_event_percpu+0x54/0x7c
[    6.391433]  handle_irq_event+0x44/0x74
[    6.393056]  handle_fasteoi_irq+0x9c/0x154
[    6.394784]  generic_handle_irq+0x24/0x38
[    6.396483]  __handle_domain_irq+0x60/0xb4
[    6.398207]  gic_handle_irq+0x98/0x1b0
[    6.399796]  el1_irq+0xb0/0x128
[    6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
[    6.403149]  __setup_irq+0x41c/0x678
[    6.404669]  request_threaded_irq+0xe0/0x190
[    6.406474]  univ8250_setup_irq+0x208/0x234
[    6.408250]  serial8250_do_startup+0x1b4/0x754
[    6.410123]  serial8250_startup+0x20/0x28
[    6.411826]  uart_startup.part.21+0x78/0x144
[    6.413633]  uart_port_activate+0x50/0x68
[    6.415328]  tty_port_open+0x84/0xd4
[    6.416851]  uart_open+0x34/0x44
[    6.418229]  tty_open+0xec/0x3c8
[    6.419610]  chrdev_open+0xb0/0x198
[    6.421093]  do_dentry_open+0x200/0x310
[    6.422714]  vfs_open+0x54/0x84
[    6.424054]  path_openat+0x2dc/0xf04
[    6.425569]  do_filp_open+0x68/0xd8
[    6.427044]  do_sys_open+0x16c/0x224
[    6.428563]  SyS_openat+0x10/0x18
[    6.429972]  el0_svc_naked+0x30/0x34
[    6.431494] handlers:
[    6.432479] [<000000000e9fb4bb>] serial8250_interrupt
[    6.434597] Disabling IRQ #41

This patch changes the LR state condition in lr_signals_eoi_mi() from
invalid (inactive) to active and pending to avoid this.

I am not sure about the rationale behind the original invalid
(inactive) condition, so this RFC is sent out for comments.
Cc: Joey Zheng <yu.zheng@hxt-semitech.com>
Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
---
 virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
 virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index e9d840a75e7b..740ee9a5f551 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
 
 static bool lr_signals_eoi_mi(u32 lr_val)
 {
-	return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
-	       !(lr_val & GICH_LR_HW);
+	return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
+	       (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
 }
 
 /*
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 6b329414e57a..43111bba7af9 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
 
 static bool lr_signals_eoi_mi(u64 lr_val)
 {
-	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
-	       !(lr_val & ICH_LR_HW);
+	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
+	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
 }
 
 void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
-- 
1.8.3.1
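The behavioural difference of the vgic-v3 hunk can be checked outside the kernel. The sketch below is a standalone model, not kernel code: it copies the old and the proposed lr_signals_eoi_mi() expressions and lets you evaluate them for each of the four LR states. The ICH_LR_* bit positions (state in bits [63:62], HW in bit 61, EOI in bit 41) are my assumption of what include/linux/irqchip/arm-gic-v3.h defines, and old_signals_eoi_mi(), new_signals_eoi_mi() and make_lr() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the ICH_LR_* masks in include/linux/irqchip/arm-gic-v3.h. */
#define ICH_LR_EOI         (1ULL << 41)
#define ICH_LR_HW          (1ULL << 61)
#define ICH_LR_PENDING_BIT (1ULL << 62)
#define ICH_LR_ACTIVE_BIT  (1ULL << 63)
#define ICH_LR_STATE       (ICH_LR_PENDING_BIT | ICH_LR_ACTIVE_BIT)

/* The state field is expected to be the two top bits, [63:62]. */
static_assert(ICH_LR_STATE == (3ULL << 62), "LR state field is bits [63:62]");

/* Existing check: request EOI maintenance only once the LR is inactive. */
static bool old_signals_eoi_mi(uint64_t lr_val)
{
	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
	       !(lr_val & ICH_LR_HW);
}

/*
 * Proposed check: the XOR is zero only when both state bits are set,
 * i.e. when the LR is pending + active.
 */
static bool new_signals_eoi_mi(uint64_t lr_val)
{
	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
}

/*
 * Hypothetical helper: a software (non-HW) LR for vINTID 36 with EOI
 * maintenance requested and the given state bits.
 */
static uint64_t make_lr(uint64_t state_bits)
{
	return state_bits | ICH_LR_EOI | 36;
}
```

Evaluating both checks on make_lr(0), make_lr(ICH_LR_PENDING_BIT), make_lr(ICH_LR_ACTIVE_BIT) and make_lr(ICH_LR_STATE) gives true only for the inactive LR under the old check, and true only for the pending + active LR under the new one. So the patch does not widen the condition; it moves it from one state to another, which is worth keeping in mind when reviewing it.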
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling
From: Auger Eric @ 2018-03-08 8:57 UTC
To: Shunyong Yang, christoffer.dall
Cc: david.daney, marc.zyngier, ard.biesheuvel, will.deacon, linux-kernel, Joey Zheng, kvmarm, linux-arm-kernel

Hi,

On 08/03/18 08:01, Shunyong Yang wrote:
> When resampling irqfds is enabled, level interrupt should be
> de-asserted when resampling happens.
> [...]
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 6b329414e57a..43111bba7af9 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
>  
>  static bool lr_signals_eoi_mi(u64 lr_val)
>  {
> -	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
> -	       !(lr_val & ICH_LR_HW);
> +	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
> +	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);

In general, don't we have this state transition:

inactive -> pending -> pending + active (1) -> active -> inactive

In that case, won't we lower the virtual IRQ level when folding the LR
in the pending + active state, which is not what we want?

Thanks

Eric

>  }
>  
>  void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
> 
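Eric's concern can be illustrated by walking a sequence of LR snapshots and reporting the first snapshot at which each check would trigger the resample notification (the kvm_notify_acked_irq() call in vgic_v3_fold_lr_state()). This is a standalone sketch, not kernel code. The ICH_LR_* bit positions are assumed to match include/linux/irqchip/arm-gic-v3.h, first_notify(), the LR() macro and eric_seq are hypothetical, and it simplifies by assuming the fold runs on every snapshot, which is not how the real fold path is driven.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed ICH_LR_* layout (see include/linux/irqchip/arm-gic-v3.h). */
#define ICH_LR_EOI   (1ULL << 41)
#define ICH_LR_HW    (1ULL << 61)
#define ICH_LR_P     (1ULL << 62) /* pending */
#define ICH_LR_A     (1ULL << 63) /* active */
#define ICH_LR_STATE (ICH_LR_P | ICH_LR_A)

static_assert(ICH_LR_STATE == (3ULL << 62), "LR state field is bits [63:62]");

typedef bool (*eoi_check_fn)(uint64_t lr_val);

/* Existing condition: fire once the LR has gone back to inactive. */
static bool old_check(uint64_t lr_val)
{
	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
	       !(lr_val & ICH_LR_HW);
}

/* Proposed condition: fire when the LR is pending + active. */
static bool new_check(uint64_t lr_val)
{
	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
}

/* Index of the first snapshot for which check() fires, or -1 if none. */
static int first_notify(const uint64_t *lrs, size_t n, eoi_check_fn check)
{
	for (size_t i = 0; i < n; i++)
		if (check(lrs[i]))
			return (int)i;
	return -1;
}

/* Hypothetical LR snapshot for vINTID 36 with EOI maintenance requested. */
#define LR(state) ((state) | ICH_LR_EOI | 36ULL)

/*
 * Eric's sequence; slot 0 is an empty LR (value 0) before injection:
 * inactive -> pending -> pending + active -> active -> inactive
 */
static const uint64_t eric_seq[] = {
	0, LR(ICH_LR_P), LR(ICH_LR_STATE), LR(ICH_LR_A), LR(0),
};
```

With this sequence, first_notify(eric_seq, 5, old_check) returns 4 (the final inactive snapshot, after the guest has deactivated the interrupt), whereas first_notify(eric_seq, 5, new_check) returns 2: the notification, and therefore the de-assertion of the line, would happen while the interrupt is still active in the guest. A sequence that never reaches pending + active would never be notified under the new check.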
* Re: [This email may be risky] Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling
From: Yang, Shunyong @ 2018-03-08 9:31 UTC
To: eric.auger, cdall
Cc: linux-kernel, ard.biesheuvel, kvmarm, Zheng, Joey, will.deacon, linux-arm-kernel, david.daney, marc.zyngier

Hi, Eric,

First, let me update Christoffer's email address to cdall@kernel.org.

I have added more information about my test below; please check.

On Thu, 2018-03-08 at 09:57 +0100, Auger Eric wrote:
> Hi,
> 
> On 08/03/18 08:01, Shunyong Yang wrote:
> > [...]
> >  static bool lr_signals_eoi_mi(u64 lr_val)
> >  {
> > -	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
> > -	       !(lr_val & ICH_LR_HW);
> > +	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
> > +	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
> 
> In general, don't we have this state transition:
> 
> inactive -> pending -> pending + active (1) -> active -> inactive
> 
> In that case, won't we lower the virtual IRQ level when folding the LR
> in the pending + active state, which is not what we want?
> 
> Thanks
> 
> Eric

With the current code, I printed the LR value of the mtty IRQ 41
(hwirq = 36) in vgic_v3_fold_lr_state() during my test.
The LR transition starts as follows:

0 --> 50a0020000000024 --> 90a0020000000024 --> d0a0020000000024

That is, inactive --> pending --> active --> pending + active. After
that, it keeps cycling through pending --> active --> pending + active.

The level interrupt should be de-asserted by the following code:

	/* Notify fds when the guest EOI'ed a level-triggered IRQ */
	if (lr_signals_eoi_mi(val) && vgic_valid_spi(vcpu->kvm, intid))
		kvm_notify_acked_irq(vcpu->kvm, 0,
				     intid - VGIC_NR_PRIVATE_IRQS);

However, as described in the commit message, lr_signals_eoi_mi()
returns false whenever the LR state is not invalid (inactive), so the
level interrupt never gets a chance to be de-asserted in my test.

Thanks.
Shunyong.
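The raw LR values in the trace above can be decoded field by field. The following standalone sketch assumes the GICv3 ICH_LR<n>_EL2 layout as I understand it from the architecture specification (vINTID in bits [31:0], EOI maintenance in bit 41, priority in bits [55:48], group in bit 60, HW in bit 61, state in bits [63:62]); the enum and all function names are hypothetical, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the four values of the LR state field. */
enum lr_state {
	LR_INVALID        = 0, /* inactive */
	LR_PENDING        = 1,
	LR_ACTIVE         = 2,
	LR_PENDING_ACTIVE = 3,
};

/* State is bits [63:62]: bit 62 = pending, bit 63 = active. */
static enum lr_state lr_get_state(uint64_t lr)
{
	return (enum lr_state)((lr >> 62) & 0x3);
}

/* HW bit: the virtual interrupt is linked to a physical one. */
static bool lr_hw(uint64_t lr)
{
	return (lr >> 61) & 1;
}

/* Group bit: set for a Group 1 interrupt. */
static bool lr_group1(uint64_t lr)
{
	return (lr >> 60) & 1;
}

/* Priority byte, bits [55:48]. */
static unsigned int lr_priority(uint64_t lr)
{
	return (unsigned int)((lr >> 48) & 0xff);
}

/* EOI bit: request a maintenance interrupt when the guest EOIs. */
static bool lr_eoi(uint64_t lr)
{
	return (lr >> 41) & 1;
}

/* Virtual INTID, bits [31:0]. */
static uint32_t lr_vintid(uint64_t lr)
{
	return (uint32_t)lr;
}
```

Applied to the logged values, 0x50a0020000000024, 0x90a0020000000024 and 0xd0a0020000000024 decode to pending, active and pending + active respectively, all for vINTID 36 (0x24) with priority 0xa0, group 1, the EOI maintenance bit set and HW clear. Since none of these snapshots is inactive, the existing lr_signals_eoi_mi() check does not fire for any of them, which matches the behaviour described above.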
* Re: [This email may be risky] Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling
  2018-03-08  9:31 ` Yang, Shunyong
@ 2018-03-08 11:01 ` Marc Zyngier

From: Marc Zyngier @ 2018-03-08 11:01 UTC
To: Yang, Shunyong, eric.auger, cdall
Cc: linux-kernel, ard.biesheuvel, kvmarm, Zheng, Joey, will.deacon,
    linux-arm-kernel, david.daney

On 08/03/18 09:31, Yang, Shunyong wrote:
> Hi, Eric,
>
> First, please let me change Christoffer's email to cdall@kernel.org. I
> add more information about my test below, please check.
>
> On Thu, 2018-03-08 at 09:57 +0100, Auger Eric wrote:
>> Hi,
>>
>> On 08/03/18 08:01, Shunyong Yang wrote:
>>>
>>> When resampling irqfds is enabled, level interrupt should be
>>> de-asserted when resampling happens.
[...]
>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>>> index 6b329414e57a..43111bba7af9 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
>>>
>>>  static bool lr_signals_eoi_mi(u64 lr_val)
>>>  {
>>> -	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
>>> -	       !(lr_val & ICH_LR_HW);
>>> +	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
>>> +	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
>>
>> In general don't we have this state transition
>>
>> inactive -> pending -> pending + active (1) -> active -> inactive.
>>
>> In that case won't we lower the virt irq level when folding the LR on
>> Pending + Active state, which is not was we want?
>>
>> Thanks
>>
>> Eric
>
> In current code, in my test, when I output LR value of the mtty IRQ 41
> (hwirq = 36) in vgic_v3_fold_lr_state(). The LR's transition starts
> like following,
>
> 0-->50a0020000000024-->90a0020000000024-->d0a0020000000024
>
> That is inactive-->pending-->active-->pending + active.
> Then it keep running cyclic pending-->active-->pending + active.
>
> The level interrupt de-assert should happen in following code
> /* Notify fds when the guest EOI'ed a level-triggered IRQ */
> if (lr_signals_eoi_mi(val) && vgic_valid_spi(vcpu->kvm, intid))
> 	kvm_notify_acked_irq(vcpu->kvm, 0,
> 			     intid - VGIC_NR_PRIVATE_IRQS);
>
> But as addressed in commit message, lr_signals_eoi_mi() will return
> false if state in LR is not invalid(inactive), so it has no chance to
> de-assert the level interrupt in my test.

The problem is that pending+active is not an indication that the guest
has actually EOI'd anything. It only indicates that it has been
activated.

Note that there is a bit of a vocabulary discrepancy between KVM and the
ARM architecture: KVM uses "acked" where ARM uses EOI. ARM uses "ACK"
or "Activate" for something entirely different. Maybe the confusion
stems from this difference.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
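Marc's objection can be illustrated by evaluating both versions of the predicate on every possible LR state. The sketch below copies the pre-RFC and RFC bodies of the GICv3 lr_signals_eoi_mi() from the quoted diff into userspace, renamed eoi_mi_orig() and eoi_mi_rfc(). The masks are re-declared locally on the assumption that they match arm-gic-v3.h, and print_table() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Local copies of the ICH_LR_EL2 masks (assumed to match arm-gic-v3.h). */
#define ICH_LR_EOI		(1ULL << 41)
#define ICH_LR_HW		(1ULL << 61)
#define ICH_LR_STATE		(3ULL << 62)
#define ICH_LR_PENDING_BIT	(1ULL << 62)
#define ICH_LR_ACTIVE_BIT	(1ULL << 63)

/* Pre-RFC condition: fire only once the LR has gone invalid (inactive). */
static bool eoi_mi_orig(uint64_t lr_val)
{
	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
	       !(lr_val & ICH_LR_HW);
}

/*
 * RFC condition: !((x & S) ^ S) is true exactly when (x & S) == S,
 * i.e. when the LR is pending+active.
 */
static bool eoi_mi_rfc(uint64_t lr_val)
{
	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
}

/* Hypothetical helper: print both predicates for each LR state. */
static void print_table(void)
{
	static const struct {
		const char *name;
		uint64_t state;
	} rows[] = {
		{ "inactive", 0 },
		{ "pending", ICH_LR_PENDING_BIT },
		{ "active", ICH_LR_ACTIVE_BIT },
		{ "pending+active", ICH_LR_STATE },
	};
	/* Non-HW LR with EOI requested and vINTID 36, as in the trace. */
	const uint64_t base = ICH_LR_EOI | 36;

	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%-15s orig=%d rfc=%d\n", rows[i].name,
		       eoi_mi_orig(base | rows[i].state),
		       eoi_mi_rfc(base | rows[i].state));
}
```

With EOI set and HW clear, the original condition holds only in the inactive state and the RFC condition only in pending+active. Since pending+active merely means the guest has activated the interrupt while the line was still high, the RFC version would signal the resampler before any EOI has taken place, which is the point Marc makes.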
* Re: [This email may be risky] Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling
  2018-03-08  9:31 ` Yang, Shunyong
@ 2018-03-08 15:29 ` Auger Eric

From: Auger Eric @ 2018-03-08 15:29 UTC
To: Yang, Shunyong, cdall
Cc: linux-kernel, ard.biesheuvel, kvmarm, Zheng, Joey, will.deacon,
    linux-arm-kernel, david.daney, marc.zyngier

Hi Shunyong,

On 08/03/18 10:31, Yang, Shunyong wrote:
> Hi, Eric,
>
> First, please let me change Christoffer's email to cdall@kernel.org. I
> add more information about my test below, please check.
>
> On Thu, 2018-03-08 at 09:57 +0100, Auger Eric wrote:
>> Hi,
>>
>> On 08/03/18 08:01, Shunyong Yang wrote:
>>>
>>> When resampling irqfds is enabled, level interrupt should be
>>> de-asserted when resampling happens.
[...]
>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>>> index 6b329414e57a..43111bba7af9 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
>>>
>>>  static bool lr_signals_eoi_mi(u64 lr_val)
>>>  {
>>> -	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
>>> -	       !(lr_val & ICH_LR_HW);
>>> +	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
>>> +	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
>>
>> In general don't we have this state transition
>>
>> inactive -> pending -> pending + active (1) -> active -> inactive.
>>
>> In that case won't we lower the virt irq level when folding the LR on
>> Pending + Active state, which is not was we want?
>>
>> Thanks
>>
>> Eric
>
> In current code, in my test, when I output LR value of the mtty IRQ 41
> (hwirq = 36) in vgic_v3_fold_lr_state(). The LR's transition starts
> like following,
>
> 0-->50a0020000000024-->90a0020000000024-->d0a0020000000024
>
> That is inactive-->pending-->active-->pending + active.

Yes, sorry, I mixed up the virtual line level and the LR pending state.
I had the following case in mind:

P -> guest IAR -> A -> exit/entry -> P+A -> exit

in which case you shouldn't call the resampler.

Thanks

Eric

> Then it keep running cyclic pending-->active-->pending + active.
>
> The level interrupt de-assert should happen in following code
> /* Notify fds when the guest EOI'ed a level-triggered IRQ */
> if (lr_signals_eoi_mi(val) && vgic_valid_spi(vcpu->kvm, intid))
> 	kvm_notify_acked_irq(vcpu->kvm, 0,
> 			     intid - VGIC_NR_PRIVATE_IRQS);
>
> But as addressed in commit message, lr_signals_eoi_mi() will return
> false if state in LR is not invalid(inactive), so it has no chance to
> de-assert the level interrupt in my test.
>
> Thanks.
> Shunyong.
>
>>
>>>
>>>  }
>>>
>>>  void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
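Eric's case can be replayed with a small model. The sketch below is a deliberately simplified, hypothetical model of a single virtual level-triggered LR, not KVM code: a guest IAR turns the LR from pending to active, a guest EOI clears the active bit, and on guest entry the pending bit is set again while the emulated line is still high (mirroring the active to pending + active step in Shunyong's trace). At every exit, the two predicates from the quoted diff (renamed eoi_mi_orig() and eoi_mi_rfc()) are evaluated, and any notification issued before the guest has EOI'd is counted as premature. The event names and replay() are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Local copies of the ICH_LR_EL2 masks (assumed to match arm-gic-v3.h). */
#define ICH_LR_EOI		(1ULL << 41)
#define ICH_LR_HW		(1ULL << 61)
#define ICH_LR_STATE		(3ULL << 62)
#define ICH_LR_PENDING_BIT	(1ULL << 62)
#define ICH_LR_ACTIVE_BIT	(1ULL << 63)

static bool eoi_mi_orig(uint64_t lr)	/* pre-RFC condition */
{
	return !(lr & ICH_LR_STATE) && (lr & ICH_LR_EOI) && !(lr & ICH_LR_HW);
}

static bool eoi_mi_rfc(uint64_t lr)	/* RFC condition */
{
	return !((lr & ICH_LR_STATE) ^ ICH_LR_STATE) &&
	       (lr & ICH_LR_EOI) && !(lr & ICH_LR_HW);
}

/* Hypothetical events of the simplified model. */
enum event { GUEST_IAR, GUEST_EOI, EXIT, ENTRY };

struct premature {
	int orig;	/* resampler notifications before any guest EOI */
	int rfc;
};

static struct premature replay(const enum event *ev, int n, bool line_high)
{
	uint64_t lr = ICH_LR_EOI | ICH_LR_PENDING_BIT | 36; /* starts pending */
	bool guest_eoied = false;
	struct premature p = { 0, 0 };

	for (int i = 0; i < n; i++) {
		switch (ev[i]) {
		case GUEST_IAR:	/* acknowledge: pending -> active */
			lr = (lr & ~ICH_LR_STATE) | ICH_LR_ACTIVE_BIT;
			break;
		case GUEST_EOI:	/* deactivate: clear the active bit */
			lr &= ~ICH_LR_ACTIVE_BIT;
			guest_eoied = true;
			break;
		case ENTRY:	/* line still high: LR is written back pending */
			if (line_high)
				lr |= ICH_LR_PENDING_BIT;
			break;
		case EXIT:	/* fold: would the resampler be notified here? */
			if (!guest_eoied && eoi_mi_orig(lr))
				p.orig++;
			if (!guest_eoied && eoi_mi_rfc(lr))
				p.rfc++;
			break;
		}
	}
	return p;
}
```

Replaying { GUEST_IAR, EXIT, ENTRY, EXIT } with the line held high, the RFC predicate fires once, at the second exit, although the guest never EOI'd; the original predicate does not fire at all.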
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling
  2018-03-08  7:01 ` Shunyong Yang
@ 2018-03-08  9:49 ` Marc Zyngier

From: Marc Zyngier @ 2018-03-08  9:49 UTC
To: Shunyong Yang
Cc: ard.biesheuvel, will.deacon, eric.auger, david.daney,
    linux-arm-kernel, kvmarm, linux-kernel, Joey Zheng,
    Christoffer Dall

[updated Christoffer's email address]

Hi Shunyong,

On 08/03/18 07:01, Shunyong Yang wrote:
> When resampling irqfds is enabled, level interrupt should be
> de-asserted when resampling happens. On page 4-47 of GIC v3
> specification IHI0069D, it said,
> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
> interface, the IRI changes the status of the interrupt to active
> and pending if:
> • It is an edge-triggered interrupt, and another edge has been
> detected since the interrupt was acknowledged.
> • It is a level-sensitive interrupt, and the level has not been
> deasserted since the interrupt was acknowledged."
>
> GIC v2 specification IHI0048B.b has similar description on page
> 3-42 for state machine transition.
>
> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
> in samples/vfio-mdev) triggers a level interrupt, the status
> transition in LR is pending-->active-->active and pending.
> Then it will wait resampling to de-assert the interrupt.
>
> Current design of lr_signals_eoi_mi() will return false if state
> in LR is not invalid(Inactive). It causes resampling will not happen
> in mtty case.

Let me rephrase this, and tell me if I understood it correctly:

- A level interrupt is injected, activated by the guest (LR state=active)
- guest exits, re-enters, (LR state=pending+active)
- guest EOIs the interrupt (LR state=pending)
- maintenance interrupt
- we don't signal the resampling because we're not in an invalid state

Is that correct?

That's an interesting case, because it seems to invalidate some of the
optimization that went in over a year ago.

096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields
b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state
af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation

We could compare the value of the LR before the guest entry with
the value at exit time, but we still could miss it if we have a
transition such as P+A -> P -> A and assume a long enough propagation
delay for the maintenance interrupt (which is very likely).

In essence, we have lost the benefit of EISR, which was to give us a
way to deal with asynchronous signalling.

>
> This will cause interrupt fired continuously to guest even 8250 IIR
> has no interrupt. When 8250's interrupt is configured in shared mode,
> it will pass interrupt to other drivers to handle. However, there
> is no other driver involved. Then, a "nobody cared" kernel complaint
> occurs.
>
> / # cat /dev/ttyS0
> [ 4.826836] random: crng init done
> [ 6.373620] irq 41: nobody cared (try booting with the "irqpoll" option)
> [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
> [ 6.378927] Hardware name: linux,dummy-virt (DT)
> [ 6.380876] Call trace:
> [ 6.381937] dump_backtrace+0x0/0x180
> [ 6.383495] show_stack+0x14/0x1c
> [ 6.384902] dump_stack+0x90/0xb4
> [ 6.386312] __report_bad_irq+0x38/0xe0
> [ 6.387944] note_interrupt+0x1f4/0x2b8
> [ 6.389568] handle_irq_event_percpu+0x54/0x7c
> [ 6.391433] handle_irq_event+0x44/0x74
> [ 6.393056] handle_fasteoi_irq+0x9c/0x154
> [ 6.394784] generic_handle_irq+0x24/0x38
> [ 6.396483] __handle_domain_irq+0x60/0xb4
> [ 6.398207] gic_handle_irq+0x98/0x1b0
> [ 6.399796] el1_irq+0xb0/0x128
> [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40
> [ 6.403149] __setup_irq+0x41c/0x678
> [ 6.404669] request_threaded_irq+0xe0/0x190
> [ 6.406474] univ8250_setup_irq+0x208/0x234
> [ 6.408250] serial8250_do_startup+0x1b4/0x754
> [ 6.410123] serial8250_startup+0x20/0x28
> [ 6.411826] uart_startup.part.21+0x78/0x144
> [ 6.413633] uart_port_activate+0x50/0x68
> [ 6.415328] tty_port_open+0x84/0xd4
> [ 6.416851] uart_open+0x34/0x44
> [ 6.418229] tty_open+0xec/0x3c8
> [ 6.419610] chrdev_open+0xb0/0x198
> [ 6.421093] do_dentry_open+0x200/0x310
> [ 6.422714] vfs_open+0x54/0x84
> [ 6.424054] path_openat+0x2dc/0xf04
> [ 6.425569] do_filp_open+0x68/0xd8
> [ 6.427044] do_sys_open+0x16c/0x224
> [ 6.428563] SyS_openat+0x10/0x18
> [ 6.429972] el0_svc_naked+0x30/0x34
> [ 6.431494] handlers:
> [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt
> [ 6.434597] Disabling IRQ #41
>
> This patch changes the lr state condition in lr_signals_eoi_mi() from
> invalid(Inactive) to active and pending to avoid this.
>
> I am not sure about the original design of the condition of
> invalid(active). So, This RFC is sent out for comments.
>
> Cc: Joey Zheng <yu.zheng@hxt-semitech.com>
> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
> ---
>  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
>  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index e9d840a75e7b..740ee9a5f551 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
>
>  static bool lr_signals_eoi_mi(u32 lr_val)
>  {
> -       return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
> -               !(lr_val & GICH_LR_HW);
> +       return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&

That feels very wrong. You're now signalling the resampling in both
invalid and pending+active, and the latter state doesn't mean you've
EOIed anything. You're now over-signalling, and signalling the
wrong event.

> +               (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
>  }
>
>  /*
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 6b329414e57a..43111bba7af9 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
>
>  static bool lr_signals_eoi_mi(u64 lr_val)
>  {
> -       return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
> -               !(lr_val & ICH_LR_HW);
> +       return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
> +               (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
>  }
>
>  void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>

Assuming I understand the issue correctly, I cannot really see how
to solve this without reintroducing EISR, which sucks majorly.

I'll try to cook something shortly and we can all have a good
fight about how crap this is.

Thanks,

        M.
--
Jazz is not dead. It just smells funny...
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling
  2018-03-08  9:49 ` Marc Zyngier
@ 2018-03-08 11:54 ` Marc Zyngier

From: Marc Zyngier @ 2018-03-08 11:54 UTC
To: Shunyong Yang
Cc: ard.biesheuvel, will.deacon, eric.auger, david.daney,
    linux-arm-kernel, kvmarm, linux-kernel, Joey Zheng,
    Christoffer Dall

On 08/03/18 09:49, Marc Zyngier wrote:
> [updated Christoffer's email address]
>
> Hi Shunyong,
>
> On 08/03/18 07:01, Shunyong Yang wrote:
>> When resampling irqfds is enabled, level interrupt should be
>> de-asserted when resampling happens. On page 4-47 of GIC v3
>> specification IHI0069D, it said,
>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
>> interface, the IRI changes the status of the interrupt to active
>> and pending if:
>> • It is an edge-triggered interrupt, and another edge has been
>> detected since the interrupt was acknowledged.
>> • It is a level-sensitive interrupt, and the level has not been
>> deasserted since the interrupt was acknowledged."
>>
>> GIC v2 specification IHI0048B.b has similar description on page
>> 3-42 for state machine transition.
>>
>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
>> in samples/vfio-mdev) triggers a level interrupt, the status
>> transition in LR is pending-->active-->active and pending.
>> Then it will wait resampling to de-assert the interrupt.
>>
>> Current design of lr_signals_eoi_mi() will return false if state
>> in LR is not invalid(Inactive). It causes resampling will not happen
>> in mtty case.
>
> Let me rephrase this, and tell me if I understood it correctly:
>
> - A level interrupt is injected, activated by the guest (LR state=active)
> - guest exits, re-enters, (LR state=pending+active)
> - guest EOIs the interrupt (LR state=pending)
> - maintenance interrupt
> - we don't signal the resampling because we're not in an invalid state
>
> Is that correct?
>
> That's an interesting case, because it seems to invalidate some of the
> optimization that went in over a year ago.
>
> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields
> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state
> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation
>
> We could compare the value of the LR before the guest entry with
> the value at exit time, but we still could miss it if we have a
> transition such as P+A -> P -> A and assume a long enough propagation
> delay for the maintenance interrupt (which is very likely).
>
> In essence, we have lost the benefit of EISR, which was to give us a
> way to deal with asynchronous signalling.
>
>>
>> This will cause interrupt fired continuously to guest even 8250 IIR
>> has no interrupt. When 8250's interrupt is configured in shared mode,
>> it will pass interrupt to other drivers to handle. However, there
>> is no other driver involved. Then, a "nobody cared" kernel complaint
>> occurs.
>>
>> / # cat /dev/ttyS0
>> [ 4.826836] random: crng init done
>> [ 6.373620] irq 41: nobody cared (try booting with the "irqpoll" option)
>> [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
>> [ 6.378927] Hardware name: linux,dummy-virt (DT)
>> [ 6.380876] Call trace:
>> [ 6.381937] dump_backtrace+0x0/0x180
>> [ 6.383495] show_stack+0x14/0x1c
>> [ 6.384902] dump_stack+0x90/0xb4
>> [ 6.386312] __report_bad_irq+0x38/0xe0
>> [ 6.387944] note_interrupt+0x1f4/0x2b8
>> [ 6.389568] handle_irq_event_percpu+0x54/0x7c
>> [ 6.391433] handle_irq_event+0x44/0x74
>> [ 6.393056] handle_fasteoi_irq+0x9c/0x154
>> [ 6.394784] generic_handle_irq+0x24/0x38
>> [ 6.396483] __handle_domain_irq+0x60/0xb4
>> [ 6.398207] gic_handle_irq+0x98/0x1b0
>> [ 6.399796] el1_irq+0xb0/0x128
>> [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40
>> [ 6.403149] __setup_irq+0x41c/0x678
>> [ 6.404669] request_threaded_irq+0xe0/0x190
>> [ 6.406474] univ8250_setup_irq+0x208/0x234
>> [ 6.408250] serial8250_do_startup+0x1b4/0x754
>> [ 6.410123] serial8250_startup+0x20/0x28
>> [ 6.411826] uart_startup.part.21+0x78/0x144
>> [ 6.413633] uart_port_activate+0x50/0x68
>> [ 6.415328] tty_port_open+0x84/0xd4
>> [ 6.416851] uart_open+0x34/0x44
>> [ 6.418229] tty_open+0xec/0x3c8
>> [ 6.419610] chrdev_open+0xb0/0x198
>> [ 6.421093] do_dentry_open+0x200/0x310
>> [ 6.422714] vfs_open+0x54/0x84
>> [ 6.424054] path_openat+0x2dc/0xf04
>> [ 6.425569] do_filp_open+0x68/0xd8
>> [ 6.427044] do_sys_open+0x16c/0x224
>> [ 6.428563] SyS_openat+0x10/0x18
>> [ 6.429972] el0_svc_naked+0x30/0x34
>> [ 6.431494] handlers:
>> [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt
>> [ 6.434597] Disabling IRQ #41
>>
>> This patch changes the lr state condition in lr_signals_eoi_mi() from
>> invalid(Inactive) to active and pending to avoid this.
>>
>> I am not sure about the original design of the condition of
>> invalid(active). So, This RFC is sent out for comments.
>>
>> Cc: Joey Zheng <yu.zheng@hxt-semitech.com>
>> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
>> ---
>>  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
>>  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
>> index e9d840a75e7b..740ee9a5f551 100644
>> --- a/virt/kvm/arm/vgic/vgic-v2.c
>> +++ b/virt/kvm/arm/vgic/vgic-v2.c
>> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
>>
>>  static bool lr_signals_eoi_mi(u32 lr_val)
>>  {
>> -       return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
>> -               !(lr_val & GICH_LR_HW);
>> +       return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
>
> That feels very wrong. You're now signalling the resampling in both
> invalid and pending+active, and the latter state doesn't mean you've
> EOIed anything. You're now over-signalling, and signalling the
> wrong event.
>
>> +               (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
>>  }
>>
>>  /*
>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>> index 6b329414e57a..43111bba7af9 100644
>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
>>
>>  static bool lr_signals_eoi_mi(u64 lr_val)
>>  {
>> -       return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
>> -               !(lr_val & ICH_LR_HW);
>> +       return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
>> +               (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
>>  }
>>
>>  void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>>
>
> Assuming I understand the issue correctly, I cannot really see how
> to solve this without reintroducing EISR, which sucks majorly.
>
> I'll try to cook something shortly and we can all have a good
> fight about how crap this is.

Here's what I came up with. I don't really like it, but that's the
least invasive thing I could come up with. Please let me know if that
helps with your test case.

Note that I have only boot-tested this on a sample of 1 machine, so I
don't expect this to be perfect. Also, any guideline on how to reproduce
this would be much appreciated. I never used this mdev/mtty thing, so
please bear with me.

Thanks,

        M.

>From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier@arm.com>
Date: Thu, 8 Mar 2018 11:14:06 +0000
Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to guess EOI MI status

We so far rely on the LR state to decide whether the guest has
EOI'd a level interrupt or not. While this looks like a good
idea on the surface, it leads to a couple of annoying corner
cases:

Example 1: (P = Pending, A = Active, MI = Maintenance Interrupt)
P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> MI

The state is now pending, we've really EOI'd the interrupt, and
yet lr_signals_eoi_mi() returns false, since the state is not 0.
The result is that we won't signal anything on the corresponding
irqfd, which people complain about. Meh.

Example 2:
P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI fires

Same issue: state isn't 0, and nothing happens.

The core of the problem is that we can't decide on whether an
interrupt has been EOI'd by just looking at the LR if we ever want
to support the P+A state, as things do change behind our back.

An alternative to dropping P+A is to bring back our friend EISR,
which indicates which LRs have generated a MI. Instead of dragging
the state around like we used to do, use it to clear the EOI bit
from the in-memory copy, and use that as a predicate to find out
if it fired or not.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 virt/kvm/arm/hyp/vgic-v2-sr.c | 8 ++++++++
 virt/kvm/arm/hyp/vgic-v3-sr.c | 6 ++++++
 virt/kvm/arm/vgic/vgic-v2.c   | 3 +--
 virt/kvm/arm/vgic/vgic-v3.c   | 3 +--
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c
index 4fe6e797e8b3..475cb2d7fd33 100644
--- a/virt/kvm/arm/hyp/vgic-v2-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v2-sr.c
@@ -43,6 +43,11 @@ static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
         struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
         int i;
         u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+        u64 eisr;
+
+        eisr = readl_relaxed(base + GICH_EISR0);
+        if (unlikely(used_lrs > 32))
+                eisr |= (u64)readl_relaxed(base + GICH_EISR1) << 32;
 
         for (i = 0; i < used_lrs; i++) {
                 if (cpu_if->vgic_elrsr & (1UL << i))
@@ -50,6 +55,9 @@ static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
                 else
                         cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
 
+                if ((cpu_if->vgic_lr[i] & GICH_LR_EOI) && !(eisr & (1UL << i)))
+                        cpu_if->vgic_lr[i] &= ~GICH_LR_EOI;
+
                 writel_relaxed(0, base + GICH_LR0 + (i * 4));
         }
 }
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index b89ce5432214..2ce63d6740b0 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -223,8 +223,10 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
         if (used_lrs) {
                 int i;
                 u32 nr_pre_bits;
+                u32 eisr;
 
                 cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
+                eisr = read_gicreg(ICH_EISR_EL2);
 
                 write_gicreg(0, ICH_HCR_EL2);
                 val = read_gicreg(ICH_VTR_EL2);
@@ -236,6 +238,10 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
                         else
                                 cpu_if->vgic_lr[i] = __gic_v3_get_lr(i);
 
+                        if ((cpu_if->vgic_lr[i] & ICH_LR_EOI) &&
+                            !(eisr & (1 << i)))
+                                cpu_if->vgic_lr[i] &= ~ICH_LR_EOI;
+
                         __gic_v3_set_lr(0, i);
                 }
 
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index e9d840a75e7b..0be616e4ee29 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -46,8 +46,7 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
 
 static bool lr_signals_eoi_mi(u32 lr_val)
 {
-        return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
-                !(lr_val & GICH_LR_HW);
+        return (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
 }
 
 /*
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 6b329414e57a..c68352b8ed28 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -35,8 +35,7 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
 
 static bool lr_signals_eoi_mi(u64 lr_val)
 {
-        return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
-                !(lr_val & ICH_LR_HW);
+        return (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
 }
 
 void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
--
2.14.2

--
Jazz is not dead. It just smells funny...
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-08 11:54 ` Marc Zyngier @ 2018-03-08 16:09 ` Auger Eric From: Auger Eric @ 2018-03-08 16:09 UTC To: Marc Zyngier, Shunyong Yang Cc: Christoffer Dall, david.daney, ard.biesheuvel, will.deacon, linux-kernel, Joey Zheng, kvmarm, linux-arm-kernel Hi Marc, On 08/03/18 12:54, Marc Zyngier wrote: > On 08/03/18 09:49, Marc Zyngier wrote: >> [updated Christoffer's email address] >> >> Hi Shunyong, >> >> On 08/03/18 07:01, Shunyong Yang wrote: >>> When resampling irqfds is enabled, level interrupt should be >>> de-asserted when resampling happens. On page 4-47 of GIC v3 >>> specification IHI0069D, it said, >>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU >>> interface, the IRI changes the status of the interrupt to active >>> and pending if: >>> • It is an edge-triggered interrupt, and another edge has been >>> detected since the interrupt was acknowledged. >>> • It is a level-sensitive interrupt, and the level has not been >>> deasserted since the interrupt was acknowledged." >>> >>> GIC v2 specification IHI0048B.b has similar description on page >>> 3-42 for state machine transition. >>> >>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver >>> in samples/vfio-mdev) triggers a level interrupt, the status >>> transition in LR is pending-->active-->active and pending. >>> Then it will wait resampling to de-assert the interrupt. >>> >>> Current design of lr_signals_eoi_mi() will return false if state >>> in LR is not invalid(Inactive). It causes resampling will not happen >>> in mtty case.
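The pending --> active --> active and pending sequence described above can be modeled as a tiny state machine. This is a simplified sketch, not KVM code: it assumes the LR state field encoding (bit 0 pending, bit 1 active) and a hypothetical re-entry step in which the hypervisor marks the LR pending again while the level line is still asserted.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed LR state field encoding: bit 0 = pending, bit 1 = active. */
enum lr_state {
	LR_INVALID        = 0,
	LR_PENDING        = 1,
	LR_ACTIVE         = 2,
	LR_PENDING_ACTIVE = 3,
};

/* Guest acknowledges the interrupt (reads IAR): pending -> active. */
static enum lr_state guest_ack(enum lr_state s)
{
	assert(s == LR_PENDING);
	return LR_ACTIVE;
}

/* Hypothetical re-entry step: a still-asserted level line sets pending. */
static enum lr_state reenter(enum lr_state s, bool line_asserted)
{
	return line_asserted ? (enum lr_state)(s | LR_PENDING) : s;
}

/* Guest EOI (deactivation): the active bit clears, pending is kept. */
static enum lr_state guest_eoi(enum lr_state s)
{
	assert(s & LR_ACTIVE);
	return (enum lr_state)(s & ~LR_ACTIVE);
}

/* The state-based test: only an invalid LR is treated as EOI'd. */
static bool state_check_sees_eoi(enum lr_state s)
{
	return s == LR_INVALID;
}
```

Running P -> ack -> re-entry with the line high -> EOI ends in the pending state, so the state-based test reports no EOI even though the guest did EOI the interrupt.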
>> >> Let me rephrase this, and tell me if I understood it correctly: >> >> - A level interrupt is injected, activated by the guest (LR state=active) >> - guest exits, re-enters, (LR state=pending+active) >> - guest EOIs the interrupt (LR state=pending) >> - maintenance interrupt >> - we don't signal the resampling because we're not in an invalid state >> >> Is that correct? >> >> That's an interesting case, because it seems to invalidate some of the >> optimization that went in over a year ago. >> >> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields >> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state >> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation >> >> We could compare the value of the LR before the guest entry with >> the value at exit time, but we still could miss it if we have a >> transition such as P+A -> P -> A and assume a long enough propagation >> delay for the maintenance interrupt (which is very likely). >> >> In essence, we have lost the benefit of EISR, which was to give us a >> way to deal with asynchronous signalling. >> >>> >>> This will cause interrupt fired continuously to guest even 8250 IIR >>> has no interrupt. When 8250's interrupt is configured in shared mode, >>> it will pass interrupt to other drivers to handle. However, there >>> is no other driver involved. Then, a "nobody cared" kernel complaint >>> occurs. 
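The remark that comparing the LR value at entry with the value at exit can miss a P+A -> P -> A transition can be illustrated with a simplified model (an assumption for illustration, not KVM code): record every state the LR goes through, count active-to-inactive steps as guest deactivations, and compare with what the two endpoints alone reveal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed LR state field bits: bit 0 = pending, bit 1 = active. */
#define ST_PENDING 1u
#define ST_ACTIVE  2u

/* Endpoint comparison: infer a deactivation only if the LR was active
 * at entry and is no longer active at exit. */
static bool endpoints_show_eoi(unsigned entry, unsigned exit_state)
{
	return (entry & ST_ACTIVE) && !(exit_state & ST_ACTIVE);
}

/* Full-history view: count every active -> not-active step. */
static size_t count_deactivations(const unsigned *trace, size_t n)
{
	size_t count = 0;

	assert(trace != NULL || n == 0);
	for (size_t i = 1; i < n; i++)
		if ((trace[i - 1] & ST_ACTIVE) && !(trace[i] & ST_ACTIVE))
			count++;
	return count;
}
```

For the trace P+A -> P -> A, one deactivation happened, but both endpoints are active, so the endpoint comparison sees nothing.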
>>> >>> / # cat /dev/ttyS0 >>> [ 4.826836] random: crng init done >>> [ 6.373620] irq 41: nobody cared (try booting with the "irqpoll" >>> option) >>> [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4 >>> [ 6.378927] Hardware name: linux,dummy-virt (DT) >>> [ 6.380876] Call trace: >>> [ 6.381937] dump_backtrace+0x0/0x180 >>> [ 6.383495] show_stack+0x14/0x1c >>> [ 6.384902] dump_stack+0x90/0xb4 >>> [ 6.386312] __report_bad_irq+0x38/0xe0 >>> [ 6.387944] note_interrupt+0x1f4/0x2b8 >>> [ 6.389568] handle_irq_event_percpu+0x54/0x7c >>> [ 6.391433] handle_irq_event+0x44/0x74 >>> [ 6.393056] handle_fasteoi_irq+0x9c/0x154 >>> [ 6.394784] generic_handle_irq+0x24/0x38 >>> [ 6.396483] __handle_domain_irq+0x60/0xb4 >>> [ 6.398207] gic_handle_irq+0x98/0x1b0 >>> [ 6.399796] el1_irq+0xb0/0x128 >>> [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40 >>> [ 6.403149] __setup_irq+0x41c/0x678 >>> [ 6.404669] request_threaded_irq+0xe0/0x190 >>> [ 6.406474] univ8250_setup_irq+0x208/0x234 >>> [ 6.408250] serial8250_do_startup+0x1b4/0x754 >>> [ 6.410123] serial8250_startup+0x20/0x28 >>> [ 6.411826] uart_startup.part.21+0x78/0x144 >>> [ 6.413633] uart_port_activate+0x50/0x68 >>> [ 6.415328] tty_port_open+0x84/0xd4 >>> [ 6.416851] uart_open+0x34/0x44 >>> [ 6.418229] tty_open+0xec/0x3c8 >>> [ 6.419610] chrdev_open+0xb0/0x198 >>> [ 6.421093] do_dentry_open+0x200/0x310 >>> [ 6.422714] vfs_open+0x54/0x84 >>> [ 6.424054] path_openat+0x2dc/0xf04 >>> [ 6.425569] do_filp_open+0x68/0xd8 >>> [ 6.427044] do_sys_open+0x16c/0x224 >>> [ 6.428563] SyS_openat+0x10/0x18 >>> [ 6.429972] el0_svc_naked+0x30/0x34 >>> [ 6.431494] handlers: >>> [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt >>> [ 6.434597] Disabling IRQ #41 >>> >>> This patch changes the lr state condition in lr_signals_eoi_mi() from >>> invalid(Inactive) to active and pending to avoid this. >>> >>> I am not sure about the original design of the condition of >>> invalid(active). So, This RFC is sent out for comments. 
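For reference, the state test proposed in the RFC, `!((lr_val & STATE) ^ STATE)`, can be evaluated over the four LR state encodings. A short sketch, assuming the GICv2 state bit positions (modeled on GICH_LR_PENDING_BIT and GICH_LR_ACTIVE_BIT): the expression holds exactly when both state bits are set, i.e. it is the same as `(lr & STATE) == STATE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICv2 LR state bits (modeled on GICH_LR_*; verify). */
#define LR_PENDING (1u << 28)
#define LR_ACTIVE  (1u << 29)
#define LR_STATE   (LR_PENDING | LR_ACTIVE)

/* The state test as written in the RFC patch. */
static bool rfc_state_test(uint32_t lr)
{
	return !((lr & LR_STATE) ^ LR_STATE);
}

/* An equivalent, more direct spelling of the same test. */
static bool both_state_bits_set(uint32_t lr)
{
	return (lr & LR_STATE) == LR_STATE;
}
```

Only the pending+active encoding passes; invalid, pending-only and active-only LRs all fail the test.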
>>> >>> Cc: Joey Zheng <yu.zheng@hxt-semitech.com> >>> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> >>> --- >>> virt/kvm/arm/vgic/vgic-v2.c | 4 ++-- >>> virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- >>> 2 files changed, 4 insertions(+), 4 deletions(-) >>> >>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c >>> index e9d840a75e7b..740ee9a5f551 100644 >>> --- a/virt/kvm/arm/vgic/vgic-v2.c >>> +++ b/virt/kvm/arm/vgic/vgic-v2.c >>> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) >>> >>> static bool lr_signals_eoi_mi(u32 lr_val) >>> { >>> - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && >>> - !(lr_val & GICH_LR_HW); >>> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) && >> >> That feels very wrong. You're now signalling the resampling in both >> invalid and pending+active, and the latter state doesn't mean you've >> EOIed anything. You're now over-signalling, and signalling the >> wrong event. >> >>> + (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); >>> } >>> >>> /* >>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c >>> index 6b329414e57a..43111bba7af9 100644 >>> --- a/virt/kvm/arm/vgic/vgic-v3.c >>> +++ b/virt/kvm/arm/vgic/vgic-v3.c >>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) >>> >>> static bool lr_signals_eoi_mi(u64 lr_val) >>> { >>> - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && >>> - !(lr_val & ICH_LR_HW); >>> + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) && >>> + (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); >>> } >>> >>> void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) >>> >> >> Assuming I understand the issue correctly, I cannot really see how >> to solve this without reintroducing EISR, which sucks majorly. >> >> I'll try to cook something shortly and we can all have a good >> fight about how crap this is. > > Here's what I came up with. 
I don't really like it, but that's > the least invasive thing I could come up with. Please let me > know if that helps with your test case. Note that I have only > boot-tested this on a sample of 1 machine, so I don't expect this > to be perfect. > > Also, any guideline on how to reproduce this would be much appreciated. > I never used this mdev/mtty thing, so please bear with me. > > Thanks, > > M. > > From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 00:00:00 2001 > From: Marc Zyngier <marc.zyngier@arm.com> > Date: Thu, 8 Mar 2018 11:14:06 +0000 > Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to guess EOI MI > status > > We so far rely on the LR state to decide whether the guest has > EOI'd a level interrupt or not. While this looks like a good > idea on the surface, it leads to a couple of annoying corner > cases: > > Example 1: (P = Pending, A = Active, MI = Maintenance Interrupt) > P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> MI > > The state is now pending, we've really EOI'd the interrupt, and > yet lr_signals_eoi_mi() returns false, since the state is not 0. > The result is that we won't signal anything on the corresponding > irqfd, which people complain about. Meh. > > Example 2: > P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI fires In that case aren't we acking the same IRQ occurrence twice? > > Same issue: state isn't 0, and nothing happens. > > The core of the problem is that we can't decide on whether an > interrupt has been EOId by just looking at the LR if we ever > want to support the P+A state, as things do change behind our back. > > An alternative to dropping P+A is to bring back our friend EISR, > which indicates which LRs have generated a MI. Instead of dragging > the state around like we used to do, use it to clear the EOI bit > from the in-memory copy, and use that as a predicate to find out > if it fired or not.
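The EISR-based approach described in this commit message can be sketched as a user-space model of the save path (not the hyp code itself). It assumes GICv2-style 32-bit LR values with the masks below, and mirrors the patch: the two 32-bit EISR registers are combined into one 64-bit mask, and any saved LR whose EISR bit is clear loses its EOI bit, so a surviving EOI bit means that LR raised an EOI maintenance interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed GICv2 list-register bits (modeled on GICH_LR_*; verify). */
#define LR_PENDING (1u << 28)
#define LR_ACTIVE  (1u << 29)
#define LR_EOI     (1u << 19)
#define LR_HW      (1u << 31)

/* Combine EISR0/EISR1; the upper half only matters beyond 32 LRs. */
static uint64_t combine_eisr(uint32_t eisr0, uint32_t eisr1, size_t used_lrs)
{
	uint64_t eisr = eisr0;

	if (used_lrs > 32)
		eisr |= (uint64_t)eisr1 << 32;
	return eisr;
}

/* Clear the EOI request from each saved LR that EISR did not flag. */
static void filter_eoi_bits(uint32_t *lrs, size_t used_lrs, uint64_t eisr)
{
	assert(used_lrs <= 64);
	for (size_t i = 0; i < used_lrs; i++)
		if ((lrs[i] & LR_EOI) && !(eisr & (1ULL << i)))
			lrs[i] &= ~LR_EOI;
}

/* After filtering, the EOI bit alone answers "did this LR fire?". */
static bool signals_eoi_mi(uint32_t lr)
{
	return (lr & LR_EOI) && !(lr & LR_HW);
}
```

With two LRs that both requested an EOI MI but only LR0 flagged in EISR, only LR0 is reported, regardless of whether its state is pending or invalid.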
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > --- > virt/kvm/arm/hyp/vgic-v2-sr.c | 8 ++++++++ > virt/kvm/arm/hyp/vgic-v3-sr.c | 6 ++++++ > virt/kvm/arm/vgic/vgic-v2.c | 3 +-- > virt/kvm/arm/vgic/vgic-v3.c | 3 +-- > 4 files changed, 16 insertions(+), 4 deletions(-) > > diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c > index 4fe6e797e8b3..475cb2d7fd33 100644 > --- a/virt/kvm/arm/hyp/vgic-v2-sr.c > +++ b/virt/kvm/arm/hyp/vgic-v2-sr.c > @@ -43,6 +43,11 @@ static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base) > struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2; > int i; > u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs; > + u64 eisr; > + > + eisr = readl_relaxed(base + GICH_EISR0); > + if (unlikely(used_lrs > 32)) > + eisr |= (u64)readl_relaxed(base + GICH_EISR1) << 32; > > for (i = 0; i < used_lrs; i++) { > if (cpu_if->vgic_elrsr & (1UL << i)) > @@ -50,6 +55,9 @@ static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base) > else > cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4)); > > + if ((cpu_if->vgic_lr[i] & GICH_LR_EOI) && !(eisr & (1UL << i))) > + cpu_if->vgic_lr[i] &= ~GICH_LR_EOI; > + > writel_relaxed(0, base + GICH_LR0 + (i * 4)); > } > } > diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c > index b89ce5432214..2ce63d6740b0 100644 > --- a/virt/kvm/arm/hyp/vgic-v3-sr.c > +++ b/virt/kvm/arm/hyp/vgic-v3-sr.c > @@ -223,8 +223,10 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu) > if (used_lrs) { > int i; > u32 nr_pre_bits; > + u32 eisr; > > cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2); > + eisr = read_gicreg(ICH_EISR_EL2); > > write_gicreg(0, ICH_HCR_EL2); > val = read_gicreg(ICH_VTR_EL2); > @@ -236,6 +238,10 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu) > else > cpu_if->vgic_lr[i] = __gic_v3_get_lr(i); > > + if ((cpu_if->vgic_lr[i] & ICH_LR_EOI) && > + !(eisr & (1 << i))) > + cpu_if->vgic_lr[i] &= ~ICH_LR_EOI; > + 
> __gic_v3_set_lr(0, i); > } > > diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c > index e9d840a75e7b..0be616e4ee29 100644 > --- a/virt/kvm/arm/vgic/vgic-v2.c > +++ b/virt/kvm/arm/vgic/vgic-v2.c > @@ -46,8 +46,7 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) > > static bool lr_signals_eoi_mi(u32 lr_val) > { > - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && > - !(lr_val & GICH_LR_HW); > + return (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); Do we still need to test lr_val & GICH_LR_HW? Aren't LR_EOI and LR_HW mutually exclusive (maybe it is a leftover from the architected timer handling?) > } > > /* > diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c > index 6b329414e57a..c68352b8ed28 100644 > --- a/virt/kvm/arm/vgic/vgic-v3.c > +++ b/virt/kvm/arm/vgic/vgic-v3.c > @@ -35,8 +35,7 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) > > static bool lr_signals_eoi_mi(u64 lr_val) > { > - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && > - !(lr_val & ICH_LR_HW); > + return (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); > } > > void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) > Otherwise, looks good to me. Thanks Eric
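One reason the HW test may still matter: assuming the GICv2 list-register layout in which, for a hardware-mapped LR (HW=1), bits [19:10] hold the physical interrupt ID (the field the kernel calls GICH_LR_PHYSID_CPUID), bit 19 is then the top bit of that ID rather than the EOI flag. The sketch below (hypothetical helper names, masks to be checked against the architecture specification) shows a HW LR whose physical ID is 512 reading as "EOI set" if the HW guard is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICv2 LR layout (verify against the GICv2 specification). */
#define LR_EOI          (1u << 19)
#define LR_HW           (1u << 31)
#define LR_PHYSID_SHIFT 10
#define LR_PHYSID_MASK  (0x3ffu << LR_PHYSID_SHIFT) /* bits [19:10] */

/* Hypothetical helper: build a hardware-mapped LR. */
static uint32_t make_hw_lr(uint32_t virt_id, uint32_t phys_id)
{
	assert(phys_id <= 0x3ffu);
	return LR_HW | ((phys_id << LR_PHYSID_SHIFT) & LR_PHYSID_MASK) |
	       (virt_id & 0x3ffu);
}

/* EOI test without the HW guard: misreads bit 19 of a HW LR. */
static bool eoi_bit_only(uint32_t lr)
{
	return (lr & LR_EOI) != 0;
}

/* EOI test with the HW guard, as in lr_signals_eoi_mi(). */
static bool eoi_with_hw_guard(uint32_t lr)
{
	return (lr & LR_EOI) && !(lr & LR_HW);
}
```

Under this assumed layout, a HW LR with physical ID 512 has bit 19 set, so only the guarded test correctly reports no EOI request; a physical ID such as 30 does not alias.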
* [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling @ 2018-03-08 16:09 ` Auger Eric 0 siblings, 0 replies; 50+ messages in thread From: Auger Eric @ 2018-03-08 16:09 UTC (permalink / raw) To: linux-arm-kernel Hi Marc, On 08/03/18 12:54, Marc Zyngier wrote: > On 08/03/18 09:49, Marc Zyngier wrote: >> [updated Christoffer's email address] >> >> Hi Shunyong, >> >> On 08/03/18 07:01, Shunyong Yang wrote: >>> When resampling irqfds is enabled, level interrupt should be >>> de-asserted when resampling happens. On page 4-47 of GIC v3 >>> specification IHI0069D, it said, >>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU >>> interface, the IRI changes the status of the interrupt to active >>> and pending if: >>> ? It is an edge-triggered interrupt, and another edge has been >>> detected since the interrupt was acknowledged. >>> ? It is a level-sensitive interrupt, and the level has not been >>> deasserted since the interrupt was acknowledged." >>> >>> GIC v2 specification IHI0048B.b has similar description on page >>> 3-42 for state machine transition. >>> >>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver >>> in samples/vfio-mdev) triggers a level interrupt, the status >>> transition in LR is pending-->active-->active and pending. >>> Then it will wait resampling to de-assert the interrupt. >>> >>> Current design of lr_signals_eoi_mi() will return false if state >>> in LR is not invalid(Inactive). It causes resampling will not happen >>> in mtty case. >> >> Let me rephrase this, and tell me if I understood it correctly: >> >> - A level interrupt is injected, activated by the guest (LR state=active) >> - guest exits, re-enters, (LR state=pending+active) >> - guest EOIs the interrupt (LR state=pending) >> - maintenance interrupt >> - we don't signal the resampling because we're not in an invalid state >> >> Is that correct? 
>> >> That's an interesting case, because it seems to invalidate some of the >> optimization that went in over a year ago. >> >> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields >> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state >> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation >> >> We could compare the value of the LR before the guest entry with >> the value at exit time, but we still could miss it if we have a >> transition such as P+A -> P -> A and assume a long enough propagation >> delay for the maintenance interrupt (which is very likely). >> >> In essence, we have lost the benefit of EISR, which was to give us a >> way to deal with asynchronous signalling. >> >>> >>> This will cause interrupt fired continuously to guest even 8250 IIR >>> has no interrupt. When 8250's interrupt is configured in shared mode, >>> it will pass interrupt to other drivers to handle. However, there >>> is no other driver involved. Then, a "nobody cared" kernel complaint >>> occurs. 
>>> >>> / # cat /dev/ttyS0 >>> [ 4.826836] random: crng init done >>> [ 6.373620] irq 41: nobody cared (try booting with the "irqpoll" >>> option) >>> [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4 >>> [ 6.378927] Hardware name: linux,dummy-virt (DT) >>> [ 6.380876] Call trace: >>> [ 6.381937] dump_backtrace+0x0/0x180 >>> [ 6.383495] show_stack+0x14/0x1c >>> [ 6.384902] dump_stack+0x90/0xb4 >>> [ 6.386312] __report_bad_irq+0x38/0xe0 >>> [ 6.387944] note_interrupt+0x1f4/0x2b8 >>> [ 6.389568] handle_irq_event_percpu+0x54/0x7c >>> [ 6.391433] handle_irq_event+0x44/0x74 >>> [ 6.393056] handle_fasteoi_irq+0x9c/0x154 >>> [ 6.394784] generic_handle_irq+0x24/0x38 >>> [ 6.396483] __handle_domain_irq+0x60/0xb4 >>> [ 6.398207] gic_handle_irq+0x98/0x1b0 >>> [ 6.399796] el1_irq+0xb0/0x128 >>> [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40 >>> [ 6.403149] __setup_irq+0x41c/0x678 >>> [ 6.404669] request_threaded_irq+0xe0/0x190 >>> [ 6.406474] univ8250_setup_irq+0x208/0x234 >>> [ 6.408250] serial8250_do_startup+0x1b4/0x754 >>> [ 6.410123] serial8250_startup+0x20/0x28 >>> [ 6.411826] uart_startup.part.21+0x78/0x144 >>> [ 6.413633] uart_port_activate+0x50/0x68 >>> [ 6.415328] tty_port_open+0x84/0xd4 >>> [ 6.416851] uart_open+0x34/0x44 >>> [ 6.418229] tty_open+0xec/0x3c8 >>> [ 6.419610] chrdev_open+0xb0/0x198 >>> [ 6.421093] do_dentry_open+0x200/0x310 >>> [ 6.422714] vfs_open+0x54/0x84 >>> [ 6.424054] path_openat+0x2dc/0xf04 >>> [ 6.425569] do_filp_open+0x68/0xd8 >>> [ 6.427044] do_sys_open+0x16c/0x224 >>> [ 6.428563] SyS_openat+0x10/0x18 >>> [ 6.429972] el0_svc_naked+0x30/0x34 >>> [ 6.431494] handlers: >>> [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt >>> [ 6.434597] Disabling IRQ #41 >>> >>> This patch changes the lr state condition in lr_signals_eoi_mi() from >>> invalid(Inactive) to active and pending to avoid this. >>> >>> I am not sure about the original design of the condition of >>> invalid(active). So, This RFC is sent out for comments. 
>>> >>> Cc: Joey Zheng <yu.zheng@hxt-semitech.com> >>> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> >>> --- >>> virt/kvm/arm/vgic/vgic-v2.c | 4 ++-- >>> virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- >>> 2 files changed, 4 insertions(+), 4 deletions(-) >>> >>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c >>> index e9d840a75e7b..740ee9a5f551 100644 >>> --- a/virt/kvm/arm/vgic/vgic-v2.c >>> +++ b/virt/kvm/arm/vgic/vgic-v2.c >>> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) >>> >>> static bool lr_signals_eoi_mi(u32 lr_val) >>> { >>> - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && >>> - !(lr_val & GICH_LR_HW); >>> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) && >> >> That feels very wrong. You're now signalling the resampling in both >> invalid and pending+active, and the latter state doesn't mean you've >> EOIed anything. You're now over-signalling, and signalling the >> wrong event. >> >>> + (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); >>> } >>> >>> /* >>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c >>> index 6b329414e57a..43111bba7af9 100644 >>> --- a/virt/kvm/arm/vgic/vgic-v3.c >>> +++ b/virt/kvm/arm/vgic/vgic-v3.c >>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) >>> >>> static bool lr_signals_eoi_mi(u64 lr_val) >>> { >>> - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && >>> - !(lr_val & ICH_LR_HW); >>> + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) && >>> + (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); >>> } >>> >>> void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) >>> >> >> Assuming I understand the issue correctly, I cannot really see how >> to solve this without reintroducing EISR, which sucks majorly. >> >> I'll try to cook something shortly and we can all have a good >> fight about how crap this is. > > Here's what I came up with. 
I don't really like it, but that's > the least invasive this I could come up with. Please let me > know if that helps with your test case. Note that I have only > boot-tested this on a sample of 1 machine, so I don't expect this > to be perfect. > > Also, any guideline on how to reproduce this would be much appreciated. > I never used this mdev/mtty thing, so please bear with me. > > Thanks, > > M. > > From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 00:00:00 2001 > From: Marc Zyngier <marc.zyngier@arm.com> > Date: Thu, 8 Mar 2018 11:14:06 +0000 > Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to guess EOI MI > status > > We so far rely on the LR state to decide whether the guest has > EOI'd a level interrupt or not. While this looks like a good > idea on the surface, it leads to a couple of annoying corner > cases: > > Example 1: (P = Pending, A = Active, MI = Maintenance Interrupt) > P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> MI > > The state is now pending, we've really EOI'd the interrupt, and > yet lr_signals_eoi_mi() returns false, since the state is not 0. > The result is that we won't signal anything on the corresponding > irqfd, which people complain about. Meh. > > Example 2: > P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI fires In that case aren't we acking the same IRQ occurence twice? > > Same issue: state isn't 0, and nothing happens. > > The core of the problem is that we can't decide on whether an > interrupt has been EOId by just looking at the LR if we ever > want to support the P+A state, as things do change behind our back. > > An alternative to dropping P+A is to bring back our friend EISR, > which indicates which LRs have generated a MI. Instead of dragging > the state around like we used to do, use it to clear the EOI bit > from the in-memory copy, and use that as a predicate to find out > if it fired or not. 
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > --- > virt/kvm/arm/hyp/vgic-v2-sr.c | 8 ++++++++ > virt/kvm/arm/hyp/vgic-v3-sr.c | 6 ++++++ > virt/kvm/arm/vgic/vgic-v2.c | 3 +-- > virt/kvm/arm/vgic/vgic-v3.c | 3 +-- > 4 files changed, 16 insertions(+), 4 deletions(-) > > diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c > index 4fe6e797e8b3..475cb2d7fd33 100644 > --- a/virt/kvm/arm/hyp/vgic-v2-sr.c > +++ b/virt/kvm/arm/hyp/vgic-v2-sr.c > @@ -43,6 +43,11 @@ static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base) > struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2; > int i; > u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs; > + u64 eisr; > + > + eisr = readl_relaxed(base + GICH_EISR0); > + if (unlikely(used_lrs > 32)) > + eisr |= (u64)readl_relaxed(base + GICH_EISR1) << 32; > > for (i = 0; i < used_lrs; i++) { > if (cpu_if->vgic_elrsr & (1UL << i)) > @@ -50,6 +55,9 @@ static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base) > else > cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4)); > > + if ((cpu_if->vgic_lr[i] & GICH_LR_EOI) && !(eisr & (1UL << i))) > + cpu_if->vgic_lr[i] &= ~GICH_LR_EOI; > + > writel_relaxed(0, base + GICH_LR0 + (i * 4)); > } > } > diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c > index b89ce5432214..2ce63d6740b0 100644 > --- a/virt/kvm/arm/hyp/vgic-v3-sr.c > +++ b/virt/kvm/arm/hyp/vgic-v3-sr.c > @@ -223,8 +223,10 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu) > if (used_lrs) { > int i; > u32 nr_pre_bits; > + u32 eisr; > > cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2); > + eisr = read_gicreg(ICH_EISR_EL2); > > write_gicreg(0, ICH_HCR_EL2); > val = read_gicreg(ICH_VTR_EL2); > @@ -236,6 +238,10 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu) > else > cpu_if->vgic_lr[i] = __gic_v3_get_lr(i); > > + if ((cpu_if->vgic_lr[i] & ICH_LR_EOI) && > + !(eisr & (1 << i))) > + cpu_if->vgic_lr[i] &= ~ICH_LR_EOI; > + 
> __gic_v3_set_lr(0, i); > } > > diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c > index e9d840a75e7b..0be616e4ee29 100644 > --- a/virt/kvm/arm/vgic/vgic-v2.c > +++ b/virt/kvm/arm/vgic/vgic-v2.c > @@ -46,8 +46,7 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) > > static bool lr_signals_eoi_mi(u32 lr_val) > { > - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && > - !(lr_val & GICH_LR_HW); > + return (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); Do we still need to test lr_val & GICH_LR_HW? Aren't LR_EOI and LR_HW antagonistic (maybe it is a reminder of architected timer stuff?) > } > > /* > diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c > index 6b329414e57a..c68352b8ed28 100644 > --- a/virt/kvm/arm/vgic/vgic-v3.c > +++ b/virt/kvm/arm/vgic/vgic-v3.c > @@ -35,8 +35,7 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) > > static bool lr_signals_eoi_mi(u64 lr_val) > { > - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && > - !(lr_val & ICH_LR_HW); > + return (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); > } > > void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) > Otherwise looks good to me Thanks Eric
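For reference, the three versions of lr_signals_eoi_mi() that appear in this thread (the original state-based test, the RFC's XOR variant, and the state-agnostic test from Marc's patch) can be compared in a small user-space model. This is a sketch only, not kernel code: the function names are hypothetical, and the bit positions (EOI at bit 41, HW at bit 61, Pending at bit 62, Active at bit 63) are assumed from the GICv3 ICH_LR<n>_EL2 layout. In that layout bit 41 only means "EOI" when HW is clear (otherwise it belongs to the physical INTID field), which is presumably why the !HW term is kept. filter_eoi() models the patch's hyp-side EISR filtering; it says nothing about what real hardware reports in EISR, which the follow-up discussion calls into question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ICH_LR<n>_EL2 bit positions (GICv3); illustration only. */
#define LR_EOI      (1ULL << 41) /* EOI maintenance request, valid when HW == 0 */
#define LR_HW       (1ULL << 61)
#define LR_PENDING  (1ULL << 62)
#define LR_ACTIVE   (1ULL << 63)
#define LR_STATE    (LR_PENDING | LR_ACTIVE)

/* Original test: only an Invalid (state == 0) LR counts as EOI'd. */
static bool eoi_mi_orig(uint64_t lr)
{
	return !(lr & LR_STATE) && (lr & LR_EOI) && !(lr & LR_HW);
}

/* RFC test: the state term is true only when both P and A are set. */
static bool eoi_mi_rfc(uint64_t lr)
{
	return !((lr & LR_STATE) ^ LR_STATE) && (lr & LR_EOI) && !(lr & LR_HW);
}

/* Marc's patch: state ignored, EOI bit trusted after hyp filtering. */
static bool eoi_mi_eisr(uint64_t lr)
{
	return (lr & LR_EOI) && !(lr & LR_HW);
}

/*
 * Models the save path in the patch: clear the EOI bit of LR i
 * unless bit i of the EISR value is set.
 */
static uint64_t filter_eoi(uint64_t lr, uint64_t eisr, unsigned int i)
{
	assert(i < 64);
	if ((lr & LR_EOI) && !(eisr & (1ULL << i)))
		lr &= ~LR_EOI;
	return lr;
}
```

In this model, an LR left Pending after the guest's EOI (Example 1 in the patch description) is rejected by both state-based predicates, and is accepted by the EISR-filtered one only if its EISR bit is set.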
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-08 11:54 ` Marc Zyngier @ 2018-03-08 16:19 ` Christoffer Dall -1 siblings, 0 replies; 50+ messages in thread From: Christoffer Dall @ 2018-03-08 16:19 UTC (permalink / raw) To: Marc Zyngier Cc: Shunyong Yang, ard.biesheuvel, will.deacon, eric.auger, david.daney, linux-arm-kernel, kvmarm, linux-kernel, Joey Zheng On Thu, Mar 08, 2018 at 11:54:27AM +0000, Marc Zyngier wrote: > On 08/03/18 09:49, Marc Zyngier wrote: > > [updated Christoffer's email address] > > > > Hi Shunyong, > > > > On 08/03/18 07:01, Shunyong Yang wrote: > >> When resampling irqfds is enabled, level interrupt should be > >> de-asserted when resampling happens. On page 4-47 of GIC v3 > >> specification IHI0069D, it said, > >> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU > >> interface, the IRI changes the status of the interrupt to active > >> and pending if: > >> • It is an edge-triggered interrupt, and another edge has been > >> detected since the interrupt was acknowledged. > >> • It is a level-sensitive interrupt, and the level has not been > >> deasserted since the interrupt was acknowledged." > >> > >> GIC v2 specification IHI0048B.b has similar description on page > >> 3-42 for state machine transition. > >> > >> When some VFIO device, like mtty(8250 VFIO mdev emulation driver > >> in samples/vfio-mdev) triggers a level interrupt, the status > >> transition in LR is pending-->active-->active and pending. > >> Then it will wait resampling to de-assert the interrupt. > >> > >> Current design of lr_signals_eoi_mi() will return false if state > >> in LR is not invalid(Inactive). It causes resampling will not happen > >> in mtty case. 
> > > > Let me rephrase this, and tell me if I understood it correctly: > > > > - A level interrupt is injected, activated by the guest (LR state=active) > > - guest exits, re-enters, (LR state=pending+active) > > - guest EOIs the interrupt (LR state=pending) > > - maintenance interrupt > > - we don't signal the resampling because we're not in an invalid state > > > > Is that correct? > > > > That's an interesting case, because it seems to invalidate some of the > > optimization that went in over a year ago. > > > > 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields > > b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state > > af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation > > > > We could compare the value of the LR before the guest entry with > > the value at exit time, but we still could miss it if we have a > > transition such as P+A -> P -> A and assume a long enough propagation > > delay for the maintenance interrupt (which is very likely). > > > > In essence, we have lost the benefit of EISR, which was to give us a > > way to deal with asynchronous signalling. > > > >> > >> This will cause interrupt fired continuously to guest even 8250 IIR > >> has no interrupt. When 8250's interrupt is configured in shared mode, > >> it will pass interrupt to other drivers to handle. However, there > >> is no other driver involved. Then, a "nobody cared" kernel complaint > >> occurs. 
> >> > >> / # cat /dev/ttyS0 > >> [ 4.826836] random: crng init done > >> [ 6.373620] irq 41: nobody cared (try booting with the "irqpoll" > >> option) > >> [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4 > >> [ 6.378927] Hardware name: linux,dummy-virt (DT) > >> [ 6.380876] Call trace: > >> [ 6.381937] dump_backtrace+0x0/0x180 > >> [ 6.383495] show_stack+0x14/0x1c > >> [ 6.384902] dump_stack+0x90/0xb4 > >> [ 6.386312] __report_bad_irq+0x38/0xe0 > >> [ 6.387944] note_interrupt+0x1f4/0x2b8 > >> [ 6.389568] handle_irq_event_percpu+0x54/0x7c > >> [ 6.391433] handle_irq_event+0x44/0x74 > >> [ 6.393056] handle_fasteoi_irq+0x9c/0x154 > >> [ 6.394784] generic_handle_irq+0x24/0x38 > >> [ 6.396483] __handle_domain_irq+0x60/0xb4 > >> [ 6.398207] gic_handle_irq+0x98/0x1b0 > >> [ 6.399796] el1_irq+0xb0/0x128 > >> [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40 > >> [ 6.403149] __setup_irq+0x41c/0x678 > >> [ 6.404669] request_threaded_irq+0xe0/0x190 > >> [ 6.406474] univ8250_setup_irq+0x208/0x234 > >> [ 6.408250] serial8250_do_startup+0x1b4/0x754 > >> [ 6.410123] serial8250_startup+0x20/0x28 > >> [ 6.411826] uart_startup.part.21+0x78/0x144 > >> [ 6.413633] uart_port_activate+0x50/0x68 > >> [ 6.415328] tty_port_open+0x84/0xd4 > >> [ 6.416851] uart_open+0x34/0x44 > >> [ 6.418229] tty_open+0xec/0x3c8 > >> [ 6.419610] chrdev_open+0xb0/0x198 > >> [ 6.421093] do_dentry_open+0x200/0x310 > >> [ 6.422714] vfs_open+0x54/0x84 > >> [ 6.424054] path_openat+0x2dc/0xf04 > >> [ 6.425569] do_filp_open+0x68/0xd8 > >> [ 6.427044] do_sys_open+0x16c/0x224 > >> [ 6.428563] SyS_openat+0x10/0x18 > >> [ 6.429972] el0_svc_naked+0x30/0x34 > >> [ 6.431494] handlers: > >> [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt > >> [ 6.434597] Disabling IRQ #41 > >> > >> This patch changes the lr state condition in lr_signals_eoi_mi() from > >> invalid(Inactive) to active and pending to avoid this. > >> > >> I am not sure about the original design of the condition of > >> invalid(active). 
So, This RFC is sent out for comments. > >> > >> Cc: Joey Zheng <yu.zheng@hxt-semitech.com> > >> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> > >> --- > >> virt/kvm/arm/vgic/vgic-v2.c | 4 ++-- > >> virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- > >> 2 files changed, 4 insertions(+), 4 deletions(-) > >> > >> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c > >> index e9d840a75e7b..740ee9a5f551 100644 > >> --- a/virt/kvm/arm/vgic/vgic-v2.c > >> +++ b/virt/kvm/arm/vgic/vgic-v2.c > >> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) > >> > >> static bool lr_signals_eoi_mi(u32 lr_val) > >> { > >> - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && > >> - !(lr_val & GICH_LR_HW); > >> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) && > > > > That feels very wrong. You're now signalling the resampling in both > > invalid and pending+active, and the latter state doesn't mean you've > > EOIed anything. You're now over-signalling, and signalling the > > wrong event. > > > >> + (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); > >> } > >> > >> /* > >> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c > >> index 6b329414e57a..43111bba7af9 100644 > >> --- a/virt/kvm/arm/vgic/vgic-v3.c > >> +++ b/virt/kvm/arm/vgic/vgic-v3.c > >> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) > >> > >> static bool lr_signals_eoi_mi(u64 lr_val) > >> { > >> - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && > >> - !(lr_val & ICH_LR_HW); > >> + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) && > >> + (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); > >> } > >> > >> void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) > >> > > > > Assuming I understand the issue correctly, I cannot really see how > > to solve this without reintroducing EISR, which sucks majorly. > > > > I'll try to cook something shortly and we can all have a good > > fight about how crap this is. 
> > Here's what I came up with. I don't really like it, but that's > the least invasive this I could come up with. Please let me > know if that helps with your test case. Note that I have only > boot-tested this on a sample of 1 machine, so I don't expect this > to be perfect. > > Also, any guideline on how to reproduce this would be much appreciated. > I never used this mdev/mtty thing, so please bear with me. > > Thanks, > > M. > > From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 00:00:00 2001 > From: Marc Zyngier <marc.zyngier@arm.com> > Date: Thu, 8 Mar 2018 11:14:06 +0000 > Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to guess EOI MI > status > > We so far rely on the LR state to decide whether the guest has > EOI'd a level interrupt or not. While this looks like a good > idea on the surface, it leads to a couple of annoying corner > cases: > > Example 1: (P = Pending, A = Active, MI = Maintenance Interrupt) > P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> MI Do we really get an EOI maintenance interrupt here? Reading the MISR and EISR descriptions makes me think this is not the case... > > The state is now pending, we've really EOI'd the interrupt, and > yet lr_signals_eoi_mi() returns false, since the state is not 0. > The result is that we won't signal anything on the corresponding > irqfd, which people complain about. Meh. So the core of the problem is that when we've entered the guest with PENDING+ACTIVE and when we exit (for some reason) we don't signal the resamplefd, right? The solution seems to me that we don't ever do PENDING+ACTIVE if you need to resample after each deactivate. What would be the point of appending a pending state that you only know to be valid after a resample anyway? > > Example 2: > P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI fires We could be more clever and do the following calculation on every exit: If you enter with P, and exit with either A or 0, then signal.
If you enter with P+A, and you exit with either P, A, or 0, then signal. Wouldn't that also solve it? (Although I have a feeling you'd miss some exits in this case). Thanks, -Christoffer
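Christoffer's entry/exit rule can be sketched as a pure function of the LR state written on guest entry and the state read back on exit. This is a hypothetical model: the enum and should_resample() are not kernel APIs, and only the two cases in the rule are covered; any other entry state is assumed to keep relying on the existing EOI maintenance interrupt path. As Christoffer notes, a rule like this can still miss transitions that happen entirely between two exits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical software view of an LR's state field. */
enum lrs {
	LRS_INVALID        = 0,
	LRS_PENDING        = 1,
	LRS_ACTIVE         = 2,
	LRS_PENDING_ACTIVE = LRS_PENDING | LRS_ACTIVE,
};

/*
 * Proposed rule from the thread:
 *   enter P   -> exit A or Invalid      => signal the resamplefd
 *   enter P+A -> exit P, A or Invalid   => signal the resamplefd
 * Other entry states are left to the existing EOI MI handling.
 */
static bool should_resample(enum lrs entered, enum lrs exited)
{
	switch (entered) {
	case LRS_PENDING:
		return exited == LRS_ACTIVE || exited == LRS_INVALID;
	case LRS_PENDING_ACTIVE:
		return exited != LRS_PENDING_ACTIVE;
	default:
		return false;
	}
}
```

With this rule, Example 1 from the patch (enter P+A, guest EOI, exit P) signals the resamplefd, whereas the state-only check in lr_signals_eoi_mi() does not.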
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-08 16:19 ` Christoffer Dall @ 2018-03-08 17:28 ` Marc Zyngier -1 siblings, 0 replies; 50+ messages in thread From: Marc Zyngier @ 2018-03-08 17:28 UTC (permalink / raw) To: Christoffer Dall Cc: Shunyong Yang, ard.biesheuvel, will.deacon, eric.auger, david.daney, linux-arm-kernel, kvmarm, linux-kernel, Joey Zheng On Thu, 08 Mar 2018 16:19:00 +0000, Christoffer Dall wrote: > > On Thu, Mar 08, 2018 at 11:54:27AM +0000, Marc Zyngier wrote: > > On 08/03/18 09:49, Marc Zyngier wrote: > > > [updated Christoffer's email address] > > > > > > Hi Shunyong, > > > > > > On 08/03/18 07:01, Shunyong Yang wrote: > > >> When resampling irqfds is enabled, level interrupt should be > > >> de-asserted when resampling happens. On page 4-47 of GIC v3 > > >> specification IHI0069D, it said, > > >> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU > > >> interface, the IRI changes the status of the interrupt to active > > >> and pending if: > > >> • It is an edge-triggered interrupt, and another edge has been > > >> detected since the interrupt was acknowledged. > > >> • It is a level-sensitive interrupt, and the level has not been > > >> deasserted since the interrupt was acknowledged." > > >> > > >> GIC v2 specification IHI0048B.b has similar description on page > > >> 3-42 for state machine transition. > > >> > > >> When some VFIO device, like mtty(8250 VFIO mdev emulation driver > > >> in samples/vfio-mdev) triggers a level interrupt, the status > > >> transition in LR is pending-->active-->active and pending. > > >> Then it will wait resampling to de-assert the interrupt. > > >> > > >> Current design of lr_signals_eoi_mi() will return false if state > > >> in LR is not invalid(Inactive). It causes resampling will not happen > > >> in mtty case. 
> > > > > > Let me rephrase this, and tell me if I understood it correctly: > > > > > > - A level interrupt is injected, activated by the guest (LR state=active) > > > - guest exits, re-enters, (LR state=pending+active) > > > - guest EOIs the interrupt (LR state=pending) > > > - maintenance interrupt > > > - we don't signal the resampling because we're not in an invalid state > > > > > > Is that correct? > > > > > > That's an interesting case, because it seems to invalidate some of the > > > optimization that went in over a year ago. > > > > > > 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields > > > b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state > > > af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation > > > > > > We could compare the value of the LR before the guest entry with > > > the value at exit time, but we still could miss it if we have a > > > transition such as P+A -> P -> A and assume a long enough propagation > > > delay for the maintenance interrupt (which is very likely). > > > > > > In essence, we have lost the benefit of EISR, which was to give us a > > > way to deal with asynchronous signalling. > > > > > >> > > >> This will cause interrupt fired continuously to guest even 8250 IIR > > >> has no interrupt. When 8250's interrupt is configured in shared mode, > > >> it will pass interrupt to other drivers to handle. However, there > > >> is no other driver involved. Then, a "nobody cared" kernel complaint > > >> occurs. 
> > >> > > >> / # cat /dev/ttyS0 > > >> [ 4.826836] random: crng init done > > >> [ 6.373620] irq 41: nobody cared (try booting with the "irqpoll" > > >> option) > > >> [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4 > > >> [ 6.378927] Hardware name: linux,dummy-virt (DT) > > >> [ 6.380876] Call trace: > > >> [ 6.381937] dump_backtrace+0x0/0x180 > > >> [ 6.383495] show_stack+0x14/0x1c > > >> [ 6.384902] dump_stack+0x90/0xb4 > > >> [ 6.386312] __report_bad_irq+0x38/0xe0 > > >> [ 6.387944] note_interrupt+0x1f4/0x2b8 > > >> [ 6.389568] handle_irq_event_percpu+0x54/0x7c > > >> [ 6.391433] handle_irq_event+0x44/0x74 > > >> [ 6.393056] handle_fasteoi_irq+0x9c/0x154 > > >> [ 6.394784] generic_handle_irq+0x24/0x38 > > >> [ 6.396483] __handle_domain_irq+0x60/0xb4 > > >> [ 6.398207] gic_handle_irq+0x98/0x1b0 > > >> [ 6.399796] el1_irq+0xb0/0x128 > > >> [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40 > > >> [ 6.403149] __setup_irq+0x41c/0x678 > > >> [ 6.404669] request_threaded_irq+0xe0/0x190 > > >> [ 6.406474] univ8250_setup_irq+0x208/0x234 > > >> [ 6.408250] serial8250_do_startup+0x1b4/0x754 > > >> [ 6.410123] serial8250_startup+0x20/0x28 > > >> [ 6.411826] uart_startup.part.21+0x78/0x144 > > >> [ 6.413633] uart_port_activate+0x50/0x68 > > >> [ 6.415328] tty_port_open+0x84/0xd4 > > >> [ 6.416851] uart_open+0x34/0x44 > > >> [ 6.418229] tty_open+0xec/0x3c8 > > >> [ 6.419610] chrdev_open+0xb0/0x198 > > >> [ 6.421093] do_dentry_open+0x200/0x310 > > >> [ 6.422714] vfs_open+0x54/0x84 > > >> [ 6.424054] path_openat+0x2dc/0xf04 > > >> [ 6.425569] do_filp_open+0x68/0xd8 > > >> [ 6.427044] do_sys_open+0x16c/0x224 > > >> [ 6.428563] SyS_openat+0x10/0x18 > > >> [ 6.429972] el0_svc_naked+0x30/0x34 > > >> [ 6.431494] handlers: > > >> [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt > > >> [ 6.434597] Disabling IRQ #41 > > >> > > >> This patch changes the lr state condition in lr_signals_eoi_mi() from > > >> invalid(Inactive) to active and pending to avoid this. 
> > >> > > >> I am not sure about the original design of the condition of > > >> invalid(active). So, This RFC is sent out for comments. > > >> > > >> Cc: Joey Zheng <yu.zheng@hxt-semitech.com> > > >> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> > > >> --- > > >> virt/kvm/arm/vgic/vgic-v2.c | 4 ++-- > > >> virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- > > >> 2 files changed, 4 insertions(+), 4 deletions(-) > > >> > > >> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c > > >> index e9d840a75e7b..740ee9a5f551 100644 > > >> --- a/virt/kvm/arm/vgic/vgic-v2.c > > >> +++ b/virt/kvm/arm/vgic/vgic-v2.c > > >> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) > > >> > > >> static bool lr_signals_eoi_mi(u32 lr_val) > > >> { > > >> - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && > > >> - !(lr_val & GICH_LR_HW); > > >> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) && > > > > > > That feels very wrong. You're now signalling the resampling in both > > > invalid and pending+active, and the latter state doesn't mean you've > > > EOIed anything. You're now over-signalling, and signalling the > > > wrong event. 
> > > > > >> + (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); > > >> } > > >> > > >> /* > > >> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c > > >> index 6b329414e57a..43111bba7af9 100644 > > >> --- a/virt/kvm/arm/vgic/vgic-v3.c > > >> +++ b/virt/kvm/arm/vgic/vgic-v3.c > > >> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) > > >> > > >> static bool lr_signals_eoi_mi(u64 lr_val) > > >> { > > >> - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && > > >> - !(lr_val & ICH_LR_HW); > > >> + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) && > > >> + (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); > > >> } > > >> > > >> void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) > > >> > > > > > > Assuming I understand the issue correctly, I cannot really see how > > > to solve this without reintroducing EISR, which sucks majorly. > > > > > > I'll try to cook something shortly and we can all have a good > > > fight about how crap this is. > > > > Here's what I came up with. I don't really like it, but that's > > the least invasive this I could come up with. Please let me > > know if that helps with your test case. Note that I have only > > boot-tested this on a sample of 1 machine, so I don't expect this > > to be perfect. > > > > Also, any guideline on how to reproduce this would be much appreciated. > > I never used this mdev/mtty thing, so please bear with me. > > > > Thanks, > > > > M. > > > > From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 00:00:00 2001 > > From: Marc Zyngier <marc.zyngier@arm.com> > > Date: Thu, 8 Mar 2018 11:14:06 +0000 > > Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to guess EOI MI > > status > > > > We so far rely on the LR state to decide whether the guest has > > EOI'd a level interrupt or not. 
While this looks like a good > > idea on the surface, it leads to a couple of annoying corner > > cases: > > > > Example 1: (P = Pending, A = Active, MI = Maintenance Interrupt) > > P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> MI > > Do we really get an EOI maintenance interrupt here? Reading the MISR > and EISR descriptions make me thing this is not the case... Yeah, it looks like I always want EISR to do what I want, and not to do what it does. Man, this thing is such a piece of crap. OK, scratch that. We need to do it without the help of the HW. > > The state is now pending, we've really EOI'd the interrupt, and > > yet lr_signals_eoi_mi() returns false, since the state is not 0. > > The result is that we won't signal anything on the corresponding > > irqfd, which people complain about. Meh. > > So the core of the problem is that when we've entered the guest with > PENDING+ACTIVE and when we exit (for some reason) we don't signal the > resamplefd, right? The solution seems to me that we don't ever do > PENDING+ACTIVE if you need to resample after each deactivate. What > would be the point of appending a pending state that you only know to be > valid after a resample anyway? The question is then to identify that a given source needs to be signalled back to VFIO. Calling into the eventfd code on the hot path is pretty horrid (I'm not sure if we can really call into this with interrupts disabled, for example). > > > > > Example 2: > > P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI fires > > We could be more clever and do the following calculation on every exit: > > If you enter with P, and exit with either A or 0, then signal. > > If you enter with P+A, and you exit with either P, A, or 0, then signal. > > Wouldn't that also solve it? (Although I have a feeling you'd miss some > exits in this case). I'd be more confident if we did forbid P+A for such interrupts altogether, as they really feel like another kind of HW interrupt. 
Eric: Is there any way to get a callback from the eventfd code to flag a given irq as requiring a notification on EOI? Thanks, M. -- Jazz is not dead, it just smell funny.
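The alternative that both Christoffer and Marc lean towards, never presenting Pending+Active for an interrupt that needs resampling, might look roughly like this when an LR is built. Everything here is an assumption for illustration: struct virq, its needs_resample flag and build_lr() are hypothetical, the bit positions follow the assumed GICv3 ICH_LR layout, and this sketch always requests an EOI maintenance interrupt for a level interrupt. How KVM would learn that a given interrupt needs resampling is exactly the open question Marc raises above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ICH_LR<n>_EL2 bit positions (GICv3); illustration only. */
#define LR_EOI      (1ULL << 41)
#define LR_PENDING  (1ULL << 62)
#define LR_ACTIVE   (1ULL << 63)

/* Hypothetical per-interrupt state used only by this sketch. */
struct virq {
	uint32_t intid;      /* virtual INTID, bits [31:0] of the LR */
	bool line_level;     /* current level of the input line */
	bool active;         /* acknowledged by the guest, not yet deactivated */
	bool needs_resample; /* hypothetical: backed by a resampling irqfd */
};

/*
 * Build an LR for a level-triggered interrupt. For an interrupt that
 * needs resampling, Pending is withheld while it is Active, so the
 * guest's EOI takes the LR to Invalid and the existing state-based
 * EOI check can see it.
 */
static uint64_t build_lr(const struct virq *irq)
{
	uint64_t lr = irq->intid;

	if (irq->active)
		lr |= LR_ACTIVE;
	if (irq->line_level && !(irq->active && irq->needs_resample))
		lr |= LR_PENDING;

	lr |= LR_EOI; /* request an EOI maintenance interrupt */
	return lr;
}
```

The trade-off is that a line that is still asserted only shows up as pending again once the LR is rewritten after the resample, which is the behaviour Christoffer argues is wanted anyway.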
* [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling @ 2018-03-08 17:28 ` Marc Zyngier 0 siblings, 0 replies; 50+ messages in thread From: Marc Zyngier @ 2018-03-08 17:28 UTC (permalink / raw) To: linux-arm-kernel On Thu, 08 Mar 2018 16:19:00 +0000, Christoffer Dall wrote: > > On Thu, Mar 08, 2018 at 11:54:27AM +0000, Marc Zyngier wrote: > > On 08/03/18 09:49, Marc Zyngier wrote: > > > [updated Christoffer's email address] > > > > > > Hi Shunyong, > > > > > > On 08/03/18 07:01, Shunyong Yang wrote: > > >> When resampling irqfds is enabled, level interrupt should be > > >> de-asserted when resampling happens. On page 4-47 of GIC v3 > > >> specification IHI0069D, it said, > > >> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU > > >> interface, the IRI changes the status of the interrupt to active > > >> and pending if: > > >> • It is an edge-triggered interrupt, and another edge has been > > >> detected since the interrupt was acknowledged. > > >> • It is a level-sensitive interrupt, and the level has not been > > >> deasserted since the interrupt was acknowledged." > > >> > > >> GIC v2 specification IHI0048B.b has similar description on page > > >> 3-42 for state machine transition. > > >> > > >> When some VFIO device, like mtty(8250 VFIO mdev emulation driver > > >> in samples/vfio-mdev) triggers a level interrupt, the status > > >> transition in LR is pending-->active-->active and pending. > > >> Then it will wait resampling to de-assert the interrupt. > > >> > > >> Current design of lr_signals_eoi_mi() will return false if state > > >> in LR is not invalid(Inactive). It causes resampling will not happen > > >> in mtty case.
> > > > > > Let me rephrase this, and tell me if I understood it correctly: > > > > > > - A level interrupt is injected, activated by the guest (LR state=active) > > > - guest exits, re-enters, (LR state=pending+active) > > > - guest EOIs the interrupt (LR state=pending) > > > - maintenance interrupt > > > - we don't signal the resampling because we're not in an invalid state > > > > > > Is that correct? > > > > > > That's an interesting case, because it seems to invalidate some of the > > > optimization that went in over a year ago. > > > > > > 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields > > > b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state > > > af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation > > > > > > We could compare the value of the LR before the guest entry with > > > the value at exit time, but we still could miss it if we have a > > > transition such as P+A -> P -> A and assume a long enough propagation > > > delay for the maintenance interrupt (which is very likely). > > > > > > In essence, we have lost the benefit of EISR, which was to give us a > > > way to deal with asynchronous signalling. > > > > > >> > > >> This will cause interrupt fired continuously to guest even 8250 IIR > > >> has no interrupt. When 8250's interrupt is configured in shared mode, > > >> it will pass interrupt to other drivers to handle. However, there > > >> is no other driver involved. Then, a "nobody cared" kernel complaint > > >> occurs. 
> > >> > > >> / # cat /dev/ttyS0 > > >> [ 4.826836] random: crng init done > > >> [ 6.373620] irq 41: nobody cared (try booting with the "irqpoll" > > >> option) > > >> [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4 > > >> [ 6.378927] Hardware name: linux,dummy-virt (DT) > > >> [ 6.380876] Call trace: > > >> [ 6.381937] dump_backtrace+0x0/0x180 > > >> [ 6.383495] show_stack+0x14/0x1c > > >> [ 6.384902] dump_stack+0x90/0xb4 > > >> [ 6.386312] __report_bad_irq+0x38/0xe0 > > >> [ 6.387944] note_interrupt+0x1f4/0x2b8 > > >> [ 6.389568] handle_irq_event_percpu+0x54/0x7c > > >> [ 6.391433] handle_irq_event+0x44/0x74 > > >> [ 6.393056] handle_fasteoi_irq+0x9c/0x154 > > >> [ 6.394784] generic_handle_irq+0x24/0x38 > > >> [ 6.396483] __handle_domain_irq+0x60/0xb4 > > >> [ 6.398207] gic_handle_irq+0x98/0x1b0 > > >> [ 6.399796] el1_irq+0xb0/0x128 > > >> [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40 > > >> [ 6.403149] __setup_irq+0x41c/0x678 > > >> [ 6.404669] request_threaded_irq+0xe0/0x190 > > >> [ 6.406474] univ8250_setup_irq+0x208/0x234 > > >> [ 6.408250] serial8250_do_startup+0x1b4/0x754 > > >> [ 6.410123] serial8250_startup+0x20/0x28 > > >> [ 6.411826] uart_startup.part.21+0x78/0x144 > > >> [ 6.413633] uart_port_activate+0x50/0x68 > > >> [ 6.415328] tty_port_open+0x84/0xd4 > > >> [ 6.416851] uart_open+0x34/0x44 > > >> [ 6.418229] tty_open+0xec/0x3c8 > > >> [ 6.419610] chrdev_open+0xb0/0x198 > > >> [ 6.421093] do_dentry_open+0x200/0x310 > > >> [ 6.422714] vfs_open+0x54/0x84 > > >> [ 6.424054] path_openat+0x2dc/0xf04 > > >> [ 6.425569] do_filp_open+0x68/0xd8 > > >> [ 6.427044] do_sys_open+0x16c/0x224 > > >> [ 6.428563] SyS_openat+0x10/0x18 > > >> [ 6.429972] el0_svc_naked+0x30/0x34 > > >> [ 6.431494] handlers: > > >> [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt > > >> [ 6.434597] Disabling IRQ #41 > > >> > > >> This patch changes the lr state condition in lr_signals_eoi_mi() from > > >> invalid(Inactive) to active and pending to avoid this. 
> > >> > > >> I am not sure about the original design of the condition of > > >> invalid(active). So, This RFC is sent out for comments. > > >> > > >> Cc: Joey Zheng <yu.zheng@hxt-semitech.com> > > >> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> > > >> --- > > >> virt/kvm/arm/vgic/vgic-v2.c | 4 ++-- > > >> virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- > > >> 2 files changed, 4 insertions(+), 4 deletions(-) > > >> > > >> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c > > >> index e9d840a75e7b..740ee9a5f551 100644 > > >> --- a/virt/kvm/arm/vgic/vgic-v2.c > > >> +++ b/virt/kvm/arm/vgic/vgic-v2.c > > >> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) > > >> > > >> static bool lr_signals_eoi_mi(u32 lr_val) > > >> { > > >> - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && > > >> - !(lr_val & GICH_LR_HW); > > >> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) && > > > > > > That feels very wrong. You're now signalling the resampling in both > > > invalid and pending+active, and the latter state doesn't mean you've > > > EOIed anything. You're now over-signalling, and signalling the > > > wrong event. 
> > > > > >> + (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); > > >> } > > >> > > >> /* > > >> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c > > >> index 6b329414e57a..43111bba7af9 100644 > > >> --- a/virt/kvm/arm/vgic/vgic-v3.c > > >> +++ b/virt/kvm/arm/vgic/vgic-v3.c > > >> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) > > >> > > >> static bool lr_signals_eoi_mi(u64 lr_val) > > >> { > > >> - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && > > >> - !(lr_val & ICH_LR_HW); > > >> + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) && > > >> + (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); > > >> } > > >> > > >> void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) > > >> > > > > > > Assuming I understand the issue correctly, I cannot really see how > > > to solve this without reintroducing EISR, which sucks majorly. > > > > > > I'll try to cook something shortly and we can all have a good > > > fight about how crap this is. > > > > Here's what I came up with. I don't really like it, but that's > > the least invasive this I could come up with. Please let me > > know if that helps with your test case. Note that I have only > > boot-tested this on a sample of 1 machine, so I don't expect this > > to be perfect. > > > > Also, any guideline on how to reproduce this would be much appreciated. > > I never used this mdev/mtty thing, so please bear with me. > > > > Thanks, > > > > M. > > > > From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 00:00:00 2001 > > From: Marc Zyngier <marc.zyngier@arm.com> > > Date: Thu, 8 Mar 2018 11:14:06 +0000 > > Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to guess EOI MI > > status > > > > We so far rely on the LR state to decide whether the guest has > > EOI'd a level interrupt or not. 
While this looks like a good > > idea on the surface, it leads to a couple of annoying corner > > cases: > > > > Example 1: (P = Pending, A = Active, MI = Maintenance Interrupt) > > P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> MI > > Do we really get an EOI maintenance interrupt here? Reading the MISR > and EISR descriptions make me thing this is not the case... Yeah, it looks like I always want EISR to do what I want, and not to do what it does. Man, this thing is such a piece of crap. OK, scratch that. We need to do it without the help of the HW. > > The state is now pending, we've really EOI'd the interrupt, and > > yet lr_signals_eoi_mi() returns false, since the state is not 0. > > The result is that we won't signal anything on the corresponding > > irqfd, which people complain about. Meh. > > So the core of the problem is that when we've entered the guest with > PENDING+ACTIVE and when we exit (for some reason) we don't signal the > resamplefd, right? The solution seems to me that we don't ever do > PENDING+ACTIVE if you need to resample after each deactivate. What > would be the point of appending a pending state that you only know to be > valid after a resample anyway? The question is then to identify that a given source needs to be signalled back to VFIO. Calling into the eventfd code on the hot path is pretty horrid (I'm not sure if we can really call into this with interrupts disabled, for example). > > > > > Example 2: > > P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI fires > > We could be more clever and do the following calculation on every exit: > > If you enter with P, and exit with either A or 0, then signal. > > If you enter with P+A, and you exit with either P, A, or 0, then signal. > > Wouldn't that also solve it? (Although I have a feeling you'd miss some > exits in this case). I'd be more confident if we did forbid P+A for such interrupts altogether, as they really feel like another kind of HW interrupt. 
Eric: Is there any way to get a callback from the eventfd code to flag a given irq as requiring a notification on EOI? Thanks, M. -- Jazz is not dead, it just smell funny. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-08 17:28 ` Marc Zyngier @ 2018-03-08 18:12 ` Auger Eric -1 siblings, 0 replies; 50+ messages in thread From: Auger Eric @ 2018-03-08 18:12 UTC (permalink / raw) To: Marc Zyngier, Christoffer Dall Cc: Shunyong Yang, ard.biesheuvel, will.deacon, david.daney, linux-arm-kernel, kvmarm, linux-kernel, Joey Zheng Hi Marc, Christoffer, On 08/03/18 18:28, Marc Zyngier wrote: > On Thu, 08 Mar 2018 16:19:00 +0000, > Christoffer Dall wrote: >> >> On Thu, Mar 08, 2018 at 11:54:27AM +0000, Marc Zyngier wrote: >>> On 08/03/18 09:49, Marc Zyngier wrote: >>>> [updated Christoffer's email address] >>>> >>>> Hi Shunyong, >>>> >>>> On 08/03/18 07:01, Shunyong Yang wrote: >>>>> When resampling irqfds is enabled, level interrupt should be >>>>> de-asserted when resampling happens. On page 4-47 of GIC v3 >>>>> specification IHI0069D, it said, >>>>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU >>>>> interface, the IRI changes the status of the interrupt to active >>>>> and pending if: >>>>> • It is an edge-triggered interrupt, and another edge has been >>>>> detected since the interrupt was acknowledged. >>>>> • It is a level-sensitive interrupt, and the level has not been >>>>> deasserted since the interrupt was acknowledged." >>>>> >>>>> GIC v2 specification IHI0048B.b has similar description on page >>>>> 3-42 for state machine transition. >>>>> >>>>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver >>>>> in samples/vfio-mdev) triggers a level interrupt, the status >>>>> transition in LR is pending-->active-->active and pending. >>>>> Then it will wait resampling to de-assert the interrupt. >>>>> >>>>> Current design of lr_signals_eoi_mi() will return false if state >>>>> in LR is not invalid(Inactive). It causes resampling will not happen >>>>> in mtty case. 
>>>> >>>> Let me rephrase this, and tell me if I understood it correctly: >>>> >>>> - A level interrupt is injected, activated by the guest (LR state=active) >>>> - guest exits, re-enters, (LR state=pending+active) >>>> - guest EOIs the interrupt (LR state=pending) >>>> - maintenance interrupt >>>> - we don't signal the resampling because we're not in an invalid state >>>> >>>> Is that correct? >>>> >>>> That's an interesting case, because it seems to invalidate some of the >>>> optimization that went in over a year ago. >>>> >>>> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields >>>> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state >>>> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation >>>> >>>> We could compare the value of the LR before the guest entry with >>>> the value at exit time, but we still could miss it if we have a >>>> transition such as P+A -> P -> A and assume a long enough propagation >>>> delay for the maintenance interrupt (which is very likely). >>>> >>>> In essence, we have lost the benefit of EISR, which was to give us a >>>> way to deal with asynchronous signalling. >>>> >>>>> >>>>> This will cause interrupt fired continuously to guest even 8250 IIR >>>>> has no interrupt. When 8250's interrupt is configured in shared mode, >>>>> it will pass interrupt to other drivers to handle. However, there >>>>> is no other driver involved. Then, a "nobody cared" kernel complaint >>>>> occurs. 
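The sequence Marc rephrases above can be replayed on a raw GICv2 list-register value to see why a state-only check misses the EOI. This is a simplified model written for illustration: the bit positions follow the GICH_LR layout (State in bits [29:28], EOI at bit 19, HW at bit 31) and are assumed to match the kernel's `arm-gic.h` definitions, `lr_signals_eoi_mi()` is adapted from the vgic-v2.c context shown in the patch, and `guest_iar()`, `guest_eoi()` and `replay_example_1()` are hypothetical helpers that ignore priorities and everything else a real GIC tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICv2 GICH_LR layout (mirrors include/linux/irqchip/arm-gic.h). */
#define GICH_LR_PENDING_BIT	(1U << 28)
#define GICH_LR_ACTIVE_BIT	(1U << 29)
#define GICH_LR_STATE		(GICH_LR_PENDING_BIT | GICH_LR_ACTIVE_BIT)
#define GICH_LR_EOI		(1U << 19)
#define GICH_LR_HW		(1U << 31)

/* Unpatched check: only an invalid (0b00) LR counts as EOI'd. */
static bool lr_signals_eoi_mi(uint32_t lr_val)
{
	return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
	       !(lr_val & GICH_LR_HW);
}

/* Hypothetical: the guest acknowledges a pending-only LR (P -> A). */
static uint32_t guest_iar(uint32_t lr)
{
	if ((lr & GICH_LR_STATE) == GICH_LR_PENDING_BIT)
		lr = (lr & ~GICH_LR_STATE) | GICH_LR_ACTIVE_BIT;
	return lr;
}

/* Hypothetical: guest EOI/deactivation clears only the active bit. */
static uint32_t guest_eoi(uint32_t lr)
{
	return lr & ~GICH_LR_ACTIVE_BIT;
}

/* Replays P -> IAR -> A -> exit/entry -> P+A -> EOI -> P. */
static uint32_t replay_example_1(void)
{
	uint32_t lr = GICH_LR_PENDING_BIT | GICH_LR_EOI;	/* level IRQ injected */

	lr = guest_iar(lr);
	assert((lr & GICH_LR_STATE) == GICH_LR_ACTIVE_BIT);

	/* exit/entry: pending is set again while the line is still high */
	lr |= GICH_LR_PENDING_BIT;
	assert((lr & GICH_LR_STATE) == GICH_LR_STATE);

	lr = guest_eoi(lr);
	assert((lr & GICH_LR_STATE) == GICH_LR_PENDING_BIT);
	return lr;
}
```

In this model, `lr_signals_eoi_mi(replay_example_1())` is false even though the guest did EOI the interrupt, whereas the plain P -> IAR -> A -> EOI -> 0 path makes it return true.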
>>>>> >>>>> / # cat /dev/ttyS0 >>>>> [ 4.826836] random: crng init done >>>>> [ 6.373620] irq 41: nobody cared (try booting with the "irqpoll" >>>>> option) >>>>> [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4 >>>>> [ 6.378927] Hardware name: linux,dummy-virt (DT) >>>>> [ 6.380876] Call trace: >>>>> [ 6.381937] dump_backtrace+0x0/0x180 >>>>> [ 6.383495] show_stack+0x14/0x1c >>>>> [ 6.384902] dump_stack+0x90/0xb4 >>>>> [ 6.386312] __report_bad_irq+0x38/0xe0 >>>>> [ 6.387944] note_interrupt+0x1f4/0x2b8 >>>>> [ 6.389568] handle_irq_event_percpu+0x54/0x7c >>>>> [ 6.391433] handle_irq_event+0x44/0x74 >>>>> [ 6.393056] handle_fasteoi_irq+0x9c/0x154 >>>>> [ 6.394784] generic_handle_irq+0x24/0x38 >>>>> [ 6.396483] __handle_domain_irq+0x60/0xb4 >>>>> [ 6.398207] gic_handle_irq+0x98/0x1b0 >>>>> [ 6.399796] el1_irq+0xb0/0x128 >>>>> [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40 >>>>> [ 6.403149] __setup_irq+0x41c/0x678 >>>>> [ 6.404669] request_threaded_irq+0xe0/0x190 >>>>> [ 6.406474] univ8250_setup_irq+0x208/0x234 >>>>> [ 6.408250] serial8250_do_startup+0x1b4/0x754 >>>>> [ 6.410123] serial8250_startup+0x20/0x28 >>>>> [ 6.411826] uart_startup.part.21+0x78/0x144 >>>>> [ 6.413633] uart_port_activate+0x50/0x68 >>>>> [ 6.415328] tty_port_open+0x84/0xd4 >>>>> [ 6.416851] uart_open+0x34/0x44 >>>>> [ 6.418229] tty_open+0xec/0x3c8 >>>>> [ 6.419610] chrdev_open+0xb0/0x198 >>>>> [ 6.421093] do_dentry_open+0x200/0x310 >>>>> [ 6.422714] vfs_open+0x54/0x84 >>>>> [ 6.424054] path_openat+0x2dc/0xf04 >>>>> [ 6.425569] do_filp_open+0x68/0xd8 >>>>> [ 6.427044] do_sys_open+0x16c/0x224 >>>>> [ 6.428563] SyS_openat+0x10/0x18 >>>>> [ 6.429972] el0_svc_naked+0x30/0x34 >>>>> [ 6.431494] handlers: >>>>> [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt >>>>> [ 6.434597] Disabling IRQ #41 >>>>> >>>>> This patch changes the lr state condition in lr_signals_eoi_mi() from >>>>> invalid(Inactive) to active and pending to avoid this. 
>>>>> >>>>> I am not sure about the original design of the condition of >>>>> invalid(active). So, This RFC is sent out for comments. >>>>> >>>>> Cc: Joey Zheng <yu.zheng@hxt-semitech.com> >>>>> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> >>>>> --- >>>>> virt/kvm/arm/vgic/vgic-v2.c | 4 ++-- >>>>> virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- >>>>> 2 files changed, 4 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c >>>>> index e9d840a75e7b..740ee9a5f551 100644 >>>>> --- a/virt/kvm/arm/vgic/vgic-v2.c >>>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c >>>>> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) >>>>> >>>>> static bool lr_signals_eoi_mi(u32 lr_val) >>>>> { >>>>> - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && >>>>> - !(lr_val & GICH_LR_HW); >>>>> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) && >>>> >>>> That feels very wrong. You're now signalling the resampling in both >>>> invalid and pending+active, and the latter state doesn't mean you've >>>> EOIed anything. You're now over-signalling, and signalling the >>>> wrong event. 
>>>> >>>>> + (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); >>>>> } >>>>> >>>>> /* >>>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c >>>>> index 6b329414e57a..43111bba7af9 100644 >>>>> --- a/virt/kvm/arm/vgic/vgic-v3.c >>>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c >>>>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) >>>>> >>>>> static bool lr_signals_eoi_mi(u64 lr_val) >>>>> { >>>>> - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && >>>>> - !(lr_val & ICH_LR_HW); >>>>> + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) && >>>>> + (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); >>>>> } >>>>> >>>>> void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) >>>>> >>>> >>>> Assuming I understand the issue correctly, I cannot really see how >>>> to solve this without reintroducing EISR, which sucks majorly. >>>> >>>> I'll try to cook something shortly and we can all have a good >>>> fight about how crap this is. >>> >>> Here's what I came up with. I don't really like it, but that's >>> the least invasive thing I could come up with. Please let me >>> know if that helps with your test case. Note that I have only >>> boot-tested this on a sample of 1 machine, so I don't expect this >>> to be perfect. >>> >>> Also, any guideline on how to reproduce this would be much appreciated. >>> I never used this mdev/mtty thing, so please bear with me. >>> >>> Thanks, >>> >>> M. >>> >>> From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 00:00:00 2001 >>> From: Marc Zyngier <marc.zyngier@arm.com> >>> Date: Thu, 8 Mar 2018 11:14:06 +0000 >>> Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to guess EOI MI >>> status >>> >>> We so far rely on the LR state to decide whether the guest has >>> EOI'd a level interrupt or not.
>>> While this looks like a good >>> idea on the surface, it leads to a couple of annoying corner >>> cases: >>> >>> Example 1: (P = Pending, A = Active, MI = Maintenance Interrupt) >>> P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> MI >> >> Do we really get an EOI maintenance interrupt here? Reading the MISR >> and EISR descriptions makes me think this is not the case... Hmm, yes: the EISR description says ICH_LR.State = 0b00! > > Yeah, it looks like I always want EISR to do what I want, and not to > do what it does. Man, this thing is such a piece of crap. > > OK, scratch that. We need to do it without the help of the HW. > >>> The state is now pending, we've really EOI'd the interrupt, and >>> yet lr_signals_eoi_mi() returns false, since the state is not 0. >>> The result is that we won't signal anything on the corresponding >>> irqfd, which people complain about. Meh. >> >> So the core of the problem is that when we've entered the guest with >> PENDING+ACTIVE and when we exit (for some reason) we don't signal the >> resamplefd, right? The solution seems to me that we don't ever do >> PENDING+ACTIVE if you need to resample after each deactivate. What >> would be the point of appending a pending state that you only know to be >> valid after a resample anyway? > > The question is then to identify that a given source needs to be > signalled back to VFIO. Calling into the eventfd code on the hot path > is pretty horrid (I'm not sure if we can really call into this with > interrupts disabled, for example). > >> >>> >>> Example 2: >>> P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI fires >> >> We could be more clever and do the following calculation on every exit: >> >> If you enter with P, and exit with either A or 0, then signal. >> >> If you enter with P+A, and you exit with either P, A, or 0, then signal. >> >> Wouldn't that also solve it? (Although I have a feeling you'd miss some >> exits in this case).
> > I'd be more confident if we did forbid P+A for such interrupts > altogether, as they really feel like another kind of HW interrupt. The P+A LR state looks strange to me too, all the more so since it might cause the same IRQ to be acked twice. P -> A -> 0 (resample). Doesn't our issue come from the fact that we reinject P in the LR until the line level is deasserted? > > Eric: Is there any way to get a callback from the eventfd code to flag > a given irq as requiring a notification on EOI? bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin) was used in the past. I think it does what you want. Thanks, Eric > > Thanks, > > M. > ^ permalink raw reply [flat|nested] 50+ messages in thread
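To make the discussion of the RFC condition concrete, both versions of the GICv2 predicate from the diff can be evaluated for each of the four LR State encodings, with the EOI bit set and the HW bit clear. This is a standalone sketch under an assumed GICH_LR bit layout mirroring the kernel's `arm-gic.h`; `old_signals_eoi_mi()`, `rfc_signals_eoi_mi()` and `signal_table()` are hypothetical names for the two bodies shown in the patch and a small printing helper. As written, `!((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE)` is true only when both state bits are set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed GICv2 GICH_LR layout (mirrors include/linux/irqchip/arm-gic.h). */
#define GICH_LR_PENDING_BIT	(1U << 28)
#define GICH_LR_ACTIVE_BIT	(1U << 29)
#define GICH_LR_STATE		(GICH_LR_PENDING_BIT | GICH_LR_ACTIVE_BIT)
#define GICH_LR_EOI		(1U << 19)
#define GICH_LR_HW		(1U << 31)

/* Body before the RFC: true only for State == 0b00 (invalid). */
static bool old_signals_eoi_mi(uint32_t lr_val)
{
	return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
	       !(lr_val & GICH_LR_HW);
}

/* Body proposed by the RFC: the XOR is zero only for State == 0b11. */
static bool rfc_signals_eoi_mi(uint32_t lr_val)
{
	return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
	       (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
}

/* Prints which of the four State encodings each predicate accepts. */
static void signal_table(void)
{
	static const struct {
		const char *name;
		uint32_t state;
	} states[] = {
		{ "invalid", 0 },
		{ "P", GICH_LR_PENDING_BIT },
		{ "A", GICH_LR_ACTIVE_BIT },
		{ "P+A", GICH_LR_STATE },
	};

	for (size_t i = 0; i < sizeof(states) / sizeof(states[0]); i++) {
		uint32_t lr = states[i].state | GICH_LR_EOI;	/* HW bit clear */

		printf("%-8s old=%d rfc=%d\n", states[i].name,
		       old_signals_eoi_mi(lr), rfc_signals_eoi_mi(lr));
	}
}
```

On the assumed layout, `signal_table()` reports `old=1 rfc=0` for invalid, `old=0 rfc=1` for P+A, and `old=0 rfc=0` for P and for A alone, so neither predicate fires on the pending-only state left behind in Example 1.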
* Re: Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-08 18:12 ` Auger Eric @ 2018-03-09 3:14 ` Yang, Shunyong -1 siblings, 0 replies; 50+ messages in thread From: Yang, Shunyong @ 2018-03-09 3:14 UTC (permalink / raw) To: eric.auger, marc.zyngier, cdall Cc: linux-arm-kernel, linux-kernel, david.daney, ard.biesheuvel, kvmarm, will.deacon, Zheng, Joey Hi, Eric, Marc and Christoffer, On Thu, 2018-03-08 at 19:12 +0100, Auger Eric wrote: > Hi Marc, Christoffer, > > On 08/03/18 18:28, Marc Zyngier wrote: > > > > On Thu, 08 Mar 2018 16:19:00 +0000, > > Christoffer Dall wrote: > > > > > > > > > On Thu, Mar 08, 2018 at 11:54:27AM +0000, Marc Zyngier wrote: > > > > > > > > On 08/03/18 09:49, Marc Zyngier wrote: > > > > > > > > > > [updated Christoffer's email address] > > > > > > > > > > Hi Shunyong, > > > > > > > > > > On 08/03/18 07:01, Shunyong Yang wrote: > > > > > > > > > > > > When resampling irqfds is enabled, level interrupt should be > > > > > > de-asserted when resampling happens. On page 4-47 of GIC v3 > > > > > > specification IHI0069D, it said, > > > > > > "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU > > > > > > interface, the IRI changes the status of the interrupt to active > > > > > > and pending if: > > > > > > • It is an edge-triggered interrupt, and another edge has been > > > > > > detected since the interrupt was acknowledged. > > > > > > • It is a level-sensitive interrupt, and the level has not been > > > > > > deasserted since the interrupt was acknowledged." > > > > > > > > > > > > GIC v2 specification IHI0048B.b has similar description on page > > > > > > 3-42 for state machine transition.
> > > > > > > > > > > > When some VFIO device, like mtty(8250 VFIO mdev emulation > > > > > > driver > > > > > > in samples/vfio-mdev) triggers a level interrupt, the > > > > > > status > > > > > > transition in LR is pending-->active-->active and pending. > > > > > > Then it will wait resampling to de-assert the interrupt. > > > > > > > > > > > > Current design of lr_signals_eoi_mi() will return false if > > > > > > state > > > > > > in LR is not invalid(Inactive). It causes resampling will > > > > > > not happen > > > > > > in mtty case. > > > > > Let me rephrase this, and tell me if I understood it > > > > > correctly: > > > > > > > > > > - A level interrupt is injected, activated by the guest (LR > > > > > state=active) > > > > > - guest exits, re-enters, (LR state=pending+active) > > > > > - guest EOIs the interrupt (LR state=pending) > > > > > - maintenance interrupt > > > > > - we don't signal the resampling because we're not in an > > > > > invalid state > > > > > > > > > > Is that correct? > > > > > > > > > > That's an interesting case, because it seems to invalidate > > > > > some of the > > > > > optimization that went in over a year ago. > > > > > > > > > > 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR > > > > > fields > > > > > b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary > > > > > save_maint_int_state > > > > > af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary > > > > > process_maintenance operation > > > > > > > > > > We could compare the value of the LR before the guest entry > > > > > with > > > > > the value at exit time, but we still could miss it if we have > > > > > a > > > > > transition such as P+A -> P -> A and assume a long enough > > > > > propagation > > > > > delay for the maintenance interrupt (which is very likely). > > > > > > > > > > In essence, we have lost the benefit of EISR, which was to > > > > > give us a > > > > > way to deal with asynchronous signalling. 
> > > > > > > > > > > > > > > > > > > > > > > This will cause interrupt fired continuously to guest even > > > > > > 8250 IIR > > > > > > has no interrupt. When 8250's interrupt is configured in > > > > > > shared mode, > > > > > > it will pass interrupt to other drivers to handle. However, > > > > > > there > > > > > > is no other driver involved. Then, a "nobody cared" kernel > > > > > > complaint > > > > > > occurs. > > > > > > > > > > > > / # cat /dev/ttyS0 > > > > > > [ 4.826836] random: crng init done > > > > > > [ 6.373620] irq 41: nobody cared (try booting with the > > > > > > "irqpoll" > > > > > > option) > > > > > > [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted > > > > > > 4.16.0-rc4 #4 > > > > > > [ 6.378927] Hardware name: linux,dummy-virt (DT) > > > > > > [ 6.380876] Call trace: > > > > > > [ 6.381937] dump_backtrace+0x0/0x180 > > > > > > [ 6.383495] show_stack+0x14/0x1c > > > > > > [ 6.384902] dump_stack+0x90/0xb4 > > > > > > [ 6.386312] __report_bad_irq+0x38/0xe0 > > > > > > [ 6.387944] note_interrupt+0x1f4/0x2b8 > > > > > > [ 6.389568] handle_irq_event_percpu+0x54/0x7c > > > > > > [ 6.391433] handle_irq_event+0x44/0x74 > > > > > > [ 6.393056] handle_fasteoi_irq+0x9c/0x154 > > > > > > [ 6.394784] generic_handle_irq+0x24/0x38 > > > > > > [ 6.396483] __handle_domain_irq+0x60/0xb4 > > > > > > [ 6.398207] gic_handle_irq+0x98/0x1b0 > > > > > > [ 6.399796] el1_irq+0xb0/0x128 > > > > > > [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40 > > > > > > [ 6.403149] __setup_irq+0x41c/0x678 > > > > > > [ 6.404669] request_threaded_irq+0xe0/0x190 > > > > > > [ 6.406474] univ8250_setup_irq+0x208/0x234 > > > > > > [ 6.408250] serial8250_do_startup+0x1b4/0x754 > > > > > > [ 6.410123] serial8250_startup+0x20/0x28 > > > > > > [ 6.411826] uart_startup.part.21+0x78/0x144 > > > > > > [ 6.413633] uart_port_activate+0x50/0x68 > > > > > > [ 6.415328] tty_port_open+0x84/0xd4 > > > > > > [ 6.416851] uart_open+0x34/0x44 > > > > > > [ 6.418229] tty_open+0xec/0x3c8 > > > > > 
> [ 6.419610] chrdev_open+0xb0/0x198 > > > > > > [ 6.421093] do_dentry_open+0x200/0x310 > > > > > > [ 6.422714] vfs_open+0x54/0x84 > > > > > > [ 6.424054] path_openat+0x2dc/0xf04 > > > > > > [ 6.425569] do_filp_open+0x68/0xd8 > > > > > > [ 6.427044] do_sys_open+0x16c/0x224 > > > > > > [ 6.428563] SyS_openat+0x10/0x18 > > > > > > [ 6.429972] el0_svc_naked+0x30/0x34 > > > > > > [ 6.431494] handlers: > > > > > > [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt > > > > > > [ 6.434597] Disabling IRQ #41 > > > > > > > > > > > > This patch changes the lr state condition in > > > > > > lr_signals_eoi_mi() from > > > > > > invalid(Inactive) to active and pending to avoid this. > > > > > > > > > > > > I am not sure about the original design of the condition of > > > > > > invalid(active). So, This RFC is sent out for comments. > > > > > > > > > > > > Cc: Joey Zheng <yu.zheng@hxt-semitech.com> > > > > > > Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.co > > > > > > m> > > > > > > --- > > > > > > virt/kvm/arm/vgic/vgic-v2.c | 4 ++-- > > > > > > virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- > > > > > > 2 files changed, 4 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/virt/kvm/arm/vgic/vgic-v2.c > > > > > > b/virt/kvm/arm/vgic/vgic-v2.c > > > > > > index e9d840a75e7b..740ee9a5f551 100644 > > > > > > --- a/virt/kvm/arm/vgic/vgic-v2.c > > > > > > +++ b/virt/kvm/arm/vgic/vgic-v2.c > > > > > > @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct > > > > > > kvm_vcpu *vcpu) > > > > > > > > > > > > static bool lr_signals_eoi_mi(u32 lr_val) > > > > > > { > > > > > > - return !(lr_val & GICH_LR_STATE) && (lr_val & > > > > > > GICH_LR_EOI) && > > > > > > - !(lr_val & GICH_LR_HW); > > > > > > + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) > > > > > > && > > > > > That feels very wrong. You're now signalling the resampling > > > > > in both > > > > > invalid and pending+active, and the latter state doesn't mean > > > > > you've > > > > > EOIed anything. 
You're now over-signalling, and signalling > > > > > the > > > > > wrong event. I am XORing with GICH_LR_STATE (0b11), so only 0b11 (P&A) will be signaled; every other state yields false. I am also curious: why does the EOI bit in the LR indicate the end of the interrupt regardless of the state? Please bear with me, as I am new to this part. > > > > > > > > > > > > > > > > > + (lr_val & GICH_LR_EOI) && !(lr_val & > > > > > > GICH_LR_HW); > > > > > > } > > > > > > > > > > > > /* > > > > > > diff --git a/virt/kvm/arm/vgic/vgic-v3.c > > > > > > b/virt/kvm/arm/vgic/vgic-v3.c > > > > > > index 6b329414e57a..43111bba7af9 100644 > > > > > > --- a/virt/kvm/arm/vgic/vgic-v3.c > > > > > > +++ b/virt/kvm/arm/vgic/vgic-v3.c > > > > > > @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct > > > > > > kvm_vcpu *vcpu) > > > > > > > > > > > > static bool lr_signals_eoi_mi(u64 lr_val) > > > > > > { > > > > > > - return !(lr_val & ICH_LR_STATE) && (lr_val & > > > > > > ICH_LR_EOI) && > > > > > > - !(lr_val & ICH_LR_HW); > > > > > > + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) > > > > > > && > > > > > > + (lr_val & ICH_LR_EOI) && !(lr_val & > > > > > > ICH_LR_HW); > > > > > > } > > > > > > > > > > > > void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) > > > > > > > > > > > Assuming I understand the issue correctly, I cannot really > > > > > see how > > > > > to solve this without reintroducing EISR, which sucks > > > > > majorly. > > > > > > > > > > I'll try to cook something shortly and we can all have a good > > > > > fight about how crap this is. > > > > Here's what I came up with. I don't really like it, but that's > > > > the least invasive this I could come up with. Please let me > > > > know if that helps with your test case. Note that I have only > > > > boot-tested this on a sample of 1 machine, so I don't expect > > > > this > > > > to be perfect. > > > > > > > > Also, any guideline on how to reproduce this would be much > > > > appreciated.
> > > > I never used this mdev/mtty thing, so please bear with me. > > > > > > > > Thanks, > > > > > > > > M. The mdev/mtty documentation is at Documentation/vfio-mediated-device.txt; it documents how to enable the mtty device. Support for "vfio-pci,sysfsdev" should be available in your QEMU version (I compiled the latest version). Following is the command I use to run QEMU with mdev support: "qemu-system-aarch64 -m 1024 -cpu host -M virt,gic_version=3 -nographic \ -kernel /home/yangsy/up-kvm/arch/arm64/boot/Image.gz \ -initrd /home/yangsy/kvm/ramdisk/initrd.img \ -netdev user,id=eth0 -device virtio-net-device,netdev=eth0 -enable-kvm \ -append "root=/dev/ram rdinit=/sbin/init" \ -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" To test just this vgic case, type "cat /dev/ttyS0" in the guest. To test reading/writing multiple bytes, please also apply the following patch: https://patchwork.kernel.org/patch/10267039/ > > > > > > > > From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 > > > > 00:00:00 2001 > > > > From: Marc Zyngier <marc.zyngier@arm.com> > > > > Date: Thu, 8 Mar 2018 11:14:06 +0000 > > > > Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to > > > > guess EOI MI > > > > status > > > > > > > > We so far rely on the LR state to decide whether the guest has > > > > EOI'd a level interrupt or not. While this looks like a good > > > > idea on the surface, it leads to a couple of annoying corner > > > > cases: > > > > > > > > Example 1: (P = Pending, A = Active, MI = Maintenance > > > > Interrupt) > > > > P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> > > > > MI > > > Do we really get an EOI maintenance interrupt here? Reading the > > > MISR > > > and EISR descriptions make me thing this is not the case... > Hum yes in EISR it is said that ICH_LR.State = 0b00! > > > > > > Yeah, it looks like I always want EISR to do what I want, and not > > to > > do what it does. Man, this thing is such a piece of crap.
> > > > OK, scratch that. We need to do it without the help of the HW. If convenient, maybe we can get something from the HW guys. :-) Hi, Marc, Do you need me to test the patch you posted for EISR? It seems some things still need more discussion. > > > > > > > > > > > > > The state is now pending, we've really EOI'd the interrupt, and > > > > yet lr_signals_eoi_mi() returns false, since the state is not > > > > 0. > > > > The result is that we won't signal anything on the > > > > corresponding > > > > irqfd, which people complain about. Meh. > > > So the core of the problem is that when we've entered the guest > > > with > > > PENDING+ACTIVE and when we exit (for some reason) we don't signal > > > the > > > resamplefd, right? The solution seems to me that we don't ever > > > do > > > PENDING+ACTIVE if you need to resample after each > > > deactivate. What > > > would be the point of appending a pending state that you only > > > know to be > > > valid after a resample anyway? > > The question is then to identify that a given source needs to be > > signalled back to VFIO. Calling into the eventfd code on the hot > > path > > is pretty horrid (I'm not sure if we can really call into this with > > interrupts disabled, for example). > > > > > > > > > > > > > > > > > > > > Example 2: > > > > P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI > > > > fires > > > We could be more clever and do the following calculation on every > > > exit: > > > > > > If you enter with P, and exit with either A or 0, then signal. > > > > > > If you enter with P+A, and you exit with either P, A, or 0, then > > > signal. > > > > > > Wouldn't that also solve it? (Although I have a feeling you'd > > > miss some > > > exits in this case). > > I'd be more confident if we did forbid P+A for such interrupts > > altogether, as they really feel like another kind of HW interrupt. > the LR P+A looks strange to me too. all the more so it may cause the > same IRQ to be acked twice?
> > P -> A -> 0 (resample). Doesn't our issue come from the fact we > reinject > the P in LR until the line level is deasserted? > > > > > > Eric: Is there any way to get a callback from the eventfd code to > > flag > > a given irq as requiring a notification on EOI? > bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned > pin) was used in the past. I think it does what you want. > > Thanks > > Eric > > > > > > Thanks, > > > > M. > > I have added some logs to compare the level interrupt handling of pl011 (hwirq = 33) and mtty (hwirq = 36). In the mtty case, vgic_queue_irq_unlock() is called twice, but it is called only once for pl011. Following is the log: ===Without my patch=== ###PL011### <4>[ 180.598266] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 latch:0 level:1 <4>[ 180.604460] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1 level:1 <4>[ 180.604540] ==>90a0020000000021(active) <4>[ 180.614878] ==>d0a0020000000021(P&A) <4>[ 180.618415] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 latch:0 level:0 <4>[ 180.625508] ==>90a0020000000021(active) <4>[ 180.629343] ==>10a0020000000021(inactive) ###mtty-vfio### <4>[ 223.123329] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1 latch:0 level:1 <4>[ 223.129736] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 level:1 <4>[ 223.136027] ==>50a0020000000024(pending) <4>[ 223.139954] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 level:1 <4>[ 223.146460] ==>90a0020000000024(active) <4>[ 223.150273] ==>d0a0020000000024(P&A) <4>[ 223.153827] ==>90a0020000000024(active) <4>[ 223.157668] ==>d0a0020000000024(P&A) ...........cyclic... I remember that in some tests the state change was cyclic, P->A->P&A, but I cannot seem to reproduce it. Is the LR state output in kvm_vgic_inject_irq() reliable?
===With my patch=== ###PL011### <4>[ 114.798528] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 latch:0 level:1 <4>[ 114.804743] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1 level:1 <4>[ 114.804796] ==>90a0020000000021(active) <4>[ 114.815077] ==>d0a0020000000021(P&A) <4>[ 114.818628] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 latch:0 level:0 <4>[ 114.825726] ==>90a0020000000021(active) <4>[ 114.829560] ==>10a0020000000021(inactive) ###mtty-vfio### <4>[ 161.579083] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1 latch:0 level:1 <4>[ 161.585419] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 level:1 <4>[ 161.591780] ==>50a0020000000024(pending) <4>[ 161.595708] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 level:1 <4>[ 161.602204] ==>90a0020000000024(active) <4>[ 161.606023] ==>d0a0020000000024(P&A) <4>[ 161.609561] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 latch:0 level:0 <4>[ 161.616693] ==>10a0020000000024(inactive) <4>[ 161.620745] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 latch:0 level:1 <4>[ 161.627800] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 level:1 <4>[ 161.627849] ==>90a0020000000024(active) <4>[ 161.640076] ==>d0a0020000000024(P&A) <4>[ 161.642689] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 latch:0 level:0 <4>[ 161.649822] ==>10a0020000000024(inactive) Following is the test patch, diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c old mode 100644 new mode 100755 index 6b329414e57a..00fb83b11f43 --- a/virt/kvm/arm/vgic/vgic-v3.c +++ b/virt/kvm/arm/vgic/vgic-v3.c @@ -26,6 +26,9 @@ static bool common_trap; static bool gicv4_enable; +int monitor_irq = 36; +module_param(monitor_irq, int, S_IRUGO | S_IWUSR); + void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) { struct vgic_v3_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v3; @@ -39,6 +42,8 @@ static bool lr_signals_eoi_mi(u64 lr_val) !(lr_val & ICH_LR_HW); } +u64 last_val = 0; + void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) { struct 
vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; @@ -46,6 +51,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) u32 model = vcpu->kvm->arch.vgic.vgic_model; int lr; unsigned long flags; + char *str[]={"inactive", "pending", "active", "P&A"}; cpuif->vgic_hcr &= ~ICH_HCR_UIE; @@ -60,6 +66,13 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) intid = val & GICH_LR_VIRTUALID; /* Notify fds when the guest EOI'ed a level-triggered IRQ */ + if (intid == monitor_irq) { + if (last_val != val) { + printk("==>%llx(%s)\n", val, str[(val >> 62) & 0x03 ]); + last_val = val; + } + } + if (lr_signals_eoi_mi(val) && vgic_valid_spi(vcpu->kvm, intid)) kvm_notify_acked_irq(vcpu->kvm, 0, intid - VGIC_NR_PRIVATE_IRQS); diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c old mode 100644 new mode 100755 index 07126a3b1908..9c284623ea23 --- a/virt/kvm/arm/vgic/vgic.c +++ b/virt/kvm/arm/vgic/vgic.c @@ -31,6 +31,8 @@ #define DEBUG_SPINLOCK_BUG_ON(p) #endif +extern int monitor_irq; + struct vgic_global kvm_vgic_global_state __ro_after_init = { .gicv3_cpuif = STATIC_KEY_FALSE_INIT, }; @@ -381,6 +383,12 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq, kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu); kvm_vcpu_kick(vcpu); + if (irq->intid == monitor_irq) { + printk("##%s %d irq->intid:%d enable:%d level:%d\n", + __func__, __LINE__, irq->intid, + irq->enabled, irq->line_level); + //dump_stack(); + } return true; } @@ -401,6 +409,9 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq, * level-sensitive interrupts. You can think of the level parameter as 1 * being HIGH and 0 being LOW and all devices being active-HIGH.
*/ + +bool monitor_vm_entry_start = false; + int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid, bool level, void *owner) { @@ -437,6 +448,13 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid, else irq->pending_latch = true; + if (irq->intid == monitor_irq) { + printk("%s %d irq:%d enabled:%d config:%d latch:%d level:%d\n", + __func__, __LINE__, irq->intid, irq->enabled, irq->config, + irq->pending_latch, irq->line_level); + monitor_vm_entry_start = true; + } + vgic_queue_irq_unlock(kvm, irq, flags); vgic_put_irq(kvm, irq); Thanks. Shunyong. ^ permalink raw reply related [flat|nested] 50+ messages in thread
* [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling @ 2018-03-09 3:14 ` Yang, Shunyong 0 siblings, 0 replies; 50+ messages in thread From: Yang, Shunyong @ 2018-03-09 3:14 UTC (permalink / raw) To: linux-arm-kernel Hi, Eric, Marc and Christoffer, On Thu, 2018-03-08 at 19:12 +0100, Auger Eric wrote: > Hi Marc, Christoffer, > > On 08/03/18 18:28, Marc Zyngier wrote: > > > > On Thu, 08 Mar 2018 16:19:00 +0000, > > Christoffer Dall wrote: > > > > > > > > > On Thu, Mar 08, 2018 at 11:54:27AM +0000, Marc Zyngier wrote: > > > > > > > > On 08/03/18 09:49, Marc Zyngier wrote: > > > > > > > > > > [updated Christoffer's email address] > > > > > > > > > > Hi Shunyong, > > > > > > > > > > On 08/03/18 07:01, Shunyong Yang wrote: > > > > > > > > > > > > When resampling irqfds is enabled, level interrupt should > > > > > > be > > > > > > de-asserted when resampling happens. On page 4-47 of GIC v3 > > > > > > specification IHI0069D, it said, > > > > > > "When the PE acknowledges an SGI, a PPI, or an SPI at the > > > > > > CPU > > > > > > interface, the IRI changes the status of the interrupt to > > > > > > active > > > > > > and pending if: > > > > > > • It is an edge-triggered interrupt, and another edge has > > > > > > been > > > > > > detected since the interrupt was acknowledged. > > > > > > • It is a level-sensitive interrupt, and the level has not > > > > > > been > > > > > > deasserted since the interrupt was acknowledged." > > > > > > > > > > > > GIC v2 specification IHI0048B.b has similar description on > > > > > > page > > > > > > 3-42 for state machine transition. > > > > > > > > > > > > When some VFIO device, like mtty(8250 VFIO mdev emulation > > > > > > driver > > > > > > in samples/vfio-mdev) triggers a level interrupt, the > > > > > > status > > > > > > transition in LR is pending-->active-->active and pending. > > > > > > Then it will wait resampling to de-assert the interrupt.
> > > > > > > > > > > > Current design of lr_signals_eoi_mi() will return false if > > > > > > state > > > > > > in LR is not invalid(Inactive). It causes resampling will > > > > > > not happen > > > > > > in mtty case. > > > > > Let me rephrase this, and tell me if I understood it > > > > > correctly: > > > > > > > > > > - A level interrupt is injected, activated by the guest (LR > > > > > state=active) > > > > > - guest exits, re-enters, (LR state=pending+active) > > > > > - guest EOIs the interrupt (LR state=pending) > > > > > - maintenance interrupt > > > > > - we don't signal the resampling because we're not in an > > > > > invalid state > > > > > > > > > > Is that correct? > > > > > > > > > > That's an interesting case, because it seems to invalidate > > > > > some of the > > > > > optimization that went in over a year ago. > > > > > > > > > > 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR > > > > > fields > > > > > b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary > > > > > save_maint_int_state > > > > > af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary > > > > > process_maintenance operation > > > > > > > > > > We could compare the value of the LR before the guest entry > > > > > with > > > > > the value at exit time, but we still could miss it if we have > > > > > a > > > > > transition such as P+A -> P -> A and assume a long enough > > > > > propagation > > > > > delay for the maintenance interrupt (which is very likely). > > > > > > > > > > In essence, we have lost the benefit of EISR, which was to > > > > > give us a > > > > > way to deal with asynchronous signalling. > > > > > > > > > > > > > > > > > > > > > > > This will cause interrupt fired continuously to guest even > > > > > > 8250 IIR > > > > > > has no interrupt. When 8250's interrupt is configured in > > > > > > shared mode, > > > > > > it will pass interrupt to other drivers to handle. However, > > > > > > there > > > > > > is no other driver involved.
Then, a "nobody cared" kernel > > > > > > complaint > > > > > > occurs. > > > > > > > > > > > > / # cat /dev/ttyS0 > > > > > > [ 4.826836] random: crng init done > > > > > > [ 6.373620] irq 41: nobody cared (try booting with the > > > > > > "irqpoll" > > > > > > option) > > > > > > [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted > > > > > > 4.16.0-rc4 #4 > > > > > > [ 6.378927] Hardware name: linux,dummy-virt (DT) > > > > > > [ 6.380876] Call trace: > > > > > > [ 6.381937] dump_backtrace+0x0/0x180 > > > > > > [ 6.383495] show_stack+0x14/0x1c > > > > > > [ 6.384902] dump_stack+0x90/0xb4 > > > > > > [ 6.386312] __report_bad_irq+0x38/0xe0 > > > > > > [ 6.387944] note_interrupt+0x1f4/0x2b8 > > > > > > [ 6.389568] handle_irq_event_percpu+0x54/0x7c > > > > > > [ 6.391433] handle_irq_event+0x44/0x74 > > > > > > [ 6.393056] handle_fasteoi_irq+0x9c/0x154 > > > > > > [ 6.394784] generic_handle_irq+0x24/0x38 > > > > > > [ 6.396483] __handle_domain_irq+0x60/0xb4 > > > > > > [ 6.398207] gic_handle_irq+0x98/0x1b0 > > > > > > [ 6.399796] el1_irq+0xb0/0x128 > > > > > > [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40 > > > > > > [ 6.403149] __setup_irq+0x41c/0x678 > > > > > > [ 6.404669] request_threaded_irq+0xe0/0x190 > > > > > > [ 6.406474] univ8250_setup_irq+0x208/0x234 > > > > > > [ 6.408250] serial8250_do_startup+0x1b4/0x754 > > > > > > [ 6.410123] serial8250_startup+0x20/0x28 > > > > > > [ 6.411826] uart_startup.part.21+0x78/0x144 > > > > > > [ 6.413633] uart_port_activate+0x50/0x68 > > > > > > [ 6.415328] tty_port_open+0x84/0xd4 > > > > > > [ 6.416851] uart_open+0x34/0x44 > > > > > > [ 6.418229] tty_open+0xec/0x3c8 > > > > > > [ 6.419610] chrdev_open+0xb0/0x198 > > > > > > [ 6.421093] do_dentry_open+0x200/0x310 > > > > > > [ 6.422714] vfs_open+0x54/0x84 > > > > > > [ 6.424054] path_openat+0x2dc/0xf04 > > > > > >
[ 6.425569] do_filp_open+0x68/0xd8 > > > > > > [ 6.427044] do_sys_open+0x16c/0x224 > > > > > > [ 6.428563] SyS_openat+0x10/0x18 > > > > > > [ 6.429972] el0_svc_naked+0x30/0x34 > > > > > > [ 6.431494] handlers: > > > > > > [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt > > > > > > [ 6.434597] Disabling IRQ #41 > > > > > > > > > > > > This patch changes the lr state condition in > > > > > > lr_signals_eoi_mi() from > > > > > > invalid(Inactive) to active and pending to avoid this. > > > > > > > > > > > > I am not sure about the original design of the condition of > > > > > > invalid(active). So, This RFC is sent out for comments. > > > > > > > > > > > > Cc: Joey Zheng <yu.zheng@hxt-semitech.com> > > > > > > Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.co > > > > > > m> > > > > > > --- > > > > > > virt/kvm/arm/vgic/vgic-v2.c | 4 ++-- > > > > > > virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- > > > > > > 2 files changed, 4 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/virt/kvm/arm/vgic/vgic-v2.c > > > > > > b/virt/kvm/arm/vgic/vgic-v2.c > > > > > > index e9d840a75e7b..740ee9a5f551 100644 > > > > > > --- a/virt/kvm/arm/vgic/vgic-v2.c > > > > > > +++ b/virt/kvm/arm/vgic/vgic-v2.c > > > > > > @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct > > > > > > kvm_vcpu *vcpu) > > > > > > > > > > > > static bool lr_signals_eoi_mi(u32 lr_val) > > > > > > { > > > > > > - return !(lr_val & GICH_LR_STATE) && (lr_val & > > > > > > GICH_LR_EOI) && > > > > > > - !(lr_val & GICH_LR_HW); > > > > > > + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) > > > > > > && > > > > > That feels very wrong. You're now signalling the resampling > > > > > in both > > > > > invalid and pending+active, and the latter state doesn't mean > > > > > you've > > > > > EOIed anything. You're now over-signalling, and signalling > > > > > the > > > > > wrong event. I am XORing with GICH_LR_STATE (0b11), so only 0b11 (P&A) will be signaled; every other state yields false. 
I am also curious: why does the EOI bit in the LR indicate the end of the interrupt regardless of the state? Please bear with me, as I am new to this part. > > > > > > > > > > > > > > > > > + (lr_val & GICH_LR_EOI) && !(lr_val & > > > > > > GICH_LR_HW); > > > > > > } > > > > > > > > > > > > /* > > > > > > diff --git a/virt/kvm/arm/vgic/vgic-v3.c > > > > > > b/virt/kvm/arm/vgic/vgic-v3.c > > > > > > index 6b329414e57a..43111bba7af9 100644 > > > > > > --- a/virt/kvm/arm/vgic/vgic-v3.c > > > > > > +++ b/virt/kvm/arm/vgic/vgic-v3.c > > > > > > @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct > > > > > > kvm_vcpu *vcpu) > > > > > > > > > > > > static bool lr_signals_eoi_mi(u64 lr_val) > > > > > > { > > > > > > - return !(lr_val & ICH_LR_STATE) && (lr_val & > > > > > > ICH_LR_EOI) && > > > > > > - !(lr_val & ICH_LR_HW); > > > > > > + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) > > > > > > && > > > > > > + (lr_val & ICH_LR_EOI) && !(lr_val & > > > > > > ICH_LR_HW); > > > > > > } > > > > > > > > > > > > void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) > > > > > > > > > > > Assuming I understand the issue correctly, I cannot really > > > > > see how > > > > > to solve this without reintroducing EISR, which sucks > > > > > majorly. > > > > > > > > > > I'll try to cook something shortly and we can all have a good > > > > > fight about how crap this is. > > > > Here's what I came up with. I don't really like it, but that's > > > > the least invasive this I could come up with. Please let me > > > > know if that helps with your test case. Note that I have only > > > > boot-tested this on a sample of 1 machine, so I don't expect > > > > this > > > > to be perfect. > > > > > > > > Also, any guideline on how to reproduce this would be much > > > > appreciated. > > > > I never used this mdev/mtty thing, so please bear with me. > > > > > > > > Thanks, > > > > > > > > M.
The mdev/mtty documentation is at Documentation/vfio-mediated-device.txt; it documents how to enable the mtty device. Support for "vfio-pci,sysfsdev" should be available in your QEMU version (I compiled the latest version). Following is the command I use to run QEMU with mdev support: "qemu-system-aarch64 -m 1024 -cpu host -M virt,gic_version=3 -nographic \ -kernel /home/yangsy/up-kvm/arch/arm64/boot/Image.gz \ -initrd /home/yangsy/kvm/ramdisk/initrd.img \ -netdev user,id=eth0 -device virtio-net-device,netdev=eth0 -enable-kvm \ -append "root=/dev/ram rdinit=/sbin/init" \ -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" To test just this vgic case, type "cat /dev/ttyS0" in the guest. To test reading/writing multiple bytes, please also apply the following patch: https://patchwork.kernel.org/patch/10267039/ > > > > > > > > From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 > > > > 00:00:00 2001 > > > > From: Marc Zyngier <marc.zyngier@arm.com> > > > > Date: Thu, 8 Mar 2018 11:14:06 +0000 > > > > Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to > > > > guess EOI MI > > > > status > > > > > > > > We so far rely on the LR state to decide whether the guest has > > > > EOI'd a level interrupt or not. While this looks like a good > > > > idea on the surface, it leads to a couple of annoying corner > > > > cases: > > > > > > > > Example 1: (P = Pending, A = Active, MI = Maintenance > > > > Interrupt) > > > > P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> > > > > MI > > > Do we really get an EOI maintenance interrupt here? Reading the > > > MISR > > > and EISR descriptions make me thing this is not the case... > Hum yes in EISR it is said that ICH_LR.State = 0b00! > > > > > > Yeah, it looks like I always want EISR to do what I want, and not > > to > > do what it does. Man, this thing is such a piece of crap. > > > > OK, scratch that. We need to do it without the help of the HW.
If convenient, maybe we can get something from the HW guys. :-) Hi, Marc, Do you need me to test the patch you posted for EISR? It seems some things still need more discussion. > > > > > > > > > > > > > The state is now pending, we've really EOI'd the interrupt, and > > > > yet lr_signals_eoi_mi() returns false, since the state is not > > > > 0. > > > > The result is that we won't signal anything on the > > > > corresponding > > > > irqfd, which people complain about. Meh. > > > So the core of the problem is that when we've entered the guest > > > with > > > PENDING+ACTIVE and when we exit (for some reason) we don't signal > > > the > > > resamplefd, right? The solution seems to me that we don't ever > > > do > > > PENDING+ACTIVE if you need to resample after each > > > deactivate. What > > > would be the point of appending a pending state that you only > > > know to be > > > valid after a resample anyway? > > The question is then to identify that a given source needs to be > > signalled back to VFIO. Calling into the eventfd code on the hot > > path > > is pretty horrid (I'm not sure if we can really call into this with > > interrupts disabled, for example). > > > > > > > > > > > > > > > > > > > > Example 2: > > > > P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI > > > > fires > > > We could be more clever and do the following calculation on every > > > exit: > > > > > > If you enter with P, and exit with either A or 0, then signal. > > > > > > If you enter with P+A, and you exit with either P, A, or 0, then > > > signal. > > > > > > Wouldn't that also solve it? (Although I have a feeling you'd > > > miss some > > > exits in this case). > > I'd be more confident if we did forbid P+A for such interrupts > > altogether, as they really feel like another kind of HW interrupt. > the LR P+A looks strange to me too. all the more so it may cause the > same IRQ to be acked twice? > > P -> A -> 0 (resample).
Doesn't our issue come from the fact we > reinject > the P in LR until the line level is deasserted? > > > > > > Eric: Is there any way to get a callback from the eventfd code to > > flag > > a given irq as requiring a notification on EOI? > bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned > pin) was used in the past. I think it does what you want. > > Thanks > > Eric > > > > > > Thanks, > > > > M. > > I have added some logs to compare the level interrupt handling of pl011 (hwirq = 33) and mtty (hwirq = 36). In the mtty case, vgic_queue_irq_unlock() is called twice, but it is called only once for pl011. Following is the log: ===Without my patch=== ###PL011### <4>[ 180.598266] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 latch:0 level:1 <4>[ 180.604460] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1 level:1 <4>[ 180.604540] ==>90a0020000000021(active) <4>[ 180.614878] ==>d0a0020000000021(P&A) <4>[ 180.618415] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 latch:0 level:0 <4>[ 180.625508] ==>90a0020000000021(active) <4>[ 180.629343] ==>10a0020000000021(inactive) ###mtty-vfio### <4>[ 223.123329] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1 latch:0 level:1 <4>[ 223.129736] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 level:1 <4>[ 223.136027] ==>50a0020000000024(pending) <4>[ 223.139954] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 level:1 <4>[ 223.146460] ==>90a0020000000024(active) <4>[ 223.150273] ==>d0a0020000000024(P&A) <4>[ 223.153827] ==>90a0020000000024(active) <4>[ 223.157668] ==>d0a0020000000024(P&A) ...........cyclic... I remember that in some tests the state change was cyclic, P->A->P&A, but I cannot seem to reproduce it. Is the LR state output in kvm_vgic_inject_irq() reliable?
===With my patch===
###PL011###
<4>[  114.798528] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 latch:0 level:1
<4>[  114.804743] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1 level:1
<4>[  114.804796] ==>90a0020000000021(active)
<4>[  114.815077] ==>d0a0020000000021(P&A)
<4>[  114.818628] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 latch:0 level:0
<4>[  114.825726] ==>90a0020000000021(active)
<4>[  114.829560] ==>10a0020000000021(inactive)
###mtty-vfio###
<4>[  161.579083] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1 latch:0 level:1
<4>[  161.585419] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 level:1
<4>[  161.591780] ==>50a0020000000024(pending)
<4>[  161.595708] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 level:1
<4>[  161.602204] ==>90a0020000000024(active)
<4>[  161.606023] ==>d0a0020000000024(P&A)
<4>[  161.609561] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 latch:0 level:0
<4>[  161.616693] ==>10a0020000000024(inactive)
<4>[  161.620745] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 latch:0 level:1
<4>[  161.627800] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 level:1
<4>[  161.627849] ==>90a0020000000024(active)
<4>[  161.640076] ==>d0a0020000000024(P&A)
<4>[  161.642689] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 latch:0 level:0
<4>[  161.649822] ==>10a0020000000024(inactive)

Following is the test patch:

diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
old mode 100644
new mode 100755
index 6b329414e57a..00fb83b11f43
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -26,6 +26,9 @@
 static bool common_trap;
 static bool gicv4_enable;
 
+int monitor_irq = 36;
+module_param(monitor_irq, int, S_IRUGO | S_IWUSR);
+
 void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v3;
@@ -39,6 +42,8 @@ static bool lr_signals_eoi_mi(u64 lr_val)
 	       !(lr_val & ICH_LR_HW);
 }
 
+u64 last_val = 0;
+
 void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
@@ -46,6 +51,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 	u32 model = vcpu->kvm->arch.vgic.vgic_model;
 	int lr;
 	unsigned long flags;
+	char *str[]={"inactive", "pending", "active", "P&A"};
 
 	cpuif->vgic_hcr &= ~ICH_HCR_UIE;
 
@@ -60,6 +66,13 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 			intid = val & GICH_LR_VIRTUALID;
 
 		/* Notify fds when the guest EOI'ed a level-triggered IRQ */
+		if (intid == monitor_irq) {
+			if (last_val != val) {
+				printk("==>%llx(%s)\n", val, str[(val >> 62) & 0x03 ]);
+				last_val = val;
+			}
+		}
+
 		if (lr_signals_eoi_mi(val) && vgic_valid_spi(vcpu->kvm, intid))
 			kvm_notify_acked_irq(vcpu->kvm, 0,
 					     intid - VGIC_NR_PRIVATE_IRQS);
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
old mode 100644
new mode 100755
index 07126a3b1908..9c284623ea23
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -31,6 +31,8 @@
 #define DEBUG_SPINLOCK_BUG_ON(p)
 #endif
 
+extern int monitor_irq;
+
 struct vgic_global kvm_vgic_global_state __ro_after_init = {
 	.gicv3_cpuif = STATIC_KEY_FALSE_INIT,
 };
@@ -381,6 +383,12 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
 	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 	kvm_vcpu_kick(vcpu);
 
+	if (irq->intid == monitor_irq) {
+		printk("##%s %d irq->intid:%d enable:%d level:%d\n",
+		       __func__, __LINE__, irq->intid,
+		       irq->enabled, irq->line_level);
+		//dump_stack();
+	}
 	return true;
 }
 
@@ -401,6 +409,9 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
  * level-sensitive interrupts.  You can think of the level parameter as 1
  * being HIGH and 0 being LOW and all devices being active-HIGH.
  */
+
+bool monitor_vm_entry_start = false;
+
 int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 			bool level, void *owner)
 {
@@ -437,6 +448,13 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 	else
 		irq->pending_latch = true;
 
+	if (irq->intid == monitor_irq) {
+		printk("%s %d irq:%d enabled:%d config:%d latch:%d level:%d\n",
+		       __func__, __LINE__, irq->intid, irq->enabled, irq->config,
+		       irq->pending_latch, irq->line_level);
+		monitor_vm_entry_start = true;
+	}
+
 	vgic_queue_irq_unlock(kvm, irq, flags);
 	vgic_put_irq(kvm, irq);

Thanks.
Shunyong.
^ permalink raw reply related [flat|nested] 50+ messages in thread
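The entry/exit rule quoted in the message above ("If you enter with P, and exit with either A or 0, then signal. If you enter with P+A, and you exit with either P, A, or 0, then signal.") can be expressed as a small predicate over the LR State field. What follows is a minimal sketch, not KVM code: `enum lr_state`, `ich_lr_state()` and `should_signal_resample()` are hypothetical names. The A -> 0 case is an assumption added here to keep signalling an ordinary EOI, which the quoted rule does not mention. As the thread itself notes, the rule may still miss some exits, and Marc preferred forbidding P+A for such interrupts altogether.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LR State field encoding (GICv2 GICH_LR[29:28], GICv3 ICH_LR[63:62]). */
enum lr_state {
	LR_INACTIVE       = 0,
	LR_PENDING        = 1,
	LR_ACTIVE         = 2,
	LR_PENDING_ACTIVE = 3,
};

/* Extract the State field from a GICv3 ICH_LR value. */
static enum lr_state ich_lr_state(uint64_t lr_val)
{
	return (enum lr_state)((lr_val >> 62) & 0x3);
}

/*
 * Hypothetical helper: given the LR state written on guest entry and the
 * state read back on exit, return true when the resamplefd should be
 * signalled according to the proposed rule.
 */
static bool should_signal_resample(enum lr_state entered, enum lr_state exited)
{
	assert(entered <= LR_PENDING_ACTIVE && exited <= LR_PENDING_ACTIVE);

	switch (entered) {
	case LR_PENDING:
		/* "enter with P, exit with either A or 0": A means the guest
		 * acknowledged it, 0 means it also deactivated it. */
		return exited == LR_ACTIVE || exited == LR_INACTIVE;
	case LR_PENDING_ACTIVE:
		/* "enter with P+A, exit with either P, A, or 0": in each case
		 * the active instance present on entry was deactivated. */
		return exited == LR_PENDING || exited == LR_ACTIVE ||
		       exited == LR_INACTIVE;
	case LR_ACTIVE:
		/* Not in the rule (assumption): A -> 0 is a plain EOI, the
		 * case today's state == 0 check already reports. */
		return exited == LR_INACTIVE;
	default:
		return false;
	}
}
```

Tracking the entry state means recording, per LR, what was written before entering the guest. That is the "tracking the states before and after exit" alternative that Marc mentions in his reply.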
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-09 3:14 ` Yang, Shunyong @ 2018-03-09 9:40 ` Marc Zyngier -1 siblings, 0 replies; 50+ messages in thread From: Marc Zyngier @ 2018-03-09 9:40 UTC (permalink / raw) To: Yang, Shunyong, eric.auger, cdall Cc: linux-arm-kernel, linux-kernel, david.daney, ard.biesheuvel, kvmarm, will.deacon, Zheng, Joey On 09/03/18 03:14, Yang, Shunyong wrote: [trimming things a bit] >>>>>>> static bool lr_signals_eoi_mi(u32 lr_val) >>>>>>> { >>>>>>> - return !(lr_val & GICH_LR_STATE) && (lr_val & >>>>>>> GICH_LR_EOI) && >>>>>>> - !(lr_val & GICH_LR_HW); >>>>>>> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) >>>>>>> && >>>>>> That feels very wrong. You're now signalling the resampling >>>>>> in both >>>>>> invalid and pending+active, and the latter state doesn't mean >>>>>> you've >>>>>> EOIed anything. You're now over-signalling, and signalling >>>>>> the >>>>>> wrong event. > > I am using XOR GICH_LR_STATE(0b'11), so only 0b'11(P&A) will be > signaled. Other state will be false. And that's really wrong. P+A is a state where the interrupt is still being processed. The only case where we can reliably detect that an interrupt has been EOId is when state==0. > And I am curious why the EOI bit in LR indicate the end of interrupt > regardless of the state? Please bear with me as I am a newbie in this > part. The EOI bit indicates that we've requested a maintenance interrupt from the HW. It only triggers when state==0. If you have (like you describe further down) a sequence of P -> A -> (exit) -> P+A -> P -> A -> (exit) P+A ... we can never reliably detect that an interrupt has been EOId (because the HW never delivers a maintenance interrupt), other than by tracking the states before and after exit, and hoping that you've done an exit because you're touching the source of the interrupt. >>>>> Also, any guideline on how to reproduce this would be much >>>>> appreciated. 
>>>>> I never used this mdev/mtty thing, so please bear with me. >>>>> >>>>> Thanks, >>>>> >>>>> M. > > The mdev/mtty documentation is at Documentation/vfio-mediated-device.txt. > It documents how to enable mtty device. > And support for "vfio-pci,sysfsdev" should be available in your qemu > version (I compiled the latest version). > Following is my command to run qemu with mdev support, > "qemu-system-aarch64 -m 1024 -cpu host -M virt,gic_version=3 -nographic > \ > -kernel /home/yangsy/up-kvm/arch/arm64/boot/Image.gz \ > -initrd /home/yangsy/kvm/ramdisk/initrd.img \ > -netdev user,id=eth0 -device virtio-net-device,netdev=eth0 -enable-kvm > \ > -append "root=/dev/ram rdinit=/sbin/init" \ > -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 > " > For just test this vgic case, type "cat /dev/ttyS0" in guest. But if > test read/write multiple bytes, please apply following patch also > https://patchwork.kernel.org/patch/10267039/ Thanks. I'll have a look. > >>>>> >>>>> From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 >>>>> 00:00:00 2001 >>>>> From: Marc Zyngier <marc.zyngier@arm.com> >>>>> Date: Thu, 8 Mar 2018 11:14:06 +0000 >>>>> Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to >>>>> guess EOI MI >>>>> status >>>>> >>>>> We so far rely on the LR state to decide whether the guest has >>>>> EOI'd a level interrupt or not. While this looks like a good >>>>> idea on the surface, it leads to a couple of annoying corner >>>>> cases: >>>>> >>>>> Example 1: (P = Pending, A = Active, MI = Maintenance >>>>> Interrupt) >>>>> P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> >>>>> MI >>>> Do we really get an EOI maintenance interrupt here? Reading the >>>> MISR >>>> and EISR descriptions make me think this is not the case... >> Hum yes in EISR it is said that ICH_LR.State = 0b00! >>> >>> >>> Yeah, it looks like I always want EISR to do what I want, and not >>> to >>> do what it does.
Man, this thing is such a piece of crap. >>> >>> OK, scratch that. We need to do it without the help of the HW. > > If convenient, maybe we can get something from HW guys. :-) > > Hi, Marc, > > Do you need me to test the patch you posted for EISR? As it seems there > are some things need more discussion. Yeah, that approach doesn't work. I'll try and come up with another approach (basically banning P+A for interrupts that require a back notification). [...] > I have added some logs to compare level interrupt between pl011(hwirq = > 33) and mtty (hwirq = 36). In mtty case, vgic_queue_irq_unlock() is > called twice. But only called once in pl011. > > following is the log, > ===Without my patch=== > ###PL011### > > <4>[ 180.598266] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 > latch:0 level:1 > <4>[ 180.604460] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1 > level:1 > <4>[ 180.604540] ==>90a0020000000021(active) > <4>[ 180.614878] ==>d0a0020000000021(P&A) > <4>[ 180.618415] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 > latch:0 level:0 > <4>[ 180.625508] ==>90a0020000000021(active) > <4>[ 180.629343] ==>10a0020000000021(inactive) > > ###mtty-vfio### > <4>[ 223.123329] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1 > latch:0 level:1 > <4>[ 223.129736] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 > level:1 > <4>[ 223.136027] ==>50a0020000000024(pending) > <4>[ 223.139954] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 > level:1 > <4>[ 223.146460] ==>90a0020000000024(active) > <4>[ 223.150273] ==>d0a0020000000024(P&A) > <4>[ 223.153827] ==>90a0020000000024(active) > <4>[ 223.157668] ==>d0a0020000000024(P&A) So the line is never lowered. That's very odd. > ...........cyclic... > > I remembered in some tests the state change is cyclic P->A->P&A. But it > seems I cannot reproduce it. Is output LR state > in kvm_vgic_inject_irq() reliable?
> > ===With my patch=== > ###PL011### > <4>[ 114.798528] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 > latch:0 level:1 > <4>[ 114.804743] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1 > level:1 > <4>[ 114.804796] ==>90a0020000000021(active) > <4>[ 114.815077] ==>d0a0020000000021(P&A) > <4>[ 114.818628] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 > latch:0 level:0 > <4>[ 114.825726] ==>90a0020000000021(active) > <4>[ 114.829560] ==>10a0020000000021(inactive) > > ###mtty-vfio### > > <4>[ 161.579083] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1 > latch:0 level:1 > <4>[ 161.585419] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 > level:1 > <4>[ 161.591780] ==>50a0020000000024(pending) > <4>[ 161.595708] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 > level:1 > <4>[ 161.602204] ==>90a0020000000024(active) > <4>[ 161.606023] ==>d0a0020000000024(P&A) > <4>[ 161.609561] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 > latch:0 level:0 > <4>[ 161.616693] ==>10a0020000000024(inactive) > <4>[ 161.620745] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 > latch:0 level:1 > <4>[ 161.627800] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 > level:1 > <4>[ 161.627849] ==>90a0020000000024(active) > <4>[ 161.640076] ==>d0a0020000000024(P&A) > <4>[ 161.642689] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 > latch:0 level:0 > <4>[ 161.649822] ==>10a0020000000024(inactive) Which is really bizarre. The device only lowers the line when it is being told that the interrupt has been processed. That really smells of a bug in the device emulation. It should be lowered when the guest clears the interrupt status at the device level, and not when notified that the interrupt has been completed at the interrupt controller level. Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 50+ messages in thread
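To show exactly which LR states each version of `lr_signals_eoi_mi()` accepts, here is a minimal standalone sketch. It assumes the GICv2 GICH_LR bit positions from include/linux/irqchip/arm-gic.h (State in bits [29:28], EOI in bit 19, HW in bit 31; `U` suffixes are added here to avoid shifting into the sign bit). Only the first line of the RFC's expression is quoted in this thread, so the sketch also assumes that the remaining EOI and HW terms are unchanged. The function names and the `make_lr()` helper are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICv2 GICH_LR bits, as in include/linux/irqchip/arm-gic.h. */
#define GICH_LR_VIRTUALID (0x3ffU << 0)
#define GICH_LR_EOI       (1U << 19)
#define GICH_LR_STATE     (3U << 28)
#define GICH_LR_HW        (1U << 31)

/* Current check: invalid (state == 0) LR, EOI MI requested, not HW. */
static bool lr_signals_eoi_mi_current(uint32_t lr_val)
{
	return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
	       !(lr_val & GICH_LR_HW);
}

/* RFC variant: (state ^ 0b11) == 0, i.e. only pending+active.
 * Assumes the EOI/HW terms are kept as in the original. */
static bool lr_signals_eoi_mi_rfc(uint32_t lr_val)
{
	return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
	       (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
}

/* Hypothetical helper: build a software LR with the given state (0..3). */
static uint32_t make_lr(uint32_t vintid, unsigned state, bool eoi)
{
	assert(state < 4);
	return (vintid & GICH_LR_VIRTUALID) | (state << 28) |
	       (eoi ? GICH_LR_EOI : 0);
}
```

With these definitions, the XOR form is true only for state 0b11 and no longer for state 0. That matches Shunyong's description of what the patch does. Marc's objection is about what that state means: 0b11 is an interrupt that is still being processed, so the RFC signals the resample before the guest is done with it, while a genuine EOI (state 0) is no longer reported.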
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-09 9:40 ` Marc Zyngier @ 2018-03-09 13:10 ` Auger Eric -1 siblings, 0 replies; 50+ messages in thread From: Auger Eric @ 2018-03-09 13:10 UTC (permalink / raw) To: Marc Zyngier, Yang, Shunyong, cdall Cc: ard.biesheuvel, david.daney, will.deacon, linux-kernel, Zheng, Joey, kvmarm, linux-arm-kernel Hi Marc, On 09/03/18 10:40, Marc Zyngier wrote: > On 09/03/18 03:14, Yang, Shunyong wrote: > > [trimming things a bit] > >>>>>>>> static bool lr_signals_eoi_mi(u32 lr_val) >>>>>>>> { >>>>>>>> - return !(lr_val & GICH_LR_STATE) && (lr_val & >>>>>>>> GICH_LR_EOI) && >>>>>>>> - !(lr_val & GICH_LR_HW); >>>>>>>> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) >>>>>>>> && >>>>>>> That feels very wrong. You're now signalling the resampling >>>>>>> in both >>>>>>> invalid and pending+active, and the latter state doesn't mean >>>>>>> you've >>>>>>> EOIed anything. You're now over-signalling, and signalling >>>>>>> the >>>>>>> wrong event. >> >> I am using XOR GICH_LR_STATE(0b'11), so only 0b'11(P&A) will be >> signaled. Other state will be false. > > And that's really wrong. P+A is a state where the interrupt is still > being processed. The only case where we can reliably detect that an > interrupt has been EOId is when state==0. > >> And I am curious why the EOI bit in LR indicate the end of interrupt >> regardless of the state? Please bear with me as I am a newbie in this >> part. > > The EOI bit indicates that we've requested a maintenance interrupt from > the HW. It only triggers when state==0. If you have (like you describe > further down) a sequence of > > P -> A -> (exit) -> P+A -> P -> A -> (exit) P+A ... > > we can never reliably detect that an interrupt has been EOId (because > the HW never delivers a maintenance interrupt), other than by tracking > the states before and after exit, and hoping that you've done an exit > because you're touching the source of the interrupt. 
> >>>>>> Also, any guideline on how to reproduce this would be much >>>>>> appreciated. >>>>>> I never used this mdev/mtty thing, so please bear with me. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> M. >> >> The mdev/mtty documentation is at Documentation/vfio-mediated-device.txt. >> It documents how to enable mtty device. >> And support for "vfio-pci,sysfsdev" should be available in your qemu >> version (I compiled the latest version). >> Following is my command to run qemu with mdev support, >> "qemu-system-aarch64 -m 1024 -cpu host -M virt,gic_version=3 -nographic >> \ >> -kernel /home/yangsy/up-kvm/arch/arm64/boot/Image.gz \ >> -initrd /home/yangsy/kvm/ramdisk/initrd.img \ >> -netdev user,id=eth0 -device virtio-net-device,netdev=eth0 -enable-kvm >> \ >> -append "root=/dev/ram rdinit=/sbin/init" \ >> -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 >> " >> For just test this vgic case, type "cat /dev/ttyS0" in guest. But if >> test read/write multiple bytes, please apply following patch also >> https://patchwork.kernel.org/patch/10267039/ > > Thanks. I'll have a look. > >> >>>>>> >>>>>> From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 >>>>>> 00:00:00 2001 >>>>>> From: Marc Zyngier <marc.zyngier@arm.com> >>>>>> Date: Thu, 8 Mar 2018 11:14:06 +0000 >>>>>> Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to >>>>>> guess EOI MI >>>>>> status >>>>>> >>>>>> We so far rely on the LR state to decide whether the guest has >>>>>> EOI'd a level interrupt or not. While this looks like a good >>>>>> idea on the surface, it leads to a couple of annoying corner >>>>>> cases: >>>>>> >>>>>> Example 1: (P = Pending, A = Active, MI = Maintenance >>>>>> Interrupt) >>>>>> P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> >>>>>> MI >>>>> Do we really get an EOI maintenance interrupt here? Reading the >>>>> MISR >>>>> and EISR descriptions make me think this is not the case...
>>> Hum yes in EISR it is said that ICH_LR.State = 0b00! >>>> >>>> >>>> Yeah, it looks like I always want EISR to do what I want, and not >>>> to >>>> do what it does. Man, this thing is such a piece of crap. >>>> >>>> OK, scratch that. We need to do it without the help of the HW. >> >> If convenient, maybe we can get something from HW guys. :-) >> >> Hi, Marc, >> >> Do you need me to test the patch you posted for EISR? As it seems there >> are some things need more discussion. > > Yeah, that approach doesn't work. I'll try and come up with another > approach (basically banning P+A for interrupts that require a back > notification). > > [...] > >> I have added some logs to compare level interrupt between pl011(hwirq = >> 33) and mtty (hwirq = 36). In mtty case, vgic_queue_irq_unlock() is >> called twice. But only called once in pl011. >> >> following is the log, >> ===Without my patch=== >> ###PL011### >> >> <4>[ 180.598266] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 >> latch:0 level:1 >> <4>[ 180.604460] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1 >> level:1 >> <4>[ 180.604540] ==>90a0020000000021(active) >> <4>[ 180.614878] ==>d0a0020000000021(P&A) >> <4>[ 180.618415] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 >> latch:0 level:0 >> <4>[ 180.625508] ==>90a0020000000021(active) >> <4>[ 180.629343] ==>10a0020000000021(inactive) >> >> ###mtty-vfio### >> <4>[ 223.123329] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1 >> latch:0 level:1 >> <4>[ 223.129736] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >> level:1 >> <4>[ 223.136027] ==>50a0020000000024(pending) >> <4>[ 223.139954] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >> level:1 >> <4>[ 223.146460] ==>90a0020000000024(active) >> <4>[ 223.150273] ==>d0a0020000000024(P&A) >> <4>[ 223.153827] ==>90a0020000000024(active) >> <4>[ 223.157668] ==>d0a0020000000024(P&A) > > So the line is never lowered. That's very odd. > >> ...........cyclic...
>> >> I remembered in some tests the state change is cyclic P->A->P&A. But it >> seems I cannot reproduce it. Is output LR state >> in kvm_vgic_inject_irq() reliable? >> >> ===With my patch=== >> ###PL011### >> <4>[ 114.798528] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 >> latch:0 level:1 >> <4>[ 114.804743] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1 >> level:1 >> <4>[ 114.804796] ==>90a0020000000021(active) >> <4>[ 114.815077] ==>d0a0020000000021(P&A) >> <4>[ 114.818628] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 >> latch:0 level:0 >> <4>[ 114.825726] ==>90a0020000000021(active) >> <4>[ 114.829560] ==>10a0020000000021(inactive) >> >> ###mtty-vfio### >> >> <4>[ 161.579083] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1 >> latch:0 level:1 >> <4>[ 161.585419] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >> level:1 >> <4>[ 161.591780] ==>50a0020000000024(pending) >> <4>[ 161.595708] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >> level:1 >> <4>[ 161.602204] ==>90a0020000000024(active) >> <4>[ 161.606023] ==>d0a0020000000024(P&A) >> <4>[ 161.609561] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 >> latch:0 level:0 >> <4>[ 161.616693] ==>10a0020000000024(inactive) >> <4>[ 161.620745] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 >> latch:0 level:1 >> <4>[ 161.627800] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >> level:1 >> <4>[ 161.627849] ==>90a0020000000024(active) >> <4>[ 161.640076] ==>d0a0020000000024(P&A) >> <4>[ 161.642689] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 >> latch:0 level:0 >> <4>[ 161.649822] ==>10a0020000000024(inactive) > > Which is really bizarre. The device only lowers the line when it is > being told that the interrupt has been processed. That really smells of > a bug in the device emulation. It should be lowered when the guest > clears the interrupt status at the device level, and not when notified > that the interrupt has been completed at the interrupt controller level.
Not sure I get what you mean. To me, the guest driver may have properly acked the interrupt at the HW level, but this cannot lower the virtual line level. The virtual line level is only set when an interrupt hits and the VFIO IRQ handler signals the irqfd, and only the resamplefd can lower the virtual line level. There is no communication between the VFIO driver and KVM to lower the virtual line level. Note that the resamplefd is also used to unmask the interrupt on the VFIO driver side. Thanks Eric > > Thanks, > > M. > ^ permalink raw reply [flat|nested] 50+ messages in thread
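Eric's description of the irqfd/resamplefd path can be captured as a small state model. This is a hedged sketch under simplifying assumptions: a single level interrupt, no details of the host IRQ flow, and hypothetical names throughout (it is neither KVM nor VFIO code). It shows why acking the interrupt in the device's own registers cannot lower the virtual line by itself; only the resample path does.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the irqfd/resamplefd flow for a level-triggered VFIO
 * interrupt. All names are hypothetical; this is not KVM or VFIO code.
 */
struct level_irq_model {
	bool virt_line;     /* virtual line level seen by the vGIC */
	bool host_masked;   /* VFIO keeps the source masked until resample */
	unsigned resamples; /* number of resample notifications */
};

/* The device asserts its interrupt: unless masked, the VFIO handler
 * masks it and signals the irqfd, which raises the virtual line. */
static void device_asserts(struct level_irq_model *m)
{
	if (m->host_masked)
		return;
	m->host_masked = true;
	m->virt_line = true;
}

/* The guest acks the interrupt in the device's own registers. Nothing on
 * this path reaches KVM, so the virtual line level is left untouched. */
static void guest_acks_at_device(struct level_irq_model *m)
{
	(void)m;
}

/* KVM sees the guest EOI and runs the resample path: the virtual line is
 * lowered, the resamplefd is signalled, and VFIO unmasks the source. */
static void resample_on_eoi(struct level_irq_model *m)
{
	m->virt_line = false;
	m->resamples++;
	m->host_masked = false;
}
```

In this model, if the resample path never runs (for example, because the LR never returns to state 0 and so no EOI is detected), `virt_line` stays high and the source stays masked. That is the same symptom as the "line is never lowered" observation on the mtty log.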
* [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling @ 2018-03-09 13:10 ` Auger Eric 0 siblings, 0 replies; 50+ messages in thread From: Auger Eric @ 2018-03-09 13:10 UTC (permalink / raw) To: linux-arm-kernel Hi Marc, On 09/03/18 10:40, Marc Zyngier wrote: > On 09/03/18 03:14, Yang, Shunyong wrote: > > [trimming things a bit] > >>>>>>>> static bool lr_signals_eoi_mi(u32 lr_val) >>>>>>>> { >>>>>>>> - return !(lr_val & GICH_LR_STATE) && (lr_val & >>>>>>>> GICH_LR_EOI) && >>>>>>>> - !(lr_val & GICH_LR_HW); >>>>>>>> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) >>>>>>>> && >>>>>>> That feels very wrong. You're now signalling the resampling >>>>>>> in both >>>>>>> invalid and pending+active, and the latter state doesn't mean >>>>>>> you've >>>>>>> EOIed anything. You're now over-signalling, and signalling >>>>>>> the >>>>>>> wrong event. >> >> I am using XOR GICH_LR_STATE(0b'11), so only 0b'11(P&A) will be >> signaled. Other state will be false. > > And that's really wrong. P+A is a state where the interrupt is still > being processed. The only case where we can reliably detect that an > interrupt has been EOId is when state==0. > >> And I am curious why the EOI bit in LR indicate the end of interrupt >> regardless of the state? Please bear with me as I am a newbie in this >> part. > > The EOI bit indicates that we've requested a maintenance interrupt from > the HW. It only triggers when state==0. If you have (like you describe > further down) a sequence of > > P -> A -> (exit) -> P+A -> P -> A -> (exit) P+A ... > > we can never reliably detect that an interrupt has been EOId (because > the HW never delivers a maintenance interrupt), other than by tracking > the states before and after exit, and hoping that you've done an exit > because you're touching the source of the interrupt. > >>>>>> Also, any guideline on how to reproduce this would be much >>>>>> appreciated. >>>>>> I never used this mdev/mtty thing, so please bear with me. 
>>>>>> >>>>>> Thanks, >>>>>> >>>>>> M. >> >> The mdev/mtty documentation is at Documentation/vfio-mediated- >> device.txt. It docmented how to enable mtty device. >> And support for "vfio-pci,sysfsdev" should be availabe in your qemu >> version (I compiled the latest version). >> Following is my commond to run qemu with mdev support, >> "qemu-system-aarch64 -m 1024 -cpu host -M virt,gic_version=3 -nographic >> \ >> -kernel /home/yangsy/up-kvm/arch/arm64/boot/Image.gz \ >> -initrd /home/yangsy/kvm/ramdisk/initrd.img \ >> -netdev user,id=eth0 -device virtio-net-device,netdev=eth0 -enable-kvm >> \ >> -append "root=/dev/ram rdinit=/sbin/init" \ >> -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f- >> 3c1e-e6bfe0fa1001 >> " >> For just test this vgic case, type "cat /dev/ttyS0" in guest. But if >> test read/write multiple bytes, please apply following patch also >> https://patchwork.kernel.org/patch/10267039/ > > Thanks. I'll have a look. > >> >>>>>> >>>>>> From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 >>>>>> 00:00:00 2001 >>>>>> From: Marc Zyngier <marc.zyngier@arm.com> >>>>>> Date: Thu, 8 Mar 2018 11:14:06 +0000 >>>>>> Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to >>>>>> guess EOI MI >>>>>> status >>>>>> >>>>>> We so far rely on the LR state to decide whether the guest has >>>>>> EOI'd a level interrupt or not. While this looks like a good >>>>>> idea on the surface, it leads to a couple of annoying corner >>>>>> cases: >>>>>> >>>>>> Example 1: (P = Pending, A = Active, MI = Maintenance >>>>>> Interrupt) >>>>>> P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> >>>>>> MI >>>>> Do we really get an EOI maintenance interrupt here? Reading the >>>>> MISR >>>>> and EISR descriptions make me thing this is not the case... >>> Hum yes in EISR it is said that ICH_LR.State = 0b00! >>>> >>>> >>>> Yeah, it looks like I always want EISR to do what I want, and not >>>> to >>>> do what it does. 
Man, this thing is such a piece of crap. >>>> >>>> OK, scratch that. We need to do it without the help of the HW. >> >> If convenient, maybe we can get something from HW guys. :-) >> >> Hi, Marc, >> >> Do you need me to test the patch you posted for EISR? As it seems there >> are some things need more discussion. > > Yeah, that approach doesn't work. I'll try and come up with another > approach (basically banning P+A for interrupts that require a back > notification). > > [...] > >> I have added some logs to compare level interrupt between pl011(hwirq = >> 33) and mtty (hwirq = 36). In mtty case, vgic_queue_irq_unlock() is >> called twice. But only called once in pl011. >> >> following is the log, >> ===Without my patch=== >> ###PL011### >> >> <4>[ 180.598266] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 >> latch:0 level:1 >> <4>[ 180.604460] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1 >> level:1 >> <4>[ 180.604540] ==>90a0020000000021(active) >> <4>[ 180.614878] ==>d0a0020000000021(P&A) >> <4>[ 180.618415] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 >> latch:0 level:0 >> <4>[ 180.625508] ==>90a0020000000021(active) >> <4>[ 180.629343] ==>10a0020000000021(inactive) >> >> ###mtty-vfio### >> <4>[ 223.123329] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1 >> latch:0 level:1 >> <4>[ 223.129736] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >> level:1 >> <4>[ 223.136027] ==>50a0020000000024(pending) >> <4>[ 223.139954] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >> level:1 >> <4>[ 223.146460] ==>90a0020000000024(active) >> <4>[ 223.150273] ==>d0a0020000000024(P&A) >> <4>[ 223.153827] ==>90a0020000000024(active) >> <4>[ 223.157668] ==>d0a0020000000024(P&A) > > So the line is never lowered. That's very odd. > >> ...........cyclic... >> >> I remembered in some tests the state change is cyclic P->A->P&A. But it >> seems I cannot reproduce it. Is output LR state >> in kvm_vgic_inject_irq() reliable?
>> >> ===With my patch=== >> ###PL011### >> <4>[ 114.798528] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 >> latch:0 level:1 >> <4>[ 114.804743] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1 >> level:1 >> <4>[ 114.804796] ==>90a0020000000021(active) >> <4>[ 114.815077] ==>d0a0020000000021(P&A) >> <4>[ 114.818628] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 >> latch:0 level:0 >> <4>[ 114.825726] ==>90a0020000000021(active) >> <4>[ 114.829560] ==>10a0020000000021(inactive) >> >> ###mtty-vfio### >> >> <4>[ 161.579083] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1 >> latch:0 level:1 >> <4>[ 161.585419] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >> level:1 >> <4>[ 161.591780] ==>50a0020000000024(pending) >> <4>[ 161.595708] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >> level:1 >> <4>[ 161.602204] ==>90a0020000000024(active) >> <4>[ 161.606023] ==>d0a0020000000024(P&A) >> <4>[ 161.609561] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 >> latch:0 level:0 >> <4>[ 161.616693] ==>10a0020000000024(inactive) >> <4>[ 161.620745] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 >> latch:0 level:1 >> <4>[ 161.627800] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >> level:1 >> <4>[ 161.627849] ==>90a0020000000024(active) >> <4>[ 161.640076] ==>d0a0020000000024(P&A) >> <4>[ 161.642689] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 >> latch:0 level:0 >> <4>[ 161.649822] ==>10a0020000000024(inactive) > > Which is really bizarre. The device only lowers the line when it is > being told that the interrupt has been processed. That really smells of > a bug in the device emulation. It should be lowered when the guest > clears the interrupt status at the device level, and not when notified > that the interrupt has been completed at the interrupt controller level. Not sure I get what you mean. To me the guest driver may have properly acked the interrupt at HW level. But this cannot lower the virtual line level. 
The virtual line level is only set when an interrupt hits and the VFIO irq handler signals the irqfd. Only the resamplefd can lower the virtual line level. There is no communication between the VFIO driver and KVM to lower the virtual line level. Note that the resamplefd is also used to unmask the interrupt on the VFIO driver side. Thanks Eric > > Thanks, > > M. >
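The LR values in the logs above appear to be 64-bit GICv3 ICH_LR<n>_EL2 contents (the guest runs with gic_version=3), so both versions of lr_signals_eoi_mi() can be checked against them directly. The following is a minimal user-space sketch, not kernel code: the ICH_LR_* bit positions are assumed to match the kernel's definitions in include/linux/irqchip/arm-gic-v3.h, and the helper names (lr_get_state, lr_intid, eoi_mi_current, eoi_mi_rfc) are hypothetical. The GICv2 GICH_LR_* variant follows the same logic on a 32-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * GICv3 list register fields, assumed to mirror the kernel's ICH_LR_*
 * definitions. Bit 60 (group) is why the "inactive" dump still starts
 * with 0x1.
 */
#define ICH_LR_VIRTUAL_ID_MASK ((1ULL << 32) - 1)
#define ICH_LR_EOI             (1ULL << 41)
#define ICH_LR_GROUP           (1ULL << 60)
#define ICH_LR_HW              (1ULL << 61)
#define ICH_LR_STATE           (3ULL << 62)
#define ICH_LR_PENDING_BIT     (1ULL << 62)
#define ICH_LR_ACTIVE_BIT      (1ULL << 63)

static_assert(ICH_LR_STATE == (ICH_LR_PENDING_BIT | ICH_LR_ACTIVE_BIT),
	      "the State field is the pending and active bits together");

/* Values of the 2-bit State field (0 = invalid/inactive). */
enum lr_state { LR_INACTIVE = 0, LR_PENDING = 1, LR_ACTIVE = 2, LR_PENDING_ACTIVE = 3 };

static enum lr_state lr_get_state(uint64_t lr)
{
	return (enum lr_state)((lr & ICH_LR_STATE) >> 62);
}

static uint32_t lr_intid(uint64_t lr)
{
	return (uint32_t)(lr & ICH_LR_VIRTUAL_ID_MASK);
}

/* Existing test: EOI MI requested, State == 0, not a HW interrupt. */
static bool eoi_mi_current(uint64_t lr)
{
	return !(lr & ICH_LR_STATE) && (lr & ICH_LR_EOI) && !(lr & ICH_LR_HW);
}

/* RFC test: the XOR is zero only when both State bits are set, i.e. P+A. */
static bool eoi_mi_rfc(uint64_t lr)
{
	return !((lr & ICH_LR_STATE) ^ ICH_LR_STATE) &&
	       (lr & ICH_LR_EOI) && !(lr & ICH_LR_HW);
}
```

Decoded this way, the mtty trace for INTID 36 goes P (0x50a0020000000024), then A, then P+A, with the EOI bit (bit 41) set in every logged value. Among those values, the current predicate is true only for the inactive one, while the RFC predicate is true only for P+A, which Marc describes as a state where the interrupt is still being processed.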
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-09 13:10 ` Auger Eric @ 2018-03-09 13:37 ` Marc Zyngier -1 siblings, 0 replies; 50+ messages in thread From: Marc Zyngier @ 2018-03-09 13:37 UTC (permalink / raw) To: Auger Eric, Yang, Shunyong, cdall Cc: ard.biesheuvel, david.daney, will.deacon, linux-kernel, Zheng, Joey, kvmarm, linux-arm-kernel On 09/03/18 13:10, Auger Eric wrote: > Hi Marc, > > On 09/03/18 10:40, Marc Zyngier wrote: >> On 09/03/18 03:14, Yang, Shunyong wrote: >> >> [trimming things a bit] >> >>>>>>>>> static bool lr_signals_eoi_mi(u32 lr_val) >>>>>>>>> { >>>>>>>>> - return !(lr_val & GICH_LR_STATE) && (lr_val & >>>>>>>>> GICH_LR_EOI) && >>>>>>>>> - !(lr_val & GICH_LR_HW); >>>>>>>>> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) >>>>>>>>> && >>>>>>>> That feels very wrong. You're now signalling the resampling >>>>>>>> in both >>>>>>>> invalid and pending+active, and the latter state doesn't mean >>>>>>>> you've >>>>>>>> EOIed anything. You're now over-signalling, and signalling >>>>>>>> the >>>>>>>> wrong event. >>> >>> I am using XOR GICH_LR_STATE(0b'11), so only 0b'11(P&A) will be >>> signaled. Other state will be false. >> >> And that's really wrong. P+A is a state where the interrupt is still >> being processed. The only case where we can reliably detect that an >> interrupt has been EOId is when state==0. >> >>> And I am curious why the EOI bit in LR indicate the end of interrupt >>> regardless of the state? Please bear with me as I am a newbie in this >>> part. >> >> The EOI bit indicates that we've requested a maintenance interrupt from >> the HW. It only triggers when state==0. If you have (like you describe >> further down) a sequence of >> >> P -> A -> (exit) -> P+A -> P -> A -> (exit) P+A ... 
>> >> we can never reliably detect that an interrupt has been EOId (because >> the HW never delivers a maintenance interrupt), other than by tracking >> the states before and after exit, and hoping that you've done an exit >> because you're touching the source of the interrupt. >> >>>>>>> Also, any guideline on how to reproduce this would be much >>>>>>> appreciated. >>>>>>> I never used this mdev/mtty thing, so please bear with me. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> M. >>> >>> The mdev/mtty documentation is at Documentation/vfio-mediated- >>> device.txt. It documents how to enable mtty device. >>> And support for "vfio-pci,sysfsdev" should be available in your qemu >>> version (I compiled the latest version). >>> Following is my command to run qemu with mdev support, >>> "qemu-system-aarch64 -m 1024 -cpu host -M virt,gic_version=3 -nographic >>> \ >>> -kernel /home/yangsy/up-kvm/arch/arm64/boot/Image.gz \ >>> -initrd /home/yangsy/kvm/ramdisk/initrd.img \ >>> -netdev user,id=eth0 -device virtio-net-device,netdev=eth0 -enable-kvm >>> \ >>> -append "root=/dev/ram rdinit=/sbin/init" \ >>> -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f- >>> 3c1e-e6bfe0fa1001 >>> " >>> For just test this vgic case, type "cat /dev/ttyS0" in guest. But if >>> test read/write multiple bytes, please apply following patch also >>> https://patchwork.kernel.org/patch/10267039/ >> >> Thanks. I'll have a look. >> >>> >>>>>>> >>>>>>> From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 >>>>>>> 00:00:00 2001 >>>>>>> From: Marc Zyngier <marc.zyngier@arm.com> >>>>>>> Date: Thu, 8 Mar 2018 11:14:06 +0000 >>>>>>> Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to >>>>>>> guess EOI MI >>>>>>> status >>>>>>> >>>>>>> We so far rely on the LR state to decide whether the guest has >>>>>>> EOI'd a level interrupt or not.
While this looks like a good >>>>>>> idea on the surface, it leads to a couple of annoying corner >>>>>>> cases: >>>>>>> >>>>>>> Example 1: (P = Pending, A = Active, MI = Maintenance >>>>>>> Interrupt) >>>>>>> P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> >>>>>>> MI >>>>>> Do we really get an EOI maintenance interrupt here? Reading the >>>>>> MISR >>>>>> and EISR descriptions make me think this is not the case... >>>> Hum yes in EISR it is said that ICH_LR.State = 0b00! >>>>> >>>>> >>>>> Yeah, it looks like I always want EISR to do what I want, and not >>>>> to >>>>> do what it does. Man, this thing is such a piece of crap. >>>>> >>>>> OK, scratch that. We need to do it without the help of the HW. >>> >>> If convenient, maybe we can get something from HW guys. :-) >>> >>> Hi, Marc, >>> >>> Do you need me to test the patch you posted for EISR? As it seems there >>> are some things need more discussion. >> >> Yeah, that approach doesn't work. I'll try and come up with another >> approach (basically banning P+A for interrupts that require a back >> notification). >> >> [...] >> >>> I have added some logs to compare level interrupt between pl011(hwirq = >>> 33) and mtty (hwirq = 36). In mtty case, vgic_queue_irq_unlock() is >>> called twice. But only called once in pl011.
>>> >>> following is the log, >>> ===Without my patch=== >>> ###PL011### >>> >>> <4>[ 180.598266] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 >>> latch:0 level:1 >>> <4>[ 180.604460] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1 >>> level:1 >>> <4>[ 180.604540] ==>90a0020000000021(active) >>> <4>[ 180.614878] ==>d0a0020000000021(P&A) >>> <4>[ 180.618415] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 >>> latch:0 level:0 >>> <4>[ 180.625508] ==>90a0020000000021(active) >>> <4>[ 180.629343] ==>10a0020000000021(inactive) >>> >>> ###mtty-vfio### >>> <4>[ 223.123329] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1 >>> latch:0 level:1 >>> <4>[ 223.129736] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >>> level:1 >>> <4>[ 223.136027] ==>50a0020000000024(pending) >>> <4>[ 223.139954] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >>> level:1 >>> <4>[ 223.146460] ==>90a0020000000024(active) >>> <4>[ 223.150273] ==>d0a0020000000024(P&A) >>> <4>[ 223.153827] ==>90a0020000000024(active) >>> <4>[ 223.157668] ==>d0a0020000000024(P&A) >> >> So the line is never lowered. That's very odd. >> >>> ...........cyclic... >>> >>> I remembered in some tests the state change is cyclic P->A->P&A. But it >>> seems I cannot reproduce it. Is output LR state >>> in kvm_vgic_inject_irq() reliable?
>>> >>> ===With my patch=== >>> ###PL011### >>> <4>[ 114.798528] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 >>> latch:0 level:1 >>> <4>[ 114.804743] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1 >>> level:1 >>> <4>[ 114.804796] ==>90a0020000000021(active) >>> <4>[ 114.815077] ==>d0a0020000000021(P&A) >>> <4>[ 114.818628] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1 >>> latch:0 level:0 >>> <4>[ 114.825726] ==>90a0020000000021(active) >>> <4>[ 114.829560] ==>10a0020000000021(inactive) >>> >>> ###mtty-vfio### >>> >>> <4>[ 161.579083] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1 >>> latch:0 level:1 >>> <4>[ 161.585419] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >>> level:1 >>> <4>[ 161.591780] ==>50a0020000000024(pending) >>> <4>[ 161.595708] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >>> level:1 >>> <4>[ 161.602204] ==>90a0020000000024(active) >>> <4>[ 161.606023] ==>d0a0020000000024(P&A) >>> <4>[ 161.609561] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 >>> latch:0 level:0 >>> <4>[ 161.616693] ==>10a0020000000024(inactive) >>> <4>[ 161.620745] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 >>> latch:0 level:1 >>> <4>[ 161.627800] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1 >>> level:1 >>> <4>[ 161.627849] ==>90a0020000000024(active) >>> <4>[ 161.640076] ==>d0a0020000000024(P&A) >>> <4>[ 161.642689] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1 >>> latch:0 level:0 >>> <4>[ 161.649822] ==>10a0020000000024(inactive) >> >> Which is really bizarre. The device only lowers the line when it is >> being told that the interrupt has been processed. That really smells of >> a bug in the device emulation. It should be lowered when the guest >> clears the interrupt status at the device level, and not when notified >> that the interrupt has been completed at the interrupt controller level. > Not sure I get what you mean. To me the guest driver may have properly > acked the interrupt at HW level. 
But this cannot lower the virtual line > level. Why? How? If the guest has indeed talked to the device, where is the trap? How come there is no lowering of the line? That's not how level interrupts are modelled, which is what we're supposed to deal with here. > The virtual line level is only set when an interrupt hits and the > VFIO irq handler signals the irqfd. Only the resamplefd can lower the > virtual line level. There is no communication between the VFIO driver > and KVM to lower the virtual line level. Note that the resamplefd is also > used to unmask the interrupt on the VFIO driver side. Then this is not a level interrupt. This is some VFIO-specific mechanism that uses interrupts as a signalling mechanism, and breaks the reasonable expectations of the guest. For example: - Interrupt fires - guest acks the interrupt at the device level - guest reads the pending state on the GIC. At that point, the guest will find that the irq is still pending, which is in total violation of the interrupt model. What we have here seems to be some bizarre "level with latch until EOIed", which doesn't exist in the architecture. Even worse, we're not able to describe it to the guest (neither DT nor ACPI describes this model). Oh well... M. -- Jazz is not dead. It just smells funny...
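Marc's three-step example can be made concrete with a toy model that contrasts a conventional level-sensitive source, which deasserts as soon as the guest clears the condition at the device, with the "level with latch until EOIed" behaviour described for the irqfd/resamplefd path. This is a hypothetical illustration only: struct vline and its helpers are invented for the sketch, and it assumes that the pending state the guest reads back from the GIC simply follows the virtual line level.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model, not KVM or VFIO code. "level" is the virtual
 * line level seen by the vGIC; "latch_until_eoi" selects the irqfd/
 * resamplefd-style behaviour where only the resample lowers the line.
 */
struct vline {
	bool level;
	bool latch_until_eoi;
};

static void device_raises_line(struct vline *l)
{
	l->level = true;
}

/* The guest clears the interrupt condition at the device level. */
static void guest_acks_device(struct vline *l)
{
	if (!l->latch_until_eoi)
		l->level = false;	/* a true level source deasserts here */
}

/* The guest EOI is reported on the resamplefd, which lowers the line. */
static void resample(struct vline *l)
{
	l->level = false;
}

/* Assumption: the pending state read from the GIC follows the level. */
static bool gic_reads_pending(const struct vline *l)
{
	return l->level;
}

/* Interrupt fires, guest acks it at the device, then reads the GIC. */
static bool guest_sees_pending(bool latch_until_eoi, bool resampled)
{
	struct vline l = { .level = false, .latch_until_eoi = latch_until_eoi };

	device_raises_line(&l);
	guest_acks_device(&l);
	if (resampled)
		resample(&l);
	return gic_reads_pending(&l);
}
```

With a conventional level source, guest_sees_pending(false, false) is false; with the latched model, guest_sees_pending(true, false) is true, which is the "still pending after the device was acked" situation Marc objects to. Only the resample, i.e. guest_sees_pending(true, true), clears it.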
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-08 18:12 ` Auger Eric @ 2018-03-09 9:12 ` Marc Zyngier -1 siblings, 0 replies; 50+ messages in thread From: Marc Zyngier @ 2018-03-09 9:12 UTC (permalink / raw) To: Auger Eric, Christoffer Dall Cc: Shunyong Yang, ard.biesheuvel, will.deacon, david.daney, linux-arm-kernel, kvmarm, linux-kernel, Joey Zheng On 08/03/18 18:12, Auger Eric wrote: > Hi Marc, Christoffer, > > On 08/03/18 18:28, Marc Zyngier wrote: >> On Thu, 08 Mar 2018 16:19:00 +0000, >> Christoffer Dall wrote: >>> >>> On Thu, Mar 08, 2018 at 11:54:27AM +0000, Marc Zyngier wrote: >>>> On 08/03/18 09:49, Marc Zyngier wrote: >>>>> [updated Christoffer's email address] >>>>> >>>>> Hi Shunyong, >>>>> >>>>> On 08/03/18 07:01, Shunyong Yang wrote: >>>>>> When resampling irqfds is enabled, level interrupt should be >>>>>> de-asserted when resampling happens. On page 4-47 of GIC v3 >>>>>> specification IHI0069D, it said, >>>>>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU >>>>>> interface, the IRI changes the status of the interrupt to active >>>>>> and pending if: >>>>>> • It is an edge-triggered interrupt, and another edge has been >>>>>> detected since the interrupt was acknowledged. >>>>>> • It is a level-sensitive interrupt, and the level has not been >>>>>> deasserted since the interrupt was acknowledged." >>>>>> >>>>>> GIC v2 specification IHI0048B.b has similar description on page >>>>>> 3-42 for state machine transition. >>>>>> >>>>>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver >>>>>> in samples/vfio-mdev) triggers a level interrupt, the status >>>>>> transition in LR is pending-->active-->active and pending. >>>>>> Then it will wait resampling to de-assert the interrupt. >>>>>> >>>>>> Current design of lr_signals_eoi_mi() will return false if state >>>>>> in LR is not invalid(Inactive). It causes resampling will not happen >>>>>> in mtty case. 
>>>>> >>>>> Let me rephrase this, and tell me if I understood it correctly: >>>>> >>>>> - A level interrupt is injected, activated by the guest (LR state=active) >>>>> - guest exits, re-enters, (LR state=pending+active) >>>>> - guest EOIs the interrupt (LR state=pending) >>>>> - maintenance interrupt >>>>> - we don't signal the resampling because we're not in an invalid state >>>>> >>>>> Is that correct? >>>>> >>>>> That's an interesting case, because it seems to invalidate some of the >>>>> optimization that went in over a year ago. >>>>> >>>>> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields >>>>> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state >>>>> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation >>>>> >>>>> We could compare the value of the LR before the guest entry with >>>>> the value at exit time, but we still could miss it if we have a >>>>> transition such as P+A -> P -> A and assume a long enough propagation >>>>> delay for the maintenance interrupt (which is very likely). >>>>> >>>>> In essence, we have lost the benefit of EISR, which was to give us a >>>>> way to deal with asynchronous signalling. >>>>> >>>>>> >>>>>> This will cause interrupt fired continuously to guest even 8250 IIR >>>>>> has no interrupt. When 8250's interrupt is configured in shared mode, >>>>>> it will pass interrupt to other drivers to handle. However, there >>>>>> is no other driver involved. Then, a "nobody cared" kernel complaint >>>>>> occurs. 
>>>>>> >>>>>> / # cat /dev/ttyS0 >>>>>> [ 4.826836] random: crng init done >>>>>> [ 6.373620] irq 41: nobody cared (try booting with the "irqpoll" >>>>>> option) >>>>>> [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4 >>>>>> [ 6.378927] Hardware name: linux,dummy-virt (DT) >>>>>> [ 6.380876] Call trace: >>>>>> [ 6.381937] dump_backtrace+0x0/0x180 >>>>>> [ 6.383495] show_stack+0x14/0x1c >>>>>> [ 6.384902] dump_stack+0x90/0xb4 >>>>>> [ 6.386312] __report_bad_irq+0x38/0xe0 >>>>>> [ 6.387944] note_interrupt+0x1f4/0x2b8 >>>>>> [ 6.389568] handle_irq_event_percpu+0x54/0x7c >>>>>> [ 6.391433] handle_irq_event+0x44/0x74 >>>>>> [ 6.393056] handle_fasteoi_irq+0x9c/0x154 >>>>>> [ 6.394784] generic_handle_irq+0x24/0x38 >>>>>> [ 6.396483] __handle_domain_irq+0x60/0xb4 >>>>>> [ 6.398207] gic_handle_irq+0x98/0x1b0 >>>>>> [ 6.399796] el1_irq+0xb0/0x128 >>>>>> [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40 >>>>>> [ 6.403149] __setup_irq+0x41c/0x678 >>>>>> [ 6.404669] request_threaded_irq+0xe0/0x190 >>>>>> [ 6.406474] univ8250_setup_irq+0x208/0x234 >>>>>> [ 6.408250] serial8250_do_startup+0x1b4/0x754 >>>>>> [ 6.410123] serial8250_startup+0x20/0x28 >>>>>> [ 6.411826] uart_startup.part.21+0x78/0x144 >>>>>> [ 6.413633] uart_port_activate+0x50/0x68 >>>>>> [ 6.415328] tty_port_open+0x84/0xd4 >>>>>> [ 6.416851] uart_open+0x34/0x44 >>>>>> [ 6.418229] tty_open+0xec/0x3c8 >>>>>> [ 6.419610] chrdev_open+0xb0/0x198 >>>>>> [ 6.421093] do_dentry_open+0x200/0x310 >>>>>> [ 6.422714] vfs_open+0x54/0x84 >>>>>> [ 6.424054] path_openat+0x2dc/0xf04 >>>>>> [ 6.425569] do_filp_open+0x68/0xd8 >>>>>> [ 6.427044] do_sys_open+0x16c/0x224 >>>>>> [ 6.428563] SyS_openat+0x10/0x18 >>>>>> [ 6.429972] el0_svc_naked+0x30/0x34 >>>>>> [ 6.431494] handlers: >>>>>> [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt >>>>>> [ 6.434597] Disabling IRQ #41 >>>>>> >>>>>> This patch changes the lr state condition in lr_signals_eoi_mi() from >>>>>> invalid(Inactive) to active and pending to avoid this. 
>>>>>> >>>>>> I am not sure about the original design of the condition of >>>>>> invalid(active). So, This RFC is sent out for comments. >>>>>> >>>>>> Cc: Joey Zheng <yu.zheng@hxt-semitech.com> >>>>>> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> >>>>>> --- >>>>>> virt/kvm/arm/vgic/vgic-v2.c | 4 ++-- >>>>>> virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- >>>>>> 2 files changed, 4 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c >>>>>> index e9d840a75e7b..740ee9a5f551 100644 >>>>>> --- a/virt/kvm/arm/vgic/vgic-v2.c >>>>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c >>>>>> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) >>>>>> >>>>>> static bool lr_signals_eoi_mi(u32 lr_val) >>>>>> { >>>>>> - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && >>>>>> - !(lr_val & GICH_LR_HW); >>>>>> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) && >>>>> >>>>> That feels very wrong. You're now signalling the resampling in both >>>>> invalid and pending+active, and the latter state doesn't mean you've >>>>> EOIed anything. You're now over-signalling, and signalling the >>>>> wrong event. 
>>>>> >>>>>> + (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); >>>>>> } >>>>>> >>>>>> /* >>>>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c >>>>>> index 6b329414e57a..43111bba7af9 100644 >>>>>> --- a/virt/kvm/arm/vgic/vgic-v3.c >>>>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c >>>>>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) >>>>>> >>>>>> static bool lr_signals_eoi_mi(u64 lr_val) >>>>>> { >>>>>> - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && >>>>>> - !(lr_val & ICH_LR_HW); >>>>>> + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) && >>>>>> + (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); >>>>>> } >>>>>> >>>>>> void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) >>>>>> >>>>> >>>>> Assuming I understand the issue correctly, I cannot really see how >>>>> to solve this without reintroducing EISR, which sucks majorly. >>>>> >>>>> I'll try to cook something shortly and we can all have a good >>>>> fight about how crap this is. >>>> >>>> Here's what I came up with. I don't really like it, but that's >>>> the least invasive thing I could come up with. Please let me >>>> know if that helps with your test case. Note that I have only >>>> boot-tested this on a sample of 1 machine, so I don't expect this >>>> to be perfect. >>>> >>>> Also, any guideline on how to reproduce this would be much appreciated. >>>> I never used this mdev/mtty thing, so please bear with me. >>>> >>>> Thanks, >>>> >>>> M. >>>> >>>> From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 00:00:00 2001 >>>> From: Marc Zyngier <marc.zyngier@arm.com> >>>> Date: Thu, 8 Mar 2018 11:14:06 +0000 >>>> Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to guess EOI MI >>>> status >>>> >>>> We so far rely on the LR state to decide whether the guest has >>>> EOI'd a level interrupt or not.
While this looks like a good >>>> idea on the surface, it leads to a couple of annoying corner >>>> cases: >>>> >>>> Example 1: (P = Pending, A = Active, MI = Maintenance Interrupt) >>>> P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> MI >>> >>> Do we really get an EOI maintenance interrupt here? Reading the MISR >>> and EISR descriptions make me think this is not the case... > > Hum yes in EISR it is said that ICH_LR.State = 0b00! >> >> Yeah, it looks like I always want EISR to do what I want, and not to >> do what it does. Man, this thing is such a piece of crap. >> >> OK, scratch that. We need to do it without the help of the HW. >> >>>> The state is now pending, we've really EOI'd the interrupt, and >>>> yet lr_signals_eoi_mi() returns false, since the state is not 0. >>>> The result is that we won't signal anything on the corresponding >>>> irqfd, which people complain about. Meh. >>> >>> So the core of the problem is that when we've entered the guest with >>> PENDING+ACTIVE and when we exit (for some reason) we don't signal the >>> resamplefd, right? The solution seems to me that we don't ever do >>> PENDING+ACTIVE if you need to resample after each deactivate. What >>> would be the point of appending a pending state that you only know to be >>> valid after a resample anyway? >> >> The question is then to identify that a given source needs to be >> signalled back to VFIO. Calling into the eventfd code on the hot path >> is pretty horrid (I'm not sure if we can really call into this with >> interrupts disabled, for example). >> >>> >>>> >>>> Example 2: >>>> P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI fires >>> >>> We could be more clever and do the following calculation on every exit: >>> >>> If you enter with P, and exit with either A or 0, then signal. >>> >>> If you enter with P+A, and you exit with either P, A, or 0, then signal. >>> >>> Wouldn't that also solve it? (Although I have a feeling you'd miss some
(Although I have a feeling you'd miss some >>> exits in this case). >> >> I'd be more confident if we did forbid P+A for such interrupts >> altogether, as they really feel like another kind of HW interrupt. > > the LR P+A looks strange to me too. all the more so it may cause the > same IRQ to be acked twice? If the pending bit isn't dropped by the time we get to EOI the first one, probably. But that's pretty much expected with a level interrupt, isn't it? > P -> A -> 0 (resample). Doesn't our issue come from the fact we reinject > the P in LR until the line level is deasserted? Which is consistent with the life cycle of a level interrupt. What usually happens is (for a non HW interrupt): P -> IAR -> A -> lower the line in the device -> 0 If you generate an exit at the right spot, and yet don't lower the line, you end up with: P -> IAR -> A -> exit/enter -> P+A From there, if you lower the line, it is likely to cause an exit: P+A -> MMIO trap lowering the line -> A >> >> Eric: Is there any way to get a callback from the eventfd code to flag >> a given irq as requiring a notification on EOI? > > bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned > pin) was used in the past. I think it does what you want. > Not exactly. I'm very reluctant to call this on the hot path (I'd need the info on hw_flush), and I'd rather have a callback from the eventfd subsystem to tell me when a pin is being associated with a notifier (because this is likely to be very rare). If that doesn't exist, never mind. We can see if that solves Shunyong's issue and optimize later. M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-09 9:12 ` Marc Zyngier @ 2018-03-09 13:18 ` Auger Eric -1 siblings, 0 replies; 50+ messages in thread From: Auger Eric @ 2018-03-09 13:18 UTC (permalink / raw) To: Marc Zyngier, Christoffer Dall Cc: Shunyong Yang, ard.biesheuvel, will.deacon, david.daney, linux-arm-kernel, kvmarm, linux-kernel, Joey Zheng Hi Marc, On 09/03/18 10:12, Marc Zyngier wrote: > On 08/03/18 18:12, Auger Eric wrote: >> Hi Marc, Christoffer, >> >> On 08/03/18 18:28, Marc Zyngier wrote: >>> On Thu, 08 Mar 2018 16:19:00 +0000, >>> Christoffer Dall wrote: >>>> >>>> On Thu, Mar 08, 2018 at 11:54:27AM +0000, Marc Zyngier wrote: >>>>> On 08/03/18 09:49, Marc Zyngier wrote: >>>>>> [updated Christoffer's email address] >>>>>> >>>>>> Hi Shunyong, >>>>>> >>>>>> On 08/03/18 07:01, Shunyong Yang wrote: >>>>>>> When resampling irqfds is enabled, level interrupt should be >>>>>>> de-asserted when resampling happens. On page 4-47 of GIC v3 >>>>>>> specification IHI0069D, it said, >>>>>>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU >>>>>>> interface, the IRI changes the status of the interrupt to active >>>>>>> and pending if: >>>>>>> • It is an edge-triggered interrupt, and another edge has been >>>>>>> detected since the interrupt was acknowledged. >>>>>>> • It is a level-sensitive interrupt, and the level has not been >>>>>>> deasserted since the interrupt was acknowledged." >>>>>>> >>>>>>> GIC v2 specification IHI0048B.b has similar description on page >>>>>>> 3-42 for state machine transition. >>>>>>> >>>>>>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver >>>>>>> in samples/vfio-mdev) triggers a level interrupt, the status >>>>>>> transition in LR is pending-->active-->active and pending. >>>>>>> Then it will wait resampling to de-assert the interrupt. >>>>>>> >>>>>>> Current design of lr_signals_eoi_mi() will return false if state >>>>>>> in LR is not invalid(Inactive). 
It causes resampling will not happen >>>>>>> in mtty case. >>>>>> >>>>>> Let me rephrase this, and tell me if I understood it correctly: >>>>>> >>>>>> - A level interrupt is injected, activated by the guest (LR state=active) >>>>>> - guest exits, re-enters, (LR state=pending+active) >>>>>> - guest EOIs the interrupt (LR state=pending) >>>>>> - maintenance interrupt >>>>>> - we don't signal the resampling because we're not in an invalid state >>>>>> >>>>>> Is that correct? >>>>>> >>>>>> That's an interesting case, because it seems to invalidate some of the >>>>>> optimization that went in over a year ago. >>>>>> >>>>>> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields >>>>>> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state >>>>>> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation >>>>>> >>>>>> We could compare the value of the LR before the guest entry with >>>>>> the value at exit time, but we still could miss it if we have a >>>>>> transition such as P+A -> P -> A and assume a long enough propagation >>>>>> delay for the maintenance interrupt (which is very likely). >>>>>> >>>>>> In essence, we have lost the benefit of EISR, which was to give us a >>>>>> way to deal with asynchronous signalling. >>>>>> >>>>>>> >>>>>>> This will cause interrupt fired continuously to guest even 8250 IIR >>>>>>> has no interrupt. When 8250's interrupt is configured in shared mode, >>>>>>> it will pass interrupt to other drivers to handle. However, there >>>>>>> is no other driver involved. Then, a "nobody cared" kernel complaint >>>>>>> occurs. 
>>>>>>> >>>>>>> / # cat /dev/ttyS0 >>>>>>> [ 4.826836] random: crng init done >>>>>>> [ 6.373620] irq 41: nobody cared (try booting with the "irqpoll" >>>>>>> option) >>>>>>> [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4 >>>>>>> [ 6.378927] Hardware name: linux,dummy-virt (DT) >>>>>>> [ 6.380876] Call trace: >>>>>>> [ 6.381937] dump_backtrace+0x0/0x180 >>>>>>> [ 6.383495] show_stack+0x14/0x1c >>>>>>> [ 6.384902] dump_stack+0x90/0xb4 >>>>>>> [ 6.386312] __report_bad_irq+0x38/0xe0 >>>>>>> [ 6.387944] note_interrupt+0x1f4/0x2b8 >>>>>>> [ 6.389568] handle_irq_event_percpu+0x54/0x7c >>>>>>> [ 6.391433] handle_irq_event+0x44/0x74 >>>>>>> [ 6.393056] handle_fasteoi_irq+0x9c/0x154 >>>>>>> [ 6.394784] generic_handle_irq+0x24/0x38 >>>>>>> [ 6.396483] __handle_domain_irq+0x60/0xb4 >>>>>>> [ 6.398207] gic_handle_irq+0x98/0x1b0 >>>>>>> [ 6.399796] el1_irq+0xb0/0x128 >>>>>>> [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40 >>>>>>> [ 6.403149] __setup_irq+0x41c/0x678 >>>>>>> [ 6.404669] request_threaded_irq+0xe0/0x190 >>>>>>> [ 6.406474] univ8250_setup_irq+0x208/0x234 >>>>>>> [ 6.408250] serial8250_do_startup+0x1b4/0x754 >>>>>>> [ 6.410123] serial8250_startup+0x20/0x28 >>>>>>> [ 6.411826] uart_startup.part.21+0x78/0x144 >>>>>>> [ 6.413633] uart_port_activate+0x50/0x68 >>>>>>> [ 6.415328] tty_port_open+0x84/0xd4 >>>>>>> [ 6.416851] uart_open+0x34/0x44 >>>>>>> [ 6.418229] tty_open+0xec/0x3c8 >>>>>>> [ 6.419610] chrdev_open+0xb0/0x198 >>>>>>> [ 6.421093] do_dentry_open+0x200/0x310 >>>>>>> [ 6.422714] vfs_open+0x54/0x84 >>>>>>> [ 6.424054] path_openat+0x2dc/0xf04 >>>>>>> [ 6.425569] do_filp_open+0x68/0xd8 >>>>>>> [ 6.427044] do_sys_open+0x16c/0x224 >>>>>>> [ 6.428563] SyS_openat+0x10/0x18 >>>>>>> [ 6.429972] el0_svc_naked+0x30/0x34 >>>>>>> [ 6.431494] handlers: >>>>>>> [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt >>>>>>> [ 6.434597] Disabling IRQ #41 >>>>>>> >>>>>>> This patch changes the lr state condition in lr_signals_eoi_mi() from >>>>>>> 
invalid(Inactive) to active and pending to avoid this. >>>>>>> >>>>>>> I am not sure about the original design of the condition of >>>>>>> invalid(active). So, This RFC is sent out for comments. >>>>>>> >>>>>>> Cc: Joey Zheng <yu.zheng@hxt-semitech.com> >>>>>>> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> >>>>>>> --- >>>>>>> virt/kvm/arm/vgic/vgic-v2.c | 4 ++-- >>>>>>> virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- >>>>>>> 2 files changed, 4 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c >>>>>>> index e9d840a75e7b..740ee9a5f551 100644 >>>>>>> --- a/virt/kvm/arm/vgic/vgic-v2.c >>>>>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c >>>>>>> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) >>>>>>> >>>>>>> static bool lr_signals_eoi_mi(u32 lr_val) >>>>>>> { >>>>>>> - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && >>>>>>> - !(lr_val & GICH_LR_HW); >>>>>>> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) && >>>>>> >>>>>> That feels very wrong. You're now signalling the resampling in both >>>>>> invalid and pending+active, and the latter state doesn't mean you've >>>>>> EOIed anything. You're now over-signalling, and signalling the >>>>>> wrong event. 
>>>>>> >>>>>> >>>>>>> + (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); >>>>>>> } >>>>>>> >>>>>>> /* >>>>>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c >>>>>>> index 6b329414e57a..43111bba7af9 100644 >>>>>>> --- a/virt/kvm/arm/vgic/vgic-v3.c >>>>>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c >>>>>>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) >>>>>>> >>>>>>> static bool lr_signals_eoi_mi(u64 lr_val) >>>>>>> { >>>>>>> - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && >>>>>>> - !(lr_val & ICH_LR_HW); >>>>>>> + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) && >>>>>>> + (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); >>>>>>> } >>>>>>> >>>>>>> void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) >>>>>>> >>>>>> >>>>>> Assuming I understand the issue correctly, I cannot really see how >>>>>> to solve this without reintroducing EISR, which sucks majorly. >>>>>> >>>>>> I'll try to cook something shortly and we can all have a good >>>>>> fight about how crap this is. >>>>> >>>>> Here's what I came up with. I don't really like it, but that's >>>>> the least invasive thing I could come up with. Please let me >>>>> know if that helps with your test case. Note that I have only >>>>> boot-tested this on a sample of 1 machine, so I don't expect this >>>>> to be perfect. >>>>> >>>>> Also, any guideline on how to reproduce this would be much appreciated. >>>>> I never used this mdev/mtty thing, so please bear with me. >>>>> >>>>> Thanks, >>>>> >>>>> M. >>>>> >>>>> From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 00:00:00 2001 >>>>> From: Marc Zyngier <marc.zyngier@arm.com> >>>>> Date: Thu, 8 Mar 2018 11:14:06 +0000 >>>>> Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to guess EOI MI >>>>> status >>>>> >>>>> We so far rely on the LR state to decide whether the guest has >>>>> EOI'd a level interrupt or not.
While this looks like a good >>>>> idea on the surface, it leads to a couple of annoying corner >>>>> cases: >>>>> >>>>> Example 1: (P = Pending, A = Active, MI = Maintenance Interrupt) >>>>> P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> MI >>>> >>>> Do we really get an EOI maintenance interrupt here? Reading the MISR >>>> and EISR descriptions makes me think this is not the case... >> >> Hum yes in EISR it is said that ICH_LR.State = 0b00! >>> >>> Yeah, it looks like I always want EISR to do what I want, and not to >>> do what it does. Man, this thing is such a piece of crap. >>> >>> OK, scratch that. We need to do it without the help of the HW. >>> >>>>> The state is now pending, we've really EOI'd the interrupt, and >>>>> yet lr_signals_eoi_mi() returns false, since the state is not 0. >>>>> The result is that we won't signal anything on the corresponding >>>>> irqfd, which people complain about. Meh. >>>> >>>> So the core of the problem is that when we've entered the guest with >>>> PENDING+ACTIVE and when we exit (for some reason) we don't signal the >>>> resamplefd, right? The solution seems to me that we don't ever do >>>> PENDING+ACTIVE if you need to resample after each deactivate. What >>>> would be the point of appending a pending state that you only know to be >>>> valid after a resample anyway? >>> >>> The question is then to identify that a given source needs to be >>> signalled back to VFIO. Calling into the eventfd code on the hot path >>> is pretty horrid (I'm not sure if we can really call into this with >>> interrupts disabled, for example). >>> >>>> >>>>> >>>>> Example 2: >>>>> P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI fires >>>> >>>> We could be more clever and do the following calculation on every exit: >>>> >>>> If you enter with P, and exit with either A or 0, then signal. >>>> >>>> If you enter with P+A, and you exit with either P, A, or 0, then signal. >>>> >>>> Wouldn't that also solve it?
(Although I have a feeling you'd miss some >>>> exits in this case). >>> >>> I'd be more confident if we did forbid P+A for such interrupts >>> altogether, as they really feel like another kind of HW interrupt. >> >> the LR P+A looks strange to me too. all the more so it may cause the >> same IRQ to be acked twice? > > If the pending bit isn't dropped by the time we get to EOI the first > one, probably. But that's pretty much expected with a level interrupt, > isn't it? > >> P -> A -> 0 (resample). Doesn't our issue come from the fact we reinject >> the P in LR until the line level is deasserted? > > Which is consistent with the life cycle of a level interrupt. What > usually happens is (for a non HW interrupt): > > P -> IAR -> A -> lower the line in the device -> 0 > > If you generate an exit at the right spot, and yet don't lower the line, > you end up with: > > P -> IAR -> A -> exit/enter -> P+A > > From there, if you lower the line, it is likely to cause an exit: > > P+A -> MMIO trap lowering the line -> A > >>> >>> Eric: Is there any way to get a callback from the eventfd code to flag >>> a given irq as requiring a notification on EOI? >> >> bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned >> pin) was used in the past. I think it does what you want. >> > > Not exactly. I'm very reluctant to call this on the hot path (I'd need > the info on hw_flush), and I'd rather have a callback from the eventfd > subsystem to tell me when a pin is being associated with a notifier > (because this is likely to be very rare). > > If that doesn't exist, never mind. We can see if that solves Shunyong's > issue and optimize later. We don't have such a callback mechanism AFAIK. However, we may call an arch-specific function in kvm_irqfd_assign. Thanks Eric > > M. > ^ permalink raw reply [flat|nested] 50+ messages in thread
* [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling @ 2018-03-09 13:18 ` Auger Eric 0 siblings, 0 replies; 50+ messages in thread From: Auger Eric @ 2018-03-09 13:18 UTC (permalink / raw) To: linux-arm-kernel Hi Marc, On 09/03/18 10:12, Marc Zyngier wrote: > On 08/03/18 18:12, Auger Eric wrote: >> Hi Marc, Christoffer, >> >> On 08/03/18 18:28, Marc Zyngier wrote: >>> On Thu, 08 Mar 2018 16:19:00 +0000, >>> Christoffer Dall wrote: >>>> >>>> On Thu, Mar 08, 2018 at 11:54:27AM +0000, Marc Zyngier wrote: >>>>> On 08/03/18 09:49, Marc Zyngier wrote: >>>>>> [updated Christoffer's email address] >>>>>> >>>>>> Hi Shunyong, >>>>>> >>>>>> On 08/03/18 07:01, Shunyong Yang wrote: >>>>>>> When resampling irqfds is enabled, level interrupt should be >>>>>>> de-asserted when resampling happens. On page 4-47 of GIC v3 >>>>>>> specification IHI0069D, it said, >>>>>>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU >>>>>>> interface, the IRI changes the status of the interrupt to active >>>>>>> and pending if: >>>>>>> • It is an edge-triggered interrupt, and another edge has been >>>>>>> detected since the interrupt was acknowledged. >>>>>>> • It is a level-sensitive interrupt, and the level has not been >>>>>>> deasserted since the interrupt was acknowledged." >>>>>>> >>>>>>> GIC v2 specification IHI0048B.b has similar description on page >>>>>>> 3-42 for state machine transition. >>>>>>> >>>>>>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver >>>>>>> in samples/vfio-mdev) triggers a level interrupt, the status >>>>>>> transition in LR is pending-->active-->active and pending. >>>>>>> Then it will wait resampling to de-assert the interrupt. >>>>>>> >>>>>>> Current design of lr_signals_eoi_mi() will return false if state >>>>>>> in LR is not invalid(Inactive). It causes resampling will not happen >>>>>>> in mtty case.
>>>>>> >>>>>> Let me rephrase this, and tell me if I understood it correctly: >>>>>> >>>>>> - A level interrupt is injected, activated by the guest (LR state=active) >>>>>> - guest exits, re-enters, (LR state=pending+active) >>>>>> - guest EOIs the interrupt (LR state=pending) >>>>>> - maintenance interrupt >>>>>> - we don't signal the resampling because we're not in an invalid state >>>>>> >>>>>> Is that correct? >>>>>> >>>>>> That's an interesting case, because it seems to invalidate some of the >>>>>> optimization that went in over a year ago. >>>>>> >>>>>> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields >>>>>> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state >>>>>> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation >>>>>> >>>>>> We could compare the value of the LR before the guest entry with >>>>>> the value at exit time, but we still could miss it if we have a >>>>>> transition such as P+A -> P -> A and assume a long enough propagation >>>>>> delay for the maintenance interrupt (which is very likely). >>>>>> >>>>>> In essence, we have lost the benefit of EISR, which was to give us a >>>>>> way to deal with asynchronous signalling. >>>>>> >>>>>>> >>>>>>> This will cause interrupt fired continuously to guest even 8250 IIR >>>>>>> has no interrupt. When 8250's interrupt is configured in shared mode, >>>>>>> it will pass interrupt to other drivers to handle. However, there >>>>>>> is no other driver involved. Then, a "nobody cared" kernel complaint >>>>>>> occurs. 
>>>>>>> >>>>>>> / # cat /dev/ttyS0 >>>>>>> [ 4.826836] random: crng init done >>>>>>> [ 6.373620] irq 41: nobody cared (try booting with the "irqpoll" >>>>>>> option) >>>>>>> [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4 >>>>>>> [ 6.378927] Hardware name: linux,dummy-virt (DT) >>>>>>> [ 6.380876] Call trace: >>>>>>> [ 6.381937] dump_backtrace+0x0/0x180 >>>>>>> [ 6.383495] show_stack+0x14/0x1c >>>>>>> [ 6.384902] dump_stack+0x90/0xb4 >>>>>>> [ 6.386312] __report_bad_irq+0x38/0xe0 >>>>>>> [ 6.387944] note_interrupt+0x1f4/0x2b8 >>>>>>> [ 6.389568] handle_irq_event_percpu+0x54/0x7c >>>>>>> [ 6.391433] handle_irq_event+0x44/0x74 >>>>>>> [ 6.393056] handle_fasteoi_irq+0x9c/0x154 >>>>>>> [ 6.394784] generic_handle_irq+0x24/0x38 >>>>>>> [ 6.396483] __handle_domain_irq+0x60/0xb4 >>>>>>> [ 6.398207] gic_handle_irq+0x98/0x1b0 >>>>>>> [ 6.399796] el1_irq+0xb0/0x128 >>>>>>> [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40 >>>>>>> [ 6.403149] __setup_irq+0x41c/0x678 >>>>>>> [ 6.404669] request_threaded_irq+0xe0/0x190 >>>>>>> [ 6.406474] univ8250_setup_irq+0x208/0x234 >>>>>>> [ 6.408250] serial8250_do_startup+0x1b4/0x754 >>>>>>> [ 6.410123] serial8250_startup+0x20/0x28 >>>>>>> [ 6.411826] uart_startup.part.21+0x78/0x144 >>>>>>> [ 6.413633] uart_port_activate+0x50/0x68 >>>>>>> [ 6.415328] tty_port_open+0x84/0xd4 >>>>>>> [ 6.416851] uart_open+0x34/0x44 >>>>>>> [ 6.418229] tty_open+0xec/0x3c8 >>>>>>> [ 6.419610] chrdev_open+0xb0/0x198 >>>>>>> [ 6.421093] do_dentry_open+0x200/0x310 >>>>>>> [ 6.422714] vfs_open+0x54/0x84 >>>>>>> [ 6.424054] path_openat+0x2dc/0xf04 >>>>>>> [ 6.425569] do_filp_open+0x68/0xd8 >>>>>>> [ 6.427044] do_sys_open+0x16c/0x224 >>>>>>> [ 6.428563] SyS_openat+0x10/0x18 >>>>>>> [ 6.429972] el0_svc_naked+0x30/0x34 >>>>>>> [ 6.431494] handlers: >>>>>>> [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt >>>>>>> [ 6.434597] Disabling IRQ #41 >>>>>>> >>>>>>> This patch changes the lr state condition in lr_signals_eoi_mi() from >>>>>>> 
invalid(Inactive) to active and pending to avoid this. >>>>>>> >>>>>>> I am not sure about the original design of the condition of >>>>>>> invalid(active). So, This RFC is sent out for comments. >>>>>>> >>>>>>> Cc: Joey Zheng <yu.zheng@hxt-semitech.com> >>>>>>> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> >>>>>>> --- >>>>>>> virt/kvm/arm/vgic/vgic-v2.c | 4 ++-- >>>>>>> virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- >>>>>>> 2 files changed, 4 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c >>>>>>> index e9d840a75e7b..740ee9a5f551 100644 >>>>>>> --- a/virt/kvm/arm/vgic/vgic-v2.c >>>>>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c >>>>>>> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) >>>>>>> >>>>>>> static bool lr_signals_eoi_mi(u32 lr_val) >>>>>>> { >>>>>>> - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && >>>>>>> - !(lr_val & GICH_LR_HW); >>>>>>> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) && >>>>>> >>>>>> That feels very wrong. You're now signalling the resampling in both >>>>>> invalid and pending+active, and the latter state doesn't mean you've >>>>>> EOIed anything. You're now over-signalling, and signalling the >>>>>> wrong event. 
>>>>>> >>>>>>> + (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); >>>>>>> } >>>>>>> >>>>>>> /* >>>>>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c >>>>>>> index 6b329414e57a..43111bba7af9 100644 >>>>>>> --- a/virt/kvm/arm/vgic/vgic-v3.c >>>>>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c >>>>>>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) >>>>>>> >>>>>>> static bool lr_signals_eoi_mi(u64 lr_val) >>>>>>> { >>>>>>> - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && >>>>>>> - !(lr_val & ICH_LR_HW); >>>>>>> + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) && >>>>>>> + (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); >>>>>>> } >>>>>>> >>>>>>> void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) >>>>>>> >>>>>> >>>>>> Assuming I understand the issue correctly, I cannot really see how >>>>>> to solve this without reintroducing EISR, which sucks majorly. >>>>>> >>>>>> I'll try to cook something shortly and we can all have a good >>>>>> fight about how crap this is. >>>>> >>>>> Here's what I came up with. I don't really like it, but that's >>>>> the least invasive this I could come up with. Please let me >>>>> know if that helps with your test case. Note that I have only >>>>> boot-tested this on a sample of 1 machine, so I don't expect this >>>>> to be perfect. >>>>> >>>>> Also, any guideline on how to reproduce this would be much appreciated. >>>>> I never used this mdev/mtty thing, so please bear with me. >>>>> >>>>> Thanks, >>>>> >>>>> M. >>>>> >>>>> From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 00:00:00 2001 >>>>> From: Marc Zyngier <marc.zyngier@arm.com> >>>>> Date: Thu, 8 Mar 2018 11:14:06 +0000 >>>>> Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to guess EOI MI >>>>> status >>>>> >>>>> We so far rely on the LR state to decide whether the guest has >>>>> EOI'd a level interrupt or not. 
While this looks like a good >>>>> idea on the surface, it leads to a couple of annoying corner >>>>> cases: >>>>> >>>>> Example 1: (P = Pending, A = Active, MI = Maintenance Interrupt) >>>>> P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> MI >>>> >>>> Do we really get an EOI maintenance interrupt here? Reading the MISR >>>> and EISR descriptions make me thing this is not the case... >> >> Hum yes in EISR it is said that ICH_LR.State = 0b00! >>> >>> Yeah, it looks like I always want EISR to do what I want, and not to >>> do what it does. Man, this thing is such a piece of crap. >>> >>> OK, scratch that. We need to do it without the help of the HW. >>> >>>>> The state is now pending, we've really EOI'd the interrupt, and >>>>> yet lr_signals_eoi_mi() returns false, since the state is not 0. >>>>> The result is that we won't signal anything on the corresponding >>>>> irqfd, which people complain about. Meh. >>>> >>>> So the core of the problem is that when we've entered the guest with >>>> PENDING+ACTIVE and when we exit (for some reason) we don't signal the >>>> resamplefd, right? The solution seems to me that we don't ever do >>>> PENDING+ACTIVE if you need to resample after each deactivate. What >>>> would be the point of appending a pending state that you only know to be >>>> valid after a resample anyway? >>> >>> The question is then to identify that a given source needs to be >>> signalled back to VFIO. Calling into the eventfd code on the hot path >>> is pretty horrid (I'm not sure if we can really call into this with >>> interrupts disabled, for example). >>> >>>> >>>>> >>>>> Example 2: >>>>> P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI fires >>>> >>>> We could be more clever and do the following calculation on every exit: >>>> >>>> If you enter with P, and exit with either A or 0, then signal. >>>> >>>> If you enter with P+A, and you exit with either P, A, or 0, then signal. >>>> >>>> Wouldn't that also solve it? 
(Although I have a feeling you'd miss some >>>> exits in this case). >>> >>> I'd be more confident if we did forbid P+A for such interrupts >>> altogether, as they really feel like another kind of HW interrupt. >> >> the LR P+A looks strange to me too. all the more so it may cause the >> same IRQ to be acked twice? > > If the pending bit isn't dropped by the time we get to EOI the first > one, probably. But that's pretty much expected with a level interrupt > isn't it? > >> P -> A -> 0 (resample). Doesn't our issue come from the fact we reinject >> the P in LR until the line level is deasserted? > > Which is consistent with the life cycle of a level interrupt. What > usually happens is (for a non HW interrupt): > > P -> IAR -> A -> lower the line in the device -> 0 > > If you generate an exit at the right spot, and yet don't lower the line, > you end up with: > > P -> IAR -> A -> exit/enter -> P+A > > From there, if you lower the line, it is likely to cause an exit: > > P+A -> MMIO trap lowering the line -> A > >>> >>> Eric: Is there any way to get a callback from the eventfd code to flag >>> a given irq as requiring a notification on EOI? >> >> bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned >> pin) was used in the past. I think it does what you want. >> > > Not exactly. I'm very reluctant to call this on the hot path (I'd need > the info on hw_flush), and I'd rather have a callback from the eventfd > subsystem to tell me when a pin is being associated with a notifier > (because this is likely to be very rare). > > If that doesn't exit, never mind. We can see if that solves Shunyong > issue and optimize later. We don't have such a callback mechanism AFAIK. However, we could call an arch-specific function from kvm_irqfd_assign(). Thanks Eric > > M. > ^ permalink raw reply [flat|nested] 50+ messages in thread
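The two versions of lr_signals_eoi_mi() debated above differ only in how they test the two LR state bits. A minimal sketch, assuming the GICv2 list-register layout used by the kernel's GICH_LR_* macros (pending at bit 28, active at bit 29, EOI at bit 19, HW at bit 31), evaluates both conditions in isolation. The function names eoi_mi_orig() and eoi_mi_rfc() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirroring the GICv2 GICH_LR_* layout. */
#define GICH_LR_EOI          (1U << 19)
#define GICH_LR_STATE        (3U << 28)
#define GICH_LR_PENDING_BIT  (1U << 28)
#define GICH_LR_ACTIVE_BIT   (1U << 29)
#define GICH_LR_HW           (1U << 31)

/* Condition before the RFC: the LR state is invalid (neither P nor A). */
static bool eoi_mi_orig(uint32_t lr_val)
{
	return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
	       !(lr_val & GICH_LR_HW);
}

/*
 * Condition proposed by the RFC: the XOR is zero only when both state
 * bits are set, i.e. the LR is exactly pending+active.
 */
static bool eoi_mi_rfc(uint32_t lr_val)
{
	return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
	       (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
}
```

Run over the four states, the original condition holds only for an invalid LR and the XOR form only for pending+active. Neither fires for a pending-only LR, which is the state Example 1 above ends in after the guest's EOI.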
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-08 17:28 ` Marc Zyngier @ 2018-03-09 21:36 ` Christoffer Dall -1 siblings, 0 replies; 50+ messages in thread From: Christoffer Dall @ 2018-03-09 21:36 UTC (permalink / raw) To: Marc Zyngier Cc: Shunyong Yang, ard.biesheuvel, will.deacon, eric.auger, david.daney, linux-arm-kernel, kvmarm, linux-kernel, Joey Zheng On Thu, Mar 08, 2018 at 05:28:44PM +0000, Marc Zyngier wrote: > On Thu, 08 Mar 2018 16:19:00 +0000, > Christoffer Dall wrote: > > > > On Thu, Mar 08, 2018 at 11:54:27AM +0000, Marc Zyngier wrote: > > > On 08/03/18 09:49, Marc Zyngier wrote: [...] > > > The state is now pending, we've really EOI'd the interrupt, and > > > yet lr_signals_eoi_mi() returns false, since the state is not 0. > > > The result is that we won't signal anything on the corresponding > > > irqfd, which people complain about. Meh. > > > > So the core of the problem is that when we've entered the guest with > > PENDING+ACTIVE and when we exit (for some reason) we don't signal the > > resamplefd, right? The solution seems to me that we don't ever do > > PENDING+ACTIVE if you need to resample after each deactivate. What > > would be the point of appending a pending state that you only know to be > > valid after a resample anyway? > > The question is then to identify that a given source needs to be > signalled back to VFIO. Calling into the eventfd code on the hot path > is pretty horrid (I'm not sure if we can really call into this with > interrupts disabled, for example). > This feels like a bad layering violation to me as well. > > > > > > > > Example 2: > > > P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI fires > > > > We could be more clever and do the following calculation on every exit: > > > > If you enter with P, and exit with either A or 0, then signal. > > > > If you enter with P+A, and you exit with either P, A, or 0, then signal. > > > > Wouldn't that also solve it? 
(Although I have a feeling you'd miss some > > exits in this case). > > I'd be more confident if we did forbid P+A for such interrupts > altogether, as they really feel like another kind of HW interrupt. How about a slightly bigger hammer: Can we avoid doing P+A for level interrupts completely? I don't think that really makes much sense, and I think we simplify everything if we just come back out and resample the line. For an edge, something like a network card, there's a potential performance win in appending a new pending state, but I doubt that this is the case for level interrupts. The timer would be unaffected, because it's a HW interrupt. Thanks, -Christoffer ^ permalink raw reply [flat|nested] 50+ messages in thread
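The alternative quoted above (compare the LR state written on guest entry with the state read back on exit) can be sketched as a small decision function. The enum and should_signal() are hypothetical names, and the handling of an LR that entered as active-only, which the quoted rule does not mention, is an assumption: it falls back to the existing "LR is now invalid" check.

```c
#include <stdbool.h>

/* LR state field encoding: bit 0 = pending, bit 1 = active. */
enum lr_state {
	LR_INVALID = 0,
	LR_PENDING = 1,
	LR_ACTIVE  = 2,
	LR_PA      = 3,	/* pending + active */
};

/*
 * Hypothetical: decide whether to signal the resamplefd for a level,
 * non-HW interrupt, given the LR state on guest entry and on exit.
 */
static bool should_signal(enum lr_state entry, enum lr_state exit_state)
{
	switch (entry) {
	case LR_PENDING:
		/* Entered with P, exited with A or 0: signal. */
		return exit_state == LR_ACTIVE || exit_state == LR_INVALID;
	case LR_PA:
		/* Entered with P+A, exited with P, A or 0: signal. */
		return exit_state != LR_PA;
	case LR_ACTIVE:
		/* Not in the rule (assumption): signal once the LR is invalid. */
		return exit_state == LR_INVALID;
	default:
		/* An invalid LR was never populated for this interrupt. */
		return false;
	}
}
```

For Example 2 (P+A -> guest EOI -> P -> guest IAR -> A), this gives should_signal(LR_PA, LR_ACTIVE) == true, where a check on the exit state alone stays silent. The concern about missed exits still applies: the comparison only happens on exits that actually occur.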
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-09 21:36 ` Christoffer Dall @ 2018-03-10 12:20 ` Marc Zyngier -1 siblings, 0 replies; 50+ messages in thread From: Marc Zyngier @ 2018-03-10 12:20 UTC (permalink / raw) To: Christoffer Dall Cc: Shunyong Yang, ard.biesheuvel, will.deacon, eric.auger, david.daney, linux-arm-kernel, kvmarm, linux-kernel, Joey Zheng On Fri, 09 Mar 2018 21:36:12 +0000, Christoffer Dall wrote: > > On Thu, Mar 08, 2018 at 05:28:44PM +0000, Marc Zyngier wrote: > > I'd be more confident if we did forbid P+A for such interrupts > > altogether, as they really feel like another kind of HW interrupt. > > How about a slightly bigger hammer: Can we avoid doing P+A for level > interrupts completely? I don't think that really makes much sense, and > I think we simply everything if we just come back out and resample the > line. For an edge, something like a network card, there's a potential > performance win to appending a new pending state, but I doubt that this > is the case for level interrupts. I started implementing the same thing yesterday. Somehow, it feels slightly better to have the same flow for all level interrupts, including the timer, and we only use the MI on EOI as a way to trigger the next state of injection. Still testing, but looking good so far. I'm still puzzled that we have this level-but-not-quite behaviour for VFIO interrupts. At some point, it is going to bite us badly. M. -- Jazz is not dead, it just smell funny. ^ permalink raw reply [flat|nested] 50+ messages in thread
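The flow described here (level interrupts never enter the guest as pending+active, and the EOI maintenance interrupt drives the next injection) can be checked against Example 1 from earlier in the thread with a toy model. This is only a sketch: it models just the two LR state bits, assumes the input line stays high until the resample (as in the VFIO/mtty case), and every name in it is hypothetical.

```c
#include <stdbool.h>

#define ST_P 1u	/* pending */
#define ST_A 2u	/* active  */

/*
 * LR state written on guest entry for a level interrupt. The line is
 * still high because VFIO only lowers it after the resample.
 */
static unsigned int enter_state(bool active, bool line_high, bool allow_pa)
{
	unsigned int s = active ? ST_A : 0;

	if (line_high && (allow_pa || !active))
		s |= ST_P;
	return s;
}

/* A guest EOI clears the active bit and leaves any pending bit. */
static unsigned int guest_eoi(unsigned int s)
{
	return s & ~ST_A;
}

/* Simplified lr_signals_eoi_mi(): only an invalid LR counts as EOIed. */
static bool signals_eoi(unsigned int s)
{
	return s == 0;
}

/*
 * Example 1: the guest has acked the IRQ (A), an exit/entry happens
 * while the line is still high, then the guest EOIs. Returns whether
 * the EOI is observed, i.e. whether the resamplefd would be signalled.
 */
static bool example1_resamples(bool allow_pa)
{
	unsigned int s = enter_state(true, true, allow_pa);

	s = guest_eoi(s);
	return signals_eoi(s);
}
```

With P+A allowed, the EOI leaves the LR pending, the invalid-state check never matches and no resample happens; with P+A forbidden, the same EOI leaves the LR invalid and the notification goes through. What is given up is the P+A optimization for level interrupts.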
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-10 12:20 ` Marc Zyngier @ 2018-03-11 1:55 ` Christoffer Dall -1 siblings, 0 replies; 50+ messages in thread From: Christoffer Dall @ 2018-03-11 1:55 UTC (permalink / raw) To: Marc Zyngier Cc: Shunyong Yang, Ard Biesheuvel, Will Deacon, Auger Eric, david.daney, linux-arm-kernel, kvmarm, linux-kernel, Joey Zheng On Sat, Mar 10, 2018 at 12:20 PM, Marc Zyngier <marc.zyngier@arm.com> wrote: > On Fri, 09 Mar 2018 21:36:12 +0000, > Christoffer Dall wrote: >> >> On Thu, Mar 08, 2018 at 05:28:44PM +0000, Marc Zyngier wrote: >> > I'd be more confident if we did forbid P+A for such interrupts >> > altogether, as they really feel like another kind of HW interrupt. >> >> How about a slightly bigger hammer: Can we avoid doing P+A for level >> interrupts completely? I don't think that really makes much sense, and >> I think we simply everything if we just come back out and resample the >> line. For an edge, something like a network card, there's a potential >> performance win to appending a new pending state, but I doubt that this >> is the case for level interrupts. > > I started implementing the same thing yesterday. Somehow, it feels > slightly better to have the same flow for all level interrupts, > including the timer, and we only use the MI on EOI as a way to trigger > the next state of injection. Still testing, but looking good so far. > > I'm still puzzled that we have this level-but-not-quite behaviour for > VFIO interrupts. At some point, it is going to bite us badly. > Where is the departure from level-triggered behavior with VFIO? As far as I can tell, the GIC flow of the interrupts will be just a level interrupt, but we just need to make sure the resamplefd mechanism is supported for both types of interrupts. Whether or not that's a decent mechanism seems orthogonal to me, but that's a discussion for another day I think. 
Thanks, -Christoffer ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling 2018-03-11 1:55 ` Christoffer Dall @ 2018-03-11 12:17 ` Marc Zyngier -1 siblings, 0 replies; 50+ messages in thread From: Marc Zyngier @ 2018-03-11 12:17 UTC (permalink / raw) To: Christoffer Dall, Shunyong Yang Cc: Ard Biesheuvel, Will Deacon, Auger Eric, david.daney, linux-arm-kernel, kvmarm, linux-kernel, Joey Zheng On Sun, 11 Mar 2018 01:55:08 +0000 Christoffer Dall <cdall@kernel.org> wrote: > On Sat, Mar 10, 2018 at 12:20 PM, Marc Zyngier <marc.zyngier@arm.com> wrote: > > On Fri, 09 Mar 2018 21:36:12 +0000, > > Christoffer Dall wrote: > >> > >> On Thu, Mar 08, 2018 at 05:28:44PM +0000, Marc Zyngier wrote: > >> > I'd be more confident if we did forbid P+A for such interrupts > >> > altogether, as they really feel like another kind of HW interrupt. > >> > >> How about a slightly bigger hammer: Can we avoid doing P+A for level > >> interrupts completely? I don't think that really makes much sense, and > >> I think we simply everything if we just come back out and resample the > >> line. For an edge, something like a network card, there's a potential > >> performance win to appending a new pending state, but I doubt that this > >> is the case for level interrupts. > > > > I started implementing the same thing yesterday. Somehow, it feels > > slightly better to have the same flow for all level interrupts, > > including the timer, and we only use the MI on EOI as a way to trigger > > the next state of injection. Still testing, but looking good so far. > > > > I'm still puzzled that we have this level-but-not-quite behaviour for > > VFIO interrupts. At some point, it is going to bite us badly. > > > > Where is the departure from level-triggered behavior with VFIO? As > far as I can tell, the GIC flow of the interrupts will be just a level > interrupt, The GIC is fine, I believe. 
What is not exactly fine is the signalling from the device, which will never be dropped until the EOI has been detected. > but we just need to make sure the resamplefd mechanism is > supported for both types of interrupts. Whether or not that's a > decent mechanism seems orthogonal to me, but that's a discussion for > another day I think. Given that VFIO is built around this mechanism, I don't think we have a choice but to support it. Anyway, I came up with the following patch, which I tested on Seattle with mtty. It also survived my usual hammering of cyclictest, hackbench and bulk VM installs. Shunyong, could you please give it a go? Thanks, M. From 9ca96b9fb535cc6ab578bda85c4ecbc4a8c63cd7 Mon Sep 17 00:00:00 2001 From: Marc Zyngier <marc.zyngier@arm.com> Date: Fri, 9 Mar 2018 14:59:40 +0000 Subject: [PATCH] KVM: arm/arm64: vgic: Disallow Active+Pending for level interrupts It was recently reported that VFIO mediated devices, and anything that VFIO exposes as level interrupts, do not strictly follow the expected logic of such interrupts, as VFIO only lowers the input line when the guest has EOIed the interrupt at the GIC level, rather than when the guest acked the interrupt at the device level. The GIC's Active+Pending state is fundamentally incompatible with this behaviour, as it prevents KVM from observing the EOI, and in turn results in VFIO never dropping the line. This results in an interrupt storm in the guest, which it really never expected. As we cannot really change VFIO to follow the strict rules of level signalling, let's forbid the A+P state altogether, as it is in the end only an optimization. It ensures that we will transition via an invalid state, which we can use to notify VFIO of the EOI.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> --- virt/kvm/arm/vgic/vgic-v2.c | 47 +++++++++++++++++++++++++++------------------ virt/kvm/arm/vgic/vgic-v3.c | 47 +++++++++++++++++++++++++++------------------ 2 files changed, 56 insertions(+), 38 deletions(-) diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c index 29556f71b691..9356d749da1d 100644 --- a/virt/kvm/arm/vgic/vgic-v2.c +++ b/virt/kvm/arm/vgic/vgic-v2.c @@ -153,8 +153,35 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr) { u32 val = irq->intid; + bool allow_pending = true; - if (irq_is_pending(irq)) { + if (irq->active) + val |= GICH_LR_ACTIVE_BIT; + + if (irq->hw) { + val |= GICH_LR_HW; + val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT; + /* + * Never set pending+active on a HW interrupt, as the + * pending state is kept at the physical distributor + * level. + */ + if (irq->active) + allow_pending = false; + } else { + if (irq->config == VGIC_CONFIG_LEVEL) { + val |= GICH_LR_EOI; + + /* + * Software resampling doesn't work very well + * if we allow P+A, so let's not do that. + */ + if (irq->active) + allow_pending = false; + } + } + + if (allow_pending && irq_is_pending(irq)) { val |= GICH_LR_PENDING_BIT; if (irq->config == VGIC_CONFIG_EDGE) @@ -171,24 +198,6 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr) } } - if (irq->active) - val |= GICH_LR_ACTIVE_BIT; - - if (irq->hw) { - val |= GICH_LR_HW; - val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT; - /* - * Never set pending+active on a HW interrupt, as the - * pending state is kept at the physical distributor - * level. - */ - if (irq->active && irq_is_pending(irq)) - val &= ~GICH_LR_PENDING_BIT; - } else { - if (irq->config == VGIC_CONFIG_LEVEL) - val |= GICH_LR_EOI; - } - /* * Level-triggered mapped IRQs are special because we only observe * rising edges as input to the VGIC. 
We therefore lower the line diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c index 0ff2006f3781..6b484575cafb 100644 --- a/virt/kvm/arm/vgic/vgic-v3.c +++ b/virt/kvm/arm/vgic/vgic-v3.c @@ -135,8 +135,35 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr) { u32 model = vcpu->kvm->arch.vgic.vgic_model; u64 val = irq->intid; + bool allow_pending = true; - if (irq_is_pending(irq)) { + if (irq->active) + val |= ICH_LR_ACTIVE_BIT; + + if (irq->hw) { + val |= ICH_LR_HW; + val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT; + /* + * Never set pending+active on a HW interrupt, as the + * pending state is kept at the physical distributor + * level. + */ + if (irq->active) + allow_pending = false; + } else { + if (irq->config == VGIC_CONFIG_LEVEL) { + val |= ICH_LR_EOI; + + /* + * Software resampling doesn't work very well + * if we allow P+A, so let's not do that. + */ + if (irq->active) + allow_pending = false; + } + } + + if (allow_pending && irq_is_pending(irq)) { val |= ICH_LR_PENDING_BIT; if (irq->config == VGIC_CONFIG_EDGE) @@ -154,24 +181,6 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr) } } - if (irq->active) - val |= ICH_LR_ACTIVE_BIT; - - if (irq->hw) { - val |= ICH_LR_HW; - val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT; - /* - * Never set pending+active on a HW interrupt, as the - * pending state is kept at the physical distributor - * level. - */ - if (irq->active && irq_is_pending(irq)) - val &= ~ICH_LR_PENDING_BIT; - } else { - if (irq->config == VGIC_CONFIG_LEVEL) - val |= ICH_LR_EOI; - } - /* * Level-triggered mapped IRQs are special because we only observe * rising edges as input to the VGIC. We therefore lower the line -- 2.14.2 -- Without deviation from the norm, progress is not possible. ^ permalink raw reply related [flat|nested] 50+ messages in thread
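The reordering in vgic_v2_populate_lr() boils down to one rule: an active interrupt may only also be marked pending if it is a non-HW edge interrupt. A standalone sketch of that logic follows. struct toy_irq and toy_populate_lr() are hypothetical, trimmed-down stand-ins for struct vgic_irq and the real function, the bit values mirror the GICv2 GICH_LR_* layout, and the edge/SGI bookkeeping and the mapped-level special case of the real function are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirroring the GICv2 GICH_LR_* layout. */
#define GICH_LR_PHYSID_CPUID_SHIFT 10
#define GICH_LR_EOI          (1U << 19)
#define GICH_LR_PENDING_BIT  (1U << 28)
#define GICH_LR_ACTIVE_BIT   (1U << 29)
#define GICH_LR_HW           (1U << 31)

/* Hypothetical stand-in for the fields of struct vgic_irq used here. */
struct toy_irq {
	uint32_t intid;
	uint32_t hwintid;
	bool hw;	/* mapped to a physical interrupt */
	bool level;	/* level-sensitive (VGIC_CONFIG_LEVEL) vs edge */
	bool active;
	bool pending;
};

static uint32_t toy_populate_lr(const struct toy_irq *irq)
{
	uint32_t val = irq->intid;
	bool allow_pending = true;

	if (irq->active)
		val |= GICH_LR_ACTIVE_BIT;

	if (irq->hw) {
		val |= GICH_LR_HW;
		val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT;
		/* Pending state lives in the physical distributor. */
		if (irq->active)
			allow_pending = false;
	} else if (irq->level) {
		/* Ask for a maintenance interrupt when the guest EOIs. */
		val |= GICH_LR_EOI;
		/* No P+A, so the guest's EOI always leaves the LR invalid. */
		if (irq->active)
			allow_pending = false;
	}

	if (allow_pending && irq->pending)
		val |= GICH_LR_PENDING_BIT;

	return val;
}
```

An active, still-pending level interrupt therefore goes back into the guest as A only, and the guest's EOI leaves the LR invalid, which is exactly the state the unchanged lr_signals_eoi_mi() tests for. Edge interrupts keep the P+A optimization.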
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling
From: Yang, Shunyong @ 2018-03-12 2:33 UTC
  To: marc.zyngier, cdall
  Cc: linux-kernel, ard.biesheuvel, kvmarm, Zheng, Joey, will.deacon,
	linux-arm-kernel, david.daney, eric.auger

Hi, Marc,

On Sun, 2018-03-11 at 12:17 +0000, Marc Zyngier wrote:
> On Sun, 11 Mar 2018 01:55:08 +0000
> Christoffer Dall <cdall@kernel.org> wrote:
>
> > On Sat, Mar 10, 2018 at 12:20 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > >
> > > On Fri, 09 Mar 2018 21:36:12 +0000,
> > > Christoffer Dall wrote:
> > > >
> > > > On Thu, Mar 08, 2018 at 05:28:44PM +0000, Marc Zyngier wrote:
> > > > >
> > > > > I'd be more confident if we did forbid P+A for such interrupts
> > > > > altogether, as they really feel like another kind of HW
> > > > > interrupt.
> > > > How about a slightly bigger hammer: Can we avoid doing P+A for level
> > > > interrupts completely? I don't think that really makes much sense, and
> > > > I think we simplify everything if we just come back out and resample the
> > > > line. For an edge, something like a network card, there's a potential
> > > > performance win to appending a new pending state, but I doubt that this
> > > > is the case for level interrupts.
> > > I started implementing the same thing yesterday. Somehow, it feels
> > > slightly better to have the same flow for all level interrupts,
> > > including the timer, and we only use the MI on EOI as a way to trigger
> > > the next state of injection. Still testing, but looking good so far.
> > >
> > > I'm still puzzled that we have this level-but-not-quite behaviour for
> > > VFIO interrupts. At some point, it is going to bite us badly.
> > >
> > Where is the departure from level-triggered behavior with VFIO? As
> > far as I can tell, the GIC flow of the interrupts will be just a level
> > interrupt,
> The GIC is fine, I believe. What is not exactly fine is the signalling
> from the device, which will never be dropped until the EOI has been
> detected.
>
> > but we just need to make sure the resamplefd mechanism is
> > supported for both types of interrupts. Whether or not that's a
> > decent mechanism seems orthogonal to me, but that's a discussion for
> > another day I think.
> Given that VFIO is built around this mechanism, I don't think we have a
> choice but to support it. Anyway, I came up with the following patch,
> which I tested on Seattle with mtty. It also survived my usual
> hammering of cyclictest, hackbench and bulk VM installs.
>
> Shunyong, could you please give it a go?
>
> Thanks,
>
> M.
>

I have tested the patch. It works on the QDF2400 platform, and
kvm_notify_acked_irq() is called when the state is idle.

BTW, I have the following questions from debugging this issue.
Could you please give me some help?
1) What does "mi" mean in the GIC code, e.g. in lr_signals_eoi_mi()?
2) In some __hyp_text code, such as __kvm_vcpu_run(), calling printk()
   causes a "HYP panic:". How can I output debug information there?

Thanks.
Shunyong.

> From 9ca96b9fb535cc6ab578bda85c4ecbc4a8c63cd7 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <marc.zyngier@arm.com>
> Date: Fri, 9 Mar 2018 14:59:40 +0000
> Subject: [PATCH] KVM: arm/arm64: vgic: Disallow Active+Pending for level interrupts
>
> It was recently reported that VFIO mediated devices, and anything
> that VFIO exposes as level interrupts, do not strictly follow the
> expected logic of such interrupts as it only lowers the input
> line when the guest has EOId the interrupt at the GIC level, rather
> than when it Acked the interrupt at the device level.
>
> The GIC's Active+Pending state is fundamentally incompatible with
> this behaviour, as it prevents KVM from observing the EOI, and in
> turn results in VFIO never dropping the line. This results in an
> interrupt storm in the guest, which it really never expected.
>
> As we cannot really change VFIO to follow the strict rules of level
> signalling, let's forbid the A+P state altogether, as it is in the
> end only an optimization. It ensures that we will transition via
> an invalid state, which we can use to notify VFIO of the EOI.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  virt/kvm/arm/vgic/vgic-v2.c | 47 +++++++++++++++++++++++++++------------------
>  virt/kvm/arm/vgic/vgic-v3.c | 47 +++++++++++++++++++++++++++------------------
>  2 files changed, 56 insertions(+), 38 deletions(-)
>
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index 29556f71b691..9356d749da1d 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -153,8 +153,35 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>  void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  {
>  	u32 val = irq->intid;
> +	bool allow_pending = true;
>  
> -	if (irq_is_pending(irq)) {
> +	if (irq->active)
> +		val |= GICH_LR_ACTIVE_BIT;
> +
> +	if (irq->hw) {
> +		val |= GICH_LR_HW;
> +		val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT;
> +		/*
> +		 * Never set pending+active on a HW interrupt, as the
> +		 * pending state is kept at the physical distributor
> +		 * level.
> +		 */
> +		if (irq->active)
> +			allow_pending = false;
> +	} else {
> +		if (irq->config == VGIC_CONFIG_LEVEL) {
> +			val |= GICH_LR_EOI;
> +
> +			/*
> +			 * Software resampling doesn't work very well
> +			 * if we allow P+A, so let's not do that.
> +			 */
> +			if (irq->active)
> +				allow_pending = false;
> +		}
> +	}
> +
> +	if (allow_pending && irq_is_pending(irq)) {
>  		val |= GICH_LR_PENDING_BIT;
>  
>  		if (irq->config == VGIC_CONFIG_EDGE)
> @@ -171,24 +198,6 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  		}
>  	}
>  
> -	if (irq->active)
> -		val |= GICH_LR_ACTIVE_BIT;
> -
> -	if (irq->hw) {
> -		val |= GICH_LR_HW;
> -		val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT;
> -		/*
> -		 * Never set pending+active on a HW interrupt, as the
> -		 * pending state is kept at the physical distributor
> -		 * level.
> -		 */
> -		if (irq->active && irq_is_pending(irq))
> -			val &= ~GICH_LR_PENDING_BIT;
> -	} else {
> -		if (irq->config == VGIC_CONFIG_LEVEL)
> -			val |= GICH_LR_EOI;
> -	}
> -
>  	/*
>  	 * Level-triggered mapped IRQs are special because we only observe
>  	 * rising edges as input to the VGIC.  We therefore lower the line
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 0ff2006f3781..6b484575cafb 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -135,8 +135,35 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  {
>  	u32 model = vcpu->kvm->arch.vgic.vgic_model;
>  	u64 val = irq->intid;
> +	bool allow_pending = true;
>  
> -	if (irq_is_pending(irq)) {
> +	if (irq->active)
> +		val |= ICH_LR_ACTIVE_BIT;
> +
> +	if (irq->hw) {
> +		val |= ICH_LR_HW;
> +		val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT;
> +		/*
> +		 * Never set pending+active on a HW interrupt, as the
> +		 * pending state is kept at the physical distributor
> +		 * level.
> +		 */
> +		if (irq->active)
> +			allow_pending = false;
> +	} else {
> +		if (irq->config == VGIC_CONFIG_LEVEL) {
> +			val |= ICH_LR_EOI;
> +
> +			/*
> +			 * Software resampling doesn't work very well
> +			 * if we allow P+A, so let's not do that.
> +			 */
> +			if (irq->active)
> +				allow_pending = false;
> +		}
> +	}
> +
> +	if (allow_pending && irq_is_pending(irq)) {
>  		val |= ICH_LR_PENDING_BIT;
>  
>  		if (irq->config == VGIC_CONFIG_EDGE)
> @@ -154,24 +181,6 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  		}
>  	}
>  
> -	if (irq->active)
> -		val |= ICH_LR_ACTIVE_BIT;
> -
> -	if (irq->hw) {
> -		val |= ICH_LR_HW;
> -		val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT;
> -		/*
> -		 * Never set pending+active on a HW interrupt, as the
> -		 * pending state is kept at the physical distributor
> -		 * level.
> -		 */
> -		if (irq->active && irq_is_pending(irq))
> -			val &= ~ICH_LR_PENDING_BIT;
> -	} else {
> -		if (irq->config == VGIC_CONFIG_LEVEL)
> -			val |= ICH_LR_EOI;
> -	}
> -
>  	/*
>  	 * Level-triggered mapped IRQs are special because we only observe
>  	 * rising edges as input to the VGIC.  We therefore lower the line
> -- 
> 2.14.2
>
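The first question above is about the "eoi MI" check. The sketch below is modeled on the condition this thread describes (lr_signals_eoi_mi() returns false unless the LR state is invalid) and shows which bits of a GICv2 list register it looks at. The bit positions follow the GICH_LRn layout in IHI0048B as we read it; the LR_* macros and model_* functions are hypothetical stand-ins for the kernel's GICH_LR_* definitions, so check the values against the real headers before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICv2 GICH_LRn layout; names are illustrative only. */
#define LR_VIRTUALID   0x3ffU      /* bits [9:0]: virtual INTID */
#define LR_EOI         (1U << 19)  /* EOI MI request, meaningful when HW == 0 */
#define LR_PENDING_BIT (1U << 28)
#define LR_ACTIVE_BIT  (1U << 29)
#define LR_STATE       (LR_PENDING_BIT | LR_ACTIVE_BIT) /* bits [29:28] */
#define LR_HW          (1U << 31)

/* Build an LR value for a purely virtual interrupt (HW == 0). */
static uint32_t model_make_lr(uint32_t intid, uint32_t state_bits, bool eoi)
{
	assert(intid <= LR_VIRTUALID);         /* VirtualID is 10 bits wide */
	assert((state_bits & ~LR_STATE) == 0); /* only State bits allowed */
	return intid | state_bits | (eoi ? LR_EOI : 0U);
}

/* True when the guest has fully deactivated the interrupt (State 00),
 * an EOI maintenance interrupt was requested, and the interrupt is not
 * backed by a physical one: the point at which KVM can tell a
 * resampler (e.g. VFIO) that the guest is done with it. */
static bool model_lr_signals_eoi_mi(uint32_t lr_val)
{
	return !(lr_val & LR_STATE) && (lr_val & LR_EOI) && !(lr_val & LR_HW);
}
```

With the patch applied, a level interrupt is never written back as Pending+Active, so after the guest's EOI the LR returns to 00 and a check of this shape succeeds, which is consistent with the observation above that kvm_notify_acked_irq() is now called when the state is idle.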
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling
From: Marc Zyngier @ 2018-03-12 10:09 UTC
  To: Yang, Shunyong, cdall
  Cc: linux-kernel, ard.biesheuvel, kvmarm, Zheng, Joey, will.deacon,
	linux-arm-kernel, david.daney, eric.auger

On 12/03/18 02:33, Yang, Shunyong wrote:
> Hi, Marc,
>
> On Sun, 2018-03-11 at 12:17 +0000, Marc Zyngier wrote:
>> On Sun, 11 Mar 2018 01:55:08 +0000
>> Christoffer Dall <cdall@kernel.org> wrote:
>>
>>> On Sat, Mar 10, 2018 at 12:20 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>>
>>>> On Fri, 09 Mar 2018 21:36:12 +0000,
>>>> Christoffer Dall wrote:
>>>>>
>>>>> On Thu, Mar 08, 2018 at 05:28:44PM +0000, Marc Zyngier wrote:
>>>>>>
>>>>>> I'd be more confident if we did forbid P+A for such interrupts
>>>>>> altogether, as they really feel like another kind of HW
>>>>>> interrupt.
>>>>> How about a slightly bigger hammer: Can we avoid doing P+A for level
>>>>> interrupts completely? I don't think that really makes much sense, and
>>>>> I think we simplify everything if we just come back out and resample the
>>>>> line. For an edge, something like a network card, there's a potential
>>>>> performance win to appending a new pending state, but I doubt that this
>>>>> is the case for level interrupts.
>>>> I started implementing the same thing yesterday. Somehow, it feels
>>>> slightly better to have the same flow for all level interrupts,
>>>> including the timer, and we only use the MI on EOI as a way to trigger
>>>> the next state of injection. Still testing, but looking good so far.
>>>>
>>>> I'm still puzzled that we have this level-but-not-quite behaviour for
>>>> VFIO interrupts. At some point, it is going to bite us badly.
>>>>
>>> Where is the departure from level-triggered behavior with VFIO? As
>>> far as I can tell, the GIC flow of the interrupts will be just a level
>>> interrupt,
>> The GIC is fine, I believe. What is not exactly fine is the signalling
>> from the device, which will never be dropped until the EOI has been
>> detected.
>>
>>> but we just need to make sure the resamplefd mechanism is
>>> supported for both types of interrupts. Whether or not that's a
>>> decent mechanism seems orthogonal to me, but that's a discussion for
>>> another day I think.
>> Given that VFIO is built around this mechanism, I don't think we have a
>> choice but to support it. Anyway, I came up with the following patch,
>> which I tested on Seattle with mtty. It also survived my usual
>> hammering of cyclictest, hackbench and bulk VM installs.
>>
>> Shunyong, could you please give it a go?
>>
>> Thanks,
>>
>> M.
>>
>
> I have tested the patch. It works on the QDF2400 platform, and
> kvm_notify_acked_irq() is called when the state is idle.

Thanks a lot for testing.

>
> BTW, I have the following questions from debugging this issue.
> Could you please give me some help?
> 1) What does "mi" mean in the GIC code, e.g. in lr_signals_eoi_mi()?

MI stands for Maintenance Interrupts. Life is too short to write that
all the time ;-)

> 2) In some __hyp_text code, such as __kvm_vcpu_run(), calling printk()
>    causes a "HYP panic:". How can I output debug information there?

You can't. None of the kernel is mapped at EL2 on pre-VHE hardware.
You'll need to find indirect ways of outputting information (store data
in memory, and output it once you're back to EL1).

Thanks again,

	M.
-- 
Jazz is not dead. It just smells funny...
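The "store data in memory, print it once back at EL1" approach can be illustrated with a small user-space model. Everything here is hypothetical: struct hyp_trace, hyp_trace_record() and hyp_trace_dump() are not kernel interfaces, and in a real pre-VHE setup the buffer would additionally have to be mapped into the hypervisor's address space. The split is the point: the "EL2 side" only stores raw values, and all formatting happens on the "EL1 side".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical trace buffer. On pre-VHE hardware it would need to live
 * in memory that is also mapped at EL2; here it is plain memory. */
#define HYP_TRACE_ENTRIES 16

struct hyp_trace {
	uint64_t seq;                     /* records written so far */
	uint64_t data[HYP_TRACE_ENTRIES]; /* ring of raw values */
};

/* "EL2 side": no printk(), no formatting, no calls into the kernel;
 * just store a raw value, overwriting the oldest entry when full. */
static void hyp_trace_record(struct hyp_trace *t, uint64_t val)
{
	t->data[t->seq % HYP_TRACE_ENTRIES] = val;
	t->seq++;
}

/* "EL1 side": format and print the surviving entries, oldest first.
 * Returns how many entries were printed. */
static size_t hyp_trace_dump(const struct hyp_trace *t, FILE *out)
{
	uint64_t first = 0;
	size_t n = 0;

	assert(t != NULL && out != NULL);
	if (t->seq > HYP_TRACE_ENTRIES)
		first = t->seq - HYP_TRACE_ENTRIES;

	for (uint64_t i = first; i < t->seq; i++, n++)
		fprintf(out, "hyp[%llu] = 0x%llx\n",
			(unsigned long long)i,
			(unsigned long long)t->data[i % HYP_TRACE_ENTRIES]);
	return n;
}
```

The sketch ignores concurrency; with several vCPUs running guest code at once, a real version would want per-CPU buffers or some synchronization.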
* Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling
From: Christoffer Dall @ 2018-03-08 16:10 UTC
  To: Marc Zyngier
  Cc: Shunyong Yang, ard.biesheuvel, will.deacon, eric.auger, david.daney,
	linux-arm-kernel, kvmarm, linux-kernel, Joey Zheng

On Thu, Mar 08, 2018 at 09:49:43AM +0000, Marc Zyngier wrote:
> [updated Christoffer's email address]
>
> Hi Shunyong,
>
> On 08/03/18 07:01, Shunyong Yang wrote:
> > When resampling irqfds is enabled, level interrupt should be
> > de-asserted when resampling happens. On page 4-47 of GIC v3
> > specification IHI0069D, it said,
> > "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
> > interface, the IRI changes the status of the interrupt to active
> > and pending if:
> > • It is an edge-triggered interrupt, and another edge has been
> > detected since the interrupt was acknowledged.
> > • It is a level-sensitive interrupt, and the level has not been
> > deasserted since the interrupt was acknowledged."
> >
> > GIC v2 specification IHI0048B.b has similar description on page
> > 3-42 for state machine transition.
> >
> > When some VFIO device, like mtty(8250 VFIO mdev emulation driver
> > in samples/vfio-mdev) triggers a level interrupt, the status
> > transition in LR is pending-->active-->active and pending.
> > Then it will wait resampling to de-assert the interrupt.
> >
> > Current design of lr_signals_eoi_mi() will return false if state
> > in LR is not invalid(Inactive). It causes resampling will not happen
> > in mtty case.
>
> Let me rephrase this, and tell me if I understood it correctly:
>
> - A level interrupt is injected, activated by the guest (LR state=active)
> - guest exits, re-enters, (LR state=pending+active)
> - guest EOIs the interrupt (LR state=pending)
> - maintenance interrupt
> - we don't signal the resampling because we're not in an invalid state
>
> Is that correct?
>
> That's an interesting case, because it seems to invalidate some of the
> optimization that went in over a year ago.
>
> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields
> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state
> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation
>
> We could compare the value of the LR before the guest entry with
> the value at exit time, but we still could miss it if we have a
> transition such as P+A -> P -> A and assume a long enough propagation
> delay for the maintenance interrupt (which is very likely).
>
> In essence, we have lost the benefit of EISR, which was to give us a
> way to deal with asynchronous signalling.
>

I don't understand why EISR gives us anything beyond looking at the LR
and evaluating if the state is 00. My reading of the spec is that the
EISR is merely a shortcut to knowing the state of the LRs but contains
no record or information beyond what you can read from the LRs.

What am I missing?

Thanks,
-Christoffer
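Christoffer's point, that EISR carries nothing the saved LRs do not, can be checked with a small model. As we read the GICv2 and GICv3 specifications, EISR bit n is set exactly when LR n has State 00, HW 0 and EOI 1, so the bitmap can be recomputed from the LR values. model_walk() replays Marc's sequence for one level interrupt with and without the Pending+Active step. The bit layout assumes GICv2's GICH_LRn; NR_LRS, the INTID and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed GICv2 GICH_LRn bits; names modeled on the kernel's. */
#define LR_EOI         (1U << 19)
#define LR_PENDING_BIT (1U << 28)
#define LR_ACTIVE_BIT  (1U << 29)
#define LR_STATE       (LR_PENDING_BIT | LR_ACTIVE_BIT)
#define LR_HW          (1U << 31)

#define NR_LRS 4 /* hypothetical; the real count is implementation defined */

/* Recompute an EISR-style bitmap from saved LR values: bit n is set
 * when LR n is Invalid (State 00), requests an EOI MI and is not HW. */
static uint32_t model_eisr(const uint32_t lrs[NR_LRS])
{
	uint32_t eisr = 0;

	assert(lrs != NULL);
	for (int i = 0; i < NR_LRS; i++) {
		uint32_t lr = lrs[i];

		if (!(lr & LR_STATE) && (lr & LR_EOI) && !(lr & LR_HW))
			eisr |= 1U << i;
	}
	return eisr;
}

/* Replay Marc's sequence for one level interrupt in LR0 and return
 * the EISR the maintenance interrupt handler would observe. */
static uint32_t model_walk(bool allow_pending_active)
{
	uint32_t lrs[NR_LRS] = { 0 };

	lrs[0] = 40 | LR_EOI | LR_ACTIVE_BIT; /* arbitrary SPI, activated: A */
	if (allow_pending_active)
		lrs[0] |= LR_PENDING_BIT;     /* exit/re-entry, line still high: P+A */
	lrs[0] &= ~LR_ACTIVE_BIT;             /* guest EOI deactivates: P or 00 */
	return model_eisr(lrs);
}
```

With Pending+Active allowed, the LR ends in the Pending state after the guest's EOI and no EISR bit is set, so neither EISR nor an LR check would trigger a resample; without it, the LR passes through 00 and both detect the EOI. This agrees with Christoffer's reading that the two carry the same information, though a sequential model like this cannot capture the asynchronous timing Marc is concerned about.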