On Wed, Mar 9, 2022 at 3:04 PM Michael S. Tsirkin wrote: > On Wed, Mar 09, 2022 at 11:41:01AM +0800, Jason Wang wrote: > > > > > > On Wed, Mar 9, 2022 at 12:36 AM Michael S. Tsirkin > wrote: > > > > On Tue, Mar 08, 2022 at 03:19:16PM +0000, Marc Zyngier wrote: > > > On Tue, 19 Oct 2021 08:01:46 +0100, > > > Jason Wang wrote: > > > > > > > > We used to synchronize pending MSI-X irq handlers via > > > > synchronize_irq(), this may not work for the untrusted device > which > > > > may keep sending interrupts after reset which may lead unexpected > > > > results. Similarly, we should not enable MSI-X interrupt until > the > > > > device is ready. So this patch fixes those two issues by: > > > > > > > > 1) switching to use disable_irq() to prevent the virtio interrupt > > > > handlers to be called after the device is reset. > > > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready() > > > > > > > > This can make sure the virtio interrupt handler won't be called > before > > > > virtio_device_ready() and after reset. > > > > > > > > Cc: Thomas Gleixner > > > > Cc: Peter Zijlstra > > > > Cc: Paul E. McKenney > > > > Signed-off-by: Jason Wang > > > > --- > > > > drivers/virtio/virtio_pci_common.c | 27 > +++++++++++++++++++++------ > > > > drivers/virtio/virtio_pci_common.h | 6 ++++-- > > > > drivers/virtio/virtio_pci_legacy.c | 5 +++-- > > > > drivers/virtio/virtio_pci_modern.c | 6 ++++-- > > > > 4 files changed, 32 insertions(+), 12 deletions(-) > > > > > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/ > > virtio_pci_common.c > > > > index b35bb2d57f62..8d8f83aca721 100644 > > > > --- a/drivers/virtio/virtio_pci_common.c > > > > +++ b/drivers/virtio/virtio_pci_common.c > > > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy, > > > > "Force legacy mode for transitional virtio 1 > devices"); > > > > #endif > > > > > > > > -/* wait for pending irq handlers */ > > > > -void vp_synchronize_vectors(struct virtio_device *vdev) > > > > +/* disable irq handlers */ > > > > +void vp_disable_cbs(struct virtio_device *vdev) > > > > { > > > > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > > > > int i; > > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct > virtio_device > > *vdev) > > > > synchronize_irq(vp_dev->pci_dev->irq); > > > > > > > > for (i = 0; i < vp_dev->msix_vectors; ++i) > > > > - synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i)); > > > > + disable_irq(pci_irq_vector(vp_dev->pci_dev, i)); > > > > +} > > > > + > > > > +/* enable irq handlers */ > > > > +void vp_enable_cbs(struct virtio_device *vdev) > > > > +{ > > > > + struct virtio_pci_device *vp_dev = to_vp_device(vdev); > > > > + int i; > > > > + > > > > + if (vp_dev->intx_enabled) > > > > + return; > > > > + > > > > + for (i = 0; i < vp_dev->msix_vectors; ++i) > > > > + enable_irq(pci_irq_vector(vp_dev->pci_dev, i)); > > > > > > This results in a splat at boot time if you set maxcpus=, > > > see below. Enabling interrupts that are affinity managed is *bad*. > You > > > don't even know whether the CPU which is supposed to handle this is > > > online or not. > > > > > > The core kernel notices it, shouts and keeps the interrupt > disabled, > > > but this should be fixed. The whole point of managed interrupts is > to > > > let them be dealt with outside of the drivers, and tied into the > CPUs > > > being brought up and down. If virtio needs (for one reason or > another) > > > to manage interrupts on its own, so be it. But this patch isn't the > > > way to do it, I'm afraid. > > > > > > M. > > > > Thanks for reporting this! > > > > What virtio is doing here isn't unique however. > > > > If device driver is to be hardened against device sending interrupts > > while driver is initializing/cleaning up, it needs kernel to provide > > capability to disable these. > > > > If we then declare that that is impossible for managed interrupts > > then that will mean most devices can't use managed interrupts > > because ideally we'd have all drivers hardened. > > > > > > Or that probably means we need to use a driver specific mechanism as > what we > > did for INTX instead of using NO_AUTO_EN. > > > > Thanks > > What we did for INTX was pretty expensive, we justified this > by saying INTX handling involves expensive IO reads anyway > so it's in the noise. > We only add a READ_ONCE() on the callback like: + if (!READ_ONCE(vp_dev->intx_soft_enabled)) + return IRQ_NONE; It should not be that expensive. > > Let's see what does Thomas suggest. > Yes, sure. Thanks > > > > > > > Thomas I think you were the one who suggested enabling/disabling > > interrupts originally - thoughts? > > > > Feedback appreciated. > > > > > > > > > [ 3.434849] ------------[ cut here ]------------ > > > [ 3.434850] WARNING: CPU: 0 PID: 93 at kernel/irq/chip.c:210 > > irq_startup+0x10 > > > e/0x120 > > > [ 3.434861] Modules linked in: virtio_net(E+) net_failover(E) > failover > > (E) vir > > > tio_blk(E+) bochs(E+) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) > ahci > > (E+) libah > > > ci(E) virtio_pci(E) virtio_pci_legacy_dev(E) > virtio_pci_modern_dev(E) > > virtio(E) > > > drm_kms_helper(E) cec(E) libata(E) crct10dif_pclmul(E) > crct10dif_common > > (E) crc32 > > > _pclmul(E) scsi_mod(E) i2c_i801(E) crc32c_intel(E) psmouse(E) > i2c_smbus > > (E) scsi_ > > > common(E) lpc_ich(E) virtio_ring(E) drm(E) button(E) > > > [ 3.434890] CPU: 0 PID: 93 Comm: systemd-udevd Tainted: G > > > E 5. > > > 17.0-rc7-00020-gea4424be1688 #63 > > > [ 3.434893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), > BIOS > > 0.0.0 02 > > > /06/2015 > > > [ 3.434897] RIP: 0010:irq_startup+0x10e/0x120 > > > [ 3.434904] Code: c0 75 2b 4c 89 e7 31 d2 4c 89 ee e8 dc c5 ff > ff 48 > > 89 ef e8 > > > 94 fe ff ff 41 89 c4 e9 33 ff ff ff e8 e7 ca ff ff e9 50 ff ff ff > <0f> > > 0b eb ac > > > 0f 0b eb a8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 > > > [ 3.434906] RSP: 0018:ffff972c402bbbf0 EFLAGS: 00010002 > > > [ 3.434908] RAX: 0000000000000004 RBX: 0000000000000001 RCX: > > 0000000000000040 > > > [ 3.434912] RDX: 0000000000000000 RSI: ffffffffa768dee0 RDI: > > ffff8bcf8ce34648 > > > [ 3.434913] RBP: ffff8bcfb007a800 R08: 0000000000000004 R09: > > ffffffffa74cb828 > > > [ 3.434915] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000001 > > > [ 3.434916] R13: ffff8bcf8ce34648 R14: ffff8bcf8d185c70 R15: > > 0000000000000200 > > > [ 3.434918] FS: 00007f5b3179f8c0(0000) > GS:ffff8bcffbc00000(0000) > > knlGS:00000 > > > 00000000000 > > > [ 3.434919] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 3.434921] CR2: 000055ca47bab6b8 CR3: 000000017bc40003 CR4: > > 0000000000170ef0 > > > [ 3.434928] Call Trace: > > > [ 3.434936] > > > [ 3.434938] enable_irq+0x48/0x90 > > > [ 3.434943] vp_enable_cbs+0x36/0x70 [virtio_pci] > > > [ 3.434948] virtblk_probe+0x457/0x7dc [virtio_blk] > > > [ 3.434954] virtio_dev_probe+0x1ae/0x280 [virtio] > > > [ 3.434959] really_probe+0x1f5/0x3d0 > > > [ 3.434966] __driver_probe_device+0xfe/0x180 > > > [ 3.434969] driver_probe_device+0x1e/0x90 > > > [ 3.434971] __driver_attach+0xc0/0x1c0 > > > [ 3.434974] ? __device_attach_driver+0xe0/0xe0 > > > [ 3.434976] ? __device_attach_driver+0xe0/0xe0 > > > [ 3.434978] bus_for_each_dev+0x78/0xc0 > > > [ 3.434982] bus_add_driver+0x149/0x1e0 > > > [ 3.434985] driver_register+0x8b/0xe0 > > > [ 3.434987] ? 0xffffffffc01aa000 > > > [ 3.434990] init+0x52/0x1000 [virtio_blk] > > > [ 3.434994] do_one_initcall+0x44/0x200 > > > [ 3.435001] ? kmem_cache_alloc_trace+0x300/0x400 > > > [ 3.435006] do_init_module+0x4c/0x260 > > > [ 3.435013] __do_sys_finit_module+0xb4/0x120 > > > [ 3.435018] do_syscall_64+0x3b/0xc0 > > > [ 3.435027] entry_SYSCALL_64_after_hwframe+0x44/0xae > > > [ 3.435037] RIP: 0033:0x7f5b31c589b9 > > > [ 3.435040] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 > 00 00 > > 48 89 f8 > > > 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 > <48> > > 3d 01 f0 > > > ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48 > > > [ 3.435042] RSP: 002b:00007ffc608fc198 EFLAGS: 00000246 > ORIG_RAX: > > 00000000000 > > > 00139 > > > [ 3.435045] RAX: ffffffffffffffda RBX: 000055ca47ba8700 RCX: > > 00007f5b31c589b9 > > > [ 3.435046] RDX: 0000000000000000 RSI: 00007f5b31de3e2d RDI: > > 0000000000000005 > > > [ 3.435048] RBP: 0000000000020000 R08: 0000000000000000 R09: > > 000055ca47ba9030 > > > [ 3.435049] R10: 0000000000000005 R11: 0000000000000246 R12: > > 00007f5b31de3e2d > > > [ 3.435050] R13: 0000000000000000 R14: 000055ca47ba7060 R15: > > 000055ca47ba8700 > > > [ 3.435053] > > > [ 3.435059] ---[ end trace 0000000000000000 ]--- > > > [ 3.440593] vda: vda1 vda2 vda3 > > > [ 3.445283] scsi host0: Virtio SCSI HBA > > > [ 3.450373] scsi 0:0:0:0: CD-ROM QEMU QEMU > CD-ROM > > 2.5+ PQ > > > : 0 ANSI: 5 > > > > > > -- > > > Without deviation from the norm, progress is not possible. > > > > > >