> On Aug 25, 2022, at 20:39, Klaus Jensen wrote:
>
> On Aug 25 13:56, Klaus Jensen wrote:
>>> On Aug 25 19:16, Jinhao Fan wrote:
>>> On 8/25/2022 5:33 PM, Klaus Jensen wrote:
>>>> I'm still a bit perplexed by this issue, so I just tried moving
>>>> nvme_init_irq_notifier() to the end of nvme_init_cq() and removing this
>>>> first_io_cqe thing. I did not observe any particular issues?
>>>>
>>>> What bad behavior did you encounter? It seems to work fine to me.
>>>
>>> The kernel boots up and gets stuck, waiting for interrupts. Then the request
>>> times out and gets retried three times. Finally the driver seems to decide
>>> that the drive is down and continues to boot.
>>>
>>> I added some prints during debugging and found that the MSI-X message which
>>> got registered in KVM via kvm_irqchip_add_msi_route() is not the same as the
>>> one actually used in msix_notify().
>>>
>>> Are you sure you are using KVM's irqfd?
>>>
>>
>> Pretty sure? Using "ioeventfd=on,irq-eventfd=on" on the controller.
>>
>> And the following patch.
>>
>>
>> diff --git i/hw/nvme/ctrl.c w/hw/nvme/ctrl.c
>> index 30bbda7bb5ae..b2e41d3bd745 100644
>> --- i/hw/nvme/ctrl.c
>> +++ w/hw/nvme/ctrl.c
>> @@ -1490,21 +1490,6 @@ static void nvme_post_cqes(void *opaque)
>>          if (!pending) {
>>              n->cq_pending++;
>>          }
>> -
>> -        if (unlikely(cq->first_io_cqe)) {
>> -            /*
>> -             * Initilize event notifier when first cqe is posted. For irqfd
>> -             * support we need to register the MSI message in KVM. We
>> -             * can not do this registration at CQ creation time because
>> -             * Linux's NVMe driver changes the MSI message after CQ creation.
>> -             */
>> -            cq->first_io_cqe = false;
>> -
>> -            if (n->params.irq_eventfd) {
>> -                nvme_init_irq_notifier(n, cq);
>> -            }
>> -        }
>> -
>>      }
>>
>>      nvme_irq_assert(n, cq);
>> @@ -4914,11 +4899,14 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
>>      }
>>      n->cq[cqid] = cq;
>>      cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
>> +
>>      /*
>>       * Only enable irqfd for IO queues since we always emulate admin queue
>>       * in main loop thread
>>       */
>> -    cq->first_io_cqe = cqid != 0;
>> +    if (cqid && n->params.irq_eventfd) {
>> +        nvme_init_irq_notifier(n, cq);
>> +    }
>>  }
>>
>>
>
> From a trace, this is what I observe:
>
> First, the queue is created and a virq (0) is assigned.
>
> msix_table_mmio_write dev nvme hwaddr 0xc val 0x0 size 4
> pci_nvme_mmio_write addr 0x1000 data 0x7 size 4
> pci_nvme_mmio_doorbell_sq sqid 0 new_tail 7
> pci_nvme_admin_cmd cid 4117 sqid 0 opc 0x5 opname 'NVME_ADM_CMD_CREATE_CQ'
> pci_nvme_create_cq create completion queue, addr=0x104318000, cqid=1, vector=1, qsize=1023, qflags=3, ien=1
> kvm_irqchip_add_msi_route dev nvme vector 1 virq 0
> kvm_irqchip_commit_routes
> pci_nvme_enqueue_req_completion cid 4117 cqid 0 dw0 0x0 dw1 0x0 status 0x0
> pci_nvme_irq_msix raising MSI-X IRQ vector 0
> pci_nvme_mmio_write addr 0x1004 data 0x7 size 4
> pci_nvme_mmio_doorbell_cq cqid 0 new_head 7
>
> We go on and the SQ is created as well.
>
> pci_nvme_mmio_write addr 0x1000 data 0x8 size 4
> pci_nvme_mmio_doorbell_sq sqid 0 new_tail 8
> pci_nvme_admin_cmd cid 4118 sqid 0 opc 0x1 opname 'NVME_ADM_CMD_CREATE_SQ'
> pci_nvme_create_sq create submission queue, addr=0x1049a0000, sqid=1, cqid=1, qsize=1023, qflags=1
> pci_nvme_enqueue_req_completion cid 4118 cqid 0 dw0 0x0 dw1 0x0 status 0x0
> pci_nvme_irq_msix raising MSI-X IRQ vector 0
> pci_nvme_mmio_write addr 0x1004 data 0x8 size 4
> pci_nvme_mmio_doorbell_cq cqid 0 new_head 8
>
>
> Then I get a bunch of update_msi_routes, but the virqs are not related
> to the nvme device.
>
> However, I then assume we hit queue_request_irq() in the kernel and we
> see the MSI-X table updated:
>
> msix_table_mmio_write dev nvme hwaddr 0x1c val 0x1 size 4
> msix_table_mmio_write dev nvme hwaddr 0x10 val 0xfee003f8 size 4
> msix_table_mmio_write dev nvme hwaddr 0x14 val 0x0 size 4
> msix_table_mmio_write dev nvme hwaddr 0x18 val 0x0 size 4
> msix_table_mmio_write dev nvme hwaddr 0x1c val 0x0 size 4
> kvm_irqchip_update_msi_route Updating MSI route virq=0
> ... other virq updates
> kvm_irqchip_commit_routes
>
> Notice the last trace line. The route for virq 0 is updated.
>
> Looks to me that the virq route is implicitly updated with the new
> message, no?

Could you try without the MSI-X masking patch? I suspect our unmask
function actually did the "implicit" update here.