From mboxrd@z Thu Jan 1 00:00:00 1970
From: Cédric Le Goater
Subject: [PATCH v2 14/16] KVM: PPC: Book3S HV: XIVE: add passthrough support
Date: Fri, 22 Feb 2019 12:28:38 +0100
Message-ID: <20190222112840.25000-15-clg@kaod.org>
References: <20190222112840.25000-1-clg@kaod.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc: kvm@vger.kernel.org, Paul Mackerras, Cédric Le Goater, linuxppc-dev@lists.ozlabs.org, David Gibson
To: kvm-ppc@vger.kernel.org
In-Reply-To: <20190222112840.25000-1-clg@kaod.org>
Errors-To: linuxppc-dev-bounces+glppe-linuxppc-embedded-2=m.gmane.org@lists.ozlabs.org
Sender: "Linuxppc-dev"
List-Id: kvm.vger.kernel.org

The KVM XICS-over-XIVE device and the proposed KVM XIVE native device
implement an IRQ space for the guest using the generic IPI interrupts
of the XIVE IC controller. These interrupts are allocated at the OPAL
level and "mapped" into the guest IRQ number space in the range
0-0x1FFF. Interrupt management is performed in the XIVE way: using
loads and stores on the addresses of the XIVE IPI interrupt ESB pages.

Both KVM devices share the same internal structure caching information
on the interrupts, among which the xive_irq_data struct containing the
addresses of the IPI ESB pages and an extra one in case of
passthrough. The latter contains the addresses of the ESB pages of the
underlying HW controller interrupts, PHB4 in all cases for now.

A guest, when running in the XICS legacy interrupt mode, lets the KVM
XICS-over-XIVE device "handle" interrupt management, that is to
perform the loads and stores on the addresses of the ESB pages of the
guest interrupts. However, when running in XIVE native exploitation
mode, the KVM XIVE native device exposes the interrupt ESB pages to
the guest and lets the guest perform the loads and stores directly.

The VMA exposing the ESB pages makes use of a custom VM fault handler
whose role is to populate the VMA with appropriate pages. When a fault
occurs, the guest IRQ number is deduced from the offset, and the ESB
pages of the associated XIVE IPI interrupt are inserted in the VMA
(using the internal structure caching information on the interrupts).

Supporting device passthrough in the guest running in XIVE native
exploitation mode adds some extra refinements because the ESB pages
of a different HW controller (PHB4) need to be exposed to the guest
along with the initial IPI ESB pages of the XIVE IC controller. But
the overall mechanism is the same.

When the device HW irqs are mapped into or unmapped from the guest
IRQ number space, the passthru_irq helpers, kvmppc_xive_set_mapped()
and kvmppc_xive_clr_mapped(), are called to record or clear the
passthrough interrupt information and to perform the switch.

The approach taken by this patch is to clear the ESB pages of the
guest IRQ number being mapped and let the VM fault handler repopulate.
The handler will insert the ESB page corresponding to the HW interrupt
of the device being passed-through or the initial IPI ESB page if the
device is being removed.
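
The offset arithmetic described above can be sketched in plain
userspace C. This is an illustration only, not kernel code: the helper
names are hypothetical, and ESB_PAGE_SHIFT = 16 (64K pages) is an
assumption standing in for the kernel's PAGE_SHIFT. It follows the
layout implied by the patch, where each guest IRQ owns two consecutive
ESB pages, so the byte range of an IRQ starts at
irq * (2 << PAGE_SHIFT) and a faulting page offset maps back to the
IRQ by dividing by two.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64K pages. The kernel code uses PAGE_SHIFT instead. */
#define ESB_PAGE_SHIFT		16
/*
 * Each guest IRQ owns two consecutive ESB pages, as implied by the
 * "2ull << PAGE_SHIFT" length passed to unmap_mapping_range().
 */
#define ESB_PAGES_PER_IRQ	2ull

/* Hypothetical helper: byte offset of the first ESB page of an IRQ. */
static inline uint64_t esb_range_start(uint64_t guest_irq)
{
	return guest_irq * (ESB_PAGES_PER_IRQ << ESB_PAGE_SHIFT);
}

/* Hypothetical helper: length in bytes of the ESB range of one IRQ. */
static inline uint64_t esb_range_len(void)
{
	return ESB_PAGES_PER_IRQ << ESB_PAGE_SHIFT;
}

/* Hypothetical helper: guest IRQ owning a given page offset in the VMA. */
static inline uint64_t esb_pgoff_to_irq(uint64_t pgoff)
{
	return pgoff / ESB_PAGES_PER_IRQ;
}
```

Under these assumptions, guest IRQ 1 covers the byte range
[0x20000, 0x40000), and page offsets 2 and 3 both resolve to guest
IRQ 1.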

Signed-off-by: Cédric Le Goater
---
 arch/powerpc/kvm/book3s_xive.h             |  9 +++++
 arch/powerpc/kvm/book3s_xive.c             | 15 ++++++++
 arch/powerpc/kvm/book3s_xive_native.c      | 41 ++++++++++++++++++++++
 Documentation/virtual/kvm/devices/xive.txt | 15 ++++++++
 4 files changed, 80 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index 6660d138c6b7..d1f832a53811 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -94,6 +94,11 @@ struct kvmppc_xive_src_block {
 	struct kvmppc_xive_irq_state irq_state[KVMPPC_XICS_IRQ_PER_ICS];
 };
 
+struct kvmppc_xive;
+
+struct kvmppc_xive_ops {
+	int (*reset_mapped)(struct kvm *kvm, unsigned long guest_irq);
+};
 
 struct kvmppc_xive {
 	struct kvm *kvm;
@@ -132,6 +137,10 @@ struct kvmppc_xive {
 
 	/* Flags */
 	u8	single_escalation;
+
+	struct kvmppc_xive_ops *ops;
+	struct address_space *mapping;
+	struct mutex mapping_lock;
 };
 
 #define KVMPPC_XIVE_Q_COUNT	8
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 7431e31bc541..7a14512b8944 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -942,6 +942,13 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
 	/* Turn the IPI hard off */
 	xive_vm_esb_load(&state->ipi_data, XIVE_ESB_SET_PQ_01);
 
+	/*
+	 * Reset ESB guest mapping. Needed when ESB pages are exposed
+	 * to the guest in XIVE native mode
+	 */
+	if (xive->ops && xive->ops->reset_mapped)
+		xive->ops->reset_mapped(kvm, guest_irq);
+
 	/* Grab info about irq */
 	state->pt_number = hw_irq;
 	state->pt_data = irq_data_get_irq_handler_data(host_data);
@@ -1027,6 +1034,14 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
 	state->pt_number = 0;
 	state->pt_data = NULL;
 
+	/*
+	 * Reset ESB guest mapping. Needed when ESB pages are exposed
+	 * to the guest in XIVE native mode
+	 */
+	if (xive->ops && xive->ops->reset_mapped) {
+		xive->ops->reset_mapped(kvm, guest_irq);
+	}
+
 	/* Reconfigure the IPI */
 	xive_native_configure_irq(state->ipi_number,
 				  xive_vp(xive, state->act_server),
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 92cab6409e8e..bf60870144f1 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -176,6 +177,35 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
 	return rc;
 }
 
+/*
+ * Device passthrough support
+ */
+static int kvmppc_xive_native_reset_mapped(struct kvm *kvm, unsigned long irq)
+{
+	struct kvmppc_xive *xive = kvm->arch.xive;
+
+	if (irq >= KVMPPC_XIVE_NR_IRQS)
+		return -EINVAL;
+
+	/*
+	 * Clear the ESB pages of the IRQ number being mapped (or
+	 * unmapped) into the guest and let the VM fault handler
+	 * repopulate with the appropriate ESB pages (device or IC)
+	 */
+	pr_debug("clearing esb pages for girq 0x%lx\n", irq);
+	mutex_lock(&xive->mapping_lock);
+	if (xive->mapping)
+		unmap_mapping_range(xive->mapping,
+				    irq * (2ull << PAGE_SHIFT),
+				    2ull << PAGE_SHIFT, 1);
+	mutex_unlock(&xive->mapping_lock);
+	return 0;
+}
+
+static struct kvmppc_xive_ops kvmppc_xive_native_ops = {
+	.reset_mapped = kvmppc_xive_native_reset_mapped,
+};
+
 static int xive_native_esb_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
@@ -253,6 +283,8 @@ static const struct vm_operations_struct xive_native_tima_vmops = {
 static int kvmppc_xive_native_mmap(struct kvm_device *dev,
 				   struct vm_area_struct *vma)
 {
+	struct kvmppc_xive *xive = dev->private;
+
 	/* We only allow mappings at fixed offset for now */
 	if (vma->vm_pgoff == KVM_XIVE_TIMA_PAGE_OFFSET) {
 		if (vma_pages(vma) > 4)
@@ -268,6 +300,13 @@ static int kvmppc_xive_native_mmap(struct kvm_device *dev,
 
 	vma->vm_flags |= VM_IO | VM_PFNMAP;
 	vma->vm_page_prot = pgprot_noncached_wc(vma->vm_page_prot);
+
+	/*
+	 * Grab the KVM device file address_space to be able to clear
+	 * the ESB pages mapping when a device is passed-through into
+	 * the guest.
+	 */
+	xive->mapping = vma->vm_file->f_mapping;
 	return 0;
 }
 
@@ -913,6 +952,7 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
 	xive->dev = dev;
 	xive->kvm = kvm;
 	kvm->arch.xive = xive;
+	mutex_init(&xive->mapping_lock);
 
 	/* We use the default queue size set by the host */
 	xive->q_order = xive_native_default_eq_shift();
@@ -933,6 +973,7 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
 		ret = -ENOMEM;
 
 	xive->single_escalation = xive_native_has_single_escalation();
+	xive->ops = &kvmppc_xive_native_ops;
 
 	if (ret)
 		kfree(xive);
diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
index be5000b2eb5a..7a242cb07e7c 100644
--- a/Documentation/virtual/kvm/devices/xive.txt
+++ b/Documentation/virtual/kvm/devices/xive.txt
@@ -43,6 +43,21 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
   manage the source: to trigger, to EOI, to turn off the source for
   instance.
 
+  3. Device passthrough
+
+  When a device is passed-through into the guest, the source
+  interrupts are from a different HW controller (PHB4) and the ESB
+  pages exposed to the guest should accommodate this change.
+
+  The passthru_irq helpers, kvmppc_xive_set_mapped() and
+  kvmppc_xive_clr_mapped(), are called when the device HW irqs are
+  mapped into or unmapped from the guest IRQ number space. The KVM
+  device extends these helpers to clear the ESB pages of the guest IRQ
+  number being mapped and then lets the VM fault handler repopulate.
+  The handler will insert the ESB page corresponding to the HW
+  interrupt of the device being passed-through or the initial IPI ESB
+  page if the device has been removed.
+
 * Groups:
 
   1. KVM_DEV_XIVE_GRP_CTRL
-- 
2.20.1
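
The clear-and-repopulate approach of this patch can be modelled in a
few lines of userspace C. The sketch below is only an illustration
under stated assumptions: every name in it (xive_model, model_fault(),
model_set_mapped(), NR_IRQS, and so on) is hypothetical, the "mapping"
is a per-IRQ enum rather than real page tables, and the mapping_lock
serialization done by the patch is left out. It shows the optional
reset_mapped hook being called on a passthrough switch and the fault
path picking the device ESB page or the IPI ESB page on the next
access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_IRQS 4	/* tiny IRQ space, for the example only */

enum esb_source { ESB_NONE, ESB_IPI, ESB_DEVICE };

struct irq_state {
	bool passthrough;		/* models state->pt_data != NULL */
	enum esb_source populated;	/* what the guest mapping shows */
};

struct xive_model;

struct xive_model_ops {
	int (*reset_mapped)(struct xive_model *x, unsigned long irq);
};

struct xive_model {
	struct irq_state irqs[NR_IRQS];
	const struct xive_model_ops *ops;	/* NULL: no hook registered */
};

/* Models the VM fault handler: populate lazily on first access. */
static inline enum esb_source model_fault(struct xive_model *x,
					  unsigned long irq)
{
	struct irq_state *s = &x->irqs[irq];

	if (s->populated == ESB_NONE)
		s->populated = s->passthrough ? ESB_DEVICE : ESB_IPI;
	return s->populated;
}

/* Models reset_mapped: drop the mapping so the next access faults. */
static int model_reset_mapped(struct xive_model *x, unsigned long irq)
{
	if (irq >= NR_IRQS)
		return -1;
	x->irqs[irq].populated = ESB_NONE;
	return 0;
}

static const struct xive_model_ops native_ops = {
	.reset_mapped = model_reset_mapped,
};

/* Models set_mapped (true) and clr_mapped (false) in one helper. */
static inline int model_set_mapped(struct xive_model *x, unsigned long irq,
				   bool passthrough)
{
	if (irq >= NR_IRQS)
		return -1;
	if (x->ops && x->ops->reset_mapped)
		x->ops->reset_mapped(x, irq);
	x->irqs[irq].passthrough = passthrough;
	return 0;
}
```

In this model, a device with native_ops registered sees ESB_IPI on the
first fault, ESB_DEVICE after model_set_mapped(..., true), and ESB_IPI
again after model_set_mapped(..., false). With ops left NULL the stale
entry would survive the switch, which is why the hook is needed once
the ESB pages are exposed to the guest.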