On Fri, Feb 22, 2019 at 12:28:38PM +0100, Cédric Le Goater wrote:
> The KVM XICS-over-XIVE device and the proposed KVM XIVE native device
> implement an IRQ space for the guest using the generic IPI interrupts
> of the XIVE IC controller. These interrupts are allocated at the OPAL
> level and "mapped" into the guest IRQ number space in the range 0-0x1FFF.
> Interrupt management is performed in the XIVE way: using loads and
> stores on the addresses of the XIVE IPI interrupt ESB pages.
>
> Both KVM devices share the same internal structure caching information
> on the interrupts, among which the xive_irq_data struct containing the
> addresses of the IPI ESB pages and an extra one in case of passthrough.
> The latter contains the addresses of the ESB pages of the underlying HW
> controller interrupts, PHB4 in all cases for now.
>
> A guest, when running in the XICS legacy interrupt mode, lets the KVM
> XICS-over-XIVE device "handle" interrupt management, that is to
> perform the loads and stores on the addresses of the ESB pages of the
> guest interrupts. However, when running in XIVE native exploitation
> mode, the KVM XIVE native device exposes the interrupt ESB pages to
> the guest and lets the guest perform the loads and stores directly.
>
> The VMA exposing the ESB pages makes use of a custom VM fault handler
> whose role is to populate the VMA with appropriate pages. When a fault
> occurs, the guest IRQ number is deduced from the offset, and the ESB
> pages of the associated XIVE IPI interrupt are inserted in the VMA (using
> the internal structure caching information on the interrupts).
>
> Supporting device passthrough in the guest running in XIVE native
> exploitation mode adds some extra refinements because the ESB pages
> of a different HW controller (PHB4) need to be exposed to the guest
> along with the initial IPI ESB pages of the XIVE IC controller. But
> the overall mechanism is the same.
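
As an aside on the offset arithmetic described above, here is a minimal sketch, not taken from the patch, of how a guest IRQ number and a byte offset into the ESB mapping relate. It assumes two ESB pages per guest IRQ, as implied by the `2ull << PAGE_SHIFT` span passed to unmap_mapping_range() in this patch. The SKETCH_PAGE_SHIFT value of 16 (64K pages) and all sketch_* names are illustrative assumptions, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 64K pages. The kernel uses PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 16

/* From the patch: each guest IRQ spans 2 << PAGE_SHIFT bytes (two pages). */
#define SKETCH_ESB_SPAN (2ull << SKETCH_PAGE_SHIFT)

/* Compile-time sanity check: the span really is two pages. */
static_assert(SKETCH_ESB_SPAN == 2 * (1ull << SKETCH_PAGE_SHIFT),
	      "ESB span must cover two pages");

/* Byte offset of the first ESB page of a guest IRQ within the mapping. */
static inline uint64_t sketch_esb_offset(uint64_t girq)
{
	return girq * SKETCH_ESB_SPAN;
}

/* Inverse direction: the guest IRQ number that owns a given byte offset. */
static inline uint64_t sketch_girq_from_offset(uint64_t offset)
{
	return offset / SKETCH_ESB_SPAN;
}
```

With these assumptions, clearing the pages of one guest IRQ means unmapping SKETCH_ESB_SPAN bytes starting at sketch_esb_offset(girq), and any offset inside that span resolves back to the same guest IRQ.
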
>
> When the device HW irqs are mapped into or unmapped from the guest
> IRQ number space, the passthru_irq helpers, kvmppc_xive_set_mapped()
> and kvmppc_xive_clr_mapped(), are called to record or clear the
> passthrough interrupt information and to perform the switch.
>
> The approach taken by this patch is to clear the ESB pages of the
> guest IRQ number being mapped and let the VM fault handler repopulate.
> The handler will insert the ESB page corresponding to the HW interrupt
> of the device being passed-through or the initial IPI ESB page if the
> device is being removed.
>
> Signed-off-by: Cédric Le Goater
> ---
>  arch/powerpc/kvm/book3s_xive.h             |  9 +++++
>  arch/powerpc/kvm/book3s_xive.c             | 15 ++++++++
>  arch/powerpc/kvm/book3s_xive_native.c      | 41 ++++++++++++++++++++++
>  Documentation/virtual/kvm/devices/xive.txt | 15 ++++++++
>  4 files changed, 80 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
> index 6660d138c6b7..d1f832a53811 100644
> --- a/arch/powerpc/kvm/book3s_xive.h
> +++ b/arch/powerpc/kvm/book3s_xive.h
> @@ -94,6 +94,11 @@ struct kvmppc_xive_src_block {
>  	struct kvmppc_xive_irq_state irq_state[KVMPPC_XICS_IRQ_PER_ICS];
>  };
>
> +struct kvmppc_xive;
> +
> +struct kvmppc_xive_ops {
> +	int (*reset_mapped)(struct kvm *kvm, unsigned long guest_irq);
> +};
>
>  struct kvmppc_xive {
>  	struct kvm *kvm;
> @@ -132,6 +137,10 @@ struct kvmppc_xive {
>
>  	/* Flags */
>  	u8 single_escalation;
> +
> +	struct kvmppc_xive_ops *ops;
> +	struct address_space *mapping;
> +	struct mutex mapping_lock;
>  };
>
>  #define KVMPPC_XIVE_Q_COUNT 8
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 7431e31bc541..7a14512b8944 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -942,6 +942,13 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
>  	/* Turn the IPI hard off */
>  	xive_vm_esb_load(&state->ipi_data, XIVE_ESB_SET_PQ_01);
>
> +	/*
> +	 * Reset ESB guest mapping. Needed when ESB pages are exposed
> +	 * to the guest in XIVE native mode
> +	 */
> +	if (xive->ops && xive->ops->reset_mapped)
> +		xive->ops->reset_mapped(kvm, guest_irq);
> +
>  	/* Grab info about irq */
>  	state->pt_number = hw_irq;
>  	state->pt_data = irq_data_get_irq_handler_data(host_data);
> @@ -1027,6 +1034,14 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
>  	state->pt_number = 0;
>  	state->pt_data = NULL;
>
> +	/*
> +	 * Reset ESB guest mapping. Needed when ESB pages are exposed
> +	 * to the guest in XIVE native mode
> +	 */
> +	if (xive->ops && xive->ops->reset_mapped) {
> +		xive->ops->reset_mapped(kvm, guest_irq);
> +	}
> +
>  	/* Reconfigure the IPI */
>  	xive_native_configure_irq(state->ipi_number,
>  				  xive_vp(xive, state->act_server),
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index 92cab6409e8e..bf60870144f1 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -14,6 +14,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -176,6 +177,35 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
>  	return rc;
>  }
>
> +/*
> + * Device passthrough support
> + */
> +static int kvmppc_xive_native_reset_mapped(struct kvm *kvm, unsigned long irq)
> +{
> +	struct kvmppc_xive *xive = kvm->arch.xive;
> +
> +	if (irq >= KVMPPC_XIVE_NR_IRQS)
> +		return -EINVAL;
> +
> +	/*
> +	 * Clear the ESB pages of the IRQ number being mapped (or
> +	 * unmapped) into the guest and let the VM fault handler
> +	 * repopulate with the appropriate ESB pages (device or IC)
> +	 */
> +	pr_debug("clearing esb pages for girq 0x%lx\n", irq);
> +	mutex_lock(&xive->mapping_lock);
> +	if (xive->mapping)
> +		unmap_mapping_range(xive->mapping,
> +				    irq * (2ull << PAGE_SHIFT),
> +				    2ull << PAGE_SHIFT, 1);
> +	mutex_unlock(&xive->mapping_lock);
> +	return 0;
> +}
> +
> +static struct kvmppc_xive_ops kvmppc_xive_native_ops = {
> +	.reset_mapped = kvmppc_xive_native_reset_mapped,
> +};
> +
>  static int xive_native_esb_fault(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
> @@ -253,6 +283,8 @@ static const struct vm_operations_struct xive_native_tima_vmops = {
>  static int kvmppc_xive_native_mmap(struct kvm_device *dev,
>  				   struct vm_area_struct *vma)
>  {
> +	struct kvmppc_xive *xive = dev->private;
> +
>  	/* We only allow mappings at fixed offset for now */
>  	if (vma->vm_pgoff == KVM_XIVE_TIMA_PAGE_OFFSET) {
>  		if (vma_pages(vma) > 4)
> @@ -268,6 +300,13 @@ static int kvmppc_xive_native_mmap(struct kvm_device *dev,
>
>  	vma->vm_flags |= VM_IO | VM_PFNMAP;
>  	vma->vm_page_prot = pgprot_noncached_wc(vma->vm_page_prot);
> +
> +	/*
> +	 * Grab the KVM device file address_space to be able to clear
> +	 * the ESB pages mapping when a device is passed-through into
> +	 * the guest.
> +	 */
> +	xive->mapping = vma->vm_file->f_mapping;
>  	return 0;
>  }
>
> @@ -913,6 +952,7 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
>  	xive->dev = dev;
>  	xive->kvm = kvm;
>  	kvm->arch.xive = xive;
> +	mutex_init(&xive->mapping_lock);
>
>  	/* We use the default queue size set by the host */
>  	xive->q_order = xive_native_default_eq_shift();
> @@ -933,6 +973,7 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
>  		ret = -ENOMEM;
>
>  	xive->single_escalation = xive_native_has_single_escalation();
> +	xive->ops = &kvmppc_xive_native_ops;
>
>  	if (ret)
>  		kfree(xive);
> diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
> index be5000b2eb5a..7a242cb07e7c 100644
> --- a/Documentation/virtual/kvm/devices/xive.txt
> +++ b/Documentation/virtual/kvm/devices/xive.txt
> @@ -43,6 +43,21 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
>    manage the source: to trigger, to EOI, to turn off the source for
>    instance.
>
> +  3. Device passthrough
> +
> +  When a device is passed-through into the guest, the source
> +  interrupts are from a different HW controller (PHB4) and the ESB
> +  pages exposed to the guest should accommodate this change.
> +
> +  The passthru_irq helpers, kvmppc_xive_set_mapped() and
> +  kvmppc_xive_clr_mapped() are called when the device HW irqs are
> +  mapped into or unmapped from the guest IRQ number space. The KVM
> +  device extends these helpers to clear the ESB pages of the guest IRQ
> +  number being mapped and then lets the VM fault handler repopulate.
> +  The handler will insert the ESB page corresponding to the HW
> +  interrupt of the device being passed-through or the initial IPI ESB
> +  page if the device has been removed.

I think it might be worth emphasizing that this all happens within KVM,
and that userspace and the guest don't need to do anything about this
remapping. Really this is an informational aside, not something a
user of the device actually needs to know.

>  * Groups:
>
>    1. KVM_DEV_XIVE_GRP_CTRL

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson