This series adds support for the OpenCAPI devices for vfio pci.

It builds on top of the existing ocxl driver +
http://patchwork.ozlabs.org/patch/1177999/

VFIO is a Linux kernel driver framework used by QEMU to make devices
directly assignable to virtual machines.

All OpenCAPI devices on the same PCI slot are grouped and assigned to
the same guest.

- Assume these are the devices you want to assign
0007:00:00.0 Processing accelerators: IBM Device 062b
0007:00:00.1 Processing accelerators: IBM Device 062b

- Two Devices in the group
$ ls /sys/bus/pci/devices/0007\:00\:00.0/iommu_group/devices/
0007:00:00.0 0007:00:00.1

- Find vendor & device ID
$ lspci -n -s 0007:00:00
0007:00:00.0 1200: 1014:062b
0007:00:00.1 1200: 1014:062b

- Unbind from the current ocxl device driver if already loaded
$ rmmod ocxl

- Load vfio-pci if it's not already done.
$ modprobe vfio-pci

- Bind to vfio-pci
$ echo 1014 062b > /sys/bus/pci/drivers/vfio-pci/new_id

This will result in a new device node "/dev/vfio/7", which will be
used by QEMU to set up the devices for passthrough.

- Pass to qemu using -device vfio-pci
-device vfio-pci,multifunction=on,host=0007:00:00.0,addr=2.0 -device
vfio-pci,multifunction=on,host=0007:00:00.1,addr=2.1

It has been tested in a bare-metal and QEMU environment using the memcpy
and the AFP AFUs.

christophe lombard (2):
  powerpc/powernv: Register IOMMU group for OpenCAPI devices
  vfio/pci: Introduce OpenCAPI devices support.

 arch/powerpc/platforms/powernv/ocxl.c     | 164 ++++++++++---
 arch/powerpc/platforms/powernv/pci-ioda.c |  19 +-
 arch/powerpc/platforms/powernv/pci.h      |  13 +
 drivers/vfio/pci/Kconfig                  |   7 +
 drivers/vfio/pci/Makefile                 |   1 +
 drivers/vfio/pci/vfio_pci.c               |  19 ++
 drivers/vfio/pci/vfio_pci_ocxl.c          | 287 ++++++++++++++++++++++
 drivers/vfio/vfio.c                       |  25 ++
 include/linux/vfio.h                      |  13 +
 include/uapi/linux/vfio.h                 |  22 ++
 10 files changed, 530 insertions(+), 40 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_ocxl.c

--
2.21.0
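The grouping rule stated in the cover letter (functions on the same PCI slot end up in one group) can be illustrated with a small userspace sketch. This is not code from the series: `struct pci_addr`, `pci_addr_parse()` and `pci_same_slot()` are hypothetical names, and the only thing taken from the posting is the comparison on domain, bus and slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper type: one PCI function address, "DDDD:BB:SS.F". */
struct pci_addr {
	unsigned int domain;
	unsigned int bus;
	unsigned int slot;
	unsigned int fn;
};

/* Parse a string such as "0007:00:00.1"; returns 0 on success, -1 otherwise. */
static int pci_addr_parse(const char *s, struct pci_addr *out)
{
	char trailing;

	/* The final %c only matches if garbage follows the function number. */
	if (sscanf(s, "%x:%x:%x.%x%c", &out->domain, &out->bus,
		   &out->slot, &out->fn, &trailing) != 4)
		return -1;
	if (out->domain > 0xffff || out->bus > 0xff ||
	    out->slot > 0x1f || out->fn > 7)
		return -1;
	return 0;
}

/*
 * Two functions are expected to share a group when domain, bus and slot
 * match; the function number is ignored. Unparsable input yields false.
 */
static bool pci_same_slot(const char *a, const char *b)
{
	struct pci_addr pa, pb;

	assert(a != NULL && b != NULL);
	if (pci_addr_parse(a, &pa) || pci_addr_parse(b, &pb))
		return false;
	return pa.domain == pb.domain && pa.bus == pb.bus &&
	       pa.slot == pb.slot;
}
```

With the addresses from the example above, `pci_same_slot("0007:00:00.0", "0007:00:00.1")` holds, which matches the two entries listed under iommu_group/devices.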
This patch adds group registration for the OpenCAPI devices.

A unique iommu group is registered for multiple PEs, i.e. for a set of
multiple devices sharing the same domain, same bus and same slot.

This group registration will be used to assign an OpenCAPI device to a
guest to participate in VFIO, like vfio-pci.

The release_ownership hook is used to disable the Scheduled Process Area
and clean allocated data if it's not done previously when the ocxl driver
is unloaded.

To support multiple OpenCAPI devices on the same machine, iommu group and
platform data are declared in the npu_link which is common for all
devices sharing the same domain, same bus and same slot.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/ocxl.c     | 164 +++++++++++++++++-----
 arch/powerpc/platforms/powernv/pci-ioda.c |  19 ++-
 arch/powerpc/platforms/powernv/pci.h      |  13 ++
 3 files changed, 156 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index 12b146c2f855..67b2be965415 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -74,6 +74,8 @@ struct npu_link {
 	u16 fn_desired_actags[8];
 	struct actag_range fn_actags[8];
 	bool assignment_done;
+	struct iommu_group *group;
+	struct platform_data data;
 };
 static struct list_head links_list = LIST_HEAD_INIT(links_list);
 static DEFINE_MUTEX(links_list_lock);
@@ -603,54 +605,56 @@ int pnv_ocxl_platform_setup(struct pci_dev *dev, int PE_mask,
 {
 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
 	struct pnv_phb *phb = hose->private_data;
-	struct platform_data *data;
+	struct npu_link *link = NULL;
 	int xsl_irq;
 	u32 bdfn;
-	int rc;
-
-	data = kzalloc(sizeof(*data), GFP_KERNEL);
-	if (!data)
-		return -ENOMEM;
+	int rc = 0;
 
-	rc = alloc_spa(dev, data);
-	if (rc) {
-		kfree(data);
-		return rc;
+	mutex_lock(&links_list_lock);
+	link = find_link(dev);
+	if (!link) {
+		dev_err(&dev->dev, "Failed to setup platform\n");
+		mutex_unlock(&links_list_lock);
+		return -ENODEV;
 	}
 
+	rc = alloc_spa(dev, &link->data);
+	if (rc)
+		goto unlock;
+
 	rc = get_xsl_irq(dev, &xsl_irq);
 	if (rc) {
-		free_spa(data);
-		kfree(data);
-		return rc;
+		free_spa(&link->data);
+		goto unlock;
 	}
 
-	rc = map_xsl_regs(dev, &data->dsisr, &data->dar, &data->tfc,
-			  &data->pe_handle);
+	rc = map_xsl_regs(dev, &link->data.dsisr, &link->data.dar,
+			  &link->data.tfc, &link->data.pe_handle);
 	if (rc) {
-		free_spa(data);
-		kfree(data);
-		return rc;
+		free_spa(&link->data);
+		goto unlock;
 	}
 
 	bdfn = (dev->bus->number << 8) | dev->devfn;
 	rc = opal_npu_spa_setup(phb->opal_id, bdfn,
-				virt_to_phys(data->spa->spa_mem),
+				virt_to_phys(link->data.spa->spa_mem),
 				PE_mask);
 	if (rc) {
 		dev_err(&dev->dev, "Can't setup Shared Process Area: %d\n", rc);
-		unmap_xsl_regs(data->dsisr, data->dar, data->tfc,
-			       data->pe_handle);
-		free_spa(data);
-		kfree(data);
-		return rc;
+		unmap_xsl_regs(link->data.dsisr, link->data.dar,
+			       link->data.tfc, link->data.pe_handle);
+		free_spa(&link->data);
+		goto unlock;
 	}
-	data->phb_opal_id = phb->opal_id;
-	data->bdfn = bdfn;
-	*platform_data = (void *) data;
+	link->data.phb_opal_id = phb->opal_id;
+	link->data.bdfn = bdfn;
 	*hwirq = xsl_irq;
-	return 0;
+	*platform_data = (void *)&link->data;
+
+unlock:
+	mutex_unlock(&links_list_lock);
+	return rc;
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_platform_setup);
 
@@ -682,11 +686,13 @@ void pnv_ocxl_platform_release(void *platform_data)
 	struct platform_data *data = (struct platform_data *)platform_data;
 	int rc;
 
-	rc = opal_npu_spa_setup(data->phb_opal_id, data->bdfn, 0, 0);
-	WARN_ON(rc);
-	unmap_xsl_regs(data->dsisr, data->dar, data->tfc, data->pe_handle);
-	free_spa(data);
-	kfree(data);
+	if (data->spa) {
+		rc = opal_npu_spa_setup(data->phb_opal_id, data->bdfn, 0, 0);
+		WARN_ON(rc);
+		unmap_xsl_regs(data->dsisr, data->dar, data->tfc,
+			       data->pe_handle);
+		free_spa(data);
+	}
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_platform_release);
 
@@ -837,3 +843,95 @@ int pnv_ocxl_remove_pe(void *platform_data, int pasid, u32 *pid,
 	return remove_pe_from_cache(data, *pe_handle);
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_remove_pe);
+
+static void take_ownership(struct iommu_table_group *table_group)
+{
+}
+
+static void release_ownership(struct iommu_table_group *table_group)
+{
+	struct pnv_ioda_pe *pe = container_of(table_group,
+					      struct pnv_ioda_pe,
+					      table_group);
+	struct npu_link *link = NULL;
+
+	mutex_lock(&links_list_lock);
+
+	link = find_link(pe->pdev);
+	if (!link)
+		return;
+
+	if (link->data.spa)
+		pnv_ocxl_platform_release(&link->data);
+
+	mutex_unlock(&links_list_lock);
+}
+
+static long set_window(struct iommu_table_group *table_group,
+		       int num, struct iommu_table *tbl)
+{
+	return 0;
+}
+
+static long unset_window(struct iommu_table_group *table_group,
+			 int num)
+{
+	return 0;
+}
+
+static long create_table(struct iommu_table_group *table_group,
+			 int num, __u32 page_shift, __u64 window_size,
+			 __u32 levels, struct iommu_table **ptbl)
+{
+	return 0;
+}
+
+static struct iommu_table_group_ops pnv_ocxl_ops = {
+	.take_ownership = take_ownership,
+	.release_ownership = release_ownership,
+	.set_window = set_window,
+	.unset_window = unset_window,
+	.create_table = create_table,
+};
+
+static void group_release(void *iommu_data)
+{
+	struct iommu_table_group *table_group = iommu_data;
+
+	table_group->group = NULL;
+}
+
+struct iommu_table_group *pnv_ocxl_setup_table_group(struct pnv_ioda_pe *pe)
+{
+	struct iommu_table_group *table_group;
+	struct npu_link *link = NULL;
+	struct pci_controller *hose;
+
+	mutex_lock(&links_list_lock);
+
+	/* The functions of a device all share the same link and by
+	 * default the same table group
+	 */
+	link = find_link(pe->pdev);
+	if (!link)
+		return NULL;
+
+	hose = pe->phb->hose;
+	table_group = &pe->table_group;
+	table_group->ops = &pnv_ocxl_ops;
+	if (link->group) {
+		table_group->group = link->group;
+		iommu_group_set_iommudata(link->group, table_group,
+					  group_release);
+	} else {
+		if (!table_group->group) {
+			iommu_register_group(table_group,
+					     hose->global_number,
+					     pe->pe_number);
+			link->group = table_group->group;
+		}
+	}
+
+	mutex_unlock(&links_list_lock);
+	return table_group;
+}
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d8080558d020..3f98b05e2d55 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2629,22 +2629,27 @@ static void pnv_pci_ioda_setup_iommu_api(void)
 	list_for_each_entry(hose, &hose_list, list_node) {
 		phb = hose->private_data;
 
-		if (phb->type == PNV_PHB_NPU_NVLINK ||
-		    phb->type == PNV_PHB_NPU_OCAPI)
+		if (phb->type == PNV_PHB_NPU_NVLINK)
 			continue;
 
 		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
 			struct iommu_table_group *table_group;
 
-			table_group = pnv_try_setup_npu_table_group(pe);
-			if (!table_group) {
-				if (!pnv_pci_ioda_pe_dma_weight(pe))
+			if (phb->type == PNV_PHB_NPU_OCAPI) {
+				table_group = pnv_ocxl_setup_table_group(pe);
+				if (!table_group)
 					continue;
+			} else {
+				table_group = pnv_try_setup_npu_table_group(pe);
+				if (!table_group) {
+					if (!pnv_pci_ioda_pe_dma_weight(pe))
+						continue;
 
-				table_group = &pe->table_group;
-				iommu_register_group(&pe->table_group,
+					table_group = &pe->table_group;
+					iommu_register_group(&pe->table_group,
 						pe->phb->hose->global_number,
 						pe->pe_number);
+				}
 			}
 			pnv_ioda_setup_bus_iommu_group(pe, table_group,
 					pe->pbus);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 469c24463247..df4b7583efea 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -218,6 +218,19 @@ extern struct iommu_table_group *pnv_try_setup_npu_table_group(
 extern struct iommu_table_group *pnv_npu_compound_attach(
 		struct pnv_ioda_pe *pe);
 
+/* OpenCAPI functions */
+#if IS_ENABLED(CONFIG_OCXL_BASE)
+extern struct iommu_table_group *pnv_ocxl_setup_table_group(
+		struct pnv_ioda_pe *pe);
+#else
+static inline struct iommu_table_group *pnv_ocxl_setup_table_group(
+		struct pnv_ioda_pe *pe)
+{
+	return NULL;
+}
+#endif /* CONFIG_OCXL_BASE */
+
+
 /* pci-ioda-tce.c */
 #define POWERNV_IOMMU_DEFAULT_LEVELS	1
 #define POWERNV_IOMMU_MAX_LEVELS	5
--
2.21.0
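The "register once, then reuse" behaviour that pnv_ocxl_setup_table_group() applies per link can be modelled in plain userspace C. The sketch below is only an illustration under stated assumptions: `struct link_model`, `struct link_table`, `model_find_link()` and `model_setup_group()` are hypothetical stand-ins, integer ids replace the real struct iommu_group, and no locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINKS 8

/* Stand-in for struct npu_link: one entry per (domain, bus, slot). */
struct link_model {
	int used;
	int domain, bus, slot;
	int group_id;		/* 0 means "no group registered yet" */
};

/* Must be zero-initialised before first use. */
struct link_table {
	struct link_model links[MAX_LINKS];
	int next_group_id;
};

/* Return the link for this slot, claiming a free entry on first sight. */
static struct link_model *model_find_link(struct link_table *t,
					  int domain, int bus, int slot)
{
	struct link_model *free_entry = NULL;
	size_t i;

	assert(t != NULL);
	for (i = 0; i < MAX_LINKS; i++) {
		struct link_model *l = &t->links[i];

		if (l->used && l->domain == domain &&
		    l->bus == bus && l->slot == slot)
			return l;
		if (!l->used && !free_entry)
			free_entry = l;
	}
	if (!free_entry)
		return NULL;	/* table full */

	free_entry->used = 1;
	free_entry->domain = domain;
	free_entry->bus = bus;
	free_entry->slot = slot;
	return free_entry;
}

/*
 * The first function seen on a slot registers a new group; every later
 * function on that slot gets the same id. Returns -1 if the table is full.
 */
static int model_setup_group(struct link_table *t,
			     int domain, int bus, int slot)
{
	struct link_model *l = model_find_link(t, domain, bus, slot);

	if (!l)
		return -1;
	if (!l->group_id)
		l->group_id = ++t->next_group_id;
	return l->group_id;
}
```

Calling `model_setup_group()` twice for domain 7, bus 0, slot 0 returns the same id both times, while a different slot receives a new one, which mirrors the two functions 0007:00:00.0 and 0007:00:00.1 landing in a single group.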
This patch adds new IOCTL commands for VFIO PCI driver to support
configuration and management for OpenCAPI devices, which have been passed
through from host to QEMU VFIO.
OpenCAPI (Open Coherent Accelerator Processor Interface) is an interface
between processors and accelerators.

The main IOCTL command is:

VFIO_DEVICE_OCXL_OP
	Handles devices that support the OpenCAPI interface, using the
	ocxl pnv_* interface.

The following commands are supported, based on the hcalls defined
in ocxl/pseries.c that implements the guest-specific callbacks.

VFIO_DEVICE_OCXL_CONFIG_ADAPTER
	Used to configure OpenCAPI adapter characteristics.

VFIO_DEVICE_OCXL_CONFIG_SPA
	Used to configure the Scheduled Process Area (SPA) table for an
	OpenCAPI device.

VFIO_DEVICE_OCXL_GET_FAULT_STATE
	Used to retrieve fault information from an OpenCAPI device.

VFIO_DEVICE_OCXL_HANDLE_FAULT
	Used to respond to an OpenCAPI fault.

The platform data is declared in the vfio_device_ocxl_link which is
common for all devices sharing the same domain, same bus and same slot.

The lpid value, requested to configure the process element in the
Scheduled Process Area, is not available in the QEMU environment.
This implies getting it from the host through the iommu group.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 drivers/vfio/pci/Kconfig         |   7 +
 drivers/vfio/pci/Makefile        |   1 +
 drivers/vfio/pci/vfio_pci.c      |  19 ++
 drivers/vfio/pci/vfio_pci_ocxl.c | 287 +++++++++++++++++++++++++++++++
 drivers/vfio/vfio.c              |  25 +++
 include/linux/vfio.h             |  13 ++
 include/uapi/linux/vfio.h        |  22 +++
 7 files changed, 374 insertions(+)
 create mode 100644 drivers/vfio/pci/vfio_pci_ocxl.c

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index ac3c1dd3edef..fd3716d10ded 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -45,3 +45,10 @@ config VFIO_PCI_NVLINK2
 	depends on VFIO_PCI && PPC_POWERNV
 	help
 	  VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 GPUs
+
+config VFIO_PCI_OCXL
+	depends on VFIO_PCI
+	def_bool y if OCXL_BASE
+	help
+	  VFIO PCI support for devices which handle the Open Coherent
+	  Accelerator Processor Interface.
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index f027f8a0e89c..6d55a5fee4b0 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -3,5 +3,6 @@
 vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
 vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
 vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
+vfio-pci-$(CONFIG_VFIO_PCI_OCXL) += vfio_pci_ocxl.o
 
 obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 703948c9fbe1..4f9741bbe790 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1128,6 +1128,25 @@ static long vfio_pci_ioctl(void *device_data,
 
 		return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
 					  ioeventfd.data, count, ioeventfd.fd);
+	} else if (cmd == VFIO_DEVICE_OCXL_OP) {
+		struct vfio_device_ocxl_op ocxl_op;
+		int ret = 0;
+
+		minsz = offsetofend(struct vfio_device_ocxl_op, data);
+
+		if (copy_from_user(&ocxl_op, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (ocxl_op.argsz < minsz)
+			return -EINVAL;
+
+		ret = vfio_pci_ocxl_ioctl(vdev->pdev, &ocxl_op);
+
+		if (!ret) {
+			if (copy_to_user((void __user *)arg, &ocxl_op, minsz))
+				ret = -EFAULT;
+		}
+		return ret;
 	}
 
 	return -ENOTTY;
diff --git a/drivers/vfio/pci/vfio_pci_ocxl.c b/drivers/vfio/pci/vfio_pci_ocxl.c
new file mode 100644
index 000000000000..cb5cd4fb416d
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_ocxl.c
@@ -0,0 +1,287 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2019 IBM Corp.
+
+#include <asm/kvm_ppc.h>
+#include <asm/pnv-ocxl.h>
+#include <linux/vfio.h>
+#include <linux/slab.h>
+#include <linux/pci.h>
+#include <linux/kvm_host.h>
+
+struct vfio_device_ocxl_link {
+	struct list_head list;
+	int domain;
+	int bus;
+	int slot;
+	void *platform_data;
+};
+static struct list_head links_list = LIST_HEAD_INIT(links_list);
+static DEFINE_MUTEX(links_list_lock);
+
+#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER			1
+#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_SETUP		1
+#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_RELEASE		2
+#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_GET_ACTAG	3
+#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_GET_PASID	4
+#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_SET_TL		5
+#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_ALLOC_IRQ	6
+#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_FREE_IRQ	7
+
+#define VFIO_DEVICE_OCXL_CONFIG_SPA			2
+#define VFIO_DEVICE_OCXL_CONFIG_SPA_SET			1
+#define VFIO_DEVICE_OCXL_CONFIG_SPA_UPDATE		2
+#define VFIO_DEVICE_OCXL_CONFIG_SPA_REMOVE		3
+
+#define VFIO_DEVICE_OCXL_GET_FAULT_STATE		3
+#define VFIO_DEVICE_OCXL_HANDLE_FAULT			4
+
+static struct vfio_device_ocxl_link *find_link(struct pci_dev *pdev)
+{
+	struct vfio_device_ocxl_link *link;
+
+	list_for_each_entry(link, &links_list, list) {
+		/* The functions of a device all share the same link */
+		if (link->domain == pci_domain_nr(pdev->bus) &&
+		    link->bus == pdev->bus->number &&
+		    link->slot == PCI_SLOT(pdev->devfn)) {
+			return link;
+		}
+	}
+
+	/* link doesn't exist yet. Allocate one */
+	link = kzalloc(sizeof(struct vfio_device_ocxl_link), GFP_KERNEL);
+	if (!link)
+		return NULL;
+	link->domain = pci_domain_nr(pdev->bus);
+	link->bus = pdev->bus->number;
+	link->slot = PCI_SLOT(pdev->devfn);
+	list_add(&link->list, &links_list);
+	return link;
+}
+
+static long irq_mapped(struct pci_dev *pdev,
+		       int host_irq, int guest_irq, bool set)
+{
+	struct irq_desc *desc;
+	struct kvm *kvm;
+	int ret = 0, virq;
+
+	virq = irq_create_mapping(NULL, host_irq);
+	if (!virq) {
+		dev_err(&pdev->dev,
+			"irq_create_mapping failed for translation interrupt\n");
+		return -EINVAL;
+	}
+
+	desc = irq_to_desc(virq);
+	if (!desc) {
+		dev_err(&pdev->dev,
+			"irq_to_desc failed (host_irq: %d, virq: %d)\n",
+			host_irq, virq);
+		return -EIO;
+	}
+
+	kvm = vfio_dev_get_kvm(&pdev->dev);
+	if (!kvm)
+		return -ENODEV;
+
+	mutex_lock(&kvm->lock);
+	if (xics_on_xive()) {
+		if (set)
+			ret = kvmppc_xive_set_mapped(kvm, guest_irq, desc);
+		else
+			ret = kvmppc_xive_clr_mapped(kvm, guest_irq, desc);
+	} else {
+		if (set)
+			kvmppc_xics_set_mapped(kvm, guest_irq, host_irq);
+		else
+			kvmppc_xics_clr_mapped(kvm, guest_irq, host_irq);
+	}
+	mutex_unlock(&kvm->lock);
+	kvm_put_kvm(kvm);
+
+	return ret;
+}
+
+static long config_adapter(struct pci_dev *pdev,
+			   struct vfio_device_ocxl_op *ocxl_op,
+			   struct vfio_device_ocxl_link *link)
+{
+	int PE_mask, host_irq, guest_irq, count, tl_dvsec;
+	u16 base, enabled, supported;
+	u64 cmd;
+	int ret = 0;
+
+	cmd = ocxl_op->data[2];
+	switch (cmd) {
+	case VFIO_DEVICE_OCXL_CONFIG_ADAPTER_SETUP:
+		PE_mask = ocxl_op->data[3];
+		ret = pnv_ocxl_platform_setup(pdev, PE_mask,
+					      &host_irq,
+					      &link->platform_data);
+		if (!ret) {
+			guest_irq = ocxl_op->data[4];
+			ret = irq_mapped(pdev, host_irq, guest_irq, true);
+			if (!ret)
+				ocxl_op->data[0] = host_irq;
+		}
+		break;
+
+	case VFIO_DEVICE_OCXL_CONFIG_ADAPTER_RELEASE:
+		pnv_ocxl_platform_release(link->platform_data);
+
+		host_irq = ocxl_op->data[3];
+		guest_irq = ocxl_op->data[4];
+		if (host_irq && guest_irq)
+			ret = irq_mapped(pdev, host_irq, guest_irq, false);
+		break;
+
+	case VFIO_DEVICE_OCXL_CONFIG_ADAPTER_GET_ACTAG:
+		ret = pnv_ocxl_get_actag(pdev, &base, &enabled,
+					 &supported);
+		if (!ret) {
+			ocxl_op->data[0] = base;
+			ocxl_op->data[1] = enabled;
+			ocxl_op->data[2] = supported;
+		}
+		break;
+
+	case VFIO_DEVICE_OCXL_CONFIG_ADAPTER_GET_PASID:
+		ret = pnv_ocxl_get_pasid_count(pdev, &count);
+		if (!ret)
+			ocxl_op->data[0] = count;
+		break;
+
+	case VFIO_DEVICE_OCXL_CONFIG_ADAPTER_SET_TL:
+		tl_dvsec = ocxl_op->data[3];
+		ret = pnv_ocxl_set_TL(pdev, tl_dvsec);
+		break;
+
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		dev_err(&pdev->dev, "Failed to configure the adapter "
+				    "(cmd: %#llx, ret: %d)\n",
+				    cmd, ret);
+
+	return ret;
+}
+
+static long config_spa(struct pci_dev *pdev,
+		       struct vfio_device_ocxl_op *ocxl_op,
+		       struct vfio_device_ocxl_link *link)
+{
+	int lpid, pid, tid, pasid;
+	int pe_handle, ret = 0;
+	u32 pidr, tidr, amr;
+	struct kvm *kvm;
+	u64 cmd;
+
+	cmd = ocxl_op->data[2];
+	switch (cmd) {
+	case VFIO_DEVICE_OCXL_CONFIG_SPA_SET:
+		kvm = vfio_dev_get_kvm(&pdev->dev);
+		if (!kvm)
+			return -ENODEV;
+		lpid = kvm->arch.lpid;
+		kvm_put_kvm(kvm);
+
+		pasid = ocxl_op->data[3];
+		pidr = ocxl_op->data[4];
+		tidr = ocxl_op->data[5];
+		amr = ocxl_op->data[6];
+
+		ret = pnv_ocxl_set_pe(link->platform_data, lpid, pasid,
+				      pidr, tidr, amr, &pe_handle);
+		if (!ret)
+			ocxl_op->data[0] = pe_handle;
+		break;
+
+	case VFIO_DEVICE_OCXL_CONFIG_SPA_UPDATE:
+		pasid = ocxl_op->data[3];
+		tid = ocxl_op->data[4];
+
+		pnv_ocxl_update_pe(link->platform_data, pasid, tid);
+		break;
+
+	case VFIO_DEVICE_OCXL_CONFIG_SPA_REMOVE:
+		pasid = ocxl_op->data[3];
+
+		ret = pnv_ocxl_remove_pe(link->platform_data, pasid,
+					 &pid, &tid, &pe_handle);
+		if (!ret) {
+			ocxl_op->data[0] = pid;
+			ocxl_op->data[1] = tid;
+			ocxl_op->data[2] = pe_handle;
+		}
+		break;
+
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		dev_err(&pdev->dev, "Failed to configure the SPA "
+				    "(cmd: %#llx, ret: %d)\n",
+				    cmd, ret);
+
+	return ret;
+}
+
+static void get_fault_state(struct vfio_device_ocxl_op *ocxl_op,
+			    struct vfio_device_ocxl_link *link)
+{
+	u64 dsisr, dar, pe_handle;
+	int pid;
+
+	pnv_ocxl_get_fault_state(link->platform_data, &dsisr, &dar,
+				 &pe_handle, &pid);
+
+	ocxl_op->data[0] = dsisr;
+	ocxl_op->data[1] = dar;
+	ocxl_op->data[2] = pe_handle;
+	ocxl_op->data[3] = pid;
+}
+
+static void handle_fault(struct vfio_device_ocxl_op *ocxl_op,
+			 struct vfio_device_ocxl_link *link)
+{
+	u64 tfc;
+
+	tfc = ocxl_op->data[2];
+	pnv_ocxl_handle_fault(link->platform_data, tfc);
+}
+
+long vfio_pci_ocxl_ioctl(struct pci_dev *pdev,
+			 struct vfio_device_ocxl_op *ocxl_op)
+{
+	struct vfio_device_ocxl_link *link;
+	int ret = 0;
+
+	/* The functions of a device all share the same link */
+	mutex_lock(&links_list_lock);
+	link = find_link(pdev);
+
+	switch (ocxl_op->op) {
+	case VFIO_DEVICE_OCXL_CONFIG_ADAPTER:
+		ret = config_adapter(pdev, ocxl_op, link);
+		break;
+	case VFIO_DEVICE_OCXL_CONFIG_SPA:
+		ret = config_spa(pdev, ocxl_op, link);
+		break;
+	case VFIO_DEVICE_OCXL_GET_FAULT_STATE:
+		get_fault_state(ocxl_op, link);
+		break;
+	case VFIO_DEVICE_OCXL_HANDLE_FAULT:
+		handle_fault(ocxl_op, link);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	mutex_unlock(&links_list_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_ocxl_ioctl);
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 388597930b64..31d64ecac690 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -18,6 +18,7 @@
 #include <linux/fs.h>
 #include <linux/idr.h>
 #include <linux/iommu.h>
+#include <linux/kvm_host.h>
 #include <linux/list.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
@@ -2051,6 +2052,30 @@ void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(vfio_group_set_kvm);
 
+struct kvm *vfio_dev_get_kvm(struct device *dev)
+{
+	struct iommu_group *iommu_group;
+	struct vfio_group *group;
+	struct kvm *kvm;
+
+	iommu_group = iommu_group_get(dev);
+	if (!iommu_group)
+		return NULL;
+
+	group = vfio_group_get_from_iommu(iommu_group);
+	if (!group) {
+		iommu_group_put(iommu_group);
+		return NULL;
+	}
+
+	kvm_get_kvm(kvm = group->kvm);
+	iommu_group_put(iommu_group);
+	vfio_group_put(group);
+
+	return kvm;
+}
+EXPORT_SYMBOL_GPL(vfio_dev_get_kvm);
+
 static int vfio_register_group_notifier(struct vfio_group *group,
 					unsigned long *events,
 					struct notifier_block *nb)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e42a711a2800..22ee8d007353 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -129,6 +129,7 @@ extern int vfio_unregister_notifier(struct device *dev,
 
 struct kvm;
 extern void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
+extern struct kvm *vfio_dev_get_kvm(struct device *dev);
 
 /*
  * Sub-module helpers
@@ -195,4 +196,16 @@ extern int vfio_virqfd_enable(void *opaque,
 			      void *data, struct virqfd **pvirqfd, int fd);
 extern void vfio_virqfd_disable(struct virqfd **pvirqfd);
 
+/* OpenCAPI */
+#if IS_ENABLED(CONFIG_OCXL_BASE)
+extern long vfio_pci_ocxl_ioctl(struct pci_dev *pdev,
+				struct vfio_device_ocxl_op *ocxl_op);
+#else
+static inline long vfio_pci_ocxl_ioctl(struct pci_dev *pdev,
+				       struct vfio_device_ocxl_op *ocxl_op)
+{
+	return -ENOTTY;
+}
+#endif /* CONFIG_OCXL_BASE */
+
 #endif /* VFIO_H */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 8f10748dac79..4432593c2e65 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -912,6 +912,28 @@ struct vfio_iommu_spapr_tce_remove {
 };
 #define VFIO_IOMMU_SPAPR_TCE_REMOVE	_IO(VFIO_TYPE, VFIO_BASE + 20)
 
+/**
+ * VFIO_DEVICE_OCXL_OP - _IOW(VFIO_TYPE, VFIO_BASE + 22, struct vfio_device_ocxl_op)
+ *
+ * Handles devices, which supports the OpenCAPI interface, using the
+ * ocxl pnv_* interface.
+ */
+struct vfio_device_ocxl_op {
+	__u32 argsz;
+	__u32 flags;
+	__u32 op;
+	__u64 data[9];
+	/* data to be read and written
+	 * data[0] = buid
+	 * data[1] = config_addr
+	 * data[2] = cmd or data[2] = p1, data[3] = p2, ...
+	 * data[3] = p1, params[4] = p2, ...
+	 *
+	 * data[x] = outx ...
+	 */
+};
+#define VFIO_DEVICE_OCXL_OP	_IO(VFIO_TYPE, VFIO_BASE + 22)
+
 /* ***************************************************************** */
 
 #endif /* _UAPIVFIO_H */
--
2.21.0
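For readers following the data[] layout, here is a minimal userspace sketch of how a caller might fill the argument for the CONFIG_SPA "set" sub-command before issuing VFIO_DEVICE_OCXL_OP. It is an illustration, not part of the series: `struct ocxl_op_mirror` is a local copy of the proposed uapi struct (no released header provides it), the two constants are copied from vfio_pci_ocxl.c where they are private to the kernel, and the helper names are hypothetical. The slot assignments follow config_spa() as posted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed struct vfio_device_ocxl_op. */
struct ocxl_op_mirror {
	uint32_t argsz;
	uint32_t flags;
	uint32_t op;
	uint64_t data[9];
};

/* Values copied from the patch; they are not exported through uapi. */
#define OCXL_OP_CONFIG_SPA	2
#define OCXL_SPA_SET		1

/* Same quantity as offsetofend(struct vfio_device_ocxl_op, data). */
static size_t ocxl_op_minsz(void)
{
	return offsetof(struct ocxl_op_mirror, data) +
	       sizeof(((struct ocxl_op_mirror *)0)->data);
}

/*
 * Fill the request as config_spa() reads it: data[2] carries the
 * sub-command, data[3..6] carry pasid, pidr, tidr and amr.
 */
static void ocxl_op_fill_spa_set(struct ocxl_op_mirror *op, uint64_t pasid,
				 uint32_t pidr, uint32_t tidr, uint32_t amr)
{
	assert(op != NULL);
	memset(op, 0, sizeof(*op));
	op->argsz = (uint32_t)sizeof(*op);	/* must be >= minsz */
	op->op = OCXL_OP_CONFIG_SPA;
	op->data[2] = OCXL_SPA_SET;
	op->data[3] = pasid;
	op->data[4] = pidr;
	op->data[5] = tidr;
	op->data[6] = amr;
}
```

On success the kernel side, as posted, writes the resulting pe_handle back into data[0]; the lpid is not passed by the caller but looked up on the host.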
Hi Christophe,
Sorry, I didn't have time to look at your other series yet and
likely the same for this one with the upcoming KVM Forum... :-\
Anyway, for any VFIO related patch, don't forget to Cc the
maintainer, Alex Williamson <alex.williamson@redhat.com>.
Cheers,
--
Greg
On Thu, 24 Oct 2019 15:28:03 +0200
christophe lombard <clombard@linux.vnet.ibm.com> wrote:
> This series adds support for the OpenCAPI devices for vfio pci.
>
> It builds on top of the existing ocxl driver +
> http://patchwork.ozlabs.org/patch/1177999/
>
> VFIO is a Linux kernel driver framework used by QEMU to make devices
> directly assignable to virtual machines.
>
> All OpenCAPI devices on the same PCI slot will all be grouped and
> assigned to the same guest.
>
> - Assume these are the devices you want to assign
> 0007:00:00.0 Processing accelerators: IBM Device 062b
> 0007:00:00.1 Processing accelerators: IBM Device 062b
>
> - Two Devices in the group
> $ ls /sys/bus/pci/devices/0007\:00\:00.0/iommu_group/devices/
> 0007:00:00.0 0007:00:00.1
>
> - Find vendor & device ID
> $ lspci -n -s 0007:00:00
> 0007:00:00.0 1200: 1014:062b
> 0007:00:00.1 1200: 1014:062b
>
> - Unbind from the current ocxl device driver if already loaded
> $ rmmod ocxl
>
> - Load vfio-pci if it's not already done.
> $ modprobe vfio-pci
>
> - Bind to vfio-pci
> $ echo 1014 062b > /sys/bus/pci/drivers/vfio-pci/new_id
>
> This will result in a new device node "/dev/vfio/7", which will be
> use by QEMU to setup the devices for passthrough.
>
> - Pass to qemu using -device vfio-pci
> -device vfio-pci,multifunction=on,host=0007:00:00.0,addr=2.0 -device
> vfio-pci,multifunction=on,host=0007:00:00.1,addr=2.1
>
> It has been tested in a bare-metal and QEMU environment using the memcpy
> and the AFP AFUs.
>
> christophe lombard (2):
> powerpc/powernv: Register IOMMU group for OpenCAPI devices
> vfio/pci: Introduce OpenCAPI devices support.
>
> arch/powerpc/platforms/powernv/ocxl.c | 164 ++++++++++---
> arch/powerpc/platforms/powernv/pci-ioda.c | 19 +-
> arch/powerpc/platforms/powernv/pci.h | 13 +
> drivers/vfio/pci/Kconfig | 7 +
> drivers/vfio/pci/Makefile | 1 +
> drivers/vfio/pci/vfio_pci.c | 19 ++
> drivers/vfio/pci/vfio_pci_ocxl.c | 287 ++++++++++++++++++++++
> drivers/vfio/vfio.c | 25 ++
> include/linux/vfio.h | 13 +
> include/uapi/linux/vfio.h | 22 ++
> 10 files changed, 530 insertions(+), 40 deletions(-)
> create mode 100644 drivers/vfio/pci/vfio_pci_ocxl.c
>
On 25/10/2019 00:28, christophe lombard wrote:
> This series adds support for the OpenCAPI devices for vfio pci.

You can pass any PCI device via vfio already, what is missing today to
make it fully working? For example, for nvlink gpus it was coherent
memory and ATSD which we needed to expose to the userspace, with
opencapi I thought it is going to be some sort of SRIOV-alike or a
mediated device but all you do here is to pass through the entire device
with no new resources shared by the vfio-pci driver.

>
> It builds on top of the existing ocxl driver +
> http://patchwork.ozlabs.org/patch/1177999/

Next time please provide a link to the series such as:
http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=136572

>
> VFIO is a Linux kernel driver framework used by QEMU to make devices
> directly assignable to virtual machines.
>
> All OpenCAPI devices on the same PCI slot will all be grouped and
> assigned to the same guest.
>
> - Assume these are the devices you want to assign
> 0007:00:00.0 Processing accelerators: IBM Device 062b
> 0007:00:00.1 Processing accelerators: IBM Device 062b
>
> - Two Devices in the group
> $ ls /sys/bus/pci/devices/0007\:00\:00.0/iommu_group/devices/
> 0007:00:00.0 0007:00:00.1
>
> - Find vendor & device ID
> $ lspci -n -s 0007:00:00
> 0007:00:00.0 1200: 1014:062b
> 0007:00:00.1 1200: 1014:062b
>
> - Unbind from the current ocxl device driver if already loaded
> $ rmmod ocxl
>
> - Load vfio-pci if it's not already done.
> $ modprobe vfio-pci
>
> - Bind to vfio-pci
> $ echo 1014 062b > /sys/bus/pci/drivers/vfio-pci/new_id
>
> This will result in a new device node "/dev/vfio/7", which will be
> use by QEMU to setup the devices for passthrough.
>
> - Pass to qemu using -device vfio-pci
> -device vfio-pci,multifunction=on,host=0007:00:00.0,addr=2.0 -device
> vfio-pci,multifunction=on,host=0007:00:00.1,addr=2.1
>
> It has been tested in a bare-metal and QEMU environment using the memcpy
> and the AFP AFUs.

Is there the corresponding qemu tree somewhere?

>
> christophe lombard (2):
> powerpc/powernv: Register IOMMU group for OpenCAPI devices
> vfio/pci: Introduce OpenCAPI devices support.
>
> arch/powerpc/platforms/powernv/ocxl.c | 164 ++++++++++---
> arch/powerpc/platforms/powernv/pci-ioda.c | 19 +-
> arch/powerpc/platforms/powernv/pci.h | 13 +
> drivers/vfio/pci/Kconfig | 7 +
> drivers/vfio/pci/Makefile | 1 +
> drivers/vfio/pci/vfio_pci.c | 19 ++
> drivers/vfio/pci/vfio_pci_ocxl.c | 287 ++++++++++++++++++++++
> drivers/vfio/vfio.c | 25 ++
> include/linux/vfio.h | 13 +
> include/uapi/linux/vfio.h | 22 ++
> 10 files changed, 530 insertions(+), 40 deletions(-)
> create mode 100644 drivers/vfio/pci/vfio_pci_ocxl.c
>

--
Alexey
On 25/10/2019 00:28, christophe lombard wrote:
> This patch adds new IOCTL commands for VFIO PCI driver to support
> configuration and management for OpenCAPI devices, which have been passed
> through from host to QEMU VFIO.

So far we managed to keep all IBM POWER specific inside the IOMMU
subdriver, why is this case different?

You really have to outline what OpenCAPI does and why simple passing
through does not work and how the guest will trigger these ioctls() in
QEMU, are you adding hypercalls? What driver do you expect to work with
this in the guest? arch/powerpc/platforms/powernv/ocxl.c is a
powernv-tied driver which calls into OPAL so it won't work on pseries.

> OpenCAPI (Open Coherent Accelerator Processor Interface) is an interface
> between processors and accelerators.
>
> The main IOCTL command is:
> VFIO_DEVICE_OCXL_OP Handles devices, which supports the OpenCAPI
> interface, using the ocxl pnv_* interface.
>
> The following commands are supported, based on the hcalls defined
> in ocxl/pseries.c that implements the guest-specific callbacks.
> VFIO_DEVICE_OCXL_CONFIG_ADAPTER Used to configure OpenCAPI adapter
> characteristics.
>
> VFIO_DEVICE_OCXL_CONFIG_SPA Used to configure the schedule process
> area (SPA) table for an OpenCAPI device.
>
> VFIO_DEVICE_OCXL_GET_FAULT_STATE Used to retrieve fault information
> from an OpenCAPI device.
>
> VFIO_DEVICE_OCXL_HANDLE_FAULT Used to respond to an OpenCAPI fault.
>
> The platform data is declared in the vfio_pci_ocxl_link which is common
> for each devices sharing the same domain, same bus and same slot.

So this can be IOMMU attributes, no?

> The lpid value, requested to configure the process element in the
> Scheduled Process Area, is not available in the QEMU environment.
> This implies getting it from the host through the iommu group.

lpid does not change after the device is assigned so you can just do
what drivers/vfio/pci/vfio_pci_nvlink2.c does (set it once per IOMMU
group when KVM is attached to a group) and skip the vfio_dev_get_kvm()
part.

Thanks,

> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
> ---
> drivers/vfio/pci/Kconfig | 7 +
> drivers/vfio/pci/Makefile | 1 +
> drivers/vfio/pci/vfio_pci.c | 19 ++
> drivers/vfio/pci/vfio_pci_ocxl.c | 287 +++++++++++++++++++++++++++++++
> drivers/vfio/vfio.c | 25 +++
> include/linux/vfio.h | 13 ++
> include/uapi/linux/vfio.h | 22 +++
> 7 files changed, 374 insertions(+)
> create mode 100644 drivers/vfio/pci/vfio_pci_ocxl.c
>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index ac3c1dd3edef..fd3716d10ded 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -45,3 +45,10 @@ config VFIO_PCI_NVLINK2
>  	depends on VFIO_PCI && PPC_POWERNV
>  	help
>  	  VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 GPUs
> +
> +config VFIO_PCI_OCXL
> +	depends on VFIO_PCI
> +	def_bool y if OCXL_BASE
> +	help
> +	  VFIO PCI support for devices which handle the Open Coherent
> +	  Accelerator Processor Interface.
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index f027f8a0e89c..6d55a5fee4b0 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -3,5 +3,6 @@
>  vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
>  vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
>  vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
> +vfio-pci-$(CONFIG_VFIO_PCI_OCXL) += vfio_pci_ocxl.o
>
>  obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 703948c9fbe1..4f9741bbe790 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1128,6 +1128,25 @@ static long vfio_pci_ioctl(void *device_data,
>
>  		return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
>  					  ioeventfd.data, count, ioeventfd.fd);
> +	} else if (cmd == VFIO_DEVICE_OCXL_OP) {
> +		struct vfio_device_ocxl_op ocxl_op;
> +		int ret = 0;
> +
> +		minsz = offsetofend(struct vfio_device_ocxl_op, data);
> +
> +		if (copy_from_user(&ocxl_op, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (ocxl_op.argsz < minsz)
> +			return -EINVAL;
> +
> +		ret = vfio_pci_ocxl_ioctl(vdev->pdev, &ocxl_op);
> +
> +		if (!ret) {
> +			if (copy_to_user((void __user *)arg, &ocxl_op, minsz))
> +				ret = -EFAULT;
> +		}
> +		return ret;
>  	}
>
>  	return -ENOTTY;
> diff --git a/drivers/vfio/pci/vfio_pci_ocxl.c b/drivers/vfio/pci/vfio_pci_ocxl.c
> new file mode 100644
> index 000000000000..cb5cd4fb416d
> --- /dev/null
> +++ b/drivers/vfio/pci/vfio_pci_ocxl.c
> @@ -0,0 +1,287 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +// Copyright 2019 IBM Corp.
> +
> +#include <asm/kvm_ppc.h>
> +#include <asm/pnv-ocxl.h>
> +#include <linux/vfio.h>
> +#include <linux/slab.h>
> +#include <linux/pci.h>
> +#include <linux/kvm_host.h>
> +
> +struct vfio_device_ocxl_link {
> +	struct list_head list;
> +	int domain;
> +	int bus;
> +	int slot;
> +	void *platform_data;
> +};
> +static struct list_head links_list = LIST_HEAD_INIT(links_list);
> +static DEFINE_MUTEX(links_list_lock);
> +
> +#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER			1
> +#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_SETUP		1
> +#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_RELEASE		2
> +#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_GET_ACTAG	3
> +#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_GET_PASID	4
> +#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_SET_TL		5
> +#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_ALLOC_IRQ	6
> +#define VFIO_DEVICE_OCXL_CONFIG_ADAPTER_FREE_IRQ	7
> +
> +#define VFIO_DEVICE_OCXL_CONFIG_SPA			2
> +#define VFIO_DEVICE_OCXL_CONFIG_SPA_SET			1
> +#define VFIO_DEVICE_OCXL_CONFIG_SPA_UPDATE		2
> +#define VFIO_DEVICE_OCXL_CONFIG_SPA_REMOVE		3
> +
> +#define VFIO_DEVICE_OCXL_GET_FAULT_STATE		3
> +#define VFIO_DEVICE_OCXL_HANDLE_FAULT			4
> +
> +static struct vfio_device_ocxl_link *find_link(struct pci_dev *pdev)
> +{
> +	struct vfio_device_ocxl_link *link;
> +
> +	list_for_each_entry(link, &links_list, list) {
> +		/* The functions of a device all share the same link */
> +		if (link->domain == pci_domain_nr(pdev->bus) &&
> +		    link->bus == pdev->bus->number &&
> +		    link->slot == PCI_SLOT(pdev->devfn)) {
> +			return link;
> +		}
> +	}
> +
> +	/* link doesn't exist yet. Allocate one */
> +	link = kzalloc(sizeof(struct vfio_device_ocxl_link), GFP_KERNEL);
> +	if (!link)
> +		return NULL;
> +	link->domain = pci_domain_nr(pdev->bus);
> +	link->bus = pdev->bus->number;
> +	link->slot = PCI_SLOT(pdev->devfn);
> +	list_add(&link->list, &links_list);
> +	return link;
> +}
> +
> +static long irq_mapped(struct pci_dev *pdev,
> +		       int host_irq, int guest_irq, bool set)
> +{
> +	struct irq_desc *desc;
> +	struct kvm *kvm;
> +	int ret = 0, virq;
> +
> +	virq = irq_create_mapping(NULL, host_irq);
> +	if (!virq) {
> +		dev_err(&pdev->dev,
> +			"irq_create_mapping failed for translation interrupt\n");
> +		return -EINVAL;
> +	}
> +
> +	desc = irq_to_desc(virq);
> +	if (!desc) {
> +		dev_err(&pdev->dev,
> +			"irq_to_desc failed (host_irq: %d, virq: %d)\n",
> +			host_irq, virq);
> +		return -EIO;
> +	}
> +
> +	kvm = vfio_dev_get_kvm(&pdev->dev);
> +	if (!kvm)
> +		return -ENODEV;
> +
> +	mutex_lock(&kvm->lock);
> +	if (xics_on_xive()) {
> +		if (set)
> +			ret = kvmppc_xive_set_mapped(kvm, guest_irq, desc);
> +		else
> +			ret = kvmppc_xive_clr_mapped(kvm, guest_irq, desc);
> +	} else {
> +		if (set)
> +			kvmppc_xics_set_mapped(kvm, guest_irq, host_irq);
> +		else
> +			kvmppc_xics_clr_mapped(kvm, guest_irq, host_irq);
> +	}
> +	mutex_unlock(&kvm->lock);
> +	kvm_put_kvm(kvm);
> +
> +	return ret;
> +}
> +
> +static long config_adapter(struct pci_dev *pdev,
> +			   struct vfio_device_ocxl_op *ocxl_op,
> +			   struct vfio_device_ocxl_link *link)
> +{
> +	int PE_mask, host_irq, guest_irq, count, tl_dvsec;
> +	u16 base, enabled, supported;
> +	u64 cmd;
> +	int ret = 0;
> +
> +	cmd = ocxl_op->data[2];
> +	switch (cmd) {
> +	case VFIO_DEVICE_OCXL_CONFIG_ADAPTER_SETUP:
> +		PE_mask = ocxl_op->data[3];
> +		ret = pnv_ocxl_platform_setup(pdev, PE_mask,
> +					      &host_irq,
> +					      &link->platform_data);
> +		if (!ret) {
> +			guest_irq = ocxl_op->data[4];
> +			ret = irq_mapped(pdev, host_irq, guest_irq, true);
> +			if (!ret)
> +				ocxl_op->data[0] = host_irq;
> +		}
> +		break;
> +
> +	case
VFIO_DEVICE_OCXL_CONFIG_ADAPTER_RELEASE: > + pnv_ocxl_platform_release(link->platform_data); > + > + host_irq = ocxl_op->data[3]; > + guest_irq = ocxl_op->data[4]; > + if (host_irq && guest_irq) > + ret = irq_mapped(pdev, host_irq, guest_irq, false); > + break; > + > + case VFIO_DEVICE_OCXL_CONFIG_ADAPTER_GET_ACTAG: > + ret = pnv_ocxl_get_actag(pdev, &base, &enabled, > + &supported); > + if (!ret) { > + ocxl_op->data[0] = base; > + ocxl_op->data[1] = enabled; > + ocxl_op->data[2] = supported; > + } > + break; > + > + case VFIO_DEVICE_OCXL_CONFIG_ADAPTER_GET_PASID: > + ret = pnv_ocxl_get_pasid_count(pdev, &count); > + if (!ret) > + ocxl_op->data[0] = count; > + break; > + > + case VFIO_DEVICE_OCXL_CONFIG_ADAPTER_SET_TL: > + tl_dvsec = ocxl_op->data[3]; > + ret = pnv_ocxl_set_TL(pdev, tl_dvsec); > + break; > + > + default: > + ret = -EINVAL; > + } > + > + if (ret) > + dev_err(&pdev->dev, "Failed to configure the adapter " > + "(cmd: %#llx, ret: %d)\n", > + cmd, ret); > + > + return ret; > +} > + > +static long config_spa(struct pci_dev *pdev, > + struct vfio_device_ocxl_op *ocxl_op, > + struct vfio_device_ocxl_link *link) > +{ > + int lpid, pid, tid, pasid; > + int pe_handle, ret = 0; > + u32 pidr, tidr, amr; > + struct kvm *kvm; > + u64 cmd; > + > + cmd = ocxl_op->data[2]; > + switch (cmd) { > + case VFIO_DEVICE_OCXL_CONFIG_SPA_SET: > + kvm = vfio_dev_get_kvm(&pdev->dev); > + if (!kvm) > + return -ENODEV; > + lpid = kvm->arch.lpid; > + kvm_put_kvm(kvm); > + > + pasid = ocxl_op->data[3]; > + pidr = ocxl_op->data[4]; > + tidr = ocxl_op->data[5]; > + amr = ocxl_op->data[6]; > + > + ret = pnv_ocxl_set_pe(link->platform_data, lpid, pasid, > + pidr, tidr, amr, &pe_handle); > + if (!ret) > + ocxl_op->data[0] = pe_handle; > + break; > + > + case VFIO_DEVICE_OCXL_CONFIG_SPA_UPDATE: > + pasid = ocxl_op->data[3]; > + tid = ocxl_op->data[4]; > + > + pnv_ocxl_update_pe(link->platform_data, pasid, tid); > + break; > + > + case VFIO_DEVICE_OCXL_CONFIG_SPA_REMOVE: > + pasid = 
ocxl_op->data[3]; > + > + ret = pnv_ocxl_remove_pe(link->platform_data, pasid, > + &pid, &tid, &pe_handle); > + if (!ret) { > + ocxl_op->data[0] = pid; > + ocxl_op->data[1] = tid; > + ocxl_op->data[2] = pe_handle; > + } > + break; > + > + default: > + ret = -EINVAL; > + } > + > + if (ret) > + dev_err(&pdev->dev, "Failed to configure the SPA " > + "(cmd: %#llx, ret: %d)\n", > + cmd, ret); > + > + return ret; > +} > + > +static void get_fault_state(struct vfio_device_ocxl_op *ocxl_op, > + struct vfio_device_ocxl_link *link) > +{ > + u64 dsisr, dar, pe_handle; > + int pid; > + > + pnv_ocxl_get_fault_state(link->platform_data, &dsisr, &dar, > + &pe_handle, &pid); > + > + ocxl_op->data[0] = dsisr; > + ocxl_op->data[1] = dar; > + ocxl_op->data[2] = pe_handle; > + ocxl_op->data[3] = pid; > +} > + > +static void handle_fault(struct vfio_device_ocxl_op *ocxl_op, > + struct vfio_device_ocxl_link *link) > +{ > + u64 tfc; > + > + tfc = ocxl_op->data[2]; > + pnv_ocxl_handle_fault(link->platform_data, tfc); > +} > + > +long vfio_pci_ocxl_ioctl(struct pci_dev *pdev, > + struct vfio_device_ocxl_op *ocxl_op) > +{ > + struct vfio_device_ocxl_link *link; > + int ret = 0; > + > + /* The functions of a device all share the same link */ > + mutex_lock(&links_list_lock); > + link = find_link(pdev); > + > + switch (ocxl_op->op) { > + case VFIO_DEVICE_OCXL_CONFIG_ADAPTER: > + ret = config_adapter(pdev, ocxl_op, link); > + break; > + case VFIO_DEVICE_OCXL_CONFIG_SPA: > + ret = config_spa(pdev, ocxl_op, link); > + break; > + case VFIO_DEVICE_OCXL_GET_FAULT_STATE: > + get_fault_state(ocxl_op, link); > + break; > + case VFIO_DEVICE_OCXL_HANDLE_FAULT: > + handle_fault(ocxl_op, link); > + break; > + default: > + ret = -EINVAL; > + } > + > + mutex_unlock(&links_list_lock); > + return ret; > +} > +EXPORT_SYMBOL_GPL(vfio_pci_ocxl_ioctl); > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > index 388597930b64..31d64ecac690 100644 > --- a/drivers/vfio/vfio.c > +++ b/drivers/vfio/vfio.c > @@ 
-18,6 +18,7 @@ > #include <linux/fs.h> > #include <linux/idr.h> > #include <linux/iommu.h> > +#include <linux/kvm_host.h> > #include <linux/list.h> > #include <linux/miscdevice.h> > #include <linux/module.h> > @@ -2051,6 +2052,30 @@ void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm) > } > EXPORT_SYMBOL_GPL(vfio_group_set_kvm); > > +struct kvm *vfio_dev_get_kvm(struct device *dev) > +{ > + struct iommu_group *iommu_group; > + struct vfio_group *group; > + struct kvm *kvm; > + > + iommu_group = iommu_group_get(dev); > + if (!iommu_group) > + return NULL; > + > + group = vfio_group_get_from_iommu(iommu_group); > + if (!group) { > + iommu_group_put(iommu_group); > + return NULL; > + } > + > + kvm_get_kvm(kvm = group->kvm); > + iommu_group_put(iommu_group); > + vfio_group_put(group); > + > + return kvm; > +} > +EXPORT_SYMBOL_GPL(vfio_dev_get_kvm); > + > static int vfio_register_group_notifier(struct vfio_group *group, > unsigned long *events, > struct notifier_block *nb) > diff --git a/include/linux/vfio.h b/include/linux/vfio.h > index e42a711a2800..22ee8d007353 100644 > --- a/include/linux/vfio.h > +++ b/include/linux/vfio.h > @@ -129,6 +129,7 @@ extern int vfio_unregister_notifier(struct device *dev, > > struct kvm; > extern void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm); > +extern struct kvm *vfio_dev_get_kvm(struct device *dev); > > /* > * Sub-module helpers > @@ -195,4 +196,16 @@ extern int vfio_virqfd_enable(void *opaque, > void *data, struct virqfd **pvirqfd, int fd); > extern void vfio_virqfd_disable(struct virqfd **pvirqfd); > > +/* OpenCAPI */ > +#if IS_ENABLED(CONFIG_OCXL_BASE) > +extern long vfio_pci_ocxl_ioctl(struct pci_dev *pdev, > + struct vfio_device_ocxl_op *ocxl_op); > +#else > +static inline long vfio_pci_ocxl_ioctl(struct pci_dev *pdev, > + struct vfio_device_ocxl_op *ocxl_op) > +{ > + return -ENOTTY; > +} > +#endif /* CONFIG_OCXL_BASE */ > + > #endif /* VFIO_H */ > diff --git a/include/uapi/linux/vfio.h 
b/include/uapi/linux/vfio.h > index 8f10748dac79..4432593c2e65 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -912,6 +912,28 @@ struct vfio_iommu_spapr_tce_remove { > }; > #define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 20) > > +/** > + * VFIO_DEVICE_OCXL_OP - _IOW(VFIO_TYPE, VFIO_BASE + 22, struct vfio_device_ocxl_op) > + * > + * Handles devices, which supports the OpenCAPI interface, using the > + * ocxl pnv_* interface. > + */ > +struct vfio_device_ocxl_op { > + __u32 argsz; > + __u32 flags; > + __u32 op; > + __u64 data[9]; > + /* data to be read and written > + * data[0] = buid > + * data[1] = config_addr > + * data[2] = cmd or data[2] = p1, data[3] = p2, ... > + * data[3] = p1, params[4] = p2, ... > + * > + * data[x] = outx ... > + */ > +}; > +#define VFIO_DEVICE_OCXL_OP _IO(VFIO_TYPE, VFIO_BASE + 22) > + > /* ***************************************************************** */ > > #endif /* _UAPIVFIO_H */ > -- Alexey
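
The suggestion at the top of this reply (record the LPID once per IOMMU group when KVM is attached, instead of calling vfio_dev_get_kvm() on every SPA_SET) can be illustrated with a small user-space model. This is only a sketch of the pattern, not kernel code and not a copy of what vfio_pci_nvlink2.c contains: struct model_kvm, struct model_group, model_group_set_kvm() and model_spa_set() are hypothetical names invented for the illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm; only the LPID matters here. */
struct model_kvm {
	int lpid;
};

/* Hypothetical per-group state: the LPID is cached at attach time. */
struct model_group {
	bool lpid_valid;
	int lpid;
};

/* Models the "KVM attached to / detached from the group" event. */
static void model_group_set_kvm(struct model_group *grp,
				const struct model_kvm *kvm)
{
	if (kvm) {
		grp->lpid = kvm->lpid;
		grp->lpid_valid = true;
	} else {
		grp->lpid_valid = false;
	}
}

/* Models the SPA_SET path: it reads the cached LPID, no KVM lookup. */
static int model_spa_set(const struct model_group *grp, int *lpid_out)
{
	if (!grp->lpid_valid)
		return -ENODEV;
	*lpid_out = grp->lpid;
	return 0;
}
```

In a real driver the attach event would presumably come from a VFIO group notifier, which is how vfio_pci_nvlink2.c learns about KVM as far as I recall; that wiring is deliberately left out of the model.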
On 11/11/19 3:17 pm, Alexey Kardashevskiy wrote:
> What driver do you expect to work with this in the guest?
> arch/powerpc/platforms/powernv/ocxl.c is a powernv-tied driver which
> calls into OPAL so it won't work on pseries.

https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=135087&state=*

https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=137704&state=*

for context

--
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited

On 11/11/2019 15:28, Andrew Donnellan wrote:
> On 11/11/19 3:17 pm, Alexey Kardashevskiy wrote:
>> What driver do you expect to work with this in the guest?
>> arch/powerpc/platforms/powernv/ocxl.c is a powernv-tied driver which
>> calls into OPAL so it won't work on pseries.
>
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=135087&state=*
>
>
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=137704&state=*

This does not help much, and it is definitely outdated: the hcalls
proposed in those series do not match the ioctls proposed by this patchset.

--
Alexey