From: Stefan Hajnoczi <stefanha@redhat.com>
To: Jagannathan Raman <jag.raman@oracle.com>
Cc: elena.ufimtseva@oracle.com, john.g.johnson@oracle.com,
	thuth@redhat.com, bleal@redhat.com, swapnil.ingle@nutanix.com,
	john.levon@nutanix.com, philmd@redhat.com, qemu-devel@nongnu.org,
	wainersm@redhat.com, alex.williamson@redhat.com,
	thanos.makatos@nutanix.com, marcandre.lureau@gmail.com,
	crosa@redhat.com, pbonzini@redhat.com, alex.bennee@linaro.org
Subject: Re: [PATCH v4 11/14] vfio-user: IOMMU support for remote device
Date: Thu, 16 Dec 2021 14:40:25 +0000
Message-ID: <YbtP2eaBnptogQDf@stefanha-x1.localdomain>
In-Reply-To: <acae079dec4261d762311b86a0e699ba9ad79737.1639549843.git.jag.raman@oracle.com>

On Wed, Dec 15, 2021 at 10:35:35AM -0500, Jagannathan Raman wrote:
> Assign separate address space for each device in the remote processes.

If I understand correctly, this isn't really an IOMMU. It's abusing the
IOMMU APIs to create isolated address spaces for each device. This way
memory regions added by the vfio-user client do not conflict when there
are multiple vfio-user servers.

Calling pci_root_bus_new() and keeping one PCI bus per VfuObject might
be a cleaner approach:
- Lets you isolate both PCI Memory Space and IO Space.
- Isolates the PCIDevices and their addresses on the bus.
- Isolates IRQs.
- No more need to abuse the IOMMU API.

I might be missing something because I haven't investigated how to do
this myself.
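
A rough sketch of what I have in mind (untested; the parent device
argument and the VfuObject fields used here are invented for
illustration, they don't exist today):

  static void vfu_object_init_bus(VfuObject *o, DeviceState *parent)
  {
      /* Per-server containers instead of the global system memory/IO */
      memory_region_init(&o->mr, OBJECT(o), "vfu-pci-mem", UINT64_MAX);
      memory_region_init(&o->io, OBJECT(o), "vfu-pci-io", 0x10000);

      /* An isolated root bus also isolates PCIDevice addresses and irqs */
      o->pci_bus = pci_root_bus_new(parent, "vfu-pci-bus", &o->mr, &o->io,
                                    0, TYPE_PCI_BUS);
  }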

> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  include/hw/pci/pci.h      |   2 +
>  include/hw/remote/iommu.h |  24 ++++++++
>  hw/pci/pci.c              |   2 +-
>  hw/remote/iommu.c         | 117 ++++++++++++++++++++++++++++++++++++++
>  hw/remote/machine.c       |   5 ++
>  hw/remote/vfio-user-obj.c |  20 ++++++-
>  MAINTAINERS               |   2 +
>  hw/remote/meson.build     |   1 +
>  8 files changed, 169 insertions(+), 4 deletions(-)
>  create mode 100644 include/hw/remote/iommu.h
>  create mode 100644 hw/remote/iommu.c
> 
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 5c4016b995..f2fc2d5375 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -734,6 +734,8 @@ void lsi53c8xx_handle_legacy_cmdline(DeviceState *lsi_dev);
>  qemu_irq pci_allocate_irq(PCIDevice *pci_dev);
>  void pci_set_irq(PCIDevice *pci_dev, int level);
>  
> +void pci_init_bus_master(PCIDevice *pci_dev);

This function isn't used in this patch. Why make it public?

> +
>  static inline void pci_irq_assert(PCIDevice *pci_dev)
>  {
>      pci_set_irq(pci_dev, 1);
> diff --git a/include/hw/remote/iommu.h b/include/hw/remote/iommu.h
> new file mode 100644
> index 0000000000..42ce0ca383
> --- /dev/null
> +++ b/include/hw/remote/iommu.h
> @@ -0,0 +1,24 @@
> +/*
> + * IOMMU for remote device
> + *
> + * Copyright © 2021 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef REMOTE_IOMMU_H
> +#define REMOTE_IOMMU_H
> +
> +#include "hw/pci/pci_bus.h"
> +
> +void remote_iommu_free(PCIDevice *pci_dev);
> +
> +void remote_iommu_init(void);
> +
> +void remote_iommu_set(PCIBus *bus);
> +
> +MemoryRegion *remote_iommu_get_ram(PCIDevice *pci_dev);
> +
> +#endif
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 4a84e478ce..57d561cc03 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -95,7 +95,7 @@ static const VMStateDescription vmstate_pcibus = {
>      }
>  };
>  
> -static void pci_init_bus_master(PCIDevice *pci_dev)
> +void pci_init_bus_master(PCIDevice *pci_dev)
>  {
>      AddressSpace *dma_as = pci_device_iommu_address_space(pci_dev);
>  
> diff --git a/hw/remote/iommu.c b/hw/remote/iommu.c
> new file mode 100644
> index 0000000000..30c866badb
> --- /dev/null
> +++ b/hw/remote/iommu.c
> @@ -0,0 +1,117 @@
> +/*
> + * Remote IOMMU
> + *
> + * Copyright © 2021 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "hw/remote/iommu.h"
> +#include "hw/pci/pci_bus.h"
> +#include "exec/memory.h"
> +#include "exec/address-spaces.h"
> +#include "trace.h"
> +
> +struct VFUIOMMU {
> +    AddressSpace  as;
> +    MemoryRegion  mr;

I guess this is the root MemoryRegion container? Calling it "root" or
"root_mr" instead of "mr" would make that clearer.

> +};
> +
> +typedef struct VFUPciBus {

There is no uppercase/lowercase consistency between VfuObject,
VFUIOMMU, and VFUPciBus. Although the coding standard doesn't dictate
ABC vs Abc, please be consistent. I suggest following the VfuObject
convention started in the previous patches: the names would be VfuIommu
and VfuPciBus.
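
Something like this, also folding in the "root_mr" rename suggested
above (just a sketch):

  typedef struct VfuIommu {
      AddressSpace as;
      MemoryRegion root_mr;  /* root container for this device's DMA space */
  } VfuIommu;

  typedef struct VfuPciBus {
      PCIBus   *bus;
      VfuIommu *iommu[];
  } VfuPciBus;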

> +    PCIBus           *bus;
> +    struct VFUIOMMU  *iommu[];
> +} VFUPciBus;
> +
> +GHashTable *remote_as_table;
> +
> +static AddressSpace *remote_iommu_get_as(PCIBus *bus, void *opaque, int devfn)
> +{
> +    VFUPciBus *vfu_pci_bus = NULL;
> +    struct VFUIOMMU *iommu = NULL;
> +
> +    if (!remote_as_table) {
> +        return &address_space_memory;
> +    }
> +
> +    vfu_pci_bus = g_hash_table_lookup(remote_as_table, bus);
> +
> +    if (!vfu_pci_bus) {
> +        vfu_pci_bus = g_malloc0(sizeof(VFUPciBus));
> +        vfu_pci_bus->bus = bus;
> +        g_hash_table_insert(remote_as_table, bus, vfu_pci_bus);
> +    }
> +
> +    iommu = vfu_pci_bus->iommu[devfn];
> +
> +    if (!iommu) {
> +        g_autofree char *mr_name = g_strdup_printf("vfu-ram-%d", devfn);
> +        g_autofree char *as_name = g_strdup_printf("vfu-as-%d", devfn);
> +
> +        iommu = g_malloc0(sizeof(struct VFUIOMMU));
> +
> +        memory_region_init(&iommu->mr, NULL, mr_name, UINT64_MAX);
> +        address_space_init(&iommu->as, &iommu->mr, as_name);
> +
> +        vfu_pci_bus->iommu[devfn] = iommu;
> +    }
> +
> +    return &iommu->as;
> +}
> +
> +void remote_iommu_free(PCIDevice *pci_dev)
> +{
> +    VFUPciBus *vfu_pci_bus = NULL;
> +    struct VFUIOMMU *iommu = NULL;
> +
> +    if (!remote_as_table) {
> +        return;
> +    }
> +
> +    vfu_pci_bus = g_hash_table_lookup(remote_as_table, pci_get_bus(pci_dev));
> +
> +    if (!vfu_pci_bus) {
> +        return;
> +    }
> +
> +    iommu = vfu_pci_bus->iommu[pci_dev->devfn];
> +
> +    vfu_pci_bus->iommu[pci_dev->devfn] = NULL;
> +
> +    if (iommu) {
> +        memory_region_unref(&iommu->mr);
> +        address_space_destroy(&iommu->as);
> +        g_free(iommu);
> +    }
> +}
> +
> +void remote_iommu_init(void)
> +{
> +    remote_as_table = g_hash_table_new_full(NULL, NULL, NULL, NULL);
> +}
> +
> +void remote_iommu_set(PCIBus *bus)
> +{
> +    pci_setup_iommu(bus, remote_iommu_get_as, NULL);
> +}
> +
> +MemoryRegion *remote_iommu_get_ram(PCIDevice *pci_dev)
> +{
> +    PCIBus *bus = pci_get_bus(pci_dev);
> +    VFUPciBus *vfu_pci_bus;
> +
> +    if (!remote_as_table) {
> +        return get_system_memory();
> +    }
> +
> +    vfu_pci_bus = g_hash_table_lookup(remote_as_table, bus);
> +    if (!vfu_pci_bus) {
> +        return get_system_memory();
> +    }
> +
> +    return &vfu_pci_bus->iommu[pci_dev->devfn]->mr;
> +}
> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
> index 952105eab5..023be0491e 100644
> --- a/hw/remote/machine.c
> +++ b/hw/remote/machine.c
> @@ -21,6 +21,7 @@
>  #include "qapi/error.h"
>  #include "hw/pci/pci_host.h"
>  #include "hw/remote/iohub.h"
> +#include "hw/remote/iommu.h"
>  
>  static void remote_machine_init(MachineState *machine)
>  {
> @@ -52,6 +53,10 @@ static void remote_machine_init(MachineState *machine)
>  
>      remote_iohub_init(&s->iohub);
>  
> +    remote_iommu_init();
> +
> +    remote_iommu_set(pci_host->bus);
> +
>      pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
>                   &s->iohub, REMOTE_IOHUB_NB_PIRQS);
>  }
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 9399e87cbe..ae375e69b9 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -49,6 +49,7 @@
>  #include "hw/qdev-core.h"
>  #include "hw/pci/pci.h"
>  #include "qemu/timer.h"
> +#include "hw/remote/iommu.h"
>  
>  #define TYPE_VFU_OBJECT "x-vfio-user-server"
>  OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
> @@ -210,6 +211,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
>  
>  static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
>  {
> +    VfuObject *o = vfu_get_private(vfu_ctx);
>      MemoryRegion *subregion = NULL;
>      g_autofree char *name = NULL;
>      static unsigned int suffix;
> @@ -226,14 +228,15 @@ static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
>      memory_region_init_ram_ptr(subregion, NULL, name,
>                                 iov->iov_len, info->vaddr);
>  
> -    memory_region_add_subregion(get_system_memory(), (hwaddr)iov->iov_base,
> -                                subregion);
> +    memory_region_add_subregion(remote_iommu_get_ram(o->pci_dev),
> +                                (hwaddr)iov->iov_base, subregion);
>  
>      trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
>  }
>  
>  static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
>  {
> +    VfuObject *o = vfu_get_private(vfu_ctx);
>      MemoryRegion *mr = NULL;
>      ram_addr_t offset;
>  
> @@ -242,7 +245,7 @@ static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
>          return;
>      }
>  
> -    memory_region_del_subregion(get_system_memory(), mr);
> +    memory_region_del_subregion(remote_iommu_get_ram(o->pci_dev), mr);
>  
>      object_unparent((OBJECT(mr)));
>  
> @@ -320,6 +323,7 @@ static vfu_region_access_cb_t *vfu_object_bar_handlers[PCI_NUM_REGIONS] = {
>   */
>  static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
>  {
> +    VfuObject *o = vfu_get_private(vfu_ctx);
>      int i;
>  
>      for (i = 0; i < PCI_NUM_REGIONS; i++) {
> @@ -332,6 +336,12 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
>                           vfu_object_bar_handlers[i],
>                           VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
>  
> +        if ((o->pci_dev->io_regions[i].type & PCI_BASE_ADDRESS_SPACE) == 0) {
> +            memory_region_unref(o->pci_dev->io_regions[i].address_space);
> +            o->pci_dev->io_regions[i].address_space =
> +                remote_iommu_get_ram(o->pci_dev);
> +        }

This looks hacky. If you create a separate PCIHost for each device
instead, then the BARs will be created in the MemoryRegion (confusingly
named "address_space" in the PCI code) of your choosing.

Also, why is PCI Memory Space isolated via VFUIOMMU but PCI IO Space is
not?
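
For reference, pci_register_bar() already picks the BAR container from
the device's bus (quoting hw/pci/pci.c from memory, so please
double-check):

    r->address_space = type & PCI_BASE_ADDRESS_SPACE_IO
                        ? pci_get_bus(pci_dev)->address_space_io
                        : pci_get_bus(pci_dev)->address_space_mem;

With a per-device root bus there would be no need to patch
io_regions[].address_space by hand.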
