All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/9] spapr_pci: Prepare for VFIO
@ 2014-05-21 14:21 Alexey Kardashevskiy
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 1/9] spapr: Enable dynamic change of the supported hypercalls list Alexey Kardashevskiy
                   ` (8 more replies)
  0 siblings, 9 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf


This patchset prepares QEMU for VFIO support on SPAPR. It also does
preparations for Dynamic DMA window feature which allows to
create DMA windows with 16MB IOMMU pages which will allow to map
the entire guest RAM for DMA at almost no cost.


Alexey Kardashevskiy (9):
  spapr: Enable dynamic change of the supported hypercalls list
  spapr_iommu: Enable multiple TCE requests
  spapr_pci: Introduce a finish_realize() callback
  spapr_pci: spapr_iommu: Make DMA window a subregion
  spapr_pci: Allow multiple TCE tables per PHB
  spapr_iommu: Convert old qdev_init_nofail() to
    object_property_set_bool
  spapr_iommu: Get rid of window_size in sPAPRTCETable
  spapr_iommu: Introduce page_shift in sPAPRTCETable
  spapr_iommu: Introduce bus_offset in sPAPRTCETable

 hw/ppc/spapr.c              |  25 ++++++-
 hw/ppc/spapr_iommu.c        | 158 +++++++++++++++++++++++++++++++++-----------
 hw/ppc/spapr_pci.c          |  96 ++++++++++++++++++++++-----
 hw/ppc/spapr_vio.c          |   6 +-
 include/hw/pci-host/spapr.h |  18 ++++-
 include/hw/ppc/spapr.h      |   8 ++-
 target-ppc/kvm.c            |  11 ++-
 target-ppc/kvm_ppc.h        |   9 ++-
 trace-events                |   2 +
 9 files changed, 264 insertions(+), 69 deletions(-)

-- 
1.9.rc0

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH 1/9] spapr: Enable dynamic change of the supported hypercalls list
  2014-05-21 14:21 [Qemu-devel] [PATCH 0/9] spapr_pci: Prepare for VFIO Alexey Kardashevskiy
@ 2014-05-21 14:21 ` Alexey Kardashevskiy
  2014-05-21 14:26   ` Alexander Graf
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 2/9] spapr_iommu: Enable multiple TCE requests Alexey Kardashevskiy
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

At the moment the "ibm,hypertas-functions" list is fixed. However some
calls should be listed there if they are supported by QEMU or the host
kernel.

This enables hyperrtas_prop to grow on stack by adding
a SPAPR_HYPERRTAS_ADD macro.

The first user of this is going to be a "multi-tce" property.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0a61246..e174e04 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -306,8 +306,6 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     CPUState *cs;
     uint32_t start_prop = cpu_to_be32(initrd_base);
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
-    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
-        "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk\0hcall-set-mode";
     char qemu_hypertas_prop[] = "hcall-memop1";
     uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)};
     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
@@ -316,6 +314,24 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     QemuOpts *opts = qemu_opts_find(qemu_find_opts("smp-opts"), NULL);
     unsigned sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
     uint32_t cpus_per_socket = sockets ? (smp_cpus / sockets) : 1;
+    char *hypertas_prop = NULL;
+    int hypertas_prop_len = 0;
+
+#define SPAPR_HYPERRTAS_ADD(prop)                                   \
+    do {                                                            \
+        const char proptmp[] = prop;                                \
+        char *httmp = alloca(hypertas_prop_len + sizeof(proptmp));  \
+        if (hypertas_prop_len) {                                    \
+            memcpy(httmp, hypertas_prop, hypertas_prop_len);        \
+        }                                                           \
+        memcpy(httmp + hypertas_prop_len, proptmp, sizeof(proptmp));\
+        hypertas_prop_len += sizeof(proptmp);                       \
+        hypertas_prop = httmp;                                      \
+    } while (0)
+
+    SPAPR_HYPERRTAS_ADD(
+        "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
+        "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk\0hcall-set-mode");
 
     fdt = g_malloc0(FDT_MAX_SIZE);
     _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
@@ -485,7 +501,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     _FDT((fdt_begin_node(fdt, "rtas")));
 
     _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
-                       sizeof(hypertas_prop))));
+                       hypertas_prop_len)));
     _FDT((fdt_property(fdt, "qemu,hypertas-functions", qemu_hypertas_prop,
                        sizeof(qemu_hypertas_prop))));
 
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH 2/9] spapr_iommu: Enable multiple TCE requests
  2014-05-21 14:21 [Qemu-devel] [PATCH 0/9] spapr_pci: Prepare for VFIO Alexey Kardashevskiy
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 1/9] spapr: Enable dynamic change of the supported hypercalls list Alexey Kardashevskiy
@ 2014-05-21 14:21 ` Alexey Kardashevskiy
  2014-05-21 14:37   ` Alexander Graf
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 3/9] spapr_pci: Introduce a finish_realize() callback Alexey Kardashevskiy
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

Currently only single TCE entry per request is supported (H_PUT_TCE).
However PAPR+ specification allows multiple entry requests such as
H_PUT_TCE_INDIRECT and H_STUFF_TCE. Having less transitions to the host
kernel via ioctls, support of these calls can accelerate IOMMU operations.

This implements H_STUFF_TCE and H_PUT_TCE_INDIRECT.

This advertises "multi-tce" capability to the guest if the host kernel
supports it (KVM_CAP_SPAPR_MULTITCE) or guest is running in TCG mode.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v1:
* removed checks for liobn as the check is performed already in
spapr_tce_find_by_liobn
* added hcall-multi-tce if the host kernel supports the capability
---
 hw/ppc/spapr.c       |  3 ++
 hw/ppc/spapr_iommu.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 target-ppc/kvm.c     |  7 +++++
 target-ppc/kvm_ppc.h |  7 +++++
 trace-events         |  2 ++
 5 files changed, 97 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e174e04..66929cb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -500,6 +500,9 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     /* RTAS */
     _FDT((fdt_begin_node(fdt, "rtas")));
 
+    if (kvmppc_spapr_use_multitce()) {
+        SPAPR_HYPERRTAS_ADD("hcall-multi-tce");
+    }
     _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
                        hypertas_prop_len)));
     _FDT((fdt_property(fdt, "qemu,hypertas-functions", qemu_hypertas_prop,
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 72493d8..ab5037c 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -224,6 +224,82 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
     return H_SUCCESS;
 }
 
+static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
+                                       sPAPREnvironment *spapr,
+                                       target_ulong opcode, target_ulong *args)
+{
+    int i;
+    target_ulong liobn = args[0];
+    target_ulong ioba = args[1];
+    target_ulong ioba1 = ioba;
+    target_ulong tce_list = args[2];
+    target_ulong npages = args[3];
+    target_ulong ret = H_PARAMETER;
+    sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
+    CPUState *cs = CPU(cpu);
+
+    if (!tcet) {
+        return H_PARAMETER;
+    }
+
+    if (npages > 512) {
+        return H_PARAMETER;
+    }
+
+    ioba &= ~SPAPR_TCE_PAGE_MASK;
+    tce_list &= ~SPAPR_TCE_PAGE_MASK;
+
+    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
+        target_ulong tce = ldq_phys(cs->as, tce_list +
+                                    i * sizeof(target_ulong));
+        ret = put_tce_emu(tcet, ioba, tce);
+        if (ret) {
+            break;
+        }
+    }
+
+    /* Trace last successful or the first problematic entry */
+    i = i ? (i - 1) : 0;
+    trace_spapr_iommu_indirect(liobn, ioba1, tce_list, i,
+                               ldq_phys(cs->as,
+                               tce_list + i * sizeof(target_ulong)),
+                               ret);
+
+    return ret;
+}
+
+static target_ulong h_stuff_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                              target_ulong opcode, target_ulong *args)
+{
+    int i;
+    target_ulong liobn = args[0];
+    target_ulong ioba = args[1];
+    target_ulong tce_value = args[2];
+    target_ulong npages = args[3];
+    target_ulong ret = H_PARAMETER;
+    sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
+
+    if (!tcet) {
+        return H_PARAMETER;
+    }
+
+    if (npages > tcet->nb_table) {
+        return H_PARAMETER;
+    }
+
+    ioba &= ~SPAPR_TCE_PAGE_MASK;
+
+    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
+        ret = put_tce_emu(tcet, ioba, tce_value);
+        if (ret) {
+            break;
+        }
+    }
+    trace_spapr_iommu_stuff(liobn, ioba, tce_value, npages, ret);
+
+    return ret;
+}
+
 static target_ulong h_put_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                               target_ulong opcode, target_ulong *args)
 {
@@ -332,6 +408,8 @@ static void spapr_tce_table_class_init(ObjectClass *klass, void *data)
     /* hcall-tce */
     spapr_register_hypercall(H_PUT_TCE, h_put_tce);
     spapr_register_hypercall(H_GET_TCE, h_get_tce);
+    spapr_register_hypercall(H_PUT_TCE_INDIRECT, h_put_tce_indirect);
+    spapr_register_hypercall(H_STUFF_TCE, h_stuff_tce);
 }
 
 static TypeInfo spapr_tce_table_info = {
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index ca31027..e1fb29b 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -62,6 +62,7 @@ static int cap_booke_sregs;
 static int cap_ppc_smt;
 static int cap_ppc_rma;
 static int cap_spapr_tce;
+static int cap_spapr_multitce;
 static int cap_hior;
 static int cap_one_reg;
 static int cap_epr;
@@ -98,6 +99,7 @@ int kvm_arch_init(KVMState *s)
     cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
     cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
     cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
+    cap_spapr_multitce = kvm_check_extension(s, KVM_CAP_SPAPR_MULTITCE);
     cap_one_reg = kvm_check_extension(s, KVM_CAP_ONE_REG);
     cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
     cap_epr = kvm_check_extension(s, KVM_CAP_PPC_EPR);
@@ -1608,6 +1610,11 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
 }
 #endif
 
+bool kvmppc_spapr_use_multitce(void)
+{
+    return !kvm_enabled() || cap_spapr_multitce;
+}
+
 void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd)
 {
     struct kvm_create_spapr_tce args = {
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index ff077ec..1756ad2 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -31,6 +31,7 @@ int kvmppc_set_tcr(PowerPCCPU *cpu);
 int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu);
 #ifndef CONFIG_USER_ONLY
 off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem);
+bool kvmppc_spapr_use_multitce(void);
 void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd);
 int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
 int kvmppc_reset_htab(int shift_hint);
@@ -130,6 +131,12 @@ static inline off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem)
     return 0;
 }
 
+static inline bool kvmppc_spapr_use_multitce(void)
+{
+    /* No KVM, so always use the qemu multitce implementations */
+    return true;
+}
+
 static inline void *kvmppc_create_spapr_tce(uint32_t liobn,
                                             uint32_t window_size, int *fd)
 {
diff --git a/trace-events b/trace-events
index 5997846..89719c7 100644
--- a/trace-events
+++ b/trace-events
@@ -1186,6 +1186,8 @@ xics_ics_eoi(int nr) "ics_eoi: irq %#x"
 # hw/ppc/spapr_iommu.c
 spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64
 spapr_iommu_get(uint64_t liobn, uint64_t ioba, uint64_t ret, uint64_t tce) "liobn=%"PRIx64" ioba=0x%"PRIx64" ret=%"PRId64" tce=0x%"PRIx64
+spapr_iommu_indirect(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t iobaN, uint64_t tceN, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tcelist=0x%"PRIx64" iobaN=0x%"PRIx64" tceN=0x%"PRIx64" ret=%"PRId64
+spapr_iommu_stuff(uint64_t liobn, uint64_t ioba, uint64_t tce_value, uint64_t npages, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tcevalue=0x%"PRIx64" npages=%"PRId64" ret=%"PRId64
 spapr_iommu_xlate(uint64_t liobn, uint64_t ioba, uint64_t tce, unsigned perm, unsigned pgsize) "liobn=%"PRIx64" 0x%"PRIx64" -> 0x%"PRIx64" perm=%u mask=%x"
 spapr_iommu_new_table(uint64_t liobn, void *tcet, void *table, int fd) "liobn=%"PRIx64" tcet=%p table=%p fd=%d"
 
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH 3/9] spapr_pci: Introduce a finish_realize() callback
  2014-05-21 14:21 [Qemu-devel] [PATCH 0/9] spapr_pci: Prepare for VFIO Alexey Kardashevskiy
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 1/9] spapr: Enable dynamic change of the supported hypercalls list Alexey Kardashevskiy
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 2/9] spapr_iommu: Enable multiple TCE requests Alexey Kardashevskiy
@ 2014-05-21 14:21 ` Alexey Kardashevskiy
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 4/9] spapr_pci: spapr_iommu: Make DMA window a subregion Alexey Kardashevskiy
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

The spapr-pci PHB initializes IOMMU for emulated devices only.
The upcoming VFIO support will do it different. However both emulated
and VFIO PHB types share most of the initialization code.
For the type specific things a new finish_realize() callback is
introduced.

This introduces sPAPRPHBClass derived from PCIHostBridgeClass and
adds the callback pointer.

This implements finish_realize() for emulated devices.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

The difference to VFIO is that for VFIO we have to ask the kernel
about DMA window size before calling spapr_tce_new_table() and
then we have to tell VFIO KVM device about LIOBN <-> IOMMU link.
---
 hw/ppc/spapr_pci.c          | 38 ++++++++++++++++++++++++++------------
 include/hw/pci-host/spapr.h | 14 ++++++++++++++
 2 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 1db73f2..b141e83 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -528,6 +528,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     SysBusDevice *s = SYS_BUS_DEVICE(dev);
     sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
     PCIHostState *phb = PCI_HOST_BRIDGE(s);
+    sPAPRPHBClass *info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(s);
     char *namebuf;
     int i;
     PCIBus *bus;
@@ -609,18 +610,6 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
                            PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
     phb->bus = bus;
 
-    sphb->dma_window_start = 0;
-    sphb->dma_window_size = 0x40000000;
-    sphb->tcet = spapr_tce_new_table(dev, sphb->dma_liobn,
-                                     sphb->dma_window_size);
-    if (!sphb->tcet) {
-        error_setg(errp, "Unable to create TCE table for %s",
-                   sphb->dtbusname);
-        return;
-    }
-    address_space_init(&sphb->iommu_as, spapr_tce_get_iommu(sphb->tcet),
-                       sphb->dtbusname);
-
     pci_setup_iommu(bus, spapr_pci_dma_iommu, sphb);
 
     pci_bus_set_route_irq_fn(bus, spapr_route_intx_pin_to_irq);
@@ -639,6 +628,28 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
 
         sphb->lsi_table[i].irq = irq;
     }
+
+    if (!info->finish_realize) {
+        error_setg(errp, "finish_realize not defined");
+        return;
+    }
+
+    info->finish_realize(sphb, errp);
+}
+
+static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
+{
+    sphb->dma_window_start = 0;
+    sphb->dma_window_size = 0x40000000;
+    sphb->tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
+                                     sphb->dma_window_size);
+    if (!sphb->tcet) {
+        error_setg(errp, "Unable to create TCE table for %s",
+                   sphb->dtbusname);
+        return ;
+    }
+    address_space_init(&sphb->iommu_as, spapr_tce_get_iommu(sphb->tcet),
+                       sphb->dtbusname);
 }
 
 static void spapr_phb_reset(DeviceState *qdev)
@@ -722,6 +733,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
 {
     PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
     DeviceClass *dc = DEVICE_CLASS(klass);
+    sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
 
     hc->root_bus_path = spapr_phb_root_bus_path;
     dc->realize = spapr_phb_realize;
@@ -730,6 +742,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
     dc->vmsd = &vmstate_spapr_pci;
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     dc->cannot_instantiate_with_device_add_yet = false;
+    spc->finish_realize = spapr_phb_finish_realize;
 }
 
 static const TypeInfo spapr_phb_info = {
@@ -737,6 +750,7 @@ static const TypeInfo spapr_phb_info = {
     .parent        = TYPE_PCI_HOST_BRIDGE,
     .instance_size = sizeof(sPAPRPHBState),
     .class_init    = spapr_phb_class_init,
+    .class_size    = sizeof(sPAPRPHBClass),
 };
 
 PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 970b4a9..ab29281 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -34,6 +34,20 @@
 #define SPAPR_PCI_HOST_BRIDGE(obj) \
     OBJECT_CHECK(sPAPRPHBState, (obj), TYPE_SPAPR_PCI_HOST_BRIDGE)
 
+#define SPAPR_PCI_HOST_BRIDGE_CLASS(klass) \
+     OBJECT_CLASS_CHECK(sPAPRPHBClass, (klass), TYPE_SPAPR_PCI_HOST_BRIDGE)
+#define SPAPR_PCI_HOST_BRIDGE_GET_CLASS(obj) \
+     OBJECT_GET_CLASS(sPAPRPHBClass, (obj), TYPE_SPAPR_PCI_HOST_BRIDGE)
+
+typedef struct sPAPRPHBClass sPAPRPHBClass;
+typedef struct sPAPRPHBState sPAPRPHBState;
+
+struct sPAPRPHBClass {
+    PCIHostBridgeClass parent_class;
+
+    void (*finish_realize)(sPAPRPHBState *sphb, Error **errp);
+};
+
 typedef struct sPAPRPHBState {
     PCIHostState parent_obj;
 
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH 4/9] spapr_pci: spapr_iommu: Make DMA window a subregion
  2014-05-21 14:21 [Qemu-devel] [PATCH 0/9] spapr_pci: Prepare for VFIO Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 3/9] spapr_pci: Introduce a finish_realize() callback Alexey Kardashevskiy
@ 2014-05-21 14:21 ` Alexey Kardashevskiy
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 5/9] spapr_pci: Allow multiple TCE tables per PHB Alexey Kardashevskiy
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

Currently the default DMA window is represented by a single MemoryRegion.
However there can be more than just one window so we need
a "root" memory region to be separated from the actual DMA window(s).

This introduces a "root" IOMMU memory region and adds a subregion for
the default DMA 32bit window. Following patches will add other
subregion(s).

This initializes a default DMA window subregion size to the guest RAM
size as this window can be switched into "bypass" mode which implements
direct DMA mapping.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_iommu.c        |  2 +-
 hw/ppc/spapr_pci.c          | 20 ++++++++++++++++++--
 include/hw/pci-host/spapr.h |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index ab5037c..241ceeb 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -136,7 +136,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
     trace_spapr_iommu_new_table(tcet->liobn, tcet, tcet->table, tcet->fd);
 
     memory_region_init_iommu(&tcet->iommu, OBJECT(dev), &spapr_iommu_ops,
-                             "iommu-spapr", UINT64_MAX);
+                             "iommu-spapr", ram_size);
 
     QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
 
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index b141e83..f1684c2 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -610,6 +610,20 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
                            PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
     phb->bus = bus;
 
+    /*
+     * Initialize PHB address space.
+     * By default there will be at least one subregion for default
+     * 32bit DMA window.
+     * Later the guest might want to create another DMA window
+     * which will become another memory subregion.
+     */
+    sprintf(namebuf, "%s.iommu-root", sphb->dtbusname);
+
+    memory_region_init(&sphb->iommu_root, OBJECT(sphb),
+                       namebuf, UINT64_MAX);
+    address_space_init(&sphb->iommu_as, &sphb->iommu_root,
+                       sphb->dtbusname);
+
     pci_setup_iommu(bus, spapr_pci_dma_iommu, sphb);
 
     pci_bus_set_route_irq_fn(bus, spapr_route_intx_pin_to_irq);
@@ -648,8 +662,10 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
                    sphb->dtbusname);
         return ;
     }
-    address_space_init(&sphb->iommu_as, spapr_tce_get_iommu(sphb->tcet),
-                       sphb->dtbusname);
+
+    /* Register default 32bit DMA window */
+    memory_region_add_subregion(&sphb->iommu_root, 0,
+                                spapr_tce_get_iommu(sphb->tcet));
 }
 
 static void spapr_phb_reset(DeviceState *qdev)
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index ab29281..c98ebdf 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -64,6 +64,7 @@ typedef struct sPAPRPHBState {
     uint64_t dma_window_size;
     sPAPRTCETable *tcet;
     AddressSpace iommu_as;
+    MemoryRegion iommu_root;
 
     struct spapr_pci_lsi {
         uint32_t irq;
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH 5/9] spapr_pci: Allow multiple TCE tables per PHB
  2014-05-21 14:21 [Qemu-devel] [PATCH 0/9] spapr_pci: Prepare for VFIO Alexey Kardashevskiy
                   ` (3 preceding siblings ...)
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 4/9] spapr_pci: spapr_iommu: Make DMA window a subregion Alexey Kardashevskiy
@ 2014-05-21 14:21 ` Alexey Kardashevskiy
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 6/9] spapr_iommu: Convert old qdev_init_nofail() to object_property_set_bool Alexey Kardashevskiy
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

At the moment sPAPRPHBState contains a @tcet pointer to the only
TCE table. However sPAPR spec allows having more than one DMA window.

Since the TCE object is already a child of SPAPR PHB object, there is
no need to keep an additional pointer to it in sPAPRPHBState so remove it.

This changes the way sPAPRPHBState::reset performs reset of sPAPRTCETable
objects.

This changes the default DMA window properties calculation.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

The only reason for sPAPRPHBState to keep a pointer to sPAPRTCETable is
to have a direct link to calculate default 32bit window properties.
So I decided to replace the link with a spapr_phb_children_dt() loop
and use first TCE table for default window. Is that ok or ugly?
---
 hw/ppc/spapr_pci.c          | 54 ++++++++++++++++++++++++++++++++++++---------
 include/hw/pci-host/spapr.h |  1 -
 2 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index f1684c2..aa29116 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -653,11 +653,13 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
 
 static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
 {
+    sPAPRTCETable *tcet;
+
     sphb->dma_window_start = 0;
     sphb->dma_window_size = 0x40000000;
-    sphb->tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
-                                     sphb->dma_window_size);
-    if (!sphb->tcet) {
+    tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
+                               sphb->dma_window_size);
+    if (!tcet) {
         error_setg(errp, "Unable to create TCE table for %s",
                    sphb->dtbusname);
         return ;
@@ -665,16 +667,24 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
 
     /* Register default 32bit DMA window */
     memory_region_add_subregion(&sphb->iommu_root, 0,
-                                spapr_tce_get_iommu(sphb->tcet));
+                                spapr_tce_get_iommu(tcet));
+}
+
+static int spapr_phb_children_reset(Object *child, void *opaque)
+{
+    DeviceState *dev = (DeviceState *) object_dynamic_cast(child, TYPE_DEVICE);
+
+    if (dev) {
+        device_reset(dev);
+    }
+
+    return 0;
 }
 
 static void spapr_phb_reset(DeviceState *qdev)
 {
-    SysBusDevice *s = SYS_BUS_DEVICE(qdev);
-    sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
-
     /* Reset the IOMMU state */
-    device_reset(DEVICE(sphb->tcet));
+    object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
 }
 
 static Property spapr_phb_properties[] = {
@@ -791,6 +801,29 @@ PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
 #define b_fff(x)        b_x((x), 8, 3)  /* function number */
 #define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
 
+typedef struct sPAPRTCEDT {
+    void *fdt;
+    int node_off;
+} sPAPRTCEDT;
+
+static int spapr_phb_children_dt(Object *child, void *opaque)
+{
+    sPAPRTCEDT *p = opaque;
+    sPAPRTCETable *tcet;
+
+    tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
+    if (!tcet) {
+        return 0;
+    }
+
+    spapr_dma_dt(p->fdt, p->node_off, "ibm,dma-window",
+                 tcet->liobn, 0,
+                 tcet->window_size);
+    /* Stop after the first window */
+
+    return 1;
+}
+
 int spapr_populate_pci_dt(sPAPRPHBState *phb,
                           uint32_t xics_phandle,
                           void *fdt)
@@ -870,9 +903,8 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
     _FDT(fdt_setprop(fdt, bus_off, "interrupt-map", &interrupt_map,
                      sizeof(interrupt_map)));
 
-    spapr_dma_dt(fdt, bus_off, "ibm,dma-window",
-                 phb->dma_liobn, phb->dma_window_start,
-                 phb->dma_window_size);
+    object_child_foreach(OBJECT(phb), spapr_phb_children_dt,
+                         &((sPAPRTCEDT){ .fdt = fdt, .node_off = bus_off }));
 
     return 0;
 }
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index c98ebdf..5ea4745 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -62,7 +62,6 @@ typedef struct sPAPRPHBState {
     uint32_t dma_liobn;
     uint64_t dma_window_start;
     uint64_t dma_window_size;
-    sPAPRTCETable *tcet;
     AddressSpace iommu_as;
     MemoryRegion iommu_root;
 
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH 6/9] spapr_iommu: Convert old qdev_init_nofail() to object_property_set_bool
  2014-05-21 14:21 [Qemu-devel] [PATCH 0/9] spapr_pci: Prepare for VFIO Alexey Kardashevskiy
                   ` (4 preceding siblings ...)
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 5/9] spapr_pci: Allow multiple TCE tables per PHB Alexey Kardashevskiy
@ 2014-05-21 14:21 ` Alexey Kardashevskiy
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 7/9] spapr_iommu: Get rid of window_size in sPAPRTCETable Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

qdev_init_nofail() was replaced by object_property_set_bool("realized")
all over the QEMU so do we.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 241ceeb..0dd6509 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -163,7 +163,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn, size_t wi
 
     object_property_add_child(OBJECT(owner), "tce-table", OBJECT(tcet), NULL);
 
-    qdev_init_nofail(DEVICE(tcet));
+    object_property_set_bool(OBJECT(tcet), true, "realized", NULL);
 
     return tcet;
 }
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH 7/9] spapr_iommu: Get rid of window_size in sPAPRTCETable
  2014-05-21 14:21 [Qemu-devel] [PATCH 0/9] spapr_pci: Prepare for VFIO Alexey Kardashevskiy
                   ` (5 preceding siblings ...)
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 6/9] spapr_iommu: Convert old qdev_init_nofail() to object_property_set_bool Alexey Kardashevskiy
@ 2014-05-21 14:21 ` Alexey Kardashevskiy
  2014-05-21 22:05   ` Alexander Graf
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift " Alexey Kardashevskiy
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 9/9] spapr_iommu: Introduce bus_offset " Alexey Kardashevskiy
  8 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

This removes window_size as it is basically a copy of nb_table
shifted by SPAPR_TCE_PAGE_SHIFT. As new dynamic DMA windows are
going to support windows as big as the entire RAM and this number
will be bigger that 32 capacity, we will have to do something
about @window_size anyway and removal seems to be the right way to go.

This removes dma_window_start/dma_window_size from sPAPRPHBState as
they are no longer used.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_iommu.c        | 41 +++++++++++++++--------------------------
 hw/ppc/spapr_pci.c          |  6 ++----
 hw/ppc/spapr_vio.c          |  4 +++-
 include/hw/pci-host/spapr.h |  2 --
 include/hw/ppc/spapr.h      |  3 +--
 target-ppc/kvm.c            |  4 ++--
 target-ppc/kvm_ppc.h        |  2 +-
 7 files changed, 24 insertions(+), 38 deletions(-)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 0dd6509..90de3e3 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -70,7 +70,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
 
     if (tcet->bypass) {
         ret.perm = IOMMU_RW;
-    } else if (addr < tcet->window_size) {
+    } else if ((addr >> SPAPR_TCE_PAGE_SHIFT) < tcet->nb_table) {
         /* Check if we are in bound */
         tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT];
         ret.iova = addr & ~SPAPR_TCE_PAGE_MASK;
@@ -84,25 +84,15 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
     return ret;
 }
 
-static int spapr_tce_table_pre_load(void *opaque)
-{
-    sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
-
-    tcet->nb_table = tcet->window_size >> SPAPR_TCE_PAGE_SHIFT;
-
-    return 0;
-}
-
 static const VMStateDescription vmstate_spapr_tce_table = {
     .name = "spapr_iommu",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
     .minimum_version_id_old = 1,
-    .pre_load = spapr_tce_table_pre_load,
     .fields      = (VMStateField []) {
         /* Sanity check */
         VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
-        VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
+        VMSTATE_UINT32_EQUAL(nb_table, sPAPRTCETable),
 
         /* IOMMU state */
         VMSTATE_BOOL(bypass, sPAPRTCETable),
@@ -122,16 +112,15 @@ static int spapr_tce_table_realize(DeviceState *dev)
 
     if (kvm_enabled()) {
         tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
-                                              tcet->window_size,
+                                              tcet->nb_table <<
+                                              SPAPR_TCE_PAGE_SHIFT,
                                               &tcet->fd);
     }
 
     if (!tcet->table) {
-        size_t table_size = (tcet->window_size >> SPAPR_TCE_PAGE_SHIFT)
-            * sizeof(uint64_t);
+        size_t table_size = tcet->nb_table * sizeof(uint64_t);
         tcet->table = g_malloc0(table_size);
     }
-    tcet->nb_table = tcet->window_size >> SPAPR_TCE_PAGE_SHIFT;
 
     trace_spapr_iommu_new_table(tcet->liobn, tcet, tcet->table, tcet->fd);
 
@@ -143,7 +132,8 @@ static int spapr_tce_table_realize(DeviceState *dev)
     return 0;
 }
 
-sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn, size_t window_size)
+sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
+                                   uint32_t nb_table)
 {
     sPAPRTCETable *tcet;
 
@@ -153,13 +143,13 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn, size_t wi
         return NULL;
     }
 
-    if (!window_size) {
+    if (!nb_table) {
         return NULL;
     }
 
     tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
     tcet->liobn = liobn;
-    tcet->window_size = window_size;
+    tcet->nb_table = nb_table;
 
     object_property_add_child(OBJECT(owner), "tce-table", OBJECT(tcet), NULL);
 
@@ -176,7 +166,7 @@ static void spapr_tce_table_finalize(Object *obj)
 
     if (!kvm_enabled() ||
         (kvmppc_remove_spapr_tce(tcet->table, tcet->fd,
-                                 tcet->window_size) != 0)) {
+                                 tcet->nb_table) != 0)) {
         g_free(tcet->table);
     }
 }
@@ -194,8 +184,7 @@ void spapr_tce_set_bypass(sPAPRTCETable *tcet, bool bypass)
 static void spapr_tce_reset(DeviceState *dev)
 {
     sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
-    size_t table_size = (tcet->window_size >> SPAPR_TCE_PAGE_SHIFT)
-        * sizeof(uint64_t);
+    size_t table_size = tcet->nb_table * sizeof(uint64_t);
 
     tcet->bypass = false;
     memset(tcet->table, 0, table_size);
@@ -206,7 +195,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
 {
     IOMMUTLBEntry entry;
 
-    if (ioba >= tcet->window_size) {
+    if ((ioba >> SPAPR_TCE_PAGE_SHIFT) >= tcet->nb_table) {
         hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
                       TARGET_FMT_lx "\n", ioba);
         return H_PARAMETER;
@@ -322,7 +311,7 @@ static target_ulong h_put_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 static target_ulong get_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
                                 target_ulong *tce)
 {
-    if (ioba >= tcet->window_size) {
+    if ((ioba >> SPAPR_TCE_PAGE_SHIFT) >= tcet->nb_table) {
         hcall_dprintf("spapr_iommu_get_tce on out-of-bounds IOBA 0x"
                       TARGET_FMT_lx "\n", ioba);
         return H_PARAMETER;
@@ -393,7 +382,7 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
     }
 
     return spapr_dma_dt(fdt, node_off, propname,
-                        tcet->liobn, 0, tcet->window_size);
+                        tcet->liobn, 0, tcet->nb_table << SPAPR_TCE_PAGE_SHIFT);
 }
 
 static void spapr_tce_table_class_init(ObjectClass *klass, void *data)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index aa29116..fdd4c07 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -655,10 +655,8 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
 {
     sPAPRTCETable *tcet;
 
-    sphb->dma_window_start = 0;
-    sphb->dma_window_size = 0x40000000;
     tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
-                               sphb->dma_window_size);
+                               0x40000000 >> SPAPR_TCE_PAGE_SHIFT);
     if (!tcet) {
         error_setg(errp, "Unable to create TCE table for %s",
                    sphb->dtbusname);
@@ -818,7 +816,7 @@ static int spapr_phb_children_dt(Object *child, void *opaque)
 
     spapr_dma_dt(p->fdt, p->node_off, "ibm,dma-window",
                  tcet->liobn, 0,
-                 tcet->window_size);
+                 tcet->nb_table << SPAPR_TCE_PAGE_SHIFT);
     /* Stop after the first window */
 
     return 1;
diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
index 2ae06a3..b84e481 100644
--- a/hw/ppc/spapr_vio.c
+++ b/hw/ppc/spapr_vio.c
@@ -456,7 +456,9 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
 
     if (pc->rtce_window_size) {
         uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
-        dev->tcet = spapr_tce_new_table(qdev, liobn, pc->rtce_window_size);
+        dev->tcet = spapr_tce_new_table(qdev, liobn,
+                                        pc->rtce_window_size >>
+                                        SPAPR_TCE_PAGE_SHIFT);
         address_space_init(&dev->as, spapr_tce_get_iommu(dev->tcet), qdev->id);
     }
 
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 5ea4745..07ba7cf 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -60,8 +60,6 @@ typedef struct sPAPRPHBState {
     MemoryRegion memwindow, iowindow;
 
     uint32_t dma_liobn;
-    uint64_t dma_window_start;
-    uint64_t dma_window_size;
     AddressSpace iommu_as;
     MemoryRegion iommu_root;
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9f8bb89..5f7791d 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -391,7 +391,6 @@ typedef struct sPAPRTCETable sPAPRTCETable;
 struct sPAPRTCETable {
     DeviceState parent;
     uint32_t liobn;
-    uint32_t window_size;
     uint32_t nb_table;
     uint64_t *table;
     bool bypass;
@@ -403,7 +402,7 @@ struct sPAPRTCETable {
 void spapr_events_init(sPAPREnvironment *spapr);
 void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
-                                   size_t window_size);
+                                   uint32_t nb_table);
 MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
 void spapr_tce_set_bypass(sPAPRTCETable *tcet, bool bypass);
 int spapr_dma_dt(void *fdt, int node_off, const char *propname,
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index e1fb29b..f42d73b 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1655,7 +1655,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd)
     return table;
 }
 
-int kvmppc_remove_spapr_tce(void *table, int fd, uint32_t window_size)
+int kvmppc_remove_spapr_tce(void *table, int fd, uint32_t nb_table)
 {
     long len;
 
@@ -1663,7 +1663,7 @@ int kvmppc_remove_spapr_tce(void *table, int fd, uint32_t window_size)
         return -1;
     }
 
-    len = (window_size / SPAPR_TCE_PAGE_SIZE)*sizeof(uint64_t);
+    len = nb_table * sizeof(uint64_t);
     if ((munmap(table, len) < 0) ||
         (close(fd) < 0)) {
         fprintf(stderr, "KVM: Unexpected error removing TCE table: %s",
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 1756ad2..f7a32eb 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -144,7 +144,7 @@ static inline void *kvmppc_create_spapr_tce(uint32_t liobn,
 }
 
 static inline int kvmppc_remove_spapr_tce(void *table, int pfd,
-                                          uint32_t window_size)
+                                          uint32_t nb_table)
 {
     return -1;
 }
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift in sPAPRTCETable
  2014-05-21 14:21 [Qemu-devel] [PATCH 0/9] spapr_pci: Prepare for VFIO Alexey Kardashevskiy
                   ` (6 preceding siblings ...)
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 7/9] spapr_iommu: Get rid of window_size in sPAPRTCETable Alexey Kardashevskiy
@ 2014-05-21 14:21 ` Alexey Kardashevskiy
  2014-05-21 22:11   ` Alexander Graf
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 9/9] spapr_iommu: Introduce bus_offset " Alexey Kardashevskiy
  8 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
spec allows other page sizes and we are going to implement them, we need
page size to be configrable.

This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT
with it whereever it is possible.

This removes SPAPR_TCE_PAGE_MASK as it is no longer used.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_iommu.c   | 54 +++++++++++++++++++++++++++++---------------------
 hw/ppc/spapr_pci.c     |  1 +
 hw/ppc/spapr_vio.c     |  1 +
 include/hw/ppc/spapr.h |  3 ++-
 4 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 90de3e3..e765a6d 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -70,12 +70,13 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
 
     if (tcet->bypass) {
         ret.perm = IOMMU_RW;
-    } else if ((addr >> SPAPR_TCE_PAGE_SHIFT) < tcet->nb_table) {
+    } else if ((addr >> tcet->page_shift) < tcet->nb_table) {
         /* Check if we are in bound */
-        tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT];
-        ret.iova = addr & ~SPAPR_TCE_PAGE_MASK;
-        ret.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
-        ret.addr_mask = SPAPR_TCE_PAGE_MASK;
+        target_ulong mask = ~((1 << tcet->page_shift) - 1);
+        tce = tcet->table[addr >> tcet->page_shift];
+        ret.iova = addr & mask;
+        ret.translated_addr = tce & mask;
+        ret.addr_mask = ~mask;
         ret.perm = tce;
     }
     trace_spapr_iommu_xlate(tcet->liobn, addr, ret.iova, ret.perm,
@@ -113,7 +114,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
     if (kvm_enabled()) {
         tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
                                               tcet->nb_table <<
-                                              SPAPR_TCE_PAGE_SHIFT,
+                                              tcet->page_shift,
                                               &tcet->fd);
     }
 
@@ -133,6 +134,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
 }
 
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
+                                   uint32_t page_shift,
                                    uint32_t nb_table)
 {
     sPAPRTCETable *tcet;
@@ -149,6 +151,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
 
     tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
     tcet->liobn = liobn;
+    tcet->page_shift = page_shift;
     tcet->nb_table = nb_table;
 
     object_property_add_child(OBJECT(owner), "tce-table", OBJECT(tcet), NULL);
@@ -194,19 +197,20 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
                                 target_ulong tce)
 {
     IOMMUTLBEntry entry;
+    target_ulong mask = ~((1 << tcet->page_shift) - 1);
 
-    if ((ioba >> SPAPR_TCE_PAGE_SHIFT) >= tcet->nb_table) {
+    if ((ioba >> tcet->page_shift) >= tcet->nb_table) {
         hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
                       TARGET_FMT_lx "\n", ioba);
         return H_PARAMETER;
     }
 
-    tcet->table[ioba >> SPAPR_TCE_PAGE_SHIFT] = tce;
+    tcet->table[ioba >> tcet->page_shift] = tce;
 
     entry.target_as = &address_space_memory,
-    entry.iova = ioba & ~SPAPR_TCE_PAGE_MASK;
-    entry.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
-    entry.addr_mask = SPAPR_TCE_PAGE_MASK;
+    entry.iova = ioba & mask;
+    entry.translated_addr = tce & mask;
+    entry.addr_mask = ~mask;
     entry.perm = tce;
     memory_region_notify_iommu(&tcet->iommu, entry);
 
@@ -226,6 +230,7 @@ static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
     target_ulong ret = H_PARAMETER;
     sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
     CPUState *cs = CPU(cpu);
+    target_ulong mask;
 
     if (!tcet) {
         return H_PARAMETER;
@@ -235,12 +240,14 @@ static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
         return H_PARAMETER;
     }
 
-    ioba &= ~SPAPR_TCE_PAGE_MASK;
-    tce_list &= ~SPAPR_TCE_PAGE_MASK;
+    mask = ~((1 << tcet->page_shift) - 1);
+    ioba &= mask;
+
+    for (i = 0; i < npages; ++i, ioba += (1 << tcet->page_shift)) {
+        target_ulong off = (tce_list & ~SPAPR_TCE_RW) +
+                                i * sizeof(target_ulong);
+        target_ulong tce = ldq_phys(cs->as, off);
 
-    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
-        target_ulong tce = ldq_phys(cs->as, tce_list +
-                                    i * sizeof(target_ulong));
         ret = put_tce_emu(tcet, ioba, tce);
         if (ret) {
             break;
@@ -276,9 +283,9 @@ static target_ulong h_stuff_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         return H_PARAMETER;
     }
 
-    ioba &= ~SPAPR_TCE_PAGE_MASK;
+    ioba &= ~((1 << tcet->page_shift) - 1);
 
-    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
+    for (i = 0; i < npages; ++i, ioba += (1 << tcet->page_shift)) {
         ret = put_tce_emu(tcet, ioba, tce_value);
         if (ret) {
             break;
@@ -298,7 +305,7 @@ static target_ulong h_put_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     target_ulong ret = H_PARAMETER;
     sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
 
-    ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
+    ioba &= ~((1 << tcet->page_shift) - 1);
 
     if (tcet) {
         ret = put_tce_emu(tcet, ioba, tce);
@@ -311,13 +318,13 @@ static target_ulong h_put_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 static target_ulong get_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
                                 target_ulong *tce)
 {
-    if ((ioba >> SPAPR_TCE_PAGE_SHIFT) >= tcet->nb_table) {
+    if ((ioba >> tcet->page_shift) >= tcet->nb_table) {
         hcall_dprintf("spapr_iommu_get_tce on out-of-bounds IOBA 0x"
                       TARGET_FMT_lx "\n", ioba);
         return H_PARAMETER;
     }
 
-    *tce = tcet->table[ioba >> SPAPR_TCE_PAGE_SHIFT];
+    *tce = tcet->table[ioba >> tcet->page_shift];
 
     return H_SUCCESS;
 }
@@ -330,8 +337,9 @@ static target_ulong h_get_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     target_ulong tce = 0;
     target_ulong ret = H_PARAMETER;
     sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
+    const target_ulong mask = ~((1 << tcet->page_shift) - 1);
 
-    ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
+    ioba &= mask;
 
     if (tcet) {
         ret = get_tce_emu(tcet, ioba, &tce);
@@ -382,7 +390,7 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
     }
 
     return spapr_dma_dt(fdt, node_off, propname,
-                        tcet->liobn, 0, tcet->nb_table << SPAPR_TCE_PAGE_SHIFT);
+                        tcet->liobn, 0, tcet->nb_table << tcet->page_shift);
 }
 
 static void spapr_tce_table_class_init(ObjectClass *klass, void *data)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index fdd4c07..c9850d4 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -656,6 +656,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
     sPAPRTCETable *tcet;
 
     tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
+                               SPAPR_TCE_PAGE_SHIFT,
                                0x40000000 >> SPAPR_TCE_PAGE_SHIFT);
     if (!tcet) {
         error_setg(errp, "Unable to create TCE table for %s",
diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
index b84e481..d7e9e6a 100644
--- a/hw/ppc/spapr_vio.c
+++ b/hw/ppc/spapr_vio.c
@@ -457,6 +457,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
     if (pc->rtce_window_size) {
         uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
         dev->tcet = spapr_tce_new_table(qdev, liobn,
+                                        SPAPR_TCE_PAGE_SHIFT,
                                         pc->rtce_window_size >>
                                         SPAPR_TCE_PAGE_SHIFT);
         address_space_init(&dev->as, spapr_tce_get_iommu(dev->tcet), qdev->id);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5f7791d..2e88e52 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -375,7 +375,6 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
 
 #define SPAPR_TCE_PAGE_SHIFT   12
 #define SPAPR_TCE_PAGE_SIZE    (1ULL << SPAPR_TCE_PAGE_SHIFT)
-#define SPAPR_TCE_PAGE_MASK    (SPAPR_TCE_PAGE_SIZE - 1)
 
 #define SPAPR_VIO_BASE_LIOBN    0x00000000
 #define SPAPR_PCI_BASE_LIOBN    0x80000000
@@ -392,6 +391,7 @@ struct sPAPRTCETable {
     DeviceState parent;
     uint32_t liobn;
     uint32_t nb_table;
+    uint32_t page_shift;
     uint64_t *table;
     bool bypass;
     int fd;
@@ -402,6 +402,7 @@ struct sPAPRTCETable {
 void spapr_events_init(sPAPREnvironment *spapr);
 void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
+                                   uint32_t page_shift,
                                    uint32_t nb_table);
 MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
 void spapr_tce_set_bypass(sPAPRTCETable *tcet, bool bypass);
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH 9/9] spapr_iommu: Introduce bus_offset in sPAPRTCETable
  2014-05-21 14:21 [Qemu-devel] [PATCH 0/9] spapr_pci: Prepare for VFIO Alexey Kardashevskiy
                   ` (7 preceding siblings ...)
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift " Alexey Kardashevskiy
@ 2014-05-21 14:21 ` Alexey Kardashevskiy
  8 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

This adds @bus_offset into sPAPRTCETable to tell where TCE table starts
from. It is set to 0 for emulated devices. Dynamic DMA windows will use
other offset.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_iommu.c   | 13 +++++++++----
 hw/ppc/spapr_pci.c     |  5 +++--
 hw/ppc/spapr_vio.c     |  1 +
 include/hw/ppc/spapr.h |  2 ++
 4 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index e765a6d..d93a9d0 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -134,6 +134,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
 }
 
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
+                                   uint64_t bus_offset,
                                    uint32_t page_shift,
                                    uint32_t nb_table)
 {
@@ -151,6 +152,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
 
     tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
     tcet->liobn = liobn;
+    tcet->bus_offset = bus_offset;
     tcet->page_shift = page_shift;
     tcet->nb_table = nb_table;
 
@@ -198,14 +200,15 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
 {
     IOMMUTLBEntry entry;
     target_ulong mask = ~((1 << tcet->page_shift) - 1);
+    unsigned long index = (ioba - tcet->bus_offset) >> tcet->page_shift;
 
-    if ((ioba >> tcet->page_shift) >= tcet->nb_table) {
+    if (index >= tcet->nb_table) {
         hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
                       TARGET_FMT_lx "\n", ioba);
         return H_PARAMETER;
     }
 
-    tcet->table[ioba >> tcet->page_shift] = tce;
+    tcet->table[index] = tce;
 
     entry.target_as = &address_space_memory,
     entry.iova = ioba & mask;
@@ -318,13 +321,15 @@ static target_ulong h_put_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 static target_ulong get_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
                                 target_ulong *tce)
 {
-    if ((ioba >> tcet->page_shift) >= tcet->nb_table) {
+    unsigned long index = (ioba - tcet->bus_offset) >> tcet->page_shift;
+
+    if (index >= tcet->nb_table) {
         hcall_dprintf("spapr_iommu_get_tce on out-of-bounds IOBA 0x"
                       TARGET_FMT_lx "\n", ioba);
         return H_PARAMETER;
     }
 
-    *tce = tcet->table[ioba >> tcet->page_shift];
+    *tce = tcet->table[index];
 
     return H_SUCCESS;
 }
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index c9850d4..ddfd8bb 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -656,6 +656,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
     sPAPRTCETable *tcet;
 
     tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
+                               0,
                                SPAPR_TCE_PAGE_SHIFT,
                                0x40000000 >> SPAPR_TCE_PAGE_SHIFT);
     if (!tcet) {
@@ -816,8 +817,8 @@ static int spapr_phb_children_dt(Object *child, void *opaque)
     }
 
     spapr_dma_dt(p->fdt, p->node_off, "ibm,dma-window",
-                 tcet->liobn, 0,
-                 tcet->nb_table << SPAPR_TCE_PAGE_SHIFT);
+                 tcet->liobn, tcet->bus_offset,
+                 tcet->nb_table << tcet->page_shift);
     /* Stop after the first window */
 
     return 1;
diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
index d7e9e6a..48b0125 100644
--- a/hw/ppc/spapr_vio.c
+++ b/hw/ppc/spapr_vio.c
@@ -457,6 +457,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
     if (pc->rtce_window_size) {
         uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
         dev->tcet = spapr_tce_new_table(qdev, liobn,
+                                        0,
                                         SPAPR_TCE_PAGE_SHIFT,
                                         pc->rtce_window_size >>
                                         SPAPR_TCE_PAGE_SHIFT);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 2e88e52..c848c50 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -391,6 +391,7 @@ struct sPAPRTCETable {
     DeviceState parent;
     uint32_t liobn;
     uint32_t nb_table;
+    uint64_t bus_offset;
     uint32_t page_shift;
     uint64_t *table;
     bool bypass;
@@ -402,6 +403,7 @@ struct sPAPRTCETable {
 void spapr_events_init(sPAPREnvironment *spapr);
 void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
+                                   uint64_t bus_offset,
                                    uint32_t page_shift,
                                    uint32_t nb_table);
 MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] spapr: Enable dynamic change of the supported hypercalls list
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 1/9] spapr: Enable dynamic change of the supported hypercalls list Alexey Kardashevskiy
@ 2014-05-21 14:26   ` Alexander Graf
  2014-05-21 15:21     ` [Qemu-devel] [PATCH v2] " Alexey Kardashevskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Graf @ 2014-05-21 14:26 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 21.05.14 16:21, Alexey Kardashevskiy wrote:
> At the moment the "ibm,hypertas-functions" list is fixed. However some
> calls should be listed there if they are supported by QEMU or the host
> kernel.
>
> This enables hyperrtas_prop to grow on stack by adding
> a SPAPR_HYPERRTAS_ADD macro.
>
> The first user of this is going to be a "multi-tce" property.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>   hw/ppc/spapr.c | 22 +++++++++++++++++++---
>   1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0a61246..e174e04 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -306,8 +306,6 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>       CPUState *cs;
>       uint32_t start_prop = cpu_to_be32(initrd_base);
>       uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
> -    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
> -        "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk\0hcall-set-mode";
>       char qemu_hypertas_prop[] = "hcall-memop1";
>       uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)};
>       uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
> @@ -316,6 +314,24 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>       QemuOpts *opts = qemu_opts_find(qemu_find_opts("smp-opts"), NULL);
>       unsigned sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
>       uint32_t cpus_per_socket = sockets ? (smp_cpus / sockets) : 1;
> +    char *hypertas_prop = NULL;
> +    int hypertas_prop_len = 0;
> +
> +#define SPAPR_HYPERRTAS_ADD(prop)                                   \
> +    do {                                                            \
> +        const char proptmp[] = prop;                                \
> +        char *httmp = alloca(hypertas_prop_len + sizeof(proptmp));  \
> +        if (hypertas_prop_len) {                                    \
> +            memcpy(httmp, hypertas_prop, hypertas_prop_len);        \
> +        }                                                           \
> +        memcpy(httmp + hypertas_prop_len, proptmp, sizeof(proptmp));\
> +        hypertas_prop_len += sizeof(proptmp);                       \
> +        hypertas_prop = httmp;                                      \
> +    } while (0)

Please make this an inline function. Also while I appreciate your 
attempt to speed up memory allocation with alloca, I don't think alloca 
is available on Windows hosts, so we can't use it.

Isn't there a gstring type we can use to grow dynamically and maintain 
content and length at the same time?

> +
> +    SPAPR_HYPERRTAS_ADD(
> +        "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
> +        "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk\0hcall-set-mode");

Add these individually please :).


Alex

>   
>       fdt = g_malloc0(FDT_MAX_SIZE);
>       _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
> @@ -485,7 +501,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>       _FDT((fdt_begin_node(fdt, "rtas")));
>   
>       _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
> -                       sizeof(hypertas_prop))));
> +                       hypertas_prop_len)));
>       _FDT((fdt_property(fdt, "qemu,hypertas-functions", qemu_hypertas_prop,
>                          sizeof(qemu_hypertas_prop))));
>   

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] spapr_iommu: Enable multiple TCE requests
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 2/9] spapr_iommu: Enable multiple TCE requests Alexey Kardashevskiy
@ 2014-05-21 14:37   ` Alexander Graf
  2014-05-21 15:23     ` [Qemu-devel] [PATCH v2] " Alexey Kardashevskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Graf @ 2014-05-21 14:37 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 21.05.14 16:21, Alexey Kardashevskiy wrote:
> Currently only single TCE entry per request is supported (H_PUT_TCE).
> However PAPR+ specification allows multiple entry requests such as
> H_PUT_TCE_INDIRECT and H_STUFF_TCE. Having less transitions to the host
> kernel via ioctls, support of these calls can accelerate IOMMU operations.
>
> This implements H_STUFF_TCE and H_PUT_TCE_INDIRECT.
>
> This advertises "multi-tce" capability to the guest if the host kernel
> supports it (KVM_CAP_SPAPR_MULTITCE) or guest is running in TCG mode.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v1:
> * removed checks for liobn as the check is performed already in
> spapr_tce_find_by_liobn
> * added hcall-multi-tce if the host kernel supports the capability
> ---
>   hw/ppc/spapr.c       |  3 ++
>   hw/ppc/spapr_iommu.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   target-ppc/kvm.c     |  7 +++++
>   target-ppc/kvm_ppc.h |  7 +++++
>   trace-events         |  2 ++
>   5 files changed, 97 insertions(+)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e174e04..66929cb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -500,6 +500,9 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>       /* RTAS */
>       _FDT((fdt_begin_node(fdt, "rtas")));
>   
> +    if (kvmppc_spapr_use_multitce()) {

Sorry I didn't realize this earlier. I think it's more obvious to the 
reader if the "enabled for TCG" logic is not hidden in some other function:

   if (!kvm_enabled() || kvmppc_supports_multitce()) {

> +        SPAPR_HYPERRTAS_ADD("hcall-multi-tce");
> +    }
>       _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
>                          hypertas_prop_len)));
>       _FDT((fdt_property(fdt, "qemu,hypertas-functions", qemu_hypertas_prop,
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 72493d8..ab5037c 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -224,6 +224,82 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
>       return H_SUCCESS;
>   }
>   
> +static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
> +                                       sPAPREnvironment *spapr,
> +                                       target_ulong opcode, target_ulong *args)
> +{
> +    int i;
> +    target_ulong liobn = args[0];
> +    target_ulong ioba = args[1];
> +    target_ulong ioba1 = ioba;
> +    target_ulong tce_list = args[2];
> +    target_ulong npages = args[3];
> +    target_ulong ret = H_PARAMETER;
> +    sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
> +    CPUState *cs = CPU(cpu);
> +
> +    if (!tcet) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (npages > 512) {
> +        return H_PARAMETER;
> +    }
> +
> +    ioba &= ~SPAPR_TCE_PAGE_MASK;
> +    tce_list &= ~SPAPR_TCE_PAGE_MASK;
> +
> +    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
> +        target_ulong tce = ldq_phys(cs->as, tce_list +

Is this one of those cases where the guest may expect us to mask the 
upper bits again? It's not an rtas call after all. What does sPAPR say?


Alex

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v2] spapr: Enable dynamic change of the supported hypercalls list
  2014-05-21 14:26   ` Alexander Graf
@ 2014-05-21 15:21     ` Alexey Kardashevskiy
  2014-05-22 10:47       ` Alexander Graf
  0 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 15:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

At the moment the "ibm,hypertas-functions" list is fixed. However some
calls should be listed there if they are supported by QEMU or the host
kernel.

This enables hyperrtas_prop to grow on stack by adding
a SPAPR_HYPERRTAS_ADD macro. "qemu,hypertas-functions" is converted as well.

The first user of this is going to be a "multi-tce" property.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* replaced alloca() with GString
---
 hw/ppc/spapr.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0a61246..3b28211 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -293,6 +293,10 @@ static size_t create_page_sizes_prop(CPUPPCState *env, uint32_t *prop,
         }                                                          \
     } while (0)
 
+static inline void add_str(GString *s, const gchar *s1)
+{
+    g_string_append_len(s, s1, strlen(s1) + 1);
+}
 
 static void *spapr_create_fdt_skel(hwaddr initrd_base,
                                    hwaddr initrd_size,
@@ -306,9 +310,8 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     CPUState *cs;
     uint32_t start_prop = cpu_to_be32(initrd_base);
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
-    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
-        "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk\0hcall-set-mode";
-    char qemu_hypertas_prop[] = "hcall-memop1";
+    GString *hypertas = g_string_sized_new(256);
+    GString *qemu_hypertas = g_string_sized_new(256);
     uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)};
     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
     int i, smt = kvmppc_smt_threads();
@@ -317,6 +320,17 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     unsigned sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
     uint32_t cpus_per_socket = sockets ? (smp_cpus / sockets) : 1;
 
+    add_str(hypertas, "hcall-pft");
+    add_str(hypertas, "hcall-term");
+    add_str(hypertas, "hcall-dabr");
+    add_str(hypertas, "hcall-interrupt");
+    add_str(hypertas, "hcall-tce");
+    add_str(hypertas, "hcall-vio");
+    add_str(hypertas, "hcall-splpar");
+    add_str(hypertas, "hcall-bulk");
+    add_str(hypertas, "hcall-set-mode");
+    add_str(qemu_hypertas, "hcall-memop1");
+
     fdt = g_malloc0(FDT_MAX_SIZE);
     _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
 
@@ -484,10 +498,12 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     /* RTAS */
     _FDT((fdt_begin_node(fdt, "rtas")));
 
-    _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
-                       sizeof(hypertas_prop))));
-    _FDT((fdt_property(fdt, "qemu,hypertas-functions", qemu_hypertas_prop,
-                       sizeof(qemu_hypertas_prop))));
+    _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas->str,
+                       hypertas->len)));
+    g_string_free(hypertas, TRUE);
+    _FDT((fdt_property(fdt, "qemu,hypertas-functions", qemu_hypertas->str,
+                       qemu_hypertas->len)));
+    g_string_free(qemu_hypertas, TRUE);
 
     _FDT((fdt_property(fdt, "ibm,associativity-reference-points",
         refpoints, sizeof(refpoints))));
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v2] spapr_iommu: Enable multiple TCE requests
  2014-05-21 14:37   ` Alexander Graf
@ 2014-05-21 15:23     ` Alexey Kardashevskiy
  2014-05-21 16:03       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 15:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

Currently only single TCE entry per request is supported (H_PUT_TCE).
However PAPR+ specification allows multiple entry requests such as
H_PUT_TCE_INDIRECT and H_STUFF_TCE. Having less transitions to the host
kernel via ioctls, support of these calls can accelerate IOMMU operations.

This implements H_STUFF_TCE and H_PUT_TCE_INDIRECT.

This advertises "multi-tce" capability to the guest if the host kernel
supports it (KVM_CAP_SPAPR_MULTITCE) or guest is running in TCG mode.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* multi-tce enabled explicitely for TCG, it was implicit
* kvmppc_spapr_use_multitce() does not handle TCG anymore

v1:
* removed checks for liobn as the check is performed already in
spapr_tce_find_by_liobn
* added hcall-multi-tce if the host kernel supports the capability
---
 hw/ppc/spapr.c       |  3 ++
 hw/ppc/spapr_iommu.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 target-ppc/kvm.c     |  7 +++++
 target-ppc/kvm_ppc.h |  6 ++++
 trace-events         |  2 ++
 5 files changed, 96 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3b28211..697fba6 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -498,6 +498,9 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     /* RTAS */
     _FDT((fdt_begin_node(fdt, "rtas")));
 
+    if (!kvm_enabled() || kvmppc_spapr_use_multitce()) {
+        add_str(hypertas, "hcall-multi-tce");
+    }
     _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas->str,
                        hypertas->len)));
     g_string_free(hypertas, TRUE);
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 72493d8..ab5037c 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -224,6 +224,82 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
     return H_SUCCESS;
 }
 
+static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
+                                       sPAPREnvironment *spapr,
+                                       target_ulong opcode, target_ulong *args)
+{
+    int i;
+    target_ulong liobn = args[0];
+    target_ulong ioba = args[1];
+    target_ulong ioba1 = ioba;
+    target_ulong tce_list = args[2];
+    target_ulong npages = args[3];
+    target_ulong ret = H_PARAMETER;
+    sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
+    CPUState *cs = CPU(cpu);
+
+    if (!tcet) {
+        return H_PARAMETER;
+    }
+
+    if (npages > 512) {
+        return H_PARAMETER;
+    }
+
+    ioba &= ~SPAPR_TCE_PAGE_MASK;
+    tce_list &= ~SPAPR_TCE_PAGE_MASK;
+
+    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
+        target_ulong tce = ldq_phys(cs->as, tce_list +
+                                    i * sizeof(target_ulong));
+        ret = put_tce_emu(tcet, ioba, tce);
+        if (ret) {
+            break;
+        }
+    }
+
+    /* Trace last successful or the first problematic entry */
+    i = i ? (i - 1) : 0;
+    trace_spapr_iommu_indirect(liobn, ioba1, tce_list, i,
+                               ldq_phys(cs->as,
+                               tce_list + i * sizeof(target_ulong)),
+                               ret);
+
+    return ret;
+}
+
+static target_ulong h_stuff_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                              target_ulong opcode, target_ulong *args)
+{
+    int i;
+    target_ulong liobn = args[0];
+    target_ulong ioba = args[1];
+    target_ulong tce_value = args[2];
+    target_ulong npages = args[3];
+    target_ulong ret = H_PARAMETER;
+    sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
+
+    if (!tcet) {
+        return H_PARAMETER;
+    }
+
+    if (npages > tcet->nb_table) {
+        return H_PARAMETER;
+    }
+
+    ioba &= ~SPAPR_TCE_PAGE_MASK;
+
+    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
+        ret = put_tce_emu(tcet, ioba, tce_value);
+        if (ret) {
+            break;
+        }
+    }
+    trace_spapr_iommu_stuff(liobn, ioba, tce_value, npages, ret);
+
+    return ret;
+}
+
 static target_ulong h_put_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                               target_ulong opcode, target_ulong *args)
 {
@@ -332,6 +408,8 @@ static void spapr_tce_table_class_init(ObjectClass *klass, void *data)
     /* hcall-tce */
     spapr_register_hypercall(H_PUT_TCE, h_put_tce);
     spapr_register_hypercall(H_GET_TCE, h_get_tce);
+    spapr_register_hypercall(H_PUT_TCE_INDIRECT, h_put_tce_indirect);
+    spapr_register_hypercall(H_STUFF_TCE, h_stuff_tce);
 }
 
 static TypeInfo spapr_tce_table_info = {
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index ca31027..bcf2db8 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -62,6 +62,7 @@ static int cap_booke_sregs;
 static int cap_ppc_smt;
 static int cap_ppc_rma;
 static int cap_spapr_tce;
+static int cap_spapr_multitce;
 static int cap_hior;
 static int cap_one_reg;
 static int cap_epr;
@@ -98,6 +99,7 @@ int kvm_arch_init(KVMState *s)
     cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
     cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
     cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
+    cap_spapr_multitce = kvm_check_extension(s, KVM_CAP_SPAPR_MULTITCE);
     cap_one_reg = kvm_check_extension(s, KVM_CAP_ONE_REG);
     cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
     cap_epr = kvm_check_extension(s, KVM_CAP_PPC_EPR);
@@ -1608,6 +1610,11 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
 }
 #endif
 
+bool kvmppc_spapr_use_multitce(void)
+{
+    return cap_spapr_multitce;
+}
+
 void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd)
 {
     struct kvm_create_spapr_tce args = {
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index ff077ec..b90d31b 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -31,6 +31,7 @@ int kvmppc_set_tcr(PowerPCCPU *cpu);
 int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu);
 #ifndef CONFIG_USER_ONLY
 off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem);
+bool kvmppc_spapr_use_multitce(void);
 void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd);
 int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
 int kvmppc_reset_htab(int shift_hint);
@@ -130,6 +131,11 @@ static inline off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem)
     return 0;
 }
 
+static inline bool kvmppc_spapr_use_multitce(void)
+{
+    return false;
+}
+
 static inline void *kvmppc_create_spapr_tce(uint32_t liobn,
                                             uint32_t window_size, int *fd)
 {
diff --git a/trace-events b/trace-events
index 5997846..89719c7 100644
--- a/trace-events
+++ b/trace-events
@@ -1186,6 +1186,8 @@ xics_ics_eoi(int nr) "ics_eoi: irq %#x"
 # hw/ppc/spapr_iommu.c
 spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64
 spapr_iommu_get(uint64_t liobn, uint64_t ioba, uint64_t ret, uint64_t tce) "liobn=%"PRIx64" ioba=0x%"PRIx64" ret=%"PRId64" tce=0x%"PRIx64
+spapr_iommu_indirect(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t iobaN, uint64_t tceN, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tcelist=0x%"PRIx64" iobaN=0x%"PRIx64" tceN=0x%"PRIx64" ret=%"PRId64
+spapr_iommu_stuff(uint64_t liobn, uint64_t ioba, uint64_t tce_value, uint64_t npages, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tcevalue=0x%"PRIx64" npages=%"PRId64" ret=%"PRId64
 spapr_iommu_xlate(uint64_t liobn, uint64_t ioba, uint64_t tce, unsigned perm, unsigned pgsize) "liobn=%"PRIx64" 0x%"PRIx64" -> 0x%"PRIx64" perm=%u mask=%x"
 spapr_iommu_new_table(uint64_t liobn, void *tcet, void *table, int fd) "liobn=%"PRIx64" tcet=%p table=%p fd=%d"
 
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2] spapr_iommu: Enable multiple TCE requests
  2014-05-21 15:23     ` [Qemu-devel] [PATCH v2] " Alexey Kardashevskiy
@ 2014-05-21 16:03       ` Alexey Kardashevskiy
  2014-05-21 21:54         ` Alexander Graf
  0 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 16:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, Alexander Graf

On 05/22/2014 01:23 AM, Alexey Kardashevskiy wrote:
> Currently only single TCE entry per request is supported (H_PUT_TCE).
> However PAPR+ specification allows multiple entry requests such as
> H_PUT_TCE_INDIRECT and H_STUFF_TCE. Having less transitions to the host
> kernel via ioctls, support of these calls can accelerate IOMMU operations.
> 
> This implements H_STUFF_TCE and H_PUT_TCE_INDIRECT.
> 
> This advertises "multi-tce" capability to the guest if the host kernel
> supports it (KVM_CAP_SPAPR_MULTITCE) or guest is running in TCG mode.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * multi-tce enabled explicitely for TCG, it was implicit
> * kvmppc_spapr_use_multitce() does not handle TCG anymore
> 
> v1:
> * removed checks for liobn as the check is performed already in
> spapr_tce_find_by_liobn
> * added hcall-multi-tce if the host kernel supports the capability
> ---
>  hw/ppc/spapr.c       |  3 ++
>  hw/ppc/spapr_iommu.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  target-ppc/kvm.c     |  7 +++++
>  target-ppc/kvm_ppc.h |  6 ++++
>  trace-events         |  2 ++
>  5 files changed, 96 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 3b28211..697fba6 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -498,6 +498,9 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>      /* RTAS */
>      _FDT((fdt_begin_node(fdt, "rtas")));
>  
> +    if (!kvm_enabled() || kvmppc_spapr_use_multitce()) {
> +        add_str(hypertas, "hcall-multi-tce");
> +    }
>      _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas->str,
>                         hypertas->len)));
>      g_string_free(hypertas, TRUE);
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 72493d8..ab5037c 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -224,6 +224,82 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
>      return H_SUCCESS;
>  }
>  
> +static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
> +                                       sPAPREnvironment *spapr,
> +                                       target_ulong opcode, target_ulong *args)
> +{
> +    int i;
> +    target_ulong liobn = args[0];
> +    target_ulong ioba = args[1];
> +    target_ulong ioba1 = ioba;
> +    target_ulong tce_list = args[2];
> +    target_ulong npages = args[3];
> +    target_ulong ret = H_PARAMETER;
> +    sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
> +    CPUState *cs = CPU(cpu);
> +
> +    if (!tcet) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (npages > 512) {
> +        return H_PARAMETER;
> +    }
> +
> +    ioba &= ~SPAPR_TCE_PAGE_MASK;
> +    tce_list &= ~SPAPR_TCE_PAGE_MASK;
> +
> +    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
> +        target_ulong tce = ldq_phys(cs->as, tce_list +
> +                                    i * sizeof(target_ulong));

Sorry, it is too late here, forgot to comment :)

I cannot use rtas_ld straight away as it is 32bit and here I need 64bit.
Anyway, this is a hypercall and it is called from guest virtual mode so I
do not think that rule with masking top bits for RTAS applies here.

SPAPR says about it:

uint64 TCE, /* The logical address of a page of (4 K long on a 4 K
boundary) of TCE contents to
be stored in the TCE table (contains logical address of storage page to be
mapped)*/

I believe "logical address" in this context is a "guest physical" address.
Does it help? Does not help me :)



> +        ret = put_tce_emu(tcet, ioba, tce);
> +        if (ret) {
> +            break;
> +        }
> +    }
> +
> +    /* Trace last successful or the first problematic entry */
> +    i = i ? (i - 1) : 0;
> +    trace_spapr_iommu_indirect(liobn, ioba1, tce_list, i,
> +                               ldq_phys(cs->as,
> +                               tce_list + i * sizeof(target_ulong)),
> +                               ret);
> +
> +    return ret;
> +}
> +
> +static target_ulong h_stuff_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> +                              target_ulong opcode, target_ulong *args)
> +{
> +    int i;
> +    target_ulong liobn = args[0];
> +    target_ulong ioba = args[1];
> +    target_ulong tce_value = args[2];
> +    target_ulong npages = args[3];
> +    target_ulong ret = H_PARAMETER;
> +    sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
> +
> +    if (!tcet) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (npages > tcet->nb_table) {
> +        return H_PARAMETER;
> +    }
> +
> +    ioba &= ~SPAPR_TCE_PAGE_MASK;
> +
> +    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
> +        ret = put_tce_emu(tcet, ioba, tce_value);
> +        if (ret) {
> +            break;
> +        }
> +    }
> +    trace_spapr_iommu_stuff(liobn, ioba, tce_value, npages, ret);
> +
> +    return ret;
> +}
> +
>  static target_ulong h_put_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>                                target_ulong opcode, target_ulong *args)
>  {
> @@ -332,6 +408,8 @@ static void spapr_tce_table_class_init(ObjectClass *klass, void *data)
>      /* hcall-tce */
>      spapr_register_hypercall(H_PUT_TCE, h_put_tce);
>      spapr_register_hypercall(H_GET_TCE, h_get_tce);
> +    spapr_register_hypercall(H_PUT_TCE_INDIRECT, h_put_tce_indirect);
> +    spapr_register_hypercall(H_STUFF_TCE, h_stuff_tce);
>  }
>  
>  static TypeInfo spapr_tce_table_info = {
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index ca31027..bcf2db8 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -62,6 +62,7 @@ static int cap_booke_sregs;
>  static int cap_ppc_smt;
>  static int cap_ppc_rma;
>  static int cap_spapr_tce;
> +static int cap_spapr_multitce;
>  static int cap_hior;
>  static int cap_one_reg;
>  static int cap_epr;
> @@ -98,6 +99,7 @@ int kvm_arch_init(KVMState *s)
>      cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
>      cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
>      cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
> +    cap_spapr_multitce = kvm_check_extension(s, KVM_CAP_SPAPR_MULTITCE);
>      cap_one_reg = kvm_check_extension(s, KVM_CAP_ONE_REG);
>      cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
>      cap_epr = kvm_check_extension(s, KVM_CAP_PPC_EPR);
> @@ -1608,6 +1610,11 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
>  }
>  #endif
>  
> +bool kvmppc_spapr_use_multitce(void)
> +{
> +    return cap_spapr_multitce;
> +}
> +
>  void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd)
>  {
>      struct kvm_create_spapr_tce args = {
> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
> index ff077ec..b90d31b 100644
> --- a/target-ppc/kvm_ppc.h
> +++ b/target-ppc/kvm_ppc.h
> @@ -31,6 +31,7 @@ int kvmppc_set_tcr(PowerPCCPU *cpu);
>  int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu);
>  #ifndef CONFIG_USER_ONLY
>  off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem);
> +bool kvmppc_spapr_use_multitce(void);
>  void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd);
>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>  int kvmppc_reset_htab(int shift_hint);
> @@ -130,6 +131,11 @@ static inline off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem)
>      return 0;
>  }
>  
> +static inline bool kvmppc_spapr_use_multitce(void)
> +{
> +    return false;
> +}
> +
>  static inline void *kvmppc_create_spapr_tce(uint32_t liobn,
>                                              uint32_t window_size, int *fd)
>  {
> diff --git a/trace-events b/trace-events
> index 5997846..89719c7 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1186,6 +1186,8 @@ xics_ics_eoi(int nr) "ics_eoi: irq %#x"
>  # hw/ppc/spapr_iommu.c
>  spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64
>  spapr_iommu_get(uint64_t liobn, uint64_t ioba, uint64_t ret, uint64_t tce) "liobn=%"PRIx64" ioba=0x%"PRIx64" ret=%"PRId64" tce=0x%"PRIx64
> +spapr_iommu_indirect(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t iobaN, uint64_t tceN, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tcelist=0x%"PRIx64" iobaN=0x%"PRIx64" tceN=0x%"PRIx64" ret=%"PRId64
> +spapr_iommu_stuff(uint64_t liobn, uint64_t ioba, uint64_t tce_value, uint64_t npages, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tcevalue=0x%"PRIx64" npages=%"PRId64" ret=%"PRId64
>  spapr_iommu_xlate(uint64_t liobn, uint64_t ioba, uint64_t tce, unsigned perm, unsigned pgsize) "liobn=%"PRIx64" 0x%"PRIx64" -> 0x%"PRIx64" perm=%u mask=%x"
>  spapr_iommu_new_table(uint64_t liobn, void *tcet, void *table, int fd) "liobn=%"PRIx64" tcet=%p table=%p fd=%d"
>  
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2] spapr_iommu: Enable multiple TCE requests
  2014-05-21 16:03       ` Alexey Kardashevskiy
@ 2014-05-21 21:54         ` Alexander Graf
  0 siblings, 0 replies; 30+ messages in thread
From: Alexander Graf @ 2014-05-21 21:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 21.05.14 18:03, Alexey Kardashevskiy wrote:
> On 05/22/2014 01:23 AM, Alexey Kardashevskiy wrote:
>> Currently only single TCE entry per request is supported (H_PUT_TCE).
>> However PAPR+ specification allows multiple entry requests such as
>> H_PUT_TCE_INDIRECT and H_STUFF_TCE. Having less transitions to the host
>> kernel via ioctls, support of these calls can accelerate IOMMU operations.
>>
>> This implements H_STUFF_TCE and H_PUT_TCE_INDIRECT.
>>
>> This advertises "multi-tce" capability to the guest if the host kernel
>> supports it (KVM_CAP_SPAPR_MULTITCE) or guest is running in TCG mode.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> Changes:
>> v2:
>> * multi-tce enabled explicitely for TCG, it was implicit
>> * kvmppc_spapr_use_multitce() does not handle TCG anymore
>>
>> v1:
>> * removed checks for liobn as the check is performed already in
>> spapr_tce_find_by_liobn
>> * added hcall-multi-tce if the host kernel supports the capability
>> ---
>>   hw/ppc/spapr.c       |  3 ++
>>   hw/ppc/spapr_iommu.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   target-ppc/kvm.c     |  7 +++++
>>   target-ppc/kvm_ppc.h |  6 ++++
>>   trace-events         |  2 ++
>>   5 files changed, 96 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 3b28211..697fba6 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -498,6 +498,9 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>>       /* RTAS */
>>       _FDT((fdt_begin_node(fdt, "rtas")));
>>   
>> +    if (!kvm_enabled() || kvmppc_spapr_use_multitce()) {
>> +        add_str(hypertas, "hcall-multi-tce");
>> +    }
>>       _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas->str,
>>                          hypertas->len)));
>>       g_string_free(hypertas, TRUE);
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index 72493d8..ab5037c 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -224,6 +224,82 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
>>       return H_SUCCESS;
>>   }
>>   
>> +static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
>> +                                       sPAPREnvironment *spapr,
>> +                                       target_ulong opcode, target_ulong *args)
>> +{
>> +    int i;
>> +    target_ulong liobn = args[0];
>> +    target_ulong ioba = args[1];
>> +    target_ulong ioba1 = ioba;
>> +    target_ulong tce_list = args[2];
>> +    target_ulong npages = args[3];
>> +    target_ulong ret = H_PARAMETER;
>> +    sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
>> +    CPUState *cs = CPU(cpu);
>> +
>> +    if (!tcet) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    if (npages > 512) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    ioba &= ~SPAPR_TCE_PAGE_MASK;
>> +    tce_list &= ~SPAPR_TCE_PAGE_MASK;
>> +
>> +    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
>> +        target_ulong tce = ldq_phys(cs->as, tce_list +
>> +                                    i * sizeof(target_ulong));
> Sorry, it is too late here, forgot to comment :)
>
> I cannot use rtas_ld straight away as it is 32bit and here I need 64bit.
> Anyway, this is a hypercall and it is called from guest virtual mode so I
> do not think that rule with masking top bits for RTAS applies here.
>
> SPAPR says about it:
>
> uint64 TCE, /* The logical address of a page of (4 K long on a 4 K
> boundary) of TCE contents to
> be stored in the TCE table (contains logical address of storage page to be
> mapped)*/
>
> I believe "logical address" in this context is a "guest physical" address.
> Does it help? Does not help me :)

Yeah, I think it's safe to assume that "logical address" basically is 
our guest physical. So using ldq_phys is correct.


Alex

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 7/9] spapr_iommu: Get rid of window_size in sPAPRTCETable
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 7/9] spapr_iommu: Get rid of window_size in sPAPRTCETable Alexey Kardashevskiy
@ 2014-05-21 22:05   ` Alexander Graf
  0 siblings, 0 replies; 30+ messages in thread
From: Alexander Graf @ 2014-05-21 22:05 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 21.05.14 16:21, Alexey Kardashevskiy wrote:
> This removes window_size as it is basically a copy of nb_table
> shifted by SPAPR_TCE_PAGE_SHIFT. As new dynamic DMA windows are
> going to support windows as big as the entire RAM and this number
> will be bigger that 32 capacity, we will have to do something
> about @window_size anyway and removal seems to be the right way to go.
>
> This removes dma_window_start/dma_window_size from sPAPRPHBState as
> they are no longer used.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>   hw/ppc/spapr_iommu.c        | 41 +++++++++++++++--------------------------
>   hw/ppc/spapr_pci.c          |  6 ++----
>   hw/ppc/spapr_vio.c          |  4 +++-
>   include/hw/pci-host/spapr.h |  2 --
>   include/hw/ppc/spapr.h      |  3 +--
>   target-ppc/kvm.c            |  4 ++--
>   target-ppc/kvm_ppc.h        |  2 +-
>   7 files changed, 24 insertions(+), 38 deletions(-)
>
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 0dd6509..90de3e3 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -70,7 +70,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>   
>       if (tcet->bypass) {
>           ret.perm = IOMMU_RW;
> -    } else if (addr < tcet->window_size) {
> +    } else if ((addr >> SPAPR_TCE_PAGE_SHIFT) < tcet->nb_table) {
>           /* Check if we are in bound */
>           tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT];
>           ret.iova = addr & ~SPAPR_TCE_PAGE_MASK;
> @@ -84,25 +84,15 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>       return ret;
>   }
>   
> -static int spapr_tce_table_pre_load(void *opaque)
> -{
> -    sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> -
> -    tcet->nb_table = tcet->window_size >> SPAPR_TCE_PAGE_SHIFT;
> -
> -    return 0;
> -}
> -
>   static const VMStateDescription vmstate_spapr_tce_table = {
>       .name = "spapr_iommu",
> -    .version_id = 1,
> +    .version_id = 2,
>       .minimum_version_id = 1,

Doesn't this have to become 2 as well?


Alex

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift in sPAPRTCETable
  2014-05-21 14:21 ` [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift " Alexey Kardashevskiy
@ 2014-05-21 22:11   ` Alexander Graf
  2014-05-21 23:45     ` Alexey Kardashevskiy
  2014-05-22  4:25     ` Alexey Kardashevskiy
  0 siblings, 2 replies; 30+ messages in thread
From: Alexander Graf @ 2014-05-21 22:11 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 21.05.14 16:21, Alexey Kardashevskiy wrote:
> At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
> spec allows other page sizes and we are going to implement them, we need
> page size to be configrable.
>
> This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT
> with it whereever it is possible.
>
> This removes SPAPR_TCE_PAGE_MASK as it is no longer used.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>   hw/ppc/spapr_iommu.c   | 54 +++++++++++++++++++++++++++++---------------------
>   hw/ppc/spapr_pci.c     |  1 +
>   hw/ppc/spapr_vio.c     |  1 +
>   include/hw/ppc/spapr.h |  3 ++-
>   4 files changed, 35 insertions(+), 24 deletions(-)
>
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 90de3e3..e765a6d 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -70,12 +70,13 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>   
>       if (tcet->bypass) {
>           ret.perm = IOMMU_RW;
> -    } else if ((addr >> SPAPR_TCE_PAGE_SHIFT) < tcet->nb_table) {
> +    } else if ((addr >> tcet->page_shift) < tcet->nb_table) {
>           /* Check if we are in bound */
> -        tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT];
> -        ret.iova = addr & ~SPAPR_TCE_PAGE_MASK;
> -        ret.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
> -        ret.addr_mask = SPAPR_TCE_PAGE_MASK;
> +        target_ulong mask = ~((1 << tcet->page_shift) - 1);

Why target_ulong? This should be u64 or hwaddr or something along those 
lines, no? Also, can the mask grow bigger than 31bits? If so you 
probably want 1ULL (below as well).

In fact, we might be better off with a few more fields to tcet. Just add 
page_mask and page_size in addition to the page_shift one and use them 
instead of calculating them over and over again.

> +        tce = tcet->table[addr >> tcet->page_shift];
> +        ret.iova = addr & mask;
> +        ret.translated_addr = tce & mask;
> +        ret.addr_mask = ~mask;
>           ret.perm = tce;
>       }
>       trace_spapr_iommu_xlate(tcet->liobn, addr, ret.iova, ret.perm,
> @@ -113,7 +114,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
>       if (kvm_enabled()) {
>           tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
>                                                 tcet->nb_table <<
> -                                              SPAPR_TCE_PAGE_SHIFT,
> +                                              tcet->page_shift,
>                                                 &tcet->fd);
>       }
>   
> @@ -133,6 +134,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
>   }
>   
>   sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
> +                                   uint32_t page_shift,
>                                      uint32_t nb_table)
>   {
>       sPAPRTCETable *tcet;
> @@ -149,6 +151,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
>   
>       tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
>       tcet->liobn = liobn;
> +    tcet->page_shift = page_shift;
>       tcet->nb_table = nb_table;
>   
>       object_property_add_child(OBJECT(owner), "tce-table", OBJECT(tcet), NULL);
> @@ -194,19 +197,20 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
>                                   target_ulong tce)
>   {
>       IOMMUTLBEntry entry;
> +    target_ulong mask = ~((1 << tcet->page_shift) - 1);
>   
> -    if ((ioba >> SPAPR_TCE_PAGE_SHIFT) >= tcet->nb_table) {
> +    if ((ioba >> tcet->page_shift) >= tcet->nb_table) {
>           hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
>                         TARGET_FMT_lx "\n", ioba);
>           return H_PARAMETER;
>       }
>   
> -    tcet->table[ioba >> SPAPR_TCE_PAGE_SHIFT] = tce;
> +    tcet->table[ioba >> tcet->page_shift] = tce;
>   
>       entry.target_as = &address_space_memory,
> -    entry.iova = ioba & ~SPAPR_TCE_PAGE_MASK;
> -    entry.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
> -    entry.addr_mask = SPAPR_TCE_PAGE_MASK;
> +    entry.iova = ioba & mask;
> +    entry.translated_addr = tce & mask;
> +    entry.addr_mask = ~mask;
>       entry.perm = tce;
>       memory_region_notify_iommu(&tcet->iommu, entry);
>   
> @@ -226,6 +230,7 @@ static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
>       target_ulong ret = H_PARAMETER;
>       sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
>       CPUState *cs = CPU(cpu);
> +    target_ulong mask;
>   
>       if (!tcet) {
>           return H_PARAMETER;
> @@ -235,12 +240,14 @@ static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
>           return H_PARAMETER;
>       }
>   
> -    ioba &= ~SPAPR_TCE_PAGE_MASK;
> -    tce_list &= ~SPAPR_TCE_PAGE_MASK;
> +    mask = ~((1 << tcet->page_shift) - 1);
> +    ioba &= mask;
> +
> +    for (i = 0; i < npages; ++i, ioba += (1 << tcet->page_shift)) {
> +        target_ulong off = (tce_list & ~SPAPR_TCE_RW) +
> +                                i * sizeof(target_ulong);
> +        target_ulong tce = ldq_phys(cs->as, off);
>   
> -    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
> -        target_ulong tce = ldq_phys(cs->as, tce_list +
> -                                    i * sizeof(target_ulong));
>           ret = put_tce_emu(tcet, ioba, tce);
>           if (ret) {
>               break;
> @@ -276,9 +283,9 @@ static target_ulong h_stuff_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>           return H_PARAMETER;
>       }
>   
> -    ioba &= ~SPAPR_TCE_PAGE_MASK;
> +    ioba &= ~((1 << tcet->page_shift) - 1);
>   
> -    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
> +    for (i = 0; i < npages; ++i, ioba += (1 << tcet->page_shift)) {
>           ret = put_tce_emu(tcet, ioba, tce_value);
>           if (ret) {
>               break;
> @@ -298,7 +305,7 @@ static target_ulong h_put_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>       target_ulong ret = H_PARAMETER;
>       sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
>   
> -    ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
> +    ioba &= ~((1 << tcet->page_shift) - 1);
>   
>       if (tcet) {
>           ret = put_tce_emu(tcet, ioba, tce);
> @@ -311,13 +318,13 @@ static target_ulong h_put_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>   static target_ulong get_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
>                                   target_ulong *tce)
>   {
> -    if ((ioba >> SPAPR_TCE_PAGE_SHIFT) >= tcet->nb_table) {
> +    if ((ioba >> tcet->page_shift) >= tcet->nb_table) {
>           hcall_dprintf("spapr_iommu_get_tce on out-of-bounds IOBA 0x"
>                         TARGET_FMT_lx "\n", ioba);
>           return H_PARAMETER;
>       }
>   
> -    *tce = tcet->table[ioba >> SPAPR_TCE_PAGE_SHIFT];
> +    *tce = tcet->table[ioba >> tcet->page_shift];
>   
>       return H_SUCCESS;
>   }
> @@ -330,8 +337,9 @@ static target_ulong h_get_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>       target_ulong tce = 0;
>       target_ulong ret = H_PARAMETER;
>       sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
> +    const target_ulong mask = ~((1 << tcet->page_shift) - 1);
>   
> -    ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
> +    ioba &= mask;
>   
>       if (tcet) {
>           ret = get_tce_emu(tcet, ioba, &tce);
> @@ -382,7 +390,7 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
>       }
>   
>       return spapr_dma_dt(fdt, node_off, propname,
> -                        tcet->liobn, 0, tcet->nb_table << SPAPR_TCE_PAGE_SHIFT);
> +                        tcet->liobn, 0, tcet->nb_table << tcet->page_shift);
>   }
>   
>   static void spapr_tce_table_class_init(ObjectClass *klass, void *data)
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index fdd4c07..c9850d4 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -656,6 +656,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
>       sPAPRTCETable *tcet;
>   
>       tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
> +                               SPAPR_TCE_PAGE_SHIFT,
>                                  0x40000000 >> SPAPR_TCE_PAGE_SHIFT);
>       if (!tcet) {
>           error_setg(errp, "Unable to create TCE table for %s",
> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
> index b84e481..d7e9e6a 100644
> --- a/hw/ppc/spapr_vio.c
> +++ b/hw/ppc/spapr_vio.c
> @@ -457,6 +457,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
>       if (pc->rtce_window_size) {
>           uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
>           dev->tcet = spapr_tce_new_table(qdev, liobn,
> +                                        SPAPR_TCE_PAGE_SHIFT,

I don't quite understand who defines what the TCE page size is for a 
given device. Can you try to explain this to me?


Alex

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift in sPAPRTCETable
  2014-05-21 22:11   ` Alexander Graf
@ 2014-05-21 23:45     ` Alexey Kardashevskiy
  2014-05-22 10:09       ` Alexander Graf
  2014-05-22  4:25     ` Alexey Kardashevskiy
  1 sibling, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 23:45 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc

On 05/22/2014 08:11 AM, Alexander Graf wrote:
> 
> On 21.05.14 16:21, Alexey Kardashevskiy wrote:
>> At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
>> spec allows other page sizes and we are going to implement them, we need
>> page size to be configrable.
>>
>> This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT
>> with it whereever it is possible.
>>
>> This removes SPAPR_TCE_PAGE_MASK as it is no longer used.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   hw/ppc/spapr_iommu.c   | 54
>> +++++++++++++++++++++++++++++---------------------
>>   hw/ppc/spapr_pci.c     |  1 +
>>   hw/ppc/spapr_vio.c     |  1 +
>>   include/hw/ppc/spapr.h |  3 ++-
>>   4 files changed, 35 insertions(+), 24 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index 90de3e3..e765a6d 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -70,12 +70,13 @@ static IOMMUTLBEntry
>> spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>>         if (tcet->bypass) {
>>           ret.perm = IOMMU_RW;
>> -    } else if ((addr >> SPAPR_TCE_PAGE_SHIFT) < tcet->nb_table) {
>> +    } else if ((addr >> tcet->page_shift) < tcet->nb_table) {
>>           /* Check if we are in bound */
>> -        tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT];
>> -        ret.iova = addr & ~SPAPR_TCE_PAGE_MASK;
>> -        ret.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
>> -        ret.addr_mask = SPAPR_TCE_PAGE_MASK;
>> +        target_ulong mask = ~((1 << tcet->page_shift) - 1);
> 
> Why target_ulong? This should be u64 or hwaddr or something along those
> lines, no? Also, can the mask grow bigger than 31bits? If so you probably
> want 1ULL (below as well).
> 
> In fact, we might be better off with a few more fields to tcet. Just add
> page_mask and page_size in addition to the page_shift one and use them
> instead of calculating them over and over again.
> 
>> +        tce = tcet->table[addr >> tcet->page_shift];
>> +        ret.iova = addr & mask;
>> +        ret.translated_addr = tce & mask;
>> +        ret.addr_mask = ~mask;
>>           ret.perm = tce;
>>       }
>>       trace_spapr_iommu_xlate(tcet->liobn, addr, ret.iova, ret.perm,
>> @@ -113,7 +114,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
>>       if (kvm_enabled()) {
>>           tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
>>                                                 tcet->nb_table <<
>> -                                              SPAPR_TCE_PAGE_SHIFT,
>> +                                              tcet->page_shift,
>>                                                 &tcet->fd);
>>       }
>>   @@ -133,6 +134,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
>>   }
>>     sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
>> +                                   uint32_t page_shift,
>>                                      uint32_t nb_table)
>>   {
>>       sPAPRTCETable *tcet;
>> @@ -149,6 +151,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState
>> *owner, uint32_t liobn,
>>         tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
>>       tcet->liobn = liobn;
>> +    tcet->page_shift = page_shift;
>>       tcet->nb_table = nb_table;
>>         object_property_add_child(OBJECT(owner), "tce-table",
>> OBJECT(tcet), NULL);
>> @@ -194,19 +197,20 @@ static target_ulong put_tce_emu(sPAPRTCETable
>> *tcet, target_ulong ioba,
>>                                   target_ulong tce)
>>   {
>>       IOMMUTLBEntry entry;
>> +    target_ulong mask = ~((1 << tcet->page_shift) - 1);
>>   -    if ((ioba >> SPAPR_TCE_PAGE_SHIFT) >= tcet->nb_table) {
>> +    if ((ioba >> tcet->page_shift) >= tcet->nb_table) {
>>           hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
>>                         TARGET_FMT_lx "\n", ioba);
>>           return H_PARAMETER;
>>       }
>>   -    tcet->table[ioba >> SPAPR_TCE_PAGE_SHIFT] = tce;
>> +    tcet->table[ioba >> tcet->page_shift] = tce;
>>         entry.target_as = &address_space_memory,
>> -    entry.iova = ioba & ~SPAPR_TCE_PAGE_MASK;
>> -    entry.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
>> -    entry.addr_mask = SPAPR_TCE_PAGE_MASK;
>> +    entry.iova = ioba & mask;
>> +    entry.translated_addr = tce & mask;
>> +    entry.addr_mask = ~mask;
>>       entry.perm = tce;
>>       memory_region_notify_iommu(&tcet->iommu, entry);
>>   @@ -226,6 +230,7 @@ static target_ulong h_put_tce_indirect(PowerPCCPU
>> *cpu,
>>       target_ulong ret = H_PARAMETER;
>>       sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
>>       CPUState *cs = CPU(cpu);
>> +    target_ulong mask;
>>         if (!tcet) {
>>           return H_PARAMETER;
>> @@ -235,12 +240,14 @@ static target_ulong h_put_tce_indirect(PowerPCCPU
>> *cpu,
>>           return H_PARAMETER;
>>       }
>>   -    ioba &= ~SPAPR_TCE_PAGE_MASK;
>> -    tce_list &= ~SPAPR_TCE_PAGE_MASK;
>> +    mask = ~((1 << tcet->page_shift) - 1);
>> +    ioba &= mask;
>> +
>> +    for (i = 0; i < npages; ++i, ioba += (1 << tcet->page_shift)) {
>> +        target_ulong off = (tce_list & ~SPAPR_TCE_RW) +
>> +                                i * sizeof(target_ulong);
>> +        target_ulong tce = ldq_phys(cs->as, off);
>>   -    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
>> -        target_ulong tce = ldq_phys(cs->as, tce_list +
>> -                                    i * sizeof(target_ulong));
>>           ret = put_tce_emu(tcet, ioba, tce);
>>           if (ret) {
>>               break;
>> @@ -276,9 +283,9 @@ static target_ulong h_stuff_tce(PowerPCCPU *cpu,
>> sPAPREnvironment *spapr,
>>           return H_PARAMETER;
>>       }
>>   -    ioba &= ~SPAPR_TCE_PAGE_MASK;
>> +    ioba &= ~((1 << tcet->page_shift) - 1);
>>   -    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
>> +    for (i = 0; i < npages; ++i, ioba += (1 << tcet->page_shift)) {
>>           ret = put_tce_emu(tcet, ioba, tce_value);
>>           if (ret) {
>>               break;
>> @@ -298,7 +305,7 @@ static target_ulong h_put_tce(PowerPCCPU *cpu,
>> sPAPREnvironment *spapr,
>>       target_ulong ret = H_PARAMETER;
>>       sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
>>   -    ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
>> +    ioba &= ~((1 << tcet->page_shift) - 1);
>>         if (tcet) {
>>           ret = put_tce_emu(tcet, ioba, tce);
>> @@ -311,13 +318,13 @@ static target_ulong h_put_tce(PowerPCCPU *cpu,
>> sPAPREnvironment *spapr,
>>   static target_ulong get_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
>>                                   target_ulong *tce)
>>   {
>> -    if ((ioba >> SPAPR_TCE_PAGE_SHIFT) >= tcet->nb_table) {
>> +    if ((ioba >> tcet->page_shift) >= tcet->nb_table) {
>>           hcall_dprintf("spapr_iommu_get_tce on out-of-bounds IOBA 0x"
>>                         TARGET_FMT_lx "\n", ioba);
>>           return H_PARAMETER;
>>       }
>>   -    *tce = tcet->table[ioba >> SPAPR_TCE_PAGE_SHIFT];
>> +    *tce = tcet->table[ioba >> tcet->page_shift];
>>         return H_SUCCESS;
>>   }
>> @@ -330,8 +337,9 @@ static target_ulong h_get_tce(PowerPCCPU *cpu,
>> sPAPREnvironment *spapr,
>>       target_ulong tce = 0;
>>       target_ulong ret = H_PARAMETER;
>>       sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
>> +    const target_ulong mask = ~((1 << tcet->page_shift) - 1);
>>   -    ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
>> +    ioba &= mask;
>>         if (tcet) {
>>           ret = get_tce_emu(tcet, ioba, &tce);
>> @@ -382,7 +390,7 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const
>> char *propname,
>>       }
>>         return spapr_dma_dt(fdt, node_off, propname,
>> -                        tcet->liobn, 0, tcet->nb_table <<
>> SPAPR_TCE_PAGE_SHIFT);
>> +                        tcet->liobn, 0, tcet->nb_table <<
>> tcet->page_shift);
>>   }
>>     static void spapr_tce_table_class_init(ObjectClass *klass, void *data)
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index fdd4c07..c9850d4 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -656,6 +656,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState
>> *sphb, Error **errp)
>>       sPAPRTCETable *tcet;
>>         tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>> +                               SPAPR_TCE_PAGE_SHIFT,
>>                                  0x40000000 >> SPAPR_TCE_PAGE_SHIFT);
>>       if (!tcet) {
>>           error_setg(errp, "Unable to create TCE table for %s",
>> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
>> index b84e481..d7e9e6a 100644
>> --- a/hw/ppc/spapr_vio.c
>> +++ b/hw/ppc/spapr_vio.c
>> @@ -457,6 +457,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
>>       if (pc->rtce_window_size) {
>>           uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
>>           dev->tcet = spapr_tce_new_table(qdev, liobn,
>> +                                        SPAPR_TCE_PAGE_SHIFT,
> 
> I don't quite understand who defines what the TCE page size is for a given
> device. Can you try to explain this to me?

If it is default window (for PCI) or window for VIO - it is 4K. If it is a
dynamic DMA window - page size is a parameter of RTAS call which creates
the window.



-- 
Alexey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift in sPAPRTCETable
  2014-05-21 22:11   ` Alexander Graf
  2014-05-21 23:45     ` Alexey Kardashevskiy
@ 2014-05-22  4:25     ` Alexey Kardashevskiy
  2014-05-22  7:11       ` Alexander Graf
  1 sibling, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-22  4:25 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc

On 05/22/2014 08:11 AM, Alexander Graf wrote:
> 
> On 21.05.14 16:21, Alexey Kardashevskiy wrote:
>> At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
>> spec allows other page sizes and we are going to implement them, we need
>> page size to be configrable.
>>
>> This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT
>> with it whereever it is possible.
>>
>> This removes SPAPR_TCE_PAGE_MASK as it is no longer used.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   hw/ppc/spapr_iommu.c   | 54
>> +++++++++++++++++++++++++++++---------------------
>>   hw/ppc/spapr_pci.c     |  1 +
>>   hw/ppc/spapr_vio.c     |  1 +
>>   include/hw/ppc/spapr.h |  3 ++-
>>   4 files changed, 35 insertions(+), 24 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index 90de3e3..e765a6d 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -70,12 +70,13 @@ static IOMMUTLBEntry
>> spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>>         if (tcet->bypass) {
>>           ret.perm = IOMMU_RW;
>> -    } else if ((addr >> SPAPR_TCE_PAGE_SHIFT) < tcet->nb_table) {
>> +    } else if ((addr >> tcet->page_shift) < tcet->nb_table) {
>>           /* Check if we are in bound */
>> -        tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT];
>> -        ret.iova = addr & ~SPAPR_TCE_PAGE_MASK;
>> -        ret.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
>> -        ret.addr_mask = SPAPR_TCE_PAGE_MASK;
>> +        target_ulong mask = ~((1 << tcet->page_shift) - 1);
> 
> Why target_ulong? This should be u64 or hwaddr or something along those
> lines, no?

Should be hwaddr, right, will fix.

> Also, can the mask grow bigger than 31bits? If so you probably
> want 1ULL (below as well).

It cannot now but PAPR allows pages to be 16GB so you are making good point
here, will fix too.

> In fact, we might be better off with a few more fields to tcet. Just add
> page_mask and page_size in addition to the page_shift one and use them
> instead of calculating them over and over again.

This is only used for emulated devices, not even for virtio. How much will
we earn from that optimization? Will it be even measurable? On the other
hand, this creates an opportunity (subtle, yes) to screw things up. Still
worth?



-- 
Alexey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift in sPAPRTCETable
  2014-05-22  4:25     ` Alexey Kardashevskiy
@ 2014-05-22  7:11       ` Alexander Graf
  0 siblings, 0 replies; 30+ messages in thread
From: Alexander Graf @ 2014-05-22  7:11 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel



> Am 22.05.2014 um 06:25 schrieb Alexey Kardashevskiy <aik@ozlabs.ru>:
> 
>> On 05/22/2014 08:11 AM, Alexander Graf wrote:
>> 
>>> On 21.05.14 16:21, Alexey Kardashevskiy wrote:
>>> At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
>>> spec allows other page sizes and we are going to implement them, we need
>>> page size to be configrable.
>>> 
>>> This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT
>>> with it whereever it is possible.
>>> 
>>> This removes SPAPR_TCE_PAGE_MASK as it is no longer used.
>>> 
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  hw/ppc/spapr_iommu.c   | 54
>>> +++++++++++++++++++++++++++++---------------------
>>>  hw/ppc/spapr_pci.c     |  1 +
>>>  hw/ppc/spapr_vio.c     |  1 +
>>>  include/hw/ppc/spapr.h |  3 ++-
>>>  4 files changed, 35 insertions(+), 24 deletions(-)
>>> 
>>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>>> index 90de3e3..e765a6d 100644
>>> --- a/hw/ppc/spapr_iommu.c
>>> +++ b/hw/ppc/spapr_iommu.c
>>> @@ -70,12 +70,13 @@ static IOMMUTLBEntry
>>> spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>>>        if (tcet->bypass) {
>>>          ret.perm = IOMMU_RW;
>>> -    } else if ((addr >> SPAPR_TCE_PAGE_SHIFT) < tcet->nb_table) {
>>> +    } else if ((addr >> tcet->page_shift) < tcet->nb_table) {
>>>          /* Check if we are in bound */
>>> -        tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT];
>>> -        ret.iova = addr & ~SPAPR_TCE_PAGE_MASK;
>>> -        ret.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
>>> -        ret.addr_mask = SPAPR_TCE_PAGE_MASK;
>>> +        target_ulong mask = ~((1 << tcet->page_shift) - 1);
>> 
>> Why target_ulong? This should be u64 or hwaddr or something along those
>> lines, no?
> 
> Should be hwaddr, right, will fix.
> 
>> Also, can the mask grow bigger than 31bits? If so you probably
>> want 1ULL (below as well).
> 
> It cannot now but PAPR allows pages to be 16GB so you are making good point
> here, will fix too.
> 
>> In fact, we might be better off with a few more fields to tcet. Just add
>> page_mask and page_size in addition to the page_shift one and use them
>> instead of calculating them over and over again.
> 
> This is only used for emulated devices, not even for virtio. How much will
> we earn from that optimization? Will it be even measurable? On the other
> hand, this creates an opportunity (subtle, yes) to screw things up. Still
> worth?

I'm not concerned about speed here, but readibility :). Inline calculations are just harder to read.

The alternative would be to extract every calculation into a local variable and use that instead ;).

Alex

> 
> 
> 
> -- 
> Alexey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift in sPAPRTCETable
  2014-05-21 23:45     ` Alexey Kardashevskiy
@ 2014-05-22 10:09       ` Alexander Graf
  2014-05-22 10:24         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Graf @ 2014-05-22 10:09 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 22.05.14 01:45, Alexey Kardashevskiy wrote:
> On 05/22/2014 08:11 AM, Alexander Graf wrote:
>> On 21.05.14 16:21, Alexey Kardashevskiy wrote:
>>> At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
>>> spec allows other page sizes and we are going to implement them, we need
>>> page size to be configrable.
>>>
>>> This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT
>>> with it whereever it is possible.
>>>
>>> This removes SPAPR_TCE_PAGE_MASK as it is no longer used.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>    hw/ppc/spapr_iommu.c   | 54
>>> +++++++++++++++++++++++++++++---------------------
>>>    hw/ppc/spapr_pci.c     |  1 +
>>>    hw/ppc/spapr_vio.c     |  1 +
>>>    include/hw/ppc/spapr.h |  3 ++-
>>>    4 files changed, 35 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>>> index 90de3e3..e765a6d 100644
>>> --- a/hw/ppc/spapr_iommu.c
>>> +++ b/hw/ppc/spapr_iommu.c
>>> @@ -70,12 +70,13 @@ static IOMMUTLBEntry
>>> spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>>>          if (tcet->bypass) {
>>>            ret.perm = IOMMU_RW;
>>> -    } else if ((addr >> SPAPR_TCE_PAGE_SHIFT) < tcet->nb_table) {
>>> +    } else if ((addr >> tcet->page_shift) < tcet->nb_table) {
>>>            /* Check if we are in bound */
>>> -        tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT];
>>> -        ret.iova = addr & ~SPAPR_TCE_PAGE_MASK;
>>> -        ret.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
>>> -        ret.addr_mask = SPAPR_TCE_PAGE_MASK;
>>> +        target_ulong mask = ~((1 << tcet->page_shift) - 1);
>> Why target_ulong? This should be u64 or hwaddr or something along those
>> lines, no? Also, can the mask grow bigger than 31bits? If so you probably
>> want 1ULL (below as well).
>>
>> In fact, we might be better off with a few more fields to tcet. Just add
>> page_mask and page_size in addition to the page_shift one and use them
>> instead of calculating them over and over again.
>>
>>> +        tce = tcet->table[addr >> tcet->page_shift];
>>> +        ret.iova = addr & mask;
>>> +        ret.translated_addr = tce & mask;
>>> +        ret.addr_mask = ~mask;
>>>            ret.perm = tce;
>>>        }
>>>        trace_spapr_iommu_xlate(tcet->liobn, addr, ret.iova, ret.perm,
>>> @@ -113,7 +114,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
>>>        if (kvm_enabled()) {
>>>            tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
>>>                                                  tcet->nb_table <<
>>> -                                              SPAPR_TCE_PAGE_SHIFT,
>>> +                                              tcet->page_shift,
>>>                                                  &tcet->fd);
>>>        }
>>>    @@ -133,6 +134,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
>>>    }
>>>      sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
>>> +                                   uint32_t page_shift,
>>>                                       uint32_t nb_table)
>>>    {
>>>        sPAPRTCETable *tcet;
>>> @@ -149,6 +151,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState
>>> *owner, uint32_t liobn,
>>>          tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
>>>        tcet->liobn = liobn;
>>> +    tcet->page_shift = page_shift;
>>>        tcet->nb_table = nb_table;
>>>          object_property_add_child(OBJECT(owner), "tce-table",
>>> OBJECT(tcet), NULL);
>>> @@ -194,19 +197,20 @@ static target_ulong put_tce_emu(sPAPRTCETable
>>> *tcet, target_ulong ioba,
>>>                                    target_ulong tce)
>>>    {
>>>        IOMMUTLBEntry entry;
>>> +    target_ulong mask = ~((1 << tcet->page_shift) - 1);
>>>    -    if ((ioba >> SPAPR_TCE_PAGE_SHIFT) >= tcet->nb_table) {
>>> +    if ((ioba >> tcet->page_shift) >= tcet->nb_table) {
>>>            hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
>>>                          TARGET_FMT_lx "\n", ioba);
>>>            return H_PARAMETER;
>>>        }
>>>    -    tcet->table[ioba >> SPAPR_TCE_PAGE_SHIFT] = tce;
>>> +    tcet->table[ioba >> tcet->page_shift] = tce;
>>>          entry.target_as = &address_space_memory,
>>> -    entry.iova = ioba & ~SPAPR_TCE_PAGE_MASK;
>>> -    entry.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
>>> -    entry.addr_mask = SPAPR_TCE_PAGE_MASK;
>>> +    entry.iova = ioba & mask;
>>> +    entry.translated_addr = tce & mask;
>>> +    entry.addr_mask = ~mask;
>>>        entry.perm = tce;
>>>        memory_region_notify_iommu(&tcet->iommu, entry);
>>>    @@ -226,6 +230,7 @@ static target_ulong h_put_tce_indirect(PowerPCCPU
>>> *cpu,
>>>        target_ulong ret = H_PARAMETER;
>>>        sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
>>>        CPUState *cs = CPU(cpu);
>>> +    target_ulong mask;
>>>          if (!tcet) {
>>>            return H_PARAMETER;
>>> @@ -235,12 +240,14 @@ static target_ulong h_put_tce_indirect(PowerPCCPU
>>> *cpu,
>>>            return H_PARAMETER;
>>>        }
>>>    -    ioba &= ~SPAPR_TCE_PAGE_MASK;
>>> -    tce_list &= ~SPAPR_TCE_PAGE_MASK;
>>> +    mask = ~((1 << tcet->page_shift) - 1);
>>> +    ioba &= mask;
>>> +
>>> +    for (i = 0; i < npages; ++i, ioba += (1 << tcet->page_shift)) {
>>> +        target_ulong off = (tce_list & ~SPAPR_TCE_RW) +
>>> +                                i * sizeof(target_ulong);
>>> +        target_ulong tce = ldq_phys(cs->as, off);
>>>    -    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
>>> -        target_ulong tce = ldq_phys(cs->as, tce_list +
>>> -                                    i * sizeof(target_ulong));
>>>            ret = put_tce_emu(tcet, ioba, tce);
>>>            if (ret) {
>>>                break;
>>> @@ -276,9 +283,9 @@ static target_ulong h_stuff_tce(PowerPCCPU *cpu,
>>> sPAPREnvironment *spapr,
>>>            return H_PARAMETER;
>>>        }
>>>    -    ioba &= ~SPAPR_TCE_PAGE_MASK;
>>> +    ioba &= ~((1 << tcet->page_shift) - 1);
>>>    -    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
>>> +    for (i = 0; i < npages; ++i, ioba += (1 << tcet->page_shift)) {
>>>            ret = put_tce_emu(tcet, ioba, tce_value);
>>>            if (ret) {
>>>                break;
>>> @@ -298,7 +305,7 @@ static target_ulong h_put_tce(PowerPCCPU *cpu,
>>> sPAPREnvironment *spapr,
>>>        target_ulong ret = H_PARAMETER;
>>>        sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
>>>    -    ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
>>> +    ioba &= ~((1 << tcet->page_shift) - 1);
>>>          if (tcet) {
>>>            ret = put_tce_emu(tcet, ioba, tce);
>>> @@ -311,13 +318,13 @@ static target_ulong h_put_tce(PowerPCCPU *cpu,
>>> sPAPREnvironment *spapr,
>>>    static target_ulong get_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
>>>                                    target_ulong *tce)
>>>    {
>>> -    if ((ioba >> SPAPR_TCE_PAGE_SHIFT) >= tcet->nb_table) {
>>> +    if ((ioba >> tcet->page_shift) >= tcet->nb_table) {
>>>            hcall_dprintf("spapr_iommu_get_tce on out-of-bounds IOBA 0x"
>>>                          TARGET_FMT_lx "\n", ioba);
>>>            return H_PARAMETER;
>>>        }
>>>    -    *tce = tcet->table[ioba >> SPAPR_TCE_PAGE_SHIFT];
>>> +    *tce = tcet->table[ioba >> tcet->page_shift];
>>>          return H_SUCCESS;
>>>    }
>>> @@ -330,8 +337,9 @@ static target_ulong h_get_tce(PowerPCCPU *cpu,
>>> sPAPREnvironment *spapr,
>>>        target_ulong tce = 0;
>>>        target_ulong ret = H_PARAMETER;
>>>        sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
>>> +    const target_ulong mask = ~((1 << tcet->page_shift) - 1);
>>>    -    ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
>>> +    ioba &= mask;
>>>          if (tcet) {
>>>            ret = get_tce_emu(tcet, ioba, &tce);
>>> @@ -382,7 +390,7 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const
>>> char *propname,
>>>        }
>>>          return spapr_dma_dt(fdt, node_off, propname,
>>> -                        tcet->liobn, 0, tcet->nb_table <<
>>> SPAPR_TCE_PAGE_SHIFT);
>>> +                        tcet->liobn, 0, tcet->nb_table <<
>>> tcet->page_shift);
>>>    }
>>>      static void spapr_tce_table_class_init(ObjectClass *klass, void *data)
>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>> index fdd4c07..c9850d4 100644
>>> --- a/hw/ppc/spapr_pci.c
>>> +++ b/hw/ppc/spapr_pci.c
>>> @@ -656,6 +656,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState
>>> *sphb, Error **errp)
>>>        sPAPRTCETable *tcet;
>>>          tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>>> +                               SPAPR_TCE_PAGE_SHIFT,
>>>                                   0x40000000 >> SPAPR_TCE_PAGE_SHIFT);
>>>        if (!tcet) {
>>>            error_setg(errp, "Unable to create TCE table for %s",
>>> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
>>> index b84e481..d7e9e6a 100644
>>> --- a/hw/ppc/spapr_vio.c
>>> +++ b/hw/ppc/spapr_vio.c
>>> @@ -457,6 +457,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
>>>        if (pc->rtce_window_size) {
>>>            uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
>>>            dev->tcet = spapr_tce_new_table(qdev, liobn,
>>> +                                        SPAPR_TCE_PAGE_SHIFT,
>> I don't quite understand who defines what the TCE page size is for a given
>> device. Can you try to explain this to me?
> If it is default window (for PCI) or window for VIO - it is 4K. If it is a
> dynamic DMA window - page size is a parameter of RTAS call which creates
> the window.

Could we change that default size for non-dynamic windows somehow? 4k is 
really fine grained.

Since the KVM in-kernel TCE code really is just a dumb memory poker 
without checks, I guess we're fine on that side with dynamic page sizes.


Alex

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift in sPAPRTCETable
  2014-05-22 10:09       ` Alexander Graf
@ 2014-05-22 10:24         ` Alexey Kardashevskiy
  2014-05-22 10:45           ` Alexander Graf
  0 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-22 10:24 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc

On 05/22/2014 08:09 PM, Alexander Graf wrote:
> 
> On 22.05.14 01:45, Alexey Kardashevskiy wrote:
>> On 05/22/2014 08:11 AM, Alexander Graf wrote:
>>> On 21.05.14 16:21, Alexey Kardashevskiy wrote:
>>>> At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
>>>> spec allows other page sizes and we are going to implement them, we need
>>>> page size to be configrable.
>>>>
>>>> This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT
>>>> with it whereever it is possible.
>>>>
>>>> This removes SPAPR_TCE_PAGE_MASK as it is no longer used.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>>    hw/ppc/spapr_iommu.c   | 54
>>>> +++++++++++++++++++++++++++++---------------------
>>>>    hw/ppc/spapr_pci.c     |  1 +
>>>>    hw/ppc/spapr_vio.c     |  1 +
>>>>    include/hw/ppc/spapr.h |  3 ++-
>>>>    4 files changed, 35 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>>>> index 90de3e3..e765a6d 100644
>>>> --- a/hw/ppc/spapr_iommu.c
>>>> +++ b/hw/ppc/spapr_iommu.c
>>>> @@ -70,12 +70,13 @@ static IOMMUTLBEntry
>>>> spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>>>>          if (tcet->bypass) {
>>>>            ret.perm = IOMMU_RW;
>>>> -    } else if ((addr >> SPAPR_TCE_PAGE_SHIFT) < tcet->nb_table) {
>>>> +    } else if ((addr >> tcet->page_shift) < tcet->nb_table) {
>>>>            /* Check if we are in bound */
>>>> -        tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT];
>>>> -        ret.iova = addr & ~SPAPR_TCE_PAGE_MASK;
>>>> -        ret.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
>>>> -        ret.addr_mask = SPAPR_TCE_PAGE_MASK;
>>>> +        target_ulong mask = ~((1 << tcet->page_shift) - 1);
>>> Why target_ulong? This should be u64 or hwaddr or something along those
>>> lines, no? Also, can the mask grow bigger than 31bits? If so you probably
>>> want 1ULL (below as well).
>>>
>>> In fact, we might be better off with a few more fields to tcet. Just add
>>> page_mask and page_size in addition to the page_shift one and use them
>>> instead of calculating them over and over again.
>>>
>>>> +        tce = tcet->table[addr >> tcet->page_shift];
>>>> +        ret.iova = addr & mask;
>>>> +        ret.translated_addr = tce & mask;
>>>> +        ret.addr_mask = ~mask;
>>>>            ret.perm = tce;
>>>>        }
>>>>        trace_spapr_iommu_xlate(tcet->liobn, addr, ret.iova, ret.perm,
>>>> @@ -113,7 +114,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
>>>>        if (kvm_enabled()) {
>>>>            tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
>>>>                                                  tcet->nb_table <<
>>>> -                                              SPAPR_TCE_PAGE_SHIFT,
>>>> +                                              tcet->page_shift,
>>>>                                                  &tcet->fd);
>>>>        }
>>>>    @@ -133,6 +134,7 @@ static int spapr_tce_table_realize(DeviceState
>>>> *dev)
>>>>    }
>>>>      sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t
>>>> liobn,
>>>> +                                   uint32_t page_shift,
>>>>                                       uint32_t nb_table)
>>>>    {
>>>>        sPAPRTCETable *tcet;
>>>> @@ -149,6 +151,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState
>>>> *owner, uint32_t liobn,
>>>>          tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
>>>>        tcet->liobn = liobn;
>>>> +    tcet->page_shift = page_shift;
>>>>        tcet->nb_table = nb_table;
>>>>          object_property_add_child(OBJECT(owner), "tce-table",
>>>> OBJECT(tcet), NULL);
>>>> @@ -194,19 +197,20 @@ static target_ulong put_tce_emu(sPAPRTCETable
>>>> *tcet, target_ulong ioba,
>>>>                                    target_ulong tce)
>>>>    {
>>>>        IOMMUTLBEntry entry;
>>>> +    target_ulong mask = ~((1 << tcet->page_shift) - 1);
>>>>    -    if ((ioba >> SPAPR_TCE_PAGE_SHIFT) >= tcet->nb_table) {
>>>> +    if ((ioba >> tcet->page_shift) >= tcet->nb_table) {
>>>>            hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
>>>>                          TARGET_FMT_lx "\n", ioba);
>>>>            return H_PARAMETER;
>>>>        }
>>>>    -    tcet->table[ioba >> SPAPR_TCE_PAGE_SHIFT] = tce;
>>>> +    tcet->table[ioba >> tcet->page_shift] = tce;
>>>>          entry.target_as = &address_space_memory,
>>>> -    entry.iova = ioba & ~SPAPR_TCE_PAGE_MASK;
>>>> -    entry.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
>>>> -    entry.addr_mask = SPAPR_TCE_PAGE_MASK;
>>>> +    entry.iova = ioba & mask;
>>>> +    entry.translated_addr = tce & mask;
>>>> +    entry.addr_mask = ~mask;
>>>>        entry.perm = tce;
>>>>        memory_region_notify_iommu(&tcet->iommu, entry);
>>>>    @@ -226,6 +230,7 @@ static target_ulong h_put_tce_indirect(PowerPCCPU
>>>> *cpu,
>>>>        target_ulong ret = H_PARAMETER;
>>>>        sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
>>>>        CPUState *cs = CPU(cpu);
>>>> +    target_ulong mask;
>>>>          if (!tcet) {
>>>>            return H_PARAMETER;
>>>> @@ -235,12 +240,14 @@ static target_ulong h_put_tce_indirect(PowerPCCPU
>>>> *cpu,
>>>>            return H_PARAMETER;
>>>>        }
>>>>    -    ioba &= ~SPAPR_TCE_PAGE_MASK;
>>>> -    tce_list &= ~SPAPR_TCE_PAGE_MASK;
>>>> +    mask = ~((1 << tcet->page_shift) - 1);
>>>> +    ioba &= mask;
>>>> +
>>>> +    for (i = 0; i < npages; ++i, ioba += (1 << tcet->page_shift)) {
>>>> +        target_ulong off = (tce_list & ~SPAPR_TCE_RW) +
>>>> +                                i * sizeof(target_ulong);
>>>> +        target_ulong tce = ldq_phys(cs->as, off);
>>>>    -    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
>>>> -        target_ulong tce = ldq_phys(cs->as, tce_list +
>>>> -                                    i * sizeof(target_ulong));
>>>>            ret = put_tce_emu(tcet, ioba, tce);
>>>>            if (ret) {
>>>>                break;
>>>> @@ -276,9 +283,9 @@ static target_ulong h_stuff_tce(PowerPCCPU *cpu,
>>>> sPAPREnvironment *spapr,
>>>>            return H_PARAMETER;
>>>>        }
>>>>    -    ioba &= ~SPAPR_TCE_PAGE_MASK;
>>>> +    ioba &= ~((1 << tcet->page_shift) - 1);
>>>>    -    for (i = 0; i < npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
>>>> +    for (i = 0; i < npages; ++i, ioba += (1 << tcet->page_shift)) {
>>>>            ret = put_tce_emu(tcet, ioba, tce_value);
>>>>            if (ret) {
>>>>                break;
>>>> @@ -298,7 +305,7 @@ static target_ulong h_put_tce(PowerPCCPU *cpu,
>>>> sPAPREnvironment *spapr,
>>>>        target_ulong ret = H_PARAMETER;
>>>>        sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
>>>>    -    ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
>>>> +    ioba &= ~((1 << tcet->page_shift) - 1);
>>>>          if (tcet) {
>>>>            ret = put_tce_emu(tcet, ioba, tce);
>>>> @@ -311,13 +318,13 @@ static target_ulong h_put_tce(PowerPCCPU *cpu,
>>>> sPAPREnvironment *spapr,
>>>>    static target_ulong get_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
>>>>                                    target_ulong *tce)
>>>>    {
>>>> -    if ((ioba >> SPAPR_TCE_PAGE_SHIFT) >= tcet->nb_table) {
>>>> +    if ((ioba >> tcet->page_shift) >= tcet->nb_table) {
>>>>            hcall_dprintf("spapr_iommu_get_tce on out-of-bounds IOBA 0x"
>>>>                          TARGET_FMT_lx "\n", ioba);
>>>>            return H_PARAMETER;
>>>>        }
>>>>    -    *tce = tcet->table[ioba >> SPAPR_TCE_PAGE_SHIFT];
>>>> +    *tce = tcet->table[ioba >> tcet->page_shift];
>>>>          return H_SUCCESS;
>>>>    }
>>>> @@ -330,8 +337,9 @@ static target_ulong h_get_tce(PowerPCCPU *cpu,
>>>> sPAPREnvironment *spapr,
>>>>        target_ulong tce = 0;
>>>>        target_ulong ret = H_PARAMETER;
>>>>        sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
>>>> +    const target_ulong mask = ~((1 << tcet->page_shift) - 1);
>>>>    -    ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
>>>> +    ioba &= mask;
>>>>          if (tcet) {
>>>>            ret = get_tce_emu(tcet, ioba, &tce);
>>>> @@ -382,7 +390,7 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const
>>>> char *propname,
>>>>        }
>>>>          return spapr_dma_dt(fdt, node_off, propname,
>>>> -                        tcet->liobn, 0, tcet->nb_table <<
>>>> SPAPR_TCE_PAGE_SHIFT);
>>>> +                        tcet->liobn, 0, tcet->nb_table <<
>>>> tcet->page_shift);
>>>>    }
>>>>      static void spapr_tce_table_class_init(ObjectClass *klass, void
>>>> *data)
>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>> index fdd4c07..c9850d4 100644
>>>> --- a/hw/ppc/spapr_pci.c
>>>> +++ b/hw/ppc/spapr_pci.c
>>>> @@ -656,6 +656,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState
>>>> *sphb, Error **errp)
>>>>        sPAPRTCETable *tcet;
>>>>          tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>>>> +                               SPAPR_TCE_PAGE_SHIFT,
>>>>                                   0x40000000 >> SPAPR_TCE_PAGE_SHIFT);
>>>>        if (!tcet) {
>>>>            error_setg(errp, "Unable to create TCE table for %s",
>>>> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
>>>> index b84e481..d7e9e6a 100644
>>>> --- a/hw/ppc/spapr_vio.c
>>>> +++ b/hw/ppc/spapr_vio.c
>>>> @@ -457,6 +457,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
>>>>        if (pc->rtce_window_size) {
>>>>            uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
>>>>            dev->tcet = spapr_tce_new_table(qdev, liobn,
>>>> +                                        SPAPR_TCE_PAGE_SHIFT,
>>> I don't quite understand who defines what the TCE page size is for a given
>>> device. Can you try to explain this to me?
>> If it is default window (for PCI) or window for VIO - it is 4K. If it is a
>> dynamic DMA window - page size is a parameter of RTAS call which creates
>> the window.
> 
> Could we change that default size for non-dynamic windows somehow? 4k is
> really fine grained.

No, this is hardcoded in a million places and old distros. Normally it is
maximum 1GB or 2MB table, not too bad. And with DDW support, we can make
the default one really small (64MB?) and lose even less.

SPAPR:
R1–7.3.31–4. For the Dynamic DMA Windows (DDW) option: The platform must
provide a default DMA window
for each PE, and all of the following must be true:
a. The window is defined by the “ibm,dma-window” property in the OF device
tree.
b. The window is defined with 4 KB I/O pages.
c. The window is located entirely below 4 GB.


> 
> Since the KVM in-kernel TCE code really is just a dumb memory poker without
> checks, I guess we're fine on that side with dynamic page sizes.

Yep.



-- 
Alexey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift in sPAPRTCETable
  2014-05-22 10:24         ` Alexey Kardashevskiy
@ 2014-05-22 10:45           ` Alexander Graf
  2014-05-22 10:46             ` Alexey Kardashevskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Graf @ 2014-05-22 10:45 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 22.05.14 12:24, Alexey Kardashevskiy wrote:
> On 05/22/2014 08:09 PM, Alexander Graf wrote:
>> On 22.05.14 01:45, Alexey Kardashevskiy wrote:
>>> On 05/22/2014 08:11 AM, Alexander Graf wrote:
>>>> On 21.05.14 16:21, Alexey Kardashevskiy wrote:
>>>>> At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
>>>>> spec allows other page sizes and we are going to implement them, we need
>>>>> page size to be configrable.
>>>>>
>>>>> This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT
>>>>> with it whereever it is possible.
>>>>>
>>>>> This removes SPAPR_TCE_PAGE_MASK as it is no longer used.
>>>>>
>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

[...]

>>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>>> index fdd4c07..c9850d4 100644
>>>>> --- a/hw/ppc/spapr_pci.c
>>>>> +++ b/hw/ppc/spapr_pci.c
>>>>> @@ -656,6 +656,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState
>>>>> *sphb, Error **errp)
>>>>>         sPAPRTCETable *tcet;
>>>>>           tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>>>>> +                               SPAPR_TCE_PAGE_SHIFT,
>>>>>                                    0x40000000 >> SPAPR_TCE_PAGE_SHIFT);
>>>>>         if (!tcet) {
>>>>>             error_setg(errp, "Unable to create TCE table for %s",
>>>>> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
>>>>> index b84e481..d7e9e6a 100644
>>>>> --- a/hw/ppc/spapr_vio.c
>>>>> +++ b/hw/ppc/spapr_vio.c
>>>>> @@ -457,6 +457,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
>>>>>         if (pc->rtce_window_size) {
>>>>>             uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
>>>>>             dev->tcet = spapr_tce_new_table(qdev, liobn,
>>>>> +                                        SPAPR_TCE_PAGE_SHIFT,
>>>> I don't quite understand who defines what the TCE page size is for a given
>>>> device. Can you try to explain this to me?
>>> If it is default window (for PCI) or window for VIO - it is 4K. If it is a
>>> dynamic DMA window - page size is a parameter of RTAS call which creates
>>> the window.
>> Could we change that default size for non-dynamic windows somehow? 4k is
>> really fine grained.
> No, this is hardcoded in a million places and old distros. Normally it is
> maximum 1GB or 2MB table, not too bad. And with DDW support, we can make
> the default one really small (64MB?) and lose even less.
>
> SPAPR:
> R1–7.3.31–4. For the Dynamic DMA Windows (DDW) option: The platform must
> provide a default DMA window
> for each PE, and all of the following must be true:
> a. The window is defined by the “ibm,dma-window” property in the OF device
> tree.
> b. The window is defined with 4 KB I/O pages.
> c. The window is located entirely below 4 GB.

Cool, would be interesting to see how that affects performance. I like 
most of this patch set btw, I think it should be good to go in in the 
next round.


Alex

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift in sPAPRTCETable
  2014-05-22 10:45           ` Alexander Graf
@ 2014-05-22 10:46             ` Alexey Kardashevskiy
  2014-05-22 10:48               ` Alexander Graf
  0 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-22 10:46 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc

On 05/22/2014 08:45 PM, Alexander Graf wrote:
> 
> On 22.05.14 12:24, Alexey Kardashevskiy wrote:
>> On 05/22/2014 08:09 PM, Alexander Graf wrote:
>>> On 22.05.14 01:45, Alexey Kardashevskiy wrote:
>>>> On 05/22/2014 08:11 AM, Alexander Graf wrote:
>>>>> On 21.05.14 16:21, Alexey Kardashevskiy wrote:
>>>>>> At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
>>>>>> spec allows other page sizes and we are going to implement them, we need
>>>>>> page size to be configrable.
>>>>>>
>>>>>> This adds @page_shift into sPAPRTCETable and replaces
>>>>>> SPAPR_TCE_PAGE_SHIFT
>>>>>> with it whereever it is possible.
>>>>>>
>>>>>> This removes SPAPR_TCE_PAGE_MASK as it is no longer used.
>>>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> [...]
> 
>>>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>>>> index fdd4c07..c9850d4 100644
>>>>>> --- a/hw/ppc/spapr_pci.c
>>>>>> +++ b/hw/ppc/spapr_pci.c
>>>>>> @@ -656,6 +656,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState
>>>>>> *sphb, Error **errp)
>>>>>>         sPAPRTCETable *tcet;
>>>>>>           tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>>>>>> +                               SPAPR_TCE_PAGE_SHIFT,
>>>>>>                                    0x40000000 >> SPAPR_TCE_PAGE_SHIFT);
>>>>>>         if (!tcet) {
>>>>>>             error_setg(errp, "Unable to create TCE table for %s",
>>>>>> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
>>>>>> index b84e481..d7e9e6a 100644
>>>>>> --- a/hw/ppc/spapr_vio.c
>>>>>> +++ b/hw/ppc/spapr_vio.c
>>>>>> @@ -457,6 +457,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
>>>>>>         if (pc->rtce_window_size) {
>>>>>>             uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
>>>>>>             dev->tcet = spapr_tce_new_table(qdev, liobn,
>>>>>> +                                        SPAPR_TCE_PAGE_SHIFT,
>>>>> I don't quite understand who defines what the TCE page size is for a
>>>>> given
>>>>> device. Can you try to explain this to me?
>>>> If it is default window (for PCI) or window for VIO - it is 4K. If it is a
>>>> dynamic DMA window - page size is a parameter of RTAS call which creates
>>>> the window.
>>> Could we change that default size for non-dynamic windows somehow? 4k is
>>> really fine grained.
>> No, this is hardcoded in a million places and old distros. Normally it is
>> maximum 1GB or 2MB table, not too bad. And with DDW support, we can make
>> the default one really small (64MB?) and lose even less.
>>
>> SPAPR:
>> R1–7.3.31–4. For the Dynamic DMA Windows (DDW) option: The platform must
>> provide a default DMA window
>> for each PE, and all of the following must be true:
>> a. The window is defined by the “ibm,dma-window” property in the OF device
>> tree.
>> b. The window is defined with 4 KB I/O pages.
>> c. The window is located entirely below 4 GB.
> 
> Cool, would be interesting to see how that affects performance. I like most
> of this patch set btw, I think it should be good to go in in the next round.


10Gbit ethernet is ok but 40Gbit seems to suffer from this limitation.



-- 
Alexey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2] spapr: Enable dynamic change of the supported hypercalls list
  2014-05-21 15:21     ` [Qemu-devel] [PATCH v2] " Alexey Kardashevskiy
@ 2014-05-22 10:47       ` Alexander Graf
  2014-05-22 11:01         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Graf @ 2014-05-22 10:47 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 21.05.14 17:21, Alexey Kardashevskiy wrote:
> At the moment the "ibm,hypertas-functions" list is fixed. However some
> calls should be listed there if they are supported by QEMU or the host
> kernel.
>
> This enables hyperrtas_prop to grow on stack by adding
> a SPAPR_HYPERRTAS_ADD macro. "qemu,hypertas-functions" is converted as well.
>
> The first user of this is going to be a "multi-tce" property.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * replaced alloca() with GString
> ---
>   hw/ppc/spapr.c | 30 +++++++++++++++++++++++-------
>   1 file changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0a61246..3b28211 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -293,6 +293,10 @@ static size_t create_page_sizes_prop(CPUPPCState *env, uint32_t *prop,
>           }                                                          \
>       } while (0)
>   
> +static inline void add_str(GString *s, const gchar *s1)

Please remove the "inline" :). Otherwise this looks a lot nicer than 
before :)


Alex

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift in sPAPRTCETable
  2014-05-22 10:46             ` Alexey Kardashevskiy
@ 2014-05-22 10:48               ` Alexander Graf
  2014-05-22 10:55                 ` Alexey Kardashevskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Graf @ 2014-05-22 10:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 22.05.14 12:46, Alexey Kardashevskiy wrote:
> On 05/22/2014 08:45 PM, Alexander Graf wrote:
>> On 22.05.14 12:24, Alexey Kardashevskiy wrote:
>>> On 05/22/2014 08:09 PM, Alexander Graf wrote:
>>>> On 22.05.14 01:45, Alexey Kardashevskiy wrote:
>>>>> On 05/22/2014 08:11 AM, Alexander Graf wrote:
>>>>>> On 21.05.14 16:21, Alexey Kardashevskiy wrote:
>>>>>>> At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
>>>>>>> spec allows other page sizes and we are going to implement them, we need
>>>>>>> page size to be configrable.
>>>>>>>
>>>>>>> This adds @page_shift into sPAPRTCETable and replaces
>>>>>>> SPAPR_TCE_PAGE_SHIFT
>>>>>>> with it whereever it is possible.
>>>>>>>
>>>>>>> This removes SPAPR_TCE_PAGE_MASK as it is no longer used.
>>>>>>>
>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> [...]
>>
>>>>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>>>>> index fdd4c07..c9850d4 100644
>>>>>>> --- a/hw/ppc/spapr_pci.c
>>>>>>> +++ b/hw/ppc/spapr_pci.c
>>>>>>> @@ -656,6 +656,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState
>>>>>>> *sphb, Error **errp)
>>>>>>>          sPAPRTCETable *tcet;
>>>>>>>            tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>>>>>>> +                               SPAPR_TCE_PAGE_SHIFT,
>>>>>>>                                     0x40000000 >> SPAPR_TCE_PAGE_SHIFT);
>>>>>>>          if (!tcet) {
>>>>>>>              error_setg(errp, "Unable to create TCE table for %s",
>>>>>>> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
>>>>>>> index b84e481..d7e9e6a 100644
>>>>>>> --- a/hw/ppc/spapr_vio.c
>>>>>>> +++ b/hw/ppc/spapr_vio.c
>>>>>>> @@ -457,6 +457,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
>>>>>>>          if (pc->rtce_window_size) {
>>>>>>>              uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
>>>>>>>              dev->tcet = spapr_tce_new_table(qdev, liobn,
>>>>>>> +                                        SPAPR_TCE_PAGE_SHIFT,
>>>>>> I don't quite understand who defines what the TCE page size is for a
>>>>>> given
>>>>>> device. Can you try to explain this to me?
>>>>> If it is default window (for PCI) or window for VIO - it is 4K. If it is a
>>>>> dynamic DMA window - page size is a parameter of RTAS call which creates
>>>>> the window.
>>>> Could we change that default size for non-dynamic windows somehow? 4k is
>>>> really fine grained.
>>> No, this is hardcoded in a million places and old distros. Normally it is
>>> maximum 1GB or 2MB table, not too bad. And with DDW support, we can make
>>> the default one really small (64MB?) and lose even less.
>>>
>>> SPAPR:
>>> R1–7.3.31–4. For the Dynamic DMA Windows (DDW) option: The platform must
>>> provide a default DMA window
>>> for each PE, and all of the following must be true:
>>> a. The window is defined by the “ibm,dma-window” property in the OF device
>>> tree.
>>> b. The window is defined with 4 KB I/O pages.
>>> c. The window is located entirely below 4 GB.
>> Cool, would be interesting to see how that affects performance. I like most
>> of this patch set btw, I think it should be good to go in in the next round.
>
> 10Gbit ethernet is ok but 40Gbit seems to suffer from this limitation.

Oh, I was implying that it'd be interesting to see how we fare when we 
do a really small default DMA window to force the guest to use DDW and 
preferably even with very large pages.


Alex

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift in sPAPRTCETable
  2014-05-22 10:48               ` Alexander Graf
@ 2014-05-22 10:55                 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-22 10:55 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc

On 05/22/2014 08:48 PM, Alexander Graf wrote:
> 
> On 22.05.14 12:46, Alexey Kardashevskiy wrote:
>> On 05/22/2014 08:45 PM, Alexander Graf wrote:
>>> On 22.05.14 12:24, Alexey Kardashevskiy wrote:
>>>> On 05/22/2014 08:09 PM, Alexander Graf wrote:
>>>>> On 22.05.14 01:45, Alexey Kardashevskiy wrote:
>>>>>> On 05/22/2014 08:11 AM, Alexander Graf wrote:
>>>>>>> On 21.05.14 16:21, Alexey Kardashevskiy wrote:
>>>>>>>> At the moment only 4K pages are supported by sPAPRTCETable. Since
>>>>>>>> sPAPR
>>>>>>>> spec allows other page sizes and we are going to implement them, we
>>>>>>>> need
>>>>>>>> page size to be configrable.
>>>>>>>>
>>>>>>>> This adds @page_shift into sPAPRTCETable and replaces
>>>>>>>> SPAPR_TCE_PAGE_SHIFT
>>>>>>>> with it whereever it is possible.
>>>>>>>>
>>>>>>>> This removes SPAPR_TCE_PAGE_MASK as it is no longer used.
>>>>>>>>
>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> [...]
>>>
>>>>>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>>>>>> index fdd4c07..c9850d4 100644
>>>>>>>> --- a/hw/ppc/spapr_pci.c
>>>>>>>> +++ b/hw/ppc/spapr_pci.c
>>>>>>>> @@ -656,6 +656,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState
>>>>>>>> *sphb, Error **errp)
>>>>>>>>          sPAPRTCETable *tcet;
>>>>>>>>            tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>>>>>>>> +                               SPAPR_TCE_PAGE_SHIFT,
>>>>>>>>                                     0x40000000 >>
>>>>>>>> SPAPR_TCE_PAGE_SHIFT);
>>>>>>>>          if (!tcet) {
>>>>>>>>              error_setg(errp, "Unable to create TCE table for %s",
>>>>>>>> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
>>>>>>>> index b84e481..d7e9e6a 100644
>>>>>>>> --- a/hw/ppc/spapr_vio.c
>>>>>>>> +++ b/hw/ppc/spapr_vio.c
>>>>>>>> @@ -457,6 +457,7 @@ static int spapr_vio_busdev_init(DeviceState
>>>>>>>> *qdev)
>>>>>>>>          if (pc->rtce_window_size) {
>>>>>>>>              uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
>>>>>>>>              dev->tcet = spapr_tce_new_table(qdev, liobn,
>>>>>>>> +                                        SPAPR_TCE_PAGE_SHIFT,
>>>>>>> I don't quite understand who defines what the TCE page size is for a
>>>>>>> given
>>>>>>> device. Can you try to explain this to me?
>>>>>> If it is default window (for PCI) or window for VIO - it is 4K. If it
>>>>>> is a
>>>>>> dynamic DMA window - page size is a parameter of RTAS call which creates
>>>>>> the window.
>>>>> Could we change that default size for non-dynamic windows somehow? 4k is
>>>>> really fine grained.
>>>> No, this is hardcoded in a million places and old distros. Normally it is
>>>> maximum 1GB or 2MB table, not too bad. And with DDW support, we can make
>>>> the default one really small (64MB?) and lose even less.
>>>>
>>>> SPAPR:
>>>> R1–7.3.31–4. For the Dynamic DMA Windows (DDW) option: The platform must
>>>> provide a default DMA window
>>>> for each PE, and all of the following must be true:
>>>> a. The window is defined by the “ibm,dma-window” property in the OF device
>>>> tree.
>>>> b. The window is defined with 4 KB I/O pages.
>>>> c. The window is located entirely below 4 GB.
>>> Cool, would be interesting to see how that affects performance. I like most
>>> of this patch set btw, I think it should be good to go in in the next
>>> round.
>>
>> 10Gbit ethernet is ok but 40Gbit seems to suffer from this limitation.
> 
> Oh, I was implying that it'd be interesting to see how we fare when we do a
> really small default DMA window to force the guest to use DDW and
> preferably even with very large pages.


Guest switches to DDW always if the feature is there. SPAPR DDW API even
allows deleting the default one if the platform supports "reset" (which I
do not implement for now).



-- 
Alexey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2] spapr: Enable dynamic change of the supported hypercalls list
  2014-05-22 10:47       ` Alexander Graf
@ 2014-05-22 11:01         ` Alexey Kardashevskiy
  2014-05-22 11:02           ` Alexander Graf
  0 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-22 11:01 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc

On 05/22/2014 08:47 PM, Alexander Graf wrote:
> 
> On 21.05.14 17:21, Alexey Kardashevskiy wrote:
>> At the moment the "ibm,hypertas-functions" list is fixed. However some
>> calls should be listed there if they are supported by QEMU or the host
>> kernel.
>>
>> This enables hyperrtas_prop to grow on stack by adding
>> a SPAPR_HYPERRTAS_ADD macro. "qemu,hypertas-functions" is converted as well.
>>
>> The first user of this is going to be a "multi-tce" property.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> Changes:
>> v2:
>> * replaced alloca() with GString
>> ---
>>   hw/ppc/spapr.c | 30 +++++++++++++++++++++++-------
>>   1 file changed, 23 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 0a61246..3b28211 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -293,6 +293,10 @@ static size_t create_page_sizes_prop(CPUPPCState
>> *env, uint32_t *prop,
>>           }                                                          \
>>       } while (0)
>>   +static inline void add_str(GString *s, const gchar *s1)
> 
> Please remove the "inline" :). Otherwise this looks a lot nicer than before :)

Good :) What now? Have you finished with this set and I can repost it?


-- 
Alexey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2] spapr: Enable dynamic change of the supported hypercalls list
  2014-05-22 11:01         ` Alexey Kardashevskiy
@ 2014-05-22 11:02           ` Alexander Graf
  0 siblings, 0 replies; 30+ messages in thread
From: Alexander Graf @ 2014-05-22 11:02 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 22.05.14 13:01, Alexey Kardashevskiy wrote:
> On 05/22/2014 08:47 PM, Alexander Graf wrote:
>> On 21.05.14 17:21, Alexey Kardashevskiy wrote:
>>> At the moment the "ibm,hypertas-functions" list is fixed. However some
>>> calls should be listed there if they are supported by QEMU or the host
>>> kernel.
>>>
>>> This enables hyperrtas_prop to grow on stack by adding
>>> a SPAPR_HYPERRTAS_ADD macro. "qemu,hypertas-functions" is converted as well.
>>>
>>> The first user of this is going to be a "multi-tce" property.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>> Changes:
>>> v2:
>>> * replaced alloca() with GString
>>> ---
>>>    hw/ppc/spapr.c | 30 +++++++++++++++++++++++-------
>>>    1 file changed, 23 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 0a61246..3b28211 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -293,6 +293,10 @@ static size_t create_page_sizes_prop(CPUPPCState
>>> *env, uint32_t *prop,
>>>            }                                                          \
>>>        } while (0)
>>>    +static inline void add_str(GString *s, const gchar *s1)
>> Please remove the "inline" :). Otherwise this looks a lot nicer than before :)
> Good :) What now? Have you finished with this set and I can repost it?

Yup


Alex

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-05-22 11:02 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-21 14:21 [Qemu-devel] [PATCH 0/9] spapr_pci: Prepare for VFIO Alexey Kardashevskiy
2014-05-21 14:21 ` [Qemu-devel] [PATCH 1/9] spapr: Enable dynamic change of the supported hypercalls list Alexey Kardashevskiy
2014-05-21 14:26   ` Alexander Graf
2014-05-21 15:21     ` [Qemu-devel] [PATCH v2] " Alexey Kardashevskiy
2014-05-22 10:47       ` Alexander Graf
2014-05-22 11:01         ` Alexey Kardashevskiy
2014-05-22 11:02           ` Alexander Graf
2014-05-21 14:21 ` [Qemu-devel] [PATCH 2/9] spapr_iommu: Enable multiple TCE requests Alexey Kardashevskiy
2014-05-21 14:37   ` Alexander Graf
2014-05-21 15:23     ` [Qemu-devel] [PATCH v2] " Alexey Kardashevskiy
2014-05-21 16:03       ` Alexey Kardashevskiy
2014-05-21 21:54         ` Alexander Graf
2014-05-21 14:21 ` [Qemu-devel] [PATCH 3/9] spapr_pci: Introduce a finish_realize() callback Alexey Kardashevskiy
2014-05-21 14:21 ` [Qemu-devel] [PATCH 4/9] spapr_pci: spapr_iommu: Make DMA window a subregion Alexey Kardashevskiy
2014-05-21 14:21 ` [Qemu-devel] [PATCH 5/9] spapr_pci: Allow multiple TCE tables per PHB Alexey Kardashevskiy
2014-05-21 14:21 ` [Qemu-devel] [PATCH 6/9] spapr_iommu: Convert old qdev_init_nofail() to object_property_set_bool Alexey Kardashevskiy
2014-05-21 14:21 ` [Qemu-devel] [PATCH 7/9] spapr_iommu: Get rid of window_size in sPAPRTCETable Alexey Kardashevskiy
2014-05-21 22:05   ` Alexander Graf
2014-05-21 14:21 ` [Qemu-devel] [PATCH 8/9] spapr_iommu: Introduce page_shift " Alexey Kardashevskiy
2014-05-21 22:11   ` Alexander Graf
2014-05-21 23:45     ` Alexey Kardashevskiy
2014-05-22 10:09       ` Alexander Graf
2014-05-22 10:24         ` Alexey Kardashevskiy
2014-05-22 10:45           ` Alexander Graf
2014-05-22 10:46             ` Alexey Kardashevskiy
2014-05-22 10:48               ` Alexander Graf
2014-05-22 10:55                 ` Alexey Kardashevskiy
2014-05-22  4:25     ` Alexey Kardashevskiy
2014-05-22  7:11       ` Alexander Graf
2014-05-21 14:21 ` [Qemu-devel] [PATCH 9/9] spapr_iommu: Introduce bus_offset " Alexey Kardashevskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.