[Qemu-devel] [0/3] pseries: Support and improvements for KVM Book3S-HV support

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [0/3] pseries: Support and improvements for KVM Book3S-HV support
@ 2011-09-29  6:45 David Gibson
  2011-09-29  6:45 ` [Qemu-devel] [PATCH 1/3] pseries: Support SMT systems for KVM Book3S-HV David Gibson
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: David Gibson @ 2011-09-29  6:45 UTC (permalink / raw)
  To: agraf; +Cc: qemu-devel

Alex Graf has added support for KVM acceleration of the pseries
machine, using his Book3S-PR KVM variant, which runs the guest in
userspace, emulating supervisor operations.  Recent kernels now have
the Book3S-HV KVM variant which uses the hardware hypervisor features
of recent POWER CPUs.  Alex's changes to qemu are enough to get qemu
working roughly with Book3S-HV, but taking full advantage of this mode
needs more work.  This patch series makes a start on better exploiting
Book3S-HV.

Even with these patches, qemu won't quite be able to run on a current
Book3S-HV KVM kernel.  That's because current Book3S-HV requires guest
memory to be backed by hugepages, but qemu refuses to use hugepages
for guest memory unless KVM advertises CAP_SYNC_MMU, which Book3S-HV
does not currently do.  We're working on improvements to the KVM code
which will implement CAP_SYNC_MMU and allow smallpage backing of
guests, but they're not there yet.  So, in order to test Book3S-HV for
now you need to either:

 * Hack the host kernel to lie and advertise CAP_SYNC_MMU even though
   it doesn't really implement it.

or

 * Hack qemu so it does not check for CAP_SYNC_MMU when the -mem-path
   option is used.

Bot approaches are ugly and unsafe, but it seems we can generally get
away with it in practice.  Obviously this is only an interim hack
until the proper CAP_SYNC_MMU support is ready.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH 1/3] pseries: Support SMT systems for KVM Book3S-HV
  2011-09-29  6:45 [Qemu-devel] [0/3] pseries: Support and improvements for KVM Book3S-HV support David Gibson
@ 2011-09-29  6:45 ` David Gibson
  2011-09-29  7:27   ` Jan Kiszka
  2011-09-29 13:17   ` Alexander Graf
  2011-09-29  6:45 ` [Qemu-devel] [PATCH 2/3] pseries: Allow KVM Book3S-HV on PPC970 CPUS David Gibson
  2011-09-29  6:45 ` [Qemu-devel] [PATCH 3/3] pseries: Use Book3S-HV TCE acceleration capabilities David Gibson
  2 siblings, 2 replies; 11+ messages in thread
From: David Gibson @ 2011-09-29  6:45 UTC (permalink / raw)
  To: agraf; +Cc: qemu-devel

Alex Graf has already made qemu support KVM for the pseries machine
when using the Book3S-PR KVM variant (which runs the guest in
usermode, emulating supervisor operations).  This code allows gets us
very close to also working with KVM Book3S-HV (using the hypervisor
capabilities of recent POWER CPUs).

This patch moves us another step towards Book3S-HV support by
correctly handling SMT (multithreaded) POWER CPUs.  There are two
parts to this:

 * Querying KVM to check SMT capability, and if present, adjusting the
   cpu numbers that qemu assigns to cause KVM to assign guest threads
   to cores in the right way (this isn't automatic, because the POWER
   HV support has a limitation that different threads on a single core
   cannot be in different guests at the same time).

 * Correctly informing the guest OS of the SMT thread to core mappings
   via the device tree.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/spapr.c           |   23 ++++++++++++++++++++---
 target-ppc/helper.c  |   11 +++++++++++
 target-ppc/kvm.c     |   10 ++++++++++
 target-ppc/kvm_ppc.h |    6 ++++++
 4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index b118975..ba9ae1c 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -29,6 +29,9 @@
 #include "elf.h"
 #include "net.h"
 #include "blockdev.h"
+#include "cpus.h"
+#include "kvm.h"
+#include "kvm_ppc.h"
 
 #include "hw/boards.h"
 #include "hw/ppc.h"
@@ -103,6 +106,7 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
     int i;
     char *modelname;
+    int smt = kvmppc_smt_threads();
 
 #define _FDT(exp) \
     do { \
@@ -162,13 +166,17 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
 
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
         int index = env->cpu_index;
-        uint32_t gserver_prop[] = {cpu_to_be32(index), 0}; /* HACK! */
+        uint32_t servers_prop[smp_threads];
+        uint32_t gservers_prop[smp_threads * 2];
         char *nodename;
         uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
                            0xffffffff, 0xffffffff};
         uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq() : TIMEBASE_FREQ;
         uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 1000000000;
 
+        if ((index % smt) != 0)
+            continue;
+
         if (asprintf(&nodename, "%s@%x", modelname, index) < 0) {
             fprintf(stderr, "Allocation failure\n");
             exit(1);
@@ -193,9 +201,18 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
                            pft_size_prop, sizeof(pft_size_prop))));
         _FDT((fdt_property_string(fdt, "status", "okay")));
         _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
-        _FDT((fdt_property_cell(fdt, "ibm,ppc-interrupt-server#s", index)));
+
+        /* Build interrupt servers and gservers properties */
+        for (i = 0; i < smp_threads; i++) {
+            servers_prop[i] = cpu_to_be32(index + i);
+            /* Hack, direct the group queues back to cpu 0 */
+            gservers_prop[i*2] = cpu_to_be32(index + i);
+            gservers_prop[i*2 + 1] = 0;
+        }
+        _FDT((fdt_property(fdt, "ibm,ppc-interrupt-server#s",
+                           servers_prop, sizeof(servers_prop))));
         _FDT((fdt_property(fdt, "ibm,ppc-interrupt-gserver#s",
-                           gserver_prop, sizeof(gserver_prop))));
+                           gservers_prop, sizeof(gservers_prop))));
 
         if (env->mmu_model & POWERPC_MMU_1TSEG) {
             _FDT((fdt_property(fdt, "ibm,processor-segment-sizes",
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 6339be3..137a494 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -26,6 +26,8 @@
 #include "helper_regs.h"
 #include "qemu-common.h"
 #include "kvm.h"
+#include "kvm_ppc.h"
+#include "cpus.h"
 
 //#define DEBUG_MMU
 //#define DEBUG_BATS
@@ -3189,6 +3191,15 @@ CPUPPCState *cpu_ppc_init (const char *cpu_model)
     if (tcg_enabled()) {
         ppc_translate_init();
     }
+    /* Adjust cpu index for SMT */
+#if !defined(CONFIG_USER_ONLY)
+    if (kvm_enabled()) {
+        int smt = kvmppc_smt_threads();
+
+        env->cpu_index = (env->cpu_index / smp_threads)*smt
+            + (env->cpu_index % smp_threads);
+    }
+#endif /* !CONFIG_USER_ONLY */
     env->cpu_model_str = cpu_model;
     cpu_ppc_register_internal(env, def);
 
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 35a6f10..2c1bc7a 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -28,6 +28,7 @@
 #include "kvm_ppc.h"
 #include "cpu.h"
 #include "device_tree.h"
+#include "hw/spapr.h"
 
 #include "hw/sysbus.h"
 #include "hw/spapr.h"
@@ -53,6 +54,7 @@ static int cap_interrupt_unset = false;
 static int cap_interrupt_level = false;
 static int cap_segstate;
 static int cap_booke_sregs;
+static int cap_ppc_smt = 0;
 
 /* XXX We have a race condition where we actually have a level triggered
  *     interrupt, but the infrastructure can't expose that yet, so the guest
@@ -76,6 +78,9 @@ int kvm_arch_init(KVMState *s)
     cap_interrupt_level = kvm_check_extension(s, KVM_CAP_PPC_IRQ_LEVEL);
     cap_segstate = kvm_check_extension(s, KVM_CAP_PPC_SEGSTATE);
     cap_booke_sregs = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_SREGS);
+#ifdef KVM_CAP_PPC_SMT
+    cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
+#endif
 
     if (!cap_interrupt_level) {
         fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
@@ -731,6 +736,11 @@ fail:
     cpu_abort(env, "This KVM version does not support PAPR\n");
 }
 
+int kvmppc_smt_threads(void)
+{
+    return cap_ppc_smt ? cap_ppc_smt : 1;
+}
+
 bool kvm_arch_stop_on_emulation_error(CPUState *env)
 {
     return true;
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index c484e60..c298411 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -18,6 +18,7 @@ uint64_t kvmppc_get_clockfreq(void);
 int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len);
 int kvmppc_set_interrupt(CPUState *env, int irq, int level);
 void kvmppc_set_papr(CPUState *env);
+int kvmppc_smt_threads(void);
 
 #else
 
@@ -45,6 +46,11 @@ static inline void kvmppc_set_papr(CPUState *env)
 {
 }
 
+static inline int kvmppc_smt_threads(void)
+{
+    return 1;
+}
+
 #endif
 
 #ifndef CONFIG_KVM
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH 2/3] pseries: Allow KVM Book3S-HV on PPC970 CPUS
  2011-09-29  6:45 [Qemu-devel] [0/3] pseries: Support and improvements for KVM Book3S-HV support David Gibson
  2011-09-29  6:45 ` [Qemu-devel] [PATCH 1/3] pseries: Support SMT systems for KVM Book3S-HV David Gibson
@ 2011-09-29  6:45 ` David Gibson
  2011-09-29 13:25   ` Alexander Graf
  2011-09-29  6:45 ` [Qemu-devel] [PATCH 3/3] pseries: Use Book3S-HV TCE acceleration capabilities David Gibson
  2 siblings, 1 reply; 11+ messages in thread
From: David Gibson @ 2011-09-29  6:45 UTC (permalink / raw)
  To: agraf; +Cc: qemu-devel

At present, using the hypervisor aware Book3S-HV KVM will only work
with qemu on POWER7 CPUs.  PPC970 CPUs also have hypervisor
capability, but they lack the VRMA feature which makes assigning guest
memory easier.

In order to allow KVM Book3S-HV on PPC970, we need to specially
allocate the first chunk of guest memory (the "Real Mode Area" or
RMA), so that it is physically contiguous.

Sufficiently recent host kernels allow such contiguous RMAs to be
allocated, with a kvm capability advertising whether the feature is
available and/or necessary on this hardware.  This patch enables qemu
to use this support, thus allowing kvm acceleration of pseries qemu
machines on PPC970 hardware.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/spapr.c           |   50 ++++++++++++++++++++++++++++++++++++++++--------
 target-ppc/kvm.c     |   51 ++++++++++++++++++++++++++++++++++++++++++++++++++
 target-ppc/kvm_ppc.h |    6 +++++
 3 files changed, 98 insertions(+), 9 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index ba9ae1c..d51425a 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -89,6 +89,7 @@ qemu_irq spapr_allocate_irq(uint32_t hint, uint32_t *irq_num)
 }
 
 static void *spapr_create_fdt_skel(const char *cpu_model,
+                                   target_phys_addr_t rma_size,
                                    target_phys_addr_t initrd_base,
                                    target_phys_addr_t initrd_size,
                                    const char *boot_device,
@@ -97,7 +98,9 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
 {
     void *fdt;
     CPUState *env;
-    uint64_t mem_reg_property[] = { 0, cpu_to_be64(ram_size) };
+    uint64_t mem_reg_property_rma[] = { 0, cpu_to_be64(rma_size) };
+    uint64_t mem_reg_property_nonrma[] = { cpu_to_be64(rma_size),
+                                           cpu_to_be64(ram_size - rma_size) };
     uint32_t start_prop = cpu_to_be32(initrd_base);
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
     uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
@@ -143,15 +146,25 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
 
     _FDT((fdt_end_node(fdt)));
 
-    /* memory node */
+    /* memory node(s) */
     _FDT((fdt_begin_node(fdt, "memory@0")));
 
     _FDT((fdt_property_string(fdt, "device_type", "memory")));
-    _FDT((fdt_property(fdt, "reg",
-                       mem_reg_property, sizeof(mem_reg_property))));
-
+    _FDT((fdt_property(fdt, "reg", mem_reg_property_rma,
+                       sizeof(mem_reg_property_rma))));
     _FDT((fdt_end_node(fdt)));
 
+    if (ram_size > rma_size) {
+        char mem_name[32];
+
+	sprintf(mem_name, "memory@%" PRIx64, (uint64_t)rma_size);
+	_FDT((fdt_begin_node(fdt, mem_name)));
+	_FDT((fdt_property_string(fdt, "device_type", "memory")));
+        _FDT((fdt_property(fdt, "reg", mem_reg_property_nonrma,
+                           sizeof(mem_reg_property_nonrma))));
+        _FDT((fdt_end_node(fdt)));
+    }        
+
     /* cpus */
     _FDT((fdt_begin_node(fdt, "cpus")));
 
@@ -341,6 +354,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 {
     CPUState *env;
     int i;
+    target_phys_addr_t rma_alloc_size, rma_size;
     ram_addr_t ram_offset;
     uint32_t initrd_base;
     long kernel_size, initrd_size, fw_size;
@@ -350,10 +364,23 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     spapr = g_malloc(sizeof(*spapr));
     cpu_ppc_hypercall = emulate_spapr_hypercall;
 
+    /* Allocate RMA if necessary */
+    rma_alloc_size = kvmppc_alloc_rma("ppc_spapr.rma");
+
+    if (rma_alloc_size == -1) {
+        hw_error("qemu: Unable to create RMA\n");
+        exit(1);
+    }
+    if (rma_alloc_size && (rma_alloc_size < ram_size)) {
+        rma_size = rma_alloc_size;
+    } else {
+        rma_size = ram_size;
+    }
+
     /* We place the device tree just below either the top of RAM, or
      * 2GB, so that it can be processed with 32-bit code if
      * necessary */
-    spapr->fdt_addr = MIN(ram_size, 0x80000000) - FDT_MAX_SIZE;
+    spapr->fdt_addr = MIN(rma_size, 0x80000000) - FDT_MAX_SIZE;
     spapr->rtas_addr = spapr->fdt_addr - RTAS_MAX_SIZE;
 
     /* init CPUs */
@@ -378,8 +405,13 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 
     /* allocate RAM */
     spapr->ram_limit = ram_size;
-    ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", spapr->ram_limit);
-    cpu_register_physical_memory(0, ram_size, ram_offset);
+    if (spapr->ram_limit > rma_alloc_size) {
+        ram_addr_t nonrma_base = rma_alloc_size;
+        ram_addr_t nonrma_size = spapr->ram_limit - rma_alloc_size;
+
+        ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", nonrma_size);
+        cpu_register_physical_memory(nonrma_base, nonrma_size, ram_offset);
+    }
 
     /* allocate hash page table.  For now we always make this 16mb,
      * later we should probably make it scale to the size of guest
@@ -503,7 +535,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     }
 
     /* Prepare the device tree */
-    spapr->fdt_skel = spapr_create_fdt_skel(cpu_model,
+    spapr->fdt_skel = spapr_create_fdt_skel(cpu_model, rma_size,
                                             initrd_base, initrd_size,
                                             boot_device, kernel_cmdline,
                                             pteg_shift + 7);
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 2c1bc7a..37ee902 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -55,6 +55,9 @@ static int cap_interrupt_level = false;
 static int cap_segstate;
 static int cap_booke_sregs;
 static int cap_ppc_smt = 0;
+#ifdef KVM_CAP_PPC_RMA
+static int cap_ppc_rma = 0;
+#endif
 
 /* XXX We have a race condition where we actually have a level triggered
  *     interrupt, but the infrastructure can't expose that yet, so the guest
@@ -81,6 +84,9 @@ int kvm_arch_init(KVMState *s)
 #ifdef KVM_CAP_PPC_SMT
     cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
 #endif
+#ifdef KVM_CAP_PPC_RMA
+    cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
+#endif
 
     if (!cap_interrupt_level) {
         fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
@@ -741,6 +747,51 @@ int kvmppc_smt_threads(void)
     return cap_ppc_smt ? cap_ppc_smt : 1;
 }
 
+off_t kvmppc_alloc_rma(const char *name)
+{
+#ifndef KVM_CAP_PPC_RMA
+    return 0;
+#else
+    void *rma;
+    ram_addr_t rma_offset;
+    off_t size;
+    int fd;
+    struct kvm_allocate_rma ret;
+
+    /* If cap_ppc_rma == 0, contiguous RMA allocation is not supported
+     * if cap_ppc_rma == 1, contiguous RMA allocation is supported, but
+     *                      not necessary on this hardware
+     * if cap_ppc_rma == 2, contiguous RMA allocation is needed on this hardware
+     *
+     * FIXME: We should allow the user to force contiguous RMA
+     * allocation in the cap_ppc_rma==1 case.
+     */
+    if (cap_ppc_rma < 2) {
+        return 0;
+    }
+
+    fd = kvm_vm_ioctl(kvm_state, KVM_ALLOCATE_RMA, &ret);
+    if (fd < 0) {
+        fprintf(stderr, "KVM: Error on KVM_ALLOCATE_RMA: %s\n",
+                strerror(errno));
+        return -1;
+    }
+
+    size = MIN(ret.rma_size, 256ul << 20);
+
+    rma = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+    if (rma == MAP_FAILED) {
+        fprintf(stderr, "KVM: Error mapping RMA: %s\n", strerror(errno));
+        return -1;
+    };
+
+    rma_offset = qemu_ram_alloc_from_ptr(NULL, name, size, rma);
+    cpu_register_physical_memory(0, size, rma_offset);
+
+    return size;
+#endif
+}
+
 bool kvm_arch_stop_on_emulation_error(CPUState *env)
 {
     return true;
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index c298411..ad9903c 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -19,6 +19,7 @@ int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len);
 int kvmppc_set_interrupt(CPUState *env, int irq, int level);
 void kvmppc_set_papr(CPUState *env);
 int kvmppc_smt_threads(void);
+off_t kvmppc_alloc_rma(const char *name);
 
 #else
 
@@ -51,6 +52,11 @@ static inline int kvmppc_smt_threads(void)
     return 1;
 }
 
+static inline off_t kvmppc_alloc_rma(const char *name)
+{
+    return 0;
+}
+
 #endif
 
 #ifndef CONFIG_KVM
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH 3/3] pseries: Use Book3S-HV TCE acceleration capabilities
  2011-09-29  6:45 [Qemu-devel] [0/3] pseries: Support and improvements for KVM Book3S-HV support David Gibson
  2011-09-29  6:45 ` [Qemu-devel] [PATCH 1/3] pseries: Support SMT systems for KVM Book3S-HV David Gibson
  2011-09-29  6:45 ` [Qemu-devel] [PATCH 2/3] pseries: Allow KVM Book3S-HV on PPC970 CPUS David Gibson
@ 2011-09-29  6:45 ` David Gibson
  2011-09-29  7:27   ` Jan Kiszka
  2011-09-29 13:29   ` Alexander Graf
  2 siblings, 2 replies; 11+ messages in thread
From: David Gibson @ 2011-09-29  6:45 UTC (permalink / raw)
  To: agraf; +Cc: qemu-devel

The pseries machine of qemu implements the TCE mechanism used as a
virtual IOMMU for the PAPR defined virtual IO devices.  Because the
PAPR spec only defines a small DMA address space, the guest VIO
drivers need to update TCE mappings very frequently - the virtual
network device is particularly bad.  This means many slow exits to
qemu to emulate the H_PUT_TCE hypercall.

Sufficiently recent kernels allow this to be mitigated by implementing
H_PUT_TCE in the host kernel.  To make use of this, however, qemu
needs to initialize the necessary TCE tables, and map them into itself
so that the VIO device implementations can retrieve the mappings when
they access guest memory (which is treated as a virtual DMA
operation).

This patch adds the necessary calls to use the KVM TCE acceleration.
If the kernel does not support acceleration, or there is some other
error creating the accelerated TCE table, then it will still fall back
to full userspace TCE implementation.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/spapr_vio.c       |    8 ++++++-
 hw/spapr_vio.h       |    1 +
 target-ppc/kvm.c     |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++
 target-ppc/kvm_ppc.h |   14 +++++++++++++
 4 files changed, 76 insertions(+), 1 deletions(-)

diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
index 35818e1..1da3032 100644
--- a/hw/spapr_vio.c
+++ b/hw/spapr_vio.c
@@ -165,7 +165,13 @@ static void rtce_init(VIOsPAPRDevice *dev)
         * sizeof(VIOsPAPR_RTCE);
 
     if (size) {
-        dev->rtce_table = g_malloc0(size);
+        dev->rtce_table = kvmppc_create_spapr_tce(dev->reg,
+                                                  dev->rtce_window_size,
+                                                  &dev->kvmtce_fd);
+
+        if (!dev->rtce_table) {
+            dev->rtce_table = g_malloc0(size);
+        }
     }
 }
 
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
index 4fe5f74..a325a5f 100644
--- a/hw/spapr_vio.h
+++ b/hw/spapr_vio.h
@@ -57,6 +57,7 @@ typedef struct VIOsPAPRDevice {
     target_ulong signal_state;
     uint32_t rtce_window_size;
     VIOsPAPR_RTCE *rtce_table;
+    int kvmtce_fd;
     VIOsPAPR_CRQ crq;
 } VIOsPAPRDevice;
 
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 37ee902..866cf7f 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -28,6 +28,7 @@
 #include "kvm_ppc.h"
 #include "cpu.h"
 #include "device_tree.h"
+#include "hw/sysbus.h"
 #include "hw/spapr.h"
 
 #include "hw/sysbus.h"
@@ -58,6 +59,7 @@ static int cap_ppc_smt = 0;
 #ifdef KVM_CAP_PPC_RMA
 static int cap_ppc_rma = 0;
 #endif
+static int cap_spapr_tce = false;
 
 /* XXX We have a race condition where we actually have a level triggered
  *     interrupt, but the infrastructure can't expose that yet, so the guest
@@ -87,6 +89,9 @@ int kvm_arch_init(KVMState *s)
 #ifdef KVM_CAP_PPC_RMA
     cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
 #endif
+#ifdef KVM_CAP_SPAPR_TCE
+    cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
+#endif
 
     if (!cap_interrupt_level) {
         fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
@@ -792,6 +797,55 @@ off_t kvmppc_alloc_rma(const char *name)
 #endif
 }
 
+void *kvmppc_create_spapr_tce(target_ulong liobn, uint32_t window_size, int *pfd)
+{    struct kvm_create_spapr_tce args = {
+        .liobn = liobn,
+        .window_size = window_size,
+    };
+    long len;
+    int fd;
+    void *table;
+
+    if (!cap_spapr_tce) {
+        return NULL;
+    }
+
+    fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_SPAPR_TCE, &args);
+    if (fd < 0) {
+        return NULL;
+    }
+
+    len = (window_size / SPAPR_VIO_TCE_PAGE_SIZE) * sizeof(VIOsPAPR_RTCE);
+    /* FIXME: round this up to page size */
+
+    table = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
+    if (table == MAP_FAILED) {
+        close(fd);
+        return NULL;
+    }
+
+    *pfd = fd;
+    return table;
+}
+
+int kvmppc_remove_spapr_tce(void *table, int fd, uint32_t window_size)
+{
+    long len;
+
+    if (fd < 0)
+        return -1;
+
+    len = (window_size / SPAPR_VIO_TCE_PAGE_SIZE)*sizeof(VIOsPAPR_RTCE);
+    if ((munmap(table, len) < 0) ||
+        (close(fd) < 0)) {
+        fprintf(stderr, "KVM: Unexpected error removing KVM SPAPR TCE "
+                "table: %s", strerror(errno));
+        /* Leak the table */
+    }
+
+    return 0;
+}
+
 bool kvm_arch_stop_on_emulation_error(CPUState *env)
 {
     return true;
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index ad9903c..b78579a 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -20,6 +20,8 @@ int kvmppc_set_interrupt(CPUState *env, int irq, int level);
 void kvmppc_set_papr(CPUState *env);
 int kvmppc_smt_threads(void);
 off_t kvmppc_alloc_rma(const char *name);
+void *kvmppc_create_spapr_tce(target_ulong liobn, uint32_t window_size, int *fd);
+int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
 
 #else
 
@@ -57,6 +59,18 @@ static inline off_t kvmppc_alloc_rma(const char *name)
     return 0;
 }
 
+static inline void *kvmppc_create_spapr_tce(target_ulong liobn,
+                                            uint32_t window_size, int *fd)
+{
+    return NULL;
+}
+
+static inline int kvmppc_remove_spapr_tce(void *table, int pfd,
+                                          uint32_t window_size)
+{
+    return -1;
+}
+
 #endif
 
 #ifndef CONFIG_KVM
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] pseries: Support SMT systems for KVM Book3S-HV
  2011-09-29  6:45 ` [Qemu-devel] [PATCH 1/3] pseries: Support SMT systems for KVM Book3S-HV David Gibson
@ 2011-09-29  7:27   ` Jan Kiszka
  2011-09-29 13:17   ` Alexander Graf
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Kiszka @ 2011-09-29  7:27 UTC (permalink / raw)
  To: David Gibson; +Cc: agraf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 673 bytes --]

On 2011-09-29 08:45, David Gibson wrote:
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 35a6f10..2c1bc7a 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c

...

> @@ -76,6 +78,9 @@ int kvm_arch_init(KVMState *s)
>      cap_interrupt_level = kvm_check_extension(s, KVM_CAP_PPC_IRQ_LEVEL);
>      cap_segstate = kvm_check_extension(s, KVM_CAP_PPC_SEGSTATE);
>      cap_booke_sregs = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_SREGS);
> +#ifdef KVM_CAP_PPC_SMT
> +    cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
> +#endif

Just use update-linux-headers.sh if the CAP is missing and drop the
#ifdef. Same for patch 2 & 3.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] pseries: Use Book3S-HV TCE acceleration capabilities
  2011-09-29  6:45 ` [Qemu-devel] [PATCH 3/3] pseries: Use Book3S-HV TCE acceleration capabilities David Gibson
@ 2011-09-29  7:27   ` Jan Kiszka
  2011-09-29 13:29   ` Alexander Graf
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Kiszka @ 2011-09-29  7:27 UTC (permalink / raw)
  To: David Gibson; +Cc: agraf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3737 bytes --]

On 2011-09-29 08:45, David Gibson wrote:
> The pseries machine of qemu implements the TCE mechanism used as a
> virtual IOMMU for the PAPR defined virtual IO devices.  Because the
> PAPR spec only defines a small DMA address space, the guest VIO
> drivers need to update TCE mappings very frequently - the virtual
> network device is particularly bad.  This means many slow exits to
> qemu to emulate the H_PUT_TCE hypercall.
> 
> Sufficiently recent kernels allow this to be mitigated by implementing
> H_PUT_TCE in the host kernel.  To make use of this, however, qemu
> needs to initialize the necessary TCE tables, and map them into itself
> so that the VIO device implementations can retrieve the mappings when
> they access guest memory (which is treated as a virtual DMA
> operation).
> 
> This patch adds the necessary calls to use the KVM TCE acceleration.
> If the kernel does not support acceleration, or there is some other
> error creating the accelerated TCE table, then it will still fall back
> to full userspace TCE implementation.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/spapr_vio.c       |    8 ++++++-
>  hw/spapr_vio.h       |    1 +
>  target-ppc/kvm.c     |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  target-ppc/kvm_ppc.h |   14 +++++++++++++
>  4 files changed, 76 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
> index 35818e1..1da3032 100644
> --- a/hw/spapr_vio.c
> +++ b/hw/spapr_vio.c
> @@ -165,7 +165,13 @@ static void rtce_init(VIOsPAPRDevice *dev)
>          * sizeof(VIOsPAPR_RTCE);
>  
>      if (size) {
> -        dev->rtce_table = g_malloc0(size);
> +        dev->rtce_table = kvmppc_create_spapr_tce(dev->reg,
> +                                                  dev->rtce_window_size,
> +                                                  &dev->kvmtce_fd);
> +
> +        if (!dev->rtce_table) {
> +            dev->rtce_table = g_malloc0(size);
> +        }
>      }
>  }
>  
> diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
> index 4fe5f74..a325a5f 100644
> --- a/hw/spapr_vio.h
> +++ b/hw/spapr_vio.h
> @@ -57,6 +57,7 @@ typedef struct VIOsPAPRDevice {
>      target_ulong signal_state;
>      uint32_t rtce_window_size;
>      VIOsPAPR_RTCE *rtce_table;
> +    int kvmtce_fd;
>      VIOsPAPR_CRQ crq;
>  } VIOsPAPRDevice;
>  
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 37ee902..866cf7f 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -28,6 +28,7 @@
>  #include "kvm_ppc.h"
>  #include "cpu.h"
>  #include "device_tree.h"
> +#include "hw/sysbus.h"
>  #include "hw/spapr.h"
>  
>  #include "hw/sysbus.h"
> @@ -58,6 +59,7 @@ static int cap_ppc_smt = 0;
>  #ifdef KVM_CAP_PPC_RMA
>  static int cap_ppc_rma = 0;
>  #endif
> +static int cap_spapr_tce = false;
>  
>  /* XXX We have a race condition where we actually have a level triggered
>   *     interrupt, but the infrastructure can't expose that yet, so the guest
> @@ -87,6 +89,9 @@ int kvm_arch_init(KVMState *s)
>  #ifdef KVM_CAP_PPC_RMA
>      cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
>  #endif
> +#ifdef KVM_CAP_SPAPR_TCE
> +    cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
> +#endif
>  
>      if (!cap_interrupt_level) {
>          fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
> @@ -792,6 +797,55 @@ off_t kvmppc_alloc_rma(const char *name)
>  #endif
>  }
>  
> +void *kvmppc_create_spapr_tce(target_ulong liobn, uint32_t window_size, int *pfd)
> +{    struct kvm_create_spapr_tce args = {

Missing newline?

And please make sure your patches pass checkpatch.pl.

Jan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] pseries: Support SMT systems for KVM Book3S-HV
  2011-09-29  6:45 ` [Qemu-devel] [PATCH 1/3] pseries: Support SMT systems for KVM Book3S-HV David Gibson
  2011-09-29  7:27   ` Jan Kiszka
@ 2011-09-29 13:17   ` Alexander Graf
  2011-09-30  1:02     ` David Gibson
  1 sibling, 1 reply; 11+ messages in thread
From: Alexander Graf @ 2011-09-29 13:17 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel


On 29.09.2011, at 08:45, David Gibson wrote:

> Alex Graf has already made qemu support KVM for the pseries machine
> when using the Book3S-PR KVM variant (which runs the guest in
> usermode, emulating supervisor operations).  This code allows gets us
> very close to also working with KVM Book3S-HV (using the hypervisor
> capabilities of recent POWER CPUs).
> 
> This patch moves us another step towards Book3S-HV support by
> correctly handling SMT (multithreaded) POWER CPUs.  There are two
> parts to this:
> 
> * Querying KVM to check SMT capability, and if present, adjusting the
>   cpu numbers that qemu assigns to cause KVM to assign guest threads
>   to cores in the right way (this isn't automatic, because the POWER
>   HV support has a limitation that different threads on a single core
>   cannot be in different guests at the same time).
> 
> * Correctly informing the guest OS of the SMT thread to core mappings
>   via the device tree.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> hw/spapr.c           |   23 ++++++++++++++++++++---
> target-ppc/helper.c  |   11 +++++++++++
> target-ppc/kvm.c     |   10 ++++++++++
> target-ppc/kvm_ppc.h |    6 ++++++
> 4 files changed, 47 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/spapr.c b/hw/spapr.c
> index b118975..ba9ae1c 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -29,6 +29,9 @@
> #include "elf.h"
> #include "net.h"
> #include "blockdev.h"
> +#include "cpus.h"
> +#include "kvm.h"
> +#include "kvm_ppc.h"
> 
> #include "hw/boards.h"
> #include "hw/ppc.h"
> @@ -103,6 +106,7 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
>     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
>     int i;
>     char *modelname;
> +    int smt = kvmppc_smt_threads();
> 
> #define _FDT(exp) \
>     do { \
> @@ -162,13 +166,17 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
> 
>     for (env = first_cpu; env != NULL; env = env->next_cpu) {
>         int index = env->cpu_index;
> -        uint32_t gserver_prop[] = {cpu_to_be32(index), 0}; /* HACK! */
> +        uint32_t servers_prop[smp_threads];
> +        uint32_t gservers_prop[smp_threads * 2];
>         char *nodename;
>         uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
>                            0xffffffff, 0xffffffff};
>         uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq() : TIMEBASE_FREQ;
>         uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 1000000000;
> 
> +        if ((index % smt) != 0)
> +            continue;

Please run through checkpatch.pl

Alex

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] pseries: Allow KVM Book3S-HV on PPC970 CPUS
  2011-09-29  6:45 ` [Qemu-devel] [PATCH 2/3] pseries: Allow KVM Book3S-HV on PPC970 CPUS David Gibson
@ 2011-09-29 13:25   ` Alexander Graf
  0 siblings, 0 replies; 11+ messages in thread
From: Alexander Graf @ 2011-09-29 13:25 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel


On 29.09.2011, at 08:45, David Gibson wrote:

> At present, using the hypervisor aware Book3S-HV KVM will only work
> with qemu on POWER7 CPUs.  PPC970 CPUs also have hypervisor
> capability, but they lack the VRMA feature which makes assigning guest
> memory easier.
> 
> In order to allow KVM Book3S-HV on PPC970, we need to specially
> allocate the first chunk of guest memory (the "Real Mode Area" or
> RMA), so that it is physically contiguous.
> 
> Sufficiently recent host kernels allow such contiguous RMAs to be
> allocated, with a kvm capability advertising whether the feature is
> available and/or necessary on this hardware.  This patch enables qemu
> to use this support, thus allowing kvm acceleration of pseries qemu
> machines on PPC970 hardware.
> 
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> hw/spapr.c           |   50 ++++++++++++++++++++++++++++++++++++++++--------
> target-ppc/kvm.c     |   51 ++++++++++++++++++++++++++++++++++++++++++++++++++
> target-ppc/kvm_ppc.h |    6 +++++
> 3 files changed, 98 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/spapr.c b/hw/spapr.c
> index ba9ae1c..d51425a 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -89,6 +89,7 @@ qemu_irq spapr_allocate_irq(uint32_t hint, uint32_t *irq_num)
> }
> 
> static void *spapr_create_fdt_skel(const char *cpu_model,
> +                                   target_phys_addr_t rma_size,
>                                    target_phys_addr_t initrd_base,
>                                    target_phys_addr_t initrd_size,
>                                    const char *boot_device,
> @@ -97,7 +98,9 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
> {
>     void *fdt;
>     CPUState *env;
> -    uint64_t mem_reg_property[] = { 0, cpu_to_be64(ram_size) };
> +    uint64_t mem_reg_property_rma[] = { 0, cpu_to_be64(rma_size) };
> +    uint64_t mem_reg_property_nonrma[] = { cpu_to_be64(rma_size),
> +                                           cpu_to_be64(ram_size - rma_size) };
>     uint32_t start_prop = cpu_to_be32(initrd_base);
>     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
>     uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
> @@ -143,15 +146,25 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
> 
>     _FDT((fdt_end_node(fdt)));
> 
> -    /* memory node */
> +    /* memory node(s) */
>     _FDT((fdt_begin_node(fdt, "memory@0")));
> 
>     _FDT((fdt_property_string(fdt, "device_type", "memory")));
> -    _FDT((fdt_property(fdt, "reg",
> -                       mem_reg_property, sizeof(mem_reg_property))));
> -
> +    _FDT((fdt_property(fdt, "reg", mem_reg_property_rma,
> +                       sizeof(mem_reg_property_rma))));
>     _FDT((fdt_end_node(fdt)));
> 
> +    if (ram_size > rma_size) {
> +        char mem_name[32];
> +
> +	sprintf(mem_name, "memory@%" PRIx64, (uint64_t)rma_size);
> +	_FDT((fdt_begin_node(fdt, mem_name)));
> +	_FDT((fdt_property_string(fdt, "device_type", "memory")));
> +        _FDT((fdt_property(fdt, "reg", mem_reg_property_nonrma,
> +                           sizeof(mem_reg_property_nonrma))));
> +        _FDT((fdt_end_node(fdt)));
> +    }        
> +
>     /* cpus */
>     _FDT((fdt_begin_node(fdt, "cpus")));
> 
> @@ -341,6 +354,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
> {
>     CPUState *env;
>     int i;
> +    target_phys_addr_t rma_alloc_size, rma_size;
>     ram_addr_t ram_offset;
>     uint32_t initrd_base;
>     long kernel_size, initrd_size, fw_size;
> @@ -350,10 +364,23 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>     spapr = g_malloc(sizeof(*spapr));
>     cpu_ppc_hypercall = emulate_spapr_hypercall;
> 
> +    /* Allocate RMA if necessary */
> +    rma_alloc_size = kvmppc_alloc_rma("ppc_spapr.rma");
> +
> +    if (rma_alloc_size == -1) {
> +        hw_error("qemu: Unable to create RMA\n");
> +        exit(1);
> +    }
> +    if (rma_alloc_size && (rma_alloc_size < ram_size)) {
> +        rma_size = rma_alloc_size;
> +    } else {
> +        rma_size = ram_size;
> +    }
> +
>     /* We place the device tree just below either the top of RAM, or
>      * 2GB, so that it can be processed with 32-bit code if
>      * necessary */
> -    spapr->fdt_addr = MIN(ram_size, 0x80000000) - FDT_MAX_SIZE;
> +    spapr->fdt_addr = MIN(rma_size, 0x80000000) - FDT_MAX_SIZE;

The change looks sane, so I'd assume the description above is now wrong :)

>     spapr->rtas_addr = spapr->fdt_addr - RTAS_MAX_SIZE;
> 
>     /* init CPUs */
> @@ -378,8 +405,13 @@ static void ppc_spapr_init(ram_addr_t ram_size,
> 
>     /* allocate RAM */
>     spapr->ram_limit = ram_size;
> -    ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", spapr->ram_limit);
> -    cpu_register_physical_memory(0, ram_size, ram_offset);
> +    if (spapr->ram_limit > rma_alloc_size) {
> +        ram_addr_t nonrma_base = rma_alloc_size;
> +        ram_addr_t nonrma_size = spapr->ram_limit - rma_alloc_size;
> +
> +        ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", nonrma_size);
> +        cpu_register_physical_memory(nonrma_base, nonrma_size, ram_offset);
> +    }
> 
>     /* allocate hash page table.  For now we always make this 16mb,
>      * later we should probably make it scale to the size of guest
> @@ -503,7 +535,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>     }
> 
>     /* Prepare the device tree */
> -    spapr->fdt_skel = spapr_create_fdt_skel(cpu_model,
> +    spapr->fdt_skel = spapr_create_fdt_skel(cpu_model, rma_size,
>                                             initrd_base, initrd_size,
>                                             boot_device, kernel_cmdline,
>                                             pteg_shift + 7);
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 2c1bc7a..37ee902 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -55,6 +55,9 @@ static int cap_interrupt_level = false;
> static int cap_segstate;
> static int cap_booke_sregs;
> static int cap_ppc_smt = 0;
> +#ifdef KVM_CAP_PPC_RMA

No need for these ifdefs anymore thanks to qemu local kvm headers :)


Alex

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] pseries: Use Book3S-HV TCE acceleration capabilities
  2011-09-29  6:45 ` [Qemu-devel] [PATCH 3/3] pseries: Use Book3S-HV TCE acceleration capabilities David Gibson
  2011-09-29  7:27   ` Jan Kiszka
@ 2011-09-29 13:29   ` Alexander Graf
  2011-09-30  2:37     ` David Gibson
  1 sibling, 1 reply; 11+ messages in thread
From: Alexander Graf @ 2011-09-29 13:29 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel


On 29.09.2011, at 08:45, David Gibson wrote:

> The pseries machine of qemu implements the TCE mechanism used as a
> virtual IOMMU for the PAPR defined virtual IO devices.  Because the
> PAPR spec only defines a small DMA address space, the guest VIO
> drivers need to update TCE mappings very frequently - the virtual
> network device is particularly bad.  This means many slow exits to
> qemu to emulate the H_PUT_TCE hypercall.
> 
> Sufficiently recent kernels allow this to be mitigated by implementing
> H_PUT_TCE in the host kernel.  To make use of this, however, qemu
> needs to initialize the necessary TCE tables, and map them into itself
> so that the VIO device implementations can retrieve the mappings when
> they access guest memory (which is treated as a virtual DMA
> operation).
> 
> This patch adds the necessary calls to use the KVM TCE acceleration.
> If the kernel does not support acceleration, or there is some other
> error creating the accelerated TCE table, then it will still fall back
> to full userspace TCE implementation.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> hw/spapr_vio.c       |    8 ++++++-
> hw/spapr_vio.h       |    1 +
> target-ppc/kvm.c     |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++
> target-ppc/kvm_ppc.h |   14 +++++++++++++
> 4 files changed, 76 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
> index 35818e1..1da3032 100644
> --- a/hw/spapr_vio.c
> +++ b/hw/spapr_vio.c
> @@ -165,7 +165,13 @@ static void rtce_init(VIOsPAPRDevice *dev)
>         * sizeof(VIOsPAPR_RTCE);
> 
>     if (size) {
> -        dev->rtce_table = g_malloc0(size);
> +        dev->rtce_table = kvmppc_create_spapr_tce(dev->reg,
> +                                                  dev->rtce_window_size,
> +                                                  &dev->kvmtce_fd);
> +
> +        if (!dev->rtce_table) {
> +            dev->rtce_table = g_malloc0(size);
> +        }
>     }
> }
> 
> diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
> index 4fe5f74..a325a5f 100644
> --- a/hw/spapr_vio.h
> +++ b/hw/spapr_vio.h
> @@ -57,6 +57,7 @@ typedef struct VIOsPAPRDevice {
>     target_ulong signal_state;
>     uint32_t rtce_window_size;
>     VIOsPAPR_RTCE *rtce_table;
> +    int kvmtce_fd;
>     VIOsPAPR_CRQ crq;
> } VIOsPAPRDevice;
> 
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 37ee902..866cf7f 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -28,6 +28,7 @@
> #include "kvm_ppc.h"
> #include "cpu.h"
> #include "device_tree.h"
> +#include "hw/sysbus.h"
> #include "hw/spapr.h"
> 
> #include "hw/sysbus.h"
> @@ -58,6 +59,7 @@ static int cap_ppc_smt = 0;
> #ifdef KVM_CAP_PPC_RMA
> static int cap_ppc_rma = 0;
> #endif
> +static int cap_spapr_tce = false;
> 
> /* XXX We have a race condition where we actually have a level triggered
>  *     interrupt, but the infrastructure can't expose that yet, so the guest
> @@ -87,6 +89,9 @@ int kvm_arch_init(KVMState *s)
> #ifdef KVM_CAP_PPC_RMA
>     cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
> #endif
> +#ifdef KVM_CAP_SPAPR_TCE
> +    cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
> +#endif
> 
>     if (!cap_interrupt_level) {
>         fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
> @@ -792,6 +797,55 @@ off_t kvmppc_alloc_rma(const char *name)
> #endif
> }
> 
> +void *kvmppc_create_spapr_tce(target_ulong liobn, uint32_t window_size, int *pfd)
> +{    struct kvm_create_spapr_tce args = {
> +        .liobn = liobn,
> +        .window_size = window_size,
> +    };
> +    long len;
> +    int fd;
> +    void *table;
> +
> +    if (!cap_spapr_tce) {
> +        return NULL;
> +    }
> +
> +    fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_SPAPR_TCE, &args);
> +    if (fd < 0) {
> +        return NULL;
> +    }
> +
> +    len = (window_size / SPAPR_VIO_TCE_PAGE_SIZE) * sizeof(VIOsPAPR_RTCE);
> +    /* FIXME: round this up to page size */
> +
> +    table = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
> +    if (table == MAP_FAILED) {
> +        close(fd);
> +        return NULL;
> +    }
> +
> +    *pfd = fd;
> +    return table;
> +}
> +
> +int kvmppc_remove_spapr_tce(void *table, int fd, uint32_t window_size)

Hrm. Is this ever called somewhere?


Alex

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] pseries: Support SMT systems for KVM Book3S-HV
  2011-09-29 13:17   ` Alexander Graf
@ 2011-09-30  1:02     ` David Gibson
  0 siblings, 0 replies; 11+ messages in thread
From: David Gibson @ 2011-09-30  1:02 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel

On Thu, Sep 29, 2011 at 03:17:11PM +0200, Alexander Graf wrote:
> On 29.09.2011, at 08:45, David Gibson wrote:
[snip]
> > diff --git a/hw/spapr.c b/hw/spapr.c
> > index b118975..ba9ae1c 100644
> > --- a/hw/spapr.c
> > +++ b/hw/spapr.c
> > @@ -29,6 +29,9 @@
> > #include "elf.h"
> > #include "net.h"
> > #include "blockdev.h"
> > +#include "cpus.h"
> > +#include "kvm.h"
> > +#include "kvm_ppc.h"
> > 
> > #include "hw/boards.h"
> > #include "hw/ppc.h"
> > @@ -103,6 +106,7 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
> >     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
> >     int i;
> >     char *modelname;
> > +    int smt = kvmppc_smt_threads();
> > 
> > #define _FDT(exp) \
> >     do { \
> > @@ -162,13 +166,17 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
> > 
> >     for (env = first_cpu; env != NULL; env = env->next_cpu) {
> >         int index = env->cpu_index;
> > -        uint32_t gserver_prop[] = {cpu_to_be32(index), 0}; /* HACK! */
> > +        uint32_t servers_prop[smp_threads];
> > +        uint32_t gservers_prop[smp_threads * 2];
> >         char *nodename;
> >         uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
> >                            0xffffffff, 0xffffffff};
> >         uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq() : TIMEBASE_FREQ;
> >         uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 1000000000;
> > 
> > +        if ((index % smt) != 0)
> > +            continue;
> 
> Please run through checkpatch.pl

Actually, checkpatch.pl didn't spot that one, for zome bizarre
reason.  Fixed, nonetheless.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] pseries: Use Book3S-HV TCE acceleration capabilities
  2011-09-29 13:29   ` Alexander Graf
@ 2011-09-30  2:37     ` David Gibson
  0 siblings, 0 replies; 11+ messages in thread
From: David Gibson @ 2011-09-30  2:37 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel

On Thu, Sep 29, 2011 at 03:29:59PM +0200, Alexander Graf wrote:
> On 29.09.2011, at 08:45, David Gibson wrote:
[snip]
> > +int kvmppc_remove_spapr_tce(void *table, int fd, uint32_t window_size)
> 
> Hrm. Is this ever called somewhere?

Not yet.  It will be needed when we refactor the TCE bypass support
for integration with general IOMMU infrastructure.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2011-09-30  3:40 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-29  6:45 [Qemu-devel] [0/3] pseries: Support and improvements for KVM Book3S-HV support David Gibson
2011-09-29  6:45 ` [Qemu-devel] [PATCH 1/3] pseries: Support SMT systems for KVM Book3S-HV David Gibson
2011-09-29  7:27   ` Jan Kiszka
2011-09-29 13:17   ` Alexander Graf
2011-09-30  1:02     ` David Gibson
2011-09-29  6:45 ` [Qemu-devel] [PATCH 2/3] pseries: Allow KVM Book3S-HV on PPC970 CPUS David Gibson
2011-09-29 13:25   ` Alexander Graf
2011-09-29  6:45 ` [Qemu-devel] [PATCH 3/3] pseries: Use Book3S-HV TCE acceleration capabilities David Gibson
2011-09-29  7:27   ` Jan Kiszka
2011-09-29 13:29   ` Alexander Graf
2011-09-30  2:37     ` David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.