* [Qemu-devel] Implement emulation of pSeries logical partitions (v3)
@ 2011-03-16  4:56 David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 01/26] Clean up PowerPC SLB handling code David Gibson
                   ` (25 more replies)
  0 siblings, 26 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC
  To: agraf, qemu-devel; +Cc: paulus, anton

This patch series adds a "pseries" machine to qemu, allowing it to
emulate IBM pSeries logical partitions.  More specifically it
implements the interface defined by the "PowerPC Architecture Platform
Requirements" document (PAPR, or sPAPR for short).

Along the way, we add support for a number of ppc CPUs more modern than
those currently supported.  The series also makes some significant
cleanups to the translation code for hash page table based ppc MMUs.

Please apply.

Changes since v2 of this series:
 * Assorted bugfixes and cleanups.

Changes since v1 of this series:
 * numerous coding style fixups
 * incorporated most review comments from initial version
 * moved to a wholly dynamic hypercall registration scheme
 * assorted other cleanups
 * many more patches implementing VIO devices

* [Qemu-devel] [PATCH 01/26] Clean up PowerPC SLB handling code
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 02/26] Allow qemu_devtree_setprop() to take arbitrary values David Gibson
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC
  To: agraf, qemu-devel; +Cc: paulus, anton

Currently the SLB information when emulating a PowerPC 970 is
stored in a structure with the unhelpfully named fields 'tmp'
and 'tmp64'.  While the layout in these fields does match the
description of the SLB in the architecture document, it is not
convenient either for looking up the SLB or for emulating the
slbmte instruction.

This patch, therefore, reorganizes the SLB entry structure to be
divided into "ESID related" and "VSID related" fields, matching the
way they are divided in the instructions that access the SLB.
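As a rough illustration of why the split layout helps, the lookup below mirrors the shape of the new slb_lookup(): because the valid bit lives in the ESID word, a single 64-bit comparison checks both the segment match and validity. This is a self-contained sketch, not qemu code; `slb_entry` and `slb_find()` are hypothetical names, and only 256MB segments are handled, as in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Masks taken from the cpu.h hunk of this patch. */
#define SLB_ESID_V          0x0000000008000000ULL
#define SLB_VSID_SHIFT      12
#define SLB_VSID_VSID       0x3FFFFFFFFFFFF000ULL
#define SEGMENT_SHIFT_256M  28
#define SEGMENT_MASK_256M   (~((1ULL << SEGMENT_SHIFT_256M) - 1))

/* Simplified stand-in for ppc_slb_t: one word per slbmte operand. */
typedef struct {
    uint64_t esid;  /* ESID plus V bit, as taken from RB */
    uint64_t vsid;  /* VSID plus flag bits, as taken from RS */
} slb_entry;

/* Return the index of the valid 256MB-segment entry covering eaddr
 * and store its VSID in *vsid, or return -1 if there is none. */
static int slb_find(const slb_entry *slb, int nr, uint64_t eaddr,
                    uint64_t *vsid)
{
    /* Folding V into the key lets one compare test match and validity. */
    uint64_t key = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V;

    assert(nr >= 0);
    for (int n = 0; n < nr; n++) {
        if (slb[n].esid == key) {
            *vsid = (slb[n].vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
            return n;
        }
    }
    return -1;
}
```

An entry whose V bit is clear can never equal the key, so invalidation is just clearing that bit, which is what the reworked ppc_slb_invalidate_all() and ppc_slb_invalidate_one() do.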

In addition to making the code smaller and more readable, this will
make it easier to implement support for the 1TB segments used in more
recent PowerPC chips.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h       |   29 +++++++-
 target-ppc/helper.c    |  178 ++++++++++++++----------------------------------
 target-ppc/helper.h    |    1 -
 target-ppc/op_helper.c |    9 +--
 4 files changed, 80 insertions(+), 137 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index deb8d7c..a20c132 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -43,6 +43,8 @@
 # define TARGET_VIRT_ADDR_SPACE_BITS 64
 #endif
 
+#define TARGET_PAGE_BITS_16M 24
+
 #else /* defined (TARGET_PPC64) */
 /* PowerPC 32 definitions */
 #define TARGET_LONG_BITS 32
@@ -359,10 +361,31 @@ union ppc_tlb_t {
 
 typedef struct ppc_slb_t ppc_slb_t;
 struct ppc_slb_t {
-    uint64_t tmp64;
-    uint32_t tmp;
+    uint64_t esid;
+    uint64_t vsid;
 };
 
+/* Bits in the SLB ESID word */
+#define SLB_ESID_ESID           0xFFFFFFFFF0000000ULL
+#define SLB_ESID_V              0x0000000008000000ULL /* valid */
+
+/* Bits in the SLB VSID word */
+#define SLB_VSID_SHIFT          12
+#define SLB_VSID_SSIZE_SHIFT    62
+#define SLB_VSID_B              0xc000000000000000ULL
+#define SLB_VSID_B_256M         0x0000000000000000ULL
+#define SLB_VSID_VSID           0x3FFFFFFFFFFFF000ULL
+#define SLB_VSID_KS             0x0000000000000800ULL
+#define SLB_VSID_KP             0x0000000000000400ULL
+#define SLB_VSID_N              0x0000000000000200ULL /* no-execute */
+#define SLB_VSID_L              0x0000000000000100ULL
+#define SLB_VSID_C              0x0000000000000080ULL /* class */
+#define SLB_VSID_LP             0x0000000000000030ULL
+#define SLB_VSID_ATTR           0x0000000000000FFFULL
+
+#define SEGMENT_SHIFT_256M      28
+#define SEGMENT_MASK_256M       ~((1ULL << SEGMENT_SHIFT_256M) - 1)
+
 /*****************************************************************************/
 /* Machine state register bits definition                                    */
 #define MSR_SF   63 /* Sixty-four-bit mode                            hflags */
@@ -755,7 +778,7 @@ void ppc_store_sdr1 (CPUPPCState *env, target_ulong value);
 void ppc_store_asr (CPUPPCState *env, target_ulong value);
 target_ulong ppc_load_slb (CPUPPCState *env, int slb_nr);
 target_ulong ppc_load_sr (CPUPPCState *env, int sr_nr);
-void ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs);
+int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs);
 #endif /* defined(TARGET_PPC64) */
 void ppc_store_sr (CPUPPCState *env, int srnum, target_ulong value);
 #endif /* !defined(CONFIG_USER_ONLY) */
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 4b49101..2094ca3 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -672,85 +672,36 @@ static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
 }
 
 #if defined(TARGET_PPC64)
-static ppc_slb_t *slb_get_entry(CPUPPCState *env, int nr)
-{
-    ppc_slb_t *retval = &env->slb[nr];
-
-#if 0 // XXX implement bridge mode?
-    if (env->spr[SPR_ASR] & 1) {
-        target_phys_addr_t sr_base;
-
-        sr_base = env->spr[SPR_ASR] & 0xfffffffffffff000;
-        sr_base += (12 * nr);
-
-        retval->tmp64 = ldq_phys(sr_base);
-        retval->tmp = ldl_phys(sr_base + 8);
-    }
-#endif
-
-    return retval;
-}
-
-static void slb_set_entry(CPUPPCState *env, int nr, ppc_slb_t *slb)
-{
-    ppc_slb_t *entry = &env->slb[nr];
-
-    if (slb == entry)
-        return;
-
-    entry->tmp64 = slb->tmp64;
-    entry->tmp = slb->tmp;
-}
-
-static inline int slb_is_valid(ppc_slb_t *slb)
-{
-    return (int)(slb->tmp64 & 0x0000000008000000ULL);
-}
-
-static inline void slb_invalidate(ppc_slb_t *slb)
-{
-    slb->tmp64 &= ~0x0000000008000000ULL;
-}
-
 static inline int slb_lookup(CPUPPCState *env, target_ulong eaddr,
                              target_ulong *vsid, target_ulong *page_mask,
                              int *attr, int *target_page_bits)
 {
-    target_ulong mask;
-    int n, ret;
+    uint64_t esid;
+    int n;
 
-    ret = -5;
     LOG_SLB("%s: eaddr " TARGET_FMT_lx "\n", __func__, eaddr);
-    mask = 0x0000000000000000ULL; /* Avoid gcc warning */
+
+    esid = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V;
+
     for (n = 0; n < env->slb_nr; n++) {
-        ppc_slb_t *slb = slb_get_entry(env, n);
-
-        LOG_SLB("%s: seg %d %016" PRIx64 " %08"
-                    PRIx32 "\n", __func__, n, slb->tmp64, slb->tmp);
-        if (slb_is_valid(slb)) {
-            /* SLB entry is valid */
-            mask = 0xFFFFFFFFF0000000ULL;
-            if (slb->tmp & 0x8) {
-                /* 16 MB PTEs */
-                if (target_page_bits)
-                    *target_page_bits = 24;
-            } else {
-                /* 4 KB PTEs */
-                if (target_page_bits)
-                    *target_page_bits = TARGET_PAGE_BITS;
-            }
-            if ((eaddr & mask) == (slb->tmp64 & mask)) {
-                /* SLB match */
-                *vsid = ((slb->tmp64 << 24) | (slb->tmp >> 8)) & 0x0003FFFFFFFFFFFFULL;
-                *page_mask = ~mask;
-                *attr = slb->tmp & 0xFF;
-                ret = n;
-                break;
+        ppc_slb_t *slb = &env->slb[n];
+
+        LOG_SLB("%s: slot %d %016" PRIx64 " %016"
+                    PRIx64 "\n", __func__, n, slb->esid, slb->vsid);
+        if (slb->esid == esid) {
+            *vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
+            *page_mask = ~SEGMENT_MASK_256M;
+            *attr = slb->vsid & SLB_VSID_ATTR;
+            if (target_page_bits) {
+                *target_page_bits = (slb->vsid & SLB_VSID_L)
+                    ? TARGET_PAGE_BITS_16M
+                    : TARGET_PAGE_BITS;
             }
+            return n;
         }
     }
 
-    return ret;
+    return -5;
 }
 
 void ppc_slb_invalidate_all (CPUPPCState *env)
@@ -760,11 +711,10 @@ void ppc_slb_invalidate_all (CPUPPCState *env)
     do_invalidate = 0;
     /* XXX: Warning: slbia never invalidates the first segment */
     for (n = 1; n < env->slb_nr; n++) {
-        ppc_slb_t *slb = slb_get_entry(env, n);
+        ppc_slb_t *slb = &env->slb[n];
 
-        if (slb_is_valid(slb)) {
-            slb_invalidate(slb);
-            slb_set_entry(env, n, slb);
+        if (slb->esid & SLB_ESID_V) {
+            slb->esid &= ~SLB_ESID_V;
             /* XXX: given the fact that segment size is 256 MB or 1TB,
              *      and we still don't have a tlb_flush_mask(env, n, mask)
              *      in Qemu, we just invalidate all TLBs
@@ -781,68 +731,44 @@ void ppc_slb_invalidate_one (CPUPPCState *env, uint64_t T0)
     target_ulong vsid, page_mask;
     int attr;
     int n;
+    ppc_slb_t *slb;
 
     n = slb_lookup(env, T0, &vsid, &page_mask, &attr, NULL);
-    if (n >= 0) {
-        ppc_slb_t *slb = slb_get_entry(env, n);
-
-        if (slb_is_valid(slb)) {
-            slb_invalidate(slb);
-            slb_set_entry(env, n, slb);
-            /* XXX: given the fact that segment size is 256 MB or 1TB,
-             *      and we still don't have a tlb_flush_mask(env, n, mask)
-             *      in Qemu, we just invalidate all TLBs
-             */
-            tlb_flush(env, 1);
-        }
+    if (n < 0) {
+        return;
     }
-}
 
-target_ulong ppc_load_slb (CPUPPCState *env, int slb_nr)
-{
-    target_ulong rt;
-    ppc_slb_t *slb = slb_get_entry(env, slb_nr);
+    slb = &env->slb[n];
 
-    if (slb_is_valid(slb)) {
-        /* SLB entry is valid */
-        /* Copy SLB bits 62:88 to Rt 37:63 (VSID 23:49) */
-        rt = slb->tmp >> 8;             /* 65:88 => 40:63 */
-        rt |= (slb->tmp64 & 0x7) << 24; /* 62:64 => 37:39 */
-        /* Copy SLB bits 89:92 to Rt 33:36 (KsKpNL) */
-        rt |= ((slb->tmp >> 4) & 0xF) << 27;
-    } else {
-        rt = 0;
-    }
-    LOG_SLB("%s: %016" PRIx64 " %08" PRIx32 " => %d "
-            TARGET_FMT_lx "\n", __func__, slb->tmp64, slb->tmp, slb_nr, rt);
+    if (slb->esid & SLB_ESID_V) {
+        slb->esid &= ~SLB_ESID_V;
 
-    return rt;
+        /* XXX: given the fact that segment size is 256 MB or 1TB,
+         *      and we still don't have a tlb_flush_mask(env, n, mask)
+         *      in Qemu, we just invalidate all TLBs
+         */
+        tlb_flush(env, 1);
+    }
 }
 
-void ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs)
+int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs)
 {
-    ppc_slb_t *slb;
-
-    uint64_t vsid;
-    uint64_t esid;
-    int flags, valid, slb_nr;
-
-    vsid = rs >> 12;
-    flags = ((rs >> 8) & 0xf);
+    int slot = rb & 0xfff;
+    uint64_t esid = rb & ~0xfff;
+    ppc_slb_t *slb = &env->slb[slot];
 
-    esid = rb >> 28;
-    valid = (rb & (1 << 27));
-    slb_nr = rb & 0xfff;
+    if (slot >= env->slb_nr) {
+        return -1;
+    }
 
-    slb = slb_get_entry(env, slb_nr);
-    slb->tmp64 = (esid << 28) | valid | (vsid >> 24);
-    slb->tmp = (vsid << 8) | (flags << 3);
+    slb->esid = esid;
+    slb->vsid = rs;
 
     LOG_SLB("%s: %d " TARGET_FMT_lx " - " TARGET_FMT_lx " => %016" PRIx64
-            " %08" PRIx32 "\n", __func__, slb_nr, rb, rs, slb->tmp64,
-            slb->tmp);
+            " %016" PRIx64 "\n", __func__, slot, rb, rs,
+            slb->esid, slb->vsid);
 
-    slb_set_entry(env, slb_nr, slb);
+    return 0;
 }
 #endif /* defined(TARGET_PPC64) */
 
@@ -860,24 +786,22 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
 {
     target_phys_addr_t sdr, hash, mask, sdr_mask, htab_mask;
     target_ulong sr, vsid, vsid_mask, pgidx, page_mask;
-#if defined(TARGET_PPC64)
-    int attr;
-#endif
     int ds, vsid_sh, sdr_sh, pr, target_page_bits;
     int ret, ret2;
 
     pr = msr_pr;
 #if defined(TARGET_PPC64)
     if (env->mmu_model & POWERPC_MMU_64) {
+        int attr;
+
         LOG_MMU("Check SLBs\n");
         ret = slb_lookup(env, eaddr, &vsid, &page_mask, &attr,
                          &target_page_bits);
         if (ret < 0)
             return ret;
-        ctx->key = ((attr & 0x40) && (pr != 0)) ||
-            ((attr & 0x80) && (pr == 0)) ? 1 : 0;
+        ctx->key = !!(pr ? (attr & SLB_VSID_KP) : (attr & SLB_VSID_KS));
         ds = 0;
-        ctx->nx = attr & 0x10 ? 1 : 0;
+        ctx->nx = !!(attr & SLB_VSID_N);
         ctx->eaddr = eaddr;
         vsid_mask = 0x00003FFFFFFFFF80ULL;
         vsid_sh = 7;
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 2bf9283..d512cb0 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -340,7 +340,6 @@ DEF_HELPER_1(74xx_tlbi, void, tl)
 DEF_HELPER_FLAGS_0(tlbia, TCG_CALL_CONST, void)
 DEF_HELPER_FLAGS_1(tlbie, TCG_CALL_CONST, void, tl)
 #if defined(TARGET_PPC64)
-DEF_HELPER_FLAGS_1(load_slb, TCG_CALL_CONST, tl, tl)
 DEF_HELPER_FLAGS_2(store_slb, TCG_CALL_CONST, void, tl, tl)
 DEF_HELPER_FLAGS_0(slbia, TCG_CALL_CONST, void)
 DEF_HELPER_FLAGS_1(slbie, TCG_CALL_CONST, void, tl)
diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
index 17e070a..bf41627 100644
--- a/target-ppc/op_helper.c
+++ b/target-ppc/op_helper.c
@@ -3746,14 +3746,11 @@ void helper_store_sr (target_ulong sr_num, target_ulong val)
 
 /* SLB management */
 #if defined(TARGET_PPC64)
-target_ulong helper_load_slb (target_ulong slb_nr)
-{
-    return ppc_load_slb(env, slb_nr);
-}
-
 void helper_store_slb (target_ulong rb, target_ulong rs)
 {
-    ppc_store_slb(env, rb, rs);
+    if (ppc_store_slb(env, rb, rs) < 0) {
+        helper_raise_exception_err(POWERPC_EXCP_PROGRAM, POWERPC_EXCP_INVAL);
+    }
 }
 
 void helper_slbia (void)
-- 
1.7.1

* [Qemu-devel] [PATCH 02/26] Allow qemu_devtree_setprop() to take arbitrary values
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 01/26] Clean up PowerPC SLB handling code David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC David Gibson
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC
  To: agraf, qemu-devel; +Cc: paulus, anton

From: David Gibson <dwg@au1.ibm.com>

Currently qemu_devtree_setprop() expects the new property value to be
given as a uint32_t *.  While property values consisting of u32s are
common, in general they can have any bytestring value.

Therefore, this patch alters the function to take a void * instead,
allowing callers to easily give anything as the property value.
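For example, a property whose value is not a list of 32-bit cells can now be passed straight through. The sketch below builds a "reg" value of 64-bit address/size pairs in big-endian (flattened device tree) byte order; the resulting raw buffer is what would be handed to qemu_devtree_setprop(), shown only in a comment. `put_be64()` and `build_reg64()` are hypothetical helpers, and the two-cell address and size layout is an assumption about the target node.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store v big-endian at buf. */
static void put_be64(uint8_t *buf, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Hypothetical helper: build a "reg" value of 64-bit (address, size)
 * pairs, i.e. #address-cells = #size-cells = 2.  Returns the length
 * in bytes; buf must hold 16 * npairs bytes. */
static int build_reg64(uint8_t *buf, const uint64_t *pairs, int npairs)
{
    assert(npairs >= 0);
    for (int i = 0; i < 2 * npairs; i++) {
        put_be64(buf + 8 * i, pairs[i]);
    }
    return 16 * npairs;
}

/* With the void * signature the caller can then do, without a cast:
 *     len = build_reg64(buf, pairs, n);
 *     qemu_devtree_setprop(fdt, node_path, "reg", buf, len);
 */
```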

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 device_tree.c |    2 +-
 device_tree.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index 426a631..21be070 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -74,7 +74,7 @@ fail:
 }
 
 int qemu_devtree_setprop(void *fdt, const char *node_path,
-                         const char *property, uint32_t *val_array, int size)
+                         const char *property, void *val_array, int size)
 {
     int offset;
 
diff --git a/device_tree.h b/device_tree.h
index f05c4e7..cecd98f 100644
--- a/device_tree.h
+++ b/device_tree.h
@@ -17,7 +17,7 @@
 void *load_device_tree(const char *filename_path, int *sizep);
 
 int qemu_devtree_setprop(void *fdt, const char *node_path,
-                         const char *property, uint32_t *val_array, int size);
+                         const char *property, void *val_array, int size);
 int qemu_devtree_setprop_cell(void *fdt, const char *node_path,
                               const char *property, uint32_t val);
 int qemu_devtree_setprop_string(void *fdt, const char *node_path,
-- 
1.7.1

* [Qemu-devel] [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 01/26] Clean up PowerPC SLB handling code David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 02/26] Allow qemu_devtree_setprop() to take arbitrary values David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16 13:46   ` [Qemu-devel] " Alexander Graf
  2011-03-16 20:44   ` [Qemu-devel] " Anthony Liguori
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 04/26] Implement PowerPC slbmfee and slbmfev instructions David Gibson
                   ` (22 subsequent siblings)
  25 siblings, 2 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC
  To: agraf, qemu-devel; +Cc: paulus, anton

From: David Gibson <dwg@au1.ibm.com>

PowerPC and POWER chips since the POWER4 and 970 have a special
hypervisor mode, and a corresponding form of the system call
instruction which traps to the hypervisor.

qemu currently has stub implementations of hypervisor mode.  That
is, the outline is there to allow qemu to run a PowerPC hypervisor
under emulation.  There are a number of details missing so this
won't actually work at present, but the idea is there.

There is, however, no provision at all for qemu to instead emulate
the hypervisor itself, that is, to have hypercalls trap into qemu
and have their results emulated there, rather than running
hypervisor code within the emulated system.

KVM implementations that are aware of the hypervisor hardware are in
the works, and for debugging and development it would be useful to
also allow full emulation of the same para-virtualized guests that
such a KVM would run.

Therefore, this patch adds a hook which will allow a machine to
set up emulation of hypervisor calls.
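To make the control flow concrete, here is a self-contained mock of the dispatch this patch adds to powerpc_excp(): a system call with LEV=1 is handed to the machine's hook, together with its opaque pointer, when a hook is registered; otherwise the normal exception path is taken. `MockCPU`, its `ret` field, `count_hcall()` and `dispatch_syscall()` are hypothetical; only the `emulate_hypercall`/`hcall_opaque` pair corresponds to fields in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for CPUPPCState: the two hook
 * fields added by this patch plus a made-up "ret" result field. */
typedef struct MockCPU MockCPU;
struct MockCPU {
    unsigned long ret;
    void (*emulate_hypercall)(MockCPU *, void *);
    void *hcall_opaque;
};

/* Hypothetical machine-side handler: counts calls through the opaque
 * pointer and reports success in env->ret. */
static void count_hcall(MockCPU *env, void *opaque)
{
    int *calls = opaque;

    (*calls)++;
    env->ret = 0;
}

/* Mirrors the check added to powerpc_excp(): returns 1 when the
 * hook emulated the call, 0 when normal delivery should continue. */
static int dispatch_syscall(MockCPU *env, int lev)
{
    assert(env != NULL);
    if (lev == 1 && env->emulate_hypercall) {
        env->emulate_hypercall(env, env->hcall_opaque);
        return 1;
    }
    return 0;
}
```

Passing an opaque pointer keeps the CPU code ignorant of the machine's hypercall tables; the machine decides what state the handler needs.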

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h    |    2 ++
 target-ppc/helper.c |    4 ++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index a20c132..eaddc27 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -692,6 +692,8 @@ struct CPUPPCState {
     int bfd_mach;
     uint32_t flags;
     uint64_t insns_flags;
+    void (*emulate_hypercall)(CPUState *, void *);
+    void *hcall_opaque;
 
     int error_code;
     uint32_t pending_interrupts;
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 2094ca3..19aa067 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -2152,6 +2152,10 @@ static inline void powerpc_excp(CPUState *env, int excp_model, int excp)
     case POWERPC_EXCP_SYSCALL:   /* System call exception                    */
         dump_syscall(env);
         lev = env->error_code;
+        if ((lev == 1) && env->emulate_hypercall) {
+            env->emulate_hypercall(env, env->hcall_opaque);
+            return;
+        }
         if (lev == 1 || (lpes0 == 0 && lpes1 == 0))
             new_msr |= (target_ulong)MSR_HVB;
         goto store_next;
-- 
1.7.1

* [Qemu-devel] [PATCH 04/26] Implement PowerPC slbmfee and slbmfev instructions
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (2 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 05/26] Implement missing parts of the logic for the POWER PURR David Gibson
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC
  To: agraf, qemu-devel; +Cc: paulus, anton

From: David Gibson <dwg@au1.ibm.com>

For a 64-bit PowerPC target, qemu correctly implements translation
through the segment lookaside buffer.  Likewise it supports the
slbmte instruction which is used to load entries into the SLB.

However, it does not emulate the slbmfee and slbmfev instructions
which read SLB entries back into registers.  Because these are
only occasionally used in guests (mostly for debugging), we get
away with it.

However, given the recent SLB cleanups, it becomes quite easy to
implement these, and thereby allow, amongst other things, a guest
Linux to use xmon's command to dump the SLB.
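A minimal model of the store/read-back round trip might look like the following. It follows the slot decoding used by ppc_store_slb() and the new ppc_load_slb_esid()/ppc_load_slb_vsid() (slot number in the low 12 bits of RB), but `mock_env`, `store_slb()`, `load_slb_word()` and the 64-entry `SLB_NR` are assumptions made for the sketch. It validates the slot before touching the array.

```c
#include <assert.h>
#include <stdint.h>

#define SLB_NR 64  /* assumed array size for the sketch */

typedef struct {
    uint64_t esid;
    uint64_t vsid;
} slb_entry;

typedef struct {
    slb_entry slb[SLB_NR];
    int slb_nr;  /* number of entries this CPU model implements */
} mock_env;

/* slbmte: slot in RB[52:63], ESID and V above it; RS is the VSID word. */
static int store_slb(mock_env *env, uint64_t rb, uint64_t rs)
{
    int slot = (int)(rb & 0xfff);

    assert(env->slb_nr <= SLB_NR);
    if (slot >= env->slb_nr) {
        return -1;  /* caller raises a program interrupt */
    }
    env->slb[slot].esid = rb & ~0xfffULL;
    env->slb[slot].vsid = rs;
    return 0;
}

/* slbmfee (want_vsid == 0) / slbmfev (want_vsid != 0). */
static int load_slb_word(const mock_env *env, uint64_t rb, int want_vsid,
                         uint64_t *rt)
{
    int slot = (int)(rb & 0xfff);

    if (slot >= env->slb_nr) {
        return -1;
    }
    *rt = want_vsid ? env->slb[slot].vsid : env->slb[slot].esid;
    return 0;
}
```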

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h       |    2 ++
 target-ppc/helper.c    |   26 ++++++++++++++++++++++++++
 target-ppc/helper.h    |    2 ++
 target-ppc/op_helper.c |   20 ++++++++++++++++++++
 target-ppc/translate.c |   29 ++++++++++++++++++++++++++++-
 5 files changed, 78 insertions(+), 1 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index eaddc27..9a7495a 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -781,6 +781,8 @@ void ppc_store_asr (CPUPPCState *env, target_ulong value);
 target_ulong ppc_load_slb (CPUPPCState *env, int slb_nr);
 target_ulong ppc_load_sr (CPUPPCState *env, int sr_nr);
 int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs);
+int ppc_load_slb_esid (CPUPPCState *env, target_ulong rb, target_ulong *rt);
+int ppc_load_slb_vsid (CPUPPCState *env, target_ulong rb, target_ulong *rt);
 #endif /* defined(TARGET_PPC64) */
 void ppc_store_sr (CPUPPCState *env, int srnum, target_ulong value);
 #endif /* !defined(CONFIG_USER_ONLY) */
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 19aa067..4830981 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -770,6 +770,32 @@ int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs)
 
     return 0;
 }
+
+int ppc_load_slb_esid (CPUPPCState *env, target_ulong rb, target_ulong *rt)
+{
+    int slot = rb & 0xfff;
+    ppc_slb_t *slb = &env->slb[slot];
+
+    if (slot >= env->slb_nr) {
+        return -1;
+    }
+
+    *rt = slb->esid;
+    return 0;
+}
+
+int ppc_load_slb_vsid (CPUPPCState *env, target_ulong rb, target_ulong *rt)
+{
+    int slot = rb & 0xfff;
+    ppc_slb_t *slb = &env->slb[slot];
+
+    if (slot >= env->slb_nr) {
+        return -1;
+    }
+
+    *rt = slb->vsid;
+    return 0;
+}
 #endif /* defined(TARGET_PPC64) */
 
 /* Perform segment based translation */
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index d512cb0..1a69cf8 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -341,6 +341,8 @@ DEF_HELPER_FLAGS_0(tlbia, TCG_CALL_CONST, void)
 DEF_HELPER_FLAGS_1(tlbie, TCG_CALL_CONST, void, tl)
 #if defined(TARGET_PPC64)
 DEF_HELPER_FLAGS_2(store_slb, TCG_CALL_CONST, void, tl, tl)
+DEF_HELPER_1(load_slb_esid, tl, tl)
+DEF_HELPER_1(load_slb_vsid, tl, tl)
 DEF_HELPER_FLAGS_0(slbia, TCG_CALL_CONST, void)
 DEF_HELPER_FLAGS_1(slbie, TCG_CALL_CONST, void, tl)
 #endif
diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
index bf41627..bdb1f17 100644
--- a/target-ppc/op_helper.c
+++ b/target-ppc/op_helper.c
@@ -3753,6 +3753,26 @@ void helper_store_slb (target_ulong rb, target_ulong rs)
     }
 }
 
+target_ulong helper_load_slb_esid (target_ulong rb)
+{
+    target_ulong rt;
+
+    if (ppc_load_slb_esid(env, rb, &rt) < 0) {
+        helper_raise_exception_err(POWERPC_EXCP_PROGRAM, POWERPC_EXCP_INVAL);
+    }
+    return rt;
+}
+
+target_ulong helper_load_slb_vsid (target_ulong rb)
+{
+    target_ulong rt;
+
+    if (ppc_load_slb_vsid(env, rb, &rt) < 0) {
+        helper_raise_exception_err(POWERPC_EXCP_PROGRAM, POWERPC_EXCP_INVAL);
+    }
+    return rt;
+}
+
 void helper_slbia (void)
 {
     ppc_slb_invalidate_all(env);
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 89413c5..2b1a851 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -4227,6 +4227,31 @@ static void gen_slbmte(DisasContext *ctx)
 #endif
 }
 
+static void gen_slbmfee(DisasContext *ctx)
+{
+#if defined(CONFIG_USER_ONLY)
+    gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
+#else
+    if (unlikely(!ctx->mem_idx)) {
+        gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
+        return;
+    }
+    gen_helper_load_slb_esid(cpu_gpr[rS(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);
+#endif
+}
+
+static void gen_slbmfev(DisasContext *ctx)
+{
+#if defined(CONFIG_USER_ONLY)
+    gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
+#else
+    if (unlikely(!ctx->mem_idx)) {
+        gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
+        return;
+    }
+    gen_helper_load_slb_vsid(cpu_gpr[rS(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);
+#endif
+}
 #endif /* defined(TARGET_PPC64) */
 
 /***                      Lookaside buffer management                      ***/
@@ -8110,7 +8135,9 @@ GEN_HANDLER2(mfsrin_64b, "mfsrin", 0x1F, 0x13, 0x14, 0x001F0001,
 GEN_HANDLER2(mtsr_64b, "mtsr", 0x1F, 0x12, 0x06, 0x0010F801, PPC_SEGMENT_64B),
 GEN_HANDLER2(mtsrin_64b, "mtsrin", 0x1F, 0x12, 0x07, 0x001F0001,
              PPC_SEGMENT_64B),
-GEN_HANDLER2(slbmte, "slbmte", 0x1F, 0x12, 0x0C, 0x00000000, PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmte, "slbmte", 0x1F, 0x12, 0x0C, 0x001F0001, PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfee, "slbmfee", 0x1F, 0x13, 0x1C, 0x001F0001, PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfev, "slbmfev", 0x1F, 0x13, 0x1A, 0x001F0001, PPC_SEGMENT_64B),
 #endif
 GEN_HANDLER(tlbia, 0x1F, 0x12, 0x0B, 0x03FFFC01, PPC_MEM_TLBIA),
 GEN_HANDLER(tlbiel, 0x1F, 0x12, 0x08, 0x03FF0001, PPC_MEM_TLBIE),
-- 
1.7.1

* [Qemu-devel] [PATCH 05/26] Implement missing parts of the logic for the POWER PURR
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (3 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 04/26] Implement PowerPC slbmfee and slbmfev instructions David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 06/26] Correct ppc popcntb logic, implement popcntw and popcntd David Gibson
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC
  To: agraf, qemu-devel; +Cc: paulus, anton

From: David Gibson <dwg@au1.ibm.com>

The PURR (Processor Utilization Resource Register) is a register found
on recent POWER CPUs.  Most of what is needed to implement it, at least
well enough to get by, is already present in qemu; however, some of the
helper functions needed to actually wire it up are missing.

This patch adds the necessary glue, so that the PURR can be wired up
when we implement newer POWER CPU targets which include it.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/helper.h         |    1 +
 target-ppc/op_helper.c      |    5 +++++
 target-ppc/translate_init.c |    6 ++++++
 3 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 1a69cf8..4227897 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -371,6 +371,7 @@ DEF_HELPER_0(load_tbl, tl)
 DEF_HELPER_0(load_tbu, tl)
 DEF_HELPER_0(load_atbl, tl)
 DEF_HELPER_0(load_atbu, tl)
+DEF_HELPER_0(load_purr, tl)
 DEF_HELPER_0(load_601_rtcl, tl)
 DEF_HELPER_0(load_601_rtcu, tl)
 #if !defined(CONFIG_USER_ONLY)
diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
index bdb1f17..b9b5ae2 100644
--- a/target-ppc/op_helper.c
+++ b/target-ppc/op_helper.c
@@ -86,6 +86,11 @@ target_ulong helper_load_atbu (void)
     return cpu_ppc_load_atbu(env);
 }
 
+target_ulong helper_load_purr (void)
+{
+    return (target_ulong)cpu_ppc_load_purr(env);
+}
+
 target_ulong helper_load_601_rtcl (void)
 {
     return cpu_ppc601_load_rtcl(env);
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 7c08b1c..9d2e4a1 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -251,6 +251,12 @@ static void spr_write_atbu (void *opaque, int sprn, int gprn)
 {
     gen_helper_store_atbu(cpu_gpr[gprn]);
 }
+
+__attribute__ (( unused ))
+static void spr_read_purr(void *opaque, int gprn, int sprn)
+{
+    gen_helper_load_purr(cpu_gpr[gprn]);
+}
 #endif
 
 #if !defined(CONFIG_USER_ONLY)
-- 
1.7.1

* [Qemu-devel] [PATCH 06/26] Correct ppc popcntb logic, implement popcntw and popcntd
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (4 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 05/26] Implement missing parts of the logic for the POWER PURR David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 07/26] Clean up slb_lookup() function David Gibson
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC
  To: agraf, qemu-devel; +Cc: paulus, anton

From: David Gibson <dwg@au1.ibm.com>

qemu already includes support for the popcntb instruction introduced
in POWER5 (although it doesn't actually allow you to choose POWER5).

However, the logic is slightly incorrect: it generates results
truncated to 32 bits when the CPU is in 32-bit mode.  This is not
normal for powerpc: arithmetic instructions on a 64-bit powerpc CPU
generally produce full 64-bit results, and only the low 32 bits are
significant for condition codes.

This patch corrects this nit, which actually simplifies the code slightly.
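To pin down the intended semantics, a deliberately naive reference model of popcntb is sketched below; it is not the SWAR helper from the patch, but could be used to cross-check it. `ref_popcntb()` is a hypothetical name. All eight bytes are counted, so truncating the result to 32 bits, as the old code did in 32-bit mode, drops the counts for the upper four bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Byte i of the result is the number of set bits in byte i of src;
 * every byte is processed, independent of 32-/64-bit mode. */
static uint64_t ref_popcntb(uint64_t src)
{
    uint64_t out = 0;

    for (int byte = 0; byte < 8; byte++) {
        unsigned b = (unsigned)((src >> (8 * byte)) & 0xff);
        uint64_t n = 0;

        while (b) {
            n += b & 1;
            b >>= 1;
        }
        out |= n << (8 * byte);
    }
    return out;
}
```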

In addition, this patch implements the popcntw and popcntd
instructions added in POWER7, in preparation for allowing POWER7 as an
emulated CPU.
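The new instructions differ from popcntb only in field width: popcntw counts the bits in each 32-bit word, popcntd in the whole doubleword, each count landing in the low-order bits of its field. Assuming a GCC- or Clang-compatible compiler, the population-count builtins give a compact cross-check for the helpers in this patch; `popcntw_builtin()` and `popcntd_builtin()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* popcntw: each 32-bit word of the result holds the bit count of the
 * corresponding word of src (GCC/Clang builtin assumed). */
static uint64_t popcntw_builtin(uint64_t src)
{
    uint64_t hi = (uint64_t)__builtin_popcount((uint32_t)(src >> 32));
    uint64_t lo = (uint64_t)__builtin_popcount((uint32_t)src);

    return (hi << 32) | lo;
}

/* popcntd: the bit count of the whole 64-bit doubleword. */
static uint64_t popcntd_builtin(uint64_t src)
{
    return (uint64_t)__builtin_popcountll(src);
}
```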

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h       |    2 +
 target-ppc/helper.h    |    3 +-
 target-ppc/op_helper.c |   55 +++++++++++++++++++++++++++++++++++++++++++----
 target-ppc/translate.c |   20 +++++++++++++----
 4 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 9a7495a..f9ad3b8 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1507,6 +1507,8 @@ enum {
     PPC_DCRX           = 0x2000000000000000ULL,
     /* user-mode DCR access, implemented in PowerPC 460                      */
     PPC_DCRUX          = 0x4000000000000000ULL,
+    /* popcntw and popcntd instructions                                      */
+    PPC_POPCNTWD       = 0x8000000000000000ULL,
 };
 
 /*****************************************************************************/
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 4227897..19c5ebe 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -38,10 +38,11 @@ DEF_HELPER_2(mulldo, i64, i64, i64)
 
 DEF_HELPER_FLAGS_1(cntlzw, TCG_CALL_CONST | TCG_CALL_PURE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntb, TCG_CALL_CONST | TCG_CALL_PURE, tl, tl)
+DEF_HELPER_FLAGS_1(popcntw, TCG_CALL_CONST | TCG_CALL_PURE, tl, tl)
 DEF_HELPER_2(sraw, tl, tl, tl)
 #if defined(TARGET_PPC64)
 DEF_HELPER_FLAGS_1(cntlzd, TCG_CALL_CONST | TCG_CALL_PURE, tl, tl)
-DEF_HELPER_FLAGS_1(popcntb_64, TCG_CALL_CONST | TCG_CALL_PURE, tl, tl)
+DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_CONST | TCG_CALL_PURE, tl, tl)
 DEF_HELPER_2(srad, tl, tl, tl)
 #endif
 
diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
index b9b5ae2..9dd3217 100644
--- a/target-ppc/op_helper.c
+++ b/target-ppc/op_helper.c
@@ -497,6 +497,50 @@ target_ulong helper_srad (target_ulong value, target_ulong shift)
 }
 #endif
 
+#if defined(TARGET_PPC64)
+target_ulong helper_popcntb (target_ulong val)
+{
+    val = (val & 0x5555555555555555ULL) + ((val >>  1) &
+                                           0x5555555555555555ULL);
+    val = (val & 0x3333333333333333ULL) + ((val >>  2) &
+                                           0x3333333333333333ULL);
+    val = (val & 0x0f0f0f0f0f0f0f0fULL) + ((val >>  4) &
+                                           0x0f0f0f0f0f0f0f0fULL);
+    return val;
+}
+
+target_ulong helper_popcntw (target_ulong val)
+{
+    val = (val & 0x5555555555555555ULL) + ((val >>  1) &
+                                           0x5555555555555555ULL);
+    val = (val & 0x3333333333333333ULL) + ((val >>  2) &
+                                           0x3333333333333333ULL);
+    val = (val & 0x0f0f0f0f0f0f0f0fULL) + ((val >>  4) &
+                                           0x0f0f0f0f0f0f0f0fULL);
+    val = (val & 0x00ff00ff00ff00ffULL) + ((val >>  8) &
+                                           0x00ff00ff00ff00ffULL);
+    val = (val & 0x0000ffff0000ffffULL) + ((val >> 16) &
+                                           0x0000ffff0000ffffULL);
+    return val;
+}
+
+target_ulong helper_popcntd (target_ulong val)
+{
+    val = (val & 0x5555555555555555ULL) + ((val >>  1) &
+                                           0x5555555555555555ULL);
+    val = (val & 0x3333333333333333ULL) + ((val >>  2) &
+                                           0x3333333333333333ULL);
+    val = (val & 0x0f0f0f0f0f0f0f0fULL) + ((val >>  4) &
+                                           0x0f0f0f0f0f0f0f0fULL);
+    val = (val & 0x00ff00ff00ff00ffULL) + ((val >>  8) &
+                                           0x00ff00ff00ff00ffULL);
+    val = (val & 0x0000ffff0000ffffULL) + ((val >> 16) &
+                                           0x0000ffff0000ffffULL);
+    val = (val & 0x00000000ffffffffULL) + ((val >> 32) &
+                                           0x00000000ffffffffULL);
+    return val;
+}
+#else
 target_ulong helper_popcntb (target_ulong val)
 {
     val = (val & 0x55555555) + ((val >>  1) & 0x55555555);
@@ -505,12 +549,13 @@ target_ulong helper_popcntb (target_ulong val)
     return val;
 }
 
-#if defined(TARGET_PPC64)
-target_ulong helper_popcntb_64 (target_ulong val)
+target_ulong helper_popcntw (target_ulong val)
 {
-    val = (val & 0x5555555555555555ULL) + ((val >>  1) & 0x5555555555555555ULL);
-    val = (val & 0x3333333333333333ULL) + ((val >>  2) & 0x3333333333333333ULL);
-    val = (val & 0x0f0f0f0f0f0f0f0fULL) + ((val >>  4) & 0x0f0f0f0f0f0f0f0fULL);
+    val = (val & 0x55555555) + ((val >>  1) & 0x55555555);
+    val = (val & 0x33333333) + ((val >>  2) & 0x33333333);
+    val = (val & 0x0f0f0f0f) + ((val >>  4) & 0x0f0f0f0f);
+    val = (val & 0x00ff00ff) + ((val >>  8) & 0x00ff00ff);
+    val = (val & 0x0000ffff) + ((val >> 16) & 0x0000ffff);
     return val;
 }
 #endif
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 2b1a851..5c28ac3 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1483,13 +1483,21 @@ static void gen_xoris(DisasContext *ctx)
 /* popcntb : PowerPC 2.03 specification */
 static void gen_popcntb(DisasContext *ctx)
 {
+    gen_helper_popcntb(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+}
+
+static void gen_popcntw(DisasContext *ctx)
+{
+    gen_helper_popcntw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+}
+
 #if defined(TARGET_PPC64)
-    if (ctx->sf_mode)
-        gen_helper_popcntb_64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
-    else
-#endif
-        gen_helper_popcntb(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+/* popcntd: PowerPC 2.06 specification */
+static void gen_popcntd(DisasContext *ctx)
+{
+    gen_helper_popcntd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
 }
+#endif
 
 #if defined(TARGET_PPC64)
 /* extsw & extsw. */
@@ -8034,7 +8042,9 @@ GEN_HANDLER(oris, 0x19, 0xFF, 0xFF, 0x00000000, PPC_INTEGER),
 GEN_HANDLER(xori, 0x1A, 0xFF, 0xFF, 0x00000000, PPC_INTEGER),
 GEN_HANDLER(xoris, 0x1B, 0xFF, 0xFF, 0x00000000, PPC_INTEGER),
 GEN_HANDLER(popcntb, 0x1F, 0x03, 0x03, 0x0000F801, PPC_POPCNTB),
+GEN_HANDLER(popcntw, 0x1F, 0x1A, 0x0b, 0x0000F801, PPC_POPCNTWD),
 #if defined(TARGET_PPC64)
+GEN_HANDLER(popcntd, 0x1F, 0x1A, 0x0F, 0x0000F801, PPC_POPCNTWD),
 GEN_HANDLER(cntlzd, 0x1F, 0x1A, 0x01, 0x00000000, PPC_64B),
 #endif
 GEN_HANDLER(rlwimi, 0x14, 0xFF, 0xFF, 0x00000000, PPC_INTEGER),
-- 
1.7.1
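
The popcnt helpers in this patch all use the standard SWAR ("SIMD within a register") bit-counting technique: each step adds neighbouring fields of the previous step's width, doubling the field width. The following is a minimal standalone sketch of that technique on a plain uint64_t, assuming 64-bit register semantics. The function names popcntb64/popcntw64/popcntd64 are hypothetical and are not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* popcntb: each byte of the result holds the number of set bits in the
 * corresponding byte of the input (three folding steps: 1, 2, 4 bits). */
static uint64_t popcntb64(uint64_t v)
{
    v = (v & 0x5555555555555555ULL) + ((v >> 1) & 0x5555555555555555ULL);
    v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);
    v = (v & 0x0f0f0f0f0f0f0f0fULL) + ((v >> 4) & 0x0f0f0f0f0f0f0f0fULL);
    return v;
}

/* popcntw: two more folding steps (8 and 16 bits) give one count per
 * 32-bit word, held in the low bits of each word. */
static uint64_t popcntw64(uint64_t v)
{
    v = popcntb64(v);
    v = (v & 0x00ff00ff00ff00ffULL) + ((v >> 8) & 0x00ff00ff00ff00ffULL);
    v = (v & 0x0000ffff0000ffffULL) + ((v >> 16) & 0x0000ffff0000ffffULL);
    return v;
}

/* popcntd: a final 32-bit fold gives the count for the whole doubleword. */
static uint64_t popcntd64(uint64_t v)
{
    v = popcntw64(v);
    v = (v & 0x00000000ffffffffULL) + ((v >> 32) & 0x00000000ffffffffULL);
    return v;
}
```

For example, popcntw64(0xFFFFFFFF00000001) yields 0x0000002000000001: 32 set bits in the high word and 1 in the low word. Sharing the early steps is also why the patch's 64-bit popcntw and popcntd helpers begin with the same masks as popcntb.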


* [Qemu-devel] [PATCH 07/26] Clean up slb_lookup() function
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (5 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 06/26] Correct ppc popcntb logic, implement popcntw and popcntd David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 08/26] Parse SDR1 on mtspr instead of at translate time David Gibson
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

The slb_lookup() function, used in the ppc translation path, returns a
number of SLB entry fields in reference parameters.  However, only one
of the two callers of slb_lookup() actually wants this information.

This patch, therefore, makes slb_lookup() return a simple pointer to the
located SLB entry (or NULL), and the caller which needs the fields can
extract them itself.
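
The pointer-or-NULL interface can be sketched outside QEMU as follows. This is a simplified illustration rather than the patched code: the type sketch_slb_t, the function sketch_slb_lookup() and the SKETCH_* constants are hypothetical stand-ins, and the bit positions assume a 256M segment size with the valid bit at bit 27.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for this sketch only: 256M segments, valid bit 27. */
#define SKETCH_SEGMENT_MASK_256M (~((1ULL << 28) - 1))
#define SKETCH_SLB_ESID_V        (1ULL << 27)

/* Simplified SLB entry: just the two doublewords the lookup needs. */
typedef struct {
    uint64_t esid;   /* effective segment ID | valid bit */
    uint64_t vsid;   /* virtual segment ID | attribute bits */
} sketch_slb_t;

/* Return the matching entry, or NULL on an SLB miss, instead of copying
 * individual fields out through reference parameters. */
static sketch_slb_t *sketch_slb_lookup(sketch_slb_t *slb, int nr,
                                       uint64_t eaddr)
{
    uint64_t esid = (eaddr & SKETCH_SEGMENT_MASK_256M) | SKETCH_SLB_ESID_V;

    for (int n = 0; n < nr; n++) {
        if (slb[n].esid == esid) {
            return &slb[n];
        }
    }
    return NULL;
}
```

A caller that only needs to invalidate the entry uses the pointer directly. A caller that needs the VSID or attribute bits reads them from the returned entry's vsid field, as get_segment() does after this patch.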

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/helper.c |   45 ++++++++++++++++++---------------------------
 1 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 4830981..73d93ca 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -672,9 +672,7 @@ static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
 }
 
 #if defined(TARGET_PPC64)
-static inline int slb_lookup(CPUPPCState *env, target_ulong eaddr,
-                             target_ulong *vsid, target_ulong *page_mask,
-                             int *attr, int *target_page_bits)
+static inline ppc_slb_t *slb_lookup(CPUPPCState *env, target_ulong eaddr)
 {
     uint64_t esid;
     int n;
@@ -689,19 +687,11 @@ static inline int slb_lookup(CPUPPCState *env, target_ulong eaddr,
         LOG_SLB("%s: slot %d %016" PRIx64 " %016"
                     PRIx64 "\n", __func__, n, slb->esid, slb->vsid);
         if (slb->esid == esid) {
-            *vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
-            *page_mask = ~SEGMENT_MASK_256M;
-            *attr = slb->vsid & SLB_VSID_ATTR;
-            if (target_page_bits) {
-                *target_page_bits = (slb->vsid & SLB_VSID_L)
-                    ? TARGET_PAGE_BITS_16M
-                    : TARGET_PAGE_BITS;
-            }
-            return n;
+            return slb;
         }
     }
 
-    return -5;
+    return NULL;
 }
 
 void ppc_slb_invalidate_all (CPUPPCState *env)
@@ -728,18 +718,13 @@ void ppc_slb_invalidate_all (CPUPPCState *env)
 
 void ppc_slb_invalidate_one (CPUPPCState *env, uint64_t T0)
 {
-    target_ulong vsid, page_mask;
-    int attr;
-    int n;
     ppc_slb_t *slb;
 
-    n = slb_lookup(env, T0, &vsid, &page_mask, &attr, NULL);
-    if (n < 0) {
+    slb = slb_lookup(env, T0);
+    if (!slb) {
         return;
     }
 
-    slb = &env->slb[n];
-
     if (slb->esid & SLB_ESID_V) {
         slb->esid &= ~SLB_ESID_V;
 
@@ -818,16 +803,22 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
     pr = msr_pr;
 #if defined(TARGET_PPC64)
     if (env->mmu_model & POWERPC_MMU_64) {
-        int attr;
+        ppc_slb_t *slb;
 
         LOG_MMU("Check SLBs\n");
-        ret = slb_lookup(env, eaddr, &vsid, &page_mask, &attr,
-                         &target_page_bits);
-        if (ret < 0)
-            return ret;
-        ctx->key = !!(pr ? (attr & SLB_VSID_KP) : (attr & SLB_VSID_KS));
+        slb = slb_lookup(env, eaddr);
+        if (!slb) {
+            return -5;
+        }
+
+        vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
+        page_mask = ~SEGMENT_MASK_256M;
+        target_page_bits = (slb->vsid & SLB_VSID_L)
+            ? TARGET_PAGE_BITS_16M : TARGET_PAGE_BITS;
+        ctx->key = !!(pr ? (slb->vsid & SLB_VSID_KP)
+                      : (slb->vsid & SLB_VSID_KS));
         ds = 0;
-        ctx->nx = !!(attr & SLB_VSID_N);
+        ctx->nx = !!(slb->vsid & SLB_VSID_N);
         ctx->eaddr = eaddr;
         vsid_mask = 0x00003FFFFFFFFF80ULL;
         vsid_sh = 7;
-- 
1.7.1


* [Qemu-devel] [PATCH 08/26] Parse SDR1 on mtspr instead of at translate time
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (6 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 07/26] Clean up slb_lookup() function David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 09/26] Use "hash" more consistently in ppc mmu code David Gibson
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

On ppc machines with hash table MMUs, the special purpose register SDR1
contains both the base address and the encoded size of the hashed page table.

At present, we interpret the SDR1 value within the address translation
path.  But because the size encodings differ between 32-bit and 64-bit,
this makes for a confusing branch on the MMU type, with a number of
awkward shifts and masks, in the middle of the translate path.

This patch cleans things up by moving the interpretation of SDR1 into the
helper function handling the write to the register.  This leaves a simple
pre-sanitized base address and mask for the hash table in the CPUState
structure which is easier to work with in the translation path.
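
As a rough standalone sketch of that decoding, the mask values below are copied from the SDR_* definitions this patch adds to cpu.h, and the arithmetic mirrors the new ppc_store_sdr1(). The htab_geom type and the decode_sdr1_* function names are hypothetical and do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Masks as defined in target-ppc/cpu.h by this patch. */
#define SDR_32_HTABORG   0xFFFF0000UL
#define SDR_32_HTABMASK  0x000001FFUL
#define SDR_64_HTABORG   0xFFFFFFFFFFFC0000ULL
#define SDR_64_HTABSIZE  0x000000000000001FULL

/* Hypothetical stand-in for the new htab_base/htab_mask CPUState fields. */
typedef struct {
    uint64_t htab_base;   /* physical base address of the hash table */
    uint64_t htab_mask;   /* table size in bytes, minus one */
} htab_geom;

static htab_geom decode_sdr1_64(uint64_t sdr1)
{
    htab_geom g;
    uint64_t htabsize = sdr1 & SDR_64_HTABSIZE;

    if (htabsize > 28) {      /* out of range: clamp, as the patch does */
        htabsize = 28;
    }
    /* HTABSIZE is log2(table bytes) - 18, so the minimum table is 256 KiB. */
    g.htab_mask = (1ULL << (htabsize + 18)) - 1;
    g.htab_base = sdr1 & SDR_64_HTABORG;
    return g;
}

static htab_geom decode_sdr1_32(uint32_t sdr1)
{
    htab_geom g;

    /* HTABMASK widens the minimum 64 KiB mask by up to 9 more bits. */
    g.htab_mask = ((uint64_t)(sdr1 & SDR_32_HTABMASK) << 16) | 0xFFFF;
    g.htab_base = sdr1 & SDR_32_HTABORG;
    return g;
}
```

With the decoding done once at mtspr time, the translation path only has to compute htab_base plus an offset masked by htab_mask.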

This makes the translation path more readable.  It addresses the FIXME
comment currently in the mtsdr1 helper by validating the SDR1 value during
interpretation.  Finally, it opens the way for emulating a pSeries-style
partition where the hash table used for translation is not mapped into
the guest's RAM.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 monitor.c                   |    2 +-
 target-ppc/cpu.h            |   11 +++++-
 target-ppc/helper.c         |   79 ++++++++++++++++++++++++-------------------
 target-ppc/kvm.c            |    2 +-
 target-ppc/machine.c        |    6 ++-
 target-ppc/translate.c      |    2 +-
 target-ppc/translate_init.c |    7 +---
 7 files changed, 62 insertions(+), 47 deletions(-)

diff --git a/monitor.c b/monitor.c
index 22ae3bb..cbc6cca 100644
--- a/monitor.c
+++ b/monitor.c
@@ -3457,7 +3457,7 @@ static const MonitorDef monitor_defs[] = {
     { "asr", offsetof(CPUState, asr) },
 #endif
     /* Segment registers */
-    { "sdr1", offsetof(CPUState, sdr1) },
+    { "sdr1", offsetof(CPUState, spr[SPR_SDR1]) },
     { "sr0", offsetof(CPUState, sr[0]) },
     { "sr1", offsetof(CPUState, sr[1]) },
     { "sr2", offsetof(CPUState, sr[2]) },
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index f9ad3b8..42d0973 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -359,6 +359,14 @@ union ppc_tlb_t {
 };
 #endif
 
+#define SDR_32_HTABORG         0xFFFF0000UL
+#define SDR_32_HTABMASK        0x000001FFUL
+
+#if defined(TARGET_PPC64)
+#define SDR_64_HTABORG         0xFFFFFFFFFFFC0000ULL
+#define SDR_64_HTABSIZE        0x000000000000001FULL
+#endif /* defined(TARGET_PPC64 */
+
 typedef struct ppc_slb_t ppc_slb_t;
 struct ppc_slb_t {
     uint64_t esid;
@@ -642,7 +650,8 @@ struct CPUPPCState {
     int slb_nr;
 #endif
     /* segment registers */
-    target_ulong sdr1;
+    target_phys_addr_t htab_base;
+    target_phys_addr_t htab_mask;
     target_ulong sr[32];
     /* BATs */
     int nb_BATs;
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 73d93ca..df90722 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -784,20 +784,19 @@ int ppc_load_slb_vsid (CPUPPCState *env, target_ulong rb, target_ulong *rt)
 #endif /* defined(TARGET_PPC64) */
 
 /* Perform segment based translation */
-static inline target_phys_addr_t get_pgaddr(target_phys_addr_t sdr1,
-                                            int sdr_sh,
-                                            target_phys_addr_t hash,
-                                            target_phys_addr_t mask)
+static inline target_phys_addr_t get_pgaddr(target_phys_addr_t htab_base,
+                                            target_phys_addr_t htab_mask,
+                                            target_phys_addr_t hash)
 {
-    return (sdr1 & ((target_phys_addr_t)(-1ULL) << sdr_sh)) | (hash & mask);
+    return htab_base | (hash & htab_mask);
 }
 
 static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                               target_ulong eaddr, int rw, int type)
 {
-    target_phys_addr_t sdr, hash, mask, sdr_mask, htab_mask;
+    target_phys_addr_t hash;
     target_ulong sr, vsid, vsid_mask, pgidx, page_mask;
-    int ds, vsid_sh, sdr_sh, pr, target_page_bits;
+    int ds, vsid_sh, pr, target_page_bits;
     int ret, ret2;
 
     pr = msr_pr;
@@ -822,8 +821,6 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         ctx->eaddr = eaddr;
         vsid_mask = 0x00003FFFFFFFFF80ULL;
         vsid_sh = 7;
-        sdr_sh = 18;
-        sdr_mask = 0x3FF80;
     } else
 #endif /* defined(TARGET_PPC64) */
     {
@@ -836,8 +833,6 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         vsid = sr & 0x00FFFFFF;
         vsid_mask = 0x01FFFFC0;
         vsid_sh = 6;
-        sdr_sh = 16;
-        sdr_mask = 0xFFC0;
         target_page_bits = TARGET_PAGE_BITS;
         LOG_MMU("Check segment v=" TARGET_FMT_lx " %d " TARGET_FMT_lx " nip="
                 TARGET_FMT_lx " lr=" TARGET_FMT_lx
@@ -853,29 +848,26 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         if (type != ACCESS_CODE || ctx->nx == 0) {
             /* Page address translation */
             /* Primary table address */
-            sdr = env->sdr1;
             pgidx = (eaddr & page_mask) >> target_page_bits;
 #if defined(TARGET_PPC64)
             if (env->mmu_model & POWERPC_MMU_64) {
-                htab_mask = 0x0FFFFFFF >> (28 - (sdr & 0x1F));
                 /* XXX: this is false for 1 TB segments */
                 hash = ((vsid ^ pgidx) << vsid_sh) & vsid_mask;
             } else
 #endif
             {
-                htab_mask = sdr & 0x000001FF;
                 hash = ((vsid ^ pgidx) << vsid_sh) & vsid_mask;
             }
-            mask = (htab_mask << sdr_sh) | sdr_mask;
-            LOG_MMU("sdr " TARGET_FMT_plx " sh %d hash " TARGET_FMT_plx
-                    " mask " TARGET_FMT_plx " " TARGET_FMT_lx "\n",
-                    sdr, sdr_sh, hash, mask, page_mask);
-            ctx->pg_addr[0] = get_pgaddr(sdr, sdr_sh, hash, mask);
+            LOG_MMU("htab_base " TARGET_FMT_plx " htab_mask " TARGET_FMT_plx
+                    " hash " TARGET_FMT_plx "\n",
+                    env->htab_base, env->htab_mask, hash);
+            ctx->pg_addr[0] = get_pgaddr(env->htab_base, env->htab_mask, hash);
             /* Secondary table address */
             hash = (~hash) & vsid_mask;
-            LOG_MMU("sdr " TARGET_FMT_plx " sh %d hash " TARGET_FMT_plx
-                    " mask " TARGET_FMT_plx "\n", sdr, sdr_sh, hash, mask);
-            ctx->pg_addr[1] = get_pgaddr(sdr, sdr_sh, hash, mask);
+            LOG_MMU("htab_base " TARGET_FMT_plx " htab_mask " TARGET_FMT_plx
+                    " hash " TARGET_FMT_plx "\n",
+                    env->htab_base, env->htab_mask, hash);
+            ctx->pg_addr[1] = get_pgaddr(env->htab_base, env->htab_mask, hash);
 #if defined(TARGET_PPC64)
             if (env->mmu_model & POWERPC_MMU_64) {
                 /* Only 5 bits of the page index are used in the AVPN */
@@ -897,19 +889,21 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                 /* Software TLB search */
                 ret = ppc6xx_tlb_check(env, ctx, eaddr, rw, type);
             } else {
-                LOG_MMU("0 sdr1=" TARGET_FMT_plx " vsid=" TARGET_FMT_lx " "
-                        "api=" TARGET_FMT_lx " hash=" TARGET_FMT_plx
-                        " pg_addr=" TARGET_FMT_plx "\n",
-                        sdr, vsid, pgidx, hash, ctx->pg_addr[0]);
+                LOG_MMU("0 htab=" TARGET_FMT_plx "/" TARGET_FMT_plx
+                        " vsid=" TARGET_FMT_lx " api=" TARGET_FMT_lx
+                        " hash=" TARGET_FMT_plx " pg_addr=" TARGET_FMT_plx "\n",
+                        env->htab_base, env->htab_mask, vsid, pgidx, hash,
+                        ctx->pg_addr[0]);
                 /* Primary table lookup */
                 ret = find_pte(env, ctx, 0, rw, type, target_page_bits);
                 if (ret < 0) {
                     /* Secondary table lookup */
                     if (eaddr != 0xEFFFFFFF)
-                        LOG_MMU("1 sdr1=" TARGET_FMT_plx " vsid=" TARGET_FMT_lx " "
-                                "api=" TARGET_FMT_lx " hash=" TARGET_FMT_plx
-                                " pg_addr=" TARGET_FMT_plx "\n", sdr, vsid,
-                                pgidx, hash, ctx->pg_addr[1]);
+                        LOG_MMU("1 htab=" TARGET_FMT_plx "/" TARGET_FMT_plx
+                                " vsid=" TARGET_FMT_lx " api=" TARGET_FMT_lx
+                                " hash=" TARGET_FMT_plx " pg_addr=" TARGET_FMT_plx "\n",
+                                env->htab_base, env->htab_mask, vsid, pgidx, hash,
+                                ctx->pg_addr[1]);
                     ret2 = find_pte(env, ctx, 1, rw, type,
                                     target_page_bits);
                     if (ret2 != -1)
@@ -1915,11 +1909,26 @@ void ppc_store_asr (CPUPPCState *env, target_ulong value)
 void ppc_store_sdr1 (CPUPPCState *env, target_ulong value)
 {
     LOG_MMU("%s: " TARGET_FMT_lx "\n", __func__, value);
-    if (env->sdr1 != value) {
-        /* XXX: for PowerPC 64, should check that the HTABSIZE value
-         *      is <= 28
-         */
-        env->sdr1 = value;
+    if (env->spr[SPR_SDR1] != value) {
+        env->spr[SPR_SDR1] = value;
+#if defined(TARGET_PPC64)
+        if (env->mmu_model & POWERPC_MMU_64) {
+            target_ulong htabsize = value & SDR_64_HTABSIZE;
+
+            if (htabsize > 28) {
+                fprintf(stderr, "Invalid HTABSIZE 0x" TARGET_FMT_lx
+                        " stored in SDR1\n", htabsize);
+                htabsize = 28;
+            }
+            env->htab_mask = (1ULL << (htabsize + 18)) - 1;
+            env->htab_base = value & SDR_64_HTABORG;
+        } else
+#endif /* defined(TARGET_PPC64) */
+        {
+            /* FIXME: Should check for valid HTABMASK values */
+            env->htab_mask = ((value & SDR_32_HTABMASK) << 16) | 0xFFFF;
+            env->htab_base = value & SDR_32_HTABORG;
+        }
         tlb_flush(env, 1);
     }
 }
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index bd4012a..8938e28 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -169,7 +169,7 @@ int kvm_arch_get_registers(CPUState *env)
 
 #ifdef KVM_CAP_PPC_SEGSTATE
     if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_SEGSTATE)) {
-        env->sdr1 = sregs.u.s.sdr1;
+        ppc_store_sdr1(env, sregs.u.s.sdr1);
 
         /* Sync SLB */
 #ifdef TARGET_PPC64
diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index 67de951..0c1986e 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -37,7 +37,7 @@ void cpu_save(QEMUFile *f, void *opaque)
     qemu_put_betls(f, &env->asr);
     qemu_put_sbe32s(f, &env->slb_nr);
 #endif
-    qemu_put_betls(f, &env->sdr1);
+    qemu_put_betls(f, &env->spr[SPR_SDR1]);
     for (i = 0; i < 32; i++)
         qemu_put_betls(f, &env->sr[i]);
     for (i = 0; i < 2; i++)
@@ -93,6 +93,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
 {
     CPUState *env = (CPUState *)opaque;
     unsigned int i, j;
+    target_ulong sdr1;
 
     for (i = 0; i < 32; i++)
         qemu_get_betls(f, &env->gpr[i]);
@@ -124,7 +125,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
     qemu_get_betls(f, &env->asr);
     qemu_get_sbe32s(f, &env->slb_nr);
 #endif
-    qemu_get_betls(f, &env->sdr1);
+    qemu_get_betls(f, &sdr1);
     for (i = 0; i < 32; i++)
         qemu_get_betls(f, &env->sr[i]);
     for (i = 0; i < 2; i++)
@@ -152,6 +153,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
 #endif
     for (i = 0; i < 1024; i++)
         qemu_get_betls(f, &env->spr[i]);
+    ppc_store_sdr1(env, sdr1);
     qemu_get_be32s(f, &env->vscr);
     qemu_get_be64s(f, &env->spe_acc);
     qemu_get_be32s(f, &env->spe_fscr);
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 5c28ac3..561b756 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -8930,7 +8930,7 @@ void cpu_dump_state (CPUState *env, FILE *f, fprintf_function cpu_fprintf,
 #if !defined(CONFIG_USER_ONLY)
     cpu_fprintf(f, "SRR0 " TARGET_FMT_lx " SRR1 " TARGET_FMT_lx " SDR1 "
                 TARGET_FMT_lx "\n", env->spr[SPR_SRR0], env->spr[SPR_SRR1],
-                env->sdr1);
+                env->spr[SPR_SDR1]);
 #endif
 
 #undef RGPL
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 9d2e4a1..6270ec6 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -341,11 +341,6 @@ static void spr_write_dbatl_h (void *opaque, int sprn, int gprn)
 }
 
 /* SDR1 */
-static void spr_read_sdr1 (void *opaque, int gprn, int sprn)
-{
-    tcg_gen_ld_tl(cpu_gpr[gprn], cpu_env, offsetof(CPUState, sdr1));
-}
-
 static void spr_write_sdr1 (void *opaque, int sprn, int gprn)
 {
     gen_helper_store_sdr1(cpu_gpr[gprn]);
@@ -669,7 +664,7 @@ static void gen_spr_ne_601 (CPUPPCState *env)
     /* Memory management */
     spr_register(env, SPR_SDR1, "SDR1",
                  SPR_NOACCESS, SPR_NOACCESS,
-                 &spr_read_sdr1, &spr_write_sdr1,
+                 &spr_read_generic, &spr_write_sdr1,
                  0x00000000);
 }
 
-- 
1.7.1


* [Qemu-devel] [PATCH 09/26] Use "hash" more consistently in ppc mmu code
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (7 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 08/26] Parse SDR1 on mtspr instead of at translate time David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 10/26] Better factor the ppc hash translation path David Gibson
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

Currently, get_segment() has a variable called hash.  However, it doesn't
(quite) hold the hash value for the ppc hashed page table.  Instead it
holds the hash shifted and masked, which is effectively the offset of the
hash bucket within the hash page table.

As well as being different from the normal use of plain "hash" in the
architecture documentation, this usage necessitates some awkward 32/64-bit
dependent masks and shifts which clutter up the path in get_segment().

This patch alters the code to use raw hash values throughout get_segment(),
including storing raw hashes instead of PTE group addresses in the ctx
structure.  This cleans up the path noticeably.

This does necessitate 32/64 dependent shifts when the hash values are
taken out of the ctx structure and used, but those paths already have
32/64 bit variants so this is less awkward than it was in get_segment().
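
The conversion from a raw hash back to a PTE group location can be sketched in isolation. The function below is a hypothetical standalone version of the get_pteg_offset() helper added by this patch. It assumes, as the patch does, eight PTEs per group, 8-byte PTEs on 32-bit and 16-byte PTEs on 64-bit, and an htab_mask equal to the table size minus one.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_PTE_SIZE_32  8    /* bytes per 32-bit PTE, as in this patch */
#define HASH_PTE_SIZE_64  16   /* bytes per 64-bit PTE, as in this patch */
#define PTES_PER_PTEG     8    /* each PTE group holds eight PTEs */

/* Byte offset of the PTE group selected by a raw hash: scale the hash by
 * the group size, then wrap it with htab_mask so it stays in the table.
 * The PTE group address is then htab_base + offset. */
static uint64_t pteg_offset(uint64_t hash, int pte_size, uint64_t htab_mask)
{
    return (hash * (uint64_t)pte_size * PTES_PER_PTEG) & htab_mask;
}
```

Because the offset is masked after scaling, the secondary hash can simply be stored as ~hash. For a 64 KiB 32-bit table (htab_mask 0xFFFF), hash 5 selects offset 0x140 and ~5 selects 0xFE80, the mirror-image group.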

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h    |    5 ++-
 target-ppc/helper.c |   99 ++++++++++++++++++++++++--------------------------
 2 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 42d0973..592907a 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -367,6 +367,9 @@ union ppc_tlb_t {
 #define SDR_64_HTABSIZE        0x000000000000001FULL
 #endif /* defined(TARGET_PPC64 */
 
+#define HASH_PTE_SIZE_32       8
+#define HASH_PTE_SIZE_64       16
+
 typedef struct ppc_slb_t ppc_slb_t;
 struct ppc_slb_t {
     uint64_t esid;
@@ -746,7 +749,7 @@ struct mmu_ctx_t {
     target_phys_addr_t raddr;      /* Real address              */
     target_phys_addr_t eaddr;      /* Effective address         */
     int prot;                      /* Protection bits           */
-    target_phys_addr_t pg_addr[2]; /* PTE tables base addresses */
+    target_phys_addr_t hash[2];    /* Pagetable hash values     */
     target_ulong ptem;             /* Virtual segment ID | API  */
     int key;                       /* Access key                */
     int nx;                        /* Non-execute area          */
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index df90722..b9438b2 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -563,21 +563,30 @@ static inline int get_bat(CPUState *env, mmu_ctx_t *ctx, target_ulong virtual,
     return ret;
 }
 
+static inline target_phys_addr_t get_pteg_offset(CPUState *env,
+                                                 target_phys_addr_t hash,
+                                                 int pte_size)
+{
+    return (hash * pte_size * 8) & env->htab_mask;
+}
+
 /* PTE table lookup */
-static inline int _find_pte(mmu_ctx_t *ctx, int is_64b, int h, int rw,
-                            int type, int target_page_bits)
+static inline int _find_pte(CPUState *env, mmu_ctx_t *ctx, int is_64b, int h,
+                            int rw, int type, int target_page_bits)
 {
-    target_ulong base, pte0, pte1;
+    target_phys_addr_t pteg_off;
+    target_ulong pte0, pte1;
     int i, good = -1;
     int ret, r;
 
     ret = -1; /* No entry found */
-    base = ctx->pg_addr[h];
+    pteg_off = get_pteg_offset(env, ctx->hash[h],
+                               is_64b ? HASH_PTE_SIZE_64 : HASH_PTE_SIZE_32);
     for (i = 0; i < 8; i++) {
 #if defined(TARGET_PPC64)
         if (is_64b) {
-            pte0 = ldq_phys(base + (i * 16));
-            pte1 = ldq_phys(base + (i * 16) + 8);
+            pte0 = ldq_phys(env->htab_base + pteg_off + (i * 16));
+            pte1 = ldq_phys(env->htab_base + pteg_off + (i * 16) + 8);
 
             /* We have a TLB that saves 4K pages, so let's
              * split a huge page to 4k chunks */
@@ -588,17 +597,17 @@ static inline int _find_pte(mmu_ctx_t *ctx, int is_64b, int h, int rw,
             r = pte64_check(ctx, pte0, pte1, h, rw, type);
             LOG_MMU("Load pte from " TARGET_FMT_lx " => " TARGET_FMT_lx " "
                     TARGET_FMT_lx " %d %d %d " TARGET_FMT_lx "\n",
-                    base + (i * 16), pte0, pte1, (int)(pte0 & 1), h,
+                    env->htab_base + pteg_off + (i * 16), pte0, pte1, (int)(pte0 & 1), h,
                     (int)((pte0 >> 1) & 1), ctx->ptem);
         } else
 #endif
         {
-            pte0 = ldl_phys(base + (i * 8));
-            pte1 =  ldl_phys(base + (i * 8) + 4);
+            pte0 = ldl_phys(env->htab_base + pteg_off + (i * 8));
+            pte1 =  ldl_phys(env->htab_base + pteg_off + (i * 8) + 4);
             r = pte32_check(ctx, pte0, pte1, h, rw, type);
             LOG_MMU("Load pte from " TARGET_FMT_lx " => " TARGET_FMT_lx " "
                     TARGET_FMT_lx " %d %d %d " TARGET_FMT_lx "\n",
-                    base + (i * 8), pte0, pte1, (int)(pte0 >> 31), h,
+                    env->htab_base + pteg_off + (i * 8), pte0, pte1, (int)(pte0 >> 31), h,
                     (int)((pte0 >> 6) & 1), ctx->ptem);
         }
         switch (r) {
@@ -634,11 +643,13 @@ static inline int _find_pte(mmu_ctx_t *ctx, int is_64b, int h, int rw,
         if (pte_update_flags(ctx, &pte1, ret, rw) == 1) {
 #if defined(TARGET_PPC64)
             if (is_64b) {
-                stq_phys_notdirty(base + (good * 16) + 8, pte1);
+                stq_phys_notdirty(env->htab_base + pteg_off + (good * 16) + 8,
+                                  pte1);
             } else
 #endif
             {
-                stl_phys_notdirty(base + (good * 8) + 4, pte1);
+                stl_phys_notdirty(env->htab_base + pteg_off + (good * 8) + 4,
+                                  pte1);
             }
         }
     }
@@ -646,17 +657,17 @@ static inline int _find_pte(mmu_ctx_t *ctx, int is_64b, int h, int rw,
     return ret;
 }
 
-static inline int find_pte32(mmu_ctx_t *ctx, int h, int rw, int type,
-                             int target_page_bits)
+static inline int find_pte32(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
+                             int type, int target_page_bits)
 {
-    return _find_pte(ctx, 0, h, rw, type, target_page_bits);
+    return _find_pte(env, ctx, 0, h, rw, type, target_page_bits);
 }
 
 #if defined(TARGET_PPC64)
-static inline int find_pte64(mmu_ctx_t *ctx, int h, int rw, int type,
-                             int target_page_bits)
+static inline int find_pte64(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
+                             int type, int target_page_bits)
 {
-    return _find_pte(ctx, 1, h, rw, type, target_page_bits);
+    return _find_pte(env, ctx, 1, h, rw, type, target_page_bits);
 }
 #endif
 
@@ -665,10 +676,10 @@ static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
 {
 #if defined(TARGET_PPC64)
     if (env->mmu_model & POWERPC_MMU_64)
-        return find_pte64(ctx, h, rw, type, target_page_bits);
+        return find_pte64(env, ctx, h, rw, type, target_page_bits);
 #endif
 
-    return find_pte32(ctx, h, rw, type, target_page_bits);
+    return find_pte32(env, ctx, h, rw, type, target_page_bits);
 }
 
 #if defined(TARGET_PPC64)
@@ -784,19 +795,12 @@ int ppc_load_slb_vsid (CPUPPCState *env, target_ulong rb, target_ulong *rt)
 #endif /* defined(TARGET_PPC64) */
 
 /* Perform segment based translation */
-static inline target_phys_addr_t get_pgaddr(target_phys_addr_t htab_base,
-                                            target_phys_addr_t htab_mask,
-                                            target_phys_addr_t hash)
-{
-    return htab_base | (hash & htab_mask);
-}
-
 static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                               target_ulong eaddr, int rw, int type)
 {
     target_phys_addr_t hash;
-    target_ulong sr, vsid, vsid_mask, pgidx, page_mask;
-    int ds, vsid_sh, pr, target_page_bits;
+    target_ulong sr, vsid, pgidx, page_mask;
+    int ds, pr, target_page_bits;
     int ret, ret2;
 
     pr = msr_pr;
@@ -819,8 +823,6 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         ds = 0;
         ctx->nx = !!(slb->vsid & SLB_VSID_N);
         ctx->eaddr = eaddr;
-        vsid_mask = 0x00003FFFFFFFFF80ULL;
-        vsid_sh = 7;
     } else
 #endif /* defined(TARGET_PPC64) */
     {
@@ -831,8 +833,6 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         ds = sr & 0x80000000 ? 1 : 0;
         ctx->nx = sr & 0x10000000 ? 1 : 0;
         vsid = sr & 0x00FFFFFF;
-        vsid_mask = 0x01FFFFC0;
-        vsid_sh = 6;
         target_page_bits = TARGET_PAGE_BITS;
         LOG_MMU("Check segment v=" TARGET_FMT_lx " %d " TARGET_FMT_lx " nip="
                 TARGET_FMT_lx " lr=" TARGET_FMT_lx
@@ -847,27 +847,22 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         /* Check if instruction fetch is allowed, if needed */
         if (type != ACCESS_CODE || ctx->nx == 0) {
             /* Page address translation */
-            /* Primary table address */
             pgidx = (eaddr & page_mask) >> target_page_bits;
 #if defined(TARGET_PPC64)
             if (env->mmu_model & POWERPC_MMU_64) {
                 /* XXX: this is false for 1 TB segments */
-                hash = ((vsid ^ pgidx) << vsid_sh) & vsid_mask;
+                hash = vsid ^ pgidx;
             } else
 #endif
             {
-                hash = ((vsid ^ pgidx) << vsid_sh) & vsid_mask;
+                hash = vsid ^ pgidx;
             }
             LOG_MMU("htab_base " TARGET_FMT_plx " htab_mask " TARGET_FMT_plx
                     " hash " TARGET_FMT_plx "\n",
                     env->htab_base, env->htab_mask, hash);
-            ctx->pg_addr[0] = get_pgaddr(env->htab_base, env->htab_mask, hash);
-            /* Secondary table address */
-            hash = (~hash) & vsid_mask;
-            LOG_MMU("htab_base " TARGET_FMT_plx " htab_mask " TARGET_FMT_plx
-                    " hash " TARGET_FMT_plx "\n",
-                    env->htab_base, env->htab_mask, hash);
-            ctx->pg_addr[1] = get_pgaddr(env->htab_base, env->htab_mask, hash);
+            ctx->hash[0] = hash;
+            ctx->hash[1] = ~hash;
+
 #if defined(TARGET_PPC64)
             if (env->mmu_model & POWERPC_MMU_64) {
                 /* Only 5 bits of the page index are used in the AVPN */
@@ -891,9 +886,9 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
             } else {
                 LOG_MMU("0 htab=" TARGET_FMT_plx "/" TARGET_FMT_plx
                         " vsid=" TARGET_FMT_lx " api=" TARGET_FMT_lx
-                        " hash=" TARGET_FMT_plx " pg_addr=" TARGET_FMT_plx "\n",
-                        env->htab_base, env->htab_mask, vsid, pgidx, hash,
-                        ctx->pg_addr[0]);
+                        " hash=" TARGET_FMT_plx "\n",
+                        env->htab_base, env->htab_mask, vsid, pgidx,
+                        ctx->hash[0]);
                 /* Primary table lookup */
                 ret = find_pte(env, ctx, 0, rw, type, target_page_bits);
                 if (ret < 0) {
@@ -901,9 +896,9 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                     if (eaddr != 0xEFFFFFFF)
                         LOG_MMU("1 htab=" TARGET_FMT_plx "/" TARGET_FMT_plx
                                 " vsid=" TARGET_FMT_lx " api=" TARGET_FMT_lx
-                                " hash=" TARGET_FMT_plx " pg_addr=" TARGET_FMT_plx "\n",
-                                env->htab_base, env->htab_mask, vsid, pgidx, hash,
-                                ctx->pg_addr[1]);
+                                " hash=" TARGET_FMT_plx "\n",
+                                env->htab_base, env->htab_mask, vsid, pgidx,
+                                ctx->hash[1]);
                     ret2 = find_pte(env, ctx, 1, rw, type,
                                     target_page_bits);
                     if (ret2 != -1)
@@ -1455,8 +1450,10 @@ int cpu_ppc_handle_mmu_fault (CPUState *env, target_ulong address, int rw,
                     env->spr[SPR_DCMP] = 0x80000000 | ctx.ptem;
                 tlb_miss:
                     env->error_code |= ctx.key << 19;
-                    env->spr[SPR_HASH1] = ctx.pg_addr[0];
-                    env->spr[SPR_HASH2] = ctx.pg_addr[1];
+                    env->spr[SPR_HASH1] = env->htab_base +
+                        get_pteg_offset(env, ctx.hash[0], HASH_PTE_SIZE_32);
+                    env->spr[SPR_HASH2] = env->htab_base +
+                        get_pteg_offset(env, ctx.hash[1], HASH_PTE_SIZE_32);
                     break;
                 case POWERPC_MMU_SOFT_74xx:
                     if (rw == 1) {
-- 
1.7.1

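After this patch, get_segment() no longer precomputes PTEG addresses. It keeps the raw hash in ctx->hash[0] and ctx->hash[1], and the software-TLB miss path in cpu_ppc_handle_mmu_fault() turns them into SPR_HASH1/SPR_HASH2 through get_pteg_offset(). That helper's body is not part of this excerpt, so the standalone sketch below makes an assumption: it scales the hash by the PTEG size (8 PTEs of HASH_PTE_SIZE_32 = 8 bytes) and then masks with htab_mask. pteg_offset() and hash_sprs() are hypothetical names. Under that assumption, and for a size-aligned hash table, the result matches the removed code, which shifted the hash left by 6 (a multiply by 64) before masking.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: 8-byte PTEs on 32-bit hash MMUs, 16-byte PTEs on 64-bit
 * ones, and 8 PTEs per PTE group (PTEG). */
#define HASH_PTE_SIZE_32 8
#define HASH_PTE_SIZE_64 16
#define HPTES_PER_GROUP  8

/* Hypothetical stand-in for get_pteg_offset(): scale the hash to the byte
 * offset of one PTEG, then wrap it to the table size with the mask. */
static uint64_t pteg_offset(uint64_t hash, uint64_t htab_mask, int pte_size)
{
    assert(pte_size == HASH_PTE_SIZE_32 || pte_size == HASH_PTE_SIZE_64);
    return (hash * pte_size * HPTES_PER_GROUP) & htab_mask;
}

/* Primary (HASH1) and secondary (HASH2) PTEG addresses derived from the
 * raw hash; the secondary hash is simply the complement of the primary. */
static void hash_sprs(uint64_t htab_base, uint64_t htab_mask, uint64_t hash,
                      uint64_t *hash1, uint64_t *hash2)
{
    *hash1 = htab_base + pteg_offset(hash, htab_mask, HASH_PTE_SIZE_32);
    *hash2 = htab_base + pteg_offset(~hash, htab_mask, HASH_PTE_SIZE_32);
}
```

With htab_base = 0x00100000, htab_mask = 0x000FFFC0 and hash = 1, this gives HASH1 = 0x100040 and HASH2 = 0x1FFF80. The old get_pgaddr() path produced the same two addresses.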

* [Qemu-devel] [PATCH 10/26] Better factor the ppc hash translation path
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (8 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 09/26] Use "hash" more consistently in ppc mmu code David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 11/26] Support 1T segments on ppc David Gibson
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

Currently the path handling hash page table translation in get_segment()
has a mix of common and 32- or 64-bit specific code.  However, the
division is not done terribly well, which results in a lot of messy code
flipping between common and divided paths.

This patch improves the organization, consolidating several divided paths
into one.  This in turn allows simplification of some code in
get_segment(), removing a number of ugly interim variables.

This new factorization will also make it easier to add support for the 1T
segments added in newer CPUs.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h    |    1 +
 target-ppc/helper.c |   68 +++++++++++++++------------------------------------
 2 files changed, 21 insertions(+), 48 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 592907a..71f8d72 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -386,6 +386,7 @@ struct ppc_slb_t {
 #define SLB_VSID_B              0xc000000000000000ULL
 #define SLB_VSID_B_256M         0x0000000000000000ULL
 #define SLB_VSID_VSID           0x3FFFFFFFFFFFF000ULL
+#define SLB_VSID_PTEM           (SLB_VSID_B | SLB_VSID_VSID)
 #define SLB_VSID_KS             0x0000000000000800ULL
 #define SLB_VSID_KP             0x0000000000000400ULL
 #define SLB_VSID_N              0x0000000000000200ULL /* no-execute */
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index b9438b2..111675d 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -657,29 +657,15 @@ static inline int _find_pte(CPUState *env, mmu_ctx_t *ctx, int is_64b, int h,
     return ret;
 }
 
-static inline int find_pte32(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
-                             int type, int target_page_bits)
-{
-    return _find_pte(env, ctx, 0, h, rw, type, target_page_bits);
-}
-
-#if defined(TARGET_PPC64)
-static inline int find_pte64(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
-                             int type, int target_page_bits)
-{
-    return _find_pte(env, ctx, 1, h, rw, type, target_page_bits);
-}
-#endif
-
 static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
                            int type, int target_page_bits)
 {
 #if defined(TARGET_PPC64)
     if (env->mmu_model & POWERPC_MMU_64)
-        return find_pte64(env, ctx, h, rw, type, target_page_bits);
+        return _find_pte(env, ctx, 1, h, rw, type, target_page_bits);
 #endif
 
-    return find_pte32(env, ctx, h, rw, type, target_page_bits);
+    return _find_pte(env, ctx, 0, h, rw, type, target_page_bits);
 }
 
 #if defined(TARGET_PPC64)
@@ -799,14 +785,16 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                               target_ulong eaddr, int rw, int type)
 {
     target_phys_addr_t hash;
-    target_ulong sr, vsid, pgidx, page_mask;
+    target_ulong vsid;
     int ds, pr, target_page_bits;
     int ret, ret2;
 
     pr = msr_pr;
+    ctx->eaddr = eaddr;
 #if defined(TARGET_PPC64)
     if (env->mmu_model & POWERPC_MMU_64) {
         ppc_slb_t *slb;
+        target_ulong pageaddr;
 
         LOG_MMU("Check SLBs\n");
         slb = slb_lookup(env, eaddr);
@@ -815,19 +803,24 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         }
 
         vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
-        page_mask = ~SEGMENT_MASK_256M;
         target_page_bits = (slb->vsid & SLB_VSID_L)
             ? TARGET_PAGE_BITS_16M : TARGET_PAGE_BITS;
         ctx->key = !!(pr ? (slb->vsid & SLB_VSID_KP)
                       : (slb->vsid & SLB_VSID_KS));
         ds = 0;
         ctx->nx = !!(slb->vsid & SLB_VSID_N);
-        ctx->eaddr = eaddr;
+
+        pageaddr = eaddr & ((1ULL << 28) - (1ULL << target_page_bits));
+        /* XXX: this is false for 1 TB segments */
+        hash = vsid ^ (pageaddr >> target_page_bits);
+        /* Only 5 bits of the page index are used in the AVPN */
+        ctx->ptem = (slb->vsid & SLB_VSID_PTEM) | ((pageaddr >> 16) & 0x0F80);
     } else
 #endif /* defined(TARGET_PPC64) */
     {
+        target_ulong sr, pgidx;
+
         sr = env->sr[eaddr >> 28];
-        page_mask = 0x0FFFFFFF;
         ctx->key = (((sr & 0x20000000) && (pr != 0)) ||
                     ((sr & 0x40000000) && (pr == 0))) ? 1 : 0;
         ds = sr & 0x80000000 ? 1 : 0;
@@ -839,6 +832,9 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                 " ir=%d dr=%d pr=%d %d t=%d\n",
                 eaddr, (int)(eaddr >> 28), sr, env->nip, env->lr, (int)msr_ir,
                 (int)msr_dr, pr != 0 ? 1 : 0, rw, type);
+        pgidx = (eaddr & ~SEGMENT_MASK_256M) >> target_page_bits;
+        hash = vsid ^ pgidx;
+        ctx->ptem = (vsid << 7) | (pgidx >> 10);
     }
     LOG_MMU("pte segment: key=%d ds %d nx %d vsid " TARGET_FMT_lx "\n",
             ctx->key, ds, ctx->nx, vsid);
@@ -847,36 +843,12 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         /* Check if instruction fetch is allowed, if needed */
         if (type != ACCESS_CODE || ctx->nx == 0) {
             /* Page address translation */
-            pgidx = (eaddr & page_mask) >> target_page_bits;
-#if defined(TARGET_PPC64)
-            if (env->mmu_model & POWERPC_MMU_64) {
-                /* XXX: this is false for 1 TB segments */
-                hash = vsid ^ pgidx;
-            } else
-#endif
-            {
-                hash = vsid ^ pgidx;
-            }
             LOG_MMU("htab_base " TARGET_FMT_plx " htab_mask " TARGET_FMT_plx
                     " hash " TARGET_FMT_plx "\n",
                     env->htab_base, env->htab_mask, hash);
             ctx->hash[0] = hash;
             ctx->hash[1] = ~hash;
 
-#if defined(TARGET_PPC64)
-            if (env->mmu_model & POWERPC_MMU_64) {
-                /* Only 5 bits of the page index are used in the AVPN */
-                if (target_page_bits > 23) {
-                    ctx->ptem = (vsid << 12) |
-                                ((pgidx << (target_page_bits - 16)) & 0xF80);
-                } else {
-                    ctx->ptem = (vsid << 12) | ((pgidx >> 4) & 0x0F80);
-                }
-            } else
-#endif
-            {
-                ctx->ptem = (vsid << 7) | (pgidx >> 10);
-            }
             /* Initialize real address with an invalid value */
             ctx->raddr = (target_phys_addr_t)-1ULL;
             if (unlikely(env->mmu_model == POWERPC_MMU_SOFT_6xx ||
@@ -885,9 +857,9 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                 ret = ppc6xx_tlb_check(env, ctx, eaddr, rw, type);
             } else {
                 LOG_MMU("0 htab=" TARGET_FMT_plx "/" TARGET_FMT_plx
-                        " vsid=" TARGET_FMT_lx " api=" TARGET_FMT_lx
+                        " vsid=" TARGET_FMT_lx " ptem=" TARGET_FMT_lx
                         " hash=" TARGET_FMT_plx "\n",
-                        env->htab_base, env->htab_mask, vsid, pgidx,
+                        env->htab_base, env->htab_mask, vsid, ctx->ptem,
                         ctx->hash[0]);
                 /* Primary table lookup */
                 ret = find_pte(env, ctx, 0, rw, type, target_page_bits);
@@ -895,9 +867,9 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                     /* Secondary table lookup */
                     if (eaddr != 0xEFFFFFFF)
                         LOG_MMU("1 htab=" TARGET_FMT_plx "/" TARGET_FMT_plx
-                                " vsid=" TARGET_FMT_lx " api=" TARGET_FMT_lx
+                                " vsid=" TARGET_FMT_lx " ptem=" TARGET_FMT_lx
                                 " hash=" TARGET_FMT_plx "\n",
-                                env->htab_base, env->htab_mask, vsid, pgidx,
+                                env->htab_base, env->htab_mask, vsid, ctx->ptem,
                                 ctx->hash[1]);
                     ret2 = find_pte(env, ctx, 1, rw, type,
                                     target_page_bits);
-- 
1.7.1

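After this patch, each branch of get_segment() computes hash and ctx->ptem itself, and the shared path only consumes them. The sketch below restates the two branches as standalone C functions so the arithmetic can be checked in isolation. xlate_slb_256m(), xlate_sr() and struct xlate are hypothetical names; the constants come from cpu.h in this series. The 32-bit branch assumes TARGET_PAGE_BITS is 12 (4 KiB pages).

```c
#include <assert.h>
#include <stdint.h>

/* Constants as defined in target-ppc/cpu.h in this series. */
#define SLB_VSID_SHIFT      12
#define SLB_VSID_B          0xc000000000000000ULL
#define SLB_VSID_VSID       0x3FFFFFFFFFFFF000ULL
#define SLB_VSID_PTEM       (SLB_VSID_B | SLB_VSID_VSID)
#define SEGMENT_SHIFT_256M  28

struct xlate { uint64_t hash, ptem; };

/* 64-bit MMU, 256MB segment: mirrors the SLB branch of get_segment(). */
static struct xlate xlate_slb_256m(uint64_t slb_vsid, uint64_t eaddr,
                                   int page_bits)
{
    struct xlate x;
    uint64_t vsid, pageaddr;

    assert(page_bits >= 12 && page_bits < SEGMENT_SHIFT_256M);
    vsid = (slb_vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
    /* Keep only the page-number bits inside the 256MB segment */
    pageaddr = eaddr & ((1ULL << SEGMENT_SHIFT_256M) - (1ULL << page_bits));
    x.hash = vsid ^ (pageaddr >> page_bits);
    /* Only 5 bits of the page index are used in the AVPN */
    x.ptem = (slb_vsid & SLB_VSID_PTEM) | ((pageaddr >> 16) & 0x0F80);
    return x;
}

/* 32-bit MMU: mirrors the segment-register branch (assumes 4K pages). */
static struct xlate xlate_sr(uint32_t sr, uint32_t eaddr)
{
    struct xlate x;
    uint32_t vsid = sr & 0x00FFFFFF;
    uint32_t pgidx = (eaddr & 0x0FFFFFFF) >> 12;

    x.hash = vsid ^ pgidx;
    x.ptem = ((uint64_t)vsid << 7) | (pgidx >> 10);
    return x;
}
```

For example, an SLB VSID word of 0x1234000 with eaddr 0x12345000 gives hash 0x3171 and ptem 0x1234200. Segment register 0x123 with eaddr 0x10402000 gives hash 0x521 and ptem 0x9181.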

* [Qemu-devel] [PATCH 11/26] Support 1T segments on ppc
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (9 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 10/26] Better factor the ppc hash translation path David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 12/26] Add POWER7 support for ppc David Gibson
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

Traditionally, the "segments" used for the two-stage translation on
powerpc MMUs were 256MB in size.  This was the only option on all hash
page table based 32-bit powerpc cpus, and on the earlier 64-bit hash page
table based cpus.  However, newer 64-bit cpus also permit 1TB segments.

This patch adds support for 1TB segment translation to the qemu code.
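In the patched slb_lookup(), each entry's segment size comes from the B field of its VSID word. The effective address is then compared against that entry's ESID at the matching granularity: 28 bits for 256MB segments and 40 bits for 1TB segments. The standalone sketch below shows only that matching rule. struct slb_entry and slb_find() are hypothetical names, and SLB_ESID_V (the valid bit) is assumed to be 0x08000000 because its definition is not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from cpu.h in this series; SLB_ESID_V is an assumption. */
#define SLB_ESID_V          0x0000000008000000ULL
#define SLB_VSID_B          0xc000000000000000ULL
#define SLB_VSID_B_256M     0x0000000000000000ULL
#define SLB_VSID_B_1T       0x4000000000000000ULL
#define SEGMENT_MASK_256M   (~((1ULL << 28) - 1))
#define SEGMENT_MASK_1T     (~((1ULL << 40) - 1))

struct slb_entry { uint64_t esid, vsid; };

/* An entry matches if its ESID equals the address masked at the
 * granularity its own B field declares.  Returns NULL on an SLB miss. */
static const struct slb_entry *slb_find(const struct slb_entry *slb, int n,
                                        uint64_t eaddr)
{
    uint64_t esid_256m = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V;
    uint64_t esid_1t   = (eaddr & SEGMENT_MASK_1T) | SLB_ESID_V;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        uint64_t b = slb[i].vsid & SLB_VSID_B;

        if ((b == SLB_VSID_B_256M && slb[i].esid == esid_256m) ||
            (b == SLB_VSID_B_1T && slb[i].esid == esid_1t)) {
            return &slb[i];
        }
    }
    return NULL;
}
```

An entry tagged 1TB therefore matches any address in its 1TB region. A 256MB entry matches only its own 256MB slice.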

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h    |    7 ++++++
 target-ppc/helper.c |   54 +++++++++++++++++++++++++++++++++++---------------
 2 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 71f8d72..9abf4a9 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -114,6 +114,7 @@ enum powerpc_mmu_t {
     POWERPC_MMU_601        = 0x0000000A,
 #if defined(TARGET_PPC64)
 #define POWERPC_MMU_64       0x00010000
+#define POWERPC_MMU_1TSEG    0x00020000
     /* 64 bits PowerPC MMU                                     */
     POWERPC_MMU_64B        = POWERPC_MMU_64 | 0x00000001,
     /* 620 variant (no segment exceptions)                     */
@@ -382,9 +383,11 @@ struct ppc_slb_t {
 
 /* Bits in the SLB VSID word */
 #define SLB_VSID_SHIFT          12
+#define SLB_VSID_SHIFT_1T       24
 #define SLB_VSID_SSIZE_SHIFT    62
 #define SLB_VSID_B              0xc000000000000000ULL
 #define SLB_VSID_B_256M         0x0000000000000000ULL
+#define SLB_VSID_B_1T           0x4000000000000000ULL
 #define SLB_VSID_VSID           0x3FFFFFFFFFFFF000ULL
 #define SLB_VSID_PTEM           (SLB_VSID_B | SLB_VSID_VSID)
 #define SLB_VSID_KS             0x0000000000000800ULL
@@ -398,6 +401,10 @@ struct ppc_slb_t {
 #define SEGMENT_SHIFT_256M      28
 #define SEGMENT_MASK_256M       ~((1ULL << SEGMENT_SHIFT_256M) - 1)
 
+#define SEGMENT_SHIFT_1T        40
+#define SEGMENT_MASK_1T         ~((1ULL << SEGMENT_SHIFT_1T) - 1)
+
+
 /*****************************************************************************/
 /* Machine state register bits definition                                    */
 #define MSR_SF   63 /* Sixty-four-bit mode                            hflags */
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 111675d..3e3b5da 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -671,19 +671,26 @@ static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
 #if defined(TARGET_PPC64)
 static inline ppc_slb_t *slb_lookup(CPUPPCState *env, target_ulong eaddr)
 {
-    uint64_t esid;
+    uint64_t esid_256M, esid_1T;
     int n;
 
     LOG_SLB("%s: eaddr " TARGET_FMT_lx "\n", __func__, eaddr);
 
-    esid = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V;
+    esid_256M = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V;
+    esid_1T = (eaddr & SEGMENT_MASK_1T) | SLB_ESID_V;
 
     for (n = 0; n < env->slb_nr; n++) {
         ppc_slb_t *slb = &env->slb[n];
 
         LOG_SLB("%s: slot %d %016" PRIx64 " %016"
                     PRIx64 "\n", __func__, n, slb->esid, slb->vsid);
-        if (slb->esid == esid) {
+        /* We check for 1T matches on all MMUs here - if the MMU
+         * doesn't have 1T segment support, we will have prevented 1T
+         * entries from being inserted in the slbmte code. */
+        if ( ((slb->esid == esid_256M) &&
+              ((slb->vsid & SLB_VSID_B) == SLB_VSID_B_256M))
+             || ((slb->esid == esid_1T) &&
+                 ((slb->vsid & SLB_VSID_B) == SLB_VSID_B_1T)) ) {
             return slb;
         }
     }
@@ -736,16 +743,19 @@ void ppc_slb_invalidate_one (CPUPPCState *env, uint64_t T0)
 int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs)
 {
     int slot = rb & 0xfff;
-    uint64_t esid = rb & ~0xfff;
     ppc_slb_t *slb = &env->slb[slot];
-
-    if (slot >= env->slb_nr) {
-        return -1;
-    }
-
-    slb->esid = esid;
+ 
+    if (rb & (0x1000 - env->slb_nr))
+	return -1; /* Reserved bits set or slot too high */
+    if (rs & (SLB_VSID_B & ~SLB_VSID_B_1T))
+	return -1; /* Bad segment size */
+    if ((rs & SLB_VSID_B) && !(env->mmu_model & POWERPC_MMU_1TSEG))
+ 	return -1; /* 1T segment on MMU that doesn't support it */
+
+    /* Mask out the slot number as we store the entry */
+    slb->esid = rb & (SLB_ESID_ESID | SLB_ESID_V);
     slb->vsid = rs;
-
+ 
     LOG_SLB("%s: %d " TARGET_FMT_lx " - " TARGET_FMT_lx " => %016" PRIx64
             " %016" PRIx64 "\n", __func__, slot, rb, rs,
             slb->esid, slb->vsid);
@@ -795,6 +805,7 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
     if (env->mmu_model & POWERPC_MMU_64) {
         ppc_slb_t *slb;
         target_ulong pageaddr;
+        int segment_bits;
 
         LOG_MMU("Check SLBs\n");
         slb = slb_lookup(env, eaddr);
@@ -802,7 +813,14 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
             return -5;
         }
 
-        vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
+	if (slb->vsid & SLB_VSID_B) {
+	    vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT_1T;
+	    segment_bits = 40;
+	} else {
+	    vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
+	    segment_bits = 28;
+	}
+
         target_page_bits = (slb->vsid & SLB_VSID_L)
             ? TARGET_PAGE_BITS_16M : TARGET_PAGE_BITS;
         ctx->key = !!(pr ? (slb->vsid & SLB_VSID_KP)
@@ -810,11 +828,15 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         ds = 0;
         ctx->nx = !!(slb->vsid & SLB_VSID_N);
 
-        pageaddr = eaddr & ((1ULL << 28) - (1ULL << target_page_bits));
-        /* XXX: this is false for 1 TB segments */
-        hash = vsid ^ (pageaddr >> target_page_bits);
+        pageaddr = eaddr & ((1ULL << segment_bits) 
+                            - (1ULL << target_page_bits));
+	if (slb->vsid & SLB_VSID_B)
+	    hash = vsid ^ (vsid << 25) ^ (pageaddr >> target_page_bits);
+	else
+	    hash = vsid ^ (pageaddr >> target_page_bits);
         /* Only 5 bits of the page index are used in the AVPN */
-        ctx->ptem = (slb->vsid & SLB_VSID_PTEM) | ((pageaddr >> 16) & 0x0F80);
+        ctx->ptem = (slb->vsid & SLB_VSID_PTEM) | 
+            ((pageaddr >> 16) & ((1ULL << segment_bits) - 0x80));
     } else
 #endif /* defined(TARGET_PPC64) */
     {
-- 
1.7.1

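The get_segment() hunk above generalises the 64-bit path. The B field selects both the VSID shift (SLB_VSID_SHIFT or SLB_VSID_SHIFT_1T) and the segment size, and 1TB segments add an extra vsid << 25 term to the hash. The following sketch restates that logic as one standalone function so the two cases can be compared; xlate_slb() and struct xlate are hypothetical names. For a 256MB segment, the new ptem mask (1ULL << 28) - 0x80 selects the same bits as the old 0x0F80, because pageaddr >> 16 never exceeds 0xFFF in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as defined in target-ppc/cpu.h in this series. */
#define SLB_VSID_SHIFT     12
#define SLB_VSID_SHIFT_1T  24
#define SLB_VSID_B         0xc000000000000000ULL
#define SLB_VSID_VSID      0x3FFFFFFFFFFFF000ULL
#define SLB_VSID_PTEM      (SLB_VSID_B | SLB_VSID_VSID)

struct xlate { uint64_t hash, ptem; };

/* Standalone restatement of the patched 64-bit branch of get_segment(). */
static struct xlate xlate_slb(uint64_t slb_vsid, uint64_t eaddr,
                              int page_bits)
{
    struct xlate x;
    int is_1t = (slb_vsid & SLB_VSID_B) != 0;
    int segment_bits = is_1t ? 40 : 28;
    uint64_t vsid, pageaddr;

    assert(page_bits >= 12 && page_bits < segment_bits);
    vsid = (slb_vsid & SLB_VSID_VSID)
           >> (is_1t ? SLB_VSID_SHIFT_1T : SLB_VSID_SHIFT);
    /* Page-number bits of eaddr within its segment */
    pageaddr = eaddr & ((1ULL << segment_bits) - (1ULL << page_bits));
    x.hash = vsid ^ (pageaddr >> page_bits);
    if (is_1t) {
        x.hash ^= vsid << 25;   /* extra VSID term used for 1TB segments */
    }
    x.ptem = (slb_vsid & SLB_VSID_PTEM)
             | ((pageaddr >> 16) & ((1ULL << segment_bits) - 0x80));
    return x;
}
```

For a 256MB entry with VSID word 0x1234000 at eaddr 0x12345000, this gives hash 0x3171 and ptem 0x1234200, unchanged from the pre-1TB code. For a 1TB entry with VSID 5 at eaddr 0x12345678000, it gives hash 0x834567D and ptem 0x4000000005234500.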

* [Qemu-devel] [PATCH 12/26] Add POWER7 support for ppc
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (10 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 11/26] Support 1T segments on ppc David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 13/26] Start implementing pSeries logical partition machine David Gibson
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

This adds emulation support for the recent POWER7 cpu to qemu.  It's far
from perfect: it still lacks a number of POWER7 features, including any
support for VSX or decimal floating point instructions.  However, it's
close enough to boot a kernel with the POWER7 PVR.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 hw/ppc.c                    |   35 +++++++++++++++
 hw/ppc.h                    |    1 +
 target-ppc/cpu.h            |   16 +++++++
 target-ppc/helper.c         |    6 +++
 target-ppc/translate_init.c |  103 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 161 insertions(+), 0 deletions(-)

diff --git a/hw/ppc.c b/hw/ppc.c
index de02d33..2aa152b 100644
--- a/hw/ppc.c
+++ b/hw/ppc.c
@@ -247,6 +247,41 @@ void ppc970_irq_init (CPUState *env)
     env->irq_inputs = (void **)qemu_allocate_irqs(&ppc970_set_irq, env,
                                                   PPC970_INPUT_NB);
 }
+
+/* POWER7 internal IRQ controller */
+static void power7_set_irq (void *opaque, int pin, int level)
+{
+    CPUState *env = opaque;
+    int cur_level;
+
+    LOG_IRQ("%s: env %p pin %d level %d\n", __func__,
+                env, pin, level);
+    cur_level = (env->irq_input_state >> pin) & 1;
+
+    switch (pin) {
+    case POWER7_INPUT_INT:
+        /* Level sensitive - active high */
+        LOG_IRQ("%s: set the external IRQ state to %d\n",
+                __func__, level);
+        ppc_set_irq(env, PPC_INTERRUPT_EXT, level);
+        break;
+    default:
+        /* Unknown pin - do nothing */
+        LOG_IRQ("%s: unknown IRQ pin %d\n", __func__, pin);
+        return;
+    }
+    if (level) {
+        env->irq_input_state |= 1 << pin;
+    } else {
+        env->irq_input_state &= ~(1 << pin);
+    }
+}
+
+void ppcPOWER7_irq_init (CPUState *env)
+{
+    env->irq_inputs = (void **)qemu_allocate_irqs(&power7_set_irq, env,
+                                                  POWER7_INPUT_NB);
+}
 #endif /* defined(TARGET_PPC64) */
 
 /* PowerPC 40x internal IRQ controller */
diff --git a/hw/ppc.h b/hw/ppc.h
index 34f54cf..3ccf134 100644
--- a/hw/ppc.h
+++ b/hw/ppc.h
@@ -36,6 +36,7 @@ void ppc40x_irq_init (CPUState *env);
 void ppce500_irq_init (CPUState *env);
 void ppc6xx_irq_init (CPUState *env);
 void ppc970_irq_init (CPUState *env);
+void ppcPOWER7_irq_init (CPUState *env);
 
 /* PPC machines for OpenBIOS */
 enum {
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 9abf4a9..3a47d11 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -119,6 +119,8 @@ enum powerpc_mmu_t {
     POWERPC_MMU_64B        = POWERPC_MMU_64 | 0x00000001,
     /* 620 variant (no segment exceptions)                     */
     POWERPC_MMU_620        = POWERPC_MMU_64 | 0x00000002,
+    /* Architecture 2.06 variant                               */
+    POWERPC_MMU_2_06       = POWERPC_MMU_64 | POWERPC_MMU_1TSEG | 0x00000003,
 #endif /* defined(TARGET_PPC64) */
 };
 
@@ -154,6 +156,8 @@ enum powerpc_excp_t {
 #if defined(TARGET_PPC64)
     /* PowerPC 970 exception model      */
     POWERPC_EXCP_970,
+    /* POWER7 exception model           */
+    POWERPC_EXCP_POWER7,
 #endif /* defined(TARGET_PPC64) */
 };
 
@@ -289,6 +293,8 @@ enum powerpc_input_t {
     PPC_FLAGS_INPUT_405,
     /* PowerPC 970 bus                  */
     PPC_FLAGS_INPUT_970,
+    /* PowerPC POWER7 bus               */
+    PPC_FLAGS_INPUT_POWER7,
     /* PowerPC 401 bus                  */
     PPC_FLAGS_INPUT_401,
     /* Freescale RCPU bus               */
@@ -1003,6 +1009,7 @@ static inline void cpu_clone_regs(CPUState *env, target_ulong newsp)
 #define SPR_HSPRG1            (0x131)
 #define SPR_HDSISR            (0x132)
 #define SPR_HDAR              (0x133)
+#define SPR_SPURR             (0x134)
 #define SPR_BOOKE_DBCR0       (0x134)
 #define SPR_IBCR              (0x135)
 #define SPR_PURR              (0x135)
@@ -1627,6 +1634,15 @@ enum {
     PPC970_INPUT_THINT      = 6,
     PPC970_INPUT_NB,
 };
+
+enum {
+    /* POWER7 input pins */
+    POWER7_INPUT_INT        = 0,
+    /* POWER7 probably has other inputs, but we don't care about them
+     * for any existing machine.  We can wire these up when we need
+     * them */
+    POWER7_INPUT_NB,
+};
 #endif
 
 /* Hardware exceptions definitions */
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 3e3b5da..13a5ab1 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -1192,6 +1192,7 @@ static inline int check_physical(CPUState *env, mmu_ctx_t *ctx,
 #if defined(TARGET_PPC64)
     case POWERPC_MMU_620:
     case POWERPC_MMU_64B:
+    case POWERPC_MMU_2_06:
         /* Real address are 60 bits long */
         ctx->raddr &= 0x0FFFFFFFFFFFFFFFULL;
         ctx->prot |= PAGE_WRITE;
@@ -1269,6 +1270,7 @@ int get_physical_address (CPUState *env, mmu_ctx_t *ctx, target_ulong eaddr,
 #if defined(TARGET_PPC64)
         case POWERPC_MMU_620:
         case POWERPC_MMU_64B:
+        case POWERPC_MMU_2_06:
 #endif
             if (ret < 0) {
                 /* We didn't match any BAT entry or don't have BATs */
@@ -1368,6 +1370,7 @@ int cpu_ppc_handle_mmu_fault (CPUState *env, target_ulong address, int rw,
 #if defined(TARGET_PPC64)
                 case POWERPC_MMU_620:
                 case POWERPC_MMU_64B:
+                case POWERPC_MMU_2_06:
 #endif
                     env->exception_index = POWERPC_EXCP_ISI;
                     env->error_code = 0x40000000;
@@ -1477,6 +1480,7 @@ int cpu_ppc_handle_mmu_fault (CPUState *env, target_ulong address, int rw,
 #if defined(TARGET_PPC64)
                 case POWERPC_MMU_620:
                 case POWERPC_MMU_64B:
+                case POWERPC_MMU_2_06:
 #endif
                     env->exception_index = POWERPC_EXCP_DSI;
                     env->error_code = 0;
@@ -1800,6 +1804,7 @@ void ppc_tlb_invalidate_all (CPUPPCState *env)
 #if defined(TARGET_PPC64)
     case POWERPC_MMU_620:
     case POWERPC_MMU_64B:
+    case POWERPC_MMU_2_06:
 #endif /* defined(TARGET_PPC64) */
         tlb_flush(env, 1);
         break;
@@ -1867,6 +1872,7 @@ void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr)
 #if defined(TARGET_PPC64)
     case POWERPC_MMU_620:
     case POWERPC_MMU_64B:
+    case POWERPC_MMU_2_06:
         /* tlbie invalidate TLBs for all segments */
         /* XXX: given the fact that there are too many segments to invalidate,
          *      and we still don't have a tlb_flush_mask(env, n, mask) in Qemu,
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 6270ec6..58de0cb 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -61,6 +61,7 @@ void glue(glue(ppc, name),_irq_init) (CPUPPCState *env);
 PPC_IRQ_INIT_FN(40x);
 PPC_IRQ_INIT_FN(6xx);
 PPC_IRQ_INIT_FN(970);
+PPC_IRQ_INIT_FN(POWER7);
 PPC_IRQ_INIT_FN(e500);
 
 /* Generic callbacks:
@@ -3129,6 +3130,35 @@ static void init_excp_970 (CPUPPCState *env)
     env->hreset_vector = 0x0000000000000100ULL;
 #endif
 }
+
+static void init_excp_POWER7 (CPUPPCState *env)
+{
+#if !defined(CONFIG_USER_ONLY)
+    env->excp_vectors[POWERPC_EXCP_RESET]    = 0x00000100;
+    env->excp_vectors[POWERPC_EXCP_MCHECK]   = 0x00000200;
+    env->excp_vectors[POWERPC_EXCP_DSI]      = 0x00000300;
+    env->excp_vectors[POWERPC_EXCP_DSEG]     = 0x00000380;
+    env->excp_vectors[POWERPC_EXCP_ISI]      = 0x00000400;
+    env->excp_vectors[POWERPC_EXCP_ISEG]     = 0x00000480;
+    env->excp_vectors[POWERPC_EXCP_EXTERNAL] = 0x00000500;
+    env->excp_vectors[POWERPC_EXCP_ALIGN]    = 0x00000600;
+    env->excp_vectors[POWERPC_EXCP_PROGRAM]  = 0x00000700;
+    env->excp_vectors[POWERPC_EXCP_FPU]      = 0x00000800;
+    env->excp_vectors[POWERPC_EXCP_DECR]     = 0x00000900;
+    env->excp_vectors[POWERPC_EXCP_HDECR]    = 0x00000980;
+    env->excp_vectors[POWERPC_EXCP_SYSCALL]  = 0x00000C00;
+    env->excp_vectors[POWERPC_EXCP_TRACE]    = 0x00000D00;
+    env->excp_vectors[POWERPC_EXCP_PERFM]    = 0x00000F00;
+    env->excp_vectors[POWERPC_EXCP_VPU]      = 0x00000F20;
+    env->excp_vectors[POWERPC_EXCP_IABR]     = 0x00001300;
+    env->excp_vectors[POWERPC_EXCP_MAINT]    = 0x00001600;
+    env->excp_vectors[POWERPC_EXCP_VPUA]     = 0x00001700;
+    env->excp_vectors[POWERPC_EXCP_THERM]    = 0x00001800;
+    env->hreset_excp_prefix = 0;
+    /* Hardware reset vector */
+    env->hreset_vector = 0x0000000000000100ULL;
+#endif
+}
 #endif
 
 /*****************************************************************************/
@@ -6310,6 +6340,74 @@ static void init_proc_970MP (CPUPPCState *env)
     vscr_init(env, 0x00010000);
 }
 
+/* POWER7 */
+#define POWERPC_INSNS_POWER7  (PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |        \
+                              PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |   \
+                              PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |           \
+                              PPC_FLOAT_STFIWX |                              \
+                              PPC_CACHE | PPC_CACHE_ICBI | PPC_CACHE_DCBZT |  \
+                              PPC_MEM_SYNC | PPC_MEM_EIEIO |                  \
+                              PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |               \
+                              PPC_64B | PPC_ALTIVEC |                         \
+                              PPC_SEGMENT_64B | PPC_SLBI |                    \
+                              PPC_POPCNTB | PPC_POPCNTWD)
+#define POWERPC_MSRM_POWER7   (0x800000000204FF36ULL)
+#define POWERPC_MMU_POWER7    (POWERPC_MMU_2_06)
+#define POWERPC_EXCP_POWER7   (POWERPC_EXCP_POWER7)
+#define POWERPC_INPUT_POWER7  (PPC_FLAGS_INPUT_POWER7)
+#define POWERPC_BFDM_POWER7   (bfd_mach_ppc64)
+#define POWERPC_FLAG_POWER7   (POWERPC_FLAG_VRE | POWERPC_FLAG_SE |            \
+                              POWERPC_FLAG_BE | POWERPC_FLAG_PMM |            \
+                              POWERPC_FLAG_BUS_CLK)
+#define check_pow_POWER7    check_pow_nocheck
+
+static void init_proc_POWER7 (CPUPPCState *env)
+{
+    gen_spr_ne_601(env);
+    gen_spr_7xx(env);
+    /* Time base */
+    gen_tbl(env);
+    /* PURR & SPURR: Hack - treat these as aliases for the TB for now */
+    spr_register(env, SPR_PURR,   "PURR",
+                 &spr_read_purr, SPR_NOACCESS,
+                 &spr_read_purr, SPR_NOACCESS,
+                 0x00000000);
+    spr_register(env, SPR_SPURR,   "SPURR",
+                 &spr_read_purr, SPR_NOACCESS,
+                 &spr_read_purr, SPR_NOACCESS,
+                 0x00000000);
+    /* Memory management */
+    /* XXX : not implemented */
+    spr_register(env, SPR_MMUCFG, "MMUCFG",
+                 SPR_NOACCESS, SPR_NOACCESS,
+                 &spr_read_generic, SPR_NOACCESS,
+                 0x00000000); /* TOFIX */
+    /* XXX : not implemented */
+    spr_register(env, SPR_CTRL, "SPR_CTRLT",
+                 SPR_NOACCESS, SPR_NOACCESS,
+                 &spr_read_generic, &spr_write_generic,
+                 0x80800000);
+    spr_register(env, SPR_UCTRL, "SPR_CTRLF",
+                 SPR_NOACCESS, SPR_NOACCESS,
+                 &spr_read_generic, &spr_write_generic,
+                 0x80800000);
+    spr_register(env, SPR_VRSAVE, "SPR_VRSAVE",
+                 &spr_read_generic, &spr_write_generic,
+                 &spr_read_generic, &spr_write_generic,
+                 0x00000000);
+#if !defined(CONFIG_USER_ONLY)
+    env->slb_nr = 32;
+#endif
+    init_excp_POWER7(env);
+    env->dcache_line_size = 128;
+    env->icache_line_size = 128;
+    /* Allocate hardware IRQ controller */
+    ppcPOWER7_irq_init(env);
+    /* Can't find information on what this should be on reset.  This
+     * value is the one used by 74xx processors. */
+    vscr_init(env, 0x00010000);
+}
+
 /* PowerPC 620                                                               */
 #define POWERPC_INSNS_620    (PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |        \
                               PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |   \
@@ -7032,6 +7130,8 @@ enum {
     CPU_POWERPC_POWER6             = 0x003E0000,
     CPU_POWERPC_POWER6_5           = 0x0F000001, /* POWER6 in POWER5 mode */
     CPU_POWERPC_POWER6A            = 0x0F000002,
+#define CPU_POWERPC_POWER7           CPU_POWERPC_POWER7_v20
+    CPU_POWERPC_POWER7_v20         = 0x003F0200,
     CPU_POWERPC_970                = 0x00390202,
 #define CPU_POWERPC_970FX            CPU_POWERPC_970FX_v31
     CPU_POWERPC_970FX_v10          = 0x00391100,
@@ -8834,6 +8934,9 @@ static const ppc_def_t ppc_defs[] = {
     /* POWER6A                                                               */
     POWERPC_DEF("POWER6A",       CPU_POWERPC_POWER6A,                POWER6),
 #endif
+    /* POWER7                                                                */
+    POWERPC_DEF("POWER7",        CPU_POWERPC_POWER7,                 POWER7),
+    POWERPC_DEF("POWER7_v2.0",   CPU_POWERPC_POWER7_v20,             POWER7),
     /* PowerPC 970                                                           */
     POWERPC_DEF("970",           CPU_POWERPC_970,                    970),
     /* PowerPC 970FX (G5)                                                    */
-- 
1.7.1

* [Qemu-devel] [PATCH 13/26] Start implementing pSeries logical partition machine
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

This patch adds a "pseries" machine to qemu.  This aims to emulate a
logical partition on an IBM pSeries machine, compliant with the
"PowerPC Architecture Platform Requirements" (PAPR) document.

This initial version is quite limited: it implements a basic machine
and PAPR hypercall emulation.  So far only one hypercall, H_PUT_TERM_CHAR,
is implemented, so that a (write-only) console is available.
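
As a rough illustration of the dynamic registration scheme that hw/spapr_hcall.c implements below, here is a self-contained sketch. PAPR hypercall opcodes are multiples of 4, so opcode / 4 indexes a flat handler table; calls made from problem state fail with H_PRIVILEGE, and misaligned, out-of-range or unregistered opcodes fail with H_FUNCTION. The toy_* names, the toy_env struct (standing in for CPUState) and the 64-bit target_ulong are assumptions made for this sketch, not QEMU interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 64-bit guest, so registers fit in uint64_t. */
typedef uint64_t target_ulong;

/* Values copied from hw/spapr.h in this patch. */
#define H_SUCCESS         0
#define H_FUNCTION        (-2)
#define H_PRIVILEGE       (-3)
#define H_PUT_TERM_CHAR   0x58
#define MAX_HCALL_OPCODE  0x2D4   /* H_GET_MPP */

/* Hypothetical stand-in for CPUState: only the MSR[PR] (problem state) bit. */
struct toy_env {
    int msr_pr;
};

typedef target_ulong (*toy_hcall_fn)(struct toy_env *env, target_ulong opcode,
                                     target_ulong *args);

/* One slot per 4-byte-aligned opcode in 0 .. MAX_HCALL_OPCODE. */
static toy_hcall_fn toy_table[MAX_HCALL_OPCODE / 4 + 1];

static void toy_register_hcall(target_ulong opcode, toy_hcall_fn fn)
{
    assert(opcode <= MAX_HCALL_OPCODE);
    assert((opcode & 0x3) == 0);
    /* Re-registering the same handler is allowed; a different one is not. */
    assert(!toy_table[opcode / 4] || toy_table[opcode / 4] == fn);
    toy_table[opcode / 4] = fn;
}

static target_ulong toy_hypercall(struct toy_env *env, target_ulong opcode,
                                  target_ulong *args)
{
    if (env->msr_pr) {
        return (target_ulong)H_PRIVILEGE;   /* hcalls need supervisor state */
    }
    if (opcode <= MAX_HCALL_OPCODE && (opcode & 0x3) == 0
        && toy_table[opcode / 4]) {
        return toy_table[opcode / 4](env, opcode, args);
    }
    return (target_ulong)H_FUNCTION;        /* unimplemented or misaligned */
}

/* Toy handler: returns the byte count the guest passed in args[1]. */
static target_ulong toy_put_term_char(struct toy_env *env, target_ulong opcode,
                                      target_ulong *args)
{
    (void)env;
    (void)opcode;
    return args[1];
}
```

A flat array costs one pointer per possible opcode (182 slots up to H_GET_MPP), but dispatch becomes a bounds check plus an index, which suits a small, sparse opcode space.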

Multiple CPUs are permitted, with SMP entry handled kexec()-style.
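
A sketch of the register setup this refers to, mirroring what ppc_spapr_init() does below: every CPU gets its index in r3 and starts at 0x60, where (as we understand the ppc64 Linux boot convention) a kernel loaded at address 0 keeps its secondary-CPU hold loop. CPU 0 is then pointed at the kernel entry with the device tree address in r3 and r5 = 0 (no Open Firmware client interface). The toy_cpu struct and toy_setup_smp_entry() are hypothetical simplifications of CPUState and the init code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_CPUS        32
#define SECONDARY_HOLD_ADDR 0x60   /* value the patch stores in hreset_vector */

/* Hypothetical, stripped-down CPU state: GPRs plus the reset entry point. */
struct toy_cpu {
    uint64_t gpr[32];
    uint64_t entry;
};

/* Every CPU starts in the secondary hold loop with its index in r3;
 * CPU 0 is then redirected to the kernel with the FDT address in r3
 * and r5 = 0. */
static void toy_setup_smp_entry(struct toy_cpu *cpus, int ncpus,
                                uint64_t kernel_base, uint64_t fdt_addr)
{
    int i;

    assert(ncpus >= 1 && ncpus <= TOY_MAX_CPUS);
    for (i = 0; i < ncpus; i++) {
        cpus[i].entry = SECONDARY_HOLD_ADDR;
        cpus[i].gpr[3] = (uint64_t)i;
    }
    cpus[0].gpr[3] = fdt_addr;
    cpus[0].gpr[5] = 0;
    cpus[0].entry = kernel_base;
}
```

The boot CPU is expected to release the secondaries from the hold loop itself, so the emulator does not need to know anything about the guest's SMP bring-up beyond these initial register values.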

So far the machine resembles an old POWER4-style "full system
partition" more than a modern LPAR, in that the guest manages the
page tables directly rather than via hypercalls.

The machine requires qemu to be configured with --enable-fdt.  So far
it can only be booted with -kernel, since no partition firmware is
provided.
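
For -kernel boots, ppc_spapr_init() below uses a fixed guest-physical layout: the kernel at 0, the initrd at INITRD_LOAD_ADDR (40 MiB), and the flattened device tree in the last FDT_MAX_SIZE (64 KiB) below the top of RAM or 2 GiB, whichever is lower. A small sketch of that arithmetic, reusing the patch's constants; the toy_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Constants from hw/spapr.c in this patch. */
#define INITRD_LOAD_ADDR  0x02800000ULL   /* 40 MiB */
#define FDT_MAX_SIZE      0x10000ULL      /* 64 KiB */

/* The device tree goes just below the top of RAM or 2 GiB, whichever is
 * lower, so that 32-bit code can still reach it. */
static uint64_t toy_fdt_addr(uint64_t ram_size)
{
    uint64_t top = ram_size < 0x80000000ULL ? ram_size : 0x80000000ULL;

    assert(top >= FDT_MAX_SIZE);
    return top - FDT_MAX_SIZE;
}

/* Maximum size handed to the initrd loader: ram_size - INITRD_LOAD_ADDR.
 * The unsigned subtraction only makes sense if RAM extends past 40 MiB. */
static uint64_t toy_initrd_max(uint64_t ram_size)
{
    assert(ram_size > INITRD_LOAD_ADDR);
    return ram_size - INITRD_LOAD_ADDR;
}
```

With 512 MiB of RAM the tree lands at 0x1FFF0000; with 4 GiB it is capped at 0x7FFF0000. As far as this patch shows, nothing checks that a large initrd stays clear of the FDT region, which is written after the initrd is loaded.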

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 Makefile.target  |    2 +
 hw/spapr.c       |  314 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/spapr.h       |  246 ++++++++++++++++++++++++++++++++++++++++++
 hw/spapr_hcall.c |   43 ++++++++
 4 files changed, 605 insertions(+), 0 deletions(-)
 create mode 100644 hw/spapr.c
 create mode 100644 hw/spapr.h
 create mode 100644 hw/spapr_hcall.c

diff --git a/Makefile.target b/Makefile.target
index f0df98e..e6a7557 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -231,6 +231,8 @@ obj-ppc-y += ppc_prep.o
 obj-ppc-y += ppc_oldworld.o
 # NewWorld PowerMac
 obj-ppc-y += ppc_newworld.o
+# IBM pSeries (sPAPR)
+obj-ppc-y += spapr.o spapr_hcall.o
 # PowerPC 4xx boards
 obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
 obj-ppc-y += ppc440.o ppc440_bamboo.o
diff --git a/hw/spapr.c b/hw/spapr.c
new file mode 100644
index 0000000..8b4e16e
--- /dev/null
+++ b/hw/spapr.c
@@ -0,0 +1,314 @@
+/*
+ * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System Emulator
+ *
+ * Copyright (c) 2004-2007 Fabrice Bellard
+ * Copyright (c) 2007 Jocelyn Mayer
+ * Copyright (c) 2010 David Gibson, IBM Corporation.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ */
+#include "sysemu.h"
+#include "qemu-char.h"
+#include "hw.h"
+#include "elf.h"
+
+#include "hw/boards.h"
+#include "hw/ppc.h"
+#include "hw/loader.h"
+
+#include "hw/spapr.h"
+
+#include <libfdt.h>
+
+#define KERNEL_LOAD_ADDR        0x00000000
+#define INITRD_LOAD_ADDR        0x02800000
+#define FDT_MAX_SIZE            0x10000
+
+#define TIMEBASE_FREQ           512000000ULL
+
+#define MAX_CPUS                32
+
+static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
+                              const char *cpu_model, CPUState *envs[],
+                              sPAPREnvironment *spapr,
+                              target_phys_addr_t initrd_base,
+                              target_phys_addr_t initrd_size,
+                              const char *kernel_cmdline)
+{
+    void *fdt;
+    uint64_t mem_reg_property[] = { 0, cpu_to_be64(ramsize) };
+    uint32_t start_prop = cpu_to_be32(initrd_base);
+    uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
+    int i;
+    char *modelname;
+
+#define _FDT(exp) \
+    do { \
+        int ret = (exp);                                           \
+        if (ret < 0) {                                             \
+            hw_error("qemu: error creating device tree: %s: %s\n", \
+                     #exp, fdt_strerror(ret));                     \
+            return NULL;                                           \
+        }                                                          \
+    } while (0)
+
+    fdt = qemu_mallocz(FDT_MAX_SIZE);
+    _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
+
+    _FDT((fdt_finish_reservemap(fdt)));
+
+    /* Root node */
+    _FDT((fdt_begin_node(fdt, "")));
+    _FDT((fdt_property_string(fdt, "device_type", "chrp")));
+    _FDT((fdt_property_string(fdt, "model", "qemu,emulated-pSeries-LPAR")));
+
+    _FDT((fdt_property_cell(fdt, "#address-cells", 0x2)));
+    _FDT((fdt_property_cell(fdt, "#size-cells", 0x2)));
+
+    /* /chosen */
+    _FDT((fdt_begin_node(fdt, "chosen")));
+
+    _FDT((fdt_property_string(fdt, "bootargs", kernel_cmdline)));
+    _FDT((fdt_property(fdt, "linux,initrd-start", &start_prop, sizeof(start_prop))));
+    _FDT((fdt_property(fdt, "linux,initrd-end", &end_prop, sizeof(end_prop))));
+
+    _FDT((fdt_end_node(fdt)));
+
+    /* memory node */
+    _FDT((fdt_begin_node(fdt, "memory@0")));
+
+    _FDT((fdt_property_string(fdt, "device_type", "memory")));
+    _FDT((fdt_property(fdt, "reg", mem_reg_property, sizeof(mem_reg_property))));
+
+    _FDT((fdt_end_node(fdt)));
+
+    /* cpus */
+    _FDT((fdt_begin_node(fdt, "cpus")));
+
+    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
+    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
+
+    modelname = qemu_strdup(cpu_model);
+
+    for (i = 0; i < strlen(modelname); i++) {
+        modelname[i] = toupper(modelname[i]);
+    }
+
+    for (i = 0; i < smp_cpus; i++) {
+        CPUState *env = envs[i];
+        char *nodename;
+        uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
+                           0xffffffff, 0xffffffff};
+
+        if (asprintf(&nodename, "%s@%x", modelname, i) < 0) {
+            fprintf(stderr, "Allocation failure\n");
+            exit(1);
+        }
+
+        _FDT((fdt_begin_node(fdt, nodename)));
+
+        free(nodename);
+
+        _FDT((fdt_property_cell(fdt, "reg", i)));
+        _FDT((fdt_property_string(fdt, "device_type", "cpu")));
+
+        _FDT((fdt_property_cell(fdt, "cpu-version", env->spr[SPR_PVR])));
+        _FDT((fdt_property_cell(fdt, "dcache-block-size", env->dcache_line_size)));
+        _FDT((fdt_property_cell(fdt, "icache-block-size", env->icache_line_size)));
+        _FDT((fdt_property_cell(fdt, "timebase-frequency", TIMEBASE_FREQ)));
+        /* Hardcode the CPU frequency for now.  It's somewhat arbitrary
+         * under full emulation; for KVM we should copy it from the host. */
+        _FDT((fdt_property_cell(fdt, "clock-frequency", 1000000000)));
+        _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr)));
+        _FDT((fdt_property_string(fdt, "status", "okay")));
+        _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
+
+        if (envs[i]->mmu_model & POWERPC_MMU_1TSEG) {
+            _FDT((fdt_property(fdt, "ibm,processor-segment-sizes",
+                               segs, sizeof(segs))));
+        }
+
+        _FDT((fdt_end_node(fdt)));
+    }
+
+    qemu_free(modelname);
+
+    _FDT((fdt_end_node(fdt)));
+
+    _FDT((fdt_end_node(fdt))); /* close root node */
+    _FDT((fdt_finish(fdt)));
+
+    if (fdt_size) {
+        *fdt_size = fdt_totalsize(fdt);
+    }
+
+    return fdt;
+}
+
+static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
+{
+    return (addr & 0x0fffffff) + KERNEL_LOAD_ADDR;
+}
+
+static void emulate_spapr_hypercall(CPUState *env, void *opaque)
+{
+    env->gpr[3] = spapr_hypercall(env, (sPAPREnvironment *)opaque,
+                                  env->gpr[3], &env->gpr[4]);
+}
+
+/* FIXME: hack until we implement the proper VIO console */
+static target_ulong h_put_term_char(CPUState *env, sPAPREnvironment *spapr,
+                                    target_ulong opcode, target_ulong *args)
+{
+    uint8_t buf[16];
+
+    stq_p(buf, args[2]);
+    stq_p(buf + 8, args[3]);
+
+    qemu_chr_write(serial_hds[0], buf, args[1]);
+
+    return 0;
+}
+
+
+/* pSeries LPAR / sPAPR hardware init */
+static void ppc_spapr_init(ram_addr_t ram_size,
+                           const char *boot_device,
+                           const char *kernel_filename,
+                           const char *kernel_cmdline,
+                           const char *initrd_filename,
+                           const char *cpu_model)
+{
+    CPUState *envs[MAX_CPUS];
+    void *fdt;
+    int i;
+    ram_addr_t ram_offset;
+    target_phys_addr_t fdt_addr;
+    uint32_t kernel_base, initrd_base;
+    long kernel_size, initrd_size;
+    int fdt_size;
+    sPAPREnvironment *spapr;
+
+    spapr = qemu_malloc(sizeof(*spapr));
+
+    /* We place the device tree just below either the top of RAM, or
+     * 2GB, so that it can be processed with 32-bit code if
+     * necessary */
+    fdt_addr = MIN(ram_size, 0x80000000) - FDT_MAX_SIZE;
+
+    /* init CPUs */
+    if (cpu_model == NULL) {
+        cpu_model = "POWER7";
+    }
+    for (i = 0; i < smp_cpus; i++) {
+        CPUState *env = cpu_init(cpu_model);
+
+        if (!env) {
+            fprintf(stderr, "Unable to find PowerPC CPU definition\n");
+            exit(1);
+        }
+        /* Set time-base frequency to 512 MHz */
+        cpu_ppc_tb_init(env, TIMEBASE_FREQ);
+        qemu_register_reset((QEMUResetHandler*)&cpu_reset, env);
+
+        env->emulate_hypercall = emulate_spapr_hypercall;
+        env->hcall_opaque = spapr;
+
+        env->hreset_vector = 0x60;
+        env->hreset_excp_prefix = 0;
+        env->gpr[3] = i;
+
+        envs[i] = env;
+    }
+
+    /* allocate RAM */
+    ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
+    cpu_register_physical_memory(0, ram_size, ram_offset);
+
+    spapr_register_hypercall(H_PUT_TERM_CHAR, h_put_term_char);
+
+    if (kernel_filename) {
+        uint64_t lowaddr = 0;
+
+        kernel_base = KERNEL_LOAD_ADDR;
+
+        kernel_size = load_elf(kernel_filename, translate_kernel_address, NULL,
+                               NULL, &lowaddr, NULL, 1, ELF_MACHINE, 0);
+        if (kernel_size < 0) {
+            kernel_size = load_image_targphys(kernel_filename, kernel_base,
+                                              ram_size - kernel_base);
+        }
+        if (kernel_size < 0) {
+            hw_error("qemu: could not load kernel '%s'\n", kernel_filename);
+            exit(1);
+        }
+
+        /* load initrd */
+        if (initrd_filename) {
+            initrd_base = INITRD_LOAD_ADDR;
+            initrd_size = load_image_targphys(initrd_filename, initrd_base,
+                                              ram_size - initrd_base);
+            if (initrd_size < 0) {
+                hw_error("qemu: could not load initial ram disk '%s'\n",
+                         initrd_filename);
+                exit(1);
+            }
+        } else {
+            initrd_base = 0;
+            initrd_size = 0;
+        }
+
+    } else {
+        fprintf(stderr, "pSeries machine needs -kernel for now");
+        exit(1);
+    }
+
+    /* Prepare the device tree */
+    fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, envs, spapr,
+                           initrd_base, initrd_size, kernel_cmdline);
+    if (!fdt) {
+        hw_error("Couldn't create pSeries device tree\n");
+        exit(1);
+    }
+
+    cpu_physical_memory_write(fdt_addr, fdt, fdt_size);
+
+    qemu_free(fdt);
+
+    envs[0]->gpr[3] = fdt_addr;
+    envs[0]->gpr[5] = 0;
+    envs[0]->hreset_vector = kernel_base;
+}
+
+static QEMUMachine spapr_machine = {
+    .name = "pseries",
+    .desc = "pSeries Logical Partition (PAPR compliant)",
+    .init = ppc_spapr_init,
+    .max_cpus = MAX_CPUS,
+    .no_vga = 1,
+    .no_parallel = 1,
+};
+
+static void spapr_machine_init(void)
+{
+    qemu_register_machine(&spapr_machine);
+}
+
+machine_init(spapr_machine_init);
diff --git a/hw/spapr.h b/hw/spapr.h
new file mode 100644
index 0000000..9e63a19
--- /dev/null
+++ b/hw/spapr.h
@@ -0,0 +1,246 @@
+#if !defined (__HW_SPAPR_H__)
+#define __HW_SPAPR_H__
+
+typedef struct sPAPREnvironment {
+} sPAPREnvironment;
+
+#define H_SUCCESS         0
+#define H_BUSY            1        /* Hardware busy -- retry later */
+#define H_CLOSED          2        /* Resource closed */
+#define H_NOT_AVAILABLE   3
+#define H_CONSTRAINED     4        /* Resource request constrained to max allowed */
+#define H_PARTIAL         5
+#define H_IN_PROGRESS     14       /* Kind of like busy */
+#define H_PAGE_REGISTERED 15
+#define H_PARTIAL_STORE   16
+#define H_PENDING         17       /* returned from H_POLL_PENDING */
+#define H_CONTINUE        18       /* Returned from H_Join on success */
+#define H_LONG_BUSY_START_RANGE         9900  /* Start of long busy range */
+#define H_LONG_BUSY_ORDER_1_MSEC        9900  /* Long busy, hint that 1msec \
+                                                 is a good time to retry */
+#define H_LONG_BUSY_ORDER_10_MSEC       9901  /* Long busy, hint that 10msec \
+                                                 is a good time to retry */
+#define H_LONG_BUSY_ORDER_100_MSEC      9902  /* Long busy, hint that 100msec \
+                                                 is a good time to retry */
+#define H_LONG_BUSY_ORDER_1_SEC         9903  /* Long busy, hint that 1sec \
+                                                 is a good time to retry */
+#define H_LONG_BUSY_ORDER_10_SEC        9904  /* Long busy, hint that 10sec \
+                                                 is a good time to retry */
+#define H_LONG_BUSY_ORDER_100_SEC       9905  /* Long busy, hint that 100sec \
+                                                 is a good time to retry */
+#define H_LONG_BUSY_END_RANGE           9905  /* End of long busy range */
+#define H_HARDWARE        -1       /* Hardware error */
+#define H_FUNCTION        -2       /* Function not supported */
+#define H_PRIVILEGE       -3       /* Caller not privileged */
+#define H_PARAMETER       -4       /* Parameter invalid, out-of-range or conflicting */
+#define H_BAD_MODE        -5       /* Illegal msr value */
+#define H_PTEG_FULL       -6       /* PTEG is full */
+#define H_NOT_FOUND       -7       /* PTE was not found */
+#define H_RESERVED_DABR   -8       /* DABR address is reserved by the hypervisor on this processor */
+#define H_NO_MEM          -9
+#define H_AUTHORITY       -10
+#define H_PERMISSION      -11
+#define H_DROPPED         -12
+#define H_SOURCE_PARM     -13
+#define H_DEST_PARM       -14
+#define H_REMOTE_PARM     -15
+#define H_RESOURCE        -16
+#define H_ADAPTER_PARM    -17
+#define H_RH_PARM         -18
+#define H_RCQ_PARM        -19
+#define H_SCQ_PARM        -20
+#define H_EQ_PARM         -21
+#define H_RT_PARM         -22
+#define H_ST_PARM         -23
+#define H_SIGT_PARM       -24
+#define H_TOKEN_PARM      -25
+#define H_MLENGTH_PARM    -27
+#define H_MEM_PARM        -28
+#define H_MEM_ACCESS_PARM -29
+#define H_ATTR_PARM       -30
+#define H_PORT_PARM       -31
+#define H_MCG_PARM        -32
+#define H_VL_PARM         -33
+#define H_TSIZE_PARM      -34
+#define H_TRACE_PARM      -35
+
+#define H_MASK_PARM       -37
+#define H_MCG_FULL        -38
+#define H_ALIAS_EXIST     -39
+#define H_P_COUNTER       -40
+#define H_TABLE_FULL      -41
+#define H_ALT_TABLE       -42
+#define H_MR_CONDITION    -43
+#define H_NOT_ENOUGH_RESOURCES -44
+#define H_R_STATE         -45
+#define H_RESCINDEND      -46
+#define H_MULTI_THREADS_ACTIVE -9005
+
+
+/* Long Busy is a condition that can be returned by the firmware
+ * when a call cannot be completed now, but the identical call
+ * should be retried later.  This prevents calls blocking in the
+ * firmware for long periods of time.  Annoyingly the firmware can return
+ * a range of return codes, hinting at how long we should wait before
+ * retrying.  If you don't care for the hint, the macro below is a good
+ * way to check for the long_busy return codes.
+ */
+#define H_IS_LONG_BUSY(x)  (((x) >= H_LONG_BUSY_START_RANGE) \
+                            && ((x) <= H_LONG_BUSY_END_RANGE))
+
+/* Flags */
+#define H_LARGE_PAGE      (1ULL<<(63-16))
+#define H_EXACT           (1ULL<<(63-24))       /* Use exact PTE or return H_PTEG_FULL */
+#define H_R_XLATE         (1ULL<<(63-25))       /* include a valid logical page num in the pte if the valid bit is set */
+#define H_READ_4          (1ULL<<(63-26))       /* Return 4 PTEs */
+#define H_PAGE_STATE_CHANGE (1ULL<<(63-28))
+#define H_PAGE_UNUSED     ((1ULL<<(63-29)) | (1ULL<<(63-30)))
+#define H_PAGE_SET_UNUSED (H_PAGE_STATE_CHANGE | H_PAGE_UNUSED)
+#define H_PAGE_SET_LOANED (H_PAGE_SET_UNUSED | (1ULL<<(63-31)))
+#define H_PAGE_SET_ACTIVE H_PAGE_STATE_CHANGE
+#define H_AVPN            (1ULL<<(63-32))       /* An avpn is provided as a sanity test */
+#define H_ANDCOND         (1ULL<<(63-33))
+#define H_ICACHE_INVALIDATE (1ULL<<(63-40))     /* icbi, etc.  (ignored for IO pages) */
+#define H_ICACHE_SYNCHRONIZE (1ULL<<(63-41))    /* dcbst, icbi, etc. (ignored for IO pages) */
+#define H_ZERO_PAGE       (1ULL<<(63-48))       /* zero the page before mapping (ignored for IO pages) */
+#define H_COPY_PAGE       (1ULL<<(63-49))
+#define H_N               (1ULL<<(63-61))
+#define H_PP1             (1ULL<<(63-62))
+#define H_PP2             (1ULL<<(63-63))
+
+/* VASI States */
+#define H_VASI_INVALID    0
+#define H_VASI_ENABLED    1
+#define H_VASI_ABORTED    2
+#define H_VASI_SUSPENDING 3
+#define H_VASI_SUSPENDED  4
+#define H_VASI_RESUMED    5
+#define H_VASI_COMPLETED  6
+
+/* DABRX flags */
+#define H_DABRX_HYPERVISOR (1ULL<<(63-61))
+#define H_DABRX_KERNEL     (1ULL<<(63-62))
+#define H_DABRX_USER       (1ULL<<(63-63))
+
+/* Each control block has to be on a 4K boundary */
+#define H_CB_ALIGNMENT     4096
+
+/* pSeries hypervisor opcodes */
+#define H_REMOVE                0x04
+#define H_ENTER                 0x08
+#define H_READ                  0x0c
+#define H_CLEAR_MOD             0x10
+#define H_CLEAR_REF             0x14
+#define H_PROTECT               0x18
+#define H_GET_TCE               0x1c
+#define H_PUT_TCE               0x20
+#define H_SET_SPRG0             0x24
+#define H_SET_DABR              0x28
+#define H_PAGE_INIT             0x2c
+#define H_SET_ASR               0x30
+#define H_ASR_ON                0x34
+#define H_ASR_OFF               0x38
+#define H_LOGICAL_CI_LOAD       0x3c
+#define H_LOGICAL_CI_STORE      0x40
+#define H_LOGICAL_CACHE_LOAD    0x44
+#define H_LOGICAL_CACHE_STORE   0x48
+#define H_LOGICAL_ICBI          0x4c
+#define H_LOGICAL_DCBF          0x50
+#define H_GET_TERM_CHAR         0x54
+#define H_PUT_TERM_CHAR         0x58
+#define H_REAL_TO_LOGICAL       0x5c
+#define H_HYPERVISOR_DATA       0x60
+#define H_EOI                   0x64
+#define H_CPPR                  0x68
+#define H_IPI                   0x6c
+#define H_IPOLL                 0x70
+#define H_XIRR                  0x74
+#define H_PERFMON               0x7c
+#define H_MIGRATE_DMA           0x78
+#define H_REGISTER_VPA          0xDC
+#define H_CEDE                  0xE0
+#define H_CONFER                0xE4
+#define H_PROD                  0xE8
+#define H_GET_PPP               0xEC
+#define H_SET_PPP               0xF0
+#define H_PURR                  0xF4
+#define H_PIC                   0xF8
+#define H_REG_CRQ               0xFC
+#define H_FREE_CRQ              0x100
+#define H_VIO_SIGNAL            0x104
+#define H_SEND_CRQ              0x108
+#define H_COPY_RDMA             0x110
+#define H_REGISTER_LOGICAL_LAN  0x114
+#define H_FREE_LOGICAL_LAN      0x118
+#define H_ADD_LOGICAL_LAN_BUFFER 0x11C
+#define H_SEND_LOGICAL_LAN      0x120
+#define H_BULK_REMOVE           0x124
+#define H_MULTICAST_CTRL        0x130
+#define H_SET_XDABR             0x134
+#define H_STUFF_TCE             0x138
+#define H_PUT_TCE_INDIRECT      0x13C
+#define H_CHANGE_LOGICAL_LAN_MAC 0x14C
+#define H_VTERM_PARTNER_INFO    0x150
+#define H_REGISTER_VTERM        0x154
+#define H_FREE_VTERM            0x158
+#define H_RESET_EVENTS          0x15C
+#define H_ALLOC_RESOURCE        0x160
+#define H_FREE_RESOURCE         0x164
+#define H_MODIFY_QP             0x168
+#define H_QUERY_QP              0x16C
+#define H_REREGISTER_PMR        0x170
+#define H_REGISTER_SMR          0x174
+#define H_QUERY_MR              0x178
+#define H_QUERY_MW              0x17C
+#define H_QUERY_HCA             0x180
+#define H_QUERY_PORT            0x184
+#define H_MODIFY_PORT           0x188
+#define H_DEFINE_AQP1           0x18C
+#define H_GET_TRACE_BUFFER      0x190
+#define H_DEFINE_AQP0           0x194
+#define H_RESIZE_MR             0x198
+#define H_ATTACH_MCQP           0x19C
+#define H_DETACH_MCQP           0x1A0
+#define H_CREATE_RPT            0x1A4
+#define H_REMOVE_RPT            0x1A8
+#define H_REGISTER_RPAGES       0x1AC
+#define H_DISABLE_AND_GETC      0x1B0
+#define H_ERROR_DATA            0x1B4
+#define H_GET_HCA_INFO          0x1B8
+#define H_GET_PERF_COUNT        0x1BC
+#define H_MANAGE_TRACE          0x1C0
+#define H_FREE_LOGICAL_LAN_BUFFER 0x1D4
+#define H_QUERY_INT_STATE       0x1E4
+#define H_POLL_PENDING          0x1D8
+#define H_ILLAN_ATTRIBUTES      0x244
+#define H_MODIFY_HEA_QP         0x250
+#define H_QUERY_HEA_QP          0x254
+#define H_QUERY_HEA             0x258
+#define H_QUERY_HEA_PORT        0x25C
+#define H_MODIFY_HEA_PORT       0x260
+#define H_REG_BCMC              0x264
+#define H_DEREG_BCMC            0x268
+#define H_REGISTER_HEA_RPAGES   0x26C
+#define H_DISABLE_AND_GET_HEA   0x270
+#define H_GET_HEA_INFO          0x274
+#define H_ALLOC_HEA_RESOURCE    0x278
+#define H_ADD_CONN              0x284
+#define H_DEL_CONN              0x288
+#define H_JOIN                  0x298
+#define H_VASI_STATE            0x2A4
+#define H_ENABLE_CRQ            0x2B0
+#define H_GET_EM_PARMS          0x2B8
+#define H_SET_MPP               0x2D0
+#define H_GET_MPP               0x2D4
+#define MAX_HCALL_OPCODE        H_GET_MPP
+
+typedef target_ulong (*spapr_hcall_fn)(CPUState *env, sPAPREnvironment *spapr,
+                                       target_ulong opcode,
+                                       target_ulong *args);
+
+void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn);
+target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
+                             target_ulong opcode, target_ulong *args);
+
+
+#endif /* !defined (__HW_SPAPR_H__) */
diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
new file mode 100644
index 0000000..6ddac00
--- /dev/null
+++ b/hw/spapr_hcall.c
@@ -0,0 +1,43 @@
+#include "sysemu.h"
+#include "cpu.h"
+#include "qemu-char.h"
+#include "hw/spapr.h"
+
+struct hypercall {
+    spapr_hcall_fn fn;
+} hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
+
+void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn)
+{
+    struct hypercall *hc;
+
+    assert(opcode <= MAX_HCALL_OPCODE);
+    assert((opcode & 0x3) == 0);
+
+    hc = hypercall_table + (opcode / 4);
+
+    assert(!hc->fn || (fn == hc->fn));
+
+    hc->fn = fn;
+}
+
+target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
+                             target_ulong opcode, target_ulong *args)
+{
+    if (msr_pr) {
+        fprintf(stderr, "Hypercall made with MSR=0x" TARGET_FMT_lx "\n",
+                env->msr);
+        return H_PRIVILEGE;
+    }
+
+    if ((opcode <= MAX_HCALL_OPCODE)
+        && ((opcode & 0x3) == 0)) {
+        struct hypercall *hc = hypercall_table + (opcode / 4);
+
+        if (hc->fn)
+            return hc->fn(env, spapr, opcode, args);
+    }
+
+    fprintf(stderr, "Unimplemented hcall 0x" TARGET_FMT_lx "\n", opcode);
+    return H_FUNCTION;
+}
-- 
1.7.1

* [Qemu-devel] [PATCH 14/26] Implement the bus structure for PAPR virtual IO
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

This extends the "pseries" (PAPR) machine to include a virtual IO bus
supporting the PAPR-defined, hypercall-based virtual IO mechanisms.
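
Devices on this bus are identified by a "reg" value, which vio_make_devnode() below also writes into the device's "reg" property under /vdevice. A minimal sketch of a lookup by reg over a hypothetical singly linked list (toy_vio_dev stands in for VIOsPAPRDevice and the qdev child list), returning NULL when no device matches so that a caller can reject a hypercall aimed at a nonexistent unit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VIOsPAPRDevice on the bus's child list. */
struct toy_vio_dev {
    uint32_t reg;                 /* address of the device on the VIO bus */
    struct toy_vio_dev *next;
};

/* Return the device whose reg matches, or NULL if there is none. */
static struct toy_vio_dev *toy_find_by_reg(struct toy_vio_dev *head,
                                           uint32_t reg)
{
    struct toy_vio_dev *dev;

    for (dev = head; dev != NULL; dev = dev->next) {
        if (dev->reg == reg) {
            return dev;
        }
    }
    return NULL;
}
```

Returning from inside the loop, rather than breaking out and returning the iterator, keeps a miss from being reported as the last device on the list.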

So far only one VIO device is provided: the vty (vterm), which gives
a full console (polled only, for now).
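
The console hypercalls pass data in registers. In the interim h_put_term_char() handler that this patch removes from hw/spapr.c, args[1] is the byte count and args[2]/args[3] carry up to 16 bytes, unpacked with stq_p() (most significant byte first on this big-endian target). A standalone sketch of that unpacking; toy_store_be64() and toy_unpack_term_chars() are hypothetical helpers, and clamping the length to 16 is a choice made here (the removed handler passed args[1] straight to qemu_chr_write(); a real implementation might instead reject larger values with H_PARAMETER).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value most significant byte first, as stq_p() does on a
 * big-endian PowerPC target. */
static void toy_store_be64(uint8_t *p, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Unpack an H_PUT_TERM_CHAR payload into out[].  Returns the number of
 * valid bytes, clamping the guest-supplied length to the 16-byte buffer. */
static size_t toy_unpack_term_chars(const uint64_t *args, uint8_t out[16])
{
    size_t len;

    assert(args != NULL && out != NULL);
    len = args[1] > 16 ? 16 : (size_t)args[1];
    toy_store_be64(out, args[2]);
    toy_store_be64(out + 8, args[3]);
    return len;
}
```

For example, a guest writing "hi" would pass args[1] = 2 and args[2] = 0x6869000000000000, so out[0] and out[1] hold 'h' and 'i'.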

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 Makefile.target |    3 +-
 hw/spapr.c      |   47 ++++++++-----
 hw/spapr.h      |    3 +
 hw/spapr_vio.c  |  212 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/spapr_vio.h  |   50 +++++++++++++
 hw/spapr_vty.c  |  145 +++++++++++++++++++++++++++++++++++++
 6 files changed, 441 insertions(+), 19 deletions(-)
 create mode 100644 hw/spapr_vio.c
 create mode 100644 hw/spapr_vio.h
 create mode 100644 hw/spapr_vty.c

diff --git a/Makefile.target b/Makefile.target
index e6a7557..3f2b235 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -232,7 +232,8 @@ obj-ppc-y += ppc_oldworld.o
 # NewWorld PowerMac
 obj-ppc-y += ppc_newworld.o
 # IBM pSeries (sPAPR)
-obj-ppc-y += spapr.o spapr_hcall.o
+obj-ppc-y += spapr.o spapr_hcall.o spapr_vio.o
+obj-ppc-y += spapr_vty.o
 # PowerPC 4xx boards
 obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
 obj-ppc-y += ppc440.o ppc440_bamboo.o
diff --git a/hw/spapr.c b/hw/spapr.c
index 8b4e16e..25e4a9e 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -25,7 +25,6 @@
  *
  */
 #include "sysemu.h"
-#include "qemu-char.h"
 #include "hw.h"
 #include "elf.h"
 
@@ -34,6 +33,7 @@
 #include "hw/loader.h"
 
 #include "hw/spapr.h"
+#include "hw/spapr_vio.h"
 
 #include <libfdt.h>
 
@@ -58,6 +58,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
     int i;
     char *modelname;
+    int ret;
 
 #define _FDT(exp) \
     do { \
@@ -152,9 +153,29 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
 
     _FDT((fdt_end_node(fdt)));
 
+    /* vdevice */
+    _FDT((fdt_begin_node(fdt, "vdevice")));
+
+    _FDT((fdt_property_string(fdt, "device_type", "vdevice")));
+    _FDT((fdt_property_string(fdt, "compatible", "IBM,vdevice")));
+    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
+    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
+
+    _FDT((fdt_end_node(fdt)));
+
     _FDT((fdt_end_node(fdt))); /* close root node */
     _FDT((fdt_finish(fdt)));
 
+    /* re-expand to allow for further tweaks */
+    _FDT((fdt_open_into(fdt, fdt, FDT_MAX_SIZE)));
+
+    ret = spapr_populate_vdevice(spapr->vio_bus, fdt);
+    if (ret < 0) {
+        fprintf(stderr, "couldn't setup vio devices in fdt\n");
+    }
+
+    _FDT((fdt_pack(fdt)));
+
     if (fdt_size) {
         *fdt_size = fdt_totalsize(fdt);
     }
@@ -173,21 +194,6 @@ static void emulate_spapr_hypercall(CPUState *env, void *opaque)
                                   env->gpr[3], &env->gpr[4]);
 }
 
-/* FIXME: hack until we implement the proper VIO console */
-static target_ulong h_put_term_char(CPUState *env, sPAPREnvironment *spapr,
-                                    target_ulong opcode, target_ulong *args)
-{
-    uint8_t buf[16];
-
-    stq_p(buf, args[2]);
-    stq_p(buf + 8, args[3]);
-
-    qemu_chr_write(serial_hds[0], buf, args[1]);
-
-    return 0;
-}
-
-
 /* pSeries LPAR / sPAPR hardware init */
 static void ppc_spapr_init(ram_addr_t ram_size,
                            const char *boot_device,
@@ -242,7 +248,13 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
     cpu_register_physical_memory(0, ram_size, ram_offset);
 
-    spapr_register_hypercall(H_PUT_TERM_CHAR, h_put_term_char);
+    spapr->vio_bus = spapr_vio_bus_init();
+
+    for (i = 0; i < MAX_SERIAL_PORTS; i++) {
+        if (serial_hds[i]) {
+            spapr_vty_create(spapr->vio_bus, i, serial_hds[i]);
+        }
+    }
 
     if (kernel_filename) {
         uint64_t lowaddr = 0;
@@ -274,7 +286,6 @@ static void ppc_spapr_init(ram_addr_t ram_size,
             initrd_base = 0;
             initrd_size = 0;
         }
-
     } else {
         fprintf(stderr, "pSeries machine needs -kernel for now");
         exit(1);
diff --git a/hw/spapr.h b/hw/spapr.h
index 9e63a19..47bf2ef 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -1,7 +1,10 @@
 #if !defined (__HW_SPAPR_H__)
 #define __HW_SPAPR_H__
 
+struct VIOsPAPRBus;
+
 typedef struct sPAPREnvironment {
+    struct VIOsPAPRBus *vio_bus;
 } sPAPREnvironment;
 
 #define H_SUCCESS         0
diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
new file mode 100644
index 0000000..0ed63f4
--- /dev/null
+++ b/hw/spapr_vio.c
@@ -0,0 +1,212 @@
+/*
+ * QEMU sPAPR VIO code
+ *
+ * Copyright (c) 2010 David Gibson, IBM Corporation <david@gibson.dropbear.id.au>
+ * Based on the s390 virtio bus code:
+ * Copyright (c) 2009 Alexander Graf <agraf@suse.de>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "hw.h"
+#include "sysemu.h"
+#include "boards.h"
+#include "monitor.h"
+#include "loader.h"
+#include "elf.h"
+#include "hw/sysbus.h"
+#include "kvm.h"
+#include "device_tree.h"
+
+#include "hw/spapr.h"
+#include "hw/spapr_vio.h"
+
+#ifdef CONFIG_FDT
+#include <libfdt.h>
+#endif /* CONFIG_FDT */
+
+/* #define DEBUG_SPAPR */
+
+#ifdef DEBUG_SPAPR
+#define dprintf(fmt, ...) \
+    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define dprintf(fmt, ...) \
+    do { } while (0)
+#endif
+
+static struct BusInfo spapr_vio_bus_info = {
+    .name       = "spapr-vio",
+    .size       = sizeof(VIOsPAPRBus),
+};
+
+VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg)
+{
+    DeviceState *qdev;
+    VIOsPAPRDevice *dev = NULL;
+
+    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
+        dev = (VIOsPAPRDevice *)qdev;
+        if (dev->reg == reg) {
+            return dev;
+        }
+    }
+
+    return NULL;
+}
+
+#ifdef CONFIG_FDT
+static int vio_make_devnode(VIOsPAPRDevice *dev,
+                            void *fdt)
+{
+    VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)dev->qdev.info;
+    int vdevice_off, node_off;
+    int ret;
+
+    vdevice_off = fdt_path_offset(fdt, "/vdevice");
+    if (vdevice_off < 0) {
+        return vdevice_off;
+    }
+
+    node_off = fdt_add_subnode(fdt, vdevice_off, dev->qdev.id);
+    if (node_off < 0) {
+        return node_off;
+    }
+
+    ret = fdt_setprop_cell(fdt, node_off, "reg", dev->reg);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (info->dt_type) {
+        ret = fdt_setprop_string(fdt, node_off, "device_type",
+                                 info->dt_type);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    if (info->dt_compatible) {
+        ret = fdt_setprop_string(fdt, node_off, "compatible",
+                                 info->dt_compatible);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    if (info->devnode) {
+        ret = (info->devnode)(dev, fdt, node_off);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return node_off;
+}
+#endif /* CONFIG_FDT */
+
+static int spapr_vio_busdev_init(DeviceState *dev, DeviceInfo *info)
+{
+    VIOsPAPRDeviceInfo *_info = (VIOsPAPRDeviceInfo *)info;
+    VIOsPAPRDevice *_dev = (VIOsPAPRDevice *)dev;
+    char *id;
+
+    if (asprintf(&id, "%s@%x", _info->dt_name, _dev->reg) < 0) {
+        return -1;
+    }
+
+    _dev->qdev.id = id;
+
+    return _info->init(_dev);
+}
+
+void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info)
+{
+    info->qdev.init = spapr_vio_busdev_init;
+    info->qdev.bus_info = &spapr_vio_bus_info;
+
+    assert(info->qdev.size >= sizeof(VIOsPAPRDevice));
+    qdev_register(&info->qdev);
+}
+
+VIOsPAPRBus *spapr_vio_bus_init(void)
+{
+    VIOsPAPRBus *bus;
+    BusState *_bus;
+    DeviceState *dev;
+    DeviceInfo *_info;
+
+    /* Create bridge device */
+    dev = qdev_create(NULL, "spapr-vio-bridge");
+    qdev_init_nofail(dev);
+
+    /* Create bus on bridge device */
+
+    _bus = qbus_create(&spapr_vio_bus_info, dev, "spapr-vio");
+    bus = DO_UPCAST(VIOsPAPRBus, bus, _bus);
+
+    for (_info = device_info_list; _info; _info = _info->next) {
+        VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)_info;
+
+        if (_info->bus_info != &spapr_vio_bus_info)
+            continue;
+
+        if (info->hcalls)
+            info->hcalls(bus);
+    }
+
+    return bus;
+}
+
+/* Represents sPAPR hcall VIO devices */
+
+static int spapr_vio_bridge_init(SysBusDevice *dev)
+{
+    /* nothing */
+    return 0;
+}
+
+static SysBusDeviceInfo spapr_vio_bridge_info = {
+    .init = spapr_vio_bridge_init,
+    .qdev.name  = "spapr-vio-bridge",
+    .qdev.size  = sizeof(SysBusDevice),
+    .qdev.no_user = 1,
+};
+
+static void spapr_vio_register_devices(void)
+{
+    sysbus_register_withprop(&spapr_vio_bridge_info);
+}
+
+device_init(spapr_vio_register_devices)
+
+#ifdef CONFIG_FDT
+int spapr_populate_vdevice(VIOsPAPRBus *bus, void *fdt)
+{
+    DeviceState *qdev;
+    int ret = 0;
+
+    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
+        VIOsPAPRDevice *dev = (VIOsPAPRDevice *)qdev;
+
+        ret = vio_make_devnode(dev, fdt);
+
+        if (ret < 0) {
+            return ret;
+        }
+    }
+    
+    return 0;
+}
+#endif /* CONFIG_FDT */
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
new file mode 100644
index 0000000..b164ad3
--- /dev/null
+++ b/hw/spapr_vio.h
@@ -0,0 +1,50 @@
+#ifndef _HW_SPAPR_VIO_H
+#define _HW_SPAPR_VIO_H
+/*
+ * QEMU sPAPR VIO bus definitions
+ *
+ * Copyright (c) 2010 David Gibson, IBM Corporation <david@gibson.dropbear.id.au>
+ * Based on the s390 virtio bus definitions:
+ * Copyright (c) 2009 Alexander Graf <agraf@suse.de>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+typedef struct VIOsPAPRDevice {
+    DeviceState qdev;
+    uint32_t reg;
+} VIOsPAPRDevice;
+
+typedef struct VIOsPAPRBus {
+    BusState bus;
+} VIOsPAPRBus;
+
+typedef struct {
+    DeviceInfo qdev;
+    const char *dt_name, *dt_type, *dt_compatible;
+    int (*init)(VIOsPAPRDevice *dev);
+    void (*hcalls)(VIOsPAPRBus *bus);
+    int (*devnode)(VIOsPAPRDevice *dev, void *fdt, int node_off);
+} VIOsPAPRDeviceInfo;
+
+extern VIOsPAPRBus *spapr_vio_bus_init(void);
+extern VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg);
+extern void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info);
+extern int spapr_populate_vdevice(VIOsPAPRBus *bus, void *fdt);
+
+void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len);
+void spapr_vty_create(VIOsPAPRBus *bus,
+                      uint32_t reg, CharDriverState *chardev);
+
+#endif /* _HW_SPAPR_VIO_H */
diff --git a/hw/spapr_vty.c b/hw/spapr_vty.c
new file mode 100644
index 0000000..afc9ef9
--- /dev/null
+++ b/hw/spapr_vty.c
@@ -0,0 +1,145 @@
+#include "qdev.h"
+#include "qemu-char.h"
+#include "hw/spapr.h"
+#include "hw/spapr_vio.h"
+
+#define VTERM_BUFSIZE   16
+
+typedef struct VIOsPAPRVTYDevice {
+    VIOsPAPRDevice sdev;
+    CharDriverState *chardev;
+    uint32_t in, out;
+    uint8_t buf[VTERM_BUFSIZE];
+} VIOsPAPRVTYDevice;
+
+static int vty_can_receive(void *opaque)
+{
+    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)opaque;
+
+    return (dev->in - dev->out) < VTERM_BUFSIZE;
+}
+
+static void vty_receive(void *opaque, const uint8_t *buf, int size)
+{
+    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)opaque;
+    int i;
+
+    for (i = 0; i < size; i++) {
+        assert((dev->in - dev->out) < VTERM_BUFSIZE);
+        dev->buf[dev->in++ % VTERM_BUFSIZE] = buf[i];
+    }
+}
+
+static int vty_getchars(VIOsPAPRDevice *sdev, uint8_t *buf, int max)
+{
+    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
+    int n = 0;
+
+    while ((n < max) && (dev->out != dev->in))
+        buf[n++] = dev->buf[dev->out++ % VTERM_BUFSIZE];
+
+    return n;
+}
+
+void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len)
+{
+    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
+
+    /* FIXME: should check the qemu_chr_write() return value */
+    qemu_chr_write(dev->chardev, buf, len);
+}
+
+static int spapr_vty_init(VIOsPAPRDevice *sdev)
+{
+    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
+
+    qemu_chr_add_handlers(dev->chardev, vty_can_receive,
+                          vty_receive, NULL, dev);
+
+    return 0;
+}
+
+static target_ulong h_put_term_char(CPUState *env, sPAPREnvironment *spapr,
+                                    target_ulong opcode, target_ulong *args)
+{
+    target_ulong reg = args[0];
+    target_ulong len = args[1];
+    target_ulong char0_7 = args[2];
+    target_ulong char8_15 = args[3];
+    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
+    uint8_t buf[16];
+
+    if (!sdev)
+        return H_PARAMETER;
+
+    if (len > 16)
+        return H_PARAMETER;
+
+    *((uint64_t *)buf) = cpu_to_be64(char0_7);
+    *((uint64_t *)buf + 1) = cpu_to_be64(char8_15);
+
+    vty_putchars(sdev, buf, len);
+
+    return H_SUCCESS;
+}
+
+static target_ulong h_get_term_char(CPUState *env, sPAPREnvironment *spapr,
+                                    target_ulong opcode, target_ulong *args)
+{
+    target_ulong reg = args[0];
+    target_ulong *len = args + 0;
+    target_ulong *char0_7 = args + 1;
+    target_ulong *char8_15 = args + 2;
+    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
+    uint8_t buf[16];
+
+    if (!sdev)
+        return H_PARAMETER;
+
+    *len = vty_getchars(sdev, buf, sizeof(buf));
+    if (*len < 16)
+        memset(buf + *len, 0, 16 - *len);
+
+    *char0_7 = be64_to_cpu(*((uint64_t *)buf));
+    *char8_15 = be64_to_cpu(*((uint64_t *)buf + 1));
+
+    return H_SUCCESS;
+}
+
+void spapr_vty_create(VIOsPAPRBus *bus,
+                      uint32_t reg, CharDriverState *chardev)
+{
+    DeviceState *dev;
+
+    dev = qdev_create(&bus->bus, "spapr-vty");
+    qdev_prop_set_uint32(dev, "reg", reg);
+    qdev_prop_set_chr(dev, "chardev", chardev);
+    qdev_init_nofail(dev);
+}
+
+static void vty_hcalls(VIOsPAPRBus *bus)
+{
+    spapr_register_hypercall(H_PUT_TERM_CHAR, h_put_term_char);
+    spapr_register_hypercall(H_GET_TERM_CHAR, h_get_term_char);
+}
+
+static VIOsPAPRDeviceInfo spapr_vty = {
+    .init = spapr_vty_init,
+    .dt_name = "vty",
+    .dt_type = "serial",
+    .dt_compatible = "hvterm1",
+    .hcalls = vty_hcalls,
+    .qdev.name = "spapr-vty",
+    .qdev.size = sizeof(VIOsPAPRVTYDevice),
+    .qdev.props = (Property[]) {
+        DEFINE_PROP_UINT32("reg", VIOsPAPRDevice, reg, 0),
+        DEFINE_PROP_CHR("chardev", VIOsPAPRVTYDevice, chardev),
+        DEFINE_PROP_END_OF_LIST(),
+    },
+};
+
+static void spapr_vty_register(void)
+{
+    spapr_vio_bus_register_withprop(&spapr_vty);
+}
+device_init(spapr_vty_register);
-- 
1.7.1
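
The VTY device above buffers guest input in a 16-byte ring indexed by free-running `in`/`out` counters. H_GET_TERM_CHAR then returns up to 16 characters packed big-endian into two 64-bit registers. The following is a self-contained sketch of that data path, assuming nothing from QEMU: the `sketch_*` names and `SketchVTY` are hypothetical stand-ins for `VIOsPAPRVTYDevice`, `vty_getchars()` and `h_get_term_char()`, with no chardev or qdev plumbing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VTERM_BUFSIZE 16   /* same size as in spapr_vty.c */

/* Hypothetical stand-in for VIOsPAPRVTYDevice without qdev/chardev parts. */
typedef struct {
    uint32_t in, out;              /* free-running producer/consumer counters */
    uint8_t buf[VTERM_BUFSIZE];
} SketchVTY;

static int sketch_can_receive(const SketchVTY *d)
{
    return (d->in - d->out) < VTERM_BUFSIZE;   /* fill level, wrap-safe */
}

static void sketch_receive(SketchVTY *d, uint8_t c)
{
    assert(sketch_can_receive(d));             /* caller must check first */
    d->buf[d->in++ % VTERM_BUFSIZE] = c;
}

static int sketch_getchars(SketchVTY *d, uint8_t *out, int max)
{
    int n = 0;

    while (n < max && d->out != d->in) {
        out[n++] = d->buf[d->out++ % VTERM_BUFSIZE];
    }
    return n;
}

/* Portable equivalent of be64_to_cpu(*(uint64_t *)p): byte 0 is the MSB. */
static uint64_t sketch_load_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Model of h_get_term_char(): drain up to 16 bytes, zero-pad, pack. */
static int sketch_get_term_char(SketchVTY *d, uint64_t *char0_7,
                                uint64_t *char8_15)
{
    uint8_t tmp[16];
    int len = sketch_getchars(d, tmp, (int)sizeof(tmp));

    memset(tmp + len, 0, sizeof(tmp) - (size_t)len);
    *char0_7 = sketch_load_be64(tmp);
    *char8_15 = sketch_load_be64(tmp + 8);
    return len;
}
```

Because VTERM_BUFSIZE (16) divides 2^32, the `uint32_t` counters can wrap around without breaking the `% VTERM_BUFSIZE` indexing. A buffer size that is not a power of two would lose that property.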

* [Qemu-devel] [PATCH 15/26] Virtual hash page table handling on pSeries machine
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (13 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 14/26] Implement the bus structure for PAPR virtual IO David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16 15:03   ` [Qemu-devel] " Alexander Graf
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 16/26] Implement hcall based RTAS for pSeries machines David Gibson
                   ` (10 subsequent siblings)
  25 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

On pSeries logical partitions, except for the old POWER4-style full-system
partitions, the guest does not have direct access to the hardware page
table.  Instead, the page table lives in hypervisor memory, and the guest
must manipulate it with hypercalls.

However, our current pSeries emulation more closely resembles the old
style, in which the guest must set up and manage the page tables itself.
This patch converts it to behave like a modern partition.

This involves two things.  First, the hash translation path is modified to
permit the hash table to be stored outside the emulated machine's RAM.  The
pSeries machine init code configures the CPUs to use this mode.

Second, we emulate the PAPR hypercalls for manipulating the external
hashed page table (H_ENTER, H_REMOVE and H_PROTECT).
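
This is a worked illustration of how the external table is sized and searched. In the patch, `pteg_shift = 17` and each PTEG holds 8 HPTEs of 16 bytes (128 = 2^7 bytes), so the table is `1 << (17 + 7)` = 16 MiB. The following is a minimal sketch, assuming a simplified model of the non-H_EXACT path of `h_enter()`. The `sketch_*` names, `SketchHPTE` and the status codes are hypothetical. Page-size and WIMG checks and locking are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Layout constants from the patch: 8 HPTEs of 16 bytes per PTEG. */
#define HPTES_PER_GROUP   8
#define HASH_PTE_SIZE_64  16
#define HPTE_V_VALID      0x0000000000000001ULL
#define HPTE_V_HVLOCK     0x40ULL

/* Sketch-only status codes; these are NOT the PAPR H_* values. */
enum { SK_SUCCESS = 0, SK_PARAMETER = 1, SK_PTEG_FULL = 2 };

typedef struct { uint64_t v, r; } SketchHPTE;   /* one 16-byte HPTE */

/* Hypothetical: size in bytes of a table of 2^pteg_shift 128-byte PTEGs. */
static uint64_t sketch_htab_size(unsigned pteg_shift)
{
    assert(pteg_shift + 7 < 64);
    return 1ULL << (pteg_shift + 7);
}

/* Simplified model of h_enter() without H_EXACT. */
static int sketch_enter(SketchHPTE *htab, uint64_t htab_mask,
                        uint64_t pte_index, uint64_t pteh, uint64_t ptel,
                        uint64_t *slot_out)
{
    /* Same bounds test as the patch: the byte offset must fit the mask. */
    if ((pte_index * HASH_PTE_SIZE_64) & ~htab_mask) {
        return SK_PARAMETER;
    }
    pte_index &= ~(uint64_t)(HPTES_PER_GROUP - 1);   /* start of the PTEG */
    for (unsigned i = 0; i < HPTES_PER_GROUP; i++) {
        SketchHPTE *hpte = &htab[pte_index + i];
        if (!(hpte->v & (HPTE_V_VALID | HPTE_V_HVLOCK))) {
            hpte->r = ptel;   /* second doubleword first ... */
            hpte->v = pteh;   /* ... so the valid bit is written last */
            *slot_out = pte_index + i;
            return SK_SUCCESS;
        }
    }
    return SK_PTEG_FULL;
}
```

The same `pteg_shift + 7` value (24, the log2 of the table size in bytes) is what the patch passes to `spapr_create_fdt()` as `hash_shift` for the "ibm,pft-size" property.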

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 hw/spapr.c          |   32 ++++++-
 hw/spapr_hcall.c    |  247 +++++++++++++++++++++++++++++++++++++++++++++++++++
 target-ppc/cpu.h    |    2 +
 target-ppc/helper.c |   36 ++++++--
 4 files changed, 305 insertions(+), 12 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 25e4a9e..c3d9286 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -50,12 +50,15 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
                               sPAPREnvironment *spapr,
                               target_phys_addr_t initrd_base,
                               target_phys_addr_t initrd_size,
-                              const char *kernel_cmdline)
+                              const char *kernel_cmdline,
+                              long hash_shift)
 {
     void *fdt;
     uint64_t mem_reg_property[] = { 0, cpu_to_be64(ramsize) };
     uint32_t start_prop = cpu_to_be32(initrd_base);
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
+    uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
+    char hypertas_prop[] = "hcall-pft\0hcall-term";
     int i;
     char *modelname;
     int ret;
@@ -138,6 +141,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
          * full emu, for kvm we should copy it from the host */
         _FDT((fdt_property_cell(fdt, "clock-frequency", 1000000000)));
         _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr)));
+        _FDT((fdt_property(fdt, "ibm,pft-size", pft_size_prop, sizeof(pft_size_prop))));
         _FDT((fdt_property_string(fdt, "status", "okay")));
         _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
 
@@ -153,6 +157,14 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
 
     _FDT((fdt_end_node(fdt)));
 
+    /* RTAS */
+    _FDT((fdt_begin_node(fdt, "rtas")));
+
+    _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
+                       sizeof(hypertas_prop))));
+    
+    _FDT((fdt_end_node(fdt)));
+
     /* vdevice */
     _FDT((fdt_begin_node(fdt, "vdevice")));
 
@@ -203,12 +215,13 @@ static void ppc_spapr_init(ram_addr_t ram_size,
                            const char *cpu_model)
 {
     CPUState *envs[MAX_CPUS];
-    void *fdt;
+    void *fdt, *htab;
     int i;
     ram_addr_t ram_offset;
     target_phys_addr_t fdt_addr;
     uint32_t kernel_base, initrd_base;
-    long kernel_size, initrd_size;
+    long kernel_size, initrd_size, htab_size;
+    long pteg_shift = 17;
     int fdt_size;
     sPAPREnvironment *spapr;
 
@@ -248,6 +261,16 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
     cpu_register_physical_memory(0, ram_size, ram_offset);
 
+    /* allocate hash page table */
+    htab_size = 1ULL << (pteg_shift + 7);
+    htab = qemu_mallocz(htab_size);
+
+    for (i = 0; i < smp_cpus; i++) {
+        envs[i]->external_htab = htab;
+        envs[i]->htab_base = -1;
+        envs[i]->htab_mask = htab_size - 1;
+    }
+
     spapr->vio_bus = spapr_vio_bus_init();
 
     for (i = 0; i < MAX_SERIAL_PORTS; i++) {
@@ -293,7 +316,8 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 
     /* Prepare the device tree */
     fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, envs, spapr,
-                           initrd_base, initrd_size, kernel_cmdline);
+                           initrd_base, initrd_size, kernel_cmdline,
+                           pteg_shift + 7);
     if (!fdt) {
         hw_error("Couldn't create pSeries device tree\n");
         exit(1);
diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
index 6ddac00..2b14000 100644
--- a/hw/spapr_hcall.c
+++ b/hw/spapr_hcall.c
@@ -1,8 +1,246 @@
 #include "sysemu.h"
 #include "cpu.h"
 #include "qemu-char.h"
+#include "sysemu.h"
+#include "qemu-char.h"
+#include "exec-all.h"
 #include "hw/spapr.h"
 
+#define HPTES_PER_GROUP 8
+
+#define HPTE_V_SSIZE_SHIFT      62
+#define HPTE_V_AVPN_SHIFT       7
+#define HPTE_V_AVPN             0x3fffffffffffff80ULL
+#define HPTE_V_AVPN_VAL(x)      (((x) & HPTE_V_AVPN) >> HPTE_V_AVPN_SHIFT)
+#define HPTE_V_COMPARE(x,y)     (!(((x) ^ (y)) & 0xffffffffffffff80UL))
+#define HPTE_V_BOLTED           0x0000000000000010ULL
+#define HPTE_V_LOCK             0x0000000000000008ULL
+#define HPTE_V_LARGE            0x0000000000000004ULL
+#define HPTE_V_SECONDARY        0x0000000000000002ULL
+#define HPTE_V_VALID            0x0000000000000001ULL
+
+#define HPTE_R_PP0              0x8000000000000000ULL
+#define HPTE_R_TS               0x4000000000000000ULL
+#define HPTE_R_KEY_HI           0x3000000000000000ULL
+#define HPTE_R_RPN_SHIFT        12
+#define HPTE_R_RPN              0x3ffffffffffff000ULL
+#define HPTE_R_FLAGS            0x00000000000003ffULL
+#define HPTE_R_PP               0x0000000000000003ULL
+#define HPTE_R_N                0x0000000000000004ULL
+#define HPTE_R_G                0x0000000000000008ULL
+#define HPTE_R_M                0x0000000000000010ULL
+#define HPTE_R_I                0x0000000000000020ULL
+#define HPTE_R_W                0x0000000000000040ULL
+#define HPTE_R_WIMG             0x0000000000000078ULL
+#define HPTE_R_C                0x0000000000000080ULL
+#define HPTE_R_R                0x0000000000000100ULL
+#define HPTE_R_KEY_LO           0x0000000000000e00ULL
+
+#define HPTE_V_1TB_SEG          0x4000000000000000ULL
+#define HPTE_V_VRMA_MASK        0x4001ffffff000000ULL
+
+#define HPTE_V_HVLOCK           0x40ULL
+
+static inline int lock_hpte(void *hpte, target_ulong bits)
+{
+    uint64_t pteh;
+
+    pteh = ldq_p(hpte);
+
+    /* FIXME: probably need some sort of lockage for SMP */
+    if (pteh & bits) {
+        return 0;
+    }
+    stq_p(hpte, pteh | HPTE_V_HVLOCK);
+    return 1;
+}
+
+static target_ulong compute_tlbie_rb(target_ulong v, target_ulong r,
+                                     target_ulong pte_index)
+{
+    target_ulong rb, va_low;
+
+    rb = (v & ~0x7fULL) << 16; /* AVA field */
+    va_low = pte_index >> 3;
+    if (v & HPTE_V_SECONDARY)
+        va_low = ~va_low;
+    /* xor vsid from AVA */
+    if (!(v & HPTE_V_1TB_SEG))
+        va_low ^= v >> 12;
+    else
+        va_low ^= v >> 24;
+    va_low &= 0x7ff;
+    if (v & HPTE_V_LARGE) {
+        rb |= 1;                         /* L field */
+#if 0 /* Disable that P7 specific bit for now */
+        if (r & 0xff000) {
+            /* non-16MB large page, must be 64k */
+            /* (masks depend on page size) */
+            rb |= 0x1000;                /* page encoding in LP field */
+            rb |= (va_low & 0x7f) << 16; /* 7b of VA in AVA/LP field */
+            rb |= (va_low & 0xfe);       /* AVAL field */
+        }
+#endif
+    } else {
+        /* 4kB page */
+        rb |= (va_low & 0x7ff) << 12;   /* remaining 11b of AVA */
+    }
+    rb |= (v >> 54) & 0x300;            /* B field */
+    return rb;
+}
+
+static target_ulong h_enter(CPUState *env, sPAPREnvironment *spapr,
+                            target_ulong opcode, target_ulong *args)
+{
+    target_ulong flags = args[0];
+    target_ulong pte_index = args[1];
+    target_ulong pteh = args[2];
+    target_ulong ptel = args[3];
+    target_ulong porder;
+    target_ulong i, pa;
+    uint8_t *hpte;
+
+    /* only handle 4k and 16M pages for now */
+    porder = 12;
+    if (pteh & HPTE_V_LARGE) {
+        if ((ptel & 0xf000) == 0x1000) {
+            /* 64k page */
+            porder = 16;
+        } else if ((ptel & 0xff000) == 0) {
+            /* 16M page */
+            porder = 24;
+            /* lowest AVA bit must be 0 for 16M pages */
+            if (pteh & 0x80)
+                return H_PARAMETER;
+        } else {
+            return H_PARAMETER;
+        }
+    }
+
+    pa = ptel & HPTE_R_RPN;
+    /* FIXME: bounds check the pa? */
+
+    /* Check WIMG */
+    if ((ptel & HPTE_R_WIMG) != HPTE_R_M)
+        return H_PARAMETER;
+    pteh &= ~0x60ULL;
+
+    if ((pte_index * HASH_PTE_SIZE_64) & ~env->htab_mask)
+        return H_PARAMETER;
+    if (likely((flags & H_EXACT) == 0)) {
+        pte_index &= ~7ULL;
+        hpte = env->external_htab + (pte_index * HASH_PTE_SIZE_64);
+        for (i = 0; ; ++i) {
+            if (i == 8)
+                return H_PTEG_FULL;
+            if (((ldq_p(hpte) & HPTE_V_VALID) == 0) &&
+                lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID)) {
+                break;
+            }
+            hpte += HASH_PTE_SIZE_64;
+        }
+    } else {
+        i = 0;
+        hpte = env->external_htab + (pte_index * HASH_PTE_SIZE_64);
+        if (!lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID)) {
+            return H_PTEG_FULL;
+        }
+    }
+    stq_p(hpte + (HASH_PTE_SIZE_64/2), ptel);
+    /* eieio();  FIXME: need some sort of barrier for smp? */
+    stq_p(hpte, pteh);
+
+    assert (!(ldq_p(hpte) & HPTE_V_HVLOCK));
+    args[0] = pte_index + i;
+    return H_SUCCESS;
+}
+
+static target_ulong h_remove(CPUState *env, sPAPREnvironment *spapr,
+                             target_ulong opcode, target_ulong *args)
+{
+    target_ulong flags = args[0];
+    target_ulong pte_index = args[1];
+    target_ulong avpn = args[2];
+    uint8_t *hpte;
+    target_ulong v, r, rb;
+
+    if ((pte_index * HASH_PTE_SIZE_64) & ~env->htab_mask) {
+        return H_PARAMETER;
+    }
+
+    hpte = env->external_htab + (pte_index * HASH_PTE_SIZE_64);
+    while (!lock_hpte(hpte, HPTE_V_HVLOCK)) {
+        /* We have no real concurrency in qemu soft-emulation, so we
+         * will never actually have a contested lock */
+        assert(0);
+    }
+
+    v = ldq_p(hpte);
+    r = ldq_p(hpte + (HASH_PTE_SIZE_64/2));
+
+    if ((v & HPTE_V_VALID) == 0 ||
+        ((flags & H_AVPN) && (v & ~0x7fULL) != avpn) ||
+        ((flags & H_ANDCOND) && (v & avpn) != 0)) {
+        stq_p(hpte, v & ~HPTE_V_HVLOCK);
+        assert (!(ldq_p(hpte) & HPTE_V_HVLOCK));
+        return H_NOT_FOUND;
+    }
+    args[0] = v & ~HPTE_V_HVLOCK;
+    args[1] = r;
+    stq_p(hpte, 0);
+    rb = compute_tlbie_rb(v, r, pte_index);
+//    ppc_tlb_invalidate_one(env, rb);
+    tlb_flush(env, 1);
+    assert (!(ldq_p(hpte) & HPTE_V_HVLOCK));
+    return H_SUCCESS;
+}
+
+static target_ulong h_protect(CPUState *env, sPAPREnvironment *spapr,
+                              target_ulong opcode, target_ulong *args)
+{
+    target_ulong flags = args[0];
+    target_ulong pte_index = args[1];
+    target_ulong avpn = args[2];
+    uint8_t *hpte;
+    target_ulong v, r, rb;
+
+    if ((pte_index * HASH_PTE_SIZE_64) & ~env->htab_mask) {
+        return H_PARAMETER;
+    }
+
+    hpte = env->external_htab + (pte_index * HASH_PTE_SIZE_64);
+    while (!lock_hpte(hpte, HPTE_V_HVLOCK)) {
+        /* We have no real concurrency in qemu soft-emulation, so we
+         * will never actually have a contested lock */
+        assert(0);
+    }
+
+    v = ldq_p(hpte);
+    r = ldq_p(hpte + (HASH_PTE_SIZE_64/2));
+
+    if ((v & HPTE_V_VALID) == 0 ||
+        ((flags & H_AVPN) && (v & ~0x7fULL) != avpn)) {
+        stq_p(hpte, v & ~HPTE_V_HVLOCK);
+        assert (!(ldq_p(hpte) & HPTE_V_HVLOCK));
+        return H_NOT_FOUND;
+    }
+
+    r &= ~(HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
+           HPTE_R_KEY_HI | HPTE_R_KEY_LO);
+    r |= (flags << 55) & HPTE_R_PP0;
+    r |= (flags << 48) & HPTE_R_KEY_HI;
+    r |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
+    rb = compute_tlbie_rb(v, r, pte_index);
+    stq_p(hpte, v & ~HPTE_V_VALID);
+    //ppc_tlb_invalidate_one(env, rb);
+    tlb_flush(env, 1);
+    stq_p(hpte + (HASH_PTE_SIZE_64/2), r);
+    /* eieio(); FIXME: need some sort of barrier on smp? */
+    stq_p(hpte, v & ~HPTE_V_HVLOCK);
+    assert (!(ldq_p(hpte) & HPTE_V_HVLOCK));
+    return H_SUCCESS;
+}
+
 struct hypercall {
     spapr_hcall_fn fn;
 } hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
@@ -41,3 +279,12 @@ target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
     fprintf(stderr, "Unimplemented hcall 0x" TARGET_FMT_lx "\n", opcode);
     return H_FUNCTION;
 }
+
+static void hypercall_init(void)
+{
+    /* hcall-pft */
+    spapr_register_hypercall(H_ENTER, h_enter);
+    spapr_register_hypercall(H_REMOVE, h_remove);
+    spapr_register_hypercall(H_PROTECT, h_protect);
+}
+device_init(hypercall_init);
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 3a47d11..29d6b49 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -670,6 +670,8 @@ struct CPUPPCState {
     target_phys_addr_t htab_base;
     target_phys_addr_t htab_mask;
     target_ulong sr[32];
+    /* externally stored hash table */
+    uint8_t *external_htab;
     /* BATs */
     int nb_BATs;
     target_ulong DBAT[2][8];
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 13a5ab1..5ead62f 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -585,8 +585,13 @@ static inline int _find_pte(CPUState *env, mmu_ctx_t *ctx, int is_64b, int h,
     for (i = 0; i < 8; i++) {
 #if defined(TARGET_PPC64)
         if (is_64b) {
-            pte0 = ldq_phys(env->htab_base + pteg_off + (i * 16));
-            pte1 = ldq_phys(env->htab_base + pteg_off + (i * 16) + 8);
+            if (env->external_htab) {
+                pte0 = ldq_p(env->external_htab + pteg_off + (i * 16));
+                pte1 = ldq_p(env->external_htab + pteg_off + (i * 16) + 8);
+            } else {
+                pte0 = ldq_phys(env->htab_base + pteg_off + (i * 16));
+                pte1 = ldq_phys(env->htab_base + pteg_off + (i * 16) + 8);
+            }
 
             /* We have a TLB that saves 4K pages, so let's
              * split a huge page to 4k chunks */
@@ -602,8 +607,13 @@ static inline int _find_pte(CPUState *env, mmu_ctx_t *ctx, int is_64b, int h,
         } else
 #endif
         {
-            pte0 = ldl_phys(env->htab_base + pteg_off + (i * 8));
-            pte1 =  ldl_phys(env->htab_base + pteg_off + (i * 8) + 4);
+            if (env->external_htab) {
+                pte0 = ldl_p(env->external_htab + pteg_off + (i * 8));
+                pte1 = ldl_p(env->external_htab + pteg_off + (i * 8) + 4);
+            } else {
+                pte0 = ldl_phys(env->htab_base + pteg_off + (i * 8));
+                pte1 = ldl_phys(env->htab_base + pteg_off + (i * 8) + 4);
+            }
             r = pte32_check(ctx, pte0, pte1, h, rw, type);
             LOG_MMU("Load pte from " TARGET_FMT_lx " => " TARGET_FMT_lx " "
                     TARGET_FMT_lx " %d %d %d " TARGET_FMT_lx "\n",
@@ -643,13 +653,23 @@ static inline int _find_pte(CPUState *env, mmu_ctx_t *ctx, int is_64b, int h,
         if (pte_update_flags(ctx, &pte1, ret, rw) == 1) {
 #if defined(TARGET_PPC64)
             if (is_64b) {
-                stq_phys_notdirty(env->htab_base + pteg_off + (good * 16) + 8,
-                                  pte1);
+                if (env->external_htab) {
+                    stq_p(env->external_htab + pteg_off + (good * 16) + 8,
+                          pte1);
+                } else {
+                    stq_phys_notdirty(env->htab_base + pteg_off +
+                                      (good * 16) + 8, pte1);
+                }
             } else
 #endif
             {
-                stl_phys_notdirty(env->htab_base + pteg_off + (good * 8) + 4,
-                                  pte1);
+                if (env->external_htab) {
+                    stl_p(env->external_htab + pteg_off + (good * 8) + 4,
+                          pte1);
+                } else {
+                    stl_phys_notdirty(env->htab_base + pteg_off +
+                                      (good * 8) + 4, pte1);
+                }
             }
         }
     }
-- 
1.7.1
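
The patch registers H_ENTER, H_REMOVE and H_PROTECT into `hypercall_table` from a `device_init()` hook. That table is declared with `(MAX_HCALL_OPCODE / 4) + 1` slots, which suggests one slot per 4-aligned opcode. The registration and dispatch functions come from an earlier patch in the series and are not shown here. What follows is a hedged sketch of such an opcode-indexed table. All `sketch_*` names, the handler signature and the "unimplemented" value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_HCALL_OPCODE 0x2D4            /* H_GET_MPP in hw/spapr.h */
#define SKETCH_H_FUNCTION       ((uint64_t)-2)   /* placeholder "unimplemented" */

typedef uint64_t (*sketch_hcall_fn)(uint64_t *args);

static sketch_hcall_fn sketch_hcall_table[SKETCH_MAX_HCALL_OPCODE / 4 + 1];

static void sketch_register_hypercall(uint64_t opcode, sketch_hcall_fn fn)
{
    assert(opcode <= SKETCH_MAX_HCALL_OPCODE);
    assert((opcode & 0x3) == 0);   /* one slot per 4-aligned opcode */
    sketch_hcall_table[opcode / 4] = fn;
}

static uint64_t sketch_hypercall(uint64_t opcode, uint64_t *args)
{
    if (opcode <= SKETCH_MAX_HCALL_OPCODE && (opcode & 0x3) == 0 &&
        sketch_hcall_table[opcode / 4] != NULL) {
        return sketch_hcall_table[opcode / 4](args);
    }
    return SKETCH_H_FUNCTION;      /* unknown or unregistered opcode */
}

/* Hypothetical handler used only to exercise the table. */
static uint64_t sketch_h_echo(uint64_t *args)
{
    return args[0];
}
```

An opcode far outside the table, such as H_RTAS (0x72746173, the ASCII bytes "rtas") defined in the next patch, cannot be looked up this way. The sketch simply rejects it; real code would need a separate check for it.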

* [Qemu-devel] [PATCH 16/26] Implement hcall based RTAS for pSeries machines
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (14 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 15/26] Virtual hash page table handling on pSeries machine David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16 15:08   ` [Qemu-devel] " Alexander Graf
  2011-03-16 22:08   ` [Qemu-devel] " Anthony Liguori
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 17/26] Implement assorted pSeries hcalls and RTAS methods David Gibson
                   ` (9 subsequent siblings)
  25 siblings, 2 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

On pSeries machines, operating systems can instantiate "RTAS" (Run-Time
Abstraction Services), a runtime component of the firmware which implements
a number of low-level, infrequently used operations.  On logical partitions
under a hypervisor, many of the RTAS functions require hypervisor
privilege.  For simplicity, therefore, hypervisor systems typically
implement the in-partition RTAS as just a tiny wrapper around a hypercall
which actually implements the various RTAS functions.

This patch implements such a hypercall-based RTAS for our emulated pSeries
machine.  A tiny in-partition "firmware" blob (pc-bios/spapr-rtas.bin)
calls a new hypercall, H_RTAS, which looks up the requested RTAS service
in a table.
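
The table itself lives in hw/spapr_rtas.c, which is not part of this excerpt. As a hedged illustration of the idea, here is a minimal token-indexed service table. Each registered service gets a token, and the caller passes that token with its argument and return buffers. The dispatcher bounds-checks the token before calling the handler. In PAPR the guest discovers each service's token from a property of the same name in the /rtas device-tree node. The token base, table size, handler signature and return convention below are assumptions for this sketch, not the patch's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TOKEN_BASE 0x2000u  /* hypothetical first token */
#define SKETCH_TOKEN_MAX  16       /* hypothetical table size */

/* Hypothetical handler signature: cell arrays in, cell arrays out. */
typedef void (*sketch_rtas_fn)(uint32_t nargs, const uint32_t *args,
                               uint32_t nret, uint32_t *rets);

static struct {
    const char *name;   /* would be advertised as a property under /rtas */
    sketch_rtas_fn fn;
} sketch_rtas_table[SKETCH_TOKEN_MAX];

static uint32_t sketch_rtas_count;

/* Returns the token for the new service, or -1 if the table is full. */
static long sketch_rtas_register(const char *name, sketch_rtas_fn fn)
{
    assert(name != NULL && fn != NULL);
    if (sketch_rtas_count >= SKETCH_TOKEN_MAX) {
        return -1;
    }
    sketch_rtas_table[sketch_rtas_count].name = name;
    sketch_rtas_table[sketch_rtas_count].fn = fn;
    return (long)(SKETCH_TOKEN_BASE + sketch_rtas_count++);
}

/* What an H_RTAS handler might do once it has read the token and the
 * argument/return buffers from guest memory: 0 on success, -1 for an
 * unknown token (sketch convention, not a PAPR status code). */
static int sketch_rtas_call(uint32_t token, uint32_t nargs,
                            const uint32_t *args, uint32_t nret,
                            uint32_t *rets)
{
    if (token < SKETCH_TOKEN_BASE ||
        token - SKETCH_TOKEN_BASE >= sketch_rtas_count) {
        return -1;
    }
    sketch_rtas_table[token - SKETCH_TOKEN_BASE].fn(nargs, args, nret, rets);
    return 0;
}

/* Hypothetical service: returns the sum of its arguments in rets[0]. */
static void sketch_rtas_sum(uint32_t nargs, const uint32_t *args,
                            uint32_t nret, uint32_t *rets)
{
    uint32_t sum = 0;

    for (uint32_t i = 0; i < nargs; i++) {
        sum += args[i];
    }
    if (nret > 0) {
        rets[0] = sum;
    }
}
```

Keeping the in-partition blob trivial and doing the lookup on the hypervisor side means new RTAS services need only a table entry, not new firmware.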

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 Makefile               |    3 +-
 Makefile.target        |    2 +-
 hw/spapr.c             |   27 +++++++++++--
 hw/spapr.h             |   21 ++++++++++
 hw/spapr_hcall.c       |   15 +++++++
 hw/spapr_rtas.c        |  104 ++++++++++++++++++++++++++++++++++++++++++++++++
 pc-bios/spapr-rtas.bin |  Bin 0 -> 20 bytes
 7 files changed, 166 insertions(+), 6 deletions(-)
 create mode 100644 hw/spapr_rtas.c
 create mode 100644 pc-bios/spapr-rtas.bin

diff --git a/Makefile b/Makefile
index eca4c76..fc4bd24 100644
--- a/Makefile
+++ b/Makefile
@@ -213,7 +213,8 @@ pxe-ne2k_pci.bin pxe-pcnet.bin \
 pxe-rtl8139.bin pxe-virtio.bin \
 bamboo.dtb petalogix-s3adsp1800.dtb \
 multiboot.bin linuxboot.bin \
-s390-zipl.rom
+s390-zipl.rom \
+spapr-rtas.bin
 else
 BLOBS=
 endif
diff --git a/Makefile.target b/Makefile.target
index 3f2b235..e333225 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -232,7 +232,7 @@ obj-ppc-y += ppc_oldworld.o
 # NewWorld PowerMac
 obj-ppc-y += ppc_newworld.o
 # IBM pSeries (sPAPR)
-obj-ppc-y += spapr.o spapr_hcall.o spapr_vio.o
+obj-ppc-y += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
 obj-ppc-y += spapr_vty.o
 # PowerPC 4xx boards
 obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
diff --git a/hw/spapr.c b/hw/spapr.c
index c3d9286..f41451b 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -40,6 +40,7 @@
 #define KERNEL_LOAD_ADDR        0x00000000
 #define INITRD_LOAD_ADDR        0x02800000
 #define FDT_MAX_SIZE            0x10000
+#define RTAS_MAX_SIZE           0x10000
 
 #define TIMEBASE_FREQ           512000000ULL
 
@@ -51,6 +52,8 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
                               target_phys_addr_t initrd_base,
                               target_phys_addr_t initrd_size,
                               const char *kernel_cmdline,
+                              target_phys_addr_t rtas_addr,
+                              target_phys_addr_t rtas_size,
                               long hash_shift)
 {
     void *fdt;
@@ -162,7 +165,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
 
     _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
                        sizeof(hypertas_prop))));
-    
+
     _FDT((fdt_end_node(fdt)));
 
     /* vdevice */
@@ -186,6 +189,11 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
         fprintf(stderr, "couldn't setup vio devices in fdt\n");
     }
 
+    /* RTAS */
+    ret = spapr_rtas_device_tree_setup(fdt, rtas_addr, rtas_size);
+    if (ret < 0)
+        fprintf(stderr, "Couldn't set up RTAS device tree properties\n");
+
     _FDT((fdt_pack(fdt)));
 
     if (fdt_size) {
@@ -218,12 +226,13 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     void *fdt, *htab;
     int i;
     ram_addr_t ram_offset;
-    target_phys_addr_t fdt_addr;
+    target_phys_addr_t fdt_addr, rtas_addr;
     uint32_t kernel_base, initrd_base;
-    long kernel_size, initrd_size, htab_size;
+    long kernel_size, initrd_size, htab_size, rtas_size;
     long pteg_shift = 17;
     int fdt_size;
     sPAPREnvironment *spapr;
+    char *filename;
 
     spapr = qemu_malloc(sizeof(*spapr));
 
@@ -231,6 +240,8 @@ static void ppc_spapr_init(ram_addr_t ram_size,
      * 2GB, so that it can be processed with 32-bit code if
      * necessary */
     fdt_addr = MIN(ram_size, 0x80000000) - FDT_MAX_SIZE;
+    /* RTAS goes just below that */
+    rtas_addr = fdt_addr - RTAS_MAX_SIZE;
 
     /* init CPUs */
     if (cpu_model == NULL) {
@@ -271,6 +282,14 @@ static void ppc_spapr_init(ram_addr_t ram_size,
         envs[i]->htab_mask = htab_size - 1;
     }
 
+    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
+    rtas_size = load_image_targphys(filename, rtas_addr, ram_size - rtas_addr);
+    if (rtas_size < 0) {
+        hw_error("qemu: could not load LPAR rtas '%s'\n", filename);
+        exit(1);
+    }
+    qemu_free(filename);
+
     spapr->vio_bus = spapr_vio_bus_init();
 
     for (i = 0; i < MAX_SERIAL_PORTS; i++) {
@@ -317,7 +336,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     /* Prepare the device tree */
     fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, envs, spapr,
                            initrd_base, initrd_size, kernel_cmdline,
-                           pteg_shift + 7);
+                           rtas_addr, rtas_size, pteg_shift + 7);
     if (!fdt) {
         hw_error("Couldn't create pSeries device tree\n");
         exit(1);
diff --git a/hw/spapr.h b/hw/spapr.h
index 47bf2ef..7a7c319 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -237,6 +237,8 @@ typedef struct sPAPREnvironment {
 #define H_GET_MPP               0x2D4
 #define MAX_HCALL_OPCODE        H_GET_MPP
 
+#define H_RTAS                  0x72746173
+
 typedef target_ulong (*spapr_hcall_fn)(CPUState *env, sPAPREnvironment *spapr,
                                        target_ulong opcode,
                                        target_ulong *args);
@@ -245,5 +247,24 @@ void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn);
 target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
                              target_ulong opcode, target_ulong *args);
 
+static inline uint32_t rtas_ld(target_ulong phys, int n)
+{
+    return ldl_phys(phys + 4*n);
+}
+
+static inline void rtas_st(target_ulong phys, int n, uint32_t val)
+{
+    stl_phys(phys + 4*n, val);
+}
+
+typedef void (*spapr_rtas_fn)(sPAPREnvironment *spapr, uint32_t token,
+                              uint32_t nargs, target_ulong args,
+                              uint32_t nret, target_ulong rets);
+void spapr_rtas_register(const char *name, spapr_rtas_fn fn);
+target_ulong spapr_rtas_call(sPAPREnvironment *spapr,
+                             uint32_t token, uint32_t nargs, target_ulong args,
+                             uint32_t nret, target_ulong rets);
+int spapr_rtas_device_tree_setup(void *fdt, target_phys_addr_t rtas_addr,
+                                 target_phys_addr_t rtas_size);
 
 #endif /* !defined (__HW_SPAPR_H__) */
diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
index 2b14000..7b8e17c 100644
--- a/hw/spapr_hcall.c
+++ b/hw/spapr_hcall.c
@@ -241,6 +241,16 @@ static target_ulong h_protect(CPUState *env, sPAPREnvironment *spapr,
     return H_SUCCESS;
 }
 
+static target_ulong h_rtas(sPAPREnvironment *spapr, target_ulong rtas_r3)
+{
+    uint32_t token = ldl_phys(rtas_r3);
+    uint32_t nargs = ldl_phys(rtas_r3 + 4);
+    uint32_t nret = ldl_phys(rtas_r3 + 8);
+
+    return spapr_rtas_call(spapr, token, nargs, rtas_r3 + 12,
+                           nret, rtas_r3 + 12 + 4*nargs);
+}
+
 struct hypercall {
     spapr_hcall_fn fn;
 } hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
@@ -276,6 +286,11 @@ target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
             return hc->fn(env, spapr, opcode, args);
     }
 
+    if (opcode == H_RTAS) {
+        /* H_RTAS is a special case outside the normal range */
+        return h_rtas(spapr, args[0]);
+    }
+
     fprintf(stderr, "Unimplemented hcall 0x" TARGET_FMT_lx "\n", opcode);
     return H_FUNCTION;
 }
diff --git a/hw/spapr_rtas.c b/hw/spapr_rtas.c
new file mode 100644
index 0000000..c606018
--- /dev/null
+++ b/hw/spapr_rtas.c
@@ -0,0 +1,104 @@
+#include "cpu.h"
+#include "sysemu.h"
+#include "qemu-char.h"
+#include "hw/qdev.h"
+#include "device_tree.h"
+
+#include "hw/spapr.h"
+#include "hw/spapr_vio.h"
+
+#include <libfdt.h>
+
+#define TOKEN_BASE      0x2000
+#define TOKEN_MAX       0x100
+
+static struct rtas_call {
+    const char *name;
+    spapr_rtas_fn fn;
+} rtas_table[TOKEN_MAX];
+
+struct rtas_call *rtas_next = rtas_table;
+
+target_ulong spapr_rtas_call(sPAPREnvironment *spapr,
+                             uint32_t token, uint32_t nargs, target_ulong args,
+                             uint32_t nret, target_ulong rets)
+{
+    if ((token >= TOKEN_BASE)
+        && ((token - TOKEN_BASE) < TOKEN_MAX)) {
+        struct rtas_call *call = rtas_table + (token - TOKEN_BASE);
+
+        if (call->fn) {
+            call->fn(spapr, token, nargs, args, nret, rets);
+            return H_SUCCESS;
+        }
+    }
+
+    fprintf(stderr, "Unknown RTAS token 0x%x\n", token);
+    rtas_st(rets, 0, -3);
+    return H_PARAMETER;
+}
+
+void spapr_rtas_register(const char *name, spapr_rtas_fn fn)
+{
+    assert(rtas_next < (rtas_table + TOKEN_MAX));
+
+    rtas_next->name = name;
+    rtas_next->fn = fn;
+
+    rtas_next++;
+}
+
+int spapr_rtas_device_tree_setup(void *fdt, target_phys_addr_t rtas_addr,
+                                 target_phys_addr_t rtas_size)
+{
+    int ret;
+    int i;
+
+    ret = fdt_add_mem_rsv(fdt, rtas_addr, rtas_size);
+    if (ret < 0) {
+        fprintf(stderr, "Couldn't add RTAS reserve entry: %s\n",
+                fdt_strerror(ret));
+        return ret;
+    }
+
+    ret = qemu_devtree_setprop_cell(fdt, "/rtas", "linux,rtas-base",
+                                    rtas_addr);
+    if (ret < 0) {
+        fprintf(stderr, "Couldn't add linux,rtas-base property: %s\n",
+                fdt_strerror(ret));
+        return ret;
+    }
+
+    ret = qemu_devtree_setprop_cell(fdt, "/rtas", "linux,rtas-entry",
+                                    rtas_addr);
+    if (ret < 0) {
+        fprintf(stderr, "Couldn't add linux,rtas-entry property: %s\n",
+                fdt_strerror(ret));
+        return ret;
+    }
+
+    ret = qemu_devtree_setprop_cell(fdt, "/rtas", "rtas-size",
+                                    rtas_size);
+    if (ret < 0) {
+        fprintf(stderr, "Couldn't add rtas-size property: %s\n",
+                fdt_strerror(ret));
+        return ret;
+    }
+
+    for (i = 0; i < TOKEN_MAX; i++) {
+        struct rtas_call *call = &rtas_table[i];
+
+        if (!call->fn) {
+            continue;
+        }
+
+        ret = qemu_devtree_setprop_cell(fdt, "/rtas", call->name, i + TOKEN_BASE);
+        if (ret < 0) {
+            fprintf(stderr, "Couldn't add rtas token for %s: %s\n",
+                    call->name, fdt_strerror(ret));
+            return ret;
+        }
+
+    }
+    return 0;
+}
diff --git a/pc-bios/spapr-rtas.bin b/pc-bios/spapr-rtas.bin
new file mode 100644
index 0000000000000000000000000000000000000000..eade9c0e8ff0fd3071e3a6638a11c1a2e9a47152
GIT binary patch
literal 20
bcmb<Pk*=^wC@M)vPAqm|U{LaFU{C-6M#cr<

literal 0
HcmV?d00001

-- 
1.7.1
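
For readers following the RTAS path in this patch: the H_RTAS handler reads a small parameter block from guest memory (token at offset 0, nargs at 4, nret at 8, then nargs argument words, then nret return words) and looks the token up in a table based at TOKEN_BASE. Below is a standalone sketch of that decode. It is not QEMU code. A flat array stands in for guest physical memory, and `guest_mem`, `ld32()`, `st32()` and the `rtas_add` handler are hypothetical names used only for illustration (`rtas_add` is not a PAPR RTAS call).

```c
#include <stdint.h>

/* Hypothetical stand-in for guest physical memory; QEMU itself uses
 * ldl_phys()/stl_phys() here.  Addresses are byte addresses. */
#define GUEST_WORDS 64
static uint32_t guest_mem[GUEST_WORDS];

static uint32_t ld32(uint32_t addr) { return guest_mem[addr / 4]; }
static void st32(uint32_t addr, uint32_t val) { guest_mem[addr / 4] = val; }

/* Same shape as rtas_ld()/rtas_st() in hw/spapr.h: word n of an array */
static uint32_t rtas_ld(uint32_t base, int n) { return ld32(base + 4 * n); }
static void rtas_st(uint32_t base, int n, uint32_t val) { st32(base + 4 * n, val); }

#define TOKEN_BASE       0x2000
#define TOKEN_MAX        0x100
#define RTAS_PARAM_ERROR ((uint32_t)-3)

typedef void (*rtas_fn)(uint32_t nargs, uint32_t args,
                        uint32_t nret, uint32_t rets);
static rtas_fn rtas_table[TOKEN_MAX];

/* Decode the parameter block at 'block':
 *   +0 token, +4 nargs, +8 nret, +12 args[nargs], then rets[nret] */
static int rtas_dispatch(uint32_t block)
{
    uint32_t token = ld32(block);
    uint32_t nargs = ld32(block + 4);
    uint32_t nret = ld32(block + 8);
    uint32_t args = block + 12;
    uint32_t rets = args + 4 * nargs;

    if (token >= TOKEN_BASE && token - TOKEN_BASE < TOKEN_MAX
        && rtas_table[token - TOKEN_BASE]) {
        rtas_table[token - TOKEN_BASE](nargs, args, nret, rets);
        return 0;
    }
    rtas_st(rets, 0, RTAS_PARAM_ERROR);   /* unknown token */
    return -1;
}

/* Hypothetical handler: status 0, then the sum of its two arguments */
static void rtas_add(uint32_t nargs, uint32_t args,
                     uint32_t nret, uint32_t rets)
{
    if (nargs != 2 || nret != 2) {
        rtas_st(rets, 0, RTAS_PARAM_ERROR);
        return;
    }
    rtas_st(rets, 0, 0);
    rtas_st(rets, 1, rtas_ld(args, 0) + rtas_ld(args, 1));
}
```

Putting a handler in slot 0 makes it answer token 0x2000. That mirrors how spapr_rtas_register() hands out tokens in registration order, after which spapr_rtas_device_tree_setup() publishes each name/token pair under /rtas for the guest to look up.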


* [Qemu-devel] [PATCH 17/26] Implement assorted pSeries hcalls and RTAS methods
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (15 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 16/26] Implement hcall based RTAS for pSeries machines David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics) David Gibson
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC
  To: agraf, qemu-devel; +Cc: paulus, anton

This patch adds several small utility hypercalls and RTAS methods to
the pSeries platform emulation.  Specifically:

* 'display-character' rtas call

This just prints a character to the console; it is occasionally used
for early debugging of the OS.  The support includes a hack to make this
RTAS call also respond to token 0xa, the value it has on some real
hardware, since some early debugging tools assume this value without
checking the device tree.

* 'get-time-of-day' rtas call

This one just takes the host real time, converts it to the format
described by PAPR and returns it to the guest.

* 'power-off' rtas call

This one shuts down the emulated system.

* H_DABR hypercall

On pSeries, the DABR debug register is usually a hypervisor resource
and virtualized through this hypercall.  If the hypercall is not
present, Linux will under some circumstances attempt to manipulate the
DABR directly, which will fail on this emulated machine.

This stub implementation is enough to stop that behaviour, although it
doesn't actually implement the requested DABR operations as yet.
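
The 'get-time-of-day' conversion amounts to undoing struct tm's offsets (years counted from 1900, months from 0) and filling eight return words in the order the patch uses. A sketch of that mapping, assuming the C library's gmtime() in place of QEMU's qemu_get_timedate(); papr_time_of_day() is a hypothetical name.

```c
#include <stdint.h>
#include <time.h>

/* Fill the eight return words in the order rtas_get_time_of_day()
 * uses: status, year, month (1-12), day, hour, minute, second,
 * nanoseconds (always 0, as in the patch). */
static void papr_time_of_day(time_t t, uint32_t rets[8])
{
    const struct tm *tm = gmtime(&t);   /* UTC breakdown of t */

    if (!tm) {
        rets[0] = (uint32_t)-1;         /* hardware error: time not representable */
        return;
    }
    rets[0] = 0;                        /* success */
    rets[1] = tm->tm_year + 1900;       /* struct tm counts years from 1900 */
    rets[2] = tm->tm_mon + 1;           /* ... and months from 0 */
    rets[3] = tm->tm_mday;
    rets[4] = tm->tm_hour;
    rets[5] = tm->tm_min;
    rets[6] = tm->tm_sec;
    rets[7] = 0;                        /* no sub-second resolution */
}
```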

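The "ibm,hypertas-functions" property that advertises hcall-dabr is a device-tree string list: NUL-terminated names stored back to back, which is why the patch simply extends the C string literal with "\0hcall-dabr". As we understand it, a guest such as Linux checks this list to decide which hypercalls exist. A sketch of testing for one entry; prop_has_string() is a hypothetical helper, not part of libfdt or Linux.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if 'name' is one of the NUL-terminated strings packed into
 * 'prop'.  'len' counts the final NUL too, which is what
 * sizeof(hypertas_prop) gives for the char array in spapr.c. */
static int prop_has_string(const char *prop, size_t len, const char *name)
{
    size_t off = 0;

    while (off < len) {
        const char *s = prop + off;
        const char *nul = memchr(s, '\0', len - off);

        if (!nul) {
            return 0;                    /* unterminated entry: malformed list */
        }
        if (strcmp(s, name) == 0) {
            return 1;
        }
        off = (size_t)(nul - prop) + 1;  /* step past this string's NUL */
    }
    return 0;
}
```
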
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 hw/spapr.c       |    2 +-
 hw/spapr_hcall.c |   10 ++++++++
 hw/spapr_rtas.c  |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+), 1 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index f41451b..23f493a 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -61,7 +61,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
     uint32_t start_prop = cpu_to_be32(initrd_base);
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
     uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
-    char hypertas_prop[] = "hcall-pft\0hcall-term";
+    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr";
     int i;
     char *modelname;
     int ret;
diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
index 7b8e17c..0ff83c9 100644
--- a/hw/spapr_hcall.c
+++ b/hw/spapr_hcall.c
@@ -241,6 +241,13 @@ static target_ulong h_protect(CPUState *env, sPAPREnvironment *spapr,
     return H_SUCCESS;
 }
 
+static target_ulong h_set_dabr(CPUState *env, sPAPREnvironment *spapr,
+                               target_ulong opcode, target_ulong *args)
+{
+    /* FIXME: actually implement this */
+    return H_HARDWARE;
+}
+
 static target_ulong h_rtas(sPAPREnvironment *spapr, target_ulong rtas_r3)
 {
     uint32_t token = ldl_phys(rtas_r3);
@@ -301,5 +308,8 @@ static void hypercall_init(void)
     spapr_register_hypercall(H_ENTER, h_enter);
     spapr_register_hypercall(H_REMOVE, h_remove);
     spapr_register_hypercall(H_PROTECT, h_protect);
+
+    /* hcall-dabr */
+    spapr_register_hypercall(H_SET_DABR, h_set_dabr);
 }
 device_init(hypercall_init);
diff --git a/hw/spapr_rtas.c b/hw/spapr_rtas.c
index c606018..354f4df 100644
--- a/hw/spapr_rtas.c
+++ b/hw/spapr_rtas.c
@@ -12,6 +12,58 @@
 #define TOKEN_BASE      0x2000
 #define TOKEN_MAX       0x100
 
+static void rtas_display_character(sPAPREnvironment *spapr,
+                                   uint32_t token, uint32_t nargs,
+                                   target_ulong args,
+                                   uint32_t nret, target_ulong rets)
+{
+    uint8_t c = rtas_ld(args, 0);
+    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, 0);
+
+    if (!sdev) {
+        rtas_st(rets, 0, -1);
+    } else {
+        vty_putchars(sdev, &c, sizeof(c));
+        rtas_st(rets, 0, 0);
+    }
+}
+
+static void rtas_get_time_of_day(sPAPREnvironment *spapr,
+                                 uint32_t token, uint32_t nargs,
+                                 target_ulong args,
+                                 uint32_t nret, target_ulong rets)
+{
+    struct tm tm;
+
+    if (nret != 8) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    qemu_get_timedate(&tm, 0);
+
+    rtas_st(rets, 0, 0); /* Success */
+    rtas_st(rets, 1, tm.tm_year + 1900);
+    rtas_st(rets, 2, tm.tm_mon + 1);
+    rtas_st(rets, 3, tm.tm_mday);
+    rtas_st(rets, 4, tm.tm_hour);
+    rtas_st(rets, 5, tm.tm_min);
+    rtas_st(rets, 6, tm.tm_sec);
+    rtas_st(rets, 7, 0); /* we don't do nanoseconds */
+}
+
+static void rtas_power_off(sPAPREnvironment *spapr,
+                           uint32_t token, uint32_t nargs, target_ulong args,
+                           uint32_t nret, target_ulong rets)
+{
+    if (nargs != 2 || nret != 1) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+    qemu_system_shutdown_request();
+    rtas_st(rets, 0, 0);
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -33,6 +85,15 @@ target_ulong spapr_rtas_call(sPAPREnvironment *spapr,
         }
     }
 
+    /* HACK: Some Linux early debug code uses RTAS display-character,
+     * but assumes the token value is 0xa (which it is on some real
+     * machines) without looking it up in the device tree.  This
+     * special case makes this work */
+    if (token == 0xa) {
+        rtas_display_character(spapr, 0xa, nargs, args, nret, rets);
+        return H_SUCCESS;
+    }
+
     fprintf(stderr, "Unknown RTAS token 0x%x\n", token);
     rtas_st(rets, 0, -3);
     return H_PARAMETER;
@@ -102,3 +163,11 @@ int spapr_rtas_device_tree_setup(void *fdt, target_phys_addr_t rtas_addr,
     }
     return 0;
 }
+
+static void register_core_rtas(void)
+{
+    spapr_rtas_register("display-character", rtas_display_character);
+    spapr_rtas_register("get-time-of-day", rtas_get_time_of_day);
+    spapr_rtas_register("power-off", rtas_power_off);
+}
+device_init(register_core_rtas);
-- 
1.7.1


* [Qemu-devel] [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics)
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (16 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 17/26] Implement assorted pSeries hcalls and RTAS methods David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16 15:47   ` [Qemu-devel] " Alexander Graf
  2011-03-16 22:16   ` [Qemu-devel] " Anthony Liguori
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 19/26] Add PAPR H_VIO_SIGNAL hypercall and infrastructure for VIO interrupts David Gibson
                   ` (7 subsequent siblings)
  25 siblings, 2 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC
  To: agraf, qemu-devel; +Cc: paulus, anton

PAPR defines an interrupt control architecture which is logically divided
into ICP (Interrupt Control Presentation, each unit responsible for
presenting interrupts to a particular "interrupt server", i.e. CPU) and
ICS (Interrupt Control Source, each unit responsible for one or more
hardware interrupts as numbered globally across the system).  All PAPR
virtual IO devices expect to deliver interrupts via this mechanism.  In
Linux, this interrupt controller system is handled by the "xics" driver.
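
On the ICP side, each interrupt server reports its state as a 32-bit XIRR word. In this implementation icp_accept() builds it with the processor priority (CPPR) in the top 8 bits and the pending source number (XISR, the globally numbered ICS interrupt) in the low 24 bits, and icp_eoi() later restores the CPPR from those top bits. A sketch of that packing, with hypothetical helper names:

```c
#include <stdint.h>

/* XIRR layout as used by icp_accept()/icp_eoi():
 * bits 31..24 = CPPR, bits 23..0 = XISR (source number, 0 = none) */
static uint32_t xirr_pack(uint8_t cppr, uint32_t xisr)
{
    /* cast before shifting so 0xff << 24 cannot overflow a signed int */
    return ((uint32_t)cppr << 24) | (xisr & 0xffffff);
}

static uint8_t xirr_cppr(uint32_t xirr)
{
    return (uint8_t)(xirr >> 24);
}

static uint32_t xirr_xisr(uint32_t xirr)
{
    return xirr & 0xffffff;
}
```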

On pSeries systems, access to the interrupt controller is virtualized via
hypercalls and RTAS methods.  However, the virtualized interface is very
similar to the underlying interrupt controller hardware, and similar PICs
exist un-virtualized in some other systems.

This patch implements both the ICP and ICS sides of the PAPR interrupt
controller.  For now, only the hypercall virtualized interface is provided;
however, it would be relatively straightforward to graft an emulated
register interface onto the underlying interrupt logic if we want to add
a machine with a hardware ICS/ICP system in the future.
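
The core of the presentation logic is the decision icp_irq() makes when a source fires. Numerically lower priority values are more favored; an interrupt is only presented if it is more favored than the server's CPPR and than anything already pending, and whichever interrupt loses is handed back to the ICS as rejected so it can be resent later. A reduced sketch, without the locking and the qemu_irq output line; struct icp_sketch and icp_present() are hypothetical names.

```c
#include <stdint.h>

/* Cut-down per-server presentation state (cf. icp_server_state) */
struct icp_sketch {
    uint8_t cppr;             /* current processor priority */
    uint32_t xisr;            /* pending source number, 0 = none */
    uint8_t pending_priority; /* priority of the pending source */
};

/* Mirror of the decision in icp_irq().  Returns the source number the
 * ICS must treat as rejected (to resend later), or 0 if none. */
static uint32_t icp_present(struct icp_sketch *ss, uint32_t nr,
                            uint8_t priority)
{
    uint32_t displaced;

    if (priority >= ss->cppr
        || (ss->xisr && ss->pending_priority <= priority)) {
        return nr;            /* not favored enough: reject the new one */
    }
    displaced = ss->xisr;     /* a less favored pending irq is bumped */
    ss->xisr = nr;
    ss->pending_priority = priority;
    return displaced;
}
```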

There are some limitations in this implementation: it is assumed for now
that only one instance of the ICS exists, although a full xics system can
have several, each responsible for a different group of hardware irqs.
ICP/ICS can handle both level-sensitive (LSI) and message signalled (MSI)
interrupt inputs.  For now, this implementation supports only MSI
interrupts, since those are what PAPR virtual IO devices use.
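
On the ICS side, xics_system_init() numbers the sources from an offset of 16, so a global interrupt number is valid only in [offset, offset + nr_irqs). A priority of 0xff means the source is masked; the intent is that an MSI arriving while masked is latched in masked_pending and delivered once set-xive gives the source a real priority. A sketch of those rules, with hypothetical names (ics_index(), ics_msi_fire(), ics_set_priority()):

```c
#include <stdint.h>

#define ICS_OFFSET  16    /* first global source number, as in xics_system_init() */
#define PRIO_MASKED 0xff  /* priority 0xff means "masked" */

struct ics_src {
    uint8_t priority;
    unsigned masked_pending : 1;
};

/* Global irq number -> index into this ICS, or -1 if it is not ours
 * (cf. ics_valid_irq()) */
static int ics_index(uint32_t nr, int nr_irqs)
{
    if (nr < ICS_OFFSET || nr >= (uint32_t)(ICS_OFFSET + nr_irqs)) {
        return -1;
    }
    return (int)(nr - ICS_OFFSET);
}

/* An MSI fires: 1 = present it now, 0 = latched because masked */
static int ics_msi_fire(struct ics_src *s)
{
    if (s->priority == PRIO_MASKED) {
        s->masked_pending = 1;
        return 0;
    }
    return 1;
}

/* set-xive: 1 = a latched MSI should now be delivered */
static int ics_set_priority(struct ics_src *s, uint8_t priority)
{
    s->priority = priority;
    if (!s->masked_pending || priority == PRIO_MASKED) {
        return 0;
    }
    s->masked_pending = 0;
    return 1;
}
```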

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 Makefile.target |    2 +-
 hw/spapr.c      |   26 +++
 hw/spapr.h      |    2 +
 hw/xics.c       |  528 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/xics.h       |   13 ++
 5 files changed, 570 insertions(+), 1 deletions(-)
 create mode 100644 hw/xics.c
 create mode 100644 hw/xics.h

diff --git a/Makefile.target b/Makefile.target
index e333225..2b0588e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -233,7 +233,7 @@ obj-ppc-y += ppc_oldworld.o
 obj-ppc-y += ppc_newworld.o
 # IBM pSeries (sPAPR)
 obj-ppc-y += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
-obj-ppc-y += spapr_vty.o
+obj-ppc-y += xics.o spapr_vty.o
 # PowerPC 4xx boards
 obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
 obj-ppc-y += ppc440.o ppc440_bamboo.o
diff --git a/hw/spapr.c b/hw/spapr.c
index 23f493a..be30def 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -34,6 +34,7 @@
 
 #include "hw/spapr.h"
 #include "hw/spapr_vio.h"
+#include "hw/xics.h"
 
 #include <libfdt.h>
 
@@ -62,6 +63,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
     uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
     char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr";
+    uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
     int i;
     char *modelname;
     int ret;
@@ -120,6 +122,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
 
     for (i = 0; i < smp_cpus; i++) {
         CPUState *env = envs[i];
+        uint32_t gserver_prop[] = {cpu_to_be32(i), 0}; /* HACK! */
         char *nodename;
         uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
                            0xffffffff, 0xffffffff};
@@ -147,6 +150,9 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
         _FDT((fdt_property(fdt, "ibm,pft-size", pft_size_prop, sizeof(pft_size_prop))));
         _FDT((fdt_property_string(fdt, "status", "okay")));
         _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
+        _FDT((fdt_property_cell(fdt, "ibm,ppc-interrupt-server#s", i)));
+        _FDT((fdt_property(fdt, "ibm,ppc-interrupt-gserver#s",
+                           gserver_prop, sizeof(gserver_prop))));
 
         if (envs[i]->mmu_model & POWERPC_MMU_1TSEG) {
             _FDT((fdt_property(fdt, "ibm,processor-segment-sizes",
@@ -168,6 +174,20 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
 
     _FDT((fdt_end_node(fdt)));
 
+    /* interrupt controller */
+    _FDT((fdt_begin_node(fdt, "interrupt-controller@0")));
+
+    _FDT((fdt_property_string(fdt, "device_type",
+                              "PowerPC-External-Interrupt-Presentation")));
+    _FDT((fdt_property_string(fdt, "compatible", "IBM,ppc-xicp")));
+    _FDT((fdt_property_cell(fdt, "reg", 0)));
+    _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
+    _FDT((fdt_property(fdt, "ibm,interrupt-server-ranges",
+                       interrupt_server_ranges_prop,
+                       sizeof(interrupt_server_ranges_prop))));
+
+    _FDT((fdt_end_node(fdt)));
+
     /* vdevice */
     _FDT((fdt_begin_node(fdt, "vdevice")));
 
@@ -175,6 +195,8 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
     _FDT((fdt_property_string(fdt, "compatible", "IBM,vdevice")));
     _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
     _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
+    _FDT((fdt_property_cell(fdt, "#interrupt-cells", 0x2)));
+    _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
     
     _FDT((fdt_end_node(fdt)));
 
@@ -290,6 +312,10 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     }
     qemu_free(filename);
 
+    /* Set up Interrupt Controller */
+    spapr->icp = xics_system_init(smp_cpus, envs, MAX_SERIAL_PORTS);
+
+    /* Set up VIO bus */
     spapr->vio_bus = spapr_vio_bus_init();
 
     for (i = 0; i < MAX_SERIAL_PORTS; i++) {
diff --git a/hw/spapr.h b/hw/spapr.h
index 7a7c319..4b54c22 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -2,9 +2,11 @@
 #define __HW_SPAPR_H__
 
 struct VIOsPAPRBus;
+struct icp_state;
 
 typedef struct sPAPREnvironment {
     struct VIOsPAPRBus *vio_bus;
+    struct icp_state *icp;
 } sPAPREnvironment;
 
 #define H_SUCCESS         0
diff --git a/hw/xics.c b/hw/xics.c
new file mode 100644
index 0000000..46e778a
--- /dev/null
+++ b/hw/xics.c
@@ -0,0 +1,528 @@
+#include "hw.h"
+#include "hw/spapr.h"
+#include "hw/xics.h"
+
+#include <pthread.h>
+
+/*
+ * ICP: Presentation layer
+ */
+
+struct icp_server_state {
+    uint32_t cppr :8;
+    uint32_t xisr :24;
+    uint8_t pending_priority;
+    uint8_t mfrr;
+    qemu_irq output;
+    pthread_mutex_t lock;
+};
+
+struct ics_state;
+
+struct icp_state {
+    long nr_servers;
+    struct icp_server_state *ss;
+    struct ics_state *ics;
+};
+
+static void ics_reject(struct ics_state *ics, int nr);
+static void ics_resend(struct ics_state *ics);
+static void ics_eoi(struct ics_state *ics, int nr);
+
+static void icp_check_ipi(struct icp_state *icp, int server)
+{
+    struct icp_server_state *ss = icp->ss + server;
+
+    if (ss->xisr && (ss->pending_priority <= ss->mfrr)) {
+        return;
+    }
+
+    if (ss->xisr) {
+        ics_reject(icp->ics, ss->xisr);
+    }
+
+    ss->xisr = XICS_IPI;
+    ss->pending_priority = ss->mfrr;
+    qemu_irq_raise(ss->output);
+}
+
+static void icp_resend(struct icp_state *icp, int server)
+{
+    struct icp_server_state *ss = icp->ss + server;
+
+    if (ss->mfrr < ss->cppr) {
+        icp_check_ipi(icp, server);
+    }
+    ics_resend(icp->ics);
+}
+
+static void icp_set_cppr(struct icp_state *icp, int server, uint8_t cppr)
+{
+    struct icp_server_state *ss = icp->ss + server;
+    uint8_t old_cppr;
+    uint32_t old_xisr;
+
+    pthread_mutex_lock(&ss->lock);
+    old_cppr = ss->cppr;
+    ss->cppr = cppr;
+
+    if (cppr < old_cppr) {
+        if (ss->xisr && (cppr <= ss->pending_priority)) {
+            old_xisr = ss->xisr;
+            ss->xisr = 0;
+            qemu_irq_lower(ss->output);
+            ics_reject(icp->ics, old_xisr);
+        }
+    } else {
+        if (!ss->xisr) {
+            icp_resend(icp, server);
+        }
+    }
+    pthread_mutex_unlock(&ss->lock);
+}
+
+static void icp_set_mfrr(struct icp_state *icp, int nr, uint8_t mfrr)
+{
+    struct icp_server_state *ss = icp->ss + nr;
+
+    pthread_mutex_lock(&ss->lock);
+
+    ss->mfrr = mfrr;
+    if (mfrr < ss->cppr) {
+        icp_check_ipi(icp, nr);
+    }
+
+    pthread_mutex_unlock(&ss->lock);
+}
+
+static uint32_t icp_accept(struct icp_server_state *ss)
+{
+    uint32_t xirr;
+
+    pthread_mutex_lock(&ss->lock);
+    qemu_irq_lower(ss->output);
+    xirr = ((uint32_t)ss->cppr << 24) | ss->xisr;
+    ss->xisr = 0;
+    ss->cppr = ss->pending_priority;
+    pthread_mutex_unlock(&ss->lock);
+    return xirr;
+}
+
+static void icp_eoi(struct icp_state *icp, int server, uint32_t xirr)
+{
+    struct icp_server_state *ss = icp->ss + server;
+
+    ics_eoi(icp->ics, xirr & 0xffffff);
+    /* Send EOI -> ICS */
+    ss->cppr = xirr >> 24;
+    if (!ss->xisr) {
+        icp_resend(icp, server);
+    }
+}
+
+static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t priority)
+{
+    struct icp_server_state *ss = icp->ss + server;
+
+    pthread_mutex_lock(&ss->lock);
+
+    if ((priority >= ss->cppr)
+        || (ss->xisr && (ss->pending_priority <= priority))) {
+        ics_reject(icp->ics, nr);
+    } else {
+        if (ss->xisr) {
+            ics_reject(icp->ics, ss->xisr);
+        }
+        ss->xisr = nr;
+        ss->pending_priority = priority;
+        qemu_irq_raise(ss->output);
+    }
+
+    pthread_mutex_unlock(&ss->lock);
+}
+
+/*
+ * ICS: Source layer
+ */
+
+struct ics_irq_state {
+    int server;
+    uint8_t priority;
+    uint8_t saved_priority;
+    /* int pending :1; */
+    /* int presented :1; */
+    int rejected :1;
+    int masked_pending :1;
+};
+
+struct ics_state {
+    int nr_irqs;
+    int offset;
+    qemu_irq *qirqs;
+    struct ics_irq_state *irqs;
+    struct icp_state *icp;
+};
+
+static int ics_valid_irq(struct ics_state *ics, uint32_t nr)
+{
+    return (nr >= ics->offset)
+        && (nr < (ics->offset + ics->nr_irqs));
+}
+
+static void ics_set_irq_msi(void *opaque, int nr, int val)
+{
+    struct ics_state *ics = (struct ics_state *)opaque;
+    struct ics_irq_state *irq = ics->irqs + nr;
+
+    if (val) {
+        if (irq->priority == 0xff) {
+            /* source is masked: remember that it fired */
+            irq->masked_pending = 1;
+        } else {
+            icp_irq(ics->icp, irq->server, nr + ics->offset, irq->priority);
+        }
+    }
+}
+
+static void ics_reject_msi(struct ics_state *ics, int nr)
+{
+    struct ics_irq_state *irq = ics->irqs + nr - ics->offset;
+
+    irq->rejected = 1;
+}
+
+static void ics_resend_msi(struct ics_state *ics)
+{
+    int i;
+
+    for (i = 0; i < ics->nr_irqs; i++) {
+        struct ics_irq_state *irq = ics->irqs + i;
+
+        /* FIXME: filter by server#? */
+        if (irq->rejected) {
+            irq->rejected = 0;
+            if (irq->priority != 0xff) {
+                icp_irq(ics->icp, irq->server, i + ics->offset, irq->priority);
+            }
+        }
+    }
+}
+
+static void ics_write_xive_msi(struct ics_state *ics, int nr, int server,
+                               uint8_t priority)
+{
+    struct ics_irq_state *irq = ics->irqs + nr;
+
+    irq->server = server;
+    irq->priority = priority;
+
+    if (!irq->masked_pending || (priority == 0xff)) {
+        return;
+    }
+
+    irq->masked_pending = 0;
+    icp_irq(ics->icp, server, nr + ics->offset, priority);
+}
+
+/* static void ics_recheck_irq(struct ics_state *ics, int nr) */
+/* { */
+/*     struct ics_irq_state *irq = xics->irqs + (nr - xics->offset); */
+
+/*     if (irq->pending && (irq->priority != 0xff)) { */
+/*      irq->presented = 1; */
+/*      icp_irq(xicp->ss + irq->server, nr + ics->offset, irq->priority); */
+/*     } */
+/* } */
+
+/* static void ics_set_irq(void *opaque, int nr, int val) */
+/* { */
+/*     struct ics_state *ics = (struct ics_state *)opaque; */
+/*     struct ics_irq_state *irq = ics->irqs + nr; */
+
+/*     irq->pending = val; */
+/*     ics_recheck_irq(ics, nr); */
+/* } */
+
+/* static void ics_reject(int nr) */
+/* { */
+/*     struct ics_irq_state *irq = xics->irqs + (nr - xics->offset); */
+
+/*     assert(irq->presented); */
+/*     irq->rejected = 1; */
+/*     irq->presented = 0; */
+/* } */
+
+/* static void ics_eoi(int nr) */
+/* { */
+/*     struct ics_irq_state *irq = xics->irqs + (nr - xics->offset); */
+
+/*     assert(irq->presented); */
+/*     irq->presented = 0; */
+/*     irq->rejected = 0; */
+/*     ics_recheck_irq(xics, nr); */
+/* } */
+
+/* static void ics_resend_irq(struct ics_state *ics, int nr, */
+/*                            struct icp_server_state *ss) */
+/* { */
+/*     struct ics_irq_state *irq = ics->irqs + (nr - ics->offset); */
+
+/*     if (!irq->rejected) */
+/*         return; /\* Not rejected, so no need to resend *\/ */
+
+/*     if (ss != (xicp->ss + irq->server)) */
+/*         return; /\* Not for this server, so don't resend *\/ */
+
+/*     ics_recheck_irq(ics, nr); */
+/* } */
+
+/* static void ics_resend(struct icp_server_state *ss) */
+/* { */
+/*     int i; */
+
+/*     for (i = 0; i < xics->nr_irqs; i++) */
+/*         ics_resend_irq(xics, nr, ss); */
+/* } */
+
+static void ics_reject(struct ics_state *ics, int nr)
+{
+    ics_reject_msi(ics, nr);
+}
+
+static void ics_resend(struct ics_state *ics)
+{
+    ics_resend_msi(ics);
+}
+
+static void ics_eoi(struct ics_state *ics, int nr)
+{
+}
+
+/*
+ * Exported functions
+ */
+
+qemu_irq xics_find_qirq(struct icp_state *icp, int irq)
+{
+    if ((irq < icp->ics->offset)
+        || (irq >= (icp->ics->offset + icp->ics->nr_irqs))) {
+        return NULL;
+    }
+
+    return icp->ics->qirqs[irq - icp->ics->offset];
+}
+
+static target_ulong h_cppr(CPUState *env, sPAPREnvironment *spapr,
+                           target_ulong opcode, target_ulong *args)
+{
+    target_ulong cppr = args[0];
+
+    icp_set_cppr(spapr->icp, env->cpu_index, cppr);
+    return H_SUCCESS;
+}
+
+static target_ulong h_ipi(CPUState *env, sPAPREnvironment *spapr,
+                          target_ulong opcode, target_ulong *args)
+{
+    target_ulong server = args[0];
+    target_ulong mfrr = args[1];
+
+    if (server >= spapr->icp->nr_servers) {
+        return H_PARAMETER;
+    }
+
+    icp_set_mfrr(spapr->icp, server, mfrr);
+    return H_SUCCESS;
+
+}
+
+static target_ulong h_xirr(CPUState *env, sPAPREnvironment *spapr,
+                           target_ulong opcode, target_ulong *args)
+{
+    uint32_t xirr = icp_accept(spapr->icp->ss + env->cpu_index);
+
+    args[0] = xirr;
+    return H_SUCCESS;
+}
+
+static target_ulong h_eoi(CPUState *env, sPAPREnvironment *spapr,
+                          target_ulong opcode, target_ulong *args)
+{
+    target_ulong xirr = args[0];
+
+    icp_eoi(spapr->icp, env->cpu_index, xirr);
+    return H_SUCCESS;
+}
+
+static void rtas_set_xive(sPAPREnvironment *spapr, uint32_t token,
+                          uint32_t nargs, target_ulong args,
+                          uint32_t nret, target_ulong rets)
+{
+    struct ics_state *ics = spapr->icp->ics;
+    uint32_t nr, server, priority;
+
+    if ((nargs != 3) || (nret != 1)) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    nr = rtas_ld(args, 0);
+    server = rtas_ld(args, 1);
+    priority = rtas_ld(args, 2);
+
+    if (!ics_valid_irq(ics, nr) || (server >= ics->icp->nr_servers)
+        || (priority > 0xff)) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    ics_write_xive_msi(ics, nr - ics->offset, server, priority);
+
+    rtas_st(rets, 0, 0); /* Success */
+}
+
+static void rtas_get_xive(sPAPREnvironment *spapr, uint32_t token,
+                          uint32_t nargs, target_ulong args,
+                          uint32_t nret, target_ulong rets)
+{
+    struct ics_state *ics = spapr->icp->ics;
+    uint32_t nr;
+
+    if ((nargs != 1) || (nret != 3)) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    nr = rtas_ld(args, 0);
+
+    if (!ics_valid_irq(ics, nr)) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    rtas_st(rets, 0, 0); /* Success */
+    rtas_st(rets, 1, ics->irqs[nr - ics->offset].server);
+    rtas_st(rets, 2, ics->irqs[nr - ics->offset].priority);
+}
+
+static void rtas_int_off(sPAPREnvironment *spapr, uint32_t token,
+                         uint32_t nargs, target_ulong args,
+                         uint32_t nret, target_ulong rets)
+{
+    struct ics_state *ics = spapr->icp->ics;
+    uint32_t nr;
+
+    if ((nargs != 1) || (nret != 1)) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    nr = rtas_ld(args, 0);
+
+    if (!ics_valid_irq(ics, nr)) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    /* This is a NOP for now, since the described PAPR semantics don't
+     * seem to gel with what Linux does */
+#if 0
+    struct ics_irq_state *irq = xics->irqs + (nr - xics->offset);
+
+    irq->saved_priority = irq->priority;
+    ics_write_xive_msi(xics, nr - xics->offset, irq->server, 0xff);
+#endif
+
+    rtas_st(rets, 0, 0); /* Success */
+}
+
+static void rtas_int_on(sPAPREnvironment *spapr, uint32_t token,
+                        uint32_t nargs, target_ulong args,
+                        uint32_t nret, target_ulong rets)
+{
+    struct ics_state *ics = spapr->icp->ics;
+    uint32_t nr;
+
+    if ((nargs != 1) || (nret != 1)) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    nr = rtas_ld(args, 0);
+
+    if (!ics_valid_irq(ics, nr)) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    /* This is a NOP for now, since the described PAPR semantics don't
+     * seem to gel with what Linux does */
+#if 0
+    struct ics_irq_state *irq = ics->irqs + (nr - ics->offset);
+
+    ics_write_xive_msi(ics, nr - ics->offset,
+                       irq->server, irq->saved_priority);
+#endif
+
+    rtas_st(rets, 0, 0); /* Success */
+}
+
+struct icp_state *xics_system_init(int nr_servers, CPUState *servers[],
+                                   int nr_irqs)
+{
+    int i;
+    struct icp_state *icp;
+    struct ics_state *ics;
+
+    icp = qemu_mallocz(sizeof(*icp));
+    icp->nr_servers = nr_servers;
+    icp->ss = qemu_mallocz(nr_servers * sizeof(struct icp_server_state));
+
+    for (i = 0; i < nr_servers; i++) {
+        servers[i]->cpu_index = i;
+
+        switch (PPC_INPUT(servers[i])) {
+        case PPC_FLAGS_INPUT_POWER7:
+            icp->ss[i].output = servers[i]->irq_inputs[POWER7_INPUT_INT];
+            break;
+
+        case PPC_FLAGS_INPUT_970:
+            icp->ss[i].output = servers[i]->irq_inputs[PPC970_INPUT_INT];
+            break;
+
+        default:
+            hw_error("XICS interrupt model does not support this CPU bus model\n");
+            exit(1);
+        }
+
+        icp->ss[i].mfrr = 0xff;
+        pthread_mutex_init(&icp->ss[i].lock, NULL);
+    }
+
+    ics = qemu_mallocz(sizeof(*ics));
+    ics->nr_irqs = nr_irqs;
+    ics->offset = 16;
+    ics->irqs = qemu_mallocz(nr_irqs * sizeof(struct ics_irq_state));
+
+    icp->ics = ics;
+    ics->icp = icp;
+
+    for (i = 0; i < nr_irqs; i++) {
+        ics->irqs[i].priority = 0xff;
+        ics->irqs[i].saved_priority = 0xff;
+    }
+
+    ics->qirqs = qemu_allocate_irqs(ics_set_irq_msi, ics, nr_irqs);
+
+    spapr_register_hypercall(H_CPPR, h_cppr);
+    spapr_register_hypercall(H_IPI, h_ipi);
+    spapr_register_hypercall(H_XIRR, h_xirr);
+    spapr_register_hypercall(H_EOI, h_eoi);
+
+    spapr_rtas_register("ibm,set-xive", rtas_set_xive);
+    spapr_rtas_register("ibm,get-xive", rtas_get_xive);
+    spapr_rtas_register("ibm,int-off", rtas_int_off);
+    spapr_rtas_register("ibm,int-on", rtas_int_on);
+
+    return icp;
+}
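
The RTAS handlers above convert a guest-visible interrupt number nr into a source index with nr - ics->offset, and xics_system_init() sets that offset to 16. The body of ics_valid_irq() is not part of this excerpt. The sketch below assumes it accepts exactly the range [offset, offset + nr_irqs), and it reuses the server and priority checks from rtas_set_xive(). All _sketch names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ics_state: only the fields used here. */
struct ics_sketch {
    uint32_t offset;   /* first global interrupt number (16 in xics_system_init) */
    uint32_t nr_irqs;  /* number of interrupt sources */
};

/* Assumed semantics of ics_valid_irq(): nr must name an existing source. */
static bool ics_valid_irq_sketch(const struct ics_sketch *ics, uint32_t nr)
{
    return nr >= ics->offset && nr < ics->offset + ics->nr_irqs;
}

/* Mirrors the argument checks in rtas_set_xive(). Returns 0 on success and
 * stores the source index (nr - offset) in *index; returns -3, the error
 * code the handlers above store in rets[0], on any invalid argument. */
static int set_xive_check_sketch(const struct ics_sketch *ics,
                                 uint32_t nr_servers, uint32_t nr,
                                 uint32_t server, uint32_t priority,
                                 uint32_t *index)
{
    if (!ics_valid_irq_sketch(ics, nr) || server >= nr_servers
        || priority > 0xff) {
        return -3;
    }
    *index = nr - ics->offset;
    return 0;
}
```

With offset 16 and eight sources, numbers 16 through 23 map to indices 0 through 7; 15 and 24 are rejected.
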
diff --git a/hw/xics.h b/hw/xics.h
new file mode 100644
index 0000000..e55f5f1
--- /dev/null
+++ b/hw/xics.h
@@ -0,0 +1,13 @@
+#if !defined(__XICS_H__)
+#define __XICS_H__
+
+#define XICS_IPI        0x2
+
+struct icp_state;
+
+qemu_irq xics_find_qirq(struct icp_state *icp, int irq);
+
+struct icp_state *xics_system_init(int nr_servers, CPUState *servers[],
+                                   int nr_irqs);
+
+#endif /* __XICS_H__ */
-- 
1.7.1


* [Qemu-devel] [PATCH 19/26] Add PAPR H_VIO_SIGNAL hypercall and infrastructure for VIO interrupts
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (17 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics) David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16 15:49   ` [Qemu-devel] " Alexander Graf
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 20/26] Add (virtual) interrupt to PAPR virtual tty device David Gibson
                   ` (6 subsequent siblings)
  25 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

This patch adds infrastructure to support interrupts from PAPR virtual IO
devices.  This includes correctly advertising those interrupts in the
device tree, and implementing the H_VIO_SIGNAL hypercall, used to
enable and disable individual device interrupts.
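
In h_vio_signal() below, a mode is accepted only when every bit it sets is also present in the device type's signal_mask. Anything else is rejected with H_PARAMETER and leaves signal_state untouched. The following is a minimal sketch of that check. The numeric return codes are illustrative placeholders, not the sPAPR header values, and treating bit 0 as "interrupts enabled" is an assumption drawn from the LLAN receive path later in this series, which pulses its interrupt only when (signal_state & 1) is set.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative return codes; the real H_SUCCESS/H_PARAMETER values
 * come from the sPAPR headers and are not reproduced here. */
enum { SKETCH_H_SUCCESS = 0, SKETCH_H_PARAMETER = -4 };

/* Hypothetical device state: only the two fields the check touches. */
struct vio_dev_sketch {
    uint64_t signal_mask;   /* mode bits this device type supports */
    uint64_t signal_state;  /* mode bits currently enabled */
};

static int vio_signal_sketch(struct vio_dev_sketch *dev, uint64_t mode)
{
    if (mode & ~dev->signal_mask) {
        return SKETCH_H_PARAMETER;  /* request sets an unsupported bit */
    }
    dev->signal_state = mode;
    return SKETCH_H_SUCCESS;
}

/* Assumed meaning of bit 0, as tested by the LLAN receive path. */
static int vio_irq_enabled_sketch(const struct vio_dev_sketch *dev)
{
    return (dev->signal_state & 1) != 0;
}
```
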

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 hw/spapr.c     |    2 +-
 hw/spapr_vio.c |   34 ++++++++++++++++++++++++++++++++++
 hw/spapr_vio.h |    6 ++++++
 3 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index be30def..5b19963 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -62,7 +62,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
     uint32_t start_prop = cpu_to_be32(initrd_base);
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
     uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
-    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr";
+    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt";
     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
     int i;
     char *modelname;
diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
index 0ed63f4..45edd94 100644
--- a/hw/spapr_vio.c
+++ b/hw/spapr_vio.c
@@ -105,6 +105,15 @@ static int vio_make_devnode(VIOsPAPRDevice *dev,
         }
     }
 
+    if (dev->qirq) {
+        uint32_t ints_prop[] = {cpu_to_be32(dev->vio_irq_num), 0};
+
+        ret = fdt_setprop(fdt, node_off, "interrupts", ints_prop,
+                          sizeof(ints_prop));
+        if (ret < 0)
+            return ret;
+    }
+
     if (info->devnode) {
         ret = (info->devnode)(dev, fdt, node_off);
         if (ret < 0) {
@@ -140,6 +149,28 @@ void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info)
     qdev_register(&info->qdev);
 }
 
+static target_ulong h_vio_signal(CPUState *env, sPAPREnvironment *spapr,
+                                 target_ulong opcode,
+                                 target_ulong *args)
+{
+    target_ulong reg = args[0];
+    target_ulong mode = args[1];
+    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
+    VIOsPAPRDeviceInfo *info;
+
+    if (!dev)
+        return H_PARAMETER;
+
+    info = (VIOsPAPRDeviceInfo *)dev->qdev.info;
+
+    if (mode & ~info->signal_mask)
+        return H_PARAMETER;
+
+    dev->signal_state = mode;
+
+    return H_SUCCESS;
+}
+
 VIOsPAPRBus *spapr_vio_bus_init(void)
 {
     VIOsPAPRBus *bus;
@@ -156,6 +187,9 @@ VIOsPAPRBus *spapr_vio_bus_init(void)
     _bus = qbus_create(&spapr_vio_bus_info, dev, "spapr-vio");
     bus = DO_UPCAST(VIOsPAPRBus, bus, _bus);
 
+    /* hcall-vio */
+    spapr_register_hypercall(H_VIO_SIGNAL, h_vio_signal);
+
     for (_info = device_info_list; _info; _info = _info->next) {
         VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)_info;
 
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
index b164ad3..8a000c6 100644
--- a/hw/spapr_vio.h
+++ b/hw/spapr_vio.h
@@ -24,6 +24,9 @@
 typedef struct VIOsPAPRDevice {
     DeviceState qdev;
     uint32_t reg;
+    qemu_irq qirq;
+    uint32_t vio_irq_num;
+    target_ulong signal_state;
 } VIOsPAPRDevice;
 
 typedef struct VIOsPAPRBus {
@@ -33,6 +36,7 @@ typedef struct VIOsPAPRBus {
 typedef struct {
     DeviceInfo qdev;
     const char *dt_name, *dt_type, *dt_compatible;
+    target_ulong signal_mask;
     int (*init)(VIOsPAPRDevice *dev);
     void (*hcalls)(VIOsPAPRBus *bus);
     int (*devnode)(VIOsPAPRDevice *dev, void *fdt, int node_off);
@@ -43,6 +47,8 @@ extern VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg);
 extern void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info);
 extern int spapr_populate_vdevice(VIOsPAPRBus *bus, void *fdt);
 
+extern int spapr_vio_signal(VIOsPAPRDevice *dev, target_ulong mode);
+
 void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len);
 void spapr_vty_create(VIOsPAPRBus *bus,
                       uint32_t reg, CharDriverState *chardev);
-- 
1.7.1


* [Qemu-devel] [PATCH 20/26] Add (virtual) interrupt to PAPR virtual tty device
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (18 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 19/26] Add PAPR H_VIO_SIGNAL hypercall and infrastructure for VIO interrupts David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 21/26] Implement TCE translation for sPAPR VIO David Gibson
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

Now that we have implemented the PAPR "xics" virtualized interrupt
controller, we can add interrupts to PAPR VIO devices.  This patch adds
interrupt support to the PAPR virtual tty/console device.
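
vty_receive() pulses the device interrupt only when data arrives while the input buffer is empty (dev->in == dev->out). This approximates an edge-triggered source: one pulse per empty to non-empty transition, however many bytes follow before the guest drains them. The sketch below models that behaviour. The 16-byte buffer size is an assumption (VTERM_BUFSIZE's value is not shown here), and the pulses counter is a hypothetical stand-in for qemu_irq_pulse().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BUFSIZE 16  /* assumed; VTERM_BUFSIZE's real value is not shown */

struct vty_sketch {
    uint8_t buf[SKETCH_BUFSIZE];
    uint32_t in, out;      /* free-running producer/consumer counters */
    int pulses;            /* counts calls that would be qemu_irq_pulse() */
};

static void vty_receive_sketch(struct vty_sketch *v, const uint8_t *data,
                               int size)
{
    int i;

    if (v->in == v->out && size) {
        v->pulses++;       /* buffer was empty: generate a new edge */
    }
    for (i = 0; i < size; i++) {
        assert(v->in - v->out < SKETCH_BUFSIZE);  /* must not overflow */
        v->buf[v->in++ % SKETCH_BUFSIZE] = data[i];
    }
}

/* Guest side: drain up to max bytes, roughly what H_GET_TERM_CHAR does. */
static int vty_drain_sketch(struct vty_sketch *v, uint8_t *dst, int max)
{
    int n = 0;

    while (n < max && v->out != v->in) {
        dst[n++] = v->buf[v->out++ % SKETCH_BUFSIZE];
    }
    return n;
}
```

Two back-to-back receives produce a single pulse. Once the guest drains the buffer, the next byte produces another.
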

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 hw/spapr.c     |    6 ++++--
 hw/spapr_vio.h |    3 ++-
 hw/spapr_vty.c |   11 ++++++++++-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 5b19963..e7f8864 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -255,6 +255,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     int fdt_size;
     sPAPREnvironment *spapr;
     char *filename;
+    int irq = 16;
 
     spapr = qemu_malloc(sizeof(*spapr));
 
@@ -318,9 +319,10 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     /* Set up VIO bus */
     spapr->vio_bus = spapr_vio_bus_init();
 
-    for (i = 0; i < MAX_SERIAL_PORTS; i++) {
+    for (i = 0; i < MAX_SERIAL_PORTS; i++, irq++) {
         if (serial_hds[i]) {
-            spapr_vty_create(spapr->vio_bus, i, serial_hds[i]);
+            spapr_vty_create(spapr->vio_bus, i, serial_hds[i],
+                             xics_find_qirq(spapr->icp, irq), irq);
         }
     }
 
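
With this change, ppc_spapr_init() hands out global interrupt numbers starting at 16, the same value xics_system_init() uses for ics->offset. It advances irq for every serial slot whether or not serial_hds[i] is populated, so each vty keeps a fixed number. The ibmveth loop added later in this series continues from where this loop stops. The arithmetic, as a sketch assuming MAX_SERIAL_PORTS is 4 (its value is not shown in this excerpt; all names are hypothetical):

```c
#include <assert.h>

#define SKETCH_IRQ_BASE         16  /* matches ics->offset in xics_system_init() */
#define SKETCH_MAX_SERIAL_PORTS 4   /* assumed value of MAX_SERIAL_PORTS */

/* Interrupt number of the vty in serial slot i (reserved even if unused). */
static int vty_irq_sketch(int i)
{
    return SKETCH_IRQ_BASE + i;
}

/* Interrupt number of NIC j, allocated after all serial slots. */
static int nic_irq_sketch(int j)
{
    return SKETCH_IRQ_BASE + SKETCH_MAX_SERIAL_PORTS + j;
}

/* Number of XICS sources to request from xics_system_init(). */
static int nr_irqs_sketch(int nb_nics)
{
    return SKETCH_MAX_SERIAL_PORTS + nb_nics;
}
```
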
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
index 8a000c6..2013927 100644
--- a/hw/spapr_vio.h
+++ b/hw/spapr_vio.h
@@ -51,6 +51,7 @@ extern int spapr_vio_signal(VIOsPAPRDevice *dev, target_ulong mode);
 
 void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len);
 void spapr_vty_create(VIOsPAPRBus *bus,
-                      uint32_t reg, CharDriverState *chardev);
+                      uint32_t reg, CharDriverState *chardev,
+                      qemu_irq qirq, uint32_t vio_irq_num);
 
 #endif /* _HW_SPAPR_VIO_H */
diff --git a/hw/spapr_vty.c b/hw/spapr_vty.c
index afc9ef9..5c2412a 100644
--- a/hw/spapr_vty.c
+++ b/hw/spapr_vty.c
@@ -24,6 +24,10 @@ static void vty_receive(void *opaque, const uint8_t *buf, int size)
     VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)opaque;
     int i;
 
+    if ((dev->in == dev->out) && size) {
+        /* toggle line to simulate edge interrupt */
+        qemu_irq_pulse(dev->sdev.qirq);
+    }
     for (i = 0; i < size; i++) {
         assert((dev->in - dev->out) < VTERM_BUFSIZE);
         dev->buf[dev->in++ % VTERM_BUFSIZE] = buf[i];
@@ -107,14 +111,19 @@ static target_ulong h_get_term_char(CPUState *env, sPAPREnvironment *spapr,
 }
 
 void spapr_vty_create(VIOsPAPRBus *bus,
-                      uint32_t reg, CharDriverState *chardev)
+                      uint32_t reg, CharDriverState *chardev,
+                      qemu_irq qirq, uint32_t vio_irq_num)
 {
     DeviceState *dev;
+    VIOsPAPRDevice *sdev;
 
     dev = qdev_create(&bus->bus, "spapr-vty");
     qdev_prop_set_uint32(dev, "reg", reg);
     qdev_prop_set_chr(dev, "chardev", chardev);
     qdev_init_nofail(dev);
+    sdev = (VIOsPAPRDevice *)dev;
+    sdev->qirq = qirq;
+    sdev->vio_irq_num = vio_irq_num;
 }
 
 static void vty_hcalls(VIOsPAPRBus *bus)
-- 
1.7.1


* [Qemu-devel] [PATCH 21/26] Implement TCE translation for sPAPR VIO
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (19 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 20/26] Add (virtual) interrupt to PAPR virtual tty device David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16 16:03   ` [Qemu-devel] " Alexander Graf
  2011-03-16 22:20   ` [Qemu-devel] " Anthony Liguori
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 22/26] Implement sPAPR Virtual LAN (ibmveth) David Gibson
                   ` (4 subsequent siblings)
  25 siblings, 2 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

From: Ben Herrenschmidt <benh@kernel.crashing.org>

This patch implements the necessary infrastructure and hypercalls for
sPAPR's TCE (Translation Control Entry) IOMMU mechanism.  This is necessary
for all virtual IO devices which do DMA (i.e. nearly all of them).
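
Each device gets a table with one 64-bit TCE per 4 KiB page of its DMA window (rtce_window_size). H_PUT_TCE stores an entry for the page containing the given I/O bus address (IOBA). A DMA then replaces the page-number bits of the IOBA with those of the TCE and keeps the in-page offset, provided the low permission bits (1 = read, 2 = write, per the SPAPR_TCE_* enum) allow the access. Below is a self-contained sketch of that lookup, using hypothetical names and calloc() in place of qemu_mallocz().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TCE_PAGE_SHIFT 12
#define TCE_PAGE_SIZE  (1ULL << TCE_PAGE_SHIFT)
#define TCE_PAGE_MASK  (TCE_PAGE_SIZE - 1)
#define TCE_READ  1ULL   /* SPAPR_TCE_RO bit */
#define TCE_WRITE 2ULL   /* SPAPR_TCE_WO bit */

struct tce_window_sketch {
    uint32_t window_size;  /* bytes of I/O address space */
    uint64_t *table;       /* one TCE per 4 KiB page, zeroed = no access */
};

static int tce_window_init_sketch(struct tce_window_sketch *w, uint32_t size)
{
    w->window_size = size;
    w->table = calloc(size >> TCE_PAGE_SHIFT, sizeof(uint64_t));
    return w->table ? 0 : -1;
}

/* Like h_put_tce(): align the IOBA down to a page, bounds-check, store. */
static int tce_put_sketch(struct tce_window_sketch *w, uint64_t ioba,
                          uint64_t tce)
{
    ioba &= ~TCE_PAGE_MASK;
    if (ioba >= w->window_size) {
        return -1;
    }
    w->table[ioba >> TCE_PAGE_SHIFT] = tce;
    return 0;
}

/* Translate one IOBA to a guest physical address if the TCE grants all
 * bits in perm. Returns 0 and sets *gpa on success, -1 otherwise. */
static int tce_translate_sketch(const struct tce_window_sketch *w,
                                uint64_t ioba, uint64_t perm, uint64_t *gpa)
{
    uint64_t tce;

    if (ioba >= w->window_size) {
        return -1;
    }
    tce = w->table[ioba >> TCE_PAGE_SHIFT];
    if ((tce & perm) != perm) {
        return -1;
    }
    *gpa = (tce & ~TCE_PAGE_MASK) | (ioba & TCE_PAGE_MASK);
    return 0;
}
```

An unmapped (zero) entry fails both read and write checks because neither permission bit is set, which matches how spapr_tce_dma_read() and spapr_tce_dma_write() test the TCE before copying.
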

Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 hw/spapr.c     |    3 +-
 hw/spapr_vio.c |  232 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/spapr_vio.h |   32 ++++++++
 3 files changed, 266 insertions(+), 1 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index e7f8864..a362889 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -62,7 +62,8 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
     uint32_t start_prop = cpu_to_be32(initrd_base);
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
     uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
-    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt";
+    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
+        "\0hcall-tce";
     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
     int i;
     char *modelname;
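
hypertas_prop is a device-tree string list: several NUL-terminated strings packed into one buffer. Adjacent string literals are concatenated at compile time, so the two-line initializer above still forms a single array, and sizeof(hypertas_prop) counts every embedded NUL plus the implicit final one. The sketch below copies the literal under a hypothetical name and walks it entry by entry; how spapr_create_fdt() passes the length to libfdt is not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same literal as the patched hypertas_prop. */
static const char hypertas_sketch[] =
    "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
    "\0hcall-tce";

/* Count the NUL-terminated entries in a string-list buffer of len bytes. */
static int count_strings_sketch(const char *buf, size_t len)
{
    size_t pos = 0;
    int n = 0;

    while (pos < len) {
        pos += strlen(buf + pos) + 1;  /* skip the entry and its NUL */
        n++;
    }
    return n;
}
```

The buffer is 58 bytes: 53 characters of names plus five terminators.
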
diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
index 45edd94..37cf51e 100644
--- a/hw/spapr_vio.c
+++ b/hw/spapr_vio.c
@@ -37,6 +37,7 @@
 #endif /* CONFIG_FDT */
 
 /* #define DEBUG_SPAPR */
+/* #define DEBUG_TCE */
 
 #ifdef DEBUG_SPAPR
 #define dprintf(fmt, ...) \
@@ -114,6 +115,28 @@ static int vio_make_devnode(VIOsPAPRDevice *dev,
             return ret;
     }
 
+    if (dev->rtce_window_size) {
+        uint32_t dma_prop[] = {cpu_to_be32(dev->reg),
+                               0, 0,
+                               0, cpu_to_be32(dev->rtce_window_size)};
+
+        ret = fdt_setprop_cell(fdt, node_off, "ibm,#dma-address-cells", 2);
+        if (ret < 0) {
+            return ret;
+        }
+
+        ret = fdt_setprop_cell(fdt, node_off, "ibm,#dma-size-cells", 2);
+        if (ret < 0) {
+            return ret;
+        }
+
+        ret = fdt_setprop(fdt, node_off, "ibm,my-dma-window", dma_prop,
+                          sizeof(dma_prop));
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
     if (info->devnode) {
         ret = (info->devnode)(dev, fdt, node_off);
         if (ret < 0) {
@@ -125,6 +148,210 @@ static int vio_make_devnode(VIOsPAPRDevice *dev,
 }
 #endif /* CONFIG_FDT */
 
+/*
+ * RTCE handling
+ */
+
+static void rtce_init(VIOsPAPRDevice *dev)
+{
+    size_t size = (dev->rtce_window_size >> SPAPR_VIO_TCE_PAGE_SHIFT)
+        * sizeof(VIOsPAPR_RTCE);
+
+    if (size) {
+        dev->rtce_table = qemu_mallocz(size);
+    }
+}
+
+static target_ulong h_put_tce(CPUState *env, sPAPREnvironment *spapr,
+                              target_ulong opcode, target_ulong *args)
+{
+    target_ulong liobn = args[0];
+    target_ulong ioba = args[1];
+    target_ulong tce = args[2];
+    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, liobn);
+    VIOsPAPR_RTCE *rtce;
+
+    if (!dev) {
+        fprintf(stderr, "spapr_vio_put_tce on non-existent LIOBN "
+                TARGET_FMT_lx "\n",
+                liobn);
+        return H_PARAMETER;
+    }
+
+    ioba &= ~(SPAPR_VIO_TCE_PAGE_SIZE - 1);
+
+#ifdef DEBUG_TCE
+    fprintf(stderr, "spapr_vio_put_tce on %s  ioba 0x" TARGET_FMT_lx
+            "  TCE 0x" TARGET_FMT_lx "\n", dev->qdev.id, ioba, tce);
+#endif
+
+    if (ioba >= dev->rtce_window_size) {
+        fprintf(stderr, "spapr_vio_put_tce on out-of-bounds IOBA 0x" TARGET_FMT_lx "\n",
+                ioba);
+        return H_PARAMETER;
+    }
+
+    rtce = dev->rtce_table + (ioba >> SPAPR_VIO_TCE_PAGE_SHIFT);
+    rtce->tce = tce;
+
+    return H_SUCCESS;
+}
+
+int spapr_vio_check_tces(VIOsPAPRDevice *dev, target_ulong ioba,
+                         target_ulong len, enum VIOsPAPR_TCEAccess access)
+{
+    int start, end, i;
+
+    start = ioba >> SPAPR_VIO_TCE_PAGE_SHIFT;
+    end = (ioba + len - 1) >> SPAPR_VIO_TCE_PAGE_SHIFT;
+
+    for (i = start; i <= end; i++) {
+        if ((dev->rtce_table[i].tce & access) != access) {
+            fprintf(stderr, "spapr_vio_check_tces: bad TCE %d\n", i);
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+/* XXX: might want to special-case KVM for speed? */
+int spapr_tce_dma_write(VIOsPAPRDevice *dev, uint64_t taddr, const void *buf,
+                        uint32_t size)
+{
+#ifdef DEBUG_TCE
+    fprintf(stderr, "spapr_tce_dma_write taddr=0x%llx size=0x%x\n",
+            (unsigned long long)taddr, size);
+#endif
+
+    while (size) {
+        uint64_t tce;
+        uint32_t lsize;
+        uint64_t txaddr;
+
+        /* Check if we are in bounds */
+        if (taddr >= dev->rtce_window_size) {
+            fprintf(stderr, "spapr_tce_dma_write out of bounds\n");
+            return -H_DEST_PARM;
+        }
+        tce = dev->rtce_table[taddr >> SPAPR_VIO_TCE_PAGE_SHIFT].tce;
+
+        /* How much until end of page? */
+        lsize = MIN(size, ((~taddr) & SPAPR_VIO_TCE_PAGE_MASK) + 1);
+
+        /* Check that the TCE permits writes */
+        if (!(tce & SPAPR_TCE_WO))
+            return -H_DEST_PARM;
+
+        /* Translate */
+        txaddr = (tce & ~SPAPR_VIO_TCE_PAGE_MASK) | (taddr & SPAPR_VIO_TCE_PAGE_MASK);
+
+#ifdef DEBUG_TCE
+        fprintf(stderr, " -> write to txaddr=0x%llx, size=0x%x\n",
+                (unsigned long long)txaddr, lsize);
+#endif
+
+        /* Do it */
+        cpu_physical_memory_write(txaddr, buf, lsize);
+        buf += lsize;
+        taddr += lsize;
+        size -= lsize;
+    }
+    return 0;
+}
+
+/* XXX: might want to special-case KVM for speed? */
+int spapr_tce_dma_zero(VIOsPAPRDevice *dev, uint64_t taddr, uint32_t size)
+{
+    uint8_t *zeroes;
+
+#ifdef DEBUG_TCE
+    fprintf(stderr, "spapr_tce_dma_zero taddr=0x%llx size=0x%x\n",
+            (unsigned long long)taddr, size);
+#endif
+
+    /* FIXME: do this better... */
+    zeroes = alloca(size);
+    memset(zeroes, 0, size);
+    return spapr_tce_dma_write(dev, taddr, zeroes, size);
+}
+
+void stb_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint8_t val)
+{
+    spapr_tce_dma_write(dev, taddr, &val, sizeof(val));
+}
+
+void sth_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint16_t val)
+{
+    val = tswap16(val);
+    spapr_tce_dma_write(dev, taddr, &val, sizeof(val));
+}
+
+
+void stw_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint32_t val)
+{
+    val = tswap32(val);
+    spapr_tce_dma_write(dev, taddr, &val, sizeof(val));
+}
+
+void stq_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint64_t val)
+{
+    val = tswap64(val);
+    spapr_tce_dma_write(dev, taddr, &val, sizeof(val));
+}
+
+int spapr_tce_dma_read(VIOsPAPRDevice *dev, uint64_t taddr, void *buf,
+                       uint32_t size)
+{
+#ifdef DEBUG_TCE
+    fprintf(stderr, "spapr_tce_dma_read taddr=0x%llx size=0x%x\n",
+            (unsigned long long)taddr, size);
+#endif
+
+    while (size) {
+        uint64_t tce;
+        uint32_t lsize;
+        uint64_t txaddr;
+
+        /* Check if we are in bounds */
+        if (taddr >= dev->rtce_window_size) {
+            fprintf(stderr, "spapr_tce_dma_read out of bounds\n");
+            return -H_DEST_PARM;
+        }
+        tce = dev->rtce_table[taddr >> SPAPR_VIO_TCE_PAGE_SHIFT].tce;
+
+        /* How much until end of page? */
+        lsize = MIN(size, ((~taddr) & SPAPR_VIO_TCE_PAGE_MASK) + 1);
+
+        /* Check that the TCE permits reads */
+        if (!(tce & SPAPR_TCE_RO)) {
+            return -H_DEST_PARM;
+        }
+
+        /* Translate */
+        txaddr = (tce & ~SPAPR_VIO_TCE_PAGE_MASK) | (taddr & SPAPR_VIO_TCE_PAGE_MASK);
+
+#ifdef DEBUG_TCE
+        fprintf(stderr, " -> read from txaddr=0x%llx, size=0x%x\n",
+                (unsigned long long)txaddr, lsize);
+#endif
+        /* Do it */
+        cpu_physical_memory_read(txaddr, buf, lsize);
+        buf += lsize;
+        taddr += lsize;
+        size -= lsize;
+    }
+    return H_SUCCESS;
+}
+
+uint64_t ldq_tce(VIOsPAPRDevice *dev, uint64_t taddr)
+{
+    uint64_t val = 0;
+
+    spapr_tce_dma_read(dev, taddr, &val, sizeof(val));
+    return tswap64(val);
+}
+
 static int spapr_vio_busdev_init(DeviceState *dev, DeviceInfo *info)
 {
     VIOsPAPRDeviceInfo *_info = (VIOsPAPRDeviceInfo *)info;
@@ -137,6 +364,8 @@ static int spapr_vio_busdev_init(DeviceState *dev, DeviceInfo *info)
 
     _dev->qdev.id = id;
 
+    rtce_init(_dev);
+
     return _info->init(_dev);
 }
 
@@ -190,6 +419,9 @@ VIOsPAPRBus *spapr_vio_bus_init(void)
     /* hcall-vio */
     spapr_register_hypercall(H_VIO_SIGNAL, h_vio_signal);
 
+    /* hcall-tce */
+    spapr_register_hypercall(H_PUT_TCE, h_put_tce);
+
     for (_info = device_info_list; _info; _info = _info->next) {
         VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)_info;
 
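
spapr_tce_dma_write() and spapr_tce_dma_read() walk a transfer one TCE page at a time, because consecutive I/O pages may map to unrelated guest pages. The expression ((~taddr) & SPAPR_VIO_TCE_PAGE_MASK) + 1 is the number of bytes from taddr to the end of its 4 KiB page. Here is a sketch of just the splitting step, with hypothetical names; for example, a 0x30-byte transfer starting at 0xff0 becomes chunks of 0x10 and 0x20 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MASK_SKETCH 0xfffULL  /* SPAPR_VIO_TCE_PAGE_MASK for 4 KiB pages */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* Split [taddr, taddr + size) at page boundaries, storing each chunk
 * length in chunks[] (at most max entries). Returns the chunk count. */
static int split_by_page_sketch(uint64_t taddr, uint32_t size,
                                uint32_t *chunks, int max)
{
    int n = 0;

    while (size && n < max) {
        /* (~taddr & mask) + 1 == bytes left in the current page */
        uint32_t lsize = (uint32_t)MIN_SKETCH((uint64_t)size,
                                              ((~taddr) & PAGE_MASK_SKETCH) + 1);

        chunks[n++] = lsize;
        taddr += lsize;
        size -= lsize;
    }
    return n;
}
```

In the real functions each chunk is then translated through its own TCE and copied with cpu_physical_memory_write() or cpu_physical_memory_read().
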
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
index 2013927..1b15d3e 100644
--- a/hw/spapr_vio.h
+++ b/hw/spapr_vio.h
@@ -21,12 +21,29 @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
 
+#define SPAPR_VIO_TCE_PAGE_SHIFT	12
+#define SPAPR_VIO_TCE_PAGE_SIZE		(1ULL << SPAPR_VIO_TCE_PAGE_SHIFT)
+#define SPAPR_VIO_TCE_PAGE_MASK		(SPAPR_VIO_TCE_PAGE_SIZE - 1)
+
+enum VIOsPAPR_TCEAccess {
+    SPAPR_TCE_FAULT = 0,
+    SPAPR_TCE_RO = 1,
+    SPAPR_TCE_WO = 2,
+    SPAPR_TCE_RW = 3,
+};
+
+typedef struct VIOsPAPR_RTCE {
+    uint64_t tce;
+} VIOsPAPR_RTCE;
+
 typedef struct VIOsPAPRDevice {
     DeviceState qdev;
     uint32_t reg;
     qemu_irq qirq;
     uint32_t vio_irq_num;
     target_ulong signal_state;
+    uint32_t rtce_window_size;
+    VIOsPAPR_RTCE *rtce_table;
 } VIOsPAPRDevice;
 
 typedef struct VIOsPAPRBus {
@@ -49,6 +66,21 @@ extern int spapr_populate_vdevice(VIOsPAPRBus *bus, void *fdt);
 
 extern int spapr_vio_signal(VIOsPAPRDevice *dev, target_ulong mode);
 
+int spapr_vio_check_tces(VIOsPAPRDevice *dev, target_ulong ioba,
+                         target_ulong len,
+                         enum VIOsPAPR_TCEAccess access);
+
+int spapr_tce_dma_read(VIOsPAPRDevice *dev, uint64_t taddr,
+                       void *buf, uint32_t size);
+int spapr_tce_dma_write(VIOsPAPRDevice *dev, uint64_t taddr,
+                        const void *buf, uint32_t size);
+int spapr_tce_dma_zero(VIOsPAPRDevice *dev, uint64_t taddr, uint32_t size);
+void stb_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint8_t val);
+void sth_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint16_t val);
+void stw_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint32_t val);
+void stq_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint64_t val);
+uint64_t ldq_tce(VIOsPAPRDevice *dev, uint64_t taddr);
+
 void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len);
 void spapr_vty_create(VIOsPAPRBus *bus,
                       uint32_t reg, CharDriverState *chardev,
-- 
1.7.1


* [Qemu-devel] [PATCH 22/26] Implement sPAPR Virtual LAN (ibmveth)
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (20 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 21/26] Implement TCE translation for sPAPR VIO David Gibson
@ 2011-03-16  4:56 ` David Gibson
  2011-03-16 16:12   ` [Qemu-devel] " Alexander Graf
  2011-03-16 22:29   ` [Qemu-devel] " Anthony Liguori
  2011-03-16  4:57 ` [Qemu-devel] [PATCH 23/26] Implement PAPR CRQ hypercalls David Gibson
                   ` (3 subsequent siblings)
  25 siblings, 2 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:56 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

This patch implements the PAPR-specified Inter Virtual Machine Logical
LAN, that is, the virtual hardware used by the Linux ibmveth driver.
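
The device and the guest exchange 64-bit buffer descriptors. Going by the VLAN_BD_* masks in spapr_llan.c, bit 63 marks a descriptor valid, bits 32 to 55 hold a 24-bit length, and bits 0 to 31 hold a 32-bit I/O address. A receive buffer must hold size + 8 bytes, because spapr_vlan_receive() reads the guest's 8-byte handle from the start of the buffer and DMAs the packet after it. A sketch with hypothetical names follows; unlike the VLAN_VALID_BD() macro, it widens len to 64 bits before shifting, since shifting a 32-bit value by 32 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

#define BD_VALID     0x8000000000000000ULL
#define BD_LEN_MASK  0x00ffffff00000000ULL
#define BD_ADDR_MASK 0x00000000ffffffffULL

static uint64_t bd_make_sketch(uint32_t addr, uint32_t len)
{
    /* Lengths above 24 bits are silently truncated by the mask. */
    return BD_VALID | (((uint64_t)len << 32) & BD_LEN_MASK)
           | (addr & BD_ADDR_MASK);
}

static uint32_t bd_len_sketch(uint64_t bd)
{
    return (uint32_t)((bd & BD_LEN_MASK) >> 32);
}

static uint32_t bd_addr_sketch(uint64_t bd)
{
    return (uint32_t)(bd & BD_ADDR_MASK);
}

/* Same test as the buffer search in spapr_vlan_receive(). */
static int bd_fits_sketch(uint64_t bd, uint32_t pkt_size)
{
    return (bd & BD_VALID) && bd_len_sketch(bd) >= (uint64_t)pkt_size + 8;
}
```
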

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 Makefile.target |    2 +-
 hw/spapr.c      |   21 +++-
 hw/spapr_llan.c |  476 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/spapr_vio.h  |    9 +-
 4 files changed, 503 insertions(+), 5 deletions(-)
 create mode 100644 hw/spapr_llan.c

diff --git a/Makefile.target b/Makefile.target
index 2b0588e..ef86d43 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -233,7 +233,7 @@ obj-ppc-y += ppc_oldworld.o
 obj-ppc-y += ppc_newworld.o
 # IBM pSeries (sPAPR)
 obj-ppc-y += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
-obj-ppc-y += xics.o spapr_vty.o
+obj-ppc-y += xics.o spapr_vty.o spapr_llan.o
 # PowerPC 4xx boards
 obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
 obj-ppc-y += ppc440.o ppc440_bamboo.o
diff --git a/hw/spapr.c b/hw/spapr.c
index a362889..44cf3cc 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -27,6 +27,7 @@
 #include "sysemu.h"
 #include "hw.h"
 #include "elf.h"
+#include "net.h"
 
 #include "hw/boards.h"
 #include "hw/ppc.h"
@@ -315,7 +316,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     qemu_free(filename);
 
     /* Set up Interrupt Controller */
-    spapr->icp = xics_system_init(smp_cpus, &env, MAX_SERIAL_PORTS);
+    spapr->icp = xics_system_init(smp_cpus, envs, MAX_SERIAL_PORTS + nb_nics);
 
     /* Set up VIO bus */
     spapr->vio_bus = spapr_vio_bus_init();
@@ -327,6 +328,24 @@ static void ppc_spapr_init(ram_addr_t ram_size,
         }
     }
 
+    for (i = 0; i < nb_nics; i++, irq++) {
+        NICInfo *nd = &nd_table[i];
+
+        if (!nd->model) {
+            nd->model = qemu_strdup("ibmveth");
+        }
+
+        if (strcmp(nd->model, "ibmveth") == 0) {
+            spapr_vlan_create(spapr->vio_bus, 0x1000 + i, nd,
+                              xics_find_qirq(spapr->icp, irq), irq);
+        } else {
+            fprintf(stderr, "pSeries (sPAPR) platform does not support "
+                    "NIC model '%s' (only ibmveth is supported)\n",
+                    nd->model);
+            exit(1);
+        }
+    }
+
     if (kernel_filename) {
         uint64_t lowaddr = 0;
 
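
spapr_vlan_receive() in the new file below writes 16-byte entries into the guest's receive queue. The first byte of each entry carries VLAN_RXQC_VALID plus a toggle bit that is the inverse of the queue descriptor's VLAN_BD_TOGGLE, and the descriptor's toggle is flipped each time the producer wraps. The entry toggle therefore alternates from one pass over the queue to the next, presumably so the guest can tell fresh entries from stale ones. A sketch of the producer side only, with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define RXQC_TOGGLE 0x80
#define RXQC_VALID  0x40
#define RXQ_ENTRY   16   /* bytes per receive-queue entry */

struct rxq_sketch {
    uint32_t len;      /* queue length in bytes (VLAN_BD_LEN of the rxq BD) */
    uint32_t ptr;      /* producer offset, like dev->rxq_ptr */
    int bd_toggle;     /* VLAN_BD_TOGGLE bit of the stored rxq BD */
};

/* Return the control byte for the next entry and advance the producer,
 * flipping the descriptor toggle on wrap. */
static uint8_t rxq_produce_sketch(struct rxq_sketch *q)
{
    uint8_t control = RXQC_TOGGLE | RXQC_VALID;

    if (q->bd_toggle) {
        control ^= RXQC_TOGGLE;
    }
    q->ptr += RXQ_ENTRY;
    if (q->ptr >= q->len) {
        q->ptr = 0;
        q->bd_toggle = !q->bd_toggle;
    }
    return control;
}
```

With a two-entry queue, the first pass yields control bytes 0xc0, and after the wrap the entries carry 0x40.
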
diff --git a/hw/spapr_llan.c b/hw/spapr_llan.c
new file mode 100644
index 0000000..da0562d
--- /dev/null
+++ b/hw/spapr_llan.c
@@ -0,0 +1,476 @@
+#include "hw.h"
+#include "net.h"
+#include "hw/qdev.h"
+#include "hw/spapr.h"
+#include "hw/spapr_vio.h"
+
+#include <libfdt.h>
+
+#define ETH_ALEN        6
+
+//#define DEBUG
+
+#ifdef DEBUG
+#define dprintf(fmt...) do { fprintf(stderr, fmt); } while(0)
+#else
+#define dprintf(fmt...)
+#endif
+
+/*
+ * Virtual LAN device
+ */
+
+typedef uint64_t vlan_bd_t;
+
+#define VLAN_BD_VALID        0x8000000000000000ULL
+#define VLAN_BD_TOGGLE       0x4000000000000000ULL
+#define VLAN_BD_NO_CSUM      0x0200000000000000ULL
+#define VLAN_BD_CSUM_GOOD    0x0100000000000000ULL
+#define VLAN_BD_LEN_MASK     0x00ffffff00000000ULL
+#define VLAN_BD_LEN(bd)      (((bd) & VLAN_BD_LEN_MASK) >> 32)
+#define VLAN_BD_ADDR_MASK    0x00000000ffffffffULL
+#define VLAN_BD_ADDR(bd)     ((bd) & VLAN_BD_ADDR_MASK)
+
+#define VLAN_VALID_BD(addr, len) (VLAN_BD_VALID | \
+                                  (((len) << 32) & VLAN_BD_LEN_MASK) |  \
+                                  ((addr) & VLAN_BD_ADDR_MASK))
+
+#define VLAN_RXQC_TOGGLE     0x80
+#define VLAN_RXQC_VALID      0x40
+#define VLAN_RXQC_NO_CSUM    0x02
+#define VLAN_RXQC_CSUM_GOOD  0x01
+
+#define VLAN_RQ_ALIGNMENT    16
+#define VLAN_RXQ_BD_OFF      0
+#define VLAN_FILTER_BD_OFF   8
+#define VLAN_RX_BDS_OFF      16
+#define VLAN_MAX_BUFS        ((SPAPR_VIO_TCE_PAGE_SIZE - VLAN_RX_BDS_OFF) / 8)
+
+typedef struct VIOsPAPRVLANDevice {
+    VIOsPAPRDevice sdev;
+    NICConf nicconf;
+    NICState *nic;
+    int isopen;
+    target_ulong buf_list;
+    int add_buf_ptr, use_buf_ptr, rx_bufs;
+    target_ulong rxq_ptr;
+} VIOsPAPRVLANDevice;
+
+static int spapr_vlan_can_receive(VLANClientState *nc)
+{
+    VIOsPAPRVLANDevice *dev = DO_UPCAST(NICState, nc, nc)->opaque;
+
+    return (dev->isopen && dev->rx_bufs > 0);
+}
+
+static ssize_t spapr_vlan_receive(VLANClientState *nc, const uint8_t *buf,
+                                  size_t size)
+{
+    VIOsPAPRDevice *sdev = DO_UPCAST(NICState, nc, nc)->opaque;
+    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
+    vlan_bd_t rxq_bd = ldq_tce(sdev, dev->buf_list + VLAN_RXQ_BD_OFF);
+    vlan_bd_t bd;
+    int buf_ptr = dev->use_buf_ptr;
+    uint64_t handle;
+    uint8_t control;
+
+    dprintf("spapr_vlan_receive() [%s] rx_bufs=%d\n", sdev->qdev.id,
+            dev->rx_bufs);
+
+    if (!dev->isopen) {
+        return -1;
+    }
+
+    if (!dev->rx_bufs) {
+        return -1;
+    }
+
+    do {
+        buf_ptr += 8;
+        if (buf_ptr >= SPAPR_VIO_TCE_PAGE_SIZE) {
+            buf_ptr = VLAN_RX_BDS_OFF;
+        }
+
+        bd = ldq_tce(sdev, dev->buf_list + buf_ptr);
+        dprintf("use_buf_ptr=%d bd=0x%016llx\n",
+                buf_ptr, (unsigned long long)bd);
+    } while ((!(bd & VLAN_BD_VALID) || (VLAN_BD_LEN(bd) < (size + 8)))
+             && (buf_ptr != dev->use_buf_ptr));
+
+    if (!(bd & VLAN_BD_VALID) || (VLAN_BD_LEN(bd) < (size + 8))) {
+        /* Failed to find a suitable buffer */
+        return -1;
+    }
+
+    /* Remove the buffer from the pool */
+    dev->rx_bufs--;
+    dev->use_buf_ptr = buf_ptr;
+    stq_tce(sdev, dev->buf_list + dev->use_buf_ptr, 0);
+
+    dprintf("Found buffer: ptr=%d num=%d\n", dev->use_buf_ptr, dev->rx_bufs);
+
+    /* Transfer the packet data */
+    if (spapr_tce_dma_write(sdev, VLAN_BD_ADDR(bd) + 8, buf, size) < 0) {
+        return -1;
+    }
+
+    dprintf("spapr_vlan_receive: DMA write completed\n");
+
+    /* Update the receive queue */
+    control = VLAN_RXQC_TOGGLE | VLAN_RXQC_VALID;
+    if (rxq_bd & VLAN_BD_TOGGLE) {
+        control ^= VLAN_RXQC_TOGGLE;
+    }
+
+    handle = ldq_tce(sdev, VLAN_BD_ADDR(bd));
+    stq_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 8, handle);
+    stw_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 4, size);
+    sth_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 2, 8);
+    stb_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr, control);
+
+    dprintf("wrote rxq entry (ptr=0x%llx): 0x%016llx 0x%016llx\n",
+            (unsigned long long)dev->rxq_ptr,
+            (unsigned long long)ldq_tce(sdev, VLAN_BD_ADDR(rxq_bd) +
+                                        dev->rxq_ptr),
+            (unsigned long long)ldq_tce(sdev, VLAN_BD_ADDR(rxq_bd) +
+                                        dev->rxq_ptr + 8));
+
+    dev->rxq_ptr += 16;
+    if (dev->rxq_ptr >= VLAN_BD_LEN(rxq_bd)) {
+        dev->rxq_ptr = 0;
+        stq_tce(sdev, dev->buf_list + VLAN_RXQ_BD_OFF, rxq_bd ^ VLAN_BD_TOGGLE);
+    }
+
+    if (sdev->signal_state & 1) {
+        qemu_irq_pulse(sdev->qirq);
+    }
+
+    return size;
+}
+
+static NetClientInfo net_spapr_vlan_info = {
+    .type = NET_CLIENT_TYPE_NIC,
+    .size = sizeof(NICState),
+    .can_receive = spapr_vlan_can_receive,
+    .receive = spapr_vlan_receive,
+};
+
+static int spapr_vlan_init(VIOsPAPRDevice *sdev)
+{
+    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
+    VIOsPAPRBus *bus;
+
+    bus = DO_UPCAST(VIOsPAPRBus, bus, sdev->qdev.parent_bus);
+
+    qemu_macaddr_default_if_unset(&dev->nicconf.macaddr);
+
+    dev->nic = qemu_new_nic(&net_spapr_vlan_info, &dev->nicconf,
+                            sdev->qdev.info->name, sdev->qdev.id, dev);
+    qemu_format_nic_info_str(&dev->nic->nc, dev->nicconf.macaddr.a);
+
+    return 0;
+}
+
+void spapr_vlan_create(VIOsPAPRBus *bus, uint32_t reg, NICInfo *nd,
+                       qemu_irq qirq, uint32_t vio_irq_num)
+{
+    DeviceState *dev;
+    VIOsPAPRDevice *sdev;
+
+    dev = qdev_create(&bus->bus, "spapr-vlan");
+    qdev_prop_set_uint32(dev, "reg", reg);
+
+    qdev_set_nic_properties(dev, nd);
+
+    qdev_init_nofail(dev);
+    sdev = (VIOsPAPRDevice *)dev;
+    sdev->qirq = qirq;
+    sdev->vio_irq_num = vio_irq_num;
+}
+
+static int spapr_vlan_devnode(VIOsPAPRDevice *dev, void *fdt, int node_off)
+{
+    VIOsPAPRVLANDevice *vdev = (VIOsPAPRVLANDevice *)dev;
+    int ret;
+
+    ret = fdt_setprop(fdt, node_off, "local-mac-address",
+                      &vdev->nicconf.macaddr, ETH_ALEN);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = fdt_setprop_cell(fdt, node_off, "ibm,mac-address-filters", 0);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return 0;
+}
+
+static int check_bd(VIOsPAPRVLANDevice *dev, vlan_bd_t bd, target_ulong alignment)
+{
+    if ((VLAN_BD_ADDR(bd) % alignment)
+        || (VLAN_BD_LEN(bd) % alignment)) {
+        return -1;
+    }
+
+    if (spapr_vio_check_tces(&dev->sdev, VLAN_BD_ADDR(bd),
+                             VLAN_BD_LEN(bd), SPAPR_TCE_RW) != 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static target_ulong h_register_logical_lan(CPUState *env, sPAPREnvironment *spapr,
+                                           target_ulong opcode,
+                                           target_ulong *args)
+{
+    target_ulong reg = args[0];
+    target_ulong buf_list = args[1];
+    target_ulong rec_queue = args[2];
+    target_ulong filter_list = args[3];
+//    target_ulong mac_address = args[4];
+    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
+    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
+    vlan_bd_t filter_list_bd;
+#ifdef DEBUG
+    target_ulong mac_address = args[4];
+#endif
+
+    if (!dev) {
+        return H_PARAMETER;
+    }
+
+    if (dev->isopen) {
+        fprintf(stderr, "H_REGISTER_LOGICAL_LAN called twice without "
+                "H_FREE_LOGICAL_LAN\n");
+        return H_RESOURCE;
+    }
+
+    if (check_bd(dev, VLAN_VALID_BD(buf_list, SPAPR_VIO_TCE_PAGE_SIZE),
+                 SPAPR_VIO_TCE_PAGE_SIZE) < 0) {
+        fprintf(stderr, "Bad buf_list 0x" TARGET_FMT_lx
+                " for H_REGISTER_LOGICAL_LAN\n", buf_list);
+        return H_PARAMETER;
+    }
+
+    filter_list_bd = VLAN_VALID_BD(filter_list, SPAPR_VIO_TCE_PAGE_SIZE);
+    if (check_bd(dev, filter_list_bd, SPAPR_VIO_TCE_PAGE_SIZE) < 0) {
+        fprintf(stderr, "Bad filter_list 0x" TARGET_FMT_lx
+                " for H_REGISTER_LOGICAL_LAN\n", filter_list);
+        return H_PARAMETER;
+    }
+
+    if (!(rec_queue & VLAN_BD_VALID)
+        || (check_bd(dev, rec_queue, VLAN_RQ_ALIGNMENT) < 0)) {
+        fprintf(stderr, "Bad receive queue for H_REGISTER_LOGICAL_LAN\n");
+        return H_PARAMETER;
+    }
+
+    dev->buf_list = buf_list;
+    sdev->signal_state = 0;
+
+    rec_queue &= ~VLAN_BD_TOGGLE;
+
+    /* Initialize the buffer list */
+    stq_tce(sdev, buf_list, rec_queue);
+    stq_tce(sdev, buf_list + 8, filter_list_bd);
+    spapr_tce_dma_zero(sdev, buf_list + VLAN_RX_BDS_OFF,
+                       SPAPR_VIO_TCE_PAGE_SIZE - VLAN_RX_BDS_OFF);
+    dev->add_buf_ptr = VLAN_RX_BDS_OFF - 8;
+    dev->use_buf_ptr = VLAN_RX_BDS_OFF - 8;
+    dev->rx_bufs = 0;
+    dev->rxq_ptr = 0;
+
+    /* Initialize the receive queue */
+    spapr_tce_dma_zero(sdev, VLAN_BD_ADDR(rec_queue), VLAN_BD_LEN(rec_queue));
+
+    dev->isopen = 1;
+    return H_SUCCESS;
+}
+
+
+static target_ulong h_free_logical_lan(CPUState *env, sPAPREnvironment *spapr,
+                                       target_ulong opcode, target_ulong *args)
+{
+    target_ulong reg = args[0];
+    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
+    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
+
+    if (!dev) {
+        return H_PARAMETER;
+    }
+
+    if (!dev->isopen) {
+        fprintf(stderr, "H_FREE_LOGICAL_LAN called without "
+                "H_REGISTER_LOGICAL_LAN\n");
+        return H_RESOURCE;
+    }
+
+    dev->buf_list = 0;
+    dev->rx_bufs = 0;
+    dev->isopen = 0;
+    return H_SUCCESS;
+}
+
+static target_ulong h_add_logical_lan_buffer(CPUState *env, sPAPREnvironment *spapr,
+                                             target_ulong opcode, target_ulong *args)
+{
+    target_ulong reg = args[0];
+    target_ulong buf = args[1];
+    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
+    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
+    vlan_bd_t bd;
+
+    dprintf("H_ADD_LOGICAL_LAN_BUFFER(0x" TARGET_FMT_lx
+            ", 0x" TARGET_FMT_lx ")\n", reg, buf);
+
+    if (!sdev) {
+        fprintf(stderr, "Wrong device in h_add_logical_lan_buffer\n");
+        return H_PARAMETER;
+    }
+
+    if ((check_bd(dev, buf, 4) < 0)
+        || (VLAN_BD_LEN(buf) < 16)) {
+        fprintf(stderr, "Bad buffer enqueued in h_add_logical_lan_buffer\n");
+        return H_PARAMETER;
+    }
+
+    if (!dev->isopen || dev->rx_bufs >= VLAN_MAX_BUFS) {
+        return H_RESOURCE;
+    }
+
+    do {
+        dev->add_buf_ptr += 8;
+        if (dev->add_buf_ptr >= SPAPR_VIO_TCE_PAGE_SIZE) {
+            dev->add_buf_ptr = VLAN_RX_BDS_OFF;
+        }
+
+        bd = ldq_tce(sdev, dev->buf_list + dev->add_buf_ptr);
+    } while (bd & VLAN_BD_VALID);
+
+    stq_tce(sdev, dev->buf_list + dev->add_buf_ptr, buf);
+
+    dev->rx_bufs++;
+
+    dprintf("h_add_logical_lan_buffer():  Added buf  ptr=%d  rx_bufs=%d"
+            " bd=0x%016llx\n", dev->add_buf_ptr, dev->rx_bufs,
+            (unsigned long long)buf);
+
+    return H_SUCCESS;
+}
+
+static target_ulong h_send_logical_lan(CPUState *env, sPAPREnvironment *spapr,
+                                       target_ulong opcode, target_ulong *args)
+{
+    target_ulong reg = args[0];
+    target_ulong *bufs = args + 1;
+    target_ulong continue_token = args[7];
+    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
+    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
+    unsigned total_len;
+    uint8_t *lbuf, *p;
+    int i, nbufs;
+    int ret = H_SUCCESS;
+
+    dprintf("H_SEND_LOGICAL_LAN(0x" TARGET_FMT_lx ", <bufs>, 0x"
+            TARGET_FMT_lx ")\n", reg, continue_token);
+
+    if (!sdev) {
+        return H_PARAMETER;
+    }
+
+    dprintf("rxbufs = %d\n", dev->rx_bufs);
+
+    if (!dev->isopen) {
+        return H_DROPPED;
+    }
+
+    if (continue_token) {
+        return H_HARDWARE; /* FIXME actually handle this */
+    }
+
+    total_len = 0;
+    for (i = 0; i < 6; i++) {
+        dprintf("   buf desc: 0x" TARGET_FMT_lx "\n", bufs[i]);
+        if (!(bufs[i] & VLAN_BD_VALID)) {
+            break;
+        }
+        total_len += VLAN_BD_LEN(bufs[i]);
+    }
+
+    nbufs = i;
+    dprintf("h_send_logical_lan() %d buffers, total length 0x%x\n",
+            nbufs, total_len);
+
+    if (total_len == 0) {
+        return ret;
+    }
+
+    lbuf = qemu_mallocz(total_len);
+    p = lbuf;
+    for (i = 0; i < nbufs; i++) {
+        ret = spapr_tce_dma_read(sdev, VLAN_BD_ADDR(bufs[i]),
+                                 p, VLAN_BD_LEN(bufs[i]));
+        if (ret < 0) {
+            goto out;
+        }
+
+        p += VLAN_BD_LEN(bufs[i]);
+    }
+
+    qemu_send_packet(&dev->nic->nc, lbuf, total_len);
+
+out:
+    qemu_free(lbuf);
+
+    return ret;
+}
+
+static target_ulong h_multicast_ctrl(CPUState *env, sPAPREnvironment *spapr,
+                                     target_ulong opcode, target_ulong *args)
+{
+    target_ulong reg = args[0];
+    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
+
+    if (!dev) {
+        return H_PARAMETER;
+    }
+
+    return H_SUCCESS;
+}
+
+static void vlan_hcalls(VIOsPAPRBus *bus)
+{
+    spapr_register_hypercall(H_REGISTER_LOGICAL_LAN, h_register_logical_lan);
+    spapr_register_hypercall(H_FREE_LOGICAL_LAN, h_free_logical_lan);
+    spapr_register_hypercall(H_SEND_LOGICAL_LAN, h_send_logical_lan);
+    spapr_register_hypercall(H_ADD_LOGICAL_LAN_BUFFER, h_add_logical_lan_buffer);
+    spapr_register_hypercall(H_MULTICAST_CTRL, h_multicast_ctrl);
+}
+
+static VIOsPAPRDeviceInfo spapr_vlan = {
+    .init = spapr_vlan_init,
+    .devnode = spapr_vlan_devnode,
+    .dt_name = "l-lan",
+    .dt_type = "network",
+    .dt_compatible = "IBM,l-lan",
+    .signal_mask = 0x1,
+    .hcalls = vlan_hcalls,
+    .qdev.name = "spapr-vlan",
+    .qdev.size = sizeof(VIOsPAPRVLANDevice),
+    .qdev.props = (Property[]) {
+        DEFINE_PROP_UINT32("reg", VIOsPAPRDevice, reg, 0x1000),
+        DEFINE_PROP_UINT32("dma-window", VIOsPAPRDevice, rtce_window_size,
+                           0x10000000),
+        DEFINE_NIC_PROPERTIES(VIOsPAPRVLANDevice, nicconf),
+        DEFINE_PROP_END_OF_LIST(),
+    },
+};
+
+static void spapr_vlan_register(void)
+{
+    spapr_vio_bus_register_withprop(&spapr_vlan);
+}
+device_init(spapr_vlan_register);
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
index 1b15d3e..4cfaf55 100644
--- a/hw/spapr_vio.h
+++ b/hw/spapr_vio.h
@@ -21,9 +21,9 @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
 
-#define SPAPR_VIO_TCE_PAGE_SHIFT	12
-#define SPAPR_VIO_TCE_PAGE_SIZE		(1ULL << SPAPR_VIO_TCE_PAGE_SHIFT)
-#define SPAPR_VIO_TCE_PAGE_MASK		(SPAPR_VIO_TCE_PAGE_SIZE - 1)
+#define SPAPR_VIO_TCE_PAGE_SHIFT   12
+#define SPAPR_VIO_TCE_PAGE_SIZE    (1ULL << SPAPR_VIO_TCE_PAGE_SHIFT)
+#define SPAPR_VIO_TCE_PAGE_MASK    (SPAPR_VIO_TCE_PAGE_SIZE - 1)
 
 enum VIOsPAPR_TCEAccess {
     SPAPR_TCE_FAULT = 0,
@@ -86,4 +86,7 @@ void spapr_vty_create(VIOsPAPRBus *bus,
                       uint32_t reg, CharDriverState *chardev,
                       qemu_irq qirq, uint32_t vio_irq_num);
 
+void spapr_vlan_create(VIOsPAPRBus *bus, uint32_t reg, NICInfo *nd,
+                       qemu_irq qirq, uint32_t vio_irq_num);
+
 #endif /* _HW_SPAPR_VIO_H */
-- 
1.7.1
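The logical LAN hypercalls in this patch pack every buffer descriptor (vlan_bd_t) into a single 64-bit word and pick it apart with the VLAN_BD_* macros, which are defined earlier in spapr_llan.c and are not part of this excerpt. The standalone sketch below assumes a layout of valid bit 63, toggle bit 62, a 24-bit length in bits 32-55 and a 32-bit I/O address in bits 0-31; the BD_* names, make_bd() and bd_aligned() are hypothetical stand-ins, not copies of the patch. bd_aligned() mirrors only the alignment half of check_bd().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed descriptor layout: hypothetical stand-ins for the VLAN_BD_*
 * macros defined earlier in spapr_llan.c (not shown in this excerpt). */
#define BD_VALID      0x8000000000000000ULL  /* bit 63: descriptor valid */
#define BD_TOGGLE     0x4000000000000000ULL  /* bit 62: queue wrap toggle */
#define BD_LEN_MASK   0x00ffffff00000000ULL  /* bits 32-55: length */
#define BD_ADDR_MASK  0x00000000ffffffffULL  /* bits 0-31: I/O address */

#define BD_LEN(bd)    (((bd) & BD_LEN_MASK) >> 32)
#define BD_ADDR(bd)   ((bd) & BD_ADDR_MASK)

typedef uint64_t bd_t;

/* Pack an I/O address and a length into a valid descriptor. */
static bd_t make_bd(uint64_t addr, uint64_t len)
{
    return BD_VALID | ((len << 32) & BD_LEN_MASK) | (addr & BD_ADDR_MASK);
}

/* Alignment rule used by check_bd(): both the address and the length
 * must be multiples of 'alignment'.  The TCE permission check that
 * check_bd() also performs is left out of this model. */
static int bd_aligned(bd_t bd, uint64_t alignment)
{
    assert(alignment != 0);
    return (BD_ADDR(bd) % alignment) == 0 && (BD_LEN(bd) % alignment) == 0;
}
```

For example, make_bd(0x2000, 0x1000) passes bd_aligned(bd, 4096), while make_bd(0x2004, 0x1000) does not. Flipping BD_TOGGLE, as the receive path above does to the receive queue descriptor when rxq_ptr wraps, leaves BD_LEN() and BD_ADDR() unchanged.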


* [Qemu-devel] [PATCH 23/26] Implement PAPR CRQ hypercalls
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (21 preceding siblings ...)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 22/26] Implement sPAPR Virtual LAN (ibmveth) David Gibson
@ 2011-03-16  4:57 ` David Gibson
  2011-03-16 16:15   ` [Qemu-devel] " Alexander Graf
  2011-03-16  4:57 ` [Qemu-devel] [PATCH 24/26] Implement PAPR virtual SCSI interface (ibmvscsi) David Gibson
                   ` (2 subsequent siblings)
  25 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:57 UTC
  To: agraf, qemu-devel; +Cc: paulus, anton

From: Ben Herrenschmidt <benh@kernel.crashing.org>

This patch implements the infrastructure and hypercalls necessary for the
PAPR-specified CRQ (Command/Response Queue) mechanism.  This general
request queueing system is used by many of the PAPR virtual I/O devices,
including the virtual SCSI adapter.

Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 hw/spapr.c     |    2 +-
 hw/spapr_vio.c |  159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/spapr_vio.h |   12 ++++
 3 files changed, 172 insertions(+), 1 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 44cf3cc..cb97a16 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -64,7 +64,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
     uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
     char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
-        "\0hcall-tce";
+        "\0hcall-tce\0hcall-vio";
     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
     int i;
     char *modelname;
diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
index 37cf51e..96668f3 100644
--- a/hw/spapr_vio.c
+++ b/hw/spapr_vio.c
@@ -352,6 +352,159 @@ uint64_t ldq_tce(VIOsPAPRDevice *dev, uint64_t taddr)
     return tswap64(val);
 }
 
+/*
+ * CRQ handling
+ */
+static target_ulong h_reg_crq(CPUState *env, sPAPREnvironment *spapr,
+                              target_ulong opcode, target_ulong *args)
+{
+    target_ulong reg = args[0];
+    target_ulong queue_addr = args[1];
+    target_ulong queue_len = args[2];
+    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
+
+    if (!dev) {
+        fprintf(stderr, "h_reg_crq on non-existent unit 0x"
+                TARGET_FMT_lx "\n", reg);
+        return H_PARAMETER;
+    }
+
+    /* We can't grok a queue size bigger than 256M for now */
+    if (queue_len < 0x1000 || queue_len > 0x10000000) {
+        fprintf(stderr, "h_reg_crq, queue size too small or too big (0x%llx)\n",
+                (unsigned long long)queue_len);
+        return H_PARAMETER;
+    }
+
+    /* Check queue alignment */
+    if (queue_addr & 0xfff) {
+        fprintf(stderr, "h_reg_crq, queue not aligned (0x%llx)\n",
+                (unsigned long long)queue_addr);
+        return H_PARAMETER;
+    }
+
+    /* Check if device supports CRQs */
+    if (!dev->crq.SendFunc) {
+        return H_NOT_FOUND;
+    }
+
+
+    /* Already a queue ? */
+    if (dev->crq.qsize) {
+        return H_RESOURCE;
+    }
+    dev->crq.qladdr = queue_addr;
+    dev->crq.qsize = queue_len;
+    dev->crq.qnext = 0;
+
+    dprintf("CRQ for dev 0x" TARGET_FMT_lx " registered at 0x"
+            TARGET_FMT_lx "/0x" TARGET_FMT_lx "\n",
+            reg, queue_addr, queue_len);
+    return H_SUCCESS;
+}
+
+static target_ulong h_free_crq(CPUState *env, sPAPREnvironment *spapr,
+                               target_ulong opcode, target_ulong *args)
+{
+    target_ulong reg = args[0];
+    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
+
+    if (!dev) {
+        fprintf(stderr, "h_free_crq on non-existent unit 0x"
+                TARGET_FMT_lx "\n", reg);
+        return H_PARAMETER;
+    }
+
+    dev->crq.qladdr = 0;
+    dev->crq.qsize = 0;
+    dev->crq.qnext = 0;
+
+    dprintf("CRQ for dev 0x" TARGET_FMT_lx " freed\n", reg);
+
+    return H_SUCCESS;
+}
+
+static target_ulong h_send_crq(CPUState *env, sPAPREnvironment *spapr,
+                               target_ulong opcode, target_ulong *args)
+{
+    target_ulong reg = args[0];
+    target_ulong msg_hi = args[1];
+    target_ulong msg_lo = args[2];
+    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
+    uint64_t crq_mangle[2];
+
+    if (!dev) {
+        fprintf(stderr, "h_send_crq on non-existent unit 0x"
+                TARGET_FMT_lx "\n", reg);
+        return H_PARAMETER;
+    }
+    crq_mangle[0] = cpu_to_be64(msg_hi);
+    crq_mangle[1] = cpu_to_be64(msg_lo);
+
+    if (dev->crq.SendFunc) {
+        return dev->crq.SendFunc(dev, (uint8_t *)crq_mangle);
+    }
+
+    return H_HARDWARE;
+}
+
+static target_ulong h_enable_crq(CPUState *env, sPAPREnvironment *spapr,
+                                 target_ulong opcode, target_ulong *args)
+{
+    target_ulong reg = args[0];
+    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
+
+    if (!dev) {
+        fprintf(stderr, "h_enable_crq on non-existent unit 0x"
+                TARGET_FMT_lx "\n", reg);
+        return H_PARAMETER;
+    }
+
+    return H_SUCCESS;
+}
+
+/* Returns negative error, 0 success, or positive: queue full */
+int spapr_vio_send_crq(VIOsPAPRDevice *dev, uint8_t *crq)
+{
+    int rc;
+    uint8_t byte;
+
+    if (!dev->crq.qsize) {
+        fprintf(stderr, "spapr_vio_send_crq on uninitialized queue\n");
+        return -1;
+    }
+
+    /* Maybe do a fast path for KVM just writing to the pages */
+    rc = spapr_tce_dma_read(dev, dev->crq.qladdr + dev->crq.qnext, &byte, 1);
+    if (rc) {
+        return rc;
+    }
+    if (byte != 0) {
+        return 1;
+    }
+
+    rc = spapr_tce_dma_write(dev, dev->crq.qladdr + dev->crq.qnext + 8, &crq[8], 8);
+    if (rc) {
+        return rc;
+    }
+#ifdef __powerpc__
+    /* Really only needed for kvm... */
+    asm volatile("eieio" : : : "memory");
+#endif
+    rc = spapr_tce_dma_write(dev, dev->crq.qladdr + dev->crq.qnext, crq, 8);
+    if (rc) {
+        return rc;
+    }
+
+    dev->crq.qnext = (dev->crq.qnext + 16) % dev->crq.qsize;
+
+    if (dev->signal_state & 1) {
+        qemu_irq_pulse(dev->qirq);
+    }
+
+    return 0;
+}
+
 static int spapr_vio_busdev_init(DeviceState *dev, DeviceInfo *info)
 {
     VIOsPAPRDeviceInfo *_info = (VIOsPAPRDeviceInfo *)info;
@@ -422,6 +575,12 @@ VIOsPAPRBus *spapr_vio_bus_init(void)
     /* hcall-tce */
     spapr_register_hypercall(H_PUT_TCE, h_put_tce);
 
+    /* hcall-crq */
+    spapr_register_hypercall(H_REG_CRQ, h_reg_crq);
+    spapr_register_hypercall(H_FREE_CRQ, h_free_crq);
+    spapr_register_hypercall(H_SEND_CRQ, h_send_crq);
+    spapr_register_hypercall(H_ENABLE_CRQ, h_enable_crq);
+
     for (_info = device_info_list; _info; _info = _info->next) {
         VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)_info;
 
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
index 4cfaf55..ba16795 100644
--- a/hw/spapr_vio.h
+++ b/hw/spapr_vio.h
@@ -32,10 +32,19 @@ enum VIOsPAPR_TCEAccess {
     SPAPR_TCE_RW = 3,
 };
 
+struct VIOsPAPRDevice;
+
 typedef struct VIOsPAPR_RTCE {
     uint64_t tce;
 } VIOsPAPR_RTCE;
 
+typedef struct VIOsPAPR_CRQ {
+    uint64_t qladdr;
+    uint32_t qsize;
+    uint32_t qnext;
+    int(*SendFunc)(struct VIOsPAPRDevice *vdev, uint8_t *crq);
+} VIOsPAPR_CRQ;
+
 typedef struct VIOsPAPRDevice {
     DeviceState qdev;
     uint32_t reg;
@@ -44,6 +53,7 @@ typedef struct VIOsPAPRDevice {
     target_ulong signal_state;
     uint32_t rtce_window_size;
     VIOsPAPR_RTCE *rtce_table;
+    VIOsPAPR_CRQ crq;
 } VIOsPAPRDevice;
 
 typedef struct VIOsPAPRBus {
@@ -81,6 +91,8 @@ void stw_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint32_t val);
 void stq_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint64_t val);
 uint64_t ldq_tce(VIOsPAPRDevice *dev, uint64_t taddr);
 
+int spapr_vio_send_crq(VIOsPAPRDevice *dev, uint8_t *crq);
+
 void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len);
 void spapr_vty_create(VIOsPAPRBus *bus,
                       uint32_t reg, CharDriverState *chardev,
-- 
1.7.1
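The ordering rules in spapr_vio_send_crq() are the heart of the CRQ protocol: each entry is a 16-byte slot whose first byte doubles as the valid flag, a slot is only reused once the guest has cleared that byte, and the half carrying the flag is written last. The standalone sketch below models that ring in plain memory under those assumptions; the crq_model type and both helper names are hypothetical, and TCE translation, interrupts and hypervisor error codes are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CRQ_ENTRY_SIZE 16

/* Hypothetical model of a registered CRQ: 'mem' stands in for the guest
 * pages that the patch reaches through spapr_tce_dma_read/write(). */
typedef struct {
    uint8_t *mem;
    uint32_t qsize;   /* queue size in bytes; 0 = no queue registered */
    uint32_t qnext;   /* byte offset of the next slot to fill */
} crq_model;

/* Follows the convention documented on spapr_vio_send_crq(): negative
 * on error (here: no queue registered), 1 if the next slot is still
 * owned by the guest (queue full), 0 once the entry has been posted. */
static int crq_enqueue(crq_model *q, const uint8_t entry[CRQ_ENTRY_SIZE])
{
    uint8_t *slot;

    if (!q->qsize) {
        return -1;
    }
    /* This model assumes the queue holds whole entries only. */
    assert(q->qsize % CRQ_ENTRY_SIZE == 0);

    slot = q->mem + q->qnext;
    if (slot[0] != 0) {
        return 1;
    }
    /* Payload half first, then the half holding the valid byte, so a
     * consumer polling slot[0] never sees a half-written entry.  The
     * patch puts an eieio barrier between the two writes on PowerPC
     * hosts; this single-threaded model does not need one. */
    memcpy(slot + 8, entry + 8, 8);
    memcpy(slot, entry, 8);

    q->qnext = (q->qnext + CRQ_ENTRY_SIZE) % q->qsize;
    return 0;
}

/* Guest side: hand a slot back by clearing its valid byte. */
static void crq_guest_consume(crq_model *q, uint32_t off)
{
    q->mem[off] = 0;
}
```

With a 32-byte queue, two posts succeed and wrap qnext back to 0; a third returns 1 until the guest clears the first slot with crq_guest_consume().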


* [Qemu-devel] [PATCH 24/26] Implement PAPR virtual SCSI interface (ibmvscsi)
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (22 preceding siblings ...)
  2011-03-16  4:57 ` [Qemu-devel] [PATCH 23/26] Implement PAPR CRQ hypercalls David Gibson
@ 2011-03-16  4:57 ` David Gibson
  2011-03-16 16:41   ` [Qemu-devel] " Alexander Graf
  2011-03-16  4:57 ` [Qemu-devel] [PATCH 25/26] Add a PAPR TCE-bypass mechanism for the pSeries machine David Gibson
  2011-03-16  4:57 ` [Qemu-devel] [PATCH 26/26] Implement PAPR VPA functions for pSeries shared processor partitions David Gibson
  25 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:57 UTC
  To: agraf, qemu-devel; +Cc: paulus, anton

This patch implements the infrastructure and hypercalls necessary for
the PAPR-specified Virtual SCSI interface.  This is the normal method
for providing (virtual) disks to PAPR partitions.
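Each VSCSI response goes back to the guest as a single 16-byte CRQ element laid out as struct viosrp_crq in hw/ppc-viosrp.h. The hypothetical helper below sketches how vscsi_send_iu() fills that element, writing the fields byte by byte in big-endian order so the result does not depend on host endianness; the helper name and the byte-wise encoding are assumptions of this sketch rather than code from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VIOSRP_SRP_FORMAT 0x01  /* from enum viosrp_crq_formats */

/* Hypothetical helper: build a 16-byte response element with the
 * struct viosrp_crq layout (valid, format, reserved, status, timeout,
 * IU_length, IU_data_ptr). */
static void vscsi_build_crq(uint8_t out[16], uint8_t format, int dma_ok,
                            uint16_t iu_length, const uint8_t tag[8])
{
    assert(out != NULL && tag != NULL);

    out[0] = 0x80;                      /* valid: entry present */
    out[1] = format;                    /* e.g. VIOSRP_SRP_FORMAT */
    out[2] = 0x00;                      /* reserved */
    out[3] = dma_ok ? 0x99 : 0x00;      /* as in the patch: 0x99 after a
                                           successful IU DMA, else 0 */
    out[4] = 0x00;                      /* timeout (big-endian), 0 */
    out[5] = 0x00;
    out[6] = (uint8_t)(iu_length >> 8); /* IU_length, big-endian */
    out[7] = (uint8_t)(iu_length & 0xff);
    memcpy(out + 8, tag, 8);            /* SRP tag echoed byte-for-byte */
}
```

In the patch, the finished element is then passed to spapr_vio_send_crq(), which posts it into the queue the guest registered with H_REG_CRQ.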

Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 Makefile.target  |    2 +-
 hw/ppc-viosrp.h  |  216 ++++++++++++
 hw/spapr.c       |   10 +-
 hw/spapr_vio.h   |    3 +
 hw/spapr_vscsi.c |  960 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/srp.h         |  241 ++++++++++++++
 6 files changed, 1430 insertions(+), 2 deletions(-)
 create mode 100644 hw/ppc-viosrp.h
 create mode 100644 hw/spapr_vscsi.c
 create mode 100644 hw/srp.h

diff --git a/Makefile.target b/Makefile.target
index ef86d43..49f9e9a 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -233,7 +233,7 @@ obj-ppc-y += ppc_oldworld.o
 obj-ppc-y += ppc_newworld.o
 # IBM pSeries (sPAPR)
 obj-ppc-y += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
-obj-ppc-y += xics.o spapr_vty.o spapr_llan.o
+obj-ppc-y += xics.o spapr_vty.o spapr_llan.o spapr_vscsi.o
 # PowerPC 4xx boards
 obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
 obj-ppc-y += ppc440.o ppc440_bamboo.o
diff --git a/hw/ppc-viosrp.h b/hw/ppc-viosrp.h
new file mode 100644
index 0000000..9afcf7a
--- /dev/null
+++ b/hw/ppc-viosrp.h
@@ -0,0 +1,216 @@
+/*****************************************************************************/
+/* srp.h -- SCSI RDMA Protocol definitions                                   */
+/*                                                                           */
+/* Written By: Colin Devilbis, IBM Corporation                               */
+/*                                                                           */
+/* Copyright (C) 2003 IBM Corporation                                        */
+/*                                                                           */
+/* This program is free software; you can redistribute it and/or modify      */
+/* it under the terms of the GNU General Public License as published by      */
+/* the Free Software Foundation; either version 2 of the License, or         */
+/* (at your option) any later version.                                       */
+/*                                                                           */
+/* This program is distributed in the hope that it will be useful,           */
+/* but WITHOUT ANY WARRANTY; without even the implied warranty of            */
+/* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the             */
+/* GNU General Public License for more details.                              */
+/*                                                                           */
+/* You should have received a copy of the GNU General Public License         */
+/* along with this program; if not, write to the Free Software               */
+/* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA */
+/*                                                                           */
+/*                                                                           */
+/* This file contains structures and definitions for IBM RPA (RS/6000        */
+/* platform architecture) implementation of the SRP (SCSI RDMA Protocol)     */
+/* standard.  SRP is used on IBM iSeries and pSeries platforms to send SCSI  */
+/* commands between logical partitions.                                      */
+/*                                                                           */
+/* SRP Information Units (IUs) are sent on a "Command/Response Queue" (CRQ)  */
+/* between partitions.  The definitions in this file are architected,        */
+/* and cannot be changed without breaking compatibility with other versions  */
+/* of Linux and other operating systems (AIX, OS/400) that talk this protocol*/
+/* between logical partitions                                                */
+/*****************************************************************************/
+#ifndef PPC_VIOSRP_H
+#define PPC_VIOSRP_H
+
+#define SRP_VERSION "16.a"
+#define SRP_MAX_IU_LEN    256
+#define SRP_MAX_LOC_LEN 32
+
+union srp_iu {
+    struct srp_login_req login_req;
+    struct srp_login_rsp login_rsp;
+    struct srp_login_rej login_rej;
+    struct srp_i_logout i_logout;
+    struct srp_t_logout t_logout;
+    struct srp_tsk_mgmt tsk_mgmt;
+    struct srp_cmd cmd;
+    struct srp_rsp rsp;
+    uint8_t reserved[SRP_MAX_IU_LEN];
+};
+
+enum viosrp_crq_formats {
+    VIOSRP_SRP_FORMAT = 0x01,
+    VIOSRP_MAD_FORMAT = 0x02,
+    VIOSRP_OS400_FORMAT = 0x03,
+    VIOSRP_AIX_FORMAT = 0x04,
+    VIOSRP_LINUX_FORMAT = 0x06,
+    VIOSRP_INLINE_FORMAT = 0x07
+};
+
+enum viosrp_crq_status {
+    VIOSRP_OK = 0x0,
+    VIOSRP_NONRECOVERABLE_ERR = 0x1,
+    VIOSRP_VIOLATES_MAX_XFER = 0x2,
+    VIOSRP_PARTNER_PANIC = 0x3,
+    VIOSRP_DEVICE_BUSY = 0x8,
+    VIOSRP_ADAPTER_FAIL = 0x10,
+    VIOSRP_OK2 = 0x99,
+};
+
+struct viosrp_crq {
+    uint8_t valid;        /* used by RPA */
+    uint8_t format;        /* SCSI vs out-of-band */
+    uint8_t reserved;
+    uint8_t status;        /* non-scsi failure? (e.g. DMA failure) */
+    uint16_t timeout;        /* in seconds */
+    uint16_t IU_length;        /* in bytes */
+    uint64_t IU_data_ptr;    /* the TCE for transferring data */
+};
+
+/* MADs are Management requests above and beyond the IUs defined in the SRP
+ * standard.
+ */
+enum viosrp_mad_types {
+    VIOSRP_EMPTY_IU_TYPE = 0x01,
+    VIOSRP_ERROR_LOG_TYPE = 0x02,
+    VIOSRP_ADAPTER_INFO_TYPE = 0x03,
+    VIOSRP_HOST_CONFIG_TYPE = 0x04,
+    VIOSRP_CAPABILITIES_TYPE = 0x05,
+    VIOSRP_ENABLE_FAST_FAIL = 0x08,
+};
+
+enum viosrp_mad_status {
+    VIOSRP_MAD_SUCCESS = 0x00,
+    VIOSRP_MAD_NOT_SUPPORTED = 0xF1,
+    VIOSRP_MAD_FAILED = 0xF7,
+};
+
+enum viosrp_capability_type {
+    MIGRATION_CAPABILITIES = 0x01,
+    RESERVATION_CAPABILITIES = 0x02,
+};
+
+enum viosrp_capability_support {
+    SERVER_DOES_NOT_SUPPORTS_CAP = 0x0,
+    SERVER_SUPPORTS_CAP = 0x01,
+    SERVER_CAP_DATA = 0x02,
+};
+
+enum viosrp_reserve_type {
+    CLIENT_RESERVE_SCSI_2 = 0x01,
+};
+
+enum viosrp_capability_flag {
+    CLIENT_MIGRATED = 0x01,
+    CLIENT_RECONNECT = 0x02,
+    CAP_LIST_SUPPORTED = 0x04,
+    CAP_LIST_DATA = 0x08,
+};
+
+/*
+ * Common MAD header
+ */
+struct mad_common {
+    uint32_t type;
+    uint16_t status;
+    uint16_t length;
+    uint64_t tag;
+};
+
+/*
+ * All SRP (and MAD) requests normally flow from the
+ * client to the server.  There is no way for the server to send
+ * an asynchronous message back to the client.  The Empty IU is used
+ * to hang out a meaningless request to the server so that it can respond
+ * asynchronously with something like a SCSI AER.
+ */
+struct viosrp_empty_iu {
+    struct mad_common common;
+    uint64_t buffer;
+    uint32_t port;
+};
+
+struct viosrp_error_log {
+    struct mad_common common;
+    uint64_t buffer;
+};
+
+struct viosrp_adapter_info {
+    struct mad_common common;
+    uint64_t buffer;
+};
+
+struct viosrp_host_config {
+    struct mad_common common;
+    uint64_t buffer;
+};
+
+struct viosrp_fast_fail {
+    struct mad_common common;
+};
+
+struct viosrp_capabilities {
+    struct mad_common common;
+    uint64_t buffer;
+};
+
+struct mad_capability_common {
+    uint32_t cap_type;
+    uint16_t length;
+    uint16_t server_support;
+};
+
+struct mad_reserve_cap {
+    struct mad_capability_common common;
+    uint32_t type;
+};
+
+struct mad_migration_cap {
+    struct mad_capability_common common;
+    uint32_t ecl;
+};
+
+struct capabilities {
+    uint32_t flags;
+    char name[SRP_MAX_LOC_LEN];
+    char loc[SRP_MAX_LOC_LEN];
+    struct mad_migration_cap migration;
+    struct mad_reserve_cap reserve;
+};
+
+union mad_iu {
+    struct viosrp_empty_iu empty_iu;
+    struct viosrp_error_log error_log;
+    struct viosrp_adapter_info adapter_info;
+    struct viosrp_host_config host_config;
+    struct viosrp_fast_fail fast_fail;
+    struct viosrp_capabilities capabilities;
+};
+
+union viosrp_iu {
+    union srp_iu srp;
+    union mad_iu mad;
+};
+
+struct mad_adapter_info_data {
+    char srp_version[8];
+    char partition_name[96];
+    uint32_t partition_number;
+    uint32_t mad_version;
+    uint32_t os_type;
+    uint32_t port_max_txu[8];    /* per-port maximum transfer */
+};
+
+#endif
diff --git a/hw/spapr.c b/hw/spapr.c
index cb97a16..5f868fc 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -28,6 +28,7 @@
 #include "hw.h"
 #include "elf.h"
 #include "net.h"
+#include "blockdev.h"
 
 #include "hw/boards.h"
 #include "hw/ppc.h"
@@ -316,7 +317,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     qemu_free(filename);
 
     /* Set up Interrupt Controller */
-    spapr->icp = xics_system_init(smp_cpus, envs, MAX_SERIAL_PORTS + nb_nics);
+    spapr->icp = xics_system_init(smp_cpus, envs, MAX_SERIAL_PORTS + nb_nics + drive_get_max_bus(IF_SCSI) + 1);
 
     /* Set up VIO bus */
     spapr->vio_bus = spapr_vio_bus_init();
@@ -346,6 +347,12 @@ static void ppc_spapr_init(ram_addr_t ram_size,
         }
     }
 
+    for (i = 0; i <= drive_get_max_bus(IF_SCSI); i++) {
+        spapr_vscsi_create(spapr->vio_bus, 0x2000 + i,
+                           xics_find_qirq(spapr->icp, irq), irq);
+        irq++;
+    }
+
     if (kernel_filename) {
         uint64_t lowaddr = 0;
 
@@ -406,6 +413,7 @@ static QEMUMachine spapr_machine = {
     .max_cpus = MAX_CPUS,
     .no_vga = 1,
     .no_parallel = 1,
+    .use_scsi = 1,
 };
 
 static void spapr_machine_init(void)
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
index ba16795..b7d0daa 100644
--- a/hw/spapr_vio.h
+++ b/hw/spapr_vio.h
@@ -101,4 +101,7 @@ void spapr_vty_create(VIOsPAPRBus *bus,
 void spapr_vlan_create(VIOsPAPRBus *bus, uint32_t reg, NICInfo *nd,
                        qemu_irq qirq, uint32_t vio_irq_num);
 
+void spapr_vscsi_create(VIOsPAPRBus *bus, uint32_t reg,
+                        qemu_irq qirq, uint32_t vio_irq_num);
+
 #endif /* _HW_SPAPR_VIO_H */
diff --git a/hw/spapr_vscsi.c b/hw/spapr_vscsi.c
new file mode 100644
index 0000000..0a67095
--- /dev/null
+++ b/hw/spapr_vscsi.c
@@ -0,0 +1,960 @@
+/* TODO:
+ *
+ *  - Cleanups :-)
+ *  - Sort out better how to assign devices to VSCSI instances
+ *  - Fix residual counts
+ *  - Add indirect descriptors support
+ *  - Maybe do autosense (PAPR seems to mandate it, linux doesn't care)
+ */
+#include "hw.h"
+#include "scsi.h"
+#include "scsi-defs.h"
+#include "net.h" /* Remove that when we can */
+#include "srp.h"
+#include "hw/qdev.h"
+#include "hw/spapr.h"
+#include "hw/spapr_vio.h"
+#include "hw/ppc-viosrp.h"
+
+#include <libfdt.h>
+
+//#define DEBUG_VSCSI
+
+#ifdef DEBUG_VSCSI
+#define dprintf(fmt, ...) \
+    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define dprintf(fmt, ...) \
+    do { } while (0)
+#endif
+
+#define min(a, b) ((a) < (b) ? (a) : (b))
+
+/*
+ * Virtual SCSI device
+ */
+
+/* Random numbers */
+#define VSCSI_MAX_SECTORS       4096
+#define VSCSI_REQ_LIMIT         24
+
+#define SCSI_SENSE_BUF_SIZE     96
+#define SRP_RSP_SENSE_DATA_LEN  18
+
+typedef union vscsi_crq {
+    struct viosrp_crq s;
+    uint8_t raw[16];
+} vscsi_crq;
+
+typedef struct vscsi_req
+{
+    vscsi_crq               crq;
+    union viosrp_iu         iu;
+
+    /* SCSI request tracking */
+    SCSIDevice              *sdev;
+    uint32_t                qtag; /* qemu tag != srp tag */
+    int                     lun;
+    int                     active;
+    long                    data_len;
+    int                     writing;
+    int                     sensing;
+    int                     senselen;
+    uint8_t                 sense[SCSI_SENSE_BUF_SIZE];
+
+    /* RDMA related bits */
+    uint8_t                 dma_fmt;
+    struct srp_direct_buf   ext_desc;
+    struct srp_direct_buf   *cur_desc;
+    struct srp_indirect_buf *ind_desc;
+    int                     local_desc;
+    int                     total_desc;
+
+} vscsi_req;
+
+
+typedef struct {
+    VIOsPAPRDevice vdev;
+    SCSIBus bus;
+    vscsi_req reqs[VSCSI_REQ_LIMIT];
+} VSCSIState;
+
+/* XXX Debug only */
+static VSCSIState *dbg_vscsi_state;
+
+
+static struct vscsi_req *vscsi_get_req(VSCSIState *s)
+{
+    vscsi_req *req;
+    int i;
+
+    for (i = 0; i < VSCSI_REQ_LIMIT; i++) {
+        req = &s->reqs[i];
+        if (!req->active) {
+            memset(req, 0, sizeof(*req));
+            req->qtag = i;
+            req->active = 1;
+            return req;
+        }
+    }
+    return NULL;
+}
+
+static void vscsi_put_req(VSCSIState *s, vscsi_req *req)
+{
+    req->active = 0;
+}
+
+static vscsi_req *vscsi_find_req(VSCSIState *s, uint32_t tag)
+{
+    if (tag >= VSCSI_REQ_LIMIT || !s->reqs[tag].active) {
+        return NULL;
+    }
+    return &s->reqs[tag];
+}
+
+static void vscsi_decode_id_lun(uint64_t srp_lun, int *id, int *lun)
+{
+    /* XXX Figure that one out properly ! This is crackpot */
+    *id = (srp_lun >> 56) & 0x7f;
+    *lun = (srp_lun >> 48) & 0xff;
+}
+
+static int vscsi_send_iu(VSCSIState *s, vscsi_req *req,
+                         uint64_t length, uint8_t format)
+{
+    long rc, rc1;
+
+    /* First copy the SRP */
+    rc = spapr_tce_dma_write(&s->vdev, req->crq.s.IU_data_ptr,
+                             &req->iu, length);
+    if (rc) {
+        fprintf(stderr, "vscsi_send_iu: DMA write failure !\n");
+    }
+
+    req->crq.s.valid = 0x80;
+    req->crq.s.format = format;
+    req->crq.s.reserved = 0x00;
+    req->crq.s.timeout = cpu_to_be16(0x0000);
+    req->crq.s.IU_length = cpu_to_be16(length);
+    req->crq.s.IU_data_ptr = req->iu.srp.rsp.tag; /* right byte order */
+
+    if (rc == 0) {
+        req->crq.s.status = 0x99; /* Just needs to be non-zero */
+    } else {
+        req->crq.s.status = 0x00;
+    }
+
+    rc1 = spapr_vio_send_crq(&s->vdev, req->crq.raw);
+    if (rc1) {
+        fprintf(stderr, "vscsi_send_iu: Error sending response\n");
+        return rc1;
+    }
+
+    return rc;
+}
+
+static void vscsi_makeup_sense(VSCSIState *s, vscsi_req *req,
+                               uint8_t key, uint8_t asc, uint8_t ascq)
+{
+    req->senselen = SRP_RSP_SENSE_DATA_LEN;
+
+    /* Valid bit and 'current errors' */
+    req->sense[0] = (0x1 << 7 | 0x70);
+    /* Sense key */
+    req->sense[2] = key;
+    /* Additional sense length */
+    req->sense[7] = 0xa; /* 10 bytes */
+    /* Additional sense code */
+    req->sense[12] = asc;
+    req->sense[13] = ascq;
+}
+
+static int vscsi_send_rsp(VSCSIState *s, vscsi_req *req,
+                          uint8_t status, int32_t res_in, int32_t res_out)
+{
+    union viosrp_iu *iu = &req->iu;
+    uint64_t tag = iu->srp.rsp.tag;
+    /* The command's sol_not byte overlaps the response fields cleared
+     * by the memset() below, so save it first */
+    uint8_t sol_not = iu->srp.cmd.sol_not;
+    int total_len = sizeof(iu->srp.rsp);
+
+    dprintf("VSCSI: Sending resp status: 0x%x, "
+            "res_in: %d, res_out: %d\n", status, res_in, res_out);
+
+    memset(iu, 0, sizeof(struct srp_rsp));
+    iu->srp.rsp.opcode = SRP_RSP;
+    iu->srp.rsp.req_lim_delta = cpu_to_be32(1);
+    iu->srp.rsp.tag = tag;
+
+    /* Handle residuals */
+    if (res_in < 0) {
+        iu->srp.rsp.flags |= SRP_RSP_FLAG_DIUNDER;
+        res_in = -res_in;
+    } else if (res_in) {
+        iu->srp.rsp.flags |= SRP_RSP_FLAG_DIOVER;
+    }
+    if (res_out < 0) {
+        iu->srp.rsp.flags |= SRP_RSP_FLAG_DOUNDER;
+        res_out = -res_out;
+    } else if (res_out) {
+        iu->srp.rsp.flags |= SRP_RSP_FLAG_DOOVER;
+    }
+    iu->srp.rsp.data_in_res_cnt = cpu_to_be32(res_in);
+    iu->srp.rsp.data_out_res_cnt = cpu_to_be32(res_out);
+
+    /* We don't do response data */
+    /* iu->srp.rsp.flags &= ~SRP_RSP_FLAG_RSPVALID; */
+    iu->srp.rsp.resp_data_len = cpu_to_be32(0);
+
+    /* Handle success vs. failure */
+    iu->srp.rsp.status = status;
+    if (status) {
+        iu->srp.rsp.sol_not = (sol_not & 0x04) >> 2;
+        if (req->senselen) {
+            req->iu.srp.rsp.flags |= SRP_RSP_FLAG_SNSVALID;
+            req->iu.srp.rsp.sense_data_len = cpu_to_be32(req->senselen);
+            memcpy(req->iu.srp.rsp.data, req->sense, req->senselen);
+            total_len += req->senselen;
+        }
+    } else {
+        iu->srp.rsp.sol_not = (sol_not & 0x02) >> 1;
+    }
+
+    vscsi_send_iu(s, req, total_len, VIOSRP_SRP_FORMAT);
+    return 0;
+}
+
+static inline void vscsi_swap_desc(struct srp_direct_buf *desc)
+{
+    desc->va = be64_to_cpu(desc->va);
+    desc->len = be32_to_cpu(desc->len);
+}
+
+static int vscsi_srp_direct_data(VSCSIState *s, vscsi_req *req,
+                                 uint8_t *buf, uint32_t len)
+{
+    struct srp_direct_buf *md = req->cur_desc;
+    uint32_t llen;
+    int rc = 0;
+
+    dprintf("VSCSI: direct segment 0x%x bytes, va=0x%llx desc len=0x%x\n",
+            len, (unsigned long long)md->va, md->len);
+
+    llen = min(len, md->len);
+    if (llen) {
+        if (req->writing) { /* writing = to device = reading from memory */
+            rc = spapr_tce_dma_read(&s->vdev, md->va, buf, llen);
+        } else {
+            rc = spapr_tce_dma_write(&s->vdev, md->va, buf, llen);
+        }
+    }
+    md->len -= llen;
+    md->va += llen;
+
+    if (rc) {
+        return -1;
+    }
+    return llen;
+}
+
+static int vscsi_srp_indirect_data(VSCSIState *s, vscsi_req *req,
+                                   uint8_t *buf, uint32_t len)
+{
+    struct srp_direct_buf *td = &req->ind_desc->table_desc;
+    struct srp_direct_buf *md = req->cur_desc;
+    int rc = 0;
+    uint32_t llen, total = 0;
+
+    dprintf("VSCSI: indirect segment 0x%x bytes, td va=0x%llx len=0x%x\n",
+            len, (unsigned long long)td->va, td->len);
+
+    /* While we have data ... */
+    while(len) {
+        /* If we have a descriptor but it's empty, go fetch a new one */
+        if (md && md->len == 0) {
+            /* More local available, use one */
+            if (req->local_desc) {
+                md = ++req->cur_desc;
+                --req->local_desc;
+                --req->total_desc;
+                td->va += sizeof(struct srp_direct_buf);
+            } else {
+                md = req->cur_desc = NULL;
+            }
+        }
+        /* No descriptor at hand, fetch one */
+        if (!md) {
+            if (!req->total_desc) {
+                dprintf("VSCSI:   Out of descriptors !\n");
+                break;
+            }
+            md = req->cur_desc = &req->ext_desc;
+            dprintf("VSCSI:   Reading desc from 0x%llx\n", (unsigned long long)td->va);
+            rc = spapr_tce_dma_read(&s->vdev, td->va, md, sizeof(struct srp_direct_buf));
+            if (rc) {
+                dprintf("VSCSI: tce_dma_read -> %d reading ext_desc\n", rc);
+                break;
+            }
+            vscsi_swap_desc(md);
+            td->va += sizeof(struct srp_direct_buf);
+            --req->total_desc;
+        }
+        dprintf("VSCSI:   [desc va=0x%llx,len=0x%x] remaining=0x%x\n",
+                (unsigned long long)md->va, md->len, len);
+
+        /* Perform transfer */
+        llen = min(len, md->len);
+        if (req->writing) { /* writing = to device = reading from memory */
+            rc = spapr_tce_dma_read(&s->vdev, md->va, buf, llen);
+        } else {
+            rc = spapr_tce_dma_write(&s->vdev, md->va, buf, llen);
+        }
+        if (rc) {
+            dprintf("VSCSI: tce_dma_r/w(%d) -> %d\n", req->writing, rc);
+            break;
+        }
+        dprintf("VSCSI:     data: %02x %02x %02x %02x...\n",
+                buf[0], buf[1], buf[2], buf[3]);
+
+        len -= llen;
+        buf += llen;
+        total += llen;
+        md->va += llen;
+        md->len -= llen;
+    }
+    return rc ? -1 : total;
+}
+
+static int vscsi_srp_transfer_data(VSCSIState *s, vscsi_req *req,
+                                   int writing, uint8_t *buf, uint32_t len)
+{
+    int err = 0;
+
+    switch (req->dma_fmt) {
+    case SRP_NO_DATA_DESC:
+        dprintf("VSCSI: no data desc transfer, skipping 0x%x bytes\n", len);
+        break;
+    case SRP_DATA_DESC_DIRECT:
+        err = vscsi_srp_direct_data(s, req, buf, len);
+        break;
+    case SRP_DATA_DESC_INDIRECT:
+        err = vscsi_srp_indirect_data(s, req, buf, len);
+        break;
+    }
+    return err;
+}
+
+/* Bits from linux srp */
+static int data_out_desc_size(struct srp_cmd *cmd)
+{
+    int size = 0;
+    uint8_t fmt = cmd->buf_fmt >> 4;
+
+    switch (fmt) {
+    case SRP_NO_DATA_DESC:
+        break;
+    case SRP_DATA_DESC_DIRECT:
+        size = sizeof(struct srp_direct_buf);
+        break;
+    case SRP_DATA_DESC_INDIRECT:
+        size = sizeof(struct srp_indirect_buf) +
+            sizeof(struct srp_direct_buf) * cmd->data_out_desc_cnt;
+        break;
+    default:
+        break;
+    }
+    return size;
+}
+
+static int vscsi_preprocess_desc(vscsi_req *req)
+{
+    struct srp_cmd *cmd = &req->iu.srp.cmd;
+    int offset, i;
+
+    offset = cmd->add_cdb_len & ~3;
+
+    if (req->writing) {
+        req->dma_fmt = cmd->buf_fmt >> 4;
+    } else {
+        offset += data_out_desc_size(cmd);
+        req->dma_fmt = cmd->buf_fmt & ((1U << 4) - 1);
+    }
+
+    switch (req->dma_fmt) {
+    case SRP_NO_DATA_DESC:
+        break;
+    case SRP_DATA_DESC_DIRECT:
+        req->cur_desc = (struct srp_direct_buf *)(cmd->add_data + offset);
+        req->total_desc = req->local_desc = 1;
+        vscsi_swap_desc(req->cur_desc);
+        dprintf("VSCSI: using direct RDMA %s, 0x%x bytes MD: 0x%llx\n",
+                req->writing ? "write" : "read",
+                req->cur_desc->len, (unsigned long long)req->cur_desc->va);
+        break;
+    case SRP_DATA_DESC_INDIRECT:
+        req->ind_desc = (struct srp_indirect_buf *)(cmd->add_data + offset);
+        vscsi_swap_desc(&req->ind_desc->table_desc);
+        req->total_desc = req->ind_desc->table_desc.len / sizeof(struct srp_direct_buf);
+        req->local_desc = req->writing ? cmd->data_out_desc_cnt :
+            cmd->data_in_desc_cnt;
+        for (i = 0; i < req->local_desc; i++) {
+            vscsi_swap_desc(&req->ind_desc->desc_list[i]);
+        }
+        req->cur_desc = req->local_desc ? &req->ind_desc->desc_list[0] : NULL;
+        dprintf("VSCSI: using indirect RDMA %s, 0x%x bytes %d descs (%d local) VA: 0x%llx\n",
+                req->writing ? "read" : "write", be32_to_cpu(req->ind_desc->len),
+                req->total_desc, req->local_desc,
+                (unsigned long long)req->ind_desc->table_desc.va);
+        break;
+    default:
+        fprintf(stderr,
+                "vscsi_preprocess_desc: Unknown format %x\n", req->dma_fmt);
+        return -1;
+    }
+
+    return 0;
+}
+
+static void vscsi_send_request_sense(VSCSIState *s, vscsi_req *req)
+{
+    SCSIDevice *sdev = req->sdev;
+    uint8_t *cdb = req->iu.srp.cmd.cdb;
+    int n;
+
+    /* 6-byte REQUEST SENSE CDB with a 96-byte allocation length */
+    cdb[0] = REQUEST_SENSE;
+    cdb[1] = 0;
+    cdb[2] = 0;
+    cdb[3] = 0;
+    cdb[4] = 96;
+    cdb[5] = 0;
+    req->sensing = 1;
+    n = sdev->info->send_command(sdev, req->qtag, cdb, req->lun);
+    dprintf("VSCSI: Queued request sense tag 0x%x\n", req->qtag);
+    if (n < 0) {
+        fprintf(stderr,
+                "VSCSI: REQUEST_SENSE unexpectedly wants write data\n");
+        sdev->info->cancel_io(sdev, req->qtag);
+        vscsi_makeup_sense(s, req, HARDWARE_ERROR, 0, 0);
+        vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
+        vscsi_put_req(s, req);
+        return;
+    } else if (n == 0) {
+        return;
+    }
+    sdev->info->read_data(sdev, req->qtag);
+}
+
+/* Callback to indicate that the SCSI layer has completed a transfer.  */
+static void vscsi_command_complete(SCSIBus *bus, int reason, uint32_t tag,
+                                   uint32_t arg)
+{
+    VSCSIState *s = DO_UPCAST(VSCSIState, vdev.qdev, bus->qbus.parent);
+    vscsi_req *req = vscsi_find_req(s, tag);
+    SCSIDevice *sdev;
+    uint8_t *buf;
+    int32_t res_in = 0, res_out = 0;
+    int len, rc = 0;
+
+    dprintf("VSCSI: SCSI cmd complete, r=0x%x tag=0x%x arg=0x%x, req=%p\n",
+            reason, tag, arg, req);
+    if (req == NULL) {
+        fprintf(stderr, "VSCSI: Can't find request for tag 0x%x\n", tag);
+        return;
+    }
+    sdev = req->sdev;
+
+    if (req->sensing) {
+        if (reason == SCSI_REASON_DONE) {
+            dprintf("VSCSI: Sense done !\n");
+            vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
+            vscsi_put_req(s, req);
+        } else {
+            buf = sdev->info->get_buf(sdev, tag);
+
+            len = min(arg, SCSI_SENSE_BUF_SIZE);
+            dprintf("VSCSI: Sense data, %d bytes:\n", len);
+            dprintf("       %02x  %02x  %02x  %02x  %02x  %02x  %02x  %02x\n",
+                    buf[0], buf[1], buf[2], buf[3],
+                    buf[4], buf[5], buf[6], buf[7]);
+            dprintf("       %02x  %02x  %02x  %02x  %02x  %02x  %02x  %02x\n",
+                    buf[8], buf[9], buf[10], buf[11],
+                    buf[12], buf[13], buf[14], buf[15]);
+            memcpy(req->sense, buf, len);
+            req->senselen = len;
+            sdev->info->read_data(sdev, req->qtag);
+        }
+        return;
+    }
+
+    if (reason == SCSI_REASON_DONE) {
+        dprintf("VSCSI: Command complete err=%d\n", arg);
+        if (arg == 0) {
+            /* We handle overflows, not underflows for normal commands,
+             * but hopefully nobody cares
+             */
+            if (req->writing) {
+                res_out = req->data_len;
+            } else {
+                res_in = req->data_len;
+            }
+            vscsi_send_rsp(s, req, 0, res_in, res_out);
+        } else if (arg == CHECK_CONDITION) {
+            dprintf("VSCSI: Got CHECK_CONDITION, requesting sense...\n");
+            vscsi_send_request_sense(s, req);
+            return;
+        } else {
+            vscsi_send_rsp(s, req, arg, 0, 0);
+        }
+        vscsi_put_req(s, req);
+        return;
+    }
+
+    /* "arg" is how much we have read for reads and how much we want
+     * to write for writes (ie, how much is to be DMA'd)
+     */
+    if (arg) {
+        buf = sdev->info->get_buf(sdev, tag);
+        rc = vscsi_srp_transfer_data(s, req, req->writing, buf, arg);
+    }
+    if (rc < 0) {
+        fprintf(stderr, "VSCSI: RDMA error rc=%d!\n", rc);
+        sdev->info->cancel_io(sdev, req->qtag);
+        vscsi_makeup_sense(s, req, HARDWARE_ERROR, 0, 0);
+        vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
+        vscsi_put_req(s, req);
+        return;
+    }
+
+    /* Start next chunk */
+    req->data_len -= rc;
+    if (req->writing) {
+        sdev->info->write_data(sdev, req->qtag);
+    } else {
+        sdev->info->read_data(sdev, req->qtag);
+    }
+}
+
+static void vscsi_process_login(VSCSIState *s, vscsi_req *req)
+{
+    union viosrp_iu *iu = &req->iu;
+    struct srp_login_rsp *rsp = &iu->srp.login_rsp;
+    uint64_t tag = iu->srp.rsp.tag;
+
+    dprintf("VSCSI: Got login, sending response\n");
+
+    /* TODO handle case that requested size is wrong and
+     * buffer format is wrong
+     */
+    memset(iu, 0, sizeof(struct srp_login_rsp));
+    rsp->opcode = SRP_LOGIN_RSP;
+    /* Don't advertise quite as many requests as we support to
+     * keep room for management stuff etc...
+     */
+    rsp->req_lim_delta = cpu_to_be32(VSCSI_REQ_LIMIT-2);
+    rsp->tag = tag;
+    rsp->max_it_iu_len = cpu_to_be32(sizeof(union srp_iu));
+    rsp->max_ti_iu_len = cpu_to_be32(sizeof(union srp_iu));
+    /* direct and indirect */
+    rsp->buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT);
+
+    vscsi_send_iu(s, req, sizeof(*rsp), VIOSRP_SRP_FORMAT);
+}
+
+static void vscsi_inquiry_no_target(VSCSIState *s, vscsi_req *req)
+{
+    uint8_t *cdb = req->iu.srp.cmd.cdb;
+    uint8_t resp_data[36];
+    int rc, len, alen;
+
+    /* We don't do EVPD, and without EVPD the page code must be 0 */
+    if ((cdb[1] & 0x01) || cdb[2] != 0) {
+        /* Send INVALID FIELD IN CDB */
+        vscsi_makeup_sense(s, req, ILLEGAL_REQUEST, 0x24, 0);
+        vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
+        return;
+    }
+    alen = cdb[3];
+    alen = (alen << 8) | cdb[4];
+    len = min(alen, 36);
+
+    /* Fake up inquiry using PQ=3 */
+    memset(resp_data, 0, 36);
+    resp_data[0] = 0x7f;   /* Not capable of supporting a device here */
+    resp_data[2] = 0x06;   /* SPC-4 */
+    resp_data[3] = 0x02;   /* Resp data format */
+    resp_data[4] = 36 - 5; /* Additional length */
+    resp_data[7] = 0x10;   /* Sync transfers */
+    memcpy(&resp_data[16], "QEMU EMPTY      ", 16);
+    memcpy(&resp_data[8], "QEMU    ", 8);
+
+    req->writing = 0;
+    vscsi_preprocess_desc(req);
+    rc = vscsi_srp_transfer_data(s, req, 0, resp_data, len);
+    if (rc < 0) {
+        vscsi_makeup_sense(s, req, HARDWARE_ERROR, 0, 0);
+        vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
+    } else {
+        vscsi_send_rsp(s, req, 0, 36 - rc, 0);
+    }
+}
+
+static int vscsi_queue_cmd(VSCSIState *s, vscsi_req *req)
+{
+    union srp_iu *srp = &req->iu.srp;
+    SCSIDevice *sdev;
+    int n, id, lun;
+
+    vscsi_decode_id_lun(be64_to_cpu(srp->cmd.lun), &id, &lun);
+
+    /* Qemu vs. linux issue with LUNs to be sorted out ... */
+    sdev = (id < 8 && lun < 16) ? s->bus.devs[id] : NULL;
+    if (!sdev) {
+        dprintf("VSCSI: Command for id %d with no drive\n", id);
+        if (srp->cmd.cdb[0] == INQUIRY) {
+            vscsi_inquiry_no_target(s, req);
+        } else {
+            vscsi_makeup_sense(s, req, ILLEGAL_REQUEST, 0x24, 0x00);
+            vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
+        }
+        return 1;
+    }
+
+    req->sdev = sdev;
+    req->lun = lun;
+    n = sdev->info->send_command(sdev, req->qtag, srp->cmd.cdb, lun);
+
+    dprintf("VSCSI: Queued command tag 0x%x CMD 0x%x ID %d LUN %d ret: %d\n",
+            req->qtag, srp->cmd.cdb[0], id, lun, n);
+
+    if (n) {
+        /* Transfer direction must be set before preprocessing the
+         * descriptors
+         */
+        req->writing = (n < 0);
+
+        /* Preprocess RDMA descriptors */
+        vscsi_preprocess_desc(req);
+    }
+
+    /* Get transfer direction and initiate transfer */
+    if (n > 0) {
+        req->data_len = n;
+        sdev->info->read_data(sdev, req->qtag);
+    } else if (n < 0) {
+        req->data_len = -n;
+        sdev->info->write_data(sdev, req->qtag);
+    }
+    /* Don't touch req here, it may have been recycled already */
+
+    return 0;
+}
+
+static int vscsi_process_tsk_mgmt(VSCSIState *s, vscsi_req *req)
+{
+    union viosrp_iu *iu = &req->iu;
+    int fn;
+
+    fprintf(stderr, "vscsi_process_tsk_mgmt %02x\n",
+            iu->srp.tsk_mgmt.tsk_mgmt_func);
+
+    switch (iu->srp.tsk_mgmt.tsk_mgmt_func) {
+#if 0 /* We really don't deal with these for now */
+    case SRP_TSK_ABORT_TASK:
+        fn = ABORT_TASK;
+        break;
+    case SRP_TSK_ABORT_TASK_SET:
+        fn = ABORT_TASK_SET;
+        break;
+    case SRP_TSK_CLEAR_TASK_SET:
+        fn = CLEAR_TASK_SET;
+        break;
+    case SRP_TSK_LUN_RESET:
+        fn = LOGICAL_UNIT_RESET;
+        break;
+    case SRP_TSK_CLEAR_ACA:
+        fn = CLEAR_ACA;
+        break;
+#endif
+    default:
+        fn = 0;
+    }
+    if (fn) {
+        /* XXX Send/Handle target task management */
+        ;
+    } else {
+        vscsi_makeup_sense(s, req, ILLEGAL_REQUEST, 0x20, 0);
+        vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
+    }
+    return !fn;
+}
+
+static int vscsi_handle_srp_req(VSCSIState *s, vscsi_req *req)
+{
+    union srp_iu *srp = &req->iu.srp;
+    int done = 1;
+    uint8_t opcode = srp->rsp.opcode;
+
+    switch (opcode) {
+    case SRP_LOGIN_REQ:
+        vscsi_process_login(s, req);
+        break;
+    case SRP_TSK_MGMT:
+        done = vscsi_process_tsk_mgmt(s, req);
+        break;
+    case SRP_CMD:
+        done = vscsi_queue_cmd(s, req);
+        break;
+    case SRP_LOGIN_RSP:
+    case SRP_I_LOGOUT:
+    case SRP_T_LOGOUT:
+    case SRP_RSP:
+    case SRP_CRED_REQ:
+    case SRP_CRED_RSP:
+    case SRP_AER_REQ:
+    case SRP_AER_RSP:
+        fprintf(stderr, "VSCSI: Unsupported opcode %02x\n", opcode);
+        break;
+    default:
+        fprintf(stderr, "VSCSI: Unknown type %02x\n", opcode);
+    }
+
+    return done;
+}
+
+static int vscsi_send_adapter_info(VSCSIState *s, vscsi_req *req)
+{
+    struct viosrp_adapter_info *sinfo;
+    struct mad_adapter_info_data info;
+    int rc;
+
+    sinfo = &req->iu.mad.adapter_info;
+
+#if 0 /* What for ? */
+    rc = spapr_tce_dma_read(&s->vdev, be64_to_cpu(sinfo->buffer),
+                            &info, be16_to_cpu(sinfo->common.length));
+    if (rc) {
+        fprintf(stderr, "vscsi_send_adapter_info: DMA read failure !\n");
+    }
+#endif
+    memset(&info, 0, sizeof(info));
+    strcpy(info.srp_version, SRP_VERSION);
+    strncpy(info.partition_name, "qemu", sizeof("qemu"));
+    info.partition_number = cpu_to_be32(0);
+    info.mad_version = cpu_to_be32(1);
+    info.os_type = cpu_to_be32(2);
+    info.port_max_txu[0] = cpu_to_be32(VSCSI_MAX_SECTORS << 9);
+
+    rc = spapr_tce_dma_write(&s->vdev, be64_to_cpu(sinfo->buffer),
+                             &info, be16_to_cpu(sinfo->common.length));
+    if (rc)  {
+        fprintf(stderr, "vscsi_send_adapter_info: DMA write failure !\n");
+    }
+
+    sinfo->common.status = rc ? cpu_to_be16(1) : 0;
+
+    return vscsi_send_iu(s, req, sizeof(*sinfo), VIOSRP_MAD_FORMAT);
+}
+
+static int vscsi_handle_mad_req(VSCSIState *s, vscsi_req *req)
+{
+    union mad_iu *mad = &req->iu.mad;
+
+    switch (be32_to_cpu(mad->empty_iu.common.type)) {
+    case VIOSRP_EMPTY_IU_TYPE:
+        fprintf(stderr, "Unsupported EMPTY MAD IU\n");
+        break;
+    case VIOSRP_ERROR_LOG_TYPE:
+        fprintf(stderr, "Unsupported ERROR LOG MAD IU\n");
+        mad->error_log.common.status = cpu_to_be16(1);
+        vscsi_send_iu(s, req, sizeof(mad->error_log), VIOSRP_MAD_FORMAT);
+        break;
+    case VIOSRP_ADAPTER_INFO_TYPE:
+        vscsi_send_adapter_info(s, req);
+        break;
+    case VIOSRP_HOST_CONFIG_TYPE:
+        mad->host_config.common.status = cpu_to_be16(1);
+        vscsi_send_iu(s, req, sizeof(mad->host_config), VIOSRP_MAD_FORMAT);
+        break;
+    default:
+        fprintf(stderr, "VSCSI: Unknown MAD type %02x\n",
+                be32_to_cpu(mad->empty_iu.common.type));
+    }
+
+    return 1;
+}
+
+static void vscsi_got_payload(VSCSIState *s, vscsi_crq *crq)
+{
+    vscsi_req *req;
+    int done;
+
+    req = vscsi_get_req(s);
+    if (req == NULL) {
+        fprintf(stderr, "VSCSI: Failed to get a request !\n");
+        return;
+    }
+
+    /* We only support a limited number of descriptors, we know
+     * the ibmvscsi driver uses up to 10 max, so it should fit
+     * in our 256 bytes IUs. If not we'll have to increase the size
+     * of the structure.
+     */
+    if (crq->s.IU_length > sizeof(union viosrp_iu)) {
+        fprintf(stderr, "VSCSI: SRP IU too long (%d bytes) !\n",
+                crq->s.IU_length);
+        vscsi_put_req(s, req);
+        return;
+    }
+
+    /* XXX Handle failure differently ? */
+    if (spapr_tce_dma_read(&s->vdev, crq->s.IU_data_ptr, &req->iu,
+                           crq->s.IU_length)) {
+        fprintf(stderr, "vscsi_got_payload: DMA read failure !\n");
+        /* req lives in s->reqs[], so release the slot instead of freeing it */
+        vscsi_put_req(s, req);
+        return;
+    }
+    memcpy(&req->crq, crq, sizeof(vscsi_crq));
+
+    if (crq->s.format == VIOSRP_MAD_FORMAT) {
+        done = vscsi_handle_mad_req(s, req);
+    } else {
+        done = vscsi_handle_srp_req(s, req);
+    }
+
+    if (done) {
+        vscsi_put_req(s, req);
+    }
+}
+
+
+static int vscsi_do_crq(struct VIOsPAPRDevice *dev, uint8_t *crq_data)
+{
+    VSCSIState *s = DO_UPCAST(VSCSIState, vdev, dev);
+    vscsi_crq crq;
+
+    memcpy(crq.raw, crq_data, 16);
+    crq.s.timeout = be16_to_cpu(crq.s.timeout);
+    crq.s.IU_length = be16_to_cpu(crq.s.IU_length);
+    crq.s.IU_data_ptr = be64_to_cpu(crq.s.IU_data_ptr);
+
+    dprintf("VSCSI: do_crq %02x %02x ...\n", crq.raw[0], crq.raw[1]);
+
+    switch(crq.s.valid) {
+    case 0xc0: /* Init command/response */
+
+        /* Respond to initialization request */
+        if (crq.s.format == 0x01) {
+            memset(crq.raw, 0, 16);
+            crq.s.valid = 0xc0;
+            crq.s.format = 0x02;
+            spapr_vio_send_crq(dev, crq.raw);
+        }
+
+        /* Note that in hotplug cases, we might get a 0x02
+         * as a result of us emitting the init request
+         */
+
+        break;
+    case 0xff: /* Link event */
+
+        /* Not handled for now */
+
+        break;
+    case 0x80: /* Payloads */
+        switch (crq.s.format) {
+        case VIOSRP_SRP_FORMAT: /* SRP request */
+        case VIOSRP_MAD_FORMAT: /* Management datagram (MAD) */
+            vscsi_got_payload(s, &crq);
+            break;
+        case VIOSRP_OS400_FORMAT:
+        case VIOSRP_AIX_FORMAT:
+        case VIOSRP_LINUX_FORMAT:
+        case VIOSRP_INLINE_FORMAT:
+            fprintf(stderr, "vscsi_do_crq: Unsupported payload format %02x\n",
+                    crq.s.format);
+            break;
+        default:
+            fprintf(stderr, "vscsi_do_crq: Unknown payload format %02x\n",
+                    crq.s.format);
+        }
+        break;
+    default:
+        fprintf(stderr, "vscsi_do_crq: unknown CRQ %02x %02x ...\n",
+                crq.raw[0], crq.raw[1]);
+    }
+
+    return 0;
+}
+
+static int spapr_vscsi_init(VIOsPAPRDevice *dev)
+{
+    VSCSIState *s = DO_UPCAST(VSCSIState, vdev, dev);
+    int i;
+
+    dbg_vscsi_state = s;
+
+    /* Initialize qemu request tags */
+    memset(s->reqs, 0, sizeof(s->reqs));
+    for (i = 0; i < VSCSI_REQ_LIMIT; i++) {
+        s->reqs[i].qtag = i;
+    }
+
+    dev->crq.SendFunc = vscsi_do_crq;
+
+    scsi_bus_new(&s->bus, &dev->qdev, 1, VSCSI_REQ_LIMIT,
+                 vscsi_command_complete);
+    if (!dev->qdev.hotplugged) {
+        scsi_bus_legacy_handle_cmdline(&s->bus);
+    }
+
+    return 0;
+}
+
+void spapr_vscsi_create(VIOsPAPRBus *bus, uint32_t reg,
+                        qemu_irq qirq, uint32_t vio_irq_num)
+{
+    DeviceState *dev;
+    VIOsPAPRDevice *sdev;
+
+    dev = qdev_create(&bus->bus, "spapr-vscsi");
+    qdev_prop_set_uint32(dev, "reg", reg);
+
+    qdev_init_nofail(dev);
+
+    sdev = (VIOsPAPRDevice *)dev;
+    sdev->qirq = qirq;
+    sdev->vio_irq_num = vio_irq_num;
+}
+
+static int spapr_vscsi_devnode(VIOsPAPRDevice *dev, void *fdt, int node_off)
+{
+    int ret;
+
+    ret = fdt_setprop_cell(fdt, node_off, "#address-cells", 2);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = fdt_setprop_cell(fdt, node_off, "#size-cells", 0);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return 0;
+}
+
+static VIOsPAPRDeviceInfo spapr_vscsi = {
+    .init = spapr_vscsi_init,
+    .devnode = spapr_vscsi_devnode,
+    .dt_name = "v-scsi",
+    .dt_type = "vscsi",
+    .dt_compatible = "IBM,v-scsi",
+    .signal_mask = 0x00000001,
+    .qdev.name = "spapr-vscsi",
+    .qdev.size = sizeof(VSCSIState),
+    .qdev.props = (Property[]) {
+        DEFINE_PROP_UINT32("reg", VIOsPAPRDevice, reg, 0x2000),
+        DEFINE_PROP_UINT32("dma-window", VIOsPAPRDevice,
+                           rtce_window_size, 0x10000000),
+        DEFINE_PROP_END_OF_LIST(),
+    },
+};
+
+static void spapr_vscsi_register(void)
+{
+    spapr_vio_bus_register_withprop(&spapr_vscsi);
+}
+device_init(spapr_vscsi_register);
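vscsi_makeup_sense() fills a fixed-format SCSI sense buffer by hand. The standalone sketch below reproduces that byte layout so the offsets are easy to check. It assumes an 18-byte sense length (an 8-byte header plus 10 additional bytes) standing in for SRP_RSP_SENSE_DATA_LEN, whose definition is not shown here, and copies the sense-key values of hw/scsi-defs.h; make_fixed_sense() is a hypothetical helper. Unlike the patch, it clears the buffer itself instead of relying on vscsi_get_req() having zeroed the request.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values: 18-byte fixed-format sense, sense keys as in scsi-defs.h */
#define SENSE_LEN        18
#define HARDWARE_ERROR   0x04
#define ILLEGAL_REQUEST  0x05

/* Hypothetical helper: same byte layout as vscsi_makeup_sense().
 * Returns the number of valid sense bytes. */
static int make_fixed_sense(uint8_t *sense, uint8_t key,
                            uint8_t asc, uint8_t ascq)
{
    memset(sense, 0, SENSE_LEN);
    sense[0] = 0x80 | 0x70;    /* VALID bit + "current error" response code */
    sense[2] = key;            /* sense key */
    sense[7] = SENSE_LEN - 8;  /* additional sense length: bytes after byte 7 */
    sense[12] = asc;           /* additional sense code */
    sense[13] = ascq;          /* additional sense code qualifier */
    return SENSE_LEN;
}
```

For example, the "INVALID FIELD IN CDB" reply used for unknown targets corresponds to make_fixed_sense(buf, ILLEGAL_REQUEST, 0x24, 0x00).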
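vscsi_send_rsp() takes signed residual counts: a negative value is reported as an underrun, a positive one as an overrun, and the magnitude goes into the corresponding count field. A minimal sketch of just that mapping; residual_flags() and residual_count() are hypothetical names, and the flag values are copied from hw/srp.h.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from hw/srp.h */
enum {
    SRP_RSP_FLAG_DOOVER  = 1 << 2,
    SRP_RSP_FLAG_DOUNDER = 1 << 3,
    SRP_RSP_FLAG_DIOVER  = 1 << 4,
    SRP_RSP_FLAG_DIUNDER = 1 << 5
};

/* Flag bits for a pair of signed residuals, as in vscsi_send_rsp():
 * negative means underrun, positive means overrun, zero sets nothing. */
static uint8_t residual_flags(int32_t res_in, int32_t res_out)
{
    uint8_t flags = 0;

    if (res_in < 0) {
        flags |= SRP_RSP_FLAG_DIUNDER;
    } else if (res_in > 0) {
        flags |= SRP_RSP_FLAG_DIOVER;
    }
    if (res_out < 0) {
        flags |= SRP_RSP_FLAG_DOUNDER;
    } else if (res_out > 0) {
        flags |= SRP_RSP_FLAG_DOOVER;
    }
    return flags;
}

/* Magnitude stored in data_in_res_cnt / data_out_res_cnt;
 * widened to 64 bits so negating INT32_MIN cannot overflow. */
static uint32_t residual_count(int32_t res)
{
    return (uint32_t)(res < 0 ? -(int64_t)res : res);
}
```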
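vscsi_send_rsp() derives the response's SOLNT bit from the command's sol_not byte: bit 2 (0x04) applies when the command failed and bit 1 (0x02) when it succeeded. SRP_CMD and SRP_RSP share one IU buffer and both keep sol_not at byte offset 1, so the command byte has to be read before the response is cleared. In this sketch, union iu_sketch is a hypothetical stand-in for union srp_iu, reduced to the fields that matter, and both helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for union srp_iu: both views keep sol_not at offset 1 */
union iu_sketch {
    struct { uint8_t opcode; uint8_t sol_not; uint8_t pad[34]; } cmd;
    struct { uint8_t opcode; uint8_t sol_not; uint8_t pad[34]; } rsp;
};

/* Save the command's request bits, clear the IU, then set SOLNT */
static uint8_t rsp_sol_not(union iu_sketch *iu, int failed)
{
    uint8_t cmd_sol_not = iu->cmd.sol_not;   /* read before the memset */

    memset(iu, 0, sizeof(iu->rsp));
    iu->rsp.sol_not = failed ? (cmd_sol_not & 0x04) >> 2
                             : (cmd_sol_not & 0x02) >> 1;
    return iu->rsp.sol_not;
}

/* For contrast: reading the command byte after the clear always yields 0 */
static uint8_t rsp_sol_not_after_clear(union iu_sketch *iu, int failed)
{
    memset(iu, 0, sizeof(iu->rsp));
    iu->rsp.sol_not = failed ? (iu->cmd.sol_not & 0x04) >> 2
                             : (iu->cmd.sol_not & 0x02) >> 1;
    return iu->rsp.sol_not;
}
```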
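In an SRP_CMD, the upper nibble of buf_fmt describes the data-out buffer and the lower nibble the data-in buffer. The descriptors follow the additional CDB bytes in add_data[], with any data-out descriptor first, which is why vscsi_preprocess_desc() skips data_out_desc_size() bytes for reads. A sketch of that offset arithmetic, with hypothetical helper names and the 16- and 20-byte descriptor sizes taken from hw/srp.h:

```c
#include <assert.h>
#include <stdint.h>

/* Format codes from hw/srp.h */
enum {
    SRP_NO_DATA_DESC       = 0,
    SRP_DATA_DESC_DIRECT   = 1,
    SRP_DATA_DESC_INDIRECT = 2
};

/* Wire sizes from hw/srp.h: struct srp_direct_buf, and the packed
 * struct srp_indirect_buf header without its desc_list */
#define DIRECT_BUF_SIZE   16
#define INDIRECT_HDR_SIZE 20

static unsigned data_out_fmt(uint8_t buf_fmt) { return buf_fmt >> 4; }
static unsigned data_in_fmt(uint8_t buf_fmt)  { return buf_fmt & 0x0f; }

/* Bytes taken by the data-out descriptor(s), like data_out_desc_size() */
static int data_out_bytes(uint8_t buf_fmt, uint8_t data_out_desc_cnt)
{
    switch (data_out_fmt(buf_fmt)) {
    case SRP_DATA_DESC_DIRECT:
        return DIRECT_BUF_SIZE;
    case SRP_DATA_DESC_INDIRECT:
        return INDIRECT_HDR_SIZE + DIRECT_BUF_SIZE * data_out_desc_cnt;
    default:
        return 0;
    }
}

/* Offset in add_data[] of the descriptor for one transfer direction,
 * following the arithmetic in vscsi_preprocess_desc() */
static int desc_offset(uint8_t add_cdb_len, uint8_t buf_fmt,
                       uint8_t data_out_desc_cnt, int writing)
{
    int offset = add_cdb_len & ~3;   /* additional CDB length, low 2 bits masked */

    if (!writing) {
        /* data-in descriptors come after the data-out ones */
        offset += data_out_bytes(buf_fmt, data_out_desc_cnt);
    }
    return offset;
}
```

For instance, a read with buf_fmt 0x22, two data-out descriptors and add_cdb_len 6 finds its indirect descriptor at 4 + 20 + 2 * 16 = 56.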
diff --git a/hw/srp.h b/hw/srp.h
new file mode 100644
index 0000000..9d55fc4
--- /dev/null
+++ b/hw/srp.h
@@ -0,0 +1,241 @@
+/*
+ * Copyright (c) 2005 Cisco Systems.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id$
+ */
+
+#ifndef SCSI_SRP_H
+#define SCSI_SRP_H
+
+/*
+ * Structures and constants for the SCSI RDMA Protocol (SRP) as
+ * defined by the INCITS T10 committee.  This file was written using
+ * draft Revision 16a of the SRP standard.
+ */
+
+enum {
+
+    SRP_LOGIN_REQ = 0x00,
+    SRP_TSK_MGMT  = 0x01,
+    SRP_CMD       = 0x02,
+    SRP_I_LOGOUT  = 0x03,
+    SRP_LOGIN_RSP = 0xc0,
+    SRP_RSP       = 0xc1,
+    SRP_LOGIN_REJ = 0xc2,
+    SRP_T_LOGOUT  = 0x80,
+    SRP_CRED_REQ  = 0x81,
+    SRP_AER_REQ   = 0x82,
+    SRP_CRED_RSP  = 0x41,
+    SRP_AER_RSP   = 0x42
+};
+
+enum {
+    SRP_BUF_FORMAT_DIRECT   = 1 << 1,
+    SRP_BUF_FORMAT_INDIRECT = 1 << 2
+};
+
+enum {
+    SRP_NO_DATA_DESC       = 0,
+    SRP_DATA_DESC_DIRECT   = 1,
+    SRP_DATA_DESC_INDIRECT = 2
+};
+
+enum {
+    SRP_TSK_ABORT_TASK     = 0x01,
+    SRP_TSK_ABORT_TASK_SET = 0x02,
+    SRP_TSK_CLEAR_TASK_SET = 0x04,
+    SRP_TSK_LUN_RESET      = 0x08,
+    SRP_TSK_CLEAR_ACA      = 0x40
+};
+
+enum srp_login_rej_reason {
+    SRP_LOGIN_REJ_UNABLE_ESTABLISH_CHANNEL   = 0x00010000,
+    SRP_LOGIN_REJ_INSUFFICIENT_RESOURCES     = 0x00010001,
+    SRP_LOGIN_REJ_REQ_IT_IU_LENGTH_TOO_LARGE = 0x00010002,
+    SRP_LOGIN_REJ_UNABLE_ASSOCIATE_CHANNEL   = 0x00010003,
+    SRP_LOGIN_REJ_UNSUPPORTED_DESCRIPTOR_FMT = 0x00010004,
+    SRP_LOGIN_REJ_MULTI_CHANNEL_UNSUPPORTED  = 0x00010005,
+    SRP_LOGIN_REJ_CHANNEL_LIMIT_REACHED      = 0x00010006
+};
+
+enum {
+    SRP_REV10_IB_IO_CLASS  = 0xff00,
+    SRP_REV16A_IB_IO_CLASS = 0x0100
+};
+
+struct srp_direct_buf {
+    uint64_t    va;
+    uint32_t    key;
+    uint32_t    len;
+};
+
+/*
+ * We need the packed attribute because the SRP spec puts the list of
+ * descriptors at an offset of 20, which is not aligned to the size of
+ * struct srp_direct_buf.  The whole structure must be packed to avoid
+ * having the 20-byte structure padded to 24 bytes on 64-bit architectures.
+ */
+struct srp_indirect_buf {
+    struct srp_direct_buf    table_desc;
+    uint32_t                 len;
+    struct srp_direct_buf    desc_list[0];
+} __attribute__((packed));
+
+enum {
+    SRP_MULTICHAN_SINGLE = 0,
+    SRP_MULTICHAN_MULTI  = 1
+};
+
+struct srp_login_req {
+    uint8_t    opcode;
+    uint8_t    reserved1[7];
+    uint64_t   tag;
+    uint32_t   req_it_iu_len;
+    uint8_t    reserved2[4];
+    uint16_t   req_buf_fmt;
+    uint8_t    req_flags;
+    uint8_t    reserved3[5];
+    uint8_t    initiator_port_id[16];
+    uint8_t    target_port_id[16];
+};
+
+/*
+ * The SRP spec defines the size of the LOGIN_RSP structure to be 52
+ * bytes, so it needs to be packed to avoid having it padded to 56
+ * bytes on 64-bit architectures.
+ */
+struct srp_login_rsp {
+    uint8_t    opcode;
+    uint8_t    reserved1[3];
+    uint32_t   req_lim_delta;
+    uint64_t   tag;
+    uint32_t   max_it_iu_len;
+    uint32_t   max_ti_iu_len;
+    uint16_t   buf_fmt;
+    uint8_t    rsp_flags;
+    uint8_t    reserved2[25];
+} __attribute__((packed));
+
+struct srp_login_rej {
+    uint8_t    opcode;
+    uint8_t    reserved1[3];
+    uint32_t   reason;
+    uint64_t   tag;
+    uint8_t    reserved2[8];
+    uint16_t   buf_fmt;
+    uint8_t    reserved3[6];
+};
+
+struct srp_i_logout {
+    uint8_t    opcode;
+    uint8_t    reserved[7];
+    uint64_t   tag;
+};
+
+struct srp_t_logout {
+    uint8_t    opcode;
+    uint8_t    sol_not;
+    uint8_t    reserved[2];
+    uint32_t   reason;
+    uint64_t   tag;
+};
+
+/*
+ * We need the packed attribute because the SRP spec only aligns the
+ * 8-byte LUN field to 4 bytes.
+ */
+struct srp_tsk_mgmt {
+    uint8_t    opcode;
+    uint8_t    sol_not;
+    uint8_t    reserved1[6];
+    uint64_t   tag;
+    uint8_t    reserved2[4];
+    uint64_t   lun __attribute__((packed));
+    uint8_t    reserved3[2];
+    uint8_t    tsk_mgmt_func;
+    uint8_t    reserved4;
+    uint64_t   task_tag;
+    uint8_t    reserved5[8];
+};
+
+/*
+ * We need the packed attribute because the SRP spec only aligns the
+ * 8-byte LUN field to 4 bytes.
+ */
+struct srp_cmd {
+    uint8_t    opcode;
+    uint8_t    sol_not;
+    uint8_t    reserved1[3];
+    uint8_t    buf_fmt;
+    uint8_t    data_out_desc_cnt;
+    uint8_t    data_in_desc_cnt;
+    uint64_t   tag;
+    uint8_t    reserved2[4];
+    uint64_t   lun __attribute__((packed));
+    uint8_t    reserved3;
+    uint8_t    task_attr;
+    uint8_t    reserved4;
+    uint8_t    add_cdb_len;
+    uint8_t    cdb[16];
+    uint8_t    add_data[0];
+};
+
+enum {
+    SRP_RSP_FLAG_RSPVALID = 1 << 0,
+    SRP_RSP_FLAG_SNSVALID = 1 << 1,
+    SRP_RSP_FLAG_DOOVER   = 1 << 2,
+    SRP_RSP_FLAG_DOUNDER  = 1 << 3,
+    SRP_RSP_FLAG_DIOVER   = 1 << 4,
+    SRP_RSP_FLAG_DIUNDER  = 1 << 5
+};
+
+/*
+ * The SRP spec defines the size of the RSP structure to be 36 bytes,
+ * so it needs to be packed to avoid having it padded to 40 bytes on
+ * 64-bit architectures.
+ */
+struct srp_rsp {
+    uint8_t    opcode;
+    uint8_t    sol_not;
+    uint8_t    reserved1[2];
+    uint32_t   req_lim_delta;
+    uint64_t   tag;
+    uint8_t    reserved2[2];
+    uint8_t    flags;
+    uint8_t    status;
+    uint32_t   data_out_res_cnt;
+    uint32_t   data_in_res_cnt;
+    uint32_t   sense_data_len;
+    uint32_t   resp_data_len;
+    uint8_t    data[0];
+} __attribute__((packed));
+
+#endif /* SCSI_SRP_H */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 82+ messages in thread
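The packing comments in scsi_srp.h make two concrete layout claims: LOGIN_RSP is exactly 52 bytes on the wire, and in SRP_CMD the 8-byte LUN sits at a 4-byte-aligned offset. Below is a minimal sketch that checks both at compile time. It assumes a GCC- or Clang-compatible compiler (for `__attribute__((packed))`) and C11 `static_assert`. The struct names are hypothetical local copies, not the header's own types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of srp_login_rsp WITHOUT the packed attribute.  On ABIs
 * where uint64_t is 8-byte aligned, sizeof() rounds 52 up to 56. */
struct login_rsp_natural {
    uint8_t  opcode;
    uint8_t  reserved1[3];
    uint32_t req_lim_delta;
    uint64_t tag;
    uint32_t max_it_iu_len;
    uint32_t max_ti_iu_len;
    uint16_t buf_fmt;
    uint8_t  rsp_flags;
    uint8_t  reserved2[25];
};

/* Same fields, packed as in scsi_srp.h: exactly the 52 wire bytes. */
struct login_rsp_wire {
    uint8_t  opcode;
    uint8_t  reserved1[3];
    uint32_t req_lim_delta;
    uint64_t tag;
    uint32_t max_it_iu_len;
    uint32_t max_ti_iu_len;
    uint16_t buf_fmt;
    uint8_t  rsp_flags;
    uint8_t  reserved2[25];
} __attribute__((packed));

/* Fixed part of srp_cmd: only 'lun' is packed, so it lands at byte 20
 * instead of being padded out to byte 24. */
struct cmd_head_wire {
    uint8_t  opcode;
    uint8_t  sol_not;
    uint8_t  reserved1[3];
    uint8_t  buf_fmt;
    uint8_t  data_out_desc_cnt;
    uint8_t  data_in_desc_cnt;
    uint64_t tag;
    uint8_t  reserved2[4];
    uint64_t lun __attribute__((packed));
    uint8_t  reserved3;
    uint8_t  task_attr;
    uint8_t  reserved4;
    uint8_t  add_cdb_len;
    uint8_t  cdb[16];
};

static_assert(sizeof(struct login_rsp_wire) == 52,
              "LOGIN_RSP must be 52 bytes");
static_assert(offsetof(struct cmd_head_wire, lun) == 20,
              "SRP_CMD LUN is only 4-byte aligned");
static_assert(sizeof(struct cmd_head_wire) == 48,
              "SRP_CMD fixed part with a 16-byte CDB is 48 bytes");
```

With these assertions in place, a layout regression fails the build instead of silently corrupting the wire format at run time.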

* [Qemu-devel] [PATCH 25/26] Add a PAPR TCE-bypass mechanism for the pSeries machine
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (23 preceding siblings ...)
  2011-03-16  4:57 ` [Qemu-devel] [PATCH 24/26] Implement PAPR virtual SCSI interface (ibmvscsi) David Gibson
@ 2011-03-16  4:57 ` David Gibson
  2011-03-16 16:43   ` [Qemu-devel] " Alexander Graf
  2011-03-16  4:57 ` [Qemu-devel] [PATCH 26/26] Implement PAPR VPA functions for pSeries shared processor partitions David Gibson
  25 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:57 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

From: Ben Herrenschmidt <benh@kernel.crashing.org>

Usually, PAPR virtual IO devices use a virtual IOMMU mechanism, TCEs
(Translation Control Entries), to mediate all DMA transfers.  While this
is necessary for some sorts of operation, for others it is complex to
program and slow.

This patch implements a mechanism for bypassing TCE translation, treating
"IO" addresses as plain (guest) physical memory addresses.  This has two
main uses:
 * Simple but 64-bit-aware programs, such as firmware, can use the VIO
devices without the complexity of TCE setup.
 * The guest OS can optionally use the TCE bypass to improve performance in
suitable situations.

The mechanism used is a per-device flag which disables TCE translation.
The flag is toggled with the (hypervisor-implemented) "ibm,set-tce-bypass"
RTAS method, and the "quiesce" RTAS method clears it on every device.
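A minimal sketch of the per-device bypass check follows. It assumes a flat array as guest RAM and a TCE table in which each entry holds only a page-aligned guest physical address. Real TCEs also carry permission bits, which this sketch ignores. `struct vio_dev`, `set_bypass()`, `dma_write()` and the sizes are hypothetical, not QEMU's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GUEST_RAM_SIZE  (64u * 1024u)   /* hypothetical guest RAM size */
#define TCE_PAGE_SHIFT  12
#define TCE_PAGE_SIZE   (1u << TCE_PAGE_SHIFT)
#define TCE_ENTRIES     4u              /* hypothetical 16 KiB DMA window */
#define FLAG_DMA_BYPASS 0x1u

static uint8_t guest_ram[GUEST_RAM_SIZE];

struct vio_dev {
    uint32_t flags;
    uint64_t tce[TCE_ENTRIES];  /* IO page number -> guest physical address */
};

/* Toggle bypass for one device, as the RTAS call would. */
static void set_bypass(struct vio_dev *dev, int enable)
{
    if (enable) {
        dev->flags |= FLAG_DMA_BYPASS;
    } else {
        dev->flags &= ~FLAG_DMA_BYPASS;
    }
}

/* Returns 1 if [addr, addr + len) lies inside guest RAM. */
static int ram_range_ok(uint64_t addr, uint32_t len)
{
    return addr <= GUEST_RAM_SIZE && len <= GUEST_RAM_SIZE - addr;
}

/* DMA 'size' bytes from 'buf' to IO address 'ioaddr'; 0 on success. */
static int dma_write(struct vio_dev *dev, uint64_t ioaddr,
                     const void *buf, uint32_t size)
{
    const uint8_t *src = buf;

    assert(dev != NULL);

    if (dev->flags & FLAG_DMA_BYPASS) {
        /* Bypass: the IO address is used directly as a guest physical address. */
        if (!ram_range_ok(ioaddr, size)) {
            return -1;
        }
        memcpy(guest_ram + ioaddr, src, size);
        return 0;
    }

    /* Translated path: walk the TCE table one IO page at a time. */
    while (size) {
        uint64_t page = ioaddr >> TCE_PAGE_SHIFT;
        uint32_t off = (uint32_t)(ioaddr & (TCE_PAGE_SIZE - 1));
        uint32_t chunk = TCE_PAGE_SIZE - off;
        uint64_t gpa;

        if (chunk > size) {
            chunk = size;
        }
        if (page >= TCE_ENTRIES) {
            return -1;
        }
        gpa = dev->tce[page] + off;
        if (!ram_range_ok(gpa, chunk)) {
            return -1;
        }
        memcpy(guest_ram + gpa, src, chunk);
        src += chunk;
        ioaddr += chunk;
        size -= chunk;
    }
    return 0;
}
```

The bypass branch returns before any TCE lookup. As a result, a program that enables it never needs to populate the table, which is the simplification the first use case above relies on.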

Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 hw/spapr_vio.c |   82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/spapr_vio.h |    5 +++
 2 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
index 96668f3..280f34a 100644
--- a/hw/spapr_vio.c
+++ b/hw/spapr_vio.c
@@ -224,6 +224,12 @@ int spapr_tce_dma_write(VIOsPAPRDevice *dev, uint64_t taddr, const void *buf,
             (unsigned long long)taddr, size);
 #endif
 
+    /* Check for bypass */
+    if (dev->flags & VIO_PAPR_FLAG_DMA_BYPASS) {
+        cpu_physical_memory_write(taddr, buf, size);
+        return 0;
+    }
+
     while(size) {
         uint64_t tce;
         uint32_t lsize;
@@ -308,6 +314,12 @@ int spapr_tce_dma_read(VIOsPAPRDevice *dev, uint64_t taddr, void *buf,
             (unsigned long long)taddr, size);
 #endif
 
+    /* Check for bypass */
+    if (dev->flags & VIO_PAPR_FLAG_DMA_BYPASS) {
+        cpu_physical_memory_read(taddr, buf, size);
+        return 0;
+    }
+
     while(size) {
         uint64_t tce;
         uint32_t lsize;
@@ -505,6 +517,72 @@ int spapr_vio_send_crq(VIOsPAPRDevice *dev, uint8_t *crq)
     return 0;
 }
 
+/* "quiesce" handling */
+
+static void spapr_vio_quiesce_one(VIOsPAPRDevice *dev)
+{
+    dev->flags &= ~VIO_PAPR_FLAG_DMA_BYPASS;
+
+    if (dev->rtce_table) {
+        size_t size = (dev->rtce_window_size >> SPAPR_VIO_TCE_PAGE_SHIFT)
+            * sizeof(VIOsPAPR_RTCE);
+        memset(dev->rtce_table, 0, size);
+    }
+
+    dev->crq.qladdr = 0;
+    dev->crq.qsize = 0;
+    dev->crq.qnext = 0;
+}
+
+static void rtas_set_tce_bypass(sPAPREnvironment *spapr, uint32_t token,
+                                uint32_t nargs, target_ulong args,
+                                uint32_t nret, target_ulong rets)
+{
+    VIOsPAPRBus *bus = spapr->vio_bus;
+    VIOsPAPRDevice *dev;
+    uint32_t unit, enable;
+
+    if (nargs != 2) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+    unit = rtas_ld(args, 0);
+    enable = rtas_ld(args, 1);
+    dev = spapr_vio_find_by_reg(bus, unit);
+    if (!dev) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+    if (enable) {
+        dev->flags |= VIO_PAPR_FLAG_DMA_BYPASS;
+    } else {
+        dev->flags &= ~VIO_PAPR_FLAG_DMA_BYPASS;
+    }
+
+    rtas_st(rets, 0, 0);
+}
+
+static void rtas_quiesce(sPAPREnvironment *spapr, uint32_t token,
+                         uint32_t nargs, target_ulong args,
+                         uint32_t nret, target_ulong rets)
+{
+    VIOsPAPRBus *bus = spapr->vio_bus;
+    DeviceState *qdev;
+    VIOsPAPRDevice *dev = NULL;
+
+    if (nargs != 0) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
+        dev = (VIOsPAPRDevice *)qdev;
+        spapr_vio_quiesce_one(dev);
+    }
+
+    rtas_st(rets, 0, 0);
+}
+
 static int spapr_vio_busdev_init(DeviceState *dev, DeviceInfo *info)
 {
     VIOsPAPRDeviceInfo *_info = (VIOsPAPRDeviceInfo *)info;
@@ -581,6 +659,10 @@ VIOsPAPRBus *spapr_vio_bus_init(void)
     spapr_register_hypercall(H_SEND_CRQ, h_send_crq);
     spapr_register_hypercall(H_ENABLE_CRQ, h_enable_crq);
 
+    /* RTAS calls */
+    spapr_rtas_register("ibm,set-tce-bypass", rtas_set_tce_bypass);
+    spapr_rtas_register("quiesce", rtas_quiesce);
+
     for (_info = device_info_list; _info; _info = _info->next) {
         VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)_info;
 
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
index b7d0daa..841b043 100644
--- a/hw/spapr_vio.h
+++ b/hw/spapr_vio.h
@@ -48,6 +48,8 @@ typedef struct VIOsPAPR_CRQ {
 typedef struct VIOsPAPRDevice {
     DeviceState qdev;
     uint32_t reg;
+    uint32_t flags;
+#define VIO_PAPR_FLAG_DMA_BYPASS        0x1
     qemu_irq qirq;
     uint32_t vio_irq_num;
     target_ulong signal_state;
@@ -104,4 +106,7 @@ void spapr_vlan_create(VIOsPAPRBus *bus, uint32_t reg, NICInfo *nd,
 void spapr_vscsi_create(VIOsPAPRBus *bus, uint32_t reg,
                         qemu_irq qirq, uint32_t vio_irq_num);
 
+int spapr_tce_set_bypass(uint32_t unit, uint32_t enable);
+void spapr_vio_quiesce(void);
+
 #endif /* _HW_SPAPR_VIO_H */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 82+ messages in thread
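rtas_set_tce_bypass() above treats a NULL return from spapr_vio_find_by_reg() as "no such unit". As posted in patch 14 (quoted further down this thread), however, that lookup's loop leaves `dev` pointing at the last device on the bus when nothing matches. An unknown unit number would therefore silently select some other device. Below is a minimal sketch of a lookup that returns only a real match. It uses a hypothetical singly linked device list instead of QEMU's QLIST macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vio_node {                 /* hypothetical stand-in for VIOsPAPRDevice */
    uint32_t reg;
    struct vio_node *next;
};

/* Return the device whose reg matches, or NULL if there is none.
 * Returning from inside the loop avoids leaking the last-visited node. */
static struct vio_node *find_by_reg(struct vio_node *head, uint32_t reg)
{
    struct vio_node *dev;

    for (dev = head; dev != NULL; dev = dev->next) {
        if (dev->reg == reg) {
            return dev;
        }
    }
    return NULL;
}
```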

* [Qemu-devel] [PATCH 26/26] Implement PAPR VPA functions for pSeries shared processor partitions
  2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
                   ` (24 preceding siblings ...)
  2011-03-16  4:57 ` [Qemu-devel] [PATCH 25/26] Add a PAPR TCE-bypass mechanism for the pSeries machine David Gibson
@ 2011-03-16  4:57 ` David Gibson
  25 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-16  4:57 UTC (permalink / raw)
  To: agraf, qemu-devel; +Cc: paulus, anton

Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition.  qemu-emulated partitions, since they are simply scheduled as
part of the qemu user process, behave mostly like shared processor partitions.

In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions.  There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.

A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.

Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE
hypercall, which allows a partition OS to give up its shared processor
timeslices to other partitions when idle.

This patch implements the H_REGISTER_VPA and H_CEDE hypercalls in qemu.
We don't implement any of the more advanced statistics which can be
communicated
through the VPA.  However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
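The FLAGS_* constants and the register_vpa() checks in this patch can be restated compactly. In this minimal sketch, the subfunction decoding is an observation about the constants' bit pattern: each is a 3-bit value at bit 45, and each deregister value is the matching register value with bit 47 set. The enum and function names are hypothetical, and the sketch collapses the patch's distinct error codes into a yes/no answer.

```c
#include <assert.h>
#include <stdint.h>

/* The FLAGS_* values encode a 3-bit subfunction in bits 45..47 (LSB-0). */
#define VPA_SUBFUNC_SHIFT 45
#define VPA_SUBFUNC_MASK  0x7ULL

enum vpa_subfunc {                 /* hypothetical names */
    SUBFUNC_REG_VPA = 1, SUBFUNC_REG_DTL = 2, SUBFUNC_REG_SLBSHADOW = 3,
    SUBFUNC_DEREG_VPA = 5, SUBFUNC_DEREG_DTL = 6, SUBFUNC_DEREG_SLBSHADOW = 7,
};

static unsigned vpa_subfunc(uint64_t flags)
{
    return (unsigned)((flags >> VPA_SUBFUNC_SHIFT) & VPA_SUBFUNC_MASK);
}

/* Mirrors register_vpa()'s checks: non-zero address, cache-line
 * alignment, size of at least 640 bytes (VPA_MIN_SIZE), and no crossing
 * of a 4 KiB page boundary.  Returns 1 if the area is acceptable. */
static int vpa_area_ok(uint64_t addr, uint16_t size, uint64_t dcache_line)
{
    assert(dcache_line != 0);
    if (addr == 0 || addr % dcache_line != 0) {
        return 0;
    }
    if (size < 640) {
        return 0;
    }
    return (addr / 4096) == ((addr + size - 1) / 4096);
}
```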

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 hw/spapr.c       |    2 +-
 hw/spapr_hcall.c |  192 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 target-ppc/cpu.h |    4 +
 3 files changed, 197 insertions(+), 1 deletions(-)

diff --git a/hw/spapr.c b/hw/spapr.c
index 5f868fc..76bdd36 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -65,7 +65,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
     uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
     char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
-        "\0hcall-tce\0hcall-vio";
+        "\0hcall-tce\0hcall-vio\0hcall-splpar";
     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
     int i;
     char *modelname;
diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
index 0ff83c9..6f65655 100644
--- a/hw/spapr_hcall.c
+++ b/hw/spapr_hcall.c
@@ -4,6 +4,8 @@
 #include "sysemu.h"
 #include "qemu-char.h"
 #include "exec-all.h"
+#include "exec.h"
+#include "helper_regs.h"
 #include "hw/spapr.h"
 
 #define HPTES_PER_GROUP 8
@@ -248,6 +250,192 @@ static target_ulong h_set_dabr(CPUState *env, sPAPREnvironment *spapr,
     return H_HARDWARE;
 }
 
+#define FLAGS_REGISTER_VPA         0x0000200000000000ULL
+#define FLAGS_REGISTER_DTL         0x0000400000000000ULL
+#define FLAGS_REGISTER_SLBSHADOW   0x0000600000000000ULL
+#define FLAGS_DEREGISTER_VPA       0x0000a00000000000ULL
+#define FLAGS_DEREGISTER_DTL       0x0000c00000000000ULL
+#define FLAGS_DEREGISTER_SLBSHADOW 0x0000e00000000000ULL
+
+#define VPA_MIN_SIZE           640
+#define VPA_SIZE_OFFSET        0x4
+#define VPA_SHARED_PROC_OFFSET 0x9
+#define VPA_SHARED_PROC_VAL    0x2
+
+static target_ulong register_vpa(CPUState *env, target_ulong vpa)
+{
+    uint16_t size;
+    uint8_t tmp;
+
+    if (vpa == 0) {
+        fprintf(stderr, "Can't cope with registering a VPA at logical 0\n");
+        return H_HARDWARE;
+    }
+
+    if (vpa % env->dcache_line_size) {
+        return H_PARAMETER;
+    }
+    /* FIXME: bounds check the address */
+
+    size = lduw_phys(vpa + VPA_SIZE_OFFSET);
+
+    if (size < VPA_MIN_SIZE) {
+        return H_PARAMETER;
+    }
+
+    /* VPA is not allowed to cross a page boundary */
+    if ((vpa / 4096) != ((vpa + size - 1) / 4096)) {
+        return H_PARAMETER;
+    }
+
+    env->vpa = vpa;
+
+    tmp = ldub_phys(env->vpa + VPA_SHARED_PROC_OFFSET);
+    tmp |= VPA_SHARED_PROC_VAL;
+    stb_phys(env->vpa + VPA_SHARED_PROC_OFFSET, tmp);
+
+    return H_SUCCESS;
+}
+
+static target_ulong deregister_vpa(CPUState *env, target_ulong vpa)
+{
+    if (env->slb_shadow) {
+        return H_RESOURCE;
+    }
+
+    if (env->dispatch_trace_log) {
+        return H_RESOURCE;
+    }
+
+    env->vpa = 0;
+    return H_SUCCESS;
+}
+
+static target_ulong register_slb_shadow(CPUState *env, target_ulong addr)
+{
+    uint32_t size;
+
+    if (addr == 0) {
+        fprintf(stderr, "Can't cope with SLB shadow at logical 0\n");
+        return H_HARDWARE;
+    }
+
+    size = ldl_phys(addr + 0x4);
+    if (size < 0x8) {
+        return H_PARAMETER;
+    }
+
+    if ((addr / 4096) != ((addr + size - 1) / 4096)) {
+        return H_PARAMETER;
+    }
+
+    if (!env->vpa) {
+        return H_RESOURCE;
+    }
+
+    env->slb_shadow = addr;
+
+    return H_SUCCESS;
+}
+
+static target_ulong deregister_slb_shadow(CPUState *env, target_ulong addr)
+{
+    env->slb_shadow = 0;
+    return H_SUCCESS;
+}
+
+static target_ulong register_dtl(CPUState *env, target_ulong addr)
+{
+    uint32_t size;
+
+    if (addr == 0) {
+        fprintf(stderr, "Can't cope with DTL at logical 0\n");
+        return H_HARDWARE;
+    }
+
+    size = ldl_phys(addr + 0x4);
+
+    if (size < 48) {
+        return H_PARAMETER;
+    }
+
+    if (!env->vpa) {
+        return H_RESOURCE;
+    }
+
+    env->dispatch_trace_log = addr;
+    env->dtl_size = size;
+
+    return H_SUCCESS;
+}
+
+static target_ulong deregister_dtl(CPUState *env, target_ulong addr)
+{
+    env->dispatch_trace_log = 0;
+    env->dtl_size = 0;
+
+    return H_SUCCESS;
+}
+
+static target_ulong h_register_vpa(CPUState *env, sPAPREnvironment *spapr,
+                                   target_ulong opcode, target_ulong *args)
+{
+    target_ulong flags = args[0];
+    target_ulong procno = args[1];
+    target_ulong vpa = args[2];
+    target_ulong ret = H_PARAMETER;
+    CPUState *tenv;
+
+    for (tenv = first_cpu; tenv; tenv = tenv->next_cpu) {
+        if (tenv->cpu_index == procno) {
+            break;
+        }
+    }
+
+    if (!tenv) {
+        return H_PARAMETER;
+    }
+
+    switch (flags) {
+    case FLAGS_REGISTER_VPA:
+        ret = register_vpa(tenv, vpa);
+        break;
+
+    case FLAGS_DEREGISTER_VPA:
+        ret = deregister_vpa(tenv, vpa);
+        break;
+
+    case FLAGS_REGISTER_SLBSHADOW:
+        ret = register_slb_shadow(tenv, vpa);
+        break;
+
+    case FLAGS_DEREGISTER_SLBSHADOW:
+        ret = deregister_slb_shadow(tenv, vpa);
+        break;
+
+    case FLAGS_REGISTER_DTL:
+        ret = register_dtl(tenv, vpa);
+        break;
+
+    case FLAGS_DEREGISTER_DTL:
+        ret = deregister_dtl(tenv, vpa);
+        break;
+    }
+
+    return ret;
+}
+
+static target_ulong h_cede(CPUState *env, sPAPREnvironment *spapr,
+                           target_ulong opcode, target_ulong *args)
+{
+    env->msr |= (1ULL << MSR_EE);
+    hreg_compute_hflags(env);
+    if (!cpu_has_work(env)) {
+        env->halted = 1;
+    }
+    return H_SUCCESS;
+}
+
 static target_ulong h_rtas(sPAPREnvironment *spapr, target_ulong rtas_r3)
 {
     uint32_t token = ldl_phys(rtas_r3);
@@ -311,5 +499,9 @@ static void hypercall_init(void)
 
     /* hcall-dabr */
     spapr_register_hypercall(H_SET_DABR, h_set_dabr);
+
+    /* hcall-splpar */
+    spapr_register_hypercall(H_REGISTER_VPA, h_register_vpa);
+    spapr_register_hypercall(H_CEDE, h_cede);
 }
 device_init(hypercall_init);
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 29d6b49..867a2d8 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -722,6 +722,10 @@ struct CPUPPCState {
     uint64_t insns_flags;
     void (*emulate_hypercall)(CPUState *, void *);
     void *hcall_opaque;
+    target_phys_addr_t vpa;
+    target_phys_addr_t slb_shadow;
+    target_phys_addr_t dispatch_trace_log;
+    uint32_t dtl_size;
 
     int error_code;
     uint32_t pending_interrupts;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC David Gibson
@ 2011-03-16 13:46   ` Alexander Graf
  2011-03-16 16:58     ` Stefan Hajnoczi
  2011-03-16 20:44   ` [Qemu-devel] " Anthony Liguori
  1 sibling, 1 reply; 82+ messages in thread
From: Alexander Graf @ 2011-03-16 13:46 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 03/16/2011 05:56 AM, David Gibson wrote:
> From: David Gibson <dwg@au1.ibm.com>
>
> PowerPC and POWER chips since the POWER4 and 970 have a special
> hypervisor mode, and a corresponding form of the system call
> instruction which traps to the hypervisor.
>
> qemu currently has stub implementations of hypervisor mode.  That
> is, the outline is there to allow qemu to run a PowerPC hypervisor
> under emulation.  There are a number of details missing so this
> won't actually work at present, but the idea is there.
>
> What there is no provision for at all is for qemu to instead emulate
> the hypervisor itself.  That is, to have hypercalls trap into qemu
> and their result be emulated from qemu, rather than running
> hypervisor code within the emulated system.
>
> Hypervisor-hardware-aware KVM implementations are in the works, and
> it would be useful for debugging and development to also allow
> full emulation of the same para-virtualized guests as such a KVM.
>
> Therefore, this patch adds a hook which will allow a machine to
> set up emulation of hypervisor calls.
>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   target-ppc/cpu.h    |    2 ++
>   target-ppc/helper.c |    4 ++++
>   2 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index a20c132..eaddc27 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -692,6 +692,8 @@ struct CPUPPCState {
>       int bfd_mach;
>       uint32_t flags;
>       uint64_t insns_flags;
> +    void (*emulate_hypercall)(CPUState *, void *);
> +    void *hcall_opaque;
>
>       int error_code;
>       uint32_t pending_interrupts;
> diff --git a/target-ppc/helper.c b/target-ppc/helper.c
> index 2094ca3..19aa067 100644
> --- a/target-ppc/helper.c
> +++ b/target-ppc/helper.c
> @@ -2152,6 +2152,10 @@ static inline void powerpc_excp(CPUState *env, int excp_model, int excp)
>       case POWERPC_EXCP_SYSCALL:   /* System call exception                    */
>           dump_syscall(env);
>           lev = env->error_code;
> +	if ((lev == 1)&&  env->emulate_hypercall) {
> +	    env->emulate_hypercall(env, env->hcall_opaque);
> +	    return;
> +	}	

Tabs! Please go through all your patches and make sure there are no tabs 
in there :(.


Alex

^ permalink raw reply	[flat|nested] 82+ messages in thread
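The hook in the patch quoted above amounts to a per-CPU function pointer. The system-call exception path consults it when the `sc` instruction's LEV field is 1. Below is a minimal sketch of that dispatch, with hypothetical types standing in for CPUPPCState and powerpc_excp(). A counter replaces real exception delivery.

```c
#include <assert.h>
#include <stddef.h>

struct cpu;                                       /* hypothetical CPU state */
typedef void (*hcall_hook_fn)(struct cpu *, void *);

struct cpu {
    int lev;                          /* LEV field of the trapping sc */
    hcall_hook_fn emulate_hypercall;  /* installed by the machine, or NULL */
    void *hcall_opaque;
    int guest_syscalls;               /* counts normal exception deliveries */
};

/* sc 1 goes to the emulated hypervisor if a machine has installed a
 * hook; everything else is delivered to the guest as usual. */
static void syscall_exception(struct cpu *env)
{
    assert(env != NULL);
    if (env->lev == 1 && env->emulate_hypercall != NULL) {
        env->emulate_hypercall(env, env->hcall_opaque);
        return;
    }
    env->guest_syscalls++;            /* stand-in for normal delivery */
}

/* Example hook: count emulated hypercalls through the opaque pointer. */
static void count_hcall(struct cpu *env, void *opaque)
{
    (void)env;
    ++*(int *)opaque;
}
```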

* [Qemu-devel] Re: [PATCH 13/26] Start implementing pSeries logical partition machine
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 13/26] Start implementing pSeries logical partition machine David Gibson
@ 2011-03-16 14:30   ` Alexander Graf
  2011-03-16 21:59   ` [Qemu-devel] " Anthony Liguori
  1 sibling, 0 replies; 82+ messages in thread
From: Alexander Graf @ 2011-03-16 14:30 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 03/16/2011 05:56 AM, David Gibson wrote:
> This patch adds a "pseries" machine to qemu.  This aims to emulate a
> logical partition on an IBM pSeries machine, compliant to the
> "PowerPC Architecture Platform Requirements" (PAPR) document.
>
> This initial version is quite limited: it implements a basic machine
> and PAPR hypercall emulation.  So far only one hypercall is present -
> H_PUT_TERM_CHAR - so that a (write-only) console is available.
>
> Multiple CPUs are permitted, with SMP entry handled kexec() style.
>
> So far the machine resembles an old POWER4-style "full system
> partition" more than a modern LPAR, in that the guest manages the
> page tables directly, rather than via hypercalls.
>
> The machine requires qemu to be configured with --enable-fdt.  The
> machine can (so far) only be booted with -kernel - i.e. no partition
> firmware is provided.
>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   Makefile.target  |    2 +
>   hw/spapr.c       |  314 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/spapr.h       |  246 ++++++++++++++++++++++++++++++++++++++++++
>   hw/spapr_hcall.c |   43 ++++++++
>   4 files changed, 605 insertions(+), 0 deletions(-)
>   create mode 100644 hw/spapr.c
>   create mode 100644 hw/spapr.h
>   create mode 100644 hw/spapr_hcall.c
>
> diff --git a/Makefile.target b/Makefile.target
> index f0df98e..e6a7557 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -231,6 +231,8 @@ obj-ppc-y += ppc_prep.o
>   obj-ppc-y += ppc_oldworld.o
>   # NewWorld PowerMac
>   obj-ppc-y += ppc_newworld.o
> +# IBM pSeries (sPAPR)
> +obj-ppc-y += spapr.o spapr_hcall.o
>   # PowerPC 4xx boards
>   obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>   obj-ppc-y += ppc440.o ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
>

[snip]

> +#endif /* !defined (__HW_SPAPR_H__) */
> diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
> new file mode 100644
> index 0000000..6ddac00
> --- /dev/null
> +++ b/hw/spapr_hcall.c

Missing license

> @@ -0,0 +1,43 @@
> +#include "sysemu.h"
> +#include "cpu.h"
> +#include "qemu-char.h"
> +#include "hw/spapr.h"
> +
> +struct hypercall {
> +    spapr_hcall_fn fn;
> +} hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
> +
> +void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn)
> +{
> +    struct hypercall *hc;
> +
> +    assert(opcode <= MAX_HCALL_OPCODE);
> +    assert((opcode & 0x3) == 0);
> +
> +    hc = hypercall_table + (opcode / 4);
> +
> +    assert(!hc->fn || (fn == hc->fn));
> +
> +    hc->fn = fn;
> +}
> +
> +target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
> +                             target_ulong opcode, target_ulong *args)
> +{
> +    if (msr_pr) {
> +        fprintf(stderr, "Hypercall made with MSR=0x" TARGET_FMT_lx "\n",
> +                env->msr);
> +        return H_PRIVILEGE;
> +    }
> +
> +    if ((opcode <= MAX_HCALL_OPCODE)
> +        && ((opcode & 0x3) == 0)) {
> +        struct hypercall *hc = hypercall_table + (opcode / 4);
> +
> +        if (hc->fn)

Braces

> +            return hc->fn(env, spapr, opcode, args);
> +    }
> +
> +    fprintf(stderr, "Unimplemented hcall 0x" TARGET_FMT_lx "\n", opcode);
> +    return H_FUNCTION;
> +}

^ permalink raw reply	[flat|nested] 82+ messages in thread
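The hypercall table in the quoted spapr_hcall.c is indexed by opcode / 4, because PAPR hcall opcodes are multiples of 4. Below is a minimal sketch of the same registration and dispatch scheme, written with the braces the review asks for. The value of MAX_HCALL_OPCODE, the value of H_FUNCTION, and the two-argument handler signature are assumptions, simplified from the patch's spapr_hcall_fn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HCALL_OPCODE 0x2f0   /* hypothetical upper bound */
#define H_FUNCTION       (-2L)   /* assumed "function not supported" code */

typedef long (*hcall_fn)(uint64_t opcode, uint64_t *args);

/* Opcodes are multiples of 4, so opcode / 4 is a dense table index. */
static hcall_fn hcall_table[MAX_HCALL_OPCODE / 4 + 1];

static void register_hcall(uint64_t opcode, hcall_fn fn)
{
    assert(opcode <= MAX_HCALL_OPCODE);
    assert((opcode & 0x3) == 0);
    /* Re-registering the same handler is allowed; a different one is a bug. */
    assert(!hcall_table[opcode / 4] || hcall_table[opcode / 4] == fn);
    hcall_table[opcode / 4] = fn;
}

static long hypercall(uint64_t opcode, uint64_t *args)
{
    if (opcode <= MAX_HCALL_OPCODE && (opcode & 0x3) == 0) {
        hcall_fn fn = hcall_table[opcode / 4];

        if (fn) {
            return fn(opcode, args);
        }
    }
    return H_FUNCTION;
}

/* Example handler (hypothetical): returns its first argument. */
static long echo_first_arg(uint64_t opcode, uint64_t *args)
{
    (void)opcode;
    return (long)args[0];
}
```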

* [Qemu-devel] Re: [PATCH 14/26] Implement the bus structure for PAPR virtual IO
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 14/26] Implement the bus structure for PAPR virtual IO David Gibson
@ 2011-03-16 14:43   ` Alexander Graf
  2011-03-16 22:04   ` [Qemu-devel] " Anthony Liguori
  1 sibling, 0 replies; 82+ messages in thread
From: Alexander Graf @ 2011-03-16 14:43 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 03/16/2011 05:56 AM, David Gibson wrote:
> This extends the "pseries" (PAPR) machine to include a virtual IO bus
> supporting the PAPR defined hypercall based virtual IO mechanisms.
>
> So far only one VIO device is provided, the vty / vterm, providing
> a full console (polled only, for now).
>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   Makefile.target |    3 +-
>   hw/spapr.c      |   47 ++++++++-----
>   hw/spapr.h      |    3 +
>   hw/spapr_vio.c  |  212 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/spapr_vio.h  |   50 +++++++++++++
>   hw/spapr_vty.c  |  145 +++++++++++++++++++++++++++++++++++++
>   6 files changed, 441 insertions(+), 19 deletions(-)
>   create mode 100644 hw/spapr_vio.c
>   create mode 100644 hw/spapr_vio.h
>   create mode 100644 hw/spapr_vty.c
>
> diff --git a/Makefile.target b/Makefile.target
> index e6a7557..3f2b235 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -232,7 +232,8 @@ obj-ppc-y += ppc_oldworld.o
>   # NewWorld PowerMac
>   obj-ppc-y += ppc_newworld.o
>   # IBM pSeries (sPAPR)
> -obj-ppc-y += spapr.o spapr_hcall.o
> +obj-ppc-y += spapr.o spapr_hcall.o spapr_vio.o
> +obj-ppc-y += spapr_vty.o
>   # PowerPC 4xx boards
>   obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>   obj-ppc-y += ppc440.o ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> index 8b4e16e..25e4a9e 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -25,7 +25,6 @@
>    *
>    */
>   #include "sysemu.h"
> -#include "qemu-char.h"
>   #include "hw.h"
>   #include "elf.h"
>
> @@ -34,6 +33,7 @@
>   #include "hw/loader.h"
>
>   #include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
>
>   #include <libfdt.h>
>
> @@ -58,6 +58,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>       uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
>       int i;
>       char *modelname;
> +    int ret;
>
>   #define _FDT(exp) \
>       do { \
> @@ -152,9 +153,29 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>
>       _FDT((fdt_end_node(fdt)));
>
> +    /* vdevice */
> +    _FDT((fdt_begin_node(fdt, "vdevice")));
> +
> +    _FDT((fdt_property_string(fdt, "device_type", "vdevice")));
> +    _FDT((fdt_property_string(fdt, "compatible", "IBM,vdevice")));
> +    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
> +    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
> +
> +    _FDT((fdt_end_node(fdt)));
> +
>       _FDT((fdt_end_node(fdt))); /* close root node */
>       _FDT((fdt_finish(fdt)));
>
> +    /* re-expand to allow for further tweaks */
> +    _FDT((fdt_open_into(fdt, fdt, FDT_MAX_SIZE)));
> +
> +    ret = spapr_populate_vdevice(spapr->vio_bus, fdt);
> +    if (ret < 0) {
> +        fprintf(stderr, "couldn't setup vio devices in fdt\n");
> +    }
> +
> +    _FDT((fdt_pack(fdt)));
> +
>       if (fdt_size) {
>           *fdt_size = fdt_totalsize(fdt);
>       }
> @@ -173,21 +194,6 @@ static void emulate_spapr_hypercall(CPUState *env, void *opaque)
>                                     env->gpr[3], &env->gpr[4]);
>   }
>
> -/* FIXME: hack until we implement the proper VIO console */
> -static target_ulong h_put_term_char(CPUState *env, sPAPREnvironment *spapr,
> -                                    target_ulong opcode, target_ulong *args)
> -{
> -    uint8_t buf[16];
> -
> -    stq_p(buf, args[2]);
> -    stq_p(buf + 8, args[3]);
> -
> -    qemu_chr_write(serial_hds[0], buf, args[1]);
> -
> -    return 0;
> -}
> -
> -
>   /* pSeries LPAR / sPAPR hardware init */
>   static void ppc_spapr_init(ram_addr_t ram_size,
>                              const char *boot_device,
> @@ -242,7 +248,13 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>       ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
>       cpu_register_physical_memory(0, ram_size, ram_offset);
>
> -    spapr_register_hypercall(H_PUT_TERM_CHAR, h_put_term_char);
> +    spapr->vio_bus = spapr_vio_bus_init();
> +
> +    for (i = 0; i < MAX_SERIAL_PORTS; i++) {
> +        if (serial_hds[i]) {
> +            spapr_vty_create(spapr->vio_bus, i, serial_hds[i]);
> +        }
> +    }
>
>       if (kernel_filename) {
>           uint64_t lowaddr = 0;
> @@ -274,7 +286,6 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>               initrd_base = 0;
>               initrd_size = 0;
>           }
> -
>       } else {
>           fprintf(stderr, "pSeries machine needs -kernel for now");
>           exit(1);
> diff --git a/hw/spapr.h b/hw/spapr.h
> index 9e63a19..47bf2ef 100644
> --- a/hw/spapr.h
> +++ b/hw/spapr.h
> @@ -1,7 +1,10 @@
>   #if !defined (__HW_SPAPR_H__)
>   #define __HW_SPAPR_H__
>
> +struct VIOsPAPRBus;
> +
>   typedef struct sPAPREnvironment {
> +    struct VIOsPAPRBus *vio_bus;
>   } sPAPREnvironment;
>
>   #define H_SUCCESS         0
> diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
> new file mode 100644
> index 0000000..0ed63f4
> --- /dev/null
> +++ b/hw/spapr_vio.c
> @@ -0,0 +1,212 @@
> +/*
> + * QEMU sPAPR VIO code
> + *
> + * Copyright (c) 2010 David Gibson, IBM Corporation <david@gibson.dropbear.id.au>
> + * Based on the s390 virtio bus code:
> + * Copyright (c) 2009 Alexander Graf <agraf@suse.de>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "hw.h"
> +#include "sysemu.h"
> +#include "boards.h"
> +#include "monitor.h"
> +#include "loader.h"
> +#include "elf.h"
> +#include "hw/sysbus.h"
> +#include "kvm.h"
> +#include "device_tree.h"
> +
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +
> +#ifdef CONFIG_FDT
> +#include <libfdt.h>
> +#endif /* CONFIG_FDT */
> +
> +/* #define DEBUG_SPAPR */
> +
> +#ifdef DEBUG_SPAPR
> +#define dprintf(fmt, ...) \
> +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define dprintf(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
> +static struct BusInfo spapr_vio_bus_info = {
> +    .name       = "spapr-vio",
> +    .size       = sizeof(VIOsPAPRBus),
> +};
> +
> +VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg)
> +{
> +    DeviceState *qdev;
> +    VIOsPAPRDevice *dev = NULL;
> +
> +    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
> +        dev = (VIOsPAPRDevice *)qdev;
> +        if (dev->reg == reg) {
> +            break;
> +        }
> +    }
> +
> +    return dev;
> +}
> +
> +#ifdef CONFIG_FDT
> +static int vio_make_devnode(VIOsPAPRDevice *dev,
> +                            void *fdt)
> +{
> +    VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)dev->qdev.info;
> +    int vdevice_off, node_off;
> +    int ret;
> +
> +    vdevice_off = fdt_path_offset(fdt, "/vdevice");
> +    if (vdevice_off < 0) {
> +        return vdevice_off;
> +    }
> +
> +    node_off = fdt_add_subnode(fdt, vdevice_off, dev->qdev.id);
> +    if (node_off < 0) {
> +        return node_off;
> +    }
> +
> +    ret = fdt_setprop_cell(fdt, node_off, "reg", dev->reg);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    if (info->dt_type) {
> +        ret = fdt_setprop_string(fdt, node_off, "device_type",
> +                                 info->dt_type);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
> +    if (info->dt_compatible) {
> +        ret = fdt_setprop_string(fdt, node_off, "compatible",
> +                                 info->dt_compatible);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
> +    if (info->devnode) {
> +        ret = (info->devnode)(dev, fdt, node_off);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
> +    return node_off;
> +}
> +#endif /* CONFIG_FDT */
> +
> +static int spapr_vio_busdev_init(DeviceState *dev, DeviceInfo *info)
> +{
> +    VIOsPAPRDeviceInfo *_info = (VIOsPAPRDeviceInfo *)info;
> +    VIOsPAPRDevice *_dev = (VIOsPAPRDevice *)dev;
> +    char *id;
> +
> +    if (asprintf(&id, "%s@%x", _info->dt_name, _dev->reg) < 0) {
> +        return -1;
> +    }
> +
> +    _dev->qdev.id = id;

asprintf only guarantees existence during the scope it's called in. So 
id is not guaranteed to actually be real data when it's used later on, 
no? I'm not even sure it'd work if init would strdup it - the
compiler could optimize the function call below away into a real branch.

> +
> +    return _info->init(_dev);
> +}
> +
> +void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info)
> +{
> +    info->qdev.init = spapr_vio_busdev_init;
> +    info->qdev.bus_info = &spapr_vio_bus_info;
> +
> +    assert(info->qdev.size >= sizeof(VIOsPAPRDevice));
> +    qdev_register(&info->qdev);
> +}
> +
> +VIOsPAPRBus *spapr_vio_bus_init(void)
> +{
> +    VIOsPAPRBus *bus;
> +    BusState *_bus;
> +    DeviceState *dev;
> +    DeviceInfo *_info;
> +
> +    /* Create bridge device */
> +    dev = qdev_create(NULL, "spapr-vio-bridge");
> +    qdev_init_nofail(dev);
> +
> +    /* Create bus on bridge device */
> +
> +    _bus = qbus_create(&spapr_vio_bus_info, dev, "spapr-vio");
> +    bus = DO_UPCAST(VIOsPAPRBus, bus, _bus);
> +
> +    for (_info = device_info_list; _info; _info = _info->next) {
> +        VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)_info;
> +
> +        if (_info->bus_info != &spapr_vio_bus_info)

Braces

> +            continue;
> +
> +        if (info->hcalls)

Braces

> +            info->hcalls(bus);
> +    }
> +
> +    return bus;
> +}
> +
> +/* Represents sPAPR hcall VIO devices */
> +
> +static int spapr_vio_bridge_init(SysBusDevice *dev)
> +{
> +    /* nothing */
> +    return 0;
> +}
> +
> +static SysBusDeviceInfo spapr_vio_bridge_info = {
> +    .init = spapr_vio_bridge_init,
> +    .qdev.name  = "spapr-vio-bridge",
> +    .qdev.size  = sizeof(SysBusDevice),
> +    .qdev.no_user = 1,
> +};
> +
> +static void spapr_vio_register_devices(void)
> +{
> +    sysbus_register_withprop(&spapr_vio_bridge_info);
> +}
> +
> +device_init(spapr_vio_register_devices)
> +
> +#ifdef CONFIG_FDT
> +int spapr_populate_vdevice(VIOsPAPRBus *bus, void *fdt)
> +{
> +    DeviceState *qdev;
> +    int ret = 0;
> +
> +    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
> +        VIOsPAPRDevice *dev = (VIOsPAPRDevice *)qdev;
> +
> +        ret = vio_make_devnode(dev, fdt);
> +
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
> +    return 0;
> +}
> +#endif /* CONFIG_FDT */
> diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
> new file mode 100644
> index 0000000..b164ad3
> --- /dev/null
> +++ b/hw/spapr_vio.h
> @@ -0,0 +1,50 @@
> +#ifndef _HW_SPAPR_VIO_H
> +#define _HW_SPAPR_VIO_H
> +/*
> + * QEMU sPAPR VIO bus definitions
> + *
> + * Copyright (c) 2010 David Gibson, IBM Corporation <david@gibson.dropbear.id.au>
> + * Based on the s390 virtio bus definitions:
> + * Copyright (c) 2009 Alexander Graf <agraf@suse.de>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +typedef struct VIOsPAPRDevice {
> +    DeviceState qdev;
> +    uint32_t reg;
> +} VIOsPAPRDevice;
> +
> +typedef struct VIOsPAPRBus {
> +    BusState bus;
> +} VIOsPAPRBus;
> +
> +typedef struct {
> +    DeviceInfo qdev;
> +    const char *dt_name, *dt_type, *dt_compatible;
> +    int (*init)(VIOsPAPRDevice *dev);
> +    void (*hcalls)(VIOsPAPRBus *bus);
> +    int (*devnode)(VIOsPAPRDevice *dev, void *fdt, int node_off);
> +} VIOsPAPRDeviceInfo;
> +
> +extern VIOsPAPRBus *spapr_vio_bus_init(void);
> +extern VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg);
> +extern void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info);
> +extern int spapr_populate_vdevice(VIOsPAPRBus *bus, void *fdt);
> +
> +void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len);
> +void spapr_vty_create(VIOsPAPRBus *bus,
> +                      uint32_t reg, CharDriverState *chardev);
> +
> +#endif /* _HW_SPAPR_VIO_H */
> diff --git a/hw/spapr_vty.c b/hw/spapr_vty.c
> new file mode 100644
> index 0000000..afc9ef9
> --- /dev/null
> +++ b/hw/spapr_vty.c
> @@ -0,0 +1,145 @@

license header

> +#include "qdev.h"
> +#include "qemu-char.h"
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +
> +#define VTERM_BUFSIZE   16
> +
> +typedef struct VIOsPAPRVTYDevice {
> +    VIOsPAPRDevice sdev;
> +    CharDriverState *chardev;
> +    uint32_t in, out;
> +    uint8_t buf[VTERM_BUFSIZE];
> +} VIOsPAPRVTYDevice;
> +
> +static int vty_can_receive(void *opaque)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)opaque;
> +
> +    return (dev->in - dev->out) < VTERM_BUFSIZE;
> +}
> +
> +static void vty_receive(void *opaque, const uint8_t *buf, int size)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)opaque;
> +    int i;
> +
> +    for (i = 0; i < size; i++) {
> +        assert((dev->in - dev->out) < VTERM_BUFSIZE);
> +        dev->buf[dev->in++ % VTERM_BUFSIZE] = buf[i];
> +    }
> +}
> +
> +static int vty_getchars(VIOsPAPRDevice *sdev, uint8_t *buf, int max)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
> +    int n = 0;
> +
> +    while ((n < max) && (dev->out != dev->in))

Braces
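
For reference, the VTY input path is a ring buffer indexed by two free-running uint32_t counters: in - out is the fill level, and because the buffer size (16) divides 2^32, the % VTERM_BUFSIZE indexing stays consistent when the counters wrap. A standalone sketch of the same scheme, with the loop braced as requested; the struct ring type and ring_* names are hypothetical, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 16 /* must divide 2^32 so wrapping counters stay consistent */

/* Hypothetical stand-alone model of the VTY input buffer. */
struct ring {
    uint32_t in, out;           /* free-running counters, never reset */
    uint8_t buf[RING_SIZE];
};

static int ring_can_receive(const struct ring *r)
{
    /* Unsigned subtraction yields the fill level even after wraparound. */
    return (r->in - r->out) < RING_SIZE;
}

static void ring_put(struct ring *r, uint8_t c)
{
    assert(ring_can_receive(r));
    r->buf[r->in++ % RING_SIZE] = c;
}

static int ring_get(struct ring *r, uint8_t *dst, int max)
{
    int n = 0;

    while ((n < max) && (r->out != r->in)) {
        dst[n++] = r->buf[r->out++ % RING_SIZE];
    }
    return n;
}
```

Starting both counters just below UINT32_MAX is an easy way to exercise the wraparound case.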

> +        buf[n++] = dev->buf[dev->out++ % VTERM_BUFSIZE];
> +
> +    return n;
> +}
> +
> +void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
> +
> +    /* FIXME: should check the qemu_chr_write() return value */
> +    qemu_chr_write(dev->chardev, buf, len);
> +}
> +
> +static int spapr_vty_init(VIOsPAPRDevice *sdev)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
> +
> +    qemu_chr_add_handlers(dev->chardev, vty_can_receive,
> +                          vty_receive, NULL, dev);
> +
> +    return 0;
> +}
> +
> +static target_ulong h_put_term_char(CPUState *env, sPAPREnvironment *spapr,
> +                                    target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    target_ulong len = args[1];
> +    target_ulong char0_7 = args[2];
> +    target_ulong char8_15 = args[3];
> +    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +    uint8_t buf[16];
> +
> +    if (!sdev)

Braces

> +        return H_PARAMETER;
> +
> +    if (len > 16)

Braces

> +        return H_PARAMETER;
> +
> +    *((uint64_t *)buf) = cpu_to_be64(char0_7);

This is be64_to_cpu, no? Not that it matters...
Btw - shouldn't stq_p work just as well here?
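
Byte swapping is its own inverse, so cpu_to_be64() and be64_to_cpu() produce the same bits here. What the code means is "store this register value as big-endian bytes", and since the ppc target is big-endian, that is what stq_p() does. A portable sketch of the unpacking, using a hypothetical store_be64() helper in place of the QEMU macros; writing byte by byte also avoids casting the uint8_t buffer to uint64_t *, which assumes an alignment the array is not guaranteed to have:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store v as 8 big-endian bytes, independent of
 * host byte order and of the alignment of p. */
static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)v;
        v >>= 8;
    }
}

/* Unpack the two H_PUT_TERM_CHAR register arguments into up to 16
 * characters.  Returns the length, or -1 where the patch returns
 * H_PARAMETER (len > 16). */
static int unpack_term_chars(uint64_t char0_7, uint64_t char8_15,
                             uint64_t len, uint8_t buf[16])
{
    if (len > 16) {
        return -1;
    }
    store_be64(buf, char0_7);
    store_be64(buf + 8, char8_15);
    return (int)len;
}
```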

> +    *((uint64_t *)buf + 1) = cpu_to_be64(char8_15);
> +
> +    vty_putchars(sdev, buf, len);
> +
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_get_term_char(CPUState *env, sPAPREnvironment *spapr,
> +                                    target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    target_ulong *len = args + 0;
> +    target_ulong *char0_7 = args + 1;
> +    target_ulong *char8_15 = args + 2;
> +    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +    uint8_t buf[16];
> +
> +    if (!sdev)

Braces

> +        return H_PARAMETER;
> +
> +    *len = vty_getchars(sdev, buf, sizeof(buf));
> +    if (*len < 16)

Braces

> +        memset(buf + *len, 0, 16 - *len);
> +
> +    *char0_7 = be64_to_cpu(*((uint64_t *)buf));

This is basically *char0_7 = ldq_p(buf); But I don't mind if you leave 
it as is - it's probably faster this way. Not that the 2 ns actually 
matter ;).
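
Likewise, ldq_p() on the big-endian target amounts to a big-endian 8-byte load. A sketch of the whole H_GET_TERM_CHAR packing step, using a hypothetical load_be64() helper and zero-padding the unused tail as the patch does:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 8 big-endian bytes, independent of host
 * byte order and alignment. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Pack up to 16 pending characters into the two return registers,
 * zero-filling whatever is left over. */
static void pack_term_chars(const uint8_t *src, int n,
                            uint64_t *char0_7, uint64_t *char8_15)
{
    uint8_t buf[16];

    assert(n >= 0 && n <= 16);
    memcpy(buf, src, (size_t)n);
    memset(buf + n, 0, (size_t)(16 - n));
    *char0_7 = load_be64(buf);
    *char8_15 = load_be64(buf + 8);
}
```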


Alex


* [Qemu-devel] Re: [PATCH 15/26] Virtual hash page table handling on pSeries machine
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 15/26] Virtual hash page table handling on pSeries machine David Gibson
@ 2011-03-16 15:03   ` Alexander Graf
  2011-03-17  1:03     ` [Qemu-devel] Re: [PATCH 15/26] Virtual hash page table handling on pSeries machine' David Gibson
  0 siblings, 1 reply; 82+ messages in thread
From: Alexander Graf @ 2011-03-16 15:03 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 03/16/2011 05:56 AM, David Gibson wrote:
> On pSeries logical partitions, excepting the old POWER4-style full system
> partitions, the guest does not have direct access to the hardware page
> table.  Instead, the pagetable exists in hypervisor memory, and the guest
> must manipulate it with hypercalls.
>
> However, our current pSeries emulation more closely resembles the old
> style where the guest must set up and handle the pagetables itself.  This
> patch converts it to act like a modern partition.
>
> This involves two things: first, the hash translation path is modified to
> permit the hash table to be stored externally to the emulated machine's
> RAM.  The pSeries machine init code configures the CPUs to use this mode.
>
> Secondly, we emulate the PAPR hypercalls for manipulating the external
> hashed page table.
>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   hw/spapr.c          |   32 ++++++-
>   hw/spapr_hcall.c    |  247 +++++++++++++++++++++++++++++++++++++++++++++++++++
>   target-ppc/cpu.h    |    2 +
>   target-ppc/helper.c |   36 ++++++--
>   4 files changed, 305 insertions(+), 12 deletions(-)
>
> diff --git a/hw/spapr.c b/hw/spapr.c
> index 25e4a9e..c3d9286 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -50,12 +50,15 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>                                 sPAPREnvironment *spapr,
>                                 target_phys_addr_t initrd_base,
>                                 target_phys_addr_t initrd_size,
> -                              const char *kernel_cmdline)
> +                              const char *kernel_cmdline,
> +                              long hash_shift)
>   {
>       void *fdt;
>       uint64_t mem_reg_property[] = { 0, cpu_to_be64(ramsize) };
>       uint32_t start_prop = cpu_to_be32(initrd_base);
>       uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
> +    uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
> +    char hypertas_prop[] = "hcall-pft\0hcall-term";
>       int i;
>       char *modelname;
>       int ret;
> @@ -138,6 +141,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>            * full emu, for kvm we should copy it from the host */
>           _FDT((fdt_property_cell(fdt, "clock-frequency", 1000000000)));
>           _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr)));
> +        _FDT((fdt_property(fdt, "ibm,pft-size", pft_size_prop, sizeof(pft_size_prop))));
>           _FDT((fdt_property_string(fdt, "status", "okay")));
>           _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
>
> @@ -153,6 +157,14 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>
>       _FDT((fdt_end_node(fdt)));
>
> +    /* RTAS */
> +    _FDT((fdt_begin_node(fdt, "rtas")));
> +
> +    _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
> +                       sizeof(hypertas_prop))));
> +
> +    _FDT((fdt_end_node(fdt)));
> +
>       /* vdevice */
>       _FDT((fdt_begin_node(fdt, "vdevice")));
>
> @@ -203,12 +215,13 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>                              const char *cpu_model)
>   {
>       CPUState *envs[MAX_CPUS];
> -    void *fdt;
> +    void *fdt, *htab;
>       int i;
>       ram_addr_t ram_offset;
>       target_phys_addr_t fdt_addr;
>       uint32_t kernel_base, initrd_base;
> -    long kernel_size, initrd_size;
> +    long kernel_size, initrd_size, htab_size;
> +    long pteg_shift = 17;
>       int fdt_size;
>       sPAPREnvironment *spapr;
>
> @@ -248,6 +261,16 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>       ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
>       cpu_register_physical_memory(0, ram_size, ram_offset);
>
> +    /* allocate hash page table */
> +    htab_size = 1ULL << (pteg_shift + 7);

Linux makes the htab size depend on the provided amount of ram. 
Shouldn't we do the same?
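
As far as I understand it (treat the details as an assumption), when firmware provides no ibm,pft-size, htab_get_table_size() in Linux's arch/powerpc/mm/hash_utils_64.c rounds RAM up to a power of two and allocates one 128-byte PTEG per two 4 KiB pages, with a floor of 2^11 PTEGs. A sketch of an equivalent helper for the machine init code; htab_shift_for_ram() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

#define PTEG_SHIFT_MIN   11 /* floor of 2^11 PTEGs (256 KiB table) */
#define HPTEG_SIZE_SHIFT 7  /* one PTEG = 8 HPTEs x 16 bytes = 128 bytes */

/* Hypothetical helper: log2 of the hash table size in bytes for a guest
 * with ram_size bytes of RAM, assuming 4 KiB pages and one PTEG per two
 * pages of power-of-two-rounded RAM. */
static int htab_shift_for_ram(uint64_t ram_size)
{
    int ram_shift = 0;
    int pteg_shift;

    assert(ram_size > 0);
    /* ceil(log2(ram_size)) */
    while (ram_shift < 63 && (1ULL << ram_shift) < ram_size) {
        ram_shift++;
    }
    /* pages / 2 = ram / 2^(12 + 1) */
    pteg_shift = ram_shift - 13;
    if (pteg_shift < PTEG_SHIFT_MIN) {
        pteg_shift = PTEG_SHIFT_MIN;
    }
    return pteg_shift + HPTEG_SIZE_SHIFT;
}
```

Under this rule, the fixed pteg_shift = 17 in the patch (a 16 MiB table) is what a 1 GiB guest would get. The result would replace the pteg_shift + 7 used for both htab_size and the ibm,pft-size property.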

> +    htab = qemu_mallocz(htab_size);
> +
> +    for (i = 0; i < smp_cpus; i++) {
> +        envs[i]->external_htab = htab;
> +        envs[i]->htab_base = -1;
> +        envs[i]->htab_mask = htab_size - 1;
> +    }
> +
>       spapr->vio_bus = spapr_vio_bus_init();
>
>       for (i = 0; i < MAX_SERIAL_PORTS; i++) {
> @@ -293,7 +316,8 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>
>       /* Prepare the device tree */
>       fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, envs, spapr,
> -                           initrd_base, initrd_size, kernel_cmdline);
> +                           initrd_base, initrd_size, kernel_cmdline,
> +                           pteg_shift + 7);
>       if (!fdt) {
>           hw_error("Couldn't create pSeries device tree\n");
>           exit(1);
> diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
> index 6ddac00..2b14000 100644
> --- a/hw/spapr_hcall.c
> +++ b/hw/spapr_hcall.c
> @@ -1,8 +1,246 @@
>   #include "sysemu.h"
>   #include "cpu.h"
>   #include "qemu-char.h"
> +#include "sysemu.h"
> +#include "qemu-char.h"
> +#include "exec-all.h"
>   #include "hw/spapr.h"
>
> +#define HPTES_PER_GROUP 8
> +
> +#define HPTE_V_SSIZE_SHIFT      62
> +#define HPTE_V_AVPN_SHIFT       7
> +#define HPTE_V_AVPN             0x3fffffffffffff80ULL
> +#define HPTE_V_AVPN_VAL(x)      (((x) & HPTE_V_AVPN) >> HPTE_V_AVPN_SHIFT)
> +#define HPTE_V_COMPARE(x,y)     (!(((x) ^ (y)) & 0xffffffffffffff80UL))
> +#define HPTE_V_BOLTED           0x0000000000000010ULL
> +#define HPTE_V_LOCK             0x0000000000000008ULL
> +#define HPTE_V_LARGE            0x0000000000000004ULL
> +#define HPTE_V_SECONDARY        0x0000000000000002ULL
> +#define HPTE_V_VALID            0x0000000000000001ULL
> +
> +#define HPTE_R_PP0              0x8000000000000000ULL
> +#define HPTE_R_TS               0x4000000000000000ULL
> +#define HPTE_R_KEY_HI           0x3000000000000000ULL
> +#define HPTE_R_RPN_SHIFT        12
> +#define HPTE_R_RPN              0x3ffffffffffff000ULL
> +#define HPTE_R_FLAGS            0x00000000000003ffULL
> +#define HPTE_R_PP               0x0000000000000003ULL
> +#define HPTE_R_N                0x0000000000000004ULL
> +#define HPTE_R_G                0x0000000000000008ULL
> +#define HPTE_R_M                0x0000000000000010ULL
> +#define HPTE_R_I                0x0000000000000020ULL
> +#define HPTE_R_W                0x0000000000000040ULL
> +#define HPTE_R_WIMG             0x0000000000000078ULL
> +#define HPTE_R_C                0x0000000000000080ULL
> +#define HPTE_R_R                0x0000000000000100ULL
> +#define HPTE_R_KEY_LO           0x0000000000000e00ULL
> +
> +#define HPTE_V_1TB_SEG          0x4000000000000000ULL
> +#define HPTE_V_VRMA_MASK        0x4001ffffff000000ULL
> +
> +#define HPTE_V_HVLOCK           0x40ULL
> +
> +static inline int lock_hpte(void *hpte, target_ulong bits)
> +{
> +    uint64_t pteh;
> +
> +    pteh = ldq_p(hpte);
> +
> +    /* FIXME: probably need some sort of lockage for SMP */

Guest SMP doesn't get mapped to host SMP. So you're safe here.

> +    if (pteh & bits) {
> +        return 0;
> +    }
> +    stq_p(hpte, pteh | HPTE_V_HVLOCK);
> +    return 1;
> +}
> +
> +static target_ulong compute_tlbie_rb(target_ulong v, target_ulong r,
> +                                     target_ulong pte_index)
> +{
> +    target_ulong rb, va_low;
> +
> +    rb = (v & ~0x7fULL) << 16; /* AVA field */
> +    va_low = pte_index >> 3;
> +    if (v & HPTE_V_SECONDARY)

Braces

> +        va_low = ~va_low;
> +    /* xor vsid from AVA */
> +    if (!(v & HPTE_V_1TB_SEG))

Braces

> +        va_low ^= v >> 12;
> +    else
> +        va_low ^= v >> 24;
> +    va_low &= 0x7ff;
> +    if (v & HPTE_V_LARGE) {
> +        rb |= 1;                         /* L field */
> +#if 0 /* Disable that P7 specific bit for now */
> +        if (r & 0xff000) {
> +            /* non-16MB large page, must be 64k */
> +            /* (masks depend on page size) */
> +            rb |= 0x1000;                /* page encoding in LP field */
> +            rb |= (va_low & 0x7f) << 16; /* 7b of VA in AVA/LP field */
> +            rb |= (va_low & 0xfe);       /* AVAL field */
> +        }
> +#endif
> +    } else {
> +        /* 4kB page */
> +        rb |= (va_low & 0x7ff) << 12;   /* remaining 11b of AVA */
> +    }
> +    rb |= (v >> 54) & 0x300;            /* B field */
> +    return rb;
> +}
> +
> +static target_ulong h_enter(CPUState *env, sPAPREnvironment *spapr,
> +                            target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong flags = args[0];
> +    target_ulong pte_index = args[1];
> +    target_ulong pteh = args[2];
> +    target_ulong ptel = args[3];
> +    target_ulong porder;
> +    target_ulong i, pa;
> +    uint8_t *hpte;
> +
> +    /* only handle 4k and 16M pages for now */
> +    porder = 12;
> +    if (pteh & HPTE_V_LARGE) {
> +        if ((ptel & 0xf000) == 0x1000) {
> +            /* 64k page */

According to the comment above and the #if 0 in tlbie you don't support 
64k pages?
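
To make the question concrete, the page-size decode in h_enter() can be isolated as follows. The hpte_page_shift() name is hypothetical and the constants are copied from the patch. As written, it accepts the 64 KiB encoding even though the comment says only 4k and 16M pages are handled:

```c
#include <assert.h>
#include <stdint.h>

#define HPTE_V_LARGE 0x0000000000000004ULL

/* Hypothetical helper mirroring the checks in h_enter(): returns the
 * page shift (12, 16 or 24), or -1 where h_enter() returns H_PARAMETER. */
static int hpte_page_shift(uint64_t pteh, uint64_t ptel)
{
    if (!(pteh & HPTE_V_LARGE)) {
        return 12;                      /* 4 KiB page */
    }
    if ((ptel & 0xf000) == 0x1000) {
        return 16;                      /* 64 KiB page */
    }
    if ((ptel & 0xff000) == 0) {
        /* 16 MiB page: lowest AVA bit must be 0 */
        if (pteh & 0x80) {
            return -1;
        }
        return 24;
    }
    return -1;
}
```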

> +            porder = 16;
> +        } else if ((ptel & 0xff000) == 0) {
> +            /* 16M page */
> +            porder = 24;
> +            /* lowest AVA bit must be 0 for 16M pages */
> +            if (pteh & 0x80)

Braces

> +                return H_PARAMETER;
> +        } else {
> +            return H_PARAMETER;
> +        }
> +    }
> +
> +    pa = ptel & HPTE_R_RPN;
> +    /* FIXME: bounds check the pa? */
> +
> +    /* Check WIMG */
> +    if ((ptel & HPTE_R_WIMG) != HPTE_R_M)

Braces

> +        return H_PARAMETER;
> +    pteh &= ~0x60ULL;
> +
> +    if ((pte_index * HASH_PTE_SIZE_64) & ~env->htab_mask)

Braces

> +        return H_PARAMETER;
> +    if (likely((flags & H_EXACT) == 0)) {
> +        pte_index &= ~7ULL;
> +        hpte = env->external_htab + (pte_index * HASH_PTE_SIZE_64);
> +        for (i = 0; ; ++i) {
> +            if (i == 8)

Braces
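
For comparison, here is the non-H_EXACT path of h_enter() with the braces added: bounds check, round down to the start of the PTEG, then claim the first slot that is neither valid nor locked. This is a simplified model over a host-endian array of HPTE first doublewords, not QEMU's ldq_p()/lock_hpte() code, and claim_pteg_slot() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

#define HPTES_PER_GROUP  8
#define HASH_PTE_SIZE_64 16
#define HPTE_V_VALID     0x0000000000000001ULL
#define HPTE_V_HVLOCK    0x40ULL

/* hpte_v[] holds the first doubleword of each HPTE; htab_mask is the
 * table size in bytes minus one.  Returns the claimed (and locked)
 * index, -1 for an out-of-range index (H_PARAMETER), or -2 if every
 * slot in the group is taken (H_PTEG_FULL). */
static long claim_pteg_slot(uint64_t *hpte_v, uint64_t htab_mask,
                            uint64_t pte_index)
{
    if ((pte_index * HASH_PTE_SIZE_64) & ~htab_mask) {
        return -1;
    }
    pte_index &= ~(uint64_t)(HPTES_PER_GROUP - 1);
    for (int i = 0; i < HPTES_PER_GROUP; i++) {
        uint64_t *v = &hpte_v[pte_index + i];

        if (!(*v & (HPTE_V_VALID | HPTE_V_HVLOCK))) {
            *v |= HPTE_V_HVLOCK;        /* claim the slot */
            return (long)(pte_index + i);
        }
    }
    return -2;
}
```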

> +                return H_PTEG_FULL;
> +            if (((ldq_p(hpte) & HPTE_V_VALID) == 0) &&
> +                lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID)) {
> +                break;
> +            }
> +            hpte += HASH_PTE_SIZE_64;
> +        }
> +    } else {
> +        i = 0;
> +        hpte = env->external_htab + (pte_index * HASH_PTE_SIZE_64);
> +        if (!lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID)) {
> +            return H_PTEG_FULL;
> +        }
> +    }
> +    stq_p(hpte + (HASH_PTE_SIZE_64/2), ptel);
> +    /* eieio();  FIXME: need some sort of barrier for smp? */

see above :)

> +    stq_p(hpte, pteh);
> +
> +    assert (!(ldq_p(hpte) & HPTE_V_HVLOCK));
> +    args[0] = pte_index + i;
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_remove(CPUState *env, sPAPREnvironment *spapr,
> +                             target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong flags = args[0];
> +    target_ulong pte_index = args[1];
> +    target_ulong avpn = args[2];
> +    uint8_t *hpte;
> +    target_ulong v, r, rb;
> +
> +    if ((pte_index * HASH_PTE_SIZE_64) & ~env->htab_mask) {
> +        return H_PARAMETER;
> +    }
> +
> +    hpte = env->external_htab + (pte_index * HASH_PTE_SIZE_64);
> +    while (!lock_hpte(hpte, HPTE_V_HVLOCK)) {
> +        /* We have no real concurrency in qemu soft-emulation, so we
> +         * will never actually have a contested lock */
> +        assert(0);
> +    }
> +
> +    v = ldq_p(hpte);
> +    r = ldq_p(hpte + (HASH_PTE_SIZE_64/2));
> +
> +    if ((v & HPTE_V_VALID) == 0 ||
> +        ((flags & H_AVPN) && (v & ~0x7fULL) != avpn) ||
> +        ((flags & H_ANDCOND) && (v & avpn) != 0)) {
> +        stq_p(hpte, v & ~HPTE_V_HVLOCK);
> +        assert (!(ldq_p(hpte) & HPTE_V_HVLOCK));
> +        return H_NOT_FOUND;
> +    }
> +    args[0] = v & ~HPTE_V_HVLOCK;
> +    args[1] = r;
> +    stq_p(hpte, 0);
> +    rb = compute_tlbie_rb(v, r, pte_index);
> +//    ppc_tlb_invalidate_one(env, rb);

Huh?

> +    tlb_flush(env, 1);
> +    assert (!(ldq_p(hpte) & HPTE_V_HVLOCK));
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_protect(CPUState *env, sPAPREnvironment *spapr,
> +                              target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong flags = args[0];
> +    target_ulong pte_index = args[1];
> +    target_ulong avpn = args[2];
> +    uint8_t *hpte;
> +    target_ulong v, r, rb;
> +
> +    if ((pte_index * HASH_PTE_SIZE_64) & ~env->htab_mask) {
> +        return H_PARAMETER;
> +    }
> +
> +    hpte = env->external_htab + (pte_index * HASH_PTE_SIZE_64);
> +    while (!lock_hpte(hpte, HPTE_V_HVLOCK)) {
> +        /* We have no real concurrency in qemu soft-emulation, so we
> +         * will never actually have a contested lock */
> +        assert(0);
> +    }
> +
> +    v = ldq_p(hpte);
> +    r = ldq_p(hpte + (HASH_PTE_SIZE_64/2));
> +
> +    if ((v & HPTE_V_VALID) == 0 ||
> +        ((flags & H_AVPN) && (v & ~0x7fULL) != avpn)) {
> +        stq_p(hpte, v & ~HPTE_V_HVLOCK);
> +        assert (!(ldq_p(hpte) & HPTE_V_HVLOCK));
> +        return H_NOT_FOUND;
> +    }
> +
> +    r &= ~(HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
> +           HPTE_R_KEY_HI | HPTE_R_KEY_LO);
> +    r |= (flags << 55) & HPTE_R_PP0;
> +    r |= (flags << 48) & HPTE_R_KEY_HI;
> +    r |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
> +    rb = compute_tlbie_rb(v, r, pte_index);
> +    stq_p(hpte, v & ~HPTE_V_VALID);
> +    //ppc_tlb_invalidate_one(env, rb);

Huh?

> +    tlb_flush(env, 1);

Wow, why do you need a full tlb flush here?



Alex


* [Qemu-devel] Re: [PATCH 16/26] Implement hcall based RTAS for pSeries machines
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 16/26] Implement hcall based RTAS for pSeries machines David Gibson
@ 2011-03-16 15:08   ` Alexander Graf
  2011-03-17  1:22     ` David Gibson
  2011-03-16 22:08   ` [Qemu-devel] " Anthony Liguori
  1 sibling, 1 reply; 82+ messages in thread
From: Alexander Graf @ 2011-03-16 15:08 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 03/16/2011 05:56 AM, David Gibson wrote:
> On pSeries machines, operating systems can instantiate "RTAS" (Run-Time
> Abstraction Services), a runtime component of the firmware which implements
> a number of low-level, infrequently used operations.  On logical partitions
> under a hypervisor, many of the RTAS functions require hypervisor
> privilege.  For simplicity, therefore, hypervisor systems typically
> implement the in-partition RTAS as just a tiny wrapper around a hypercall
> which actually implements the various RTAS functions.
>
> This patch implements such a hypercall based RTAS for our emulated pSeries
> machine.  A tiny in-partition "firmware" calls a new hypercall, which
> looks up available RTAS services in a table.
>
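
Per the patch, the argument buffer that h_rtas() decodes consists of 32-bit big-endian words: token, nargs and nret, followed by nargs arguments and then nret return slots. spapr_rtas_call() then maps the token to a table slot via TOKEN_BASE. A sketch of both steps over a plain byte array standing in for guest memory; rtas_dispatch(), ld_be32(), st_be32() and the rtas_add_one() handler are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define TOKEN_BASE 0x2000
#define TOKEN_MAX  0x100

typedef void (*rtas_fn)(uint8_t *mem, uint32_t nargs, uint32_t args,
                        uint32_t nret, uint32_t rets);

static rtas_fn rtas_table[TOKEN_MAX];

static uint32_t ld_be32(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr] << 24) | ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8) | mem[addr + 3];
}

static void st_be32(uint8_t *mem, uint32_t addr, uint32_t val)
{
    mem[addr] = (uint8_t)(val >> 24);
    mem[addr + 1] = (uint8_t)(val >> 16);
    mem[addr + 2] = (uint8_t)(val >> 8);
    mem[addr + 3] = (uint8_t)val;
}

/* Decode the buffer at buf and dispatch on its token.  Returns 0 on
 * success, or -1 for an unknown token after storing -3 in rets[0]. */
static int rtas_dispatch(uint8_t *mem, uint32_t buf)
{
    uint32_t token = ld_be32(mem, buf);
    uint32_t nargs = ld_be32(mem, buf + 4);
    uint32_t nret = ld_be32(mem, buf + 8);
    uint32_t args = buf + 12;
    uint32_t rets = args + 4 * nargs;

    if (token >= TOKEN_BASE && token - TOKEN_BASE < TOKEN_MAX &&
        rtas_table[token - TOKEN_BASE]) {
        rtas_table[token - TOKEN_BASE](mem, nargs, args, nret, rets);
        return 0;
    }
    st_be32(mem, rets, (uint32_t)-3);
    return -1;
}

/* Example handler: rets[0] = 0 (success), rets[1] = args[0] + 1. */
static void rtas_add_one(uint8_t *mem, uint32_t nargs, uint32_t args,
                         uint32_t nret, uint32_t rets)
{
    assert(nargs >= 1 && nret >= 2);
    st_be32(mem, rets, 0);
    st_be32(mem, rets + 4, ld_be32(mem, args) + 1);
}
```

The patch fills its table in registration order through spapr_rtas_register(). Here a test would simply assign rtas_table[0], which corresponds to token TOKEN_BASE.
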
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   Makefile               |    3 +-
>   Makefile.target        |    2 +-
>   hw/spapr.c             |   27 +++++++++++--
>   hw/spapr.h             |   21 ++++++++++
>   hw/spapr_hcall.c       |   15 +++++++
>   hw/spapr_rtas.c        |  104 ++++++++++++++++++++++++++++++++++++++++++++++++
>   pc-bios/spapr-rtas.bin |  Bin 0 -> 20 bytes
>   7 files changed, 166 insertions(+), 6 deletions(-)
>   create mode 100644 hw/spapr_rtas.c
>   create mode 100644 pc-bios/spapr-rtas.bin
>
> diff --git a/Makefile b/Makefile
> index eca4c76..fc4bd24 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -213,7 +213,8 @@ pxe-ne2k_pci.bin pxe-pcnet.bin \
>   pxe-rtl8139.bin pxe-virtio.bin \
>   bamboo.dtb petalogix-s3adsp1800.dtb \
>   multiboot.bin linuxboot.bin \
> -s390-zipl.rom
> +s390-zipl.rom \
> +spapr-rtas.bin
>   else
>   BLOBS=
>   endif
> diff --git a/Makefile.target b/Makefile.target
> index 3f2b235..e333225 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -232,7 +232,7 @@ obj-ppc-y += ppc_oldworld.o
>   # NewWorld PowerMac
>   obj-ppc-y += ppc_newworld.o
>   # IBM pSeries (sPAPR)
> -obj-ppc-y += spapr.o spapr_hcall.o spapr_vio.o
> +obj-ppc-y += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
>   obj-ppc-y += spapr_vty.o
>   # PowerPC 4xx boards
>   obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> index c3d9286..f41451b 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -40,6 +40,7 @@
>   #define KERNEL_LOAD_ADDR        0x00000000
>   #define INITRD_LOAD_ADDR        0x02800000
>   #define FDT_MAX_SIZE            0x10000
> +#define RTAS_MAX_SIZE           0x10000
>
>   #define TIMEBASE_FREQ           512000000ULL
>
> @@ -51,6 +52,8 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>                                 target_phys_addr_t initrd_base,
>                                 target_phys_addr_t initrd_size,
>                                 const char *kernel_cmdline,
> +                              target_phys_addr_t rtas_addr,
> +                              target_phys_addr_t rtas_size,
>                                 long hash_shift)
>   {
>       void *fdt;
> @@ -162,7 +165,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>
>       _FDT((fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop,
>                          sizeof(hypertas_prop))));
> -
> +

Ahem...

>       _FDT((fdt_end_node(fdt)));
>
>       /* vdevice */
> @@ -186,6 +189,11 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>           fprintf(stderr, "couldn't setup vio devices in fdt\n");
>       }
>
> +    /* RTAS */
> +    ret = spapr_rtas_device_tree_setup(fdt, rtas_addr, rtas_size);
> +    if (ret < 0)

Braces

> +        fprintf(stderr, "Couldn't set up RTAS device tree properties\n");
> +
>       _FDT((fdt_pack(fdt)));
>
>       if (fdt_size) {
> @@ -218,12 +226,13 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>       void *fdt, *htab;
>       int i;
>       ram_addr_t ram_offset;
> -    target_phys_addr_t fdt_addr;
> +    target_phys_addr_t fdt_addr, rtas_addr;
>       uint32_t kernel_base, initrd_base;
> -    long kernel_size, initrd_size, htab_size;
> +    long kernel_size, initrd_size, htab_size, rtas_size;
>       long pteg_shift = 17;
>       int fdt_size;
>       sPAPREnvironment *spapr;
> +    char *filename;
>
>       spapr = qemu_malloc(sizeof(*spapr));
>
> @@ -231,6 +240,8 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>        * 2GB, so that it can be processed with 32-bit code if
>        * necessary */
>       fdt_addr = MIN(ram_size, 0x80000000) - FDT_MAX_SIZE;
> +    /* RTAS goes just below that */
> +    rtas_addr = fdt_addr - RTAS_MAX_SIZE;
>
>       /* init CPUs */
>       if (cpu_model == NULL) {
> @@ -271,6 +282,14 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>           envs[i]->htab_mask = htab_size - 1;
>       }
>
> +    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
> +    rtas_size = load_image_targphys(filename, rtas_addr, ram_size - rtas_addr);
> +    if (rtas_size < 0) {
> +        hw_error("qemu: could not load LPAR rtas '%s'\n", filename);
> +        exit(1);
> +    }
> +    qemu_free(filename);
> +
>       spapr->vio_bus = spapr_vio_bus_init();
>
>       for (i = 0; i < MAX_SERIAL_PORTS; i++) {
> @@ -317,7 +336,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>       /* Prepare the device tree */
>       fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, envs, spapr,
>                              initrd_base, initrd_size, kernel_cmdline,
> -                           pteg_shift + 7);
> +                           rtas_addr, rtas_size, pteg_shift + 7);
>       if (!fdt) {
>           hw_error("Couldn't create pSeries device tree\n");
>           exit(1);
> diff --git a/hw/spapr.h b/hw/spapr.h
> index 47bf2ef..7a7c319 100644
> --- a/hw/spapr.h
> +++ b/hw/spapr.h
> @@ -237,6 +237,8 @@ typedef struct sPAPREnvironment {
>   #define H_GET_MPP               0x2D4
>   #define MAX_HCALL_OPCODE        H_GET_MPP
>
> +#define H_RTAS                  0x72746173
> +
>   typedef target_ulong (*spapr_hcall_fn)(CPUState *env, sPAPREnvironment *spapr,
>                                          target_ulong opcode,
>                                          target_ulong *args);
> @@ -245,5 +247,24 @@ void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn);
>   target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
>                                target_ulong opcode, target_ulong *args);
>
> +static inline uint32_t rtas_ld(target_ulong phys, int n)
> +{
> +    return ldl_phys(phys + 4*n);
> +}
> +
> +static inline void rtas_st(target_ulong phys, int n, uint32_t val)
> +{
> +    stl_phys(phys + 4*n, val);
> +}
> +
> +typedef void (*spapr_rtas_fn)(sPAPREnvironment *spapr, uint32_t token,
> +                              uint32_t nargs, target_ulong args,
> +                              uint32_t nret, target_ulong rets);
> +void spapr_rtas_register(const char *name, spapr_rtas_fn fn);
> +target_ulong spapr_rtas_call(sPAPREnvironment *spapr,
> +                             uint32_t token, uint32_t nargs, target_ulong args,
> +                             uint32_t nret, target_ulong rets);
> +int spapr_rtas_device_tree_setup(void *fdt, target_phys_addr_t rtas_addr,
> +                                 target_phys_addr_t rtas_size);
>
>   #endif /* !defined (__HW_SPAPR_H__) */
> diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
> index 2b14000..7b8e17c 100644
> --- a/hw/spapr_hcall.c
> +++ b/hw/spapr_hcall.c
> @@ -241,6 +241,16 @@ static target_ulong h_protect(CPUState *env, sPAPREnvironment *spapr,
>       return H_SUCCESS;
>   }
>
> +static target_ulong h_rtas(sPAPREnvironment *spapr, target_ulong rtas_r3)
> +{
> +    uint32_t token = ldl_phys(rtas_r3);
> +    uint32_t nargs = ldl_phys(rtas_r3 + 4);
> +    uint32_t nret = ldl_phys(rtas_r3 + 8);
> +
> +    return spapr_rtas_call(spapr, token, nargs, rtas_r3 + 12,
> +                           nret, rtas_r3 + 12 + 4*nargs);
> +}
> +
>   struct hypercall {
>       spapr_hcall_fn fn;
>   } hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
> @@ -276,6 +286,11 @@ target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
>               return hc->fn(env, spapr, opcode, args);
>       }
>
> +    if (opcode == H_RTAS) {
> +        /* H_RTAS is a special case outside the normal range */
> +        return h_rtas(spapr, args[0]);
> +    }
> +
>       fprintf(stderr, "Unimplemented hcall 0x" TARGET_FMT_lx "\n", opcode);
>       return H_FUNCTION;
>   }
> diff --git a/hw/spapr_rtas.c b/hw/spapr_rtas.c
> new file mode 100644
> index 0000000..c606018
> --- /dev/null
> +++ b/hw/spapr_rtas.c
> @@ -0,0 +1,104 @@
> +#include "cpu.h"
> +#include "sysemu.h"
> +#include "qemu-char.h"
> +#include "hw/qdev.h"
> +#include "device_tree.h"
> +
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +
> +#include <libfdt.h>
> +
> +#define TOKEN_BASE      0x2000
> +#define TOKEN_MAX       0x100
> +
> +static struct rtas_call {
> +    const char *name;
> +    spapr_rtas_fn fn;
> +} rtas_table[TOKEN_MAX];
> +
> +struct rtas_call *rtas_next = rtas_table;
> +
> +target_ulong spapr_rtas_call(sPAPREnvironment *spapr,
> +                             uint32_t token, uint32_t nargs, target_ulong args,
> +                             uint32_t nret, target_ulong rets)
> +{
> +    if ((token >= TOKEN_BASE)
> +        && ((token - TOKEN_BASE) < TOKEN_MAX)) {
> +        struct rtas_call *call = rtas_table + (token - TOKEN_BASE);
> +
> +        if (call->fn) {
> +            call->fn(spapr, token, nargs, args, nret, rets);
> +            return H_SUCCESS;
> +        }
> +    }
> +
> +    fprintf(stderr, "Unknown RTAS token 0x%x\n", token);
> +    rtas_st(rets, 0, -3);
> +    return H_PARAMETER;
> +}
> +
> +void spapr_rtas_register(const char *name, spapr_rtas_fn fn)
> +{
> +    assert(rtas_next < (rtas_table + TOKEN_MAX));
> +
> +    rtas_next->name = name;
> +    rtas_next->fn = fn;
> +
> +    rtas_next++;
> +}
> +
> +int spapr_rtas_device_tree_setup(void *fdt, target_phys_addr_t rtas_addr,
> +                                 target_phys_addr_t rtas_size)
> +{
> +    int ret;
> +    int i;
> +
> +    ret = fdt_add_mem_rsv(fdt, rtas_addr, rtas_size);
> +    if (ret < 0) {
> +        fprintf(stderr, "Couldn't add RTAS reserve entry: %s\n",
> +                fdt_strerror(ret));
> +        return ret;
> +    }
> +
> +    ret = qemu_devtree_setprop_cell(fdt, "/rtas", "linux,rtas-base",
> +                                    rtas_addr);
> +    if (ret < 0) {
> +        fprintf(stderr, "Couldn't add linux,rtas-base property: %s\n",
> +                fdt_strerror(ret));
> +        return ret;
> +    }
> +
> +    ret = qemu_devtree_setprop_cell(fdt, "/rtas", "linux,rtas-entry",
> +                                    rtas_addr);
> +    if (ret<  0) {
> +        fprintf(stderr, "Couldn't add linux,rtas-entry property: %s\n",
> +                fdt_strerror(ret));
> +        return ret;
> +    }
> +
> +    ret = qemu_devtree_setprop_cell(fdt, "/rtas", "rtas-size",
> +                                    rtas_size);
> +    if (ret<  0) {
> +        fprintf(stderr, "Couldn't add rtas-size property: %s\n",
> +                fdt_strerror(ret));
> +        return ret;
> +    }
> +
> +    for (i = 0; i<  TOKEN_MAX; i++) {
> +        struct rtas_call *call =&rtas_table[i];
> +
> +        if (!call->fn) {
> +            continue;
> +        }
> +
> +        ret = qemu_devtree_setprop_cell(fdt, "/rtas", call->name, i + TOKEN_BASE);
> +        if (ret<  0) {
> +            fprintf(stderr, "Couldn't add rtas token for %s: %s\n",
> +                    call->name, fdt_strerror(ret));
> +            return ret;
> +        }
> +
> +    }
> +    return 0;
> +}
> diff --git a/pc-bios/spapr-rtas.bin b/pc-bios/spapr-rtas.bin
> new file mode 100644
> index 0000000000000000000000000000000000000000..eade9c0e8ff0fd3071e3a6638a11c1a2e9a47152
> GIT binary patch
> literal 20
> bcmb<Pk*=^wC@M)vPAqm|U{LaFU{C-6M#cr<
>
> literal 0
> HcmV?d00001

Despite being very simple, this blob comes without its source code. There
should at least be a reference, in some text file in pc-bios, to where the
source can be found.


Alex


* [Qemu-devel] Re: [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics) David Gibson
@ 2011-03-16 15:47   ` Alexander Graf
  2011-03-17  1:29     ` David Gibson
  2011-03-16 22:16   ` [Qemu-devel] " Anthony Liguori
From: Alexander Graf @ 2011-03-16 15:47 UTC
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 03/16/2011 05:56 AM, David Gibson wrote:
> PAPR defines an interrupt control architecture which is logically divided
> into ICP (Interrupt Control Presentation, each unit is responsible for
> presenting interrupts to a particular "interrupt server", i.e. CPU) and
> ICS (Interrupt Control Source, each unit responsible for one or more
> hardware interrupts as numbered globally across the system).  All PAPR
> virtual IO devices expect to deliver interrupts via this mechanism.  In
> Linux, this interrupt controller system is handled by the "xics" driver.
>
> On pSeries systems, access to the interrupt controller is virtualized via
> hypercalls and RTAS methods.  However, the virtualized interface is very
> similar to the underlying interrupt controller hardware, and similar PICs
> exist un-virtualized in some other systems.
>
> This patch implements both the ICP and ICS sides of the PAPR interrupt
> controller.  For now, only the hypercall virtualized interface is provided,
> however it would be relatively straightforward to graft an emulated
> register interface onto the underlying interrupt logic if we want to add
> a machine with a hardware ICS/ICP system in the future.
>
> There are some limitations in this implementation: it is assumed for now
> that only one instance of the ICS exists, although a full xics system can
> have several, each responsible for a different group of hardware irqs.
> ICP/ICS can handle both level-sensitive (LSI) and message signalled (MSI)
> interrupt inputs.  For now, this implementation supports only MSI
> interrupts, since that is used by PAPR virtual IO devices.
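For readers new to XICS, the priority rules described above can be modelled in a few lines of C. This is a standalone sketch, not code from the patch. The helper names are hypothetical, and the arithmetic is taken from icp_accept() (XIRR = CPPR << 24 | XISR) and icp_irq() (reject unless the new priority is more favoured than both the CPPR and any interrupt already pending) in hw/xics.c below. Lower numbers are more favoured, and 0xff acts as "masked".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers mirroring the arithmetic in hw/xics.c. */

/* XIRR layout: CPPR in the top 8 bits, XISR (source number) in the
 * low 24 bits, as built by icp_accept(). */
static uint32_t xirr_pack(uint8_t cppr, uint32_t xisr)
{
    assert(xisr <= 0xffffff); /* XISR is a 24-bit field */
    return ((uint32_t)cppr << 24) | xisr;
}

static uint8_t xirr_cppr(uint32_t xirr)
{
    return (uint8_t)(xirr >> 24);
}

static uint32_t xirr_xisr(uint32_t xirr)
{
    return xirr & 0xffffff;
}

/* A new source is presented only if its priority is numerically
 * lower (more favoured) than the CPPR and than any interrupt already
 * pending at this presentation controller; otherwise the ICS must
 * treat it as rejected and resend it later. */
static bool icp_would_present(uint8_t cppr, uint32_t pending_xisr,
                              uint8_t pending_priority, uint8_t priority)
{
    if (priority >= cppr) {
        return false;
    }
    if (pending_xisr && pending_priority <= priority) {
        return false;
    }
    return true;
}
```

For example, with a CPPR of 0xff and nothing pending, a priority-5 source is presented; once a priority-3 interrupt is pending, the same source is rejected.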
>
> Signed-off-by: Paul Mackerras<paulus@samba.org>
> Signed-off-by: David Gibson<dwg@au1.ibm.com>
> ---
>   Makefile.target |    2 +-
>   hw/spapr.c      |   26 +++
>   hw/spapr.h      |    2 +
>   hw/xics.c       |  528 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/xics.h       |   13 ++
>   5 files changed, 570 insertions(+), 1 deletions(-)
>   create mode 100644 hw/xics.c
>   create mode 100644 hw/xics.h
>
> diff --git a/Makefile.target b/Makefile.target
> index e333225..2b0588e 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -233,7 +233,7 @@ obj-ppc-y += ppc_oldworld.o
>   obj-ppc-y += ppc_newworld.o
>   # IBM pSeries (sPAPR)
>   obj-ppc-y += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
> -obj-ppc-y += spapr_vty.o
> +obj-ppc-y += xics.o spapr_vty.o
>   # PowerPC 4xx boards
>   obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>   obj-ppc-y += ppc440.o ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> index 23f493a..be30def 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -34,6 +34,7 @@
>
>   #include "hw/spapr.h"
>   #include "hw/spapr_vio.h"
> +#include "hw/xics.h"
>
>   #include<libfdt.h>
>
> @@ -62,6 +63,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>       uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
>       uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
>       char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr";
> +    uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
>       int i;
>       char *modelname;
>       int ret;
> @@ -120,6 +122,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>
>       for (i = 0; i<  smp_cpus; i++) {
>           CPUState *env = envs[i];
> +        uint32_t gserver_prop[] = {cpu_to_be32(i), 0}; /* HACK! */
>           char *nodename;
>           uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
>                              0xffffffff, 0xffffffff};
> @@ -147,6 +150,9 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>           _FDT((fdt_property(fdt, "ibm,pft-size", pft_size_prop, sizeof(pft_size_prop))));
>           _FDT((fdt_property_string(fdt, "status", "okay")));
>           _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
> +        _FDT((fdt_property_cell(fdt, "ibm,ppc-interrupt-server#s", i)));
> +        _FDT((fdt_property(fdt, "ibm,ppc-interrupt-gserver#s",
> +                           gserver_prop, sizeof(gserver_prop))));
>
>           if (envs[i]->mmu_model&  POWERPC_MMU_1TSEG) {
>               _FDT((fdt_property(fdt, "ibm,processor-segment-sizes",
> @@ -168,6 +174,20 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>
>       _FDT((fdt_end_node(fdt)));
>
> +    /* interrupt controller */
> +    _FDT((fdt_begin_node(fdt, "interrupt-controller@0")));
> +
> +    _FDT((fdt_property_string(fdt, "device_type",
> +                              "PowerPC-External-Interrupt-Presentation")));
> +    _FDT((fdt_property_string(fdt, "compatible", "IBM,ppc-xicp")));
> +    _FDT((fdt_property_cell(fdt, "reg", 0)));
> +    _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
> +    _FDT((fdt_property(fdt, "ibm,interrupt-server-ranges",
> +                       interrupt_server_ranges_prop,
> +                       sizeof(interrupt_server_ranges_prop))));
> +
> +    _FDT((fdt_end_node(fdt)));
> +
>       /* vdevice */
>       _FDT((fdt_begin_node(fdt, "vdevice")));
>
> @@ -175,6 +195,8 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>       _FDT((fdt_property_string(fdt, "compatible", "IBM,vdevice")));
>       _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
>       _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
> +    _FDT((fdt_property_cell(fdt, "#interrupt-cells", 0x2)));
> +    _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
>
>       _FDT((fdt_end_node(fdt)));
>
> @@ -290,6 +312,10 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>       }
>       qemu_free(filename);
>
> +    /* Set up Interrupt Controller */
> +    spapr->icp = xics_system_init(smp_cpus,&env, MAX_SERIAL_PORTS);
> +
> +    /* Set up VIO bus */
>       spapr->vio_bus = spapr_vio_bus_init();
>
>       for (i = 0; i<  MAX_SERIAL_PORTS; i++) {
> diff --git a/hw/spapr.h b/hw/spapr.h
> index 7a7c319..4b54c22 100644
> --- a/hw/spapr.h
> +++ b/hw/spapr.h
> @@ -2,9 +2,11 @@
>   #define __HW_SPAPR_H__
>
>   struct VIOsPAPRBus;
> +struct icp_state;
>
>   typedef struct sPAPREnvironment {
>       struct VIOsPAPRBus *vio_bus;
> +    struct icp_state *icp;
>   } sPAPREnvironment;
>
>   #define H_SUCCESS         0
> diff --git a/hw/xics.c b/hw/xics.c
> new file mode 100644
> index 0000000..46e778a
> --- /dev/null
> +++ b/hw/xics.c
> @@ -0,0 +1,528 @@
> +#include "hw.h"
> +#include "hw/spapr.h"
> +#include "hw/xics.h"
> +
> +#include<pthread.h>
> +
> +/*
> + * ICP: Presentation layer
> + */
> +
> +struct icp_server_state {
> +    uint32_t cppr :8;
> +    uint32_t xisr :24;
> +    uint8_t pending_priority;
> +    uint8_t mfrr;
> +    qemu_irq output;
> +    pthread_mutex_t lock;
> +};
> +
> +struct ics_state;
> +
> +struct icp_state {
> +    long nr_servers;
> +    struct icp_server_state *ss;
> +    struct ics_state *ics;
> +};
> +
> +static void ics_reject(struct ics_state *ics, int nr);
> +static void ics_resend(struct ics_state *ics);
> +static void ics_eoi(struct ics_state *ics, int nr);
> +
> +static void icp_check_ipi(struct icp_state *icp, int server)
> +{
> +    struct icp_server_state *ss = icp->ss + server;
> +
> +    if (ss->xisr&&  (ss->pending_priority<= ss->mfrr)) {
> +        return;
> +    }
> +
> +    if (ss->xisr) {
> +        ics_reject(icp->ics, ss->xisr);
> +    }
> +
> +    ss->xisr = XICS_IPI;
> +    ss->pending_priority = ss->mfrr;
> +    qemu_irq_raise(ss->output);
> +}
> +
> +static void icp_resend(struct icp_state *icp, int server)
> +{
> +    struct icp_server_state *ss = icp->ss + server;
> +
> +    if (ss->mfrr<  ss->cppr) {
> +        icp_check_ipi(icp, server);
> +    }
> +    ics_resend(icp->ics);
> +}
> +
> +static void icp_set_cppr(struct icp_state *icp, int server, uint8_t cppr)
> +{
> +    struct icp_server_state *ss = icp->ss + server;
> +    uint8_t old_cppr;
> +    uint32_t old_xisr;
> +
> +    pthread_mutex_lock(&ss->lock);
> +    old_cppr = ss->cppr;
> +    ss->cppr = cppr;
> +
> +    if (cppr<  old_cppr) {
> +        if (ss->xisr&&  (cppr<= ss->pending_priority)) {
> +            old_xisr = ss->xisr;
> +            ss->xisr = 0;
> +            qemu_irq_lower(ss->output);
> +            ics_reject(icp->ics, old_xisr);
> +        }
> +    } else {
> +        if (!ss->xisr) {
> +            icp_resend(icp, server);
> +        }
> +    }
> +    pthread_mutex_unlock(&ss->lock);
> +}
> +
> +static void icp_set_mfrr(struct icp_state *icp, int nr, uint8_t mfrr)
> +{
> +    struct icp_server_state *ss = icp->ss + nr;
> +
> +    pthread_mutex_lock(&ss->lock);
> +
> +    ss->mfrr = mfrr;
> +    if (mfrr<  ss->cppr) {
> +        icp_check_ipi(icp, nr);
> +    }
> +
> +    pthread_mutex_unlock(&ss->lock);
> +}
> +
> +static uint32_t icp_accept(struct icp_server_state *ss)
> +{
> +    uint32_t xirr;
> +
> +    pthread_mutex_lock(&ss->lock);
> +    qemu_irq_lower(ss->output);
> +    xirr = ss->cppr<<  24 | ss->xisr;
> +    ss->xisr = 0;
> +    ss->cppr = ss->pending_priority;
> +    pthread_mutex_unlock(&ss->lock);
> +    return xirr;
> +}
> +
> +static void icp_eoi(struct icp_state *icp, int server, uint32_t xirr)
> +{
> +    struct icp_server_state *ss = icp->ss + server;
> +
> +    ics_eoi(icp->ics, xirr&  0xffffff);
> +    /* Send EOI ->  ICS */
> +    ss->cppr = xirr>>  24;
> +    if (!ss->xisr) {
> +        icp_resend(icp, server);
> +    }
> +}
> +
> +static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t priority)
> +{
> +    struct icp_server_state *ss = icp->ss + server;
> +
> +    pthread_mutex_lock(&ss->lock);
> +
> +    if ((priority>= ss->cppr)
> +        || (ss->xisr&&  (ss->pending_priority<= priority))) {
> +        ics_reject(icp->ics, nr);
> +    } else {
> +        if (ss->xisr) {
> +            ics_reject(icp->ics, ss->xisr);
> +        }
> +        ss->xisr = nr;
> +        ss->pending_priority = priority;
> +        qemu_irq_raise(ss->output);
> +    }
> +
> +    pthread_mutex_unlock(&ss->lock);
> +}
> +
> +/*
> + * ICS: Source layer
> + */
> +
> +struct ics_irq_state {
> +    int server;
> +    uint8_t priority;
> +    uint8_t saved_priority;
> +    /* int pending :1; */
> +    /* int presented :1; */
> +    int rejected :1;
> +    int masked_pending :1;
> +};
> +
> +struct ics_state {
> +    int nr_irqs;
> +    int offset;
> +    qemu_irq *qirqs;
> +    struct ics_irq_state *irqs;
> +    struct icp_state *icp;
> +};
> +
> +static int ics_valid_irq(struct ics_state *ics, uint32_t nr)
> +{
> +    return (nr>= ics->offset)
> +&&  (nr<  (ics->offset + ics->nr_irqs));
> +}
> +
> +static void ics_set_irq_msi(void *opaque, int nr, int val)
> +{
> +    struct ics_state *ics = (struct ics_state *)opaque;
> +    struct ics_irq_state *irq = ics->irqs + nr;
> +
> +    if (val) {
> +        if (irq->priority == 0xff) {
> +            irq->masked_pending = 1;
> +            /* masked pending */ ;
> +        } else  {
> +            icp_irq(ics->icp, irq->server, nr + ics->offset, irq->priority);
> +        }
> +    }
> +}
> +
> +static void ics_reject_msi(struct ics_state *ics, int nr)
> +{
> +    struct ics_irq_state *irq = ics->irqs + nr - ics->offset;
> +
> +    irq->rejected = 1;
> +}
> +
> +static void ics_resend_msi(struct ics_state *ics)
> +{
> +    int i;
> +
> +    for (i = 0; i<  ics->nr_irqs; i++) {
> +        struct ics_irq_state *irq = ics->irqs + i;
> +
> +        /* FIXME: filter by server#? */
> +        if (irq->rejected) {
> +            irq->rejected = 0;
> +            if (irq->priority != 0xff) {
> +                icp_irq(ics->icp, irq->server, i + ics->offset, irq->priority);
> +            }
> +        }
> +    }
> +}
> +
> +static void ics_write_xive_msi(struct ics_state *ics, int nr, int server,
> +                               uint8_t priority)
> +{
> +    struct ics_irq_state *irq = ics->irqs + nr;
> +
> +    irq->server = server;
> +    irq->priority = priority;
> +
> +    if (!irq->masked_pending || (priority = 0xff)) {
> +        return;
> +    }
> +
> +    irq->masked_pending = 0;
> +    icp_irq(ics->icp, server, nr + ics->offset, priority);
> +}
> +
> +/* static void ics_recheck_irq(struct ics_state *ics, int nr) */
> +/* { */
> +/*     struct ics_irq_state *irq = xics->irqs + (nr - xics->offset); */
> +
> +/*     if (irq->pending&&  (irq->priority != 0xff)) { */
> +/*      irq->presented = 1; */
> +/*      icp_irq(xicp->ss + irq->server, nr + ics->offset, irq->priority); */
> +/*     } */
> +/* } */
> +
> +/* static void ics_set_irq(void *opaque, int nr, int val) */
> +/* { */
> +/*     struct ics_state *ics = (struct ics_state *)opaque; */
> +/*     struct ics_irq_state *irq = ics->irqs + nr; */
> +
> +/*     irq->pending = val; */
> +/*     ics_recheck_irq(ics, nr); */
> +/* } */
> +
> +/* static void ics_reject(int nr) */
> +/* { */
> +/*     struct ics_irq_state *irq = xics->irqs + (nr - xics->offset); */
> +
> +/*     assert(irq->presented); */
> +/*     irq->rejected = 1; */
> +/*     irq->presented = 0; */
> +/* } */
> +
> +/* static void ics_eoi(int nr) */
> +/* { */
> +/*     struct ics_irq_state *irq = xics->irqs + (nr - xics->offset); */
> +
> +/*     assert(irq->presented); */
> +/*     irq->presented = 0; */
> +/*     irq->rejected = 0; */
> +/*     ics_recheck_irq(xics, nr); */
> +/* } */
> +
> +/* static void ics_resend_irq(struct ics_state *ics, int nr, */
> +/*                            struct icp_server_state *ss) */
> +/* { */
> +/*     struct ics_irq_state *irq = ics->irqs + (nr - ics->offset); */
> +
> +/*     if (!irq->rejected) */
> +/*         return; /\* Not rejected, so no need to resend *\/ */
> +
> +/*     if (ss != (xicp->ss + irq->server)) */
> +/*         return; /\* Not for this server, so don't resend *\/ */
> +
> +/*     ics_recheck_irq(ics, nr); */
> +/* } */
> +
> +/* static void ics_resend(struct icp_server_state *ss) */
> +/* { */
> +/*     int i; */
> +
> +/*     for (i = 0; i<  xics->nr_irqs; i++) */
> +/*         ics_resend_irq(xics, nr, ss); */
> +/* } */

Why is all this commented out? Better to #if 0 it away. Or, even better,
don't include it in the patch at all, unless you think the code is crucial
and will be activated soon.
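A minimal illustration of the #if 0 suggestion, assuming the dead code is kept at all. The preprocessor discards everything up to #endif, so the stale references to the old global xics/xicp pointers no longer have to compile, and the escaped /\* ... *\/ comments can go back to being ordinary comments. The disabled function is taken from the commented-out code above; ics_valid_irq_nr() is a hypothetical stand-in for the surrounding live code.

```c
#include <assert.h>

/* Live code: compiled as usual (hypothetical stand-in). */
static int ics_valid_irq_nr(int nr, int offset, int nr_irqs)
{
    return (nr >= offset) && (nr < offset + nr_irqs);
}

#if 0
/* Disabled: never compiled, so undefined names such as xicp are
 * harmless, and plain comments no longer need escaping. */
static void ics_resend_irq(struct ics_state *ics, int nr,
                           struct icp_server_state *ss)
{
    struct ics_irq_state *irq = ics->irqs + (nr - ics->offset);

    if (!irq->rejected)
        return; /* Not rejected, so no need to resend */

    if (ss != (xicp->ss + irq->server))
        return; /* Not for this server, so don't resend */

    ics_recheck_irq(ics, nr);
}
#endif
```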

> +
> +static void ics_reject(struct ics_state *ics, int nr)
> +{
> +    ics_reject_msi(ics, nr);
> +}
> +
> +static void ics_resend(struct ics_state *ics)
> +{
> +    ics_resend_msi(ics);
> +}
> +
> +static void ics_eoi(struct ics_state *ics, int nr)
> +{
> +}
> +
> +/*
> + * Exported functions
> + */
> +
> +qemu_irq xics_find_qirq(struct icp_state *icp, int irq)
> +{
> +    if ((irq<  icp->ics->offset)
> +        || (irq>= (icp->ics->offset + icp->ics->nr_irqs))) {
> +        return NULL;
> +    }
> +
> +    return icp->ics->qirqs[irq - icp->ics->offset];
> +}
> +
> +static target_ulong h_cppr(CPUState *env, sPAPREnvironment *spapr,
> +                           target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong cppr = args[0];
> +
> +    icp_set_cppr(spapr->icp, env->cpu_index, cppr);
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_ipi(CPUState *env, sPAPREnvironment *spapr,
> +                          target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong server = args[0];
> +    target_ulong mfrr = args[1];
> +
> +    if (server>= spapr->icp->nr_servers) {
> +        return H_PARAMETER;
> +    }
> +
> +    icp_set_mfrr(spapr->icp, server, mfrr);
> +    return H_SUCCESS;
> +
> +}
> +
> +static target_ulong h_xirr(CPUState *env, sPAPREnvironment *spapr,
> +                           target_ulong opcode, target_ulong *args)
> +{
> +    uint32_t xirr = icp_accept(spapr->icp->ss + env->cpu_index);
> +
> +    args[0] = xirr;
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_eoi(CPUState *env, sPAPREnvironment *spapr,
> +                          target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong xirr = args[0];
> +
> +    icp_eoi(spapr->icp, env->cpu_index, xirr);
> +    return H_SUCCESS;
> +}
> +
> +static void rtas_set_xive(sPAPREnvironment *spapr, uint32_t token,
> +                          uint32_t nargs, target_ulong args,
> +                          uint32_t nret, target_ulong rets)
> +{
> +    struct ics_state *ics = spapr->icp->ics;
> +    uint32_t nr, server, priority;
> +
> +    if ((nargs != 3) || (nret != 1)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    nr = rtas_ld(args, 0);
> +    server = rtas_ld(args, 1);
> +    priority = rtas_ld(args, 2);
> +
> +    if (!ics_valid_irq(ics, nr) || (server>= ics->icp->nr_servers)
> +        || (priority>  0xff)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    ics_write_xive_msi(ics, nr - ics->offset, server, priority);
> +
> +    rtas_st(rets, 0, 0); /* Success */
> +}
> +
> +static void rtas_get_xive(sPAPREnvironment *spapr, uint32_t token,
> +                          uint32_t nargs, target_ulong args,
> +                          uint32_t nret, target_ulong rets)
> +{
> +    struct ics_state *ics = spapr->icp->ics;
> +    uint32_t nr;
> +
> +    if ((nargs != 1) || (nret != 3)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    nr = rtas_ld(args, 0);
> +
> +    if (!ics_valid_irq(ics, nr)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    rtas_st(rets, 0, 0); /* Success */
> +    rtas_st(rets, 1, ics->irqs[nr - ics->offset].server);
> +    rtas_st(rets, 2, ics->irqs[nr - ics->offset].priority);
> +}
> +
> +static void rtas_int_off(sPAPREnvironment *spapr, uint32_t token,
> +                         uint32_t nargs, target_ulong args,
> +                         uint32_t nret, target_ulong rets)
> +{
> +    struct ics_state *ics = spapr->icp->ics;
> +    uint32_t nr;
> +
> +    if ((nargs != 1) || (nret != 1)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    nr = rtas_ld(args, 0);
> +
> +    if (!ics_valid_irq(ics, nr)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    /* This is a NOP for now, since the described PAPR semantics don't
> +     * seem to gel with what Linux does */
> +#if 0
> +    struct ics_irq_state *irq = xics->irqs + (nr - xics->offset);
> +
> +    irq->saved_priority = irq->priority;
> +    ics_write_xive_msi(xics, nr - xics->offset, irq->server, 0xff);
> +#endif
> +
> +    rtas_st(rets, 0, 0); /* Success */
> +}
> +
> +static void rtas_int_on(sPAPREnvironment *spapr, uint32_t token,
> +                        uint32_t nargs, target_ulong args,
> +                        uint32_t nret, target_ulong rets)
> +{
> +    struct ics_state *ics = spapr->icp->ics;
> +    uint32_t nr;
> +
> +    if ((nargs != 1) || (nret != 1)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    nr = rtas_ld(args, 0);
> +
> +    if (!ics_valid_irq(ics, nr)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    /* This is a NOP for now, since the described PAPR semantics don't
> +     * seem to gel with what Linux does */
> +#if 0
> +    struct ics_irq_state *irq = xics->irqs + (nr - xics->offset);
> +
> +    ics_write_xive_msi(xics, nr - xics->offset,
> +                       irq->server, irq->saved_priority);
> +#endif
> +
> +    rtas_st(rets, 0, 0); /* Success */
> +}
> +
> +struct icp_state *xics_system_init(int nr_servers, CPUState *servers[],
> +                                   int nr_irqs)
> +{
> +    int i;
> +    struct icp_state *icp;
> +    struct ics_state *ics;
> +
> +    icp = qemu_mallocz(sizeof(*icp));
> +    icp->nr_servers = nr_servers;
> +    icp->ss = qemu_mallocz(nr_servers * sizeof(struct icp_server_state));
> +
> +    for (i = 0; i<  nr_servers; i++) {
> +        servers[i]->cpu_index = i;
> +
> +        switch (PPC_INPUT(servers[i])) {
> +        case PPC_FLAGS_INPUT_POWER7:
> +            icp->ss[i].output = servers[i]->irq_inputs[POWER7_INPUT_INT];
> +            break;
> +
> +        case PPC_FLAGS_INPUT_970:
> +            icp->ss[i].output = servers[i]->irq_inputs[PPC970_INPUT_INT];
> +            break;
> +
> +        default:
> +            hw_error("XICS interrupt model does not support this CPU bus model\n");
> +            exit(1);
> +        }
> +
> +        icp->ss[i].mfrr = 0xff;
> +        pthread_mutex_init(&icp->ss[i].lock, NULL);
> +    }
> +
> +    ics = qemu_mallocz(sizeof(*ics));
> +    ics->nr_irqs = nr_irqs;
> +    ics->offset = 16;
> +    ics->irqs = qemu_mallocz(nr_irqs * sizeof(struct ics_irq_state));
> +
> +    icp->ics = ics;
> +    ics->icp = icp;
> +
> +    for (i = 0; i<  nr_irqs; i++) {
> +        ics->irqs[i].priority = 0xff;
> +        ics->irqs[i].saved_priority = 0xff;
> +    }
> +
> +    ics->qirqs = qemu_allocate_irqs(ics_set_irq_msi, ics, nr_irqs);
> +
> +    spapr_register_hypercall(H_CPPR, h_cppr);
> +    spapr_register_hypercall(H_IPI, h_ipi);
> +    spapr_register_hypercall(H_XIRR, h_xirr);
> +    spapr_register_hypercall(H_EOI, h_eoi);
> +
> +    spapr_rtas_register("ibm,set-xive", rtas_set_xive);
> +    spapr_rtas_register("ibm,get-xive", rtas_get_xive);
> +    spapr_rtas_register("ibm,int-off", rtas_int_off);
> +    spapr_rtas_register("ibm,int-on", rtas_int_on);
> +
> +    return icp;
> +}
> diff --git a/hw/xics.h b/hw/xics.h
> new file mode 100644
> index 0000000..e55f5f1
> --- /dev/null
> +++ b/hw/xics.h

Copyright/license header missing.

> @@ -0,0 +1,13 @@
> +#if !defined(__XICS_H__)
> +#define __XICS_H__
> +
> +#define XICS_IPI        0x2
> +
> +struct icp_state;
> +
> +qemu_irq xics_find_qirq(struct icp_state *icp, int irq);
> +
> +struct icp_state *xics_system_init(int nr_servers, CPUState *servers[],
> +                                   int nr_irqs);
> +
> +#endif /* __XICS_H__ */

Alex


* [Qemu-devel] Re: [PATCH 19/26] Add PAPR H_VIO_SIGNAL hypercall and infrastructure for VIO interrupts
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 19/26] Add PAPR H_VIO_SIGNAL hypercall and infrastructure for VIO interrupts David Gibson
@ 2011-03-16 15:49   ` Alexander Graf
  2011-03-17  1:38     ` David Gibson
From: Alexander Graf @ 2011-03-16 15:49 UTC
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 03/16/2011 05:56 AM, David Gibson wrote:
> This patch adds infrastructure to support interrupts from PAPR virtual IO
> devices.  This includes correctly advertising those interrupts in the
> device tree, and implementing the H_VIO_SIGNAL hypercall, used to
> enable and disable individual device interrupts.
>
> Signed-off-by: David Gibson<dwg@au1.ibm.com>
> ---
>   hw/spapr.c     |    2 +-
>   hw/spapr_vio.c |   34 ++++++++++++++++++++++++++++++++++
>   hw/spapr_vio.h |    6 ++++++
>   3 files changed, 41 insertions(+), 1 deletions(-)
>
> diff --git a/hw/spapr.c b/hw/spapr.c
> index be30def..5b19963 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -62,7 +62,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>       uint32_t start_prop = cpu_to_be32(initrd_base);
>       uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
>       uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
> -    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr";
> +    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt";
>       uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
>       int i;
>       char *modelname;
> diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
> index 0ed63f4..45edd94 100644
> --- a/hw/spapr_vio.c
> +++ b/hw/spapr_vio.c
> @@ -105,6 +105,15 @@ static int vio_make_devnode(VIOsPAPRDevice *dev,
>           }
>       }
>
> +    if (dev->qirq) {
> +        uint32_t ints_prop[] = {cpu_to_be32(dev->vio_irq_num), 0};
> +
> +        ret = fdt_setprop(fdt, node_off, "interrupts", ints_prop,
> +                          sizeof(ints_prop));
> +        if (ret<  0)

Braces

> +            return ret;
> +    }
> +
>       if (info->devnode) {
>           ret = (info->devnode)(dev, fdt, node_off);
>           if (ret<  0) {
> @@ -140,6 +149,28 @@ void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info)
>       qdev_register(&info->qdev);
>   }
>
> +static target_ulong h_vio_signal(CPUState *env, sPAPREnvironment *spapr,
> +                                 target_ulong opcode,
> +                                 target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    target_ulong mode = args[1];
> +    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +    VIOsPAPRDeviceInfo *info;
> +
> +    if (!dev)

Braces

> +        return H_PARAMETER;
> +
> +    info = (VIOsPAPRDeviceInfo *)dev->qdev.info;
> +
> +    if (mode&  ~info->signal_mask)

Braces

> +        return H_PARAMETER;;
> +
> +    dev->signal_state = mode;

No need to notify the device?
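A sketch of how the hypercall body could look with the braces added and a notification hook, under loud assumptions. struct vio_dev_model, its signal_changed callback, vio_signal_model(), count_notify() and the MODEL_H_* codes are all hypothetical stand-ins for VIOsPAPRDevice, VIOsPAPRDeviceInfo and the real H_* constants; none of them is an existing QEMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes; values as in PAPR (H_SUCCESS 0, H_PARAMETER -4). */
enum { MODEL_H_SUCCESS = 0, MODEL_H_PARAMETER = -4 };

struct vio_dev_model {
    uint32_t signal_mask;   /* modes the device accepts */
    uint32_t signal_state;  /* currently enabled modes */
    /* Hypothetical hook so the device can react when the guest changes
     * its interrupt mode; NULL if the device does not care. */
    void (*signal_changed)(struct vio_dev_model *dev);
};

static long vio_signal_model(struct vio_dev_model *dev, uint32_t mode)
{
    if (!dev) {
        return MODEL_H_PARAMETER;
    }

    if (mode & ~dev->signal_mask) {
        return MODEL_H_PARAMETER;
    }

    dev->signal_state = mode;
    if (dev->signal_changed) {
        dev->signal_changed(dev);
    }

    return MODEL_H_SUCCESS;
}

/* Example hook used to show that the device is notified. */
static int notify_count;

static void count_notify(struct vio_dev_model *dev)
{
    (void)dev;
    notify_count++;
}
```

Whether a real device needs the hook depends on whether it checks signal_state lazily before raising its interrupt or must act on the change immediately.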


Alex


* [Qemu-devel] Re: [PATCH 21/26] Implement TCE translation for sPAPR VIO
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 21/26] Implement TCE translation for sPAPR VIO David Gibson
@ 2011-03-16 16:03   ` Alexander Graf
  2011-03-16 20:05     ` Benjamin Herrenschmidt
  2011-03-17  1:43     ` David Gibson
  2011-03-16 22:20   ` [Qemu-devel] " Anthony Liguori
From: Alexander Graf @ 2011-03-16 16:03 UTC
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 03/16/2011 05:56 AM, David Gibson wrote:
> From: Ben Herrenschmidt<benh@kernel.crashing.org>
>
> This patch implements the necessary infrastructure and hypercalls for
> sPAPR's TCE (Translation Control Entry) IOMMU mechanism.  This is necessary
> for all virtual IO devices which do DMA (i.e. nearly all of them).
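The translation that the TCE table (filled in by H_PUT_TCE) provides can be sketched as a flat table lookup. This is a simplified model, not the patch's code. It assumes 4 KiB TCE pages and that bit 0x1 grants read and bit 0x2 grants write access (the patch's spapr_tce_dma_write() tests "tce & 2" before writing); the real values come from SPAPR_VIO_TCE_PAGE_SHIFT and enum VIOsPAPR_TCEAccess in hw/spapr_vio.h, which are not shown here. tce_translate() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4 KiB pages, 0x1 = read allowed, 0x2 = write allowed. */
#define TCE_PAGE_SHIFT 12
#define TCE_PAGE_SIZE  (1ULL << TCE_PAGE_SHIFT)
#define TCE_PAGE_MASK  (TCE_PAGE_SIZE - 1)
#define TCE_READ       0x1ULL
#define TCE_WRITE      0x2ULL

/* Translate an IO bus address (IOBA) into a guest physical address
 * through a flat table holding one TCE per page of the DMA window.
 * Returns 0 and stores the result in *gpa, or -1 if the IOBA lies
 * outside the window or the entry lacks the requested permission. */
static int tce_translate(const uint64_t *table, uint64_t window_size,
                         uint64_t ioba, uint64_t access, uint64_t *gpa)
{
    uint64_t tce;

    if (ioba >= window_size) {
        return -1;
    }

    tce = table[ioba >> TCE_PAGE_SHIFT];
    if ((tce & access) != access) {
        return -1;
    }

    /* Page frame from the TCE, offset within the page from the IOBA. */
    *gpa = (tce & ~TCE_PAGE_MASK) | (ioba & TCE_PAGE_MASK);
    return 0;
}
```

This handles a single page; a DMA that crosses page boundaries has to repeat the lookup for each page, which is what the loop in spapr_tce_dma_write() does.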
>
> Signed-off-by: Ben Herrenschmidt<benh@kernel.crashing.org>
> Signed-off-by: David Gibson<dwg@au1.ibm.com>
> ---
>   hw/spapr.c     |    3 +-
>   hw/spapr_vio.c |  232 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/spapr_vio.h |   32 ++++++++
>   3 files changed, 266 insertions(+), 1 deletions(-)
>
> diff --git a/hw/spapr.c b/hw/spapr.c
> index e7f8864..a362889 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -62,7 +62,8 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>       uint32_t start_prop = cpu_to_be32(initrd_base);
>       uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
>       uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
> -    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt";
> +    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
> +        "\0hcall-tce";
>       uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
>       int i;
>       char *modelname;
> diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
> index 45edd94..37cf51e 100644
> --- a/hw/spapr_vio.c
> +++ b/hw/spapr_vio.c
> @@ -37,6 +37,7 @@
>   #endif /* CONFIG_FDT */
>
>   /* #define DEBUG_SPAPR */
> +/* #define DEBUG_TCE */
>
>   #ifdef DEBUG_SPAPR
>   #define dprintf(fmt, ...) \
> @@ -114,6 +115,28 @@ static int vio_make_devnode(VIOsPAPRDevice *dev,
>               return ret;
>       }
>
> +    if (dev->rtce_window_size) {
> +        uint32_t dma_prop[] = {cpu_to_be32(dev->reg),
> +                               0, 0,
> +                               0, cpu_to_be32(dev->rtce_window_size)};
> +
> +        ret = fdt_setprop_cell(fdt, node_off, "ibm,#dma-address-cells", 2);
> +        if (ret<  0) {
> +            return ret;
> +        }
> +
> +        ret = fdt_setprop_cell(fdt, node_off, "ibm,#dma-size-cells", 2);
> +        if (ret<  0) {
> +            return ret;
> +        }
> +
> +        ret = fdt_setprop(fdt, node_off, "ibm,my-dma-window", dma_prop,
> +                          sizeof(dma_prop));
> +        if (ret<  0) {
> +            return ret;
> +        }
> +    }
> +
>       if (info->devnode) {
>           ret = (info->devnode)(dev, fdt, node_off);
>           if (ret<  0) {
> @@ -125,6 +148,210 @@ static int vio_make_devnode(VIOsPAPRDevice *dev,
>   }
>   #endif /* CONFIG_FDT */
>
> +/*
> + * RTCE handling
> + */
> +
> +static void rtce_init(VIOsPAPRDevice *dev)
> +{
> +    size_t size = (dev->rtce_window_size>>  SPAPR_VIO_TCE_PAGE_SHIFT)
> +        * sizeof(VIOsPAPR_RTCE);
> +
> +    if (size) {
> +        dev->rtce_table = qemu_mallocz(size);
> +    }
> +}
> +
> +static target_ulong h_put_tce(CPUState *env, sPAPREnvironment *spapr,
> +                              target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong liobn = args[0];
> +    target_ulong ioba = args[1];
> +    target_ulong tce = args[2];
> +    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, liobn);
> +    VIOsPAPR_RTCE *rtce;
> +
> +    if (!dev) {
> +        fprintf(stderr, "spapr_vio_put_tce on non-existent LIOBN "
> +                TARGET_FMT_lx "\n",
> +                liobn);
> +        return H_PARAMETER;
> +    }
> +
> +    ioba&= ~(SPAPR_VIO_TCE_PAGE_SIZE - 1);
> +
> +#ifdef DEBUG_TCE
> +    fprintf(stderr, "spapr_vio_put_tce on %s  ioba 0x" TARGET_FMT_lx
> +            "  TCE 0x" TARGET_FMT_lx "\n", dev->qdev.id, ioba, tce);
> +#endif
> +
> +    if (ioba>= dev->rtce_window_size) {
> +        fprintf(stderr, "spapr_vio_put_tce on out-of-boards IOBA 0x" TARGET_FMT_lx "\n",
> +                ioba);
> +        return H_PARAMETER;
> +    }
> +
> +    rtce = dev->rtce_table + (ioba>>  SPAPR_VIO_TCE_PAGE_SHIFT);
> +    rtce->tce = tce;
> +
> +    return H_SUCCESS;
> +}
> +
> +int spapr_vio_check_tces(VIOsPAPRDevice *dev, target_ulong ioba,
> +                         target_ulong len, enum VIOsPAPR_TCEAccess access)
> +{
> +    int start, end, i;
> +
> +    start = ioba>>  SPAPR_VIO_TCE_PAGE_SHIFT;
> +    end = (ioba + len - 1)>>  SPAPR_VIO_TCE_PAGE_SHIFT;
> +
> +    for (i = start; i<= end; i++) {
> +        if ((dev->rtce_table[i].tce&  access) != access) {
> +            fprintf(stderr, "FAIL on %d\n", i);
> +            return -1;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +/* XX Might want to special case KVM for speed ? */

XXX

> +int spapr_tce_dma_write(VIOsPAPRDevice *dev, uint64_t taddr, const void *buf,
> +                        uint32_t size)
> +{
> +#ifdef DEBUG_TCE
> +    fprintf(stderr, "spapr_tce_dma_write taddr=0x%llx size=0x%x\n",
> +            (unsigned long long)taddr, size);
> +#endif
> +
> +    while(size) {
> +        uint64_t tce;
> +        uint32_t lsize;
> +        uint64_t txaddr;
> +
> +        /* Check if we are in bound */
> +        if (taddr >= dev->rtce_window_size) {
> +            fprintf(stderr, "spapr_tce_dma_write out of bounds\n");
> +            return -H_DEST_PARM;
> +        }
> +        tce = dev->rtce_table[taddr >> SPAPR_VIO_TCE_PAGE_SHIFT].tce;
> +
> +        /* How much til end of page ? */
> +        lsize = MIN(size, ((~taddr) & SPAPR_VIO_TCE_PAGE_MASK) + 1);
> +
> +        /* Check TCE */
> +        if (!(tce & 2))

Braces
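
For illustration, the check with braces, together with the page arithmetic around it, pulled out into a standalone sketch. The TCE_* macros and both helper names are made up for this example, and the error return is simplified to -1 because the H_* codes are not defined here; the write bit 0x2, the window check, the lsize bound and the translation formula are taken from the quoted code.

```c
#include <stdint.h>

/* Hypothetical names mirroring SPAPR_VIO_TCE_PAGE_{SHIFT,SIZE,MASK}. */
#define TCE_PAGE_SHIFT 12
#define TCE_PAGE_SIZE  (1ULL << TCE_PAGE_SHIFT)
#define TCE_PAGE_MASK  (TCE_PAGE_SIZE - 1)
#define TCE_WRITE      0x2ULL   /* the bit tested by "tce & 2" in the patch */

/* Bytes from taddr to the end of its TCE page (the lsize bound). */
static uint64_t bytes_to_page_end(uint64_t taddr)
{
    return ((~taddr) & TCE_PAGE_MASK) + 1;
}

/*
 * Translate an I/O bus address through a flat TCE table for a write.
 * Returns 0 and stores the guest physical address in *txaddr, or -1 if
 * taddr lies outside the DMA window or the entry does not permit writes.
 */
static int tce_translate_for_write(const uint64_t *table, uint64_t window_size,
                                   uint64_t taddr, uint64_t *txaddr)
{
    uint64_t tce;

    if (taddr >= window_size) {
        return -1;
    }
    tce = table[taddr >> TCE_PAGE_SHIFT];

    if (!(tce & TCE_WRITE)) {
        return -1;
    }

    *txaddr = (tce & ~TCE_PAGE_MASK) | (taddr & TCE_PAGE_MASK);
    return 0;
}
```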

> +            return -H_DEST_PARM;
> +
> +        /* Translate */
> +        txaddr = (tce & ~SPAPR_VIO_TCE_PAGE_MASK) | (taddr & SPAPR_VIO_TCE_PAGE_MASK);
> +
> +#ifdef DEBUG_TCE
> +        fprintf(stderr, " ->  write to txaddr=0x%llx, size=0x%x\n",
> +                (unsigned long long)txaddr, lsize);
> +#endif
> +
> +        /* Do it */
> +        cpu_physical_memory_write(txaddr, buf, lsize);
> +        buf += lsize;
> +        taddr += lsize;
> +        size -= lsize;
> +    }
> +    return 0;
> +}
> +
> +/* XX Might want to special case KVM for speed ? */

XXX

> +int spapr_tce_dma_zero(VIOsPAPRDevice *dev, uint64_t taddr, uint32_t size)
> +{
> +    uint8_t *zeroes;
> +
> +#ifdef DEBUG_TCE
> +    fprintf(stderr, "spapr_tce_dma_zero taddr=0x%llx size=0x%x\n",
> +            (unsigned long long)taddr, size);
> +#endif
> +
> +    /* FIXME: do this better... */
> +    zeroes = alloca(size);
> +    memset(zeroes, 0, size);

You sure that zeroes is still alive during the call? If I were a 
compiler, I'd probably optimize the return away so that it'd end up 
being a simple branch to spapr_tce_dma_write - coincidentally 
invalidating the stack that zeroes is on.


Alex

* [Qemu-devel] Re: [PATCH 22/26] Implement sPAPR Virtual LAN (ibmveth)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 22/26] Implement sPAPR Virtual LAN (ibmveth) David Gibson
@ 2011-03-16 16:12   ` Alexander Graf
  2011-03-17  2:04     ` David Gibson
  2011-03-16 22:29   ` [Qemu-devel] " Anthony Liguori
  1 sibling, 1 reply; 82+ messages in thread
From: Alexander Graf @ 2011-03-16 16:12 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 03/16/2011 05:56 AM, David Gibson wrote:
> This patch implements the PAPR specified Inter Virtual Machine Logical
> LAN; that is the virtual hardware used by the Linux ibmveth driver.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   Makefile.target |    2 +-
>   hw/spapr.c      |   21 +++-
>   hw/spapr_llan.c |  476 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/spapr_vio.h  |    9 +-
>   4 files changed, 503 insertions(+), 5 deletions(-)
>   create mode 100644 hw/spapr_llan.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 2b0588e..ef86d43 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -233,7 +233,7 @@ obj-ppc-y += ppc_oldworld.o
>   obj-ppc-y += ppc_newworld.o
>   # IBM pSeries (sPAPR)
>   obj-ppc-y += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
> -obj-ppc-y += xics.o spapr_vty.o
> +obj-ppc-y += xics.o spapr_vty.o spapr_llan.o
>   # PowerPC 4xx boards
>   obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>   obj-ppc-y += ppc440.o ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> index a362889..44cf3cc 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -27,6 +27,7 @@
>   #include "sysemu.h"
>   #include "hw.h"
>   #include "elf.h"
> +#include "net.h"
>
>   #include "hw/boards.h"
>   #include "hw/ppc.h"
> @@ -315,7 +316,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>       qemu_free(filename);
>
>       /* Set up Interrupt Controller */
> -    spapr->icp = xics_system_init(smp_cpus, &env, MAX_SERIAL_PORTS);
> +    spapr->icp = xics_system_init(smp_cpus, envs, MAX_SERIAL_PORTS + nb_nics);
>
>       /* Set up VIO bus */
>       spapr->vio_bus = spapr_vio_bus_init();
> @@ -327,6 +328,24 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>           }
>       }
>
> +    for (i = 0; i < nb_nics; i++, irq++) {
> +        NICInfo *nd = &nd_table[i];
> +
> +        if (!nd->model) {
> +            nd->model = qemu_strdup("ibmveth");
> +        }
> +
> +        if (strcmp(nd->model, "ibmveth") == 0) {
> +            spapr_vlan_create(spapr->vio_bus, 0x1000 + i, nd,
> +                              xics_find_qirq(spapr->icp, irq), irq);
> +        } else {
> +            fprintf(stderr, "pSeries (sPAPR) platform does not support "
> +                    "NIC model '%s' (only ibmveth is supported)\n",
> +                    nd->model);
> +            exit(1);
> +        }
> +    }
> +
>       if (kernel_filename) {
>           uint64_t lowaddr = 0;
>
> diff --git a/hw/spapr_llan.c b/hw/spapr_llan.c
> new file mode 100644
> index 0000000..da0562d
> --- /dev/null
> +++ b/hw/spapr_llan.c
> @@ -0,0 +1,476 @@

Missing license header.

> +#include "hw.h"
> +#include "net.h"
> +#include "hw/qdev.h"
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +
> +#include <libfdt.h>

Hrm, might be good to make compilation of this depend on fdt being available then?
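
A sketch of that, reusing the #ifdef CONFIG_FDT guard that spapr_vio.c already uses for its device-tree code. struct vio_info_sketch and vlan_devnode_sketch() are hypothetical, cut-down stand-ins for VIOsPAPRDeviceInfo and spapr_vlan_devnode(); leaving .devnode NULL without libfdt assumes the VIO core skips NULL hooks.

```c
#include <stddef.h>

#ifdef CONFIG_FDT
#include <libfdt.h>
#endif

/* Hypothetical, cut-down stand-in for VIOsPAPRDeviceInfo. */
struct vio_info_sketch {
    const char *dt_name;
    int (*devnode)(void *fdt, int node_off);
};

#ifdef CONFIG_FDT
/* Only compiled when libfdt is available. */
static int vlan_devnode_sketch(void *fdt, int node_off)
{
    return fdt_setprop_cell(fdt, node_off, "ibm,mac-address-filters", 0);
}
#endif

static const struct vio_info_sketch vlan_info_sketch = {
    .dt_name = "l-lan",
#ifdef CONFIG_FDT
    .devnode = vlan_devnode_sketch,
#endif
    /* Without CONFIG_FDT, .devnode is NULL; callers must check for that. */
};
```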

> +
> +#define ETH_ALEN        6
> +
> +//#define DEBUG
> +
> +#ifdef DEBUG
> +#define dprintf(fmt...) do { fprintf(stderr, fmt); } while(0)
> +#else
> +#define dprintf(fmt...)
> +#endif
> +
> +/*
> + * Virtual LAN device
> + */
> +
> +typedef uint64_t vlan_bd_t;
> +
> +#define VLAN_BD_VALID        0x8000000000000000ULL
> +#define VLAN_BD_TOGGLE       0x4000000000000000ULL
> +#define VLAN_BD_NO_CSUM      0x0200000000000000ULL
> +#define VLAN_BD_CSUM_GOOD    0x0100000000000000ULL
> +#define VLAN_BD_LEN_MASK     0x00ffffff00000000ULL
> +#define VLAN_BD_LEN(bd)      (((bd) & VLAN_BD_LEN_MASK) >> 32)
> +#define VLAN_BD_ADDR_MASK    0x00000000ffffffffULL
> +#define VLAN_BD_ADDR(bd)     ((bd) & VLAN_BD_ADDR_MASK)
> +
> +#define VLAN_VALID_BD(addr, len) (VLAN_BD_VALID | \
> +                                  (((len) << 32) & VLAN_BD_LEN_MASK) |  \
> +                                  (addr & VLAN_BD_ADDR_MASK))
> +
> +#define VLAN_RXQC_TOGGLE     0x80
> +#define VLAN_RXQC_VALID      0x40
> +#define VLAN_RXQC_NO_CSUM    0x02
> +#define VLAN_RXQC_CSUM_GOOD  0x01
> +
> +#define VLAN_RQ_ALIGNMENT    16
> +#define VLAN_RXQ_BD_OFF      0
> +#define VLAN_FILTER_BD_OFF   8
> +#define VLAN_RX_BDS_OFF      16
> +#define VLAN_MAX_BUFS        ((SPAPR_VIO_TCE_PAGE_SIZE - VLAN_RX_BDS_OFF) / 8)
> +
> +typedef struct VIOsPAPRVLANDevice {
> +    VIOsPAPRDevice sdev;
> +    NICConf nicconf;
> +    NICState *nic;
> +    int isopen;
> +    target_ulong buf_list;
> +    int add_buf_ptr, use_buf_ptr, rx_bufs;
> +    target_ulong rxq_ptr;
> +} VIOsPAPRVLANDevice;
> +
> +static int spapr_vlan_can_receive(VLANClientState *nc)
> +{
> +    VIOsPAPRVLANDevice *dev = DO_UPCAST(NICState, nc, nc)->opaque;
> +
> +    return (dev->isopen && dev->rx_bufs > 0);
> +}
> +
> +static ssize_t spapr_vlan_receive(VLANClientState *nc, const uint8_t *buf,
> +                                  size_t size)
> +{
> +    VIOsPAPRDevice *sdev = DO_UPCAST(NICState, nc, nc)->opaque;
> +    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
> +    vlan_bd_t rxq_bd = ldq_tce(sdev, dev->buf_list + VLAN_RXQ_BD_OFF);
> +    vlan_bd_t bd;
> +    int buf_ptr = dev->use_buf_ptr;
> +    uint64_t handle;
> +    uint8_t control;
> +
> +    dprintf("spapr_vlan_receive() [%s] rx_bufs=%d\n", sdev->qdev.id,
> +            dev->rx_bufs);
> +
> +    if (!dev->isopen) {
> +        return -1;
> +    }
> +
> +    if (!dev->rx_bufs) {
> +        return -1;
> +    }
> +
> +    do {
> +        buf_ptr += 8;
> +        if (buf_ptr >= SPAPR_VIO_TCE_PAGE_SIZE) {
> +            buf_ptr = VLAN_RX_BDS_OFF;
> +        }
> +
> +        bd = ldq_tce(sdev, dev->buf_list + buf_ptr);
> +        dprintf("use_buf_ptr=%d bd=0x%016llx\n",
> +                buf_ptr, (unsigned long long)bd);
> +    } while ((!(bd & VLAN_BD_VALID) || (VLAN_BD_LEN(bd) < (size + 8)))
> +             && (buf_ptr != dev->use_buf_ptr));
> +
> +    if (!(bd & VLAN_BD_VALID) || (VLAN_BD_LEN(bd) < (size + 8))) {
> +        /* Failed to find a suitable buffer */
> +        return -1;
> +    }
> +
> +    /* Remove the buffer from the pool */
> +    dev->rx_bufs--;
> +    dev->use_buf_ptr = buf_ptr;
> +    stq_tce(sdev, dev->buf_list + dev->use_buf_ptr, 0);
> +
> +    dprintf("Found buffer: ptr=%d num=%d\n", dev->use_buf_ptr, dev->rx_bufs);
> +
> +    /* Transfer the packet data */
> +    if (spapr_tce_dma_write(sdev, VLAN_BD_ADDR(bd) + 8, buf, size) < 0) {
> +        return -1;
> +    }
> +
> +    dprintf("spapr_vlan_receive: DMA write completed\n");
> +
> +    /* Update the receive queue */
> +    control = VLAN_RXQC_TOGGLE | VLAN_RXQC_VALID;
> +    if (rxq_bd & VLAN_BD_TOGGLE) {
> +        control ^= VLAN_RXQC_TOGGLE;
> +    }
> +
> +    handle = ldq_tce(sdev, VLAN_BD_ADDR(bd));
> +    stq_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 8, handle);
> +    stw_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 4, size);
> +    sth_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 2, 8);
> +    stb_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr, control);
> +
> +    dprintf("wrote rxq entry (ptr=0x%llx): 0x%016llx 0x%016llx\n",
> +            (unsigned long long)dev->rxq_ptr,
> +            (unsigned long long)ldq_tce(sdev, VLAN_BD_ADDR(rxq_bd) +
> +                                        dev->rxq_ptr),
> +            (unsigned long long)ldq_tce(sdev, VLAN_BD_ADDR(rxq_bd) +
> +                                        dev->rxq_ptr + 8));
> +
> +    dev->rxq_ptr += 16;
> +    if (dev->rxq_ptr >= VLAN_BD_LEN(rxq_bd)) {
> +        dev->rxq_ptr = 0;
> +        stq_tce(sdev, dev->buf_list + VLAN_RXQ_BD_OFF, rxq_bd ^ VLAN_BD_TOGGLE);
> +    }
> +
> +    if (sdev->signal_state & 1) {
> +        qemu_irq_pulse(sdev->qirq);
> +    }
> +
> +    return size;
> +}
> +
> +static NetClientInfo net_spapr_vlan_info = {
> +    .type = NET_CLIENT_TYPE_NIC,
> +    .size = sizeof(NICState),
> +    .can_receive = spapr_vlan_can_receive,
> +    .receive = spapr_vlan_receive,
> +};
> +
> +static int spapr_vlan_init(VIOsPAPRDevice *sdev)
> +{
> +    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
> +    VIOsPAPRBus *bus;
> +
> +    bus = DO_UPCAST(VIOsPAPRBus, bus, sdev->qdev.parent_bus);
> +
> +    qemu_macaddr_default_if_unset(&dev->nicconf.macaddr);
> +
> +    dev->nic = qemu_new_nic(&net_spapr_vlan_info, &dev->nicconf,
> +                            sdev->qdev.info->name, sdev->qdev.id, dev);
> +    qemu_format_nic_info_str(&dev->nic->nc, dev->nicconf.macaddr.a);
> +
> +    return 0;
> +}
> +
> +void spapr_vlan_create(VIOsPAPRBus *bus, uint32_t reg, NICInfo *nd,
> +                       qemu_irq qirq, uint32_t vio_irq_num)
> +{
> +    DeviceState *dev;
> +    VIOsPAPRDevice *sdev;
> +
> +    dev = qdev_create(&bus->bus, "spapr-vlan");
> +    qdev_prop_set_uint32(dev, "reg", reg);
> +
> +    qdev_set_nic_properties(dev, nd);
> +
> +    qdev_init_nofail(dev);
> +    sdev = (VIOsPAPRDevice *)dev;
> +    sdev->qirq = qirq;
> +    sdev->vio_irq_num = vio_irq_num;
> +}
> +
> +static int spapr_vlan_devnode(VIOsPAPRDevice *dev, void *fdt, int node_off)
> +{
> +    VIOsPAPRVLANDevice *vdev = (VIOsPAPRVLANDevice *)dev;
> +    int ret;
> +
> +    ret = fdt_setprop(fdt, node_off, "local-mac-address",
> +                      &vdev->nicconf.macaddr, ETH_ALEN);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = fdt_setprop_cell(fdt, node_off, "ibm,mac-address-filters", 0);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    return 0;
> +}
> +
> +static int check_bd(VIOsPAPRVLANDevice *dev, vlan_bd_t bd, target_ulong alignment)
> +{
> +    if ((VLAN_BD_ADDR(bd) % alignment)
> +        || (VLAN_BD_LEN(bd) % alignment)) {
> +        return -1;
> +    }
> +
> +    if (spapr_vio_check_tces(&dev->sdev, VLAN_BD_ADDR(bd),
> +                             VLAN_BD_LEN(bd), SPAPR_TCE_RW) != 0) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static target_ulong h_register_logical_lan(CPUState *env, sPAPREnvironment *spapr,
> +                                           target_ulong opcode,
> +                                           target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    target_ulong buf_list = args[1];
> +    target_ulong rec_queue = args[2];
> +    target_ulong filter_list = args[3];
> +//    target_ulong mac_address = args[4];

Hrm :). Duplicate from below?

> +    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
> +    vlan_bd_t filter_list_bd;
> +#ifdef DEBUG
> +    target_ulong mac_address = args[4];
> +#endif
> +
> +    if (!dev) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (dev->isopen) {
> +        fprintf(stderr, "H_REGISTER_LOGICAL_LAN called twice without "
> +                "H_FREE_LOGICAL_LAN\n");
> +        return H_RESOURCE;
> +    }
> +
> +    if (check_bd(dev, VLAN_VALID_BD(buf_list, SPAPR_VIO_TCE_PAGE_SIZE),
> +                 SPAPR_VIO_TCE_PAGE_SIZE) < 0) {
> +        fprintf(stderr, "Bad buf_list 0x" TARGET_FMT_lx
> +                " for H_REGISTER_LOGICAL_LAN\n", buf_list);
> +        return H_PARAMETER;
> +    }
> +
> +    filter_list_bd = VLAN_VALID_BD(filter_list, SPAPR_VIO_TCE_PAGE_SIZE);
> +    if (check_bd(dev, filter_list_bd, SPAPR_VIO_TCE_PAGE_SIZE) < 0) {
> +        fprintf(stderr, "Bad filter_list 0x" TARGET_FMT_lx
> +                " for H_REGISTER_LOGICAL_LAN\n", filter_list);
> +        return H_PARAMETER;
> +    }
> +
> +    if (!(rec_queue & VLAN_BD_VALID)
> +        || (check_bd(dev, rec_queue, VLAN_RQ_ALIGNMENT) < 0)) {
> +        fprintf(stderr, "Bad receive queue for H_REGISTER_LOGICAL_LAN\n");
> +        return H_PARAMETER;
> +    }
> +
> +    dev->buf_list = buf_list;
> +    sdev->signal_state = 0;
> +
> +    rec_queue &= ~VLAN_BD_TOGGLE;
> +
> +    /* Initialize the buffer list */
> +    stq_tce(sdev, buf_list, rec_queue);
> +    stq_tce(sdev, buf_list + 8, filter_list_bd);
> +    spapr_tce_dma_zero(sdev, buf_list + VLAN_RX_BDS_OFF,
> +                       SPAPR_VIO_TCE_PAGE_SIZE - VLAN_RX_BDS_OFF);
> +    dev->add_buf_ptr = VLAN_RX_BDS_OFF - 8;
> +    dev->use_buf_ptr = VLAN_RX_BDS_OFF - 8;
> +    dev->rx_bufs = 0;
> +    dev->rxq_ptr = 0;
> +
> +    /* Initialize the receive queue */
> +    spapr_tce_dma_zero(sdev, VLAN_BD_ADDR(rec_queue), VLAN_BD_LEN(rec_queue));
> +
> +    dev->isopen = 1;
> +    return H_SUCCESS;
> +}
> +
> +
> +static target_ulong h_free_logical_lan(CPUState *env, sPAPREnvironment *spapr,
> +                                       target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
> +
> +    if (!dev) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (!dev->isopen) {
> +        fprintf(stderr, "H_FREE_LOGICAL_LAN called without "
> +                "H_REGISTER_LOGICAL_LAN\n");
> +        return H_RESOURCE;
> +    }
> +
> +    dev->buf_list = 0;
> +    dev->rx_bufs = 0;
> +    dev->isopen = 0;
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_add_logical_lan_buffer(CPUState *env, sPAPREnvironment *spapr,
> +                                             target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    target_ulong buf = args[1];
> +    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
> +    vlan_bd_t bd;
> +
> +    dprintf("H_ADD_LOGICAL_LAN_BUFFER(0x" TARGET_FMT_lx
> +            ", 0x" TARGET_FMT_lx ")\n", reg, buf);
> +
> +    if (!sdev) {
> +        fprintf(stderr, "Wrong device in h_add_logical_lan_buffer\n");
> +        return H_PARAMETER;
> +    }
> +
> +    if ((check_bd(dev, buf, 4) < 0)
> +        || (VLAN_BD_LEN(buf) < 16)) {
> +        fprintf(stderr, "Bad buffer enqueued in h_add_logical_lan_buffer\n");
> +        return H_PARAMETER;
> +    }
> +
> +    if (!dev->isopen || dev->rx_bufs >= VLAN_MAX_BUFS) {
> +        return H_RESOURCE;
> +    }
> +
> +    do {
> +        dev->add_buf_ptr += 8;
> +        if (dev->add_buf_ptr >= SPAPR_VIO_TCE_PAGE_SIZE) {
> +            dev->add_buf_ptr = VLAN_RX_BDS_OFF;
> +        }
> +
> +        bd = ldq_tce(sdev, dev->buf_list + dev->add_buf_ptr);
> +    } while (bd & VLAN_BD_VALID);
> +
> +    stq_tce(sdev, dev->buf_list + dev->add_buf_ptr, buf);
> +
> +    dev->rx_bufs++;
> +
> +    dprintf("h_add_logical_lan_buffer():  Added buf  ptr=%d  rx_bufs=%d"
> +            " bd=0x%016llx\n", dev->add_buf_ptr, dev->rx_bufs,
> +            (unsigned long long)buf);
> +
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_send_logical_lan(CPUState *env, sPAPREnvironment *spapr,
> +                                       target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    target_ulong *bufs = args + 1;
> +    target_ulong continue_token = args[7];
> +    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
> +    unsigned total_len;
> +    uint8_t *lbuf, *p;
> +    int i, nbufs;
> +    int ret = H_SUCCESS;
> +
> +    dprintf("H_SEND_LOGICAL_LAN(0x" TARGET_FMT_lx ", <bufs>, 0x"
> +            TARGET_FMT_lx ")\n", reg, continue_token);
> +
> +    if (!sdev) {
> +        return H_PARAMETER;
> +    }
> +
> +    dprintf("rxbufs = %d\n", dev->rx_bufs);
> +
> +    if (!dev->isopen) {
> +        return H_DROPPED;
> +    }
> +
> +    if (continue_token) {
> +        return H_HARDWARE; /* FIXME actually handle this */
> +    }
> +
> +    total_len = 0;
> +    for (i = 0; i < 6; i++) {
> +        dprintf("   buf desc: 0x" TARGET_FMT_lx "\n", bufs[i]);
> +        if (!(bufs[i] & VLAN_BD_VALID)) {
> +            break;
> +        }
> +        total_len += VLAN_BD_LEN(bufs[i]);
> +    }
> +
> +    nbufs = i;
> +    dprintf("h_send_logical_lan() %d buffers, total length 0x%x\n",
> +            nbufs, total_len);
> +
> +    if (total_len == 0) {
> +        return ret;
> +    }
> +
> +    lbuf = qemu_mallocz(total_len);

Do you really need the zeroing here? In fact, this looks like a good 
candidate for alloca :).
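
Every byte of lbuf is overwritten by spapr_tce_dma_read() before qemu_send_packet() sees it, so a non-zeroing allocation looks sufficient. If it does become alloca, note that total_len is the sum of up to six guest-supplied lengths of up to 24 bits each, so it probably wants an upper bound first. A sketch of the gather step with a plain malloc(); struct seg, dma_read_fn, gather_segments() and array_read() are hypothetical names, with the callback standing in for spapr_tce_dma_read().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader standing in for spapr_tce_dma_read(). */
typedef int (*dma_read_fn)(void *opaque, uint64_t taddr,
                           void *buf, uint32_t size);

struct seg {
    uint64_t addr;   /* I/O bus address of the segment */
    uint32_t len;    /* length in bytes */
};

/*
 * Copy nsegs segments back to back into one heap buffer.  No zeroing:
 * every byte is written by a read before the buffer is handed out.
 * On success *out owns the buffer (NULL if the total length is 0).
 */
static int gather_segments(dma_read_fn rd, void *opaque,
                           const struct seg *segs, int nsegs,
                           uint8_t **out, size_t *out_len)
{
    size_t total = 0;
    uint8_t *buf, *p;
    int i;

    *out = NULL;
    *out_len = 0;

    for (i = 0; i < nsegs; i++) {
        total += segs[i].len;
    }
    if (total == 0) {
        return 0;                  /* nothing to send */
    }

    buf = malloc(total);
    if (!buf) {
        return -1;
    }

    p = buf;
    for (i = 0; i < nsegs; i++) {
        int ret = rd(opaque, segs[i].addr, p, segs[i].len);

        if (ret < 0) {
            free(buf);
            return ret;
        }
        p += segs[i].len;
    }

    *out = buf;
    *out_len = total;
    return 0;
}

/* Example reader: "guest memory" is a plain byte array indexed by address
 * (no bounds checking, for demonstration only). */
static int array_read(void *opaque, uint64_t taddr, void *buf, uint32_t size)
{
    memcpy(buf, (const uint8_t *)opaque + taddr, size);
    return 0;
}
```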

> +    p = lbuf;
> +    for (i = 0; i < nbufs; i++) {
> +        ret = spapr_tce_dma_read(sdev, VLAN_BD_ADDR(bufs[i]),
> +                                 p, VLAN_BD_LEN(bufs[i]));
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        p += VLAN_BD_LEN(bufs[i]);
> +    }
> +
> +    qemu_send_packet(&dev->nic->nc, lbuf, total_len);
> +
> +out:
> +    qemu_free(lbuf);
> +
> +    return ret;
> +}
> +
> +static target_ulong h_multicast_ctrl(CPUState *env, sPAPREnvironment *spapr,
> +                                     target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +
> +    if (!dev) {
> +        return H_PARAMETER;
> +    }
> +
> +    return H_SUCCESS;
> +}
> +
> +static void vlan_hcalls(VIOsPAPRBus *bus)
> +{
> +    spapr_register_hypercall(H_REGISTER_LOGICAL_LAN, h_register_logical_lan);
> +    spapr_register_hypercall(H_FREE_LOGICAL_LAN, h_free_logical_lan);
> +    spapr_register_hypercall(H_SEND_LOGICAL_LAN, h_send_logical_lan);
> +    spapr_register_hypercall(H_ADD_LOGICAL_LAN_BUFFER, h_add_logical_lan_buffer);
> +    spapr_register_hypercall(H_MULTICAST_CTRL, h_multicast_ctrl);
> +}
> +
> +static VIOsPAPRDeviceInfo spapr_vlan = {
> +    .init = spapr_vlan_init,
> +    .devnode = spapr_vlan_devnode,
> +    .dt_name = "l-lan",
> +    .dt_type = "network",
> +    .dt_compatible = "IBM,l-lan",
> +    .signal_mask = 0x1,
> +    .hcalls = vlan_hcalls,
> +    .qdev.name = "spapr-vlan",
> +    .qdev.size = sizeof(VIOsPAPRVLANDevice),
> +    .qdev.props = (Property[]) {
> +        DEFINE_PROP_UINT32("reg", VIOsPAPRDevice, reg, 0x1000),
> +        DEFINE_PROP_UINT32("dma-window", VIOsPAPRDevice, rtce_window_size,
> +                           0x10000000),
> +        DEFINE_NIC_PROPERTIES(VIOsPAPRVLANDevice, nicconf),
> +        DEFINE_PROP_END_OF_LIST(),
> +    },
> +};
> +
> +static void spapr_vlan_register(void)
> +{
> +    spapr_vio_bus_register_withprop(&spapr_vlan);
> +}
> +device_init(spapr_vlan_register);
> diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
> index 1b15d3e..4cfaf55 100644
> --- a/hw/spapr_vio.h
> +++ b/hw/spapr_vio.h
> @@ -21,9 +21,9 @@
>    * License along with this library; if not, see <http://www.gnu.org/licenses/>.
>    */
>
> -#define SPAPR_VIO_TCE_PAGE_SHIFT	12
> -#define SPAPR_VIO_TCE_PAGE_SIZE		(1ULL<<  SPAPR_VIO_TCE_PAGE_SHIFT)
> -#define SPAPR_VIO_TCE_PAGE_MASK		(SPAPR_VIO_TCE_PAGE_SIZE - 1)
> +#define SPAPR_VIO_TCE_PAGE_SHIFT   12
> +#define SPAPR_VIO_TCE_PAGE_SIZE    (1ULL << SPAPR_VIO_TCE_PAGE_SHIFT)
> +#define SPAPR_VIO_TCE_PAGE_MASK    (SPAPR_VIO_TCE_PAGE_SIZE - 1)

Those shouldn't have been tabs in the first place :)


Alex

* [Qemu-devel] Re: [PATCH 23/26] Implement PAPR CRQ hypercalls
  2011-03-16  4:57 ` [Qemu-devel] [PATCH 23/26] Implement PAPR CRQ hypercalls David Gibson
@ 2011-03-16 16:15   ` Alexander Graf
  0 siblings, 0 replies; 82+ messages in thread
From: Alexander Graf @ 2011-03-16 16:15 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 03/16/2011 05:57 AM, David Gibson wrote:
> From: Ben Herrenschmidt <benh@kernel.crashing.org>
>
> This patch implements the infrastructure and hypercalls necessary for the
> PAPR specified CRQ (Command Request Queue) mechanism.  This general
> request queueing system is used by many of the PAPR virtual IO devices,
> including the virtual scsi adapter.
>
> Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   hw/spapr.c     |    2 +-
>   hw/spapr_vio.c |  159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/spapr_vio.h |   12 ++++
>   3 files changed, 172 insertions(+), 1 deletions(-)
>
> diff --git a/hw/spapr.c b/hw/spapr.c
> index 44cf3cc..cb97a16 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -64,7 +64,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>       uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
>       uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
>       char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
> -        "\0hcall-tce";
> +        "\0hcall-tce\0hcall-vio";
>       uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
>       int i;
>       char *modelname;
> diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
> index 37cf51e..96668f3 100644
> --- a/hw/spapr_vio.c
> +++ b/hw/spapr_vio.c
> @@ -352,6 +352,159 @@ uint64_t ldq_tce(VIOsPAPRDevice *dev, uint64_t taddr)
>       return tswap64(val);
>   }
>
> +/*
> + * CRQ handling
> + */
> +static target_ulong h_reg_crq(CPUState *env, sPAPREnvironment *spapr,
> +                              target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    target_ulong queue_addr = args[1];
> +    target_ulong queue_len = args[2];
> +    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +
> +    if (!dev) {
> +        fprintf(stderr, "h_reg_crq on non-existent unit 0x"
> +                TARGET_FMT_lx "\n", reg);
> +        return H_PARAMETER;
> +    }
> +
> +    /* We can't grok a queue size bigger than 256M for now */
> +    if (queue_len < 0x1000 || queue_len > 0x10000000) {
> +        fprintf(stderr, "h_reg_crq, queue size too small or too big (0x%llx)\n",
> +                (unsigned long long)queue_len);
> +        return H_PARAMETER;
> +    }
> +
> +    /* Check queue alignment */
> +    if (queue_addr & 0xfff) {
> +        fprintf(stderr, "h_reg_crq, queue not aligned (0x%llx)\n",
> +                (unsigned long long)queue_addr);
> +        return H_PARAMETER;
> +    }
> +
> +    /* Check if device supports CRQs */
> +    if (!dev->crq.SendFunc) {
> +        return H_NOT_FOUND;
> +    }
> +
> +
> +    /* Already a queue ? */
> +    if (dev->crq.qsize) {
> +        return H_RESOURCE;
> +    }
> +    dev->crq.qladdr = queue_addr;
> +    dev->crq.qsize = queue_len;
> +    dev->crq.qnext = 0;
> +
> +    dprintf("CRQ for dev 0x" TARGET_FMT_lx " registered at 0x"
> +            TARGET_FMT_lx "/0x" TARGET_FMT_lx "\n",
> +            reg, queue_addr, queue_len);
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_free_crq(CPUState *env, sPAPREnvironment *spapr,
> +                               target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +
> +    if (!dev) {
> +        fprintf(stderr, "h_free_crq on non-existent unit 0x"
> +                TARGET_FMT_lx "\n", reg);
> +        return H_PARAMETER;
> +    }
> +
> +    dev->crq.qladdr = 0;
> +    dev->crq.qsize = 0;
> +    dev->crq.qnext = 0;
> +
> +    dprintf("CRQ for dev 0x" TARGET_FMT_lx " freed\n", reg);
> +
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_send_crq(CPUState *env, sPAPREnvironment *spapr,
> +                               target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    target_ulong msg_hi = args[1];
> +    target_ulong msg_lo = args[2];
> +    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +    uint64_t crq_mangle[2];
> +
> +    if (!dev) {
> +        fprintf(stderr, "h_send_crq on non-existent unit 0x"
> +                TARGET_FMT_lx "\n", reg);
> +        return H_PARAMETER;
> +    }
> +    crq_mangle[0] = cpu_to_be64(msg_hi);
> +    crq_mangle[1] = cpu_to_be64(msg_lo);
> +
> +    if (dev->crq.SendFunc) {
> +        return dev->crq.SendFunc(dev, (uint8_t *)crq_mangle);
> +    }
> +
> +    return H_HARDWARE;
> +}
> +
> +static target_ulong h_enable_crq(CPUState *env, sPAPREnvironment *spapr,
> +                                 target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +
> +    if (!dev) {
> +        fprintf(stderr, "h_enable_crq on non-existent unit 0x"
> +                TARGET_FMT_lx "\n", reg);
> +        return H_PARAMETER;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Returns negative error, 0 success, or positive: queue full */
> +int spapr_vio_send_crq(VIOsPAPRDevice *dev, uint8_t *crq)
> +{
> +    int rc;
> +    uint8_t byte;
> +
> +    if (!dev->crq.qsize) {
> +        fprintf(stderr, "spapr_vio_send_crq on uninitialized queue\n");
> +        return -1;
> +    }
> +
> +    /* Maybe do a fast path for KVM just writing to the pages */
> +    rc = spapr_tce_dma_read(dev, dev->crq.qladdr + dev->crq.qnext, &byte, 1);
> +    if (rc) {
> +        return rc;
> +    }
> +    if (byte != 0) {
> +        return 1;
> +    }
> +
> +    rc = spapr_tce_dma_write(dev, dev->crq.qladdr + dev->crq.qnext + 8, &crq[8], 8);
> +    if (rc) {
> +        return rc;
> +    }
> +#ifdef __powerpc__
> +    /* Really only needed for kvm... */

Please create a kvm helper function for this in target-ppc/kvm.c that is
just a nop when !kvm.
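
Something like the following, as a sketch. kvmppc_eieio() is only a suggested name, kvm_enabled() is stubbed with a flag so the example stands alone, and the bool return exists only to make it testable; the call site would then be a plain kvmppc_eieio() with no #ifdef.

```c
#include <stdbool.h>

/* Stand-in for QEMU's kvm_enabled(): just a flag in this sketch. */
static bool kvm_active;

static inline bool kvm_enabled(void)
{
    return kvm_active;
}

/*
 * Issue the eieio barrier from the quoted code when running under KVM
 * on a POWER host; a no-op otherwise.  Returns true if a barrier was
 * issued (only so the behaviour can be checked).
 */
static inline bool kvmppc_eieio(void)
{
#ifdef __powerpc__
    if (kvm_enabled()) {
        asm volatile("eieio" : : : "memory");
        return true;
    }
#endif
    return false;   /* TCG, or not built for a POWER host */
}
```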

> +    asm volatile("eieio" : : : "memory");
> +#endif
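
For context on why a barrier sits between the two writes: a simplified model of the enqueue step, assuming (as the byte != 0 check above suggests) that the first byte of each 16-byte entry is the flag the guest polls. Instead of eieio it uses the GCC/Clang __atomic builtins with release/acquire ordering, so every earlier store is visible before the flag. struct crq_ring and crq_enqueue() are hypothetical names; the real queue lives in guest memory behind TCEs.

```c
#include <stdint.h>
#include <string.h>

#define CRQ_ENTRY_SIZE 16

/* Hypothetical in-memory model of a CRQ: a ring of 16-byte entries. */
struct crq_ring {
    uint8_t *mem;    /* queue memory */
    uint32_t size;   /* bytes, a multiple of CRQ_ENTRY_SIZE */
    uint32_t next;   /* producer offset of the next slot */
};

/*
 * Returns 0 on success, 1 if the next slot is still in use (queue full).
 * msg[0] is expected to be non-zero: it is the valid flag.
 */
static int crq_enqueue(struct crq_ring *q, const uint8_t msg[CRQ_ENTRY_SIZE])
{
    uint8_t *slot = q->mem + q->next;

    if (__atomic_load_n(&slot[0], __ATOMIC_ACQUIRE) != 0) {
        return 1;                   /* consumer has not cleared it yet */
    }

    memcpy(slot + 8, msg + 8, 8);   /* second half first */
    memcpy(slot + 1, msg + 1, 7);   /* rest of the first half */

    /* Publish: the flag byte is stored last with release ordering, so a
     * reader that loads it with acquire ordering also sees the 15 bytes
     * written above. */
    __atomic_store_n(&slot[0], msg[0], __ATOMIC_RELEASE);

    q->next = (q->next + CRQ_ENTRY_SIZE) % q->size;
    return 0;
}
```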

Alex

* [Qemu-devel] Re: [PATCH 24/26] Implement PAPR virtual SCSI interface (ibmvscsi)
  2011-03-16  4:57 ` [Qemu-devel] [PATCH 24/26] Implement PAPR virtual SCSI interface (ibmvscsi) David Gibson
@ 2011-03-16 16:41   ` Alexander Graf
  2011-03-16 16:51     ` Anthony Liguori
  2011-03-16 20:08     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 82+ messages in thread
From: Alexander Graf @ 2011-03-16 16:41 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 03/16/2011 05:57 AM, David Gibson wrote:
> This patch implements the infrastructure and hypercalls necessary for
> the PAPR specified Virtual SCSI interface.  This is the normal method
> for providing (virtual) disks to PAPR partitions.
>
> Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   Makefile.target  |    2 +-
>   hw/ppc-viosrp.h  |  216 ++++++++++++
>   hw/spapr.c       |   10 +-
>   hw/spapr_vio.h   |    3 +
>   hw/spapr_vscsi.c |  960 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/srp.h         |  241 ++++++++++++++
>   6 files changed, 1430 insertions(+), 2 deletions(-)
>   create mode 100644 hw/ppc-viosrp.h
>   create mode 100644 hw/spapr_vscsi.c
>   create mode 100644 hw/srp.h
>
> diff --git a/Makefile.target b/Makefile.target
> index ef86d43..49f9e9a 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -233,7 +233,7 @@ obj-ppc-y += ppc_oldworld.o
>   obj-ppc-y += ppc_newworld.o
>   # IBM pSeries (sPAPR)
>   obj-ppc-y += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
> -obj-ppc-y += xics.o spapr_vty.o spapr_llan.o
> +obj-ppc-y += xics.o spapr_vty.o spapr_llan.o spapr_vscsi.o
>   # PowerPC 4xx boards
>   obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>   obj-ppc-y += ppc440.o ppc440_bamboo.o
> diff --git a/hw/ppc-viosrp.h b/hw/ppc-viosrp.h
> new file mode 100644
> index 0000000..9afcf7a
> --- /dev/null
> +++ b/hw/ppc-viosrp.h
> @@ -0,0 +1,216 @@
> +/*****************************************************************************/
> +/* srp.h -- SCSI RDMA Protocol definitions                                   */
> +/*                                                                           */
> +/* Written By: Colin Devilbis, IBM Corporation                               */
> +/*                                                                           */
> +/* Copyright (C) 2003 IBM Corporation                                        */
> +/*                                                                           */
> +/* This program is free software; you can redistribute it and/or modify      */
> +/* it under the terms of the GNU General Public License as published by      */
> +/* the Free Software Foundation; either version 2 of the License, or         */
> +/* (at your option) any later version.                                       */
> +/*                                                                           */
> +/* This program is distributed in the hope that it will be useful,           */
> +/* but WITHOUT ANY WARRANTY; without even the implied warranty of            */
> +/* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the             */
> +/* GNU General Public License for more details.                              */
> +/*                                                                           */
> +/* You should have received a copy of the GNU General Public License         */
> +/* along with this program; if not, write to the Free Software               */
> +/* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA */
> +/*                                                                           */
> +/*                                                                           */
> +/* This file contains structures and definitions for IBM RPA (RS/6000        */
> +/* platform architecture) implementation of the SRP (SCSI RDMA Protocol)     */
> +/* standard.  SRP is used on IBM iSeries and pSeries platforms to send SCSI  */
> +/* commands between logical partitions.                                      */
> +/*                                                                           */
> +/* SRP Information Units (IUs) are sent on a "Command/Response Queue" (CRQ)  */
> +/* between partitions.  The definitions in this file are architected,        */
> +/* and cannot be changed without breaking compatibility with other versions  */
> +/* of Linux and other operating systems (AIX, OS/400) that talk this protocol*/
> +/* between logical partitions                                                */
> +/*****************************************************************************/
> +#ifndef PPC_VIOSRP_H
> +#define PPC_VIOSRP_H
> +
> +#define SRP_VERSION "16.a"
> +#define SRP_MAX_IU_LEN    256
> +#define SRP_MAX_LOC_LEN 32
> +
> +union srp_iu {
> +    struct srp_login_req login_req;
> +    struct srp_login_rsp login_rsp;
> +    struct srp_login_rej login_rej;
> +    struct srp_i_logout i_logout;
> +    struct srp_t_logout t_logout;
> +    struct srp_tsk_mgmt tsk_mgmt;
> +    struct srp_cmd cmd;
> +    struct srp_rsp rsp;
> +    uint8_t reserved[SRP_MAX_IU_LEN];
> +};
> +
> +enum viosrp_crq_formats {
> +    VIOSRP_SRP_FORMAT = 0x01,
> +    VIOSRP_MAD_FORMAT = 0x02,
> +    VIOSRP_OS400_FORMAT = 0x03,
> +    VIOSRP_AIX_FORMAT = 0x04,
> +    VIOSRP_LINUX_FORMAT = 0x06,
> +    VIOSRP_INLINE_FORMAT = 0x07
> +};
> +
> +enum viosrp_crq_status {
> +    VIOSRP_OK = 0x0,
> +    VIOSRP_NONRECOVERABLE_ERR = 0x1,
> +    VIOSRP_VIOLATES_MAX_XFER = 0x2,
> +    VIOSRP_PARTNER_PANIC = 0x3,
> +    VIOSRP_DEVICE_BUSY = 0x8,
> +    VIOSRP_ADAPTER_FAIL = 0x10,
> +    VIOSRP_OK2 = 0x99,
> +};
> +
> +struct viosrp_crq {
> +    uint8_t valid;        /* used by RPA */
> +    uint8_t format;        /* SCSI vs out-of-band */
> +    uint8_t reserved;
> +    uint8_t status;        /* non-scsi failure? (e.g. DMA failure) */
> +    uint16_t timeout;        /* in seconds */
> +    uint16_t IU_length;        /* in bytes */
> +    uint64_t IU_data_ptr;    /* the TCE for transferring data */
> +};
> +
> +/* MADs are Management requests above and beyond the IUs defined in the SRP
> + * standard.
> + */
> +enum viosrp_mad_types {
> +    VIOSRP_EMPTY_IU_TYPE = 0x01,
> +    VIOSRP_ERROR_LOG_TYPE = 0x02,
> +    VIOSRP_ADAPTER_INFO_TYPE = 0x03,
> +    VIOSRP_HOST_CONFIG_TYPE = 0x04,
> +    VIOSRP_CAPABILITIES_TYPE = 0x05,
> +    VIOSRP_ENABLE_FAST_FAIL = 0x08,
> +};
> +
> +enum viosrp_mad_status {
> +    VIOSRP_MAD_SUCCESS = 0x00,
> +    VIOSRP_MAD_NOT_SUPPORTED = 0xF1,
> +    VIOSRP_MAD_FAILED = 0xF7,
> +};
> +
> +enum viosrp_capability_type {
> +    MIGRATION_CAPABILITIES = 0x01,
> +    RESERVATION_CAPABILITIES = 0x02,
> +};
> +
> +enum viosrp_capability_support {
> +    SERVER_DOES_NOT_SUPPORTS_CAP = 0x0,
> +    SERVER_SUPPORTS_CAP = 0x01,
> +    SERVER_CAP_DATA = 0x02,
> +};
> +
> +enum viosrp_reserve_type {
> +    CLIENT_RESERVE_SCSI_2 = 0x01,
> +};
> +
> +enum viosrp_capability_flag {
> +    CLIENT_MIGRATED = 0x01,
> +    CLIENT_RECONNECT = 0x02,
> +    CAP_LIST_SUPPORTED = 0x04,
> +    CAP_LIST_DATA = 0x08,
> +};
> +
> +/*
> + * Common MAD header
> + */
> +struct mad_common {
> +    uint32_t type;
> +    uint16_t status;
> +    uint16_t length;
> +    uint64_t tag;

Is this an in-memory representation? If so, it should be packed, right? 
Same goes for the ones below.
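To make the question concrete, here is a minimal sketch of what packing would and would not change. It assumes a GCC/Clang toolchain, because `__attribute__((packed))` is a compiler extension and not standard C. The `_packed` struct names are hypothetical. `mad_common` is naturally aligned at 16 bytes, so packing leaves it unchanged. `viosrp_empty_iu` ends in a `uint32_t` after a `uint64_t`, so unpacked it may pick up tail padding on ABIs that align `uint64_t` to 8. That matters because the lengths passed to `vscsi_send_iu()` come from `sizeof`.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Layouts as quoted in the patch (natural alignment). */
struct mad_common {
    uint32_t type;
    uint16_t status;
    uint16_t length;
    uint64_t tag;
};

struct viosrp_empty_iu {
    struct mad_common common;
    uint64_t buffer;
    uint32_t port;
};

/* Hypothetical packed variants; __attribute__((packed)) is a GCC/Clang
 * extension. */
struct mad_common_packed {
    uint32_t type;
    uint16_t status;
    uint16_t length;
    uint64_t tag;
} __attribute__((packed));

struct viosrp_empty_iu_packed {
    struct mad_common_packed common;
    uint64_t buffer;
    uint32_t port;
} __attribute__((packed));

/* The header is naturally aligned, so packing does not change it. */
static_assert(sizeof(struct mad_common) == 16, "16-byte MAD header");
static_assert(sizeof(struct mad_common_packed) == 16, "same when packed");

/* Field offsets agree; only trailing padding can differ. Unpacked,
 * sizeof() may round 28 up to 32 where uint64_t is 8-byte aligned. */
static_assert(offsetof(struct viosrp_empty_iu, port) == 24, "port offset");
static_assert(sizeof(struct viosrp_empty_iu_packed) == 28, "no tail padding");
```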

> +};
> +
> +/*
> + * All SRP (and MAD) requests normally flow from the
> + * client to the server.  There is no way for the server to send
> + * an asynchronous message back to the client.  The Empty IU is used
> + * to hang out a meaningless request to the server so that it can respond
> + * asynchronously with something like a SCSI AER
> + */
> +struct viosrp_empty_iu {
> +    struct mad_common common;
> +    uint64_t buffer;
> +    uint32_t port;
> +};
> +
> +struct viosrp_error_log {
> +    struct mad_common common;
> +    uint64_t buffer;
> +};
> +
> +struct viosrp_adapter_info {
> +    struct mad_common common;
> +    uint64_t buffer;
> +};
> +
> +struct viosrp_host_config {
> +    struct mad_common common;
> +    uint64_t buffer;
> +};
> +
> +struct viosrp_fast_fail {
> +    struct mad_common common;
> +};
> +
> +struct viosrp_capabilities {
> +    struct mad_common common;
> +    uint64_t buffer;
> +};
> +
> +struct mad_capability_common {
> +    uint32_t cap_type;
> +    uint16_t length;
> +    uint16_t server_support;
> +};
> +
> +struct mad_reserve_cap {
> +    struct mad_capability_common common;
> +    uint32_t type;
> +};
> +
> +struct mad_migration_cap {
> +    struct mad_capability_common common;
> +    uint32_t ecl;
> +};
> +
> +struct capabilities{

Missing space before '{', and the struct looks unused anyway.

> +    uint32_t flags;
> +    char name[SRP_MAX_LOC_LEN];
> +    char loc[SRP_MAX_LOC_LEN];
> +    struct mad_migration_cap migration;
> +    struct mad_reserve_cap reserve;
> +};
> +
> +union mad_iu {
> +    struct viosrp_empty_iu empty_iu;
> +    struct viosrp_error_log error_log;
> +    struct viosrp_adapter_info adapter_info;
> +    struct viosrp_host_config host_config;
> +    struct viosrp_fast_fail fast_fail;
> +    struct viosrp_capabilities capabilities;
> +};
> +
> +union viosrp_iu {
> +    union srp_iu srp;
> +    union mad_iu mad;
> +};
> +
> +struct mad_adapter_info_data {
> +    char srp_version[8];
> +    char partition_name[96];
> +    uint32_t partition_number;
> +    uint32_t mad_version;
> +    uint32_t os_type;
> +    uint32_t port_max_txu[8];    /* per-port maximum transfer */
> +};
> +
> +#endif
> diff --git a/hw/spapr.c b/hw/spapr.c
> index cb97a16..5f868fc 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -28,6 +28,7 @@
>  #include "hw.h"
>  #include "elf.h"
>  #include "net.h"
> +#include "blockdev.h"
>
>  #include "hw/boards.h"
>  #include "hw/ppc.h"
> @@ -316,7 +317,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>      qemu_free(filename);
>
>      /* Set up Interrupt Controller */
> -    spapr->icp = xics_system_init(smp_cpus, envs, MAX_SERIAL_PORTS + nb_nics);
> +    spapr->icp = xics_system_init(smp_cpus, envs, MAX_SERIAL_PORTS + nb_nics + drive_get_max_bus(IF_SCSI) + 1);

This looks like it's exceeding 80 characters :)
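One way to bring that line under 80 columns is to move the arithmetic into a helper. Below is a minimal sketch. `spapr_vio_irq_count()` is a hypothetical name. The SCSI term assumes `drive_get_max_bus()` returns -1 when no drive uses the interface. Under that assumption, `max_bus + 1` equals the number of VSCSI adapters created by the `for (i = 0; i <= drive_get_max_bus(IF_SCSI); i++)` loop, one per bus.

```c
#include <assert.h>

/* Hypothetical helper: number of VIO interrupts the board needs.
 * scsi_max_bus is what drive_get_max_bus(IF_SCSI) returns, assumed to be
 * -1 when no SCSI drives were given. */
static int spapr_vio_irq_count(int serial_ports, int nics, int scsi_max_bus)
{
    int vscsi_adapters = scsi_max_bus + 1;   /* one adapter per bus 0..max */

    return serial_ports + nics + vscsi_adapters;
}
```

The call site would then become `xics_system_init(smp_cpus, envs, spapr_vio_irq_count(MAX_SERIAL_PORTS, nb_nics, drive_get_max_bus(IF_SCSI)))`, which wraps naturally at the argument boundaries.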

>
>      /* Set up VIO bus */
>      spapr->vio_bus = spapr_vio_bus_init();
> @@ -346,6 +347,12 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>          }
>      }
>
> +    for (i = 0; i <= drive_get_max_bus(IF_SCSI); i++) {
> +        spapr_vscsi_create(spapr->vio_bus, 0x2000 + i,
> +                           xics_find_qirq(spapr->icp, irq), irq);
> +        irq++;
> +    }
> +
>      if (kernel_filename) {
>          uint64_t lowaddr = 0;
>
> @@ -406,6 +413,7 @@ static QEMUMachine spapr_machine = {
>      .max_cpus = MAX_CPUS,
>      .no_vga = 1,
>      .no_parallel = 1,
> +    .use_scsi = 1,
>  };
>
>  static void spapr_machine_init(void)
> diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
> index ba16795..b7d0daa 100644
> --- a/hw/spapr_vio.h
> +++ b/hw/spapr_vio.h
> @@ -101,4 +101,7 @@ void spapr_vty_create(VIOsPAPRBus *bus,
>  void spapr_vlan_create(VIOsPAPRBus *bus, uint32_t reg, NICInfo *nd,
>                         qemu_irq qirq, uint32_t vio_irq_num);
>
> +void spapr_vscsi_create(VIOsPAPRBus *bus, uint32_t reg,
> +                        qemu_irq qirq, uint32_t vio_irq_num);
> +
>  #endif /* _HW_SPAPR_VIO_H */
> diff --git a/hw/spapr_vscsi.c b/hw/spapr_vscsi.c
> new file mode 100644
> index 0000000..0a67095
> --- /dev/null
> +++ b/hw/spapr_vscsi.c
> @@ -0,0 +1,960 @@

Missing license header.

> +/* TODO:
> + *
> + *  - Cleanups :-)
> + *  - Sort out better how to assign devices to VSCSI instances
> + *  - Fix residual counts
> + *  - Add indirect descriptors support
> + *  - Maybe do autosense (PAPR seems to mandate it, linux doesn't care)
> + */
> +#include "hw.h"
> +#include "scsi.h"
> +#include "scsi-defs.h"
> +#include "net.h" /* Remove that when we can */
> +#include "srp.h"
> +#include "hw/qdev.h"
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +#include "hw/ppc-viosrp.h"
> +
> +#include <libfdt.h>
> +
> +//#define DEBUG_VSCSI
> +
> +#ifdef DEBUG_VSCSI
> +#define dprintf(fmt, ...) \
> +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define dprintf(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
> +#define min(a, b) ((a) < (b) ? (a) : (b))

There's MIN for that
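Here is a minimal sketch of the replacement. The `#ifndef` fallback exists only so the snippet compiles on its own. In-tree, the existing `MIN()` would be used directly. Like the local `min()`, this macro evaluates its arguments twice, so arguments with side effects should be avoided. `clamp_transfer()` is a hypothetical helper that mirrors the patch's `llen = min(len, md->len)` pattern.

```c
#include <assert.h>
#include <stdint.h>

#ifndef MIN
#define MIN(a, b) (((a) < (b)) ? (a) : (b))  /* fallback for standalone builds */
#endif

/* Hypothetical helper: how many bytes of the remaining request fit in
 * the current descriptor, as in "llen = min(len, md->len)". */
static uint32_t clamp_transfer(uint32_t remaining, uint32_t desc_len)
{
    return MIN(remaining, desc_len);
}
```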

> +
> +/*
> + * Virtual SCSI device
> + */
> +
> +/* Random numbers */
> +#define VSCSI_MAX_SECTORS       4096/*1024*//*256*/

Probably best to just remove the commented-out values.
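It may also help to spell out what the value means where it is used. `port_max_txu[0]` is filled with `VSCSI_MAX_SECTORS << 9`, that is, sectors times 512 bytes. Here is a small sketch of that arithmetic. The 512-byte sector size is implied by the shift, and `VSCSI_SECTOR_SHIFT` and `vscsi_max_transfer_bytes()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define VSCSI_MAX_SECTORS   4096
#define VSCSI_SECTOR_SHIFT  9            /* hypothetical: 1 << 9 == 512 bytes */

/* Maximum transfer per request in bytes, as advertised in port_max_txu[0]
 * (before the cpu_to_be32() conversion): 4096 * 512 = 2 MiB. */
static uint32_t vscsi_max_transfer_bytes(void)
{
    return (uint32_t)VSCSI_MAX_SECTORS << VSCSI_SECTOR_SHIFT;
}
```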

> +#define VSCSI_REQ_LIMIT         24
> +
> +#define SCSI_SENSE_BUF_SIZE     96
> +#define SRP_RSP_SENSE_DATA_LEN  18
> +
> +typedef union vscsi_crq {
> +    struct viosrp_crq s;
> +    uint8_t raw[16];
> +} vscsi_crq;
> +
> +typedef struct vscsi_req
> +{
> +    vscsi_crq               crq;
> +    union viosrp_iu         iu;
> +
> +    /* SCSI request tracking */
> +    SCSIDevice              *sdev;
> +    uint32_t                qtag; /* qemu tag != srp tag */
> +    int                     lun;
> +    int                     active;
> +    long                    data_len;
> +    int                     writing;
> +    int                     sensing;
> +    int                     senselen;
> +    uint8_t                 sense[SCSI_SENSE_BUF_SIZE];
> +
> +    /* RDMA related bits */
> +    uint8_t                 dma_fmt;
> +    struct srp_direct_buf   ext_desc;
> +    struct srp_direct_buf   *cur_desc;
> +    struct srp_indirect_buf *ind_desc;
> +    int                     local_desc;
> +    int                     total_desc;
> +
> +} vscsi_req;
> +
> +
> +typedef struct {
> +    VIOsPAPRDevice vdev;
> +    SCSIBus bus;
> +    vscsi_req reqs[VSCSI_REQ_LIMIT];
> +} VSCSIState;
> +
> +/* XXX Debug only */
> +static VSCSIState *dbg_vscsi_state;
> +
> +
> +static struct vscsi_req *vscsi_get_req(VSCSIState *s)
> +{
> +    vscsi_req *req;
> +    int i;
> +
> +    for (i = 0; i < VSCSI_REQ_LIMIT; i++) {
> +        req = &s->reqs[i];
> +        if (!req->active) {
> +            memset(req, 0, sizeof(*req));
> +            req->qtag = i;
> +            req->active = 1;
> +            return req;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static void vscsi_put_req(VSCSIState *s, vscsi_req *req)
> +{
> +    req->active = 0;
> +}
> +
> +static vscsi_req *vscsi_find_req(VSCSIState *s, uint32_t tag)
> +{
> +    if (tag >= VSCSI_REQ_LIMIT || !s->reqs[tag].active) {
> +        return NULL;
> +    }
> +    return &s->reqs[tag];
> +}
> +
> +static void vscsi_decode_id_lun(uint64_t srp_lun, int *id, int *lun)
> +{
> +    /* XXX Figure that one out properly ! This is crackpot */
> +    *id = (srp_lun >> 56) & 0x7f;
> +    *lun = (srp_lun >> 48) & 0xff;
> +}
> +
> +static int vscsi_send_iu(VSCSIState *s, vscsi_req *req,
> +                         uint64_t length, uint8_t format)
> +{
> +    long rc, rc1;
> +
> +    /* First copy the SRP */
> +    rc = spapr_tce_dma_write(&s->vdev, req->crq.s.IU_data_ptr,
> +                             &req->iu, length);
> +    if (rc) {
> +        fprintf(stderr, "vscsi_send_iu: DMA write failure !\n");
> +    }
> +
> +    req->crq.s.valid = 0x80;
> +    req->crq.s.format = format;
> +    req->crq.s.reserved = 0x00;
> +    req->crq.s.timeout = cpu_to_be16(0x0000);
> +    req->crq.s.IU_length = cpu_to_be16(length);
> +    req->crq.s.IU_data_ptr = req->iu.srp.rsp.tag; /* right byte order */
> +
> +    if (rc == 0) {
> +        req->crq.s.status = 0x99; /* Just needs to be non-zero */
> +    } else {
> +        req->crq.s.status = 0x00;
> +    }
> +
> +    rc1 = spapr_vio_send_crq(&s->vdev, req->crq.raw);
> +    if (rc1) {
> +        fprintf(stderr, "vscsi_send_iu: Error sending response\n");
> +        return rc1;
> +    }
> +
> +    return rc;
> +}
> +
> +static void vscsi_makeup_sense(VSCSIState *s, vscsi_req *req,
> +                               uint8_t key, uint8_t asc, uint8_t ascq)
> +{
> +    req->senselen = SRP_RSP_SENSE_DATA_LEN;
> +
> +    /* Valid bit and 'current errors' */
> +    req->sense[0] = (0x1 << 7 | 0x70);
> +    /* Sense key */
> +    req->sense[2] = key;
> +    /* Additional sense length */
> +    req->sense[7] = 0xa; /* 10 bytes */
> +    /* Additional sense code */
> +    req->sense[12] = asc;
> +    req->sense[13] = ascq;
> +}
> +
> +static int vscsi_send_rsp(VSCSIState *s, vscsi_req *req,
> +                          uint8_t status, int32_t res_in, int32_t res_out)
> +{
> +   union viosrp_iu *iu = &req->iu;
> +   uint64_t tag = iu->srp.rsp.tag;
> +   int total_len = sizeof(iu->srp.rsp);
> +
> +   dprintf("VSCSI: Sending resp status: 0x%x, "
> +           "res_in: %d, res_out: %d \n", status, res_in, res_out);
> +
> +   memset(iu, 0, sizeof(struct srp_rsp));
> +   iu->srp.rsp.opcode = SRP_RSP;
> +   iu->srp.rsp.req_lim_delta = cpu_to_be32(1);
> +   iu->srp.rsp.tag = tag;
> +
> +   /* Handle residuals */
> +   if (res_in < 0) {
> +       iu->srp.rsp.flags |= SRP_RSP_FLAG_DIUNDER;
> +       res_in = -res_in;
> +   } else if (res_in) {
> +       iu->srp.rsp.flags |= SRP_RSP_FLAG_DIOVER;
> +   }
> +   if (res_out < 0) {
> +       iu->srp.rsp.flags |= SRP_RSP_FLAG_DOUNDER;
> +       res_out = -res_out;
> +   } else if (res_out) {
> +       iu->srp.rsp.flags |= SRP_RSP_FLAG_DOOVER;
> +   }
> +   iu->srp.rsp.data_in_res_cnt = cpu_to_be32(res_in);
> +   iu->srp.rsp.data_out_res_cnt = cpu_to_be32(res_out);
> +
> +   /* We don't do response data */
> +   /* iu->srp.rsp.flags &= ~SRP_RSP_FLAG_RSPVALID; */
> +   iu->srp.rsp.resp_data_len = cpu_to_be32(0);
> +
> +   /* Handle success vs. failure */
> +   iu->srp.rsp.status = status;
> +   if (status) {
> +       iu->srp.rsp.sol_not = (iu->srp.cmd.sol_not & 0x04) >> 2;
> +       if (req->senselen) {
> +           req->iu.srp.rsp.flags |= SRP_RSP_FLAG_SNSVALID;
> +           req->iu.srp.rsp.sense_data_len = cpu_to_be32(req->senselen);
> +           memcpy(req->iu.srp.rsp.data, req->sense, req->senselen);
> +           total_len += req->senselen;
> +       }
> +   } else {
> +       iu->srp.rsp.sol_not = (iu->srp.cmd.sol_not & 0x02) >> 1;
> +   }
> +
> +   vscsi_send_iu(s, req, total_len, VIOSRP_SRP_FORMAT);
> +   return 0;
> +}
> +
> +static inline void vscsi_swap_desc(struct srp_direct_buf *desc)
> +{
> +    desc->va = be64_to_cpu(desc->va);
> +    desc->len = be32_to_cpu(desc->len);
> +}
> +
> +static int vscsi_srp_direct_data(VSCSIState *s, vscsi_req *req,
> +                                 uint8_t *buf, uint32_t len)
> +{
> +    struct srp_direct_buf *md = req->cur_desc;
> +    uint32_t llen;
> +    int rc;
> +
> +    dprintf("VSCSI: direct segment 0x%x bytes, va=0x%llx desc len=0x%x\n",
> +            len, (unsigned long long)md->va, md->len);
> +
> +    llen = min(len, md->len);
> +    if (llen) {
> +        if (req->writing) { /* writing = to device = reading from memory */
> +            rc = spapr_tce_dma_read(&s->vdev, md->va, buf, llen);
> +        } else {
> +            rc = spapr_tce_dma_write(&s->vdev, md->va, buf, llen);
> +        }
> +    }
> +    md->len -= llen;
> +    md->va += llen;
> +
> +    if (rc) {
> +        return -1;
> +    }
> +    return llen;
> +}
> +
> +static int vscsi_srp_indirect_data(VSCSIState *s, vscsi_req *req,
> +                                   uint8_t *buf, uint32_t len)
> +{
> +    struct srp_direct_buf *td = &req->ind_desc->table_desc;
> +    struct srp_direct_buf *md = req->cur_desc;
> +    int rc = 0;
> +    uint32_t llen, total = 0;
> +
> +    dprintf("VSCSI: indirect segment 0x%x bytes, td va=0x%llx len=0x%x\n",
> +            len, (unsigned long long)td->va, td->len);
> +
> +    /* While we have data ... */
> +    while(len) {
> +        /* If we have a descriptor but it's empty, go fetch a new one */
> +        if (md && md->len == 0) {
> +            /* More local available, use one */
> +            if (req->local_desc) {
> +                md = ++req->cur_desc;
> +                --req->local_desc;
> +                --req->total_desc;
> +                td->va += sizeof(struct srp_direct_buf);
> +            } else {
> +                md = req->cur_desc = NULL;
> +            }
> +        }
> +        /* No descriptor at hand, fetch one */
> +        if (!md) {
> +            if (!req->total_desc) {
> +                dprintf("VSCSI:   Out of descriptors !\n");
> +                break;
> +            }
> +            md = req->cur_desc = &req->ext_desc;
> +            dprintf("VSCSI:   Reading desc from 0x%llx\n", (unsigned long long)td->va);
> +            rc = spapr_tce_dma_read(&s->vdev, td->va, md, sizeof(struct srp_direct_buf));
> +            if (rc) {
> +                dprintf("VSCSI: tce_dma_read -> %d reading ext_desc\n", rc);
> +                break;
> +            }
> +            vscsi_swap_desc(md);
> +            td->va += sizeof(struct srp_direct_buf);
> +            --req->total_desc;
> +        }
> +        dprintf("VSCSI:   [desc va=0x%llx,len=0x%x] remaining=0x%x\n",
> +                (unsigned long long)md->va, md->len, len);
> +
> +        /* Perform transfer */
> +        llen = min(len, md->len);
> +        if (req->writing) { /* writing = to device = reading from memory */
> +            rc = spapr_tce_dma_read(&s->vdev, md->va, buf, llen);
> +

spurious line
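When reviewing the loop above, a self-contained model of its core may help. A transfer is split across a list of (address, length) descriptors. Each step moves `min(remaining, descriptor length)` bytes, and the walk advances to the next descriptor once the current one is exhausted. This is a simplified sketch with three assumptions. Guest memory is a plain byte array instead of TCE-mapped DMA. Only the copy direction into guest memory is modeled. All descriptors are local, with no table fetch from `td->va`. `struct seg` and `sg_copy_out()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct seg {            /* hypothetical stand-in for struct srp_direct_buf */
    uint64_t va;        /* offset into the modeled guest memory */
    uint32_t len;       /* bytes left in this segment */
};

/* Copy up to len bytes from buf into guest memory, following the segment
 * list. Returns the number of bytes copied, which is less than len when the
 * descriptors run out (the "Out of descriptors" case in the patch). */
static uint32_t sg_copy_out(uint8_t *guest, struct seg *segs, int nsegs,
                            const uint8_t *buf, uint32_t len)
{
    uint32_t total = 0;
    int i = 0;

    while (len && i < nsegs) {
        struct seg *md = &segs[i];
        uint32_t llen = len < md->len ? len : md->len;

        memcpy(guest + md->va, buf, llen);
        buf += llen;
        len -= llen;
        total += llen;
        md->va += llen;     /* consume the segment, as the patch does */
        md->len -= llen;
        if (md->len == 0) {
            i++;            /* exhausted: move on to the next descriptor */
        }
    }
    return total;
}
```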

> +        } else {
> +            rc = spapr_tce_dma_write(&s->vdev, md->va, buf, llen);
> +        }
> +        if (rc) {
> +            dprintf("VSCSI: tce_dma_r/w(%d) -> %d\n", req->writing, rc);
> +            break;
> +        }
> +        dprintf("VSCSI:     data: %02x %02x %02x %02x...\n",
> +                buf[0], buf[1], buf[2], buf[3]);
> +
> +        len -= llen;
> +        buf += llen;
> +        total += llen;
> +        md->va += llen;
> +        md->len -= llen;
> +    }
> +    return rc ? -1 : total;
> +}
> +
> +static int vscsi_srp_transfer_data(VSCSIState *s, vscsi_req *req,
> +                                   int writing, uint8_t *buf, uint32_t len)
> +{
> +    int err = 0;
> +
> +    switch (req->dma_fmt) {
> +    case SRP_NO_DATA_DESC:
> +        dprintf("VSCSI: no data desc transfer, skipping 0x%x bytes\n", len);
> +        break;
> +    case SRP_DATA_DESC_DIRECT:
> +        err = vscsi_srp_direct_data(s, req, buf, len);
> +        break;
> +    case SRP_DATA_DESC_INDIRECT:
> +        err = vscsi_srp_indirect_data(s, req, buf, len);
> +        break;
> +    }
> +    return err;
> +}
> +
> +/* Bits from linux srp */
> +static int data_out_desc_size(struct srp_cmd *cmd)
> +{
> +    int size = 0;
> +    uint8_t fmt = cmd->buf_fmt >> 4;
> +
> +    switch (fmt) {
> +    case SRP_NO_DATA_DESC:
> +        break;
> +    case SRP_DATA_DESC_DIRECT:
> +        size = sizeof(struct srp_direct_buf);
> +        break;
> +    case SRP_DATA_DESC_INDIRECT:
> +        size = sizeof(struct srp_indirect_buf) +
> +            sizeof(struct srp_direct_buf) * cmd->data_out_desc_cnt;
> +        break;
> +    default:
> +        break;
> +    }
> +    return size;
> +}
> +
> +static int vscsi_preprocess_desc(vscsi_req *req)
> +{
> +    struct srp_cmd *cmd = &req->iu.srp.cmd;
> +    int offset, i;
> +
> +    offset = cmd->add_cdb_len & ~3;
> +
> +    if (req->writing) {
> +        req->dma_fmt = cmd->buf_fmt >> 4;
> +    } else {
> +        offset += data_out_desc_size(cmd);
> +        req->dma_fmt = cmd->buf_fmt & ((1U << 4) - 1);
> +    }
> +
> +    switch (req->dma_fmt) {
> +    case SRP_NO_DATA_DESC:
> +        break;
> +    case SRP_DATA_DESC_DIRECT:
> +        req->cur_desc = (struct srp_direct_buf *)(cmd->add_data + offset);
> +        req->total_desc = req->local_desc = 1;
> +        vscsi_swap_desc(req->cur_desc);
> +        dprintf("VSCSI: using direct RDMA %s, 0x%x bytes MD: 0x%llx\n",
> +                req->writing ? "write" : "read",
> +                req->cur_desc->len, (unsigned long long)req->cur_desc->va);
> +        break;
> +    case SRP_DATA_DESC_INDIRECT:
> +        req->ind_desc = (struct srp_indirect_buf *)(cmd->add_data + offset);
> +        vscsi_swap_desc(&req->ind_desc->table_desc);
> +        req->total_desc = req->ind_desc->table_desc.len / sizeof(struct srp_direct_buf);
> +        req->local_desc = req->writing ? cmd->data_out_desc_cnt :
> +            cmd->data_in_desc_cnt;
> +        for (i = 0; i < req->local_desc; i++)

Braces
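With braces, this becomes a plain `for` loop over the local descriptors. The sketch below is a self-contained illustration of what each iteration does, namely an in-place big-endian conversion of `va` and `len`. It assumes the `va`/`key`/`len` layout of `struct srp_direct_buf` from Linux's `<scsi/srp.h>`. `struct direct_buf`, `load_be64()`, `load_be32()`, `swap_desc()` and `swap_desc_list()` are hypothetical stand-ins for the patch's types and for QEMU's `be64_to_cpu()`/`be32_to_cpu()`. Decoding byte by byte gives the same result on big- and little-endian hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct direct_buf {     /* assumed to mirror struct srp_direct_buf */
    uint64_t va;
    uint32_t key;
    uint32_t len;
};

/* Interpret the in-memory bytes of a field as a big-endian value. */
static uint64_t load_be64(const void *p)
{
    const uint8_t *b = p;
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++) {
        v = (v << 8) | b[i];
    }
    return v;
}

static uint32_t load_be32(const void *p)
{
    const uint8_t *b = p;

    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* In-place conversion of va and len, like vscsi_swap_desc(); key is left
 * untouched, as in the patch. */
static void swap_desc(struct direct_buf *desc)
{
    desc->va = load_be64(&desc->va);
    desc->len = load_be32(&desc->len);
}

static void swap_desc_list(struct direct_buf *list, int count)
{
    int i;

    for (i = 0; i < count; i++) {
        swap_desc(&list[i]);
    }
}
```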

> +            vscsi_swap_desc(&req->ind_desc->desc_list[i]);
> +        req->cur_desc = req->local_desc ? &req->ind_desc->desc_list[0] : NULL;
> +        dprintf("VSCSI: using indirect RDMA %s, 0x%x bytes %d descs (%d local) VA: 0x%llx\n",
> +                req->writing ? "read" : "write", be32_to_cpu(req->ind_desc->len),
> +                req->total_desc, req->local_desc,
> +                (unsigned long long)req->ind_desc->table_desc.va);
> +        break;
> +    default:
> +        fprintf(stderr,
> +                "vscsi_preprocess_desc: Unknown format %x\n", req->dma_fmt);
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static void vscsi_send_request_sense(VSCSIState *s, vscsi_req *req)
> +{
> +    SCSIDevice *sdev = req->sdev;
> +    uint8_t *cdb = req->iu.srp.cmd.cdb;
> +    int n;
> +
> +    cdb[0] = 3;
> +    cdb[1] = 0;
> +    cdb[2] = 0;
> +    cdb[3] = 0;
> +    cdb[4] = 96;
> +    cdb[5] = 0;
> +    req->sensing = 1;
> +    n = sdev->info->send_command(sdev, req->qtag, cdb, req->lun);
> +    dprintf("VSCSI: Queued request sense tag 0x%x \n", req->qtag);
> +    if (n < 0) {
> +        fprintf(stderr, "VSCSI: REQUEST_SENSE wants write data !?!?!?\n");
> +        sdev->info->cancel_io(sdev, req->qtag);
> +        vscsi_makeup_sense(s, req, HARDWARE_ERROR, 0, 0);
> +        vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
> +        vscsi_put_req(s, req);
> +        return;
> +    } else if (n == 0) {
> +        return;
> +    }
> +    sdev->info->read_data(sdev, req->qtag);
> +}
> +
> +/* Callback to indicate that the SCSI layer has completed a transfer.  */
> +static void vscsi_command_complete(SCSIBus *bus, int reason, uint32_t tag,
> +                                   uint32_t arg)
> +{
> +    VSCSIState *s = DO_UPCAST(VSCSIState, vdev.qdev, bus->qbus.parent);
> +    vscsi_req *req = vscsi_find_req(s, tag);
> +    SCSIDevice *sdev;
> +    uint8_t *buf;
> +    int32_t res_in = 0, res_out = 0;
> +    int len, rc = 0;
> +
> +    dprintf("VSCSI: SCSI cmd complete, r=0x%x tag=0x%x arg=0x%x, req=%p\n",
> +            reason, tag, arg, req);
> +    if (req == NULL) {
> +        fprintf(stderr, "VSCSI: Can't find request for tag 0x%x\n", tag);
> +        return;
> +    }
> +    sdev = req->sdev;
> +
> +    if (req->sensing) {
> +        if (reason == SCSI_REASON_DONE) {
> +            dprintf("VSCSI: Sense done !\n");
> +            vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
> +            vscsi_put_req(s, req);
> +        } else {
> +            uint8_t *buf = sdev->info->get_buf(sdev, tag);
> +
> +            len = min(arg, SCSI_SENSE_BUF_SIZE);
> +            dprintf("VSCSI: Sense data, %d bytes:\n", len);
> +            dprintf("       %02x  %02x  %02x  %02x  %02x  %02x  %02x  %02x\n",
> +                    buf[0], buf[1], buf[2], buf[3],
> +                    buf[4], buf[5], buf[6], buf[7]);
> +            dprintf("       %02x  %02x  %02x  %02x  %02x  %02x  %02x  %02x\n",
> +                    buf[8], buf[9], buf[10], buf[11],
> +                    buf[12], buf[13], buf[14], buf[15]);
> +            memcpy(req->sense, buf, len);
> +            req->senselen = len;
> +            sdev->info->read_data(sdev, req->qtag);
> +        }
> +        return;
> +    }
> +
> +    if (reason == SCSI_REASON_DONE) {
> +        dprintf("VSCSI: Command complete err=%d\n", arg);
> +        if (arg == 0) {
> +            /* We handle overflows, not underflows for normal commands,
> +             * but hopefully nobody cares
> +             */
> +            if (req->writing)

Braces
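With braces, that branch would read `if (req->writing) { res_out = req->data_len; } else { res_in = req->data_len; }`. The sign convention feeding `vscsi_send_rsp()` is easy to misread, so here is a minimal model of how that function turns signed residuals into SRP response flags. A negative residual means underflow and a positive one means overflow, and the count sent is the absolute value. The flag values are assumed to match the `SRP_RSP_FLAG_*` values in Linux's `<scsi/srp.h>`. `struct rsp_resid` and `encode_residuals()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the SRP_RSP_FLAG_* values in Linux's <scsi/srp.h>. */
enum {
    RSP_FLAG_DOOVER  = 1 << 2,
    RSP_FLAG_DOUNDER = 1 << 3,
    RSP_FLAG_DIOVER  = 1 << 4,
    RSP_FLAG_DIUNDER = 1 << 5,
};

struct rsp_resid {          /* hypothetical, host byte order */
    uint8_t flags;
    uint32_t data_in_res_cnt;
    uint32_t data_out_res_cnt;
};

/* Mirrors the residual handling in vscsi_send_rsp(): a negative residual
 * sets the UNDER flag, a positive one the OVER flag, zero sets nothing. */
static struct rsp_resid encode_residuals(int32_t res_in, int32_t res_out)
{
    struct rsp_resid r = { 0, 0, 0 };

    if (res_in < 0) {
        r.flags |= RSP_FLAG_DIUNDER;
        res_in = -res_in;
    } else if (res_in) {
        r.flags |= RSP_FLAG_DIOVER;
    }
    if (res_out < 0) {
        r.flags |= RSP_FLAG_DOUNDER;
        res_out = -res_out;
    } else if (res_out) {
        r.flags |= RSP_FLAG_DOOVER;
    }
    r.data_in_res_cnt = (uint32_t)res_in;
    r.data_out_res_cnt = (uint32_t)res_out;
    return r;
}
```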

> +                res_out = req->data_len;
> +            else
> +                res_in = req->data_len;
> +            vscsi_send_rsp(s, req, 0, res_in, res_out);
> +        } else if (arg == CHECK_CONDITION) {
> +            dprintf("VSCSI: Got CHECK_CONDITION, requesting sense...\n");
> +            vscsi_send_request_sense(s, req);
> +            return;
> +        } else {
> +            vscsi_send_rsp(s, req, arg, 0, 0);
> +        }
> +        vscsi_put_req(s, req);
> +        return;
> +    }
> +
> +    /* "arg" is how much we have read for reads and how much we want
> +     * to write for writes (ie, how much is to be DMA'd)
> +     */
> +    if (arg) {
> +        buf = sdev->info->get_buf(sdev, tag);
> +        rc = vscsi_srp_transfer_data(s, req, req->writing, buf, arg);
> +    }
> +    if (rc < 0) {
> +        fprintf(stderr, "VSCSI: RDMA error rc=%d!\n", rc);
> +        sdev->info->cancel_io(sdev, req->qtag);
> +        vscsi_makeup_sense(s, req, HARDWARE_ERROR, 0, 0);
> +        vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
> +        vscsi_put_req(s, req);
> +        return;
> +    }
> +
> +    /* Start next chunk */
> +    req->data_len -= rc;
> +    if (req->writing) {
> +        sdev->info->write_data(sdev, req->qtag);
> +    } else {
> +        sdev->info->read_data(sdev, req->qtag);
> +    }
> +}
> +
> +static void vscsi_process_login(VSCSIState *s, vscsi_req *req)
> +{
> +    union viosrp_iu *iu = &req->iu;
> +    struct srp_login_rsp *rsp = &iu->srp.login_rsp;
> +    uint64_t tag = iu->srp.rsp.tag;
> +
> +    dprintf("VSCSI: Got login, sending response !\n");
> +
> +    /* TODO handle case that requested size is wrong and
> +     * buffer format is wrong
> +     */
> +    memset(iu, 0, sizeof(struct srp_login_rsp));
> +    rsp->opcode = SRP_LOGIN_RSP;
> +    /* Don't advertise quite as many request as we support to
> +     * keep room for management stuff etc...
> +     */
> +    rsp->req_lim_delta = cpu_to_be32(VSCSI_REQ_LIMIT-2);
> +    rsp->tag = tag;
> +    rsp->max_it_iu_len = cpu_to_be32(sizeof(union srp_iu));
> +    rsp->max_ti_iu_len = cpu_to_be32(sizeof(union srp_iu));
> +    /* direct and indirect */
> +    rsp->buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT);
> +
> +    vscsi_send_iu(s, req, sizeof(*rsp), VIOSRP_SRP_FORMAT);
> +}
> +
> +static void vscsi_inquiry_no_target(VSCSIState *s, vscsi_req *req)
> +{
> +    uint8_t *cdb = req->iu.srp.cmd.cdb;
> +    uint8_t resp_data[36];
> +    int rc, len, alen;
> +
> +    /* We don't do EVPD. Also check that page_code is 0 */
> +    if ((cdb[1] & 0x01) || (cdb[1] & 0x01) || cdb[2] != 0) {
> +        /* Send INVALID FIELD IN CDB */
> +        vscsi_makeup_sense(s, req, ILLEGAL_REQUEST, 0x24, 0);
> +        vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
> +        return;
> +    }
> +    alen = cdb[3];
> +    alen = (alen << 8) | cdb[4];
> +    len = min(alen, 36);
> +
> +    /* Fake up inquiry using PQ=3 */
> +    memset(resp_data, 0, 36);
> +    resp_data[0] = 0x7f;   /* Not capable of supporting a device here */
> +    resp_data[2] = 0x06;   /* SPC-4 */
> +    resp_data[3] = 0x02;   /* Resp data format */
> +    resp_data[4] = 36 - 5; /* Additional length */
> +    resp_data[7] = 0x10;   /* Sync transfers */
> +    memcpy(&resp_data[16], "QEMU EMPTY      ", 16);
> +    memcpy(&resp_data[8], "QEMU    ", 8);
> +
> +    req->writing = 0;
> +    vscsi_preprocess_desc(req);
> +    rc = vscsi_srp_transfer_data(s, req, 0, resp_data, len);
> +    if (rc < 0) {
> +        vscsi_makeup_sense(s, req, HARDWARE_ERROR, 0, 0);
> +        vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
> +    } else {
> +        vscsi_send_rsp(s, req, 0, 36 - rc, 0);
> +    }
> +}
> +
> +static int vscsi_queue_cmd(VSCSIState *s, vscsi_req *req)
> +{
> +    union srp_iu *srp = &req->iu.srp;
> +    SCSIDevice *sdev;
> +    int n, id, lun;
> +
> +    vscsi_decode_id_lun(be64_to_cpu(srp->cmd.lun), &id, &lun);
> +
> +    /* Qemu vs. linux issue with LUNs to be sorted out ... */
> +    sdev = (id < 8 && lun < 16) ? s->bus.devs[id] : NULL;
> +    if (!sdev) {
> +        dprintf("VSCSI: Command for id %d with no drive\n", id);
> +        if (srp->cmd.cdb[0] == INQUIRY) {
> +            vscsi_inquiry_no_target(s, req);
> +        } else {
> +            vscsi_makeup_sense(s, req, ILLEGAL_REQUEST, 0x24, 0x00);
> +            vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
> +        } return 1;
> +    }
> +
> +    req->sdev = sdev;
> +    req->lun = lun;
> +    n = sdev->info->send_command(sdev, req->qtag, srp->cmd.cdb, lun);
> +
> +    dprintf("VSCSI: Queued command tag 0x%x CMD 0x%x ID %d LUN %d ret: %d\n",
> +            req->qtag, srp->cmd.cdb[0], id, lun, n);
> +
> +    if (n) {
> +        /* Transfer direction must be set before preprocessing the
> +         * descriptors
> +         */
> +        req->writing = (n < 1);
> +
> +        /* Preprocess RDMA descriptors */
> +        vscsi_preprocess_desc(req);
> +    }
> +
> +    /* Get transfer direction and initiate transfer */
> +    if (n > 0) {
> +        req->data_len = n;
> +        sdev->info->read_data(sdev, req->qtag);
> +    } else if (n < 0) {
> +        req->data_len = -n;
> +        sdev->info->write_data(sdev, req->qtag);
> +    }
> +    /* Don't touch req here, it may have been recycled already */
> +
> +    return 0;
> +}
> +
> +static int vscsi_process_tsk_mgmt(VSCSIState *s, vscsi_req *req)
> +{
> +    union viosrp_iu *iu = &req->iu;
> +    int fn;
> +
> +    fprintf(stderr, "vscsi_process_tsk_mgmt %02x\n",
> +            iu->srp.tsk_mgmt.tsk_mgmt_func);
> +
> +    switch (iu->srp.tsk_mgmt.tsk_mgmt_func) {
> +#if 0 /* We really don't deal with these for now */
> +    case SRP_TSK_ABORT_TASK:
> +        fn = ABORT_TASK;
> +        break;
> +    case SRP_TSK_ABORT_TASK_SET:
> +        fn = ABORT_TASK_SET;
> +        break;
> +    case SRP_TSK_CLEAR_TASK_SET:
> +        fn = CLEAR_TASK_SET;
> +        break;
> +    case SRP_TSK_LUN_RESET:
> +        fn = LOGICAL_UNIT_RESET;
> +        break;
> +    case SRP_TSK_CLEAR_ACA:
> +        fn = CLEAR_ACA;
> +        break;
> +#endif
> +    default:
> +        fn = 0;
> +    }
> +    if (fn) {
> +        /* XXX Send/Handle target task management */
> +        ;
> +    } else {
> +        vscsi_makeup_sense(s, req, ILLEGAL_REQUEST, 0x20, 0);
> +        vscsi_send_rsp(s, req, CHECK_CONDITION, 0, 0);
> +    }
> +    return !fn;
> +}
> +
> +static int vscsi_handle_srp_req(VSCSIState *s, vscsi_req *req)
> +{
> +    union srp_iu *srp = &req->iu.srp;
> +    int done = 1;
> +    uint8_t opcode = srp->rsp.opcode;
> +
> +    switch (opcode) {
> +    case SRP_LOGIN_REQ:
> +        vscsi_process_login(s, req);
> +        break;
> +    case SRP_TSK_MGMT:
> +        done = vscsi_process_tsk_mgmt(s, req);
> +        break;
> +    case SRP_CMD:
> +        done = vscsi_queue_cmd(s, req);
> +        break;
> +    case SRP_LOGIN_RSP:
> +    case SRP_I_LOGOUT:
> +    case SRP_T_LOGOUT:
> +    case SRP_RSP:
> +    case SRP_CRED_REQ:
> +    case SRP_CRED_RSP:
> +    case SRP_AER_REQ:
> +    case SRP_AER_RSP:
> +        fprintf(stderr, "VSCSI: Unsupported opcode %02x\n", opcode);
> +        break;
> +    default:
> +        fprintf(stderr, "VSCSI: Unknown type %02x\n", opcode);
> +    }
> +
> +    return done;
> +}
> +
> +static int vscsi_send_adapter_info(VSCSIState *s, vscsi_req *req)
> +{
> +    struct viosrp_adapter_info *sinfo;
> +    struct mad_adapter_info_data info;
> +    int rc;
> +
> +    sinfo = &req->iu.mad.adapter_info;
> +
> +#if 0 /* What for ? */
> +    rc = spapr_tce_dma_read(&s->vdev, be64_to_cpu(sinfo->buffer),
> +                            &info, be16_to_cpu(sinfo->common.length));
> +    if (rc) {
> +        fprintf(stderr, "vscsi_send_adapter_info: DMA read failure !\n");
> +    }
> +#endif
> +    memset(&info, 0, sizeof(info));
> +    strcpy(info.srp_version, SRP_VERSION);
> +    strncpy(info.partition_name, "qemu", sizeof("qemu"));
> +    info.partition_number = cpu_to_be32(0);
> +    info.mad_version = cpu_to_be32(1);
> +    info.os_type = cpu_to_be32(2);
> +    info.port_max_txu[0] = cpu_to_be32(VSCSI_MAX_SECTORS << 9);
> +
> +    rc = spapr_tce_dma_write(&s->vdev, be64_to_cpu(sinfo->buffer),
> +                             &info, be16_to_cpu(sinfo->common.length));
> +    if (rc)  {
> +        fprintf(stderr, "vscsi_send_adapter_info: DMA write failure !\n");
> +    }
> +
> +    sinfo->common.status = rc ? cpu_to_be32(1) : 0;
> +
> +    return vscsi_send_iu(s, req, sizeof(*sinfo), VIOSRP_MAD_FORMAT);
> +}
> +
> +static int vscsi_handle_mad_req(VSCSIState *s, vscsi_req *req)
> +{
> +    union mad_iu *mad = &req->iu.mad;
> +
> +    switch (be32_to_cpu(mad->empty_iu.common.type)) {
> +    case VIOSRP_EMPTY_IU_TYPE:
> +        fprintf(stderr, "Unsupported EMPTY MAD IU\n");
> +        break;
> +    case VIOSRP_ERROR_LOG_TYPE:
> +        fprintf(stderr, "Unsupported ERROR LOG MAD IU\n");
> +        mad->error_log.common.status = cpu_to_be16(1);
> +        vscsi_send_iu(s, req, sizeof(mad->error_log), VIOSRP_MAD_FORMAT);
> +        break;
> +    case VIOSRP_ADAPTER_INFO_TYPE:
> +        vscsi_send_adapter_info(s, req);
> +        break;
> +    case VIOSRP_HOST_CONFIG_TYPE:
> +        mad->host_config.common.status = cpu_to_be16(1);
> +        vscsi_send_iu(s, req, sizeof(mad->host_config), VIOSRP_MAD_FORMAT);
> +        break;
> +    default:
> +        fprintf(stderr, "VSCSI: Unknown MAD type %02x\n",
> +                be32_to_cpu(mad->empty_iu.common.type));
> +    }
> +
> +    return 1;
> +}
> +
> +static void vscsi_got_payload(VSCSIState *s, vscsi_crq *crq)
> +{
> +    vscsi_req *req;
> +    int done;
> +
> +    req = vscsi_get_req(s);
> +    if (req == NULL) {
> +        fprintf(stderr, "VSCSI: Failed to get a request !\n");
> +        return;
> +    }
> +
> +    /* We only support a limited number of descriptors. We know
> +     * the ibmvscsi driver uses up to 10 max, so it should fit
> +     * in our 256-byte IUs. If not, we'll have to increase the size
> +     * of the structure.
> +     */
> +    if (crq->s.IU_length > sizeof(union viosrp_iu)) {
> +        fprintf(stderr, "VSCSI: SRP IU too long (%d bytes) !\n",
> +                crq->s.IU_length);
> +        vscsi_put_req(s, req);
> +        return;
> +    }
> +
> +    /* XXX Handle failure differently ? */
> +    if (spapr_tce_dma_read(&s->vdev, crq->s.IU_data_ptr, &req->iu,
> +                           crq->s.IU_length)) {
> +        fprintf(stderr, "vscsi_got_payload: DMA read failure !\n");
> +        vscsi_put_req(s, req);
> +        return;
> +    }
> +    memcpy(&req->crq, crq, sizeof(vscsi_crq));
> +
> +    if (crq->s.format == VIOSRP_MAD_FORMAT) {
> +        done = vscsi_handle_mad_req(s, req);
> +    } else {
> +        done = vscsi_handle_srp_req(s, req);
> +    }
> +
> +    if (done) {
> +        vscsi_put_req(s, req);
> +    }
> +}
> +
> +
> +static int vscsi_do_crq(struct VIOsPAPRDevice *dev, uint8_t *crq_data)
> +{
> +    VSCSIState *s = DO_UPCAST(VSCSIState, vdev, dev);
> +    vscsi_crq crq;
> +
> +    memcpy(crq.raw, crq_data, 16);
> +    crq.s.timeout = be16_to_cpu(crq.s.timeout);
> +    crq.s.IU_length = be16_to_cpu(crq.s.IU_length);
> +    crq.s.IU_data_ptr = be64_to_cpu(crq.s.IU_data_ptr);
> +
> +    dprintf("VSCSI: do_crq %02x %02x ...\n", crq.raw[0], crq.raw[1]);
> +
> +    switch (crq.s.valid) {
> +    case 0xc0: /* Init command/response */
> +
> +        /* Respond to initialization request */
> +        if (crq.s.format == 0x01) {
> +            memset(crq.raw, 0, 16);
> +            crq.s.valid = 0xc0;
> +            crq.s.format = 0x02;
> +            spapr_vio_send_crq(dev, crq.raw);
> +        }
> +
> +        /* Note that in hotplug cases, we might get a 0x02
> +         * as a result of us emitting the init request
> +         */
> +
> +        break;
> +    case 0xff: /* Link event */
> +
> +        /* Not handled for now */
> +
> +        break;
> +    case 0x80: /* Payloads */
> +        switch (crq.s.format) {
> +        case VIOSRP_SRP_FORMAT: /* AKA VSCSI request */
> +        case VIOSRP_MAD_FORMAT: /* AKA VSCSI response */
> +            vscsi_got_payload(s, &crq);
> +            break;
> +        case VIOSRP_OS400_FORMAT:
> +        case VIOSRP_AIX_FORMAT:
> +        case VIOSRP_LINUX_FORMAT:
> +        case VIOSRP_INLINE_FORMAT:
> +            fprintf(stderr, "vscsi_do_crq: Unsupported payload format %02x\n",
> +                    crq.s.format);
> +            break;
> +        default:
> +            fprintf(stderr, "vscsi_do_crq: Unknown payload format %02x\n",
> +                    crq.s.format);
> +        }
> +        break;
> +    default:
> +        fprintf(stderr, "vscsi_do_crq: unknown CRQ %02x %02x ...\n",
> +                crq.raw[0], crq.raw[1]);
> +    }
> +
> +    return 0;
> +}
> +
> +static int spapr_vscsi_init(VIOsPAPRDevice *dev)
> +{
> +    VSCSIState *s = DO_UPCAST(VSCSIState, vdev, dev);
> +    int i;
> +
> +    dbg_vscsi_state = s;
> +
> +    /* Initialize qemu request tags */
> +    memset(s->reqs, 0, sizeof(s->reqs));
> +    for (i = 0; i < VSCSI_REQ_LIMIT; i++)

Braces

> +        s->reqs[i].qtag = i;
> +
> +    dev->crq.SendFunc = vscsi_do_crq;
> +
> +    scsi_bus_new(&s->bus, &dev->qdev, 1, VSCSI_REQ_LIMIT,
> +                 vscsi_command_complete);
> +    if (!dev->qdev.hotplugged) {
> +        scsi_bus_legacy_handle_cmdline(&s->bus);
> +    }
> +
> +    return 0;
> +}
> +
> +void spapr_vscsi_create(VIOsPAPRBus *bus, uint32_t reg,
> +                        qemu_irq qirq, uint32_t vio_irq_num)
> +{
> +    DeviceState *dev;
> +    VIOsPAPRDevice *sdev;
> +
> +    dev = qdev_create(&bus->bus, "spapr-vscsi");
> +    qdev_prop_set_uint32(dev, "reg", reg);
> +
> +    qdev_init_nofail(dev);
> +
> +    sdev = (VIOsPAPRDevice *)dev;
> +    sdev->qirq = qirq;
> +    sdev->vio_irq_num = vio_irq_num;
> +}
> +
> +static int spapr_vscsi_devnode(VIOsPAPRDevice *dev, void *fdt, int node_off)
> +{
> +    int ret;
> +
> +    ret = fdt_setprop_cell(fdt, node_off, "#address-cells", 2);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = fdt_setprop_cell(fdt, node_off, "#size-cells", 0);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    return 0;
> +}
> +
> +static VIOsPAPRDeviceInfo spapr_vscsi = {
> +    .init = spapr_vscsi_init,
> +    .devnode = spapr_vscsi_devnode,
> +    .dt_name = "v-scsi",
> +    .dt_type = "vscsi",
> +    .dt_compatible = "IBM,v-scsi",
> +    .signal_mask = 0x00000001,
> +    .qdev.name = "spapr-vscsi",
> +    .qdev.size = sizeof(VSCSIState),
> +    .qdev.props = (Property[]) {
> +        DEFINE_PROP_UINT32("reg", VIOsPAPRDevice, reg, 0x2000),
> +        DEFINE_PROP_UINT32("dma-window", VIOsPAPRDevice,
> +                           rtce_window_size, 0x10000000),
> +        DEFINE_PROP_END_OF_LIST(),
> +    },
> +};
> +
> +static void spapr_vscsi_register(void)
> +{
> +    spapr_vio_bus_register_withprop(&spapr_vscsi);
> +}
> +device_init(spapr_vscsi_register);
> diff --git a/hw/srp.h b/hw/srp.h
> new file mode 100644
> index 0000000..9d55fc4
> --- /dev/null
> +++ b/hw/srp.h
> @@ -0,0 +1,241 @@
> +/*
> + * Copyright (c) 2005 Cisco Systems.  All rights reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + *
> + * $Id$
> + */
> +
> +#ifndef SCSI_SRP_H
> +#define SCSI_SRP_H
> +
> +/*
> + * Structures and constants for the SCSI RDMA Protocol (SRP) as
> + * defined by the INCITS T10 committee.  This file was written using
> + * draft Revision 16a of the SRP standard.
> + */
> +
> +enum {
> +
> +    SRP_LOGIN_REQ = 0x00,
> +    SRP_TSK_MGMT  = 0x01,
> +    SRP_CMD       = 0x02,
> +    SRP_I_LOGOUT  = 0x03,
> +    SRP_LOGIN_RSP = 0xc0,
> +    SRP_RSP       = 0xc1,
> +    SRP_LOGIN_REJ = 0xc2,
> +    SRP_T_LOGOUT  = 0x80,
> +    SRP_CRED_REQ  = 0x81,
> +    SRP_AER_REQ   = 0x82,
> +    SRP_CRED_RSP  = 0x41,
> +    SRP_AER_RSP   = 0x42
> +};
> +
> +enum {
> +    SRP_BUF_FORMAT_DIRECT   = 1 << 1,
> +    SRP_BUF_FORMAT_INDIRECT = 1 << 2
> +};
> +
> +enum {
> +    SRP_NO_DATA_DESC       = 0,
> +    SRP_DATA_DESC_DIRECT   = 1,
> +    SRP_DATA_DESC_INDIRECT = 2
> +};
> +
> +enum {
> +    SRP_TSK_ABORT_TASK     = 0x01,
> +    SRP_TSK_ABORT_TASK_SET = 0x02,
> +    SRP_TSK_CLEAR_TASK_SET = 0x04,
> +    SRP_TSK_LUN_RESET      = 0x08,
> +    SRP_TSK_CLEAR_ACA      = 0x40
> +};
> +
> +enum srp_login_rej_reason {
> +    SRP_LOGIN_REJ_UNABLE_ESTABLISH_CHANNEL   = 0x00010000,
> +    SRP_LOGIN_REJ_INSUFFICIENT_RESOURCES     = 0x00010001,
> +    SRP_LOGIN_REJ_REQ_IT_IU_LENGTH_TOO_LARGE = 0x00010002,
> +    SRP_LOGIN_REJ_UNABLE_ASSOCIATE_CHANNEL   = 0x00010003,
> +    SRP_LOGIN_REJ_UNSUPPORTED_DESCRIPTOR_FMT = 0x00010004,
> +    SRP_LOGIN_REJ_MULTI_CHANNEL_UNSUPPORTED  = 0x00010005,
> +    SRP_LOGIN_REJ_CHANNEL_LIMIT_REACHED      = 0x00010006
> +};
> +
> +enum {
> +    SRP_REV10_IB_IO_CLASS  = 0xff00,
> +    SRP_REV16A_IB_IO_CLASS = 0x0100
> +};
> +
> +struct srp_direct_buf {
> +    uint64_t    va;
> +    uint32_t    key;
> +    uint32_t    len;
> +};
> +
> +/*
> + * We need the packed attribute because the SRP spec puts the list of
> + * descriptors at an offset of 20, which is not aligned to the size of
> + * struct srp_direct_buf.  The whole structure must be packed to avoid
> + * having the 20-byte structure padded to 24 bytes on 64-bit architectures.
> + */
> +struct srp_indirect_buf {
> +    struct srp_direct_buf    table_desc;
> +    uint32_t                 len;
> +    struct srp_direct_buf    desc_list[0];
> +} __attribute__((packed));
> +
> +enum {
> +    SRP_MULTICHAN_SINGLE = 0,
> +    SRP_MULTICHAN_MULTI  = 1
> +};
> +
> +struct srp_login_req {
> +    uint8_t    opcode;
> +    uint8_t    reserved1[7];
> +    uint64_t   tag;
> +    uint32_t   req_it_iu_len;
> +    uint8_t    reserved2[4];
> +    uint16_t   req_buf_fmt;
> +    uint8_t    req_flags;
> +    uint8_t    reserved3[5];
> +    uint8_t    initiator_port_id[16];
> +    uint8_t    target_port_id[16];
> +};
> +
> +/*
> + * The SRP spec defines the size of the LOGIN_RSP structure to be 52
> + * bytes, so it needs to be packed to avoid having it padded to 56
> + * bytes on 64-bit architectures.
> + */
> +struct srp_login_rsp {
> +    uint8_t    opcode;
> +    uint8_t    reserved1[3];
> +    uint32_t   req_lim_delta;
> +    uint64_t   tag;
> +    uint32_t   max_it_iu_len;
> +    uint32_t   max_ti_iu_len;
> +    uint16_t   buf_fmt;
> +    uint8_t    rsp_flags;
> +    uint8_t    reserved2[25];
> +} __attribute__((packed));
> +
> +struct srp_login_rej {
> +    uint8_t    opcode;
> +    uint8_t    reserved1[3];
> +    uint32_t   reason;
> +    uint64_t   tag;
> +    uint8_t    reserved2[8];
> +    uint16_t   buf_fmt;
> +    uint8_t    reserved3[6];
> +};

Why isn't this one packed? And the ones below?

> +
> +struct srp_i_logout {
> +    uint8_t    opcode;
> +    uint8_t    reserved[7];
> +    uint64_t   tag;
> +};
> +
> +struct srp_t_logout {
> +    uint8_t    opcode;
> +    uint8_t    sol_not;
> +    uint8_t    reserved[2];
> +    uint32_t   reason;
> +    uint64_t   tag;
> +};
> +
> +/*
> + * We need the packed attribute because the SRP spec only aligns the
> + * 8-byte LUN field to 4 bytes.
> + */
> +struct srp_tsk_mgmt {
> +    uint8_t    opcode;
> +    uint8_t    sol_not;
> +    uint8_t    reserved1[6];
> +    uint64_t   tag;
> +    uint8_t    reserved2[4];
> +    uint64_t   lun __attribute__((packed));
> +    uint8_t    reserved3[2];
> +    uint8_t    tsk_mgmt_func;
> +    uint8_t    reserved4;
> +    uint64_t   task_tag;
> +    uint8_t    reserved5[8];
> +};
> +
> +/*
> + * We need the packed attribute because the SRP spec only aligns the
> + * 8-byte LUN field to 4 bytes.
> + */
> +struct srp_cmd {
> +    uint8_t    opcode;
> +    uint8_t    sol_not;
> +    uint8_t    reserved1[3];
> +    uint8_t    buf_fmt;
> +    uint8_t    data_out_desc_cnt;
> +    uint8_t    data_in_desc_cnt;
> +    uint64_t   tag;
> +    uint8_t    reserved2[4];
> +    uint64_t   lun __attribute__((packed));
> +    uint8_t    reserved3;
> +    uint8_t    task_attr;
> +    uint8_t    reserved4;
> +    uint8_t    add_cdb_len;
> +    uint8_t    cdb[16];
> +    uint8_t    add_data[0];
> +};
> +
> +enum {
> +    SRP_RSP_FLAG_RSPVALID = 1 << 0,
> +    SRP_RSP_FLAG_SNSVALID = 1 << 1,
> +    SRP_RSP_FLAG_DOOVER   = 1 << 2,
> +    SRP_RSP_FLAG_DOUNDER  = 1 << 3,
> +    SRP_RSP_FLAG_DIOVER   = 1 << 4,
> +    SRP_RSP_FLAG_DIUNDER  = 1 << 5
> +};
> +
> +/*
> + * The SRP spec defines the size of the RSP structure to be 36 bytes,
> + * so it needs to be packed to avoid having it padded to 40 bytes on
> + * 64-bit architectures.
> + */
> +struct srp_rsp {
> +    uint8_t    opcode;
> +    uint8_t    sol_not;
> +    uint8_t    reserved1[2];
> +    uint32_t   req_lim_delta;
> +    uint64_t   tag;
> +    uint8_t    reserved2[2];
> +    uint8_t    flags;
> +    uint8_t    status;
> +    uint32_t   data_out_res_cnt;
> +    uint32_t   data_in_res_cnt;
> +    uint32_t   sense_data_len;
> +    uint32_t   resp_data_len;
> +    uint8_t    data[0];
> +} __attribute__((packed));
> +
> +#endif /* SCSI_SRP_H */


Alex
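The following is a standalone sketch of how the 16-byte CRQ entries handled by vscsi_do_crq() in the patch above can be decoded and answered. The byte layout is an assumption inferred from the fields the patch touches, not taken from the PAPR document: valid and format are single bytes, followed by a reserved byte and a status byte, then big-endian timeout (2 bytes), IU_length (2 bytes) and IU_data_ptr (8 bytes). The type and function names are hypothetical. Decoding byte by byte avoids depending on host endianness or on how the compiler lays out the union.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-order view of one CRQ entry (assumed layout). */
struct crq_fields {
    uint8_t  valid;        /* 0xc0 init, 0xff link event, 0x80 payload */
    uint8_t  format;
    uint8_t  status;       /* assumed to live at byte 3 */
    uint16_t timeout;
    uint16_t iu_length;
    uint64_t iu_data_ptr;
};

static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Decode a raw big-endian entry into host-order fields. */
void crq_decode(const uint8_t raw[16], struct crq_fields *out)
{
    out->valid = raw[0];
    out->format = raw[1];
    out->status = raw[3];
    out->timeout = load_be16(raw + 4);
    out->iu_length = load_be16(raw + 6);
    out->iu_data_ptr = load_be64(raw + 8);
}

/* Build the reply to an init request (valid 0xc0, format 0x01): an all-zero
 * entry with valid 0xc0 and format 0x02, as vscsi_do_crq() does.
 * Returns 1 if a reply was built, 0 if the entry is not an init request. */
int crq_init_reply(const uint8_t req[16], uint8_t reply[16])
{
    if (req[0] != 0xc0 || req[1] != 0x01) {
        return 0;
    }
    memset(reply, 0, 16);
    reply[0] = 0xc0;
    reply[1] = 0x02;
    return 1;
}
```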

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: [PATCH 25/26] Add a PAPR TCE-bypass mechanism for the pSeries machine
  2011-03-16  4:57 ` [Qemu-devel] [PATCH 25/26] Add a PAPR TCE-bypass mechanism for the pSeries machine David Gibson
@ 2011-03-16 16:43   ` Alexander Graf
  2011-03-17  2:21     ` David Gibson
  0 siblings, 1 reply; 82+ messages in thread
From: Alexander Graf @ 2011-03-16 16:43 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 03/16/2011 05:57 AM, David Gibson wrote:
> From: Ben Herrenschmidt <benh@kernel.crashing.org>
>
> Usually, PAPR virtual IO devices use a virtual IOMMU mechanism, TCEs,
> to mediate all DMA transfers.  While this is necessary for some sorts of
> operation, it can be complex to program and slow for others.
>
> This patch implements a mechanism for bypassing TCE translation, treating
> "IO" addresses as plain (guest) physical memory addresses.  This has two
> main uses:
>   * Simple but 64-bit-aware programs, such as firmware, can use the VIO
>     devices without the complexity of TCE setup.
>   * The guest OS can optionally use the TCE bypass to improve performance
>     in suitable situations.
>
> The mechanism used is a per-device flag which disables TCE translation.
> The flag is toggled with some (hypervisor-implemented) RTAS methods.
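Below is a minimal sketch of the per-device bypass flag described above, written under stated assumptions. TCEs are assumed to map 4 KiB IO pages, with the real page address in the upper bits and read/write permission in the two low bits. The structure and function names are hypothetical and are not QEMU's or PAPR's. With the flag set, the IO address is used directly as a guest physical address. Otherwise it is looked up in the table and permission-checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed TCE format, for illustration only. */
#define TCE_PAGE_SHIFT 12
#define TCE_PAGE_SIZE  (1ULL << TCE_PAGE_SHIFT)
#define TCE_PAGE_MASK  (~(TCE_PAGE_SIZE - 1))
#define TCE_READ       0x1ULL
#define TCE_WRITE      0x2ULL

/* Hypothetical per-device DMA window: a TCE table plus the bypass flag. */
struct vio_dma_window {
    const uint64_t *tces;   /* one entry per 4 KiB IO page */
    size_t nb_tces;
    bool bypass;            /* toggled by the hypervisor-implemented RTAS calls */
};

/* Translate an IO address to a guest physical address.
 * Returns 0 on success, -1 if the address is outside the window or the
 * entry lacks the needed permission. */
int tce_translate(const struct vio_dma_window *w, uint64_t ioaddr,
                  bool is_write, uint64_t *gpa)
{
    if (w->bypass) {
        /* Bypass: the IO address is treated as a plain guest address. */
        *gpa = ioaddr;
        return 0;
    }

    uint64_t index = ioaddr >> TCE_PAGE_SHIFT;
    if (index >= w->nb_tces) {
        return -1;
    }

    uint64_t tce = w->tces[index];
    uint64_t need = is_write ? TCE_WRITE : TCE_READ;
    if (!(tce & need)) {
        return -1;
    }

    *gpa = (tce & TCE_PAGE_MASK) | (ioaddr & (TCE_PAGE_SIZE - 1));
    return 0;
}
```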

Is this an official extension used by anyone, or is it your own invention
that's not implemented in pHyp?


Alex

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 24/26] Implement PAPR virtual SCSI interface (ibmvscsi)
  2011-03-16 16:41   ` [Qemu-devel] " Alexander Graf
@ 2011-03-16 16:51     ` Anthony Liguori
  2011-03-16 20:08     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 82+ messages in thread
From: Anthony Liguori @ 2011-03-16 16:51 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton, David Gibson

On 03/16/2011 11:41 AM, Alexander Graf wrote:
>> new file mode 100644
>> index 0000000..9d55fc4
>> --- /dev/null
>> +++ b/hw/srp.h
>> @@ -0,0 +1,241 @@
>> +/*
>> + * Copyright (c) 2005 Cisco Systems.  All rights reserved.
>> + *
>> + * This software is available to you under a choice of one of two
>> + * licenses.  You may choose to be licensed under the terms of the GNU
>> + * General Public License (GPL) Version 2, available from the file
>> + * COPYING in the main directory of this source tree, or the
>> + * OpenIB.org BSD license below:
>> + *
>> + *     Redistribution and use in source and binary forms, with or
>> + *     without modification, are permitted provided that the following
>> + *     conditions are met:
>> + *
>> + *      - Redistributions of source code must retain the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer.
>> + *
>> + *      - Redistributions in binary form must reproduce the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer in the documentation and/or other materials
>> + *        provided with the distribution.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>> + * SOFTWARE.
>> + *
>> + * $Id$
>> + */
>> +
>> +#ifndef SCSI_SRP_H
>> +#define SCSI_SRP_H
>> +
>> +/*
>> + * Structures and constants for the SCSI RDMA Protocol (SRP) as
>> + * defined by the INCITS T10 committee.  This file was written using
>> + * draft Revision 16a of the SRP standard.
>> + */
>> +
>> +enum {
>> +
>> +    SRP_LOGIN_REQ = 0x00,
>> +    SRP_TSK_MGMT  = 0x01,
>> +    SRP_CMD       = 0x02,
>> +    SRP_I_LOGOUT  = 0x03,
>> +    SRP_LOGIN_RSP = 0xc0,
>> +    SRP_RSP       = 0xc1,
>> +    SRP_LOGIN_REJ = 0xc2,
>> +    SRP_T_LOGOUT  = 0x80,
>> +    SRP_CRED_REQ  = 0x81,
>> +    SRP_AER_REQ   = 0x82,
>> +    SRP_CRED_RSP  = 0x41,
>> +    SRP_AER_RSP   = 0x42
>> +};
>> +
>> +enum {
>> +    SRP_BUF_FORMAT_DIRECT   = 1 << 1,
>> +    SRP_BUF_FORMAT_INDIRECT = 1 << 2
>> +};
>> +
>> +enum {
>> +    SRP_NO_DATA_DESC       = 0,
>> +    SRP_DATA_DESC_DIRECT   = 1,
>> +    SRP_DATA_DESC_INDIRECT = 2
>> +};
>> +
>> +enum {
>> +    SRP_TSK_ABORT_TASK     = 0x01,
>> +    SRP_TSK_ABORT_TASK_SET = 0x02,
>> +    SRP_TSK_CLEAR_TASK_SET = 0x04,
>> +    SRP_TSK_LUN_RESET      = 0x08,
>> +    SRP_TSK_CLEAR_ACA      = 0x40
>> +};
>> +
>> +enum srp_login_rej_reason {
>> +    SRP_LOGIN_REJ_UNABLE_ESTABLISH_CHANNEL   = 0x00010000,
>> +    SRP_LOGIN_REJ_INSUFFICIENT_RESOURCES     = 0x00010001,
>> +    SRP_LOGIN_REJ_REQ_IT_IU_LENGTH_TOO_LARGE = 0x00010002,
>> +    SRP_LOGIN_REJ_UNABLE_ASSOCIATE_CHANNEL   = 0x00010003,
>> +    SRP_LOGIN_REJ_UNSUPPORTED_DESCRIPTOR_FMT = 0x00010004,
>> +    SRP_LOGIN_REJ_MULTI_CHANNEL_UNSUPPORTED  = 0x00010005,
>> +    SRP_LOGIN_REJ_CHANNEL_LIMIT_REACHED      = 0x00010006
>> +};
>> +
>> +enum {
>> +    SRP_REV10_IB_IO_CLASS  = 0xff00,
>> +    SRP_REV16A_IB_IO_CLASS = 0x0100
>> +};
>> +
>> +struct srp_direct_buf {
>> +    uint64_t    va;
>> +    uint32_t    key;
>> +    uint32_t    len;
>> +};
>> +
>> +/*
>> + * We need the packed attribute because the SRP spec puts the list of
>> + * descriptors at an offset of 20, which is not aligned to the size of
>> + * struct srp_direct_buf.  The whole structure must be packed to avoid
>> + * having the 20-byte structure padded to 24 bytes on 64-bit architectures.
>> + */
>> +struct srp_indirect_buf {
>> +    struct srp_direct_buf    table_desc;
>> +    uint32_t                 len;
>> +    struct srp_direct_buf    desc_list[0];
>> +} __attribute__((packed));
>> +
>> +enum {
>> +    SRP_MULTICHAN_SINGLE = 0,
>> +    SRP_MULTICHAN_MULTI  = 1
>> +};
>> +
>> +struct srp_login_req {
>> +    uint8_t    opcode;
>> +    uint8_t    reserved1[7];
>> +    uint64_t   tag;
>> +    uint32_t   req_it_iu_len;
>> +    uint8_t    reserved2[4];
>> +    uint16_t   req_buf_fmt;
>> +    uint8_t    req_flags;
>> +    uint8_t    reserved3[5];
>> +    uint8_t    initiator_port_id[16];
>> +    uint8_t    target_port_id[16];
>> +};
>> +
>> +/*
>> + * The SRP spec defines the size of the LOGIN_RSP structure to be 52
>> + * bytes, so it needs to be packed to avoid having it padded to 56
>> + * bytes on 64-bit architectures.
>> + */
>> +struct srp_login_rsp {
>> +    uint8_t    opcode;
>> +    uint8_t    reserved1[3];
>> +    uint32_t   req_lim_delta;
>> +    uint64_t   tag;
>> +    uint32_t   max_it_iu_len;
>> +    uint32_t   max_ti_iu_len;
>> +    uint16_t   buf_fmt;
>> +    uint8_t    rsp_flags;
>> +    uint8_t    reserved2[25];
>> +} __attribute__((packed));
>> +
>> +struct srp_login_rej {
>> +    uint8_t    opcode;
>> +    uint8_t    reserved1[3];
>> +    uint32_t   reason;
>> +    uint64_t   tag;
>> +    uint8_t    reserved2[8];
>> +    uint16_t   buf_fmt;
>> +    uint8_t    reserved3[6];
>> +};
>
> Why isn't this one packed? And the ones below?

It's naturally aligned.  There's no need to pack things that are
naturally aligned (the structure size is a multiple of 8 and each field
starts at an offset that's a multiple of its size).
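The natural-alignment rule can be checked mechanically with offsetof() and C11 static assertions. This sketch copies the field lists of srp_login_rej and srp_login_rsp from the srp.h hunk above. The `_packed`/`_unpacked` suffixes are hypothetical names for the comparison. The unpacked size of 56 assumes an LP64 ABI where uint64_t members are 8-byte aligned. On i386, where they are only 4-byte aligned, both variants come out at 52.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naturally aligned: every field starts at a multiple of its own size. */
struct srp_login_rej {
    uint8_t    opcode;        /* offset 0  */
    uint8_t    reserved1[3];  /* offset 1  */
    uint32_t   reason;        /* offset 4  */
    uint64_t   tag;           /* offset 8  */
    uint8_t    reserved2[8];  /* offset 16 */
    uint16_t   buf_fmt;       /* offset 24 */
    uint8_t    reserved3[6];  /* offset 26, ends at 32 */
};

_Static_assert(offsetof(struct srp_login_rej, tag) == 8, "tag at 8");
_Static_assert(offsetof(struct srp_login_rej, buf_fmt) == 24, "buf_fmt at 24");
_Static_assert(sizeof(struct srp_login_rej) == 32, "no padding, no packing needed");

/* srp_login_rsp: the fields end at byte 52, which is not a multiple of 8. */
struct srp_login_rsp_packed {
    uint8_t    opcode;
    uint8_t    reserved1[3];
    uint32_t   req_lim_delta;
    uint64_t   tag;
    uint32_t   max_it_iu_len;
    uint32_t   max_ti_iu_len;
    uint16_t   buf_fmt;
    uint8_t    rsp_flags;
    uint8_t    reserved2[25];
} __attribute__((packed));

/* Same fields without the attribute: tail padding rounds the size up to
 * the struct's alignment (56 on typical 64-bit ABIs). */
struct srp_login_rsp_unpacked {
    uint8_t    opcode;
    uint8_t    reserved1[3];
    uint32_t   req_lim_delta;
    uint64_t   tag;
    uint32_t   max_it_iu_len;
    uint32_t   max_ti_iu_len;
    uint16_t   buf_fmt;
    uint8_t    rsp_flags;
    uint8_t    reserved2[25];
};

_Static_assert(sizeof(struct srp_login_rsp_packed) == 52, "size required by the SRP spec");
```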

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC
  2011-03-16 13:46   ` [Qemu-devel] " Alexander Graf
@ 2011-03-16 16:58     ` Stefan Hajnoczi
  2011-03-17  2:26       ` David Gibson
  0 siblings, 1 reply; 82+ messages in thread
From: Stefan Hajnoczi @ 2011-03-16 16:58 UTC (permalink / raw)
  To: David Gibson; +Cc: Alexander Graf, paulus, qemu-devel, anton

On Wed, Mar 16, 2011 at 1:46 PM, Alexander Graf <agraf@suse.de> wrote:
> On 03/16/2011 05:56 AM, David Gibson wrote:
>>
>> From: David Gibson <dwg@au1.ibm.com>
>>
>> PowerPC and POWER chips since the POWER4 and 970 have a special
>> hypervisor mode, and a corresponding form of the system call
>> instruction which traps to the hypervisor.
>>
>> qemu currently has stub implementations of hypervisor mode.  That
>> is, the outline is there to allow qemu to run a PowerPC hypervisor
>> under emulation.  There are a number of details missing so this
>> won't actually work at present, but the idea is there.
>>
>> What there is no provision for at all is for qemu to instead emulate
>> the hypervisor itself.  That is, to have hypercalls trap into qemu
>> and their results be emulated by qemu, rather than running
>> hypervisor code within the emulated system.
>>
>> Hypervisor-hardware-aware KVM implementations are in the works, and
>> it would be useful for debugging and development to also allow
>> full emulation of the same para-virtualized guests as such a KVM.
>>
>> Therefore, this patch adds a hook which will allow a machine to
>> set up emulation of hypervisor calls.
>>
>> Signed-off-by: David Gibson <dwg@au1.ibm.com>
>> ---
>>  target-ppc/cpu.h    |    2 ++
>>  target-ppc/helper.c |    4 ++++
>>  2 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
>> index a20c132..eaddc27 100644
>> --- a/target-ppc/cpu.h
>> +++ b/target-ppc/cpu.h
>> @@ -692,6 +692,8 @@ struct CPUPPCState {
>>      int bfd_mach;
>>      uint32_t flags;
>>      uint64_t insns_flags;
>> +    void (*emulate_hypercall)(CPUState *, void *);
>> +    void *hcall_opaque;
>>
>>      int error_code;
>>      uint32_t pending_interrupts;
>> diff --git a/target-ppc/helper.c b/target-ppc/helper.c
>> index 2094ca3..19aa067 100644
>> --- a/target-ppc/helper.c
>> +++ b/target-ppc/helper.c
>> @@ -2152,6 +2152,10 @@ static inline void powerpc_excp(CPUState *env, int excp_model, int excp)
>>      case POWERPC_EXCP_SYSCALL:   /* System call exception */
>>          dump_syscall(env);
>>          lev = env->error_code;
>> +       if ((lev == 1) && env->emulate_hypercall) {
>> +           env->emulate_hypercall(env, env->hcall_opaque);
>> +           return;
>> +       }
>
> Tabs! Please go through all your patches and make sure there are no tabs in
> there :(.

scripts/checkpatch.pl is there to automate style checking.  That's the
easiest way to check patches before submitting them.

Stefan
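Here is a cut-down, hypothetical illustration of how the hook from patch 03 behaves. In the system call exception, `sc 1` (LEV == 1) is handed to the machine's `emulate_hypercall` callback together with `hcall_opaque`. Anything else falls through to normal exception delivery. `FakeCPUState`, `fake_syscall_exception()`, `counting_hcall()` and the use of r3 for the hcall number and status are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CPUPPCState, reduced to what the hook touches. */
typedef struct FakeCPUState FakeCPUState;
struct FakeCPUState {
    int error_code;                 /* LEV field of the sc instruction */
    uint64_t gpr3;                  /* assumed: hcall number in, status out */
    void (*emulate_hypercall)(FakeCPUState *, void *);
    void *hcall_opaque;
    int delivered_to_guest;         /* set when a normal exception is raised */
};

/* Mirrors the patched POWERPC_EXCP_SYSCALL case: divert LEV == 1 to the
 * machine's hook if one is installed, otherwise deliver as usual. */
void fake_syscall_exception(FakeCPUState *env)
{
    int lev = env->error_code;

    if (lev == 1 && env->emulate_hypercall) {
        env->emulate_hypercall(env, env->hcall_opaque);
        return;
    }
    env->delivered_to_guest = 1;
}

/* Example machine-side handler: counts calls through the opaque pointer
 * and reports success (0) in r3. */
void counting_hcall(FakeCPUState *env, void *opaque)
{
    int *count = opaque;

    (*count)++;
    env->gpr3 = 0;
}
```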

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: [PATCH 21/26] Implement TCE translation for sPAPR VIO
  2011-03-16 16:03   ` [Qemu-devel] " Alexander Graf
@ 2011-03-16 20:05     ` Benjamin Herrenschmidt
  2011-03-16 20:21       ` Anthony Liguori
  2011-03-16 20:22       ` Anthony Liguori
  2011-03-17  1:43     ` David Gibson
  1 sibling, 2 replies; 82+ messages in thread
From: Benjamin Herrenschmidt @ 2011-03-16 20:05 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton, David Gibson

On Wed, 2011-03-16 at 17:03 +0100, Alexander Graf wrote:
> 
> > +int spapr_tce_dma_zero(VIOsPAPRDevice *dev, uint64_t taddr,
> > +                       uint32_t size)
> > +{
> > +    uint8_t *zeroes;
> > +
> > +#ifdef DEBUG_TCE
> > +    fprintf(stderr, "spapr_tce_dma_zero taddr=0x%llx size=0x%x\n",
> > +            (unsigned long long)taddr, size);
> > +#endif
> > +
> > +    /* FIXME: do this better... */
> > +    zeroes = alloca(size);
> > +    memset(zeroes, 0, size);
> 
> You sure that zeroes is still alive during the call? If I were a 
> compiler, I'd probably optimize the return away so that it'd end up 
> being a simple branch to spapr_tce_dma_write - coincidentally 
> invalidating the stack that zeroes is on.

Ugh? How would it ever be legal for a compiler to do that?

Ben.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: [PATCH 24/26] Implement PAPR virtual SCSI interface (ibmvscsi)
  2011-03-16 16:41   ` [Qemu-devel] " Alexander Graf
  2011-03-16 16:51     ` Anthony Liguori
@ 2011-03-16 20:08     ` Benjamin Herrenschmidt
  2011-03-16 20:19       ` Anthony Liguori
  1 sibling, 1 reply; 82+ messages in thread
From: Benjamin Herrenschmidt @ 2011-03-16 20:08 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton, David Gibson

On Wed, 2011-03-16 at 17:41 +0100, Alexander Graf wrote:

> > +/*
> > + * Common MAD header
> > + */
> > +struct mad_common {
> > +    uint32_t type;
> > +    uint16_t status;
> > +    uint16_t length;
> > +    uint64_t tag;
> 
> Is this an in-memory representation? If so, it should be packed, right? 
> Same goes for the ones below.

Well, all the fields are naturally aligned, as is the structure itself,
so do we really need to pack?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 24/26] Implement PAPR virtual SCSI interface (ibmvscsi)
  2011-03-16 20:08     ` Benjamin Herrenschmidt
@ 2011-03-16 20:19       ` Anthony Liguori
  0 siblings, 0 replies; 82+ messages in thread
From: Anthony Liguori @ 2011-03-16 20:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Gibson, paulus, Alexander Graf, anton, qemu-devel

On 03/16/2011 03:08 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2011-03-16 at 17:41 +0100, Alexander Graf wrote:
>
>>> +/*
>>> + * Common MAD header
>>> + */
>>> +struct mad_common {
>>> +    uint32_t type;
>>> +    uint16_t status;
>>> +    uint16_t length;
>>> +    uint64_t tag;
>> Is this an in-memory representation? If so, it should be packed, right?
>> Same goes for the ones below.
> Well, all the fields are naturally aligned, as is the structure itself,
> so do we really need to pack?

No.
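The same check works for the MAD header. This sketch assumes `struct mad_common` ends after `tag`, as far as the quoted hunk shows. It compares the struct against a packed copy (the `_packed` name is hypothetical).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mad_common {
    uint32_t type;      /* offset 0 */
    uint16_t status;    /* offset 4 */
    uint16_t length;    /* offset 6 */
    uint64_t tag;       /* offset 8, total 16 */
};

/* The same fields with the packed attribute that was asked about. */
struct mad_common_packed {
    uint32_t type;
    uint16_t status;
    uint16_t length;
    uint64_t tag;
} __attribute__((packed));

/* Each field sits at a multiple of its size and 16 is a multiple of 8,
 * so packing changes neither the offsets nor the size. */
_Static_assert(offsetof(struct mad_common, tag) ==
               offsetof(struct mad_common_packed, tag), "same tag offset");
_Static_assert(sizeof(struct mad_common) == sizeof(struct mad_common_packed),
               "same size");
_Static_assert(sizeof(struct mad_common) == 16, "16 bytes");
```

Packing would still have one visible effect: it lowers the struct's own alignment to 1. On strict-alignment targets this can lead the compiler to emit byte-wise accesses, which is another reason not to pack layouts that are already natural.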

Regards,

Anthony Liguori

> Cheers,
> Ben.
>
>
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 21/26] Implement TCE translation for sPAPR VIO
  2011-03-16 20:05     ` Benjamin Herrenschmidt
@ 2011-03-16 20:21       ` Anthony Liguori
  2011-03-16 20:22       ` Anthony Liguori
  1 sibling, 0 replies; 82+ messages in thread
From: Anthony Liguori @ 2011-03-16 20:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Gibson, paulus, Alexander Graf, anton, qemu-devel

On 03/16/2011 03:05 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2011-03-16 at 17:03 +0100, Alexander Graf wrote:
>>> +int spapr_tce_dma_zero(VIOsPAPRDevice *dev, uint64_t taddr,
>>> +                       uint32_t size)
>>> +{
>>> +    uint8_t *zeroes;
>>> +
>>> +#ifdef DEBUG_TCE
>>> +    fprintf(stderr, "spapr_tce_dma_zero taddr=0x%llx size=0x%x\n",
>>> +            (unsigned long long)taddr, size);
>>> +#endif
>>> +
>>> +    /* FIXME: do this better... */
>>> +    zeroes = alloca(size);
>>> +    memset(zeroes, 0, size);
>> You sure that zeroes is still alive during the call? If I were a
>> compiler, I'd probably optimize the return away so that it'd end up
>> being a simple branch to spapr_tce_dma_write - coincidentally
>> invalidating the stack that zeroes is on.
> Ugh ? How would this ever be legal for a compiler to do that ?

Yeah, the compiler can't do that.  Memory returned by alloca() stays
valid until the function that called alloca() returns, and inlining
doesn't change that.

Regards,

Anthony Liguori

> Ben.
>
>
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 21/26] Implement TCE translation for sPAPR VIO
  2011-03-16 20:05     ` Benjamin Herrenschmidt
  2011-03-16 20:21       ` Anthony Liguori
@ 2011-03-16 20:22       ` Anthony Liguori
  2011-03-16 20:36         ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2011-03-16 20:22 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Gibson, paulus, Alexander Graf, anton, qemu-devel

On 03/16/2011 03:05 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2011-03-16 at 17:03 +0100, Alexander Graf wrote:
>>> +int spapr_tce_dma_zero(VIOsPAPRDevice *dev, uint64_t taddr,
>>> +                       uint32_t size)
>>> +{
>>> +    uint8_t *zeroes;
>>> +
>>> +#ifdef DEBUG_TCE
>>> +    fprintf(stderr, "spapr_tce_dma_zero taddr=0x%llx size=0x%x\n",
>>> +            (unsigned long long)taddr, size);
>>> +#endif
>>> +
>>> +    /* FIXME: do this better... */
>>> +    zeroes = alloca(size);
>>> +    memset(zeroes, 0, size);
>> You sure that zeroes is still alive during the call? If I were a
>> compiler, I'd probably optimize the return away so that it'd end up
>> being a simple branch to spapr_tce_dma_write - coincidentally
>> invalidating the stack that zeroes is on.
> Ugh ? How would this ever be legal for a compiler to do that ?

But BTW, if you're already being evil and using alloca, it's a whole lot 
nicer to just do:

uint8_t zeros[size];
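Anthony's variable-length-array form is sketched below with a hypothetical `fake_dma_write()` standing in for `spapr_tce_dma_write()`. The stand-in writes into a flat array instead of translating through TCEs. A VLA cannot take an initializer, so it is cleared with memset(). A zero-length VLA is undefined behaviour, so size 0 is handled up front.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest memory and spapr_tce_dma_write(). */
static uint8_t fake_guest_mem[8192];

int fake_dma_write(uint64_t taddr, const void *buf, uint32_t size)
{
    if (taddr > sizeof(fake_guest_mem) ||
        size > sizeof(fake_guest_mem) - taddr) {
        return -1;              /* out of range */
    }
    memcpy(fake_guest_mem + taddr, buf, size);
    return 0;
}

/* VLA version: standard C99 and no alloca(). */
int dma_zero(uint64_t taddr, uint32_t size)
{
    if (size == 0) {
        return 0;               /* avoid a zero-length VLA */
    }

    uint8_t zeros[size];        /* VLAs cannot be initialized... */
    memset(zeros, 0, size);     /* ...so clear it explicitly */
    return fake_dma_write(taddr, zeros, size);
}
```

Like alloca(), the VLA still places `size` bytes on the stack, and `size` comes from the guest. Writing a fixed-size zeroed buffer in a loop would bound that, at the cost of several write calls.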

Regards,

Anthony Liguori

> Ben.
>
>
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 21/26] Implement TCE translation for sPAPR VIO
  2011-03-16 20:22       ` Anthony Liguori
@ 2011-03-16 20:36         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 82+ messages in thread
From: Benjamin Herrenschmidt @ 2011-03-16 20:36 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: David Gibson, paulus, Alexander Graf, anton, qemu-devel

On Wed, 2011-03-16 at 15:22 -0500, Anthony Liguori wrote:
> 
> But BTW, if you're already being evil and using alloca, it's a whole
> lot nicer to just do:
> 
> uint8_t zeros[size];

Right. I didn't write that bit of the code, so I'll let David fix it,
but it does indeed look nicer. Eventually, I suppose, we -could- make
some of these faster, since all we really need is to poke at the guest
memory, and I'm sure we can do that directly one way or another :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC David Gibson
  2011-03-16 13:46   ` [Qemu-devel] " Alexander Graf
@ 2011-03-16 20:44   ` Anthony Liguori
  2011-03-17  4:55     ` David Gibson
  1 sibling, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2011-03-16 20:44 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, agraf, anton, qemu-devel

On 03/15/2011 11:56 PM, David Gibson wrote:
> From: David Gibson <dwg@au1.ibm.com>
>
> PowerPC and POWER chips since the POWER4 and 970 have a special
> hypervisor mode, and a corresponding form of the system call
> instruction which traps to the hypervisor.
>
> qemu currently has stub implementations of hypervisor mode.  That
> is, the outline is there to allow qemu to run a PowerPC hypervisor
> under emulation.  There are a number of details missing so this
> won't actually work at present, but the idea is there.
>
> What there is no provision for at all is for qemu to instead emulate
> the hypervisor itself.  That is, to have hypercalls trap into qemu
> and their results be emulated from qemu, rather than running
> hypervisor code within the emulated system.
>
> Hypervisor-hardware-aware KVM implementations are in the works, and
> it would be useful for debugging and development to also allow
> full emulation of the same para-virtualized guests as such a KVM.
>
> Therefore, this patch adds a hook which will allow a machine to
> set up emulation of hypervisor calls.
>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   target-ppc/cpu.h    |    2 ++
>   target-ppc/helper.c |    4 ++++
>   2 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index a20c132..eaddc27 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -692,6 +692,8 @@ struct CPUPPCState {
>       int bfd_mach;
>       uint32_t flags;
>       uint64_t insns_flags;
> +    void (*emulate_hypercall)(CPUState *, void *);
> +    void *hcall_opaque;

Is the hypercall handler ever specific to a CPU?

I'd prefer to see this as a generic interface that wasn't specific to 
target-ppc.

Basically, add a:

void cpu_hypercall(CPUState *env);

And then implement it within your target.  I'm not sure I get the opaque 
argument.

Regards,

Anthony Liguori

>
>       int error_code;
>       uint32_t pending_interrupts;
> diff --git a/target-ppc/helper.c b/target-ppc/helper.c
> index 2094ca3..19aa067 100644
> --- a/target-ppc/helper.c
> +++ b/target-ppc/helper.c
> @@ -2152,6 +2152,10 @@ static inline void powerpc_excp(CPUState *env, int excp_model, int excp)
>       case POWERPC_EXCP_SYSCALL:   /* System call exception                    */
>           dump_syscall(env);
>           lev = env->error_code;
> +	if ((lev == 1) && env->emulate_hypercall) {
> +	    env->emulate_hypercall(env, env->hcall_opaque);
> +	    return;
> +	}	
>           if (lev == 1 || (lpes0 == 0 && lpes1 == 0))
>               new_msr |= (target_ulong)MSR_HVB;
>           goto store_next;

* Re: [Qemu-devel] [PATCH 13/26] Start implementing pSeries logical partition machine
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 13/26] Start implementing pSeries logical partition machine David Gibson
  2011-03-16 14:30   ` [Qemu-devel] " Alexander Graf
@ 2011-03-16 21:59   ` Anthony Liguori
  2011-03-16 23:46     ` Alexander Graf
  2011-03-17  3:08     ` David Gibson
  1 sibling, 2 replies; 82+ messages in thread
From: Anthony Liguori @ 2011-03-16 21:59 UTC
  To: David Gibson; +Cc: paulus, agraf, anton, qemu-devel

On 03/15/2011 11:56 PM, David Gibson wrote:
> This patch adds a "pseries" machine to qemu.  This aims to emulate a
> logical partition on an IBM pSeries machine, compliant to the
> "PowerPC Architecture Platform Requirements" (PAPR) document.

Can we call the machine 'papr', or at least 'lpar'?

Technically speaking, System P is the proper name these days, but I 
think papr or lpar would make a lot more sense to people.

> This initial version is quite limited: it implements a basic machine
> and PAPR hypercall emulation.  So far only one hypercall is present -
> H_PUT_TERM_CHAR - so that a (write-only) console is available.
>
> Multiple CPUs are permitted, with SMP entry handled kexec() style.
>
> The machine so far more resembles an old POWER4 style "full system
> partition" rather than a modern LPAR, in that the guest manages the
> page tables directly, rather than via hypercalls.
>
> The machine requires qemu to be configured with --enable-fdt.  The
> machine can (so far) only be booted with -kernel - i.e. no partition
> firmware is provided.
>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   Makefile.target  |    2 +
>   hw/spapr.c       |  314 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/spapr.h       |  246 ++++++++++++++++++++++++++++++++++++++++++
>   hw/spapr_hcall.c |   43 ++++++++
>   4 files changed, 605 insertions(+), 0 deletions(-)
>   create mode 100644 hw/spapr.c
>   create mode 100644 hw/spapr.h
>   create mode 100644 hw/spapr_hcall.c
>
> diff --git a/Makefile.target b/Makefile.target
> index f0df98e..e6a7557 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -231,6 +231,8 @@ obj-ppc-y += ppc_prep.o
>   obj-ppc-y += ppc_oldworld.o
>   # NewWorld PowerMac
>   obj-ppc-y += ppc_newworld.o
> +# IBM pSeries (sPAPR)
> +obj-ppc-y += spapr.o spapr_hcall.o
>   # PowerPC 4xx boards
>   obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>   obj-ppc-y += ppc440.o ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> new file mode 100644
> index 0000000..8b4e16e
> --- /dev/null
> +++ b/hw/spapr.c
> @@ -0,0 +1,314 @@
> +/*
> + * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System Emulator
> + *
> + * Copyright (c) 2004-2007 Fabrice Bellard
> + * Copyright (c) 2007 Jocelyn Mayer
> + * Copyright (c) 2010 David Gibson, IBM Corporation.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + *
> + */
> +#include "sysemu.h"
> +#include "qemu-char.h"
> +#include "hw.h"
> +#include "elf.h"
> +
> +#include "hw/boards.h"
> +#include "hw/ppc.h"
> +#include "hw/loader.h"
> +
> +#include "hw/spapr.h"
> +
> +#include <libfdt.h>
> +
> +#define KERNEL_LOAD_ADDR        0x00000000
> +#define INITRD_LOAD_ADDR        0x02800000
> +#define FDT_MAX_SIZE            0x10000
> +
> +#define TIMEBASE_FREQ           512000000ULL
> +
> +#define MAX_CPUS                32
> +
> +static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
> +                              const char *cpu_model, CPUState *envs[],
> +                              sPAPREnvironment *spapr,
> +                              target_phys_addr_t initrd_base,
> +                              target_phys_addr_t initrd_size,
> +                              const char *kernel_cmdline)
> +{
> +    void *fdt;
> +    uint64_t mem_reg_property[] = { 0, cpu_to_be64(ramsize) };
> +    uint32_t start_prop = cpu_to_be32(initrd_base);
> +    uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
> +    int i;
> +    char *modelname;
> +
> +#define _FDT(exp) \
> +    do { \
> +        int ret = (exp);                                           \
> +        if (ret < 0) {                                             \
> +            hw_error("qemu: error creating device tree: %s: %s\n", \
> +                     #exp, fdt_strerror(ret));                     \
> +            return NULL;                                           \
> +        }                                                          \
> +    } while (0)

I'm not a huge fan of macros like this.  It'd be much nicer to use a 
goto to have common error handling.
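
One way to do goto-based error handling, sketched with a hypothetical `build_step()` in place of the fdt_* calls. Like those calls, it returns a negative error code on failure. The `fail_at` knob exists only so the failure path can be exercised. Every failure jumps to a single label, which also gives one obvious place to free the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an fdt_* call: returns 0 on success or a
 * negative error code.  'fail_at' selects a step to fail, for testing. */
static int fail_at = -1;

static int build_step(int n)
{
    return (n == fail_at) ? -1 : 0;
}

/* Build a blob; every failure jumps to one label that reports the
 * error and frees the buffer, instead of a macro hiding "return NULL". */
static void *create_blob(size_t size)
{
    void *blob;
    int ret;

    assert(size > 0);
    blob = calloc(1, size);
    if (!blob) {
        return NULL;
    }

    ret = build_step(0);        /* e.g. fdt_create() */
    if (ret < 0) {
        goto fail;
    }
    ret = build_step(1);        /* e.g. fdt_begin_node() */
    if (ret < 0) {
        goto fail;
    }
    ret = build_step(2);        /* e.g. fdt_finish() */
    if (ret < 0) {
        goto fail;
    }
    return blob;

fail:
    fprintf(stderr, "error creating device tree: %d\n", ret);
    free(blob);
    return NULL;
}
```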

> +
> +    fdt = qemu_mallocz(FDT_MAX_SIZE);
> +    _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
> +
> +    _FDT((fdt_finish_reservemap(fdt)));
> +
> +    /* Root node */
> +    _FDT((fdt_begin_node(fdt, "")));
> +    _FDT((fdt_property_string(fdt, "device_type", "chrp")));
> +    _FDT((fdt_property_string(fdt, "model", "qemu,emulated-pSeries-LPAR")));
> +
> +    _FDT((fdt_property_cell(fdt, "#address-cells", 0x2)));
> +    _FDT((fdt_property_cell(fdt, "#size-cells", 0x2)));
> +
> +    /* /chosen */
> +    _FDT((fdt_begin_node(fdt, "chosen")));
> +
> +    _FDT((fdt_property_string(fdt, "bootargs", kernel_cmdline)));
> +    _FDT((fdt_property(fdt, "linux,initrd-start", &start_prop, sizeof(start_prop))));
> +    _FDT((fdt_property(fdt, "linux,initrd-end", &end_prop, sizeof(end_prop))));
> +
> +    _FDT((fdt_end_node(fdt)));
> +
> +    /* memory node */
> +    _FDT((fdt_begin_node(fdt, "memory@0")));
> +
> +    _FDT((fdt_property_string(fdt, "device_type", "memory")));
> +    _FDT((fdt_property(fdt, "reg", mem_reg_property, sizeof(mem_reg_property))));
> +
> +    _FDT((fdt_end_node(fdt)));
> +
> +    /* cpus */
> +    _FDT((fdt_begin_node(fdt, "cpus")));
> +
> +    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
> +    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
> +
> +    modelname = qemu_strdup(cpu_model);
> +
> +    for (i = 0; i < strlen(modelname); i++) {
> +        modelname[i] = toupper(modelname[i]);
> +    }
> +
> +    for (i = 0; i < smp_cpus; i++) {
> +        CPUState *env = envs[i];
> +        char *nodename;
> +        uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
> +                           0xffffffff, 0xffffffff};
> +
> +        if (asprintf(&nodename, "%s@%x", modelname, i) < 0) {
> +            fprintf(stderr, "Allocation failure\n");
> +            exit(1);
> +        }

asprintf isn't portable and we don't have a portable replacement (yet).  
I'd suggest using a static size buffer and snprintf().
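
A possible replacement along those lines. `cpu_node_name()` is a hypothetical helper, and the 32-byte buffer in the test is an assumed size, not a value from the patch. snprintf() returns the length it would have written, so comparing that result with the buffer size detects truncation. Nothing needs to be freed afterwards.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build "<MODEL>@<index in hex>" into a caller-supplied fixed buffer.
 * Returns 0 on success, -1 if the name would not fit. */
static int cpu_node_name(char *buf, size_t len, const char *cpu_model,
                         unsigned int index)
{
    int n;
    size_t i;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "%s@%x", cpu_model, index);
    if (n < 0 || (size_t)n >= len) {
        return -1;      /* truncated: n is the length that was needed */
    }
    /* Upper-case the model part only, as the patch does with toupper(). */
    for (i = 0; buf[i] != '@'; i++) {
        buf[i] = (char)toupper((unsigned char)buf[i]);
    }
    return 0;
}
```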

> +
> +        _FDT((fdt_begin_node(fdt, nodename)));
> +
> +        free(nodename);
> +
> +        _FDT((fdt_property_cell(fdt, "reg", i)));
> +        _FDT((fdt_property_string(fdt, "device_type", "cpu")));
> +
> +        _FDT((fdt_property_cell(fdt, "cpu-version", env->spr[SPR_PVR])));
> +        _FDT((fdt_property_cell(fdt, "dcache-block-size", env->dcache_line_size)));
> +        _FDT((fdt_property_cell(fdt, "icache-block-size", env->icache_line_size)));
> +        _FDT((fdt_property_cell(fdt, "timebase-frequency", TIMEBASE_FREQ)));
> +        /* Hardcode CPU frequency for now.  It's kind of arbitrary on
> +         * full emu, for kvm we should copy it from the host */
> +        _FDT((fdt_property_cell(fdt, "clock-frequency", 1000000000)));
> +        _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr)));
> +        _FDT((fdt_property_string(fdt, "status", "okay")));
> +        _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
> +
> +        if (envs[i]->mmu_model & POWERPC_MMU_1TSEG) {
> +            _FDT((fdt_property(fdt, "ibm,processor-segment-sizes",
> +                               segs, sizeof(segs))));
> +        }
> +
> +        _FDT((fdt_end_node(fdt)));
> +    }
> +
> +    qemu_free(modelname);
> +
> +    _FDT((fdt_end_node(fdt)));
> +
> +    _FDT((fdt_end_node(fdt))); /* close root node */
> +    _FDT((fdt_finish(fdt)));
> +
> +    if (fdt_size) {
> +        *fdt_size = fdt_totalsize(fdt);
> +    }
> +
> +    return fdt;
> +}
> +
> +static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
> +{
> +    return (addr & 0x0fffffff) + KERNEL_LOAD_ADDR;
> +}
> +
> +static void emulate_spapr_hypercall(CPUState *env, void *opaque)
> +{
> +    env->gpr[3] = spapr_hypercall(env, (sPAPREnvironment *)opaque,
> +                                  env->gpr[3], &env->gpr[4]);
> +}
> +
> +/* FIXME: hack until we implement the proper VIO console */
> +static target_ulong h_put_term_char(CPUState *env, sPAPREnvironment *spapr,
> +                                    target_ulong opcode, target_ulong *args)
> +{
> +    uint8_t buf[16];
> +
> +    stq_p(buf, args[2]);
> +    stq_p(buf + 8, args[3]);
> +
> +    qemu_chr_write(serial_hds[0], buf, args[1]);
> +
> +    return 0;
> +}
> +
> +
> +/* pSeries LPAR / sPAPR hardware init */
> +static void ppc_spapr_init(ram_addr_t ram_size,
> +                           const char *boot_device,
> +                           const char *kernel_filename,
> +                           const char *kernel_cmdline,
> +                           const char *initrd_filename,
> +                           const char *cpu_model)
> +{
> +    CPUState *envs[MAX_CPUS];
> +    void *fdt;
> +    int i;
> +    ram_addr_t ram_offset;
> +    target_phys_addr_t fdt_addr;
> +    uint32_t kernel_base, initrd_base;
> +    long kernel_size, initrd_size;
> +    int fdt_size;
> +    sPAPREnvironment *spapr;
> +
> +    spapr = qemu_malloc(sizeof(*spapr));
> +
> +    /* We place the device tree just below either the top of RAM, or
> +     * 2GB, so that it can be processed with 32-bit code if
> +     * necessary */
> +    fdt_addr = MIN(ram_size, 0x80000000) - FDT_MAX_SIZE;
> +
> +    /* init CPUs */
> +    if (cpu_model == NULL) {
> +        cpu_model = "POWER7";
> +    }
> +    for (i = 0; i<  smp_cpus; i++) {
> +        CPUState *env =  cpu_init(cpu_model);
> +
> +        if (!env) {
> +            fprintf(stderr, "Unable to find PowerPC CPU definition\n");
> +            exit(1);
> +        }
> +        /* Set time-base frequency to 512 MHz */
> +        cpu_ppc_tb_init(env, TIMEBASE_FREQ);
> +        qemu_register_reset((QEMUResetHandler*)&cpu_reset, env);
> +
> +        env->emulate_hypercall = emulate_spapr_hypercall;
> +        env->hcall_opaque = spapr;
> +
> +        env->hreset_vector = 0x60;
> +        env->hreset_excp_prefix = 0;
> +        env->gpr[3] = i;
> +
> +        envs[i] = env;
> +    }
> +
> +    /* allocate RAM */
> +    ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
> +    cpu_register_physical_memory(0, ram_size, ram_offset);
> +
> +    spapr_register_hypercall(H_PUT_TERM_CHAR, h_put_term_char);
> +
> +    if (kernel_filename) {
> +        uint64_t lowaddr = 0;
> +
> +        kernel_base = KERNEL_LOAD_ADDR;
> +
> +        kernel_size = load_elf(kernel_filename, translate_kernel_address, NULL,
> +                               NULL, &lowaddr, NULL, 1, ELF_MACHINE, 0);
> +        if (kernel_size < 0) {
> +            kernel_size = load_image_targphys(kernel_filename, kernel_base,
> +                                              ram_size - kernel_base);
> +        }
> +        if (kernel_size < 0) {
> +            hw_error("qemu: could not load kernel '%s'\n", kernel_filename);
> +            exit(1);
> +        }
> +
> +        /* load initrd */
> +        if (initrd_filename) {
> +            initrd_base = INITRD_LOAD_ADDR;
> +            initrd_size = load_image_targphys(initrd_filename, initrd_base,
> +                                              ram_size - initrd_base);
> +            if (initrd_size < 0) {
> +                hw_error("qemu: could not load initial ram disk '%s'\n",
> +                         initrd_filename);
> +                exit(1);
> +            }
> +        } else {
> +            initrd_base = 0;
> +            initrd_size = 0;
> +        }
> +
> +    } else {
> +        fprintf(stderr, "pSeries machine needs -kernel for now\n");
> +        exit(1);
> +    }
> +
> +    /* Prepare the device tree */
> +    fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, envs, spapr,
> +                           initrd_base, initrd_size, kernel_cmdline);
> +    if (!fdt) {
> +        hw_error("Couldn't create pSeries device tree\n");
> +        exit(1);
> +    }
> +
> +    cpu_physical_memory_write(fdt_addr, fdt, fdt_size);
> +
> +    qemu_free(fdt);
> +
> +    envs[0]->gpr[3] = fdt_addr;
> +    envs[0]->gpr[5] = 0;
> +    envs[0]->hreset_vector = kernel_base;
> +}
> +
> +static QEMUMachine spapr_machine = {
> +    .name = "pseries",
> +    .desc = "pSeries Logical Partition (PAPR compliant)",
> +    .init = ppc_spapr_init,
> +    .max_cpus = MAX_CPUS,
> +    .no_vga = 1,
> +    .no_parallel = 1,
> +};
> +
> +static void spapr_machine_init(void)
> +{
> +    qemu_register_machine(&spapr_machine);
> +}
> +
> +machine_init(spapr_machine_init);
> diff --git a/hw/spapr.h b/hw/spapr.h
> new file mode 100644
> index 0000000..9e63a19
> --- /dev/null
> +++ b/hw/spapr.h
> @@ -0,0 +1,246 @@

This needs a copyright of some sort.

> +#if !defined (__HW_SPAPR_H__)
> +#define __HW_SPAPR_H__
> +
> +typedef struct sPAPREnvironment {
> +} sPAPREnvironment;
> +
> +#define H_SUCCESS         0
> +#define H_BUSY            1        /* Hardware busy -- retry later */
> +#define H_CLOSED          2        /* Resource closed */
> +#define H_NOT_AVAILABLE   3
> +#define H_CONSTRAINED     4        /* Resource request constrained to max allowed */
> +#define H_PARTIAL         5
> +#define H_IN_PROGRESS     14       /* Kind of like busy */
> +#define H_PAGE_REGISTERED 15
> +#define H_PARTIAL_STORE   16
> +#define H_PENDING         17       /* returned from H_POLL_PENDING */
> +#define H_CONTINUE        18       /* Returned from H_Join on success */
> +#define H_LONG_BUSY_START_RANGE         9900  /* Start of long busy range */
> +#define H_LONG_BUSY_ORDER_1_MSEC        9900  /* Long busy, hint that 1msec \
> +                                                 is a good time to retry */
> +#define H_LONG_BUSY_ORDER_10_MSEC       9901  /* Long busy, hint that 10msec \
> +                                                 is a good time to retry */
> +#define H_LONG_BUSY_ORDER_100_MSEC      9902  /* Long busy, hint that 100msec \
> +                                                 is a good time to retry */
> +#define H_LONG_BUSY_ORDER_1_SEC         9903  /* Long busy, hint that 1sec \
> +                                                 is a good time to retry */
> +#define H_LONG_BUSY_ORDER_10_SEC        9904  /* Long busy, hint that 10sec \
> +                                                 is a good time to retry */
> +#define H_LONG_BUSY_ORDER_100_SEC       9905  /* Long busy, hint that 100sec \
> +                                                 is a good time to retry */
> +#define H_LONG_BUSY_END_RANGE           9905  /* End of long busy range */
> +#define H_HARDWARE        -1       /* Hardware error */
> +#define H_FUNCTION        -2       /* Function not supported */
> +#define H_PRIVILEGE       -3       /* Caller not privileged */
> +#define H_PARAMETER       -4       /* Parameter invalid, out-of-range or conflicting */
> +#define H_BAD_MODE        -5       /* Illegal msr value */
> +#define H_PTEG_FULL       -6       /* PTEG is full */
> +#define H_NOT_FOUND       -7       /* PTE was not found */
> +#define H_RESERVED_DABR   -8       /* DABR address is reserved by the hypervisor on this processor */
> +#define H_NO_MEM          -9
> +#define H_AUTHORITY       -10
> +#define H_PERMISSION      -11
> +#define H_DROPPED         -12
> +#define H_SOURCE_PARM     -13
> +#define H_DEST_PARM       -14
> +#define H_REMOTE_PARM     -15
> +#define H_RESOURCE        -16
> +#define H_ADAPTER_PARM    -17
> +#define H_RH_PARM         -18
> +#define H_RCQ_PARM        -19
> +#define H_SCQ_PARM        -20
> +#define H_EQ_PARM         -21
> +#define H_RT_PARM         -22
> +#define H_ST_PARM         -23
> +#define H_SIGT_PARM       -24
> +#define H_TOKEN_PARM      -25
> +#define H_MLENGTH_PARM    -27
> +#define H_MEM_PARM        -28
> +#define H_MEM_ACCESS_PARM -29
> +#define H_ATTR_PARM       -30
> +#define H_PORT_PARM       -31
> +#define H_MCG_PARM        -32
> +#define H_VL_PARM         -33
> +#define H_TSIZE_PARM      -34
> +#define H_TRACE_PARM      -35
> +
> +#define H_MASK_PARM       -37
> +#define H_MCG_FULL        -38
> +#define H_ALIAS_EXIST     -39
> +#define H_P_COUNTER       -40
> +#define H_TABLE_FULL      -41
> +#define H_ALT_TABLE       -42
> +#define H_MR_CONDITION    -43
> +#define H_NOT_ENOUGH_RESOURCES -44
> +#define H_R_STATE         -45
> +#define H_RESCINDEND      -46
> +#define H_MULTI_THREADS_ACTIVE -9005
> +
> +
> +/* Long Busy is a condition that can be returned by the firmware
> + * when a call cannot be completed now, but the identical call
> + * should be retried later.  This prevents calls blocking in the
> + * firmware for long periods of time.  Annoyingly the firmware can return
> + * a range of return codes, hinting at how long we should wait before
> + * retrying.  If you don't care for the hint, the macro below is a good
> + * way to check for the long_busy return codes.
> + */
> +#define H_IS_LONG_BUSY(x)  ((x >= H_LONG_BUSY_START_RANGE) \
> +                            && (x <= H_LONG_BUSY_END_RANGE))
> +
> +/* Flags */
> +#define H_LARGE_PAGE      (1ULL<<(63-16))
> +#define H_EXACT           (1ULL<<(63-24))       /* Use exact PTE or return H_PTEG_FULL */
> +#define H_R_XLATE         (1ULL<<(63-25))       /* include a valid logical page num in the pte if the valid bit is set */
> +#define H_READ_4          (1ULL<<(63-26))       /* Return 4 PTEs */
> +#define H_PAGE_STATE_CHANGE (1ULL<<(63-28))
> +#define H_PAGE_UNUSED     ((1ULL<<(63-29)) | (1ULL<<(63-30)))
> +#define H_PAGE_SET_UNUSED (H_PAGE_STATE_CHANGE | H_PAGE_UNUSED)
> +#define H_PAGE_SET_LOANED (H_PAGE_SET_UNUSED | (1ULL<<(63-31)))
> +#define H_PAGE_SET_ACTIVE H_PAGE_STATE_CHANGE
> +#define H_AVPN            (1ULL<<(63-32))       /* An avpn is provided as a sanity test */
> +#define H_ANDCOND         (1ULL<<(63-33))
> +#define H_ICACHE_INVALIDATE (1ULL<<(63-40))     /* icbi, etc.  (ignored for IO pages) */
> +#define H_ICACHE_SYNCHRONIZE (1ULL<<(63-41))    /* dcbst, icbi, etc (ignored for IO pages) */
> +#define H_ZERO_PAGE       (1ULL<<(63-48))       /* zero the page before mapping (ignored for IO pages) */
> +#define H_COPY_PAGE       (1ULL<<(63-49))
> +#define H_N               (1ULL<<(63-61))
> +#define H_PP1             (1ULL<<(63-62))
> +#define H_PP2             (1ULL<<(63-63))
> +
> +/* VASI States */
> +#define H_VASI_INVALID    0
> +#define H_VASI_ENABLED    1
> +#define H_VASI_ABORTED    2
> +#define H_VASI_SUSPENDING 3
> +#define H_VASI_SUSPENDED  4
> +#define H_VASI_RESUMED    5
> +#define H_VASI_COMPLETED  6
> +
> +/* DABRX flags */
> +#define H_DABRX_HYPERVISOR (1ULL<<(63-61))
> +#define H_DABRX_KERNEL     (1ULL<<(63-62))
> +#define H_DABRX_USER       (1ULL<<(63-63))
> +
> +/* Each control block has to be on a 4K boundary */
> +#define H_CB_ALIGNMENT     4096
> +
> +/* pSeries hypervisor opcodes */
> +#define H_REMOVE                0x04
> +#define H_ENTER                 0x08
> +#define H_READ                  0x0c
> +#define H_CLEAR_MOD             0x10
> +#define H_CLEAR_REF             0x14
> +#define H_PROTECT               0x18
> +#define H_GET_TCE               0x1c
> +#define H_PUT_TCE               0x20
> +#define H_SET_SPRG0             0x24
> +#define H_SET_DABR              0x28
> +#define H_PAGE_INIT             0x2c
> +#define H_SET_ASR               0x30
> +#define H_ASR_ON                0x34
> +#define H_ASR_OFF               0x38
> +#define H_LOGICAL_CI_LOAD       0x3c
> +#define H_LOGICAL_CI_STORE      0x40
> +#define H_LOGICAL_CACHE_LOAD    0x44
> +#define H_LOGICAL_CACHE_STORE   0x48
> +#define H_LOGICAL_ICBI          0x4c
> +#define H_LOGICAL_DCBF          0x50
> +#define H_GET_TERM_CHAR         0x54
> +#define H_PUT_TERM_CHAR         0x58
> +#define H_REAL_TO_LOGICAL       0x5c
> +#define H_HYPERVISOR_DATA       0x60
> +#define H_EOI                   0x64
> +#define H_CPPR                  0x68
> +#define H_IPI                   0x6c
> +#define H_IPOLL                 0x70
> +#define H_XIRR                  0x74
> +#define H_PERFMON               0x7c
> +#define H_MIGRATE_DMA           0x78
> +#define H_REGISTER_VPA          0xDC
> +#define H_CEDE                  0xE0
> +#define H_CONFER                0xE4
> +#define H_PROD                  0xE8
> +#define H_GET_PPP               0xEC
> +#define H_SET_PPP               0xF0
> +#define H_PURR                  0xF4
> +#define H_PIC                   0xF8
> +#define H_REG_CRQ               0xFC
> +#define H_FREE_CRQ              0x100
> +#define H_VIO_SIGNAL            0x104
> +#define H_SEND_CRQ              0x108
> +#define H_COPY_RDMA             0x110
> +#define H_REGISTER_LOGICAL_LAN  0x114
> +#define H_FREE_LOGICAL_LAN      0x118
> +#define H_ADD_LOGICAL_LAN_BUFFER 0x11C
> +#define H_SEND_LOGICAL_LAN      0x120
> +#define H_BULK_REMOVE           0x124
> +#define H_MULTICAST_CTRL        0x130
> +#define H_SET_XDABR             0x134
> +#define H_STUFF_TCE             0x138
> +#define H_PUT_TCE_INDIRECT      0x13C
> +#define H_CHANGE_LOGICAL_LAN_MAC 0x14C
> +#define H_VTERM_PARTNER_INFO    0x150
> +#define H_REGISTER_VTERM        0x154
> +#define H_FREE_VTERM            0x158
> +#define H_RESET_EVENTS          0x15C
> +#define H_ALLOC_RESOURCE        0x160
> +#define H_FREE_RESOURCE         0x164
> +#define H_MODIFY_QP             0x168
> +#define H_QUERY_QP              0x16C
> +#define H_REREGISTER_PMR        0x170
> +#define H_REGISTER_SMR          0x174
> +#define H_QUERY_MR              0x178
> +#define H_QUERY_MW              0x17C
> +#define H_QUERY_HCA             0x180
> +#define H_QUERY_PORT            0x184
> +#define H_MODIFY_PORT           0x188
> +#define H_DEFINE_AQP1           0x18C
> +#define H_GET_TRACE_BUFFER      0x190
> +#define H_DEFINE_AQP0           0x194
> +#define H_RESIZE_MR             0x198
> +#define H_ATTACH_MCQP           0x19C
> +#define H_DETACH_MCQP           0x1A0
> +#define H_CREATE_RPT            0x1A4
> +#define H_REMOVE_RPT            0x1A8
> +#define H_REGISTER_RPAGES       0x1AC
> +#define H_DISABLE_AND_GETC      0x1B0
> +#define H_ERROR_DATA            0x1B4
> +#define H_GET_HCA_INFO          0x1B8
> +#define H_GET_PERF_COUNT        0x1BC
> +#define H_MANAGE_TRACE          0x1C0
> +#define H_FREE_LOGICAL_LAN_BUFFER 0x1D4
> +#define H_QUERY_INT_STATE       0x1E4
> +#define H_POLL_PENDING          0x1D8
> +#define H_ILLAN_ATTRIBUTES      0x244
> +#define H_MODIFY_HEA_QP         0x250
> +#define H_QUERY_HEA_QP          0x254
> +#define H_QUERY_HEA             0x258
> +#define H_QUERY_HEA_PORT        0x25C
> +#define H_MODIFY_HEA_PORT       0x260
> +#define H_REG_BCMC              0x264
> +#define H_DEREG_BCMC            0x268
> +#define H_REGISTER_HEA_RPAGES   0x26C
> +#define H_DISABLE_AND_GET_HEA   0x270
> +#define H_GET_HEA_INFO          0x274
> +#define H_ALLOC_HEA_RESOURCE    0x278
> +#define H_ADD_CONN              0x284
> +#define H_DEL_CONN              0x288
> +#define H_JOIN                  0x298
> +#define H_VASI_STATE            0x2A4
> +#define H_ENABLE_CRQ            0x2B0
> +#define H_GET_EM_PARMS          0x2B8
> +#define H_SET_MPP               0x2D0
> +#define H_GET_MPP               0x2D4
> +#define MAX_HCALL_OPCODE        H_GET_MPP
> +
> +typedef target_ulong (*spapr_hcall_fn)(CPUState *env, sPAPREnvironment *spapr,
> +                                       target_ulong opcode,
> +                                       target_ulong *args);
> +
> +void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn);
> +target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
> +                             target_ulong opcode, target_ulong *args);
> +
> +
> +#endif /* !defined (__HW_SPAPR_H__) */
> diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
> new file mode 100644
> index 0000000..6ddac00
> --- /dev/null
> +++ b/hw/spapr_hcall.c
> @@ -0,0 +1,43 @@
> +#include "sysemu.h"
> +#include "cpu.h"
> +#include "qemu-char.h"
> +#include "hw/spapr.h"
> +
> +struct hypercall {
> +    spapr_hcall_fn fn;
> +} hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];

This isn't following CODING_STYLE.
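
For reference, here is a sketch of the hypercall table written the way QEMU's CODING_STYLE appears to ask for: typedef'd CamelCase struct names, a file-local static table, and braces on every if. Several simplifications are assumptions. The handler signature is trimmed to (opcode, args), the MSR[PR] check is left out, and `target_ulong` is taken to be 64-bit. `h_stub()` is a hypothetical handler used only to exercise the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t target_ulong;      /* assumption: 64-bit target */

#define H_SUCCESS         0
#define H_FUNCTION        (-2)
#define H_PUT_TERM_CHAR   0x58
#define MAX_HCALL_OPCODE  0x2D4     /* H_GET_MPP in the quoted header */

/* Simplified handler signature: the env/spapr arguments are dropped. */
typedef target_ulong (*SpaprHcallFn)(target_ulong opcode, target_ulong *args);

/* Typedef'd CamelCase struct, and a file-local (static) table. */
typedef struct SpaprHypercall {
    SpaprHcallFn fn;
} SpaprHypercall;

static SpaprHypercall hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];

static void register_hypercall(target_ulong opcode, SpaprHcallFn fn)
{
    SpaprHypercall *hc;

    assert(opcode <= MAX_HCALL_OPCODE);
    assert((opcode & 0x3) == 0);    /* opcodes are multiples of 4 */

    hc = hypercall_table + (opcode / 4);
    assert(!hc->fn || hc->fn == fn);
    hc->fn = fn;
}

static target_ulong hypercall(target_ulong opcode, target_ulong *args)
{
    if (opcode <= MAX_HCALL_OPCODE && (opcode & 0x3) == 0) {
        SpaprHypercall *hc = hypercall_table + (opcode / 4);

        if (hc->fn) {               /* braces even around one statement */
            return hc->fn(opcode, args);
        }
    }
    return H_FUNCTION;              /* unknown or unregistered opcode */
}

/* Hypothetical handler, only here to exercise the table. */
static target_ulong h_stub(target_ulong opcode, target_ulong *args)
{
    (void)opcode;
    (void)args;
    return H_SUCCESS;
}
```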

> +void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn)
> +{
> +    struct hypercall *hc;
> +
> +    assert(opcode <= MAX_HCALL_OPCODE);
> +    assert((opcode & 0x3) == 0);
> +
> +    hc = hypercall_table + (opcode / 4);
> +
> +    assert(!hc->fn || (fn == hc->fn));
> +
> +    hc->fn = fn;
> +}
> +
> +target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
> +                             target_ulong opcode, target_ulong *args)
> +{
> +    if (msr_pr) {
> +        fprintf(stderr, "Hypercall made with MSR=0x" TARGET_FMT_lx "\n",
> +                env->msr);
> +        return H_PRIVILEGE;
> +    }
> +
> +    if ((opcode <= MAX_HCALL_OPCODE)
> +        && ((opcode & 0x3) == 0)) {
> +        struct hypercall *hc = hypercall_table + (opcode / 4);
> +
> +        if (hc->fn)
> +            return hc->fn(env, spapr, opcode, args);
> +    }
> +
> +    fprintf(stderr, "Unimplemented hcall 0x" TARGET_FMT_lx "\n", opcode);
> +    return H_FUNCTION;
> +}

Regards,

Anthony Liguori
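
The long-busy comment in the quoted hw/spapr.h implies a simple decoding: codes 9900 through 9905 hint at a retry delay of 10^(rc - 9900) milliseconds, from 1 ms up to 100 s. The sketch below uses a hypothetical `long_busy_hint_ms()` helper. It also parenthesises the macro argument, which the quoted macro leaves bare.

```c
#include <assert.h>

#define H_LONG_BUSY_START_RANGE 9900
#define H_LONG_BUSY_END_RANGE   9905

/* Same range test as the quoted macro, with 'x' parenthesised so an
 * expression argument such as H_IS_LONG_BUSY(a + b) also works. */
#define H_IS_LONG_BUSY(x) (((x) >= H_LONG_BUSY_START_RANGE) && \
                           ((x) <= H_LONG_BUSY_END_RANGE))

/* Decode the retry hint: 9900 -> 1 ms, 9901 -> 10 ms, ..., 9905 -> 100 s.
 * Returns 0 if 'rc' is not a long-busy code. */
static unsigned long long long_busy_hint_ms(long rc)
{
    unsigned long long ms = 1;
    long i;

    if (!H_IS_LONG_BUSY(rc)) {
        return 0;
    }
    for (i = H_LONG_BUSY_START_RANGE; i < rc; i++) {
        ms *= 10;
    }
    return ms;
}
```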

* Re: [Qemu-devel] [PATCH 14/26] Implement the bus structure for PAPR virtual IO
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 14/26] Implement the bus structure for PAPR virtual IO David Gibson
  2011-03-16 14:43   ` [Qemu-devel] " Alexander Graf
@ 2011-03-16 22:04   ` Anthony Liguori
  2011-03-17  3:19     ` David Gibson
  1 sibling, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2011-03-16 22:04 UTC
  To: David Gibson; +Cc: paulus, agraf, anton, qemu-devel

On 03/15/2011 11:56 PM, David Gibson wrote:
> This extends the "pseries" (PAPR) machine to include a virtual IO bus
> supporting the PAPR defined hypercall based virtual IO mechanisms.
>
> So far only one VIO device is provided, the vty / vterm, providing
> a full console (polled only, for now).
>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   Makefile.target |    3 +-
>   hw/spapr.c      |   47 ++++++++-----
>   hw/spapr.h      |    3 +
>   hw/spapr_vio.c  |  212 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/spapr_vio.h  |   50 +++++++++++++
>   hw/spapr_vty.c  |  145 +++++++++++++++++++++++++++++++++++++
>   6 files changed, 441 insertions(+), 19 deletions(-)
>   create mode 100644 hw/spapr_vio.c
>   create mode 100644 hw/spapr_vio.h
>   create mode 100644 hw/spapr_vty.c
>
> diff --git a/Makefile.target b/Makefile.target
> index e6a7557..3f2b235 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -232,7 +232,8 @@ obj-ppc-y += ppc_oldworld.o
>   # NewWorld PowerMac
>   obj-ppc-y += ppc_newworld.o
>   # IBM pSeries (sPAPR)
> -obj-ppc-y += spapr.o spapr_hcall.o
> +obj-ppc-y += spapr.o spapr_hcall.o spapr_vio.o
> +obj-ppc-y += spapr_vty.o
>   # PowerPC 4xx boards
>   obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>   obj-ppc-y += ppc440.o ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> index 8b4e16e..25e4a9e 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -25,7 +25,6 @@
>    *
>    */
>   #include "sysemu.h"
> -#include "qemu-char.h"
>   #include "hw.h"
>   #include "elf.h"
>
> @@ -34,6 +33,7 @@
>   #include "hw/loader.h"
>
>   #include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
>
>   #include <libfdt.h>
>
> @@ -58,6 +58,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>       uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
>       int i;
>       char *modelname;
> +    int ret;
>
>   #define _FDT(exp) \
>       do { \
> @@ -152,9 +153,29 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>
>       _FDT((fdt_end_node(fdt)));
>
> +    /* vdevice */
> +    _FDT((fdt_begin_node(fdt, "vdevice")));
> +
> +    _FDT((fdt_property_string(fdt, "device_type", "vdevice")));
> +    _FDT((fdt_property_string(fdt, "compatible", "IBM,vdevice")));
> +    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
> +    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
> +
> +    _FDT((fdt_end_node(fdt)));
> +
>       _FDT((fdt_end_node(fdt))); /* close root node */
>       _FDT((fdt_finish(fdt)));
>
> +    /* re-expand to allow for further tweaks */
> +    _FDT((fdt_open_into(fdt, fdt, FDT_MAX_SIZE)));
> +
> +    ret = spapr_populate_vdevice(spapr->vio_bus, fdt);
> +    if (ret < 0) {
> +        fprintf(stderr, "couldn't setup vio devices in fdt\n");
> +    }
> +
> +    _FDT((fdt_pack(fdt)));
> +
>       if (fdt_size) {
>           *fdt_size = fdt_totalsize(fdt);
>       }
> @@ -173,21 +194,6 @@ static void emulate_spapr_hypercall(CPUState *env, void *opaque)
>                                     env->gpr[3], &env->gpr[4]);
>   }
>
> -/* FIXME: hack until we implement the proper VIO console */
> -static target_ulong h_put_term_char(CPUState *env, sPAPREnvironment *spapr,
> -                                    target_ulong opcode, target_ulong *args)
> -{
> -    uint8_t buf[16];
> -
> -    stq_p(buf, args[2]);
> -    stq_p(buf + 8, args[3]);
> -
> -    qemu_chr_write(serial_hds[0], buf, args[1]);
> -
> -    return 0;
> -}
> -
> -
>   /* pSeries LPAR / sPAPR hardware init */
>   static void ppc_spapr_init(ram_addr_t ram_size,
>                              const char *boot_device,
> @@ -242,7 +248,13 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>       ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
>       cpu_register_physical_memory(0, ram_size, ram_offset);
>
> -    spapr_register_hypercall(H_PUT_TERM_CHAR, h_put_term_char);
> +    spapr->vio_bus = spapr_vio_bus_init();
> +
> +    for (i = 0; i < MAX_SERIAL_PORTS; i++) {
> +        if (serial_hds[i]) {
> +            spapr_vty_create(spapr->vio_bus, i, serial_hds[i]);
> +        }
> +    }
>
>       if (kernel_filename) {
>           uint64_t lowaddr = 0;
> @@ -274,7 +286,6 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>               initrd_base = 0;
>               initrd_size = 0;
>           }
> -
>       } else {
>           fprintf(stderr, "pSeries machine needs -kernel for now");
>           exit(1);
> diff --git a/hw/spapr.h b/hw/spapr.h
> index 9e63a19..47bf2ef 100644
> --- a/hw/spapr.h
> +++ b/hw/spapr.h
> @@ -1,7 +1,10 @@
>   #if !defined (__HW_SPAPR_H__)
>   #define __HW_SPAPR_H__
>
> +struct VIOsPAPRBus;
> +
>   typedef struct sPAPREnvironment {
> +    struct VIOsPAPRBus *vio_bus;
>   } sPAPREnvironment;
>
>   #define H_SUCCESS         0
> diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
> new file mode 100644
> index 0000000..0ed63f4
> --- /dev/null
> +++ b/hw/spapr_vio.c
> @@ -0,0 +1,212 @@
> +/*
> + * QEMU sPAPR VIO code
> + *
> + * Copyright (c) 2010 David Gibson, IBM Corporation <david@gibson.dropbear.id.au>
> + * Based on the s390 virtio bus code:
> + * Copyright (c) 2009 Alexander Graf <agraf@suse.de>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "hw.h"
> +#include "sysemu.h"
> +#include "boards.h"
> +#include "monitor.h"
> +#include "loader.h"
> +#include "elf.h"
> +#include "hw/sysbus.h"
> +#include "kvm.h"
> +#include "device_tree.h"
> +
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +
> +#ifdef CONFIG_FDT
> +#include <libfdt.h>
> +#endif /* CONFIG_FDT */
> +
> +/* #define DEBUG_SPAPR */
> +
> +#ifdef DEBUG_SPAPR
> +#define dprintf(fmt, ...) \
> +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define dprintf(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
> +static struct BusInfo spapr_vio_bus_info = {
> +    .name       = "spapr-vio",
> +    .size       = sizeof(VIOsPAPRBus),
> +};
> +
> +VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg)
> +{
> +    DeviceState *qdev;
> +    VIOsPAPRDevice *dev = NULL;
> +
> +    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
> +        dev = (VIOsPAPRDevice *)qdev;
> +        if (dev->reg == reg) {
> +            break;
> +        }
> +    }
> +
> +    return dev;
> +}
> +
> +#ifdef CONFIG_FDT
> +static int vio_make_devnode(VIOsPAPRDevice *dev,
> +                            void *fdt)
> +{
> +    VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)dev->qdev.info;
> +    int vdevice_off, node_off;
> +    int ret;
> +
> +    vdevice_off = fdt_path_offset(fdt, "/vdevice");
> +    if (vdevice_off < 0) {
> +        return vdevice_off;
> +    }
> +
> +    node_off = fdt_add_subnode(fdt, vdevice_off, dev->qdev.id);
> +    if (node_off < 0) {
> +        return node_off;
> +    }
> +
> +    ret = fdt_setprop_cell(fdt, node_off, "reg", dev->reg);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    if (info->dt_type) {
> +        ret = fdt_setprop_string(fdt, node_off, "device_type",
> +                                 info->dt_type);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
> +    if (info->dt_compatible) {
> +        ret = fdt_setprop_string(fdt, node_off, "compatible",
> +                                 info->dt_compatible);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
> +    if (info->devnode) {
> +        ret = (info->devnode)(dev, fdt, node_off);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
> +    return node_off;
> +}
> +#endif /* CONFIG_FDT */
> +
> +static int spapr_vio_busdev_init(DeviceState *dev, DeviceInfo *info)
> +{
> +    VIOsPAPRDeviceInfo *_info = (VIOsPAPRDeviceInfo *)info;
> +    VIOsPAPRDevice *_dev = (VIOsPAPRDevice *)dev;
> +    char *id;
> +
> +    if (asprintf(&id, "%s@%x", _info->dt_name, _dev->reg) < 0) {
> +        return -1;
> +    }
> +
> +    _dev->qdev.id = id;
> +
> +    return _info->init(_dev);

The C standard actually reserves the _ and __ namespaces for compilers 
and system headers.  The kernel can get away with it because it doesn't 
use system headers but we've had trouble with this in QEMU.
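For illustration, a minimal standalone sketch of building the "<dt_name>@<reg>" node id with standard snprintf() instead of asprintf() (a GNU extension), using ordinary local names rather than the underscore-prefixed _info/_dev. The helper vio_make_node_name() is hypothetical, not part of QEMU or this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a device-tree node name of the form "<name>@<reg>", e.g. "vty@0".
 * The first snprintf() call only measures the length; the second fills the
 * buffer.  Returns NULL on failure; the caller frees the result.
 */
static char *vio_make_node_name(const char *dt_name, unsigned int reg)
{
    int len;
    char *name;

    assert(dt_name != NULL);

    len = snprintf(NULL, 0, "%s@%x", dt_name, reg);
    if (len < 0) {
        return NULL;
    }

    name = malloc((size_t)len + 1);
    if (name == NULL) {
        return NULL;
    }

    snprintf(name, (size_t)len + 1, "%s@%x", dt_name, reg);
    return name;
}
```

The reg value is printed in lowercase hex without a 0x prefix, matching the "%s@%x" format used in the quoted init function.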

> +}
> +
> +void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info)
> +{
> +    info->qdev.init = spapr_vio_busdev_init;
> +    info->qdev.bus_info = &spapr_vio_bus_info;
> +
> +    assert(info->qdev.size >= sizeof(VIOsPAPRDevice));
> +    qdev_register(&info->qdev);
> +}
> +
> +VIOsPAPRBus *spapr_vio_bus_init(void)
> +{
> +    VIOsPAPRBus *bus;
> +    BusState *_bus;
> +    DeviceState *dev;
> +    DeviceInfo *_info;
> +
> +    /* Create bridge device */
> +    dev = qdev_create(NULL, "spapr-vio-bridge");
> +    qdev_init_nofail(dev);
> +
> +    /* Create bus on bridge device */
> +
> +    _bus = qbus_create(&spapr_vio_bus_info, dev, "spapr-vio");
> +    bus = DO_UPCAST(VIOsPAPRBus, bus, _bus);
> +
> +    for (_info = device_info_list; _info; _info = _info->next) {
> +        VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)_info;
> +
> +        if (_info->bus_info != &spapr_vio_bus_info)
> +            continue;
> +
> +        if (info->hcalls)
> +            info->hcalls(bus);

Got a little sloppy with braces here.
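A standalone sketch of the hook-dispatch loop in spapr_vio_bus_init(), written with the braces CODING_STYLE expects. BusInfo and DeviceInfo here are simplified, hypothetical stand-ins for the qdev types, and the hook takes no bus argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for qdev's BusInfo/DeviceInfo. */
typedef struct BusInfo {
    const char *name;
} BusInfo;

typedef struct DeviceInfo {
    const BusInfo *bus_info;
    void (*hcalls)(void);          /* optional per-device-type hook */
    struct DeviceInfo *next;
} DeviceInfo;

/*
 * Walk the registered device types and run the hypercall-registration hook
 * of every type that sits on the given bus.  Returns how many hooks ran.
 */
static int register_bus_hcalls(DeviceInfo *list, const BusInfo *bus)
{
    DeviceInfo *info;
    int count = 0;

    for (info = list; info; info = info->next) {
        if (info->bus_info != bus) {
            continue;
        }
        if (info->hcalls) {
            info->hcalls();
            count++;
        }
    }
    return count;
}

/* Example hook that just counts how often it was invoked. */
static int demo_hcalls_calls;

static void demo_hcalls(void)
{
    demo_hcalls_calls++;
}
```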

> +    }
> +
> +    return bus;
> +}
> +
> +/* Represents sPAPR hcall VIO devices */
> +
> +static int spapr_vio_bridge_init(SysBusDevice *dev)
> +{
> +    /* nothing */
> +    return 0;
> +}
> +
> +static SysBusDeviceInfo spapr_vio_bridge_info = {
> +    .init = spapr_vio_bridge_init,
> +    .qdev.name  = "spapr-vio-bridge",
> +    .qdev.size  = sizeof(SysBusDevice),
> +    .qdev.no_user = 1,
> +};
> +
> +static void spapr_vio_register_devices(void)
> +{
> +    sysbus_register_withprop(&spapr_vio_bridge_info);
> +}
> +
> +device_init(spapr_vio_register_devices)
> +
> +#ifdef CONFIG_FDT
> +int spapr_populate_vdevice(VIOsPAPRBus *bus, void *fdt)
> +{
> +    DeviceState *qdev;
> +    int ret = 0;
> +
> +    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
> +        VIOsPAPRDevice *dev = (VIOsPAPRDevice *)qdev;
> +
> +        ret = vio_make_devnode(dev, fdt);
> +
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
> +    return 0;
> +}
> +#endif /* CONFIG_FDT */
> diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
> new file mode 100644
> index 0000000..b164ad3
> --- /dev/null
> +++ b/hw/spapr_vio.h
> @@ -0,0 +1,50 @@
> +#ifndef _HW_SPAPR_VIO_H
> +#define _HW_SPAPR_VIO_H
> +/*
> + * QEMU sPAPR VIO bus definitions
> + *
> + * Copyright (c) 2010 David Gibson, IBM Corporation <david@gibson.dropbear.id.au>
> + * Based on the s390 virtio bus definitions:
> + * Copyright (c) 2009 Alexander Graf <agraf@suse.de>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +typedef struct VIOsPAPRDevice {
> +    DeviceState qdev;
> +    uint32_t reg;
> +} VIOsPAPRDevice;
> +
> +typedef struct VIOsPAPRBus {
> +    BusState bus;
> +} VIOsPAPRBus;
> +
> +typedef struct {
> +    DeviceInfo qdev;
> +    const char *dt_name, *dt_type, *dt_compatible;
> +    int (*init)(VIOsPAPRDevice *dev);
> +    void (*hcalls)(VIOsPAPRBus *bus);
> +    int (*devnode)(VIOsPAPRDevice *dev, void *fdt, int node_off);
> +} VIOsPAPRDeviceInfo;
> +
> +extern VIOsPAPRBus *spapr_vio_bus_init(void);
> +extern VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg);
> +extern void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info);
> +extern int spapr_populate_vdevice(VIOsPAPRBus *bus, void *fdt);
> +
> +void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len);
> +void spapr_vty_create(VIOsPAPRBus *bus,
> +                      uint32_t reg, CharDriverState *chardev);
> +
> +#endif /* _HW_SPAPR_VIO_H */
> diff --git a/hw/spapr_vty.c b/hw/spapr_vty.c
> new file mode 100644
> index 0000000..afc9ef9
> --- /dev/null
> +++ b/hw/spapr_vty.c
> @@ -0,0 +1,145 @@
> +#include "qdev.h"
> +#include "qemu-char.h"
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +
> +#define VTERM_BUFSIZE   16
> +
> +typedef struct VIOsPAPRVTYDevice {
> +    VIOsPAPRDevice sdev;
> +    CharDriverState *chardev;
> +    uint32_t in, out;
> +    uint8_t buf[VTERM_BUFSIZE];
> +} VIOsPAPRVTYDevice;
> +
> +static int vty_can_receive(void *opaque)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)opaque;
> +
> +    return (dev->in - dev->out) < VTERM_BUFSIZE;
> +}
> +
> +static void vty_receive(void *opaque, const uint8_t *buf, int size)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)opaque;
> +    int i;
> +
> +    for (i = 0; i < size; i++) {
> +        assert((dev->in - dev->out) < VTERM_BUFSIZE);
> +        dev->buf[dev->in++ % VTERM_BUFSIZE] = buf[i];
> +    }
> +}
> +
> +static int vty_getchars(VIOsPAPRDevice *sdev, uint8_t *buf, int max)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
> +    int n = 0;
> +
> +    while ((n < max) && (dev->out != dev->in))
> +        buf[n++] = dev->buf[dev->out++ % VTERM_BUFSIZE];
> +

We have a checkpatch.pl in the tree.  I'd suggest using it to get rid
of the rest of the CODING_STYLE issues, which I'll stop commenting on.
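The VTY buffering quoted above uses free-running in/out counters rather than wrapped indices. A minimal sketch of the same ring buffer in plain standard C, with braces on every loop body; the VtyRing type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VTERM_BUFSIZE 16

/*
 * 'in' and 'out' only ever increase; the fill level is (in - out) and a
 * slot is addressed as counter % VTERM_BUFSIZE.
 */
typedef struct {
    uint32_t in, out;
    uint8_t buf[VTERM_BUFSIZE];
} VtyRing;

/* Non-zero while there is room for at least one more byte. */
static int vty_ring_can_receive(const VtyRing *r)
{
    return (r->in - r->out) < VTERM_BUFSIZE;
}

/* Append one byte; the caller must have checked vty_ring_can_receive(). */
static void vty_ring_put(VtyRing *r, uint8_t c)
{
    assert(vty_ring_can_receive(r));
    r->buf[r->in++ % VTERM_BUFSIZE] = c;
}

/* Copy up to max buffered bytes into buf; returns the number copied. */
static int vty_ring_get(VtyRing *r, uint8_t *buf, int max)
{
    int n = 0;

    while ((n < max) && (r->out != r->in)) {
        buf[n++] = r->buf[r->out++ % VTERM_BUFSIZE];
    }
    return n;
}
```

Because the counters are unsigned, in - out stays correct when they overflow; keeping VTERM_BUFSIZE a power of two also keeps the slot index continuous across that wrap.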

Regards,

Anthony Liguori
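As quoted, spapr_vio_find_by_reg() sets dev to NULL only before the loop, so when no child matches it returns the last child examined rather than NULL (it returns NULL only for an empty bus). A sketch of a lookup that returns NULL on a miss, using a hypothetical singly linked VioDev list in place of the qdev child list:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VIO device on the bus's child list. */
typedef struct VioDev {
    uint32_t reg;
    struct VioDev *sibling;
} VioDev;

/*
 * Return the child whose "reg" matches, or NULL if none does.  Returning
 * from inside the loop means a miss can never hand back an unrelated device.
 */
static VioDev *vio_find_by_reg(VioDev *children, uint32_t reg)
{
    VioDev *dev;

    for (dev = children; dev != NULL; dev = dev->sibling) {
        if (dev->reg == reg) {
            return dev;
        }
    }
    return NULL;
}
```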

* Re: [Qemu-devel] [PATCH 16/26] Implement hcall based RTAS for pSeries machines
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 16/26] Implement hcall based RTAS for pSeries machines David Gibson
  2011-03-16 15:08   ` [Qemu-devel] " Alexander Graf
@ 2011-03-16 22:08   ` Anthony Liguori
  1 sibling, 0 replies; 82+ messages in thread
From: Anthony Liguori @ 2011-03-16 22:08 UTC
  To: David Gibson; +Cc: paulus, agraf, anton, qemu-devel

On 03/15/2011 11:56 PM, David Gibson wrote:
> diff --git a/hw/spapr_rtas.c b/hw/spapr_rtas.c
> new file mode 100644
> index 0000000..c606018
> --- /dev/null
> +++ b/hw/spapr_rtas.c
> @@ -0,0 +1,104 @@

Needs a copyright header.

Regards,

Anthony Liguori

> +#include "cpu.h"
> +#include "sysemu.h"
> +#include "qemu-char.h"
> +#include "hw/qdev.h"
> +#include "device_tree.h"
> +
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +
> +#include <libfdt.h>
> +
> +#define TOKEN_BASE      0x2000
> +#define TOKEN_MAX       0x100
> +
> +static struct rtas_call {
> +    const char *name;
> +    spapr_rtas_fn fn;
> +} rtas_table[TOKEN_MAX];
> +
> +struct rtas_call *rtas_next = rtas_table;
> +
> +target_ulong spapr_rtas_call(sPAPREnvironment *spapr,
> +                             uint32_t token, uint32_t nargs, target_ulong args,
> +                             uint32_t nret, target_ulong rets)
> +{
> +    if ((token >= TOKEN_BASE)
> +        && ((token - TOKEN_BASE) < TOKEN_MAX)) {
> +        struct rtas_call *call = rtas_table + (token - TOKEN_BASE);
> +
> +        if (call->fn) {
> +            call->fn(spapr, token, nargs, args, nret, rets);
> +            return H_SUCCESS;
> +        }
> +    }
> +
> +    fprintf(stderr, "Unknown RTAS token 0x%x\n", token);
> +    rtas_st(rets, 0, -3);
> +    return H_PARAMETER;
> +}
> +
> +void spapr_rtas_register(const char *name, spapr_rtas_fn fn)
> +{
> +    assert(rtas_next < (rtas_table + TOKEN_MAX));
> +
> +    rtas_next->name = name;
> +    rtas_next->fn = fn;
> +
> +    rtas_next++;
> +}
> +
> +int spapr_rtas_device_tree_setup(void *fdt, target_phys_addr_t rtas_addr,
> +                                 target_phys_addr_t rtas_size)
> +{
> +    int ret;
> +    int i;
> +
> +    ret = fdt_add_mem_rsv(fdt, rtas_addr, rtas_size);
> +    if (ret < 0) {
> +        fprintf(stderr, "Couldn't add RTAS reserve entry: %s\n",
> +                fdt_strerror(ret));
> +        return ret;
> +    }
> +
> +    ret = qemu_devtree_setprop_cell(fdt, "/rtas", "linux,rtas-base",
> +                                    rtas_addr);
> +    if (ret < 0) {
> +        fprintf(stderr, "Couldn't add linux,rtas-base property: %s\n",
> +                fdt_strerror(ret));
> +        return ret;
> +    }
> +
> +    ret = qemu_devtree_setprop_cell(fdt, "/rtas", "linux,rtas-entry",
> +                                    rtas_addr);
> +    if (ret < 0) {
> +        fprintf(stderr, "Couldn't add linux,rtas-entry property: %s\n",
> +                fdt_strerror(ret));
> +        return ret;
> +    }
> +
> +    ret = qemu_devtree_setprop_cell(fdt, "/rtas", "rtas-size",
> +                                    rtas_size);
> +    if (ret < 0) {
> +        fprintf(stderr, "Couldn't add rtas-size property: %s\n",
> +                fdt_strerror(ret));
> +        return ret;
> +    }
> +
> +    for (i = 0; i < TOKEN_MAX; i++) {
> +        struct rtas_call *call = &rtas_table[i];
> +
> +        if (!call->fn) {
> +            continue;
> +        }
> +
> +        ret = qemu_devtree_setprop_cell(fdt, "/rtas", call->name, i + TOKEN_BASE);
> +        if (ret < 0) {
> +            fprintf(stderr, "Couldn't add rtas token for %s: %s\n",
> +                    call->name, fdt_strerror(ret));
> +            return ret;
> +        }
> +
> +    }
> +    return 0;
> +}
> diff --git a/pc-bios/spapr-rtas.bin b/pc-bios/spapr-rtas.bin
> new file mode 100644
> index 0000000000000000000000000000000000000000..eade9c0e8ff0fd3071e3a6638a11c1a2e9a47152
> GIT binary patch
> literal 20
> bcmb<Pk*=^wC@M)vPAqm|U{LaFU{C-6M#cr<
>
> literal 0
> HcmV?d00001
>
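The RTAS code hands out tokens as TOKEN_BASE plus the slot index in rtas_table, exports each one as a /rtas property named after the call, and dispatches by range-checking the token. A simplified sketch of the register/dispatch pair, assuming a one-argument handler type (the real spapr_rtas_fn takes more arguments) and returning -3 for an unknown token, the status the quoted spapr_rtas_call() stores in rets[0]:

```c
#include <assert.h>
#include <stdint.h>

#define TOKEN_BASE 0x2000
#define TOKEN_MAX  0x100

/* Simplified, hypothetical handler type. */
typedef int (*rtas_fn)(uint32_t token);

static struct rtas_call {
    const char *name;
    rtas_fn fn;
} rtas_table[TOKEN_MAX];

static struct rtas_call *rtas_next = rtas_table;

/* Append a handler; its token is TOKEN_BASE plus its slot index. */
static uint32_t rtas_register(const char *name, rtas_fn fn)
{
    assert(rtas_next < rtas_table + TOKEN_MAX);
    rtas_next->name = name;
    rtas_next->fn = fn;
    return TOKEN_BASE + (uint32_t)(rtas_next++ - rtas_table);
}

/* Look the token up; unknown or unregistered tokens yield -3. */
static int rtas_dispatch(uint32_t token)
{
    if (token >= TOKEN_BASE && token - TOKEN_BASE < TOKEN_MAX) {
        struct rtas_call *call = &rtas_table[token - TOKEN_BASE];

        if (call->fn) {
            return call->fn(token);
        }
    }
    return -3;
}

/* Example handler: reports which slot it was dispatched from. */
static int demo_handler(uint32_t token)
{
    return (int)(token - TOKEN_BASE);
}
```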

* Re: [Qemu-devel] [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics) David Gibson
  2011-03-16 15:47   ` [Qemu-devel] " Alexander Graf
@ 2011-03-16 22:16   ` Anthony Liguori
  2011-03-17  1:34     ` David Gibson
  1 sibling, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2011-03-16 22:16 UTC
  To: David Gibson; +Cc: paulus, agraf, anton, qemu-devel

On 03/15/2011 11:56 PM, David Gibson wrote:
> PAPR defines an interrupt control architecture which is logically divided
> into ICP (Interrupt Control Presentation, each unit is responsible for
> presenting interrupts to a particular "interrupt server", i.e. CPU) and
> ICS (Interrupt Control Source, each unit responsible for one or more
> hardware interrupts as numbered globally across the system).  All PAPR
> virtual IO devices expect to deliver interrupts via this mechanism.  In
> Linux, this interrupt controller system is handled by the "xics" driver.
>
> On pSeries systems, access to the interrupt controller is virtualized via
> hypercalls and RTAS methods.  However, the virtualized interface is very
> similar to the underlying interrupt controller hardware, and similar PICs
> exist un-virtualized in some other systems.
>
> This patch implements both the ICP and ICS sides of the PAPR interrupt
> controller.  For now, only the hypercall virtualized interface is provided;
> however, it would be relatively straightforward to graft an emulated
> register interface onto the underlying interrupt logic if we want to add
> a machine with a hardware ICS/ICP system in the future.
>
> There are some limitations in this implementation: it is assumed for now
> that only one instance of the ICS exists, although a full xics system can
> have several, each responsible for a different group of hardware irqs.
> ICP/ICS can handle both level-sensitive (LSI) and message signalled (MSI)
> interrupt inputs.  For now, this implementation supports only MSI
> interrupts, since those are what PAPR virtual IO devices use.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   Makefile.target |    2 +-
>   hw/spapr.c      |   26 +++
>   hw/spapr.h      |    2 +
>   hw/xics.c       |  528 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/xics.h       |   13 ++
>   5 files changed, 570 insertions(+), 1 deletions(-)
>   create mode 100644 hw/xics.c
>   create mode 100644 hw/xics.h
>
> diff --git a/Makefile.target b/Makefile.target
> index e333225..2b0588e 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -233,7 +233,7 @@ obj-ppc-y += ppc_oldworld.o
>   obj-ppc-y += ppc_newworld.o
>   # IBM pSeries (sPAPR)
>   obj-ppc-y += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
> -obj-ppc-y += spapr_vty.o
> +obj-ppc-y += xics.o spapr_vty.o
>   # PowerPC 4xx boards
>   obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>   obj-ppc-y += ppc440.o ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> index 23f493a..be30def 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -34,6 +34,7 @@
>
>   #include "hw/spapr.h"
>   #include "hw/spapr_vio.h"
> +#include "hw/xics.h"
>
>   #include <libfdt.h>
>
> @@ -62,6 +63,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>       uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
>       uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
>       char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr";
> +    uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
>       int i;
>       char *modelname;
>       int ret;
> @@ -120,6 +122,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>
>       for (i = 0; i < smp_cpus; i++) {
>           CPUState *env = envs[i];
> +        uint32_t gserver_prop[] = {cpu_to_be32(i), 0}; /* HACK! */
>           char *nodename;
>           uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
>                              0xffffffff, 0xffffffff};
> @@ -147,6 +150,9 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>           _FDT((fdt_property(fdt, "ibm,pft-size", pft_size_prop, sizeof(pft_size_prop))));
>           _FDT((fdt_property_string(fdt, "status", "okay")));
>           _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
> +        _FDT((fdt_property_cell(fdt, "ibm,ppc-interrupt-server#s", i)));
> +        _FDT((fdt_property(fdt, "ibm,ppc-interrupt-gserver#s",
> +                           gserver_prop, sizeof(gserver_prop))));
>
>           if (envs[i]->mmu_model & POWERPC_MMU_1TSEG) {
>               _FDT((fdt_property(fdt, "ibm,processor-segment-sizes",
> @@ -168,6 +174,20 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>
>       _FDT((fdt_end_node(fdt)));
>
> +    /* interrupt controller */
> +    _FDT((fdt_begin_node(fdt, "interrupt-controller@0")));
> +
> +    _FDT((fdt_property_string(fdt, "device_type",
> +                              "PowerPC-External-Interrupt-Presentation")));
> +    _FDT((fdt_property_string(fdt, "compatible", "IBM,ppc-xicp")));
> +    _FDT((fdt_property_cell(fdt, "reg", 0)));
> +    _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
> +    _FDT((fdt_property(fdt, "ibm,interrupt-server-ranges",
> +                       interrupt_server_ranges_prop,
> +                       sizeof(interrupt_server_ranges_prop))));
> +
> +    _FDT((fdt_end_node(fdt)));
> +
>       /* vdevice */
>       _FDT((fdt_begin_node(fdt, "vdevice")));
>
> @@ -175,6 +195,8 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>       _FDT((fdt_property_string(fdt, "compatible", "IBM,vdevice")));
>       _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
>       _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
> +    _FDT((fdt_property_cell(fdt, "#interrupt-cells", 0x2)));
> +    _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
>
>       _FDT((fdt_end_node(fdt)));
>
> @@ -290,6 +312,10 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>       }
>       qemu_free(filename);
>
> +    /* Set up Interrupt Controller */
> +    spapr->icp = xics_system_init(smp_cpus, &env, MAX_SERIAL_PORTS);
> +
> +    /* Set up VIO bus */
>       spapr->vio_bus = spapr_vio_bus_init();
>
>       for (i = 0; i<  MAX_SERIAL_PORTS; i++) {
> diff --git a/hw/spapr.h b/hw/spapr.h
> index 7a7c319..4b54c22 100644
> --- a/hw/spapr.h
> +++ b/hw/spapr.h
> @@ -2,9 +2,11 @@
>   #define __HW_SPAPR_H__
>
>   struct VIOsPAPRBus;
> +struct icp_state;
>
>   typedef struct sPAPREnvironment {
>       struct VIOsPAPRBus *vio_bus;
> +    struct icp_state *icp;
>   } sPAPREnvironment;
>
>   #define H_SUCCESS         0
> diff --git a/hw/xics.c b/hw/xics.c
> new file mode 100644
> index 0000000..46e778a
> --- /dev/null
> +++ b/hw/xics.c
> @@ -0,0 +1,528 @@

Needs a copyright header.

> +#include "hw.h"
> +#include "hw/spapr.h"
> +#include "hw/xics.h"
> +
> +#include <pthread.h>

This isn't needed, and it'll break the Windows build.  We hold a global
mutex whenever QEMU code executes, so no extra locking is required here.
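Since the global mutex already serializes device emulation, the presentation decision in icp_irq() can be written without the per-server lock. A minimal sketch, assuming a hypothetical IcpServer type and returning the source number to reject instead of calling ics_reject():

```c
#include <assert.h>
#include <stdint.h>

/*
 * Per-server presentation state, modelled on struct icp_server_state but
 * without a mutex.  Lower priority numbers are more favoured; 0xff is the
 * least favoured value.
 */
typedef struct {
    uint8_t cppr;              /* current processor priority */
    uint32_t xisr;             /* source currently presented, 0 = none */
    uint8_t pending_priority;
    int output;                /* stand-in for the qemu_irq output line */
} IcpServer;

/*
 * Offer source nr at the given priority.  Returns the source that must be
 * bounced back to the ICS (the new one, or one it displaced), or 0.
 */
static uint32_t icp_offer(IcpServer *ss, uint32_t nr, uint8_t priority)
{
    uint32_t rejected = 0;

    if (priority >= ss->cppr ||
        (ss->xisr && ss->pending_priority <= priority)) {
        return nr;             /* not favoured enough to be presented */
    }
    if (ss->xisr) {
        rejected = ss->xisr;   /* displaced by a more favoured source */
    }
    ss->xisr = nr;
    ss->pending_priority = priority;
    ss->output = 1;
    return rejected;
}
```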

> +/*
> + * ICP: Presentation layer
> + */
> +
> +struct icp_server_state {
> +    uint32_t cppr :8;
> +    uint32_t xisr :24;

No real reason to use bitfields here.
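The cppr:8/xisr:24 split is exactly the XIRR layout that icp_accept() builds and icp_eoi() takes apart, so plain fields plus explicit shifts and masks are enough. A small sketch; xirr_pack() and its companions are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* XIRR: CPPR in the top 8 bits, XISR (source number) in the low 24 bits. */
#define XISR_MASK 0x00ffffffu

/*
 * Cast before shifting: an 8-bit bitfield promotes to (signed) int, and
 * shifting 0xff left by 24 would overflow it.
 */
static uint32_t xirr_pack(uint8_t cppr, uint32_t xisr)
{
    return ((uint32_t)cppr << 24) | (xisr & XISR_MASK);
}

static uint8_t xirr_cppr(uint32_t xirr)
{
    return (uint8_t)(xirr >> 24);
}

static uint32_t xirr_xisr(uint32_t xirr)
{
    return xirr & XISR_MASK;
}
```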

> +    uint8_t pending_priority;
> +    uint8_t mfrr;
> +    qemu_irq output;
> +    pthread_mutex_t lock;
> +};
> +
> +struct ics_state;
> +
> +struct icp_state {
> +    long nr_servers;
> +    struct icp_server_state *ss;
> +    struct ics_state *ics;
> +};
> +
> +static void ics_reject(struct ics_state *ics, int nr);
> +static void ics_resend(struct ics_state *ics);
> +static void ics_eoi(struct ics_state *ics, int nr);
> +
> +static void icp_check_ipi(struct icp_state *icp, int server)
> +{
> +    struct icp_server_state *ss = icp->ss + server;
> +
> +    if (ss->xisr && (ss->pending_priority <= ss->mfrr)) {
> +        return;
> +    }
> +
> +    if (ss->xisr) {
> +        ics_reject(icp->ics, ss->xisr);
> +    }
> +
> +    ss->xisr = XICS_IPI;
> +    ss->pending_priority = ss->mfrr;
> +    qemu_irq_raise(ss->output);
> +}
> +
> +static void icp_resend(struct icp_state *icp, int server)
> +{
> +    struct icp_server_state *ss = icp->ss + server;
> +
> +    if (ss->mfrr < ss->cppr) {
> +        icp_check_ipi(icp, server);
> +    }
> +    ics_resend(icp->ics);
> +}
> +
> +static void icp_set_cppr(struct icp_state *icp, int server, uint8_t cppr)
> +{
> +    struct icp_server_state *ss = icp->ss + server;
> +    uint8_t old_cppr;
> +    uint32_t old_xisr;
> +
> +    pthread_mutex_lock(&ss->lock);
> +    old_cppr = ss->cppr;
> +    ss->cppr = cppr;
> +
> +    if (cppr < old_cppr) {
> +        if (ss->xisr && (cppr <= ss->pending_priority)) {
> +            old_xisr = ss->xisr;
> +            ss->xisr = 0;
> +            qemu_irq_lower(ss->output);
> +            ics_reject(icp->ics, old_xisr);
> +        }
> +    } else {
> +        if (!ss->xisr) {
> +            icp_resend(icp, server);
> +        }
> +    }
> +    pthread_mutex_unlock(&ss->lock);
> +}
> +
> +static void icp_set_mfrr(struct icp_state *icp, int nr, uint8_t mfrr)
> +{
> +    struct icp_server_state *ss = icp->ss + nr;
> +
> +    pthread_mutex_lock(&ss->lock);
> +
> +    ss->mfrr = mfrr;
> +    if (mfrr < ss->cppr) {
> +        icp_check_ipi(icp, nr);
> +    }
> +
> +    pthread_mutex_unlock(&ss->lock);
> +}
> +
> +static uint32_t icp_accept(struct icp_server_state *ss)
> +{
> +    uint32_t xirr;
> +
> +    pthread_mutex_lock(&ss->lock);
> +    qemu_irq_lower(ss->output);
> +    xirr = ss->cppr << 24 | ss->xisr;
> +    ss->xisr = 0;
> +    ss->cppr = ss->pending_priority;
> +    pthread_mutex_unlock(&ss->lock);
> +    return xirr;
> +}
> +
> +static void icp_eoi(struct icp_state *icp, int server, uint32_t xirr)
> +{
> +    struct icp_server_state *ss = icp->ss + server;
> +
> +    ics_eoi(icp->ics, xirr & 0xffffff);
> +    /* Send EOI -> ICS */
> +    ss->cppr = xirr >> 24;
> +    if (!ss->xisr) {
> +        icp_resend(icp, server);
> +    }
> +}
> +
> +static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t priority)
> +{
> +    struct icp_server_state *ss = icp->ss + server;
> +
> +    pthread_mutex_lock(&ss->lock);
> +
> +    if ((priority >= ss->cppr)
> +        || (ss->xisr && (ss->pending_priority <= priority))) {
> +        ics_reject(icp->ics, nr);
> +    } else {
> +        if (ss->xisr) {
> +            ics_reject(icp->ics, ss->xisr);
> +        }
> +        ss->xisr = nr;
> +        ss->pending_priority = priority;
> +        qemu_irq_raise(ss->output);
> +    }
> +
> +    pthread_mutex_unlock(&ss->lock);
> +}
> +
> +/*
> + * ICS: Source layer
> + */
> +
> +struct ics_irq_state {
> +    int server;
> +    uint8_t priority;
> +    uint8_t saved_priority;
> +    /* int pending :1; */
> +    /* int presented :1; */
> +    int rejected :1;
> +    int masked_pending :1;
> +};
> +
> +struct ics_state {
> +    int nr_irqs;
> +    int offset;
> +    qemu_irq *qirqs;
> +    struct ics_irq_state *irqs;
> +    struct icp_state *icp;
> +};
> +
> +static int ics_valid_irq(struct ics_state *ics, uint32_t nr)
> +{
> +    return (nr >= ics->offset)
> +        && (nr < (ics->offset + ics->nr_irqs));
> +}
> +
> +static void ics_set_irq_msi(void *opaque, int nr, int val)
> +{
> +    struct ics_state *ics = (struct ics_state *)opaque;
> +    struct ics_irq_state *irq = ics->irqs + nr;
> +
> +    if (val) {
> +        if (irq->priority == 0xff) {
> +            irq->masked_pending = 1;
> +            /* masked pending */ ;
> +        } else  {
> +            icp_irq(ics->icp, irq->server, nr + ics->offset, irq->priority);
> +        }
> +    }
> +}
> +
> +static void ics_reject_msi(struct ics_state *ics, int nr)
> +{
> +    struct ics_irq_state *irq = ics->irqs + nr - ics->offset;
> +
> +    irq->rejected = 1;
> +}
> +
> +static void ics_resend_msi(struct ics_state *ics)
> +{
> +    int i;
> +
> +    for (i = 0; i < ics->nr_irqs; i++) {
> +        struct ics_irq_state *irq = ics->irqs + i;
> +
> +        /* FIXME: filter by server#? */
> +        if (irq->rejected) {
> +            irq->rejected = 0;
> +            if (irq->priority != 0xff) {
> +                icp_irq(ics->icp, irq->server, i + ics->offset, irq->priority);
> +            }
> +        }
> +    }
> +}
> +
> +static void ics_write_xive_msi(struct ics_state *ics, int nr, int server,
> +                               uint8_t priority)
> +{
> +    struct ics_irq_state *irq = ics->irqs + nr;
> +
> +    irq->server = server;
> +    irq->priority = priority;
> +
> +    if (!irq->masked_pending || (priority = 0xff)) {
> +        return;
> +    }
> +
> +    irq->masked_pending = 0;
> +    icp_irq(ics->icp, server, nr + ics->offset, priority);
> +}
> +
> +/* static void ics_recheck_irq(struct ics_state *ics, int nr) */

This is a pretty ugly way to comment out code.  At least use an #if 0.
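Separately, the early-return test in ics_write_xive_msi() as quoted uses (priority = 0xff), an assignment, so the condition is always true and an MSI that arrived while the source was masked is never replayed once the source is unmasked. A sketch of the intended logic with ==, assuming a simplified IcsIrq type and a hypothetical XICS_MASKED name for 0xff:

```c
#include <assert.h>
#include <stdint.h>

#define XICS_MASKED 0xff   /* hypothetical name: priority meaning "masked" */

/* Simplified per-source state (a subset of struct ics_irq_state). */
typedef struct {
    int server;
    uint8_t priority;
    int masked_pending;    /* an MSI arrived while the source was masked */
} IcsIrq;

/*
 * Update one source's server and priority.  Returns 1 if a held MSI should
 * now be delivered (the caller would then call icp_irq()), 0 otherwise.
 */
static int ics_write_xive(IcsIrq *irq, int server, uint8_t priority)
{
    irq->server = server;
    irq->priority = priority;

    if (!irq->masked_pending || priority == XICS_MASKED) {
        return 0;
    }
    irq->masked_pending = 0;
    return 1;
}
```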

Regards,

Anthony Liguori

* Re: [Qemu-devel] [PATCH 21/26] Implement TCE translation for sPAPR VIO
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 21/26] Implement TCE translation for sPAPR VIO David Gibson
  2011-03-16 16:03   ` [Qemu-devel] " Alexander Graf
@ 2011-03-16 22:20   ` Anthony Liguori
  2011-03-18  1:58     ` David Gibson
  1 sibling, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2011-03-16 22:20 UTC
  To: David Gibson; +Cc: paulus, agraf, anton, qemu-devel

On 03/15/2011 11:56 PM, David Gibson wrote:
> From: Ben Herrenschmidt <benh@kernel.crashing.org>
>
> This patch implements the necessary infrastructure and hypercalls for
> sPAPR's TCE (Translation Control Entry) IOMMU mechanism.  This is necessary
> for all virtual IO devices which do DMA (i.e. nearly all of them).
>
> Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   hw/spapr.c     |    3 +-
>   hw/spapr_vio.c |  232 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/spapr_vio.h |   32 ++++++++
>   3 files changed, 266 insertions(+), 1 deletions(-)
>
> diff --git a/hw/spapr.c b/hw/spapr.c
> index e7f8864..a362889 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -62,7 +62,8 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>       uint32_t start_prop = cpu_to_be32(initrd_base);
>       uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
>       uint32_t pft_size_prop[] = {0, cpu_to_be32(hash_shift)};
> -    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt";
> +    char hypertas_prop[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
> +        "\0hcall-tce";
>       uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
>       int i;
>       char *modelname;
> diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
> index 45edd94..37cf51e 100644
> --- a/hw/spapr_vio.c
> +++ b/hw/spapr_vio.c
> @@ -37,6 +37,7 @@
>   #endif /* CONFIG_FDT */
>
>   /* #define DEBUG_SPAPR */
> +/* #define DEBUG_TCE */
>
>   #ifdef DEBUG_SPAPR
>   #define dprintf(fmt, ...) \
> @@ -114,6 +115,28 @@ static int vio_make_devnode(VIOsPAPRDevice *dev,
>               return ret;
>       }
>
> +    if (dev->rtce_window_size) {
> +        uint32_t dma_prop[] = {cpu_to_be32(dev->reg),
> +                               0, 0,
> +                               0, cpu_to_be32(dev->rtce_window_size)};
> +
> +        ret = fdt_setprop_cell(fdt, node_off, "ibm,#dma-address-cells", 2);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        ret = fdt_setprop_cell(fdt, node_off, "ibm,#dma-size-cells", 2);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        ret = fdt_setprop(fdt, node_off, "ibm,my-dma-window", dma_prop,
> +                          sizeof(dma_prop));
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
>       if (info->devnode) {
>           ret = (info->devnode)(dev, fdt, node_off);
>           if (ret < 0) {
> @@ -125,6 +148,210 @@ static int vio_make_devnode(VIOsPAPRDevice *dev,
>   }
>   #endif /* CONFIG_FDT */
>
> +/*
> + * RTCE handling
> + */
> +
> +static void rtce_init(VIOsPAPRDevice *dev)
> +{
> +    size_t size = (dev->rtce_window_size >> SPAPR_VIO_TCE_PAGE_SHIFT)
> +        * sizeof(VIOsPAPR_RTCE);
> +
> +    if (size) {
> +        dev->rtce_table = qemu_mallocz(size);
> +    }
> +}
> +
> +static target_ulong h_put_tce(CPUState *env, sPAPREnvironment *spapr,
> +                              target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong liobn = args[0];
> +    target_ulong ioba = args[1];
> +    target_ulong tce = args[2];
> +    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, liobn);
> +    VIOsPAPR_RTCE *rtce;
> +
> +    if (!dev) {
> +        fprintf(stderr, "spapr_vio_put_tce on non-existent LIOBN "
> +                TARGET_FMT_lx "\n",
> +                liobn);

You generally want to avoid guest-triggered fprintf()s, as they can be
exploited (e.g. to fill the host's disk) in scenarios where QEMU's
output is logged to disk (libvirt).  We usually wrap this in a DPRINTF()
of some sort.
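A minimal sketch of that pattern follows. It assumes a DEBUG_SPAPR switch like the one already in spapr_vio.c. The DPRINTF name, h_put_tce_sketch() and the liobn_exists() stub are hypothetical, and the H_* values are assumed to match PAPR's return codes.

```c
#include <stdint.h>
#include <stdio.h>

/* Define DEBUG_SPAPR to get diagnostics on stderr while debugging. */
/* #define DEBUG_SPAPR */

#ifdef DEBUG_SPAPR
#define DPRINTF(fmt, ...) \
    do { fprintf(stderr, "spapr: " fmt, ## __VA_ARGS__); } while (0)
#else
#define DPRINTF(fmt, ...) \
    do { } while (0)
#endif

/* Assumed hcall return codes. */
#define H_SUCCESS    0
#define H_PARAMETER -4

/* Hypothetical stand-in for spapr_vio_find_by_reg(): only LIOBN 0x1000 exists. */
static int liobn_exists(uint64_t liobn)
{
    return liobn == 0x1000;
}

static long h_put_tce_sketch(uint64_t liobn)
{
    if (!liobn_exists(liobn)) {
        /* The LIOBN comes from the guest, so only log it in debug builds. */
        DPRINTF("H_PUT_TCE on non-existent LIOBN 0x%llx\n",
                (unsigned long long)liobn);
        return H_PARAMETER;
    }
    return H_SUCCESS;
}
```

In a non-debug build the macro expands to an empty statement. A misbehaving guest can still get H_PARAMETER back, but it cannot use this path to write to the host's logs.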

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH 22/26] Implement sPAPR Virtual LAN (ibmveth)
  2011-03-16  4:56 ` [Qemu-devel] [PATCH 22/26] Implement sPAPR Virtual LAN (ibmveth) David Gibson
  2011-03-16 16:12   ` [Qemu-devel] " Alexander Graf
@ 2011-03-16 22:29   ` Anthony Liguori
  2011-03-17  2:09     ` David Gibson
  1 sibling, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2011-03-16 22:29 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, agraf, anton, qemu-devel

On 03/15/2011 11:56 PM, David Gibson wrote:
> This patch implements the PAPR specified Inter Virtual Machine Logical
> LAN; that is the virtual hardware used by the Linux ibmveth driver.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>   Makefile.target |    2 +-
>   hw/spapr.c      |   21 +++-
>   hw/spapr_llan.c |  476 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/spapr_vio.h  |    9 +-
>   4 files changed, 503 insertions(+), 5 deletions(-)
>   create mode 100644 hw/spapr_llan.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 2b0588e..ef86d43 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -233,7 +233,7 @@ obj-ppc-y += ppc_oldworld.o
>   obj-ppc-y += ppc_newworld.o
>   # IBM pSeries (sPAPR)
>   obj-ppc-y += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
> -obj-ppc-y += xics.o spapr_vty.o
> +obj-ppc-y += xics.o spapr_vty.o spapr_llan.o
>   # PowerPC 4xx boards
>   obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>   obj-ppc-y += ppc440.o ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> index a362889..44cf3cc 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -27,6 +27,7 @@
>   #include "sysemu.h"
>   #include "hw.h"
>   #include "elf.h"
> +#include "net.h"
>
>   #include "hw/boards.h"
>   #include "hw/ppc.h"
> @@ -315,7 +316,7 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>       qemu_free(filename);
>
>       /* Set up Interrupt Controller */
> -    spapr->icp = xics_system_init(smp_cpus, &env, MAX_SERIAL_PORTS);
> +    spapr->icp = xics_system_init(smp_cpus, envs, MAX_SERIAL_PORTS + nb_nics);
>
>       /* Set up VIO bus */
>       spapr->vio_bus = spapr_vio_bus_init();
> @@ -327,6 +328,24 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>           }
>       }
>
> +    for (i = 0; i < nb_nics; i++, irq++) {
> +        NICInfo *nd = &nd_table[i];
> +
> +        if (!nd->model) {
> +            nd->model = qemu_strdup("ibmveth");
> +        }
> +
> +        if (strcmp(nd->model, "ibmveth") == 0) {
> +            spapr_vlan_create(spapr->vio_bus, 0x1000 + i, nd,
> +                              xics_find_qirq(spapr->icp, irq), irq);
> +        } else {
> +            fprintf(stderr, "pSeries (sPAPR) platform does not support "
> +                    "NIC model '%s' (only ibmveth is supported)\n",
> +                    nd->model);
> +            exit(1);
> +        }
> +    }
> +
>       if (kernel_filename) {
>           uint64_t lowaddr = 0;
>
> diff --git a/hw/spapr_llan.c b/hw/spapr_llan.c
> new file mode 100644
> index 0000000..da0562d
> --- /dev/null
> +++ b/hw/spapr_llan.c
> @@ -0,0 +1,476 @@
> +#include "hw.h"
> +#include "net.h"
> +#include "hw/qdev.h"
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +
> +#include <libfdt.h>
> +
> +#define ETH_ALEN        6
> +
> +//#define DEBUG
> +
> +#ifdef DEBUG
> +#define dprintf(fmt...) do { fprintf(stderr, fmt); } while(0)
> +#else
> +#define dprintf(fmt...)
> +#endif
> +
> +/*
> + * Virtual LAN device
> + */
> +
> +typedef uint64_t vlan_bd_t;
> +
> +#define VLAN_BD_VALID        0x8000000000000000ULL
> +#define VLAN_BD_TOGGLE       0x4000000000000000ULL
> +#define VLAN_BD_NO_CSUM      0x0200000000000000ULL
> +#define VLAN_BD_CSUM_GOOD    0x0100000000000000ULL
> +#define VLAN_BD_LEN_MASK     0x00ffffff00000000ULL
> +#define VLAN_BD_LEN(bd)      (((bd) & VLAN_BD_LEN_MASK) >> 32)
> +#define VLAN_BD_ADDR_MASK    0x00000000ffffffffULL
> +#define VLAN_BD_ADDR(bd)     ((bd) & VLAN_BD_ADDR_MASK)
> +
> +#define VLAN_VALID_BD(addr, len) (VLAN_BD_VALID | \
> +                                  (((len) << 32) & VLAN_BD_LEN_MASK) |  \
> +                                  (addr & VLAN_BD_ADDR_MASK))
> +
> +#define VLAN_RXQC_TOGGLE     0x80
> +#define VLAN_RXQC_VALID      0x40
> +#define VLAN_RXQC_NO_CSUM    0x02
> +#define VLAN_RXQC_CSUM_GOOD  0x01
> +
> +#define VLAN_RQ_ALIGNMENT    16
> +#define VLAN_RXQ_BD_OFF      0
> +#define VLAN_FILTER_BD_OFF   8
> +#define VLAN_RX_BDS_OFF      16
> +#define VLAN_MAX_BUFS        ((SPAPR_VIO_TCE_PAGE_SIZE - VLAN_RX_BDS_OFF) / 8)
> +
> +typedef struct VIOsPAPRVLANDevice {
> +    VIOsPAPRDevice sdev;
> +    NICConf nicconf;
> +    NICState *nic;
> +    int isopen;
> +    target_ulong buf_list;
> +    int add_buf_ptr, use_buf_ptr, rx_bufs;
> +    target_ulong rxq_ptr;
> +} VIOsPAPRVLANDevice;
> +
> +static int spapr_vlan_can_receive(VLANClientState *nc)
> +{
> +    VIOsPAPRVLANDevice *dev = DO_UPCAST(NICState, nc, nc)->opaque;
> +
> +    return (dev->isopen && dev->rx_bufs > 0);
> +}
> +
> +static ssize_t spapr_vlan_receive(VLANClientState *nc, const uint8_t *buf,
> +                                  size_t size)
> +{
> +    VIOsPAPRDevice *sdev = DO_UPCAST(NICState, nc, nc)->opaque;
> +    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
> +    vlan_bd_t rxq_bd = ldq_tce(sdev, dev->buf_list + VLAN_RXQ_BD_OFF);
> +    vlan_bd_t bd;
> +    int buf_ptr = dev->use_buf_ptr;
> +    uint64_t handle;
> +    uint8_t control;
> +
> +    dprintf("spapr_vlan_receive() [%s] rx_bufs=%d\n", sdev->qdev.id,
> +            dev->rx_bufs);
> +
> +    if (!dev->isopen) {
> +        return -1;
> +    }
> +
> +    if (!dev->rx_bufs) {
> +        return -1;
> +    }
> +
> +    do {
> +        buf_ptr += 8;
> +        if (buf_ptr >= SPAPR_VIO_TCE_PAGE_SIZE) {
> +            buf_ptr = VLAN_RX_BDS_OFF;
> +        }
> +
> +        bd = ldq_tce(sdev, dev->buf_list + buf_ptr);
> +        dprintf("use_buf_ptr=%d bd=0x%016llx\n",
> +                buf_ptr, (unsigned long long)bd);
> +    } while ((!(bd & VLAN_BD_VALID) || (VLAN_BD_LEN(bd) < (size + 8)))
> +             && (buf_ptr != dev->use_buf_ptr));
> +
> +    if (!(bd & VLAN_BD_VALID) || (VLAN_BD_LEN(bd) < (size + 8))) {
> +        /* Failed to find a suitable buffer */
> +        return -1;
> +    }
> +
> +    /* Remove the buffer from the pool */
> +    dev->rx_bufs--;
> +    dev->use_buf_ptr = buf_ptr;
> +    stq_tce(sdev, dev->buf_list + dev->use_buf_ptr, 0);
> +
> +    dprintf("Found buffer: ptr=%d num=%d\n", dev->use_buf_ptr, dev->rx_bufs);
> +
> +    /* Transfer the packet data */
> +    if (spapr_tce_dma_write(sdev, VLAN_BD_ADDR(bd) + 8, buf, size) < 0) {
> +        return -1;
> +    }
> +
> +    dprintf("spapr_vlan_receive: DMA write completed\n");
> +
> +    /* Update the receive queue */
> +    control = VLAN_RXQC_TOGGLE | VLAN_RXQC_VALID;
> +    if (rxq_bd & VLAN_BD_TOGGLE) {
> +        control ^= VLAN_RXQC_TOGGLE;
> +    }
> +
> +    handle = ldq_tce(sdev, VLAN_BD_ADDR(bd));
> +    stq_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 8, handle);
> +    stw_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 4, size);
> +    sth_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 2, 8);
> +    stb_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr, control);
> +
> +    dprintf("wrote rxq entry (ptr=0x%llx): 0x%016llx 0x%016llx\n",
> +            (unsigned long long)dev->rxq_ptr,
> +            (unsigned long long)ldq_tce(sdev, VLAN_BD_ADDR(rxq_bd) +
> +                                        dev->rxq_ptr),
> +            (unsigned long long)ldq_tce(sdev, VLAN_BD_ADDR(rxq_bd) +
> +                                        dev->rxq_ptr + 8));
> +
> +    dev->rxq_ptr += 16;
> +    if (dev->rxq_ptr >= VLAN_BD_LEN(rxq_bd)) {
> +        dev->rxq_ptr = 0;
> +        stq_tce(sdev, dev->buf_list + VLAN_RXQ_BD_OFF, rxq_bd ^ VLAN_BD_TOGGLE);
> +    }
> +
> +    if (sdev->signal_state & 1) {
> +        qemu_irq_pulse(sdev->qirq);
> +    }
> +
> +    return size;
> +}
> +
> +static NetClientInfo net_spapr_vlan_info = {
> +    .type = NET_CLIENT_TYPE_NIC,
> +    .size = sizeof(NICState),
> +    .can_receive = spapr_vlan_can_receive,
> +    .receive = spapr_vlan_receive,
> +};
> +
> +static int spapr_vlan_init(VIOsPAPRDevice *sdev)
> +{
> +    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
> +    VIOsPAPRBus *bus;
> +
> +    bus = DO_UPCAST(VIOsPAPRBus, bus, sdev->qdev.parent_bus);
> +
> +    qemu_macaddr_default_if_unset(&dev->nicconf.macaddr);
> +
> +    dev->nic = qemu_new_nic(&net_spapr_vlan_info, &dev->nicconf,
> +                            sdev->qdev.info->name, sdev->qdev.id, dev);
> +    qemu_format_nic_info_str(&dev->nic->nc, dev->nicconf.macaddr.a);
> +
> +    return 0;
> +}
> +
> +void spapr_vlan_create(VIOsPAPRBus *bus, uint32_t reg, NICInfo *nd,
> +                       qemu_irq qirq, uint32_t vio_irq_num)
> +{
> +    DeviceState *dev;
> +    VIOsPAPRDevice *sdev;
> +
> +    dev = qdev_create(&bus->bus, "spapr-vlan");
> +    qdev_prop_set_uint32(dev, "reg", reg);
> +
> +    qdev_set_nic_properties(dev, nd);
> +
> +    qdev_init_nofail(dev);
> +    sdev = (VIOsPAPRDevice *)dev;
> +    sdev->qirq = qirq;
> +    sdev->vio_irq_num = vio_irq_num;
> +}
> +
> +static int spapr_vlan_devnode(VIOsPAPRDevice *dev, void *fdt, int node_off)
> +{
> +    VIOsPAPRVLANDevice *vdev = (VIOsPAPRVLANDevice *)dev;
> +    int ret;
> +
> +    ret = fdt_setprop(fdt, node_off, "local-mac-address",
> +                      &vdev->nicconf.macaddr, ETH_ALEN);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = fdt_setprop_cell(fdt, node_off, "ibm,mac-address-filters", 0);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    return 0;
> +}
> +
> +static int check_bd(VIOsPAPRVLANDevice *dev, vlan_bd_t bd, target_ulong alignment)
> +{
> +    if ((VLAN_BD_ADDR(bd) % alignment)
> +        || (VLAN_BD_LEN(bd) % alignment)) {
> +        return -1;
> +    }
> +
> +    if (spapr_vio_check_tces(&dev->sdev, VLAN_BD_ADDR(bd),
> +                             VLAN_BD_LEN(bd), SPAPR_TCE_RW) != 0) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static target_ulong h_register_logical_lan(CPUState *env, sPAPREnvironment *spapr,
> +                                           target_ulong opcode,
> +                                           target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    target_ulong buf_list = args[1];
> +    target_ulong rec_queue = args[2];
> +    target_ulong filter_list = args[3];
> +//    target_ulong mac_address = args[4];
> +    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
> +    vlan_bd_t filter_list_bd;
> +#ifdef DEBUG
> +    target_ulong mac_address = args[4];
> +#endif
> +
> +    if (!dev) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (dev->isopen) {
> +        fprintf(stderr, "H_REGISTER_LOGICAL_LAN called twice without "
> +                "H_FREE_LOGICAL_LAN\n");
> +        return H_RESOURCE;
> +    }
> +
> +    if (check_bd(dev, VLAN_VALID_BD(buf_list, SPAPR_VIO_TCE_PAGE_SIZE),
> +                 SPAPR_VIO_TCE_PAGE_SIZE) < 0) {
> +        fprintf(stderr, "Bad buf_list 0x" TARGET_FMT_lx
> +                " for H_REGISTER_LOGICAL_LAN\n", buf_list);
> +        return H_PARAMETER;
> +    }
> +
> +    filter_list_bd = VLAN_VALID_BD(filter_list, SPAPR_VIO_TCE_PAGE_SIZE);
> +    if (check_bd(dev, filter_list_bd, SPAPR_VIO_TCE_PAGE_SIZE) < 0) {
> +        fprintf(stderr, "Bad filter_list 0x" TARGET_FMT_lx
> +                " for H_REGISTER_LOGICAL_LAN\n", filter_list);
> +        return H_PARAMETER;
> +    }
> +
> +    if (!(rec_queue & VLAN_BD_VALID)
> +        || (check_bd(dev, rec_queue, VLAN_RQ_ALIGNMENT) < 0)) {
> +        fprintf(stderr, "Bad receive queue for H_REGISTER_LOGICAL_LAN\n");
> +        return H_PARAMETER;
> +    }
> +
> +    dev->buf_list = buf_list;
> +    sdev->signal_state = 0;
> +
> +    rec_queue &= ~VLAN_BD_TOGGLE;
> +
> +    /* Initialize the buffer list */
> +    stq_tce(sdev, buf_list, rec_queue);
> +    stq_tce(sdev, buf_list + 8, filter_list_bd);
> +    spapr_tce_dma_zero(sdev, buf_list + VLAN_RX_BDS_OFF,
> +                       SPAPR_VIO_TCE_PAGE_SIZE - VLAN_RX_BDS_OFF);
> +    dev->add_buf_ptr = VLAN_RX_BDS_OFF - 8;
> +    dev->use_buf_ptr = VLAN_RX_BDS_OFF - 8;
> +    dev->rx_bufs = 0;
> +    dev->rxq_ptr = 0;
> +
> +    /* Initialize the receive queue */
> +    spapr_tce_dma_zero(sdev, VLAN_BD_ADDR(rec_queue), VLAN_BD_LEN(rec_queue));
> +
> +    dev->isopen = 1;
> +    return H_SUCCESS;
> +}
> +
> +
> +static target_ulong h_free_logical_lan(CPUState *env, sPAPREnvironment *spapr,
> +                                       target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
> +
> +    if (!dev) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (!dev->isopen) {
> +        fprintf(stderr, "H_FREE_LOGICAL_LAN called without "
> +                "H_REGISTER_LOGICAL_LAN\n");
> +        return H_RESOURCE;
> +    }
> +
> +    dev->buf_list = 0;
> +    dev->rx_bufs = 0;
> +    dev->isopen = 0;
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_add_logical_lan_buffer(CPUState *env, sPAPREnvironment *spapr,
> +                                             target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    target_ulong buf = args[1];
> +    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
> +    vlan_bd_t bd;
> +
> +    dprintf("H_ADD_LOGICAL_LAN_BUFFER(0x" TARGET_FMT_lx
> +            ", 0x" TARGET_FMT_lx ")\n", reg, buf);
> +
> +    if (!sdev) {
> +        fprintf(stderr, "Wrong device in h_add_logical_lan_buffer\n");
> +        return H_PARAMETER;
> +    }
> +
> +    if ((check_bd(dev, buf, 4) < 0)
> +        || (VLAN_BD_LEN(buf) < 16)) {
> +        fprintf(stderr, "Bad buffer enqueued in h_add_logical_lan_buffer\n");
> +        return H_PARAMETER;
> +    }
> +
> +    if (!dev->isopen || dev->rx_bufs >= VLAN_MAX_BUFS) {
> +        return H_RESOURCE;
> +    }
> +
> +    do {
> +        dev->add_buf_ptr += 8;
> +        if (dev->add_buf_ptr >= SPAPR_VIO_TCE_PAGE_SIZE) {
> +            dev->add_buf_ptr = VLAN_RX_BDS_OFF;
> +        }
> +
> +        bd = ldq_tce(sdev, dev->buf_list + dev->add_buf_ptr);
> +    } while (bd & VLAN_BD_VALID);
> +
> +    stq_tce(sdev, dev->buf_list + dev->add_buf_ptr, buf);
> +
> +    dev->rx_bufs++;
> +
> +    dprintf("h_add_logical_lan_buffer():  Added buf  ptr=%d  rx_bufs=%d"
> +            " bd=0x%016llx\n", dev->add_buf_ptr, dev->rx_bufs,
> +            (unsigned long long)buf);
> +
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_send_logical_lan(CPUState *env, sPAPREnvironment *spapr,
> +                                       target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong reg = args[0];
> +    target_ulong *bufs = args + 1;
> +    target_ulong continue_token = args[7];
> +    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +    VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
> +    unsigned total_len;
> +    uint8_t *lbuf, *p;
> +    int i, nbufs;
> +    int ret = H_SUCCESS;
> +
> +    dprintf("H_SEND_LOGICAL_LAN(0x" TARGET_FMT_lx ", <bufs>, 0x"
> +            TARGET_FMT_lx ")\n", reg, continue_token);
> +
> +    if (!sdev) {
> +        return H_PARAMETER;
> +    }
> +
> +    dprintf("rxbufs = %d\n", dev->rx_bufs);
> +
> +    if (!dev->isopen) {
> +        return H_DROPPED;
> +    }
> +
> +    if (continue_token) {
> +        return H_HARDWARE; /* FIXME actually handle this */
> +    }
> +
> +    total_len = 0;
> +    for (i = 0; i < 6; i++) {
> +        dprintf("   buf desc: 0x" TARGET_FMT_lx "\n", bufs[i]);
> +        if (!(bufs[i] & VLAN_BD_VALID)) {
> +            break;
> +        }
> +        total_len += VLAN_BD_LEN(bufs[i]);
> +    }
> +
> +    nbufs = i;
> +    dprintf("h_send_logical_lan() %d buffers, total length 0x%x\n",
> +            nbufs, total_len);
> +
> +    if (total_len == 0) {
> +        return ret;
> +    }
> +
> +    lbuf = qemu_mallocz(total_len);
> +    p = lbuf;
> +    for (i = 0; i < nbufs; i++) {
> +        ret = spapr_tce_dma_read(sdev, VLAN_BD_ADDR(bufs[i]),
> +                                 p, VLAN_BD_LEN(bufs[i]));
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        p += VLAN_BD_LEN(bufs[i]);
> +    }

I don't like the idea that there's a guest-driven allocation that can
reach ~100 MB here (up to six descriptors of up to 16 MB each).  I'd
suggest that we at least limit total_len to 64k to be on the safe side,
since a packet can't be larger than that anyway.
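The ~100 MB figure comes from the descriptor format. VLAN_BD_LEN() is a 24-bit field, and h_send_logical_lan() accepts up to six descriptors, so the worst case is 6 x 0xffffff = 100663290 bytes.

Below is a sketch of the suggested check, reusing the patch's descriptor macros. VLAN_MAX_SEND_BDS, VLAN_MAX_SEND_LEN and send_total_len() are hypothetical names, not part of the patch.

```c
#include <stdint.h>

#define VLAN_BD_VALID      0x8000000000000000ULL
#define VLAN_BD_LEN_MASK   0x00ffffff00000000ULL
#define VLAN_BD_LEN(bd)    (((bd) & VLAN_BD_LEN_MASK) >> 32)

#define VLAN_MAX_SEND_BDS  6
#define VLAN_MAX_SEND_LEN  65536   /* hypothetical name for the suggested 64k cap */

/*
 * Sum the lengths of the leading valid descriptors, as the patch does,
 * but give up as soon as the total exceeds the cap.
 * Returns the total length, or -1 if the request is too large.
 */
static long send_total_len(const uint64_t bufs[VLAN_MAX_SEND_BDS])
{
    uint64_t total = 0;
    int i;

    for (i = 0; i < VLAN_MAX_SEND_BDS; i++) {
        if (!(bufs[i] & VLAN_BD_VALID)) {
            break;
        }
        total += VLAN_BD_LEN(bufs[i]);
        if (total > VLAN_MAX_SEND_LEN) {
            return -1;
        }
    }
    return (long)total;
}
```

With the total bounded up front, the allocation size no longer depends on arbitrary guest-supplied lengths.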

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH 13/26] Start implementing pSeries logical partition machine
  2011-03-16 21:59   ` [Qemu-devel] " Anthony Liguori
@ 2011-03-16 23:46     ` Alexander Graf
  2011-03-17  3:08     ` David Gibson
  1 sibling, 0 replies; 82+ messages in thread
From: Alexander Graf @ 2011-03-16 23:46 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: paulus, qemu-devel, anton, David Gibson

On 16.03.2011, at 22:59, Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/15/2011 11:56 PM, David Gibson wrote:
>> This patch adds a "pseries" machine to qemu.  This aims to emulate a
>> logical partition on an IBM pSeries machine, compliant to the
>> "PowerPC Architecture Platform Requirements" (PAPR) document.
> 
> Can we call the machine 'papr' or at least 'lpar'?
> 
> Technically speaking, System P is the proper name these days, but I think papr or lpar would make a lot more sense to people.

I actually find the name pretty nice. It gives you what you'd expect without knowing IBM acronyms.

LPAR is just plain wrong semantically. It's a different dimension.

'papr' would work, but then I'd rather go for 'spapr', as there is also ePAPR.


Alex

> 

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: [PATCH 15/26] Virtual hash page table handling on pSeries machine'
  2011-03-16 15:03   ` [Qemu-devel] " Alexander Graf
@ 2011-03-17  1:03     ` David Gibson
  2011-03-17  7:35       ` Alexander Graf
  0 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-17  1:03 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Wed, Mar 16, 2011 at 04:03:47PM +0100, Alexander Graf wrote:
> On 03/16/2011 05:56 AM, David Gibson wrote:
[snip]
> >@@ -248,6 +261,16 @@ static void ppc_spapr_init(ram_addr_t ram_size,
> >      ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
> >      cpu_register_physical_memory(0, ram_size, ram_offset);
> >
> >+    /* allocate hash page table */
> >+    htab_size = 1ULL << (pteg_shift + 7);
> 
> Linux makes the htab size depend on the provided amount of ram.
> Shouldn't we do the same?

Well... maybe.  In fact, the guidelines for hash table sizing tend to be
rather larger than a Linux guest really needs, so 16 MB for the hash
table will generally be fine.  This also matches the allocation for the
guest hash table in our experimental KVM code (making the hash table
exactly one hugepage makes the necessary contiguous allocation easier).
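Here is the arithmetic behind htab_size = 1ULL << (pteg_shift + 7). Each PTEG holds eight 16-byte HPTEs, which is 128 = 2^7 bytes, so a 16 MB table corresponds to pteg_shift = 17.

The RAM-scaled variant below is only a hypothetical illustration of the suggestion above. The ratio parameter, the 16 MB floor and the upper bound on the shift are assumptions, not Linux's actual sizing rule.

```c
#include <stdint.h>

/* Each PTEG is 8 HPTEs x 16 bytes = 128 bytes = 1 << 7. */
#define HASH_PTEG_SIZE_SHIFT 7

static uint64_t htab_size_for(unsigned pteg_shift)
{
    return 1ULL << (pteg_shift + HASH_PTEG_SIZE_SHIFT);
}

/*
 * Hypothetical RAM-dependent sizing: the smallest power-of-two table that
 * is at least ram_size >> ratio_shift bytes, never below 16 MB (1 << 24).
 * The 46 upper bound is an arbitrary limit for this sketch.
 * Returns log2 of the table size in bytes.
 */
static unsigned htab_shift_for_ram(uint64_t ram_size, unsigned ratio_shift)
{
    uint64_t want = ram_size >> ratio_shift;
    unsigned shift = 24;

    while (shift < 46 && (1ULL << shift) < want) {
        shift++;
    }
    return shift;
}
```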

[snip]
> >+    r &= ~(HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
> >+           HPTE_R_KEY_HI | HPTE_R_KEY_LO);
> >+    r |= (flags << 55) & HPTE_R_PP0;
> >+    r |= (flags << 48) & HPTE_R_KEY_HI;
> >+    r |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
> >+    rb = compute_tlbie_rb(v, r, pte_index);
> >+    stq_p(hpte, v & ~HPTE_V_VALID);
> >+    //ppc_tlb_invalidate_one(env, rb);
> 
> Huh?
> 
> >+    tlb_flush(env, 1);
> 
> Wow, why do you need a full tlb flush here?

Ah, I meant to revert that and forgot.  Originally I wasn't sure whether
compute_tlbie_rb() was deducing enough of the full virtual address to
make a targeted TLB invalidate safe.  I've since discovered that it is,
but fixing this up to take advantage of it fell through the cracks.

Fixing that for the next version now.
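The flag shuffling in the quoted hunk can be checked in isolation. This is a minimal sketch, and apply_protect_flags() is a hypothetical name for the quoted lines.

The HPTE_R_* values are assumptions, taken to match the 64-bit hash MMU layout used by Linux:

- PP0 in bit 63
- KEY_HI in bits 60-61
- N, PP and KEY_LO in the low bits

```c
#include <stdint.h>

/* Assumed HPTE second-doubleword bit positions. */
#define HPTE_R_PP0     0x8000000000000000ULL
#define HPTE_R_KEY_HI  0x3000000000000000ULL
#define HPTE_R_KEY_LO  0x0000000000000e00ULL
#define HPTE_R_N       0x0000000000000004ULL
#define HPTE_R_PP      0x0000000000000003ULL

/* Replace the protection bits of HPTE word 'r' with those encoded in 'flags'. */
static uint64_t apply_protect_flags(uint64_t r, uint64_t flags)
{
    r &= ~(HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
           HPTE_R_KEY_HI | HPTE_R_KEY_LO);
    r |= (flags << 55) & HPTE_R_PP0;     /* flags bit 8 -> bit 63 */
    r |= (flags << 48) & HPTE_R_KEY_HI;  /* flags bits 12-13 -> bits 60-61 */
    r |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO); /* copied in place */
    return r;
}
```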

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: [PATCH 16/26] Implement hcall based RTAS for pSeries machines
  2011-03-16 15:08   ` [Qemu-devel] " Alexander Graf
@ 2011-03-17  1:22     ` David Gibson
  2011-03-17  7:36       ` Alexander Graf
  0 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-17  1:22 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Wed, Mar 16, 2011 at 04:08:30PM +0100, Alexander Graf wrote:
> On 03/16/2011 05:56 AM, David Gibson wrote:
[snip]
> >diff --git a/pc-bios/spapr-rtas.bin b/pc-bios/spapr-rtas.bin
> >new file mode 100644
> >index 0000000000000000000000000000000000000000..eade9c0e8ff0fd3071e3a6638a11c1a2e9a47152
> >GIT binary patch
> >literal 20
> >bcmb<Pk*=^wC@M)vPAqm|U{LaFU{C-6M#cr<
> >
> >literal 0
> >HcmV?d00001
> 
> Despite being very simple, this is missing source code. There needs
> to at least be a reference on where to find it in some text file in
> pc-bios.

Good point.  Can I just include the source here?  Having an external
reference to a 5-line asm file seems crazy...

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics)
  2011-03-16 15:47   ` [Qemu-devel] " Alexander Graf
@ 2011-03-17  1:29     ` David Gibson
  2011-03-17  7:37       ` Alexander Graf
  0 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-17  1:29 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Wed, Mar 16, 2011 at 04:47:11PM +0100, Alexander Graf wrote:
> On 03/16/2011 05:56 AM, David Gibson wrote:
[snip]
> >+/* static void ics_resend(struct icp_server_state *ss) */
> >+/* { */
> >+/*     int i; */
> >+
> >+/*     for (i = 0; i < xics->nr_irqs; i++) */
> >+/*         ics_resend_irq(xics, nr, ss); */
> >+/* } */
> 
> Why is all this commented out? Better #if 0 it all away. Or even
> better, don't include it in the patch - unless you think the code is
> crucial and to be activated soon.

Hrm, it was supposed to implement level (rather than message)
interrupts on XICS.  But I think it's bitrotted since I commented it
out.  Removed.

> >diff --git a/hw/xics.h b/hw/xics.h
> >new file mode 100644
> >index 0000000..e55f5f1
> >--- /dev/null
> >+++ b/hw/xics.h
> 
> Header missing

I'm not sure what you mean by this.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics)
  2011-03-16 22:16   ` [Qemu-devel] " Anthony Liguori
@ 2011-03-17  1:34     ` David Gibson
  2011-03-17 13:13       ` Anthony Liguori
  0 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-17  1:34 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: paulus, agraf, anton, qemu-devel

On Wed, Mar 16, 2011 at 05:16:07PM -0500, Anthony Liguori wrote:
> On 03/15/2011 11:56 PM, David Gibson wrote:
[snip]
> >+#include <pthread.h>
> 
> This isn't needed and it'll break the Windows build.   We carry a
> global mutex whenever QEMU code executes.

Good point; I wrote this before I realized that all QEMU code is serialized.

> >+/*
> >+ * ICP: Presentation layer
> >+ */
> >+
> >+struct icp_server_state {
> >+    uint32_t cppr :8;
> >+    uint32_t xisr :24;
> 
> No real reason to use bitfields here.

Well... in the hardware XICS implementation, CPPR and XISR are
considered fields of a single 32-bit register, XIRR.  Matching that is
why I used the bitfields.
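If the bitfields go, one option is to keep XIRR as a plain uint32_t and extract the fields with shifts. This also pins the layout down explicitly, since the ordering of bitfields within a word is implementation-defined in C.

A sketch with hypothetical helper names, assuming CPPR occupies the top 8 bits of XIRR and XISR the low 24:

```c
#include <stdint.h>

#define XIRR_CPPR_SHIFT 24
#define XIRR_XISR_MASK  0x00ffffffu

/* Build an XIRR value from its two fields. */
static uint32_t xirr_pack(uint8_t cppr, uint32_t xisr)
{
    return ((uint32_t)cppr << XIRR_CPPR_SHIFT) | (xisr & XIRR_XISR_MASK);
}

/* Current processor priority: top 8 bits. */
static uint8_t xirr_cppr(uint32_t xirr)
{
    return (uint8_t)(xirr >> XIRR_CPPR_SHIFT);
}

/* External interrupt source: low 24 bits. */
static uint32_t xirr_xisr(uint32_t xirr)
{
    return xirr & XIRR_XISR_MASK;
}
```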

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: [PATCH 19/26] Add PAPR H_VIO_SIGNAL hypercall and infrastructure for VIO interrupts
  2011-03-16 15:49   ` [Qemu-devel] " Alexander Graf
@ 2011-03-17  1:38     ` David Gibson
  2011-03-17  7:38       ` Alexander Graf
  0 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-17  1:38 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Wed, Mar 16, 2011 at 04:49:07PM +0100, Alexander Graf wrote:
> On 03/16/2011 05:56 AM, David Gibson wrote:
[snip]
> >+        return H_PARAMETER;;
> >+
> >+    dev->signal_state = mode;
> 
> No need to notify the device?

No; the device checks signal_state at the point where it would send an
interrupt.

That said, I was considering a further cleanup of the signal code so
that devices use a helper function that checks signal_state, and also
to allow multiple interrupts per VIO device.  I'm not sure whether it's
worth folding that into this series or doing it as a later extension.
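A minimal sketch of the kind of helper described above. VioDevSketch, vio_irq_pulse_sketch() and the irqs_raised counter (standing in for qemu_irq_pulse()) are all hypothetical. Only bit 0 of signal_state is assumed to enable interrupts, as in the receive path of the ibmveth patch.

```c
#include <stdint.h>

/* Minimal stand-in for VIOsPAPRDevice with only the fields used here. */
typedef struct {
    uint64_t signal_state;   /* mode set by the guest via H_VIO_SIGNAL */
    int irqs_raised;         /* counts pulses; stands in for qemu_irq_pulse() */
} VioDevSketch;

/*
 * Hypothetical helper: devices call this instead of testing signal_state
 * themselves.  Returns 1 if an interrupt was raised, 0 if suppressed.
 */
static int vio_irq_pulse_sketch(VioDevSketch *dev)
{
    if (!(dev->signal_state & 1)) {
        return 0;            /* guest has interrupts disabled for this device */
    }
    dev->irqs_raised++;
    return 1;
}
```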

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: [PATCH 21/26] Implement TCE translation for sPAPR VIO
  2011-03-16 16:03   ` [Qemu-devel] " Alexander Graf
  2011-03-16 20:05     ` Benjamin Herrenschmidt
@ 2011-03-17  1:43     ` David Gibson
  1 sibling, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-17  1:43 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Wed, Mar 16, 2011 at 05:03:54PM +0100, Alexander Graf wrote:
> On 03/16/2011 05:56 AM, David Gibson wrote:
> >From: Ben Herrenschmidt <benh@kernel.crashing.org>
[snip]
> >+/* XX Might want to special case KVM for speed ? */
> 
> XXX

Comment removed.  In fact, we've now implemented the KVM acceleration
in a later series of patches.
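The core of TCE translation is a table lookup indexed by the I/O bus address. The table needs one entry per 4 KB page of the DMA window, as in rtce_init().

This is a simplified sketch under a few assumptions:

- 4 KB TCE pages (SPAPR_VIO_TCE_PAGE_SHIFT = 12 in the patch).
- Read/write permission bits in the low bits of each entry; the TCE_RO/TCE_WO/TCE_RW values here are assumptions.
- tce_translate() is a hypothetical name, not the patch's function.

```c
#include <stddef.h>
#include <stdint.h>

#define TCE_PAGE_SHIFT 12
#define TCE_PAGE_SIZE  (1ULL << TCE_PAGE_SHIFT)
#define TCE_PAGE_MASK  (TCE_PAGE_SIZE - 1)

/* Assumed permission encoding in the low bits of a TCE. */
#define TCE_RO 1ULL
#define TCE_WO 2ULL
#define TCE_RW 3ULL

/*
 * Translate an I/O bus address (ioba) through a flat TCE table.
 * 'need' is the access required (TCE_RO, TCE_WO or TCE_RW).
 * Returns 0 and stores the guest physical address in *gpa, or -1 if the
 * address is outside the DMA window or the entry lacks the permission.
 */
static int tce_translate(const uint64_t *table, size_t nr_entries,
                         uint64_t ioba, uint64_t need, uint64_t *gpa)
{
    uint64_t idx = ioba >> TCE_PAGE_SHIFT;
    uint64_t tce;

    if (idx >= nr_entries) {
        return -1;
    }
    tce = table[idx];
    if ((tce & need) != need) {
        return -1;
    }
    *gpa = (tce & ~TCE_PAGE_MASK) | (ioba & TCE_PAGE_MASK);
    return 0;
}
```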

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: [PATCH 22/26] Implement sPAPR Virtual LAN (ibmveth)
  2011-03-16 16:12   ` [Qemu-devel] " Alexander Graf
@ 2011-03-17  2:04     ` David Gibson
  0 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-17  2:04 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Wed, Mar 16, 2011 at 05:12:17PM +0100, Alexander Graf wrote:
> On 03/16/2011 05:56 AM, David Gibson wrote:
[snip]
> >+#include "hw.h"
> >+#include "net.h"
> >+#include "hw/qdev.h"
> >+#include "hw/spapr.h"
> >+#include "hw/spapr_vio.h"
> >+
> >+#include <libfdt.h>
> 
> Hrm - might be good to protect compilation against existence of fdt
> then?

Ah, yeah, you've reminded me.  This appears in a number of places: we
basically have to have fdt support for the pseries platform.  What's
the right way to make the whole machine definition conditional upon
libfdt?

Incidentally, might it be a good idea to include libfdt in the QEMU
tree and make it always-on instead of configurable?

[snip]
> >+    lbuf = qemu_mallocz(total_len);
> 
> Do you really need the zeroing here? In fact, this looks like a good
> candidate for alloca :).

Ah, good idea.

[snip]
> >-#define SPAPR_VIO_TCE_PAGE_SHIFT	12
> >-#define SPAPR_VIO_TCE_PAGE_SIZE		(1ULL << SPAPR_VIO_TCE_PAGE_SHIFT)
> >-#define SPAPR_VIO_TCE_PAGE_MASK		(SPAPR_VIO_TCE_PAGE_SIZE - 1)
> >+#define SPAPR_VIO_TCE_PAGE_SHIFT   12
> >+#define SPAPR_VIO_TCE_PAGE_SIZE    (1ULL << SPAPR_VIO_TCE_PAGE_SHIFT)
> >+#define SPAPR_VIO_TCE_PAGE_MASK    (SPAPR_VIO_TCE_PAGE_SIZE - 1)
> 
> Those shouldn't have been tabs in the first place :)

Fixed in the original patch now.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH 22/26] Implement sPAPR Virtual LAN (ibmveth)
  2011-03-16 22:29   ` [Qemu-devel] " Anthony Liguori
@ 2011-03-17  2:09     ` David Gibson
  0 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-17  2:09 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: paulus, agraf, anton, qemu-devel

On Wed, Mar 16, 2011 at 05:29:48PM -0500, Anthony Liguori wrote:
> On 03/15/2011 11:56 PM, David Gibson wrote:
[snip]
> >+    lbuf = qemu_mallocz(total_len);
> >+    p = lbuf;
> >+    for (i = 0; i < nbufs; i++) {
> >+        ret = spapr_tce_dma_read(sdev, VLAN_BD_ADDR(bufs[i]),
> >+                                 p, VLAN_BD_LEN(bufs[i]));
> >+        if (ret < 0) {
> >+            goto out;
> >+        }
> >+
> >+        p += VLAN_BD_LEN(bufs[i]);
> >+    }
> 
> I don't like the idea that there's a guest-driven allocation that
> can reach 100 MB here.  I'd suggest that we at least limit total_len
> to 64k to be on the safe side, since a packet can't be larger than
> that anyway.

Ah, a very good point.  Done.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: [PATCH 25/26] Add a PAPR TCE-bypass mechanism for the pSeries machine
  2011-03-16 16:43   ` [Qemu-devel] " Alexander Graf
@ 2011-03-17  2:21     ` David Gibson
  2011-03-17  3:25       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-17  2:21 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Wed, Mar 16, 2011 at 05:43:55PM +0100, Alexander Graf wrote:
> On 03/16/2011 05:57 AM, David Gibson wrote:
> >From: Ben Herrenschmidt <benh@kernel.crashing.org>
> >
> >Usually, PAPR virtual IO devices use a virtual IOMMU mechanism, TCEs,
> >to mediate all DMA transfers.  While this is necessary for some sorts of
> >operation, it can be complex to program and slow for others.
> >
> >This patch implements a mechanism for bypassing TCE translation, treating
> >"IO" addresses as plain (guest) physical memory addresses.  This has two
> >main uses:
> >  * Simple, but 64-bit aware programs like firmwares can use the VIO devices
> >without the complexity of TCE setup.
> >  * The guest OS can optionally use the TCE bypass to improve performance in
> >suitable situations.
> >
> >The mechanism used is a per-device flag which disables TCE translation.
> >The flag is toggled with some (hypervisor-implemented) RTAS methods.
> 
> Is this an official extension used by anyone or is it your own
> invention that's not implemented in pHyp?

The latter.
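The per-device bypass flag described in the patch can be pictured as a DMA translation routine that skips the TCE lookup when the flag is set. Everything below is an assumption for illustration and does not come from the patch. This includes the `VioDev` structure, the 4 KiB IO page size, and treating the low two TCE bits as read/write permissions:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TCE_PAGE_SHIFT 12                          /* assumed 4 KiB IO pages */
#define TCE_PAGE_MASK  ((1ULL << TCE_PAGE_SHIFT) - 1)
#define TCE_PERM_MASK  0x3ULL                      /* assumed read/write bits */

typedef struct VioDev {
    bool tce_bypass;        /* toggled by the (hypervisor-implemented) RTAS calls */
    const uint64_t *tces;   /* one TCE per IO page */
    size_t nr_tces;
} VioDev;

/*
 * Translate an IO address into a guest physical address.  With the
 * bypass flag set, the IO address is used unchanged.  Returns false
 * if the address is beyond the TCE table or the TCE grants no access.
 */
static bool vio_dma_translate(const VioDev *dev, uint64_t ioba, uint64_t *gpa)
{
    uint64_t idx, tce;

    if (dev->tce_bypass) {
        *gpa = ioba;
        return true;
    }

    idx = ioba >> TCE_PAGE_SHIFT;
    if (idx >= dev->nr_tces) {
        return false;
    }
    tce = dev->tces[idx];
    if (!(tce & TCE_PERM_MASK)) {
        return false;
    }
    /* Page frame from the TCE, offset within the page from the IO address. */
    *gpa = (tce & ~TCE_PAGE_MASK) | (ioba & TCE_PAGE_MASK);
    return true;
}
```

Firmware such as SLOF would only ever take the first branch, which is why the bypass spares it any TCE setup.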

* Re: [Qemu-devel] Re: [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC
  2011-03-16 16:58     ` Stefan Hajnoczi
@ 2011-03-17  2:26       ` David Gibson
  0 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-17  2:26 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Alexander Graf, paulus, qemu-devel, anton

On Wed, Mar 16, 2011 at 04:58:41PM +0000, Stefan Hajnoczi wrote:
> On Wed, Mar 16, 2011 at 1:46 PM, Alexander Graf <agraf@suse.de> wrote:
> > On 03/16/2011 05:56 AM, David Gibson wrote:
[snip]
> scripts/checkpatch.pl is there to automate style checking.  That's the
> easiest way to check patches before submitting them.

Ah, thanks.  I was hoping for a tool like that but somehow missed it.

* Re: [Qemu-devel] [PATCH 13/26] Start implementing pSeries logical partition machine
  2011-03-16 21:59   ` [Qemu-devel] " Anthony Liguori
  2011-03-16 23:46     ` Alexander Graf
@ 2011-03-17  3:08     ` David Gibson
  1 sibling, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-17  3:08 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: paulus, agraf, anton, qemu-devel

On Wed, Mar 16, 2011 at 04:59:22PM -0500, Anthony Liguori wrote:
> On 03/15/2011 11:56 PM, David Gibson wrote:
> >This patch adds a "pseries" machine to qemu.  This aims to emulate a
> >logical partition on an IBM pSeries machine, compliant to the
> >"PowerPC Architecture Platform Requirements" (PAPR) document.
> 
> Can we call the machine 'papr' or at least 'lpar'
> 
> Technically speaking, System P is the proper name these days, but I
> think papr or lpar would make a lot more sense to people.

Well, I thought about renaming it to "spapr", but we thought "pseries"
was a name far more likely to be familiar to most people.

* Re: [Qemu-devel] [PATCH 14/26] Implement the bus structure for PAPR virtual IO
  2011-03-16 22:04   ` [Qemu-devel] " Anthony Liguori
@ 2011-03-17  3:19     ` David Gibson
  0 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-17  3:19 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: paulus, agraf, anton, qemu-devel

On Wed, Mar 16, 2011 at 05:04:43PM -0500, Anthony Liguori wrote:
> On 03/15/2011 11:56 PM, David Gibson wrote:
[snip]
> >+static int spapr_vio_busdev_init(DeviceState *dev, DeviceInfo *info)
> >+{
> >+    VIOsPAPRDeviceInfo *_info = (VIOsPAPRDeviceInfo *)info;
> >+    VIOsPAPRDevice *_dev = (VIOsPAPRDevice *)dev;
> >+    char *id;
> >+
> >+    if (asprintf(&id, "%s@%x", _info->dt_name, _dev->reg) < 0) {
> >+        return -1;
> >+    }
> >+
> >+    _dev->qdev.id = id;
> >+
> >+    return _info->init(_dev);
> 
> The C standard actually reserves the _ and __ namespaces for
> compilers and system headers.  The kernel can get away with it
> because it doesn't use system headers but we've had trouble with
> this in QEMU.

Ok, I was just following the example of the s390 code here.
Nonetheless I've changed it.
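The renamed init code might look like this. It uses plain `info`/`dev` names instead of the reserved leading-underscore ones, and it puts braces on every `if` body as QEMU's CODING_STYLE asks. The structures are hypothetical stand-ins for the qdev types. The string is sized with standard `snprintf()` rather than the GNU `asprintf()` used in the patch, purely to keep the sketch portable:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the qdev structures in the patch. */
typedef struct VioDeviceInfo {
    const char *dt_name;
} VioDeviceInfo;

typedef struct VioDevice {
    uint32_t reg;
    char *id;
} VioDevice;

/* Build the "<dt_name>@<reg>" device id; returns 0 on success, -1 on error. */
static int vio_busdev_set_id(VioDevice *dev, const VioDeviceInfo *info)
{
    int len = snprintf(NULL, 0, "%s@%x", info->dt_name, (unsigned)dev->reg);
    char *id;

    if (len < 0) {
        return -1;
    }
    id = malloc((size_t)len + 1);
    if (!id) {
        return -1;
    }
    snprintf(id, (size_t)len + 1, "%s@%x", info->dt_name, (unsigned)dev->reg);
    dev->id = id;
    return 0;
}
```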

[snip]
> >+        if (info->hcalls)
> >+            info->hcalls(bus);
> 
> Got a little sloppy with braces here..

Not so much sloppy as just very, very used to the kernel style.

[snip]
> We have a checkpatch.pl in the tree.  I'd suggest using that to get
> rid of the rest of the CODING_STYLE issues which I'll stop
> commenting on.

Yeah, believe it or not I did already fix up a lot of these.  I
thought I'd got nearly all of them, but obviously not :(.  Either that
or I did get them and lost the changes in the git mishap I had a while
back.

* [Qemu-devel] Re: [PATCH 25/26] Add a PAPR TCE-bypass mechanism for the pSeries machine
  2011-03-17  2:21     ` David Gibson
@ 2011-03-17  3:25       ` Benjamin Herrenschmidt
  2011-03-17  7:44         ` Alexander Graf
  0 siblings, 1 reply; 82+ messages in thread
From: Benjamin Herrenschmidt @ 2011-03-17  3:25 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, Alexander Graf, anton, qemu-devel

On Thu, 2011-03-17 at 13:21 +1100, David Gibson wrote:
> > Is this an official extension used by anyone or is it your own
> > invention that's not implemented in pHyp?
> 
> The latter.

The main reason is to avoid having to deal with TCEs in SLOF :-)

Cheers,
Ben.

* Re: [Qemu-devel] [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC
  2011-03-16 20:44   ` [Qemu-devel] " Anthony Liguori
@ 2011-03-17  4:55     ` David Gibson
  2011-03-17 13:20       ` Anthony Liguori
  0 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-17  4:55 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: paulus, agraf, anton, qemu-devel

On Wed, Mar 16, 2011 at 03:44:49PM -0500, Anthony Liguori wrote:
> On 03/15/2011 11:56 PM, David Gibson wrote:
> >From: David Gibson <dwg@au1.ibm.com>
> >
> >PowerPC and POWER chips since the POWER4 and 970 have a special
> >hypervisor mode, and a corresponding form of the system call
> >instruction which traps to the hypervisor.
> >
> >qemu currently has stub implementations of hypervisor mode.  That
> >is, the outline is there to allow qemu to run a PowerPC hypervisor
> >under emulation.  There are a number of details missing so this
> >won't actually work at present, but the idea is there.
> >
> >What there is no provision for at all is for qemu to instead emulate
> >the hypervisor itself.  That is to have hypercalls trap into qemu
> >and their result be emulated from qemu, rather than running
> >hypervisor code within the emulated system.
> >
> >Hypervisor hardware aware KVM implementations are in the works and
> >it would  be useful for debugging and development to also allow
> >full emulation of the same para-virtualized guests as such a KVM.
> >
> >Therefore, this patch adds a hook which will allow a machine to
> >set up emulation of hypervisor calls.
> >
> >Signed-off-by: David Gibson <dwg@au1.ibm.com>
> >---
> >  target-ppc/cpu.h    |    2 ++
> >  target-ppc/helper.c |    4 ++++
> >  2 files changed, 6 insertions(+), 0 deletions(-)
> >
> >diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> >index a20c132..eaddc27 100644
> >--- a/target-ppc/cpu.h
> >+++ b/target-ppc/cpu.h
> >@@ -692,6 +692,8 @@ struct CPUPPCState {
> >      int bfd_mach;
> >      uint32_t flags;
> >      uint64_t insns_flags;
> >+    void (*emulate_hypercall)(CPUState *, void *);
> >+    void *hcall_opaque;
> 
> Is the hypercall handler ever specific to a CPU?

If you mean, "is the hypercall environment ever different from one cpu
to another within the same guest at the same time", then no.  Or at
least, no for any platform that exists now, and anything plausible I
can think of.

If you mean can the hypercall ABI and handling be different for
different CPU models within an architecture, then yes.  It's not there
yet, but BookE CPUs *will* have a quite different hypercall
environment to the PAPR hypercall environment used on IBM servers.

> I'd prefer to see this as a generic interface that wasn't specific
> to target-ppc.

> 
> Basically, add a:
> 
> void cpu_hypercall(CPUState *env);
> 
> And then implement it within your target.

I'm not exactly sure what you mean by "target" here.  It is *not*
sufficient to make the hypercall function per guest architecture, it
must be per machine.  However, it could be a global hook rather than
in the CPUState.

>  I'm not sure I get the
> opaque argument.

Well, my hypercall code needs to get at various device structures
established during machine init.  I use the opaque argument to pass a
context with this information, rather than having globals for the
things I need.  I could use a global instead, if you'd prefer.
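Here is a sketch of the opaque-argument approach described above. The machine registers a handler together with a pointer to its own state, and the handler recovers that state from the opaque argument. The types are hypothetical simplifications, and only the hook signature mirrors the `emulate_hypercall` field in the patch. Returning the status in r3 is an assumption of the sketch:

```c
#include <stdint.h>

/* Simplified, hypothetical CPU state: only the GPRs matter here. */
typedef struct CpuState {
    uint64_t gpr[32];
} CpuState;

/* Hypothetical machine state created during machine init. */
typedef struct SpaprMachine {
    uint64_t hcalls_seen;
} SpaprMachine;

/* Same shape as the patch's emulate_hypercall hook. */
typedef void (*hcall_fn)(CpuState *env, void *opaque);

/*
 * The machine context arrives through the opaque pointer, so the
 * handler needs no file-scope globals to find its device state.
 */
static void spapr_emulate_hcall(CpuState *env, void *opaque)
{
    SpaprMachine *spapr = opaque;

    spapr->hcalls_seen++;
    env->gpr[3] = 0;   /* assumed: status returned in r3, 0 = success */
}
```

Machine init would store `spapr_emulate_hcall` and a pointer to its `SpaprMachine` in the hook and opaque fields, and the trap path would then call `hook(env, opaque)`.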

* [Qemu-devel] Re: [PATCH 15/26] Virtual hash page table handling on pSeries machine'
  2011-03-17  1:03     ` [Qemu-devel] Re: [PATCH 15/26] Virtual hash page table handling on pSeries machine' David Gibson
@ 2011-03-17  7:35       ` Alexander Graf
  0 siblings, 0 replies; 82+ messages in thread
From: Alexander Graf @ 2011-03-17  7:35 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 17.03.2011, at 02:03, David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Mar 16, 2011 at 04:03:47PM +0100, Alexander Graf wrote:
>> On 03/16/2011 05:56 AM, David Gibson wrote:
> [snip]
>>> @@ -248,6 +261,16 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>>>     ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
>>>     cpu_register_physical_memory(0, ram_size, ram_offset);
>>> 
>>> +    /* allocate hash page table */
>>> +    htab_size = 1ULL << (pteg_shift + 7);
>> 
>> Linux makes the htab size depend on the provided amount of ram.
>> Shouldn't we do the same?
> 
> Well... maybe.  In fact the guidelines for hash allocation tend to be
> rather larger than really necessary for a Linux guest, so generally
> 16mb for the hash will be fine.  This does also correspond to the
> allocation for the guest hash we use in our experimental kvm code
> (making the hash exactly one hugepage makes the necessary contiguous
> allocation easier).

Hrm - ok :).

Alex
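The fixed 16MB choice follows directly from the quoted formula. The sketch below spells out that arithmetic. For comparison, it also shows a RAM-proportional sizing of the kind Alex suggests. The ratio and minimum there are illustrative parameters, not values taken from Linux or from the patch, and both function names are hypothetical:

```c
#include <stdint.h>

/*
 * Each PTE group (PTEG) of the 64-bit hash MMU holds 8 HPTEs of
 * 16 bytes, i.e. 128 = 2^7 bytes, so 2^pteg_shift groups occupy
 * 2^(pteg_shift + 7) bytes.
 */
static uint64_t htab_size_from_shift(unsigned pteg_shift)
{
    return 1ULL << (pteg_shift + 7);
}

/*
 * Hypothetical RAM-proportional alternative: returns log2 of the
 * smallest power-of-two table size that is at least
 * ram_size >> ratio_shift, and never below 2^min_shift bytes.
 */
static unsigned htab_shift_for_ram(uint64_t ram_size, unsigned ratio_shift,
                                   unsigned min_shift)
{
    uint64_t target = ram_size >> ratio_shift;
    unsigned shift = min_shift;

    while ((1ULL << shift) < target) {
        shift++;
    }
    return shift;
}
```

With `pteg_shift = 17`, the table is 2^24 bytes (16 MiB), which is exactly one 16MB hugepage. Under the illustrative 1/128 ratio, a 2 GiB guest would get the same size, while smaller guests would get less.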

* [Qemu-devel] Re: [PATCH 16/26] Implement hcall based RTAS for pSeries machines
  2011-03-17  1:22     ` David Gibson
@ 2011-03-17  7:36       ` Alexander Graf
  0 siblings, 0 replies; 82+ messages in thread
From: Alexander Graf @ 2011-03-17  7:36 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton

On 17.03.2011, at 02:22, David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Mar 16, 2011 at 04:08:30PM +0100, Alexander Graf wrote:
>> On 03/16/2011 05:56 AM, David Gibson wrote:
> [snip]
>>> diff --git a/pc-bios/spapr-rtas.bin b/pc-bios/spapr-rtas.bin
>>> new file mode 100644
>>> index 0000000000000000000000000000000000000000..eade9c0e8ff0fd3071e3a6638a11c1a2e9a47152
>>> GIT binary patch
>>> literal 20
>>> bcmb<Pk*=^wC@M)vPAqm|U{LaFU{C-6M#cr<
>>> 
>>> literal 0
>>> HcmV?d00001
>> 
>> Despite being very simple, this is missing source code. There needs
>> to at least be a reference on where to find it in some text file in
>> pc-bios.
> 
> Good point.  Can I just include the source here?  Having an external
> reference to a 5-line asm file seems crazy...

Sure - we already have the source code in-tree for some x86 option roms.

Alex

* [Qemu-devel] Re: [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics)
  2011-03-17  1:29     ` David Gibson
@ 2011-03-17  7:37       ` Alexander Graf
  0 siblings, 0 replies; 82+ messages in thread
From: Alexander Graf @ 2011-03-17  7:37 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 17.03.2011, at 02:29, David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Mar 16, 2011 at 04:47:11PM +0100, Alexander Graf wrote:
>> On 03/16/2011 05:56 AM, David Gibson wrote:
> [snip]
>>> +/* static void ics_resend(struct icp_server_state *ss) */
>>> +/* { */
>>> +/*     int i; */
>>> +
>>> +/*     for (i = 0; i < xics->nr_irqs; i++) */
>>> +/*         ics_resend_irq(xics, nr, ss); */
>>> +/* } */
>> 
>> Why is all this commented out? Better #if 0 it all away. Or even
>> better, don't include it in the patch - unless you think the code is
>> crucial and to be activated soon.
> 
> Hrm, it was supposed to implement level (rather than message)
> interrupts on XICS.  But I think it's bitrotted since I commented it
> out.  Removed.
> 
>>> diff --git a/hw/xics.h b/hw/xics.h
>>> new file mode 100644
>>> index 0000000..e55f5f1
>>> --- /dev/null
>>> +++ b/hw/xics.h
>> 
>> Header missing
> 
> I'm not sure what you mean by this

Every source file should have a license/copyright header ;)

Alex

* [Qemu-devel] Re: [PATCH 19/26] Add PAPR H_VIO_SIGNAL hypercall and infrastructure for VIO interrupts
  2011-03-17  1:38     ` David Gibson
@ 2011-03-17  7:38       ` Alexander Graf
  0 siblings, 0 replies; 82+ messages in thread
From: Alexander Graf @ 2011-03-17  7:38 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 17.03.2011, at 02:38, David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Mar 16, 2011 at 04:49:07PM +0100, Alexander Graf wrote:
>> On 03/16/2011 05:56 AM, David Gibson wrote:
> [snip]
>>> +        return H_PARAMETER;;
>>> +
>>> +    dev->signal_state = mode;
>> 
>> No need to notify the device?
> 
> No, at the point it would send an interrupt the device checks
> signal_state.
> 
> That said, I was considering another cleanup to the signal code to
> have the devices use a helper function checking signal_state, and also
> allowing multiple interrupts for a VIO device.  Not sure if it's worth
> folding that into this series or doing it as a later extension.

It's fine to leave it as is for now.
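Here is a sketch of the helper David describes. Devices consult `signal_state` at the moment they would interrupt, so H_VIO_SIGNAL only needs to record the mode. The names and the encoding (bit 0 of `signal_state` enables interrupts) are assumptions, because the thread does not show the actual mode values:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: bit 0 of signal_state enables interrupts. */
#define VIO_SIGNAL_ENABLED 0x1u

typedef struct VioDev {
    uint32_t signal_state;   /* last mode stored by H_VIO_SIGNAL */
    int irq;
    unsigned delivered;      /* stand-in for raising a real interrupt line */
} VioDev;

/*
 * Devices call this instead of raising their interrupt directly.
 * Returns true if the interrupt was delivered, false if the guest
 * has disabled signalling for this device.
 */
static bool vio_irq_pulse(VioDev *dev)
{
    if (!(dev->signal_state & VIO_SIGNAL_ENABLED)) {
        return false;
    }
    dev->delivered++;
    return true;
}
```

Funnelling every device through one helper like this would also give a single place to extend if a VIO device ever needs more than one interrupt.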

* [Qemu-devel] Re: [PATCH 25/26] Add a PAPR TCE-bypass mechanism for the pSeries machine
  2011-03-17  3:25       ` Benjamin Herrenschmidt
@ 2011-03-17  7:44         ` Alexander Graf
  2011-03-17  8:44           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 82+ messages in thread
From: Alexander Graf @ 2011-03-17  7:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: paulus, qemu-devel, anton, David Gibson


On 17.03.2011, at 04:25, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Thu, 2011-03-17 at 13:21 +1100, David Gibson wrote:
>>> Is this an official extension used by anyone or is it your own
>>> invention that's not implemented in pHyp?
>> 
>> The latter.
> 
> The main reason is to avoid having to deal with TCEs in SLOF :-)

That makes sense :). Let's move this patch to later when you introduce SLOF support then? As it is, it would be unused code.


Alex

* [Qemu-devel] Re: [PATCH 25/26] Add a PAPR TCE-bypass mechanism for the pSeries machine
  2011-03-17  7:44         ` Alexander Graf
@ 2011-03-17  8:44           ` Benjamin Herrenschmidt
  2011-03-17  9:37             ` Alexander Graf
  0 siblings, 1 reply; 82+ messages in thread
From: Benjamin Herrenschmidt @ 2011-03-17  8:44 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton, David Gibson

On Thu, 2011-03-17 at 08:44 +0100, Alexander Graf wrote:
> On 17.03.2011, at 04:25, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > On Thu, 2011-03-17 at 13:21 +1100, David Gibson wrote:
> >>> Is this an official extension used by anyone or is it your own
> >>> invention that's not implemented in pHyp?
> >> 
> >> The latter.
> > 
> > The main reason is to avoid having to deal with TCEs in SLOF :-)
> 
> That makes sense :). Let's move this patch to later when you introduce SLOF
> support then? As it is, it would be unused code.

Well, SLOF is around the corner, I just need to find out where to put
the git repo :-)

Cheers,
Ben.

> 
> Alex
> 
> > 

* [Qemu-devel] Re: [PATCH 25/26] Add a PAPR TCE-bypass mechanism for the pSeries machine
  2011-03-17  8:44           ` Benjamin Herrenschmidt
@ 2011-03-17  9:37             ` Alexander Graf
  0 siblings, 0 replies; 82+ messages in thread
From: Alexander Graf @ 2011-03-17  9:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: paulus, qemu-devel, anton, David Gibson

On 03/17/2011 09:44 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2011-03-17 at 08:44 +0100, Alexander Graf wrote:
>> On 17.03.2011, at 04:25, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>
>>> On Thu, 2011-03-17 at 13:21 +1100, David Gibson wrote:
>>>>> Is this an official extension used by anyone or is it your own
>>>>> invention that's not implemented in pHyp?
>>>> The latter.
>>> The main reason is to avoid having to deal with TCEs in SLOF :-)
>> That makes sense :). Let's move this patch to later when you introduce SLOF
>> support then? As it is, it would be unused code.
> Well, SLOF is around the corner, I just need to find out where to put
> the git repo :-)

Include it in v4 then :)


Alex

* Re: [Qemu-devel] [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics)
  2011-03-17  1:34     ` David Gibson
@ 2011-03-17 13:13       ` Anthony Liguori
  2011-03-23  3:48         ` David Gibson
  0 siblings, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2011-03-17 13:13 UTC (permalink / raw)
  To: agraf, qemu-devel, paulus, anton

On 03/16/2011 08:34 PM, David Gibson wrote:
>
>>> +/*
>>> + * ICP: Presentation layer
>>> + */
>>> +
>>> +struct icp_server_state {
>>> +    uint32_t cppr :8;
>>> +    uint32_t xisr :24;
>> No real reason to use bitfields here.
> Well.. in the hardware xics implementation, CPPR and XISR are
> considered fields of the one 32-bit register, XIRR.  Matching that is
> why I have the bitfield.

Bitfields don't work well with the way we save device state.

Regards,

Anthony Liguori

* Re: [Qemu-devel] [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC
  2011-03-17  4:55     ` David Gibson
@ 2011-03-17 13:20       ` Anthony Liguori
  2011-03-18  4:03         ` David Gibson
  0 siblings, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2011-03-17 13:20 UTC (permalink / raw)
  To: agraf, qemu-devel, paulus, anton

On 03/16/2011 11:55 PM, David Gibson wrote:
> On Wed, Mar 16, 2011 at 03:44:49PM -0500, Anthony Liguori wrote:
>> On 03/15/2011 11:56 PM, David Gibson wrote:
>>> From: David Gibson <dwg@au1.ibm.com>
>>>
>>> PowerPC and POWER chips since the POWER4 and 970 have a special
>>> hypervisor mode, and a corresponding form of the system call
>>> instruction which traps to the hypervisor.
>>>
>>> qemu currently has stub implementations of hypervisor mode.  That
>>> is, the outline is there to allow qemu to run a PowerPC hypervisor
>>> under emulation.  There are a number of details missing so this
>>> won't actually work at present, but the idea is there.
>>>
>>> What there is no provision for at all is for qemu to instead emulate
>>> the hypervisor itself.  That is to have hypercalls trap into qemu
>>> and their result be emulated from qemu, rather than running
>>> hypervisor code within the emulated system.
>>>
>>> Hypervisor hardware aware KVM implementations are in the works and
>>> it would  be useful for debugging and development to also allow
>>> full emulation of the same para-virtualized guests as such a KVM.
>>>
>>> Therefore, this patch adds a hook which will allow a machine to
>>> set up emulation of hypervisor calls.
>>>
>>> Signed-off-by: David Gibson <dwg@au1.ibm.com>
>>> ---
>>>   target-ppc/cpu.h    |    2 ++
>>>   target-ppc/helper.c |    4 ++++
>>>   2 files changed, 6 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
>>> index a20c132..eaddc27 100644
>>> --- a/target-ppc/cpu.h
>>> +++ b/target-ppc/cpu.h
>>> @@ -692,6 +692,8 @@ struct CPUPPCState {
>>>       int bfd_mach;
>>>       uint32_t flags;
>>>       uint64_t insns_flags;
>>> +    void (*emulate_hypercall)(CPUState *, void *);
>>> +    void *hcall_opaque;
>> Is the hypercall handler ever specific to a CPU?
> If you mean, "is the hypercall environment ever different from one cpu
> to another within the same guest at the same time", then no.  Or at
> least, no for any platform that exists now, and anything plausible I
> can think of.

Yes, that's what I was asking.  So having a function pointer in each 
CPUState isn't necessary.

> If you mean can the hypercall ABI and handling be different for
> different CPU models within an architecture, then yes.  It's not there
> yet, but BookE CPUs *will* have a quite different hypercall
> environment to the PAPR hypercall environment used on IBM servers.
>
>> I'd prefer to see this as a generic interface that wasn't specific
>> to target-ppc.
>> Basically, add a:
>>
>> void cpu_hypercall(CPUState *env);
>>
>> And then implement it within your target.
> I'm not exactly sure what you mean by "target" here.  It is *not*
> sufficient to make the hypercall function per guest architecture, it
> must be per machine.  However, it could be a global hook rather than
> in the CPUState.

I'd suggest a totally generic hypercall infrastructure but I know that's 
not plausible for Power.  So I'm suggesting defining cpu_hypercall() in 
cpu.h, and then somewhere in target-ppc/, you can implement whatever 
logic you need to support that function.

This fits well with how we dispatch other forms of I/O (cpu_outb, 
cpu_physical_memory_rw, etc).

Regards,

Anthony Liguori

* Re: [Qemu-devel] [PATCH 21/26] Implement TCE translation for sPAPR VIO
  2011-03-16 22:20   ` [Qemu-devel] " Anthony Liguori
@ 2011-03-18  1:58     ` David Gibson
  0 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-18  1:58 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: paulus, agraf, anton, qemu-devel

On Wed, Mar 16, 2011 at 05:20:53PM -0500, Anthony Liguori wrote:
> On 03/15/2011 11:56 PM, David Gibson wrote:
> >From: Ben Herrenschmidt <benh@kernel.crashing.org>
[snip]
> >+static target_ulong h_put_tce(CPUState *env, sPAPREnvironment *spapr,
> >+                              target_ulong opcode, target_ulong *args)
> >+{
> >+    target_ulong liobn = args[0];
> >+    target_ulong ioba = args[1];
> >+    target_ulong tce = args[2];
> >+    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, liobn);
> >+    VIOsPAPR_RTCE *rtce;
> >+
> >+    if (!dev) {
> >+        fprintf(stderr, "spapr_vio_put_tce on non-existent LIOBN "
> >+                TARGET_FMT_lx "\n",
> >+                liobn);
> 
> You generally want to avoid guest triggered fprintfs as it can be
> exploited in scenarios where qemu's stdout is logged to disk
> (libvirt).  We usually wrap this in a DPRINTF() of some sort.

Ah, good point.  I've gone through and audited for this sort of thing.
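The usual QEMU idiom referred to above is a debug macro that compiles away unless a debug symbol is defined. Here is a sketch. The `DEBUG_TCE` symbol and the simplified `check_liobn()` helper are hypothetical, while `H_SUCCESS` (0) and `H_PARAMETER` (-4) follow the PAPR hcall return codes:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Define DEBUG_TCE at build time to turn the messages on. */
#ifdef DEBUG_TCE
#define DPRINTF(fmt, ...) \
    do { fprintf(stderr, "spapr_tce: " fmt, ## __VA_ARGS__); } while (0)
#else
#define DPRINTF(fmt, ...) \
    do { } while (0)
#endif

#define H_SUCCESS   0
#define H_PARAMETER (-4)

/*
 * Simplified check from an H_PUT_TCE-style handler: a bad LIOBN is a
 * guest error, so it is reported to the guest through the return
 * code and logged only in debug builds, where a misbehaving guest
 * cannot flood a host log file.
 */
static long check_liobn(uint64_t liobn, uint64_t valid_liobn)
{
    if (liobn != valid_liobn) {
        DPRINTF("put_tce on non-existent LIOBN 0x%" PRIx64 "\n", liobn);
        return H_PARAMETER;
    }
    return H_SUCCESS;
}
```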

* Re: [Qemu-devel] [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC
  2011-03-17 13:20       ` Anthony Liguori
@ 2011-03-18  4:03         ` David Gibson
  2011-03-18  6:57           ` Alexander Graf
  0 siblings, 1 reply; 82+ messages in thread
From: David Gibson @ 2011-03-18  4:03 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: paulus, agraf, anton, qemu-devel

On Thu, Mar 17, 2011 at 08:20:52AM -0500, Anthony Liguori wrote:
> On 03/16/2011 11:55 PM, David Gibson wrote:
> >On Wed, Mar 16, 2011 at 03:44:49PM -0500, Anthony Liguori wrote:
> >>On 03/15/2011 11:56 PM, David Gibson wrote:
[snip]
> >>Is the hypercall handler ever specific to a CPU?
> >If you mean, "is the hypercall environment ever different from one cpu
> >to another within the same guest at the same time", then no.  Or at
> >least, no for any platform that exists now, and anything plausible I
> >can think of.
> 
> Yes, that's what I was asking.  So having a function pointer in each
> CPUState isn't necessary.

That's right.

> >If you mean can the hypercall ABI and handling be different for
> >different CPU models within an architecture, then yes.  It's not there
> >yet, but BookE CPUs *will* have a quite different hypercall
> >environment to the PAPR hypercall environment used on IBM servers.
> >
> >>I'd prefer to see this as a generic interface that wasn't specific
> >>to target-ppc.
> >>Basically, add a:
> >>
> >>void cpu_hypercall(CPUState *env);
> >>
> >>And then implement it within your target.
> >I'm not exactly sure what you mean by "target" here.  It is *not*
> >sufficient to make the hypercall function per guest architecture, it
> >must be per machine.  However, it could be a global hook rather than
> >in the CPUState.
> 
> I'd suggest a totally generic hypercall infrastructure but I know
> that's not plausible for Power.

I'm still not sure what you're getting at here.  I can't see how a
generic (as in across architectures) hypercall infrastructure makes
sense when clearly both the implemenentation of a hypercall, and the
trigger to fire it off will be ISA specific.

>  So I'm suggesting defining
> cpu_hypercall() in cpu.h, and then somewhere in target-ppc/, you can
> implement whatever logic you need to support that function.

So I don't see the point of having this arch-specific wrapper function
which will do nothing but call a hypervisor platform specific hook,
presumably set by the machine.  There's only one callsite for the
hypercall function, why not just call the hook straight from there.

So, for the moment, I'm just going to take the hypercall hook out of
the CPUState and make it a global.  We can have the next round of
objections from there :).
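Here is a sketch of the direction settled on above. A single machine-wide hook is set once at machine init and called directly from the one trap site, instead of a function pointer living in every CPUState. All names are hypothetical, the CPU state is reduced to its GPRs, and the demo handler exists only so the sketch can be exercised:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical CPU state: only the GPRs matter here. */
typedef struct CpuState {
    uint64_t gpr[32];
} CpuState;

typedef void (*hcall_handler)(CpuState *env, void *opaque);

/* One hook for the whole machine rather than a copy per CPU. */
static hcall_handler hcall_hook;
static void *hcall_hook_opaque;

/* Called once by the machine (e.g. pseries) during init. */
static void ppc_set_hcall_handler(hcall_handler fn, void *opaque)
{
    hcall_hook = fn;
    hcall_hook_opaque = opaque;
}

/*
 * The single call site, reached from the trap path for the
 * hypervisor form of "sc".  Returns -1 when no machine registered a
 * handler, so the caller can keep its existing behaviour.
 */
static int ppc_dispatch_hcall(CpuState *env)
{
    if (!hcall_hook) {
        return -1;
    }
    hcall_hook(env, hcall_hook_opaque);
    return 0;
}

/* Demo handler: counts calls and reports success in r3 (assumed ABI). */
static void demo_hcall(CpuState *env, void *opaque)
{
    unsigned *calls = opaque;

    (*calls)++;
    env->gpr[3] = 0;
}
```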

* Re: [Qemu-devel] [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC
  2011-03-18  4:03         ` David Gibson
@ 2011-03-18  6:57           ` Alexander Graf
  0 siblings, 0 replies; 82+ messages in thread
From: Alexander Graf @ 2011-03-18  6:57 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 18.03.2011, at 05:03, David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Mar 17, 2011 at 08:20:52AM -0500, Anthony Liguori wrote:
>> On 03/16/2011 11:55 PM, David Gibson wrote:
>>> On Wed, Mar 16, 2011 at 03:44:49PM -0500, Anthony Liguori wrote:
>>>> On 03/15/2011 11:56 PM, David Gibson wrote:
> [snip]
>>>> Is the hypercall handler ever specific to a CPU?
>>> If you mean, "is the hypercall environment ever different from one cpu
>>> to another within the same guest at the same time", then no.  Or at
>>> least, no for any platform that exists now, and anything plausible I
>>> can think of.
>> 
>> Yes, that's what I was asking.  So having a function pointer in each
>> CPUState isn't necessary.
> 
> That's right.
> 
>>> If you mean can the hypercall ABI and handling be different for
>>> different CPU models within an architecture, then yes.  It's not there
>>> yet, but BookE CPUs *will* have a quite different hypercall
>>> environment to the PAPR hypercall environment used on IBM servers.
>>> 
>>>> I'd prefer to see this as a generic interface that wasn't specific
>>>> to target-ppc.
>>>> Basically, add a:
>>>> 
>>>> void cpu_hypercall(CPUState *env);
>>>> 
>>>> And then implement it within your target.
>>> I'm not exactly sure what you mean by "target" here.  It is *not*
>>> sufficient to make the hypercall function per guest architecture, it
>>> must be per machine.  However, it could be a global hook rather than
>>> in the CPUState.
>> 
>> I'd suggest a totally generic hypercall infrastructure but I know
>> that's not plausible for Power.
> 
> I'm still not sure what you're getting at here.  I can't see how a
> generic (as in across architectures) hypercall infrastructure makes
> sense when clearly both the implementation of a hypercall, and the
> trigger to fire it off will be ISA specific.

I second that feeling. The reason for abstracting mmio and pio calls is so that multiple targets can share device emulation code. Hypercall code is 100% platform specific anyways.

Alex

* Re: [Qemu-devel] [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics)
  2011-03-17 13:13       ` Anthony Liguori
@ 2011-03-23  3:48         ` David Gibson
  0 siblings, 0 replies; 82+ messages in thread
From: David Gibson @ 2011-03-23  3:48 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: paulus, agraf, anton, qemu-devel

On Thu, Mar 17, 2011 at 08:13:27AM -0500, Anthony Liguori wrote:
> On 03/16/2011 08:34 PM, David Gibson wrote:
> >
> >>>+/*
> >>>+ * ICP: Presentation layer
> >>>+ */
> >>>+
> >>>+struct icp_server_state {
> >>>+    uint32_t cppr :8;
> >>>+    uint32_t xisr :24;
> >>No real reason to use bitfields here.
> >Well.. in the hardware xics implementation, CPPR and XISR are
> >considered fields of the one 32-bit register, XIRR.  Matching that is
> >why I have the bitfield.
> 
> Bitfields don't work well with the way we save device state.

Good point.  In fact, I think I even hit that when I did some
preliminary looking at adding partition save/migration support to the
pseries stuff.  Bitfields removed in the next version.
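Here is a sketch of the bitfield-free layout. CPPR and XISR are kept as ordinary integer fields, which are straightforward to describe to state save/restore code, and the architected XIRR value is composed only when needed. It assumes the layout implied by the quoted code, with CPPR in the top 8 bits and XISR in the low 24. `IcpState` and the accessor names are hypothetical:

```c
#include <stdint.h>

typedef struct IcpState {
    uint8_t cppr;    /* current processor priority: XIRR bits 31..24 */
    uint32_t xisr;   /* pending interrupt source: XIRR bits 23..0 */
} IcpState;

/* Compose the 32-bit XIRR the guest sees from the separate fields. */
static uint32_t icp_get_xirr(const IcpState *ss)
{
    return ((uint32_t)ss->cppr << 24) | (ss->xisr & 0xffffffu);
}

/* Split a guest-written XIRR back into its two fields. */
static void icp_set_xirr(IcpState *ss, uint32_t xirr)
{
    ss->cppr = xirr >> 24;
    ss->xisr = xirr & 0xffffffu;
}
```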

end of thread

Thread overview: 82+ messages
2011-03-16  4:56 [Qemu-devel] Implement emulation of pSeries logical partitions (v3) David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 01/26] Clean up PowerPC SLB handling code David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 02/26] Allow qemu_devtree_setprop() to take arbitrary values David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 03/26] Add a hook to allow hypercalls to be emulated on PowerPC David Gibson
2011-03-16 13:46   ` [Qemu-devel] " Alexander Graf
2011-03-16 16:58     ` Stefan Hajnoczi
2011-03-17  2:26       ` David Gibson
2011-03-16 20:44   ` [Qemu-devel] " Anthony Liguori
2011-03-17  4:55     ` David Gibson
2011-03-17 13:20       ` Anthony Liguori
2011-03-18  4:03         ` David Gibson
2011-03-18  6:57           ` Alexander Graf
2011-03-16  4:56 ` [Qemu-devel] [PATCH 04/26] Implement PowerPC slbmfee and slbmfev instructions David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 05/26] Implement missing parts of the logic for the POWER PURR David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 06/26] Correct ppc popcntb logic, implement popcntw and popcntd David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 07/26] Clean up slb_lookup() function David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 08/26] Parse SDR1 on mtspr instead of at translate time David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 09/26] Use "hash" more consistently in ppc mmu code David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 10/26] Better factor the ppc hash translation path David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 11/26] Support 1T segments on ppc David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 12/26] Add POWER7 support for ppc David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 13/26] Start implementing pSeries logical partition machine David Gibson
2011-03-16 14:30   ` [Qemu-devel] " Alexander Graf
2011-03-16 21:59   ` [Qemu-devel] " Anthony Liguori
2011-03-16 23:46     ` Alexander Graf
2011-03-17  3:08     ` David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 14/26] Implement the bus structure for PAPR virtual IO David Gibson
2011-03-16 14:43   ` [Qemu-devel] " Alexander Graf
2011-03-16 22:04   ` [Qemu-devel] " Anthony Liguori
2011-03-17  3:19     ` David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 15/26] Virtual hash page table handling on pSeries machine David Gibson
2011-03-16 15:03   ` [Qemu-devel] " Alexander Graf
2011-03-17  1:03     ` [Qemu-devel] Re: [PATCH 15/26] Virtual hash page table handling on pSeries machine' David Gibson
2011-03-17  7:35       ` Alexander Graf
2011-03-16  4:56 ` [Qemu-devel] [PATCH 16/26] Implement hcall based RTAS for pSeries machines David Gibson
2011-03-16 15:08   ` [Qemu-devel] " Alexander Graf
2011-03-17  1:22     ` David Gibson
2011-03-17  7:36       ` Alexander Graf
2011-03-16 22:08   ` [Qemu-devel] " Anthony Liguori
2011-03-16  4:56 ` [Qemu-devel] [PATCH 17/26] Implement assorted pSeries hcalls and RTAS methods David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 18/26] Implement the PAPR (pSeries) virtualized interrupt controller (xics) David Gibson
2011-03-16 15:47   ` [Qemu-devel] " Alexander Graf
2011-03-17  1:29     ` David Gibson
2011-03-17  7:37       ` Alexander Graf
2011-03-16 22:16   ` [Qemu-devel] " Anthony Liguori
2011-03-17  1:34     ` David Gibson
2011-03-17 13:13       ` Anthony Liguori
2011-03-23  3:48         ` David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 19/26] Add PAPR H_VIO_SIGNAL hypercall and infrastructure for VIO interrupts David Gibson
2011-03-16 15:49   ` [Qemu-devel] " Alexander Graf
2011-03-17  1:38     ` David Gibson
2011-03-17  7:38       ` Alexander Graf
2011-03-16  4:56 ` [Qemu-devel] [PATCH 20/26] Add (virtual) interrupt to PAPR virtual tty device David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 21/26] Implement TCE translation for sPAPR VIO David Gibson
2011-03-16 16:03   ` [Qemu-devel] " Alexander Graf
2011-03-16 20:05     ` Benjamin Herrenschmidt
2011-03-16 20:21       ` Anthony Liguori
2011-03-16 20:22       ` Anthony Liguori
2011-03-16 20:36         ` Benjamin Herrenschmidt
2011-03-17  1:43     ` David Gibson
2011-03-16 22:20   ` [Qemu-devel] " Anthony Liguori
2011-03-18  1:58     ` David Gibson
2011-03-16  4:56 ` [Qemu-devel] [PATCH 22/26] Implement sPAPR Virtual LAN (ibmveth) David Gibson
2011-03-16 16:12   ` [Qemu-devel] " Alexander Graf
2011-03-17  2:04     ` David Gibson
2011-03-16 22:29   ` [Qemu-devel] " Anthony Liguori
2011-03-17  2:09     ` David Gibson
2011-03-16  4:57 ` [Qemu-devel] [PATCH 23/26] Implement PAPR CRQ hypercalls David Gibson
2011-03-16 16:15   ` [Qemu-devel] " Alexander Graf
2011-03-16  4:57 ` [Qemu-devel] [PATCH 24/26] Implement PAPR virtual SCSI interface (ibmvscsi) David Gibson
2011-03-16 16:41   ` [Qemu-devel] " Alexander Graf
2011-03-16 16:51     ` Anthony Liguori
2011-03-16 20:08     ` Benjamin Herrenschmidt
2011-03-16 20:19       ` Anthony Liguori
2011-03-16  4:57 ` [Qemu-devel] [PATCH 25/26] Add a PAPR TCE-bypass mechanism for the pSeries machine David Gibson
2011-03-16 16:43   ` [Qemu-devel] " Alexander Graf
2011-03-17  2:21     ` David Gibson
2011-03-17  3:25       ` Benjamin Herrenschmidt
2011-03-17  7:44         ` Alexander Graf
2011-03-17  8:44           ` Benjamin Herrenschmidt
2011-03-17  9:37             ` Alexander Graf
2011-03-16  4:57 ` [Qemu-devel] [PATCH 26/26] Implement PAPR VPA functions for pSeries shared processor partitions David Gibson
