* [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08
@ 2016-02-09 12:13 Paolo Bonzini
  2016-02-09 12:13 ` [Qemu-devel] [PULL 02/32] memory: RCU ram_list.dirty_memory[] for safe RAM hotplug Paolo Bonzini
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Paolo Bonzini @ 2016-02-09 12:13 UTC (permalink / raw)
  To: qemu-devel

The following changes since commit e4a096b1cd4350eeca5dcdc391ab333d2083d7fd:

  ui/cocoa.m: Include qemu/osdep.h (2016-02-08 13:14:40 +0000)

are available in the git repository at:

  git://github.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to 91eb5e5293fda2127ae33594715b47ff7e4eb985:

  qemu-char, io: fix ordering of arguments for UDP socket creation (2016-02-09 13:06:00 +0100)

----------------------------------------------------------------
* switch to C11 atomics (Alex)
* Coverity fixes for IPMI (Corey), i386 (Paolo), qemu-char (Paolo)
* at long last, fail on wrong .pc files if -m32 is in use (Daniel)
* qemu-char regression fix (Daniel)
* SAS1068 device (Paolo)
* memory region docs improvements (Peter)
* target-i386 cleanups (Richard)
* qemu-nbd docs improvements (Sitsofe)
* thread-safe memory hotplug (Stefan)

----------------------------------------------------------------
Alex Bennée (1):
      include/qemu/atomic.h: default to __atomic functions

Andrew Jones (1):
      kvm-all: trace: strerror fixup

Corey Minyard (2):
      ipmi_bmc_sim: Fix off by one in check.
      ipmi_bmc_sim: Add break to correct watchdog NMI check

Daniel P. Berrange (2):
      configure: sanity check the glib library that pkg-config finds
      char: fix repeated registration of tcp chardev I/O handlers

Janosch Frank (1):
      scripts/kvm/kvm_stat: Fix tracefs access checking

John Snow (1):
      nbd: avoid unaligned uint64_t store

Paolo Bonzini (8):
      memory: add early bail out from cpu_physical_memory_set_dirty_range
      qemu-char: Keep pty slave file descriptor open until the master is closed
      scsi: push WWN fields up to SCSIDevice
      scsi-generic: grab device and port SAS addresses from backend
      hw: Add support for LSI SAS1068 (mptsas) device
      ipmi: do not take/drop iothread lock
      target-i386: fix PSE36 mode
      qemu-char, io: fix ordering of arguments for UDP socket creation

Peter Maydell (1):
      docs/memory.txt: Improve list of different memory regions

Richard Henderson (10):
      target-i386: Create gen_lea_v_seg
      target-i386: Introduce mo_stacksize
      target-i386: Use gen_lea_v_seg in gen_lea_modrm
      target-i386: Use gen_lea_v_seg in stack subroutines
      target-i386: Access segs via TCG registers
      target-i386: Use gen_lea_v_seg in pusha/popa
      target-i386: Rewrite gen_enter inline
      target-i386: Rewrite leave
      target-i386: Tidy gen_add_A0_im
      target-i386: Deconstruct the cpu_T array

Sitsofe Wheeler (3):
      qemu-nbd: Fix unintended texi verbatim formatting
      qemu-nbd: Minor texi updates
      qemu-nbd: Fix texi sentence capitalisation

Stefan Hajnoczi (1):
      memory: RCU ram_list.dirty_memory[] for safe RAM hotplug

Stephen Warren (1):
      MAINTAINERS: add all-match entry for qemu-devel@

 MAINTAINERS               |    5 +
 configure                 |   24 +
 default-configs/pci.mak   |    1 +
 docs/memory.txt           |   26 +-
 exec.c                    |   75 +-
 hw/ipmi/ipmi.c            |    2 -
 hw/ipmi/ipmi_bmc_sim.c    |    4 +-
 hw/scsi/Makefile.objs     |    1 +
 hw/scsi/mpi.h             | 1153 ++++++++++++++++++++++++++++++
 hw/scsi/mptconfig.c       |  904 ++++++++++++++++++++++++
 hw/scsi/mptendian.c       |  204 ++++++
 hw/scsi/mptsas.c          | 1441 +++++++++++++++++++++++++++++++++++++
 hw/scsi/mptsas.h          |  100 +++
 hw/scsi/scsi-disk.c       |   23 +-
 hw/scsi/scsi-generic.c    |   92 +++
 include/exec/ram_addr.h   |  193 ++++-
 include/hw/pci/pci_ids.h  |    1 +
 include/hw/scsi/scsi.h    |    3 +
 include/qemu/atomic.h     |  192 +++--
 io/channel-socket.c       |    2 +-
 kvm-all.c                 |    4 +-
 migration/ram.c           |    4 -
 nbd/server.c              |   20 +-
 qemu-char.c               |   10 +-
 qemu-nbd.texi             |   80 ++-
 scripts/get_maintainer.pl |    2 +-
 scripts/kvm/kvm_stat      |   23 +-
 target-i386/helper.c      |    4 +-
 target-i386/helper.h      |    4 -
 target-i386/seg_helper.c  |   74 --
 target-i386/translate.c   | 1725 +++++++++++++++++++++------------------------
 trace-events              |   22 +
 32 files changed, 5223 insertions(+), 1195 deletions(-)
 create mode 100644 hw/scsi/mpi.h
 create mode 100644 hw/scsi/mptconfig.c
 create mode 100644 hw/scsi/mptendian.c
 create mode 100644 hw/scsi/mptsas.c
 create mode 100644 hw/scsi/mptsas.h
-- 
2.5.0


* [Qemu-devel] [PULL 02/32] memory: RCU ram_list.dirty_memory[] for safe RAM hotplug
  2016-02-09 12:13 [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08 Paolo Bonzini
@ 2016-02-09 12:13 ` Paolo Bonzini
  2016-02-10 16:56   ` Leon Alrae
  2016-02-09 12:13 ` [Qemu-devel] [PULL 08/32] hw: Add support for LSI SAS1068 (mptsas) device Paolo Bonzini
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2016-02-09 12:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefan Hajnoczi

From: Stefan Hajnoczi <stefanha@redhat.com>

Although accesses to ram_list.dirty_memory[] use atomics so multiple
threads can safely dirty the bitmap, the data structure is not fully
thread-safe yet.

This patch handles the RAM hotplug case where ram_list.dirty_memory[] is
grown.  ram_list.dirty_memory[] is changed from a regular bitmap to an
RCU array of pointers to fixed-size bitmap blocks.  Threads can continue
accessing bitmap blocks while the array is being extended.  See the
comments in the code for an in-depth explanation of struct
DirtyMemoryBlocks.
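
For illustration only (not part of the patch): a minimal sketch of how a
single page's dirty bit could be tested under the new layout.  It assumes the
DirtyMemoryBlocks structure and DIRTY_MEMORY_BLOCK_SIZE introduced below,
plus QEMU's existing rcu_read_lock()/atomic_rcu_read()/test_bit() helpers;
the helper name itself is hypothetical.

    /* Hypothetical helper, for explanation only. */
    static bool page_is_dirty(ram_addr_t addr, unsigned client)
    {
        unsigned long page = addr >> TARGET_PAGE_BITS;
        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
        DirtyMemoryBlocks *blocks;
        bool ret;

        rcu_read_lock();
        /* Load the current array of block pointers.  It may be replaced
         * concurrently by dirty_memory_extend(), but the blocks it points
         * to remain valid for the whole read-side critical section.
         */
        blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
        ret = test_bit(offset, blocks->blocks[idx]);
        rcu_read_unlock();

        return ret;
    }

Because the bitmap only grows and existing block pointers are carried over
unchanged into the new array, a reader that computed a valid idx before a
grow keeps seeing a valid block.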

I have tested that live migration with virtio-blk dataplane works.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <1453728801-5398-2-git-send-email-stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 exec.c                  |  75 +++++++++++++++----
 include/exec/ram_addr.h | 189 ++++++++++++++++++++++++++++++++++++++++++------
 migration/ram.c         |   4 -
 3 files changed, 225 insertions(+), 43 deletions(-)

diff --git a/exec.c b/exec.c
index ab37360..7d67c11 100644
--- a/exec.c
+++ b/exec.c
@@ -980,8 +980,9 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
                                               ram_addr_t length,
                                               unsigned client)
 {
+    DirtyMemoryBlocks *blocks;
     unsigned long end, page;
-    bool dirty;
+    bool dirty = false;
 
     if (length == 0) {
         return false;
@@ -989,8 +990,22 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
 
     end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
     page = start >> TARGET_PAGE_BITS;
-    dirty = bitmap_test_and_clear_atomic(ram_list.dirty_memory[client],
-                                         page, end - page);
+
+    rcu_read_lock();
+
+    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
+
+    while (page < end) {
+        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
+
+        dirty |= bitmap_test_and_clear_atomic(blocks->blocks[idx],
+                                              offset, num);
+        page += num;
+    }
+
+    rcu_read_unlock();
 
     if (dirty && tcg_enabled()) {
         tlb_reset_dirty_range_all(start, length);
@@ -1504,6 +1519,47 @@ int qemu_ram_resize(ram_addr_t base, ram_addr_t newsize, Error **errp)
     return 0;
 }
 
+/* Called with ram_list.mutex held */
+static void dirty_memory_extend(ram_addr_t old_ram_size,
+                                ram_addr_t new_ram_size)
+{
+    ram_addr_t old_num_blocks = DIV_ROUND_UP(old_ram_size,
+                                             DIRTY_MEMORY_BLOCK_SIZE);
+    ram_addr_t new_num_blocks = DIV_ROUND_UP(new_ram_size,
+                                             DIRTY_MEMORY_BLOCK_SIZE);
+    int i;
+
+    /* Only need to extend if block count increased */
+    if (new_num_blocks <= old_num_blocks) {
+        return;
+    }
+
+    for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
+        DirtyMemoryBlocks *old_blocks;
+        DirtyMemoryBlocks *new_blocks;
+        int j;
+
+        old_blocks = atomic_rcu_read(&ram_list.dirty_memory[i]);
+        new_blocks = g_malloc(sizeof(*new_blocks) +
+                              sizeof(new_blocks->blocks[0]) * new_num_blocks);
+
+        if (old_num_blocks) {
+            memcpy(new_blocks->blocks, old_blocks->blocks,
+                   old_num_blocks * sizeof(old_blocks->blocks[0]));
+        }
+
+        for (j = old_num_blocks; j < new_num_blocks; j++) {
+            new_blocks->blocks[j] = bitmap_new(DIRTY_MEMORY_BLOCK_SIZE);
+        }
+
+        atomic_rcu_set(&ram_list.dirty_memory[i], new_blocks);
+
+        if (old_blocks) {
+            g_free_rcu(old_blocks, rcu);
+        }
+    }
+}
+
 static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
 {
     RAMBlock *block;
@@ -1543,6 +1599,7 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
               (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS);
     if (new_ram_size > old_ram_size) {
         migration_bitmap_extend(old_ram_size, new_ram_size);
+        dirty_memory_extend(old_ram_size, new_ram_size);
     }
     /* Keep the list sorted from biggest to smallest block.  Unlike QTAILQ,
      * QLIST (which has an RCU-friendly variant) does not have insertion at
@@ -1568,18 +1625,6 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
     ram_list.version++;
     qemu_mutex_unlock_ramlist();
 
-    new_ram_size = last_ram_offset() >> TARGET_PAGE_BITS;
-
-    if (new_ram_size > old_ram_size) {
-        int i;
-
-        /* ram_list.dirty_memory[] is protected by the iothread lock.  */
-        for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
-            ram_list.dirty_memory[i] =
-                bitmap_zero_extend(ram_list.dirty_memory[i],
-                                   old_ram_size, new_ram_size);
-       }
-    }
     cpu_physical_memory_set_dirty_range(new_block->offset,
                                         new_block->used_length,
                                         DIRTY_CLIENTS_ALL);
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f2e872d..b1413a1 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -49,13 +49,43 @@ static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
     return (char *)block->host + offset;
 }
 
+/* The dirty memory bitmap is split into fixed-size blocks to allow growth
+ * under RCU.  The bitmap for a block can be accessed as follows:
+ *
+ *   rcu_read_lock();
+ *
+ *   DirtyMemoryBlocks *blocks =
+ *       atomic_rcu_read(&ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
+ *
+ *   ram_addr_t idx = (addr >> TARGET_PAGE_BITS) / DIRTY_MEMORY_BLOCK_SIZE;
+ *   unsigned long *block = blocks->blocks[idx];
+ *   ...access block bitmap...
+ *
+ *   rcu_read_unlock();
+ *
+ * Remember to check for the end of the block when accessing a range of
+ * addresses.  Move on to the next block if you reach the end.
+ *
+ * Organization into blocks allows dirty memory to grow (but not shrink) under
+ * RCU.  When adding new RAMBlocks requires the dirty memory to grow, a new
+ * DirtyMemoryBlocks array is allocated with pointers to existing blocks kept
+ * the same.  Other threads can safely access existing blocks while dirty
+ * memory is being grown.  When no threads are using the old DirtyMemoryBlocks
+ * anymore it is freed by RCU (but the underlying blocks stay because they are
+ * pointed to from the new DirtyMemoryBlocks).
+ */
+#define DIRTY_MEMORY_BLOCK_SIZE ((ram_addr_t)256 * 1024 * 8)
+typedef struct {
+    struct rcu_head rcu;
+    unsigned long *blocks[];
+} DirtyMemoryBlocks;
+
 typedef struct RAMList {
     QemuMutex mutex;
-    /* Protected by the iothread lock.  */
-    unsigned long *dirty_memory[DIRTY_MEMORY_NUM];
     RAMBlock *mru_block;
     /* RCU-enabled, writes protected by the ramlist lock. */
     QLIST_HEAD(, RAMBlock) blocks;
+    DirtyMemoryBlocks *dirty_memory[DIRTY_MEMORY_NUM];
     uint32_t version;
 } RAMList;
 extern RAMList ram_list;
@@ -89,30 +119,70 @@ static inline bool cpu_physical_memory_get_dirty(ram_addr_t start,
                                                  ram_addr_t length,
                                                  unsigned client)
 {
-    unsigned long end, page, next;
+    DirtyMemoryBlocks *blocks;
+    unsigned long end, page;
+    bool dirty = false;
 
     assert(client < DIRTY_MEMORY_NUM);
 
     end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
     page = start >> TARGET_PAGE_BITS;
-    next = find_next_bit(ram_list.dirty_memory[client], end, page);
 
-    return next < end;
+    rcu_read_lock();
+
+    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
+
+    while (page < end) {
+        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
+
+        if (find_next_bit(blocks->blocks[idx], offset, num) < num) {
+            dirty = true;
+            break;
+        }
+
+        page += num;
+    }
+
+    rcu_read_unlock();
+
+    return dirty;
 }
 
 static inline bool cpu_physical_memory_all_dirty(ram_addr_t start,
                                                  ram_addr_t length,
                                                  unsigned client)
 {
-    unsigned long end, page, next;
+    DirtyMemoryBlocks *blocks;
+    unsigned long end, page;
+    bool dirty = true;
 
     assert(client < DIRTY_MEMORY_NUM);
 
     end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
     page = start >> TARGET_PAGE_BITS;
-    next = find_next_zero_bit(ram_list.dirty_memory[client], end, page);
 
-    return next >= end;
+    rcu_read_lock();
+
+    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
+
+    while (page < end) {
+        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
+
+        if (find_next_zero_bit(blocks->blocks[idx], offset, num) < num) {
+            dirty = false;
+            break;
+        }
+
+        page += num;
+    }
+
+    rcu_read_unlock();
+
+    return dirty;
 }
 
 static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr,
@@ -154,16 +224,31 @@ static inline uint8_t cpu_physical_memory_range_includes_clean(ram_addr_t start,
 static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr,
                                                       unsigned client)
 {
+    unsigned long page, idx, offset;
+    DirtyMemoryBlocks *blocks;
+
     assert(client < DIRTY_MEMORY_NUM);
-    set_bit_atomic(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]);
+
+    page = addr >> TARGET_PAGE_BITS;
+    idx = page / DIRTY_MEMORY_BLOCK_SIZE;
+    offset = page % DIRTY_MEMORY_BLOCK_SIZE;
+
+    rcu_read_lock();
+
+    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
+
+    set_bit_atomic(offset, blocks->blocks[idx]);
+
+    rcu_read_unlock();
 }
 
 static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
                                                        ram_addr_t length,
                                                        uint8_t mask)
 {
+    DirtyMemoryBlocks *blocks[DIRTY_MEMORY_NUM];
     unsigned long end, page;
-    unsigned long **d = ram_list.dirty_memory;
+    int i;
 
     if (!mask && !xen_enabled()) {
         return;
@@ -171,15 +256,36 @@ static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
 
     end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
     page = start >> TARGET_PAGE_BITS;
-    if (likely(mask & (1 << DIRTY_MEMORY_MIGRATION))) {
-        bitmap_set_atomic(d[DIRTY_MEMORY_MIGRATION], page, end - page);
-    }
-    if (unlikely(mask & (1 << DIRTY_MEMORY_VGA))) {
-        bitmap_set_atomic(d[DIRTY_MEMORY_VGA], page, end - page);
+
+    rcu_read_lock();
+
+    for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
+        blocks[i] = atomic_rcu_read(&ram_list.dirty_memory[i]);
     }
-    if (unlikely(mask & (1 << DIRTY_MEMORY_CODE))) {
-        bitmap_set_atomic(d[DIRTY_MEMORY_CODE], page, end - page);
+
+    while (page < end) {
+        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
+
+        if (likely(mask & (1 << DIRTY_MEMORY_MIGRATION))) {
+            bitmap_set_atomic(blocks[DIRTY_MEMORY_MIGRATION]->blocks[idx],
+                              offset, num);
+        }
+        if (unlikely(mask & (1 << DIRTY_MEMORY_VGA))) {
+            bitmap_set_atomic(blocks[DIRTY_MEMORY_VGA]->blocks[idx],
+                              offset, num);
+        }
+        if (unlikely(mask & (1 << DIRTY_MEMORY_CODE))) {
+            bitmap_set_atomic(blocks[DIRTY_MEMORY_CODE]->blocks[idx],
+                              offset, num);
+        }
+
+        page += num;
     }
+
+    rcu_read_unlock();
+
     xen_modified_memory(start, length);
 }
 
@@ -199,21 +305,41 @@ static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
     /* start address is aligned at the start of a word? */
     if ((((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) &&
         (hpratio == 1)) {
+        unsigned long **blocks[DIRTY_MEMORY_NUM];
+        unsigned long idx;
+        unsigned long offset;
         long k;
         long nr = BITS_TO_LONGS(pages);
 
+        idx = (start >> TARGET_PAGE_BITS) / DIRTY_MEMORY_BLOCK_SIZE;
+        offset = BIT_WORD((start >> TARGET_PAGE_BITS) %
+                          DIRTY_MEMORY_BLOCK_SIZE);
+
+        rcu_read_lock();
+
+        for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
+            blocks[i] = atomic_rcu_read(&ram_list.dirty_memory[i])->blocks;
+        }
+
         for (k = 0; k < nr; k++) {
             if (bitmap[k]) {
                 unsigned long temp = leul_to_cpu(bitmap[k]);
-                unsigned long **d = ram_list.dirty_memory;
 
-                atomic_or(&d[DIRTY_MEMORY_MIGRATION][page + k], temp);
-                atomic_or(&d[DIRTY_MEMORY_VGA][page + k], temp);
+                atomic_or(&blocks[DIRTY_MEMORY_MIGRATION][idx][offset], temp);
+                atomic_or(&blocks[DIRTY_MEMORY_VGA][idx][offset], temp);
                 if (tcg_enabled()) {
-                    atomic_or(&d[DIRTY_MEMORY_CODE][page + k], temp);
+                    atomic_or(&blocks[DIRTY_MEMORY_CODE][idx][offset], temp);
                 }
             }
+
+            if (++offset >= BITS_TO_LONGS(DIRTY_MEMORY_BLOCK_SIZE)) {
+                offset = 0;
+                idx++;
+            }
         }
+
+        rcu_read_unlock();
+
         xen_modified_memory(start, pages << TARGET_PAGE_BITS);
     } else {
         uint8_t clients = tcg_enabled() ? DIRTY_CLIENTS_ALL : DIRTY_CLIENTS_NOCODE;
@@ -265,18 +391,33 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
     if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) {
         int k;
         int nr = BITS_TO_LONGS(length >> TARGET_PAGE_BITS);
-        unsigned long *src = ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION];
+        unsigned long * const *src;
+        unsigned long idx = (page * BITS_PER_LONG) / DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long offset = BIT_WORD((page * BITS_PER_LONG) %
+                                        DIRTY_MEMORY_BLOCK_SIZE);
+
+        rcu_read_lock();
+
+        src = atomic_rcu_read(
+                &ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION])->blocks;
 
         for (k = page; k < page + nr; k++) {
-            if (src[k]) {
-                unsigned long bits = atomic_xchg(&src[k], 0);
+            if (src[idx][offset]) {
+                unsigned long bits = atomic_xchg(&src[idx][offset], 0);
                 unsigned long new_dirty;
                 new_dirty = ~dest[k];
                 dest[k] |= bits;
                 new_dirty &= bits;
                 num_dirty += ctpopl(new_dirty);
             }
+
+            if (++offset >= BITS_TO_LONGS(DIRTY_MEMORY_BLOCK_SIZE)) {
+                offset = 0;
+                idx++;
+            }
         }
+
+        rcu_read_unlock();
     } else {
         for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
             if (cpu_physical_memory_test_and_clear_dirty(
diff --git a/migration/ram.c b/migration/ram.c
index 3cdfea4..96c749f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -609,7 +609,6 @@ static void migration_bitmap_sync_init(void)
     iterations_prev = 0;
 }
 
-/* Called with iothread lock held, to protect ram_list.dirty_memory[] */
 static void migration_bitmap_sync(void)
 {
     RAMBlock *block;
@@ -1921,8 +1920,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         acct_clear();
     }
 
-    /* iothread lock needed for ram_list.dirty_memory[] */
-    qemu_mutex_lock_iothread();
     qemu_mutex_lock_ramlist();
     rcu_read_lock();
     bytes_transferred = 0;
@@ -1947,7 +1944,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     memory_global_dirty_log_start();
     migration_bitmap_sync();
     qemu_mutex_unlock_ramlist();
-    qemu_mutex_unlock_iothread();
 
     qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
-- 
2.5.0


* [Qemu-devel] [PULL 08/32] hw: Add support for LSI SAS1068 (mptsas) device
  2016-02-09 12:13 [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08 Paolo Bonzini
  2016-02-09 12:13 ` [Qemu-devel] [PULL 02/32] memory: RCU ram_list.dirty_memory[] for safe RAM hotplug Paolo Bonzini
@ 2016-02-09 12:13 ` Paolo Bonzini
  2016-02-09 12:13 ` [Qemu-devel] [PULL 29/32] docs/memory.txt: Improve list of different memory regions Paolo Bonzini
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2016-02-09 12:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Don Slutz

This adds the SAS1068 device, a SAS disk controller used in VMware that
is oldish but widely supported and has decent performance.  Unlike
megasas, it presents itself as a SAS controller and not as a RAID
controller.  The device corresponds to the mptsas kernel driver in
Linux.

A few small things in the device setup are based on Don Slutz's old
patch, but the device emulation was written from scratch based on Don's
SeaBIOS patch and on the FreeBSD and Linux drivers.  It is 2400 lines
shorter than Don's patch (and roughly the same size as MegaSAS---also
because it doesn't support the similar SPI controller), implements SCSI
task management functions (with asynchronous cancellation), supports
big-endian hosts, has complete support for migration and follows the
QEMU coding standards much more closely.

To write the driver, I first split Don's patch in two parts, with
the configuration bits in one file and the rest in a separate file.
I first left mptconfig.c in place and rewrote the rest, then deleted
mptconfig.c as well.  The configuration pages are still based mostly on
VirtualBox's, though not exactly the same.  However, the implementation
is completely different.  The contents of the pages themselves should
not be copyrightable.

Signed-off-by: Don Slutz <Don@CloudSwitch.com>
Message-Id: <1347382813-5662-1-git-send-email-Don@CloudSwitch.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 default-configs/pci.mak  |    1 +
 hw/scsi/Makefile.objs    |    1 +
 hw/scsi/mpi.h            | 1153 +++++++++++++++++++++++++++++++++++++
 hw/scsi/mptconfig.c      |  904 +++++++++++++++++++++++++++++
 hw/scsi/mptendian.c      |  204 +++++++
 hw/scsi/mptsas.c         | 1441 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/scsi/mptsas.h         |  100 ++++
 include/hw/pci/pci_ids.h |    1 +
 trace-events             |   22 +
 9 files changed, 3827 insertions(+)
 create mode 100644 hw/scsi/mpi.h
 create mode 100644 hw/scsi/mptconfig.c
 create mode 100644 hw/scsi/mptendian.c
 create mode 100644 hw/scsi/mptsas.c
 create mode 100644 hw/scsi/mptsas.h

diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index f250119..4fa9a28 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -15,6 +15,7 @@ CONFIG_ES1370=y
 CONFIG_LSI_SCSI_PCI=y
 CONFIG_VMW_PVSCSI_SCSI_PCI=y
 CONFIG_MEGASAS_SCSI_PCI=y
+CONFIG_MPTSAS_SCSI_PCI=y
 CONFIG_RTL8139_PCI=y
 CONFIG_E1000_PCI=y
 CONFIG_VMXNET3_PCI=y
diff --git a/hw/scsi/Makefile.objs b/hw/scsi/Makefile.objs
index 40c79d3..5a2248b 100644
--- a/hw/scsi/Makefile.objs
+++ b/hw/scsi/Makefile.objs
@@ -1,6 +1,7 @@
 common-obj-y += scsi-disk.o
 common-obj-y += scsi-generic.o scsi-bus.o
 common-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
+common-obj-$(CONFIG_MPTSAS_SCSI_PCI) += mptsas.o mptconfig.o mptendian.o
 common-obj-$(CONFIG_MEGASAS_SCSI_PCI) += megasas.o
 common-obj-$(CONFIG_VMW_PVSCSI_SCSI_PCI) += vmw_pvscsi.o
 common-obj-$(CONFIG_ESP) += esp.o
diff --git a/hw/scsi/mpi.h b/hw/scsi/mpi.h
new file mode 100644
index 0000000..0568e19
--- /dev/null
+++ b/hw/scsi/mpi.h
@@ -0,0 +1,1153 @@
+/*-
+ * Based on FreeBSD sys/dev/mpt/mpilib headers.
+ *
+ * Copyright (c) 2000-2010, LSI Logic Corporation and its contributors.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon including
+ *    a substantially similar Disclaimer requirement for further binary
+ *    redistribution.
+ * 3. Neither the name of the LSI Logic Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF THE COPYRIGHT
+ * OWNER OR CONTRIBUTOR IS ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef MPI_H
+#define MPI_H
+
+enum {
+    MPI_FUNCTION_SCSI_IO_REQUEST                = 0x00,
+    MPI_FUNCTION_SCSI_TASK_MGMT                 = 0x01,
+    MPI_FUNCTION_IOC_INIT                       = 0x02,
+    MPI_FUNCTION_IOC_FACTS                      = 0x03,
+    MPI_FUNCTION_CONFIG                         = 0x04,
+    MPI_FUNCTION_PORT_FACTS                     = 0x05,
+    MPI_FUNCTION_PORT_ENABLE                    = 0x06,
+    MPI_FUNCTION_EVENT_NOTIFICATION             = 0x07,
+    MPI_FUNCTION_EVENT_ACK                      = 0x08,
+    MPI_FUNCTION_FW_DOWNLOAD                    = 0x09,
+    MPI_FUNCTION_TARGET_CMD_BUFFER_POST         = 0x0A,
+    MPI_FUNCTION_TARGET_ASSIST                  = 0x0B,
+    MPI_FUNCTION_TARGET_STATUS_SEND             = 0x0C,
+    MPI_FUNCTION_TARGET_MODE_ABORT              = 0x0D,
+    MPI_FUNCTION_FC_LINK_SRVC_BUF_POST          = 0x0E,
+    MPI_FUNCTION_FC_LINK_SRVC_RSP               = 0x0F,
+    MPI_FUNCTION_FC_EX_LINK_SRVC_SEND           = 0x10,
+    MPI_FUNCTION_FC_ABORT                       = 0x11,
+    MPI_FUNCTION_FW_UPLOAD                      = 0x12,
+    MPI_FUNCTION_FC_COMMON_TRANSPORT_SEND       = 0x13,
+    MPI_FUNCTION_FC_PRIMITIVE_SEND              = 0x14,
+
+    MPI_FUNCTION_RAID_ACTION                    = 0x15,
+    MPI_FUNCTION_RAID_SCSI_IO_PASSTHROUGH       = 0x16,
+
+    MPI_FUNCTION_TOOLBOX                        = 0x17,
+
+    MPI_FUNCTION_SCSI_ENCLOSURE_PROCESSOR       = 0x18,
+
+    MPI_FUNCTION_MAILBOX                        = 0x19,
+
+    MPI_FUNCTION_SMP_PASSTHROUGH                = 0x1A,
+    MPI_FUNCTION_SAS_IO_UNIT_CONTROL            = 0x1B,
+    MPI_FUNCTION_SATA_PASSTHROUGH               = 0x1C,
+
+    MPI_FUNCTION_DIAG_BUFFER_POST               = 0x1D,
+    MPI_FUNCTION_DIAG_RELEASE                   = 0x1E,
+
+    MPI_FUNCTION_SCSI_IO_32                     = 0x1F,
+
+    MPI_FUNCTION_LAN_SEND                       = 0x20,
+    MPI_FUNCTION_LAN_RECEIVE                    = 0x21,
+    MPI_FUNCTION_LAN_RESET                      = 0x22,
+
+    MPI_FUNCTION_TARGET_ASSIST_EXTENDED         = 0x23,
+    MPI_FUNCTION_TARGET_CMD_BUF_BASE_POST       = 0x24,
+    MPI_FUNCTION_TARGET_CMD_BUF_LIST_POST       = 0x25,
+
+    MPI_FUNCTION_INBAND_BUFFER_POST             = 0x28,
+    MPI_FUNCTION_INBAND_SEND                    = 0x29,
+    MPI_FUNCTION_INBAND_RSP                     = 0x2A,
+    MPI_FUNCTION_INBAND_ABORT                   = 0x2B,
+
+    MPI_FUNCTION_IOC_MESSAGE_UNIT_RESET         = 0x40,
+    MPI_FUNCTION_IO_UNIT_RESET                  = 0x41,
+    MPI_FUNCTION_HANDSHAKE                      = 0x42,
+    MPI_FUNCTION_REPLY_FRAME_REMOVAL            = 0x43,
+    MPI_FUNCTION_HOST_PAGEBUF_ACCESS_CONTROL    = 0x44,
+};
+
+/****************************************************************************/
+/*  Registers                                                               */
+/****************************************************************************/
+
+enum {
+    MPI_IOC_STATE_RESET                 = 0x00000000,
+    MPI_IOC_STATE_READY                 = 0x10000000,
+    MPI_IOC_STATE_OPERATIONAL           = 0x20000000,
+    MPI_IOC_STATE_FAULT                 = 0x40000000,
+
+    MPI_DOORBELL_OFFSET                 = 0x00000000,
+    MPI_DOORBELL_ACTIVE                 = 0x08000000, /* DoorbellUsed */
+    MPI_DOORBELL_WHO_INIT_MASK          = 0x07000000,
+    MPI_DOORBELL_WHO_INIT_SHIFT         = 24,
+    MPI_DOORBELL_FUNCTION_MASK          = 0xFF000000,
+    MPI_DOORBELL_FUNCTION_SHIFT         = 24,
+    MPI_DOORBELL_ADD_DWORDS_MASK        = 0x00FF0000,
+    MPI_DOORBELL_ADD_DWORDS_SHIFT       = 16,
+    MPI_DOORBELL_DATA_MASK              = 0x0000FFFF,
+    MPI_DOORBELL_FUNCTION_SPECIFIC_MASK = 0x0000FFFF,
+
+    MPI_DB_HPBAC_VALUE_MASK             = 0x0000F000,
+    MPI_DB_HPBAC_ENABLE_ACCESS          = 0x01,
+    MPI_DB_HPBAC_DISABLE_ACCESS         = 0x02,
+    MPI_DB_HPBAC_FREE_BUFFER            = 0x03,
+
+    MPI_WRITE_SEQUENCE_OFFSET           = 0x00000004,
+    MPI_WRSEQ_KEY_VALUE_MASK            = 0x0000000F,
+    MPI_WRSEQ_1ST_KEY_VALUE             = 0x04,
+    MPI_WRSEQ_2ND_KEY_VALUE             = 0x0B,
+    MPI_WRSEQ_3RD_KEY_VALUE             = 0x02,
+    MPI_WRSEQ_4TH_KEY_VALUE             = 0x07,
+    MPI_WRSEQ_5TH_KEY_VALUE             = 0x0D,
+
+    MPI_DIAGNOSTIC_OFFSET               = 0x00000008,
+    MPI_DIAG_CLEAR_FLASH_BAD_SIG        = 0x00000400,
+    MPI_DIAG_PREVENT_IOC_BOOT           = 0x00000200,
+    MPI_DIAG_DRWE                       = 0x00000080,
+    MPI_DIAG_FLASH_BAD_SIG              = 0x00000040,
+    MPI_DIAG_RESET_HISTORY              = 0x00000020,
+    MPI_DIAG_RW_ENABLE                  = 0x00000010,
+    MPI_DIAG_RESET_ADAPTER              = 0x00000004,
+    MPI_DIAG_DISABLE_ARM                = 0x00000002,
+    MPI_DIAG_MEM_ENABLE                 = 0x00000001,
+
+    MPI_TEST_BASE_ADDRESS_OFFSET        = 0x0000000C,
+
+    MPI_DIAG_RW_DATA_OFFSET             = 0x00000010,
+
+    MPI_DIAG_RW_ADDRESS_OFFSET          = 0x00000014,
+
+    MPI_HOST_INTERRUPT_STATUS_OFFSET    = 0x00000030,
+    MPI_HIS_IOP_DOORBELL_STATUS         = 0x80000000,
+    MPI_HIS_REPLY_MESSAGE_INTERRUPT     = 0x00000008,
+    MPI_HIS_DOORBELL_INTERRUPT          = 0x00000001,
+
+    MPI_HOST_INTERRUPT_MASK_OFFSET      = 0x00000034,
+    MPI_HIM_RIM                         = 0x00000008,
+    MPI_HIM_DIM                         = 0x00000001,
+
+    MPI_REQUEST_QUEUE_OFFSET            = 0x00000040,
+    MPI_REQUEST_POST_FIFO_OFFSET        = 0x00000040,
+
+    MPI_REPLY_QUEUE_OFFSET              = 0x00000044,
+    MPI_REPLY_POST_FIFO_OFFSET          = 0x00000044,
+    MPI_REPLY_FREE_FIFO_OFFSET          = 0x00000044,
+
+    MPI_HI_PRI_REQUEST_QUEUE_OFFSET     = 0x00000048,
+};
+
+#define MPI_ADDRESS_REPLY_A_BIT          0x80000000
+
+/****************************************************************************/
+/*  Scatter/gather elements                                                 */
+/****************************************************************************/
+
+typedef struct MPISGEntry {
+    uint32_t                FlagsLength;
+    union
+    {
+        uint32_t            Address32;
+        uint64_t            Address64;
+    } u;
+} QEMU_PACKED MPISGEntry;
+
+/* Flags field bit definitions */
+
+enum {
+    MPI_SGE_FLAGS_LAST_ELEMENT              = 0x80000000,
+    MPI_SGE_FLAGS_END_OF_BUFFER             = 0x40000000,
+    MPI_SGE_FLAGS_ELEMENT_TYPE_MASK         = 0x30000000,
+    MPI_SGE_FLAGS_LOCAL_ADDRESS             = 0x08000000,
+    MPI_SGE_FLAGS_DIRECTION                 = 0x04000000,
+    MPI_SGE_FLAGS_64_BIT_ADDRESSING         = 0x02000000,
+    MPI_SGE_FLAGS_END_OF_LIST               = 0x01000000,
+
+    MPI_SGE_LENGTH_MASK                     = 0x00FFFFFF,
+    MPI_SGE_CHAIN_LENGTH_MASK               = 0x0000FFFF,
+
+    MPI_SGE_FLAGS_TRANSACTION_ELEMENT       = 0x00000000,
+    MPI_SGE_FLAGS_SIMPLE_ELEMENT            = 0x10000000,
+    MPI_SGE_FLAGS_CHAIN_ELEMENT             = 0x30000000,
+
+    /* Direction */
+
+    MPI_SGE_FLAGS_IOC_TO_HOST               = 0x00000000,
+    MPI_SGE_FLAGS_HOST_TO_IOC               = 0x04000000,
+
+    MPI_SGE_CHAIN_OFFSET_MASK               = 0x00FF0000,
+};
+
+#define MPI_SGE_CHAIN_OFFSET_SHIFT 16
+
+/****************************************************************************/
+/* Standard message request header for all request messages                 */
+/****************************************************************************/
+
+typedef struct MPIRequestHeader {
+    uint8_t                 Reserved[2];      /* function specific */
+    uint8_t                 ChainOffset;
+    uint8_t                 Function;
+    uint8_t                 Reserved1[3];     /* function specific */
+    uint8_t                 MsgFlags;
+    uint32_t                MsgContext;
+} QEMU_PACKED MPIRequestHeader;
+
+
+typedef struct MPIDefaultReply {
+    uint8_t                 Reserved[2];      /* function specific */
+    uint8_t                 MsgLength;
+    uint8_t                 Function;
+    uint8_t                 Reserved1[3];     /* function specific */
+    uint8_t                 MsgFlags;
+    uint32_t                MsgContext;
+    uint8_t                 Reserved2[2];     /* function specific */
+    uint16_t                IOCStatus;
+    uint32_t                IOCLogInfo;
+} QEMU_PACKED MPIDefaultReply;
+
+/* MsgFlags definition for all replies */
+
+#define MPI_MSGFLAGS_CONTINUATION_REPLY         (0x80)
+
+enum {
+
+    /************************************************************************/
+    /*  Common IOCStatus values for all replies                             */
+    /************************************************************************/
+
+    MPI_IOCSTATUS_SUCCESS                   = 0x0000,
+    MPI_IOCSTATUS_INVALID_FUNCTION          = 0x0001,
+    MPI_IOCSTATUS_BUSY                      = 0x0002,
+    MPI_IOCSTATUS_INVALID_SGL               = 0x0003,
+    MPI_IOCSTATUS_INTERNAL_ERROR            = 0x0004,
+    MPI_IOCSTATUS_RESERVED                  = 0x0005,
+    MPI_IOCSTATUS_INSUFFICIENT_RESOURCES    = 0x0006,
+    MPI_IOCSTATUS_INVALID_FIELD             = 0x0007,
+    MPI_IOCSTATUS_INVALID_STATE             = 0x0008,
+    MPI_IOCSTATUS_OP_STATE_NOT_SUPPORTED    = 0x0009,
+
+    /************************************************************************/
+    /*  Config IOCStatus values                                             */
+    /************************************************************************/
+
+    MPI_IOCSTATUS_CONFIG_INVALID_ACTION     = 0x0020,
+    MPI_IOCSTATUS_CONFIG_INVALID_TYPE       = 0x0021,
+    MPI_IOCSTATUS_CONFIG_INVALID_PAGE       = 0x0022,
+    MPI_IOCSTATUS_CONFIG_INVALID_DATA       = 0x0023,
+    MPI_IOCSTATUS_CONFIG_NO_DEFAULTS        = 0x0024,
+    MPI_IOCSTATUS_CONFIG_CANT_COMMIT        = 0x0025,
+
+    /************************************************************************/
+    /*  SCSIIO Reply (SPI & FCP) initiator values                            */
+    /************************************************************************/
+
+    MPI_IOCSTATUS_SCSI_RECOVERED_ERROR      = 0x0040,
+    MPI_IOCSTATUS_SCSI_INVALID_BUS          = 0x0041,
+    MPI_IOCSTATUS_SCSI_INVALID_TARGETID     = 0x0042,
+    MPI_IOCSTATUS_SCSI_DEVICE_NOT_THERE     = 0x0043,
+    MPI_IOCSTATUS_SCSI_DATA_OVERRUN         = 0x0044,
+    MPI_IOCSTATUS_SCSI_DATA_UNDERRUN        = 0x0045,
+    MPI_IOCSTATUS_SCSI_IO_DATA_ERROR        = 0x0046,
+    MPI_IOCSTATUS_SCSI_PROTOCOL_ERROR       = 0x0047,
+    MPI_IOCSTATUS_SCSI_TASK_TERMINATED      = 0x0048,
+    MPI_IOCSTATUS_SCSI_RESIDUAL_MISMATCH    = 0x0049,
+    MPI_IOCSTATUS_SCSI_TASK_MGMT_FAILED     = 0x004A,
+    MPI_IOCSTATUS_SCSI_IOC_TERMINATED       = 0x004B,
+    MPI_IOCSTATUS_SCSI_EXT_TERMINATED       = 0x004C,
+
+    /************************************************************************/
+    /*  For use by SCSI Initiator and SCSI Target end-to-end data protection*/
+    /************************************************************************/
+
+    MPI_IOCSTATUS_EEDP_GUARD_ERROR          = 0x004D,
+    MPI_IOCSTATUS_EEDP_REF_TAG_ERROR        = 0x004E,
+    MPI_IOCSTATUS_EEDP_APP_TAG_ERROR        = 0x004F,
+
+    /************************************************************************/
+    /*  SCSI Target values                                                  */
+    /************************************************************************/
+
+    MPI_IOCSTATUS_TARGET_PRIORITY_IO         = 0x0060,
+    MPI_IOCSTATUS_TARGET_INVALID_PORT        = 0x0061,
+    MPI_IOCSTATUS_TARGET_INVALID_IO_INDEX    = 0x0062,
+    MPI_IOCSTATUS_TARGET_ABORTED             = 0x0063,
+    MPI_IOCSTATUS_TARGET_NO_CONN_RETRYABLE   = 0x0064,
+    MPI_IOCSTATUS_TARGET_NO_CONNECTION       = 0x0065,
+    MPI_IOCSTATUS_TARGET_XFER_COUNT_MISMATCH = 0x006A,
+    MPI_IOCSTATUS_TARGET_STS_DATA_NOT_SENT   = 0x006B,
+    MPI_IOCSTATUS_TARGET_DATA_OFFSET_ERROR   = 0x006D,
+    MPI_IOCSTATUS_TARGET_TOO_MUCH_WRITE_DATA = 0x006E,
+    MPI_IOCSTATUS_TARGET_IU_TOO_SHORT        = 0x006F,
+    MPI_IOCSTATUS_TARGET_ACK_NAK_TIMEOUT     = 0x0070,
+    MPI_IOCSTATUS_TARGET_NAK_RECEIVED        = 0x0071,
+
+    /************************************************************************/
+    /*  Fibre Channel Direct Access values                                  */
+    /************************************************************************/
+
+    MPI_IOCSTATUS_FC_ABORTED                = 0x0066,
+    MPI_IOCSTATUS_FC_RX_ID_INVALID          = 0x0067,
+    MPI_IOCSTATUS_FC_DID_INVALID            = 0x0068,
+    MPI_IOCSTATUS_FC_NODE_LOGGED_OUT        = 0x0069,
+    MPI_IOCSTATUS_FC_EXCHANGE_CANCELED      = 0x006C,
+
+    /************************************************************************/
+    /*  LAN values                                                          */
+    /************************************************************************/
+
+    MPI_IOCSTATUS_LAN_DEVICE_NOT_FOUND      = 0x0080,
+    MPI_IOCSTATUS_LAN_DEVICE_FAILURE        = 0x0081,
+    MPI_IOCSTATUS_LAN_TRANSMIT_ERROR        = 0x0082,
+    MPI_IOCSTATUS_LAN_TRANSMIT_ABORTED      = 0x0083,
+    MPI_IOCSTATUS_LAN_RECEIVE_ERROR         = 0x0084,
+    MPI_IOCSTATUS_LAN_RECEIVE_ABORTED       = 0x0085,
+    MPI_IOCSTATUS_LAN_PARTIAL_PACKET        = 0x0086,
+    MPI_IOCSTATUS_LAN_CANCELED              = 0x0087,
+
+    /************************************************************************/
+    /*  Serial Attached SCSI values                                         */
+    /************************************************************************/
+
+    MPI_IOCSTATUS_SAS_SMP_REQUEST_FAILED    = 0x0090,
+    MPI_IOCSTATUS_SAS_SMP_DATA_OVERRUN      = 0x0091,
+
+    /************************************************************************/
+    /*  Inband values                                                       */
+    /************************************************************************/
+
+    MPI_IOCSTATUS_INBAND_ABORTED            = 0x0098,
+    MPI_IOCSTATUS_INBAND_NO_CONNECTION      = 0x0099,
+
+    /************************************************************************/
+    /*  Diagnostic Tools values                                             */
+    /************************************************************************/
+
+    MPI_IOCSTATUS_DIAGNOSTIC_RELEASED       = 0x00A0,
+
+    /************************************************************************/
+    /*  IOCStatus flag to indicate that log info is available               */
+    /************************************************************************/
+
+    MPI_IOCSTATUS_FLAG_LOG_INFO_AVAILABLE   = 0x8000,
+    MPI_IOCSTATUS_MASK                      = 0x7FFF,
+
+    /************************************************************************/
+    /*  LogInfo Types                                                       */
+    /************************************************************************/
+
+    MPI_IOCLOGINFO_TYPE_MASK                = 0xF0000000,
+    MPI_IOCLOGINFO_TYPE_SHIFT               = 28,
+    MPI_IOCLOGINFO_TYPE_NONE                = 0x0,
+    MPI_IOCLOGINFO_TYPE_SCSI                = 0x1,
+    MPI_IOCLOGINFO_TYPE_FC                  = 0x2,
+    MPI_IOCLOGINFO_TYPE_SAS                 = 0x3,
+    MPI_IOCLOGINFO_TYPE_ISCSI               = 0x4,
+    MPI_IOCLOGINFO_LOG_DATA_MASK            = 0x0FFFFFFF,
+};
+
+/****************************************************************************/
+/*  SCSI IO messages and associated structures                              */
+/****************************************************************************/
+
+typedef struct MPIMsgSCSIIORequest {
+    uint8_t                 TargetID;           /* 00h */
+    uint8_t                 Bus;                /* 01h */
+    uint8_t                 ChainOffset;        /* 02h */
+    uint8_t                 Function;           /* 03h */
+    uint8_t                 CDBLength;          /* 04h */
+    uint8_t                 SenseBufferLength;  /* 05h */
+    uint8_t                 Reserved;           /* 06h */
+    uint8_t                 MsgFlags;           /* 07h */
+    uint32_t                MsgContext;         /* 08h */
+    uint8_t                 LUN[8];             /* 0Ch */
+    uint32_t                Control;            /* 14h */
+    uint8_t                 CDB[16];            /* 18h */
+    uint32_t                DataLength;         /* 28h */
+    uint32_t                SenseBufferLowAddr; /* 2Ch */
+} QEMU_PACKED MPIMsgSCSIIORequest;
+
+/* SCSI IO MsgFlags bits */
+
+#define MPI_SCSIIO_MSGFLGS_SENSE_WIDTH              (0x01)
+#define MPI_SCSIIO_MSGFLGS_SENSE_WIDTH_32           (0x00)
+#define MPI_SCSIIO_MSGFLGS_SENSE_WIDTH_64           (0x01)
+
+#define MPI_SCSIIO_MSGFLGS_SENSE_LOCATION           (0x02)
+#define MPI_SCSIIO_MSGFLGS_SENSE_LOC_HOST           (0x00)
+#define MPI_SCSIIO_MSGFLGS_SENSE_LOC_IOC            (0x02)
+
+#define MPI_SCSIIO_MSGFLGS_CMD_DETERMINES_DATA_DIR  (0x04)
+
+/* SCSI IO LUN fields */
+
+#define MPI_SCSIIO_LUN_FIRST_LEVEL_ADDRESSING   (0x0000FFFF)
+#define MPI_SCSIIO_LUN_SECOND_LEVEL_ADDRESSING  (0xFFFF0000)
+#define MPI_SCSIIO_LUN_THIRD_LEVEL_ADDRESSING   (0x0000FFFF)
+#define MPI_SCSIIO_LUN_FOURTH_LEVEL_ADDRESSING  (0xFFFF0000)
+#define MPI_SCSIIO_LUN_LEVEL_1_WORD             (0xFF00)
+#define MPI_SCSIIO_LUN_LEVEL_1_DWORD            (0x0000FF00)
+
+/* SCSI IO Control bits */
+
+#define MPI_SCSIIO_CONTROL_DATADIRECTION_MASK   (0x03000000)
+#define MPI_SCSIIO_CONTROL_NODATATRANSFER       (0x00000000)
+#define MPI_SCSIIO_CONTROL_WRITE                (0x01000000)
+#define MPI_SCSIIO_CONTROL_READ                 (0x02000000)
+
+#define MPI_SCSIIO_CONTROL_ADDCDBLEN_MASK       (0x3C000000)
+#define MPI_SCSIIO_CONTROL_ADDCDBLEN_SHIFT      (26)
+
+#define MPI_SCSIIO_CONTROL_TASKATTRIBUTE_MASK   (0x00000700)
+#define MPI_SCSIIO_CONTROL_SIMPLEQ              (0x00000000)
+#define MPI_SCSIIO_CONTROL_HEADOFQ              (0x00000100)
+#define MPI_SCSIIO_CONTROL_ORDEREDQ             (0x00000200)
+#define MPI_SCSIIO_CONTROL_ACAQ                 (0x00000400)
+#define MPI_SCSIIO_CONTROL_UNTAGGED             (0x00000500)
+#define MPI_SCSIIO_CONTROL_NO_DISCONNECT        (0x00000700)
+
+#define MPI_SCSIIO_CONTROL_TASKMANAGE_MASK      (0x00FF0000)
+#define MPI_SCSIIO_CONTROL_OBSOLETE             (0x00800000)
+#define MPI_SCSIIO_CONTROL_CLEAR_ACA_RSV        (0x00400000)
+#define MPI_SCSIIO_CONTROL_TARGET_RESET         (0x00200000)
+#define MPI_SCSIIO_CONTROL_LUN_RESET_RSV        (0x00100000)
+#define MPI_SCSIIO_CONTROL_RESERVED             (0x00080000)
+#define MPI_SCSIIO_CONTROL_CLR_TASK_SET_RSV     (0x00040000)
+#define MPI_SCSIIO_CONTROL_ABORT_TASK_SET       (0x00020000)
+#define MPI_SCSIIO_CONTROL_RESERVED2            (0x00010000)
+
+/* SCSI IO reply structure */
+typedef struct MPIMsgSCSIIOReply
+{
+    uint8_t                 TargetID;           /* 00h */
+    uint8_t                 Bus;                /* 01h */
+    uint8_t                 MsgLength;          /* 02h */
+    uint8_t                 Function;           /* 03h */
+    uint8_t                 CDBLength;          /* 04h */
+    uint8_t                 SenseBufferLength;  /* 05h */
+    uint8_t                 Reserved;           /* 06h */
+    uint8_t                 MsgFlags;           /* 07h */
+    uint32_t                MsgContext;         /* 08h */
+    uint8_t                 SCSIStatus;         /* 0Ch */
+    uint8_t                 SCSIState;          /* 0Dh */
+    uint16_t                IOCStatus;          /* 0Eh */
+    uint32_t                IOCLogInfo;         /* 10h */
+    uint32_t                TransferCount;      /* 14h */
+    uint32_t                SenseCount;         /* 18h */
+    uint32_t                ResponseInfo;       /* 1Ch */
+    uint16_t                TaskTag;            /* 20h */
+    uint16_t                Reserved1;          /* 22h */
+} QEMU_PACKED MPIMsgSCSIIOReply;
+
+/* SCSI IO Reply SCSIStatus values (SAM-2 status codes) */
+
+#define MPI_SCSI_STATUS_SUCCESS                 (0x00)
+#define MPI_SCSI_STATUS_CHECK_CONDITION         (0x02)
+#define MPI_SCSI_STATUS_CONDITION_MET           (0x04)
+#define MPI_SCSI_STATUS_BUSY                    (0x08)
+#define MPI_SCSI_STATUS_INTERMEDIATE            (0x10)
+#define MPI_SCSI_STATUS_INTERMEDIATE_CONDMET    (0x14)
+#define MPI_SCSI_STATUS_RESERVATION_CONFLICT    (0x18)
+#define MPI_SCSI_STATUS_COMMAND_TERMINATED      (0x22)
+#define MPI_SCSI_STATUS_TASK_SET_FULL           (0x28)
+#define MPI_SCSI_STATUS_ACA_ACTIVE              (0x30)
+
+#define MPI_SCSI_STATUS_FCPEXT_DEVICE_LOGGED_OUT    (0x80)
+#define MPI_SCSI_STATUS_FCPEXT_NO_LINK              (0x81)
+#define MPI_SCSI_STATUS_FCPEXT_UNASSIGNED           (0x82)
+
+
+/* SCSI IO Reply SCSIState values */
+
+#define MPI_SCSI_STATE_AUTOSENSE_VALID          (0x01)
+#define MPI_SCSI_STATE_AUTOSENSE_FAILED         (0x02)
+#define MPI_SCSI_STATE_NO_SCSI_STATUS           (0x04)
+#define MPI_SCSI_STATE_TERMINATED               (0x08)
+#define MPI_SCSI_STATE_RESPONSE_INFO_VALID      (0x10)
+#define MPI_SCSI_STATE_QUEUE_TAG_REJECTED       (0x20)
+
+/* SCSI IO Reply ResponseInfo values */
+/* (FCP-1 RSP_CODE values and SPI-3 Packetized Failure codes) */
+
+#define MPI_SCSI_RSP_INFO_FUNCTION_COMPLETE     (0x00000000)
+#define MPI_SCSI_RSP_INFO_FCP_BURST_LEN_ERROR   (0x01000000)
+#define MPI_SCSI_RSP_INFO_CMND_FIELDS_INVALID   (0x02000000)
+#define MPI_SCSI_RSP_INFO_FCP_DATA_RO_ERROR     (0x03000000)
+#define MPI_SCSI_RSP_INFO_TASK_MGMT_UNSUPPORTED (0x04000000)
+#define MPI_SCSI_RSP_INFO_TASK_MGMT_FAILED      (0x05000000)
+#define MPI_SCSI_RSP_INFO_SPI_LQ_INVALID_TYPE   (0x06000000)
+
+#define MPI_SCSI_TASKTAG_UNKNOWN                (0xFFFF)
+
+
+/****************************************************************************/
+/*  SCSI Task Management messages                                           */
+/****************************************************************************/
+
+typedef struct MPIMsgSCSITaskMgmt {
+    uint8_t                 TargetID;           /* 00h */
+    uint8_t                 Bus;                /* 01h */
+    uint8_t                 ChainOffset;        /* 02h */
+    uint8_t                 Function;           /* 03h */
+    uint8_t                 Reserved;           /* 04h */
+    uint8_t                 TaskType;           /* 05h */
+    uint8_t                 Reserved1;          /* 06h */
+    uint8_t                 MsgFlags;           /* 07h */
+    uint32_t                MsgContext;         /* 08h */
+    uint8_t                 LUN[8];             /* 0Ch */
+    uint32_t                Reserved2[7];       /* 14h */
+    uint32_t                TaskMsgContext;     /* 30h */
+} QEMU_PACKED MPIMsgSCSITaskMgmt;
+
+enum {
+    /* TaskType values */
+
+    MPI_SCSITASKMGMT_TASKTYPE_ABORT_TASK            = 0x01,
+    MPI_SCSITASKMGMT_TASKTYPE_ABRT_TASK_SET         = 0x02,
+    MPI_SCSITASKMGMT_TASKTYPE_TARGET_RESET          = 0x03,
+    MPI_SCSITASKMGMT_TASKTYPE_RESET_BUS             = 0x04,
+    MPI_SCSITASKMGMT_TASKTYPE_LOGICAL_UNIT_RESET    = 0x05,
+    MPI_SCSITASKMGMT_TASKTYPE_CLEAR_TASK_SET        = 0x06,
+    MPI_SCSITASKMGMT_TASKTYPE_QUERY_TASK            = 0x07,
+    MPI_SCSITASKMGMT_TASKTYPE_CLR_ACA               = 0x08,
+
+    /* MsgFlags bits */
+
+    MPI_SCSITASKMGMT_MSGFLAGS_DO_NOT_SEND_TASK_IU   = 0x01,
+
+    MPI_SCSITASKMGMT_MSGFLAGS_TARGET_RESET_OPTION   = 0x00,
+    MPI_SCSITASKMGMT_MSGFLAGS_LIP_RESET_OPTION      = 0x02,
+    MPI_SCSITASKMGMT_MSGFLAGS_LIPRESET_RESET_OPTION = 0x04,
+
+    MPI_SCSITASKMGMT_MSGFLAGS_SOFT_RESET_OPTION     = 0x08,
+};
+
+/* SCSI Task Management Reply */
+typedef struct MPIMsgSCSITaskMgmtReply {
+    uint8_t                 TargetID;           /* 00h */
+    uint8_t                 Bus;                /* 01h */
+    uint8_t                 MsgLength;          /* 02h */
+    uint8_t                 Function;           /* 03h */
+    uint8_t                 ResponseCode;       /* 04h */
+    uint8_t                 TaskType;           /* 05h */
+    uint8_t                 Reserved1;          /* 06h */
+    uint8_t                 MsgFlags;           /* 07h */
+    uint32_t                MsgContext;         /* 08h */
+    uint8_t                 Reserved2[2];       /* 0Ch */
+    uint16_t                IOCStatus;          /* 0Eh */
+    uint32_t                IOCLogInfo;         /* 10h */
+    uint32_t                TerminationCount;   /* 14h */
+} QEMU_PACKED MPIMsgSCSITaskMgmtReply;
+
+/* ResponseCode values */
+enum {
+    MPI_SCSITASKMGMT_RSP_TM_COMPLETE                = 0x00,
+    MPI_SCSITASKMGMT_RSP_INVALID_FRAME              = 0x02,
+    MPI_SCSITASKMGMT_RSP_TM_NOT_SUPPORTED           = 0x04,
+    MPI_SCSITASKMGMT_RSP_TM_FAILED                  = 0x05,
+    MPI_SCSITASKMGMT_RSP_TM_SUCCEEDED               = 0x08,
+    MPI_SCSITASKMGMT_RSP_TM_INVALID_LUN             = 0x09,
+    MPI_SCSITASKMGMT_RSP_IO_QUEUED_ON_IOC           = 0x80,
+};
+
+/****************************************************************************/
+/*  IOCInit message                                                         */
+/****************************************************************************/
+
+typedef struct MPIMsgIOCInit {
+    uint8_t                 WhoInit;                    /* 00h */
+    uint8_t                 Reserved;                   /* 01h */
+    uint8_t                 ChainOffset;                /* 02h */
+    uint8_t                 Function;                   /* 03h */
+    uint8_t                 Flags;                      /* 04h */
+    uint8_t                 MaxDevices;                 /* 05h */
+    uint8_t                 MaxBuses;                   /* 06h */
+    uint8_t                 MsgFlags;                   /* 07h */
+    uint32_t                MsgContext;                 /* 08h */
+    uint16_t                ReplyFrameSize;             /* 0Ch */
+    uint8_t                 Reserved1[2];               /* 0Eh */
+    uint32_t                HostMfaHighAddr;            /* 10h */
+    uint32_t                SenseBufferHighAddr;        /* 14h */
+    uint32_t                ReplyFifoHostSignalingAddr; /* 18h */
+    MPISGEntry              HostPageBufferSGE;          /* 1Ch */
+    uint16_t                MsgVersion;                 /* 28h */
+    uint16_t                HeaderVersion;              /* 2Ah */
+} QEMU_PACKED MPIMsgIOCInit;
+
+enum {
+    /* WhoInit values */
+
+    MPI_WHOINIT_NO_ONE                              = 0x00,
+    MPI_WHOINIT_SYSTEM_BIOS                         = 0x01,
+    MPI_WHOINIT_ROM_BIOS                            = 0x02,
+    MPI_WHOINIT_PCI_PEER                            = 0x03,
+    MPI_WHOINIT_HOST_DRIVER                         = 0x04,
+    MPI_WHOINIT_MANUFACTURER                        = 0x05,
+
+    /* Flags values */
+
+    MPI_IOCINIT_FLAGS_HOST_PAGE_BUFFER_PERSISTENT   = 0x04,
+    MPI_IOCINIT_FLAGS_REPLY_FIFO_HOST_SIGNAL        = 0x02,
+    MPI_IOCINIT_FLAGS_DISCARD_FW_IMAGE              = 0x01,
+
+    /* MsgVersion */
+
+    MPI_IOCINIT_MSGVERSION_MAJOR_MASK               = 0xFF00,
+    MPI_IOCINIT_MSGVERSION_MAJOR_SHIFT              = 8,
+    MPI_IOCINIT_MSGVERSION_MINOR_MASK               = 0x00FF,
+    MPI_IOCINIT_MSGVERSION_MINOR_SHIFT              = 0,
+
+    /* HeaderVersion */
+
+    MPI_IOCINIT_HEADERVERSION_UNIT_MASK             = 0xFF00,
+    MPI_IOCINIT_HEADERVERSION_UNIT_SHIFT            = 8,
+    MPI_IOCINIT_HEADERVERSION_DEV_MASK              = 0x00FF,
+    MPI_IOCINIT_HEADERVERSION_DEV_SHIFT             = 0,
+};
+
+typedef struct MPIMsgIOCInitReply {
+    uint8_t                 WhoInit;                    /* 00h */
+    uint8_t                 Reserved;                   /* 01h */
+    uint8_t                 MsgLength;                  /* 02h */
+    uint8_t                 Function;                   /* 03h */
+    uint8_t                 Flags;                      /* 04h */
+    uint8_t                 MaxDevices;                 /* 05h */
+    uint8_t                 MaxBuses;                   /* 06h */
+    uint8_t                 MsgFlags;                   /* 07h */
+    uint32_t                MsgContext;                 /* 08h */
+    uint16_t                Reserved2;                  /* 0Ch */
+    uint16_t                IOCStatus;                  /* 0Eh */
+    uint32_t                IOCLogInfo;                 /* 10h */
+} QEMU_PACKED MPIMsgIOCInitReply;
+
+
+
+/****************************************************************************/
+/*  IOC Facts message                                                       */
+/****************************************************************************/
+
+typedef struct MPIMsgIOCFacts {
+    uint8_t                 Reserved[2];                /* 00h */
+    uint8_t                 ChainOffset;                /* 01h */
+    uint8_t                 Function;                   /* 02h */
+    uint8_t                 Reserved1[3];               /* 03h */
+    uint8_t                 MsgFlags;                   /* 04h */
+    uint32_t                MsgContext;                 /* 08h */
+} QEMU_PACKED MPIMsgIOCFacts;
+
+/* IOC Facts Reply */
+typedef struct MPIMsgIOCFactsReply {
+    uint16_t                MsgVersion;                 /* 00h */
+    uint8_t                 MsgLength;                  /* 02h */
+    uint8_t                 Function;                   /* 03h */
+    uint16_t                HeaderVersion;              /* 04h */
+    uint8_t                 IOCNumber;                  /* 06h */
+    uint8_t                 MsgFlags;                   /* 07h */
+    uint32_t                MsgContext;                 /* 08h */
+    uint16_t                IOCExceptions;              /* 0Ch */
+    uint16_t                IOCStatus;                  /* 0Eh */
+    uint32_t                IOCLogInfo;                 /* 10h */
+    uint8_t                 MaxChainDepth;              /* 14h */
+    uint8_t                 WhoInit;                    /* 15h */
+    uint8_t                 BlockSize;                  /* 16h */
+    uint8_t                 Flags;                      /* 17h */
+    uint16_t                ReplyQueueDepth;            /* 18h */
+    uint16_t                RequestFrameSize;           /* 1Ah */
+    uint16_t                Reserved_0101_FWVersion;    /* 1Ch */ /* obsolete 16-bit FWVersion */
+    uint16_t                ProductID;                  /* 1Eh */
+    uint32_t                CurrentHostMfaHighAddr;     /* 20h */
+    uint16_t                GlobalCredits;              /* 24h */
+    uint8_t                 NumberOfPorts;              /* 26h */
+    uint8_t                 EventState;                 /* 27h */
+    uint32_t                CurrentSenseBufferHighAddr; /* 28h */
+    uint16_t                CurReplyFrameSize;          /* 2Ch */
+    uint8_t                 MaxDevices;                 /* 2Eh */
+    uint8_t                 MaxBuses;                   /* 2Fh */
+    uint32_t                FWImageSize;                /* 30h */
+    uint32_t                IOCCapabilities;            /* 34h */
+    uint8_t                 FWVersionDev;               /* 38h */
+    uint8_t                 FWVersionUnit;              /* 39h */
+    uint8_t                 FWVersionMinor;             /* 3Ah */
+    uint8_t                 FWVersionMajor;             /* 3Bh */
+    uint16_t                HighPriorityQueueDepth;     /* 3Ch */
+    uint16_t                Reserved2;                  /* 3Eh */
+    MPISGEntry              HostPageBufferSGE;          /* 40h */
+    uint32_t                ReplyFifoHostSignalingAddr; /* 4Ch */
+} QEMU_PACKED MPIMsgIOCFactsReply;
+
+enum {
+    MPI_IOCFACTS_MSGVERSION_MAJOR_MASK              = 0xFF00,
+    MPI_IOCFACTS_MSGVERSION_MAJOR_SHIFT             = 8,
+    MPI_IOCFACTS_MSGVERSION_MINOR_MASK              = 0x00FF,
+    MPI_IOCFACTS_MSGVERSION_MINOR_SHIFT             = 0,
+
+    MPI_IOCFACTS_HDRVERSION_UNIT_MASK               = 0xFF00,
+    MPI_IOCFACTS_HDRVERSION_UNIT_SHIFT              = 8,
+    MPI_IOCFACTS_HDRVERSION_DEV_MASK                = 0x00FF,
+    MPI_IOCFACTS_HDRVERSION_DEV_SHIFT               = 0,
+
+    MPI_IOCFACTS_EXCEPT_CONFIG_CHECKSUM_FAIL        = 0x0001,
+    MPI_IOCFACTS_EXCEPT_RAID_CONFIG_INVALID         = 0x0002,
+    MPI_IOCFACTS_EXCEPT_FW_CHECKSUM_FAIL            = 0x0004,
+    MPI_IOCFACTS_EXCEPT_PERSISTENT_TABLE_FULL       = 0x0008,
+    MPI_IOCFACTS_EXCEPT_METADATA_UNSUPPORTED        = 0x0010,
+
+    MPI_IOCFACTS_FLAGS_FW_DOWNLOAD_BOOT             = 0x01,
+    MPI_IOCFACTS_FLAGS_REPLY_FIFO_HOST_SIGNAL       = 0x02,
+    MPI_IOCFACTS_FLAGS_HOST_PAGE_BUFFER_PERSISTENT  = 0x04,
+
+    MPI_IOCFACTS_EVENTSTATE_DISABLED                = 0x00,
+    MPI_IOCFACTS_EVENTSTATE_ENABLED                 = 0x01,
+
+    MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q              = 0x00000001,
+    MPI_IOCFACTS_CAPABILITY_REPLY_HOST_SIGNAL       = 0x00000002,
+    MPI_IOCFACTS_CAPABILITY_QUEUE_FULL_HANDLING     = 0x00000004,
+    MPI_IOCFACTS_CAPABILITY_DIAG_TRACE_BUFFER       = 0x00000008,
+    MPI_IOCFACTS_CAPABILITY_SNAPSHOT_BUFFER         = 0x00000010,
+    MPI_IOCFACTS_CAPABILITY_EXTENDED_BUFFER         = 0x00000020,
+    MPI_IOCFACTS_CAPABILITY_EEDP                    = 0x00000040,
+    MPI_IOCFACTS_CAPABILITY_BIDIRECTIONAL           = 0x00000080,
+    MPI_IOCFACTS_CAPABILITY_MULTICAST               = 0x00000100,
+    MPI_IOCFACTS_CAPABILITY_SCSIIO32                = 0x00000200,
+    MPI_IOCFACTS_CAPABILITY_NO_SCSIIO16             = 0x00000400,
+    MPI_IOCFACTS_CAPABILITY_TLR                     = 0x00000800,
+};
+
+/****************************************************************************/
+/*  Port Facts message and Reply                                            */
+/****************************************************************************/
+
+typedef struct MPIMsgPortFacts {
+     uint8_t                Reserved[2];                /* 00h */
+     uint8_t                ChainOffset;                /* 02h */
+     uint8_t                Function;                   /* 03h */
+     uint8_t                Reserved1[2];               /* 04h */
+     uint8_t                PortNumber;                 /* 06h */
+     uint8_t                MsgFlags;                   /* 07h */
+     uint32_t               MsgContext;                 /* 08h */
+} QEMU_PACKED MPIMsgPortFacts;
+
+typedef struct MPIMsgPortFactsReply {
+     uint16_t               Reserved;                   /* 00h */
+     uint8_t                MsgLength;                  /* 02h */
+     uint8_t                Function;                   /* 03h */
+     uint16_t               Reserved1;                  /* 04h */
+     uint8_t                PortNumber;                 /* 06h */
+     uint8_t                MsgFlags;                   /* 07h */
+     uint32_t               MsgContext;                 /* 08h */
+     uint16_t               Reserved2;                  /* 0Ch */
+     uint16_t               IOCStatus;                  /* 0Eh */
+     uint32_t               IOCLogInfo;                 /* 10h */
+     uint8_t                Reserved3;                  /* 14h */
+     uint8_t                PortType;                   /* 15h */
+     uint16_t               MaxDevices;                 /* 16h */
+     uint16_t               PortSCSIID;                 /* 18h */
+     uint16_t               ProtocolFlags;              /* 1Ah */
+     uint16_t               MaxPostedCmdBuffers;        /* 1Ch */
+     uint16_t               MaxPersistentIDs;           /* 1Eh */
+     uint16_t               MaxLanBuckets;              /* 20h */
+     uint8_t                MaxInitiators;              /* 22h */
+     uint8_t                Reserved4;                  /* 23h */
+     uint32_t               Reserved5;                  /* 24h */
+} QEMU_PACKED MPIMsgPortFactsReply;
+
+
+enum {
+    /* PortTypes values */
+    MPI_PORTFACTS_PORTTYPE_INACTIVE         = 0x00,
+    MPI_PORTFACTS_PORTTYPE_SCSI             = 0x01,
+    MPI_PORTFACTS_PORTTYPE_FC               = 0x10,
+    MPI_PORTFACTS_PORTTYPE_ISCSI            = 0x20,
+    MPI_PORTFACTS_PORTTYPE_SAS              = 0x30,
+
+    /* ProtocolFlags values */
+    MPI_PORTFACTS_PROTOCOL_LOGBUSADDR       = 0x01,
+    MPI_PORTFACTS_PROTOCOL_LAN              = 0x02,
+    MPI_PORTFACTS_PROTOCOL_TARGET           = 0x04,
+    MPI_PORTFACTS_PROTOCOL_INITIATOR        = 0x08,
+};
+
+
+/****************************************************************************/
+/*  Port Enable Message                                                     */
+/****************************************************************************/
+
+typedef struct MPIMsgPortEnable {
+    uint8_t                 Reserved[2];                /* 00h */
+    uint8_t                 ChainOffset;                /* 02h */
+    uint8_t                 Function;                   /* 03h */
+    uint8_t                 Reserved1[2];               /* 04h */
+    uint8_t                 PortNumber;                 /* 06h */
+    uint8_t                 MsgFlags;                   /* 07h */
+    uint32_t                MsgContext;                 /* 08h */
+} QEMU_PACKED MPIMsgPortEnable;
+
+typedef struct MPIMsgPortEnableReply {
+    uint8_t                 Reserved[2];                /* 00h */
+    uint8_t                 MsgLength;                  /* 02h */
+    uint8_t                 Function;                   /* 03h */
+    uint8_t                 Reserved1[2];               /* 04h */
+    uint8_t                 PortNumber;                 /* 06h */
+    uint8_t                 MsgFlags;                   /* 07h */
+    uint32_t                MsgContext;                 /* 08h */
+    uint16_t                Reserved2;                  /* 0Ch */
+    uint16_t                IOCStatus;                  /* 0Eh */
+    uint32_t                IOCLogInfo;                 /* 10h */
+} QEMU_PACKED MPIMsgPortEnableReply;
+
+/****************************************************************************/
+/*  Event Notification messages                                             */
+/****************************************************************************/
+
+typedef struct MPIMsgEventNotify {
+    uint8_t                 Switch;                     /* 00h */
+    uint8_t                 Reserved;                   /* 01h */
+    uint8_t                 ChainOffset;                /* 02h */
+    uint8_t                 Function;                   /* 03h */
+    uint8_t                 Reserved1[3];               /* 04h */
+    uint8_t                 MsgFlags;                   /* 07h */
+    uint32_t                MsgContext;                 /* 08h */
+} QEMU_PACKED MPIMsgEventNotify;
+
+/* Event Notification Reply */
+
+typedef struct MPIMsgEventNotifyReply {
+     uint16_t               EventDataLength;            /* 00h */
+     uint8_t                MsgLength;                  /* 02h */
+     uint8_t                Function;                   /* 03h */
+     uint8_t                Reserved1[2];               /* 04h */
+     uint8_t                AckRequired;                /* 06h */
+     uint8_t                MsgFlags;                   /* 07h */
+     uint32_t               MsgContext;                 /* 08h */
+     uint8_t                Reserved2[2];               /* 0Ch */
+     uint16_t               IOCStatus;                  /* 0Eh */
+     uint32_t               IOCLogInfo;                 /* 10h */
+     uint32_t               Event;                      /* 14h */
+     uint32_t               EventContext;               /* 18h */
+     uint32_t               Data[1];                    /* 1Ch */
+} QEMU_PACKED MPIMsgEventNotifyReply;
+
+/* Event Acknowledge */
+
+typedef struct MPIMsgEventAck {
+    uint8_t                 Reserved[2];                /* 00h */
+    uint8_t                 ChainOffset;                /* 02h */
+    uint8_t                 Function;                   /* 03h */
+    uint8_t                 Reserved1[3];               /* 04h */
+    uint8_t                 MsgFlags;                   /* 07h */
+    uint32_t                MsgContext;                 /* 08h */
+    uint32_t                Event;                      /* 0Ch */
+    uint32_t                EventContext;               /* 10h */
+} QEMU_PACKED MPIMsgEventAck;
+
+typedef struct MPIMsgEventAckReply {
+    uint8_t                 Reserved[2];                /* 00h */
+    uint8_t                 MsgLength;                  /* 02h */
+    uint8_t                 Function;                   /* 03h */
+    uint8_t                 Reserved1[3];               /* 04h */
+    uint8_t                 MsgFlags;                   /* 07h */
+    uint32_t                MsgContext;                 /* 08h */
+    uint16_t                Reserved2;                  /* 0Ch */
+    uint16_t                IOCStatus;                  /* 0Eh */
+    uint32_t                IOCLogInfo;                 /* 10h */
+} QEMU_PACKED MPIMsgEventAckReply;
+
+enum {
+    /* Switch */
+
+    MPI_EVENT_NOTIFICATION_SWITCH_OFF   = 0x00,
+    MPI_EVENT_NOTIFICATION_SWITCH_ON    = 0x01,
+
+    /* Event */
+
+    MPI_EVENT_NONE                          = 0x00000000,
+    MPI_EVENT_LOG_DATA                      = 0x00000001,
+    MPI_EVENT_STATE_CHANGE                  = 0x00000002,
+    MPI_EVENT_UNIT_ATTENTION                = 0x00000003,
+    MPI_EVENT_IOC_BUS_RESET                 = 0x00000004,
+    MPI_EVENT_EXT_BUS_RESET                 = 0x00000005,
+    MPI_EVENT_RESCAN                        = 0x00000006,
+    MPI_EVENT_LINK_STATUS_CHANGE            = 0x00000007,
+    MPI_EVENT_LOOP_STATE_CHANGE             = 0x00000008,
+    MPI_EVENT_LOGOUT                        = 0x00000009,
+    MPI_EVENT_EVENT_CHANGE                  = 0x0000000A,
+    MPI_EVENT_INTEGRATED_RAID               = 0x0000000B,
+    MPI_EVENT_SCSI_DEVICE_STATUS_CHANGE     = 0x0000000C,
+    MPI_EVENT_ON_BUS_TIMER_EXPIRED          = 0x0000000D,
+    MPI_EVENT_QUEUE_FULL                    = 0x0000000E,
+    MPI_EVENT_SAS_DEVICE_STATUS_CHANGE      = 0x0000000F,
+    MPI_EVENT_SAS_SES                       = 0x00000010,
+    MPI_EVENT_PERSISTENT_TABLE_FULL         = 0x00000011,
+    MPI_EVENT_SAS_PHY_LINK_STATUS           = 0x00000012,
+    MPI_EVENT_SAS_DISCOVERY_ERROR           = 0x00000013,
+    MPI_EVENT_IR_RESYNC_UPDATE              = 0x00000014,
+    MPI_EVENT_IR2                           = 0x00000015,
+    MPI_EVENT_SAS_DISCOVERY                 = 0x00000016,
+    MPI_EVENT_SAS_BROADCAST_PRIMITIVE       = 0x00000017,
+    MPI_EVENT_SAS_INIT_DEVICE_STATUS_CHANGE = 0x00000018,
+    MPI_EVENT_SAS_INIT_TABLE_OVERFLOW       = 0x00000019,
+    MPI_EVENT_SAS_SMP_ERROR                 = 0x0000001A,
+    MPI_EVENT_SAS_EXPANDER_STATUS_CHANGE    = 0x0000001B,
+    MPI_EVENT_LOG_ENTRY_ADDED               = 0x00000021,
+
+    /* AckRequired field values */
+
+    MPI_EVENT_NOTIFICATION_ACK_NOT_REQUIRED = 0x00,
+    MPI_EVENT_NOTIFICATION_ACK_REQUIRED     = 0x01,
+};
+
+/****************************************************************************
+*   Config Request Message
+****************************************************************************/
+
+typedef struct MPIMsgConfig {
+    uint8_t                 Action;                     /* 00h */
+    uint8_t                 Reserved;                   /* 01h */
+    uint8_t                 ChainOffset;                /* 02h */
+    uint8_t                 Function;                   /* 03h */
+    uint16_t                ExtPageLength;              /* 04h */
+    uint8_t                 ExtPageType;                /* 06h */
+    uint8_t                 MsgFlags;                   /* 07h */
+    uint32_t                MsgContext;                 /* 08h */
+    uint8_t                 Reserved2[8];               /* 0Ch */
+    uint8_t                 PageVersion;                /* 14h */
+    uint8_t                 PageLength;                 /* 15h */
+    uint8_t                 PageNumber;                 /* 16h */
+    uint8_t                 PageType;                   /* 17h */
+    uint32_t                PageAddress;                /* 18h */
+    MPISGEntry              PageBufferSGE;              /* 1Ch */
+} QEMU_PACKED MPIMsgConfig;
+
+/* Action field values */
+
+enum {
+    MPI_CONFIG_ACTION_PAGE_HEADER               = 0x00,
+    MPI_CONFIG_ACTION_PAGE_READ_CURRENT         = 0x01,
+    MPI_CONFIG_ACTION_PAGE_WRITE_CURRENT        = 0x02,
+    MPI_CONFIG_ACTION_PAGE_DEFAULT              = 0x03,
+    MPI_CONFIG_ACTION_PAGE_WRITE_NVRAM          = 0x04,
+    MPI_CONFIG_ACTION_PAGE_READ_DEFAULT         = 0x05,
+    MPI_CONFIG_ACTION_PAGE_READ_NVRAM           = 0x06,
+};
+
+
+/* Config Reply Message */
+typedef struct MPIMsgConfigReply {
+    uint8_t                 Action;                     /* 00h */
+    uint8_t                 Reserved;                   /* 01h */
+    uint8_t                 MsgLength;                  /* 02h */
+    uint8_t                 Function;                   /* 03h */
+    uint16_t                ExtPageLength;              /* 04h */
+    uint8_t                 ExtPageType;                /* 06h */
+    uint8_t                 MsgFlags;                   /* 07h */
+    uint32_t                MsgContext;                 /* 08h */
+    uint8_t                 Reserved2[2];               /* 0Ch */
+    uint16_t                IOCStatus;                  /* 0Eh */
+    uint32_t                IOCLogInfo;                 /* 10h */
+    uint8_t                 PageVersion;                /* 14h */
+    uint8_t                 PageLength;                 /* 15h */
+    uint8_t                 PageNumber;                 /* 16h */
+    uint8_t                 PageType;                   /* 17h */
+} QEMU_PACKED MPIMsgConfigReply;
+
+enum {
+    /* PageType and PageAddress field values */
+    MPI_CONFIG_PAGEATTR_READ_ONLY               = 0x00,
+    MPI_CONFIG_PAGEATTR_CHANGEABLE              = 0x10,
+    MPI_CONFIG_PAGEATTR_PERSISTENT              = 0x20,
+    MPI_CONFIG_PAGEATTR_RO_PERSISTENT           = 0x30,
+    MPI_CONFIG_PAGEATTR_MASK                    = 0xF0,
+
+    MPI_CONFIG_PAGETYPE_IO_UNIT                 = 0x00,
+    MPI_CONFIG_PAGETYPE_IOC                     = 0x01,
+    MPI_CONFIG_PAGETYPE_BIOS                    = 0x02,
+    MPI_CONFIG_PAGETYPE_SCSI_PORT               = 0x03,
+    MPI_CONFIG_PAGETYPE_SCSI_DEVICE             = 0x04,
+    MPI_CONFIG_PAGETYPE_FC_PORT                 = 0x05,
+    MPI_CONFIG_PAGETYPE_FC_DEVICE               = 0x06,
+    MPI_CONFIG_PAGETYPE_LAN                     = 0x07,
+    MPI_CONFIG_PAGETYPE_RAID_VOLUME             = 0x08,
+    MPI_CONFIG_PAGETYPE_MANUFACTURING           = 0x09,
+    MPI_CONFIG_PAGETYPE_RAID_PHYSDISK           = 0x0A,
+    MPI_CONFIG_PAGETYPE_INBAND                  = 0x0B,
+    MPI_CONFIG_PAGETYPE_EXTENDED                = 0x0F,
+    MPI_CONFIG_PAGETYPE_MASK                    = 0x0F,
+
+    MPI_CONFIG_EXTPAGETYPE_SAS_IO_UNIT          = 0x10,
+    MPI_CONFIG_EXTPAGETYPE_SAS_EXPANDER         = 0x11,
+    MPI_CONFIG_EXTPAGETYPE_SAS_DEVICE           = 0x12,
+    MPI_CONFIG_EXTPAGETYPE_SAS_PHY              = 0x13,
+    MPI_CONFIG_EXTPAGETYPE_LOG                  = 0x14,
+    MPI_CONFIG_EXTPAGETYPE_ENCLOSURE            = 0x15,
+
+    MPI_SCSI_PORT_PGAD_PORT_MASK                = 0x000000FF,
+
+    MPI_SCSI_DEVICE_FORM_MASK                   = 0xF0000000,
+    MPI_SCSI_DEVICE_FORM_BUS_TID                = 0x00000000,
+    MPI_SCSI_DEVICE_TARGET_ID_MASK              = 0x000000FF,
+    MPI_SCSI_DEVICE_TARGET_ID_SHIFT             = 0,
+    MPI_SCSI_DEVICE_BUS_MASK                    = 0x0000FF00,
+    MPI_SCSI_DEVICE_BUS_SHIFT                   = 8,
+    MPI_SCSI_DEVICE_FORM_TARGET_MODE            = 0x10000000,
+    MPI_SCSI_DEVICE_TM_RESPOND_ID_MASK          = 0x000000FF,
+    MPI_SCSI_DEVICE_TM_RESPOND_ID_SHIFT         = 0,
+    MPI_SCSI_DEVICE_TM_BUS_MASK                 = 0x0000FF00,
+    MPI_SCSI_DEVICE_TM_BUS_SHIFT                = 8,
+    MPI_SCSI_DEVICE_TM_INIT_ID_MASK             = 0x00FF0000,
+    MPI_SCSI_DEVICE_TM_INIT_ID_SHIFT            = 16,
+
+    MPI_FC_PORT_PGAD_PORT_MASK                  = 0xF0000000,
+    MPI_FC_PORT_PGAD_PORT_SHIFT                 = 28,
+    MPI_FC_PORT_PGAD_FORM_MASK                  = 0x0F000000,
+    MPI_FC_PORT_PGAD_FORM_INDEX                 = 0x01000000,
+    MPI_FC_PORT_PGAD_INDEX_MASK                 = 0x0000FFFF,
+    MPI_FC_PORT_PGAD_INDEX_SHIFT                = 0,
+
+    MPI_FC_DEVICE_PGAD_PORT_MASK                = 0xF0000000,
+    MPI_FC_DEVICE_PGAD_PORT_SHIFT               = 28,
+    MPI_FC_DEVICE_PGAD_FORM_MASK                = 0x0F000000,
+    MPI_FC_DEVICE_PGAD_FORM_NEXT_DID            = 0x00000000,
+    MPI_FC_DEVICE_PGAD_ND_PORT_MASK             = 0xF0000000,
+    MPI_FC_DEVICE_PGAD_ND_PORT_SHIFT            = 28,
+    MPI_FC_DEVICE_PGAD_ND_DID_MASK              = 0x00FFFFFF,
+    MPI_FC_DEVICE_PGAD_ND_DID_SHIFT             = 0,
+    MPI_FC_DEVICE_PGAD_FORM_BUS_TID             = 0x01000000,
+    MPI_FC_DEVICE_PGAD_BT_BUS_MASK              = 0x0000FF00,
+    MPI_FC_DEVICE_PGAD_BT_BUS_SHIFT             = 8,
+    MPI_FC_DEVICE_PGAD_BT_TID_MASK              = 0x000000FF,
+    MPI_FC_DEVICE_PGAD_BT_TID_SHIFT             = 0,
+
+    MPI_PHYSDISK_PGAD_PHYSDISKNUM_MASK          = 0x000000FF,
+    MPI_PHYSDISK_PGAD_PHYSDISKNUM_SHIFT         = 0,
+
+    MPI_SAS_EXPAND_PGAD_FORM_MASK             = 0xF0000000,
+    MPI_SAS_EXPAND_PGAD_FORM_SHIFT            = 28,
+    MPI_SAS_EXPAND_PGAD_FORM_GET_NEXT_HANDLE  = 0x00000000,
+    MPI_SAS_EXPAND_PGAD_FORM_HANDLE_PHY_NUM   = 0x00000001,
+    MPI_SAS_EXPAND_PGAD_FORM_HANDLE           = 0x00000002,
+    MPI_SAS_EXPAND_PGAD_GNH_MASK_HANDLE       = 0x0000FFFF,
+    MPI_SAS_EXPAND_PGAD_GNH_SHIFT_HANDLE      = 0,
+    MPI_SAS_EXPAND_PGAD_HPN_MASK_PHY          = 0x00FF0000,
+    MPI_SAS_EXPAND_PGAD_HPN_SHIFT_PHY         = 16,
+    MPI_SAS_EXPAND_PGAD_HPN_MASK_HANDLE       = 0x0000FFFF,
+    MPI_SAS_EXPAND_PGAD_HPN_SHIFT_HANDLE      = 0,
+    MPI_SAS_EXPAND_PGAD_H_MASK_HANDLE         = 0x0000FFFF,
+    MPI_SAS_EXPAND_PGAD_H_SHIFT_HANDLE        = 0,
+
+    MPI_SAS_DEVICE_PGAD_FORM_MASK               = 0xF0000000,
+    MPI_SAS_DEVICE_PGAD_FORM_SHIFT              = 28,
+    MPI_SAS_DEVICE_PGAD_FORM_GET_NEXT_HANDLE    = 0x00000000,
+    MPI_SAS_DEVICE_PGAD_FORM_BUS_TARGET_ID      = 0x00000001,
+    MPI_SAS_DEVICE_PGAD_FORM_HANDLE             = 0x00000002,
+    MPI_SAS_DEVICE_PGAD_GNH_HANDLE_MASK         = 0x0000FFFF,
+    MPI_SAS_DEVICE_PGAD_GNH_HANDLE_SHIFT        = 0,
+    MPI_SAS_DEVICE_PGAD_BT_BUS_MASK             = 0x0000FF00,
+    MPI_SAS_DEVICE_PGAD_BT_BUS_SHIFT            = 8,
+    MPI_SAS_DEVICE_PGAD_BT_TID_MASK             = 0x000000FF,
+    MPI_SAS_DEVICE_PGAD_BT_TID_SHIFT            = 0,
+    MPI_SAS_DEVICE_PGAD_H_HANDLE_MASK           = 0x0000FFFF,
+    MPI_SAS_DEVICE_PGAD_H_HANDLE_SHIFT          = 0,
+
+    MPI_SAS_PHY_PGAD_FORM_MASK                  = 0xF0000000,
+    MPI_SAS_PHY_PGAD_FORM_SHIFT                 = 28,
+    MPI_SAS_PHY_PGAD_FORM_PHY_NUMBER            = 0x0,
+    MPI_SAS_PHY_PGAD_FORM_PHY_TBL_INDEX         = 0x1,
+    MPI_SAS_PHY_PGAD_PHY_NUMBER_MASK            = 0x000000FF,
+    MPI_SAS_PHY_PGAD_PHY_NUMBER_SHIFT           = 0,
+    MPI_SAS_PHY_PGAD_PHY_TBL_INDEX_MASK         = 0x0000FFFF,
+    MPI_SAS_PHY_PGAD_PHY_TBL_INDEX_SHIFT        = 0,
+
+    MPI_SAS_ENCLOS_PGAD_FORM_MASK               = 0xF0000000,
+    MPI_SAS_ENCLOS_PGAD_FORM_SHIFT              = 28,
+    MPI_SAS_ENCLOS_PGAD_FORM_GET_NEXT_HANDLE    = 0x00000000,
+    MPI_SAS_ENCLOS_PGAD_FORM_HANDLE             = 0x00000001,
+    MPI_SAS_ENCLOS_PGAD_GNH_HANDLE_MASK         = 0x0000FFFF,
+    MPI_SAS_ENCLOS_PGAD_GNH_HANDLE_SHIFT        = 0,
+    MPI_SAS_ENCLOS_PGAD_H_HANDLE_MASK           = 0x0000FFFF,
+    MPI_SAS_ENCLOS_PGAD_H_HANDLE_SHIFT          = 0,
+};
+
+/* Too many structs and definitions... see mptconfig.c for the few
+ * that are used.
+ */
+
+/****************************************************************************/
+/*  Firmware Upload message and associated structures                       */
+/****************************************************************************/
+
+enum {
+    /* defines for using the ProductId field */
+    MPI_FW_HEADER_PID_TYPE_MASK                     = 0xF000,
+    MPI_FW_HEADER_PID_TYPE_SCSI                     = 0x0000,
+    MPI_FW_HEADER_PID_TYPE_FC                       = 0x1000,
+    MPI_FW_HEADER_PID_TYPE_SAS                      = 0x2000,
+
+    MPI_FW_HEADER_PID_PROD_MASK                     = 0x0F00,
+    MPI_FW_HEADER_PID_PROD_INITIATOR_SCSI           = 0x0100,
+    MPI_FW_HEADER_PID_PROD_TARGET_INITIATOR_SCSI    = 0x0200,
+    MPI_FW_HEADER_PID_PROD_TARGET_SCSI              = 0x0300,
+    MPI_FW_HEADER_PID_PROD_IM_SCSI                  = 0x0400,
+    MPI_FW_HEADER_PID_PROD_IS_SCSI                  = 0x0500,
+    MPI_FW_HEADER_PID_PROD_CTX_SCSI                 = 0x0600,
+    MPI_FW_HEADER_PID_PROD_IR_SCSI                  = 0x0700,
+
+    MPI_FW_HEADER_PID_FAMILY_MASK                   = 0x00FF,
+
+    /* SCSI */
+    MPI_FW_HEADER_PID_FAMILY_1030A0_SCSI            = 0x0001,
+    MPI_FW_HEADER_PID_FAMILY_1030B0_SCSI            = 0x0002,
+    MPI_FW_HEADER_PID_FAMILY_1030B1_SCSI            = 0x0003,
+    MPI_FW_HEADER_PID_FAMILY_1030C0_SCSI            = 0x0004,
+    MPI_FW_HEADER_PID_FAMILY_1020A0_SCSI            = 0x0005,
+    MPI_FW_HEADER_PID_FAMILY_1020B0_SCSI            = 0x0006,
+    MPI_FW_HEADER_PID_FAMILY_1020B1_SCSI            = 0x0007,
+    MPI_FW_HEADER_PID_FAMILY_1020C0_SCSI            = 0x0008,
+    MPI_FW_HEADER_PID_FAMILY_1035A0_SCSI            = 0x0009,
+    MPI_FW_HEADER_PID_FAMILY_1035B0_SCSI            = 0x000A,
+    MPI_FW_HEADER_PID_FAMILY_1030TA0_SCSI           = 0x000B,
+    MPI_FW_HEADER_PID_FAMILY_1020TA0_SCSI           = 0x000C,
+
+    /* Fibre Channel */
+    MPI_FW_HEADER_PID_FAMILY_909_FC                 = 0x0000,
+    MPI_FW_HEADER_PID_FAMILY_919_FC                 = 0x0001, /* 919 and 929     */
+    MPI_FW_HEADER_PID_FAMILY_919X_FC                = 0x0002, /* 919X and 929X   */
+    MPI_FW_HEADER_PID_FAMILY_919XL_FC               = 0x0003, /* 919XL and 929XL */
+    MPI_FW_HEADER_PID_FAMILY_939X_FC                = 0x0004, /* 939X and 949X   */
+    MPI_FW_HEADER_PID_FAMILY_959_FC                 = 0x0005,
+    MPI_FW_HEADER_PID_FAMILY_949E_FC                = 0x0006,
+
+    /* SAS */
+    MPI_FW_HEADER_PID_FAMILY_1064_SAS               = 0x0001,
+    MPI_FW_HEADER_PID_FAMILY_1068_SAS               = 0x0002,
+    MPI_FW_HEADER_PID_FAMILY_1078_SAS               = 0x0003,
+    MPI_FW_HEADER_PID_FAMILY_106xE_SAS              = 0x0004, /* 1068E, 1066E, and 1064E */
+};
+
+#endif
diff --git a/hw/scsi/mptconfig.c b/hw/scsi/mptconfig.c
new file mode 100644
index 0000000..d049825
--- /dev/null
+++ b/hw/scsi/mptconfig.c
@@ -0,0 +1,904 @@
+/*
+ * QEMU LSI SAS1068 Host Bus Adapter emulation - configuration pages
+ *
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * Author: Paolo Bonzini
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ */
+#include "qemu/osdep.h"
+#include "hw/hw.h"
+#include "hw/pci/pci.h"
+#include "hw/scsi/scsi.h"
+
+#include "mptsas.h"
+#include "mpi.h"
+#include "trace.h"
+
+/* Generic functions for marshaling and unmarshaling.  */
+
+#define repl1(x) x
+#define repl2(x) x x
+#define repl3(x) x x x
+#define repl4(x) x x x x
+#define repl5(x) x x x x x
+#define repl6(x) x x x x x x
+#define repl7(x) x x x x x x x
+#define repl8(x) x x x x x x x x
+
+#define repl(n, x) glue(repl, n)(x)
+
+typedef union PackValue {
+    uint64_t ll;
+    char *str;
+} PackValue;
+
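+/* vfill/vpack interpret a tiny format language: 'b', 'w', 'l' and 'q'
+ * store an 8-, 16-, 32- or 64-bit little-endian value taken from the
+ * argument list, 's<n>' stores a fixed <n>-byte NUL-padded string (or
+ * zeroes if the argument is NULL), and a '*' prefix emits a zero field
+ * of the following type without consuming an argument.  The repl(n, x)
+ * macro above simply repeats a fragment n times, e.g. one record per phy.
+ * For example, fill(buf, sizeof(buf), "b*wl", 0x12, 0x345678) writes the
+ * byte 0x12, a zeroed 16-bit field and the 32-bit value 0x345678, and
+ * returns 7.
+ */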
+static size_t vfill(uint8_t *data, size_t size, const char *fmt, va_list ap)
+{
+    size_t ofs;
+    PackValue val;
+    const char *p;
+
+    ofs = 0;
+    p = fmt;
+    while (*p) {
+        memset(&val, 0, sizeof(val));
+        switch (*p) {
+        case '*':
+            p++;
+            break;
+        case 'b':
+        case 'w':
+        case 'l':
+            val.ll = va_arg(ap, int);
+            break;
+        case 'q':
+            val.ll = va_arg(ap, int64_t);
+            break;
+        case 's':
+            val.str = va_arg(ap, void *);
+            break;
+        }
+        switch (*p++) {
+        case 'b':
+            if (data) {
+                stb_p(data + ofs, val.ll);
+            }
+            ofs++;
+            break;
+        case 'w':
+            if (data) {
+                stw_le_p(data + ofs, val.ll);
+            }
+            ofs += 2;
+            break;
+        case 'l':
+            if (data) {
+                stl_le_p(data + ofs, val.ll);
+            }
+            ofs += 4;
+            break;
+        case 'q':
+            if (data) {
+                stq_le_p(data + ofs, val.ll);
+            }
+            ofs += 8;
+            break;
+        case 's':
+            {
+                int cnt = atoi(p);
+                if (data) {
+                    if (val.str) {
+                        strncpy((void *)data + ofs, val.str, cnt);
+                    } else {
+                        memset((void *)data + ofs, 0, cnt);
+                    }
+                }
+                ofs += cnt;
+                break;
+            }
+        }
+    }
+
+    return ofs;
+}
+
+static size_t vpack(uint8_t **p_data, const char *fmt, va_list ap1)
+{
+    size_t size = 0;
+    uint8_t *data = NULL;
+
+    if (p_data) {
+        va_list ap2;
+
+        va_copy(ap2, ap1);
+        size = vfill(NULL, 0, fmt, ap2);
+        *p_data = data = g_malloc(size);
+    }
+    return vfill(data, size, fmt, ap1);
+}
+
+static size_t fill(uint8_t *data, size_t size, const char *fmt, ...)
+{
+    va_list ap;
+    size_t ret;
+
+    va_start(ap, fmt);
+    ret = vfill(data, size, fmt, ap);
+    va_end(ap);
+
+    return ret;
+}
+
+/* Functions to build the page header and fill in the length, always used
+ * through the macros.
+ */
+
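+/* The "b*bbb" prefix of MPTSAS_CONFIG_PACK lays out the standard 4-byte
+ * config page header (PageVersion, PageLength, PageNumber, PageType);
+ * the '*' marks the PageLength byte, which mptsas_config_pack patches in
+ * after packing.  MPTSAS_CONFIG_PACK_EXT does the same for the extended
+ * header, patching ExtPageLength at offset 4.
+ */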
+#define MPTSAS_CONFIG_PACK(number, type, version, fmt, ...)                  \
+    mptsas_config_pack(data, "b*bbb" fmt, version, number, type,             \
+                       ## __VA_ARGS__)
+
+static size_t mptsas_config_pack(uint8_t **data, const char *fmt, ...)
+{
+    va_list ap;
+    size_t ret;
+
+    va_start(ap, fmt);
+    ret = vpack(data, fmt, ap);
+    va_end(ap);
+
+    if (data) {
+        assert(ret < 256 && (ret % 4) == 0);
+        stb_p(*data + 1, ret / 4);
+    }
+    return ret;
+}
+
+#define MPTSAS_CONFIG_PACK_EXT(number, type, version, fmt, ...)              \
+    mptsas_config_pack_ext(data, "b*bbb*wb*b" fmt, version, number,          \
+                           MPI_CONFIG_PAGETYPE_EXTENDED, type, ## __VA_ARGS__)
+
+static size_t mptsas_config_pack_ext(uint8_t **data, const char *fmt, ...)
+{
+    va_list ap;
+    size_t ret;
+
+    va_start(ap, fmt);
+    ret = vpack(data, fmt, ap);
+    va_end(ap);
+
+    if (data) {
+        assert(ret < 65536 && (ret % 4) == 0);
+        stw_le_p(*data + 4, ret / 4);
+    }
+    return ret;
+}
+
+/* Manufacturing pages */
+
+static
+size_t mptsas_config_manufacturing_0(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(0, MPI_CONFIG_PAGETYPE_MANUFACTURING, 0x00,
+                              "s16s8s16s16s16",
+                              "QEMU MPT Fusion",
+                              "2.5",
+                              "QEMU MPT Fusion",
+                              "QEMU",
+                              "0000111122223333");
+}
+
+static
+size_t mptsas_config_manufacturing_1(MPTSASState *s, uint8_t **data, int address)
+{
+    /* VPD - all zeros */
+    return MPTSAS_CONFIG_PACK(1, MPI_CONFIG_PAGETYPE_MANUFACTURING, 0x00,
+                              "s256");
+}
+
+static
+size_t mptsas_config_manufacturing_2(MPTSASState *s, uint8_t **data, int address)
+{
+    PCIDeviceClass *pcic = PCI_DEVICE_GET_CLASS(s);
+    return MPTSAS_CONFIG_PACK(2, MPI_CONFIG_PAGETYPE_MANUFACTURING, 0x00,
+                              "wb*b*l",
+                              pcic->device_id, pcic->revision);
+}
+
+static
+size_t mptsas_config_manufacturing_3(MPTSASState *s, uint8_t **data, int address)
+{
+    PCIDeviceClass *pcic = PCI_DEVICE_GET_CLASS(s);
+    return MPTSAS_CONFIG_PACK(3, MPI_CONFIG_PAGETYPE_MANUFACTURING, 0x00,
+                              "wb*b*l",
+                              pcic->device_id, pcic->revision);
+}
+
+static
+size_t mptsas_config_manufacturing_4(MPTSASState *s, uint8_t **data, int address)
+{
+    /* All zeros */
+    return MPTSAS_CONFIG_PACK(4, MPI_CONFIG_PAGETYPE_MANUFACTURING, 0x05,
+                              "*l*b*b*b*b*b*b*w*s56*l*l*l*l*l*l"
+                              "*b*b*w*b*b*w*l*l");
+}
+
+static
+size_t mptsas_config_manufacturing_5(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(5, MPI_CONFIG_PAGETYPE_MANUFACTURING, 0x02,
+                              "q*b*b*w*l*l", s->sas_addr);
+}
+
+static
+size_t mptsas_config_manufacturing_6(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(6, MPI_CONFIG_PAGETYPE_MANUFACTURING, 0x00,
+                              "*l");
+}
+
+static
+size_t mptsas_config_manufacturing_7(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(7, MPI_CONFIG_PAGETYPE_MANUFACTURING, 0x00,
+                              "*l*l*l*s16*b*b*w", MPTSAS_NUM_PORTS);
+}
+
+static
+size_t mptsas_config_manufacturing_8(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(8, MPI_CONFIG_PAGETYPE_MANUFACTURING, 0x00,
+                              "*l");
+}
+
+static
+size_t mptsas_config_manufacturing_9(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(9, MPI_CONFIG_PAGETYPE_MANUFACTURING, 0x00,
+                              "*l");
+}
+
+static
+size_t mptsas_config_manufacturing_10(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(10, MPI_CONFIG_PAGETYPE_MANUFACTURING, 0x00,
+                              "*l");
+}
+
+/* I/O unit pages */
+
+static
+size_t mptsas_config_io_unit_0(MPTSASState *s, uint8_t **data, int address)
+{
+    PCIDevice *pci = PCI_DEVICE(s);
+    uint64_t unique_value = 0x53504D554D4551LL;  /* "QEMUMPS", devfn ORed in below */

+
+    unique_value |= (uint64_t)pci->devfn << 56;
+    return MPTSAS_CONFIG_PACK(0, MPI_CONFIG_PAGETYPE_IO_UNIT, 0x00,
+                              "q", unique_value);
+}
+
+static
+size_t mptsas_config_io_unit_1(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(1, MPI_CONFIG_PAGETYPE_IO_UNIT, 0x02, "l",
+                              0x41 /* single function, RAID disabled */ );
+}
+
+static
+size_t mptsas_config_io_unit_2(MPTSASState *s, uint8_t **data, int address)
+{
+    PCIDevice *pci = PCI_DEVICE(s);
+    uint8_t devfn = pci->devfn;
+    return MPTSAS_CONFIG_PACK(2, MPI_CONFIG_PAGETYPE_IO_UNIT, 0x02,
+                              "llbbw*b*b*w*b*b*w*b*b*w*l",
+                              0, 0x100, 0 /* pci bus? */, devfn, 0);
+}
+
+static
+size_t mptsas_config_io_unit_3(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(3, MPI_CONFIG_PAGETYPE_IO_UNIT, 0x01,
+                              "*b*b*w*l");
+}
+
+static
+size_t mptsas_config_io_unit_4(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(4, MPI_CONFIG_PAGETYPE_IO_UNIT, 0x00, "*l*l*q");
+}
+
+/* I/O controller pages */
+
+static
+size_t mptsas_config_ioc_0(MPTSASState *s, uint8_t **data, int address)
+{
+    PCIDeviceClass *pcic = PCI_DEVICE_GET_CLASS(s);
+
+    return MPTSAS_CONFIG_PACK(0, MPI_CONFIG_PAGETYPE_IOC, 0x01,
+                              "*l*lwwb*b*b*blww",
+                              pcic->vendor_id, pcic->device_id, pcic->revision,
+                              pcic->subsystem_vendor_id,
+                              pcic->subsystem_id);
+}
+
+static
+size_t mptsas_config_ioc_1(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(1, MPI_CONFIG_PAGETYPE_IOC, 0x03,
+                              "*l*l*b*b*b*b");
+}
+
+static
+size_t mptsas_config_ioc_2(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(2, MPI_CONFIG_PAGETYPE_IOC, 0x04,
+                              "*l*b*b*b*b");
+}
+
+static
+size_t mptsas_config_ioc_3(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(3, MPI_CONFIG_PAGETYPE_IOC, 0x00,
+                              "*b*b*w");
+}
+
+static
+size_t mptsas_config_ioc_4(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(4, MPI_CONFIG_PAGETYPE_IOC, 0x00,
+                              "*b*b*w");
+}
+
+static
+size_t mptsas_config_ioc_5(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(5, MPI_CONFIG_PAGETYPE_IOC, 0x00,
+                              "*l*b*b*w");
+}
+
+static
+size_t mptsas_config_ioc_6(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK(6, MPI_CONFIG_PAGETYPE_IOC, 0x01,
+                              "*l*b*b*b*b*b*b*b*b*b*b*w*l*l*l*l*b*b*w"
+                              "*w*w*w*w*l*l*l");
+}
+
+/* SAS I/O unit pages (extended) */
+
+#define MPTSAS_CONFIG_SAS_IO_UNIT_0_SIZE 16
+
+#define MPI_SAS_IOUNIT0_RATE_FAILED_SPEED_NEGOTIATION 0x02
+#define MPI_SAS_IOUNIT0_RATE_1_5                      0x08
+#define MPI_SAS_IOUNIT0_RATE_3_0                      0x09
+
+#define MPI_SAS_DEVICE_INFO_NO_DEVICE                 0x00000000
+#define MPI_SAS_DEVICE_INFO_END_DEVICE                0x00000001
+#define MPI_SAS_DEVICE_INFO_SSP_TARGET                0x00000400
+
+#define MPI_SAS_DEVICE0_ASTATUS_NO_ERRORS             0x00
+
+#define MPI_SAS_DEVICE0_FLAGS_DEVICE_PRESENT          0x0001
+#define MPI_SAS_DEVICE0_FLAGS_DEVICE_MAPPED           0x0002
+#define MPI_SAS_DEVICE0_FLAGS_MAPPING_PERSISTENT      0x0004
+
+
+
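+/* Handle numbering keeps handle 0 invalid: phy i is assigned phy handle
+ * i + 1, and a SCSI device attached to phy i is assigned device handle
+ * i + 1 + MPTSAS_NUM_PORTS (or 0 if no device is present).
+ */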
+static SCSIDevice *mptsas_phy_get_device(MPTSASState *s, int i,
+                                         int *phy_handle, int *dev_handle)
+{
+    SCSIDevice *d = scsi_device_find(&s->bus, 0, i, 0);
+
+    if (phy_handle) {
+        *phy_handle = i + 1;
+    }
+    if (dev_handle) {
+        *dev_handle = d ? i + 1 + MPTSAS_NUM_PORTS : 0;
+    }
+    return d;
+}
+
+static
+size_t mptsas_config_sas_io_unit_0(MPTSASState *s, uint8_t **data, int address)
+{
+    size_t size = MPTSAS_CONFIG_PACK_EXT(0, MPI_CONFIG_EXTPAGETYPE_SAS_IO_UNIT, 0x04,
+                                         "*w*wb*b*w"
+                                         repl(MPTSAS_NUM_PORTS, "*s16"),
+                                         MPTSAS_NUM_PORTS);
+
+    if (data) {
+        size_t ofs = size - MPTSAS_NUM_PORTS * MPTSAS_CONFIG_SAS_IO_UNIT_0_SIZE;
+        int i;
+
+        for (i = 0; i < MPTSAS_NUM_PORTS; i++) {
+            int phy_handle, dev_handle;
+            SCSIDevice *dev = mptsas_phy_get_device(s, i, &phy_handle, &dev_handle);
+
+            fill(*data + ofs, MPTSAS_CONFIG_SAS_IO_UNIT_0_SIZE,
+                 "bbbblwwl", i, 0, 0,
+                 (dev
+                  ? MPI_SAS_IOUNIT0_RATE_3_0
+                  : MPI_SAS_IOUNIT0_RATE_FAILED_SPEED_NEGOTIATION),
+                 (dev
+                  ? MPI_SAS_DEVICE_INFO_END_DEVICE | MPI_SAS_DEVICE_INFO_SSP_TARGET
+                  : MPI_SAS_DEVICE_INFO_NO_DEVICE),
+                 dev_handle,
+                 dev_handle,
+                 0);
+            ofs += MPTSAS_CONFIG_SAS_IO_UNIT_0_SIZE;
+        }
+        assert(ofs == size);
+    }
+    return size;
+}
+
+#define MPTSAS_CONFIG_SAS_IO_UNIT_1_SIZE 12
+
+static
+size_t mptsas_config_sas_io_unit_1(MPTSASState *s, uint8_t **data, int address)
+{
+    size_t size = MPTSAS_CONFIG_PACK_EXT(1, MPI_CONFIG_EXTPAGETYPE_SAS_IO_UNIT, 0x07,
+                                         "*w*w*w*wb*b*b*b"
+                                         repl(MPTSAS_NUM_PORTS, "*s12"),
+                                         MPTSAS_NUM_PORTS);
+
+    if (data) {
+        size_t ofs = size - MPTSAS_NUM_PORTS * MPTSAS_CONFIG_SAS_IO_UNIT_1_SIZE;
+        int i;
+
+        for (i = 0; i < MPTSAS_NUM_PORTS; i++) {
+            SCSIDevice *dev = mptsas_phy_get_device(s, i, NULL, NULL);
+            fill(*data + ofs, MPTSAS_CONFIG_SAS_IO_UNIT_1_SIZE,
+                 "bbbblww", i, 0, 0,
+                 (MPI_SAS_IOUNIT0_RATE_3_0 << 4) | MPI_SAS_IOUNIT0_RATE_1_5,
+                 (dev
+                  ? MPI_SAS_DEVICE_INFO_END_DEVICE | MPI_SAS_DEVICE_INFO_SSP_TARGET
+                  : MPI_SAS_DEVICE_INFO_NO_DEVICE),
+                 0, 0);
+            ofs += MPTSAS_CONFIG_SAS_IO_UNIT_1_SIZE;
+        }
+        assert(ofs == size);
+    }
+    return size;
+}
+
+static
+size_t mptsas_config_sas_io_unit_2(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK_EXT(2, MPI_CONFIG_EXTPAGETYPE_SAS_IO_UNIT, 0x06,
+                                  "*b*b*w*w*w*b*b*w");
+}
+
+static
+size_t mptsas_config_sas_io_unit_3(MPTSASState *s, uint8_t **data, int address)
+{
+    return MPTSAS_CONFIG_PACK_EXT(3, MPI_CONFIG_EXTPAGETYPE_SAS_IO_UNIT, 0x06,
+                                  "*l*l*l*l*l*l*l*l*l");
+}
+
+/* SAS PHY pages (extended) */
+
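+/* Decode the PageAddress of a SAS PHY page request: form 0 addresses the
+ * phy by number, form 1 by index into the phy table.  Both must resolve
+ * to a value below MPTSAS_NUM_PORTS, otherwise -EINVAL is returned.
+ */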
+static int mptsas_phy_addr_get(MPTSASState *s, int address)
+{
+    int i;
+    if ((address >> MPI_SAS_PHY_PGAD_FORM_SHIFT) == 0) {
+        i = address & 255;
+    } else if ((address >> MPI_SAS_PHY_PGAD_FORM_SHIFT) == 1) {
+        i = address & 65535;
+    } else {
+        return -EINVAL;
+    }
+
+    if (i >= MPTSAS_NUM_PORTS) {
+        return -EINVAL;
+    }
+
+    return i;
+}
+
+static
+size_t mptsas_config_phy_0(MPTSASState *s, uint8_t **data, int address)
+{
+    int phy_handle = -1;
+    int dev_handle = -1;
+    int i = mptsas_phy_addr_get(s, address);
+    SCSIDevice *dev;
+
+    if (i < 0) {
+        trace_mptsas_config_sas_phy(s, address, i, phy_handle, dev_handle, 0);
+        return i;
+    }
+
+    dev = mptsas_phy_get_device(s, i, &phy_handle, &dev_handle);
+    trace_mptsas_config_sas_phy(s, address, i, phy_handle, dev_handle, 0);
+
+    return MPTSAS_CONFIG_PACK_EXT(0, MPI_CONFIG_EXTPAGETYPE_SAS_PHY, 0x01,
+                                  "w*wqwb*blbb*b*b*l",
+                                  dev_handle, s->sas_addr, dev_handle, i,
+                                  (dev
+                                   ? MPI_SAS_DEVICE_INFO_END_DEVICE /* | MPI_SAS_DEVICE_INFO_SSP_TARGET?? */
+                                   : MPI_SAS_DEVICE_INFO_NO_DEVICE),
+                                  (MPI_SAS_IOUNIT0_RATE_3_0 << 4) | MPI_SAS_IOUNIT0_RATE_1_5,
+                                  (MPI_SAS_IOUNIT0_RATE_3_0 << 4) | MPI_SAS_IOUNIT0_RATE_1_5);
+}
+
+static
+size_t mptsas_config_phy_1(MPTSASState *s, uint8_t **data, int address)
+{
+    int phy_handle = -1;
+    int dev_handle = -1;
+    int i = mptsas_phy_addr_get(s, address);
+
+    if (i < 0) {
+        trace_mptsas_config_sas_phy(s, address, i, phy_handle, dev_handle, 1);
+        return i;
+    }
+
+    (void) mptsas_phy_get_device(s, i, &phy_handle, &dev_handle);
+    trace_mptsas_config_sas_phy(s, address, i, phy_handle, dev_handle, 1);
+
+    return MPTSAS_CONFIG_PACK_EXT(1, MPI_CONFIG_EXTPAGETYPE_SAS_PHY, 0x01,
+                                  "*l*l*l*l*l");
+}
+
+/* SAS device pages (extended) */
+
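+/* Decode the PageAddress of a SAS Device page request.  All three forms
+ * (get-next-handle, bus/target id and handle) resolve to a target number
+ * below MPTSAS_NUM_PORTS; anything else returns -EINVAL.
+ */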
+static int mptsas_device_addr_get(MPTSASState *s, int address)
+{
+    uint32_t handle, i;
+    uint32_t form = address >> MPI_SAS_PHY_PGAD_FORM_SHIFT;
+    if (form == MPI_SAS_DEVICE_PGAD_FORM_GET_NEXT_HANDLE) {
+        handle = address & MPI_SAS_DEVICE_PGAD_GNH_HANDLE_MASK;
+        do {
+            if (handle == 65535) {
+                handle = MPTSAS_NUM_PORTS + 1;
+            } else {
+                ++handle;
+            }
+            i = handle - 1 - MPTSAS_NUM_PORTS;
+        } while (i < MPTSAS_NUM_PORTS && !scsi_device_find(&s->bus, 0, i, 0));
+
+    } else if (form == MPI_SAS_DEVICE_PGAD_FORM_BUS_TARGET_ID) {
+        if (address & MPI_SAS_DEVICE_PGAD_BT_BUS_MASK) {
+            return -EINVAL;
+        }
+        i = address & MPI_SAS_DEVICE_PGAD_BT_TID_MASK;
+
+    } else if (form == MPI_SAS_DEVICE_PGAD_FORM_HANDLE) {
+        handle = address & MPI_SAS_DEVICE_PGAD_H_HANDLE_MASK;
+        i = handle - 1 - MPTSAS_NUM_PORTS;
+
+    } else {
+        return -EINVAL;
+    }
+
+    if (i >= MPTSAS_NUM_PORTS) {
+        return -EINVAL;
+    }
+
+    return i;
+}
+
+static
+size_t mptsas_config_sas_device_0(MPTSASState *s, uint8_t **data, int address)
+{
+    int phy_handle = -1;
+    int dev_handle = -1;
+    int i = mptsas_device_addr_get(s, address);
+    SCSIDevice *dev = mptsas_phy_get_device(s, i, &phy_handle, &dev_handle);
+
+    trace_mptsas_config_sas_device(s, address, i, phy_handle, dev_handle, 0);
+    if (!dev) {
+        return -ENOENT;
+    }
+
+    return MPTSAS_CONFIG_PACK_EXT(0, MPI_CONFIG_EXTPAGETYPE_SAS_DEVICE, 0x05,
+                                  "*w*wqwbbwbblwb*b",
+                                  dev->wwn, phy_handle, i,
+                                  MPI_SAS_DEVICE0_ASTATUS_NO_ERRORS,
+                                  dev_handle, i, 0,
+                                  MPI_SAS_DEVICE_INFO_END_DEVICE | MPI_SAS_DEVICE_INFO_SSP_TARGET,
+                                  (MPI_SAS_DEVICE0_FLAGS_DEVICE_PRESENT |
+                                   MPI_SAS_DEVICE0_FLAGS_DEVICE_MAPPED |
+                                   MPI_SAS_DEVICE0_FLAGS_MAPPING_PERSISTENT), i);
+}
+
+static
+size_t mptsas_config_sas_device_1(MPTSASState *s, uint8_t **data, int address)
+{
+    int phy_handle = -1;
+    int dev_handle = -1;
+    int i = mptsas_device_addr_get(s, address);
+    SCSIDevice *dev = mptsas_phy_get_device(s, i, &phy_handle, &dev_handle);
+
+    trace_mptsas_config_sas_device(s, address, i, phy_handle, dev_handle, 1);
+    if (!dev) {
+        return -ENOENT;
+    }
+
+    return MPTSAS_CONFIG_PACK_EXT(1, MPI_CONFIG_EXTPAGETYPE_SAS_DEVICE, 0x00,
+                                  "*lq*lwbb*s20",
+                                  dev->wwn, dev_handle, i, 0);
+}
+
+static
+size_t mptsas_config_sas_device_2(MPTSASState *s, uint8_t **data, int address)
+{
+    int phy_handle = -1;
+    int dev_handle = -1;
+    int i = mptsas_device_addr_get(s, address);
+    SCSIDevice *dev = mptsas_phy_get_device(s, i, &phy_handle, &dev_handle);
+
+    trace_mptsas_config_sas_device(s, address, i, phy_handle, dev_handle, 2);
+    if (!dev) {
+        return -ENOENT;
+    }
+
+    return MPTSAS_CONFIG_PACK_EXT(2, MPI_CONFIG_EXTPAGETYPE_SAS_DEVICE, 0x01,
+                                  "ql", dev->wwn, 0);
+}
+
+typedef struct MPTSASConfigPage {
+    uint8_t number;
+    uint8_t type;
+    size_t (*mpt_config_build)(MPTSASState *s, uint8_t **data, int address);
+} MPTSASConfigPage;
+
+static const MPTSASConfigPage mptsas_config_pages[] = {
+    {
+        0, MPI_CONFIG_PAGETYPE_MANUFACTURING,
+        mptsas_config_manufacturing_0,
+    }, {
+        1, MPI_CONFIG_PAGETYPE_MANUFACTURING,
+        mptsas_config_manufacturing_1,
+    }, {
+        2, MPI_CONFIG_PAGETYPE_MANUFACTURING,
+        mptsas_config_manufacturing_2,
+    }, {
+        3, MPI_CONFIG_PAGETYPE_MANUFACTURING,
+        mptsas_config_manufacturing_3,
+    }, {
+        4, MPI_CONFIG_PAGETYPE_MANUFACTURING,
+        mptsas_config_manufacturing_4,
+    }, {
+        5, MPI_CONFIG_PAGETYPE_MANUFACTURING,
+        mptsas_config_manufacturing_5,
+    }, {
+        6, MPI_CONFIG_PAGETYPE_MANUFACTURING,
+        mptsas_config_manufacturing_6,
+    }, {
+        7, MPI_CONFIG_PAGETYPE_MANUFACTURING,
+        mptsas_config_manufacturing_7,
+    }, {
+        8, MPI_CONFIG_PAGETYPE_MANUFACTURING,
+        mptsas_config_manufacturing_8,
+    }, {
+        9, MPI_CONFIG_PAGETYPE_MANUFACTURING,
+        mptsas_config_manufacturing_9,
+    }, {
+        10, MPI_CONFIG_PAGETYPE_MANUFACTURING,
+        mptsas_config_manufacturing_10,
+    }, {
+        0, MPI_CONFIG_PAGETYPE_IO_UNIT,
+        mptsas_config_io_unit_0,
+    }, {
+        1, MPI_CONFIG_PAGETYPE_IO_UNIT,
+        mptsas_config_io_unit_1,
+    }, {
+        2, MPI_CONFIG_PAGETYPE_IO_UNIT,
+        mptsas_config_io_unit_2,
+    }, {
+        3, MPI_CONFIG_PAGETYPE_IO_UNIT,
+        mptsas_config_io_unit_3,
+    }, {
+        4, MPI_CONFIG_PAGETYPE_IO_UNIT,
+        mptsas_config_io_unit_4,
+    }, {
+        0, MPI_CONFIG_PAGETYPE_IOC,
+        mptsas_config_ioc_0,
+    }, {
+        1, MPI_CONFIG_PAGETYPE_IOC,
+        mptsas_config_ioc_1,
+    }, {
+        2, MPI_CONFIG_PAGETYPE_IOC,
+        mptsas_config_ioc_2,
+    }, {
+        3, MPI_CONFIG_PAGETYPE_IOC,
+        mptsas_config_ioc_3,
+    }, {
+        4, MPI_CONFIG_PAGETYPE_IOC,
+        mptsas_config_ioc_4,
+    }, {
+        5, MPI_CONFIG_PAGETYPE_IOC,
+        mptsas_config_ioc_5,
+    }, {
+        6, MPI_CONFIG_PAGETYPE_IOC,
+        mptsas_config_ioc_6,
+    }, {
+        0, MPI_CONFIG_EXTPAGETYPE_SAS_IO_UNIT,
+        mptsas_config_sas_io_unit_0,
+    }, {
+        1, MPI_CONFIG_EXTPAGETYPE_SAS_IO_UNIT,
+        mptsas_config_sas_io_unit_1,
+    }, {
+        2, MPI_CONFIG_EXTPAGETYPE_SAS_IO_UNIT,
+        mptsas_config_sas_io_unit_2,
+    }, {
+        3, MPI_CONFIG_EXTPAGETYPE_SAS_IO_UNIT,
+        mptsas_config_sas_io_unit_3,
+    }, {
+        0, MPI_CONFIG_EXTPAGETYPE_SAS_PHY,
+        mptsas_config_phy_0,
+    }, {
+        1, MPI_CONFIG_EXTPAGETYPE_SAS_PHY,
+        mptsas_config_phy_1,
+    }, {
+        0, MPI_CONFIG_EXTPAGETYPE_SAS_DEVICE,
+        mptsas_config_sas_device_0,
+    }, {
+        1, MPI_CONFIG_EXTPAGETYPE_SAS_DEVICE,
+        mptsas_config_sas_device_1,
+    }, {
+        2, MPI_CONFIG_EXTPAGETYPE_SAS_DEVICE,
+        mptsas_config_sas_device_2,
+    }
+};
+
+static const MPTSASConfigPage *mptsas_find_config_page(int type, int number)
+{
+    const MPTSASConfigPage *page;
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(mptsas_config_pages); i++) {
+        page = &mptsas_config_pages[i];
+        if (page->type == type && page->number == number) {
+            return page;
+        }
+    }
+
+    return NULL;
+}
+
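+/* Handle a configuration page request: look up the page and, depending on
+ * the action, return just the header (PAGE_HEADER/PAGE_DEFAULT), reject
+ * writes with CANT_COMMIT (all pages are read-only), or build the page
+ * and DMA it to the guest buffer described by PageBufferSGE.
+ */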
+void mptsas_process_config(MPTSASState *s, MPIMsgConfig *req)
+{
+    PCIDevice *pci = PCI_DEVICE(s);
+
+    MPIMsgConfigReply reply;
+    const MPTSASConfigPage *page;
+    size_t length;
+    uint8_t type;
+    uint8_t *data = NULL;
+    uint32_t flags_and_length;
+    uint32_t dmalen;
+    uint64_t pa;
+
+    mptsas_fix_config_endianness(req);
+
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_msg) < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_reply) < sizeof(reply));
+
+    /* Copy common bits from the request into the reply. */
+    memset(&reply, 0, sizeof(reply));
+    reply.Action      = req->Action;
+    reply.Function    = req->Function;
+    reply.MsgContext  = req->MsgContext;
+    reply.MsgLength   = sizeof(reply) / 4;
+    reply.PageType    = req->PageType;
+    reply.PageNumber  = req->PageNumber;
+    reply.PageLength  = req->PageLength;
+    reply.PageVersion = req->PageVersion;
+
+    type = req->PageType & MPI_CONFIG_PAGETYPE_MASK;
+    if (type == MPI_CONFIG_PAGETYPE_EXTENDED) {
+        type = req->ExtPageType;
+        if (type <= MPI_CONFIG_PAGETYPE_MASK) {
+            reply.IOCStatus = MPI_IOCSTATUS_CONFIG_INVALID_TYPE;
+            goto out;
+        }
+
+        reply.ExtPageType = req->ExtPageType;
+    }
+
+    page = mptsas_find_config_page(type, req->PageNumber);
+
+    switch(req->Action) {
+    case MPI_CONFIG_ACTION_PAGE_DEFAULT:
+    case MPI_CONFIG_ACTION_PAGE_HEADER:
+    case MPI_CONFIG_ACTION_PAGE_READ_NVRAM:
+    case MPI_CONFIG_ACTION_PAGE_READ_CURRENT:
+    case MPI_CONFIG_ACTION_PAGE_READ_DEFAULT:
+    case MPI_CONFIG_ACTION_PAGE_WRITE_CURRENT:
+    case MPI_CONFIG_ACTION_PAGE_WRITE_NVRAM:
+        break;
+
+    default:
+        reply.IOCStatus = MPI_IOCSTATUS_CONFIG_INVALID_ACTION;
+        goto out;
+    }
+
+    if (!page) {
+        page = mptsas_find_config_page(type, 1);
+        if (page) {
+            reply.IOCStatus = MPI_IOCSTATUS_CONFIG_INVALID_PAGE;
+        } else {
+            reply.IOCStatus = MPI_IOCSTATUS_CONFIG_INVALID_TYPE;
+        }
+        goto out;
+    }
+
+    if (req->Action == MPI_CONFIG_ACTION_PAGE_DEFAULT ||
+        req->Action == MPI_CONFIG_ACTION_PAGE_HEADER) {
+        length = page->mpt_config_build(s, NULL, req->PageAddress);
+        if ((ssize_t)length < 0) {
+            reply.IOCStatus = MPI_IOCSTATUS_CONFIG_INVALID_PAGE;
+            goto out;
+        } else {
+            goto done;
+        }
+    }
+
+    if (req->Action == MPI_CONFIG_ACTION_PAGE_WRITE_CURRENT ||
+        req->Action == MPI_CONFIG_ACTION_PAGE_WRITE_NVRAM) {
+        length = page->mpt_config_build(s, NULL, req->PageAddress);
+        if ((ssize_t)length < 0) {
+            reply.IOCStatus = MPI_IOCSTATUS_CONFIG_INVALID_PAGE;
+        } else {
+            reply.IOCStatus = MPI_IOCSTATUS_CONFIG_CANT_COMMIT;
+        }
+        goto out;
+    }
+
+    flags_and_length = req->PageBufferSGE.FlagsLength;
+    dmalen = flags_and_length & MPI_SGE_LENGTH_MASK;
+    if (dmalen == 0) {
+        length = page->mpt_config_build(s, NULL, req->PageAddress);
+        if ((ssize_t)length < 0) {
+            reply.IOCStatus = MPI_IOCSTATUS_CONFIG_INVALID_PAGE;
+            goto out;
+        } else {
+            goto done;
+        }
+    }
+
+    if (flags_and_length & MPI_SGE_FLAGS_64_BIT_ADDRESSING) {
+        pa = req->PageBufferSGE.u.Address64;
+    } else {
+        pa = req->PageBufferSGE.u.Address32;
+    }
+
+    /* Only read actions left.  */
+    length = page->mpt_config_build(s, &data, req->PageAddress);
+    if ((ssize_t)length < 0) {
+        reply.IOCStatus = MPI_IOCSTATUS_CONFIG_INVALID_PAGE;
+        goto out;
+    } else {
+        assert(data[2] == page->number);
+        pci_dma_write(pci, pa, data, MIN(length, dmalen));
+        goto done;
+    }
+
+    abort();
+
+done:
+    if (type > MPI_CONFIG_PAGETYPE_MASK) {
+        reply.ExtPageLength = length / 4;
+        reply.ExtPageType   = req->ExtPageType;
+    } else {
+        reply.PageLength    = length / 4;
+    }
+
+out:
+    mptsas_fix_config_reply_endianness(&reply);
+    mptsas_reply(s, (MPIDefaultReply *)&reply);
+    g_free(data);
+}
diff --git a/hw/scsi/mptendian.c b/hw/scsi/mptendian.c
new file mode 100644
index 0000000..b7fe2a2
--- /dev/null
+++ b/hw/scsi/mptendian.c
@@ -0,0 +1,204 @@
+/*
+ * QEMU LSI SAS1068 Host Bus Adapter emulation
+ * Endianness conversion for MPI data structures
+ *
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * Authors: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/hw.h"
+#include "hw/pci/pci.h"
+#include "sysemu/dma.h"
+#include "sysemu/block-backend.h"
+#include "hw/pci/msi.h"
+#include "qemu/iov.h"
+#include "hw/scsi/scsi.h"
+#include "block/scsi.h"
+#include "trace.h"
+
+#include "mptsas.h"
+#include "mpi.h"
+
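+/* Requests are converted from guest (little-endian) to host byte order as
+ * they are read; replies are converted back to little-endian just before
+ * they are posted, hence the separate *_reply_endianness helpers.
+ */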
+static void mptsas_fix_sgentry_endianness(MPISGEntry *sge)
+{
+    le32_to_cpus(&sge->FlagsLength);
+    if (sge->FlagsLength & MPI_SGE_FLAGS_64_BIT_ADDRESSING) {
+       le64_to_cpus(&sge->u.Address64);
+    } else {
+       le32_to_cpus(&sge->u.Address32);
+    }
+}
+
+static void mptsas_fix_sgentry_endianness_reply(MPISGEntry *sge)
+{
+    if (sge->FlagsLength & MPI_SGE_FLAGS_64_BIT_ADDRESSING) {
+       cpu_to_le64s(&sge->u.Address64);
+    } else {
+       cpu_to_le32s(&sge->u.Address32);
+    }
+    cpu_to_le32s(&sge->FlagsLength);
+}
+
+void mptsas_fix_scsi_io_endianness(MPIMsgSCSIIORequest *req)
+{
+    le32_to_cpus(&req->MsgContext);
+    le32_to_cpus(&req->Control);
+    le32_to_cpus(&req->DataLength);
+    le32_to_cpus(&req->SenseBufferLowAddr);
+}
+
+void mptsas_fix_scsi_io_reply_endianness(MPIMsgSCSIIOReply *reply)
+{
+    cpu_to_le32s(&reply->MsgContext);
+    cpu_to_le16s(&reply->IOCStatus);
+    cpu_to_le32s(&reply->IOCLogInfo);
+    cpu_to_le32s(&reply->TransferCount);
+    cpu_to_le32s(&reply->SenseCount);
+    cpu_to_le32s(&reply->ResponseInfo);
+    cpu_to_le16s(&reply->TaskTag);
+}
+
+void mptsas_fix_scsi_task_mgmt_endianness(MPIMsgSCSITaskMgmt *req)
+{
+    le32_to_cpus(&req->MsgContext);
+    le32_to_cpus(&req->TaskMsgContext);
+}
+
+void mptsas_fix_scsi_task_mgmt_reply_endianness(MPIMsgSCSITaskMgmtReply *reply)
+{
+    cpu_to_le32s(&reply->MsgContext);
+    cpu_to_le16s(&reply->IOCStatus);
+    cpu_to_le32s(&reply->IOCLogInfo);
+    cpu_to_le32s(&reply->TerminationCount);
+}
+
+void mptsas_fix_ioc_init_endianness(MPIMsgIOCInit *req)
+{
+    le32_to_cpus(&req->MsgContext);
+    le16_to_cpus(&req->ReplyFrameSize);
+    le32_to_cpus(&req->HostMfaHighAddr);
+    le32_to_cpus(&req->SenseBufferHighAddr);
+    le32_to_cpus(&req->ReplyFifoHostSignalingAddr);
+    mptsas_fix_sgentry_endianness(&req->HostPageBufferSGE);
+    le16_to_cpus(&req->MsgVersion);
+    le16_to_cpus(&req->HeaderVersion);
+}
+
+void mptsas_fix_ioc_init_reply_endianness(MPIMsgIOCInitReply *reply)
+{
+    cpu_to_le32s(&reply->MsgContext);
+    cpu_to_le16s(&reply->IOCStatus);
+    cpu_to_le32s(&reply->IOCLogInfo);
+}
+
+void mptsas_fix_ioc_facts_endianness(MPIMsgIOCFacts *req)
+{
+    le32_to_cpus(&req->MsgContext);
+}
+
+void mptsas_fix_ioc_facts_reply_endianness(MPIMsgIOCFactsReply *reply)
+{
+    cpu_to_le16s(&reply->MsgVersion);
+    cpu_to_le16s(&reply->HeaderVersion);
+    cpu_to_le32s(&reply->MsgContext);
+    cpu_to_le16s(&reply->IOCExceptions);
+    cpu_to_le16s(&reply->IOCStatus);
+    cpu_to_le32s(&reply->IOCLogInfo);
+    cpu_to_le16s(&reply->ReplyQueueDepth);
+    cpu_to_le16s(&reply->RequestFrameSize);
+    cpu_to_le16s(&reply->ProductID);
+    cpu_to_le32s(&reply->CurrentHostMfaHighAddr);
+    cpu_to_le16s(&reply->GlobalCredits);
+    cpu_to_le32s(&reply->CurrentSenseBufferHighAddr);
+    cpu_to_le16s(&reply->CurReplyFrameSize);
+    cpu_to_le32s(&reply->FWImageSize);
+    cpu_to_le32s(&reply->IOCCapabilities);
+    cpu_to_le16s(&reply->HighPriorityQueueDepth);
+    mptsas_fix_sgentry_endianness_reply(&reply->HostPageBufferSGE);
+    cpu_to_le32s(&reply->ReplyFifoHostSignalingAddr);
+}
+
+void mptsas_fix_config_endianness(MPIMsgConfig *req)
+{
+    le16_to_cpus(&req->ExtPageLength);
+    le32_to_cpus(&req->MsgContext);
+    le32_to_cpus(&req->PageAddress);
+    mptsas_fix_sgentry_endianness(&req->PageBufferSGE);
+}
+
+void mptsas_fix_config_reply_endianness(MPIMsgConfigReply *reply)
+{
+    cpu_to_le16s(&reply->ExtPageLength);
+    cpu_to_le32s(&reply->MsgContext);
+    cpu_to_le16s(&reply->IOCStatus);
+    cpu_to_le32s(&reply->IOCLogInfo);
+}
+
+void mptsas_fix_port_facts_endianness(MPIMsgPortFacts *req)
+{
+    le32_to_cpus(&req->MsgContext);
+}
+
+void mptsas_fix_port_facts_reply_endianness(MPIMsgPortFactsReply *reply)
+{
+    cpu_to_le32s(&reply->MsgContext);
+    cpu_to_le16s(&reply->IOCStatus);
+    cpu_to_le32s(&reply->IOCLogInfo);
+    cpu_to_le16s(&reply->MaxDevices);
+    cpu_to_le16s(&reply->PortSCSIID);
+    cpu_to_le16s(&reply->ProtocolFlags);
+    cpu_to_le16s(&reply->MaxPostedCmdBuffers);
+    cpu_to_le16s(&reply->MaxPersistentIDs);
+    cpu_to_le16s(&reply->MaxLanBuckets);
+}
+
+void mptsas_fix_port_enable_endianness(MPIMsgPortEnable *req)
+{
+    le32_to_cpus(&req->MsgContext);
+}
+
+void mptsas_fix_port_enable_reply_endianness(MPIMsgPortEnableReply *reply)
+{
+    cpu_to_le32s(&reply->MsgContext);
+    cpu_to_le16s(&reply->IOCStatus);
+    cpu_to_le32s(&reply->IOCLogInfo);
+}
+
+void mptsas_fix_event_notification_endianness(MPIMsgEventNotify *req)
+{
+    le32_to_cpus(&req->MsgContext);
+}
+
+void mptsas_fix_event_notification_reply_endianness(MPIMsgEventNotifyReply *reply)
+{
+    int length = reply->EventDataLength;
+    int i;
+
+    cpu_to_le16s(&reply->EventDataLength);
+    cpu_to_le32s(&reply->MsgContext);
+    cpu_to_le16s(&reply->IOCStatus);
+    cpu_to_le32s(&reply->IOCLogInfo);
+    cpu_to_le32s(&reply->Event);
+    cpu_to_le32s(&reply->EventContext);
+
+    /* Really depends on the event kind.  This will do for now.  */
+    for (i = 0; i < length; i++) {
+        cpu_to_le32s(&reply->Data[i]);
+    }
+}
+
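
The mptsas_fix_*_endianness helpers above swap each MPI structure in place
with QEMU's byte-swap primitives (le32_to_cpus(), cpu_to_le32s() and
friends).  As a rough, self-contained sketch of what one of those
primitives does (the name and the fallback below are illustrative, not
QEMU's actual implementation):

    #include <stdint.h>

    /* Convert a 32-bit value stored in little-endian (guest) order to
     * host CPU order, in place.  On a little-endian host this is a
     * no-op; on a big-endian host the bytes are swapped.
     */
    static inline void le32_to_cpus_sketch(uint32_t *v)
    {
    #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        *v = __builtin_bswap32(*v);
    #endif
    }
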
diff --git a/hw/scsi/mptsas.c b/hw/scsi/mptsas.c
new file mode 100644
index 0000000..333cc1f
--- /dev/null
+++ b/hw/scsi/mptsas.c
@@ -0,0 +1,1441 @@
+/*
+ * QEMU LSI SAS1068 Host Bus Adapter emulation
+ * Based on the QEMU Megaraid emulator
+ *
+ * Copyright (c) 2009-2012 Hannes Reinecke, SUSE Labs
+ * Copyright (c) 2012 Verizon, Inc.
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * Authors: Don Slutz, Paolo Bonzini
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/hw.h"
+#include "hw/pci/pci.h"
+#include "sysemu/dma.h"
+#include "sysemu/block-backend.h"
+#include "hw/pci/msi.h"
+#include "qemu/iov.h"
+#include "hw/scsi/scsi.h"
+#include "block/scsi.h"
+#include "trace.h"
+
+#include "mptsas.h"
+#include "mpi.h"
+
+#define NAA_LOCALLY_ASSIGNED_ID 0x3ULL
+#define IEEE_COMPANY_LOCALLY_ASSIGNED 0x525400
+
+#define TYPE_MPTSAS1068 "mptsas1068"
+
+#define MPT_SAS(obj) \
+    OBJECT_CHECK(MPTSASState, (obj), TYPE_MPTSAS1068)
+
+#define MPTSAS1068_PRODUCT_ID                  \
+    (MPI_FW_HEADER_PID_FAMILY_1068_SAS |       \
+     MPI_FW_HEADER_PID_PROD_INITIATOR_SCSI |   \
+     MPI_FW_HEADER_PID_TYPE_SAS)
+
+struct MPTSASRequest {
+    MPIMsgSCSIIORequest scsi_io;
+    SCSIRequest *sreq;
+    QEMUSGList qsg;
+    MPTSASState *dev;
+
+    QTAILQ_ENTRY(MPTSASRequest) next;
+};
+
+static void mptsas_update_interrupt(MPTSASState *s)
+{
+    PCIDevice *pci = (PCIDevice *) s;
+    uint32_t state = s->intr_status & ~(s->intr_mask | MPI_HIS_IOP_DOORBELL_STATUS);
+
+    if (s->msi_in_use && msi_enabled(pci)) {
+        if (state) {
+            trace_mptsas_irq_msi(s);
+            msi_notify(pci, 0);
+        }
+    }
+
+    trace_mptsas_irq_intx(s, !!state);
+    pci_set_irq(pci, !!state);
+}
+
+static void mptsas_set_fault(MPTSASState *s, uint32_t code)
+{
+    if ((s->state & MPI_IOC_STATE_FAULT) == 0) {
+        s->state = MPI_IOC_STATE_FAULT | code;
+    }
+}
+
+#define MPTSAS_FIFO_INVALID(s, name)                     \
+    ((s)->name##_head > ARRAY_SIZE((s)->name) ||         \
+     (s)->name##_tail > ARRAY_SIZE((s)->name))
+
+#define MPTSAS_FIFO_EMPTY(s, name)                       \
+    ((s)->name##_head == (s)->name##_tail)
+
+#define MPTSAS_FIFO_FULL(s, name)                        \
+    ((s)->name##_head == ((s)->name##_tail + 1) % ARRAY_SIZE((s)->name))
+
+#define MPTSAS_FIFO_GET(s, name) ({                      \
+    uint32_t _val = (s)->name[(s)->name##_head++];       \
+    (s)->name##_head %= ARRAY_SIZE((s)->name);           \
+    _val;                                                \
+})
+
+#define MPTSAS_FIFO_PUT(s, name, val) do {       \
+    (s)->name[(s)->name##_tail++] = (val);       \
+    (s)->name##_tail %= ARRAY_SIZE((s)->name);   \
+} while(0)
+
+static void mptsas_post_reply(MPTSASState *s, MPIDefaultReply *reply)
+{
+    PCIDevice *pci = (PCIDevice *) s;
+    uint32_t addr_lo;
+
+    if (MPTSAS_FIFO_EMPTY(s, reply_free) || MPTSAS_FIFO_FULL(s, reply_post)) {
+        mptsas_set_fault(s, MPI_IOCSTATUS_INSUFFICIENT_RESOURCES);
+        return;
+    }
+
+    addr_lo = MPTSAS_FIFO_GET(s, reply_free);
+
+    pci_dma_write(pci, addr_lo | s->host_mfa_high_addr, reply,
+                  MIN(s->reply_frame_size, 4 * reply->MsgLength));
+
+    MPTSAS_FIFO_PUT(s, reply_post, MPI_ADDRESS_REPLY_A_BIT | (addr_lo >> 1));
+
+    s->intr_status |= MPI_HIS_REPLY_MESSAGE_INTERRUPT;
+    if (s->doorbell_state == DOORBELL_WRITE) {
+        s->doorbell_state = DOORBELL_NONE;
+        s->intr_status |= MPI_HIS_DOORBELL_INTERRUPT;
+    }
+    mptsas_update_interrupt(s);
+}
+
+void mptsas_reply(MPTSASState *s, MPIDefaultReply *reply)
+{
+    if (s->doorbell_state == DOORBELL_WRITE) {
+        /* The reply is sent out in 16 bit chunks, while the size
+         * in the reply is in 32 bit units.
+         */
+        s->doorbell_state = DOORBELL_READ;
+        s->doorbell_reply_idx = 0;
+        s->doorbell_reply_size = reply->MsgLength * 2;
+        memcpy(s->doorbell_reply, reply, s->doorbell_reply_size * 2);
+        s->intr_status |= MPI_HIS_DOORBELL_INTERRUPT;
+        mptsas_update_interrupt(s);
+    } else {
+        mptsas_post_reply(s, reply);
+    }
+}
+
+static void mptsas_turbo_reply(MPTSASState *s, uint32_t msgctx)
+{
+    if (MPTSAS_FIFO_FULL(s, reply_post)) {
+        mptsas_set_fault(s, MPI_IOCSTATUS_INSUFFICIENT_RESOURCES);
+        return;
+    }
+
+    /* The reply is just the message context ID (bit 31 = clear). */
+    MPTSAS_FIFO_PUT(s, reply_post, msgctx);
+
+    s->intr_status |= MPI_HIS_REPLY_MESSAGE_INTERRUPT;
+    mptsas_update_interrupt(s);
+}
+
+#define MPTSAS_MAX_REQUEST_SIZE 52
+
+static const int mpi_request_sizes[] = {
+    [MPI_FUNCTION_SCSI_IO_REQUEST]    = sizeof(MPIMsgSCSIIORequest),
+    [MPI_FUNCTION_SCSI_TASK_MGMT]     = sizeof(MPIMsgSCSITaskMgmt),
+    [MPI_FUNCTION_IOC_INIT]           = sizeof(MPIMsgIOCInit),
+    [MPI_FUNCTION_IOC_FACTS]          = sizeof(MPIMsgIOCFacts),
+    [MPI_FUNCTION_CONFIG]             = sizeof(MPIMsgConfig),
+    [MPI_FUNCTION_PORT_FACTS]         = sizeof(MPIMsgPortFacts),
+    [MPI_FUNCTION_PORT_ENABLE]        = sizeof(MPIMsgPortEnable),
+    [MPI_FUNCTION_EVENT_NOTIFICATION] = sizeof(MPIMsgEventNotify),
+};
+
+static dma_addr_t mptsas_ld_sg_base(MPTSASState *s, uint32_t flags_and_length,
+                                    dma_addr_t *sgaddr)
+{
+    PCIDevice *pci = (PCIDevice *) s;
+    dma_addr_t addr;
+
+    if (flags_and_length & MPI_SGE_FLAGS_64_BIT_ADDRESSING) {
+        addr = ldq_le_pci_dma(pci, *sgaddr + 4);
+        *sgaddr += 12;
+    } else {
+        addr = ldl_le_pci_dma(pci, *sgaddr + 4);
+        *sgaddr += 8;
+    }
+    return addr;
+}
+
+static int mptsas_build_sgl(MPTSASState *s, MPTSASRequest *req, hwaddr addr)
+{
+    PCIDevice *pci = (PCIDevice *) s;
+    hwaddr next_chain_addr;
+    uint32_t left;
+    hwaddr sgaddr;
+    uint32_t chain_offset;
+
+    chain_offset = req->scsi_io.ChainOffset;
+    next_chain_addr = addr + chain_offset * sizeof(uint32_t);
+    sgaddr = addr + sizeof(MPIMsgSCSIIORequest);
+    pci_dma_sglist_init(&req->qsg, pci, 4);
+    left = req->scsi_io.DataLength;
+
+    for(;;) {
+        dma_addr_t addr, len;
+        uint32_t flags_and_length;
+
+        flags_and_length = ldl_le_pci_dma(pci, sgaddr);
+        len = flags_and_length & MPI_SGE_LENGTH_MASK;
+        if ((flags_and_length & MPI_SGE_FLAGS_ELEMENT_TYPE_MASK)
+            != MPI_SGE_FLAGS_SIMPLE_ELEMENT ||
+            (!len &&
+             !(flags_and_length & MPI_SGE_FLAGS_END_OF_LIST) &&
+             !(flags_and_length & MPI_SGE_FLAGS_END_OF_BUFFER))) {
+            return MPI_IOCSTATUS_INVALID_SGL;
+        }
+
+        len = MIN(len, left);
+        if (!len) {
+            /* We reached the desired transfer length, ignore extra
+             * elements of the s/g list.
+             */
+            break;
+        }
+
+        addr = mptsas_ld_sg_base(s, flags_and_length, &sgaddr);
+        qemu_sglist_add(&req->qsg, addr, len);
+        left -= len;
+
+        if (flags_and_length & MPI_SGE_FLAGS_END_OF_LIST) {
+            break;
+        }
+
+        if (flags_and_length & MPI_SGE_FLAGS_LAST_ELEMENT) {
+            if (!chain_offset) {
+                break;
+            }
+
+            flags_and_length = ldl_le_pci_dma(pci, next_chain_addr);
+            if ((flags_and_length & MPI_SGE_FLAGS_ELEMENT_TYPE_MASK)
+                != MPI_SGE_FLAGS_CHAIN_ELEMENT) {
+                return MPI_IOCSTATUS_INVALID_SGL;
+            }
+
+            sgaddr = mptsas_ld_sg_base(s, flags_and_length, &next_chain_addr);
+            chain_offset =
+                (flags_and_length & MPI_SGE_CHAIN_OFFSET_MASK) >> MPI_SGE_CHAIN_OFFSET_SHIFT;
+            next_chain_addr = sgaddr + chain_offset * sizeof(uint32_t);
+        }
+    }
+    return 0;
+}
+
+static void mptsas_free_request(MPTSASRequest *req)
+{
+    MPTSASState *s = req->dev;
+
+    if (req->sreq != NULL) {
+        req->sreq->hba_private = NULL;
+        scsi_req_unref(req->sreq);
+        req->sreq = NULL;
+        QTAILQ_REMOVE(&s->pending, req, next);
+    }
+    qemu_sglist_destroy(&req->qsg);
+    g_free(req);
+}
+
+static int mptsas_scsi_device_find(MPTSASState *s, int bus, int target,
+                                   uint8_t *lun, SCSIDevice **sdev)
+{
+    if (bus != 0) {
+        return MPI_IOCSTATUS_SCSI_INVALID_BUS;
+    }
+
+    if (target >= s->max_devices) {
+        return MPI_IOCSTATUS_SCSI_INVALID_TARGETID;
+    }
+
+    *sdev = scsi_device_find(&s->bus, bus, target, lun[1]);
+    if (!*sdev) {
+        return MPI_IOCSTATUS_SCSI_DEVICE_NOT_THERE;
+    }
+
+    return 0;
+}
+
+static int mptsas_process_scsi_io_request(MPTSASState *s,
+                                          MPIMsgSCSIIORequest *scsi_io,
+                                          hwaddr addr)
+{
+    MPTSASRequest *req;
+    MPIMsgSCSIIOReply reply;
+    SCSIDevice *sdev;
+    int status;
+
+    mptsas_fix_scsi_io_endianness(scsi_io);
+
+    trace_mptsas_process_scsi_io_request(s, scsi_io->Bus, scsi_io->TargetID,
+                                         scsi_io->LUN[1], scsi_io->DataLength);
+
+    status = mptsas_scsi_device_find(s, scsi_io->Bus, scsi_io->TargetID,
+                                     scsi_io->LUN, &sdev);
+    if (status) {
+        goto bad;
+    }
+
+    req = g_new(MPTSASRequest, 1);
+    QTAILQ_INSERT_TAIL(&s->pending, req, next);
+    req->scsi_io = *scsi_io;
+    req->dev = s;
+
+    status = mptsas_build_sgl(s, req, addr);
+    if (status) {
+        goto free_bad;
+    }
+
+    if (req->qsg.size < scsi_io->DataLength) {
+        trace_mptsas_sgl_overflow(s, scsi_io->MsgContext, scsi_io->DataLength,
+                                  req->qsg.size);
+        status = MPI_IOCSTATUS_INVALID_SGL;
+        goto free_bad;
+    }
+
+    req->sreq = scsi_req_new(sdev, scsi_io->MsgContext,
+                            scsi_io->LUN[1], scsi_io->CDB, req);
+
+    if (req->sreq->cmd.xfer > scsi_io->DataLength) {
+        goto overrun;
+    }
+    switch (scsi_io->Control & MPI_SCSIIO_CONTROL_DATADIRECTION_MASK) {
+    case MPI_SCSIIO_CONTROL_NODATATRANSFER:
+        if (req->sreq->cmd.mode != SCSI_XFER_NONE) {
+            goto overrun;
+        }
+        break;
+
+    case MPI_SCSIIO_CONTROL_WRITE:
+        if (req->sreq->cmd.mode != SCSI_XFER_TO_DEV) {
+            goto overrun;
+        }
+        break;
+
+    case MPI_SCSIIO_CONTROL_READ:
+        if (req->sreq->cmd.mode != SCSI_XFER_FROM_DEV) {
+            goto overrun;
+        }
+        break;
+    }
+
+    if (scsi_req_enqueue(req->sreq)) {
+        scsi_req_continue(req->sreq);
+    }
+    return 0;
+
+overrun:
+    trace_mptsas_scsi_overflow(s, scsi_io->MsgContext, req->sreq->cmd.xfer,
+                               scsi_io->DataLength);
+    status = MPI_IOCSTATUS_SCSI_DATA_OVERRUN;
+free_bad:
+    mptsas_free_request(req);
+bad:
+    memset(&reply, 0, sizeof(reply));
+    reply.TargetID          = scsi_io->TargetID;
+    reply.Bus               = scsi_io->Bus;
+    reply.MsgLength         = sizeof(reply) / 4;
+    reply.Function          = scsi_io->Function;
+    reply.CDBLength         = scsi_io->CDBLength;
+    reply.SenseBufferLength = scsi_io->SenseBufferLength;
+    reply.MsgContext        = scsi_io->MsgContext;
+    reply.SCSIState         = MPI_SCSI_STATE_NO_SCSI_STATUS;
+    reply.IOCStatus         = status;
+
+    mptsas_fix_scsi_io_reply_endianness(&reply);
+    mptsas_reply(s, (MPIDefaultReply *)&reply);
+
+    return 0;
+}
+
+typedef struct {
+    Notifier                notifier;
+    MPTSASState             *s;
+    MPIMsgSCSITaskMgmtReply *reply;
+} MPTSASCancelNotifier;
+
+static void mptsas_cancel_notify(Notifier *notifier, void *data)
+{
+    MPTSASCancelNotifier *n = container_of(notifier,
+                                           MPTSASCancelNotifier,
+                                           notifier);
+
+    /* Abusing IOCLogInfo to store the expected number of requests... */
+    if (++n->reply->TerminationCount == n->reply->IOCLogInfo) {
+        n->reply->IOCLogInfo = 0;
+        mptsas_fix_scsi_task_mgmt_reply_endianness(n->reply);
+        mptsas_post_reply(n->s, (MPIDefaultReply *)n->reply);
+        g_free(n->reply);
+    }
+    g_free(n);
+}
+
+static void mptsas_process_scsi_task_mgmt(MPTSASState *s, MPIMsgSCSITaskMgmt *req)
+{
+    MPIMsgSCSITaskMgmtReply reply;
+    MPIMsgSCSITaskMgmtReply *reply_async;
+    int status, count;
+    SCSIDevice *sdev;
+    SCSIRequest *r, *next;
+    BusChild *kid;
+
+    mptsas_fix_scsi_task_mgmt_endianness(req);
+
+    QEMU_BUILD_BUG_ON(MPTSAS_MAX_REQUEST_SIZE < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_msg) < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_reply) < sizeof(reply));
+
+    memset(&reply, 0, sizeof(reply));
+    reply.TargetID   = req->TargetID;
+    reply.Bus        = req->Bus;
+    reply.MsgLength  = sizeof(reply) / 4;
+    reply.Function   = req->Function;
+    reply.TaskType   = req->TaskType;
+    reply.MsgContext = req->MsgContext;
+
+    switch (req->TaskType) {
+    case MPI_SCSITASKMGMT_TASKTYPE_ABORT_TASK:
+    case MPI_SCSITASKMGMT_TASKTYPE_QUERY_TASK:
+        status = mptsas_scsi_device_find(s, req->Bus, req->TargetID,
+                                         req->LUN, &sdev);
+        if (status) {
+            reply.IOCStatus = status;
+            goto out;
+        }
+        if (sdev->lun != req->LUN[1]) {
+            reply.ResponseCode = MPI_SCSITASKMGMT_RSP_TM_INVALID_LUN;
+            goto out;
+        }
+
+        QTAILQ_FOREACH_SAFE(r, &sdev->requests, next, next) {
+            MPTSASRequest *cmd_req = r->hba_private;
+            if (cmd_req && cmd_req->scsi_io.MsgContext == req->TaskMsgContext) {
+                break;
+            }
+        }
+        if (r) {
+            /*
+             * Assert that the request has not been completed yet, we
+             * check for it in the loop above.
+             */
+            assert(r->hba_private);
+            if (req->TaskType == MPI_SCSITASKMGMT_TASKTYPE_QUERY_TASK) {
+                /* "If the specified command is present in the task set, then
+                 * return a service response set to FUNCTION SUCCEEDED".
+                 */
+                reply.ResponseCode = MPI_SCSITASKMGMT_RSP_TM_SUCCEEDED;
+            } else {
+                MPTSASCancelNotifier *notifier;
+
+                reply_async = g_memdup(&reply, sizeof(MPIMsgSCSITaskMgmtReply));
+                reply_async->IOCLogInfo = INT_MAX;
+
+                count = 1;
+                notifier = g_new(MPTSASCancelNotifier, 1);
+                notifier->s = s;
+                notifier->reply = reply_async;
+                notifier->notifier.notify = mptsas_cancel_notify;
+                scsi_req_cancel_async(r, &notifier->notifier);
+                goto reply_maybe_async;
+            }
+        }
+        break;
+
+    case MPI_SCSITASKMGMT_TASKTYPE_ABRT_TASK_SET:
+    case MPI_SCSITASKMGMT_TASKTYPE_CLEAR_TASK_SET:
+        status = mptsas_scsi_device_find(s, req->Bus, req->TargetID,
+                                         req->LUN, &sdev);
+        if (status) {
+            reply.IOCStatus = status;
+            goto out;
+        }
+        if (sdev->lun != req->LUN[1]) {
+            reply.ResponseCode = MPI_SCSITASKMGMT_RSP_TM_INVALID_LUN;
+            goto out;
+        }
+
+        reply_async = g_memdup(&reply, sizeof(MPIMsgSCSITaskMgmtReply));
+        reply_async->IOCLogInfo = INT_MAX;
+
+        count = 0;
+        QTAILQ_FOREACH_SAFE(r, &sdev->requests, next, next) {
+            if (r->hba_private) {
+                MPTSASCancelNotifier *notifier;
+
+                count++;
+                notifier = g_new(MPTSASCancelNotifier, 1);
+                notifier->s = s;
+                notifier->reply = reply_async;
+                notifier->notifier.notify = mptsas_cancel_notify;
+                scsi_req_cancel_async(r, &notifier->notifier);
+            }
+        }
+
+reply_maybe_async:
+        if (reply_async->TerminationCount < count) {
+            reply_async->IOCLogInfo = count;
+            return;
+        }
+        reply.TerminationCount = count;
+        break;
+
+    case MPI_SCSITASKMGMT_TASKTYPE_LOGICAL_UNIT_RESET:
+        status = mptsas_scsi_device_find(s, req->Bus, req->TargetID,
+                                         req->LUN, &sdev);
+        if (status) {
+            reply.IOCStatus = status;
+            goto out;
+        }
+        if (sdev->lun != req->LUN[1]) {
+            reply.ResponseCode = MPI_SCSITASKMGMT_RSP_TM_INVALID_LUN;
+            goto out;
+        }
+        qdev_reset_all(&sdev->qdev);
+        break;
+
+    case MPI_SCSITASKMGMT_TASKTYPE_TARGET_RESET:
+        if (req->Bus != 0) {
+            reply.IOCStatus = MPI_IOCSTATUS_SCSI_INVALID_BUS;
+            goto out;
+        }
+        if (req->TargetID > s->max_devices) {
+            reply.IOCStatus = MPI_IOCSTATUS_SCSI_INVALID_TARGETID;
+            goto out;
+        }
+
+        QTAILQ_FOREACH(kid, &s->bus.qbus.children, sibling) {
+            sdev = SCSI_DEVICE(kid->child);
+            if (sdev->channel == 0 && sdev->id == req->TargetID) {
+                qdev_reset_all(kid->child);
+            }
+        }
+        break;
+
+    case MPI_SCSITASKMGMT_TASKTYPE_RESET_BUS:
+        qbus_reset_all(&s->bus.qbus);
+        break;
+
+    default:
+        reply.ResponseCode = MPI_SCSITASKMGMT_RSP_TM_NOT_SUPPORTED;
+        break;
+    }
+
+out:
+    mptsas_fix_scsi_task_mgmt_reply_endianness(&reply);
+    mptsas_post_reply(s, (MPIDefaultReply *)&reply);
+}
+
+static void mptsas_process_ioc_init(MPTSASState *s, MPIMsgIOCInit *req)
+{
+    MPIMsgIOCInitReply reply;
+
+    mptsas_fix_ioc_init_endianness(req);
+
+    QEMU_BUILD_BUG_ON(MPTSAS_MAX_REQUEST_SIZE < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_msg) < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_reply) < sizeof(reply));
+
+    s->who_init               = req->WhoInit;
+    s->reply_frame_size       = req->ReplyFrameSize;
+    s->max_buses              = req->MaxBuses;
+    s->max_devices            = req->MaxDevices ? req->MaxDevices : 256;
+    s->host_mfa_high_addr     = (hwaddr)req->HostMfaHighAddr << 32;
+    s->sense_buffer_high_addr = (hwaddr)req->SenseBufferHighAddr << 32;
+
+    if (s->state == MPI_IOC_STATE_READY) {
+        s->state = MPI_IOC_STATE_OPERATIONAL;
+    }
+
+    memset(&reply, 0, sizeof(reply));
+    reply.WhoInit    = s->who_init;
+    reply.MsgLength  = sizeof(reply) / 4;
+    reply.Function   = req->Function;
+    reply.MaxDevices = s->max_devices;
+    reply.MaxBuses   = s->max_buses;
+    reply.MsgContext = req->MsgContext;
+
+    mptsas_fix_ioc_init_reply_endianness(&reply);
+    mptsas_reply(s, (MPIDefaultReply *)&reply);
+}
+
+static void mptsas_process_ioc_facts(MPTSASState *s,
+                                     MPIMsgIOCFacts *req)
+{
+    MPIMsgIOCFactsReply reply;
+
+    mptsas_fix_ioc_facts_endianness(req);
+
+    QEMU_BUILD_BUG_ON(MPTSAS_MAX_REQUEST_SIZE < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_msg) < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_reply) < sizeof(reply));
+
+    memset(&reply, 0, sizeof(reply));
+    reply.MsgVersion                 = 0x0105;
+    reply.MsgLength                  = sizeof(reply) / 4;
+    reply.Function                   = req->Function;
+    reply.MsgContext                 = req->MsgContext;
+    reply.MaxChainDepth              = MPTSAS_MAXIMUM_CHAIN_DEPTH;
+    reply.WhoInit                    = s->who_init;
+    reply.BlockSize                  = MPTSAS_MAX_REQUEST_SIZE / sizeof(uint32_t);
+    reply.ReplyQueueDepth            = ARRAY_SIZE(s->reply_post) - 1;
+    QEMU_BUILD_BUG_ON(ARRAY_SIZE(s->reply_post) != ARRAY_SIZE(s->reply_free));
+
+    reply.RequestFrameSize           = 128;
+    reply.ProductID                  = MPTSAS1068_PRODUCT_ID;
+    reply.CurrentHostMfaHighAddr     = s->host_mfa_high_addr >> 32;
+    reply.GlobalCredits              = ARRAY_SIZE(s->request_post) - 1;
+    reply.NumberOfPorts              = MPTSAS_NUM_PORTS;
+    reply.CurrentSenseBufferHighAddr = s->sense_buffer_high_addr >> 32;
+    reply.CurReplyFrameSize          = s->reply_frame_size;
+    reply.MaxDevices                 = s->max_devices;
+    reply.MaxBuses                   = s->max_buses;
+    reply.FWVersionDev               = 0;
+    reply.FWVersionUnit              = 0x92;
+    reply.FWVersionMinor             = 0x32;
+    reply.FWVersionMajor             = 0x1;
+
+    mptsas_fix_ioc_facts_reply_endianness(&reply);
+    mptsas_reply(s, (MPIDefaultReply *)&reply);
+}
+
+static void mptsas_process_port_facts(MPTSASState *s,
+                                     MPIMsgPortFacts *req)
+{
+    MPIMsgPortFactsReply reply;
+
+    mptsas_fix_port_facts_endianness(req);
+
+    QEMU_BUILD_BUG_ON(MPTSAS_MAX_REQUEST_SIZE < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_msg) < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_reply) < sizeof(reply));
+
+    memset(&reply, 0, sizeof(reply));
+    reply.MsgLength  = sizeof(reply) / 4;
+    reply.Function   = req->Function;
+    reply.PortNumber = req->PortNumber;
+    reply.MsgContext = req->MsgContext;
+
+    if (req->PortNumber < MPTSAS_NUM_PORTS) {
+        reply.PortType      = MPI_PORTFACTS_PORTTYPE_SAS;
+        reply.MaxDevices    = MPTSAS_NUM_PORTS;
+        reply.PortSCSIID    = MPTSAS_NUM_PORTS;
+        reply.ProtocolFlags = MPI_PORTFACTS_PROTOCOL_LOGBUSADDR | MPI_PORTFACTS_PROTOCOL_INITIATOR;
+    }
+
+    mptsas_fix_port_facts_reply_endianness(&reply);
+    mptsas_reply(s, (MPIDefaultReply *)&reply);
+}
+
+static void mptsas_process_port_enable(MPTSASState *s,
+                                       MPIMsgPortEnable *req)
+{
+    MPIMsgPortEnableReply reply;
+
+    mptsas_fix_port_enable_endianness(req);
+
+    QEMU_BUILD_BUG_ON(MPTSAS_MAX_REQUEST_SIZE < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_msg) < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_reply) < sizeof(reply));
+
+    memset(&reply, 0, sizeof(reply));
+    reply.MsgLength  = sizeof(reply) / 4;
+    reply.PortNumber = req->PortNumber;
+    reply.Function   = req->Function;
+    reply.MsgContext = req->MsgContext;
+
+    mptsas_fix_port_enable_reply_endianness(&reply);
+    mptsas_reply(s, (MPIDefaultReply *)&reply);
+}
+
+static void mptsas_process_event_notification(MPTSASState *s,
+                                              MPIMsgEventNotify *req)
+{
+    MPIMsgEventNotifyReply reply;
+
+    mptsas_fix_event_notification_endianness(req);
+
+    QEMU_BUILD_BUG_ON(MPTSAS_MAX_REQUEST_SIZE < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_msg) < sizeof(*req));
+    QEMU_BUILD_BUG_ON(sizeof(s->doorbell_reply) < sizeof(reply));
+
+    /* Don't even bother storing whether event notification is enabled,
+     * since it is not accessible.
+     */
+
+    memset(&reply, 0, sizeof(reply));
+    reply.EventDataLength = sizeof(reply.Data) / 4;
+    reply.MsgLength       = sizeof(reply) / 4;
+    reply.Function        = req->Function;
+
+    /* This is set because events are sent through the reply FIFOs.  */
+    reply.MsgFlags        = MPI_MSGFLAGS_CONTINUATION_REPLY;
+
+    reply.MsgContext      = req->MsgContext;
+    reply.Event           = MPI_EVENT_EVENT_CHANGE;
+    reply.Data[0]         = !!req->Switch;
+
+    mptsas_fix_event_notification_reply_endianness(&reply);
+    mptsas_reply(s, (MPIDefaultReply *)&reply);
+}
+
+static void mptsas_process_message(MPTSASState *s, MPIRequestHeader *req)
+{
+    trace_mptsas_process_message(s, req->Function, req->MsgContext);
+    switch (req->Function) {
+    case MPI_FUNCTION_SCSI_TASK_MGMT:
+        mptsas_process_scsi_task_mgmt(s, (MPIMsgSCSITaskMgmt *)req);
+        break;
+
+    case MPI_FUNCTION_IOC_INIT:
+        mptsas_process_ioc_init(s, (MPIMsgIOCInit *)req);
+        break;
+
+    case MPI_FUNCTION_IOC_FACTS:
+        mptsas_process_ioc_facts(s, (MPIMsgIOCFacts *)req);
+        break;
+
+    case MPI_FUNCTION_PORT_FACTS:
+        mptsas_process_port_facts(s, (MPIMsgPortFacts *)req);
+        break;
+
+    case MPI_FUNCTION_PORT_ENABLE:
+        mptsas_process_port_enable(s, (MPIMsgPortEnable *)req);
+        break;
+
+    case MPI_FUNCTION_EVENT_NOTIFICATION:
+        mptsas_process_event_notification(s, (MPIMsgEventNotify *)req);
+        break;
+
+    case MPI_FUNCTION_CONFIG:
+        mptsas_process_config(s, (MPIMsgConfig *)req);
+        break;
+
+    default:
+        trace_mptsas_unhandled_cmd(s, req->Function, 0);
+        mptsas_set_fault(s, MPI_IOCSTATUS_INVALID_FUNCTION);
+        break;
+    }
+}
+
+static void mptsas_fetch_request(MPTSASState *s)
+{
+    PCIDevice *pci = (PCIDevice *) s;
+    char req[MPTSAS_MAX_REQUEST_SIZE];
+    MPIRequestHeader *hdr = (MPIRequestHeader *)req;
+    hwaddr addr;
+    int size;
+
+    if (s->state != MPI_IOC_STATE_OPERATIONAL) {
+        mptsas_set_fault(s, MPI_IOCSTATUS_INVALID_STATE);
+        return;
+    }
+
+    /* Read the message header from the guest first. */
+    addr = s->host_mfa_high_addr | MPTSAS_FIFO_GET(s, request_post);
+    pci_dma_read(pci, addr, req, sizeof(hdr));
+
+    if (hdr->Function < ARRAY_SIZE(mpi_request_sizes) &&
+        mpi_request_sizes[hdr->Function]) {
+        /* Read the rest of the request based on the type.  Do not
+         * reread everything, as that could cause a TOC/TOU mismatch
+         * and leak data from the QEMU stack.
+         */
+        size = mpi_request_sizes[hdr->Function];
+        assert(size <= MPTSAS_MAX_REQUEST_SIZE);
+        pci_dma_read(pci, addr + sizeof(hdr), &req[sizeof(hdr)],
+                     size - sizeof(hdr));
+    }
+
+    if (hdr->Function == MPI_FUNCTION_SCSI_IO_REQUEST) {
+        /* SCSI I/O requests are separate from mptsas_process_message
+         * because they cannot be sent through the doorbell yet.
+         */
+        mptsas_process_scsi_io_request(s, (MPIMsgSCSIIORequest *)req, addr);
+    } else {
+        mptsas_process_message(s, (MPIRequestHeader *)req);
+    }
+}
+
+static void mptsas_fetch_requests(void *opaque)
+{
+    MPTSASState *s = opaque;
+
+    while (!MPTSAS_FIFO_EMPTY(s, request_post)) {
+        mptsas_fetch_request(s);
+    }
+}
+
+static void mptsas_soft_reset(MPTSASState *s)
+{
+    uint32_t save_mask;
+
+    trace_mptsas_reset(s);
+
+    /* Temporarily disable interrupts */
+    save_mask = s->intr_mask;
+    s->intr_mask = MPI_HIM_DIM | MPI_HIM_RIM;
+    mptsas_update_interrupt(s);
+
+    qbus_reset_all(&s->bus.qbus);
+    s->intr_status = 0;
+    s->intr_mask = save_mask;
+
+    s->reply_free_tail = 0;
+    s->reply_free_head = 0;
+    s->reply_post_tail = 0;
+    s->reply_post_head = 0;
+    s->request_post_tail = 0;
+    s->request_post_head = 0;
+    qemu_bh_cancel(s->request_bh);
+
+    s->state = MPI_IOC_STATE_READY;
+}
+
+static uint32_t mptsas_doorbell_read(MPTSASState *s)
+{
+    uint32_t ret;
+
+    ret = (s->who_init << MPI_DOORBELL_WHO_INIT_SHIFT) & MPI_DOORBELL_WHO_INIT_MASK;
+    ret |= s->state;
+    switch (s->doorbell_state) {
+    case DOORBELL_NONE:
+        break;
+
+    case DOORBELL_WRITE:
+        ret |= MPI_DOORBELL_ACTIVE;
+        break;
+
+    case DOORBELL_READ:
+        /* Get rid of the IOC fault code.  */
+        ret &= ~MPI_DOORBELL_DATA_MASK;
+
+        assert(s->intr_status & MPI_HIS_DOORBELL_INTERRUPT);
+        assert(s->doorbell_reply_idx <= s->doorbell_reply_size);
+
+        ret |= MPI_DOORBELL_ACTIVE;
+        if (s->doorbell_reply_idx < s->doorbell_reply_size) {
+            /* For more information about this endian switch, see the
+             * commit message for commit 36b62ae ("fw_cfg: fix endianness in
+             * fw_cfg_data_mem_read() / _write()", 2015-01-16).
+             */
+            ret |= le16_to_cpu(s->doorbell_reply[s->doorbell_reply_idx++]);
+        }
+        break;
+
+    default:
+        abort();
+    }
+
+    return ret;
+}
+
+static void mptsas_doorbell_write(MPTSASState *s, uint32_t val)
+{
+    if (s->doorbell_state == DOORBELL_WRITE) {
+        if (s->doorbell_idx < s->doorbell_cnt) {
+            /* For more information about this endian switch, see the
+             * commit message for commit 36b62ae ("fw_cfg: fix endianness in
+             * fw_cfg_data_mem_read() / _write()", 2015-01-16).
+             */
+            s->doorbell_msg[s->doorbell_idx++] = cpu_to_le32(val);
+            if (s->doorbell_idx == s->doorbell_cnt) {
+                mptsas_process_message(s, (MPIRequestHeader *)s->doorbell_msg);
+            }
+        }
+        return;
+    }
+
+    switch ((val & MPI_DOORBELL_FUNCTION_MASK) >> MPI_DOORBELL_FUNCTION_SHIFT) {
+    case MPI_FUNCTION_IOC_MESSAGE_UNIT_RESET:
+        mptsas_soft_reset(s);
+        break;
+    case MPI_FUNCTION_IO_UNIT_RESET:
+        break;
+    case MPI_FUNCTION_HANDSHAKE:
+        s->doorbell_state = DOORBELL_WRITE;
+        s->doorbell_idx = 0;
+        s->doorbell_cnt = (val & MPI_DOORBELL_ADD_DWORDS_MASK)
+            >> MPI_DOORBELL_ADD_DWORDS_SHIFT;
+        s->intr_status |= MPI_HIS_DOORBELL_INTERRUPT;
+        mptsas_update_interrupt(s);
+        break;
+    default:
+        trace_mptsas_unhandled_doorbell_cmd(s, val);
+        break;
+    }
+}
+
+static void mptsas_write_sequence_write(MPTSASState *s, uint32_t val)
+{
+    /* If the diagnostic register is enabled, any write to this register
+     * will disable it.  Otherwise, the guest has to do a magic five-write
+     * sequence.
+     */
+    if (s->diagnostic & MPI_DIAG_DRWE) {
+        goto disable;
+    }
+
+    switch (s->diagnostic_idx) {
+    case 0:
+        if ((val & MPI_WRSEQ_KEY_VALUE_MASK) != MPI_WRSEQ_1ST_KEY_VALUE) {
+            goto disable;
+        }
+        break;
+    case 1:
+        if ((val & MPI_WRSEQ_KEY_VALUE_MASK) != MPI_WRSEQ_2ND_KEY_VALUE) {
+            goto disable;
+        }
+        break;
+    case 2:
+        if ((val & MPI_WRSEQ_KEY_VALUE_MASK) != MPI_WRSEQ_3RD_KEY_VALUE) {
+            goto disable;
+        }
+        break;
+    case 3:
+        if ((val & MPI_WRSEQ_KEY_VALUE_MASK) != MPI_WRSEQ_4TH_KEY_VALUE) {
+            goto disable;
+        }
+        break;
+    case 4:
+        if ((val & MPI_WRSEQ_KEY_VALUE_MASK) != MPI_WRSEQ_5TH_KEY_VALUE) {
+            goto disable;
+        }
+        /* Prepare Spaceball One for departure, and change the
+         * combination on my luggage!
+         */
+        s->diagnostic |= MPI_DIAG_DRWE;
+        break;
+    }
+    s->diagnostic_idx++;
+    return;
+
+disable:
+    s->diagnostic &= ~MPI_DIAG_DRWE;
+    s->diagnostic_idx = 0;
+}
+
+static int mptsas_hard_reset(MPTSASState *s)
+{
+    mptsas_soft_reset(s);
+
+    s->intr_mask = MPI_HIM_DIM | MPI_HIM_RIM;
+
+    s->host_mfa_high_addr = 0;
+    s->sense_buffer_high_addr = 0;
+    s->reply_frame_size = 0;
+    s->max_devices = MPTSAS_NUM_PORTS;
+    s->max_buses = 1;
+
+    return 0;
+}
+
+static void mptsas_interrupt_status_write(MPTSASState *s)
+{
+    switch (s->doorbell_state) {
+    case DOORBELL_NONE:
+    case DOORBELL_WRITE:
+        s->intr_status &= ~MPI_HIS_DOORBELL_INTERRUPT;
+        break;
+
+    case DOORBELL_READ:
+        /* The reply can be read continuously, so leave the interrupt up.  */
+        assert(s->intr_status & MPI_HIS_DOORBELL_INTERRUPT);
+        if (s->doorbell_reply_idx == s->doorbell_reply_size) {
+            s->doorbell_state = DOORBELL_NONE;
+        }
+        break;
+
+    default:
+        abort();
+    }
+    mptsas_update_interrupt(s);
+}
+
+static uint32_t mptsas_reply_post_read(MPTSASState *s)
+{
+    uint32_t ret;
+
+    if (!MPTSAS_FIFO_EMPTY(s, reply_post)) {
+        ret = MPTSAS_FIFO_GET(s, reply_post);
+    } else {
+        ret = -1;
+        s->intr_status &= ~MPI_HIS_REPLY_MESSAGE_INTERRUPT;
+        mptsas_update_interrupt(s);
+    }
+
+    return ret;
+}
+
+static uint64_t mptsas_mmio_read(void *opaque, hwaddr addr,
+                                  unsigned size)
+{
+    MPTSASState *s = opaque;
+    uint32_t ret = 0;
+
+    switch (addr & ~3) {
+    case MPI_DOORBELL_OFFSET:
+        ret = mptsas_doorbell_read(s);
+        break;
+
+    case MPI_DIAGNOSTIC_OFFSET:
+        ret = s->diagnostic;
+        break;
+
+    case MPI_HOST_INTERRUPT_STATUS_OFFSET:
+        ret = s->intr_status;
+        break;
+
+    case MPI_HOST_INTERRUPT_MASK_OFFSET:
+        ret = s->intr_mask;
+        break;
+
+    case MPI_REPLY_POST_FIFO_OFFSET:
+        ret = mptsas_reply_post_read(s);
+        break;
+
+    default:
+        trace_mptsas_mmio_unhandled_read(s, addr);
+        break;
+    }
+    trace_mptsas_mmio_read(s, addr, ret);
+    return ret;
+}
+
+static void mptsas_mmio_write(void *opaque, hwaddr addr,
+                               uint64_t val, unsigned size)
+{
+    MPTSASState *s = opaque;
+
+    trace_mptsas_mmio_write(s, addr, val);
+    switch (addr) {
+    case MPI_DOORBELL_OFFSET:
+        mptsas_doorbell_write(s, val);
+        break;
+
+    case MPI_WRITE_SEQUENCE_OFFSET:
+        mptsas_write_sequence_write(s, val);
+        break;
+
+    case MPI_DIAGNOSTIC_OFFSET:
+        if (val & MPI_DIAG_RESET_ADAPTER) {
+            mptsas_hard_reset(s);
+        }
+        break;
+
+    case MPI_HOST_INTERRUPT_STATUS_OFFSET:
+        mptsas_interrupt_status_write(s);
+        break;
+
+    case MPI_HOST_INTERRUPT_MASK_OFFSET:
+        s->intr_mask = val & (MPI_HIM_RIM | MPI_HIM_DIM);
+        mptsas_update_interrupt(s);
+        break;
+
+    case MPI_REQUEST_POST_FIFO_OFFSET:
+        if (MPTSAS_FIFO_FULL(s, request_post)) {
+            mptsas_set_fault(s, MPI_IOCSTATUS_INSUFFICIENT_RESOURCES);
+        } else {
+            MPTSAS_FIFO_PUT(s, request_post, val & ~0x03);
+            qemu_bh_schedule(s->request_bh);
+        }
+        break;
+
+    case MPI_REPLY_FREE_FIFO_OFFSET:
+        if (MPTSAS_FIFO_FULL(s, reply_free)) {
+            mptsas_set_fault(s, MPI_IOCSTATUS_INSUFFICIENT_RESOURCES);
+        } else {
+            MPTSAS_FIFO_PUT(s, reply_free, val);
+        }
+        break;
+
+    default:
+        trace_mptsas_mmio_unhandled_write(s, addr, val);
+        break;
+    }
+}
+
+static const MemoryRegionOps mptsas_mmio_ops = {
+    .read = mptsas_mmio_read,
+    .write = mptsas_mmio_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    }
+};
+
+static const MemoryRegionOps mptsas_port_ops = {
+    .read = mptsas_mmio_read,
+    .write = mptsas_mmio_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    }
+};
+
+static uint64_t mptsas_diag_read(void *opaque, hwaddr addr,
+                                   unsigned size)
+{
+    MPTSASState *s = opaque;
+    trace_mptsas_diag_read(s, addr, 0);
+    return 0;
+}
+
+static void mptsas_diag_write(void *opaque, hwaddr addr,
+                               uint64_t val, unsigned size)
+{
+    MPTSASState *s = opaque;
+    trace_mptsas_diag_write(s, addr, val);
+}
+
+static const MemoryRegionOps mptsas_diag_ops = {
+    .read = mptsas_diag_read,
+    .write = mptsas_diag_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    }
+};
+
+static QEMUSGList *mptsas_get_sg_list(SCSIRequest *sreq)
+{
+    MPTSASRequest *req = sreq->hba_private;
+
+    return &req->qsg;
+}
+
+static void mptsas_command_complete(SCSIRequest *sreq,
+        uint32_t status, size_t resid)
+{
+    MPTSASRequest *req = sreq->hba_private;
+    MPTSASState *s = req->dev;
+    uint8_t sense_buf[SCSI_SENSE_BUF_SIZE];
+    uint8_t sense_len;
+
+    hwaddr sense_buffer_addr = req->dev->sense_buffer_high_addr |
+            req->scsi_io.SenseBufferLowAddr;
+
+    trace_mptsas_command_complete(s, req->scsi_io.MsgContext, status, resid);
+
+    sense_len = scsi_req_get_sense(sreq, sense_buf, SCSI_SENSE_BUF_SIZE);
+    if (sense_len > 0) {
+        pci_dma_write(PCI_DEVICE(s), sense_buffer_addr, sense_buf,
+                      MIN(req->scsi_io.SenseBufferLength, sense_len));
+    }
+
+    if (sreq->status != GOOD || resid ||
+        req->dev->doorbell_state == DOORBELL_WRITE) {
+        MPIMsgSCSIIOReply reply;
+
+        memset(&reply, 0, sizeof(reply));
+        reply.TargetID          = req->scsi_io.TargetID;
+        reply.Bus               = req->scsi_io.Bus;
+        reply.MsgLength         = sizeof(reply) / 4;
+        reply.Function          = req->scsi_io.Function;
+        reply.CDBLength         = req->scsi_io.CDBLength;
+        reply.SenseBufferLength = req->scsi_io.SenseBufferLength;
+        reply.MsgFlags          = req->scsi_io.MsgFlags;
+        reply.MsgContext        = req->scsi_io.MsgContext;
+        reply.SCSIStatus        = sreq->status;
+        if (sreq->status == GOOD) {
+            reply.TransferCount = req->scsi_io.DataLength - resid;
+            if (resid) {
+                reply.IOCStatus     = MPI_IOCSTATUS_SCSI_DATA_UNDERRUN;
+            }
+        } else {
+            reply.SCSIState     = MPI_SCSI_STATE_AUTOSENSE_VALID;
+            reply.SenseCount    = sense_len;
+            reply.IOCStatus     = MPI_IOCSTATUS_SCSI_DATA_UNDERRUN;
+        }
+
+        mptsas_fix_scsi_io_reply_endianness(&reply);
+        mptsas_post_reply(req->dev, (MPIDefaultReply *)&reply);
+    } else {
+        mptsas_turbo_reply(req->dev, req->scsi_io.MsgContext);
+    }
+
+    mptsas_free_request(req);
+}
+
+static void mptsas_request_cancelled(SCSIRequest *sreq)
+{
+    MPTSASRequest *req = sreq->hba_private;
+    MPIMsgSCSIIOReply reply;
+
+    memset(&reply, 0, sizeof(reply));
+    reply.TargetID          = req->scsi_io.TargetID;
+    reply.Bus               = req->scsi_io.Bus;
+    reply.MsgLength         = sizeof(reply) / 4;
+    reply.Function          = req->scsi_io.Function;
+    reply.CDBLength         = req->scsi_io.CDBLength;
+    reply.SenseBufferLength = req->scsi_io.SenseBufferLength;
+    reply.MsgFlags          = req->scsi_io.MsgFlags;
+    reply.MsgContext        = req->scsi_io.MsgContext;
+    reply.SCSIState         = MPI_SCSI_STATE_NO_SCSI_STATUS;
+    reply.IOCStatus         = MPI_IOCSTATUS_SCSI_TASK_TERMINATED;
+
+    mptsas_fix_scsi_io_reply_endianness(&reply);
+    mptsas_post_reply(req->dev, (MPIDefaultReply *)&reply);
+    mptsas_free_request(req);
+}
+
+static void mptsas_save_request(QEMUFile *f, SCSIRequest *sreq)
+{
+    MPTSASRequest *req = sreq->hba_private;
+    int i;
+
+    qemu_put_buffer(f, (unsigned char *)&req->scsi_io, sizeof(req->scsi_io));
+    qemu_put_be32(f, req->qsg.nsg);
+    for (i = 0; i < req->qsg.nsg; i++) {
+        qemu_put_be64(f, req->qsg.sg[i].base);
+        qemu_put_be64(f, req->qsg.sg[i].len);
+    }
+}
+
+static void *mptsas_load_request(QEMUFile *f, SCSIRequest *sreq)
+{
+    SCSIBus *bus = sreq->bus;
+    MPTSASState *s = container_of(bus, MPTSASState, bus);
+    PCIDevice *pci = PCI_DEVICE(s);
+    MPTSASRequest *req;
+    int i, n;
+
+    req = g_new(MPTSASRequest, 1);
+    qemu_get_buffer(f, (unsigned char *)&req->scsi_io, sizeof(req->scsi_io));
+
+    n = qemu_get_be32(f);
+    /* TODO: add a way for SCSIBusInfo's load_request to fail,
+     * and fail migration instead of asserting here.
+     * When we do, we might be able to re-enable NDEBUG below.
+     */
+#ifdef NDEBUG
+#error building with NDEBUG is not supported
+#endif
+    assert(n >= 0);
+
+    pci_dma_sglist_init(&req->qsg, pci, n);
+    for (i = 0; i < n; i++) {
+        uint64_t base = qemu_get_be64(f);
+        uint64_t len = qemu_get_be64(f);
+        qemu_sglist_add(&req->qsg, base, len);
+    }
+
+    scsi_req_ref(sreq);
+    req->sreq = sreq;
+    req->dev = s;
+
+    return req;
+}
+
+static const struct SCSIBusInfo mptsas_scsi_info = {
+    .tcq = true,
+    .max_target = MPTSAS_NUM_PORTS,
+    .max_lun = 1,
+
+    .get_sg_list = mptsas_get_sg_list,
+    .complete = mptsas_command_complete,
+    .cancel = mptsas_request_cancelled,
+    .save_request = mptsas_save_request,
+    .load_request = mptsas_load_request,
+};
+
+static void mptsas_scsi_init(PCIDevice *dev, Error **errp)
+{
+    DeviceState *d = DEVICE(dev);
+    MPTSASState *s = MPT_SAS(dev);
+
+    dev->config[PCI_LATENCY_TIMER] = 0;
+    dev->config[PCI_INTERRUPT_PIN] = 0x01;
+
+    memory_region_init_io(&s->mmio_io, OBJECT(s), &mptsas_mmio_ops, s,
+                          "mptsas-mmio", 0x4000);
+    memory_region_init_io(&s->port_io, OBJECT(s), &mptsas_port_ops, s,
+                          "mptsas-io", 256);
+    memory_region_init_io(&s->diag_io, OBJECT(s), &mptsas_diag_ops, s,
+                          "mptsas-diag", 0x10000);
+
+    if (s->msi_available &&
+        msi_init(dev, 0, 1, true, false) >= 0) {
+        s->msi_in_use = true;
+    }
+
+    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->port_io);
+    pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY |
+                                 PCI_BASE_ADDRESS_MEM_TYPE_32, &s->mmio_io);
+    pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY |
+                                 PCI_BASE_ADDRESS_MEM_TYPE_32, &s->diag_io);
+
+    if (!s->sas_addr) {
+        s->sas_addr = ((NAA_LOCALLY_ASSIGNED_ID << 24) |
+                       IEEE_COMPANY_LOCALLY_ASSIGNED) << 36;
+        s->sas_addr |= (pci_bus_num(dev->bus) << 16);
+        s->sas_addr |= (PCI_SLOT(dev->devfn) << 8);
+        s->sas_addr |= PCI_FUNC(dev->devfn);
+    }
+    s->max_devices = MPTSAS_NUM_PORTS;
+
+    s->request_bh = qemu_bh_new(mptsas_fetch_requests, s);
+
+    QTAILQ_INIT(&s->pending);
+
+    scsi_bus_new(&s->bus, sizeof(s->bus), &dev->qdev, &mptsas_scsi_info, NULL);
+    if (!d->hotplugged) {
+        scsi_bus_legacy_handle_cmdline(&s->bus, errp);
+    }
+}
+
+static void mptsas_scsi_uninit(PCIDevice *dev)
+{
+    MPTSASState *s = MPT_SAS(dev);
+
+    qemu_bh_delete(s->request_bh);
+    if (s->msi_in_use) {
+        msi_uninit(dev);
+    }
+}
+
+static void mptsas_reset(DeviceState *dev)
+{
+    MPTSASState *s = MPT_SAS(dev);
+
+    mptsas_hard_reset(s);
+}
+
+static int mptsas_post_load(void *opaque, int version_id)
+{
+    MPTSASState *s = opaque;
+
+    if (s->doorbell_idx > s->doorbell_cnt ||
+        s->doorbell_cnt > ARRAY_SIZE(s->doorbell_msg) ||
+        s->doorbell_reply_idx > s->doorbell_reply_size ||
+        s->doorbell_reply_size > ARRAY_SIZE(s->doorbell_reply) ||
+        MPTSAS_FIFO_INVALID(s, request_post) ||
+        MPTSAS_FIFO_INVALID(s, reply_post) ||
+        MPTSAS_FIFO_INVALID(s, reply_free) ||
+        s->diagnostic_idx > 4) {
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_mptsas = {
+    .name = "mptsas",
+    .version_id = 0,
+    .minimum_version_id = 0,
+    .minimum_version_id_old = 0,
+    .post_load = mptsas_post_load,
+    .fields      = (VMStateField[]) {
+        VMSTATE_PCI_DEVICE(dev, MPTSASState),
+        VMSTATE_BOOL(msi_in_use, MPTSASState),
+
+        VMSTATE_UINT32(state, MPTSASState),
+        VMSTATE_UINT8(who_init, MPTSASState),
+        VMSTATE_UINT8(doorbell_state, MPTSASState),
+        VMSTATE_UINT32_ARRAY(doorbell_msg, MPTSASState, 256),
+        VMSTATE_INT32(doorbell_idx, MPTSASState),
+        VMSTATE_INT32(doorbell_cnt, MPTSASState),
+
+        VMSTATE_UINT16_ARRAY(doorbell_reply, MPTSASState, 256),
+        VMSTATE_INT32(doorbell_reply_idx, MPTSASState),
+        VMSTATE_INT32(doorbell_reply_size, MPTSASState),
+
+        VMSTATE_UINT32(diagnostic, MPTSASState),
+        VMSTATE_UINT8(diagnostic_idx, MPTSASState),
+
+        VMSTATE_UINT32(intr_status, MPTSASState),
+        VMSTATE_UINT32(intr_mask, MPTSASState),
+
+        VMSTATE_UINT32_ARRAY(request_post, MPTSASState,
+                             MPTSAS_REQUEST_QUEUE_DEPTH + 1),
+        VMSTATE_UINT16(request_post_head, MPTSASState),
+        VMSTATE_UINT16(request_post_tail, MPTSASState),
+
+        VMSTATE_UINT32_ARRAY(reply_post, MPTSASState,
+                             MPTSAS_REPLY_QUEUE_DEPTH + 1),
+        VMSTATE_UINT16(reply_post_head, MPTSASState),
+        VMSTATE_UINT16(reply_post_tail, MPTSASState),
+
+        VMSTATE_UINT32_ARRAY(reply_free, MPTSASState,
+                             MPTSAS_REPLY_QUEUE_DEPTH + 1),
+        VMSTATE_UINT16(reply_free_head, MPTSASState),
+        VMSTATE_UINT16(reply_free_tail, MPTSASState),
+
+        VMSTATE_UINT16(max_buses, MPTSASState),
+        VMSTATE_UINT16(max_devices, MPTSASState),
+        VMSTATE_UINT16(reply_frame_size, MPTSASState),
+        VMSTATE_UINT64(host_mfa_high_addr, MPTSASState),
+        VMSTATE_UINT64(sense_buffer_high_addr, MPTSASState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static Property mptsas_properties[] = {
+    DEFINE_PROP_UINT64("sas_address", MPTSASState, sas_addr, 0),
+    /* TODO: test MSI support under Windows */
+    DEFINE_PROP_BIT("msi", MPTSASState, msi_available, 0, true),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void mptsas1068_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+    PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
+
+    pc->realize = mptsas_scsi_init;
+    pc->exit = mptsas_scsi_uninit;
+    pc->romfile = 0;
+    pc->vendor_id = PCI_VENDOR_ID_LSI_LOGIC;
+    pc->device_id = PCI_DEVICE_ID_LSI_SAS1068;
+    pc->subsystem_vendor_id = PCI_VENDOR_ID_LSI_LOGIC;
+    pc->subsystem_id = 0x8000;
+    pc->class_id = PCI_CLASS_STORAGE_SCSI;
+    dc->props = mptsas_properties;
+    dc->reset = mptsas_reset;
+    dc->vmsd = &vmstate_mptsas;
+    dc->desc = "LSI SAS 1068";
+}
+
+static const TypeInfo mptsas_info = {
+    .name = TYPE_MPTSAS1068,
+    .parent = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(MPTSASState),
+    .class_init = mptsas1068_class_init,
+};
+
+static void mptsas_register_types(void)
+{
+    type_register(&mptsas_info);
+}
+
+type_init(mptsas_register_types)
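
As the comment in mptsas_reply() above notes, handshake replies go out
through the doorbell register in 16-bit chunks while MsgLength counts
32-bit words.  A guest-side read loop would look roughly like the sketch
below; read_reg32()/write_reg32() are hypothetical MMIO accessors used
only for illustration, while the MPI_* constants are the ones from mpi.h
used elsewhere in this patch:

    /* Drain a doorbell handshake reply of reply_words 32-bit words,
     * i.e. 2 * reply_words reads of the low 16 doorbell data bits,
     * then acknowledge the doorbell interrupt.  (A real driver also
     * waits for MPI_HIS_DOORBELL_INTERRUPT between reads; that is
     * omitted here for brevity.)
     */
    static void read_handshake_reply(uint16_t *buf, unsigned reply_words)
    {
        unsigned i;

        for (i = 0; i < reply_words * 2; i++) {
            buf[i] = read_reg32(MPI_DOORBELL_OFFSET) & MPI_DOORBELL_DATA_MASK;
        }
        write_reg32(MPI_HOST_INTERRUPT_STATUS_OFFSET, 0);
    }
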
diff --git a/hw/scsi/mptsas.h b/hw/scsi/mptsas.h
new file mode 100644
index 0000000..595f81f
--- /dev/null
+++ b/hw/scsi/mptsas.h
@@ -0,0 +1,100 @@
+#ifndef MPTSAS_H
+#define MPTSAS_H
+
+#include "mpi.h"
+
+#define MPTSAS_NUM_PORTS 8
+#define MPTSAS_MAX_FRAMES 2048     /* Firmware limit at 65535 */
+
+#define MPTSAS_REQUEST_QUEUE_DEPTH 128
+#define MPTSAS_REPLY_QUEUE_DEPTH   128
+
+#define MPTSAS_MAXIMUM_CHAIN_DEPTH 0x22
+
+typedef struct MPTSASState MPTSASState;
+typedef struct MPTSASRequest MPTSASRequest;
+
+enum {
+    DOORBELL_NONE,
+    DOORBELL_WRITE,
+    DOORBELL_READ
+};
+
+struct MPTSASState {
+    PCIDevice dev;
+    MemoryRegion mmio_io;
+    MemoryRegion port_io;
+    MemoryRegion diag_io;
+    QEMUBH *request_bh;
+
+    uint32_t msi_available;
+    uint64_t sas_addr;
+
+    bool msi_in_use;
+
+    /* Doorbell register */
+    uint32_t state;
+    uint8_t who_init;
+    uint8_t doorbell_state;
+
+    /* Buffer for requests that are sent through the doorbell register.  */
+    uint32_t doorbell_msg[256];
+    int doorbell_idx;
+    int doorbell_cnt;
+
+    uint16_t doorbell_reply[256];
+    int doorbell_reply_idx;
+    int doorbell_reply_size;
+
+    /* Other registers */
+    uint8_t diagnostic_idx;
+    uint32_t diagnostic;
+    uint32_t intr_mask;
+    uint32_t intr_status;
+
+    /* Request queues */
+    uint32_t request_post[MPTSAS_REQUEST_QUEUE_DEPTH + 1];
+    uint16_t request_post_head;
+    uint16_t request_post_tail;
+
+    uint32_t reply_post[MPTSAS_REPLY_QUEUE_DEPTH + 1];
+    uint16_t reply_post_head;
+    uint16_t reply_post_tail;
+
+    uint32_t reply_free[MPTSAS_REPLY_QUEUE_DEPTH + 1];
+    uint16_t reply_free_head;
+    uint16_t reply_free_tail;
+
+    /* IOC Facts */
+    hwaddr host_mfa_high_addr;
+    hwaddr sense_buffer_high_addr;
+    uint16_t max_devices;
+    uint16_t max_buses;
+    uint16_t reply_frame_size;
+
+    SCSIBus bus;
+    QTAILQ_HEAD(, MPTSASRequest) pending;
+};
+
+void mptsas_fix_scsi_io_endianness(MPIMsgSCSIIORequest *req);
+void mptsas_fix_scsi_io_reply_endianness(MPIMsgSCSIIOReply *reply);
+void mptsas_fix_scsi_task_mgmt_endianness(MPIMsgSCSITaskMgmt *req);
+void mptsas_fix_scsi_task_mgmt_reply_endianness(MPIMsgSCSITaskMgmtReply *reply);
+void mptsas_fix_ioc_init_endianness(MPIMsgIOCInit *req);
+void mptsas_fix_ioc_init_reply_endianness(MPIMsgIOCInitReply *reply);
+void mptsas_fix_ioc_facts_endianness(MPIMsgIOCFacts *req);
+void mptsas_fix_ioc_facts_reply_endianness(MPIMsgIOCFactsReply *reply);
+void mptsas_fix_config_endianness(MPIMsgConfig *req);
+void mptsas_fix_config_reply_endianness(MPIMsgConfigReply *reply);
+void mptsas_fix_port_facts_endianness(MPIMsgPortFacts *req);
+void mptsas_fix_port_facts_reply_endianness(MPIMsgPortFactsReply *reply);
+void mptsas_fix_port_enable_endianness(MPIMsgPortEnable *req);
+void mptsas_fix_port_enable_reply_endianness(MPIMsgPortEnableReply *reply);
+void mptsas_fix_event_notification_endianness(MPIMsgEventNotify *req);
+void mptsas_fix_event_notification_reply_endianness(MPIMsgEventNotifyReply *reply);
+
+void mptsas_reply(MPTSASState *s, MPIDefaultReply *reply);
+
+void mptsas_process_config(MPTSASState *s, MPIMsgConfig *req);
+
+#endif /* MPTSAS_H */
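
A note on the queue declarations above: the request_post/reply_post/
reply_free arrays have QUEUE_DEPTH + 1 slots because the MPTSAS_FIFO_*
macros in mptsas.c implement the usual one-slot-wasted circular buffer
(head == tail means empty; advancing tail onto head would mean full).
A minimal standalone sketch of the same convention, with invented names:

    #include <stdint.h>
    #include <stdbool.h>

    #define DEPTH 4                      /* usable entries */

    static uint32_t fifo[DEPTH + 1];     /* one slot is always unused */
    static uint16_t head, tail;

    static bool fifo_empty(void) { return head == tail; }
    static bool fifo_full(void)  { return head == (tail + 1) % (DEPTH + 1); }

    static void fifo_put(uint32_t v)
    {
        fifo[tail] = v;
        tail = (tail + 1) % (DEPTH + 1);
    }

    static uint32_t fifo_get(void)
    {
        uint32_t v = fifo[head];
        head = (head + 1) % (DEPTH + 1);
        return v;
    }
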
diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index d98e6c9..db85afa 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -64,6 +64,7 @@
 #define PCI_VENDOR_ID_LSI_LOGIC          0x1000
 #define PCI_DEVICE_ID_LSI_53C810         0x0001
 #define PCI_DEVICE_ID_LSI_53C895A        0x0012
+#define PCI_DEVICE_ID_LSI_SAS1068        0x0054
 #define PCI_DEVICE_ID_LSI_SAS1078        0x0060
 #define PCI_DEVICE_ID_LSI_SAS0079        0x0079
 
diff --git a/trace-events b/trace-events
index c9ac144..f986c81 100644
--- a/trace-events
+++ b/trace-events
@@ -726,6 +726,28 @@ lm32_uart_memory_write(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
 lm32_uart_memory_read(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
 lm32_uart_irq_state(int level) "irq state %d"
 
+# hw/scsi/mptsas.c
+mptsas_command_complete(void *dev, uint32_t ctx, uint32_t status, uint32_t resid) "dev %p context 0x%08x status %x resid %d"
+mptsas_diag_read(void *dev, uint32_t addr, uint32_t val) "dev %p addr 0x%08x value 0x%08x"
+mptsas_diag_write(void *dev, uint32_t addr, uint32_t val) "dev %p addr 0x%08x value 0x%08x"
+mptsas_irq_intx(void *dev, int level) "dev %p level %d"
+mptsas_irq_msi(void *dev) "dev %p "
+mptsas_mmio_read(void *dev, uint32_t addr, uint32_t val) "dev %p addr 0x%08x value 0x%x"
+mptsas_mmio_unhandled_read(void *dev, uint32_t addr) "dev %p addr 0x%08x"
+mptsas_mmio_unhandled_write(void *dev, uint32_t addr, uint32_t val) "dev %p addr 0x%08x value 0x%x"
+mptsas_mmio_write(void *dev, uint32_t addr, uint32_t val) "dev %p addr 0x%08x value 0x%x"
+mptsas_process_message(void *dev, int msg, uint32_t ctx) "dev %p cmd %d context 0x%08x"
+mptsas_process_scsi_io_request(void *dev, int bus, int target, int lun, uint64_t len) "dev %p dev %d:%d:%d length %"PRIu64""
+mptsas_reset(void *dev) "dev %p "
+mptsas_scsi_overflow(void *dev, uint32_t ctx, uint64_t req, uint64_t found) "dev %p context 0x%08x: %"PRIu64"/%"PRIu64""
+mptsas_sgl_overflow(void *dev, uint32_t ctx, uint64_t req, uint64_t found) "dev %p context 0x%08x: %"PRIu64"/%"PRIu64""
+mptsas_unhandled_cmd(void *dev, uint32_t ctx, uint8_t msg_cmd) "dev %p context 0x%08x: Unhandled cmd %x"
+mptsas_unhandled_doorbell_cmd(void *dev, int cmd) "dev %p value 0x%08x"
+
+# hw/scsi/mptconfig.c
+mptsas_config_sas_device(void *dev, int address, int port, int phy_handle, int dev_handle, int page) "dev %p address %d (port %d, handles: phy %d dev %d) page %d"
+mptsas_config_sas_phy(void *dev, int address, int port, int phy_handle, int dev_handle, int page) "dev %p address %d (port %d, handles: phy %d dev %d) page %d"
+
 # hw/scsi/megasas.c
 megasas_init_firmware(uint64_t pa) "pa %" PRIx64 " "
 megasas_init_queue(uint64_t queue_pa, int queue_len, uint64_t head, uint64_t tail, uint32_t flags) "queue at %" PRIx64 " len %d head %" PRIx64 " tail %" PRIx64 " flags %x"
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PULL 29/32] docs/memory.txt: Improve list of different memory regions
  2016-02-09 12:13 [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08 Paolo Bonzini
  2016-02-09 12:13 ` [Qemu-devel] [PULL 02/32] memory: RCU ram_list.dirty_memory[] for safe RAM hotplug Paolo Bonzini
  2016-02-09 12:13 ` [Qemu-devel] [PULL 08/32] hw: Add support for LSI SAS1068 (mptsas) device Paolo Bonzini
@ 2016-02-09 12:13 ` Paolo Bonzini
  2016-02-09 12:13 ` [Qemu-devel] [PULL 30/32] target-i386: fix PSE36 mode Paolo Bonzini
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2016-02-09 12:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

From: Peter Maydell <peter.maydell@linaro.org>

Improve the part of the memory region documentation which describes
the various different kinds of memory region:
 * add the missing types ROM, IOMMU and reservation
 * mention the functions used to initialize each type, as a hint
   for finding the API docs and examples of use

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1454007297-3971-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 docs/memory.txt | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/docs/memory.txt b/docs/memory.txt
index 2ceb348..8745f76 100644
--- a/docs/memory.txt
+++ b/docs/memory.txt
@@ -26,14 +26,28 @@ These represent memory as seen from the CPU or a device's viewpoint.
 Types of regions
 ----------------
 
-There are four types of memory regions (all represented by a single C type
+There are multiple types of memory regions (all represented by a single C type
 MemoryRegion):
 
 - RAM: a RAM region is simply a range of host memory that can be made available
   to the guest.
+  You typically initialize these with memory_region_init_ram().  Some special
+  purposes require the variants memory_region_init_resizeable_ram(),
+  memory_region_init_ram_from_file(), or memory_region_init_ram_ptr().
 
 - MMIO: a range of guest memory that is implemented by host callbacks;
   each read or write causes a callback to be called on the host.
+  You initialize these with memory_region_init_io(), passing it a MemoryRegionOps
+  structure describing the callbacks.
+
+- ROM: a ROM memory region works like RAM for reads (directly accessing
+  a region of host memory), but like MMIO for writes (invoking a callback).
+  You initialize these with memory_region_init_rom_device().
+
+- IOMMU region: an IOMMU region translates addresses of accesses made to it
+  and forwards them to some other target memory region.  As the name suggests,
+  these are only needed for modelling an IOMMU, not for simple devices.
+  You initialize these with memory_region_init_iommu().
 
 - container: a container simply includes other memory regions, each at
   a different offset.  Containers are useful for grouping several regions
@@ -45,12 +59,22 @@ MemoryRegion):
   can overlay a subregion of RAM with MMIO or ROM, or a PCI controller
  that does not prevent cards from claiming overlapping BARs.
 
+  You initialize a pure container with memory_region_init().
+
 - alias: a subsection of another region.  Aliases allow a region to be
   split apart into discontiguous regions.  Examples of uses are memory banks
   used when the guest address space is smaller than the amount of RAM
   addressed, or a memory controller that splits main memory to expose a "PCI
   hole".  Aliases may point to any type of region, including other aliases,
   but an alias may not point back to itself, directly or indirectly.
+  You initialize these with memory_region_init_alias().
+
+- reservation region: a reservation region is primarily for debugging.
+  It claims I/O space that is not supposed to be handled by QEMU itself.
+  The typical use is to track parts of the address space which will be
+  handled by the host kernel when KVM is enabled.
+  You initialize these with memory_region_init_reservation(), or by
+  passing a NULL callback parameter to memory_region_init_io().
 
 It is valid to add subregions to a region which is not a pure container
 (that is, to an MMIO, RAM or ROM region). This means that the region
-- 
2.5.0
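
To make the list above concrete, here is a minimal sketch of a device model
creating one region of each of the first two kinds and mapping both into a
container (the system memory region).  The signatures follow the QEMU API of
this period; the demo_* names, sizes and offsets are invented for
illustration only:

    #include "qemu/osdep.h"
    #include "exec/memory.h"
    #include "qapi/error.h"

    static uint64_t demo_mmio_read(void *opaque, hwaddr addr, unsigned size)
    {
        return 0;                       /* host callback for guest reads */
    }

    static void demo_mmio_write(void *opaque, hwaddr addr,
                                uint64_t val, unsigned size)
    {
        /* host callback for guest writes */
    }

    static const MemoryRegionOps demo_mmio_ops = {
        .read = demo_mmio_read,
        .write = demo_mmio_write,
        .endianness = DEVICE_NATIVE_ENDIAN,
    };

    static void demo_map_regions(Object *owner, MemoryRegion *sysmem)
    {
        static MemoryRegion ram, mmio;

        /* RAM: plain host memory made available to the guest */
        memory_region_init_ram(&ram, owner, "demo.ram",
                               64 * 1024 * 1024, &error_fatal);

        /* MMIO: every guest access bounces into the callbacks above */
        memory_region_init_io(&mmio, owner, &demo_mmio_ops, NULL,
                              "demo.mmio", 0x1000);

        /* sysmem acts as the container; each subregion gets an offset */
        memory_region_add_subregion(sysmem, 0x00000000, &ram);
        memory_region_add_subregion(sysmem, 0x10000000, &mmio);
    }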


* [Qemu-devel] [PULL 30/32] target-i386: fix PSE36 mode
  2016-02-09 12:13 [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08 Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-02-09 12:13 ` [Qemu-devel] [PULL 29/32] docs/memory.txt: Improve list of different memory regions Paolo Bonzini
@ 2016-02-09 12:13 ` Paolo Bonzini
  2016-02-09 12:13 ` [Qemu-devel] [PULL 31/32] MAINTAINERS: add all-match entry for qemu-devel@ Paolo Bonzini
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2016-02-09 12:13 UTC (permalink / raw)
  To: qemu-devel

(pde & 0x1fe000) is a 32-bit integer; when shifting it
into bits 39-32 the result is zero.  Fix it by making the
mask (and thus the result of the AND) a 64-bit integer.

Reported by Coverity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-i386/helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 81568c8..3802ed9 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -861,7 +861,7 @@ int x86_cpu_handle_mmu_fault(CPUState *cs, vaddr addr,
             /* Bits 20-13 provide bits 39-32 of the address, bit 21 is reserved.
              * Leave bits 20-13 in place for setting accessed/dirty bits below.
              */
-            pte = pde | ((pde & 0x1fe000) << (32 - 13));
+            pte = pde | ((pde & 0x1fe000LL) << (32 - 13));
             rsvd_mask = 0x200000;
             goto do_check_protect_pse36;
         }
@@ -1056,7 +1056,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
         if (!(pde & PG_PRESENT_MASK))
             return -1;
         if ((pde & PG_PSE_MASK) && (env->cr[4] & CR4_PSE_MASK)) {
-            pte = pde | ((pde & 0x1fe000) << (32 - 13));
+            pte = pde | ((pde & 0x1fe000LL) << (32 - 13));
             page_size = 4096 * 1024;
         } else {
             /* page directory entry */
-- 
2.5.0
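
A standalone illustration of the integer-width bug Coverity flagged (not
QEMU code): with a 32-bit pde, the shift of the 32-bit masked value wraps to
zero and address bits 39-32 are lost, while the 64-bit mask keeps them.

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t pde = 0x1fe000;  /* PSE-36: bits 20-13 carry address bits 39-32 */

        /* 32-bit arithmetic: 0x1fe000 << 19 wraps to 0 */
        uint64_t broken = (uint64_t)pde | ((pde & 0x1fe000) << (32 - 13));

        /* 64-bit mask: the shift happens in 64 bits and bits 39-32 survive */
        uint64_t fixed  = (uint64_t)pde | ((pde & 0x1fe000LL) << (32 - 13));

        printf("broken=0x%" PRIx64 " fixed=0x%" PRIx64 "\n", broken, fixed);
        /* prints: broken=0x1fe000 fixed=0xff001fe000 */
        return 0;
    }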


* [Qemu-devel] [PULL 31/32] MAINTAINERS: add all-match entry for qemu-devel@
  2016-02-09 12:13 [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08 Paolo Bonzini
                   ` (3 preceding siblings ...)
  2016-02-09 12:13 ` [Qemu-devel] [PULL 30/32] target-i386: fix PSE36 mode Paolo Bonzini
@ 2016-02-09 12:13 ` Paolo Bonzini
  2016-02-09 13:47   ` Markus Armbruster
  2016-02-09 12:13 ` [Qemu-devel] [PULL 32/32] qemu-char, io: fix ordering of arguments for UDP socket creation Paolo Bonzini
  2016-02-09 14:20 ` [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08 Peter Maydell
  6 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2016-02-09 12:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: John Snow, Markus Armbruster, Stephen Warren

From: Stephen Warren <swarren@wwwdotorg.org>

Add an entry to MAINTAINERS that matches every patch, and requests the
user send patches to qemu-devel@nongnu.org.

It's not 100% obvious to project newcomers that all patches should be sent
there; checkpatch doesn't say so, and since it mentions other lists to CC,
the wording "the list" from the SubmitAPatch wiki page can be taken
to mean only those lists, not the main list too.

The F: entries were taken from a similar entry in the Linux kernel.

Modify get_maintainer.pl so that the all-matching list entry doesn't
prevent the git fallback from ever triggering. The fallback now relies
on finding a real person in MAINTAINERS, not just a mailing list for
the relevant sub-community. This portion of the patch was suggested by
pbonzini@redhat.com.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Signed-off-by: Stephen Warren <swarren@wwwdotorg.org>
Message-Id: <1454987065-12961-1-git-send-email-swarren@wwwdotorg.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 MAINTAINERS               | 5 +++++
 scripts/get_maintainer.pl | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index b6ed87a..2d78eea 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -52,6 +52,11 @@ General Project Administration
 ------------------------------
 M: Peter Maydell <peter.maydell@linaro.org>
 
+All patches CC here
+L: qemu-devel@nongnu.org
+F: *
+F: */
+
 Responsible Disclosure, Reporting Security Issues
 ------------------------------
 W: http://wiki.qemu.org/SecurityProcess
diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index 7dacf32..8261bcb 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -636,7 +636,7 @@ sub get_maintainers {
 
     if ($email) {
 	if (! $interactive) {
-	    $email_git_fallback = 0 if @email_to > 0 || @list_to > 0 || $email_git || $email_git_blame;
+	    $email_git_fallback = 0 if @email_to > 0 || $email_git || $email_git_blame;
 	    if ($email_git_fallback) {
 	        print STDERR "get_maintainer.pl: No maintainers found, printing recent contributors.\n";
 	        print STDERR "get_maintainer.pl: Do not blindly cc: them on patches!  Use common sense.\n";
-- 
2.5.0


* [Qemu-devel] [PULL 32/32] qemu-char, io: fix ordering of arguments for UDP socket creation
  2016-02-09 12:13 [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08 Paolo Bonzini
                   ` (4 preceding siblings ...)
  2016-02-09 12:13 ` [Qemu-devel] [PULL 31/32] MAINTAINERS: add all-match entry for qemu-devel@ Paolo Bonzini
@ 2016-02-09 12:13 ` Paolo Bonzini
  2016-02-09 14:20 ` [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08 Peter Maydell
  6 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2016-02-09 12:13 UTC (permalink / raw)
  To: qemu-devel

Two wrongs make a right, but they should be fixed anyway.

Cc: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1455015557-15106-1-git-send-email-pbonzini@redhat.com>
---
 io/channel-socket.c | 2 +-
 qemu-char.c         | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 22d2fd6..bf66a78 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -258,7 +258,7 @@ int qio_channel_socket_dgram_sync(QIOChannelSocket *ioc,
     int fd;
 
     trace_qio_channel_socket_dgram_sync(ioc, localAddr, remoteAddr);
-    fd = socket_dgram(localAddr, remoteAddr, errp);
+    fd = socket_dgram(remoteAddr, localAddr, errp);
     if (fd < 0) {
         trace_qio_channel_socket_dgram_fail(ioc);
         return -1;
diff --git a/qemu-char.c b/qemu-char.c
index 84eb8a1..2b2c56b 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -4386,7 +4386,7 @@ static CharDriverState *qmp_chardev_open_udp(const char *id,
     QIOChannelSocket *sioc = qio_channel_socket_new();
 
     if (qio_channel_socket_dgram_sync(sioc,
-                                      udp->remote, udp->local,
+                                      udp->local, udp->remote,
                                       errp) < 0) {
         object_unref(OBJECT(sioc));
         return NULL;
-- 
2.5.0
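
In other words, as reconstructed from the two hunks above (a sketch, not
code from the tree): each layer had the local/remote pair swapped, so the
combined result happened to be correct; the fix makes every call site match
its callee's declared argument order.

    /* Before (both layers swapped, cancelling out):
     *   qmp_chardev_open_udp():
     *     qio_channel_socket_dgram_sync(sioc, udp->remote, udp->local, errp);
     *   qio_channel_socket_dgram_sync(ioc, localAddr, remoteAddr, errp):
     *     socket_dgram(localAddr, remoteAddr, errp);
     *
     * After (arguments follow the declared order everywhere):
     *   qmp_chardev_open_udp():
     *     qio_channel_socket_dgram_sync(sioc, udp->local, udp->remote, errp);
     *   qio_channel_socket_dgram_sync(ioc, localAddr, remoteAddr, errp):
     *     socket_dgram(remoteAddr, localAddr, errp);
     */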


* Re: [Qemu-devel] [PULL 31/32] MAINTAINERS: add all-match entry for qemu-devel@
  2016-02-09 12:13 ` [Qemu-devel] [PULL 31/32] MAINTAINERS: add all-match entry for qemu-devel@ Paolo Bonzini
@ 2016-02-09 13:47   ` Markus Armbruster
  2016-02-09 15:01     ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Markus Armbruster @ 2016-02-09 13:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: John Snow, qemu-devel, Stephen Warren

Not so fast :)

Paolo Bonzini <pbonzini@redhat.com> writes:

> From: Stephen Warren <swarren@wwwdotorg.org>
>
> Add an entry to MAINTAINERS that matches every patch, and requests the
> user send patches to qemu-devel@nongnu.org.
>
> It's not 100% obvious to project newcomers that all patches should be sent
> there; checkpatch doesn't say so, and since it mentions other lists to CC,

s/checkpatch/get_maintainer.pl/

> the wording "the list" from the SubmitAPatch wiki page can be taken
> to mean only those lists, not the main list too.
>
> The F: entries were taken from a similar entry in the Linux kernel.
>
> Modify get_maintainer.pl so that the all-matching list entry doesn't
> prevent the git fallback from ever triggering.

This suggests only the all-matching entry is exempt, but...

>                                                The fallback now relies
> on finding a real person in MAINTAINERS, not just a mailing list for

... in fact all lists now are.  Entries with L: and without M:

    Maintained: LINUX, POSIX, i386 target
    Odd fixes: GDB stub
    Orphan: Gumstix, Mipssim, Stable 0.10, Stable 0.14, Stable 0.15

For these, we now fall back to git.  Intentional?

If yes, mention it in the commit message.

If no, the easiest fix is to give them an M, if we can find a willing
victim.

Aside: we also have entries with neither L: nor M:

    Orphan: LSI53C895A, M68K, an5206, dummy_m68k, mcf5208

Odd.

> the relevant sub-community. This portion of the patch was suggested by
> pbonzini@redhat.com.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: John Snow <jsnow@redhat.com>
> Signed-off-by: Stephen Warren <swarren@wwwdotorg.org>
> Message-Id: <1454987065-12961-1-git-send-email-swarren@wwwdotorg.org>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  MAINTAINERS               | 5 +++++
>  scripts/get_maintainer.pl | 2 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b6ed87a..2d78eea 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -52,6 +52,11 @@ General Project Administration
>  ------------------------------
>  M: Peter Maydell <peter.maydell@linaro.org>
>  
> +All patches CC here

"All patches must be sent to this mailing list"

> +L: qemu-devel@nongnu.org
> +F: *
> +F: */
> +
>  Responsible Disclosure, Reporting Security Issues
>  ------------------------------
>  W: http://wiki.qemu.org/SecurityProcess
> diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> index 7dacf32..8261bcb 100755
> --- a/scripts/get_maintainer.pl
> +++ b/scripts/get_maintainer.pl
> @@ -636,7 +636,7 @@ sub get_maintainers {
>  
>      if ($email) {
>  	if (! $interactive) {
> -	    $email_git_fallback = 0 if @email_to > 0 || @list_to > 0 || $email_git || $email_git_blame;
> +	    $email_git_fallback = 0 if @email_to > 0 || $email_git || $email_git_blame;
>  	    if ($email_git_fallback) {
>  	        print STDERR "get_maintainer.pl: No maintainers found, printing recent contributors.\n";
>  	        print STDERR "get_maintainer.pl: Do not blindly cc: them on patches!  Use common sense.\n";


* Re: [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08
  2016-02-09 12:13 [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08 Paolo Bonzini
                   ` (5 preceding siblings ...)
  2016-02-09 12:13 ` [Qemu-devel] [PULL 32/32] qemu-char, io: fix ordering of arguments for UDP socket creation Paolo Bonzini
@ 2016-02-09 14:20 ` Peter Maydell
  2016-02-09 14:41   ` Paolo Bonzini
  6 siblings, 1 reply; 16+ messages in thread
From: Peter Maydell @ 2016-02-09 14:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: QEMU Developers, Richard Henderson

On 9 February 2016 at 12:13, Paolo Bonzini <pbonzini@redhat.com> wrote:
> The following changes since commit e4a096b1cd4350eeca5dcdc391ab333d2083d7fd:
>
>   ui/cocoa.m: Include qemu/osdep.h (2016-02-08 13:14:40 +0000)
>
> are available in the git repository at:
>
>   git://github.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 91eb5e5293fda2127ae33594715b47ff7e4eb985:
>
>   qemu-char, io: fix ordering of arguments for UDP socket creation (2016-02-09 13:06:00 +0100)
>
> ----------------------------------------------------------------
> * switch to C11 atomics (Alex)
> * Coverity fixes for IPMI (Corey), i386 (Paolo), qemu-char (Paolo)
> * at long last, fail on wrong .pc files if -m32 is in use (Daniel)
> * qemu-char regression fix (Daniel)
> * SAS1068 device (Paolo)
> * memory region docs improvements (Peter)
> * target-i386 cleanups (Richard)
> * qemu-nbd docs improvements (Sitsofe)
> * thread-safe memory hotplug (Stefan)

Compilation failure I'm afraid (all hosts):

/home/pm215/qemu/target-i386/translate.c: In function ‘tcg_x86_init’:
/home/pm215/qemu/target-i386/translate.c:7724:34: error: passing
argument 1 of ‘tcg_global_mem_new_i32’ makes pointer from integer
without a cast [-Werror]
                                  seg_base_names[i]);
                                  ^
In file included from /home/pm215/qemu/tcg/tcg-op.h:25:0,
                 from /home/pm215/qemu/target-i386/translate.c:24:
/home/pm215/qemu/tcg/tcg.h:644:24: note: expected ‘TCGv_ptr’ but
argument is of type ‘int’
 static inline TCGv_i32 tcg_global_mem_new_i32(TCGv_ptr reg, intptr_t offset,
                        ^

thanks
-- PMM


* Re: [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08
  2016-02-09 14:20 ` [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08 Peter Maydell
@ 2016-02-09 14:41   ` Paolo Bonzini
  2016-02-09 17:01     ` Richard Henderson
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2016-02-09 14:41 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Richard Henderson



On 09/02/2016 15:20, Peter Maydell wrote:
> Compilation failure I'm afraid (all hosts):
> 
> /home/pm215/qemu/target-i386/translate.c: In function ‘tcg_x86_init’:
> /home/pm215/qemu/target-i386/translate.c:7724:34: error: passing
> argument 1 of ‘tcg_global_mem_new_i32’ makes pointer from integer
> without a cast [-Werror]
>                                   seg_base_names[i]);
>                                   ^
> In file included from /home/pm215/qemu/tcg/tcg-op.h:25:0,
>                  from /home/pm215/qemu/target-i386/translate.c:24:
> /home/pm215/qemu/tcg/tcg.h:644:24: note: expected ‘TCGv_ptr’ but
> argument is of type ‘int’
>  static inline TCGv_i32 tcg_global_mem_new_i32(TCGv_ptr reg, intptr_t offset,
>                         ^

Hmm, not on my host and I don't see what's going on:

    static const char seg_base_names[6][8] = {
        [R_CS] = "cs_base",
        [R_DS] = "ds_base",
        [R_ES] = "es_base",
        [R_FS] = "fs_base",
        [R_GS] = "gs_base",
        [R_SS] = "ss_base",
    };

        cpu_seg_base[i]
            = tcg_global_mem_new(TCG_AREG0,
                                 offsetof(CPUX86State, segs[i].base),
                                 seg_base_names[i]);


There's no difference between that and e.g.

    cpu_cc_src2 = tcg_global_mem_new(TCG_AREG0,
                                     offsetof(CPUX86State, cc_src2),
                                     "cc_src2");


Paolo


* Re: [Qemu-devel] [PULL 31/32] MAINTAINERS: add all-match entry for qemu-devel@
  2016-02-09 13:47   ` Markus Armbruster
@ 2016-02-09 15:01     ` Paolo Bonzini
  2016-02-09 15:24       ` Markus Armbruster
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2016-02-09 15:01 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: John Snow, qemu-devel, Stephen Warren



On 09/02/2016 14:47, Markus Armbruster wrote:
>> >                                                The fallback now relies
>> > on finding a real person in MAINTAINERS, not just a mailing list for
> ... in fact all lists now are.  Entries with L: and without M:
> 
>     Maintained: LINUX, POSIX, i386 target
>     Odd fixes: GDB stub
>     Orphan: Gumstix, Mipssim, Stable 0.10, Stable 0.14, Stable 0.15

Not Stable 0.15 actually.

> For these, we now fall back to git.  Intentional?

Yes, intentional.  They are other instances of qemu-devel@ or, for
Gumstix, qemu-arm@.  For some of them it may make sense to give them an
M too; for stable versions we probably should drop them.

> If yes, mention it in the commit message.

It is mentioned.  For gumstix, it says sub-community right after where
you cut it. :) "The fallback now relies on finding a real person in
MAINTAINERS, not just a mailing list for the relevant sub-community".

For others, there's no reason why it should be different from the
catch-all entry.

> If no, the easiest fix is to give them an M, if we can find a willing
> victim.
> 
> Aside: we also have entries with neither L: nor M:
> 
>     Orphan: LSI53C895A, M68K, an5206, dummy_m68k, mcf5208
> 
> Odd.

Not too odd considering they're Orphan. :)  In fact the above entries
could have their L removed too (apart from Gumstix again).

Paolo


* Re: [Qemu-devel] [PULL 31/32] MAINTAINERS: add all-match entry for qemu-devel@
  2016-02-09 15:01     ` Paolo Bonzini
@ 2016-02-09 15:24       ` Markus Armbruster
  2016-02-09 15:27         ` Peter Maydell
  0 siblings, 1 reply; 16+ messages in thread
From: Markus Armbruster @ 2016-02-09 15:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: John Snow, qemu-devel, Stephen Warren

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 09/02/2016 14:47, Markus Armbruster wrote:
>>> >                                                The fallback now relies
>>> > on finding a real person in MAINTAINERS, not just a mailing list for
>> ... in fact all lists now are.  Entries with L: and without M:
>> 
>>     Maintained: LINUX, POSIX, i386 target
>>     Odd fixes: GDB stub
>>     Orphan: Gumstix, Mipssim, Stable 0.10, Stable 0.14, Stable 0.15
>
> Not Stable 0.15 actually.

Oops, it's Stable 1.0.

>> For these, we now fall back to git.  Intentional?
>
> Yes, intentional.  They are other instances of qemu-devel@ or, for
> Gumstix, qemu-arm@.  For some of them it may make sense to give them an
> M too, for stable versions we probably should drop them.
>
>> If yes, mention it in the commit message.
>
> It is mentioned.  For gumstix, it says sub-community right after where
> you cut it. :) "The fallback now relies on finding a real person in
> MAINTAINERS, not just a mailing list for the relevant sub-community".

Yes, that sentence technically covers it, but I find the paragraph
confusing all the same.  Two ways to get my R-by:

1. Split the patch in two: part 1 is the get_maintainer.pl modification,
   part 2 is the MAINTAINERS update.  Each part does just one thing
   then, which allows simple commit messages.  That's what I'd do.

2. Keep the patch as is, but improve the commit message.  Here's my try:

    MAINTAINERS: Fall back to git unless we have a human, add qemu-devel@

    From: Stephen Warren <swarren@wwwdotorg.org>

    Add an entry to MAINTAINERS that matches every patch, and requests the
    user send patches to qemu-devel@nongnu.org.

    It's not 100% obvious to project newcomers that all patches should
    be sent there; get_maintainer.pl doesn't say so, and since it
    mentions other lists to CC, the wording "the list" from the
    SubmitAPatch wiki page can be taken to mean only those lists, not
    the main list too.

    The F: entries were taken from a similar entry in the Linux kernel.

    On its own, this would break fallback to git, because now every file
    has a maintainer of sorts.  Modify get_maintainer.pl so that mailing
    lists (L: lines) no longer prevent the fallback, only humans (M:
    entries).  This portion of the patch was suggested by
    pbonzini@redhat.com.

    Several pre-existing entries have a list but no human.  These now
    fall back to git.  That's a feature.

    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Markus Armbruster <armbru@redhat.com>
    Cc: John Snow <jsnow@redhat.com>
    Signed-off-by: Stephen Warren <swarren@wwwdotorg.org>
    Message-Id: <1454987065-12961-1-git-send-email-swarren@wwwdotorg.org>
    [Commit message tweaked]
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Reviewed-by: Markus Armbruster <armbru@redhat.com>

In addition, consider changing the "All patches CC here" line to "All
patches must be sent to this mailing list".

> For others, there's no reason why it should be different from the
> catch-all entry.
>
>> If no, the easiest fix is to give them an M, if we can find a willing
>> victim.
>> 
>> Aside: we also have entries with neither L: nor M:
>> 
>>     Orphan: LSI53C895A, M68K, an5206, dummy_m68k, mcf5208
>> 
>> Odd.
>
> Not too odd considering they're Orphan. :)  In fact the above entries
> could have their L removed too (apart from Gumstix again).
>
> Paolo


* Re: [Qemu-devel] [PULL 31/32] MAINTAINERS: add all-match entry for qemu-devel@
  2016-02-09 15:24       ` Markus Armbruster
@ 2016-02-09 15:27         ` Peter Maydell
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2016-02-09 15:27 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, John Snow, QEMU Developers, Stephen Warren

On 9 February 2016 at 15:24, Markus Armbruster <armbru@redhat.com> wrote:
> 1. Split the patch in two: part 1 is the get_maintainer.pl modification,
>    part 2 is the MAINTAINERS update.  Each part does just one thing
>    then, which allows simple commit messages.  That's what I'd do.

I would favour this as well...

thanks
-- PMM


* Re: [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08
  2016-02-09 14:41   ` Paolo Bonzini
@ 2016-02-09 17:01     ` Richard Henderson
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2016-02-09 17:01 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Maydell; +Cc: QEMU Developers

On 02/10/2016 01:41 AM, Paolo Bonzini wrote:
>
>
> On 09/02/2016 15:20, Peter Maydell wrote:
>> Compilation failure I'm afraid (all hosts):
>>
>> /home/pm215/qemu/target-i386/translate.c: In function ‘tcg_x86_init’:
>> /home/pm215/qemu/target-i386/translate.c:7724:34: error: passing
>> argument 1 of ‘tcg_global_mem_new_i32’ makes pointer from integer
>> without a cast [-Werror]
>>                                    seg_base_names[i]);
>>                                    ^
>> In file included from /home/pm215/qemu/tcg/tcg-op.h:25:0,
>>                   from /home/pm215/qemu/target-i386/translate.c:24:
>> /home/pm215/qemu/tcg/tcg.h:644:24: note: expected ‘TCGv_ptr’ but
>> argument is of type ‘int’
>>   static inline TCGv_i32 tcg_global_mem_new_i32(TCGv_ptr reg, intptr_t offset,
>>                          ^
>
> Hmm, not on my host and I don't see what's going on:
>
>      static const char seg_base_names[6][8] = {
>          [R_CS] = "cs_base",
>          [R_DS] = "ds_base",
>          [R_ES] = "es_base",
>          [R_FS] = "fs_base",
>          [R_GS] = "gs_base",
>          [R_SS] = "ss_base",
>      };
>
>          cpu_seg_base[i]
>              = tcg_global_mem_new(TCG_AREG0,
>                                   offsetof(CPUX86State, segs[i].base),
>                                   seg_base_names[i]);
>
>
> There's no difference between that and e.g.
>
>      cpu_cc_src2 = tcg_global_mem_new(TCG_AREG0,
>                                       offsetof(CPUX86State, cc_src2),
>                                       "cc_src2");

Merge conflict.  s/TCG_AREG0/cpu_env/ after my latest tcg patch set.


r~
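
For reference, the resolution Richard describes amounts to rebasing the new
hunk onto the updated tcg_global_mem_new() signature, which now takes the
env pointer (a TCGv_ptr) instead of the TCG_AREG0 constant.  A sketch of the
resolved hunk, not the final commit:

    cpu_seg_base[i]
        = tcg_global_mem_new(cpu_env,
                             offsetof(CPUX86State, segs[i].base),
                             seg_base_names[i]);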


* Re: [Qemu-devel] [PULL 02/32] memory: RCU ram_list.dirty_memory[] for safe RAM hotplug
  2016-02-09 12:13 ` [Qemu-devel] [PULL 02/32] memory: RCU ram_list.dirty_memory[] for safe RAM hotplug Paolo Bonzini
@ 2016-02-10 16:56   ` Leon Alrae
  2016-02-10 17:01     ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Leon Alrae @ 2016-02-10 16:56 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: Stefan Hajnoczi

Hi,

I just noticed a significant performance hit with this change. Booting a
small system (I tried system-mode MIPS only) used to take around 20
seconds; it now takes around 3 minutes.

Leon

On 09/02/16 12:13, Paolo Bonzini wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Although accesses to ram_list.dirty_memory[] use atomics so multiple
> threads can safely dirty the bitmap, the data structure is not fully
> thread-safe yet.
> 
> This patch handles the RAM hotplug case where ram_list.dirty_memory[] is
> grown.  ram_list.dirty_memory[] is changed from a regular bitmap to an
> RCU array of pointers to fixed-size bitmap blocks.  Threads can continue
> accessing bitmap blocks while the array is being extended.  See the
> comments in the code for an in-depth explanation of struct
> DirtyMemoryBlocks.
> 
> I have tested that live migration with virtio-blk dataplane works.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> Message-Id: <1453728801-5398-2-git-send-email-stefanha@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  exec.c                  |  75 +++++++++++++++----
>  include/exec/ram_addr.h | 189 ++++++++++++++++++++++++++++++++++++++++++------
>  migration/ram.c         |   4 -
>  3 files changed, 225 insertions(+), 43 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index ab37360..7d67c11 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -980,8 +980,9 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
>                                                ram_addr_t length,
>                                                unsigned client)
>  {
> +    DirtyMemoryBlocks *blocks;
>      unsigned long end, page;
> -    bool dirty;
> +    bool dirty = false;
>  
>      if (length == 0) {
>          return false;
> @@ -989,8 +990,22 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
>  
>      end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
>      page = start >> TARGET_PAGE_BITS;
> -    dirty = bitmap_test_and_clear_atomic(ram_list.dirty_memory[client],
> -                                         page, end - page);
> +
> +    rcu_read_lock();
> +
> +    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
> +
> +    while (page < end) {
> +        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
> +        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
> +        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
> +
> +        dirty |= bitmap_test_and_clear_atomic(blocks->blocks[idx],
> +                                              offset, num);
> +        page += num;
> +    }
> +
> +    rcu_read_unlock();
>  
>      if (dirty && tcg_enabled()) {
>          tlb_reset_dirty_range_all(start, length);
> @@ -1504,6 +1519,47 @@ int qemu_ram_resize(ram_addr_t base, ram_addr_t newsize, Error **errp)
>      return 0;
>  }
>  
> +/* Called with ram_list.mutex held */
> +static void dirty_memory_extend(ram_addr_t old_ram_size,
> +                                ram_addr_t new_ram_size)
> +{
> +    ram_addr_t old_num_blocks = DIV_ROUND_UP(old_ram_size,
> +                                             DIRTY_MEMORY_BLOCK_SIZE);
> +    ram_addr_t new_num_blocks = DIV_ROUND_UP(new_ram_size,
> +                                             DIRTY_MEMORY_BLOCK_SIZE);
> +    int i;
> +
> +    /* Only need to extend if block count increased */
> +    if (new_num_blocks <= old_num_blocks) {
> +        return;
> +    }
> +
> +    for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
> +        DirtyMemoryBlocks *old_blocks;
> +        DirtyMemoryBlocks *new_blocks;
> +        int j;
> +
> +        old_blocks = atomic_rcu_read(&ram_list.dirty_memory[i]);
> +        new_blocks = g_malloc(sizeof(*new_blocks) +
> +                              sizeof(new_blocks->blocks[0]) * new_num_blocks);
> +
> +        if (old_num_blocks) {
> +            memcpy(new_blocks->blocks, old_blocks->blocks,
> +                   old_num_blocks * sizeof(old_blocks->blocks[0]));
> +        }
> +
> +        for (j = old_num_blocks; j < new_num_blocks; j++) {
> +            new_blocks->blocks[j] = bitmap_new(DIRTY_MEMORY_BLOCK_SIZE);
> +        }
> +
> +        atomic_rcu_set(&ram_list.dirty_memory[i], new_blocks);
> +
> +        if (old_blocks) {
> +            g_free_rcu(old_blocks, rcu);
> +        }
> +    }
> +}
> +
>  static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
>  {
>      RAMBlock *block;
> @@ -1543,6 +1599,7 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
>                (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS);
>      if (new_ram_size > old_ram_size) {
>          migration_bitmap_extend(old_ram_size, new_ram_size);
> +        dirty_memory_extend(old_ram_size, new_ram_size);
>      }
>      /* Keep the list sorted from biggest to smallest block.  Unlike QTAILQ,
>       * QLIST (which has an RCU-friendly variant) does not have insertion at
> @@ -1568,18 +1625,6 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
>      ram_list.version++;
>      qemu_mutex_unlock_ramlist();
>  
> -    new_ram_size = last_ram_offset() >> TARGET_PAGE_BITS;
> -
> -    if (new_ram_size > old_ram_size) {
> -        int i;
> -
> -        /* ram_list.dirty_memory[] is protected by the iothread lock.  */
> -        for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
> -            ram_list.dirty_memory[i] =
> -                bitmap_zero_extend(ram_list.dirty_memory[i],
> -                                   old_ram_size, new_ram_size);
> -       }
> -    }
>      cpu_physical_memory_set_dirty_range(new_block->offset,
>                                          new_block->used_length,
>                                          DIRTY_CLIENTS_ALL);
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index f2e872d..b1413a1 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -49,13 +49,43 @@ static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
>      return (char *)block->host + offset;
>  }
>  
> +/* The dirty memory bitmap is split into fixed-size blocks to allow growth
> + * under RCU.  The bitmap for a block can be accessed as follows:
> + *
> + *   rcu_read_lock();
> + *
> + *   DirtyMemoryBlocks *blocks =
> + *       atomic_rcu_read(&ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
> + *
> + *   ram_addr_t idx = (addr >> TARGET_PAGE_BITS) / DIRTY_MEMORY_BLOCK_SIZE;
> + *   unsigned long *block = blocks.blocks[idx];
> + *   ...access block bitmap...
> + *
> + *   rcu_read_unlock();
> + *
> + * Remember to check for the end of the block when accessing a range of
> + * addresses.  Move on to the next block if you reach the end.
> + *
> + * Organization into blocks allows dirty memory to grow (but not shrink) under
> + * RCU.  When adding new RAMBlocks requires the dirty memory to grow, a new
> + * DirtyMemoryBlocks array is allocated with pointers to existing blocks kept
> + * the same.  Other threads can safely access existing blocks while dirty
> + * memory is being grown.  When no threads are using the old DirtyMemoryBlocks
> + * anymore it is freed by RCU (but the underlying blocks stay because they are
> + * pointed to from the new DirtyMemoryBlocks).
> + */
> +#define DIRTY_MEMORY_BLOCK_SIZE ((ram_addr_t)256 * 1024 * 8)
> +typedef struct {
> +    struct rcu_head rcu;
> +    unsigned long *blocks[];
> +} DirtyMemoryBlocks;
> +
>  typedef struct RAMList {
>      QemuMutex mutex;
> -    /* Protected by the iothread lock.  */
> -    unsigned long *dirty_memory[DIRTY_MEMORY_NUM];
>      RAMBlock *mru_block;
>      /* RCU-enabled, writes protected by the ramlist lock. */
>      QLIST_HEAD(, RAMBlock) blocks;
> +    DirtyMemoryBlocks *dirty_memory[DIRTY_MEMORY_NUM];
>      uint32_t version;
>  } RAMList;
>  extern RAMList ram_list;
> @@ -89,30 +119,70 @@ static inline bool cpu_physical_memory_get_dirty(ram_addr_t start,
>                                                   ram_addr_t length,
>                                                   unsigned client)
>  {
> -    unsigned long end, page, next;
> +    DirtyMemoryBlocks *blocks;
> +    unsigned long end, page;
> +    bool dirty = false;
>  
>      assert(client < DIRTY_MEMORY_NUM);
>  
>      end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
>      page = start >> TARGET_PAGE_BITS;
> -    next = find_next_bit(ram_list.dirty_memory[client], end, page);
>  
> -    return next < end;
> +    rcu_read_lock();
> +
> +    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
> +
> +    while (page < end) {
> +        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
> +        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
> +        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
> +
> +        if (find_next_bit(blocks->blocks[idx], offset, num) < num) {
> +            dirty = true;
> +            break;
> +        }
> +
> +        page += num;
> +    }
> +
> +    rcu_read_unlock();
> +
> +    return dirty;
>  }
>  
>  static inline bool cpu_physical_memory_all_dirty(ram_addr_t start,
>                                                   ram_addr_t length,
>                                                   unsigned client)
>  {
> -    unsigned long end, page, next;
> +    DirtyMemoryBlocks *blocks;
> +    unsigned long end, page;
> +    bool dirty = true;
>  
>      assert(client < DIRTY_MEMORY_NUM);
>  
>      end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
>      page = start >> TARGET_PAGE_BITS;
> -    next = find_next_zero_bit(ram_list.dirty_memory[client], end, page);
>  
> -    return next >= end;
> +    rcu_read_lock();
> +
> +    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
> +
> +    while (page < end) {
> +        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
> +        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
> +        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
> +
> +        if (find_next_zero_bit(blocks->blocks[idx], offset, num) < num) {
> +            dirty = false;
> +            break;
> +        }
> +
> +        page += num;
> +    }
> +
> +    rcu_read_unlock();
> +
> +    return dirty;
>  }
>  
>  static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr,
> @@ -154,16 +224,31 @@ static inline uint8_t cpu_physical_memory_range_includes_clean(ram_addr_t start,
>  static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr,
>                                                        unsigned client)
>  {
> +    unsigned long page, idx, offset;
> +    DirtyMemoryBlocks *blocks;
> +
>      assert(client < DIRTY_MEMORY_NUM);
> -    set_bit_atomic(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]);
> +
> +    page = addr >> TARGET_PAGE_BITS;
> +    idx = page / DIRTY_MEMORY_BLOCK_SIZE;
> +    offset = page % DIRTY_MEMORY_BLOCK_SIZE;
> +
> +    rcu_read_lock();
> +
> +    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
> +
> +    set_bit_atomic(offset, blocks->blocks[idx]);
> +
> +    rcu_read_unlock();
>  }
>  
>  static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
>                                                         ram_addr_t length,
>                                                         uint8_t mask)
>  {
> +    DirtyMemoryBlocks *blocks[DIRTY_MEMORY_NUM];
>      unsigned long end, page;
> -    unsigned long **d = ram_list.dirty_memory;
> +    int i;
>  
>      if (!mask && !xen_enabled()) {
>          return;
> @@ -171,15 +256,36 @@ static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
>  
>      end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
>      page = start >> TARGET_PAGE_BITS;
> -    if (likely(mask & (1 << DIRTY_MEMORY_MIGRATION))) {
> -        bitmap_set_atomic(d[DIRTY_MEMORY_MIGRATION], page, end - page);
> -    }
> -    if (unlikely(mask & (1 << DIRTY_MEMORY_VGA))) {
> -        bitmap_set_atomic(d[DIRTY_MEMORY_VGA], page, end - page);
> +
> +    rcu_read_lock();
> +
> +    for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
> +        blocks[i] = atomic_rcu_read(&ram_list.dirty_memory[i]);
>      }
> -    if (unlikely(mask & (1 << DIRTY_MEMORY_CODE))) {
> -        bitmap_set_atomic(d[DIRTY_MEMORY_CODE], page, end - page);
> +
> +    while (page < end) {
> +        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
> +        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
> +        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
> +
> +        if (likely(mask & (1 << DIRTY_MEMORY_MIGRATION))) {
> +            bitmap_set_atomic(blocks[DIRTY_MEMORY_MIGRATION]->blocks[idx],
> +                              offset, num);
> +        }
> +        if (unlikely(mask & (1 << DIRTY_MEMORY_VGA))) {
> +            bitmap_set_atomic(blocks[DIRTY_MEMORY_VGA]->blocks[idx],
> +                              offset, num);
> +        }
> +        if (unlikely(mask & (1 << DIRTY_MEMORY_CODE))) {
> +            bitmap_set_atomic(blocks[DIRTY_MEMORY_CODE]->blocks[idx],
> +                              offset, num);
> +        }
> +
> +        page += num;
>      }
> +
> +    rcu_read_unlock();
> +
>      xen_modified_memory(start, length);
>  }
>  
> @@ -199,21 +305,41 @@ static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
>      /* start address is aligned at the start of a word? */
>      if ((((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) &&
>          (hpratio == 1)) {
> +        unsigned long **blocks[DIRTY_MEMORY_NUM];
> +        unsigned long idx;
> +        unsigned long offset;
>          long k;
>          long nr = BITS_TO_LONGS(pages);
>  
> +        idx = (start >> TARGET_PAGE_BITS) / DIRTY_MEMORY_BLOCK_SIZE;
> +        offset = BIT_WORD((start >> TARGET_PAGE_BITS) %
> +                          DIRTY_MEMORY_BLOCK_SIZE);
> +
> +        rcu_read_lock();
> +
> +        for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
> +            blocks[i] = atomic_rcu_read(&ram_list.dirty_memory[i])->blocks;
> +        }
> +
>          for (k = 0; k < nr; k++) {
>              if (bitmap[k]) {
>                  unsigned long temp = leul_to_cpu(bitmap[k]);
> -                unsigned long **d = ram_list.dirty_memory;
>  
> -                atomic_or(&d[DIRTY_MEMORY_MIGRATION][page + k], temp);
> -                atomic_or(&d[DIRTY_MEMORY_VGA][page + k], temp);
> +                atomic_or(&blocks[DIRTY_MEMORY_MIGRATION][idx][offset], temp);
> +                atomic_or(&blocks[DIRTY_MEMORY_VGA][idx][offset], temp);
>                  if (tcg_enabled()) {
> -                    atomic_or(&d[DIRTY_MEMORY_CODE][page + k], temp);
> +                    atomic_or(&blocks[DIRTY_MEMORY_CODE][idx][offset], temp);
>                  }
>              }
> +
> +            if (++offset >= BITS_TO_LONGS(DIRTY_MEMORY_BLOCK_SIZE)) {
> +                offset = 0;
> +                idx++;
> +            }
>          }
> +
> +        rcu_read_unlock();
> +
>          xen_modified_memory(start, pages << TARGET_PAGE_BITS);
>      } else {
>          uint8_t clients = tcg_enabled() ? DIRTY_CLIENTS_ALL : DIRTY_CLIENTS_NOCODE;
> @@ -265,18 +391,33 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
>      if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) {
>          int k;
>          int nr = BITS_TO_LONGS(length >> TARGET_PAGE_BITS);
> -        unsigned long *src = ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION];
> +        unsigned long * const *src;
> +        unsigned long idx = (page * BITS_PER_LONG) / DIRTY_MEMORY_BLOCK_SIZE;
> +        unsigned long offset = BIT_WORD((page * BITS_PER_LONG) %
> +                                        DIRTY_MEMORY_BLOCK_SIZE);
> +
> +        rcu_read_lock();
> +
> +        src = atomic_rcu_read(
> +                &ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION])->blocks;
>  
>          for (k = page; k < page + nr; k++) {
> -            if (src[k]) {
> -                unsigned long bits = atomic_xchg(&src[k], 0);
> +            if (src[idx][offset]) {
> +                unsigned long bits = atomic_xchg(&src[idx][offset], 0);
>                  unsigned long new_dirty;
>                  new_dirty = ~dest[k];
>                  dest[k] |= bits;
>                  new_dirty &= bits;
>                  num_dirty += ctpopl(new_dirty);
>              }
> +
> +            if (++offset >= BITS_TO_LONGS(DIRTY_MEMORY_BLOCK_SIZE)) {
> +                offset = 0;
> +                idx++;
> +            }
>          }
> +
> +        rcu_read_unlock();
>      } else {
>          for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
>              if (cpu_physical_memory_test_and_clear_dirty(
> diff --git a/migration/ram.c b/migration/ram.c
> index 3cdfea4..96c749f 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -609,7 +609,6 @@ static void migration_bitmap_sync_init(void)
>      iterations_prev = 0;
>  }
>  
> -/* Called with iothread lock held, to protect ram_list.dirty_memory[] */
>  static void migration_bitmap_sync(void)
>  {
>      RAMBlock *block;
> @@ -1921,8 +1920,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>          acct_clear();
>      }
>  
> -    /* iothread lock needed for ram_list.dirty_memory[] */
> -    qemu_mutex_lock_iothread();
>      qemu_mutex_lock_ramlist();
>      rcu_read_lock();
>      bytes_transferred = 0;
> @@ -1947,7 +1944,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      memory_global_dirty_log_start();
>      migration_bitmap_sync();
>      qemu_mutex_unlock_ramlist();
> -    qemu_mutex_unlock_iothread();
>  
>      qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
>  
> 
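
For scale: DIRTY_MEMORY_BLOCK_SIZE is 256 * 1024 * 8 = 2,097,152 pages per
block, so with 4 KiB target pages one bitmap block covers 8 GiB of guest
RAM.  A minimal standalone sketch of the idx/offset arithmetic used
throughout the patch (the address value is made up):

    #include <inttypes.h>
    #include <stdio.h>

    #define TARGET_PAGE_BITS        12                     /* 4 KiB pages */
    #define DIRTY_MEMORY_BLOCK_SIZE ((uint64_t)256 * 1024 * 8)

    int main(void)
    {
        uint64_t addr = 9ULL << 30;                 /* a dirty byte at 9 GiB */
        uint64_t page = addr >> TARGET_PAGE_BITS;
        uint64_t idx  = page / DIRTY_MEMORY_BLOCK_SIZE;  /* which bitmap block */
        uint64_t off  = page % DIRTY_MEMORY_BLOCK_SIZE;  /* bit inside that block */

        printf("page %" PRIu64 " -> block %" PRIu64 ", bit %" PRIu64 "\n",
               page, idx, off);
        /* prints: page 2359296 -> block 1, bit 262144 */
        return 0;
    }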


* Re: [Qemu-devel] [PULL 02/32] memory: RCU ram_list.dirty_memory[] for safe RAM hotplug
  2016-02-10 16:56   ` Leon Alrae
@ 2016-02-10 17:01     ` Paolo Bonzini
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2016-02-10 17:01 UTC (permalink / raw)
  To: Leon Alrae, qemu-devel; +Cc: Stefan Hajnoczi



On 10/02/2016 17:56, Leon Alrae wrote:
> Hi,
> 
> I just noticed a significant performance hit with this change. Booting a
> small system (I tried system-mode MIPS only) used to take around 20
> seconds; it now takes around 3 minutes.

You're lucky that it booted at all. :)  Unfortunately I noticed a slow
boot yesterday (6 minutes for Fedora 21 on TCG), but I blamed it on
running on top of TCG _and_ NFS.  In retrospect that was not too smart.

Today I tried FreeDOS and it hangs completely, but my self-nak and
Peter's push unfortunately crossed.

I've posted a fix and Stefan has already reviewed it.  If you can test
it, that would be great.  The above Fedora 21 image takes about 50
seconds to boot with the patch, so that's in line with the 9x slowdown
you reported.

http://article.gmane.org/gmane.comp.emulators.qemu/393105/raw

Paolo

> Leon
> 
> On 09/02/16 12:13, Paolo Bonzini wrote:
>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>
>> Although accesses to ram_list.dirty_memory[] use atomics so multiple
>> threads can safely dirty the bitmap, the data structure is not fully
>> thread-safe yet.
>>
>> This patch handles the RAM hotplug case where ram_list.dirty_memory[] is
>> grown.  ram_list.dirty_memory[] is changed from a regular bitmap to an
>> RCU array of pointers to fixed-size bitmap blocks.  Threads can continue
>> accessing bitmap blocks while the array is being extended.  See the
>> comments in the code for an in-depth explanation of struct
>> DirtyMemoryBlocks.
>>
>> I have tested that live migration with virtio-blk dataplane works.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Message-Id: <1453728801-5398-2-git-send-email-stefanha@redhat.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  exec.c                  |  75 +++++++++++++++----
>>  include/exec/ram_addr.h | 189 ++++++++++++++++++++++++++++++++++++++++++------
>>  migration/ram.c         |   4 -
>>  3 files changed, 225 insertions(+), 43 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index ab37360..7d67c11 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -980,8 +980,9 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
>>                                                ram_addr_t length,
>>                                                unsigned client)
>>  {
>> +    DirtyMemoryBlocks *blocks;
>>      unsigned long end, page;
>> -    bool dirty;
>> +    bool dirty = false;
>>  
>>      if (length == 0) {
>>          return false;
>> @@ -989,8 +990,22 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
>>  
>>      end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
>>      page = start >> TARGET_PAGE_BITS;
>> -    dirty = bitmap_test_and_clear_atomic(ram_list.dirty_memory[client],
>> -                                         page, end - page);
>> +
>> +    rcu_read_lock();
>> +
>> +    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
>> +
>> +    while (page < end) {
>> +        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
>> +        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
>> +        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
>> +
>> +        dirty |= bitmap_test_and_clear_atomic(blocks->blocks[idx],
>> +                                              offset, num);
>> +        page += num;
>> +    }
>> +
>> +    rcu_read_unlock();
>>  
>>      if (dirty && tcg_enabled()) {
>>          tlb_reset_dirty_range_all(start, length);
>> @@ -1504,6 +1519,47 @@ int qemu_ram_resize(ram_addr_t base, ram_addr_t newsize, Error **errp)
>>      return 0;
>>  }
>>  
>> +/* Called with ram_list.mutex held */
>> +static void dirty_memory_extend(ram_addr_t old_ram_size,
>> +                                ram_addr_t new_ram_size)
>> +{
>> +    ram_addr_t old_num_blocks = DIV_ROUND_UP(old_ram_size,
>> +                                             DIRTY_MEMORY_BLOCK_SIZE);
>> +    ram_addr_t new_num_blocks = DIV_ROUND_UP(new_ram_size,
>> +                                             DIRTY_MEMORY_BLOCK_SIZE);
>> +    int i;
>> +
>> +    /* Only need to extend if block count increased */
>> +    if (new_num_blocks <= old_num_blocks) {
>> +        return;
>> +    }
>> +
>> +    for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
>> +        DirtyMemoryBlocks *old_blocks;
>> +        DirtyMemoryBlocks *new_blocks;
>> +        int j;
>> +
>> +        old_blocks = atomic_rcu_read(&ram_list.dirty_memory[i]);
>> +        new_blocks = g_malloc(sizeof(*new_blocks) +
>> +                              sizeof(new_blocks->blocks[0]) * new_num_blocks);
>> +
>> +        if (old_num_blocks) {
>> +            memcpy(new_blocks->blocks, old_blocks->blocks,
>> +                   old_num_blocks * sizeof(old_blocks->blocks[0]));
>> +        }
>> +
>> +        for (j = old_num_blocks; j < new_num_blocks; j++) {
>> +            new_blocks->blocks[j] = bitmap_new(DIRTY_MEMORY_BLOCK_SIZE);
>> +        }
>> +
>> +        atomic_rcu_set(&ram_list.dirty_memory[i], new_blocks);
>> +
>> +        if (old_blocks) {
>> +            g_free_rcu(old_blocks, rcu);
>> +        }
>> +    }
>> +}
>> +
>>  static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
>>  {
>>      RAMBlock *block;
>> @@ -1543,6 +1599,7 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
>>                (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS);
>>      if (new_ram_size > old_ram_size) {
>>          migration_bitmap_extend(old_ram_size, new_ram_size);
>> +        dirty_memory_extend(old_ram_size, new_ram_size);
>>      }
>>      /* Keep the list sorted from biggest to smallest block.  Unlike QTAILQ,
>>       * QLIST (which has an RCU-friendly variant) does not have insertion at
>> @@ -1568,18 +1625,6 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
>>      ram_list.version++;
>>      qemu_mutex_unlock_ramlist();
>>  
>> -    new_ram_size = last_ram_offset() >> TARGET_PAGE_BITS;
>> -
>> -    if (new_ram_size > old_ram_size) {
>> -        int i;
>> -
>> -        /* ram_list.dirty_memory[] is protected by the iothread lock.  */
>> -        for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
>> -            ram_list.dirty_memory[i] =
>> -                bitmap_zero_extend(ram_list.dirty_memory[i],
>> -                                   old_ram_size, new_ram_size);
>> -       }
>> -    }
>>      cpu_physical_memory_set_dirty_range(new_block->offset,
>>                                          new_block->used_length,
>>                                          DIRTY_CLIENTS_ALL);
>> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
>> index f2e872d..b1413a1 100644
>> --- a/include/exec/ram_addr.h
>> +++ b/include/exec/ram_addr.h
>> @@ -49,13 +49,43 @@ static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
>>      return (char *)block->host + offset;
>>  }
>>  
>> +/* The dirty memory bitmap is split into fixed-size blocks to allow growth
>> + * under RCU.  The bitmap for a block can be accessed as follows:
>> + *
>> + *   rcu_read_lock();
>> + *
>> + *   DirtyMemoryBlocks *blocks =
>> + *       atomic_rcu_read(&ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
>> + *
>> + *   ram_addr_t idx = (addr >> TARGET_PAGE_BITS) / DIRTY_MEMORY_BLOCK_SIZE;
>> + *   unsigned long *block = blocks.blocks[idx];
>> + *   ...access block bitmap...
>> + *
>> + *   rcu_read_unlock();
>> + *
>> + * Remember to check for the end of the block when accessing a range of
>> + * addresses.  Move on to the next block if you reach the end.
>> + *
>> + * Organization into blocks allows dirty memory to grow (but not shrink) under
>> + * RCU.  When adding new RAMBlocks requires the dirty memory to grow, a new
>> + * DirtyMemoryBlocks array is allocated with pointers to existing blocks kept
>> + * the same.  Other threads can safely access existing blocks while dirty
>> + * memory is being grown.  When no threads are using the old DirtyMemoryBlocks
>> + * anymore it is freed by RCU (but the underlying blocks stay because they are
>> + * pointed to from the new DirtyMemoryBlocks).
>> + */
>> +#define DIRTY_MEMORY_BLOCK_SIZE ((ram_addr_t)256 * 1024 * 8)
>> +typedef struct {
>> +    struct rcu_head rcu;
>> +    unsigned long *blocks[];
>> +} DirtyMemoryBlocks;
>> +
>>  typedef struct RAMList {
>>      QemuMutex mutex;
>> -    /* Protected by the iothread lock.  */
>> -    unsigned long *dirty_memory[DIRTY_MEMORY_NUM];
>>      RAMBlock *mru_block;
>>      /* RCU-enabled, writes protected by the ramlist lock. */
>>      QLIST_HEAD(, RAMBlock) blocks;
>> +    DirtyMemoryBlocks *dirty_memory[DIRTY_MEMORY_NUM];
>>      uint32_t version;
>>  } RAMList;
>>  extern RAMList ram_list;
>> @@ -89,30 +119,70 @@ static inline bool cpu_physical_memory_get_dirty(ram_addr_t start,
>>                                                   ram_addr_t length,
>>                                                   unsigned client)
>>  {
>> -    unsigned long end, page, next;
>> +    DirtyMemoryBlocks *blocks;
>> +    unsigned long end, page;
>> +    bool dirty = false;
>>  
>>      assert(client < DIRTY_MEMORY_NUM);
>>  
>>      end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
>>      page = start >> TARGET_PAGE_BITS;
>> -    next = find_next_bit(ram_list.dirty_memory[client], end, page);
>>  
>> -    return next < end;
>> +    rcu_read_lock();
>> +
>> +    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
>> +
>> +    while (page < end) {
>> +        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
>> +        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
>> +        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
>> +
>> +        if (find_next_bit(blocks->blocks[idx], offset, num) < num) {
>> +            dirty = true;
>> +            break;
>> +        }
>> +
>> +        page += num;
>> +    }
>> +
>> +    rcu_read_unlock();
>> +
>> +    return dirty;
>>  }
>>  
>>  static inline bool cpu_physical_memory_all_dirty(ram_addr_t start,
>>                                                   ram_addr_t length,
>>                                                   unsigned client)
>>  {
>> -    unsigned long end, page, next;
>> +    DirtyMemoryBlocks *blocks;
>> +    unsigned long end, page;
>> +    bool dirty = true;
>>  
>>      assert(client < DIRTY_MEMORY_NUM);
>>  
>>      end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
>>      page = start >> TARGET_PAGE_BITS;
>> -    next = find_next_zero_bit(ram_list.dirty_memory[client], end, page);
>>  
>> -    return next >= end;
>> +    rcu_read_lock();
>> +
>> +    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
>> +
>> +    while (page < end) {
>> +        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
>> +        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
>> +        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
>> +
>> +        if (find_next_zero_bit(blocks->blocks[idx], offset, num) < num) {
>> +            dirty = false;
>> +            break;
>> +        }
>> +
>> +        page += num;
>> +    }
>> +
>> +    rcu_read_unlock();
>> +
>> +    return dirty;
>>  }
>>  
>>  static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr,
>> @@ -154,16 +224,31 @@ static inline uint8_t cpu_physical_memory_range_includes_clean(ram_addr_t start,
>>  static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr,
>>                                                        unsigned client)
>>  {
>> +    unsigned long page, idx, offset;
>> +    DirtyMemoryBlocks *blocks;
>> +
>>      assert(client < DIRTY_MEMORY_NUM);
>> -    set_bit_atomic(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]);
>> +
>> +    page = addr >> TARGET_PAGE_BITS;
>> +    idx = page / DIRTY_MEMORY_BLOCK_SIZE;
>> +    offset = page % DIRTY_MEMORY_BLOCK_SIZE;
>> +
>> +    rcu_read_lock();
>> +
>> +    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
>> +
>> +    set_bit_atomic(offset, blocks->blocks[idx]);
>> +
>> +    rcu_read_unlock();
>>  }
>>  
>>  static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
>>                                                         ram_addr_t length,
>>                                                         uint8_t mask)
>>  {
>> +    DirtyMemoryBlocks *blocks[DIRTY_MEMORY_NUM];
>>      unsigned long end, page;
>> -    unsigned long **d = ram_list.dirty_memory;
>> +    int i;
>>  
>>      if (!mask && !xen_enabled()) {
>>          return;
>> @@ -171,15 +256,36 @@ static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
>>  
>>      end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
>>      page = start >> TARGET_PAGE_BITS;
>> -    if (likely(mask & (1 << DIRTY_MEMORY_MIGRATION))) {
>> -        bitmap_set_atomic(d[DIRTY_MEMORY_MIGRATION], page, end - page);
>> -    }
>> -    if (unlikely(mask & (1 << DIRTY_MEMORY_VGA))) {
>> -        bitmap_set_atomic(d[DIRTY_MEMORY_VGA], page, end - page);
>> +
>> +    rcu_read_lock();
>> +
>> +    for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
>> +        blocks[i] = atomic_rcu_read(&ram_list.dirty_memory[i]);
>>      }
>> -    if (unlikely(mask & (1 << DIRTY_MEMORY_CODE))) {
>> -        bitmap_set_atomic(d[DIRTY_MEMORY_CODE], page, end - page);
>> +
>> +    while (page < end) {
>> +        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
>> +        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
>> +        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
>> +
>> +        if (likely(mask & (1 << DIRTY_MEMORY_MIGRATION))) {
>> +            bitmap_set_atomic(blocks[DIRTY_MEMORY_MIGRATION]->blocks[idx],
>> +                              offset, num);
>> +        }
>> +        if (unlikely(mask & (1 << DIRTY_MEMORY_VGA))) {
>> +            bitmap_set_atomic(blocks[DIRTY_MEMORY_VGA]->blocks[idx],
>> +                              offset, num);
>> +        }
>> +        if (unlikely(mask & (1 << DIRTY_MEMORY_CODE))) {
>> +            bitmap_set_atomic(blocks[DIRTY_MEMORY_CODE]->blocks[idx],
>> +                              offset, num);
>> +        }
>> +
>> +        page += num;
>>      }
>> +
>> +    rcu_read_unlock();
>> +
>>      xen_modified_memory(start, length);
>>  }
>>  
>> @@ -199,21 +305,41 @@ static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
>>      /* start address is aligned at the start of a word? */
>>      if ((((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) &&
>>          (hpratio == 1)) {
>> +        unsigned long **blocks[DIRTY_MEMORY_NUM];
>> +        unsigned long idx;
>> +        unsigned long offset;
>>          long k;
>>          long nr = BITS_TO_LONGS(pages);
>>  
>> +        idx = (start >> TARGET_PAGE_BITS) / DIRTY_MEMORY_BLOCK_SIZE;
>> +        offset = BIT_WORD((start >> TARGET_PAGE_BITS) %
>> +                          DIRTY_MEMORY_BLOCK_SIZE);
>> +
>> +        rcu_read_lock();
>> +
>> +        for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
>> +            blocks[i] = atomic_rcu_read(&ram_list.dirty_memory[i])->blocks;
>> +        }
>> +
>>          for (k = 0; k < nr; k++) {
>>              if (bitmap[k]) {
>>                  unsigned long temp = leul_to_cpu(bitmap[k]);
>> -                unsigned long **d = ram_list.dirty_memory;
>>  
>> -                atomic_or(&d[DIRTY_MEMORY_MIGRATION][page + k], temp);
>> -                atomic_or(&d[DIRTY_MEMORY_VGA][page + k], temp);
>> +                atomic_or(&blocks[DIRTY_MEMORY_MIGRATION][idx][offset], temp);
>> +                atomic_or(&blocks[DIRTY_MEMORY_VGA][idx][offset], temp);
>>                  if (tcg_enabled()) {
>> -                    atomic_or(&d[DIRTY_MEMORY_CODE][page + k], temp);
>> +                    atomic_or(&blocks[DIRTY_MEMORY_CODE][idx][offset], temp);
>>                  }
>>              }
>> +
>> +            if (++offset >= BITS_TO_LONGS(DIRTY_MEMORY_BLOCK_SIZE)) {
>> +                offset = 0;
>> +                idx++;
>> +            }
>>          }
>> +
>> +        rcu_read_unlock();
>> +
>>          xen_modified_memory(start, pages << TARGET_PAGE_BITS);
>>      } else {
>>          uint8_t clients = tcg_enabled() ? DIRTY_CLIENTS_ALL : DIRTY_CLIENTS_NOCODE;
>> @@ -265,18 +391,33 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
>>      if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) {
>>          int k;
>>          int nr = BITS_TO_LONGS(length >> TARGET_PAGE_BITS);
>> -        unsigned long *src = ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION];
>> +        unsigned long * const *src;
>> +        unsigned long idx = (page * BITS_PER_LONG) / DIRTY_MEMORY_BLOCK_SIZE;
>> +        unsigned long offset = BIT_WORD((page * BITS_PER_LONG) %
>> +                                        DIRTY_MEMORY_BLOCK_SIZE);
>> +
>> +        rcu_read_lock();
>> +
>> +        src = atomic_rcu_read(
>> +                &ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION])->blocks;
>>  
>>          for (k = page; k < page + nr; k++) {
>> -            if (src[k]) {
>> -                unsigned long bits = atomic_xchg(&src[k], 0);
>> +            if (src[idx][offset]) {
>> +                unsigned long bits = atomic_xchg(&src[idx][offset], 0);
>>                  unsigned long new_dirty;
>>                  new_dirty = ~dest[k];
>>                  dest[k] |= bits;
>>                  new_dirty &= bits;
>>                  num_dirty += ctpopl(new_dirty);
>>              }
>> +
>> +            if (++offset >= BITS_TO_LONGS(DIRTY_MEMORY_BLOCK_SIZE)) {
>> +                offset = 0;
>> +                idx++;
>> +            }
>>          }
>> +
>> +        rcu_read_unlock();
>>      } else {
>>          for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
>>              if (cpu_physical_memory_test_and_clear_dirty(
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 3cdfea4..96c749f 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -609,7 +609,6 @@ static void migration_bitmap_sync_init(void)
>>      iterations_prev = 0;
>>  }
>>  
>> -/* Called with iothread lock held, to protect ram_list.dirty_memory[] */
>>  static void migration_bitmap_sync(void)
>>  {
>>      RAMBlock *block;
>> @@ -1921,8 +1920,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>          acct_clear();
>>      }
>>  
>> -    /* iothread lock needed for ram_list.dirty_memory[] */
>> -    qemu_mutex_lock_iothread();
>>      qemu_mutex_lock_ramlist();
>>      rcu_read_lock();
>>      bytes_transferred = 0;
>> @@ -1947,7 +1944,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>      memory_global_dirty_log_start();
>>      migration_bitmap_sync();
>>      qemu_mutex_unlock_ramlist();
>> -    qemu_mutex_unlock_iothread();
>>  
>>      qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
>>  
>>
> 
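
A rough standalone sketch of the indexing scheme used by the new helpers above (illustrative only; DIRTY_MEMORY_BLOCK_SIZE is the only name taken from the patch, the helper and its output are made up for the example):

/*
 * Sketch only, not QEMU code: walk pages [page, end) and show which
 * dirty-bitmap block and bit range each chunk of the walk touches,
 * mirroring the idx/offset/num arithmetic in the patch above.
 */
#include <stdio.h>

#define DIRTY_MEMORY_BLOCK_SIZE ((unsigned long)256 * 1024 * 8)

static void show_block_walk(unsigned long page, unsigned long end)
{
    while (page < end) {
        unsigned long idx    = page / DIRTY_MEMORY_BLOCK_SIZE;  /* which block      */
        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;  /* bit within block */
        unsigned long num    = end - page;

        /* clamp to the end of the current block, as the patch's loops do */
        if (num > DIRTY_MEMORY_BLOCK_SIZE - offset) {
            num = DIRTY_MEMORY_BLOCK_SIZE - offset;
        }

        printf("block %lu: bits [%lu, %lu)\n", idx, offset, offset + num);
        page += num;
    }
}

int main(void)
{
    /*
     * One block is 256 * 1024 * 8 = 2097152 bits, i.e. a 256 KiB bitmap;
     * with 4 KiB target pages that covers 8 GiB of guest RAM, so this
     * range crosses from block 0 into block 1.
     */
    show_block_walk(2097150, 2097160);
    return 0;
}

The real helpers then take rcu_read_lock(), atomic_rcu_read() the DirtyMemoryBlocks pointer and apply find_next_bit()/bitmap_set_atomic()/atomic_or() to blocks->blocks[idx] over that [offset, offset + num) range, which is why growing the array for RAM hotplug only has to publish a new pointer.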



Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
2016-02-09 12:13 [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08 Paolo Bonzini
2016-02-09 12:13 ` [Qemu-devel] [PULL 02/32] memory: RCU ram_list.dirty_memory[] for safe RAM hotplug Paolo Bonzini
2016-02-10 16:56   ` Leon Alrae
2016-02-10 17:01     ` Paolo Bonzini
2016-02-09 12:13 ` [Qemu-devel] [PULL 08/32] hw: Add support for LSI SAS1068 (mptsas) device Paolo Bonzini
2016-02-09 12:13 ` [Qemu-devel] [PULL 29/32] docs/memory.txt: Improve list of different memory regions Paolo Bonzini
2016-02-09 12:13 ` [Qemu-devel] [PULL 30/32] target-i386: fix PSE36 mode Paolo Bonzini
2016-02-09 12:13 ` [Qemu-devel] [PULL 31/32] MAINTAINERS: add all-match entry for qemu-devel@ Paolo Bonzini
2016-02-09 13:47   ` Markus Armbruster
2016-02-09 15:01     ` Paolo Bonzini
2016-02-09 15:24       ` Markus Armbruster
2016-02-09 15:27         ` Peter Maydell
2016-02-09 12:13 ` [Qemu-devel] [PULL 32/32] qemu-char, io: fix ordering of arguments for UDP socket creation Paolo Bonzini
2016-02-09 14:20 ` [Qemu-devel] [PULL v2 00/32] Misc changes for 2016-02-08 Peter Maydell
2016-02-09 14:41   ` Paolo Bonzini
2016-02-09 17:01     ` Richard Henderson
