* [PATCH v4 0/3] PCI Shared memory device
@ 2010-04-07 22:51 ` Cam Macdonell
  0 siblings, 0 replies; 30+ messages in thread
From: Cam Macdonell @ 2010-04-07 22:51 UTC (permalink / raw)
  To: kvm; +Cc: qemu-devel, Cam Macdonell

Latest patch series for the PCI shared memory device, which maps a host shared
memory object to be shared between guests.

New in this series:
    - moved to a single Doorbell register, using datamatch to trigger different
      VMs rather than one register per eventfd
    - removed writing arbitrary values to eventfds; only values of 1 are now
      written, to ensure correct usage

Cam Macdonell (3):
  Device specification for shared memory PCI device
  Support adding a file to qemu's ram allocation
  Inter-VM shared memory PCI device

 Makefile.target                    |    3 +
 cpu-common.h                       |    1 +
 docs/specs/ivshmem_device_spec.txt |   85 +++++
 exec.c                             |   33 ++
 hw/ivshmem.c                       |  700 ++++++++++++++++++++++++++++++++++++
 qemu-char.c                        |    6 +
 qemu-char.h                        |    3 +
 7 files changed, 831 insertions(+), 0 deletions(-)
 create mode 100644 docs/specs/ivshmem_device_spec.txt
 create mode 100644 hw/ivshmem.c



* [PATCH v4 1/3] Device specification for shared memory PCI device
  2010-04-07 22:51 ` [Qemu-devel] " Cam Macdonell
@ 2010-04-07 22:51   ` Cam Macdonell
  -1 siblings, 0 replies; 30+ messages in thread
From: Cam Macdonell @ 2010-04-07 22:51 UTC (permalink / raw)
  To: kvm; +Cc: qemu-devel, Cam Macdonell

---
 docs/specs/ivshmem_device_spec.txt |   85 ++++++++++++++++++++++++++++++++++++
 1 files changed, 85 insertions(+), 0 deletions(-)
 create mode 100644 docs/specs/ivshmem_device_spec.txt

diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt
new file mode 100644
index 0000000..9895782
--- /dev/null
+++ b/docs/specs/ivshmem_device_spec.txt
@@ -0,0 +1,85 @@
+
+Device Specification for Inter-VM shared memory device
+------------------------------------------------------
+
+The Inter-VM shared memory device is designed to share a region of memory with
+userspace in multiple guests.  The memory region does not belong to any single
+guest; it is a POSIX shared memory object on the host.  Optionally, the device
+may support sending interrupts to other guests sharing the same memory region.
+
+The Inter-VM PCI device
+-----------------------
+
+BARs
+
+The device supports three BARs.  BAR0 is a 1-Kbyte MMIO region for the device
+registers.  BAR1 is used for MSI-X when it is enabled in the device.  BAR2
+maps the shared memory object from the host.  The size of BAR2 is specified
+when the guest is started and must be a power of 2.
+
+Registers
+
+The device currently supports four 32-bit registers.  The registers are used
+for synchronization between guests sharing the same memory object when
+interrupts are supported (this requires using the shared memory server).
+
+The server assigns each VM an ID number and sends this ID number to the Qemu
+process when the guest starts.
+
+enum ivshmem_registers {
+    IntrMask = 0,
+    IntrStatus = 4,
+    IVPosition = 8,
+    Doorbell = 12
+};
+
+The first two registers are the interrupt mask and status registers.  Mask and
+status are only used with pin-based interrupts.  They are unused with MSI
+interrupts.  The IVPosition register is read-only and reports the guest's ID
+number.  To interrupt another guest, a guest must write to the Doorbell
+register.  The Doorbell register is 32 bits wide, logically divided into two
+16-bit fields.  The high 16 bits are the ID of the guest to interrupt and the
+low 16 bits are the interrupt vector to trigger.
+
+The semantics of the value written to the Doorbell depend on whether the
+device is using MSI or a regular pin-based interrupt.  In short, MSI uses
+vectors while regular interrupts set the status register.
+
+Regular Interrupts
+------------------
+
+If regular interrupts are used (because the guest does not support MSI, or the
+user disabled them at startup) then the value written to the lower 16 bits of
+the Doorbell register is arbitrary; any write will trigger an interrupt in the
+destination guest.
+
+An interrupt is also generated when a new guest accesses the shared memory
+region.  A status of (2^32 - 1) indicates that a new guest has joined.
+
+Message Signalled Interrupts
+----------------------------
+
+An ivshmem device may support multiple MSI vectors.  If so, the lower 16 bits
+written to the Doorbell register must be between 1 and the maximum number of
+vectors the guest supports.  The lower 16 bits written to the Doorbell are the
+MSI vector that will be raised in the destination guest.  The number of MSI
+vectors can vary, but it is fixed when the VM is started.  Vector 0 is reserved
+to notify that a new guest has joined, so guests should not use vector 0 for
+any other purpose.
+
+The important thing to remember with MSI is that it is only a signal; no status
+register is set (since MSI interrupts are not shared).  All information other
+than the interrupt itself should be communicated via the shared memory region.
+Devices supporting multiple MSI vectors can use different vectors to indicate
+that different events have occurred.  The semantics of the interrupt vectors
+are left to the user's discretion.
+
+Usage in the Guest
+------------------
+
+The shared memory device is intended to be used with the provided UIO driver.
+Very little configuration is needed.  The guest should map BAR0 to access the
+registers (an array of 32-bit ints allows simple writing) and map BAR2 to
+access the shared memory region itself.  The size of the shared memory region
+is specified when the guest (or shared memory server) is started.  A guest may
+map the whole shared memory region or only part of it.
-- 
1.6.0.6




* [PATCH v4 2/3] Support adding a file to qemu's ram allocation
  2010-04-07 22:51   ` [Qemu-devel] " Cam Macdonell
@ 2010-04-07 22:51     ` Cam Macdonell
  -1 siblings, 0 replies; 30+ messages in thread
From: Cam Macdonell @ 2010-04-07 22:51 UTC (permalink / raw)
  To: kvm; +Cc: qemu-devel, Cam Macdonell

This avoids the need to use qemu_ram_alloc followed by mmap with MAP_FIXED to
map a host file into guest RAM.  This function mmaps the opened file anywhere
and adds the memory to the ram blocks.

Usage is:

qemu_ram_mmap(fd, size, MAP_SHARED, offset);
---
 cpu-common.h |    1 +
 exec.c       |   33 +++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index 49c7fb3..87c82fc 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -32,6 +32,7 @@ static inline void cpu_register_physical_memory(target_phys_addr_t start_addr,
 }
 
 ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
+ram_addr_t qemu_ram_mmap(int, ram_addr_t, int, int);
 ram_addr_t qemu_ram_alloc(ram_addr_t);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
diff --git a/exec.c b/exec.c
index 467a0e7..2303be7 100644
--- a/exec.c
+++ b/exec.c
@@ -2811,6 +2811,39 @@ static void *file_ram_alloc(ram_addr_t memory, const char *path)
 }
 #endif
 
+ram_addr_t qemu_ram_mmap(int fd, ram_addr_t size, int flags, int offset)
+{
+    RAMBlock *new_block;
+
+    size = TARGET_PAGE_ALIGN(size);
+    new_block = qemu_malloc(sizeof(*new_block));
+
+    /* map the file passed as a parameter into this part of memory */
+    new_block->host = mmap(0, size, PROT_READ|PROT_WRITE, flags, fd, offset);
+
+#ifdef MADV_MERGEABLE
+    madvise(new_block->host, size, MADV_MERGEABLE);
+#endif
+
+    new_block->offset = last_ram_offset;
+    new_block->length = size;
+
+    new_block->next = ram_blocks;
+    ram_blocks = new_block;
+
+    phys_ram_dirty = qemu_realloc(phys_ram_dirty,
+        (last_ram_offset + size) >> TARGET_PAGE_BITS);
+    memset(phys_ram_dirty + (last_ram_offset >> TARGET_PAGE_BITS),
+           0xff, size >> TARGET_PAGE_BITS);
+
+    last_ram_offset += size;
+
+    if (kvm_enabled())
+        kvm_setup_guest_memory(new_block->host, size);
+
+    return new_block->offset;
+}
+
 ram_addr_t qemu_ram_alloc(ram_addr_t size)
 {
     RAMBlock *new_block;
-- 
1.6.0.6




* [PATCH v4 3/3] Inter-VM shared memory PCI device
  2010-04-07 22:51     ` [Qemu-devel] " Cam Macdonell
@ 2010-04-07 22:52       ` Cam Macdonell
  -1 siblings, 0 replies; 30+ messages in thread
From: Cam Macdonell @ 2010-04-07 22:52 UTC (permalink / raw)
  To: kvm; +Cc: qemu-devel, Cam Macdonell

Support an inter-VM shared memory device that maps a shared-memory object as a
PCI device in the guest.  This patch also supports interrupts between guests by
communicating over a unix domain socket.  This patch applies to the qemu-kvm
repository.

    -device ivshmem,size=<size in MB>[,shm=<shm name>]

Interrupts are supported between multiple VMs by using a shared memory server,
which each guest connects to via a chardev socket.

    -device ivshmem,size=<size in MB>[,shm=<shm name>][,chardev=<id>][,msi=on]
            [,irqfd=on][,vectors=n]
    -chardev socket,path=<path>,id=<id>

Sample programs, init scripts and the shared memory server are available in a
git repo here:

    www.gitorious.org/nahanni
---
 Makefile.target |    3 +
 hw/ivshmem.c    |  700 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-char.c     |    6 +
 qemu-char.h     |    3 +
 4 files changed, 712 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index 1ffd802..bc9a681 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -199,6 +199,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
 obj-y += rtl8139.o
 obj-y += e1000.o
 
+# Inter-VM PCI shared memory
+obj-y += ivshmem.o
+
 # Hardware support
 obj-i386-y = pckbd.o dma.o
 obj-i386-y += vga.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 0000000..2ec6c2c
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,700 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *      Cam Macdonell <cam@cs.ualberta.ca>
+ *
+ * Based On: cirrus_vga.c and rtl8139.c
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/io.h>
+#include <sys/ioctl.h>
+#include <sys/eventfd.h>
+#include "hw.h"
+#include "console.h"
+#include "pc.h"
+#include "pci.h"
+#include "sysemu.h"
+
+#include "msix.h"
+#include "qemu-kvm.h"
+#include "libkvm.h"
+
+#include <sys/eventfd.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+
+#define PCI_COMMAND_IOACCESS                0x0001
+#define PCI_COMMAND_MEMACCESS               0x0002
+
+#define DEBUG_IVSHMEM
+
+#define IVSHMEM_IRQFD   0
+#define IVSHMEM_MSI     1
+#define IVSHMEM_MAX_EVENTFDS  16
+
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, args...)        \
+    do {printf("IVSHMEM: " fmt, ##args); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, args...)
+#endif
+
+#define NEW_GUEST_VAL UINT_MAX
+
+struct eventfd_entry {
+    PCIDevice *pdev;
+    int vector;
+};
+
+typedef struct IVShmemState {
+    PCIDevice dev;
+    uint32_t intrmask;
+    uint32_t intrstatus;
+    uint32_t doorbell;
+
+    CharDriverState * chr;
+    CharDriverState ** eventfd_chr;
+    int ivshmem_mmio_io_addr;
+
+    pcibus_t mmio_addr;
+    uint8_t *ivshmem_ptr;
+    unsigned long ivshmem_offset;
+    unsigned int ivshmem_size;
+    int shm_fd; /* shared memory file descriptor */
+
+    /* array of eventfds for each guest */
+    int * eventfds[IVSHMEM_MAX_EVENTFDS];
+    /* keep track of # of eventfds for each guest */
+    int * eventfds_posn_count;
+
+    int vm_id;
+    int num_eventfds;
+    uint32_t vectors;
+    uint32_t features;
+    struct eventfd_entry eventfd_table[IVSHMEM_MAX_EVENTFDS];
+
+    char * shmobj;
+    uint32_t size; /*size of shared memory in MB*/
+} IVShmemState;
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+    IntrMask = 0,
+    IntrStatus = 4,
+    IVPosition = 8,
+    Doorbell = 12,
+};
+
+static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, int feature) {
+    return (ivs->features & (1 << feature));
+}
+
+static inline int is_power_of_two(int x) {
+    return (x & (x-1)) == 0;
+}
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+                    pcibus_t addr, pcibus_t size, int type)
+{
+    IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
+
+    IVSHMEM_DPRINTF("addr = %u size = %u\n", (uint32_t)addr, (uint32_t)size);
+    cpu_register_physical_memory(addr, s->ivshmem_size, s->ivshmem_offset);
+
+}
+
+/* accessing registers - based on rtl8139 */
+static void ivshmem_update_irq(IVShmemState *s, int val)
+{
+    int isr;
+    isr = (s->intrstatus & s->intrmask) & 0xffffffff;
+
+    /* don't print ISR resets */
+    if (isr) {
+        IVSHMEM_DPRINTF("Set IRQ to %d (%04x %04x)\n",
+           isr ? 1 : 0, s->intrstatus, s->intrmask);
+    }
+
+    qemu_set_irq(s->dev.irq[0], (isr != 0));
+}
+
+static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
+{
+    IVSHMEM_DPRINTF("IntrMask write(w) val = 0x%04x\n", val);
+
+    s->intrmask = val;
+
+    ivshmem_update_irq(s, val);
+}
+
+static uint32_t ivshmem_IntrMask_read(IVShmemState *s)
+{
+    uint32_t ret = s->intrmask;
+
+    IVSHMEM_DPRINTF("intrmask read(w) val = 0x%04x\n", ret);
+
+    return ret;
+}
+
+static void ivshmem_IntrStatus_write(IVShmemState *s, uint32_t val)
+{
+    IVSHMEM_DPRINTF("IntrStatus write(w) val = 0x%04x\n", val);
+
+    s->intrstatus = val;
+
+    ivshmem_update_irq(s, val);
+    return;
+}
+
+static uint32_t ivshmem_IntrStatus_read(IVShmemState *s)
+{
+    uint32_t ret = s->intrstatus;
+
+    /* reading ISR clears all interrupts */
+    s->intrstatus = 0;
+
+    ivshmem_update_irq(s, 0);
+
+    return ret;
+}
+
+static void ivshmem_io_writew(void *opaque, uint8_t addr, uint32_t val)
+{
+
+    IVSHMEM_DPRINTF("We shouldn't be writing words\n");
+}
+
+static void ivshmem_io_writel(void *opaque, uint8_t addr, uint32_t val)
+{
+    IVShmemState *s = opaque;
+
+    u_int64_t write_one = 1;
+    u_int16_t dest = val >> 16;
+    u_int16_t vector = val & 0xffff;
+
+    addr &= 0xfe;
+
+    switch (addr)
+    {
+        case IntrMask:
+            ivshmem_IntrMask_write(s, val);
+            break;
+
+        case IntrStatus:
+            ivshmem_IntrStatus_write(s, val);
+            break;
+
+        case Doorbell:
+            /* check doorbell range */
+            IVSHMEM_DPRINTF("Writing %ld to VM %d on vector %d\n", write_one, dest, vector);
+            if ((vector > 0) && (vector < s->eventfds_posn_count[dest])) {
+                if (write(s->eventfds[dest][vector], &(write_one), 8) != 8) {
+                    IVSHMEM_DPRINTF("error writing to eventfd\n");
+                }
+            }
+            break;
+        default:
+            IVSHMEM_DPRINTF("Invalid VM Doorbell VM %d\n", dest);
+    }
+}
+
+static void ivshmem_io_writeb(void *opaque, uint8_t addr, uint32_t val)
+{
+    IVSHMEM_DPRINTF("We shouldn't be writing bytes\n");
+}
+
+static uint32_t ivshmem_io_readw(void *opaque, uint8_t addr)
+{
+
+    IVSHMEM_DPRINTF("We shouldn't be reading words\n");
+    return 0;
+}
+
+static uint32_t ivshmem_io_readl(void *opaque, uint8_t addr)
+{
+
+    IVShmemState *s = opaque;
+    uint32_t ret;
+
+    switch (addr)
+    {
+        case IntrMask:
+            ret = ivshmem_IntrMask_read(s);
+            break;
+
+        case IntrStatus:
+            ret = ivshmem_IntrStatus_read(s);
+            break;
+
+        case IVPosition:
+            /* return my id in the ivshmem list */
+            ret = s->vm_id;
+            break;
+
+        default:
+            IVSHMEM_DPRINTF("why are we reading 0x%x\n", addr);
+            ret = 0;
+    }
+
+    return ret;
+
+}
+
+static uint32_t ivshmem_io_readb(void *opaque, uint8_t addr)
+{
+    IVSHMEM_DPRINTF("We shouldn't be reading bytes\n");
+
+    return 0;
+}
+
+static void ivshmem_mmio_writeb(void *opaque,
+                                target_phys_addr_t addr, uint32_t val)
+{
+    ivshmem_io_writeb(opaque, addr & 0xFF, val);
+}
+
+static void ivshmem_mmio_writew(void *opaque,
+                                target_phys_addr_t addr, uint32_t val)
+{
+    ivshmem_io_writew(opaque, addr & 0xFF, val);
+}
+
+static void ivshmem_mmio_writel(void *opaque,
+                                target_phys_addr_t addr, uint32_t val)
+{
+    ivshmem_io_writel(opaque, addr & 0xFF, val);
+}
+
+static uint32_t ivshmem_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+    return ivshmem_io_readb(opaque, addr & 0xFF);
+}
+
+static uint32_t ivshmem_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+    uint32_t val = ivshmem_io_readw(opaque, addr & 0xFF);
+    return val;
+}
+
+static uint32_t ivshmem_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+    uint32_t val = ivshmem_io_readl(opaque, addr & 0xFF);
+    return val;
+}
+
+static CPUReadMemoryFunc *ivshmem_mmio_read[3] = {
+    ivshmem_mmio_readb,
+    ivshmem_mmio_readw,
+    ivshmem_mmio_readl,
+};
+
+static CPUWriteMemoryFunc *ivshmem_mmio_write[3] = {
+    ivshmem_mmio_writeb,
+    ivshmem_mmio_writew,
+    ivshmem_mmio_writel,
+};
+
+static void ivshmem_receive(void *opaque, const uint8_t *buf, int size)
+{
+    IVShmemState *s = opaque;
+
+    ivshmem_IntrStatus_write(s, *buf);
+
+    IVSHMEM_DPRINTF("ivshmem_receive 0x%02x\n", *buf);
+}
+
+static int ivshmem_can_receive(void * opaque)
+{
+    return 8;
+}
+
+static void ivshmem_event(void *opaque, int event)
+{
+//    IVShmemState *s = opaque;
+    IVSHMEM_DPRINTF("ivshmem_event %d\n", event);
+}
+
+static void fake_irqfd(void *opaque, const uint8_t *buf, int size) {
+
+    struct eventfd_entry *entry = opaque;
+    PCIDevice *pdev = entry->pdev;
+
+    IVSHMEM_DPRINTF("fake irqfd on vector %d\n", entry->vector);
+    msix_notify(pdev, entry->vector);
+}
+
+static CharDriverState* create_eventfd_chr_device(void * opaque, int eventfd,
+                                                                    int vector)
+{
+    /* create an event character device based on the passed eventfd */
+    IVShmemState *s = opaque;
+    CharDriverState * chr;
+
+    chr = qemu_chr_open_eventfd(eventfd);
+
+    if (chr == NULL) {
+        IVSHMEM_DPRINTF("creating eventfd for eventfd %d failed\n", eventfd);
+        exit(-1);
+    }
+
+    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
+        s->eventfd_table[vector].pdev = &s->dev;
+        s->eventfd_table[vector].vector = vector;
+
+        qemu_chr_add_handlers(chr, ivshmem_can_receive, fake_irqfd,
+                      ivshmem_event, &s->eventfd_table[vector]);
+    } else {
+        qemu_chr_add_handlers(chr, ivshmem_can_receive, ivshmem_receive,
+                      ivshmem_event, s);
+    }
+
+    return chr;
+
+}
+
+static int check_shm_size(IVShmemState *s, int shmemfd) {
+    /* check that the guest isn't going to try and map more memory than the
+     * shared memory server allocated; return -1 to indicate an error */
+
+    struct stat buf;
+
+    fstat(shmemfd, &buf);
+
+    if (s->ivshmem_size > buf.st_size) {
+        fprintf(stderr, "IVSHMEM ERROR: Requested memory size greater");
+        fprintf(stderr, " than shared object size (%d > %ld)\n",
+                                          s->ivshmem_size, buf.st_size);
+        return -1;
+    } else {
+        return 0;
+    }
+}
+
+static void create_shared_memory_BAR(IVShmemState *s, int fd) {
+
+    s->shm_fd = fd;
+
+    s->ivshmem_offset = qemu_ram_mmap(s->shm_fd, s->ivshmem_size,
+             MAP_SHARED, 0);
+
+    s->ivshmem_ptr = qemu_get_ram_ptr(s->ivshmem_offset);
+
+    /* region for shared memory */
+    pci_register_bar(&s->dev, 2, s->ivshmem_size,
+                       PCI_BASE_ADDRESS_SPACE_MEMORY, ivshmem_map);
+}
+
+static int ivshmem_irqfd(PCIDevice* pdev, uint16_t vector, int fd)
+{
+    struct kvm_irqfd call = { };
+    int r;
+
+    IVSHMEM_DPRINTF("inside irqfd\n");
+    if (vector >= pdev->msix_entries_nr)
+        return -EINVAL;
+    call.fd = fd;
+    call.gsi = pdev->msix_irq_entries[vector].gsi;
+    r = kvm_vm_ioctl(kvm_state, KVM_IRQFD, &call);
+    if (r < 0) {
+        IVSHMEM_DPRINTF("allocating irqfd failed %d\n", r);
+        return r;
+    }
+    return 0;
+}
+
+static int ivshmem_ioeventfd(IVShmemState* s, int posn, int fd, int vector)
+{
+
+    int ret;
+    struct kvm_ioeventfd iofd;
+
+    iofd.datamatch = (posn << 16) | vector;
+    iofd.addr = s->mmio_addr + Doorbell;
+    iofd.len = 4;
+    iofd.flags = KVM_IOEVENTFD_FLAG_DATAMATCH;
+    iofd.fd = fd;
+
+    ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &iofd);
+
+    if (ret < 0) {
+        fprintf(stderr, "error assigning ioeventfd (%d)\n", ret);
+        perror(strerror(ret));
+    } else {
+        IVSHMEM_DPRINTF("success assigning ioeventfd (%d:%d)\n", posn, vector);
+    }
+
+    return ret;
+}
+/* notify that a new guest has joined */
+static void new_guest_interrupt(IVShmemState *s)
+{
+    if (msix_enabled(&s->dev)) {
+        msix_notify(&s->dev, 0);
+    } else {
+        ivshmem_IntrStatus_write(s, NEW_GUEST_VAL);
+    }
+}
+
+static void close_guest_eventfds(IVShmemState *s, int posn)
+{
+    int i, guest_curr_max;
+
+    guest_curr_max = s->eventfds_posn_count[posn];
+
+    for (i = 0; i < guest_curr_max; i++)
+        close(s->eventfds[posn][i]);
+
+    free(s->eventfds[posn]);
+    s->eventfds_posn_count[posn] = 0;
+}
+
+static void ivshmem_read(void *opaque, const uint8_t * buf, int flags)
+{
+    IVShmemState *s = opaque;
+    int incoming_fd, tmp_fd;
+    int guest_curr_max;
+    long incoming_posn;
+
+    memcpy(&incoming_posn, buf, sizeof(long));
+    /* pick off s->chr->msgfd and store it, posn should accompany msg */
+    tmp_fd = qemu_chr_get_msgfd(s->chr);
+    IVSHMEM_DPRINTF("posn is %ld, fd is %d\n", incoming_posn, tmp_fd);
+
+    if (tmp_fd == -1) {
+        /* if posn is positive and unseen before then this is our posn*/
+        if ((incoming_posn >= 0) && (s->eventfds[incoming_posn] == NULL)) {
+            /* receive our posn */
+            s->vm_id = incoming_posn;
+            return;
+        } else {
+            /* otherwise an fd == -1 means an existing guest has gone away */
+            IVSHMEM_DPRINTF("posn %ld has gone away\n", incoming_posn);
+            close_guest_eventfds(s, incoming_posn);
+            return;
+        }
+    }
+
+    /* because of the implementation of get_msgfd, we need a dup */
+    incoming_fd = dup(tmp_fd);
+
+    /* if the position is -1, then it's shared memory region fd */
+    if (incoming_posn == -1) {
+
+        s->num_eventfds = 0;
+
+        if (check_shm_size(s, incoming_fd) == -1) {
+            exit(-1);
+        }
+
+        /* creating a BAR in qemu_chr callback may be crazy */
+        create_shared_memory_BAR(s, incoming_fd);
+
+       return;
+    }
+
+    /* each guest has an array of eventfds, and we keep track of how many
+     * eventfds each guest has */
+    guest_curr_max = s->eventfds_posn_count[incoming_posn];
+    if (guest_curr_max == 0) {
+        s->eventfds[incoming_posn] = (int *) malloc(IVSHMEM_MAX_EVENTFDS *
+                                                                sizeof(int));
+        new_guest_interrupt(s);
+    }
+
+    /* this is an eventfd for a particular guest VM */
+    IVSHMEM_DPRINTF("eventfds[%ld][%d] = %d\n", incoming_posn, guest_curr_max,
+                                                                incoming_fd);
+    s->eventfds[incoming_posn][guest_curr_max] = incoming_fd;
+
+    /* increment count for particular guest */
+    s->eventfds_posn_count[incoming_posn]++;
+
+    /* ioeventfd and irqfd are enabled together,
+     * so the flag IRQFD refers to both */
+    if (ivshmem_has_feature(s, IVSHMEM_IRQFD) && guest_curr_max > 0) {
+        /* allocate ioeventfd for the new fd
+         * received for guest @ incoming_posn */
+        ivshmem_ioeventfd(s, incoming_posn, incoming_fd, guest_curr_max);
+    }
+
+    /* keep track of the maximum VM ID */
+    if (incoming_posn > s->num_eventfds) {
+        s->num_eventfds = incoming_posn;
+    }
+
+    if (incoming_posn == s->vm_id) {
+        if (ivshmem_has_feature(s, IVSHMEM_IRQFD)) {
+            /* setup irqfd for this VM's eventfd */
+            ivshmem_irqfd(&s->dev, guest_curr_max,
+                                s->eventfds[s->vm_id][guest_curr_max]);
+        } else {
+            /* initialize a char device for callback
+             * if this is one of my eventfds */
+            s->eventfd_chr[guest_curr_max] = create_eventfd_chr_device(s,
+                s->eventfds[s->vm_id][guest_curr_max], guest_curr_max);
+        }
+    }
+
+    return;
+}
+
+static void ivshmem_reset(DeviceState *d)
+{
+    return;
+}
+
+static void ivshmem_mmio_map(PCIDevice *pci_dev, int region_num,
+                       pcibus_t addr, pcibus_t size, int type)
+{
+    IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
+
+    s->mmio_addr = addr;
+    cpu_register_physical_memory(addr + 0, 0x400, s->ivshmem_mmio_io_addr);
+
+    /* now that our mmio region has been allocated, we can receive
+     * the file descriptors */
+    qemu_chr_add_handlers(s->chr, ivshmem_can_receive, ivshmem_read,
+                     ivshmem_event, s);
+
+}
+
+static int pci_ivshmem_init(PCIDevice *dev)
+{
+    IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
+    uint8_t *pci_conf;
+    int i;
+
+    /* BAR sizes must be a power of 2 */
+    if (s->size > 0 && is_power_of_two(s->size)) {
+        s->ivshmem_size = s->size * 1024 * 1024;
+    } else {
+        fprintf(stderr, "ivshmem: size must be a non-zero power of 2\n");
+        exit(1);
+    }
+
+    /* IRQFD requires MSI */
+    if (ivshmem_has_feature(s, IVSHMEM_IRQFD) &&
+        !ivshmem_has_feature(s, IVSHMEM_MSI)) {
+        fprintf(stderr, "ivshmem: ioeventfd/irqfd requires MSI\n");
+        exit(1);
+    }
+
+    pci_conf = s->dev.config;
+    pci_conf[0x00] = 0xf4; /* Qumranet vendor ID 0x1af4 */
+    pci_conf[0x01] = 0x1a;
+    pci_conf[0x02] = 0x10; /* device ID 0x1110 */
+    pci_conf[0x03] = 0x11;
+    pci_conf[0x04] = PCI_COMMAND_IOACCESS | PCI_COMMAND_MEMACCESS;
+    pci_conf[0x0a] = 0x00; /* class 0x0500: RAM controller */
+    pci_conf[0x0b] = 0x05;
+    pci_conf[0x0e] = 0x00; /* header type */
+
+    s->ivshmem_mmio_io_addr = cpu_register_io_memory(ivshmem_mmio_read,
+                                    ivshmem_mmio_write, s);
+    /* region for registers */
+    pci_register_bar(&s->dev, 0, 0x400,
+                           PCI_BASE_ADDRESS_SPACE_MEMORY, ivshmem_mmio_map);
+
+    /* allocate the MSI-X vectors */
+    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
+
+        if (!msix_init(&s->dev, s->vectors, 1, 0)) {
+            pci_register_bar(&s->dev, 1,
+                             msix_bar_size(&s->dev),
+                             PCI_BASE_ADDRESS_SPACE_MEMORY,
+                             msix_mmio_map);
+            IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
+        } else {
+            IVSHMEM_DPRINTF("msix initialization failed\n");
+        }
+
+        /* 'activate' the vectors */
+        for (i = 0; i < s->vectors; i++) {
+            msix_vector_use(&s->dev, i);
+        }
+    }
+
+    if ((s->chr != NULL) && (strncmp(s->chr->filename, "unix:", 5) == 0)) {
+        /* if we get a UNIX socket as the parameter we will talk
+         * to the ivshmem server later once the MMIO BAR is actually
+         * allocated (see ivshmem_mmio_map) */
+
+        s->eventfds_posn_count = qemu_mallocz(IVSHMEM_MAX_EVENTFDS *
+                                                                sizeof(int));
+
+        IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n",
+                                                            s->chr->filename);
+
+        s->vm_id = -1;
+
+    } else {
+        /* just map the file immediately, we're not using a server */
+        int fd;
+
+        IVSHMEM_DPRINTF("using shm_open (shm object = %s)\n", s->shmobj);
+
+        if ((fd = shm_open(s->shmobj, O_CREAT|O_RDWR,
+                        S_IRWXU|S_IRWXG|S_IRWXO)) < 0) {
+            fprintf(stderr, "kvm_ivshmem: could not open shared file\n");
+            exit(-1);
+        }
+
+        /* size the shared object that backs the PCI device's memory */
+        if (ftruncate(fd, s->ivshmem_size) != 0) {
+            fprintf(stderr, "kvm_ivshmem: could not truncate shared file\n");
+        }
+
+        create_shared_memory_BAR(s, fd);
+
+    }
+
+    IVSHMEM_DPRINTF("shared memory size = %d MB\n", s->size);
+
+    pci_conf[PCI_INTERRUPT_PIN] = 1; /* we are going to support interrupts */
+
+    if (!ivshmem_has_feature(s, IVSHMEM_IRQFD)) {
+        s->eventfd_chr = (CharDriverState **)malloc(IVSHMEM_MAX_EVENTFDS *
+                                                            sizeof(void *));
+    }
+
+    return 0;
+}
+
+static int pci_ivshmem_uninit(PCIDevice *dev)
+{
+    IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
+
+    cpu_unregister_io_memory(s->ivshmem_mmio_io_addr);
+
+    return 0;
+}
+
+static PCIDeviceInfo ivshmem_info = {
+    .qdev.name  = "ivshmem",
+    .qdev.size  = sizeof(IVShmemState),
+    .qdev.reset = ivshmem_reset,
+    .init       = pci_ivshmem_init,
+    .exit       = pci_ivshmem_uninit,
+    .qdev.props = (Property[]) {
+        DEFINE_PROP_CHR("chardev", IVShmemState, chr),
+        DEFINE_PROP_UINT32("size", IVShmemState, size, 0),
+        DEFINE_PROP_UINT32("vectors", IVShmemState, vectors, 2),
+        DEFINE_PROP_BIT("irqfd", IVShmemState, features, IVSHMEM_IRQFD, false),
+        DEFINE_PROP_BIT("msi", IVShmemState, features, IVSHMEM_MSI, true),
+        DEFINE_PROP_STRING("shm", IVShmemState, shmobj),
+        DEFINE_PROP_END_OF_LIST(),
+    }
+};
+
+static void ivshmem_register_devices(void)
+{
+    pci_qdev_register(&ivshmem_info);
+}
+
+device_init(ivshmem_register_devices)
diff --git a/qemu-char.c b/qemu-char.c
index 048da3f..41cb8c7 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2076,6 +2076,12 @@ static void tcp_chr_read(void *opaque)
     }
 }
 
+CharDriverState *qemu_chr_open_eventfd(int eventfd)
+{
+    /* wrap the eventfd in a char device so qemu's main loop can poll it */
+    return qemu_chr_open_fd(eventfd, eventfd);
+}
+
 static void tcp_chr_connect(void *opaque)
 {
     CharDriverState *chr = opaque;
diff --git a/qemu-char.h b/qemu-char.h
index 3a9427b..1571091 100644
--- a/qemu-char.h
+++ b/qemu-char.h
@@ -93,6 +93,9 @@ void qemu_chr_info_print(Monitor *mon, const QObject *ret_data);
 void qemu_chr_info(Monitor *mon, QObject **ret_data);
 CharDriverState *qemu_chr_find(const char *name);
 
+/* add an eventfd to the qemu devices that are polled */
+CharDriverState *qemu_chr_open_eventfd(int eventfd);
+
 extern int term_escape_char;
 
 /* async I/O support */
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4] Shared memory uio_pci driver
  2010-04-07 22:51   ` [Qemu-devel] " Cam Macdonell
@ 2010-04-07 23:00     ` Cam Macdonell
  -1 siblings, 0 replies; 30+ messages in thread
From: Cam Macdonell @ 2010-04-07 23:00 UTC (permalink / raw)
  To: kvm; +Cc: qemu-devel, Cam Macdonell

This patch adds a driver for my shared memory PCI device using the uio_pci
interface.  The driver has three memory regions: the first holds the device
registers used to send interrupts, the second BAR receives the MSI-X
interrupts, and the third maps the shared memory.  The device only exports the
first and third memory regions to userspace.

This driver supports MSI-X and regular pin interrupts.  Currently, the number
of MSI vectors is set to 2 (one for new connections and the other for
interrupts) but it could easily be increased.  If MSI is not available, then
regular interrupts will be used.

This version added formatting and style corrections as well as better
error-checking and cleanup when errors occur.

---
 drivers/uio/Kconfig       |    8 ++
 drivers/uio/Makefile      |    1 +
 drivers/uio/uio_ivshmem.c |  252 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 261 insertions(+), 0 deletions(-)
 create mode 100644 drivers/uio/uio_ivshmem.c

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 1da73ec..b92cded 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -74,6 +74,14 @@ config UIO_SERCOS3
 
 	  If you compile this as a module, it will be called uio_sercos3.
 
+config UIO_IVSHMEM
+	tristate "KVM shared memory PCI driver"
+	default n
+	help
+	  Userspace I/O interface for the KVM shared memory device.  This
+	  driver will make available two memory regions, the first is
+	  registers and the second is a region for sharing between VMs.
+
 config UIO_PCI_GENERIC
 	tristate "Generic driver for PCI 2.3 and PCI Express cards"
 	depends on PCI
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index 18fd818..25c1ca5 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_UIO_AEC)	+= uio_aec.o
 obj-$(CONFIG_UIO_SERCOS3)	+= uio_sercos3.o
 obj-$(CONFIG_UIO_PCI_GENERIC)	+= uio_pci_generic.o
 obj-$(CONFIG_UIO_NETX)	+= uio_netx.o
+obj-$(CONFIG_UIO_IVSHMEM) += uio_ivshmem.o
diff --git a/drivers/uio/uio_ivshmem.c b/drivers/uio/uio_ivshmem.c
new file mode 100644
index 0000000..42ac9a7
--- /dev/null
+++ b/drivers/uio/uio_ivshmem.c
@@ -0,0 +1,252 @@
+/*
+ * UIO IVShmem Driver
+ *
+ * (C) 2009 Cam Macdonell
+ * based on Hilscher CIF card driver (C) 2007 Hans J. Koch <hjk@linutronix.de>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/uio_driver.h>
+
+#include <asm/io.h>
+
+#define IntrStatus 0x04
+#define IntrMask 0x00
+
+struct ivshmem_info {
+	struct uio_info *uio;
+	struct pci_dev *dev;
+	char (*msix_names)[256];
+	struct msix_entry *msix_entries;
+	int nvectors;
+};
+
+static irqreturn_t ivshmem_handler(int irq, struct uio_info *dev_info)
+{
+
+	void __iomem *plx_intscr = dev_info->mem[0].internal_addr
+					+ IntrStatus;
+	u32 val;
+
+	val = readl(plx_intscr);
+	if (val == 0)
+		return IRQ_NONE;
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t ivshmem_msix_handler(int irq, void *opaque)
+{
+
+	struct uio_info * dev_info = (struct uio_info *) opaque;
+
+	/* we have to do this explicitly when using MSI-X */
+	uio_event_notify(dev_info);
+	return IRQ_HANDLED;
+}
+
+static void free_msix_vectors(struct ivshmem_info *ivs_info,
+							const int max_vector)
+{
+	int i;
+
+	for (i = 0; i < max_vector; i++)
+		free_irq(ivs_info->msix_entries[i].vector, ivs_info->uio);
+}
+
+static int request_msix_vectors(struct ivshmem_info *ivs_info, int nvectors)
+{
+	int i, err;
+	const char *name = "ivshmem";
+
+	ivs_info->nvectors = nvectors;
+
+	ivs_info->msix_entries = kmalloc(nvectors * sizeof *
+						ivs_info->msix_entries,
+						GFP_KERNEL);
+	if (ivs_info->msix_entries == NULL)
+		return -ENOMEM;
+
+	ivs_info->msix_names = kmalloc(nvectors * sizeof *ivs_info->msix_names,
+			GFP_KERNEL);
+	if (ivs_info->msix_names == NULL) {
+		kfree(ivs_info->msix_entries);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < nvectors; ++i)
+		ivs_info->msix_entries[i].entry = i;
+
+	err = pci_enable_msix(ivs_info->dev, ivs_info->msix_entries,
+					ivs_info->nvectors);
+	if (err > 0) {
+		ivs_info->nvectors = err; /* msi-x positive error code
+					 returns the number available*/
+		err = pci_enable_msix(ivs_info->dev, ivs_info->msix_entries,
+					ivs_info->nvectors);
+		if (err) {
+			printk(KERN_INFO "no MSI (%d). Back to INTx.\n", err);
+			goto error;
+		}
+	}
+
+	if (err)
+	    goto error;
+
+	for (i = 0; i < ivs_info->nvectors; i++) {
+
+		snprintf(ivs_info->msix_names[i], sizeof *ivs_info->msix_names,
+			"%s-config", name);
+
+		err = request_irq(ivs_info->msix_entries[i].vector,
+			ivshmem_msix_handler, 0,
+			ivs_info->msix_names[i], ivs_info->uio);
+
+		if (err) {
+			/* vectors 0..i-1 were successfully requested */
+			free_msix_vectors(ivs_info, i);
+			goto error;
+		}
+
+	}
+
+	return 0;
+error:
+	kfree(ivs_info->msix_entries);
+	kfree(ivs_info->msix_names);
+	return err;
+
+}
+
+static int __devinit ivshmem_pci_probe(struct pci_dev *dev,
+					const struct pci_device_id *id)
+{
+	struct uio_info *info;
+	struct ivshmem_info * ivshmem_info;
+	int nvectors = 4;
+
+	info = kzalloc(sizeof(struct uio_info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	ivshmem_info = kzalloc(sizeof(struct ivshmem_info), GFP_KERNEL);
+	if (!ivshmem_info) {
+		kfree(info);
+		return -ENOMEM;
+	}
+
+	if (pci_enable_device(dev))
+		goto out_free;
+
+	if (pci_request_regions(dev, "ivshmem"))
+		goto out_disable;
+
+	info->mem[0].addr = pci_resource_start(dev, 0);
+	if (!info->mem[0].addr)
+		goto out_release;
+
+	info->mem[0].size = pci_resource_len(dev, 0);
+	info->mem[0].internal_addr = pci_ioremap_bar(dev, 0);
+	if (!info->mem[0].internal_addr) {
+		goto out_release;
+	}
+
+	info->mem[0].memtype = UIO_MEM_PHYS;
+
+	info->mem[1].addr = pci_resource_start(dev, 2);
+	if (!info->mem[1].addr)
+		goto out_unmap;
+	info->mem[1].internal_addr = pci_ioremap_bar(dev, 2);
+	if (!info->mem[1].internal_addr)
+		goto out_unmap;
+
+	info->mem[1].size = pci_resource_len(dev, 2);
+	info->mem[1].memtype = UIO_MEM_PHYS;
+
+	ivshmem_info->uio = info;
+	ivshmem_info->dev = dev;
+
+	if (request_msix_vectors(ivshmem_info, nvectors) != 0) {
+		printk(KERN_INFO "regular IRQs\n");
+		info->irq = dev->irq;
+		info->irq_flags = IRQF_SHARED;
+		info->handler = ivshmem_handler;
+		writel(0xffffffff, info->mem[0].internal_addr + IntrMask);
+	} else {
+		printk(KERN_INFO "MSI-X enabled\n");
+		info->irq = -1;
+	}
+
+	info->name = "ivshmem";
+	info->version = "0.0.1";
+
+	if (uio_register_device(&dev->dev, info))
+		goto out_unmap2;
+
+	pci_set_drvdata(dev, info);
+
+
+	return 0;
+out_unmap2:
+	iounmap(info->mem[1].internal_addr);
+out_unmap:
+	iounmap(info->mem[0].internal_addr);
+out_release:
+	pci_release_regions(dev);
+out_disable:
+	pci_disable_device(dev);
+out_free:
+	kfree(ivshmem_info);
+	kfree(info);
+	return -ENODEV;
+}
+
+static void ivshmem_pci_remove(struct pci_dev *dev)
+{
+	struct uio_info *info = pci_get_drvdata(dev);
+
+	uio_unregister_device(info);
+	pci_release_regions(dev);
+	pci_disable_device(dev);
+	pci_set_drvdata(dev, NULL);
+	iounmap(info->mem[0].internal_addr);
+
+	kfree(info);
+}
+
+static struct pci_device_id ivshmem_pci_ids[] __devinitdata = {
+	{
+		.vendor =	0x1af4,
+		.device =	0x1110,
+		.subvendor =	PCI_ANY_ID,
+		.subdevice =	PCI_ANY_ID,
+	},
+	{ 0, }
+};
+
+static struct pci_driver ivshmem_pci_driver = {
+	.name = "uio_ivshmem",
+	.id_table = ivshmem_pci_ids,
+	.probe = ivshmem_pci_probe,
+	.remove = ivshmem_pci_remove,
+};
+
+static int __init ivshmem_init_module(void)
+{
+	return pci_register_driver(&ivshmem_pci_driver);
+}
+
+static void __exit ivshmem_exit_module(void)
+{
+	pci_unregister_driver(&ivshmem_pci_driver);
+}
+
+module_init(ivshmem_init_module);
+module_exit(ivshmem_exit_module);
+
+MODULE_DEVICE_TABLE(pci, ivshmem_pci_ids);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Cam Macdonell");
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 1/3] Device specification for shared memory PCI device
  2010-04-07 22:51   ` [Qemu-devel] " Cam Macdonell
@ 2010-04-12 20:34     ` Avi Kivity
  -1 siblings, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2010-04-12 20:34 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: kvm, qemu-devel

On 04/08/2010 01:51 AM, Cam Macdonell wrote:

(sorry about the late review)

> +
> +Regular Interrupts
> +------------------
> +
> +If regular interrupts are used (due to either a guest not supporting MSI or the
> +user specifying not to use them on startup) then the value written to the lower
> +16-bits of the Doorbell register is arbitrary and will trigger an
> +interrupt in the destination guest.
>    

Does the value written show up in the status register?  If yes, it can 
get overwritten by other interrupts.  If not, the lower 16 bits should 
be reserved to the value 1 for future expansion.  Basically it means
that the PCI interrupt is equivalent to vector 1.

> +
> +An interrupt is also generated when a new guest accesses the shared memory
> +region.  A status of (2^32 - 1) indicates that a new guest has joined.
>    

Suggest making this a bitfield, define bit 0 as 'at least some other 
machine has signalled you' and bit 1 as 'at least one other machine has 
joined'.

> +
> +Message Signalled Interrupts
> +----------------------------
> +
> +An ivshmem device may support multiple MSI vectors.  If so, the lower 16-bits
> +written to the Doorbell register must be between 1 and the maximum number of
> +vectors the guest supports.  The lower 16 bits written to the doorbell is the
> +MSI vector that will be raised in the destination guest.  The number of MSI
> +vectors can vary but it is set when the VM is started, however vector 0 is
> +used to notify that a new guest has joined.  Guests should not use vector 0 for
> +any other purpose.
>    

Come to think of it, the 'guest has joined' interrupt is actually pointless.
Since the new guest hasn't initialized yet, you can't talk to it.  So it's best to
leave it completely to the application, which can initialize shared 
memory and start sending interrupts.  An application defined protocol 
can handle joining.

How is initialization performed?  I guess we can define memory to start 
zeroed and let participants compete to acquire a lock.

Need to document the mask register.

Do we want an interrupt on a guest leaving?  Let's not complicate things.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 2/3] Support adding a file to qemu's ram allocation
  2010-04-07 22:51     ` [Qemu-devel] " Cam Macdonell
@ 2010-04-12 20:38       ` Avi Kivity
  -1 siblings, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2010-04-12 20:38 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: kvm, qemu-devel

On 04/08/2010 01:51 AM, Cam Macdonell wrote:
> This avoids the need of using qemu_ram_alloc and mmap with MAP_FIXED to map a
> host file into guest RAM.  This function mmaps the opened file anywhere and adds
> the memory to the ram blocks.
>
> Usage is
>
> qemu_ram_mmap(fd, size, MAP_SHARED, offset);
> ---
>   cpu-common.h |    1 +
>   exec.c       |   33 +++++++++++++++++++++++++++++++++
>   2 files changed, 34 insertions(+), 0 deletions(-)
>
> diff --git a/cpu-common.h b/cpu-common.h
> index 49c7fb3..87c82fc 100644
> --- a/cpu-common.h
> +++ b/cpu-common.h
> @@ -32,6 +32,7 @@ static inline void cpu_register_physical_memory(target_phys_addr_t start_addr,
>   }
>
>   ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
> +ram_addr_t qemu_ram_mmap(int, ram_addr_t, int, int);
>    

Use prototypes with argument names, please.  The surrounding code doesn't
use them either, but that's bad style.

>
> +ram_addr_t qemu_ram_mmap(int fd, ram_addr_t size, int flags, int offset)
>    

off_t offset

> +{
> +    RAMBlock *new_block;
> +
> +    size = TARGET_PAGE_ALIGN(size);
> +    new_block = qemu_malloc(sizeof(*new_block));
> +
> +    // map the file passed as a parameter to be this part of memory
>    

/* comments */

> +    new_block->host = mmap(0, size, PROT_READ|PROT_WRITE, flags, fd, offset);
>    

Error checking.

> +
> +#ifdef MADV_MERGEABLE
> +    madvise(new_block->host, size, MADV_MERGEABLE);
> +#endif
>    

Won't work (ksm only merges anonymous pages), but keep it there in case 
it learns about pagecache.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 3/3] Inter-VM shared memory PCI device
  2010-04-07 22:52       ` [Qemu-devel] " Cam Macdonell
@ 2010-04-12 20:56         ` Avi Kivity
  -1 siblings, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2010-04-12 20:56 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: kvm, qemu-devel

On 04/08/2010 01:52 AM, Cam Macdonell wrote:
> Support an inter-vm shared memory device that maps a shared-memory object as a
> PCI device in the guest.  This patch also supports interrupts between guests by
> communicating over a unix domain socket.  This patch applies to the qemu-kvm
> repository.
>
>      -device ivshmem,size=<size in MB>[,shm=<shm name>]
>    

Can that be <size in format accepted by -m> (2M, 4G, 19T, ...).

> Interrupts are supported between multiple VMs by using a shared memory server
> accessed through a chardev socket.
>
>      -device ivshmem,size=<size in MB>[,shm=<shm name>][,chardev=<id>][,msi=on]
>              [,irqfd=on][,vectors=n]
>      -chardev socket,path=<path>,id=<id>
>    

Do we need the irqfd parameter?  Should be on by default.

On the other hand, it may fail with older kernels with limited irqfd 
slots, so better keep it there.

> Sample programs, init scripts and the shared memory server are available in a
> git repo here:
>
>      www.gitorious.org/nahanni
>    

Please consider qemu.git/contrib.

> ---
>   Makefile.target |    3 +
>   hw/ivshmem.c    |  700 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   qemu-char.c     |    6 +
>   qemu-char.h     |    3 +
>    

qemu-doc.texi | 45 +++++++++++++

>   4 files changed, 712 insertions(+), 0 deletions(-)
>   create mode 100644 hw/ivshmem.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 1ffd802..bc9a681 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -199,6 +199,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
>   obj-y += rtl8139.o
>   obj-y += e1000.o
>
> +# Inter-VM PCI shared memory
> +obj-y += ivshmem.o
> +
>    

depends on CONFIG_PCI

>   # Hardware support
> +
> +#define PCI_COMMAND_IOACCESS                0x0001
> +#define PCI_COMMAND_MEMACCESS               0x0002
>    

Should be in pci.h?

> +
> +#define DEBUG_IVSHMEM
>    

Disable for production.

> +
> +#define IVSHMEM_IRQFD   0
> +#define IVSHMEM_MSI     1
> +#define IVSHMEM_MAX_EVENTFDS  16
>    

Too low?  why limit?
> +
> +struct eventfd_entry {
> +    PCIDevice *pdev;
> +    int vector;
> +};
>    

Coding style:  EventfdEntry, and a typedef.

> +
> +typedef struct IVShmemState {
> +    PCIDevice dev;
> +    uint32_t intrmask;
> +    uint32_t intrstatus;
> +    uint32_t doorbell;
> +
> +    CharDriverState * chr;
> +    CharDriverState ** eventfd_chr;
> +    int ivshmem_mmio_io_addr;
> +
> +    pcibus_t mmio_addr;
> +    uint8_t *ivshmem_ptr;
> +    unsigned long ivshmem_offset;
>    

off_t

> +    unsigned int ivshmem_size;
>    

ram_addr_t

> +
> +/* accessing registers - based on rtl8139 */
> +static void ivshmem_update_irq(IVShmemState *s, int val)
> +{
> +    int isr;
> +    isr = (s->intrstatus&  s->intrmask)&  0xffffffff;
>    

This is highly undocumented, but fits my suggested 'status is bitmap'.  
'isr' needs to be uint32_t if you mask it like that.

> +
> +    /* don't print ISR resets */
> +    if (isr) {
> +        IVSHMEM_DPRINTF("Set IRQ to %d (%04x %04x)\n",
> +           isr ? 1 : 0, s->intrstatus, s->intrmask);
> +    }
> +
> +    qemu_set_irq(s->dev.irq[0], (isr != 0));
> +}
> +
> +
>
>
> +
> +static void create_shared_memory_BAR(IVShmemState *s, int fd) {
> +
> +    s->shm_fd = fd;
> +
> +    s->ivshmem_offset = qemu_ram_mmap(s->shm_fd, s->ivshmem_size,
> +             MAP_SHARED, 0);
>    

Where did the offset go?

> +
> +    s->ivshmem_ptr = qemu_get_ram_ptr(s->ivshmem_offset);
>    

Never used, please drop.

> +
> +    /* region for shared memory */
> +    pci_register_bar(&s->dev, 2, s->ivshmem_size,
> +                       PCI_BASE_ADDRESS_SPACE_MEMORY, ivshmem_map);
>    

Might be worthwhile to mark it as a 64-bit BAR.  Please test with 
ridiculous shared memory sizes.

> +}
> +
> +static int ivshmem_irqfd(PCIDevice* pdev, uint16_t vector, int fd)
> +{
> +    struct kvm_irqfd call = { };
> +    int r;
> +
> +    IVSHMEM_DPRINTF("inside irqfd\n");
> +    if (vector>= pdev->msix_entries_nr)
> +        return -EINVAL;
> +    call.fd = fd;
> +    call.gsi = pdev->msix_irq_entries[vector].gsi;
> +    r = kvm_vm_ioctl(kvm_state, KVM_IRQFD,&call);
> +    if (r<  0) {
> +        IVSHMEM_DPRINTF("allocating irqfd failed %d\n", r);
> +        return r;
> +    }
> +    return 0;
> +}
>    

should be in kvm.c for reuse.

> +
> +static int ivshmem_ioeventfd(IVShmemState* s, int posn, int fd, int vector)
> +{
> +
> +    int ret;
> +    struct kvm_ioeventfd iofd;
> +
> +    iofd.datamatch = (posn<<  16) | vector;
> +    iofd.addr = s->mmio_addr + Doorbell;
> +    iofd.len = 4;
> +    iofd.flags = KVM_IOEVENTFD_FLAG_DATAMATCH;
> +    iofd.fd = fd;
> +
> +    ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD,&iofd);
>    

Ditto.

> +
> +    if (ret<  0) {
> +        fprintf(stderr, "error assigning ioeventfd (%d)\n", ret);
> +        perror(strerror(ret));
> +    } else {
> +        IVSHMEM_DPRINTF("success assigning ioeventfd (%d:%d)\n", posn, vector);
> +    }
> +
> +    return ret;
> +}
>    

blank line here.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] Re: [PATCH v4 3/3] Inter-VM shared memory PCI device
@ 2010-04-12 20:56         ` Avi Kivity
  0 siblings, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2010-04-12 20:56 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: qemu-devel, kvm

On 04/08/2010 01:52 AM, Cam Macdonell wrote:
> Support an inter-vm shared memory device that maps a shared-memory object as a
> PCI device in the guest.  This patch also supports interrupts between guest by
> communicating over a unix domain socket.  This patch applies to the qemu-kvm
> repository.
>
>      -device ivshmem,size=<size in MB>[,shm=<shm name>]
>    

Can that be <size in format accepted by -m> (2M, 4G, 19T, ...).

> Interrupts are supported between multiple VMs by using a shared memory server
> by using a chardev socket.
>
>      -device ivshmem,size=<size in MB>[,shm=<shm name>][,chardev=<id>][,msi=on]
>              [,irqfd=on][,vectors=n]
>      -chardev socket,path=<path>,id=<id>
>    

Do we need the irqfd parameter?  Should be on by default.

On the other hand, it may fail with older kernels with limited irqfd 
slots, so better keep it there.

> Sample programs, init scripts and the shared memory server are available in a
> git repo here:
>
>      www.gitorious.org/nahanni
>    

Please consider qemu.git/contrib.

> ---
>   Makefile.target |    3 +
>   hw/ivshmem.c    |  700 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   qemu-char.c     |    6 +
>   qemu-char.h     |    3 +
>    

qemu-doc.texi | 45 +++++++++++++

>   4 files changed, 712 insertions(+), 0 deletions(-)
>   create mode 100644 hw/ivshmem.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 1ffd802..bc9a681 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -199,6 +199,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
>   obj-y += rtl8139.o
>   obj-y += e1000.o
>
> +# Inter-VM PCI shared memory
> +obj-y += ivshmem.o
> +
>    

depends on CONFIG_PCI

>   # Hardware support
> +
> +#define PCI_COMMAND_IOACCESS                0x0001
> +#define PCI_COMMAND_MEMACCESS               0x0002
>    

Should be in pci.h?

> +
> +#define DEBUG_IVSHMEM
>    

Disable for production.

> +
> +#define IVSHMEM_IRQFD   0
> +#define IVSHMEM_MSI     1
> +#define IVSHMEM_MAX_EVENTFDS  16
>    

Too low?  why limit?
> +
> +struct eventfd_entry {
> +    PCIDevice *pdev;
> +    int vector;
> +};
>    

Coding style:  EventfdEntry, and a typedef.

> +
> +typedef struct IVShmemState {
> +    PCIDevice dev;
> +    uint32_t intrmask;
> +    uint32_t intrstatus;
> +    uint32_t doorbell;
> +
> +    CharDriverState * chr;
> +    CharDriverState ** eventfd_chr;
> +    int ivshmem_mmio_io_addr;
> +
> +    pcibus_t mmio_addr;
> +    uint8_t *ivshmem_ptr;
> +    unsigned long ivshmem_offset;
>    

off_t

> +    unsigned int ivshmem_size;
>    

ram_addr_t

> +
> +/* accessing registers - based on rtl8139 */
> +static void ivshmem_update_irq(IVShmemState *s, int val)
> +{
> +    int isr;
> +    isr = (s->intrstatus&  s->intrmask)&  0xffffffff;
>    

This is highly undocumented, but fits my suggested 'status is bitmap'.  
'isr' needs to be uint32_t if you mask it like that.

> +
> +    /* don't print ISR resets */
> +    if (isr) {
> +        IVSHMEM_DPRINTF("Set IRQ to %d (%04x %04x)\n",
> +           isr ? 1 : 0, s->intrstatus, s->intrmask);
> +    }
> +
> +    qemu_set_irq(s->dev.irq[0], (isr != 0));
> +}
> +
> +
>
>
> +
> +static void create_shared_memory_BAR(IVShmemState *s, int fd) {
> +
> +    s->shm_fd = fd;
> +
> +    s->ivshmem_offset = qemu_ram_mmap(s->shm_fd, s->ivshmem_size,
> +             MAP_SHARED, 0);
>    

Where did the offset go?

> +
> +    s->ivshmem_ptr = qemu_get_ram_ptr(s->ivshmem_offset);
>    

Never used, please drop.

> +
> +    /* region for shared memory */
> +    pci_register_bar(&s->dev, 2, s->ivshmem_size,
> +                       PCI_BASE_ADDRESS_SPACE_MEMORY, ivshmem_map);
>    

Might be worthwhile to mark it as a 64-bit BAR.  Please test with 
ridiculous shared memory sizes.

> +}
> +
> +static int ivshmem_irqfd(PCIDevice* pdev, uint16_t vector, int fd)
> +{
> +    struct kvm_irqfd call = { };
> +    int r;
> +
> +    IVSHMEM_DPRINTF("inside irqfd\n");
> +    if (vector >= pdev->msix_entries_nr)
> +        return -EINVAL;
> +    call.fd = fd;
> +    call.gsi = pdev->msix_irq_entries[vector].gsi;
> +    r = kvm_vm_ioctl(kvm_state, KVM_IRQFD, &call);
> +    if (r < 0) {
> +        IVSHMEM_DPRINTF("allocating irqfd failed %d\n", r);
> +        return r;
> +    }
> +    return 0;
> +}
>    

should be in kvm.c for reuse.

> +
> +static int ivshmem_ioeventfd(IVShmemState* s, int posn, int fd, int vector)
> +{
> +
> +    int ret;
> +    struct kvm_ioeventfd iofd;
> +
> +    iofd.datamatch = (posn << 16) | vector;
> +    iofd.addr = s->mmio_addr + Doorbell;
> +    iofd.len = 4;
> +    iofd.flags = KVM_IOEVENTFD_FLAG_DATAMATCH;
> +    iofd.fd = fd;
> +
> +    ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &iofd);
>    

Ditto.

> +
> +    if (ret < 0) {
> +        fprintf(stderr, "error assigning ioeventfd (%d)\n", ret);
> +        perror(strerror(ret));
> +    } else {
> +        IVSHMEM_DPRINTF("success assigning ioeventfd (%d:%d)\n", posn, vector);
> +    }
> +
> +    return ret;
> +}
>    

blank line here.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4] Shared memory uio_pci driver
  2010-04-07 23:00     ` [Qemu-devel] " Cam Macdonell
@ 2010-04-12 20:57       ` Avi Kivity
  -1 siblings, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2010-04-12 20:57 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: kvm, qemu-devel

On 04/08/2010 02:00 AM, Cam Macdonell wrote:
> This patch adds a driver for my shared memory PCI device using the uio_pci
> interface.  The driver has three memory regions.  The first memory region is for
> device registers for sending interrupts. The second BAR is for receiving MSI-X
> interrupts and the third memory region maps the shared memory.  The device only
> exports the first and third memory regions to userspace.
>
> This driver supports MSI-X and regular pin interrupts.  Currently, the number
> of MSI vectors is set to 2 (one for new connections and the other for
> interrupts) but it could easily be increased.  If MSI is not available, then
> regular interrupts will be used.
>
> This version added formatting and style corrections as well as better
> error-checking and cleanup when errors occur.
>
>    

There is work now to bring msi to the generic pci 2.3 driver, perhaps we 
can use that instead.  From a quick look it looks fine.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 30+ messages in thread


* Re: [PATCH v4 1/3] Device specification for shared memory PCI device
  2010-04-12 20:34     ` [Qemu-devel] " Avi Kivity
@ 2010-04-12 21:11       ` Cam Macdonell
  -1 siblings, 0 replies; 30+ messages in thread
From: Cam Macdonell @ 2010-04-12 21:11 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, qemu-devel

On Mon, Apr 12, 2010 at 3:34 PM, Avi Kivity <avi@redhat.com> wrote:
> On 04/08/2010 01:51 AM, Cam Macdonell wrote:
>
> (sorry about the late review)
>
>> +
>> +Regular Interrupts
>> +------------------
>> +
>> +If regular interrupts are used (due to either a guest not supporting MSI
>> or the
>> +user specifying not to use them on startup) then the value written to the
>> lower
>> +16-bits of the Doorbell register is arbitrary and will trigger an
>> +interrupt in the destination guest.
>>
>
> Does the value written show up in the status register?  If yes, it can get
> overwritten by other interrupts.  If not, the lower 16 bits should be
> reserved to the value 1 for future expansion.  Basically it means that the
> pci interrupt is equivalent to vector 1.

The status register is only 1 or 0.  I've made it so 1 is the only
value written to trigger an interrupt.

>
>> +
>> +An interrupt is also generated when a new guest accesses the shared
>> memory
>> +region.  A status of (2^32 - 1) indicates that a new guest has joined.
>>
>
> Suggest making this a bitfield, define bit 0 as 'at least some other machine
> has signalled you' and bit 1 as 'at least one other machine has joined'.
>
>> +
>> +Message Signalled Interrupts
>> +----------------------------
>> +
>> +An ivshmem device may support multiple MSI vectors.  If so, the lower
>> 16-bits
>> +written to the Doorbell register must be between 1 and the maximum number
>> of
>> +vectors the guest supports.  The lower 16 bits written to the doorbell is
>> the
>> +MSI vector that will be raised in the destination guest.  The number of
>> MSI
>> +vectors can vary but it is set when the VM is started, however vector 0
>> is
>> +used to notify that a new guest has joined.  Guests should not use vector
>> 0 for
>> +any other purpose.
>>
>
> Come to think about it, the guest-has-joined interrupt is actually pointless.  Since
> it hasn't initialized yet you can't talk to it.  So it's best to leave it
> completely to the application, which can initialize shared memory and start
> sending interrupts.  An application defined protocol can handle joining.

Good point.

> How is initialization performed?  I guess we can define memory to start
> zeroed and let participants compete to acquire a lock.

No initialization of the memory occurs presently.

With interrupts, the shared memory server could zero the memory.
Without the server (the non-interrupt case) the guests can try to open
the shared memory with O_EXCL first and zero the memory if that
succeeds.  If O_EXCL fails, then the guest would open without O_EXCL
and not initialize.

>
> Need to document the mask register.

Currently only applies with regular interrupts.  Since the status
register is only 0 or 1, only the first bit has any effect.  I'll
add this to the spec.

>
> Do we want an interrupt on a guest leaving?  Let's not complicate things.

Probably not if we don't have one on join.

Cam

^ permalink raw reply	[flat|nested] 30+ messages in thread


* Re: [PATCH v4 0/3] PCI Shared memory device
  2010-04-07 22:51 ` [Qemu-devel] " Cam Macdonell
@ 2010-04-12 21:17   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 30+ messages in thread
From: Michael S. Tsirkin @ 2010-04-12 21:17 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: kvm, qemu-devel

On Wed, Apr 07, 2010 at 04:51:57PM -0600, Cam Macdonell wrote:
> Latest patch for PCI shared memory device that maps a host shared memory object
> to be shared between guests

FWIW, I still think it would be better to reuse virtio-pci spec
for feature negotiation, control etc even
if Anthony nacks the reuse of virtio code in qemu.

-- 
MST

^ permalink raw reply	[flat|nested] 30+ messages in thread


* Re: [PATCH v4 3/3] Inter-VM shared memory PCI device
  2010-04-12 20:56         ` [Qemu-devel] " Avi Kivity
@ 2010-04-14 23:30           ` Cam Macdonell
  -1 siblings, 0 replies; 30+ messages in thread
From: Cam Macdonell @ 2010-04-14 23:30 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, qemu-devel

On Mon, Apr 12, 2010 at 2:56 PM, Avi Kivity <avi@redhat.com> wrote:
> On 04/08/2010 01:52 AM, Cam Macdonell wrote:
>>
>> Support an inter-vm shared memory device that maps a shared-memory object
>> as a
>> PCI device in the guest.  This patch also supports interrupts between
>> guests by
>> communicating over a unix domain socket.  This patch applies to the
>> qemu-kvm
>> repository.
>>
>>     -device ivshmem,size=<size in MB>[,shm=<shm name>]
>>
>
> Can that be <size in format accepted by -m> (2M, 4G, 19T, ...).
>
>> Interrupts are supported between multiple VMs by using a shared memory
>> server
>> by using a chardev socket.
>>
>>     -device ivshmem,size=<size in MB>[,shm=<shm
>> name>][,chardev=<id>][,msi=on]
>>             [,irqfd=on][,vectors=n]
>>     -chardev socket,path=<path>,id=<id>
>>
>
> Do we need the irqfd parameter?  Should be on by default.
>
> On the other hand, it may fail with older kernels with limited irqfd slots,
> so better keep it there.
>
>> Sample programs, init scripts and the shared memory server are available
>> in a
>> git repo here:
>>
>>     www.gitorious.org/nahanni
>>
>
> Please consider qemu.git/contrib.

Should the compilation be tied into Qemu's regular build with a switch
(e.g. --enable-ivshmem-server)? Or should it be its own separate
build?

Cam

>
>> ---
>>  Makefile.target |    3 +
>>  hw/ivshmem.c    |  700
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  qemu-char.c     |    6 +
>>  qemu-char.h     |    3 +
>>
>
> qemu-doc.texi | 45 +++++++++++++

Seems to be light on qdev devices.  I notice there is a section named
"Data Type Index" that "could be used for qdev device names and
options", but is currently empty.  Should I place documentation for the
device there, or just add it to "3.3 Invocation"?

>
>>  4 files changed, 712 insertions(+), 0 deletions(-)
>>  create mode 100644 hw/ivshmem.c
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index 1ffd802..bc9a681 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -199,6 +199,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
>>  obj-y += rtl8139.o
>>  obj-y += e1000.o
>>
>> +# Inter-VM PCI shared memory
>> +obj-y += ivshmem.o
>> +
>>
>
> depends on CONFIG_PCI

as in

obj-$(CONFIG_PCI) += ivshmem.o


the variable CONFIG_PCI doesn't seem to be set during configuration.
I don't see any other PCI devices that depend on it.  Do we also want
to depend on CONFIG_KVM?

>> +static void create_shared_memory_BAR(IVShmemState *s, int fd) {
>> +
>> +    s->shm_fd = fd;
>> +
>> +    s->ivshmem_offset = qemu_ram_mmap(s->shm_fd, s->ivshmem_size,
>> +             MAP_SHARED, 0);
>>
>
> Where did the offset go?

0 is the offset.  I include the offset parameter in qemu_ram_mmap() to
make it flexible for other uses.  Are you suggesting to take an
optional offset as an argument to -device?

Cam

^ permalink raw reply	[flat|nested] 30+ messages in thread


* Re: [PATCH v4 3/3] Inter-VM shared memory PCI device
  2010-04-14 23:30           ` [Qemu-devel] " Cam Macdonell
@ 2010-04-15  8:33             ` Avi Kivity
  -1 siblings, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2010-04-15  8:33 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: kvm, qemu-devel

On 04/15/2010 02:30 AM, Cam Macdonell wrote:
>>
>>> Sample programs, init scripts and the shared memory server are available
>>> in a
>>> git repo here:
>>>
>>>      www.gitorious.org/nahanni
>>>
>>>        
>> Please consider qemu.git/contrib.
>>      
> Should the compilation be tied into Qemu's regular build with a switch
> (e.g. --enable-ivshmem-server)? Or should it be its own separate
> build?
>    

It can have its own makefile.
>>> ---
>>>   Makefile.target |    3 +
>>>   hw/ivshmem.c    |  700
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   qemu-char.c     |    6 +
>>>   qemu-char.h     |    3 +
>>>
>>>        
>> qemu-doc.texi | 45 +++++++++++++
>>      
> Seems to be light on qdev devices.  I notice there is a section named
> "Data Type Index" that "could be used for qdev device names and
> options", but is currently empty.  Should I place documentation there
> of device there or just add it to "3.3 Invocation"?
>    

I think those are in qemu-options.hx.  Just put it somewhere where it 
seems appropriate.

>    
>>      
>>>   4 files changed, 712 insertions(+), 0 deletions(-)
>>>   create mode 100644 hw/ivshmem.c
>>>
>>> diff --git a/Makefile.target b/Makefile.target
>>> index 1ffd802..bc9a681 100644
>>> --- a/Makefile.target
>>> +++ b/Makefile.target
>>> @@ -199,6 +199,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
>>>   obj-y += rtl8139.o
>>>   obj-y += e1000.o
>>>
>>> +# Inter-VM PCI shared memory
>>> +obj-y += ivshmem.o
>>> +
>>>
>>>        
>> depends on CONFIG_PCI
>>      
> as in
>
> obj-$(CONFIG_PCI) += ivshmem.o
>
>
> the variable CONFIG_PCI doesn't seem to be set during configuration.
> I don't see any other PCI devices that depend on it.

My mistake, keep as is.

> Do we also want
> to depend on CONFIG_KVM?
>    

No real need.

>>> +static void create_shared_memory_BAR(IVShmemState *s, int fd) {
>>> +
>>> +    s->shm_fd = fd;
>>> +
>>> +    s->ivshmem_offset = qemu_ram_mmap(s->shm_fd, s->ivshmem_size,
>>> +             MAP_SHARED, 0);
>>>
>>>        
>> Where did the offset go?
>>      
> 0 is the offset.  I include the offset parameter in qemu_ram_mmap() to
> make it flexible for other uses.

Makes sense.

> Are you suggesting to take an
> optional offset as an argument to -device?
>    

No need.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 30+ messages in thread


* Re: [PATCH v4] Shared memory uio_pci driver
  2010-04-12 20:57       ` [Qemu-devel] " Avi Kivity
@ 2010-04-23 17:45         ` Cam Macdonell
  -1 siblings, 0 replies; 30+ messages in thread
From: Cam Macdonell @ 2010-04-23 17:45 UTC (permalink / raw)
  To: Avi Kivity, Michael S. Tsirkin; +Cc: kvm, qemu-devel

On Mon, Apr 12, 2010 at 2:57 PM, Avi Kivity <avi@redhat.com> wrote:
>
> There is work now to bring msi to the generic pci 2.3 driver, perhaps we can
> use that instead.  From a quick look it looks fine.
>

I'd be interested to follow this development.  I can't find anything
on LKML, is it being discussed anywhere?

Thanks,
Cam

^ permalink raw reply	[flat|nested] 30+ messages in thread


* Re: [PATCH v4] Shared memory uio_pci driver
  2010-04-23 17:45         ` [Qemu-devel] " Cam Macdonell
@ 2010-04-24  9:28           ` Avi Kivity
  -1 siblings, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2010-04-24  9:28 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: Michael S. Tsirkin, kvm, qemu-devel

On 04/23/2010 08:45 PM, Cam Macdonell wrote:
> On Mon, Apr 12, 2010 at 2:57 PM, Avi Kivity<avi@redhat.com>  wrote:
>    
>> There is work now to bring msi to the generic pci 2.3 driver, perhaps we can
>> use that instead.  From a quick look it looks fine.
>>
>>      
> I'd be interested to follow this development.  I can't find anything
> on LKML, is it being discussed anywhere?
>
>    

Look for patches from Tom Lyon (on kvm@ as well).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 30+ messages in thread


end of thread, other threads:[~2010-04-24  9:28 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-07 22:51 [PATCH v4 0/3] PCI Shared memory device Cam Macdonell
2010-04-07 22:51 ` [Qemu-devel] " Cam Macdonell
2010-04-07 22:51 ` [PATCH v4 1/3] Device specification for shared memory PCI device Cam Macdonell
2010-04-07 22:51   ` [Qemu-devel] " Cam Macdonell
2010-04-07 22:51   ` [PATCH v4 2/3] Support adding a file to qemu's ram allocation Cam Macdonell
2010-04-07 22:51     ` [Qemu-devel] " Cam Macdonell
2010-04-07 22:52     ` [PATCH v4 3/3] Inter-VM shared memory PCI device Cam Macdonell
2010-04-07 22:52       ` [Qemu-devel] " Cam Macdonell
2010-04-12 20:56       ` Avi Kivity
2010-04-12 20:56         ` [Qemu-devel] " Avi Kivity
2010-04-14 23:30         ` Cam Macdonell
2010-04-14 23:30           ` [Qemu-devel] " Cam Macdonell
2010-04-15  8:33           ` Avi Kivity
2010-04-15  8:33             ` [Qemu-devel] " Avi Kivity
2010-04-12 20:38     ` [PATCH v4 2/3] Support adding a file to qemu's ram allocation Avi Kivity
2010-04-12 20:38       ` [Qemu-devel] " Avi Kivity
2010-04-07 23:00   ` [PATCH v4] Shared memory uio_pci driver Cam Macdonell
2010-04-07 23:00     ` [Qemu-devel] " Cam Macdonell
2010-04-12 20:57     ` Avi Kivity
2010-04-12 20:57       ` [Qemu-devel] " Avi Kivity
2010-04-23 17:45       ` Cam Macdonell
2010-04-23 17:45         ` [Qemu-devel] " Cam Macdonell
2010-04-24  9:28         ` Avi Kivity
2010-04-24  9:28           ` [Qemu-devel] " Avi Kivity
2010-04-12 20:34   ` [PATCH v4 1/3] Device specification for shared memory PCI device Avi Kivity
2010-04-12 20:34     ` [Qemu-devel] " Avi Kivity
2010-04-12 21:11     ` Cam Macdonell
2010-04-12 21:11       ` [Qemu-devel] " Cam Macdonell
2010-04-12 21:17 ` [PATCH v4 0/3] PCI Shared memory device Michael S. Tsirkin
2010-04-12 21:17   ` [Qemu-devel] " Michael S. Tsirkin
