All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC] New device API
@ 2009-05-05 11:31 Paul Brook
  2009-05-05 15:56 ` Blue Swirl
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Paul Brook @ 2009-05-05 11:31 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2070 bytes --]

The attached patch is my attempt at a new internal API for device creation in 
qemu.

The long term goal here is fully dynamic machine construction, i.e. a user 
supplied configuration file/tree rather than the current hardcoded board 
configs.

As a first step towards this, I've implemented an API that allows devices to 
be created and configured without requiring device specific knowledge.

There are two sides to this API.  The device side allows a device to 
register/configure functionality (e.g. MMIO regions, IRQs, character devices, 
etc). Most of the infrastructure for this is already present, we just need to 
configure it consistently, rather than on an ad-hoc basis. I expect this API 
to remain largely unchanged[1].

The board side allows creation on configuration of devices. In the medium term 
I expect this to go away, and be replaced by data driven configuration.

I also expect that this device abstraction will let itself to a system bus 
model to solve many of the IOMMU/DMA/address mapping issues that qemu 
currently can't handle.

There are still a few rough edges. Busses aren't handled in a particularly 
consistent way, PCI isn't particularly well integrated, and the 
implementation of things like qdev_get_chardev is an ugly hack mapping onto 
current commandline options. However I believe the device side API is fairly 
solid, and is a necessary prerequisite for fixing the bigger configuration 
problem.

I've converted a bunch of devices, anyone interested in seeing how it fits 
together in practice can pull from
  git://repo.or.cz/qemu/pbrook.git qdev

It you have objections to or suggestion about this approach please speak up 
now, but please bear in ming that this code is still somewhat in flux and the 
caveats mentioned above.

Paul

[1] I subscribe to the linux kernel theory that stable internal APIs are a 
coincidence, not a feature. i.e. a consistent internal API is good, but that 
it should be changed whenever technically desirable. I have no interest in 
maintaining a fixed API for the benefit of third parties.

[-- Attachment #2: patch.qdev --]
[-- Type: text/x-diff, Size: 13458 bytes --]

commit 11c765848af6cb345a81f2722f141b305538182d
Author: Paul Brook <paul@codesourcery.com>
Date:   Mon May 4 17:13:08 2009 +0100

    Basic qdev infrastructure.
    
    Signed-off-by: Paul Brook <paul@codesourcery.com>

diff --git a/Makefile.target b/Makefile.target
index f735105..a35e724 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -498,6 +498,7 @@ endif #CONFIG_BSD_USER
 # System emulator target
 ifndef CONFIG_USER_ONLY
 
+DEVICES=
 OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o dma-helpers.o
 # virtio has to be here due to weird dependency between PCI and virtio-net.
 # need to fix this properly
@@ -686,6 +687,13 @@ ifeq ($(TARGET_BASE_ARCH), m68k)
 OBJS+= an5206.o mcf5206.o mcf_uart.o mcf_intc.o mcf5208.o mcf_fec.o
 OBJS+= m68k-semi.o dummy_m68k.o
 endif
+
+DEVICE_OBJS=$(addsuffix .o,$(DEVICES))
+$(DEVICE_OBJS): CPPFLAGS+=-DDEVICE_NAME=$(subst -,_,$(@:.o=))
+
+OBJS+=$(DEVICE_OBJS)
+OBJS+=devices.o qdev.o
+
 ifdef CONFIG_GDBSTUB
 OBJS+=gdbstub.o gdbstub-xml.o
 endif
@@ -752,6 +760,9 @@ endif
 qemu-options.h: $(SRC_PATH)/qemu-options.hx
 	$(call quiet-command,sh $(SRC_PATH)/hxtool -h < $< > $@,"  GEN   $(TARGET_DIR)$@")
 
+devices.c: gen_devices.sh Makefile.target config.mak
+	$(call quiet-command,$(SHELL) $(SRC_PATH)/gen_devices.sh $@ $(subst -,_,$(DEVICES)),"  GEN   $(TARGET_DIR)$@")
+
 clean:
 	rm -f *.o *.a *~ $(PROGS) nwfpe/*.o fpu/*.o qemu-options.h gdbstub-xml.c
 	rm -f *.d */*.d tcg/*.o
diff --git a/gen_devices.sh b/gen_devices.sh
new file mode 100644
index 0000000..1d6ec12
--- /dev/null
+++ b/gen_devices.sh
@@ -0,0 +1,17 @@
+#! /bin/sh
+# Call device init functions.
+
+file="$1"
+shift
+devices="$@"
+echo '/* Generated by gen_devices.sh */' > $file
+echo '#include "sysemu.h"' >> $file
+for x in $devices ; do
+    echo "void ${x}_register(void);" >> $file
+done
+echo "void register_devices(void)" >> $file
+echo "{" >> $file
+for x in $devices ; do
+    echo "    ${x}_register();" >> $file
+done
+echo "}" >> $file
diff --git a/hw/qdev.c b/hw/qdev.c
new file mode 100644
index 0000000..eb8c172
--- /dev/null
+++ b/hw/qdev.c
@@ -0,0 +1,294 @@
+/*
+ *  Dynamic device configuration and creation.
+ *
+ *  Copyright (c) 2009 CodeSourcery
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include "hw.h"
+#include "qdev.h"
+#include "sysemu.h"
+
+#include <assert.h>
+
+struct DeviceProperty
+{
+    const char *name;
+    union {
+        int i;
+        void *ptr;
+    } value;
+    DeviceProperty *next;
+};
+
+struct DeviceType
+{
+    const char *name;
+    qdev_initfn init;
+    void *opaque;
+    int size;
+    DeviceType *next;
+};
+
+static DeviceType *device_type_list;
+
+/* Register a new device type.  */
+DeviceType *qdev_register(const char *name, int size, qdev_initfn init,
+                          void *opaque)
+{
+    DeviceType *t;
+
+    assert(size >= sizeof(DeviceState));
+
+    t = qemu_mallocz(sizeof(DeviceType));
+    t->next = device_type_list;
+    device_type_list = t;
+    t->name = qemu_strdup(name);
+    t->size = size;
+    t->init = init;
+    t->opaque = opaque;
+
+    return t;
+}
+
+/* Create a new device.  This only initializes the device state structure
+   and allows properties to be set.  qdev_init should be called to
+   initialize the actual device emulation.  */
+DeviceState *qdev_create(const char *name)
+{
+    DeviceType *t;
+    DeviceState *dev;
+
+    for (t = device_type_list; t; t = t->next) {
+        if (strcmp(t->name, name) == 0) {
+            break;
+        }
+    }
+    if (!t) {
+        fprintf(stderr, "Unknown device '%s'\n", name);
+        exit(1);
+    }
+
+    dev = qemu_mallocz(t->size);
+    dev->name = name;
+    dev->type = t;
+    return dev;
+}
+
+/* Initialize a device.  Device properties should be set before calling
+   this function.  IRQs and MMIO regions should be connected/mapped after
+   calling this function.  */
+void qdev_init(DeviceState *dev)
+{
+    dev->type->init(dev, dev->type->opaque);
+}
+
+void qdev_connect_irq(DeviceState *dev, int n, qemu_irq irq)
+{
+    assert(n >= 0 && n < dev->num_irq);
+    dev->irqs[n] = 0;
+    if (dev->irqp[n]) {
+        *dev->irqp[n] = irq;
+    }
+}
+
+void qdev_mmio_map(DeviceState *dev, int n, target_phys_addr_t addr)
+{
+    assert(n >= 0 && n < dev->num_mmio);
+
+    if (dev->mmio[n].addr == addr) {
+        /* ??? region already mapped here.  */
+        return;
+    }
+    if (dev->mmio[n].addr != (target_phys_addr_t)-1) {
+        /* Unregister previous mapping.  */
+        cpu_register_physical_memory(dev->mmio[n].addr, dev->mmio[n].size,
+                                     IO_MEM_UNASSIGNED);
+    }
+    dev->mmio[n].addr = addr;
+    if (dev->mmio[n].cb) {
+        dev->mmio[n].cb(dev, addr);
+    } else {
+        cpu_register_physical_memory(addr, dev->mmio[n].size,
+                                     dev->mmio[n].iofunc);
+    }
+}
+
+void qdev_set_bus(DeviceState *dev, void *bus)
+{
+    assert(!dev->bus);
+    dev->bus = bus;
+}
+
+static DeviceProperty *create_prop(DeviceState *dev, const char *name)
+{
+    DeviceProperty *prop;
+
+    /* TODO: Check for duplicate properties.  */
+    prop = qemu_mallocz(sizeof(*prop));
+    prop->name = qemu_strdup(name);
+    prop->next = dev->props;
+    dev->props = prop;
+
+    return prop;
+}
+
+void qdev_set_prop_int(DeviceState *dev, const char *name, int value)
+{
+    DeviceProperty *prop;
+
+    prop = create_prop(dev, name);
+    prop->value.i = value;
+}
+
+void qdev_set_prop_ptr(DeviceState *dev, const char *name, void *value)
+{
+    DeviceProperty *prop;
+
+    prop = create_prop(dev, name);
+    prop->value.ptr = value;
+}
+
+
+qemu_irq qdev_get_irq_sink(DeviceState *dev, int n)
+{
+    assert(n >= 0 && n < dev->num_irq_sink);
+    return dev->irq_sink[n];
+}
+
+void qdev_init_mmio(DeviceState *dev, target_phys_addr_t size, int iofunc)
+{
+    int n;
+
+    assert(dev->num_mmio < QDEV_MAX_MMIO);
+    n = dev->num_mmio++;
+    dev->mmio[n].addr = -1;
+    dev->mmio[n].size = size;
+    dev->mmio[n].iofunc = iofunc;
+}
+
+void qdev_init_mmio_cb(DeviceState *dev, target_phys_addr_t size,
+                       mmio_mapfunc cb)
+{
+    int n;
+
+    assert(dev->num_mmio < QDEV_MAX_MMIO);
+    n = dev->num_mmio++;
+    dev->mmio[n].addr = -1;
+    dev->mmio[n].size = size;
+    dev->mmio[n].cb = cb;
+}
+
+/* Request an IRQ source.  The actual IRQ object may be populated later.  */
+void qdev_init_irq(DeviceState *dev, qemu_irq *p)
+{
+    int n;
+
+    assert(dev->num_irq < QDEV_MAX_IRQ);
+    n = dev->num_irq++;
+    dev->irqp[n] = p;
+}
+
+/* Pass IRQs from a target device.  */
+void qdev_pass_irq(DeviceState *dev, DeviceState *target)
+{
+    int i;
+    assert(dev->num_irq == 0);
+    dev->num_irq = target->num_irq;
+    for (i = 0; i < dev->num_irq; i++) {
+        dev->irqp[i] = target->irqp[i];
+    }
+}
+
+/* Register device IRQ sinks.  */
+void qdev_init_irq_sink(DeviceState *dev, qemu_irq_handler handler, int nirq)
+{
+    dev->num_irq_sink = nirq;
+    dev->irq_sink = qemu_allocate_irqs(handler, dev, nirq);
+}
+
+/* Get a character (serial) device interface.  */
+CharDriverState *qdev_init_chardev(DeviceState *dev)
+{
+    static int next_serial;
+    static int next_virtconsole;
+    if (strncmp(dev->name, "virtio", 6) == 0) {
+        return virtcon_hds[next_virtconsole++];
+    } else {
+        return serial_hds[next_serial++];
+    }
+}
+
+void *qdev_get_bus(DeviceState *dev)
+{
+    return dev->bus;
+}
+
+static DeviceProperty *find_prop(DeviceState *dev, const char *name)
+{
+    DeviceProperty *prop;
+
+    for (prop = dev->props; prop; prop = prop->next) {
+        if (strcmp(prop->name, name) == 0) {
+            return prop;
+        }
+    }
+    /* TODO: When this comes from a config file we will need to handle
+       missing properties gracefully.  */
+    abort();
+}
+
+int qdev_get_prop_int(DeviceState *dev, const char *name)
+{
+    DeviceProperty *prop;
+
+    prop = find_prop(dev, name);
+    return prop->value.i;
+}
+
+void *qdev_get_prop_ptr(DeviceState *dev, const char *name)
+{
+    DeviceProperty *prop;
+
+    prop = find_prop(dev, name);
+    return prop->value.ptr;
+}
+
+DeviceState *qdev_create_varargs(const char *name,
+                                 target_phys_addr_t addr, ...)
+{
+    DeviceState *dev;
+    va_list va;
+    qemu_irq irq;
+    int n;
+
+    dev = qdev_create(name);
+    qdev_init(dev);
+    if (addr != (target_phys_addr_t)-1) {
+        qdev_mmio_map(dev, 0, addr);
+    }
+    va_start(va, addr);
+    n = 0;
+    while (1) {
+        irq = va_arg(va, qemu_irq);
+        if (!irq) {
+            break;
+        }
+        qdev_connect_irq(dev, n, irq);
+        n++;
+    }
+    return dev;
+}
diff --git a/hw/qdev.h b/hw/qdev.h
new file mode 100644
index 0000000..08b2713
--- /dev/null
+++ b/hw/qdev.h
@@ -0,0 +1,89 @@
+#ifndef QDEV_H
+#define QDEV_H
+
+#include "irq.h"
+
+typedef struct DeviceType DeviceType;
+
+#define QDEV_MAX_MMIO 5
+#define QDEV_MAX_IRQ 32
+
+typedef struct DeviceProperty DeviceProperty;
+
+typedef void (*mmio_mapfunc)(DeviceState *dev, target_phys_addr_t addr);
+
+/* This structure should not be accessed directly.  We declare it here
+   so that it can be embedded in individual device state structures.  */
+struct DeviceState
+{
+    const char *name;
+    DeviceType *type;
+    void *bus;
+    int num_irq;
+    qemu_irq irqs[QDEV_MAX_IRQ];
+    qemu_irq *irqp[QDEV_MAX_IRQ];
+    int num_irq_sink;
+    qemu_irq *irq_sink;
+    int num_mmio;
+    struct {
+        target_phys_addr_t addr;
+        target_phys_addr_t size;
+        mmio_mapfunc cb;
+        int iofunc;
+    } mmio[QDEV_MAX_MMIO];
+    DeviceProperty *props;
+};
+
+
+/*** Board API.  This should go away once we have a machine config file.  ***/
+
+DeviceState *qdev_create(const char *name);
+void qdev_init(DeviceState *dev);
+
+/* Set properties between creation and init.  */
+void qdev_set_bus(DeviceState *dev, void *bus);
+void qdev_set_prop_int(DeviceState *dev, const char *name, int value);
+void qdev_set_prop_ptr(DeviceState *dev, const char *name, void *value);
+
+/* Configure device after init.   */
+void qdev_connect_irq(DeviceState *dev, int n, qemu_irq irq);
+void qdev_mmio_map(DeviceState *dev, int n, target_phys_addr_t addr);
+qemu_irq qdev_get_irq_sink(DeviceState *dev, int n);
+
+
+/*** Device API.  ***/
+
+typedef void (*qdev_initfn)(DeviceState *dev, void *opaque);
+
+DeviceType *qdev_register(const char *name, int size, qdev_initfn init,
+                          void *opaque);
+
+/* Register device properties.  */
+void qdev_init_mmio(DeviceState *dev, target_phys_addr_t size, int iofunc);
+void qdev_init_mmio_cb(DeviceState *dev, target_phys_addr_t size,
+                       mmio_mapfunc cb);
+void qdev_init_irq(DeviceState *dev, qemu_irq *p);
+void qdev_pass_irq(DeviceState *dev, DeviceState *target);
+void qdev_init_irq_sink(DeviceState *dev, qemu_irq_handler handler, int nirq);
+
+CharDriverState *qdev_init_chardev(DeviceState *dev);
+
+void *qdev_get_bus(DeviceState *dev);
+int qdev_get_prop_int(DeviceState *dev, const char *name);
+void *qdev_get_prop_ptr(DeviceState *dev, const char *name);
+
+/* Helper function for creating devices with no special properties.  */
+DeviceState *qdev_create_varargs(const char *name,
+                                 target_phys_addr_t addr, ...);
+static inline DeviceState *qdev_create_simple(const char *name,
+                                              target_phys_addr_t addr,
+                                              qemu_irq irq)
+{
+    return qdev_create_varargs(name, addr, irq, NULL);
+}
+
+#ifdef DEVICE_NAME
+void glue(DEVICE_NAME, _register)(void);
+#endif
+
+#endif
diff --git a/qemu-common.h b/qemu-common.h
index c90c3e3..e792e94 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -178,6 +178,7 @@ typedef struct PCIDevice PCIDevice;
 typedef struct SerialState SerialState;
 typedef struct IRQState *qemu_irq;
 struct pcmcia_card_s;
+typedef struct DeviceState DeviceState;
 
 /* CPU save/load.  */
 void cpu_save(QEMUFile *f, void *opaque);
diff --git a/sysemu.h b/sysemu.h
index 9bb9fbc..24c9fe6 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -259,4 +259,6 @@ int get_param_value(char *buf, int buf_size,
                     const char *tag, const char *str);
 int check_params(const char * const *params, const char *str);
 
+void register_devices(void);
+
 #endif
diff --git a/vl.c b/vl.c
index 867111c..b21ea19 100644
--- a/vl.c
+++ b/vl.c
@@ -5904,6 +5904,8 @@ int main(int argc, char **argv, char **envp)
         }
     }
 
+    register_devices();
+
     machine->init(ram_size, vga_ram_size, boot_devices,
                   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
 

[-- Attachment #3: patch.qdev-scsi --]
[-- Type: text/x-diff, Size: 1747 bytes --]

commit 82ec0ee7bed3280952cab897810c2fe00b0abff2
Author: Paul Brook <paul@codesourcery.com>
Date:   Mon May 4 17:13:11 2009 +0100

    qdev scsi bus infrastructure
    
    Signed-off-by: Paul Brook <paul@codesourcery.com>

diff --git a/hw/qdev.c b/hw/qdev.c
index 868ba75..7874049 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -309,3 +309,21 @@ BlockDriverState *qdev_init_bdrv(DeviceState *dev, BlockInterfaceType type)
     }
     return drives_table[index].bdrv;
 }
+
+static int next_scsi_bus;
+
+/* Create a scsi bus, and attach devices to it.  */
+void qdev_init_scsi(DeviceState *dev, SCSIAttachFn attach)
+{
+   int bus = next_scsi_bus++;
+   int unit;
+   int index;
+
+   for (unit = 0; unit < MAX_SCSI_DEVS; unit++) {
+       index = drive_get_index(IF_SCSI, bus, unit);
+       if (index == -1) {
+           continue;
+       }
+       attach(dev, drives_table[index].bdrv, unit);
+   }
+}
diff --git a/hw/qdev.h b/hw/qdev.h
index 08b2713..79eb22f 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -54,6 +54,7 @@ qemu_irq qdev_get_irq_sink(DeviceState *dev, int n);
 /*** Device API.  ***/
 
 typedef void (*qdev_initfn)(DeviceState *dev, void *opaque);
+typedef void (*SCSIAttachFn)(void *opaque, BlockDriverState *bdrv, int unit);
 
 DeviceType *qdev_register(const char *name, int size, qdev_initfn init,
                           void *opaque);
@@ -65,6 +66,7 @@ void qdev_init_mmio_cb(DeviceState *dev, target_phys_addr_t size,
 void qdev_init_irq(DeviceState *dev, qemu_irq *p);
 void qdev_pass_irq(DeviceState *dev, DeviceState *target);
 void qdev_init_irq_sink(DeviceState *dev, qemu_irq_handler handler, int nirq);
+void qdev_init_scsi(DeviceState *dev, SCSIAttachFn attach);
 
 CharDriverState *qdev_init_chardev(DeviceState *dev);
 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-05 11:31 [Qemu-devel] [RFC] New device API Paul Brook
@ 2009-05-05 15:56 ` Blue Swirl
  2009-05-05 16:17   ` Paul Brook
  2009-05-05 18:22 ` Anthony Liguori
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Blue Swirl @ 2009-05-05 15:56 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On 5/5/09, Paul Brook <paul@codesourcery.com> wrote:
> The attached patch is my attempt at a new internal API for device creation in
>  qemu.
>
>  The long term goal here is fully dynamic machine construction, i.e. a user
>  supplied configuration file/tree rather than the current hardcoded board
>  configs.
>
>  As a first step towards this, I've implemented an API that allows devices to
>  be created and configured without requiring device specific knowledge.
>
>  There are two sides to this API.  The device side allows a device to
>  register/configure functionality (e.g. MMIO regions, IRQs, character devices,
>  etc). Most of the infrastructure for this is already present, we just need to
>  configure it consistently, rather than on an ad-hoc basis. I expect this API
>  to remain largely unchanged[1].
>
>  The board side allows creation on configuration of devices. In the medium term
>  I expect this to go away, and be replaced by data driven configuration.
>
>  I also expect that this device abstraction will let itself to a system bus
>  model to solve many of the IOMMU/DMA/address mapping issues that qemu
>  currently can't handle.
>
>  There are still a few rough edges. Busses aren't handled in a particularly
>  consistent way, PCI isn't particularly well integrated, and the
>  implementation of things like qdev_get_chardev is an ugly hack mapping onto
>  current commandline options. However I believe the device side API is fairly
>  solid, and is a necessary prerequisite for fixing the bigger configuration
>  problem.
>
>  I've converted a bunch of devices, anyone interested in seeing how it fits
>  together in practice can pull from
>   git://repo.or.cz/qemu/pbrook.git qdev
>
>  It you have objections to or suggestion about this approach please speak up
>  now, but please bear in ming that this code is still somewhat in flux and the
>  caveats mentioned above.

FSF address is old.

About "int" property, I'd use a wider type to support 64 bit properties.

I don't think the ptr property can be used to construct OpenFirmware
properties, because the length of the data is not handled.

What happens if you try to register two devices of the same type, can
you identify them somehow? In OF, this is handed with by adding @ and
address.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-05 15:56 ` Blue Swirl
@ 2009-05-05 16:17   ` Paul Brook
  2009-05-05 16:26     ` Blue Swirl
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Brook @ 2009-05-05 16:17 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

> About "int" property, I'd use a wider type to support 64 bit properties.

Ok.

> I don't think the ptr property can be used to construct OpenFirmware
> properties, because the length of the data is not handled.

This is intended to be used to link between devices. An OF implementation 
would probably use a phandle. I may rename to device/object to clarify.

The ESP DMA bits where we pass function pointers are a temporary hack, and 
should go away.

> What happens if you try to register two devices of the same type, can
> you identify them somehow? In OF, this is handed with by adding @ and
> address.

I guess I'll either use an informative suffix (like OF/FDT), or have separate 
type and instance name.

Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-05 16:17   ` Paul Brook
@ 2009-05-05 16:26     ` Blue Swirl
  2009-05-05 16:35       ` Paul Brook
  0 siblings, 1 reply; 27+ messages in thread
From: Blue Swirl @ 2009-05-05 16:26 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On 5/5/09, Paul Brook <paul@codesourcery.com> wrote:
> > About "int" property, I'd use a wider type to support 64 bit properties.
>
>
> Ok.
>
>
>  > I don't think the ptr property can be used to construct OpenFirmware
>  > properties, because the length of the data is not handled.
>
>
> This is intended to be used to link between devices. An OF implementation
>  would probably use a phandle. I may rename to device/object to clarify.

Still, it would be useful to have a way for the devices to be able to
export strings and blobs to the device tree.

>  The ESP DMA bits where we pass function pointers are a temporary hack, and
>  should go away.
>
>
>  > What happens if you try to register two devices of the same type, can
>  > you identify them somehow? In OF, this is handed with by adding @ and
>  > address.
>
>
> I guess I'll either use an informative suffix (like OF/FDT), or have separate
>  type and instance name.
>
>
>  Paul
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-05 16:26     ` Blue Swirl
@ 2009-05-05 16:35       ` Paul Brook
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Brook @ 2009-05-05 16:35 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

> >  > I don't think the ptr property can be used to construct OpenFirmware
> >  > properties, because the length of the data is not handled.
> >
> > This is intended to be used to link between devices. An OF implementation
> >  would probably use a phandle. I may rename to device/object to clarify.
>
> Still, it would be useful to have a way for the devices to be able to
> export strings and blobs to the device tree.

Right. ptr is the wrong way to represent those though :-)

Note that I don't intend for devices to modify the configuration data. qemu 
may want to manipulate it bit to support commandline driven machine 
modifications (like -net nic), but that bit doesn't exist yet.

Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-05 11:31 [Qemu-devel] [RFC] New device API Paul Brook
  2009-05-05 15:56 ` Blue Swirl
@ 2009-05-05 18:22 ` Anthony Liguori
  2009-05-05 22:42   ` Edgar E. Iglesias
  2009-05-06  0:52   ` Paul Brook
  2009-05-05 22:25 ` Edgar E. Iglesias
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 27+ messages in thread
From: Anthony Liguori @ 2009-05-05 18:22 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul Brook wrote:
> The attached patch is my attempt at a new internal API for device creation in 
> qemu.
>   

Instead of recreating constructors, I think we should just use GCC's 
constructor attribute.  This gives us ordering which will be important 
when dealing with buses.

I think the layering is not quite right with qdev.

Not all devices fit into the parameters of register_mmio/connect_irq.  
When dealing with bus devices (like PCI devices), I think you really 
have to model the constructs that the bus expose.  Note, this starts to 
look very similar to the Linux kernel's layered device model.

For instance, the following makes sense to me (from an x86 perspective):

Device              <- for devices connected directly to the 
northbridge/southbridge
  - raise_interrupt(0..255)
  - lower_interrupt(0..255)
  - register_mmio
  - register_pio
  - read_memory
  - write_memory


A PCI controller is a Device, but it introduces the concept of 
PCIDevice.  This is:

PCIDevice
  - raise_link(A|B|C|D)
  - lower_link(A|B|C|D)
  - register_io_region(IO|MEM, BAR, size)
  - read_memory
  - write_memory

A SCSI controller is a PCIDevice, but it introduces the concept of 
SCSIDevice.  This is:

SCSIDevice
  - send_cdb()
  - recv_cdb()

And so forth.  Virtio is a bus, ISA is a bus, etc. etc.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-05 11:31 [Qemu-devel] [RFC] New device API Paul Brook
  2009-05-05 15:56 ` Blue Swirl
  2009-05-05 18:22 ` Anthony Liguori
@ 2009-05-05 22:25 ` Edgar E. Iglesias
  2009-05-08  1:54 ` Zachary Amsden
  2009-05-08  5:27 ` Marcelo Tosatti
  4 siblings, 0 replies; 27+ messages in thread
From: Edgar E. Iglesias @ 2009-05-05 22:25 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On Tue, May 05, 2009 at 12:31:09PM +0100, Paul Brook wrote:
> The attached patch is my attempt at a new internal API for device creation in 
> qemu.

>From my perspective this looks quite nice.

Like Blue said, 64bit properties would be nice. Also the _varargs method
wasn't very intuitive but that's a small detail.

I'm looking forward to seeing more of this.

Cheers

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-05 18:22 ` Anthony Liguori
@ 2009-05-05 22:42   ` Edgar E. Iglesias
  2009-05-06  0:52   ` Paul Brook
  1 sibling, 0 replies; 27+ messages in thread
From: Edgar E. Iglesias @ 2009-05-05 22:42 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paul Brook, qemu-devel

On Tue, May 05, 2009 at 01:22:45PM -0500, Anthony Liguori wrote:
> Paul Brook wrote:
>> The attached patch is my attempt at a new internal API for device creation 
>> in qemu.
>>   
>
> Instead of recreating constructors, I think we should just use GCC's 
> constructor attribute.  This gives us ordering which will be important when 
> dealing with buses.
>
> I think the layering is not quite right with qdev.
>
> Not all devices fit into the parameters of register_mmio/connect_irq.  When 
> dealing with bus devices (like PCI devices), I think you really have to 
> model the constructs that the bus expose.  Note, this starts to look very 
> similar to the Linux kernel's layered device model.
>
> For instance, the following makes sense to me (from an x86 perspective):
>
> Device              <- for devices connected directly to the 
> northbridge/southbridge
>  - raise_interrupt(0..255)
>  - lower_interrupt(0..255)
>  - register_mmio
>  - register_pio
>  - read_memory
>  - write_memory

Yes, I guess the memory access indirection will be useful once there is
better bus modeling. Another thing that could be useful is a generic
datapath between devices. For example to interconnect DMA controllers
with peripherals through side buses. 

These features can probably be added incrementally though.

Cheers

>
>
> A PCI controller is a Device, but it introduces the concept of PCIDevice.  
> This is:
>
> PCIDevice
>  - raise_link(A|B|C|D)
>  - lower_link(A|B|C|D)
>  - register_io_region(IO|MEM, BAR, size)
>  - read_memory
>  - write_memory
>
> A SCSI controller is a PCIDevice, but it introduces the concept of 
> SCSIDevice.  This is:
>
> SCSIDevice
>  - send_cdb()
>  - recv_cdb()
>
> And so forth.  Virtio is a bus, ISA is a bus, etc. etc.
>
> Regards,
>
> Anthony Liguori
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-05 18:22 ` Anthony Liguori
  2009-05-05 22:42   ` Edgar E. Iglesias
@ 2009-05-06  0:52   ` Paul Brook
  2009-05-06  1:04     ` Paul Brook
  1 sibling, 1 reply; 27+ messages in thread
From: Paul Brook @ 2009-05-06  0:52 UTC (permalink / raw)
  To: qemu-devel

> > The attached patch is my attempt at a new internal API for device
> > creation in qemu.
>
> Instead of recreating constructors, I think we should just use GCC's
> constructor attribute.  This gives us ordering which will be important
> when dealing with buses.

The reason I'm not using constructors is because you have to workaround 
ordering issues. All constructors are run before main(), so there's a very 
limited amount they can actually do, and constructor priorities are not 
available on all hosts.

> I think the layering is not quite right with qdev.
>
> Not all devices fit into the parameters of register_mmio/connect_irq.
> When dealing with bus devices (like PCI devices), I think you really
> have to model the constructs that the bus expose.  Note, this starts to
> look very similar to the Linux kernel's layered device model.
> 
> For instance, the following makes sense to me (from an x86 perspective):
> <snip>

Hmm, I'll think about this a bit.

Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-06  0:52   ` Paul Brook
@ 2009-05-06  1:04     ` Paul Brook
  2009-05-06 13:35       ` Anthony Liguori
  2009-05-09 20:55       ` Anthony Liguori
  0 siblings, 2 replies; 27+ messages in thread
From: Paul Brook @ 2009-05-06  1:04 UTC (permalink / raw)
  To: qemu-devel

On Wednesday 06 May 2009, Paul Brook wrote:
> > > The attached patch is my attempt at a new internal API for device
> > > creation in qemu.
> >
> > Instead of recreating constructors, I think we should just use GCC's
> > constructor attribute.  This gives us ordering which will be important
> > when dealing with buses.
>
> The reason I'm not using constructors is because you have to workaround
> ordering issues. All constructors are run before main(), so there's a very
> limited amount they can actually do, and constructor priorities are not
> available on all hosts.

Oh, the other thing is that constructors don't work when you put objects in a 
static library.  You need am explicit dependency to pull in objects.

Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-06  1:04     ` Paul Brook
@ 2009-05-06 13:35       ` Anthony Liguori
  2009-05-09 20:55       ` Anthony Liguori
  1 sibling, 0 replies; 27+ messages in thread
From: Anthony Liguori @ 2009-05-06 13:35 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul Brook wrote:
> On Wednesday 06 May 2009, Paul Brook wrote:
>   
>>>> The attached patch is my attempt at a new internal API for device
>>>> creation in qemu.
>>>>         
>>> Instead of recreating constructors, I think we should just use GCC's
>>> constructor attribute.  This gives us ordering which will be important
>>> when dealing with buses.
>>>       
>> The reason I'm not using constructors is because you have to workaround
>> ordering issues. All constructors are run before main(), so there's a very
>> limited amount they can actually do, and constructor priorities are not
>> available on all hosts.
>>     

I was going to make an argument that you want to have multiple 
constructors in a single file.  However, I think this is wrong.  I think 
you want to make sure that you only register one device per file.

> Oh, the other thing is that constructors don't work when you put objects in a 
> static library.  You need am explicit dependency to pull in objects.
>   

I think I'd be happier if we just made it a little less connected.  So, 
something like:

static DeviceType *qdev_register(const char *name, int size, qdev_initfn init,
                                 void *opaque)
{
}

device_init(qdev_register);

Then:

#! /bin/sh
# Call device init functions.

file="$1"
shift
devices="$@"
echo '/* Generated by gen_devices.sh */' > $file
echo '#include "sysemu.h"' >> $file
echo "void register_devices(void)" >> $file
echo "{" >> $file
for x in $devices ; do
    sed -e 's:$device_init(\(.*)\);$:#qdevice \1:g' | grep '#qdevice' | cut -f2- -d' ' | while read DEVICE_INIT; do
        echo "{ extern void qemu_do_device_init_${DEVICE_INIT}(void); qemu_do_device_init_${DEVICE_INIT}(); }"
    done
done
echo "}" >> $file

And:

#define device_init(func) \
void qemu_do_device_init ## func (void) { \
    func(); \
}

This requires that device_init() functions always are unique names.  It also means we can switch to constructors down the road if we wanted to and that problem goes away.  Also, it introduces a saner way to introduce priority.  We can have device_init(), pci_device_init(), virtio_device_init(), etc.

I haven't tried any of this code BTW.

Regards,

Anthony Liguori


> Paul
>
>
>   

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-05 11:31 [Qemu-devel] [RFC] New device API Paul Brook
                   ` (2 preceding siblings ...)
  2009-05-05 22:25 ` Edgar E. Iglesias
@ 2009-05-08  1:54 ` Zachary Amsden
  2009-05-08 11:28   ` Paul Brook
  2009-05-08 13:47   ` Anthony Liguori
  2009-05-08  5:27 ` Marcelo Tosatti
  4 siblings, 2 replies; 27+ messages in thread
From: Zachary Amsden @ 2009-05-08  1:54 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul Brook wrote:
> The attached patch is my attempt at a new internal API for device creation in 
> qemu.
> 
> The long term goal here is fully dynamic machine construction, i.e. a user 
> supplied configuration file/tree rather than the current hardcoded board 
> configs.
> 
> As a first step towards this, I've implemented an API that allows devices to 
> be created and configured without requiring device specific knowledge.
> 
> There are two sides to this API.  The device side allows a device to 
> register/configure functionality (e.g. MMIO regions, IRQs, character devices, 
> etc). Most of the infrastructure for this is already present, we just need to 
> configure it consistently, rather than on an ad-hoc basis. I expect this API 
> to remain largely unchanged[1].
> 
> The board side allows creation on configuration of devices. In the medium term 
> I expect this to go away, and be replaced by data driven configuration.
> 
> I also expect that this device abstraction will let itself to a system bus 
> model to solve many of the IOMMU/DMA/address mapping issues that qemu 
> currently can't handle.
> 
> There are still a few rough edges. Busses aren't handled in a particularly 
> consistent way, PCI isn't particularly well integrated, and the 
> implementation of things like qdev_get_chardev is an ugly hack mapping onto 
> current commandline options. However I believe the device side API is fairly 
> solid, and is a necessary prerequisite for fixing the bigger configuration 
> problem.


I think the general direction is good, and this is sorely needed, but I
think having a fixed / static device struct isn't flexible enough to
handle the complexities of multiple device / bus types - for example,
MSI-X could add tons of IRQ vectors to a device, or complex devices
could have tons of MMIO regions.

I think a more flexible scheme would be to have a common device header,
and fixed substructures which are added as needed to a device.

For example, showing exploded embedded types / initialized properties.
This isn't intended to be a realistic device, or even real C code.

struct RocketController
{
    struct Device {
        struct DeviceType *base_class;
	const char *instance_name;
        DeviceProperty *configuration_data;
        struct DeviceExtension *next;
    } dev;
    struct PCIExtension {
        struct DeviceExtension header {
            int type = DEV_EXT_PCI;
            struct DeviceExtension *next;
        }
        PCIRegs regs;
        PCIFunction *registered_functions;
        struct PCIExtension *pci_next;
    } pci;
    struct MMIOExtension {
        struct DeviceExtension header {
            int type = DEV_EXT_MMIO;
            struct DeviceExtension *next;
        }
        target_phys_addr_t addr;
        target_phys_addr_t size;
        mmio_mapfunc cb;
    } mmio;
    struct IRQExtension {
        struct DeviceExtension header {
            int type = DEV_EXT_IRQ;
            struct DeviceExtension *next;
        }
        int irq;
        int type = DEV_INTR_LEVEL | DEV_INTR_MSI;
    } irqs [ROCKET_CONTROLLER_MSIX_IRQS+1];
    struct MSIXExtension {
        struct DeviceExtension header {
            int type = DEV_EXT_MSIX;
            struct DeviceExtension *next;
        }
        struct MMIOExtension *table_bar;
        target_phys_addr_t table_offset;
        struct MMIOExtension *pba_bar;
        target_phys_addr_t pba_offset;
    } msix;
    struct RocketFuelControllerStatus {
        struct DeviceExtension header {
            int type = DEV_EXT_PRIVATE;
            struct DeviceExtension *next;
            suspend_callback_fn *foo;
            resume_callback_fn *goo;
        }
        ...
    } private;
}


Now it's possible to define cool macros to walk an arbitrary device
structure (as the extensions are linked by the next field):

#define dev_for_each_ext(dev,ext) \
 for (ext = dev->next; ext; ext = ext->next)

#define dev_for_each_irq(dev,ext) \
  dev_for_each_ext(dev,ext) if (ext->type == DEV_EXT_IRQ)

inline Bool dev_supports_pci(struct Device *dev)
{
   struct DeviceExtension *ext;
   dev_for_each_ext(dev,ext)
       if (ext->type == DEV_EXT_PCI)
          return TRUE;
   return FALSE;
}

You could also use offsets instead of a next pointer (might allow more
space efficient packing of a backpointer to the device struct proper).

Device configuration would then be something like

qdev_add_pci(dev, &rocket->pci);
for (i = 0; i < NUM_ROCKET_IRQS; i++)
   qdev_add_intr(dev, &rocket->intr[i]);
qdev_add_msix(dev, &rocket->msix);
qdev_add_private_data(dev, &rocket->private, rocket_launch, rocket_land);

And of course you can skip regions that aren't configured (MSI-X, might
for example have a configuration variable to disable).

The config issue is certainly a very important problem to address.  I
think it should be noted there are three types of configuration data,
all of which are ontologically different.  There are:

1) Static per-device configuration choices, such as feature enable /
disable, framebuffer size, PCI vs. ISA bus representation.  These are
fixed choices over the lifetime on the VM.  In the above, they would be
represented as DeviceProperty configuration strings.

2) Dynamic per-device data.  Things such as PCI bus numbers, IRQs or
MMIO addresses.  These are dynamic and change, but have fixed, defined
layouts.  They may have to be stored in an intermediate representation
(a configuration file or wire protocol) for suspend / resume or
migration.  In the above, it would be possible to write handlers for
suspend / resume that operate on a generic device representation,
knowing to call specific handlers for PCI, etc.  Would certainly
mitigate a lot of bug potential in the dangerous intersection of ad-hoc
device growth and complex migration / suspend code.

3) Implementation specific data.  Pieces of data which are required for
a particular realization of a device on the host, such as file
descriptors, call back functions, authentication data, network routing.
 This data is not visible at this layer, nor do I think it should be.
Since devices may move, and virtual machines may migrate, possibly
across widely differing platforms, there could be an entirely different
realization of a device (OSS vs. ALSA sound as a trivial example).  You
may even want to dynamically change these on a running VM (SDL to VNC is
a good example).

How to represent #3 is not entirely clear, nor is it clear how to parse
the arguments and data required, as it is ad-hoc and necessarily
implementation dependent.  The dynamic changing could be done via
monitor commands or something like it.  It should be entirely possible
to switch my SVGA backend from SDL local rendering to using VNC at
run-time, for example, as long as I supply any additional configuration
data needed for VNC.

I think any attempt to broach the config problem has to realize these
distinctions in whatever namespace it uses if it is to be successful.

How to represent this of course will require more discussion.

Zach

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-05 11:31 [Qemu-devel] [RFC] New device API Paul Brook
                   ` (3 preceding siblings ...)
  2009-05-08  1:54 ` Zachary Amsden
@ 2009-05-08  5:27 ` Marcelo Tosatti
  2009-05-08 10:44   ` Paul Brook
  2009-05-28 13:53   ` Markus Armbruster
  4 siblings, 2 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2009-05-08  5:27 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul,

On Tue, May 05, 2009 at 12:31:09PM +0100, Paul Brook wrote:
> The attached patch is my attempt at a new internal API for device creation in 
> qemu.
> 
> The long term goal here is fully dynamic machine construction, i.e. a user 
> supplied configuration file/tree rather than the current hardcoded board 
> configs.
> 
> As a first step towards this, I've implemented an API that allows devices to 
> be created and configured without requiring device specific knowledge.
> 
> There are two sides to this API.  The device side allows a device to 
> register/configure functionality (e.g. MMIO regions, IRQs, character devices, 
> etc). Most of the infrastructure for this is already present, we just need to 
> configure it consistently, rather than on an ad-hoc basis. I expect this API 
> to remain largely unchanged[1].
> 
> The board side allows creation on configuration of devices. In the medium term 
> I expect this to go away, and be replaced by data driven configuration.
> 
> I also expect that this device abstraction will let itself to a system bus 
> model to solve many of the IOMMU/DMA/address mapping issues that qemu 
> currently can't handle.
> 
> There are still a few rough edges. Busses aren't handled in a particularly 
> consistent way, PCI isn't particularly well integrated, and the 
> implementation of things like qdev_get_chardev is an ugly hack mapping onto 
> current commandline options. However I believe the device side API is fairly 
> solid, and is a necessary prerequisite for fixing the bigger configuration 
> problem.
> 
> I've converted a bunch of devices, anyone interested in seeing how it fits 
> together in practice can pull from
>   git://repo.or.cz/qemu/pbrook.git qdev
> 
> It you have objections to or suggestion about this approach please speak up 
> now, but please bear in ming that this code is still somewhat in flux and the 
> caveats mentioned above.
> 
> Paul
> 
> [1] I subscribe to the linux kernel theory that stable internal APIs are a 
> coincidence, not a feature. i.e. a consistent internal API is good, but that 
> it should be changed whenever technically desirable. I have no interest in 
> maintaining a fixed API for the benefit of third parties.

Makes lots of sense. And also its perfectly fine for the API to evolve.

> commit 11c765848af6cb345a81f2722f141b305538182d
> Author: Paul Brook <paul@codesourcery.com>
> Date:   Mon May 4 17:13:08 2009 +0100
> 
>     Basic qdev infrastructure.
>     
>     Signed-off-by: Paul Brook <paul@codesourcery.com>
> 
> diff --git a/Makefile.target b/Makefile.target
> index f735105..a35e724 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -498,6 +498,7 @@ endif #CONFIG_BSD_USER
>  # System emulator target
>  ifndef CONFIG_USER_ONLY
>  
> +DEVICES=
>  OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o dma-helpers.o
>  # virtio has to be here due to weird dependency between PCI and virtio-net.
>  # need to fix this properly
> @@ -686,6 +687,13 @@ ifeq ($(TARGET_BASE_ARCH), m68k)
>  OBJS+= an5206.o mcf5206.o mcf_uart.o mcf_intc.o mcf5208.o mcf_fec.o
>  OBJS+= m68k-semi.o dummy_m68k.o
>  endif
> +
> +DEVICE_OBJS=$(addsuffix .o,$(DEVICES))
> +$(DEVICE_OBJS): CPPFLAGS+=-DDEVICE_NAME=$(subst -,_,$(@:.o=))
> +
> +OBJS+=$(DEVICE_OBJS)
> +OBJS+=devices.o qdev.o
> +
>  ifdef CONFIG_GDBSTUB
>  OBJS+=gdbstub.o gdbstub-xml.o
>  endif
> @@ -752,6 +760,9 @@ endif
>  qemu-options.h: $(SRC_PATH)/qemu-options.hx
>  	$(call quiet-command,sh $(SRC_PATH)/hxtool -h < $< > $@,"  GEN   $(TARGET_DIR)$@")
>  
> +devices.c: gen_devices.sh Makefile.target config.mak
> +	$(call quiet-command,$(SHELL) $(SRC_PATH)/gen_devices.sh $@ $(subst -,_,$(DEVICES)),"  GEN   $(TARGET_DIR)$@")
> +
>  clean:
>  	rm -f *.o *.a *~ $(PROGS) nwfpe/*.o fpu/*.o qemu-options.h gdbstub-xml.c
>  	rm -f *.d */*.d tcg/*.o
> diff --git a/gen_devices.sh b/gen_devices.sh
> new file mode 100644
> index 0000000..1d6ec12
> --- /dev/null
> +++ b/gen_devices.sh
> @@ -0,0 +1,17 @@
> +#! /bin/sh
> +# Call device init functions.
> +
> +file="$1"
> +shift
> +devices="$@"
> +echo '/* Generated by gen_devices.sh */' > $file
> +echo '#include "sysemu.h"' >> $file
> +for x in $devices ; do
> +    echo "void ${x}_register(void);" >> $file
> +done
> +echo "void register_devices(void)" >> $file
> +echo "{" >> $file
> +for x in $devices ; do
> +    echo "    ${x}_register();" >> $file
> +done
> +echo "}" >> $file
> diff --git a/hw/qdev.c b/hw/qdev.c
> new file mode 100644
> index 0000000..eb8c172
> --- /dev/null
> +++ b/hw/qdev.c
> @@ -0,0 +1,294 @@
> +/*
> + *  Dynamic device configuration and creation.
> + *
> + *  Copyright (c) 2009 CodeSourcery
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + */
> +
> +#include "hw.h"
> +#include "qdev.h"
> +#include "sysemu.h"
> +
> +#include <assert.h>
> +
> +struct DeviceProperty
> +{
> +    const char *name;
> +    union {
> +        int i;
> +        void *ptr;
> +    } value;
> +    DeviceProperty *next;
> +};
> +
> +struct DeviceType
> +{
> +    const char *name;
> +    qdev_initfn init;
> +    void *opaque;
> +    int size;
> +    DeviceType *next;
> +};
> +
> +static DeviceType *device_type_list;

Perhaps "device driver" is more suitable? Or "device emulator"??

> +
> +/* Register a new device type.  */
> +DeviceType *qdev_register(const char *name, int size, qdev_initfn init,
> +                          void *opaque)
> +{
> +    DeviceType *t;
> +
> +    assert(size >= sizeof(DeviceState));
> +
> +    t = qemu_mallocz(sizeof(DeviceType));
> +    t->next = device_type_list;
> +    device_type_list = t;
> +    t->name = qemu_strdup(name);
> +    t->size = size;
> +    t->init = init;
> +    t->opaque = opaque;
> +
> +    return t;
> +}

void virtio_blk_register(void)
{
    pci_qdev_register("virtio-blk", sizeof(VirtIOBlock), virtio_blk_init);
}

So you'd expect all "device types" to be registered by the time the
actual machine initialization starts. Similar to modular device drivers
in Linux.

    bs = qdev_init_bdrv(&pci_dev->qdev, IF_VIRTIO);
    s->vdev.get_config = virtio_blk_update_config;

And eventually you'd move qdev_init_bdrv (and all BlockDriverState
knowledge) to happen somewhere else other than device emulation code?

> +
> +/* Create a new device.  This only initializes the device state structure
> +   and allows properties to be set.  qdev_init should be called to
> +   initialize the actual device emulation.  */
> +DeviceState *qdev_create(const char *name)
> +{
> +    DeviceType *t;
> +    DeviceState *dev;
> +
> +    for (t = device_type_list; t; t = t->next) {
> +        if (strcmp(t->name, name) == 0) {
> +            break;
> +        }
> +    }
> +    if (!t) {
> +        fprintf(stderr, "Unknown device '%s'\n", name);
> +        exit(1);
> +    }
> +
> +    dev = qemu_mallocz(t->size);
> +    dev->name = name;
> +    dev->type = t;
> +    return dev;
> +}
> +
> +/* Initialize a device.  Device properties should be set before calling
> +   this function.  IRQs and MMIO regions should be connected/mapped after
> +   calling this function.  */
> +void qdev_init(DeviceState *dev)
> +{
> +    dev->type->init(dev, dev->type->opaque);
> +}
>
> +
> +void qdev_connect_irq(DeviceState *dev, int n, qemu_irq irq)
> +{
> +    assert(n >= 0 && n < dev->num_irq);
> +    dev->irqs[n] = 0;
> +    if (dev->irqp[n]) {
> +        *dev->irqp[n] = irq;
> +    }
> +}
> +
> +void qdev_mmio_map(DeviceState *dev, int n, target_phys_addr_t addr)
> +{
> +    assert(n >= 0 && n < dev->num_mmio);
> +
> +    if (dev->mmio[n].addr == addr) {
> +        /* ??? region already mapped here.  */
> +        return;
> +    }
> +    if (dev->mmio[n].addr != (target_phys_addr_t)-1) {
> +        /* Unregister previous mapping.  */
> +        cpu_register_physical_memory(dev->mmio[n].addr, dev->mmio[n].size,
> +                                     IO_MEM_UNASSIGNED);
> +    }
> +    dev->mmio[n].addr = addr;
> +    if (dev->mmio[n].cb) {
> +        dev->mmio[n].cb(dev, addr);
> +    } else {
> +        cpu_register_physical_memory(addr, dev->mmio[n].size,
> +                                     dev->mmio[n].iofunc);
> +    }
> +}
> +
> +void qdev_set_bus(DeviceState *dev, void *bus)
> +{
> +    assert(!dev->bus);
> +    dev->bus = bus;
> +}
> +
> +static DeviceProperty *create_prop(DeviceState *dev, const char *name)
> +{
> +    DeviceProperty *prop;
> +
> +    /* TODO: Check for duplicate properties.  */
> +    prop = qemu_mallocz(sizeof(*prop));
> +    prop->name = qemu_strdup(name);
> +    prop->next = dev->props;
> +    dev->props = prop;
> +
> +    return prop;
> +}
> +
> +void qdev_set_prop_int(DeviceState *dev, const char *name, int value)
> +{
> +    DeviceProperty *prop;
> +
> +    prop = create_prop(dev, name);
> +    prop->value.i = value;
> +}
> +
> +void qdev_set_prop_ptr(DeviceState *dev, const char *name, void *value)
> +{
> +    DeviceProperty *prop;
> +
> +    prop = create_prop(dev, name);
> +    prop->value.ptr = value;
> +}
> +
> +
> +qemu_irq qdev_get_irq_sink(DeviceState *dev, int n)
> +{
> +    assert(n >= 0 && n < dev->num_irq_sink);
> +    return dev->irq_sink[n];
> +}
> +
> +void qdev_init_mmio(DeviceState *dev, target_phys_addr_t size, int iofunc)
> +{
> +    int n;
> +
> +    assert(dev->num_mmio < QDEV_MAX_MMIO);
> +    n = dev->num_mmio++;
> +    dev->mmio[n].addr = -1;
> +    dev->mmio[n].size = size;
> +    dev->mmio[n].iofunc = iofunc;
> +}
> +
> +void qdev_init_mmio_cb(DeviceState *dev, target_phys_addr_t size,
> +                       mmio_mapfunc cb)
> +{
> +    int n;
> +
> +    assert(dev->num_mmio < QDEV_MAX_MMIO);
> +    n = dev->num_mmio++;
> +    dev->mmio[n].addr = -1;
> +    dev->mmio[n].size = size;
> +    dev->mmio[n].cb = cb;
> +}
> +
> +/* Request an IRQ source.  The actual IRQ object may be populated later.  */
> +void qdev_init_irq(DeviceState *dev, qemu_irq *p)
> +{
> +    int n;
> +
> +    assert(dev->num_irq < QDEV_MAX_IRQ);
> +    n = dev->num_irq++;
> +    dev->irqp[n] = p;
> +}
> +
> +/* Pass IRQs from a target device.  */
> +void qdev_pass_irq(DeviceState *dev, DeviceState *target)
> +{
> +    int i;
> +    assert(dev->num_irq == 0);
> +    dev->num_irq = target->num_irq;
> +    for (i = 0; i < dev->num_irq; i++) {
> +        dev->irqp[i] = target->irqp[i];
> +    }
> +}
> +
> +/* Register device IRQ sinks.  */
> +void qdev_init_irq_sink(DeviceState *dev, qemu_irq_handler handler, int nirq)
> +{
> +    dev->num_irq_sink = nirq;
> +    dev->irq_sink = qemu_allocate_irqs(handler, dev, nirq);
> +}
> +
> +/* Get a character (serial) device interface.  */
> +CharDriverState *qdev_init_chardev(DeviceState *dev)
> +{
> +    static int next_serial;
> +    static int next_virtconsole;
> +    if (strncmp(dev->name, "virtio", 6) == 0) {
> +        return virtcon_hds[next_virtconsole++];
> +    } else {
> +        return serial_hds[next_serial++];
> +    }
> +}
> +
> +void *qdev_get_bus(DeviceState *dev)
> +{
> +    return dev->bus;
> +}
> +
> +static DeviceProperty *find_prop(DeviceState *dev, const char *name)
> +{
> +    DeviceProperty *prop;
> +
> +    for (prop = dev->props; prop; prop = prop->next) {
> +        if (strcmp(prop->name, name) == 0) {
> +            return prop;
> +        }
> +    }
> +    /* TODO: When this comes from a config file we will need to handle
> +       missing properties gracefully.  */
> +    abort();
> +}
> +
> +int qdev_get_prop_int(DeviceState *dev, const char *name)
> +{
> +    DeviceProperty *prop;
> +
> +    prop = find_prop(dev, name);
> +    return prop->value.i;
> +}
> +
> +void *qdev_get_prop_ptr(DeviceState *dev, const char *name)
> +{
> +    DeviceProperty *prop;
> +
> +    prop = find_prop(dev, name);
> +    return prop->value.ptr;
> +}
> +
> +DeviceState *qdev_create_varargs(const char *name,
> +                                 target_phys_addr_t addr, ...)
> +{
> +    DeviceState *dev;
> +    va_list va;
> +    qemu_irq irq;
> +    int n;
> +
> +    dev = qdev_create(name);
> +    qdev_init(dev);
> +    if (addr != (target_phys_addr_t)-1) {
> +        qdev_mmio_map(dev, 0, addr);
> +    }
> +    va_start(va, addr);
> +    n = 0;
> +    while (1) {
> +        irq = va_arg(va, qemu_irq);
> +        if (!irq) {
> +            break;
> +        }
> +        qdev_connect_irq(dev, n, irq);
> +        n++;
> +    }
> +    return dev;
> +}
> diff --git a/hw/qdev.h b/hw/qdev.h
> new file mode 100644
> index 0000000..08b2713
> --- /dev/null
> +++ b/hw/qdev.h
> @@ -0,0 +1,89 @@
> +#ifndef QDEV_H
> +#define QDEV_H
> +
> +#include "irq.h"
> +
> +typedef struct DeviceType DeviceType;
> +
> +#define QDEV_MAX_MMIO 5
> +#define QDEV_MAX_IRQ 32
> +
> +typedef struct DeviceProperty DeviceProperty;
> +
> +typedef void (*mmio_mapfunc)(DeviceState *dev, target_phys_addr_t addr);
> +
> +/* This structure should not be accessed directly.  We declare it here
> +   so that it can be embedded in individual device state structures.  */
> +struct DeviceState
> +{
> +    const char *name;
> +    DeviceType *type;
> +    void *bus;

I suppose parent_bus is more descriptive?.

Here you would have a tree node, eventually.

> +    int num_irq;
> +    qemu_irq irqs[QDEV_MAX_IRQ];
> +    qemu_irq *irqp[QDEV_MAX_IRQ];
> +    int num_irq_sink;
> +    qemu_irq *irq_sink;

This is somewhat complicated. So you need irqp pointers and irqs array
due to the way IRQ initialization is performed? 

> +    struct {
> +        target_phys_addr_t addr;
> +        target_phys_addr_t size;
> +        mmio_mapfunc cb;
> +        int iofunc;
> +    } mmio[QDEV_MAX_MMIO];
> +    DeviceProperty *props;
> +};
> +
> +
> +/*** Board API.  This should go away once we have a machine config file.  ***/
> +
> +DeviceState *qdev_create(const char *name);
> +void qdev_init(DeviceState *dev);
> +
> +/* Set properties between creation and init.  */
> +void qdev_set_bus(DeviceState *dev, void *bus);
> +void qdev_set_prop_int(DeviceState *dev, const char *name, int value);
> +void qdev_set_prop_ptr(DeviceState *dev, const char *name, void *value);
> +
> +/* Configure device after init.   */
> +void qdev_connect_irq(DeviceState *dev, int n, qemu_irq irq);
> +void qdev_mmio_map(DeviceState *dev, int n, target_phys_addr_t addr);
> +qemu_irq qdev_get_irq_sink(DeviceState *dev, int n);
> +
> +
> +/*** Device API.  ***/
> +
> +typedef void (*qdev_initfn)(DeviceState *dev, void *opaque);
> +
> +DeviceType *qdev_register(const char *name, int size, qdev_initfn init,
> +                          void *opaque);

Don't you want to cover savevm/restore callbacks at this level too? 

Device destruction surely (for hotunplug). Passing a structure with
callbacks would be nicer.

> +
> +/* Register device properties.  */
> +void qdev_init_mmio(DeviceState *dev, target_phys_addr_t size, int iofunc);
> +void qdev_init_mmio_cb(DeviceState *dev, target_phys_addr_t size,
> +                       mmio_mapfunc cb);
> +void qdev_init_irq(DeviceState *dev, qemu_irq *p);
> +void qdev_pass_irq(DeviceState *dev, DeviceState *target);

Can you explain how init_irq_sink/pass_irq are supposed to work?

> +void qdev_init_scsi(DeviceState *dev, SCSIAttachFn attach)
> +{
> +   int bus = next_scsi_bus++;
> +   int unit;
> +   int index;
> +
> +   for (unit = 0; unit < MAX_SCSI_DEVS; unit++) {
> +       index = drive_get_index(IF_SCSI, bus, unit);
> +       if (index == -1) {
> +           continue;
> +       }
> +       attach(dev, drives_table[index].bdrv, unit);
> +   }
> +}
> diff --git a/hw/qdev.h b/hw/qdev.h
> index 08b2713..79eb22f 100644
> --- a/hw/qdev.h
> +++ b/hw/qdev.h
> @@ -54,6 +54,7 @@ qemu_irq qdev_get_irq_sink(DeviceState *dev, int n);
>  /*** Device API.  ***/
>  
>  typedef void (*qdev_initfn)(DeviceState *dev, void *opaque);
> +typedef void (*SCSIAttachFn)(void *opaque, BlockDriverState *bdrv, int unit);
>  
>  DeviceType *qdev_register(const char *name, int size, qdev_initfn init,
>                            void *opaque);
> @@ -65,6 +66,7 @@ void qdev_init_mmio_cb(DeviceState *dev, target_phys_addr_t size,
>  void qdev_init_irq(DeviceState *dev, qemu_irq *p);
>  void qdev_pass_irq(DeviceState *dev, DeviceState *target);
>  void qdev_init_irq_sink(DeviceState *dev, qemu_irq_handler handler, int nirq);
> +void qdev_init_scsi(DeviceState *dev, SCSIAttachFn attach);
>  
>  CharDriverState *qdev_init_chardev(DeviceState *dev);
>  

Looks like a good start to me.

Markus should have more substantial comments (he's on vacation
this week). The tree driven machine initialization introduces some
complications (such as parent_bus->child device dependencies) while
handling a lot of the problems you are ignoring (eg the ->config method
is responsible for linking host device -> emulated device information,
etc).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-08  5:27 ` Marcelo Tosatti
@ 2009-05-08 10:44   ` Paul Brook
  2009-05-28 13:53   ` Markus Armbruster
  1 sibling, 0 replies; 27+ messages in thread
From: Paul Brook @ 2009-05-08 10:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: Marcelo Tosatti

> > +
> > +/* Register a new device type.  */
> > +DeviceType *qdev_register(const char *name, int size, qdev_initfn init,
> > +                          void *opaque)
> > +{
> > +    DeviceType *t;
> > +
> > +    assert(size >= sizeof(DeviceState));
> > +
> > +    t = qemu_mallocz(sizeof(DeviceType));
> > +    t->next = device_type_list;
> > +    device_type_list = t;
> > +    t->name = qemu_strdup(name);
> > +    t->size = size;
> > +    t->init = init;
> > +    t->opaque = opaque;
> > +
> > +    return t;
> > +}
>
> void virtio_blk_register(void)
> {
>     pci_qdev_register("virtio-blk", sizeof(VirtIOBlock), virtio_blk_init);
> }
>
> So you'd expect all "device types" to be registered by the time the
> actual machine initialization starts. Similar to modular device drivers
> in Linux.

They are.

>     bs = qdev_init_bdrv(&pci_dev->qdev, IF_VIRTIO);
>     s->vdev.get_config = virtio_blk_update_config;
>
> And eventually you'd move qdev_init_bdrv (and all BlockDriverState
> knowledge) to happen somewhere else other than device emulation code?

I'm not sure what you're getting at here. The whole point of qdev_init_bdrv is 
that it isolates devices from the block device configuration.

> > +/* This structure should not be accessed directly.  We declare it here
> > +   so that it can be embedded in individual device state structures.  */
> > +struct DeviceState
> > +{
> > +    const char *name;
> > +    DeviceType *type;
> > +    void *bus;
>
> I suppose parent_bus is more descriptive?.
>
> Here you would have a tree node, eventually.

I've deliberately kept the device API independent of the configuration 
structure. We need some awareness of bus-device interaction because this 
tends to define how the device interacts with the rest of the system. I don't 
want to force this onto a particular topology though.

> > +    int num_irq;
> > +    qemu_irq irqs[QDEV_MAX_IRQ];
> > +    qemu_irq *irqp[QDEV_MAX_IRQ];
> > +    int num_irq_sink;
> > +    qemu_irq *irq_sink;
>
> This is somewhat complicated. So you need irqp pointers and irqs array
> due to the way IRQ initialization is performed?

Yes. It allows single-pass device initialisation. My earlier prototype had 
multi-pass initialization, which gets confusing and often results in quire a 
bit of duplication. You end up with a first pass saying "I'm going to need X, 
Y and Z", then a second pass that actually retrieves X, Y and Z.


> > +DeviceType *qdev_register(const char *name, int size, qdev_initfn init,
> > +                          void *opaque);
>
> Don't you want to cover savevm/restore callbacks at this level too?
>
> Device destruction surely (for hotunplug). Passing a structure with
> callbacks would be nicer.

I've tried to restrict this to the minimum information needed to instantiate 
the device. Everything else can be sorted out during initialisation. This is 
related to the single-pass initialisation I mentioned above.

> > +/* Register device properties.  */
> > +void qdev_init_mmio(DeviceState *dev, target_phys_addr_t size, int
> > iofunc); +void qdev_init_mmio_cb(DeviceState *dev, target_phys_addr_t
> > size, +                       mmio_mapfunc cb);
> > +void qdev_init_irq(DeviceState *dev, qemu_irq *p);
> > +void qdev_pass_irq(DeviceState *dev, DeviceState *target);
>
> Can you explain how init_irq_sink/pass_irq are supposed to work?

There are two end to each virtual IRQ line. The source (device that raises the 
IRQ) and the sink (device that reacts to the IRQ).

qdev_init_irq sets up an IRQ source. As mentioned above the irq object itself 
is populated after device intiialization (but before the machine starts).

qdev_init_irq_sink sets up a set of IRQ sinks.

qdev_pass_irq is a hack that passes through IRQ sources from a different 
device on this device. It's equivalent to:

proxy_irq_handler(dev, n, level)
{
  qemu_set_irq(dev->irq[n], level)
}

my_device_init(dev, target)
{
  qemu_irq *proxy;
  proxy = qemu_allocate_irqs(dev, proxy_irq_handler, target->num_irq);
  for (i = 0; i < target->num_irq; i++) 
    qdev_connect_irq(target, proxy[i]);
    qdev_init_irq(dev, &dev->irq[i]);
  }
}


> Markus should have more substantial comments (he's on vacation
> this week). The tree driven machine initialization introduces some
> complications (such as parent_bus->child device dependencies) while
> handling a lot of the problems you are ignoring (eg the ->config method
> is responsible for linking host device -> emulated device information,
> etc).

I still don't believe multipass device initialization/configuration is 
necessary. I have kept tree driven initialization in mind when implementing 
the device API.

Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-08  1:54 ` Zachary Amsden
@ 2009-05-08 11:28   ` Paul Brook
  2009-05-08 13:47   ` Anthony Liguori
  1 sibling, 0 replies; 27+ messages in thread
From: Paul Brook @ 2009-05-08 11:28 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: qemu-devel

> > The attached patch is my attempt at a new internal API for device
> > creation in qemu.
>...
> I think the general direction is good, and this is sorely needed, but I
> think having a fixed / static device struct isn't flexible enough to
> handle the complexities of multiple device / bus types - for example,
> MSI-X could add tons of IRQ vectors to a device, or complex devices
> could have tons of MMIO regions.
>
> I think a more flexible scheme would be to have a common device header,
> and fixed substructures which are added as needed to a device.

I'm not sure how much benefit this gives in practice, or how much it's worth 
catering for the "same" device that can be attached to different busses. 
While e.g. both PCI and non-PCI devices have MMIO regions and IRQ sources, 
I'm not sure how much can be shared between the two. I suspect the common 
infrastructure extended as far as cpu_register_io_memory/qemu_irq (or 
equivalent), but using the same API for the bus/device interaction is more 
trouble then it's worth.

While a fancy multiple inheritance system like you describe is in interesting 
idea, It feels over-engineered for this particular problem.

> The config issue is certainly a very important problem to address.  I
> think it should be noted there are three types of configuration data,
> all of which are ontologically different.  There are:
>
> 1) Static per-device configuration choices, such as feature enable /
> disable, framebuffer size, PCI vs. ISA bus representation. 
>
> 2) Dynamic per-device data.  

I think these two largely fall out together. Once you've determined that 
you're dealing with a PCI device, all the associated dynamic PCI data follows 
directly from that.

> 3) Implementation specific data.  Pieces of data which are required for
> a particular realization of a device on the host, such as file
> descriptors, call back functions, authentication data, network routing.

We already have things like CharDevice and BlockDriverState to isolate devices 
from the host implementation. In the past this is the distinction I've made 
between user-level config and machine config.

Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-08  1:54 ` Zachary Amsden
  2009-05-08 11:28   ` Paul Brook
@ 2009-05-08 13:47   ` Anthony Liguori
  2009-05-09  1:21     ` Zachary Amsden
  1 sibling, 1 reply; 27+ messages in thread
From: Anthony Liguori @ 2009-05-08 13:47 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Paul Brook, qemu-devel

Zachary Amsden wrote:
> I think the general direction is good, and this is sorely needed, but I
> think having a fixed / static device struct isn't flexible enough to
> handle the complexities of multiple device / bus types - for example,
> MSI-X could add tons of IRQ vectors to a device, or complex devices
> could have tons of MMIO regions.
>
> I think a more flexible scheme would be to have a common device header,
> and fixed substructures which are added as needed to a device.
>
> For example, showing exploded embedded types / initialized properties.
> This isn't intended to be a realistic device, or even real C code.
>
> struct RocketController
> {
>     struct Device {
>         struct DeviceType *base_class;
> 	const char *instance_name;
>         DeviceProperty *configuration_data;
>         struct DeviceExtension *next;
>     } dev;
>     struct PCIExtension {
>         struct DeviceExtension header {
>             int type = DEV_EXT_PCI;
>             struct DeviceExtension *next;
>         }
>         PCIRegs regs;
>         PCIFunction *registered_functions;
>         struct PCIExtension *pci_next;
>     } pci;
>     struct MMIOExtension {
>         struct DeviceExtension header {
>             int type = DEV_EXT_MMIO;
>             struct DeviceExtension *next;
>         }
>         target_phys_addr_t addr;
>         target_phys_addr_t size;
>         mmio_mapfunc cb;
>     } mmio;
>     struct IRQExtension {
>         struct DeviceExtension header {
>             int type = DEV_EXT_IRQ;
>             struct DeviceExtension *next;
>         }
>         int irq;
>         int type = DEV_INTR_LEVEL | DEV_INTR_MSI;
>     } irqs [ROCKET_CONTROLLER_MSIX_IRQS+1];
>   

I think the problem with this is that it gives too much information 
about CPU constructs (MMIO/IRQs) to a device that is never connected to 
the actual CPU.

PCI devices do not have a concept of "MMIO".  They have a concept of IO 
regions.  You can have different types of IO regions and the word sizes 
that are supported depend on the width of the PCI bus.  Additionally, 
the rules about endianness of the data totally depend on the PCI 
controller, not the CPU itself.

This is where the current API fails miserably.  You cannot have a PCI 
device calling the CPU MMIO registration functions directly because 
there is no sane way to deal with endianness conversion.  Instead, the 
IO region registration has to go through the PCI bus.

Likewise, for IRQs, we should stick to the same principle.  PCI exposes 
MSI and LNK interrupts to devices.  We should have an API for devices to 
consume at the PCI level for that.

As Paul said, I don't think it's worth trying to make the same devices 
work on top of multiple busses.  I think it's asking for trouble.  
Instead, for devices that can connect to multiple busses (like ne2k), 
they can have separate register_pci_device() and register_isa_device() 
calls and then just internally abstract their chipset functions.  Then 
it just requires a small big of ISA and PCI glue code to support both 
device types.

> 1) Static per-device configuration choices, such as feature enable /
> disable, framebuffer size, PCI vs. ISA bus representation.  These are
> fixed choices over the lifetime on the VM.  In the above, they would be
> represented as DeviceProperty configuration strings.
>   

Yes.  I think this is important too.  But when we introduce this, we 
need to make sure the devices pre-register what strings they support and 
provide human consumable descriptions of what those knobs do.  
Basically, we should be able to auto extract a hardware documentation 
file from the device that describes in detail all of the supported knobs.

> 2) Dynamic per-device data.  Things such as PCI bus numbers, IRQs or
> MMIO addresses.  These are dynamic and change, but have fixed, defined
> layouts.  They may have to be stored in an intermediate representation
> (a configuration file or wire protocol) for suspend / resume or
> migration.  In the above, it would be possible to write handlers for
> suspend / resume that operate on a generic device representation,
> knowing to call specific handlers for PCI, etc.  Would certainly
> mitigate a lot of bug potential in the dangerous intersection of ad-hoc
> device growth and complex migration / suspend code.
>   

For the most part, I think the device should be unaware of these 
things.  It never needs to see it's devfn.  It should preregister what 
lnks it supports and whether it supports MSI, but it should never know 
what IRQs that actually gets routed to.

> 3) Implementation specific data.  Pieces of data which are required for
> a particular realization of a device on the host, such as file
> descriptors, call back functions, authentication data, network routing.
>  This data is not visible at this layer, nor do I think it should be.
> Since devices may move, and virtual machines may migrate, possibly
> across widely differing platforms, there could be an entirely different
> realization of a device (OSS vs. ALSA sound as a trivial example).  You
> may even want to dynamically change these on a running VM (SDL to VNC is
> a good example).
>
> How to represent #3 is not entirely clear, nor is it clear how to parse
>   

We really have three types of things and it's not entirely clear to me 
how to name them.  What we're currently calling devices are emulated 
hardware.  Additionally, we have host drivers that provide backend 
functionality.  This would include things like SDL, VNC, the tap VLAN 
driver.  We also need something like a host device.  This would be the 
front-end functionality that connects to the host driver backend.

A device registers its creation function in it's module_init().  This 
creation function will then register the fact that it's a PCI device and 
will basic information about itself in that registration.  A PCI device 
can be instantiated via a device configuration file and when that 
happens, the device will create a host device for whatever functionality 
it supports.  For a NIC, this would be a NetworkHostDevice or something 
like that.  This NetworkHostDevice would have some link to the device 
itself (via an id, handle, whatever).  A user can then create a 
NetworkHostDriver and attach the NetworkHostDevice to that driver and 
you then have a functional emulated NIC.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-08 13:47   ` Anthony Liguori
@ 2009-05-09  1:21     ` Zachary Amsden
  2009-05-09 13:36       ` Anthony Liguori
  0 siblings, 1 reply; 27+ messages in thread
From: Zachary Amsden @ 2009-05-09  1:21 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paul Brook, qemu-devel

Anthony Liguori wrote:

> This is where the current API fails miserably.  You cannot have a PCI
> device calling the CPU MMIO registration functions directly because
> there is no sane way to deal with endianness conversion.  Instead, the
> IO region registration has to go through the PCI bus.

Right, being an x86 bigot for years has isolated me from these horrors.

> As Paul said, I don't think it's worth trying to make the same devices
> work on top of multiple busses.  I think it's asking for trouble. 
> Instead, for devices that can connect to multiple busses (like ne2k),
> they can have separate register_pci_device() and register_isa_device()
> calls and then just internally abstract their chipset functions.  Then
> it just requires a small big of ISA and PCI glue code to support both
> device types.

Hmm, not sure how I made it sound like devices should work on unnatural
busses.  I simply meant there are some devices that can have ISA and PCI
 implementation, and when that is possible, it makes sense to have that
as a static configuration choice.

>> 1) Static per-device configuration choices, such as feature enable /
>> disable, framebuffer size, PCI vs. ISA bus representation.  These are
>> fixed choices over the lifetime on the VM.  In the above, they would be
>> represented as DeviceProperty configuration strings.
>>   
> 
> Yes.  I think this is important too.  But when we introduce this, we
> need to make sure the devices pre-register what strings they support and
> provide human consumable descriptions of what those knobs do. 
> Basically, we should be able to auto extract a hardware documentation
> file from the device that describes in detail all of the supported knobs.

Yes, I keep falling back to some sort of DEVICETYPE{NNN}.PARAM = "VALUE"
scheme, sort of like .vmx config files.  However, they failed horribly
in not complaining about unparsed parameters; silently ignoring config
data is wrong, and pre-registration should be required, as it stops both
silent typo failures and collision.

>> 2) Dynamic per-device data.  Things such as PCI bus numbers, IRQs or
>> MMIO addresses.  These are dynamic and change, but have fixed, defined
>> layouts.  They may have to be stored in an intermediate representation
>> (a configuration file or wire protocol) for suspend / resume or
>> migration.  In the above, it would be possible to write handlers for
>> suspend / resume that operate on a generic device representation,
>> knowing to call specific handlers for PCI, etc.  Would certainly
>> mitigate a lot of bug potential in the dangerous intersection of ad-hoc
>> device growth and complex migration / suspend code.
>>   
> 
> For the most part, I think the device should be unaware of these
> things.  It never needs to see it's devfn.  It should preregister what
> lnks it supports and whether it supports MSI, but it should never know
> what IRQs that actually gets routed to.

Ideally, but I think in practice the line will blur.. unless you have a
completely ideal bus abstraction, there will still be the need to fill
in reads from configuration space and associated device specific side
effects.


> We really have three types of things and it's not entirely clear to me how to name them.  What we're currently calling devices are emulated hardware.  Additionally, we have host drivers that provide backend functionality.  This would include things like SDL, VNC, the tap VLAN driver.  We also need something like a host device.  This would be the front-end functionality that connects to the host driver backend.

> A device registers its creation function in it's module_init().  This creation function will then register the fact that it's a PCI device and will basic information about itself in that registration.  A PCI device can be instantiated via a device configuration file and when that happens, the device will create a host device for whatever functionality it supports.  For a NIC, this would be a NetworkHostDevice or something like that.  This NetworkHostDevice would have some link to the device itself (via an id, handle, whatever).  A user can then create a NetworkHostDriver and attach the NetworkHostDevice to that driver and you then have a functional emulated NIC. 

This sounds pretty much ideal, I would say, but the details are really
in "will create a host device for whatever functionality".  Is there a
plan to lay out frontend APIs(1) for the various types of devices
(sound, pointer, net, SCSI) so we can have modularized host driver backends?

(1) With APIs being flexible, I don't mean a fixed link-type module, I
mean well-modularized code that makes the 5 architectures x 50 devices x
4 bus models x 3 host implementations less of a problem than it
currently is.

Zach

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-09  1:21     ` Zachary Amsden
@ 2009-05-09 13:36       ` Anthony Liguori
  0 siblings, 0 replies; 27+ messages in thread
From: Anthony Liguori @ 2009-05-09 13:36 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Paul Brook, qemu-devel

Zachary Amsden wrote:
> Anthony Liguori wrote:
>
>   
>> Yes.  I think this is important too.  But when we introduce this, we
>> need to make sure the devices pre-register what strings they support and
>> provide human consumable descriptions of what those knobs do. 
>> Basically, we should be able to auto extract a hardware documentation
>> file from the device that describes in detail all of the supported knobs.
>>     
>
> Yes, I keep falling back to some sort of DEVICETYPE{NNN}.PARAM = "VALUE"
> scheme, sort of like .vmx config files.  However, they failed horribly
> in not complaining about unparsed parameters; silently ignoring config
> data is wrong, and pre-registration should be required, as it stops both
> silent typo failures and collision.
>   

Exactly.

  

>> For the most part, I think the device should be unaware of these
>> things.  It never needs to see it's devfn.  It should preregister what
>> lnks it supports and whether it supports MSI, but it should never know
>> what IRQs that actually gets routed to.
>>     
>
> Ideally, but I think in practice the line will blur.. unless you have a
> completely ideal bus abstraction, there will still be the need to fill
> in reads from configuration space and associated device specific side
> effects.
>   

Perhaps, but I want to get away from any assumption that we're saying 
PCI device 00:01:04.4 is connected to IRQ 10 in a configuration file.  
How a PCI LNK gets routed to an actual IRQ depends on a lot of things 
and to support something like this, you'd need rather sophisticated 
autogeneration of ACPI, dynamic allocation of number of LNKs, etc.  PCI 
devices internally determine how many LNKs they use.  In the config 
file, we should merely be saying create PCI device 00:01:04.4 with 
vendor ID X and device ID Y.  Then we should be able to add device 
specific configuration bits (for virtio-net, for instance, we may say 
how large the queue size is).

Details of interrupt routing, BAR location, etc. are out of scope IMHO.

>> A device registers its creation function in it's module_init().  This creation function will then register the fact that it's a PCI device and will basic information about itself in that registration.  A PCI device can be instantiated via a device configuration file and when that happens, the device will create a host device for whatever functionality it supports.  For a NIC, this would be a NetworkHostDevice or something like that.  This NetworkHostDevice would have some link to the device itself (via an id, handle, whatever).  A user can then create a NetworkHostDriver and attach the NetworkHostDevice to that driver and you then have a functional emulated NIC. 
>>     
>
> This sounds pretty much ideal, I would say, but the details are really
> in "will create a host device for whatever functionality".  Is there a
> plan to lay out frontend APIs(1) for the various types of devices
> (sound, pointer, net, SCSI) so we can have modularized host driver backends?
>   

Yes, and I don't think we're that far away from that today.  I think the 
block driver API is shaping up nicely.  We've almost got an ideal API.  
It'll be a bit more work to remove all of the legacy aspects of the API.

The networking API needs a revamp but I think there's general consensus 
on how to do that.  DisplayState is getting good too but it needs input 
support too.  Also, we have to improve how the TextConsole and 
multiplexing work today to be implemented as proper DisplayState 
layering instead of the hackery that exists today.

> (1) With APIs being flexible, I don't mean a fixed link-type module, I
> mean well-modularized code that makes the 5 architectures x 50 devices x
> 4 bus models x 3 host implementations less of a problem than it
> currently is.
>   

Yes, I think that's what we're aiming for here.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-06  1:04     ` Paul Brook
  2009-05-06 13:35       ` Anthony Liguori
@ 2009-05-09 20:55       ` Anthony Liguori
  2009-05-09 21:06         ` Paul Brook
  2009-05-09 22:52         ` malc
  1 sibling, 2 replies; 27+ messages in thread
From: Anthony Liguori @ 2009-05-09 20:55 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul Brook wrote:
> On Wednesday 06 May 2009, Paul Brook wrote:
>   
>>>> The attached patch is my attempt at a new internal API for device
>>>> creation in qemu.
>>>>         
>>> Instead of recreating constructors, I think we should just use GCC's
>>> constructor attribute.  This gives us ordering which will be important
>>> when dealing with buses.
>>>       
>> The reason I'm not using constructors is because you have to workaround
>> ordering issues. All constructors are run before main(), so there's a very
>> limited amount they can actually do, and constructor priorities are not
>> available on all hosts.
>>     
>
> Oh, the other thing is that constructors don't work when you put objects in a 
> static library.  You need am explicit dependency to pull in objects.
>   

Not if you enable -Wl,--whole-archive.  It ends up looking like:

gcc -o test-stub -g -Wall -O test-stub.c -Wl,--whole-archive libtest.a 
-Wl,--no-whole-archive -L.

And I've confirmed this works.

Regards,

Anthony Liguori

> Paul
>
>
>   

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-09 20:55       ` Anthony Liguori
@ 2009-05-09 21:06         ` Paul Brook
  2009-05-10  1:34           ` Anthony Liguori
  2009-05-09 22:52         ` malc
  1 sibling, 1 reply; 27+ messages in thread
From: Paul Brook @ 2009-05-09 21:06 UTC (permalink / raw)
  To: qemu-devel

On Saturday 09 May 2009, Anthony Liguori wrote:
> Paul Brook wrote:
> > On Wednesday 06 May 2009, Paul Brook wrote:
> >>>> The attached patch is my attempt at a new internal API for device
> >>>> creation in qemu.
> >>>
> >>> Instead of recreating constructors, I think we should just use GCC's
> >>> constructor attribute.  This gives us ordering which will be important
> >>> when dealing with buses.
> >>
> >> The reason I'm not using constructors is because you have to workaround
> >> ordering issues. All constructors are run before main(), so there's a
> >> very limited amount they can actually do, and constructor priorities are
> >> not available on all hosts.
> >
> > Oh, the other thing is that constructors don't work when you put objects
> > in a static library.  You need am explicit dependency to pull in objects.
>
> Not if you enable -Wl,--whole-archive.  It ends up looking like:
>
> gcc -o test-stub -g -Wall -O test-stub.c -Wl,--whole-archive libtest.a
> -Wl,--no-whole-archive -L.
>
> And I've confirmed this works.

Yes, but, eww. Plus it means you've got to pull in absolutely everything, 
whether you want it or not.

Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-09 20:55       ` Anthony Liguori
  2009-05-09 21:06         ` Paul Brook
@ 2009-05-09 22:52         ` malc
  2009-05-10  1:35           ` Anthony Liguori
  2009-05-10  1:37           ` Anthony Liguori
  1 sibling, 2 replies; 27+ messages in thread
From: malc @ 2009-05-09 22:52 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paul Brook, qemu-devel

On Sat, 9 May 2009, Anthony Liguori wrote:

> Paul Brook wrote:
> > On Wednesday 06 May 2009, Paul Brook wrote:
> >   
> > > > > The attached patch is my attempt at a new internal API for device
> > > > > creation in qemu.
> > > > >         
> > > > Instead of recreating constructors, I think we should just use GCC's
> > > > constructor attribute.  This gives us ordering which will be important
> > > > when dealing with buses.
> > > >       
> > > The reason I'm not using constructors is because you have to workaround
> > > ordering issues. All constructors are run before main(), so there's a very
> > > limited amount they can actually do, and constructor priorities are not
> > > available on all hosts.
> > >     
> > 
> > Oh, the other thing is that constructors don't work when you put objects in
> > a static library.  You need am explicit dependency to pull in objects.
> >   
> 
> Not if you enable -Wl,--whole-archive.  It ends up looking like:
> 
> gcc -o test-stub -g -Wall -O test-stub.c -Wl,--whole-archive libtest.a
> -Wl,--no-whole-archive -L.
> 
> And I've confirmed this works.

-Wl,--whole-archive means QEMU can only be built with GNU ld.

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-09 21:06         ` Paul Brook
@ 2009-05-10  1:34           ` Anthony Liguori
  0 siblings, 0 replies; 27+ messages in thread
From: Anthony Liguori @ 2009-05-10  1:34 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul Brook wrote:
> Yes, but, eww. Plus it means you've got to pull in absolutely everything, 
> whether you want it or not.
>   

No, just your own internal static archives.  You disable it for any 
other libraries.

And really, if you've got live code in your archives that isn't 
reachable in your program, why is it there in the first place? :-)

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-09 22:52         ` malc
@ 2009-05-10  1:35           ` Anthony Liguori
  2009-05-10  6:50             ` Andreas Färber
  2009-05-10 18:38             ` malc
  2009-05-10  1:37           ` Anthony Liguori
  1 sibling, 2 replies; 27+ messages in thread
From: Anthony Liguori @ 2009-05-10  1:35 UTC (permalink / raw)
  To: malc; +Cc: Paul Brook, qemu-devel

malc wrote:
> -Wl,--whole-archive means QEMU can only be built with GNU ld.
>   
Does upstream QEMU build with anything else today?

Worst case, you can always search out device_init functions and build a 
stub dynamically but it gets really ugly.  Unless there's an immediate 
need not to use constructors, I don't see why we wouldn't.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-09 22:52         ` malc
  2009-05-10  1:35           ` Anthony Liguori
@ 2009-05-10  1:37           ` Anthony Liguori
  1 sibling, 0 replies; 27+ messages in thread
From: Anthony Liguori @ 2009-05-10  1:37 UTC (permalink / raw)
  To: malc; +Cc: Paul Brook, qemu-devel

malc wrote:
> On Sat, 9 May 2009, Anthony Liguori wrote:
>   
>> And I've confirmed this works.
>>     
>
> -Wl,--whole-archive means QEMU can only be built with GNU ld.
>   

Also, we don't have to use static libraries.  It really doesn't give us 
anything so it's pretty simple to swizzle the Makefiles if we have to.  
If we really want to use a library, we should make it a dynamic library 
and install that by default.  That way we save on some installation size.

So there's a lot of simple ways to eliminate --whole-archive if we have to.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-10  1:35           ` Anthony Liguori
@ 2009-05-10  6:50             ` Andreas Färber
  2009-05-10 18:38             ` malc
  1 sibling, 0 replies; 27+ messages in thread
From: Andreas Färber @ 2009-05-10  6:50 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paul Brook, qemu-devel


Am 10.05.2009 um 03:35 schrieb Anthony Liguori:

> malc wrote:
>> -Wl,--whole-archive means QEMU can only be built with GNU ld.
>>
> Does upstream QEMU build with anything else today?

Yes, Sun's gcc uses their own ld, which doesn't support a number of  
GNU options.

And Apple's ld is also worth checking, though I'm not sure if GNU or  
not. I couldn't find that pattern in the man page at least.

With all that discussions about constructors and destructors and  
ordering and so on, aren't you reinventing C++ as an object-oriented  
language? Not that I terribly like it, but it seems simpler and more  
standardized than all those weird GCC extensions currently under  
consideration. It would seem possible to confine object-orientisms to  
hw/ and to keep performance-critical code like TCG pure C. I assume  
you don't like that idea though or someone would've suggested it  
earlier, but I thought I'd mention it for the sake of completeness.

Andreas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-10  1:35           ` Anthony Liguori
  2009-05-10  6:50             ` Andreas Färber
@ 2009-05-10 18:38             ` malc
  1 sibling, 0 replies; 27+ messages in thread
From: malc @ 2009-05-10 18:38 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paul Brook, qemu-devel

On Sat, 9 May 2009, Anthony Liguori wrote:

> malc wrote:
> > -Wl,--whole-archive means QEMU can only be built with GNU ld.
> >   
> Does upstream QEMU build with anything else today?

AIX ld.

> 
> Worst case, you can always search out device_init functions and build a stub
> dynamically but it gets really ugly.  Unless there's an immediate need not to
> use constructors, I don't see why we wouldn't.
> 
> Regards,
> 
> Anthony Liguori
> 

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC] New device API
  2009-05-08  5:27 ` Marcelo Tosatti
  2009-05-08 10:44   ` Paul Brook
@ 2009-05-28 13:53   ` Markus Armbruster
  1 sibling, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2009-05-28 13:53 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Paul Brook, qemu-devel

Marcelo Tosatti <mtosatti@redhat.com> writes:

[...]
> Markus should have more substantial comments (he's on vacation
> this week). The tree driven machine initialization introduces some
> complications (such as parent_bus->child device dependencies) while
> handling a lot of the problems you are ignoring (eg the ->config method
> is responsible for linking host device -> emulated device information,
> etc).

The patch looks okay for what it attempts to do.  I don't think
rehashing the differences in scope to my prototype would do much good.
It'll be interesting to see how the API will evolve to accommodate
data-driven machine building.

What's the plan for getting there?

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2009-05-28 13:55 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-05 11:31 [Qemu-devel] [RFC] New device API Paul Brook
2009-05-05 15:56 ` Blue Swirl
2009-05-05 16:17   ` Paul Brook
2009-05-05 16:26     ` Blue Swirl
2009-05-05 16:35       ` Paul Brook
2009-05-05 18:22 ` Anthony Liguori
2009-05-05 22:42   ` Edgar E. Iglesias
2009-05-06  0:52   ` Paul Brook
2009-05-06  1:04     ` Paul Brook
2009-05-06 13:35       ` Anthony Liguori
2009-05-09 20:55       ` Anthony Liguori
2009-05-09 21:06         ` Paul Brook
2009-05-10  1:34           ` Anthony Liguori
2009-05-09 22:52         ` malc
2009-05-10  1:35           ` Anthony Liguori
2009-05-10  6:50             ` Andreas Färber
2009-05-10 18:38             ` malc
2009-05-10  1:37           ` Anthony Liguori
2009-05-05 22:25 ` Edgar E. Iglesias
2009-05-08  1:54 ` Zachary Amsden
2009-05-08 11:28   ` Paul Brook
2009-05-08 13:47   ` Anthony Liguori
2009-05-09  1:21     ` Zachary Amsden
2009-05-09 13:36       ` Anthony Liguori
2009-05-08  5:27 ` Marcelo Tosatti
2009-05-08 10:44   ` Paul Brook
2009-05-28 13:53   ` Markus Armbruster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.