[Qemu-devel] qdev for programmers writeup

From: Paolo Bonzini @ 2011-07-11 10:20 UTC
  To: qemu-devel

Hi,

this is a partial version of a "qdev for programmers" document I've been
working on.  Comments are welcome.

Paolo

--------------------------------- 8< ---------------------------------

== qdev overview and concepts ==

qdev is the factory interface that QEMU uses to create guest devices and
connect them to each other.  It also provides a uniform way to expose host
devices (character, network and block) to the guest.  In the remainder,
unless specified explicitly, "device" will refer to _guest_ devices.

qdev exposes a device tree that alternates buses (qbuses) and devices
(qdevs).  The root of the tree is the system bus SysBus.  Devices can be
leaves, or they can expose buses and talk to the devices on those buses.
This relationship does not cover host counterparts of the devices, which
are not part of the device tree.
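
As a rough illustration of this alternating tree, here is a minimal
self-contained sketch.  The SketchBus and SketchDevice types and their
helpers are hypothetical stand-ins invented for this example; they are
not the real BusState and DeviceState.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for qdev's bus and device objects. */
struct SketchBus;

typedef struct SketchDevice {
    const char *name;
    struct SketchBus *parent_bus;    /* bus this device sits on */
    struct SketchBus *child_buses;   /* buses this device exposes */
    struct SketchDevice *sibling;    /* next device on the same bus */
} SketchDevice;

typedef struct SketchBus {
    const char *name;
    SketchDevice *parent;            /* device exposing this bus; NULL at root */
    SketchDevice *children;          /* devices sitting on this bus */
    struct SketchBus *sibling;       /* next bus exposed by the same device */
} SketchBus;

/* Place a device on a bus. */
static void sketch_bus_add_device(SketchBus *bus, SketchDevice *dev)
{
    assert(dev->parent_bus == NULL);
    dev->parent_bus = bus;
    dev->sibling = bus->children;
    bus->children = dev;
}

/* Let a device expose a bus. */
static void sketch_device_add_bus(SketchDevice *dev, SketchBus *bus)
{
    assert(bus->parent == NULL);
    bus->parent = dev;
    bus->sibling = dev->child_buses;
    dev->child_buses = bus;
}

/* Count the devices below a bus, descending through child buses. */
static int sketch_count_devices(const SketchBus *bus)
{
    int n = 0;

    for (const SketchDevice *d = bus->children; d; d = d->sibling) {
        n++;
        for (const SketchBus *b = d->child_buses; b; b = b->sibling) {
            n += sketch_count_devices(b);
        }
    }
    return n;
}
```

A root bus holding a controller, which in turn exposes a bus holding a
disk, would be built with two sketch_bus_add_device calls and one
sketch_device_add_bus call.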

A device's interaction occurs by invoking services specific to the kind of
bus.  In general, if a device or bus X wants to request something from Y,
X needs to know the type of Y, and of course needs to have a pointer to
it.  In a properly "qdevified" board, these assumptions hold:

- qdev enforces what bus a device is placed on;

- buses are not user-visible;

- initialization of buses is driven exclusively by the parent device, and
initialization of devices is driven by the parent bus and a well-defined
set of properties (defined per-device);

- buses do not know what device exposes them;

- devices do not know what device exposes their bus.

With these assumptions in place, leaf devices are the simplest to understand.
They only make requests to the bus and/or to the character/block/network
subsystems; and possibly, they provide services (routines) used by
the bus and the grandparent device.

Intermediate devices also have to provide glue between their parent bus
and their child bus(es), and buses likewise glue two devices.  Depending
on the kind of bus and the relationship of a device with the bus (parent
or child), different sets of services may be defined.  For example, a SCSI
bus mediates many kinds of interaction:

- from a SCSI controller to a SCSI device (e.g., start processing this
command);

- from a SCSI bus to a child device (e.g., cancel this command due to a
bus reset);

- from a SCSI bus to its parent controller (e.g., this piece of data was
sent to you by a SCSI device);

- from a SCSI controller to its child bus (e.g., I dealt with this data,
please transfer more);

- from a SCSI device to its parent bus (e.g., please pass this data on
to the controller).

In general, the following rules and best practices are common:

- devices interact with their parent bus, and vice versa;

- buses interact with their parent devices, and vice versa;

- occasionally, devices may interact directly with their grandchild
devices, but _not_ vice versa; interaction with the grandparent
device is mediated by the parent bus;

- in addition, devices interact freely with their host counterparts
(that is, character/block/network devices).

qdev defines a set of data structures, and devices use them to expose
metainformation to the rest of QEMU and to the user.  The qdev system
is object-oriented; qdev data structures can be subclassed and used to
store additional information, including function pointers for bus-
specific services.  The remainder of this document explains how to
define and use these data structures.

== Implementation techniques ==

qdev exposes an object-oriented mechanism in C through containment (the C
replacement for inheritance, so to speak) and tries to make this
as type-safe as possible by leveraging the DO_UPCAST macro.

Sample structure definitions for a superclass and subclass are as follows:

    typedef struct Superclass {
        int field1;
        int field2;
        struct Superclass *field3;
    } Superclass;

    typedef struct Subclass {
        struct Superclass sup;
        int subfield1;
        int subfield2;
    } Subclass;

In many cases, C programmers pass such objects using an opaque pointer
(void *).  These are then cast to the appropriate subtype like

     void func (void *opaque)
     {
         Subclass *s = (Subclass *) opaque;
         ...
     }

QEMU prefers to always use a more type-safe approach that passes the 
pointer to the superclass.  The cast is then done using the 
aforementioned macro:

     void func (Superclass *state)
     {
         /* more typesafe version of (Subclass *) state, that also
            verifies that &state->sup == state.
            - First argument: subclass type.
            - Second argument: field being accessed.
            - Third argument: variable being cast.  */
         Subclass *sub = DO_UPCAST(Subclass, sup, state);
     }

Casts to a superclass are done with &state->sup.  This scheme is quite
handy to use, even though it may look a bit strange at first.
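
To experiment with the idea outside the QEMU tree, the following is a
minimal sketch.  SKETCH_UPCAST is a simplified stand-in written for this
example and is an assumption, not QEMU's actual definition of DO_UPCAST;
it only reproduces the two properties described above (container
recovery, plus a compile-time check that the embedded field sits at
offset 0).

```c
#include <assert.h>
#include <stddef.h>

typedef struct Superclass {
    int field1;
    int field2;
    struct Superclass *field3;
} Superclass;

typedef struct Subclass {
    Superclass sup;     /* must be the first member */
    int subfield1;
    int subfield2;
} Subclass;

/*
 * Simplified stand-in for DO_UPCAST (hypothetical, for illustration).
 * The char array gets a negative size, and so fails to compile, unless
 * "field" sits at offset 0 of "type".
 */
#define SKETCH_UPCAST(type, field, ptr)                                  \
    ((type *)((char *)(ptr) - offsetof(type, field) +                    \
              0 * sizeof(char[offsetof(type, field) == 0 ? 1 : -1])))

/* Receives the superclass pointer and recovers the containing object. */
static int sketch_get_subfield1(Superclass *state)
{
    Subclass *sub = SKETCH_UPCAST(Subclass, sup, state);

    assert(&sub->sup == state);
    return sub->subfield1;
}
```

Passing a field that is not at offset 0 (for example subfield1) makes
the array size negative, so the mistake is caught by the compiler rather
than at run time.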

== qdev data structures ==

This part of the document explains the data structures used by qdev.
These include the class hierarchies for buses and devices, together
with the corresponding metaclass hierarchies, and a registry of
devices and corresponding metainformation.

=== Bus and device hierarchies ===

Buses and devices reside on two parallel hierarchies, BusState and 
DeviceState.  Devices that work on the same bus usually share a
superclass.  Hence, each bus defines a subclass of BusState and an
abstract subclass of DeviceState.  Each device then adds its concrete
subclass in the DeviceState hierarchy.  For example:

     BusState
         PCIBus
         ISABus
         i2c_bus
     DeviceState
         PCIDevice /* bus common superclass */
             LSIState /* device-specific class */
             ...
         ISADevice
             IB700State
             ISASerialState
             ...
         i2c_slave
             WM8750State
             ...

Here is how tasks are separated between these classes:

1) bus classes (e.g. i2c_bus) are usually the least interesting of all.
Their fields are mostly private and used at device creation time.  For
example, you could place here the highest IRQ allocated to devices on
the bus.  In some cases a dedicated bus class is even absent; for
example, SysBus reuses BusState.

2) bus superclasses (e.g. i2c_slave) typically include the address of 
the device and the interrupt lines that it is connected to.

3) device subclasses contain device-specific configuration information 
(e.g. the character or block devices to connect to) and registers.

=== Describing qdev data structures ===

In addition to defining the structs, each bus and device should describe 
them as "properties".  Since the description that a device exposes is 
shared between the bus superclasses and the device subclasses, a device 
is described completely by the union of "bus properties" (representing 
fields of the abstract per-bus superclass) and "device properties"
(representing fields of the device subclass).  Example:

     /* This is a bus superclass */
     struct i2c_slave
     {
         DeviceState qdev;
         I2CSlaveInfo *info;  /* explained later */
         uint8_t address;
     };

     /* This is how we explain it to QEMU */
     static struct BusInfo i2c_bus_info = {
         .name = "I2C",
         .size = sizeof(i2c_bus),
         .props = (Property[]) {
             /* This means: "address" is a uint8_t property with a
                default value of 0.  Store it in field address of
                struct i2c_slave.  */
             DEFINE_PROP_UINT8("address", struct i2c_slave, address, 0),
             DEFINE_PROP_END_OF_LIST(),
         }
     };

     /* This is a device that exposes no properties.  */
     static I2CSlaveInfo wm8750_info = {
         .qdev.name = "wm8750",
         .qdev.size = sizeof(WM8750State),
         /* For migration and save/restore; do not care yet.  */
         .qdev.vmsd = &vmstate_wm8750,
         /* These functions are exposed to the bus and possibly to
            the grandparent device.  */
         .init = wm8750_init,
         .event = wm8750_event,
         .recv = wm8750_rx,
         .send = wm8750_tx
     };

Another example:

     /* ISA defines no bus properties */
     static struct BusInfo isa_bus_info = {
         .name      = "ISA",
         .size      = sizeof(ISABus),
         /* ISA defines a couple of bus-specific callbacks.  */
         .print_dev = isabus_dev_print,
         .get_fw_dev_path = isabus_get_fw_dev_path,
     };

     /* However, a parallel port does define device properties: */
     static ISADeviceInfo parallel_isa_info = {
         .qdev.name  = "isa-parallel",
         .qdev.size  = sizeof(ISAParallelState),
         .init       = parallel_isa_initfn,
         .qdev.props = (Property[]) {
             DEFINE_PROP_UINT32("index", ISAParallelState, index,   -1),
             DEFINE_PROP_HEX32("iobase", ISAParallelState, iobase,  -1),
             DEFINE_PROP_UINT32("irq",   ISAParallelState, isairq,  7),
             DEFINE_PROP_CHR("chardev",  ISAParallelState, state.chr),
             DEFINE_PROP_END_OF_LIST(),
         },
     };

In general, a device may have both bus properties and device properties. 
Simple examples appropriate for documentation unfortunately don't. :)
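
To make the idea of "describing a struct to QEMU" more concrete, here is
a minimal sketch of how a property table could map names to struct
fields.  Everything prefixed with Sketch/SKETCH/sketch_ is hypothetical
and much simpler than the real Property machinery (it handles only
uint32_t fields); it is meant to show the offsetof-based technique, not
QEMU's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, much-simplified property descriptor (uint32_t only). */
typedef struct SketchProperty {
    const char *name;      /* user-visible name; NULL terminates the list */
    size_t offset;         /* where the value lives inside the state struct */
    uint32_t defval;       /* value used when the user sets nothing */
} SketchProperty;

#define SKETCH_PROP_UINT32(_name, _state, _field, _defval) \
    { (_name), offsetof(_state, _field), (_defval) }
#define SKETCH_PROP_END_OF_LIST() { NULL, 0, 0 }

typedef struct SketchParallelState {
    uint32_t index;
    uint32_t isairq;
} SketchParallelState;

static const SketchProperty sketch_parallel_props[] = {
    SKETCH_PROP_UINT32("index", SketchParallelState, index, 0),
    SKETCH_PROP_UINT32("irq", SketchParallelState, isairq, 7),
    SKETCH_PROP_END_OF_LIST(),
};

/* Locate the field that a (non-terminator) property describes. */
static uint32_t *sketch_prop_ptr(void *state, const SketchProperty *prop)
{
    assert(prop->name != NULL);
    return (uint32_t *)((char *)state + prop->offset);
}

/* Fill every described field with its default value. */
static void sketch_props_set_defaults(void *state, const SketchProperty *props)
{
    for (; props->name; props++) {
        *sketch_prop_ptr(state, props) = props->defval;
    }
}

/* Set one property by name; returns 0 on success, -1 if it is unknown. */
static int sketch_prop_set_uint32(void *state, const SketchProperty *props,
                                  const char *name, uint32_t value)
{
    for (; props->name; props++) {
        if (strcmp(props->name, name) == 0) {
            *sketch_prop_ptr(state, props) = value;
            return 0;
        }
    }
    return -1;
}
```

Because the table stores offsets rather than pointers, one static table
can serve every instance of the device.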

=== Metainformation hierarchy ===

Above you may have noticed some new type names: BusInfo, I2CSlaveInfo, 
DeviceInfo.  These are the names used to store information on the class: 
properties of course, and also virtual functions.  In some sense
these *are* metaclass objects.  Their hierarchies mimic the BusState
and DeviceState ones.  The BusInfo/DeviceInfo hierarchy includes a
struct for each abstract class in the BusState/DeviceState hierarchy,
and an instance for each concrete class:

     BusState <=> BusInfo
         PCIBus -> struct BusInfo pci_bus_info = ...
         ISABus -> struct BusInfo isa_bus_info = ...
         i2c_bus -> struct BusInfo i2c_bus_info = ...
     DeviceState <=> DeviceInfo
         PCIDevice <=> PCIDeviceInfo
             LSIState -> static PCIDeviceInfo lsi_info = ...
             ...
         ISADevice <=> ISADeviceInfo
             IB700State -> static ISADeviceInfo wdt_ib700_info = ...
             ISASerialState -> static ISADeviceInfo serial_isa_info = ...
             ...
         i2c_slave <=> I2CSlaveInfo
             WM8750State -> static I2CSlaveInfo wm8750_info = ...
             ...

Structs such as I2CSlaveInfo are the place where devices declare the
virtual functions required by the bus, in addition to those already in
DeviceInfo.
In many cases, these functions correspond to additional "services" that
only make sense for that bus (example: event/recv/send in the i2c bus).
Sometimes, instead, they replace the ones in the superclass because
the bus needs to pass extra information.  The init function is always
overridden in this way; there is an internal init member in
DeviceInfo:

     typedef int (*qdev_initfn)(DeviceState *dev, DeviceInfo *info);

and one per bus, for example:

     typedef int (*i2c_slave_initfn)(i2c_slave *dev);
     typedef int (*isa_qdev_initfn)(ISADevice *dev);
     typedef int (*pci_qdev_initfn)(PCIDevice *dev);

Here is the way the I2C bus defines its qdev_initfn in terms of 
i2c_slave_initfn:

     static int i2c_slave_qdev_init(DeviceState *dev, DeviceInfo *base)
     {
         I2CSlaveInfo *info = DO_UPCAST (I2CSlaveInfo, qdev, base);
         i2c_slave *s = DO_UPCAST(i2c_slave, qdev, dev);

         /* Store virtual function table for later use.  */
         s->info = info;

         return info->init(s);
     }
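
The same pattern can be tried outside QEMU with the minimal sketch
below.  All Sketch* types and sketch_* functions are hypothetical
stand-ins invented for this example; plain casts are used in place of
DO_UPCAST, relying on the embedded struct being the first member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic layer (plays the role of DeviceState/DeviceInfo). */
typedef struct SketchDevInfo SketchDevInfo;

typedef struct SketchDevState {
    int initialized;
} SketchDevState;

struct SketchDevInfo {
    const char *name;
    int (*init)(SketchDevState *dev, SketchDevInfo *info);  /* generic init */
};

/* Hypothetical bus layer (plays the role of i2c_slave/I2CSlaveInfo). */
typedef struct SketchSlaveInfo SketchSlaveInfo;

typedef struct SketchSlave {
    SketchDevState qdev;       /* must be the first member */
    SketchSlaveInfo *info;     /* virtual function table, set at init */
    int address;
} SketchSlave;

struct SketchSlaveInfo {
    SketchDevInfo qdev;        /* must be the first member */
    int (*init)(SketchSlave *dev);   /* bus-specific, typed init */
    int (*recv)(SketchSlave *dev);   /* bus-specific service */
};

/* Generic init implemented in terms of the bus-specific one. */
static int sketch_slave_qdev_init(SketchDevState *dev, SketchDevInfo *base)
{
    SketchSlaveInfo *info = (SketchSlaveInfo *)base;
    SketchSlave *s = (SketchSlave *)dev;

    s->info = info;            /* store the table for later dispatch */
    return info->init(s);
}

/* Bus service: dispatch through the table stored at init time. */
static int sketch_slave_recv(SketchSlave *s)
{
    assert(s->info != NULL);
    return s->info->recv(s);
}

/* A concrete device only ever sees the typed SketchSlave pointer. */
static int sketch_codec_init(SketchSlave *s)
{
    s->qdev.initialized = 1;
    return 0;
}

static int sketch_codec_recv(SketchSlave *s)
{
    return s->address;
}

static SketchSlaveInfo sketch_codec_info = {
    .qdev.name = "sketch-codec",
    .qdev.init = sketch_slave_qdev_init,
    .init = sketch_codec_init,
    .recv = sketch_codec_recv,
};
```

Generic code calls sketch_codec_info.qdev.init with base-class pointers;
the wrapper restores the typed view before the device code runs.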

=== Registering devices and making them public ===

The last part of qdev is the registry of all devices defined by the
target system.  This is a fundamental piece of metainformation, because
it allows the "-device" option to work, at least for devices that
do not rely on DEFINE_PROP_PTR or sysbus_create_varargs (those
devices can only be instantiated from QEMU's machine initialization
code).

Registering a device's name is done with the qdev_register function.
This function however is used only internally.  The actual function
to be used varies per-bus, so that the bus can first perform some checks
and do some initialization that is common to all DeviceInfo objects for
that bus.

To this end, each bus defines a wrapper function that initializes the common
part of the struct DeviceInfo, and passes it to qdev_register:

     void i2c_register_slave(I2CSlaveInfo *info)
     {
         assert(info->qdev.size >= sizeof(i2c_slave));
         info->qdev.init = i2c_slave_qdev_init;
         info->qdev.bus_info = &i2c_bus_info;
         qdev_register(&info->qdev);
     }

Each device then calls this function:

     static void wm8750_register_devices(void)
     {
         i2c_register_slave(&wm8750_info);
     }

In turn, wm8750_register_devices is called at startup (as if it were a
C++ global constructor; a GCC extension makes this possible in C):

     device_init(wm8750_register_devices)
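
For readers who want to see the startup trick in isolation, here is a
minimal sketch of a name-based registry filled in before main() runs.
It assumes a compiler that supports the GCC/Clang constructor attribute.
The Sketch* types, the sketch_* functions and the sketch_device_init
macro are hypothetical and are not QEMU's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified metainformation record. */
typedef struct SketchDeviceInfo {
    const char *name;
    size_t size;
    struct SketchDeviceInfo *next;   /* link in the registry */
} SketchDeviceInfo;

static SketchDeviceInfo *sketch_registry;

/* Add one record to the head of the registry. */
static void sketch_register(SketchDeviceInfo *info)
{
    assert(info->name != NULL);
    info->next = sketch_registry;
    sketch_registry = info;
}

/* Look a device up by name, as a "-device"-like option would. */
static SketchDeviceInfo *sketch_find(const char *name)
{
    for (SketchDeviceInfo *i = sketch_registry; i; i = i->next) {
        if (strcmp(i->name, name) == 0) {
            return i;
        }
    }
    return NULL;
}

/* Run "fn" before main(); relies on the GCC/Clang constructor attribute. */
#define sketch_device_init(fn)                                           \
    static void __attribute__((constructor)) sketch_ctor_##fn(void)      \
    {                                                                    \
        fn();                                                            \
    }

static SketchDeviceInfo sketch_wm8750_info = {
    .name = "wm8750",
    .size = 64,   /* placeholder standing in for sizeof(WM8750State) */
};

static void sketch_wm8750_register_devices(void)
{
    sketch_register(&sketch_wm8750_info);
}

sketch_device_init(sketch_wm8750_register_devices)
```

By the time main() starts, sketch_find("wm8750") already returns the
record, without any central list of devices having to be maintained by
hand.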

== Letting buses and devices "talk" ==

In this part of the document, we will examine the mechanisms by which
buses and devices are connected.  The first section will explain how
buses convert human-readable properties into pointers to internal
data structures.  The second section will explain how devices take
care of creating buses.  Finally, we will describe SysBus, which is
the root of the qdev system and connects qdev with the rest of the
QEMU device model.

=== Using buses to connect device layers ===

As mentioned above, buses sit in a unique location, as they have access
to services from both the parent device and the child device.  As such,
they provide the "glue" between two layers of devices.

As part of this, they may simply expose some of the services of the
parent devices to the children.  For example, a USB host controller
interface exposes a bus with one or more "ports", and defines a set of
functions to operate on ports.  USB devices do not operate directly
on these functions; they always go through helpers such as this one:

    void usb_wakeup(USBDevice *dev)
    {
        if (dev->remote_wakeup && dev->port && dev->port->ops->wakeup) {
            dev->port->ops->wakeup(dev);
        }
    }

Helpers like this make changes easier, for example if a function
used to be mandatory and you want to make it optional.
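
The effect of such a helper can be seen in the minimal sketch below,
where the parent's service is optional and children never look at the
function table themselves.  The Sketch* types and sketch_* functions are
hypothetical and only loosely modelled on the USB example.

```c
#include <assert.h>
#include <stddef.h>

struct SketchChild;

/* Services the parent offers; any pointer may be NULL if unsupported. */
typedef struct SketchPortOps {
    void (*wakeup)(struct SketchChild *dev);
} SketchPortOps;

typedef struct SketchPort {
    const SketchPortOps *ops;
    int wakeups;               /* counts wakeups seen by the parent */
} SketchPort;

typedef struct SketchChild {
    SketchPort *port;          /* NULL while the child is unattached */
} SketchChild;

/* Parent-side implementation of the service. */
static void sketch_parent_wakeup(struct SketchChild *dev)
{
    assert(dev->port != NULL);
    dev->port->wakeups++;
}

/* Bus helper: children call this and never touch ops directly. */
static void sketch_wakeup(SketchChild *dev)
{
    if (dev->port && dev->port->ops->wakeup) {
        dev->port->ops->wakeup(dev);
    }
}
```

If wakeup later becomes mandatory again, or gains an extra argument,
only sketch_wakeup has to change; the child devices are untouched.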

Another very important piece of glue is initialization.  When the bus's
init function is called, properties have been set already and the parent
bus is known too.  Hence the bus has the opportunity to take the values of
the properties and convert them to pointers to internal data structures
(or, for example, to qemu_irqs).  Here is an example:

1) the bus defines a property (irq, the IRQ number):

     static struct BusInfo spapr_vio_bus_info = {
         .name       = "spapr-vio",
         .size       = sizeof(VIOsPAPRBus),
         .props = (Property[]) {
             DEFINE_PROP_UINT32("irq", VIOsPAPRDevice, vio_irq_num, 0),
             DEFINE_PROP_END_OF_LIST(),
         },
     };

2) the bus init function talks to the parent device (spapr) in order to 
get a default value and especially a qemu_irq:

     if (!dev->vio_irq_num) {
         dev->vio_irq_num = spapr_allocate_irq (spapr);
     }
     dev->qirq = xics_find_qirq(spapr->icp, dev->vio_irq_num);
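
The two steps can be sketched in a self-contained form as follows.  The
Sketch* types and sketch_* functions are hypothetical simplifications
(the parent is passed explicitly, and an "IRQ" is just a struct in an
array); they only illustrate turning a user-supplied number into a
pointer, with a default chosen by the parent when the number is left
at 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_IRQS 8

typedef struct SketchIrq {
    int level;
} SketchIrq;

/* Hypothetical parent device owning the interrupt lines. */
typedef struct SketchParent {
    SketchIrq irqs[SKETCH_NUM_IRQS];
    uint32_t next_free_irq;    /* must start at 1: 0 means "not set" */
} SketchParent;

typedef struct SketchVioDevice {
    uint32_t irq_num;          /* property set by the user; 0 = pick one */
    SketchIrq *qirq;           /* pointer resolved by the bus init code */
} SketchVioDevice;

/* Parent service: hand out the next unused IRQ number. */
static uint32_t sketch_allocate_irq(SketchParent *parent)
{
    assert(parent->next_free_irq < SKETCH_NUM_IRQS);
    return parent->next_free_irq++;
}

/* Bus-level init: returns 0 on success, -1 if the IRQ number is invalid. */
static int sketch_bus_init_device(SketchParent *parent, SketchVioDevice *dev)
{
    if (!dev->irq_num) {
        dev->irq_num = sketch_allocate_irq(parent);
    }
    if (dev->irq_num >= SKETCH_NUM_IRQS) {
        return -1;
    }
    dev->qirq = &parent->irqs[dev->irq_num];
    return 0;
}
```

The device code only ever uses dev->qirq afterwards; the number remains
purely a piece of configuration.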

So this is how qdev manages to convert human-readable configuration into 
pointers.  Since you cannot go "turtles all the way down", there are two 
fallback mechanisms to pass pointers directly to devices:

1) one is DEFINE_PROP_PTR, which you probably shouldn't use;

2) one is specific to qemu_irq and devices from sysbus; see 
sysbus_create_varargs.

=== Defining a child bus ===

[...]

=== SysBus: the root ===

[...]

== A quick guide to qdev conversion ==

Converting devices to qdev is a three-step process:

1) ensuring that an appropriate bus type is defined to which the device
can be attached;

2) defining a device's properties (the "schema" exposed by the device);

3) converting board initialization functions to use qdev services.

The first step is very important to achieve a "quality" conversion
to qdev.  QEMU includes partial conversions to qdev that have a large
number of SysBus devices, or devices that use DEFINE_PROP_PTR.  In many
cases, this is because the authors did not introduce a board-specific
bus type to mediate access to the board resources.  Together with such
a bus type there should be a single root board-specific device that is
attached to SysBus.  An interrupt controller is usually a good candidate
for this because it takes qemu_irqs from the outside, and can make good
use of the specificities of SysBus.

A good design will make the conversion simpler (this is important,
because it is usually hard to convert only a small part of the devices);
in particular, the second step may then become mostly trivial.

The third step is also very important.  If the conversion was done
well, a lot of board-specific initialization code may be removed and
replaced by command-line options.  This will also give the user the
flexibility of working with "dumbed down" versions of the board, with
some devices removed.  If necessary, standard versions of the board
may be described with configuration files.

Old code not yet converted to qdev uses a specific function for each
device type:

     goldfish_timer_and_rtc_init(0xff003000, 3);
     ...

     static struct goldfish_timer_state timer_state;

     void goldfish_timer_and_rtc_init(uint32_t timerbase, int timerirq)
     {
         timer_state.dev.base = timerbase;
         timer_state.dev.irq = timerirq;
         timer_state.timer = qemu_new_timer_ns(vm_clock,
                                goldfish_timer_tick, &timer_state);

         goldfish_device_add(&timer_state.dev, goldfish_timer_readfn,
             goldfish_timer_writefn, &timer_state);
     }

Here, the "timer_state.dev" field is a sub-structure that is common
to all devices in the board.  This is an embryonic separation between
bus-specific and device-specific data that can be exploited when
converting to qdev.  However, there are substantial differences between
this code and what will be required after qdev conversion:

- the timerbase and timerirq are set via properties before the qdev is
actually created; qdev takes care of initializing the structure's fields;

- creation of the timer is moved into the init virtual function for the
device;

- of all the arguments to goldfish_device_add, only "&timer_state" 
matters, because the goldfish_timer_readfn and goldfish_timer_writefn 
arguments will be stored in the GoldfishDeviceInfo;

- last but not least, everything will be allocated dynamically, so
static device objects such as "timer_state" will have to go.

qdev's metainformation structures BusInfo and DeviceInfo provide a place
for all this information, including even initializers for the static
"timer_state" object.  These for example can become bus property defaults,
or can be moved to the DeviceInfo subclass.

So, the call to goldfish_timer_and_rtc_init can be described entirely
in terms of qdev properties.  This can in turn be expressed in
different ways:

1) command-line

    -device goldfish_timer,base=0xff003000,irq=3

2) configuration files (for -readconfig):

     [device "goldfish_timer"]
         base = 0xff003000
         irq = 3

3) C code:

     /* The first argument is the bus.  See below for how to
        create a bus-specific wrapper to qdev_create.  */
     dev = qdev_create(&goldfish_bus->qbus, "goldfish_timer");
     qdev_prop_set_uint32(dev, "base", 0xff003000);
     qdev_prop_set_uint32(dev, "irq", 3);
     qdev_init_nofail(dev);

The last form will appear in the machine initialization function in
several situations: devices using DEFINE_PROP_PTR; devices that are present
in the board by default (though in the long term we would like to
move those to configuration files); code that creates devices based
on legacy command-line interfaces.  It will often be hidden behind a
helper function not unlike goldfish_timer_and_rtc_init; for example
(slightly edited from the actual QEMU code):

    static ISABus *isabus;

    ISADevice *isa_create(const char *name)
    {
        DeviceState *dev;

        dev = qdev_create(&isabus->qbus, name);
        return DO_UPCAST(ISADevice, qdev, dev);
    }

    static inline void serial_isa_init(int index, CharDriverState *chr)
    {
        ISADevice *dev;

        dev = isa_create("isa-serial");
        qdev_prop_set_uint32(&dev->qdev, "index", index);
        qdev_prop_set_chr(&dev->qdev, "chardev", chr);
        qdev_init_nofail(&dev->qdev);
    }

    ...

    /* Here we create ISA serial ports for each -serial option
       on the command line.  */
    for (i = 0; i < MAX_SERIAL_PORTS; i++) {
        if (serial_hds[i]) {
            serial_isa_init(i, serial_hds[i]);
        }
    }
