* [Qemu-devel] [RFC] Machine description as data
@ 2009-02-11 15:40 Markus Armbruster
From: Markus Armbruster @ 2009-02-11 15:40 UTC (permalink / raw)
  To: qemu-devel

Sorry for the length of this memo.  I tried to make it as concise as I
could.  And there's working mock-up source code to go with it.


Configuration should be data
----------------------------

A QEMU machine (selected with -M) is described by a struct QEMUMachine,
which contains almost nothing of interest.  Pretty much everything,
including all the buses and devices, is instead created by the machine's
initialization function.

Init functions consider a plethora of ad hoc configuration parameters
set by command line options.  Plenty of stuff remains hard-coded all
the same.

Configuration should be data, not code.

A machine's buses and devices can be expressed as a device tree.  More
on that below.

The need for a configuration file
---------------------------------

The command line is a rather odd place to define a virtual machine.
It is fine for manipulating a particular run of the machine, but the
machine description belongs in a configuration file.

Once configuration is data, we should be able to initialize it from a
configuration file with relative ease.

However, this memo is only about the *internal* representation of
configuration.  How we get there from a configuration file is a separate
question.  It's without doubt a relevant question, but I feel I need to
limit my scope to have a chance of getting anywhere.

The need for an abstract device interface
-----------------------------------------

Currently, each virtual device is created, configured and initialized in
its own idiosyncratic way.  Some configuration is received as arguments,
some is passed in global variables.

This is workable as long as the machine is constructed by ad hoc init
function code.  The resulting init function tends to be quite a
hairball, though.

I'd like to propose an abstract device interface, so we can build a
machine from its (tree-structured) configuration using just this
interface.  Device idiosyncrasies are to be hidden in the driver code
behind the interface.
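
To illustrate what "building a machine using just this interface" could
look like, here is a minimal sketch in C.  All names (struct driver,
driver_by_name(), device_bring_up(), the "demo-*" drivers) are
hypothetical and chosen for this example only; they are not the names
used in the attached prototype.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct device;

/* Hypothetical driver interface: generic code sees only these hooks. */
struct driver {
    const char *name;
    int (*config)(struct device *);   /* returns 0 on success */
    void (*init)(struct device *);
};

struct device {
    const struct driver *drv;
    int configured;                   /* set by the demo config hook */
    int initialized;                  /* set by the demo init hook */
};

/* Stand-ins for real driver code; device quirks would live here. */
static int demo_config(struct device *d) { d->configured = 1; return 0; }
static void demo_init(struct device *d) { d->initialized = 1; }

/* Table of known drivers, terminated by a NULL name. */
static const struct driver driver_table[] = {
    { "demo-timer", demo_config, demo_init },
    { "demo-uart",  demo_config, demo_init },
    { NULL, NULL, NULL }
};

static const struct driver *driver_by_name(const char *name)
{
    const struct driver *drv;

    for (drv = driver_table; drv->name; drv++) {
        if (!strcmp(drv->name, name))
            return drv;
    }
    return NULL;
}

/*
 * Bring up one device through the generic interface only.
 * Returns 0 on success, -1 if there is no driver or config fails.
 */
static int device_bring_up(struct device *dev, const char *name)
{
    const struct driver *drv = driver_by_name(name);

    if (!drv)
        return -1;
    dev->drv = drv;
    dev->configured = dev->initialized = 0;
    if (drv->config && drv->config(dev))
        return -1;
    if (drv->init)
        drv->init(dev);
    return 0;
}
```

The generic code never needs to know what kind of device it is dealing
with; it looks the driver up by name and calls the hooks in a fixed
order.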

What I propose to do
--------------------

A. Configuration as data

   Define an internal machine configuration data structure.  Needs to be
   sufficiently generic to be able to support even oddball machine
   types.  Make it a decorated tree, i.e. a tree of named nodes with
   named properties.

   Create an instance for a prototype machine type.  Make it a PC,
   because that's the easiest to test.

   Define an abstract device interface, initially covering just device
   configuration and initialization.

   Implement the device interface for the devices used by the prototype
   machine type.

   Do not break existing machine types here.  This means we need to keep
   legacy interfaces until their last user is gone (step B).  Could
   become somewhat messy in places for a while.

B. Convert all the existing machine configurations to data.

   This can and should be done incrementally, each machine by people who
   care and know about it.

   Clean up the legacy interfaces now unused, and any messes we made
   behind them.

C. Read (and maybe write) machine configuration

   The external format to use is debatable.  Compared to the rest of the
   task, its choice looks like detail to me, but I'm biased ;)

   Writing the data could be useful for debugging.

D. Command line options to modify the configuration tree

   If we want them.

E. Make legacy command line modify the configuration tree

   For compatibility.  This is my "favourite" part.

We need to start with A.  The other tasks are largely independent.
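
To make the "decorated tree" of step A concrete, here is a minimal
sketch in C of a tree of named nodes with named properties.  The names
(struct cfg_node, cfg_new_node(), cfg_put_prop(), ...) are assumptions
made for this example; the prototype's tree.h may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decorated tree: named nodes carrying named properties. */
struct cfg_prop {
    const char *name;
    const char *value;
    struct cfg_prop *next;
};

struct cfg_node {
    const char *name;
    struct cfg_prop *props;       /* list of properties */
    struct cfg_node *kids;        /* first child */
    struct cfg_node *sibling;     /* next child of the same parent */
};

/*
 * Allocate a node and append it to parent's children (parent may be
 * NULL for the root).  Strings are not copied; the caller keeps them
 * alive.
 */
static struct cfg_node *cfg_new_node(struct cfg_node *parent,
                                     const char *name)
{
    struct cfg_node *n = calloc(1, sizeof(*n));
    struct cfg_node **link;

    if (!n)
        abort();
    n->name = name;
    if (parent) {
        for (link = &parent->kids; *link; link = &(*link)->sibling)
            ;
        *link = n;
    }
    return n;
}

/* Attach a property to a node (prepended to the list). */
static void cfg_put_prop(struct cfg_node *n, const char *name,
                         const char *value)
{
    struct cfg_prop *p = calloc(1, sizeof(*p));

    if (!p)
        abort();
    p->name = name;
    p->value = value;
    p->next = n->props;
    n->props = p;
}

/* Return the value of the named property, or NULL if there is none. */
static const char *cfg_get_prop(const struct cfg_node *n, const char *name)
{
    const struct cfg_prop *p;

    for (p = n->props; p; p = p->next) {
        if (!strcmp(p->name, name))
            return p->value;
    }
    return NULL;
}

/* Count the nodes in the sub-tree rooted at n. */
static int cfg_count(const struct cfg_node *n)
{
    const struct cfg_node *k;
    int count = 1;

    for (k = n->kids; k; k = k->sibling)
        count += cfg_count(k);
    return count;
}
```

A machine would then be a root node with one child per bus, and each
bus node would have one child per device, with the device's settings
stored as properties.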

What I've already done
----------------------

Show me the code, they say.  Find attached a working prototype of step
A.  It passes the "Linux boots" test for me.  I didn't bother to rebase
to current HEAD, happy to do that on request.

Instead of hacking up machine "pc", I created a new machine "pcdt".  I
took a number of shortcuts:

* I put the "pcdt" code into the new file dt.c, and copied code from
  pc.c there.  I could have avoided that by putting my code in pc.c
  instead.  Putting it in a new file helped me pick apart the pc.c
  hairball.  To be cleaned up.

* I copied code from net.c.  Trivial to fix, just give it external
  linkage there.

* I hard-coded the configuration tree in the wrong place (tree.c), out of
  laziness.

* I didn't implement all the devices of the "pc" original.  The devices
  I implemented might not support all existing command line options.

Notable qualities:

* Device drivers are cleanly separated from each other, and from the
  device-agnostic configuration code.

* Each driver specifies the configurable properties in a single place.

* Device configuration is gotten from the configuration tree, which is
  fully checked.  Unknown properties are rejected.
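
The last two points can be sketched roughly as follows: a driver lists
its properties in one table, and generic code uses that table both to
store values and to reject unknown names.  This is a simplified
illustration, not the prototype's code; struct serial_priv, set_prop()
and the return codes are assumptions made for this example (the
prototype prints an error and exits instead of returning a code).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical private data of an example device. */
struct serial_priv {
    int iobase;
    int irq;
};

struct prop_spec {
    const char *name;
    size_t offs;            /* offset of the member in private data */
};

/* The single place where the driver says what can be configured. */
static const struct prop_spec serial_props[] = {
    { "iobase", offsetof(struct serial_priv, iobase) },
    { "irq",    offsetof(struct serial_priv, irq) },
    { NULL, 0 }
};

/* Parse an int; store it and return 0, or return -1 and store nothing. */
static int parse_int(int *dst, const char *src)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(src, &end, 0);
    if (end == src || *end || errno || val < INT_MIN || val > INT_MAX)
        return -1;
    *dst = (int)val;
    return 0;
}

/*
 * Set one property in priv according to specs.
 * Returns 0 on success, -1 for an unknown property, -2 for a bad value.
 */
static int set_prop(void *priv, const struct prop_spec *specs,
                    const char *name, const char *value)
{
    const struct prop_spec *s;

    for (s = specs; s->name; s++) {
        if (!strcmp(s->name, name)) {
            int *dst = (int *)((char *)priv + s->offs);
            return parse_int(dst, value) ? -2 : 0;
        }
    }
    return -1;
}
```

Because the table is the only source of truth, a typo in a property
name is caught by the generic code rather than silently ignored by the
driver.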


Appendix: Linux device trees
----------------------------

This appendix is probably only of interest to some of you, feel free to
skip.

The IEEE 1275 Open Firmware Device Tree solves a somewhat similar
problem, namely to communicate environmental information (hardware and
configuration) from firmware to operating system.  It's chiefly used on
PowerPCs.  The OS calls Open Firmware to query the device tree.

Linux turns the Open Firmware device tree API into a data format.
Actually two: the DT blob format is a binary data structure, and the
DT source format is human-readable text.  The device tree compiler
"dtc" can convert between the two.

We already have a bit of code dealing with this, in device_tree.c.

I briefly examined the DT source format and the tree structure it
describes for the purpose of QEMU configuration.  I decided against
using it in my prototype because I found it awfully low-level and
verbose for that purpose (I'm sure it serves the purpose it was designed
for just fine).  Issues include:

* Since the DT is designed for booting kernels, not configuring QEMU,
  there's information that has no place in QEMU configuration, and
  required QEMU configuration isn't there.

* Redundancy between node name and its device_type property.

* Property "reg", which encodes address ranges, does so in terms of
  "cells": #address-cells 32-bit words (big endian) for the address,
  followed by #size-cells words for the size, where #address-cells and
  #size-cells are properties of the enclosing bus.  If this sounds
  like gibberish to you, well, that's my point.
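
For those who want to see what the "reg" encoding means in practice,
here is a small sketch in C that decodes the first (address, size)
entry of a "reg" value.  The function names are made up for this
example, and the sketch assumes at most two cells per address or size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one cell: a big-endian 32-bit word. */
static uint32_t cell_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Combine ncells consecutive cells (0 to 2) into one 64-bit value. */
static uint64_t cells_to_u64(const uint8_t *p, int ncells)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < ncells; i++)
        v = (v << 32) | cell_be32(p + 4 * i);
    return v;
}

/*
 * Decode the first entry of a "reg" property.  address_cells and
 * size_cells come from the enclosing bus node.  Returns 0 on success,
 * -1 if the cell counts are unsupported or the data is too short.
 */
static int reg_decode(const uint8_t *reg, size_t len,
                      int address_cells, int size_cells,
                      uint64_t *addr, uint64_t *size)
{
    if (address_cells < 1 || address_cells > 2
        || size_cells < 0 || size_cells > 2)
        return -1;
    if (len < 4 * (size_t)(address_cells + size_cells))
        return -1;
    *addr = cells_to_u64(reg, address_cells);
    *size = cells_to_u64(reg + 4 * address_cells, size_cells);
    return 0;
}
```

With #address-cells = 2 and #size-cells = 1, the twelve bytes
00 00 00 01  00 00 00 00  00 10 00 00 decode to address 0x100000000
and size 0x100000.  Nothing in the bytes themselves says where the
address ends and the size begins; you have to look at the parent.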


diff --git a/Makefile b/Makefile
index 4f7a55a..2198bba 100644
--- a/Makefile
+++ b/Makefile
@@ -85,6 +85,7 @@ OBJS+=sd.o ssi-sd.o
 OBJS+=bt.o bt-host.o bt-vhci.o bt-l2cap.o bt-sdp.o bt-hci.o bt-hid.o usb-bt.o
 OBJS+=buffered_file.o migration.o migration-tcp.o net.o qemu-sockets.o
 OBJS+=qemu-char.o aio.o net-checksum.o savevm.o cache-utils.o
+OBJS+=tree.o
 
 ifdef CONFIG_BRLAPI
 OBJS+= baum.o
diff --git a/Makefile.target b/Makefile.target
index a091ce9..10a3245 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -580,6 +580,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
+OBJS+= dt.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
 ifeq ($(TARGET_BASE_ARCH), ppc)
diff --git a/hw/dt.c b/hw/dt.c
new file mode 100644
index 0000000..f62198f
--- /dev/null
+++ b/hw/dt.c
@@ -0,0 +1,1672 @@
+#include <assert.h>
+#include "hw.h"
+#include "pc.h"
+#include "fdc.h"
+#include "pci.h"
+#include "block.h"
+#include "sysemu.h"
+#include "audio/audio.h"
+#include "net.h"
+#include "smbus.h"
+#include "boards.h"
+#include "console.h"
+#include "fw_cfg.h"
+#include "virtio-blk.h"
+#include "virtio-balloon.h"
+#include "virtio-console.h"
+#include "hpet_emul.h"
+#include "tree.h"
+
+/* Forward declarations */
+struct dt_device;
+struct dt_driver;
+struct dt_prop_spec;
+static void dt_parse_prop(struct dt_device *dev, struct tree_prop *prop);
+static BlockDriverState **dt_piix3_hd(struct tree *piix3);
+
+
+// FIXME copied from pc.c, external defs stripped, unused stuff #if 0'ed
+/* output Bochs bios info messages */
+//#define DEBUG_BIOS
+
+#define BIOS_FILENAME "bios.bin"
+#define VGABIOS_FILENAME "vgabios.bin"
+#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
+
+#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
+
+/* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables.  */
+#define ACPI_DATA_SIZE       0x10000
+#define BIOS_CFG_IOPORT 0x510
+
+#define MAX_IDE_BUS 2
+
+static fdctrl_t *floppy_controller;
+static RTCState *rtc_state;
+#if 0
+static PITState *pit;
+static IOAPICState *ioapic;
+#endif
+extern PCIDevice *i440fx_state;
+
+static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
+{
+}
+
+#if 0
+/* MSDOS compatibility mode FPU exception support */
+static qemu_irq ferr_irq;
+/* XXX: add IGNNE support */
+void cpu_set_ferr(CPUX86State *s)
+{
+    qemu_irq_raise(ferr_irq);
+}
+
+static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
+{
+    qemu_irq_lower(ferr_irq);
+}
+#else
+extern qemu_irq ferr_irq;
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data);
+#endif
+
+#if 0
+/* TSC handling */
+uint64_t cpu_get_tsc(CPUX86State *env)
+{
+    /* Note: when using kqemu, it is more logical to return the host TSC
+       because kqemu does not trap the RDTSC instruction for
+       performance reasons */
+#ifdef USE_KQEMU
+    if (env->kqemu_enabled) {
+        return cpu_get_real_ticks();
+    } else
+#endif
+    {
+        return cpu_get_ticks();
+    }
+}
+
+/* SMM support */
+void cpu_smm_update(CPUState *env)
+{
+    if (i440fx_state && env == first_cpu)
+        i440fx_set_smm(i440fx_state, (env->hflags >> HF_SMM_SHIFT) & 1);
+}
+
+/* IRQ handling */
+int cpu_get_pic_interrupt(CPUState *env)
+{
+    int intno;
+
+    intno = apic_get_interrupt(env);
+    if (intno >= 0) {
+        /* set irq request if a PIC irq is still pending */
+        /* XXX: improve that */
+        pic_update_irq(isa_pic);
+        return intno;
+    }
+    /* read the irq from the PIC */
+    if (!apic_accept_pic_intr(env))
+        return -1;
+
+    intno = pic_read_irq(isa_pic);
+    return intno;
+}
+#endif
+
+static void pic_irq_request(void *opaque, int irq, int level)
+{
+    CPUState *env = first_cpu;
+
+    if (env->apic_state) {
+        while (env) {
+            if (apic_accept_pic_intr(env))
+                apic_deliver_pic_intr(env, level);
+            env = env->next_cpu;
+        }
+    } else {
+        if (level)
+            cpu_interrupt(env, CPU_INTERRUPT_HARD);
+        else
+            cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
+    }
+}
+
+/* PC cmos mappings */
+
+#define REG_EQUIPMENT_BYTE          0x14
+
+static int cmos_get_fd_drive_type(int fd0)
+{
+    int val;
+
+    switch (fd0) {
+    case 0:
+        /* 1.44 MB 3.5" drive */
+        val = 4;
+        break;
+    case 1:
+        /* 2.88 MB 3.5" drive */
+        val = 5;
+        break;
+    case 2:
+        /* 1.2 MB 5.25" drive */
+        val = 2;
+        break;
+    default:
+        val = 0;
+        break;
+    }
+    return val;
+}
+
+static void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
+{
+    RTCState *s = rtc_state;
+    int cylinders, heads, sectors;
+    bdrv_get_geometry_hint(hd, &cylinders, &heads, &sectors);
+    rtc_set_memory(s, type_ofs, 47);
+    rtc_set_memory(s, info_ofs, cylinders);
+    rtc_set_memory(s, info_ofs + 1, cylinders >> 8);
+    rtc_set_memory(s, info_ofs + 2, heads);
+    rtc_set_memory(s, info_ofs + 3, 0xff);
+    rtc_set_memory(s, info_ofs + 4, 0xff);
+    rtc_set_memory(s, info_ofs + 5, 0xc0 | ((heads > 8) << 3));
+    rtc_set_memory(s, info_ofs + 6, cylinders);
+    rtc_set_memory(s, info_ofs + 7, cylinders >> 8);
+    rtc_set_memory(s, info_ofs + 8, sectors);
+}
+
+/* convert boot_device letter to something recognizable by the bios */
+static int boot_device2nibble(char boot_device)
+{
+    switch(boot_device) {
+    case 'a':
+    case 'b':
+        return 0x01; /* floppy boot */
+    case 'c':
+        return 0x02; /* hard drive boot */
+    case 'd':
+        return 0x03; /* CD-ROM boot */
+    case 'n':
+        return 0x04; /* Network boot */
+    }
+    return 0;
+}
+
+/* copy/pasted from cmos_init, should be made a general function
+ and used there as well */
+static int pc_boot_set(void *opaque, const char *boot_device)
+{
+#define PC_MAX_BOOT_DEVICES 3
+    RTCState *s = (RTCState *)opaque;
+    int nbds, bds[3] = { 0, };
+    int i;
+
+    nbds = strlen(boot_device);
+    if (nbds > PC_MAX_BOOT_DEVICES) {
+        term_printf("Too many boot devices for PC\n");
+        return(1);
+    }
+    for (i = 0; i < nbds; i++) {
+        bds[i] = boot_device2nibble(boot_device[i]);
+        if (bds[i] == 0) {
+            term_printf("Invalid boot device for PC: '%c'\n",
+                    boot_device[i]);
+            return(1);
+        }
+    }
+    rtc_set_memory(s, 0x3d, (bds[1] << 4) | bds[0]);
+    rtc_set_memory(s, 0x38, (bds[2] << 4));
+    return(0);
+}
+
+/* hd_table must contain 4 block drivers */
+static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+                      const char *boot_device, BlockDriverState **hd_table)
+{
+    RTCState *s = rtc_state;
+    int nbds, bds[3] = { 0, };
+    int val;
+    int fd0, fd1, nb;
+    int i;
+
+    /* various important CMOS locations needed by PC/Bochs bios */
+
+    /* memory size */
+    val = 640; /* base memory in K */
+    rtc_set_memory(s, 0x15, val);
+    rtc_set_memory(s, 0x16, val >> 8);
+
+    val = (ram_size / 1024) - 1024;
+    if (val > 65535)
+        val = 65535;
+    rtc_set_memory(s, 0x17, val);
+    rtc_set_memory(s, 0x18, val >> 8);
+    rtc_set_memory(s, 0x30, val);
+    rtc_set_memory(s, 0x31, val >> 8);
+
+    if (above_4g_mem_size) {
+        rtc_set_memory(s, 0x5b, (unsigned int)above_4g_mem_size >> 16);
+        rtc_set_memory(s, 0x5c, (unsigned int)above_4g_mem_size >> 24);
+        rtc_set_memory(s, 0x5d, (uint64_t)above_4g_mem_size >> 32);
+    }
+
+    if (ram_size > (16 * 1024 * 1024))
+        val = (ram_size / 65536) - ((16 * 1024 * 1024) / 65536);
+    else
+        val = 0;
+    if (val > 65535)
+        val = 65535;
+    rtc_set_memory(s, 0x34, val);
+    rtc_set_memory(s, 0x35, val >> 8);
+
+    /* set the number of CPU */
+    rtc_set_memory(s, 0x5f, smp_cpus - 1);
+
+    /* set boot devices, and disable floppy signature check if requested */
+#define PC_MAX_BOOT_DEVICES 3
+    nbds = strlen(boot_device);
+    if (nbds > PC_MAX_BOOT_DEVICES) {
+        fprintf(stderr, "Too many boot devices for PC\n");
+        exit(1);
+    }
+    for (i = 0; i < nbds; i++) {
+        bds[i] = boot_device2nibble(boot_device[i]);
+        if (bds[i] == 0) {
+            fprintf(stderr, "Invalid boot device for PC: '%c'\n",
+                    boot_device[i]);
+            exit(1);
+        }
+    }
+    rtc_set_memory(s, 0x3d, (bds[1] << 4) | bds[0]);
+    rtc_set_memory(s, 0x38, (bds[2] << 4) | (fd_bootchk ?  0x0 : 0x1));
+
+    /* floppy type */
+
+    fd0 = fdctrl_get_drive_type(floppy_controller, 0);
+    fd1 = fdctrl_get_drive_type(floppy_controller, 1);
+
+    val = (cmos_get_fd_drive_type(fd0) << 4) | cmos_get_fd_drive_type(fd1);
+    rtc_set_memory(s, 0x10, val);
+
+    val = 0;
+    nb = 0;
+    if (fd0 < 3)
+        nb++;
+    if (fd1 < 3)
+        nb++;
+    switch (nb) {
+    case 0:
+        break;
+    case 1:
+        val |= 0x01; /* 1 drive, ready for boot */
+        break;
+    case 2:
+        val |= 0x41; /* 2 drives, ready for boot */
+        break;
+    }
+    val |= 0x02; /* FPU is there */
+    val |= 0x04; /* PS/2 mouse installed */
+    rtc_set_memory(s, REG_EQUIPMENT_BYTE, val);
+
+    /* hard drives */
+
+    rtc_set_memory(s, 0x12, (hd_table[0] ? 0xf0 : 0) | (hd_table[1] ? 0x0f : 0));
+    if (hd_table[0])
+        cmos_init_hd(0x19, 0x1b, hd_table[0]);
+    if (hd_table[1])
+        cmos_init_hd(0x1a, 0x24, hd_table[1]);
+
+    val = 0;
+    for (i = 0; i < 4; i++) {
+        if (hd_table[i]) {
+            int cylinders, heads, sectors, translation;
+            /* NOTE: bdrv_get_geometry_hint() returns the physical
+                geometry.  It is always such that: 1 <= sects <= 63, 1
+                <= heads <= 16, 1 <= cylinders <= 16383. The BIOS
+                geometry can be different if a translation is done. */
+            translation = bdrv_get_translation_hint(hd_table[i]);
+            if (translation == BIOS_ATA_TRANSLATION_AUTO) {
+                bdrv_get_geometry_hint(hd_table[i], &cylinders, &heads, &sectors);
+                if (cylinders <= 1024 && heads <= 16 && sectors <= 63) {
+                    /* No translation. */
+                    translation = 0;
+                } else {
+                    /* LBA translation. */
+                    translation = 1;
+                }
+            } else {
+                translation--;
+            }
+            val |= translation << (i * 2);
+        }
+    }
+    rtc_set_memory(s, 0x39, val);
+}
+
+#if 0
+void ioport_set_a20(int enable)
+{
+    /* XXX: send to all CPUs ? */
+    cpu_x86_set_a20(first_cpu, enable);
+}
+
+int ioport_get_a20(void)
+{
+    return ((first_cpu->a20_mask >> 20) & 1);
+}
+#endif
+
+static void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
+{
+    ioport_set_a20((val >> 1) & 1);
+    /* XXX: bit 0 is fast reset */
+}
+
+static uint32_t ioport92_read(void *opaque, uint32_t addr)
+{
+    return ioport_get_a20() << 1;
+}
+
+/***********************************************************/
+/* Bochs BIOS debug ports */
+
+static void bochs_bios_write(void *opaque, uint32_t addr, uint32_t val)
+{
+    static const char shutdown_str[8] = "Shutdown";
+    static int shutdown_index = 0;
+
+    switch(addr) {
+        /* Bochs BIOS messages */
+    case 0x400:
+    case 0x401:
+        fprintf(stderr, "BIOS panic at rombios.c, line %d\n", val);
+        exit(1);
+    case 0x402:
+    case 0x403:
+#ifdef DEBUG_BIOS
+        fprintf(stderr, "%c", val);
+#endif
+        break;
+    case 0x8900:
+        /* same as Bochs power off */
+        if (val == shutdown_str[shutdown_index]) {
+            shutdown_index++;
+            if (shutdown_index == 8) {
+                shutdown_index = 0;
+                qemu_system_shutdown_request();
+            }
+        } else {
+            shutdown_index = 0;
+        }
+        break;
+
+        /* LGPL'ed VGA BIOS messages */
+    case 0x501:
+    case 0x502:
+        fprintf(stderr, "VGA BIOS panic, line %d\n", val);
+        exit(1);
+    case 0x500:
+    case 0x503:
+#ifdef DEBUG_BIOS
+        fprintf(stderr, "%c", val);
+#endif
+        break;
+    }
+}
+
+static void bochs_bios_init(void)
+{
+    void *fw_cfg;
+
+    register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL);
+    register_ioport_write(0x401, 1, 2, bochs_bios_write, NULL);
+    register_ioport_write(0x402, 1, 1, bochs_bios_write, NULL);
+    register_ioport_write(0x403, 1, 1, bochs_bios_write, NULL);
+    register_ioport_write(0x8900, 1, 1, bochs_bios_write, NULL);
+
+    register_ioport_write(0x501, 1, 2, bochs_bios_write, NULL);
+    register_ioport_write(0x502, 1, 2, bochs_bios_write, NULL);
+    register_ioport_write(0x500, 1, 1, bochs_bios_write, NULL);
+    register_ioport_write(0x503, 1, 1, bochs_bios_write, NULL);
+
+    fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
+    fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
+}
+
+#if 0
+/* Generate an initial boot sector which sets state and jump to
+   a specified vector */
+static void generate_bootsect(uint8_t *option_rom,
+                              uint32_t gpr[8], uint16_t segs[6], uint16_t ip)
+{
+    uint8_t rom[512], *p, *reloc;
+    uint8_t sum;
+    int i;
+
+    memset(rom, 0, sizeof(rom));
+
+    p = rom;
+    /* Make sure we have an option rom signature */
+    *p++ = 0x55;
+    *p++ = 0xaa;
+
+    /* ROM size in sectors*/
+    *p++ = 1;
+
+    /* Hook int19 */
+
+    *p++ = 0x50;		/* push ax */
+    *p++ = 0x1e;		/* push ds */
+    *p++ = 0x31; *p++ = 0xc0;	/* xor ax, ax */
+    *p++ = 0x8e; *p++ = 0xd8;	/* mov ds, ax */
+
+    *p++ = 0xc7; *p++ = 0x06;   /* movw _start,0x64 */
+    *p++ = 0x64; *p++ = 0x00;
+    reloc = p;
+    *p++ = 0x00; *p++ = 0x00;
+
+    *p++ = 0x8c; *p++ = 0x0e;   /* mov cs,0x66 */
+    *p++ = 0x66; *p++ = 0x00;
+
+    *p++ = 0x1f;		/* pop ds */
+    *p++ = 0x58;		/* pop ax */
+    *p++ = 0xcb;		/* lret */
+    
+    /* Actual code */
+    *reloc = (p - rom);
+
+    *p++ = 0xfa;		/* CLI */
+    *p++ = 0xfc;		/* CLD */
+
+    for (i = 0; i < 6; i++) {
+	if (i == 1)		/* Skip CS */
+	    continue;
+
+	*p++ = 0xb8;		/* MOV AX,imm16 */
+	*p++ = segs[i];
+	*p++ = segs[i] >> 8;
+	*p++ = 0x8e;		/* MOV <seg>,AX */
+	*p++ = 0xc0 + (i << 3);
+    }
+
+    for (i = 0; i < 8; i++) {
+	*p++ = 0x66;		/* 32-bit operand size */
+	*p++ = 0xb8 + i;	/* MOV <reg>,imm32 */
+	*p++ = gpr[i];
+	*p++ = gpr[i] >> 8;
+	*p++ = gpr[i] >> 16;
+	*p++ = gpr[i] >> 24;
+    }
+
+    *p++ = 0xea;		/* JMP FAR */
+    *p++ = ip;			/* IP */
+    *p++ = ip >> 8;
+    *p++ = segs[1];		/* CS */
+    *p++ = segs[1] >> 8;
+
+    /* sign rom */
+    sum = 0;
+    for (i = 0; i < (sizeof(rom) - 1); i++)
+        sum += rom[i];
+    rom[sizeof(rom) - 1] = -sum;
+
+    memcpy(option_rom, rom, sizeof(rom));
+}
+
+static long get_file_size(FILE *f)
+{
+    long where, size;
+
+    /* XXX: on Unix systems, using fstat() probably makes more sense */
+
+    where = ftell(f);
+    fseek(f, 0, SEEK_END);
+    size = ftell(f);
+    fseek(f, where, SEEK_SET);
+
+    return size;
+}
+
+static void load_linux(uint8_t *option_rom,
+                       const char *kernel_filename,
+		       const char *initrd_filename,
+		       const char *kernel_cmdline)
+{
+    uint16_t protocol;
+    uint32_t gpr[8];
+    uint16_t seg[6];
+    uint16_t real_seg;
+    int setup_size, kernel_size, initrd_size, cmdline_size;
+    uint32_t initrd_max;
+    uint8_t header[1024];
+    target_phys_addr_t real_addr, prot_addr, cmdline_addr, initrd_addr;
+    FILE *f, *fi;
+
+    /* Align to 16 bytes as a paranoia measure */
+    cmdline_size = (strlen(kernel_cmdline)+16) & ~15;
+
+    /* load the kernel header */
+    f = fopen(kernel_filename, "rb");
+    if (!f || !(kernel_size = get_file_size(f)) ||
+	fread(header, 1, 1024, f) != 1024) {
+	fprintf(stderr, "qemu: could not load kernel '%s'\n",
+		kernel_filename);
+	exit(1);
+    }
+
+    /* kernel protocol version */
+#if 0
+    fprintf(stderr, "header magic: %#x\n", ldl_p(header+0x202));
+#endif
+    if (ldl_p(header+0x202) == 0x53726448)
+	protocol = lduw_p(header+0x206);
+    else
+	protocol = 0;
+
+    if (protocol < 0x200 || !(header[0x211] & 0x01)) {
+	/* Low kernel */
+	real_addr    = 0x90000;
+	cmdline_addr = 0x9a000 - cmdline_size;
+	prot_addr    = 0x10000;
+    } else if (protocol < 0x202) {
+	/* High but ancient kernel */
+	real_addr    = 0x90000;
+	cmdline_addr = 0x9a000 - cmdline_size;
+	prot_addr    = 0x100000;
+    } else {
+	/* High and recent kernel */
+	real_addr    = 0x10000;
+	cmdline_addr = 0x20000;
+	prot_addr    = 0x100000;
+    }
+
+#if 0
+    fprintf(stderr,
+	    "qemu: real_addr     = 0x" TARGET_FMT_plx "\n"
+	    "qemu: cmdline_addr  = 0x" TARGET_FMT_plx "\n"
+	    "qemu: prot_addr     = 0x" TARGET_FMT_plx "\n",
+	    real_addr,
+	    cmdline_addr,
+	    prot_addr);
+#endif
+
+    /* highest address for loading the initrd */
+    if (protocol >= 0x203)
+	initrd_max = ldl_p(header+0x22c);
+    else
+	initrd_max = 0x37ffffff;
+
+    if (initrd_max >= ram_size-ACPI_DATA_SIZE)
+	initrd_max = ram_size-ACPI_DATA_SIZE-1;
+
+    /* kernel command line */
+    pstrcpy_targphys(cmdline_addr, 4096, kernel_cmdline);
+
+    if (protocol >= 0x202) {
+	stl_p(header+0x228, cmdline_addr);
+    } else {
+	stw_p(header+0x20, 0xA33F);
+	stw_p(header+0x22, cmdline_addr-real_addr);
+    }
+
+    /* loader type */
+    /* High nybble = B reserved for Qemu; low nybble is revision number.
+       If this code is substantially changed, you may want to consider
+       incrementing the revision. */
+    if (protocol >= 0x200)
+	header[0x210] = 0xB0;
+
+    /* heap */
+    if (protocol >= 0x201) {
+	header[0x211] |= 0x80;	/* CAN_USE_HEAP */
+	stw_p(header+0x224, cmdline_addr-real_addr-0x200);
+    }
+
+    /* load initrd */
+    if (initrd_filename) {
+	if (protocol < 0x200) {
+	    fprintf(stderr, "qemu: linux kernel too old to load a ram disk\n");
+	    exit(1);
+	}
+
+	fi = fopen(initrd_filename, "rb");
+	if (!fi) {
+	    fprintf(stderr, "qemu: could not load initial ram disk '%s'\n",
+		    initrd_filename);
+	    exit(1);
+	}
+
+	initrd_size = get_file_size(fi);
+	initrd_addr = (initrd_max-initrd_size) & ~4095;
+
+        fprintf(stderr, "qemu: loading initrd (%#x bytes) at 0x" TARGET_FMT_plx
+                "\n", initrd_size, initrd_addr);
+
+	if (!fread_targphys_ok(initrd_addr, initrd_size, fi)) {
+	    fprintf(stderr, "qemu: read error on initial ram disk '%s'\n",
+		    initrd_filename);
+	    exit(1);
+	}
+	fclose(fi);
+
+	stl_p(header+0x218, initrd_addr);
+	stl_p(header+0x21c, initrd_size);
+    }
+
+    /* store the finalized header and load the rest of the kernel */
+    cpu_physical_memory_write(real_addr, header, 1024);
+
+    setup_size = header[0x1f1];
+    if (setup_size == 0)
+	setup_size = 4;
+
+    setup_size = (setup_size+1)*512;
+    kernel_size -= setup_size;	/* Size of protected-mode code */
+
+    if (!fread_targphys_ok(real_addr+1024, setup_size-1024, f) ||
+	!fread_targphys_ok(prot_addr, kernel_size, f)) {
+	fprintf(stderr, "qemu: read error on kernel '%s'\n",
+		kernel_filename);
+	exit(1);
+    }
+    fclose(f);
+
+    /* generate bootsector to set up the initial register state */
+    real_seg = real_addr >> 4;
+    seg[0] = seg[2] = seg[3] = seg[4] = seg[5] = real_seg;
+    seg[1] = real_seg+0x20;	/* CS */
+    memset(gpr, 0, sizeof gpr);
+    gpr[4] = cmdline_addr-real_addr-16;	/* SP (-16 is paranoia) */
+
+    generate_bootsect(option_rom, gpr, seg, 0);
+}
+#endif
+
+static void main_cpu_reset(void *opaque)
+{
+    CPUState *env = opaque;
+    cpu_reset(env);
+}
+
+static const int ide_iobase[2] = { 0x1f0, 0x170 };
+static const int ide_iobase2[2] = { 0x3f6, 0x376 };
+static const int ide_irq[2] = { 14, 15 };
+
+#define NE2000_NB_MAX 6
+
+static const int ne2000_io[NE2000_NB_MAX] = { 0x300, 0x320, 0x340, 0x360, 0x280, 0x380 };
+static const int ne2000_irq[NE2000_NB_MAX] = { 9, 10, 11, 3, 4, 5 };
+
+static const int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
+static const int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
+
+static const int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
+static const int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
+
+#if 0 //def HAS_AUDIO
+static void audio_init (PCIBus *pci_bus, qemu_irq *pic)
+{
+    struct soundhw *c;
+    int audio_enabled = 0;
+
+    for (c = soundhw; !audio_enabled && c->name; ++c) {
+        audio_enabled = c->enabled;
+    }
+
+    if (audio_enabled) {
+        AudioState *s;
+
+        s = AUD_init ();
+        if (s) {
+            for (c = soundhw; c->name; ++c) {
+                if (c->enabled) {
+                    if (c->isa) {
+                        c->init.init_isa (s, pic);
+                    }
+                    else {
+                        if (pci_bus) {
+                            c->init.init_pci (pci_bus, s);
+                        }
+                    }
+                }
+            }
+        }
+    }
+}
+
+static void pc_init_ne2k_isa(NICInfo *nd, qemu_irq *pic)
+{
+    static int nb_ne2k = 0;
+
+    if (nb_ne2k == NE2000_NB_MAX)
+        return;
+    isa_ne2000_init(ne2000_io[nb_ne2k], pic[ne2000_irq[nb_ne2k]], nd);
+    nb_ne2k++;
+}
+#endif
+
+
+// FIXME copied from net.c
+
+static int parse_macaddr(uint8_t *macaddr, const char *p)
+{
+    int i;
+    char *last_char;
+    long int offset;
+
+    errno = 0;
+    offset = strtol(p, &last_char, 0);    
+    if (0 == errno && '\0' == *last_char &&
+            offset >= 0 && offset <= 0xFFFFFF) {
+        macaddr[3] = (offset & 0xFF0000) >> 16;
+        macaddr[4] = (offset & 0xFF00) >> 8;
+        macaddr[5] = offset & 0xFF;
+        return 0;
+    } else {
+        for(i = 0; i < 6; i++) {
+            macaddr[i] = strtol(p, (char **)&p, 16);
+            if (i == 5) {
+                if (*p != '\0')
+                    return -1;
+            } else {
+                if (*p != ':' && *p != '-')
+                    return -1;
+                p++;
+            }
+        }
+        return 0;    
+    }
+
+    return -1;
+}
+
+/* Device Interface */
+
+/*
+ * Device life cycle:
+ *
+ * 1. Configuration: config() method runs after parent's.  Except kids
+ * are skipped when the parent's config() returns non-zero.  config()
+ * should initialize the device's private data from its configuration
+ * sub-tree.  It may edit the configuration sub-tree, and may declare
+ * initialization ordering constraints with tree_require_named().
+ * 
+ * 2. Initialization: init() method runs after parent's and after that
+ * of devices declared required by config().  It should not touch the
+ * configuration tree.
+ *
+ * 3. Start: start() method runs, order is unspecified.
+ * 
+ * Error handling in these driver methods: print to stderr and exit
+ * the program unsuccessfully.
+ *
+ * There is no device shutdown protocol yet.
+ */
+
+struct dt_device {
+    struct tree *conf;		/* configuration sub-tree */
+    struct dt_driver *drv;	/* device driver */
+    void *priv;			/* device private data */
+};
+
+struct dt_driver {
+    const char *name;
+    size_t privsz;		/* size of device private data */
+    struct dt_prop_spec *prop_spec; /* recognized conf node properties */
+    int (*config)(struct dt_device *);
+    void (*init)(struct dt_device *);
+    void (*start)(struct dt_device *);
+};
+
+static struct dt_driver dt_driver_table[];
+
+static struct dt_driver *
+dt_driver_by_name(const char *name)
+{
+    int i;
+
+    for (i = 0; dt_driver_table[i].name; i++) {
+	if (!strcmp(name, dt_driver_table[i].name))
+	    return &dt_driver_table[i];
+    }
+    return NULL;
+}
+
+static struct dt_device *
+dt_device_of(struct tree *conf)
+{
+    return tree_get_user(conf);
+}
+
+static struct dt_device *
+dt_new_device(struct tree *conf, struct dt_driver *drv)
+{
+    struct dt_device *dev;
+    struct tree_prop *prop;
+
+    dev = qemu_malloc(sizeof(*dev));
+    dev->conf = conf;
+    dev->drv = drv;
+    dev->priv = qemu_malloc(drv->privsz);
+    tree_put_user(conf, dev);
+
+    TREE_FOREACH_PROP(prop, conf)
+	dt_parse_prop(dev, prop);
+
+    return dev;
+}
+
+static void
+dt_config(struct tree *conf)
+{
+    struct dt_driver *drv;
+    struct dt_device *dev;
+    struct tree *kid;
+
+    drv = dt_driver_by_name(tree_node_name(conf));
+    if (!drv) {
+	fprintf(stderr, "No driver for device %s\n",
+		tree_node_name(conf));
+	exit(1);
+    }
+    dev = dt_new_device(conf, drv);
+    if (drv->config) {
+	if (drv->config(dev))
+	    return;
+    }
+
+    TREE_FOREACH_KID(kid, conf)
+	dt_config(kid);
+}
+
+static void
+dt_init_visitor(struct tree *node, void *arg)
+{
+    struct dt_device *dev = dt_device_of(node);
+
+    if (dev && dev->drv->init)
+	dev->drv->init(dev);
+}
+
+static void
+dt_init(struct tree *conf)
+{
+    tree_visit(conf, dt_init_visitor, NULL);
+}
+
+static void
+dt_start(struct tree *conf)
+{
+    struct dt_device *dev = dt_device_of(conf);
+    struct tree *kid;
+
+    if (dev && dev->drv->start)
+	dev->drv->start(dev);
+
+    TREE_FOREACH_KID(kid, conf)
+	dt_start(kid);
+}
+
+\f

+/* Device properties */
+
+/*
+ * This is for parsing configuration tree node properties into device
+ * private data.
+ */
+
+struct dt_prop_spec {
+    const char *name;
+    ptrdiff_t offs;		/* offset in device private data */
+    size_t size;		/* size there, for sanity checking */
+    int (*parse)(void *, const char *, struct dt_prop_spec *);
+};
+
+#define DT_PROP_SPEC_INIT(name, strty, member, fmt)			\
+    { name, offsetof(strty, member), sizeof(((strty *)0)->member),	\
+      dt_parse_##fmt }
+
+static struct dt_prop_spec *
+dt_prop_spec_by_name(struct dt_driver *drv, const char *name)
+{
+    struct dt_prop_spec *spec;
+
+    for (spec = drv->prop_spec; spec && spec->name; spec++) {
+	if (!strcmp(spec->name, name))
+	    return spec;
+    }
+    return NULL;
+}
+
+static void
+dt_parse_prop(struct dt_device *dev, struct tree_prop *prop)
+{
+    const char *name = tree_prop_name(prop);
+    size_t size;
+    const char *val = tree_prop_value(prop, &size);
+    struct dt_prop_spec *spec = dt_prop_spec_by_name(dev->drv, name);
+
+    if (!spec) {
+	fprintf(stderr, "A %s device has no property %s\n",
+		dev->drv->name, name);
+	exit(1);
+    }
+
+    if (memchr(val, 0, size) != val + size - 1
+	|| spec->parse((char *)dev->priv + spec->offs, val, spec) < 0) {
+	fprintf(stderr, "Bad value %.*s for property %s of device %s\n",
+		(int)size, val, name, dev->drv->name);
+	exit(1);
+    }
+}
+
+static int
+dt_parse_string(void *dst, const char *src, struct dt_prop_spec *spec)
+{
+    assert(spec->size == sizeof(char *));
+    *(const char **)dst = src;
+    return 0;
+}
+
+static int
+dt_parse_int(void *dst, const char *src, struct dt_prop_spec *spec)
+{
+    char *ep;
+    long val;
+
+    assert(spec->size == sizeof(int));
+    errno = 0;
+    val = strtol(src, &ep, 0);
+    if (*ep || ep == src || errno || (int)val != val)
+	return -1;
+    *(int *)dst = val;
+    return 0;
+}
+
+static int
+dt_parse_ram_addr_t(void *dst, const char *src, struct dt_prop_spec *spec)
+{
+    char *ep;
+    unsigned long val;
+
+    assert(spec->size == sizeof(ram_addr_t));
+    errno = 0;
+    val = strtoul(src, &ep, 0);
+    if (*ep || ep == src || errno || (ram_addr_t)val != val)
+	return -1;
+    *(ram_addr_t *)dst = val;
+    return 0;
+}
+
+static int
+dt_parse_macaddr(void *dst, const char *src, struct dt_prop_spec *spec)
+{
+    assert(spec->size == 6);
+    if (parse_macaddr(dst, src) < 0)
+	return -1;
+    return 0;
+}
+
+\f

+/* CPUs Driver */
+
+struct dt_device_cpus {
+    const char *model;
+    int num;
+};
+
+static struct dt_prop_spec dt_cpus_props[] = {
+    DT_PROP_SPEC_INIT("model", struct dt_device_cpus, model, string),
+    DT_PROP_SPEC_INIT("num", struct dt_device_cpus, num, int),
+};
+
+static void
+dt_cpus_init(struct dt_device *dev)
+{
+    struct dt_device_cpus *priv = dev->priv;
+    int i;
+    CPUState *env;
+
+    for(i = 0; i < priv->num; i++) {
+        env = cpu_init(priv->model);
+        if (!env) {
+            fprintf(stderr, "Unable to find x86 CPU definition\n");
+            exit(1);
+        }
+        if (i != 0)
+            env->halted = 1;
+        qemu_register_reset(main_cpu_reset, env);
+    }
+}
+
+\f

+/* Memory Ranges */
+
+struct dt_device_memrng {
+    target_phys_addr_t phys_addr;
+    ram_addr_t size;
+    ram_addr_t host_offs;
+    ram_addr_t flags;
+};
+
+static void
+dt_memrng(struct dt_device_memrng *rng,
+	  target_phys_addr_t phys_addr, ram_addr_t size,
+	  ram_addr_t host_offs, ram_addr_t flags)
+{
+    rng->phys_addr = phys_addr;
+    rng->size = size;
+    rng->host_offs = host_offs;
+    rng->flags = flags;
+}
+
+static void
+dt_memrng_ram(struct dt_device_memrng *rng,
+	      target_phys_addr_t phys_addr, ram_addr_t size)
+{
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), 0);
+}
+
+static void
+dt_memrng_rom(struct dt_device_memrng *rng,
+	      target_phys_addr_t phys_addr, ram_addr_t maxsz,
+	      const char *dir, const char *image, int top)
+{
+    char buf[1024];
+    int size;
+
+    snprintf(buf, sizeof(buf), "%s/%s", dir, image);
+    size = get_image_size(buf);
+    if (size < 0 || size > maxsz)
+	goto error;
+    if (top)
+	phys_addr = phys_addr + maxsz - size;
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), IO_MEM_ROM);
+    if (load_image(buf, phys_ram_base + rng->host_offs) != size)
+	goto error;
+    return;
+
+error:
+    fprintf(stderr, "qemu: could not load image '%s'\n", buf);
+    exit(1);
+}
+
+static void
+dt_memrng_init(struct dt_device_memrng *rng, int n)
+{
+    int i;
+
+    for (i = 0; i < n; i++)
+	cpu_register_physical_memory(rng[i].phys_addr, rng[i].size,
+				     rng[i].host_offs | rng[i].flags);
+}
+
+\f

+/* Memory Driver */
+
+struct dt_device_memory {
+    ram_addr_t ram_size;
+    struct dt_device_memrng *rng;
+    int nrng;
+    /* TODO want a real memory map here */
+    ram_addr_t below_4g, above_4g;
+};
+
+static struct dt_prop_spec dt_memory_props[] = {
+    DT_PROP_SPEC_INIT("ram", struct dt_device_memory, ram_size, ram_addr_t),
+};
+
+static int
+dt_memory_config(struct dt_device *dev)
+{
+    /* TODO memory map hardcoded; get it from dev->conf instead */
+    struct dt_device_memory *priv = dev->priv;
+    struct dt_device_memrng *rng = qemu_malloc(sizeof(*rng) * 4);
+
+    if (priv->ram_size >= 0xe0000000) {
+        priv->above_4g = priv->ram_size - 0xe0000000;
+        priv->below_4g = 0xe0000000;
+    } else {
+        priv->below_4g = priv->ram_size;
+	priv->above_4g = 0;
+    }
+
+    dt_memrng_ram(&rng[0], 0, 0xa0000);
+    qemu_ram_alloc(0x60000);
+    dt_memrng_ram(&rng[1], 0x100000, priv->below_4g - 0x100000);
+    if (priv->above_4g)
+	abort();		/* TODO */
+    dt_memrng_rom(&rng[2], 0xe0000000, 0x20000000,
+		  bios_dir, BIOS_FILENAME, 1);
+				/* TODO get name from dev->conf */
+    dt_memrng(&rng[3], 0xe0000, 0x20000,
+	      rng[2].host_offs + rng[2].size - 0x20000, IO_MEM_ROM);
+    /* TODO option ROMs */
+
+    priv->rng = rng;
+    priv->nrng = 4;
+    return 0;
+}
+
+static void
+dt_memory_init(struct dt_device *dev)
+{
+    struct dt_device_memory *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, priv->nrng);
+    bochs_bios_init();
+}
+
+static ram_addr_t
+dt_memory_below_4g(struct tree *memory)
+{
+    struct dt_device *dev = dt_device_of(memory);
+    struct dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->below_4g;
+}
+
+static ram_addr_t
+dt_memory_above_4g(struct tree *memory)
+{
+    struct dt_device *dev = dt_device_of(memory);
+    struct dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->above_4g;
+}
+
+\f

+/* Drives */
+
+static void
+dt_drive_config(struct dt_device *dev, BlockDriverState *drive[], int n)
+{
+    static struct dt_prop_spec spec = { NULL, 0, sizeof(int), NULL };
+    struct tree *kid;
+    int i, index;
+
+    memset(drive, 0, sizeof(*drive) * n);
+    i = 0;
+    TREE_FOREACH_KID(kid, dev->conf) {
+	if (strcmp(tree_node_name(kid), "drive"))
+	    continue;
+	if (dt_parse_int(&index, tree_get_prop_s(kid, "_index"), &spec) < 0)
+	    abort();
+	assert(i < n);
+	drive[i++] = drives_table[index].bdrv;
+    }
+}
+
+\f

+/* PC Miscellaneous Driver */
+
+/*
+ * This is a driver for a whole collection of devices.  Could be
+ * picked apart into separate drivers, I guess.
+ */
+
+struct dt_device_pc_misc {
+    const char *boot_device;
+    int apic;
+    int hpet;
+    qemu_irq *i8259;
+    BlockDriverState *fd[MAX_FD];
+};
+
+static struct dt_prop_spec dt_pc_misc_props[] = {
+    DT_PROP_SPEC_INIT("boot-device", struct dt_device_pc_misc, boot_device,
+		      string),
+};
+
+static int
+dt_pc_misc_config(struct dt_device *dev)
+{
+    struct dt_device_pc_misc *priv = dev->priv;
+
+    priv->apic = 1;
+    priv->hpet = 1;
+    priv->i8259 = NULL;
+    dt_drive_config(dev, priv->fd, sizeof(priv->fd) / sizeof(*priv->fd));
+    return 1;
+}
+
+static void
+dt_pc_misc_init(struct dt_device *dev)
+{
+    struct dt_device_pc_misc *priv = dev->priv;
+    CPUState *env;
+    qemu_irq *cpu_irq;
+    IOAPICState *ioapic;
+    PITState *pit;
+    int i;
+
+    if (priv->apic) {
+	for (env = first_cpu; env; env = env->next_cpu) {
+            env->cpuid_features |= CPUID_APIC;
+	    apic_init(env);
+	}
+    }
+
+    vmport_init();
+
+    cpu_irq = qemu_allocate_irqs(pic_irq_request, NULL, 1);
+    priv->i8259 = i8259_init(cpu_irq[0]);
+    ferr_irq = priv->i8259[13];
+
+    register_ioport_write(0x80, 1, 1, ioport80_write, NULL);
+    register_ioport_write(0xf0, 1, 1, ioportF0_write, NULL);
+
+    rtc_state = rtc_init(0x70, priv->i8259[8], 2000);
+    qemu_register_boot_set(pc_boot_set, rtc_state);
+
+    register_ioport_read(0x92, 1, 1, ioport92_read, NULL);
+    register_ioport_write(0x92, 1, 1, ioport92_write, NULL);
+
+    if (priv->apic) {
+        ioapic = ioapic_init();
+        pic_set_alt_irq_func(isa_pic, ioapic_set_irq, ioapic);
+    }
+
+    pit = pit_init(0x40, priv->i8259[0]);
+    pcspk_init(pit);
+    if (priv->hpet)
+        hpet_init(priv->i8259);
+
+    for(i = 0; i < MAX_SERIAL_PORTS; i++) {
+        if (serial_hds[i]) {
+            serial_init(serial_io[i], priv->i8259[serial_irq[i]], 115200,
+                        serial_hds[i]);
+        }
+    }
+
+    for(i = 0; i < MAX_PARALLEL_PORTS; i++) {
+        if (parallel_hds[i]) {
+            parallel_init(parallel_io[i], priv->i8259[parallel_irq[i]],
+                          parallel_hds[i]);
+        }
+    }
+
+    i8042_init(priv->i8259[1], priv->i8259[12], 0x60);
+    DMA_init(0);
+
+    floppy_controller = fdctrl_init(priv->i8259[6], 2, 0, 0x3f0, priv->fd);
+}
+
+static void
+dt_pc_misc_start(struct dt_device *dev)
+{
+    struct dt_device_pc_misc *priv = dev->priv;
+    struct tree *memory = tree_node_by_name(dev->conf, "/memory");
+    struct tree *piix3 = tree_node_by_name(dev->conf, "/pci/piix3");
+
+    cmos_init(dt_memory_below_4g(memory),
+	      dt_memory_above_4g(memory),
+	      priv->boot_device,
+	      dt_piix3_hd(piix3));
+}
+
+static qemu_irq *
+dt_pc_misc_i8259(struct tree *pc_misc)
+{
+    struct dt_device *dev = dt_device_of(pc_misc);
+    struct dt_device_pc_misc *priv = dev->priv;
+    assert(dev->drv->init == dt_pc_misc_init);
+    return priv->i8259;
+}
+
+\f

+/* PCI Bus Driver */
+
+struct dt_device_pci {
+    PCIBus *bus;
+    struct tree *pc;
+};
+
+static int
+dt_pci_config(struct dt_device *dev)
+{
+    struct dt_device_pci *priv = dev->priv;
+
+    priv->bus = NULL;
+    priv->pc = tree_require_named(dev->conf, "/pc-misc");
+    return 0;
+}
+
+static void
+dt_pci_init(struct dt_device *dev)
+{
+    struct dt_device_pci *priv = dev->priv;
+
+    priv->bus = i440fx_init(&i440fx_state, dt_pc_misc_i8259(priv->pc));
+}
+
+static void
+dt_pci_start(struct dt_device *dev)
+{
+    i440fx_init_memory_mappings(i440fx_state);
+}
+
+static void
+dt_must_be_on_pcibus(struct dt_device *dev)
+{
+    struct dt_device *bus = dt_device_of(tree_parent(dev->conf));
+
+    if (bus->drv->init != dt_pci_init) {
+	fprintf(stderr, "Device %s must be on a PCI bus\n", dev->drv->name);
+	exit(1);
+    }
+}
+
+static struct PCIBus *
+dt_get_pcibus(struct dt_device *dev)
+{
+    struct dt_device *bus = dt_device_of(tree_parent(dev->conf));
+
+    assert(bus->drv->init == dt_pci_init);
+    return ((struct dt_device_pci *)bus->priv)->bus;
+}
+
+\f

+/* PIIX3 Driver */
+
+struct dt_device_piix3 {
+    int devfn;
+    int acpi;
+    int usb;
+    struct tree *pc;
+    BlockDriverState *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
+};
+
+static int
+dt_piix3_config(struct dt_device *dev)
+{
+    struct dt_device_piix3 *priv = dev->priv;
+
+    priv->devfn = -1;
+    priv->acpi = 1;
+    priv->usb = 1;
+    priv->pc = tree_require_named(dev->conf, "/pc-misc");
+    dt_drive_config(dev, priv->hd, sizeof(priv->hd) / sizeof(*priv->hd));
+    dt_must_be_on_pcibus(dev);
+    return 1;
+}
+
+static void
+dt_piix3_init(struct dt_device *dev)
+{
+    struct dt_device_piix3 *priv = dev->priv;
+    PCIBus *pci_bus = dt_get_pcibus(dev);
+    qemu_irq *i8259 = dt_pc_misc_i8259(priv->pc);
+    int i;
+
+    priv->devfn = piix3_init(pci_bus, priv->devfn);
+
+    pci_piix3_ide_init(pci_bus, priv->hd, priv->devfn + 1, i8259);
+
+    if (priv->usb)
+	usb_uhci_piix3_init(pci_bus, priv->devfn + 2);
+
+    if (priv->acpi) {
+        uint8_t *eeprom_buf = qemu_mallocz(8 * 256); /* XXX: make this persistent */
+        i2c_bus *smbus;
+
+        /* TODO: Populate SPD eeprom data.  */
+        smbus = piix4_pm_init(pci_bus, priv->devfn + 3, 0xb100, i8259[9]);
+        for (i = 0; i < 8; i++)
+            smbus_eeprom_device_init(smbus, 0x50 + i, eeprom_buf + (i * 256));
+    }
+}
+
+static BlockDriverState **
+dt_piix3_hd(struct tree *piix3)
+{
+    struct dt_device *dev = dt_device_of(piix3);
+    struct dt_device_piix3 *priv = dev->priv;
+
+    assert(dev->drv->init == dt_piix3_init);
+    return priv->hd;
+}
+
+\f

+/* VGA Driver */
+
+struct dt_driver_vga {
+    const char *model;
+    const char *bios;
+    void (*init)(PCIBus *, uint8_t *, ram_addr_t, int);
+};
+
+static void
+pci_vmsvga_init_(PCIBus *bus, uint8_t *vga_ram_base,
+		 ram_addr_t vga_ram_offset, int vga_ram_size)
+{
+    pci_vmsvga_init(bus, vga_ram_base, vga_ram_offset, vga_ram_size);
+}
+
+static void
+pci_vga_init_(PCIBus *bus, uint8_t *vga_ram_base,
+	      ram_addr_t vga_ram_offset, int vga_ram_size)
+{
+    pci_vga_init(bus, vga_ram_base, vga_ram_offset, vga_ram_size, 0, 0);
+}
+
+static struct dt_driver_vga dt_driver_vga_table[] = {
+    { "cirrus", VGABIOS_CIRRUS_FILENAME, pci_cirrus_vga_init },
+    { "vms", VGABIOS_FILENAME, pci_vmsvga_init_ },
+    { "std", VGABIOS_FILENAME, pci_vga_init_ },
+    { NULL, NULL, NULL }
+};
+
+struct dt_device_vga {
+    const char *model;
+    ram_addr_t ram_size;
+    struct dt_device_memrng rng[1];
+    ram_addr_t ram_offs;
+    struct dt_driver_vga *vga_drv;
+};
+
+static struct dt_prop_spec dt_vga_props[] = {
+    DT_PROP_SPEC_INIT("model", struct dt_device_vga, model, string),
+    DT_PROP_SPEC_INIT("ram", struct dt_device_vga, ram_size, ram_addr_t),
+};
+
+static int
+dt_vga_config(struct dt_device *dev)
+{
+    struct dt_device_vga *priv = dev->priv;
+    int i;
+
+    dt_memrng_rom(&priv->rng[0], 0xc0000, 0x10000,
+		  bios_dir, VGABIOS_CIRRUS_FILENAME, 0);
+				/* TODO get name from dev->conf */
+    priv->ram_offs = qemu_ram_alloc(priv->ram_size);
+
+    for (i = 0; dt_driver_vga_table[i].model; i++) {
+	if (!strcmp(dt_driver_vga_table[i].model, priv->model))
+	    break;
+    }
+    if (!dt_driver_vga_table[i].model) {
+	fprintf(stderr, "Unknown VGA model %s\n", priv->model);
+	exit(1);
+    }
+    priv->vga_drv = &dt_driver_vga_table[i];
+    dt_must_be_on_pcibus(dev);
+    return 0;
+}
+
+static void
+dt_vga_init(struct dt_device *dev)
+{
+    struct dt_device_vga *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, 1);
+    priv->vga_drv->init(dt_get_pcibus(dev),
+			phys_ram_base + priv->ram_offs,
+			priv->ram_offs, priv->ram_size);
+}
+
+\f

+/* NIC Driver */
+
+struct dt_device_nic {
+    NICInfo nd;
+    int vlanid;
+};
+
+static struct dt_prop_spec dt_nic_props[] = {
+    DT_PROP_SPEC_INIT("model", struct dt_device_nic, nd.model, string),
+    DT_PROP_SPEC_INIT("mac", struct dt_device_nic, nd.macaddr, macaddr),
+    DT_PROP_SPEC_INIT("name", struct dt_device_nic, nd.name, string),
+    DT_PROP_SPEC_INIT("vlan", struct dt_device_nic, vlanid, int),
+};
+
+static int
+dt_nic_config(struct dt_device *dev)
+{
+    struct dt_device_nic *priv = dev->priv;
+
+    priv->nd.vlan = qemu_find_vlan(priv->vlanid);
+    dt_must_be_on_pcibus(dev);
+    return 0;
+}
+
+static void
+dt_nic_init(struct dt_device *dev)
+{
+    struct dt_device_nic *priv = dev->priv;
+
+    pci_nic_init(dt_get_pcibus(dev), &priv->nd, -1, NULL);
+}
+
+\f

+/* Machine Driver */
+
+static struct dt_driver dt_driver_table[] = {
+    { "", 0, NULL, NULL },
+    { "cpus", sizeof(struct dt_device_cpus), dt_cpus_props,
+      NULL, dt_cpus_init, NULL },
+    { "memory", sizeof(struct dt_device_memory), dt_memory_props,
+      dt_memory_config, dt_memory_init, NULL },
+    { "pc-misc", sizeof(struct dt_device_pc_misc), dt_pc_misc_props,
+      dt_pc_misc_config, dt_pc_misc_init, dt_pc_misc_start },
+    { "pci", sizeof(struct dt_device_pci), NULL,
+      dt_pci_config, dt_pci_init, dt_pci_start },
+    { "piix3", sizeof(struct dt_device_piix3), NULL,
+      dt_piix3_config, dt_piix3_init, NULL },
+    { "vga", sizeof(struct dt_device_vga), dt_vga_props,
+      dt_vga_config, dt_vga_init, NULL },
+    { "nic", sizeof(struct dt_device_nic), dt_nic_props,
+      dt_nic_config, dt_nic_init, NULL },
+    { NULL, 0, NULL, NULL, NULL }
+};
+
+static void
+dt_attach_drive(struct tree *controller, int drive_index)
+{
+    struct tree *node = tree_new_kid(controller, "drive", NULL);
+
+    tree_put_propf(node, "_index", "%d", drive_index);
+}
+
+/*
+ * Extract configuration from arguments and various global variables
+ * and put it into the configuration tree.
+ */
+static void
+dt_customize_config(struct tree *tree,
+		    ram_addr_t ram_size, int vga_ram_size,
+		    const char *boot_device,
+		    const char *kernel_filename,
+		    const char *kernel_cmdline,
+		    const char *initrd_filename,
+		    const char *cpu_model)
+{
+    struct tree *node, *pci;
+    int i, index;
+
+    node = tree_node_by_name(tree, "/cpus");
+    tree_put_propf(node, "num", "%d", smp_cpus);
+    if (cpu_model)
+	tree_put_propf(node, "model", "%s", cpu_model);
+
+    node = tree_node_by_name(tree, "/memory");
+    tree_put_propf(node, "ram", "%#lx", (unsigned long)ram_size);
+
+    pci = tree_node_by_name(tree, "/pci");
+    if (cirrus_vga_enabled || vmsvga_enabled || std_vga_enabled) {
+	node = tree_new_kid(pci, "vga", NULL);
+	tree_put_propf(node, "model", "%s",
+			  cirrus_vga_enabled ? "cirrus" :
+			  vmsvga_enabled ? "vms" : "std");
+	tree_put_propf(node, "ram", "%#x", vga_ram_size);
+    }
+
+    for(i = 0; i < nb_nics; i++) {
+	/* TODO non-PCI NICs */
+	struct NICInfo *n = &nd_table[i];
+
+	node = tree_new_kid(pci, "nic", NULL);
+	tree_put_propf(node, "mac", "%02x:%02x:%02x:%02x:%02x:%02x",
+		       n->macaddr[0], n->macaddr[1], n->macaddr[2],
+		       n->macaddr[3], n->macaddr[4], n->macaddr[5]);
+	tree_put_propf(node, "model", "%s",
+		       n->model ? n->model : "ne2k_pci");
+	if (n->name)
+	    tree_put_propf(node, "name", "%s", n->name);
+	tree_put_propf(node, "vlan", "%d", n->vlan->id);
+    }
+
+    node = tree_node_by_name(pci, "piix3");
+    for(i = 0; i < MAX_IDE_BUS * MAX_IDE_DEVS; i++) {
+        index = drive_get_index(IF_IDE, i / MAX_IDE_DEVS, i % MAX_IDE_DEVS);
+	if (index != -1)
+	    dt_attach_drive(node, index);
+    }
+
+    node = tree_node_by_name(tree, "/pc-misc");
+    tree_put_propf(node, "boot-device", "%s", boot_device);
+    for(i = 0; i < MAX_FD; i++) {
+        index = drive_get_index(IF_FLOPPY, 0, i);
+	if (index != -1)
+	    dt_attach_drive(node, index);
+    }
+
+    if (kernel_filename)
+	abort();		/* TODO */
+}
+
+static void
+pc_init_dt(ram_addr_t ram_size, int vga_ram_size,
+	   const char *boot_device,
+	   const char *kernel_filename,
+	   const char *kernel_cmdline,
+	   const char *initrd_filename,
+	   const char *cpu_model)
+{
+    struct tree *tree;
+
+    tree = tree_create();
+    if (!tree)
+	exit(1);
+    tree_print(tree);
+    dt_customize_config(tree, ram_size, vga_ram_size, boot_device,
+			kernel_filename, kernel_cmdline, initrd_filename,
+			cpu_model);
+    dt_config(tree);
+    tree_print(tree);
+    dt_init(tree);
+    dt_start(tree);
+}
+
+QEMUMachine pcdt_machine = {
+    .name = "pcdt",
+    .desc = "Standard PC (device tree)",
+    .init = pc_init_dt,
+    .ram_require = VGA_RAM_SIZE + PC_MAX_BIOS_SIZE,
+    .max_cpus = 255,
+};
diff --git a/hw/pc.c b/hw/pc.c
index 176730e..fc9ee20 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -57,21 +57,21 @@ static fdctrl_t *floppy_controller;
 static RTCState *rtc_state;
 static PITState *pit;
 static IOAPICState *ioapic;
-static PCIDevice *i440fx_state;
+PCIDevice *i440fx_state;
 
 static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
 {
 }
 
 /* MSDOS compatibility mode FPU exception support */
-static qemu_irq ferr_irq;
+qemu_irq ferr_irq;
 /* XXX: add IGNNE support */
 void cpu_set_ferr(CPUX86State *s)
 {
     qemu_irq_raise(ferr_irq);
 }
 
-static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
 {
     qemu_irq_lower(ferr_irq);
 }
diff --git a/target-i386/machine.c b/target-i386/machine.c
index faab2eb..5a2a0c2 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -7,6 +7,8 @@
 
 void register_machines(void)
 {
+    extern QEMUMachine pcdt_machine;
+    qemu_register_machine(&pcdt_machine);
     qemu_register_machine(&pc_machine);
     qemu_register_machine(&isapc_machine);
 }
diff --git a/tree.c b/tree.c
new file mode 100644
index 0000000..66c3ea5
--- /dev/null
+++ b/tree.c
@@ -0,0 +1,374 @@
+/* Ye Olde Decorated Tree */
+
+#include <assert.h>
+#include <ctype.h>
+#include <errno.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include "tree.h"
+#include "qemu-common.h"
+#include "sys-queue.h"
+
+struct tree {
+    const char *name;
+    LIST_HEAD(, tree_prop) props;
+    struct tree *parent;
+    TAILQ_HEAD(, tree) kids;
+    TAILQ_ENTRY(tree) siblings;
+    void *user;
+    LIST_HEAD(, tree) reqs;
+    LIST_ENTRY(tree) reqlink;
+    int visit;
+};
+
+struct tree_prop {
+    const char *name;
+    const void *val;
+    int sz;
+    struct tree *owner;
+    LIST_ENTRY(tree_prop) link;
+};
+
+struct tree *
+tree_create(void)
+{
+    /* TODO read from a config file; tree is hardcoded for now */
+    static struct tree dt_root = {
+	.name = "",
+    };
+
+    static struct tree dt_cpus = {
+	.name = "cpus",
+	.parent = &dt_root,
+    };
+
+    static struct tree_prop dt_cpus_model = {
+	.name = "model",
+	.val = "qemu32",
+	.sz = -1,
+	.owner = &dt_cpus,
+    };
+
+    static struct tree dt_memory = {
+	.name = "memory",
+	.parent = &dt_root,
+    };
+
+    static struct tree dt_pc_misc = {
+	.name = "pc-misc",
+	.parent = &dt_root,
+    };
+
+    static struct tree dt_pci = {
+	.name = "pci",
+	.parent = &dt_root,
+    };
+
+    static struct tree dt_piix3 = {
+	.name = "piix3",
+	.parent = &dt_pci,
+    };
+
+    static struct tree *dt_nodes[] = {
+	&dt_root, &dt_cpus, &dt_memory, &dt_pc_misc,
+	&dt_pci, &dt_piix3,
+	NULL
+    };
+
+    static struct tree_prop *dt_props[] = {
+	&dt_cpus_model,
+	NULL
+    };
+
+    int i;
+
+    for (i = 0; dt_nodes[i]; i++) {
+	LIST_INIT(&dt_nodes[i]->props);
+	TAILQ_INIT(&dt_nodes[i]->kids);
+	LIST_INIT(&dt_nodes[i]->reqs);
+	if (dt_nodes[i]->parent)
+	    TAILQ_INSERT_TAIL(&dt_nodes[i]->parent->kids,
+			      dt_nodes[i], siblings);
+    }
+    for (i = 0; dt_props[i]; i++) {
+	if (dt_props[i]->sz < 0)
+	    dt_props[i]->sz = strlen(dt_props[i]->val) + 1;
+	LIST_INSERT_HEAD(&dt_props[i]->owner->props, dt_props[i], link);
+    }
+
+    return &dt_root;
+}
+
+const char *
+tree_node_name(const struct tree *node)
+{
+    return node->name;
+}
+
+static struct tree *
+tree_kid_by_name(const struct tree *dt, const char *name)
+{
+    const char *slash = strchr(name, '/');
+    size_t len = slash ? slash - name : strlen(name);
+    struct tree *kid;
+
+    TAILQ_FOREACH(kid, &dt->kids, siblings) {
+	if (!memcmp(kid->name, name, len) && kid->name[len] == 0)
+	    return kid;
+    }
+    return NULL;
+}
+
+struct tree *
+tree_node_by_name(const struct tree *node, const char *name)
+{
+    struct tree *kid;
+    size_t len;
+
+    if (name[0] == '/') {
+	for (; node->parent; node = node->parent) ;
+	name++;
+    }
+
+    if (name[0] == 0)
+	return (struct tree *)node;
+
+    kid = tree_kid_by_name(node, name);
+    if (!kid)
+	return NULL;
+
+    len = strlen(kid->name);
+    if (name[len] == 0)
+	return kid;
+    assert (name[len] == '/');
+
+    while (name[len] == '/') len++;
+    return tree_node_by_name(kid, name + len);
+}
+
+struct tree_prop *
+tree_first_prop(const struct tree *node)
+{
+    return LIST_FIRST(&node->props);
+}
+
+struct tree_prop *
+tree_next_prop(const struct tree_prop *prop)
+{
+    return LIST_NEXT(prop, link);
+}
+
+struct tree_prop *
+tree_get_prop(const struct tree *node, const char *name)
+{
+    struct tree_prop *prop;
+
+    LIST_FOREACH(prop, &node->props, link) {
+	if (!strcmp(prop->name, name))
+	    return prop;
+    }
+    return NULL;
+}
+
+const char *
+tree_get_prop_s(const struct tree *node, const char *name)
+{
+    struct tree_prop *prop = tree_get_prop(node, name);
+    if (!prop
+	|| memchr(prop->val, 0, prop->sz) != (const char *)prop->val + prop->sz - 1) {
+	errno = EINVAL;
+	return NULL;
+    }
+    return prop->val;
+}
+
+const char *
+tree_prop_name(const struct tree_prop *prop)
+{
+    return prop->name;
+}
+
+const void *
+tree_prop_value(const struct tree_prop *prop, size_t *size)
+{
+    if (size)
+	*size = prop->sz;
+    return prop->val;
+}
+
+void
+tree_put_prop(struct tree *node, const char *name,
+	      const void *val, size_t sz)
+{
+    struct tree_prop *prop;
+
+    prop = tree_get_prop(node, name);
+    if (!prop) {
+	prop = qemu_malloc(sizeof(*prop));
+	prop->name = name;
+	prop->owner = node;
+	LIST_INSERT_HEAD(&node->props, prop, link);
+    }
+    /* FIXME need a destructor for val */
+    prop->val = val;
+    prop->sz = sz;
+}
+
+void
+tree_put_propf(struct tree *node, const char *name, const char *fmt, ...)
+{
+    va_list ap;
+    size_t len;
+    char *buf;
+
+    va_start(ap, fmt);
+    len = vsnprintf(NULL, 0, fmt, ap);
+    va_end(ap);
+
+    buf = qemu_malloc(len + 1);
+    va_start(ap, fmt);
+    vsnprintf(buf, len + 1, fmt, ap);
+    va_end(ap);
+
+    tree_put_prop(node, name, buf, len + 1);
+}
+
+void
+tree_put_user(struct tree *node, void *user)
+{
+    node->user = user;
+}
+
+void *
+tree_get_user(const struct tree *node)
+{
+    return node->user;
+}
+
+struct tree *
+tree_new_kid(struct tree *parent, const char *name, void *user)
+{
+    struct tree *kid = qemu_malloc(sizeof(*kid));
+
+    kid->name = name;
+    LIST_INIT(&kid->props);
+    kid->parent = parent;
+    TAILQ_INIT(&kid->kids);
+    TAILQ_INSERT_TAIL(&parent->kids, kid, siblings);
+    kid->user = user;
+    LIST_INIT(&kid->reqs);
+    kid->visit = 0;
+
+    return kid;
+}
+
+struct tree *
+tree_parent(const struct tree *node)
+{
+    return node->parent;
+}
+
+struct tree *
+tree_first_kid(const struct tree *node)
+{
+    return TAILQ_FIRST(&node->kids);
+}
+
+struct tree *
+tree_sibling(const struct tree *node)
+{
+    return TAILQ_NEXT(node, siblings);
+}
+
+void
+tree_require(struct tree *node, struct tree *req)
+{
+    LIST_INSERT_HEAD(&node->reqs, req, reqlink);
+}
+
+struct tree *
+tree_require_named(struct tree *node, const char *reqname)
+{
+    struct tree *req = tree_node_by_name(node, reqname);
+    tree_require(node, req);
+    return req;
+}
+
+static void
+tree_do_visit(struct tree *node,
+	      void (*fun)(struct tree *, void *arg),
+	      void *arg, int visit)
+{
+    struct tree *req, *kid;
+
+    assert(node->visit < visit - 1);
+    node->visit = visit - 1;
+    if (node->parent && node->parent->visit < visit)
+	tree_do_visit(node->parent, fun, arg, visit);
+    LIST_FOREACH(req, &node->reqs, reqlink) {
+	if (req->visit < visit)
+	    tree_do_visit(req, fun, arg, visit);
+    }
+    node->visit = visit;
+    fun(node, arg);
+    TAILQ_FOREACH(kid, &node->kids, siblings) {
+	if (kid->visit < visit - 1)
+	    tree_do_visit(kid, fun, arg, visit);
+    }
+}
+
+void
+tree_visit(struct tree *node,
+	   void (*fun)(struct tree *, void *arg),
+	   void *arg)
+{
+    static int visit;
+
+    visit += 2;
+    tree_do_visit(node, fun, arg, visit);
+}
+
+static void
+tree_print_sub(const struct tree *node, int indent)
+{
+    int i, use_str, sep;
+    const unsigned char *pv;
+    struct tree_prop *prop;
+    struct tree *kid;
+
+    printf("%*s%s {\n", indent, "", node->name[0] ? node->name : "/");
+    LIST_FOREACH(prop, &node->props, link) {
+	printf("%*s%s", indent + 4, "", prop->name);
+	pv = prop->val;
+	if (pv) {
+	    printf(" = ");
+	    use_str = pv[prop->sz - 1] == 0;
+	    for (i = 0; i < prop->sz - 1; i++) {
+		if (!isprint(pv[i]))
+		    use_str = 0;
+	    }
+	    if (use_str)
+		printf("\"%s\"", (const char *)prop->val);
+	    else {
+		sep = '[';
+		for (i = 0; i < prop->sz; i++) {
+		    printf("%c%02x", sep, pv[i]);
+		    sep = ' ';
+		}
+		printf("]");
+	    }
+	}
+	printf(";\n");
+    }
+    TAILQ_FOREACH(kid, &node->kids, siblings)
+	tree_print_sub(kid, indent + 4);
+    printf("%*s};\n", indent, "");
+}
+
+void
+tree_print(const struct tree *node)
+{
+    tree_print_sub(node, 0);
+}
diff --git a/tree.h b/tree.h
new file mode 100644
index 0000000..4b91cf0
--- /dev/null
+++ b/tree.h
@@ -0,0 +1,46 @@
+#ifndef TREE_H
+#define TREE_H
+
+#include <stddef.h>
+
+struct tree;
+struct tree_prop;
+
+struct tree *tree_create(void);
+const char *tree_node_name(const struct tree *node);
+struct tree *tree_node_by_name(const struct tree *node,
+			       const char *name);
+
+struct tree_prop *tree_first_prop(const struct tree *node);
+struct tree_prop *tree_next_prop(const struct tree_prop *prop);
+#define TREE_FOREACH_PROP(var, node) \
+    for (var = tree_first_prop(node); var; var = tree_next_prop(var))
+struct tree_prop *tree_get_prop(const struct tree *node, const char *name);
+const char *tree_get_prop_s(const struct tree *node, const char *name);
+const char *tree_prop_name(const struct tree_prop *prop);
+const void *tree_prop_value(const struct tree_prop *prop, size_t *size);
+void tree_put_prop(struct tree *node, const char *name,
+		   const void *val, size_t sz);
+void tree_put_propf(struct tree *node, const char *name,
+		    const char *fmt, ...)
+    __attribute__((format(printf,3,4)));
+
+void tree_put_user(struct tree *node, void *user);
+void *tree_get_user(const struct tree *node);
+
+struct tree *tree_new_kid(struct tree *parent, const char *name, void *user);
+struct tree *tree_parent(const struct tree *node);
+struct tree *tree_first_kid(const struct tree *node);
+struct tree *tree_sibling(const struct tree *node);
+#define TREE_FOREACH_KID(var, node)					\
+    for (var = tree_first_kid(node); var; var = tree_sibling(var))
+
+void tree_require(struct tree *node, struct tree *req);
+struct tree *tree_require_named(struct tree *node, const char *reqname);
+void tree_visit(struct tree *node,
+		void (*fun)(struct tree *, void *arg),
+		void *arg);
+
+void tree_print(const struct tree *node);
+
+#endif
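
The property tables in the patch (struct dt_prop_spec plus
DT_PROP_SPEC_INIT) boil down to one technique: record each member's
offset and size with offsetof/sizeof, then let a generic loop parse
strings into the driver's private struct.  Below is a minimal standalone
sketch of that technique, outside QEMU.  All names in it (prop_spec,
PROP_SPEC, PROP_SPEC_END, set_prop, cpus_priv) are hypothetical and not
part of the patch, and it assumes every table ends with an explicit
all-NULL sentinel entry so the lookup loop knows where to stop.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's struct dt_prop_spec. */
struct prop_spec {
    const char *name;
    size_t offs;        /* offset of the member in the private struct */
    size_t size;        /* size of the member, for sanity checking */
    int (*parse)(void *dst, const char *src, const struct prop_spec *spec);
};

#define PROP_SPEC(pname, strty, member, fmt) \
    { pname, offsetof(strty, member), sizeof(((strty *)0)->member), \
      parse_##fmt }
#define PROP_SPEC_END { NULL, 0, 0, NULL }  /* sentinel: ends the table */

/* Store the string pointer itself; the caller keeps the string alive. */
static int parse_string(void *dst, const char *src,
                        const struct prop_spec *spec)
{
    assert(spec->size == sizeof(char *));
    *(const char **)dst = src;
    return 0;
}

/* Parse a decimal, octal or hex integer; reject junk and overflow. */
static int parse_int(void *dst, const char *src,
                     const struct prop_spec *spec)
{
    char *ep;
    long val;

    assert(spec->size == sizeof(int));
    errno = 0;
    val = strtol(src, &ep, 0);
    if (ep == src || *ep || errno || val < INT_MIN || val > INT_MAX)
        return -1;
    *(int *)dst = (int)val;
    return 0;
}

/* Example private data, modelled on the "cpus" driver. */
struct cpus_priv {
    const char *model;
    int num;
};

static const struct prop_spec cpus_props[] = {
    PROP_SPEC("model", struct cpus_priv, model, string),
    PROP_SPEC("num", struct cpus_priv, num, int),
    PROP_SPEC_END
};

/*
 * Look up NAME in SPECS and parse VAL into the matching member of PRIV.
 * Returns 0 on success, -1 for an unknown property or a bad value.
 */
int set_prop(void *priv, const struct prop_spec *specs,
             const char *name, const char *val)
{
    const struct prop_spec *spec;

    for (spec = specs; spec->name; spec++) {
        if (!strcmp(spec->name, name))
            return spec->parse((char *)priv + spec->offs, val, spec);
    }
    return -1;
}
```

With a zero-initialized struct cpus_priv, calling set_prop() with
("model", "qemu32") and ("num", "0x4") fills in both members, while an
unknown name or a non-numeric "num" value is rejected with -1.  The point
of the design is that the driver only declares a table; it never parses
its own properties.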


* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-11 15:40 [Qemu-devel] [RFC] Machine description as data Markus Armbruster
@ 2009-02-11 16:31 ` Ian Jackson
  2009-02-11 17:43   ` Markus Armbruster
       [not found]   ` <18834.64870.951989.714873-msK/Ju9w1zmnROeE8kUsYhEHtJm+Wo+I@public.gmane.org>
       [not found] ` <87iqnh6kyv.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 146+ messages in thread
From: Ian Jackson @ 2009-02-11 16:31 UTC (permalink / raw)
  To: qemu-devel

Markus Armbruster writes ("[Qemu-devel] [RFC] Machine description as data"):
> [stuff]

Yes, this is a good approach.  I have one question though:

>    Define an internal machine configuration data structure.  Needs to be
>    sufficiently generic to be able to support even oddball machine
>    types.  Make it a decorated tree, i.e. a tree of named nodes with
>    named properties.

Many real systems are not strictly tree-structured, because there are
hardware devices which connect via several different paths.  For
example, much hardware supported by OpenWRT comes with a built-in
bridge chip connected internally to a hidden ethernet card; a tape
library would have one interface for the robot and a bunch of SCSI
tapereaders; etc.

When an emulation of such a device starts up, it will want to bind to
several parents.  How will you represent this ?

Ian.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-11 16:31 ` Ian Jackson
@ 2009-02-11 17:43   ` Markus Armbruster
       [not found]   ` <18834.64870.951989.714873-msK/Ju9w1zmnROeE8kUsYhEHtJm+Wo+I@public.gmane.org>
  1 sibling, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-11 17:43 UTC (permalink / raw)
  To: qemu-devel

Ian Jackson <Ian.Jackson@eu.citrix.com> writes:

> Markus Armbruster writes ("[Qemu-devel] [RFC] Machine description as data"):
>> [stuff]
>
> Yes, this is a good approach.  I have one question though:
>
>>    Define an internal machine configuration data structure.  Needs to be
>>    sufficiently generic to be able to support even oddball machine
>>    types.  Make it a decorated tree, i.e. a tree of named nodes with
>>    named properties.
>
> Many real systems are not strictly tree-structured, because there are
> hardware devices which connect via several different paths.  For
> example, much hardware supported by OpenWRT comes with a built-in
> bridge chip connected internally to a hidden ethernet card; a tape
> library would have one interface for the robot and a bunch of SCSI
> tapereaders; etc.
>
> When an emulation of such a device starts up, it will want to bind to
> several parents.  How will you represent this ?
>
> Ian.

Generalize the tree to a directed acyclic graph (DAG)?

Got that already, in fact; only the non-tree edges are second-class
citizens.  They are separate from tree edges, and unlike those, they can
only be added by the config() methods.  Maybe it would be easier and
cleaner to make the data structure a DAG from the start, instead of
tacking DAG edges onto a tree as an afterthought.
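
To illustrate the distinction between tree edges and second-class
non-tree edges, here is a minimal sketch.  It is not the prototype's
code: struct node, node_new_kid(), node_reaches() and node_require() are
hypothetical names, loosely modeled on tree_new_kid() and tree_require()
from tree.h, and the layout of the real struct tree may differ.  The
cycle check only looks at requirement edges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node layout; the prototype's real struct tree may differ. */
struct node {
    const char *name;
    struct node *parent;     /* tree edge: at most one parent */
    struct node *first_kid;  /* head of the list of kids */
    struct node *sibling;    /* next kid of the same parent */
    struct node **req;       /* non-tree edges: nodes this one requires */
    size_t nreq;
};

/* Create a node and append it to parent's kids (parent may be NULL). */
static struct node *node_new_kid(struct node *parent, const char *name)
{
    struct node *n = calloc(1, sizeof(*n));
    struct node **link;

    if (!n)
        return NULL;
    n->name = name;
    n->parent = parent;
    if (parent) {
        /* walk to the end of the list to keep kids in creation order */
        for (link = &parent->first_kid; *link; link = &(*link)->sibling)
            ;
        *link = n;
    }
    return n;
}

/* Return 1 if target can be reached from node via requirement edges. */
static int node_reaches(const struct node *node, const struct node *target)
{
    size_t i;

    if (node == target)
        return 1;
    for (i = 0; i < node->nreq; i++)
        if (node_reaches(node->req[i], target))
            return 1;
    return 0;
}

/*
 * Add the non-tree edge node -> req.
 * Returns 0 on success, -1 if the edge would close a cycle among
 * requirement edges or if memory runs out.
 */
static int node_require(struct node *node, struct node *req)
{
    struct node **p;

    if (node_reaches(req, node))
        return -1;
    p = realloc(node->req, (node->nreq + 1) * sizeof(*p));
    if (!p)
        return -1;
    p[node->nreq++] = req;
    node->req = p;
    return 0;
}
```

Making the structure a DAG from the start would roughly amount to
dropping the parent/first_kid/sibling fields and keeping only one kind
of edge.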

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-11 15:40 [Qemu-devel] [RFC] Machine description as data Markus Armbruster
@ 2009-02-11 18:50     ` Hollis Blanchard
       [not found] ` <87iqnh6kyv.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
                       ` (9 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Hollis Blanchard @ 2009-02-11 18:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: devicetree-discuss

On Wed, 2009-02-11 at 16:40 +0100, Markus Armbruster wrote:
> Sorry for the length of this memo.  I tried to make it as concise as I
> could.  And there's working mock-up source code to go with it.
> 
> 
> Configuration should be data
> ----------------------------
> 
> A QEMU machine (selected with -M) is described by a struct QEMUMachine.
> Which contains almost nothing of interest.  Pretty much everything,
> including all the buses and devices is instead created by the machine's
> initialization function.
> 
> Init functions consider a plethora of ad hoc configuration parameters
> set by command line options.  Plenty of stuff remains hard-coded all
> the same.
> 
> Configuration should be data, not code.
> 
> A machine's buses and devices can be expressed as a device tree.  More
> on that below.
> 
> The need for a configuration file
> ---------------------------------
> 
> The command line is a rather odd place to define a virtual machine.
> Command line is fine for manipulating a particular run of the machine,
> but the machine description belongs into a configuration file.
> 
> Once configuration is data, we should be able to initialize it from a
> configuration file with relative ease.
> 
> However, this memo is only about the *internal* representation of
> configuration.  How we get there from a configuration file is a separate
> question.  It's without doubt a relevant question, but I feel I need to
> limit my scope to have a chance of getting anywhere.
> 
> The need for an abstract device interface
> -----------------------------------------
> 
> Currently, each virtual device is created, configured and initialized in
> its own idiosyncratic way.  Some configuration is received as arguments,
> some is passed in global variables.
> 
> This is workable as long as the machine is constructed by ad hoc init
> function code.  The resulting init function tends to be quite a
> hairball, though.
> 
> I'd like to propose an abstract device interface, so we can build a
> machine from its (tree-structured) configuration using just this
> interface.  Device idiosyncrasies are to be hidden in the driver code
> behind the interface.
> 
> What I propose to do
> --------------------
> 
> A. Configuration as data
> 
>    Define an internal machine configuration data structure.  Needs to be
>    sufficiently generic to be able to support even oddball machine
>    types.  Make it a decorated tree, i.e. a tree of named nodes with
>    named properties.
> 
>    Create an instance for a prototype machine type.  Make it a PC,
>    because that's the easiest to test.
> 
>    Define an abstract device interface, initially covering just device
>    configuration and initialization.
> 
>    Implement the device interface for the devices used by the prototype
>    machine type.
> 
>    Do not break existing machine types here.  This means we need to keep
>    legacy interfaces until their last user is gone (step B).  Could
>    become somewhat messy in places for a while.
> 
> B. Convert all the existing machine configurations to data.
> 
>    This can and should be done incrementally, each machine by people who
>    care and know about it.
> 
>    Clean up the legacy interfaces now unused, and any messes we made
>    behind them.
> 
> C. Read (and maybe write) machine configuration
> 
>    The external format to use is debatable.  Compared to the rest of the
>    task, its choice looks like detail to me, but I'm biased ;)
> 
>    Writing the data could be useful for debugging.
> 
> D. Command line options to modify the configuration tree
> 
>    If we want them.
> 
> E. Make legacy command line modify the configuration tree
> 
>    For compatibility.  This is my "favourite" part.
> 
> We need to start with A.  The other tasks are largely independent.
> 
> What I've already done
> ----------------------
> 
> Show me the code, they say.  Find attached a working prototype of step
> A.  It passes the "Linux boots" test for me.  I didn't bother to rebase
> to current HEAD, happy to do that on request.
> 
> Instead of hacking up machine "pc", I created a new machine "pcdt".  I
> took a number of shortcuts:
> 
> * I put the "pcdt" code into the new file dt.c, and copied code from
>   pc.c there.  I could have avoided that by putting my code in pc.c
>   instead.  Putting it in a new file helped me pick apart the pc.c
>   hairball.  To be cleaned up.
> 
> * I copied code from net.c.  Trivial to fix, just give it external
>   linkage there.
> 
> * I hard-coded the configuration tree in the wrong place (tree.c), out of
>   laziness.
> 
> * I didn't implement all the devices of the "pc" original.  The devices
>   I implemented might not support all existing command line options.
> 
> Notable qualities:
> 
> * Device drivers are cleanly separated from each other, and from the
>   device-agnostic configuration code.
> 
> * Each driver specifies the configurable properties in a single place.
> 
> * Device configuration is gotten from the configuration tree, which is
>   fully checked.  Unknown properties are rejected.
> 
> 
> Appendix: Linux device trees
> ----------------------------
> 
> This appendix is probably only of interest to some of you, feel free to
> skip.
> 
> The IEEE 1275 Open Firmware Device Tree solves a somewhat similar
> problem, namely to communicate environmental information (hardware and
> configuration) from firmware to operating system.  It's chiefly used on
> PowerPCs.  The OS calls Open Firmware to query the device tree.
> 
> Linux turns the Open Firmware device tree API into a data format.
> Actually two: the DT blob format is a binary data structure, and the
> DT source format is human-readable text.  The device tree compiler
> "dtc" can convert the two.
> 
> We already have a bit of code dealing with this, in device_tree.c.
> 
> I briefly examined the DT source format and the tree structure it
> describes for the purpose of QEMU configuration.  I decided against
> using it in my prototype because I found it awfully low-level and
> verbose for that purpose (I'm sure it serves the purpose it was designed
> for just fine).  Issues include:
> 
> * Since the DT is designed for booting kernels, not configuring QEMU,
>   there's information that has no place in QEMU configuration, and
>   required QEMU configuration isn't there.

What's needed is a "binding" in IEEE1275-speak: a document that
describes qemu-specific nodes/properties and how they are to be
interpreted.

As an example, you could require that block devices contain properties
named "qemu,path", "qemu,backend", etc.
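
A rough sketch of what checking a node against such a binding could look
like in C follows.  Everything in it is an assumption made for
illustration: struct prop, struct dt_node, dt_get_prop() and
binding_check() are hypothetical names, and properties are simplified to
plain strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified property: both name and value are strings. */
struct prop {
    const char *name;
    const char *value;
};

/* Hypothetical node: a name plus an array of properties. */
struct dt_node {
    const char *name;
    const struct prop *props;
    size_t nprops;
};

/* Return the value of the named property, or NULL if the node lacks it. */
static const char *dt_get_prop(const struct dt_node *n, const char *name)
{
    size_t i;

    for (i = 0; i < n->nprops; i++)
        if (strcmp(n->props[i].name, name) == 0)
            return n->props[i].value;
    return NULL;
}

/*
 * Check a node against a binding, given as a NULL-terminated list of
 * required property names.  Returns the first missing property name,
 * or NULL if the node satisfies the binding.
 */
static const char *binding_check(const struct dt_node *n,
                                 const char *const *required)
{
    for (; *required; required++)
        if (!dt_get_prop(n, *required))
            return *required;
    return NULL;
}
```

A block device binding would then be the list { "qemu,path",
"qemu,backend", NULL }, and a node missing "qemu,backend" would be
rejected with that name.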

> * Redundancy between node name and its device_type property.
> 
> * Property "reg", which encodes address ranges, does so in terms of
>   "cells": #address-cells 32-bit words (big endian) for the address,
>   followed by #size-cells words for the size, where #address-cells and
>   #size-cells are properties of the enclosing bus.  If this sounds
>   like gibberish to you, well, that's my point.
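
To make the "cells" encoding quoted above concrete, here is a minimal C
sketch of decoding one (address, size) entry of a "reg" property.  The
names read_cells(), reg_decode() and struct reg_range are hypothetical,
not taken from device_tree.c, and the sketch assumes at most two cells
each for address and size so that values fit in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine ncells big-endian 32-bit cells starting at p into one value. */
static uint64_t read_cells(const uint8_t *p, unsigned ncells)
{
    uint64_t v = 0;
    unsigned i;

    assert(ncells <= 2);            /* more would not fit in 64 bits */
    for (i = 0; i < ncells * 4; i++)
        v = (v << 8) | p[i];
    return v;
}

struct reg_range {
    uint64_t addr;
    uint64_t size;
};

/*
 * Decode entry number index of a "reg" property of len bytes.
 * addr_cells and size_cells are the #address-cells and #size-cells
 * values of the enclosing bus.
 * Returns 0 on success, -1 if the entry is out of range or the cell
 * counts are not supported by this sketch.
 */
static int reg_decode(const uint8_t *reg, size_t len,
                      unsigned addr_cells, unsigned size_cells,
                      size_t index, struct reg_range *out)
{
    size_t entry;

    if (addr_cells > 2 || size_cells > 2)
        return -1;
    entry = 4 * (size_t)(addr_cells + size_cells);
    if (entry == 0 || (index + 1) * entry > len)
        return -1;
    reg += index * entry;
    out->addr = read_cells(reg, addr_cells);
    out->size = read_cells(reg + 4 * addr_cells, size_cells);
    return 0;
}
```

With #address-cells = 1 and #size-cells = 1, the eight bytes
00 00 03 f8 00 00 00 08 decode to address 0x3f8 and size 8.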

I'm CCing devicetree-discuss for broader discussion.

I won't say IEEE1275 is perfect, but IMHO it would be pretty silly to
reinvent all the design and infrastructure for a similar-but-different
device tree.

[Patch snipped]

-- 
Hollis Blanchard
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-11 16:31 ` Ian Jackson
@ 2009-02-11 18:57       ` Hollis Blanchard
       [not found]   ` <18834.64870.951989.714873-msK/Ju9w1zmnROeE8kUsYhEHtJm+Wo+I@public.gmane.org>
  1 sibling, 0 replies; 146+ messages in thread
From: Hollis Blanchard @ 2009-02-11 18:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss

On Wed, 2009-02-11 at 16:31 +0000, Ian Jackson wrote:
> Markus Armbruster writes ("[Qemu-devel] [RFC] Machine description as data"):
> > [stuff]
> 
> Yes, this is a good approach.  I have one question though:
> 
> >    Define an internal machine configuration data structure.  Needs to be
> >    sufficiently generic to be able to support even oddball machine
> >    types.  Make it a decorated tree, i.e. a tree of named nodes with
> >    named properties.
> 
> Many real systems are not strictly tree-structured, because there are
> hardware devices which connect via several different paths.  For
> example, much hardware supported by OpenWRT comes with a built-in
> bridge chip connected internally to a hidden ethernet card; a tape
> library would have one interface for the robot and a bunch of SCSI
> tapereaders; etc.

I'm not sure these are great examples, since there is still a clear
hierarchy here (e.g. the ethernet card is "behind" the bridge chip).
Also, there is already established practice for representing SoC devices
(found in many embedded PowerPC processors): see arch/powerpc/boot/dts.

However, what *is* a good example would be the interrupt hierarchy,
which can be totally separate from the address/data hierarchy.

The device tree is about *devices*, not interfaces. Each node (device)
can mark itself as implementing multiple interfaces, which is what the
"compatible" property is about.

> When an emulation of such a device starts up, it will want to bind to
> several parents.  How will you represent this ?

There is established design for representing the interrupt hierarchy in
IEEE1275, using explicit "interrupt-parent" properties to create the
interrupt tree.
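
As a sketch of the idea in C: each node keeps its ordinary tree parent,
plus an optional link standing in for a resolved "interrupt-parent"
property.  When the property is absent, the interrupt parent is assumed
to be the tree parent, which is how I understand the usual convention.
struct dev and interrupt_parent() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device node with two independent hierarchies. */
struct dev {
    const char *name;
    struct dev *parent;            /* address/data hierarchy */
    struct dev *interrupt_parent;  /* resolved "interrupt-parent"
                                      property, NULL if absent */
};

/*
 * Return the node's parent in the interrupt tree: the explicit
 * "interrupt-parent" if given, else the ordinary tree parent.
 * Returns NULL for a root node without an explicit interrupt parent.
 */
static struct dev *interrupt_parent(const struct dev *d)
{
    return d->interrupt_parent ? d->interrupt_parent : d->parent;
}
```

A device on a bus can thus sit under the bus in the address/data
hierarchy while its interrupts go to a controller elsewhere in the tree.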

-- 
Hollis Blanchard
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-11 15:40 [Qemu-devel] [RFC] Machine description as data Markus Armbruster
  2009-02-11 16:31 ` Ian Jackson
       [not found] ` <87iqnh6kyv.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
@ 2009-02-11 19:01 ` Anthony Liguori
  2009-02-11 19:36   ` Blue Swirl
  2009-02-16 16:22 ` [Qemu-devel] Machine description as data prototype, take 2 (was: [RFC] Machine description as data) Markus Armbruster
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 146+ messages in thread
From: Anthony Liguori @ 2009-02-11 19:01 UTC (permalink / raw)
  To: qemu-devel

I think your approach is pretty sound.  A few observations:

1) obviously need to eliminate the code duplication

2) the new code should fit with the rest of QEMU stylistically

3) I'd prefer incremental vs. perfect, so let's try to do as much of the
required refactoring as we can before actually going the full 9 yards
and implementing the config file.

4) we don't have to solve all problems all at once as long as we don't 
regress existing features

Regards,

Anthony Liguori

Markus Armbruster wrote:
> Sorry for the length of this memo.  I tried to make it as concise as I
> could.  And there's working mock-up source code to go with it.
>
>
> Configuration should be data
> ----------------------------
>
> A QEMU machine (selected with -M) is described by a struct QEMUMachine.
> Which contains almost nothing of interest.  Pretty much everything,
> including all the buses and devices is instead created by the machine's
> initialization function.
>
> Init functions consider a plethora of ad hoc configuration parameters
> set by command line options.  Plenty of stuff remains hard-coded all
> the same.
>
> Configuration should be data, not code.
>
> A machine's buses and devices can be expressed as a device tree.  More
> on that below.
>
> The need for a configuration file
> ---------------------------------
>
> The command line is a rather odd place to define a virtual machine.
> Command line is fine for manipulating a particular run of the machine,
> but the machine description belongs into a configuration file.
>
> Once configuration is data, we should be able to initialize it from a
> configuration file with relative ease.
>
> However, this memo is only about the *internal* representation of
> configuration.  How we get there from a configuration file is a separate
> question.  It's without doubt a relevant question, but I feel I need to
> limit my scope to have a chance of getting anywhere.
>
> The need for an abstract device interface
> -----------------------------------------
>
> Currently, each virtual device is created, configured and initialized in
> its own idiosyncratic way.  Some configuration is received as arguments,
> some is passed in global variables.
>
> This is workable as long as the machine is constructed by ad hoc init
> function code.  The resulting init function tends to be quite a
> hairball, though.
>
> I'd like to propose an abstract device interface, so we can build a
> machine from its (tree-structured) configuration using just this
> interface.  Device idiosyncrasies are to be hidden in the driver code
> behind the interface.
>
> What I propose to do
> --------------------
>
> A. Configuration as data
>
>    Define an internal machine configuration data structure.  Needs to be
>    sufficiently generic to be able to support even oddball machine
>    types.  Make it a decorated tree, i.e. a tree of named nodes with
>    named properties.
>
>    Create an instance for a prototype machine type.  Make it a PC,
>    because that's the easiest to test.
>
>    Define an abstract device interface, initially covering just device
>    configuration and initialization.
>
>    Implement the device interface for the devices used by the prototype
>    machine type.
>
>    Do not break existing machine types here.  This means we need to keep
>    legacy interfaces until their last user is gone (step B).  Could
>    become somewhat messy in places for a while.
>
> B. Convert all the existing machine configurations to data.
>
>    This can and should be done incrementally, each machine by people who
>    care and know about it.
>
>    Clean up the legacy interfaces now unused, and any messes we made
>    behind them.
>
> C. Read (and maybe write) machine configuration
>
>    The external format to use is debatable.  Compared to the rest of the
>    task, its choice looks like detail to me, but I'm biased ;)
>
>    Writing the data could be useful for debugging.
>
> D. Command line options to modify the configuration tree
>
>    If we want them.
>
> E. Make legacy command line modify the configuration tree
>
>    For compatibility.  This is my "favourite" part.
>
> We need to start with A.  The other tasks are largely independent.
>
> What I've already done
> ----------------------
>
> Show me the code, they say.  Find attached a working prototype of step
> A.  It passes the "Linux boots" test for me.  I didn't bother to rebase
> to current HEAD, happy to do that on request.
>
> Instead of hacking up machine "pc", I created a new machine "pcdt".  I
> took a number of shortcuts:
>
> * I put the "pcdt" code into the new file dt.c, and copied code from
>   pc.c there.  I could have avoided that by putting my code in pc.c
>   instead.  Putting it in a new file helped me pick apart the pc.c
>   hairball.  To be cleaned up.
>
> * I copied code from net.c.  Trivial to fix, just give it external
>   linkage there.
>
> * I hard-coded the configuration tree in the wrong place (tree.c), out of
>   laziness.
>
> * I didn't implement all the devices of the "pc" original.  The devices
>   I implemented might not support all existing command line options.
>
> Notable qualities:
>
> * Device drivers are cleanly separated from each other, and from the
>   device-agnostic configuration code.
>
> * Each driver specifies the configurable properties in a single place.
>
> * Device configuration is gotten from the configuration tree, which is
>   fully checked.  Unknown properties are rejected.
>
>
>   

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [RFC] Machine description as data
  2009-02-11 18:50     ` Hollis Blanchard
@ 2009-02-11 19:34       ` Blue Swirl
  -1 siblings, 0 replies; 146+ messages in thread
From: Blue Swirl @ 2009-02-11 19:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss

On 2/11/09, Hollis Blanchard <hollisb@us.ibm.com> wrote:
> On Wed, 2009-02-11 at 16:40 +0100, Markus Armbruster wrote:
>  > Sorry for the length of this memo.  I tried to make it as concise as I
>  > could.  And there's working mock-up source code to go with it.
>  >
>  >
>  > Configuration should be data
>  > ----------------------------
>  >
>  > A QEMU machine (selected with -M) is described by a struct QEMUMachine.
>  > Which contains almost nothing of interest.  Pretty much everything,
>  > including all the buses and devices is instead created by the machine's
>  > initialization function.
>  >
>  > Init functions consider a plethora of ad hoc configuration parameters
>  > set by command line options.  Plenty of stuff remains hard-coded all
>  > the same.
>  >
>  > Configuration should be data, not code.
>  >
>  > A machine's buses and devices can be expressed as a device tree.  More
>  > on that below.
>  >
>  > The need for a configuration file
>  > ---------------------------------
>  >
>  > The command line is a rather odd place to define a virtual machine.
>  > Command line is fine for manipulating a particular run of the machine,
>  > but the machine description belongs into a configuration file.
>  >
>  > Once configuration is data, we should be able to initialize it from a
>  > configuration file with relative ease.
>  >
>  > However, this memo is only about the *internal* representation of
>  > configuration.  How we get there from a configuration file is a separate
>  > question.  It's without doubt a relevant question, but I feel I need to
>  > limit my scope to have a chance of getting anywhere.
>  >
>  > The need for an abstract device interface
>  > -----------------------------------------
>  >
>  > Currently, each virtual device is created, configured and initialized in
>  > its own idiosyncratic way.  Some configuration is received as arguments,
>  > some is passed in global variables.
>  >
>  > This is workable as long as the machine is constructed by ad hoc init
>  > function code.  The resulting init function tends to be quite a
>  > hairball, though.
>  >
>  > I'd like to propose an abstract device interface, so we can build a
>  > machine from its (tree-structured) configuration using just this
>  > interface.  Device idiosyncrasies are to be hidden in the driver code
>  > behind the interface.
>  >
>  > What I propose to do
>  > --------------------
>  >
>  > A. Configuration as data
>  >
>  >    Define an internal machine configuration data structure.  Needs to be
>  >    sufficiently generic to be able to support even oddball machine
>  >    types.  Make it a decorated tree, i.e. a tree of named nodes with
>  >    named properties.
>  >
>  >    Create an instance for a prototype machine type.  Make it a PC,
>  >    because that's the easiest to test.
>  >
>  >    Define an abstract device interface, initially covering just device
>  >    configuration and initialization.
>  >
>  >    Implement the device interface for the devices used by the prototype
>  >    machine type.
>  >
>  >    Do not break existing machine types here.  This means we need to keep
>  >    legacy interfaces until their last user is gone (step B).  Could
>  >    become somewhat messy in places for a while.
>  >
>  > B. Convert all the existing machine configurations to data.
>  >
>  >    This can and should be done incrementally, each machine by people who
>  >    care and know about it.
>  >
>  >    Clean up the legacy interfaces now unused, and any messes we made
>  >    behind them.
>  >
>  > C. Read (and maybe write) machine configuration
>  >
>  >    The external format to use is debatable.  Compared to the rest of the
>  >    task, its choice looks like detail to me, but I'm biased ;)
>  >
>  >    Writing the data could be useful for debugging.
>  >
>  > D. Command line options to modify the configuration tree
>  >
>  >    If we want them.
>  >
>  > E. Make legacy command line modify the configuration tree
>  >
>  >    For compatibility.  This is my "favourite" part.
>  >
>  > We need to start with A.  The other tasks are largely independent.
>  >
>  > What I've already done
>  > ----------------------
>  >
>  > Show me the code, they say.  Find attached a working prototype of step
>  > A.  It passes the "Linux boots" test for me.  I didn't bother to rebase
>  > to current HEAD, happy to do that on request.
>  >
>  > Instead of hacking up machine "pc", I created a new machine "pcdt".  I
>  > took a number of shortcuts:
>  >
>  > * I put the "pcdt" code into the new file dt.c, and copied code from
>  >   pc.c there.  I could have avoided that by putting my code in pc.c
>  >   instead.  Putting it in a new file helped me pick apart the pc.c
>  >   hairball.  To be cleaned up.
>  >
>  > * I copied code from net.c.  Trivial to fix, just give it external
>  >   linkage there.
>  >
>  > * I hard-coded the configuration tree in the wrong place (tree.c), out of
>  >   laziness.
>  >
>  > * I didn't implement all the devices of the "pc" original.  The devices
>  >   I implemented might not support all existing command line options.
>  >
>  > Notable qualities:
>  >
>  > * Device drivers are cleanly separated from each other, and from the
>  >   device-agnostic configuration code.
>  >
>  > * Each driver specifies the configurable properties in a single place.
>  >
>  > * Device configuration is gotten from the configuration tree, which is
>  >   fully checked.  Unknown properties are rejected.
>  >
>  >
>  > Appendix: Linux device trees
>  > ----------------------------
>  >
>  > This appendix is probably only of interest to some of you, feel free to
>  > skip.
>  >
>  > The IEEE 1275 Open Firmware Device Tree solves a somewhat similar
>  > problem, namely to communicate environmental information (hardware and
>  > configuration) from firmware to operating system.  It's chiefly used on
>  > PowerPCs.  The OS calls Open Firmware to query the device tree.
>  >
>  > Linux turns the Open Firmware device tree API into a data format.
>  > Actually two: the DT blob format is a binary data structure, and the
>  > DT source format is human-readable text.  The device tree compiler
>  > "dtc" can convert between the two.
>  >
>  > We already have a bit of code dealing with this, in device_tree.c.
>  >
>  > I briefly examined the DT source format and the tree structure it
>  > describes for the purpose of QEMU configuration.  I decided against
>  > using it in my prototype because I found it awfully low-level and
>  > verbose for that purpose (I'm sure it serves the purpose it was designed
>  > for just fine).  Issues include:
>  >
>  > * Since the DT is designed for booting kernels, not configuring QEMU,
>  >   there's information that has no place in QEMU configuration, and
>  >   required QEMU configuration isn't there.
>
>
> What's needed is a "binding" in IEEE1275-speak: a document that
>  describes qemu-specific nodes/properties and how they are to be
>  interpreted.
>
>  As an example, you could require that block devices contain properties
>  named "qemu,path", "qemu,backend", etc.

It would be nice to take the QEMU device tree and use it in
OpenBIOS to provide the OF tree. Linux could use the QEMU tree
directly.

>  > * Redundancy between node name and its device_type property.
>  >
>  > * Property "reg", which encodes address ranges, does so in terms of
>  >   "cells": #address-cells 32-bit words (big endian) for the address,
>  >   followed by #size-cells words for the size, where #address-cells and
>  >   #size-cells are properties of the enclosing bus.  If this sounds
>  >   like gibberish to you, well, that's my point.
>
>
> I'm CCing devicetree-discuss for broader discussion.
>
>  I won't say IEEE1275 is perfect, but IMHO it would be pretty silly to
>  reinvent all the design and infrastructure for a similar-but-different
>  device tree.

Fully agree. Based on the suggestions you gave, I don't even think FDT
would need to be extended.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-11 19:01 ` Anthony Liguori
@ 2009-02-11 19:36   ` Blue Swirl
  2009-02-11 19:56     ` Anthony Liguori
  0 siblings, 1 reply; 146+ messages in thread
From: Blue Swirl @ 2009-02-11 19:36 UTC (permalink / raw)
  To: qemu-devel

On 2/11/09, Anthony Liguori <anthony@codemonkey.ws> wrote:
> I think your approach is pretty sound.  A few observations:
>
>  1) obviously need to eliminate the code duplication
>
>  2) the new code should fit with the rest of QEMU stylistically
>
>  3) I'd prefer incremental vs. perfect so let's try to do as much
> refactoring that will be required before actually going the full 9 yards and
> implementing the config file.
>
>  4) we don't have to solve all problems all at once as long as we don't
> regress existing features

I'd still want to see if a FDT based solution is possible before
taking a homebrew version.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-11 19:36   ` Blue Swirl
@ 2009-02-11 19:56     ` Anthony Liguori
  2009-02-12 10:25       ` Markus Armbruster
  0 siblings, 1 reply; 146+ messages in thread
From: Anthony Liguori @ 2009-02-11 19:56 UTC (permalink / raw)
  To: qemu-devel

Blue Swirl wrote:
> On 2/11/09, Anthony Liguori <anthony@codemonkey.ws> wrote:
>   
>> I think your approach is pretty sound.  A few observations:
>>
>>  1) obviously need to eliminate the code duplication
>>
>>  2) the new code should fit with the rest of QEMU stylistically
>>
>>  3) I'd prefer incremental vs. perfect so let's try to do as much
>> refactoring that will be required before actually going the full 9 yards and
>> implementing the config file.
>>
>>  4) we don't have to solve all problems all at once as long as we don't
>> regress existing features
>>     
>
> I'd still want to see if a FDT based solution is possible before
> taking a homebrew version.
>   

I think I mentioned earlier that I am heavily biased toward an FDT
solution.  What I'm suggesting though is that we can do some of the
required cleanup (like device refactoring) before introducing any of the
tree stuff.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-11 18:57       ` Hollis Blanchard
@ 2009-02-12  3:50           ` David Gibson
  -1 siblings, 0 replies; 146+ messages in thread
From: David Gibson @ 2009-02-12  3:50 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: devicetree-discuss, qemu-devel

On Wed, Feb 11, 2009 at 12:57:19PM -0600, Hollis Blanchard wrote:
> On Wed, 2009-02-11 at 16:31 +0000, Ian Jackson wrote:
> > Markus Armbruster writes ("[Qemu-devel] [RFC] Machine description as data"):
> > > [stuff]
> > 
> > Yes, this is a good approach.  I have one question though:
> > 
> > >    Define an internal machine configuration data structure.  Needs to be
> > >    sufficiently generic to be able to support even oddball machine
> > >    types.  Make it a decorated tree, i.e. a tree of named nodes with
> > >    named properties.
> > 
> > Many real systems are not strictly tree-structured, because there are
> > hardware devices which connect via several different paths.  For
> > example, much hardware supported by OpenWRT comes with a built-in
> > bridge chip connected internally to a hidden ethernet card; a tape
> > library would have one interface for the robot and a bunch of SCSI
> > tape readers; etc.
> 
> I'm not sure these are great examples, since there is still a clear
> hierarchy here (e.g. the ethernet card is "behind" the bridge chip).
> Also, there is already established practice for representing SoC devices
> (found in many embedded PowerPC processors): see arch/powerpc/boot/dts.
> 
> However, what *is* a good example would be the interrupt hierarchy,
> which can be totally separate from the address/data hierarchy.
> 
> The device tree is about *devices*, not interfaces. Each node (device)
> can mark itself as implementing multiple interfaces, which is what the
> "compatible" property is about.
> 
> > When an emulation of such a device starts up, it will want to bind to
> > several parents.  How will you represent this ?
> 
> There is established design for representing the interrupt hierarchy in
> IEEE1275, using explicit "interrupt-parent" properties to create the
> interrupt tree.

Note that despite "interrupt tree" being the normal terminology, it's
actually a DAG, which is Hollis' point.
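To illustrate the point about two separate hierarchies, here is a
minimal sketch in C.  It assumes each node carries its tree parent plus
an optional explicit interrupt link, and that a node without an explicit
link falls back to its tree parent.  The struct layout and the name
effective_interrupt_parent() are made up for this example; they are not
taken from QEMU or libfdt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: one parent link per hierarchy. */
struct node {
    const char *name;
    struct node *parent;            /* address/data hierarchy (a tree) */
    struct node *interrupt_parent;  /* explicit interrupt link, may be NULL */
};

/*
 * Return the node that receives this node's interrupts.
 * Assumption for this sketch: an explicit link wins, otherwise
 * the tree parent is used.
 */
struct node *effective_interrupt_parent(const struct node *n)
{
    assert(n != NULL);
    return n->interrupt_parent ? n->interrupt_parent : n->parent;
}
```

With a UART that sits on an ISA bus but is wired to an interrupt
controller elsewhere in the tree, the two parent links point at
different nodes, so the combined structure is no longer a plain tree.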

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-11 18:50     ` Hollis Blanchard
@ 2009-02-12  4:01         ` David Gibson
  -1 siblings, 0 replies; 146+ messages in thread
From: David Gibson @ 2009-02-12  4:01 UTC (permalink / raw)
  To: Hollis Blanchard
  Cc: devicetree-discuss, qemu-devel

On Wed, Feb 11, 2009 at 12:50:28PM -0600, Hollis Blanchard wrote:
> On Wed, 2009-02-11 at 16:40 +0100, Markus Armbruster wrote:
[snip]
> > I briefly examined the DT source format and the tree structure it
> > describes for the purpose of QEMU configuration.  I decided against
> > using it in my prototype because I found it awfully low-level and
> > verbose for that purpose (I'm sure it serves the purpose it was designed
> > for just fine).  Issues include:
> > 
> > * Since the DT is designed for booting kernels, not configuring QEMU,
> >   there's information that has no place in QEMU configuration, and
> >   required QEMU configuration isn't there.
> 
> What's needed is a "binding" in IEEE1275-speak: a document that
> describes qemu-specific nodes/properties and how they are to be
> interpreted.
> 
> As an example, you could require that block devices contain properties
> named "qemu,path", "qemu,backend", etc.

Yes, it shouldn't be hard to annotate an IEEE1275 style tree with
extra information for qemu's use.  As for the other direction, in some
cases it may be appropriate for qemu's device tree code to fill in
missing device tree properties, based on what the device emulation
code knows about itself.
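A rough sketch of the "fill in missing properties" direction, in C.
Everything here is an assumption made for illustration: the fixed-size
node layout, the helper names node_get() and node_fill_defaults(), and
the idea that the device emulation code hands over a table of
properties it knows about itself.  Properties already present in the
tree are left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROPS 8   /* arbitrary limit, keeps the sketch allocation-free */

struct prop {
    const char *name;
    const char *value;
};

struct node {
    const char *name;
    struct prop props[MAX_PROPS];
    size_t nprops;
};

/* Look up a property by name; NULL if the node does not have it. */
const char *node_get(const struct node *n, const char *name)
{
    assert(n != NULL && name != NULL);
    for (size_t i = 0; i < n->nprops; i++) {
        if (strcmp(n->props[i].name, name) == 0) {
            return n->props[i].value;
        }
    }
    return NULL;
}

/*
 * Add every default the node does not already carry.
 * Returns the number of properties added, or -1 if the node is full.
 */
int node_fill_defaults(struct node *n,
                       const struct prop *defaults, size_t ndefaults)
{
    int added = 0;
    for (size_t i = 0; i < ndefaults; i++) {
        if (node_get(n, defaults[i].name)) {
            continue;               /* configuration wins over defaults */
        }
        if (n->nprops == MAX_PROPS) {
            return -1;
        }
        n->props[n->nprops++] = defaults[i];
        added++;
    }
    return added;
}
```

A block device node that only specifies "qemu,path" could get its
"compatible" string this way, while a "status" already set in the
configuration would not be overwritten.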

> > * Redundancy between node name and its device_type property.

Note that "device_type" may not mean what you think.  It describes
what methods the device supports within the OF client interface.  New
device trees that aren't linked to a full OF implementation with
client interface should generally omit device_type in most places
(there are a few special cases for compatibility with OSes that expect
device_type properties in certain places).

> > * Property "reg", which encodes address ranges, does so in terms of
> >   "cells": #address-cells 32-bit words (big endian) for the address,
> >   followed by #size-cells words for the size, where #address-cells and
> >   #size-cells are properties of the enclosing bus.  If this sounds
> >   like gibberish to you, well, that's my point.

#address-cells and #size-cells take a little getting used to, but
they're really not that bad.  They're just a way of representing the
fact that different buses have different sized address encodings.
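For readers who have not met the cell encoding before, a minimal sketch
in C of decoding one (address, size) pair from a raw "reg" property.
The function names read_cells() and decode_reg() are invented for this
example and are not libfdt API; the sketch also assumes at most two
cells per value, so the result fits in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine 'ncells' big-endian 32-bit cells into one 64-bit value. */
uint64_t read_cells(const uint8_t *p, unsigned ncells)
{
    uint64_t v = 0;
    assert(ncells <= 2);            /* limit of this sketch */
    for (unsigned i = 0; i < ncells * 4; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Decode the first (address, size) pair of a "reg" property.
 * address_cells and size_cells come from the enclosing bus node.
 * Returns 0 on success, -1 if the property is too short.
 */
int decode_reg(const uint8_t *reg, size_t len,
               unsigned address_cells, unsigned size_cells,
               uint64_t *addr, uint64_t *size)
{
    size_t need = 4u * ((size_t)address_cells + size_cells);
    if (len < need) {
        return -1;
    }
    *addr = read_cells(reg, address_cells);
    *size = read_cells(reg + 4u * address_cells, size_cells);
    return 0;
}
```

On a bus with #address-cells = 2 and #size-cells = 1, the twelve bytes
00 00 00 01 00 00 00 00 00 00 10 00 decode to address 0x100000000 and
size 0x1000.  A bus with 32-bit addresses would use one address cell
and the same decoder.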

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-11 19:56     ` Anthony Liguori
@ 2009-02-12 10:25       ` Markus Armbruster
  0 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-12 10:25 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori <anthony@codemonkey.ws> writes:

> Blue Swirl wrote:
>> On 2/11/09, Anthony Liguori <anthony@codemonkey.ws> wrote:
>>   
>>> I think your approach is pretty sound.  A few observations:
>>>
>>>  1) obviously need to eliminate the code duplication

Obviously.

>>>  2) the new code should fit with the rest of QEMU stylistically

Expand tabs.  What else?

>>>  3) I'd prefer incremental vs. perfect so let's try to do as much
>>> refactoring that will be required before actually going the full 9 yards and
>>> implementing the config file.
>>>
>>>  4) we don't have to solve all problems all at once as long as we don't
>>> regress existing features

Exactly.

>> I'd still want to see if a FDT based solution is possible before
>> taking a homebrew version.
>>   
>
> I think I mentioned earlier that I am heavily biased toward an FDT
> solution.  What I'm suggesting though is that we can do some of the
> required cleanup (like device refactoring) before introducing any of
> the tree stuff.

I proposed to start with a (hardcoded) tree, because then a lot of tasks
become independent.  And the work to put device code behind an abstract
device interface happens in the right context from the start: driven by
tree-structured configuration.  That reduces the risk of us solving a
similar, but different problem than the one we actually have.

Of course, that sets us up for replacing the actual tree data structure.
Maybe I'm naive, but how hard could that be?  It's just a decorated
tree.  A bunch of identifiers change, and a few nodes get shuffled
around.
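To show how little is meant by "decorated tree", here is a minimal
sketch in C: named nodes, each with a list of named properties.  This
is not the code from the prototype patch; the type and function names
(struct tree, tree_new(), tree_put_prop(), tree_get_prop(),
tree_child()) are hypothetical, property values are plain strings for
simplicity, and nothing is ever freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct prop {
    char *name;
    char *value;
    struct prop *next;
};

struct tree {
    char *name;
    struct prop *props;      /* the "decoration" */
    struct tree *children;   /* first child */
    struct tree *sibling;    /* next child of the same parent */
};

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (!d) {
        abort();
    }
    memcpy(d, s, n);
    return d;
}

/* Create a node; parent may be NULL for the root. */
struct tree *tree_new(struct tree *parent, const char *name)
{
    struct tree *t = calloc(1, sizeof(*t));
    if (!t) {
        abort();
    }
    t->name = dup_str(name);
    if (parent) {
        t->sibling = parent->children;
        parent->children = t;
    }
    return t;
}

/* Set a property, replacing an existing value of the same name. */
void tree_put_prop(struct tree *t, const char *name, const char *value)
{
    struct prop *p;

    assert(t != NULL);
    for (p = t->props; p; p = p->next) {
        if (strcmp(p->name, name) == 0) {
            free(p->value);
            p->value = dup_str(value);
            return;
        }
    }
    p = calloc(1, sizeof(*p));
    if (!p) {
        abort();
    }
    p->name = dup_str(name);
    p->value = dup_str(value);
    p->next = t->props;
    t->props = p;
}

/* Return a property's value, or NULL if the node lacks it. */
const char *tree_get_prop(const struct tree *t, const char *name)
{
    for (const struct prop *p = t->props; p; p = p->next) {
        if (strcmp(p->name, name) == 0) {
            return p->value;
        }
    }
    return NULL;
}

/* Find a direct child by name, or NULL. */
struct tree *tree_child(const struct tree *t, const char *name)
{
    for (struct tree *c = t->children; c; c = c->sibling) {
        if (strcmp(c->name, name) == 0) {
            return c;
        }
    }
    return NULL;
}
```

Swapping this for an FDT-backed representation would mostly mean
reimplementing these four operations on top of the other data
structure, which is why replacing the tree later looks cheap.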

If you don't want to risk that, I can try to make some progress on the
abstract device interface without having a tree.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [RFC] Machine description as data
  2009-02-11 18:50     ` Hollis Blanchard
@ 2009-02-12 10:26       ` Markus Armbruster
  -1 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-12 10:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss

Hollis Blanchard <hollisb@us.ibm.com> writes:

> On Wed, 2009-02-11 at 16:40 +0100, Markus Armbruster wrote:
>> Sorry for the length of this memo.  I tried to make it as concise as I
>> could.  And there's working mock-up source code to go with it.
[...]
>> Appendix: Linux device trees
>> ----------------------------
>> 
>> This appendix is probably only of interest to some of you, feel free to
>> skip.
>> 
>> The IEEE 1275 Open Firmware Device Tree solves a somewhat similar
>> problem, namely to communicate environmental information (hardware and
>> configuration) from firmware to operating system.  It's chiefly used on
>> PowerPCs.  The OS calls Open Firmware to query the device tree.
>> 
>> Linux turns the Open Firmware device tree API into a data format.
>> Actually two: the DT blob format is a binary data structure, and the
>> DT source format is human-readable text.  The device tree compiler
>> "dtc" can convert between the two.
>> 
>> We already have a bit of code dealing with this, in device_tree.c.
>> 
>> I briefly examined the DT source format and the tree structure it
>> describes for the purpose of QEMU configuration.  I decided against
>> using it in my prototype because I found it awfully low-level and
>> verbose for that purpose (I'm sure it serves the purpose it was designed
>> for just fine).  Issues include:
>> 
>> * Since the DT is designed for booting kernels, not configuring QEMU,
>>   there's information that has no place in QEMU configuration, and
>>   required QEMU configuration isn't there.
>
> What's needed is a "binding" in IEEE1275-speak: a document that
> describes qemu-specific nodes/properties and how they are to be
> interpreted.
>
> As an example, you could require that block devices contain properties
> named "qemu,path", "qemu,backend", etc.
>
>> * Redundancy between node name and its device_type property.
>> 
>> * Property "reg", which encodes address ranges, does so in terms of
>>   "cells": #address-cells 32-bit words (big endian) for the address,
>>   followed by #size-cells words for the size, where #address-cells and
>>   #size-cells are properties of the enclosing bus.  If this sounds
>>   like gibberish to you, well, that's my point.
>
> I'm CCing devicetree-discuss for broader discussion.
>
> I won't say IEEE1275 is perfect, but IMHO it would be pretty silly to
> reinvent all the design and infrastructure for a similar-but-different
> device tree.
>
> [Patch snipped]

I'm not at all opposed to adapting FDT for QEMU use.  My patch is a
prototype, and I'm prepared to throw away some or all of it.

To get this thing started, I wanted working code to demonstrate what I'm
talking about.  If I had dug deeper into FDTs first, we would not be
talking now.

The task I outlined in my memo involves much more than just coming up
with a device tree data structure.  That data structure is to me one
detail among many, and a much less hairy one than most others.  It
certainly was for the prototype.

If I read the comments correctly (all comments, not just this one), the
only real issue with my proposal is you'd rather use FDT for the config
tree.  I don't mind, except I don't know enough about that stuff to do
it all by myself, at least not in a reasonable time frame.  I think I
understand the concepts, can read .dts files with some head-scratching,
and I could perhaps even write one if I sacrificed a chicken or two.
Designing a binding, however, feels well above my level of
(in)competence.

So, to make FDT happen, I need help.  Specifically:

* Point me to the FDT code I'm supposed to integrate.  I'm looking for
  basic decorated tree stuff: create trees, traverse them, get and put
  properties, add and delete nodes, read and write them as plain,
  human-readable text.

* Provide an example tree describing a bare-bones PC, like the one in my
  prototype: CPU, RAM, BIOS, PIC, APIC, IOAPIC, PIT, DMA, UART, parallel
  port, floppy controller, CMOS & RTC, a20 gate (port 92) and other
  miscellaneous I/O ports, i440fx, PIIX3 (ISA bridge, IDE, USB, ACPI),
  Cirrus VGA with BIOS, some PCI NIC.  This gives us all an idea of the
  tree structure.  Morphing that into something suitable for QEMU
  configuration shouldn't be too hard then, just an exercise in
  redecorating the tree.

* Advice as we go.

Volunteers?

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [RFC] Machine description as data
  2009-02-12  4:01         ` David Gibson
@ 2009-02-12 10:26           ` Markus Armbruster
  -1 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-12 10:26 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: devicetree-discuss, qemu-devel

David Gibson <dwg@au1.ibm.com> writes:

> On Wed, Feb 11, 2009 at 12:50:28PM -0600, Hollis Blanchard wrote:
>> On Wed, 2009-02-11 at 16:40 +0100, Markus Armbruster wrote:
> [snip]
>> > I briefly examined the DT source format and the tree structure it
>> > describes for the purpose of QEMU configuration.  I decided against
>> > using it in my prototype because I found it awfully low-level and
>> > verbose for that purpose (I'm sure it serves the purpose it was designed
>> > for just fine).  Issues include:
>> > 
>> > * Since the DT is designed for booting kernels, not configuring QEMU,
>> >   there's information that has no place in QEMU configuration, and
>> >   required QEMU configuration isn't there.
>> 
>> What's needed is a "binding" in IEEE1275-speak: a document that
>> describes qemu-specific nodes/properties and how they are to be
>> interpreted.
>> 
>> As an example, you could require that block devices contain properties
>> named "qemu,path", "qemu,backend", etc.
>
> Yes, it shouldn't be hard to annotate an IEEE1275 style tree with
> extra information for qemu's use.

I don't feel up to that task, because I'm not really familiar with
IEEE1275.  Could you help out?

>                                    As for the other direction, in some
> cases it may be appropriate for qemu's device tree code to fill in
> missing device tree properties, based on what the device emulation
> code knows about itself.

Agreed.  Configuration should only contain what is actually
configurable.  Anything else that is needed by a consumer of the tree
should be filled in automatically.

>> > * Redundancy between node name and its device_type property.
>
> Note that "device_type" may not mean what you think.  It describes
> what methods the device supports within the OF client interface.  New
> device trees that aren't linked to a full OF implementation with
> client interface should generally omit device_type in most places
> (there are a few special cases for compatibility with OSes that expect
> device_type properties in certain places).

I guess the ignorance I mentioned shows ;)

>> > * Property "reg", which encodes address ranges, does so in terms of
>> >   "cells": #address-cells 32-bit words (big endian) for the address,
>> >   followed by #size-cells words for the size, where #address-cells and
>> >   #size-cells are properties of the enclosing bus.  If this sounds
>> >   like gibberish to you, well, that's my point.
>
> #address-cells and #size-cells take a little getting used to, but
> it's really not that bad.  It's just a way of representing the fact
> that different busses have different sized address encodings.
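
To make the cell encoding discussed above concrete, here is a minimal
sketch of decoding one (address, size) pair from a raw "reg" property.
The helper names are hypothetical and are not part of libfdt or of
QEMU's device_tree.c; the sketch assumes at most two cells per field and
that the caller has already looked up #address-cells and #size-cells on
the enclosing bus node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit cell from a raw property buffer. */
static uint32_t cell_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Fold ncells consecutive cells into one integer (0..2 cells only). */
static uint64_t fold_cells(const uint8_t *p, unsigned ncells)
{
    uint64_t v = 0;

    assert(ncells <= 2);
    for (unsigned i = 0; i < ncells; i++) {
        v = (v << 32) | cell_be32(p + 4 * i);
    }
    return v;
}

/*
 * Decode the first (address, size) pair of a "reg" property.
 * addr_cells and size_cells come from the enclosing bus node.
 * Returns 0 on success, -1 if the buffer is too short.
 */
static int decode_reg(const uint8_t *reg, size_t len,
                      unsigned addr_cells, unsigned size_cells,
                      uint64_t *addr, uint64_t *size)
{
    if (len < 4u * (addr_cells + size_cells)) {
        return -1;
    }
    *addr = fold_cells(reg, addr_cells);
    *size = fold_cells(reg + 4 * addr_cells, size_cells);
    return 0;
}
```

With one address cell and one size cell, the eight bytes
00 00 03 f8 00 00 00 08 decode to address 0x3f8 and size 8.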

I didn't mean to say they are a bad idea for FDTs, just that they're on
an awkward level of abstraction for QEMU configuration.  There, I'd
rather express a PCI address as "02:01.0" than as <0x00000220>.
Translating text to binary is the machine's job, not the user's.
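
A minimal sketch of that translation, assuming the usual bus:dev.fn
notation with hex bus and device numbers.  The function name is
hypothetical, and the packing used here (bus in the high byte,
dev << 3 | fn in the low byte) is the conventional bus/devfn form, not
the cell layout of the Open Firmware PCI binding, so it is not meant to
reproduce the <0x00000220> value quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "bus:dev.fn" (hex bus and device, function 0..7), e.g. "02:01.0".
 * On success, store bus in the high byte and devfn (dev << 3 | fn) in the
 * low byte of *bdf and return 0.  Return -1 on malformed input.
 */
static int parse_pci_addr(const char *s, uint16_t *bdf)
{
    unsigned bus, dev, fn;
    char trailing;

    assert(s && bdf);
    /* Exactly three fields must match; a fourth means trailing junk. */
    if (sscanf(s, "%x:%x.%u%c", &bus, &dev, &fn, &trailing) != 3) {
        return -1;
    }
    if (bus > 0xff || dev > 0x1f || fn > 7) {
        return -1;
    }
    *bdf = (uint16_t)((bus << 8) | (dev << 3) | fn);
    return 0;
}
```

The point of the design is that range checks and packing live in one
place, so a configuration file can carry the readable form only.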

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [RFC] Machine description as data
  2009-02-12 10:26       ` [Qemu-devel] " Markus Armbruster
@ 2009-02-12 12:36         ` Carl-Daniel Hailfinger
  -1 siblings, 0 replies; 146+ messages in thread
From: Carl-Daniel Hailfinger @ 2009-02-12 12:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss

On 12.02.2009 11:26, Markus Armbruster wrote:
> Hollis Blanchard <hollisb@us.ibm.com> writes:
>   
>> I won't say IEEE1275 is perfect, but IMHO it would be pretty silly to
>> reinvent all the design and infrastructure for a similar-but-different
>> device tree.
>>
>> [Patch snipped]
>>     
>
> I'm not at all opposed to adapting FDT for QEMU use.  My patch is a
> prototype, and I'm prepared to throw away some or all of it.
> [...]
> If I read the comments correctly (all comments, not just this one), the
> only real issue with my proposal is you'd rather use FDT for the config
> tree.  I don't mind, except I don't know enough about that stuff to do
> it all by myself, at least not in a reasonable time frame.  I think I
> understand the concepts, can read .dts files with some head-scratching,
> and I could perhaps even write one if I sacrificed a chicken or two.
> Designing a binding, however, feels well above my level of
> (in)competence.
>
> So, to make FDT happen, I need help.  Specifically:
>
> * Provide an example tree describing a bare-bones PC, like the one in my
>   prototype: CPU, RAM, BIOS, PIC, APIC, IOAPIC, PIT, DMA, UART, parallel
>   port, floppy controller, CMOS & RTC, a20 gate (port 92) and other
>   miscellaneous I/O ports, i440fx, PIIX3 (ISA bridge, IDE, USB, ACPI),
>   Cirrus VGA with BIOS, some PCI NIC.  This gives us all an idea of the
>   tree structure.  Morphing that into something suitable for QEMU
>   configuration shouldn't be too hard then, just an exercise in
>   redecorating the tree.
>   

Once you start modeling any recent AMD x86_64 hardware accurately, it
starts to hurt.
The HyperTransport link topology is needed for correct setup of HT
links, but HT appears as part of virtual PCI config interfaces. That
would be OK, but the topology of the PCI config interfaces of the HT
links has almost nothing to do with the real HT topology.

You don't get a tree or even a DAG. It's just a digraph with the
occasional cycle. And you have to annotate some edges as well, not just
the vertices. I have no idea whether the FDT can represent such graphs.
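
To illustrate the shape of the problem only, here is a small sketch that
keeps annotated, directed links in a flat table next to whatever tree
holds the devices, and checks whether those links form a cycle.  All
names are hypothetical, nodes are plain indices, and nothing here says
how (or whether) an FDT would encode such links.

```c
#include <assert.h>
#include <stddef.h>

/* One annotated, directed link between two devices (node indices). */
struct link {
    int from;
    int to;
    const char *label;   /* edge annotation; content is illustrative */
};

enum { WHITE, GREY, BLACK };   /* unvisited, on the DFS stack, finished */

/* Depth-first visit; returns 1 as soon as a back edge (cycle) is found. */
static int visit(int n, const struct link *links, size_t nlinks, int *colour)
{
    colour[n] = GREY;
    for (size_t i = 0; i < nlinks; i++) {
        if (links[i].from != n) {
            continue;
        }
        int next = links[i].to;
        if (colour[next] == GREY) {
            return 1;
        }
        if (colour[next] == WHITE && visit(next, links, nlinks, colour)) {
            return 1;
        }
    }
    colour[n] = BLACK;
    return 0;
}

/*
 * Return 1 if the links contain a directed cycle, else 0.
 * Assumes nnodes <= 64 and every link index is in [0, nnodes).
 */
static int has_cycle(const struct link *links, size_t nlinks, int nnodes)
{
    int colour[64] = { WHITE };

    assert(nnodes >= 0 && nnodes <= 64);
    for (int n = 0; n < nnodes; n++) {
        if (colour[n] == WHITE && visit(n, links, nlinks, colour)) {
            return 1;
        }
    }
    return 0;
}
```

A plain tree can never make has_cycle() return 1, which is one way to
see why such link tables would have to live beside the tree rather than
in its parent/child structure.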

Regards,
Carl-Daniel

-- 
http://www.hailfinger.org/

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [RFC] Machine description as data
  2009-02-12 10:26           ` [Qemu-devel] " Markus Armbruster
@ 2009-02-12 12:49             ` Carl-Daniel Hailfinger
  -1 siblings, 0 replies; 146+ messages in thread
From: Carl-Daniel Hailfinger @ 2009-02-12 12:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss, Hollis Blanchard

On 12.02.2009 11:26, Markus Armbruster wrote:
> David Gibson <dwg@au1.ibm.com> writes:
>
>   
>> On Wed, Feb 11, 2009 at 12:50:28PM -0600, Hollis Blanchard wrote:
>>     
>>> On Wed, 2009-02-11 at 16:40 +0100, Markus Armbruster wrote:
>>>       
>>>> I briefly examined the DT source format and the tree structure it
>>>> describes for the purpose of QEMU configuration.  I decided against
>>>> using it in my prototype because I found it awfully low-level and
>>>> verbose for that purpose (I'm sure it serves the purpose it was designed
>>>> for just fine).  Issues include:
>>>>
>>>> * Since the DT is designed for booting kernels, not configuring QEMU,
>>>>   there's information that has no place in QEMU configuration, and
>>>>   required QEMU configuration isn't there.
>>>>         
>>> What's needed is a "binding" in IEEE1275-speak: a document that
>>> describes qemu-specific nodes/properties and how they are to be
>>> interpreted.
>>>
>>> As an example, you could require that block devices contain properties
>>> named "qemu,path", "qemu,backend", etc.
>>>> * Property "reg", which encodes address ranges, does so in terms of
>>>>   "cells": #address-cells 32-bit words (big endian) for the address,
>>>>   followed by #size-cells words for the size, where #address-cells and
>>>>   #size-cells are properties of the enclosing bus.  If this sounds
>>>>   like gibberish to you, well, that's my point.
>>>>         
>> #address-cells and #size-cells take a little getting used to, but
>> it's really not that bad.  It's just a way of representing the fact
>> that different busses have different sized address encodings.
>>     
>
> I didn't mean to say they are a bad idea for FDTs, just that they're on
> an awkward level of abstraction for QEMU configuration.  There, I'd
> rather express a PCI address as "02:01.0" than as <0x00000220>.
> Translating text to binary is the machine's job, not the user's.
>   

Coreboot v3 is using some device tree variant which is IMHO a bit more
user friendly. The tree below is incomplete (for example, it leaves out
the PCI bus number and assumes that it is zero by default), but you
surely get the idea.

/{
    mainboard_vendor = "Gigabyte";
    mainboard_name = "M57SLI";
    cpus { };
    apic@0 {
    };
    domain@0 {
        pci@0,0 { /* MCP55 RAM? */ 
        };
        pci@1,0 {
            /config/("southbridge/nvidia/mcp55/lpc.dts");
            ioport@2e {
                /config/("superio/ite/it8716f/dts");
                com1enable = "1";
                ecenable = "1";
                kbenable = "1";
                mouseenable = "1";
                gpioenable = "1";
            };
        };
        pci@1,1 { /* smbus */
        };
        pci@2,0 { /* usb */
        };
        pci@2,1 { /* usb */
        };
        pci@4,0 {
            /config/("southbridge/nvidia/mcp55/ide.dts");
            ide0_enable = "1";
        };
        pci@5,0 {
            /config/("southbridge/nvidia/mcp55/sata.dts");
            sata0_enable = "1";
        };
        pci@5,1 {
            /config/("southbridge/nvidia/mcp55/sata.dts");
            sata1_enable = "1";
        };
        pci@6,0 { /* PCI */
        };
        pci@6,1 {
            /*/config/("southbridge/nvidia/mcp55/audio.dts"); */
        };
        pci@8,0 {
        /*
            /config/("southbridge/nvidia/mcp55/nic.dts");
            mac_eeprom_smbus = "3";
            mac_eeprom_addr = "0x51";
        */
        };
        pci@f,0 { /* PCIe */ 
        };
        pci@18,0 {
            /config/("northbridge/amd/k8/pci");
        };
        pci@18,1 {};
        pci@18,2 {};
        pci@18,3 {
            /config/("northbridge/amd/k8/mcf3");
        };
    };
};


The /config/("...") statements are basically comparable to #include
"..." in C.

While the syntax pci@dev,fn is different to the bus:dev.fn you're used
to, it's IMHO a lot more readable than <0x00000220>.
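
As a rough sketch of how a consumer might read such names, the function
below splits "pci@dev,fn" into its two numbers.  The function name is
hypothetical and this is not coreboot code; both fields are assumed to
be hex, which is what names like pci@f,0 and pci@18,3 in the tree above
suggest.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Split a node name like "pci@18,3" into device and function numbers.
 * Both fields are read as hex.  Returns 0 on success, -1 if the name
 * does not have the "pci@dev,fn" shape or a field is out of range.
 * On failure *dev and *fn may have been overwritten.
 */
static int parse_pci_node_name(const char *name, unsigned *dev, unsigned *fn)
{
    char trailing;

    assert(name && dev && fn);
    if (strncmp(name, "pci@", 4) != 0) {
        return -1;
    }
    /* Exactly two fields must match; a third means trailing junk. */
    if (sscanf(name + 4, "%x,%x%c", dev, fn, &trailing) != 2) {
        return -1;
    }
    if (*dev > 0x1f || *fn > 7) {
        return -1;
    }
    return 0;
}
```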

Regards,
Carl-Daniel

-- 
http://www.hailfinger.org/

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 10:26       ` [Qemu-devel] " Markus Armbruster
  (?)
  (?)
@ 2009-02-12 16:07       ` Paul Brook
  2009-02-12 17:17         ` Blue Swirl
  2009-02-12 18:09         ` Marcelo Tosatti
  -1 siblings, 2 replies; 146+ messages in thread
From: Paul Brook @ 2009-02-12 16:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 865 bytes --]

> * Point me to the FDT code I'm supposed to integrate.  I'm looking for
>   basic decorated tree stuff: create trees, traverse them, get and put
>   properties, add and delete nodes, read and write them as plain,
>   human-readable text.

I've been threatening to merge my FDT code for a while, but haven't got round
to it.  I've attached a drop of my current code, along with a bunch of
example devices (I haven't yet converted any of the current machines).
The basic strategy is that devices should only have to deal with this interface,
and not with the config structures or the rest of qemu directly. Register
windows and interrupts are converted, but things like DMA accesses still use
the old interfaces.

Most of the devices (e.g. the serial port) support both new and old init 
methods. A few (e.g. nand controller) are pure devtree based devices.

Paul

[-- Attachment #2: devtree.tar.bz2 --]
[-- Type: application/x-tbz, Size: 14296 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [RFC] Machine description as data
  2009-02-12 12:49             ` [Qemu-devel] " Carl-Daniel Hailfinger
@ 2009-02-12 16:46               ` M. Warner Losh
  -1 siblings, 0 replies; 146+ messages in thread
From: M. Warner Losh @ 2009-02-12 16:46 UTC (permalink / raw)
  To: qemu-devel, c-d.hailfinger.devel.2006; +Cc: devicetree-discuss, hollisb

In message: <49941AE3.1000806@gmx.net>
Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net> writes:
: > I didn't mean to say they are a bad idea for FDTs, just that they're on
: > an awkward level of abstraction for QEMU configuration.  There, I'd
: > rather express a PCI address as "02:01.0" than as <0x00000220>.
: > Translating text to binary is the machine's job, not the user's.
: 
: Coreboot v3 is using some device tree variant which is IMHO a bit more
: user friendly. The tree below is incomplete (for example, it leaves out
: the PCI bus number and assumes that it is zero by default), but you
: surely get the idea.
: 
: /{
:     mainboard_vendor = "Gigabyte";
:     mainboard_name = "M57SLI";
:     cpus { };
:     apic@0 {
:     };
:     domain@0 {
:         pci@0,0 { /* MCP55 RAM? */ 
:         };
:         pci@1,0 {
:             /config/("southbridge/nvidia/mcp55/lpc.dts");
:             ioport@2e {

<etc>

I'd like to make a couple of comments here.

One, I dislike the DTS syntax.  It is hard to learn to read, and I
always have to have the manual in my hands to read it.

However, every board that's being produced for powerpc has the DTB at
least available.  It has to be, or (recent?) Linux kernels flat out
won't work.  This suggests that it might be a good idea to look at
this format.

There's DTS and DTB.  One is the source, the other is the binary
created from the source.  I'd recommend that qemu actually use the DTB
rather than the DTS to implement things.  This way one could have a
nicer syntax like the above and generate the DTB, or one could use the
DTS provided by a vendor if there was a more specific board they
wanted qemu to emulate.
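
If the binary form is what gets consumed, the first step is usually a
sanity check on the blob.  The sketch below, with hypothetical helper
names, checks only the first two header fields: the big-endian magic
number 0xd00dfeed and the declared total size.  A real loader would
validate the rest of the header as well, or hand the blob to libfdt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedu   /* first header word of a flattened tree */

/* Read a big-endian 32-bit value. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Return the blob's declared total size, or 0 if the buffer does not
 * start with the magic number or is shorter than the declared size.
 */
static uint32_t dtb_total_size(const uint8_t *blob, size_t len)
{
    uint32_t total;

    assert(blob);
    if (len < 8 || load_be32(blob) != FDT_MAGIC) {
        return 0;
    }
    total = load_be32(blob + 4);
    if (total < 8 || total > len) {
        return 0;
    }
    return total;
}
```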

Carl-Daniel, how does coreboot v3 generate the data that's passed to
the kernel?

Warner

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 16:07       ` Paul Brook
@ 2009-02-12 17:17         ` Blue Swirl
  2009-02-12 18:09         ` Marcelo Tosatti
  1 sibling, 0 replies; 146+ messages in thread
From: Blue Swirl @ 2009-02-12 17:17 UTC (permalink / raw)
  To: qemu-devel

On 2/12/09, Paul Brook <paul@codesourcery.com> wrote:
> > * Point me to the FDT code I'm supposed to integrate.  I'm looking for
>  >   basic decorated tree stuff: create trees, traverse them, get and put
>  >   properties, add and delete nodes, read and write them as plain,
>  >   human-readable text.
>
>
> I've been threatening to merge my FDT code for a while, but haven't got round
>  to it.  I've attached a drop of my current code, along with a bunch of
>  example devices (I haven't yet converted any of the current machines).
>  The basic strategy is that devices should only have to deal with this interface,
>  and not with the config structures or the rest of qemu directly. Register
>  windows and interrupts are converted, but things like DMA accesses still use
>  the old interfaces.
>
>  Most of the devices (e.g. the serial port) support both new and old init
>  methods. A few (e.g. nand controller) are pure devtree based devices.

Looks good to me so far. It will be a big job to convert all devices, though.

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 10:26           ` [Qemu-devel] " Markus Armbruster
@ 2009-02-12 17:52               ` Hollis Blanchard
  -1 siblings, 0 replies; 146+ messages in thread
From: Hollis Blanchard @ 2009-02-12 17:52 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A

On Thu, 2009-02-12 at 11:26 +0100, Markus Armbruster wrote:
>  David Gibson <dwg-8fk3Idey6ehBDgjK7y7TUQ@public.gmane.org> writes:
> 
> > On Wed, Feb 11, 2009 at 12:50:28PM -0600, Hollis Blanchard wrote:
> >> On Wed, 2009-02-11 at 16:40 +0100, Markus Armbruster wrote:
> > [snip]
> >> > I briefly examined the DT source format and the tree structure it
> >> > describes for the purpose of QEMU configuration.  I decided against
> >> > using it in my prototype because I found it awfully low-level and
> >> > verbose for that purpose (I'm sure it serves the purpose it was designed
> >> > for just fine).  Issues include:
> >> > 
> >> > * Since the DT is designed for booting kernels, not configuring QEMU,
> >> >   there's information that has no place in QEMU configuration, and
> >> >   required QEMU configuration isn't there.
> >> 
> >> What's needed is a "binding" in IEEE1275-speak: a document that
> >> describes qemu-specific nodes/properties and how they are to be
> >> interpreted.
> >> 
> >> As an example, you could require that block devices contain properties
> >> named "qemu,path", "qemu,backend", etc.
> >
> > Yes, it shouldn't be hard to annotate an IEEE1275 style tree with
> > extra information for qemu's use.
> 
> I don't feel up to that task, because I'm not really familiar with
> IEEE1275.  Could you help out?

I'm not really a "language lawyer" for device trees, but I can help.

FWIW, I was imagining (from a PowerPC point of view) that a strict
subset of the device tree interpreted by qemu would be passed into the
guest. In other words, once qemu is done with it, it would strip every
property prefixed with "qemu," and copy the result into guest memory.
PowerPC kernels require this data structure, and even when firmware runs
in the guest, you still need to tell the firmware what the system layout
is, and the device tree is an obvious candidate...

For x86, maybe it doesn't make sense to have in-guest BIOS split a
qemu-provided device tree into all the nasty BIOS data structures, but I
just wanted to give you an idea of how this could be used on multiple
architectures.
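
The "strip every property prefixed with qemu," step described above can be
sketched in a few lines of C.  This is only an illustration under stated
assumptions: struct prop, struct node and strip_qemu_props() are hypothetical
names invented for this sketch, not types or functions from QEMU, libfdt, or
either prototype in this thread, and copying the result into guest memory is
left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory tree; not QEMU's or libfdt's actual types. */
struct prop {
    const char *name;
    const char *value;
    struct prop *next;
};

struct node {
    const char *name;
    struct prop *props;     /* singly linked list of properties */
    struct node *children;  /* first child */
    struct node *sibling;   /* next sibling */
};

/* Return nonzero if the property name carries the "qemu," prefix. */
static int is_qemu_private(const char *name)
{
    return strncmp(name, "qemu,", 5) == 0;
}

/*
 * Unlink every "qemu,"-prefixed property from n and all its descendants.
 * Returns the number of properties removed.  Nothing is freed here; the
 * caller is assumed to own the storage.
 */
static int strip_qemu_props(struct node *n)
{
    int removed = 0;
    struct prop **pp;
    struct node *c;

    assert(n != NULL);
    pp = &n->props;
    while (*pp) {
        if (is_qemu_private((*pp)->name)) {
            *pp = (*pp)->next;      /* unlink, keep scanning from here */
            removed++;
        } else {
            pp = &(*pp)->next;
        }
    }
    for (c = n->children; c; c = c->sibling)
        removed += strip_qemu_props(c);
    return removed;
}
```

What remains after the call is the guest-visible subset of the tree; whether
that subset is complete enough for a real kernel is a separate question.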

-- 
Hollis Blanchard
IBM Linux Technology Center

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 16:07       ` Paul Brook
  2009-02-12 17:17         ` Blue Swirl
@ 2009-02-12 18:09         ` Marcelo Tosatti
  1 sibling, 0 replies; 146+ messages in thread
From: Marcelo Tosatti @ 2009-02-12 18:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Markus Armbruster

On Thu, Feb 12, 2009 at 04:07:39PM +0000, Paul Brook wrote:
> > * Point me to the FDT code I'm supposed to integrate.  I'm looking for
> >   basic decorated tree stuff: create trees, traverse them, get and put
> >   properties, add and delete nodes, read and write them as plain,
> >   human-readable text.
> 
> I've been threatening to merge my FDT code for a while, but haven't got round
> to it.  I've attached a drop of my current code, along with a bunch of
> example devices (I haven't yet converted any of the current machines).
> The basic strategy is the devices should only have to deal with this interface,
> and not with the config structures or the rest of qemu directly. Register
> windows and interrupts are converted, but things like DMA accesses still use
> the old interfaces.
> 
> Most of the devices (e.g. the serial port) support both new and old init 
> methods. A few (e.g. nand controller) are pure devtree based devices.
> 
> Paul

Ok, so a few questions:

- Should host side parameters live inside particular device nodes, as
  properties, and if so, to what extent? For example (from an early
  brainstorm Markus wrote):

/ {
        model = "pc";                   // -M
        cpus {
                model = "coreduo";      // -cpu
                smp = 2;                // -smp
        };
        // PCI host bridge @domain:bus
        pci@0000:00 {
                // devices on this bus @device.function
                device@1 {              // PIIX3
                        model = "PIIX3";
                        ata {
                                device@0 {
                                        device_type = "disk";
                                        drive {
                                                format = "raw";
                                                file = "/var/lib/libvirt/images/hd.img";
                                                cache = "none";
                                        };
                                                
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There is a mapping between QEMUDevice <-> disk (1:1). See
dt_piix3_config/dt_drive_config in Markus's code. Oh:

    if (dc->has_chardev) {
        int n;
        propstr = fdt_getprop_string(dt, node, "chardev");
        if (propstr) {
            i = sscanf(propstr, "serial%d", &n);
            if (i == 1 && n >= 0 && n < MAX_SERIAL_PORTS)
                d->chardev = serial_hds[n];
        }
    }
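
The excerpt above relies on surrounding declarations (dc, d, i, serial_hds).
A self-contained sketch of just the "serialN" name parsing might look like
the following.  The helper name serial_index_from_prop() and the value of
MAX_SERIAL_PORTS are assumptions made for this sketch, not taken from the
attached code.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_SERIAL_PORTS 4   /* assumed value for this sketch */

/*
 * Map a "chardev" property string of the form "serialN" to the index N.
 * Returns -1 if the string is missing, does not start with "serial"
 * followed by a number, or names an index outside 0..MAX_SERIAL_PORTS-1.
 * Like the excerpt, this accepts trailing characters after the number.
 */
static int serial_index_from_prop(const char *propstr)
{
    int n;

    if (!propstr)
        return -1;
    if (sscanf(propstr, "serial%d", &n) != 1)
        return -1;
    if (n < 0 || n >= MAX_SERIAL_PORTS)
        return -1;
    return n;
}
```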

Can you show a working example tree?

OTOH vlans and nics have a QEMUDevice -> vlan mapping (vlans are not
part of the emulated hardware device tree). Where do they belong?

For starters, for i386, one can append command line parameters to a
static tree containing basic PIIX hw, then have the board code (pc.c)
use that.

But both prototypes look similar in essence. I can't find QEMUDevice
linked in a tree in your code though. It should be possible to, for
example, hot add a PCI device, link it in, and dump the updated tree
with a monitor command.
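
A rough sketch of what "link it in and dump the updated tree" could mean in
code, assuming a plain parent/child/sibling node structure.  struct dt_node,
dt_add_node() and dt_dump() are hypothetical names made up for this sketch;
neither prototype is known to use them, and a real monitor command would
print to the monitor rather than to a FILE.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node type for illustration; not from either prototype. */
struct dt_node {
    const char *name;               /* not copied; must outlive the node */
    struct dt_node *parent;
    struct dt_node *first_child;
    struct dt_node *next_sibling;
};

/*
 * Allocate a node and append it as the last child of parent.
 * Pass parent == NULL to create a root.  Returns NULL on allocation failure.
 */
static struct dt_node *dt_add_node(struct dt_node *parent, const char *name)
{
    struct dt_node *n = calloc(1, sizeof(*n));
    struct dt_node **link;

    if (!n)
        return NULL;
    n->name = name;
    n->parent = parent;
    if (parent) {
        /* Walk to the end of the sibling list so order is preserved. */
        for (link = &parent->first_child; *link; link = &(*link)->next_sibling)
            ;
        *link = n;
    }
    return n;
}

/*
 * Print the subtree rooted at n, one node per line, indented four spaces
 * per level.  Returns the number of nodes printed.
 */
static int dt_dump(const struct dt_node *n, int depth, FILE *out)
{
    int count = 1;
    const struct dt_node *c;

    assert(n != NULL);
    fprintf(out, "%*s%s\n", depth * 4, "", n->name);
    for (c = n->first_child; c; c = c->next_sibling)
        count += dt_dump(c, depth + 1, out);
    return count;
}
```

Hot-adding a device then amounts to one dt_add_node() call under the bus
node, after which dt_dump() on the root shows the new entry.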

* Re: [RFC] Machine description as data
  2009-02-12 16:46               ` [Qemu-devel] " M. Warner Losh
@ 2009-02-12 18:29                 ` Markus Armbruster
  -1 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-12 18:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss, c-d.hailfinger.devel.2006, hollisb

"M. Warner Losh" <imp@bsdimp.com> writes:

> In message: <49941AE3.1000806@gmx.net>
> Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net> writes:
> : > I didn't mean to say they are a bad idea for FDTs, just that they're on
> : > an awkward level of abstraction for QEMU configuration.  There, I'd
> : > rather express a PCI address as "02:01.0" than as <0x00000220>.
> : > Translating text to binary is the machine's job, not the user's.
> : 
> : Coreboot v3 is using some device tree variant which is IMHO a bit more
> : user friendly. The tree below is incomplete (for example, it leaves out
> : the PCI bus number and assumes that it is zero by default), but you
> : surely get the idea.
> : 
> : /{
> :     mainboard_vendor = "Gigabyte";
> :     mainboard_name = "M57SLI";
> :     cpus { };
> :     apic@0 {
> :     };
> :     domain@0 {
> :         pci@0,0 { /* MCP55 RAM? */ 
> :         };
> :         pci@1,0 {
> :             /config/("southbridge/nvidia/mcp55/lpc.dts");
> :             ioport@2e {
>
> <etc>
>
> I'd like to make a couple of comments here.
>
> One, I dislike the DTS syntax.  It is hard to learn to read, and I
> always have to have the manual in my hands to read it.
>
> However, every board that's being produced for powerpc has the DTB at
> least available.  It has to be, or (recent?) Linux kernels flat out
> won't work.  This suggests that it might be a good idea to look at
> this format.
>
> There's DTS and DTB.  One is the source, the other is the binary
> created from the source.  I'd recommend that qemu actually use the DTB
> rather than the DTS to implement things.  This way one could have a
> nicer syntax like the above and generate the DTB, or one could use the
> DTS provided by a vendor if there was a more specific board they
> wanted qemu to emulate.

As far as I know, dtc can decompile DTB into DTS.

I'm not a fan of DTS syntax either, but if we choose FDT, then inventing
an alternative syntax seems rather pointless to me.

As to reading configuration in a binary format: let's not complicate
things more than we need.  It's just a decorated tree, folks.

[...]

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 17:52               ` Hollis Blanchard
@ 2009-02-12 18:53                   ` Markus Armbruster
  -1 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-12 18:53 UTC (permalink / raw)
  To: Hollis Blanchard
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A

Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:

> On Thu, 2009-02-12 at 11:26 +0100, Markus Armbruster wrote:
>>  David Gibson <dwg-8fk3Idey6ehBDgjK7y7TUQ@public.gmane.org> writes:
>> 
>> > On Wed, Feb 11, 2009 at 12:50:28PM -0600, Hollis Blanchard wrote:
>> >> On Wed, 2009-02-11 at 16:40 +0100, Markus Armbruster wrote:
>> > [snip]
>> >> > I briefly examined the DT source format and the tree structure it
>> >> > describes for the purpose of QEMU configuration.  I decided against
>> >> > using it in my prototype because I found it awfully low-level and
>> >> > verbose for that purpose (I'm sure it serves the purpose it was designed
>> >> > for just fine).  Issues include:
>> >> > 
>> >> > * Since the DT is designed for booting kernels, not configuring QEMU,
>> >> >   there's information that has no place in QEMU configuration, and
>> >> >   required QEMU configuration isn't there.
>> >> 
>> >> What's needed is a "binding" in IEEE1275-speak: a document that
>> >> describes qemu-specific nodes/properties and how they are to be
>> >> interpreted.
>> >> 
>> >> As an example, you could require that block devices contain properties
>> >> named "qemu,path", "qemu,backend", etc.
>> >
>> > Yes, it shouldn't be hard to annotate an IEEE1275 style tree with
>> > extra information for qemu's use.
>> 
>> I don't feel up to that task, because I'm not really familiar with
>> IEEE1275.  Could you help out?
>
> I'm not really a "language lawyer" for device trees, but I can help.

Appreciated!

> FWIW, I was imagining (from a PowerPC point of view) that a strict
> subset of the device tree interpreted by qemu would be passed into the
> guest. In other words, once qemu is done with it, it would strip every
> property prefixed with "qemu," and copy the result into guest memory.
> PowerPC kernels require this data structure, and even when firmware runs
> in the guest, you still need to tell the firmware what the system layout
> is, and the device tree is an obvious candidate...
>
> For x86, maybe it doesn't make sense to have in-guest BIOS split a
> qemu-provided device tree into all the nasty BIOS data structures, but I
> just wanted to give you an idea of how this could be used on multiple
> architectures.

We want a machine configuration: a tree describing configurable devices
and their configurable properties.

For PowerPC, we also want a machine description: a tree describing those
devices and properties that the kernel can't easily and safely probe.

Now, there will be some overlap, and to get the machine description, you
surely want to start with the machine configuration.  But you'll
certainly have to add information beyond configuration.  Just stripping
out "qemu," properties won't do, I fear, unless you're happy to put tons
of stuff in the configuration file that is not actually configurable.
Which would likely annoy its human users.  And what to do with it?  Just
pass it on?  Or verify it matches reality?  Wouldn't that work be better
spent on generating the additional information on the fly, for the
targets that need it?

Once again, I'm not opposed to using some FDT binding for QEMU
configuration.  Syntax is superficial anyway.  It's the tree that
matters.

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 18:53                   ` Markus Armbruster
@ 2009-02-12 19:33                       ` Mitch Bradley
  -1 siblings, 0 replies; 146+ messages in thread
From: Mitch Bradley @ 2009-02-12 19:33 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A, Hollis Blanchard

>
> We want a machine configuration: a tree describing configurable devices
> and their configurable properties.
>   


Regarding configurable devices in Open Firmware:

The baseline usage model for configurable devices was that the firmware 
is responsible for establishing a consistent system configuration, 
possibly based on user-modifiable variables in non-volatile storage.  It 
reports the actual configuration to the OS via the device tree.

For cases where the choice needs to be deferred until later, or perhaps 
changed dynamically, a device tree property reports the set of 
possibilities.  In cases where the firmware has already set up the 
devices, it reports the current choice via another property.

The device tree hierarchy serves as the "name space" framework for these 
properties.  Obviously, you need to specify the device for which the 
choice set applies.  The device tree is a coherent naming model for that 
purpose.

Obviously the hierarchical model has problems for highly-configurable 
chipsets in which a setting can result in a wholesale rearrangement of 
the overall connectivity, but it seems to me that board-design 
constraints usually make that a non-problem. The wiring on a given board 
generally forces the choice at that level, so the firmware for that 
board need not report that as a configurable choice.

> For PowerPC, we also want a machine description: a tree describing those
> devices and properties that the kernel can't easily and safely probe.
>   

The gist of the above sentence seems to presuppose that, if the kernel 
can probe, it should.  That's not the only way of thinking about the 
problem.  As a practical matter, the firmware usually needs to do a fair 
amount of probing too, in order to locate the console display and the 
boot devices.  In the process, the firmware usually discovers pretty 
much the entire machine configuration.  If the OS has to repeat the 
process from scratch, it slows down the boot process.  So the IEEE1275 
design supports the model where the firmware can do all the probing, 
handing off a complete system description to the OS.  The OS startup 
code can walk the tree and attach device drivers for what it finds, then 
arrange to handle insert/remove events from hot-pluggable buses.
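
The "walk the tree and attach device drivers" model can be sketched as a
depth-first traversal that matches each node against a driver table, with no
hardware probing at all.  Everything named here is an assumption for the
sketch: struct fw_node, struct driver, the table entries and
attach_drivers() are invented for illustration and do not reflect how any
particular OS or IEEE1275 client implements this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical firmware-provided node: a model string plus tree links. */
struct fw_node {
    const char *model;
    const char *driver;        /* set by attach_drivers(); NULL if no match */
    struct fw_node *child;     /* first child */
    struct fw_node *sibling;   /* next sibling */
};

struct driver {
    const char *model;         /* model string this driver handles */
    const char *name;
};

/* Illustrative driver table; the entries are made up for this sketch. */
static const struct driver driver_table[] = {
    { "ns16550", "serial" },
    { "PIIX3",   "piix3-ide" },
};

/* Look up a driver name by exact model match; NULL if none is known. */
static const char *find_driver(const char *model)
{
    size_t i;

    for (i = 0; i < sizeof(driver_table) / sizeof(driver_table[0]); i++)
        if (strcmp(driver_table[i].model, model) == 0)
            return driver_table[i].name;
    return NULL;
}

/*
 * Depth-first walk over the tree handed over by firmware.  Records the
 * matching driver in each node and returns how many nodes got one.
 */
static int attach_drivers(struct fw_node *n)
{
    int bound = 0;
    struct fw_node *c;

    assert(n != NULL);
    n->driver = find_driver(n->model);
    if (n->driver)
        bound++;
    for (c = n->child; c; c = c->sibling)
        bound += attach_drivers(c);
    return bound;
}
```

Nodes without a matching driver are simply left unbound, which is where an
OS could fall back to its own probing or wait for a hot-plug event.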

* Re: [RFC] Machine description as data
  2009-02-12 16:46               ` [Qemu-devel] " M. Warner Losh
@ 2009-02-12 23:35                 ` Carl-Daniel Hailfinger
  -1 siblings, 0 replies; 146+ messages in thread
From: Carl-Daniel Hailfinger @ 2009-02-12 23:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss, hollisb

On 12.02.2009 17:46, M. Warner Losh wrote:
> In message: <49941AE3.1000806@gmx.net>
> Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net> writes:
> : > I didn't mean to say they are a bad idea for FDTs, just that they're on
> : > an awkward level of abstraction for QEMU configuration.  There, I'd
> : > rather express a PCI address as "02:01.0" than as <0x00000220>.
> : > Translating text to binary is the machine's job, not the user's.
> : 
> : Coreboot v3 is using some device tree variant which is IMHO a bit more
> : user friendly. The tree below is incomplete (for example, it leaves out
> : the PCI bus number and assumes that it is zero by default), but you
> : surely get the idea.
> : 
> : /{
> :     mainboard_vendor = "Gigabyte";
> :     mainboard_name = "M57SLI";
> :     cpus { };
> :     apic@0 {
> :     };
> :     domain@0 {
> :         pci@0,0 { /* MCP55 RAM? */ 
> :         };
> :         pci@1,0 {
> :             /config/("southbridge/nvidia/mcp55/lpc.dts");
> :             ioport@2e {
>
> <etc>
>
> I'd like to make a couple of comments here.
>
> One, I dislike the DTS syntax.  It is hard to learn to read, and I
> always have to have the manual in my hands to read it.
>
> However, every board that's being produced for powerpc has the DTB at
> least available.  It has to be, or (recent?) Linux kernels flat out
> won't work.  This suggests that it might be a good idea to look at
> this format.
>   

If this is true, I'd consider it to be a misfeature/bug in Linux for
powerpc.

Unless I'm mistaken, Linux is able to probe most hardware properties.
The exceptions on x86 are interrupt routing (at least on most machines)
and memory area designations. Memory configuration can be given as a
command line parameter and with polling enabled on all interrupts, a
kernel should come up fine as well.


> There's DTS and DTB.  One is the source, the other is the binary
> created from the source.  I'd recommend that qemu actually use the DTB
> rather than the DTS to implement things.  This way one could have a
> nicer syntax like the above and generate the DTB, or one could use the
> DTS provided by a vendor if there was a more specific board they
> wanted qemu to emulate.
>
> Carl-Daniel, how does coreboot v3 generate the data that's passed to
> the kernel?
>   

Coreboot v3 does not pass anything derived from the device tree to the
kernel. It simply wouldn't make sense.

Linux and Windows use a few legacy tables and ACPI on x86/x86_64
platforms and if the device tree is any good for firmware purposes, it
won't resemble the ACPI tables at all.

Stuffing info usable for ACPI into the device tree is certainly
possible, but due to topology and content mismatch it's a painful and
pointless exercise.


Regards,
Carl-Daniel

-- 
http://www.hailfinger.org/

* Re: [RFC] Machine description as data
  2009-02-12 23:35                 ` [Qemu-devel] " Carl-Daniel Hailfinger
@ 2009-02-12 23:58                   ` Paul Brook
  -1 siblings, 0 replies; 146+ messages in thread
From: Paul Brook @ 2009-02-12 23:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss, Carl-Daniel Hailfinger, hollisb

> Unless I'm mistaken, Linux is able to probe most hardware properties.

You are badly mistaken.

On x86 workstation/server class hardware you might get away with it because 
everything interesting is either standard legacy ports or PCI, and your 
firmware/bios already took care of the really hairy bits.

On embedded systems there's often very little that can be automatically 
detected, much less functionality provided by the firmware (You're lucky if 
all your RAM is even turned on!) and you just have to know where stuff is.

Paul

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [RFC] Machine description as data
  2009-02-12 18:29                 ` [Qemu-devel] " Markus Armbruster
@ 2009-02-12 23:58                   ` Carl-Daniel Hailfinger
  -1 siblings, 0 replies; 146+ messages in thread
From: Carl-Daniel Hailfinger @ 2009-02-12 23:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss, hollisb

On 12.02.2009 19:29, Markus Armbruster wrote:
> "M. Warner Losh" <imp@bsdimp.com> writes:
>
>   
>> In message: <49941AE3.1000806@gmx.net>
>> Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net> writes:
>> : > I didn't mean to say they are a bad idea for FDTs, just that they're on
>> : > an awkward level of abstraction for QEMU configuration.  There, I'd
>> : > rather express a PCI address as "02:01.0" than as <0x00000220>.
>> : > Translating text to binary is the machine's job, not the user's.
>> : 
>> : Coreboot v3 is using some device tree variant which is IMHO a bit more
>> : user friendly. The tree below is incomplete (for example, it leaves out
>> : the PCI bus number and assumes that it is zero by default), but you
>> : surely get the idea.
>> : 
>> : /{
>> :     mainboard_vendor = "Gigabyte";
>> :     mainboard_name = "M57SLI";
>> :     cpus { };
>> :     apic@0 {
>> :     };
>> :     domain@0 {
>> :         pci@0,0 { /* MCP55 RAM? */ 
>> :         };
>> :         pci@1,0 {
>> :             /config/("southbridge/nvidia/mcp55/lpc.dts");
>> :             ioport@2e {
>>
>> <etc>
>>
>> I'd like to make a couple of comments here.
>>
>> One, I dislike the DTS syntax.  It is hard to learn to read, and I
>> always have to have the manual in my hands to read it.
>>
>> However, every board that's being produced for powerpc has the DTB at
>> least available.  It has to be, or (recent?) Linux kernels flat out
>> won't work.  This suggests that it might be a good idea to look at
>> this format.
>>
>> There's DTS and DTB.  One is the source, the other is the binary
>> created from the source.  I'd recommend that qemu actually use the DTB
>> rather than the DTS to implement things.  This way one could have a
>> nicer syntax like the above and generate the DTB, or one could use the
>> DTS provided by a vendor if there was a more specific board they
>> wanted qemu to emulate.
>>     
>
> As far as I know, dtc can decompile DTB into DTS.
>
> I'm not a fan of DTS syntax either, but if we choose FDT, then inventing
> an alternative syntax seems rather pointless to me.
>   

If the alternative syntax is more readable, why not?

If the DTS text file is compiled into DTB anyway, there's absolutely no
reason to make the text file hard to read for humans. Except maybe
making sure that nobody will ever want to change them, and in that case
we can advise developers to modify the DTB directly.

Compilers for DTS variants do exist. For example, coreboot v3 has one.

> As to reading configuration in a binary format: let's not complicate
> things more than we need.  It's just a decorated tree, folks.
>   

How exactly do you represent a digraph with some cycles as a decorated
tree? The solution should allow people without an extensive background
in IEEE1275 to change the graph as needed.
Having to keep a calculator handy for PCI bus addresses is embarrassing
(and with a calculator, requiring the user to determine the full CF8/CFC
PCI config cycles is not that much more effort ;-) ).
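
To make the "translating text to binary is the machine's job" point
concrete, here is a minimal sketch in C. It is not taken from QEMU or
coreboot; the helper names pci_parse_bdf and pci_cf8_address are made up
for illustration. The bit layout is the usual PCI configuration
mechanism #1 address that gets written to port 0xCF8.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a PCI address written as "bb:dd.f" (all fields hexadecimal).
 * Returns 0 on success, -1 on malformed or out-of-range input.
 * Hypothetical helper, for illustration only.
 */
static int pci_parse_bdf(const char *s, unsigned *bus, unsigned *dev,
                         unsigned *fn)
{
    unsigned b, d, f;
    char trailing;

    /* Exactly three fields must convert; a fourth means trailing junk. */
    if (sscanf(s, "%x:%x.%x%c", &b, &d, &f, &trailing) != 3)
        return -1;
    if (b > 0xff || d > 0x1f || f > 7)
        return -1;

    *bus = b;
    *dev = d;
    *fn = f;
    return 0;
}

/*
 * Build the 32-bit value written to port 0xCF8 to select a config
 * register: enable bit 31, bus in bits 23:16, device in bits 15:11,
 * function in bits 10:8, dword-aligned register offset in bits 7:2.
 */
static uint32_t pci_cf8_address(unsigned bus, unsigned dev, unsigned fn,
                                unsigned reg)
{
    assert(bus <= 0xff && dev <= 0x1f && fn <= 7 && reg <= 0xff);
    return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)dev << 11) |
           ((uint32_t)fn << 8) | (reg & 0xfcu);
}
```

With these, parsing "02:01.0" and asking for register 0 gives
0x80020800, so the user never has to do that arithmetic by hand.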


Regards,
Carl-Daniel

-- 
http://www.hailfinger.org/

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 23:35                 ` [Qemu-devel] " Carl-Daniel Hailfinger
@ 2009-02-13  0:05                     ` M. Warner Losh
  -1 siblings, 0 replies; 146+ messages in thread
From: M. Warner Losh @ 2009-02-13  0:05 UTC (permalink / raw)
  To: c-d.hailfinger.devel.2006; +Cc: devicetree-discuss, qemu-devel, hollisb

In message: <4994B22E.6060608@gmx.net>
            Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net> writes:
: On 12.02.2009 17:46, M. Warner Losh wrote:
: >
: > In message: <49941AE3.1000806@gmx.net>
: > Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net> writes:
: > : > I didn't mean to say they are a bad idea for FDTs, just that they're on
: > : > an awkward level of abstraction for QEMU configuration.  There, I'd
: > : > rather express a PCI address as "02:01.0" than as <0x00000220>.
: > : > Translating text to binary is the machine's job, not the user's.
: > : 
: > : Coreboot v3 is using some device tree variant which is IMHO a bit more
: > : user friendly. The tree below is incomplete (for example, it leaves out
: > : the PCI bus number and assumes that it is zero by default), but you
: > : surely get the idea.
: > : 
: > : /{
: > :     mainboard_vendor = "Gigabyte";
: > :     mainboard_name = "M57SLI";
: > :     cpus { };
: > :     apic@0 {
: > :     };
: > :     domain@0 {
: > :         pci@0,0 { /* MCP55 RAM? */ 
: > :         };
: > :         pci@1,0 {
: > :             /config/("southbridge/nvidia/mcp55/lpc.dts");
: > :             ioport@2e {
: >
: > <etc>
: >
: > I'd like to make a couple of comments here.
: >
: > One, I dislike the DTS syntax.  It is hard to learn to read, and I
: > always have to have the manual in my hands to read it.
: >
: > However, every board that's being produced for powerpc has the DTB at
: > least available.  It has to be, or (recent?) Linux kernels flat out
: > won't work.  This suggests that it might be a good idea to look at
: > this format.
: >   
: 
: If this is true, I'd consider it to be a misfeature/bug in Linux for
: powerpc.

It is neither.  It is absolutely required.  The kernel can probe things
like usb devices and pci devices, but it is impossible to probe how
interrupts are wired, where devices exist on local busses connected to
the SoC, etc.

The DTS tables can have pci nodes in them, but it isn't required.

: Unless I'm mistaken, Linux is able to probe most hardware properties.
: The exceptions on x86 are interrupt routing (at least on most machines)
: and memory area designations. Memory configuration can be given as a
: command line parameter and with polling enabled on all interrupts, a
: kernel should come up fine as well.

s/most/some/.  On powerpc you have a much richer palette to choose
from, and the knowledge of how things are wired goes into the dtb blob
that's passed.

: > There's DTS and DTB.  One is the source, the other is the binary
: > created from the source.  I'd recommend that qemu actually use the DTB
: > rather than the DTS to implement things.  This way one could have a
: > nicer syntax like the above and generate the DTB, or one could use the
: > DTS provided by a vendor if there was a more specific board they
: > wanted qemu to emulate.
: >
: > Carl-Daniel, how does coreboot v3 generate the data that's passed to
: > the kernel?
: >   
: 
: Coreboot v3 does not pass anything derived from the device tree to the
: kernel. It simply wouldn't make sense.
: 
: Linux and Windows use a few legacy tables and ACPI on x86/x86_64
: platforms and if the device tree is any good for firmware purposes, it
: won't resemble the ACPI tables at all.
: 
: Stuffing info usable for ACPI into the device tree is certainly
: possible, but due to topology and content mismatch it's a painful and
: pointless exercise.

That's likely true...

Warner

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 23:58                   ` [Qemu-devel] " Paul Brook
@ 2009-02-13  0:32                       ` Carl-Daniel Hailfinger
  -1 siblings, 0 replies; 146+ messages in thread
From: Carl-Daniel Hailfinger @ 2009-02-13  0:32 UTC (permalink / raw)
  To: Paul Brook
  Cc: devicetree-discuss, qemu-devel, hollisb

On 13.02.2009 00:58, Paul Brook wrote:
>> Unless I'm mistaken, Linux is able to probe most hardware properties.
>>     
>
> You are badly mistaken.
>   

Point taken.


> On x86 workstation/server class hardware you might get away with it because 
> everything interesting is either  standard legacy ports or PCI, and your 
> firmware/bios already took care of the really hairy bits.
>   

If the firmware doesn't set up the things which can't be probed, can it
even be called firmware or is it more like a glorified bootloader?


> On embedded systems there's often very little that can be automatically 
> detected, much less functionality provided by the firmware (You're lucky if 
> all your RAM is even turned on!) and you just have to know where stuff is.
>   

Ouch. I always thought turning on all the RAM was either a hardware (old
x86) or firmware (modern x86) task.

I'm a bit surprised by the lack of automatically detectable features in
embedded systems. Wouldn't automatic detection allow reusing whole OS
images on slightly different systems and thus lower development cost?


Regards,
Carl-Daniel

-- 
http://www.hailfinger.org/

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 10:26       ` [Qemu-devel] " Markus Armbruster
@ 2009-02-13  0:37           ` David Gibson
  -1 siblings, 0 replies; 146+ messages in thread
From: David Gibson @ 2009-02-13  0:37 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: devicetree-discuss, qemu-devel

On Thu, Feb 12, 2009 at 11:26:12AM +0100, Markus Armbruster wrote:
> Hollis Blanchard <hollisb@us.ibm.com> writes:
[snip]
> > I won't say IEEE1275 is perfect, but IMHO it would be pretty silly to
> > reinvent all the design and infrastructure for a similar-but-different
> > device tree.
> >
> > [Patch snipped]
> 
> I'm not at all opposed to adapting FDT for QEMU use.  My patch is a
> prototype, and I'm prepared to throw away some or all of it.
> 
> To get this thing started, I wanted working code to demonstrate what I'm
> talking about.  If I had dug deeper into FDTs first, we would not be
> talking now.
> 
> The task I outlined in my memo involves much more than just coming up
> with a device tree data structure.  That data structure is to me one
> detail among many, and a much less hairy one than most others.  It
> certainly was for the prototype.
> 
> If I read the comments correctly (all comments, not just this one), the
> only real issue with my proposal is you'd rather use FDT for the config
> tree.  I don't mind, except I don't know enough about that stuff to do
> it all by myself, at least not in a reasonable time frame.  I think I
> understand the concepts, can read .dts files with some head-scratching,
> and I could perhaps even write one if I sacrificed a chicken or two.
> Designing a binding, however, feels well above my level of
> (in)competence.
> 
> So, to make FDT happen, I need help.  Specifically:
> 
> * Point me to the FDT code I'm supposed to integrate.  I'm looking for
>   basic decorated tree stuff: create trees, traverse them, get and put
>   properties, add and delete nodes, read and write them as plain,
>   human-readable text.

dtc and libfdt are a good place to start, if you haven't yet
investigated them:
	git://git.jdl.com/software/dtc.git
Note that although they're distributed together as one tree, dtc and
libfdt are essentially independent pieces of software.  dtc converts
device trees between various formats, dts and dtb in particular.
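
For readers who have only seen the dts side: a dtb is a flat binary
blob that starts with a header of big-endian 32-bit fields. The sketch
below is a hand-written illustration, not libfdt code (libfdt itself
provides fdt_check_header() for this job). The names dtb_summary and
dtb_summarize are invented for the example; the magic value and the
field offsets follow the flattened device tree header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DTB_MAGIC      0xd00dfeedu  /* first word of every dtb */
#define DTB_MIN_HEADER 24u          /* bytes up to and including "version" */

/* The two header fields this sketch extracts (illustrative struct). */
struct dtb_summary {
    uint32_t totalsize;  /* size of the whole blob in bytes */
    uint32_t version;    /* format version of the blob */
};

/* Read one big-endian 32-bit value, independent of host byte order. */
static uint32_t be32_at(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Check that blob plausibly starts with a dtb header and pull out
 * totalsize (offset 4) and version (offset 20).
 * Returns 0 on success, -1 if the blob is too short, has the wrong
 * magic, or claims a size that does not fit the buffer.
 */
static int dtb_summarize(const unsigned char *blob, size_t len,
                         struct dtb_summary *out)
{
    assert(blob != NULL && out != NULL);

    if (len < DTB_MIN_HEADER)
        return -1;
    if (be32_at(blob) != DTB_MAGIC)
        return -1;

    out->totalsize = be32_at(blob + 4);
    out->version = be32_at(blob + 20);

    if (out->totalsize < DTB_MIN_HEADER || out->totalsize > len)
        return -1;
    return 0;
}
```

Anything beyond this kind of sanity check (walking nodes, reading or
writing properties) is better left to libfdt.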

libfdt does a number of the things you mention with flat trees -
get/set properties, build trees, traverse etc.  If it doesn't do
everything you need, we can probably extend it so that it does: I want
libfdt to be *the* library for manipulating trees in the fdt format.
It's designed to be easy to embed in other packages for this reason,
although it does have some usage peculiarities, in particular because
it has to be possible to integrate it into very limited environments
like firmware.

[Jon Loeliger is the current maintainer of dtc and libfdt, but I
originally wrote both of them - I know as much about them as anyone
does]

> * Provide an example tree describing a bare-bones PC, like the one in my
>   prototype: CPU, RAM, BIOS, PIC, APIC, IOAPIC, PIT, DMA, UART, parallel
>   port, floppy controller, CMOS & RTC, a20 gate (port 92) and other
>   miscellanous I/O ports, i440fx, PIIX3 (ISA bridge, IDE, USB, ACPI),
>   Cirrus VGA with BIOS, some PCI NIC.  This gives us all an idea of the
>   tree structure.  Morphing that into something suitable for QEMU
>   configuration shouldn't be too hard then, just an exercice in
>   redecorating the tree.

I don't know offhand of any trees for a PC system.  There are a bunch of
example trees for powerpc systems in arch/powerpc/boot/dts in the
kernel tree.  A few of those, such as prep, at least have parts which
somewhat resemble a PC.  I believe the OLPC also has OF; that would be
an example OF tree for an x86 machine, if not a typical PC.

> * Advice as we go.

I'll do what I can.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 10:26           ` [Qemu-devel] " Markus Armbruster
@ 2009-02-13  0:43               ` David Gibson
  -1 siblings, 0 replies; 146+ messages in thread
From: David Gibson @ 2009-02-13  0:43 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A, Hollis Blanchard

On Thu, Feb 12, 2009 at 11:26:46AM +0100, Markus Armbruster wrote:
> David Gibson <dwg-8fk3Idey6ehBDgjK7y7TUQ@public.gmane.org> writes:
> 
> > On Wed, Feb 11, 2009 at 12:50:28PM -0600, Hollis Blanchard wrote:
> >> On Wed, 2009-02-11 at 16:40 +0100, Markus Armbruster wrote:
> > [snip]
> >> > I briefly examined the DT source format and the tree structure it
> >> > describes for the purpose of QEMU configuration.  I decided against
> >> > using it in my prototype because I found it awfully low-level and
> >> > verbose for that purpose (I'm sure it serves the purpose it was designed
> >> > for just fine).  Issues include:
> >> > 
> >> > * Since the DT is designed for booting kernels, not configuring QEMU,
> >> >   there's information that has no place in QEMU configuration, and
> >> >   required QEMU configuration isn't there.
> >> 
> >> What's needed is a "binding" in IEEE1275-speak: a document that
> >> describes qemu-specific nodes/properties and how they are to be
> >> interpreted.
> >> 
> >> As an example, you could require that block devices contain properties
> >> named "qemu,path", "qemu,backend", etc.
> >
> > Yes, it shouldn't be hard to annotate an IEEE1275 style tree with
> > extra information for qemu's use.
> 
> I don't feel up to that task, because I'm not really familiar with
> IEEE1275.  Could you help out?

Uh.. up to some level at least.

> >                                    As for the other direction, in some
> > cases it may be appropriate for qemu's device tree code to fill in
> > missing device tree properties, based on what the device emulation
> > code knows about itself.
> 
> Agreed.  Configuration should only contain what is actually
> configurable.  Anything else that is needed by a consumer of the tree
> should be filled in automatically.

Right.  I guess it will depend on exactly what the balance is between
configuration and generated information whether we want to use a dts
with annotations in extra properties or a different tree format as
the root source format.  Either way I think we want to use the fdt
format as the format that qemu and the guest firmware work with.
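
To make the split between configured and generated information concrete,
here is a minimal sketch in Python.  It is not QEMU code: the tree is
modelled as plain dictionaries, and DEVICE_DEFAULTS, fill_in() and the
property values are hypothetical names invented for this example.  The
configuration carries only what the user chose; the device model
supplies the rest.

```python
# Hypothetical sketch: a device model fills in properties that the
# configuration deliberately leaves out.  Not QEMU code.

# What each (made-up) device model knows about itself.
DEVICE_DEFAULTS = {
    "serial": {"compatible": "ns16550", "clock-frequency": 1843200},
}


def fill_in(node):
    """Return a copy of node with generated properties added.

    Properties present in the configuration always win; only missing
    ones are taken from the device model's defaults.
    """
    generated = DEVICE_DEFAULTS.get(node["model"], {})
    merged = dict(generated)
    merged.update(node)
    return merged


configured = {"model": "serial", "qemu,backend": "stdio"}
complete = fill_in(configured)
print(complete["compatible"])    # supplied by the device model
print(complete["qemu,backend"])  # supplied by the configuration
```

Letting configured values override generated ones is a design choice of
this sketch, not something decided in the thread.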

> >> > * Redundancy between node name and its device_type property.
> >
> > Note that "device_type" may not mean what you think.  It describes
> > what methods the device supports within the OF client interface.  New
> > device trees that aren't linked to a full OF implementation with
> > client interface should generally omit device_type in most places
> > (there are a few special cases for compatibility with OSes that expect
> > device_type properties in certain places).
> 
> I guess the ignorance I mentioned shows ;)

Heh, well, device_type is very commonly misunderstood for good
reason.  It made sense in the original OF context, but in the context
of flat trees, the name is very misleading.

> >> > * Property "reg", which encodes address ranges, does so in terms of
> >> >   "cells": #address-cells 32-bit words (big endian) for the address,
> >> >   followed by #size-cells words for the size, where #address-cells and
> >> >   #size-cells are properties of the enclosing bus.  If this sounds
> >> >   like gibberish to you, well, that's my point.
> >
> > #address-cells and #size-cells take a little getting used to, but
> > it's really not that bad.  It's just a way of representing the fact
> > that different busses have different sized address encodings.
> 
> I didn't mean to say they are a bad idea for FDTs, just that they're on
> an awkward level of abstraction for QEMU configuration.  There, I'd
> rather express a PCI address as "02:01.0" than as <0x00000220>.
> Translating text to binary is the machine's job, not the user's.

Ah, I see what you mean.  Hrm, there are several possibilities here,
we'll have to see which works out best for your purposes.
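
One of those possibilities is to accept the textual form and let the
machine produce the cells.  The following Python sketch shows the idea
under stated assumptions: the bit layout is the one I understand the
Open Firmware PCI bus binding to use for the first address cell (bus in
bits 16-23, device in bits 11-15, function in bits 8-10), and the helper
names are made up for this example.

```python
import struct


def parse_pci_address(text):
    """Split "bus:device.function" (hex fields) into three integers."""
    bus, rest = text.split(":")
    dev, fn = rest.split(".")
    return int(bus, 16), int(dev, 16), int(fn, 16)


def pci_phys_hi(bus, dev, fn):
    """Build the first address cell for a PCI configuration-space address.

    Assumed layout: bus number in bits 16-23, device in bits 11-15,
    function in bits 8-10, all other bits left as zero.
    """
    return (bus << 16) | (dev << 11) | (fn << 8)


def encode_cells(values):
    """Pack integers as big-endian 32-bit cells, as stored in a flat tree."""
    return struct.pack(">%dI" % len(values), *values)


bus, dev, fn = parse_pci_address("02:01.0")
cell = pci_phys_hi(bus, dev, fn)
print("0x%08x" % cell)
# PCI uses three address cells; the other two are zero for this case.
print(encode_cells([cell, 0, 0]).hex())
```

The user only ever writes "02:01.0"; the number of cells and their
encoding stay a property of the bus.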

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [RFC] Machine description as data
  2009-02-13  0:32                       ` Carl-Daniel Hailfinger
@ 2009-02-13  0:47                         ` Jamie Lokier
  -1 siblings, 0 replies; 146+ messages in thread
From: Jamie Lokier @ 2009-02-13  0:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss, Paul Brook, hollisb

Carl-Daniel Hailfinger wrote:
> I'm a bit surprised by the lack of automatically detectable features in
> embedded systems. Wouldn't automatic detection allow reusing whole OS
> images on slightly different systems and thus lower development cost?

Where systems are slightly different, what often happens is the boot
firmware (boot loader, whatever you want to call it) has different
parameters patched in.  They might be compiled in differently, or
stored as different values in a small config block.

That only applies to different versions of the same product though.
Among embedded systems generally there are huge differences between
them.

In the OS image, you often have only drivers for devices on the board,
to keep the OS as small as possible, so it's customised for each board
anyway, except being shared between minor revisions.  There are often
a few custom devices and kernel patches too.

-- Jamie

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 19:33                       ` Mitch Bradley
@ 2009-02-13  0:59                           ` David Gibson
  -1 siblings, 0 replies; 146+ messages in thread
From: David Gibson @ 2009-02-13  0:59 UTC (permalink / raw)
  To: Mitch Bradley
  Cc: devicetree-discuss, Markus Armbruster, Hollis Blanchard, qemu-devel

On Thu, Feb 12, 2009 at 09:33:59AM -1000, Mitch Bradley wrote:
>>
>> We want a machine configuration: a tree describing configurable devices
>> and their configurable properties.
>
> Regarding configurable devices in Open Firmware:
>
> The baseline usage model for configurable devices was that the firmware  
> is responsible for establishing a consistent system configuration,  
> possibly based on user-modifiable variables in non-volatile storage.  It  
> reports the actual configuration to the OS via the device tree.
>
> For cases where the choice needs to be deferred until later, or perhaps
> changed dynamically, a device tree property reports the set of  
> possibilities.  In cases where the firmware has already set up the  
> devices, it reports the current choice via another property.
>
> The device tree hierarchy serves as the "name space" framework for these  
> properties.  Obviously, you need to specify the device for which the  
> choice set applies.  The device tree is a coherent naming model for that  
> purpose.
>
> Obviously the hierarchical model has problems for highly-configurable  
> chipsets in which a setting can result in a wholesale rearrangement of  
> the overall connectivity, but it seems to me that board-design  
> constraints usually make that a non-problem. The wiring on a given board  
> generally forces the choice at that level, so the firmware for that  
> board need not report that as a configurable choice.
>
>> For PowerPC, we also want a machine description: a tree describing those
>> devices and properties that the kernel can't easily and safely probe.
>
> The gist of the above sentence seems to presuppose that, if the kernel  
> can probe, it should.  That's not the only way of thinking about the  
> problem.  As a practical matter, the firmware usually needs to do a fair  
> amount of probing too, in order to locate the console display and the  
> boot devices.  In the process, the firmware usually discovers pretty  
> much the entire machine configuration.  If the OS has to repeat the  
> process from scratch, it slows down the boot process.  So the IEEE1275  
> design supports the model where the firmware can do all the probing,  
> handing off a complete system description to the OS.  The OS startup  
> code can walk the tree and attach device drivers for what it finds, then  
> arrange to handle insert/remove events from hot-pluggable buses.

In the context of a full IEEE1275 implementation that makes sense.
However, in the context of flat trees - which were designed with
embedded machines in mind particularly - cutting down the device tree
to only things which the kernel can't probe is normal practice.  In
this context firmware is often really minimal and doesn't actually
need to probe much.  Basic IO devices like console are often on chip
and easily accessible so in particular the firmware has no need to
probe complex bus structures like PCI.

But more importantly, this is a design decision driven by the
prevalence of buggy-as-hell firmwares.  There are many instances where
to work around broken firmware probing, the kernel has to do almost as
much work as probing from scratch (and the code to do it is generally
more confusing).  And while it might be nice to imagine a world with
good firmware supporting standard interfaces, I really can't see it
ever happening.

So, we've taken the approach of moving as much as possible of the
probing and device-discovery logic into the kernel, which is usually a
more easily replaceable / fixable component of the system.  The less
the firmware has to put into the tree, the less it can get wrong.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 17:52               ` Hollis Blanchard
@ 2009-02-13  1:00                   ` David Gibson
  -1 siblings, 0 replies; 146+ messages in thread
From: David Gibson @ 2009-02-13  1:00 UTC (permalink / raw)
  To: Hollis Blanchard
  Cc: devicetree-discuss, Markus Armbruster, qemu-devel

On Thu, Feb 12, 2009 at 11:52:42AM -0600, Hollis Blanchard wrote:
> On Thu, 2009-02-12 at 11:26 +0100, Markus Armbruster wrote:
> >  David Gibson <dwg@au1.ibm.com> writes:
> > 
> > > On Wed, Feb 11, 2009 at 12:50:28PM -0600, Hollis Blanchard wrote:
> > >> On Wed, 2009-02-11 at 16:40 +0100, Markus Armbruster wrote:
> > > [snip]
> > >> > I briefly examined the DT source format and the tree structure it
> > >> > describes for the purpose of QEMU configuration.  I decided against
> > >> > using it in my prototype because I found it awfully low-level and
> > >> > verbose for that purpose (I'm sure it serves the purpose it was
> > >> > designed for just fine).  Issues include:
> > >> > 
> > >> > * Since the DT is designed for booting kernels, not configuring QEMU,
> > >> >   there's information that has no place in QEMU configuration, and
> > >> >   required QEMU configuration isn't there.
> > >> 
> > >> What's needed is a "binding" in IEEE1275-speak: a document that
> > >> describes qemu-specific nodes/properties and how they are to be
> > >> interpreted.
> > >> 
> > >> As an example, you could require that block devices contain properties
> > >> named "qemu,path", "qemu,backend", etc.
> > >
> > > Yes, it shouldn't be hard to annotate an IEEE1275 style tree with
> > > extra information for qemu's use.
> > 
> > I don't feel up to that task, because I'm not really familiar with
> > IEEE1275.  Could you help out?
> 
> I'm not really a "language lawyer" for device trees, but I can help.
> 
> FWIW, I was imagining (from a PowerPC point of view) that a strict
> subset of the device tree interpreted by qemu would be passed into the
> guest. In other words, once qemu is done with it, it would strip every
> property prefixed with "qemu," and copy the result into guest memory.
> PowerPC kernels require this data structure, and even when firmware runs
> in the guest, you still need to tell the firmware what the system layout
> is, and the device tree is an obvious candidate...

I wouldn't actually bother stripping the "qemu,..." properties.  They
should be ignored by the OS anyway.
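
Either choice is cheap.  If the stripping Hollis describes were wanted,
it could look roughly like the Python sketch below.  This is only an
illustration: the tree is modelled as nested dictionaries rather than a
real flattened tree, and strip_qemu_properties() and the example node
are hypothetical.

```python
def strip_qemu_properties(node):
    """Return a copy of the tree without "qemu,"-prefixed properties.

    A node is modelled as a dict; values that are themselves dicts are
    child nodes, everything else is a property.
    """
    result = {}
    for name, value in node.items():
        if isinstance(value, dict):
            # Child node: keep it and clean it recursively.
            result[name] = strip_qemu_properties(value)
        elif not name.startswith("qemu,"):
            result[name] = value
    return result


tree = {
    "disk@0": {
        "compatible": "example,disk",
        "qemu,path": "/tmp/disk.img",
        "qemu,backend": "raw",
    },
}
guest_tree = strip_qemu_properties(tree)
print(sorted(guest_tree["disk@0"]))
```

The original tree is left untouched, so qemu can keep using its own
annotations after the guest copy has been made.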

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 18:29                 ` [Qemu-devel] " Markus Armbruster
@ 2009-02-13  1:05                     ` David Gibson
  -1 siblings, 0 replies; 146+ messages in thread
From: David Gibson @ 2009-02-13  1:05 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: devicetree-discuss, qemu-devel, hollisb, c-d.hailfinger.devel.2006

On Thu, Feb 12, 2009 at 07:29:12PM +0100, Markus Armbruster wrote:
> "M. Warner Losh" <imp@bsdimp.com> writes:
[snip]
> > However, every board that's being produced for powerpc has the DTB at
> > least available.  It has to be, or (recent?) Linux kernels flat out
> > won't work.  This suggests that it might be a good idea to look at
> > this format.
> >
> > There's DTS and DTB.  One is the source, the other is the binary
> > created from the source.  I'd recommend that qemu actually use the DTB
> > rather than the DTS to implement things.  This way one could have a
> > nicer syntax like the above and generate the DTB, or one could use the
> > DTS provided by a vendor if there was a more specific board they
> > wanted qemu to emulate.
> 
> As far as I know, dtc can decompile DTB into DTS.

That's correct.  However, like many decompilation processes,
converting dts->dtb->dts is usually a lossy process to some extent.
dts has multiple ways to represent some things for readability
reasons, which don't affect the content of the compiled tree.
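
A small Python sketch of why the round trip loses information: several
dts spellings produce byte-for-byte identical property values, so a
decompiler has to pick one spelling without knowing which the author
used.  The property name "reg" and the values are only illustrative,
and the packing below imitates what dtc emits rather than calling it.

```python
import struct

# Three dts spellings of one property value:
#   reg = <0x10>;
#   reg = <16>;
#   reg = [00 00 00 10];
from_hex_cell = struct.pack(">I", 0x10)
from_decimal_cell = struct.pack(">I", 16)
from_bytestring = bytes.fromhex("00 00 00 10")

# A string and the equivalent byte string, NUL terminator included:
#   label = "hi";
#   label = [68 69 00];
from_string = "hi".encode("ascii") + b"\0"
from_string_bytes = bytes.fromhex("68 69 00")

print(from_hex_cell == from_decimal_cell == from_bytestring)
print(from_string == from_string_bytes)
```

Both comparisons hold, which is exactly why the original notation
cannot be recovered from the compiled tree alone.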

> I'm not a fan of DTS syntax either, but if we choose FDT, then inventing
> an alternative syntax seems rather pointless to me.

If you have suggestions for improving dts that don't involve
completely breaking compatibility with existing trees, we might be
able to incorporate them into dtc to make everyone's life easier.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13  0:32                       ` Carl-Daniel Hailfinger
@ 2009-02-13  1:46                           ` David Gibson
  -1 siblings, 0 replies; 146+ messages in thread
From: David Gibson @ 2009-02-13  1:46 UTC (permalink / raw)
  To: Carl-Daniel Hailfinger
  Cc: devicetree-discuss, Paul Brook, hollisb, qemu-devel

On Fri, Feb 13, 2009 at 01:32:19AM +0100, Carl-Daniel Hailfinger wrote:
> On 13.02.2009 00:58, Paul Brook wrote:
> >> Unless I'm mistaken, Linux is able to probe most hardware properties.
> >
> > You are badly mistaken.
> 
> Point taken.
> 
> 
> > On x86 workstation/server class hardware you might get away with it because 
> > everything interesting is either  standard legacy ports or PCI, and your 
> > firmware/bios already took care of the really hairy bits.
> 
> If the firmware doesn't set up the things which can't be probed, can it
> even be called firmware or is it more like a glorified bootloader?

A bootloader, not even much glorified, is often all there is on
embedded systems.

> > On embedded systems there's often very little that can be automatically 
> > detected, much less functionality provided by the firmware (You're lucky if 
> > all your RAM is even turned on!) and you just have to know where stuff is.
> 
> Ouch. I always thought turning on all the RAM was either a hardware (old
> x86) or firmware (modern x86) task.
> 
> I'm a bit surprised by the lack of automatically detectable features in
> embedded systems. Wouldn't automatic detection allow reusing whole OS
> images on slightly different systems and thus lower development cost?

Automatic detection requires protocols between hardware, firmware and
OS to implement it.  The ones that exist for full systems are too
heavyweight for embedded systems, or assume things about the hardware
setup that don't suit what embedded systems want to include.
Typically it's just been easier for embedded vendors to hack their
kernels to know the hardware directly.

The tradeoffs we've made for use of flattened device trees represent
an effort to achieve lower development cost precisely as you describe
here.  Inherently probeable hardware (e.g. PCI, USB) is mostly omitted
from the tree, leaving a minimal blob of almost static information
which the firmware/bootloader can include to tell the OS about the
hardware setup while still having almost no "moving parts".

The kernel can still neatly support systems which don't provide a
flattened tree by being built with a wrapper.  The wrapper, which is
specific to a particular hardware/firmware combination, contains a
flattened tree, plus some code to tweak it with what little
information the embedded firmware / hardware does provide (RAM size
and flash size are common examples).

This way we have a kernel which can run unmodified on many systems
which do provide a flattened tree, and we're no worse off for systems
which don't (a hardware specific kernel is replaced by a hardware
specific kernel+wrapper combination).
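
As a rough picture of what such a wrapper does, here is a Python sketch.
It is not the kernel's wrapper code: the tree is a nested dictionary
standing in for a flattened tree, and STATIC_TREE, wrapper_fixup() and
the 64 MiB figure are assumptions made up for this example.

```python
import copy
import struct

# Static tree built into the (hypothetical) wrapper.  The memory size
# is a placeholder to be patched at boot.
STATIC_TREE = {
    "#address-cells": 1,
    "#size-cells": 1,
    "memory@0": {"device_type": "memory", "reg": struct.pack(">II", 0, 0)},
}


def wrapper_fixup(tree, ram_bytes):
    """Return a copy of the tree with the detected RAM size filled in."""
    patched = copy.deepcopy(tree)
    # One address cell (base 0) followed by one size cell, big endian.
    patched["memory@0"]["reg"] = struct.pack(">II", 0, ram_bytes)
    return patched


# Pretend the board's firmware reported 64 MiB of RAM.
tree = wrapper_fixup(STATIC_TREE, 64 * 1024 * 1024)
base, size = struct.unpack(">II", tree["memory@0"]["reg"])
print(base, size)
```

Everything except the patched value is static, which is the "almost no
moving parts" property described above.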

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13  0:43               ` David Gibson
@ 2009-02-13  2:11                   ` Carl-Daniel Hailfinger
  -1 siblings, 0 replies; 146+ messages in thread
From: Carl-Daniel Hailfinger @ 2009-02-13  2:11 UTC (permalink / raw)
  To: Markus Armbruster, Hollis Blanchard, devicetree-discuss, qemu-devel

On 13.02.2009 01:43, David Gibson wrote:
> On Thu, Feb 12, 2009 at 11:26:46AM +0100, Markus Armbruster wrote:
>   
>> I didn't mean to say they are a bad idea for FDTs, just that they're on
>> an awkward level of abstraction for QEMU configuration.  There, I'd
>> rather express a PCI address as "02:01.0" than as <0x00000220>.
>> Translating text to binary is the machine's job, not the user's.
>>     
>
> Ah, I see what you mean.  Hrm, there are several possibilities here,
> we'll have to see which works out best for your purposes.
>   

Using the DTC version included in the coreboot v3 sources would solve
that problem and give you a readable PCI address representation.
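
A minimal sketch of the text-to-binary translation discussed above,
assuming a hypothetical helper pair (pci_addr_parse and pci_addr_to_cell
are illustrative names, not part of QEMU, dtc or coreboot).  The packing
uses bus in bits 23..16, device in bits 15..11 and function in bits
10..8, which is the layout of the first address cell in the Open
Firmware PCI bus binding as commonly documented; treat that as an
assumption.  It is not meant to reproduce the literal cell value quoted
above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical type: one PCI address split into its three fields. */
struct pci_addr {
    unsigned bus;   /* 0x00..0xff */
    unsigned dev;   /* 0x00..0x1f */
    unsigned fn;    /* 0x0..0x7   */
};

/*
 * Parse "bus:dev.fn" with hexadecimal fields, e.g. "02:01.0".
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int pci_addr_parse(const char *s, struct pci_addr *out)
{
    unsigned bus, dev, fn;
    char trailing;

    /* Exactly three fields must match; a fourth match means junk follows. */
    if (sscanf(s, "%x:%x.%x%c", &bus, &dev, &fn, &trailing) != 3)
        return -1;
    if (bus > 0xff || dev > 0x1f || fn > 0x7)
        return -1;

    out->bus = bus;
    out->dev = dev;
    out->fn = fn;
    return 0;
}

/* Pack the fields into one 32-bit cell (assumed layout, see above). */
static uint32_t pci_addr_to_cell(const struct pci_addr *a)
{
    return ((uint32_t)a->bus << 16) |
           ((uint32_t)a->dev << 11) |
           ((uint32_t)a->fn << 8);
}
```

With this, a configuration parser could accept the readable form and
keep the packed cell as an internal detail.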

Regards,
Carl-Daniel

-- 
http://www.hailfinger.org/

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13  2:11                   ` Carl-Daniel Hailfinger
@ 2009-02-13  2:17                       ` David Gibson
  -1 siblings, 0 replies; 146+ messages in thread
From: David Gibson @ 2009-02-13  2:17 UTC (permalink / raw)
  To: Carl-Daniel Hailfinger
  Cc: devicetree-discuss, Markus Armbruster, Hollis Blanchard, qemu-devel

On Fri, Feb 13, 2009 at 03:11:20AM +0100, Carl-Daniel Hailfinger wrote:
> On 13.02.2009 01:43, David Gibson wrote:
> > On Thu, Feb 12, 2009 at 11:26:46AM +0100, Markus Armbruster wrote:
> >   
> >> I didn't mean to say they are a bad idea for FDTs, just that they're on
> >> an awkward level of abstraction for QEMU configuration.  There, I'd
> >> rather express a PCI address as "02:01.0" than as <0x00000220>.
> >> Translating text to binary is the machine's job, not the user's.
> >>     
> >
> > Ah, I see what you mean.  Hrm, there are several possibilities here,
> > we'll have to see which works out best for your purposes.
> 
> Using the DTC version included in the coreboot v3 sources would solve
> that problem and give you a readable PCI address representation.

Hrm.. it would be nice if you'd co-ordinated with Jon and me about
this.  Then we could have at least the bits which make sense in
upstream dtc...

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 146+ messages in thread

* DTS syntax and DTC patches (was: Re: [Qemu-devel] [RFC] Machine description as data)
  2009-02-13  2:17                       ` David Gibson
@ 2009-02-13  2:45                           ` Carl-Daniel Hailfinger
  -1 siblings, 0 replies; 146+ messages in thread
From: Carl-Daniel Hailfinger @ 2009-02-13  2:45 UTC (permalink / raw)
  To: Coreboot, Markus Armbruster, Hollis Blanchard,
	devicetree-discuss, qemu-devel

[Adding the coreboot mailing list to CC. It's moderated for
non-subscribers, but it won't take long for legitimate mails to be
approved.]

On 13.02.2009 03:17, David Gibson wrote:
> On Fri, Feb 13, 2009 at 03:11:20AM +0100, Carl-Daniel Hailfinger wrote:
>   
>> On 13.02.2009 01:43, David Gibson wrote:
>>     
>>> On Thu, Feb 12, 2009 at 11:26:46AM +0100, Markus Armbruster wrote:
>>>   
>>>       
>>>> I didn't mean to say they are a bad idea for FDTs, just that they're on
>>>> an awkward level of abstraction for QEMU configuration.  There, I'd
>>>> rather express a PCI address as "02:01.0" than as <0x00000220>.
>>>> Translating text to binary is the machine's job, not the user's.
>>>>     
>>>>         
>>> Ah, I see what you mean.  Hrm, there are several possibilities here,
>>> we'll have to see which works out best for your purposes.
>>>       
>> Using the DTC version included in the coreboot v3 sources would solve
>> that problem and give you a readable PCI address representation.
>>     
>
> Hrm.. it would be nice if you'd co-ordinated with Jon and me about
> this.  Then we could have at least the bits which make sense in
> upstream dtc...
>   

Probably the biggest obstacle for a full merge right now is that the
coreboot v3 DTC is rather old and has been extended not only for a more
readable DTS syntax variant, but also for additional output modes (C
header and C code).

We (coreboot developers) are interested in reducing our diff with
upstream DTC in order to improve maintainability of our DTC code.


Regards,
Carl-Daniel

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: DTS syntax and DTC patches (was: Re: [Qemu-devel] [RFC] Machine description as data)
  2009-02-13  2:45                           ` Carl-Daniel Hailfinger
@ 2009-02-13  2:51                               ` David Gibson
  -1 siblings, 0 replies; 146+ messages in thread
From: David Gibson @ 2009-02-13  2:51 UTC (permalink / raw)
  To: Carl-Daniel Hailfinger
  Cc: qemu-devel, devicetree-discuss, Markus Armbruster,
	Hollis Blanchard, Coreboot

On Fri, Feb 13, 2009 at 03:45:45AM +0100, Carl-Daniel Hailfinger wrote:
> [Adding the coreboot mailing list to CC. It's moderated for
> non-subscribers, but it won't take long for legitimate mails to be
> approved.]
> 
> On 13.02.2009 03:17, David Gibson wrote:
> > On Fri, Feb 13, 2009 at 03:11:20AM +0100, Carl-Daniel Hailfinger wrote:
> >   
> >> On 13.02.2009 01:43, David Gibson wrote:
> >>     
> >>> On Thu, Feb 12, 2009 at 11:26:46AM +0100, Markus Armbruster wrote:
> >>>   
> >>>       
> >>>> I didn't mean to say they are a bad idea for FDTs, just that they're on
> >>>> an awkward level of abstraction for QEMU configuration.  There, I'd
> >>>> rather express a PCI address as "02:01.0" than as <0x00000220>.
> >>>> Translating text to binary is the machine's job, not the user's.
> >>>>     
> >>>>         
> >>> Ah, I see what you mean.  Hrm, there are several possibilities here,
> >>> we'll have to see which works out best for your purposes.
> >>>       
> >> Using the DTC version included in the coreboot v3 sources would solve
> >> that problem and give you a readable PCI address representation.
> >>     
> >
> > Hrm.. it would be nice if you'd co-ordinated with Jon and me about
> > this.  Then we could have at least the bits which make sense in
> > upstream dtc...
> >   
> 
> Probably the biggest obstacle for a full merge right now is that the
> coreboot v3 DTC is rather old and has been extended not only for a more
> readable DTS syntax variant, but also for additional output modes (C
> header and C code).

If the C output mode is what I'm guessing, it should be pretty easy to
add (we already have an asm output mode upstream).

The syntax changes will be trickier.  I want to review any new syntax
for dts very carefully, because I really, really don't want to have to
break backwards compatibility in future (I'm unhappy enough about the
dts-v0 to dts-v1 transition we've already had).

Can you summarise what the syntax changes are?  Maybe start a new
thread with just devicetree-discuss not the other lists for that.

> We (coreboot developers) are interested in reducing our diff with
> upstream DTC in order to improve maintainability of our DTC code.

Good :)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-12 23:58                   ` [Qemu-devel] " Carl-Daniel Hailfinger
@ 2009-02-13 11:19                       ` Markus Armbruster
  -1 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-13 11:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: devicetree-discuss, hollisb

Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net> writes:

> How exactly do you represent a digraph with some cycles as a decorated
> tree? The solution should allow people without an extensive background
> in IEEE1275 to change the graph as needed.

Okay, calling it just a decorated tree is not 100% accurate.  It's a
decorated tree where a certain kind of decoration can refer to another
node.  These additional edges actually make it a directed graph.  But we
still have a tree embedded in that graph, which is useful when we
convert it to or from text.
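
A minimal sketch of such a structure, assuming hypothetical type and
function names (struct node, struct prop, node_add_child and friends
are illustrative, not taken from the prototype patch or from libfdt).
Parent/child links form the embedded tree; a property of kind
PROP_NODE_REF adds an extra edge to an arbitrary node, so the whole
thing is a directed graph that may contain cycles, while a walk over
the tree edges alone still terminates.

```c
#include <stddef.h>
#include <string.h>

struct node;

/* A decoration is either plain data or a reference to another node. */
enum prop_kind { PROP_STRING, PROP_NODE_REF };

struct prop {
    const char *name;
    enum prop_kind kind;
    union {
        const char *str;    /* valid when kind == PROP_STRING */
        struct node *ref;   /* valid when kind == PROP_NODE_REF */
    } u;
    struct prop *next;
};

struct node {
    const char *name;
    struct node *parent;    /* tree edge up */
    struct node *children;  /* first child (tree edge down) */
    struct node *sibling;   /* next child of the same parent */
    struct prop *props;     /* decorations, may hold non-tree edges */
};

/* Link child under parent; only tree edges are touched. */
static void node_add_child(struct node *parent, struct node *child)
{
    child->parent = parent;
    child->sibling = parent->children;
    parent->children = child;
}

/* Attach a property to a node. */
static void node_add_prop(struct node *n, struct prop *p)
{
    p->next = n->props;
    n->props = p;
}

/* Find a property by name, or return NULL. */
static struct prop *node_find_prop(const struct node *n, const char *name)
{
    struct prop *p;

    for (p = n->props; p; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Count nodes by following tree edges only; reference properties are
 * ignored, so cycles introduced by them cannot cause endless recursion. */
static size_t tree_count(const struct node *n)
{
    const struct node *c;
    size_t total = 1;

    for (c = n->children; c; c = c->sibling)
        total += tree_count(c);
    return total;
}
```

Writing the structure out as text can follow the same tree edges and
print each reference property as the name of its target node.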

[...]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13  0:37           ` David Gibson
@ 2009-02-13 11:26               ` Markus Armbruster
  -1 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-13 11:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: devicetree-discuss

David Gibson <david@gibson.dropbear.id.au> writes:

> On Thu, Feb 12, 2009 at 11:26:12AM +0100, Markus Armbruster wrote:
>> Hollis Blanchard <hollisb@us.ibm.com> writes:
> [snip]
>> > I won't say IEEE1275 is perfect, but IMHO it would be pretty silly to
>> > reinvent all the design and infrastructure for a similar-but-different
>> > device tree.
>> >
>> > [Patch snipped]
>> 
>> I'm not at all opposed to adapting FDT for QEMU use.  My patch is a
>> prototype, and I'm prepared to throw away some or all of it.
>> 
>> To get this thing started, I wanted working code to demonstrate what I'm
>> talking about.  If I had dug deeper into FDTs first, we would not be
>> talking now.
>> 
>> The task I outlined in my memo involves much more than just coming up
>> with a device tree data structure.  That data structure is to me one
>> detail among many, and a much less hairy one than most others.  It
>> certainly was for the prototype.
>> 
>> If I read the comments correctly (all comments, not just this one), the
>> only real issue with my proposal is you'd rather use FDT for the config
>> tree.  I don't mind, except I don't know enough about that stuff to do
>> it all by myself, at least not in a reasonable time frame.  I think I
>> understand the concepts, can read .dts files with some head-scratching,
>> and I could perhaps even write one if I sacrificed a chicken or two.
>> Designing a binding, however, feels well above my level of
>> (in)competence.
>> 
>> So, to make FDT happen, I need help.  Specifically:
>> 
>> * Point me to the FDT code I'm supposed to integrate.  I'm looking for
>>   basic decorated tree stuff: create trees, traverse them, get and put
>>   properties, add and delete nodes, read and write them as plain,
>>   human-readable text.
>
> dtc and libfdt is a good place to start, if you haven't yet
> investigated them:
> 	git://git.jdl.com/software/dtc.git
> Note that although they're distributed together as one tree, dtc and
> libfdt are essentially independent pieces of software.  dtc converts
> device trees between various formats, dts and dtb in particular.
>
> libfdt does a number of the things you mention with flat trees -
> get/set properties, build trees, traverse etc.  If it doesn't do
> everything you need, we can probably extend it so that it does: I want
> libfdt to be *the* library for manipulating trees in the fdt format.
> It's designed to be easy to embed in other packages for this reason,
> although it does have some usage peculiarities because in particular
> it's possible to integrate into very limited environments like
> firmwares.
>
> [Jon Loeliger is the current maintainer of dtc and libfdt, but I
> originally wrote both of them - I know as much about them as anyone
> does]

Okay, I looked at dtc and libfdt again, a bit more closely.  I'm sure
there's plenty of ignorance left in me, so please correct me when I'm
babbling nonsense.

FDT is a "flattened tree", i.e. a tree data structure laid out in a
block of memory in a clever way to make it compact and easily
relocatable.  I understand why these are important requirements for
passing information through bootloader to kernel.  They're irrelevant,
however, for use as QEMU configuration.

You can identify an FDT node by node offset or node name.  The node
offset can change when you add or delete nodes or properties.

You want everyone to use libfdt for manipulating FDTs.  I think that's
entirely sensible.  What I still don't get is something else: Why use
FDT for QEMU configuration in the first place?  Let me explain.

I think we have two distinct problems: the need for a flexible,
expressive QEMU machine configuration file and a virtual device
configuration machinery driven by it, and the need for an FDT to pass to
a PowerPC kernel.  The two may be related, but they are not identical.

Let's pretend for a minute the latter need doesn't exist.

QEMU machine configuration wants to be a decorated tree: a tree of named
nodes with named properties.

IEEE 1275 is a standard describing a special kind of decorated tree.
Other kinds can be created with a binding.  If we create a suitable
binding, we can surely cast our configuration trees in the IEEE 1275
framework.

But what would that buy us?  This is an honest question, born out of my
relative ignorance of IEEE 1275.  Mind that we're still busily ignoring
the need for an FDT to pass to a kernel, so "it makes it easier to
create an FDT for the kernel" doesn't count here (it counts elsewhere).

FDTs are a special representation of IEEE 1275 trees in memory, designed
to be compact and relocatable.  But that comes at a price: nodes move
around when the tree changes.  The only real node id is the full name.

This is not the common representation of decorated trees in C programs,
and for a reason.  It's simpler to represent edges as C pointers.  Not
the least advantage of that is notation: "->" beats a function call in
legibility hands down.

Example: the QEMU device data type needs to refer to its device node in
the configuration tree.  If that tree is coded the plain old way, you
store a pointer to the node and follow that.  If it is an FDT, then you
have to store the full node name, and look up the node by name.  I find
that tedious and verbose.
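
The contrast can be sketched as follows.  All names here (struct
cfg_node, cfg_lookup, dev_by_ptr, dev_by_path) are hypothetical and
only model the two styles; this is not libfdt code.  With libfdt the
lookup step would be a call such as fdt_path_offset(), which yields an
offset that has to be looked up again after the tree changes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-linked configuration node. */
struct cfg_node {
    const char *name;
    struct cfg_node *children;  /* first child */
    struct cfg_node *sibling;   /* next child of the same parent */
    int irq;                    /* stand-in for a real property */
};

/* Style 1: the device keeps a pointer to its configuration node. */
struct dev_by_ptr {
    struct cfg_node *node;
};

/* Style 2: the device keeps the full node name and looks it up. */
struct dev_by_path {
    const char *path;
};

/* Resolve a path like "/pci/uart" from the root; NULL if not found. */
static struct cfg_node *cfg_lookup(struct cfg_node *root, const char *path)
{
    struct cfg_node *n = root;

    while (n && *path) {
        struct cfg_node *c;
        size_t len;

        while (*path == '/')
            path++;
        if (!*path)
            break;
        len = strcspn(path, "/");
        for (c = n->children; c; c = c->sibling)
            if (strlen(c->name) == len && memcmp(c->name, path, len) == 0)
                break;
        n = c;          /* NULL when no child matched */
        path += len;
    }
    return n;
}

/* Style 1 access: just follow the pointer. */
static int irq_via_ptr(const struct dev_by_ptr *d)
{
    return d->node->irq;
}

/* Style 2 access: a lookup on every use, plus an error path. */
static int irq_via_path(struct cfg_node *root, const struct dev_by_path *d)
{
    struct cfg_node *n = cfg_lookup(root, d->path);

    return n ? n->irq : -1;
}
```

Both accessors return the same value for the same node; the second
needs the root, a lookup and a failure case every time it is used.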

My point is: the question how to represent our decorated tree in memory
is entirely separate from the question of the tree's structure.  Just
because you want your tree to conform to IEEE 1275 doesn't mean you want
your tree flat at all times.

Now let's examine how QEMU machine configuration and FDT machine
descriptions for kernels are related.

In a way, both can be regarded as copies of a complete machine
description with lots of stuff pruned.  Except the complete machine
description doesn't exist.  Because there is no use for it.

FDT routinely prunes stuff like PCI and USB devices, because those are
better probed.

QEMU configuration should certainly prune everything that is not
actually configurable.

To go from QEMU configuration to FDT we therefore may want to prune
superfluous stuff, to keep it compact, and we definitely have to add lots
of stuff that has no place in configuration.  Compared to that task, a
change of representation seems trivial.  I figure we want to copy the
tree anyway, because we need to edit it pretty drastically.

It's not obvious to me whether it makes sense to create the FDT from the
QEMU configuration automatically.  If we simulate a specific board, the
FDT is pretty fixed, isn't it?  Much of the configurable stuff could be
precisely in those parts that are omitted from FDT: PCI devices and
such.

>> * Provide an example tree describing a bare-bones PC, like the one in my
>>   prototype: CPU, RAM, BIOS, PIC, APIC, IOAPIC, PIT, DMA, UART, parallel
>>   port, floppy controller, CMOS & RTC, a20 gate (port 92) and other
>>   miscellaneous I/O ports, i440fx, PIIX3 (ISA bridge, IDE, USB, ACPI),
>>   Cirrus VGA with BIOS, some PCI NIC.  This gives us all an idea of the
>>   tree structure.  Morphing that into something suitable for QEMU
>>   configuration shouldn't be too hard then, just an exercise in
>>   redecorating the tree.
>
> I don't off hand know any trees for a PC system.  There are a bunch of
> example trees for powerpc systems in arch/powerpc/boot/dts in the
> kernel tree.  A few of those, such as prep, at least have parts which
> somewhat resemble a PC.  I believe the OLPC also has OF; that would be
> an example OF tree for an x86 machine, if not a typical PC.

Could you point me to a specific file?  I grepped for prep and OLPC, no
luck.

>> * Advice as we go.
>
> I'll do what I can.

Thanks in advance!

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
@ 2009-02-13 11:26               ` Markus Armbruster
  0 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-13 11:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss

David Gibson <david@gibson.dropbear.id.au> writes:

> On Thu, Feb 12, 2009 at 11:26:12AM +0100, Markus Armbruster wrote:
>> Hollis Blanchard <hollisb@us.ibm.com> writes:
> [snip]
>> > I won't say IEEE1275 is perfect, but IMHO it would be pretty silly to
>> > reinvent all the design and infrastructure for a similar-but-different
>> > device tree.
>> >
>> > [Patch snipped]
>> 
>> I'm not at all opposed to adapting FDT for QEMU use.  My patch is a
>> prototype, and I'm prepared to throw away some or all of it.
>> 
>> To get this thing started, I wanted working code to demonstrate what I'm
>> talking about.  If I had dug deeper into FDTs first, we would not be
>> talking now.
>> 
>> The task I outlined in my memo involves much more than just coming up
>> with a device tree data structure.  That data structure is to me one
>> detail among many, and a much less hairy one than most others.  It
>> certainly was for the prototype.
>> 
>> If I read the comments correctly (all comments, not just this one), the
>> only real issue with my proposal is you'd rather use FDT for the config
>> tree.  I don't mind, except I don't know enough about that stuff to do
>> it all by myself, at least not in a reasonable time frame.  I think I
>> understand the concepts, can read .dts files with some head-scratching,
>> and I could perhaps even write one if I sacrificed a chicken or two.
>> Designing a binding, however, feels well above my level of
>> (in)competence.
>> 
>> So, to make FDT happen, I need help.  Specifically:
>> 
>> * Point me to the FDT code I'm supposed to integrate.  I'm looking for
>>   basic decorated tree stuff: create trees, traverse them, get and put
>>   properties, add and delete nodes, read and write them as plain,
>>   human-readable text.
>
> dtc and libfdt is a good place to start, if you haven't yet
> investigated them:
> 	git://git.jdl.com/software/dtc.git
> Note that although they're distributed together as one tree, dtc and
> libfdt are essentially independent pieces of software.  dtc converts
> device trees between various formats, dts and dtb in particular.
>
> libfdt does a number of the things you mention with flat trees -
> get/set properties, build trees, traverse etc.  If it doesn't do
> everything you need, we can probably extend it so that it does: I want
> libfdt to be *the* library for manipulating trees in the fdt format.
> It's designed to be easy to embed in other packages for this reason,
> although it does have some usage peculiarities because in particular
> it's possible to integrate into very limited environments like
> firmwares.
>
> [Jon Loeliger is the current maintainer of dtc and libfdt, but I
> originally wrote both of them - I know as much about them as anyone
> does]

Okay, I looked at dtc and libfdt again, a bit more closely.  I'm sure
there's plenty of ignorance left in me, so please correct me when I'm
babbling nonsense.

FDT is a "flattened tree", i.e. a tree data structure laid out in a
block of memory in a clever way to make it compact and easily
relocatable.  I understand why these are important requirements for
passing information through bootloader to kernel.  They're irrelevant,
however, for use as QEMU configuration.

You can identify an FDT node by node offset or node name.  The node
offset can change when you add or delete nodes or properties.

You want everyone to use libfdt for manipulating FDTs.  I think that's
entirely sensible.  What I still don't get is something else: Why use
FDT for QEMU configuration in the first place?  Let me explain.

I think we have two distinct problems: the need for a flexible,
expressive QEMU machine configuration file and a virtual device
configuration machinery driven by it, and the need for an FDT to pass to
a PowerPC kernel.  The two may be related, but they are not identical.

Let's pretend for a minute the latter need doesn't exist.

QEMU machine configuration wants to be a decorated tree: a tree of named
nodes with named properties.

IEEE 1275 is a standard describing a special kind of decorated tree.
Other kinds can be created with a binding.  If we create a suitable
binding, we can surely cast our configuration trees in the IEEE 1275
framework.

But what would that buy us?  This is an honest question, born out of my
relative ignorance of IEEE 1275.  Mind that we're still busily ignoring
the need for an FDT to pass to a kernel, so "it makes it easier to
create an FDT for the kernel" doesn't count here (it counts elsewhere).

FDTs are a special representation of IEEE 1275 trees in memory, designed
to be compact and relocatable.  But that comes at a price: nodes move
around when the tree changes.  The only real node id is the full name.

This is not the common representation of decorated trees in C programs,
and for a reason.  It's simpler to represent edges as C pointers.  Not
the least advantage of that is notation: "->" beats a function call in
legibility hands down.

Example: the QEMU device data type needs to refer to its device node in
the configuration tree.  If that tree is coded the plain old way, you
store a pointer to the node and follow that.  If it is an FDT, then you
have to store the full node name, and look up the node by name.  I find
that tedious and verbose.
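
To make the contrast concrete, here is a minimal sketch in C.  All type
and function names are hypothetical; this is neither libfdt nor QEMU
code, just the two ways a device could refer to its configuration node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decorated-tree node: named, edges are plain C pointers. */
struct node {
    const char *name;
    struct node *child;    /* first child */
    struct node *sibling;  /* next sibling */
};

/* Find a direct child by name; the name is given with explicit length. */
static struct node *find_child(struct node *parent, const char *name,
                               size_t len)
{
    for (struct node *c = parent->child; c; c = c->sibling) {
        if (strlen(c->name) == len && memcmp(c->name, name, len) == 0)
            return c;
    }
    return NULL;
}

/* Resolve a full name such as "/pci/ide" by walking down from the root.
 * A name-keyed representation has to repeat this work on every access. */
static struct node *lookup_path(struct node *root, const char *path)
{
    struct node *n = root;

    assert(root && path && path[0] == '/');
    while (n && *path) {
        while (*path == '/')
            path++;
        size_t len = strcspn(path, "/");
        if (len == 0)
            break;
        n = find_child(n, path, len);
        path += len;
    }
    return n;
}

/* A device that stores a pointer reaches its node with "->". */
struct device_by_ptr  { struct node *conf; };

/* A device that stores the full name must look the node up each time. */
struct device_by_name { const char *conf_path; };
```

With the first variant, reading the node is dev.conf->name.  With the
second, every access becomes lookup_path(root, dev.conf_path), plus a
NULL check in case the name no longer resolves.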

My point is: the question of how to represent our decorated tree in memory
is entirely separate from the question of the tree's structure.  Just
because you want your tree to conform to IEEE 1275 doesn't mean you want
your tree flat at all times.

Now let's examine how QEMU machine configuration and FDT machine
descriptions for kernels are related.

In a way, both can be regarded as copies of a complete machine
description with lots of stuff pruned.  Except the complete machine
description doesn't exist.  Because there is no use for it.

FDT routinely prunes stuff like PCI and USB devices, because those are
better probed.

QEMU configuration should certainly prune everything that is not
actually configurable.

To go from QEMU configuration to FDT we therefore may want to prune
superfluous stuff, to keep it compact, and we definitely have to add lots
of stuff that has no place in configuration.  Compared to that task, a
change of representation seems trivial.  I figure we want to copy the
tree anyway, because we need to edit it pretty drastically.

It's not obvious to me whether it makes sense to create the FDT from the
QEMU configuration automatically.  If we simulate a specific board, the
FDT is pretty fixed, isn't it?  Much of the configurable stuff could be
precisely in those parts that are omitted from FDT: PCI devices and
such.

>> * Provide an example tree describing a bare-bones PC, like the one in my
>>   prototype: CPU, RAM, BIOS, PIC, APIC, IOAPIC, PIT, DMA, UART, parallel
>>   port, floppy controller, CMOS & RTC, a20 gate (port 92) and other
>>   miscellaneous I/O ports, i440fx, PIIX3 (ISA bridge, IDE, USB, ACPI),
>>   Cirrus VGA with BIOS, some PCI NIC.  This gives us all an idea of the
>>   tree structure.  Morphing that into something suitable for QEMU
>>   configuration shouldn't be too hard then, just an exercise in
>>   redecorating the tree.
>
> I don't off hand know any trees for a PC system.  There are a bunch of
> example trees for powerpc systems in arch/powerpc/boot/dts in the
> kernel tree.  A few of those, such as prep, at least have parts which
> somewhat resemble a PC.  I believe the OLPC also has OF; that would be
> an example OF tree for an x86 machine, if not a typical PC.

Could you point me to a specific file?  I grepped for prep and OLPC, no
luck.

>> * Advice as we go.
>
> I'll do what I can.

Thanks in advance!

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13 11:26               ` Markus Armbruster
@ 2009-02-13 12:06                   ` Paul Brook
  -1 siblings, 0 replies; 146+ messages in thread
From: Paul Brook @ 2009-02-13 12:06 UTC (permalink / raw)
  To: qemu-devel-qX2TKyscuCcdnm+yROfE0A
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A, Markus Armbruster

> Now let's examine how QEMU machine configuration and FDT machine
> descriptions for kernels are related.
>
> In a way, both can be regarded as copies of a complete machine
> description with lots of stuff pruned.  Except the complete machine
> description doesn't exist.  Because there is no use for it.
>
> FDT routinely prunes stuff like PCI and USB devices, because those are
> better probed.
>
> QEMU configuration should certainly prune everything that is not
> actually configurable.

I'm not sure I agree here, or at least we may be talking past each other.

IMHO the machine config should specify all the bits of the machine that don't 
really want to be exposed to the average user. e.g. the memory layout and 
interrupt routings, etc. We then have a separate user config file (possibly
structured differently) which exposes things like host bindings for disks and 
network devices.

It's all a bit muddy because the current command line options affect both the
devices present and the host bindings for the corresponding interfaces. While 
this seems like a good idea to start with, I'm not convinced this is actually 
a desirable feature.  Certainly for embedded machines you want a fixed set of 
hardware. e.g. if we have a SoC with 3 UARTs we should always create those 3 
devices, and it's not meaningful to have more. If the user doesn't specify 
sufficient -serial options then the remainder just get connected 
to /dev/null. Likewise there's a good argument for having the vlan and disc 
configuration be separate from creation of the NIC/HBA devices.
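
The UART case could look roughly like the following sketch.  The names
(SOC_NUM_UARTS, soc_bind_uarts, the backend strings) are assumptions made
for illustration, not existing QEMU interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define SOC_NUM_UARTS 3   /* fixed by the (hypothetical) SoC, not the user */

/* Backend used for every UART the user did not bind to anything. */
static const char UART_BACKEND_NULL[] = "null";

struct uart {
    const char *backend;  /* host side of the device, e.g. "stdio" */
};

/* The machine config always creates all UARTs.  The user config only
 * decides what each one is connected to: the first n_serial devices get
 * the user's backends, the remainder get the null backend.  Surplus
 * user options are simply ignored in this sketch. */
static void soc_bind_uarts(struct uart uarts[SOC_NUM_UARTS],
                           const char *const *serial_opts, size_t n_serial)
{
    assert(uarts != NULL);
    for (size_t i = 0; i < SOC_NUM_UARTS; i++) {
        if (serial_opts && i < n_serial)
            uarts[i].backend = serial_opts[i];
        else
            uarts[i].backend = UART_BACKEND_NULL;
    }
}
```

The number of devices never depends on how many -serial options were
given; only the host bindings do.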

One possibility is that it might actually make more sense to specify 
hot-pluggable devices (e.g. PCI and USB) in a similar way that they would be
added at runtime, rather than trying to force them into a static tree.

My implementation focused on just the machine config, mostly ignoring the user
config and host bindings.

Paul

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13 12:06                   ` Paul Brook
@ 2009-02-13 12:48                       ` Markus Armbruster
  -1 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-13 12:48 UTC (permalink / raw)
  To: qemu-devel-qX2TKyscuCcdnm+yROfE0A
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A

Paul Brook <paul-qD8j1LwMmJjtCj0u4l0SBw@public.gmane.org> writes:

>> Now let's examine how QEMU machine configuration and FDT machine
>> descriptions for kernels are related.
>>
>> In a way, both can be regarded as copies of a complete machine
>> description with lots of stuff pruned.  Except the complete machine
>> description doesn't exist.  Because there is no use for it.
>>
>> FDT routinely prunes stuff like PCI and USB devices, because those are
>> better probed.
>>
>> QEMU configuration should certainly prune everything that is not
>> actually configurable.
>
> I'm not sure I agree here, or at least we may be talking past each other.

That could be my fault; I guess I didn't express myself clearly.  What
"configurable" means depends on your point of view.

One point of view is assembling pieces of QEMU functionality into a
virtual machine type.  You call that "machine config" below.

Another point of view is configuring a specific virtual machine, based
on a virtual machine type.  I think you call that "user config" below.

In my view, we start with static machine configuration, which we then
modify according to the user's wishes.  The result then drives the
construction of the virtual machine.
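
A minimal sketch of that two-stage flow, with hypothetical names
throughout.  It assumes, purely for illustration, that user config may
only change properties the machine type already defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct prop {
    const char *name;
    const char *value;
};

/* One node of the static machine configuration. */
struct conf_node {
    const char *name;
    struct prop *props;
    size_t n_props;
};

/* Look up a property value; returns NULL if the node has no such property. */
static const char *conf_get(const struct conf_node *node, const char *name)
{
    assert(node && name);
    for (size_t i = 0; i < node->n_props; i++) {
        if (strcmp(node->props[i].name, name) == 0)
            return node->props[i].value;
    }
    return NULL;
}

/* Apply one user wish to the static configuration.  Returns 0 on
 * success, -1 if the machine type does not define that property. */
static int conf_override(struct conf_node *node, const char *name,
                         const char *value)
{
    assert(node && name && value);
    for (size_t i = 0; i < node->n_props; i++) {
        if (strcmp(node->props[i].name, name) == 0) {
            node->props[i].value = value;
            return 0;
        }
    }
    return -1;
}
```

The machine builder then reads only the edited tree, via conf_get(), and
never looks at command line options directly.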

My prototype has the static machine configuration compiled in, but I
think it belongs in a config file.

A config file would also be convenient for users.  I guess we'll also
want to support existing command line options, at least for some time.

I'm arguing that both the static machine configuration and the final
configuration (after user config is edited in) lack stuff that needs to
be put into the FDT for the kernel.

Hypothetical example: say the kernel needs to know exactly how the
interrupts are wired.  But QEMU can wire the interrupts just one way,
the way it has always wired them.  What's the point in putting that way
into the machine configuration?  Verifying that whatever is there
matches reality is no less work than generating the information from
scratch, isn't it?

> IMHO the machine config should specify all the bits of the machine that don't 
> really want to be exposed to the average user. e.g. the memory layout and 
> interrupt routings, etc. We then have a separate user config file (possibly
> structured differently) which exposes things like host bindings for disks and 
> network devices.
>
> It's all a bit muddy because the current command line options affect both the
> devices present and the host bindings for the corresponding interfaces. While 
> this seems like a good idea to start with, I'm not convinced this is actually 
> a desirable feature.  Certainly for embedded machines you want a fixed set of 
> hardware. e.g. if we have a SoC with 3 UARTs we should always create those 3 
> devices, and it's not meaningful to have more. If the user doesn't specify 
> sufficient -serial options then the remainder just get connected 
> to /dev/null. Likewise there's a good argument for having the vlan and disc 
> configuration be separate from creation of the NIC/HBA devices.
>
> One possibility is that it might actually make more sense to specify 
> hot-pluggable devices (e.g. PCI and USB) in a similar way that they would be
> added at runtime, rather than trying to force them into a static tree.
>
> My implementation focused on just the machine config, mostly ignoring the user
> config and host bindings.
>
> Paul

I'm still reading your code.  It would help if you could provide a
Makefile patch that let me actually build it.

My initial impression is that we both approached the problem from
different directions, yet converged on fairly similar solutions.  Each
of us covers stuff the other glossed over.  More later.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13 12:48                       ` Markus Armbruster
@ 2009-02-13 13:33                           ` Paul Brook
  -1 siblings, 0 replies; 146+ messages in thread
From: Paul Brook @ 2009-02-13 13:33 UTC (permalink / raw)
  To: qemu-devel-qX2TKyscuCcdnm+yROfE0A
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A, Markus Armbruster

> Hypothetical example: say the kernel needs to know exactly how the
> interrupts are wired.  But QEMU can wire the interrupts just one way,
> the way it has always wired them.  What's the point in putting that way
> into the machine configuration?  Verifying that whatever is there
> matches reality is no less work than generating the information from
> scratch, isn't it?

Much of the reason for having a machine config is that it allows control over 
things like interrupt routing. Particularly for embedded machines, it's 
common to have a variety of different machines all using the same components, 
but varying in how those components are connected. For example the ARM 
Integrator, Versatile, Realview and Luminary Stellaris boards are all based 
on approximately the same basic set of devices (the ARM PrimeCell SoC 
peripherals), just with different memory maps and interrupt topologies. I 
suspect the same is true for many of the PPC, SH4 and ColdFire boards, and 
probably the different SPARC sun4m/sun4u variants.

Most of the infrastructure to do modular machine construction is already there
in qemu, it's just currently driven by hardcoded C QEMUMachineInitFunc rather 
than a runtime config.

I guess that's where I see the distinction. Roughly speaking the "machine 
config" replaces pc.c:pc_init1, and the "user config" replaces a lot of the 
goo in vl.c:main, drive_init, etc.
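
As a rough sketch of what "same components, different wiring" looks like
once it is data: the board names, addresses and IRQ numbers below are
made up for illustration and do not describe any real board.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One device instance as the machine config would describe it. */
struct dev_desc {
    const char *type;   /* device model to instantiate */
    uint64_t base;      /* MMIO base address */
    int irq;            /* input line on the board's interrupt controller */
};

struct board_desc {
    const char *name;
    const struct dev_desc *devs;
    size_t n_devs;
};

/* Two hypothetical boards built from the same device models. */
static const struct dev_desc board_a_devs[] = {
    { "uart",  0x10000000, 1 },
    { "timer", 0x10001000, 5 },
};

static const struct dev_desc board_b_devs[] = {
    { "uart",  0x40000000, 12 },
    { "timer", 0x40010000, 4 },
};

static const struct board_desc boards[] = {
    { "board-a", board_a_devs, 2 },
    { "board-b", board_b_devs, 2 },
};

/* Return the first device of the given type, or NULL if the board has none. */
static const struct dev_desc *board_find_dev(const struct board_desc *b,
                                             const char *type)
{
    assert(b && type);
    for (size_t i = 0; i < b->n_devs; i++) {
        if (strcmp(b->devs[i].type, type) == 0)
            return &b->devs[i];
    }
    return NULL;
}
```

Adding a third board variant means adding another table, not another
hardcoded init function.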

Paul

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13 13:33                           ` Paul Brook
@ 2009-02-13 14:13                               ` Markus Armbruster
  -1 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-13 14:13 UTC (permalink / raw)
  To: Paul Brook
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A

Paul Brook <paul-qD8j1LwMmJjtCj0u4l0SBw@public.gmane.org> writes:

>> Hypothetical example: say the kernel needs to know exactly how the
>> interrupts are wired.  But QEMU can wire the interrupts just one way,
>> the way it has always wired them.  What's the point in putting that way
>> into the machine configuration?  Verifying that whatever is there
>> matches reality is no less work than generating the information from
>> scratch, isn't it?
>
> Much of the reason for having a machine config is that it allows control over 
> things like interrupt routing. Particularly for embedded machines, it's 
> common to have a variety of different machines all using the same components, 
> but varying in how those components are connected. For example the ARM 
> Integrator, Versatile, Realview and Luminary Stellaris boards are all based 
> on approximately the same basic set of devices (the ARM PrimeCell SoC 
> peripherals), just with different memory maps and interrupt topologies. I 
> suspect the same is true for many of the PPC, SH4 and ColdFire boards, and 
> probably the different SPARC sun4m/sun4u variants.

We make stuff configurable in QEMU when we need it more than one way,
while the kernel wants to see configuration whenever it could conceivably
exist in more than one way.

> Most of the infrastructure to do modular machine construction is already there
> in qemu, it's just currently driven by hardcoded C QEMUMachineInitFunc rather 
> than a runtime config.
>
> I guess that's where I see the distinction. Roughly speaking the "machine 
> config" replaces pc.c:pc_init1, and the "user config" replaces a lot of the 
> goo in vl.c:main, drive_init, etc.
>
> Paul

Not that I disagree with that.

Look, my goals are rather modest.  I want to start where we are, put
devices behind a nice abstract interface one by one, picking apart the
pc.c hairball on the way.  The idea is not to design the perfect,
all-encompassing abstract device interface, just to capture what we
need, and extend as we go.  The abstract device interface makes a simple
machine builder possible, driven by tree-structured configuration.  That
in turn makes it easier to make things configurable.  Which can be
expected to lead to more configurability, when and where there's a need
for it.
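
A minimal sketch of that idea, assuming a hypothetical interface with a
single init callback per device type.  The names are illustrative and do
not match the prototype or current QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tree-structured configuration: each node names the device to create. */
struct conf_node {
    const char *driver;
    struct conf_node *child;    /* first child */
    struct conf_node *sibling;  /* next sibling */
};

/* The abstract device interface: just enough to build a machine. */
struct device_type {
    const char *name;
    int (*init)(const struct conf_node *conf);  /* returns 0 on success */
};

static int built;  /* counts initialized devices, for demonstration only */

static int init_counting(const struct conf_node *conf)
{
    (void)conf;
    built++;
    return 0;
}

/* Registry of known device types; real devices would register their own. */
static const struct device_type device_types[] = {
    { "isa-bus", init_counting },
    { "uart",    init_counting },
};

static const struct device_type *find_type(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(device_types) / sizeof(device_types[0]); i++) {
        if (strcmp(device_types[i].name, name) == 0)
            return &device_types[i];
    }
    return NULL;
}

/* The machine builder: walk the tree depth-first, parents before
 * children.  Returns 0 on success, -1 on an unknown or failing device. */
static int build_machine(const struct conf_node *n)
{
    for (; n; n = n->sibling) {
        const struct device_type *t = find_type(n->driver);

        if (!t || t->init(n) != 0)
            return -1;
        if (build_machine(n->child) != 0)
            return -1;
    }
    return 0;
}
```

Converting one more device then means adding one more entry to the
registry; the builder itself does not change.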

All this can be done in nice, safe baby steps.  I don't need to come up
with an all-singing machine description fit for a picky kernel before I
can start doing something useful.

Now, if you hand me such a configuration on a platter, I'd be a fool not
to take it.  The catch: I need one for a PC.

I believe there's significant overlap in what we two want to accomplish.
We just come from different directions, with somewhat differing
priorities.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13 14:13                               ` Markus Armbruster
@ 2009-02-13 14:25                                   ` Paul Brook
  -1 siblings, 0 replies; 146+ messages in thread
From: Paul Brook @ 2009-02-13 14:25 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A

> Look, my goals are rather modest.  I want to start where we are, put
> devices behind a nice abstract interface one by one, picking apart the
> pc.c hairball on the way.  The idea is not to design the perfect,
> all-encompassing abstract device interface, just to capture what we
> need, and extend as we go.  The abstract device interface makes a simple
> machine builder possible, driven by tree-structured configuration.  That
> in turn makes it easier to make things configurable.  Which can be
> expected to lead to more configurability, when and where there's a need
> for it.
>
> All this can be done in nice, safe baby steps.  I don't need to come up
> with an all-singing machine description fit for a picky kernel before I
> can start doing something useful.
>
> Now, if you hand me such a configuration on a platter, I'd be a fool not
> to take it.  The catch: I need one for a PC.

I suspect these two goals may be contradictory. The PC machine is so hairy 
that you need a singing, dancing machine description to be able to describe 
it.

OTOH if what you really want to do is configure the host binding side of 
things, then as I've mentioned before, I see that as being somewhat separate
from the actual machine creation, and trying to combine the two is probably a 
mistake. I really don't want users to have to hack the machine config just to 
change the name of an image file.

Paul

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [RFC] Machine description as data
  2009-02-13  0:32                       ` Carl-Daniel Hailfinger
@ 2009-02-13 14:32                         ` Lennart Sorensen
  -1 siblings, 0 replies; 146+ messages in thread
From: Lennart Sorensen @ 2009-02-13 14:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss, Paul Brook, hollisb

On Fri, Feb 13, 2009 at 01:32:19AM +0100, Carl-Daniel Hailfinger wrote:
> Point taken.
> 
> If the firmware doesn't set up the things which can't be probed, can it
> even be called firmware or is it more like a glorified bootloader?
> 
> Ouch. I always thought turning on all the RAM was either a hardware (old
> x86) or firmware (modern x86) task.

Certainly on a lot of embedded systems, the bootloader is the firmware and
is responsible for a lot of hardware configuration, although often the
operating system also does a lot.  Most x86 systems configure devices on
the PCI bus fairly sensibly at power on before the OS starts.  On an ARM
system, the Linux kernel has to do all the PCI bus enumeration and
configuration since there is generally no PCI handling in the
firmware/boot loader on such a system.  Of course embedded x86 systems
sometimes also don't do everything you want, in which case fixing it in
the boot loader is handy to avoid messing too much with the kernel.

> I'm a bit surprised by the lack of automatically detectable features in
> embedded systems. Wouldn't automatic detection allow reusing whole OS
> images on slightly different systems and thus lower development cost?

There is the problem of detecting what a given GPIO line does.  Use of
GPIO lines is very common on embedded systems, and the use is almost
always custom to each device.

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13 14:25                                   ` Paul Brook
@ 2009-02-13 15:47                                       ` Jamie Lokier
  -1 siblings, 0 replies; 146+ messages in thread
From: Jamie Lokier @ 2009-02-13 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss, Markus Armbruster

Paul Brook wrote:
> > Now, if you hand me such a configuration on a platter, I'd be a fool not
> > to take it.  The catch: I need one for a PC.
> 
> I suspect these two goals may be contradictory. The PC machine is so hairy 
> that you need a singing, dancing machine description to be able to describe 
> it.

About 8 months ago I wanted a QEMU PC, but with a PIIX4 IDE controller
instead of the PIIX3 IDE controller that hw/pc.c binds.  (That's
needed to allow a Windows 2000 guest imported from Virtual PC to boot
without a blue screen.)

It would have been handy to have an option "-drive if=ide,hw=piix4" or
similar, but considering the obscure reasons for it, I'd have been
happy with a machine configuration file where I could edit the type of
attached device.

By the way, Virtual PC has an XML configuration file which describes
the machine it's emulating in some detail, including device serial
numbers and such.  Is it worth a look?

> OTOH if what you really want to do is configure the host binding
> side of things, then as I've mentioned before, I see that as being
> somewhat separate from the actual machine creation, and trying to
> combine the two is probably a mistake. I really don't want users to
> have to hack the machine config just to change the name of an image
> file.

I agree that host binding is separate, but they're related.

"Placeholders" in the machine config for where particular command line
input can modify the config, with defaults, would be nice.

In the case of a disk image file, the machine config would default to
"no image file" (no disk present), with a placeholder indicating that
"-drive if=ide,index=0 affects this node" or similar.

For some devices in the machine config, if the setting is "no image
file" or "no terminal attached", the config may say that the device
itself is to be omitted.  This would apply to hard disks, since a hard
disk can't be present without its media.

For other devices in the machine config, the config may say the device
should be present but does nothing.  This would apply for those SoC
emulations which always have 3 UARTs, for example.  If the command
line doesn't attach those UARTs to something, the machine config would
still cause the UARTs to be present.  On a PC, you might always
include an emulated floppy drive, even if no floppy options are
included on the command line - unless "-drive
if=floppy,index=0,disabled" is passed on the command line perhaps.
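
A minimal sketch of the placeholder idea in C.  Every name in it (struct
cfg_node, cfg_bind, cfg_present, the placeholder key strings) is made up
for illustration and is not existing QEMU code.  Each node names the
command line key that may fill it in, defaults to "nothing attached", and
carries a policy saying whether an unbound device is omitted (hard disk)
or kept but idle (SoC UART).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What to do with a device whose placeholder was never filled in. */
enum unbound_policy {
    OMIT_IF_UNBOUND,   /* e.g. a hard disk: no image, no device */
    KEEP_IF_UNBOUND    /* e.g. an SoC UART: present, attached to nothing */
};

/* One device node in a hypothetical machine config. */
struct cfg_node {
    const char *name;          /* node name, e.g. "disk0" */
    const char *placeholder;   /* command line key that may fill it in */
    const char *host_binding;  /* NULL is the default: nothing attached */
    enum unbound_policy policy;
};

/* Fill the node whose placeholder matches key; returns 1 if one matched. */
int cfg_bind(struct cfg_node *nodes, size_t n,
             const char *key, const char *value)
{
    assert(nodes != NULL && key != NULL);
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].placeholder != NULL &&
            strcmp(nodes[i].placeholder, key) == 0) {
            nodes[i].host_binding = value;
            return 1;
        }
    }
    return 0;
}

/* Should this device exist in the machine that gets built? */
int cfg_present(const struct cfg_node *node)
{
    assert(node != NULL);
    return node->host_binding != NULL || node->policy == KEEP_IF_UNBOUND;
}
```

With this shape the command line parser only needs to know placeholder
keys, not the layout of the machine config itself.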

I expect this can fit into any of the machine config syntaxes and tree
types which have been discussed.  It would be nice to have a generic
command line option which can modify any part of the machine config
tree too, but not necessary.

If the machine config syntax is human friendly enough, it may be
possible for host binding config to use the same syntax, instead of
copying a machine config and editing a small section when command line
options aren't detailed enough.

-- Jamie

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [coreboot] DTS syntax and DTC patches (was: Re: [Qemu-devel] [RFC] Machine description as data)
       [not found]                               ` <20090213025101.GC10476-787xzQ0H9iRg7VrjXcPTGA@public.gmane.org>
@ 2009-02-13 17:07                                 ` ron minnich
  0 siblings, 0 replies; 146+ messages in thread
From: ron minnich @ 2009-02-13 17:07 UTC (permalink / raw)
  To: Carl-Daniel Hailfinger, Coreboot, Markus Armbruster,
	Hollis Blanchard, devicetree-d

[-- Attachment #1: Type: text/plain, Size: 4772 bytes --]

Here is the sum total of the differences from when we checked it in
over 2 years ago until now (parser). Our real changes are to
flattree.c and livetree.c, where we do some ugly by-hand parsing of
the ids such that pci@1,0 etc. work. I'd love to see a way to bring
this into the real syntax. I've tried to do as little as possible to
.y and .l.

The diff with comments is attached.

But this brings up a bigger issue and we could use your help.

OK, what did we do? We implemented the ability to have a sort of
template. Here is a sample from real use.

/{
	mainboard_vendor = "Artec";
	mainboard_name = "DBE62";
	cpus { };
	apic@0 {
		/config/("northbridge/amd/geodelx/apic");
	};
	domain@0 {
		/config/("northbridge/amd/geodelx/domain");
		pci@1,0 {
			/config/("northbridge/amd/geodelx/pci");
			/* Video RAM has to be in 2MB chunks. */
			geode_video_mb = "16";
		};
	etc.

so what's going on here?

The config file in most cases is pretty straightforward. It's actually
just a list of properties with a standard setting for chip control. We
MUST have this; we don't want hundreds of settings in each mainboard,
because sometimes a chip fix comes along and we want that to go into
one chip file, and set the correct value, and have all mainboards get
the new value next time they are built.

Let's look at /config/("northbridge/amd/geodelx/pci");

{
	device_operations = "geodelx_mc";

	/* Video RAM has to be in 2MB chunks. */
	geode_video_mb = "0";
};

The device_operations property is processed by flattree and is of no
importance to you, but it is used in coreboot .h and .c code
generation. For coreboot use, we have several property names that are
special.

Note that we create a property, geode_video_mb, and set it to 0.

In the mainboard dts, we over-ride this value, and set it to 16.
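
The override step can be sketched in plain C.  This is not the code in
flattree.c; struct prop, prop_get and props_override are hypothetical
names, and skipping mainboard properties that the chip file does not
define is an assumption of the sketch rather than documented coreboot
behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A name/value pair as it might be read from a chip or mainboard file. */
struct prop {
    const char *name;
    const char *value;
};

/* Return the value of the named property, or NULL if it is not set. */
const char *prop_get(const struct prop *props, size_t n, const char *name)
{
    assert(props != NULL && name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(props[i].name, name) == 0)
            return props[i].value;
    return NULL;
}

/*
 * Apply mainboard settings on top of the chip defaults, in place.
 * Returns how many defaults were overridden.  Mainboard properties the
 * chip file does not define are skipped (an assumption of this sketch).
 */
size_t props_override(struct prop *chip, size_t nchip,
                      const struct prop *board, size_t nboard)
{
    size_t applied = 0;

    assert(chip != NULL && board != NULL);
    for (size_t b = 0; b < nboard; b++) {
        for (size_t c = 0; c < nchip; c++) {
            if (strcmp(chip[c].name, board[b].name) == 0) {
                chip[c].value = board[b].value;
                applied++;
                break;
            }
        }
    }
    return applied;
}
```

Because the defaults live in one chip file, a chip fix changes a single
table and every mainboard that does not override the value picks it up.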

These are pretty much the changes and, again, they work. But I'd like
more, as would our community.

Right now, we can take a file containing a list of dts properties,
read them in, and modify them as above. It's not really ideal, and I
am sure you can already see it could be done better. But what we
really want is the ability to read in a dts node (with subnodes,
etc.) and then elide parts of it in the mainboard file.

So, for example, we have this subsection of one mainboard:

		pci@6{ /* Port 2 */
			/config/("southbridge/amd/rs690/pcie.dts");
		};
		pci@7{ /* Port 3 */
			/config/("southbridge/amd/rs690/pcie.dts");
		};
		pci@12{
			/config/("southbridge/amd/sb600/hda.dts");
		};
		pci@13,0{
			/config/("southbridge/amd/sb600/usb.dts");
		};
		pci@13,1{
			/config/("southbridge/amd/sb600/usb.dts");
		};
		pci@13,2{
			/config/("southbridge/amd/sb600/usb.dts");
		};

This is not a bunch of chips, but one chip. It has lots of pci devices
in it; this one chip is equivalent to a whole mainboard from previous
years. What we'd really like is the ability to do what my wife calls
restrict, add, and remove (I don't have these terms just right, it's
some kind of compiler-speak which is what she does for a living).

Restrict we have; change property values from a default.
Add is what we'd like: add a node to a tree in some way.
Remove we would also like: remove a node from a dts we have read in
via /config/.

Note that the syntax is only suggested here; the right way to do this
is up to you experts.

/{
	device_operations="dbm690t";
	mainboard_vendor = "AMD";
	mainboard_name = "dbm690t";
	cpus { };
	apic@0 {
	};
	domain@0 /config/("northbridge/amd/k8/domain") = {
		pci@1,0 /config/("southbridge/amd/sb600/dts") = {
			/* change default xyz to "1" */
			xyz = "1";
			/* disable pcie port 6. Note this is over-riding
			 * a value in a node. This is new. */
			pcie@1,0{disable;};
			/* don't even put port 7 in the tree -- what is a
			 * remove going to look like? Also new. */
			- pcie@5,0;
			/* add the superio; default values are
			 * acceptable. Also new. We can't add nodes. */
			pnp@2e /config/("superio/winbond/1234");
		};
	};
};

The result would be more compact files and easier maintenance.
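
Whatever syntax is chosen, the add and remove operations asked for above
come down to two small edits on the in-memory tree.  A minimal sketch in
C, assuming a simplified node type: struct dt_node, dt_find, dt_add and
dt_remove are hypothetical and are not dtc's livetree.c API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8

/* Simplified tree node: a name plus a fixed-size child array. */
struct dt_node {
    const char *name;                       /* e.g. "pcie@5,0" */
    struct dt_node *child[MAX_CHILDREN];
    size_t nchild;
};

/* Return the direct child called name, or NULL if there is none. */
struct dt_node *dt_find(struct dt_node *parent, const char *name)
{
    assert(parent != NULL && name != NULL);
    for (size_t i = 0; i < parent->nchild; i++)
        if (strcmp(parent->child[i]->name, name) == 0)
            return parent->child[i];
    return NULL;
}

/* Add: attach node under parent.  Returns -1 if full or name is taken. */
int dt_add(struct dt_node *parent, struct dt_node *node)
{
    assert(parent != NULL && node != NULL);
    if (parent->nchild == MAX_CHILDREN || dt_find(parent, node->name))
        return -1;
    parent->child[parent->nchild++] = node;
    return 0;
}

/* Remove: unlink the child called name.  Returns -1 if it is not there. */
int dt_remove(struct dt_node *parent, const char *name)
{
    assert(parent != NULL && name != NULL);
    for (size_t i = 0; i < parent->nchild; i++) {
        if (strcmp(parent->child[i]->name, name) == 0) {
            /* Close the gap so the remaining children keep their order. */
            for (size_t j = i + 1; j < parent->nchild; j++)
                parent->child[j - 1] = parent->child[j];
            parent->nchild--;
            return 0;
        }
    }
    return -1;
}
```

In this picture "- pcie@5,0;" would map to dt_remove on the node read in
via /config/, and the pnp@2e line to dt_add.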

I realize these changes may be too large for the dts to take on; there
is an ongoing discussion as to whether some other language might not
be more appropriate.

But, at the same time, people are comfortable with dts and have found
it easy to use.

I'd like to thank you for this excellent tool. It is being used to
build production BIOSes that are shipping in products as I write this.
It really saved us a lot of work on coreboot v3 and it is a much
better job, certainly, than I could have done myself.

Thanks, and I hope we can discuss this and work together.

ron

[-- Attachment #2: dtcdiff --]
[-- Type: application/octet-stream, Size: 2999 bytes --]

Index: dtc-parser.y
===================================================================
--- dtc-parser.y	(.../LinuxBIOSv3/util/dtc/dtc-parser.y)	(revision 2)
+++ dtc-parser.y	(.../coreboot-v3/util/dtc/dtc-parser.y)	(working copy)
@@ -23,7 +23,6 @@
 
 %{
 #include "dtc.h"
-
 int yylex (void);
 void yyerror (char const *);
 
@@ -46,7 +45,7 @@
 	struct reserve_info *re;
 }
 
-%token DT_MEMRESERVE
+%token <str> DT_MEMRESERVE

>>>>>>>>>>We made this a string. It's never been of any use to us; you might consider whether the concept is obsolete; what's it for?


 %token <addr> DT_ADDR
 %token <str> DT_PROPNAME
 %token <str> DT_NODENAME
@@ -56,6 +55,8 @@
 %token <str> DT_UNIT
 %token <str> DT_LABEL
 %token <str> DT_REF
+%token <str> DT_FILENAME
+%token <proplist> DT_CONFIG


We implemented a "sort of" include. More in the note. You may not like it, the .l is also attached. We needed these two tokens for it. 


 
 %type <data> propdata
 %type <re> memreserve
@@ -68,14 +69,18 @@
 %type <node> devicetree
 %type <node> nodedef
 %type <node> subnode
+%type <proplist> config

This is for extensions/changes to the property list. More in the email. 


 %type <nodelist> subnodes
 %type <str> label
 %type <str> nodename
-
+%type <data> includepath
+%type <data> structname
 %%
 
-sourcefile:	memreserves devicetree {
-			the_boot_info = build_boot_info($1, $2);
+/*sourcefile:	memreserves devicetree {*/
+sourcefile:	devicetree {
+/*			the_boot_info = build_boot_info($1, $2);*/
+			the_boot_info = build_boot_info(0, $1);
 		}
 	;
 
@@ -100,8 +105,8 @@
 		}
 	;
 
-nodedef:	'{' proplist subnodes '}' ';' {
-			$$ = build_node($2, $3);
+nodedef:	'{' config proplist subnodes '}' ';' {
+			$$ = build_node($2, $3, $4);
 		}
 	;
 
@@ -113,6 +118,49 @@
 		}
 	;
 
+config:        DT_CONFIG '(' 
+		includepath {
+			void switchin(FILE *f);
+			FILE *f;
+			/* The need for a cast here is silly */
+			char *name = (char *)$3.val;
+
+			/* TODO: keep track of which of these we have read in. If we have already done it, then 
+			  * don't do it twice. 
+			  */
+			f  = fopenfile(name);
+			if (! f){
+				perror(name);
+				exit(1);
+			}
+			switchin(f);
+		}  '{' proplist '}' ';' {
+			void	switchback(void);
+			switchback();
+			
+		}
+		')' ';' {
+				int namelen;
+				char *name = strdup((char *)$3.val);
+				/* convention: first property is labeled with path */
+				$6->label = name;
+
+				/* convention: if it ends in .dts, strip that off	*/
+				namelen = strlen($6->label);
+				if ((namelen > 4) && (! strncmp(&name[namelen-4], ".dts", 4)))
+					$6->label[namelen-4] = '\0';
+
+				$$ = $6;
+			}
+	|
+	;
+
+includepath:	DT_STRING  { $$ = $1; }
+	;
+
+structname: 	DT_FILENAME {$$ = $1; }
+	;
+

The actual config command. What happens here is that if you have a config, the config properties are attached to tree->config. 

in flattree.c we resolve the property names. More in the mail. 



 propdef:	label DT_PROPNAME '=' propdata ';' {
 			$$ = build_property($2, $4, $1);
 		}

[-- Attachment #3: dtc-diff.l --]
[-- Type: application/octet-stream, Size: 2222 bytes --]

Index: dtc-lexer.l
===================================================================
--- dtc-lexer.l	(.../LinuxBIOSv3/util/dtc/dtc-lexer.l)	(revision 2)
+++ dtc-lexer.l	(.../coreboot-v3/util/dtc/dtc-lexer.l)	(working copy)
@@ -23,6 +23,7 @@
 %x CELLDATA
 %x BYTESTRING
 %x MEMRESERVE
+%x PASSTHROUGH
 
 PROPCHAR	[a-zA-Z0-9,._+*#?-]
 UNITCHAR	[0-9a-f,]
@@ -35,16 +36,33 @@
 
 #include "dtc-parser.tab.h"
 
-/*#define LEXDEBUG	1*/
-
 #ifdef LEXDEBUG
 #define DPRINT(fmt, ...)	fprintf(stderr, fmt, ##__VA_ARGS__)
 #else
 #define DPRINT(fmt, ...)	do { } while (0)
 #endif
 
+char *code = 0;
 
+YY_BUFFER_STATE bstack = NULL;
+int line;
 
+void
+switchin(FILE *f){
+	YY_BUFFER_STATE b;
+	bstack = YY_CURRENT_BUFFER;
+	b = yy_create_buffer(f, 8192);
+	line = yylineno;
+	yylineno = 1;
+	yy_switch_to_buffer(b);
+	
+}
+
+void
+switchback(void){
+		yy_switch_to_buffer(bstack);
+		yylineno = line;
+}


This is for switching input via /config/("filename");

 %}
 
 %%
@@ -58,6 +76,22 @@
 			return DT_STRING;
 		}
 
+^%%\n	{
+			DPRINT("Begin passthrough\n");
+			/* let's be stupid. 1 MB is way more than enough ... */
+			code = malloc(1048576);
+			*code = 0;
+			BEGIN(PASSTHROUGH);
+		}
+
+<PASSTHROUGH>.*	{
+					DPRINT("Matching in passthrough %s\n", yytext);
+					/* you tell me why echo does not work */
+					/*ECHO;*/
+					strcat(code, yytext);
+					strcat(code, "\n");
+				}
+
AFAIK the passthrough stuff is dead and no longer used. This would be removed. 

 "/memreserve/"	{
 			yylloc.first_line = yylineno;
 			DPRINT("Keyword: /memreserve/\n");
@@ -67,7 +101,7 @@
 
 <MEMRESERVE>[0-9a-fA-F]+ {
 			yylloc.first_line = yylineno;
-			if (yyleng > 2*sizeof(yylval.addr)) {
+			if ((unsigned long)yyleng > 2*sizeof(yylval.addr)) {
 				fprintf(stderr, "Address value %s too large\n",
 					yytext);
 			}
@@ -84,9 +118,15 @@
 			return ';';
 		}
 
+"/config/"	{
+			yylloc.first_line = yylineno;
+			DPRINT("Keyword: /config/\n");
+			return DT_CONFIG;
+		}
+

implements DT_CONFIG 
 <CELLDATA>[0-9a-fA-F]+	{
 			yylloc.first_line = yylineno;
-			if (yyleng > 2*sizeof(yylval.cval)) {
+			if ((unsigned long)yyleng > 2*sizeof(yylval.cval)) {
 				fprintf(stderr,
 					"Cell value %s too long\n", yytext);
 			}

[-- Attachment #4: Type: text/plain, Size: 194 bytes --]

_______________________________________________
devicetree-discuss mailing list
devicetree-discuss-mnsaURCQ41sdnm+yROfE0A@public.gmane.org
https://ozlabs.org/mailman/listinfo/devicetree-discuss

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13 14:13                               ` Markus Armbruster
@ 2009-02-13 18:36                                   ` Mitch Bradley
  -1 siblings, 0 replies; 146+ messages in thread
From: Mitch Bradley @ 2009-02-13 18:36 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: devicetree-discuss, qemu-devel

Here is an IEEE1275 device tree for an OLPC system, which is close enough
to a PC.

** /

ec-name                  PQ2E21
ec-version               00000054
serial-number            SHF73300042
board-revision-int       00000c18
model                    C1
dma-ranges               07000000 09000000
banner-name              OLPC C1
architecture             OLPC
#size-cells              00000001
#address-cells           00000001

** /cpus

#size-cells              00000000
#address-cells           00000001
name                     cpus

** /cpus/cpu@0

clock-frequency          19d42455
model                    AMD,Geode LX
reg                      00000000
device_type              cpu
name                     cpu

** /flash@fff00000

#address-cells           00000001
reg                      fff00000 00100000
name                     flash

** /memory@0

reg                      00000000 10000000
available                0efca000 00010000
                         0ef80000 00048000
                         00100000 0eb00000
                         00002000 0009e000
name                     memory

** /pci/usb@f,5

assigned-addresses       82007d10 00000000 fe01b000 00000000 00000100
reg                      00007d00 00000000 00000000 00000000 00000000
                         02007d10 00000000 00000000 00000000 00000100
#size-cells              00000000
#address-cells           00000002
name                     usb
device_type              ehci
66mhz-capable
devsel-speed             00000001
class-code               000c0320
subsystem-vendor-id      00001022
subsystem-id             00002095
interrupts               00000004
max-latency              00000000
min-grant                00000000
revision-id              00000002
device-id                00002095
vendor-id                00001022


** /pci/usb@f,4

assigned-addresses       82007c10 00000000 fe01a000 00000000 00001000
reg                      00007c00 00000000 00000000 00000000 00000000
                         02007c10 00000000 00000000 00000000 00001000
#size-cells              00000000
#address-cells           00000002
name                     usb
device_type              ohci
66mhz-capable
devsel-speed             00000001
class-code               000c0310
subsystem-vendor-id      00001022
subsystem-id             00002094
interrupts               00000004
max-latency              00000000
min-grant                00000000
revision-id              00000002
device-id                00002094
vendor-id                00001022


** /pci/audio@f,3

assigned-addresses       81007b10 00000000 00001480 00000000 00000080
reg                      00007b00 00000000 00000000 00000000 00000000
                         01007b10 00000000 00000000 00000000 00000080
compatible               AD1888
                         AC97,CODEC
output-encoding-types    16bit-LE-signed-linear
input-encoding-types     16bit-LE-signed-linear
sample-frame-size        00000010
sample-precisions        00000010
#output-channels         00000002
#input-channels          00000001
device_type              sound
name                     audio
66mhz-capable
fast-back-to-back
devsel-speed             00000001
class-code               00040100
subsystem-vendor-id      00001022
subsystem-id             00002093
interrupts               00000002
max-latency              00000000
min-grant                00000000
revision-id              00000001
device-id                00002093
vendor-id                00001022


** /pci/camera@c,2

assigned-addresses       82006210 00000000 fe028000 00000000 00004000
reg                      00006200 00000000 00000000 00000000 00000000
                         02006210 00000000 00000000 00000000 00004000
sensor                   OV7670
compatible               olpc,camera
device_type              camera
model                    olpc,camera
name                     camera
66mhz-capable
fast-back-to-back
devsel-speed             00000001
class-code               00040001
subsystem-vendor-id      000011ab
subsystem-id             00004100
interrupts               00000001
max-latency              00000008
min-grant                00000008
revision-id              00000010
device-id                00004102
vendor-id                000011ab


** /pci/sd@c,1

assigned-addresses       81006110 00000000 fe024000 00000000 00004000
reg                      00006100 00000000 00000000 00000000 00000100
                         01006110 00000000 00000000 00000000 00004000
compatible               sdhci
#size-cells              00000000
#address-cells           00000000
name                     sd
66mhz-capable
fast-back-to-back
devsel-speed             00000001
class-code               00080501
subsystem-vendor-id      000011ab
subsystem-id             00004100
interrupts               00000001
max-latency              00000008
min-grant                00000008
revision-id              00000010
device-id                00004101
vendor-id                000011ab


** /pci/nandflash@c

assigned-addresses       82006010 00000000 fe020000 00000000 00004000
reg                      00006000 00000000 00000000 00000000 00000000
                         02006010 00000000 00000000 00000000 00004000
compatible               olpc,cafenand
model                    olpc,cafenand
name                     nandflash
66mhz-capable
fast-back-to-back
devsel-speed             00000001
class-code               00050101
subsystem-vendor-id      000011ab
subsystem-id             00004100
interrupts               00000001
max-latency              00000008
min-grant                00000008
revision-id              00000010
device-id                00004100
vendor-id                000011ab


** /pci/pci1022,2082@1,2

assigned-addresses       82000a10 00000000 fe010000 00000000 00004000
reg                      00000a00 00000000 00000000 00000000 00000000
                         02000a10 00000000 00000000 00000000 00004000
compatible               pci1022,2082
                         pci1022,2082
                         pciclass,101000
name                     pci1022,2082
66mhz-capable
fast-back-to-back
devsel-speed             00000001
class-code               00101000
subsystem-vendor-id      00001022
subsystem-id             00002082
interrupts               00000001
max-latency              00000000
min-grant                00000000
revision-id              00000000
device-id                00002082
vendor-id                00001022


** /pci/host@1

assigned-addresses       81000810 00000000 0000ac1c 00000000 00000004
power-consumption        00 00 00 00 01 7d 78 40
reg                      00000800 00000000 00000000 00000000 00000000
                         01000810 00000000 00000000 00000000 00000004
compatible               pci1022,2080
                         pci1022,2080
                         pciclass,060000
name                     host
66mhz-capable
devsel-speed             00000001
class-code               00060000
subsystem-vendor-id      00001022
subsystem-id             00002080
max-latency              00000000
min-grant                00000000
revision-id              00000021
device-id                00002080
vendor-id                00001022


** /pci/display@1,1

compatible               pci1022,2081
                         pci1022,2081
                         pciclass,030000
66mhz-capable
devsel-speed             00000001
class-code               00030000
subsystem-vendor-id      00001022
subsystem-id             00002081
interrupts               00000001
max-latency              00000000
min-grant                00000000
revision-id              00000000
device-id                00002081
vendor-id                00001022
address                  fd000000
linebytes                00000960
depth                    00000010
height                   00000384
width                    000004b0
assigned-addresses       82000910 00000000 fd000000 00000000 00800000
                         82000914 00000000 fe000000 00000000 00004000
                         82000918 00000000 fe004000 00000000 00004000
                         8200091c 00000000 fe008000 00000000 00004000
                         82000920 00000000 fe00c000 00000000 00004000
iso6429-1983-colors
character-set            ISO8859-1
device_type              display
reg                      00000900 00000000 00000000 00000000 00000100
                         02000910 00000000 00000000 00000000 00800000
                         02000914 00000000 00000000 00000000 00004000
                         02000918 00000000 00000000 00000000 00004000
                         0200091c 00000000 00000000 00000000 00004000
                         02000920 00000000 00000000 00000000 00004000
name                     display


** /pci/isa@f

assigned-addresses       81007810 00000000 000018b0 00000000 00000000
                         81007814 00000000 00001000 00000000 00000000
                         81007818 00000000 00001800 00000000 00000000
                         8100781c 00000000 00001880 00000000 00000000
                         81007820 00000000 00001400 00000000 00000000
                         81007824 00000000 00001840 00000000 00000000
66mhz-capable
fast-back-to-back
devsel-speed             00000001
class-code               00060100
subsystem-vendor-id      00001022
subsystem-id             00002090
max-latency              00000000
min-grant                00000000
revision-id              00000003
device-id                00002090
vendor-id                00001022
interrupt-parent         ff867a98
#interrupt-cells         00000002
ranges                   00000000 00000000 02000000 00000000 00000000 01000000
                         00000001 00000000 01000000 00000000 00000000 00010000
clock-frequency          007ea5e0
reg                      00007800 00000000 00000000 00000000 00000000
#size-cells              00000001
#address-cells           00000002
device_type              isa
name                     isa


** /pci/usb@f,5/wlan@0,0

device_type              wireless-network
configuration#           00000001
bulk-in-size             00000200
bulk-in-pipe             00000003
bulk-out-size            00000200
bulk-out-pipe            00000002
serial$
device$                  MARVELL Wireless Device
vendor$                  Marvell
compatible               usb1286,2001.3107
                         usb1286,2001
                         usbif1286,classff.ff.ff
                         usbif1286,classff.ff
                         usbif1286,classff
                         usbif,classff.ff.ff
                         usbif,classff.ff
                         usbif,classff
                         usb,device
vendor-id                00001286
device-id                00002001
release                  00003107
name                     wlan
class                    000000ff
subclass                 000000ff
protocol                 000000ff
high-speed
assigned-address         00000001
reg                      00000000 00000000
#size-cells              00000000
#address-cells           00000001


** /pci/sd@c,1/disk

device_type              block
iconname                 sdmmc
name                     disk


** /pci/isa@f/rtc@i70

status                   okay
century                  00000032
alarm_month              0000003e
alarm_day                0000003d
device#                  00000002
interrupts               00000008
                         00000000
reg                      00000001 00000070 00000002
compatible               pnpPNP,b00
device_type              rtc
name                     rtc


** /pci/isa@f/8042@i60

#size-cells              00000000
#address-cells           00000001
reg                      00000001 00000060 00000001
                         00000001 00000064 00000001
compatible               ps2-keyboard-controller
                         INTC,80c42
device_type              8042
name                     8042
model                    INTC,80c42
interrupts               00000001
                         00000003
                         0000000c
                         00000003


** /pci/isa@f/serial@i3f8

reg                      00000001 000003f8 00000008
compatible               pnpPNP,501
device_type              serial
name                     serial
clock-frequency          001c2000
interrupts               00000004
                         00000003


** /pci/isa@f/timer@i40

interrupts               00000000
                         00000003
reg                      00000001 00000040 00000004
                         00000001 00000061 00000001
compatible               pnpPNP,100
device_type              timer
name                     timer


** /pci/isa@f/interrupt-controller@i20

reg                      00000001 00000020 00000002
                         00000001 000000a0 00000002
                         00000001 000004d0 00000002
compatible               pnpPNP,0
device_type              interrupt-controller
name                     interrupt-controller
#address-cells           00000000
#interrupt-cells         00000002
interrupt-controller


** /pci/isa@f/dma-controller@i00

reg                      00000001 00000000 00000010
                         00000001 00000080 00000020
                         00000001 000000c0 00000020
                         00000001 00000481 0000000f
compatible               pnpPNP,200
device_type              dma-controller
name                     dma-controller


** /pci/isa@f/8042@i60/mouse@aux

reg                      00000001
compatible               pnpPNP,f03
device_type              mouse
name                     mouse


** /pci/isa@f/8042@i60/keyboard@kbd

language                 EN
keyboard-type            us
reg                      00000000
device_type              keyboard
compatible               pnpPNP,303
name                     keyboard
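
As a reading aid for the ISA nodes in the dump above: each "reg" entry under /pci/isa@f is three cells, which matches that node's #address-cells of 2 and #size-cells of 1. The first cell appears to select the address space (1 for I/O ports, consistent with the "i" prefix in unit addresses such as 8042@i60), the second is the base, and the third is the length. A minimal sketch of splitting such a property follows; the struct and function names are hypothetical and not taken from any firmware or QEMU source.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded entry of an ISA-style "reg" property (hypothetical type). */
struct isa_reg {
    uint32_t space;  /* first address cell: assumed 1 = I/O ports, 0 = memory */
    uint32_t base;   /* second address cell: base address */
    uint32_t size;   /* size cell: length of the range */
};

/*
 * Split a "reg" property of ncells 32-bit cells into entries, assuming the
 * parent declares #address-cells = 2 and #size-cells = 1.
 * Returns the number of entries written, or -1 if the cell count is not a
 * multiple of 3 or the entries do not fit into out[max].
 */
int isa_reg_decode(const uint32_t *cells, size_t ncells,
                   struct isa_reg *out, size_t max)
{
    if (ncells % 3 != 0 || ncells / 3 > max)
        return -1;
    for (size_t i = 0; i < ncells / 3; i++) {
        out[i].space = cells[3 * i];
        out[i].base  = cells[3 * i + 1];
        out[i].size  = cells[3 * i + 2];
    }
    return (int)(ncells / 3);
}
```

Fed the six cells of the 8042 node's "reg", this would yield two one-byte ranges at 0x60 and 0x64.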

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13 18:36                                   ` Mitch Bradley
@ 2009-02-13 19:49                                       ` Markus Armbruster
  -1 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-13 19:49 UTC (permalink / raw)
  To: Mitch Bradley
  Cc: devicetree-discuss, qemu-devel

Mitch Bradley <wmb@firmworks.com> writes:

> Here is an IEEE1275 device tree for an OLPC system, which is close
> enough to a PC.
[snip...]

Thanks!  Got this in .dts syntax, by chance?

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13 19:49                                       ` Markus Armbruster
@ 2009-02-13 19:51                                           ` Mitch Bradley
  -1 siblings, 0 replies; 146+ messages in thread
From: Mitch Bradley @ 2009-02-13 19:51 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: devicetree-discuss, qemu-devel

> Mitch Bradley <wmb@firmworks.com> writes:
>
>> Here is an IEEE1275 device tree for an OLPC system, which is close
>> enough to a PC.
> [snip...]
>
> Thanks!  Got this in .dts syntax, by chance?

Sorry, I do full-up Open Firmware, not flattened device trees.

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13  2:11                   ` Carl-Daniel Hailfinger
@ 2009-02-13 20:04                       ` Jon Loeliger
  -1 siblings, 0 replies; 146+ messages in thread
From: Jon Loeliger @ 2009-02-13 20:04 UTC (permalink / raw)
  To: Carl-Daniel Hailfinger
  Cc: devicetree-discuss, Markus Armbruster, Hollis Blanchard,
	qemu-devel

On Fri, 2009-02-13 at 03:11 +0100, Carl-Daniel Hailfinger wrote:
> On 13.02.2009 01:43, David Gibson wrote:
> > On Thu, Feb 12, 2009 at 11:26:46AM +0100, Markus Armbruster wrote:
> >> I didn't mean to say they are a bad idea for FDTs, just that they're on
> >> an awkward level of abstraction for QEMU configuration.  There, I'd
> >> rather express a PCI address as "02:01.0" than as <0x00000220>.
> >> Translating text to binary is the machine's job, not the user's.
> >
> > Ah, I see what you mean.  Hrm, there are several possibilities here,
> > we'll have to see which works out best for your purposes.
>
> Using the DTC version included in the coreboot v3 sources would solve
> that problem and give you a readable PCI address representation.

As would the proposed language enhancements I suggested.
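
For illustration only, the text-to-binary translation discussed in the quoted mails could look roughly like the sketch below. It assumes the configuration-space layout of the first address cell in the IEEE 1275 PCI binding (bus in bits 16-23, device in bits 11-15, function in bits 8-10); the function name is hypothetical and is neither dtc nor QEMU code.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "bus:dev.fn" with hexadecimal fields (e.g. "02:01.0") and pack it
 * into a single 32-bit cell: bus in bits 16-23, device in bits 11-15,
 * function in bits 8-10.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
int pci_addr_to_cell(const char *text, uint32_t *cell)
{
    unsigned bus, dev, fn;
    char trailing;

    /* Exactly three fields must convert; a fourth means trailing junk. */
    if (sscanf(text, "%x:%x.%x%c", &bus, &dev, &fn, &trailing) != 3)
        return -1;
    if (bus > 0xff || dev > 0x1f || fn > 0x7)
        return -1;

    *cell = ((uint32_t)bus << 16) | ((uint32_t)dev << 11) | ((uint32_t)fn << 8);
    return 0;
}
```

Under this layout, device f function 5 on bus 0 packs to 00007d00, which agrees with the first "reg" cell of the usb@f,5 node in the OLPC device tree posted in this thread.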

jdl

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13 20:04                       ` Jon Loeliger
@ 2009-02-13 20:15                         ` Carl-Daniel Hailfinger
  -1 siblings, 0 replies; 146+ messages in thread
From: Carl-Daniel Hailfinger @ 2009-02-13 20:15 UTC (permalink / raw)
  To: Jon Loeliger
  Cc: devicetree-discuss, Markus Armbruster, Hollis Blanchard,
	qemu-devel

On 13.02.2009 21:04, Jon Loeliger wrote:
> On Fri, 2009-02-13 at 03:11 +0100, Carl-Daniel Hailfinger wrote:
>> On 13.02.2009 01:43, David Gibson wrote:
>>> On Thu, Feb 12, 2009 at 11:26:46AM +0100, Markus Armbruster wrote:
>>>> I didn't mean to say they are a bad idea for FDTs, just that they're on
>>>> an awkward level of abstraction for QEMU configuration.  There, I'd
>>>> rather express a PCI address as "02:01.0" than as <0x00000220>.
>>>> Translating text to binary is the machine's job, not the user's.
>>>
>>> Ah, I see what you mean.  Hrm, there are several possibilities here,
>>> we'll have to see which works out best for your purposes.
>>
>> Using the DTC version included in the coreboot v3 sources would solve
>> that problem and give you a readable PCI address representation.
>
> As would the proposed language enhancements I suggested.

Do you have a pointer to the archives for that?

Regards,
Carl-Daniel

-- 
http://www.hailfinger.org/

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13 20:15                         ` Carl-Daniel Hailfinger
@ 2009-02-13 20:19                             ` Jon Loeliger
  -1 siblings, 0 replies; 146+ messages in thread
From: Jon Loeliger @ 2009-02-13 20:19 UTC (permalink / raw)
  To: Carl-Daniel Hailfinger
  Cc: devicetree-discuss, Markus Armbruster, Hollis Blanchard,
	qemu-devel

On Fri, 2009-02-13 at 21:15 +0100, Carl-Daniel Hailfinger wrote:

> > As would the proposed language enhancements I suggested.
> >   
> 
> Do you have a pointer to the archives for that?


All that code is available in the "testing" branch of
the repository on jdl.com.

Fair warning, I'm rebasing that branch to HEAD of master regularly.

jdl

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-13 11:26               ` Markus Armbruster
@ 2009-02-16  3:42                   ` David Gibson
  -1 siblings, 0 replies; 146+ messages in thread
From: David Gibson @ 2009-02-16  3:42 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: devicetree-discuss, qemu-devel

On Fri, Feb 13, 2009 at 12:26:28PM +0100, Markus Armbruster wrote:
> David Gibson <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> writes:
> > On Thu, Feb 12, 2009 at 11:26:12AM +0100, Markus Armbruster wrote:
> >> Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
[snip]
> > dtc and libfdt is a good place to start, if you haven't yet
> > investigated them:
> > 	git://git.jdl.com/software/dtc.git
> > Note that although they're distributed together as one tree, dtc and
> > libfdt are essentially independent pieces of software.  dtc converts
> > device trees between various formats, dts and dtb in particular.
> >
> > libfdt does a number of the things you mention with flat trees -
> > get/set properties, build trees, traverse etc.  If it doesn't do
> > everything you need, we can probably extend it so that it does: I want
> > libfdt to be *the* library for manipulating trees in the fdt format.
> > It's designed to be easy to embed in other packages for this reason,
> > although it does have some usage peculiarities because in particular
> > it's possible to integrate into very limited environments like
> > firmwares.
> >
> > [Jon Loeliger is the current maintainer of dtc and libfdt, but I
> > originally wrote both of them - I know as much about them as anyone
> > does]
> 
> Okay, I looked at dtc and libfdt again, a bit more closely.  I'm sure
> there's plenty of ignorance left in me, so please correct me when I'm
> babbling nonsense.

Sure.  So, I realize that there are two different questions here:
	a) Is IEEE1275 a good starting point for the content of a
decorated tree for configuring qemu?

Personally, I suspect the answer to this is yes, but more information
might convince me otherwise.

	b) Is the flattened tree format for representing IEEE1275-like
trees useful for qemu?

Personally, I think this is a "maybe".  More on this below.

Actually, on consideration there's a third question, too:
	c) Are the extensions / simplifications / adjustments we've
made to IEEE1275 conventions in the context of flattened trees also
useful and appropriate for a qemu-configuration tree?

I think if the answer to (a) is yes, then the answer to (c) is yes,
too.

> FDT is a "flattened tree", i.e. a tree data structure laid out in a
> block of memory in a clever way to make it compact and easily

That's correct.

> relocatable.  I understand why these are important requirements for
> passing information through bootloader to kernel.  They're irrelevant,
> however, for use as QEMU configuration.

That's probably largely true.

> You can identify an FDT node by node offset or node name.  The node
> offset can change when you add or delete nodes or properties.

Correct.

> You want everyone to use libfdt for manipulating FDTs.  I think that's
> entirely sensible.  What I still don't get is something else: Why use
> FDT for QEMU configuration in the first place?  Let me explain.

Yeah, I see your point, hence my "maybe" to (b) above.  There's no
obvious call for the fdt format in qemu, but I can see a couple of
minor things that might make it worthwhile: First, if qemu ever does
want to record its configuration tree persistently - to be passed
between programs, or between invocations of a program - then it's
probably better to use the established fdt format rather than creating
a new one, even if fdt isn't designed particularly towards qemu's
purposes.  Second, the existing code / tools for working with the fdt
format *might* be sufficiently useful to make it worth using.

[Note also that the fdt tools will mostly work fine even if the tree
content is *not* very IEEE1275-like]

> I think we have two distinct problems: the need for a flexible,
> expressive QEMU machine configuration file and a virtual device
> configuration machinery driven by it, and the need for an FDT to pass to
> a PowerPC kernel.  The two may be related, but they are not identical.
> 
> Let's pretend for a minute the latter need doesn't exist.
> 
> QEMU machine configuration wants to be a decorated tree: a tree of named
> nodes with named properties.
> 
> IEEE 1275 is a standard describing a special kind of decorated tree.
> Other kinds can be created with a binding.  If we create a suitable
> binding, we can surely cast our configuration trees in the IEEE 1275
> framework.

That's not quite what "binding" usually means in the 1275 context, but
I think the point is right enough.

> But what would that buy us?  This is an honest question, born out of my
> relative ignorance of IEEE 1275.  Mind that we're still busily ignoring
> the need for an FDT to pass to a kernel, so "it makes it easier to
> create an FDT for the kernel" doesn't count here (it counts elsewhere).

I think the idea behind using IEEE1275-like trees is that there is
significant overlap between the device information that IEEE1275
represents, and the device information which is configurable in qemu.
Ultimately whether it buys you enough depends on how large that
overlap is.

> FDTs are a special representation of IEEE 1275 trees in memory, designed
> to be compact and relocatable.  But that comes at a price: nodes move
> around when the tree changes.  The only real node id is the full name.

Or phandle, for those nodes which have one.

> This is not the common representation of decorated trees in C programs,
> and for a reason.  It's simpler to represent edges as C pointers.  Not
> the least advantage of that is notation: "->" beats a function call in
> legibility hands down.

Yes.  If there's enough manipulation of the tree, then you're
generally better off having a "live" format which uses pointers,
whether or not the fdt format is used at some stage in the process.
Both the kernel and dtc (when taking fdt input) convert the flattened
tree into a "live" representation internally.
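
A minimal sketch of what such a "live" representation might look like, with edges stored as plain C pointers.  All type and function names here are hypothetical; this is not the structure used by the kernel, dtc or QEMU, just an illustration of the idea.

```c
#include <stddef.h>
#include <string.h>

/* A named property attached to a node. */
struct prop {
    const char *name;
    const void *value;
    size_t len;
    struct prop *next;
};

/* A tree node; parent, child and sibling edges are ordinary pointers. */
struct node {
    const char *name;      /* e.g. "isa@f" */
    struct prop *props;
    struct node *parent;
    struct node *child;    /* first child */
    struct node *sibling;  /* next sibling */
};

/* Link child under parent (prepends to the parent's child list). */
void node_add_child(struct node *parent, struct node *child)
{
    child->parent = parent;
    child->sibling = parent->child;
    parent->child = child;
}

/* Find a property by name; NULL if the node has no such property. */
const struct prop *node_get_prop(const struct node *n, const char *name)
{
    for (const struct prop *p = n->props; p; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Resolve a path such as "/pci/isa@f" from the root; NULL if not found. */
struct node *node_by_path(struct node *root, const char *path)
{
    struct node *n = root;

    while (n && *path) {
        while (*path == '/')
            path++;
        size_t len = strcspn(path, "/");
        if (len == 0)
            break;
        struct node *c = n->child;
        while (c && !(strlen(c->name) == len &&
                      strncmp(c->name, path, len) == 0))
            c = c->sibling;
        n = c;
        path += len;
    }
    return n;
}
```

With this layout a device can simply keep a "struct node *" to its configuration node and write node->parent, rather than storing a full path and looking it up again.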

> Example: the QEMU device data type needs to refer to its device node in
> the configuration tree.  If that tree is coded the plain old way, you
> store a pointer to the node and follow that.  If it is an FDT, then you
> have to store the full node name, and look up the node by name.  I find
> that tedious and verbose.

Um.. I don't really follow your example.  But I think I see your
point.  How problematic the flattened format is for this depends a lot
on exactly what you need to do with it.  Sometimes it's much easier to
avoid the flattened tree altogether, or transcribe it to a live
format.  Other times, the tree manipulation is simple enough that it's
easier to leave it flat (for example, during phases of the program where
the tree is read-only, which could be a lot for a configuration tree,
node offsets *can* safely be used like pointers).
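
To illustrate why offsets are only safe while the tree is read-only,
here is a toy flat container.  It is deliberately NOT the real FDT blob
layout, and flat_tree, flat_find and flat_insert are hypothetical names;
it only shows that an insertion shifts everything stored behind it, so a
remembered offset goes stale while a lookup by name still works.

```c
#include <assert.h>
#include <string.h>

#define FLAT_MAX 256

/* Toy flat container: node names stored back to back, NUL-terminated. */
struct flat_tree {
    char buf[FLAT_MAX];
    int used;                       /* bytes of buf currently in use */
};

/* Return the offset of the node called name, or -1 if there is none. */
static int flat_find(const struct flat_tree *t, const char *name)
{
    int off = 0;

    while (off < t->used) {
        if (!strcmp(t->buf + off, name))
            return off;
        off += (int)strlen(t->buf + off) + 1;
    }
    return -1;
}

/*
 * Insert a node at offset at (which must be a node boundary), moving
 * all later nodes towards the end.  Returns 0 on success, -1 on error.
 */
static int flat_insert(struct flat_tree *t, int at, const char *name)
{
    int len = (int)strlen(name) + 1;

    assert(t);
    if (at < 0 || at > t->used || t->used + len > FLAT_MAX)
        return -1;
    memmove(t->buf + at + len, t->buf + at, (size_t)(t->used - at));
    memcpy(t->buf + at, name, (size_t)len);
    t->used += len;
    return 0;
}
```

As long as nothing calls flat_insert(), an offset returned by
flat_find() behaves like a pointer.  After an insertion in front of the
node, the old offset refers to some other node and has to be looked up
again.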

> My point is: the question how to represent our decorated tree in memory
> is entirely separate from the question of the tree's structure.  Just
> because you want your tree to conform to IEEE 1275 doesn't mean you want
> your tree flat at all times.

Absolutely, yes.


> Now let's examine how QEMU machine configuration and FDT machine
> descriptions for kernels are related.
> 
> In a way, both can be regarded as copies of a complete machine
> description with lots of stuff pruned.  Except the complete machine
> description doesn't exist.  Because there is no use for it.
> 
> FDT routinely prunes stuff like PCI and USB devices, because those are
> better probed.
> 
> QEMU configuration should certainly prune everything that is not
> actually configurable.
> 
> To go from QEMU configuration to FDT we therefore may want to prune
> superfluous stuff, to keep it compact,

Not necessarily.  The kernel should be fine to deal with a tree that
has complete information, even if it doesn't need it, since that's
what a real OF implementation provides.

>  and we definitely have to add lots
> of stuff that has no place in configuration.

Yes.  Well.. whether this is a good plan depends critically on how big
that "lots" really is.

>  Compared to that task, a
> change of representation seems trivial.  I figure we want to copy the
> tree anyway, because we need to edit it pretty drastically.
> 
> It's not obvious to me whether it makes sense to create the FDT from the
> QEMU configuration automatically.  If we simulate a specific board, the
> FDT is pretty fixed, isn't it?  Much of the configurable stuff could be
> precisely in those parts that are omitted from FDT: PCI devices and
> such.

Well.. you definitely want to create the FDT passed to the kernel from
the qemu configuration.  But whether that's best done by essentially
transcribing a configuration tree which is in a similar format, or
just using the configuration tree info to poke the changeable bits in a
"skeleton" FDT for the relevant machine is not so clear.

Possibly.  I'm not familiar enough with the various qemu supported
machine models to say.

> >> * Provide an example tree describing a bare-bones PC, like the one in my
> >>   prototype: CPU, RAM, BIOS, PIC, APIC, IOAPIC, PIT, DMA, UART, parallel
> >>   port, floppy controller, CMOS & RTC, a20 gate (port 92) and other
> >>   miscellaneous I/O ports, i440fx, PIIX3 (ISA bridge, IDE, USB, ACPI),
> >>   Cirrus VGA with BIOS, some PCI NIC.  This gives us all an idea of the
> >>   tree structure.  Morphing that into something suitable for QEMU
> >>   configuration shouldn't be too hard then, just an exercise in
> >>   redecorating the tree.
> >
> > I don't off hand know any trees for a PC system.  There are a bunch of
> > example trees for powerpc systems in arch/powerpc/boot/dts in the
> > kernel tree.  A few of those, such as prep, at least have parts which
> > somewhat resemble a PC.  I believe the OLPC also has OF; that would be
> > an example OF tree for an x86 machine, if not a typical PC.
> 
> Could you point me to a specific file?  I grepped for prep and OLPC, no
> luck.

Oh, sorry, the prep tree hasn't gone into mainline yet.  But I believe
Mitch Bradley supplied a PC tree later in the thread, which would be
better for your purposes, anyway.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 146+ messages in thread

* [Qemu-devel] Machine description as data prototype, take 2 (was: [RFC] Machine description as data)
  2009-02-11 15:40 [Qemu-devel] [RFC] Machine description as data Markus Armbruster
                   ` (2 preceding siblings ...)
  2009-02-11 19:01 ` Anthony Liguori
@ 2009-02-16 16:22 ` Markus Armbruster
  2009-02-17 17:32   ` Paul Brook
  2009-02-19 10:29 ` [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data) Markus Armbruster
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 146+ messages in thread
From: Markus Armbruster @ 2009-02-16 16:22 UTC (permalink / raw)
  To: qemu-devel

Second iteration of the prototype.

New:

* Cleaner separation of machine and host configuration.

* Conversion of machine configuration between internal representation
  and FDT.  Compiled in only when configure finds libfdt; check the "fdt
  support" line in its output.

Shortcuts:

* Not yet rebased to current HEAD.  Easy enough.

* I put the "pcdt" code into the new file dt.c, and copied code from
  pc.c there.  I could have avoided that by putting my code in pc.c
  instead.  Putting it in a new file helped me pick apart the pc.c
  hairball.  To be cleaned up.  New code in that file starts below the
  line /* Host Configuration */

* I copied code from net.c.  Trivial to fix, just give it external
  linkage there.

* I didn't implement all the devices of the "pc" original.  The devices
  I implemented might not support all existing command line options.

* The initial configuration tree is hardcoded.  It should be read from a
  configuration file.

* Optional stuff is inserted into the initial configuration tree in
  hardcoded places.  The places should be marked in the configuration
  file instead.

Notable qualities:

* Linux still boots & shuts down cleanly.

* Machine and host configuration are cleanly separated.  Machine
  configuration enumerates the components of the virtual machine, and
  how they are connected.  It is a tree of device nodes.  Host
  configuration is about how the host implements virtual devices.
  Currently just a few flat tables.

* Device drivers implement a common abstract interface.

* Device drivers are cleanly separated from each other, and from the
  device-agnostic machine configuration and initialization code.

* Each device driver specifies its configurable properties in a single
  place.  Unknown properties are rejected.

* A device driver gets its configuration from two sources: the device's
  node in the machine configuration tree, and applicable host
  configuration tables.
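
The "single place" and "unknown properties are rejected" points can be
sketched as a per-driver table that a generic checker walks.  This is
an illustration only: prop_spec, cfg_prop, serial_props and check_props
are hypothetical names and are not the prototype's struct dt_prop_spec
or dt_parse_prop(), and the property names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What a driver declares about one configurable property. */
struct prop_spec {
    const char *name;
    int required;
};

/* One name/value pair as found on a device node. */
struct cfg_prop {
    const char *name;
    const char *value;
};

/* The single place where this (made-up) driver lists its properties. */
static const struct prop_spec serial_props[] = {
    { "iobase",  1 },
    { "irq",     1 },
    { "chardev", 0 },
};

static const struct prop_spec *spec_lookup(const struct prop_spec *specs,
                                           size_t nspecs, const char *name)
{
    size_t i;

    for (i = 0; i < nspecs; i++) {
        if (!strcmp(specs[i].name, name))
            return &specs[i];
    }
    return NULL;
}

/*
 * Check a node's properties against the driver's table.  Returns 0 if
 * all is well; otherwise -1, with *bad naming the unknown or missing
 * property.
 */
static int check_props(const struct prop_spec *specs, size_t nspecs,
                       const struct cfg_prop *props, size_t nprops,
                       const char **bad)
{
    size_t i, j;

    assert(bad);
    /* Reject anything the driver did not declare. */
    for (i = 0; i < nprops; i++) {
        if (!spec_lookup(specs, nspecs, props[i].name)) {
            *bad = props[i].name;
            return -1;
        }
    }
    /* Every required property must be present. */
    for (i = 0; i < nspecs; i++) {
        if (!specs[i].required)
            continue;
        for (j = 0; j < nprops; j++) {
            if (!strcmp(props[j].name, specs[i].name))
                break;
        }
        if (j == nprops) {
            *bad = specs[i].name;
            return -1;
        }
    }
    return 0;
}
```

The benefit of a table like this is that a misspelled property in the
configuration is reported as an error instead of being silently
ignored.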

Comments?


diff --git a/Makefile b/Makefile
index 4f7a55a..2198bba 100644
--- a/Makefile
+++ b/Makefile
@@ -85,6 +85,7 @@ OBJS+=sd.o ssi-sd.o
 OBJS+=bt.o bt-host.o bt-vhci.o bt-l2cap.o bt-sdp.o bt-hci.o bt-hid.o usb-bt.o
 OBJS+=buffered_file.o migration.o migration-tcp.o net.o qemu-sockets.o
 OBJS+=qemu-char.o aio.o net-checksum.o savevm.o cache-utils.o
+OBJS+=tree.o
 
 ifdef CONFIG_BRLAPI
 OBJS+= baum.o
diff --git a/Makefile.target b/Makefile.target
index a091ce9..790529b 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -580,7 +580,11 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
+OBJS+= dt.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
+ifdef FDT_LIBS
+LIBS+= $(FDT_LIBS)
+endif
 endif
 ifeq ($(TARGET_BASE_ARCH), ppc)
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
diff --git a/hw/dt.c b/hw/dt.c
new file mode 100644
index 0000000..f839964
--- /dev/null
+++ b/hw/dt.c
@@ -0,0 +1,1856 @@
+#include <assert.h>
+#include "hw.h"
+#include "pc.h"
+#include "fdc.h"
+#include "pci.h"
+#include "block.h"
+#include "sysemu.h"
+#include "audio/audio.h"
+#include "net.h"
+#include "smbus.h"
+#include "boards.h"
+#include "console.h"
+#include "fw_cfg.h"
+#include "virtio-blk.h"
+#include "virtio-balloon.h"
+#include "virtio-console.h"
+#include "hpet_emul.h"
+#include "tree.h"
+
+#ifdef HAVE_FDT
+#include <libfdt.h>
+#endif
+
+/* Forward declarations */
+struct dt_device;
+struct dt_driver;
+struct dt_prop_spec;
+static void dt_parse_prop(struct dt_device *dev, struct tree_prop *prop);
+static BlockDriverState **dt_piix3_hd(struct tree *piix3);
+
+
+// FIXME copied from pc.c, external defs stripped, unused stuff #if 0'ed
+/* output Bochs bios info messages */
+//#define DEBUG_BIOS
+
+#define BIOS_FILENAME "bios.bin"
+#define VGABIOS_FILENAME "vgabios.bin"
+#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
+
+#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
+
+/* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables.  */
+#define ACPI_DATA_SIZE       0x10000
+#define BIOS_CFG_IOPORT 0x510
+
+#define MAX_IDE_BUS 2
+
+static fdctrl_t *floppy_controller;
+static RTCState *rtc_state;
+#if 0
+static PITState *pit;
+static IOAPICState *ioapic;
+#endif
+extern PCIDevice *i440fx_state;
+
+static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
+{
+}
+
+#if 0
+/* MSDOS compatibility mode FPU exception support */
+static qemu_irq ferr_irq;
+/* XXX: add IGNNE support */
+void cpu_set_ferr(CPUX86State *s)
+{
+    qemu_irq_raise(ferr_irq);
+}
+
+static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
+{
+    qemu_irq_lower(ferr_irq);
+}
+#else
+extern qemu_irq ferr_irq;
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data);
+#endif
+
+#if 0
+/* TSC handling */
+uint64_t cpu_get_tsc(CPUX86State *env)
+{
+    /* Note: when using kqemu, it is more logical to return the host TSC
+       because kqemu does not trap the RDTSC instruction for
+       performance reasons */
+#ifdef USE_KQEMU
+    if (env->kqemu_enabled) {
+        return cpu_get_real_ticks();
+    } else
+#endif
+    {
+        return cpu_get_ticks();
+    }
+}
+
+/* SMM support */
+void cpu_smm_update(CPUState *env)
+{
+    if (i440fx_state && env == first_cpu)
+        i440fx_set_smm(i440fx_state, (env->hflags >> HF_SMM_SHIFT) & 1);
+}
+
+/* IRQ handling */
+int cpu_get_pic_interrupt(CPUState *env)
+{
+    int intno;
+
+    intno = apic_get_interrupt(env);
+    if (intno >= 0) {
+        /* set irq request if a PIC irq is still pending */
+        /* XXX: improve that */
+        pic_update_irq(isa_pic);
+        return intno;
+    }
+    /* read the irq from the PIC */
+    if (!apic_accept_pic_intr(env))
+        return -1;
+
+    intno = pic_read_irq(isa_pic);
+    return intno;
+}
+#endif
+
+static void pic_irq_request(void *opaque, int irq, int level)
+{
+    CPUState *env = first_cpu;
+
+    if (env->apic_state) {
+        while (env) {
+            if (apic_accept_pic_intr(env))
+                apic_deliver_pic_intr(env, level);
+            env = env->next_cpu;
+        }
+    } else {
+        if (level)
+            cpu_interrupt(env, CPU_INTERRUPT_HARD);
+        else
+            cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
+    }
+}
+
+/* PC cmos mappings */
+
+#define REG_EQUIPMENT_BYTE          0x14
+
+static int cmos_get_fd_drive_type(int fd0)
+{
+    int val;
+
+    switch (fd0) {
+    case 0:
+        /* 1.44 Mb 3"5 drive */
+        val = 4;
+        break;
+    case 1:
+        /* 2.88 Mb 3"5 drive */
+        val = 5;
+        break;
+    case 2:
+        /* 1.2 Mb 5"5 drive */
+        val = 2;
+        break;
+    default:
+        val = 0;
+        break;
+    }
+    return val;
+}
+
+static void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
+{
+    RTCState *s = rtc_state;
+    int cylinders, heads, sectors;
+    bdrv_get_geometry_hint(hd, &cylinders, &heads, &sectors);
+    rtc_set_memory(s, type_ofs, 47);
+    rtc_set_memory(s, info_ofs, cylinders);
+    rtc_set_memory(s, info_ofs + 1, cylinders >> 8);
+    rtc_set_memory(s, info_ofs + 2, heads);
+    rtc_set_memory(s, info_ofs + 3, 0xff);
+    rtc_set_memory(s, info_ofs + 4, 0xff);
+    rtc_set_memory(s, info_ofs + 5, 0xc0 | ((heads > 8) << 3));
+    rtc_set_memory(s, info_ofs + 6, cylinders);
+    rtc_set_memory(s, info_ofs + 7, cylinders >> 8);
+    rtc_set_memory(s, info_ofs + 8, sectors);
+}
+
+/* convert boot_device letter to something recognizable by the bios */
+static int boot_device2nibble(char boot_device)
+{
+    switch(boot_device) {
+    case 'a':
+    case 'b':
+        return 0x01; /* floppy boot */
+    case 'c':
+        return 0x02; /* hard drive boot */
+    case 'd':
+        return 0x03; /* CD-ROM boot */
+    case 'n':
+        return 0x04; /* Network boot */
+    }
+    return 0;
+}
+
+/* copy/pasted from cmos_init, should be made a general function
+ and used there as well */
+static int pc_boot_set(void *opaque, const char *boot_device)
+{
+#define PC_MAX_BOOT_DEVICES 3
+    RTCState *s = (RTCState *)opaque;
+    int nbds, bds[3] = { 0, };
+    int i;
+
+    nbds = strlen(boot_device);
+    if (nbds > PC_MAX_BOOT_DEVICES) {
+        term_printf("Too many boot devices for PC\n");
+        return(1);
+    }
+    for (i = 0; i < nbds; i++) {
+        bds[i] = boot_device2nibble(boot_device[i]);
+        if (bds[i] == 0) {
+            term_printf("Invalid boot device for PC: '%c'\n",
+                    boot_device[i]);
+            return(1);
+        }
+    }
+    rtc_set_memory(s, 0x3d, (bds[1] << 4) | bds[0]);
+    rtc_set_memory(s, 0x38, (bds[2] << 4));
+    return(0);
+}
+
+/* hd_table must contain 4 block drivers */
+static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+                      const char *boot_device, BlockDriverState **hd_table)
+{
+    RTCState *s = rtc_state;
+    int nbds, bds[3] = { 0, };
+    int val;
+    int fd0, fd1, nb;
+    int i;
+
+    /* various important CMOS locations needed by PC/Bochs bios */
+
+    /* memory size */
+    val = 640; /* base memory in K */
+    rtc_set_memory(s, 0x15, val);
+    rtc_set_memory(s, 0x16, val >> 8);
+
+    val = (ram_size / 1024) - 1024;
+    if (val > 65535)
+        val = 65535;
+    rtc_set_memory(s, 0x17, val);
+    rtc_set_memory(s, 0x18, val >> 8);
+    rtc_set_memory(s, 0x30, val);
+    rtc_set_memory(s, 0x31, val >> 8);
+
+    if (above_4g_mem_size) {
+        rtc_set_memory(s, 0x5b, (unsigned int)above_4g_mem_size >> 16);
+        rtc_set_memory(s, 0x5c, (unsigned int)above_4g_mem_size >> 24);
+        rtc_set_memory(s, 0x5d, (uint64_t)above_4g_mem_size >> 32);
+    }
+
+    if (ram_size > (16 * 1024 * 1024))
+        val = (ram_size / 65536) - ((16 * 1024 * 1024) / 65536);
+    else
+        val = 0;
+    if (val > 65535)
+        val = 65535;
+    rtc_set_memory(s, 0x34, val);
+    rtc_set_memory(s, 0x35, val >> 8);
+
+    /* set the number of CPU */
+    rtc_set_memory(s, 0x5f, smp_cpus - 1);
+
+    /* set boot devices, and disable floppy signature check if requested */
+#define PC_MAX_BOOT_DEVICES 3
+    nbds = strlen(boot_device);
+    if (nbds > PC_MAX_BOOT_DEVICES) {
+        fprintf(stderr, "Too many boot devices for PC\n");
+        exit(1);
+    }
+    for (i = 0; i < nbds; i++) {
+        bds[i] = boot_device2nibble(boot_device[i]);
+        if (bds[i] == 0) {
+            fprintf(stderr, "Invalid boot device for PC: '%c'\n",
+                    boot_device[i]);
+            exit(1);
+        }
+    }
+    rtc_set_memory(s, 0x3d, (bds[1] << 4) | bds[0]);
+    rtc_set_memory(s, 0x38, (bds[2] << 4) | (fd_bootchk ?  0x0 : 0x1));
+
+    /* floppy type */
+
+    fd0 = fdctrl_get_drive_type(floppy_controller, 0);
+    fd1 = fdctrl_get_drive_type(floppy_controller, 1);
+
+    val = (cmos_get_fd_drive_type(fd0) << 4) | cmos_get_fd_drive_type(fd1);
+    rtc_set_memory(s, 0x10, val);
+
+    val = 0;
+    nb = 0;
+    if (fd0 < 3)
+        nb++;
+    if (fd1 < 3)
+        nb++;
+    switch (nb) {
+    case 0:
+        break;
+    case 1:
+        val |= 0x01; /* 1 drive, ready for boot */
+        break;
+    case 2:
+        val |= 0x41; /* 2 drives, ready for boot */
+        break;
+    }
+    val |= 0x02; /* FPU is there */
+    val |= 0x04; /* PS/2 mouse installed */
+    rtc_set_memory(s, REG_EQUIPMENT_BYTE, val);
+
+    /* hard drives */
+
+    rtc_set_memory(s, 0x12, (hd_table[0] ? 0xf0 : 0) | (hd_table[1] ? 0x0f : 0));
+    if (hd_table[0])
+        cmos_init_hd(0x19, 0x1b, hd_table[0]);
+    if (hd_table[1])
+        cmos_init_hd(0x1a, 0x24, hd_table[1]);
+
+    val = 0;
+    for (i = 0; i < 4; i++) {
+        if (hd_table[i]) {
+            int cylinders, heads, sectors, translation;
+            /* NOTE: bdrv_get_geometry_hint() returns the physical
+                geometry.  It is always such that: 1 <= sects <= 63, 1
+                <= heads <= 16, 1 <= cylinders <= 16383. The BIOS
+                geometry can be different if a translation is done. */
+            translation = bdrv_get_translation_hint(hd_table[i]);
+            if (translation == BIOS_ATA_TRANSLATION_AUTO) {
+                bdrv_get_geometry_hint(hd_table[i], &cylinders, &heads, &sectors);
+                if (cylinders <= 1024 && heads <= 16 && sectors <= 63) {
+                    /* No translation. */
+                    translation = 0;
+                } else {
+                    /* LBA translation. */
+                    translation = 1;
+                }
+            } else {
+                translation--;
+            }
+            val |= translation << (i * 2);
+        }
+    }
+    rtc_set_memory(s, 0x39, val);
+}
+
+#if 0
+void ioport_set_a20(int enable)
+{
+    /* XXX: send to all CPUs ? */
+    cpu_x86_set_a20(first_cpu, enable);
+}
+
+int ioport_get_a20(void)
+{
+    return ((first_cpu->a20_mask >> 20) & 1);
+}
+#endif
+
+static void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
+{
+    ioport_set_a20((val >> 1) & 1);
+    /* XXX: bit 0 is fast reset */
+}
+
+static uint32_t ioport92_read(void *opaque, uint32_t addr)
+{
+    return ioport_get_a20() << 1;
+}
+
+/***********************************************************/
+/* Bochs BIOS debug ports */
+
+static void bochs_bios_write(void *opaque, uint32_t addr, uint32_t val)
+{
+    static const char shutdown_str[8] = "Shutdown";
+    static int shutdown_index = 0;
+
+    switch(addr) {
+        /* Bochs BIOS messages */
+    case 0x400:
+    case 0x401:
+        fprintf(stderr, "BIOS panic at rombios.c, line %d\n", val);
+        exit(1);
+    case 0x402:
+    case 0x403:
+#ifdef DEBUG_BIOS
+        fprintf(stderr, "%c", val);
+#endif
+        break;
+    case 0x8900:
+        /* same as Bochs power off */
+        if (val == shutdown_str[shutdown_index]) {
+            shutdown_index++;
+            if (shutdown_index == 8) {
+                shutdown_index = 0;
+                qemu_system_shutdown_request();
+            }
+        } else {
+            shutdown_index = 0;
+        }
+        break;
+
+        /* LGPL'ed VGA BIOS messages */
+    case 0x501:
+    case 0x502:
+        fprintf(stderr, "VGA BIOS panic, line %d\n", val);
+        exit(1);
+    case 0x500:
+    case 0x503:
+#ifdef DEBUG_BIOS
+        fprintf(stderr, "%c", val);
+#endif
+        break;
+    }
+}
+
+static void bochs_bios_init(void)
+{
+    void *fw_cfg;
+
+    register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL);
+    register_ioport_write(0x401, 1, 2, bochs_bios_write, NULL);
+    register_ioport_write(0x402, 1, 1, bochs_bios_write, NULL);
+    register_ioport_write(0x403, 1, 1, bochs_bios_write, NULL);
+    register_ioport_write(0x8900, 1, 1, bochs_bios_write, NULL);
+
+    register_ioport_write(0x501, 1, 2, bochs_bios_write, NULL);
+    register_ioport_write(0x502, 1, 2, bochs_bios_write, NULL);
+    register_ioport_write(0x500, 1, 1, bochs_bios_write, NULL);
+    register_ioport_write(0x503, 1, 1, bochs_bios_write, NULL);
+
+    fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
+    fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
+}
+
+#if 0
+/* Generate an initial boot sector which sets state and jump to
+   a specified vector */
+static void generate_bootsect(uint8_t *option_rom,
+                              uint32_t gpr[8], uint16_t segs[6], uint16_t ip)
+{
+    uint8_t rom[512], *p, *reloc;
+    uint8_t sum;
+    int i;
+
+    memset(rom, 0, sizeof(rom));
+
+    p = rom;
+    /* Make sure we have an option rom signature */
+    *p++ = 0x55;
+    *p++ = 0xaa;
+
+    /* ROM size in sectors*/
+    *p++ = 1;
+
+    /* Hook int19 */
+
+    *p++ = 0x50;		/* push ax */
+    *p++ = 0x1e;		/* push ds */
+    *p++ = 0x31; *p++ = 0xc0;	/* xor ax, ax */
+    *p++ = 0x8e; *p++ = 0xd8;	/* mov ax, ds */
+
+    *p++ = 0xc7; *p++ = 0x06;   /* movvw _start,0x64 */
+    *p++ = 0x64; *p++ = 0x00;
+    reloc = p;
+    *p++ = 0x00; *p++ = 0x00;
+
+    *p++ = 0x8c; *p++ = 0x0e;   /* mov cs,0x66 */
+    *p++ = 0x66; *p++ = 0x00;
+
+    *p++ = 0x1f;		/* pop ds */
+    *p++ = 0x58;		/* pop ax */
+    *p++ = 0xcb;		/* lret */
+    
+    /* Actual code */
+    *reloc = (p - rom);
+
+    *p++ = 0xfa;		/* CLI */
+    *p++ = 0xfc;		/* CLD */
+
+    for (i = 0; i < 6; i++) {
+	if (i == 1)		/* Skip CS */
+	    continue;
+
+	*p++ = 0xb8;		/* MOV AX,imm16 */
+	*p++ = segs[i];
+	*p++ = segs[i] >> 8;
+	*p++ = 0x8e;		/* MOV <seg>,AX */
+	*p++ = 0xc0 + (i << 3);
+    }
+
+    for (i = 0; i < 8; i++) {
+	*p++ = 0x66;		/* 32-bit operand size */
+	*p++ = 0xb8 + i;	/* MOV <reg>,imm32 */
+	*p++ = gpr[i];
+	*p++ = gpr[i] >> 8;
+	*p++ = gpr[i] >> 16;
+	*p++ = gpr[i] >> 24;
+    }
+
+    *p++ = 0xea;		/* JMP FAR */
+    *p++ = ip;			/* IP */
+    *p++ = ip >> 8;
+    *p++ = segs[1];		/* CS */
+    *p++ = segs[1] >> 8;
+
+    /* sign rom */
+    sum = 0;
+    for (i = 0; i < (sizeof(rom) - 1); i++)
+        sum += rom[i];
+    rom[sizeof(rom) - 1] = -sum;
+
+    memcpy(option_rom, rom, sizeof(rom));
+}
+
+static long get_file_size(FILE *f)
+{
+    long where, size;
+
+    /* XXX: on Unix systems, using fstat() probably makes more sense */
+
+    where = ftell(f);
+    fseek(f, 0, SEEK_END);
+    size = ftell(f);
+    fseek(f, where, SEEK_SET);
+
+    return size;
+}
+
+static void load_linux(uint8_t *option_rom,
+                       const char *kernel_filename,
+		       const char *initrd_filename,
+		       const char *kernel_cmdline)
+{
+    uint16_t protocol;
+    uint32_t gpr[8];
+    uint16_t seg[6];
+    uint16_t real_seg;
+    int setup_size, kernel_size, initrd_size, cmdline_size;
+    uint32_t initrd_max;
+    uint8_t header[1024];
+    target_phys_addr_t real_addr, prot_addr, cmdline_addr, initrd_addr;
+    FILE *f, *fi;
+
+    /* Align to 16 bytes as a paranoia measure */
+    cmdline_size = (strlen(kernel_cmdline)+16) & ~15;
+
+    /* load the kernel header */
+    f = fopen(kernel_filename, "rb");
+    if (!f || !(kernel_size = get_file_size(f)) ||
+	fread(header, 1, 1024, f) != 1024) {
+	fprintf(stderr, "qemu: could not load kernel '%s'\n",
+		kernel_filename);
+	exit(1);
+    }
+
+    /* kernel protocol version */
+#if 0
+    fprintf(stderr, "header magic: %#x\n", ldl_p(header+0x202));
+#endif
+    if (ldl_p(header+0x202) == 0x53726448)
+	protocol = lduw_p(header+0x206);
+    else
+	protocol = 0;
+
+    if (protocol < 0x200 || !(header[0x211] & 0x01)) {
+	/* Low kernel */
+	real_addr    = 0x90000;
+	cmdline_addr = 0x9a000 - cmdline_size;
+	prot_addr    = 0x10000;
+    } else if (protocol < 0x202) {
+	/* High but ancient kernel */
+	real_addr    = 0x90000;
+	cmdline_addr = 0x9a000 - cmdline_size;
+	prot_addr    = 0x100000;
+    } else {
+	/* High and recent kernel */
+	real_addr    = 0x10000;
+	cmdline_addr = 0x20000;
+	prot_addr    = 0x100000;
+    }
+
+#if 0
+    fprintf(stderr,
+	    "qemu: real_addr     = 0x" TARGET_FMT_plx "\n"
+	    "qemu: cmdline_addr  = 0x" TARGET_FMT_plx "\n"
+	    "qemu: prot_addr     = 0x" TARGET_FMT_plx "\n",
+	    real_addr,
+	    cmdline_addr,
+	    prot_addr);
+#endif
+
+    /* highest address for loading the initrd */
+    if (protocol >= 0x203)
+	initrd_max = ldl_p(header+0x22c);
+    else
+	initrd_max = 0x37ffffff;
+
+    if (initrd_max >= ram_size-ACPI_DATA_SIZE)
+	initrd_max = ram_size-ACPI_DATA_SIZE-1;
+
+    /* kernel command line */
+    pstrcpy_targphys(cmdline_addr, 4096, kernel_cmdline);
+
+    if (protocol >= 0x202) {
+	stl_p(header+0x228, cmdline_addr);
+    } else {
+	stw_p(header+0x20, 0xA33F);
+	stw_p(header+0x22, cmdline_addr-real_addr);
+    }
+
+    /* loader type */
+    /* High nybble = B reserved for Qemu; low nybble is revision number.
+       If this code is substantially changed, you may want to consider
+       incrementing the revision. */
+    if (protocol >= 0x200)
+	header[0x210] = 0xB0;
+
+    /* heap */
+    if (protocol >= 0x201) {
+	header[0x211] |= 0x80;	/* CAN_USE_HEAP */
+	stw_p(header+0x224, cmdline_addr-real_addr-0x200);
+    }
+
+    /* load initrd */
+    if (initrd_filename) {
+	if (protocol < 0x200) {
+	    fprintf(stderr, "qemu: linux kernel too old to load a ram disk\n");
+	    exit(1);
+	}
+
+	fi = fopen(initrd_filename, "rb");
+	if (!fi) {
+	    fprintf(stderr, "qemu: could not load initial ram disk '%s'\n",
+		    initrd_filename);
+	    exit(1);
+	}
+
+	initrd_size = get_file_size(fi);
+	initrd_addr = (initrd_max-initrd_size) & ~4095;
+
+        fprintf(stderr, "qemu: loading initrd (%#x bytes) at 0x" TARGET_FMT_plx
+                "\n", initrd_size, initrd_addr);
+
+	if (!fread_targphys_ok(initrd_addr, initrd_size, fi)) {
+	    fprintf(stderr, "qemu: read error on initial ram disk '%s'\n",
+		    initrd_filename);
+	    exit(1);
+	}
+	fclose(fi);
+
+	stl_p(header+0x218, initrd_addr);
+	stl_p(header+0x21c, initrd_size);
+    }
+
+    /* store the finalized header and load the rest of the kernel */
+    cpu_physical_memory_write(real_addr, header, 1024);
+
+    setup_size = header[0x1f1];
+    if (setup_size == 0)
+	setup_size = 4;
+
+    setup_size = (setup_size+1)*512;
+    kernel_size -= setup_size;	/* Size of protected-mode code */
+
+    if (!fread_targphys_ok(real_addr+1024, setup_size-1024, f) ||
+	!fread_targphys_ok(prot_addr, kernel_size, f)) {
+	fprintf(stderr, "qemu: read error on kernel '%s'\n",
+		kernel_filename);
+	exit(1);
+    }
+    fclose(f);
+
+    /* generate bootsector to set up the initial register state */
+    real_seg = real_addr >> 4;
+    seg[0] = seg[2] = seg[3] = seg[4] = seg[5] = real_seg;
+    seg[1] = real_seg+0x20;	/* CS */
+    memset(gpr, 0, sizeof gpr);
+    gpr[4] = cmdline_addr-real_addr-16;	/* SP (-16 is paranoia) */
+
+    generate_bootsect(option_rom, gpr, seg, 0);
+}
+#endif
+
+static void main_cpu_reset(void *opaque)
+{
+    CPUState *env = opaque;
+    cpu_reset(env);
+}
+
+static const int ide_iobase[2] = { 0x1f0, 0x170 };
+static const int ide_iobase2[2] = { 0x3f6, 0x376 };
+static const int ide_irq[2] = { 14, 15 };
+
+#define NE2000_NB_MAX 6
+
+static const int ne2000_io[NE2000_NB_MAX] = { 0x300, 0x320, 0x340, 0x360, 0x280, 0x380 };
+static const int ne2000_irq[NE2000_NB_MAX] = { 9, 10, 11, 3, 4, 5 };
+
+static const int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
+static const int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
+
+static const int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
+static const int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
+
+#if 0 //def HAS_AUDIO
+static void audio_init (PCIBus *pci_bus, qemu_irq *pic)
+{
+    struct soundhw *c;
+    int audio_enabled = 0;
+
+    for (c = soundhw; !audio_enabled && c->name; ++c) {
+        audio_enabled = c->enabled;
+    }
+
+    if (audio_enabled) {
+        AudioState *s;
+
+        s = AUD_init ();
+        if (s) {
+            for (c = soundhw; c->name; ++c) {
+                if (c->enabled) {
+                    if (c->isa) {
+                        c->init.init_isa (s, pic);
+                    }
+                    else {
+                        if (pci_bus) {
+                            c->init.init_pci (pci_bus, s);
+                        }
+                    }
+                }
+            }
+        }
+    }
+}
+
+static void pc_init_ne2k_isa(NICInfo *nd, qemu_irq *pic)
+{
+    static int nb_ne2k = 0;
+
+    if (nb_ne2k == NE2000_NB_MAX)
+        return;
+    isa_ne2000_init(ne2000_io[nb_ne2k], pic[ne2000_irq[nb_ne2k]], nd);
+    nb_ne2k++;
+}
+#endif
+
+\f

+// FIXME copied from net.c
+
+static int parse_macaddr(uint8_t *macaddr, const char *p)
+{
+    int i;
+    char *last_char;
+    long int offset;
+
+    errno = 0;
+    offset = strtol(p, &last_char, 0);    
+    if (0 == errno && '\0' == *last_char &&
+            offset >= 0 && offset <= 0xFFFFFF) {
+        macaddr[3] = (offset & 0xFF0000) >> 16;
+        macaddr[4] = (offset & 0xFF00) >> 8;
+        macaddr[5] = offset & 0xFF;
+        return 0;
+    } else {
+        for(i = 0; i < 6; i++) {
+            macaddr[i] = strtol(p, (char **)&p, 16);
+            if (i == 5) {
+                if (*p != '\0')
+                    return -1;
+            } else {
+                if (*p != ':' && *p != '-')
+                    return -1;
+                p++;
+            }
+        }
+        return 0;    
+    }
+
+    return -1;
+}
+\f

+/* Host Configuration */
+
+struct dt_host {
+    struct tree *nic[MAX_NICS];
+    VLANState *nic_vlan[MAX_NICS];
+    struct tree *drive_ctrl[MAX_DRIVES];
+    BlockDriverState *drive_state[MAX_DRIVES];
+};
+
+static void
+dt_attach_nic(struct dt_host *host, int index,
+	      struct tree *nic, VLANState *vlan)
+{
+    host->nic[index] = nic;
+    host->nic_vlan[index] = vlan;
+}
+
+static VLANState *
+dt_find_vlan(struct tree *conf, struct dt_host *host)
+{
+    int i;
+
+    for (i = 0; i < MAX_NICS; i++) {
+	if (host->nic[i] == conf)
+	    return host->nic_vlan[i];
+    }
+    return NULL;
+}
+
+static void
+dt_attach_drive(struct dt_host *host, int index,
+		struct tree *controller, BlockDriverState *state)
+{
+    host->drive_ctrl[index] = controller;
+    host->drive_state[index] = state;
+}
+
+static void
+dt_drive_config(struct tree *conf, struct dt_host *host,
+		BlockDriverState *drive[], int n)
+{
+    int i, j;
+
+    j = 0;
+    for (i = 0; i < MAX_DRIVES; i++) {
+	if (host->drive_ctrl[i] != conf)
+	    continue;
+	assert(j < n);
+	drive[j++] = host->drive_state[i];
+    }
+}
+
+static void
+dt_print_host_config(struct dt_host *host)
+{
+    char buf[1024];
+    int i;
+
+    for (i = 0; i < MAX_NICS; i++) {
+	if (!host->nic[i])
+	    continue;
+	tree_path(host->nic[i], buf, sizeof(buf));
+	printf("nic#%d\tvlan %-4d\t%s\n",
+	       i, host->nic_vlan[i]->id, buf);
+    }
+
+    for (i = 0; i < MAX_DRIVES; i++) {
+	if (!host->drive_ctrl[i])
+	    continue;
+	tree_path(host->drive_ctrl[i], buf, sizeof(buf));
+	printf("drive#%d\t%-15s %s\n",
+	       i, bdrv_get_device_name(host->drive_state[i]), buf);
+    }
+}
+
+\f

+/* Device Interface */
+
+/*
+ * Device life cycle:
+ *
+ * 1. Configuration: config() method runs after parent's, except that
+ * kids are skipped when the parent's config() returns non-zero.  config()
+ * should initialize the device's private data from its configuration
+ * sub-tree.  It may edit the configuration sub-tree, and may declare
+ * initialization ordering constraints with tree_require_named().
+ * 
+ * 2. Initialization: init() method runs after parent's and after that
+ * of devices declared required by config().  It should not touch the
+ * configuration tree.
+ *
+ * 3. Start: start() method runs, order is unspecified.
+ * 
+ * Error handling in these driver methods: print to stderr and exit
+ * the program unsuccessfully.
+ *
+ * There is no device shutdown protocol yet.
+ */
+
+struct dt_device {
+    struct tree *conf;		/* configuration sub-tree */
+    struct dt_driver *drv;	/* device driver */
+    void *priv;			/* device private data */
+};
+
+struct dt_driver {
+    const char *name;
+    size_t privsz;		/* size of device private data */
+    struct dt_prop_spec *prop_spec; /* recognized conf node properties */
+    int (*config)(struct dt_device *, struct dt_host *);
+    void (*init)(struct dt_device *);
+    void (*start)(struct dt_device *);
+};
+
+static struct dt_driver dt_driver_table[];
+
+static struct dt_driver *
+dt_driver_by_name(const char *name)
+{
+    int i;
+
+    for (i = 0; dt_driver_table[i].name; i++) {
+	if (!strcmp(name, dt_driver_table[i].name))
+	    return &dt_driver_table[i];
+    }
+    return NULL;
+}
+
+static struct dt_device *
+dt_device_of(struct tree *conf)
+{
+    return tree_get_user(conf);
+}
+
+static struct dt_device *
+dt_new_device(struct tree *conf, struct dt_driver *drv)
+{
+    struct dt_device *dev;
+    struct tree_prop *prop;
+
+    dev = qemu_malloc(sizeof(*dev));
+    dev->conf = conf;
+    dev->drv = drv;
+    dev->priv = qemu_mallocz(drv->privsz);
+    tree_put_user(conf, dev);
+
+    TREE_FOREACH_PROP(prop, conf)
+	dt_parse_prop(dev, prop);
+
+    return dev;
+}
+
+static void
+dt_config(struct tree *conf, struct dt_host *host)
+{
+    struct dt_driver *drv;
+    struct dt_device *dev;
+    struct tree *kid;
+
+    drv = dt_driver_by_name(tree_node_name(conf));
+    if (!drv) {
+	fprintf(stderr, "No driver for device %s\n",
+		tree_node_name(conf));
+	exit(1);
+    }
+    dev = dt_new_device(conf, drv);
+    if (drv->config) {
+	if (drv->config(dev, host))
+	    return;
+    }
+
+    TREE_FOREACH_KID(kid, conf)
+	dt_config(kid, host);
+}
+
+static void
+dt_init_visitor(struct tree *node, void *arg)
+{
+    struct dt_device *dev = dt_device_of(node);
+
+    if (dev && dev->drv->init)
+	dev->drv->init(dev);
+}
+
+static void
+dt_init(struct tree *conf)
+{
+    tree_visit(conf, dt_init_visitor, NULL);
+}
+
+static void
+dt_start(struct tree *conf)
+{
+    struct dt_device *dev = dt_device_of(conf);
+    struct tree *kid;
+
+    if (dev && dev->drv->start)
+	dev->drv->start(dev);
+
+    TREE_FOREACH_KID(kid, conf)
+	dt_start(kid);
+}
+
+\f

+/* Device properties */
+
+/*
+ * This is for parsing configuration tree node properties into device
+ * private data.
+ */
+
+struct dt_prop_spec {
+    const char *name;
+    ptrdiff_t offs;		/* offset in device private data */
+    size_t size;		/* size there, for sanity checking */
+    int (*parse)(void *, const char *, struct dt_prop_spec *);
+};
+
+#define DT_PROP_SPEC_INIT(name, strty, member, fmt)			\
+    { name, offsetof(strty, member), sizeof(((strty *)0)->member),	\
+      dt_parse_##fmt }
+
+static struct dt_prop_spec *
+dt_prop_spec_by_name(struct dt_driver *drv, const char *name)
+{
+    struct dt_prop_spec *spec;
+
+    for (spec = drv->prop_spec; spec && spec->name; spec++) {
+	if (!strcmp(spec->name, name))
+	    return spec;
+    }
+    return NULL;
+}
+
+static void
+dt_parse_prop(struct dt_device *dev, struct tree_prop *prop)
+{
+    const char *name = tree_prop_name(prop);
+    size_t size;
+    const char *val = tree_prop_value(prop, &size);
+    struct dt_prop_spec *spec = dt_prop_spec_by_name(dev->drv, name);
+
+    if (!spec) {
+	fprintf(stderr, "A %s device has no property %s\n",
+		dev->drv->name, name);
+	exit(1);
+    }
+
+    if (memchr(val, 0, size) != val + size - 1
+	|| spec->parse((char *)dev->priv + spec->offs, val, spec) < 0) {
+	fprintf(stderr, "Bad value %.*s for property %s of device %s\n",
+		(int)size, val, name, dev->drv->name);
+	exit(1);
+    }
+}
+
+static int
+dt_parse_string(void *dst, const char *src, struct dt_prop_spec *spec)
+{
+    assert(spec->size == sizeof(char *));
+    *(const char **)dst = src;
+    return 0;
+}
+
+static int
+dt_parse_int(void *dst, const char *src, struct dt_prop_spec *spec)
+{
+    char *ep;
+    long val;
+
+    assert(spec->size == sizeof(int));
+    errno = 0;
+    val = strtol(src, &ep, 0);
+    if (*ep || ep == src || errno || (int)val != val)
+	return -1;
+    *(int *)dst = val;
+    return 0;
+}
+
+static int
+dt_parse_ram_addr_t(void *dst, const char *src, struct dt_prop_spec *spec)
+{
+    char *ep;
+    unsigned long val;
+
+    assert(spec->size == sizeof(ram_addr_t));
+    errno = 0;
+    val = strtoul(src, &ep, 0);
+    if (*ep || ep == src || errno || (ram_addr_t)val != val)
+	return -1;
+    *(ram_addr_t *)dst = val;
+    return 0;
+}
+
+static int
+dt_parse_macaddr(void *dst, const char *src, struct dt_prop_spec *spec)
+{
+    assert(spec->size == 6);
+    if (parse_macaddr(dst, src) < 0)
+	return -1;
+    return 0;
+}
+
+\f

+/* Interfacing with FDT */
+
+#ifdef HAVE_FDT
+
+static int dt_fdt_chk(int res);
+static void dt_subtree_to_fdt(const struct tree *conf, void *fdt);
+
+static void *
+dt_tree_to_fdt(const struct tree *conf)
+{
+    int sz = 1024 * 1024;	/* FIXME arbitrary limit */
+    void *fdt = qemu_malloc(sz);
+
+    dt_fdt_chk(fdt_create(fdt, sz));
+    dt_subtree_to_fdt(conf, fdt);
+    dt_fdt_chk(fdt_finish(fdt));
+    return fdt;
+}
+
+static void
+dt_subtree_to_fdt(const struct tree *conf, void *fdt)
+{
+    struct tree_prop *prop;
+    struct tree *kid;
+    const void *pv;
+    size_t sz;
+
+    dt_fdt_chk(fdt_begin_node(fdt, tree_node_name(conf)));
+    TREE_FOREACH_PROP(prop, conf) {
+	pv = tree_prop_value(prop, &sz);
+	dt_fdt_chk(fdt_property(fdt, tree_prop_name(prop), pv, sz));
+    }
+    TREE_FOREACH_KID(kid, conf)
+	dt_subtree_to_fdt(kid, fdt);
+    dt_fdt_chk(fdt_end_node(fdt));
+}
+
+static struct tree *
+dt_fdt_to_tree(const void *fdt)
+{
+    int offs, next, depth;
+    uint32_t tag;
+    struct fdt_property *prop;
+    struct tree *stack[32];	/* FIXME arbitrary limit */
+
+    stack[0] = NULL;		/* "parent" of root */
+    next = depth = 0;
+
+    for (;;) {
+	offs = next;
+	tag = fdt_next_tag(fdt, offs, &next);
+	switch (tag) {
+	case FDT_PROP:
+	    /*
+	     * libfdt apparently doesn't provide a way to get property
+	     * by offset, do it by hand
+	     */
+	    assert(0 < depth && depth < sizeof(stack) / sizeof(*stack));
+	    prop = (void *)((const char *)fdt + fdt_off_dt_struct(fdt) + offs);
+	    tree_put_prop(stack[depth],
+			  fdt_string(fdt, fdt32_to_cpu(prop->nameoff)),
+			  prop->data,
+			  fdt32_to_cpu(prop->len));
+	case FDT_NOP:
+	    break;
+	case FDT_BEGIN_NODE:
+	    depth++;
+	    assert(0 < depth && depth < sizeof(stack) / sizeof(*stack));
+	    stack[depth] = tree_new_kid(stack[depth-1],
+					fdt_get_name(fdt, offs, NULL),
+					NULL);
+	    break;
+	case FDT_END_NODE:
+	    depth--;
+	    break;
+	case FDT_END:
+	    dt_fdt_chk(next);
+	    return stack[1];
+	}
+    }
+}
+
+static int
+dt_fdt_chk(int res)
+{
+    if (res < 0) {
+	fprintf(stderr, "%s\n", fdt_strerror(res)); /* FIXME cryptic */
+	exit(1);
+    }
+    return res;
+}
+
+static void
+dt_fdt_test(struct tree *conf)
+{
+    void *fdt;
+
+    fdt = dt_tree_to_fdt(conf);
+    conf = dt_fdt_to_tree(fdt);
+    tree_print(conf);
+    qemu_free(fdt);
+}
+#else
+static void dt_fdt_test(struct tree *conf) { }
+#endif
+\f

+/* CPUs Driver */
+
+struct dt_device_cpus {
+    const char *model;
+    int num;
+};
+
+static struct dt_prop_spec dt_cpus_props[] = {
+    DT_PROP_SPEC_INIT("model", struct dt_device_cpus, model, string),
+    DT_PROP_SPEC_INIT("num", struct dt_device_cpus, num, int),
+};
+
+static void
+dt_cpus_init(struct dt_device *dev)
+{
+    struct dt_device_cpus *priv = dev->priv;
+    int i;
+    CPUState *env;
+
+    for(i = 0; i < priv->num; i++) {
+        env = cpu_init(priv->model);
+        if (!env) {
+            fprintf(stderr, "Unable to find x86 CPU definition\n");
+            exit(1);
+        }
+        if (i != 0)
+            env->halted = 1;
+        qemu_register_reset(main_cpu_reset, env);
+    }
+}
+
+\f

+/* Memory Ranges */
+
+struct dt_device_memrng {
+    target_phys_addr_t phys_addr;
+    ram_addr_t size;
+    ram_addr_t host_offs;
+    ram_addr_t flags;
+};
+
+static void
+dt_memrng(struct dt_device_memrng *rng,
+	  target_phys_addr_t phys_addr, ram_addr_t size,
+	  ram_addr_t host_offs, ram_addr_t flags)
+{
+    rng->phys_addr = phys_addr;
+    rng->size = size;
+    rng->host_offs = host_offs;
+    rng->flags = flags;
+}
+
+static void
+dt_memrng_ram(struct dt_device_memrng *rng,
+	      target_phys_addr_t phys_addr, ram_addr_t size)
+{
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), 0);
+}
+
+static void
+dt_memrng_rom(struct dt_device_memrng *rng,
+	      target_phys_addr_t phys_addr, ram_addr_t maxsz,
+	      const char *dir, const char *image, int top)
+{
+    char buf[1024];
+    int size;
+
+    snprintf(buf, sizeof(buf), "%s/%s", dir, image);
+    size = get_image_size(buf);
+    if (size < 0 || size > maxsz)
+	goto error;
+    if (top)
+	phys_addr = phys_addr + maxsz - size;
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), IO_MEM_ROM);
+    if (load_image(buf, phys_ram_base + rng->host_offs) != size)
+	goto error;
+    return;
+
+error:
+    fprintf(stderr, "qemu: could not load image '%s'\n", buf);
+    exit(1);
+}
+
+static void
+dt_memrng_init(struct dt_device_memrng *rng, int n)
+{
+    int i;
+
+    for (i = 0; i < n; i++)
+	cpu_register_physical_memory(rng[i].phys_addr, rng[i].size,
+				     rng[i].host_offs | rng[i].flags);
+}
+
+\f

+/* Memory Driver */
+
+struct dt_device_memory {
+    ram_addr_t ram_size;
+    struct dt_device_memrng *rng;
+    int nrng;
+    /* TODO want a real memory map here */
+    ram_addr_t below_4g, above_4g;
+};
+
+static struct dt_prop_spec dt_memory_props[] = {
+    DT_PROP_SPEC_INIT("ram", struct dt_device_memory, ram_size, ram_addr_t),
+};
+
+static int
+dt_memory_config(struct dt_device *dev, struct dt_host *host)
+{
+    /* TODO memory map hardcoded; get it from dev->conf instead */
+    struct dt_device_memory *priv = dev->priv;
+    struct dt_device_memrng *rng = qemu_malloc(sizeof(*rng) * 4);
+
+    if (priv->ram_size >= 0xe0000000) {
+        priv->above_4g = priv->ram_size - 0xe0000000;
+        priv->below_4g = 0xe0000000;
+    } else {
+        priv->below_4g = priv->ram_size;
+	priv->above_4g = 0;
+    }
+
+    dt_memrng_ram(&rng[0], 0, 0xa0000);
+    qemu_ram_alloc(0x60000);
+    dt_memrng_ram(&rng[1], 0x100000, priv->below_4g - 0x100000);
+    if (priv->above_4g)
+	abort();		/* TODO */
+    dt_memrng_rom(&rng[2], 0xe0000000, 0x20000000,
+		  bios_dir, BIOS_FILENAME, 1);
+				/* TODO get name from dev->conf */
+    dt_memrng(&rng[3], 0xe0000, 0x20000,
+	      rng[2].host_offs + rng[2].size - 0x20000, IO_MEM_ROM);
+    /* TODO option ROMs */
+
+    priv->rng = rng;
+    priv->nrng = 4;
+    return 0;
+}
+
+static void
+dt_memory_init(struct dt_device *dev)
+{
+    struct dt_device_memory *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, priv->nrng);
+    bochs_bios_init();
+}
+
+static ram_addr_t
+dt_memory_below_4g(struct tree *memory)
+{
+    struct dt_device *dev = dt_device_of(memory);
+    struct dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->below_4g;
+}
+
+static ram_addr_t
+dt_memory_above_4g(struct tree *memory)
+{
+    struct dt_device *dev = dt_device_of(memory);
+    struct dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->above_4g;
+}
+
+\f

+/* PC Miscellaneous Driver */
+
+/*
+ * This is a driver for a whole collection of devices.  Could be
+ * picked apart into separate drivers, I guess.
+ */
+
+struct dt_device_pc_misc {
+    const char *boot_device;
+    int apic;
+    int hpet;
+    qemu_irq *i8259;
+    BlockDriverState *fd[MAX_FD];
+};
+
+static struct dt_prop_spec dt_pc_misc_props[] = {
+    DT_PROP_SPEC_INIT("boot-device", struct dt_device_pc_misc, boot_device,
+		      string),
+};
+
+static int
+dt_pc_misc_config(struct dt_device *dev, struct dt_host *host)
+{
+    struct dt_device_pc_misc *priv = dev->priv;
+
+    priv->apic = 1;
+    priv->hpet = 1;
+    priv->i8259 = NULL;
+    dt_drive_config(dev->conf, host,
+		    priv->fd, sizeof(priv->fd) / sizeof(*priv->fd));
+    return 1;
+}
+
+static void
+dt_pc_misc_init(struct dt_device *dev)
+{
+    struct dt_device_pc_misc *priv = dev->priv;
+    CPUState *env;
+    qemu_irq *cpu_irq;
+    IOAPICState *ioapic;
+    PITState *pit;
+    int i;
+
+    if (priv->apic) {
+	for (env = first_cpu; env; env = env->next_cpu) {
+            env->cpuid_features |= CPUID_APIC;
+	    apic_init(env);
+	}
+    }
+
+    vmport_init();
+
+    cpu_irq = qemu_allocate_irqs(pic_irq_request, NULL, 1);
+    priv->i8259 = i8259_init(cpu_irq[0]);
+    ferr_irq = priv->i8259[13];
+
+    register_ioport_write(0x80, 1, 1, ioport80_write, NULL);
+    register_ioport_write(0xf0, 1, 1, ioportF0_write, NULL);
+
+    rtc_state = rtc_init(0x70, priv->i8259[8], 2000);
+    qemu_register_boot_set(pc_boot_set, rtc_state);
+
+    register_ioport_read(0x92, 1, 1, ioport92_read, NULL);
+    register_ioport_write(0x92, 1, 1, ioport92_write, NULL);
+
+    if (priv->apic) {
+        ioapic = ioapic_init();
+        pic_set_alt_irq_func(isa_pic, ioapic_set_irq, ioapic);
+    }
+
+    pit = pit_init(0x40, priv->i8259[0]);
+    pcspk_init(pit);
+    if (priv->hpet)
+        hpet_init(priv->i8259);
+    
+    for(i = 0; i < MAX_SERIAL_PORTS; i++) {
+        if (serial_hds[i]) {
+            serial_init(serial_io[i], priv->i8259[serial_irq[i]], 115200,
+                        serial_hds[i]);
+        }
+    }
+
+    for(i = 0; i < MAX_PARALLEL_PORTS; i++) {
+        if (parallel_hds[i]) {
+            parallel_init(parallel_io[i], priv->i8259[parallel_irq[i]],
+                          parallel_hds[i]);
+        }
+    }
+
+    i8042_init(priv->i8259[1], priv->i8259[12], 0x60);
+    DMA_init(0);
+
+    floppy_controller = fdctrl_init(priv->i8259[6], 2, 0, 0x3f0, priv->fd);
+}
+
+static void
+dt_pc_misc_start(struct dt_device *dev)
+{
+    struct dt_device_pc_misc *priv = dev->priv;
+    struct tree *memory = tree_node_by_name(dev->conf, "/memory");
+    struct tree *piix3 = tree_node_by_name(dev->conf, "/pci/piix3");
+
+    cmos_init(dt_memory_below_4g(memory),
+	      dt_memory_above_4g(memory),
+	      priv->boot_device,
+	      dt_piix3_hd(piix3));
+}
+
+static qemu_irq *
+dt_pc_misc_i8259(struct tree *pc_misc)
+{
+    struct dt_device *dev = dt_device_of(pc_misc);
+    struct dt_device_pc_misc *priv = dev->priv;
+    assert(dev->drv->init == dt_pc_misc_init);
+    return priv->i8259;
+}
+
+\f

+/* PCI Bus Driver */
+
+struct dt_device_pci {
+    PCIBus *bus;
+    struct tree *pc;
+};
+
+static int
+dt_pci_config(struct dt_device *dev, struct dt_host *host)
+{
+    struct dt_device_pci *priv = dev->priv;
+
+    priv->bus = NULL;
+    priv->pc = tree_require_named(dev->conf, "/pc-misc");
+    return 0;
+}
+
+static void
+dt_pci_init(struct dt_device *dev)
+{
+    struct dt_device_pci *priv = dev->priv;
+
+    priv->bus = i440fx_init(&i440fx_state, dt_pc_misc_i8259(priv->pc));
+}
+
+static void
+dt_pci_start(struct dt_device *dev)
+{
+    i440fx_init_memory_mappings(i440fx_state);
+}
+
+static void
+dt_must_be_on_pcibus(struct dt_device *dev)
+{
+    struct dt_device *bus = dt_device_of(tree_parent(dev->conf));
+
+    if (bus->drv->init != dt_pci_init) {
+	fprintf(stderr, "Device %s must be on a PCI bus\n", dev->drv->name);
+	exit(1);
+    }
+}
+
+static struct PCIBus *
+dt_get_pcibus(struct dt_device *dev)
+{
+    struct dt_device *bus = dt_device_of(tree_parent(dev->conf));
+
+    assert(bus->drv->init == dt_pci_init);
+    return ((struct dt_device_pci *)bus->priv)->bus;
+}
+
+\f

+/* PIIX3 Driver */
+
+struct dt_device_piix3 {
+    int devfn;
+    int acpi;
+    int usb;
+    struct tree *pc;
+    BlockDriverState *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
+};
+
+static int
+dt_piix3_config(struct dt_device *dev, struct dt_host *host)
+{
+    struct dt_device_piix3 *priv = dev->priv;
+
+    priv->devfn = -1;
+    priv->acpi = 1;
+    priv->usb = 1;
+    priv->pc = tree_require_named(dev->conf, "/pc-misc");
+    dt_drive_config(dev->conf, host,
+		    priv->hd, sizeof(priv->hd) / sizeof(*priv->hd));
+    dt_must_be_on_pcibus(dev);
+    return 1;
+}
+
+static void
+dt_piix3_init(struct dt_device *dev)
+{
+    struct dt_device_piix3 *priv = dev->priv;
+    PCIBus *pci_bus = dt_get_pcibus(dev);
+    qemu_irq *i8259 = dt_pc_misc_i8259(priv->pc);
+    int i;
+
+    priv->devfn = piix3_init(pci_bus, priv->devfn);
+
+    pci_piix3_ide_init(pci_bus, priv->hd, priv->devfn + 1, i8259);
+
+    if (priv->usb)
+	usb_uhci_piix3_init(pci_bus, priv->devfn + 2);
+
+    if (priv->acpi) {
+        uint8_t *eeprom_buf = qemu_mallocz(8 * 256); /* XXX: make this persistent */
+        i2c_bus *smbus;
+
+        /* TODO: Populate SPD eeprom data.  */
+        smbus = piix4_pm_init(pci_bus, priv->devfn + 3, 0xb100, i8259[9]);
+        for (i = 0; i < 8; i++)
+            smbus_eeprom_device_init(smbus, 0x50 + i, eeprom_buf + (i * 256));
+    }
+}
+
+static BlockDriverState **
+dt_piix3_hd(struct tree *piix3)
+{
+    struct dt_device *dev = dt_device_of(piix3);
+    struct dt_device_piix3 *priv = dev->priv;
+
+    assert(dev->drv->init == dt_piix3_init);
+    return priv->hd;
+}
+
+\f

+/* VGA Driver */
+
+struct dt_driver_vga {
+    const char *model;
+    const char *bios;
+    void (*init)(PCIBus *, uint8_t *, ram_addr_t, int);
+};
+
+static void
+pci_vmsvga_init_(PCIBus *bus, uint8_t *vga_ram_base,
+		 ram_addr_t vga_ram_offset, int vga_ram_size)
+{
+    pci_vmsvga_init(bus, vga_ram_base, vga_ram_offset, vga_ram_size);
+}
+
+static void
+pci_vga_init_(PCIBus *bus, uint8_t *vga_ram_base,
+	      ram_addr_t vga_ram_offset, int vga_ram_size)
+{
+    pci_vga_init(bus, vga_ram_base, vga_ram_offset, vga_ram_size, 0, 0);
+}
+
+static struct dt_driver_vga dt_driver_vga_table[] = {
+    { "cirrus", VGABIOS_CIRRUS_FILENAME, pci_cirrus_vga_init },
+    { "vms", VGABIOS_FILENAME, pci_vmsvga_init_ },
+    { "std", VGABIOS_FILENAME, pci_vga_init_ },
+    { NULL, NULL, NULL }
+};
+
+struct dt_device_vga {
+    const char *model;
+    ram_addr_t ram_size;
+    struct dt_device_memrng rng[1];
+    ram_addr_t ram_offs;
+    struct dt_driver_vga *vga_drv;
+};
+
+static struct dt_prop_spec dt_vga_props[] = {
+    DT_PROP_SPEC_INIT("model", struct dt_device_vga, model, string),
+    DT_PROP_SPEC_INIT("ram", struct dt_device_vga, ram_size, ram_addr_t),
+};
+
+static int
+dt_vga_config(struct dt_device *dev, struct dt_host *host)
+{
+    struct dt_device_vga *priv = dev->priv;
+    int i;
+
+    dt_memrng_rom(&priv->rng[0], 0xc0000, 0x10000,
+		  bios_dir, VGABIOS_CIRRUS_FILENAME, 0);
+				/* TODO get name from dev->conf */
+    priv->ram_offs = qemu_ram_alloc(priv->ram_size);
+
+    for (i = 0; dt_driver_vga_table[i].model; i++) {
+	if (!strcmp(dt_driver_vga_table[i].model, priv->model))
+	    break;
+    }
+    if (!dt_driver_vga_table[i].model) {
+	fprintf(stderr, "Unknown VGA model %s\n", priv->model);
+	exit(1);
+    }
+    priv->vga_drv = &dt_driver_vga_table[i];
+    dt_must_be_on_pcibus(dev);
+    return 0;
+}
+
+static void
+dt_vga_init(struct dt_device *dev)
+{
+    struct dt_device_vga *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, 1);
+    priv->vga_drv->init(dt_get_pcibus(dev),
+			phys_ram_base + priv->ram_offs,
+			priv->ram_offs, priv->ram_size);
+}
+
+\f

+/* NIC Driver */
+
+struct dt_device_nic {
+    NICInfo nd;
+};
+
+static struct dt_prop_spec dt_nic_props[] = {
+    DT_PROP_SPEC_INIT("model", struct dt_device_nic, nd.model, string),
+    DT_PROP_SPEC_INIT("mac", struct dt_device_nic, nd.macaddr, macaddr),
+    DT_PROP_SPEC_INIT("name", struct dt_device_nic, nd.name, string),
+};
+
+static int
+dt_nic_config(struct dt_device *dev, struct dt_host *host)
+{
+    struct dt_device_nic *priv = dev->priv;
+
+    priv->nd.vlan = dt_find_vlan(dev->conf, host);
+    dt_must_be_on_pcibus(dev);
+    return 0;
+}
+
+static void
+dt_nic_init(struct dt_device *dev)
+{
+    struct dt_device_nic *priv = dev->priv;
+
+    pci_nic_init(dt_get_pcibus(dev), &priv->nd, -1, NULL);
+}
+
+\f

+/* Machine Driver */
+
+static struct dt_driver dt_driver_table[] = {
+    { "", 0, NULL, NULL },
+    { "cpus", sizeof(struct dt_device_cpus), dt_cpus_props,
+      NULL, dt_cpus_init, NULL },
+    { "memory", sizeof(struct dt_device_memory), dt_memory_props,
+      dt_memory_config, dt_memory_init, NULL },
+    { "pc-misc", sizeof(struct dt_device_pc_misc), dt_pc_misc_props,
+      dt_pc_misc_config, dt_pc_misc_init, dt_pc_misc_start },
+    { "pci", sizeof(struct dt_device_pci), NULL,
+      dt_pci_config, dt_pci_init, dt_pci_start },
+    { "piix3", sizeof(struct dt_device_piix3), NULL,
+      dt_piix3_config, dt_piix3_init, NULL },
+    { "vga", sizeof(struct dt_device_vga), dt_vga_props,
+      dt_vga_config, dt_vga_init, NULL },
+    { "nic", sizeof(struct dt_device_nic), dt_nic_props,
+      dt_nic_config, dt_nic_init, NULL },
+    { NULL, 0, NULL, NULL, NULL }
+};
+
+static struct tree *
+dt_read_config(void)
+{
+    struct tree *root, *pci, *leaf;
+
+    /* TODO read from config file */
+    root = tree_new_kid(NULL, "", NULL);
+    leaf = tree_new_kid(root, "cpus", NULL);
+    tree_put_propf(leaf, "model", "%s", "qemu32");
+    leaf = tree_new_kid(root, "memory", NULL);
+    leaf = tree_new_kid(root, "pc-misc", NULL);
+    pci = tree_new_kid(root, "pci", NULL);
+    leaf = tree_new_kid(pci, "piix3", NULL);
+    return root;
+}
+
+/*
+ * Extract configuration from arguments and various global variables
+ * and put it into our machine and host configuration.
+ */
+static void
+dt_customize_config(struct tree *conf,
+		    struct dt_host *host,
+		    ram_addr_t ram_size, int vga_ram_size,
+		    const char *boot_device,
+		    const char *kernel_filename,
+		    const char *kernel_cmdline,
+		    const char *initrd_filename,
+		    const char *cpu_model)
+{
+    struct tree *pci = tree_node_by_name(conf, "/pci");
+    struct tree *node;
+    int i, index;
+
+    node = tree_node_by_name(conf, "/cpus");
+    tree_put_propf(node, "num", "%d", smp_cpus);
+    if (cpu_model)
+	tree_put_propf(node, "model", "%s", cpu_model);
+
+    node = tree_node_by_name(conf, "/memory");
+    tree_put_propf(node, "ram", "%#lx", (unsigned long)ram_size);
+
+    node = tree_node_by_name(conf, "/pc-misc");
+    tree_put_propf(node, "boot-device", "%s", boot_device);
+
+    /* Insert VGA node */
+    if (cirrus_vga_enabled || vmsvga_enabled || std_vga_enabled) {
+	node = tree_new_kid(pci, "vga", NULL);
+	tree_put_propf(node, "model", "%s",
+			  cirrus_vga_enabled ? "cirrus" :
+			  vmsvga_enabled ? "vms" : "std");
+	tree_put_propf(node, "ram", "%#x", vga_ram_size);
+    }
+
+    /* Insert NIC nodes, connect to VLANs */
+    for(i = 0; i < nb_nics; i++) {
+	/* TODO non-PCI NICs */
+	struct NICInfo *n = &nd_table[i];
+
+	node = tree_new_kid(pci, "nic", NULL);
+	tree_put_propf(node, "mac", "%02x:%02x:%02x:%02x:%02x:%02x",
+		       n->macaddr[0], n->macaddr[1], n->macaddr[2],
+		       n->macaddr[3], n->macaddr[4], n->macaddr[5]);
+	tree_put_propf(node, "model", "%s",
+		       n->model ? n->model : "ne2k_pci");
+	if (n->name)
+	    tree_put_propf(node, "name", "%s", n->name);
+	dt_attach_nic(host, i, node, n->vlan);
+    }
+
+    /* Connect drives to their controller nodes */
+    /* IDE */
+    node = tree_node_by_name(pci, "piix3");
+    for(i = 0; i < MAX_IDE_BUS * MAX_IDE_DEVS; i++) {
+        index = drive_get_index(IF_IDE, i / MAX_IDE_DEVS, i % MAX_IDE_DEVS);
+	if (index != -1)
+	    dt_attach_drive(host, index, node, drives_table[index].bdrv);
+    }
+    /* Floppy */
+    node = tree_node_by_name(conf, "/pc-misc");
+    for(i = 0; i < MAX_FD; i++) {
+        index = drive_get_index(IF_FLOPPY, 0, i);
+	if (index != -1)
+	    dt_attach_drive(host, index, node, drives_table[index].bdrv);
+    }
+
+    /* Unimplemented stuff */
+    if (kernel_filename)
+	abort();		/* TODO */
+}
+
+static void
+pc_init_dt(ram_addr_t ram_size, int vga_ram_size,
+	   const char *boot_device,
+	   const char *kernel_filename,
+	   const char *kernel_cmdline,
+	   const char *initrd_filename,
+	   const char *cpu_model)
+{
+    struct tree *conf;
+    struct dt_host host;
+
+    conf = dt_read_config();
+    if (!conf)
+	exit(1);
+    tree_print(conf);
+    memset(&host, 0, sizeof(host));
+    dt_customize_config(conf, &host, ram_size, vga_ram_size, boot_device,
+			kernel_filename, kernel_cmdline, initrd_filename,
+			cpu_model);
+    dt_config(conf, &host);
+    tree_print(conf);
+    dt_print_host_config(&host);
+    dt_fdt_test(conf);
+    dt_init(conf);
+    dt_start(conf);
+}
+
+QEMUMachine pcdt_machine = {
+    .name = "pcdt",
+    .desc = "Standard PC (device tree)",
+    .init = pc_init_dt,
+    .ram_require = VGA_RAM_SIZE + PC_MAX_BIOS_SIZE,
+    .max_cpus = 255,
+};
diff --git a/hw/pc.c b/hw/pc.c
index 176730e..fc9ee20 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -57,21 +57,21 @@ static fdctrl_t *floppy_controller;
 static RTCState *rtc_state;
 static PITState *pit;
 static IOAPICState *ioapic;
-static PCIDevice *i440fx_state;
+PCIDevice *i440fx_state;
 
 static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
 {
 }
 
 /* MSDOS compatibility mode FPU exception support */
-static qemu_irq ferr_irq;
+qemu_irq ferr_irq;
 /* XXX: add IGNNE support */
 void cpu_set_ferr(CPUX86State *s)
 {
     qemu_irq_raise(ferr_irq);
 }
 
-static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
 {
     qemu_irq_lower(ferr_irq);
 }
diff --git a/target-i386/machine.c b/target-i386/machine.c
index faab2eb..5a2a0c2 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -7,6 +7,8 @@
 
 void register_machines(void)
 {
+    extern QEMUMachine pcdt_machine;
+    qemu_register_machine(&pcdt_machine);
     qemu_register_machine(&pc_machine);
     qemu_register_machine(&isapc_machine);
 }
diff --git a/tree.c b/tree.c
new file mode 100644
index 0000000..ec7af0b
--- /dev/null
+++ b/tree.c
@@ -0,0 +1,333 @@
+/* Ye Olde Decorated Tree */
+
+#include <assert.h>
+#include <ctype.h>
+#include <errno.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include "tree.h"
+#include "qemu-common.h"
+#include "sys-queue.h"
+
+struct tree {
+    const char *name;
+    LIST_HEAD(, tree_prop) props;
+    struct tree *parent;
+    TAILQ_HEAD(, tree) kids;
+    TAILQ_ENTRY(tree) siblings;
+    void *user;
+    LIST_HEAD(, tree) reqs;
+    LIST_ENTRY(tree) reqlink;
+    int visit;
+};
+
+struct tree_prop {
+    const char *name;
+    const void *val;
+    int sz;
+    struct tree *owner;
+    LIST_ENTRY(tree_prop) link;
+};
+
+struct tree *
+tree_new_kid(struct tree *parent, const char *name, void *user)
+{
+    struct tree *kid = qemu_malloc(sizeof(*kid));
+
+    assert(parent || !*name);
+    kid->name = name;
+    LIST_INIT(&kid->props);
+    kid->parent = parent;
+    TAILQ_INIT(&kid->kids);
+    if (parent)
+	TAILQ_INSERT_TAIL(&parent->kids, kid, siblings);
+    kid->user = user;
+    LIST_INIT(&kid->reqs);
+    kid->visit = 0;
+
+    return kid;
+}
+
+const char *
+tree_node_name(const struct tree *node)
+{
+    return node->name;
+}
+
+static struct tree *
+tree_kid_by_name(const struct tree *dt, const char *name)
+{
+    const char *slash = strchr(name, '/');
+    size_t len = slash ? slash - name : strlen(name);
+    struct tree *kid;
+
+    TAILQ_FOREACH(kid, &dt->kids, siblings) {
+	if (!memcmp(kid->name, name, len) && kid->name[len] == 0)
+	    return kid;
+    }
+    return NULL;
+}
+
+struct tree *
+tree_node_by_name(const struct tree *node, const char *name)
+{
+    struct tree *kid;
+    size_t len;
+
+    if (name[0] == '/') {
+	for (; node->parent; node = node->parent) ;
+	name++;
+    }
+
+    if (name[0] == 0)
+	return (struct tree *)node;
+
+    kid = tree_kid_by_name(node, name);
+    if (!kid)
+	return NULL;
+
+    len = strlen(kid->name);
+    if (name[len] == 0)
+	return kid;
+    assert (name[len] == '/');
+
+    while (name[len] == '/') len++;
+    return tree_node_by_name(kid, name + len);
+}
+
+struct tree_prop *
+tree_first_prop(const struct tree *node)
+{
+    return LIST_FIRST(&node->props);
+}
+
+struct tree_prop *
+tree_next_prop(const struct tree_prop *prop)
+{
+    return LIST_NEXT(prop, link);
+}
+
+struct tree_prop *
+tree_get_prop(const struct tree *node, const char *name)
+{
+    struct tree_prop *prop;
+
+    LIST_FOREACH(prop, &node->props, link) {
+	if (!strcmp(prop->name, name))
+	    return prop;
+    }
+    return NULL;
+}
+
+const char *
+tree_get_prop_s(const struct tree *node, const char *name)
+{
+    struct tree_prop *prop = tree_get_prop(node, name);
+    if (!prop
+	|| memchr(prop->val, 0, prop->sz) != prop->val + prop->sz - 1) {
+	errno = EINVAL;
+	return NULL;
+    }
+    return prop->val;
+}
+
+const char *
+tree_prop_name(const struct tree_prop *prop)
+{
+    return prop->name;
+}
+
+const void *
+tree_prop_value(const struct tree_prop *prop, size_t *size)
+{
+    if (size)
+	*size = prop->sz;
+    return prop->val;
+}
+
+void
+tree_put_prop(struct tree *node, const char *name,
+	      const void *val, size_t sz)
+{
+    struct tree_prop *prop;
+
+    prop = tree_get_prop(node, name);
+    if (!prop) {
+	prop = qemu_malloc(sizeof(*prop));
+	prop->name = name;
+	prop->owner = node;
+	LIST_INSERT_HEAD(&node->props, prop, link);
+    }
+    /* FIXME need a destructor for val */
+    prop->val = val;
+    prop->sz = sz;
+}
+
+void
+tree_put_propf(struct tree *node, const char *name, const char *fmt, ...)
+{
+    va_list ap;
+    size_t len;
+    char *buf;
+
+    va_start(ap, fmt);
+    len = vsnprintf(NULL, 0, fmt, ap);
+    va_end(ap);
+
+    buf = qemu_malloc(len + 1);
+    va_start(ap, fmt);
+    vsnprintf(buf, len + 1, fmt, ap);
+    va_end(ap);
+
+    tree_put_prop(node, name, buf, len + 1);
+}
+
+void
+tree_put_user(struct tree *node, void *user)
+{
+    node->user = user;
+}
+
+void *
+tree_get_user(const struct tree *node)
+{
+    return node->user;
+}
+
+struct tree *
+tree_parent(const struct tree *node)
+{
+    return node->parent;
+}
+
+struct tree *
+tree_first_kid(const struct tree *node)
+{
+    return TAILQ_FIRST(&node->kids);
+}
+
+struct tree *
+tree_sibling(const struct tree *node)
+{
+    return TAILQ_NEXT(node, siblings);
+}
+
+void
+tree_require(struct tree *node, struct tree *req)
+{
+    LIST_INSERT_HEAD(&node->reqs, req, reqlink);
+}
+
+struct tree *
+tree_require_named(struct tree *node, const char *reqname)
+{
+    struct tree *req = tree_node_by_name(node, reqname);
+    tree_require(node, req);
+    return req;
+}
+
+static void
+tree_do_visit(struct tree *node,
+	      void (*fun)(struct tree *, void *arg),
+	      void *arg, int visit)
+{
+    struct tree *req, *kid;
+
+    assert(node->visit < visit - 1);
+    node->visit = visit - 1;
+    if (node->parent && node->parent->visit < visit)
+	tree_do_visit(node->parent, fun, arg, visit);
+    LIST_FOREACH(req, &node->reqs, reqlink) {
+	if (req->visit < visit)
+	    tree_do_visit(req, fun, arg, visit);
+    }
+    node->visit = visit;
+    fun(node, arg);
+    TAILQ_FOREACH(kid, &node->kids, siblings) {
+	if (kid->visit < visit - 1)
+	    tree_do_visit(kid, fun, arg, visit);
+    }
+}
+
+void
+tree_visit(struct tree *node,
+	   void (*fun)(struct tree *, void *arg),
+	   void *arg)
+{
+    static int visit;
+
+    visit += 2;
+    tree_do_visit(node, fun, arg, visit);
+}
+
+int
+tree_path(const struct tree *node, char *buf, size_t bufsz)
+{
+    char *p;
+    const struct tree *np;
+    size_t len, res;
+
+    p = buf + bufsz;
+    res = 0;
+    for (np = node; np->parent; np = np->parent) {
+	len = 1 + strlen(np->name);
+	res += len;
+	if (res >= bufsz)
+	    continue;
+	p -= len;
+	memcpy(p + 1, np->name, len - 1);
+	p[0] = '/';
+    }
+
+    if (res < bufsz) {
+	memcpy(buf, p, res);
+	buf[res] = 0;
+    }
+
+    return res;
+}
+
+static void
+tree_print_sub(const struct tree *node, int indent)
+{
+    int i, use_str, sep;
+    const unsigned char *pv;
+    struct tree_prop *prop;
+    struct tree *kid;
+
+    printf("%*s%s {\n", indent, "", node->name[0] ? node->name : "/");
+    LIST_FOREACH(prop, &node->props, link) {
+	printf("%*s%s", indent + 4, "", prop->name);
+	pv = prop->val;
+	if (pv) {
+	    printf(" = ");
+	    use_str = pv[prop->sz - 1] == 0;
+	    for (i = 0; i < prop->sz - 1; i++) {
+		if (!isprint(pv[i]))
+		    use_str = 0;
+	    }
+	    if (use_str)
+		printf("\"%s\"", (const char *)prop->val);
+	    else {
+		sep = '[';
+		for (i = 0; i < prop->sz; i++) {
+		    printf("%c%02x", sep, pv[i]);
+		    sep = ' ';
+		}
+		printf("]");
+	    }
+	}
+	printf(";\n");
+    }
+    TAILQ_FOREACH(kid, &node->kids, siblings)
+	tree_print_sub(kid, indent + 4);
+    printf("%*s};\n", indent, "");
+}
+
+void
+tree_print(const struct tree *node)
+{
+    tree_print_sub(node, 0);
+}
diff --git a/tree.h b/tree.h
new file mode 100644
index 0000000..092350b
--- /dev/null
+++ b/tree.h
@@ -0,0 +1,46 @@
+#ifndef TREE_H
+#define TREE_H
+
+#include <stddef.h>
+
+struct tree;
+struct tree_prop;
+
+struct tree *tree_new_kid(struct tree *parent, const char *name, void *user);
+const char *tree_node_name(const struct tree *node);
+struct tree *tree_node_by_name(const struct tree *node,
+			       const char *name);
+
+struct tree_prop *tree_first_prop(const struct tree *node);
+struct tree_prop *tree_next_prop(const struct tree_prop *prop);
+#define TREE_FOREACH_PROP(var, node) \
+    for (var = tree_first_prop(node); var; var = tree_next_prop(var))
+struct tree_prop *tree_get_prop(const struct tree *node, const char *name);
+const char *tree_get_prop_s(const struct tree *node, const char *name);
+const char *tree_prop_name(const struct tree_prop *prop);
+const void *tree_prop_value(const struct tree_prop *prop, size_t *size);
+void tree_put_prop(struct tree *node, const char *name,
+		   const void *val, size_t sz);
+void tree_put_propf(struct tree *node, const char *name,
+		    const char *fmt, ...)
+    __attribute__((format(printf,3,4)));
+
+void tree_put_user(struct tree *node, void *user);
+void *tree_get_user(const struct tree *node);
+
+struct tree *tree_parent(const struct tree *node);
+struct tree *tree_first_kid(const struct tree *node);
+struct tree *tree_sibling(const struct tree *node);
+#define TREE_FOREACH_KID(var, node)					\
+    for (var = tree_first_kid(node); var; var = tree_sibling(var))
+
+void tree_require(struct tree *node, struct tree *req);
+struct tree *tree_require_named(struct tree *node, const char *reqname);
+void tree_visit(struct tree *node,
+		void (*fun)(struct tree *, void *arg),
+		void *arg);
+
+int tree_path(const struct tree *node, char *buf, size_t bufsz);
+void tree_print(const struct tree *node);
+
+#endif
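
The ordering rule behind tree_visit() in the patch above (a node is
visited only after its parent and the nodes it requires) can be
sketched in isolation.  This is a simplified illustration, not the
patch's code: struct vnode, visit_deps_first() and the three-state flag
are hypothetical names, each node has at most one requirement, and the
descent into kids that tree_do_visit() performs is left out.  Where the
patch asserts on a dependency cycle, the sketch returns -1.

```c
#include <assert.h>
#include <stddef.h>

enum { NEW, ACTIVE, DONE };

/* Hypothetical node with one parent and at most one requirement. */
struct vnode {
    const char *name;
    struct vnode *parent;
    struct vnode *req;
    int state;                  /* NEW, ACTIVE (in progress) or DONE */
};

/*
 * Append n's name to order[] after its parent and its requirement.
 * Returns 0 on success, -1 if a dependency cycle is found.
 */
static int visit_deps_first(struct vnode *n, const char **order,
                            size_t *count)
{
    assert(n && order && count);
    if (n->state == DONE)
        return 0;               /* already emitted */
    if (n->state == ACTIVE)
        return -1;              /* we came back to a node in progress */
    n->state = ACTIVE;
    if (n->parent && visit_deps_first(n->parent, order, count) < 0)
        return -1;
    if (n->req && visit_deps_first(n->req, order, count) < 0)
        return -1;
    n->state = DONE;
    order[(*count)++] = n->name;
    return 0;
}
```

With a root node, a "pci" kid, and a "pc-misc" kid that requires "pci"
(an invented dependency, purely for illustration), visiting "pc-misc"
first still emits the root, then "pci", then "pc-misc".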

^ permalink raw reply related	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-16  3:42                   ` David Gibson
@ 2009-02-16 16:39                       ` Markus Armbruster
  -1 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-16 16:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: devicetree-discuss

David Gibson <david@gibson.dropbear.id.au> writes:

> On Fri, Feb 13, 2009 at 12:26:28PM +0100, Markus Armbruster wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>> > On Thu, Feb 12, 2009 at 11:26:12AM +0100, Markus Armbruster wrote:
>> >> Hollis Blanchard <hollisb@us.ibm.com> writes:
> [snip]
>> > dtc and libfdt is a good place to start, if you haven't yet
>> > investigated them:
>> > 	git://git.jdl.com/software/dtc.git
>> > Note that although they're distributed together as one tree, dtc and
>> > libfdt are essentially independent pieces of software.  dtc converts
>> > device trees between various formats, dts and dtb in particular.
>> >
>> > libfdt does a number of the things you mention with flat trees -
>> > get/set properties, build trees, traverse etc.  If it doesn't do
>> > everything you need, we can probably extend it so that it does: I want
>> > libfdt to be *the* library for manipulating trees in the fdt format.
>> > It's designed to be easy to embed in other packages for this reason,
>> > although it does have some usage peculiarities because in particular
>> > it's possible to integrate into very limited environments like
>> > firmwares.
>> >
>> > [Jon Loeliger is the current maintainer of dtc and libfdt, but I
>> > originally wrote both of them - I know as much about them as anyone
>> > does]
>> 
>> Okay, I looked at dtc and libfdt again, a bit more closely.  I'm sure
>> there's plenty of ignorance left in me, so please correct me when I'm
>> babbling nonsense.
>
> Sure.  So, I realize that there are two different questions here:
> 	a) Is IEEE1275 a good starting point for the content of a
> decorated tree for configuring qemu.
>
> Personally, I suspect the answer to this is yes, but more information
> might convince me otherwise.

I think it's simply too early to call.  We're learning as we go.

> 	b) Is the flattened tree format for representing IEEE1275-like
> trees useful for qemu.
>
> Personally, I think this is a "maybe".  More on this below.
>
> Actually, on consideration there's a third question, too:
> 	c) Are the extensions / simplifications / adjustments we've
> made to IEEE1275 conventions in the context of flattened trees also
> useful and appropriate for qemu-configuration tree.
>
> I think if the answer to (a) is yes, then the answer to (c) is yes,
> too.

Sounds fair to me, but I'm hardly qualified to judge.

>> FDT is a "flattened tree", i.e. a tree data structure laid out in a
>> block of memory in a clever way to make it compact and easily
>
> That's correct.
>
>> relocatable.  I understand why these are important requirements for
>> passing information through bootloader to kernel.  They're irrelevant,
>> however, for use as QEMU configuration.
>
> That's probably largely true.
>
>> You can identify an FDT node by node offset or node name.  The node
>> offset can change when you add or delete nodes or properties.
>
> Correct.
>
>> You want everyone to use libfdt for manipulating FDTs.  I think that's
>> entirely sensible.  What I still don't get is something else: Why use
>> FDT for QEMU configuration in the first place?  Let me explain.
>
> Yeah, I see your point, hence my "maybe" to (b) above.  There's no
> obvious call for the fdt format in qemu, but I can see a couple of
> minor things that might make it worthwhile: First, if qemu ever does
> want to record its configuration tree persistently - to be passed
> between programs, or between invocations of a program - then it's
> probably better to use the established fdt format rather than creating
> a new one, even if fdt isn't designed particularly towards qemu's
> purposes.  Second, the existing code / tools for working with the fdt
> format *might* be sufficiently useful to make it worth using.
>
> [Note also that the fdt tools will mostly work fine even if the tree
> content is *not* very IEEE1275-like]
>
>> I think we have two distinct problems: the need for a flexible,
>> expressive QEMU machine configuration file and a virtual device
>> configuration machinery driven by it, and the need for an FDT to pass to
>> a PowerPC kernel.  The two may be related, but they are not identical.
>> 
>> Let's pretend for a minute the latter need doesn't exist.
>> 
>> QEMU machine configuration wants to be a decorated tree: a tree of named
>> nodes with named properties.
>> 
>> IEEE 1275 is a standard describing a special kind of decorated tree.
>> Other kinds can be created with a binding.  If we create a suitable
>> binding, we can surely cast our configuration trees in the IEEE 1275
>> framework.
>
> That's not quite what "binding" usually means in the 1275 context, but
> I think the point is right enough.
>
>> But what would that buy us?  This is an honest question, born out of my
>> relative ignorance of IEEE 1275.  Mind that we're still busily ignoring
>> the need for an FDT to pass to a kernel, so "it makes it easier to
>> create an FDT for the kernel" doesn't count here (it counts elsewhere).
>
> I think the idea behind using IEEE1275-like trees is that there is
> significant overlap between the device information that IEEE1275
> represents, and the device information which is configurable in qemu.
> Ultimately whether it buys you enough depends on how large that
> overlap is.

I think that's fair.

I believe we don't quite know yet whether the overlap will make it
worthwhile.

One way to approach this is to assume it will until proven wrong.  You
start with an IEEE 1275 description of the machine, and extend or adapt
it as you go.  My problem with that is that we don't have such
descriptions for the machines that interest me.  Developing them is a
big step that pays no immediate benefits, but blocks the little steps
that do pay.  Moreover, without a *real* user of the description, I'd
likely develop something that looks like IEEE 1275 to me, but isn't.  If
it turns out that IEEE 1275 is not worth it, tough, we already paid for
it.

Another way to approach this is to admit we don't know enough and punt
the decision until we do.  Start with the beneficial baby steps.  Limit
the machine description business to what is required for the baby steps,
making a best effort to stay close to IEEE 1275 structurally.  If it
turns out that IEEE 1275 is worth it, we do whatever is left to make the
descriptions conform to it.

I'm much more comfortable with the second approach.

>> FDTs are a special representation of IEEE 1275 trees in memory, designed
>> to be compact and relocatable.  But that comes at a price: nodes move
>> around when the tree changes.  The only real node id is the full name.
>
> Or phandle, for those nodes which have one.

Right, forgot about those.

>> This is not the common representation of decorated trees in C programs,
>> and for a reason.  It's simpler to represent edges as C pointers.  Not
>> the least advantage of that is notation: "->" beats a function call in
>> legibility hands down.
>
> Yes.  If there's enough manipulation of the tree, then you're
> generally better off having a "live" format which uses pointers,
> whether or not the fdt format is used at some stage in the process.
> Both the kernel and dtc (when taking fdt input) convert the flattened
> tree into a "live" representation internally.

Not surprising.
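
To make the notational difference concrete, here is a minimal sketch of
a pointer-linked "live" tree next to a lookup by path.  All names in it
(struct node, node_add_kid, node_by_path, struct device) are
hypothetical and exist only for this illustration; it is neither the
struct tree from the prototype nor anything from libfdt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal "live" node: edges are plain C pointers. */
struct node {
    const char *name;
    struct node *parent;
    struct node *first_kid;
    struct node *next_sibling;
};

/* Link kid under parent (prepends; sibling order does not matter here). */
static void node_add_kid(struct node *parent, struct node *kid)
{
    assert(parent && kid);
    kid->parent = parent;
    kid->next_sibling = parent->first_kid;
    parent->first_kid = kid;
}

/* Look up a node by absolute path such as "/pci/piix3"; NULL if absent. */
static struct node *node_by_path(struct node *root, const char *path)
{
    struct node *cur = root;

    while (cur && *path) {
        struct node *kid;
        size_t len;

        while (*path == '/')
            path++;
        len = strcspn(path, "/");
        if (len == 0)
            break;              /* only trailing slashes were left */
        for (kid = cur->first_kid; kid; kid = kid->next_sibling) {
            if (strlen(kid->name) == len && !memcmp(kid->name, path, len))
                break;
        }
        cur = kid;              /* NULL when no kid matched */
        path += len;
    }
    return cur;
}

/* A device that keeps a direct pointer to its configuration node. */
struct device {
    struct node *conf;
};
```

With the pointer stored, getting from a device to its parent bus node
is just dev->conf->parent.  If only names are stable, every such step
is a node_by_path() call, plus the string handling needed to build the
path.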

>> Example: the QEMU device data type needs to refer to its device node in
>> the configuration tree.  If that tree is coded the plain old way, you
>> store a pointer to the node and follow that.  If it is an FDT, then you
>> have to store the full node name, and look up the node by name.  I find
>> that tedious and verbose.
>
> Um.. I don't really follow your example.  But I think I see your
> point.  How problematic the flattened format is for this depends a lot
> on exactly what you need to do with it.  Sometimes it's much easier to
> avoid the flattened tree altogether, or transcribe it to a live
> format.  Other times, the tree manipulation is simple enough that it's
> easier to leave it flat (one example, for phases of the program where
> the tree is read-only, which could be a lot for a configuration tree,
> then node offsets *can* safely be used like pointers).

The machines I care for come with many optional and configurable parts.
We select the basic machine type with command line option -M, and
configure the rest with more command line options.  I figure we want to
keep supporting these options, at least for a while.

I believe the best way to deal with that is start with a basic tree
selected by -M, then modify it according to the other options.  So,
there's a fair amount of configuration tree mutation.
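
A rough sketch of that two-step flow, in the spirit of
dt_read_config() and dt_customize_config() from the prototype but not
taken from it: struct conf_node, conf_put(), conf_get(), conf_base_pc()
and conf_apply_options() are made-up names, a single flat property bag
stands in for a whole tree, and the default "num" of 1 is an
assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PROPS 8

/* Hypothetical property bag standing in for one configuration node. */
struct conf_node {
    const char *name;
    int nprops;
    struct {
        const char *key;
        char val[32];
    } props[MAX_PROPS];
};

/* Set or overwrite a property; returns 0 on success, -1 if full. */
static int conf_put(struct conf_node *n, const char *key, const char *val)
{
    int i;

    assert(n && key && val);
    for (i = 0; i < n->nprops; i++) {
        if (!strcmp(n->props[i].key, key))
            break;
    }
    if (i == n->nprops) {
        if (n->nprops == MAX_PROPS)
            return -1;
        n->props[n->nprops++].key = key;
    }
    snprintf(n->props[i].val, sizeof(n->props[i].val), "%s", val);
    return 0;
}

/* Return the value of a property, or NULL if it is not set. */
static const char *conf_get(const struct conf_node *n, const char *key)
{
    int i;

    for (i = 0; i < n->nprops; i++) {
        if (!strcmp(n->props[i].key, key))
            return n->props[i].val;
    }
    return NULL;
}

/* Step 1: the defaults that the machine type (-M) would select. */
static void conf_base_pc(struct conf_node *cpus)
{
    memset(cpus, 0, sizeof(*cpus));
    cpus->name = "cpus";
    conf_put(cpus, "model", "qemu32");
    conf_put(cpus, "num", "1");     /* assumed default */
}

/* Step 2: mutate the base description according to other options. */
static void conf_apply_options(struct conf_node *cpus, int smp_cpus,
                               const char *cpu_model)
{
    char buf[16];

    snprintf(buf, sizeof(buf), "%d", smp_cpus);
    conf_put(cpus, "num", buf);
    if (cpu_model)
        conf_put(cpus, "model", cpu_model);
}
```

The point of the sketch is only that the base description gets edited
in place after it has been built, which is cheap on a pointer-linked
structure.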

>> My point is: the question how to represent our decorated tree in memory
>> is entirely separate from the question of the tree's structure.  Just
>> because you want your tree to conform to IEEE 1275 doesn't mean you want
>> your tree flat at all times.
>
> Absolutely, yes.
>
>
>> Now let's examine how QEMU machine configuration and FDT machine
>> descriptions for kernels are related.
>> 
>> In a way, both can be regarded as copies of a complete machine
>> description with lots of stuff pruned.  Except the complete machine
>> description doesn't exist.  Because there is no use for it.
>> 
>> FDT routinely prunes stuff like PCI and USB devices, because those are
>> better probed.
>> 
>> QEMU configuration should certainly prune everything that is not
>> actually configurable.
>> 
>> To go from QEMU configuration to FDT we therefore may want to prune
>> superfluous stuff, to keep it compact,
>
> Not necessarily.  The kernel should be fine to deal with a tree that
> has complete information, even if it doesn't need it, since that's
> what a real OF implementation provides.

Well, wasn't compactness one of the reasons to flatten it in the first
place?

>>  and we definitely have to add lots
>> of stuff that has no place in configuration.
>
> Yes.  Well.. whether this is a good plan depends critically on how big
> that "lots" really is.

I suspect the only way to find out is to try.

>>  Compared to that task, a
>> change of representation seems trivial.  I figure we want to copy the
>> tree anyway, because we need to edit it pretty drastically.
>> 
>> It's not obvious to me whether it makes sense to create the FDT from the
>> QEMU configuration automatically.  If we simulate a specific board, the
>> FDT is pretty fixed, isn't it?  Much of the configurable stuff could be
>> precisely in those parts that are omitted from FDT: PCI devices and
>> such.
>
> Well.. you definitely want to create the FDT passed to the kernel from
> the qemu configuration.  But whether that's best done by essentially
> transcribing a configuration tree which is in a similar format, or
> just using the configuration tree info to poke the changeable bits in a
> "skeleton" FDT for the relevant machine is not so clear.
>
> Possibly.  I'm not familiar enough with the various qemu supported
> machine models to say.

Familiarity with all of them is a tall order...

>> >> * Provide an example tree describing a bare-bones PC, like the one in my
>> >>   prototype: CPU, RAM, BIOS, PIC, APIC, IOAPIC, PIT, DMA, UART, parallel
>> >>   port, floppy controller, CMOS & RTC, a20 gate (port 92) and other
>> >>   miscellanous I/O ports, i440fx, PIIX3 (ISA bridge, IDE, USB, ACPI),
>> >>   Cirrus VGA with BIOS, some PCI NIC.  This gives us all an idea of the
>> >>   tree structure.  Morphing that into something suitable for QEMU
>> >>   configuration shouldn't be too hard then, just an exercice in
>> >>   redecorating the tree.
>> >
>> > I don't off hand know any trees for a PC system.  There are a bunch of
>> > example trees for powerpc systems in arch/powerpc/boot/dts in the
>> > kernel tree.  A few of those, such as prep, at least have parts which
>> > somewhat resemble a PC.  I believe the OLPC also has OF; that would be
>> > an example OF tree for an x86 machine, if not a typical PC.
>> 
>> Could you point me to a specific file?  I grepped for prep and OLPC, no
>> luck.
>
> Oh, sorry, the prep tree hasn't gone into mainline yet.  But I believe
> Mitch Bradley supplied a PC tree later in the thread, which would be
> better for your purposes, anyway.

Got that, haven't digested it yet.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
@ 2009-02-16 16:39                       ` Markus Armbruster
  0 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-16 16:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: devicetree-discuss

David Gibson <david@gibson.dropbear.id.au> writes:

> On Fri, Feb 13, 2009 at 12:26:28PM +0100, Markus Armbruster wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>> > On Thu, Feb 12, 2009 at 11:26:12AM +0100, Markus Armbruster wrote:
>> >> Hollis Blanchard <hollisb@us.ibm.com> writes:
> [snip]
>> > dtc and libfdt is a good place to start, if you haven't yet
>> > investigated them:
>> > 	git://git.jdl.com/software/dtc.git
>> > Note that although they're distributed together as one tree, dtc and
>> > libfdt are essentially independent pieces of software.  dtc converts
>> > device trees between various formats, dts and dtb in particular.
>> >
>> > libfdt does a number of the things you mention with flat trees -
>> > get/set properties, build trees, traverse etc.  If it doesn't do
>> > everything you need, we can probably extend it so that it does: I want
>> > libfdt to be *the* library for manipulating trees in the fdt format.
>> > It's designed to be easy to embed in other packages for this reason,
>> > although it does have some usage peculiarities because in particular
>> > it's possible to integrate into very limited environments like
>> > firmwares.
>> >
>> > [Jon Loeliger is the current maintainer of dtc and libfdt, but I
>> > originally wrote both of them - I know as much about them as anyone
>> > does]
>> 
>> Okay, I looked at dtc and libfdt again, a bit more closely.  I'm sure
>> there's plenty of ignorance left in me, so please correct me when I'm
>> babbling nonsense.
>
> Sure.  So, I realize that there are two different questions here:
> 	a) Is IEEE1275 a good starting point for the content of a
> decorated tree for configuring qemu.
>
> Personally, I suspect the answer to this is yes, but more information
> might convince me otherwise.

I think it's simply too early to call.  We're learning as we go.

> 	b) Is the flattened tree format for representing IEEE1275-like
> trees useful for qemu.
>
> Personally, I think this is a "maybe".  More on this below.
>
> Actually, on consideration there's a third question, too:
> 	c) Are the extensions / simplifications / adjustments we've
> made to IEEE1275 conventions in the context of flattened trees also
> useful and appropriate for qemu-configuration tree.
>
> I think if the answer to (a) is yes, then the answer to (c) is yes,
> too.

Sounds fair to me, but I'm hardly qualified to judge.

>> FDT is a "flattened tree", i.e. a tree data structure laid out in a
>> block of memory in a clever way to make it compact and easily
>
> That's correct.
>
>> relocatable.  I understand why these are important requirements for
>> passing information through bootloader to kernel.  They're irrelevant,
>> however, for use as QEMU configuration.
>
> That's probably largely true.
>
>> You can identify an FDT node by node offset or node name.  The node
>> offset can change when you add or delete nodes or properties.
>
> Correct.
>
>> You want everyone to use libfdt for manipulating FDTs.  I think that's
>> entirely sensible.  What I still don't get is something else: Why use
>> FDT for QEMU configuration in the first place?  Let me explain.
>
> Yeah, I see your point, hence my "maybe" to (b) above.  There's no
> obvious call for the fdt format in qemu, but I can see a couple of
> minor things that might make it worthwhile: First, if qemu ever does
> want to record its configuration tree persistently - to be passed
> between programs, or between invocations of a program - then it's
> probably better to use the established fdt format rather than creating
> a new one, even if fdt isn't designed particularly towards qemu's
> purposes.  Second, the existing code / tools for working with the fdt
> format *might* be sufficiently useful to make it worth using.
>
> [Note also that the fdt tools will mostly work fine even if the tree
> content is *not* very IEEE1275-like]
>
>> I think we have two distinct problems: the need for a flexible,
>> expressive QEMU machine configuration file and a virtual device
>> configuration machinery driven by it, and the need for an FDT to pass to
>> a PowerPC kernel.  The two may be related, but they are not identical.
>> 
>> Let's pretend for a minute the latter need doesn't exist.
>> 
>> QEMU machine configuration wants to be a decorated tree: a tree of named
>> nodes with named properties.
>> 
>> IEEE 1275 is a standard describing a special kind of decorated tree.
>> Other kinds can be created with a binding.  If we create a suitable
>> binding, we can surely cast our configuration trees in the IEEE 1275
>> framework.
>
> That's not quite what "binding" usually means in the 1275 context, but
> I think the point is right enough.
>
>> But what would that buy us?  This is an honest question, born out of my
>> relative ignorance of IEEE 1275.  Mind that we're still busily ignoring
>> the need for an FDT to pass to a kernel, so "it makes it easier to
>> create an FDT for the kernel" doesn't count here (it counts elsewhere).
>
> I think the idea behind using IEEE1275-like trees is that there is
> significant overlap between the device information that IEEE1275
> represents, and the device information which is configurable in qemu.
> Ultimately whether it buys you enough depends on how large that
> overlap is.

I think that's fair.

I believe we don't quite know yet whether the overlap will make it
worthwhile.

One way to approach this is to assume it will until proven wrong.  You
start with an IEEE 1275 description of the machine, and extend or adapt
it as you go.  My problem with that is that we don't have such
descriptions for the machines that interest me.  Developing them is a
big step that pays no immediate benefits, but blocks the little steps
that do pay.  Moreover, without a *real* user of the description, I'd
likely develop something that looks like IEEE 1275 to me, but isn't.  If
it turns out that IEEE 1275 is not worth it, tough, we already paid for
it.

Another way to approach this is to admit we don't know enough and punt
the decision until we do.  Start with the beneficial baby steps.  Limit
the machine description business to what is required for the baby steps,
making a best effort to stay close to IEEE 1275 structurally.  If it
turns out that IEEE 1275 is worth it, we do whatever is left to make the
descriptions conform to it.

I'm much more comfortable with the second approach.

>> FDTs are a special representation of IEEE 1275 trees in memory, designed
>> to be compact and relocatable.  But that comes at a price: nodes move
>> around when the tree changes.  The only real node id is the full name.
>
> Or phandle, for those nodes which have one.

Right, forgot about those.

>> This is not the common representation of decorated trees in C programs,
>> and for a reason.  It's simpler to represent edges as C pointers.  Not
>> the least advantage of that is notation: "->" beats a function call in
>> legibility hands down.
>
> Yes.  If there's enough manipulation of the tree, then you're
> generally better off having a "live" format which uses pointers,
> whether or not the fdt format is used at some stage in the process.
> Both the kernel and dtc (when taking fdt input) convert the flattened
> tree into a "live" representation internally.

Not surprising.

>> Example: the QEMU device data type needs to refer to its device node in
>> the configuration tree.  If that tree is coded the plain old way, you
>> store a pointer to the node and follow that.  If it is an FDT, then you
>> have to store the full node name, and look up the node by name.  I find
>> that tedious and verbose.
>
> Um.. I don't really follow your example.  But I think I see your
> point.  How problematic the flattened format is for this depends a lot
> on exactly what you need to do with it.  Sometimes it's much easier to
> avoid the flattened tree altogether, or transcribe it to a live
> format.  Other times, the tree manipulation is simple enough that it's
> easier to leave it flat (one example, for phases of the program where
> the tree is read-only, which could be a lot for a configuration tree,
> then node offsets *can* safely be used like pointers).

The machines I care for come with many optional and configurable parts.
We select the basic machine type with command line option -M, and
configure the rest with more command line options.  I figure we want to
keep supporting these options, at least for a while.

I believe the best way to deal with that is to start with a basic tree
selected by -M, then modify it according to the other options.  So,
there's a fair amount of configuration tree mutation.
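
A minimal sketch of what such a mutable tree could look like in C.  All
type and function names here (tree_node, tree_add, tree_set_prop,
machine_base_tree, ...) are hypothetical and not taken from the
prototype, and the base tree contents are made up.  It only shows the
shape of the idea: pointer edges, a base tree per machine type, and
options that edit the tree afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; freeing is omitted for brevity. */
struct tree_prop {
    char *name, *value;
    struct tree_prop *next;
};

struct tree_node {
    char *name;
    struct tree_node *parent, *children, *sibling;  /* edges are C pointers */
    struct tree_prop *props;
};

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    assert(p);
    memcpy(p, s, n);
    return p;
}

/* Create a node and link it as first child of parent (NULL for the root). */
struct tree_node *tree_add(struct tree_node *parent, const char *name)
{
    struct tree_node *n = calloc(1, sizeof(*n));
    assert(n);
    n->name = dup_str(name);
    n->parent = parent;
    if (parent) {
        n->sibling = parent->children;
        parent->children = n;
    }
    return n;
}

/* Find a direct child by name, or return NULL. */
struct tree_node *tree_child(struct tree_node *parent, const char *name)
{
    struct tree_node *n;
    for (n = parent->children; n; n = n->sibling)
        if (!strcmp(n->name, name))
            return n;
    return NULL;
}

/* Set a property, overwriting any existing value of the same name. */
void tree_set_prop(struct tree_node *n, const char *name, const char *value)
{
    struct tree_prop *p;
    for (p = n->props; p; p = p->next) {
        if (!strcmp(p->name, name)) {
            free(p->value);
            p->value = dup_str(value);
            return;
        }
    }
    p = calloc(1, sizeof(*p));
    assert(p);
    p->name = dup_str(name);
    p->value = dup_str(value);
    p->next = n->props;
    n->props = p;
}

/* Look up a property value by name, or return NULL. */
const char *tree_get_prop(const struct tree_node *n, const char *name)
{
    const struct tree_prop *p;
    for (p = n->props; p; p = p->next)
        if (!strcmp(p->name, name))
            return p->value;
    return NULL;
}

/* Base tree that a machine type such as "-M pc" might select (made up). */
struct tree_node *machine_base_tree(void)
{
    struct tree_node *root = tree_add(NULL, "pc");
    tree_set_prop(tree_add(root, "ram"), "size", "128M");
    tree_add(root, "uart");
    return root;
}
```

A memory-size option would then boil down to something like
tree_set_prop(tree_child(root, "ram"), "size", "512M") on the tree
returned by machine_base_tree().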

>> My point is: the question how to represent our decorated tree in memory
>> is entirely separate from the question of the tree's structure.  Just
>> because you want your tree to conform to IEEE 1275 doesn't mean you want
>> your tree flat at all times.
>
> Absolutely, yes.
>
>
>> Now let's examine how QEMU machine configuration and FDT machine
>> descriptions for kernels are related.
>> 
>> In a way, both can be regarded as copies of a complete machine
>> description with lots of stuff pruned.  Except the complete machine
>> description doesn't exist.  Because there is no use for it.
>> 
>> FDT routinely prunes stuff like PCI and USB devices, because those are
>> better probed.
>> 
>> QEMU configuration should certainly prune everything that is not
>> actually configurable.
>> 
>> To go from QEMU configuration to FDT we therefore may want to prune
>> superfluous stuff, to keep it compact,
>
> Not necessarily.  The kernel should be fine to deal with a tree that
> has complete information, even if it doesn't need it, since that's
> what a real OF implementation provides.

Well, wasn't compactness one of the reasons to flatten it in the first
place?

>>  and we definitely have to add lots
>> of stuff that has no place in configuration.
>
> Yes.  Well.. whether this is a good plan depends critically on how big
> that "lots" really is.

I suspect the only way to find out is to try.

>>  Compared to that task, a
>> change of representation seems trivial.  I figure we want to copy the
>> tree anyway, because we need to edit it pretty drastically.
>> 
>> It's not obvious to me whether it makes sense to create the FDT from the
>> QEMU configuration automatically.  If we simulate a specific board, the
>> FDT is pretty fixed, isn't it?  Much of the configurable stuff could be
>> precisely in those parts that are omitted from FDT: PCI devices and
>> such.
>
> Well.. you definitely want to create the FDT passed to the kernel from
> the qemu configuration.  But whether that's best done by essentially
> transcribing a configuration tree which is in a similar format, or
> just using the configuration tree info to poke the changeable bits in a
> "skeleton" FDT for the relevant machine is not so clear.
>
> Possibly.  I'm not familiar enough with the various qemu supported
> machine models to say.

Familiarity with all of them is a tall order...

>> >> * Provide an example tree describing a bare-bones PC, like the one in my
>> >>   prototype: CPU, RAM, BIOS, PIC, APIC, IOAPIC, PIT, DMA, UART, parallel
>> >>   port, floppy controller, CMOS & RTC, a20 gate (port 92) and other
>> >>   miscellaneous I/O ports, i440fx, PIIX3 (ISA bridge, IDE, USB, ACPI),
>> >>   Cirrus VGA with BIOS, some PCI NIC.  This gives us all an idea of the
>> >>   tree structure.  Morphing that into something suitable for QEMU
>> >>   configuration shouldn't be too hard then, just an exercise in
>> >>   redecorating the tree.
>> >
>> > I don't off hand know any trees for a PC system.  There are a bunch of
>> > example trees for powerpc systems in arch/powerpc/boot/dts in the
>> > kernel tree.  A few of those, such as prep, at least have parts which
>> > somewhat resemble a PC.  I believe the OLPC also has OF; that would be
>> > an example OF tree for an x86 machine, if not a typical PC.
>> 
>> Could you point me to a specific file?  I grepped for prep and OLPC, no
>> luck.
>
> Oh, sorry, the prep tree hasn't gone into mainline yet.  But I believe
> Mitch Bradley supplied a PC tree later in the thread, which would be
> better for your purposes, anyway.

Got that, haven't digested it yet.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-16 16:39                       ` Markus Armbruster
@ 2009-02-17  3:29                           ` David Gibson
  -1 siblings, 0 replies; 146+ messages in thread
From: David Gibson @ 2009-02-17  3:29 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A

On Mon, Feb 16, 2009 at 05:39:40PM +0100, Markus Armbruster wrote:
> David Gibson <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> writes:
> > On Fri, Feb 13, 2009 at 12:26:28PM +0100, Markus Armbruster wrote:
[snip]
> > I think the idea behind using IEEE1275-like trees is that there is
> > significant overlap between the device information that IEEE1275
> > represents, and the device information which is configurable in qemu.
> > Ultimately whether it buys you enough depends on how large that
> > overlap is.
> 
> I think that's fair.
> 
> I believe we don't quite know yet whether the overlap will make it
> worthwhile.

Yeah, true enough.

> One way to approach this is to assume it will until proven wrong.  You
> start with an IEEE 1275 description of the machine, and extend or adapt
> it as you go.  My problem with that is that we don't have such
> descriptions for the machines that interest me.  Developing them is a
> big step that pays no immediate benefits, but blocks the little steps
> that do pay.  Moreover, without a *real* user of the description, I'd
> likely develop something that looks like IEEE 1275 to me, but isn't.  If
> it turns out that IEEE 1275 is not worth it, tough, we already paid for
> it.
> 
> Another way to approach this is to admit we don't know enough and punt
> the decision until we do.  Start with the beneficial baby steps.  Limit
> the machine description business to what is required for the baby steps,
> making a best effort to stay close to IEEE 1275 structurally.  If it
> turns out that IEEE 1275 is worth it, we do whatever is left to make the
> descriptions conform to it.
> 
> I'm much more comfortable with the second approach.

That's reasonable.  However, once you've taken enough baby steps you
do want to be careful that you don't end up long term with something
that's similar enough to 1275 to be confusing, but not similar enough
to be useful.  So at some point we do want to take a look ahead and
see how much difference there will be between the qemu-required config
information and the 1275 dectree.

> >> FDTs are a special representation of IEEE 1275 trees in memory, designed
> >> to be compact and relocatable.  But that comes at a price: nodes move
> >> around when the tree changes.  The only real node id is the full name.
> >
> > Or phandle, for those nodes which have one.
> 
> Right, forgot about those.
> 
> >> This is not the common representation of decorated trees in C programs,
> >> and for a reason.  It's simpler to represent edges as C pointers.  Not
> >> the least advantage of that is notation: "->" beats a function call in
> >> legibility hands down.
> >
> > Yes.  If there's enough manipulation of the tree, then you're
> > generally better off having a "live" format which uses pointers,
> > whether or not the fdt format is used at some stage in the process.
> > Both the kernel and dtc (when taking fdt input) convert the flattened
> > tree into a "live" representation internally.
> 
> Not surprising.
> 
> >> Example: the QEMU device data type needs to refer to its device node in
> >> the configuration tree.  If that tree is coded the plain old way, you
> >> store a pointer to the node and follow that.  If it is an FDT, then you
> >> have to store the full node name, and look up the node by name.  I find
> >> that tedious and verbose.
> >
> > Um.. I don't really follow your example.  But I think I see your
> > point.  How problematic the flattened format is for this depends a lot
> > on exactly what you need to do with it.  Sometimes it's much easier to
> > avoid the flattened tree altogether, or transcribe it to a live
> > format.  Other times, the tree manipulation is simple enough that it's
> > easier to leave it flat (one example, for phases of the program where
> > the tree is read-only, which could be a lot for a configuration tree,
> > then node offsets *can* safely be used like pointers).
> 
> The machines I care for come with many optional and configurable parts.
> We select the basic machine type with command line option -M, and
> configure the rest with more command line options.  I figure we want to
> keep supporting these options, at least for a while.
> 
> I believe the best way to deal with that is to start with a basic tree
> selected by -M, then modify it according to the other options.  So,
> there's a fair amount of configuration tree mutation.

Yeah, you're probably right.  Although, in some cases the amount of
complex tree mutation can be cut down by thinking about things in the
right order.  For example if you have a bunch of optional devices,
rather than adding them one by one (with all the required properties)
to the skeleton tree, you can instead have the skeleton tree be the
all-bells-and-whistles variant then delete the subtrees that aren't
present.  libfdt even has a function to replace subtrees with nops
instead of eliding them, which means the offsets of other nodes won't
change.
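
To illustrate why nop-ing keeps offsets stable, here is a small
stand-alone model in C.  It does not use libfdt (the function meant
above is presumably fdt_nop_node()); the token layout and the names
flat_find and flat_nop_node are assumptions made up for this sketch,
and a real FDT blob is laid out differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a flattened tree: a linear token stream. */
enum tok_kind { TOK_BEGIN, TOK_PROP, TOK_END, TOK_NOP };

struct tok {
    enum tok_kind kind;
    const char *name;   /* node or property name; NULL for END and NOP */
};

/* Return the offset of the BEGIN token of the first node `name`, or -1. */
int flat_find(const struct tok *t, size_t n, const char *name)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (t[i].kind == TOK_BEGIN && !strcmp(t[i].name, name))
            return (int)i;
    return -1;
}

/*
 * Overwrite the node at `off` and its whole subtree with NOPs, in place.
 * Nothing is moved, so the offsets of all other nodes stay valid.
 */
void flat_nop_node(struct tok *t, size_t n, int off)
{
    int depth = 0;
    size_t i;

    assert(off >= 0 && (size_t)off < n && t[off].kind == TOK_BEGIN);
    for (i = (size_t)off; i < n; i++) {
        if (t[i].kind == TOK_BEGIN)
            depth++;
        else if (t[i].kind == TOK_END)
            depth--;
        t[i].kind = TOK_NOP;
        t[i].name = NULL;
        if (depth == 0)     /* just blanked the matching END */
            break;
    }
}
```

Starting from an all-bells-and-whistles stream and blanking the absent
devices this way means an offset looked up before the edit can still be
used after it, which plain deletion would not guarantee.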

> >> My point is: the question how to represent our decorated tree in memory
> >> is entirely separate from the question of the tree's structure.  Just
> >> because you want your tree to conform to IEEE 1275 doesn't mean you want
> >> your tree flat at all times.
> >
> > Absolutely, yes.
> >
> >> Now let's examine how QEMU machine configuration and FDT machine
> >> descriptions for kernels are related.
> >> 
> >> In a way, both can be regarded as copies of a complete machine
> >> description with lots of stuff pruned.  Except the complete machine
> >> description doesn't exist.  Because there is no use for it.
> >> 
> >> FDT routinely prunes stuff like PCI and USB devices, because those are
> >> better probed.
> >> 
> >> QEMU configuration should certainly prune everything that is not
> >> actually configurable.
> >> 
> >> To go from QEMU configuration to FDT we therefore may want to prune
> >> superfluous stuff, to keep it compact,
> >
> > Not necessarily.  The kernel should be fine to deal with a tree that
> > has complete information, even if it doesn't need it, since that's
> > what a real OF implementation provides.
> 
> Well, wasn't compactness one of the reasons to flatten it in the first
> place?

One, since we were aiming at embedded systems, but not nearly as big a
factor as relocatability.  For qemu, I don't think the compactness is
much of an issue.

> >>  and we definitely have to add lots
> >> of stuff that has no place in configuration.
> >
> > Yes.  Well.. whether this is a good plan depends critically on how big
> > that "lots" really is.
> 
> I suspect the only way to find out is to try.

Indeed.

[snip]
> > Oh, sorry, the prep tree hasn't gone into mainline yet.  But I believe
> > Mitch Bradley supplied a PC tree later in the thread, which would be
> > better for your purposes, anyway.
> 
> Got that, haven't digested it yet.

Fair enough.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-17  3:29                           ` David Gibson
@ 2009-02-17  7:54                               ` Markus Armbruster
  -1 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-17  7:54 UTC (permalink / raw)
  To: qemu-devel-qX2TKyscuCcdnm+yROfE0A
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A

David Gibson <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> writes:

> On Mon, Feb 16, 2009 at 05:39:40PM +0100, Markus Armbruster wrote:
>> David Gibson <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> writes:
>> > On Fri, Feb 13, 2009 at 12:26:28PM +0100, Markus Armbruster wrote:
> [snip]
>> > I think the idea behind using IEEE1275-like trees is that there is
>> > significant overlap between the device information that IEEE1275
>> > represents, and the device information which is configurable in qemu.
>> > Ultimately whether it buys you enough depends on how large that
>> > overlap is.
>> 
>> I think that's fair.
>> 
>> I believe we don't quite know yet whether the overlap will make it
>> worthwhile.
>
> Yeah, true enough.
>
>> One way to approach this is to assume it will until proven wrong.  You
>> start with an IEEE 1275 description of the machine, and extend or adapt
>> it as you go.  My problem with that is that we don't have such
>> descriptions for the machines that interest me.  Developing them is a
>> big step that pays no immediate benefits, but blocks the little steps
>> that do pay.  Moreover, without a *real* user of the description, I'd
>> likely develop something that looks like IEEE 1275 to me, but isn't.  If
>> it turns out that IEEE 1275 is not worth it, tough, we already paid for
>> it.
>> 
>> Another way to approach this is to admit we don't know enough and punt
>> the decision until we do.  Start with the beneficial baby steps.  Limit
>> the machine description business to what is required for the baby steps,
>> making a best effort to stay close to IEEE 1275 structurally.  If it
>> turns out that IEEE 1275 is worth it, we do whatever is left to make the
>> descriptions conform to it.
>> 
>> I'm much more comfortable with the second approach.
>
> That's reasonable.  However, once you've taken enough baby steps you
> do want to be careful that you don't end up long term with something
> that's similar enough to 1275 to be confusing, but not similar enough
> to be useful.  So at some point we do want to take a look ahead and
> see how much difference there will be between the qemu-required config
> information and the 1275 dectree.

Agreed.

I think it would help if 1275 experts reviewed the baby steps for
gratuitous deviations from 1275.

[Rest snipped, helpful comments, but I don't have anything interesting
to add...]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] Machine description as data prototype, take 2 (was: [RFC] Machine description as data)
  2009-02-16 16:22 ` [Qemu-devel] Machine description as data prototype, take 2 (was: [RFC] Machine description as data) Markus Armbruster
@ 2009-02-17 17:32   ` Paul Brook
  2009-02-18  8:42     ` [Qemu-devel] Machine description as data prototype, take 2 Markus Armbruster
  0 siblings, 1 reply; 146+ messages in thread
From: Paul Brook @ 2009-02-17 17:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Markus Armbruster

On Monday 16 February 2009, Markus Armbruster wrote:
> +    { "vga", sizeof(struct dt_device_vga), dt_vga_props,
> +      dt_vga_config, dt_vga_init, NULL },
> +    { "nic", sizeof(struct dt_device_nic), dt_nic_props,
> +      dt_nic_config, dt_nic_init, NULL },

I think this is doing things the wrong way. We shouldn't have a "nic" device 
that mutates into one of several devices based on magic options. Instead each 
nic/display adapter/HBA should be registered, and we instantiate those 
directly. For devices that have a common set of properties (e.g. nics) it 
probably makes sense to have common helper functions. It should be possible 
to have a single device that is (say) both a SCSI HBA and a NIC.
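
A rough sketch of that registration scheme in C, under stated
assumptions: every name below (dev_info, dev_register, dev_find,
nic_common_check_mac, the two example devices) is hypothetical and does
not come from the prototype or from QEMU.  Each concrete device is
registered under its own name, NIC-like devices share a helper, and one
device may advertise several capabilities.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Capability bits; a single device may set more than one. */
enum { DEV_CAP_NIC = 1 << 0, DEV_CAP_SCSI_HBA = 1 << 1 };

struct dev_info {
    const char *name;       /* a concrete model, not a generic "nic" */
    unsigned caps;          /* bitmask of DEV_CAP_* */
    int (*init)(const struct dev_info *info, const char *macaddr);
};

#define MAX_DEVS 16
static const struct dev_info *registry[MAX_DEVS];
static size_t n_registered;

/* Add a device to the registry; returns -1 when the table is full. */
int dev_register(const struct dev_info *info)
{
    if (n_registered == MAX_DEVS)
        return -1;
    registry[n_registered++] = info;
    return 0;
}

/* Look up a registered device by its exact name, or return NULL. */
const struct dev_info *dev_find(const char *name)
{
    size_t i;
    for (i = 0; i < n_registered; i++)
        if (!strcmp(registry[i]->name, name))
            return registry[i];
    return NULL;
}

/* Helper shared by all NIC-capable devices: crude "xx:xx:xx:xx:xx:xx" check. */
int nic_common_check_mac(const char *mac)
{
    return (mac && strlen(mac) == 17) ? 0 : -1;
}

static int ne2k_init(const struct dev_info *info, const char *mac)
{
    (void)info;
    return nic_common_check_mac(mac);
}

static int combo_init(const struct dev_info *info, const char *mac)
{
    (void)info;
    /* A real device would also set up its SCSI half here. */
    return nic_common_check_mac(mac);
}

static const struct dev_info ne2k_info = {
    "ne2k_pci", DEV_CAP_NIC, ne2k_init
};
static const struct dev_info combo_info = {
    "scsi_nic_combo", DEV_CAP_NIC | DEV_CAP_SCSI_HBA, combo_init
};

void register_example_devices(void)
{
    dev_register(&ne2k_info);
    dev_register(&combo_info);
}
```

With this layout there is no "nic" entry to look up at all; a tree node
would name the model directly, e.g. dev_find("ne2k_pci").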

Paul

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-17  3:29                           ` David Gibson
@ 2009-02-17 17:44                               ` Paul Brook
  -1 siblings, 0 replies; 146+ messages in thread
From: Paul Brook @ 2009-02-17 17:44 UTC (permalink / raw)
  To: qemu-devel-qX2TKyscuCcdnm+yROfE0A
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A, Markus Armbruster

> > The machines I care for come with many optional and configurable parts.
> > We select the basic machine type with command line option -M, and
> > configure the rest with more command line options.  I figure we want to
> > keep supporting these options, at least for a while.
> >
> > I believe the best way to deal with that is to start with a basic tree
> > selected by -M, then modify it according to the other options.  So,
> > there's a fair amount of configuration tree mutation.
>
> Yeah, you're probably right.  Although, in some cases the amount of
> complex tree mutation can be cut down by thinking about things in the
> right order.  For example if you have a bunch of optional devices,
> rather than adding them one by one (with all the required properties)
> to the skeleton tree, you can instead have the skeleton tree be the
> all-bells-and-whistles variant then delete the subtrees that aren't
> present.  libfdt even has a function to replace subtrees with nops
> instead of eliding them, which means the offsets of other nodes won't
> change.

I'm not so sure this is a vital feature. The current commandline options only 
provide the absolute bare minimum mutation for a basic PC machine, and don't 
even do that particularly well. I'm inclined to say we should punt 
significant machine config modification/generation to an external tool.

Paul

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] [RFC] Machine description as data
  2009-02-17 17:44                               ` Paul Brook
@ 2009-02-18  8:36                                   ` Markus Armbruster
  -1 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-18  8:36 UTC (permalink / raw)
  To: Paul Brook
  Cc: devicetree-discuss-mnsaURCQ41sdnm+yROfE0A,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A

Paul Brook <paul-qD8j1LwMmJjtCj0u4l0SBw@public.gmane.org> writes:

>> > The machines I care for come with many optional and configurable parts.
>> > We select the basic machine type with command line option -M, and
>> > configure the rest with more command line options.  I figure we want to
>> > keep supporting these options, at least for a while.
>> >
>> > I believe the best way to deal with that is start with a basic tree
>> > selected by -M, then modify it according to the other options.  So,
>> > there's a fair amount of configuration tree mutation.
>>
>> Yeah, you're probably right.  Although, in some cases the amount of
>> complex tree mutation can be cut down by thinking about things in the
>> right order.  For example if you have a bunch of optional devices,
>> rather than adding them one by one (with all the required properties)
>> to the skeleton tree, you can instead have the skeleton tree be the
>> all-bells-and-whistles variant then delete the subtrees that aren't
>> present.  libfdt even has a function to replace subtrees with nops
>> instead of eliding them, which means the offsets of other nodes won't
>> change.
>
> I'm not so sure this is a vital feature. The current commandline options only 
> provide the absolute bare minimum mutation for a basic PC machine, and don't 
> even do that particularly well. I'm inclined to say we should punt 
> significant machine config modification/generation to an external tool.

I'm not exactly in love with the current command line myself.  It's
just that I'm trying to make improvements in modest steps with minimal
disruption.  I'd rather not start with throwing out the command line.

If the need for tree manipulation goes away, we can still ditch the
separate internal tree structure and go all FDT.  My prototype already
has a converter to and from FDTs, to demonstrate that the internal tree
is just as treeish as an FDT.
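
A rough illustration of the "start with everything, then prune" idea
quoted above, as a self-contained sketch.  The toy_* names are made up
for this example; they are neither the prototype's tree.h interface nor
libfdt.  Marking a node absent instead of unlinking it is meant to
mirror what libfdt's fdt_nop_node() achieves on a flattened tree:
nothing else has to move.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_KIDS 8          /* arbitrary limit for the sketch */

/* Hypothetical toy node, for illustration only */
typedef struct toy_node {
    const char *name;
    int present;                /* 0 = pruned, subtree is ignored */
    struct toy_node *kid[TOY_MAX_KIDS];
    int nkids;
} toy_node;

/* Initialize kid and hang it below parent (NULL parent = root) */
static toy_node *
toy_add_kid(toy_node *parent, toy_node *kid, const char *name)
{
    kid->name = name;
    kid->present = 1;
    kid->nkids = 0;
    if (parent) {
        assert(parent->nkids < TOY_MAX_KIDS);
        parent->kid[parent->nkids++] = kid;
    }
    return kid;
}

/* Mark the named kid absent; returns 0 on success, -1 if not found */
static int
toy_prune(toy_node *parent, const char *name)
{
    int i;

    for (i = 0; i < parent->nkids; i++) {
        if (!strcmp(parent->kid[i]->name, name)) {
            parent->kid[i]->present = 0;
            return 0;
        }
    }
    return -1;
}

/* Count nodes still present; pruned subtrees are skipped entirely */
static int
toy_count(const toy_node *node)
{
    int i, n = 1;

    if (!node->present)
        return 0;
    for (i = 0; i < node->nkids; i++)
        n += toy_count(node->kid[i]);
    return n;
}

/* Build the all-bells-and-whistles skeleton, then drop what is absent */
static int
toy_demo(void)
{
    toy_node root, pci, nic, usb, sound;

    toy_add_kid(NULL, &root, "");
    toy_add_kid(&root, &pci, "pci");
    toy_add_kid(&pci, &nic, "nic");
    toy_add_kid(&pci, &usb, "usb");
    toy_add_kid(&root, &sound, "sound");
    toy_prune(&pci, "usb");     /* say, USB was not requested */
    toy_prune(&root, "sound");  /* neither was sound hardware */
    return toy_count(&root);    /* root, pci and nic remain */
}
```

With this shape, each command line option maps to at most one prune
call on the skeleton, rather than to code that builds a subtree with
all its properties.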

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [Qemu-devel] Machine description as data prototype, take 2
  2009-02-17 17:32   ` Paul Brook
@ 2009-02-18  8:42     ` Markus Armbruster
  0 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-18  8:42 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul Brook <paul@codesourcery.com> writes:

> On Monday 16 February 2009, Markus Armbruster wrote:
>> +    { "vga", sizeof(struct dt_device_vga), dt_vga_props,
>> +      dt_vga_config, dt_vga_init, NULL },
>> +    { "nic", sizeof(struct dt_device_nic), dt_nic_props,
>> +      dt_nic_config, dt_nic_init, NULL },
>
> I think this is doing things the wrong way. We shouldn't have a "nic" device 
> that mutates into one of several devices based on magic options. Instead each 
> nic/display adapter/HBA should be registered, and we instantiate those 
> directly. For devices that have a common set of properties (e.g. nics) it 
> probably makes sense to have common helper functions. It should be possible 
> to have a single device that is (say) both a SCSI HBA and a NIC.
>
> Paul

Fair point.  File it under "short cut".

My machine configuration tree is hard-coded, then modified according to
command line options.  The way that's done is not the way I think it
should be done.  I've concentrated on what to do with the configuration,
not so much on how to assemble it.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data)
  2009-02-11 15:40 [Qemu-devel] [RFC] Machine description as data Markus Armbruster
                   ` (3 preceding siblings ...)
  2009-02-16 16:22 ` [Qemu-devel] Machine description as data prototype, take 2 (was: [RFC] Machine description as data) Markus Armbruster
@ 2009-02-19 10:29 ` Markus Armbruster
  2009-02-19 13:53   ` Paul Brook
                     ` (3 more replies)
  2009-02-23 18:00 ` [Qemu-devel] Machine description as data prototype, take 4 (was: [RFC] Machine description as data) Markus Armbruster
                   ` (5 subsequent siblings)
  10 siblings, 4 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-19 10:29 UTC (permalink / raw)
  To: qemu-devel

Third iteration of the prototype.

What about an early merge?  If your answer to that is "yes, but", what
exactly do you want changed?

New:

* Rebased to git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6626 c046a42c-6fe2-441c-8c8c-71466251a162

* Code duplication cleaned up.  I chose minimizing the impact on pc.c
  over nice, clean interfaces.  Happy to rework it if that was the wrong
  choice.  I think there are a few opportunities for cleanup that would
  improve pc.c even without taking dt.c into consideration.  I can work
  on patches if you like.

* The "device required" edges moved from struct tree to struct dt_device
  to make the configuration tree more similar to FDTs structurally.

* A bunch of pointless typedefs to hopefully blend in better
  stylistically.  Tabs expanded.  If style issues remain, please point
  them out to me!
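
To make the role of the "device required" edges concrete, here is a
small stand-alone sketch of dependency-ordered initialization.  It is
not the code from the patch; the sk_* names and the three-state marker
are assumptions made for this illustration.  The rule it shows is the
one the prototype follows: a device is initialized after its parent and
after every device it declared as required.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_MAX_REQS 4           /* arbitrary limit for the sketch */

/* Hypothetical device record, for illustration only */
typedef struct sk_device {
    const char *name;
    struct sk_device *parent;           /* tree edge */
    struct sk_device *req[SK_MAX_REQS]; /* "device required" edges */
    int nreqs;
    int state;                  /* 0 = new, 1 = in progress, 2 = done */
} sk_device;

/*
 * Initialize dev after its parent and after everything it requires.
 * Appends names to order[] in initialization order.  Returns the new
 * count, or -1 if the constraints form a cycle.
 */
static int
sk_init_one(sk_device *dev, const char **order, int n)
{
    int i;

    if (dev->state == 2)
        return n;
    if (dev->state == 1)
        return -1;              /* dependency cycle */
    dev->state = 1;
    if (dev->parent) {
        n = sk_init_one(dev->parent, order, n);
        if (n < 0)
            return -1;
    }
    for (i = 0; i < dev->nreqs; i++) {
        n = sk_init_one(dev->req[i], order, n);
        if (n < 0)
            return -1;
    }
    dev->state = 2;
    order[n++] = dev->name;     /* stand-in for calling init() */
    return n;
}

static int
sk_init_all(sk_device **devs, int ndevs, const char **order)
{
    int i, n = 0;

    for (i = 0; i < ndevs && n >= 0; i++)
        n = sk_init_one(devs[i], order, n);
    return n;
}

/* pci and piix3 both require pc-misc, as in the prototype's PC */
static int
sk_demo(const char **order)
{
    sk_device root = { "root", NULL, { NULL }, 0, 0 };
    sk_device pc_misc = { "pc-misc", &root, { NULL }, 0, 0 };
    sk_device pci = { "pci", &root, { &pc_misc }, 1, 0 };
    sk_device piix3 = { "piix3", &pci, { &pc_misc }, 1, 0 };
    sk_device *devs[] = { &piix3, &pci, &pc_misc, &root };

    return sk_init_all(devs, 4, order);
}
```

Even though the devices are handed over leaf first, the resulting order
is root, pc-misc, pci, piix3.  Keeping the edges in the per-device
record means the configuration tree itself stays a plain tree.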

Shortcuts:

* I didn't implement all the devices of the "pc" original.  The devices
  I implemented might not support all existing command line options.

* The initial configuration tree is hardcoded.  It should be read from a
  configuration file.

* Optional stuff is inserted into the initial configuration tree in
  hardcoded places.  We should use suitable markers in the configuration
  file instead.

* Linux gripes about ACPI, need to investigate.

Notable qualities:

* Linux still boots & shuts down cleanly (except for the ACPI gripes).

* Machine and host configuration are cleanly separated.  Machine
  configuration enumerates the components of the virtual machine, and
  how they are connected.  It is a tree of device nodes.  Host
  configuration is about how the host implements virtual devices.
  Currently just a few flat tables.

* Device drivers implement a common abstract interface.

* Device drivers are cleanly separated from each other, and from the
  device-agnostic machine configuration and initialization code.

* Each device driver specifies its configurable properties in a single
  place.  Unknown properties are rejected.

* A device driver gets its configuration from two sources: the device's
  node in the machine configuration tree, and applicable host
  configuration tables.
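
For readers who want the gist of "properties in a single place, unknown
properties rejected" without reading the patch, a minimal stand-alone
sketch follows.  It is modelled on the prototype's property tables but
is not the patch code: the names (prop_spec, set_prop, cpus_priv), the
return codes and the explicit sentinel entry are assumptions of this
example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One recognized property: where it lives in the private data and
 * how to parse it (hypothetical layout, for illustration only) */
typedef struct prop_spec {
    const char *name;
    size_t offs;                /* offset in device private data */
    size_t size;                /* size there, for sanity checking */
    int (*parse)(void *dst, size_t size, const char *src);
} prop_spec;

#define PROP_SPEC(name, strty, member, fn) \
    { name, offsetof(strty, member), sizeof(((strty *)0)->member), fn }

static int
parse_string(void *dst, size_t size, const char *src)
{
    assert(size == sizeof(char *));
    *(const char **)dst = src;
    return 0;
}

static int
parse_int(void *dst, size_t size, const char *src)
{
    char *ep;
    long val;

    assert(size == sizeof(int));
    errno = 0;
    val = strtol(src, &ep, 0);
    if (ep == src || *ep || errno || val < INT_MIN || val > INT_MAX)
        return -1;
    *(int *)dst = (int)val;
    return 0;
}

/* Example device private data and its property table */
typedef struct cpus_priv {
    const char *model;
    int num;
} cpus_priv;

static const prop_spec cpus_props[] = {
    PROP_SPEC("model", cpus_priv, model, parse_string),
    PROP_SPEC("num", cpus_priv, num, parse_int),
    { NULL, 0, 0, NULL }        /* sentinel ends the table */
};

/* Returns 0 on success, -1 for an unknown property, -2 for a bad value */
static int
set_prop(void *priv, const prop_spec *specs,
         const char *name, const char *val)
{
    const prop_spec *s;

    for (s = specs; s->name; s++) {
        if (!strcmp(s->name, name)) {
            if (s->parse((char *)priv + s->offs, s->size, val) < 0)
                return -2;
            return 0;
        }
    }
    return -1;
}
```

The generic code never needs to know a device's private structure; it
only walks the table.  A misspelled property name therefore fails
loudly instead of being silently ignored.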


 Makefile              |    1 +
 Makefile.target       |    4 +
 hw/dt.c               | 1225 +++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pc.c               |   47 +--
 hw/pcint.h            |   46 ++
 net.c                 |    2 +-
 net.h                 |    1 +
 target-i386/machine.c |    2 +
 tree.c                |  298 ++++++++++++
 tree.h                |   40 ++
 10 files changed, 1638 insertions(+), 28 deletions(-)


diff --git a/Makefile b/Makefile
index 4f7a55a..2198bba 100644
--- a/Makefile
+++ b/Makefile
@@ -85,6 +85,7 @@ OBJS+=sd.o ssi-sd.o
 OBJS+=bt.o bt-host.o bt-vhci.o bt-l2cap.o bt-sdp.o bt-hci.o bt-hid.o usb-bt.o
 OBJS+=buffered_file.o migration.o migration-tcp.o net.o qemu-sockets.o
 OBJS+=qemu-char.o aio.o net-checksum.o savevm.o cache-utils.o
+OBJS+=tree.o
 
 ifdef CONFIG_BRLAPI
 OBJS+= baum.o
diff --git a/Makefile.target b/Makefile.target
index 9e7a1bb..ad254ad 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -583,8 +583,12 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
+OBJS+= dt.o
 OBJS += device-hotplug.o pci-hotplug.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
+ifdef FDT_LIBS
+LIBS+= $(FDT_LIBS)
+endif
 endif
 ifeq ($(TARGET_BASE_ARCH), ppc)
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
diff --git a/hw/dt.c b/hw/dt.c
new file mode 100644
index 0000000..f57668b
--- /dev/null
+++ b/hw/dt.c
@@ -0,0 +1,1225 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/*
+ * Configure and build a machine from configuration data
+ *
+ * The idea is to have generic, device-independent code driven by
+ * device-dependent configuration data, talking to devices through an
+ * abstract device interface.
+ *
+ * For now, this lives in hw/, even though that's not where generic,
+ * device independent code belongs.  This is just so we can minimize
+ * disruption by hiding completely behind the existing QEMUMachine
+ * abstraction.
+ *
+ * The configuration data currently is hardwired to a fairly limited
+ * PC, registered as machine type "pcdt".  The nuts and bolts of PC
+ * emulation remain in pc.c, and that sharing makes the somewhat
+ * clumsy pcint.h necessary.  Having two PC machine types makes no
+ * sense in the long run, of course.  We want to replace pc.c
+ * eventually, and also convert other machine types to this mechanism.
+ */
+
+#include <assert.h>
+#include "hw.h"
+#include "pc.h"
+#include "fdc.h"
+#include "pci.h"
+#include "block.h"
+#include "sysemu.h"
+#include "audio/audio.h"
+#include "net.h"
+#include "smbus.h"
+#include "boards.h"
+#include "console.h"
+#include "fw_cfg.h"
+#include "virtio-blk.h"
+#include "virtio-balloon.h"
+#include "virtio-console.h"
+#include "hpet_emul.h"
+#include "pcint.h"
+#include "tree.h"
+
+#ifdef HAVE_FDT
+#include <libfdt.h>
+#endif
+
+/* Forward declarations */
+typedef struct dt_device dt_device;
+typedef struct dt_tree_list dt_tree_list;
+typedef struct dt_driver dt_driver;
+typedef struct dt_prop_spec dt_prop_spec;
+static void dt_parse_prop(dt_device *dev, tree_prop *prop);
+static BlockDriverState **dt_piix3_hd(tree *piix3);
+
+
+/* Host Configuration */
+
+typedef struct dt_host {
+    /* connection NICs <-> VLAN */
+    tree *nic[MAX_NICS];
+    VLANState *nic_vlan[MAX_NICS];
+    /* connection drives <-> controller */
+    tree *drive_ctrl[MAX_DRIVES];
+    BlockDriverState *drive_state[MAX_DRIVES];
+} dt_host;
+
+static void
+dt_attach_nic(dt_host *host, int index,
+              tree *nic, VLANState *vlan)
+{
+    host->nic[index] = nic;
+    host->nic_vlan[index] = vlan;
+}
+
+static VLANState *
+dt_find_vlan(tree *conf, dt_host *host)
+{
+    int i;
+
+    for (i = 0; i < MAX_NICS; i++) {
+        if (host->nic[i] == conf)
+            return host->nic_vlan[i];
+    }
+    return NULL;
+}
+
+static void
+dt_attach_drive(dt_host *host, int index,
+                tree *controller, BlockDriverState *state)
+{
+    host->drive_ctrl[index] = controller;
+    host->drive_state[index] = state;
+}
+
+static void
+dt_drive_config(tree *conf, dt_host *host,
+                BlockDriverState *drive[], int n)
+{
+    int i, j;
+
+    j = 0;
+    for (i = 0; i < MAX_DRIVES; i++) {
+        if (host->drive_ctrl[i] != conf)
+            continue;
+        assert(j < n);
+        drive[j++] = host->drive_state[i];
+    }
+}
+
+static void
+dt_print_host_config(dt_host *host)
+{
+    char buf[1024];
+    int i;
+
+    for (i = 0; i < MAX_NICS; i++) {
+        if (!host->nic[i])
+            continue;
+        tree_path(host->nic[i], buf, sizeof(buf));
+        printf("nic#%d\tvlan %-4d\t%s\n",
+               i, host->nic_vlan[i]->id, buf);
+    }
+
+    for (i = 0; i < MAX_DRIVES; i++) {
+        if (!host->drive_ctrl[i])
+            continue;
+        tree_path(host->drive_ctrl[i], buf, sizeof(buf));
+        printf("drive#%d\t%-15s %s\n",
+               i, bdrv_get_device_name(host->drive_state[i]), buf);
+    }
+}
+
+
+/* Device Interface */
+
+/*
+ * Device life cycle:
+ *
+ * 1. Configuration: config() method runs after parent's.  Except kids
+ * are skipped when the parent's config() returns non-zero.  config()
+ * should initialize the device's private data from its configuration
+ * sub-tree.  It may edit the configuration sub-tree, and may declare
+ * initialization ordering constraints with dt_require_named().
+ *
+ * 2. Initialization: init() method runs after parent's and after that
+ * of devices declared required by config().  It should not touch the
+ * configuration tree.
+ *
+ * 3. Start: start() method runs, order is unspecified.
+ *
+ * Error handling in these driver methods: print to stderr and exit
+ * the program unsuccessfully.
+ *
+ * There is no device shutdown protocol yet.
+ */
+
+struct dt_device {
+    tree *conf;                 /* configuration sub-tree */
+    dt_driver *drv;             /* device driver */
+    LIST_HEAD(, dt_tree_list) reqs; /* required devices */
+    int visit;                  /* for dt_visit() */
+    void *priv;                 /* device private data */
+};
+
+struct dt_tree_list {
+    tree *conf;
+    LIST_ENTRY(dt_tree_list) link;
+};
+
+struct dt_driver {
+    const char *name;
+    size_t privsz;              /* size of device private data */
+    dt_prop_spec *prop_spec;    /* recognized conf node properties */
+    int (*config)(dt_device *, dt_host *);
+    void (*init)(dt_device *);
+    void (*start)(dt_device *);
+};
+
+static dt_driver dt_driver_table[];
+
+static dt_driver *
+dt_driver_by_name(const char *name)
+{
+    int i;
+
+    for (i = 0; dt_driver_table[i].name; i++) {
+        if (!strcmp(name, dt_driver_table[i].name))
+            return &dt_driver_table[i];
+    }
+    return NULL;
+}
+
+static dt_device *
+dt_device_of(tree *conf)
+{
+    return tree_get_user(conf);
+}
+
+static dt_device *
+dt_parent_device(dt_device *dev)
+{
+    tree *p = tree_parent(dev->conf);
+
+    return p ? dt_device_of(p) : NULL;
+}
+
+static dt_device *
+dt_new_device(tree *conf, dt_driver *drv)
+{
+    dt_device *dev;
+    tree_prop *prop;
+
+    dev = qemu_malloc(sizeof(*dev));
+    dev->conf = conf;
+    dev->drv = drv;
+    LIST_INIT(&dev->reqs);
+    dev->visit = 0;
+    dev->priv = qemu_malloc(drv->privsz);
+    tree_put_user(conf, dev);
+
+    TREE_FOREACH_PROP(prop, conf)
+        dt_parse_prop(dev, prop);
+
+    return dev;
+}
+
+static void
+dt_config(tree *conf, dt_host *host)
+{
+    dt_driver *drv;
+    dt_device *dev;
+    tree *kid;
+
+    drv = dt_driver_by_name(tree_node_name(conf));
+    if (!drv) {
+        fprintf(stderr, "No driver for device %s\n",
+                tree_node_name(conf));
+        exit(1);
+    }
+    dev = dt_new_device(conf, drv);
+    if (drv->config) {
+        if (drv->config(dev, host))
+            return;
+    }
+
+    TREE_FOREACH_KID(kid, conf)
+        dt_config(kid, host);
+}
+
+static tree *
+dt_require_named(dt_device *dev, const char *reqname)
+{
+    dt_tree_list *l = qemu_malloc(sizeof(*l));
+
+    l->conf = tree_node_by_name(dev->conf, reqname);
+    LIST_INSERT_HEAD(&dev->reqs, l, link);
+    return l->conf;
+}
+
+static void
+dt_do_visit(dt_device *dev,
+            void (*fun)(dt_device *, void *arg),
+            void *arg, int visit)
+{
+    dt_device *parent, *req, *kid;
+    dt_tree_list *l;
+    tree *k;
+
+    assert(dev->visit < visit - 1);
+    dev->visit = visit - 1;
+    parent = dt_parent_device(dev);
+    if (parent && parent->visit < visit)
+        dt_do_visit(parent, fun, arg, visit);
+    LIST_FOREACH(l, &dev->reqs, link) {
+        req = dt_device_of(l->conf);
+        if (req->visit < visit)
+            dt_do_visit(req, fun, arg, visit);
+    }
+    dev->visit = visit;
+    fun(dev, arg);
+    TREE_FOREACH_KID(k, dev->conf) {
+        kid = dt_device_of(k);
+        if (kid->visit < visit - 1)
+            dt_do_visit(kid, fun, arg, visit);
+    }
+}
+
+static void
+dt_visit(tree *node,
+         void (*fun)(dt_device *, void *arg),
+         void *arg)
+{
+    static int visit;
+
+    visit += 2;
+    dt_do_visit(dt_device_of(node), fun, arg, visit);
+}
+
+static void
+dt_init_visitor(dt_device *dev, void *arg)
+{
+    if (dev->drv->init)
+        dev->drv->init(dev);
+}
+
+static void
+dt_init(tree *conf)
+{
+    dt_visit(conf, dt_init_visitor, NULL);
+}
+
+static void
+dt_start(tree *conf)
+{
+    dt_device *dev = dt_device_of(conf);
+    tree *kid;
+
+    if (dev && dev->drv->start)
+        dev->drv->start(dev);
+
+    TREE_FOREACH_KID(kid, conf)
+        dt_start(kid);
+}
+
+
+/* Device properties */
+
+/*
+ * This is for parsing configuration tree node properties into device
+ * private data.
+ */
+
+struct dt_prop_spec {
+    const char *name;
+    ptrdiff_t offs;             /* offset in device private data */
+    size_t size;                /* size there, for sanity checking */
+    int (*parse)(void *, const char *, dt_prop_spec *);
+};
+
+#define DT_PROP_SPEC_INIT(name, strty, member, fmt)                     \
+    { name, offsetof(strty, member), sizeof(((strty *)0)->member),      \
+      dt_parse_##fmt }
+
+static dt_prop_spec *
+dt_prop_spec_by_name(dt_driver *drv, const char *name)
+{
+    dt_prop_spec *spec;
+
+    for (spec = drv->prop_spec; spec && spec->name; spec++) {
+        if (!strcmp(spec->name, name))
+            return spec;
+    }
+    return NULL;
+}
+
+static void
+dt_parse_prop(dt_device *dev, tree_prop *prop)
+{
+    const char *name = tree_prop_name(prop);
+    size_t size;
+    const char *val = tree_prop_value(prop, &size);
+    dt_prop_spec *spec = dt_prop_spec_by_name(dev->drv, name);
+
+    if (!spec) {
+        fprintf(stderr, "A %s device has no property %s\n",
+                dev->drv->name, name);
+        exit(1);
+    }
+
+    if (memchr(val, 0, size) != val + size - 1
+        || spec->parse((char *)dev->priv + spec->offs, val, spec) < 0) {
+        fprintf(stderr, "Bad value %.*s for property %s of device %s\n",
+                (int)size, val, name, dev->drv->name);
+        exit(1);
+    }
+}
+
+static int
+dt_parse_string(void *dst, const char *src, dt_prop_spec *spec)
+{
+    assert(spec->size == sizeof(char *));
+    *(const char **)dst = src;
+    return 0;
+}
+
+static int
+dt_parse_int(void *dst, const char *src, dt_prop_spec *spec)
+{
+    char *ep;
+    long val;
+
+    assert(spec->size == sizeof(int));
+    errno = 0;
+    val = strtol(src, &ep, 0);
+    if (*ep || ep == src || errno || (int)val != val)
+        return -1;
+    *(int *)dst = val;
+    return 0;
+}
+
+static int
+dt_parse_ram_addr_t(void *dst, const char *src, dt_prop_spec *spec)
+{
+    char *ep;
+    unsigned long val;
+
+    assert(spec->size == sizeof(ram_addr_t));
+    errno = 0;
+    val = strtoul(src, &ep, 0);
+    if (*ep || ep == src || errno || (ram_addr_t)val != val)
+        return -1;
+    *(ram_addr_t *)dst = val;
+    return 0;
+}
+
+static int
+dt_parse_macaddr(void *dst, const char *src, dt_prop_spec *spec)
+{
+    assert(spec->size == 6);
+    if (parse_macaddr(dst, src) < 0)
+        return -1;
+    return 0;
+}
+
+
+/* Interfacing with FDT */
+
+/*
+ * Note: translation to FDT loses the association between
+ * configuration tree nodes and devices.
+ */
+
+#ifdef HAVE_FDT
+
+static int dt_fdt_chk(int res);
+static void dt_subtree_to_fdt(const tree *conf, void *fdt);
+
+static void *
+dt_tree_to_fdt(const tree *conf)
+{
+    int sz = 1024 * 1024;       /* FIXME arbitrary limit */
+    void *fdt = qemu_malloc(sz);
+
+    dt_fdt_chk(fdt_create(fdt, sz));
+    dt_subtree_to_fdt(conf, fdt);
+    dt_fdt_chk(fdt_finish(fdt));
+    return fdt;
+}
+
+static void
+dt_subtree_to_fdt(const tree *conf, void *fdt)
+{
+    tree_prop *prop;
+    tree *kid;
+    const void *pv;
+    size_t sz;
+
+    dt_fdt_chk(fdt_begin_node(fdt, tree_node_name(conf)));
+    TREE_FOREACH_PROP(prop, conf) {
+        pv = tree_prop_value(prop, &sz);
+        dt_fdt_chk(fdt_property(fdt, tree_prop_name(prop), pv, sz));
+    }
+    TREE_FOREACH_KID(kid, conf)
+        dt_subtree_to_fdt(kid, fdt);
+    dt_fdt_chk(fdt_end_node(fdt));
+}
+
+static tree *
+dt_fdt_to_tree(const void *fdt)
+{
+    int offs, next, depth;
+    uint32_t tag;
+    struct fdt_property *prop;
+    tree *stack[32];            /* FIXME arbitrary limit */
+
+    stack[0] = NULL;            /* "parent" of root */
+    next = depth = 0;
+
+    for (;;) {
+        offs = next;
+        tag = fdt_next_tag(fdt, offs, &next);
+        switch (tag) {
+        case FDT_PROP:
+            /*
+             * libfdt apparently doesn't provide a way to get property
+             * by offset, do it by hand
+             */
+            assert(0 < depth && depth < sizeof(stack) / sizeof(*stack));
+            prop = (void *)(const char *)fdt + fdt_off_dt_struct(fdt) + offs;
+            tree_put_prop(stack[depth],
+                          fdt_string(fdt, fdt32_to_cpu(prop->nameoff)),
+                          prop->data,
+                          fdt32_to_cpu(prop->len));
+        case FDT_NOP:
+            break;
+        case FDT_BEGIN_NODE:
+            depth++;
+            assert(0 < depth && depth < sizeof(stack) / sizeof(*stack));
+            stack[depth] = tree_new_kid(stack[depth-1],
+                                        fdt_get_name(fdt, offs, NULL),
+                                        NULL);
+            break;
+        case FDT_END_NODE:
+            depth--;
+            break;
+        case FDT_END:
+            dt_fdt_chk(next);
+            return stack[1];
+        }
+    }
+}
+
+static int
+dt_fdt_chk(int res)
+{
+    if (res < 0) {
+        fprintf(stderr, "%s\n", fdt_strerror(res)); /* FIXME cryptic */
+        exit(1);
+    }
+    return res;
+}
+
+static void
+dt_fdt_test(tree *conf)
+{
+    void *fdt;
+
+    fdt = dt_tree_to_fdt(conf);
+    conf = dt_fdt_to_tree(fdt);
+    tree_print(conf);
+    qemu_free(fdt);
+}
+#else
+static void dt_fdt_test(tree *conf) { }
+#endif
+
+/* CPUs Driver */
+
+typedef struct dt_device_cpus {
+    const char *model;
+    int num;
+} dt_device_cpus;
+
+static dt_prop_spec dt_cpus_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
+    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
+    { NULL, 0, 0, NULL } };     /* terminator for dt_prop_spec_by_name() */
+
+static void
+dt_cpus_init(dt_device *dev)
+{
+    dt_device_cpus *priv = dev->priv;
+    int i;
+    CPUState *env;
+
+    for(i = 0; i < priv->num; i++) {
+        env = cpu_init(priv->model);
+        if (!env) {
+            fprintf(stderr, "Unable to find x86 CPU definition\n");
+            exit(1);
+        }
+        if (i != 0)
+            env->halted = 1;
+        qemu_register_reset(main_cpu_reset, env);
+    }
+}
+
+
+/* Memory Ranges */
+
+typedef struct dt_device_memrng {
+    target_phys_addr_t phys_addr;
+    ram_addr_t size;
+    ram_addr_t host_offs;
+    ram_addr_t flags;
+} dt_device_memrng;
+
+static void
+dt_memrng(dt_device_memrng *rng,
+          target_phys_addr_t phys_addr, ram_addr_t size,
+          ram_addr_t host_offs, ram_addr_t flags)
+{
+    rng->phys_addr = phys_addr;
+    rng->size = size;
+    rng->host_offs = host_offs;
+    rng->flags = flags;
+}
+
+static void
+dt_memrng_ram(dt_device_memrng *rng,
+              target_phys_addr_t phys_addr, ram_addr_t size)
+{
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), 0);
+}
+
+static void
+dt_memrng_rom(dt_device_memrng *rng,
+              target_phys_addr_t phys_addr, ram_addr_t maxsz,
+              const char *dir, const char *image, int top)
+{
+    char buf[1024];
+    int size;
+
+    snprintf(buf, sizeof(buf), "%s/%s", dir, image);
+    size = get_image_size(buf);
+    if (size < 0 || size > maxsz)
+        goto error;
+    if (top)
+        phys_addr = phys_addr + maxsz - size;
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), IO_MEM_ROM);
+    if (load_image(buf, phys_ram_base + rng->host_offs) != size)
+        goto error;
+    return;
+
+error:
+    fprintf(stderr, "qemu: could not load image '%s'\n", buf);
+    exit(1);
+}
+
+static void
+dt_memrng_init(dt_device_memrng *rng, int n)
+{
+    int i;
+
+    for (i = 0; i < n; i++)
+        cpu_register_physical_memory(rng[i].phys_addr, rng[i].size,
+                                     rng[i].host_offs | rng[i].flags);
+}
+
+
+/* Memory Driver */
+
+typedef struct dt_device_memory {
+    ram_addr_t ram_size;
+    dt_device_memrng *rng;
+    int nrng;
+    /* TODO want a real memory map here */
+    ram_addr_t below_4g, above_4g;
+} dt_device_memory;
+
+static dt_prop_spec dt_memory_props[] = {
+    DT_PROP_SPEC_INIT("ram", dt_device_memory, ram_size, ram_addr_t),
+    { NULL, 0, 0, NULL } };     /* terminator for dt_prop_spec_by_name() */
+
+static int
+dt_memory_config(dt_device *dev, dt_host *host)
+{
+    /* TODO memory map hardcoded; get it from dev->conf instead */
+    dt_device_memory *priv = dev->priv;
+    dt_device_memrng *rng = qemu_malloc(sizeof(*rng) * 4);
+
+    if (priv->ram_size >= 0xe0000000 ) {
+        priv->above_4g = priv->ram_size - 0xe0000000;
+        priv->below_4g = 0xe0000000;
+    } else {
+        priv->below_4g = priv->ram_size;
+        priv->above_4g = 0;
+    }
+
+    dt_memrng_ram(&rng[0], 0, 0xa0000);
+    qemu_ram_alloc(0x60000);
+    dt_memrng_ram(&rng[1], 0x100000, priv->below_4g - 0x100000);
+    if (priv->above_4g)
+        abort();                /* TODO */
+    dt_memrng_rom(&rng[2], 0xe0000000, 0x20000000,
+                  bios_dir, BIOS_FILENAME, 1);
+                                /* TODO get name from dev->conf */
+    dt_memrng(&rng[3], 0xe0000, 0x20000,
+              rng[2].host_offs + rng[2].size - 0x20000, IO_MEM_ROM);
+    /* TODO option ROMs */
+
+    priv->rng = rng;
+    priv->nrng = 4;
+    return 0;
+}
+
+static void
+dt_memory_init(dt_device *dev)
+{
+    dt_device_memory *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, priv->nrng);
+    bochs_bios_init();
+}
+
+static ram_addr_t
+dt_memory_below_4g(tree *memory)
+{
+    dt_device *dev = dt_device_of(memory);
+    dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->below_4g;
+}
+
+static ram_addr_t
+dt_memory_above_4g(tree *memory)
+{
+    dt_device *dev = dt_device_of(memory);
+    dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->above_4g;
+}
+
+
+/* PC Miscellaneous Driver */
+
+/*
+ * This is a driver for a whole collection of devices.  Could be
+ * picked apart into separate drivers, I guess.
+ */
+
+typedef struct dt_device_pc_misc {
+    const char *boot_device;
+    int apic;
+    int hpet;
+    qemu_irq *i8259;
+    BlockDriverState *fd[MAX_FD];
+} dt_device_pc_misc;
+
+static dt_prop_spec dt_pc_misc_props[] = {
+    DT_PROP_SPEC_INIT("boot-device", dt_device_pc_misc, boot_device,
+                      string),
+    { NULL, 0, 0, NULL } };     /* terminator for dt_prop_spec_by_name() */
+
+static int
+dt_pc_misc_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pc_misc *priv = dev->priv;
+
+    priv->apic = 1;
+    priv->hpet = 1;
+    priv->i8259 = NULL;
+    dt_drive_config(dev->conf, host,
+                    priv->fd, sizeof(priv->fd) / sizeof(*priv->fd));
+    return 1;
+}
+
+static void
+dt_pc_misc_init(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    CPUState *env;
+    qemu_irq *cpu_irq;
+    IOAPICState *ioapic;
+    PITState *pit;
+    int i;
+
+    if (priv->apic) {
+        for (env = first_cpu; env; env = env->next_cpu) {
+            env->cpuid_features |= CPUID_APIC;
+            apic_init(env);
+        }
+    }
+
+    vmport_init();
+
+    cpu_irq = qemu_allocate_irqs(pic_irq_request, NULL, 1);
+    priv->i8259 = i8259_init(cpu_irq[0]);
+    ferr_irq = priv->i8259[13];
+
+    register_ioport_write(0x80, 1, 1, ioport80_write, NULL);
+    register_ioport_write(0xf0, 1, 1, ioportF0_write, NULL);
+
+    rtc_state = rtc_init(0x70, priv->i8259[8], 2000);
+    qemu_register_boot_set(pc_boot_set, rtc_state);
+
+    register_ioport_read(0x92, 1, 1, ioport92_read, NULL);
+    register_ioport_write(0x92, 1, 1, ioport92_write, NULL);
+
+    if (priv->apic) {
+        ioapic = ioapic_init();
+        pic_set_alt_irq_func(isa_pic, ioapic_set_irq, ioapic);
+    }
+
+    pit = pit_init(0x40, priv->i8259[0]);
+    pcspk_init(pit);
+    if (priv->hpet)
+        hpet_init(priv->i8259);
+
+    for(i = 0; i < MAX_SERIAL_PORTS; i++) {
+        if (serial_hds[i]) {
+            serial_init(serial_io[i], priv->i8259[serial_irq[i]], 115200,
+                        serial_hds[i]);
+        }
+    }
+
+    for(i = 0; i < MAX_PARALLEL_PORTS; i++) {
+        if (parallel_hds[i]) {
+            parallel_init(parallel_io[i], priv->i8259[parallel_irq[i]],
+                          parallel_hds[i]);
+        }
+    }
+
+    i8042_init(priv->i8259[1], priv->i8259[12], 0x60);
+    DMA_init(0);
+
+    floppy_controller = fdctrl_init(priv->i8259[6], 2, 0, 0x3f0, priv->fd);
+}
+
+static void
+dt_pc_misc_start(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    tree *memory = tree_node_by_name(dev->conf, "/memory");
+    tree *piix3 = tree_node_by_name(dev->conf, "/pci/piix3");
+
+    cmos_init(dt_memory_below_4g(memory),
+              dt_memory_above_4g(memory),
+              priv->boot_device,
+              dt_piix3_hd(piix3));
+}
+
+static qemu_irq *
+dt_pc_misc_i8259(tree *pc_misc)
+{
+    dt_device *dev = dt_device_of(pc_misc);
+    dt_device_pc_misc *priv = dev->priv;
+    assert(dev->drv->init == dt_pc_misc_init);
+    return priv->i8259;
+}
+
+\f

+/* PCI Bus Driver */
+
+typedef struct dt_device_pci {
+    PCIBus *bus;
+    tree *pc;
+} dt_device_pci;
+
+static int
+dt_pci_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->bus = NULL;
+    priv->pc = dt_require_named(dev, "/pc-misc");
+    return 0;
+}
+
+static void
+dt_pci_init(dt_device *dev)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->bus = i440fx_init(&i440fx_state, dt_pc_misc_i8259(priv->pc));
+}
+
+static void
+dt_pci_start(dt_device *dev)
+{
+    i440fx_init_memory_mappings(i440fx_state);
+}
+
+static void
+dt_must_be_on_pcibus(dt_device *dev)
+{
+    dt_device *bus = dt_parent_device(dev);
+
+    if (bus->drv->init != dt_pci_init) {
+        fprintf(stderr, "Device %s must be on a PCI bus\n", dev->drv->name);
+        exit(1);
+    }
+}
+
+static PCIBus *
+dt_get_pcibus(dt_device *dev)
+{
+    dt_device *bus = dt_parent_device(dev);
+
+    assert(bus->drv->init == dt_pci_init);
+    return ((dt_device_pci *)bus->priv)->bus;
+}
+
+\f

+/* PIIX3 Driver */
+
+typedef struct dt_device_piix3 {
+    int devfn;
+    int acpi;
+    int usb;
+    tree *pc;
+    BlockDriverState *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
+} dt_device_piix3;
+
+static int
+dt_piix3_config(dt_device *dev, dt_host *host)
+{
+    dt_device_piix3 *priv = dev->priv;
+
+    priv->devfn = -1;
+    priv->acpi = 1;
+    priv->usb = 1;
+    priv->pc = dt_require_named(dev, "/pc-misc");
+    dt_drive_config(dev->conf, host,
+                    priv->hd, sizeof(priv->hd) / sizeof(*priv->hd));
+    dt_must_be_on_pcibus(dev);
+    return 1;
+}
+
+static void
+dt_piix3_init(dt_device *dev)
+{
+    dt_device_piix3 *priv = dev->priv;
+    PCIBus *pci_bus = dt_get_pcibus(dev);
+    qemu_irq *i8259 = dt_pc_misc_i8259(priv->pc);
+    int i;
+
+    priv->devfn = piix3_init(pci_bus, priv->devfn);
+
+    pci_piix3_ide_init(pci_bus, priv->hd, priv->devfn + 1, i8259);
+
+    if (priv->usb)
+        usb_uhci_piix3_init(pci_bus, priv->devfn + 2);
+
+    if (priv->acpi) {
+        uint8_t *eeprom_buf = qemu_mallocz(8 * 256); /* XXX: make this persistent */
+        i2c_bus *smbus;
+
+        /* TODO: Populate SPD eeprom data.  */
+        smbus = piix4_pm_init(pci_bus, priv->devfn + 3, 0xb100, i8259[9]);
+        for (i = 0; i < 8; i++)
+            smbus_eeprom_device_init(smbus, 0x50 + i, eeprom_buf + (i * 256));
+    }
+}
+
+static BlockDriverState **
+dt_piix3_hd(tree *piix3)
+{
+    dt_device *dev = dt_device_of(piix3);
+    dt_device_piix3 *priv = dev->priv;
+
+    assert(dev->drv->init == dt_piix3_init);
+    return priv->hd;
+}
+
+\f

+/* VGA Driver */
+
+typedef struct dt_driver_vga {
+    const char *model;
+    const char *bios;
+    void (*init)(PCIBus *, uint8_t *, ram_addr_t, int);
+} dt_driver_vga;
+
+static void
+pci_vmsvga_init_(PCIBus *bus, uint8_t *vga_ram_base,
+                 ram_addr_t vga_ram_offset, int vga_ram_size)
+{
+    pci_vmsvga_init(bus, vga_ram_base, vga_ram_offset, vga_ram_size);
+}
+
+static void
+pci_vga_init_(PCIBus *bus, uint8_t *vga_ram_base,
+              ram_addr_t vga_ram_offset, int vga_ram_size)
+{
+    pci_vga_init(bus, vga_ram_base, vga_ram_offset, vga_ram_size, 0, 0);
+}
+
+static dt_driver_vga dt_driver_vga_table[] = {
+    { "cirrus", VGABIOS_CIRRUS_FILENAME, pci_cirrus_vga_init },
+    { "vms", VGABIOS_FILENAME, pci_vmsvga_init_ },
+    { "std", VGABIOS_FILENAME, pci_vga_init_ },
+    { NULL, NULL, NULL }
+};
+
+typedef struct dt_device_vga {
+    const char *model;
+    ram_addr_t ram_size;
+    dt_device_memrng rng[1];
+    ram_addr_t ram_offs;
+    dt_driver_vga *vga_drv;
+} dt_device_vga;
+
+static dt_prop_spec dt_vga_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_vga, model, string),
+    DT_PROP_SPEC_INIT("ram", dt_device_vga, ram_size, ram_addr_t),
+};
+
+static int
+dt_vga_config(dt_device *dev, dt_host *host)
+{
+    dt_device_vga *priv = dev->priv;
+    int i;
+
+    dt_memrng_rom(&priv->rng[0], 0xc0000, 0x10000,
+                  bios_dir, VGABIOS_CIRRUS_FILENAME, 0);
+                                /* TODO get name from dev->conf */
+    priv->ram_offs = qemu_ram_alloc(priv->ram_size);
+
+    for (i = 0; dt_driver_vga_table[i].model; i++) {
+        if (!strcmp(dt_driver_vga_table[i].model, priv->model))
+            break;
+    }
+    if (!dt_driver_vga_table[i].model) {
+        fprintf(stderr, "Unknown VGA model %s\n", priv->model);
+        exit(1);
+    }
+    priv->vga_drv = &dt_driver_vga_table[i];
+    dt_must_be_on_pcibus(dev);
+    return 0;
+}
+
+static void
+dt_vga_init(dt_device *dev)
+{
+    dt_device_vga *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, 1);
+    priv->vga_drv->init(dt_get_pcibus(dev),
+                        phys_ram_base + priv->ram_offs,
+                        priv->ram_offs, priv->ram_size);
+}
+
+\f

+/* NIC Driver */
+
+typedef struct dt_device_nic {
+    NICInfo nd;
+} dt_device_nic;
+
+static dt_prop_spec dt_nic_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_nic, nd.model, string),
+    DT_PROP_SPEC_INIT("mac", dt_device_nic, nd.macaddr, macaddr),
+    DT_PROP_SPEC_INIT("name", dt_device_nic, nd.name, string),
+};
+
+static int
+dt_nic_config(dt_device *dev, dt_host *host)
+{
+    dt_device_nic *priv = dev->priv;
+
+    priv->nd.vlan = dt_find_vlan(dev->conf, host);
+    dt_must_be_on_pcibus(dev);
+    return 0;
+}
+
+static void
+dt_nic_init(dt_device *dev)
+{
+    dt_device_nic *priv = dev->priv;
+
+    pci_nic_init(dt_get_pcibus(dev), &priv->nd, -1, NULL);
+}
+
+\f

+/* Machine Driver */
+
+static dt_driver dt_driver_table[] = {
+    { "", 0, NULL, NULL },
+    { "cpus", sizeof(dt_device_cpus), dt_cpus_props,
+      NULL, dt_cpus_init, NULL },
+    { "memory", sizeof(dt_device_memory), dt_memory_props,
+      dt_memory_config, dt_memory_init, NULL },
+    { "pc-misc", sizeof(dt_device_pc_misc), dt_pc_misc_props,
+      dt_pc_misc_config, dt_pc_misc_init, dt_pc_misc_start },
+    { "pci", sizeof(dt_device_pci), NULL,
+      dt_pci_config, dt_pci_init, dt_pci_start },
+    { "piix3", sizeof(dt_device_piix3), NULL,
+      dt_piix3_config, dt_piix3_init, NULL },
+    { "vga", sizeof(dt_device_vga), dt_vga_props,
+      dt_vga_config, dt_vga_init, NULL },
+    { "nic", sizeof(dt_device_nic), dt_nic_props,
+      dt_nic_config, dt_nic_init, NULL },
+    { NULL, 0, NULL, NULL, NULL }
+};
+
+static tree *
+dt_read_config(void)
+{
+    tree *root, *pci, *leaf;
+
+    /*
+     * TODO Read from config file.
+     *
+     * TODO Pretty far from a comprehensive machine configuration, but
+     * we need to start somewhere.
+     */
+    root = tree_new_kid(NULL, "", NULL);
+    leaf = tree_new_kid(root, "cpus", NULL);
+    tree_put_propf(leaf, "model", "%s", "qemu32");
+    leaf = tree_new_kid(root, "memory", NULL);
+    leaf = tree_new_kid(root, "pc-misc", NULL);
+    pci = tree_new_kid(root, "pci", NULL);
+    leaf = tree_new_kid(pci, "piix3", NULL);
+    return root;
+}
+
+/*
+ * Extract configuration from arguments and various global variables
+ * and put it into our machine and host configuration.
+ */
+static void
+dt_customize_config(tree *conf,
+                    dt_host *host,
+                    ram_addr_t ram_size, int vga_ram_size,
+                    const char *boot_device,
+                    const char *kernel_filename,
+                    const char *kernel_cmdline,
+                    const char *initrd_filename,
+                    const char *cpu_model)
+{
+    /*
+     * TODO This is still pretty cheesy: we insert stuff into the tree
+     * at hardcoded places.  Replacing placeholders instead would be
+     * more flexible.  Another idea is to mark certain parts of the
+     * initial tree optional, and remove them here.
+     */
+    tree *pci = tree_node_by_name(conf, "/pci");
+    tree *node;
+    int i, index;
+
+    node = tree_node_by_name(conf, "/cpus");
+    tree_put_propf(node, "num", "%d", smp_cpus);
+    if (cpu_model)
+        tree_put_propf(node, "model", "%s", cpu_model);
+
+    node = tree_node_by_name(conf, "/memory");
+    tree_put_propf(node, "ram", "%#lx", (unsigned long)ram_size);
+
+    node = tree_node_by_name(conf, "/pc-misc");
+    tree_put_propf(node, "boot-device", "%s", boot_device);
+
+    /* Insert VGA node */
+    if (cirrus_vga_enabled || vmsvga_enabled || std_vga_enabled) {
+        node = tree_new_kid(pci, "vga", NULL);
+        tree_put_propf(node, "model", "%s",
+                          cirrus_vga_enabled ? "cirrus" :
+                          vmsvga_enabled ? "vms" : "std");
+        tree_put_propf(node, "ram", "%#x", vga_ram_size);
+    }
+
+    /* Insert NIC nodes, connect to VLANs */
+    for(i = 0; i < nb_nics; i++) {
+        /* TODO non-PCI NICs */
+        NICInfo *n = &nd_table[i];
+
+        node = tree_new_kid(pci, "nic", NULL);
+        tree_put_propf(node, "mac", "%02x:%02x:%02x:%02x:%02x:%02x",
+                       n->macaddr[0], n->macaddr[1], n->macaddr[2],
+                       n->macaddr[3], n->macaddr[4], n->macaddr[5]);
+        tree_put_propf(node, "model", "%s",
+                       n->model ? n->model : "ne2k_pci");
+        if (n->name)
+            tree_put_propf(node, "name", "%s", n->name);
+        dt_attach_nic(host, i, node, n->vlan);
+    }
+
+    /* Connect drives to their controller nodes */
+    /* IDE */
+    node = tree_node_by_name(pci, "piix3");
+    for(i = 0; i < MAX_IDE_BUS * MAX_IDE_DEVS; i++) {
+        index = drive_get_index(IF_IDE, i / MAX_IDE_DEVS, i % MAX_IDE_DEVS);
+        if (index != -1)
+            dt_attach_drive(host, index, node, drives_table[index].bdrv);
+    }
+    /* Floppy */
+    node = tree_node_by_name(conf, "/pc-misc");
+    for(i = 0; i < MAX_FD; i++) {
+        index = drive_get_index(IF_FLOPPY, 0, i);
+        if (index != -1)
+            dt_attach_drive(host, index, node, drives_table[index].bdrv);
+    }
+
+    /* Unimplemented stuff */
+    if (kernel_filename)
+        abort();                /* TODO */
+}
+
+static void
+pc_init_dt(ram_addr_t ram_size, int vga_ram_size,
+           const char *boot_device,
+           const char *kernel_filename,
+           const char *kernel_cmdline,
+           const char *initrd_filename,
+           const char *cpu_model)
+{
+    tree *conf;
+    dt_host host;
+
+    conf = dt_read_config();
+    if (!conf)
+        exit(1);
+    tree_print(conf);
+    memset(&host, 0, sizeof(host));
+    dt_customize_config(conf, &host, ram_size, vga_ram_size, boot_device,
+                        kernel_filename, kernel_cmdline, initrd_filename,
+                        cpu_model);
+    dt_config(conf, &host);
+    tree_print(conf);
+    dt_print_host_config(&host);
+    dt_fdt_test(conf);
+    dt_init(conf);
+    dt_start(conf);
+}
+
+QEMUMachine pcdt_machine = {
+    .name = "pcdt",
+    .desc = "Standard PC (device tree)",
+    .init = pc_init_dt,
+    .ram_require = VGA_RAM_SIZE + PC_MAX_BIOS_SIZE,
+    .max_cpus = 255,
+};
diff --git a/hw/pc.c b/hw/pc.c
index 57ba803..107afb7 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -37,41 +37,34 @@
 #include "virtio-balloon.h"
 #include "virtio-console.h"
 #include "hpet_emul.h"
+#include "pcint.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
 
-#define BIOS_FILENAME "bios.bin"
-#define VGABIOS_FILENAME "vgabios.bin"
-#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
-
-#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
-
 /* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables.  */
 #define ACPI_DATA_SIZE       0x10000
 #define BIOS_CFG_IOPORT 0x510
 
-#define MAX_IDE_BUS 2
-
-static fdctrl_t *floppy_controller;
-static RTCState *rtc_state;
+fdctrl_t *floppy_controller;
+RTCState *rtc_state;
 static PITState *pit;
 static IOAPICState *ioapic;
-static PCIDevice *i440fx_state;
+PCIDevice *i440fx_state;
 
-static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
 {
 }
 
 /* MSDOS compatibility mode FPU exception support */
-static qemu_irq ferr_irq;
+qemu_irq ferr_irq;
 /* XXX: add IGNNE support */
 void cpu_set_ferr(CPUX86State *s)
 {
     qemu_irq_raise(ferr_irq);
 }
 
-static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
 {
     qemu_irq_lower(ferr_irq);
 }
@@ -120,7 +113,7 @@ int cpu_get_pic_interrupt(CPUState *env)
     return intno;
 }
 
-static void pic_irq_request(void *opaque, int irq, int level)
+void pic_irq_request(void *opaque, int irq, int level)
 {
     CPUState *env = first_cpu;
 
@@ -166,7 +159,7 @@ static int cmos_get_fd_drive_type(int fd0)
     return val;
 }
 
-static void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
 {
     RTCState *s = rtc_state;
     int cylinders, heads, sectors;
@@ -202,7 +195,7 @@ static int boot_device2nibble(char boot_device)
 
 /* copy/pasted from cmos_init, should be made a general function
  and used there as well */
-static int pc_boot_set(void *opaque, const char *boot_device)
+int pc_boot_set(void *opaque, const char *boot_device)
 {
 #define PC_MAX_BOOT_DEVICES 3
     RTCState *s = (RTCState *)opaque;
@@ -228,8 +221,8 @@ static int pc_boot_set(void *opaque, const char *boot_device)
 }
 
 /* hd_table must contain 4 block drivers */
-static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
-                      const char *boot_device, BlockDriverState **hd_table)
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table)
 {
     RTCState *s = rtc_state;
     int nbds, bds[3] = { 0, };
@@ -362,13 +355,13 @@ int ioport_get_a20(void)
     return ((first_cpu->a20_mask >> 20) & 1);
 }
 
-static void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
 {
     ioport_set_a20((val >> 1) & 1);
     /* XXX: bit 0 is fast reset */
 }
 
-static uint32_t ioport92_read(void *opaque, uint32_t addr)
+uint32_t ioport92_read(void *opaque, uint32_t addr)
 {
     return ioport_get_a20() << 1;
 }
@@ -420,7 +413,7 @@ static void bochs_bios_write(void *opaque, uint32_t addr, uint32_t val)
     }
 }
 
-static void bochs_bios_init(void)
+void bochs_bios_init(void)
 {
     void *fw_cfg;
 
@@ -687,7 +680,7 @@ static void load_linux(uint8_t *option_rom,
     generate_bootsect(option_rom, gpr, seg, 0);
 }
 
-static void main_cpu_reset(void *opaque)
+void main_cpu_reset(void *opaque)
 {
     CPUState *env = opaque;
     cpu_reset(env);
@@ -702,11 +695,11 @@ static const int ide_irq[2] = { 14, 15 };
 static int ne2000_io[NE2000_NB_MAX] = { 0x300, 0x320, 0x340, 0x360, 0x280, 0x380 };
 static int ne2000_irq[NE2000_NB_MAX] = { 9, 10, 11, 3, 4, 5 };
 
-static int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
-static int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
+int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
+int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
 
-static int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
-static int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
+int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
+int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
 
 #ifdef HAS_AUDIO
 static void audio_init (PCIBus *pci_bus, qemu_irq *pic)
diff --git a/hw/pcint.h b/hw/pcint.h
new file mode 100644
index 0000000..f18da67
--- /dev/null
+++ b/hw/pcint.h
@@ -0,0 +1,46 @@
+/*
+ * Stuff shared by pc.c and dt.c
+ *
+ * See dt.c for why this should go away eventually.
+ */
+
+#ifndef HW_PC_INT_H
+#define HW_PC_INT_H
+
+#define BIOS_FILENAME "bios.bin"
+#define VGABIOS_FILENAME "vgabios.bin"
+#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
+
+#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
+
+#define MAX_IDE_BUS 2
+
+/* TODO move to ferr stuff in cpu.h? */
+extern qemu_irq ferr_irq;
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data);
+
+/* TODO eliminate */
+extern RTCState *rtc_state;
+extern PCIDevice *i440fx_state;
+extern int serial_io[MAX_SERIAL_PORTS];
+extern int serial_irq[MAX_SERIAL_PORTS];
+extern int parallel_io[MAX_PARALLEL_PORTS];
+extern int parallel_irq[MAX_PARALLEL_PORTS];
+extern fdctrl_t *floppy_controller;
+
+/* TODO move to pic stuff in pc.h? */
+void pic_irq_request(void *opaque, int irq, int level);
+
+/* TODO move to a20 stuff in pc.h? */
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val);
+uint32_t ioport92_read(void *opaque, uint32_t addr);
+
+void bochs_bios_init(void);
+void main_cpu_reset(void *opaque);
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data);
+int pc_boot_set(void *opaque, const char *boot_device);
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd);
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table);
+
+#endif
diff --git a/net.c b/net.c
index 29beb28..5ee3ba4 100644
--- a/net.c
+++ b/net.c
@@ -153,7 +153,7 @@ static void hex_dump(FILE *f, const uint8_t *buf, int size)
 }
 #endif
 
-static int parse_macaddr(uint8_t *macaddr, const char *p)
+int parse_macaddr(uint8_t *macaddr, const char *p)
 {
     int i;
     char *last_char;
diff --git a/net.h b/net.h
index 03c7f18..f672915 100644
--- a/net.h
+++ b/net.h
@@ -47,6 +47,7 @@ int qemu_can_send_packet(VLANClientState *vc);
 ssize_t qemu_sendv_packet(VLANClientState *vc, const struct iovec *iov,
                           int iovcnt);
 void qemu_send_packet(VLANClientState *vc, const uint8_t *buf, int size);
+int parse_macaddr(uint8_t *macaddr, const char *p);
 void qemu_format_nic_info_str(VLANClientState *vc, uint8_t macaddr[6]);
 void qemu_check_nic_model(NICInfo *nd, const char *model);
 void qemu_check_nic_model_list(NICInfo *nd, const char * const *models,
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 1cf49d5..01329d2 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -7,6 +7,8 @@
 
 void register_machines(void)
 {
+    extern QEMUMachine pcdt_machine;
+    qemu_register_machine(&pcdt_machine);
     qemu_register_machine(&pc_machine);
     qemu_register_machine(&isapc_machine);
 }
diff --git a/tree.c b/tree.c
new file mode 100644
index 0000000..a906a6a
--- /dev/null
+++ b/tree.c
@@ -0,0 +1,298 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/* Ye Olde Decorated Tree */
+
+#include <assert.h>
+#include "tree.h"
+#include "qemu-common.h"
+#include "sys-queue.h"
+
+struct tree {
+    const char *name;
+    LIST_HEAD(, tree_prop) props;
+    tree *parent;
+    TAILQ_HEAD(, tree) kids;
+    TAILQ_ENTRY(tree) siblings;
+    void *user;
+};
+
+struct tree_prop {
+    const char *name;
+    const void *val;
+    int sz;
+    tree *owner;
+    LIST_ENTRY(tree_prop) link;
+};
+
+tree *
+tree_new_kid(tree *parent, const char *name, void *user)
+{
+    tree *kid = qemu_malloc(sizeof(*kid));
+
+    assert(parent || !*name);
+    kid->name = name;
+    LIST_INIT(&kid->props);
+    kid->parent = parent;
+    TAILQ_INIT(&kid->kids);
+    if (parent)
+        TAILQ_INSERT_TAIL(&parent->kids, kid, siblings);
+    kid->user = user;
+
+    return kid;
+}
+
+const char *
+tree_node_name(const tree *node)
+{
+    return node->name;
+}
+
+static tree *
+tree_kid_by_name(const tree *dt, const char *name)
+{
+    const char *slash = strchr(name, '/');
+    size_t len = slash ? slash - name : strlen(name);
+    tree *kid;
+
+    TAILQ_FOREACH(kid, &dt->kids, siblings) {
+        if (!strncmp(kid->name, name, len) && kid->name[len] == 0)
+            return kid;
+    }
+    return NULL;
+}
+
+tree *
+tree_node_by_name(const tree *node, const char *name)
+{
+    tree *kid;
+    size_t len;
+
+    if (name[0] == '/') {
+        for (; node->parent; node = node->parent) ;
+        name++;
+    }
+
+    if (name[0] == 0)
+        return (tree *)node;
+
+    kid = tree_kid_by_name(node, name);
+    if (!kid)
+        return NULL;
+
+    len = strlen(kid->name);
+    if (name[len] == 0)
+        return kid;
+    assert (name[len] == '/');
+
+    while (name[len] == '/') len++;
+    return tree_node_by_name(kid, name + len);
+}
+
+tree_prop *
+tree_first_prop(const tree *node)
+{
+    return LIST_FIRST(&node->props);
+}
+
+tree_prop *
+tree_next_prop(const tree_prop *prop)
+{
+    return LIST_NEXT(prop, link);
+}
+
+tree_prop *
+tree_get_prop(const tree *node, const char *name)
+{
+    tree_prop *prop;
+
+    LIST_FOREACH(prop, &node->props, link) {
+        if (!strcmp(prop->name, name))
+            return prop;
+    }
+    return NULL;
+}
+
+const char *
+tree_get_prop_s(const tree *node, const char *name)
+{
+    tree_prop *prop = tree_get_prop(node, name);
+    if (!prop
+        || memchr(prop->val, 0, prop->sz) != prop->val + prop->sz - 1) {
+        errno = EINVAL;
+        return NULL;
+    }
+    return prop->val;
+}
+
+const char *
+tree_prop_name(const tree_prop *prop)
+{
+    return prop->name;
+}
+
+const void *
+tree_prop_value(const tree_prop *prop, size_t *size)
+{
+    if (size)
+        *size = prop->sz;
+    return prop->val;
+}
+
+void
+tree_put_prop(tree *node, const char *name,
+              const void *val, size_t sz)
+{
+    tree_prop *prop;
+
+    prop = tree_get_prop(node, name);
+    if (!prop) {
+        prop = qemu_malloc(sizeof(*prop));
+        prop->name = name;
+        prop->owner = node;
+        LIST_INSERT_HEAD(&node->props, prop, link);
+    }
+    /* FIXME need a destructor for val */
+    prop->val = val;
+    prop->sz = sz;
+}
+
+void
+tree_put_propf(tree *node, const char *name, const char *fmt, ...)
+{
+    va_list ap;
+    size_t len;
+    char *buf;
+
+    va_start(ap, fmt);
+    len = vsnprintf(NULL, 0, fmt, ap);
+    va_end(ap);
+
+    buf = qemu_malloc(len + 1);
+    va_start(ap, fmt);
+    vsnprintf(buf, len + 1, fmt, ap);
+    va_end(ap);
+
+    tree_put_prop(node, name, buf, len + 1);
+}
+
+void
+tree_put_user(tree *node, void *user)
+{
+    node->user = user;
+}
+
+void *
+tree_get_user(const tree *node)
+{
+    return node->user;
+}
+
+tree *
+tree_parent(const tree *node)
+{
+    return node->parent;
+}
+
+tree *
+tree_first_kid(const tree *node)
+{
+    return TAILQ_FIRST(&node->kids);
+}
+
+tree *
+tree_sibling(const tree *node)
+{
+    return TAILQ_NEXT(node, siblings);
+}
+
+int
+tree_path(const tree *node, char *buf, size_t bufsz)
+{
+    char *p;
+    const tree *np;
+    size_t len, res;
+
+    p = buf + bufsz;
+    res = 0;
+    for (np = node; np->parent; np = np->parent) {
+        len = 1 + strlen(np->name);
+        res += len;
+        if (res >= bufsz)
+            continue;
+        p -= len;
+        memcpy(p + 1, np->name, len - 1);
+        p[0] = '/';
+    }
+
+    if (res < bufsz) {
+        memmove(buf, p, res);   /* source and destination may overlap */
+        buf[res] = 0;
+    }
+
+    return res;
+}
+
+static void
+tree_print_sub(const tree *node, int indent)
+{
+    int i, use_str, sep;
+    const unsigned char *pv;
+    tree_prop *prop;
+    tree *kid;
+
+    printf("%*s%s {\n", indent, "", node->name[0] ? node->name : "/");
+    LIST_FOREACH(prop, &node->props, link) {
+        printf("%*s%s", indent + 4, "", prop->name);
+        pv = prop->val;
+        if (pv) {
+            printf(" = ");
+            use_str = pv[prop->sz - 1] == 0;
+            for (i = 0; i < prop->sz - 1; i++) {
+                if (!isprint(pv[i]))
+                    use_str = 0;
+            }
+            if (use_str)
+                printf("\"%s\"", (const char *)prop->val);
+            else {
+                sep = '[';
+                for (i = 0; i < prop->sz; i++) {
+                    printf("%c%02x", sep, pv[i]);
+                    sep = ' ';
+                }
+                printf("]");
+            }
+        }
+        printf(";\n");
+    }
+    TAILQ_FOREACH(kid, &node->kids, siblings)
+        tree_print_sub(kid, indent + 4);
+    printf("%*s};\n", indent, "");
+}
+
+void
+tree_print(const tree *node)
+{
+    tree_print_sub(node, 0);
+}
diff --git a/tree.h b/tree.h
new file mode 100644
index 0000000..3e596f8
--- /dev/null
+++ b/tree.h
@@ -0,0 +1,40 @@
+#ifndef TREE_H
+#define TREE_H
+
+#include <stddef.h>
+
+typedef struct tree tree;
+typedef struct tree_prop tree_prop;
+
+tree *tree_new_kid(tree *parent, const char *name, void *user);
+const char *tree_node_name(const tree *node);
+tree *tree_node_by_name(const tree *node,
+                               const char *name);
+
+tree_prop *tree_first_prop(const tree *node);
+tree_prop *tree_next_prop(const tree_prop *prop);
+#define TREE_FOREACH_PROP(var, node) \
+    for (var = tree_first_prop(node); var; var = tree_next_prop(var))
+tree_prop *tree_get_prop(const tree *node, const char *name);
+const char *tree_get_prop_s(const tree *node, const char *name);
+const char *tree_prop_name(const tree_prop *prop);
+const void *tree_prop_value(const tree_prop *prop, size_t *size);
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz);
+void tree_put_propf(tree *node, const char *name,
+                    const char *fmt, ...)
+    __attribute__((format(printf,3,4)));
+
+void tree_put_user(tree *node, void *user);
+void *tree_get_user(const tree *node);
+
+tree *tree_parent(const tree *node);
+tree *tree_first_kid(const tree *node);
+tree *tree_sibling(const tree *node);
+#define TREE_FOREACH_KID(var, node)                                     \
+    for (var = tree_first_kid(node); var; var = tree_sibling(var))
+
+int tree_path(const tree *node, char *buf, size_t bufsz);
+void tree_print(const tree *node);
+
+#endif


* Re: [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data)
  2009-02-19 10:29 ` [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data) Markus Armbruster
@ 2009-02-19 13:53   ` Paul Brook
  2009-02-19 14:55     ` [Qemu-devel] Machine description as data prototype, take 3 Markus Armbruster
  2009-02-19 14:36   ` Anthony Liguori
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 146+ messages in thread
From: Paul Brook @ 2009-02-19 13:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Markus Armbruster

On Thursday 19 February 2009, Markus Armbruster wrote:
> Third iteration of the prototype.
>
> What about an early merge?  If your answer to that is "yes, but", what
> exactly do you want changed?

I dislike that you've got everything lumped together. In its current form it's 
unclear that it's actually an improvement from what we currently have. There 
still seems to be an awful lot of code that's extremely PC specific, and I 
can't tell whether/which interfaces achieve separation between the legacy 
hardcoded PC nastiness and generic machine descriptions.

I'm also worried that we have both user config (image files, vlans, etc) and 
machine description (devices instantiation, etc) described in the same place. 
I still believe these should be separate tasks, with clear boundaries between 
the two.

Paul


* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-19 10:29 ` [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data) Markus Armbruster
  2009-02-19 13:53   ` Paul Brook
@ 2009-02-19 14:36   ` Anthony Liguori
  2009-02-19 15:00     ` Markus Armbruster
  2009-02-19 14:49   ` Anthony Liguori
  2009-02-19 16:40   ` [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data) Blue Swirl
  3 siblings, 1 reply; 146+ messages in thread
From: Anthony Liguori @ 2009-02-19 14:36 UTC (permalink / raw)
  To: qemu-devel

Markus Armbruster wrote:
> Third iteration of the prototype.
>
> What about an early merge?  If your answer to that is "yes, but", what
> exactly do you want changed?
>   

I'm all for an early merge but I think there has to be enough of the 
architectural changes in place to allow other people to understand the 
long term direction and also contribute.

I think the following are required for merge:

1) introduction of a new machine init function that returns a tree
2) code outside of dt.c, when -drive if=ide is specified, walks the tree 
looking for a node with an IDE decoration.  Finds the appropriate 
master/slave primary/secondary slot, and hooks up the BlockDriverState 
to the IDE device.
3) reading the machine description from a file

Basically, enough of the architecture that it's clear that we just need 
to do #2 for all of the remaining devices.  I don't think you're that far 
from this today.
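
To make requirement 2 concrete, here is a minimal sketch of such a walk.
It does not use the tree.c API from the patch: struct node, its "type"
decoration, node_find_by_type() and node_attach_drive() are hypothetical
stand-ins, and the drive is an opaque pointer rather than a real
BlockDriverState.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a decorated tree node (not the patch's struct tree). */
struct node {
    const char *name;
    const char *type;           /* decoration, e.g. "ide"; NULL if none */
    struct node *first_kid;
    struct node *sibling;
    void *drive[4];             /* slot = bus * 2 + unit */
};

/* Depth-first search for the first node decorated with the given type. */
static struct node *node_find_by_type(struct node *n, const char *type)
{
    struct node *kid, *found;

    if (!n)
        return NULL;
    if (n->type && !strcmp(n->type, type))
        return n;
    for (kid = n->first_kid; kid; kid = kid->sibling) {
        found = node_find_by_type(kid, type);
        if (found)
            return found;
    }
    return NULL;
}

/*
 * Hook a drive up to the IDE node: bus 0/1 is primary/secondary,
 * unit 0/1 is master/slave.  Returns 0 on success, -1 if there is no
 * IDE node, the slot is out of range, or the slot is already taken.
 */
static int node_attach_drive(struct node *root, int bus, int unit, void *bdrv)
{
    struct node *ide = node_find_by_type(root, "ide");
    int slot = bus * 2 + unit;

    if (!ide || bus < 0 || bus > 1 || unit < 0 || unit > 1 || ide->drive[slot])
        return -1;
    ide->drive[slot] = bdrv;
    return 0;
}
```

The point of the sketch is only that the caller needs nothing PC specific:
it asks the tree for a node with the right decoration and fills a slot.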

> New:
>
> * Rebased to git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6626 c046a42c-6fe2-441c-8c8c-71466251a162
>
> * Code duplication cleaned up.  I chose minimizing the impact on pc.c
>   over nice, clean interfaces.  Happy to rework it if that was the wrong
>   choice.  I think there are a few opportunities for cleanup that would
>   improve pc.c even without taking dt.c into consideration.  I can work
>   on patches if you like.
>
> * The "device required" edges moved from struct tree to struct dt_device
>   to make the configuration tree more similar to FDTs structurally.
>
> * A bunch of pointless typedefs to hopefully blend in better
>   stylistically.  Tabs expanded.  If style issues remain, please point
>   them out to me!
>   

I'll respond in a separate note but the style is still off.

Regards,

Anthony Liguori


* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-19 10:29 ` [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data) Markus Armbruster
  2009-02-19 13:53   ` Paul Brook
  2009-02-19 14:36   ` Anthony Liguori
@ 2009-02-19 14:49   ` Anthony Liguori
  2009-02-23 17:38     ` Markus Armbruster
  2009-02-19 16:40   ` [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data) Blue Swirl
  3 siblings, 1 reply; 146+ messages in thread
From: Anthony Liguori @ 2009-02-19 14:49 UTC (permalink / raw)
  To: qemu-devel


> diff --git a/hw/dt.c b/hw/dt.c
> +\f

>   

Please remove the ^Ls.  They don't render properly in my mail client.

> +/* Host Configuration */
> +
> +typedef struct dt_host {
> +    /* connection NICs <-> VLAN */
> +    tree *nic[MAX_NICS];
> +    VLANState *nic_vlan[MAX_NICS];
> +    /* connection drives <-> controller */
> +    tree *drive_ctrl[MAX_DRIVES];
> +    BlockDriverState *drive_state[MAX_DRIVES];
> +} dt_host;
>
>   

typedef struct DeviceTreeHost
{
} DeviceTreeHost;

I'm not sure this structure is going to scale well as we introduce more 
types of host devices.  You don't necessarily need to address the host 
configuration file part of this at this stage.

I think it would be perfectly fine to require, to start with, that the 
command line configuration matches the described machine file.  For 
instance, if you see:

-net tap -net nic,model=rtl8139

Then you should search for an rtl8139 and configure the node to be on 
vlan=0.  If an rtl8139 doesn't exist, throw an error.

The long-term goal would be to have a mechanism to modify the tree in a 
generic way, and the -net nic code would end up looking like:

node = find_next_device("type=nic,model=rtl8139");
if (!node) {
   node = find_bus("type=pcibus");
   if (!node)
       bail out
   node = add_node_to_bus(node, 
"type=nic,model=rtl8139,remaining_description_of_rtl8139");
   if (!node)
       bail out
}

attach_nic_to_vlan(vlan, node);
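
A minimal, self-contained sketch of that find-or-add flow, under loud assumptions: the Node type, node_new(), find_free_device() and net_nic_attach() are hypothetical stand-ins, not existing QEMU interfaces, and the description strings from the pseudocode are simplified to a type and a model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy tree node for illustration only; not QEMU's data structure. */
typedef struct Node {
    const char *type;       /* e.g. "nic", "pcibus" */
    const char *model;      /* e.g. "rtl8139"; NULL for buses */
    int vlan;               /* -1 while not attached to a VLAN */
    struct Node *kids;      /* first child */
    struct Node *next;      /* next sibling */
} Node;

/* Allocate a node and link it under parent (parent may be NULL). */
static Node *node_new(Node *parent, const char *type, const char *model)
{
    Node *n = calloc(1, sizeof(*n));

    if (!n)
        return NULL;
    n->type = type;
    n->model = model;
    n->vlan = -1;
    if (parent) {
        n->next = parent->kids;
        parent->kids = n;
    }
    return n;
}

/* Depth-first search for an unattached device of the given type and model. */
static Node *find_free_device(Node *n, const char *type, const char *model)
{
    Node *kid, *hit;

    if (!n)
        return NULL;
    if (!strcmp(n->type, type) && n->model && !strcmp(n->model, model)
        && n->vlan < 0)
        return n;
    for (kid = n->kids; kid; kid = kid->next) {
        hit = find_free_device(kid, type, model);
        if (hit)
            return hit;
    }
    return NULL;
}

/* Depth-first search for the first bus node of the given type. */
static Node *find_bus(Node *n, const char *type)
{
    Node *kid, *hit;

    if (!n)
        return NULL;
    if (!strcmp(n->type, type))
        return n;
    for (kid = n->kids; kid; kid = kid->next) {
        hit = find_bus(kid, type);
        if (hit)
            return hit;
    }
    return NULL;
}

/* Reuse a free NIC of this model, or add one to the PCI bus; NULL on failure. */
static Node *net_nic_attach(Node *root, const char *model, int vlan)
{
    Node *node = find_free_device(root, "nic", model);

    if (!node) {
        Node *bus = find_bus(root, "pcibus");

        if (!bus)
            return NULL;        /* bail out: nowhere to plug the NIC in */
        node = node_new(bus, "nic", model);
        if (!node)
            return NULL;        /* bail out: allocation failed */
    }
    node->vlan = vlan;
    return node;
}
```

In this sketch, a second -net nic,model=rtl8139 on a machine that describes only one such NIC would add a new node to the bus instead of failing; the stricter "throw an error" behaviour described earlier would simply return NULL at that point.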

> +static dt_driver *
> +dt_driver_by_name(const char *name)
>   

While I'm not wildly opposed to this style (it's nice for grepping), 
most of the rest of the code doesn't do this (it keeps it on the same line).

Regards,

Anthony Liguori


* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-19 13:53   ` Paul Brook
@ 2009-02-19 14:55     ` Markus Armbruster
  2009-02-19 15:03       ` Paul Brook
  0 siblings, 1 reply; 146+ messages in thread
From: Markus Armbruster @ 2009-02-19 14:55 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul Brook <paul@codesourcery.com> writes:

> On Thursday 19 February 2009, Markus Armbruster wrote:
>> Third iteration of the prototype.
>>
>> What about an early merge?  If your answer to that is "yes, but", what
>> exactly do you want changed?
>
> I dislike that you've got everything lumped together. In its current form it's 
> unclear that it's actually an improvement from what we currently have. There 
> still seems to be an awful lot of code that's extremely PC specific, and I 
> can't tell whether/which interfaces achieve separation from the legacy 
> hardcoded PC nastyness and generic machine descriptions.

I think I got the PC nastiness encapsulated in driver methods.  I can
put them into a separate file if you think that would help.

> I'm also worried that we have both user config (image files, vlans, etc) and 
> machine description (devices instantiation, etc) described in the same place. 
> I still believe these should be separate tasks, with clear boundaries between 
> the two.

Actually, we don't.  The machine configuration tree does not contain any
host configuration (if it does, it's an oversight I'd be happy to fix).
That is all in struct dt_host, a completely separate data structure.

The two data structures join in two places:

1. dt_customize_config(), which bridges the gap between our current
configuration system and the new configuration data structure.  Since
the former has machine and host configuration mixed up, dealing with the
two together there is unavoidable.  But keeping them separate from there
on paves the way for keeping them separate from the start some day.

2. The driver config methods, which necessarily use both the machine and
the host configuration.


* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-19 14:36   ` Anthony Liguori
@ 2009-02-19 15:00     ` Markus Armbruster
  0 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-19 15:00 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori <anthony@codemonkey.ws> writes:

> Markus Armbruster wrote:
>> Third iteration of the prototype.
>>
>> What about an early merge?  If your answer to that is "yes, but", what
>> exactly do you want changed?
>>   
>
> I'm all for an early merge but I think there has to be enough of the
> architectural changes in place to allow other people to understand the
> long term direction and also contribute.
>
> I think the following are required for merge:
>
> 1) introduction of a new machine init function that returns a tree
> 2) code outside of dt.c, when -drive if=ide is specified, walks the
> tree looking for a node with an IDE decoration.  Finds the appropriate
> master/slave primary/secondary slot, and hooks up the BlockDriverState
> to the IDE device.
> 3) reading the machine description from a file
>
> Basically, enough of the architecture that it's clear that we just
> need to do #2 for all of the remaining devices.  I don't think your
> that far from this today.

Okay, I'll attack (1) and (2) next, and then we can talk again.

>> New:
>>
>> * Rebased to git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6626 c046a42c-6fe2-441c-8c8c-71466251a162
>>
>> * Code duplication cleaned up.  I chose minimizing the impact on pc.c
>>   over nice, clean interfaces.  Happy to rework it if that was the wrong
>>   choice.  I think there are a few opportunities for cleanup that would
>>   improve pc.c even without taking dt.c into consideration.  I can work
>>   on patches if you like.
>>
>> * The "device required" edges moved from struct tree to struct dt_device
>>   to make the configuration tree more similar to FDTs structurally.
>>
>> * A bunch of pointless typedefs to hopefully blend in better
>>   stylistically.  Tabs expanded.  If style issues remain, please point
>>   them out to me!
>>   
>
> I'll respond in a separate note but the style is still off.

Appreciated.


* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-19 14:55     ` [Qemu-devel] Machine description as data prototype, take 3 Markus Armbruster
@ 2009-02-19 15:03       ` Paul Brook
  0 siblings, 0 replies; 146+ messages in thread
From: Paul Brook @ 2009-02-19 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Markus Armbruster

On Thursday 19 February 2009, Markus Armbruster wrote:
> Paul Brook <paul@codesourcery.com> writes:
> > On Thursday 19 February 2009, Markus Armbruster wrote:
> >> Third iteration of the prototype.
> >>
> >> What about an early merge?  If your answer to that is "yes, but", what
> >> exactly do you want changed?
> >
> > I dislike that you've got everything lumped together. In its current form
> > it's unclear that it's actually an improvement from what we currently
> > have. There still seems to be an awful lot of code that's extremely PC
> > specific, and I can't tell whether/which interfaces achieve separation
> > from the legacy hardcoded PC nastyness and generic machine descriptions.
>
> I think I got the PC nastiness encapsulated in driver methods.  I can
> put them into a separate file if you think that would help.

That would definitely help.

In principle there should be no PC specific code. It should all be either 
generic code (in dt.c or similar), or device specific code (in the 
appropriate device implementations).

Paul


* Re: [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data)
  2009-02-19 10:29 ` [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data) Markus Armbruster
                     ` (2 preceding siblings ...)
  2009-02-19 14:49   ` Anthony Liguori
@ 2009-02-19 16:40   ` Blue Swirl
  2009-02-19 18:30     ` [Qemu-devel] Machine description as data prototype, take 3 Markus Armbruster
  3 siblings, 1 reply; 146+ messages in thread
From: Blue Swirl @ 2009-02-19 16:40 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4467 bytes --]

On 2/19/09, Markus Armbruster <armbru@redhat.com> wrote:
> Third iteration of the prototype.
>
>  What about an early merge?  If your answer to that is "yes, but", what
>  exactly do you want changed?

Not until the device tree discussion is finished and Qemu release is
out. This isn't something we want to rush in. There is still Paul's
development and even Fabrice's original proposal which both have
relative merits.

>  +static int
>  +dt_parse_int(void *dst, const char *src, dt_prop_spec *spec)

dst should be uint64_t *.

>  +{
>  +    char *ep;
>  +    long val;

uint64_t val

>  +
>  +    assert(spec->size == sizeof(int));
>  +    errno = 0;
>  +    val = strtol(src, &ep, 0);

strtoull

>  +    if (*ep || ep == src || errno || (int)val != val)
>  +        return -1;
>  +    *(int *)dst = val;
>  +    return 0;
>  +}
>  +
>  +static int
>  +dt_parse_ram_addr_t(void *dst, const char *src, dt_prop_spec *spec)

ram_addr_t *dst

>  +{
>  +    char *ep;
>  +    unsigned long val;

ram_addr_t val

>  +
>  +    assert(spec->size == sizeof(ram_addr_t));
>  +    errno = 0;
>  +    val = strtoul(src, &ep, 0);

strtoull

>  +typedef struct dt_device_cpus {
>  +    const char *model;
>  +    int num;
>  +} dt_device_cpus;
>  +
>  +static dt_prop_spec dt_cpus_props[] = {
>  +    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
>  +    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
>  +};

There should be one node for each cpu, not "num". Each node is named
after the CPU model, like /SUNW,UltraSPARC-IIi.

>  +static dt_prop_spec dt_memory_props[] = {
>  +    DT_PROP_SPEC_INIT("ram", dt_device_memory, ram_size, ram_addr_t),
>  +};

Memory node should be named "/memory". It has properties "available"
and "reg"; in this case we only want "reg". The "reg" property consists
of several phys_addr, size pairs.
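
As a rough illustration of what a list of (phys_addr, size) pairs could look like on the C side, here is a small sketch; RegRange and reg_total_size() are made-up names, and how the patch would actually store "reg" is not settled anywhere in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a "reg"-style property: a physical address and a size. */
typedef struct RegRange {
    uint64_t phys_addr;
    uint64_t size;
} RegRange;

/* Total number of bytes covered by n ranges; overlap is not checked. */
static uint64_t reg_total_size(const RegRange *reg, size_t n)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += reg[i].size;
    return total;
}
```

A single "ram" size, as in the patch, corresponds to the special case of one range; with several ranges the same number can still be derived by summing the sizes.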

>  +static dt_prop_spec dt_pc_misc_props[] = {
>  +    DT_PROP_SPEC_INIT("boot-device", dt_device_pc_misc, boot_device,
>  +                      string),
>  +};

This property is quite standard, the correct place is under "/options".

>  +static dt_prop_spec dt_vga_props[] = {
>  +    DT_PROP_SPEC_INIT("model", dt_device_vga, model, string),
>  +    DT_PROP_SPEC_INIT("ram", dt_device_vga, ram_size, ram_addr_t),

Again, there is no "model" property, but the node name specifies the model.

"ram" is not correct, this should be under "reg" property.

>  +static dt_prop_spec dt_nic_props[] = {
>  +    DT_PROP_SPEC_INIT("model", dt_device_nic, nd.model, string),
>  +    DT_PROP_SPEC_INIT("mac", dt_device_nic, nd.macaddr, macaddr),
>  +    DT_PROP_SPEC_INIT("name", dt_device_nic, nd.name, string),
>  +};

"name" is the node name, you can't use it for anything else.

Again, node name should specify the model.

>  +    root = tree_new_kid(NULL, "", NULL);
>  +    leaf = tree_new_kid(root, "cpus", NULL);
>  +    tree_put_propf(leaf, "model", "%s", "qemu32");
>  +    leaf = tree_new_kid(root, "memory", NULL);
>  +    leaf = tree_new_kid(root, "pc-misc", NULL);

Remove pc-misc.

>  +    pci = tree_new_kid(root, "pci", NULL);
>  +    leaf = tree_new_kid(pci, "piix3", NULL);

"piix3" is equal to "pci". In this case, there will not be any "piix3"
node, "pci" takes its place. Any known PCI devices use either their
class (like "pci" for PCI bridges) or model specific name, like
"ebus".

>  +    node = tree_node_by_name(pci, "piix3");
>  +    for(i = 0; i < MAX_IDE_BUS * MAX_IDE_DEVS; i++) {
>  +        index = drive_get_index(IF_IDE, i / MAX_IDE_DEVS, i % MAX_IDE_DEVS);
>  +        if (index != -1)
>  +            dt_attach_drive(host, index, node, drives_table[index].bdrv);
>  +    }

For the PIIX IDE controller (under "/pci" node) the correct name is "ide".

>  +    /* Floppy */
>  +    node = tree_node_by_name(conf, "/pc-misc");
>  +    for(i = 0; i < MAX_FD; i++) {
>  +        index = drive_get_index(IF_FLOPPY, 0, i);
>  +        if (index != -1)
>  +            dt_attach_drive(host, index, node, drives_table[index].bdrv);
>  +    }

ISA devices should be put either under a special "/isa" node, or if
there is a PCI-to-ISA bridge, "/pci/isa" or whatever the connection
is.

I have a troubling feeling that you have not read the 1275 standard or
looked at how real OpenFirmware machines name things. I've attached a
Sparc64 tree as an example, please also read the OF standards at:

http://playground.sun.com/pub/p1275/

I'd still like to thank you for your efforts so far, this is a
workable starting point.

[-- Attachment #2: fire-t200-tree.bz2 --]
[-- Type: application/x-bzip2, Size: 5823 bytes --]


* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-19 16:40   ` [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data) Blue Swirl
@ 2009-02-19 18:30     ` Markus Armbruster
  2009-02-20 18:14       ` Blue Swirl
  0 siblings, 1 reply; 146+ messages in thread
From: Markus Armbruster @ 2009-02-19 18:30 UTC (permalink / raw)
  To: qemu-devel

Blue Swirl <blauwirbel@gmail.com> writes:

> On 2/19/09, Markus Armbruster <armbru@redhat.com> wrote:
>> Third iteration of the prototype.
>>
>>  What about an early merge?  If your answer to that is "yes, but", what
>>  exactly do you want changed?
>
> Not until the device tree discussion is finished and Qemu release is
> out. This isn't something we want to rush in. There is still Paul's
> development and even Fabrice's original proposal which both have
> relative merits.
>
>>  +static int
>>  +dt_parse_int(void *dst, const char *src, dt_prop_spec *spec)
>
> dst should be uint64_t *.
>
>>  +{
>>  +    char *ep;
>>  +    long val;
>
> uint64_t val
>
>>  +
>>  +    assert(spec->size == sizeof(int));
>>  +    errno = 0;
>>  +    val = strtol(src, &ep, 0);
>
> strtoull

The first parameter is void * because this is a dt_prop_spec parse
method.

This particular method parses int, not uint64_t.

>>  +    if (*ep || ep == src || errno || (int)val != val)
>>  +        return -1;
>>  +    *(int *)dst = val;
>>  +    return 0;
>>  +}
>>  +
>>  +static int
>>  +dt_parse_ram_addr_t(void *dst, const char *src, dt_prop_spec *spec)
>
> ram_addr_t *dst
>
>>  +{
>>  +    char *ep;
>>  +    unsigned long val;
>
> ram_addr_t val

Not a good idea, I fear.  I use the type returned by strtoul(), because
that ensures there's no truncation in the assignment.  The conversion to
ram_addr_t happens later, in the part you snipped, and is carefully
checked for truncation.

>>  +
>>  +    assert(spec->size == sizeof(ram_addr_t));
>>  +    errno = 0;
>>  +    val = strtoul(src, &ep, 0);
>
> strtoull

Makes sense if we want to support ram_addr_t wider than long.  Do we?
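
For the sake of discussion, a sketch of how the strtoull variant could keep the same "parse wide, then check the narrowing" structure. This is not the code from the patch: parse_ram_addr() is a made-up name, and ram_addr_t is typedef'ed to uint32_t here purely as a stand-in so the truncation check has something to catch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in only: the real ram_addr_t is defined by QEMU and may be wider. */
typedef uint32_t ram_addr_t;

/* Parse src into *dst; returns 0 on success, -1 on any error. */
static int parse_ram_addr(ram_addr_t *dst, const char *src)
{
    char *ep;
    unsigned long long val;     /* exactly the type strtoull() returns */

    errno = 0;
    val = strtoull(src, &ep, 0);
    if (ep == src || *ep || errno)
        return -1;              /* empty, trailing junk, or out of range */
    if ((ram_addr_t)val != val)
        return -1;              /* value does not fit into ram_addr_t */
    *dst = (ram_addr_t)val;
    return 0;
}
```

The intermediate variable keeps the type the library function returns, so the only narrowing conversion is the explicitly checked one.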

>>  +typedef struct dt_device_cpus {
>>  +    const char *model;
>>  +    int num;
>>  +} dt_device_cpus;
>>  +
>>  +static dt_prop_spec dt_cpus_props[] = {
>>  +    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
>>  +    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
>>  +};
>
> There should be one node for each cpu, not "num". Each node is named
> after the CPU model, like /SUNW,UltraSPARC-IIi.
>
>>  +static dt_prop_spec dt_memory_props[] = {
>>  +    DT_PROP_SPEC_INIT("ram", dt_device_memory, ram_size, ram_addr_t),
>>  +};
>
> Memory node should be name "/memory". It has properties "available"
> and "reg", in this case we only want "reg". "reg" property consists of
> several phys_addr, size pairs.
>
>>  +static dt_prop_spec dt_pc_misc_props[] = {
>>  +    DT_PROP_SPEC_INIT("boot-device", dt_device_pc_misc, boot_device,
>>  +                      string),
>>  +};
>
> This property is quite standard, the correct place is under "/options".
>
>>  +static dt_prop_spec dt_vga_props[] = {
>>  +    DT_PROP_SPEC_INIT("model", dt_device_vga, model, string),
>>  +    DT_PROP_SPEC_INIT("ram", dt_device_vga, ram_size, ram_addr_t),
>
> Again, there is no "model" property, but the node name specifies the model.
>
> "ram" is not correct, this should be under "reg" property.
>
>>  +static dt_prop_spec dt_nic_props[] = {
>>  +    DT_PROP_SPEC_INIT("model", dt_device_nic, nd.model, string),
>>  +    DT_PROP_SPEC_INIT("mac", dt_device_nic, nd.macaddr, macaddr),
>>  +    DT_PROP_SPEC_INIT("name", dt_device_nic, nd.name, string),
>>  +};
>
> "name" is the node name, you can't use it to anything else.
>
> Again, node name should specify the model.
>
>>  +    root = tree_new_kid(NULL, "", NULL);
>>  +    leaf = tree_new_kid(root, "cpus", NULL);
>>  +    tree_put_propf(leaf, "model", "%s", "qemu32");
>>  +    leaf = tree_new_kid(root, "memory", NULL);
>>  +    leaf = tree_new_kid(root, "pc-misc", NULL);
>
> Remove pc-misc.
>
>>  +    pci = tree_new_kid(root, "pci", NULL);
>>  +    leaf = tree_new_kid(pci, "piix3", NULL);
>
> "piix3" is equal to "pci". In this case, there will not be any "piix3"
> node, "pci" takes it's place. Any known PCI devices use either their
> class (like "pci" for PCI bridges) or model specific name, like
> "ebus".
>
>>  +    node = tree_node_by_name(pci, "piix3");
>>  +    for(i = 0; i < MAX_IDE_BUS * MAX_IDE_DEVS; i++) {
>>  +        index = drive_get_index(IF_IDE, i / MAX_IDE_DEVS, i % MAX_IDE_DEVS);
>>  +        if (index != -1)
>>  +            dt_attach_drive(host, index, node, drives_table[index].bdrv);
>>  +    }
>
> For the PIIX IDE controller (under "/pci" node) the correct name is "ide".
>
>>  +    /* Floppy */
>>  +    node = tree_node_by_name(conf, "/pc-misc");
>>  +    for(i = 0; i < MAX_FD; i++) {
>>  +        index = drive_get_index(IF_FLOPPY, 0, i);
>>  +        if (index != -1)
>>  +            dt_attach_drive(host, index, node, drives_table[index].bdrv);
>>  +    }
>
> ISA devices should be put either under a special "/isa" node, or if
> there is an PCI-to-ISA bridge, "/pci/isa" or whatever the connection
> is.
>
> I have a troubling feeling that you have not read the 1275 standard or
> looked how real OpenFirmware machines name things. I've attached a
> Sparc64 tree as an example, please also read the OF standards at:
>
> http://playground.sun.com/pub/p1275/

To be honest, I read just enough on 1275 to

1. develop doubts on whether it is a good match for the problem
(discussed elsewhere in this thread), and

2. more importantly, realize that if I set out to master 1275 before
touching code, I'd certainly get bogged down in details before I could
accomplish anything useful, and/or get too bored to continue.

So I decided to once again exercise the three principal virtues
(Laziness, Impatience, and Hubris) and just go ahead and create some
working code, so we can have the kind of productive discussion we're
having now.

Let me stress: so far my work has *not* been about bringing 1275 or any
other configuration data structure to QEMU.  It's been chiefly exploring
how to configure and build a virtual machine, driven by configuration
data, talking to device code through an abstract device interface.  I
feel that details of configuration data encoding, like whether something
is encoded in the node name or a property, are entirely tangential to
that effort.  How exactly you decorate those trees doesn't affect the
abstract device interface at all.  It affects the machine builder, but I
doubt it affects it structurally.

It goes without saying that I'm fully prepared to change my
configuration data encoding.  However, I'd like to tackle the
restructuring Anthony recommended first.  Once I've got that done, I'll
be happy to revisit your recommendations on config data encoding.

> I'd still like to thank you for your efforts so far, this is a
> workable starting point.

Thanks, that's encouraging.


* Re: [coreboot] DTS syntax and DTC patches (was: Re: [Qemu-devel] [RFC] Machine description as data)
       [not found]                                 ` <13426df10902130907m5c3452dpb8f4f2b72f8507b9-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-02-20  2:29                                   ` David Gibson
       [not found]                                     ` <20090220022918.GA18332-787xzQ0H9iRg7VrjXcPTGA@public.gmane.org>
  0 siblings, 1 reply; 146+ messages in thread
From: David Gibson @ 2009-02-20  2:29 UTC (permalink / raw)
  To: ron minnich
  Cc: Markus Armbruster, devicetree-discuss-mnsaURCQ41sdnm+yROfE0A,
	Carl-Daniel Hailfinger, Hollis Blanchard, Coreboot

On Fri, Feb 13, 2009 at 09:07:08AM -0800, ron minnich wrote:
> Here is the sum total of the differences from when we checked it in
> over 2 years ago until now (parser). Our real changes are to
> flattree.c and livetree.c, where we do some ugly by-hand parsing of
> the ids such that pci@1,0 etc. work. I'd love to see a way to bring
> this into the real syntax. I've tried to do as little as possible to
> .y and .l.
> 
> The diff with comments is attached.
> 
> But this brings up a bigger issue and we could use your help.
> 
> OK, what did we do? We implemented the ability to have a sort of
> template. Here is a sample from real use.
> 
> /{
> 	mainboard_vendor = "Artec";
> 	mainboard_name = "DBE62";
> 	cpus { };
> 	apic@0 {
> 		/config/("northbridge/amd/geodelx/apic");
> 	};
> 	domain@0 {
> 		/config/("northbridge/amd/geodelx/domain");
> 		pci@1,0 {
> 			/config/("northbridge/amd/geodelx/pci");
> 			/* Video RAM has to be in 2MB chunks. */
> 			geode_video_mb = "16";
> 		};
> 	etc.
> 
> so what's going on here?
> 
> The config file in most cases is pretty straightforward. It's actually
> just a list of properties with a standard setting for chip control. We
> MUST have this; we don't want hundreds of settings in each mainboard,
> because sometimes a chip fix comes along and we want that to go into
> one chip file, and set the correct value, and have all mainboards get
> the new value next time they are built.
> 
> Let's look at /config/("northbridge/amd/geodelx/pci");
> 
> {
> 	device_operations = "geodelx_mc";
> 
> 	/* Video RAM has to be in 2MB chunks. */
> 	geode_video_mb = "0";
> };
> 
> The device_operations property is processed by flattree and is of no
> importance to you, but it is used in coreboot .h and .c code
> generation. For coreboot use, we have several property names that are
> special.
> 
> Note that we create a property, geode_video_mb, and set it to 0.
> 
> In the mainboard dts, we over-ride this value, and set it to 16.
> 
> These are pretty much the changes and, again, they work. But I'd like
> more, as would our community.
> 
> Right now, we can take a file containing a list of dts properties,
> read them in, and modify them as above. It's not really ideal, and I
> am sure you can already see it could be done better. But what we
> really want is the ability to read in  a dts node (with subnodes,
> etc.) and then elide them in the mainboard file.
> 
> So, for example, we have this subsection of one mainboard:
> 
> 		pci@6{ /* Port 2 */
> 			/config/("southbridge/amd/rs690/pcie.dts");
> 		};
> 		pci@7{ /* Port 3 */
> 			/config/("southbridge/amd/rs690/pcie.dts");
> 		};
> 		pci@12{
> 			/config/("southbridge/amd/sb600/hda.dts");
> 		};
> 		pci@13,0{
> 			/config/("southbridge/amd/sb600/usb.dts");
> 		};
> 		pci@13,1{
> 			/config/("southbridge/amd/sb600/usb.dts");
> 		};
> 		pci@13,2{
> 			/config/("southbridge/amd/sb600/usb.dts");
> 		};
> 
> This is not a bunch of chips, but one chip. It has lots of pci devices
> in it; this one chip is equivalent to a whole mainboard from previous
> years. What we'd really like is the ability to do what my wife calls
> restrict, add, and remove (I don't have these terms just right, it's
> some kind of compiler-speak which is what she does for a living).

Hrm, I see.  So, if we added the ability to list properties multiple
times, with the last definition overriding earlier ones, then I
believe that, along with include files, which are already supported,
would accomplish what you have implemented with /config/.  Does that
seem correct?
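
To make the "last definition wins" semantics concrete, a toy sketch follows. It is not how dtc stores properties; PropList, prop_put() and prop_get() are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PROPS 16

typedef struct Prop {
    const char *name;
    const char *value;
} Prop;

typedef struct PropList {
    Prop props[MAX_PROPS];
    size_t n;
} PropList;

/*
 * Define a property.  A later definition with the same name replaces
 * the earlier value.  Returns 0 on success, -1 if the list is full.
 */
static int prop_put(PropList *pl, const char *name, const char *value)
{
    size_t i;

    for (i = 0; i < pl->n; i++) {
        if (!strcmp(pl->props[i].name, name)) {
            pl->props[i].value = value;
            return 0;
        }
    }
    if (pl->n == MAX_PROPS)
        return -1;
    pl->props[pl->n].name = name;
    pl->props[pl->n].value = value;
    pl->n++;
    return 0;
}

/* Look up a property value; NULL if it was never defined. */
static const char *prop_get(const PropList *pl, const char *name)
{
    size_t i;

    for (i = 0; i < pl->n; i++) {
        if (!strcmp(pl->props[i].name, name))
            return pl->props[i].value;
    }
    return NULL;
}
```

With this, processing the included chip file first (geode_video_mb = "0") and the mainboard's own line afterwards (geode_video_mb = "16") leaves a single geode_video_mb property with the value "16".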

> Restrict we have; change property values from a default.
> Add is what we'd like: add a node to a tree in some way.
> Remove we would also like: remove a node from a dts we have read in
> via /config/.

Hrm.  Well, this sort of thing is certainly on the cards with the
expression support stuff we had in mind.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [coreboot] DTS syntax and DTC patches (was: Re: [Qemu-devel] [RFC] Machine description as data)
       [not found]                                     ` <20090220022918.GA18332-787xzQ0H9iRg7VrjXcPTGA@public.gmane.org>
@ 2009-02-20  3:32                                       ` ron minnich
  0 siblings, 0 replies; 146+ messages in thread
From: ron minnich @ 2009-02-20  3:32 UTC (permalink / raw)
  To: ron minnich, Carl-Daniel Hailfinger, Coreboot, Markus Armbruster, Hollis

On Thu, Feb 19, 2009 at 6:29 PM, David Gibson <dwg-8fk3Idey6ehBDgjK7y7TUQ@public.gmane.org> wrote:

> Hrm, I see.  So, if we added the ability to list properties multiple
> times, with the last definition overriding earlier ones, then I
> believe that, along with include files, which are already supported
> would accomplish what you have implemented with /config/.  Does that
> seem correct?


That ought to do it.

>
>> Restrict we have; change property values from a default.
>> Add is what we'd like: add a node to a tree in some way.
>> Remove we would also like: remove a node from a dts we have read in
>> via /config/.
>
> Hrm.  Well, this sort of thing is certainly on the cards with the
> expression support stuff we had in mind.


Would be just what we need.

Thanks

ron


* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-19 18:30     ` [Qemu-devel] Machine description as data prototype, take 3 Markus Armbruster
@ 2009-02-20 18:14       ` Blue Swirl
  2009-02-20 18:20         ` Paul Brook
  2009-02-23 12:18         ` Markus Armbruster
  0 siblings, 2 replies; 146+ messages in thread
From: Blue Swirl @ 2009-02-20 18:14 UTC (permalink / raw)
  To: qemu-devel

On 2/19/09, Markus Armbruster <armbru@redhat.com> wrote:
> Blue Swirl <blauwirbel@gmail.com> writes:
>
>
>  > On 2/19/09, Markus Armbruster <armbru@redhat.com> wrote:
>  >> Third iteration of the prototype.
>  >>
>  >>  What about an early merge?  If your answer to that is "yes, but", what
>  >>  exactly do you want changed?
>  >
>
> > Not until the device tree discussion is finished and Qemu release is
>  > out. This isn't something we want to rush in. There is still Paul's
>  > development and even Fabrice's original proposal which both have
>  > relative merits.
>  >
>  >>  +static int
>  >>  +dt_parse_int(void *dst, const char *src, dt_prop_spec *spec)
>  >
>  > dst should be uint64_t *.
>  >
>  >>  +{
>  >>  +    char *ep;
>  >>  +    long val;
>  >
>  > uint64_t val
>  >
>  >>  +
>  >>  +    assert(spec->size == sizeof(int));
>  >>  +    errno = 0;
>  >>  +    val = strtol(src, &ep, 0);
>  >
>  > strtoull
>
>  The first parameter is void * because this is a dt_prop_spec parse
>  method.
>
>  This particular method parses int, not uint64_t.

But we want to support 64 bit stuff as well, with this change it's easy.

>  >>  +    if (*ep || ep == src || errno || (int)val != val)
>  >>  +        return -1;
>  >>  +    *(int *)dst = val;
>  >>  +    return 0;
>  >>  +}
>  >>  +
>  >>  +static int
>  >>  +dt_parse_ram_addr_t(void *dst, const char *src, dt_prop_spec *spec)
>  >
>  > ram_addr_t *dst
>  >
>  >>  +{
>  >>  +    char *ep;
>  >>  +    unsigned long val;
>  >
>  > ram_addr_t val
>
>  Not a good idea, I fear.  I use the type returned by strtoul(), because
>  that ensures there's no truncation in the assignment.  The conversion to
>  ram_addr_t happens later, in the part you snipped, and is carefully
>  checked for truncation.
>
>  >>  +
>  >>  +    assert(spec->size == sizeof(ram_addr_t));
>  >>  +    errno = 0;
>  >>  +    val = strtoul(src, &ep, 0);
>  >
>  > strtoull
>
>  Makes sense if we want to support ram_addr_t wider than long.  Do we?

No, I don't think so. I was again thinking of 64 bit memory addresses,
but "long" should still be 64 bits in that case.

>  >>  +typedef struct dt_device_cpus {
>  >>  +    const char *model;
>  >>  +    int num;
>  >>  +} dt_device_cpus;
>  >>  +
>  >>  +static dt_prop_spec dt_cpus_props[] = {
>  >>  +    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
>  >>  +    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
>  >>  +};
>  >
>  > There should be one node for each cpu, not "num". Each node is named
>  > after the CPU model, like /SUNW,UltraSPARC-IIi.
>  >
>  >>  +static dt_prop_spec dt_memory_props[] = {
>  >>  +    DT_PROP_SPEC_INIT("ram", dt_device_memory, ram_size, ram_addr_t),
>  >>  +};
>  >
>  > Memory node should be name "/memory". It has properties "available"
>  > and "reg", in this case we only want "reg". "reg" property consists of
>  > several phys_addr, size pairs.
>  >
>  >>  +static dt_prop_spec dt_pc_misc_props[] = {
>  >>  +    DT_PROP_SPEC_INIT("boot-device", dt_device_pc_misc, boot_device,
>  >>  +                      string),
>  >>  +};
>  >
>  > This property is quite standard, the correct place is under "/options".
>  >
>  >>  +static dt_prop_spec dt_vga_props[] = {
>  >>  +    DT_PROP_SPEC_INIT("model", dt_device_vga, model, string),
>  >>  +    DT_PROP_SPEC_INIT("ram", dt_device_vga, ram_size, ram_addr_t),
>  >
>  > Again, there is no "model" property, but the node name specifies the model.
>  >
>  > "ram" is not correct, this should be under "reg" property.
>  >
>  >>  +static dt_prop_spec dt_nic_props[] = {
>  >>  +    DT_PROP_SPEC_INIT("model", dt_device_nic, nd.model, string),
>  >>  +    DT_PROP_SPEC_INIT("mac", dt_device_nic, nd.macaddr, macaddr),
>  >>  +    DT_PROP_SPEC_INIT("name", dt_device_nic, nd.name, string),
>  >>  +};
>  >
>  > "name" is the node name, you can't use it to anything else.
>  >
>  > Again, node name should specify the model.
>  >
>  >>  +    root = tree_new_kid(NULL, "", NULL);
>  >>  +    leaf = tree_new_kid(root, "cpus", NULL);
>  >>  +    tree_put_propf(leaf, "model", "%s", "qemu32");
>  >>  +    leaf = tree_new_kid(root, "memory", NULL);
>  >>  +    leaf = tree_new_kid(root, "pc-misc", NULL);
>  >
>  > Remove pc-misc.
>  >
>  >>  +    pci = tree_new_kid(root, "pci", NULL);
>  >>  +    leaf = tree_new_kid(pci, "piix3", NULL);
>  >
>  > "piix3" is equal to "pci". In this case, there will not be any "piix3"
>  > node, "pci" takes it's place. Any known PCI devices use either their
>  > class (like "pci" for PCI bridges) or model specific name, like
>  > "ebus".
>  >
>  >>  +    node = tree_node_by_name(pci, "piix3");
>  >>  +    for(i = 0; i < MAX_IDE_BUS * MAX_IDE_DEVS; i++) {
>  >>  +        index = drive_get_index(IF_IDE, i / MAX_IDE_DEVS, i % MAX_IDE_DEVS);
>  >>  +        if (index != -1)
>  >>  +            dt_attach_drive(host, index, node, drives_table[index].bdrv);
>  >>  +    }
>  >
>  > For the PIIX IDE controller (under "/pci" node) the correct name is "ide".
>  >
>  >>  +    /* Floppy */
>  >>  +    node = tree_node_by_name(conf, "/pc-misc");
>  >>  +    for(i = 0; i < MAX_FD; i++) {
>  >>  +        index = drive_get_index(IF_FLOPPY, 0, i);
>  >>  +        if (index != -1)
>  >>  +            dt_attach_drive(host, index, node, drives_table[index].bdrv);
>  >>  +    }
>  >
>  > ISA devices should be put either under a special "/isa" node, or if
>  > there is an PCI-to-ISA bridge, "/pci/isa" or whatever the connection
>  > is.
>  >
>  > I have a troubling feeling that you have not read the 1275 standard or
>  > looked how real OpenFirmware machines name things. I've attached a
>  > Sparc64 tree as an example, please also read the OF standards at:
>  >
>  > http://playground.sun.com/pub/p1275/
>
>  To be honest, I read just enough on 1275 to
>
>  1. develop doubts on whether it is a good match for the problem
>  (discussed elsewhere in this thread), and
>
>  2. more importantly, realize that if I set out to master 1275 before
>  touching code, I'd certainly get bogged down in details before I could
>  accomplish anything useful, and/or get too bored to continue.
>
>  So I decided to once again exercise the three principal virtues
>  (Laziness, Impatience, and Hubris) and just go ahead and create some
>  working code, so we can have the kind of productive discussion we're
>  having now.

That approach may produce something that works but it may be something
that is not compatible with the whole picture, or creates unnecessary
shuffling elsewhere.

In this case, there are machines using OF (Sparc32, Sparc64 and PPC),
so the machine config design should be compatible with the OF
structures.

Here's a concrete example: You proposed /cpus/num, whereas the OF way
is adding a number of CPU nodes. It is possible to convert between the
two (if all CPU properties were identical), but it's just unnecessary
work.

>  Let me stress: so far my work has *not* been about bringing 1275 or any
>  other configuration data structure to QEMU.  It's been chiefly exploring
>  how to configure and build a virtual machine, driven by configuration
>  data, talking to device code through an abstract device interface.  I
>  feel that details of configuration data encoding, like whether something
>  is encoded in the node name or a property, are entirely tangential to
>  that effort.  How exactly you decorate those trees doesn't affect the
>  abstract device interface at all.  It affects the machine builder, but I
>  doubt it affects it structurally.

Well, then there should be no problem using the OF model as much as possible?

>  It goes without saying that I'm fully prepared to change my
>  configuration data encoding.  However, I'd like to tackle the
>  restructuring Anthony recommended first.  Once I got that done, I'll be
>  happy to revisit your recommendations on config data encoding.
>
>  > I'd still like to thank you for your efforts so far, this is a
>  > workable starting point.
>
>  Thanks, that's encouraging.
>
>
>

* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-20 18:14       ` Blue Swirl
@ 2009-02-20 18:20         ` Paul Brook
  2009-02-23 12:00           ` Markus Armbruster
  2009-02-23 12:18         ` Markus Armbruster
  1 sibling, 1 reply; 146+ messages in thread
From: Paul Brook @ 2009-02-20 18:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl

> Here's a concrete example: You proposed /cpus/num, whereas the OF way
> is adding a number of CPU nodes. It is possible to convert between the
> two (if all CPU properties were identical), but it's just unnecessary
> work.

In my implementation the device tree code doesn't actually know anything about 
CPUs. They're just another device that gets created.

Paul

* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-20 18:20         ` Paul Brook
@ 2009-02-23 12:00           ` Markus Armbruster
  0 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-23 12:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl

Paul Brook <paul@codesourcery.com> writes:

>> Here's a concrete example: You proposed /cpus/num, whereas the OF way
>> is adding a number of CPU nodes. It is possible to convert between the
>> two (if all CPU properties were identical), but it's just unnecessary
>> work.
>
> In my implementation the device tree code doesn't actually know anything about 
> CPUs. They're just another device that gets created.

Same here.

* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-20 18:14       ` Blue Swirl
  2009-02-20 18:20         ` Paul Brook
@ 2009-02-23 12:18         ` Markus Armbruster
  1 sibling, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-23 12:18 UTC (permalink / raw)
  To: qemu-devel

Blue Swirl <blauwirbel@gmail.com> writes:

> On 2/19/09, Markus Armbruster <armbru@redhat.com> wrote:
>> Blue Swirl <blauwirbel@gmail.com> writes:
>>
>>
>>  > On 2/19/09, Markus Armbruster <armbru@redhat.com> wrote:
>>  >> Third iteration of the prototype.
>>  >>
>>  >>  What about an early merge?  If your answer to that is "yes, but", what
>>  >>  exactly do you want changed?
>>  >
>>
>>  > Not until the device tree discussion is finished and Qemu release is
>>  > out. This isn't something we want to rush in. There is still Paul's
>>  > development and even Fabrice's original proposal which both have
>>  > relative merits.
>>  >
>>  >>  +static int
>>  >>  +dt_parse_int(void *dst, const char *src, dt_prop_spec *spec)
>>  >
>>  > dst should be uint64_t *.
>>  >
>>  >>  +{
>>  >>  +    char *ep;
>>  >>  +    long val;
>>  >
>>  > uint64_t val
>>  >
>>  >>  +
>>  >>  +    assert(spec->size == sizeof(int));
>>  >>  +    errno = 0;
>>  >>  +    val = strtol(src, &ep, 0);
>>  >
>>  > strtoull
>>
>>  The first parameter is void * because this is a dt_prop_spec parse
>>  method.
>>
>>  This particular method parses int, not uint64_t.
>
> But we want to support 64 bit stuff as well, with this change it's easy.

dt_parse_int() is for safe and convenient parsing into an int variable.
This includes range checking.

Other integer types need their own parsing methods.  To be created as
needed.
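
For illustration, a parser for a wider unsigned type could follow the
same pattern as dt_parse_int().  The sketch below is self-contained and
rests on assumptions: the name parse_u64 is hypothetical, and the
dt_prop_spec argument is replaced by a plain size so that it compiles
outside the QEMU tree.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical sketch, not part of the patch: parse SRC into the
 * uint64_t that DST points to.  SIZE is the size of the destination,
 * for sanity checking.  Returns 0 on success, -1 on error.
 */
static int parse_u64(void *dst, size_t size, const char *src)
{
    char *ep;
    unsigned long long val;

    assert(size == sizeof(uint64_t));
    /* strtoull() accepts a minus sign and negates; reject it up front */
    while (isspace((unsigned char)*src))
        src++;
    if (*src == '-')
        return -1;
    errno = 0;
    val = strtoull(src, &ep, 0);
    /* reject empty input, trailing junk, and out-of-range values */
    if (*ep || ep == src || errno || (uint64_t)val != val)
        return -1;
    *(uint64_t *)dst = val;
    return 0;
}
```

As in dt_parse_int(), the value is first held in the type the library
function returns, and only then narrowed with an explicit check.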

>>  >>  +    if (*ep || ep == src || errno || (int)val != val)
>>  >>  +        return -1;
>>  >>  +    *(int *)dst = val;
>>  >>  +    return 0;
>>  >>  +}
>>  >>  +
>>  >>  +static int
>>  >>  +dt_parse_ram_addr_t(void *dst, const char *src, dt_prop_spec *spec)
>>  >
>>  > ram_addr_t *dst
>>  >
>>  >>  +{
>>  >>  +    char *ep;
>>  >>  +    unsigned long val;
>>  >
>>  > ram_addr_t val
>>
>>  Not a good idea, I fear.  I use the type returned by strtoul(), because
>>  that ensures there's no truncation in the assignment.  The conversion to
>>  ram_addr_t happens later, in the part you snipped, and is carefully
>>  checked for truncation.
>>
>>  >>  +
>>  >>  +    assert(spec->size == sizeof(ram_addr_t));
>>  >>  +    errno = 0;
>>  >>  +    val = strtoul(src, &ep, 0);
>>  >
>>  > strtoull
>>
>>  Makes sense if we want to support ram_addr_t wider than long.  Do we?
>
> No, I don't think so. I was again thinking of 64 bit memory addresses,
> but "long" should still be 64 bits in that case.

Okay.

[...]
>>  > I have a troubling feeling that you have not read the 1275 standard or
>>  > looked at how real OpenFirmware machines name things. I've attached a
>>  > Sparc64 tree as an example, please also read the OF standards at:
>>  >
>>  > http://playground.sun.com/pub/p1275/
>>
>>  To be honest, I read just enough on 1275 to
>>
>>  1. develop doubts on whether it is a good match for the problem
>>  (discussed elsewhere in this thread), and
>>
>>  2. more importantly, realize that if I set out to master 1275 before
>>  touching code, I'd certainly get bogged down in details before I could
>>  accomplish anything useful, and/or get too bored to continue.
>>
>>  So I decided to once again exercise the three principal virtues
>>  (Laziness, Impatience, and Hubris) and just go ahead and create some
>>  working code, so we can have the kind of productive discussion we're
>>  having now.
>
> That approach may produce something that works but it may be something
> that is not compatible with the whole picture, or creates unnecessary
> shuffling elsewhere.

Only if people who see different aspects of the whole picture neglect to
contribute.

Attempting to create a flawless diamond from the start tends to deliver
expensive solutions to slightly wrong problems late.

> In this case, there are machines using OF (Sparc32, Sparc64 and PPC),
> so the machine config design should be compatible with the OF
> structures.
>
> Here's a concrete example: You proposed /cpus/num, whereas the OF way
> is adding a number of CPU nodes. It is possible to convert between the
> two (if all CPU properties were identical), but it's just unnecessary
> work.

I went for the stupidest solution that could possibly work.  That
solution happens to compress multiple identical CPUs into a single node.
Dropping that should be easy enough.
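
To show how small the conversion is, here is a rough sketch that turns
a CPU count into one node per CPU.  Everything in it is an assumption
made for illustration: struct node and node_new_kid() are stand-ins for
the real tree code, and the "cpu@N" node names are only a guess at a
reasonable naming scheme, not something taken from the 1275 standard.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a configuration tree node */
struct node {
    char name[32];
    struct node *kids;          /* first child */
    struct node *next;          /* next sibling */
};

static struct node *node_new_kid(struct node *parent, const char *name)
{
    struct node *n = calloc(1, sizeof(*n));
    struct node **pp;

    if (!n)
        abort();
    snprintf(n->name, sizeof(n->name), "%s", name);
    if (parent) {
        /* append at the end, so kids stay in creation order */
        for (pp = &parent->kids; *pp; pp = &(*pp)->next)
            ;
        *pp = n;
    }
    return n;
}

/* Create NUM separate nodes "cpu@0", "cpu@1", ... below CPUS */
static void expand_cpus(struct node *cpus, int num)
{
    char name[32];
    int i;

    assert(num >= 0);
    for (i = 0; i < num; i++) {
        snprintf(name, sizeof(name), "cpu@%d", i);
        node_new_kid(cpus, name);
    }
}
```

Going the other way (many nodes to one count) only works when all CPU
properties are identical, as noted above.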

>>  Let me stress: so far my work has *not* been about bringing 1275 or any
>>  other configuration data structure to QEMU.  It's been chiefly exploring
>>  how to configure and build a virtual machine, driven by configuration
>>  data, talking to device code through an abstract device interface.  I
>>  feel that details of configuration data encoding, like whether something
>>  is encoded in the node name or a property, are entirely tangential to
>>  that effort.  How exactly you decorate those trees doesn't affect the
>>  abstract device interface at all.  It affects the machine builder, but I
>>  doubt it affects it structurally.
>
> Well, then there should be no problem using the OF model as much as possible?

There are problems with using OF *now*, namely the ones I've mentioned
repeatedly:

1. Whether OF is a good match for configuring QEMU remains to be seen.

2. I can only juggle so many balls at the same time.  The reason I was
able to deliver a workable starting point was that I picked the smallest
number of balls that still promised to be interesting.  The 1275 ball
isn't among them, sorry.  That does not mean I refuse to touch that
ball.  It just means that if I add it now, I'll likely drop all the
balls.  Which won't help anybody.

The OF ball can be added later.  Until then we can certainly keep it in
mind, and avoid decisions that needlessly make its later addition
harder.

After the initial merge, my personal opinions and limitations become
less relevant.  You could replace my configuration data encoding
wholesale by OF then, without being held back by my ignorance about it.

* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-19 14:49   ` Anthony Liguori
@ 2009-02-23 17:38     ` Markus Armbruster
  2009-02-23 18:58       ` Anthony Liguori
  0 siblings, 1 reply; 146+ messages in thread
From: Markus Armbruster @ 2009-02-23 17:38 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori <anthony@codemonkey.ws> writes:

>> diff --git a/hw/dt.c b/hw/dt.c
>> +\f

>>   
>
> Please remove the ^Ls.  They don't render properly in my mail client.

Many source files contain ^L already.  But I'll drop mine if you insist.

>> +/* Host Configuration */
>> +
>> +typedef struct dt_host {
>> +    /* connection NICs <-> VLAN */
>> +    tree *nic[MAX_NICS];
>> +    VLANState *nic_vlan[MAX_NICS];
>> +    /* connection drives <-> controller */
>> +    tree *drive_ctrl[MAX_DRIVES];
>> +    BlockDriverState *drive_state[MAX_DRIVES];
>> +} dt_host;
>>
>>   
>
> typedef struct DeviceTreeHost
> {
> } DeviceTreeHost.

If you insist on CamelCase, IFindThatUglyAndHardToRead, but I can do
that.  Just typedef names?

As to the placement of braces, a quick grep shows the vast majority of
such typedefs to have the brace on the same line as the typedef.

> I'm not sure this structure is going to scale well as we introduce
> more types of host devices.  You don't necessarily need to address the
> host configuration file part of this at this stage.

Agreed.  struct dt_host was just the simplest way I could find to
separate machine from host configuration, and still keep host
configuration together in one place.  I rather like to have it in one
place right now, because it makes it easier for me to find out what host
configuration actually is.

> For instance, I think it would be perfectly fine to require, to start
> with, that the command line configuration matches the described machine
> file.  For example, if you see:
>
> -net tap -net nic,model=rtl8139
>
> Then you should search for an rtl8139 and configure the node to be on
> vlan=0.  If an rtl8139 doesn't exist, throw an error.

Conversely, when an optional tree node isn't enabled (e.g. with -net nic
for NICs), silently cut it from the tree.

> The long-term goal would be to have a mechanism to modify the tree in
> a generic way and the -net nic code would end up looking like:
>
> node = find_next_device("type=nic,model=rtl8139");
> if (!node) {
>   node = find_bus("type=pcibus");
>   if (!node)
>       bail out
>   node = add_node_to_bus(node,
> "type=nic,model=rtl8139,remaining_description_of_rtl8139");
>   if (!node)
>       bail out
> }
>
> attach_nic_to_vlan(vlan, node);

Makes sense to me.

The driver should declare on what kind(s) of bus this device can go.
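
A minimal sketch of what such a declaration might look like, under
stated assumptions: the enum, the struct and the table entries below
are all hypothetical and exist only to illustrate the idea; none of it
is in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bus kinds, one bit each so a driver can list several */
enum bus_kind {
    BUS_PCI = 1 << 0,
    BUS_ISA = 1 << 1,
    BUS_USB = 1 << 2,
};

/* Hypothetical driver descriptor; only the fields needed here */
struct driver {
    const char *name;
    unsigned buses;             /* OR of the bus kinds the device fits */
};

/* Illustrative table, terminated by a NULL name */
static const struct driver drivers[] = {
    { "rtl8139", BUS_PCI },
    { "ne2k", BUS_PCI | BUS_ISA },
    { NULL, 0 }
};

static const struct driver *driver_by_name(const char *name)
{
    int i;

    assert(name != NULL);
    for (i = 0; drivers[i].name; i++) {
        if (!strcmp(name, drivers[i].name))
            return &drivers[i];
    }
    return NULL;
}

/* Return non-zero if device NAME may be added to a bus of kind BUS */
static int driver_fits_bus(const char *name, enum bus_kind bus)
{
    const struct driver *drv = driver_by_name(name);

    return drv && (drv->buses & bus);
}
```

Generic code adding a node to a bus could then call something like
driver_fits_bus() first, and bail out if the device does not fit.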

>> +static dt_driver *
>> +dt_driver_by_name(const char *name)
>>   
>
> While I'm not wildly opposed to this style (it's nice for grepping),
> most of the rest of the code doesn't do this (it keeps it on the same
> line).

Done.

> Regards,
>
> Anthony Liguori

Thanks!

-- 
Consistently separating words by spaces became a general custom about
the tenth century A.D., and lasted until about 1957, when FORTRAN
abandoned the practice.
	-- Sun FORTRAN Reference Manual

* [Qemu-devel] Machine description as data prototype, take 4 (was: [RFC] Machine description as data)
  2009-02-11 15:40 [Qemu-devel] [RFC] Machine description as data Markus Armbruster
                   ` (4 preceding siblings ...)
  2009-02-19 10:29 ` [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data) Markus Armbruster
@ 2009-02-23 18:00 ` Markus Armbruster
  2009-02-24 20:06   ` Blue Swirl
  2009-03-03 17:46 ` [Qemu-devel] Machine description as data prototype, take 5 (was: [RFC] Machine description as data) Markus Armbruster
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 146+ messages in thread
From: Markus Armbruster @ 2009-02-23 18:00 UTC (permalink / raw)
  To: qemu-devel

Fourth iteration of the prototype.  I'm not asking to consider a merge
at this time.

New:

* Split into a generic part (dt.c dt.h) and a device-dependent part
  (hw/pcdt.c).  The latter could be split up further; many of its
  devices aren't PC-specific, or could be made generic with a bit of effort.
  Arguably, the drivers are encapsulations of existing device
  implementations, and should live close to what they encapsulate.  I'm
  keeping all drivers in one place for now, just to avoid touching too
  many files.

  hw/pcdt.c also still contains the QEMUMachine init method (everything
  below /* Machine Driver */).  That stuff is temporary; see the
  shortcuts below.

* Some stylistic changes Anthony asked for.

Shortcuts:

* I didn't implement all the devices of the "pc" original.  The devices
  I implemented might not support all existing command line options.

* The configuration tree is simplistic.  I expect it to evolve, and I
  wouldn't exclude the possibility of wholesale replacement.

* The initial configuration tree is hardcoded in hw/pcdt.c.  It should
  be read from a configuration file.

* Optional stuff is inserted into the initial configuration tree in
  hardcoded places, in hw/pcdt.c.  We should use suitable markers in the
  configuration file instead, and do it in device-independent code
  outside hw/.

* I'm hiding completely behind the existing QEMUMachine init method
  interface, in hw/pcdt.c.  I guess we'll want to move that out.

* Linux gripes about ACPI, need to investigate.


 Makefile              |    1 +
 Makefile.target       |    4 +-
 dt.c                  |  445 ++++++++++++++++++++++++++++++++
 dt.h                  |  109 ++++++++
 hw/pc.c               |   47 ++--
 hw/pcdt.c             |  686 +++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pcint.h            |   46 ++++
 net.c                 |    2 +-
 net.h                 |    1 +
 target-i386/machine.c |    2 +
 tree.c                |  274 ++++++++++++++++++++
 tree.h                |   40 +++
 12 files changed, 1628 insertions(+), 29 deletions(-)


diff --git a/Makefile b/Makefile
index 4f7a55a..2198bba 100644
--- a/Makefile
+++ b/Makefile
@@ -85,6 +85,7 @@ OBJS+=sd.o ssi-sd.o
 OBJS+=bt.o bt-host.o bt-vhci.o bt-l2cap.o bt-sdp.o bt-hci.o bt-hid.o usb-bt.o
 OBJS+=buffered_file.o migration.o migration-tcp.o net.o qemu-sockets.o
 OBJS+=qemu-char.o aio.o net-checksum.o savevm.o cache-utils.o
+OBJS+=tree.o
 
 ifdef CONFIG_BRLAPI
 OBJS+= baum.o
diff --git a/Makefile.target b/Makefile.target
index 9e7a1bb..cabdaf4 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -505,6 +505,7 @@ OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o dma-helpers.o
 # need to fix this properly
 OBJS+=virtio.o virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o
 OBJS+=fw_cfg.o
+OBJS+=dt.o
 ifdef CONFIG_KVM
 OBJS+=kvm.o kvm-all.o
 endif
@@ -536,6 +537,7 @@ endif
 ifdef CONFIG_OSS
 LIBS += $(CONFIG_OSS_LIB)
 endif
+LIBS+= $(FDT_LIBS)
 
 SOUND_HW = sb16.o es1370.o ac97.o
 ifdef CONFIG_ADLIB
@@ -583,6 +585,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
+OBJS+= pcdt.o
 OBJS += device-hotplug.o pci-hotplug.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
@@ -604,7 +607,6 @@ OBJS+= pflash_cfi02.o ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
 OBJS+= ppc440.o ppc440_bamboo.o
 ifdef FDT_LIBS
 OBJS+= device_tree.o
-LIBS+= $(FDT_LIBS)
 endif
 ifdef CONFIG_KVM
 OBJS+= kvm_ppc.o
diff --git a/dt.c b/dt.c
new file mode 100644
index 0000000..631a81a
--- /dev/null
+++ b/dt.c
@@ -0,0 +1,445 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * Configure and build a machine from configuration data
+ *
+ * This is generic, device-independent code driven by device-dependent
+ * configuration data, talking to devices through an abstract device
+ * interface.
+ *
+ * The configuration data currently is hardwired to a fairly limited
+ * PC, registered as machine type "pcdt".  The nuts and bolts of PC
+ * emulation remain in pc.c, and that sharing makes the somewhat
+ * clumsy pcint.h necessary.  Having two PC machine types makes no
+ * sense in the long run, of course.  We want to replace pc.c
+ * eventually, and also convert other machine types to this mechanism.
+ */
+
+#include <assert.h>
+#include "block.h"
+#include "cpu.h"
+#include "dt.h"
+#include "net.h"
+#include "tree.h"
+#include "sysemu.h"
+
+#ifdef HAVE_FDT
+#include <libfdt.h>
+#endif
+
+/* Forward declarations */
+static void dt_parse_prop(dt_device *dev, tree_prop *prop);
+static void dt_fdt_test(tree *conf);
+
+\f
+/* Host Configuration */
+
+void dt_attach_nic(dt_host *host, int index, tree *nic, VLANState *vlan)
+{
+    host->nic[index] = nic;
+    host->nic_vlan[index] = vlan;
+}
+
+VLANState *dt_find_vlan(tree *conf, dt_host *host)
+{
+    int i;
+
+    for (i = 0; i < MAX_NICS; i++) {
+        if (host->nic[i] == conf)
+            return host->nic_vlan[i];
+    }
+    return NULL;
+}
+
+void dt_attach_drive(dt_host *host, int index,
+                tree *controller, BlockDriverState *state)
+{
+    host->drive_ctrl[index] = controller;
+    host->drive_state[index] = state;
+}
+
+void dt_drive_config(tree *conf, dt_host *host,
+                BlockDriverState *drive[], int n)
+{
+    int i, j;
+
+    j = 0;
+    for (i = 0; i < MAX_DRIVES; i++) {
+        if (host->drive_ctrl[i] != conf)
+            continue;
+        assert(j < n);
+        drive[j++] = host->drive_state[i];
+    }
+}
+
+static void dt_print_host_config(dt_host *host)
+{
+    char buf[1024];
+    int i;
+
+    for (i = 0; i < MAX_NICS; i++) {
+        if (!host->nic[i])
+            continue;
+        tree_path(host->nic[i], buf, sizeof(buf));
+        printf("nic#%d\tvlan %-4d\t%s\n",
+               i, host->nic_vlan[i]->id, buf);
+    }
+
+    for (i = 0; i < MAX_DRIVES; i++) {
+        if (!host->drive_ctrl[i])
+            continue;
+        tree_path(host->drive_ctrl[i], buf, sizeof(buf));
+        printf("drive#%d\t%-15s %s\n",
+               i, bdrv_get_device_name(host->drive_state[i]), buf);
+    }
+}
+
+\f
+/* Device Interface */
+
+static dt_driver *dt_driver_by_name(const char *name, dt_driver drvtab[])
+{
+    int i;
+
+    for (i = 0; drvtab[i].name; i++) {
+        if (!strcmp(name, drvtab[i].name))
+            return &drvtab[i];
+    }
+    return NULL;
+}
+
+dt_device *dt_device_of(tree *conf)
+{
+    return tree_get_user(conf);
+}
+
+dt_device *dt_parent_device(dt_device *dev)
+{
+    tree *p = tree_parent(dev->conf);
+
+    return p ? dt_device_of(p) : NULL;
+}
+
+static dt_device *dt_new_device(tree *conf, dt_driver *drv)
+{
+    dt_device *dev;
+    tree_prop *prop;
+
+    dev = qemu_malloc(sizeof(*dev));
+    dev->conf = conf;
+    dev->drv = drv;
+    LIST_INIT(&dev->reqs);
+    dev->visit = 0;
+    dev->priv = qemu_malloc(drv->privsz);
+    tree_put_user(conf, dev);
+
+    TREE_FOREACH_PROP(prop, conf)
+        dt_parse_prop(dev, prop);
+
+    return dev;
+}
+
+static void dt_config(tree *conf, dt_host *host, dt_driver drvtab[])
+{
+    dt_driver *drv;
+    dt_device *dev;
+    tree *kid;
+
+    drv = dt_driver_by_name(tree_node_name(conf), drvtab);
+    if (!drv) {
+        fprintf(stderr, "No driver for device %s\n",
+                tree_node_name(conf));
+        exit(1);
+    }
+    dev = dt_new_device(conf, drv);
+    if (drv->config) {
+        if (drv->config(dev, host))
+            return;
+    }
+
+    TREE_FOREACH_KID(kid, conf)
+        dt_config(kid, host, drvtab);
+}
+
+tree *dt_require_named(dt_device *dev, const char *reqname)
+{
+    dt_tree_list *l = qemu_malloc(sizeof(*l));
+
+    l->conf = tree_node_by_name(dev->conf, reqname);
+    LIST_INSERT_HEAD(&dev->reqs, l, link);
+    return l->conf;
+}
+
+static void dt_do_visit(dt_device *dev,
+                        void (*fun)(dt_device *, void *arg),
+                        void *arg, int visit)
+{
+    dt_device *parent, *req, *kid;
+    dt_tree_list *l;
+    tree *k;
+
+    assert(dev->visit < visit - 1);
+    dev->visit = visit - 1;
+    parent = dt_parent_device(dev);
+    if (parent && parent->visit < visit)
+        dt_do_visit(parent, fun, arg, visit);
+    LIST_FOREACH(l, &dev->reqs, link) {
+        req = dt_device_of(l->conf);
+        if (req->visit < visit)
+            dt_do_visit(req, fun, arg, visit);
+    }
+    dev->visit = visit;
+    fun(dev, arg);
+    TREE_FOREACH_KID(k, dev->conf) {
+        kid = dt_device_of(k);
+        if (kid->visit < visit - 1)
+            dt_do_visit(kid, fun, arg, visit);
+    }
+}
+
+static void dt_visit(tree *node,
+                     void (*fun)(dt_device *, void *arg),
+                     void *arg)
+{
+    static int visit;
+
+    visit += 2;
+    dt_do_visit(dt_device_of(node), fun, arg, visit);
+}
+
+static void dt_init_visitor(dt_device *dev, void *arg)
+{
+    if (dev->drv->init)
+        dev->drv->init(dev);
+}
+
+static void dt_init(tree *conf)
+{
+    dt_visit(conf, dt_init_visitor, NULL);
+}
+
+static void dt_start(tree *conf)
+{
+    dt_device *dev = dt_device_of(conf);
+    tree *kid;
+
+    if (dev && dev->drv->start)
+        dev->drv->start(dev);
+
+    TREE_FOREACH_KID(kid, conf)
+        dt_start(kid);
+}
+
+void dt_create_machine(tree *conf, dt_host *host, dt_driver drvtab[])
+{
+    dt_config(conf, host, drvtab);
+    tree_print(conf);
+    dt_print_host_config(host);
+    dt_fdt_test(conf);
+    dt_init(conf);
+    dt_start(conf);
+}
+
+\f
+/* Device properties */
+
+static dt_prop_spec *dt_prop_spec_by_name(dt_driver *drv, const char *name)
+{
+    dt_prop_spec *spec;
+
+    for (spec = drv->prop_spec; spec && spec->name; spec++) {
+        if (!strcmp(spec->name, name))
+            return spec;
+    }
+    return NULL;
+}
+
+static void dt_parse_prop(dt_device *dev, tree_prop *prop)
+{
+    const char *name = tree_prop_name(prop);
+    size_t size;
+    const char *val = tree_prop_value(prop, &size);
+    dt_prop_spec *spec = dt_prop_spec_by_name(dev->drv, name);
+
+    if (!spec) {
+        fprintf(stderr, "A %s device has no property %s\n",
+                dev->drv->name, name);
+        exit(1);
+    }
+
+    if (memchr(val, 0, size) != val + size - 1
+        || spec->parse((char *)dev->priv + spec->offs, val, spec) < 0) {
+        fprintf(stderr, "Bad value %.*s for property %s of device %s\n",
+                (int)size, val, name, dev->drv->name);
+        exit(1);
+    }
+}
+
+int dt_parse_string(void *dst, const char *src, dt_prop_spec *spec)
+{
+    assert(spec->size == sizeof(char *));
+    *(const char **)dst = src;
+    return 0;
+}
+
+int dt_parse_int(void *dst, const char *src, dt_prop_spec *spec)
+{
+    char *ep;
+    long val;
+
+    assert(spec->size == sizeof(int));
+    errno = 0;
+    val = strtol(src, &ep, 0);
+    if (*ep || ep == src || errno || (int)val != val)
+        return -1;
+    *(int *)dst = val;
+    return 0;
+}
+
+int dt_parse_ram_addr_t(void *dst, const char *src, dt_prop_spec *spec)
+{
+    char *ep;
+    unsigned long val;
+
+    assert(spec->size == sizeof(ram_addr_t));
+    errno = 0;
+    val = strtoul(src, &ep, 0);
+    if (*ep || ep == src || errno || (ram_addr_t)val != val)
+        return -1;
+    *(ram_addr_t *)dst = val;
+    return 0;
+}
+
+int dt_parse_macaddr(void *dst, const char *src, dt_prop_spec *spec)
+{
+    assert(spec->size == 6);
+    if (parse_macaddr(dst, src) < 0)
+        return -1;
+    return 0;
+}
+
+\f
+/* Interfacing with FDT */
+
+/*
+ * Note: translation to FDT loses the association between
+ * configuration tree nodes and devices.
+ */
+
+#ifdef HAVE_FDT
+
+static int dt_fdt_chk(int res);
+static void dt_subtree_to_fdt(const tree *conf, void *fdt);
+
+static void *dt_tree_to_fdt(const tree *conf)
+{
+    int sz = 1024 * 1024;       /* FIXME arbitrary limit */
+    void *fdt = qemu_malloc(sz);
+
+    dt_fdt_chk(fdt_create(fdt, sz));
+    dt_subtree_to_fdt(conf, fdt);
+    dt_fdt_chk(fdt_finish(fdt));
+    return fdt;
+}
+
+static void dt_subtree_to_fdt(const tree *conf, void *fdt)
+{
+    tree_prop *prop;
+    tree *kid;
+    const void *pv;
+    size_t sz;
+
+    dt_fdt_chk(fdt_begin_node(fdt, tree_node_name(conf)));
+    TREE_FOREACH_PROP(prop, conf) {
+        pv = tree_prop_value(prop, &sz);
+        dt_fdt_chk(fdt_property(fdt, tree_prop_name(prop), pv, sz));
+    }
+    TREE_FOREACH_KID(kid, conf)
+        dt_subtree_to_fdt(kid, fdt);
+    dt_fdt_chk(fdt_end_node(fdt));
+}
+
+static tree *dt_fdt_to_tree(const void *fdt)
+{
+    int offs, next, depth;
+    uint32_t tag;
+    struct fdt_property *prop;
+    tree *stack[32];            /* FIXME arbitrary limit */
+
+    stack[0] = NULL;            /* "parent" of root */
+    next = depth = 0;
+
+    for (;;) {
+        offs = next;
+        tag = fdt_next_tag(fdt, offs, &next);
+        switch (tag) {
+        case FDT_PROP:
+            /*
+             * libfdt apparently doesn't provide a way to get property
+             * by offset, do it by hand
+             */
+            assert(0 < depth && depth < sizeof(stack) / sizeof(*stack));
+            prop = (void *)((const char *)fdt + fdt_off_dt_struct(fdt) + offs);
+            tree_put_prop(stack[depth],
+                          fdt_string(fdt, fdt32_to_cpu(prop->nameoff)),
+                          prop->data,
+                          fdt32_to_cpu(prop->len));
+        case FDT_NOP:
+            break;
+        case FDT_BEGIN_NODE:
+            depth++;
+            assert(0 < depth && depth < sizeof(stack) / sizeof(*stack));
+            stack[depth] = tree_new_kid(stack[depth-1],
+                                        fdt_get_name(fdt, offs, NULL),
+                                        NULL);
+            break;
+        case FDT_END_NODE:
+            depth--;
+            break;
+        case FDT_END:
+            dt_fdt_chk(next);
+            return stack[1];
+        }
+    }
+}
+
+static int dt_fdt_chk(int res)
+{
+    if (res < 0) {
+        fprintf(stderr, "%s\n", fdt_strerror(res)); /* FIXME cryptic */
+        exit(1);
+    }
+    return res;
+}
+
+static void dt_fdt_test(tree *conf)
+{
+    void *fdt;
+
+    fdt = dt_tree_to_fdt(conf);
+    conf = dt_fdt_to_tree(fdt);
+    tree_print(conf);
+    qemu_free(fdt);
+}
+#else
+static void dt_fdt_test(tree *conf) { }
+#endif
diff --git a/dt.h b/dt.h
new file mode 100644
index 0000000..8f7ea8a
--- /dev/null
+++ b/dt.h
@@ -0,0 +1,109 @@
+#ifndef DT_H
+#define DT_H
+
+#include "sysemu.h"
+#include "net.h"
+#include "tree.h"
+
+typedef struct dt_host dt_host;
+typedef struct dt_device dt_device;
+typedef struct dt_tree_list dt_tree_list;
+typedef struct dt_driver dt_driver;
+typedef struct dt_prop_spec dt_prop_spec;
+
+
+/* Host Configuration */
+
+struct dt_host {
+    /* connection NICs <-> VLAN */
+    tree *nic[MAX_NICS];
+    VLANState *nic_vlan[MAX_NICS];
+    /* connection drives <-> controller */
+    tree *drive_ctrl[MAX_DRIVES];
+    BlockDriverState *drive_state[MAX_DRIVES];
+};
+
+void dt_attach_nic(dt_host *host, int index, tree *nic, VLANState *vlan);
+VLANState *dt_find_vlan(tree *conf, dt_host *host);
+void dt_attach_drive(dt_host *host, int index,
+                     tree *controller, BlockDriverState *state);
+void dt_drive_config(tree *conf, dt_host *host,
+                     BlockDriverState *drive[], int n);
+
+
+/* Device Interface */
+
+/*
+ * Device life cycle:
+ *
+ * 1. Configuration: config() method runs after parent's.  Except kids
+ * are skipped when the parent's config() returns non-zero.  config()
+ * should initialize the device's private data from its configuration
+ * sub-tree.  It may edit the configuration sub-tree, and may declare
+ * initialization ordering constraints with dt_require_named().
+ *
+ * 2. Initialization: init() method runs after parent's and after that
+ * of devices declared required by config().  It should not touch the
+ * configuration tree.
+ *
+ * 3. Start: start() method runs, order is unspecified.
+ *
+ * Error handling in these driver methods: print to stderr and exit
+ * the program unsuccessfully.
+ *
+ * There is no device shutdown protocol yet.
+ */
+
+struct dt_device {
+    tree *conf;                 /* configuration sub-tree */
+    dt_driver *drv;             /* device driver */
+    LIST_HEAD(, dt_tree_list) reqs; /* required devices */
+    int visit;                  /* for dt_visit() */
+    void *priv;                 /* device private data */
+};
+
+struct dt_tree_list {
+    tree *conf;
+    LIST_ENTRY(dt_tree_list) link;
+};
+
+struct dt_driver {
+    const char *name;
+    size_t privsz;              /* size of device private data */
+    dt_prop_spec *prop_spec;    /* recognized conf node properties */
+    int (*config)(dt_device *, dt_host *);
+    void (*init)(dt_device *);
+    void (*start)(dt_device *);
+};
+
+dt_device *dt_device_of(tree *conf);
+dt_device *dt_parent_device(dt_device *dev);
+tree *dt_require_named(dt_device *dev, const char *reqname);
+void dt_create_machine(tree *conf, dt_host *host, dt_driver drvtab[]);
+
+
+/* Device properties */
+
+/*
+ * This is for parsing configuration tree node properties into device
+ * private data.
+ */
+
+struct dt_prop_spec {
+    const char *name;
+    ptrdiff_t offs;             /* offset in device private data */
+    size_t size;                /* size there, for sanity checking */
+    int (*parse)(void *, const char *, dt_prop_spec *);
+};
+
+#define DT_PROP_SPEC_INIT(name, strty, member, fmt)                     \
+    { name, offsetof(strty, member), sizeof(((strty *)0)->member),      \
+      dt_parse_##fmt }
+
+/* Canned property parse methods */
+int dt_parse_string(void *dst, const char *src, dt_prop_spec *spec);
+int dt_parse_int(void *dst, const char *src, dt_prop_spec *spec);
+int dt_parse_ram_addr_t(void *dst, const char *src, dt_prop_spec *spec);
+int dt_parse_macaddr(void *dst, const char *src, dt_prop_spec *spec);
+
+#endif
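
The three-phase protocol described in the dt.h comment above can be
sketched in isolation.  This is not the code from dt.c, only a minimal
model of the ordering rules: config() runs parent first and skips the
kids when the parent returns non-zero, and init() runs parent before
kids.  Every demo_* name is hypothetical.  Two points are assumptions
rather than something the header states: that kids skipped during
configuration are also skipped during initialization, and that
requirement ordering (dt_require_named()) can be left out of the model.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_KIDS 4
#define DEMO_MAX_LOG 16

typedef struct demo_node demo_node;

struct demo_node {
    const char *name;
    int config_result;              /* value this node's config() returns */
    demo_node *kids[DEMO_MAX_KIDS]; /* unused slots are NULL */
    int configured;                 /* set once the config phase visits it */
};

/* Records the order in which nodes were visited */
typedef struct demo_log {
    const char *entry[DEMO_MAX_LOG];
    int n;
} demo_log;

static void demo_log_add(demo_log *log, const char *name)
{
    assert(log->n < DEMO_MAX_LOG);
    log->entry[log->n++] = name;
}

/* Phase 1: parent first; kids are skipped on a non-zero result */
static void demo_config(demo_node *node, demo_log *log)
{
    int i;

    node->configured = 1;
    demo_log_add(log, node->name);
    if (node->config_result != 0)
        return;
    for (i = 0; i < DEMO_MAX_KIDS && node->kids[i]; i++)
        demo_config(node->kids[i], log);
}

/* Phase 2: parent first; only nodes that were configured (assumption) */
static void demo_init(demo_node *node, demo_log *log)
{
    int i;

    if (!node->configured)
        return;
    demo_log_add(log, node->name);
    for (i = 0; i < DEMO_MAX_KIDS && node->kids[i]; i++)
        demo_init(node->kids[i], log);
}
```

With a tree pci -> { piix3 -> { hidden }, nic } where piix3's config
returns 1, both phases visit pci, piix3, nic in that order and never
touch hidden.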
diff --git a/hw/pc.c b/hw/pc.c
index 57ba803..107afb7 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -37,41 +37,34 @@
 #include "virtio-balloon.h"
 #include "virtio-console.h"
 #include "hpet_emul.h"
+#include "pcint.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
 
-#define BIOS_FILENAME "bios.bin"
-#define VGABIOS_FILENAME "vgabios.bin"
-#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
-
-#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
-
 /* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables.  */
 #define ACPI_DATA_SIZE       0x10000
 #define BIOS_CFG_IOPORT 0x510
 
-#define MAX_IDE_BUS 2
-
-static fdctrl_t *floppy_controller;
-static RTCState *rtc_state;
+fdctrl_t *floppy_controller;
+RTCState *rtc_state;
 static PITState *pit;
 static IOAPICState *ioapic;
-static PCIDevice *i440fx_state;
+PCIDevice *i440fx_state;
 
-static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
 {
 }
 
 /* MSDOS compatibility mode FPU exception support */
-static qemu_irq ferr_irq;
+qemu_irq ferr_irq;
 /* XXX: add IGNNE support */
 void cpu_set_ferr(CPUX86State *s)
 {
     qemu_irq_raise(ferr_irq);
 }
 
-static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
 {
     qemu_irq_lower(ferr_irq);
 }
@@ -120,7 +113,7 @@ int cpu_get_pic_interrupt(CPUState *env)
     return intno;
 }
 
-static void pic_irq_request(void *opaque, int irq, int level)
+void pic_irq_request(void *opaque, int irq, int level)
 {
     CPUState *env = first_cpu;
 
@@ -166,7 +159,7 @@ static int cmos_get_fd_drive_type(int fd0)
     return val;
 }
 
-static void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
 {
     RTCState *s = rtc_state;
     int cylinders, heads, sectors;
@@ -202,7 +195,7 @@ static int boot_device2nibble(char boot_device)
 
 /* copy/pasted from cmos_init, should be made a general function
  and used there as well */
-static int pc_boot_set(void *opaque, const char *boot_device)
+int pc_boot_set(void *opaque, const char *boot_device)
 {
 #define PC_MAX_BOOT_DEVICES 3
     RTCState *s = (RTCState *)opaque;
@@ -228,8 +221,8 @@ static int pc_boot_set(void *opaque, const char *boot_device)
 }
 
 /* hd_table must contain 4 block drivers */
-static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
-                      const char *boot_device, BlockDriverState **hd_table)
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table)
 {
     RTCState *s = rtc_state;
     int nbds, bds[3] = { 0, };
@@ -362,13 +355,13 @@ int ioport_get_a20(void)
     return ((first_cpu->a20_mask >> 20) & 1);
 }
 
-static void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
 {
     ioport_set_a20((val >> 1) & 1);
     /* XXX: bit 0 is fast reset */
 }
 
-static uint32_t ioport92_read(void *opaque, uint32_t addr)
+uint32_t ioport92_read(void *opaque, uint32_t addr)
 {
     return ioport_get_a20() << 1;
 }
@@ -420,7 +413,7 @@ static void bochs_bios_write(void *opaque, uint32_t addr, uint32_t val)
     }
 }
 
-static void bochs_bios_init(void)
+void bochs_bios_init(void)
 {
     void *fw_cfg;
 
@@ -687,7 +680,7 @@ static void load_linux(uint8_t *option_rom,
     generate_bootsect(option_rom, gpr, seg, 0);
 }
 
-static void main_cpu_reset(void *opaque)
+void main_cpu_reset(void *opaque)
 {
     CPUState *env = opaque;
     cpu_reset(env);
@@ -702,11 +695,11 @@ static const int ide_irq[2] = { 14, 15 };
 static int ne2000_io[NE2000_NB_MAX] = { 0x300, 0x320, 0x340, 0x360, 0x280, 0x380 };
 static int ne2000_irq[NE2000_NB_MAX] = { 9, 10, 11, 3, 4, 5 };
 
-static int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
-static int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
+int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
+int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
 
-static int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
-static int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
+int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
+int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
 
 #ifdef HAS_AUDIO
 static void audio_init (PCIBus *pci_bus, qemu_irq *pic)
diff --git a/hw/pcdt.c b/hw/pcdt.c
new file mode 100644
index 0000000..95bc698
--- /dev/null
+++ b/hw/pcdt.c
@@ -0,0 +1,686 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * This is a PC configured and built using the new dt_ infrastructure.
+ * Having two PC machine types makes no sense in the long run, of
+ * course.  We want to replace pc.c eventually, and also convert other
+ * machine types to this infrastructure.
+ *
+ * The configuration data currently is hardwired, and fairly limited.
+ *
+ * The nuts and bolts of PC emulation remain in pc.c for now, and
+ * using the stuff there makes the somewhat clumsy pcint.h necessary.
+ */
+
+#include <assert.h>
+#include "hw.h"
+#include "pc.h"
+#include "fdc.h"
+#include "pci.h"
+#include "block.h"
+#include "sysemu.h"
+#include "audio/audio.h"
+#include "net.h"
+#include "smbus.h"
+#include "boards.h"
+#include "console.h"
+#include "fw_cfg.h"
+#include "virtio-blk.h"
+#include "virtio-balloon.h"
+#include "virtio-console.h"
+#include "hpet_emul.h"
+#include "pcint.h"
+#include "dt.h"
+
+static BlockDriverState **dt_piix3_hd(tree *piix3);
+
+/* CPUs Driver */
+
+typedef struct dt_device_cpus {
+    const char *model;
+    int num;
+} dt_device_cpus;
+
+static dt_prop_spec dt_cpus_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
+    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
+};
+
+static void dt_cpus_init(dt_device *dev)
+{
+    dt_device_cpus *priv = dev->priv;
+    int i;
+    CPUState *env;
+
+    for(i = 0; i < priv->num; i++) {
+        env = cpu_init(priv->model);
+        if (!env) {
+            fprintf(stderr, "Unable to find x86 CPU definition\n");
+            exit(1);
+        }
+        if (i != 0)
+            env->halted = 1;
+        qemu_register_reset(main_cpu_reset, env);
+    }
+}
+
+
+/* Memory Ranges */
+
+typedef struct dt_device_memrng {
+    target_phys_addr_t phys_addr;
+    ram_addr_t size;
+    ram_addr_t host_offs;
+    ram_addr_t flags;
+} dt_device_memrng;
+
+static void dt_memrng(dt_device_memrng *rng,
+                      target_phys_addr_t phys_addr, ram_addr_t size,
+                      ram_addr_t host_offs, ram_addr_t flags)
+{
+    rng->phys_addr = phys_addr;
+    rng->size = size;
+    rng->host_offs = host_offs;
+    rng->flags = flags;
+}
+
+static void dt_memrng_ram(dt_device_memrng *rng,
+                          target_phys_addr_t phys_addr, ram_addr_t size)
+{
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), 0);
+}
+
+static void dt_memrng_rom(dt_device_memrng *rng,
+                          target_phys_addr_t phys_addr, ram_addr_t maxsz,
+                          const char *dir, const char *image, int top)
+{
+    char buf[1024];
+    int size;
+
+    snprintf(buf, sizeof(buf), "%s/%s", dir, image);
+    size = get_image_size(buf);
+    if (size < 0 || size > maxsz)
+        goto error;
+    if (top)
+        phys_addr = phys_addr + maxsz - size;
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), IO_MEM_ROM);
+    if (load_image(buf, phys_ram_base + rng->host_offs) != size)
+        goto error;
+    return;
+
+error:
+    fprintf(stderr, "qemu: could not load image '%s'\n", buf);
+    exit(1);
+}
+
+static void dt_memrng_init(dt_device_memrng *rng, int n)
+{
+    int i;
+
+    for (i = 0; i < n; i++)
+        cpu_register_physical_memory(rng[i].phys_addr, rng[i].size,
+                                     rng[i].host_offs | rng[i].flags);
+}
+
+
+/* Memory Driver */
+
+typedef struct dt_device_memory {
+    ram_addr_t ram_size;
+    dt_device_memrng *rng;
+    int nrng;
+    /* TODO want a real memory map here */
+    ram_addr_t below_4g, above_4g;
+} dt_device_memory;
+
+static dt_prop_spec dt_memory_props[] = {
+    DT_PROP_SPEC_INIT("ram", dt_device_memory, ram_size, ram_addr_t),
+};
+
+static int dt_memory_config(dt_device *dev, dt_host *host)
+{
+    /* TODO memory map hardcoded; get it from dev->conf instead */
+    dt_device_memory *priv = dev->priv;
+    dt_device_memrng *rng = qemu_malloc(sizeof(*rng) * 4);
+
+    if (priv->ram_size >= 0xe0000000 ) {
+        priv->above_4g = priv->ram_size - 0xe0000000;
+        priv->below_4g = 0xe0000000;
+    } else {
+        priv->below_4g = priv->ram_size;
+        priv->above_4g = 0;
+    }
+
+    dt_memrng_ram(&rng[0], 0, 0xa0000);
+    qemu_ram_alloc(0x60000);
+    dt_memrng_ram(&rng[1], 0x100000, priv->below_4g - 0x100000);
+    if (priv->above_4g)
+        abort();                /* TODO */
+    dt_memrng_rom(&rng[2], 0xe0000000, 0x20000000,
+                  bios_dir, BIOS_FILENAME, 1);
+                                /* TODO get name from dev->conf */
+    dt_memrng(&rng[3], 0xe0000, 0x20000,
+              rng[2].host_offs + rng[2].size - 0x20000, IO_MEM_ROM);
+    /* TODO option ROMs */
+
+    priv->rng = rng;
+    priv->nrng = 4;
+    return 0;
+}
+
+static void dt_memory_init(dt_device *dev)
+{
+    dt_device_memory *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, priv->nrng);
+    bochs_bios_init();
+}
+
+static ram_addr_t dt_memory_below_4g(tree *memory)
+{
+    dt_device *dev = dt_device_of(memory);
+    dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->below_4g;
+}
+
+static ram_addr_t dt_memory_above_4g(tree *memory)
+{
+    dt_device *dev = dt_device_of(memory);
+    dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->above_4g;
+}
+
+
+/* PC Miscellaneous Driver */
+
+/*
+ * This is a driver for a whole collection of devices.  Could be
+ * picked apart into separate drivers, I guess.
+ */
+
+typedef struct dt_device_pc_misc {
+    const char *boot_device;
+    int apic;
+    int hpet;
+    qemu_irq *i8259;
+    BlockDriverState *fd[MAX_FD];
+} dt_device_pc_misc;
+
+static dt_prop_spec dt_pc_misc_props[] = {
+    DT_PROP_SPEC_INIT("boot-device", dt_device_pc_misc, boot_device,
+                      string),
+};
+
+static int dt_pc_misc_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pc_misc *priv = dev->priv;
+
+    priv->apic = 1;
+    priv->hpet = 1;
+    priv->i8259 = NULL;
+    dt_drive_config(dev->conf, host,
+                    priv->fd, sizeof(priv->fd) / sizeof(*priv->fd));
+    return 1;
+}
+
+static void dt_pc_misc_init(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    CPUState *env;
+    qemu_irq *cpu_irq;
+    IOAPICState *ioapic;
+    PITState *pit;
+    int i;
+
+    if (priv->apic) {
+        for (env = first_cpu; env; env = env->next_cpu) {
+            env->cpuid_features |= CPUID_APIC;
+            apic_init(env);
+        }
+    }
+
+    vmport_init();
+
+    cpu_irq = qemu_allocate_irqs(pic_irq_request, NULL, 1);
+    priv->i8259 = i8259_init(cpu_irq[0]);
+    ferr_irq = priv->i8259[13];
+
+    register_ioport_write(0x80, 1, 1, ioport80_write, NULL);
+    register_ioport_write(0xf0, 1, 1, ioportF0_write, NULL);
+
+    rtc_state = rtc_init(0x70, priv->i8259[8], 2000);
+    qemu_register_boot_set(pc_boot_set, rtc_state);
+
+    register_ioport_read(0x92, 1, 1, ioport92_read, NULL);
+    register_ioport_write(0x92, 1, 1, ioport92_write, NULL);
+
+    if (priv->apic) {
+        ioapic = ioapic_init();
+        pic_set_alt_irq_func(isa_pic, ioapic_set_irq, ioapic);
+    }
+
+    pit = pit_init(0x40, priv->i8259[0]);
+    pcspk_init(pit);
+    if (priv->hpet)
+        hpet_init(priv->i8259);
+
+    for(i = 0; i < MAX_SERIAL_PORTS; i++) {
+        if (serial_hds[i]) {
+            serial_init(serial_io[i], priv->i8259[serial_irq[i]], 115200,
+                        serial_hds[i]);
+        }
+    }
+
+    for(i = 0; i < MAX_PARALLEL_PORTS; i++) {
+        if (parallel_hds[i]) {
+            parallel_init(parallel_io[i], priv->i8259[parallel_irq[i]],
+                          parallel_hds[i]);
+        }
+    }
+
+    i8042_init(priv->i8259[1], priv->i8259[12], 0x60);
+    DMA_init(0);
+
+    floppy_controller = fdctrl_init(priv->i8259[6], 2, 0, 0x3f0, priv->fd);
+}
+
+static void dt_pc_misc_start(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    tree *memory = tree_node_by_name(dev->conf, "/memory");
+    tree *piix3 = tree_node_by_name(dev->conf, "/pci/piix3");
+
+    cmos_init(dt_memory_below_4g(memory),
+              dt_memory_above_4g(memory),
+              priv->boot_device,
+              dt_piix3_hd(piix3));
+}
+
+static qemu_irq *dt_pc_misc_i8259(tree *pc_misc)
+{
+    dt_device *dev = dt_device_of(pc_misc);
+    dt_device_pc_misc *priv = dev->priv;
+    assert(dev->drv->init == dt_pc_misc_init);
+    return priv->i8259;
+}
+
+
+/* PCI Bus Driver */
+
+typedef struct dt_device_pci {
+    PCIBus *bus;
+    tree *pc;
+} dt_device_pci;
+
+static int dt_pci_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->bus = NULL;
+    priv->pc = dt_require_named(dev, "/pc-misc");
+    return 0;
+}
+
+static void dt_pci_init(dt_device *dev)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->bus = i440fx_init(&i440fx_state, dt_pc_misc_i8259(priv->pc));
+}
+
+static void dt_pci_start(dt_device *dev)
+{
+    i440fx_init_memory_mappings(i440fx_state);
+}
+
+static void dt_must_be_on_pcibus(dt_device *dev)
+{
+    dt_device *bus = dt_parent_device(dev);
+
+    if (bus->drv->init != dt_pci_init) {
+        fprintf(stderr, "Device %s must be on a PCI bus\n", dev->drv->name);
+        exit(1);
+    }
+}
+
+static PCIBus *dt_get_pcibus(dt_device *dev)
+{
+    dt_device *bus = dt_parent_device(dev);
+
+    assert(bus->drv->init == dt_pci_init);
+    return ((dt_device_pci *)bus->priv)->bus;
+}
+
+
+/* PIIX3 Driver */
+
+typedef struct dt_device_piix3 {
+    int devfn;
+    int acpi;
+    int usb;
+    tree *pc;
+    BlockDriverState *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
+} dt_device_piix3;
+
+static int dt_piix3_config(dt_device *dev, dt_host *host)
+{
+    dt_device_piix3 *priv = dev->priv;
+
+    priv->devfn = -1;
+    priv->acpi = 1;
+    priv->usb = 1;
+    priv->pc = dt_require_named(dev, "/pc-misc");
+    dt_drive_config(dev->conf, host,
+                    priv->hd, sizeof(priv->hd) / sizeof(*priv->hd));
+    dt_must_be_on_pcibus(dev);
+    return 1;
+}
+
+static void dt_piix3_init(dt_device *dev)
+{
+    dt_device_piix3 *priv = dev->priv;
+    PCIBus *pci_bus = dt_get_pcibus(dev);
+    qemu_irq *i8259 = dt_pc_misc_i8259(priv->pc);
+    int i;
+
+    priv->devfn = piix3_init(pci_bus, priv->devfn);
+
+    pci_piix3_ide_init(pci_bus, priv->hd, priv->devfn + 1, i8259);
+
+    if (priv->usb)
+        usb_uhci_piix3_init(pci_bus, priv->devfn + 2);
+
+    if (priv->acpi) {
+        uint8_t *eeprom_buf = qemu_mallocz(8 * 256); /* XXX: make this persistent */
+        i2c_bus *smbus;
+
+        /* TODO: Populate SPD eeprom data.  */
+        smbus = piix4_pm_init(pci_bus, priv->devfn + 3, 0xb100, i8259[9]);
+        for (i = 0; i < 8; i++)
+            smbus_eeprom_device_init(smbus, 0x50 + i, eeprom_buf + (i * 256));
+    }
+}
+
+static BlockDriverState **dt_piix3_hd(tree *piix3)
+{
+    dt_device *dev = dt_device_of(piix3);
+    dt_device_piix3 *priv = dev->priv;
+
+    assert(dev->drv->init == dt_piix3_init);
+    return priv->hd;
+}
+
+
+/* VGA Driver */
+
+typedef struct dt_driver_vga {
+    const char *model;
+    const char *bios;
+    void (*init)(PCIBus *, uint8_t *, ram_addr_t, int);
+} dt_driver_vga;
+
+static void pci_vmsvga_init_(PCIBus *bus, uint8_t *vga_ram_base,
+                             ram_addr_t vga_ram_offset, int vga_ram_size)
+{
+    pci_vmsvga_init(bus, vga_ram_base, vga_ram_offset, vga_ram_size);
+}
+
+static void pci_vga_init_(PCIBus *bus, uint8_t *vga_ram_base,
+                          ram_addr_t vga_ram_offset, int vga_ram_size)
+{
+    pci_vga_init(bus, vga_ram_base, vga_ram_offset, vga_ram_size, 0, 0);
+}
+
+static dt_driver_vga dt_driver_vga_table[] = {
+    { "cirrus", VGABIOS_CIRRUS_FILENAME, pci_cirrus_vga_init },
+    { "vms", VGABIOS_FILENAME, pci_vmsvga_init_ },
+    { "std", VGABIOS_FILENAME, pci_vga_init_ },
+    { NULL, NULL, NULL }
+};
+
+typedef struct dt_device_vga {
+    const char *model;
+    ram_addr_t ram_size;
+    dt_device_memrng rng[1];
+    ram_addr_t ram_offs;
+    dt_driver_vga *vga_drv;
+} dt_device_vga;
+
+static dt_prop_spec dt_vga_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_vga, model, string),
+    DT_PROP_SPEC_INIT("ram", dt_device_vga, ram_size, ram_addr_t),
+};
+
+static int dt_vga_config(dt_device *dev, dt_host *host)
+{
+    dt_device_vga *priv = dev->priv;
+    int i;
+
+    dt_memrng_rom(&priv->rng[0], 0xc0000, 0x10000,
+                  bios_dir, VGABIOS_CIRRUS_FILENAME, 0);
+                                /* TODO get name from dev->conf */
+    priv->ram_offs = qemu_ram_alloc(priv->ram_size);
+
+    for (i = 0; dt_driver_vga_table[i].model; i++) {
+        if (!strcmp(dt_driver_vga_table[i].model, priv->model))
+            break;
+    }
+    if (!dt_driver_vga_table[i].model) {
+        fprintf(stderr, "Unknown VGA model %s\n", priv->model);
+        exit(1);
+    }
+    priv->vga_drv = &dt_driver_vga_table[i];
+    dt_must_be_on_pcibus(dev);
+    return 0;
+}
+
+static void dt_vga_init(dt_device *dev)
+{
+    dt_device_vga *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, 1);
+    priv->vga_drv->init(dt_get_pcibus(dev),
+                        phys_ram_base + priv->ram_offs,
+                        priv->ram_offs, priv->ram_size);
+}
+
+
+/* NIC Driver */
+
+typedef struct dt_device_nic {
+    NICInfo nd;
+} dt_device_nic;
+
+static dt_prop_spec dt_nic_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_nic, nd.model, string),
+    DT_PROP_SPEC_INIT("mac", dt_device_nic, nd.macaddr, macaddr),
+    DT_PROP_SPEC_INIT("name", dt_device_nic, nd.name, string),
+};
+
+static int dt_nic_config(dt_device *dev, dt_host *host)
+{
+    dt_device_nic *priv = dev->priv;
+
+    priv->nd.vlan = dt_find_vlan(dev->conf, host);
+    dt_must_be_on_pcibus(dev);
+    return 0;
+}
+
+static void dt_nic_init(dt_device *dev)
+{
+    dt_device_nic *priv = dev->priv;
+
+    pci_nic_init(dt_get_pcibus(dev), &priv->nd, -1, NULL);
+}
+
+
+/* Machine Driver */
+
+static dt_driver dt_driver_table[] = {
+    { "", 0, NULL, NULL },
+    { "cpus", sizeof(dt_device_cpus), dt_cpus_props,
+      NULL, dt_cpus_init, NULL },
+    { "memory", sizeof(dt_device_memory), dt_memory_props,
+      dt_memory_config, dt_memory_init, NULL },
+    { "pc-misc", sizeof(dt_device_pc_misc), dt_pc_misc_props,
+      dt_pc_misc_config, dt_pc_misc_init, dt_pc_misc_start },
+    { "pci", sizeof(dt_device_pci), NULL,
+      dt_pci_config, dt_pci_init, dt_pci_start },
+    { "piix3", sizeof(dt_device_piix3), NULL,
+      dt_piix3_config, dt_piix3_init, NULL },
+    { "vga", sizeof(dt_device_vga), dt_vga_props,
+      dt_vga_config, dt_vga_init, NULL },
+    { "nic", sizeof(dt_device_nic), dt_nic_props,
+      dt_nic_config, dt_nic_init, NULL },
+    { NULL, 0, NULL, NULL, NULL }
+};
+
+static tree *dt_read_config(void)
+{
+    tree *root, *pci, *leaf;
+
+    /*
+     * TODO Read from config file.
+     *
+     * TODO Pretty far from a comprehensive machine configuration, but
+     * we need to start somewhere.
+     */
+    root = tree_new_kid(NULL, "", NULL);
+    leaf = tree_new_kid(root, "cpus", NULL);
+    tree_put_propf(leaf, "model", "%s", "qemu32");
+    leaf = tree_new_kid(root, "memory", NULL);
+    leaf = tree_new_kid(root, "pc-misc", NULL);
+    pci = tree_new_kid(root, "pci", NULL);
+    leaf = tree_new_kid(pci, "piix3", NULL);
+    return root;
+}
+
+/*
+ * Extract configuration from arguments and various global variables
+ * and put it into our machine and host configuration.
+ */
+static void dt_customize_config(tree *conf,
+                                dt_host *host,
+                                ram_addr_t ram_size, int vga_ram_size,
+                                const char *boot_device,
+                                const char *kernel_filename,
+                                const char *kernel_cmdline,
+                                const char *initrd_filename,
+                                const char *cpu_model)
+{
+    /*
+     * TODO This is still pretty cheesy: we insert stuff into the tree
+     * at hardcoded places.  Replacing placeholders instead would be
+     * more flexible.  Another idea is to mark certain parts of the
+     * initial tree optional, and remove them here.
+     */
+    tree *pci = tree_node_by_name(conf, "/pci");
+    tree *node;
+    int i, index;
+
+    node = tree_node_by_name(conf, "/cpus");
+    tree_put_propf(node, "num", "%d", smp_cpus);
+    if (cpu_model)
+        tree_put_propf(node, "model", "%s", cpu_model);
+
+    node = tree_node_by_name(conf, "/memory");
+    tree_put_propf(node, "ram", "%#lx", (unsigned long)ram_size);
+
+    node = tree_node_by_name(conf, "/pc-misc");
+    tree_put_propf(node, "boot-device", "%s", boot_device);
+
+    /* Insert VGA node */
+    if (cirrus_vga_enabled || vmsvga_enabled || std_vga_enabled) {
+        node = tree_new_kid(pci, "vga", NULL);
+        tree_put_propf(node, "model", "%s",
+                          cirrus_vga_enabled ? "cirrus" :
+                          vmsvga_enabled ? "vms" : "std");
+        tree_put_propf(node, "ram", "%#x", vga_ram_size);
+    }
+
+    /* Insert NIC nodes, connect to VLANs */
+    for(i = 0; i < nb_nics; i++) {
+        /* TODO non-PCI NICs */
+        NICInfo *n = &nd_table[i];
+
+        node = tree_new_kid(pci, "nic", NULL);
+        tree_put_propf(node, "mac", "%02x:%02x:%02x:%02x:%02x:%02x",
+                       n->macaddr[0], n->macaddr[1], n->macaddr[2],
+                       n->macaddr[3], n->macaddr[4], n->macaddr[5]);
+        tree_put_propf(node, "model", "%s",
+                       n->model ? n->model : "ne2k_pci");
+        if (n->name)
+            tree_put_propf(node, "name", "%s", n->name);
+        dt_attach_nic(host, i, node, n->vlan);
+    }
+
+    /* Connect drives to their controller nodes */
+    /* IDE */
+    node = tree_node_by_name(pci, "piix3");
+    for(i = 0; i < MAX_IDE_BUS * MAX_IDE_DEVS; i++) {
+        index = drive_get_index(IF_IDE, i / MAX_IDE_DEVS, i % MAX_IDE_DEVS);
+        if (index != -1)
+            dt_attach_drive(host, index, node, drives_table[index].bdrv);
+    }
+    /* Floppy */
+    node = tree_node_by_name(conf, "/pc-misc");
+    for(i = 0; i < MAX_FD; i++) {
+        index = drive_get_index(IF_FLOPPY, 0, i);
+        if (index != -1)
+            dt_attach_drive(host, index, node, drives_table[index].bdrv);
+    }
+
+    /* Unimplemented stuff */
+    if (kernel_filename)
+        abort();                /* TODO */
+}
+
+static void pc_init_dt(ram_addr_t ram_size, int vga_ram_size,
+                       const char *boot_device,
+                       const char *kernel_filename,
+                       const char *kernel_cmdline,
+                       const char *initrd_filename,
+                       const char *cpu_model)
+{
+    tree *conf;
+    dt_host host;
+
+    conf = dt_read_config();
+    if (!conf)
+        exit(1);
+    tree_print(conf);
+    memset(&host, 0, sizeof(host));
+    dt_customize_config(conf, &host, ram_size, vga_ram_size, boot_device,
+                        kernel_filename, kernel_cmdline, initrd_filename,
+                        cpu_model);
+    dt_create_machine(conf, &host, dt_driver_table);
+}
+
+QEMUMachine pcdt_machine = {
+    .name = "pcdt",
+    .desc = "Standard PC (device tree)",
+    .init = pc_init_dt,
+    .ram_require = VGA_RAM_SIZE + PC_MAX_BIOS_SIZE,
+    .max_cpus = 255,
+};
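
The property tables in pcdt.c (dt_cpus_props and friends) depend on
DT_PROP_SPEC_INIT recording an offset and a size for each member of the
driver's private struct.  The sketch below shows how such a table could
be used to parse string properties into private data.  It is not the
code from dt.c, which is not part of this hunk: the demo_* names, the
lookup helper demo_set_prop(), and the error convention (0 on success,
-1 on failure) are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct demo_prop_spec demo_prop_spec;

struct demo_prop_spec {
    const char *name;
    ptrdiff_t offs;             /* offset in device private data */
    size_t size;                /* size there, for sanity checking */
    int (*parse)(void *dst, const char *src, demo_prop_spec *spec);
};

#define DEMO_PROP_SPEC_INIT(name, strty, member, fmt)                   \
    { name, offsetof(strty, member), sizeof(((strty *)0)->member),      \
      demo_parse_##fmt }

/* Stores the pointer; the caller keeps ownership of the string */
static int demo_parse_string(void *dst, const char *src,
                             demo_prop_spec *spec)
{
    assert(spec->size == sizeof(const char *));
    *(const char **)dst = src;
    return 0;
}

/* Accepts decimal, octal and hex; rejects trailing junk and overflow */
static int demo_parse_int(void *dst, const char *src, demo_prop_spec *spec)
{
    char *end;
    long val;

    assert(spec->size == sizeof(int));
    val = strtol(src, &end, 0);
    if (end == src || *end != '\0' || val < INT_MIN || val > INT_MAX)
        return -1;
    *(int *)dst = (int)val;
    return 0;
}

typedef struct demo_cpus {
    const char *model;
    int num;
} demo_cpus;

static demo_prop_spec demo_cpus_props[] = {
    DEMO_PROP_SPEC_INIT("model", demo_cpus, model, string),
    DEMO_PROP_SPEC_INIT("num", demo_cpus, num, int),
};

/* Find the spec for a property name and parse the value into priv */
static int demo_set_prop(void *priv, demo_prop_spec *specs, size_t nspecs,
                         const char *name, const char *value)
{
    size_t i;

    for (i = 0; i < nspecs; i++) {
        if (!strcmp(specs[i].name, name))
            return specs[i].parse((char *)priv + specs[i].offs,
                                  value, &specs[i]);
    }
    return -1;                  /* property not recognized */
}
```

The size recorded by the macro lets each parse method check that the
table entry really points at a member of the type it writes, which
catches a mismatched format name in the table at run time.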
diff --git a/hw/pcint.h b/hw/pcint.h
new file mode 100644
index 0000000..f18da67
--- /dev/null
+++ b/hw/pcint.h
@@ -0,0 +1,46 @@
+/*
+ * Stuff shared by pc.c and pcdt.c
+ *
+ * See pcdt.c for why this should go away eventually.
+ */
+
+#ifndef HW_PC_INT_H
+#define HW_PC_INT_H
+
+#define BIOS_FILENAME "bios.bin"
+#define VGABIOS_FILENAME "vgabios.bin"
+#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
+
+#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
+
+#define MAX_IDE_BUS 2
+
+/* TODO move to ferr stuff in cpu.h? */
+extern qemu_irq ferr_irq;
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data);
+
+/* TODO eliminate */
+extern RTCState *rtc_state;
+extern PCIDevice *i440fx_state;
+extern int serial_io[MAX_SERIAL_PORTS];
+extern int serial_irq[MAX_SERIAL_PORTS];
+extern int parallel_io[MAX_PARALLEL_PORTS];
+extern int parallel_irq[MAX_PARALLEL_PORTS];
+extern fdctrl_t *floppy_controller;
+
+/* TODO move to pic stuff in pc.h? */
+void pic_irq_request(void *opaque, int irq, int level);
+
+/* TODO move to a20 stuff in pc.h? */
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val);
+uint32_t ioport92_read(void *opaque, uint32_t addr);
+
+void bochs_bios_init(void);
+void main_cpu_reset(void *opaque);
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data);
+int pc_boot_set(void *opaque, const char *boot_device);
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd);
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table);
+
+#endif
diff --git a/net.c b/net.c
index 29beb28..5ee3ba4 100644
--- a/net.c
+++ b/net.c
@@ -153,7 +153,7 @@ static void hex_dump(FILE *f, const uint8_t *buf, int size)
 }
 #endif
 
-static int parse_macaddr(uint8_t *macaddr, const char *p)
+int parse_macaddr(uint8_t *macaddr, const char *p)
 {
     int i;
     char *last_char;
diff --git a/net.h b/net.h
index 03c7f18..f672915 100644
--- a/net.h
+++ b/net.h
@@ -47,6 +47,7 @@ int qemu_can_send_packet(VLANClientState *vc);
 ssize_t qemu_sendv_packet(VLANClientState *vc, const struct iovec *iov,
                           int iovcnt);
 void qemu_send_packet(VLANClientState *vc, const uint8_t *buf, int size);
+int parse_macaddr(uint8_t *macaddr, const char *p);
 void qemu_format_nic_info_str(VLANClientState *vc, uint8_t macaddr[6]);
 void qemu_check_nic_model(NICInfo *nd, const char *model);
 void qemu_check_nic_model_list(NICInfo *nd, const char * const *models,
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 1cf49d5..01329d2 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -7,6 +7,8 @@
 
 void register_machines(void)
 {
+    extern QEMUMachine pcdt_machine;
+    qemu_register_machine(&pcdt_machine);
     qemu_register_machine(&pc_machine);
     qemu_register_machine(&isapc_machine);
 }
diff --git a/tree.c b/tree.c
new file mode 100644
index 0000000..7f9bcce
--- /dev/null
+++ b/tree.c
@@ -0,0 +1,274 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/* Ye Olde Decorated Tree */
+
+#include <assert.h>
+#include "tree.h"
+#include "qemu-common.h"
+#include "sys-queue.h"
+
+struct tree {
+    const char *name;
+    LIST_HEAD(, tree_prop) props;
+    tree *parent;
+    TAILQ_HEAD(, tree) kids;
+    TAILQ_ENTRY(tree) siblings;
+    void *user;
+};
+
+struct tree_prop {
+    const char *name;
+    const void *val;
+    int sz;
+    tree *owner;
+    LIST_ENTRY(tree_prop) link;
+};
+
+tree *tree_new_kid(tree *parent, const char *name, void *user)
+{
+    tree *kid = qemu_malloc(sizeof(*kid));
+
+    assert(parent || !*name);
+    kid->name = name;
+    LIST_INIT(&kid->props);
+    kid->parent = parent;
+    TAILQ_INIT(&kid->kids);
+    if (parent)
+        TAILQ_INSERT_TAIL(&parent->kids, kid, siblings);
+    kid->user = user;
+
+    return kid;
+}
+
+const char *tree_node_name(const tree *node)
+{
+    return node->name;
+}
+
+static tree *tree_kid_by_name(const tree *dt, const char *name)
+{
+    const char *slash = strchr(name, '/');
+    size_t len = slash ? slash - name : strlen(name);
+    tree *kid;
+
+    TAILQ_FOREACH(kid, &dt->kids, siblings) {
+        if (!strncmp(kid->name, name, len) && kid->name[len] == 0)
+            return kid;
+    }
+    return NULL;
+}
+
+tree *tree_node_by_name(const tree *node, const char *name)
+{
+    tree *kid;
+    size_t len;
+
+    if (name[0] == '/') {
+        for (; node->parent; node = node->parent) ;
+        name++;
+    }
+
+    if (name[0] == 0)
+        return (tree *)node;
+
+    kid = tree_kid_by_name(node, name);
+    if (!kid)
+        return NULL;
+
+    len = strlen(kid->name);
+    if (name[len] == 0)
+        return kid;
+    assert (name[len] == '/');
+
+    while (name[len] == '/') len++;
+    return tree_node_by_name(kid, name + len);
+}
+
+tree_prop *tree_first_prop(const tree *node)
+{
+    return LIST_FIRST(&node->props);
+}
+
+tree_prop *tree_next_prop(const tree_prop *prop)
+{
+    return LIST_NEXT(prop, link);
+}
+
+tree_prop *tree_get_prop(const tree *node, const char *name)
+{
+    tree_prop *prop;
+
+    LIST_FOREACH(prop, &node->props, link) {
+        if (!strcmp(prop->name, name))
+            return prop;
+    }
+    return NULL;
+}
+
+const char *tree_get_prop_s(const tree *node, const char *name)
+{
+    tree_prop *prop = tree_get_prop(node, name);
+    if (!prop
+        || memchr(prop->val, 0, prop->sz) != prop->val + prop->sz - 1) {
+        errno = EINVAL;
+        return NULL;
+    }
+    return prop->val;
+}
+
+const char *tree_prop_name(const tree_prop *prop)
+{
+    return prop->name;
+}
+
+const void *tree_prop_value(const tree_prop *prop, size_t *size)
+{
+    if (size)
+        *size = prop->sz;
+    return prop->val;
+}
+
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz)
+{
+    tree_prop *prop;
+
+    prop = tree_get_prop(node, name);
+    if (!prop) {
+        prop = qemu_malloc(sizeof(*prop));
+        prop->name = name;
+        prop->owner = node;
+        LIST_INSERT_HEAD(&node->props, prop, link);
+    }
+    /* FIXME need a destructor for val */
+    prop->val = val;
+    prop->sz = sz;
+}
+
+void tree_put_propf(tree *node, const char *name, const char *fmt, ...)
+{
+    va_list ap;
+    size_t len;
+    char *buf;
+
+    va_start(ap, fmt);
+    len = vsnprintf(NULL, 0, fmt, ap);
+    va_end(ap);
+
+    buf = qemu_malloc(len + 1);
+    va_start(ap, fmt);
+    vsnprintf(buf, len + 1, fmt, ap);
+    va_end(ap);
+
+    tree_put_prop(node, name, buf, len + 1);
+}
+
+void tree_put_user(tree *node, void *user)
+{
+    node->user = user;
+}
+
+void *tree_get_user(const tree *node)
+{
+    return node->user;
+}
+
+tree *tree_parent(const tree *node)
+{
+    return node->parent;
+}
+
+tree *tree_first_kid(const tree *node)
+{
+    return TAILQ_FIRST(&node->kids);
+}
+
+tree *tree_sibling(const tree *node)
+{
+    return TAILQ_NEXT(node, siblings);
+}
+
+int tree_path(const tree *node, char *buf, size_t bufsz)
+{
+    char *p;
+    const tree *np;
+    size_t len, res;
+
+    p = buf + bufsz;
+    res = 0;
+    for (np = node; np->parent; np = np->parent) {
+        len = 1 + strlen(np->name);
+        res += len;
+        if (res >= bufsz)
+            continue;
+        p -= len;
+        memcpy(p + 1, np->name, len - 1);
+        p[0] = '/';
+    }
+
+    if (res < bufsz) {
+        memmove(buf, p, res);   /* source and destination may overlap */
+        buf[res] = 0;
+    }
+
+    return res;
+}
+
+static void tree_print_sub(const tree *node, int indent)
+{
+    int i, use_str, sep;
+    const unsigned char *pv;
+    tree_prop *prop;
+    tree *kid;
+
+    printf("%*s%s {\n", indent, "", node->name[0] ? node->name : "/");
+    LIST_FOREACH(prop, &node->props, link) {
+        printf("%*s%s", indent + 4, "", prop->name);
+        pv = prop->val;
+        if (pv) {
+            printf(" = ");
+            use_str = pv[prop->sz - 1] == 0;
+            for (i = 0; i < prop->sz - 1; i++) {
+                if (!isprint(pv[i]))
+                    use_str = 0;
+            }
+            if (use_str)
+                printf("\"%s\"", (const char *)prop->val);
+            else {
+                sep = '[';
+                for (i = 0; i < prop->sz; i++) {
+                    printf("%c%02x", sep, pv[i]);
+                    sep = ' ';
+                }
+                printf("]");
+            }
+        }
+        printf(";\n");
+    }
+    TAILQ_FOREACH(kid, &node->kids, siblings)
+        tree_print_sub(kid, indent + 4);
+    printf("%*s};\n", indent, "");
+}
+
+void tree_print(const tree *node)
+{
+    tree_print_sub(node, 0);
+}
diff --git a/tree.h b/tree.h
new file mode 100644
index 0000000..3e596f8
--- /dev/null
+++ b/tree.h
@@ -0,0 +1,40 @@
+#ifndef TREE_H
+#define TREE_H
+
+#include <stddef.h>
+
+typedef struct tree tree;
+typedef struct tree_prop tree_prop;
+
+tree *tree_new_kid(tree *parent, const char *name, void *user);
+const char *tree_node_name(const tree *node);
+tree *tree_node_by_name(const tree *node,
+                               const char *name);
+
+tree_prop *tree_first_prop(const tree *node);
+tree_prop *tree_next_prop(const tree_prop *prop);
+#define TREE_FOREACH_PROP(var, node) \
+    for (var = tree_first_prop(node); var; var = tree_next_prop(var))
+tree_prop *tree_get_prop(const tree *node, const char *name);
+const char *tree_get_prop_s(const tree *node, const char *name);
+const char *tree_prop_name(const tree_prop *prop);
+const void *tree_prop_value(const tree_prop *prop, size_t *size);
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz);
+void tree_put_propf(tree *node, const char *name,
+                    const char *fmt, ...)
+    __attribute__((format(printf,3,4)));
+
+void tree_put_user(tree *node, void *user);
+void *tree_get_user(const tree *node);
+
+tree *tree_parent(const tree *node);
+tree *tree_first_kid(const tree *node);
+tree *tree_sibling(const tree *node);
+#define TREE_FOREACH_KID(var, node)                                     \
+    for (var = tree_first_kid(node); var; var = tree_sibling(var))
+
+int tree_path(const tree *node, char *buf, size_t bufsz);
+void tree_print(const tree *node);
+
+#endif


* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-23 17:38     ` Markus Armbruster
@ 2009-02-23 18:58       ` Anthony Liguori
  2009-02-24  9:08         ` Markus Armbruster
  0 siblings, 1 reply; 146+ messages in thread
From: Anthony Liguori @ 2009-02-23 18:58 UTC (permalink / raw)
  To: qemu-devel

Markus Armbruster wrote:
> Anthony Liguori <anthony@codemonkey.ws> writes:
>
>   
>>> diff --git a/hw/dt.c b/hw/dt.c
>>> +
>>>   
>>>       
>> Please remove the ^Ls.  They don't render properly in my mail client.
>>     
>
> Many source files contain ^L already.  But I'll drop mine if you insist.
>   

My mailer (Thunderbird) expands the ^L into about 20 new lines making 
the patches difficult to review.

>>> +/* Host Configuration */
>>> +
>>> +typedef struct dt_host {
>>> +    /* connection NICs <-> VLAN */
>>> +    tree *nic[MAX_NICS];
>>> +    VLANState *nic_vlan[MAX_NICS];
>>> +    /* connection drives <-> controller */
>>> +    tree *drive_ctrl[MAX_DRIVES];
>>> +    BlockDriverState *drive_state[MAX_DRIVES];
>>> +} dt_host;
>>>
>>>   
>>>       
>> typedef struct DeviceTreeHost
>> {
>> } DeviceTreeHost.
>>     
>
> If you insist on CamelCase, IFindThatUglyAndHardToRead, but I can do
> that.  Just typedef names?
>   

Why?  Just convert to the way the rest of the code does it.  Conformity 
feels good, I promise :-)

> As to the placement of braces, a quick grep shows the vast majority of
> such typedefs to have the brace on the same line as the typedef.
>   

Yeah, that's fine.  My bracket placement was not intentional.

>> For instance, I think it would be perfectly fine to require, to start
>> with, that the command line configuration matches the described machine
>> file.  For example, if you see:
>>
>> -net tap -net nic,model=rtl8139
>>
>> Then you should search for an rtl8139 and configure the node to be on
>> vlan=0.  If an rtl8139 doesn't exist, throw an error.
>>     
>
> Conversely, when an optional tree node isn't enabled (e.g. with -net nic
> for NICs), silently cut it from the tree.
>   

I'd prefer the first iteration to not modify the tree at all.  The 
specified command line configuration should exactly match whatever the 
tree specifies.

How to manipulate the tree in a generic way is IMHO a harder problem that 
deserves to be addressed in a proper way.

>> The long-term goal would be to have a mechanism to modify the tree in
>> a generic way and the -net nic code would end up looking like:
>>
>> node = find_next_device("type=nic,model=rtl8139");
>> if (!node) {
>>   node = find_bus("type=pcibus");
>>   if (!node)
>>       bail out
>>   node = add_node_to_bus(node,
>> "type=nic,model=rtl8139,remaining_description_of_rtl8139");
>>   if (!node)
>>       bail out
>> }
>>
>> attach_nic_to_vlan(vlan, node);
>>     
>
> Makes sense to me.
>
> The driver should declare on what kind(s) of bus this device can go.
>   

Yup.

Regards,

Anthony Liguori


* Re: [Qemu-devel] Machine description as data prototype, take 3
  2009-02-23 18:58       ` Anthony Liguori
@ 2009-02-24  9:08         ` Markus Armbruster
  0 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-02-24  9:08 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori <anthony@codemonkey.ws> writes:

> Markus Armbruster wrote:
>> Anthony Liguori <anthony@codemonkey.ws> writes:
>>
>>   
>>>> diff --git a/hw/dt.c b/hw/dt.c
>>>> +
>>>>         
>>> Please remove the ^Ls.  They don't render properly in my mail client.
>>>     
>>
>> Many source files contain ^L already.  But I'll drop mine if you insist.
>>   
>
> My mailer (Thunderbird) expands the ^L into about 20 new lines making
> the patches difficult to review.

I'll drop them.  Pity, since they're quite useful for navigating.

>>>> +/* Host Configuration */
>>>> +
>>>> +typedef struct dt_host {
>>>> +    /* connection NICs <-> VLAN */
>>>> +    tree *nic[MAX_NICS];
>>>> +    VLANState *nic_vlan[MAX_NICS];
>>>> +    /* connection drives <-> controller */
>>>> +    tree *drive_ctrl[MAX_DRIVES];
>>>> +    BlockDriverState *drive_state[MAX_DRIVES];
>>>> +} dt_host;
>>>>
>>>>         
>>> typedef struct DeviceTreeHost
>>> {
>>> } DeviceTreeHost.
>>>     
>>
>> If you insist on CamelCase, IFindThatUglyAndHardToRead, but I can do
>> that.  Just typedef names?
>>   
>
> Why?  Just convert to the way the rest of the code does it.
> Conformity feels good, I promise :-)

But what does the rest of the code actually do?  I can see plenty of
CamelTypeNames, but functions?

Out of curiosity, I used ctags -x to get me separate lists of distinct
function, typedef, variable, member and struct/union/enum tag
definitions (macros and enumerators omitted):

                        function  typedef   variable  member    tags
lower_case [a-z0-9_]     10958       589      2226     10385       813
CamelCase  [a-zA-Z0-9]     164       415       111       249         0
other                      625        65        44       123       318
total                    11747      1069      2381     10757      1131

Naturally, names for widely used, important stuff matter more.  Whether
that tips the balance for typedef names towards CamelCase I don't know
and don't wish to argue about, I'll just do as you wish.  But for
everything else, the conformity argument seems to support lower_case.

>> As to the placement of braces, a quick grep shows the vast majority of
>> such typedefs to have the brace on the same line as the typedef.
>>   
>
> Yeah, that's fine.  My bracket placement was not intentional.
>
>>> For instance, I think it would be perfectly fine to require, to start
>>> with, that the command line configuration matches the described machine
>>> file.  For example, if you see:
>>>
>>> -net tap -net nic,model=rtl8139
>>>
>>> Then you should search for an rtl8139 and configure the node to be on
>>> vlan=0.  If an rtl8139 doesn't exist, throw an error.
>>>     
>>
>> Conversely, when an optional tree node isn't enabled (e.g. with -net nic
>> for NICs), silently cut it from the tree.
>>   
>
> I'd prefer the first iteration to not modify the tree at all.  The
> specified command line configuration should exactly match whatever the
> tree specifies.
>
> How to manipulate the tree in a generic way is IMHO a harder problem that
> deserves to be addressed in a proper way.

I agree that we shouldn't attempt to solve that problem in the first
iteration.

Until we solve it, we need a stub.  Whether that stub edits the tree or
not seems relatively unimportant to me, as long as it is cleanly
encapsulated.

>>> The long-term goal would be to have a mechanism to modify the tree in
>>> a generic way and the -net nic code would end up looking like:
>>>
>>> node = find_next_device("type=nic,model=rtl8139");
>>> if (!node) {
>>>   node = find_bus("type=pcibus");
>>>   if (!node)
>>>       bail out
>>>   node = add_node_to_bus(node,
>>> "type=nic,model=rtl8139,remaining_description_of_rtl8139");
>>>   if (!node)
>>>       bail out
>>> }
>>>
>>> attach_nic_to_vlan(vlan, node);
>>>     
>>
>> Makes sense to me.
>>
>> The driver should declare on what kind(s) of bus this device can go.
>>   
>
> Yup.
>
> Regards,
>
> Anthony Liguori

Thanks again!


* Re: [Qemu-devel] Machine description as data prototype, take 4 (was: [RFC] Machine description as data)
  2009-02-23 18:00 ` [Qemu-devel] Machine description as data prototype, take 4 (was: [RFC] Machine description as data) Markus Armbruster
@ 2009-02-24 20:06   ` Blue Swirl
  2009-02-25 12:13     ` [Qemu-devel] Machine description as data prototype, take 4 Markus Armbruster
  0 siblings, 1 reply; 146+ messages in thread
From: Blue Swirl @ 2009-02-24 20:06 UTC (permalink / raw)
  To: qemu-devel

On 2/23/09, Markus Armbruster <armbru@redhat.com> wrote:
> Fourth iteration of the prototype.  I'm not asking to consider a merge
>  at this time.

>  +static dt_prop_spec dt_cpus_props[] = {
>  +    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
>  +    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
>  +};

Please use "const".

>  +static void dt_memrng_rom(dt_device_memrng *rng,
>  +                          target_phys_addr_t phys_addr, ram_addr_t maxsz,
>  +                          const char *dir, const char *image, int top)
>  +{
>  +    char buf[1024];
>  +    int size;
>  +
>  +    snprintf(buf, sizeof(buf), "%s/%s", dir, image);
>  +    size = get_image_size(buf);
>  +    if (size < 0 || size > maxsz)
>  +        goto error;
>  +    if (top)
>  +        phys_addr = phys_addr + maxsz - size;
>  +    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), IO_MEM_ROM);
>  +    if (load_image(buf, phys_ram_base + rng->host_offs) != size)
>  +        goto error;
>  +    return;
>  +
>  +error:
>  +    fprintf(stderr, "qemu: could not load image '%s'\n", buf);
>  +    exit(1);
>  +}

If this is going to be the generic ROM loader, it should also handle
ELF format images. But for the PC this is enough.

>  +static dt_prop_spec dt_memory_props[] = {
>  +    DT_PROP_SPEC_INIT("ram", dt_device_memory, ram_size, ram_addr_t),
>  +};

"const"

>  +static dt_prop_spec dt_pc_misc_props[] = {
>  +    DT_PROP_SPEC_INIT("boot-device", dt_device_pc_misc, boot_device,
>  +                      string),
>  +};

"const"

>  +static dt_driver_vga dt_driver_vga_table[] = {
>  +    { "cirrus", VGABIOS_CIRRUS_FILENAME, pci_cirrus_vga_init },
>  +    { "vms", VGABIOS_FILENAME, pci_vmsvga_init_ },
>  +    { "std", VGABIOS_FILENAME, pci_vga_init_ },
>  +    { NULL, NULL, NULL }
>  +};

"const"

>  +static dt_prop_spec dt_vga_props[] = {
>  +    DT_PROP_SPEC_INIT("model", dt_device_vga, model, string),
>  +    DT_PROP_SPEC_INIT("ram", dt_device_vga, ram_size, ram_addr_t),
>  +};

"const"

>  +static dt_prop_spec dt_nic_props[] = {
>  +    DT_PROP_SPEC_INIT("model", dt_device_nic, nd.model, string),
>  +    DT_PROP_SPEC_INIT("mac", dt_device_nic, nd.macaddr, macaddr),
>  +    DT_PROP_SPEC_INIT("name", dt_device_nic, nd.name, string),
>  +};

"const"

>  +static dt_driver dt_driver_table[] = {
>  +    { "", 0, NULL, NULL },
>  +    { "cpus", sizeof(dt_device_cpus), dt_cpus_props,
>  +      NULL, dt_cpus_init, NULL },
>  +    { "memory", sizeof(dt_device_memory), dt_memory_props,
>  +      dt_memory_config, dt_memory_init, NULL },
>  +    { "pc-misc", sizeof(dt_device_pc_misc), dt_pc_misc_props,
>  +      dt_pc_misc_config, dt_pc_misc_init, dt_pc_misc_start },
>  +    { "pci", sizeof(dt_device_pci), NULL,
>  +      dt_pci_config, dt_pci_init, dt_pci_start },
>  +    { "piix3", sizeof(dt_device_piix3), NULL,
>  +      dt_piix3_config, dt_piix3_init, NULL },
>  +    { "vga", sizeof(dt_device_vga), dt_vga_props,
>  +      dt_vga_config, dt_vga_init, NULL },
>  +    { "nic", sizeof(dt_device_nic), dt_nic_props,
>  +      dt_nic_config, dt_nic_init, NULL },
>  +    { NULL, 0, NULL, NULL, NULL }
>  +};

"const"

>   void register_machines(void)
>   {
>  +    extern QEMUMachine pcdt_machine;

Put this into hw/boards.h.

>  +tree *tree_new_kid(tree *parent, const char *name, void *user)

The OF term for "kid" is "child".

>  +tree *tree_sibling(const tree *node)

Here the OF word is "peer".


* Re: [Qemu-devel] Machine description as data prototype, take 4
  2009-02-24 20:06   ` Blue Swirl
@ 2009-02-25 12:13     ` Markus Armbruster
  2009-02-25 20:11       ` Blue Swirl
  0 siblings, 1 reply; 146+ messages in thread
From: Markus Armbruster @ 2009-02-25 12:13 UTC (permalink / raw)
  To: qemu-devel

Blue Swirl <blauwirbel@gmail.com> writes:

> On 2/23/09, Markus Armbruster <armbru@redhat.com> wrote:
>> Fourth iteration of the prototype.  I'm not asking to consider a merge
>>  at this time.
>
>>  +static dt_prop_spec dt_cpus_props[] = {
>>  +    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
>>  +    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
>>  +};
>
> Please use "const".

Done.

>>  +static void dt_memrng_rom(dt_device_memrng *rng,
>>  +                          target_phys_addr_t phys_addr, ram_addr_t maxsz,
>>  +                          const char *dir, const char *image, int top)
>>  +{
>>  +    char buf[1024];
>>  +    int size;
>>  +
>>  +    snprintf(buf, sizeof(buf), "%s/%s", dir, image);
>>  +    size = get_image_size(buf);
>>  +    if (size < 0 || size > maxsz)
>>  +        goto error;
>>  +    if (top)
>>  +        phys_addr = phys_addr + maxsz - size;
>>  +    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), IO_MEM_ROM);
>>  +    if (load_image(buf, phys_ram_base + rng->host_offs) != size)
>>  +        goto error;
>>  +    return;
>>  +
>>  +error:
>>  +    fprintf(stderr, "qemu: could not load image '%s'\n", buf);
>>  +    exit(1);
>>  +}
>
> If this is going to be the generic ROM loader, it should also handle
> ELF format images. But for the PC this is enough.

Yes.

[more const fallout...]
>>   void register_machines(void)
>>   {
>>  +    extern QEMUMachine pcdt_machine;
>
> Put this into hw/boards.h.

Done.  A leftover from the early reckless hacking stage.

>>  +tree *tree_new_kid(tree *parent, const char *name, void *user)
>
> The OF term for "kid" is "child".
>
>>  +tree *tree_sibling(const tree *node)
>
> Here the OF word is "peer".

Do you mean to recommend renames?  Note that right now this is a
perfectly generic decorated tree, which could be used for anything, not
just device trees.

While I'm not particular about "kid" vs. "child", I find "peer" clearly
inferior.  "Sibling" is obvious: child of the same parent.  "Peer" could
be anything.


* Re: [Qemu-devel] Machine description as data prototype, take 4
  2009-02-25 12:13     ` [Qemu-devel] Machine description as data prototype, take 4 Markus Armbruster
@ 2009-02-25 20:11       ` Blue Swirl
  0 siblings, 0 replies; 146+ messages in thread
From: Blue Swirl @ 2009-02-25 20:11 UTC (permalink / raw)
  To: qemu-devel

On 2/25/09, Markus Armbruster <armbru@redhat.com> wrote:
> Blue Swirl <blauwirbel@gmail.com> writes:
>
>  > On 2/23/09, Markus Armbruster <armbru@redhat.com> wrote:
>  >> Fourth iteration of the prototype.  I'm not asking to consider a merge
>  >>  at this time.
>  >
>  >>  +static dt_prop_spec dt_cpus_props[] = {
>  >>  +    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
>  >>  +    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
>  >>  +};
>  >
>  > Please use "const".
>
>  Done.
>
>  >>  +static void dt_memrng_rom(dt_device_memrng *rng,
>  >>  +                          target_phys_addr_t phys_addr, ram_addr_t maxsz,
>  >>  +                          const char *dir, const char *image, int top)
>  >>  +{
>  >>  +    char buf[1024];
>  >>  +    int size;
>  >>  +
>  >>  +    snprintf(buf, sizeof(buf), "%s/%s", dir, image);
>  >>  +    size = get_image_size(buf);
>  >>  +    if (size < 0 || size > maxsz)
>  >>  +        goto error;
>  >>  +    if (top)
>  >>  +        phys_addr = phys_addr + maxsz - size;
>  >>  +    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), IO_MEM_ROM);
>  >>  +    if (load_image(buf, phys_ram_base + rng->host_offs) != size)
>  >>  +        goto error;
>  >>  +    return;
>  >>  +
>  >>  +error:
>  >>  +    fprintf(stderr, "qemu: could not load image '%s'\n", buf);
>  >>  +    exit(1);
>  >>  +}
>  >
>  > If this is going to be the generic ROM loader, it should also handle
>  > ELF format images. But for the PC this is enough.
>
>  Yes.
>
>  [more const fallout...]
>  >>   void register_machines(void)
>  >>   {
>  >>  +    extern QEMUMachine pcdt_machine;
>  >
>  > Put this into hw/boards.h.
>
>  Done.  A leftover from the early reckless hacking stage.
>
>  >>  +tree *tree_new_kid(tree *parent, const char *name, void *user)
>  >
>  > The OF term for "kid" is "child".
>  >
>  >>  +tree *tree_sibling(const tree *node)
>  >
>  > Here the OF word is "peer".
>
>  Do you mean to recommend renames?  Note that right now this is a
>  perfectly generic decorated tree, which could be used for anything, not
>  just device trees.
>
>  While I'm not particular about "kid" vs. "child", I find "peer" clearly
>  inferior.  "Sibling" is obvious: child of the same parent.  "Peer" could
>  be anything.

Maybe the function usage is different: "peer" gets passed the child
node and returns a node from the same level, whereas your function
takes the parent node.  Or does it?

Anyway, "sibling" is still an unnecessary deviation.


* [Qemu-devel] Machine description as data prototype, take 5 (was: [RFC] Machine description as data)
  2009-02-11 15:40 [Qemu-devel] [RFC] Machine description as data Markus Armbruster
                   ` (5 preceding siblings ...)
  2009-02-23 18:00 ` [Qemu-devel] Machine description as data prototype, take 4 (was: [RFC] Machine description as data) Markus Armbruster
@ 2009-03-03 17:46 ` Markus Armbruster
  2009-03-12 18:43 ` [Qemu-devel] Machine description as data prototype, take 6 " Markus Armbruster
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-03-03 17:46 UTC (permalink / raw)
  To: qemu-devel

Fifth iteration of the prototype.  Work in progress, not ready for
merging.

New:

* Buses are no longer hardcoded.  Dynamic devices and drives are
  attached to a suitable bus.  Device-independent and data-driven.

* A few minor things pointed out by reviewers.

* Rebased to git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6674 c046a42c-6fe2-441c-8c8c-71466251a162

Not in, but not forgotten either:

* A couple of renames suggested by reviewers.

* Reduce unnecessary differences to IEEE 1275 trees.

Shortcuts:

* I didn't implement all the devices of the "pc" original.  The devices
  I implemented might not support all existing command line options.

* The configuration tree is simplistic.  I expect it to evolve, and I
  wouldn't exclude the possibility of wholesale replacement.

* The initial configuration tree is hardcoded in hw/pcdt.c.  It should
  be read from a configuration file.

* Can only have one bus of each kind.  Want a generic way to enumerate
  buses of the same kind, and means to let device configuration ask for
  a specific bus.

* I'm hiding completely behind the existing QEMUMachine init method
  interface, in hw/pcdt.c.  I guess we'll want a QEMUMachine interface
  that allows us to move a bit more code out of hw/.

* The interface to the shared code in hw/pc.c (hw/pcint.h) is rather
  crude.

* The memory driver is PC-specific without true need.

* The pc-misc driver should most probably be split up some.

* Linux gripes about ACPI, need to investigate.


 Makefile              |    1 +
 Makefile.target       |    4 +-
 dt.c                  |  619 +++++++++++++++++++++++++++++++++++++++++++++++
 dt.h                  |  117 +++++++++
 hw/boards.h           |    3 +
 hw/pc.c               |   47 ++--
 hw/pcdt.c             |  645 +++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pci.h              |    2 +-
 hw/pcint.h            |   46 ++++
 hw/vmware_vga.c       |    4 +-
 net.c                 |    2 +-
 net.h                 |    1 +
 target-i386/machine.c |    1 +
 tree.c                |  285 ++++++++++++++++++++++
 tree.h                |   41 +++
 15 files changed, 1786 insertions(+), 32 deletions(-)


diff --git a/Makefile b/Makefile
index 4f7a55a..2198bba 100644
--- a/Makefile
+++ b/Makefile
@@ -85,6 +85,7 @@ OBJS+=sd.o ssi-sd.o
 OBJS+=bt.o bt-host.o bt-vhci.o bt-l2cap.o bt-sdp.o bt-hci.o bt-hid.o usb-bt.o
 OBJS+=buffered_file.o migration.o migration-tcp.o net.o qemu-sockets.o
 OBJS+=qemu-char.o aio.o net-checksum.o savevm.o cache-utils.o
+OBJS+=tree.o
 
 ifdef CONFIG_BRLAPI
 OBJS+= baum.o
diff --git a/Makefile.target b/Makefile.target
index f33f762..799437f 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -505,6 +505,7 @@ OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o dma-helpers.o
 # need to fix this properly
 OBJS+=virtio.o virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o
 OBJS+=fw_cfg.o
+OBJS+=dt.o
 ifdef CONFIG_KVM
 OBJS+=kvm.o kvm-all.o
 endif
@@ -536,6 +537,7 @@ endif
 ifdef CONFIG_OSS
 LIBS += $(CONFIG_OSS_LIB)
 endif
+LIBS+= $(FDT_LIBS)
 
 SOUND_HW = sb16.o es1370.o ac97.o
 ifdef CONFIG_ADLIB
@@ -583,6 +585,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
+OBJS+= pcdt.o
 OBJS += device-hotplug.o pci-hotplug.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
@@ -606,7 +609,6 @@ OBJS+= ppc440.o ppc440_bamboo.o
 OBJS+= ppce500_pci.o ppce500_mpc8544ds.o
 ifdef FDT_LIBS
 OBJS+= device_tree.o
-LIBS+= $(FDT_LIBS)
 endif
 ifdef CONFIG_KVM
 OBJS+= kvm_ppc.o
diff --git a/dt.c b/dt.c
new file mode 100644
index 0000000..571623e
--- /dev/null
+++ b/dt.c
@@ -0,0 +1,619 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * Configure and build a machine from configuration data
+ *
+ * This is generic, device-independent code driven by device-dependent
+ * configuration data, talking to devices through an abstract device
+ * interface.
+ *
+ * The configuration data currently is hardwired to a fairly limited
+ * PC, registered as machine type "pcdt".  The nuts and bolts of PC
+ * emulation remain in pc.c, and that sharing makes the somewhat
+ * clumsy pcint.h necessary.  Having two PC machine types makes no
+ * sense in the long run, of course.  We want to replace pc.c
+ * eventually, and also convert other machine types to this mechanism.
+ */
+
+#include <assert.h>
+#include "block.h"
+#include "cpu.h"
+#include "dt.h"
+#include "net.h"
+#include "tree.h"
+#include "sysemu.h"
+
+#ifdef HAVE_FDT
+#include <libfdt.h>
+#endif
+
+/* Forward declarations */
+static void dt_parse_prop(dt_device *dev, tree_prop *prop);
+static void dt_add_dyn_devs(tree *conf, dt_host *host,
+                            const dt_driver drvtab[], int vga_ram_size);
+static void dt_fdt_test(tree *conf);
+
+
+/* Host Configuration */
+
+void dt_attach_nic(dt_host *host, int index, tree *nic, VLANState *vlan)
+{
+    host->nic[index] = nic;
+    host->nic_vlan[index] = vlan;
+}
+
+VLANState *dt_find_vlan(tree *conf, dt_host *host)
+{
+    int i;
+
+    for (i = 0; i < nb_nics; i++) {
+        if (host->nic[i] == conf)
+            return host->nic_vlan[i];
+    }
+    return NULL;
+}
+
+void dt_attach_drive(dt_host *host, int index,
+                     tree *node, BlockDriverState *state)
+{
+    host->drive[index] = node;
+    host->drive_state[index] = state;
+}
+
+void dt_drive_config(tree *conf, dt_host *host,
+                     BlockDriverState *drive[], int n)
+{
+    int i, j;
+
+    j = 0;
+    for (i = 0; i < nb_drives; i++) {
+        if (!host->drive[i])
+            continue;           /* TODO rm when all drive types implemented */
+        if (tree_parent(host->drive[i]) != conf)
+            continue;
+        assert(j < n);
+        drive[j++] = host->drive_state[i];
+    }
+}
+
+static void dt_print_host_config(dt_host *host)
+{
+    char buf[1024];
+    int i;
+
+    for (i = 0; i < nb_nics; i++) {
+        if (!host->nic[i])
+            continue;
+        tree_path(host->nic[i], buf, sizeof(buf));
+        printf("nic#%d\tvlan %-4d\t%s\n",
+               i, host->nic_vlan[i]->id, buf);
+    }
+
+    for (i = 0; i < nb_drives; i++) {
+        if (!host->drive[i])
+            continue;
+        tree_path(host->drive[i], buf, sizeof(buf));
+        printf("drive#%d\t%-15s %s\n",
+               i, bdrv_get_device_name(host->drive_state[i]), buf);
+    }
+}
+
+
+/* Device Interface */
+
+static const dt_driver *dt_driver_by_name(const char *name,
+                                          const dt_driver drvtab[])
+{
+    int i;
+
+    for (i = 0; drvtab[i].name; i++) {
+        if (!strcmp(name, drvtab[i].name))
+            return &drvtab[i];
+    }
+    return NULL;
+}
+
+dt_device *dt_device_of(tree *conf)
+{
+    return tree_get_user(conf);
+}
+
+dt_device *dt_parent_device(dt_device *dev)
+{
+    tree *p = tree_parent(dev->conf);
+
+    return p ? dt_device_of(p) : NULL;
+}
+
+/* TODO support multiple buses of the same type */
+static dt_device *dt_find_bus(tree *conf, dt_bus_type bus_type)
+{
+    dt_device *dev;
+    tree *kid;
+
+    dev = dt_device_of(conf);
+    if (dev->drv->bus_type == bus_type)
+        return dev;
+
+    TREE_FOREACH_KID(kid, conf) {
+        dev = dt_find_bus(kid, bus_type);
+        if (dev)
+            return dev;
+    }
+
+    return NULL;
+}
+
+PCIBus *dt_get_pcibus(dt_device *dev)
+{
+    dt_device *bus = dt_parent_device(dev);
+
+    return bus->drv->get_pcibus(bus);
+}
+
+static dt_device *dt_new_device(tree *conf, const dt_driver *drv)
+{
+    dt_device *dev;
+    tree_prop *prop;
+
+    dev = qemu_malloc(sizeof(*dev));
+    dev->conf = conf;
+    dev->drv = drv;
+    LIST_INIT(&dev->reqs);
+    dev->visit = 0;
+    dev->priv = qemu_malloc(drv->privsz);
+    tree_put_user(conf, dev);
+
+    TREE_FOREACH_PROP(prop, conf)
+        dt_parse_prop(dev, prop);
+
+    return dev;
+}
+
+static dt_device *dt_create(tree *conf, const dt_driver drvtab[])
+{
+    const dt_driver *drv;
+    dt_device *dev;
+    tree *kid;
+
+    drv = dt_driver_by_name(tree_node_name(conf), drvtab);
+    if (!drv) {
+        fprintf(stderr, "No driver for device %s\n",
+                tree_node_name(conf));
+        exit(1);
+    }
+
+    assert((drv->bus_type == DT_BUS_PCI) == (drv->get_pcibus != NULL));
+
+    dev = dt_new_device(conf, drv);
+
+    TREE_FOREACH_KID(kid, conf)
+        dt_create(kid, drvtab);
+
+    return dev;
+}
+
+static void dt_config(tree *conf, dt_host *host)
+{
+    dt_device *dev = dt_device_of(conf);
+    dt_device *bus = dt_parent_device(dev);
+    tree *kid;
+
+    if (dev->drv->parent_bus_type == DT_BUS_NONE
+        ? bus != NULL
+        : bus == NULL || bus->drv->bus_type != dev->drv->parent_bus_type) {
+        fprintf(stderr, "Device %s is not on a suitable bus\n",
+                dev->drv->name);
+        exit(1);
+    }
+
+    if (dev->drv->config) {
+        if (dev->drv->config(dev, host))
+            return;
+    }
+
+    TREE_FOREACH_KID(kid, conf)
+        dt_config(kid, host);
+}
+
+tree *dt_require_named(dt_device *dev, const char *reqname)
+{
+    dt_tree_list *l = qemu_malloc(sizeof(*l));
+
+    l->conf = tree_node_by_name(dev->conf, reqname);
+    LIST_INSERT_HEAD(&dev->reqs, l, link);
+    return l->conf;
+}
+
+static void dt_do_visit(dt_device *dev,
+                        void (*fun)(dt_device *, void *arg),
+                        void *arg, int visit)
+{
+    dt_device *parent, *req, *kid;
+    dt_tree_list *l;
+    tree *k;
+
+    assert(dev->visit < visit - 1);
+    dev->visit = visit - 1;
+    parent = dt_parent_device(dev);
+    if (parent && parent->visit < visit)
+        dt_do_visit(parent, fun, arg, visit);
+    LIST_FOREACH(l, &dev->reqs, link) {
+        req = dt_device_of(l->conf);
+        if (req->visit < visit)
+            dt_do_visit(req, fun, arg, visit);
+    }
+    dev->visit = visit;
+    fun(dev, arg);
+    TREE_FOREACH_KID(k, dev->conf) {
+        kid = dt_device_of(k);
+        if (kid->visit < visit - 1)
+            dt_do_visit(kid, fun, arg, visit);
+    }
+}
+
+static void dt_visit(tree *node,
+                     void (*fun)(dt_device *, void *arg),
+                     void *arg)
+{
+    static int visit;
+
+    visit += 2;
+    dt_do_visit(dt_device_of(node), fun, arg, visit);
+}
+
+static void dt_init_visitor(dt_device *dev, void *arg)
+{
+    if (dev->drv->init)
+        dev->drv->init(dev);
+}
+
+static void dt_init(tree *conf)
+{
+    dt_visit(conf, dt_init_visitor, NULL);
+}
+
+static void dt_start(tree *conf)
+{
+    dt_device *dev = dt_device_of(conf);
+    tree *kid;
+
+    if (dev && dev->drv->start)
+        dev->drv->start(dev);
+
+    TREE_FOREACH_KID(kid, conf)
+        dt_start(kid);
+}
+
+void dt_create_machine(tree *conf, dt_host *host,
+                       const dt_driver drvtab[], int vga_ram_size)
+{
+    dt_create(conf, drvtab);
+    dt_add_dyn_devs(conf, host, drvtab, vga_ram_size);
+    dt_config(conf, host);
+    tree_print(conf);
+    dt_print_host_config(host);
+    dt_fdt_test(conf);
+    dt_init(conf);
+    dt_start(conf);
+}
+
+
+/* Device properties */
+
+static const dt_prop_spec *dt_prop_spec_by_name(const dt_driver *drv,
+                                                const char *name)
+{
+    const dt_prop_spec *spec;
+
+    for (spec = drv->prop_spec; spec && spec->name; spec++) {
+        if (!strcmp(spec->name, name))
+            return spec;
+    }
+    return NULL;
+}
+
+static void dt_parse_prop(dt_device *dev, tree_prop *prop)
+{
+    const char *name = tree_prop_name(prop);
+    size_t size;
+    const char *val = tree_prop_value(prop, &size);
+    const dt_prop_spec *spec = dt_prop_spec_by_name(dev->drv, name);
+
+    if (!spec) {
+        fprintf(stderr, "A %s device has no property %s\n",
+                dev->drv->name, name);
+        exit(1);
+    }
+
+    if (memchr(val, 0, size) != val + size - 1
+        || spec->parse((char *)dev->priv + spec->offs, val, spec) < 0) {
+        fprintf(stderr, "Bad value %.*s for property %s of device %s\n",
+                (int)size, val, name, dev->drv->name);
+        exit(1);
+    }
+}
+
+int dt_parse_string(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    assert(spec->size == sizeof(char *));
+    *(const char **)dst = src;
+    return 0;
+}
+
+int dt_parse_int(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    char *ep;
+    long val;
+
+    assert(spec->size == sizeof(int));
+    errno = 0;
+    val = strtol(src, &ep, 0);
+    if (*ep || ep == src || errno || (int)val != val)
+        return -1;
+    *(int *)dst = val;
+    return 0;
+}
+
+int dt_parse_ram_addr_t(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    char *ep;
+    unsigned long val;
+
+    assert(spec->size == sizeof(ram_addr_t));
+    errno = 0;
+    val = strtoul(src, &ep, 0);
+    if (*ep || ep == src || errno || (ram_addr_t)val != val)
+        return -1;
+    *(ram_addr_t *)dst = val;
+    return 0;
+}
+
+int dt_parse_macaddr(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    assert(spec->size == 6);
+    if (parse_macaddr(dst, src) < 0)
+        return -1;
+    return 0;
+}
+
+
+/* Dynamic Devices */
+
+static void dt_add_dyn_dev(tree *conf, tree *node, const dt_driver drvtab[])
+{
+    dt_device *dev = dt_create(node, drvtab);
+    dt_device *bus = dt_find_bus(conf, dev->drv->parent_bus_type);
+                                /* TODO multiple buses */
+
+    if (!bus) {
+        fprintf(stderr, "No suitable bus for device %s\n", dev->drv->name);
+        exit(1);
+    }
+
+    tree_insert(bus->conf, node);
+}
+
+static void dt_add_vga(tree *conf, const dt_driver drvtab[],
+                       const char *model, int vga_ram_size)
+{
+    tree *node = tree_new_kid(NULL, "vga", NULL);
+
+    tree_put_propf(node, "model", "%s", model);
+    tree_put_propf(node, "ram", "%#x", vga_ram_size);
+    dt_add_dyn_dev(conf, node, drvtab);
+}
+
+static void dt_add_nic(tree *conf, dt_host *host, const dt_driver drvtab[],
+                       int index)
+{
+    NICInfo *n = &nd_table[index];
+    tree *node;
+
+    node = tree_new_kid(NULL, "nic", NULL);
+    tree_put_propf(node, "mac", "%02x:%02x:%02x:%02x:%02x:%02x",
+                   n->macaddr[0], n->macaddr[1], n->macaddr[2],
+                   n->macaddr[3], n->macaddr[4], n->macaddr[5]);
+    tree_put_propf(node, "model", "%s",
+                   n->model ? n->model : "ne2k_pci");
+    if (n->name)
+        tree_put_propf(node, "name", "%s", n->name);
+    dt_add_dyn_dev(conf, node, drvtab);
+    dt_attach_nic(host, index, node, n->vlan);
+}
+
+static const char *block_if_name[] = {
+    [IF_IDE] = "ide",
+    [IF_SCSI] = "scsi",
+    [IF_FLOPPY] = "floppy",
+    [IF_PFLASH] = "pflash",
+    [IF_MTD] = "mtd",
+    [IF_SD] = "sd",
+    [IF_VIRTIO] = "virtio",
+};
+
+static void dt_add_drive(tree *conf, dt_host *host, const dt_driver drvtab[],
+                         int index)
+{
+    DriveInfo *d = &drives_table[index];
+    int bus, unit;
+    char buf[32];
+    tree *node;
+
+    unit = d->unit;
+    bus = d->bus;
+
+    switch (d->type) {
+    case IF_IDE:
+        /* hack to hang all IDE drives off the same node for now */
+        unit = bus * MAX_IDE_DEVS + unit;
+        bus = 0;
+        /* fall through */
+    case IF_SCSI:
+    case IF_FLOPPY:
+    case IF_VIRTIO:
+        if (bus != 0) {         /* TODO implement */
+            fprintf(stderr, "Bus#%d not implemented, ignoring drive %s\n",
+                    bus, drives_opt[drives_table[index].drive_opt_idx].opt);
+            break;
+        }
+        snprintf(buf, sizeof(buf), "%s-drive", block_if_name[d->type]);
+        node = tree_new_kid(NULL, strdup(buf), NULL);
+        tree_put_propf(node, "unit", "%d", unit);
+        dt_add_dyn_dev(conf, node, drvtab);
+        dt_attach_drive(host, index, node, d->bdrv);
+        break;
+    case IF_PFLASH:
+    case IF_MTD:
+    case IF_SD:
+        /* TODO implement */
+        fprintf(stderr, "Ignoring unimplemented drive %s\n",
+                drives_opt[drives_table[index].drive_opt_idx].opt);
+        break;
+    }
+}
+
+static void dt_add_dyn_devs(tree *conf, dt_host *host,
+                            const dt_driver drvtab[], int vga_ram_size)
+{
+    int i;
+
+    if (cirrus_vga_enabled || vmsvga_enabled || std_vga_enabled) {
+        dt_add_vga(conf, drvtab,
+                   cirrus_vga_enabled ? "cirrus"
+                   : vmsvga_enabled ? "vms" : "std",
+                   vga_ram_size);
+    }
+
+    for (i = 0; i < nb_nics; i++)
+        dt_add_nic(conf, host, drvtab, i);
+
+    for (i = 0; i < nb_drives; i++)
+        dt_add_drive(conf, host, drvtab, i);
+}
+
+
+/* Interfacing with FDT */
+
+/*
+ * Note: translation to FDT loses the association between
+ * configuration tree nodes and devices.
+ */
+
+#ifdef HAVE_FDT
+
+static int dt_fdt_chk(int res);
+static void dt_subtree_to_fdt(const tree *conf, void *fdt);
+
+static void *dt_tree_to_fdt(const tree *conf)
+{
+    int sz = 1024 * 1024;       /* FIXME arbitrary limit */
+    void *fdt = qemu_malloc(sz);
+
+    dt_fdt_chk(fdt_create(fdt, sz));
+    dt_subtree_to_fdt(conf, fdt);
+    dt_fdt_chk(fdt_finish(fdt));
+    return fdt;
+}
+
+static void dt_subtree_to_fdt(const tree *conf, void *fdt)
+{
+    tree_prop *prop;
+    tree *kid;
+    const void *pv;
+    size_t sz;
+
+    dt_fdt_chk(fdt_begin_node(fdt, tree_node_name(conf)));
+    TREE_FOREACH_PROP(prop, conf) {
+        pv = tree_prop_value(prop, &sz);
+        dt_fdt_chk(fdt_property(fdt, tree_prop_name(prop), pv, sz));
+    }
+    TREE_FOREACH_KID(kid, conf)
+        dt_subtree_to_fdt(kid, fdt);
+    dt_fdt_chk(fdt_end_node(fdt));
+}
+
+static tree *dt_fdt_to_tree(const void *fdt)
+{
+    int offs, next, depth;
+    uint32_t tag;
+    struct fdt_property *prop;
+    tree *stack[32];            /* FIXME arbitrary limit */
+
+    stack[0] = NULL;            /* "parent" of root */
+    next = depth = 0;
+
+    for (;;) {
+        offs = next;
+        tag = fdt_next_tag(fdt, offs, &next);
+        switch (tag) {
+        case FDT_PROP:
+            /*
+             * libfdt apparently doesn't provide a way to get property
+             * by offset, do it by hand
+             */
+            assert(0 < depth && depth < sizeof(stack) / sizeof(*stack));
+            prop = (void *)((const char *)fdt + fdt_off_dt_struct(fdt) + offs);
+            tree_put_prop(stack[depth],
+                          fdt_string(fdt, fdt32_to_cpu(prop->nameoff)),
+                          prop->data,
+                          fdt32_to_cpu(prop->len)); /* fall through */
+        case FDT_NOP:
+            break;
+        case FDT_BEGIN_NODE:
+            depth++;
+            assert(0 < depth && depth < sizeof(stack) / sizeof(*stack));
+            stack[depth] = tree_new_kid(stack[depth-1],
+                                        fdt_get_name(fdt, offs, NULL),
+                                        NULL);
+            break;
+        case FDT_END_NODE:
+            depth--;
+            break;
+        case FDT_END:
+            dt_fdt_chk(next);
+            return stack[1];
+        }
+    }
+}
+
+static int dt_fdt_chk(int res)
+{
+    if (res < 0) {
+        fprintf(stderr, "%s\n", fdt_strerror(res)); /* FIXME cryptic */
+        exit(1);
+    }
+    return res;
+}
+
+static void dt_fdt_test(tree *conf)
+{
+    void *fdt;
+
+    fdt = dt_tree_to_fdt(conf);
+    conf = dt_fdt_to_tree(fdt);
+    tree_print(conf);
+    qemu_free(fdt);
+}
+#else
+static void dt_fdt_test(tree *conf) { }
+#endif
diff --git a/dt.h b/dt.h
new file mode 100644
index 0000000..bc98c3f
--- /dev/null
+++ b/dt.h
@@ -0,0 +1,117 @@
+#ifndef DT_H
+#define DT_H
+
+#include "sysemu.h"
+#include "net.h"
+#include "tree.h"
+
+typedef struct dt_host dt_host;
+typedef struct dt_device dt_device;
+typedef struct dt_tree_list dt_tree_list;
+typedef struct dt_driver dt_driver;
+typedef struct dt_prop_spec dt_prop_spec;
+
+
+/* Host Configuration */
+
+struct dt_host {
+    /* connection NIC <-> VLAN */
+    tree *nic[MAX_NICS];
+    VLANState *nic_vlan[MAX_NICS];
+    /* connection drive <-> block driver state */
+    tree *drive[MAX_DRIVES];
+    BlockDriverState *drive_state[MAX_DRIVES];
+};
+
+void dt_attach_nic(dt_host *host, int index, tree *nic, VLANState *vlan);
+VLANState *dt_find_vlan(tree *conf, dt_host *host);
+void dt_attach_drive(dt_host *host, int index,
+                     tree *controller, BlockDriverState *state);
+void dt_drive_config(tree *conf, dt_host *host,
+                     BlockDriverState *drive[], int n);
+
+
+/* Device Interface */
+
+/*
+ * Device life cycle:
+ *
+ * 1. Configuration: config() method runs after parent's, except that
+ * kids are skipped when the parent's config() returns non-zero.  config()
+ * should initialize the device's private data from its configuration
+ * sub-tree.  It may edit the configuration sub-tree, and may declare
+ * initialization ordering constraints with dt_require_named().
+ *
+ * 2. Initialization: init() method runs after parent's and after that
+ * of devices declared required by config().  It should not touch the
+ * configuration tree.
+ *
+ * 3. Start: start() method runs; order is unspecified.
+ *
+ * Error handling in these driver methods: print to stderr and exit
+ * the program unsuccessfully.
+ *
+ * There is no device shutdown protocol yet.
+ */
+
+struct dt_device {
+    tree *conf;                 /* configuration sub-tree */
+    const dt_driver *drv;       /* device driver */
+    LIST_HEAD(, dt_tree_list) reqs; /* required devices */
+    int visit;                  /* for dt_visit() */
+    void *priv;                 /* device private data */
+};
+
+struct dt_tree_list {
+    tree *conf;
+    LIST_ENTRY(dt_tree_list) link;
+};
+
+typedef enum dt_bus_type {
+    DT_BUS_NONE, DT_BUS_ROOT, DT_BUS_PCI, DT_BUS_IDE, DT_BUS_FLOPPY,
+} dt_bus_type;
+
+struct dt_driver {
+    const char *name;
+    size_t privsz;              /* size of device private data */
+    const dt_prop_spec *prop_spec; /* recognized conf node properties */
+    dt_bus_type bus_type, parent_bus_type;
+    int (*config)(dt_device *, dt_host *);
+    void (*init)(dt_device *);
+    void (*start)(dt_device *);
+    PCIBus *(*get_pcibus)(dt_device *); /* iff device is a PCI bus */
+};
+
+dt_device *dt_device_of(tree *conf);
+dt_device *dt_parent_device(dt_device *dev);
+PCIBus *dt_get_pcibus(dt_device *dev);
+tree *dt_require_named(dt_device *dev, const char *reqname);
+void dt_create_machine(tree *conf, dt_host *host,
+                       const dt_driver drvtab[], int vga_ram_size);
+
+
+/* Device properties */
+
+/*
+ * This is for parsing configuration tree node properties into device
+ * private data.
+ */
+
+struct dt_prop_spec {
+    const char *name;
+    ptrdiff_t offs;             /* offset in device private data */
+    size_t size;                /* size there, for sanity checking */
+    int (*parse)(void *, const char *, const dt_prop_spec *);
+};
+
+#define DT_PROP_SPEC_INIT(name, strty, member, fmt)                     \
+    { name, offsetof(strty, member), sizeof(((strty *)0)->member),      \
+      dt_parse_##fmt }
+
+/* Canned property parse methods */
+int dt_parse_string(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_int(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_ram_addr_t(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_macaddr(void *dst, const char *src, const dt_prop_spec *spec);
+
+#endif
diff --git a/hw/boards.h b/hw/boards.h
index 1e62594..d75e518 100644
--- a/hw/boards.h
+++ b/hw/boards.h
@@ -35,6 +35,9 @@ extern QEMUMachine axisdev88_machine;
 extern QEMUMachine pc_machine;
 extern QEMUMachine isapc_machine;
 
+/* pcdt.c */
+extern QEMUMachine pcdt_machine;
+
 /* ppc.c */
 extern QEMUMachine prep_machine;
 extern QEMUMachine core99_machine;
diff --git a/hw/pc.c b/hw/pc.c
index 3849390..3a9197a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -37,45 +37,38 @@
 #include "virtio-balloon.h"
 #include "virtio-console.h"
 #include "hpet_emul.h"
+#include "pcint.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
 
-#define BIOS_FILENAME "bios.bin"
-#define VGABIOS_FILENAME "vgabios.bin"
-#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
-
-#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
-
 /* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables.  */
 #define ACPI_DATA_SIZE       0x10000
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 
-#define MAX_IDE_BUS 2
-
 extern uint8_t *acpi_tables;
 extern size_t acpi_tables_len;
 
-static fdctrl_t *floppy_controller;
-static RTCState *rtc_state;
+fdctrl_t *floppy_controller;
+RTCState *rtc_state;
 static PITState *pit;
 static IOAPICState *ioapic;
-static PCIDevice *i440fx_state;
+PCIDevice *i440fx_state;
 
-static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
 {
 }
 
 /* MSDOS compatibility mode FPU exception support */
-static qemu_irq ferr_irq;
+qemu_irq ferr_irq;
 /* XXX: add IGNNE support */
 void cpu_set_ferr(CPUX86State *s)
 {
     qemu_irq_raise(ferr_irq);
 }
 
-static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
 {
     qemu_irq_lower(ferr_irq);
 }
@@ -124,7 +117,7 @@ int cpu_get_pic_interrupt(CPUState *env)
     return intno;
 }
 
-static void pic_irq_request(void *opaque, int irq, int level)
+void pic_irq_request(void *opaque, int irq, int level)
 {
     CPUState *env = first_cpu;
 
@@ -170,7 +163,7 @@ static int cmos_get_fd_drive_type(int fd0)
     return val;
 }
 
-static void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
 {
     RTCState *s = rtc_state;
     int cylinders, heads, sectors;
@@ -206,7 +199,7 @@ static int boot_device2nibble(char boot_device)
 
 /* copy/pasted from cmos_init, should be made a general function
  and used there as well */
-static int pc_boot_set(void *opaque, const char *boot_device)
+int pc_boot_set(void *opaque, const char *boot_device)
 {
 #define PC_MAX_BOOT_DEVICES 3
     RTCState *s = (RTCState *)opaque;
@@ -232,8 +225,8 @@ static int pc_boot_set(void *opaque, const char *boot_device)
 }
 
 /* hd_table must contain 4 block drivers */
-static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
-                      const char *boot_device, BlockDriverState **hd_table)
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table)
 {
     RTCState *s = rtc_state;
     int nbds, bds[3] = { 0, };
@@ -366,13 +359,13 @@ int ioport_get_a20(void)
     return ((first_cpu->a20_mask >> 20) & 1);
 }
 
-static void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
 {
     ioport_set_a20((val >> 1) & 1);
     /* XXX: bit 0 is fast reset */
 }
 
-static uint32_t ioport92_read(void *opaque, uint32_t addr)
+uint32_t ioport92_read(void *opaque, uint32_t addr)
 {
     return ioport_get_a20() << 1;
 }
@@ -424,7 +417,7 @@ static void bochs_bios_write(void *opaque, uint32_t addr, uint32_t val)
     }
 }
 
-static void bochs_bios_init(void)
+void bochs_bios_init(void)
 {
     void *fw_cfg;
 
@@ -692,7 +685,7 @@ static void load_linux(uint8_t *option_rom,
     generate_bootsect(option_rom, gpr, seg, 0);
 }
 
-static void main_cpu_reset(void *opaque)
+void main_cpu_reset(void *opaque)
 {
     CPUState *env = opaque;
     cpu_reset(env);
@@ -707,11 +700,11 @@ static const int ide_irq[2] = { 14, 15 };
 static int ne2000_io[NE2000_NB_MAX] = { 0x300, 0x320, 0x340, 0x360, 0x280, 0x380 };
 static int ne2000_irq[NE2000_NB_MAX] = { 9, 10, 11, 3, 4, 5 };
 
-static int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
-static int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
+int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
+int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
 
-static int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
-static int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
+int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
+int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
 
 #ifdef HAS_AUDIO
 static void audio_init (PCIBus *pci_bus, qemu_irq *pic)
diff --git a/hw/pcdt.c b/hw/pcdt.c
new file mode 100644
index 0000000..c8a1505
--- /dev/null
+++ b/hw/pcdt.c
@@ -0,0 +1,645 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * This is a PC configured and built using the new dt_ infrastructure.
+ * Having two PC machine types makes no sense in the long run, of
+ * course.  We want to replace pc.c eventually, and also convert other
+ * machine types to this infrastructure.
+ *
+ * The configuration data currently is hardwired, and fairly limited.
+ *
+ * The nuts and bolts of PC emulation remain in pc.c for now, and
+ * using the stuff there makes the somewhat clumsy pcint.h necessary.
+ */
+
+#include <assert.h>
+#include "hw.h"
+#include "pc.h"
+#include "fdc.h"
+#include "pci.h"
+#include "block.h"
+#include "sysemu.h"
+#include "audio/audio.h"
+#include "net.h"
+#include "smbus.h"
+#include "boards.h"
+#include "console.h"
+#include "fw_cfg.h"
+#include "virtio-blk.h"
+#include "virtio-balloon.h"
+#include "virtio-console.h"
+#include "hpet_emul.h"
+#include "pcint.h"
+#include "dt.h"
+
+static BlockDriverState **dt_piix3_hd(tree *piix3);
+
+/* CPUs Driver */
+
+typedef struct dt_device_cpus {
+    const char *model;
+    int num;
+} dt_device_cpus;
+
+static const dt_prop_spec dt_cpus_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
+    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
+    { NULL, 0, 0, NULL } };     /* sentinel: terminates property lookup */
+
+static void dt_cpus_init(dt_device *dev)
+{
+    dt_device_cpus *priv = dev->priv;
+    int i;
+    CPUState *env;
+
+    for (i = 0; i < priv->num; i++) {
+        env = cpu_init(priv->model);
+        if (!env) {
+            fprintf(stderr, "Unable to find x86 CPU definition\n");
+            exit(1);
+        }
+        if (i != 0)
+            env->halted = 1;
+        qemu_register_reset(main_cpu_reset, env);
+    }
+}
+
+
+/* Memory Ranges */
+
+typedef struct dt_device_memrng {
+    target_phys_addr_t phys_addr;
+    ram_addr_t size;
+    ram_addr_t host_offs;
+    ram_addr_t flags;
+} dt_device_memrng;
+
+static void dt_memrng(dt_device_memrng *rng,
+                      target_phys_addr_t phys_addr, ram_addr_t size,
+                      ram_addr_t host_offs, ram_addr_t flags)
+{
+    rng->phys_addr = phys_addr;
+    rng->size = size;
+    rng->host_offs = host_offs;
+    rng->flags = flags;
+}
+
+static void dt_memrng_ram(dt_device_memrng *rng,
+                          target_phys_addr_t phys_addr, ram_addr_t size)
+{
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), 0);
+}
+
+static void dt_memrng_rom(dt_device_memrng *rng,
+                          target_phys_addr_t phys_addr, ram_addr_t maxsz,
+                          const char *dir, const char *image, int top)
+{
+    char buf[1024];
+    int size;
+
+    snprintf(buf, sizeof(buf), "%s/%s", dir, image);
+    size = get_image_size(buf);
+    if (size < 0 || size > maxsz)
+        goto error;
+    if (top)
+        phys_addr = phys_addr + maxsz - size;
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), IO_MEM_ROM);
+    if (load_image(buf, phys_ram_base + rng->host_offs) != size)
+        goto error;
+    return;
+
+error:
+    fprintf(stderr, "qemu: could not load image '%s'\n", buf);
+    exit(1);
+}
+
+static void dt_memrng_init(dt_device_memrng *rng, int n)
+{
+    int i;
+
+    for (i = 0; i < n; i++)
+        cpu_register_physical_memory(rng[i].phys_addr, rng[i].size,
+                                     rng[i].host_offs | rng[i].flags);
+}
+
+
+/* Memory Driver */
+
+typedef struct dt_device_memory {
+    ram_addr_t ram_size;
+    dt_device_memrng *rng;
+    int nrng;
+    /* TODO want a real memory map here */
+    ram_addr_t below_4g, above_4g;
+} dt_device_memory;
+
+static const dt_prop_spec dt_memory_props[] = {
+    DT_PROP_SPEC_INIT("ram", dt_device_memory, ram_size, ram_addr_t),
+    { NULL, 0, 0, NULL } };     /* sentinel: terminates property lookup */
+
+static int dt_memory_config(dt_device *dev, dt_host *host)
+{
+    /* TODO memory map hardcoded; get it from dev->conf instead */
+    dt_device_memory *priv = dev->priv;
+    dt_device_memrng *rng = qemu_malloc(sizeof(*rng) * 4);
+
+    if (priv->ram_size >= 0xe0000000) {
+        priv->above_4g = priv->ram_size - 0xe0000000;
+        priv->below_4g = 0xe0000000;
+    } else {
+        priv->below_4g = priv->ram_size;
+        priv->above_4g = 0;
+    }
+
+    dt_memrng_ram(&rng[0], 0, 0xa0000);
+    qemu_ram_alloc(0x60000);
+    dt_memrng_ram(&rng[1], 0x100000, priv->below_4g - 0x100000);
+    if (priv->above_4g)
+        abort();                /* TODO */
+    dt_memrng_rom(&rng[2], 0xe0000000, 0x20000000,
+                  bios_dir, BIOS_FILENAME, 1);
+                                /* TODO get name from dev->conf */
+    dt_memrng(&rng[3], 0xe0000, 0x20000,
+              rng[2].host_offs + rng[2].size - 0x20000, IO_MEM_ROM);
+    /* TODO option ROMs */
+
+    priv->rng = rng;
+    priv->nrng = 4;
+    return 0;
+}
+
+static void dt_memory_init(dt_device *dev)
+{
+    dt_device_memory *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, priv->nrng);
+    bochs_bios_init();
+}
+
+static ram_addr_t dt_memory_below_4g(tree *memory)
+{
+    dt_device *dev = dt_device_of(memory);
+    dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->below_4g;
+}
+
+static ram_addr_t dt_memory_above_4g(tree *memory)
+{
+    dt_device *dev = dt_device_of(memory);
+    dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->above_4g;
+}
+
+
+/* PC Miscellaneous Driver */
+
+/*
+ * This is a driver for a whole collection of devices.  Could be
+ * picked apart into separate drivers, I guess.
+ */
+
+typedef struct dt_device_pc_misc {
+    const char *boot_device;
+    int apic;
+    int hpet;
+    qemu_irq *i8259;
+    BlockDriverState *fd[MAX_FD];
+} dt_device_pc_misc;
+
+static const dt_prop_spec dt_pc_misc_props[] = {
+    DT_PROP_SPEC_INIT("boot-device", dt_device_pc_misc, boot_device,
+                      string),
+    { NULL, 0, 0, NULL } };     /* sentinel: terminates property lookup */
+
+static int dt_pc_misc_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pc_misc *priv = dev->priv;
+
+    priv->apic = 1;
+    priv->hpet = 1;
+    priv->i8259 = NULL;
+    dt_drive_config(dev->conf, host,
+                    priv->fd, sizeof(priv->fd) / sizeof(*priv->fd));
+    return 1;
+}
+
+static void dt_pc_misc_init(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    CPUState *env;
+    qemu_irq *cpu_irq;
+    IOAPICState *ioapic;
+    PITState *pit;
+    int i;
+
+    if (priv->apic) {
+        for (env = first_cpu; env; env = env->next_cpu) {
+            env->cpuid_features |= CPUID_APIC;
+            apic_init(env);
+        }
+    }
+
+    vmport_init();
+
+    cpu_irq = qemu_allocate_irqs(pic_irq_request, NULL, 1);
+    priv->i8259 = i8259_init(cpu_irq[0]);
+    ferr_irq = priv->i8259[13];
+
+    register_ioport_write(0x80, 1, 1, ioport80_write, NULL);
+    register_ioport_write(0xf0, 1, 1, ioportF0_write, NULL);
+
+    rtc_state = rtc_init(0x70, priv->i8259[8], 2000);
+    qemu_register_boot_set(pc_boot_set, rtc_state);
+
+    register_ioport_read(0x92, 1, 1, ioport92_read, NULL);
+    register_ioport_write(0x92, 1, 1, ioport92_write, NULL);
+
+    if (priv->apic) {
+        ioapic = ioapic_init();
+        pic_set_alt_irq_func(isa_pic, ioapic_set_irq, ioapic);
+    }
+
+    pit = pit_init(0x40, priv->i8259[0]);
+    pcspk_init(pit);
+    if (priv->hpet)
+        hpet_init(priv->i8259);
+
+    for (i = 0; i < MAX_SERIAL_PORTS; i++) {
+        if (serial_hds[i]) {
+            serial_init(serial_io[i], priv->i8259[serial_irq[i]], 115200,
+                        serial_hds[i]);
+        }
+    }
+
+    for (i = 0; i < MAX_PARALLEL_PORTS; i++) {
+        if (parallel_hds[i]) {
+            parallel_init(parallel_io[i], priv->i8259[parallel_irq[i]],
+                          parallel_hds[i]);
+        }
+    }
+
+    i8042_init(priv->i8259[1], priv->i8259[12], 0x60);
+    DMA_init(0);
+
+    floppy_controller = fdctrl_init(priv->i8259[6], 2, 0, 0x3f0, priv->fd);
+}
+
+static void dt_pc_misc_start(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    tree *memory = tree_node_by_name(dev->conf, "/memory");
+    tree *piix3 = tree_node_by_name(dev->conf, "/pci/piix3");
+
+    cmos_init(dt_memory_below_4g(memory),
+              dt_memory_above_4g(memory),
+              priv->boot_device,
+              dt_piix3_hd(piix3));
+}
+
+static qemu_irq *dt_pc_misc_i8259(tree *pc_misc)
+{
+    dt_device *dev = dt_device_of(pc_misc);
+    dt_device_pc_misc *priv = dev->priv;
+    assert(dev->drv->init == dt_pc_misc_init);
+    return priv->i8259;
+}
+
+
+/* PCI Bus Driver */
+
+typedef struct dt_device_pci {
+    PCIBus *pcibus;
+    tree *pc;
+} dt_device_pci;
+
+static int dt_pci_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->pcibus = NULL;
+    priv->pc = dt_require_named(dev, "/pc-misc");
+    return 0;
+}
+
+static void dt_pci_init(dt_device *dev)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->pcibus = i440fx_init(&i440fx_state, dt_pc_misc_i8259(priv->pc));
+}
+
+static void dt_pci_start(dt_device *dev)
+{
+    i440fx_init_memory_mappings(i440fx_state);
+}
+
+static PCIBus *dt_pci_get_pcibus(dt_device *dev)
+{
+    return ((dt_device_pci *)dev->priv)->pcibus;
+}
+
+
+/* PIIX3 Driver */
+
+typedef struct dt_device_piix3 {
+    int devfn;
+    int acpi;
+    int usb;
+    tree *pc;
+    BlockDriverState *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
+} dt_device_piix3;
+
+static int dt_piix3_config(dt_device *dev, dt_host *host)
+{
+    dt_device_piix3 *priv = dev->priv;
+
+    priv->devfn = -1;
+    priv->acpi = 1;
+    priv->usb = 1;
+    priv->pc = dt_require_named(dev, "/pc-misc");
+    dt_drive_config(dev->conf, host,
+                    priv->hd, sizeof(priv->hd) / sizeof(*priv->hd));
+    return 1;
+}
+
+static void dt_piix3_init(dt_device *dev)
+{
+    dt_device_piix3 *priv = dev->priv;
+    PCIBus *pci_bus = dt_get_pcibus(dev);
+    qemu_irq *i8259 = dt_pc_misc_i8259(priv->pc);
+    int i;
+
+    priv->devfn = piix3_init(pci_bus, priv->devfn);
+
+    pci_piix3_ide_init(pci_bus, priv->hd, priv->devfn + 1, i8259);
+
+    if (priv->usb)
+        usb_uhci_piix3_init(pci_bus, priv->devfn + 2);
+
+    if (priv->acpi) {
+        uint8_t *eeprom_buf = qemu_mallocz(8 * 256); /* XXX: make this persistent */
+        i2c_bus *smbus;
+
+        /* TODO: Populate SPD eeprom data.  */
+        smbus = piix4_pm_init(pci_bus, priv->devfn + 3, 0xb100, i8259[9]);
+        for (i = 0; i < 8; i++)
+            smbus_eeprom_device_init(smbus, 0x50 + i, eeprom_buf + (i * 256));
+    }
+}
+
+static BlockDriverState **dt_piix3_hd(tree *piix3)
+{
+    dt_device *dev = dt_device_of(piix3);
+    dt_device_piix3 *priv = dev->priv;
+
+    assert(dev->drv->init == dt_piix3_init);
+    return priv->hd;
+}
+
+
+/* VGA Driver */
+
+typedef struct dt_driver_vga {
+    const char *model;
+    const char *bios;
+    void (*init)(PCIBus *, uint8_t *, ram_addr_t, int);
+} dt_driver_vga;
+
+static void pci_vga_init_(PCIBus *bus, uint8_t *vga_ram_base,
+                          ram_addr_t vga_ram_offset, int vga_ram_size)
+{
+    pci_vga_init(bus, vga_ram_base, vga_ram_offset, vga_ram_size, 0, 0);
+}
+
+static dt_driver_vga dt_driver_vga_table[] = {
+    { "cirrus", VGABIOS_CIRRUS_FILENAME, pci_cirrus_vga_init },
+    { "vms", VGABIOS_FILENAME, pci_vmsvga_init },
+    { "std", VGABIOS_FILENAME, pci_vga_init_ },
+    { NULL, NULL, NULL }
+};
+
+typedef struct dt_device_vga {
+    const char *model;
+    ram_addr_t ram_size;
+    dt_device_memrng rng[1];
+    ram_addr_t ram_offs;
+    dt_driver_vga *vga_drv;
+} dt_device_vga;
+
+static const dt_prop_spec dt_vga_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_vga, model, string),
+    DT_PROP_SPEC_INIT("ram", dt_device_vga, ram_size, ram_addr_t),
+    { NULL, 0, 0, NULL } };     /* sentinel: terminates property lookup */
+
+static int dt_vga_config(dt_device *dev, dt_host *host)
+{
+    dt_device_vga *priv = dev->priv;
+    int i;
+
+    dt_memrng_rom(&priv->rng[0], 0xc0000, 0x10000,
+                  bios_dir, VGABIOS_CIRRUS_FILENAME, 0);
+                                /* TODO get name from dev->conf */
+    priv->ram_offs = qemu_ram_alloc(priv->ram_size);
+
+    for (i = 0; dt_driver_vga_table[i].model; i++) {
+        if (!strcmp(dt_driver_vga_table[i].model, priv->model))
+            break;
+    }
+    if (!dt_driver_vga_table[i].model) {
+        fprintf(stderr, "Unknown VGA model %s\n", priv->model);
+        exit(1);
+    }
+    priv->vga_drv = &dt_driver_vga_table[i];
+    return 0;
+}
+
+static void dt_vga_init(dt_device *dev)
+{
+    dt_device_vga *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, 1);
+    priv->vga_drv->init(dt_get_pcibus(dev),
+                        phys_ram_base + priv->ram_offs,
+                        priv->ram_offs, priv->ram_size);
+}
+
+
+/* NIC Driver */
+
+typedef struct dt_device_nic {
+    NICInfo nd;
+} dt_device_nic;
+
+static const dt_prop_spec dt_nic_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_nic, nd.model, string),
+    DT_PROP_SPEC_INIT("mac", dt_device_nic, nd.macaddr, macaddr),
+    DT_PROP_SPEC_INIT("name", dt_device_nic, nd.name, string),
+};
+
+static int dt_nic_config(dt_device *dev, dt_host *host)
+{
+    dt_device_nic *priv = dev->priv;
+
+    priv->nd.vlan = dt_find_vlan(dev->conf, host);
+    return 0;
+}
+
+static void dt_nic_init(dt_device *dev)
+{
+    dt_device_nic *priv = dev->priv;
+
+    pci_nic_init(dt_get_pcibus(dev), &priv->nd, -1, NULL);
+}
+
+
+/* Drive Driver */
+
+typedef struct dt_device_drive {
+    int unit;
+} dt_device_drive;
+
+static const dt_prop_spec dt_drive_props[] = {
+    DT_PROP_SPEC_INIT("unit", dt_device_drive, unit, int),
+};
+
+
+/* Machine Driver */
+
+static const dt_driver dt_driver_table[] = {
+    { "", 0, NULL, DT_BUS_ROOT, DT_BUS_NONE, NULL, NULL, NULL, NULL },
+    { "cpus", sizeof(dt_device_cpus), dt_cpus_props,
+      DT_BUS_NONE, DT_BUS_ROOT,
+      NULL, dt_cpus_init, NULL, NULL },
+    { "memory", sizeof(dt_device_memory), dt_memory_props,
+      DT_BUS_NONE, DT_BUS_ROOT,
+      dt_memory_config, dt_memory_init, NULL, NULL },
+    { "pc-misc", sizeof(dt_device_pc_misc), dt_pc_misc_props,
+      DT_BUS_FLOPPY, DT_BUS_ROOT,
+      dt_pc_misc_config, dt_pc_misc_init, dt_pc_misc_start, NULL },
+    { "pci", sizeof(dt_device_pci), NULL,
+      DT_BUS_PCI, DT_BUS_ROOT,
+      dt_pci_config, dt_pci_init, dt_pci_start, dt_pci_get_pcibus },
+    { "piix3", sizeof(dt_device_piix3), NULL,
+      DT_BUS_IDE, DT_BUS_PCI,
+      dt_piix3_config, dt_piix3_init, NULL, NULL },
+    { "vga", sizeof(dt_device_vga), dt_vga_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_vga_config, dt_vga_init, NULL, NULL },
+    { "nic", sizeof(dt_device_nic), dt_nic_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_nic_config, dt_nic_init, NULL, NULL },
+    { "ide-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_IDE,
+      NULL, NULL, NULL, NULL },
+    { "floppy-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_FLOPPY,
+      NULL, NULL, NULL, NULL },
+    { NULL, 0, NULL, DT_BUS_NONE, DT_BUS_NONE, NULL, NULL, NULL, NULL }
+};
+
+static tree *dt_read_config(void)
+{
+    tree *root, *pci, *leaf;
+
+    /*
+     * TODO Read from config file.
+     *
+     * TODO Pretty far from a comprehensive machine configuration, but
+     * we need to start somewhere.
+     */
+    root = tree_new_kid(NULL, "", NULL);
+    leaf = tree_new_kid(root, "cpus", NULL);
+    tree_put_propf(leaf, "model", "%s", "qemu32");
+    leaf = tree_new_kid(root, "memory", NULL);
+    leaf = tree_new_kid(root, "pc-misc", NULL);
+    pci = tree_new_kid(root, "pci", NULL);
+    leaf = tree_new_kid(pci, "piix3", NULL);
+    return root;
+}
+
+/*
+ * Extract configuration from arguments and various global variables
+ * and put it into our machine and host configuration.
+ */
+static void dt_customize_config(tree *conf,
+                                dt_host *host,
+                                ram_addr_t ram_size, int vga_ram_size,
+                                const char *boot_device,
+                                const char *kernel_filename,
+                                const char *kernel_cmdline,
+                                const char *initrd_filename,
+                                const char *cpu_model)
+{
+    /*
+     * TODO This is still pretty cheesy: we insert stuff into the tree
+     * at hardcoded places.  Replacing placeholders instead would be
+     * more flexible.  Another idea is to mark certain parts of the
+     * initial tree optional, and remove them here.
+     */
+    tree *node;
+
+    node = tree_node_by_name(conf, "/cpus");
+    tree_put_propf(node, "num", "%d", smp_cpus);
+    if (cpu_model)
+        tree_put_propf(node, "model", "%s", cpu_model);
+
+    node = tree_node_by_name(conf, "/memory");
+    tree_put_propf(node, "ram", "%#lx", (unsigned long)ram_size);
+
+    node = tree_node_by_name(conf, "/pc-misc");
+    tree_put_propf(node, "boot-device", "%s", boot_device);
+
+    /* Unimplemented stuff */
+    if (kernel_filename)
+        abort();                /* TODO */
+}
+
+static void pc_init_dt(ram_addr_t ram_size, int vga_ram_size,
+                       const char *boot_device,
+                       const char *kernel_filename,
+                       const char *kernel_cmdline,
+                       const char *initrd_filename,
+                       const char *cpu_model)
+{
+    tree *conf;
+    dt_host host;
+
+    conf = dt_read_config();
+    if (!conf)
+        exit(1);
+    tree_print(conf);
+    memset(&host, 0, sizeof(host));
+    dt_customize_config(conf, &host, ram_size, vga_ram_size, boot_device,
+                        kernel_filename, kernel_cmdline, initrd_filename,
+                        cpu_model);
+    dt_create_machine(conf, &host, dt_driver_table, vga_ram_size);
+}
+
+QEMUMachine pcdt_machine = {
+    .name = "pcdt",
+    .desc = "Standard PC (device tree)",
+    .init = pc_init_dt,
+    .ram_require = VGA_RAM_SIZE + PC_MAX_BIOS_SIZE,
+    .max_cpus = 255,
+};
diff --git a/hw/pci.h b/hw/pci.h
index 56381e8..b9093f9 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -271,7 +271,7 @@ void *lsi_scsi_init(PCIBus *bus, int devfn);
 
 /* vmware_vga.c */
 void pci_vmsvga_init(PCIBus *bus, uint8_t *vga_ram_base,
-                     unsigned long vga_ram_offset, int vga_ram_size);
+                     ram_addr_t vga_ram_offset, int vga_ram_size);
 
 /* usb-uhci.c */
 void usb_uhci_piix3_init(PCIBus *bus, int devfn);
diff --git a/hw/pcint.h b/hw/pcint.h
new file mode 100644
index 0000000..f18da67
--- /dev/null
+++ b/hw/pcint.h
@@ -0,0 +1,46 @@
+/*
+ * Stuff shared by pc.c and dt.c
+ *
+ * See dt.c for why this should go away eventually.
+ */
+
+#ifndef HW_PC_INT_H
+#define HW_PC_INT_H
+
+#define BIOS_FILENAME "bios.bin"
+#define VGABIOS_FILENAME "vgabios.bin"
+#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
+
+#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
+
+#define MAX_IDE_BUS 2
+
+/* TODO move to ferr stuff in cpu.h? */
+extern qemu_irq ferr_irq;
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data);
+
+/* TODO eliminate */
+extern RTCState *rtc_state;
+extern PCIDevice *i440fx_state;
+extern int serial_io[MAX_SERIAL_PORTS];
+extern int serial_irq[MAX_SERIAL_PORTS];
+extern int parallel_io[MAX_PARALLEL_PORTS];
+extern int parallel_irq[MAX_PARALLEL_PORTS];
+extern fdctrl_t *floppy_controller;
+
+/* TODO move to pic stuff in pc.h? */
+void pic_irq_request(void *opaque, int irq, int level);
+
+/* TODO move to a20 stuff in pc.h? */
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val);
+uint32_t ioport92_read(void *opaque, uint32_t addr);
+
+void bochs_bios_init(void);
+void main_cpu_reset(void *opaque);
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data);
+int pc_boot_set(void *opaque, const char *boot_device);
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd);
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table);
+
+#endif
diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index d1cba28..3a7e2d0 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -1113,7 +1113,7 @@ static int vmsvga_load(struct vmsvga_state_s *s, QEMUFile *f)
 }
 
 static void vmsvga_init(struct vmsvga_state_s *s,
-                uint8_t *vga_ram_base, unsigned long vga_ram_offset,
+                uint8_t *vga_ram_base, ram_addr_t vga_ram_offset,
                 int vga_ram_size)
 {
     s->vram = vga_ram_base;
@@ -1207,7 +1207,7 @@ static void pci_vmsvga_map_mem(PCIDevice *pci_dev, int region_num,
 #define PCI_CLASS_HEADERTYPE_00h	0x00
 
 void pci_vmsvga_init(PCIBus *bus, uint8_t *vga_ram_base,
-                     unsigned long vga_ram_offset, int vga_ram_size)
+                     ram_addr_t vga_ram_offset, int vga_ram_size)
 {
     struct pci_vmsvga_state_s *s;
 
diff --git a/net.c b/net.c
index 522df03..8f32f60 100644
--- a/net.c
+++ b/net.c
@@ -153,7 +153,7 @@ static void hex_dump(FILE *f, const uint8_t *buf, int size)
 }
 #endif
 
-static int parse_macaddr(uint8_t *macaddr, const char *p)
+int parse_macaddr(uint8_t *macaddr, const char *p)
 {
     int i;
     char *last_char;
diff --git a/net.h b/net.h
index 03c7f18..f672915 100644
--- a/net.h
+++ b/net.h
@@ -47,6 +47,7 @@ int qemu_can_send_packet(VLANClientState *vc);
 ssize_t qemu_sendv_packet(VLANClientState *vc, const struct iovec *iov,
                           int iovcnt);
 void qemu_send_packet(VLANClientState *vc, const uint8_t *buf, int size);
+int parse_macaddr(uint8_t *macaddr, const char *p);
 void qemu_format_nic_info_str(VLANClientState *vc, uint8_t macaddr[6]);
 void qemu_check_nic_model(NICInfo *nd, const char *model);
 void qemu_check_nic_model_list(NICInfo *nd, const char * const *models,
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 1cf49d5..34a7b4d 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -7,6 +7,7 @@
 
 void register_machines(void)
 {
+    qemu_register_machine(&pcdt_machine);
     qemu_register_machine(&pc_machine);
     qemu_register_machine(&isapc_machine);
 }
diff --git a/tree.c b/tree.c
new file mode 100644
index 0000000..1825dc0
--- /dev/null
+++ b/tree.c
@@ -0,0 +1,285 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/* Ye Olde Decorated Tree */
+
+#include <assert.h>
+#include "tree.h"
+#include "qemu-common.h"
+#include "sys-queue.h"
+
+struct tree {
+    const char *name;
+    LIST_HEAD(, tree_prop) props;
+    tree *parent;
+    TAILQ_HEAD(, tree) kids;
+    TAILQ_ENTRY(tree) siblings;
+    void *user;
+};
+
+struct tree_prop {
+    const char *name;
+    const void *val;
+    int sz;
+    tree *owner;
+    LIST_ENTRY(tree_prop) link;
+};
+
+tree *tree_new_kid(tree *parent, const char *name, void *user)
+{
+    tree *kid = qemu_malloc(sizeof(*kid));
+
+    kid->name = name;
+    LIST_INIT(&kid->props);
+    kid->parent = NULL;
+    TAILQ_INIT(&kid->kids);
+    kid->user = user;
+    if (parent)
+        tree_insert(parent, kid);
+
+    return kid;
+}
+
+void tree_insert(tree *parent, tree *kid)
+{
+    assert(!kid->parent);
+    kid->parent = parent;
+    TAILQ_INSERT_TAIL(&parent->kids, kid, siblings);
+}
+
+const char *tree_node_name(const tree *node)
+{
+    return node->name;
+}
+
+static tree *tree_kid_by_name(const tree *parent, const char *name)
+{
+    const char *slash = strchr(name, '/');
+    size_t len = slash ? slash - name : strlen(name);
+    tree *kid;
+
+    TAILQ_FOREACH(kid, &parent->kids, siblings) {
+        if (!strncmp(kid->name, name, len) && kid->name[len] == 0)
+            return kid;
+    }
+    return NULL;
+}
+
+tree *tree_node_by_name(const tree *node, const char *name)
+{
+    tree *kid;
+    size_t len;
+
+    if (name[0] == '/') {
+        for (; node->parent; node = node->parent) ;
+        while (*name == '/') name++;
+    }
+
+    if (name[0] == 0)
+        return (tree *)node;
+
+    kid = tree_kid_by_name(node, name);
+    if (!kid)
+        return NULL;
+
+    len = strlen(kid->name);
+    if (name[len] == 0)
+        return kid;
+    assert (name[len] == '/');
+
+    while (name[len] == '/') len++;
+    return tree_node_by_name(kid, name + len);
+}
+
+tree_prop *tree_first_prop(const tree *node)
+{
+    return LIST_FIRST(&node->props);
+}
+
+tree_prop *tree_next_prop(const tree_prop *prop)
+{
+    return LIST_NEXT(prop, link);
+}
+
+tree_prop *tree_get_prop(const tree *node, const char *name)
+{
+    tree_prop *prop;
+
+    LIST_FOREACH(prop, &node->props, link) {
+        if (!strcmp(prop->name, name))
+            return prop;
+    }
+    return NULL;
+}
+
+const char *tree_get_prop_s(const tree *node, const char *name)
+{
+    tree_prop *prop = tree_get_prop(node, name);
+    if (!prop
+        || memchr(prop->val, 0, prop->sz) != prop->val + prop->sz - 1) {
+        errno = EINVAL;
+        return NULL;
+    }
+    return prop->val;
+}
+
+const char *tree_prop_name(const tree_prop *prop)
+{
+    return prop->name;
+}
+
+const void *tree_prop_value(const tree_prop *prop, size_t *size)
+{
+    if (size)
+        *size = prop->sz;
+    return prop->val;
+}
+
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz)
+{
+    tree_prop *prop;
+
+    prop = tree_get_prop(node, name);
+    if (!prop) {
+        prop = qemu_malloc(sizeof(*prop));
+        prop->name = name;
+        prop->owner = node;
+        LIST_INSERT_HEAD(&node->props, prop, link);
+    }
+    /* FIXME need a destructor for val */
+    prop->val = val;
+    prop->sz = sz;
+}
+
+void tree_put_propf(tree *node, const char *name, const char *fmt, ...)
+{
+    va_list ap;
+    size_t len;
+    char *buf;
+
+    va_start(ap, fmt);
+    len = vsnprintf(NULL, 0, fmt, ap);
+    va_end(ap);
+
+    buf = qemu_malloc(len + 1);
+    va_start(ap, fmt);
+    vsnprintf(buf, len + 1, fmt, ap);
+    va_end(ap);
+
+    tree_put_prop(node, name, buf, len + 1);
+}
+
+void tree_put_user(tree *node, void *user)
+{
+    node->user = user;
+}
+
+void *tree_get_user(const tree *node)
+{
+    return node->user;
+}
+
+tree *tree_parent(const tree *node)
+{
+    return node->parent;
+}
+
+tree *tree_first_kid(const tree *node)
+{
+    return TAILQ_FIRST(&node->kids);
+}
+
+tree *tree_sibling(const tree *node)
+{
+    return TAILQ_NEXT(node, siblings);
+}
+
+int tree_path(const tree *node, char *buf, size_t bufsz)
+{
+    char *p;
+    const tree *np;
+    size_t len, res;
+
+    p = buf + bufsz;
+    res = 0;
+    for (np = node; np->parent; np = np->parent) {
+        len = 1 + strlen(np->name);
+        res += len;
+        if (res >= bufsz)
+            continue;
+        p -= len;
+        memcpy(p + 1, np->name, len - 1);
+        p[0] = '/';
+    }
+
+    if (res == 0) {
+        if (++res < bufsz)
+            *--p = '/';
+    }
+
+    if (res < bufsz) {
+        memcpy(buf, p, res);
+        buf[res] = 0;
+    }
+
+    return res;
+}
+
+static void tree_print_sub(const tree *node, int indent)
+{
+    int i, use_str, sep;
+    const unsigned char *pv;
+    tree_prop *prop;
+    tree *kid;
+
+    printf("%*s%s {\n", indent, "", node->parent ? node->name : "/");
+    LIST_FOREACH(prop, &node->props, link) {
+        printf("%*s%s", indent + 4, "", prop->name);
+        pv = prop->val;
+        if (pv) {
+            printf(" = ");
+            use_str = pv[prop->sz - 1] == 0;
+            for (i = 0; i < prop->sz - 1; i++) {
+                if (!isprint(pv[i]))
+                    use_str = 0;
+            }
+            if (use_str)
+                printf("\"%s\"", (const char *)prop->val);
+            else {
+                sep = '[';
+                for (i = 0; i < prop->sz; i++) {
+                    printf("%c%02x", sep, pv[i]);
+                    sep = ' ';
+                }
+                printf("]");
+            }
+        }
+        printf(";\n");
+    }
+    TAILQ_FOREACH(kid, &node->kids, siblings)
+        tree_print_sub(kid, indent + 4);
+    printf("%*s};\n", indent, "");
+}
+
+void tree_print(const tree *node)
+{
+    tree_print_sub(node, 0);
+}
diff --git a/tree.h b/tree.h
new file mode 100644
index 0000000..9fc579a
--- /dev/null
+++ b/tree.h
@@ -0,0 +1,41 @@
+#ifndef TREE_H
+#define TREE_H
+
+#include <stddef.h>
+
+typedef struct tree tree;
+typedef struct tree_prop tree_prop;
+
+tree *tree_new_kid(tree *parent, const char *name, void *user);
+void tree_insert(tree *parent, tree *kid);
+const char *tree_node_name(const tree *node);
+tree *tree_node_by_name(const tree *node,
+                        const char *name);
+
+tree_prop *tree_first_prop(const tree *node);
+tree_prop *tree_next_prop(const tree_prop *prop);
+#define TREE_FOREACH_PROP(var, node) \
+    for (var = tree_first_prop(node); var; var = tree_next_prop(var))
+tree_prop *tree_get_prop(const tree *node, const char *name);
+const char *tree_get_prop_s(const tree *node, const char *name);
+const char *tree_prop_name(const tree_prop *prop);
+const void *tree_prop_value(const tree_prop *prop, size_t *size);
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz);
+void tree_put_propf(tree *node, const char *name,
+                    const char *fmt, ...)
+    __attribute__((format(printf,3,4)));
+
+void tree_put_user(tree *node, void *user);
+void *tree_get_user(const tree *node);
+
+tree *tree_parent(const tree *node);
+tree *tree_first_kid(const tree *node);
+tree *tree_sibling(const tree *node);
+#define TREE_FOREACH_KID(var, node) \
+    for (var = tree_first_kid(node); var; var = tree_sibling(var))
+
+int tree_path(const tree *node, char *buf, size_t bufsz);
+void tree_print(const tree *node);
+
+#endif


* [Qemu-devel] Machine description as data prototype, take 6 (was: [RFC] Machine description as data)
From: Markus Armbruster @ 2009-03-12 18:43 UTC
  To: qemu-devel

Sixth iteration of the prototype.  Work in progress, not quite ready for
merging.

New:

* SCSI.

* Multiple buses of the same kind.  Only used by SCSI for now.

* Rebased to git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6816 c046a42c-6fe2-441c-8c8c-71466251a162

Not in, but not forgotten either:

* A few more renames suggested by reviewers.

* Reduce unnecessary differences from IEEE 1275 trees.

Shortcuts:

* No support for systems without PCI bus.

* I didn't implement all the devices of the "pc" original.  Missing:
  - Option ROMs
  - Audio
  - Virtio block, balloon, console

* The configuration tree is simplistic.  I expect it to evolve, and I
  wouldn't exclude the possibility of wholesale replacement.

* The initial configuration tree is hardcoded in hw/pcdt.c.  It should
  be read from a configuration file.

* A bus is identified by its kind and number.  The bus number depends on
  its position in the tree.  A means of position-independent addressing
  would be nice.

* I'm hiding completely behind the existing QEMUMachine init method
  interface, in hw/pcdt.c.  I guess we'll want a QEMUMachine interface
  that allows us to move a bit more code out of hw/.

* The interface to the shared code in hw/pc.c (hw/pcint.h) is rather
  crude.

* The memory driver is PC-specific.  It should be generic and
  data-driven, but getting there isn't quite as easy as it sounds.
  Memory (and sometimes even holes) need to be allocated in just the
  right order to ensure guest physical address equals host offset for
  certain memory ranges.

* The pc-misc driver should most probably be split up some.

* hw/ppce500_mpc8544ds.c doesn't compile when I configure with fdt
  support.
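
To illustrate the bus numbering shortcut above: a bus number is simply
the bus's rank among buses of the same kind in a pre-order walk of the
tree.  The sketch below is a stand-alone approximation of what
dt_do_find_bus() in dt.c does.  The names struct node, enum bus_kind,
find_bus_sub() and find_bus() are made up for this example and are not
part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for dt_bus_type and the tree/dt_device pair */
enum bus_kind { BUS_NONE, BUS_PCI, BUS_SCSI };

struct node {
    const char *name;
    enum bus_kind provides;     /* kind of bus this node offers, if any */
    struct node *first_child;
    struct node *next_sibling;
};

/*
 * Pre-order walk.  *skip counts how many buses of the wanted kind
 * still have to be passed over; it is decremented only when a node
 * of that kind is encountered.
 */
static struct node *find_bus_sub(struct node *n, enum bus_kind kind,
                                 int *skip)
{
    struct node *child, *found;

    if (n->provides == kind && (*skip)-- == 0)
        return n;

    for (child = n->first_child; child; child = child->next_sibling) {
        found = find_bus_sub(child, kind, skip);
        if (found)
            return found;
    }
    return NULL;
}

/* Return the busno-th bus of the given kind, or NULL if there is none */
static struct node *find_bus(struct node *root, enum bus_kind kind,
                             int busno)
{
    assert(busno >= 0);
    return find_bus_sub(root, kind, &busno);
}
```

Because the number is derived from traversal order, inserting another
bus of the same kind earlier in the tree renumbers every later one.
That is why position-independent addressing would be nice.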


 Makefile              |    1 +
 Makefile.target       |    4 +-
 dt.c                  |  645 +++++++++++++++++++++++++++++++++++++++++++++
 dt.h                  |  106 ++++++++
 hw/boards.h           |    3 +
 hw/pc.c               |   47 ++--
 hw/pcdt.c             |  691 +++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pci.h              |    2 +-
 hw/pcint.h            |   46 ++++
 hw/vmware_vga.c       |    4 +-
 net.c                 |    2 +-
 net.h                 |    1 +
 target-i386/machine.c |    1 +
 tree.c                |  285 ++++++++++++++++++++
 tree.h                |   41 +++
 15 files changed, 1847 insertions(+), 32 deletions(-)


diff --git a/Makefile b/Makefile
index 82fec80..04026db 100644
--- a/Makefile
+++ b/Makefile
@@ -85,6 +85,7 @@ OBJS+=sd.o ssi-sd.o
 OBJS+=bt.o bt-host.o bt-vhci.o bt-l2cap.o bt-sdp.o bt-hci.o bt-hid.o usb-bt.o
 OBJS+=buffered_file.o migration.o migration-tcp.o net.o qemu-sockets.o
 OBJS+=qemu-char.o aio.o net-checksum.o savevm.o cache-utils.o
+OBJS+=tree.o
 
 ifdef CONFIG_BRLAPI
 OBJS+= baum.o
diff --git a/Makefile.target b/Makefile.target
index 9a2f123..221d5a3 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -505,6 +505,7 @@ OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o dma-helpers.o
 # need to fix this properly
 OBJS+=virtio.o virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o
 OBJS+=fw_cfg.o
+OBJS+=dt.o
 ifdef CONFIG_KVM
 OBJS+=kvm.o kvm-all.o
 endif
@@ -536,6 +537,7 @@ endif
 ifdef CONFIG_OSS
 LIBS += $(CONFIG_OSS_LIB)
 endif
+LIBS+= $(FDT_LIBS)
 
 SOUND_HW = sb16.o es1370.o ac97.o
 ifdef CONFIG_ADLIB
@@ -588,6 +590,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
+OBJS+= pcdt.o
 OBJS += device-hotplug.o pci-hotplug.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
@@ -611,7 +614,6 @@ OBJS+= ppc440.o ppc440_bamboo.o
 OBJS+= ppce500_pci.o ppce500_mpc8544ds.o
 ifdef FDT_LIBS
 OBJS+= device_tree.o
-LIBS+= $(FDT_LIBS)
 endif
 ifdef CONFIG_KVM
 OBJS+= kvm_ppc.o
diff --git a/dt.c b/dt.c
new file mode 100644
index 0000000..ff1ef30
--- /dev/null
+++ b/dt.c
@@ -0,0 +1,645 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * Configure and build a machine from configuration data
+ *
+ * This is generic, device-independent code driven by device-dependent
+ * configuration data, talking to devices through an abstract device
+ * interface.
+ *
+ * The configuration data currently is hardwired to a fairly limited
+ * PC, registered as machine type "pcdt".  The nuts and bolts of PC
+ * emulation remain in pc.c, and that sharing makes the somewhat
+ * clumsy pcint.h necessary.  Having two PC machine types makes no
+ * sense in the long run, of course.  We want to replace pc.c
+ * eventually, and also convert other machine types to this mechanism.
+ */
+
+#include <assert.h>
+#include "block.h"
+#include "cpu.h"
+#include "dt.h"
+#include "net.h"
+#include "tree.h"
+#include "sysemu.h"
+
+#ifdef HAVE_FDT
+#include <libfdt.h>
+#endif
+
+/* Forward declarations */
+static void dt_parse_prop(dt_device *dev, tree_prop *prop);
+static void dt_add_dyn_devs(tree *conf, dt_host *host,
+                            const dt_driver drvtab[], int vga_ram_size);
+static void dt_fdt_test(tree *conf);
+
+
+/* Host Configuration */
+
+struct dt_host {
+    /* connection NIC <-> VLAN */
+    tree *nic[MAX_NICS];
+    VLANState *nic_vlan[MAX_NICS];
+    /* connection drive <-> block driver state */
+    tree *drive[MAX_DRIVES];
+    BlockDriverState *drive_state[MAX_DRIVES];
+};
+
+static void dt_attach_nic(dt_host *host, int index, tree *nic, VLANState *vlan)
+{
+    host->nic[index] = nic;
+    host->nic_vlan[index] = vlan;
+}
+
+VLANState *dt_find_vlan(tree *conf, dt_host *host)
+{
+    int i;
+
+    for (i = 0; i < nb_nics; i++) {
+        if (host->nic[i] == conf)
+            return host->nic_vlan[i];
+    }
+    return NULL;
+}
+
+static void dt_attach_drive(dt_host *host, int index,
+                            tree *node, BlockDriverState *state)
+{
+    host->drive[index] = node;
+    host->drive_state[index] = state;
+}
+
+void dt_find_drives(tree *conf, dt_host *host,
+                    BlockDriverState *drive[], int n)
+{
+    int i, unit;
+
+    memset(drive, 0, n * sizeof(drive[0]));
+
+    for (i = 0; i < nb_drives; i++) {
+        if (!host->drive[i])
+            continue;           /* TODO rm when all drive types implemented */
+        if (tree_parent(host->drive[i]) != conf)
+            continue;
+        unit = dt_get_unit(dt_device_of(host->drive[i]));
+        assert(unit < n && !drive[unit]);
+        drive[unit] = host->drive_state[i];
+    }
+}
+
+static void dt_print_host_config(dt_host *host)
+{
+    char buf[1024];
+    int i;
+
+    for (i = 0; i < nb_nics; i++) {
+        if (!host->nic[i])
+            continue;
+        tree_path(host->nic[i], buf, sizeof(buf));
+        printf("nic#%d\tvlan %-4d\t%s\n",
+               i, host->nic_vlan[i]->id, buf);
+    }
+
+    for (i = 0; i < nb_drives; i++) {
+        if (!host->drive[i])
+            continue;
+        tree_path(host->drive[i], buf, sizeof(buf));
+        printf("drive#%d\t%-15s %s\n",
+               i, bdrv_get_device_name(host->drive_state[i]), buf);
+    }
+}
+
+
+/* Device Interface */
+
+static const dt_driver *dt_driver_by_name(const char *name,
+                                          const dt_driver drvtab[])
+{
+    int i;
+
+    for (i = 0; drvtab[i].name; i++) {
+        if (!strcmp(name, drvtab[i].name))
+            return &drvtab[i];
+    }
+    return NULL;
+}
+
+dt_device *dt_device_of(tree *conf)
+{
+    return tree_get_user(conf);
+}
+
+dt_device *dt_parent_device(dt_device *dev)
+{
+    tree *p = tree_parent(dev->conf);
+
+    return p ? dt_device_of(p) : NULL;
+}
+
+static dt_device *dt_do_find_bus(tree *conf, dt_bus_type bus_type, int *skip)
+{
+    dt_device *dev;
+    tree *child;
+
+    dev = dt_device_of(conf);
+    if (dev->drv->bus_type == bus_type && (*skip)-- == 0)
+        return dev;
+
+    TREE_FOREACH_CHILD(child, conf) {
+        dev = dt_do_find_bus(child, bus_type, skip);
+        if (dev)
+            return dev;
+    }
+
+    return NULL;
+}
+
+static dt_device *dt_find_bus(tree *conf, dt_bus_type bus_type, int busno)
+{
+    return dt_do_find_bus(conf, bus_type, &busno);
+}
+
+PCIBus *dt_get_pcibus(dt_device *dev)
+{
+    dt_device *bus = dt_parent_device(dev);
+
+    return bus->drv->get_pcibus(bus);
+}
+
+int dt_get_unit(dt_device *dev)
+{
+    return dev->drv->get_unit(dev);
+}
+
+static dt_device *dt_new_device(tree *conf, const dt_driver *drv)
+{
+    dt_device *dev;
+    tree_prop *prop;
+
+    dev = qemu_malloc(sizeof(*dev));
+    dev->conf = conf;
+    dev->drv = drv;
+    LIST_INIT(&dev->reqs);
+    dev->visit = 0;
+    dev->priv = qemu_malloc(drv->privsz);
+    tree_put_user(conf, dev);
+
+    TREE_FOREACH_PROP(prop, conf)
+        dt_parse_prop(dev, prop);
+
+    return dev;
+}
+
+static dt_device *dt_create(tree *conf, const dt_driver drvtab[])
+{
+    const dt_driver *drv;
+    dt_device *dev;
+    tree *child;
+
+    drv = dt_driver_by_name(tree_node_name(conf), drvtab);
+    if (!drv) {
+        fprintf(stderr, "No driver for device %s\n",
+                tree_node_name(conf));
+        exit(1);
+    }
+
+    assert((drv->bus_type == DT_BUS_PCI) == (drv->get_pcibus != NULL));
+
+    dev = dt_new_device(conf, drv);
+
+    TREE_FOREACH_CHILD(child, conf)
+        dt_create(child, drvtab);
+
+    return dev;
+}
+
+static void dt_config(tree *conf, dt_host *host)
+{
+    dt_device *dev = dt_device_of(conf);
+    dt_device *bus = dt_parent_device(dev);
+    tree *child;
+
+    if (dev->drv->parent_bus_type == DT_BUS_NONE
+        ? bus != NULL
+        : bus == NULL || bus->drv->bus_type != dev->drv->parent_bus_type) {
+        fprintf(stderr, "Device %s is not on a suitable bus\n",
+                dev->drv->name);
+        exit(1);
+    }
+
+    if (dev->drv->config)
+        dev->drv->config(dev, host);
+
+    TREE_FOREACH_CHILD(child, conf)
+        dt_config(child, host);
+}
+
+tree *dt_require_named(dt_device *dev, const char *reqname)
+{
+    dt_tree_list *l = qemu_malloc(sizeof(*l));
+
+    l->conf = tree_node_by_name(dev->conf, reqname);
+    LIST_INSERT_HEAD(&dev->reqs, l, link);
+    return l->conf;
+}
+
+static void dt_do_visit(dt_device *dev,
+                        void (*fun)(dt_device *, void *arg),
+                        void *arg, int visit)
+{
+    dt_device *parent, *req, *child;
+    dt_tree_list *l;
+    tree *k;
+
+    assert(dev->visit < visit - 1);
+    dev->visit = visit - 1;
+    parent = dt_parent_device(dev);
+    if (parent && parent->visit < visit)
+        dt_do_visit(parent, fun, arg, visit);
+    LIST_FOREACH(l, &dev->reqs, link) {
+        req = dt_device_of(l->conf);
+        if (req->visit < visit)
+            dt_do_visit(req, fun, arg, visit);
+    }
+    dev->visit = visit;
+    fun(dev, arg);
+    TREE_FOREACH_CHILD(k, dev->conf) {
+        child = dt_device_of(k);
+        if (child->visit < visit - 1)
+            dt_do_visit(child, fun, arg, visit);
+    }
+}
+
+static void dt_visit(tree *node,
+                     void (*fun)(dt_device *, void *arg),
+                     void *arg)
+{
+    static int visit;
+
+    visit += 2;
+    dt_do_visit(dt_device_of(node), fun, arg, visit);
+}
+
+static void dt_init_visitor(dt_device *dev, void *arg)
+{
+    if (dev->drv->init)
+        dev->drv->init(dev);
+}
+
+static void dt_init(tree *conf)
+{
+    dt_visit(conf, dt_init_visitor, NULL);
+}
+
+static void dt_start(tree *conf)
+{
+    dt_device *dev = dt_device_of(conf);
+    tree *child;
+
+    if (dev && dev->drv->start)
+        dev->drv->start(dev);
+
+    TREE_FOREACH_CHILD(child, conf)
+        dt_start(child);
+}
+
+void dt_create_machine(tree *conf, const dt_driver drvtab[], int vga_ram_size)
+{
+    dt_host host;
+
+    memset(&host, 0, sizeof(host));
+    dt_create(conf, drvtab);
+    dt_add_dyn_devs(conf, &host, drvtab, vga_ram_size);
+    dt_config(conf, &host);
+    tree_print(conf);
+    dt_print_host_config(&host);
+    dt_fdt_test(conf);
+    dt_init(conf);
+    dt_start(conf);
+}
+
+
+/* Device properties */
+
+static const dt_prop_spec *dt_prop_spec_by_name(const dt_driver *drv,
+                                                const char *name)
+{
+    const dt_prop_spec *spec;
+
+    for (spec = drv->prop_spec; spec && spec->name; spec++) {
+        if (!strcmp(spec->name, name))
+            return spec;
+    }
+    return NULL;
+}
+
+static void dt_parse_prop(dt_device *dev, tree_prop *prop)
+{
+    const char *name = tree_prop_name(prop);
+    size_t size;
+    const char *val = tree_prop_value(prop, &size);
+    const dt_prop_spec *spec = dt_prop_spec_by_name(dev->drv, name);
+
+    if (!spec) {
+        fprintf(stderr, "A %s device has no property %s\n",
+                dev->drv->name, name);
+        exit(1);
+    }
+
+    if (memchr(val, 0, size) != val + size - 1
+        || spec->parse((char *)dev->priv + spec->offs, val, spec) < 0) {
+        fprintf(stderr, "Bad value %.*s for property %s of device %s\n",
+                (int)size, val, name, dev->drv->name);
+        exit(1);
+    }
+}
+
+int dt_parse_string(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    assert(spec->size == sizeof(char *));
+    *(const char **)dst = src;
+    return 0;
+}
+
+int dt_parse_int(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    char *ep;
+    long val;
+
+    assert(spec->size == sizeof(int));
+    errno = 0;
+    val = strtol(src, &ep, 0);
+    if (*ep || ep == src || errno || (int)val != val)
+        return -1;
+    *(int *)dst = val;
+    return 0;
+}
+
+int dt_parse_ram_addr_t(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    char *ep;
+    unsigned long val;
+
+    assert(spec->size == sizeof(ram_addr_t));
+    errno = 0;
+    val = strtoul(src, &ep, 0);
+    if (*ep || ep == src || errno || (ram_addr_t)val != val)
+        return -1;
+    *(ram_addr_t *)dst = val;
+    return 0;
+}
+
+int dt_parse_macaddr(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    assert(spec->size == 6);
+    if (parse_macaddr(dst, src) < 0)
+        return -1;
+    return 0;
+}
+
+
+/* Dynamic Devices */
+
+static void dt_add_dyn_dev(tree *conf, tree *node, const dt_driver drvtab[],
+                           int busno)
+{
+    dt_device *dev = dt_create(node, drvtab);
+    dt_device *bus = dt_find_bus(conf, dev->drv->parent_bus_type, busno);
+
+    if (!bus) {
+        fprintf(stderr, "No suitable bus for device %s\n", dev->drv->name);
+        exit(1);
+    }
+
+    tree_insert(bus->conf, node);
+}
+
+static void dt_add_vga(tree *conf, const dt_driver drvtab[],
+                       const char *model, int vga_ram_size)
+{
+    tree *node = tree_new_child(NULL, "vga", NULL);
+
+    tree_put_propf(node, "model", "%s", model);
+    tree_put_propf(node, "ram", "%#x", vga_ram_size);
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+}
+
+static void dt_add_nic(tree *conf, dt_host *host, const dt_driver drvtab[],
+                       int index)
+{
+    NICInfo *n = &nd_table[index];
+    tree *node = tree_new_child(NULL, "nic", NULL);
+
+    tree_put_propf(node, "mac", "%02x:%02x:%02x:%02x:%02x:%02x",
+                   n->macaddr[0], n->macaddr[1], n->macaddr[2],
+                   n->macaddr[3], n->macaddr[4], n->macaddr[5]);
+    tree_put_propf(node, "model", "%s",
+                   n->model ? n->model : "ne2k_pci");
+    if (n->name)
+        tree_put_propf(node, "name", "%s", n->name);
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+    dt_attach_nic(host, index, node, n->vlan);
+}
+
+static void dt_add_scsi(tree *conf, const dt_driver drvtab[], int busno)
+{
+    tree *node = tree_new_child(NULL, "scsi", NULL);
+
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+    assert(dt_find_bus(conf, DT_BUS_SCSI, busno)->conf == node);
+}
+
+static const char *block_if_name[] = {
+    [IF_IDE] = "ide",
+    [IF_SCSI] = "scsi",
+    [IF_FLOPPY] = "floppy",
+    [IF_PFLASH] = "pflash",
+    [IF_MTD] = "mtd",
+    [IF_SD] = "sd",
+    [IF_VIRTIO] = "virtio",
+};
+
+static void dt_add_drive(tree *conf, dt_host *host, const dt_driver drvtab[],
+                         int index)
+{
+    DriveInfo *d = &drives_table[index];
+    int bus, unit;
+    char buf[32];
+    tree *node;
+
+    unit = d->unit;
+    bus = d->bus;
+
+    switch (d->type) {
+    case IF_IDE:
+        /* hack to hang all IDE drives off the same node for now */
+        unit = bus * MAX_IDE_DEVS + unit;
+        bus = 0;
+        /* fall through */
+    case IF_SCSI:
+    case IF_FLOPPY:
+    case IF_VIRTIO:
+        snprintf(buf, sizeof(buf), "%s-drive", block_if_name[d->type]);
+        node = tree_new_child(NULL, strdup(buf), NULL);
+        tree_put_propf(node, "unit", "%d", unit);
+        dt_add_dyn_dev(conf, node, drvtab, bus);
+        dt_attach_drive(host, index, node, d->bdrv);
+        break;
+    case IF_PFLASH:
+    case IF_MTD:
+    case IF_SD:
+        /* TODO implement */
+        fprintf(stderr, "Ignoring unimplemented drive %s\n",
+                drives_opt[drives_table[index].drive_opt_idx].opt);
+        break;
+    }
+}
+
+static void dt_add_dyn_devs(tree *conf, dt_host *host,
+                            const dt_driver drvtab[], int vga_ram_size)
+{
+    int i, max_bus;
+
+    if (cirrus_vga_enabled || vmsvga_enabled || std_vga_enabled) {
+        dt_add_vga(conf, drvtab,
+                   cirrus_vga_enabled ? "cirrus"
+                   : vmsvga_enabled ? "vms" : "std",
+                   vga_ram_size);
+    }
+
+    for (i = 0; i < nb_nics; i++)
+        dt_add_nic(conf, host, drvtab, i);
+
+    max_bus = drive_get_max_bus(IF_SCSI);
+    for (i = 0; i <= max_bus; i++)
+        dt_add_scsi(conf, drvtab, i);
+
+    for (i = 0; i < nb_drives; i++)
+        dt_add_drive(conf, host, drvtab, i);
+}
+
+
+/* Interfacing with FDT */
+
+/*
+ * Note: translation to FDT loses the association between
+ * configuration tree nodes and devices.
+ */
+
+#ifdef HAVE_FDT
+
+static int dt_fdt_chk(int res);
+static void dt_subtree_to_fdt(const tree *conf, void *fdt);
+
+static void *dt_tree_to_fdt(const tree *conf)
+{
+    int sz = 1024 * 1024;       /* FIXME arbitrary limit */
+    void *fdt = qemu_malloc(sz);
+
+    dt_fdt_chk(fdt_create(fdt, sz));
+    dt_subtree_to_fdt(conf, fdt);
+    dt_fdt_chk(fdt_finish(fdt));
+    return fdt;
+}
+
+static void dt_subtree_to_fdt(const tree *conf, void *fdt)
+{
+    tree_prop *prop;
+    tree *child;
+    const void *pv;
+    size_t sz;
+
+    dt_fdt_chk(fdt_begin_node(fdt, tree_node_name(conf)));
+    TREE_FOREACH_PROP(prop, conf) {
+        pv = tree_prop_value(prop, &sz);
+        dt_fdt_chk(fdt_property(fdt, tree_prop_name(prop), pv, sz));
+    }
+    TREE_FOREACH_CHILD(child, conf)
+        dt_subtree_to_fdt(child, fdt);
+    dt_fdt_chk(fdt_end_node(fdt));
+}
+
+static tree *dt_fdt_to_tree(const void *fdt)
+{
+    int offs, next, depth;
+    uint32_t tag;
+    struct fdt_property *prop;
+    tree *stack[32];            /* FIXME arbitrary limit */
+
+    stack[0] = NULL;            /* "parent" of root */
+    next = depth = 0;
+
+    for (;;) {
+        offs = next;
+        tag = fdt_next_tag(fdt, offs, &next);
+        switch (tag) {
+        case FDT_PROP:
+            /*
+             * libfdt apparently doesn't provide a way to get property
+             * by offset, do it by hand
+             */
+            assert(0 < depth && depth < ARRAY_SIZE(stack));
+            prop = (void *)((const char *)fdt + fdt_off_dt_struct(fdt) + offs);
+            tree_put_prop(stack[depth],
+                          fdt_string(fdt, fdt32_to_cpu(prop->nameoff)),
+                          prop->data,
+                          fdt32_to_cpu(prop->len));
+        case FDT_NOP:
+            break;
+        case FDT_BEGIN_NODE:
+            depth++;
+            assert(0 < depth && depth < ARRAY_SIZE(stack));
+            stack[depth] = tree_new_child(stack[depth-1],
+                                          fdt_get_name(fdt, offs, NULL),
+                                          NULL);
+            break;
+        case FDT_END_NODE:
+            depth--;
+            break;
+        case FDT_END:
+            dt_fdt_chk(next);
+            return stack[1];
+        }
+    }
+}
+
+static int dt_fdt_chk(int res)
+{
+    if (res < 0) {
+        fprintf(stderr, "%s\n", fdt_strerror(res)); /* FIXME cryptic */
+        exit(1);
+    }
+    return res;
+}
+
+static void dt_fdt_test(tree *conf)
+{
+    void *fdt;
+
+    fdt = dt_tree_to_fdt(conf);
+    conf = dt_fdt_to_tree(fdt);
+    tree_print(conf);
+    qemu_free(fdt);
+}
+#else
+static void dt_fdt_test(tree *conf) { }
+#endif
diff --git a/dt.h b/dt.h
new file mode 100644
index 0000000..ae035c0
--- /dev/null
+++ b/dt.h
@@ -0,0 +1,106 @@
+#ifndef DT_H
+#define DT_H
+
+#include "sysemu.h"
+#include "net.h"
+#include "tree.h"
+
+typedef struct dt_host dt_host;
+typedef struct dt_device dt_device;
+typedef struct dt_tree_list dt_tree_list;
+typedef struct dt_driver dt_driver;
+typedef struct dt_prop_spec dt_prop_spec;
+
+
+/* Host Configuration */
+
+VLANState *dt_find_vlan(tree *conf, dt_host *host);
+void dt_find_drives(tree *conf, dt_host *host,
+                    BlockDriverState *drive[], int n);
+
+
+/* Device Interface */
+
+/*
+ * Device life cycle:
+ *
+ * 1. Configuration: config() method runs after parent's.  It should
+ * initialize the device's private data from its configuration
+ * sub-tree.  It may edit the configuration sub-tree, and may declare
+ * initialization ordering constraints with dt_require_named().
+ *
+ * 2. Initialization: init() method runs after parent's and after that
+ * of devices declared required by config().  It should not touch the
+ * configuration tree.
+ *
+ * 3. Start: start() method runs, order is unspecified.
+ *
+ * Error handling in these driver methods: print to stderr and exit
+ * the program unsuccessfully.
+ *
+ * There is no device shutdown protocol yet.
+ */
+
+struct dt_device {
+    tree *conf;                 /* configuration sub-tree */
+    const dt_driver *drv;       /* device driver */
+    LIST_HEAD(, dt_tree_list) reqs; /* required devices */
+    int visit;                  /* for dt_visit() */
+    void *priv;                 /* device private data */
+};
+
+struct dt_tree_list {
+    tree *conf;
+    LIST_ENTRY(dt_tree_list) link;
+};
+
+typedef enum dt_bus_type {
+    DT_BUS_NONE, DT_BUS_ROOT, DT_BUS_PCI, DT_BUS_IDE, DT_BUS_SCSI,
+    DT_BUS_FLOPPY,
+} dt_bus_type;
+
+struct dt_driver {
+    const char *name;
+    size_t privsz;              /* size of device private data */
+    const dt_prop_spec *prop_spec; /* recognized conf node properties */
+    dt_bus_type bus_type, parent_bus_type;
+    void (*config)(dt_device *, dt_host *);
+    void (*init)(dt_device *);
+    void (*start)(dt_device *);
+    PCIBus *(*get_pcibus)(dt_device *); /* iff device is a PCI bus */
+    int (*get_unit)(dt_device *);
+};
+
+dt_device *dt_device_of(tree *conf);
+dt_device *dt_parent_device(dt_device *dev);
+PCIBus *dt_get_pcibus(dt_device *dev);
+int dt_get_unit(dt_device *dev);
+tree *dt_require_named(dt_device *dev, const char *reqname);
+void dt_create_machine(tree *conf, const dt_driver drvtab[], int vga_ram_size);
+
+
+/* Device properties */
+
+/*
+ * This is for parsing configuration tree node properties into device
+ * private data.
+ */
+
+struct dt_prop_spec {
+    const char *name;
+    ptrdiff_t offs;             /* offset in device private data */
+    size_t size;                /* size there, for sanity checking */
+    int (*parse)(void *, const char *, const dt_prop_spec *);
+};
+
+#define DT_PROP_SPEC_INIT(name, strty, member, fmt)                     \
+    { name, offsetof(strty, member), sizeof(((strty *)0)->member),      \
+      dt_parse_##fmt }
+
+/* Canned property parse methods */
+int dt_parse_string(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_int(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_ram_addr_t(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_macaddr(void *dst, const char *src, const dt_prop_spec *spec);
+
+#endif
diff --git a/hw/boards.h b/hw/boards.h
index 1e62594..d75e518 100644
--- a/hw/boards.h
+++ b/hw/boards.h
@@ -35,6 +35,9 @@ extern QEMUMachine axisdev88_machine;
 extern QEMUMachine pc_machine;
 extern QEMUMachine isapc_machine;
 
+/* pcdt.c */
+extern QEMUMachine pcdt_machine;
+
 /* ppc.c */
 extern QEMUMachine prep_machine;
 extern QEMUMachine core99_machine;
diff --git a/hw/pc.c b/hw/pc.c
index 69f25f3..41a0225 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -37,42 +37,35 @@
 #include "virtio-balloon.h"
 #include "virtio-console.h"
 #include "hpet_emul.h"
+#include "pcint.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
 
-#define BIOS_FILENAME "bios.bin"
-#define VGABIOS_FILENAME "vgabios.bin"
-#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
-
-#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
-
 /* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables.  */
 #define ACPI_DATA_SIZE       0x10000
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 
-#define MAX_IDE_BUS 2
-
-static fdctrl_t *floppy_controller;
-static RTCState *rtc_state;
+fdctrl_t *floppy_controller;
+RTCState *rtc_state;
 static PITState *pit;
 static IOAPICState *ioapic;
-static PCIDevice *i440fx_state;
+PCIDevice *i440fx_state;
 
-static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
 {
 }
 
 /* MSDOS compatibility mode FPU exception support */
-static qemu_irq ferr_irq;
+qemu_irq ferr_irq;
 /* XXX: add IGNNE support */
 void cpu_set_ferr(CPUX86State *s)
 {
     qemu_irq_raise(ferr_irq);
 }
 
-static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
 {
     qemu_irq_lower(ferr_irq);
 }
@@ -121,7 +114,7 @@ int cpu_get_pic_interrupt(CPUState *env)
     return intno;
 }
 
-static void pic_irq_request(void *opaque, int irq, int level)
+void pic_irq_request(void *opaque, int irq, int level)
 {
     CPUState *env = first_cpu;
 
@@ -167,7 +160,7 @@ static int cmos_get_fd_drive_type(int fd0)
     return val;
 }
 
-static void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
 {
     RTCState *s = rtc_state;
     int cylinders, heads, sectors;
@@ -203,7 +196,7 @@ static int boot_device2nibble(char boot_device)
 
 /* copy/pasted from cmos_init, should be made a general function
  and used there as well */
-static int pc_boot_set(void *opaque, const char *boot_device)
+int pc_boot_set(void *opaque, const char *boot_device)
 {
     Monitor *mon = cur_mon;
 #define PC_MAX_BOOT_DEVICES 3
@@ -230,8 +223,8 @@ static int pc_boot_set(void *opaque, const char *boot_device)
 }
 
 /* hd_table must contain 4 block drivers */
-static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
-                      const char *boot_device, BlockDriverState **hd_table)
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table)
 {
     RTCState *s = rtc_state;
     int nbds, bds[3] = { 0, };
@@ -364,13 +357,13 @@ int ioport_get_a20(void)
     return ((first_cpu->a20_mask >> 20) & 1);
 }
 
-static void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
 {
     ioport_set_a20((val >> 1) & 1);
     /* XXX: bit 0 is fast reset */
 }
 
-static uint32_t ioport92_read(void *opaque, uint32_t addr)
+uint32_t ioport92_read(void *opaque, uint32_t addr)
 {
     return ioport_get_a20() << 1;
 }
@@ -422,7 +415,7 @@ static void bochs_bios_write(void *opaque, uint32_t addr, uint32_t val)
     }
 }
 
-static void bochs_bios_init(void)
+void bochs_bios_init(void)
 {
     void *fw_cfg;
 
@@ -691,7 +684,7 @@ static void load_linux(uint8_t *option_rom,
     generate_bootsect(option_rom, gpr, seg, 0);
 }
 
-static void main_cpu_reset(void *opaque)
+void main_cpu_reset(void *opaque)
 {
     CPUState *env = opaque;
     cpu_reset(env);
@@ -706,11 +699,11 @@ static const int ide_irq[2] = { 14, 15 };
 static int ne2000_io[NE2000_NB_MAX] = { 0x300, 0x320, 0x340, 0x360, 0x280, 0x380 };
 static int ne2000_irq[NE2000_NB_MAX] = { 9, 10, 11, 3, 4, 5 };
 
-static int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
-static int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
+int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
+int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
 
-static int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
-static int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
+int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
+int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
 
 #ifdef HAS_AUDIO
 static void audio_init (PCIBus *pci_bus, qemu_irq *pic)
diff --git a/hw/pcdt.c b/hw/pcdt.c
new file mode 100644
index 0000000..dde6980
--- /dev/null
+++ b/hw/pcdt.c
@@ -0,0 +1,691 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * This is a PC configured and built using the new dt_ infrastructure.
+ * Having two PC machine types makes no sense in the long run, of
+ * course.  We want to replace pc.c eventually, and also convert other
+ * machine types to this infrastructure.
+ *
+ * The configuration data currently is hardwired, and fairly limited.
+ *
+ * The nuts and bolts of PC emulation remain in pc.c for now, and
+ * using the stuff there makes the somewhat clumsy pcint.h necessary.
+ *
+ * The drivers here generally don't do the actual work, they just
+ * provide a common interface to existing device code.  Arguably, they
+ * should be integrated into that device code, with the goal of
+ * eventually replacing the old, ad hoc interfaces.
+ *
+ * Several drivers here are not PC-specific, e.g. drivers for various
+ * PCI devices.
+ */
+
+#include <assert.h>
+#include "hw.h"
+#include "pc.h"
+#include "fdc.h"
+#include "pci.h"
+#include "block.h"
+#include "sysemu.h"
+#include "audio/audio.h"
+#include "net.h"
+#include "smbus.h"
+#include "boards.h"
+#include "console.h"
+#include "fw_cfg.h"
+#include "virtio-blk.h"
+#include "virtio-balloon.h"
+#include "virtio-console.h"
+#include "hpet_emul.h"
+#include "pcint.h"
+#include "dt.h"
+
+#ifdef TARGET_X86_64
+#define CPU_MODEL_DEFAULT "qemu64"
+#else
+#define CPU_MODEL_DEFAULT "qemu32"
+#endif
+
+
+static BlockDriverState **dt_piix3_hd(tree *piix3);
+
+/* CPUs Driver */
+
+typedef struct dt_device_cpus {
+    const char *model;
+    int num;
+} dt_device_cpus;
+
+static const dt_prop_spec dt_cpus_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
+    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
+};
+
+static void dt_cpus_init(dt_device *dev)
+{
+    dt_device_cpus *priv = dev->priv;
+    int i;
+    CPUState *env;
+
+    for(i = 0; i < priv->num; i++) {
+        env = cpu_init(priv->model);
+        if (!env) {
+            fprintf(stderr, "Unable to find CPU definition\n");
+            exit(1);
+        }
+        if (i != 0)
+            env->halted = 1;
+        qemu_register_reset(main_cpu_reset, env);
+    }
+}
+
+
+/* Memory Ranges */
+
+typedef struct dt_device_memrng {
+    target_phys_addr_t phys_addr;
+    ram_addr_t size;
+    ram_addr_t host_offs;
+    ram_addr_t flags;
+} dt_device_memrng;
+
+static void dt_memrng(dt_device_memrng *rng,
+                      target_phys_addr_t phys_addr, ram_addr_t size,
+                      ram_addr_t host_offs, ram_addr_t flags)
+{
+    rng->phys_addr = phys_addr;
+    rng->size = size;
+    rng->host_offs = host_offs;
+    rng->flags = flags;
+}
+
+static void dt_memrng_ram(dt_device_memrng *rng,
+                          target_phys_addr_t phys_addr, ram_addr_t size)
+{
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), 0);
+}
+
+static void dt_memrng_rom(dt_device_memrng *rng,
+                          target_phys_addr_t phys_addr, ram_addr_t maxsz,
+                          const char *dir, const char *image, int top)
+{
+    char buf[1024];
+    int size;
+
+    snprintf(buf, sizeof(buf), "%s/%s", dir, image);
+    size = get_image_size(buf);
+    if (size < 0 || size > maxsz)
+        goto error;
+    if (top)
+        phys_addr = phys_addr + maxsz - size;
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), IO_MEM_ROM);
+    if (load_image(buf, phys_ram_base + rng->host_offs) != size)
+        goto error;
+    return;
+
+error:
+    fprintf(stderr, "qemu: could not load image '%s'\n", buf);
+    exit(1);
+}
+
+static void dt_memrng_init(dt_device_memrng *rng, int n)
+{
+    int i;
+
+    for (i = 0; i < n; i++)
+        cpu_register_physical_memory(rng[i].phys_addr, rng[i].size,
+                                     rng[i].host_offs | rng[i].flags);
+}
+
+
+/* Memory Driver */
+
+typedef struct dt_device_memory {
+    ram_addr_t ram_size;
+    dt_device_memrng *rng;
+    int nrng;
+    /* TODO want a real memory map here */
+    ram_addr_t below_4g, above_4g;
+} dt_device_memory;
+
+static const dt_prop_spec dt_memory_props[] = {
+    DT_PROP_SPEC_INIT("ram", dt_device_memory, ram_size, ram_addr_t),
+};
+
+static void dt_memory_config(dt_device *dev, dt_host *host)
+{
+    /* TODO memory map hardcoded; get it from dev->conf instead */
+    dt_device_memory *priv = dev->priv;
+    dt_device_memrng *rng = qemu_malloc(sizeof(*rng) * 4);
+
+    if (priv->ram_size >= 0xe0000000) {
+        priv->above_4g = priv->ram_size - 0xe0000000;
+        priv->below_4g = 0xe0000000;
+    } else {
+        priv->below_4g = priv->ram_size;
+        priv->above_4g = 0;
+    }
+
+    dt_memrng_ram(&rng[0], 0, 0xa0000);
+    qemu_ram_alloc(0x60000);
+    dt_memrng_ram(&rng[1], 0x100000, priv->below_4g - 0x100000);
+    if (priv->above_4g)
+        abort();                /* TODO */
+    dt_memrng_rom(&rng[2], 0xe0000000, 0x20000000,
+                  bios_dir, BIOS_FILENAME, 1);
+                                /* TODO get name from dev->conf */
+    dt_memrng(&rng[3], 0xe0000, 0x20000,
+              rng[2].host_offs + rng[2].size - 0x20000, IO_MEM_ROM);
+    /* TODO option ROMs */
+
+    priv->rng = rng;
+    priv->nrng = 4;
+}
+
+static void dt_memory_init(dt_device *dev)
+{
+    dt_device_memory *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, priv->nrng);
+    bochs_bios_init();
+}
+
+static ram_addr_t dt_memory_below_4g(tree *memory)
+{
+    dt_device *dev = dt_device_of(memory);
+    dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->below_4g;
+}
+
+static ram_addr_t dt_memory_above_4g(tree *memory)
+{
+    dt_device *dev = dt_device_of(memory);
+    dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->above_4g;
+}
+
+
+/* PC Miscellaneous Driver */
+
+/*
+ * This is a driver for a whole collection of devices.  Could be
+ * picked apart into separate drivers, I guess.
+ */
+
+typedef struct dt_device_pc_misc {
+    const char *boot_device;
+    int apic;
+    int hpet;
+    qemu_irq *i8259;
+    BlockDriverState *fd[MAX_FD];
+} dt_device_pc_misc;
+
+static const dt_prop_spec dt_pc_misc_props[] = {
+    DT_PROP_SPEC_INIT("boot-device", dt_device_pc_misc, boot_device,
+                      string),
+};
+
+static void dt_pc_misc_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pc_misc *priv = dev->priv;
+
+    priv->apic = 1;
+    priv->hpet = 1;
+    priv->i8259 = NULL;
+    dt_find_drives(dev->conf, host, priv->fd, ARRAY_SIZE(priv->fd));
+}
+
+static void dt_pc_misc_init(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    CPUState *env;
+    qemu_irq *cpu_irq;
+    IOAPICState *ioapic;
+    PITState *pit;
+    int i;
+
+    if (priv->apic) {
+        for (env = first_cpu; env; env = env->next_cpu) {
+            env->cpuid_features |= CPUID_APIC;
+            apic_init(env);
+        }
+    }
+
+    vmport_init();
+
+    cpu_irq = qemu_allocate_irqs(pic_irq_request, NULL, 1);
+    priv->i8259 = i8259_init(cpu_irq[0]);
+    ferr_irq = priv->i8259[13];
+
+    register_ioport_write(0x80, 1, 1, ioport80_write, NULL);
+    register_ioport_write(0xf0, 1, 1, ioportF0_write, NULL);
+
+    rtc_state = rtc_init(0x70, priv->i8259[8], 2000);
+    qemu_register_boot_set(pc_boot_set, rtc_state);
+
+    register_ioport_read(0x92, 1, 1, ioport92_read, NULL);
+    register_ioport_write(0x92, 1, 1, ioport92_write, NULL);
+
+    if (priv->apic) {
+        ioapic = ioapic_init();
+        pic_set_alt_irq_func(isa_pic, ioapic_set_irq, ioapic);
+    }
+
+    pit = pit_init(0x40, priv->i8259[0]);
+    pcspk_init(pit);
+    if (priv->hpet)
+        hpet_init(priv->i8259);
+
+    for(i = 0; i < MAX_SERIAL_PORTS; i++) {
+        if (serial_hds[i]) {
+            serial_init(serial_io[i], priv->i8259[serial_irq[i]], 115200,
+                        serial_hds[i]);
+        }
+    }
+
+    for(i = 0; i < MAX_PARALLEL_PORTS; i++) {
+        if (parallel_hds[i]) {
+            parallel_init(parallel_io[i], priv->i8259[parallel_irq[i]],
+                          parallel_hds[i]);
+        }
+    }
+
+    qemu_system_hot_add_init();
+
+    i8042_init(priv->i8259[1], priv->i8259[12], 0x60);
+    DMA_init(0);
+
+    floppy_controller = fdctrl_init(priv->i8259[6], 2, 0, 0x3f0, priv->fd);
+}
+
+static void dt_pc_misc_start(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    tree *memory = tree_node_by_name(dev->conf, "/memory");
+    tree *piix3 = tree_node_by_name(dev->conf, "/pci/piix3");
+
+    cmos_init(dt_memory_below_4g(memory),
+              dt_memory_above_4g(memory),
+              priv->boot_device,
+              dt_piix3_hd(piix3));
+}
+
+static qemu_irq *dt_pc_misc_i8259(tree *pc_misc)
+{
+    dt_device *dev = dt_device_of(pc_misc);
+    dt_device_pc_misc *priv = dev->priv;
+    assert(dev->drv->init == dt_pc_misc_init);
+    return priv->i8259;
+}
+
+
+/* PCI Bus Driver */
+
+typedef struct dt_device_pci {
+    PCIBus *pcibus;
+    tree *pc;
+} dt_device_pci;
+
+static void dt_pci_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->pcibus = NULL;
+    priv->pc = dt_require_named(dev, "/pc-misc");
+}
+
+static void dt_pci_init(dt_device *dev)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->pcibus = i440fx_init(&i440fx_state, dt_pc_misc_i8259(priv->pc));
+}
+
+static void dt_pci_start(dt_device *dev)
+{
+    i440fx_init_memory_mappings(i440fx_state);
+}
+
+static PCIBus *dt_pci_get_pcibus(dt_device *dev)
+{
+    return ((dt_device_pci *)dev->priv)->pcibus;
+}
+
+
+/* PIIX3 Driver */
+
+typedef struct dt_device_piix3 {
+    int devfn;
+    int acpi;
+    int usb;
+    tree *pc;
+    BlockDriverState *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
+} dt_device_piix3;
+
+static void dt_piix3_config(dt_device *dev, dt_host *host)
+{
+    dt_device_piix3 *priv = dev->priv;
+
+    priv->devfn = -1;
+    priv->acpi = 1;
+    priv->usb = 1;
+    priv->pc = dt_require_named(dev, "/pc-misc");
+    dt_find_drives(dev->conf, host, priv->hd, ARRAY_SIZE(priv->hd));
+}
+
+static void dt_piix3_init(dt_device *dev)
+{
+    dt_device_piix3 *priv = dev->priv;
+    PCIBus *pci_bus = dt_get_pcibus(dev);
+    qemu_irq *i8259 = dt_pc_misc_i8259(priv->pc);
+    int i;
+
+    priv->devfn = piix3_init(pci_bus, priv->devfn);
+
+    pci_piix3_ide_init(pci_bus, priv->hd, priv->devfn + 1, i8259);
+
+    if (priv->usb)
+        usb_uhci_piix3_init(pci_bus, priv->devfn + 2);
+
+    if (priv->acpi) {
+        uint8_t *eeprom_buf = qemu_mallocz(8 * 256); /* XXX: make this persistent */
+        i2c_bus *smbus;
+
+        /* TODO: Populate SPD eeprom data.  */
+        smbus = piix4_pm_init(pci_bus, priv->devfn + 3, 0xb100, i8259[9]);
+        for (i = 0; i < 8; i++)
+            smbus_eeprom_device_init(smbus, 0x50 + i, eeprom_buf + (i * 256));
+    }
+}
+
+static BlockDriverState **dt_piix3_hd(tree *piix3)
+{
+    dt_device *dev = dt_device_of(piix3);
+    dt_device_piix3 *priv = dev->priv;
+
+    assert(dev->drv->init == dt_piix3_init);
+    return priv->hd;
+}
+
+
+/* VGA Driver */
+
+typedef struct dt_driver_vga {
+    const char *model;
+    const char *bios;
+    void (*init)(PCIBus *, uint8_t *, ram_addr_t, int);
+} dt_driver_vga;
+
+static void pci_vga_init_(PCIBus *bus, uint8_t *vga_ram_base,
+                          ram_addr_t vga_ram_offset, int vga_ram_size)
+{
+    pci_vga_init(bus, vga_ram_base, vga_ram_offset, vga_ram_size, 0, 0);
+}
+
+static dt_driver_vga dt_driver_vga_table[] = {
+    { "cirrus", VGABIOS_CIRRUS_FILENAME, pci_cirrus_vga_init },
+    { "vms", VGABIOS_FILENAME, pci_vmsvga_init },
+    { "std", VGABIOS_FILENAME, pci_vga_init_ },
+    { NULL, NULL, NULL }
+};
+
+typedef struct dt_device_vga {
+    const char *model;
+    ram_addr_t ram_size;
+    dt_device_memrng rng[1];
+    ram_addr_t ram_offs;
+    dt_driver_vga *vga_drv;
+} dt_device_vga;
+
+static const dt_prop_spec dt_vga_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_vga, model, string),
+    DT_PROP_SPEC_INIT("ram", dt_device_vga, ram_size, ram_addr_t),
+};
+
+static void dt_vga_config(dt_device *dev, dt_host *host)
+{
+    dt_device_vga *priv = dev->priv;
+    int i;
+
+    dt_memrng_rom(&priv->rng[0], 0xc0000, 0x10000,
+                  bios_dir, VGABIOS_CIRRUS_FILENAME, 0);
+                                /* TODO get name from dev->conf */
+    priv->ram_offs = qemu_ram_alloc(priv->ram_size);
+
+    for (i = 0; dt_driver_vga_table[i].model; i++) {
+        if (!strcmp(dt_driver_vga_table[i].model, priv->model))
+            break;
+    }
+    if (!dt_driver_vga_table[i].model) {
+        fprintf(stderr, "Unknown VGA model %s\n", priv->model);
+        exit(1);
+    }
+    priv->vga_drv = &dt_driver_vga_table[i];
+}
+
+static void dt_vga_init(dt_device *dev)
+{
+    dt_device_vga *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, 1);
+    priv->vga_drv->init(dt_get_pcibus(dev),
+                        phys_ram_base + priv->ram_offs,
+                        priv->ram_offs, priv->ram_size);
+}
+
+
+/* NIC Driver */
+
+typedef struct dt_device_nic {
+    NICInfo nd;
+} dt_device_nic;
+
+static const dt_prop_spec dt_nic_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_nic, nd.model, string),
+    DT_PROP_SPEC_INIT("mac", dt_device_nic, nd.macaddr, macaddr),
+    DT_PROP_SPEC_INIT("name", dt_device_nic, nd.name, string),
+};
+
+static void dt_nic_config(dt_device *dev, dt_host *host)
+{
+    dt_device_nic *priv = dev->priv;
+
+    priv->nd.vlan = dt_find_vlan(dev->conf, host);
+}
+
+static void dt_nic_init(dt_device *dev)
+{
+    dt_device_nic *priv = dev->priv;
+
+    pci_nic_init(dt_get_pcibus(dev), &priv->nd, -1, NULL);
+}
+
+
+/* SCSI Driver */
+
+typedef struct dt_device_scsi {
+    void *opaque;
+    BlockDriverState *hd[LSI_MAX_DEVS];
+} dt_device_scsi;
+
+static void dt_scsi_config(dt_device *dev, dt_host *host)
+{
+    dt_device_scsi *priv = dev->priv;
+
+    priv->opaque = NULL;
+    dt_find_drives(dev->conf, host, priv->hd, ARRAY_SIZE(priv->hd));
+}
+
+static void dt_scsi_init(dt_device *dev)
+{
+    dt_device_scsi *priv = dev->priv;
+    int i;
+
+    priv->opaque = lsi_scsi_init(dt_get_pcibus(dev), -1);
+
+    for (i = 0; i < ARRAY_SIZE(priv->hd); i++) {
+        if (priv->hd[i])
+            lsi_scsi_attach(priv->opaque, priv->hd[i], i);
+    }
+}
+
+
+/* Drive Driver */
+
+typedef struct dt_device_drive {
+    int unit;
+} dt_device_drive;
+
+static const dt_prop_spec dt_drive_props[] = {
+    DT_PROP_SPEC_INIT("unit", dt_device_drive, unit, int),
+};
+
+static int dt_drive_get_unit(dt_device *dev)
+{
+    return ((dt_device_drive *)dev->priv)->unit;
+}
+
+
+/* Machine Driver */
+
+static const dt_driver dt_driver_table[] = {
+    { "", 0, NULL, DT_BUS_ROOT, DT_BUS_NONE, NULL, NULL, NULL, NULL, NULL },
+    { "cpus", sizeof(dt_device_cpus), dt_cpus_props,
+      DT_BUS_NONE, DT_BUS_ROOT,
+      NULL, dt_cpus_init, NULL, NULL, NULL },
+    { "memory", sizeof(dt_device_memory), dt_memory_props,
+      DT_BUS_NONE, DT_BUS_ROOT,
+      dt_memory_config, dt_memory_init, NULL, NULL, NULL },
+    { "pc-misc", sizeof(dt_device_pc_misc), dt_pc_misc_props,
+      DT_BUS_FLOPPY, DT_BUS_ROOT,
+      dt_pc_misc_config, dt_pc_misc_init, dt_pc_misc_start, NULL, NULL },
+    { "pci", sizeof(dt_device_pci), NULL,
+      DT_BUS_PCI, DT_BUS_ROOT,
+      dt_pci_config, dt_pci_init, dt_pci_start, dt_pci_get_pcibus, NULL },
+    { "piix3", sizeof(dt_device_piix3), NULL,
+      DT_BUS_IDE, DT_BUS_PCI,
+      dt_piix3_config, dt_piix3_init, NULL, NULL, NULL },
+    { "vga", sizeof(dt_device_vga), dt_vga_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_vga_config, dt_vga_init, NULL, NULL, NULL },
+    { "nic", sizeof(dt_device_nic), dt_nic_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_nic_config, dt_nic_init, NULL, NULL, NULL },
+    { "scsi", sizeof(dt_device_scsi), NULL,
+      DT_BUS_SCSI, DT_BUS_PCI,
+      dt_scsi_config, dt_scsi_init, NULL, NULL, NULL },
+    { "ide-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_IDE,
+      NULL, NULL, NULL, NULL, dt_drive_get_unit },
+    { "scsi-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_SCSI,
+      NULL, NULL, NULL, NULL, dt_drive_get_unit },
+    { "floppy-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_FLOPPY,
+      NULL, NULL, NULL, NULL, dt_drive_get_unit },
+    { NULL, 0, NULL, DT_BUS_NONE, DT_BUS_NONE, NULL, NULL, NULL, NULL, NULL }
+};
+
+static tree *dt_read_config(void)
+{
+    tree *root, *pci, *leaf;
+
+    /*
+     * TODO Read from config file.
+     *
+     * TODO Pretty far from a comprehensive machine configuration, but
+     * we need to start somewhere.
+     */
+    root = tree_new_child(NULL, "", NULL);
+    leaf = tree_new_child(root, "cpus", NULL);
+    tree_put_propf(leaf, "model", "%s", CPU_MODEL_DEFAULT);
+    leaf = tree_new_child(root, "memory", NULL);
+    leaf = tree_new_child(root, "pc-misc", NULL);
+    pci = tree_new_child(root, "pci", NULL);
+    leaf = tree_new_child(pci, "piix3", NULL);
+    return root;
+}
+
+/*
+ * Extract configuration from arguments and various global variables
+ * and put it into our machine configuration.
+ */
+static void dt_customize_config(tree *conf,
+                                ram_addr_t ram_size, int vga_ram_size,
+                                const char *boot_device,
+                                const char *kernel_filename,
+                                const char *kernel_cmdline,
+                                const char *initrd_filename,
+                                const char *cpu_model)
+{
+    /*
+     * TODO This is still pretty cheesy: we insert stuff into the tree
+     * at hardcoded places.  Replacing placeholders instead would be
+     * more flexible.  Another idea is to mark certain parts of the
+     * initial tree optional, and remove them here.
+     */
+    tree *node;
+
+    node = tree_node_by_name(conf, "/cpus");
+    tree_put_propf(node, "num", "%d", smp_cpus);
+    if (cpu_model)
+        tree_put_propf(node, "model", "%s", cpu_model);
+
+    node = tree_node_by_name(conf, "/memory");
+    tree_put_propf(node, "ram", "%#lx", (unsigned long)ram_size);
+
+    node = tree_node_by_name(conf, "/pc-misc");
+    tree_put_propf(node, "boot-device", "%s", boot_device);
+
+    /* Unimplemented stuff */
+    if (kernel_filename)
+        abort();                /* TODO */
+}
+
+static void pc_init_dt(ram_addr_t ram_size, int vga_ram_size,
+                       const char *boot_device,
+                       const char *kernel_filename,
+                       const char *kernel_cmdline,
+                       const char *initrd_filename,
+                       const char *cpu_model)
+{
+    tree *conf;
+
+    conf = dt_read_config();
+    if (!conf)
+        exit(1);
+    tree_print(conf);
+    dt_customize_config(conf, ram_size, vga_ram_size, boot_device,
+                        kernel_filename, kernel_cmdline, initrd_filename,
+                        cpu_model);
+    dt_create_machine(conf, dt_driver_table, vga_ram_size);
+}
+
+QEMUMachine pcdt_machine = {
+    .name = "pcdt",
+    .desc = "Standard PC (device tree)",
+    .init = pc_init_dt,
+    .ram_require = VGA_RAM_SIZE + PC_MAX_BIOS_SIZE,
+    .max_cpus = 255,
+};
diff --git a/hw/pci.h b/hw/pci.h
index b955f39..935fafd 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -273,7 +273,7 @@ void *lsi_scsi_init(PCIBus *bus, int devfn);
 
 /* vmware_vga.c */
 void pci_vmsvga_init(PCIBus *bus, uint8_t *vga_ram_base,
-                     unsigned long vga_ram_offset, int vga_ram_size);
+                     ram_addr_t vga_ram_offset, int vga_ram_size);
 
 /* usb-uhci.c */
 void usb_uhci_piix3_init(PCIBus *bus, int devfn);
diff --git a/hw/pcint.h b/hw/pcint.h
new file mode 100644
index 0000000..f18da67
--- /dev/null
+++ b/hw/pcint.h
@@ -0,0 +1,46 @@
+/*
+ * Stuff shared by pc.c and dt.c
+ *
+ * See dt.c for why this should go away eventually.
+ */
+
+#ifndef HW_PC_INT_H
+#define HW_PC_INT_H
+
+#define BIOS_FILENAME "bios.bin"
+#define VGABIOS_FILENAME "vgabios.bin"
+#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
+
+#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
+
+#define MAX_IDE_BUS 2
+
+/* TODO move to ferr stuff in cpu.h? */
+extern qemu_irq ferr_irq;
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data);
+
+/* TODO eliminate */
+extern RTCState *rtc_state;
+extern PCIDevice *i440fx_state;
+extern int serial_io[MAX_SERIAL_PORTS];
+extern int serial_irq[MAX_SERIAL_PORTS];
+extern int parallel_io[MAX_PARALLEL_PORTS];
+extern int parallel_irq[MAX_PARALLEL_PORTS];
+extern fdctrl_t *floppy_controller;
+
+/* TODO move to pic stuff in pc.h? */
+void pic_irq_request(void *opaque, int irq, int level);
+
+/* TODO move to a20 stuff in pc.h? */
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val);
+uint32_t ioport92_read(void *opaque, uint32_t addr);
+
+void bochs_bios_init(void);
+void main_cpu_reset(void *opaque);
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data);
+int pc_boot_set(void *opaque, const char *boot_device);
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd);
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table);
+
+#endif
diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index 5c271e6..45fdbc8 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -1122,7 +1122,7 @@ static int vmsvga_load(struct vmsvga_state_s *s, QEMUFile *f)
 }
 
 static void vmsvga_init(struct vmsvga_state_s *s,
-                uint8_t *vga_ram_base, unsigned long vga_ram_offset,
+                uint8_t *vga_ram_base, ram_addr_t vga_ram_offset,
                 int vga_ram_size)
 {
     s->vram = vga_ram_base;
@@ -1216,7 +1216,7 @@ static void pci_vmsvga_map_mem(PCIDevice *pci_dev, int region_num,
 #define PCI_CLASS_HEADERTYPE_00h	0x00
 
 void pci_vmsvga_init(PCIBus *bus, uint8_t *vga_ram_base,
-                     unsigned long vga_ram_offset, int vga_ram_size)
+                     ram_addr_t vga_ram_offset, int vga_ram_size)
 {
     struct pci_vmsvga_state_s *s;
 
diff --git a/net.c b/net.c
index c853daf..831b002 100644
--- a/net.c
+++ b/net.c
@@ -157,7 +157,7 @@ static void hex_dump(FILE *f, const uint8_t *buf, int size)
 }
 #endif
 
-static int parse_macaddr(uint8_t *macaddr, const char *p)
+int parse_macaddr(uint8_t *macaddr, const char *p)
 {
     int i;
     char *last_char;
diff --git a/net.h b/net.h
index 1a51be7..54bdf80 100644
--- a/net.h
+++ b/net.h
@@ -47,6 +47,7 @@ int qemu_can_send_packet(VLANClientState *vc);
 ssize_t qemu_sendv_packet(VLANClientState *vc, const struct iovec *iov,
                           int iovcnt);
 void qemu_send_packet(VLANClientState *vc, const uint8_t *buf, int size);
+int parse_macaddr(uint8_t *macaddr, const char *p);
 void qemu_format_nic_info_str(VLANClientState *vc, uint8_t macaddr[6]);
 void qemu_check_nic_model(NICInfo *nd, const char *model);
 void qemu_check_nic_model_list(NICInfo *nd, const char * const *models,
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 1cf49d5..34a7b4d 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -7,6 +7,7 @@
 
 void register_machines(void)
 {
+    qemu_register_machine(&pcdt_machine);
     qemu_register_machine(&pc_machine);
     qemu_register_machine(&isapc_machine);
 }
diff --git a/tree.c b/tree.c
new file mode 100644
index 0000000..da07b76
--- /dev/null
+++ b/tree.c
@@ -0,0 +1,285 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/* Ye Olde Decorated Tree */
+
+#include <assert.h>
+#include "tree.h"
+#include "qemu-common.h"
+#include "sys-queue.h"
+
+struct tree {
+    const char *name;
+    LIST_HEAD(, tree_prop) props;
+    tree *parent;
+    TAILQ_HEAD(, tree) children;
+    TAILQ_ENTRY(tree) siblings;
+    void *user;
+};
+
+struct tree_prop {
+    const char *name;
+    const void *val;
+    int sz;
+    tree *owner;
+    LIST_ENTRY(tree_prop) link;
+};
+
+tree *tree_new_child(tree *parent, const char *name, void *user)
+{
+    tree *child = qemu_malloc(sizeof(*child));
+
+    child->name = name;
+    LIST_INIT(&child->props);
+    child->parent = NULL;
+    TAILQ_INIT(&child->children);
+    child->user = user;
+    if (parent)
+        tree_insert(parent, child);
+
+    return child;
+}
+
+void tree_insert(tree *parent, tree *child)
+{
+    assert(!child->parent);
+    child->parent = parent;
+    TAILQ_INSERT_TAIL(&parent->children, child, siblings);
+}
+
+const char *tree_node_name(const tree *node)
+{
+    return node->name;
+}
+
+static tree *tree_child_by_name(const tree *parent, const char *name)
+{
+    const char *slash = strchr(name, '/');
+    size_t len = slash ? slash - name : strlen(name);
+    tree *child;
+
+    TAILQ_FOREACH(child, &parent->children, siblings) {
+        if (!strncmp(child->name, name, len) && child->name[len] == 0)
+            return child;
+    }
+    return NULL;
+}
+
+tree *tree_node_by_name(const tree *node, const char *name)
+{
+    tree *child;
+    size_t len;
+
+    if (name[0] == '/') {
+        for (; node->parent; node = node->parent) ;
+        while (*name == '/') name++;
+    }
+
+    if (name[0] == 0)
+        return (tree *)node;
+
+    child = tree_child_by_name(node, name);
+    if (!child)
+        return NULL;
+
+    len = strlen(child->name);
+    if (name[len] == 0)
+        return child;
+    assert (name[len] == '/');
+
+    while (name[len] == '/') len++;
+    return tree_node_by_name(child, name + len);
+}
+
+tree_prop *tree_first_prop(const tree *node)
+{
+    return LIST_FIRST(&node->props);
+}
+
+tree_prop *tree_next_prop(const tree_prop *prop)
+{
+    return LIST_NEXT(prop, link);
+}
+
+tree_prop *tree_get_prop(const tree *node, const char *name)
+{
+    tree_prop *prop;
+
+    LIST_FOREACH(prop, &node->props, link) {
+        if (!strcmp(prop->name, name))
+            return prop;
+    }
+    return NULL;
+}
+
+const char *tree_get_prop_s(const tree *node, const char *name)
+{
+    tree_prop *prop = tree_get_prop(node, name);
+    if (!prop
+        || memchr(prop->val, 0, prop->sz) != prop->val + prop->sz - 1) {
+        errno = EINVAL;
+        return NULL;
+    }
+    return prop->val;
+}
+
+const char *tree_prop_name(const tree_prop *prop)
+{
+    return prop->name;
+}
+
+const void *tree_prop_value(const tree_prop *prop, size_t *size)
+{
+    if (size)
+        *size = prop->sz;
+    return prop->val;
+}
+
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz)
+{
+    tree_prop *prop;
+
+    prop = tree_get_prop(node, name);
+    if (!prop) {
+        prop = qemu_malloc(sizeof(*prop));
+        prop->name = name;
+        prop->owner = node;
+        LIST_INSERT_HEAD(&node->props, prop, link);
+    }
+    /* FIXME need a destructor for val */
+    prop->val = val;
+    prop->sz = sz;
+}
+
+void tree_put_propf(tree *node, const char *name, const char *fmt, ...)
+{
+    va_list ap;
+    size_t len;
+    char *buf;
+
+    va_start(ap, fmt);
+    len = vsnprintf(NULL, 0, fmt, ap);
+    va_end(ap);
+
+    buf = qemu_malloc(len + 1);
+    va_start(ap, fmt);
+    vsnprintf(buf, len + 1, fmt, ap);
+    va_end(ap);
+
+    tree_put_prop(node, name, buf, len + 1);
+}
+
+void tree_put_user(tree *node, void *user)
+{
+    node->user = user;
+}
+
+void *tree_get_user(const tree *node)
+{
+    return node->user;
+}
+
+tree *tree_parent(const tree *node)
+{
+    return node->parent;
+}
+
+tree *tree_first_child(const tree *node)
+{
+    return TAILQ_FIRST(&node->children);
+}
+
+tree *tree_sibling(const tree *node)
+{
+    return TAILQ_NEXT(node, siblings);
+}
+
+int tree_path(const tree *node, char *buf, size_t bufsz)
+{
+    char *p;
+    const tree *np;
+    size_t len, res;
+
+    p = buf + bufsz;
+    res = 0;
+    for (np = node; np->parent; np = np->parent) {
+        len = 1 + strlen(np->name);
+        res += len;
+        if (res >= bufsz)
+            continue;
+        p -= len;
+        memcpy(p + 1, np->name, len - 1);
+        p[0] = '/';
+    }
+
+    if (res == 0) {
+        if (++res < bufsz)
+            *--p = '/';
+    }
+
+    if (res < bufsz) {
+        memcpy(buf, p, res);
+        buf[res] = 0;
+    }
+
+    return res;
+}
+
+static void tree_print_sub(const tree *node, int indent)
+{
+    int i, use_str, sep;
+    const unsigned char *pv;
+    tree_prop *prop;
+    tree *child;
+
+    printf("%*s%s {\n", indent, "", node->parent ? node->name : "/");
+    LIST_FOREACH(prop, &node->props, link) {
+        printf("%*s%s", indent + 4, "", prop->name);
+        pv = prop->val;
+        if (pv) {
+            printf(" = ");
+            use_str = pv[prop->sz - 1] == 0;
+            for (i = 0; i < prop->sz - 1; i++) {
+                if (!isprint(pv[i]))
+                    use_str = 0;
+            }
+            if (use_str)
+                printf("\"%s\"", (const char *)prop->val);
+            else {
+                sep = '[';
+                for (i = 0; i < prop->sz; i++) {
+                    printf("%c%02x", sep, pv[i]);
+                    sep = ' ';
+                }
+                printf("]");
+            }
+        }
+        printf(";\n");
+    }
+    TAILQ_FOREACH(child, &node->children, siblings)
+        tree_print_sub(child, indent + 4);
+    printf("%*s};\n", indent, "");
+}
+
+void tree_print(const tree *node)
+{
+    tree_print_sub(node, 0);
+}
diff --git a/tree.h b/tree.h
new file mode 100644
index 0000000..3f3b367
--- /dev/null
+++ b/tree.h
@@ -0,0 +1,41 @@
+#ifndef TREE_H
+#define TREE_H
+
+#include <stddef.h>
+
+typedef struct tree tree;
+typedef struct tree_prop tree_prop;
+
+tree *tree_new_child(tree *parent, const char *name, void *user);
+void tree_insert(tree *parent, tree *child);
+const char *tree_node_name(const tree *node);
+tree *tree_node_by_name(const tree *node,
+                        const char *name);
+
+tree_prop *tree_first_prop(const tree *node);
+tree_prop *tree_next_prop(const tree_prop *prop);
+#define TREE_FOREACH_PROP(var, node) \
+    for (var = tree_first_prop(node); var; var = tree_next_prop(var))
+tree_prop *tree_get_prop(const tree *node, const char *name);
+const char *tree_get_prop_s(const tree *node, const char *name);
+const char *tree_prop_name(const tree_prop *prop);
+const void *tree_prop_value(const tree_prop *prop, size_t *size);
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz);
+void tree_put_propf(tree *node, const char *name,
+                    const char *fmt, ...)
+    __attribute__((format(printf,3,4)));
+
+void tree_put_user(tree *node, void *user);
+void *tree_get_user(const tree *node);
+
+tree *tree_parent(const tree *node);
+tree *tree_first_child(const tree *node);
+tree *tree_sibling(const tree *node);
+#define TREE_FOREACH_CHILD(var, node) \
+    for (var = tree_first_child(node); var; var = tree_sibling(var))
+
+int tree_path(const tree *node, char *buf, size_t bufsz);
+void tree_print(const tree *node);
+
+#endif


* Re: [Qemu-devel] Machine description as data prototype, take 6
  2009-03-12 18:43 ` [Qemu-devel] Machine description as data prototype, take 6 " Markus Armbruster
@ 2009-03-17 16:06   ` Paul Brook
  2009-03-17 17:32     ` Markus Armbruster
  0 siblings, 1 reply; 146+ messages in thread
From: Paul Brook @ 2009-03-17 16:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Markus Armbruster

> * The memory driver is PC-specific.  It should be generic and
>   data-driven, but getting there isn't quite as easy as it sounds.
>   Memory (and sometimes even holes) need to be allocated in just the
>   right order to ensure guest physical address equals host offset for
>   certain memory ranges.

This is a bug elsewhere and should be fixed there.

> +/*
> + * Device life cycle:
> + *
> + * 1. Configuration: config() method runs after parent's.  It should
> + * initialize the device's private data from its configuration
> + * sub-tree.  It may edit the configuration sub-tree, and may declare
> + * initialization ordering constraints with tree_require_named().
> + * 2. Initialization: init() method runs after parent's and after that
> + * 3. Start: start() method runs, order is unspecified.

Feels like there's at least one too many callbacks here.

The "may edit the configuration sub-tree" also sounds wrong. Devices shouldn't 
be interacting with the config tree directly, they should just be 
requesting/exposing features. This should also mean we shouldn't need manual 
dependencies because all device interaction is explicit.

Possibly this is a bit confused because you've still got all the device code 
lumped in the same file. It's hard to identify hacks for PC bits you've not 
implemented yet, machine/device independent code, per-device code, and 
hardcoded tree generation in lieu of an actual config file reader. Using ugly 
wrappers round the legacy interface doesn't help, especially for PCI devices 
where we already have an abstraction layer.

Paul


* Re: [Qemu-devel] Machine description as data prototype, take 6
  2009-03-17 16:06   ` [Qemu-devel] Machine description as data prototype, take 6 Paul Brook
@ 2009-03-17 17:32     ` Markus Armbruster
  0 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-03-17 17:32 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul Brook <paul@codesourcery.com> writes:

>> * The memory driver is PC-specific.  It should be generic and
>>   data-driven, but getting there isn't quite as easy as it sounds.
>>   Memory (and sometimes even holes) need to be allocated in just the
>>   right order to ensure guest physical address equals host offset for
>>   certain memory ranges.
>
> This is a bug elsewhere and should be fixed there.

Yes, the proper fix is a better guest memory allocation interface.

>> +/*
>> + * Device life cycle:
>> + *
>> + * 1. Configuration: config() method runs after parent's.  It should
>> + * initialize the device's private data from its configuration
>> + * sub-tree.  It may edit the configuration sub-tree, and may declare
>> + * initialization ordering constraints with tree_require_named().
>> + * 2. Initialization: init() method runs after parent's and after that
[...]
>> + * 3. Start: start() method runs, order is unspecified.
>
> Feels like there's at least one too many callbacks here.

I didn't have start() initially.  But then I realized I need to run
i440fx_init() before any PCI device's init(), and
i440fx_init_memory_mappings() after the other devices have been
initialized.  If that's not the case, please tell me what I'm doing
wrong.
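
To make the ordering argument concrete, here is a minimal, self-contained
sketch of the three phases as separate passes over a toy device tree.  It
is not the prototype's code: toy_dev, toy_run_phase() and the log buffer
are invented for illustration.  The only point is that every init() has
finished before the first start() runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy device; not the prototype's dt_device. */
typedef struct toy_dev toy_dev;
struct toy_dev {
    const char *name;
    int has_start;              /* does this device have a start() hook? */
    toy_dev *child, *sibling;
};

enum toy_phase { TOY_CONFIG = 'c', TOY_INIT = 'i', TOY_START = 's' };

/* Append "<phase>:<name> " to log; the caller provides enough room. */
static void toy_log(char *log, int phase, const char *name)
{
    size_t n = strlen(log);

    log[n++] = (char)phase;
    log[n++] = ':';
    strcpy(log + n, name);
    strcat(log, " ");
}

/* Run one phase over the subtree at dev, parent before children. */
static void toy_run_phase(toy_dev *dev, int phase, char *log)
{
    toy_dev *c;

    /* config and init always run here; start only where provided */
    if (phase != TOY_START || dev->has_start)
        toy_log(log, phase, dev->name);
    for (c = dev->child; c; c = c->sibling)
        toy_run_phase(c, phase, log);
}

/* Each phase completes for all devices before the next one begins. */
static void toy_create_machine(toy_dev *root, char *log)
{
    log[0] = 0;
    toy_run_phase(root, TOY_CONFIG, log);
    toy_run_phase(root, TOY_INIT, log);
    toy_run_phase(root, TOY_START, log);
}
```

With a "pci" node that has a start hook and two children without one,
the pci start entry comes last in the log, after the init entries of
both children.  The sketch walks start() in tree order only because
that is the simplest choice; the life cycle comment leaves that order
unspecified.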

> The "may edit the configuration sub-tree" also sounds wrong. Devices shouldn't 
> be interacting with the config tree directly, they should just be 
> requesting/exposing features.

No config() method currently edits the tree.  It is, however, safe for
them to do so.  Whether they should is a separate question.

>                               This should also mean we shouldn't need manual 
> dependencies because all device interaction is explicit.

I fear I can't quite follow.  Could you elaborate?

When we extend the tree to cover interrupts, edges describing them could
perhaps replace the manual dependencies.
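
For illustration only, this is how initialization driven by explicit
dependency edges could look.  It is a generic depth-first ordering with
cycle detection, not taken from dt.c; toy_dev, its reqs[] array and
toy_init_ordered() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_REQS 4

/* Hypothetical toy device with explicit "initialize me after" edges. */
typedef struct toy_dev toy_dev;
struct toy_dev {
    const char *name;
    toy_dev *reqs[TOY_MAX_REQS];    /* NULL-terminated unless full */
    int visit;                      /* 0 = new, 1 = in progress, 2 = done */
};

/*
 * Initialize dev after everything it requires (depth-first).
 * Appends devices to order[] in initialization order.
 * Returns 0 on success, -1 on a dependency cycle.
 */
static int toy_init_ordered(toy_dev *dev, toy_dev *order[], int *n)
{
    int i;

    if (dev->visit == 2)
        return 0;                   /* already initialized */
    if (dev->visit == 1)
        return -1;                  /* we came back to ourselves: cycle */
    dev->visit = 1;
    for (i = 0; i < TOY_MAX_REQS && dev->reqs[i]; i++) {
        if (toy_init_ordered(dev->reqs[i], order, n) < 0)
            return -1;
    }
    dev->visit = 2;
    order[(*n)++] = dev;            /* a real driver would call init() here */
    return 0;
}
```

If interrupt edges were stored the same way as reqs[] here, a NIC
pointing at its interrupt controller would be initialized after it
without anybody declaring that by hand.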

> Possibly this is a bit confused because you've still got all the device code 
> lumped in the same file. It's hard to identify hacks for PC bits you've not 
> implemented yet,

Such hacks don't exist.  Command line options asking for unimplemented
devices are simply ignored.

>                  machine/device independent code,

That's elsewhere: dt.c.

>                                                   per-device code, and 

First three quarters of hw/pcdt.c, one device after the other.

> hardcoded tree generation in lieu of an actual config file reader.

Twelve lines of code in hw/pcdt.c.

There's also code to edit the tree according to the command line.
Required feature at this stage, in my opinion.  The device-independent
part is in dt.c (section "Dynamic Devices").  A few device-dependent
lines are left in pcdt.c, function dt_customize_config().
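
A self-contained sketch of that kind of edit: defaults are put into a
property list first, and values from the command line then replace
them.  The two-pass vsnprintf() follows the pattern of tree_put_propf();
the toy_prop list and toy_putf() are invented stand-ins, and unlike the
prototype this version frees the value it overrides.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy property store; the prototype uses tree_put_propf(). */
typedef struct toy_prop {
    const char *name;
    char *val;
    struct toy_prop *next;
} toy_prop;

static const char *toy_get(const toy_prop *props, const char *name)
{
    for (; props; props = props->next)
        if (!strcmp(props->name, name))
            return props->val;
    return NULL;
}

/* Set name to a printf-formatted value, replacing any previous value. */
static int toy_putf(toy_prop **props, const char *name, const char *fmt, ...)
{
    va_list ap;
    toy_prop *p;
    char *buf;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);      /* first pass: measure */
    va_end(ap);
    if (len < 0 || !(buf = malloc((size_t)len + 1)))
        return -1;
    va_start(ap, fmt);
    vsnprintf(buf, (size_t)len + 1, fmt, ap); /* second pass: format */
    va_end(ap);

    for (p = *props; p; p = p->next)
        if (!strcmp(p->name, name))
            break;
    if (!p) {
        if (!(p = malloc(sizeof(*p)))) {
            free(buf);
            return -1;
        }
        p->name = name;
        p->val = NULL;
        p->next = *props;
        *props = p;
    }
    free(p->val);                           /* drop the overridden value */
    p->val = buf;
    return 0;
}
```

Putting "model" twice, first with the default and then with the value
from -cpu, leaves only the second value visible to toy_get().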

>                                                                    Using ugly 
> wrappers round the legacy interface doesn't help,

Nobody likes wrappers.  Yet I feel they are a necessity at this stage.
Before we can replace existing interfaces wholesale, we need to figure
out what to replace them with.  That's the purpose of this prototype.
Moreover, we have quite a few devices, most of which I can't test.
Making *me* change their interfaces is a recipe for unnecessary breakage
and churn.

Once we've convinced ourselves that the new interface is satisfactory,
we can merge the wrappers into the wrappees.  I don't think we've
reached that point yet.

>                                                   especially for PCI devices 
> where we already have an abstraction layer.

If there is an abstraction layer covering initialization of PCI devices,
I must have missed it.  Of the ones I implemented so far, every single
one wants to be initialized in its own idiosyncratic way.


* [Qemu-devel] Re: [RFC] Machine description as data
  2009-02-11 15:40 [Qemu-devel] [RFC] Machine description as data Markus Armbruster
                   ` (7 preceding siblings ...)
  2009-03-12 18:43 ` [Qemu-devel] Machine description as data prototype, take 6 " Markus Armbruster
@ 2009-03-23 15:50 ` Markus Armbruster
  2009-03-23 15:53   ` Markus Armbruster
  2009-03-31  9:16 ` Markus Armbruster
  2009-04-17 16:04 ` Markus Armbruster
  10 siblings, 1 reply; 146+ messages in thread
From: Markus Armbruster @ 2009-03-23 15:50 UTC (permalink / raw)
  To: qemu-devel

Seventh iteration of the prototype.  Work in progress, not quite ready for
merging.

New:

* New machine creation interface.  Machine types implementing the new
  interface provide QEMUMachine member drvtab[] instead of init().  When
  main() detects that, it builds machine configuration and passes it to
  (device-independent) dt_create_machine() instead of calling
  (device-dependent) init().

* Virtio block, balloon & console.

* Rebased to git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6875 c046a42c-6fe2-441c-8c8c-71466251a162
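
The dispatch described in the first item (drvtab[] instead of init())
can be sketched roughly as follows.  The types are invented stand-ins:
the real QEMUMachine has more members, and the real init() and
dt_create_machine() take different arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real QEMUMachine and dt_driver differ. */
typedef struct toy_driver { const char *name; } toy_driver;

typedef struct toy_machine {
    const char *name;
    void (*init)(int *calls);       /* old interface: device-dependent code */
    const toy_driver *drvtab;       /* new interface: driver table (data) */
} toy_machine;

static int toy_generic_calls;       /* counts calls to the generic builder */

/* Stand-in for the device-independent machine builder. */
static void toy_create_machine(const toy_driver drvtab[])
{
    int i;

    for (i = 0; drvtab[i].name; i++)
        ;                           /* a real builder instantiates devices */
    toy_generic_calls++;
}

/* What main() would do: prefer drvtab[] when the machine provides it. */
static void toy_start_machine(const toy_machine *m, int *legacy_calls)
{
    if (m->drvtab)
        toy_create_machine(m->drvtab);
    else
        m->init(legacy_calls);
}

/* Stand-in for an old-style, device-dependent init function. */
static void toy_legacy_init(int *calls)
{
    (*calls)++;
}
```

A machine that sets only init keeps working unchanged, so machine types
can be converted one at a time.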

Not in, but not forgotten either:

* A few more renames suggested by reviewers.

* Reduce unnecessary differences to IEEE 1275 trees.

Shortcuts:

* No support for systems without PCI bus.

* I didn't implement all the devices of the "pc" original.  Missing:
  - Option ROMs
  - Audio

* Command line options -usb and -no-acpi have no effect; both USB and
  ACPI are always enabled.

* The configuration tree is simplistic.  I expect it to evolve, and I
  wouldn't exclude the possibility of wholesale replacement.

* The initial configuration tree is hardcoded in dt_read_config().  It
  should be read from a configuration file.

* A bus is identified by its kind and number.  The bus number depends on
  its position in the tree.  Means for position-independent addressing
  would be nice.

* The interface to the shared code in hw/pc.c (hw/pcint.h) is rather
  crude.

* The memory driver is PC-specific.  It should be generic and
  data-driven, but getting there isn't quite as easy as it sounds.
  Memory (and sometimes even holes) needs to be allocated in just the
  right order to ensure guest physical address equals host offset for
  certain memory ranges.  I feel the proper way to address this is a
  better guest memory allocation interface.

* The pc-misc driver should most probably be split up some.
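
To show why a bus number depends on tree position (the "kind and number"
shortcut above), here is a self-contained sketch of the lookup.  The
walk is modelled on dt_do_find_bus(); the toy_node type and the bus
kinds are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; only the walk mirrors dt_do_find_bus(). */
enum toy_bus { TOY_BUS_NONE, TOY_BUS_PCI, TOY_BUS_IDE };

typedef struct toy_node toy_node;
struct toy_node {
    const char *name;
    enum toy_bus bus;           /* kind of bus this node provides, if any */
    toy_node *child, *sibling;
};

/*
 * Pre-order walk: return the node providing the (*skip)-th bus of the
 * given kind, counting from 0, or NULL if there are not that many.
 */
static toy_node *toy_do_find_bus(toy_node *n, enum toy_bus kind, int *skip)
{
    toy_node *c, *found;

    if (n->bus == kind && (*skip)-- == 0)
        return n;
    for (c = n->child; c; c = c->sibling) {
        found = toy_do_find_bus(c, kind, skip);
        if (found)
            return found;
    }
    return NULL;
}

static toy_node *toy_find_bus(toy_node *root, enum toy_bus kind, int busno)
{
    return toy_do_find_bus(root, kind, &busno);
}
```

With two IDE controllers below one PCI bus, "IDE bus 0" is simply
whichever controller comes first among the siblings.  Swap the two
nodes and every drive addressed by bus number moves with them, which is
the reason position-independent addressing would be nice.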

Bugs:

* hw/ppce500_mpc8544ds.c doesn't compile when I configure with fdt
  support.

* If I configure both a virtio block device and a virtio console, the
  Linux guest kernel hangs.  The same happens when I move virtio code in
  pc.c in an otherwise unmodified QEMU so that balloon and console are
  initialized earlier.


 Makefile              |    1 
 Makefile.target       |    4 
 dt.c                  |  779 ++++++++++++++++++++++++++++++++++++++++++++++++++
 dt.h                  |  115 +++++++
 hw/boards.h           |    6 
 hw/pc.c               |   47 +--
 hw/pcdt.c             |  677 +++++++++++++++++++++++++++++++++++++++++++
 hw/pci.h              |    2 
 hw/pcint.h            |   46 ++
 hw/vmware_vga.c       |    4 
 net.c                 |    2 
 net.h                 |    1 
 qemu-common.h         |    1 
 target-i386/machine.c |    1 
 tree.c                |  285 ++++++++++++++++++
 tree.h                |   41 ++
 vl.c                  |   15 
 17 files changed, 1992 insertions(+), 35 deletions(-)


/*
 * QEMU PC System Emulator
 *
 * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
 * Copyright (c) 2003-2004 Fabrice Bellard
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation; either version 2 of
 * the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 */

/*
 * Configure and build a machine from configuration data
 *
 * This is generic, device-independent code driven by device-dependent
 * configuration data, talking to devices through an abstract device
 * interface.
 *
 * Machine types using it implement QEMUMachine member drvtab[]
 * instead of member init().  See hw/pcdt.c for an example.
 */

#include <assert.h>
#include "block.h"
#include "cpu.h"
#include "dt.h"
#include "net.h"
#include "tree.h"
#include "sysemu.h"

#ifdef HAVE_FDT
#include <libfdt.h>
#endif

/* Forward declarations */
static void dt_parse_prop(dt_device *dev, tree_prop *prop);
static void dt_add_dyn_devs(tree *conf, dt_host *host,
                            const dt_driver drvtab[], int vga_ram_size);
static void dt_fdt_test(tree *conf);


/* Host Configuration */

struct dt_host {
    /* connection NIC <-> VLAN */
    int nics;
    tree *nic[MAX_NICS];
    VLANState *nic_vlan[MAX_NICS];
    /* connection drive <-> block driver state */
    int drives;
    int virtio_drives;
    tree *drive[MAX_DRIVES];
    BlockDriverState *drive_state[MAX_DRIVES];
};

static void dt_attach_nic(dt_host *host, tree *nic, VLANState *vlan)
{
    assert(host->nics < MAX_NICS);
    host->nic[host->nics] = nic;
    host->nic_vlan[host->nics] = vlan;
    host->nics++;
}

VLANState *dt_find_vlan(tree *conf, dt_host *host)
{
    int i;

    for (i = 0; i < host->nics; i++) {
        if (host->nic[i] == conf)
            return host->nic_vlan[i];
    }
    return NULL;
}

static void dt_attach_drive(dt_host *host, tree *node, BlockDriverState *state)
{
    assert(host->drives < MAX_DRIVES);
    host->drive[host->drives] = node;
    host->drive_state[host->drives] = state;
    host->drives++;
}

void dt_find_drives(tree *conf, dt_host *host,
                    BlockDriverState *drive[], int n)
{
    int i, unit;

    memset(drive, 0, n * sizeof(drive[0]));

    for (i = 0; i < host->drives; i++) {
        if (tree_parent(host->drive[i]) != conf)
            continue;
        unit = dt_get_unit(dt_device_of(host->drive[i]));
        assert(unit < n && !drive[unit]);
        drive[unit] = host->drive_state[i];
    }
}

static void dt_print_host_config(dt_host *host)
{
    char buf[1024];
    int i;

    for (i = 0; i < host->nics; i++) {
        if (!host->nic[i])
            continue;
        tree_path(host->nic[i], buf, sizeof(buf));
        printf("nic#%d\tvlan %-4d\t%s\n",
               i, host->nic_vlan[i]->id, buf);
    }

    for (i = 0; i < host->drives; i++) {
        tree_path(host->drive[i], buf, sizeof(buf));
        printf("drive#%d\t%-15s %s\n",
               i, bdrv_get_device_name(host->drive_state[i]), buf);
    }
}


/* Device Interface */

static const dt_driver *dt_driver_by_name(const char *name,
                                          const dt_driver drvtab[])
{
    int i;

    for (i = 0; drvtab[i].name; i++) {
        if (!strcmp(name, drvtab[i].name))
            return &drvtab[i];
    }
    return NULL;
}

dt_device *dt_device_of(tree *conf)
{
    return tree_get_user(conf);
}

dt_device *dt_parent_device(dt_device *dev)
{
    tree *p = tree_parent(dev->conf);

    return p ? dt_device_of(p) : NULL;
}

static dt_device *dt_do_find_bus(tree *conf, dt_bus_type bus_type, int *skip)
{
    dt_device *dev;
    tree *child;

    dev = dt_device_of(conf);
    if (dev->drv->bus_type == bus_type && (*skip)-- == 0)
        return dev;

    TREE_FOREACH_CHILD(child, conf) {
        dev = dt_do_find_bus(child, bus_type, skip);
        if (dev)
            return dev;
    }

    return NULL;
}

static dt_device *dt_find_bus(tree *conf, dt_bus_type bus_type, int busno)
{
    return dt_do_find_bus(conf, bus_type, &busno);
}

PCIBus *dt_get_pcibus(dt_device *dev)
{
    dt_device *bus = dt_parent_device(dev);

    return bus->drv->get_pcibus(bus);
}

int dt_get_unit(dt_device *dev)
{
    return dev->drv->get_unit(dev);
}

static dt_device *dt_new_device(tree *conf, const dt_driver *drv)
{
    dt_device *dev;
    tree_prop *prop;

    dev = qemu_malloc(sizeof(*dev));
    dev->conf = conf;
    dev->drv = drv;
    LIST_INIT(&dev->reqs);
    dev->visit = 0;
    dev->priv = qemu_malloc(drv->privsz);
    tree_put_user(conf, dev);

    TREE_FOREACH_PROP(prop, conf)
        dt_parse_prop(dev, prop);

    return dev;
}

static dt_device *dt_create(tree *conf, const dt_driver drvtab[])
{
    const dt_driver *drv;
    dt_device *dev;
    tree *child;

    drv = dt_driver_by_name(tree_node_name(conf), drvtab);
    if (!drv) {
        fprintf(stderr, "No driver for device %s\n",
                tree_node_name(conf));
        exit(1);
    }

    assert((drv->bus_type == DT_BUS_PCI) == (drv->get_pcibus != NULL));

    dev = dt_new_device(conf, drv);

    TREE_FOREACH_CHILD(child, conf)
        dt_create(child, drvtab);

    return dev;
}

static void dt_config(tree *conf, dt_host *host)
{
    dt_device *dev = dt_device_of(conf);
    dt_device *bus = dt_parent_device(dev);
    tree *child;

    if (dev->drv->parent_bus_type == DT_BUS_NONE
        ? bus != NULL
        : bus == NULL || bus->drv->bus_type != dev->drv->parent_bus_type) {
        fprintf(stderr, "Device %s is not on a suitable bus\n",
                dev->drv->name);
        exit(1);
    }

    if (dev->drv->config)
        dev->drv->config(dev, host);

    TREE_FOREACH_CHILD(child, conf)
        dt_config(child, host);
}

tree *dt_require_named(dt_device *dev, const char *reqname)
{
    dt_tree_list *l = qemu_malloc(sizeof(*l));

    l->conf = tree_node_by_name(dev->conf, reqname);
    LIST_INSERT_HEAD(&dev->reqs, l, link);
    return l->conf;
}

static void dt_do_visit(dt_device *dev,
                        void (*fun)(dt_device *, void *arg),
                        void *arg, int visit)
{
    dt_device *parent, *req, *child;
    dt_tree_list *l;
    tree *k;

    assert(dev->visit < visit - 1);
    dev->visit = visit - 1;
    parent = dt_parent_device(dev);
    if (parent && parent->visit < visit)
        dt_do_visit(parent, fun, arg, visit);
    LIST_FOREACH(l, &dev->reqs, link) {
        req = dt_device_of(l->conf);
        if (req->visit < visit)
            dt_do_visit(req, fun, arg, visit);
    }
    dev->visit = visit;
    fun(dev, arg);
    TREE_FOREACH_CHILD(k, dev->conf) {
        child = dt_device_of(k);
        if (child->visit < visit - 1)
            dt_do_visit(child, fun, arg, visit);
    }
}

static void dt_visit(tree *node,
                     void (*fun)(dt_device *, void *arg),
                     void *arg)
{
    static int visit;

    visit += 2;
    dt_do_visit(dt_device_of(node), fun, arg, visit);
}

static void dt_init_visitor(dt_device *dev, void *arg)
{
    if (dev->drv->init)
        dev->drv->init(dev);
}

static void dt_init(tree *conf)
{
    dt_visit(conf, dt_init_visitor, NULL);
}

static void dt_start(tree *conf)
{
    dt_device *dev = dt_device_of(conf);
    tree *child;

    if (dev && dev->drv->start)
        dev->drv->start(dev);

    TREE_FOREACH_CHILD(child, conf)
        dt_start(child);
}

void dt_create_machine(tree *conf)
{
    dt_fdt_test(conf);
    dt_init(conf);
    dt_start(conf);
}


/* Device properties */

static const dt_prop_spec *dt_prop_spec_by_name(const dt_driver *drv,
                                                const char *name)
{
    const dt_prop_spec *spec;

    for (spec = drv->prop_spec; spec && spec->name; spec++) {
        if (!strcmp(spec->name, name))
            return spec;
    }
    return NULL;
}

static void dt_parse_prop(dt_device *dev, tree_prop *prop)
{
    const char *name = tree_prop_name(prop);
    size_t size;
    const char *val = tree_prop_value(prop, &size);
    const dt_prop_spec *spec = dt_prop_spec_by_name(dev->drv, name);

    if (!spec) {
        fprintf(stderr, "A %s device has no property %s\n",
                dev->drv->name, name);
        exit(1);
    }

    if (memchr(val, 0, size) != val + size - 1
        || spec->parse((char *)dev->priv + spec->offs, val, spec) < 0) {
        fprintf(stderr, "Bad value %.*s for property %s of device %s\n",
                (int)size, val, name, dev->drv->name);
        exit(1);
    }
}

int dt_parse_string(void *dst, const char *src, const dt_prop_spec *spec)
{
    assert(spec->size == sizeof(char *));
    *(const char **)dst = src;
    return 0;
}

int dt_parse_int(void *dst, const char *src, const dt_prop_spec *spec)
{
    char *ep;
    long val;

    assert(spec->size == sizeof(int));
    errno = 0;
    val = strtol(src, &ep, 0);
    if (*ep || ep == src || errno || (int)val != val)
        return -1;
    *(int *)dst = val;
    return 0;
}

int dt_parse_ram_addr_t(void *dst, const char *src, const dt_prop_spec *spec)
{
    char *ep;
    unsigned long val;

    assert(spec->size == sizeof(ram_addr_t));
    errno = 0;
    val = strtoul(src, &ep, 0);
    if (*ep || ep == src || errno || (ram_addr_t)val != val)
        return -1;
    *(ram_addr_t *)dst = val;
    return 0;
}

int dt_parse_macaddr(void *dst, const char *src, const dt_prop_spec *spec)
{
    assert(spec->size == 6);
    if (parse_macaddr(dst, src) < 0)
        return -1;
    return 0;
}


/* Dynamic Devices */

static void dt_add_dyn_dev(tree *conf, tree *node, const dt_driver drvtab[],
                           int busno)
{
    dt_device *dev = dt_create(node, drvtab);
    dt_device *bus = dt_find_bus(conf, dev->drv->parent_bus_type, busno);

    if (!bus) {
        fprintf(stderr, "No suitable bus for device %s\n", dev->drv->name);
        exit(1);
    }

    tree_insert(bus->conf, node);
}

static void dt_add_vga(tree *conf, const dt_driver drvtab[],
                       const char *model, int vga_ram_size)
{
    tree *node = tree_new_child(NULL, "vga", NULL);

    tree_put_propf(node, "model", "%s", model);
    tree_put_propf(node, "ram", "%#x", vga_ram_size);
    dt_add_dyn_dev(conf, node, drvtab, 0);
}

static void dt_add_virtio_console(tree *conf, const dt_driver drvtab[],
                                  int index)
{
    tree *node = tree_new_child(NULL, "virtio-console", NULL);

    tree_put_propf(node, "index", "%d", index);
    dt_add_dyn_dev(conf, node, drvtab, 0);
}

static void dt_add_nic(tree *conf, dt_host *host, const dt_driver drvtab[],
                       NICInfo *n)
{
    tree *node = tree_new_child(NULL, "nic", NULL);

    tree_put_propf(node, "mac", "%02x:%02x:%02x:%02x:%02x:%02x",
                   n->macaddr[0], n->macaddr[1], n->macaddr[2],
                   n->macaddr[3], n->macaddr[4], n->macaddr[5]);
    tree_put_propf(node, "model", "%s",
                   n->model ? n->model : "ne2k_pci");
    if (n->name)
        tree_put_propf(node, "name", "%s", n->name);
    dt_add_dyn_dev(conf, node, drvtab, 0);
    dt_attach_nic(host, node, n->vlan);
}

static void dt_add_scsi(tree *conf, const dt_driver drvtab[], int busno)
{
    tree *node = tree_new_child(NULL, "scsi", NULL);

    dt_add_dyn_dev(conf, node, drvtab, 0);
    assert(dt_find_bus(conf, DT_BUS_SCSI, busno)->conf == node);
}

static void dt_add_virtio_block(tree *conf, const dt_driver drvtab[],
                                int busno)
{
    tree *node = tree_new_child(NULL, "virtio-block", NULL);

    dt_add_dyn_dev(conf, node, drvtab, 0);
    assert(dt_find_bus(conf, DT_BUS_VIRTIO, busno)->conf == node);
}

static const char *block_if_name[] = {
    [IF_IDE] = "ide",
    [IF_SCSI] = "scsi",
    [IF_FLOPPY] = "floppy",
    [IF_PFLASH] = "pflash",
    [IF_MTD] = "mtd",
    [IF_SD] = "sd",
    [IF_VIRTIO] = "virtio",
};

static void dt_do_add_drive(tree *conf, dt_host *host,
                            const dt_driver drvtab[],
                            int bus_type, int busno, int unit,
                            BlockDriverState *bdrv)
{
    char buf[32];
    tree *node;

    snprintf(buf, sizeof(buf), "%s-drive", block_if_name[bus_type]);
    node = tree_new_child(NULL, strdup(buf), NULL);
    tree_put_propf(node, "unit", "%d", unit);
    dt_add_dyn_dev(conf, node, drvtab, busno);
    dt_attach_drive(host, node, bdrv);
}

static void dt_add_drive(tree *conf, dt_host *host, const dt_driver drvtab[],
                         DriveInfo *d)
{
    switch (d->type) {
    case IF_IDE:
        /* hack to hang all IDE drives off the same node for now */
        dt_do_add_drive(conf, host, drvtab,
                        d->type, 0, d->bus * MAX_IDE_DEVS + d->unit, d->bdrv);
        break;
    case IF_SCSI:
    case IF_FLOPPY:
        dt_do_add_drive(conf, host, drvtab,
                        d->type, d->bus, d->unit, d->bdrv);
        break;
    case IF_VIRTIO:
        /* See comment on virtio block in dt_add_dyn_devs() */
        dt_do_add_drive(conf, host, drvtab,
                        d->type, host->virtio_drives++, 0, d->bdrv);
        break;
    case IF_PFLASH:
    case IF_MTD:
    case IF_SD:
        /* TODO implement */
        fprintf(stderr, "Ignoring unimplemented drive %s\n",
                drives_opt[d->drive_opt_idx].opt);
        break;
    }
}

static void dt_add_dyn_devs(tree *conf, dt_host *host,
                            const dt_driver drvtab[], int vga_ram_size)
{
    int i, max_bus, busno;

    /* VGA */
    if (cirrus_vga_enabled || vmsvga_enabled || std_vga_enabled) {
        dt_add_vga(conf, drvtab,
                   cirrus_vga_enabled ? "cirrus"
                   : vmsvga_enabled ? "vms" : "std",
                   vga_ram_size);
    }

    /* Virtio consoles */
    for (i = 0; i < MAX_VIRTIO_CONSOLES; i++) {
        if (virtcon_hds[i])
            dt_add_virtio_console(conf, drvtab, i);
    }

    /* NICs */
    for (i = 0; i < nb_nics; i++)
        dt_add_nic(conf, host, drvtab, &nd_table[i]);

    /*
     * SCSI controllers
     *
     * This creates all controllers 0..max_bus, whether they have
     * drives or not.  Matches pc.c behavior.
     */
    max_bus = drive_get_max_bus(IF_SCSI);
    for (i = 0; i <= max_bus; i++)
        dt_add_scsi(conf, drvtab, i);

    /*
     * Virtio block controllers
     *
     * Each virtio drive is its own PCI device.  Since the device tree
     * should reflect that, we give each device its own virtio block
     * controller node.
     *
     * DriveInfo's bus and unit are a mess.  The user can specify any
     * bus or unit number.  An unspecified bus number defaults to
     * zero, and an unspecified unit number defaults to the first
     * unused one (see drive_init()).  pc.c silently ignores all
     * virtio drives with non-zero bus number, and all drives on bus
     * zero after the first unused unit number.  Instead of
     * replicating that questionable behavior, simply ignore bus and
     * unit for these drives.
     */
    busno = 0;
    for (i = 0; i < nb_drives; i++) {
        if (drives_table[i].type == IF_VIRTIO)
            dt_add_virtio_block(conf, drvtab, busno++);
    }

    /* Drives */
    for (i = 0; i < nb_drives; i++)
        dt_add_drive(conf, host, drvtab, &drives_table[i]);
}


/* Create a configuration */

tree *dt_read_config(const char *name)
{
#ifdef TARGET_X86_64
#define CPU_MODEL_DEFAULT "qemu64"
#else
#define CPU_MODEL_DEFAULT "qemu32"
#endif
    tree *root, *pci, *leaf;

    /*
     * TODO Read from config file.
     *
     * TODO Pretty far from a comprehensive machine configuration, but
     * we need to start somewhere.
     */
    if (strcmp(name, "pcdt")) {
        fprintf(stderr, "qemu: machine %s not implemented\n", name);
        exit(1);
    }
    root = tree_new_child(NULL, "", NULL);
    leaf = tree_new_child(root, "cpus", NULL);
    tree_put_propf(leaf, "model", "%s", CPU_MODEL_DEFAULT);
    leaf = tree_new_child(root, "memory", NULL);
    leaf = tree_new_child(root, "pc-misc", NULL);
    pci = tree_new_child(root, "pci", NULL);
    leaf = tree_new_child(pci, "piix3", NULL);
    leaf = tree_new_child(pci, "virtio-balloon", NULL);
    return root;
#undef CPU_MODEL_DEFAULT
}

/*
 * Extract configuration from arguments and various global variables
 * and put it into our machine configuration.
 */
void dt_modify_config(tree *conf,
                      const dt_driver drvtab[],
                      ram_addr_t ram_size, int vga_ram_size,
                      const char *boot_device,
                      const char *kernel_filename,
                      const char *kernel_cmdline,
                      const char *initrd_filename,
                      const char *cpu_model)
{
    /*
     * TODO This is still pretty cheesy: we insert stuff into the tree
     * at hardcoded places.  Replacing placeholders instead would be
     * more flexible.  Another idea is to mark certain parts of the
     * initial tree optional, and remove them here.
     */
    tree *node;
    dt_host host;

    tree_print(conf);

    node = tree_node_by_name(conf, "/cpus");
    tree_put_propf(node, "num", "%d", smp_cpus);
    if (cpu_model)
        tree_put_propf(node, "model", "%s", cpu_model);

    node = tree_node_by_name(conf, "/memory");
    tree_put_propf(node, "ram", "%#lx", (unsigned long)ram_size);

    node = tree_node_by_name(conf, "/pc-misc");
    tree_put_propf(node, "boot-device", "%s", boot_device);

    /* Unimplemented stuff */
    if (kernel_filename)
        abort();                /* TODO */

    dt_create(conf, drvtab);
    memset(&host, 0, sizeof(host));
    dt_add_dyn_devs(conf, &host, drvtab, vga_ram_size);
    dt_config(conf, &host);

    dt_print_host_config(&host);
    tree_print(conf);
}


/* Interfacing with FDT */

/*
 * Note: translation to FDT loses the association between
 * configuration tree nodes and devices.
 */

#ifdef HAVE_FDT

static int dt_fdt_chk(int res);
static void dt_subtree_to_fdt(const tree *conf, void *fdt);

static void *dt_tree_to_fdt(const tree *conf)
{
    int sz = 1024 * 1024;       /* FIXME arbitrary limit */
    void *fdt = qemu_malloc(sz);

    dt_fdt_chk(fdt_create(fdt, sz));
    dt_subtree_to_fdt(conf, fdt);
    dt_fdt_chk(fdt_finish(fdt));
    return fdt;
}

static void dt_subtree_to_fdt(const tree *conf, void *fdt)
{
    tree_prop *prop;
    tree *child;
    const void *pv;
    size_t sz;

    dt_fdt_chk(fdt_begin_node(fdt, tree_node_name(conf)));
    TREE_FOREACH_PROP(prop, conf) {
        pv = tree_prop_value(prop, &sz);
        dt_fdt_chk(fdt_property(fdt, tree_prop_name(prop), pv, sz));
    }
    TREE_FOREACH_CHILD(child, conf)
        dt_subtree_to_fdt(child, fdt);
    dt_fdt_chk(fdt_end_node(fdt));
}

static tree *dt_fdt_to_tree(const void *fdt)
{
    int offs, next, depth;
    uint32_t tag;
    struct fdt_property *prop;
    tree *stack[32];            /* FIXME arbitrary limit */

    stack[0] = NULL;            /* "parent" of root */
    next = depth = 0;

    for (;;) {
        offs = next;
        tag = fdt_next_tag(fdt, offs, &next);
        switch (tag) {
        case FDT_PROP:
            /*
             * libfdt apparently doesn't provide a way to get property
             * by offset, do it by hand
             */
            assert(0 < depth && depth < ARRAY_SIZE(stack));
            prop = (void *)((const char *)fdt + fdt_off_dt_struct(fdt) + offs);
            tree_put_prop(stack[depth],
                          fdt_string(fdt, fdt32_to_cpu(prop->nameoff)),
                          prop->data,
                          fdt32_to_cpu(prop->len));
            break;
        case FDT_NOP:
            break;
        case FDT_BEGIN_NODE:
            depth++;
            assert(0 < depth && depth < ARRAY_SIZE(stack));
            stack[depth] = tree_new_child(stack[depth-1],
                                          fdt_get_name(fdt, offs, NULL),
                                          NULL);
            break;
        case FDT_END_NODE:
            depth--;
            break;
        case FDT_END:
            dt_fdt_chk(next);
            return stack[1];
        }
    }
}

static int dt_fdt_chk(int res)
{
    if (res < 0) {
        fprintf(stderr, "%s\n", fdt_strerror(res)); /* FIXME cryptic */
        exit(1);
    }
    return res;
}

static void dt_fdt_test(tree *conf)
{
    void *fdt;

    fdt = dt_tree_to_fdt(conf);
    conf = dt_fdt_to_tree(fdt);
    tree_print(conf);
    qemu_free(fdt);
}
#else
static void dt_fdt_test(tree *conf) { }
#endif
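
A note on dt_do_visit() above, since its stamping scheme is easy to misread.
Each pass uses a fresh even stamp.  A device is marked stamp - 1 while its
parent and requirements are being visited, and stamp once its callback has
run.  Re-entering a device that is still marked stamp - 1 trips the assert,
which is how dependency cycles get caught.  Below is a minimal standalone
sketch of the same idea, not part of the attached source.  struct node,
add_kid() and demo() are made-up names for illustration only, and the
callback is replaced by appending the node name to a string.

```c
#include <assert.h>
#include <string.h>

#define MAX_LINKS 4

/* Hypothetical stand-in for dt_device; not the struct from dt.h */
struct node {
    const char *name;
    struct node *parent;
    struct node *req[MAX_LINKS];   /* extra dependencies */
    struct node *kid[MAX_LINKS];   /* children */
    int nreq, nkid;
    int visit;                     /* pass - 1: in progress, pass: done */
};

static char order[64];             /* visit order, for inspection */

static void do_visit(struct node *n, int pass)
{
    int i;

    assert(n->visit < pass - 1);   /* trips on a dependency cycle */
    n->visit = pass - 1;
    if (n->parent && n->parent->visit < pass)
        do_visit(n->parent, pass);
    for (i = 0; i < n->nreq; i++) {
        if (n->req[i]->visit < pass)
            do_visit(n->req[i], pass);
    }
    n->visit = pass;
    strcat(order, n->name);        /* stands in for fun(dev, arg) */
    strcat(order, " ");
    for (i = 0; i < n->nkid; i++) {
        if (n->kid[i]->visit < pass - 1)
            do_visit(n->kid[i], pass);
    }
}

static void add_kid(struct node *parent, struct node *kid)
{
    assert(parent->nkid < MAX_LINKS);
    kid->parent = parent;
    parent->kid[parent->nkid++] = kid;
}

/* Builds root -> {a, b} where a requires b, then visits from the root */
static const char *demo(void)
{
    struct node root = { "root" }, a = { "a" }, b = { "b" };

    add_kid(&root, &a);
    add_kid(&root, &b);
    a.req[a.nreq++] = &b;
    order[0] = '\0';
    do_visit(&root, 2);
    return order;
}
```

Calling demo() visits root first, then b (pulled in early because a requires
it), then a.  The later sibling loop skips b since it already carries the
current stamp.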


* [Qemu-devel] Re: [RFC] Machine description as data
  2009-03-23 15:50 ` [Qemu-devel] Re: [RFC] Machine description as data Markus Armbruster
@ 2009-03-23 15:53   ` Markus Armbruster
  0 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-03-23 15:53 UTC (permalink / raw)
  To: qemu-devel

Helps if I attach the patch instead of some random source file...
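
For readers skimming the patch: device properties are parsed through
per-driver spec tables, where each entry names a property, gives the offset
of its destination inside the driver's private struct, and supplies a parse
callback (see dt_parse_prop() and dt_prop_spec_by_name()).  Here is a
minimal standalone sketch of that mechanism, not part of the patch.
struct vga_priv, struct prop_spec, vga_props and set_prop() are made-up
names, and the real dt_prop_spec also carries a size field that this sketch
leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-device private state */
struct vga_priv {
    const char *model;
    int ram;
};

/* Hypothetical, simplified property spec: name, destination, parser */
struct prop_spec {
    const char *name;
    size_t offs;
    int (*parse)(void *dst, const char *src);
};

static int parse_string(void *dst, const char *src)
{
    *(const char **)dst = src;     /* borrows src, does not copy */
    return 0;
}

static int parse_int(void *dst, const char *src)
{
    char *ep;
    long val;

    errno = 0;
    val = strtol(src, &ep, 0);
    /* reject trailing junk, empty input, overflow, values beyond int */
    if (*ep || ep == src || errno || (int)val != val)
        return -1;
    *(int *)dst = (int)val;
    return 0;
}

static const struct prop_spec vga_props[] = {
    { "model", offsetof(struct vga_priv, model), parse_string },
    { "ram",   offsetof(struct vga_priv, ram),   parse_int },
    { NULL, 0, NULL }
};

/* Returns 0 on success, -1 for an unknown property or a bad value */
static int set_prop(void *priv, const struct prop_spec *specs,
                    const char *name, const char *val)
{
    const struct prop_spec *spec;

    assert(priv && specs);
    for (spec = specs; spec->name; spec++) {
        if (!strcmp(spec->name, name))
            return spec->parse((char *)priv + spec->offs, val);
    }
    return -1;
}
```

The point of the table is that generic code can fill in any driver's private
struct without knowing its layout.  Adding a property means adding a table
row, not another branch in the machine init function.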



diff --git a/Makefile b/Makefile
index 82fec80..04026db 100644
--- a/Makefile
+++ b/Makefile
@@ -85,6 +85,7 @@ OBJS+=sd.o ssi-sd.o
 OBJS+=bt.o bt-host.o bt-vhci.o bt-l2cap.o bt-sdp.o bt-hci.o bt-hid.o usb-bt.o
 OBJS+=buffered_file.o migration.o migration-tcp.o net.o qemu-sockets.o
 OBJS+=qemu-char.o aio.o net-checksum.o savevm.o cache-utils.o
+OBJS+=tree.o
 
 ifdef CONFIG_BRLAPI
 OBJS+= baum.o
diff --git a/Makefile.target b/Makefile.target
index 41366ee..e6815e5 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -505,6 +505,7 @@ OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o dma-helpers.o
 # need to fix this properly
 OBJS+=virtio.o virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o
 OBJS+=fw_cfg.o
+OBJS+=dt.o
 ifdef CONFIG_KVM
 OBJS+=kvm.o kvm-all.o
 endif
@@ -536,6 +537,7 @@ endif
 ifdef CONFIG_OSS
 LIBS += $(CONFIG_OSS_LIB)
 endif
+LIBS+= $(FDT_LIBS)
 
 SOUND_HW = sb16.o es1370.o ac97.o
 ifdef CONFIG_ADLIB
@@ -588,6 +590,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o ioapic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
+OBJS+= pcdt.o
 OBJS += device-hotplug.o pci-hotplug.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
@@ -611,7 +614,6 @@ OBJS+= ppc440.o ppc440_bamboo.o
 OBJS+= ppce500_pci.o ppce500_mpc8544ds.o
 ifdef FDT_LIBS
 OBJS+= device_tree.o
-LIBS+= $(FDT_LIBS)
 endif
 ifdef CONFIG_KVM
 OBJS+= kvm_ppc.o
diff --git a/dt.c b/dt.c
new file mode 100644
index 0000000..562deb3
--- /dev/null
+++ b/dt.c
@@ -0,0 +1,779 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * Configure and build a machine from configuration data
+ *
+ * This is generic, device-independent code driven by device-dependent
+ * configuration data, talking to devices through an abstract device
+ * interface.
+ *
+ * Machine types using it implement QEMUMachine member drvtab[]
+ * instead of member init().  See hw/pcdt.c for an example.
+ */
+
+#include <assert.h>
+#include "block.h"
+#include "cpu.h"
+#include "dt.h"
+#include "net.h"
+#include "tree.h"
+#include "sysemu.h"
+
+#ifdef HAVE_FDT
+#include <libfdt.h>
+#endif
+
+/* Forward declarations */
+static void dt_parse_prop(dt_device *dev, tree_prop *prop);
+static void dt_add_dyn_devs(tree *conf, dt_host *host,
+                            const dt_driver drvtab[], int vga_ram_size);
+static void dt_fdt_test(tree *conf);
+
+
+/* Host Configuration */
+
+struct dt_host {
+    /* connection NIC <-> VLAN */
+    int nics;
+    tree *nic[MAX_NICS];
+    VLANState *nic_vlan[MAX_NICS];
+    /* connection drive <-> block driver state */
+    int drives;
+    int virtio_drives;
+    tree *drive[MAX_DRIVES];
+    BlockDriverState *drive_state[MAX_DRIVES];
+};
+
+static void dt_attach_nic(dt_host *host, tree *nic, VLANState *vlan)
+{
+    assert(host->nics < MAX_NICS);
+    host->nic[host->nics] = nic;
+    host->nic_vlan[host->nics] = vlan;
+    host->nics++;
+}
+
+VLANState *dt_find_vlan(tree *conf, dt_host *host)
+{
+    int i;
+
+    for (i = 0; i < host->nics; i++) {
+        if (host->nic[i] == conf)
+            return host->nic_vlan[i];
+    }
+    return NULL;
+}
+
+static void dt_attach_drive(dt_host *host, tree *node, BlockDriverState *state)
+{
+    assert(host->drives < MAX_DRIVES);
+    host->drive[host->drives] = node;
+    host->drive_state[host->drives] = state;
+    host->drives++;
+}
+
+void dt_find_drives(tree *conf, dt_host *host,
+                    BlockDriverState *drive[], int n)
+{
+    int i, unit;
+
+    memset(drive, 0, n * sizeof(drive[0]));
+
+    for (i = 0; i < host->drives; i++) {
+        if (tree_parent(host->drive[i]) != conf)
+            continue;
+        unit = dt_get_unit(dt_device_of(host->drive[i]));
+        assert(unit < n && !drive[unit]);
+        drive[unit] = host->drive_state[i];
+    }
+}
+
+static void dt_print_host_config(dt_host *host)
+{
+    char buf[1024];
+    int i;
+
+    for (i = 0; i < host->nics; i++) {
+        if (!host->nic[i])
+            continue;
+        tree_path(host->nic[i], buf, sizeof(buf));
+        printf("nic#%d\tvlan %-4d\t%s\n",
+               i, host->nic_vlan[i]->id, buf);
+    }
+
+    for (i = 0; i < host->drives; i++) {
+        tree_path(host->drive[i], buf, sizeof(buf));
+        printf("drive#%d\t%-15s %s\n",
+               i, bdrv_get_device_name(host->drive_state[i]), buf);
+    }
+}
+
+
+/* Device Interface */
+
+static const dt_driver *dt_driver_by_name(const char *name,
+                                          const dt_driver drvtab[])
+{
+    int i;
+
+    for (i = 0; drvtab[i].name; i++) {
+        if (!strcmp(name, drvtab[i].name))
+            return &drvtab[i];
+    }
+    return NULL;
+}
+
+dt_device *dt_device_of(tree *conf)
+{
+    return tree_get_user(conf);
+}
+
+dt_device *dt_parent_device(dt_device *dev)
+{
+    tree *p = tree_parent(dev->conf);
+
+    return p ? dt_device_of(p) : NULL;
+}
+
+static dt_device *dt_do_find_bus(tree *conf, dt_bus_type bus_type, int *skip)
+{
+    dt_device *dev;
+    tree *child;
+
+    dev = dt_device_of(conf);
+    if (dev->drv->bus_type == bus_type && (*skip)-- == 0)
+        return dev;
+
+    TREE_FOREACH_CHILD(child, conf) {
+        dev = dt_do_find_bus(child, bus_type, skip);
+        if (dev)
+            return dev;
+    }
+
+    return NULL;
+}
+
+static dt_device *dt_find_bus(tree *conf, dt_bus_type bus_type, int busno)
+{
+    return dt_do_find_bus(conf, bus_type, &busno);
+}
+
+PCIBus *dt_get_pcibus(dt_device *dev)
+{
+    dt_device *bus = dt_parent_device(dev);
+
+    return bus->drv->get_pcibus(bus);
+}
+
+int dt_get_unit(dt_device *dev)
+{
+    return dev->drv->get_unit(dev);
+}
+
+static dt_device *dt_new_device(tree *conf, const dt_driver *drv)
+{
+    dt_device *dev;
+    tree_prop *prop;
+
+    dev = qemu_malloc(sizeof(*dev));
+    dev->conf = conf;
+    dev->drv = drv;
+    LIST_INIT(&dev->reqs);
+    dev->visit = 0;
+    dev->priv = qemu_malloc(drv->privsz);
+    tree_put_user(conf, dev);
+
+    TREE_FOREACH_PROP(prop, conf)
+        dt_parse_prop(dev, prop);
+
+    return dev;
+}
+
+static dt_device *dt_create(tree *conf, const dt_driver drvtab[])
+{
+    const dt_driver *drv;
+    dt_device *dev;
+    tree *child;
+
+    drv = dt_driver_by_name(tree_node_name(conf), drvtab);
+    if (!drv) {
+        fprintf(stderr, "No driver for device %s\n",
+                tree_node_name(conf));
+        exit(1);
+    }
+
+    assert((drv->bus_type == DT_BUS_PCI) == (drv->get_pcibus != NULL));
+
+    dev = dt_new_device(conf, drv);
+
+    TREE_FOREACH_CHILD(child, conf)
+        dt_create(child, drvtab);
+
+    return dev;
+}
+
+static void dt_config(tree *conf, dt_host *host)
+{
+    dt_device *dev = dt_device_of(conf);
+    dt_device *bus = dt_parent_device(dev);
+    tree *child;
+
+    if (dev->drv->parent_bus_type == DT_BUS_NONE
+        ? bus != NULL
+        : bus == NULL || bus->drv->bus_type != dev->drv->parent_bus_type) {
+        fprintf(stderr, "Device %s is not on a suitable bus\n",
+                dev->drv->name);
+        exit(1);
+    }
+
+    if (dev->drv->config)
+        dev->drv->config(dev, host);
+
+    TREE_FOREACH_CHILD(child, conf)
+        dt_config(child, host);
+}
+
+tree *dt_require_named(dt_device *dev, const char *reqname)
+{
+    dt_tree_list *l = qemu_malloc(sizeof(*l));
+
+    l->conf = tree_node_by_name(dev->conf, reqname);
+    LIST_INSERT_HEAD(&dev->reqs, l, link);
+    return l->conf;
+}
+
+static void dt_do_visit(dt_device *dev,
+                        void (*fun)(dt_device *, void *arg),
+                        void *arg, int visit)
+{
+    dt_device *parent, *req, *child;
+    dt_tree_list *l;
+    tree *k;
+
+    assert(dev->visit < visit - 1);
+    dev->visit = visit - 1;
+    parent = dt_parent_device(dev);
+    if (parent && parent->visit < visit)
+        dt_do_visit(parent, fun, arg, visit);
+    LIST_FOREACH(l, &dev->reqs, link) {
+        req = dt_device_of(l->conf);
+        if (req->visit < visit)
+            dt_do_visit(req, fun, arg, visit);
+    }
+    dev->visit = visit;
+    fun(dev, arg);
+    TREE_FOREACH_CHILD(k, dev->conf) {
+        child = dt_device_of(k);
+        if (child->visit < visit - 1)
+            dt_do_visit(child, fun, arg, visit);
+    }
+}
+
+static void dt_visit(tree *node,
+                     void (*fun)(dt_device *, void *arg),
+                     void *arg)
+{
+    static int visit;
+
+    visit += 2;
+    dt_do_visit(dt_device_of(node), fun, arg, visit);
+}
+
+static void dt_init_visitor(dt_device *dev, void *arg)
+{
+    if (dev->drv->init)
+        dev->drv->init(dev);
+}
+
+static void dt_init(tree *conf)
+{
+    dt_visit(conf, dt_init_visitor, NULL);
+}
+
+static void dt_start(tree *conf)
+{
+    dt_device *dev = dt_device_of(conf);
+    tree *child;
+
+    if (dev && dev->drv->start)
+        dev->drv->start(dev);
+
+    TREE_FOREACH_CHILD(child, conf)
+        dt_start(child);
+}
+
+void dt_create_machine(tree *conf)
+{
+    dt_fdt_test(conf);
+    dt_init(conf);
+    dt_start(conf);
+}
+
+
+/* Device properties */
+
+static const dt_prop_spec *dt_prop_spec_by_name(const dt_driver *drv,
+                                                const char *name)
+{
+    const dt_prop_spec *spec;
+
+    for (spec = drv->prop_spec; spec && spec->name; spec++) {
+        if (!strcmp(spec->name, name))
+            return spec;
+    }
+    return NULL;
+}
+
+static void dt_parse_prop(dt_device *dev, tree_prop *prop)
+{
+    const char *name = tree_prop_name(prop);
+    size_t size;
+    const char *val = tree_prop_value(prop, &size);
+    const dt_prop_spec *spec = dt_prop_spec_by_name(dev->drv, name);
+
+    if (!spec) {
+        fprintf(stderr, "A %s device has no property %s\n",
+                dev->drv->name, name);
+        exit(1);
+    }
+
+    if (memchr(val, 0, size) != val + size - 1
+        || spec->parse((char *)dev->priv + spec->offs, val, spec) < 0) {
+        fprintf(stderr, "Bad value %.*s for property %s of device %s\n",
+                (int)size, val, name, dev->drv->name);
+        exit(1);
+    }
+}
+
+int dt_parse_string(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    assert(spec->size == sizeof(char *));
+    *(const char **)dst = src;
+    return 0;
+}
+
+int dt_parse_int(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    char *ep;
+    long val;
+
+    assert(spec->size == sizeof(int));
+    errno = 0;
+    val = strtol(src, &ep, 0);
+    if (*ep || ep == src || errno || (int)val != val)
+        return -1;
+    *(int *)dst = val;
+    return 0;
+}
+
+int dt_parse_ram_addr_t(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    char *ep;
+    unsigned long val;
+
+    assert(spec->size == sizeof(ram_addr_t));
+    errno = 0;
+    val = strtoul(src, &ep, 0);
+    if (*ep || ep == src || errno || (ram_addr_t)val != val)
+        return -1;
+    *(ram_addr_t *)dst = val;
+    return 0;
+}
+
+int dt_parse_macaddr(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    assert(spec->size == 6);
+    if (parse_macaddr(dst, src) < 0)
+        return -1;
+    return 0;
+}
+
+
+/* Dynamic Devices */
+
+static void dt_add_dyn_dev(tree *conf, tree *node, const dt_driver drvtab[],
+                           int busno)
+{
+    dt_device *dev = dt_create(node, drvtab);
+    dt_device *bus = dt_find_bus(conf, dev->drv->parent_bus_type, busno);
+
+    if (!bus) {
+        fprintf(stderr, "No suitable bus for device %s\n", dev->drv->name);
+        exit(1);
+    }
+
+    tree_insert(bus->conf, node);
+}
+
+static void dt_add_vga(tree *conf, const dt_driver drvtab[],
+                       const char *model, int vga_ram_size)
+{
+    tree *node = tree_new_child(NULL, "vga", NULL);
+
+    tree_put_propf(node, "model", "%s", model);
+    tree_put_propf(node, "ram", "%#x", vga_ram_size);
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+}
+
+static void dt_add_virtio_console(tree *conf, const dt_driver drvtab[],
+                                  int index)
+{
+    tree *node = tree_new_child(NULL, "virtio-console", NULL);
+
+    tree_put_propf(node, "index", "%d", index);
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+}
+
+static void dt_add_nic(tree *conf, dt_host *host, const dt_driver drvtab[],
+                       NICInfo *n)
+{
+    tree *node = tree_new_child(NULL, "nic", NULL);
+
+    tree_put_propf(node, "mac", "%02x:%02x:%02x:%02x:%02x:%02x",
+                   n->macaddr[0], n->macaddr[1], n->macaddr[2],
+                   n->macaddr[3], n->macaddr[4], n->macaddr[5]);
+    tree_put_propf(node, "model", "%s",
+                   n->model ? n->model : "ne2k_pci");
+    if (n->name)
+        tree_put_propf(node, "name", "%s", n->name);
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+    dt_attach_nic(host, node, n->vlan);
+}
+
+static void dt_add_scsi(tree *conf, const dt_driver drvtab[], int busno)
+{
+    tree *node = tree_new_child(NULL, "scsi", NULL);
+
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+    assert(dt_find_bus(conf, DT_BUS_SCSI, busno)->conf == node);
+}
+
+static void dt_add_virtio_block(tree *conf, const dt_driver drvtab[],
+                                int busno)
+{
+    tree *node = tree_new_child(NULL, "virtio-block", NULL);
+
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+    assert(dt_find_bus(conf, DT_BUS_VIRTIO, busno)->conf == node);
+}
+
+static const char *block_if_name[] = {
+    [IF_IDE] = "ide",
+    [IF_SCSI] = "scsi",
+    [IF_FLOPPY] = "floppy",
+    [IF_PFLASH] = "pflash",
+    [IF_MTD] = "mtd",
+    [IF_SD] = "sd",
+    [IF_VIRTIO] = "virtio",
+};
+
+static void dt_do_add_drive(tree *conf, dt_host *host,
+                            const dt_driver drvtab[],
+                            int bus_type, int busno, int unit,
+                            BlockDriverState *bdrv)
+{
+    char buf[32];
+    tree *node;
+
+    snprintf(buf, sizeof(buf), "%s-drive", block_if_name[bus_type]);
+    node = tree_new_child(NULL, strdup(buf), NULL);
+    tree_put_propf(node, "unit", "%d", unit);
+    dt_add_dyn_dev(conf, node, drvtab, busno);
+    dt_attach_drive(host, node, bdrv);
+}
+
+static void dt_add_drive(tree *conf, dt_host *host, const dt_driver drvtab[],
+                         DriveInfo *d)
+{
+    switch (d->type) {
+    case IF_IDE:
+        /* hack to hang all IDE drives off the same node for now */
+        dt_do_add_drive(conf, host, drvtab,
+                        d->type, 0, d->bus * MAX_IDE_DEVS + d->unit, d->bdrv);
+        break;
+    case IF_SCSI:
+    case IF_FLOPPY:
+        dt_do_add_drive(conf, host, drvtab,
+                        d->type, d->bus, d->unit, d->bdrv);
+        break;
+    case IF_VIRTIO:
+        /* See comment on virtio block in dt_add_dyn_devs() */
+        dt_do_add_drive(conf, host, drvtab,
+                        d->type, host->virtio_drives++, 0, d->bdrv);
+        break;
+    case IF_PFLASH:
+    case IF_MTD:
+    case IF_SD:
+        /* TODO implement */
+        fprintf(stderr, "Ignoring unimplemented drive %s\n",
+                drives_opt[d->drive_opt_idx].opt);
+        break;
+    }
+}
+
+static void dt_add_dyn_devs(tree *conf, dt_host *host,
+                            const dt_driver drvtab[], int vga_ram_size)
+{
+    int i, max_bus, busno;
+
+    /* VGA */
+    if (cirrus_vga_enabled || vmsvga_enabled || std_vga_enabled) {
+        dt_add_vga(conf, drvtab,
+                   cirrus_vga_enabled ? "cirrus"
+                   : vmsvga_enabled ? "vms" : "std",
+                   vga_ram_size);
+    }
+
+    /* Virtio consoles */
+    for (i = 0; i < MAX_VIRTIO_CONSOLES; i++) {
+        if (virtcon_hds[i])
+            dt_add_virtio_console(conf, drvtab, i);
+    }
+
+    /* NICs */
+    for(i = 0; i < nb_nics; i++)
+        dt_add_nic(conf, host, drvtab, &nd_table[i]);
+
+    /*
+     * SCSI controllers
+     *
+     * This creates all controllers 0..max_bus, whether they have
+     * drives or not.  Matches pc.c behavior.
+     */
+    max_bus = drive_get_max_bus(IF_SCSI);
+    for (i = 0; i <= max_bus; i++)
+        dt_add_scsi(conf, drvtab, i);
+
+    /*
+     * Virtio block controllers
+     *
+     * Each virtio drive is its own PCI device.  Since the device tree
+     * should reflect that, we give each device its own virtio block
+     * controller node.
+     *
+     * DriveInfo's bus and unit are a mess.  The user can specify any
+     * bus or unit number.  An unspecified bus number defaults to
+     * zero, and an unspecified unit number defaults to the first
+     * unused one (see drive_init()).  pc.c silently ignores all
+     * virtio drives with non-zero bus number, and all drives on bus
+     * zero after the first unused unit number.  Instead of
+     * replicating that questionable behavior, simply ignore bus and
+     * unit for these drives.
+     */
+    busno = 0;
+    for (i = 0; i < nb_drives; i++) {
+        if (drives_table[i].type == IF_VIRTIO)
+            dt_add_virtio_block(conf, drvtab, busno++);
+    }
+
+    /* Drives */
+    for (i = 0; i < nb_drives; i++)
+        dt_add_drive(conf, host, drvtab, &drives_table[i]);
+}
+
+
+/* Create a configuration */
+
+tree *dt_read_config(const char *name)
+{
+#ifdef TARGET_X86_64
+#define CPU_MODEL_DEFAULT "qemu64"
+#else
+#define CPU_MODEL_DEFAULT "qemu32"
+#endif
+    tree *root, *pci, *leaf;
+
+    /*
+     * TODO Read from config file.
+     *
+     * TODO Pretty far from a comprehensive machine configuration, but
+     * we need to start somewhere.
+     */
+    if (strcmp(name, "pcdt")) {
+        fprintf(stderr, "qemu: machine %s not implemented\n", name);
+        exit(1);
+    }
+    root = tree_new_child(NULL, "", NULL);
+    leaf = tree_new_child(root, "cpus", NULL);
+    tree_put_propf(leaf, "model", "%s", CPU_MODEL_DEFAULT);
+    leaf = tree_new_child(root, "memory", NULL);
+    leaf = tree_new_child(root, "pc-misc", NULL);
+    pci = tree_new_child(root, "pci", NULL);
+    leaf = tree_new_child(pci, "piix3", NULL);
+    leaf = tree_new_child(pci, "virtio-balloon", NULL);
+    return root;
+#undef CPU_MODEL_DEFAULT
+}
+
+/*
+ * Extract configuration from arguments and various global variables
+ * and put it into our machine configuration.
+ */
+void dt_modify_config(tree *conf,
+                      const dt_driver drvtab[],
+                      ram_addr_t ram_size, int vga_ram_size,
+                      const char *boot_device,
+                      const char *kernel_filename,
+                      const char *kernel_cmdline,
+                      const char *initrd_filename,
+                      const char *cpu_model)
+{
+    /*
+     * TODO This is still pretty cheesy: we insert stuff into the tree
+     * at hardcoded places.  Replacing placeholders instead would be
+     * more flexible.  Another idea is to mark certain parts of the
+     * initial tree optional, and remove them here.
+     */
+    tree *node;
+    dt_host host;
+
+    tree_print(conf);
+
+    node = tree_node_by_name(conf, "/cpus");
+    tree_put_propf(node, "num", "%d", smp_cpus);
+    if (cpu_model)
+        tree_put_propf(node, "model", "%s", cpu_model);
+
+    node = tree_node_by_name(conf, "/memory");
+    tree_put_propf(node, "ram", "%#lx", (unsigned long)ram_size);
+
+    node = tree_node_by_name(conf, "/pc-misc");
+    tree_put_propf(node, "boot-device", "%s", boot_device);
+
+    /* Unimplemented stuff */
+    if (kernel_filename)
+        abort();                /* TODO */
+
+    dt_create(conf, drvtab);
+    memset(&host, 0, sizeof(host));
+    dt_add_dyn_devs(conf, &host, drvtab, vga_ram_size);
+    dt_config(conf, &host);
+
+    dt_print_host_config(&host);
+    tree_print(conf);
+}
+
+
+/* Interfacing with FDT */
+
+/*
+ * Note: translation to FDT loses the association between
+ * configuration tree nodes and devices.
+ */
+
+#ifdef HAVE_FDT
+
+static int dt_fdt_chk(int res);
+static void dt_subtree_to_fdt(const tree *conf, void *fdt);
+
+static void *dt_tree_to_fdt(const tree *conf)
+{
+    int sz = 1024 * 1024;       /* FIXME arbitrary limit */
+    void *fdt = qemu_malloc(sz);
+
+    dt_fdt_chk(fdt_create(fdt, sz));
+    dt_subtree_to_fdt(conf, fdt);
+    dt_fdt_chk(fdt_finish(fdt));
+    return fdt;
+}
+
+static void dt_subtree_to_fdt(const tree *conf, void *fdt)
+{
+    tree_prop *prop;
+    tree *child;
+    const void *pv;
+    size_t sz;
+
+    dt_fdt_chk(fdt_begin_node(fdt, tree_node_name(conf)));
+    TREE_FOREACH_PROP(prop, conf) {
+        pv = tree_prop_value(prop, &sz);
+        dt_fdt_chk(fdt_property(fdt, tree_prop_name(prop), pv, sz));
+    }
+    TREE_FOREACH_CHILD(child, conf)
+        dt_subtree_to_fdt(child, fdt);
+    dt_fdt_chk(fdt_end_node(fdt));
+}
+
+static tree *dt_fdt_to_tree(const void *fdt)
+{
+    int offs, next, depth;
+    uint32_t tag;
+    struct fdt_property *prop;
+    tree *stack[32];            /* FIXME arbitrary limit */
+
+    stack[0] = NULL;            /* "parent" of root */
+    next = depth = 0;
+
+    for (;;) {
+        offs = next;
+        tag = fdt_next_tag(fdt, offs, &next);
+        switch (tag) {
+        case FDT_PROP:
+            /*
+             * libfdt apparently doesn't provide a way to get property
+             * by offset, do it by hand
+             */
+            assert(0 < depth && depth < ARRAY_SIZE(stack));
+            prop = (void *)((const char *)fdt + fdt_off_dt_struct(fdt) + offs);
+            tree_put_prop(stack[depth],
+                          fdt_string(fdt, fdt32_to_cpu(prop->nameoff)),
+                          prop->data,
+                          fdt32_to_cpu(prop->len));
+        case FDT_NOP:
+            break;
+        case FDT_BEGIN_NODE:
+            depth++;
+            assert(0 < depth && depth < ARRAY_SIZE(stack));
+            stack[depth] = tree_new_child(stack[depth-1],
+                                          fdt_get_name(fdt, offs, NULL),
+                                          NULL);
+            break;
+        case FDT_END_NODE:
+            depth--;
+            break;
+        case FDT_END:
+            dt_fdt_chk(next);
+            return stack[1];
+        }
+    }
+}
+
+static int dt_fdt_chk(int res)
+{
+    if (res < 0) {
+        fprintf(stderr, "%s\n", fdt_strerror(res)); /* FIXME cryptic */
+        exit(1);
+    }
+    return res;
+}
+
+static void dt_fdt_test(tree *conf)
+{
+    void *fdt;
+
+    fdt = dt_tree_to_fdt(conf);
+    conf = dt_fdt_to_tree(fdt);
+    tree_print(conf);
+    qemu_free(fdt);
+}
+#else
+static void dt_fdt_test(tree *conf) { }
+#endif
diff --git a/dt.h b/dt.h
new file mode 100644
index 0000000..9814167
--- /dev/null
+++ b/dt.h
@@ -0,0 +1,115 @@
+#ifndef DT_H
+#define DT_H
+
+#include "sysemu.h"
+#include "net.h"
+#include "tree.h"
+
+typedef struct dt_host dt_host;
+typedef struct dt_device dt_device;
+typedef struct dt_tree_list dt_tree_list;
+typedef struct dt_prop_spec dt_prop_spec;
+
+
+/* Host Configuration */
+
+VLANState *dt_find_vlan(tree *conf, dt_host *host);
+void dt_find_drives(tree *conf, dt_host *host,
+                    BlockDriverState *drive[], int n);
+
+
+/* Device Interface */
+
+/*
+ * Device life cycle:
+ *
+ * 1. Configuration: config() method runs after parent's.  It should
+ * initialize the device's private data from its configuration
+ * sub-tree.  It may edit the configuration sub-tree, and may declare
+ * initialization ordering constraints with dt_require_named().
+ *
+ * 2. Initialization: init() method runs after parent's and after that
+ * of devices declared required by config().  It should not touch the
+ * configuration tree.
+ *
+ * 3. Start: start() method runs, order is unspecified.
+ *
+ * Error handling in these driver methods: print to stderr and exit
+ * the program unsuccessfully.
+ *
+ * There is no device shutdown protocol yet.
+ */
+
+struct dt_device {
+    tree *conf;                 /* configuration sub-tree */
+    const dt_driver *drv;       /* device driver */
+    LIST_HEAD(, dt_tree_list) reqs; /* required devices */
+    int visit;                  /* for dt_visit() */
+    void *priv;                 /* device private data */
+};
+
+struct dt_tree_list {
+    tree *conf;
+    LIST_ENTRY(dt_tree_list) link;
+};
+
+typedef enum dt_bus_type {
+    DT_BUS_NONE, DT_BUS_ROOT, DT_BUS_PCI, DT_BUS_IDE, DT_BUS_SCSI,
+    DT_BUS_FLOPPY, DT_BUS_VIRTIO
+} dt_bus_type;
+
+struct dt_driver {
+    const char *name;
+    size_t privsz;              /* size of device private data */
+    const dt_prop_spec *prop_spec; /* recognized conf node properties */
+    dt_bus_type bus_type, parent_bus_type;
+    void (*config)(dt_device *, dt_host *);
+    void (*init)(dt_device *);
+    void (*start)(dt_device *);
+    PCIBus *(*get_pcibus)(dt_device *); /* iff device is a PCI bus */
+    int (*get_unit)(dt_device *);
+};
+
+dt_device *dt_device_of(tree *conf);
+dt_device *dt_parent_device(dt_device *dev);
+PCIBus *dt_get_pcibus(dt_device *dev);
+int dt_get_unit(dt_device *dev);
+tree *dt_require_named(dt_device *dev, const char *reqname);
+
+tree *dt_read_config(const char *name);
+void dt_modify_config(tree *conf,
+                      const dt_driver drvtab[],
+                      ram_addr_t ram_size, int vga_ram_size,
+                      const char *boot_device,
+                      const char *kernel_filename,
+                      const char *kernel_cmdline,
+                      const char *initrd_filename,
+                      const char *cpu_model);
+void dt_create_machine(tree *conf);
+
+
+/* Device properties */
+
+/*
+ * This is for parsing configuration tree node properties into device
+ * private data.
+ */
+
+struct dt_prop_spec {
+    const char *name;
+    ptrdiff_t offs;             /* offset in device private data */
+    size_t size;                /* size there, for sanity checking */
+    int (*parse)(void *, const char *, const dt_prop_spec *);
+};
+
+#define DT_PROP_SPEC_INIT(name, strty, member, fmt)                     \
+    { name, offsetof(strty, member), sizeof(((strty *)0)->member),      \
+      dt_parse_##fmt }
+
+/* Canned property parse methods */
+int dt_parse_string(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_int(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_ram_addr_t(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_macaddr(void *dst, const char *src, const dt_prop_spec *spec);
+
+#endif
diff --git a/hw/boards.h b/hw/boards.h
index 1e62594..b611e88 100644
--- a/hw/boards.h
+++ b/hw/boards.h
@@ -13,7 +13,8 @@ typedef void QEMUMachineInitFunc(ram_addr_t ram_size, int vga_ram_size,
 typedef struct QEMUMachine {
     const char *name;
     const char *desc;
-    QEMUMachineInitFunc *init;
+    QEMUMachineInitFunc *init;  /* traditional machine initialization */
+    const dt_driver *drvtab;    /* new alternative, used if !init */
 #define RAMSIZE_FIXED	(1 << 0)
     ram_addr_t ram_require;
     int nodisk_ok;
@@ -35,6 +36,9 @@ extern QEMUMachine axisdev88_machine;
 extern QEMUMachine pc_machine;
 extern QEMUMachine isapc_machine;
 
+/* pcdt.c */
+extern QEMUMachine pcdt_machine;
+
 /* ppc.c */
 extern QEMUMachine prep_machine;
 extern QEMUMachine core99_machine;
diff --git a/hw/pc.c b/hw/pc.c
index 69f25f3..41a0225 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -37,42 +37,35 @@
 #include "virtio-balloon.h"
 #include "virtio-console.h"
 #include "hpet_emul.h"
+#include "pcint.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
 
-#define BIOS_FILENAME "bios.bin"
-#define VGABIOS_FILENAME "vgabios.bin"
-#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
-
-#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
-
 /* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables.  */
 #define ACPI_DATA_SIZE       0x10000
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 
-#define MAX_IDE_BUS 2
-
-static fdctrl_t *floppy_controller;
-static RTCState *rtc_state;
+fdctrl_t *floppy_controller;
+RTCState *rtc_state;
 static PITState *pit;
 static IOAPICState *ioapic;
-static PCIDevice *i440fx_state;
+PCIDevice *i440fx_state;
 
-static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
 {
 }
 
 /* MSDOS compatibility mode FPU exception support */
-static qemu_irq ferr_irq;
+qemu_irq ferr_irq;
 /* XXX: add IGNNE support */
 void cpu_set_ferr(CPUX86State *s)
 {
     qemu_irq_raise(ferr_irq);
 }
 
-static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
 {
     qemu_irq_lower(ferr_irq);
 }
@@ -121,7 +114,7 @@ int cpu_get_pic_interrupt(CPUState *env)
     return intno;
 }
 
-static void pic_irq_request(void *opaque, int irq, int level)
+void pic_irq_request(void *opaque, int irq, int level)
 {
     CPUState *env = first_cpu;
 
@@ -167,7 +160,7 @@ static int cmos_get_fd_drive_type(int fd0)
     return val;
 }
 
-static void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
 {
     RTCState *s = rtc_state;
     int cylinders, heads, sectors;
@@ -203,7 +196,7 @@ static int boot_device2nibble(char boot_device)
 
 /* copy/pasted from cmos_init, should be made a general function
  and used there as well */
-static int pc_boot_set(void *opaque, const char *boot_device)
+int pc_boot_set(void *opaque, const char *boot_device)
 {
     Monitor *mon = cur_mon;
 #define PC_MAX_BOOT_DEVICES 3
@@ -230,8 +223,8 @@ static int pc_boot_set(void *opaque, const char *boot_device)
 }
 
 /* hd_table must contain 4 block drivers */
-static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
-                      const char *boot_device, BlockDriverState **hd_table)
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table)
 {
     RTCState *s = rtc_state;
     int nbds, bds[3] = { 0, };
@@ -364,13 +357,13 @@ int ioport_get_a20(void)
     return ((first_cpu->a20_mask >> 20) & 1);
 }
 
-static void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
 {
     ioport_set_a20((val >> 1) & 1);
     /* XXX: bit 0 is fast reset */
 }
 
-static uint32_t ioport92_read(void *opaque, uint32_t addr)
+uint32_t ioport92_read(void *opaque, uint32_t addr)
 {
     return ioport_get_a20() << 1;
 }
@@ -422,7 +415,7 @@ static void bochs_bios_write(void *opaque, uint32_t addr, uint32_t val)
     }
 }
 
-static void bochs_bios_init(void)
+void bochs_bios_init(void)
 {
     void *fw_cfg;
 
@@ -691,7 +684,7 @@ static void load_linux(uint8_t *option_rom,
     generate_bootsect(option_rom, gpr, seg, 0);
 }
 
-static void main_cpu_reset(void *opaque)
+void main_cpu_reset(void *opaque)
 {
     CPUState *env = opaque;
     cpu_reset(env);
@@ -706,11 +699,11 @@ static const int ide_irq[2] = { 14, 15 };
 static int ne2000_io[NE2000_NB_MAX] = { 0x300, 0x320, 0x340, 0x360, 0x280, 0x380 };
 static int ne2000_irq[NE2000_NB_MAX] = { 9, 10, 11, 3, 4, 5 };
 
-static int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
-static int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
+int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
+int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
 
-static int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
-static int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
+int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
+int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
 
 #ifdef HAS_AUDIO
 static void audio_init (PCIBus *pci_bus, qemu_irq *pic)
diff --git a/hw/pcdt.c b/hw/pcdt.c
new file mode 100644
index 0000000..aebbf9f
--- /dev/null
+++ b/hw/pcdt.c
@@ -0,0 +1,677 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * This is a PC configured and built using the new dt.h interface.
+ * Having two PC machine types makes no sense in the long run, of
+ * course.  We want to replace pc.c eventually, and also convert other
+ * machine types to this interface.
+ *
+ * The configuration data currently is hardwired, and fairly limited.
+ *
+ * The nuts and bolts of PC emulation remain in pc.c for now, and
+ * using the stuff there makes the somewhat clumsy pcint.h necessary.
+ *
+ * The drivers here generally don't do the actual work, they just
+ * provide a common interface to existing device code.  Arguably, they
+ * should be integrated into that device code, with the goal of
+ * eventually replacing the old, ad hoc interfaces.
+ *
+ * Several drivers here are not PC-specific, e.g. drivers for various
+ * PCI devices.
+ */
+
+#include <assert.h>
+#include "hw.h"
+#include "pc.h"
+#include "fdc.h"
+#include "pci.h"
+#include "block.h"
+#include "sysemu.h"
+#include "audio/audio.h"
+#include "net.h"
+#include "smbus.h"
+#include "boards.h"
+#include "console.h"
+#include "fw_cfg.h"
+#include "virtio-blk.h"
+#include "virtio-balloon.h"
+#include "virtio-console.h"
+#include "hpet_emul.h"
+#include "pcint.h"
+#include "dt.h"
+
+
+static BlockDriverState **dt_piix3_bds(tree *piix3);
+
+/* CPUs Driver */
+
+typedef struct dt_device_cpus {
+    const char *model;
+    int num;
+} dt_device_cpus;
+
+static const dt_prop_spec dt_cpus_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
+    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
+};
+
+static void dt_cpus_init(dt_device *dev)
+{
+    dt_device_cpus *priv = dev->priv;
+    int i;
+    CPUState *env;
+
+    for(i = 0; i < priv->num; i++) {
+        env = cpu_init(priv->model);
+        if (!env) {
+            fprintf(stderr, "Unable to find CPU definition\n");
+            exit(1);
+        }
+        if (i != 0)
+            env->halted = 1;
+        qemu_register_reset(main_cpu_reset, env);
+    }
+}
+
+
+/* Memory Ranges */
+
+typedef struct dt_device_memrng {
+    target_phys_addr_t phys_addr;
+    ram_addr_t size;
+    ram_addr_t host_offs;
+    ram_addr_t flags;
+} dt_device_memrng;
+
+static void dt_memrng(dt_device_memrng *rng,
+                      target_phys_addr_t phys_addr, ram_addr_t size,
+                      ram_addr_t host_offs, ram_addr_t flags)
+{
+    rng->phys_addr = phys_addr;
+    rng->size = size;
+    rng->host_offs = host_offs;
+    rng->flags = flags;
+}
+
+static void dt_memrng_ram(dt_device_memrng *rng,
+                          target_phys_addr_t phys_addr, ram_addr_t size)
+{
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), 0);
+}
+
+static void dt_memrng_rom(dt_device_memrng *rng,
+                          target_phys_addr_t phys_addr, ram_addr_t maxsz,
+                          const char *dir, const char *image, int top)
+{
+    char buf[1024];
+    int size;
+
+    snprintf(buf, sizeof(buf), "%s/%s", dir, image);
+    size = get_image_size(buf);
+    if (size < 0 || size > maxsz)
+        goto error;
+    if (top)
+        phys_addr = phys_addr + maxsz - size;
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), IO_MEM_ROM);
+    if (load_image(buf, phys_ram_base + rng->host_offs) != size)
+        goto error;
+    return;
+
+error:
+    fprintf(stderr, "qemu: could not load image '%s'\n", buf);
+    exit(1);
+}
+
+static void dt_memrng_init(dt_device_memrng *rng, int n)
+{
+    int i;
+
+    for (i = 0; i < n; i++)
+        cpu_register_physical_memory(rng[i].phys_addr, rng[i].size,
+                                     rng[i].host_offs | rng[i].flags);
+}
+
+
+/* Memory Driver */
+
+typedef struct dt_device_memory {
+    ram_addr_t ram_size;
+    dt_device_memrng *rng;
+    int nrng;
+    /* TODO want a real memory map here */
+    ram_addr_t below_4g, above_4g;
+} dt_device_memory;
+
+static const dt_prop_spec dt_memory_props[] = {
+    DT_PROP_SPEC_INIT("ram", dt_device_memory, ram_size, ram_addr_t),
+};
+
+static void dt_memory_config(dt_device *dev, dt_host *host)
+{
+    /* TODO memory map hardcoded; get it from dev->conf instead */
+    dt_device_memory *priv = dev->priv;
+    dt_device_memrng *rng = qemu_malloc(sizeof(*rng) * 4);
+
+    if (priv->ram_size >= 0xe0000000) {
+        priv->above_4g = priv->ram_size - 0xe0000000;
+        priv->below_4g = 0xe0000000;
+    } else {
+        priv->below_4g = priv->ram_size;
+        priv->above_4g = 0;
+    }
+
+    dt_memrng_ram(&rng[0], 0, 0xa0000);
+    qemu_ram_alloc(0x60000);
+    dt_memrng_ram(&rng[1], 0x100000, priv->below_4g - 0x100000);
+    if (priv->above_4g)
+        abort();                /* TODO */
+    dt_memrng_rom(&rng[2], 0xe0000000, 0x20000000,
+                  bios_dir, BIOS_FILENAME, 1);
+                                /* TODO get name from dev->conf */
+    dt_memrng(&rng[3], 0xe0000, 0x20000,
+              rng[2].host_offs + rng[2].size - 0x20000, IO_MEM_ROM);
+    /* TODO option ROMs */
+
+    priv->rng = rng;
+    priv->nrng = 4;
+}
+
+static void dt_memory_init(dt_device *dev)
+{
+    dt_device_memory *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, priv->nrng);
+    bochs_bios_init();
+}
+
+static ram_addr_t dt_memory_below_4g(tree *memory)
+{
+    dt_device *dev = dt_device_of(memory);
+    dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->below_4g;
+}
+
+static ram_addr_t dt_memory_above_4g(tree *memory)
+{
+    dt_device *dev = dt_device_of(memory);
+    dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->above_4g;
+}
+
+
+/* PC Miscellaneous Driver */
+
+/*
+ * This is a driver for a whole collection of devices.  Could be
+ * picked apart into separate drivers, I guess.
+ */
+
+typedef struct dt_device_pc_misc {
+    const char *boot_device;
+    int apic;
+    int hpet;
+    qemu_irq *i8259;
+    BlockDriverState *bds[MAX_FD];
+} dt_device_pc_misc;
+
+static const dt_prop_spec dt_pc_misc_props[] = {
+    DT_PROP_SPEC_INIT("boot-device", dt_device_pc_misc, boot_device,
+                      string),
+};
+
+static void dt_pc_misc_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pc_misc *priv = dev->priv;
+
+    priv->apic = 1;
+    priv->hpet = 1;
+    priv->i8259 = NULL;
+    dt_find_drives(dev->conf, host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_pc_misc_init(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    CPUState *env;
+    qemu_irq *cpu_irq;
+    IOAPICState *ioapic;
+    PITState *pit;
+    int i;
+
+    if (priv->apic) {
+        for (env = first_cpu; env; env = env->next_cpu) {
+            env->cpuid_features |= CPUID_APIC;
+            apic_init(env);
+        }
+    }
+
+    vmport_init();
+
+    cpu_irq = qemu_allocate_irqs(pic_irq_request, NULL, 1);
+    priv->i8259 = i8259_init(cpu_irq[0]);
+    ferr_irq = priv->i8259[13];
+
+    register_ioport_write(0x80, 1, 1, ioport80_write, NULL);
+    register_ioport_write(0xf0, 1, 1, ioportF0_write, NULL);
+
+    rtc_state = rtc_init(0x70, priv->i8259[8], 2000);
+    qemu_register_boot_set(pc_boot_set, rtc_state);
+
+    register_ioport_read(0x92, 1, 1, ioport92_read, NULL);
+    register_ioport_write(0x92, 1, 1, ioport92_write, NULL);
+
+    if (priv->apic) {
+        ioapic = ioapic_init();
+        pic_set_alt_irq_func(isa_pic, ioapic_set_irq, ioapic);
+    }
+
+    pit = pit_init(0x40, priv->i8259[0]);
+    pcspk_init(pit);
+    if (priv->hpet)
+        hpet_init(priv->i8259);
+
+    for(i = 0; i < MAX_SERIAL_PORTS; i++) {
+        if (serial_hds[i]) {
+            serial_init(serial_io[i], priv->i8259[serial_irq[i]], 115200,
+                        serial_hds[i]);
+        }
+    }
+
+    for(i = 0; i < MAX_PARALLEL_PORTS; i++) {
+        if (parallel_hds[i]) {
+            parallel_init(parallel_io[i], priv->i8259[parallel_irq[i]],
+                          parallel_hds[i]);
+        }
+    }
+
+    qemu_system_hot_add_init();
+
+    i8042_init(priv->i8259[1], priv->i8259[12], 0x60);
+    DMA_init(0);
+
+    floppy_controller = fdctrl_init(priv->i8259[6], 2, 0, 0x3f0, priv->bds);
+}
+
+static void dt_pc_misc_start(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    tree *memory = tree_node_by_name(dev->conf, "/memory");
+    tree *piix3 = tree_node_by_name(dev->conf, "/pci/piix3");
+
+    cmos_init(dt_memory_below_4g(memory),
+              dt_memory_above_4g(memory),
+              priv->boot_device,
+              dt_piix3_bds(piix3));
+}
+
+static qemu_irq *dt_pc_misc_i8259(tree *pc_misc)
+{
+    dt_device *dev = dt_device_of(pc_misc);
+    dt_device_pc_misc *priv = dev->priv;
+    assert(dev->drv->init == dt_pc_misc_init);
+    return priv->i8259;
+}
+
+
+/* PCI Bus Driver */
+
+typedef struct dt_device_pci {
+    PCIBus *pcibus;
+    tree *pc;
+} dt_device_pci;
+
+static void dt_pci_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->pcibus = NULL;
+    priv->pc = dt_require_named(dev, "/pc-misc");
+}
+
+static void dt_pci_init(dt_device *dev)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->pcibus = i440fx_init(&i440fx_state, dt_pc_misc_i8259(priv->pc));
+}
+
+static void dt_pci_start(dt_device *dev)
+{
+    i440fx_init_memory_mappings(i440fx_state);
+}
+
+static PCIBus *dt_pci_get_pcibus(dt_device *dev)
+{
+    return ((dt_device_pci *)dev->priv)->pcibus;
+}
+
+
+/* PIIX3 Driver */
+
+typedef struct dt_device_piix3 {
+    int devfn;
+    int acpi;
+    int usb;
+    tree *pc;
+    BlockDriverState *bds[MAX_IDE_BUS * MAX_IDE_DEVS];
+} dt_device_piix3;
+
+static void dt_piix3_config(dt_device *dev, dt_host *host)
+{
+    dt_device_piix3 *priv = dev->priv;
+
+    priv->devfn = -1;
+    priv->acpi = 1;
+    priv->usb = 1;
+    priv->pc = dt_require_named(dev, "/pc-misc");
+    dt_find_drives(dev->conf, host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_piix3_init(dt_device *dev)
+{
+    dt_device_piix3 *priv = dev->priv;
+    PCIBus *pci_bus = dt_get_pcibus(dev);
+    qemu_irq *i8259 = dt_pc_misc_i8259(priv->pc);
+    int i;
+
+    priv->devfn = piix3_init(pci_bus, priv->devfn);
+
+    pci_piix3_ide_init(pci_bus, priv->bds, priv->devfn + 1, i8259);
+
+    if (priv->usb)
+        usb_uhci_piix3_init(pci_bus, priv->devfn + 2);
+
+    if (priv->acpi) {
+        uint8_t *eeprom_buf = qemu_mallocz(8 * 256); /* XXX: make this persistent */
+        i2c_bus *smbus;
+
+        /* TODO: Populate SPD eeprom data.  */
+        smbus = piix4_pm_init(pci_bus, priv->devfn + 3, 0xb100, i8259[9]);
+        for (i = 0; i < 8; i++)
+            smbus_eeprom_device_init(smbus, 0x50 + i, eeprom_buf + (i * 256));
+    }
+}
+
+static BlockDriverState **dt_piix3_bds(tree *piix3)
+{
+    dt_device *dev = dt_device_of(piix3);
+    dt_device_piix3 *priv = dev->priv;
+
+    assert(dev->drv->init == dt_piix3_init);
+    return priv->bds;
+}
+
+
+/* VGA Driver */
+
+typedef struct dt_driver_vga {
+    const char *model;
+    const char *bios;
+    void (*init)(PCIBus *, uint8_t *, ram_addr_t, int);
+} dt_driver_vga;
+
+static void pci_vga_init_(PCIBus *bus, uint8_t *vga_ram_base,
+                          ram_addr_t vga_ram_offset, int vga_ram_size)
+{
+    pci_vga_init(bus, vga_ram_base, vga_ram_offset, vga_ram_size, 0, 0);
+}
+
+static dt_driver_vga dt_driver_vga_table[] = {
+    { "cirrus", VGABIOS_CIRRUS_FILENAME, pci_cirrus_vga_init },
+    { "vms", VGABIOS_FILENAME, pci_vmsvga_init },
+    { "std", VGABIOS_FILENAME, pci_vga_init_ },
+    { NULL, NULL, NULL }
+};
+
+typedef struct dt_device_vga {
+    const char *model;
+    ram_addr_t ram_size;
+    dt_device_memrng rng[1];
+    ram_addr_t ram_offs;
+    dt_driver_vga *vga_drv;
+} dt_device_vga;
+
+static const dt_prop_spec dt_vga_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_vga, model, string),
+    DT_PROP_SPEC_INIT("ram", dt_device_vga, ram_size, ram_addr_t),
+};
+
+static void dt_vga_config(dt_device *dev, dt_host *host)
+{
+    dt_device_vga *priv = dev->priv;
+    int i;
+
+    dt_memrng_rom(&priv->rng[0], 0xc0000, 0x10000,
+                  bios_dir, VGABIOS_CIRRUS_FILENAME, 0);
+                                /* TODO get name from dev->conf */
+    priv->ram_offs = qemu_ram_alloc(priv->ram_size);
+
+    for (i = 0; dt_driver_vga_table[i].model; i++) {
+        if (!strcmp(dt_driver_vga_table[i].model, priv->model))
+            break;
+    }
+    if (!dt_driver_vga_table[i].model) {
+        fprintf(stderr, "Unknown VGA model %s\n", priv->model);
+        exit(1);
+    }
+    priv->vga_drv = &dt_driver_vga_table[i];
+}
+
+static void dt_vga_init(dt_device *dev)
+{
+    dt_device_vga *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, 1);
+    priv->vga_drv->init(dt_get_pcibus(dev),
+                        phys_ram_base + priv->ram_offs,
+                        priv->ram_offs, priv->ram_size);
+}
+
+
+/* NIC Driver */
+
+typedef struct dt_device_nic {
+    NICInfo nd;
+} dt_device_nic;
+
+static const dt_prop_spec dt_nic_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_nic, nd.model, string),
+    DT_PROP_SPEC_INIT("mac", dt_device_nic, nd.macaddr, macaddr),
+    DT_PROP_SPEC_INIT("name", dt_device_nic, nd.name, string),
+};
+
+static void dt_nic_config(dt_device *dev, dt_host *host)
+{
+    dt_device_nic *priv = dev->priv;
+
+    priv->nd.vlan = dt_find_vlan(dev->conf, host);
+}
+
+static void dt_nic_init(dt_device *dev)
+{
+    dt_device_nic *priv = dev->priv;
+
+    pci_nic_init(dt_get_pcibus(dev), &priv->nd, -1, NULL);
+}
+
+
+/* SCSI Driver */
+
+typedef struct dt_device_scsi {
+    void *opaque;
+    BlockDriverState *bds[LSI_MAX_DEVS];
+} dt_device_scsi;
+
+static void dt_scsi_config(dt_device *dev, dt_host *host)
+{
+    dt_device_scsi *priv = dev->priv;
+
+    priv->opaque = NULL;
+    dt_find_drives(dev->conf, host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_scsi_init(dt_device *dev)
+{
+    dt_device_scsi *priv = dev->priv;
+    int i;
+
+    priv->opaque = lsi_scsi_init(dt_get_pcibus(dev), -1);
+
+    for (i = 0; i < ARRAY_SIZE(priv->bds); i++) {
+        if (priv->bds[i])
+            lsi_scsi_attach(priv->opaque, priv->bds[i], i);
+    }
+}
+
+
+/* Virtio Block Driver */
+
+typedef struct dt_device_virtio_block {
+    BlockDriverState *bds[1];
+} dt_device_virtio_block;
+
+static void dt_virtio_block_config(dt_device *dev, dt_host *host)
+{
+    dt_device_virtio_block *priv = dev->priv;
+
+    dt_find_drives(dev->conf, host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_virtio_block_init(dt_device *dev)
+{
+    dt_device_virtio_block *priv = dev->priv;
+
+    virtio_blk_init(dt_get_pcibus(dev), priv->bds[0]);
+}
+
+
+/* Virtio Balloon Driver */
+
+static void dt_virtio_balloon_init(dt_device *dev)
+{
+    virtio_balloon_init(dt_get_pcibus(dev));
+}
+
+
+/* Virtio Console Driver */
+
+typedef struct dt_device_virtio_console {
+    int index;
+    CharDriverState *hds;
+} dt_device_virtio_console;
+
+static const dt_prop_spec dt_virtio_console_props[] = {
+    DT_PROP_SPEC_INIT("index", dt_device_virtio_console, index, int),
+};
+
+static void dt_virtio_console_config(dt_device *dev, dt_host *host)
+{
+    dt_device_virtio_console *priv = dev->priv;
+
+    priv->hds = virtcon_hds[priv->index];
+}
+
+static void dt_virtio_console_init(dt_device *dev)
+{
+    dt_device_virtio_console *priv = dev->priv;
+
+    virtio_console_init(dt_get_pcibus(dev), priv->hds);
+}
+
+
+/* Drive Driver */
+
+typedef struct dt_device_drive {
+    int unit;
+} dt_device_drive;
+
+static const dt_prop_spec dt_drive_props[] = {
+    DT_PROP_SPEC_INIT("unit", dt_device_drive, unit, int),
+};
+
+static int dt_drive_get_unit(dt_device *dev)
+{
+    return ((dt_device_drive *)dev->priv)->unit;
+}
+
+
+/* Machine Driver */
+
+static const dt_driver dt_driver_table[] = {
+    { "", 0, NULL, DT_BUS_ROOT, DT_BUS_NONE, NULL, NULL, NULL, NULL, NULL },
+    { "cpus", sizeof(dt_device_cpus), dt_cpus_props,
+      DT_BUS_NONE, DT_BUS_ROOT,
+      NULL, dt_cpus_init, NULL, NULL, NULL },
+    { "memory", sizeof(dt_device_memory), dt_memory_props,
+      DT_BUS_NONE, DT_BUS_ROOT,
+      dt_memory_config, dt_memory_init, NULL, NULL, NULL },
+    { "pc-misc", sizeof(dt_device_pc_misc), dt_pc_misc_props,
+      DT_BUS_FLOPPY, DT_BUS_ROOT,
+      dt_pc_misc_config, dt_pc_misc_init, dt_pc_misc_start, NULL, NULL },
+    { "pci", sizeof(dt_device_pci), NULL,
+      DT_BUS_PCI, DT_BUS_ROOT,
+      dt_pci_config, dt_pci_init, dt_pci_start, dt_pci_get_pcibus, NULL },
+    { "piix3", sizeof(dt_device_piix3), NULL,
+      DT_BUS_IDE, DT_BUS_PCI,
+      dt_piix3_config, dt_piix3_init, NULL, NULL, NULL },
+    { "vga", sizeof(dt_device_vga), dt_vga_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_vga_config, dt_vga_init, NULL, NULL, NULL },
+    { "nic", sizeof(dt_device_nic), dt_nic_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_nic_config, dt_nic_init, NULL, NULL, NULL },
+    { "scsi", sizeof(dt_device_scsi), NULL,
+      DT_BUS_SCSI, DT_BUS_PCI,
+      dt_scsi_config, dt_scsi_init, NULL, NULL, NULL },
+    { "virtio-block", sizeof(dt_device_virtio_block), NULL,
+      DT_BUS_VIRTIO, DT_BUS_PCI,
+      dt_virtio_block_config, dt_virtio_block_init, NULL, NULL, NULL },
+    { "virtio-balloon", 0, NULL,
+      DT_BUS_NONE, DT_BUS_PCI,
+      NULL, dt_virtio_balloon_init, NULL, NULL, NULL },
+    { "virtio-console", sizeof(dt_device_virtio_console), dt_virtio_console_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_virtio_console_config, dt_virtio_console_init, NULL, NULL, NULL },
+    { "ide-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_IDE,
+      NULL, NULL, NULL, NULL, dt_drive_get_unit },
+    { "scsi-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_SCSI,
+      NULL, NULL, NULL, NULL, dt_drive_get_unit },
+    { "floppy-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_FLOPPY,
+      NULL, NULL, NULL, NULL, dt_drive_get_unit },
+    { "virtio-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_VIRTIO,
+      NULL, NULL, NULL, NULL, dt_drive_get_unit },
+    { NULL, 0, NULL, DT_BUS_NONE, DT_BUS_NONE, NULL, NULL, NULL, NULL, NULL }
+};
+
+QEMUMachine pcdt_machine = {
+    .name = "pcdt",
+    .desc = "Standard PC (device tree)",
+    .drvtab = dt_driver_table,
+    .ram_require = VGA_RAM_SIZE + PC_MAX_BIOS_SIZE,
+    .max_cpus = 255,
+};
diff --git a/hw/pci.h b/hw/pci.h
index 4f24895..26fe59e 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -213,7 +213,7 @@ void *lsi_scsi_init(PCIBus *bus, int devfn);
 
 /* vmware_vga.c */
 void pci_vmsvga_init(PCIBus *bus, uint8_t *vga_ram_base,
-                     unsigned long vga_ram_offset, int vga_ram_size);
+                     ram_addr_t vga_ram_offset, int vga_ram_size);
 
 /* usb-uhci.c */
 void usb_uhci_piix3_init(PCIBus *bus, int devfn);
diff --git a/hw/pcint.h b/hw/pcint.h
new file mode 100644
index 0000000..f18da67
--- /dev/null
+++ b/hw/pcint.h
@@ -0,0 +1,46 @@
+/*
+ * Stuff shared by pc.c and dt.c
+ *
+ * See dt.c for why this should go away eventually.
+ */
+
+#ifndef HW_PC_INT_H
+#define HW_PC_INT_H
+
+#define BIOS_FILENAME "bios.bin"
+#define VGABIOS_FILENAME "vgabios.bin"
+#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
+
+#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
+
+#define MAX_IDE_BUS 2
+
+/* TODO move to ferr stuff in cpu.h? */
+extern qemu_irq ferr_irq;
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data);
+
+/* TODO eliminate */
+extern RTCState *rtc_state;
+extern PCIDevice *i440fx_state;
+extern int serial_io[MAX_SERIAL_PORTS];
+extern int serial_irq[MAX_SERIAL_PORTS];
+extern int parallel_io[MAX_PARALLEL_PORTS];
+extern int parallel_irq[MAX_PARALLEL_PORTS];
+extern fdctrl_t *floppy_controller;
+
+/* TODO move to pic stuff in pc.h? */
+void pic_irq_request(void *opaque, int irq, int level);
+
+/* TODO move to a20 stuff in pc.h? */
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val);
+uint32_t ioport92_read(void *opaque, uint32_t addr);
+
+void bochs_bios_init(void);
+void main_cpu_reset(void *opaque);
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data);
+int pc_boot_set(void *opaque, const char *boot_device);
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd);
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table);
+
+#endif
diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index 5c271e6..45fdbc8 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -1122,7 +1122,7 @@ static int vmsvga_load(struct vmsvga_state_s *s, QEMUFile *f)
 }
 
 static void vmsvga_init(struct vmsvga_state_s *s,
-                uint8_t *vga_ram_base, unsigned long vga_ram_offset,
+                uint8_t *vga_ram_base, ram_addr_t vga_ram_offset,
                 int vga_ram_size)
 {
     s->vram = vga_ram_base;
@@ -1216,7 +1216,7 @@ static void pci_vmsvga_map_mem(PCIDevice *pci_dev, int region_num,
 #define PCI_CLASS_HEADERTYPE_00h	0x00
 
 void pci_vmsvga_init(PCIBus *bus, uint8_t *vga_ram_base,
-                     unsigned long vga_ram_offset, int vga_ram_size)
+                     ram_addr_t vga_ram_offset, int vga_ram_size)
 {
     struct pci_vmsvga_state_s *s;
 
diff --git a/net.c b/net.c
index c853daf..831b002 100644
--- a/net.c
+++ b/net.c
@@ -157,7 +157,7 @@ static void hex_dump(FILE *f, const uint8_t *buf, int size)
 }
 #endif
 
-static int parse_macaddr(uint8_t *macaddr, const char *p)
+int parse_macaddr(uint8_t *macaddr, const char *p)
 {
     int i;
     char *last_char;
diff --git a/net.h b/net.h
index 1a51be7..54bdf80 100644
--- a/net.h
+++ b/net.h
@@ -47,6 +47,7 @@ int qemu_can_send_packet(VLANClientState *vc);
 ssize_t qemu_sendv_packet(VLANClientState *vc, const struct iovec *iov,
                           int iovcnt);
 void qemu_send_packet(VLANClientState *vc, const uint8_t *buf, int size);
+int parse_macaddr(uint8_t *macaddr, const char *p);
 void qemu_format_nic_info_str(VLANClientState *vc, uint8_t macaddr[6]);
 void qemu_check_nic_model(NICInfo *nd, const char *model);
 void qemu_check_nic_model_list(NICInfo *nd, const char * const *models,
diff --git a/qemu-common.h b/qemu-common.h
index 28f4791..3d1bcf3 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -178,6 +178,7 @@ typedef struct PCIDevice PCIDevice;
 typedef struct SerialState SerialState;
 typedef struct IRQState *qemu_irq;
 struct pcmcia_card_s;
+typedef struct dt_driver dt_driver;
 
 /* CPU save/load.  */
 void cpu_save(QEMUFile *f, void *opaque);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 1cf49d5..34a7b4d 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -7,6 +7,7 @@
 
 void register_machines(void)
 {
+    qemu_register_machine(&pcdt_machine);
     qemu_register_machine(&pc_machine);
     qemu_register_machine(&isapc_machine);
 }
diff --git a/tree.c b/tree.c
new file mode 100644
index 0000000..da07b76
--- /dev/null
+++ b/tree.c
@@ -0,0 +1,285 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/* Ye Olde Decorated Tree */
+
+#include <assert.h>
+#include "tree.h"
+#include "qemu-common.h"
+#include "sys-queue.h"
+
+struct tree {
+    const char *name;
+    LIST_HEAD(, tree_prop) props;
+    tree *parent;
+    TAILQ_HEAD(, tree) children;
+    TAILQ_ENTRY(tree) siblings;
+    void *user;
+};
+
+struct tree_prop {
+    const char *name;
+    const void *val;
+    int sz;
+    tree *owner;
+    LIST_ENTRY(tree_prop) link;
+};
+
+tree *tree_new_child(tree *parent, const char *name, void *user)
+{
+    tree *child = qemu_malloc(sizeof(*child));
+
+    child->name = name;
+    LIST_INIT(&child->props);
+    child->parent = NULL;
+    TAILQ_INIT(&child->children);
+    child->user = user;
+    if (parent)
+        tree_insert(parent, child);
+
+    return child;
+}
+
+void tree_insert(tree *parent, tree *child)
+{
+    assert(!child->parent);
+    child->parent = parent;
+    TAILQ_INSERT_TAIL(&parent->children, child, siblings);
+}
+
+const char *tree_node_name(const tree *node)
+{
+    return node->name;
+}
+
+static tree *tree_child_by_name(const tree *parent, const char *name)
+{
+    const char *slash = strchr(name, '/');
+    size_t len = slash ? slash - name : strlen(name);
+    tree *child;
+
+    TAILQ_FOREACH(child, &parent->children, siblings) {
+        if (!strncmp(child->name, name, len) && child->name[len] == 0)
+            return child;
+    }
+    return NULL;
+}
+
+tree *tree_node_by_name(const tree *node, const char *name)
+{
+    tree *child;
+    size_t len;
+
+    if (name[0] == '/') {
+        for (; node->parent; node = node->parent) ;
+        while (*name == '/') name++;
+    }
+
+    if (name[0] == 0)
+        return (tree *)node;
+
+    child = tree_child_by_name(node, name);
+    if (!child)
+        return NULL;
+
+    len = strlen(child->name);
+    if (name[len] == 0)
+        return child;
+    assert (name[len] == '/');
+
+    while (name[len] == '/') len++;
+    return tree_node_by_name(child, name + len);
+}
+
+tree_prop *tree_first_prop(const tree *node)
+{
+    return LIST_FIRST(&node->props);
+}
+
+tree_prop *tree_next_prop(const tree_prop *prop)
+{
+    return LIST_NEXT(prop, link);
+}
+
+tree_prop *tree_get_prop(const tree *node, const char *name)
+{
+    tree_prop *prop;
+
+    LIST_FOREACH(prop, &node->props, link) {
+        if (!strcmp(prop->name, name))
+            return prop;
+    }
+    return NULL;
+}
+
+const char *tree_get_prop_s(const tree *node, const char *name)
+{
+    tree_prop *prop = tree_get_prop(node, name);
+    if (!prop
+        || memchr(prop->val, 0, prop->sz) != prop->val + prop->sz - 1) {
+        errno = EINVAL;
+        return NULL;
+    }
+    return prop->val;
+}
+
+const char *tree_prop_name(const tree_prop *prop)
+{
+    return prop->name;
+}
+
+const void *tree_prop_value(const tree_prop *prop, size_t *size)
+{
+    if (size)
+        *size = prop->sz;
+    return prop->val;
+}
+
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz)
+{
+    tree_prop *prop;
+
+    prop = tree_get_prop(node, name);
+    if (!prop) {
+        prop = qemu_malloc(sizeof(*prop));
+        prop->name = name;
+        prop->owner = node;
+        LIST_INSERT_HEAD(&node->props, prop, link);
+    }
+    /* FIXME need a destructor for val */
+    prop->val = val;
+    prop->sz = sz;
+}
+
+void tree_put_propf(tree *node, const char *name, const char *fmt, ...)
+{
+    va_list ap;
+    size_t len;
+    char *buf;
+
+    va_start(ap, fmt);
+    len = vsnprintf(NULL, 0, fmt, ap);
+    va_end(ap);
+
+    buf = qemu_malloc(len + 1);
+    va_start(ap, fmt);
+    vsnprintf(buf, len + 1, fmt, ap);
+    va_end(ap);
+
+    tree_put_prop(node, name, buf, len + 1);
+}
+
+void tree_put_user(tree *node, void *user)
+{
+    node->user = user;
+}
+
+void *tree_get_user(const tree *node)
+{
+    return node->user;
+}
+
+tree *tree_parent(const tree *node)
+{
+    return node->parent;
+}
+
+tree *tree_first_child(const tree *node)
+{
+    return TAILQ_FIRST(&node->children);
+}
+
+tree *tree_sibling(const tree *node)
+{
+    return TAILQ_NEXT(node, siblings);
+}
+
+int tree_path(const tree *node, char *buf, size_t bufsz)
+{
+    char *p;
+    const tree *np;
+    size_t len, res;
+
+    p = buf + bufsz;
+    res = 0;
+    for (np = node; np->parent; np = np->parent) {
+        len = 1 + strlen(np->name);
+        res += len;
+        if (res >= bufsz)
+            continue;
+        p -= len;
+        memcpy(p + 1, np->name, len - 1);
+        p[0] = '/';
+    }
+
+    if (res == 0) {
+        if (++res < bufsz)
+            *--p = '/';
+    }
+
+    if (res < bufsz) {
+        memmove(buf, p, res);
+        buf[res] = 0;
+    }
+
+    return res;
+}
+
+static void tree_print_sub(const tree *node, int indent)
+{
+    int i, use_str, sep;
+    const unsigned char *pv;
+    tree_prop *prop;
+    tree *child;
+
+    printf("%*s%s {\n", indent, "", node->parent ? node->name : "/");
+    LIST_FOREACH(prop, &node->props, link) {
+        printf("%*s%s", indent + 4, "", prop->name);
+        pv = prop->val;
+        if (pv) {
+            printf(" = ");
+            use_str = pv[prop->sz - 1] == 0;
+            for (i = 0; i < prop->sz - 1; i++) {
+                if (!isprint(pv[i]))
+                    use_str = 0;
+            }
+            if (use_str)
+                printf("\"%s\"", (const char *)prop->val);
+            else {
+                sep = '[';
+                for (i = 0; i < prop->sz; i++) {
+                    printf("%c%02x", sep, pv[i]);
+                    sep = ' ';
+                }
+                printf("]");
+            }
+        }
+        printf(";\n");
+    }
+    TAILQ_FOREACH(child, &node->children, siblings)
+        tree_print_sub(child, indent + 4);
+    printf("%*s};\n", indent, "");
+}
+
+void tree_print(const tree *node)
+{
+    tree_print_sub(node, 0);
+}
diff --git a/tree.h b/tree.h
new file mode 100644
index 0000000..3f3b367
--- /dev/null
+++ b/tree.h
@@ -0,0 +1,41 @@
+#ifndef TREE_H
+#define TREE_H
+
+#include <stddef.h>
+
+typedef struct tree tree;
+typedef struct tree_prop tree_prop;
+
+tree *tree_new_child(tree *parent, const char *name, void *user);
+void tree_insert(tree *parent, tree *child);
+const char *tree_node_name(const tree *node);
+tree *tree_node_by_name(const tree *node,
+                        const char *name);
+
+tree_prop *tree_first_prop(const tree *node);
+tree_prop *tree_next_prop(const tree_prop *prop);
+#define TREE_FOREACH_PROP(var, node) \
+    for (var = tree_first_prop(node); var; var = tree_next_prop(var))
+tree_prop *tree_get_prop(const tree *node, const char *name);
+const char *tree_get_prop_s(const tree *node, const char *name);
+const char *tree_prop_name(const tree_prop *prop);
+const void *tree_prop_value(const tree_prop *prop, size_t *size);
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz);
+void tree_put_propf(tree *node, const char *name,
+                    const char *fmt, ...)
+    __attribute__((format(printf,3,4)));
+
+void tree_put_user(tree *node, void *user);
+void *tree_get_user(const tree *node);
+
+tree *tree_parent(const tree *node);
+tree *tree_first_child(const tree *node);
+tree *tree_sibling(const tree *node);
+#define TREE_FOREACH_CHILD(var, node) \
+    for (var = tree_first_child(node); var; var = tree_sibling(var))
+
+int tree_path(const tree *node, char *buf, size_t bufsz);
+void tree_print(const tree *node);
+
+#endif
diff --git a/vl.c b/vl.c
index abc7f5d..b85e328 100644
--- a/vl.c
+++ b/vl.c
@@ -152,6 +152,7 @@ int main(int argc, char **argv)
 #include "migration.h"
 #include "kvm.h"
 #include "balloon.h"
+#include "dt.h"
 
 #include "disas.h"
 
@@ -5621,8 +5622,18 @@ int main(int argc, char **argv, char **envp)
         }
     }
 
-    machine->init(ram_size, vga_ram_size, boot_devices,
-                  kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
+    if (machine->init) {
+        machine->init(ram_size, vga_ram_size, boot_devices,
+                      kernel_filename, kernel_cmdline, initrd_filename,
+                      cpu_model);
+    } else {
+        tree *conf = dt_read_config(machine->name);
+        dt_modify_config(conf, machine->drvtab,
+                         ram_size, vga_ram_size, boot_devices,
+                         kernel_filename, kernel_cmdline, initrd_filename,
+                         cpu_model);
+        dt_create_machine(conf);
+    }
 
     current_machine = machine;
 


* [Qemu-devel] Re: [RFC] Machine description as data
  2009-02-11 15:40 [Qemu-devel] [RFC] Machine description as data Markus Armbruster
                   ` (8 preceding siblings ...)
  2009-03-23 15:50 ` [Qemu-devel] Re: [RFC] Machine description as data Markus Armbruster
@ 2009-03-31  9:16 ` Markus Armbruster
  2009-04-17 16:04 ` Markus Armbruster
  10 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-03-31  9:16 UTC (permalink / raw)
  To: qemu-devel

Eighth iteration of the prototype.  Work in progress, not quite ready
for merging.

New:

* Interrupt tree, but see shortcuts.

* Rebased to git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6959 c046a42c-6fe2-441c-8c8c-71466251a162

Not in, but not forgotten either:

* A few more renames suggested by reviewers.

* Reduce unnecessary differences to IEEE 1275 trees.

Shortcuts:

* No support for systems without PCI bus.

* I didn't implement all the devices of the "pc" original.  Missing:
  - Option ROMs
  - Audio

* Command line options -usb and -no-acpi have no effect; both USB and
  ACPI are always enabled.

* The configuration tree is simplistic.  I expect it to evolve, and I
  wouldn't exclude the possibility of wholesale replacement.

* The initial configuration tree is hardcoded in dt_read_config().  It
  should be read from a configuration file.

* A bus is identified by its kind and number.  The bus number depends on
  its position in the tree: buses of a kind are counted in the order a
  depth-first walk finds them.  Means for position-independent
  addressing would be nice.

* The interface to the shared code in hw/pc.c (hw/pcint.h) is rather
  crude.

* The memory driver is PC-specific.  It should be generic and
  data-driven, but getting there isn't quite as easy as it sounds.
  Memory (and sometimes even holes) needs to be allocated in just the
  right order to ensure guest physical address equals host offset for
  certain memory ranges.  I feel the proper way to address this is a
  better guest memory allocation interface.

* The pc-misc driver should most probably be split into several drivers.

* Can't cope with arbitrary interrupt trees.  Code aborts if adding
  interrupt tree edges to the device tree creates cycles.
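
For illustration only, here is a minimal sketch of the kind of check
the last item implies.  A device may only be initialized after both its
device tree parent and its interrupt parent, so a node that is reached
again while it is still being visited means the combined edges form a
cycle.  The names struct node and visit() are made up for this example;
they are not the prototype's dt_device or its visitor.  Unlike the
prototype, which aborts, the sketch reports the cycle to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a device node; not the patch's dt_device. */
enum visit_state { UNVISITED, IN_PROGRESS, DONE };

struct node {
    const char *name;
    struct node *parent;        /* device tree parent, NULL for the root */
    struct node *int_parent;    /* interrupt parent, NULL if none */
    enum visit_state state;
};

/*
 * Visit n after its device tree parent and its interrupt parent.
 * Returns 0 on success, -1 if the combined edges form a cycle.
 */
static int visit(struct node *n)
{
    assert(n != NULL);
    if (n->state == DONE)
        return 0;
    if (n->state == IN_PROGRESS)
        return -1;              /* reached again before finishing: cycle */

    n->state = IN_PROGRESS;
    if (n->parent && visit(n->parent) < 0)
        return -1;
    if (n->int_parent && visit(n->int_parent) < 0)
        return -1;
    n->state = DONE;
    printf("init %s\n", n->name);
    return 0;
}
```

With a root, an interrupt controller "pic" below it, and a "nic" below
the root whose interrupt parent is "pic", visit(&nic) handles root, pic
and nic in that order.  If the root's interrupt parent were one of its
own children, visit() would return -1 instead.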

Bugs (not rechecked since last post):

* hw/ppce500_mpc8544ds.c doesn't compile when I configure with fdt
  support.

* If I configure both a virtio block device and a virtio console, the
  Linux guest kernel hangs.  The same happens when I move virtio code in
  pc.c in an otherwise unmodified QEMU so that balloon and console are
  initialized earlier.


 Makefile              |    1 +
 Makefile.target       |    4 +-
 dt.c                  |  796 +++++++++++++++++++++++++++++++++++++++++++++++++
 dt.h                  |  116 +++++++
 hw/boards.h           |    6 +-
 hw/pc.c               |   47 ++--
 hw/pcdt.c             |  692 ++++++++++++++++++++++++++++++++++++++++++
 hw/pci.h              |    2 +-
 hw/pcint.h            |   46 +++
 hw/vmware_vga.c       |    4 +-
 net.c                 |    2 +-
 net.h                 |    1 +
 qemu-common.h         |    1 +
 target-i386/machine.c |    1 +
 tree.c                |  285 ++++++++++++++++++
 tree.h                |   41 +++
 vl.c                  |   15 +-
 17 files changed, 2025 insertions(+), 35 deletions(-)


diff --git a/Makefile b/Makefile
index 2bee52c..cd2ef37 100644
--- a/Makefile
+++ b/Makefile
@@ -85,6 +85,7 @@ OBJS+=sd.o ssi-sd.o
 OBJS+=bt.o bt-host.o bt-vhci.o bt-l2cap.o bt-sdp.o bt-hci.o bt-hid.o usb-bt.o
 OBJS+=buffered_file.o migration.o migration-tcp.o net.o qemu-sockets.o
 OBJS+=qemu-char.o aio.o net-checksum.o savevm.o cache-utils.o
+OBJS+=tree.o
 
 ifdef CONFIG_BRLAPI
 OBJS+= baum.o
diff --git a/Makefile.target b/Makefile.target
index f862d90..d57ebe5 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -505,6 +505,7 @@ OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o dma-helpers.o
 # need to fix this properly
 OBJS+=virtio.o virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o
 OBJS+=fw_cfg.o
+OBJS+=dt.o
 ifdef CONFIG_KVM
 OBJS+=kvm.o kvm-all.o
 endif
@@ -536,6 +537,7 @@ endif
 ifdef CONFIG_OSS
 LIBS += $(CONFIG_OSS_LIB)
 endif
+LIBS+= $(FDT_LIBS)
 
 SOUND_HW = sb16.o es1370.o ac97.o
 ifdef CONFIG_ADLIB
@@ -588,6 +590,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o ioapic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
+OBJS+= pcdt.o
 OBJS += device-hotplug.o pci-hotplug.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
@@ -611,7 +614,6 @@ OBJS+= ppc440.o ppc440_bamboo.o
 OBJS+= ppce500_pci.o ppce500_mpc8544ds.o
 ifdef FDT_LIBS
 OBJS+= device_tree.o
-LIBS+= $(FDT_LIBS)
 endif
 ifdef CONFIG_KVM
 OBJS+= kvm_ppc.o
diff --git a/dt.c b/dt.c
new file mode 100644
index 0000000..aa6d434
--- /dev/null
+++ b/dt.c
@@ -0,0 +1,796 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * Configure and build a machine from configuration data
+ *
+ * This is generic, device-independent code driven by device-dependent
+ * configuration data, talking to devices through an abstract device
+ * interface.
+ *
+ * Machine types using it implement QEMUMachine member drvtab[]
+ * instead of member init().  See hw/pcdt.c for an example.
+ */
+
+#include <assert.h>
+#include "block.h"
+#include "cpu.h"
+#include "dt.h"
+#include "net.h"
+#include "tree.h"
+#include "sysemu.h"
+
+#ifdef HAVE_FDT
+#include <libfdt.h>
+#endif
+
+/* Forward declarations */
+static void dt_parse_prop(dt_device *dev, tree_prop *prop);
+static void dt_add_dyn_devs(tree *conf, dt_host *host,
+                            const dt_driver drvtab[], int vga_ram_size);
+static void dt_fdt_test(tree *conf);
+
+
+/* Host Configuration */
+
+struct dt_host {
+    /* connection NIC <-> VLAN */
+    int nics;
+    tree *nic[MAX_NICS];
+    VLANState *nic_vlan[MAX_NICS];
+    /* connection drive <-> block driver state */
+    int drives;
+    int virtio_drives;
+    tree *drive[MAX_DRIVES];
+    BlockDriverState *drive_state[MAX_DRIVES];
+};
+
+static void dt_attach_nic(dt_host *host, tree *nic, VLANState *vlan)
+{
+    assert(host->nics < MAX_NICS);
+    host->nic[host->nics] = nic;
+    host->nic_vlan[host->nics] = vlan;
+    host->nics++;
+}
+
+VLANState *dt_find_vlan(tree *conf, dt_host *host)
+{
+    int i;
+
+    for (i = 0; i < host->nics; i++) {
+        if (host->nic[i] == conf)
+            return host->nic_vlan[i];
+    }
+    return NULL;
+}
+
+static void dt_attach_drive(dt_host *host, tree *node, BlockDriverState *state)
+{
+    assert(host->drives < MAX_DRIVES);
+    host->drive[host->drives] = node;
+    host->drive_state[host->drives] = state;
+    host->drives++;
+}
+
+void dt_find_drives(tree *conf, dt_host *host,
+                    BlockDriverState *drive[], int n)
+{
+    int i, unit;
+
+    memset(drive, 0, n * sizeof(drive[0]));
+
+    for (i = 0; i < host->drives; i++) {
+        if (tree_parent(host->drive[i]) != conf)
+            continue;
+        unit = dt_get_unit(dt_device_of(host->drive[i]));
+        assert(unit < n && !drive[unit]);
+        drive[unit] = host->drive_state[i];
+    }
+}
+
+static void dt_print_host_config(dt_host *host)
+{
+    char buf[1024];
+    int i;
+
+    for (i = 0; i < host->nics; i++) {
+        if (!host->nic[i])
+            continue;
+        tree_path(host->nic[i], buf, sizeof(buf));
+        printf("nic#%d\tvlan %-4d\t%s\n",
+               i, host->nic_vlan[i]->id, buf);
+    }
+
+    for (i = 0; i < host->drives; i++) {
+        tree_path(host->drive[i], buf, sizeof(buf));
+        printf("drive#%d\t%-15s %s\n",
+               i, bdrv_get_device_name(host->drive_state[i]), buf);
+    }
+}
+
+
+/* Device Interface */
+
+static const dt_driver *dt_driver_by_name(const char *name,
+                                          const dt_driver drvtab[])
+{
+    int i;
+
+    for (i = 0; drvtab[i].name; i++) {
+        if (!strcmp(name, drvtab[i].name))
+            return &drvtab[i];
+    }
+    return NULL;
+}
+
+dt_device *dt_device_of(tree *conf)
+{
+    return tree_get_user(conf);
+}
+
+dt_device *dt_parent_device(dt_device *dev)
+{
+    tree *p = tree_parent(dev->conf);
+
+    return p ? dt_device_of(p) : NULL;
+}
+
+static dt_device *dt_do_find_bus(tree *conf, dt_bus_type bus_type, int *skip)
+{
+    dt_device *dev;
+    tree *child;
+
+    dev = dt_device_of(conf);
+    if (dev->drv->bus_type == bus_type && (*skip)-- == 0)
+        return dev;
+
+    TREE_FOREACH_CHILD(child, conf) {
+        dev = dt_do_find_bus(child, bus_type, skip);
+        if (dev)
+            return dev;
+    }
+
+    return NULL;
+}
+
+static dt_device *dt_find_bus(tree *conf, dt_bus_type bus_type, int busno)
+{
+    return dt_do_find_bus(conf, bus_type, &busno);
+}
+
+PCIBus *dt_get_pcibus(dt_device *dev)
+{
+    dt_device *bus = dt_parent_device(dev);
+
+    return bus->drv->get_pcibus(bus);
+}
+
+int dt_get_unit(dt_device *dev)
+{
+    return dev->drv->get_unit(dev);
+}
+
+qemu_irq *dt_get_int(dt_device *dev, int n)
+{
+    dt_device *p;
+    int nirq;
+    qemu_irq *res;
+
+    if (dev->int_parent) {
+        p = dt_device_of(dev->int_parent);
+        assert (p && p->drv->get_int);
+    } else {
+        p = dt_parent_device(dev);
+        if (!p || !p->drv->get_int)
+            return NULL;
+    }
+
+    res = p->drv->get_int(p, &nirq);
+    assert(n <= nirq);
+    return res;
+}
+
+static dt_device *dt_new_device(tree *conf, const dt_driver *drv)
+{
+    dt_device *dev;
+    tree_prop *prop;
+
+    dev = qemu_malloc(sizeof(*dev));
+    dev->conf = conf;
+    dev->drv = drv;
+    dev->visit = 0;
+    dev->priv = qemu_malloc(drv->privsz);
+    tree_put_user(conf, dev);
+
+    TREE_FOREACH_PROP(prop, conf)
+        dt_parse_prop(dev, prop);
+
+    return dev;
+}
+
+static dt_device *dt_create(tree *conf, const dt_driver drvtab[])
+{
+    const dt_driver *drv;
+    dt_device *dev;
+    tree *child;
+
+    drv = dt_driver_by_name(tree_node_name(conf), drvtab);
+    if (!drv) {
+        fprintf(stderr, "No driver for device %s\n",
+                tree_node_name(conf));
+        exit(1);
+    }
+
+    assert((drv->bus_type == DT_BUS_PCI) == (drv->get_pcibus != NULL));
+
+    dev = dt_new_device(conf, drv);
+
+    TREE_FOREACH_CHILD(child, conf)
+        dt_create(child, drvtab);
+
+    return dev;
+}
+
+static void dt_config(tree *conf, dt_host *host)
+{
+    dt_device *dev = dt_device_of(conf);
+    dt_device *bus = dt_parent_device(dev);
+    tree *child;
+
+    if (dev->drv->parent_bus_type == DT_BUS_NONE
+        ? bus != NULL
+        : bus == NULL || bus->drv->bus_type != dev->drv->parent_bus_type) {
+        fprintf(stderr, "Device %s is not on a suitable bus\n",
+                dev->drv->name);
+        exit(1);
+    }
+
+    if (dev->drv->config)
+        dev->drv->config(dev, host);
+
+    TREE_FOREACH_CHILD(child, conf)
+        dt_config(child, host);
+}
+
+static void dt_do_visit(dt_device *dev,
+                        void (*fun)(dt_device *, void *arg),
+                        void *arg, int visit)
+{
+    dt_device *parent, *req, *child;
+    tree *k;
+
+    assert(dev->visit < visit - 1);
+    dev->visit = visit - 1;
+    parent = dt_parent_device(dev);
+    if (parent && parent->visit < visit)
+        dt_do_visit(parent, fun, arg, visit);
+    if (dev->int_parent) {
+        req = dt_device_of(dev->int_parent);
+        if (req->visit < visit)
+            dt_do_visit(req, fun, arg, visit);
+    }
+    dev->visit = visit;
+    fun(dev, arg);
+    TREE_FOREACH_CHILD(k, dev->conf) {
+        child = dt_device_of(k);
+        if (child->visit < visit - 1)
+            dt_do_visit(child, fun, arg, visit);
+    }
+}
+
+static void dt_visit(tree *node,
+                     void (*fun)(dt_device *, void *arg),
+                     void *arg)
+{
+    static int visit;
+
+    visit += 2;
+    dt_do_visit(dt_device_of(node), fun, arg, visit);
+}
+
+static void dt_init_visitor(dt_device *dev, void *arg)
+{
+    if (dev->drv->init)
+        dev->drv->init(dev);
+}
+
+static void dt_init(tree *conf)
+{
+    dt_visit(conf, dt_init_visitor, NULL);
+}
+
+static void dt_start(tree *conf)
+{
+    dt_device *dev = dt_device_of(conf);
+    tree *child;
+
+    if (dev && dev->drv->start)
+        dev->drv->start(dev);
+
+    TREE_FOREACH_CHILD(child, conf)
+        dt_start(child);
+}
+
+void dt_create_machine(tree *conf)
+{
+    dt_fdt_test(conf);
+    dt_init(conf);
+    dt_start(conf);
+}
+
+
+/* Device properties */
+
+static const dt_prop_spec *dt_prop_spec_by_name(const dt_driver *drv,
+                                                const char *name)
+{
+    const dt_prop_spec *spec;
+
+    for (spec = drv->prop_spec; spec && spec->name; spec++) {
+        if (!strcmp(spec->name, name))
+            return spec;
+    }
+    return NULL;
+}
+
+static void dt_parse_prop(dt_device *dev, tree_prop *prop)
+{
+    const char *name = tree_prop_name(prop);
+    size_t size;
+    const char *val = tree_prop_value(prop, &size);
+    const dt_prop_spec *spec = dt_prop_spec_by_name(dev->drv, name);
+
+    if (!spec && !strcmp(name, "interrupt-parent")) {
+        dev->int_parent = tree_node_by_name(dev->conf, val);
+        /* TODO check it's an interrupt controller */
+        return;
+    }
+
+    if (!spec) {
+        fprintf(stderr, "A %s device has no property %s\n",
+                dev->drv->name, name);
+        exit(1);
+    }
+
+    if (memchr(val, 0, size) != val + size - 1
+        || spec->parse((char *)dev->priv + spec->offs, val, spec) < 0) {
+        fprintf(stderr, "Bad value %.*s for property %s of device %s\n",
+                (int)size, val, name, dev->drv->name);
+        exit(1);
+    }
+}
+
+int dt_parse_string(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    assert(spec->size == sizeof(char *));
+    *(const char **)dst = src;
+    return 0;
+}
+
+int dt_parse_int(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    char *ep;
+    long val;
+
+    assert(spec->size == sizeof(int));
+    errno = 0;
+    val = strtol(src, &ep, 0);
+    if (*ep || ep == src || errno || (int)val != val)
+        return -1;
+    *(int *)dst = val;
+    return 0;
+}
+
+int dt_parse_ram_addr_t(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    char *ep;
+    unsigned long val;
+
+    assert(spec->size == sizeof(ram_addr_t));
+    errno = 0;
+    val = strtoul(src, &ep, 0);
+    if (*ep || ep == src || errno || (ram_addr_t)val != val)
+        return -1;
+    *(ram_addr_t *)dst = val;
+    return 0;
+}
+
+int dt_parse_macaddr(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    assert(spec->size == 6);
+    if (parse_macaddr(dst, src) < 0)
+        return -1;
+    return 0;
+}
+
+
+/* Dynamic Devices */
+
+static void dt_add_dyn_dev(tree *conf, tree *node, const dt_driver drvtab[],
+                           int busno)
+{
+    dt_device *dev = dt_create(node, drvtab);
+    dt_device *bus = dt_find_bus(conf, dev->drv->parent_bus_type, busno);
+
+    if (!bus) {
+        fprintf(stderr, "No suitable bus for device %s\n", dev->drv->name);
+        exit(1);
+    }
+
+    tree_insert(bus->conf, node);
+}
+
+static void dt_add_vga(tree *conf, const dt_driver drvtab[],
+                       const char *model, int vga_ram_size)
+{
+    tree *node = tree_new_child(NULL, "vga", NULL);
+
+    tree_put_propf(node, "model", "%s", model);
+    tree_put_propf(node, "ram", "%#x", vga_ram_size);
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+}
+
+static void dt_add_virtio_console(tree *conf, const dt_driver drvtab[],
+                                  int index)
+{
+    tree *node = tree_new_child(NULL, "virtio-console", NULL);
+
+    tree_put_propf(node, "index", "%d", index);
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+}
+
+static void dt_add_nic(tree *conf, dt_host *host, const dt_driver drvtab[],
+                       NICInfo *n)
+{
+    tree *node = tree_new_child(NULL, "nic", NULL);
+
+    tree_put_propf(node, "mac", "%02x:%02x:%02x:%02x:%02x:%02x",
+                   n->macaddr[0], n->macaddr[1], n->macaddr[2],
+                   n->macaddr[3], n->macaddr[4], n->macaddr[5]);
+    tree_put_propf(node, "model", "%s",
+                   n->model ? n->model : "ne2k_pci");
+    if (n->name)
+        tree_put_propf(node, "name", "%s", n->name);
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+    dt_attach_nic(host, node, n->vlan);
+}
+
+static void dt_add_scsi(tree *conf, const dt_driver drvtab[], int busno)
+{
+    tree *node = tree_new_child(NULL, "scsi", NULL);
+
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+    assert(dt_find_bus(conf, DT_BUS_SCSI, busno)->conf == node);
+}
+
+static void dt_add_virtio_block(tree *conf, const dt_driver drvtab[],
+                                int busno)
+{
+    tree *node = tree_new_child(NULL, "virtio-block", NULL);
+
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+    assert(dt_find_bus(conf, DT_BUS_VIRTIO, busno)->conf == node);
+}
+
+static const char *block_if_name[] = {
+    [IF_IDE] = "ide",
+    [IF_SCSI] = "scsi",
+    [IF_FLOPPY] = "floppy",
+    [IF_PFLASH] = "pflash",
+    [IF_MTD] = "mtd",
+    [IF_SD] = "sd",
+    [IF_VIRTIO] = "virtio",
+};
+
+static void dt_do_add_drive(tree *conf, dt_host *host,
+                            const dt_driver drvtab[],
+                            int bus_type, int busno, int unit,
+                            BlockDriverState *bdrv)
+{
+    char buf[32];
+    tree *node;
+
+    snprintf(buf, sizeof(buf), "%s-drive", block_if_name[bus_type]);
+    node = tree_new_child(NULL, strdup(buf), NULL);
+    tree_put_propf(node, "unit", "%d", unit);
+    dt_add_dyn_dev(conf, node, drvtab, busno);
+    dt_attach_drive(host, node, bdrv);
+}
+
+static void dt_add_drive(tree *conf, dt_host *host, const dt_driver drvtab[],
+                         DriveInfo *d)
+{
+    switch (d->type) {
+    case IF_IDE:
+        /* hack to hang all IDE drives off the same node for now */
+        dt_do_add_drive(conf, host, drvtab,
+                        d->type, 0, d->bus * MAX_IDE_DEVS + d->unit, d->bdrv);
+        break;
+    case IF_SCSI:
+    case IF_FLOPPY:
+        dt_do_add_drive(conf, host, drvtab,
+                        d->type, d->bus, d->unit, d->bdrv);
+        break;
+    case IF_VIRTIO:
+        /* See comment on virtio block in dt_add_dyn_devs() */
+        dt_do_add_drive(conf, host, drvtab,
+                        d->type, host->virtio_drives++, 0, d->bdrv);
+        break;
+    case IF_PFLASH:
+    case IF_MTD:
+    case IF_SD:
+        /* TODO implement */
+        fprintf(stderr, "Ignoring unimplemented drive %s\n",
+                drives_opt[d->drive_opt_idx].opt);
+        break;
+    }
+}
+
+static void dt_add_dyn_devs(tree *conf, dt_host *host,
+                            const dt_driver drvtab[], int vga_ram_size)
+{
+    int i, max_bus, busno;
+
+    /* VGA */
+    if (cirrus_vga_enabled || vmsvga_enabled || std_vga_enabled) {
+        dt_add_vga(conf, drvtab,
+                   cirrus_vga_enabled ? "cirrus"
+                   : vmsvga_enabled ? "vms" : "std",
+                   vga_ram_size);
+    }
+
+    /* Virtio consoles */
+    for (i = 0; i < MAX_VIRTIO_CONSOLES; i++) {
+        if (virtcon_hds[i])
+            dt_add_virtio_console(conf, drvtab, i);
+    }
+
+    /* NICs */
+    for(i = 0; i < nb_nics; i++)
+        dt_add_nic(conf, host, drvtab, &nd_table[i]);
+
+    /*
+     * SCSI controllers
+     *
+     * This creates all controllers 0..max_bus, whether they have
+     * drives or not.  Matches pc.c behavior.
+     */
+    max_bus = drive_get_max_bus(IF_SCSI);
+    for (i = 0; i <= max_bus; i++)
+        dt_add_scsi(conf, drvtab, i);
+
+    /*
+     * Virtio block controllers
+     *
+     * Each virtio drive is its own PCI device.  Since the device tree
+     * should reflect that, we give each device its own virtio block
+     * controller node.
+     *
+     * DriveInfo's bus and unit are a mess.  The user can specify any
+     * bus or unit number.  An unspecified bus number defaults to
+     * zero, and an unspecified unit number defaults to the first
+     * unused one (see drive_init()).  pc.c silently ignores all
+     * virtio drives with non-zero bus number, and all drives on bus
+     * zero after the first unused unit number.  Instead of
+     * replicating that questionable behavior, simply ignore bus and
+     * unit for these drives.
+     */
+    busno = 0;
+    for (i = 0; i < nb_drives; i++) {
+        if (drives_table[i].type == IF_VIRTIO)
+            dt_add_virtio_block(conf, drvtab, busno++);
+    }
+
+    /* Drives */
+    for (i = 0; i < nb_drives; i++)
+        dt_add_drive(conf, host, drvtab, &drives_table[i]);
+}
+
+
+/* Create a configuration */
+
+tree *dt_read_config(const char *name)
+{
+#ifdef TARGET_X86_64
+#define CPU_MODEL_DEFAULT "qemu64"
+#else
+#define CPU_MODEL_DEFAULT "qemu32"
+#endif
+    tree *root, *pci, *leaf;
+
+    /*
+     * TODO Read from config file.
+     *
+     * TODO Pretty far from a comprehensive machine configuration, but
+     * we need to start somewhere.
+     */
+    if (strcmp(name, "pcdt")) {
+        fprintf(stderr, "qemu: machine %s not implemented\n", name);
+        exit(1);
+    }
+    root = tree_new_child(NULL, "", NULL);
+    leaf = tree_new_child(root, "cpus", NULL);
+    tree_put_propf(leaf, "model", "%s", CPU_MODEL_DEFAULT);
+    leaf = tree_new_child(root, "memory", NULL);
+    leaf = tree_new_child(root, "pc-misc", NULL);
+    pci = tree_new_child(root, "pci", NULL);
+    tree_put_propf(pci, "interrupt-parent", "/pc-misc");
+    leaf = tree_new_child(pci, "piix3", NULL);
+    tree_put_propf(leaf, "interrupt-parent", "/pc-misc");
+    leaf = tree_new_child(pci, "virtio-balloon", NULL);
+    return root;
+#undef CPU_MODEL_DEFAULT
+}
+
+/*
+ * Extract configuration from arguments and various global variables
+ * and put it into our machine configuration.
+ */
+void dt_modify_config(tree *conf,
+                      const dt_driver drvtab[],
+                      ram_addr_t ram_size, int vga_ram_size,
+                      const char *boot_device,
+                      const char *kernel_filename,
+                      const char *kernel_cmdline,
+                      const char *initrd_filename,
+                      const char *cpu_model)
+{
+    /*
+     * TODO This is still pretty cheesy: we insert stuff into the tree
+     * at hardcoded places.  Replacing placeholders instead would be
+     * more flexible.  Another idea is to mark certain parts of the
+     * initial tree optional, and remove them here.
+     */
+    tree *node;
+    dt_host host;
+
+    tree_print(conf);
+
+    node = tree_node_by_name(conf, "/cpus");
+    tree_put_propf(node, "num", "%d", smp_cpus);
+    if (cpu_model)
+        tree_put_propf(node, "model", "%s", cpu_model);
+
+    node = tree_node_by_name(conf, "/memory");
+    tree_put_propf(node, "ram", "%#lx", (unsigned long)ram_size);
+
+    node = tree_node_by_name(conf, "/pc-misc");
+    tree_put_propf(node, "boot-device", "%s", boot_device);
+
+    /* Unimplemented stuff */
+    if (kernel_filename)
+        abort();                /* TODO */
+
+    dt_create(conf, drvtab);
+    memset(&host, 0, sizeof(host));
+    dt_add_dyn_devs(conf, &host, drvtab, vga_ram_size);
+    dt_config(conf, &host);
+
+    dt_print_host_config(&host);
+    tree_print(conf);
+}
+
+
+/* Interfacing with FDT */
+
+/*
+ * Note: translation to FDT loses the association between
+ * configuration tree nodes and devices.
+ */
+
+#ifdef HAVE_FDT
+
+static int dt_fdt_chk(int res);
+static void dt_subtree_to_fdt(const tree *conf, void *fdt);
+
+static void *dt_tree_to_fdt(const tree *conf)
+{
+    int sz = 1024 * 1024;       /* FIXME arbitrary limit */
+    void *fdt = qemu_malloc(sz);
+
+    dt_fdt_chk(fdt_create(fdt, sz));
+    dt_subtree_to_fdt(conf, fdt);
+    dt_fdt_chk(fdt_finish(fdt));
+    return fdt;
+}
+
+static void dt_subtree_to_fdt(const tree *conf, void *fdt)
+{
+    tree_prop *prop;
+    tree *child;
+    const void *pv;
+    size_t sz;
+
+    dt_fdt_chk(fdt_begin_node(fdt, tree_node_name(conf)));
+    TREE_FOREACH_PROP(prop, conf) {
+        pv = tree_prop_value(prop, &sz);
+        dt_fdt_chk(fdt_property(fdt, tree_prop_name(prop), pv, sz));
+    }
+    TREE_FOREACH_CHILD(child, conf)
+        dt_subtree_to_fdt(child, fdt);
+    dt_fdt_chk(fdt_end_node(fdt));
+}
+
+static tree *dt_fdt_to_tree(const void *fdt)
+{
+    int offs, next, depth;
+    uint32_t tag;
+    struct fdt_property *prop;
+    tree *stack[32];            /* FIXME arbitrary limit */
+
+    stack[0] = NULL;            /* "parent" of root */
+    next = depth = 0;
+
+    for (;;) {
+        offs = next;
+        tag = fdt_next_tag(fdt, offs, &next);
+        switch (tag) {
+        case FDT_PROP:
+            /*
+             * libfdt apparently doesn't provide a way to get property
+             * by offset, do it by hand
+             */
+            assert(0 < depth && depth < ARRAY_SIZE(stack));
+            prop = (void *)((const char *)fdt + fdt_off_dt_struct(fdt) + offs);
+            tree_put_prop(stack[depth],
+                          fdt_string(fdt, fdt32_to_cpu(prop->nameoff)),
+                          prop->data,
+                          fdt32_to_cpu(prop->len)); /* fall through */
+        case FDT_NOP:
+            break;
+        case FDT_BEGIN_NODE:
+            depth++;
+            assert(0 < depth && depth < ARRAY_SIZE(stack));
+            stack[depth] = tree_new_child(stack[depth-1],
+                                          fdt_get_name(fdt, offs, NULL),
+                                          NULL);
+            break;
+        case FDT_END_NODE:
+            depth--;
+            break;
+        case FDT_END:
+            dt_fdt_chk(next);
+            return stack[1];
+        }
+    }
+}
+
+static int dt_fdt_chk(int res)
+{
+    if (res < 0) {
+        fprintf(stderr, "%s\n", fdt_strerror(res)); /* FIXME cryptic */
+        exit(1);
+    }
+    return res;
+}
+
+static void dt_fdt_test(tree *conf)
+{
+    void *fdt;
+
+    fdt = dt_tree_to_fdt(conf);
+    conf = dt_fdt_to_tree(fdt);
+    tree_print(conf);
+    qemu_free(fdt);
+}
+#else
+static void dt_fdt_test(tree *conf) { }
+#endif
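To illustrate the ordering that dt_do_visit() in dt.c establishes, here
is a stand-alone sketch.  It is not part of the patch.  The node type,
add_kid(), record() and order_log are made up for the example; only the
stamp logic is meant to mirror dt_do_visit(): a node is visited after
its tree parent and after its interrupt parent, and a stamp of v - 1
marks a visit in progress, so a dependency cycle trips the assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KIDS 4

/* Hypothetical stand-in for a tree node with its dt_device merged in */
typedef struct node node;
struct node {
    const char *name;
    node *parent;               /* tree parent, NULL for the root */
    node *int_parent;           /* extra dependency, may be NULL */
    node *kid[MAX_KIDS];
    int nkids;
    int visit;                  /* < v-1: untouched, v-1: in progress, v: done */
};

/* Hypothetical log of the order in which nodes were visited */
typedef struct order_log {
    const char *name[8];
    int n;
} order_log;

static void record(node *n, void *arg)
{
    order_log *log = arg;

    log->name[log->n++] = n->name;
}

static void add_kid(node *parent, node *kid)
{
    assert(parent->nkids < MAX_KIDS);
    kid->parent = parent;
    parent->kid[parent->nkids++] = kid;
}

/* Same stamp scheme as dt_do_visit(); the caller passes a fresh v */
static void do_visit(node *n, void (*fun)(node *, void *), void *arg, int v)
{
    int i;

    assert(n->visit < v - 1);   /* fails on a dependency cycle */
    n->visit = v - 1;           /* mark in progress */
    if (n->parent && n->parent->visit < v)
        do_visit(n->parent, fun, arg, v);
    if (n->int_parent && n->int_parent->visit < v)
        do_visit(n->int_parent, fun, arg, v);
    n->visit = v;               /* dependencies done, now this node */
    fun(n, arg);
    for (i = 0; i < n->nkids; i++) {
        if (n->kid[i]->visit < v - 1)
            do_visit(n->kid[i], fun, arg, v);
    }
}
```

With a root whose children are pci and pc-misc (in that order), pci
having a child piix3, and both pci and piix3 naming pc-misc as interrupt
parent, do_visit() on the root with v = 2 calls the function on root,
pc-misc, pci, piix3: pc-misc is pulled ahead of its sibling because pci
depends on it.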
diff --git a/dt.h b/dt.h
new file mode 100644
index 0000000..dc48f9d
--- /dev/null
+++ b/dt.h
@@ -0,0 +1,116 @@
+#ifndef DT_H
+#define DT_H
+
+#include "sysemu.h"
+#include "net.h"
+#include "tree.h"
+
+typedef struct dt_host dt_host;
+typedef struct dt_device dt_device;
+typedef struct dt_prop_spec dt_prop_spec;
+
+
+/* Host Configuration */
+
+VLANState *dt_find_vlan(tree *conf, dt_host *host);
+void dt_find_drives(tree *conf, dt_host *host,
+                    BlockDriverState *drive[], int n);
+
+
+/* Device Interface */
+
+/*
+ * Device life cycle:
+ *
+ * 1. Configuration: config() method runs after parent's.  It should
+ * initialize the device's private data from its configuration
+ * sub-tree.  It may edit the configuration sub-tree.
+ *
+ * 2. Initialization: init() method runs after parent's and interrupt
+ * parent's.  It should not touch the configuration tree.
+ *
+ * 3. Start: start() method runs, order is unspecified.
+ *
+ * Error handling in these driver methods: print to stderr and exit
+ * the program unsuccessfully.
+ *
+ * FIXME The requirement that init() runs after interrupt parent's
+ * init() can't be satisfied for arbitrary interrupt trees.  When that
+ * happens, dt_create_machine() aborts the program.
+ *
+ * There is no device shutdown protocol yet.
+ */
+
+struct dt_device {
+    tree *conf;                 /* configuration sub-tree */
+    const dt_driver *drv;       /* device driver */
+    tree *int_parent;           /* interrupt parent if != tree_parent(conf) */
+    int visit;                  /* for dt_visit() */
+    void *priv;                 /* device private data */
+};
+
+typedef enum dt_bus_type {
+    DT_BUS_NONE, DT_BUS_ROOT, DT_BUS_PCI, DT_BUS_IDE, DT_BUS_SCSI,
+    DT_BUS_FLOPPY, DT_BUS_VIRTIO
+} dt_bus_type;
+
+struct dt_driver {
+    const char *name;
+    size_t privsz;              /* size of device private data */
+    const dt_prop_spec *prop_spec; /* recognized conf node properties */
+    dt_bus_type bus_type, parent_bus_type;
+    /* life cycle methods */
+    void (*config)(dt_device *, dt_host *);
+    void (*init)(dt_device *);
+    void (*start)(dt_device *);
+    /* def'd iff device is a PCI bus, may return NULL until after init() */
+    PCIBus *(*get_pcibus)(dt_device *);
+    /* optional, always available */
+    int (*get_unit)(dt_device *);
+    /* def'd iff is an interrupt ctrlr., may return NULL until after init() */
+    qemu_irq *(*get_int)(dt_device *, int *);
+};
+
+dt_device *dt_device_of(tree *conf);
+dt_device *dt_parent_device(dt_device *dev);
+PCIBus *dt_get_pcibus(dt_device *dev);
+int dt_get_unit(dt_device *dev);
+qemu_irq *dt_get_int(dt_device *dev, int n);
+
+tree *dt_read_config(const char *name);
+void dt_modify_config(tree *conf,
+                      const dt_driver drvtab[],
+                      ram_addr_t ram_size, int vga_ram_size,
+                      const char *boot_device,
+                      const char *kernel_filename,
+                      const char *kernel_cmdline,
+                      const char *initrd_filename,
+                      const char *cpu_model);
+void dt_create_machine(tree *conf);
+
+
+/* Device properties */
+
+/*
+ * This is for parsing configuration tree node properties into device
+ * private data.
+ */
+
+struct dt_prop_spec {
+    const char *name;
+    ptrdiff_t offs;             /* offset in device private data */
+    size_t size;                /* size there, for sanity checking */
+    int (*parse)(void *, const char *, const dt_prop_spec *);
+};
+
+#define DT_PROP_SPEC_INIT(name, strty, member, fmt)                     \
+    { name, offsetof(strty, member), sizeof(((strty *)0)->member),      \
+      dt_parse_##fmt }
+
+/* Canned property parse methods */
+int dt_parse_string(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_int(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_ram_addr_t(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_macaddr(void *dst, const char *src, const dt_prop_spec *spec);
+
+#endif
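The property spec machinery in dt.h is easier to follow with a small
stand-alone example.  The sketch below is not part of the patch; it
re-creates the idea with made-up names (prop_spec, PROP_SPEC_INIT,
demo_cpus, set_prop) so that it compiles on its own.  One assumption
worth calling out: the table ends with a NULL-name sentinel, which a
lookup loop of the form "spec && spec->name" needs in order to stop.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for dt_prop_spec */
typedef struct prop_spec prop_spec;
struct prop_spec {
    const char *name;           /* NULL terminates the table */
    ptrdiff_t offs;             /* offset in device private data */
    size_t size;                /* size there, for sanity checking */
    int (*parse)(void *dst, const char *src, const prop_spec *spec);
};

#define PROP_SPEC_INIT(name, strty, member, fmt)                        \
    { name, offsetof(strty, member), sizeof(((strty *)0)->member),      \
      parse_##fmt }

/* Hypothetical device private data, modelled on dt_device_cpus */
typedef struct demo_cpus {
    const char *model;
    int num;
} demo_cpus;

static int parse_string(void *dst, const char *src, const prop_spec *spec)
{
    assert(spec->size == sizeof(char *));
    *(const char **)dst = src;  /* stores the pointer, does not copy */
    return 0;
}

static int parse_int(void *dst, const char *src, const prop_spec *spec)
{
    char *ep;
    long val;

    assert(spec->size == sizeof(int));
    errno = 0;
    val = strtol(src, &ep, 0);  /* base 0: accepts decimal, 0x.., 0.. */
    if (*ep || ep == src || errno || val < INT_MIN || val > INT_MAX)
        return -1;
    *(int *)dst = (int)val;
    return 0;
}

static const prop_spec demo_cpus_props[] = {
    PROP_SPEC_INIT("model", demo_cpus, model, string),
    PROP_SPEC_INIT("num", demo_cpus, num, int),
    { NULL, 0, 0, NULL }        /* sentinel, an assumption of this sketch */
};

/* Returns 0 on success, -1 on unknown property or bad value */
static int set_prop(void *priv, const prop_spec *tab,
                    const char *name, const char *val)
{
    const prop_spec *spec;

    for (spec = tab; spec->name; spec++) {
        if (!strcmp(spec->name, name))
            return spec->parse((char *)priv + spec->offs, val, spec);
    }
    return -1;
}
```

The point of the offset/size pair is that generic code can fill in any
driver's private struct without knowing its type, while the size check
catches a table entry that names the wrong parse method for a member.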
diff --git a/hw/boards.h b/hw/boards.h
index 1e18ba6..1d1f1ae 100644
--- a/hw/boards.h
+++ b/hw/boards.h
@@ -13,7 +13,8 @@ typedef void QEMUMachineInitFunc(ram_addr_t ram_size, int vga_ram_size,
 typedef struct QEMUMachine {
     const char *name;
     const char *desc;
-    QEMUMachineInitFunc *init;
+    QEMUMachineInitFunc *init;  /* traditional machine initialization */
+    const dt_driver *drvtab;    /* new alternative, used if !init */
 #define RAMSIZE_FIXED	(1 << 0)
     ram_addr_t ram_require;
     int use_scsi;
@@ -34,6 +35,9 @@ extern QEMUMachine axisdev88_machine;
 extern QEMUMachine pc_machine;
 extern QEMUMachine isapc_machine;
 
+/* pcdt.c */
+extern QEMUMachine pcdt_machine;
+
 /* ppc.c */
 extern QEMUMachine prep_machine;
 extern QEMUMachine core99_machine;
diff --git a/hw/pc.c b/hw/pc.c
index f9cfd1f..e67175b 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -37,42 +37,35 @@
 #include "virtio-balloon.h"
 #include "virtio-console.h"
 #include "hpet_emul.h"
+#include "pcint.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
 
-#define BIOS_FILENAME "bios.bin"
-#define VGABIOS_FILENAME "vgabios.bin"
-#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
-
-#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
-
 /* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables.  */
 #define ACPI_DATA_SIZE       0x10000
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 
-#define MAX_IDE_BUS 2
-
-static fdctrl_t *floppy_controller;
-static RTCState *rtc_state;
+fdctrl_t *floppy_controller;
+RTCState *rtc_state;
 static PITState *pit;
 static IOAPICState *ioapic;
-static PCIDevice *i440fx_state;
+PCIDevice *i440fx_state;
 
-static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
 {
 }
 
 /* MSDOS compatibility mode FPU exception support */
-static qemu_irq ferr_irq;
+qemu_irq ferr_irq;
 /* XXX: add IGNNE support */
 void cpu_set_ferr(CPUX86State *s)
 {
     qemu_irq_raise(ferr_irq);
 }
 
-static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
 {
     qemu_irq_lower(ferr_irq);
 }
@@ -121,7 +114,7 @@ int cpu_get_pic_interrupt(CPUState *env)
     return intno;
 }
 
-static void pic_irq_request(void *opaque, int irq, int level)
+void pic_irq_request(void *opaque, int irq, int level)
 {
     CPUState *env = first_cpu;
 
@@ -167,7 +160,7 @@ static int cmos_get_fd_drive_type(int fd0)
     return val;
 }
 
-static void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
 {
     RTCState *s = rtc_state;
     int cylinders, heads, sectors;
@@ -203,7 +196,7 @@ static int boot_device2nibble(char boot_device)
 
 /* copy/pasted from cmos_init, should be made a general function
  and used there as well */
-static int pc_boot_set(void *opaque, const char *boot_device)
+int pc_boot_set(void *opaque, const char *boot_device)
 {
     Monitor *mon = cur_mon;
 #define PC_MAX_BOOT_DEVICES 3
@@ -230,8 +223,8 @@ static int pc_boot_set(void *opaque, const char *boot_device)
 }
 
 /* hd_table must contain 4 block drivers */
-static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
-                      const char *boot_device, BlockDriverState **hd_table)
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table)
 {
     RTCState *s = rtc_state;
     int nbds, bds[3] = { 0, };
@@ -364,13 +357,13 @@ int ioport_get_a20(void)
     return ((first_cpu->a20_mask >> 20) & 1);
 }
 
-static void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
 {
     ioport_set_a20((val >> 1) & 1);
     /* XXX: bit 0 is fast reset */
 }
 
-static uint32_t ioport92_read(void *opaque, uint32_t addr)
+uint32_t ioport92_read(void *opaque, uint32_t addr)
 {
     return ioport_get_a20() << 1;
 }
@@ -422,7 +415,7 @@ static void bochs_bios_write(void *opaque, uint32_t addr, uint32_t val)
     }
 }
 
-static void bochs_bios_init(void)
+void bochs_bios_init(void)
 {
     void *fw_cfg;
 
@@ -691,7 +684,7 @@ static void load_linux(uint8_t *option_rom,
     generate_bootsect(option_rom, gpr, seg, 0);
 }
 
-static void main_cpu_reset(void *opaque)
+void main_cpu_reset(void *opaque)
 {
     CPUState *env = opaque;
     cpu_reset(env);
@@ -706,11 +699,11 @@ static const int ide_irq[2] = { 14, 15 };
 static int ne2000_io[NE2000_NB_MAX] = { 0x300, 0x320, 0x340, 0x360, 0x280, 0x380 };
 static int ne2000_irq[NE2000_NB_MAX] = { 9, 10, 11, 3, 4, 5 };
 
-static int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
-static int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
+int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
+int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
 
-static int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
-static int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
+int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
+int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
 
 #ifdef HAS_AUDIO
 static void audio_init (PCIBus *pci_bus, qemu_irq *pic)
diff --git a/hw/pcdt.c b/hw/pcdt.c
new file mode 100644
index 0000000..4ea1271
--- /dev/null
+++ b/hw/pcdt.c
@@ -0,0 +1,692 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * This is a PC configured and built using the new dt.h interface.
+ * Having two PC machine types makes no sense in the long run, of
+ * course.  We want to replace pc.c eventually, and also convert other
+ * machine types to this interface.
+ *
+ * The configuration data currently is hardwired, and fairly limited.
+ *
+ * The nuts and bolts of PC emulation remain in pc.c for now, and
+ * using the stuff there makes the somewhat clumsy pcint.h necessary.
+ *
+ * The drivers here generally don't do the actual work, they just
+ * provide a common interface to existing device code.  Arguably, they
+ * should be integrated into that device code, with the goal of
+ * eventually replacing the old, ad hoc interfaces.
+ *
+ * Several drivers here are not PC-specific, e.g. drivers for various
+ * PCI devices.
+ */
+
+#include <assert.h>
+#include "hw.h"
+#include "pc.h"
+#include "fdc.h"
+#include "pci.h"
+#include "block.h"
+#include "sysemu.h"
+#include "audio/audio.h"
+#include "net.h"
+#include "smbus.h"
+#include "boards.h"
+#include "console.h"
+#include "fw_cfg.h"
+#include "virtio-blk.h"
+#include "virtio-balloon.h"
+#include "virtio-console.h"
+#include "hpet_emul.h"
+#include "pcint.h"
+#include "dt.h"
+
+
+static BlockDriverState **dt_piix3_bds(tree *piix3);
+
+/* CPUs Driver */
+
+typedef struct dt_device_cpus {
+    const char *model;
+    int num;
+} dt_device_cpus;
+
+static const dt_prop_spec dt_cpus_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
+    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
+};
+
+static void dt_cpus_init(dt_device *dev)
+{
+    dt_device_cpus *priv = dev->priv;
+    int i;
+    CPUState *env;
+
+    for(i = 0; i < priv->num; i++) {
+        env = cpu_init(priv->model);
+        if (!env) {
+            fprintf(stderr, "Unable to find CPU definition\n");
+            exit(1);
+        }
+        if (i != 0)
+            env->halted = 1;
+        qemu_register_reset(main_cpu_reset, env);
+    }
+}
+
+
+/* Memory Ranges */
+
+typedef struct dt_device_memrng {
+    target_phys_addr_t phys_addr;
+    ram_addr_t size;
+    ram_addr_t host_offs;
+    ram_addr_t flags;
+} dt_device_memrng;
+
+static void dt_memrng(dt_device_memrng *rng,
+                      target_phys_addr_t phys_addr, ram_addr_t size,
+                      ram_addr_t host_offs, ram_addr_t flags)
+{
+    rng->phys_addr = phys_addr;
+    rng->size = size;
+    rng->host_offs = host_offs;
+    rng->flags = flags;
+}
+
+static void dt_memrng_ram(dt_device_memrng *rng,
+                          target_phys_addr_t phys_addr, ram_addr_t size)
+{
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), 0);
+}
+
+static void dt_memrng_rom(dt_device_memrng *rng,
+                          target_phys_addr_t phys_addr, ram_addr_t maxsz,
+                          const char *dir, const char *image, int top)
+{
+    char buf[1024];
+    int size;
+
+    snprintf(buf, sizeof(buf), "%s/%s", dir, image);
+    size = get_image_size(buf);
+    if (size < 0 || size > maxsz)
+        goto error;
+    if (top)
+        phys_addr = phys_addr + maxsz - size;
+    dt_memrng(rng, phys_addr, size, qemu_ram_alloc(size), IO_MEM_ROM);
+    if (load_image(buf, phys_ram_base + rng->host_offs) != size)
+        goto error;
+    return;
+
+error:
+    fprintf(stderr, "qemu: could not load image '%s'\n", buf);
+    exit(1);
+}
+
+static void dt_memrng_init(dt_device_memrng *rng, int n)
+{
+    int i;
+
+    for (i = 0; i < n; i++)
+        cpu_register_physical_memory(rng[i].phys_addr, rng[i].size,
+                                     rng[i].host_offs | rng[i].flags);
+}
+
+
+/* Memory Driver */
+
+typedef struct dt_device_memory {
+    ram_addr_t ram_size;
+    dt_device_memrng *rng;
+    int nrng;
+    /* TODO want a real memory map here */
+    ram_addr_t below_4g, above_4g;
+} dt_device_memory;
+
+static const dt_prop_spec dt_memory_props[] = {
+    DT_PROP_SPEC_INIT("ram", dt_device_memory, ram_size, ram_addr_t),
+};
+
+static void dt_memory_config(dt_device *dev, dt_host *host)
+{
+    /* TODO memory map hardcoded; get it from dev->conf instead */
+    dt_device_memory *priv = dev->priv;
+    dt_device_memrng *rng = qemu_malloc(sizeof(*rng) * 4);
+
+    if (priv->ram_size >= 0xe0000000) {
+        priv->above_4g = priv->ram_size - 0xe0000000;
+        priv->below_4g = 0xe0000000;
+    } else {
+        priv->below_4g = priv->ram_size;
+        priv->above_4g = 0;
+    }
+
+    dt_memrng_ram(&rng[0], 0, 0xa0000);
+    qemu_ram_alloc(0x60000);    /* 0xa0000..0xfffff: allocated, not registered */
+    dt_memrng_ram(&rng[1], 0x100000, priv->below_4g - 0x100000);
+    if (priv->above_4g)
+        abort();                /* TODO */
+    dt_memrng_rom(&rng[2], 0xe0000000, 0x20000000,
+                  bios_dir, BIOS_FILENAME, 1);
+                                /* TODO get name from dev->conf */
+    dt_memrng(&rng[3], 0xe0000, 0x20000,
+              rng[2].host_offs + rng[2].size - 0x20000, IO_MEM_ROM);
+    /* TODO option ROMs */
+
+    priv->rng = rng;
+    priv->nrng = 4;
+}
+
+static void dt_memory_init(dt_device *dev)
+{
+    dt_device_memory *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, priv->nrng);
+    bochs_bios_init();
+}
+
+static ram_addr_t dt_memory_below_4g(tree *memory)
+{
+    dt_device *dev = dt_device_of(memory);
+    dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->below_4g;
+}
+
+static ram_addr_t dt_memory_above_4g(tree *memory)
+{
+    dt_device *dev = dt_device_of(memory);
+    dt_device_memory *priv = dev->priv;
+    assert(dev->drv->init == dt_memory_init);
+    return priv->above_4g;
+}
+
+
+/* PC Miscellaneous Driver */
+
+/*
+ * This is a driver for a whole collection of devices.  Could be
+ * picked apart into separate drivers, I guess.
+ */
+
+typedef struct dt_device_pc_misc {
+    const char *boot_device;
+    int apic;
+    int hpet;
+    qemu_irq *i8259;
+    BlockDriverState *bds[MAX_FD];
+} dt_device_pc_misc;
+
+static const dt_prop_spec dt_pc_misc_props[] = {
+    DT_PROP_SPEC_INIT("boot-device", dt_device_pc_misc, boot_device,
+                      string),
+};
+
+static void dt_pc_misc_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pc_misc *priv = dev->priv;
+
+    priv->apic = 1;
+    priv->hpet = 1;
+    priv->i8259 = NULL;
+    dt_find_drives(dev->conf, host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_pc_misc_init(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    CPUState *env;
+    qemu_irq *cpu_irq;
+    IOAPICState *ioapic;
+    PITState *pit;
+    int i;
+
+    if (priv->apic) {
+        for (env = first_cpu; env; env = env->next_cpu) {
+            env->cpuid_features |= CPUID_APIC;
+            apic_init(env);
+        }
+    }
+
+    vmport_init();
+
+    cpu_irq = qemu_allocate_irqs(pic_irq_request, NULL, 1);
+    priv->i8259 = i8259_init(cpu_irq[0]);
+    ferr_irq = priv->i8259[13];
+
+    register_ioport_write(0x80, 1, 1, ioport80_write, NULL);
+    register_ioport_write(0xf0, 1, 1, ioportF0_write, NULL);
+
+    rtc_state = rtc_init(0x70, priv->i8259[8], 2000);
+    qemu_register_boot_set(pc_boot_set, rtc_state);
+
+    register_ioport_read(0x92, 1, 1, ioport92_read, NULL);
+    register_ioport_write(0x92, 1, 1, ioport92_write, NULL);
+
+    if (priv->apic) {
+        ioapic = ioapic_init();
+        pic_set_alt_irq_func(isa_pic, ioapic_set_irq, ioapic);
+    }
+
+    pit = pit_init(0x40, priv->i8259[0]);
+    pcspk_init(pit);
+    if (priv->hpet)
+        hpet_init(priv->i8259);
+
+    for(i = 0; i < MAX_SERIAL_PORTS; i++) {
+        if (serial_hds[i]) {
+            serial_init(serial_io[i], priv->i8259[serial_irq[i]], 115200,
+                        serial_hds[i]);
+        }
+    }
+
+    for(i = 0; i < MAX_PARALLEL_PORTS; i++) {
+        if (parallel_hds[i]) {
+            parallel_init(parallel_io[i], priv->i8259[parallel_irq[i]],
+                          parallel_hds[i]);
+        }
+    }
+
+    qemu_system_hot_add_init();
+
+    i8042_init(priv->i8259[1], priv->i8259[12], 0x60);
+    DMA_init(0);
+
+    floppy_controller = fdctrl_init(priv->i8259[6], 2, 0, 0x3f0, priv->bds);
+}
+
+static void dt_pc_misc_start(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    tree *memory = tree_node_by_name(dev->conf, "/memory");
+    tree *piix3 = tree_node_by_name(dev->conf, "/pci/piix3");
+
+    cmos_init(dt_memory_below_4g(memory),
+              dt_memory_above_4g(memory),
+              priv->boot_device,
+              dt_piix3_bds(piix3));
+}
+
+static qemu_irq *dt_pc_misc_get_int(dt_device *dev, int *nirq)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    if (nirq)
+        *nirq = 16;
+    return priv->i8259;
+}
+
+
+/* PCI Bus Driver */
+
+typedef struct dt_device_pci {
+    PCIBus *pcibus;
+} dt_device_pci;
+
+static void dt_pci_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->pcibus = NULL;
+}
+
+static void dt_pci_init(dt_device *dev)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->pcibus = i440fx_init(&i440fx_state, dt_get_int(dev, 16));
+}
+
+static void dt_pci_start(dt_device *dev)
+{
+    i440fx_init_memory_mappings(i440fx_state);
+}
+
+static PCIBus *dt_pci_get_pcibus(dt_device *dev)
+{
+    return ((dt_device_pci *)dev->priv)->pcibus;
+}
+
+
+/* PIIX3 Driver */
+
+typedef struct dt_device_piix3 {
+    int devfn;
+    int acpi;
+    int usb;
+    BlockDriverState *bds[MAX_IDE_BUS * MAX_IDE_DEVS];
+} dt_device_piix3;
+
+static void dt_piix3_config(dt_device *dev, dt_host *host)
+{
+    dt_device_piix3 *priv = dev->priv;
+
+    priv->devfn = -1;
+    priv->acpi = 1;
+    priv->usb = 1;
+    dt_find_drives(dev->conf, host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_piix3_init(dt_device *dev)
+{
+    dt_device_piix3 *priv = dev->priv;
+    PCIBus *pci_bus = dt_get_pcibus(dev);
+    qemu_irq *i8259 = dt_get_int(dev, 16);
+    int i;
+
+    priv->devfn = piix3_init(pci_bus, priv->devfn);
+
+    pci_piix3_ide_init(pci_bus, priv->bds, priv->devfn + 1, i8259);
+
+    if (priv->usb)
+        usb_uhci_piix3_init(pci_bus, priv->devfn + 2);
+
+    if (priv->acpi) {
+        uint8_t *eeprom_buf = qemu_mallocz(8 * 256); /* XXX: make this persistent */
+        i2c_bus *smbus;
+
+        /* TODO: Populate SPD eeprom data.  */
+        smbus = piix4_pm_init(pci_bus, priv->devfn + 3, 0xb100, i8259[9]);
+        for (i = 0; i < 8; i++)
+            smbus_eeprom_device_init(smbus, 0x50 + i, eeprom_buf + (i * 256));
+    }
+}
+
+static BlockDriverState **dt_piix3_bds(tree *piix3)
+{
+    dt_device *dev = dt_device_of(piix3);
+    dt_device_piix3 *priv = dev->priv;
+
+    assert(dev->drv->init == dt_piix3_init);
+    return priv->bds;
+}
+
+
+/* VGA Driver */
+
+typedef struct dt_driver_vga {
+    const char *model;
+    const char *bios;
+    void (*init)(PCIBus *, uint8_t *, ram_addr_t, int);
+} dt_driver_vga;
+
+static void pci_vga_init_(PCIBus *bus, uint8_t *vga_ram_base,
+                          ram_addr_t vga_ram_offset, int vga_ram_size)
+{
+    pci_vga_init(bus, vga_ram_base, vga_ram_offset, vga_ram_size, 0, 0);
+}
+
+static dt_driver_vga dt_driver_vga_table[] = {
+    { "cirrus", VGABIOS_CIRRUS_FILENAME, pci_cirrus_vga_init },
+    { "vms", VGABIOS_FILENAME, pci_vmsvga_init },
+    { "std", VGABIOS_FILENAME, pci_vga_init_ },
+    { NULL, NULL, NULL }
+};
+
+typedef struct dt_device_vga {
+    const char *model;
+    ram_addr_t ram_size;
+    dt_device_memrng rng[1];
+    ram_addr_t ram_offs;
+    dt_driver_vga *vga_drv;
+} dt_device_vga;
+
+static const dt_prop_spec dt_vga_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_vga, model, string),
+    DT_PROP_SPEC_INIT("ram", dt_device_vga, ram_size, ram_addr_t),
+};
+
+static void dt_vga_config(dt_device *dev, dt_host *host)
+{
+    dt_device_vga *priv = dev->priv;
+    int i;
+
+    dt_memrng_rom(&priv->rng[0], 0xc0000, 0x10000,
+                  bios_dir, VGABIOS_CIRRUS_FILENAME, 0);
+                                /* TODO get name from dev->conf */
+    priv->ram_offs = qemu_ram_alloc(priv->ram_size);
+
+    for (i = 0; dt_driver_vga_table[i].model; i++) {
+        if (!strcmp(dt_driver_vga_table[i].model, priv->model))
+            break;
+    }
+    if (!dt_driver_vga_table[i].model) {
+        fprintf(stderr, "Unknown VGA model %s\n", priv->model);
+        exit(1);
+    }
+    priv->vga_drv = &dt_driver_vga_table[i];
+}
+
+static void dt_vga_init(dt_device *dev)
+{
+    dt_device_vga *priv = dev->priv;
+
+    dt_memrng_init(priv->rng, 1);
+    priv->vga_drv->init(dt_get_pcibus(dev),
+                        phys_ram_base + priv->ram_offs,
+                        priv->ram_offs, priv->ram_size);
+}
+
+
+/* NIC Driver */
+
+typedef struct dt_device_nic {
+    NICInfo nd;
+} dt_device_nic;
+
+static const dt_prop_spec dt_nic_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_nic, nd.model, string),
+    DT_PROP_SPEC_INIT("mac", dt_device_nic, nd.macaddr, macaddr),
+    DT_PROP_SPEC_INIT("name", dt_device_nic, nd.name, string),
+};
+
+static void dt_nic_config(dt_device *dev, dt_host *host)
+{
+    dt_device_nic *priv = dev->priv;
+
+    priv->nd.vlan = dt_find_vlan(dev->conf, host);
+}
+
+static void dt_nic_init(dt_device *dev)
+{
+    dt_device_nic *priv = dev->priv;
+
+    pci_nic_init(dt_get_pcibus(dev), &priv->nd, -1, NULL);
+}
+
+
+/* SCSI Driver */
+
+typedef struct dt_device_scsi {
+    void *opaque;
+    BlockDriverState *bds[LSI_MAX_DEVS];
+} dt_device_scsi;
+
+static void dt_scsi_config(dt_device *dev, dt_host *host)
+{
+    dt_device_scsi *priv = dev->priv;
+
+    priv->opaque = NULL;
+    dt_find_drives(dev->conf, host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_scsi_init(dt_device *dev)
+{
+    dt_device_scsi *priv = dev->priv;
+    int i;
+
+    priv->opaque = lsi_scsi_init(dt_get_pcibus(dev), -1);
+
+    for (i = 0; i < ARRAY_SIZE(priv->bds); i++) {
+        if (priv->bds[i])
+            lsi_scsi_attach(priv->opaque, priv->bds[i], i);
+    }
+}
+
+
+/* Virtio Block Driver */
+
+typedef struct dt_device_virtio_block {
+    BlockDriverState *bds[1];
+} dt_device_virtio_block;
+
+static void dt_virtio_block_config(dt_device *dev, dt_host *host)
+{
+    dt_device_virtio_block *priv = dev->priv;
+
+    dt_find_drives(dev->conf, host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_virtio_block_init(dt_device *dev)
+{
+    dt_device_virtio_block *priv = dev->priv;
+
+    virtio_blk_init(dt_get_pcibus(dev), priv->bds[0]);
+}
+
+
+/* Virtio Balloon Driver */
+
+static void dt_virtio_balloon_init(dt_device *dev)
+{
+    virtio_balloon_init(dt_get_pcibus(dev));
+}
+
+
+/* Virtio Console Driver */
+
+typedef struct dt_device_virtio_console {
+    int index;
+    CharDriverState *hds;
+} dt_device_virtio_console;
+
+static const dt_prop_spec dt_virtio_console_props[] = {
+    DT_PROP_SPEC_INIT("index", dt_device_virtio_console, index, int),
+};
+
+static void dt_virtio_console_config(dt_device *dev, dt_host *host)
+{
+    dt_device_virtio_console *priv = dev->priv;
+
+    priv->hds = virtcon_hds[priv->index];
+}
+
+static void dt_virtio_console_init(dt_device *dev)
+{
+    dt_device_virtio_console *priv = dev->priv;
+
+    virtio_console_init(dt_get_pcibus(dev), priv->hds);
+}
+
+
+/* Drive Driver */
+
+typedef struct dt_device_drive {
+    int unit;
+} dt_device_drive;
+
+static const dt_prop_spec dt_drive_props[] = {
+    DT_PROP_SPEC_INIT("unit", dt_device_drive, unit, int),
+};
+
+static int dt_drive_get_unit(dt_device *dev)
+{
+    return ((dt_device_drive *)dev->priv)->unit;
+}
+
+
+/* Machine Driver */
+
+static const dt_driver dt_driver_table[] = {
+    { "", 0, NULL, DT_BUS_ROOT, DT_BUS_NONE,
+      NULL, NULL, NULL,
+      NULL, NULL, NULL },
+    { "cpus", sizeof(dt_device_cpus), dt_cpus_props,
+      DT_BUS_NONE, DT_BUS_ROOT,
+      NULL, dt_cpus_init, NULL,
+      NULL, NULL, NULL },
+    { "memory", sizeof(dt_device_memory), dt_memory_props,
+      DT_BUS_NONE, DT_BUS_ROOT,
+      dt_memory_config, dt_memory_init, NULL,
+      NULL, NULL, NULL },
+    { "pc-misc", sizeof(dt_device_pc_misc), dt_pc_misc_props,
+      DT_BUS_FLOPPY, DT_BUS_ROOT,
+      dt_pc_misc_config, dt_pc_misc_init, dt_pc_misc_start,
+      NULL, NULL, dt_pc_misc_get_int },
+    { "pci", sizeof(dt_device_pci), NULL,
+      DT_BUS_PCI, DT_BUS_ROOT,
+      dt_pci_config, dt_pci_init, dt_pci_start,
+      dt_pci_get_pcibus, NULL, NULL },
+    { "piix3", sizeof(dt_device_piix3), NULL,
+      DT_BUS_IDE, DT_BUS_PCI,
+      dt_piix3_config, dt_piix3_init, NULL,
+      NULL, NULL, NULL },
+    { "vga", sizeof(dt_device_vga), dt_vga_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_vga_config, dt_vga_init, NULL,
+      NULL, NULL, NULL },
+    { "nic", sizeof(dt_device_nic), dt_nic_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_nic_config, dt_nic_init, NULL,
+      NULL, NULL, NULL },
+    { "scsi", sizeof(dt_device_scsi), NULL,
+      DT_BUS_SCSI, DT_BUS_PCI,
+      dt_scsi_config, dt_scsi_init, NULL,
+      NULL, NULL, NULL },
+    { "virtio-block", sizeof(dt_device_virtio_block), NULL,
+      DT_BUS_VIRTIO, DT_BUS_PCI,
+      dt_virtio_block_config, dt_virtio_block_init, NULL,
+      NULL, NULL, NULL },
+    { "virtio-balloon", 0, NULL,
+      DT_BUS_NONE, DT_BUS_PCI,
+      NULL, dt_virtio_balloon_init, NULL,
+      NULL, NULL, NULL },
+    { "virtio-console", sizeof(dt_device_virtio_console), dt_virtio_console_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_virtio_console_config, dt_virtio_console_init, NULL,
+      NULL, NULL, NULL },
+    { "ide-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_IDE,
+      NULL, NULL, NULL,
+      NULL, dt_drive_get_unit, NULL },
+    { "scsi-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_SCSI,
+      NULL, NULL, NULL,
+      NULL, dt_drive_get_unit, NULL },
+    { "floppy-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_FLOPPY,
+      NULL, NULL, NULL,
+      NULL, dt_drive_get_unit, NULL },
+    { "virtio-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_VIRTIO,
+      NULL, NULL, NULL,
+      NULL, dt_drive_get_unit, NULL },
+    { NULL, 0, NULL, DT_BUS_NONE, DT_BUS_NONE,
+      NULL, NULL, NULL,
+      NULL, NULL, NULL }
+};
+
+QEMUMachine pcdt_machine = {
+    .name = "pcdt",
+    .desc = "Standard PC (device tree)",
+    .drvtab = dt_driver_table,
+    .ram_require = VGA_RAM_SIZE + PC_MAX_BIOS_SIZE,
+    .max_cpus = 255,
+};
diff --git a/hw/pci.h b/hw/pci.h
index 831f1b1..a73c9b6 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -220,7 +220,7 @@ void *lsi_scsi_init(PCIBus *bus, int devfn);
 
 /* vmware_vga.c */
 void pci_vmsvga_init(PCIBus *bus, uint8_t *vga_ram_base,
-                     unsigned long vga_ram_offset, int vga_ram_size);
+                     ram_addr_t vga_ram_offset, int vga_ram_size);
 
 /* usb-uhci.c */
 void usb_uhci_piix3_init(PCIBus *bus, int devfn);
diff --git a/hw/pcint.h b/hw/pcint.h
new file mode 100644
index 0000000..f18da67
--- /dev/null
+++ b/hw/pcint.h
@@ -0,0 +1,46 @@
+/*
+ * Stuff shared by pc.c and dt.c
+ *
+ * See dt.c for why this should go away eventually.
+ */
+
+#ifndef HW_PC_INT_H
+#define HW_PC_INT_H
+
+#define BIOS_FILENAME "bios.bin"
+#define VGABIOS_FILENAME "vgabios.bin"
+#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
+
+#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
+
+#define MAX_IDE_BUS 2
+
+/* TODO move to ferr stuff in cpu.h? */
+extern qemu_irq ferr_irq;
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data);
+
+/* TODO eliminate */
+extern RTCState *rtc_state;
+extern PCIDevice *i440fx_state;
+extern int serial_io[MAX_SERIAL_PORTS];
+extern int serial_irq[MAX_SERIAL_PORTS];
+extern int parallel_io[MAX_PARALLEL_PORTS];
+extern int parallel_irq[MAX_PARALLEL_PORTS];
+extern fdctrl_t *floppy_controller;
+
+/* TODO move to pic stuff in pc.h? */
+void pic_irq_request(void *opaque, int irq, int level);
+
+/* TODO move to a20 stuff in pc.h? */
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val);
+uint32_t ioport92_read(void *opaque, uint32_t addr);
+
+void bochs_bios_init(void);
+void main_cpu_reset(void *opaque);
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data);
+int pc_boot_set(void *opaque, const char *boot_device);
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd);
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table);
+
+#endif
diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index 5c271e6..45fdbc8 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -1122,7 +1122,7 @@ static int vmsvga_load(struct vmsvga_state_s *s, QEMUFile *f)
 }
 
 static void vmsvga_init(struct vmsvga_state_s *s,
-                uint8_t *vga_ram_base, unsigned long vga_ram_offset,
+                uint8_t *vga_ram_base, ram_addr_t vga_ram_offset,
                 int vga_ram_size)
 {
     s->vram = vga_ram_base;
@@ -1216,7 +1216,7 @@ static void pci_vmsvga_map_mem(PCIDevice *pci_dev, int region_num,
 #define PCI_CLASS_HEADERTYPE_00h	0x00
 
 void pci_vmsvga_init(PCIBus *bus, uint8_t *vga_ram_base,
-                     unsigned long vga_ram_offset, int vga_ram_size)
+                     ram_addr_t vga_ram_offset, int vga_ram_size)
 {
     struct pci_vmsvga_state_s *s;
 
diff --git a/net.c b/net.c
index 395ee4f..fc16cbc 100644
--- a/net.c
+++ b/net.c
@@ -157,7 +157,7 @@ static void hex_dump(FILE *f, const uint8_t *buf, int size)
 }
 #endif
 
-static int parse_macaddr(uint8_t *macaddr, const char *p)
+int parse_macaddr(uint8_t *macaddr, const char *p)
 {
     int i;
     char *last_char;
diff --git a/net.h b/net.h
index 1a51be7..54bdf80 100644
--- a/net.h
+++ b/net.h
@@ -47,6 +47,7 @@ int qemu_can_send_packet(VLANClientState *vc);
 ssize_t qemu_sendv_packet(VLANClientState *vc, const struct iovec *iov,
                           int iovcnt);
 void qemu_send_packet(VLANClientState *vc, const uint8_t *buf, int size);
+int parse_macaddr(uint8_t *macaddr, const char *p);
 void qemu_format_nic_info_str(VLANClientState *vc, uint8_t macaddr[6]);
 void qemu_check_nic_model(NICInfo *nd, const char *model);
 void qemu_check_nic_model_list(NICInfo *nd, const char * const *models,
diff --git a/qemu-common.h b/qemu-common.h
index c10043d..811adac 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -178,6 +178,7 @@ typedef struct PCIDevice PCIDevice;
 typedef struct SerialState SerialState;
 typedef struct IRQState *qemu_irq;
 struct pcmcia_card_s;
+typedef struct dt_driver dt_driver;
 
 /* CPU save/load.  */
 void cpu_save(QEMUFile *f, void *opaque);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 1cf49d5..34a7b4d 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -7,6 +7,7 @@
 
 void register_machines(void)
 {
+    qemu_register_machine(&pcdt_machine);
     qemu_register_machine(&pc_machine);
     qemu_register_machine(&isapc_machine);
 }
diff --git a/tree.c b/tree.c
new file mode 100644
index 0000000..da07b76
--- /dev/null
+++ b/tree.c
@@ -0,0 +1,285 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/* Ye Olde Decorated Tree */
+
+#include <assert.h>
+#include "tree.h"
+#include "qemu-common.h"
+#include "sys-queue.h"
+
+struct tree {
+    const char *name;
+    LIST_HEAD(, tree_prop) props;
+    tree *parent;
+    TAILQ_HEAD(, tree) children;
+    TAILQ_ENTRY(tree) siblings;
+    void *user;
+};
+
+struct tree_prop {
+    const char *name;
+    const void *val;
+    int sz;
+    tree *owner;
+    LIST_ENTRY(tree_prop) link;
+};
+
+tree *tree_new_child(tree *parent, const char *name, void *user)
+{
+    tree *child = qemu_malloc(sizeof(*child));
+
+    child->name = name;
+    LIST_INIT(&child->props);
+    child->parent = NULL;
+    TAILQ_INIT(&child->children);
+    child->user = user;
+    if (parent)
+        tree_insert(parent, child);
+
+    return child;
+}
+
+void tree_insert(tree *parent, tree *child)
+{
+    assert(!child->parent);
+    child->parent = parent;
+    TAILQ_INSERT_TAIL(&parent->children, child, siblings);
+}
+
+const char *tree_node_name(const tree *node)
+{
+    return node->name;
+}
+
+static tree *tree_child_by_name(const tree *parent, const char *name)
+{
+    const char *slash = strchr(name, '/');
+    size_t len = slash ? slash - name : strlen(name);
+    tree *child;
+
+    TAILQ_FOREACH(child, &parent->children, siblings) {
+        if (!strncmp(child->name, name, len) && child->name[len] == 0)
+            return child;
+    }
+    return NULL;
+}
+
+tree *tree_node_by_name(const tree *node, const char *name)
+{
+    tree *child;
+    size_t len;
+
+    if (name[0] == '/') {
+        for (; node->parent; node = node->parent) ;
+        while (*name == '/') name++;
+    }
+
+    if (name[0] == 0)
+        return (tree *)node;
+
+    child = tree_child_by_name(node, name);
+    if (!child)
+        return NULL;
+
+    len = strlen(child->name);
+    if (name[len] == 0)
+        return child;
+    assert (name[len] == '/');
+
+    while (name[len] == '/') len++;
+    return tree_node_by_name(child, name + len);
+}
+
+tree_prop *tree_first_prop(const tree *node)
+{
+    return LIST_FIRST(&node->props);
+}
+
+tree_prop *tree_next_prop(const tree_prop *prop)
+{
+    return LIST_NEXT(prop, link);
+}
+
+tree_prop *tree_get_prop(const tree *node, const char *name)
+{
+    tree_prop *prop;
+
+    LIST_FOREACH(prop, &node->props, link) {
+        if (!strcmp(prop->name, name))
+            return prop;
+    }
+    return NULL;
+}
+
+const char *tree_get_prop_s(const tree *node, const char *name)
+{
+    tree_prop *prop = tree_get_prop(node, name);
+    if (!prop
+        || memchr(prop->val, 0, prop->sz) != prop->val + prop->sz - 1) {
+        errno = EINVAL;
+        return NULL;
+    }
+    return prop->val;
+}
+
+const char *tree_prop_name(const tree_prop *prop)
+{
+    return prop->name;
+}
+
+const void *tree_prop_value(const tree_prop *prop, size_t *size)
+{
+    if (size)
+        *size = prop->sz;
+    return prop->val;
+}
+
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz)
+{
+    tree_prop *prop;
+
+    prop = tree_get_prop(node, name);
+    if (!prop) {
+        prop = qemu_malloc(sizeof(*prop));
+        prop->name = name;
+        prop->owner = node;
+        LIST_INSERT_HEAD(&node->props, prop, link);
+    }
+    /* FIXME need a destructor for val */
+    prop->val = val;
+    prop->sz = sz;
+}
+
+void tree_put_propf(tree *node, const char *name, const char *fmt, ...)
+{
+    va_list ap;
+    size_t len;
+    char *buf;
+
+    va_start(ap, fmt);
+    len = vsnprintf(NULL, 0, fmt, ap);
+    va_end(ap);
+
+    buf = qemu_malloc(len + 1);
+    va_start(ap, fmt);
+    vsnprintf(buf, len + 1, fmt, ap);
+    va_end(ap);
+
+    tree_put_prop(node, name, buf, len + 1);
+}
+
+void tree_put_user(tree *node, void *user)
+{
+    node->user = user;
+}
+
+void *tree_get_user(const tree *node)
+{
+    return node->user;
+}
+
+tree *tree_parent(const tree *node)
+{
+    return node->parent;
+}
+
+tree *tree_first_child(const tree *node)
+{
+    return TAILQ_FIRST(&node->children);
+}
+
+tree *tree_sibling(const tree *node)
+{
+    return TAILQ_NEXT(node, siblings);
+}
+
+int tree_path(const tree *node, char *buf, size_t bufsz)
+{
+    char *p;
+    const tree *np;
+    size_t len, res;
+
+    p = buf + bufsz;
+    res = 0;
+    for (np = node; np->parent; np = np->parent) {
+        len = 1 + strlen(np->name);
+        res += len;
+        if (res >= bufsz)
+            continue;
+        p -= len;
+        memcpy(p + 1, np->name, len - 1);
+        p[0] = '/';
+    }
+
+    if (res == 0) {
+        if (++res < bufsz)
+            *--p = '/';
+    }
+
+    if (res < bufsz) {
+        memcpy(buf, p, res);
+        buf[res] = 0;
+    }
+
+    return res;
+}
+
+static void tree_print_sub(const tree *node, int indent)
+{
+    int i, use_str, sep;
+    const unsigned char *pv;
+    tree_prop *prop;
+    tree *child;
+
+    printf("%*s%s {\n", indent, "", node->parent ? node->name : "/");
+    LIST_FOREACH(prop, &node->props, link) {
+        printf("%*s%s", indent + 4, "", prop->name);
+        pv = prop->val;
+        if (pv) {
+            printf(" = ");
+            use_str = pv[prop->sz - 1] == 0;
+            for (i = 0; i < prop->sz - 1; i++) {
+                if (!isprint(pv[i]))
+                    use_str = 0;
+            }
+            if (use_str)
+                printf("\"%s\"", (const char *)prop->val);
+            else {
+                sep = '[';
+                for (i = 0; i < prop->sz; i++) {
+                    printf("%c%02x", sep, pv[i]);
+                    sep = ' ';
+                }
+                printf("]");
+            }
+        }
+        printf(";\n");
+    }
+    TAILQ_FOREACH(child, &node->children, siblings)
+        tree_print_sub(child, indent + 4);
+    printf("%*s};\n", indent, "");
+}
+
+void tree_print(const tree *node)
+{
+    tree_print_sub(node, 0);
+}
diff --git a/tree.h b/tree.h
new file mode 100644
index 0000000..3f3b367
--- /dev/null
+++ b/tree.h
@@ -0,0 +1,41 @@
+#ifndef TREE_H
+#define TREE_H
+
+#include <stddef.h>
+
+typedef struct tree tree;
+typedef struct tree_prop tree_prop;
+
+tree *tree_new_child(tree *parent, const char *name, void *user);
+void tree_insert(tree *parent, tree *child);
+const char *tree_node_name(const tree *node);
+tree *tree_node_by_name(const tree *node,
+                        const char *name);
+
+tree_prop *tree_first_prop(const tree *node);
+tree_prop *tree_next_prop(const tree_prop *prop);
+#define TREE_FOREACH_PROP(var, node) \
+    for (var = tree_first_prop(node); var; var = tree_next_prop(var))
+tree_prop *tree_get_prop(const tree *node, const char *name);
+const char *tree_get_prop_s(const tree *node, const char *name);
+const char *tree_prop_name(const tree_prop *prop);
+const void *tree_prop_value(const tree_prop *prop, size_t *size);
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz);
+void tree_put_propf(tree *node, const char *name,
+                    const char *fmt, ...)
+    __attribute__((format(printf,3,4)));
+
+void tree_put_user(tree *node, void *user);
+void *tree_get_user(const tree *node);
+
+tree *tree_parent(const tree *node);
+tree *tree_first_child(const tree *node);
+tree *tree_sibling(const tree *node);
+#define TREE_FOREACH_CHILD(var, node) \
+    for (var = tree_first_child(node); var; var = tree_sibling(var))
+
+int tree_path(const tree *node, char *buf, size_t bufsz);
+void tree_print(const tree *node);
+
+#endif
diff --git a/vl.c b/vl.c
index 5e6c621..37a742c 100644
--- a/vl.c
+++ b/vl.c
@@ -153,6 +153,7 @@ int main(int argc, char **argv)
 #include "migration.h"
 #include "kvm.h"
 #include "balloon.h"
+#include "dt.h"
 
 #include "disas.h"
 
@@ -5252,8 +5253,18 @@ int main(int argc, char **argv, char **envp)
         }
     }
 
-    machine->init(ram_size, vga_ram_size, boot_devices,
-                  kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
+    if (machine->init) {
+        machine->init(ram_size, vga_ram_size, boot_devices,
+                      kernel_filename, kernel_cmdline, initrd_filename,
+                      cpu_model);
+    } else {
+        tree *conf = dt_read_config(machine->name);
+        dt_modify_config(conf, machine->drvtab,
+                         ram_size, vga_ram_size, boot_devices,
+                         kernel_filename, kernel_cmdline, initrd_filename,
+                         cpu_model);
+        dt_create_machine(conf);
+    }
 
     current_machine = machine;
 

* [Qemu-devel] Re: [RFC] Machine description as data
  2009-02-11 15:40 [Qemu-devel] [RFC] Machine description as data Markus Armbruster
                   ` (9 preceding siblings ...)
  2009-03-31  9:16 ` Markus Armbruster
@ 2009-04-17 16:04 ` Markus Armbruster
  10 siblings, 0 replies; 146+ messages in thread
From: Markus Armbruster @ 2009-04-17 16:04 UTC (permalink / raw)
  To: qemu-devel

Ninth iteration of the prototype.  Work in progress, not quite ready for
merging.

New:

* Support arbitrary interrupt trees.

* PIC and PIT have been split off pc-misc (Marcelo Tosatti)

* PIIX3 split into its functions: ISA-bridge, IDE, USB, ACPI.  ISA
  devices (other than pc-misc) are now on the ISA-bridge (Marcelo
  Tosatti)

* RTC has been split off pc-misc.

* The memory driver is no longer PC-specific.  Instead, PC-specific code
  adds appropriate memory nodes to the tree.

* Option ROMs work, including -kernel.

* Command line options -usb and -no-acpi now work.

* Backed out the new machine creation interface for now.  We don't need
  it to get machine-specific and generic parts separated cleanly.  In
  fact, it got in the way, because it wasn't quite right.

* Rebased to git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7134 c046a42c-6fe2-441c-8c8c-71466251a162

Not in, but not forgotten either:

* A few more renames suggested by reviewers.

* Reduce unnecessary differences to IEEE 1275 trees.

Shortcuts:

* No support for systems without PCI bus.

* I didn't implement all the devices of the "pc" original.  Missing:
  - Audio
  - RAM above 4g

* The configuration tree is simplistic.  I expect it to evolve, and I
  wouldn't exclude the possibility of wholesale replacement.

* The initial configuration tree is hardcoded in dt_hardcoded_config().
  It should be read from a configuration file.

* The new, unified device API is implemented as wrappers around the old
  APIs.  It is done that way because we want to explore the new API with
  minimal impact on the rest of the code.  Once we're satisfied with the
  new API, the old APIs should be replaced.

* A bus is identified by its kind and number.  The bus number depends on
  its position in the tree.  A means of position-independent addressing
  would be nice.

* The interface to the shared code in hw/pc.c (hw/pcint.h) is rather
  crude.

* The pc-misc driver should be split up completely.

* The drivers for the four PIIX3 functions share their common PCI device
  number via a global variable.

* BIOS larger than 128KiB isn't implemented.  pc.c loads the BIOS once
  and maps it in two locations.  pcdt.c loads and maps it twice.
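
To illustrate the hardcoded configuration tree and path-based node lookup
mentioned in the list above, here is a minimal, self-contained sketch.  It
is not the code from the patch: the type `node` and the functions
node_new(), node_by_path() and example_config() are hypothetical stand-ins
that only mimic the shape of the tree API (a named node with a parent and
ordered children, looked up by a slash-separated path).  The node names are
made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a configuration tree node. */
typedef struct node {
    const char *name;
    struct node *parent;
    struct node *first_child;
    struct node *next_sibling;
} node;

/* Create a node and append it to parent's child list (parent may be NULL). */
static node *node_new(node *parent, const char *name)
{
    node *n = calloc(1, sizeof(*n));
    node **link;

    if (!n)
        abort();
    n->name = name;
    n->parent = parent;
    if (parent) {
        /* Walk to the end of the sibling list to keep insertion order. */
        for (link = &parent->first_child; *link; link = &(*link)->next_sibling)
            ;
        *link = n;
    }
    return n;
}

/*
 * Resolve a path such as "/pci/piix3".  A leading '/' restarts the
 * search at the root; otherwise the path is relative to start.
 * Returns NULL if any component does not exist.
 */
static node *node_by_path(node *start, const char *path)
{
    node *n = start, *child;
    size_t len;

    assert(start && path);
    if (*path == '/') {
        while (n->parent)
            n = n->parent;
    }
    for (;;) {
        while (*path == '/')
            path++;
        if (!*path)
            return n;
        len = strcspn(path, "/");   /* length of the next component */
        for (child = n->first_child; child; child = child->next_sibling) {
            if (!strncmp(child->name, path, len) && child->name[len] == '\0')
                break;
        }
        if (!child)
            return NULL;
        n = child;
        path += len;
    }
}

/* A made-up hardcoded configuration, loosely modelled on a PC. */
static node *example_config(void)
{
    node *root = node_new(NULL, "");
    node *pci = node_new(root, "pci");

    node_new(root, "memory");
    node_new(pci, "piix3");
    node_new(pci, "vga");
    return root;
}
```

With this sketch, node_by_path(example_config(), "/pci/piix3") yields the
"piix3" node, and a lookup of a missing component such as "/pci/nic" yields
NULL.  Reading the same structure from a configuration file would replace
only example_config(); the lookup code would stay as it is.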

Bugs (last checked in 7th iteration):

* hw/ppce500_mpc8544ds.c doesn't compile when I configure with fdt
  support.

* If I configure both a virtio block device and a virtio console, the
  Linux guest kernel hangs.  The same happens when I move virtio code in
  pc.c in an otherwise unmodified QEMU so that balloon and console are
  initialized earlier.


 Makefile              |    1 
 Makefile.target       |    7 
 dt-fdt.c              |  123 ++++++
 dt-host.c             |  160 ++++++++
 dt.c                  |  619 +++++++++++++++++++++++++++++++++
 dt.h                  |  141 +++++++
 hw/boards.h           |    3 
 hw/pc.c               |   55 +--
 hw/pcdt.c             |  915 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pcint.h            |   50 ++
 net.c                 |    2 
 net.h                 |    1 
 target-i386/machine.c |    1 
 tree.c                |  285 +++++++++++++++
 tree.h                |   41 ++
 15 files changed, 2371 insertions(+), 33 deletions(-)


diff --git a/Makefile b/Makefile
index 76e83ba..6b6b090 100644
--- a/Makefile
+++ b/Makefile
@@ -99,6 +99,7 @@ OBJS+=bt-hci-csr.o
 OBJS+=buffered_file.o migration.o migration-tcp.o net.o qemu-sockets.o
 OBJS+=qemu-char.o aio.o net-checksum.o savevm.o cache-utils.o
 OBJS+=msmouse.o ps2.o
+OBJS+=tree.o
 
 ifdef CONFIG_BRLAPI
 OBJS+= baum.o
diff --git a/Makefile.target b/Makefile.target
index e9b039d..0d3f94d 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -499,6 +499,10 @@ OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o dma-helpers.o
 # need to fix this properly
 OBJS+=virtio.o virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o
 OBJS+=fw_cfg.o
+OBJS+=dt.o dt-host.o
+ifdef FDT_LIBS
+OBJS+= dt-fdt.o
+endif
 ifdef CONFIG_KVM
 OBJS+=kvm.o kvm-all.o
 endif
@@ -530,6 +534,7 @@ endif
 ifdef CONFIG_OSS
 LIBS += $(CONFIG_OSS_LIB)
 endif
+LIBS+= $(FDT_LIBS)
 
 SOUND_HW = sb16.o es1370.o ac97.o
 ifdef CONFIG_ADLIB
@@ -576,6 +581,7 @@ OBJS+= ide.o pckbd.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o ioapic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
+OBJS+= pcdt.o
 OBJS += device-hotplug.o pci-hotplug.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
@@ -599,7 +605,6 @@ OBJS+= ppc440.o ppc440_bamboo.o
 OBJS+= ppce500_pci.o ppce500_mpc8544ds.o
 ifdef FDT_LIBS
 OBJS+= device_tree.o
-LIBS+= $(FDT_LIBS)
 endif
 ifdef CONFIG_KVM
 OBJS+= kvm_ppc.o
diff --git a/dt-fdt.c b/dt-fdt.c
new file mode 100644
index 0000000..375e50a
--- /dev/null
+++ b/dt-fdt.c
@@ -0,0 +1,123 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/* Interfacing with FDT */
+
+/*
+ * Note: translation to FDT loses the association between
+ * configuration tree nodes and devices.
+ */
+
+#include <libfdt.h>
+#include "dt.h"
+
+static int dt_fdt_chk(int res);
+static void dt_subtree_to_fdt(const tree *conf, void *fdt);
+
+static void *dt_tree_to_fdt(const tree *conf)
+{
+    int sz = 1024 * 1024;       /* FIXME arbitrary limit */
+    void *fdt = qemu_malloc(sz);
+
+    dt_fdt_chk(fdt_create(fdt, sz));
+    dt_subtree_to_fdt(conf, fdt);
+    dt_fdt_chk(fdt_finish(fdt));
+    return fdt;
+}
+
+static void dt_subtree_to_fdt(const tree *conf, void *fdt)
+{
+    tree_prop *prop;
+    tree *child;
+    const void *pv;
+    size_t sz;
+
+    dt_fdt_chk(fdt_begin_node(fdt, tree_node_name(conf)));
+    TREE_FOREACH_PROP(prop, conf) {
+        pv = tree_prop_value(prop, &sz);
+        dt_fdt_chk(fdt_property(fdt, tree_prop_name(prop), pv, sz));
+    }
+    TREE_FOREACH_CHILD(child, conf)
+        dt_subtree_to_fdt(child, fdt);
+    dt_fdt_chk(fdt_end_node(fdt));
+}
+
+static tree *dt_fdt_to_tree(const void *fdt)
+{
+    int offs, next, depth;
+    uint32_t tag;
+    struct fdt_property *prop;
+    tree *stack[32];            /* FIXME arbitrary limit */
+
+    stack[0] = NULL;            /* "parent" of root */
+    next = depth = 0;
+
+    for (;;) {
+        offs = next;
+        tag = fdt_next_tag(fdt, offs, &next);
+        switch (tag) {
+        case FDT_PROP:
+            /*
+             * libfdt apparently doesn't provide a way to get property
+             * by offset, do it by hand
+             */
+            assert(0 < depth && depth < ARRAY_SIZE(stack));
+            prop = (void *)((const char *)fdt + fdt_off_dt_struct(fdt) + offs);
+            tree_put_prop(stack[depth],
+                          fdt_string(fdt, fdt32_to_cpu(prop->nameoff)),
+                          prop->data,
+                          fdt32_to_cpu(prop->len));
+        case FDT_NOP:
+            break;
+        case FDT_BEGIN_NODE:
+            depth++;
+            assert(0 < depth && depth < ARRAY_SIZE(stack));
+            stack[depth] = tree_new_child(stack[depth-1],
+                                          fdt_get_name(fdt, offs, NULL),
+                                          NULL);
+            break;
+        case FDT_END_NODE:
+            depth--;
+            break;
+        case FDT_END:
+            dt_fdt_chk(next);
+            return stack[1];
+        }
+    }
+}
+
+static int dt_fdt_chk(int res)
+{
+    if (res < 0) {
+        fprintf(stderr, "%s\n", fdt_strerror(res)); /* FIXME cryptic */
+        exit(1);
+    }
+    return res;
+}
+
+void dt_fdt_test(tree *conf)
+{
+    void *fdt;
+
+    fdt = dt_tree_to_fdt(conf);
+    conf = dt_fdt_to_tree(fdt);
+    tree_print(conf);
+    qemu_free(fdt);
+}
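For readers without libfdt at hand, the depth-indexed stack used by
dt_fdt_to_tree() above can be sketched on its own.  This is only an
illustration under assumptions: the token type, the node type and
build_tree() are hypothetical stand-ins, not libfdt or tree.h API, and
nodes come from a caller-supplied pool so the sketch needs no cleanup
path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat token stream, standing in for FDT structure tags. */
enum tok_tag { TOK_BEGIN_NODE, TOK_END_NODE, TOK_END };

typedef struct token {
    enum tok_tag tag;
    const char *name;           /* node name, TOK_BEGIN_NODE only */
} token;

/* Hypothetical tree node, standing in for the patch's tree type. */
typedef struct node {
    const char *name;
    struct node *parent;
    int nchildren;
} node;

#define MAX_DEPTH 32            /* arbitrary limit, as in the patch */

/*
 * Rebuild a tree from a flat token stream.  stack[depth] is the node
 * currently open at that depth; stack[0] is the NULL "parent" of the
 * root.  Returns the root, or NULL on malformed input or exhaustion.
 */
static node *build_tree(const token *tok, node *pool, int npool)
{
    node *stack[MAX_DEPTH];
    int depth = 0, used = 0;
    node *n;

    stack[0] = NULL;            /* "parent" of root */
    stack[1] = NULL;            /* result if the stream has no nodes */

    for (;; tok++) {
        switch (tok->tag) {
        case TOK_BEGIN_NODE:
            if (depth + 1 >= MAX_DEPTH || used >= npool)
                return NULL;
            if (depth == 0 && used > 0)
                return NULL;    /* only a single root is accepted */
            n = &pool[used++];
            n->name = tok->name;
            n->parent = stack[depth];
            n->nchildren = 0;
            if (n->parent)
                n->parent->nchildren++;
            stack[++depth] = n;
            break;
        case TOK_END_NODE:
            if (depth == 0)
                return NULL;    /* unbalanced end tag */
            depth--;
            break;
        case TOK_END:
            return depth == 0 ? stack[1] : NULL;
        default:
            return NULL;        /* unknown tag */
        }
    }
}
```

Unlike the mock-up, the sketch reports depth overflow and unbalanced
tags to the caller instead of asserting.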
diff --git a/dt-host.c b/dt-host.c
new file mode 100644
index 0000000..0de08a3
--- /dev/null
+++ b/dt-host.c
@@ -0,0 +1,160 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/* Host Configuration */
+
+#include <assert.h>
+#include "block.h"
+#include "dt.h"
+
+/* TODO stupid arbitrary limit */
+#define MAX_MEM (MAX_OPTION_ROMS + 4)
+
+typedef struct dt_host_memory {
+    dt_mem_loader loader;
+    target_phys_addr_t addr;
+    ram_addr_t size;
+    const void *arg;
+} dt_host_memory;
+
+struct dt_host {
+    /* connection NIC <-> VLAN */
+    int nics;
+    tree *nic[MAX_NICS];
+    VLANState *nic_vlan[MAX_NICS];
+    /* connection drive <-> block driver state */
+    int drives;
+    tree *drive[MAX_DRIVES];
+    BlockDriverState *drive_state[MAX_DRIVES];
+    /* initial memory contents */
+    int mems;
+    dt_host_memory mem[MAX_MEM];
+    /* the rest isn't configuration, it's derived from other stuff */
+    int virtio_buses;
+};
+
+dt_host *dt_create_host(void)
+{
+    return qemu_mallocz(sizeof(dt_host));
+}
+
+void dt_attach_nic(dt_host *host, tree *nic, VLANState *vlan)
+{
+    assert(host->nics < MAX_NICS);
+    host->nic[host->nics] = nic;
+    host->nic_vlan[host->nics] = vlan;
+    host->nics++;
+}
+
+VLANState *dt_find_vlan(tree *conf, dt_host *host)
+{
+    int i;
+
+    for (i = 0; i < host->nics; i++) {
+        if (host->nic[i] == conf)
+            return host->nic_vlan[i];
+    }
+    return NULL;
+}
+
+void dt_attach_drive(dt_host *host, tree *node, BlockDriverState *state)
+{
+    assert(host->drives < MAX_DRIVES);
+    host->drive[host->drives] = node;
+    host->drive_state[host->drives] = state;
+    host->drives++;
+}
+
+void dt_find_drives(tree *conf, dt_host *host,
+                    BlockDriverState *drive[], int n)
+{
+    int i, unit;
+
+    memset(drive, 0, n * sizeof(drive[0]));
+
+    for (i = 0; i < host->drives; i++) {
+        if (tree_parent(host->drive[i]) != conf)
+            continue;
+        unit = dt_get_unit(dt_device_of(host->drive[i]));
+        assert(unit < n && !drive[unit]);
+        drive[unit] = host->drive_state[i];
+    }
+}
+
+void dt_config_mem(dt_host *host, dt_mem_loader loader,
+                   target_phys_addr_t addr, ram_addr_t size, const void *arg)
+{
+    assert(host->mems < MAX_MEM);
+    host->mem[host->mems].loader = loader;
+    host->mem[host->mems].addr = addr;
+    host->mem[host->mems].size = size;
+    host->mem[host->mems].arg = arg;
+    host->mems++;
+}
+
+static void dt_init_mem(dt_host *host)
+{
+    dt_host_memory *p;
+
+    for (p = host->mem; p < host->mem + host->mems; p++)
+        p->loader(p->addr, p->size, p->arg);
+}
+
+void dt_image_loader(target_phys_addr_t addr, ram_addr_t size, const void *arg)
+{
+    if (load_image_targphys(arg, addr, size) < 0) {
+        fprintf(stderr, "qemu: could not load image '%s'\n", (char *)arg);
+        exit(1);
+    }
+}
+
+int dt_alloc_virtio_bus(dt_host *host)
+{
+    return host->virtio_buses++;
+}
+
+void dt_host_init(dt_host *host)
+{
+    dt_init_mem(host);
+}
+
+void dt_print_host_config(dt_host *host)
+{
+    char buf[1024];
+    int i;
+
+    for (i = 0; i < host->nics; i++) {
+        if (!host->nic[i])
+            continue;
+        tree_path(host->nic[i], buf, sizeof(buf));
+        printf("nic#%d\tvlan %-4d\t%s\n",
+               i, host->nic_vlan[i]->id, buf);
+    }
+
+    for (i = 0; i < host->drives; i++) {
+        tree_path(host->drive[i], buf, sizeof(buf));
+        printf("drive#%d\t%-15s %s\n",
+               i, bdrv_get_device_name(host->drive_state[i]), buf);
+    }
+
+    for (i = 0; i < host->mems; i++)
+        printf("image\t" TARGET_FMT_plx " %08lx\n",
+               host->mem[i].addr, (unsigned long)host->mem[i].size);
+}
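The deferred memory loading in dt-host.c above (dt_config_mem() records
requests, dt_init_mem() replays them later) can be tried in isolation
with something like the following.  It is a sketch under assumptions:
host_cfg, host_config_mem(), host_init_mem() and count_bytes() are
hypothetical names, and plain unsigned long stands in for
target_phys_addr_t and ram_addr_t.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 4           /* hypothetical limit, like MAX_MEM */

typedef void (*mem_loader)(unsigned long addr, unsigned long size,
                           const void *arg);

typedef struct pending_mem {
    mem_loader loader;
    unsigned long addr, size;
    const void *arg;
} pending_mem;

typedef struct host_cfg {
    int mems;
    pending_mem mem[MAX_PENDING];
} host_cfg;

/* Record a load request; returns -1 instead of asserting when full. */
static int host_config_mem(host_cfg *host, mem_loader loader,
                           unsigned long addr, unsigned long size,
                           const void *arg)
{
    pending_mem *p;

    if (host->mems >= MAX_PENDING)
        return -1;
    p = &host->mem[host->mems++];
    p->loader = loader;
    p->addr = addr;
    p->size = size;
    p->arg = arg;
    return 0;
}

/* Replay the recorded requests in the order they were queued. */
static void host_init_mem(const host_cfg *host)
{
    const pending_mem *p;

    for (p = host->mem; p < host->mem + host->mems; p++)
        p->loader(p->addr, p->size, p->arg);
}

/* Demo loader: only counts the bytes it was asked to load. */
static unsigned long demo_loaded_bytes;

static void count_bytes(unsigned long addr, unsigned long size,
                        const void *arg)
{
    (void)addr;
    (void)arg;
    demo_loaded_bytes += size;
}
```

Nothing is loaded at configuration time; the loaders run only when
host_init_mem() is called, which mirrors the point of splitting
dt_config_mem() from dt_host_init().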
diff --git a/dt.c b/dt.c
new file mode 100644
index 0000000..20d53ac
--- /dev/null
+++ b/dt.c
@@ -0,0 +1,619 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * Configure and build a machine from configuration data
+ *
+ * This is generic, device-independent code driven by device-dependent
+ * configuration data, talking to devices through an abstract device
+ * interface.
+ */
+
+#include <assert.h>
+#include "block.h"
+#include "cpu.h"
+#include "dt.h"
+#include "net.h"
+#include "tree.h"
+#include "sysemu.h"
+
+/* Forward declarations */
+static void dt_parse_prop(dt_device *dev, tree_prop *prop);
+
+
+/* Device Interface */
+
+static const dt_driver *dt_driver_by_name(const char *name,
+                                          const dt_driver drvtab[])
+{
+    int i;
+
+    for (i = 0; drvtab[i].name; i++) {
+        if (!strcmp(name, drvtab[i].name))
+            return &drvtab[i];
+    }
+    return NULL;
+}
+
+dt_device *dt_device_of(tree *conf)
+{
+    return tree_get_user(conf);
+}
+
+dt_device *dt_parent_device(dt_device *dev)
+{
+    tree *p = tree_parent(dev->conf);
+
+    return p ? dt_device_of(p) : NULL;
+}
+
+dt_device *dt_root(dt_device *dev)
+{
+    tree *p, *r;
+
+    p = dev->conf;
+    do r = p; while ((p = tree_parent(p)));
+
+    return dt_device_of(r);
+}
+
+static dt_device *dt_do_find_bus(tree *conf, dt_bus_type bus_type, int *skip)
+{
+    dt_device *dev;
+    tree *child;
+
+    dev = dt_device_of(conf);
+    if (dev->drv->bus_type == bus_type && (*skip)-- == 0)
+        return dev;
+
+    TREE_FOREACH_CHILD(child, conf) {
+        dev = dt_do_find_bus(child, bus_type, skip);
+        if (dev)
+            return dev;
+    }
+
+    return NULL;
+}
+
+dt_device *dt_find_bus(tree *conf, dt_bus_type bus_type, int busno)
+{
+    return dt_do_find_bus(conf, bus_type, &busno);
+}
+
+PCIBus *dt_get_pcibus(dt_device *dev)
+{
+    dt_device *bus = dt_parent_device(dev);
+
+    return bus->drv->get_pcibus(bus);
+}
+
+int dt_get_unit(dt_device *dev)
+{
+    return dev->drv->get_unit(dev);
+}
+
+static dt_device *dt_int_parent(dt_device *dev)
+{
+    dt_device *res;
+
+    if (dev->int_parent)
+        return dt_device_of(dev->int_parent);
+
+    res = dt_parent_device(dev);
+    return res && res->drv->get_int ? res : NULL;
+}
+
+qemu_irq *dt_get_int(dt_device *dev, int n)
+{
+    dt_device *int_parent = dt_int_parent(dev);
+    int nirq;
+    qemu_irq *res;
+
+    if (!int_parent)
+        return NULL;
+
+    res = int_parent->drv->get_int(int_parent, &nirq);
+    assert(n <= nirq);
+    return res;
+}
+
+static dt_device *dt_new_device(tree *conf, const dt_driver *drv)
+{
+    dt_device *dev;
+    tree_prop *prop;
+
+    dev = qemu_malloc(sizeof(*dev));
+    dev->conf = conf;
+    dev->drv = drv;
+    dev->int_parent = NULL;
+    TAILQ_INIT(&dev->int_children);
+    dev->visit = 0;
+    dev->priv = qemu_mallocz(drv->privsz);
+    tree_put_user(conf, dev);
+
+    TREE_FOREACH_PROP(prop, conf)
+        dt_parse_prop(dev, prop);
+
+    return dev;
+}
+
+static dt_device *dt_create(tree *conf, const dt_driver drvtab[])
+{
+    const dt_driver *drv;
+    dt_device *dev;
+    tree *child;
+
+    drv = dt_driver_by_name(tree_node_name(conf), drvtab);
+    if (!drv) {
+        fprintf(stderr, "No driver for device %s\n",
+                tree_node_name(conf));
+        exit(1);
+    }
+
+    assert((drv->bus_type == DT_BUS_PCI) == (drv->get_pcibus != NULL));
+
+    dev = dt_new_device(conf, drv);
+
+    TREE_FOREACH_CHILD(child, conf)
+        dt_create(child, drvtab);
+
+    return dev;
+}
+
+static void dt_config(dt_device *dev, dt_host *host)
+{
+    dt_device *bus = dt_parent_device(dev);
+    dt_device *int_parent = dt_int_parent(dev);
+    dt_device *up;
+    tree *child;
+
+    if (dev->drv->parent_bus_type == DT_BUS_NONE
+        ? bus != NULL
+        : bus == NULL || bus->drv->bus_type != dev->drv->parent_bus_type) {
+        fprintf(stderr, "Device %s is not on a suitable bus\n",
+                dev->drv->name);
+        exit(1);
+    }
+
+    if (dev->int_parent) {
+        if (!int_parent->drv->get_int) {
+            fprintf(stderr, "Device %s has an invalid interrupt-parent\n",
+                    dev->drv->name);
+            exit(1);
+        }
+        for (up = int_parent; up; up = dt_int_parent(up)) {
+            if (up == dev) {
+                fprintf(stderr, "Device %s is its own interrupt ancestor\n",
+                        dev->drv->name);
+                exit(1);
+            }
+        }
+    }
+
+    if (int_parent)
+        TAILQ_INSERT_TAIL(&int_parent->int_children, dev, int_siblings);
+    else if (dev->drv->get_int)
+        TAILQ_INSERT_TAIL(&dt_root(dev)->int_children, dev, int_siblings);
+
+    if (dev->drv->config)
+        dev->drv->config(dev, host);
+
+    TREE_FOREACH_CHILD(child, dev->conf)
+        dt_config(dt_device_of(child), host);
+}
+
+static void dt_int_init(dt_device *dev)
+{
+    dt_device *child;
+
+    if (dev->drv->int_init)
+        dev->drv->int_init(dev);
+
+    TAILQ_FOREACH(child, &dev->int_children, int_siblings)
+        dt_int_init(child);
+}
+
+static void dt_init(dt_device *dev)
+{
+    tree *child;
+
+    if (dev->drv->init)
+        dev->drv->init(dev);
+
+    TREE_FOREACH_CHILD(child, dev->conf)
+        dt_init(dt_device_of(child));
+}
+
+static void dt_start(dt_device *dev)
+{
+    tree *child;
+
+    if (dev->drv->start)
+        dev->drv->start(dev);
+
+    TREE_FOREACH_CHILD(child, dev->conf)
+        dt_start(dt_device_of(child));
+}
+
+void dt_create_machine(dt_device *root, dt_host *host)
+{
+    dt_int_init(root);
+    dt_init(root);
+    dt_host_init(host);
+    dt_start(root);
+}
+
+
+/* Device properties */
+
+static const dt_prop_spec *dt_prop_spec_by_name(const dt_driver *drv,
+                                                const char *name)
+{
+    const dt_prop_spec *spec;
+
+    for (spec = drv->prop_spec; spec && spec->name; spec++) {
+        if (!strcmp(spec->name, name))
+            return spec;
+    }
+    return NULL;
+}
+
+static void dt_parse_prop(dt_device *dev, tree_prop *prop)
+{
+    const char *name = tree_prop_name(prop);
+    size_t size;
+    const char *val = tree_prop_value(prop, &size);
+    const dt_prop_spec *spec = dt_prop_spec_by_name(dev->drv, name);
+
+    if (!spec && !strcmp(name, "interrupt-parent")) {
+        dev->int_parent = tree_node_by_name(dev->conf, val);
+        return;
+    }
+
+    if (!spec) {
+        fprintf(stderr, "A %s device has no property %s\n",
+                dev->drv->name, name);
+        exit(1);
+    }
+
+    if (spec->parse) {
+        if (!val) {
+            fprintf(stderr, "Property %s of device %s needs a value\n",
+                    name, dev->drv->name);
+            exit(1);
+        }
+        if (memchr(val, 0, size) != val + size - 1
+            || spec->parse((char *)dev->priv + spec->offs, val, spec) < 0) {
+            fprintf(stderr, "Bad value %.*s for property %s of device %s\n",
+                    (int)size, val, name, dev->drv->name);
+            exit(1);
+        }
+    } else {
+        if (val) {
+            fprintf(stderr, "Property %s of device %s doesn't take a value\n",
+                    name, dev->drv->name);
+            exit(1);
+        }
+        assert(spec->size == sizeof(int));
+        *(int *)((char *)dev->priv + spec->offs) = 1;
+    }
+}
+
+int dt_parse_string(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    assert(spec->size == sizeof(char *));
+    *(const char **)dst = src;
+    return 0;
+}
+
+int dt_parse_int(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    char *ep;
+    long val;
+
+    assert(spec->size == sizeof(int));
+    errno = 0;
+    val = strtol(src, &ep, 0);
+    if (*ep || ep == src || errno || (int)val != val)
+        return -1;
+    *(int *)dst = val;
+    return 0;
+}
+
+int dt_parse_ram_addr_t(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    char *ep;
+    unsigned long val;
+
+    assert(spec->size == sizeof(ram_addr_t));
+    errno = 0;
+    val = strtoul(src, &ep, 0);
+    if (*ep || ep == src || errno || (ram_addr_t)val != val)
+        return -1;
+    *(ram_addr_t *)dst = val;
+    return 0;
+}
+
+int dt_parse_target_phys_addr_t(void *dst, const char *src,
+                                const dt_prop_spec *spec)
+{
+    char *ep;
+    unsigned long long val;
+
+    assert(spec->size == sizeof(target_phys_addr_t));
+    errno = 0;
+    val = strtoull(src, &ep, 0);
+    if (*ep || ep == src || errno || (target_phys_addr_t)val != val)
+        return -1;
+    *(target_phys_addr_t *)dst = val;
+    return 0;
+}
+
+int dt_parse_macaddr(void *dst, const char *src, const dt_prop_spec *spec)
+{
+    assert(spec->size == 6);
+    if (parse_macaddr(dst, src) < 0)
+        return -1;
+    return 0;
+}
+
+
+/* Dynamic Devices */
+
+static void dt_add_dyn_dev(tree *conf, tree *node, const dt_driver drvtab[],
+                           int busno)
+{
+    dt_device *dev = dt_create(node, drvtab);
+    dt_device *bus = dt_find_bus(conf, dev->drv->parent_bus_type, busno);
+
+    if (!bus) {
+        fprintf(stderr, "No suitable bus for device %s\n", dev->drv->name);
+        exit(1);
+    }
+
+    tree_insert(bus->conf, node);
+}
+
+static void dt_add_vga(tree *conf, const dt_driver drvtab[],
+                       const char *model, int vga_ram_size)
+{
+    tree *node = tree_new_child(NULL, "vga", NULL);
+
+    tree_put_propf(node, "model", "%s", model);
+    tree_put_propf(node, "ram", "%#x", vga_ram_size);
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+}
+
+static void dt_add_virtio_console(tree *conf, const dt_driver drvtab[],
+                                  int index)
+{
+    tree *node = tree_new_child(NULL, "virtio-console", NULL);
+
+    tree_put_propf(node, "index", "%d", index);
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+}
+
+static void dt_add_nic(tree *conf, dt_host *host, const dt_driver drvtab[],
+                       NICInfo *n)
+{
+    tree *node = tree_new_child(NULL, "nic", NULL);
+
+    tree_put_propf(node, "mac", "%02x:%02x:%02x:%02x:%02x:%02x",
+                   n->macaddr[0], n->macaddr[1], n->macaddr[2],
+                   n->macaddr[3], n->macaddr[4], n->macaddr[5]);
+    tree_put_propf(node, "model", "%s",
+                   n->model ? n->model : "ne2k_pci");
+    if (n->name)
+        tree_put_propf(node, "name", "%s", n->name);
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+    dt_attach_nic(host, node, n->vlan);
+}
+
+static void dt_add_scsi(tree *conf, const dt_driver drvtab[], int busno)
+{
+    tree *node = tree_new_child(NULL, "scsi", NULL);
+
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+    assert(dt_find_bus(conf, DT_BUS_SCSI, busno)->conf == node);
+}
+
+static void dt_add_virtio_block(tree *conf, const dt_driver drvtab[],
+                                int busno)
+{
+    tree *node = tree_new_child(NULL, "virtio-block", NULL);
+
+    dt_add_dyn_dev(conf, node, drvtab, 0);
+    assert(dt_find_bus(conf, DT_BUS_VIRTIO, busno)->conf == node);
+}
+
+static const char *block_if_name[] = {
+    [IF_IDE] = "ide",
+    [IF_SCSI] = "scsi",
+    [IF_FLOPPY] = "floppy",
+    [IF_PFLASH] = "pflash",
+    [IF_MTD] = "mtd",
+    [IF_SD] = "sd",
+    [IF_VIRTIO] = "virtio",
+};
+
+static void dt_do_add_drive(tree *conf, dt_host *host,
+                            const dt_driver drvtab[],
+                            int bus_type, int busno, int unit,
+                            BlockDriverState *bdrv)
+{
+    char buf[32];
+    tree *node;
+
+    snprintf(buf, sizeof(buf), "%s-drive", block_if_name[bus_type]);
+    node = tree_new_child(NULL, strdup(buf), NULL);
+    tree_put_propf(node, "unit", "%d", unit);
+    dt_add_dyn_dev(conf, node, drvtab, busno);
+    dt_attach_drive(host, node, bdrv);
+}
+
+static void dt_add_drive(tree *conf, dt_host *host, const dt_driver drvtab[],
+                         DriveInfo *d)
+{
+    switch (d->type) {
+    case IF_IDE:
+        /* hack to hang all IDE drives off the same node for now */
+        dt_do_add_drive(conf, host, drvtab,
+                        d->type, 0, d->bus * MAX_IDE_DEVS + d->unit, d->bdrv);
+        break;
+    case IF_SCSI:
+    case IF_FLOPPY:
+        dt_do_add_drive(conf, host, drvtab,
+                        d->type, d->bus, d->unit, d->bdrv);
+        break;
+    case IF_VIRTIO:
+        /* See comment on virtio block in dt_add_dyn_devs() */
+        dt_do_add_drive(conf, host, drvtab,
+                        d->type, dt_alloc_virtio_bus(host), 0, d->bdrv);
+        break;
+    case IF_PFLASH:
+    case IF_MTD:
+    case IF_SD:
+        /* TODO implement */
+        fprintf(stderr, "Ignoring unimplemented drive %s\n",
+                drives_opt[d->drive_opt_idx].opt);
+        break;
+    }
+}
+
+static void dt_add_dyn_devs(tree *conf, dt_host *host,
+                            const dt_driver drvtab[], int vga_ram_size)
+{
+    int i, max_bus, busno;
+
+    /* VGA */
+    if (cirrus_vga_enabled || vmsvga_enabled || std_vga_enabled) {
+        dt_add_vga(conf, drvtab,
+                   cirrus_vga_enabled ? "cirrus"
+                   : vmsvga_enabled ? "vms" : "std",
+                   vga_ram_size);
+    }
+
+    /* Virtio consoles */
+    for (i = 0; i < MAX_VIRTIO_CONSOLES; i++) {
+        if (virtcon_hds[i])
+            dt_add_virtio_console(conf, drvtab, i);
+    }
+
+    /* NICs */
+    for (i = 0; i < nb_nics; i++)
+        dt_add_nic(conf, host, drvtab, &nd_table[i]);
+
+    /*
+     * SCSI controllers
+     *
+     * This creates all controllers 0..max_bus, whether they have
+     * drives or not.  Matches pc.c behavior.
+     */
+    max_bus = drive_get_max_bus(IF_SCSI);
+    for (i = 0; i <= max_bus; i++)
+        dt_add_scsi(conf, drvtab, i);
+
+    /*
+     * Virtio block controllers
+     *
+     * Each virtio drive is its own PCI device.  Since the device tree
+     * should reflect that, we give each device its own virtio
+     * block controller node.
+     *
+     * DriveInfo's bus and unit are a mess.  The user can specify any
+     * bus or unit number.  An unspecified bus number defaults to
+     * zero, and an unspecified unit number defaults to the first
+     * unused one (see drive_init()).  pc.c silently ignores all
+     * virtio drives with non-zero bus number, and all drives on bus
+     * zero after the first unused unit number.  Instead of
+     * replicating that questionable behavior, simply ignore bus and
+     * unit for these drives.
+     */
+    busno = 0;
+    for (i = 0; i < nb_drives; i++) {
+        if (drives_table[i].type == IF_VIRTIO)
+            dt_add_virtio_block(conf, drvtab, busno++);
+    }
+
+    /* Drives */
+    for (i = 0; i < nb_drives; i++)
+        dt_add_drive(conf, host, drvtab, &drives_table[i]);
+}
+
+tree *dt_add_memory(tree *conf, target_phys_addr_t addr, ram_addr_t sz, int ro)
+{
+    tree *node;
+
+    node = tree_new_child(conf, "memory", NULL);
+    tree_put_propf(node, "address", "0x" TARGET_FMT_plx, addr);
+    tree_put_propf(node, "size", "%#lx", (unsigned long)sz);
+    if (ro)
+        tree_put_prop(node, "read-only", NULL, 0);
+    return node;
+}
+
+
+/* Create a configuration */
+
+#if 0 /* TODO implement */
+tree *dt_read_config(const char *name)
+{
+}
+#endif
+
+static void
+dt_int_tree_print(dt_device *dev, int indent)
+{
+    const char *name = tree_node_name(dev->conf);
+    dt_device *child;
+
+    printf("%*s%s {\n", indent, "", *name ? name : "/");
+    TAILQ_FOREACH(child, &dev->int_children, int_siblings)
+        dt_int_tree_print(child, indent + 4);
+    printf("%*s};\n", indent, "");
+}
+
+dt_device *dt_config_machine(tree *conf, dt_host *host,
+                             const dt_driver drvtab[],
+                             ram_addr_t ram_size, int vga_ram_size,
+                             const char *cpu_model)
+{
+    tree *node;
+    dt_device *root;
+
+    tree_print(conf);
+
+    node = tree_node_by_name(conf, "/cpus");
+    tree_put_propf(node, "num", "%d", smp_cpus);
+    if (cpu_model)
+        tree_put_propf(node, "model", "%s", cpu_model);
+
+    root = dt_create(conf, drvtab);
+    dt_add_dyn_devs(conf, host, drvtab, vga_ram_size);
+    dt_config(root, host);
+
+    dt_print_host_config(host);
+    tree_print(conf);
+    dt_int_tree_print(root, 0);
+
+    dt_fdt_test(conf);
+    return root;
+}
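The interrupt-parent rules that dt_int_parent() and dt_config() in dt.c
above implement can be sketched stand-alone as follows.  The dev type
and both functions are hypothetical, and is_int_ctrl stands in for
"drv->get_int != NULL".  One deliberate difference, labeled as an
assumption: the walk is bounded by max_steps so that a cycle among
other devices cannot loop forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced device record. */
typedef struct dev {
    const char *name;
    struct dev *bus_parent;     /* parent in the device tree */
    struct dev *int_parent;     /* explicit interrupt-parent, or NULL */
    int is_int_ctrl;            /* device is an interrupt controller */
} dev;

/*
 * Effective interrupt parent: the explicit one if set, else the bus
 * parent provided it is an interrupt controller, else none.
 */
static dev *int_parent_of(dev *d)
{
    if (d->int_parent)
        return d->int_parent;
    return d->bus_parent && d->bus_parent->is_int_ctrl
        ? d->bus_parent : NULL;
}

/*
 * Validate an explicit interrupt-parent.
 * Returns 0 if fine, -1 if the parent is not an interrupt controller,
 * -2 if D is its own interrupt ancestor or the walk exceeds max_steps.
 */
static int check_int_parent(dev *d, int max_steps)
{
    dev *up;
    int steps = 0;

    if (!d->int_parent)
        return 0;
    if (!d->int_parent->is_int_ctrl)
        return -1;
    for (up = d->int_parent; up; up = int_parent_of(up)) {
        if (up == d || steps++ >= max_steps)
            return -2;
    }
    return 0;
}
```

A reasonable max_steps is the number of devices in the tree, since a
loop-free chain of interrupt parents cannot be longer than that.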
diff --git a/dt.h b/dt.h
new file mode 100644
index 0000000..85cc6a8
--- /dev/null
+++ b/dt.h
@@ -0,0 +1,141 @@
+#ifndef DT_H
+#define DT_H
+
+#include "sysemu.h"
+#include "net.h"
+#include "tree.h"
+
+typedef struct dt_host dt_host;
+typedef struct dt_device dt_device;
+typedef struct dt_driver dt_driver;
+typedef struct dt_prop_spec dt_prop_spec;
+
+
+/* Host Configuration */
+
+typedef void (*dt_mem_loader)(target_phys_addr_t, ram_addr_t, const void *);
+
+dt_host *dt_create_host(void);
+void dt_attach_nic(dt_host *host, tree *nic, VLANState *vlan);
+VLANState *dt_find_vlan(tree *conf, dt_host *host);
+void dt_attach_drive(dt_host *host, tree *node, BlockDriverState *state);
+void dt_find_drives(tree *conf, dt_host *host,
+                    BlockDriverState *drive[], int n);
+void dt_config_mem(dt_host *host, dt_mem_loader loader,
+                   target_phys_addr_t addr, ram_addr_t size, const void *);
+void dt_image_loader(target_phys_addr_t addr, ram_addr_t size, const void *);
+int dt_alloc_virtio_bus(dt_host *host);
+void dt_host_init(dt_host *host);
+void dt_print_host_config(dt_host *host);
+
+
+/* Device Interface */
+
+/*
+ * Device life cycle:
+ *
+ * 1. Configuration: config() method runs after parent's.  It should
+ * initialize the device's private data from its configuration
+ * sub-tree.  It may edit the configuration sub-tree.  Private data is
+ * zeroed before config() runs.
+ *
+ * 2. Initialization: int_init() method runs after interrupt parent's.
+ * Then init() method runs after parent's.  Neither should touch the
+ * configuration tree.
+ *
+ * 3. Start: start() method runs, order is unspecified.
+ *
+ * Error handling in these driver methods: print to stderr and exit
+ * the program unsuccessfully.
+ *
+ * There is no device shutdown protocol yet.
+ */
+
+struct dt_device {
+    tree *conf;                 /* configuration sub-tree */
+    const dt_driver *drv;       /* device driver */
+    tree *int_parent;           /* interrupt parent if != tree_parent(conf) */
+    TAILQ_HEAD(, dt_device) int_children;
+    TAILQ_ENTRY(dt_device) int_siblings;
+    int visit;                  /* for dt_walk_int_tree() */
+    void *priv;                 /* device private data */
+};
+
+typedef enum dt_bus_type {
+    DT_BUS_NONE, DT_BUS_ROOT, DT_BUS_PCI, DT_BUS_ISA, DT_BUS_IDE,
+    DT_BUS_SCSI, DT_BUS_FLOPPY, DT_BUS_VIRTIO
+} dt_bus_type;
+
+struct dt_driver {
+    const char *name;
+    size_t privsz;              /* size of device private data */
+    const dt_prop_spec *prop_spec; /* recognized conf node properties */
+    dt_bus_type bus_type, parent_bus_type;
+    /* life cycle methods */
+    void (*config)(dt_device *, dt_host *);
+    void (*int_init)(dt_device *);
+    void (*init)(dt_device *);
+    void (*start)(dt_device *);
+    /* def'd iff device is a PCI bus, may return NULL until after init() */
+    PCIBus *(*get_pcibus)(dt_device *);
+    /* optional, always available */
+    int (*get_unit)(dt_device *);
+    /* def'd iff is an int. ctrlr., may return NULL until after int_init() */
+    qemu_irq *(*get_int)(dt_device *, int *);
+};
+
+dt_device *dt_device_of(tree *conf);
+dt_device *dt_parent_device(dt_device *dev);
+dt_device *dt_root(dt_device *dev);
+dt_device *dt_find_bus(tree *conf, dt_bus_type bus_type, int busno);
+PCIBus *dt_get_pcibus(dt_device *dev);
+int dt_get_unit(dt_device *dev);
+qemu_irq *dt_get_int(dt_device *dev, int n);
+
+tree *dt_add_memory(tree *conf, target_phys_addr_t addr, ram_addr_t sz, int ro);
+tree *dt_read_config(const char *name);
+dt_device *dt_config_machine(tree *conf, dt_host *host,
+                             const dt_driver drvtab[],
+                             ram_addr_t ram_size, int vga_ram_size,
+                             const char *cpu_model);
+void dt_create_machine(dt_device *root, dt_host *host);
+
+
+/* Device properties */
+
+/*
+ * This is for parsing configuration tree node properties into device
+ * private data.
+ */
+
+struct dt_prop_spec {
+    const char *name;
+    ptrdiff_t offs;             /* offset in device private data */
+    size_t size;                /* size there, for sanity checking */
+    int (*parse)(void *, const char *, const dt_prop_spec *);
+};
+
+#define DT_PROP_SPEC_INIT(name, strty, member, fmt)                     \
+    { name, offsetof(strty, member), sizeof(((strty *)0)->member),      \
+      dt_parse_##fmt }
+#define DT_PROP_SPEC_SENTINEL() { NULL, 0, 0, NULL }
+
+/* Canned property parse methods */
+#define dt_parse_void NULL
+int dt_parse_string(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_int(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_ram_addr_t(void *dst, const char *src, const dt_prop_spec *spec);
+int dt_parse_target_phys_addr_t(void *dst, const char *src,
+                                const dt_prop_spec *spec);
+int dt_parse_macaddr(void *dst, const char *src, const dt_prop_spec *spec);
+
+
+/* Interfacing with FDT */
+
+#ifdef HAVE_FDT
+void dt_fdt_test(tree *conf);
+#else
+static inline void dt_fdt_test(tree *conf) { }
+#endif
+
+#endif
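To see how a dt_prop_spec table from dt.h above turns property strings
into fields of a driver's private data, here is a minimal stand-alone
sketch.  It borrows the offsetof()/sizeof() idea of DT_PROP_SPEC_INIT
but is not the patch's code: prop_spec, PROP_SPEC_INIT, demo_priv and
demo_set_prop() are hypothetical names, and errors are returned rather
than reported with exit(1).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for dt_prop_spec. */
typedef struct prop_spec {
    const char *name;
    ptrdiff_t offs;             /* offset in private data */
    size_t size;                /* size there, for sanity checking */
    int (*parse)(void *dst, const char *src,
                 const struct prop_spec *spec);
} prop_spec;

#define PROP_SPEC_INIT(name, strty, member, fn) \
    { name, offsetof(strty, member), sizeof(((strty *)0)->member), fn }
#define PROP_SPEC_SENTINEL() { NULL, 0, 0, NULL }

/* Hypothetical private data of some driver. */
typedef struct demo_priv {
    int unit;
    const char *model;
} demo_priv;

static int parse_int(void *dst, const char *src, const prop_spec *spec)
{
    char *ep;
    long val;

    assert(spec->size == sizeof(int));
    errno = 0;
    val = strtol(src, &ep, 0);
    if (*ep || ep == src || errno || (int)val != val)
        return -1;              /* trailing junk, empty, or overflow */
    *(int *)dst = (int)val;
    return 0;
}

static int parse_string(void *dst, const char *src, const prop_spec *spec)
{
    assert(spec->size == sizeof(char *));
    *(const char **)dst = src;  /* borrows src, does not copy it */
    return 0;
}

static const prop_spec demo_props[] = {
    PROP_SPEC_INIT("unit", demo_priv, unit, parse_int),
    PROP_SPEC_INIT("model", demo_priv, model, parse_string),
    PROP_SPEC_SENTINEL()
};

/* Look up NAME in the table and parse VAL into PRIV; -1 on any error. */
static int demo_set_prop(demo_priv *priv, const char *name,
                         const char *val)
{
    const prop_spec *spec;

    for (spec = demo_props; spec->name; spec++) {
        if (!strcmp(spec->name, name))
            return spec->parse((char *)priv + spec->offs, val, spec);
    }
    return -1;                  /* unknown property */
}
```

The size recorded in each table entry lets a parse function assert
that it is writing into a member of the type it expects, which catches
a table entry that pairs a member with the wrong parser.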
diff --git a/hw/boards.h b/hw/boards.h
index 7fada94..08c4591 100644
--- a/hw/boards.h
+++ b/hw/boards.h
@@ -32,6 +32,9 @@ extern QEMUMachine axisdev88_machine;
 extern QEMUMachine pc_machine;
 extern QEMUMachine isapc_machine;
 
+/* pcdt.c */
+extern QEMUMachine pcdt_machine;
+
 /* ppc.c */
 extern QEMUMachine prep_machine;
 extern QEMUMachine core99_machine;
diff --git a/hw/pc.c b/hw/pc.c
index 6a1750e..fe3085c 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -37,42 +37,35 @@
 #include "virtio-balloon.h"
 #include "virtio-console.h"
 #include "hpet_emul.h"
+#include "pcint.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
 
-#define BIOS_FILENAME "bios.bin"
-#define VGABIOS_FILENAME "vgabios.bin"
-#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
-
-#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
-
 /* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables.  */
 #define ACPI_DATA_SIZE       0x10000
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 
-#define MAX_IDE_BUS 2
-
-static fdctrl_t *floppy_controller;
-static RTCState *rtc_state;
+fdctrl_t *floppy_controller;
+RTCState *rtc_state;
 static PITState *pit;
 static IOAPICState *ioapic;
-static PCIDevice *i440fx_state;
+PCIDevice *i440fx_state;
 
-static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
 {
 }
 
 /* MSDOS compatibility mode FPU exception support */
-static qemu_irq ferr_irq;
+qemu_irq ferr_irq;
 /* XXX: add IGNNE support */
 void cpu_set_ferr(CPUX86State *s)
 {
     qemu_irq_raise(ferr_irq);
 }
 
-static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
 {
     qemu_irq_lower(ferr_irq);
 }
@@ -121,7 +114,7 @@ int cpu_get_pic_interrupt(CPUState *env)
     return intno;
 }
 
-static void pic_irq_request(void *opaque, int irq, int level)
+void pic_irq_request(void *opaque, int irq, int level)
 {
     CPUState *env = first_cpu;
 
@@ -167,7 +160,7 @@ static int cmos_get_fd_drive_type(int fd0)
     return val;
 }
 
-static void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd)
 {
     RTCState *s = rtc_state;
     int cylinders, heads, sectors;
@@ -203,7 +196,7 @@ static int boot_device2nibble(char boot_device)
 
 /* copy/pasted from cmos_init, should be made a general function
  and used there as well */
-static int pc_boot_set(void *opaque, const char *boot_device)
+int pc_boot_set(void *opaque, const char *boot_device)
 {
     Monitor *mon = cur_mon;
 #define PC_MAX_BOOT_DEVICES 3
@@ -230,8 +223,8 @@ static int pc_boot_set(void *opaque, const char *boot_device)
 }
 
 /* hd_table must contain 4 block drivers */
-static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
-                      const char *boot_device, BlockDriverState **hd_table)
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table)
 {
     RTCState *s = rtc_state;
     int nbds, bds[3] = { 0, };
@@ -364,13 +357,13 @@ int ioport_get_a20(void)
     return ((first_cpu->a20_mask >> 20) & 1);
 }
 
-static void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val)
 {
     ioport_set_a20((val >> 1) & 1);
     /* XXX: bit 0 is fast reset */
 }
 
-static uint32_t ioport92_read(void *opaque, uint32_t addr)
+uint32_t ioport92_read(void *opaque, uint32_t addr)
 {
     return ioport_get_a20() << 1;
 }
@@ -422,7 +415,7 @@ static void bochs_bios_write(void *opaque, uint32_t addr, uint32_t val)
     }
 }
 
-static void bochs_bios_init(void)
+void bochs_bios_init(void)
 {
     void *fw_cfg;
 
@@ -537,10 +530,10 @@ static long get_file_size(FILE *f)
     return size;
 }
 
-static void load_linux(target_phys_addr_t option_rom,
-                       const char *kernel_filename,
-		       const char *initrd_filename,
-		       const char *kernel_cmdline)
+void load_linux(target_phys_addr_t option_rom,
+                const char *kernel_filename,
+                const char *initrd_filename,
+                const char *kernel_cmdline)
 {
     uint16_t protocol;
     uint32_t gpr[8];
@@ -691,7 +684,7 @@ static void load_linux(target_phys_addr_t option_rom,
     generate_bootsect(option_rom, gpr, seg, 0);
 }
 
-static void main_cpu_reset(void *opaque)
+void main_cpu_reset(void *opaque)
 {
     CPUState *env = opaque;
     cpu_reset(env);
@@ -706,11 +699,11 @@ static const int ide_irq[2] = { 14, 15 };
 static int ne2000_io[NE2000_NB_MAX] = { 0x300, 0x320, 0x340, 0x360, 0x280, 0x380 };
 static int ne2000_irq[NE2000_NB_MAX] = { 9, 10, 11, 3, 4, 5 };
 
-static int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
-static int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
+int serial_io[MAX_SERIAL_PORTS] = { 0x3f8, 0x2f8, 0x3e8, 0x2e8 };
+int serial_irq[MAX_SERIAL_PORTS] = { 4, 3, 4, 3 };
 
-static int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
-static int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
+int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
+int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
 
 #ifdef HAS_AUDIO
 static void audio_init (PCIBus *pci_bus, qemu_irq *pic)
diff --git a/hw/pcdt.c b/hw/pcdt.c
new file mode 100644
index 0000000..fac7f63
--- /dev/null
+++ b/hw/pcdt.c
@@ -0,0 +1,915 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * This is a PC configured and built using the new dt.h interface.
+ * Having two PC machine types makes no sense in the long run, of
+ * course.  We want to replace pc.c eventually, and also convert other
+ * machine types to this interface.
+ *
+ * The configuration data currently is hardwired, and fairly limited.
+ *
+ * The nuts and bolts of PC emulation remain in pc.c for now, and
+ * using the stuff there makes the somewhat clumsy pcint.h necessary.
+ *
+ * The drivers here generally don't do the actual work, they just
+ * provide a common interface to existing device code.  Arguably, they
+ * should be integrated into that device code, with the goal of
+ * eventually replacing the old, ad hoc interfaces.
+ *
+ * Several drivers here are not PC-specific, e.g. drivers for various
+ * PCI devices.
+ */
+
+#include <assert.h>
+#include "hw.h"
+#include "pc.h"
+#include "fdc.h"
+#include "pci.h"
+#include "block.h"
+#include "sysemu.h"
+#include "audio/audio.h"
+#include "net.h"
+#include "smbus.h"
+#include "boards.h"
+#include "console.h"
+#include "fw_cfg.h"
+#include "virtio-blk.h"
+#include "virtio-balloon.h"
+#include "virtio-console.h"
+#include "hpet_emul.h"
+#include "pcint.h"
+#include "dt.h"
+
+
+/* CPUs Driver */
+
+typedef struct dt_device_cpus {
+    const char *model;
+    int num;
+} dt_device_cpus;
+
+static const dt_prop_spec dt_cpus_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_cpus, model, string),
+    DT_PROP_SPEC_INIT("num", dt_device_cpus, num, int),
+    DT_PROP_SPEC_SENTINEL()
+};
+
+static void dt_cpus_init(dt_device *dev)
+{
+    dt_device_cpus *priv = dev->priv;
+    int i;
+    CPUState *env;
+
+    for(i = 0; i < priv->num; i++) {
+        env = cpu_init(priv->model);
+        if (!env) {
+            fprintf(stderr, "Unable to find CPU definition\n");
+            exit(1);
+        }
+        if (i != 0)
+            env->halted = 1;
+        qemu_register_reset(main_cpu_reset, env);
+    }
+}
+
+
+/* Memory Driver */
+
+typedef struct dt_device_memory {
+    target_phys_addr_t addr;
+    ram_addr_t size;
+    int read_only;
+} dt_device_memory;
+
+static const dt_prop_spec dt_memory_props[] = {
+    DT_PROP_SPEC_INIT("address", dt_device_memory, addr, target_phys_addr_t),
+    DT_PROP_SPEC_INIT("size", dt_device_memory, size, ram_addr_t),
+    DT_PROP_SPEC_INIT("read-only", dt_device_memory, read_only, void),
+    DT_PROP_SPEC_SENTINEL()
+};
+
+static void dt_memory_init(dt_device *dev)
+{
+    dt_device_memory *priv = dev->priv;
+    ram_addr_t host_offs;
+
+    host_offs = qemu_ram_alloc(priv->size);
+    cpu_register_physical_memory(priv->addr, priv->size,
+                        host_offs | (priv->read_only ? IO_MEM_ROM : 0));
+
+    printf("memory " TARGET_FMT_plx " %8lx %8lx %s\n",
+           priv->addr, (unsigned long)priv->size, (unsigned long)host_offs,
+           priv->read_only ? "ro" : "rw");
+}
+
+
+/* i8254 PIT Driver */
+
+/*
+ * TODO Factor out the parts that are not PC-specific, and move them
+ * out, since there are other architectures that use it.
+ *
+ * TODO HPET should be separate from PIT.
+ */
+
+typedef struct dt_device_i8254 {
+    int hpet;
+} dt_device_i8254;
+
+static void dt_i8254_config(dt_device *dev, dt_host *host)
+{
+    dt_device_i8254 *priv = dev->priv;
+
+    priv->hpet = 1;
+}
+
+static void dt_i8254_init(dt_device *dev)
+{
+    PITState *pit;
+    qemu_irq *i8259;
+    dt_device_i8254 *priv = dev->priv;
+
+    i8259 = dt_get_int(dev, 16);
+
+    pit = pit_init(0x40, i8259[0]);
+    pcspk_init(pit);
+    if (priv->hpet)
+        hpet_init(i8259);
+}
+
+
+/* i8259 PIC Driver */
+
+/*
+ * TODO Factor out the parts that are not PC-specific, and move them
+ * out, since there are other architectures that use it.
+ */
+
+typedef struct dt_device_i8259 {
+    qemu_irq *i8259;
+} dt_device_i8259;
+
+static void dt_i8259_int_init(dt_device *dev)
+{
+    dt_device_i8259 *priv = dev->priv;
+    qemu_irq *cpu_irq;
+
+    cpu_irq = qemu_allocate_irqs(pic_irq_request, NULL, 1);
+    priv->i8259 = i8259_init(cpu_irq[0]);
+    ferr_irq = priv->i8259[13];
+}
+
+static qemu_irq *dt_i8259_get_int(dt_device *dev, int *nirq)
+{
+    dt_device_i8259 *priv = dev->priv;
+
+    if (nirq)
+        *nirq = 16;
+    return priv->i8259;
+}
+
+
+/* MC146818 RTC Driver */
+
+typedef struct dt_device_MC146818 {
+    const char *boot_device;
+    ram_addr_t ram_below_4g, ram_above_4g;
+    /* TODO store cylinders, heads, sectors, translation instead of bds[] */
+    BlockDriverState *bds[MAX_IDE_BUS * MAX_IDE_DEVS];
+} dt_device_MC146818;
+
+static const dt_prop_spec dt_MC146818_props[] = {
+    DT_PROP_SPEC_INIT("boot-device", dt_device_MC146818, boot_device,
+                      string),
+    DT_PROP_SPEC_INIT("ram-below-4g", dt_device_MC146818, ram_below_4g,
+                      ram_addr_t),
+    DT_PROP_SPEC_INIT("ram-above-4g", dt_device_MC146818, ram_above_4g,
+                      ram_addr_t),
+    DT_PROP_SPEC_SENTINEL()
+};
+
+static void dt_MC146818_config(dt_device *dev, dt_host *host)
+{
+    dt_device_MC146818 *priv = dev->priv;
+
+    dt_find_drives(dt_find_bus(dt_root(dev)->conf, DT_BUS_IDE, 0)->conf,
+                   host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_MC146818_init(dt_device *dev)
+{
+    dt_device_MC146818 *priv = dev->priv;
+    qemu_irq *i8259 = dt_get_int(dev, 16);
+
+    rtc_state = rtc_init(0x70, i8259[8], 2000);
+    qemu_register_boot_set(pc_boot_set, rtc_state);
+    cmos_init(priv->ram_below_4g, priv->ram_above_4g, priv->boot_device,
+              priv->bds);
+}
+
+
+/* PC Miscellaneous Driver */
+
+/*
+ * This is a driver for a whole collection of devices.  Could be
+ * picked apart into separate drivers, I guess.
+ */
+
+typedef struct dt_device_pc_misc {
+    int apic;
+    BlockDriverState *bds[MAX_FD];
+} dt_device_pc_misc;
+
+static void dt_pc_misc_config(dt_device *dev, dt_host *host)
+{
+    dt_device_pc_misc *priv = dev->priv;
+
+    priv->apic = 1;
+    dt_find_drives(dev->conf, host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_pc_misc_init(dt_device *dev)
+{
+    dt_device_pc_misc *priv = dev->priv;
+    CPUState *env;
+    IOAPICState *ioapic;
+    qemu_irq *i8259;
+    int i;
+
+    i8259 = dt_get_int(dev, 16);
+
+    if (priv->apic) {
+        for (env = first_cpu; env; env = env->next_cpu) {
+            env->cpuid_features |= CPUID_APIC;
+            apic_init(env);
+        }
+    }
+
+    vmport_init();
+
+    bochs_bios_init();
+
+    register_ioport_write(0x80, 1, 1, ioport80_write, NULL);
+    register_ioport_write(0xf0, 1, 1, ioportF0_write, NULL);
+
+    register_ioport_read(0x92, 1, 1, ioport92_read, NULL);
+    register_ioport_write(0x92, 1, 1, ioport92_write, NULL);
+
+    if (priv->apic) {
+        ioapic = ioapic_init();
+        pic_set_alt_irq_func(isa_pic, ioapic_set_irq, ioapic);
+    }
+
+    for(i = 0; i < MAX_SERIAL_PORTS; i++) {
+        if (serial_hds[i]) {
+            serial_init(serial_io[i], i8259[serial_irq[i]], 115200,
+                        serial_hds[i]);
+        }
+    }
+
+    for(i = 0; i < MAX_PARALLEL_PORTS; i++) {
+        if (parallel_hds[i]) {
+            parallel_init(parallel_io[i], i8259[parallel_irq[i]],
+                          parallel_hds[i]);
+        }
+    }
+
+    qemu_system_hot_add_init();
+
+    i8042_init(i8259[1], i8259[12], 0x60);
+    DMA_init(0);
+
+    floppy_controller = fdctrl_init(i8259[6], 2, 0, 0x3f0, priv->bds);
+}
+
+
+/* PCI Bus Driver */
+
+typedef struct dt_device_pci {
+    PCIBus *pcibus;
+} dt_device_pci;
+
+static void dt_pci_init(dt_device *dev)
+{
+    dt_device_pci *priv = dev->priv;
+
+    priv->pcibus = i440fx_init(&i440fx_state, dt_get_int(dev, 16));
+}
+
+static void dt_pci_start(dt_device *dev)
+{
+    i440fx_init_memory_mappings(i440fx_state);
+}
+
+static PCIBus *dt_pci_get_pcibus(dt_device *dev)
+{
+    return ((dt_device_pci *)dev->priv)->pcibus;
+}
+
+
+/*
+ * FIXME: this assumes the PIIX3 ISA bridge function is ->config'ed first
+ * followed by IDE, USB and ACPI, in this particular order. It just happens
+ * to work now. Fix is to grab PCI addresses from device tree, which should
+ * happen soon.
+ */
+static int piix3_devfn;
+
+/* PIIX3 IDE Driver */
+
+typedef struct dt_device_piix3_ide {
+    BlockDriverState *bds[MAX_IDE_BUS * MAX_IDE_DEVS];
+} dt_device_piix3_ide;
+
+static void dt_piix3_ide_config(dt_device *dev, dt_host *host)
+{
+    dt_device_piix3_ide *priv = dev->priv;
+
+    dt_find_drives(dev->conf, host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_piix3_ide_init(dt_device *dev)
+{
+    dt_device_piix3_ide *priv = dev->priv;
+    PCIBus *pci_bus = dt_get_pcibus(dev);
+    qemu_irq *i8259 = dt_get_int(dev, 16);
+
+    pci_piix3_ide_init(pci_bus, priv->bds, piix3_devfn + 1, i8259);
+}
+
+
+/* PIIX3 USB Driver */
+
+static void dt_piix3_usb_init(dt_device *dev)
+{
+    PCIBus *pci_bus = dt_get_pcibus(dev);
+
+    usb_uhci_piix3_init(pci_bus, piix3_devfn + 2);
+}
+
+
+/* PIIX3 ACPI Driver */
+
+static void dt_piix3_acpi_init(dt_device *dev)
+{
+    int i;
+    uint8_t *eeprom_buf;
+    i2c_bus *smbus;
+    PCIBus *pci_bus = dt_get_pcibus(dev);
+    qemu_irq *i8259 = dt_get_int(dev, 16);
+
+    eeprom_buf = qemu_mallocz(8 * 256); /* XXX: make this persistent */
+
+    /* TODO: Populate SPD eeprom data.  */
+    smbus = piix4_pm_init(pci_bus, piix3_devfn + 3, 0xb100, i8259[9]);
+    for (i = 0; i < 8; i++)
+        smbus_eeprom_device_init(smbus, 0x50 + i, eeprom_buf + (i * 256));
+}
+
+
+/* PIIX3 Driver */
+
+static void dt_piix3_init(dt_device *dev)
+{
+    PCIBus *pci_bus = dt_get_pcibus(dev);
+
+    piix3_devfn = piix3_init(pci_bus, -1);
+}
+
+
+/* VGA Driver */
+
+typedef struct dt_driver_vga {
+    const char *model;
+    void (*init)(PCIBus *, int);
+} dt_driver_vga;
+
+static void pci_vga_init_(PCIBus *bus, int vga_ram_size)
+{
+    pci_vga_init(bus, vga_ram_size, 0, 0);
+}
+
+static dt_driver_vga dt_driver_vga_table[] = {
+    { "cirrus", pci_cirrus_vga_init },
+    { "vms", pci_vmsvga_init },
+    { "std", pci_vga_init_ },
+    { NULL, NULL }
+};
+
+typedef struct dt_device_vga {
+    const char *model;
+    ram_addr_t ram_size;
+    dt_driver_vga *vga_drv;
+} dt_device_vga;
+
+static const dt_prop_spec dt_vga_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_vga, model, string),
+    DT_PROP_SPEC_INIT("ram", dt_device_vga, ram_size, ram_addr_t),
+    DT_PROP_SPEC_SENTINEL()
+};
+
+static void dt_vga_config(dt_device *dev, dt_host *host)
+{
+    dt_device_vga *priv = dev->priv;
+    int i;
+
+    for (i = 0; dt_driver_vga_table[i].model; i++) {
+        if (!strcmp(dt_driver_vga_table[i].model, priv->model))
+            break;
+    }
+    if (!dt_driver_vga_table[i].model) {
+        fprintf(stderr, "Unknown VGA model %s\n", priv->model);
+        exit(1);
+    }
+    priv->vga_drv = &dt_driver_vga_table[i];
+}
+
+static void dt_vga_init(dt_device *dev)
+{
+    dt_device_vga *priv = dev->priv;
+
+    priv->vga_drv->init(dt_get_pcibus(dev), priv->ram_size);
+}
+
+
+/* NIC Driver */
+
+typedef struct dt_device_nic {
+    NICInfo nd;
+} dt_device_nic;
+
+static const dt_prop_spec dt_nic_props[] = {
+    DT_PROP_SPEC_INIT("model", dt_device_nic, nd.model, string),
+    DT_PROP_SPEC_INIT("mac", dt_device_nic, nd.macaddr, macaddr),
+    DT_PROP_SPEC_INIT("name", dt_device_nic, nd.name, string),
+    DT_PROP_SPEC_SENTINEL()
+};
+
+static void dt_nic_config(dt_device *dev, dt_host *host)
+{
+    dt_device_nic *priv = dev->priv;
+
+    priv->nd.vlan = dt_find_vlan(dev->conf, host);
+}
+
+static void dt_nic_init(dt_device *dev)
+{
+    dt_device_nic *priv = dev->priv;
+
+    pci_nic_init(dt_get_pcibus(dev), &priv->nd, -1, NULL);
+}
+
+
+/* SCSI Driver */
+
+typedef struct dt_device_scsi {
+    void *opaque;
+    BlockDriverState *bds[LSI_MAX_DEVS];
+} dt_device_scsi;
+
+static void dt_scsi_config(dt_device *dev, dt_host *host)
+{
+    dt_device_scsi *priv = dev->priv;
+
+    dt_find_drives(dev->conf, host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_scsi_init(dt_device *dev)
+{
+    dt_device_scsi *priv = dev->priv;
+    int i;
+
+    priv->opaque = lsi_scsi_init(dt_get_pcibus(dev), -1);
+
+    for (i = 0; i < ARRAY_SIZE(priv->bds); i++) {
+        if (priv->bds[i])
+            lsi_scsi_attach(priv->opaque, priv->bds[i], i);
+    }
+}
+
+
+/* Virtio Block Driver */
+
+typedef struct dt_device_virtio_block {
+    BlockDriverState *bds[1];
+} dt_device_virtio_block;
+
+static void dt_virtio_block_config(dt_device *dev, dt_host *host)
+{
+    dt_device_virtio_block *priv = dev->priv;
+
+    dt_find_drives(dev->conf, host, priv->bds, ARRAY_SIZE(priv->bds));
+}
+
+static void dt_virtio_block_init(dt_device *dev)
+{
+    dt_device_virtio_block *priv = dev->priv;
+
+    virtio_blk_init(dt_get_pcibus(dev), priv->bds[0]);
+}
+
+
+/* Virtio Balloon Driver */
+
+static void dt_virtio_balloon_init(dt_device *dev)
+{
+    virtio_balloon_init(dt_get_pcibus(dev));
+}
+
+
+/* Virtio Console Driver */
+
+typedef struct dt_device_virtio_console {
+    int index;
+    CharDriverState *hds;
+} dt_device_virtio_console;
+
+static const dt_prop_spec dt_virtio_console_props[] = {
+    DT_PROP_SPEC_INIT("index", dt_device_virtio_console, index, int),
+    DT_PROP_SPEC_SENTINEL()
+};
+
+static void dt_virtio_console_config(dt_device *dev, dt_host *host)
+{
+    dt_device_virtio_console *priv = dev->priv;
+
+    priv->hds = virtcon_hds[priv->index];
+}
+
+static void dt_virtio_console_init(dt_device *dev)
+{
+    dt_device_virtio_console *priv = dev->priv;
+
+    virtio_console_init(dt_get_pcibus(dev), priv->hds);
+}
+
+
+/* Drive Driver */
+
+typedef struct dt_device_drive {
+    int unit;
+} dt_device_drive;
+
+static const dt_prop_spec dt_drive_props[] = {
+    DT_PROP_SPEC_INIT("unit", dt_device_drive, unit, int),
+    DT_PROP_SPEC_SENTINEL()
+};
+
+static int dt_drive_get_unit(dt_device *dev)
+{
+    return ((dt_device_drive *)dev->priv)->unit;
+}
+
+
+/* Machine Driver */
+
+static const dt_driver dt_driver_table[] = {
+    { "", 0, NULL, DT_BUS_ROOT, DT_BUS_NONE,
+      NULL, NULL, NULL, NULL,
+      NULL, NULL, NULL },
+    { "cpus", sizeof(dt_device_cpus), dt_cpus_props,
+      DT_BUS_NONE, DT_BUS_ROOT,
+      NULL, NULL, dt_cpus_init, NULL,
+      NULL, NULL, NULL },
+    { "memory", sizeof(dt_device_memory), dt_memory_props,
+      DT_BUS_NONE, DT_BUS_ROOT,
+      NULL, NULL, dt_memory_init, NULL,
+      NULL, NULL, NULL },
+    { "i8259", sizeof(dt_device_i8259), NULL,
+      DT_BUS_NONE, DT_BUS_ISA,
+      NULL, dt_i8259_int_init, NULL, NULL,
+      NULL, NULL, dt_i8259_get_int },
+    { "i8254", sizeof(dt_device_i8254), NULL,
+      DT_BUS_NONE, DT_BUS_ISA,
+      dt_i8254_config, NULL, dt_i8254_init, NULL,
+      NULL, NULL, NULL },
+    { "MC146818", sizeof(dt_device_MC146818), dt_MC146818_props,
+      DT_BUS_NONE, DT_BUS_ISA,
+      dt_MC146818_config, NULL, dt_MC146818_init, NULL,
+      NULL, NULL, NULL },
+    { "pc-misc", sizeof(dt_device_pc_misc), NULL,
+      DT_BUS_FLOPPY, DT_BUS_ROOT,
+      dt_pc_misc_config, NULL, dt_pc_misc_init, NULL,
+      NULL, NULL, NULL },
+    { "pci", sizeof(dt_device_pci), NULL,
+      DT_BUS_PCI, DT_BUS_ROOT,
+      NULL, NULL, dt_pci_init, dt_pci_start,
+      dt_pci_get_pcibus, NULL, NULL },
+    { "piix3-isa-bridge", 0, NULL,
+      DT_BUS_ISA, DT_BUS_PCI,
+      NULL, NULL, dt_piix3_init, NULL,
+      NULL, NULL, NULL },
+    { "piix3-ide", sizeof(dt_device_piix3_ide), NULL,
+      DT_BUS_IDE, DT_BUS_PCI,
+      dt_piix3_ide_config, NULL, dt_piix3_ide_init, NULL,
+      NULL, NULL, NULL },
+    { "piix3-usb", 0, NULL,
+      DT_BUS_NONE, DT_BUS_PCI, /* FIXME: DT_BUS_NONE -> BUS_USB ?*/
+      NULL, NULL, dt_piix3_usb_init, NULL,
+      NULL, NULL, NULL },
+    { "piix3-acpi", 0, NULL,
+      DT_BUS_NONE, DT_BUS_PCI,
+      NULL, NULL, dt_piix3_acpi_init, NULL,
+      NULL, NULL, NULL },
+    { "vga", sizeof(dt_device_vga), dt_vga_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_vga_config, NULL, dt_vga_init, NULL,
+      NULL, NULL, NULL },
+    { "nic", sizeof(dt_device_nic), dt_nic_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_nic_config, NULL, dt_nic_init, NULL,
+      NULL, NULL, NULL },
+    { "scsi", sizeof(dt_device_scsi), NULL,
+      DT_BUS_SCSI, DT_BUS_PCI,
+      dt_scsi_config, NULL, dt_scsi_init, NULL,
+      NULL, NULL, NULL },
+    { "virtio-block", sizeof(dt_device_virtio_block), NULL,
+      DT_BUS_VIRTIO, DT_BUS_PCI,
+      dt_virtio_block_config, NULL, dt_virtio_block_init, NULL,
+      NULL, NULL, NULL },
+    { "virtio-balloon", 0, NULL,
+      DT_BUS_NONE, DT_BUS_PCI,
+      NULL, NULL, dt_virtio_balloon_init, NULL,
+      NULL, NULL, NULL },
+    { "virtio-console", sizeof(dt_device_virtio_console), dt_virtio_console_props,
+      DT_BUS_NONE, DT_BUS_PCI,
+      dt_virtio_console_config, NULL, dt_virtio_console_init, NULL,
+      NULL, NULL, NULL },
+    { "ide-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_IDE,
+      NULL, NULL, NULL, NULL,
+      NULL, dt_drive_get_unit, NULL },
+    { "scsi-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_SCSI,
+      NULL, NULL, NULL, NULL,
+      NULL, dt_drive_get_unit, NULL },
+    { "floppy-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_FLOPPY,
+      NULL, NULL, NULL, NULL,
+      NULL, dt_drive_get_unit, NULL },
+    { "virtio-drive", sizeof(dt_device_drive), dt_drive_props,
+      DT_BUS_NONE, DT_BUS_VIRTIO,
+      NULL, NULL, NULL, NULL,
+      NULL, dt_drive_get_unit, NULL },
+    { NULL, 0, NULL, DT_BUS_NONE, DT_BUS_NONE,
+      NULL, NULL, NULL, NULL,
+      NULL, NULL, NULL }
+};
+
+static tree *dt_hardcoded_config(const char *name)
+{
+#ifdef TARGET_X86_64
+#define CPU_MODEL_DEFAULT "qemu64"
+#else
+#define CPU_MODEL_DEFAULT "qemu32"
+#endif
+    tree *root, *pci, *isa, *leaf;
+
+    /*
+     * TODO Read from config file.
+     *
+     * TODO Pretty far from a comprehensive machine configuration, but
+     * we need to start somewhere.
+     */
+    if (strcmp(name, "pcdt")) {
+        fprintf(stderr, "qemu: machine %s not implemented\n", name);
+        exit(1);
+    }
+    root = tree_new_child(NULL, "", NULL);
+
+    leaf = tree_new_child(root, "cpus", NULL);
+    tree_put_propf(leaf, "model", "%s", CPU_MODEL_DEFAULT);
+
+    leaf = tree_new_child(root, "pc-misc", NULL);
+    tree_put_propf(leaf, "interrupt-parent", "/pci/piix3-isa-bridge/i8259");
+
+    pci = tree_new_child(root, "pci", NULL);
+    tree_put_propf(pci, "interrupt-parent", "/pci/piix3-isa-bridge/i8259");
+
+    isa = tree_new_child(pci, "piix3-isa-bridge", NULL);
+    tree_put_propf(isa, "interrupt-parent", "/pci/piix3-isa-bridge/i8259");
+
+    leaf = tree_new_child(isa, "i8259", NULL);
+
+    leaf = tree_new_child(isa, "i8254", NULL);
+    tree_put_propf(leaf, "interrupt-parent", "/pci/piix3-isa-bridge/i8259");
+
+    leaf = tree_new_child(isa, "MC146818", NULL);
+    tree_put_propf(leaf, "interrupt-parent", "/pci/piix3-isa-bridge/i8259");
+
+    leaf = tree_new_child(pci, "piix3-ide", NULL);
+    tree_put_propf(leaf, "interrupt-parent", "/pci/piix3-isa-bridge/i8259");
+
+    if (usb_enabled) {
+        leaf = tree_new_child(pci, "piix3-usb", NULL);
+        tree_put_propf(leaf, "interrupt-parent", "/pci/piix3-isa-bridge/i8259");
+    }
+
+    if (acpi_enabled) {
+        leaf = tree_new_child(pci, "piix3-acpi", NULL);
+        tree_put_propf(leaf, "interrupt-parent", "/pci/piix3-isa-bridge/i8259");
+    }
+
+    leaf = tree_new_child(pci, "virtio-balloon", NULL);
+    return root;
+#undef CPU_MODEL_DEFAULT
+}
+
+#define dt_read_config(name) dt_hardcoded_config((name))
+
+static void dt_add_ram(tree *conf, ram_addr_t ram_size)
+{
+    ram_addr_t left, sz;
+
+    left = ram_size;
+    sz = MIN(left, 0xa0000);
+    dt_add_memory(conf, 0, sz, 0);
+    left -= sz;
+
+    sz = MIN(left, 0x60000);
+    left -= sz;
+
+    if (left) {
+        sz = MIN(left, 0xe0000000 - 0x100000);
+        dt_add_memory(conf, 0x100000, sz, 0);
+        left -= sz;
+    }
+
+#if TARGET_PHYS_ADDR_BITS > 32
+    if (left)
+        dt_add_memory(conf, 0x100000000ull, left, 0);
+#endif
+}
+
+static char *bios_image_fname(const char *image)
+{
+    char *fname = qemu_malloc(strlen(bios_dir) + 1 + strlen(image) + 1);
+    sprintf(fname, "%s/%s", bios_dir, image);
+    return fname;
+}
+
+static void dt_add_bios(tree *conf, dt_host *host)
+{
+    char *fname = bios_image_fname(bios_name ? bios_name : BIOS_FILENAME);
+    int size;
+    target_phys_addr_t addr;
+
+    size = get_image_size(fname);
+    if (size <= 0 || size % 0x10000 != 0 || size > 0x20000) {
+                                /* FIXME implement size > 0x20000 */
+        fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", fname);
+        exit(1);
+    }
+
+    addr = 0xffffffff - (size - 1);
+    dt_add_memory(conf, addr, size, 1);
+    dt_config_mem(host, dt_image_loader, addr, size, fname);
+
+    /* TODO map the same memory again */
+    addr = 0x100000 - size;
+    dt_add_memory(conf, addr, size, 1);
+    dt_config_mem(host, dt_image_loader, addr, size, fname);
+}
+
+static int dt_config_oprom(dt_host *host, dt_mem_loader loader,
+                           int offs, int size, const void *arg,
+                           const char *what)
+{
+    offs = (offs + 2047) & ~2047;
+
+    if (offs + size > 0x20000) {
+        if (what)
+            fprintf(stderr, "Not enough space for %s\n", what);
+        else
+            fprintf(stderr, "Not enough space for option rom '%s'\n",
+                    (char *)arg);
+        exit(1);
+    }
+
+    dt_config_mem(host, loader, 0xc0000 + offs, size, arg);
+    return offs + size;
+}
+
+static int dt_config_oprom_image(dt_host *host, int offs, const char *fname)
+{
+    int size;
+
+    size = get_image_size(fname);
+    if (size < 0) {
+        fprintf(stderr, "Could not load option rom '%s'\n", fname);
+        exit(1);
+    }
+
+    return dt_config_oprom(host, dt_image_loader, offs, size, fname, fname);
+}
+
+typedef struct dt_linux_loader_arg {
+    const char *kernel_filename;
+    const char *kernel_cmdline;
+    const char *initrd_filename;
+} dt_linux_loader_arg;
+
+static void dt_linux_loader(target_phys_addr_t addr, ram_addr_t size,
+                             const void *arg)
+{
+    const dt_linux_loader_arg *llarg = arg;
+
+    load_linux(addr, llarg->kernel_filename, llarg->initrd_filename,
+               llarg->kernel_cmdline);
+}
+
+static void dt_add_oprom(tree *conf, dt_host *host,
+                         const char *kernel_filename,
+                         const char *kernel_cmdline,
+                         const char *initrd_filename)
+{
+    int offs, i;
+    dt_linux_loader_arg *llarg;
+
+    offs = 0;
+
+    if (cirrus_vga_enabled || vmsvga_enabled || std_vga_enabled) {
+        offs = dt_config_oprom_image(host, offs,
+                                     bios_image_fname(cirrus_vga_enabled
+                                                      ? VGABIOS_CIRRUS_FILENAME
+                                                      : VGABIOS_FILENAME));
+        /*
+         * Although video roms can grow larger than 0x8000, the first
+         * 0x8000 bytes are reserved for them.  It means we won't be
+         * looking for any other kind of option rom inside this area.
+         */
+        offs = MAX(offs, 0x8000);
+    }
+
+    if (kernel_filename) {
+        llarg = qemu_malloc(sizeof(*llarg));
+        llarg->kernel_filename = kernel_filename;
+        llarg->kernel_cmdline = kernel_cmdline;
+        llarg->initrd_filename = initrd_filename;
+        offs = dt_config_oprom(host, dt_linux_loader, offs, 2048, llarg,
+                               "kernel loader");
+    }
+
+    for (i = 0; i < nb_option_roms; i++)
+        offs = dt_config_oprom_image(host, offs, option_rom[i]);
+
+    dt_add_memory(conf, 0xc0000, offs, 1);
+}
+
+static void pcdt_init(ram_addr_t ram_size, int vga_ram_size,
+                      const char *boot_device,
+                      const char *kernel_filename,
+                      const char *kernel_cmdline,
+                      const char *initrd_filename,
+                      const char *cpu_model)
+{
+    tree *conf;
+    tree *node;
+    dt_host *host;
+    dt_device *root;
+
+    conf = dt_read_config(pcdt_machine.name);
+    if (!conf)
+        exit(1);
+
+    host = dt_create_host();
+
+    dt_add_ram(conf, ram_size);
+    dt_add_bios(conf, host);
+    dt_add_oprom(conf, host, kernel_filename, kernel_cmdline, initrd_filename);
+
+    node = tree_node_by_name(conf, "/pci/piix3-isa-bridge/MC146818");
+    tree_put_propf(node, "boot-device", "%s", boot_device);
+    assert(ram_size <= 0xe0000000); /* TODO implement */
+    tree_put_propf(node, "ram-below-4g", "%#lx", (unsigned long)ram_size);
+
+    root = dt_config_machine(conf, host, dt_driver_table,
+                             ram_size, vga_ram_size, cpu_model);
+    dt_create_machine(root, host);
+}
+
+QEMUMachine pcdt_machine = {
+    .name = "pcdt",
+    .desc = "Standard PC (device tree)",
+    .init = pcdt_init,
+    .max_cpus = 255,
+};
diff --git a/hw/pcint.h b/hw/pcint.h
new file mode 100644
index 0000000..778d5a6
--- /dev/null
+++ b/hw/pcint.h
@@ -0,0 +1,50 @@
+/*
+ * Stuff shared by pc.c and pcdt.c
+ *
+ * See pcdt.c for why this should go away eventually.
+ */
+
+#ifndef HW_PC_INT_H
+#define HW_PC_INT_H
+
+#define BIOS_FILENAME "bios.bin"
+#define VGABIOS_FILENAME "vgabios.bin"
+#define VGABIOS_CIRRUS_FILENAME "vgabios-cirrus.bin"
+
+#define PC_MAX_BIOS_SIZE (4 * 1024 * 1024)
+
+#define MAX_IDE_BUS 2
+
+/* TODO move to ferr stuff in cpu.h? */
+extern qemu_irq ferr_irq;
+void ioportF0_write(void *opaque, uint32_t addr, uint32_t data);
+
+/* TODO eliminate */
+extern RTCState *rtc_state;
+extern PCIDevice *i440fx_state;
+extern int serial_io[MAX_SERIAL_PORTS];
+extern int serial_irq[MAX_SERIAL_PORTS];
+extern int parallel_io[MAX_PARALLEL_PORTS];
+extern int parallel_irq[MAX_PARALLEL_PORTS];
+extern fdctrl_t *floppy_controller;
+
+/* TODO move to pic stuff in pc.h? */
+void pic_irq_request(void *opaque, int irq, int level);
+
+/* TODO move to a20 stuff in pc.h? */
+void ioport92_write(void *opaque, uint32_t addr, uint32_t val);
+uint32_t ioport92_read(void *opaque, uint32_t addr);
+
+void bochs_bios_init(void);
+void main_cpu_reset(void *opaque);
+void ioport80_write(void *opaque, uint32_t addr, uint32_t data);
+int pc_boot_set(void *opaque, const char *boot_device);
+void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd);
+void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+               const char *boot_device, BlockDriverState **hd_table);
+void load_linux(target_phys_addr_t option_rom,
+                const char *kernel_filename,
+                const char *initrd_filename,
+                const char *kernel_cmdline);
+
+#endif
diff --git a/net.c b/net.c
index 5365891..d2163bd 100644
--- a/net.c
+++ b/net.c
@@ -157,7 +157,7 @@ static void hex_dump(FILE *f, const uint8_t *buf, int size)
 }
 #endif
 
-static int parse_macaddr(uint8_t *macaddr, const char *p)
+int parse_macaddr(uint8_t *macaddr, const char *p)
 {
     int i;
     char *last_char;
diff --git a/net.h b/net.h
index 1a51be7..54bdf80 100644
--- a/net.h
+++ b/net.h
@@ -47,6 +47,7 @@ int qemu_can_send_packet(VLANClientState *vc);
 ssize_t qemu_sendv_packet(VLANClientState *vc, const struct iovec *iov,
                           int iovcnt);
 void qemu_send_packet(VLANClientState *vc, const uint8_t *buf, int size);
+int parse_macaddr(uint8_t *macaddr, const char *p);
 void qemu_format_nic_info_str(VLANClientState *vc, uint8_t macaddr[6]);
 void qemu_check_nic_model(NICInfo *nd, const char *model);
 void qemu_check_nic_model_list(NICInfo *nd, const char * const *models,
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 1cf49d5..34a7b4d 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -7,6 +7,7 @@
 
 void register_machines(void)
 {
+    qemu_register_machine(&pcdt_machine);
     qemu_register_machine(&pc_machine);
     qemu_register_machine(&isapc_machine);
 }
diff --git a/tree.c b/tree.c
new file mode 100644
index 0000000..da07b76
--- /dev/null
+++ b/tree.c
@@ -0,0 +1,285 @@
+/*
+ * QEMU PC System Emulator
+ *
+ * Copyright (C) 2009 Red Hat, Inc., Markus Armbruster <armbru@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/* Ye Olde Decorated Tree */
+
+#include <assert.h>
+#include "tree.h"
+#include "qemu-common.h"
+#include "sys-queue.h"
+
+struct tree {
+    const char *name;
+    LIST_HEAD(, tree_prop) props;
+    tree *parent;
+    TAILQ_HEAD(, tree) children;
+    TAILQ_ENTRY(tree) siblings;
+    void *user;
+};
+
+struct tree_prop {
+    const char *name;
+    const void *val;
+    int sz;
+    tree *owner;
+    LIST_ENTRY(tree_prop) link;
+};
+
+tree *tree_new_child(tree *parent, const char *name, void *user)
+{
+    tree *child = qemu_malloc(sizeof(*child));
+
+    child->name = name;
+    LIST_INIT(&child->props);
+    child->parent = NULL;
+    TAILQ_INIT(&child->children);
+    child->user = user;
+    if (parent)
+        tree_insert(parent, child);
+
+    return child;
+}
+
+void tree_insert(tree *parent, tree *child)
+{
+    assert(!child->parent);
+    child->parent = parent;
+    TAILQ_INSERT_TAIL(&parent->children, child, siblings);
+}
+
+const char *tree_node_name(const tree *node)
+{
+    return node->name;
+}
+
+static tree *tree_child_by_name(const tree *parent, const char *name)
+{
+    const char *slash = strchr(name, '/');
+    size_t len = slash ? slash - name : strlen(name);
+    tree *child;
+
+    TAILQ_FOREACH(child, &parent->children, siblings) {
+        if (!strncmp(child->name, name, len) && child->name[len] == 0)
+            return child;
+    }
+    return NULL;
+}
+
+tree *tree_node_by_name(const tree *node, const char *name)
+{
+    tree *child;
+    size_t len;
+
+    if (name[0] == '/') {
+        for (; node->parent; node = node->parent) ;
+        while (*name == '/') name++;
+    }
+
+    if (name[0] == 0)
+        return (tree *)node;
+
+    child = tree_child_by_name(node, name);
+    if (!child)
+        return NULL;
+
+    len = strlen(child->name);
+    if (name[len] == 0)
+        return child;
+    assert (name[len] == '/');
+
+    while (name[len] == '/') len++;
+    return tree_node_by_name(child, name + len);
+}
+
+tree_prop *tree_first_prop(const tree *node)
+{
+    return LIST_FIRST(&node->props);
+}
+
+tree_prop *tree_next_prop(const tree_prop *prop)
+{
+    return LIST_NEXT(prop, link);
+}
+
+tree_prop *tree_get_prop(const tree *node, const char *name)
+{
+    tree_prop *prop;
+
+    LIST_FOREACH(prop, &node->props, link) {
+        if (!strcmp(prop->name, name))
+            return prop;
+    }
+    return NULL;
+}
+
+const char *tree_get_prop_s(const tree *node, const char *name)
+{
+    tree_prop *prop = tree_get_prop(node, name);
+    if (!prop
+        || memchr(prop->val, 0, prop->sz) != (const char *)prop->val + prop->sz - 1) {
+        errno = EINVAL;
+        return NULL;
+    }
+    return prop->val;
+}
+
+const char *tree_prop_name(const tree_prop *prop)
+{
+    return prop->name;
+}
+
+const void *tree_prop_value(const tree_prop *prop, size_t *size)
+{
+    if (size)
+        *size = prop->sz;
+    return prop->val;
+}
+
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz)
+{
+    tree_prop *prop;
+
+    prop = tree_get_prop(node, name);
+    if (!prop) {
+        prop = qemu_malloc(sizeof(*prop));
+        prop->name = name;
+        prop->owner = node;
+        LIST_INSERT_HEAD(&node->props, prop, link);
+    }
+    /* FIXME need a destructor for val */
+    prop->val = val;
+    prop->sz = sz;
+}
+
+void tree_put_propf(tree *node, const char *name, const char *fmt, ...)
+{
+    va_list ap;
+    size_t len;
+    char *buf;
+
+    va_start(ap, fmt);
+    len = vsnprintf(NULL, 0, fmt, ap);
+    va_end(ap);
+
+    buf = qemu_malloc(len + 1);
+    va_start(ap, fmt);
+    vsnprintf(buf, len + 1, fmt, ap);
+    va_end(ap);
+
+    tree_put_prop(node, name, buf, len + 1);
+}
+
+void tree_put_user(tree *node, void *user)
+{
+    node->user = user;
+}
+
+void *tree_get_user(const tree *node)
+{
+    return node->user;
+}
+
+tree *tree_parent(const tree *node)
+{
+    return node->parent;
+}
+
+tree *tree_first_child(const tree *node)
+{
+    return TAILQ_FIRST(&node->children);
+}
+
+tree *tree_sibling(const tree *node)
+{
+    return TAILQ_NEXT(node, siblings);
+}
+
+int tree_path(const tree *node, char *buf, size_t bufsz)
+{
+    char *p;
+    const tree *np;
+    size_t len, res;
+
+    p = buf + bufsz;
+    res = 0;
+    for (np = node; np->parent; np = np->parent) {
+        len = 1 + strlen(np->name);
+        res += len;
+        if (res >= bufsz)
+            continue;
+        p -= len;
+        memcpy(p + 1, np->name, len - 1);
+        p[0] = '/';
+    }
+
+    if (res == 0) {
+        if (++res < bufsz)
+            *--p = '/';
+    }
+
+    if (res < bufsz) {
+        memmove(buf, p, res);
+        buf[res] = 0;
+    }
+
+    return res;
+}
+
+static void tree_print_sub(const tree *node, int indent)
+{
+    int i, use_str, sep;
+    const unsigned char *pv;
+    tree_prop *prop;
+    tree *child;
+
+    printf("%*s%s {\n", indent, "", node->parent ? node->name : "/");
+    LIST_FOREACH(prop, &node->props, link) {
+        printf("%*s%s", indent + 4, "", prop->name);
+        pv = prop->val;
+        if (pv) {
+            printf(" = ");
+            use_str = prop->sz > 0 && pv[prop->sz - 1] == 0;
+            for (i = 0; i < prop->sz - 1; i++) {
+                if (!isprint(pv[i]))
+                    use_str = 0;
+            }
+            if (use_str)
+                printf("\"%s\"", (const char *)prop->val);
+            else {
+                sep = '[';
+                for (i = 0; i < prop->sz; i++) {
+                    printf("%c%02x", sep, pv[i]);
+                    sep = ' ';
+                }
+                printf("]");
+            }
+        }
+        printf(";\n");
+    }
+    TAILQ_FOREACH(child, &node->children, siblings)
+        tree_print_sub(child, indent + 4);
+    printf("%*s};\n", indent, "");
+}
+
+void tree_print(const tree *node)
+{
+    tree_print_sub(node, 0);
+}
diff --git a/tree.h b/tree.h
new file mode 100644
index 0000000..3f3b367
--- /dev/null
+++ b/tree.h
@@ -0,0 +1,41 @@
+#ifndef TREE_H
+#define TREE_H
+
+#include <stddef.h>
+
+typedef struct tree tree;
+typedef struct tree_prop tree_prop;
+
+tree *tree_new_child(tree *parent, const char *name, void *user);
+void tree_insert(tree *parent, tree *child);
+const char *tree_node_name(const tree *node);
+tree *tree_node_by_name(const tree *node,
+                        const char *name);
+
+tree_prop *tree_first_prop(const tree *node);
+tree_prop *tree_next_prop(const tree_prop *prop);
+#define TREE_FOREACH_PROP(var, node) \
+    for (var = tree_first_prop(node); var; var = tree_next_prop(var))
+tree_prop *tree_get_prop(const tree *node, const char *name);
+const char *tree_get_prop_s(const tree *node, const char *name);
+const char *tree_prop_name(const tree_prop *prop);
+const void *tree_prop_value(const tree_prop *prop, size_t *size);
+void tree_put_prop(tree *node, const char *name,
+                   const void *val, size_t sz);
+void tree_put_propf(tree *node, const char *name,
+                    const char *fmt, ...)
+    __attribute__((format(printf,3,4)));
+
+void tree_put_user(tree *node, void *user);
+void *tree_get_user(const tree *node);
+
+tree *tree_parent(const tree *node);
+tree *tree_first_child(const tree *node);
+tree *tree_sibling(const tree *node);
+#define TREE_FOREACH_CHILD(var, node) \
+    for (var = tree_first_child(node); var; var = tree_sibling(var))
+
+int tree_path(const tree *node, char *buf, size_t bufsz);
+void tree_print(const tree *node);
+
+#endif


end of thread, other threads:[~2009-04-17 16:05 UTC | newest]

Thread overview: 146+ messages
2009-02-11 15:40 [Qemu-devel] [RFC] Machine description as data Markus Armbruster
2009-02-11 16:31 ` Ian Jackson
2009-02-11 17:43   ` Markus Armbruster
     [not found]   ` <18834.64870.951989.714873-msK/Ju9w1zmnROeE8kUsYhEHtJm+Wo+I@public.gmane.org>
2009-02-11 18:57     ` Hollis Blanchard
2009-02-11 18:57       ` Hollis Blanchard
     [not found]       ` <1234378639.28751.85.camel-EGjIuKC2qUdB0N6nvOmcJFaTQe2KTcn/@public.gmane.org>
2009-02-12  3:50         ` David Gibson
2009-02-12  3:50           ` David Gibson
     [not found] ` <87iqnh6kyv.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2009-02-11 18:50   ` Hollis Blanchard
2009-02-11 18:50     ` Hollis Blanchard
2009-02-11 19:34     ` Blue Swirl
2009-02-11 19:34       ` [Qemu-devel] " Blue Swirl
     [not found]     ` <1234378228.28751.79.camel-EGjIuKC2qUdB0N6nvOmcJFaTQe2KTcn/@public.gmane.org>
2009-02-12  4:01       ` David Gibson
2009-02-12  4:01         ` David Gibson
2009-02-12 10:26         ` Markus Armbruster
2009-02-12 10:26           ` [Qemu-devel] " Markus Armbruster
2009-02-12 12:49           ` Carl-Daniel Hailfinger
2009-02-12 12:49             ` [Qemu-devel] " Carl-Daniel Hailfinger
2009-02-12 16:46             ` M. Warner Losh
2009-02-12 16:46               ` [Qemu-devel] " M. Warner Losh
2009-02-12 18:29               ` Markus Armbruster
2009-02-12 18:29                 ` [Qemu-devel] " Markus Armbruster
2009-02-12 23:58                 ` Carl-Daniel Hailfinger
2009-02-12 23:58                   ` [Qemu-devel] " Carl-Daniel Hailfinger
     [not found]                   ` <4994B7B6.80805-hi6Y0CQ0nG0@public.gmane.org>
2009-02-13 11:19                     ` Markus Armbruster
2009-02-13 11:19                       ` Markus Armbruster
     [not found]                 ` <87prhnwltz.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2009-02-13  1:05                   ` David Gibson
2009-02-13  1:05                     ` David Gibson
2009-02-12 23:35               ` Carl-Daniel Hailfinger
2009-02-12 23:35                 ` [Qemu-devel] " Carl-Daniel Hailfinger
2009-02-12 23:58                 ` Paul Brook
2009-02-12 23:58                   ` [Qemu-devel] " Paul Brook
     [not found]                   ` <200902122358.25864.paul-qD8j1LwMmJjtCj0u4l0SBw@public.gmane.org>
2009-02-13  0:32                     ` Carl-Daniel Hailfinger
2009-02-13  0:32                       ` Carl-Daniel Hailfinger
2009-02-13  0:47                       ` Jamie Lokier
2009-02-13  0:47                         ` [Qemu-devel] " Jamie Lokier
     [not found]                       ` <4994BF93.2070409-hi6Y0CQ0nG0@public.gmane.org>
2009-02-13  1:46                         ` David Gibson
2009-02-13  1:46                           ` David Gibson
2009-02-13 14:32                       ` Lennart Sorensen
2009-02-13 14:32                         ` [Qemu-devel] " Lennart Sorensen
     [not found]                 ` <4994B22E.6060608-hi6Y0CQ0nG0@public.gmane.org>
2009-02-13  0:05                   ` M. Warner Losh
2009-02-13  0:05                     ` M. Warner Losh
     [not found]           ` <87iqng0x3t.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2009-02-12 17:52             ` Hollis Blanchard
2009-02-12 17:52               ` Hollis Blanchard
     [not found]               ` <1234461162.20305.16.camel-EGjIuKC2qUdB0N6nvOmcJFaTQe2KTcn/@public.gmane.org>
2009-02-12 18:53                 ` Markus Armbruster
2009-02-12 18:53                   ` Markus Armbruster
     [not found]                   ` <87fxijwkpn.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2009-02-12 19:33                     ` Mitch Bradley
2009-02-12 19:33                       ` Mitch Bradley
     [not found]                       ` <499479A7.5090902-D5eQfiDGL7eakBO8gow8eQ@public.gmane.org>
2009-02-13  0:59                         ` David Gibson
2009-02-13  0:59                           ` David Gibson
2009-02-13  1:00                 ` David Gibson
2009-02-13  1:00                   ` David Gibson
2009-02-13  0:43             ` David Gibson
2009-02-13  0:43               ` David Gibson
     [not found]               ` <20090213004305.GB8104-787xzQ0H9iRg7VrjXcPTGA@public.gmane.org>
2009-02-13  2:11                 ` Carl-Daniel Hailfinger
2009-02-13  2:11                   ` Carl-Daniel Hailfinger
     [not found]                   ` <4994D6C8.5050004-hi6Y0CQ0nG0@public.gmane.org>
2009-02-13  2:17                     ` David Gibson
2009-02-13  2:17                       ` David Gibson
     [not found]                       ` <20090213021704.GA10476-787xzQ0H9iRg7VrjXcPTGA@public.gmane.org>
2009-02-13  2:45                         ` DTS syntax and DTC patches (was: Re: [Qemu-devel] [RFC] Machine description as data) Carl-Daniel Hailfinger
2009-02-13  2:45                           ` Carl-Daniel Hailfinger
     [not found]                           ` <4994DED9.6020803-hi6Y0CQ0nG0@public.gmane.org>
2009-02-13  2:51                             ` David Gibson
2009-02-13  2:51                               ` David Gibson
     [not found]                               ` <20090213025101.GC10476-787xzQ0H9iRg7VrjXcPTGA@public.gmane.org>
2009-02-13 17:07                                 ` [coreboot] " ron minnich
     [not found]                               ` <13426df10902130907m5c3452dpb8f4f2b72f8507b9@mail.gmail.com>
     [not found]                                 ` <13426df10902130907m5c3452dpb8f4f2b72f8507b9-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-02-20  2:29                                   ` David Gibson
     [not found]                                     ` <20090220022918.GA18332-787xzQ0H9iRg7VrjXcPTGA@public.gmane.org>
2009-02-20  3:32                                       ` ron minnich
2009-02-13 20:04                     ` [Qemu-devel] [RFC] Machine description as data Jon Loeliger
2009-02-13 20:04                       ` Jon Loeliger
2009-02-13 20:15                       ` Carl-Daniel Hailfinger
2009-02-13 20:15                         ` Carl-Daniel Hailfinger
     [not found]                         ` <4995D4EE.8030703-hi6Y0CQ0nG0@public.gmane.org>
2009-02-13 20:19                           ` Jon Loeliger
2009-02-13 20:19                             ` Jon Loeliger
2009-02-12 10:26     ` Markus Armbruster
2009-02-12 10:26       ` [Qemu-devel] " Markus Armbruster
2009-02-12 12:36       ` Carl-Daniel Hailfinger
2009-02-12 12:36         ` [Qemu-devel] " Carl-Daniel Hailfinger
2009-02-12 16:07       ` Paul Brook
2009-02-12 17:17         ` Blue Swirl
2009-02-12 18:09         ` Marcelo Tosatti
     [not found]       ` <87k57w0x4r.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2009-02-13  0:37         ` David Gibson
2009-02-13  0:37           ` David Gibson
     [not found]           ` <20090213003724.GA8104-787xzQ0H9iRg7VrjXcPTGA@public.gmane.org>
2009-02-13 11:26             ` Markus Armbruster
2009-02-13 11:26               ` Markus Armbruster
     [not found]               ` <87ab8qr317.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2009-02-13 12:06                 ` Paul Brook
2009-02-13 12:06                   ` Paul Brook
     [not found]                   ` <200902131206.42427.paul-qD8j1LwMmJjtCj0u4l0SBw@public.gmane.org>
2009-02-13 12:48                     ` Markus Armbruster
2009-02-13 12:48                       ` Markus Armbruster
     [not found]                       ` <87ocx6pkol.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2009-02-13 13:33                         ` Paul Brook
2009-02-13 13:33                           ` Paul Brook
     [not found]                           ` <200902131333.47141.paul-qD8j1LwMmJjtCj0u4l0SBw@public.gmane.org>
2009-02-13 14:13                             ` Markus Armbruster
2009-02-13 14:13                               ` Markus Armbruster
     [not found]                               ` <871vu2pgq7.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2009-02-13 14:25                                 ` Paul Brook
2009-02-13 14:25                                   ` Paul Brook
     [not found]                                   ` <200902131425.53137.paul-qD8j1LwMmJjtCj0u4l0SBw@public.gmane.org>
2009-02-13 15:47                                     ` Jamie Lokier
2009-02-13 15:47                                       ` Jamie Lokier
2009-02-13 18:36                                 ` Mitch Bradley
2009-02-13 18:36                                   ` Mitch Bradley
     [not found]                                   ` <4995BDC0.3040806-D5eQfiDGL7eakBO8gow8eQ@public.gmane.org>
2009-02-13 19:49                                     ` Markus Armbruster
2009-02-13 19:49                                       ` Markus Armbruster
     [not found]                                       ` <877i3uglqz.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2009-02-13 19:51                                         ` Mitch Bradley
2009-02-13 19:51                                           ` Mitch Bradley
2009-02-16  3:42                 ` David Gibson
2009-02-16  3:42                   ` David Gibson
     [not found]                   ` <20090216034214.GB9772-787xzQ0H9iRg7VrjXcPTGA@public.gmane.org>
2009-02-16 16:39                     ` Markus Armbruster
2009-02-16 16:39                       ` Markus Armbruster
     [not found]                       ` <87iqnawd2r.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2009-02-17  3:29                         ` David Gibson
2009-02-17  3:29                           ` David Gibson
     [not found]                           ` <20090217032909.GA29225-787xzQ0H9iRg7VrjXcPTGA@public.gmane.org>
2009-02-17  7:54                             ` Markus Armbruster
2009-02-17  7:54                               ` Markus Armbruster
2009-02-17 17:44                             ` Paul Brook
2009-02-17 17:44                               ` Paul Brook
     [not found]                               ` <200902171744.34951.paul-qD8j1LwMmJjtCj0u4l0SBw@public.gmane.org>
2009-02-18  8:36                                 ` Markus Armbruster
2009-02-18  8:36                                   ` Markus Armbruster
2009-02-11 19:01 ` Anthony Liguori
2009-02-11 19:36   ` Blue Swirl
2009-02-11 19:56     ` Anthony Liguori
2009-02-12 10:25       ` Markus Armbruster
2009-02-16 16:22 ` [Qemu-devel] Machine description as data prototype, take 2 (was: [RFC] Machine description as data) Markus Armbruster
2009-02-17 17:32   ` Paul Brook
2009-02-18  8:42     ` [Qemu-devel] Machine description as data prototype, take 2 Markus Armbruster
2009-02-19 10:29 ` [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data) Markus Armbruster
2009-02-19 13:53   ` Paul Brook
2009-02-19 14:55     ` [Qemu-devel] Machine description as data prototype, take 3 Markus Armbruster
2009-02-19 15:03       ` Paul Brook
2009-02-19 14:36   ` Anthony Liguori
2009-02-19 15:00     ` Markus Armbruster
2009-02-19 14:49   ` Anthony Liguori
2009-02-23 17:38     ` Markus Armbruster
2009-02-23 18:58       ` Anthony Liguori
2009-02-24  9:08         ` Markus Armbruster
2009-02-19 16:40   ` [Qemu-devel] Machine description as data prototype, take 3 (was: [RFC] Machine description as data) Blue Swirl
2009-02-19 18:30     ` [Qemu-devel] Machine description as data prototype, take 3 Markus Armbruster
2009-02-20 18:14       ` Blue Swirl
2009-02-20 18:20         ` Paul Brook
2009-02-23 12:00           ` Markus Armbruster
2009-02-23 12:18         ` Markus Armbruster
2009-02-23 18:00 ` [Qemu-devel] Machine description as data prototype, take 4 (was: [RFC] Machine description as data) Markus Armbruster
2009-02-24 20:06   ` Blue Swirl
2009-02-25 12:13     ` [Qemu-devel] Machine description as data prototype, take 4 Markus Armbruster
2009-02-25 20:11       ` Blue Swirl
2009-03-03 17:46 ` [Qemu-devel] Machine description as data prototype, take 5 (was: [RFC] Machine description as data) Markus Armbruster
2009-03-12 18:43 ` [Qemu-devel] Machine description as data prototype, take 6 " Markus Armbruster
2009-03-17 16:06   ` [Qemu-devel] Machine description as data prototype, take 6 Paul Brook
2009-03-17 17:32     ` Markus Armbruster
2009-03-23 15:50 ` [Qemu-devel] Re: [RFC] Machine description as data Markus Armbruster
2009-03-23 15:53   ` Markus Armbruster
2009-03-31  9:16 ` Markus Armbruster
2009-04-17 16:04 ` Markus Armbruster
