* [Qemu-devel] [PATCH 0/2] native Linux AIO support revisited
@ 2009-08-20 14:58 Christoph Hellwig
  2009-08-20 14:58 ` [Qemu-devel] [PATCH 1/2] raw-posix: refactor AIO support Christoph Hellwig
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Christoph Hellwig @ 2009-08-20 14:58 UTC (permalink / raw)
  To: qemu-devel

This patchset introduces support for native Linux AIO.  The first patch
just refactors the existing AIO emulation by thread pools to have
a cleaner layering which allows the native AIO support to be implemented
more easily.

The second patch introduces real native Linux AIO support, although due
to limitations in the kernel implementation we can only use it for
cache=none.  It is vaguely based on Anthony's earlier patches, but due to
the refactoring in the first patch it is much simpler.  Instead of
trying to fit into the model of the Posix AIO API we directly integrate
into the raw-posix code with a very lean interface (see the first patch
for a more detailed explanation).  That also means we can just register
the AIO completion eventfd directly with the qemu poll handler instead of
needing an additional indirection.
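
As a rough illustration of the notification path described above, here is a minimal sketch of waiting on an eventfd with poll().  It does not use QEMU's handler registration.  The helper names (make_completion_fd, signal_completions, drain_completions) are made up for this example, and the write() only stands in for the kernel signalling completed requests.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create the eventfd a completion source would signal.  Non-blocking, so a
 * spurious wakeup can never stall the main loop in read(). */
static int make_completion_fd(void)
{
    return eventfd(0, EFD_NONBLOCK);
}

/* Stand-in for the kernel (assumption for this sketch): add n to the
 * eventfd counter, as if n requests had just completed. */
static int signal_completions(int efd, uint64_t n)
{
    assert(n > 0);
    return write(efd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/* What a poll handler would do: wait until the fd is readable, then read
 * the counter, which resets it to zero.  Returns the number of completions
 * signalled since the last read, or -1 if nothing became ready in time. */
static int64_t drain_completions(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    uint64_t count;

    if (poll(&pfd, 1, timeout_ms) <= 0 || !(pfd.revents & POLLIN)) {
        return -1;
    }
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return -1;
    }
    return (int64_t)count;
}
```

Several signals that arrive before the handler runs are folded into a single wakeup, so the handler learns how many completions to reap from one read().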

The AIO code performs slightly better than the thread pool on most
workloads I've thrown at it, and uses a lot less CPU time for it:

iozone -s 1024m -r $num -I -f /dev/sdb

output is in KB/s:

        write 16k  read 16k  write 64k  read 64k  write 256k  read 256k
native      39133     75462     100980    156169      133642     168343
qemu        29998     48334      79870    116393      133090     161360
qemu+aio    32151     52902      82513    123893      133767     164113


dd if=/dev/zero of=$dev bs=20M oflag=direct count=400
dd if=$dev of=/dev/zero bs=20M iflag=direct count=400

output is in MB/s:

            write  read
native        116   123
qemu          116   100
qemu+aio      116   121

For all of this the AIO code used significantly less CPU time (no
comparison to native due to VM startup overhead and other issues):

                real        user       sys
qemu      25m45.885s  1m36.422s  1m49.394s
qemu+aio  25m36.950s  1m14.178s  1m13.179s
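
To put a number on the CPU-time difference, the user and sys columns above can be summed and compared.  A small sketch of that arithmetic (the helper names are mine, the figures are the ones from the table):

```c
#include <assert.h>

/* Convert a time(1)-style "XmY.YYYs" value to seconds. */
static double to_seconds(int minutes, double seconds)
{
    assert(minutes >= 0 && seconds >= 0.0 && seconds < 60.0);
    return minutes * 60.0 + seconds;
}

/* Percentage of CPU time saved relative to the baseline. */
static double cpu_saving_percent(double base_cpu, double new_cpu)
{
    assert(base_cpu > 0.0);
    return (base_cpu - new_cpu) / base_cpu * 100.0;
}

/* user + sys for the thread pool run: 1m36.422s + 1m49.394s */
static double qemu_cpu_seconds(void)
{
    return to_seconds(1, 36.422) + to_seconds(1, 49.394);
}

/* user + sys for the native AIO run: 1m14.178s + 1m13.179s */
static double qemu_aio_cpu_seconds(void)
{
    return to_seconds(1, 14.178) + to_seconds(1, 13.179);
}
```

That is about 205.8s of CPU time against 147.4s, roughly 28% less, while the wall-clock time is nearly identical.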

Note that the results have quite a bit of variation per run, so qemu+aio
being faster in one of the tests above shouldn't mean too much; it's
also been minimally slower in some.  From various runs I would say that
for larger block sizes we meet native performance, a little bit sooner
with AIO, and a little bit later without.

All these results are on a raw host device and using virtio.  With image
files on a filesystem there are potential blocking points in the AIO
implementation.  Those are relatively small or non-existent on already
allocated (and at least for XFS that includes preallocated) files, but
for sparse files they include waiting for disk I/O during allocations,
and need to be avoided to not kill performance.  All the results also
already include the MSI support for virtio-blk, btw.
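
To make the sparse versus preallocated distinction concrete, here is a small sketch that creates a scratch file either way and reports how much backing store it has.  The helper names and the /tmp path are assumptions for the example.  Whether preallocation is enough to avoid blocking in the kernel depends on the filesystem, and posix_fallocate() may fall back to writing zeros where the filesystem has no native support.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an unlinked scratch file of the given size.  With prealloc set the
 * blocks are reserved up front; otherwise the file is left sparse. */
static int create_image(off_t size, int prealloc)
{
    char path[] = "/tmp/img-sketch-XXXXXX";
    int fd = mkstemp(path);
    int err;

    assert(size > 0);
    if (fd < 0) {
        return -1;
    }
    unlink(path);

    /* posix_fallocate() returns an error number, ftruncate() returns -1. */
    err = prealloc ? posix_fallocate(fd, 0, size) : ftruncate(fd, size);
    if (err != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Bytes of backing store currently allocated to the file, or -1 on error.
 * st_blocks is counted in 512-byte units. */
static long long allocated_bytes(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0) {
        return -1;
    }
    return (long long)st.st_blocks * 512;
}
```

A file created with prealloc set should report at least its full size as allocated, while a sparse one typically reports close to nothing until it is written.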

Based on this I would recommend to include this patch, but not use it by
default for now.  After some testing I would suggest to enable it by
default for host devices and investigate a way to make it easily usable
for files, possibly including some kernel support to tell us which files
are "safe".

These patches require my patch to make pthreads mandatory to be applied
first, which already is in Anthony's queue.  If you want to use them with
qemu-kvm you also need to back out the compatfd changes to raw-block.c
first.


* [Qemu-devel] [PATCH 1/2] raw-posix: refactor AIO support
  2009-08-20 14:58 [Qemu-devel] [PATCH 0/2] native Linux AIO support revisited Christoph Hellwig
@ 2009-08-20 14:58 ` Christoph Hellwig
  2009-08-20 14:58 ` [Qemu-devel] [PATCH 2/2] raw-posix: add Linux native " Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2009-08-20 14:58 UTC (permalink / raw)
  To: qemu-devel


Currently the raw-posix.c code contains a lot of knowledge about the
asynchronous I/O scheme that is mostly implemented in posix-aio-compat.c.
All this code does not really belong here and is getting a bit in the
way of implementing native AIO on Linux.

So instead move all the guts of the AIO implementation into
posix-aio-compat.c (which might need a better name, btw).

There's now a very small interface between the AIO providers and raw-posix.c:

 - an init routine is called from raw_open_common to return an AIO context
   for this drive.  An AIO implementation may either re-use one context
   for all drives, or use a different one for each as the Linux native
   AIO support will do.
 - a submit routine is called from the aio_readv/writev methods to submit
   an AIO request
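
The two context lifetimes mentioned in the list can be sketched as follows.  This is only an illustration: the struct and function names are hypothetical and are not the ones used in the patch, where the context is an opaque void pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-provider state for this sketch. */
struct pool_ctx   { int initialized; };
struct native_ctx { int queue_depth; };

/* Thread-pool style provider: one context shared by every drive, created
 * on the first call and handed out again afterwards. */
static void *pool_ctx_init(void)
{
    static struct pool_ctx *shared;

    if (!shared) {
        shared = calloc(1, sizeof(*shared));
        if (shared) {
            shared->initialized = 1;
        }
    }
    return shared;
}

/* Native-AIO style provider: a separate context for each drive. */
static void *native_ctx_init(int queue_depth)
{
    struct native_ctx *ctx;

    assert(queue_depth > 0);
    ctx = calloc(1, sizeof(*ctx));
    if (ctx) {
        ctx->queue_depth = queue_depth;
    }
    return ctx;
}
```

The caller stores whatever pointer it gets back and passes it to the matching submit routine, so it never needs to know which lifetime applies.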

There are no indirect calls involved in this interface as we need to
decide which one to call manually.  We will only call the Linux AIO native
init function if we were requested to by vl.c, and we will only call
the native submit function if we are asked to and the request is properly
aligned.  That's also the reason why the alignment check actually does
the inverse move and now goes into raw-posix.c.
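
A minimal sketch of that decision, assuming 512-byte sectors: the alignment check mirrors the idea of the check in the patch, while the enum and function names here are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define SECTOR_SIZE 512

/* Hypothetical labels for the two submit paths. */
enum backend { BACKEND_THREAD_POOL, BACKEND_NATIVE };

/* Return 1 if every buffer in the vector starts on a sector boundary. */
static int iov_is_sector_aligned(const struct iovec *iov, int niov)
{
    int i;

    assert(niov >= 0);
    for (i = 0; i < niov; i++) {
        if ((uintptr_t)iov[i].iov_base % SECTOR_SIZE) {
            return 0;
        }
    }
    return 1;
}

/* Pick the submit path with plain conditionals, no function pointers:
 * native only when it was requested and the request is aligned. */
static enum backend choose_backend(int native_requested,
                                   const struct iovec *iov, int niov)
{
    if (native_requested && iov_is_sector_aligned(iov, niov)) {
        return BACKEND_NATIVE;
    }
    return BACKEND_THREAD_POOL;
}
```

A misaligned request falls through to the thread pool path, which can copy the data into an aligned bounce buffer first.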

The old posix-aio-compat.h header is removed now that most of its
content is private to posix-aio-compat.c, and instead a new
block/raw-posix-aio.h header is created containing only the tiny interface
between raw-posix.c and the AIO implementation.


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu-kvm/block/raw-posix.c
===================================================================
--- qemu-kvm.orig/block/raw-posix.c
+++ qemu-kvm/block/raw-posix.c
@@ -27,7 +27,7 @@
 #include "qemu-log.h"
 #include "block_int.h"
 #include "module.h"
-#include "posix-aio-compat.h"
+#include "block/raw-posix-aio.h"
 
 #ifdef CONFIG_COCOA
 #include <paths.h>
@@ -107,6 +107,7 @@ typedef struct BDRVRawState {
     int type;
     unsigned int lseek_err_cnt;
     int open_flags;
+    void *aio_ctx;
 #if defined(__linux__)
     /* linux floppy specific */
     int64_t fd_open_time;
@@ -117,8 +118,6 @@ typedef struct BDRVRawState {
     uint8_t* aligned_buf;
 } BDRVRawState;
 
-static int posix_aio_init(void);
-
 static int fd_open(BlockDriverState *bs);
 static int64_t raw_getlength(BlockDriverState *bs);
 
@@ -132,8 +131,6 @@ static int raw_open_common(BlockDriverSt
     BDRVRawState *s = bs->opaque;
     int fd, ret;
 
-    posix_aio_init();
-
     s->lseek_err_cnt = 0;
 
     s->open_flags = open_flags | O_BINARY;
@@ -165,12 +162,22 @@ static int raw_open_common(BlockDriverSt
     if ((bdrv_flags & BDRV_O_NOCACHE)) {
         s->aligned_buf = qemu_blockalign(bs, ALIGNED_BUFFER_SIZE);
         if (s->aligned_buf == NULL) {
-            ret = -errno;
-            close(fd);
-            return ret;
+            goto out_close;
         }
     }
+
+    s->aio_ctx = paio_init();
+    if (!s->aio_ctx) {
+        goto out_free_buf;
+    }
+
     return 0;
+
+out_free_buf:
+    qemu_vfree(s->aligned_buf);
+out_close:
+    close(fd);
+    return -errno;
 }
 
 static int raw_open(BlockDriverState *bs, const char *filename, int flags)
@@ -487,240 +494,58 @@ static int raw_write(BlockDriverState *b
     return ret;
 }
 
-/***********************************************************/
-/* Unix AIO using POSIX AIO */
-
-typedef struct RawAIOCB {
-    BlockDriverAIOCB common;
-    struct qemu_paiocb aiocb;
-    struct RawAIOCB *next;
-    int ret;
-} RawAIOCB;
-
-typedef struct PosixAioState
-{
-    int rfd, wfd;
-    RawAIOCB *first_aio;
-} PosixAioState;
-
-static void posix_aio_read(void *opaque)
-{
-    PosixAioState *s = opaque;
-    RawAIOCB *acb, **pacb;
-    int ret;
-    ssize_t len;
-
-    /* read all bytes from signal pipe */
-    for (;;) {
-        char bytes[16];
-
-        len = read(s->rfd, bytes, sizeof(bytes));
-        if (len == -1 && errno == EINTR)
-            continue; /* try again */
-        if (len == sizeof(bytes))
-            continue; /* more to read */
-        break;
-    }
-
-    for(;;) {
-        pacb = &s->first_aio;
-        for(;;) {
-            acb = *pacb;
-            if (!acb)
-                goto the_end;
-            ret = qemu_paio_error(&acb->aiocb);
-            if (ret == ECANCELED) {
-                /* remove the request */
-                *pacb = acb->next;
-                qemu_aio_release(acb);
-            } else if (ret != EINPROGRESS) {
-                /* end of aio */
-                if (ret == 0) {
-                    ret = qemu_paio_return(&acb->aiocb);
-                    if (ret == acb->aiocb.aio_nbytes)
-                        ret = 0;
-                    else
-                        ret = -EINVAL;
-                } else {
-                    ret = -ret;
-                }
-                /* remove the request */
-                *pacb = acb->next;
-                /* call the callback */
-                acb->common.cb(acb->common.opaque, ret);
-                qemu_aio_release(acb);
-                break;
-            } else {
-                pacb = &acb->next;
-            }
-        }
-    }
- the_end: ;
-}
-
-static int posix_aio_flush(void *opaque)
-{
-    PosixAioState *s = opaque;
-    return !!s->first_aio;
-}
-
-static PosixAioState *posix_aio_state;
-
-static void aio_signal_handler(int signum)
-{
-    if (posix_aio_state) {
-        char byte = 0;
-
-        write(posix_aio_state->wfd, &byte, sizeof(byte));
-    }
-
-    qemu_service_io();
-}
-
-static int posix_aio_init(void)
-{
-    struct sigaction act;
-    PosixAioState *s;
-    int fds[2];
-    struct qemu_paioinit ai;
-  
-    if (posix_aio_state)
-        return 0;
-
-    s = qemu_malloc(sizeof(PosixAioState));
-
-    sigfillset(&act.sa_mask);
-    act.sa_flags = 0; /* do not restart syscalls to interrupt select() */
-    act.sa_handler = aio_signal_handler;
-    sigaction(SIGUSR2, &act, NULL);
-
-    s->first_aio = NULL;
-    if (pipe(fds) == -1) {
-        fprintf(stderr, "failed to create pipe\n");
-        return -errno;
-    }
-
-    s->rfd = fds[0];
-    s->wfd = fds[1];
-
-    fcntl(s->rfd, F_SETFL, O_NONBLOCK);
-    fcntl(s->wfd, F_SETFL, O_NONBLOCK);
-
-    qemu_aio_set_fd_handler(s->rfd, posix_aio_read, NULL, posix_aio_flush, s);
-
-    memset(&ai, 0, sizeof(ai));
-    ai.aio_threads = 64;
-    ai.aio_num = 64;
-    qemu_paio_init(&ai);
-
-    posix_aio_state = s;
-
-    return 0;
-}
-
-static void raw_aio_remove(RawAIOCB *acb)
+/*
+ * Check if all memory in this vector is sector aligned.
+ */
+static int qiov_is_aligned(QEMUIOVector *qiov)
 {
-    RawAIOCB **pacb;
+    int i;
 
-    /* remove the callback from the queue */
-    pacb = &posix_aio_state->first_aio;
-    for(;;) {
-        if (*pacb == NULL) {
-            fprintf(stderr, "raw_aio_remove: aio request not found!\n");
-            break;
-        } else if (*pacb == acb) {
-            *pacb = acb->next;
-            qemu_aio_release(acb);
-            break;
+    for (i = 0; i < qiov->niov; i++) {
+        if ((uintptr_t) qiov->iov[i].iov_base % 512) {
+            return 0;
         }
-        pacb = &(*pacb)->next;
     }
-}
 
-static void raw_aio_cancel(BlockDriverAIOCB *blockacb)
-{
-    int ret;
-    RawAIOCB *acb = (RawAIOCB *)blockacb;
-
-    ret = qemu_paio_cancel(acb->aiocb.aio_fildes, &acb->aiocb);
-    if (ret == QEMU_PAIO_NOTCANCELED) {
-        /* fail safe: if the aio could not be canceled, we wait for
-           it */
-        while (qemu_paio_error(&acb->aiocb) == EINPROGRESS);
-    }
-
-    raw_aio_remove(acb);
+    return 1;
 }
 
-static AIOPool raw_aio_pool = {
-    .aiocb_size         = sizeof(RawAIOCB),
-    .cancel             = raw_aio_cancel,
-};
-
-static RawAIOCB *raw_aio_setup(BlockDriverState *bs, int64_t sector_num,
-        QEMUIOVector *qiov, int nb_sectors,
-        BlockDriverCompletionFunc *cb, void *opaque)
+static BlockDriverAIOCB *raw_aio_submit(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque, int type)
 {
     BDRVRawState *s = bs->opaque;
-    RawAIOCB *acb;
 
     if (fd_open(bs) < 0)
         return NULL;
 
-    acb = qemu_aio_get(&raw_aio_pool, bs, cb, opaque);
-    if (!acb)
-        return NULL;
-    acb->aiocb.aio_fildes = s->fd;
-    acb->aiocb.ev_signo = SIGUSR2;
-    acb->aiocb.aio_iov = qiov->iov;
-    acb->aiocb.aio_niov = qiov->niov;
-    acb->aiocb.aio_nbytes = nb_sectors * 512;
-    acb->aiocb.aio_offset = sector_num * 512;
-    acb->aiocb.aio_flags = 0;
-
     /*
      * If O_DIRECT is used the buffer needs to be aligned on a sector
-     * boundary. Tell the low level code to ensure that in case it's
-     * not done yet.
+     * boundary.  Check if this is the case or telll the low-level
+     * driver that it needs to copy the buffer.
      */
-    if (s->aligned_buf)
-        acb->aiocb.aio_flags |= QEMU_AIO_SECTOR_ALIGNED;
+    if (s->aligned_buf && !qiov_is_aligned(qiov)) {
+        type |= QEMU_AIO_MISALIGNED;
+    }
 
-    acb->next = posix_aio_state->first_aio;
-    posix_aio_state->first_aio = acb;
-    return acb;
+    return paio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov, nb_sectors,
+                       cb, opaque, type);
 }
 
 static BlockDriverAIOCB *raw_aio_readv(BlockDriverState *bs,
         int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
         BlockDriverCompletionFunc *cb, void *opaque)
 {
-    RawAIOCB *acb;
-
-    acb = raw_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque);
-    if (!acb)
-        return NULL;
-    if (qemu_paio_read(&acb->aiocb) < 0) {
-        raw_aio_remove(acb);
-        return NULL;
-    }
-    return &acb->common;
+    return raw_aio_submit(bs, sector_num, qiov, nb_sectors,
+                          cb, opaque, QEMU_AIO_READ);
 }
 
 static BlockDriverAIOCB *raw_aio_writev(BlockDriverState *bs,
         int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
         BlockDriverCompletionFunc *cb, void *opaque)
 {
-    RawAIOCB *acb;
-
-    acb = raw_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque);
-    if (!acb)
-        return NULL;
-    if (qemu_paio_write(&acb->aiocb) < 0) {
-        raw_aio_remove(acb);
-        return NULL;
-    }
-    return &acb->common;
+    return raw_aio_submit(bs, sector_num, qiov, nb_sectors,
+                          cb, opaque, QEMU_AIO_WRITE);
 }
 
 static void raw_close(BlockDriverState *bs)
@@ -1085,30 +910,10 @@ static BlockDriverAIOCB *hdev_aio_ioctl(
         BlockDriverCompletionFunc *cb, void *opaque)
 {
     BDRVRawState *s = bs->opaque;
-    RawAIOCB *acb;
 
     if (fd_open(bs) < 0)
         return NULL;
-
-    acb = qemu_aio_get(&raw_aio_pool, bs, cb, opaque);
-    if (!acb)
-        return NULL;
-    acb->aiocb.aio_fildes = s->fd;
-    acb->aiocb.ev_signo = SIGUSR2;
-    acb->aiocb.aio_offset = 0;
-    acb->aiocb.aio_flags = 0;
-
-    acb->next = posix_aio_state->first_aio;
-    posix_aio_state->first_aio = acb;
-
-    acb->aiocb.aio_ioctl_buf = buf;
-    acb->aiocb.aio_ioctl_cmd = req;
-    if (qemu_paio_ioctl(&acb->aiocb) < 0) {
-        raw_aio_remove(acb);
-        return NULL;
-    }
-
-    return &acb->common;
+    return paio_ioctl(bs, s->fd, req, buf, cb, opaque);
 }
 
 #elif defined(__FreeBSD__)
@@ -1189,8 +994,6 @@ static int floppy_open(BlockDriverState 
     BDRVRawState *s = bs->opaque;
     int ret;
 
-    posix_aio_init();
-
     s->type = FTYPE_FD;
 
     /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
Index: qemu-kvm/posix-aio-compat.c
===================================================================
--- qemu-kvm.orig/posix-aio-compat.c
+++ qemu-kvm/posix-aio-compat.c
@@ -12,17 +12,49 @@
  */
 
 #include <sys/ioctl.h>
+#include <sys/types.h>
 #include <pthread.h>
 #include <unistd.h>
 #include <errno.h>
 #include <time.h>
+#include <signal.h>
 #include <string.h>
 #include <stdlib.h>
 #include <stdio.h>
+
+#include "sys-queue.h"
 #include "osdep.h"
 #include "qemu-common.h"
+#include "block_int.h"
+
+#include "block/raw-posix-aio.h"
+
+
+struct qemu_paiocb {
+    BlockDriverAIOCB common;
+    int aio_fildes;
+    union {
+        struct iovec *aio_iov;
+	void *aio_ioctl_buf;
+    };
+    int aio_niov;
+    size_t aio_nbytes;
+#define aio_ioctl_cmd   aio_nbytes /* for QEMU_AIO_IOCTL */
+    int ev_signo;
+    off_t aio_offset;
+
+    TAILQ_ENTRY(qemu_paiocb) node;
+    int aio_type;
+    ssize_t ret;
+    int active;
+    struct qemu_paiocb *next;
+};
+
+typedef struct PosixAioState {
+    int rfd, wfd;
+    struct qemu_paiocb *first_aio;
+} PosixAioState;
 
-#include "posix-aio-compat.h"
 
 static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
 static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
@@ -132,30 +164,13 @@ qemu_pwritev(int fd, const struct iovec 
 
 #endif
 
-/*
- * Check if we need to copy the data in the aiocb into a new
- * properly aligned buffer.
- */
-static int aiocb_needs_copy(struct qemu_paiocb *aiocb)
-{
-    if (aiocb->aio_flags & QEMU_AIO_SECTOR_ALIGNED) {
-        int i;
-
-        for (i = 0; i < aiocb->aio_niov; i++)
-            if ((uintptr_t) aiocb->aio_iov[i].iov_base % 512)
-                return 1;
-    }
-
-    return 0;
-}
-
 static size_t handle_aiocb_rw_vector(struct qemu_paiocb *aiocb)
 {
     size_t offset = 0;
     ssize_t len;
 
     do {
-        if (aiocb->aio_type == QEMU_PAIO_WRITE)
+        if (aiocb->aio_type & QEMU_AIO_WRITE)
             len = qemu_pwritev(aiocb->aio_fildes,
                                aiocb->aio_iov,
                                aiocb->aio_niov,
@@ -178,7 +193,7 @@ static size_t handle_aiocb_rw_linear(str
     size_t len;
 
     while (offset < aiocb->aio_nbytes) {
-         if (aiocb->aio_type == QEMU_PAIO_WRITE)
+         if (aiocb->aio_type & QEMU_AIO_WRITE)
              len = pwrite(aiocb->aio_fildes,
                           (const char *)buf + offset,
                           aiocb->aio_nbytes - offset,
@@ -208,7 +223,7 @@ static size_t handle_aiocb_rw(struct qem
     size_t nbytes;
     char *buf;
 
-    if (!aiocb_needs_copy(aiocb)) {
+    if (!(aiocb->aio_type & QEMU_AIO_MISALIGNED)) {
         /*
          * If there is just a single buffer, and it is properly aligned
          * we can just use plain pread/pwrite without any problems.
@@ -243,7 +258,7 @@ static size_t handle_aiocb_rw(struct qem
      * a single aligned buffer.
      */
     buf = qemu_memalign(512, aiocb->aio_nbytes);
-    if (aiocb->aio_type == QEMU_PAIO_WRITE) {
+    if (aiocb->aio_type & QEMU_AIO_WRITE) {
         char *p = buf;
         int i;
 
@@ -254,7 +269,7 @@ static size_t handle_aiocb_rw(struct qem
     }
 
     nbytes = handle_aiocb_rw_linear(aiocb, buf);
-    if (aiocb->aio_type != QEMU_PAIO_WRITE) {
+    if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
         char *p = buf;
         size_t count = aiocb->aio_nbytes, copy;
         int i;
@@ -310,12 +325,12 @@ static void *aio_thread(void *unused)
         idle_threads--;
         mutex_unlock(&lock);
 
-        switch (aiocb->aio_type) {
-        case QEMU_PAIO_READ:
-        case QEMU_PAIO_WRITE:
+        switch (aiocb->aio_type & QEMU_AIO_TYPE_MASK) {
+        case QEMU_AIO_READ:
+        case QEMU_AIO_WRITE:
 		ret = handle_aiocb_rw(aiocb);
 		break;
-        case QEMU_PAIO_IOCTL:
+        case QEMU_AIO_IOCTL:
 		ret = handle_aiocb_ioctl(aiocb);
 		break;
 	default:
@@ -346,24 +361,8 @@ static void spawn_thread(void)
     thread_create(&thread_id, &attr, aio_thread, NULL);
 }
 
-int qemu_paio_init(struct qemu_paioinit *aioinit)
-{
-    int ret;
-
-    ret = pthread_attr_init(&attr);
-    if (ret) die2(ret, "pthread_attr_init");
-
-    ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
-    if (ret) die2(ret, "pthread_attr_setdetachstate");
-
-    TAILQ_INIT(&request_list);
-
-    return 0;
-}
-
-static int qemu_paio_submit(struct qemu_paiocb *aiocb, int type)
+static void qemu_paio_submit(struct qemu_paiocb *aiocb)
 {
-    aiocb->aio_type = type;
     aiocb->ret = -EINPROGRESS;
     aiocb->active = 0;
     mutex_lock(&lock);
@@ -372,26 +371,9 @@ static int qemu_paio_submit(struct qemu_
     TAILQ_INSERT_TAIL(&request_list, aiocb, node);
     mutex_unlock(&lock);
     cond_signal(&cond);
-
-    return 0;
-}
-
-int qemu_paio_read(struct qemu_paiocb *aiocb)
-{
-    return qemu_paio_submit(aiocb, QEMU_PAIO_READ);
-}
-
-int qemu_paio_write(struct qemu_paiocb *aiocb)
-{
-    return qemu_paio_submit(aiocb, QEMU_PAIO_WRITE);
 }
 
-int qemu_paio_ioctl(struct qemu_paiocb *aiocb)
-{
-    return qemu_paio_submit(aiocb, QEMU_PAIO_IOCTL);
-}
-
-ssize_t qemu_paio_return(struct qemu_paiocb *aiocb)
+static ssize_t qemu_paio_return(struct qemu_paiocb *aiocb)
 {
     ssize_t ret;
 
@@ -402,7 +384,7 @@ ssize_t qemu_paio_return(struct qemu_pai
     return ret;
 }
 
-int qemu_paio_error(struct qemu_paiocb *aiocb)
+static int qemu_paio_error(struct qemu_paiocb *aiocb)
 {
     ssize_t ret = qemu_paio_return(aiocb);
 
@@ -414,20 +396,217 @@ int qemu_paio_error(struct qemu_paiocb *
     return ret;
 }
 
-int qemu_paio_cancel(int fd, struct qemu_paiocb *aiocb)
+static void posix_aio_read(void *opaque)
 {
+    PosixAioState *s = opaque;
+    struct qemu_paiocb *acb, **pacb;
     int ret;
+    ssize_t len;
+
+    /* read all bytes from signal pipe */
+    for (;;) {
+        char bytes[16];
+
+        len = read(s->rfd, bytes, sizeof(bytes));
+        if (len == -1 && errno == EINTR)
+            continue; /* try again */
+        if (len == sizeof(bytes))
+            continue; /* more to read */
+        break;
+    }
+
+    for(;;) {
+        pacb = &s->first_aio;
+        for(;;) {
+            acb = *pacb;
+            if (!acb)
+                goto the_end;
+            ret = qemu_paio_error(acb);
+            if (ret == ECANCELED) {
+                /* remove the request */
+                *pacb = acb->next;
+                qemu_aio_release(acb);
+            } else if (ret != EINPROGRESS) {
+                /* end of aio */
+                if (ret == 0) {
+                    ret = qemu_paio_return(acb);
+                    if (ret == acb->aio_nbytes)
+                        ret = 0;
+                    else
+                        ret = -EINVAL;
+                } else {
+                    ret = -ret;
+                }
+                /* remove the request */
+                *pacb = acb->next;
+                /* call the callback */
+                acb->common.cb(acb->common.opaque, ret);
+                qemu_aio_release(acb);
+                break;
+            } else {
+                pacb = &acb->next;
+            }
+        }
+    }
+ the_end: ;
+}
+
+static int posix_aio_flush(void *opaque)
+{
+    PosixAioState *s = opaque;
+    return !!s->first_aio;
+}
+
+static PosixAioState *posix_aio_state;
+
+static void aio_signal_handler(int signum)
+{
+    if (posix_aio_state) {
+        char byte = 0;
+
+        write(posix_aio_state->wfd, &byte, sizeof(byte));
+    }
+
+    qemu_service_io();
+}
+
+static void paio_remove(struct qemu_paiocb *acb)
+{
+    struct qemu_paiocb **pacb;
+
+    /* remove the callback from the queue */
+    pacb = &posix_aio_state->first_aio;
+    for(;;) {
+        if (*pacb == NULL) {
+            fprintf(stderr, "paio_remove: aio request not found!\n");
+            break;
+        } else if (*pacb == acb) {
+            *pacb = acb->next;
+            qemu_aio_release(acb);
+            break;
+        }
+        pacb = &(*pacb)->next;
+    }
+}
+
+static void paio_cancel(BlockDriverAIOCB *blockacb)
+{
+    struct qemu_paiocb *acb = (struct qemu_paiocb *)blockacb;
+    int active = 0;
 
     mutex_lock(&lock);
-    if (!aiocb->active) {
-        TAILQ_REMOVE(&request_list, aiocb, node);
-        aiocb->ret = -ECANCELED;
-        ret = QEMU_PAIO_CANCELED;
-    } else if (aiocb->ret == -EINPROGRESS)
-        ret = QEMU_PAIO_NOTCANCELED;
-    else
-        ret = QEMU_PAIO_ALLDONE;
+    if (!acb->active) {
+        TAILQ_REMOVE(&request_list, acb, node);
+        acb->ret = -ECANCELED;
+    } else if (acb->ret == -EINPROGRESS) {
+        active = 1;
+    }
     mutex_unlock(&lock);
 
-    return ret;
+    if (active) {
+        /* fail safe: if the aio could not be canceled, we wait for
+           it */
+        while (qemu_paio_error(acb) == EINPROGRESS)
+            ;
+    }
+
+    paio_remove(acb);
+}
+
+static AIOPool raw_aio_pool = {
+    .aiocb_size         = sizeof(struct qemu_paiocb),
+    .cancel             = paio_cancel,
+};
+
+BlockDriverAIOCB *paio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque, int type)
+{
+    struct qemu_paiocb *acb;
+
+    acb = qemu_aio_get(&raw_aio_pool, bs, cb, opaque);
+    if (!acb)
+        return NULL;
+    acb->aio_type = type;
+    acb->aio_fildes = fd;
+    acb->ev_signo = SIGUSR2;
+    acb->aio_iov = qiov->iov;
+    acb->aio_niov = qiov->niov;
+    acb->aio_nbytes = nb_sectors * 512;
+    acb->aio_offset = sector_num * 512;
+
+    acb->next = posix_aio_state->first_aio;
+    posix_aio_state->first_aio = acb;
+
+    qemu_paio_submit(acb);
+    return &acb->common;
+}
+
+BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
+        unsigned long int req, void *buf,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    struct qemu_paiocb *acb;
+
+    acb = qemu_aio_get(&raw_aio_pool, bs, cb, opaque);
+    if (!acb)
+        return NULL;
+    acb->aio_type = QEMU_AIO_IOCTL;
+    acb->aio_fildes = fd;
+    acb->ev_signo = SIGUSR2;
+    acb->aio_offset = 0;
+    acb->aio_ioctl_buf = buf;
+    acb->aio_ioctl_cmd = req;
+
+    acb->next = posix_aio_state->first_aio;
+    posix_aio_state->first_aio = acb;
+
+    qemu_paio_submit(acb);
+    return &acb->common;
+}
+
+void *paio_init(void)
+{
+    struct sigaction act;
+    PosixAioState *s;
+    int fds[2];
+    int ret;
+
+    if (posix_aio_state)
+        return posix_aio_state;
+
+    s = qemu_malloc(sizeof(PosixAioState));
+
+    sigfillset(&act.sa_mask);
+    act.sa_flags = 0; /* do not restart syscalls to interrupt select() */
+    act.sa_handler = aio_signal_handler;
+    sigaction(SIGUSR2, &act, NULL);
+
+    s->first_aio = NULL;
+    if (pipe(fds) == -1) {
+        fprintf(stderr, "failed to create pipe\n");
+        return NULL;
+    }
+
+    s->rfd = fds[0];
+    s->wfd = fds[1];
+
+    fcntl(s->rfd, F_SETFL, O_NONBLOCK);
+    fcntl(s->wfd, F_SETFL, O_NONBLOCK);
+
+    qemu_aio_set_fd_handler(s->rfd, posix_aio_read, NULL, posix_aio_flush, s);
+
+    ret = pthread_attr_init(&attr);
+    if (ret)
+        die2(ret, "pthread_attr_init");
+
+    ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
+    if (ret)
+        die2(ret, "pthread_attr_setdetachstate");
+
+    TAILQ_INIT(&request_list);
+
+    posix_aio_state = s;
+
+    return posix_aio_state;
 }
Index: qemu-kvm/posix-aio-compat.h
===================================================================
--- qemu-kvm.orig/posix-aio-compat.h
+++ /dev/null
@@ -1,68 +0,0 @@
-/*
- * QEMU posix-aio emulation
- *
- * Copyright IBM, Corp. 2008
- *
- * Authors:
- *  Anthony Liguori   <aliguori@us.ibm.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2.  See
- * the COPYING file in the top-level directory.
- *
- */
-
-#ifndef QEMU_POSIX_AIO_COMPAT_H
-#define QEMU_POSIX_AIO_COMPAT_H
-
-#include <sys/types.h>
-#include <unistd.h>
-#include <signal.h>
-
-#include "sys-queue.h"
-
-#define QEMU_PAIO_CANCELED     0x01
-#define QEMU_PAIO_NOTCANCELED  0x02
-#define QEMU_PAIO_ALLDONE      0x03
-
-struct qemu_paiocb
-{
-    int aio_fildes;
-    union {
-        struct iovec *aio_iov;
-	void *aio_ioctl_buf;
-    };
-    int aio_niov;
-    size_t aio_nbytes;
-#define aio_ioctl_cmd   aio_nbytes /* for QEMU_PAIO_IOCTL */
-    int ev_signo;
-    off_t aio_offset;
-    unsigned aio_flags;
-/* 512 byte alignment required for buffer, offset and length */
-#define QEMU_AIO_SECTOR_ALIGNED	0x01
-
-    /* private */
-    TAILQ_ENTRY(qemu_paiocb) node;
-    int aio_type;
-#define QEMU_PAIO_READ         0x01
-#define QEMU_PAIO_WRITE        0x02
-#define QEMU_PAIO_IOCTL        0x03
-    ssize_t ret;
-    int active;
-};
-
-struct qemu_paioinit
-{
-    unsigned int aio_threads;
-    unsigned int aio_num;
-    unsigned int aio_idle_time;
-};
-
-int qemu_paio_init(struct qemu_paioinit *aioinit);
-int qemu_paio_read(struct qemu_paiocb *aiocb);
-int qemu_paio_write(struct qemu_paiocb *aiocb);
-int qemu_paio_ioctl(struct qemu_paiocb *aiocb);
-int qemu_paio_error(struct qemu_paiocb *aiocb);
-ssize_t qemu_paio_return(struct qemu_paiocb *aiocb);
-int qemu_paio_cancel(int fd, struct qemu_paiocb *aiocb);
-
-#endif
Index: qemu-kvm/block/raw-posix-aio.h
===================================================================
--- /dev/null
+++ qemu-kvm/block/raw-posix-aio.h
@@ -0,0 +1,36 @@
+/*
+ * QEMU Posix block I/O backend AIO support
+ *
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   <aliguori@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+#ifndef QEMU_RAW_POSIX_AIO_H
+#define QEMU_RAW_POSIX_AIO_H
+
+/* AIO request types */
+#define QEMU_AIO_READ         0x0001
+#define QEMU_AIO_WRITE        0x0002
+#define QEMU_AIO_IOCTL        0x0004
+#define QEMU_AIO_TYPE_MASK \
+	(QEMU_AIO_READ|QEMU_AIO_WRITE|QEMU_AIO_IOCTL)
+
+/* AIO flags */
+#define QEMU_AIO_MISALIGNED   0x1000
+
+
+/* posix-aio-compat.c - thread pool based implementation */
+void *paio_init(void);
+BlockDriverAIOCB *paio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque, int type);
+BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
+        unsigned long int req, void *buf,
+        BlockDriverCompletionFunc *cb, void *opaque);
+
+#endif /* QEMU_RAW_POSIX_AIO_H */


* [Qemu-devel] [PATCH 2/2] raw-posix: add Linux native AIO support
  2009-08-20 14:58 [Qemu-devel] [PATCH 0/2] native Linux AIO support revisited Christoph Hellwig
  2009-08-20 14:58 ` [Qemu-devel] [PATCH 1/2] raw-posix: refactor AIO support Christoph Hellwig
@ 2009-08-20 14:58 ` Christoph Hellwig
  2009-08-21  9:53   ` Avi Kivity
  2009-08-20 19:06 ` [Qemu-devel] [PATCH 0/2] native Linux AIO support revisited Jamie Lokier
  2009-08-21  7:40 ` Avi Kivity
  3 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2009-08-20 14:58 UTC (permalink / raw)
  To: qemu-devel


Now that we do have a nicer interface to work against we can add Linux native
AIO support.  It's an extremely thin layer just setting up an iocb for
the io_submit system call in the submission path, and registering an
eventfd with the qemu poll handler to complete the iocbs directly
from there.
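
For readers unfamiliar with the kernel interface, here is a standalone sketch of filling in an iocb and passing it to io_submit.  It is not the code from this patch: it uses raw system calls rather than libaio, skips O_DIRECT so that it runs on any filesystem, and blocks in io_getevents instead of being notified through an eventfd.  All helper names are made up for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create an unlinked scratch file to do I/O against. */
static int open_scratch_file(void)
{
    char path[] = "/tmp/aio-sketch-XXXXXX";
    int fd = mkstemp(path);

    if (fd >= 0) {
        unlink(path);
    }
    return fd;
}

/* Submit a single read or write through kernel AIO and wait for it.
 * opcode is IOCB_CMD_PREAD or IOCB_CMD_PWRITE.  Returns the byte count
 * reported by the completion event, or -1 on failure. */
static long aio_rw_once(int fd, int opcode, void *buf, size_t len,
                        long long offset)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long res = -1;

    assert(buf != NULL && len > 0);

    /* A context able to hold one in-flight request. */
    if (syscall(SYS_io_setup, 1, &ctx) < 0) {
        return -1;
    }

    memset(&cb, 0, sizeof(cb));
    cb.aio_lio_opcode = opcode;
    cb.aio_fildes = fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    /* Queue the request, then block until its completion event arrives. */
    if (syscall(SYS_io_submit, ctx, 1L, list) == 1 &&
        syscall(SYS_io_getevents, ctx, 1L, 1L, &ev, NULL) == 1) {
        res = (long)ev.res;
    }

    syscall(SYS_io_destroy, ctx);
    return res;
}
```

A real user would keep the context around for the lifetime of the drive and reap events from a poll handler, rather than setting up and tearing down a context per request as this sketch does.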

This started out based on Anthony's earlier AIO patch, but after
an estimated 42,000 rewrites and just as many build system changes
there's not much left of it.

To enable native kernel aio use the aio=native sub-command on the
drive command line.  I have also added an option to qemu-io to
test the aio support without needing a guest.


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/Makefile
===================================================================
--- qemu.orig/Makefile	2009-08-19 22:49:08.789354196 -0300
+++ qemu/Makefile	2009-08-19 22:51:25.293352541 -0300
@@ -56,6 +56,7 @@ recurse-all: $(SUBDIR_RULES) $(ROMSUBDIR
 block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o
 block-obj-y += nbd.o block.o aio.o aes.o
 block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
+block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 
 block-nested-y += cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
Index: qemu/block/raw-posix.c
===================================================================
--- qemu.orig/block/raw-posix.c	2009-08-19 22:49:08.793352540 -0300
+++ qemu/block/raw-posix.c	2009-08-19 23:00:21.157402768 -0300
@@ -115,6 +115,7 @@ typedef struct BDRVRawState {
     int fd_got_error;
     int fd_media_changed;
 #endif
+    int use_aio;
     uint8_t* aligned_buf;
 } BDRVRawState;
 
@@ -159,6 +160,7 @@ static int raw_open_common(BlockDriverSt
     }
     s->fd = fd;
     s->aligned_buf = NULL;
+
     if ((bdrv_flags & BDRV_O_NOCACHE)) {
         s->aligned_buf = qemu_blockalign(bs, ALIGNED_BUFFER_SIZE);
         if (s->aligned_buf == NULL) {
@@ -166,9 +168,22 @@ static int raw_open_common(BlockDriverSt
         }
     }
 
-    s->aio_ctx = paio_init();
-    if (!s->aio_ctx) {
-        goto out_free_buf;
+#ifdef CONFIG_LINUX_AIO
+    if ((bdrv_flags & (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) ==
+                      (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) {
+        s->aio_ctx = laio_init();
+        if (!s->aio_ctx) {
+            goto out_free_buf;
+        }
+        s->use_aio = 1;
+    } else
+#endif
+    {
+        s->aio_ctx = paio_init();
+        if (!s->aio_ctx) {
+            goto out_free_buf;
+        }
+        s->use_aio = 0;
     }
 
     return 0;
@@ -524,8 +539,13 @@ static BlockDriverAIOCB *raw_aio_submit(
      * boundary.  Check if this is the case or telll the low-level
      * driver that it needs to copy the buffer.
      */
-    if (s->aligned_buf && !qiov_is_aligned(qiov)) {
-        type |= QEMU_AIO_MISALIGNED;
+    if (s->aligned_buf) {
+        if (!qiov_is_aligned(qiov)) {
+            type |= QEMU_AIO_MISALIGNED;
+        } else if (s->use_aio) {
+            return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
+	                       nb_sectors, cb, opaque, type);
+        }
     }
 
     return paio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov, nb_sectors,
Index: qemu/configure
===================================================================
--- qemu.orig/configure	2009-08-19 22:49:08.801352719 -0300
+++ qemu/configure	2009-08-19 22:51:25.305393736 -0300
@@ -197,6 +197,7 @@ build_docs="yes"
 uname_release=""
 curses="yes"
 curl="yes"
+linux_aio="yes"
 io_thread="no"
 nptl="yes"
 mixemu="no"
@@ -499,6 +500,8 @@ for opt do
   ;;
   --enable-mixemu) mixemu="yes"
   ;;
+  --disable-linux-aio) linux_aio="no"
+  ;;
   --enable-io-thread) io_thread="yes"
   ;;
   --disable-blobs) blobs="no"
@@ -636,6 +639,7 @@ echo "  --oss-lib                path to
 echo "  --enable-uname-release=R Return R for uname -r in usermode emulation"
 echo "  --sparc_cpu=V            Build qemu for Sparc architecture v7, v8, v8plus, v8plusa, v9"
 echo "  --disable-vde            disable support for vde network"
+echo "  --disable-linux-aio      disable Linux AIO support"
 echo "  --enable-io-thread       enable IO thread"
 echo "  --disable-blobs          disable installing provided firmware blobs"
 echo "  --kerneldir=PATH         look for kernel includes in PATH"
@@ -1197,6 +1201,23 @@ if test "$pthread" = no; then
 fi
 
 ##########################################
+# linux-aio probe
+AIOLIBS=""
+
+if test "$linux_aio" = "yes" ; then
+    linux_aio=no
+    cat > $TMPC <<EOF
+#include <libaio.h>
+#include <sys/eventfd.h>
+int main(void) { io_setup(0, NULL); io_set_eventfd(NULL, 0); eventfd(0, 0); return 0; }
+EOF
+    if compile_prog "" "-laio" ; then
+        linux_aio=yes
+        LIBS="$LIBS -laio"
+    fi
+fi
+
+##########################################
 # iovec probe
 cat > $TMPC <<EOF
 #include <sys/types.h>
@@ -1527,6 +1548,7 @@ echo "NPTL support      $nptl"
 echo "GUEST_BASE        $guest_base"
 echo "vde support       $vde"
 echo "IO thread         $io_thread"
+echo "Linux AIO support $linux_aio"
 echo "Install blobs     $blobs"
 echo -e "KVM support       $kvm"
 echo "fdt support       $fdt"
@@ -1700,6 +1722,9 @@ fi
 if test "$io_thread" = "yes" ; then
   echo "CONFIG_IOTHREAD=y" >> $config_host_mak
 fi
+if test "$linux_aio" = "yes" ; then
+  echo "CONFIG_LINUX_AIO=y" >> $config_host_mak
+fi
 if test "$blobs" = "yes" ; then
   echo "INSTALL_BLOBS=yes" >> $config_host_mak
 fi
Index: qemu/linux-aio.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ qemu/linux-aio.c	2009-08-20 10:54:10.924375300 -0300
@@ -0,0 +1,204 @@
+/*
+ * Linux native AIO support.
+ *
+ * Copyright (C) 2009 IBM, Corp.
+ * Copyright (C) 2009 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu-common.h"
+#include "qemu-aio.h"
+#include "block_int.h"
+#include "block/raw-posix-aio.h"
+
+#include <sys/eventfd.h>
+#include <libaio.h>
+
+/*
+ * Queue size (per-device).
+ *
+ * XXX: eventually we need to communicate this to the guest and/or make it
+ *      tunable by the guest.  If we get more outstanding requests at a time
+ *      than this we will get EAGAIN from io_submit which is communicated to
+ *      the guest as an I/O error.
+ */
+#define MAX_EVENTS 128
+
+struct qemu_laiocb {
+    BlockDriverAIOCB common;
+    struct qemu_laio_state *ctx;
+    struct iocb iocb;
+    ssize_t ret;
+    size_t nbytes;
+};
+
+struct qemu_laio_state {
+    io_context_t ctx;
+    int efd;
+    int count;
+};
+
+static inline ssize_t io_event_ret(struct io_event *ev)
+{
+    return (ssize_t)(((uint64_t)ev->res2 << 32) | ev->res);
+}
+
+static void qemu_laio_completion_cb(void *opaque)
+{
+    struct qemu_laio_state *s = opaque;
+
+    while (1) {
+        struct io_event events[MAX_EVENTS];
+        uint64_t val;
+        ssize_t ret;
+        struct timespec ts = { 0 };
+        int nevents, i;
+
+        do {
+            ret = read(s->efd, &val, sizeof(val));
+        } while (ret == -1 && errno == EINTR);
+
+        if (ret == -1 && errno == EAGAIN)
+            break;
+
+        if (ret != 8)
+            break;
+
+        do {
+            nevents = io_getevents(s->ctx, val, MAX_EVENTS, events, &ts);
+        } while (nevents == -EINTR);
+
+        for (i = 0; i < nevents; i++) {
+            struct iocb *iocb = events[i].obj;
+            struct qemu_laiocb *laiocb =
+                    container_of(iocb, struct qemu_laiocb, iocb);
+
+            s->count--;
+
+            ret = laiocb->ret = io_event_ret(&events[i]);
+            if (ret != -ECANCELED) {
+                if (ret == laiocb->nbytes)
+                    ret = 0;
+                else if (ret >= 0)
+                    ret = -EINVAL;
+
+                laiocb->common.cb(laiocb->common.opaque, ret);
+            }
+
+            qemu_aio_release(laiocb);
+        }
+    }
+}
+
+static int qemu_laio_flush_cb(void *opaque)
+{
+    struct qemu_laio_state *s = opaque;
+
+    return (s->count > 0) ? 1 : 0;
+}
+
+static void laio_cancel(BlockDriverAIOCB *blockacb)
+{
+    struct qemu_laiocb *laiocb = (struct qemu_laiocb *)blockacb;
+    struct io_event event;
+    int ret;
+
+    if (laiocb->ret != -EINPROGRESS)
+        return;
+
+    /*
+     * Note that as of Linux 2.6.31 neither the block device code nor any
+     * filesystem implements cancellation of AIO request.
+     * Thus the polling loop below is the normal code path.
+     */
+    ret = io_cancel(laiocb->ctx->ctx, &laiocb->iocb, &event);
+    if (ret == 0) {
+        laiocb->ret = -ECANCELED;
+        return;
+    }
+
+    /*
+     * We have to wait for the iocb to finish.
+     *
+     * The only way to get the iocb status update is by polling the io context.
+     * We might be able to do this slightly more optimal by removing the
+     * O_NONBLOCK flag.
+     */
+    while (laiocb->ret == -EINPROGRESS)
+        qemu_laio_completion_cb(laiocb->ctx);
+}
+
+static AIOPool laio_pool = {
+    .aiocb_size         = sizeof(struct qemu_laiocb),
+    .cancel             = laio_cancel,
+};
+
+BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque, int type)
+{
+    struct qemu_laio_state *s = aio_ctx;
+    struct qemu_laiocb *laiocb;
+    struct iocb *iocbs;
+    off_t offset = sector_num * 512;
+
+    laiocb = qemu_aio_get(&laio_pool, bs, cb, opaque);
+    if (!laiocb)
+        return NULL;
+    laiocb->nbytes = nb_sectors * 512;
+    laiocb->ctx = s;
+    laiocb->ret = -EINPROGRESS;
+
+    iocbs = &laiocb->iocb;
+
+    switch (type) {
+    case QEMU_AIO_WRITE:
+        io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
+        break;
+    case QEMU_AIO_READ:
+        io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
+        break;
+    default:
+        fprintf(stderr, "%s: invalid AIO request type 0x%x.\n",
+                        __func__, type);
+        goto out_free_aiocb;
+    }
+    io_set_eventfd(&laiocb->iocb, s->efd);
+    s->count++;
+
+    if (io_submit(s->ctx, 1, &iocbs) < 0)
+        goto out_dec_count;
+    return &laiocb->common;
+
+out_dec_count:
+    s->count--;
+out_free_aiocb:
+    qemu_aio_release(laiocb);
+    return NULL;
+}
+
+void *laio_init(void)
+{
+    struct qemu_laio_state *s;
+
+    s = qemu_mallocz(sizeof(*s));
+    s->efd = eventfd(0, 0);
+    if (s->efd == -1)
+        goto out_free_state;
+    fcntl(s->efd, F_SETFL, O_NONBLOCK);
+
+    if (io_setup(MAX_EVENTS, &s->ctx) != 0)
+        goto out_close_efd;
+
+    qemu_aio_set_fd_handler(s->efd, qemu_laio_completion_cb,
+                            NULL, qemu_laio_flush_cb, s);
+
+    return s;
+
+out_close_efd:
+    close(s->efd);
+out_free_state:
+    qemu_free(s);
+    return NULL;
+}
Index: qemu/block/raw-posix-aio.h
===================================================================
--- qemu.orig/block/raw-posix-aio.h	2009-08-19 22:49:08.797353398 -0300
+++ qemu/block/raw-posix-aio.h	2009-08-19 22:51:25.313401597 -0300
@@ -33,4 +33,10 @@ BlockDriverAIOCB *paio_ioctl(BlockDriver
         unsigned long int req, void *buf,
         BlockDriverCompletionFunc *cb, void *opaque);
 
+/* linux-aio.c - Linux native implementation */
+void *laio_init(void);
+BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque, int type);
+
 #endif /* QEMU_RAW_POSIX_AIO_H */
Index: qemu/block.h
===================================================================
--- qemu.orig/block.h	2009-08-19 22:49:08.809352828 -0300
+++ qemu/block.h	2009-08-19 22:51:25.317384576 -0300
@@ -37,6 +37,7 @@ typedef struct QEMUSnapshotInfo {
                                      bdrv_file_open()) */
 #define BDRV_O_NOCACHE     0x0020 /* do not use the host page cache */
 #define BDRV_O_CACHE_WB    0x0040 /* use write-back caching */
+#define BDRV_O_NATIVE_AIO  0x0080 /* use native AIO instead of the thread pool */
 
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB)
 
Index: qemu/qemu-options.hx
===================================================================
--- qemu.orig/qemu-options.hx	2009-08-19 22:49:08.817352727 -0300
+++ qemu/qemu-options.hx	2009-08-19 22:51:25.321383686 -0300
@@ -95,7 +95,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
     "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
     "       [,cache=writethrough|writeback|none][,format=f][,serial=s]\n"
-    "       [,addr=A][,id=name]\n"
+    "       [,addr=A][,id=name][,aio=threads|native]\n"
     "                use 'file' as a drive image\n")
 DEF("set", HAS_ARG, QEMU_OPTION_set,
     "-set group.id.arg=value\n"
@@ -128,6 +128,8 @@ These options have the same definition a
 @var{snapshot} is "on" or "off" and allows to enable snapshot for given drive (see @option{-snapshot}).
 @item cache=@var{cache}
 @var{cache} is "none", "writeback", or "writethrough" and controls how the host cache is used to access block data.
+@item aio=@var{aio}
+@var{aio} is "threads" or "native" and selects between pthread-based disk I/O and native Linux AIO.
 @item format=@var{format}
 Specify which disk @var{format} will be used rather than detecting
 the format.  Can be used to specifiy format=raw to avoid interpreting
Index: qemu/vl.c
===================================================================
--- qemu.orig/vl.c	2009-08-19 22:49:08.821354562 -0300
+++ qemu/vl.c	2009-08-19 22:51:25.325352976 -0300
@@ -1921,6 +1921,7 @@ DriveInfo *drive_init(QemuOpts *opts, vo
     int max_devs;
     int index;
     int cache;
+    int aio = 0;
     int bdrv_flags, onerror;
     const char *devaddr;
     DriveInfo *dinfo;
@@ -2054,6 +2055,19 @@ DriveInfo *drive_init(QemuOpts *opts, vo
         }
     }
 
+#ifdef CONFIG_LINUX_AIO
+    if ((buf = qemu_opt_get(opts, "aio")) != NULL) {
+        if (!strcmp(buf, "threads"))
+            aio = 0;
+        else if (!strcmp(buf, "native"))
+            aio = 1;
+        else {
+           fprintf(stderr, "qemu: invalid aio option\n");
+           return NULL;
+        }
+    }
+#endif
+
     if ((buf = qemu_opt_get(opts, "format")) != NULL) {
        if (strcmp(buf, "?") == 0) {
             fprintf(stderr, "qemu: Supported formats:");
@@ -2223,11 +2237,19 @@ DriveInfo *drive_init(QemuOpts *opts, vo
         bdrv_flags |= BDRV_O_NOCACHE;
     else if (cache == 2) /* write-back */
         bdrv_flags |= BDRV_O_CACHE_WB;
+
+    if (aio == 1) {
+        bdrv_flags |= BDRV_O_NATIVE_AIO;
+    } else {
+        bdrv_flags &= ~BDRV_O_NATIVE_AIO;
+    }
+
     if (bdrv_open2(dinfo->bdrv, file, bdrv_flags, drv) < 0) {
         fprintf(stderr, "qemu: could not open disk image %s\n",
                         file);
         return NULL;
     }
+
     if (bdrv_key_required(dinfo->bdrv))
         autostart = 0;
     *fatal_error = 0;
Index: qemu/qemu-config.c
===================================================================
--- qemu.orig/qemu-config.c	2009-08-19 22:49:08.825352416 -0300
+++ qemu/qemu-config.c	2009-08-19 22:51:25.333383955 -0300
@@ -53,6 +53,10 @@ QemuOptsList qemu_drive_opts = {
             .type = QEMU_OPT_STRING,
             .help = "host cache usage (none, writeback, writethrough)",
         },{
+            .name = "aio",
+            .type = QEMU_OPT_STRING,
+            .help = "host AIO implementation (threads, native)",
+        },{
             .name = "format",
             .type = QEMU_OPT_STRING,
             .help = "disk format (raw, qcow2, ...)",
Index: qemu/block.c
===================================================================
--- qemu.orig/block.c	2009-08-19 22:58:58.421381858 -0300
+++ qemu/block.c	2009-08-19 22:59:39.033439876 -0300
@@ -411,7 +411,8 @@ int bdrv_open2(BlockDriverState *bs, con
     /* Note: for compatibility, we open disk image files as RDWR, and
        RDONLY as fallback */
     if (!(flags & BDRV_O_FILE))
-        open_flags = BDRV_O_RDWR | (flags & BDRV_O_CACHE_MASK);
+        open_flags = BDRV_O_RDWR |
+		(flags & (BDRV_O_CACHE_MASK|BDRV_O_NATIVE_AIO));
     else
         open_flags = flags & ~(BDRV_O_FILE | BDRV_O_SNAPSHOT);
     ret = drv->bdrv_open(bs, filename, open_flags);
Index: qemu/qemu-io.c
===================================================================
--- qemu.orig/qemu-io.c	2009-08-20 10:41:09.047691604 -0300
+++ qemu/qemu-io.c	2009-08-20 10:57:29.753487097 -0300
@@ -1401,6 +1401,7 @@ static void usage(const char *name)
 "  -n, --nocache        disable host cache\n"
 "  -g, --growable       allow file to grow (only applies to protocols)\n"
 "  -m, --misalign       misalign allocations for O_DIRECT\n"
+"  -k, --native-aio     use kernel AIO implementation (on Linux only)\n"
 "  -h, --help           display this help and exit\n"
 "  -V, --version        output version information and exit\n"
 "\n",
@@ -1412,7 +1413,7 @@ int main(int argc, char **argv)
 {
 	int readonly = 0;
 	int growable = 0;
-	const char *sopt = "hVc:Crsnmg";
+	const char *sopt = "hVc:Crsnmgk";
 	struct option lopt[] = {
 		{ "help", 0, NULL, 'h' },
 		{ "version", 0, NULL, 'V' },
@@ -1424,6 +1425,7 @@ int main(int argc, char **argv)
 		{ "nocache", 0, NULL, 'n' },
 		{ "misalign", 0, NULL, 'm' },
 		{ "growable", 0, NULL, 'g' },
+		{ "native-aio", 0, NULL, 'k' },
 		{ NULL, 0, NULL, 0 }
 	};
 	int c;
@@ -1455,6 +1457,9 @@ int main(int argc, char **argv)
 		case 'g':
 			growable = 1;
 			break;
+		case 'k':
+			flags |= BDRV_O_NATIVE_AIO;
+			break;
 		case 'V':
 			printf("%s version %s\n", progname, VERSION);
 			exit(0);
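
The completion path in linux-aio.c depends on how a non-blocking eventfd
behaves: a read returns the 8-byte counter accumulated since the last read,
and fails with EAGAIN once the counter is zero.  A minimal standalone sketch
of that drain step follows.  The names drain_eventfd and
make_nonblocking_eventfd are hypothetical and not part of the patch, and
EFD_NONBLOCK is assumed to be available in place of the patch's separate
fcntl call.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: create a non-blocking eventfd, or return -1. */
static int make_nonblocking_eventfd(void)
{
    return eventfd(0, EFD_NONBLOCK);
}

/*
 * Drain a non-blocking eventfd once.
 * Returns 1 and stores the counter in *val if events were pending,
 * 0 if the counter was zero (EAGAIN), and -1 on any other failure.
 */
static int drain_eventfd(int efd, uint64_t *val)
{
    ssize_t ret;

    assert(val != NULL);

    /* Retry only when the read was interrupted by a signal. */
    do {
        ret = read(efd, val, sizeof(*val));
    } while (ret == -1 && errno == EINTR);

    if (ret == (ssize_t)sizeof(*val))
        return 1;
    if (ret == -1 && errno == EAGAIN)
        return 0;
    return -1;
}
```

In the patch the counter read this way is then passed to io_getevents as the
minimum number of events to reap; the sketch stops before that step so that
it does not depend on libaio.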


* Re: [Qemu-devel] [PATCH 0/2] native Linux AIO support revisited
  2009-08-20 14:58 [Qemu-devel] [PATCH 0/2] native Linux AIO support revisited Christoph Hellwig
  2009-08-20 14:58 ` [Qemu-devel] [PATCH 1/2] raw-posix: refactor AIO support Christoph Hellwig
  2009-08-20 14:58 ` [Qemu-devel] [PATCH 2/2] raw-posix: add Linux native " Christoph Hellwig
@ 2009-08-20 19:06 ` Jamie Lokier
  2009-08-21  7:40 ` Avi Kivity
  3 siblings, 0 replies; 10+ messages in thread
From: Jamie Lokier @ 2009-08-20 19:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

Christoph Hellwig wrote:
> The IO code performs slightly better than the thread pool on most
> workloads I've thrown at it, and uses a lot less CPU time for it:

That's interesting.  I was under the impression that over on
linux-aio@kvack.org and on here, some measurements had already been
done showing that threaded AIO wasn't much worse than Linux AIO or
perhaps better, and therefore might as well be used.

Guess I misunderstood something.

-- Jamie


* Re: [Qemu-devel] [PATCH 0/2] native Linux AIO support revisited
  2009-08-20 14:58 [Qemu-devel] [PATCH 0/2] native Linux AIO support revisited Christoph Hellwig
                   ` (2 preceding siblings ...)
  2009-08-20 19:06 ` [Qemu-devel] [PATCH 0/2] native Linux AIO support revisited Jamie Lokier
@ 2009-08-21  7:40 ` Avi Kivity
  2009-08-21 14:50   ` Christoph Hellwig
  3 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2009-08-21  7:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

On 08/20/2009 05:58 PM, Christoph Hellwig wrote:
> Based on this I would recommend to include this patch, but not use it by
> default for now.  After some testing I would suggest to enable it by
> default for host devices and investigate a way to make it easily usable
> for files, possibly including some kernel support to tell us which files
> are "safe".
>    

I like both the numbers and the patches.  But why disable by default?  
Enabling for cache=none host devices only seems a safe choice.
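
For reference, the restriction to cache=none is the two-flag test in
raw_open_common: native AIO is selected only when both BDRV_O_NOCACHE and
BDRV_O_NATIVE_AIO are set.  A small sketch of that test in isolation, with
the flag values copied from the block.h hunk and a hypothetical helper name
(wants_native_aio) that does not appear in the patch:

```c
#include <assert.h>

/* Flag values copied from the block.h hunk in patch 2/2. */
#define BDRV_O_NOCACHE     0x0020 /* do not use the host page cache */
#define BDRV_O_NATIVE_AIO  0x0080 /* use native AIO instead of the thread pool */

/* The combined test below only works if the two flags use distinct bits. */
static_assert((BDRV_O_NOCACHE & BDRV_O_NATIVE_AIO) == 0,
              "flag bits must be distinct");

/* Hypothetical helper: returns 1 only when both flags are set, else 0. */
static int wants_native_aio(int bdrv_flags)
{
    const int need = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;

    return (bdrv_flags & need) == need;
}
```

With this shape, aio=native without cache=none quietly falls back to the
thread pool rather than failing the open.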

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: [Qemu-devel] [PATCH 2/2] raw-posix: add Linux native AIO support
  2009-08-20 14:58 ` [Qemu-devel] [PATCH 2/2] raw-posix: add Linux native " Christoph Hellwig
@ 2009-08-21  9:53   ` Avi Kivity
  2009-08-21 14:48     ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2009-08-21  9:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

On 08/20/2009 05:58 PM, Christoph Hellwig wrote:
> Now that we have a nicer interface to work against we can add Linux native
> AIO support.  It's an extremely thin layer just setting up an iocb for
> the io_submit system call in the submission path, and registering an
> eventfd with the qemu poll handler to complete the iocbs directly
> from there.
>
> This started out based on Anthony's earlier AIO patch, but after
> estimated 42,000 rewrites and just as many build system changes
> there's not much left of it.
>
> To enable native kernel aio use the aio=native sub-command on the
> drive command line.  I have also added an option to qemu-io to
> test the aio support without needing a guest.
>
>
> Signed-off-by: Christoph Hellwig<hch@lst.de>
>
> Index: qemu/Makefile
> ===================================================================
> --- qemu.orig/Makefile	2009-08-19 22:49:08.789354196 -0300
> +++ qemu/Makefile	2009-08-19 22:51:25.293352541 -0300
> @@ -56,6 +56,7 @@ recurse-all: $(SUBDIR_RULES) $(ROMSUBDIR
>   block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o
>   block-obj-y += nbd.o block.o aio.o aes.o
>   block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
> +block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>
>   block-nested-y += cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
>   block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
> Index: qemu/block/raw-posix.c
> ===================================================================
> --- qemu.orig/block/raw-posix.c	2009-08-19 22:49:08.793352540 -0300
> +++ qemu/block/raw-posix.c	2009-08-19 23:00:21.157402768 -0300
> @@ -115,6 +115,7 @@ typedef struct BDRVRawState {
>       int fd_got_error;
>       int fd_media_changed;
>   #endif
> +    int use_aio;
>       uint8_t* aligned_buf;
>   } BDRVRawState;
>
> @@ -159,6 +160,7 @@ static int raw_open_common(BlockDriverSt
>       }
>       s->fd = fd;
>       s->aligned_buf = NULL;
> +
>       if ((bdrv_flags&  BDRV_O_NOCACHE)) {
>           s->aligned_buf = qemu_blockalign(bs, ALIGNED_BUFFER_SIZE);
>           if (s->aligned_buf == NULL) {
> @@ -166,9 +168,22 @@ static int raw_open_common(BlockDriverSt
>           }
>       }
>
> -    s->aio_ctx = paio_init();
> -    if (!s->aio_ctx) {
> -        goto out_free_buf;
> +#ifdef CONFIG_LINUX_AIO
> +    if ((bdrv_flags&  (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) ==
> +                      (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) {
> +        s->aio_ctx = laio_init();
> +        if (!s->aio_ctx) {
> +            goto out_free_buf;
> +        }
> +        s->use_aio = 1;
> +    } else
> +#endif
> +    {
> +        s->aio_ctx = paio_init();
> +        if (!s->aio_ctx) {
> +            goto out_free_buf;
> +        }
> +        s->use_aio = 0;
>       }
>
>       return 0;
> @@ -524,8 +539,13 @@ static BlockDriverAIOCB *raw_aio_submit(
>        * boundary.  Check if this is the case or telll the low-level
>        * driver that it needs to copy the buffer.
>        */
> -    if (s->aligned_buf&&  !qiov_is_aligned(qiov)) {
> -        type |= QEMU_AIO_MISALIGNED;
> +    if (s->aligned_buf) {
> +        if (!qiov_is_aligned(qiov)) {
> +            type |= QEMU_AIO_MISALIGNED;
> +        } else if (s->use_aio) {
> +            return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
> +	                       nb_sectors, cb, opaque, type);
> +        }
>       }
>
>       return paio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov, nb_sectors,
> Index: qemu/configure
> ===================================================================
> --- qemu.orig/configure	2009-08-19 22:49:08.801352719 -0300
> +++ qemu/configure	2009-08-19 22:51:25.305393736 -0300
> @@ -197,6 +197,7 @@ build_docs="yes"
>   uname_release=""
>   curses="yes"
>   curl="yes"
> +linux_aio="yes"
>   io_thread="no"
>   nptl="yes"
>   mixemu="no"
> @@ -499,6 +500,8 @@ for opt do
>     ;;
>     --enable-mixemu) mixemu="yes"
>     ;;
> +  --disable-linux-aio) linux_aio="no"
> +  ;;
>     --enable-io-thread) io_thread="yes"
>     ;;
>     --disable-blobs) blobs="no"
> @@ -636,6 +639,7 @@ echo "  --oss-lib                path to
>   echo "  --enable-uname-release=R Return R for uname -r in usermode emulation"
>   echo "  --sparc_cpu=V            Build qemu for Sparc architecture v7, v8, v8plus, v8plusa, v9"
>   echo "  --disable-vde            disable support for vde network"
> +echo "  --disable-linux-aio      disable Linux AIO support"
>   echo "  --enable-io-thread       enable IO thread"
>   echo "  --disable-blobs          disable installing provided firmware blobs"
>   echo "  --kerneldir=PATH         look for kernel includes in PATH"
> @@ -1197,6 +1201,23 @@ if test "$pthread" = no; then
>   fi
>
>   ##########################################
> +# linux-aio probe
> +AIOLIBS=""
> +
> +if test "$linux_aio" = "yes" ; then
> +    linux_aio=no
> +    cat>  $TMPC<<EOF
> +#include<libaio.h>
> +#include<sys/eventfd.h>
> +int main(void) { io_setup(0, NULL); io_set_eventfd(NULL, 0); eventfd(0, 0); return 0; }
> +EOF
> +    if compile_prog "" "-laio" ; then
> +        linux_aio=yes
> +        LIBS="$LIBS -laio"
> +    fi
> +fi
> +
> +##########################################
>   # iovec probe
>   cat>  $TMPC<<EOF
>   #include<sys/types.h>
> @@ -1527,6 +1548,7 @@ echo "NPTL support      $nptl"
>   echo "GUEST_BASE        $guest_base"
>   echo "vde support       $vde"
>   echo "IO thread         $io_thread"
> +echo "Linux AIO support $linux_aio"
>   echo "Install blobs     $blobs"
>   echo -e "KVM support       $kvm"
>   echo "fdt support       $fdt"
> @@ -1700,6 +1722,9 @@ fi
>   if test "$io_thread" = "yes" ; then
>     echo "CONFIG_IOTHREAD=y">>  $config_host_mak
>   fi
> +if test "$linux_aio" = "yes" ; then
> +  echo "CONFIG_LINUX_AIO=y">>  $config_host_mak
> +fi
>   if test "$blobs" = "yes" ; then
>     echo "INSTALL_BLOBS=yes">>  $config_host_mak
>   fi
> Index: qemu/linux-aio.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ qemu/linux-aio.c	2009-08-20 10:54:10.924375300 -0300
> @@ -0,0 +1,204 @@
> +/*
> + * Linux native AIO support.
> + *
> + * Copyright (C) 2009 IBM, Corp.
> + * Copyright (C) 2009 Red Hat, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "qemu-common.h"
> +#include "qemu-aio.h"
> +#include "block_int.h"
> +#include "block/raw-posix-aio.h"
> +
> +#include<sys/eventfd.h>
> +#include<libaio.h>
> +
> +/*
> + * Queue size (per-device).
> + *
> + * XXX: eventually we need to communicate this to the guest and/or make it
> + *      tunable by the guest.  If we get more outstanding requests at a time
> + *      than this we will get EAGAIN from io_submit which is communicated to
> + *      the guest as an I/O error.
> + */
> +#define MAX_EVENTS 128
>    

Or, we could queue any extra requests.

> +
> +
> +void *laio_init(void)
> +{
> +    struct qemu_laio_state *s;
> +
> +    s = qemu_mallocz(sizeof(*s));
> +    s->efd = eventfd(0, 0);
> +    if (s->efd == -1)
> +        goto out_free_state;
> +    fcntl(s->efd, F_SETFL, O_NONBLOCK);
> +
> +    if (io_setup(MAX_EVENTS,&s->ctx) != 0)
> +        goto out_close_efd;
> +
>    

One day we may want a global io context so we can dequeue many events 
with one syscall.  Or we may not, if we thread these things.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: [Qemu-devel] [PATCH 2/2] raw-posix: add Linux native AIO support
  2009-08-21  9:53   ` Avi Kivity
@ 2009-08-21 14:48     ` Christoph Hellwig
  2009-08-21 15:35       ` Avi Kivity
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2009-08-21 14:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoph Hellwig, qemu-devel

On Fri, Aug 21, 2009 at 12:53:49PM +0300, Avi Kivity wrote:
> >+ * Queue size (per-device).
> >+ *
> >+ * XXX: eventually we need to communicate this to the guest and/or make it
> >+ *      tunable by the guest.  If we get more outstanding requests at a time
> >+ *      than this we will get EAGAIN from io_submit which is communicated to
> >+ *      the guest as an I/O error.
> >+ */
> >+#define MAX_EVENTS 128
> >   
> 
> Or, we could queue any extra requests.

That doesn't make much sense.  We'd just add another level of
queueing on top of the already optimized implementations in the
guest and host kernels.  This is really just an issue of communicating
the limits we have and dealing with them efficiently.  It should be a
relatively small add-on patch.

> >+    if (io_setup(MAX_EVENTS,&s->ctx) != 0)
> >+        goto out_close_efd;
> >+
> >   
> 
> One day we may want a global io context so we can dequeue many events 
> with one syscall.  Or we may not, if we thread these things.

We could do this easily; in fact that's what I did before I ran into
issues with the completion queue size when using multiple devices.

Syscall overhead in Linux is small enough that I would not bother until
it actually shows up as a problem.  That being said, threading the block
layer would probably be a benefit for large setups for various reasons.


* Re: [Qemu-devel] [PATCH 0/2] native Linux AIO support revisited
  2009-08-21  7:40 ` Avi Kivity
@ 2009-08-21 14:50   ` Christoph Hellwig
  2009-08-21 15:38     ` Avi Kivity
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2009-08-21 14:50 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoph Hellwig, qemu-devel

On Fri, Aug 21, 2009 at 10:40:55AM +0300, Avi Kivity wrote:
> On 08/20/2009 05:58 PM, Christoph Hellwig wrote:
> >Based on this I would recommend to include this patch, but not use it by
> >default for now.  After some testing I would suggest to enable it by
> >default for host devices and investigate a way to make it easily usable
> >for files, possibly including some kernel support to tell us which files
> >are "safe".
> >   
> 
> I like both the numbers and the patches.  But why disable by default?  
> Enabling for cache=none host devices only seems a safe choice.

I want to

  a) get some more testing, especially from people who are better at
     testing than me and have a wider variety of hardware
  b) sort out the queue depth issue, which I'd really prefer to do once
     the basic patch is in, so that I don't have to work with a large
     patch stack.


* Re: [Qemu-devel] [PATCH 2/2] raw-posix: add Linux native AIO support
  2009-08-21 14:48     ` Christoph Hellwig
@ 2009-08-21 15:35       ` Avi Kivity
  0 siblings, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2009-08-21 15:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

On 08/21/2009 05:48 PM, Christoph Hellwig wrote:
> On Fri, Aug 21, 2009 at 12:53:49PM +0300, Avi Kivity wrote:
>    
>>> + * Queue size (per-device).
>>> + *
>>> + * XXX: eventually we need to communicate this to the guest and/or make it
>>> + *      tunable by the guest.  If we get more outstanding requests at a time
>>> + *      than this we will get EAGAIN from io_submit which is communicated to
>>> + *      the guest as an I/O error.
>>> + */
>>> +#define MAX_EVENTS 128
>>>
>>>        
>> Or, we could queue any extra requests.
>>      
> That doesn't make much sense.  We'd just add another level of
> queueing on top of the already optimized implementations in the
> guest and host kernels.  This is really just an issue of communicating
> the limits we have and dealing with them efficiently.  It should be a
> relatively small add-on patch.
>    

You're right: virtio and scsi already know their queue sizes, so it should
be easy to pass them down the stack.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: [Qemu-devel] [PATCH 0/2] native Linux AIO support revisited
  2009-08-21 14:50   ` Christoph Hellwig
@ 2009-08-21 15:38     ` Avi Kivity
  0 siblings, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2009-08-21 15:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

On 08/21/2009 05:50 PM, Christoph Hellwig wrote:
>> I like both the numbers and the patches.  But why disable by default?
>> Enabling for cache=none host devices only seems a safe choice.
>>      
> I want to
>
>    a) get some more testing.  Especially from people who are better at
>       testing than me and have a wider variety of hardware
>    

They're called "users" (at least, development snapshot users).  Enabling 
by default will see wider testing.

>    b) sort out the queue depth issue, which I'd really prefer to do once
>       the basic patch is in, so that I don't have to work with a large
>       patch stack.
>    

Ok.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


