From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel@nongnu.org, famz@redhat.com, qemu-block@nongnu.org,
	stefanha@redhat.com
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext
Date: Tue, 10 May 2016 11:40:33 +0200	[thread overview]
Message-ID: <20160510094033.GI4921@noname.str.redhat.com> (raw)
In-Reply-To: <20160510093040.GB11408@stefanha-x1.localdomain>


On 10.05.2016 at 11:30, Stefan Hajnoczi wrote:
> On Mon, May 09, 2016 at 06:31:44PM +0200, Paolo Bonzini wrote:
> > On 19/04/2016 11:09, Stefan Hajnoczi wrote:
> > >> > This has better performance because it executes fewer system calls
> > >> > and does not use a bottom half per disk.
> > > Each aio_context_t is initialized for 128 in-flight requests in
> > > laio_init().
> > > 
> > > Will it be possible to hit the limit now that all drives share the same
> > > aio_context_t?
> > 
> > It was also possible before, because the virtqueue can be bigger than
> > 128 items; that's why there is logic to submit I/O requests after an
> > io_getevents() call.  As usual when the answer seems trivial, am I
> > misunderstanding your question?
> 
> I'm concerned about a performance regression rather than correctness.
> 
> But looking at linux-aio.c there *is* a correctness problem:
> 
>   static void ioq_submit(struct qemu_laio_state *s)
>   {
>       int ret, len;
>       struct qemu_laiocb *aiocb;
>       struct iocb *iocbs[MAX_QUEUED_IO];
>       QSIMPLEQ_HEAD(, qemu_laiocb) completed;
> 
>       do {
>           len = 0;
>           QSIMPLEQ_FOREACH(aiocb, &s->io_q.pending, next) {
>               iocbs[len++] = &aiocb->iocb;
>               if (len == MAX_QUEUED_IO) {
>                   break;
>               }
>           }
> 
>           ret = io_submit(s->ctx, len, iocbs);
>           if (ret == -EAGAIN) {
>               break;
>           }
>           if (ret < 0) {
>               abort();
>           }
> 
>           s->io_q.n -= ret;
>           aiocb = container_of(iocbs[ret - 1], struct qemu_laiocb, iocb);
>           QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
>       } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending));
>       s->io_q.blocked = (s->io_q.n > 0);
>   }
> 
> io_submit() may have submitted some of the requests when -EAGAIN is
> returned.  QEMU gets no indication of which requests were submitted.

My understanding (which is based on the manpage rather than code) is
that -EAGAIN is only returned if no request could be submitted. In other
cases, the number of submitted requests is returned (similar to how
short reads work).

> It may be possible to dig around in the s->ctx rings to find out or we
> need to keep track of the number of in-flight requests so we can
> prevent ever hitting EAGAIN.
> 
> ioq_submit() pretends that no requests were submitted on -EAGAIN and
> will submit them again next time.  This could result in double
> completions.

Did you check in the code that this can happen?

> Regarding performance, I'm thinking about a guest with 8 disks (queue
> depth 32).  The worst case is when the guest submits 32 requests at once
> but the Linux AIO event limit has already been reached.  Then the disk
> is starved until other disks' requests complete.

Sounds like a valid concern.
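A hypothetical sketch of the in-flight accounting suggested above (the 128 comes from laio_init() as discussed in this thread; the struct and helper names are invented for illustration, not QEMU's actual API): with exact bookkeeping, QEMU never calls io_submit() for more requests than the context has room for, so -EAGAIN cannot happen and excess requests simply wait in QEMU's own pending queue:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EVENTS 128  /* per-aio_context_t limit from laio_init() */

struct laio_state {
    int in_flight;      /* requests currently inside the kernel */
};

/* Could another 'want' requests be handed to io_submit() without
 * risking EAGAIN? */
static bool laio_has_room(const struct laio_state *s, int want)
{
    return s->in_flight + want <= MAX_EVENTS;
}

/* Reserve slots before submission; submit exactly the returned number
 * of iocbs, leaving the remainder on the pending queue. */
static int laio_reserve(struct laio_state *s, int want)
{
    int room = MAX_EVENTS - s->in_flight;
    int n = want < room ? want : room;
    if (n < 0) {
        n = 0;
    }
    s->in_flight += n;
    return n;
}

/* Release one slot per completion reaped by io_getevents(). */
static void laio_complete(struct laio_state *s)
{
    assert(s->in_flight > 0);
    s->in_flight--;
}
```

With 8 disks at queue depth 32 the guest can have 256 requests outstanding against 128 shared slots; this scheme does not by itself fix the fairness problem (whichever disk reserves first still wins), but it turns the EAGAIN correctness question into a pure scheduling one.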

Kevin

Thread overview: 19+ messages
2016-04-07 16:33 [Qemu-devel] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 1/8] block: Don't disable I/O throttling on sync requests Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 2/8] block: make bdrv_start_throttled_reqs return void Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 3/8] block: move restarting of throttled reqs to block/throttle-groups.c Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 4/8] block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 5/8] block: introduce bdrv_no_throttling_begin/end Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 6/8] block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 7/8] linux-aio: make it more type safe Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext Paolo Bonzini
2016-04-19  9:09   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-05-09 16:31     ` Paolo Bonzini
2016-05-10  9:30       ` Stefan Hajnoczi
2016-05-10  9:40         ` Kevin Wolf [this message]
2016-05-10 10:32           ` Paolo Bonzini
2016-05-11 13:22             ` Stefan Hajnoczi
2016-05-11 13:18           ` Stefan Hajnoczi
2016-05-11 13:23   ` [Qemu-devel] " Stefan Hajnoczi
2016-04-19  9:10 ` [Qemu-devel] [Qemu-block] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Stefan Hajnoczi
2016-04-19 15:09   ` Kevin Wolf
