All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Fam Zheng <fam@euphon.net>, Kevin Wolf <kwolf@redhat.com>,
	Sergio Lopez <slp@redhat.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Max Reitz <mreitz@redhat.com>,
	saket.sinha89@gmail.com, Paolo Bonzini <pbonzini@redhat.com>,
	Julia Suvorova <jusual@mail.ru>,
	Aarushi Mehta <mehta.aaru20@gmail.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v5 04/12] block/io_uring: implements interfaces for io_uring
Date: Wed, 19 Jun 2019 11:14:14 +0100	[thread overview]
Message-ID: <20190619101414.GA13569@stefanha-x1.localdomain> (raw)
In-Reply-To: <81e4ab9b07d5678a3a28e1314c07ee0224e4d9ed.camel@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 5464 bytes --]

On Mon, Jun 17, 2019 at 03:26:50PM +0300, Maxim Levitsky wrote:
> On Mon, 2019-06-10 at 19:18 +0530, Aarushi Mehta wrote:
> > +        if (!cqes) {
> > +            break;
> > +        }
> > +        LuringAIOCB *luringcb = io_uring_cqe_get_data(cqes);
> > +        ret = cqes->res;
> > +
> > +        if (ret == luringcb->qiov->size) {
> > +            ret = 0;
> > +        } else if (ret >= 0) {
> 
> 
> You should very carefully check the allowed return values here.
> 
> It looks like you can get '-EINTR' here, which would ask you to rerun the read operation, and otherwise
> you will get the number of bytes read, which might be less that what was asked for, which implies that you
> need to retry the read operation with the remainder of the buffer rather that zero the end of the buffer IMHO 
> 
> (0 is returned on EOF according to 'read' semantics, which I think are used here, thus a short read might not be an EOF)
> 
> 
> Looking at linux-aio.c though I do see that it just passes through the returned value with no special treatments. 
> including lack of check for -EINTR.
> 
> I assume that since aio is linux specific, and it only supports direct IO, it happens
> to have assumption of no short reads/-EINTR (but since libaio has very sparse documentation I can't verify this)
> 
> On the other hand the aio=threads implementation actually does everything as specified on the 'write' manpage,
> retrying the reads on -EINTR, and doing additional reads if less that required number of bytes were read.
> 
> Looking at io_uring implementation in the kernel I see that it does support synchronous (non O_DIRECT mode), 
> and in this case, it goes through the same ->read_iter which is pretty much the same path that 
> regular read() takes and so it might return short reads and or -EINTR.

Interesting point.  Investigating EINTR should at least be a TODO
comment and needs to be resolved before io_uring lands in a QEMU
release.

> > +static int ioq_submit(LuringState *s)
> > +{
> > +    int ret = 0;
> > +    LuringAIOCB *luringcb, *luringcb_next;
> > +
> > +    while (s->io_q.in_queue > 0) {
> > +        QSIMPLEQ_FOREACH_SAFE(luringcb, &s->io_q.sq_overflow, next,
> > +                              luringcb_next) {
> 
> I am torn about the 'sq_overflow' name. it seems to me that its not immediately clear that these
> are the requests that are waiting because the io uring got full, but I can't now think of a better name.
> 
> Maybe add a comment here to explain what is going on here?

Hmm...I suggested this name because I thought it was clear.  But the
fact that it puzzled you proves it wasn't clear :-).

Can anyone think of a better name?  It's the queue we keep in QEMU to
hold requests while the io_uring sq ring is full.

> Also maybe we could somehow utilize the plug/unplug facility to avoid reaching that state in first place?
> Maybe the block layer has some kind of 'max outstanding requests' limit that could be used?
> 
> In my nvme-mdev I opted to not process the input queues when such a condition is detected, but here you can't as the block layer
> pretty much calls you to process the requests.

Block layer callers are allowed to submit as many I/O requests as they
like and there is no feedback mechanism.  It's up to linux-aio.c and
io_uring.c to handle the case where host kernel I/O submission resources
are exhausted.

Plug/unplug is a batching performance optimization to reduce the number
of io_uring_enter() calls but it does not stop the callers from
submitting more I/O requests.  So plug/unplug isn't directly applicable
here.

> > +static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
> > +                            uint64_t offset, int type)
> > +{
> > +    struct io_uring_sqe *sqes = io_uring_get_sqe(&s->ring);
> > +    if (!sqes) {
> > +        sqes = &luringcb->sqeq;
> > +        QSIMPLEQ_INSERT_TAIL(&s->io_q.sq_overflow, luringcb, next);
> > +    }
> > +
> > +    switch (type) {
> > +    case QEMU_AIO_WRITE:
> > +        io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
> > +                             luringcb->qiov->niov, offset);
> > +        break;
> > +    case QEMU_AIO_READ:
> > +        io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
> > +                            luringcb->qiov->niov, offset);
> > +        break;
> > +    case QEMU_AIO_FLUSH:
> > +        io_uring_prep_fsync(sqes, fd, 0);
> > +        break;
> > +    default:
> > +        fprintf(stderr, "%s: invalid AIO request type, aborting 0x%x.\n",
> > +                        __func__, type);
> 
> Nitpick: Don't we use some king of error printing functions like 'error_setg' rather that fprintf?

Here we're not in a context where an Error object can be returned (e.g.
printed by the QMP monitor).  We only have an errno return value that
the emulated storage controller may squash down further to a single
EIO-type error code.

'type' is a QEMU-internal value so the default case is basically
assert(false); /* we should never get here */

For these reasons the fprintf() seems okay here.

> I plan on this or next week to do some benchmarks of the code and I will share the results as soon
> as I do them.

Excellent, Aarushi has been benchmarking too.  Perhaps you can share the
QEMU command-line and fio configuration so that the results can be
compared.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2019-06-19 10:15 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-10 13:48 [Qemu-devel] [PATCH v5 00/12] Add support for io_uring Aarushi Mehta
2019-06-10 13:48 ` [Qemu-devel] [PATCH v5 01/12] configure: permit use of io_uring Aarushi Mehta
2019-06-10 13:48 ` [Qemu-devel] [PATCH v5 02/12] qapi/block-core: add option for io_uring Aarushi Mehta
2019-06-11  7:36   ` Fam Zheng
2019-06-11  9:32     ` Stefan Hajnoczi
2019-07-03  6:26       ` Markus Armbruster
2019-06-10 13:48 ` [Qemu-devel] [PATCH v5 03/12] block/block: add BDRV flag " Aarushi Mehta
2019-06-10 13:48 ` [Qemu-devel] [PATCH v5 04/12] block/io_uring: implements interfaces " Aarushi Mehta
2019-06-11 11:17   ` Fam Zheng
2019-06-12 14:43     ` Stefan Hajnoczi
2019-06-17 12:26   ` Maxim Levitsky
2019-06-19 10:14     ` Stefan Hajnoczi [this message]
2019-06-19 10:47       ` Maxim Levitsky
2019-06-22 15:10     ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2019-06-10 13:48 ` [Qemu-devel] [PATCH v5 05/12] stubs: add stubs for io_uring interface Aarushi Mehta
2019-06-17 12:33   ` Maxim Levitsky
2019-06-19 10:21     ` Stefan Hajnoczi
2019-06-10 13:48 ` [Qemu-devel] [PATCH v5 06/12] util/async: add aio interfaces for io_uring Aarushi Mehta
2019-06-17 12:56   ` Maxim Levitsky
2019-06-10 13:49 ` [Qemu-devel] [PATCH v5 07/12] blockdev: accept io_uring as option Aarushi Mehta
2019-06-17 13:01   ` Maxim Levitsky
2019-06-19 10:24     ` Stefan Hajnoczi
2019-06-19 10:48       ` Maxim Levitsky
2019-06-10 13:49 ` [Qemu-devel] [PATCH v5 08/12] block/file-posix.c: extend to use io_uring Aarushi Mehta
2019-06-17 14:12   ` Maxim Levitsky
2019-06-10 13:49 ` [Qemu-devel] [PATCH v5 09/12] block: add trace events for io_uring Aarushi Mehta
2019-06-11  9:47   ` Stefan Hajnoczi
2019-06-11 11:34   ` Fam Zheng
2019-06-10 13:49 ` [Qemu-devel] [PATCH v5 10/12] block/io_uring: adds userspace completion polling Aarushi Mehta
2019-06-11  9:51   ` Stefan Hajnoczi
2019-06-17 14:14     ` Maxim Levitsky
2019-06-10 13:49 ` [Qemu-devel] [PATCH v5 11/12] qemu-io: adds support for io_uring Aarushi Mehta
2019-06-11  9:54   ` Stefan Hajnoczi
2019-06-11 12:26   ` Fam Zheng
2019-06-10 13:49 ` [Qemu-devel] [PATCH v5 12/12] qemu-iotests/087: checks " Aarushi Mehta
2019-06-11  9:58   ` Stefan Hajnoczi
2019-06-17 14:21   ` Maxim Levitsky
2019-06-11  9:56 ` [Qemu-devel] [PATCH v5 00/12] Add support " Stefan Hajnoczi
2019-06-22 15:13   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2019-06-23 14:03     ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190619101414.GA13569@stefanha-x1.localdomain \
    --to=stefanha@redhat.com \
    --cc=armbru@redhat.com \
    --cc=fam@euphon.net \
    --cc=jusual@mail.ru \
    --cc=kwolf@redhat.com \
    --cc=mehta.aaru20@gmail.com \
    --cc=mlevitsk@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=saket.sinha89@gmail.com \
    --cc=slp@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.