From: Andres Freund <andres@anarazel.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	hch@infradead.org, clm@fb.com
Subject: Re: [PATCH 1/3] io_uring: add support for marking commands as draining
Date: Mon, 6 May 2019 11:03:54 -0700
Message-ID: <20190506180354.maksphiaokual4jd@alap3.anarazel.de>
In-Reply-To: <20190411150657.18480-2-axboe@kernel.dk>

Hi,

On 2019-04-11 09:06:55 -0600, Jens Axboe wrote:
> There are no ordering constraints between the submission and completion
> side of io_uring. But sometimes that would be useful to have. One common
> example is doing an fsync, for instance, and have it ordered with
> previous writes. Without support for that, the application must do this
> tracking itself.

The facility seems useful for at least this postgres developer playing
with optionally using io_uring in parts of postgres. As you say, I'd
otherwise need to manually implement drains in userland.
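
To make "manually implement drains" concrete, here is a minimal sketch
of the bookkeeping I have in mind. It touches no io_uring API at all;
struct drain_tracker and the tracker_* helpers are hypothetical names,
and it assumes a single thread and at most one deferred fsync:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland bookkeeping, not part of any io_uring API. */
struct drain_tracker {
    unsigned inflight;   /* writes submitted but not yet completed */
    bool fsync_pending;  /* an fsync is waiting for inflight to hit 0 */
};

static void tracker_init(struct drain_tracker *t)
{
    t->inflight = 0;
    t->fsync_pending = false;
}

/* Call when a write is handed to the kernel. */
static void tracker_write_submitted(struct drain_tracker *t)
{
    /* Simplification: this sketch does not queue writes behind a
     * deferred fsync, so the caller must not submit any here. */
    assert(!t->fsync_pending);
    t->inflight++;
}

/* Returns true if the fsync may be submitted right away; otherwise
 * the fsync is remembered and must be issued later. */
static bool tracker_request_fsync(struct drain_tracker *t)
{
    if (t->inflight == 0)
        return true;
    t->fsync_pending = true;
    return false;
}

/* Call once per write completion. Returns true when the deferred
 * fsync has become due and should be submitted now. */
static bool tracker_write_completed(struct drain_tracker *t)
{
    assert(t->inflight > 0);
    t->inflight--;
    if (t->inflight == 0 && t->fsync_pending) {
        t->fsync_pending = false;
        return true;
    }
    return false;
}
```

The application would call tracker_write_completed() for every write
CQE it reaps and submit the fsync once that returns true.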


> This adds a general SQE flag, IOSQE_IO_DRAIN. If a command is marked
> with this flag, then it will not be issued before previous commands have
> completed, and subsequent commands submitted after the drain will not be
> issued before the drain is started. If there are no pending commands,
> setting this flag will not change the behavior of the issue of the
> command.
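
As I read that description, usage would look roughly like the sketch
below: a write followed by an fsync that carries IOSQE_IO_DRAIN. It only
fills in two SQEs and does not submit them. It assumes the flag is
exposed through <linux/io_uring.h> as in this patch; the helper name
prep_write_then_fsync and the user_data values are made up for
illustration:

```c
#include <assert.h>
#include <linux/io_uring.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: fill sqes[0] with a writev and sqes[1] with an
 * fsync that must not be issued before the writev has completed.
 * Copying the entries into the SQ ring and calling io_uring_enter()
 * is left to the caller. */
static void prep_write_then_fsync(struct io_uring_sqe sqes[2], int fd,
                                  const struct iovec *iov, unsigned nr_iov,
                                  uint64_t offset)
{
    assert(sqes != NULL && iov != NULL && nr_iov > 0);
    memset(sqes, 0, 2 * sizeof(*sqes));

    /* Plain write, no ordering flags. */
    sqes[0].opcode = IORING_OP_WRITEV;
    sqes[0].fd = fd;
    sqes[0].off = offset;
    sqes[0].addr = (uint64_t)(uintptr_t)iov;
    sqes[0].len = nr_iov;
    sqes[0].user_data = 1;  /* arbitrary tag for matching the CQE */

    /* fsync marked as a drain: held back until prior SQEs complete. */
    sqes[1].opcode = IORING_OP_FSYNC;
    sqes[1].fd = fd;
    sqes[1].flags = IOSQE_IO_DRAIN;
    sqes[1].user_data = 2;  /* arbitrary tag for matching the CQE */
}
```

Note that the drain only orders the fsync against earlier entries in the
same ring, which is what prompts the questions below.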

I think it'd be good if there were some documentation about how io_uring
interacts with writes done via a different io_uring queue, or via
traditional write(2) et al., and about whether IOSQE_IO_DRAIN influences
that.

None of the docs I read say whether an io_uring fsync guarantees that a
write(2) that finished before an IORING_OP_FSYNC op is submitted is
durable. Given the current implementation that clearly seems to be the
case, but as a user of data integrity operations it's not great to have
to rely on the current implementation.

Similarly, it'd be good if there were docs about how traditional
read/write/fsync and multiple io_uring queues interact in the face of
concurrent operations. For plain read/write we have POSIX providing some
baseline guarantees, but those obviously don't say anything about
io_uring.

I suspect that most people's intuition will be "it's obvious", but also
that such intuitions are likely to differ between people.

Greetings,

Andres Freund
