From: Vladislav Bolkhovitin <vst@vlnb.net>
To: Chris Friesen <chris.friesen@genband.com>
Cc: Ryan Johnson <ryan.johnson@cs.utoronto.ca>,
General Discussion of SQLite Database <sqlite-users@sqlite.org>,
Nico Williams <nico@cryptonector.com>,
linux-fsdevel@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
linux-kernel <linux-kernel@vger.kernel.org>,
Richard Hipp <drh@hwaci.com>
Subject: Re: [sqlite] light weight write barriers
Date: Mon, 19 Nov 2012 20:23:52 -0500
Message-ID: <50AADBA8.4090507@vlnb.net>
In-Reply-To: <50A71A7B.3040407@vlnb.net>
Vladislav Bolkhovitin, on 11/17/2012 12:02 AM wrote:
>>> The easiest way to implement this fsync would involve three things:
>>> 1. Schedule writes for all dirty pages in the fs cache that belong to
>>> the affected file, wait for the device to report success, issue a cache
>>> flush to the device (or request ordering commands, if available) to make
>>> it tell the truth, and wait for the device to report success. AFAIK this
>>> already happens, but without taking advantage of any request ordering
>>> commands.
>>> 2. The requesting thread returns as soon as the kernel has identified
>>> all data that will be written back. This is new, but pretty similar to
>>> what AIO already does.
>>> 3. No write is allowed to enqueue any requests at the device that
>>> involve the same file, until all outstanding fsync complete [3]. This is
>>> new.
>>
>> This sounds interesting as a way to expose some useful semantics to userspace.
>>
>> I assume we'd need to come up with a new syscall or something since it doesn't
>> match the behaviour of posix fsync().
>
> This is how I would export cache sync and requests ordering abstractions to the
> user space:
>
> For async IO (io_submit() and friends) I would extend struct iocb by flags, which
> would allow to set the required capabilities, i.e. if this request is FUA, or full
> cache sync, immediate [1] or not, ORDERED or not, or all at the same time, per
> each iocb.
>
> For the regular read()/write() I would add to "flags" parameter of
> sync_file_range() one more flag: if this sync is immediate or not.
>
> To enforce ordering rules I would add one more command to fcntl(). It would make
> the latest submitted write in this fd ORDERED.
Correction: to avoid possible races, it would be better for the new fcntl()
command to specify that the next N read()/write()/sync() calls are ORDERED.
For instance, in the simplest case of N=1, the next write() after the fcntl()
would be handled as ORDERED.
(Unfortunately, it doesn't look like this old read()/write() interface has space
for a more elegant solution)
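The counting behaviour described above (the next N calls after the fcntl() are
treated as ORDERED) could be modelled roughly as follows. This is only a
user-space sketch of the bookkeeping, not kernel code: no such fcntl() command
exists, and the names ordered_state, ordered_arm() and ordered_consume() are
made up for illustration. It also assumes that arming overwrites any count left
over from an earlier call, which the proposal does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-fd state; not a real kernel structure. */
struct ordered_state {
    unsigned int remaining; /* calls still to be tagged ORDERED */
};

/*
 * Models the proposed fcntl() command: mark the next n calls on this fd
 * as ORDERED. Assumption: a new count replaces any leftover count.
 */
static void ordered_arm(struct ordered_state *st, unsigned int n)
{
    assert(st != NULL);
    st->remaining = n;
}

/*
 * Models the check each read()/write()/sync() would make on entry:
 * use up one slot and report whether this call is ORDERED.
 */
static bool ordered_consume(struct ordered_state *st)
{
    assert(st != NULL);
    if (st->remaining == 0)
        return false;
    st->remaining--;
    return true;
}
```

With N=1, the first ordered_consume() after ordered_arm(&st, 1) returns true and
the one after it returns false, which matches the "next write() only" case.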
Vlad