From: Bart Van Assche <bvanassche@acm.org>
To: lsf-pc@lists.linux-foundation.org
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: [LSF/MM TOPIC] Atomic Writes
Date: Mon, 11 Feb 2019 09:45:47 -0800
Message-ID: <1549907147.19311.16.camel@acm.org>

Background
----------
Atomic writes are writes that are either executed in their entirety or not
at all, even if a power failure occurs while the write is in progress.
Atomic writes can improve database and filesystem performance significantly
[1,2], e.g. because a database no longer has to write each page twice to
protect itself against torn pages. Although the NVMe and SCSI standards
support atomic writes, neither the block layer nor filesystems offer a
standardized interface for submitting atomic writes. Hence this proposal to
add block-device-independent and filesystem-independent interfaces for
atomic writes.
Block Layer Proposal
--------------------
* Block drivers (NVMe, SCSI, ...) that support atomic writes set a queue
flag that makes it clear to the block layer core that support for atomic
writes is present.
* Atomic writes are submitted from kernel context by marking individual
requests as atomic. One possible approach is to introduce a new bio flag.
Another possible approach is to introduce a new request type, e.g.
REQ_OP_ATOMIC_WRITE.
* Introduce new request queue limits for atomic writes that guarantee that
atomic writes respect the alignment and size restrictions of the device. We
will probably need limits that correspond to the NAWUN, NAWUPF, NABSN, NABO
and NABSPF parameters from the NVMe Identify Namespace response.
* Kernel code that submits atomic writes is responsible for ensuring that
the write request size does not exceed the maximum atomic write size
advertised by the request queue. Fail atomic writes that are too large,
misaligned or that violate the atomic write limits in any other way.
* Add support in blk_stack_limits() for the atomic write limits.
* Allow merging of regular writes with other regular writes and of atomic
writes with other atomic writes, but do not allow merging of regular writes
with atomic writes. Only merge atomic write requests if the merged request
still satisfies the atomic write limits of the device.
* Continue to allow splitting of regular write requests but do not allow
splitting of atomic writes, since a split atomic write would no longer be
atomic.
* Make it possible to submit atomic writes from user space. One possible
approach is to add an O_ATOMIC flag to the open() system call.
Applications that want to submit both atomic and non-atomic writes would
then have to open the block device twice, once with and once without the
O_ATOMIC flag.
* Another possible approach is to add a new flag to the flags argument of
the pwritev2() system call and to the asynchronous I/O iocb structure.
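The limit, merge, split and stacking rules above can be modelled in a few
lines. The following is a minimal user-space sketch in C under stated
assumptions, not kernel code: struct atomic_limits, struct write_req,
atomic_write_valid(), can_merge(), can_split() and stack_atomic_limits() are
hypothetical names, limits are expressed in bytes, alignments and boundaries
are assumed to be powers of two, and the NVMe boundary offset (NABO) is
assumed to be zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical atomic write limits in bytes; not an existing kernel type. */
struct atomic_limits {
	uint64_t max_bytes;      /* largest atomic write (cf. NAWUN/NAWUPF) */
	uint64_t align_bytes;    /* required alignment of offset and length, >= 1 */
	uint64_t boundary_bytes; /* atomic boundary (cf. NABSN), 0 = no boundary */
};

/* Hypothetical model of a write request. */
struct write_req {
	uint64_t offset;
	uint64_t len;
	bool atomic;
};

/* Returns true if an atomic write of len bytes at offset may be submitted;
 * the block layer would fail the write otherwise. */
bool atomic_write_valid(const struct atomic_limits *lim, uint64_t offset,
			uint64_t len)
{
	assert(lim->align_bytes >= 1);
	if (len == 0 || len > lim->max_bytes)
		return false;                   /* too large (or empty) */
	if (offset % lim->align_bytes || len % lim->align_bytes)
		return false;                   /* misaligned */
	/* First and last byte must fall within the same boundary-sized region. */
	if (lim->boundary_bytes &&
	    offset / lim->boundary_bytes !=
	    (offset + len - 1) / lim->boundary_bytes)
		return false;                   /* crosses an atomic boundary */
	return true;
}

/* Back-merge of b into a: never mix regular and atomic writes, and the
 * result of an atomic merge must still satisfy the limits. */
bool can_merge(const struct atomic_limits *lim, const struct write_req *a,
	       const struct write_req *b)
{
	if (a->atomic != b->atomic)
		return false;
	if (a->offset + a->len != b->offset)
		return false;                   /* not contiguous */
	return !a->atomic || atomic_write_valid(lim, a->offset, a->len + b->len);
}

/* Regular writes may be split; atomic writes never. */
bool can_split(const struct write_req *r)
{
	return !r->atomic;
}

/* Stacking (cf. blk_stack_limits()): the stacked device only promises what
 * both devices support. With power-of-two values the larger alignment and
 * the smaller non-zero boundary satisfy both devices. */
struct atomic_limits stack_atomic_limits(struct atomic_limits t,
					 struct atomic_limits b)
{
	struct atomic_limits r;

	r.max_bytes = t.max_bytes < b.max_bytes ? t.max_bytes : b.max_bytes;
	r.align_bytes = t.align_bytes > b.align_bytes ? t.align_bytes : b.align_bytes;
	if (t.boundary_bytes == 0)
		r.boundary_bytes = b.boundary_bytes;
	else if (b.boundary_bytes == 0)
		r.boundary_bytes = t.boundary_bytes;
	else
		r.boundary_bytes = t.boundary_bytes < b.boundary_bytes ?
				   t.boundary_bytes : b.boundary_bytes;
	return r;
}
```

For example, with max_bytes = 16 KiB, align_bytes = 4 KiB and
boundary_bytes = 64 KiB, a 16 KiB write at offset 0 is accepted, while an
8 KiB write at offset 60 KiB is rejected because it crosses the 64 KiB
boundary. Rejecting such writes instead of splitting them is what keeps the
"never split atomic writes" rule enforceable.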
Filesystem Proposal
-------------------
* Make it possible to submit atomic writes from user space. Just like for
block devices, one possible approach is to add an O_ATOMIC flag to the
open() system call. Another possible approach is to add a new flag to the
flags argument of the pwritev2() system call and to the asynchronous I/O
iocb structure. Note: Chris Mason already proposed an O_ATOMIC flag for
filesystems in 2013 [3].
* Filesystems may, but need not, implement O_ATOMIC by submitting atomic
writes to the block layer. Implementing O_ATOMIC with a traditional
transaction mechanism is also fine but results in write amplification.
* Introduce a standardized interface for querying the filesystem atomic write
limits, e.g. by adding attributes under /sys/fs/.
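To make the write amplification remark concrete, the following sketch
compares the number of bytes sent to storage for one O_ATOMIC write under two
strategies. It assumes a hypothetical data-journaling scheme in which every
data block is written once to the journal and once in place, plus a single
commit block; real filesystems differ (metadata updates, checksums,
copy-on-write), so the numbers only illustrate the trend. The function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes written when the filesystem passes the write to the device as one
 * atomic write (hypothetical best case: no extra copies). */
uint64_t bytes_written_atomic(uint64_t len)
{
	return len;
}

/* Bytes written by a hypothetical data-journaling implementation: the data
 * is written to the journal, followed by a commit block, then in place. */
uint64_t bytes_written_journaled(uint64_t len, uint64_t block_size)
{
	assert(block_size > 0);
	uint64_t blocks = (len + block_size - 1) / block_size; /* round up */
	uint64_t data = blocks * block_size;

	return data /* journal copy */ + block_size /* commit block */
	       + data /* in-place copy */;
}

/* Write amplification in percent: device bytes per application byte. */
uint64_t amplification_pct(uint64_t device_bytes, uint64_t app_bytes)
{
	assert(app_bytes > 0);
	return device_bytes * 100 / app_bytes;
}
```

Under these assumptions a 16 KiB write with 4 KiB blocks costs 36 KiB of
device writes (225%) when journaled, versus 16 KiB (100%) when submitted as a
single atomic write.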
References
----------
[1] Xiangyong Ouyang, David Nellans, Robert Wipfel, David Flynn and
Dhabaleswar K. Panda, Beyond block I/O: Rethinking traditional storage
primitives, 2011 IEEE 17th International Symposium on High Performance
Computer Architecture, pp. 301–311, February 2011
(http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.300.4140&rep=rep1&type=pdf).
[2] MariaDB Knowledgebase, Atomic Write Support
(https://mariadb.com/kb/en/library/atomic-write-support/).
[3] Chris Mason, Support for atomic IOs, fsdevel mailing list, November 2013
(https://linux-fsdevel.vger.kernel.narkive.com/ba1zJRo7/patch-0-2-support-for-atomic-ios).
[4] Jonathan Corbet, Atomic I/O Operations, March 2013
(https://lwn.net/Articles/552095/).
[5] Jonathan Corbet, Support for atomic block I/O operations, November 2013
(https://lwn.net/Articles/573092/).