Linux-Fsdevel Archive on lore.kernel.org
* [LSF/MM TOPIC] Atomic Writes
@ 2019-02-11 17:45 Bart Van Assche
From: Bart Van Assche @ 2019-02-11 17:45 UTC
  To: lsf-pc; +Cc: linux-block, linux-fsdevel

Background
----------
Atomic writes are writes that either complete in their entirety or are not
executed at all, even if a power failure occurs. It is well known that using
atomic writes can improve database and filesystem performance significantly
[1,2].
Although the NVMe and SCSI standards support atomic writes, neither the
block layer nor filesystems offer a standardized interface for submitting
atomic writes. Hence the proposal to add block device and filesystem
independent interfaces for atomic writes.

Block Layer Proposal
--------------------
* Block drivers (NVMe, SCSI, ...) that support atomic writes set a queue
  flag that makes it clear to the block layer core that support for atomic
  writes is present.
* Atomic writes are submitted from kernel context by marking individual
  requests as atomic. One possible approach is to introduce a new bio flag.
  Another possible approach is to introduce a new request type, e.g.
  REQ_OP_ATOMIC_WRITE.
* Introduce new limits for atomic writes such that it is guaranteed that
  atomic writes will respect the device atomic write alignment and size
  restrictions. We will probably need limits that correspond to the NAWUN,
  NAWUPF, NABSN, NABO and NABSPF parameters from the NVMe Identify
  Namespace response.
* Kernel code that submits atomic writes is responsible for ensuring that
  the write request size does not exceed the maximum size advertised by the
  request queue. Fail atomic writes that are too large, that are misaligned,
  or that violate the atomic write limits in some other way.
* Add support in blk_stack_limits() for the atomic write limits.
* Allow merging of regular writes with other regular writes. Allow merging
  of atomic writes with other atomic writes. Do not allow merging of regular
  writes with atomic writes. Respect the device limits when merging atomic
  write requests.
* Continue allowing splitting of regular write requests but do not allow
  splitting of atomic writes.
* Make it possible to submit atomic writes from user space. One possible
  approach is to add an O_ATOMIC flag to the open() system call.
* With the O_ATOMIC approach, applications that want to submit both atomic
  and non-atomic writes must open the block device twice: once with and once
  without the O_ATOMIC flag.
* Another possible approach is to add a new flag to the flags argument of
  the pwritev2() system call and to the asynchronous I/O iocb structure.
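
The limit checking, limit stacking and merge rules above could look roughly
like the following user-space C sketch. Everything in it is an assumption made
for illustration: struct atomic_limits, its two fields and the three helper
functions are hypothetical names and not existing kernel interfaces, and only
a maximum size and a power-of-two start alignment are modeled instead of the
full set of NVMe parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue atomic write limits in bytes (not a kernel struct). */
struct atomic_limits {
	uint32_t max_bytes;   /* largest atomic write; 0 means unsupported */
	uint32_t align_bytes; /* required start alignment; power of two or 0 */
};

/* Return true if a write of len bytes at byte offset off satisfies lim. */
static bool atomic_write_ok(const struct atomic_limits *lim,
			    uint64_t off, uint32_t len)
{
	if (lim->max_bytes == 0 || len == 0 || len > lim->max_bytes)
		return false;
	/* An alignment of 0 is treated as "no alignment requirement". */
	if (lim->align_bytes && (off % lim->align_bytes) != 0)
		return false;
	return true;
}

/*
 * Combine the limits of a stacked device (top) and an underlying device
 * (bottom): the smaller maximum size and the stricter alignment win. Taking
 * the larger alignment is only correct because both are powers of two.
 */
static struct atomic_limits stack_atomic_limits(struct atomic_limits top,
						struct atomic_limits bottom)
{
	struct atomic_limits res;

	assert((top.align_bytes & (top.align_bytes - 1)) == 0);
	assert((bottom.align_bytes & (bottom.align_bytes - 1)) == 0);
	res.max_bytes = top.max_bytes < bottom.max_bytes ?
			top.max_bytes : bottom.max_bytes;
	res.align_bytes = top.align_bytes > bottom.align_bytes ?
			  top.align_bytes : bottom.align_bytes;
	return res;
}

/*
 * Merge rule: never mix atomic and regular writes. Two atomic writes may
 * only be merged if the result still fits in the atomic write size limit.
 * The limits that apply to regular writes are not modeled here.
 */
static bool may_merge(bool a_atomic, bool b_atomic,
		      uint32_t a_len, uint32_t b_len,
		      const struct atomic_limits *lim)
{
	if (a_atomic != b_atomic)
		return false;
	if (!a_atomic)
		return true;
	return (uint64_t)a_len + b_len <= lim->max_bytes;
}
```

Since a max_bytes value of 0 stands for "atomic writes not supported" in this
sketch, stacking a device that supports atomic writes on top of one that does
not yields a combined limit of 0, which makes atomic_write_ok() reject every
atomic write.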

Filesystem Proposal
-------------------
* Make it possible to submit atomic writes from user space. Just like for
  block devices, one possible approach is to add an O_ATOMIC flag to the
  open() system call. Another possible approach is to add a new flag to the
  flags arguments of the pwritev2() system call and the asynchronous I/O iocb
  structure. Note: Chris Mason had already proposed in 2013 to introduce the
  O_ATOMIC flag for filesystems [3].
* Filesystems may, but do not have to, submit atomic writes to the block
  layer to implement O_ATOMIC. Using a traditional transaction mechanism to
  implement O_ATOMIC is also fine but will result in write amplification.
* Introduce a standardized interface for querying the filesystem atomic write
  limits, e.g. by adding attributes under /sys/fs/.
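
If the limits were exported as one decimal value per attribute file, an
application could query them with a small helper like the C sketch below. This
is only an illustration under that assumption: read_limit_attr() is a
hypothetical helper, and since no attribute names or directory layout under
/sys/fs/ have been defined, the path is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read one unsigned decimal value from an attribute file.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 * The file format (a single decimal number) is an assumption.
 */
static int read_limit_attr(const char *path, unsigned long *out)
{
	char buf[32];
	char *end;
	unsigned long val;
	FILE *f;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -1;
	*out = val;
	return 0;
}
```

An application would call this once per limit after opening the file and fall
back to its own transaction mechanism if the attribute is missing or if the
reported limit is smaller than the writes it needs to be atomic.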

References
----------
[1] Ouyang, Xiangyong; Nellans, David; Wipfel, Robert; Flynn, David; Panda,
    Dhabaleswar K. (February 2011). "Beyond block I/O: Rethinking traditional
    storage primitives". 2011 IEEE 17th International Symposium on High
    Performance Computer Architecture: 301–311
    (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.300.4140&rep=rep1&type=pdf).
[2] MariaDB Knowledgebase, Atomic Write Support
    (https://mariadb.com/kb/en/library/atomic-write-support/).
[3] Chris Mason, Support for atomic IOs, fsdevel mailing list, November 2013
    (https://linux-fsdevel.vger.kernel.narkive.com/ba1zJRo7/patch-0-2-support-for-atomic-ios).
[4] Jonathan Corbet, Atomic I/O Operations, March 2013
    (https://lwn.net/Articles/552095/).
[5] Jonathan Corbet, Support for atomic block I/O operations, November 2013
    (https://lwn.net/Articles/573092/).
