From: John Garry <john.g.garry@oracle.com>
To: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	martin.petersen@oracle.com, djwong@kernel.org,
	viro@zeniv.linux.org.uk, brauner@kernel.org, dchinner@redhat.com,
	jejb@linux.ibm.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-security-module@vger.kernel.org, paul@paul-moore.com,
	jmorris@namei.org, serge@hallyn.com,
	John Garry <john.g.garry@oracle.com>
Subject: [PATCH RFC 00/16] block atomic writes
Date: Wed,  3 May 2023 18:38:05 +0000
Message-ID: <20230503183821.1473305-1-john.g.garry@oracle.com>

This series introduces a new proposal for implementing atomic writes in the
kernel.

This series takes the approach of adding a new "atomic" flag to each of
pwritev2() and iocb->ki_flags - RWF_ATOMIC and IOCB_ATOMIC, respectively.
When set, these indicate that we want the write issued "atomically". I
have seen a similar flag for pwritev2() touted on the lists previously.

Only direct IO is supported, and only for block devices and xfs.

The atomic writes feature requires dedicated HW support, such as the
SCSI WRITE_ATOMIC_16 command.

The goal here is to provide an interface that allows applications to use
application-specific block sizes larger than the logical block size
reported by the storage device or larger than the filesystem block size as
reported by stat().

With this new interface, application blocks will never be torn or
fractured. On a power failure, for each individual application block,
either all or none of the data is written. A racing atomic write and read
will mean that the read sees all the old data or all the new data, but
never a mix of old and new.

Two new fields are added to struct statx - atomic_write_unit_min and
atomic_write_unit_max. These values are always a power-of-two and
indicate the inclusive min and max block size which the userspace
application may use. The application block size must be a power-of-two.

For each individual atomic write, the total length of the write must be a
multiple of this application block size, and the write must start at a file
offset which is naturally aligned on that block size. Otherwise, the kernel
cannot know the application block size and what sort of splitting into
BIOs is permissible.
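
As a rough illustration of the rules above, a userspace application might
pre-check its arguments along the following lines before issuing a write.
This is only a sketch and not code from the series: the helper names are
hypothetical, unit_min and unit_max stand in for the values reported via
atomic_write_unit_min and atomic_write_unit_max, and rejecting zero-length
writes is an assumption made here.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool is_power_of_two(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical userspace pre-check, not kernel code.
 *
 * offset, len: file offset and total length of the intended write (bytes)
 * app_block:   application block size chosen by the caller (bytes)
 * unit_min, unit_max: stand-ins for the statx atomic_write_unit_min/max
 */
static bool atomic_write_args_ok(uint64_t offset, uint64_t len,
				 uint64_t app_block,
				 uint64_t unit_min, uint64_t unit_max)
{
	/* The application block size must be a power-of-two ... */
	if (!is_power_of_two(app_block))
		return false;

	/* ... within the inclusive [unit_min, unit_max] range. */
	if (app_block < unit_min || app_block > unit_max)
		return false;

	/* Total length must be a multiple of the application block size.
	 * Rejecting len == 0 is an assumption of this sketch. */
	if (len == 0 || (len & (app_block - 1)) != 0)
		return false;

	/* The file offset must be naturally aligned on the block size. */
	if ((offset & (app_block - 1)) != 0)
		return false;

	return true;
}
```

For instance, with assumed limits of 4KB and 64KB, a 64KB write at 96KB
offset with a 32KB application block size passes this check, while the same
write at 48KB offset does not.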

The kernel guarantees that each individual application block is written
atomically. However, there is no guarantee that all the data of a write
spanning multiple blocks is written atomically as a whole.

As an example of usage, for a 32KB application block size, userspace
may request a 64KB write at 96KB offset, which the kernel will submit
to HW as 2x 32KB individual atomic write operations.
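
The arithmetic of this example can be modelled in a few lines of C. The
sketch below is purely illustrative and is not how the block layer is
implemented in this series: struct atomic_chunk and
split_into_app_blocks() are hypothetical names, and the model simply
emits one chunk per application block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one individually atomic operation. */
struct atomic_chunk {
	uint64_t offset;	/* file offset in bytes */
	uint64_t len;		/* length in bytes */
};

/*
 * Model of splitting one write into per-application-block chunks.
 * Returns the number of chunks stored in out[], or 0 if the arguments
 * break the alignment rules or out[] has too few slots.
 */
static size_t split_into_app_blocks(uint64_t offset, uint64_t len,
				    uint64_t app_block,
				    struct atomic_chunk *out, size_t out_cap)
{
	uint64_t n, i;

	/* app_block must be a non-zero power of two. */
	if (app_block == 0 || (app_block & (app_block - 1)) != 0)
		return 0;

	/* len must be a non-zero multiple of app_block, and offset must
	 * be naturally aligned on app_block. */
	if (len == 0 || (len & (app_block - 1)) != 0 ||
	    (offset & (app_block - 1)) != 0)
		return 0;

	n = len / app_block;
	if (n > out_cap)
		return 0;

	for (i = 0; i < n; i++) {
		out[i].offset = offset + i * app_block;
		out[i].len = app_block;
	}

	return (size_t)n;
}
```

Called with offset 96KB, length 64KB and a 32KB application block size,
this model yields two chunks, at 96KB and 128KB, each 32KB long.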

Since xfs uses iomap and extents may be discontiguous, we must
ensure that extents have specific alignments to support atomic writes. For
this, we add a new experimental variant of fallocate for xfs, fallocate2,
which takes an alignment arg and should align any extents on that value.
In practice, it must be the same value as atomic_write_unit_max for the
backing block device. This allows the user to submit atomic writes which
may span multiple discontiguous extents. This does not fully work yet, as
extents may later change and any new extents will not know about this
initial alignment requirement. Another option is to use XFS realtime
volumes, which do allow alignment to be specified via the extsize arg. In
both cases, we should ensure extents are in the written state prior to any
atomic writes.

Kernel support is added for SCSI (sd.c and scsi_debug) and NVMe.

We also have QEMU NVMe support, which we hope to share in the coming days.

We are sending this as an RFC so we can share the code prior to LSFMM.

This series is based on v6.3.

Alan Adamson (1):
  nvme: Support atomic writes

Allison Henderson (1):
  xfs: Add support for fallocate2

Himanshu Madhani (2):
  block: Add atomic write operations to request_queue limits
  block: Add REQ_ATOMIC flag

John Garry (10):
  xfs: Support atomic write for statx
  block: Limit atomic writes according to bio and queue limits
  block: Add bdev_find_max_atomic_write_alignment()
  block: Add support for atomic_write_unit
  block: Add blk_validate_atomic_write_op()
  block: Add fops atomic write support
  fs: iomap: Atomic write support
  scsi: sd: Support reading atomic properties from block limits VPD
  scsi: sd: Add WRITE_ATOMIC_16 support
  scsi: scsi_debug: Atomic write support

Prasad Singamsetty (2):
  fs/bdev: Add atomic write support info to statx
  fs: Add RWF_ATOMIC and IOCB_ATOMIC flags for atomic write support

 Documentation/ABI/stable/sysfs-block |  42 ++
 block/bdev.c                         |  60 +++
 block/bio.c                          |   7 +-
 block/blk-core.c                     |  28 ++
 block/blk-merge.c                    |  84 +++-
 block/blk-settings.c                 |  73 ++++
 block/blk-sysfs.c                    |  33 ++
 block/fops.c                         |  56 ++-
 drivers/nvme/host/core.c             |  33 ++
 drivers/scsi/scsi_debug.c            | 593 +++++++++++++++++++++------
 drivers/scsi/scsi_trace.c            |  22 +
 drivers/scsi/sd.c                    |  54 ++-
 drivers/scsi/sd.h                    |   7 +
 fs/iomap/direct-io.c                 |  72 +++-
 fs/stat.c                            |  10 +
 fs/xfs/Makefile                      |   1 +
 fs/xfs/libxfs/xfs_attr_remote.c      |   2 +-
 fs/xfs/libxfs/xfs_bmap.c             |   9 +-
 fs/xfs/libxfs/xfs_bmap.h             |   4 +-
 fs/xfs/libxfs/xfs_da_btree.c         |   4 +-
 fs/xfs/libxfs/xfs_fs.h               |   1 +
 fs/xfs/xfs_bmap_util.c               |   7 +-
 fs/xfs/xfs_bmap_util.h               |   2 +-
 fs/xfs/xfs_dquot.c                   |   2 +-
 fs/xfs/xfs_file.c                    |  19 +-
 fs/xfs/xfs_fs_staging.c              |  99 +++++
 fs/xfs/xfs_fs_staging.h              |  21 +
 fs/xfs/xfs_ioctl.c                   |   4 +
 fs/xfs/xfs_iomap.c                   |   4 +-
 fs/xfs/xfs_iops.c                    |  10 +
 fs/xfs/xfs_reflink.c                 |   4 +-
 fs/xfs/xfs_rtalloc.c                 |   2 +-
 fs/xfs/xfs_symlink.c                 |   2 +-
 include/linux/blk_types.h            |   4 +
 include/linux/blkdev.h               |  36 ++
 include/linux/fs.h                   |   1 +
 include/linux/stat.h                 |   2 +
 include/scsi/scsi_proto.h            |   1 +
 include/uapi/linux/fs.h              |   5 +-
 include/uapi/linux/stat.h            |   7 +-
 security/security.c                  |   1 +
 tools/include/uapi/linux/fs.h        |   5 +-
 42 files changed, 1257 insertions(+), 176 deletions(-)
 create mode 100644 fs/xfs/xfs_fs_staging.c
 create mode 100644 fs/xfs/xfs_fs_staging.h

-- 
2.31.1

