io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Garry <john.g.garry@oracle.com>
To: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	jejb@linux.ibm.com, martin.petersen@oracle.com,
	djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
	dchinner@redhat.com, jack@suse.cz
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-fsdevel@vger.kernel.org,
	tytso@mit.edu, jbongio@google.com, linux-scsi@vger.kernel.org,
	ojaswin@linux.ibm.com, linux-aio@kvack.org,
	linux-btrfs@vger.kernel.org, io-uring@vger.kernel.org,
	nilay@linux.ibm.com, ritesh.list@gmail.com,
	John Garry <john.g.garry@oracle.com>
Subject: [PATCH v4 00/11] block atomic writes
Date: Mon, 19 Feb 2024 13:00:58 +0000	[thread overview]
Message-ID: <20240219130109.341523-1-john.g.garry@oracle.com> (raw)

This series introduces a proposal to implementing atomic writes in the
kernel for torn-write protection.

This series takes the approach of adding a new "atomic" flag to each of
pwritev2() and iocb->ki_flags - RWF_ATOMIC and IOCB_ATOMIC, respectively.
When set, these indicate that we want the write issued "atomically".

Only direct IO is supported and for block devices here. For this, atomic
write HW is required, like SCSI ATOMIC WRITE (16).

XFS FS support will require rework according to discussion at:
https://lore.kernel.org/linux-fsdevel/20240124142645.9334-1-john.g.garry@oracle.com/T/#m916df899e9d0fb688cdbd415826ae2423306c2e0

The current plan there is to use forcealign feature from the start. This
will take a bit of time to get done.

Updated man pages have been posted at:
https://lore.kernel.org/lkml/20240124112731.28579-1-john.g.garry@oracle.com/T/#m520dca97a9748de352b5a723d3155a4bb1e46456

The goal here is to provide an interface that allows applications use
application-specific block sizes larger than logical block size
reported by the storage device or larger than filesystem block size as
reported by stat().

With this new interface, application blocks will never be torn or
fractured when written. For a power fail, for each individual application
block, all or none of the data to be written. A racing atomic write and
read will mean that the read sees all the old data or all the new data,
but never a mix of old and new.

Three new fields are added to struct statx - atomic_write_unit_min,
atomic_write_unit_max, and atomic_write_segments_max. For each atomic
individual write, the total length of a write must be a between
atomic_write_unit_min and atomic_write_unit_max, inclusive, and a
power-of-2. The write must also be at a natural offset in the file
wrt the write length. For pwritev2, iovcnt is limited by
atomic_write_segments_max.

SCSI sd.c and scsi_debug and NVMe kernel support is added.

This series is based on v6.8-rc5.

Patches can be found at:
https://github.com/johnpgarry/linux/commits/atomic-writes-v6.8-v4

Changes since v3:
- Condense and reorder patches, and also write proper commit messages
- Add patch block fops.c patch to change blkdev_dio_unaligned() callsite
- Re-use block limits in nvme_valid_atomic_write()
- Disallow RWF_ATOMIC for reads
- Add HCH RB tag for blk_queue_get_max_sectors() patch and modify commit
  message for new position

Changes since v2:
- Support atomic_write_segments_max
- Limit atomic write paramaters to max_hw_sectors_kb
- Don't increase fmode_t
- Change value for RWF_ATOMIC
- Various tidying (including advised by Jan)


Alan Adamson (2):
  nvme: Atomic write support
  nvme: Ensure atomic writes will be executed atomically

John Garry (6):
  block: Pass blk_queue_get_max_sectors() a request pointer
  block: Call blkdev_dio_unaligned() from blkdev_direct_IO()
  block: Add core atomic write support
  block: Add fops atomic write support
  scsi: sd: Atomic write support
  scsi: scsi_debug: Atomic write support

Prasad Singamsetty (3):
  fs: Initial atomic write support
  fs: Add initial atomic write support info to statx
  block: Add atomic write support for statx

 Documentation/ABI/stable/sysfs-block |  52 +++
 block/bdev.c                         |  37 +-
 block/blk-merge.c                    |  94 ++++-
 block/blk-mq.c                       |   2 +-
 block/blk-settings.c                 | 103 +++++
 block/blk-sysfs.c                    |  33 ++
 block/blk.h                          |   9 +-
 block/fops.c                         |  43 +-
 drivers/nvme/host/core.c             |  73 ++++
 drivers/scsi/scsi_debug.c            | 589 +++++++++++++++++++++------
 drivers/scsi/scsi_trace.c            |  22 +
 drivers/scsi/sd.c                    |  93 ++++-
 drivers/scsi/sd.h                    |   8 +
 fs/aio.c                             |   8 +-
 fs/btrfs/ioctl.c                     |   2 +-
 fs/read_write.c                      |   2 +-
 fs/stat.c                            |  47 ++-
 include/linux/blk_types.h            |   2 +
 include/linux/blkdev.h               |  65 ++-
 include/linux/fs.h                   |  39 +-
 include/linux/stat.h                 |   3 +
 include/scsi/scsi_proto.h            |   1 +
 include/trace/events/scsi.h          |   1 +
 include/uapi/linux/fs.h              |   5 +-
 include/uapi/linux/stat.h            |   9 +-
 io_uring/rw.c                        |   4 +-
 26 files changed, 1166 insertions(+), 180 deletions(-)

-- 
2.31.1


             reply	other threads:[~2024-02-19 13:02 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-19 13:00 John Garry [this message]
2024-02-19 13:00 ` [PATCH v4 01/11] block: Pass blk_queue_get_max_sectors() a request pointer John Garry
2024-02-19 18:57   ` Keith Busch
2024-02-19 13:01 ` [PATCH v4 02/11] block: Call blkdev_dio_unaligned() from blkdev_direct_IO() John Garry
2024-02-19 18:57   ` Keith Busch
2024-02-20  8:31     ` John Garry
2024-02-20  6:54   ` Christoph Hellwig
2024-02-19 13:01 ` [PATCH v4 03/11] fs: Initial atomic write support John Garry
2024-02-19 19:16   ` David Sterba
2024-02-20  8:13     ` John Garry
2024-02-19 22:44   ` Dave Chinner
2024-02-20  9:52     ` John Garry
2024-02-24 18:16   ` Ritesh Harjani
2024-02-24 18:20     ` Ritesh Harjani
2024-02-26  8:58       ` John Garry
2024-02-26  9:13         ` Ritesh Harjani
2024-02-26  9:46           ` John Garry
2024-02-26  8:51     ` John Garry
2024-02-19 13:01 ` [PATCH v4 04/11] fs: Add initial atomic write support info to statx John Garry
2024-02-19 22:28   ` Dave Chinner
2024-02-20  9:40     ` John Garry
2024-02-20  8:20   ` Christoph Hellwig
2024-02-20  9:01     ` John Garry
2024-02-24 18:46   ` Ritesh Harjani
2024-02-26  9:07     ` John Garry
2024-02-19 13:01 ` [PATCH v4 05/11] block: Add core atomic write support John Garry
2024-02-19 22:58   ` Dave Chinner
2024-02-20  8:22     ` Christoph Hellwig
2024-02-20 10:01     ` John Garry
2024-02-25 12:09   ` Ritesh Harjani
2024-02-25 12:21     ` Ritesh Harjani
2024-02-26  9:23     ` John Garry
2024-02-19 13:01 ` [PATCH v4 06/11] block: Add atomic write support for statx John Garry
2024-02-20  8:29   ` Christoph Hellwig
2024-02-20  9:35     ` John Garry
2024-02-25 14:20   ` Ritesh Harjani
2024-02-26  9:36     ` John Garry
2024-02-19 13:01 ` [PATCH v4 07/11] block: Add fops atomic write support John Garry
2024-02-25 14:46   ` Ritesh Harjani
2024-02-26  9:46     ` John Garry
2024-02-19 13:01 ` [PATCH v4 08/11] scsi: sd: Atomic " John Garry
2024-02-19 13:01 ` [PATCH v4 09/11] scsi: scsi_debug: " John Garry
2024-02-20  7:12   ` Ojaswin Mujoo
2024-02-20  9:01     ` John Garry
2024-02-19 13:01 ` [PATCH v4 10/11] nvme: " John Garry
2024-02-19 19:21   ` Keith Busch
2024-02-20  6:55     ` Christoph Hellwig
2024-02-20  8:19       ` John Garry
2024-02-20  8:31   ` Christoph Hellwig
2024-02-20  8:50     ` John Garry
2024-02-19 13:01 ` [PATCH v4 11/11] nvme: Ensure atomic writes will be executed atomically John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240219130109.341523-1-john.g.garry@oracle.com \
    --to=john.g.garry@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=io-uring@vger.kernel.org \
    --cc=jack@suse.cz \
    --cc=jbongio@google.com \
    --cc=jejb@linux.ibm.com \
    --cc=kbusch@kernel.org \
    --cc=linux-aio@kvack.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=nilay@linux.ibm.com \
    --cc=ojaswin@linux.ibm.com \
    --cc=ritesh.list@gmail.com \
    --cc=sagi@grimberg.me \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).