linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: John Garry <john.g.garry@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>,
	axboe@kernel.dk, kbusch@kernel.org, sagi@grimberg.me,
	jejb@linux.ibm.com, martin.petersen@oracle.com,
	djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
	chandan.babu@oracle.com, dchinner@redhat.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, jbongio@google.com,
	linux-api@vger.kernel.org
Subject: Re: [PATCH 17/21] fs: xfs: iomap atomic write support
Date: Mon, 4 Dec 2023 23:55:39 -0500	[thread overview]
Message-ID: <20231205045539.GH509422@mit.edu> (raw)
In-Reply-To: <a87d48a7-f2a8-40ae-8d9b-e4534ccc29b1@oracle.com>

On Mon, Dec 04, 2023 at 03:19:15PM +0000, John Garry wrote:
> > 
> > What is the 'dubious amazon torn-write prevention'?
> 
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-twp.html
> 
> AFAICS, this is without any kernel changes, so no guarantee of unwanted
> splitting or merging of bios.

Well, more than one company has audited the kernel paths, and it turns
out that for selected Kernel versions, after doing desk-check
verification of the relevant kernel baths, as well as experimental
verification via testing to try to find torn writes in the kernel, we
can make it safe for specific kernel versions which might be used in
hosted MySQL instances where we control the kernel, the mysql server,
and the emulated block device (and we know the database is doing
Direct I/O writes --- this won't work for PostgreSQL).  I gave a talk
about this at Google I/O Next '18, five years ago[1].

[1] https://www.youtube.com/watch?v=gIeuiGg-_iw

Given the performance gains (see the talk (see the comparison of the
at time 19:31 and at 29:57) --- it's quite compelling. 

Of course, I wouldn't recommend this approach for a naive sysadmin,
since most database adminsitrators won't know how to audit kernel code
(see the discussion at time 35:10 of the video), and reverify the
entire software stack before every kernel upgrade.  The challenge is
how to do this safely.

The fact remains that both Amazon's EBS and Google's Persistent Disk
products are implemented in such a way that writes will not be torn
below the virtual machine, and the guarantees are in fact quite a bit
stronger than what we will probably end up advertising via NVMe and/or
SCSI.  It wouldn't surprise me if this is the case (or could be made
to be the case) For Oracle Cloud as well.

The question is how to make this guarantee so that the kernel knows
when various cloud-provided block devicse do provide these greater
guarantees, and then how to make it be an architected feature, as
opposed to a happy implementation detail that has to be verified at
every kernel upgrade.

Cheers,

						- Ted

  parent reply	other threads:[~2023-12-05  4:58 UTC|newest]

Thread overview: 124+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-29 10:27 [PATCH 00/21] block atomic writes John Garry
2023-09-29 10:27 ` [PATCH 01/21] block: Add atomic write operations to request_queue limits John Garry
2023-10-03 16:40   ` Bart Van Assche
2023-10-04  3:00     ` Martin K. Petersen
2023-10-04 17:28       ` Bart Van Assche
2023-10-04 18:26         ` Martin K. Petersen
2023-10-04 21:00       ` Bart Van Assche
2023-10-05  8:22         ` John Garry
2023-11-09 15:10   ` Christoph Hellwig
2023-11-09 17:01     ` John Garry
2023-11-10  6:23       ` Christoph Hellwig
2023-11-10  9:04         ` John Garry
2023-09-29 10:27 ` [PATCH 02/21] block: Limit atomic writes according to bio and queue limits John Garry
2023-11-09 15:13   ` Christoph Hellwig
2023-11-09 17:41     ` John Garry
2023-12-04  3:19   ` Ming Lei
2023-12-04  3:55     ` Ming Lei
2023-12-04  9:35       ` John Garry
2023-09-29 10:27 ` [PATCH 03/21] fs/bdev: Add atomic write support info to statx John Garry
2023-09-29 22:49   ` Eric Biggers
2023-10-01 13:23     ` Bart Van Assche
2023-10-02  9:51       ` John Garry
2023-10-02 18:39         ` Bart Van Assche
2023-10-03  0:28           ` Martin K. Petersen
2023-11-09 15:15             ` Christoph Hellwig
2023-10-03  1:51         ` Dave Chinner
2023-10-03  2:57           ` Darrick J. Wong
2023-10-03  7:23             ` John Garry
2023-10-03 15:46               ` Darrick J. Wong
2023-10-04 14:19                 ` John Garry
2023-09-29 10:27 ` [PATCH 04/21] fs: Add RWF_ATOMIC and IOCB_ATOMIC flags for atomic write support John Garry
2023-10-06 18:15   ` Jeremy Bongio
2023-10-09 22:02     ` Dave Chinner
2023-09-29 10:27 ` [PATCH 05/21] block: Add REQ_ATOMIC flag John Garry
2023-09-29 10:27 ` [PATCH 06/21] block: Pass blk_queue_get_max_sectors() a request pointer John Garry
2023-09-29 10:27 ` [PATCH 07/21] block: Limit atomic write IO size according to atomic_write_max_sectors John Garry
2023-09-29 10:27 ` [PATCH 08/21] block: Error an attempt to split an atomic write bio John Garry
2023-09-29 10:27 ` [PATCH 09/21] block: Add checks to merging of atomic writes John Garry
2023-09-30 13:40   ` kernel test robot
2023-10-02 22:50     ` Nathan Chancellor
2023-10-04 11:40       ` John Garry
2023-09-29 10:27 ` [PATCH 10/21] block: Add fops atomic write support John Garry
2023-09-29 17:51   ` Bart Van Assche
2023-10-02 10:10     ` John Garry
2023-10-02 19:12       ` Bart Van Assche
2023-10-03  0:48         ` Martin K. Petersen
2023-10-03 16:55           ` Bart Van Assche
2023-10-04  2:53             ` Martin K. Petersen
2023-10-04 17:22               ` Bart Van Assche
2023-10-04 18:17                 ` Martin K. Petersen
2023-10-05 17:10                   ` Bart Van Assche
2023-10-05 22:36                     ` Dave Chinner
2023-10-05 22:58                       ` Bart Van Assche
2023-10-06  4:31                         ` Dave Chinner
2023-10-06 17:22                           ` Bart Van Assche
2023-10-07  1:21                             ` Martin K. Petersen
2023-10-03  8:37         ` John Garry
2023-10-03 16:45           ` Bart Van Assche
2023-10-04  9:14             ` John Garry
2023-10-04 17:34               ` Bart Van Assche
2023-10-04 21:59                 ` Dave Chinner
2023-12-04  2:30   ` Ming Lei
2023-12-04  9:27     ` John Garry
2023-12-04 12:18       ` Ming Lei
2023-12-04 13:13         ` John Garry
2023-12-05  1:45           ` Ming Lei
2023-12-05 10:49             ` John Garry
2023-09-29 10:27 ` [PATCH 11/21] fs: xfs: Don't use low-space allocator for alignment > 1 John Garry
2023-10-03  1:16   ` Dave Chinner
2023-10-03  3:00     ` Darrick J. Wong
2023-10-03  4:34       ` Dave Chinner
2023-10-03 10:22       ` John Garry
2023-09-29 10:27 ` [PATCH 12/21] fs: xfs: Introduce FORCEALIGN inode flag John Garry
2023-11-09 15:24   ` Christoph Hellwig
2023-09-29 10:27 ` [PATCH 13/21] fs: xfs: Make file data allocations observe the 'forcealign' flag John Garry
2023-10-03  1:42   ` Dave Chinner
2023-10-03 10:13     ` John Garry
2023-09-29 10:27 ` [PATCH 14/21] fs: xfs: Enable file data forcealign feature John Garry
2023-09-29 10:27 ` [PATCH 15/21] fs: xfs: Support atomic write for statx John Garry
2023-10-03  3:32   ` Dave Chinner
2023-10-03 10:56     ` John Garry
2023-10-03 16:10       ` Darrick J. Wong
2023-09-29 10:27 ` [PATCH 16/21] fs: iomap: Atomic write support John Garry
2023-10-03  4:24   ` Dave Chinner
2023-10-03 12:55     ` John Garry
2023-10-03 16:47     ` Darrick J. Wong
2023-10-04  1:16       ` Dave Chinner
2023-10-24 12:59     ` John Garry
2023-09-29 10:27 ` [PATCH 17/21] fs: xfs: iomap atomic " John Garry
2023-11-09 15:26   ` Christoph Hellwig
2023-11-10 10:42     ` John Garry
2023-11-28  8:56       ` John Garry
2023-11-28 13:56         ` Christoph Hellwig
2023-11-28 17:42           ` John Garry
2023-11-29  2:45             ` Martin K. Petersen
2023-12-04 13:45             ` Christoph Hellwig
2023-12-04 15:19               ` John Garry
2023-12-04 15:39                 ` Christoph Hellwig
2023-12-04 18:06                   ` John Garry
2023-12-05  4:55                 ` Theodore Ts'o [this message]
2023-12-05 11:09                   ` John Garry
2023-12-05 13:59                 ` Ming Lei
2023-09-29 10:27 ` [PATCH 18/21] scsi: sd: Support reading atomic properties from block limits VPD John Garry
2023-09-29 17:54   ` Bart Van Assche
2023-10-02 11:27     ` John Garry
2023-10-06 17:52       ` Bart Van Assche
2023-10-06 23:48         ` Martin K. Petersen
2023-09-29 10:27 ` [PATCH 19/21] scsi: sd: Add WRITE_ATOMIC_16 support John Garry
2023-09-29 17:59   ` Bart Van Assche
2023-10-02 11:36     ` John Garry
2023-10-02 19:21       ` Bart Van Assche
2023-09-29 10:27 ` [PATCH 20/21] scsi: scsi_debug: Atomic write support John Garry
2023-09-29 10:27 ` [PATCH 21/21] nvme: Support atomic writes John Garry
     [not found]   ` <CGME20231004113943eucas1p23a51ce5ef06c36459f826101bb7b85fc@eucas1p2.samsung.com>
2023-10-04 11:39     ` Pankaj Raghav
2023-10-05 10:24       ` John Garry
2023-10-05 13:32         ` Pankaj Raghav
2023-10-05 15:05           ` John Garry
2023-11-09 15:36   ` Christoph Hellwig
2023-11-09 15:42     ` Matthew Wilcox
2023-11-09 15:46       ` Christoph Hellwig
2023-11-09 19:08         ` John Garry
2023-11-10  6:29           ` Christoph Hellwig
2023-11-10  8:44             ` John Garry
2023-09-29 14:58 ` [PATCH 00/21] block " Bart Van Assche

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231205045539.GH509422@mit.edu \
    --to=tytso@mit.edu \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jbongio@google.com \
    --cc=jejb@linux.ibm.com \
    --cc=john.g.garry@oracle.com \
    --cc=kbusch@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=sagi@grimberg.me \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).