linux-nvme.lists.infradead.org archive mirror
From: John Garry <john.g.garry@oracle.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	jejb@linux.ibm.com, martin.petersen@oracle.com,
	djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
	dchinner@redhat.com, jack@suse.cz, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com,
	linux-scsi@vger.kernel.org, ojaswin@linux.ibm.com,
	linux-aio@kvack.org, linux-btrfs@vger.kernel.org,
	io-uring@vger.kernel.org, nilay@linux.ibm.com,
	ritesh.list@gmail.com
Subject: Re: [PATCH v5 00/10] block atomic writes
Date: Wed, 6 Mar 2024 09:05:55 +0000
Message-ID: <47d264c2-bc97-4313-bce0-737557312106@oracle.com>
In-Reply-To: <ZeembVG-ygFal6Eb@casper.infradead.org>

On 05/03/2024 23:10, Matthew Wilcox wrote:
> On Mon, Feb 26, 2024 at 05:36:02PM +0000, John Garry wrote:
>> This series introduces a proposal to implementing atomic writes in the
>> kernel for torn-write protection.
> 
> The API as documented will be unnecessarily complicated to implement
> for buffered writes, I believe.  What I would prefer is a chattr (or, I
> guess, setxattr these days) that sets the tearing boundary for the file.
> The page cache can absorb writes of arbitrary size and alignment, but
> will be able to guarantee that (if the storage supports it), the only
> write tearing will happen on the specified boundary.

In the "block atomic writes for XFS" series which I sent on Monday, we
do use setxattr to set the extent alignment for an inode. It is not a
tearing boundary; rather, it effectively sets the max atomic write
size for the inode. This extent size must be a power-of-2. From this we
can support atomic write sizes in the range [FS block size, extent size]
for direct IO.
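
As a rough illustration of that size rule, here is a minimal sketch, not
code from either series. The helper names are hypothetical, and the
requirement that the write length itself be a power-of-2 is an assumption
on top of what is stated above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if n is a non-zero power of two. */
static bool is_power_of_2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical check: could a direct IO write of 'len' bytes be issued
 * atomically on an inode with the given FS block size and extent size?
 * The supported range is [fs_block_size, extent_size].
 */
static bool atomic_write_len_ok(uint64_t len, uint64_t fs_block_size,
				uint64_t extent_size)
{
	/* Stated above: the extent size must be a power-of-2. */
	if (!is_power_of_2(extent_size))
		return false;
	/* Assumption: the write length must also be a power-of-2. */
	if (!is_power_of_2(len))
		return false;
	return len >= fs_block_size && len <= extent_size;
}
```

With a 4kB FS block size and a 16kB extent size, this accepts 4kB, 8kB
and 16kB writes and rejects 2kB, 12kB and 32kB writes.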

For the bdev file operations atomic write support for direct IO in this
series, the atomic write size is limited only by HW support.

> 
> We _can_ support arbitrary power-of-two write sizes to the page cache,
> but if the requirement is no tearing inside a single write, then we
> will have to do a lot of work to make that true.  It isn't clear to me
> that anybody is asking for this; the databases I'm aware of are willing
> to submit 128kB writes and accept that there may be tearing at 16kB
> boundaries (or whatever).

In this case, I would expect the DB to submit 8x separate 16kB writes.
However, if we advertise a range of supported sizes, userspace is
entitled to use that range, i.e. it could submit a single 128kB write,
if supported.
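
To make the 8x 16kB case concrete, here is a sketch of how userspace
might split one 128kB request into untorn units. The struct and function
names are hypothetical, and requiring the start offset to be aligned to
the unit is an assumption. Each resulting chunk would then be submitted
as its own write with RWF_ATOMIC set, which is not shown here:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one write that must not be torn. */
struct write_chunk {
	uint64_t pos;	/* file offset of this chunk */
	size_t len;	/* length of this chunk, always one unit */
};

/*
 * Split [pos, pos + len) into 'unit'-sized chunks.
 * Returns the number of chunks stored in 'out', or 0 if the request
 * cannot be split evenly or does not fit in 'max_chunks' entries.
 */
static size_t split_into_units(uint64_t pos, size_t len, size_t unit,
			       struct write_chunk *out, size_t max_chunks)
{
	size_t n, i;

	/* Assumption: both offset and length are multiples of the unit. */
	if (unit == 0 || len == 0 || len % unit != 0 || pos % unit != 0)
		return 0;

	n = len / unit;
	if (n > max_chunks)
		return 0;

	for (i = 0; i < n; i++) {
		out[i].pos = pos + (uint64_t)i * unit;
		out[i].len = unit;
	}
	return n;
}
```

For a 128kB request at offset 0 with a 16kB unit, this yields 8 chunks,
the last one starting at offset 112kB.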

As for supporting buffered atomic writes, the very simplest solution for
regular FS files is to fix the atomic write min and max size at the
extent size, above. Indeed, that might solve most or even all use cases.
This is effectively the same as your idea to set a boundary size, except
that userspace must submit individual 16kB writes for the above example.
As for bdev file operations, extent size is not a thing, so that is
still a problem.

Having said all this, from the discussion "[LSF/MM/BPF TOPIC] untorn
buffered writes", I was hearing that we can use a high-order folio for
RWF_ATOMIC data and that it would be just a matter of implementing
support in the page cache, like dealing with already-present overlapping
smaller folios. Is implementing this now the concern?

Thanks,
John







