linux-ext4.vger.kernel.org archive mirror
From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
To: linux-ext4@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>
Cc: Ritesh Harjani <ritesh.list@gmail.com>,
	linux-kernel@vger.kernel.org,
	"Darrick J . Wong" <djwong@kernel.org>,
	linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	John Garry <john.g.garry@oracle.com>,
	dchinner@redhat.com
Subject: [RFC 0/7] ext4: Allocator changes for atomic write support with DIO
Date: Thu, 30 Nov 2023 19:23:08 +0530	[thread overview]
Message-ID: <cover.1701339358.git.ojaswin@linux.ibm.com> (raw)

This patch series builds on top of John Garry's atomic direct write
patch series [1] and enables this support in ext4. This is a two-step
process:

1. Enable aligned allocation in ext4 mballoc. This allows us to allocate
power-of-2 aligned physical blocks, which is needed for atomic writes.

2. Hook the direct IO path in ext4 to use aligned allocation to obtain
physical blocks at a given alignment, which is needed for atomic IO. If
for any reason we are not able to obtain blocks at the given alignment,
we fail the atomic write.
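As a rough illustration of the two steps, assuming the alignment is expressed in FS blocks and is a power of 2: the mballoc side can round its goal block up to the alignment before searching, and the DIO side can check the returned extent and fail the atomic write if it is misaligned or too short. The helper names and the -EINVAL return value below are hypothetical and are not taken from the patches.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Step 1 (allocator side): round a goal physical block up to the next
 * multiple of align_blks, which must be a power of two, so the search
 * for free space starts at a candidate that satisfies the alignment.
 */
static uint64_t align_goal_up(uint64_t goal, uint64_t align_blks)
{
	return (goal + align_blks - 1) & ~(align_blks - 1);
}

/*
 * Step 2 (DIO side): the allocated extent must start on a multiple of
 * the atomic write size (in blocks) and cover the whole write.
 * Returns 0 if usable, -EINVAL (an assumed error code) if the atomic
 * write should be failed instead.
 */
static int check_atomic_extent(uint64_t pblk, uint64_t len_blks,
			       uint64_t want_blks)
{
	if (!is_pow2(want_blks))
		return -EINVAL;
	if (pblk & (want_blks - 1))	/* start not aligned */
		return -EINVAL;
	if (len_blks < want_blks)	/* extent too short */
		return -EINVAL;
	return 0;
}
```

For example, with a 4-block atomic write a goal of block 5 is rounded up to block 8, and an extent starting at block 18 would be rejected.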

Currently this RFC does not impose any restrictions on atomic and non-atomic
allocations for any inode, which also leaves policy decisions to user space
as much as possible. So, for example, user space can:

 * Do an atomic direct IO at any alignment and size, provided it
   satisfies the underlying device constraints. The only restriction for
   now is that the length must be a power of 2 and at least the FS block
   size.

 * Do any combination of non-atomic and atomic writes on the same file
   in any order. As long as user space passes the RWF_ATOMIC flag to
   pwritev2(), the write is guaranteed to be atomic (or to fail if that
   is not possible).
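A minimal userspace sketch of the above, assuming glibc's pwritev2() and an fd opened with O_DIRECT. RWF_ATOMIC may be missing from installed headers; the fallback value 0x00000040 is an assumption (it is the value used by later mainline kernels) and may not match the series in [1]. atomic_len_ok() and atomic_pwrite() are hypothetical helpers that only encode the length restriction stated above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Assumed value; check it against the kernel/headers actually in use. */
#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040
#endif

/* Length must be a power of 2 and at least the FS block size. */
static bool atomic_len_ok(size_t len, size_t fs_block_size)
{
	return len >= fs_block_size && (len & (len - 1)) == 0;
}

/*
 * Issue one atomic direct write. fd is expected to be opened with
 * O_DIRECT, and buf/off must meet the usual DIO alignment rules.
 * Returns bytes written, or -errno if the write was rejected.
 */
static ssize_t atomic_pwrite(int fd, const void *buf, size_t len,
			     off_t off, size_t fs_block_size)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret;

	if (!atomic_len_ok(len, fs_block_size))
		return -EINVAL;	/* reject locally, no syscall issued */
	ret = pwritev2(fd, &iov, 1, off, RWF_ATOMIC);
	return ret < 0 ? -errno : ret;
}
```

A kernel that does not know the flag rejects it with -EOPNOTSUPP, so a negative return here means the write was not performed atomically; whether to retry without RWF_ATOMIC stays a user-space policy decision.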

Some TODOs remain on the allocator side, such as:

1.  Fall back to the original request size when allocating the normalized
    request size (due to preallocation) is not possible.
2.  Testing some edge cases.
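A sketch of the first TODO, assuming a hypothetical allocator callback that either returns an aligned extent of exactly the requested length or fails. toy_alloc() models free space as a single run of blocks purely so the fallback logic can be exercised; none of these names come from mballoc.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical allocator callback: 0 and *pblk set on success. */
typedef int (*alloc_fn)(uint64_t len_blks, uint64_t align_blks,
			uint64_t *pblk, void *ctx);

/* Toy model for illustration: free space is one contiguous run. */
struct toy_fs {
	uint64_t free_start;	/* first free block */
	uint64_t free_len;	/* length of the free run in blocks */
};

static int toy_alloc(uint64_t len_blks, uint64_t align_blks,
		     uint64_t *pblk, void *ctx)
{
	struct toy_fs *fs = ctx;
	/* First aligned block inside the free run. */
	uint64_t start = (fs->free_start + align_blks - 1) & ~(align_blks - 1);

	if (start + len_blks > fs->free_start + fs->free_len)
		return -ENOSPC;
	*pblk = start;
	return 0;
}

/*
 * Try the normalized (preallocation-enlarged) length first; if no
 * aligned extent of that size exists, retry with the original request
 * size before giving up on the atomic write.
 */
static int alloc_aligned_with_fallback(alloc_fn alloc, void *ctx,
				       uint64_t orig_len, uint64_t norm_len,
				       uint64_t align_blks, uint64_t *pblk,
				       uint64_t *got_len)
{
	int ret = alloc(norm_len, align_blks, pblk, ctx);

	if (ret == 0) {
		*got_len = norm_len;
		return 0;
	}
	if (norm_len == orig_len)
		return ret;	/* nothing smaller to fall back to */
	ret = alloc(orig_len, align_blks, pblk, ctx);
	if (ret == 0)
		*got_len = orig_len;
	return ret;
}
```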

However, since all the basic test scenarios are covered, we wanted to get
this RFC out for discussion on atomic write support for DIO in ext4.

Further points for discussion:

1. We might need an inode flag to indicate that the inode has atomically
allocated blocks/extents, so that other userspace tools do not move the
inode's blocks, e.g. during resize or fsck.
  a. Should the inode be marked as atomic, similar to how we have the
  IS_DAX(inode) implementation? Any thoughts?

2. Should there be support for an open flag like O_ATOMIC, so that if a
user wants to do only atomic writes to an open fd, all writes to it are
treated as atomic?

3. Do we need any feature compat flags for the FS? IMO it doesn't look
like it: even if some block allocations were done atomically, that should
not matter to the FS w.r.t. compatibility.

4. Aligned allocations are mostly required when we are not in data=journal
mode. So should we return -EIO for DIO requests in data journalling mode?
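For point 4, the proposed check could look roughly like the sketch below. For context, ext4 currently serves DIO on inodes with journalled data through the buffered path (ext4_dio_supported() returns false for them), so such a write would not go through the aligned allocation at all. The struct and function names are hypothetical, and -EIO is the value suggested above, not a settled choice.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the state the decision depends on. */
struct dio_req {
	bool atomic;		/* RWF_ATOMIC was passed */
	bool data_journalling;	/* inode is in data=journal mode */
};

/*
 * Reject atomic DIO up front in data journalling mode instead of
 * letting it fall back to a path that cannot guarantee alignment.
 * Non-atomic requests are left to the existing behaviour.
 */
static int check_atomic_dio_mode(const struct dio_req *req)
{
	if (req->atomic && req->data_journalling)
		return -EIO;
	return 0;
}
```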

A script to test this using pwritev2() can be found here:
https://gist.github.com/OjaswinM/e67accee3cbb7832bd3f1a9543c01da9

Regards,
ojaswin

[1] https://lore.kernel.org/linux-fsdevel/20230929102726.2985188-1-john.g.garry@oracle.com


Ojaswin Mujoo (7):
  iomap: Don't fall back to buffered write if the write is atomic
  ext4: Factor out size and start prediction from
    ext4_mb_normalize_request()
  ext4: add aligned allocation support in mballoc
  ext4: allow inode preallocation for aligned alloc
  block: export blkdev_atomic_write_valid() and refactor api
  ext4: Add aligned allocation support for atomic direct io
  ext4: Support atomic write for statx

 block/fops.c                |  18 ++-
 fs/ext4/ext4.h              |  10 +-
 fs/ext4/extents.c           |  14 ++
 fs/ext4/file.c              |  49 ++++++
 fs/ext4/inode.c             | 142 ++++++++++++++++-
 fs/ext4/mballoc.c           | 302 +++++++++++++++++++++++++-----------
 fs/iomap/direct-io.c        |   8 +-
 include/linux/blkdev.h      |   2 +
 include/trace/events/ext4.h |   2 +
 9 files changed, 442 insertions(+), 105 deletions(-)

-- 
2.39.3


