* [PATCH 0/5] btrfs-progs: use direct-IO for zoned device
@ 2021-09-27  4:15 Naohiro Aota
  2021-09-27  4:15 ` [PATCH 1/5] btrfs-progs: mkfs: do not set zone size on non-zoned mode Naohiro Aota
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Naohiro Aota @ 2021-09-27  4:15 UTC
  To: linux-btrfs; +Cc: David Sterba, Naohiro Aota

As discussed in the Zoned Storage page [1], the kernel page cache does not
guarantee that cached dirty pages will be flushed to a block device in
sequential sector order. Thus, we must use O_DIRECT when writing to a zoned
device to ensure the write ordering.

[1] https://zonedstorage.io/linux/overview/#zbd-support-restrictions

As a write buffer is embedded in some other struct (e.g., "char data[]" in
struct extent_buffer), it is difficult to allocate the struct so that the
write buffer is aligned.

This series introduces btrfs_{pread,pwrite} to wrap around pread/pwrite.
The wrappers allocate an aligned bounce buffer, copy the buffer contents,
and proceed with the IO. In addition, a zoned device is now opened with
O_DIRECT.
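
To illustrate the bounce-buffer idea on the write side, here is a minimal
sketch. It is not the code from the patches: the name sketch_pwrite_bounce
and the fixed SKETCH_ALIGN value are assumptions made for the example, and
real code would query the block size of the device instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical alignment; real code would ask the device for its block size. */
#define SKETCH_ALIGN 4096

/*
 * Write count bytes from a possibly unaligned buf at offset, going through
 * an aligned bounce buffer. Returns the pwrite() result, or -1 with errno
 * set when the request cannot be served.
 */
static ssize_t sketch_pwrite_bounce(int fd, const void *buf, size_t count,
				    off_t offset)
{
	void *bounce = NULL;
	ssize_t ret;
	int err;

	assert(buf != NULL);

	if (count == 0)
		return 0;

	/*
	 * O_DIRECT also wants an aligned length (and offset). A bounce
	 * buffer fixes the memory address only, so reject other lengths.
	 */
	if (count % SKETCH_ALIGN != 0) {
		errno = EINVAL;
		return -1;
	}

	err = posix_memalign(&bounce, SKETCH_ALIGN, count);
	if (err) {
		errno = err;
		return -1;
	}

	/* Copy into the aligned buffer, then issue the actual write. */
	memcpy(bounce, buf, count);
	ret = pwrite(fd, bounce, count, offset);

	/* Preserve errno from pwrite() across free(). */
	err = errno;
	free(bounce);
	errno = err;
	return ret;
}
```

The extra allocation and memcpy() per call are the cost that the following
paragraphs try to avoid when the file is not opened with O_DIRECT.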

Since the allocation and copying are costly, it is better to do them only
when necessary. But it is cumbersome to call fcntl(F_GETFL) on every IO to
determine whether the file is opened with O_DIRECT.
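
For reference, the per-IO check that the series avoids would look roughly
like the sketch below. The helper name sketch_fd_is_direct is hypothetical
and does not appear in the patches; fcntl(F_GETFL) and O_DIRECT are the
standard Linux interfaces.

```c
#define _GNU_SOURCE /* O_DIRECT is only exposed by glibc with _GNU_SOURCE */
#include <assert.h> /* only needed by callers that want to assert on the result */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if fd has O_DIRECT set, 0 if it does not, -1 if fcntl() fails. */
static int sketch_fd_is_direct(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return (flags & O_DIRECT) ? 1 : 0;
}
```

This is one extra system call for every read or write, which is why a flag
that is already at hand is preferable.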

As a zoned device forces the use of zoned btrfs, I decided to use the zoned
flag to determine whether an IO is direct-IO or not. This can cause a false
positive (using the bounce buffer when a file is *not* opened with O_DIRECT)
in the case of emulated zoned mode on a non-zoned device or a regular file.
Considering that the emulated zoned mode is mostly for debugging or testing,
I believe this is acceptable.
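
A minimal sketch of dispatching on the zoned flag, shown for the read side.
The function name sketch_pread, the bool parameter, and SKETCH_ALIGN are
assumptions made for the example and are not taken from the patches. In the
false-positive case the result is still correct; only the extra allocation
and copy are wasted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical alignment; real code would ask the device for its block size. */
#define SKETCH_ALIGN 4096

/*
 * Read count bytes at offset. When zoned is set, the fd is assumed to be
 * opened with O_DIRECT and the read goes through an aligned bounce buffer
 * whose size is count rounded up to SKETCH_ALIGN. The caller is expected
 * to pass an aligned offset in that case.
 */
static ssize_t sketch_pread(int fd, void *buf, size_t count, off_t offset,
			    bool zoned)
{
	void *bounce = NULL;
	size_t iosize;
	ssize_t ret;
	int err;

	assert(buf != NULL);

	/* Fast path: no O_DIRECT expected, so no alignment constraints. */
	if (!zoned || count == 0)
		return pread(fd, buf, count, offset);

	iosize = (count + SKETCH_ALIGN - 1) / SKETCH_ALIGN * SKETCH_ALIGN;
	err = posix_memalign(&bounce, SKETCH_ALIGN, iosize);
	if (err) {
		errno = err;
		return -1;
	}

	ret = pread(fd, bounce, iosize, offset);
	if (ret > 0) {
		/* Hand back only what the caller asked for. */
		if ((size_t)ret > count)
			ret = (ssize_t)count;
		memcpy(buf, bounce, (size_t)ret);
	}

	/* Preserve errno from pread() across free(). */
	err = errno;
	free(bounce);
	errno = err;
	return ret;
}
```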

Patch 1 is a preparation patch that avoids setting an emulated zone_size
value when it is not needed.

Patches 2 and 3 wrap pwrite and pread with the newly introduced functions
btrfs_pwrite and btrfs_pread, respectively.

Patch 4 deals with the zoned flag while reading the initial trees.

Patch 5 finally opens a zoned device with O_DIRECT.

Naohiro Aota (5):
  btrfs-progs: mkfs: do not set zone size on non-zoned mode
  btrfs-progs: introduce btrfs_pwrite wrapper for pwrite
  btrfs-progs: introduce btrfs_pread wrapper for pread
  btrfs-progs: temporally set zoned flag for initial tree reading
  btrfs-progs: use direct-IO for zoned device

 common/device-utils.c     | 76 ++++++++++++++++++++++++++++++++++++---
 common/device-utils.h     | 29 ++++++++++++++-
 kernel-shared/disk-io.c   | 19 +++++++++-
 kernel-shared/extent_io.c | 14 +++++---
 kernel-shared/volumes.c   |  4 +++
 kernel-shared/zoned.c     |  6 ++--
 mkfs/common.c             | 14 +++++---
 mkfs/main.c               | 12 +++++--
 8 files changed, 153 insertions(+), 21 deletions(-)

-- 
2.33.0



end of thread, other threads:[~2021-10-05  6:11 UTC | newest]

Thread overview: 19+ messages
2021-09-27  4:15 [PATCH 0/5] btrfs-progs: use direct-IO for zoned device Naohiro Aota
2021-09-27  4:15 ` [PATCH 1/5] btrfs-progs: mkfs: do not set zone size on non-zoned mode Naohiro Aota
2021-09-27  9:19   ` Johannes Thumshirn
2021-09-27  4:15 ` [PATCH 2/5] btrfs-progs: introduce btrfs_pwrite wrapper for pwrite Naohiro Aota
2021-09-27  9:39   ` Johannes Thumshirn
2021-09-27  4:15 ` [PATCH 3/5] btrfs-progs: introduce btrfs_pread wrapper for pread Naohiro Aota
2021-09-27 10:23   ` Johannes Thumshirn
2021-09-27 18:41     ` David Sterba
2021-09-27  4:15 ` [PATCH 4/5] btrfs-progs: temporally set zoned flag for initial tree reading Naohiro Aota
2021-09-27 12:38   ` Johannes Thumshirn
2021-09-27  4:15 ` [PATCH 5/5] btrfs-progs: use direct-IO for zoned device Naohiro Aota
2021-09-27 18:48   ` David Sterba
2021-09-27 19:26 ` [PATCH 0/5] " David Sterba
2021-09-29  2:21   ` Naohiro Aota
2021-09-29 10:16     ` David Sterba
2021-09-27 21:51 ` David Sterba
2021-09-29  2:24   ` Naohiro Aota
2021-09-29 10:22     ` David Sterba
2021-10-05  6:11       ` Naohiro Aota
