From: Christoph Hellwig <hch@infradead.org>
To: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: linux-btrfs@vger.kernel.org, hch@infradead.org,
darrick.wong@oracle.com, fdmanana@kernel.org, nborisov@suse.com,
dsterba@suse.cz, jthumshirn@suse.de,
linux-fsdevel@vger.kernel.org,
Goldwyn Rodrigues <rgoldwyn@suse.com>
Subject: Re: [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio
Date: Sat, 21 Dec 2019 06:42:26 -0800
Message-ID: <20191221144226.GA25804@infradead.org>
In-Reply-To: <20191213195750.32184-5-rgoldwyn@suse.de>
[-- Attachment #1: Type: text/plain, Size: 580 bytes --]
So I looked into the "unlocked" direct I/O case, and I think the current
code using dio_sem is really sketchy. What btrfs really needs to do is
take i_rwsem shared by default for direct writes, and only upgrade to
the exclusive lock when needed, similar to xfs and the WIP ext4 code.
While looking into that I also noticed two other things:
- check_direct_IO looks pretty bogus
- btrfs_direct_IO really should be split and folded into the two
callers
Untested patches attached. The first should probably go into a prep
patch, and the second could be folded into this one.
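To illustrate the "shared by default, exclusive only when needed" idea, here is a minimal userspace sketch of the lock-mode decision. The names toy_lock_mode and toy_dio_write_lock_mode are hypothetical and not part of btrfs or the VFS. The only condition modeled is the one from the comment in the second patch (a write ending past EOF has to update the file size); a real implementation would likely need further cases, and the actual locking is left out.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical names for illustration only; not btrfs or VFS API. */
enum toy_lock_mode {
	TOY_LOCK_SHARED,
	TOY_LOCK_EXCL,
};

/*
 * Pick the lock mode for a direct write of count bytes at pos into a file
 * that is currently i_size bytes long. Shared is the default; exclusive is
 * chosen only when the write ends past EOF and so must update the size.
 */
static enum toy_lock_mode toy_dio_write_lock_mode(off_t pos, size_t count,
						  off_t i_size)
{
	if (pos + (off_t)count > i_size)
		return TOY_LOCK_EXCL;
	return TOY_LOCK_SHARED;
}
```

A caller would take the lock in the returned mode and, because the size can change between the check and the lock acquisition, recheck the condition once the lock is held.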
[-- Attachment #2: 0001-btrfs-remove-direct-I-O-aligment-checks.patch --]
[-- Type: text/plain, Size: 3922 bytes --]
From bc285e440a50140beb456f11e545a049bdf51ec1 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Sat, 21 Dec 2019 15:17:26 +0100
Subject: btrfs: remove direct I/O alignment checks
The direct I/O code itself already checks for the proper sector
size alignment, so remove the duplicate checks. The remainder of
check_direct_IO is now only needed for reads and can be moved to
file.c and outside of i_rwsem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/btrfs/file.c | 34 +++++++++++++++++++++++++++-------
fs/btrfs/inode.c | 37 -------------------------------------
2 files changed, 27 insertions(+), 44 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a6d41d7bf362..0522f6d45a98 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3444,21 +3444,41 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
return generic_file_open(inode, filp);
}
-static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+/*
+ * If there are duplicate iov_base's in this iovec, fall back to buffered I/O
+ * to avoid checksum errors.
+ */
+static bool btrfs_direct_read_ok(struct kiocb *iocb, struct iov_iter *iter)
{
- ssize_t ret = 0;
+ int seg, i;
- if (iocb->ki_flags & IOCB_DIRECT) {
+ if (!iter_is_iovec(iter))
+ return true;
+
+ for (seg = 0; seg < iter->nr_segs; seg++) {
+ for (i = seg + 1; i < iter->nr_segs; i++) {
+ if (iter->iov[seg].iov_base == iter->iov[i].iov_base)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+
+static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+ if ((iocb->ki_flags & IOCB_DIRECT) && btrfs_direct_read_ok(iocb, to)) {
struct inode *inode = file_inode(iocb->ki_filp);
+ ssize_t ret;
inode_lock_shared(inode);
ret = btrfs_direct_IO(iocb, to);
inode_unlock_shared(inode);
- if (ret < 0)
- return ret;
- }
- return generic_file_buffered_read(iocb, to, ret);
+ return ret;
+ }
+ return generic_file_buffered_read(iocb, to, 0);
}
const struct file_operations btrfs_file_operations = {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 824f318cee5e..18d153a62655 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8581,39 +8581,6 @@ static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file,
return BLK_QC_T_NONE;
}
-static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
- const struct iov_iter *iter, loff_t offset)
-{
- int seg;
- int i;
- unsigned int blocksize_mask = fs_info->sectorsize - 1;
- ssize_t retval = -EINVAL;
-
- if (offset & blocksize_mask)
- goto out;
-
- if (iov_iter_alignment(iter) & blocksize_mask)
- goto out;
-
- /* If this is a write we don't need to check anymore */
- if (iov_iter_rw(iter) != READ || !iter_is_iovec(iter))
- return 0;
- /*
- * Check to make sure we don't have duplicate iov_base's in this
- * iovec, if so return EINVAL, otherwise we'll get csum errors
- * when reading back.
- */
- for (seg = 0; seg < iter->nr_segs; seg++) {
- for (i = seg + 1; i < iter->nr_segs; i++) {
- if (iter->iov[seg].iov_base == iter->iov[i].iov_base)
- goto out;
- }
- }
- retval = 0;
-out:
- return retval;
-}
-
static const struct iomap_ops btrfs_dio_iomap_ops = {
.iomap_begin = btrfs_dio_iomap_begin,
.iomap_end = btrfs_dio_iomap_end,
@@ -8635,7 +8602,6 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
- struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct extent_changeset *data_reserved = NULL;
loff_t offset = iocb->ki_pos;
size_t count = 0;
@@ -8644,9 +8610,6 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
lockdep_assert_held(&inode->i_rwsem);
- if (check_direct_IO(fs_info, iter, offset))
- return 0;
-
count = iov_iter_count(iter);
if (iov_iter_rw(iter) == WRITE) {
/*
--
2.24.0
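For readers who want to try the duplicate iov_base check outside the kernel, the following is a rough userspace sketch of the same quadratic scan that btrfs_direct_read_ok performs in the patch above. It uses the POSIX struct iovec rather than the kernel's iov_iter, and the helper name iov_bases_unique is made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Return true if no two segments of the vector start at the same address.
 * Only the base pointers are compared, as in the patch; overlapping ranges
 * with different bases are not detected.
 */
static bool iov_bases_unique(const struct iovec *iov, size_t nr_segs)
{
	for (size_t seg = 0; seg < nr_segs; seg++) {
		for (size_t i = seg + 1; i < nr_segs; i++) {
			if (iov[seg].iov_base == iov[i].iov_base)
				return false;
		}
	}
	return true;
}
```

In the patch, a false result makes the read fall back to buffered I/O instead of failing; the sketch leaves that policy to the caller.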
[-- Attachment #3: 0002-btrfs-split-btrfs_direct_IO.patch --]
[-- Type: text/plain, Size: 6757 bytes --]
From 7194fa1986a48af46d2b01457865066cdbd14e35 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Sat, 21 Dec 2019 15:23:41 +0100
Subject: btrfs: split btrfs_direct_IO
The read and write versions don't have anything in common except
for the call to iomap_dio_rw. So split this function, and merge
each half into its only caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/btrfs/ctree.h | 4 ++-
fs/btrfs/file.c | 44 +++++++++++++++++++++++++----
fs/btrfs/inode.c | 72 ++++--------------------------------------------
3 files changed, 48 insertions(+), 72 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8faa069b0a73..fccbbfebdf88 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -28,6 +28,7 @@
#include <linux/dynamic_debug.h>
#include <linux/refcount.h>
#include <linux/crc32c.h>
+#include <linux/iomap.h>
#include "extent-io-tree.h"
#include "extent_io.h"
#include "extent_map.h"
@@ -2904,7 +2905,8 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end);
void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
u64 end, int uptodate);
extern const struct dentry_operations btrfs_dentry_operations;
-ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
+extern const struct iomap_ops btrfs_dio_iomap_ops;
+extern const struct iomap_dio_ops btrfs_dio_ops;
/* ioctl.c */
long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0522f6d45a98..ed0b2e015d8d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1822,17 +1822,50 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
return num_written ? num_written : ret;
}
-static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
+static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(file);
- loff_t pos;
+ size_t count = iov_iter_count(from);
+ struct extent_changeset *data_reserved = NULL;
+ loff_t pos = iocb->ki_pos;
ssize_t written;
ssize_t written_buffered;
loff_t endbyte;
+ bool relock = false;
int err;
- written = btrfs_direct_IO(iocb, from);
+ /*
+ * If the DIO write extends beyond EOF, we need to update i_size, which
+ * is protected by i_rwsem, so we cannot drop the lock in this
+ * case.
+ */
+ if (pos + count <= inode->i_size) {
+ inode_unlock(inode);
+ relock = true;
+ } else {
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ return -EAGAIN;
+ }
+
+ err = btrfs_delalloc_reserve_space(inode, &data_reserved, pos, count);
+ if (err) {
+ if (relock)
+ inode_lock(inode);
+ return err;
+ }
+
+ down_read(&BTRFS_I(inode)->dio_sem);
+ written = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
+ is_sync_kiocb(iocb));
+ up_read(&BTRFS_I(inode)->dio_sem);
+ if (written >= 0 && (size_t)written < count)
+ btrfs_delalloc_release_space(inode, data_reserved,
+ pos, count - (size_t)written, true);
+ btrfs_delalloc_release_extents(BTRFS_I(inode), count);
+ if (relock)
+ inode_lock(inode);
+ extent_changeset_free(data_reserved);
if (written < 0 || !iov_iter_count(from))
return written;
@@ -1975,7 +2008,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
atomic_inc(&BTRFS_I(inode)->sync_writers);
if (iocb->ki_flags & IOCB_DIRECT) {
- num_written = __btrfs_direct_write(iocb, from);
+ num_written = btrfs_direct_write(iocb, from);
} else {
num_written = btrfs_buffered_write(iocb, from);
if (num_written > 0)
@@ -3473,7 +3506,8 @@ static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
ssize_t ret;
inode_lock_shared(inode);
- ret = btrfs_direct_IO(iocb, to);
+ ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops,
+ &btrfs_dio_ops, is_sync_kiocb(iocb));
inode_unlock_shared(inode);
return ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 18d153a62655..7b747270ec40 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -29,7 +29,6 @@
#include <linux/iversion.h>
#include <linux/swap.h>
#include <linux/sched/mm.h>
-#include <linux/iomap.h>
#include <asm/unaligned.h>
#include "misc.h"
#include "ctree.h"
@@ -7856,6 +7855,11 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
return 0;
}
+const struct iomap_ops btrfs_dio_iomap_ops = {
+ .iomap_begin = btrfs_dio_iomap_begin,
+ .iomap_end = btrfs_dio_iomap_end,
+};
+
static inline blk_status_t submit_dio_repair_bio(struct inode *inode,
struct bio *bio,
int mirror_num)
@@ -8581,74 +8585,10 @@ static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file,
return BLK_QC_T_NONE;
}
-static const struct iomap_ops btrfs_dio_iomap_ops = {
- .iomap_begin = btrfs_dio_iomap_begin,
- .iomap_end = btrfs_dio_iomap_end,
-};
-
-static const struct iomap_dio_ops btrfs_dops = {
+const struct iomap_dio_ops btrfs_dio_ops = {
.submit_io = btrfs_submit_direct,
};
-
-/*
- * btrfs_direct_IO - perform direct I/O
- * inode->i_rwsem must be locked before calling this function, shared or exclusive.
- * @iocb - kernel iocb
- * @iter - iter to/from data is copied
- */
-
-ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
- struct file *file = iocb->ki_filp;
- struct inode *inode = file->f_mapping->host;
- struct extent_changeset *data_reserved = NULL;
- loff_t offset = iocb->ki_pos;
- size_t count = 0;
- bool relock = false;
- ssize_t ret;
-
- lockdep_assert_held(&inode->i_rwsem);
-
- count = iov_iter_count(iter);
- if (iov_iter_rw(iter) == WRITE) {
- /*
- * If the write DIO is beyond the EOF, we need update
- * the isize, but it is protected by i_mutex. So we can
- * not unlock the i_mutex at this case.
- */
- if (offset + count <= inode->i_size) {
- inode_unlock(inode);
- relock = true;
- } else if (iocb->ki_flags & IOCB_NOWAIT) {
- ret = -EAGAIN;
- goto out;
- }
- ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
- offset, count);
- if (ret)
- goto out;
-
- down_read(&BTRFS_I(inode)->dio_sem);
- }
-
- ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dops,
- is_sync_kiocb(iocb));
-
- if (iov_iter_rw(iter) == WRITE) {
- up_read(&BTRFS_I(inode)->dio_sem);
- if (ret >= 0 && (size_t)ret < count)
- btrfs_delalloc_release_space(inode, data_reserved,
- offset, count - (size_t)ret, true);
- btrfs_delalloc_release_extents(BTRFS_I(inode), count);
- }
-out:
- if (relock)
- inode_lock(inode);
- extent_changeset_free(data_reserved);
- return ret;
-}
-
#define BTRFS_FIEMAP_FLAGS (FIEMAP_FLAG_SYNC)
static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
--
2.24.0
Thread overview: 37+ messages
2019-12-13 19:57 [PATCH 0/8 v6] btrfs direct-io using iomap Goldwyn Rodrigues
2019-12-13 19:57 ` [PATCH 1/8] fs: Export generic_file_buffered_read() Goldwyn Rodrigues
2019-12-13 19:57 ` [PATCH 2/8] iomap: add a filesystem hook for direct I/O bio submission Goldwyn Rodrigues
2019-12-14 0:31 ` Darrick J. Wong
2019-12-18 2:02 ` Darrick J. Wong
2019-12-13 19:57 ` [PATCH 3/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls Goldwyn Rodrigues
2019-12-14 0:32 ` Darrick J. Wong
2019-12-18 2:04 ` Darrick J. Wong
2019-12-21 13:41 ` Christoph Hellwig
2019-12-21 13:42 ` Christoph Hellwig
2019-12-21 18:02 ` Darrick J. Wong
2019-12-13 19:57 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
2019-12-21 14:42 ` Christoph Hellwig [this message]
2020-01-02 18:01 ` Goldwyn Rodrigues
2020-01-07 17:23 ` Christoph Hellwig
2020-01-07 11:59 ` Goldwyn Rodrigues
2020-01-07 17:21 ` Christoph Hellwig
2019-12-13 19:57 ` [PATCH 5/8] fs: Remove dio_end_io() Goldwyn Rodrigues
2019-12-13 19:57 ` [PATCH 6/8] btrfs: Wait for extent bits to release page Goldwyn Rodrigues
2019-12-13 19:57 ` [PATCH 7/8] btrfs: Use ->iomap_end() instead of btrfs_dio_data Goldwyn Rodrigues
2019-12-13 19:57 ` [PATCH 8/8] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK Goldwyn Rodrigues
2019-12-16 0:01 ` [PATCH 0/8 v6] btrfs direct-io using iomap Nikolay Borisov
2019-12-16 12:41 ` Goldwyn Rodrigues
-- strict thread matches above, loose matches on Subject: below --
2019-12-10 23:01 [PATCH 0/8 v4] " Goldwyn Rodrigues
2019-12-10 23:01 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
2019-12-11 8:58 ` Filipe Manana
2019-12-11 10:43 ` Nikolay Borisov
2019-12-05 15:56 [PATCH 0/8 v3] btrfs direct-io using iomap Goldwyn Rodrigues
2019-12-05 15:56 ` [PATCH 4/8] btrfs: Switch to iomap_dio_rw() for dio Goldwyn Rodrigues
2019-12-05 17:18 ` Johannes Thumshirn
2019-12-05 17:19 ` Christoph Hellwig
2019-12-05 17:32 ` Johannes Thumshirn
2019-12-05 17:33 ` Christoph Hellwig
2019-12-05 17:36 ` Johannes Thumshirn
2019-12-05 17:37 ` Christoph Hellwig
2019-12-05 17:37 ` Christoph Hellwig
2019-12-05 17:40 ` Johannes Thumshirn
2019-12-05 17:44 ` Goldwyn Rodrigues
2019-12-05 22:59 ` Nikolay Borisov