* [PATCH v2 0/2] Fix locking for btrfs direct writes
@ 2020-12-15 18:06 Goldwyn Rodrigues
2020-12-15 18:06 ` [PATCH 1/2] iomap: Separate out generic_write_sync() from iomap_dio_complete() Goldwyn Rodrigues
2020-12-15 18:06 ` [PATCH 2/2] btrfs: Make btrfs_direct_write atomic with respect to inode_lock Goldwyn Rodrigues
0 siblings, 2 replies; 11+ messages in thread
From: Goldwyn Rodrigues @ 2020-12-15 18:06 UTC (permalink / raw)
To: linux-fsdevel, linux-btrfs; +Cc: darrick.wong, hch, nborisov, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
A btrfs direct write takes the inode lock before performing the I/O.
If the write fails or completes only partially, it falls back to a
buffered write. Before starting the buffered write, it releases the
inode lock and then reacquires it. This may lead to corruption if
another process writes around the same offset between the unlock and
the relock. The patches change the flow so that the lock is taken only
once before the write and released only after the I/O is complete.
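As a rough userspace illustration of the intended flow (not kernel code), the sketch below takes a lock once and keeps it across both a short "direct" write and the buffered fallback. Everything named toy_* is hypothetical and exists only for this example; a pthread mutex stands in for the inode lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an inode: one lock guarding a small data area. */
struct toy_inode {
	pthread_mutex_t lock;
	char data[64];
	int lock_acquisitions;	/* how many times the lock was taken */
};

/* Pretend direct I/O that may stop early (short write). Caller holds the lock. */
static size_t toy_direct_write_locked(struct toy_inode *ino, size_t pos,
				      const char *buf, size_t len, size_t limit)
{
	size_t n = len < limit ? len : limit;

	memcpy(ino->data + pos, buf, n);
	return n;
}

/* Buffered fallback that assumes the caller already holds the lock. */
static size_t toy_buffered_write_locked(struct toy_inode *ino, size_t pos,
					const char *buf, size_t len)
{
	memcpy(ino->data + pos, buf, len);
	return len;
}

/* Lock once, try the direct path, fall back if needed, then unlock. */
size_t toy_write(struct toy_inode *ino, size_t pos, const char *buf,
		 size_t len, size_t direct_limit)
{
	size_t written;

	assert(pos <= sizeof(ino->data) && len <= sizeof(ino->data) - pos);

	pthread_mutex_lock(&ino->lock);
	ino->lock_acquisitions++;

	written = toy_direct_write_locked(ino, pos, buf, len, direct_limit);
	if (written < len)	/* incomplete: finish under the same lock hold */
		written += toy_buffered_write_locked(ino, pos + written,
						     buf + written,
						     len - written);

	pthread_mutex_unlock(&ino->lock);
	return written;
}
```

Because the lock is never dropped between the two paths, no other writer using toy_write() on the same toy_inode can interleave with the fallback.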
Goldwyn Rodrigues (2):
iomap: Separate out generic_write_sync() from iomap_dio_complete()
btrfs: Make btrfs_direct_write atomic with respect to inode_lock
fs/btrfs/file.c | 69 +++++++++++++++++++++++++------------------
fs/iomap/direct-io.c | 16 ++++++++--
include/linux/iomap.h | 2 +-
3 files changed, 54 insertions(+), 33 deletions(-)
--
2.29.2
* [PATCH 1/2] iomap: Separate out generic_write_sync() from iomap_dio_complete()
2020-12-15 18:06 [PATCH v2 0/2] Fix locking for btrfs direct writes Goldwyn Rodrigues
@ 2020-12-15 18:06 ` Goldwyn Rodrigues
2020-12-15 21:24 ` kernel test robot
2020-12-15 22:16 ` Dave Chinner
2020-12-15 18:06 ` [PATCH 2/2] btrfs: Make btrfs_direct_write atomic with respect to inode_lock Goldwyn Rodrigues
1 sibling, 2 replies; 11+ messages in thread
From: Goldwyn Rodrigues @ 2020-12-15 18:06 UTC (permalink / raw)
To: linux-fsdevel, linux-btrfs; +Cc: darrick.wong, hch, nborisov, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
This introduces a separate function __iomap_dio_complete() which
completes the Direct I/O without performing the write sync.
Filesystems such as btrfs which require an inode_lock for sync can call
__iomap_dio_complete() and must perform sync on their own after unlock.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
fs/iomap/direct-io.c | 16 +++++++++++++---
include/linux/iomap.h | 2 +-
2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 933f234d5bec..11a108f39fd9 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -76,7 +76,7 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
 		dio->submit.cookie = submit_bio(bio);
 }

-ssize_t iomap_dio_complete(struct iomap_dio *dio)
+ssize_t __iomap_dio_complete(struct iomap_dio *dio)
 {
 	const struct iomap_dio_ops *dops = dio->dops;
 	struct kiocb *iocb = dio->iocb;
@@ -119,18 +119,28 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 	}

 	inode_dio_end(file_inode(iocb->ki_filp));
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(__iomap_dio_complete);
+
+ssize_t iomap_dio_complete(struct iomap_dio *dio)
+{
+	ssize_t ret;
+
+	ret = __iomap_dio_complete(dio);
 	/*
 	 * If this is a DSYNC write, make sure we push it to stable storage now
 	 * that we've written data.
 	 */
 	if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
-		ret = generic_write_sync(iocb, ret);
+		ret = generic_write_sync(dio->iocb, ret);

 	kfree(dio);

 	return ret;
 }
-EXPORT_SYMBOL_GPL(iomap_dio_complete);
+

 static void iomap_dio_complete_work(struct work_struct *work)
 {
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 5bd3cac4df9c..5785dc0b8ec5 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -262,7 +262,7 @@ ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		bool wait_for_completion);
-ssize_t iomap_dio_complete(struct iomap_dio *dio);
+ssize_t __iomap_dio_complete(struct iomap_dio *dio);
 int iomap_dio_iopoll(struct kiocb *kiocb, bool spin);

 #ifdef CONFIG_SWAP
--
2.29.2
* [PATCH 2/2] btrfs: Make btrfs_direct_write atomic with respect to inode_lock
2020-12-15 18:06 [PATCH v2 0/2] Fix locking for btrfs direct writes Goldwyn Rodrigues
2020-12-15 18:06 ` [PATCH 1/2] iomap: Separate out generic_write_sync() from iomap_dio_complete() Goldwyn Rodrigues
@ 2020-12-15 18:06 ` Goldwyn Rodrigues
2020-12-15 22:13 ` Darrick J. Wong
1 sibling, 1 reply; 11+ messages in thread
From: Goldwyn Rodrigues @ 2020-12-15 18:06 UTC (permalink / raw)
To: linux-fsdevel, linux-btrfs; +Cc: darrick.wong, hch, nborisov, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
btrfs_direct_write() falls back to a buffered write when btrfs is not
able to perform or complete a direct I/O. During the fallback, the
inode lock is released and reacquired. This does not guarantee the
atomicity of the entire write, since another writer can acquire the
lock between the unlock and the relock.
__btrfs_buffered_write() is now used to perform the fallback write; it
performs the write without acquiring the lock or running the write checks.
fa54fc76db94 ("btrfs: push inode locking and unlocking into buffered/direct write")
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
fs/btrfs/file.c | 69 ++++++++++++++++++++++++++++---------------------
1 file changed, 40 insertions(+), 29 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0e41459b8de6..9fc768b951f1 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1638,11 +1638,11 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
return 0;
}
-static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
+static noinline ssize_t __btrfs_buffered_write(struct kiocb *iocb,
struct iov_iter *i)
{
struct file *file = iocb->ki_filp;
- loff_t pos;
+ loff_t pos = iocb->ki_pos;
struct inode *inode = file_inode(file);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct page **pages = NULL;
@@ -1656,24 +1656,9 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
bool only_release_metadata = false;
bool force_page_uptodate = false;
loff_t old_isize = i_size_read(inode);
- unsigned int ilock_flags = 0;
-
- if (iocb->ki_flags & IOCB_NOWAIT)
- ilock_flags |= BTRFS_ILOCK_TRY;
-
- ret = btrfs_inode_lock(inode, ilock_flags);
- if (ret < 0)
- return ret;
-
- ret = generic_write_checks(iocb, i);
- if (ret <= 0)
- goto out;
- ret = btrfs_write_check(iocb, i, ret);
- if (ret < 0)
- goto out;
+ lockdep_assert_held(&inode->i_rwsem);
- pos = iocb->ki_pos;
nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
PAGE_SIZE / (sizeof(struct page *)));
nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied);
@@ -1877,10 +1862,37 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
iocb->ki_pos += num_written;
}
out:
- btrfs_inode_unlock(inode, ilock_flags);
return num_written ? num_written : ret;
}
+static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
+ struct iov_iter *i)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+ unsigned int ilock_flags = 0;
+ ssize_t ret;
+
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ ilock_flags |= BTRFS_ILOCK_TRY;
+
+ ret = btrfs_inode_lock(inode, ilock_flags);
+ if (ret < 0)
+ return ret;
+
+ ret = generic_write_checks(iocb, i);
+ if (ret <= 0)
+ goto out;
+
+ ret = btrfs_write_check(iocb, i, ret);
+ if (ret < 0)
+ goto out;
+
+ ret = __btrfs_buffered_write(iocb, i);
+out:
+ btrfs_inode_unlock(inode, ilock_flags);
+ return ret;
+}
+
static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
const struct iov_iter *iter, loff_t offset)
{
@@ -1927,10 +1939,8 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
}
err = btrfs_write_check(iocb, from, err);
- if (err < 0) {
- btrfs_inode_unlock(inode, ilock_flags);
+ if (err < 0)
goto out;
- }
pos = iocb->ki_pos;
/*
@@ -1944,22 +1954,19 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
goto relock;
}
- if (check_direct_IO(fs_info, from, pos)) {
- btrfs_inode_unlock(inode, ilock_flags);
+ if (check_direct_IO(fs_info, from, pos))
goto buffered;
- }
dio = __iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops,
&btrfs_dio_ops, is_sync_kiocb(iocb));
- btrfs_inode_unlock(inode, ilock_flags);
-
if (IS_ERR_OR_NULL(dio)) {
err = PTR_ERR_OR_ZERO(dio);
if (err < 0 && err != -ENOTBLK)
goto out;
} else {
- written = iomap_dio_complete(dio);
+ written = __iomap_dio_complete(dio);
+ kfree(dio);
}
if (written < 0 || !iov_iter_count(from)) {
@@ -1969,7 +1976,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
buffered:
pos = iocb->ki_pos;
- written_buffered = btrfs_buffered_write(iocb, from);
+ written_buffered = __btrfs_buffered_write(iocb, from);
if (written_buffered < 0) {
err = written_buffered;
goto out;
@@ -1990,6 +1997,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
invalidate_mapping_pages(file->f_mapping, pos >> PAGE_SHIFT,
endbyte >> PAGE_SHIFT);
out:
+ btrfs_inode_unlock(inode, ilock_flags);
+ if (written > 0)
+ generic_write_sync(iocb, written);
+
return written ? written : err;
}
--
2.29.2
* Re: [PATCH 1/2] iomap: Separate out generic_write_sync() from iomap_dio_complete()
2020-12-15 18:06 ` [PATCH 1/2] iomap: Separate out generic_write_sync() from iomap_dio_complete() Goldwyn Rodrigues
@ 2020-12-15 21:24 ` kernel test robot
2020-12-15 22:16 ` Dave Chinner
1 sibling, 0 replies; 11+ messages in thread
From: kernel test robot @ 2020-12-15 21:24 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-fsdevel, linux-btrfs
Cc: kbuild-all, darrick.wong, hch, nborisov, Goldwyn Rodrigues
[-- Attachment #1: Type: text/plain, Size: 2483 bytes --]
Hi Goldwyn,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on kdave/for-next]
[also build test WARNING on v5.10 next-20201215]
[cannot apply to xfs-linux/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Goldwyn-Rodrigues/Fix-locking-for-btrfs-direct-writes/20201216-021312
base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: sparc-randconfig-s031-20201215 (attached as .config)
compiler: sparc-linux-gcc (GCC) 9.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.3-184-g1b896707-dirty
# https://github.com/0day-ci/linux/commit/4706fd8a8832b4948c25abc5fec38a017704d828
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Goldwyn-Rodrigues/Fix-locking-for-btrfs-direct-writes/20201216-021312
git checkout 4706fd8a8832b4948c25abc5fec38a017704d828
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=sparc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> fs/iomap/direct-io.c:127:9: warning: no previous prototype for 'iomap_dio_complete' [-Wmissing-prototypes]
127 | ssize_t iomap_dio_complete(struct iomap_dio *dio)
| ^~~~~~~~~~~~~~~~~~
"sparse warnings: (new ones prefixed by >>)"
vim +/iomap_dio_complete +127 fs/iomap/direct-io.c
126
> 127 ssize_t iomap_dio_complete(struct iomap_dio *dio)
128 {
129 ssize_t ret;
130
131 ret = __iomap_dio_complete(dio);
132 /*
133 * If this is a DSYNC write, make sure we push it to stable storage now
134 * that we've written data.
135 */
136 if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
137 ret = generic_write_sync(dio->iocb, ret);
138
139 kfree(dio);
140
141 return ret;
142 }
143
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25723 bytes --]
* Re: [PATCH 2/2] btrfs: Make btrfs_direct_write atomic with respect to inode_lock
2020-12-15 18:06 ` [PATCH 2/2] btrfs: Make btrfs_direct_write atomic with respect to inode_lock Goldwyn Rodrigues
@ 2020-12-15 22:13 ` Darrick J. Wong
2020-12-16 21:07 ` Goldwyn Rodrigues
0 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2020-12-15 22:13 UTC (permalink / raw)
To: Goldwyn Rodrigues
Cc: linux-fsdevel, linux-btrfs, hch, nborisov, Goldwyn Rodrigues
On Tue, Dec 15, 2020 at 12:06:36PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> btrfs_direct_write() fallsback to buffered write in case btrfs is not
> able to perform or complete a direct I/O. During the fallback
> inode lock is unlocked and relocked. This does not guarantee the
> atomicity of the entire write since the lock can be acquired by another
> write between unlock and relock.
>
> __btrfs_buffered_write() is used to perform the direct fallback write,
> which performs the write without acquiring the lock or checks.
Er... can you grab the inode lock before deciding which of the IO
path(s) you're going to take? Then you'd always have an atomic write
even if fallback happens.
(Also vaguely wondering why this needs even more slicing and dicing of
the iomap directio functions...)
--D
>
> fa54fc76db94 ("btrfs: push inode locking and unlocking into buffered/direct write")
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
> fs/btrfs/file.c | 69 ++++++++++++++++++++++++++++---------------------
> 1 file changed, 40 insertions(+), 29 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 0e41459b8de6..9fc768b951f1 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1638,11 +1638,11 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
> return 0;
> }
>
> -static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
> +static noinline ssize_t __btrfs_buffered_write(struct kiocb *iocb,
> struct iov_iter *i)
> {
> struct file *file = iocb->ki_filp;
> - loff_t pos;
> + loff_t pos = iocb->ki_pos;
> struct inode *inode = file_inode(file);
> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> struct page **pages = NULL;
> @@ -1656,24 +1656,9 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
> bool only_release_metadata = false;
> bool force_page_uptodate = false;
> loff_t old_isize = i_size_read(inode);
> - unsigned int ilock_flags = 0;
> -
> - if (iocb->ki_flags & IOCB_NOWAIT)
> - ilock_flags |= BTRFS_ILOCK_TRY;
> -
> - ret = btrfs_inode_lock(inode, ilock_flags);
> - if (ret < 0)
> - return ret;
> -
> - ret = generic_write_checks(iocb, i);
> - if (ret <= 0)
> - goto out;
>
> - ret = btrfs_write_check(iocb, i, ret);
> - if (ret < 0)
> - goto out;
> + lockdep_assert_held(&inode->i_rwsem);
>
> - pos = iocb->ki_pos;
> nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
> PAGE_SIZE / (sizeof(struct page *)));
> nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied);
> @@ -1877,10 +1862,37 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
> iocb->ki_pos += num_written;
> }
> out:
> - btrfs_inode_unlock(inode, ilock_flags);
> return num_written ? num_written : ret;
> }
>
> +static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
> + struct iov_iter *i)
> +{
> + struct inode *inode = file_inode(iocb->ki_filp);
> + unsigned int ilock_flags = 0;
> + ssize_t ret;
> +
> + if (iocb->ki_flags & IOCB_NOWAIT)
> + ilock_flags |= BTRFS_ILOCK_TRY;
> +
> + ret = btrfs_inode_lock(inode, ilock_flags);
> + if (ret < 0)
> + return ret;
> +
> + ret = generic_write_checks(iocb, i);
> + if (ret <= 0)
> + goto out;
> +
> + ret = btrfs_write_check(iocb, i, ret);
> + if (ret < 0)
> + goto out;
> +
> + ret = __btrfs_buffered_write(iocb, i);
> +out:
> + btrfs_inode_unlock(inode, ilock_flags);
> + return ret;
> +}
> +
> static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
> const struct iov_iter *iter, loff_t offset)
> {
> @@ -1927,10 +1939,8 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
> }
>
> err = btrfs_write_check(iocb, from, err);
> - if (err < 0) {
> - btrfs_inode_unlock(inode, ilock_flags);
> + if (err < 0)
> goto out;
> - }
>
> pos = iocb->ki_pos;
> /*
> @@ -1944,22 +1954,19 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
> goto relock;
> }
>
> - if (check_direct_IO(fs_info, from, pos)) {
> - btrfs_inode_unlock(inode, ilock_flags);
> + if (check_direct_IO(fs_info, from, pos))
> goto buffered;
> - }
>
> dio = __iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops,
> &btrfs_dio_ops, is_sync_kiocb(iocb));
>
> - btrfs_inode_unlock(inode, ilock_flags);
> -
> if (IS_ERR_OR_NULL(dio)) {
> err = PTR_ERR_OR_ZERO(dio);
> if (err < 0 && err != -ENOTBLK)
> goto out;
> } else {
> - written = iomap_dio_complete(dio);
> + written = __iomap_dio_complete(dio);
> + kfree(dio);
> }
>
> if (written < 0 || !iov_iter_count(from)) {
> @@ -1969,7 +1976,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
>
> buffered:
> pos = iocb->ki_pos;
> - written_buffered = btrfs_buffered_write(iocb, from);
> + written_buffered = __btrfs_buffered_write(iocb, from);
> if (written_buffered < 0) {
> err = written_buffered;
> goto out;
> @@ -1990,6 +1997,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
> invalidate_mapping_pages(file->f_mapping, pos >> PAGE_SHIFT,
> endbyte >> PAGE_SHIFT);
> out:
> + btrfs_inode_unlock(inode, ilock_flags);
> + if (written > 0)
> + generic_write_sync(iocb, written);
> +
> return written ? written : err;
> }
>
> --
> 2.29.2
>
* Re: [PATCH 1/2] iomap: Separate out generic_write_sync() from iomap_dio_complete()
2020-12-15 18:06 ` [PATCH 1/2] iomap: Separate out generic_write_sync() from iomap_dio_complete() Goldwyn Rodrigues
2020-12-15 21:24 ` kernel test robot
@ 2020-12-15 22:16 ` Dave Chinner
1 sibling, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2020-12-15 22:16 UTC (permalink / raw)
To: Goldwyn Rodrigues
Cc: linux-fsdevel, linux-btrfs, darrick.wong, hch, nborisov,
Goldwyn Rodrigues
On Tue, Dec 15, 2020 at 12:06:35PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> This introduces a separate function __iomap_dio_complte() which
> completes the Direct I/O without performing the write sync.
>
> Filesystems such as btrfs which require an inode_lock for sync can call
> __iomap_dio_complete() and must perform sync on their own after unlock.
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
> fs/iomap/direct-io.c | 16 +++++++++++++---
> include/linux/iomap.h | 2 +-
> 2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 933f234d5bec..11a108f39fd9 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -76,7 +76,7 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
> dio->submit.cookie = submit_bio(bio);
> }
>
> -ssize_t iomap_dio_complete(struct iomap_dio *dio)
> +ssize_t __iomap_dio_complete(struct iomap_dio *dio)
> {
> const struct iomap_dio_ops *dops = dio->dops;
> struct kiocb *iocb = dio->iocb;
> @@ -119,18 +119,28 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
> }
>
> inode_dio_end(file_inode(iocb->ki_filp));
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(__iomap_dio_complete);
> +
> +ssize_t iomap_dio_complete(struct iomap_dio *dio)
> +{
> + ssize_t ret;
> +
> + ret = __iomap_dio_complete(dio);
> /*
> * If this is a DSYNC write, make sure we push it to stable storage now
> * that we've written data.
> */
> if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
> - ret = generic_write_sync(iocb, ret);
> + ret = generic_write_sync(dio->iocb, ret);
>
> kfree(dio);
>
> return ret;
> }
> -EXPORT_SYMBOL_GPL(iomap_dio_complete);
> +
NACK.
If you don't want iomap_dio_complete to do O_DSYNC work after
successfully writing data, strip those flags out of the kiocb
before you call iomap_dio_rw() and do it yourself after calling
iomap_dio_complete().
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 2/2] btrfs: Make btrfs_direct_write atomic with respect to inode_lock
2020-12-15 22:13 ` Darrick J. Wong
@ 2020-12-16 21:07 ` Goldwyn Rodrigues
0 siblings, 0 replies; 11+ messages in thread
From: Goldwyn Rodrigues @ 2020-12-16 21:07 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-fsdevel, linux-btrfs, hch, nborisov
On 14:13 15/12, Darrick J. Wong wrote:
> On Tue, Dec 15, 2020 at 12:06:36PM -0600, Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> >
> > btrfs_direct_write() fallsback to buffered write in case btrfs is not
> > able to perform or complete a direct I/O. During the fallback
> > inode lock is unlocked and relocked. This does not guarantee the
> > atomicity of the entire write since the lock can be acquired by another
> > write between unlock and relock.
> >
> > __btrfs_buffered_write() is used to perform the direct fallback write,
> > which performs the write without acquiring the lock or checks.
>
> Er... can you grab the inode lock before deciding which of the IO
> path(s) you're going to take? Then you'd always have an atomic write
> even if fallback happens.
No, since this is a fallback option which also works if the I/O is
incomplete.
>
> (Also vaguely wondering why this needs even more slicing and dicing of
> the iomap directio functions...)
I would most likely go with Dave's method of storing the flag in the
function and calling iomap dio functions without IOCB_DSYNC flag. This
way we don't have to change iomap.
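A minimal userspace sketch of that idea follows, assuming mock types rather than the real kernel structures: the DSYNC bit is saved and cleared before the direct I/O call, restored afterwards, and the sync is issued only once the lock has been dropped. The names toy_kiocb, TOY_IOCB_DSYNC, toy_dio_rw() and toy_sync() are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

#define TOY_IOCB_DSYNC 0x1u	/* hypothetical bit standing in for IOCB_DSYNC */

/* Mock of the few kiocb/inode bits this sketch cares about. */
struct toy_kiocb {
	unsigned int ki_flags;
	bool inode_locked;	/* models whether the inode lock is held */
	int syncs;		/* number of sync calls issued */
};

/* Stand-in for the write sync; it must not run with the lock held. */
static ssize_t toy_sync(struct toy_kiocb *iocb, ssize_t written)
{
	assert(!iocb->inode_locked);
	iocb->syncs++;
	return written;
}

/* Stand-in for the dio path: it syncs by itself when the flag is set. */
static ssize_t toy_dio_rw(struct toy_kiocb *iocb, ssize_t len)
{
	if (iocb->ki_flags & TOY_IOCB_DSYNC)
		return toy_sync(iocb, len);
	return len;
}

ssize_t toy_direct_write(struct toy_kiocb *iocb, ssize_t len)
{
	unsigned int saved = iocb->ki_flags & TOY_IOCB_DSYNC;
	ssize_t written;

	iocb->inode_locked = true;
	iocb->ki_flags &= ~TOY_IOCB_DSYNC;	/* hide DSYNC from the dio path */
	written = toy_dio_rw(iocb, len);
	iocb->ki_flags |= saved;		/* restore the caller's flag */
	iocb->inode_locked = false;

	/* Do the sync ourselves, after the lock has been released. */
	if (written > 0 && saved)
		written = toy_sync(iocb, written);
	return written;
}
```

In this model the assertion in toy_sync() would fire if the flag were left set while the lock is held, which mirrors why the sync has to be deferred until after the unlock.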
--
Goldwyn
* Re: [PATCH 2/2] btrfs: Make btrfs_direct_write atomic with respect to inode_lock
@ 2020-12-16 1:06 kernel test robot
0 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2020-12-16 1:06 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 19999 bytes --]
CC: kbuild-all@lists.01.org
In-Reply-To: <49ff9bfb8ef20e7a9c6e26fd54bc9f4508c9ccb4.1608053602.git.rgoldwyn@suse.com>
References: <49ff9bfb8ef20e7a9c6e26fd54bc9f4508c9ccb4.1608053602.git.rgoldwyn@suse.com>
TO: Goldwyn Rodrigues <rgoldwyn@suse.de>
TO: linux-fsdevel@vger.kernel.org
TO: linux-btrfs@vger.kernel.org
CC: darrick.wong@oracle.com
CC: hch@infradead.org
CC: nborisov@suse.com
CC: Goldwyn Rodrigues <rgoldwyn@suse.com>
Hi Goldwyn,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on kdave/for-next]
[also build test WARNING on next-20201215]
[cannot apply to xfs-linux/for-next v5.10]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Goldwyn-Rodrigues/Fix-locking-for-btrfs-direct-writes/20201216-021312
base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
:::::: branch date: 7 hours ago
:::::: commit date: 7 hours ago
config: x86_64-randconfig-m001-20201215 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
smatch warnings:
fs/btrfs/file.c:1865 __btrfs_buffered_write() error: uninitialized symbol 'ret'.
vim +/ret +1865 fs/btrfs/file.c
b8d8e1fd570a194 Goldwyn Rodrigues 2020-09-24 1640
af84b6141d8301b Goldwyn Rodrigues 2020-12-15 1641 static noinline ssize_t __btrfs_buffered_write(struct kiocb *iocb,
e4af400a9c5081e Goldwyn Rodrigues 2018-06-17 1642 struct iov_iter *i)
39279cc3d2704cf Chris Mason 2007-06-12 1643 {
e4af400a9c5081e Goldwyn Rodrigues 2018-06-17 1644 struct file *file = iocb->ki_filp;
af84b6141d8301b Goldwyn Rodrigues 2020-12-15 1645 loff_t pos = iocb->ki_pos;
496ad9aa8ef4480 Al Viro 2013-01-23 1646 struct inode *inode = file_inode(file);
0b246afa62b0cf5 Jeff Mahoney 2016-06-22 1647 struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
11c65dccf70be9a Josef Bacik 2010-05-23 1648 struct page **pages = NULL;
364ecf3651e0862 Qu Wenruo 2017-02-27 1649 struct extent_changeset *data_reserved = NULL;
7ee9e4405f264e9 Josef Bacik 2013-06-21 1650 u64 release_bytes = 0;
376cc685cb3b43a Miao Xie 2013-12-10 1651 u64 lockstart;
376cc685cb3b43a Miao Xie 2013-12-10 1652 u64 lockend;
d0215f3e5ebb580 Josef Bacik 2011-01-25 1653 size_t num_written = 0;
d0215f3e5ebb580 Josef Bacik 2011-01-25 1654 int nrptrs;
c352370633400d1 Goldwyn Rodrigues 2020-09-24 1655 ssize_t ret;
7ee9e4405f264e9 Josef Bacik 2013-06-21 1656 bool only_release_metadata = false;
b6316429af7f365 Josef Bacik 2011-09-30 1657 bool force_page_uptodate = false;
5e8b9ef30392bb8 Goldwyn Rodrigues 2020-09-24 1658 loff_t old_isize = i_size_read(inode);
c352370633400d1 Goldwyn Rodrigues 2020-09-24 1659
af84b6141d8301b Goldwyn Rodrigues 2020-12-15 1660 lockdep_assert_held(&inode->i_rwsem);
c352370633400d1 Goldwyn Rodrigues 2020-09-24 1661
09cbfeaf1a5a67b Kirill A. Shutemov 2016-04-01 1662 nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
09cbfeaf1a5a67b Kirill A. Shutemov 2016-04-01 1663 PAGE_SIZE / (sizeof(struct page *)));
142349f541d0bb6 Wu Fengguang 2011-12-16 1664 nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied);
142349f541d0bb6 Wu Fengguang 2011-12-16 1665 nrptrs = max(nrptrs, 8);
31e818fe7375d60 David Sterba 2015-02-20 1666 pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL);
c352370633400d1 Goldwyn Rodrigues 2020-09-24 1667 if (!pages) {
c352370633400d1 Goldwyn Rodrigues 2020-09-24 1668 ret = -ENOMEM;
c352370633400d1 Goldwyn Rodrigues 2020-09-24 1669 goto out;
c352370633400d1 Goldwyn Rodrigues 2020-09-24 1670 }
ab93dbecfba72bb Chris Mason 2009-10-01 1671
d0215f3e5ebb580 Josef Bacik 2011-01-25 1672 while (iov_iter_count(i) > 0) {
c67d970f0ea8dcc Filipe Manana 2019-09-30 1673 struct extent_state *cached_state = NULL;
7073017aeb98db3 Johannes Thumshirn 2018-12-05 1674 size_t offset = offset_in_page(pos);
2e78c927d79333f Chandan Rajendra 2016-01-21 1675 size_t sector_offset;
d0215f3e5ebb580 Josef Bacik 2011-01-25 1676 size_t write_bytes = min(iov_iter_count(i),
09cbfeaf1a5a67b Kirill A. Shutemov 2016-04-01 1677 nrptrs * (size_t)PAGE_SIZE -
8c2383c3dd2cb5b Chris Mason 2007-06-18 1678 offset);
eefa45f59379282 Goldwyn Rodrigues 2020-09-25 1679 size_t num_pages;
7ee9e4405f264e9 Josef Bacik 2013-06-21 1680 size_t reserve_bytes;
d0215f3e5ebb580 Josef Bacik 2011-01-25 1681 size_t dirty_pages;
d0215f3e5ebb580 Josef Bacik 2011-01-25 1682 size_t copied;
2e78c927d79333f Chandan Rajendra 2016-01-21 1683 size_t dirty_sectors;
2e78c927d79333f Chandan Rajendra 2016-01-21 1684 size_t num_sectors;
79f015f216539df Goldwyn Rodrigues 2017-10-16 1685 int extents_locked;
39279cc3d2704cf Chris Mason 2007-06-12 1686
914ee295af418e9 Xin Zhong 2010-12-09 1687 /*
914ee295af418e9 Xin Zhong 2010-12-09 1688 * Fault pages before locking them in prepare_pages
914ee295af418e9 Xin Zhong 2010-12-09 1689 * to avoid recursive lock
914ee295af418e9 Xin Zhong 2010-12-09 1690 */
d0215f3e5ebb580 Josef Bacik 2011-01-25 1691 if (unlikely(iov_iter_fault_in_readable(i, write_bytes))) {
914ee295af418e9 Xin Zhong 2010-12-09 1692 ret = -EFAULT;
d0215f3e5ebb580 Josef Bacik 2011-01-25 1693 break;
914ee295af418e9 Xin Zhong 2010-12-09 1694 }
914ee295af418e9 Xin Zhong 2010-12-09 1695
a0e248bb502d516 Filipe Manana 2019-10-11 1696 only_release_metadata = false;
da17066c40472c2 Jeff Mahoney 2016-06-15 1697 sector_offset = pos & (fs_info->sectorsize - 1);
d9d8b2a51a404c2 Qu Wenruo 2015-09-08 1698
364ecf3651e0862 Qu Wenruo 2017-02-27 1699 extent_changeset_release(data_reserved);
36ea6f3e931391c Nikolay Borisov 2020-06-03 1700 ret = btrfs_check_data_free_space(BTRFS_I(inode),
36ea6f3e931391c Nikolay Borisov 2020-06-03 1701 &data_reserved, pos,
364ecf3651e0862 Qu Wenruo 2017-02-27 1702 write_bytes);
c6887cd11149d73 Josef Bacik 2016-03-25 1703 if (ret < 0) {
d9d8b2a51a404c2 Qu Wenruo 2015-09-08 1704 /*
eefa45f59379282 Goldwyn Rodrigues 2020-09-25 1705 * If we don't have to COW at the offset, reserve
eefa45f59379282 Goldwyn Rodrigues 2020-09-25 1706 * metadata only. write_bytes may get smaller than
eefa45f59379282 Goldwyn Rodrigues 2020-09-25 1707 * requested here.
d9d8b2a51a404c2 Qu Wenruo 2015-09-08 1708 */
eefa45f59379282 Goldwyn Rodrigues 2020-09-25 1709 if (btrfs_check_nocow_lock(BTRFS_I(inode), pos,
eefa45f59379282 Goldwyn Rodrigues 2020-09-25 1710 &write_bytes) > 0)
7ee9e4405f264e9 Josef Bacik 2013-06-21 1711 only_release_metadata = true;
eefa45f59379282 Goldwyn Rodrigues 2020-09-25 1712 else
d0215f3e5ebb580 Josef Bacik 2011-01-25 1713 break;
c6887cd11149d73 Josef Bacik 2016-03-25 1714 }
1832a6d5ee3b1af Chris Mason 2007-12-21 1715
eefa45f59379282 Goldwyn Rodrigues 2020-09-25 1716 num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE);
eefa45f59379282 Goldwyn Rodrigues 2020-09-25 1717 WARN_ON(num_pages > nrptrs);
eefa45f59379282 Goldwyn Rodrigues 2020-09-25 1718 reserve_bytes = round_up(write_bytes + sector_offset,
eefa45f59379282 Goldwyn Rodrigues 2020-09-25 1719 fs_info->sectorsize);
8b62f87bad9cf06 Josef Bacik 2017-10-19 1720 WARN_ON(reserve_bytes == 0);
9f3db423f98c5c6 Nikolay Borisov 2017-02-20 1721 ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
9f3db423f98c5c6 Nikolay Borisov 2017-02-20 1722 reserve_bytes);
7ee9e4405f264e9 Josef Bacik 2013-06-21 1723 if (ret) {
7ee9e4405f264e9 Josef Bacik 2013-06-21 1724 if (!only_release_metadata)
25ce28caaa1ddc2 Nikolay Borisov 2020-06-03 1725 btrfs_free_reserved_data_space(BTRFS_I(inode),
bc42bda22345efd Qu Wenruo 2017-02-27 1726 data_reserved, pos,
d9d8b2a51a404c2 Qu Wenruo 2015-09-08 1727 write_bytes);
8257b2dc3c1a105 Miao Xie 2014-03-06 1728 else
38d37aa9c329382 Qu Wenruo 2020-06-24 1729 btrfs_check_nocow_unlock(BTRFS_I(inode));
7ee9e4405f264e9 Josef Bacik 2013-06-21 1730 break;
7ee9e4405f264e9 Josef Bacik 2013-06-21 1731 }
7ee9e4405f264e9 Josef Bacik 2013-06-21 1732
7ee9e4405f264e9 Josef Bacik 2013-06-21 1733 release_bytes = reserve_bytes;
376cc685cb3b43a Miao Xie 2013-12-10 1734 again:
4a64001f0047956 Josef Bacik 2011-01-25 1735 /*
4a64001f0047956 Josef Bacik 2011-01-25 1736 * This is going to setup the pages array with the number of
4a64001f0047956 Josef Bacik 2011-01-25 1737 * pages we want, so we don't really need to worry about the
4a64001f0047956 Josef Bacik 2011-01-25 1738 * contents of pages from loop to loop
4a64001f0047956 Josef Bacik 2011-01-25 1739 */
b37392ea86761e9 Miao Xie 2013-12-10 1740 ret = prepare_pages(inode, pages, num_pages,
b37392ea86761e9 Miao Xie 2013-12-10 1741 pos, write_bytes,
b6316429af7f365 Josef Bacik 2011-09-30 1742 force_page_uptodate);
8b62f87bad9cf06 Josef Bacik 2017-10-19 1743 if (ret) {
8b62f87bad9cf06 Josef Bacik 2017-10-19 1744 btrfs_delalloc_release_extents(BTRFS_I(inode),
8702ba9396bf7bb Qu Wenruo 2019-10-14 1745 reserve_bytes);
d0215f3e5ebb580 Josef Bacik 2011-01-25 1746 break;
8b62f87bad9cf06 Josef Bacik 2017-10-19 1747 }
39279cc3d2704cf Chris Mason 2007-06-12 1748
79f015f216539df Goldwyn Rodrigues 2017-10-16 1749 extents_locked = lock_and_cleanup_extent_if_need(
79f015f216539df Goldwyn Rodrigues 2017-10-16 1750 BTRFS_I(inode), pages,
2cff578cfceba88 Nikolay Borisov 2017-02-20 1751 num_pages, pos, write_bytes, &lockstart,
2e78c927d79333f Chandan Rajendra 2016-01-21 1752 &lockend, &cached_state);
79f015f216539df Goldwyn Rodrigues 2017-10-16 1753 if (extents_locked < 0) {
79f015f216539df Goldwyn Rodrigues 2017-10-16 1754 if (extents_locked == -EAGAIN)
376cc685cb3b43a Miao Xie 2013-12-10 1755 goto again;
8b62f87bad9cf06 Josef Bacik 2017-10-19 1756 btrfs_delalloc_release_extents(BTRFS_I(inode),
8702ba9396bf7bb Qu Wenruo 2019-10-14 1757 reserve_bytes);
79f015f216539df Goldwyn Rodrigues 2017-10-16 1758 ret = extents_locked;
376cc685cb3b43a Miao Xie 2013-12-10 1759 break;
376cc685cb3b43a Miao Xie 2013-12-10 1760 }
376cc685cb3b43a Miao Xie 2013-12-10 1761
ee22f0c4ec428e7 Zhao Lei 2016-01-06 1762 copied = btrfs_copy_from_user(pos, write_bytes, pages, i);
b1bf862e9dad431 Chris Mason 2011-02-28 1763
0b246afa62b0cf5 Jeff Mahoney 2016-06-22 1764 num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes);
56244ef151c3cd1 Chris Mason 2016-05-16 1765 dirty_sectors = round_up(copied + sector_offset,
0b246afa62b0cf5 Jeff Mahoney 2016-06-22 1766 fs_info->sectorsize);
0b246afa62b0cf5 Jeff Mahoney 2016-06-22 1767 dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
56244ef151c3cd1 Chris Mason 2016-05-16 1768
b1bf862e9dad431 Chris Mason 2011-02-28 1769 /*
b1bf862e9dad431 Chris Mason 2011-02-28 1770 * if we have trouble faulting in the pages, fall
b1bf862e9dad431 Chris Mason 2011-02-28 1771 * back to one page at a time
b1bf862e9dad431 Chris Mason 2011-02-28 1772 */
b1bf862e9dad431 Chris Mason 2011-02-28 1773 if (copied < write_bytes)
b1bf862e9dad431 Chris Mason 2011-02-28 1774 nrptrs = 1;
b1bf862e9dad431 Chris Mason 2011-02-28 1775
b6316429af7f365 Josef Bacik 2011-09-30 1776 if (copied == 0) {
b6316429af7f365 Josef Bacik 2011-09-30 1777 force_page_uptodate = true;
56244ef151c3cd1 Chris Mason 2016-05-16 1778 dirty_sectors = 0;
b1bf862e9dad431 Chris Mason 2011-02-28 1779 dirty_pages = 0;
b6316429af7f365 Josef Bacik 2011-09-30 1780 } else {
b6316429af7f365 Josef Bacik 2011-09-30 1781 force_page_uptodate = false;
ed6078f70335f15 David Sterba 2014-06-05 1782 dirty_pages = DIV_ROUND_UP(copied + offset,
09cbfeaf1a5a67b Kirill A. Shutemov 2016-04-01 1783 PAGE_SIZE);
b6316429af7f365 Josef Bacik 2011-09-30 1784 }
914ee295af418e9 Xin Zhong 2010-12-09 1785
2e78c927d79333f Chandan Rajendra 2016-01-21 1786 if (num_sectors > dirty_sectors) {
8b8b08cbfb9021a Chris Mason 2016-07-19 1787 /* release everything except the sectors we dirtied */
265fdfa6ce0a79d David Sterba 2020-07-01 1788 release_bytes -= dirty_sectors << fs_info->sectorsize_bits;
485290a734f1427 Qu Wenruo 2015-10-29 1789 if (only_release_metadata) {
691fa059673b3b3 Nikolay Borisov 2017-02-20 1790 btrfs_delalloc_release_metadata(BTRFS_I(inode),
43b18595d6603cb Qu Wenruo 2017-12-12 1791 release_bytes, true);
485290a734f1427 Qu Wenruo 2015-10-29 1792 } else {
485290a734f1427 Qu Wenruo 2015-10-29 1793 u64 __pos;
485290a734f1427 Qu Wenruo 2015-10-29 1794
da17066c40472c2 Jeff Mahoney 2016-06-15 1795 __pos = round_down(pos,
0b246afa62b0cf5 Jeff Mahoney 2016-06-22 1796 fs_info->sectorsize) +
09cbfeaf1a5a67b Kirill A. Shutemov 2016-04-01 1797 (dirty_pages << PAGE_SHIFT);
86d52921a2ba51a Nikolay Borisov 2020-06-03 1798 btrfs_delalloc_release_space(BTRFS_I(inode),
bc42bda22345efd Qu Wenruo 2017-02-27 1799 data_reserved, __pos,
43b18595d6603cb Qu Wenruo 2017-12-12 1800 release_bytes, true);
914ee295af418e9 Xin Zhong 2010-12-09 1801 }
485290a734f1427 Qu Wenruo 2015-10-29 1802 }
914ee295af418e9 Xin Zhong 2010-12-09 1803
2e78c927d79333f Chandan Rajendra 2016-01-21 1804 release_bytes = round_up(copied + sector_offset,
0b246afa62b0cf5 Jeff Mahoney 2016-06-22 1805 fs_info->sectorsize);
376cc685cb3b43a Miao Xie 2013-12-10 1806
088545f6e442605 Nikolay Borisov 2020-06-03 1807 ret = btrfs_dirty_pages(BTRFS_I(inode), pages,
088545f6e442605 Nikolay Borisov 2020-06-03 1808 dirty_pages, pos, copied,
aa8c1a41a1e6108 Goldwyn Rodrigues 2020-10-14 1809 &cached_state, only_release_metadata);
c67d970f0ea8dcc Filipe Manana 2019-09-30 1810
c67d970f0ea8dcc Filipe Manana 2019-09-30 1811 /*
c67d970f0ea8dcc Filipe Manana 2019-09-30 1812 * If we have not locked the extent range, because the range's
c67d970f0ea8dcc Filipe Manana 2019-09-30 1813 * start offset is >= i_size, we might still have a non-NULL
c67d970f0ea8dcc Filipe Manana 2019-09-30 1814 * cached extent state, acquired while marking the extent range
c67d970f0ea8dcc Filipe Manana 2019-09-30 1815 * as delalloc through btrfs_dirty_pages(). Therefore free any
c67d970f0ea8dcc Filipe Manana 2019-09-30 1816 * possible cached extent state to avoid a memory leak.
c67d970f0ea8dcc Filipe Manana 2019-09-30 1817 */
79f015f216539df Goldwyn Rodrigues 2017-10-16 1818 if (extents_locked)
376cc685cb3b43a Miao Xie 2013-12-10 1819 unlock_extent_cached(&BTRFS_I(inode)->io_tree,
e43bbe5e16d87b4 David Sterba 2017-12-12 1820 lockstart, lockend, &cached_state);
c67d970f0ea8dcc Filipe Manana 2019-09-30 1821 else
c67d970f0ea8dcc Filipe Manana 2019-09-30 1822 free_extent_state(cached_state);
c67d970f0ea8dcc Filipe Manana 2019-09-30 1823
8702ba9396bf7bb Qu Wenruo 2019-10-14 1824 btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
f1de968376340c9 Miao Xie 2014-01-09 1825 if (ret) {
d0215f3e5ebb580 Josef Bacik 2011-01-25 1826 btrfs_drop_pages(pages, num_pages);
d0215f3e5ebb580 Josef Bacik 2011-01-25 1827 break;
f1de968376340c9 Miao Xie 2014-01-09 1828 }
39279cc3d2704cf Chris Mason 2007-06-12 1829
7ee9e4405f264e9 Josef Bacik 2013-06-21 1830 release_bytes = 0;
8257b2dc3c1a105 Miao Xie 2014-03-06 1831 if (only_release_metadata)
38d37aa9c329382 Qu Wenruo 2020-06-24 1832 btrfs_check_nocow_unlock(BTRFS_I(inode));
8257b2dc3c1a105 Miao Xie 2014-03-06 1833
f1de968376340c9 Miao Xie 2014-01-09 1834 btrfs_drop_pages(pages, num_pages);
f1de968376340c9 Miao Xie 2014-01-09 1835
d0215f3e5ebb580 Josef Bacik 2011-01-25 1836 cond_resched();
d0215f3e5ebb580 Josef Bacik 2011-01-25 1837
d0e1d66b5aa1ec9 Namjae Jeon 2012-12-11 1838 balance_dirty_pages_ratelimited(inode->i_mapping);
cb843a6f513a1a9 Chris Mason 2008-10-03 1839
914ee295af418e9 Xin Zhong 2010-12-09 1840 pos += copied;
914ee295af418e9 Xin Zhong 2010-12-09 1841 num_written += copied;
d0215f3e5ebb580 Josef Bacik 2011-01-25 1842 }
39279cc3d2704cf Chris Mason 2007-06-12 1843
d0215f3e5ebb580 Josef Bacik 2011-01-25 1844 kfree(pages);
d0215f3e5ebb580 Josef Bacik 2011-01-25 1845
7ee9e4405f264e9 Josef Bacik 2013-06-21 1846 if (release_bytes) {
8257b2dc3c1a105 Miao Xie 2014-03-06 1847 if (only_release_metadata) {
38d37aa9c329382 Qu Wenruo 2020-06-24 1848 btrfs_check_nocow_unlock(BTRFS_I(inode));
691fa059673b3b3 Nikolay Borisov 2017-02-20 1849 btrfs_delalloc_release_metadata(BTRFS_I(inode),
43b18595d6603cb Qu Wenruo 2017-12-12 1850 release_bytes, true);
8257b2dc3c1a105 Miao Xie 2014-03-06 1851 } else {
86d52921a2ba51a Nikolay Borisov 2020-06-03 1852 btrfs_delalloc_release_space(BTRFS_I(inode),
86d52921a2ba51a Nikolay Borisov 2020-06-03 1853 data_reserved,
0b246afa62b0cf5 Jeff Mahoney 2016-06-22 1854 round_down(pos, fs_info->sectorsize),
43b18595d6603cb Qu Wenruo 2017-12-12 1855 release_bytes, true);
7ee9e4405f264e9 Josef Bacik 2013-06-21 1856 }
8257b2dc3c1a105 Miao Xie 2014-03-06 1857 }
7ee9e4405f264e9 Josef Bacik 2013-06-21 1858
364ecf3651e0862 Qu Wenruo 2017-02-27 1859 extent_changeset_free(data_reserved);
5e8b9ef30392bb8 Goldwyn Rodrigues 2020-09-24 1860 if (num_written > 0) {
5e8b9ef30392bb8 Goldwyn Rodrigues 2020-09-24 1861 pagecache_isize_extended(inode, old_isize, iocb->ki_pos);
5e8b9ef30392bb8 Goldwyn Rodrigues 2020-09-24 1862 iocb->ki_pos += num_written;
5e8b9ef30392bb8 Goldwyn Rodrigues 2020-09-24 1863 }
c352370633400d1 Goldwyn Rodrigues 2020-09-24 1864 out:
d0215f3e5ebb580 Josef Bacik 2011-01-25 @1865 return num_written ? num_written : ret;
39279cc3d2704cf Chris Mason 2007-06-12 1866 }
d0215f3e5ebb580 Josef Bacik 2011-01-25 1867
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] btrfs: Make btrfs_direct_write atomic with respect to inode_lock
2020-12-08 18:42 ` [PATCH 2/2] btrfs: Make btrfs_direct_write atomic with respect to inode_lock Goldwyn Rodrigues
@ 2020-12-10 8:52 ` Nikolay Borisov
0 siblings, 0 replies; 11+ messages in thread
From: Nikolay Borisov @ 2020-12-10 8:52 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-btrfs; +Cc: Goldwyn Rodrigues
On 8.12.20 at 20:42, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> btrfs_direct_write() falls back to a buffered write when btrfs is unable
> to perform or complete a direct I/O. During the fallback, the inode lock
> is unlocked and relocked. This does not guarantee the atomicity of the
> entire write, since another writer can acquire the lock between the
> unlock and the relock.
>
> __btrfs_buffered_write() performs the write without taking locks or
> running checks, and is called from btrfs_direct_write().
>
> fa54fc76db94 ("btrfs: push inode locking and unlocking into buffered/direct write")
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
> fs/btrfs/file.c | 55 +++++++++++++++++++++++++++----------------------
> 1 file changed, 30 insertions(+), 25 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 272660a8279f..03569fe20237 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1649,11 +1649,11 @@ static ssize_t btrfs_write_check(struct kiocb *iocb, struct iov_iter *from)
> return count;
> }
>
> -static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
> +static noinline ssize_t __btrfs_buffered_write(struct kiocb *iocb,
> struct iov_iter *i)
> {
> struct file *file = iocb->ki_filp;
> - loff_t pos;
> + loff_t pos = iocb->ki_pos;
> struct inode *inode = file_inode(file);
> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> struct page **pages = NULL;
> @@ -1667,20 +1667,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
> bool only_release_metadata = false;
> bool force_page_uptodate = false;
> loff_t old_isize = i_size_read(inode);
> - unsigned int ilock_flags = 0;
> -
> - if (iocb->ki_flags & IOCB_NOWAIT)
> - ilock_flags |= BTRFS_ILOCK_TRY;
> -
> - ret = btrfs_inode_lock(inode, ilock_flags);
> - if (ret < 0)
> - return ret;
> -
> - ret = btrfs_write_check(iocb, i);
> - if (ret <= 0)
> - goto out;
>
> - pos = iocb->ki_pos;
Add lockdep_assert_held(&inode->i_rwsem); since __btrfs_buffered_write
does require the lock to be held.
<snip>
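For illustration only, here is a minimal userspace sketch of the kind of
check suggested above: a helper that documents and verifies that its caller
already holds the lock. All names (demo_lock, demo_assert_held,
demo_write_locked, demo_warnings) are hypothetical and not part of the
patch. The sketch assumes a single-threaded toy lock that only tracks a
"held" flag, whereas the kernel's lockdep tracks held locks per task and is
only active when lockdep is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for inode->i_rwsem; only tracks whether it is held. */
struct demo_lock {
	bool held;
};

/* Counts failed assertions, loosely like warnings in the kernel log. */
static unsigned int demo_warnings;

static void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);	/* toy lock: no recursion, no contention */
	l->held = true;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Loose analogue of lockdep_assert_held(): complain, but keep going. */
static void demo_assert_held(const struct demo_lock *l)
{
	if (!l->held)
		demo_warnings++;
}

/* Analogue of a "__"-prefixed helper: the caller must hold the lock. */
static void demo_write_locked(struct demo_lock *l, unsigned int *bytes,
			      unsigned int len)
{
	demo_assert_held(l);
	*bytes += len;
}
```

Calling demo_write_locked() without first calling demo_lock_acquire()
increments demo_warnings, which is roughly the signal such an assertion is
meant to give to a caller that forgot the lock.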
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] btrfs: Make btrfs_direct_write atomic with respect to inode_lock
2020-12-08 18:42 [PATCH 0/2] Fix direct write with respect to inode locking Goldwyn Rodrigues
@ 2020-12-08 18:42 ` Goldwyn Rodrigues
2020-12-10 8:52 ` Nikolay Borisov
0 siblings, 1 reply; 11+ messages in thread
From: Goldwyn Rodrigues @ 2020-12-08 18:42 UTC (permalink / raw)
To: linux-btrfs; +Cc: Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
btrfs_direct_write() falls back to a buffered write when btrfs is unable
to perform or complete a direct I/O. During the fallback, the inode lock
is unlocked and relocked. This does not guarantee the atomicity of the
entire write, since another writer can acquire the lock between the
unlock and the relock.
__btrfs_buffered_write() performs the write without taking locks or
running checks, and is called from btrfs_direct_write().
fa54fc76db94 ("btrfs: push inode locking and unlocking into buffered/direct write")
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
fs/btrfs/file.c | 55 +++++++++++++++++++++++++++----------------------
1 file changed, 30 insertions(+), 25 deletions(-)
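The locking change described in the commit message can be sketched in
userspace as follows. This is a simplified model under stated assumptions,
not the btrfs code: the toy_* names are hypothetical, the "lock" is a
single-threaded flag that only records state, and a direct write is modelled
as completing direct_ok bytes before the remainder falls back to the
buffered path. The point it shows is that the fallback calls the unlocked
helper while the lock is still held, so the lock is released exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an inode and its lock: single-threaded, state only. */
struct toy_inode {
	bool locked;		/* is the "inode lock" currently held? */
	unsigned int unlocks;	/* how many times it has been released */
	size_t size;		/* bytes "written" so far */
};

static void toy_lock(struct toy_inode *inode)
{
	assert(!inode->locked);
	inode->locked = true;
}

static void toy_unlock(struct toy_inode *inode)
{
	assert(inode->locked);
	inode->locked = false;
	inode->unlocks++;
}

/* Analogue of __btrfs_buffered_write(): no locking, caller holds the lock. */
static long toy_buffered_write_locked(struct toy_inode *inode, size_t len)
{
	assert(inode->locked);
	inode->size += len;
	return (long)len;
}

/* Analogue of btrfs_buffered_write(): lock, write, unlock. */
static long toy_buffered_write(struct toy_inode *inode, size_t len)
{
	long ret;

	toy_lock(inode);
	ret = toy_buffered_write_locked(inode, len);
	toy_unlock(inode);
	return ret;
}

/*
 * Analogue of btrfs_direct_write(): the direct path completes only
 * direct_ok bytes; the remainder goes through the buffered helper
 * without dropping the lock in between.
 */
static long toy_direct_write(struct toy_inode *inode, size_t len,
			     size_t direct_ok)
{
	long written = 0;

	toy_lock(inode);
	if (direct_ok > len)
		direct_ok = len;
	inode->size += direct_ok;
	written += (long)direct_ok;
	if (direct_ok < len)
		written += toy_buffered_write_locked(inode, len - direct_ok);
	toy_unlock(inode);	/* single unlock at the single exit */
	return written;
}
```

With this shape, a partial direct write such as
toy_direct_write(&inode, 4096, 1024) leaves inode.unlocks at 1, so there is
no window between the direct and buffered halves in which another writer
could take the lock.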
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 272660a8279f..03569fe20237 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1649,11 +1649,11 @@ static ssize_t btrfs_write_check(struct kiocb *iocb, struct iov_iter *from)
return count;
}
-static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
+static noinline ssize_t __btrfs_buffered_write(struct kiocb *iocb,
struct iov_iter *i)
{
struct file *file = iocb->ki_filp;
- loff_t pos;
+ loff_t pos = iocb->ki_pos;
struct inode *inode = file_inode(file);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct page **pages = NULL;
@@ -1667,20 +1667,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
bool only_release_metadata = false;
bool force_page_uptodate = false;
loff_t old_isize = i_size_read(inode);
- unsigned int ilock_flags = 0;
-
- if (iocb->ki_flags & IOCB_NOWAIT)
- ilock_flags |= BTRFS_ILOCK_TRY;
-
- ret = btrfs_inode_lock(inode, ilock_flags);
- if (ret < 0)
- return ret;
-
- ret = btrfs_write_check(iocb, i);
- if (ret <= 0)
- goto out;
- pos = iocb->ki_pos;
nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
PAGE_SIZE / (sizeof(struct page *)));
nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied);
@@ -1884,10 +1871,33 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
iocb->ki_pos += num_written;
}
out:
- btrfs_inode_unlock(inode, ilock_flags);
return num_written ? num_written : ret;
}
+static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
+ struct iov_iter *i)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+ unsigned int ilock_flags = 0;
+ ssize_t ret;
+
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ ilock_flags |= BTRFS_ILOCK_TRY;
+
+ ret = btrfs_inode_lock(inode, ilock_flags);
+ if (ret < 0)
+ return ret;
+
+ ret = btrfs_write_check(iocb, i);
+ if (ret <= 0)
+ goto out;
+
+ ret = __btrfs_buffered_write(iocb, i);
+out:
+ btrfs_inode_unlock(inode, ilock_flags);
+ return ret;
+}
+
static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
const struct iov_iter *iter, loff_t offset)
{
@@ -1928,10 +1938,8 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
return err;
err = btrfs_write_check(iocb, from);
- if (err <= 0) {
- btrfs_inode_unlock(inode, ilock_flags);
+ if (err <= 0)
goto out;
- }
pos = iocb->ki_pos;
/*
@@ -1945,16 +1953,12 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
goto relock;
}
- if (check_direct_IO(fs_info, from, pos)) {
- btrfs_inode_unlock(inode, ilock_flags);
+ if (check_direct_IO(fs_info, from, pos))
goto buffered;
- }
dio = __iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops,
&btrfs_dio_ops, is_sync_kiocb(iocb));
- btrfs_inode_unlock(inode, ilock_flags);
-
if (IS_ERR_OR_NULL(dio)) {
err = PTR_ERR_OR_ZERO(dio);
if (err < 0 && err != -ENOTBLK)
@@ -1970,7 +1974,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
buffered:
pos = iocb->ki_pos;
- written_buffered = btrfs_buffered_write(iocb, from);
+ written_buffered = __btrfs_buffered_write(iocb, from);
if (written_buffered < 0) {
err = written_buffered;
goto out;
@@ -1991,6 +1995,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
invalidate_mapping_pages(file->f_mapping, pos >> PAGE_SHIFT,
endbyte >> PAGE_SHIFT);
out:
+ btrfs_inode_unlock(inode, ilock_flags);
return written ? written : err;
}
--
2.29.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-12-16 21:08 UTC | newest]
Thread overview: 11+ messages
2020-12-15 18:06 [PATCH v2 0/2] Fix locking for btrfs direct writes Goldwyn Rodrigues
2020-12-15 18:06 ` [PATCH 1/2] iomap: Separate out generic_write_sync() from iomap_dio_complete() Goldwyn Rodrigues
2020-12-15 21:24 ` kernel test robot
2020-12-15 21:24 ` kernel test robot
2020-12-15 22:16 ` Dave Chinner
2020-12-15 18:06 ` [PATCH 2/2] btrfs: Make btrfs_direct_write atomic with respect to inode_lock Goldwyn Rodrigues
2020-12-15 22:13 ` Darrick J. Wong
2020-12-16 21:07 ` Goldwyn Rodrigues
-- strict thread matches above, loose matches on Subject: below --
2020-12-16 1:06 kernel test robot
2020-12-08 18:42 [PATCH 0/2] Fix direct write with respect to inode locking Goldwyn Rodrigues
2020-12-08 18:42 ` [PATCH 2/2] btrfs: Make btrfs_direct_write atomic with respect to inode_lock Goldwyn Rodrigues
2020-12-10 8:52 ` Nikolay Borisov