linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O
@ 2018-05-11  6:30 Omar Sandoval
  2018-05-11  6:30 ` [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() Omar Sandoval
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Omar Sandoval @ 2018-05-11  6:30 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Hi, everyone,

Btrfs currently abuses current->journal_info in btrfs_direct_IO() in
order to pass around some state to get_block() and submit_io(). This
hack is ugly and unnecessary because the data we pass around is only
used in one call frame. Robbie Ko also pointed out [1] that it could
potentially cause a crash if we happen to end up in start_transaction()
(e.g., from memory reclaim calling into btrfs_evict_inode(), which can
start a transaction). I'm not convinced that Robbie's case can happen in
practice since we are using GFP_NOFS for allocations during direct I/O,
but either way it's fragile and nasty.

This series stops using current->journal_info and instead adds some
extra arguments to the generic direct I/O code so that we can pass
things around like sane people.

Based on Linus' master.

Thanks!

1: https://patchwork.kernel.org/patch/10389077/

Omar Sandoval (3):
  fs: add initial bh_result->b_private value to __blockdev_direct_IO()
  fs: add private argument to dio_submit_t
  Btrfs: stop abusing current->journal_info in btrfs_direct_IO()

 fs/btrfs/inode.c   | 39 ++++++++++-----------------------------
 fs/direct-io.c     | 12 +++++++-----
 fs/ext4/inode.c    |  6 +++---
 fs/gfs2/aops.c     |  2 +-
 fs/ocfs2/aops.c    |  5 ++---
 include/linux/fs.h | 10 +++++-----
 6 files changed, 28 insertions(+), 46 deletions(-)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO()
  2018-05-11  6:30 [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O Omar Sandoval
@ 2018-05-11  6:30 ` Omar Sandoval
  2018-05-11 20:05   ` Al Viro
  2018-05-11  6:30 ` [PATCH 2/3] fs: add private argument to dio_submit_t Omar Sandoval
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Omar Sandoval @ 2018-05-11  6:30 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Btrfs abuses current->journal_info in btrfs_direct_IO() in order to pass
around some state to get_block() and submit_io(). The generic DIO code
already provides bh_result->b_private as a way to pass data between
calls to get_block() and end_io(), but it is always initialized to NULL.
Let's allow an initial value to be passed in from the caller of
__blockdev_direct_IO().

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c   | 6 +++---
 fs/direct-io.c     | 9 +++++----
 fs/ext4/inode.c    | 6 +++---
 fs/gfs2/aops.c     | 2 +-
 fs/ocfs2/aops.c    | 5 ++---
 include/linux/fs.h | 8 ++++----
 6 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d241285a0d2a..d8b0a7eacd19 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8618,9 +8618,9 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	}
 
 	ret = __blockdev_direct_IO(iocb, inode,
-				   fs_info->fs_devices->latest_bdev,
-				   iter, btrfs_get_blocks_direct, NULL,
-				   btrfs_submit_direct, flags);
+				   fs_info->fs_devices->latest_bdev, iter,
+				   btrfs_get_blocks_direct, NULL,
+				   btrfs_submit_direct, flags, NULL);
 	if (iov_iter_rw(iter) == WRITE) {
 		up_read(&BTRFS_I(inode)->dio_sem);
 		current->journal_info = NULL;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 874607bb6e02..9bb33d9c206c 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1171,7 +1171,7 @@ static inline ssize_t
 do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		      struct block_device *bdev, struct iov_iter *iter,
 		      get_block_t get_block, dio_iodone_t end_io,
-		      dio_submit_t submit_io, int flags)
+		      dio_submit_t submit_io, int flags, void *private)
 {
 	unsigned i_blkbits = READ_ONCE(inode->i_blkbits);
 	unsigned blkbits = i_blkbits;
@@ -1182,7 +1182,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	const loff_t end = offset + count;
 	struct dio *dio;
 	struct dio_submit sdio = { 0, };
-	struct buffer_head map_bh = { 0, };
+	struct buffer_head map_bh = { .b_private = private, };
 	struct blk_plug plug;
 	unsigned long align = offset | iov_iter_alignment(iter);
 
@@ -1304,6 +1304,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 
 	sdio.get_block = get_block;
 	dio->end_io = end_io;
+	dio->private = map_bh.b_private;
 	sdio.submit_io = submit_io;
 	sdio.final_block_in_bio = -1;
 	sdio.next_block_for_io = -1;
@@ -1400,7 +1401,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 			     struct block_device *bdev, struct iov_iter *iter,
 			     get_block_t get_block,
 			     dio_iodone_t end_io, dio_submit_t submit_io,
-			     int flags)
+			     int flags, void *private)
 {
 	/*
 	 * The block device state is needed in the end to finally
@@ -1415,7 +1416,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	prefetch((char *)bdev->bd_queue + SMP_CACHE_BYTES);
 
 	return do_blockdev_direct_IO(iocb, inode, bdev, iter, get_block,
-				     end_io, submit_io, flags);
+				     end_io, submit_io, flags, private);
 }
 
 EXPORT_SYMBOL(__blockdev_direct_IO);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1e50c5efae67..e453e7529a3e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3739,7 +3739,7 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter)
 	}
 	ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter,
 				   get_block_func, ext4_end_io_dio, NULL,
-				   dio_flags);
+				   dio_flags, NULL);
 
 	if (ret > 0 && !overwrite && ext4_test_inode_state(inode,
 						EXT4_STATE_DIO_UNWRITTEN)) {
@@ -3830,8 +3830,8 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter)
 					   iocb->ki_pos + count - 1);
 	if (ret)
 		goto out_unlock;
-	ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
-				   iter, ext4_dio_get_block, NULL, NULL, 0);
+	ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter,
+				   ext4_dio_get_block, NULL, NULL, 0, NULL);
 out_unlock:
 	inode_unlock_shared(inode);
 	return ret;
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index f58716567972..d42e68543baa 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -1113,7 +1113,7 @@ static ssize_t gfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	}
 
 	rv = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter,
-				  gfs2_get_block_direct, NULL, NULL, 0);
+				  gfs2_get_block_direct, NULL, NULL, 0, NULL);
 out:
 	gfs2_glock_dq(&gh);
 out_uninit:
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 302cd7caa4a7..58b74d9da931 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2446,9 +2446,8 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	else
 		get_block = ocfs2_dio_wr_get_block;
 
-	return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
-				    iter, get_block,
-				    ocfs2_dio_end_io, NULL, 0);
+	return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter,
+				    get_block, ocfs2_dio_end_io, NULL, 0, NULL);
 }
 
 const struct address_space_operations ocfs2_aops = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..b9ef2fb1224b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2993,9 +2993,8 @@ void dio_warn_stale_pagecache(struct file *filp);
 
 ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 			     struct block_device *bdev, struct iov_iter *iter,
-			     get_block_t get_block,
-			     dio_iodone_t end_io, dio_submit_t submit_io,
-			     int flags);
+			     get_block_t get_block, dio_iodone_t end_io,
+			     dio_submit_t submit_io, int flags, void *private);
 
 static inline ssize_t blockdev_direct_IO(struct kiocb *iocb,
 					 struct inode *inode,
@@ -3003,7 +3002,8 @@ static inline ssize_t blockdev_direct_IO(struct kiocb *iocb,
 					 get_block_t get_block)
 {
 	return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter,
-			get_block, NULL, NULL, DIO_LOCKING | DIO_SKIP_HOLES);
+				    get_block, NULL, NULL,
+				    DIO_LOCKING | DIO_SKIP_HOLES, NULL);
 }
 #endif
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] fs: add private argument to dio_submit_t
  2018-05-11  6:30 [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O Omar Sandoval
  2018-05-11  6:30 ` [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() Omar Sandoval
@ 2018-05-11  6:30 ` Omar Sandoval
  2018-05-11  6:30 ` [PATCH 3/3] Btrfs: stop abusing current->journal_info in btrfs_direct_IO() Omar Sandoval
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2018-05-11  6:30 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

get_block() already has access to this through bh_result->b_private, and
it gets passed to end_io() as private, so do the same for submit_io().
Along with the previous change, this will allow us to get rid Btrfs'
current->journal_info abuse in btrfs_direct_IO().

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c   | 2 +-
 fs/direct-io.c     | 3 ++-
 include/linux/fs.h | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d8b0a7eacd19..46e3d366df8b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8418,7 +8418,7 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
 }
 
 static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
-				loff_t file_offset)
+				loff_t file_offset, void *private)
 {
 	struct btrfs_dio_private *dip = NULL;
 	struct bio *bio = NULL;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 9bb33d9c206c..3f3aa2d95ca1 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -475,7 +475,8 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
 	dio->bio_disk = bio->bi_disk;
 
 	if (sdio->submit_io) {
-		sdio->submit_io(bio, dio->inode, sdio->logical_offset_in_bio);
+		sdio->submit_io(bio, dio->inode, sdio->logical_offset_in_bio,
+				dio->private);
 		dio->bio_cookie = BLK_QC_T_NONE;
 	} else
 		dio->bio_cookie = submit_bio(bio);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b9ef2fb1224b..9e5e6ea1c144 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2978,7 +2978,7 @@ extern int nonseekable_open(struct inode * inode, struct file * filp);
 
 #ifdef CONFIG_BLOCK
 typedef void (dio_submit_t)(struct bio *bio, struct inode *inode,
-			    loff_t file_offset);
+			    loff_t file_offset, void *private);
 
 enum {
 	/* need locking between buffered and direct access */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] Btrfs: stop abusing current->journal_info in btrfs_direct_IO()
  2018-05-11  6:30 [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O Omar Sandoval
  2018-05-11  6:30 ` [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() Omar Sandoval
  2018-05-11  6:30 ` [PATCH 2/3] fs: add private argument to dio_submit_t Omar Sandoval
@ 2018-05-11  6:30 ` Omar Sandoval
  2018-05-11  9:26 ` [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O David Sterba
  2018-05-11  9:53 ` Nikolay Borisov
  4 siblings, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2018-05-11  6:30 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Now that we can pass around the struct btrfs_dio_data through the
different callbacks generically, we don't need to shove it in
current->journal_info.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 33 +++++++--------------------------
 1 file changed, 7 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 46e3d366df8b..3673c0420509 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7574,7 +7574,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_map *em;
 	struct extent_state *cached_state = NULL;
-	struct btrfs_dio_data *dio_data = NULL;
+	struct btrfs_dio_data *dio_data = bh_result->b_private;
 	u64 start = iblock << inode->i_blkbits;
 	u64 lockstart, lockend;
 	u64 len = bh_result->b_size;
@@ -7589,25 +7589,13 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 	lockstart = start;
 	lockend = start + len - 1;
 
-	if (current->journal_info) {
-		/*
-		 * Need to pull our outstanding extents and set journal_info to NULL so
-		 * that anything that needs to check if there's a transaction doesn't get
-		 * confused.
-		 */
-		dio_data = current->journal_info;
-		current->journal_info = NULL;
-	}
-
 	/*
 	 * If this errors out it's because we couldn't invalidate pagecache for
 	 * this range and we need to fallback to buffered.
 	 */
 	if (lock_extent_direct(inode, lockstart, lockend, &cached_state,
-			       create)) {
-		ret = -ENOTBLK;
-		goto err;
-	}
+			       create))
+		return -ENOTBLK;
 
 	em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, len, 0);
 	if (IS_ERR(em)) {
@@ -7732,7 +7720,6 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 		WARN_ON(dio_data->reserve < len);
 		dio_data->reserve -= len;
 		dio_data->unsubmitted_oe_range_end = start + len;
-		current->journal_info = dio_data;
 	}
 
 	/*
@@ -7755,9 +7742,6 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 unlock_err:
 	clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, lockend,
 			 unlock_bits, 1, 0, &cached_state);
-err:
-	if (dio_data)
-		current->journal_info = dio_data;
 	return ret;
 }
 
@@ -8460,7 +8444,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 	 * time by btrfs_direct_IO().
 	 */
 	if (write) {
-		struct btrfs_dio_data *dio_data = current->journal_info;
+		struct btrfs_dio_data *dio_data = private;
 
 		dio_data->unsubmitted_oe_range_end = dip->logical_offset +
 			dip->bytes;
@@ -8602,13 +8586,11 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		/*
 		 * We need to know how many extents we reserved so that we can
 		 * do the accounting properly if we go over the number we
-		 * originally calculated.  Abuse current->journal_info for this.
+		 * originally calculated.
 		 */
-		dio_data.reserve = round_up(count,
-					    fs_info->sectorsize);
+		dio_data.reserve = round_up(count, fs_info->sectorsize);
 		dio_data.unsubmitted_oe_range_start = (u64)offset;
 		dio_data.unsubmitted_oe_range_end = (u64)offset;
-		current->journal_info = &dio_data;
 		down_read(&BTRFS_I(inode)->dio_sem);
 	} else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 				     &BTRFS_I(inode)->runtime_flags)) {
@@ -8620,10 +8602,9 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	ret = __blockdev_direct_IO(iocb, inode,
 				   fs_info->fs_devices->latest_bdev, iter,
 				   btrfs_get_blocks_direct, NULL,
-				   btrfs_submit_direct, flags, NULL);
+				   btrfs_submit_direct, flags, &dio_data);
 	if (iov_iter_rw(iter) == WRITE) {
 		up_read(&BTRFS_I(inode)->dio_sem);
-		current->journal_info = NULL;
 		if (ret < 0 && ret != -EIOCBQUEUED) {
 			if (dio_data.reserve)
 				btrfs_delalloc_release_space(inode, data_reserved,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O
  2018-05-11  6:30 [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O Omar Sandoval
                   ` (2 preceding siblings ...)
  2018-05-11  6:30 ` [PATCH 3/3] Btrfs: stop abusing current->journal_info in btrfs_direct_IO() Omar Sandoval
@ 2018-05-11  9:26 ` David Sterba
  2018-05-11  9:53 ` Nikolay Borisov
  4 siblings, 0 replies; 15+ messages in thread
From: David Sterba @ 2018-05-11  9:26 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, linux-fsdevel, kernel-team

On Thu, May 10, 2018 at 11:30:09PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Hi, everyone,
> 
> Btrfs currently abuses current->journal_info in btrfs_direct_IO() in
> order to pass around some state to get_block() and submit_io(). This
> hack is ugly and unnecessary because the data we pass around is only
> used in one call frame.

I'd very much like to get rid of the journal_info hack. The changes to
ther filesystems are minimal.

The 3 patches look good to me, you can add my reviewed-by for btrfs
and ack for the rest.  I'm going to do a fstests round too.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O
  2018-05-11  6:30 [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O Omar Sandoval
                   ` (3 preceding siblings ...)
  2018-05-11  9:26 ` [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O David Sterba
@ 2018-05-11  9:53 ` Nikolay Borisov
  2018-05-11 10:24   ` David Sterba
  2018-05-11 17:28   ` Omar Sandoval
  4 siblings, 2 replies; 15+ messages in thread
From: Nikolay Borisov @ 2018-05-11  9:53 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs, linux-fsdevel; +Cc: kernel-team



On 11.05.2018 09:30, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Hi, everyone,
> 
> Btrfs currently abuses current->journal_info in btrfs_direct_IO() in
> order to pass around some state to get_block() and submit_io(). This
> hack is ugly and unnecessary because the data we pass around is only
> used in one call frame. Robbie Ko also pointed out [1] that it could
> potentially cause a crash if we happen to end up in start_transaction()
> (e.g., from memory reclaim calling into btrfs_evict_inode(), which can
> start a transaction). I'm not convinced that Robbie's case can happen in
> practice since we are using GFP_NOFS for allocations during direct I/O,
> but either way it's fragile and nasty.

When I worked initially on btrfs-over-swap I managed to hit a case where
ext4 stacked on top of btrfs would crash since btrfs will overwrite
journal_info which was set by ext4. So this change is indeed welcome :)

> 
> This series stops using current->journal_info and instead adds some
> extra arguments to the generic direct I/O code so that we can pass
> things around like sane people.
> 
> Based on Linus' master.
> 
> Thanks!
> 
> 1: https://patchwork.kernel.org/patch/10389077/
> 
> Omar Sandoval (3):
>   fs: add initial bh_result->b_private value to __blockdev_direct_IO()
>   fs: add private argument to dio_submit_t
>   Btrfs: stop abusing current->journal_info in btrfs_direct_IO()
> 
>  fs/btrfs/inode.c   | 39 ++++++++++-----------------------------
>  fs/direct-io.c     | 12 +++++++-----
>  fs/ext4/inode.c    |  6 +++---
>  fs/gfs2/aops.c     |  2 +-
>  fs/ocfs2/aops.c    |  5 ++---
>  include/linux/fs.h | 10 +++++-----
>  6 files changed, 28 insertions(+), 46 deletions(-)
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O
  2018-05-11  9:53 ` Nikolay Borisov
@ 2018-05-11 10:24   ` David Sterba
  2018-05-11 17:28   ` Omar Sandoval
  1 sibling, 0 replies; 15+ messages in thread
From: David Sterba @ 2018-05-11 10:24 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Omar Sandoval, linux-btrfs, linux-fsdevel, kernel-team

On Fri, May 11, 2018 at 12:53:36PM +0300, Nikolay Borisov wrote:
> 
> 
> On 11.05.2018 09:30, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > Hi, everyone,
> > 
> > Btrfs currently abuses current->journal_info in btrfs_direct_IO() in
> > order to pass around some state to get_block() and submit_io(). This
> > hack is ugly and unnecessary because the data we pass around is only
> > used in one call frame. Robbie Ko also pointed out [1] that it could
> > potentially cause a crash if we happen to end up in start_transaction()
> > (e.g., from memory reclaim calling into btrfs_evict_inode(), which can
> > start a transaction). I'm not convinced that Robbie's case can happen in
> > practice since we are using GFP_NOFS for allocations during direct I/O,
> > but either way it's fragile and nasty.
> 
> When I worked initially on btrfs-over-swap I managed to hit a case where
> ext4 stacked on top of btrfs would crash since btrfs will overwrite
> journal_info which was set by ext4. So this change is indeed welcome :)

And also this, https://lkml.org/lkml/2017/12/14/165.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O
  2018-05-11  9:53 ` Nikolay Borisov
  2018-05-11 10:24   ` David Sterba
@ 2018-05-11 17:28   ` Omar Sandoval
  1 sibling, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2018-05-11 17:28 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs, linux-fsdevel, kernel-team

On Fri, May 11, 2018 at 12:53:36PM +0300, Nikolay Borisov wrote:
> 
> 
> On 11.05.2018 09:30, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > Hi, everyone,
> > 
> > Btrfs currently abuses current->journal_info in btrfs_direct_IO() in
> > order to pass around some state to get_block() and submit_io(). This
> > hack is ugly and unnecessary because the data we pass around is only
> > used in one call frame. Robbie Ko also pointed out [1] that it could
> > potentially cause a crash if we happen to end up in start_transaction()
> > (e.g., from memory reclaim calling into btrfs_evict_inode(), which can
> > start a transaction). I'm not convinced that Robbie's case can happen in
> > practice since we are using GFP_NOFS for allocations during direct I/O,
> > but either way it's fragile and nasty.
> 
> When I worked initially on btrfs-over-swap I managed to hit a case where
> ext4 stacked on top of btrfs would crash since btrfs will overwrite
> journal_info which was set by ext4. So this change is indeed welcome :)

Yup, that's what I originally made these patches for. Although my latest
idea for swap is to do something along the lines of Darrick's
iomap_swap_activate(): https://patchwork.kernel.org/patch/10376435/,
I'll be getting back to that soon.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO()
  2018-05-11  6:30 ` [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() Omar Sandoval
@ 2018-05-11 20:05   ` Al Viro
  2018-05-11 20:30     ` Omar Sandoval
  0 siblings, 1 reply; 15+ messages in thread
From: Al Viro @ 2018-05-11 20:05 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, linux-fsdevel, kernel-team

On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote:
>  do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  		      struct block_device *bdev, struct iov_iter *iter,
>  		      get_block_t get_block, dio_iodone_t end_io,
> -		      dio_submit_t submit_io, int flags)
> +		      dio_submit_t submit_io, int flags, void *private)

Oh, dear...  That's what, 9 arguments?  I agree that the hack in question
is obscene, but so is this ;-/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO()
  2018-05-11 20:05   ` Al Viro
@ 2018-05-11 20:30     ` Omar Sandoval
  2018-05-11 20:32       ` Al Viro
  2018-05-14 16:35       ` David Sterba
  0 siblings, 2 replies; 15+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:30 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-btrfs, linux-fsdevel, kernel-team

On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote:
> On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote:
> >  do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
> >  		      struct block_device *bdev, struct iov_iter *iter,
> >  		      get_block_t get_block, dio_iodone_t end_io,
> > -		      dio_submit_t submit_io, int flags)
> > +		      dio_submit_t submit_io, int flags, void *private)
> 
> Oh, dear...  That's what, 9 arguments?  I agree that the hack in question
> is obscene, but so is this ;-/

So looking at these one by one, obviously needed:

- iocb
- inode
- iter

bdev is almost always inode->i_sb->s_bdev, except for Btrfs :(

These could _maybe_ go in struct kiocb:

- flags could maybe be folded into ki_flags
- private could maybe go in iocb->private, but I haven't yet read
  through to figure out if we're already using iocb->private for direct
  I/O

That leaves the callbacks, get_block, end_io, and submit_io. Perhaps we
can add those to inode_operations?

Thoughts?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO()
  2018-05-11 20:30     ` Omar Sandoval
@ 2018-05-11 20:32       ` Al Viro
  2018-05-11 20:41         ` Omar Sandoval
  2018-05-14 16:35       ` David Sterba
  1 sibling, 1 reply; 15+ messages in thread
From: Al Viro @ 2018-05-11 20:32 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, linux-fsdevel, kernel-team

On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote:
> On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote:
> > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote:
> > >  do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
> > >  		      struct block_device *bdev, struct iov_iter *iter,
> > >  		      get_block_t get_block, dio_iodone_t end_io,
> > > -		      dio_submit_t submit_io, int flags)
> > > +		      dio_submit_t submit_io, int flags, void *private)
> > 
> > Oh, dear...  That's what, 9 arguments?  I agree that the hack in question
> > is obscene, but so is this ;-/
> 
> So looking at these one by one, obviously needed:
> 
> - iocb
> - inode
> - iter
> 
> bdev is almost always inode->i_sb->s_bdev, except for Btrfs :(
> 
> These could _maybe_ go in struct kiocb:
> 
> - flags could maybe be folded into ki_flags
> - private could maybe go in iocb->private, but I haven't yet read
>   through to figure out if we're already using iocb->private for direct
>   I/O
> 
> That leaves the callbacks, get_block, end_io, and submit_io. Perhaps we
> can add those to inode_operations?

Or, perhaps, btrfs shouldn't be using the common helper?  The question
is not where to stash the bits and pieces - it's how unreadable the callers
are and how much boilerplate/hidden information is involved...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO()
  2018-05-11 20:32       ` Al Viro
@ 2018-05-11 20:41         ` Omar Sandoval
  0 siblings, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:41 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-btrfs, linux-fsdevel, kernel-team

On Fri, May 11, 2018 at 09:32:28PM +0100, Al Viro wrote:
> On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote:
> > On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote:
> > > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote:
> > > >  do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
> > > >  		      struct block_device *bdev, struct iov_iter *iter,
> > > >  		      get_block_t get_block, dio_iodone_t end_io,
> > > > -		      dio_submit_t submit_io, int flags)
> > > > +		      dio_submit_t submit_io, int flags, void *private)
> > > 
> > > Oh, dear...  That's what, 9 arguments?  I agree that the hack in question
> > > is obscene, but so is this ;-/
> > 
> > So looking at these one by one, obviously needed:
> > 
> > - iocb
> > - inode
> > - iter
> > 
> > bdev is almost always inode->i_sb->s_bdev, except for Btrfs :(
> > 
> > These could _maybe_ go in struct kiocb:
> > 
> > - flags could maybe be folded into ki_flags
> > - private could maybe go in iocb->private, but I haven't yet read
> >   through to figure out if we're already using iocb->private for direct
> >   I/O

Modifying kiocb isn't going to pan out, it's constructed way up in the
stack so that'd be a mess.

> > That leaves the callbacks, get_block, end_io, and submit_io. Perhaps we
> > can add those to inode_operations?
> 
> Or, perhaps, btrfs shouldn't be using the common helper?  The question
> is not where to stash the bits and pieces - it's how unreadable the callers
> are and how much boilerplate/hidden information is involved...

I need to call through to do_blockdev_direct_IO() eventually, I'm sure
no one wants me to reimplement the 200 lines in there :) so I'd be happy
to add a separate helper that only Btrfs uses, but if we're going to
call do_blockdev_direct_IO() eventually then we still need the 9
arguments in some form. Am I misunderstanding?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO()
  2018-05-11 20:30     ` Omar Sandoval
  2018-05-11 20:32       ` Al Viro
@ 2018-05-14 16:35       ` David Sterba
  2018-06-25 17:16         ` David Sterba
  1 sibling, 1 reply; 15+ messages in thread
From: David Sterba @ 2018-05-14 16:35 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: Al Viro, linux-btrfs, linux-fsdevel, kernel-team

On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote:
> On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote:
> > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote:
> > >  do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
> > >  		      struct block_device *bdev, struct iov_iter *iter,
> > >  		      get_block_t get_block, dio_iodone_t end_io,
> > > -		      dio_submit_t submit_io, int flags)
> > > +		      dio_submit_t submit_io, int flags, void *private)
> > 
> > Oh, dear...  That's what, 9 arguments?  I agree that the hack in question
> > is obscene, but so is this ;-/
> 
> So looking at these one by one, obviously needed:
> 
> - iocb
> - inode
> - iter
> 
> bdev is almost always inode->i_sb->s_bdev, except for Btrfs :(
> 
> These could _maybe_ go in struct kiocb:
> 
> - flags could maybe be folded into ki_flags
> - private could maybe go in iocb->private, but I haven't yet read
>   through to figure out if we're already using iocb->private for direct
>   I/O

I think the kiocb::private can be used for the purpose. There's only one
user, ext4, that also passes some DIO data around so it would in line
with the interface AFAICS.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO()
  2018-05-14 16:35       ` David Sterba
@ 2018-06-25 17:16         ` David Sterba
  2018-06-29  7:02           ` Omar Sandoval
  0 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2018-06-25 17:16 UTC (permalink / raw)
  To: dsterba, Omar Sandoval, Al Viro, linux-btrfs, linux-fsdevel, kernel-team

On Mon, May 14, 2018 at 06:35:48PM +0200, David Sterba wrote:
> On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote:
> > On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote:
> > > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote:
> > > >  do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
> > > >  		      struct block_device *bdev, struct iov_iter *iter,
> > > >  		      get_block_t get_block, dio_iodone_t end_io,
> > > > -		      dio_submit_t submit_io, int flags)
> > > > +		      dio_submit_t submit_io, int flags, void *private)
> > > 
> > > Oh, dear...  That's what, 9 arguments?  I agree that the hack in question
> > > is obscene, but so is this ;-/
> > 
> > So looking at these one by one, obviously needed:
> > 
> > - iocb
> > - inode
> > - iter
> > 
> > bdev is almost always inode->i_sb->s_bdev, except for Btrfs :(
> > 
> > These could _maybe_ go in struct kiocb:
> > 
> > - flags could maybe be folded into ki_flags
> > - private could maybe go in iocb->private, but I haven't yet read
> >   through to figure out if we're already using iocb->private for direct
> >   I/O
> 
> I think the kiocb::private can be used for the purpose. There's only one
> user, ext4, that also passes some DIO data around so it would in line
> with the interface AFAICS.

Omar, do you have an update to the patchset? Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO()
  2018-06-25 17:16         ` David Sterba
@ 2018-06-29  7:02           ` Omar Sandoval
  0 siblings, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2018-06-29  7:02 UTC (permalink / raw)
  To: dsterba, Al Viro, linux-btrfs, linux-fsdevel, kernel-team

On Mon, Jun 25, 2018 at 07:16:38PM +0200, David Sterba wrote:
> On Mon, May 14, 2018 at 06:35:48PM +0200, David Sterba wrote:
> > On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote:
> > > On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote:
> > > > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote:
> > > > >  do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
> > > > >  		      struct block_device *bdev, struct iov_iter *iter,
> > > > >  		      get_block_t get_block, dio_iodone_t end_io,
> > > > > -		      dio_submit_t submit_io, int flags)
> > > > > +		      dio_submit_t submit_io, int flags, void *private)
> > > > 
> > > > Oh, dear...  That's what, 9 arguments?  I agree that the hack in question
> > > > is obscene, but so is this ;-/
> > > 
> > > So looking at these one by one, obviously needed:
> > > 
> > > - iocb
> > > - inode
> > > - iter
> > > 
> > > bdev is almost always inode->i_sb->s_bdev, except for Btrfs :(
> > > 
> > > These could _maybe_ go in struct kiocb:
> > > 
> > > - flags could maybe be folded into ki_flags
> > > - private could maybe go in iocb->private, but I haven't yet read
> > >   through to figure out if we're already using iocb->private for direct
> > >   I/O
> > 
> > I think the kiocb::private can be used for the purpose. There's only one
> > user, ext4, that also passes some DIO data around so it would in line
> > with the interface AFAICS.
> 
> Omar, do you have an update to the patchset? Thanks.

Al, what do you think of changing all users of map_bh->b_private to use
iocb->private? We'd have to pass the iocb to get_block() and
submit_io(), but we could get rid of dio->private.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-06-29  7:02 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-11  6:30 [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O Omar Sandoval
2018-05-11  6:30 ` [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() Omar Sandoval
2018-05-11 20:05   ` Al Viro
2018-05-11 20:30     ` Omar Sandoval
2018-05-11 20:32       ` Al Viro
2018-05-11 20:41         ` Omar Sandoval
2018-05-14 16:35       ` David Sterba
2018-06-25 17:16         ` David Sterba
2018-06-29  7:02           ` Omar Sandoval
2018-05-11  6:30 ` [PATCH 2/3] fs: add private argument to dio_submit_t Omar Sandoval
2018-05-11  6:30 ` [PATCH 3/3] Btrfs: stop abusing current->journal_info in btrfs_direct_IO() Omar Sandoval
2018-05-11  9:26 ` [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O David Sterba
2018-05-11  9:53 ` Nikolay Borisov
2018-05-11 10:24   ` David Sterba
2018-05-11 17:28   ` Omar Sandoval

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).