* [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O @ 2018-05-11 6:30 Omar Sandoval 2018-05-11 6:30 ` [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() Omar Sandoval ` (4 more replies) 0 siblings, 5 replies; 15+ messages in thread From: Omar Sandoval @ 2018-05-11 6:30 UTC (permalink / raw) To: linux-btrfs, linux-fsdevel; +Cc: kernel-team From: Omar Sandoval <osandov@fb.com> Hi, everyone, Btrfs currently abuses current->journal_info in btrfs_direct_IO() in order to pass around some state to get_block() and submit_io(). This hack is ugly and unnecessary because the data we pass around is only used in one call frame. Robbie Ko also pointed out [1] that it could potentially cause a crash if we happen to end up in start_transaction() (e.g., from memory reclaim calling into btrfs_evict_inode(), which can start a transaction). I'm not convinced that Robbie's case can happen in practice since we are using GFP_NOFS for allocations during direct I/O, but either way it's fragile and nasty. This series stops using current->journal_info and instead adds some extra arguments to the generic direct I/O code so that we can pass things around like sane people. Based on Linus' master. Thanks! 1: https://patchwork.kernel.org/patch/10389077/ Omar Sandoval (3): fs: add initial bh_result->b_private value to __blockdev_direct_IO() fs: add private argument to dio_submit_t Btrfs: stop abusing current->journal_info in btrfs_direct_IO() fs/btrfs/inode.c | 39 ++++++++++----------------------------- fs/direct-io.c | 12 +++++++----- fs/ext4/inode.c | 6 +++--- fs/gfs2/aops.c | 2 +- fs/ocfs2/aops.c | 5 ++--- include/linux/fs.h | 10 +++++----- 6 files changed, 28 insertions(+), 46 deletions(-) -- 2.17.0 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() 2018-05-11 6:30 [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O Omar Sandoval @ 2018-05-11 6:30 ` Omar Sandoval 2018-05-11 20:05 ` Al Viro 2018-05-11 6:30 ` [PATCH 2/3] fs: add private argument to dio_submit_t Omar Sandoval ` (3 subsequent siblings) 4 siblings, 1 reply; 15+ messages in thread From: Omar Sandoval @ 2018-05-11 6:30 UTC (permalink / raw) To: linux-btrfs, linux-fsdevel; +Cc: kernel-team From: Omar Sandoval <osandov@fb.com> Btrfs abuses current->journal_info in btrfs_direct_IO() in order to pass around some state to get_block() and submit_io(). The generic DIO code already provides bh_result->b_private as a way to pass data between calls to get_block() and end_io(), but it is always initialized to NULL. Let's allow an initial value to be passed in from the caller of __blockdev_direct_IO(). Signed-off-by: Omar Sandoval <osandov@fb.com> --- fs/btrfs/inode.c | 6 +++--- fs/direct-io.c | 9 +++++---- fs/ext4/inode.c | 6 +++--- fs/gfs2/aops.c | 2 +- fs/ocfs2/aops.c | 5 ++--- include/linux/fs.h | 8 ++++---- 6 files changed, 18 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d241285a0d2a..d8b0a7eacd19 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8618,9 +8618,9 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) } ret = __blockdev_direct_IO(iocb, inode, - fs_info->fs_devices->latest_bdev, - iter, btrfs_get_blocks_direct, NULL, - btrfs_submit_direct, flags); + fs_info->fs_devices->latest_bdev, iter, + btrfs_get_blocks_direct, NULL, + btrfs_submit_direct, flags, NULL); if (iov_iter_rw(iter) == WRITE) { up_read(&BTRFS_I(inode)->dio_sem); current->journal_info = NULL; diff --git a/fs/direct-io.c b/fs/direct-io.c index 874607bb6e02..9bb33d9c206c 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -1171,7 +1171,7 @@ static inline ssize_t do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, struct block_device *bdev, struct iov_iter *iter, get_block_t get_block, dio_iodone_t end_io, - dio_submit_t submit_io, int flags) + dio_submit_t submit_io, int flags, void *private) { unsigned i_blkbits = READ_ONCE(inode->i_blkbits); unsigned blkbits = i_blkbits; @@ -1182,7 +1182,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, const loff_t end = offset + count; struct dio *dio; struct dio_submit sdio = { 0, }; - struct buffer_head map_bh = { 0, }; + struct buffer_head map_bh = { .b_private = private, }; struct blk_plug plug; unsigned long align = offset | iov_iter_alignment(iter); @@ -1304,6 +1304,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, sdio.get_block = get_block; dio->end_io = end_io; + dio->private = map_bh.b_private; sdio.submit_io = submit_io; sdio.final_block_in_bio = -1; sdio.next_block_for_io = -1; @@ -1400,7 +1401,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, struct block_device *bdev, struct iov_iter *iter, get_block_t get_block, dio_iodone_t end_io, dio_submit_t submit_io, - int flags) + int flags, void *private) { /* * The block device state is needed in the end to finally @@ -1415,7 +1416,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, prefetch((char *)bdev->bd_queue + SMP_CACHE_BYTES); return do_blockdev_direct_IO(iocb, inode, bdev, iter, get_block, - end_io, submit_io, flags); + end_io, submit_io, flags, private); } EXPORT_SYMBOL(__blockdev_direct_IO); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 1e50c5efae67..e453e7529a3e 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3739,7 +3739,7 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter) } ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter, get_block_func, ext4_end_io_dio, NULL, - dio_flags); + dio_flags, NULL); if (ret > 0 && !overwrite && ext4_test_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN)) { @@ -3830,8 +3830,8 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter) iocb->ki_pos + count - 1); if (ret) goto out_unlock; - ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, - iter, ext4_dio_get_block, NULL, NULL, 0); + ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter, + ext4_dio_get_block, NULL, NULL, 0, NULL); out_unlock: inode_unlock_shared(inode); return ret; diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c index f58716567972..d42e68543baa 100644 --- a/fs/gfs2/aops.c +++ b/fs/gfs2/aops.c @@ -1113,7 +1113,7 @@ static ssize_t gfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter) } rv = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter, - gfs2_get_block_direct, NULL, NULL, 0); + gfs2_get_block_direct, NULL, NULL, 0, NULL); out: gfs2_glock_dq(&gh); out_uninit: diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 302cd7caa4a7..58b74d9da931 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -2446,9 +2446,8 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter) else get_block = ocfs2_dio_wr_get_block; - return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, - iter, get_block, - ocfs2_dio_end_io, NULL, 0); + return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter, + get_block, ocfs2_dio_end_io, NULL, 0, NULL); } const struct address_space_operations ocfs2_aops = { diff --git a/include/linux/fs.h b/include/linux/fs.h index 760d8da1b6c7..b9ef2fb1224b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2993,9 +2993,8 @@ void dio_warn_stale_pagecache(struct file *filp); ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, struct block_device *bdev, struct iov_iter *iter, - get_block_t get_block, - dio_iodone_t end_io, dio_submit_t submit_io, - int flags); + get_block_t get_block, dio_iodone_t end_io, + dio_submit_t submit_io, int flags, void *private); static inline ssize_t blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, @@ -3003,7 +3002,8 @@ static inline ssize_t blockdev_direct_IO(struct kiocb *iocb, get_block_t get_block) { return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter, - get_block, NULL, NULL, DIO_LOCKING | DIO_SKIP_HOLES); + get_block, NULL, NULL, + DIO_LOCKING | DIO_SKIP_HOLES, NULL); } #endif -- 2.17.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() 2018-05-11 6:30 ` [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() Omar Sandoval @ 2018-05-11 20:05 ` Al Viro 2018-05-11 20:30 ` Omar Sandoval 0 siblings, 1 reply; 15+ messages in thread From: Al Viro @ 2018-05-11 20:05 UTC (permalink / raw) To: Omar Sandoval; +Cc: linux-btrfs, linux-fsdevel, kernel-team On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote: > do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, > struct block_device *bdev, struct iov_iter *iter, > get_block_t get_block, dio_iodone_t end_io, > - dio_submit_t submit_io, int flags) > + dio_submit_t submit_io, int flags, void *private) Oh, dear... That's what, 9 arguments? I agree that the hack in question is obscene, but so is this ;-/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() 2018-05-11 20:05 ` Al Viro @ 2018-05-11 20:30 ` Omar Sandoval 2018-05-11 20:32 ` Al Viro 2018-05-14 16:35 ` David Sterba 0 siblings, 2 replies; 15+ messages in thread From: Omar Sandoval @ 2018-05-11 20:30 UTC (permalink / raw) To: Al Viro; +Cc: linux-btrfs, linux-fsdevel, kernel-team On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote: > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote: > > do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, > > struct block_device *bdev, struct iov_iter *iter, > > get_block_t get_block, dio_iodone_t end_io, > > - dio_submit_t submit_io, int flags) > > + dio_submit_t submit_io, int flags, void *private) > > Oh, dear... That's what, 9 arguments? I agree that the hack in question > is obscene, but so is this ;-/ So looking at these one by one, obviously needed: - iocb - inode - iter bdev is almost always inode->i_sb->s_bdev, except for Btrfs :( These could _maybe_ go in struct kiocb: - flags could maybe be folded into ki_flags - private could maybe go in iocb->private, but I haven't yet read through to figure out if we're already using iocb->private for direct I/O That leaves the callbacks, get_block, end_io, and submit_io. Perhaps we can add those to inode_operations? Thoughts? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() 2018-05-11 20:30 ` Omar Sandoval @ 2018-05-11 20:32 ` Al Viro 2018-05-11 20:41 ` Omar Sandoval 2018-05-14 16:35 ` David Sterba 1 sibling, 1 reply; 15+ messages in thread From: Al Viro @ 2018-05-11 20:32 UTC (permalink / raw) To: Omar Sandoval; +Cc: linux-btrfs, linux-fsdevel, kernel-team On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote: > On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote: > > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote: > > > do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, > > > struct block_device *bdev, struct iov_iter *iter, > > > get_block_t get_block, dio_iodone_t end_io, > > > - dio_submit_t submit_io, int flags) > > > + dio_submit_t submit_io, int flags, void *private) > > > > Oh, dear... That's what, 9 arguments? I agree that the hack in question > > is obscene, but so is this ;-/ > > So looking at these one by one, obviously needed: > > - iocb > - inode > - iter > > bdev is almost always inode->i_sb->s_bdev, except for Btrfs :( > > These could _maybe_ go in struct kiocb: > > - flags could maybe be folded into ki_flags > - private could maybe go in iocb->private, but I haven't yet read > through to figure out if we're already using iocb->private for direct > I/O > > That leaves the callbacks, get_block, end_io, and submit_io. Perhaps we > can add those to inode_operations? Or, perhaps, btrfs shouldn't be using the common helper? The question is not where to stash the bits and pieces - it's how unreadable the callers are and how much boilerplate/hidden information is involved... ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() 2018-05-11 20:32 ` Al Viro @ 2018-05-11 20:41 ` Omar Sandoval 0 siblings, 0 replies; 15+ messages in thread From: Omar Sandoval @ 2018-05-11 20:41 UTC (permalink / raw) To: Al Viro; +Cc: linux-btrfs, linux-fsdevel, kernel-team On Fri, May 11, 2018 at 09:32:28PM +0100, Al Viro wrote: > On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote: > > On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote: > > > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote: > > > > do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, > > > > struct block_device *bdev, struct iov_iter *iter, > > > > get_block_t get_block, dio_iodone_t end_io, > > > > - dio_submit_t submit_io, int flags) > > > > + dio_submit_t submit_io, int flags, void *private) > > > > > > Oh, dear... That's what, 9 arguments? I agree that the hack in question > > > is obscene, but so is this ;-/ > > > > So looking at these one by one, obviously needed: > > > > - iocb > > - inode > > - iter > > > > bdev is almost always inode->i_sb->s_bdev, except for Btrfs :( > > > > These could _maybe_ go in struct kiocb: > > > > - flags could maybe be folded into ki_flags > > - private could maybe go in iocb->private, but I haven't yet read > > through to figure out if we're already using iocb->private for direct > > I/O Modifying kiocb isn't going to pan out, it's constructed way up in the stack so that'd be a mess. > > That leaves the callbacks, get_block, end_io, and submit_io. Perhaps we > > can add those to inode_operations? > > Or, perhaps, btrfs shouldn't be using the common helper? The question > is not where to stash the bits and pieces - it's how unreadable the callers > are and how much boilerplate/hidden information is involved... I need to call through to do_blockdev_direct_IO() eventually, I'm sure no one wants me to reimplement the 200 lines in there :) so I'd be happy to add a separate helper that only Btrfs uses, but if we're going to call do_blockdev_direct_IO() eventually then we still need the 9 arguments in some form. Am I misunderstanding? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() 2018-05-11 20:30 ` Omar Sandoval 2018-05-11 20:32 ` Al Viro @ 2018-05-14 16:35 ` David Sterba 2018-06-25 17:16 ` David Sterba 1 sibling, 1 reply; 15+ messages in thread From: David Sterba @ 2018-05-14 16:35 UTC (permalink / raw) To: Omar Sandoval; +Cc: Al Viro, linux-btrfs, linux-fsdevel, kernel-team On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote: > On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote: > > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote: > > > do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, > > > struct block_device *bdev, struct iov_iter *iter, > > > get_block_t get_block, dio_iodone_t end_io, > > > - dio_submit_t submit_io, int flags) > > > + dio_submit_t submit_io, int flags, void *private) > > > > Oh, dear... That's what, 9 arguments? I agree that the hack in question > > is obscene, but so is this ;-/ > > So looking at these one by one, obviously needed: > > - iocb > - inode > - iter > > bdev is almost always inode->i_sb->s_bdev, except for Btrfs :( > > These could _maybe_ go in struct kiocb: > > - flags could maybe be folded into ki_flags > - private could maybe go in iocb->private, but I haven't yet read > through to figure out if we're already using iocb->private for direct > I/O I think the kiocb::private can be used for the purpose. There's only one user, ext4, that also passes some DIO data around so it would in line with the interface AFAICS. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() 2018-05-14 16:35 ` David Sterba @ 2018-06-25 17:16 ` David Sterba 2018-06-29 7:02 ` Omar Sandoval 0 siblings, 1 reply; 15+ messages in thread From: David Sterba @ 2018-06-25 17:16 UTC (permalink / raw) To: dsterba, Omar Sandoval, Al Viro, linux-btrfs, linux-fsdevel, kernel-team On Mon, May 14, 2018 at 06:35:48PM +0200, David Sterba wrote: > On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote: > > On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote: > > > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote: > > > > do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, > > > > struct block_device *bdev, struct iov_iter *iter, > > > > get_block_t get_block, dio_iodone_t end_io, > > > > - dio_submit_t submit_io, int flags) > > > > + dio_submit_t submit_io, int flags, void *private) > > > > > > Oh, dear... That's what, 9 arguments? I agree that the hack in question > > > is obscene, but so is this ;-/ > > > > So looking at these one by one, obviously needed: > > > > - iocb > > - inode > > - iter > > > > bdev is almost always inode->i_sb->s_bdev, except for Btrfs :( > > > > These could _maybe_ go in struct kiocb: > > > > - flags could maybe be folded into ki_flags > > - private could maybe go in iocb->private, but I haven't yet read > > through to figure out if we're already using iocb->private for direct > > I/O > > I think the kiocb::private can be used for the purpose. There's only one > user, ext4, that also passes some DIO data around so it would in line > with the interface AFAICS. Omar, do you have an update to the patchset? Thanks. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() 2018-06-25 17:16 ` David Sterba @ 2018-06-29 7:02 ` Omar Sandoval 0 siblings, 0 replies; 15+ messages in thread From: Omar Sandoval @ 2018-06-29 7:02 UTC (permalink / raw) To: dsterba, Al Viro, linux-btrfs, linux-fsdevel, kernel-team On Mon, Jun 25, 2018 at 07:16:38PM +0200, David Sterba wrote: > On Mon, May 14, 2018 at 06:35:48PM +0200, David Sterba wrote: > > On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote: > > > On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote: > > > > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote: > > > > > do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, > > > > > struct block_device *bdev, struct iov_iter *iter, > > > > > get_block_t get_block, dio_iodone_t end_io, > > > > > - dio_submit_t submit_io, int flags) > > > > > + dio_submit_t submit_io, int flags, void *private) > > > > > > > > Oh, dear... That's what, 9 arguments? I agree that the hack in question > > > > is obscene, but so is this ;-/ > > > > > > So looking at these one by one, obviously needed: > > > > > > - iocb > > > - inode > > > - iter > > > > > > bdev is almost always inode->i_sb->s_bdev, except for Btrfs :( > > > > > > These could _maybe_ go in struct kiocb: > > > > > > - flags could maybe be folded into ki_flags > > > - private could maybe go in iocb->private, but I haven't yet read > > > through to figure out if we're already using iocb->private for direct > > > I/O > > > > I think the kiocb::private can be used for the purpose. There's only one > > user, ext4, that also passes some DIO data around so it would in line > > with the interface AFAICS. > > Omar, do you have an update to the patchset? Thanks. Al, what do you think of changing all users of map_bh->b_private to use iocb->private? We'd have to pass the iocb to get_block() and submit_io(), but we could get rid of dio->private. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/3] fs: add private argument to dio_submit_t 2018-05-11 6:30 [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O Omar Sandoval 2018-05-11 6:30 ` [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() Omar Sandoval @ 2018-05-11 6:30 ` Omar Sandoval 2018-05-11 6:30 ` [PATCH 3/3] Btrfs: stop abusing current->journal_info in btrfs_direct_IO() Omar Sandoval ` (2 subsequent siblings) 4 siblings, 0 replies; 15+ messages in thread From: Omar Sandoval @ 2018-05-11 6:30 UTC (permalink / raw) To: linux-btrfs, linux-fsdevel; +Cc: kernel-team From: Omar Sandoval <osandov@fb.com> get_block() already has access to this through bh_result->b_private, and it gets passed to end_io() as private, so do the same for submit_io(). Along with the previous change, this will allow us to get rid Btrfs' current->journal_info abuse in btrfs_direct_IO(). Signed-off-by: Omar Sandoval <osandov@fb.com> --- fs/btrfs/inode.c | 2 +- fs/direct-io.c | 3 ++- include/linux/fs.h | 2 +- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d8b0a7eacd19..46e3d366df8b 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8418,7 +8418,7 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip) } static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, - loff_t file_offset) + loff_t file_offset, void *private) { struct btrfs_dio_private *dip = NULL; struct bio *bio = NULL; diff --git a/fs/direct-io.c b/fs/direct-io.c index 9bb33d9c206c..3f3aa2d95ca1 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -475,7 +475,8 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio) dio->bio_disk = bio->bi_disk; if (sdio->submit_io) { - sdio->submit_io(bio, dio->inode, sdio->logical_offset_in_bio); + sdio->submit_io(bio, dio->inode, sdio->logical_offset_in_bio, + dio->private); dio->bio_cookie = BLK_QC_T_NONE; } else dio->bio_cookie = submit_bio(bio); diff --git a/include/linux/fs.h b/include/linux/fs.h index b9ef2fb1224b..9e5e6ea1c144 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2978,7 +2978,7 @@ extern int nonseekable_open(struct inode * inode, struct file * filp); #ifdef CONFIG_BLOCK typedef void (dio_submit_t)(struct bio *bio, struct inode *inode, - loff_t file_offset); + loff_t file_offset, void *private); enum { /* need locking between buffered and direct access */ -- 2.17.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/3] Btrfs: stop abusing current->journal_info in btrfs_direct_IO() 2018-05-11 6:30 [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O Omar Sandoval 2018-05-11 6:30 ` [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() Omar Sandoval 2018-05-11 6:30 ` [PATCH 2/3] fs: add private argument to dio_submit_t Omar Sandoval @ 2018-05-11 6:30 ` Omar Sandoval 2018-05-11 9:26 ` [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O David Sterba 2018-05-11 9:53 ` Nikolay Borisov 4 siblings, 0 replies; 15+ messages in thread From: Omar Sandoval @ 2018-05-11 6:30 UTC (permalink / raw) To: linux-btrfs, linux-fsdevel; +Cc: kernel-team From: Omar Sandoval <osandov@fb.com> Now that we can pass around the struct btrfs_dio_data through the different callbacks generically, we don't need to shove it in current->journal_info. Signed-off-by: Omar Sandoval <osandov@fb.com> --- fs/btrfs/inode.c | 33 +++++++-------------------------- 1 file changed, 7 insertions(+), 26 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 46e3d366df8b..3673c0420509 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7574,7 +7574,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct extent_map *em; struct extent_state *cached_state = NULL; - struct btrfs_dio_data *dio_data = NULL; + struct btrfs_dio_data *dio_data = bh_result->b_private; u64 start = iblock << inode->i_blkbits; u64 lockstart, lockend; u64 len = bh_result->b_size; @@ -7589,25 +7589,13 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, lockstart = start; lockend = start + len - 1; - if (current->journal_info) { - /* - * Need to pull our outstanding extents and set journal_info to NULL so - * that anything that needs to check if there's a transaction doesn't get - * confused. - */ - dio_data = current->journal_info; - current->journal_info = NULL; - } - /* * If this errors out it's because we couldn't invalidate pagecache for * this range and we need to fallback to buffered. */ if (lock_extent_direct(inode, lockstart, lockend, &cached_state, - create)) { - ret = -ENOTBLK; - goto err; - } + create)) + return -ENOTBLK; em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, len, 0); if (IS_ERR(em)) { @@ -7732,7 +7720,6 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, WARN_ON(dio_data->reserve < len); dio_data->reserve -= len; dio_data->unsubmitted_oe_range_end = start + len; - current->journal_info = dio_data; } /* @@ -7755,9 +7742,6 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, unlock_err: clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, lockend, unlock_bits, 1, 0, &cached_state); -err: - if (dio_data) - current->journal_info = dio_data; return ret; } @@ -8460,7 +8444,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, * time by btrfs_direct_IO(). */ if (write) { - struct btrfs_dio_data *dio_data = current->journal_info; + struct btrfs_dio_data *dio_data = private; dio_data->unsubmitted_oe_range_end = dip->logical_offset + dip->bytes; @@ -8602,13 +8586,11 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) /* * We need to know how many extents we reserved so that we can * do the accounting properly if we go over the number we - * originally calculated. Abuse current->journal_info for this. + * originally calculated. */ - dio_data.reserve = round_up(count, - fs_info->sectorsize); + dio_data.reserve = round_up(count, fs_info->sectorsize); dio_data.unsubmitted_oe_range_start = (u64)offset; dio_data.unsubmitted_oe_range_end = (u64)offset; - current->journal_info = &dio_data; down_read(&BTRFS_I(inode)->dio_sem); } else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK, &BTRFS_I(inode)->runtime_flags)) { @@ -8620,10 +8602,9 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) ret = __blockdev_direct_IO(iocb, inode, fs_info->fs_devices->latest_bdev, iter, btrfs_get_blocks_direct, NULL, - btrfs_submit_direct, flags, NULL); + btrfs_submit_direct, flags, &dio_data); if (iov_iter_rw(iter) == WRITE) { up_read(&BTRFS_I(inode)->dio_sem); - current->journal_info = NULL; if (ret < 0 && ret != -EIOCBQUEUED) { if (dio_data.reserve) btrfs_delalloc_release_space(inode, data_reserved, -- 2.17.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O 2018-05-11 6:30 [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O Omar Sandoval ` (2 preceding siblings ...) 2018-05-11 6:30 ` [PATCH 3/3] Btrfs: stop abusing current->journal_info in btrfs_direct_IO() Omar Sandoval @ 2018-05-11 9:26 ` David Sterba 2018-05-11 9:53 ` Nikolay Borisov 4 siblings, 0 replies; 15+ messages in thread From: David Sterba @ 2018-05-11 9:26 UTC (permalink / raw) To: Omar Sandoval; +Cc: linux-btrfs, linux-fsdevel, kernel-team On Thu, May 10, 2018 at 11:30:09PM -0700, Omar Sandoval wrote: > From: Omar Sandoval <osandov@fb.com> > > Hi, everyone, > > Btrfs currently abuses current->journal_info in btrfs_direct_IO() in > order to pass around some state to get_block() and submit_io(). This > hack is ugly and unnecessary because the data we pass around is only > used in one call frame. I'd very much like to get rid of the journal_info hack. The changes to ther filesystems are minimal. The 3 patches look good to me, you can add my reviewed-by for btrfs and ack for the rest. I'm going to do a fstests round too. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O 2018-05-11 6:30 [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O Omar Sandoval ` (3 preceding siblings ...) 2018-05-11 9:26 ` [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O David Sterba @ 2018-05-11 9:53 ` Nikolay Borisov 2018-05-11 10:24 ` David Sterba 2018-05-11 17:28 ` Omar Sandoval 4 siblings, 2 replies; 15+ messages in thread From: Nikolay Borisov @ 2018-05-11 9:53 UTC (permalink / raw) To: Omar Sandoval, linux-btrfs, linux-fsdevel; +Cc: kernel-team On 11.05.2018 09:30, Omar Sandoval wrote: > From: Omar Sandoval <osandov@fb.com> > > Hi, everyone, > > Btrfs currently abuses current->journal_info in btrfs_direct_IO() in > order to pass around some state to get_block() and submit_io(). This > hack is ugly and unnecessary because the data we pass around is only > used in one call frame. Robbie Ko also pointed out [1] that it could > potentially cause a crash if we happen to end up in start_transaction() > (e.g., from memory reclaim calling into btrfs_evict_inode(), which can > start a transaction). I'm not convinced that Robbie's case can happen in > practice since we are using GFP_NOFS for allocations during direct I/O, > but either way it's fragile and nasty. When I worked initially on btrfs-over-swap I managed to hit a case where ext4 stacked on top of btrfs would crash since btrfs will overwrite journal_info which was set by ext4. So this change is indeed welcome :) > > This series stops using current->journal_info and instead adds some > extra arguments to the generic direct I/O code so that we can pass > things around like sane people. > > Based on Linus' master. > > Thanks! > > 1: https://patchwork.kernel.org/patch/10389077/ > > Omar Sandoval (3): > fs: add initial bh_result->b_private value to __blockdev_direct_IO() > fs: add private argument to dio_submit_t > Btrfs: stop abusing current->journal_info in btrfs_direct_IO() > > fs/btrfs/inode.c | 39 ++++++++++----------------------------- > fs/direct-io.c | 12 +++++++----- > fs/ext4/inode.c | 6 +++--- > fs/gfs2/aops.c | 2 +- > fs/ocfs2/aops.c | 5 ++--- > include/linux/fs.h | 10 +++++----- > 6 files changed, 28 insertions(+), 46 deletions(-) > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O 2018-05-11 9:53 ` Nikolay Borisov @ 2018-05-11 10:24 ` David Sterba 2018-05-11 17:28 ` Omar Sandoval 1 sibling, 0 replies; 15+ messages in thread From: David Sterba @ 2018-05-11 10:24 UTC (permalink / raw) To: Nikolay Borisov; +Cc: Omar Sandoval, linux-btrfs, linux-fsdevel, kernel-team On Fri, May 11, 2018 at 12:53:36PM +0300, Nikolay Borisov wrote: > > > On 11.05.2018 09:30, Omar Sandoval wrote: > > From: Omar Sandoval <osandov@fb.com> > > > > Hi, everyone, > > > > Btrfs currently abuses current->journal_info in btrfs_direct_IO() in > > order to pass around some state to get_block() and submit_io(). This > > hack is ugly and unnecessary because the data we pass around is only > > used in one call frame. Robbie Ko also pointed out [1] that it could > > potentially cause a crash if we happen to end up in start_transaction() > > (e.g., from memory reclaim calling into btrfs_evict_inode(), which can > > start a transaction). I'm not convinced that Robbie's case can happen in > > practice since we are using GFP_NOFS for allocations during direct I/O, > > but either way it's fragile and nasty. > > When I worked initially on btrfs-over-swap I managed to hit a case where > ext4 stacked on top of btrfs would crash since btrfs will overwrite > journal_info which was set by ext4. So this change is indeed welcome :) And also this, https://lkml.org/lkml/2017/12/14/165. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O 2018-05-11 9:53 ` Nikolay Borisov 2018-05-11 10:24 ` David Sterba @ 2018-05-11 17:28 ` Omar Sandoval 1 sibling, 0 replies; 15+ messages in thread From: Omar Sandoval @ 2018-05-11 17:28 UTC (permalink / raw) To: Nikolay Borisov; +Cc: linux-btrfs, linux-fsdevel, kernel-team On Fri, May 11, 2018 at 12:53:36PM +0300, Nikolay Borisov wrote: > > > On 11.05.2018 09:30, Omar Sandoval wrote: > > From: Omar Sandoval <osandov@fb.com> > > > > Hi, everyone, > > > > Btrfs currently abuses current->journal_info in btrfs_direct_IO() in > > order to pass around some state to get_block() and submit_io(). This > > hack is ugly and unnecessary because the data we pass around is only > > used in one call frame. Robbie Ko also pointed out [1] that it could > > potentially cause a crash if we happen to end up in start_transaction() > > (e.g., from memory reclaim calling into btrfs_evict_inode(), which can > > start a transaction). I'm not convinced that Robbie's case can happen in > > practice since we are using GFP_NOFS for allocations during direct I/O, > > but either way it's fragile and nasty. > > When I worked initially on btrfs-over-swap I managed to hit a case where > ext4 stacked on top of btrfs would crash since btrfs will overwrite > journal_info which was set by ext4. So this change is indeed welcome :) Yup, that's what I originally made these patches for. Although my latest idea for swap is to do something along the lines of Darrick's iomap_swap_activate(): https://patchwork.kernel.org/patch/10376435/, I'll be getting back to that soon. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2018-06-29 7:02 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-05-11 6:30 [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O Omar Sandoval 2018-05-11 6:30 ` [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO() Omar Sandoval 2018-05-11 20:05 ` Al Viro 2018-05-11 20:30 ` Omar Sandoval 2018-05-11 20:32 ` Al Viro 2018-05-11 20:41 ` Omar Sandoval 2018-05-14 16:35 ` David Sterba 2018-06-25 17:16 ` David Sterba 2018-06-29 7:02 ` Omar Sandoval 2018-05-11 6:30 ` [PATCH 2/3] fs: add private argument to dio_submit_t Omar Sandoval 2018-05-11 6:30 ` [PATCH 3/3] Btrfs: stop abusing current->journal_info in btrfs_direct_IO() Omar Sandoval 2018-05-11 9:26 ` [PATCH 0/3] Btrfs: stop abusing current->journal_info for direct I/O David Sterba 2018-05-11 9:53 ` Nikolay Borisov 2018-05-11 10:24 ` David Sterba 2018-05-11 17:28 ` Omar Sandoval
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).