* [PATCHSET 0/3] Submit ->readpages() IO as read-ahead
@ 2018-05-24 16:02 Jens Axboe
  2018-05-24 16:02 ` [PATCH 1/3] mpage: mpage_readpages() should submit " Jens Axboe
  ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Jens Axboe @ 2018-05-24 16:02 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, akpm

The only caller of ->readpages() is from read-ahead, yet we don't
submit IO flagged with REQ_RAHEAD. This means we don't see it in
blktrace, for instance, which is a shame. We already make assumptions
about ->readpages() just being for read-ahead in the mpage
implementation, using readahead_gfp_mask(mapping) as our GFP mask of
choice.

This small series fixes up mpage_readpages() to submit with REQ_RAHEAD,
which takes care of file systems using mpage_readpages(). The last two
fix up ext4 and btrfs.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread
* [PATCH 1/3] mpage: mpage_readpages() should submit IO as read-ahead 2018-05-24 16:02 [PATCHSET 0/3] Submit ->readpages() IO as read-ahead Jens Axboe @ 2018-05-24 16:02 ` Jens Axboe 2018-05-24 19:43 ` Andrew Morton 2018-05-24 16:02 ` [PATCH 2/3] btrfs: readpages() " Jens Axboe ` (2 subsequent siblings) 3 siblings, 1 reply; 16+ messages in thread From: Jens Axboe @ 2018-05-24 16:02 UTC (permalink / raw) To: linux-fsdevel; +Cc: viro, akpm, Jens Axboe a_ops->readpages() is only ever used for read-ahead, yet we don't flag the IO being submitted as such. Fix that up. Any file system that uses mpage_readpages() as it's ->readpages() implementation will now get this right. Signed-off-by: Jens Axboe <axboe@kernel.dk> --- fs/mpage.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/fs/mpage.c b/fs/mpage.c index b7e7f570733a..0a5474237f5e 100644 --- a/fs/mpage.c +++ b/fs/mpage.c @@ -146,7 +146,7 @@ static struct bio * do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, sector_t *last_block_in_bio, struct buffer_head *map_bh, unsigned long *first_logical_block, get_block_t get_block, - gfp_t gfp) + gfp_t gfp, bool is_readahead) { struct inode *inode = page->mapping->host; const unsigned blkbits = inode->i_blkbits; @@ -161,6 +161,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, struct block_device *bdev = NULL; int length; int fully_mapped = 1; + int op_flags = is_readahead ? REQ_RAHEAD : 0; unsigned nblocks; unsigned relative_block; @@ -274,7 +275,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, * This page will go to BIO. Do we need to send this BIO off first? 
*/ if (bio && (*last_block_in_bio != blocks[0] - 1)) - bio = mpage_bio_submit(REQ_OP_READ, 0, bio); + bio = mpage_bio_submit(REQ_OP_READ, op_flags, bio); alloc_new: if (bio == NULL) { @@ -291,7 +292,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, length = first_hole << blkbits; if (bio_add_page(bio, page, length, 0) < length) { - bio = mpage_bio_submit(REQ_OP_READ, 0, bio); + bio = mpage_bio_submit(REQ_OP_READ, op_flags, bio); goto alloc_new; } @@ -299,7 +300,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, nblocks = map_bh->b_size >> blkbits; if ((buffer_boundary(map_bh) && relative_block == nblocks) || (first_hole != blocks_per_page)) - bio = mpage_bio_submit(REQ_OP_READ, 0, bio); + bio = mpage_bio_submit(REQ_OP_READ, op_flags, bio); else *last_block_in_bio = blocks[blocks_per_page - 1]; out: @@ -307,7 +308,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, confused: if (bio) - bio = mpage_bio_submit(REQ_OP_READ, 0, bio); + bio = mpage_bio_submit(REQ_OP_READ, op_flags, bio); if (!PageUptodate(page)) block_read_full_page(page, get_block); else @@ -384,13 +385,13 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages, nr_pages - page_idx, &last_block_in_bio, &map_bh, &first_logical_block, - get_block, gfp); + get_block, gfp, true); } put_page(page); } BUG_ON(!list_empty(pages)); if (bio) - mpage_bio_submit(REQ_OP_READ, 0, bio); + mpage_bio_submit(REQ_OP_READ, REQ_RAHEAD, bio); return 0; } EXPORT_SYMBOL(mpage_readpages); @@ -409,7 +410,7 @@ int mpage_readpage(struct page *page, get_block_t get_block) map_bh.b_state = 0; map_bh.b_size = 0; bio = do_mpage_readpage(bio, page, 1, &last_block_in_bio, - &map_bh, &first_logical_block, get_block, gfp); + &map_bh, &first_logical_block, get_block, gfp, false); if (bio) mpage_bio_submit(REQ_OP_READ, 0, bio); return 0; -- 2.7.4 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] mpage: mpage_readpages() should submit IO as read-ahead 2018-05-24 16:02 ` [PATCH 1/3] mpage: mpage_readpages() should submit " Jens Axboe @ 2018-05-24 19:43 ` Andrew Morton 2018-05-24 19:57 ` Matthew Wilcox 2018-05-29 14:36 ` Jens Axboe 0 siblings, 2 replies; 16+ messages in thread From: Andrew Morton @ 2018-05-24 19:43 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-fsdevel, viro On Thu, 24 May 2018 10:02:52 -0600 Jens Axboe <axboe@kernel.dk> wrote: > a_ops->readpages() is only ever used for read-ahead, yet we don't > flag the IO being submitted as such. Fix that up. Any file system > that uses mpage_readpages() as it's ->readpages() implementation > will now get this right. > > Signed-off-by: Jens Axboe <axboe@kernel.dk> > --- > fs/mpage.c | 17 +++++++++-------- > 1 file changed, 9 insertions(+), 8 deletions(-) > > diff --git a/fs/mpage.c b/fs/mpage.c > index b7e7f570733a..0a5474237f5e 100644 > --- a/fs/mpage.c > +++ b/fs/mpage.c > @@ -146,7 +146,7 @@ static struct bio * > do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, > sector_t *last_block_in_bio, struct buffer_head *map_bh, > unsigned long *first_logical_block, get_block_t get_block, > - gfp_t gfp) > + gfp_t gfp, bool is_readahead) That's a lot of arguments. I suspect we'll have a faster kernel if we mark this __always_inline. I think my ancient "This isn't called much at all" over mpage_readpage() remains true. Almost all callers come in via mpage_readpages(), which would benefit from the inlining. But mpage.o gets 1.5k fatter. hm. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] mpage: mpage_readpages() should submit IO as read-ahead
  2018-05-24 19:43 ` Andrew Morton
@ 2018-05-24 19:57 ` Matthew Wilcox
  2018-05-24 20:24   ` Jens Axboe
  2018-05-29 14:36 ` Jens Axboe
  1 sibling, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2018-05-24 19:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jens Axboe, linux-fsdevel, viro

On Thu, May 24, 2018 at 12:43:06PM -0700, Andrew Morton wrote:
> On Thu, 24 May 2018 10:02:52 -0600 Jens Axboe <axboe@kernel.dk> wrote:
> > do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
> > 		sector_t *last_block_in_bio, struct buffer_head *map_bh,
> > 		unsigned long *first_logical_block, get_block_t get_block,
> > -		gfp_t gfp)
> > +		gfp_t gfp, bool is_readahead)
> 
> That's a lot of arguments.

	struct mpage_args args = {
		.bio = NULL,
		.first_logical_block = 0,
		.last_block_in_bio = 0,
		.is_readahead = true,
		.map_bh = {
			.b_state = 0,
			.b_size = 0,
		},
		.get_block = get_block,
		.gfp = readahead_gfp_mask(mapping),
	};

	...
	do_mpage_readpages(&args, page, nr_pages - page_idx);

better than inlining?

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] mpage: mpage_readpages() should submit IO as read-ahead 2018-05-24 19:57 ` Matthew Wilcox @ 2018-05-24 20:24 ` Jens Axboe 0 siblings, 0 replies; 16+ messages in thread From: Jens Axboe @ 2018-05-24 20:24 UTC (permalink / raw) To: Matthew Wilcox, Andrew Morton; +Cc: linux-fsdevel, viro On 5/24/18 1:57 PM, Matthew Wilcox wrote: > On Thu, May 24, 2018 at 12:43:06PM -0700, Andrew Morton wrote: >> On Thu, 24 May 2018 10:02:52 -0600 Jens Axboe <axboe@kernel.dk> wrote: >>> do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, >>> sector_t *last_block_in_bio, struct buffer_head *map_bh, >>> unsigned long *first_logical_block, get_block_t get_block, >>> - gfp_t gfp) >>> + gfp_t gfp, bool is_readahead) >> >> That's a lot of arguments. > > struct mpage_args args = { > .bio = NULL, > .first_logical_block = 0, > .last_block_in_bio = 0, > .is_readahead = true, > .map_bh = { > .b_state = 0; > .b_size = 0; > }, > .get_block = get_block, > .gfp = readahead_gfp_mask(mapping); > }; > > ... > do_mpage_readpages(&args, page, nr_pages - page_idx); > > better than inlining? It makes it smaller, at least. diff --git a/fs/mpage.c b/fs/mpage.c index 0a5474237f5e..82682f1e5187 100644 --- a/fs/mpage.c +++ b/fs/mpage.c @@ -133,6 +133,18 @@ map_buffer_to_page(struct page *page, struct buffer_head *bh, int page_block) } while (page_bh != head); } +struct mpage_args { + struct bio *bio; + struct page *page; + unsigned nr_pages; + sector_t last_block_in_bio; + struct buffer_head map_bh; + unsigned long first_logical_block; + get_block_t *get_block; + gfp_t gfp; + bool is_readahead; +}; + /* * This is the worker routine which does all the work of mapping the disk * blocks and constructs largest possible bios, submits them for IO if the @@ -142,12 +154,9 @@ map_buffer_to_page(struct page *page, struct buffer_head *bh, int page_block) * represent the validity of its disk mapping and to decide when to do the next * get_block() call. 
*/ -static struct bio * -do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, - sector_t *last_block_in_bio, struct buffer_head *map_bh, - unsigned long *first_logical_block, get_block_t get_block, - gfp_t gfp, bool is_readahead) +static struct bio *do_mpage_readpage(struct mpage_args *args) { + struct page *page = args->page; struct inode *inode = page->mapping->host; const unsigned blkbits = inode->i_blkbits; const unsigned blocks_per_page = PAGE_SIZE >> blkbits; @@ -161,7 +170,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, struct block_device *bdev = NULL; int length; int fully_mapped = 1; - int op_flags = is_readahead ? REQ_RAHEAD : 0; + int op_flags = args->is_readahead ? REQ_RAHEAD : 0; unsigned nblocks; unsigned relative_block; @@ -169,7 +178,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, goto confused; block_in_file = (sector_t)page->index << (PAGE_SHIFT - blkbits); - last_block = block_in_file + nr_pages * blocks_per_page; + last_block = block_in_file + args->nr_pages * blocks_per_page; last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits; if (last_block > last_block_in_file) last_block = last_block_in_file; @@ -178,43 +187,43 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, /* * Map blocks using the result from the previous get_blocks call first. 
*/ - nblocks = map_bh->b_size >> blkbits; - if (buffer_mapped(map_bh) && block_in_file > *first_logical_block && - block_in_file < (*first_logical_block + nblocks)) { - unsigned map_offset = block_in_file - *first_logical_block; + nblocks = args->map_bh.b_size >> blkbits; + if (buffer_mapped(&args->map_bh) && block_in_file > args->first_logical_block && + block_in_file < (args->first_logical_block + nblocks)) { + unsigned map_offset = block_in_file - args->first_logical_block; unsigned last = nblocks - map_offset; for (relative_block = 0; ; relative_block++) { if (relative_block == last) { - clear_buffer_mapped(map_bh); + clear_buffer_mapped(&args->map_bh); break; } if (page_block == blocks_per_page) break; - blocks[page_block] = map_bh->b_blocknr + map_offset + + blocks[page_block] = args->map_bh.b_blocknr + map_offset + relative_block; page_block++; block_in_file++; } - bdev = map_bh->b_bdev; + bdev = args->map_bh.b_bdev; } /* * Then do more get_blocks calls until we are done with this page. 
*/ - map_bh->b_page = page; + args->map_bh.b_page = page; while (page_block < blocks_per_page) { - map_bh->b_state = 0; - map_bh->b_size = 0; + args->map_bh.b_state = 0; + args->map_bh.b_size = 0; if (block_in_file < last_block) { - map_bh->b_size = (last_block-block_in_file) << blkbits; - if (get_block(inode, block_in_file, map_bh, 0)) + args->map_bh.b_size = (last_block-block_in_file) << blkbits; + if (args->get_block(inode, block_in_file, &args->map_bh, 0)) goto confused; - *first_logical_block = block_in_file; + args->first_logical_block = block_in_file; } - if (!buffer_mapped(map_bh)) { + if (!buffer_mapped(&args->map_bh)) { fully_mapped = 0; if (first_hole == blocks_per_page) first_hole = page_block; @@ -229,8 +238,8 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, * we just collected from get_block into the page's buffers * so readpage doesn't have to repeat the get_block call */ - if (buffer_uptodate(map_bh)) { - map_buffer_to_page(page, map_bh, page_block); + if (buffer_uptodate(&args->map_bh)) { + map_buffer_to_page(page, &args->map_bh, page_block); goto confused; } @@ -238,20 +247,20 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, goto confused; /* hole -> non-hole */ /* Contiguous blocks? 
*/ - if (page_block && blocks[page_block-1] != map_bh->b_blocknr-1) + if (page_block && blocks[page_block-1] != args->map_bh.b_blocknr-1) goto confused; - nblocks = map_bh->b_size >> blkbits; + nblocks = args->map_bh.b_size >> blkbits; for (relative_block = 0; ; relative_block++) { if (relative_block == nblocks) { - clear_buffer_mapped(map_bh); + clear_buffer_mapped(&args->map_bh); break; } else if (page_block == blocks_per_page) break; - blocks[page_block] = map_bh->b_blocknr+relative_block; + blocks[page_block] = args->map_bh.b_blocknr+relative_block; page_block++; block_in_file++; } - bdev = map_bh->b_bdev; + bdev = args->map_bh.b_bdev; } if (first_hole != blocks_per_page) { @@ -274,43 +283,43 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, /* * This page will go to BIO. Do we need to send this BIO off first? */ - if (bio && (*last_block_in_bio != blocks[0] - 1)) - bio = mpage_bio_submit(REQ_OP_READ, op_flags, bio); + if (args->bio && (args->last_block_in_bio != blocks[0] - 1)) + args->bio = mpage_bio_submit(REQ_OP_READ, op_flags, args->bio); alloc_new: - if (bio == NULL) { + if (args->bio == NULL) { if (first_hole == blocks_per_page) { if (!bdev_read_page(bdev, blocks[0] << (blkbits - 9), page)) goto out; } - bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9), - min_t(int, nr_pages, BIO_MAX_PAGES), gfp); - if (bio == NULL) + args->bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9), + min_t(int, args->nr_pages, BIO_MAX_PAGES), args->gfp); + if (args->bio == NULL) goto confused; } length = first_hole << blkbits; - if (bio_add_page(bio, page, length, 0) < length) { - bio = mpage_bio_submit(REQ_OP_READ, op_flags, bio); + if (bio_add_page(args->bio, page, length, 0) < length) { + args->bio = mpage_bio_submit(REQ_OP_READ, op_flags, args->bio); goto alloc_new; } - relative_block = block_in_file - *first_logical_block; - nblocks = map_bh->b_size >> blkbits; - if ((buffer_boundary(map_bh) && relative_block == nblocks) || + relative_block = 
block_in_file - args->first_logical_block; + nblocks = args->map_bh.b_size >> blkbits; + if ((buffer_boundary(&args->map_bh) && relative_block == nblocks) || (first_hole != blocks_per_page)) - bio = mpage_bio_submit(REQ_OP_READ, op_flags, bio); + args->bio = mpage_bio_submit(REQ_OP_READ, op_flags, args->bio); else - *last_block_in_bio = blocks[blocks_per_page - 1]; + args->last_block_in_bio = blocks[blocks_per_page - 1]; out: - return bio; + return args->bio; confused: - if (bio) - bio = mpage_bio_submit(REQ_OP_READ, op_flags, bio); + if (args->bio) + args->bio = mpage_bio_submit(REQ_OP_READ, op_flags, args->bio); if (!PageUptodate(page)) - block_read_full_page(page, get_block); + block_read_full_page(page, args->get_block); else unlock_page(page); goto out; @@ -364,15 +373,13 @@ int mpage_readpages(struct address_space *mapping, struct list_head *pages, unsigned nr_pages, get_block_t get_block) { - struct bio *bio = NULL; + struct mpage_args args = { + .get_block = get_block, + .gfp = readahead_gfp_mask(mapping), + .is_readahead = true, + }; unsigned page_idx; - sector_t last_block_in_bio = 0; - struct buffer_head map_bh; - unsigned long first_logical_block = 0; - gfp_t gfp = readahead_gfp_mask(mapping); - map_bh.b_state = 0; - map_bh.b_size = 0; for (page_idx = 0; page_idx < nr_pages; page_idx++) { struct page *page = lru_to_page(pages); @@ -380,18 +387,16 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages, list_del(&page->lru); if (!add_to_page_cache_lru(page, mapping, page->index, - gfp)) { - bio = do_mpage_readpage(bio, page, - nr_pages - page_idx, - &last_block_in_bio, &map_bh, - &first_logical_block, - get_block, gfp, true); + args.gfp)) { + args.page = page; + args.nr_pages = nr_pages - page_idx; + args.bio = do_mpage_readpage(&args); } put_page(page); } BUG_ON(!list_empty(pages)); - if (bio) - mpage_bio_submit(REQ_OP_READ, REQ_RAHEAD, bio); + if (args.bio) + mpage_bio_submit(REQ_OP_READ, REQ_RAHEAD, args.bio); return 0; } 
EXPORT_SYMBOL(mpage_readpages); @@ -401,18 +406,16 @@ EXPORT_SYMBOL(mpage_readpages); */ int mpage_readpage(struct page *page, get_block_t get_block) { - struct bio *bio = NULL; - sector_t last_block_in_bio = 0; - struct buffer_head map_bh; - unsigned long first_logical_block = 0; - gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL); + struct mpage_args args = { + .page = page, + .nr_pages = 1, + .get_block = get_block, + .gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL), + }; - map_bh.b_state = 0; - map_bh.b_size = 0; - bio = do_mpage_readpage(bio, page, 1, &last_block_in_bio, - &map_bh, &first_logical_block, get_block, gfp, false); - if (bio) - mpage_bio_submit(REQ_OP_READ, 0, bio); + args.bio = do_mpage_readpage(&args); + if (args.bio) + mpage_bio_submit(REQ_OP_READ, 0, args.bio); return 0; } EXPORT_SYMBOL(mpage_readpage); -- Jens Axboe ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] mpage: mpage_readpages() should submit IO as read-ahead 2018-05-24 19:43 ` Andrew Morton 2018-05-24 19:57 ` Matthew Wilcox @ 2018-05-29 14:36 ` Jens Axboe 2018-05-29 21:59 ` Andrew Morton 1 sibling, 1 reply; 16+ messages in thread From: Jens Axboe @ 2018-05-29 14:36 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-fsdevel, viro On 5/24/18 1:43 PM, Andrew Morton wrote: > On Thu, 24 May 2018 10:02:52 -0600 Jens Axboe <axboe@kernel.dk> wrote: > >> a_ops->readpages() is only ever used for read-ahead, yet we don't >> flag the IO being submitted as such. Fix that up. Any file system >> that uses mpage_readpages() as it's ->readpages() implementation >> will now get this right. >> >> Signed-off-by: Jens Axboe <axboe@kernel.dk> >> --- >> fs/mpage.c | 17 +++++++++-------- >> 1 file changed, 9 insertions(+), 8 deletions(-) >> >> diff --git a/fs/mpage.c b/fs/mpage.c >> index b7e7f570733a..0a5474237f5e 100644 >> --- a/fs/mpage.c >> +++ b/fs/mpage.c >> @@ -146,7 +146,7 @@ static struct bio * >> do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, >> sector_t *last_block_in_bio, struct buffer_head *map_bh, >> unsigned long *first_logical_block, get_block_t get_block, >> - gfp_t gfp) >> + gfp_t gfp, bool is_readahead) > > That's a lot of arguments. > > I suspect we'll have a faster kernel if we mark this __always_inline. > I think my ancient "This isn't called much at all" over > mpage_readpage() remains true. Almost all callers come in via > mpage_readpages(), which would benefit from the inlining. But mpage.o > gets 1.5k fatter. hm. Was going to send out a v2, but would be nice to get some consensus on what you prefer here. I can either do the struct version, or I can keep it as-is (going from 8 to 9 arguments). For the struct version, I'd prefer to do that as a prep patch, so the functional change is clear. -- Jens Axboe ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] mpage: mpage_readpages() should submit IO as read-ahead 2018-05-29 14:36 ` Jens Axboe @ 2018-05-29 21:59 ` Andrew Morton 2018-05-29 22:18 ` Jens Axboe 0 siblings, 1 reply; 16+ messages in thread From: Andrew Morton @ 2018-05-29 21:59 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-fsdevel, viro On Tue, 29 May 2018 08:36:41 -0600 Jens Axboe <axboe@kernel.dk> wrote: > On 5/24/18 1:43 PM, Andrew Morton wrote: > > On Thu, 24 May 2018 10:02:52 -0600 Jens Axboe <axboe@kernel.dk> wrote: > > > >> a_ops->readpages() is only ever used for read-ahead, yet we don't > >> flag the IO being submitted as such. Fix that up. Any file system > >> that uses mpage_readpages() as it's ->readpages() implementation > >> will now get this right. > >> > >> Signed-off-by: Jens Axboe <axboe@kernel.dk> > >> --- > >> fs/mpage.c | 17 +++++++++-------- > >> 1 file changed, 9 insertions(+), 8 deletions(-) > >> > >> diff --git a/fs/mpage.c b/fs/mpage.c > >> index b7e7f570733a..0a5474237f5e 100644 > >> --- a/fs/mpage.c > >> +++ b/fs/mpage.c > >> @@ -146,7 +146,7 @@ static struct bio * > >> do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, > >> sector_t *last_block_in_bio, struct buffer_head *map_bh, > >> unsigned long *first_logical_block, get_block_t get_block, > >> - gfp_t gfp) > >> + gfp_t gfp, bool is_readahead) > > > > That's a lot of arguments. > > > > I suspect we'll have a faster kernel if we mark this __always_inline. > > I think my ancient "This isn't called much at all" over > > mpage_readpage() remains true. Almost all callers come in via > > mpage_readpages(), which would benefit from the inlining. But mpage.o > > gets 1.5k fatter. hm. > > Was going to send out a v2, but would be nice to get some consensus on > what you prefer here. I can either do the struct version, or I can > keep it as-is (going from 8 to 9 arguments). For the struct version, > I'd prefer to do that as a prep patch, so the functional change is > clear. 
The struct thing makes the code smaller, and presumably faster, doesn't it? I suppose it saves a bit of stack as well, by letting the callee access the caller's locals rather than a copy of them. All sounds good to me? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] mpage: mpage_readpages() should submit IO as read-ahead 2018-05-29 21:59 ` Andrew Morton @ 2018-05-29 22:18 ` Jens Axboe 2018-05-29 22:37 ` Jens Axboe 0 siblings, 1 reply; 16+ messages in thread From: Jens Axboe @ 2018-05-29 22:18 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-fsdevel, viro On 5/29/18 3:59 PM, Andrew Morton wrote: > On Tue, 29 May 2018 08:36:41 -0600 Jens Axboe <axboe@kernel.dk> wrote: > >> On 5/24/18 1:43 PM, Andrew Morton wrote: >>> On Thu, 24 May 2018 10:02:52 -0600 Jens Axboe <axboe@kernel.dk> wrote: >>> >>>> a_ops->readpages() is only ever used for read-ahead, yet we don't >>>> flag the IO being submitted as such. Fix that up. Any file system >>>> that uses mpage_readpages() as it's ->readpages() implementation >>>> will now get this right. >>>> >>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>>> --- >>>> fs/mpage.c | 17 +++++++++-------- >>>> 1 file changed, 9 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/fs/mpage.c b/fs/mpage.c >>>> index b7e7f570733a..0a5474237f5e 100644 >>>> --- a/fs/mpage.c >>>> +++ b/fs/mpage.c >>>> @@ -146,7 +146,7 @@ static struct bio * >>>> do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, >>>> sector_t *last_block_in_bio, struct buffer_head *map_bh, >>>> unsigned long *first_logical_block, get_block_t get_block, >>>> - gfp_t gfp) >>>> + gfp_t gfp, bool is_readahead) >>> >>> That's a lot of arguments. >>> >>> I suspect we'll have a faster kernel if we mark this __always_inline. >>> I think my ancient "This isn't called much at all" over >>> mpage_readpage() remains true. Almost all callers come in via >>> mpage_readpages(), which would benefit from the inlining. But mpage.o >>> gets 1.5k fatter. hm. >> >> Was going to send out a v2, but would be nice to get some consensus on >> what you prefer here. I can either do the struct version, or I can >> keep it as-is (going from 8 to 9 arguments). 
>> For the struct version,
>> I'd prefer to do that as a prep patch, so the functional change is
>> clear.
> 
> The struct thing makes the code smaller, and presumably faster, doesn't
> it? I suppose it saves a bit of stack as well, by letting the callee
> access the caller's locals rather than a copy of them. All sounds good
> to me?

That's what I thought too, so already prepped the series. Sending it out.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] mpage: mpage_readpages() should submit IO as read-ahead 2018-05-29 22:18 ` Jens Axboe @ 2018-05-29 22:37 ` Jens Axboe 0 siblings, 0 replies; 16+ messages in thread From: Jens Axboe @ 2018-05-29 22:37 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-fsdevel, viro On 5/29/18 4:18 PM, Jens Axboe wrote: > On 5/29/18 3:59 PM, Andrew Morton wrote: >> On Tue, 29 May 2018 08:36:41 -0600 Jens Axboe <axboe@kernel.dk> wrote: >> >>> On 5/24/18 1:43 PM, Andrew Morton wrote: >>>> On Thu, 24 May 2018 10:02:52 -0600 Jens Axboe <axboe@kernel.dk> wrote: >>>> >>>>> a_ops->readpages() is only ever used for read-ahead, yet we don't >>>>> flag the IO being submitted as such. Fix that up. Any file system >>>>> that uses mpage_readpages() as it's ->readpages() implementation >>>>> will now get this right. >>>>> >>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>>>> --- >>>>> fs/mpage.c | 17 +++++++++-------- >>>>> 1 file changed, 9 insertions(+), 8 deletions(-) >>>>> >>>>> diff --git a/fs/mpage.c b/fs/mpage.c >>>>> index b7e7f570733a..0a5474237f5e 100644 >>>>> --- a/fs/mpage.c >>>>> +++ b/fs/mpage.c >>>>> @@ -146,7 +146,7 @@ static struct bio * >>>>> do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, >>>>> sector_t *last_block_in_bio, struct buffer_head *map_bh, >>>>> unsigned long *first_logical_block, get_block_t get_block, >>>>> - gfp_t gfp) >>>>> + gfp_t gfp, bool is_readahead) >>>> >>>> That's a lot of arguments. >>>> >>>> I suspect we'll have a faster kernel if we mark this __always_inline. >>>> I think my ancient "This isn't called much at all" over >>>> mpage_readpage() remains true. Almost all callers come in via >>>> mpage_readpages(), which would benefit from the inlining. But mpage.o >>>> gets 1.5k fatter. hm. >>> >>> Was going to send out a v2, but would be nice to get some consensus on >>> what you prefer here. I can either do the struct version, or I can >>> keep it as-is (going from 8 to 9 arguments). 
For the struct version, >>> I'd prefer to do that as a prep patch, so the functional change is >>> clear. >> >> The struct thing makes the code smaller, and presumably faster, doesn't >> it? I suppose it saves a bit of stack as well, by letting the callee >> access the caller's locals rather than a copy of them. All sounds good >> to me? > > That's what I thought to, so already prepped the series. Sending it out. We could actually kill args->gfp as well, since that's dependent on args->is_readahead anyway. Separate patch, or fold into patch #2? Incremental below. diff --git a/fs/mpage.c b/fs/mpage.c index a6344996f924..b0f9de977526 100644 --- a/fs/mpage.c +++ b/fs/mpage.c @@ -137,12 +137,11 @@ struct mpage_readpage_args { struct bio *bio; struct page *page; unsigned nr_pages; + bool is_readahead; sector_t last_block_in_bio; struct buffer_head map_bh; unsigned long first_logical_block; get_block_t *get_block; - gfp_t gfp; - bool is_readahead; }; /* @@ -171,9 +170,18 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) struct block_device *bdev = NULL; int length; int fully_mapped = 1; - int op_flags = args->is_readahead ? 
REQ_RAHEAD : 0; + int op_flags; unsigned nblocks; unsigned relative_block; + gfp_t gfp; + + if (args->is_readahead) { + op_flags = REQ_RAHEAD; + gfp = readahead_gfp_mask(page->mapping); + } else { + op_flags = 0; + gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL); + } if (page_has_buffers(page)) goto confused; @@ -295,7 +303,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) goto out; } args->bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9), - min_t(int, args->nr_pages, BIO_MAX_PAGES), args->gfp); + min_t(int, args->nr_pages, BIO_MAX_PAGES), gfp); if (args->bio == NULL) goto confused; } @@ -376,7 +384,6 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages, { struct mpage_readpage_args args = { .get_block = get_block, - .gfp = readahead_gfp_mask(mapping), .is_readahead = true, }; unsigned page_idx; @@ -388,7 +395,7 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages, list_del(&page->lru); if (!add_to_page_cache_lru(page, mapping, page->index, - args.gfp)) { + readahead_gfp_mask(mapping))) { args.page = page; args.nr_pages = nr_pages - page_idx; args.bio = do_mpage_readpage(&args); @@ -411,7 +418,6 @@ int mpage_readpage(struct page *page, get_block_t get_block) .page = page, .nr_pages = 1, .get_block = get_block, - .gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL), }; args.bio = do_mpage_readpage(&args); -- Jens Axboe ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/3] btrfs: readpages() should submit IO as read-ahead 2018-05-24 16:02 [PATCHSET 0/3] Submit ->readpages() IO as read-ahead Jens Axboe 2018-05-24 16:02 ` [PATCH 1/3] mpage: mpage_readpages() should submit " Jens Axboe @ 2018-05-24 16:02 ` Jens Axboe 2018-05-24 16:02 ` [PATCH 3/3] ext4: " Jens Axboe 2018-05-25 8:32 ` [PATCHSET 0/3] Submit ->readpages() " Christoph Hellwig 3 siblings, 0 replies; 16+ messages in thread From: Jens Axboe @ 2018-05-24 16:02 UTC (permalink / raw) To: linux-fsdevel; +Cc: viro, akpm, Jens Axboe a_ops->readpages() is only ever used for read-ahead. Ensure that we pass this information down to the block layer. Signed-off-by: Jens Axboe <axboe@kernel.dk> --- fs/btrfs/extent_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index e99b329002cf..32034bc1e2b9 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3119,7 +3119,7 @@ static inline void __do_contiguous_readpages(struct extent_io_tree *tree, for (index = 0; index < nr_pages; index++) { __do_readpage(tree, pages[index], btrfs_get_extent, em_cached, - bio, 0, bio_flags, 0, prev_em_start); + bio, 0, bio_flags, REQ_RAHEAD, prev_em_start); put_page(pages[index]); } } -- 2.7.4 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/3] ext4: readpages() should submit IO as read-ahead 2018-05-24 16:02 [PATCHSET 0/3] Submit ->readpages() IO as read-ahead Jens Axboe 2018-05-24 16:02 ` [PATCH 1/3] mpage: mpage_readpages() should submit " Jens Axboe 2018-05-24 16:02 ` [PATCH 2/3] btrfs: readpages() " Jens Axboe @ 2018-05-24 16:02 ` Jens Axboe 2018-05-24 20:47 ` Theodore Y. Ts'o 2018-05-25 8:32 ` [PATCHSET 0/3] Submit ->readpages() " Christoph Hellwig 3 siblings, 1 reply; 16+ messages in thread From: Jens Axboe @ 2018-05-24 16:02 UTC (permalink / raw) To: linux-fsdevel; +Cc: viro, akpm, Jens Axboe a_ops->readpages() is only ever used for read-ahead. Ensure that we pass this information down to the block layer. Signed-off-by: Jens Axboe <axboe@kernel.dk> --- fs/ext4/readpage.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c index 9ffa6fad18db..92b28273b223 100644 --- a/fs/ext4/readpage.c +++ b/fs/ext4/readpage.c @@ -259,7 +259,7 @@ int ext4_mpage_readpages(struct address_space *mapping, bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9); bio->bi_end_io = mpage_end_io; bio->bi_private = ctx; - bio_set_op_attrs(bio, REQ_OP_READ, 0); + bio_set_op_attrs(bio, REQ_OP_READ, REQ_RAHEAD); } length = first_hole << blkbits; -- 2.7.4 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] ext4: readpages() should submit IO as read-ahead 2018-05-24 16:02 ` [PATCH 3/3] ext4: " Jens Axboe @ 2018-05-24 20:47 ` Theodore Y. Ts'o 2018-05-24 20:53 ` Jens Axboe 0 siblings, 1 reply; 16+ messages in thread From: Theodore Y. Ts'o @ 2018-05-24 20:47 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-fsdevel, viro, akpm On Thu, May 24, 2018 at 10:02:54AM -0600, Jens Axboe wrote: > a_ops->readpages() is only ever used for read-ahead. Ensure that we > pass this information down to the block layer. > > Signed-off-by: Jens Axboe <axboe@kernel.dk> > --- > fs/ext4/readpage.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c > index 9ffa6fad18db..92b28273b223 100644 > --- a/fs/ext4/readpage.c > +++ b/fs/ext4/readpage.c > @@ -259,7 +259,7 @@ int ext4_mpage_readpages(struct address_space *mapping, > bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9); > bio->bi_end_io = mpage_end_io; > bio->bi_private = ctx; > - bio_set_op_attrs(bio, REQ_OP_READ, 0); > + bio_set_op_attrs(bio, REQ_OP_READ, REQ_RAHEAD); > } The problem is that ext4_readpage() also calls ext4_mpage_readpages(). So you can't set REQ_RAHEAD unconditionally here. - Ted ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] ext4: readpages() should submit IO as read-ahead 2018-05-24 20:47 ` Theodore Y. Ts'o @ 2018-05-24 20:53 ` Jens Axboe 2018-05-24 21:05 ` Jens Axboe 0 siblings, 1 reply; 16+ messages in thread From: Jens Axboe @ 2018-05-24 20:53 UTC (permalink / raw) To: Theodore Y. Ts'o; +Cc: linux-fsdevel, viro, akpm On 5/24/18 2:47 PM, Theodore Y. Ts'o wrote: > On Thu, May 24, 2018 at 10:02:54AM -0600, Jens Axboe wrote: >> a_ops->readpages() is only ever used for read-ahead. Ensure that we >> pass this information down to the block layer. >> >> Signed-off-by: Jens Axboe <axboe@kernel.dk> >> --- >> fs/ext4/readpage.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c >> index 9ffa6fad18db..92b28273b223 100644 >> --- a/fs/ext4/readpage.c >> +++ b/fs/ext4/readpage.c >> @@ -259,7 +259,7 @@ int ext4_mpage_readpages(struct address_space *mapping, >> bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9); >> bio->bi_end_io = mpage_end_io; >> bio->bi_private = ctx; >> - bio_set_op_attrs(bio, REQ_OP_READ, 0); >> + bio_set_op_attrs(bio, REQ_OP_READ, REQ_RAHEAD); >> } > > The problem is that ext4_readpage() also calls ext4_mpage_readpages(). > So you can't set REQ_RAHEAD unconditionally here. Oh shoot, yeah I did miss that. I'll update it, thanks Ted. -- Jens Axboe ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] ext4: readpages() should submit IO as read-ahead 2018-05-24 20:53 ` Jens Axboe @ 2018-05-24 21:05 ` Jens Axboe 0 siblings, 0 replies; 16+ messages in thread From: Jens Axboe @ 2018-05-24 21:05 UTC (permalink / raw) To: Theodore Y. Ts'o; +Cc: linux-fsdevel, viro, akpm On 5/24/18 2:53 PM, Jens Axboe wrote: > On 5/24/18 2:47 PM, Theodore Y. Ts'o wrote: >> On Thu, May 24, 2018 at 10:02:54AM -0600, Jens Axboe wrote: >>> a_ops->readpages() is only ever used for read-ahead. Ensure that we >>> pass this information down to the block layer. >>> >>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>> --- >>> fs/ext4/readpage.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c >>> index 9ffa6fad18db..92b28273b223 100644 >>> --- a/fs/ext4/readpage.c >>> +++ b/fs/ext4/readpage.c >>> @@ -259,7 +259,7 @@ int ext4_mpage_readpages(struct address_space *mapping, >>> bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9); >>> bio->bi_end_io = mpage_end_io; >>> bio->bi_private = ctx; >>> - bio_set_op_attrs(bio, REQ_OP_READ, 0); >>> + bio_set_op_attrs(bio, REQ_OP_READ, REQ_RAHEAD); >>> } >> >> The problem is that ext4_readpage() also calls ext4_mpage_readpages(). >> So you can't set REQ_RAHEAD unconditionally here. > > Oh shoot, yeah I did miss that. I'll update it, thanks Ted. Updated version: http://git.kernel.dk/cgit/linux-block/commit/?h=readpages-ahead&id=288b12ed3180530fac7d6598afd3449ec9cdfdb8 -- Jens Axboe ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCHSET 0/3] Submit ->readpages() IO as read-ahead 2018-05-24 16:02 [PATCHSET 0/3] Submit ->readpages() IO as read-ahead Jens Axboe ` (2 preceding siblings ...) 2018-05-24 16:02 ` [PATCH 3/3] ext4: " Jens Axboe @ 2018-05-25 8:32 ` Christoph Hellwig 2018-05-25 14:21 ` Jens Axboe 3 siblings, 1 reply; 16+ messages in thread From: Christoph Hellwig @ 2018-05-25 8:32 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-fsdevel, viro, akpm On Thu, May 24, 2018 at 10:02:51AM -0600, Jens Axboe wrote: > The only caller of ->readpages() is from read-ahead, yet we don't > submit IO flagged with REQ_RAHEAD. This means we don't see it in > blktrace, for instance, which is a shame. We already make assumptions > about ->readpages() just being for read-ahead in the mpage > implementation, using readahead_gfp_mask(mapping) as out GFP mask of > choice. > > This small series fixes up mpage_readpages() to submit with > REQ_RAHEAD, which takes care of file systems using mpage_readpages(). > The last two fixup ext4 and btrfs. What are the benefits? Any setup where this buys us anything? Any setup where this actually regressed because drivers/hardware are doing stupid things with REQ_RAHEAD? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCHSET 0/3] Submit ->readpages() IO as read-ahead 2018-05-25 8:32 ` [PATCHSET 0/3] Submit ->readpages() " Christoph Hellwig @ 2018-05-25 14:21 ` Jens Axboe 0 siblings, 0 replies; 16+ messages in thread From: Jens Axboe @ 2018-05-25 14:21 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-fsdevel, viro, akpm On 5/25/18 2:32 AM, Christoph Hellwig wrote: > On Thu, May 24, 2018 at 10:02:51AM -0600, Jens Axboe wrote: >> The only caller of ->readpages() is from read-ahead, yet we don't >> submit IO flagged with REQ_RAHEAD. This means we don't see it in >> blktrace, for instance, which is a shame. We already make assumptions >> about ->readpages() just being for read-ahead in the mpage >> implementation, using readahead_gfp_mask(mapping) as out GFP mask of >> choice. >> >> This small series fixes up mpage_readpages() to submit with >> REQ_RAHEAD, which takes care of file systems using mpage_readpages(). >> The last two fixup ext4 and btrfs. > > What are the benefits? Any setup where this buys us anything? Any The first benefit is getting the tracing right. Before you could not see which parts were readahead in blktrace, and which parts were not. This now works fine. The second motivation is fixing an issue around big read-ahead windows where a severely bogged down box takes forever to exit a task that was killed, because it's going through tons of IOs on an oversubscribed disk. We see this at FB. 4MB window, and IOs being largely non-sequential. 1000 requests on a rotating drive that takes seconds to do each one. This requires fixing on top, which can be pretty easily done as long as we acknowledge that ->readpages() is strictly for read-ahead. > setup where this actually regressed because drivers/hardware are doing stupid things with REQ_RAHEAD? I'd sure hope not; we are using REQ_RAHEAD in various metadata-related bits. But the main read-ahead was not. -- Jens Axboe ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2018-05-29 22:37 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-05-24 16:02 [PATCHSET 0/3] Submit ->readpages() IO as read-ahead Jens Axboe 2018-05-24 16:02 ` [PATCH 1/3] mpage: mpage_readpages() should submit " Jens Axboe 2018-05-24 19:43 ` Andrew Morton 2018-05-24 19:57 ` Matthew Wilcox 2018-05-24 20:24 ` Jens Axboe 2018-05-29 14:36 ` Jens Axboe 2018-05-29 21:59 ` Andrew Morton 2018-05-29 22:18 ` Jens Axboe 2018-05-29 22:37 ` Jens Axboe 2018-05-24 16:02 ` [PATCH 2/3] btrfs: readpages() " Jens Axboe 2018-05-24 16:02 ` [PATCH 3/3] ext4: " Jens Axboe 2018-05-24 20:47 ` Theodore Y. Ts'o 2018-05-24 20:53 ` Jens Axboe 2018-05-24 21:05 ` Jens Axboe 2018-05-25 8:32 ` [PATCHSET 0/3] Submit ->readpages() " Christoph Hellwig 2018-05-25 14:21 ` Jens Axboe