linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
@ 2018-05-30 14:42 Jens Axboe
  2018-05-30 14:42 ` [PATCH 1/4] mpage: add argument structure for do_mpage_readpage() Jens Axboe
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Jens Axboe @ 2018-05-30 14:42 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, akpm

The only caller of ->readpages() is from read-ahead, yet we don't
submit IO flagged with REQ_RAHEAD. This means we don't see it in
blktrace, for instance, which is a shame. We already make assumptions
about ->readpages() just being for read-ahead in the mpage
implementation, using readahead_gfp_mask(mapping) as out GFP mask of
choice.

This small series fixes up mpage_readpages() to submit with
REQ_RAHEAD, which takes care of file systems using mpage_readpages().
The first patch is a prep patch, that makes do_mpage_readpage() take
an argument structure.

Changes since v2:

- Get rid of 'gfp' passing once we have is_readahead
- Pack struct better, makes it 8 bytes smaller.

Changes since v1:

- Fix ext4_mpage_readpages() also being used for regular reads
- Add prep patch with struct arguments for do_mpage_readpage()

 fs/btrfs/extent_io.c |   2 +-
 fs/ext4/ext4.h       |   2 +-
 fs/ext4/inode.c      |   5 +-
 fs/ext4/readpage.c   |   5 +-
 fs/mpage.c           | 115 ++++++++++++++++++++++++-------------------
 5 files changed, 71 insertions(+), 58 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] mpage: add argument structure for do_mpage_readpage()
  2018-05-30 14:42 [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead Jens Axboe
@ 2018-05-30 14:42 ` Jens Axboe
  2018-05-30 14:42 ` [PATCH 2/4] mpage: mpage_readpages() should submit IO as read-ahead Jens Axboe
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2018-05-30 14:42 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, akpm, Jens Axboe

We're currently passing 8 arguments to this function, clean it up a
bit by packing the arguments in an args structure we pass to it.

Not intentional functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/mpage.c | 106 +++++++++++++++++++++++++++++++------------------------------
 1 file changed, 54 insertions(+), 52 deletions(-)

diff --git a/fs/mpage.c b/fs/mpage.c
index b7e7f570733a..0ecac5c410f4 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -133,6 +133,17 @@ map_buffer_to_page(struct page *page, struct buffer_head *bh, int page_block)
 	} while (page_bh != head);
 }
 
+struct mpage_readpage_args {
+	struct bio *bio;
+	struct page *page;
+	unsigned nr_pages;
+	sector_t last_block_in_bio;
+	struct buffer_head map_bh;
+	unsigned long first_logical_block;
+	get_block_t *get_block;
+	gfp_t gfp;
+};
+
 /*
  * This is the worker routine which does all the work of mapping the disk
  * blocks and constructs largest possible bios, submits them for IO if the
@@ -142,16 +153,14 @@ map_buffer_to_page(struct page *page, struct buffer_head *bh, int page_block)
  * represent the validity of its disk mapping and to decide when to do the next
  * get_block() call.
  */
-static struct bio *
-do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
-		sector_t *last_block_in_bio, struct buffer_head *map_bh,
-		unsigned long *first_logical_block, get_block_t get_block,
-		gfp_t gfp)
+static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 {
+	struct page *page = args->page;
 	struct inode *inode = page->mapping->host;
 	const unsigned blkbits = inode->i_blkbits;
 	const unsigned blocks_per_page = PAGE_SIZE >> blkbits;
 	const unsigned blocksize = 1 << blkbits;
+	struct buffer_head *map_bh = &args->map_bh;
 	sector_t block_in_file;
 	sector_t last_block;
 	sector_t last_block_in_file;
@@ -168,7 +177,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 		goto confused;
 
 	block_in_file = (sector_t)page->index << (PAGE_SHIFT - blkbits);
-	last_block = block_in_file + nr_pages * blocks_per_page;
+	last_block = block_in_file + args->nr_pages * blocks_per_page;
 	last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
 	if (last_block > last_block_in_file)
 		last_block = last_block_in_file;
@@ -178,9 +187,9 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 	 * Map blocks using the result from the previous get_blocks call first.
 	 */
 	nblocks = map_bh->b_size >> blkbits;
-	if (buffer_mapped(map_bh) && block_in_file > *first_logical_block &&
-			block_in_file < (*first_logical_block + nblocks)) {
-		unsigned map_offset = block_in_file - *first_logical_block;
+	if (buffer_mapped(map_bh) && block_in_file > args->first_logical_block &&
+			block_in_file < (args->first_logical_block + nblocks)) {
+		unsigned map_offset = block_in_file - args->first_logical_block;
 		unsigned last = nblocks - map_offset;
 
 		for (relative_block = 0; ; relative_block++) {
@@ -208,9 +217,9 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 
 		if (block_in_file < last_block) {
 			map_bh->b_size = (last_block-block_in_file) << blkbits;
-			if (get_block(inode, block_in_file, map_bh, 0))
+			if (args->get_block(inode, block_in_file, map_bh, 0))
 				goto confused;
-			*first_logical_block = block_in_file;
+			args->first_logical_block = block_in_file;
 		}
 
 		if (!buffer_mapped(map_bh)) {
@@ -273,43 +282,43 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 	/*
 	 * This page will go to BIO.  Do we need to send this BIO off first?
 	 */
-	if (bio && (*last_block_in_bio != blocks[0] - 1))
-		bio = mpage_bio_submit(REQ_OP_READ, 0, bio);
+	if (args->bio && (args->last_block_in_bio != blocks[0] - 1))
+		args->bio = mpage_bio_submit(REQ_OP_READ, 0, args->bio);
 
 alloc_new:
-	if (bio == NULL) {
+	if (args->bio == NULL) {
 		if (first_hole == blocks_per_page) {
 			if (!bdev_read_page(bdev, blocks[0] << (blkbits - 9),
 								page))
 				goto out;
 		}
-		bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9),
-				min_t(int, nr_pages, BIO_MAX_PAGES), gfp);
-		if (bio == NULL)
+		args->bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9),
+				min_t(int, args->nr_pages, BIO_MAX_PAGES), args->gfp);
+		if (args->bio == NULL)
 			goto confused;
 	}
 
 	length = first_hole << blkbits;
-	if (bio_add_page(bio, page, length, 0) < length) {
-		bio = mpage_bio_submit(REQ_OP_READ, 0, bio);
+	if (bio_add_page(args->bio, page, length, 0) < length) {
+		args->bio = mpage_bio_submit(REQ_OP_READ, 0, args->bio);
 		goto alloc_new;
 	}
 
-	relative_block = block_in_file - *first_logical_block;
+	relative_block = block_in_file - args->first_logical_block;
 	nblocks = map_bh->b_size >> blkbits;
 	if ((buffer_boundary(map_bh) && relative_block == nblocks) ||
 	    (first_hole != blocks_per_page))
-		bio = mpage_bio_submit(REQ_OP_READ, 0, bio);
+		args->bio = mpage_bio_submit(REQ_OP_READ, 0, args->bio);
 	else
-		*last_block_in_bio = blocks[blocks_per_page - 1];
+		args->last_block_in_bio = blocks[blocks_per_page - 1];
 out:
-	return bio;
+	return args->bio;
 
 confused:
-	if (bio)
-		bio = mpage_bio_submit(REQ_OP_READ, 0, bio);
+	if (args->bio)
+		args->bio = mpage_bio_submit(REQ_OP_READ, 0, args->bio);
 	if (!PageUptodate(page))
-	        block_read_full_page(page, get_block);
+	        block_read_full_page(page, args->get_block);
 	else
 		unlock_page(page);
 	goto out;
@@ -363,15 +372,12 @@ int
 mpage_readpages(struct address_space *mapping, struct list_head *pages,
 				unsigned nr_pages, get_block_t get_block)
 {
-	struct bio *bio = NULL;
+	struct mpage_readpage_args args = {
+		.get_block = get_block,
+		.gfp = readahead_gfp_mask(mapping),
+	};
 	unsigned page_idx;
-	sector_t last_block_in_bio = 0;
-	struct buffer_head map_bh;
-	unsigned long first_logical_block = 0;
-	gfp_t gfp = readahead_gfp_mask(mapping);
 
-	map_bh.b_state = 0;
-	map_bh.b_size = 0;
 	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
 		struct page *page = lru_to_page(pages);
 
@@ -379,18 +385,16 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages,
 		list_del(&page->lru);
 		if (!add_to_page_cache_lru(page, mapping,
 					page->index,
-					gfp)) {
-			bio = do_mpage_readpage(bio, page,
-					nr_pages - page_idx,
-					&last_block_in_bio, &map_bh,
-					&first_logical_block,
-					get_block, gfp);
+					args.gfp)) {
+			args.page = page;
+			args.nr_pages = nr_pages - page_idx;
+			args.bio = do_mpage_readpage(&args);
 		}
 		put_page(page);
 	}
 	BUG_ON(!list_empty(pages));
-	if (bio)
-		mpage_bio_submit(REQ_OP_READ, 0, bio);
+	if (args.bio)
+		mpage_bio_submit(REQ_OP_READ, 0, args.bio);
 	return 0;
 }
 EXPORT_SYMBOL(mpage_readpages);
@@ -400,18 +404,16 @@ EXPORT_SYMBOL(mpage_readpages);
  */
 int mpage_readpage(struct page *page, get_block_t get_block)
 {
-	struct bio *bio = NULL;
-	sector_t last_block_in_bio = 0;
-	struct buffer_head map_bh;
-	unsigned long first_logical_block = 0;
-	gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
+	struct mpage_readpage_args args = {
+		.page = page,
+		.nr_pages = 1,
+		.get_block = get_block,
+		.gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL),
+	};
 
-	map_bh.b_state = 0;
-	map_bh.b_size = 0;
-	bio = do_mpage_readpage(bio, page, 1, &last_block_in_bio,
-			&map_bh, &first_logical_block, get_block, gfp);
-	if (bio)
-		mpage_bio_submit(REQ_OP_READ, 0, bio);
+	args.bio = do_mpage_readpage(&args);
+	if (args.bio)
+		mpage_bio_submit(REQ_OP_READ, 0, args.bio);
 	return 0;
 }
 EXPORT_SYMBOL(mpage_readpage);
-- 
2.7.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/4] mpage: mpage_readpages() should submit IO as read-ahead
  2018-05-30 14:42 [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead Jens Axboe
  2018-05-30 14:42 ` [PATCH 1/4] mpage: add argument structure for do_mpage_readpage() Jens Axboe
@ 2018-05-30 14:42 ` Jens Axboe
  2018-05-30 14:42 ` [PATCH 3/4] btrfs: readpages() " Jens Axboe
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2018-05-30 14:42 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, akpm, Jens Axboe

a_ops->readpages() is only ever used for read-ahead, yet we don't
flag the IO being submitted as such. Fix that up. Any file system
that uses mpage_readpages() as it's ->readpages() implementation
will now get this right.

Since we're passing in whether the IO is read-ahead or not, we
don't need to pass in the 'gfp' separately, as it is dependent
on the IO being read-ahead. Kill off that member.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/mpage.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/fs/mpage.c b/fs/mpage.c
index 0ecac5c410f4..b0f9de977526 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -137,11 +137,11 @@ struct mpage_readpage_args {
 	struct bio *bio;
 	struct page *page;
 	unsigned nr_pages;
+	bool is_readahead;
 	sector_t last_block_in_bio;
 	struct buffer_head map_bh;
 	unsigned long first_logical_block;
 	get_block_t *get_block;
-	gfp_t gfp;
 };
 
 /*
@@ -170,8 +170,18 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 	struct block_device *bdev = NULL;
 	int length;
 	int fully_mapped = 1;
+	int op_flags;
 	unsigned nblocks;
 	unsigned relative_block;
+	gfp_t gfp;
+
+	if (args->is_readahead) {
+		op_flags = REQ_RAHEAD;
+		gfp = readahead_gfp_mask(page->mapping);
+	} else {
+		op_flags = 0;
+		gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
+	}
 
 	if (page_has_buffers(page))
 		goto confused;
@@ -283,7 +293,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 	 * This page will go to BIO.  Do we need to send this BIO off first?
 	 */
 	if (args->bio && (args->last_block_in_bio != blocks[0] - 1))
-		args->bio = mpage_bio_submit(REQ_OP_READ, 0, args->bio);
+		args->bio = mpage_bio_submit(REQ_OP_READ, op_flags, args->bio);
 
 alloc_new:
 	if (args->bio == NULL) {
@@ -293,14 +303,14 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 				goto out;
 		}
 		args->bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9),
-				min_t(int, args->nr_pages, BIO_MAX_PAGES), args->gfp);
+				min_t(int, args->nr_pages, BIO_MAX_PAGES), gfp);
 		if (args->bio == NULL)
 			goto confused;
 	}
 
 	length = first_hole << blkbits;
 	if (bio_add_page(args->bio, page, length, 0) < length) {
-		args->bio = mpage_bio_submit(REQ_OP_READ, 0, args->bio);
+		args->bio = mpage_bio_submit(REQ_OP_READ, op_flags, args->bio);
 		goto alloc_new;
 	}
 
@@ -308,7 +318,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 	nblocks = map_bh->b_size >> blkbits;
 	if ((buffer_boundary(map_bh) && relative_block == nblocks) ||
 	    (first_hole != blocks_per_page))
-		args->bio = mpage_bio_submit(REQ_OP_READ, 0, args->bio);
+		args->bio = mpage_bio_submit(REQ_OP_READ, op_flags, args->bio);
 	else
 		args->last_block_in_bio = blocks[blocks_per_page - 1];
 out:
@@ -316,7 +326,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 
 confused:
 	if (args->bio)
-		args->bio = mpage_bio_submit(REQ_OP_READ, 0, args->bio);
+		args->bio = mpage_bio_submit(REQ_OP_READ, op_flags, args->bio);
 	if (!PageUptodate(page))
 	        block_read_full_page(page, args->get_block);
 	else
@@ -374,7 +384,7 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages,
 {
 	struct mpage_readpage_args args = {
 		.get_block = get_block,
-		.gfp = readahead_gfp_mask(mapping),
+		.is_readahead = true,
 	};
 	unsigned page_idx;
 
@@ -385,7 +395,7 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages,
 		list_del(&page->lru);
 		if (!add_to_page_cache_lru(page, mapping,
 					page->index,
-					args.gfp)) {
+					readahead_gfp_mask(mapping))) {
 			args.page = page;
 			args.nr_pages = nr_pages - page_idx;
 			args.bio = do_mpage_readpage(&args);
@@ -394,7 +404,7 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages,
 	}
 	BUG_ON(!list_empty(pages));
 	if (args.bio)
-		mpage_bio_submit(REQ_OP_READ, 0, args.bio);
+		mpage_bio_submit(REQ_OP_READ, REQ_RAHEAD, args.bio);
 	return 0;
 }
 EXPORT_SYMBOL(mpage_readpages);
@@ -408,7 +418,6 @@ int mpage_readpage(struct page *page, get_block_t get_block)
 		.page = page,
 		.nr_pages = 1,
 		.get_block = get_block,
-		.gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL),
 	};
 
 	args.bio = do_mpage_readpage(&args);
-- 
2.7.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/4] btrfs: readpages() should submit IO as read-ahead
  2018-05-30 14:42 [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead Jens Axboe
  2018-05-30 14:42 ` [PATCH 1/4] mpage: add argument structure for do_mpage_readpage() Jens Axboe
  2018-05-30 14:42 ` [PATCH 2/4] mpage: mpage_readpages() should submit IO as read-ahead Jens Axboe
@ 2018-05-30 14:42 ` Jens Axboe
  2018-05-30 14:42 ` [PATCH 4/4] ext4: " Jens Axboe
  2018-06-19 23:56 ` [PATCHSET v3 0/4] Submit ->readpages() " Andrew Morton
  4 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2018-05-30 14:42 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, akpm, Jens Axboe

a_ops->readpages() is only ever used for read-ahead. Ensure that we
pass this information down to the block layer.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e99b329002cf..32034bc1e2b9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3119,7 +3119,7 @@ static inline void __do_contiguous_readpages(struct extent_io_tree *tree,
 
 	for (index = 0; index < nr_pages; index++) {
 		__do_readpage(tree, pages[index], btrfs_get_extent, em_cached,
-				bio, 0, bio_flags, 0, prev_em_start);
+				bio, 0, bio_flags, REQ_RAHEAD, prev_em_start);
 		put_page(pages[index]);
 	}
 }
-- 
2.7.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 4/4] ext4: readpages() should submit IO as read-ahead
  2018-05-30 14:42 [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead Jens Axboe
                   ` (2 preceding siblings ...)
  2018-05-30 14:42 ` [PATCH 3/4] btrfs: readpages() " Jens Axboe
@ 2018-05-30 14:42 ` Jens Axboe
  2018-06-19 23:56 ` [PATCHSET v3 0/4] Submit ->readpages() " Andrew Morton
  4 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2018-05-30 14:42 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, akpm, Jens Axboe

a_ops->readpages() is only ever used for read-ahead. Ensure that we
pass this information down to the block layer.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/ext4/ext4.h     | 2 +-
 fs/ext4/inode.c    | 5 +++--
 fs/ext4/readpage.c | 5 +++--
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a42e71203e53..62df0322b567 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3052,7 +3052,7 @@ static inline void ext4_set_de_type(struct super_block *sb,
 /* readpages.c */
 extern int ext4_mpage_readpages(struct address_space *mapping,
 				struct list_head *pages, struct page *page,
-				unsigned nr_pages);
+				unsigned nr_pages, bool is_readahead);
 
 /* symlink.c */
 extern const struct inode_operations ext4_encrypted_symlink_inode_operations;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1e50c5efae67..2bb50d184bf5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3327,7 +3327,8 @@ static int ext4_readpage(struct file *file, struct page *page)
 		ret = ext4_readpage_inline(inode, page);
 
 	if (ret == -EAGAIN)
-		return ext4_mpage_readpages(page->mapping, NULL, page, 1);
+		return ext4_mpage_readpages(page->mapping, NULL, page, 1,
+						false);
 
 	return ret;
 }
@@ -3342,7 +3343,7 @@ ext4_readpages(struct file *file, struct address_space *mapping,
 	if (ext4_has_inline_data(inode))
 		return 0;
 
-	return ext4_mpage_readpages(mapping, pages, NULL, nr_pages);
+	return ext4_mpage_readpages(mapping, pages, NULL, nr_pages, true);
 }
 
 static void ext4_invalidatepage(struct page *page, unsigned int offset,
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index 9ffa6fad18db..28c857d3b309 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -98,7 +98,7 @@ static void mpage_end_io(struct bio *bio)
 
 int ext4_mpage_readpages(struct address_space *mapping,
 			 struct list_head *pages, struct page *page,
-			 unsigned nr_pages)
+			 unsigned nr_pages, bool is_readahead)
 {
 	struct bio *bio = NULL;
 	sector_t last_block_in_bio = 0;
@@ -259,7 +259,8 @@ int ext4_mpage_readpages(struct address_space *mapping,
 			bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9);
 			bio->bi_end_io = mpage_end_io;
 			bio->bi_private = ctx;
-			bio_set_op_attrs(bio, REQ_OP_READ, 0);
+			bio_set_op_attrs(bio, REQ_OP_READ,
+						is_readahead ? REQ_RAHEAD : 0);
 		}
 
 		length = first_hole << blkbits;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
  2018-05-30 14:42 [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead Jens Axboe
                   ` (3 preceding siblings ...)
  2018-05-30 14:42 ` [PATCH 4/4] ext4: " Jens Axboe
@ 2018-06-19 23:56 ` Andrew Morton
  2018-06-20 14:07   ` Jens Axboe
  4 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2018-06-19 23:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, viro

On Wed, 30 May 2018 08:42:05 -0600 Jens Axboe <axboe@kernel.dk> wrote:

> The only caller of ->readpages() is from read-ahead, yet we don't
> submit IO flagged with REQ_RAHEAD. This means we don't see it in
> blktrace, for instance, which is a shame. We already make assumptions
> about ->readpages() just being for read-ahead in the mpage
> implementation, using readahead_gfp_mask(mapping) as out GFP mask of
> choice.
> 
> This small series fixes up mpage_readpages() to submit with
> REQ_RAHEAD, which takes care of file systems using mpage_readpages().
> The first patch is a prep patch, that makes do_mpage_readpage() take
> an argument structure.

Well gee.  That sounds like a total fluke and I don't see anything
which prevents future code from using readpages for something other
than readahead.

For example, f2fs_mpage_readpages().  Which, being f2fs, has decided to
copy-paste-modify mpage_readpages() and to then use it for
non-readahead purposes.  If that code ever gets its act together then
we have a problem.

A proper approach might be to add a new
address_space_operations.readahead which sets REQ_RAHEAD but after
doing all that, nothing would use address_space_operations.readpages,
which is a bit weird.

And what's happening with the individual ->readpage() calls which
read_pages() does?  Do they still not get REQ_RAHEAD treatment?


This is kinda dirty, but we could add a readahead flag to blk_plug, set
it in read_pages(), test it down in the block layer somewhere....
That's grubby, but at least it wouldn't increase sizeof(task_struct)?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
  2018-06-19 23:56 ` [PATCHSET v3 0/4] Submit ->readpages() " Andrew Morton
@ 2018-06-20 14:07   ` Jens Axboe
  2018-06-20 19:58     ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2018-06-20 14:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, viro

On 6/19/18 5:56 PM, Andrew Morton wrote:
> On Wed, 30 May 2018 08:42:05 -0600 Jens Axboe <axboe@kernel.dk> wrote:
> 
>> The only caller of ->readpages() is from read-ahead, yet we don't
>> submit IO flagged with REQ_RAHEAD. This means we don't see it in
>> blktrace, for instance, which is a shame. We already make assumptions
>> about ->readpages() just being for read-ahead in the mpage
>> implementation, using readahead_gfp_mask(mapping) as out GFP mask of
>> choice.
>>
>> This small series fixes up mpage_readpages() to submit with
>> REQ_RAHEAD, which takes care of file systems using mpage_readpages().
>> The first patch is a prep patch, that makes do_mpage_readpage() take
>> an argument structure.
> 
> Well gee.  That sounds like a total fluke and I don't see anything
> which prevents future code from using readpages for something other
> than readahead.

Nothing prevents it, but it's always been the case. So we need to
embrace the fact one way or another.

> For example, f2fs_mpage_readpages().  Which, being f2fs, has decided to
> copy-paste-modify mpage_readpages() and to then use it for
> non-readahead purposes.  If that code ever gets its act together then
> we have a problem.
> 
> A proper approach might be to add a new
> address_space_operations.readahead which sets REQ_RAHEAD but after
> doing all that, nothing would use address_space_operations.readpages,
> which is a bit weird.

I did consider that, but as you also found out, that would leave us with
a readpages() that's never used. Alternatively, we can just rename it to
make it more explicit.

> And what's happening with the individual ->readpage() calls which
> read_pages() does?  Do they still not get REQ_RAHEAD treatment?

They should, yes.

> This is kinda dirty, but we could add a readahead flag to blk_plug, set
> it in read_pages(), test it down in the block layer somewhere....
> That's grubby, but at least it wouldn't increase sizeof(task_struct)?

That sounds way too dirty for my liking, I'd really not want to mix that
in with plugging.

Why not just treat it like it is - readpages() is clearly just
read-ahead. This isn't something new I'm inventing, it's always been so.
Seems to me that we just need to make it explicit.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
  2018-06-20 14:07   ` Jens Axboe
@ 2018-06-20 19:58     ` Andrew Morton
  2018-06-20 20:15       ` Chris Mason
  2018-06-20 22:28       ` Jens Axboe
  0 siblings, 2 replies; 16+ messages in thread
From: Andrew Morton @ 2018-06-20 19:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, viro

On Wed, 20 Jun 2018 08:07:06 -0600 Jens Axboe <axboe@kernel.dk> wrote:

> On 6/19/18 5:56 PM, Andrew Morton wrote:
> > On Wed, 30 May 2018 08:42:05 -0600 Jens Axboe <axboe@kernel.dk> wrote:
> > 
> >> The only caller of ->readpages() is from read-ahead, yet we don't
> >> submit IO flagged with REQ_RAHEAD. This means we don't see it in
> >> blktrace, for instance, which is a shame. We already make assumptions
> >> about ->readpages() just being for read-ahead in the mpage
> >> implementation, using readahead_gfp_mask(mapping) as out GFP mask of
> >> choice.
> >>
> >> This small series fixes up mpage_readpages() to submit with
> >> REQ_RAHEAD, which takes care of file systems using mpage_readpages().
> >> The first patch is a prep patch, that makes do_mpage_readpage() take
> >> an argument structure.
> > 
> > Well gee.  That sounds like a total fluke and I don't see anything
> > which prevents future code from using readpages for something other
> > than readahead.
> 
> Nothing prevents it, but it's always been the case. So we need to
> embrace the fact one way or another.

All it will take is one simple (and desirable) cleanup to f2fs and this
will all break.  Take that as an indication of the fragility of this
assumption.

> > And what's happening with the individual ->readpage() calls which
> > read_pages() does?  Do they still not get REQ_RAHEAD treatment?
> 
> They should, yes.

hm, how does that happen.

> > This is kinda dirty, but we could add a readahead flag to blk_plug, set
> > it in read_pages(), test it down in the block layer somewhere....
> > That's grubby, but at least it wouldn't increase sizeof(task_struct)?
> 
> That sounds way too dirty for my liking, I'd really not want to mix that
> in with plugging.

That's the wrong way of thinking about it ;) Rename blk_plug to
blk_state or something and view it as a communication package between
the VFS level and the block layer and it all fits together pretty well.

> Why not just treat it like it is - readpages() is clearly just
> read-ahead. This isn't something new I'm inventing, it's always been so.
> Seems to me that we just need to make it explicit.

There's no reason at all why ->readpages() is exclusively for use by
readahead!  I'm rather surprised that fsf2 is the only fs which is
(should be) using ->readpages internally.

Can we change blktrace instead?  Rename REQ_RAHEAD to REQ_READPAGES? 
Then everything will fit together OK.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
  2018-06-20 19:58     ` Andrew Morton
@ 2018-06-20 20:15       ` Chris Mason
  2018-06-20 22:28       ` Jens Axboe
  1 sibling, 0 replies; 16+ messages in thread
From: Chris Mason @ 2018-06-20 20:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jens Axboe, linux-fsdevel, viro

On 20 Jun 2018, at 15:58, Andrew Morton wrote:

> On Wed, 20 Jun 2018 08:07:06 -0600 Jens Axboe <axboe@kernel.dk> wrote:
>
>> Why not just treat it like it is - readpages() is clearly just
>> read-ahead. This isn't something new I'm inventing, it's always been 
>> so.
>> Seems to me that we just need to make it explicit.
>
> There's no reason at all why ->readpages() is exclusively for use by
> readahead!  I'm rather surprised that fsf2 is the only fs which is
> (should be) using ->readpages internally.

Btrfs uses ->readpages internally, but everywhere we do uses the same 
readahead style pattern that sys_read ends up falling into.

The generic helper ends up being exactly what we need, so we call 
page_cache_sync_readahead().  Looks like ext4 does the same.

-chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
  2018-06-20 19:58     ` Andrew Morton
  2018-06-20 20:15       ` Chris Mason
@ 2018-06-20 22:28       ` Jens Axboe
  2018-06-20 23:23         ` Andrew Morton
  2018-06-21 12:21         ` Christoph Hellwig
  1 sibling, 2 replies; 16+ messages in thread
From: Jens Axboe @ 2018-06-20 22:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, viro

On 6/20/18 1:58 PM, Andrew Morton wrote:
> On Wed, 20 Jun 2018 08:07:06 -0600 Jens Axboe <axboe@kernel.dk> wrote:
> 
>> On 6/19/18 5:56 PM, Andrew Morton wrote:
>>> On Wed, 30 May 2018 08:42:05 -0600 Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>>> The only caller of ->readpages() is from read-ahead, yet we don't
>>>> submit IO flagged with REQ_RAHEAD. This means we don't see it in
>>>> blktrace, for instance, which is a shame. We already make assumptions
>>>> about ->readpages() just being for read-ahead in the mpage
>>>> implementation, using readahead_gfp_mask(mapping) as out GFP mask of
>>>> choice.
>>>>
>>>> This small series fixes up mpage_readpages() to submit with
>>>> REQ_RAHEAD, which takes care of file systems using mpage_readpages().
>>>> The first patch is a prep patch, that makes do_mpage_readpage() take
>>>> an argument structure.
>>>
>>> Well gee.  That sounds like a total fluke and I don't see anything
>>> which prevents future code from using readpages for something other
>>> than readahead.
>>
>> Nothing prevents it, but it's always been the case. So we need to
>> embrace the fact one way or another.
> 
> All it will take is one simple (and desirable) cleanup to f2fs and this
> will all break.  Take that as an indication of the fragility of this
> assumption.

The assumption has been true for a decade or more. The only problem with
it is that we have this ->readpages() interface that is only used for
readahead. Hence it should be ->readahead() or similar, that avoid the
fragility argument.

>>> And what's happening with the individual ->readpage() calls which
>>> read_pages() does?  Do they still not get REQ_RAHEAD treatment?
>>
>> They should, yes.
> 
> hm, how does that happen.

Maybe we aren't talking about the same thing. Which individual
->readpage() exactly?

>>> This is kinda dirty, but we could add a readahead flag to blk_plug, set
>>> it in read_pages(), test it down in the block layer somewhere....
>>> That's grubby, but at least it wouldn't increase sizeof(task_struct)?
>>
>> That sounds way too dirty for my liking, I'd really not want to mix that
>> in with plugging.
> 
> That's the wrong way of thinking about it ;) Rename blk_plug to
> blk_state or something and view it as a communication package between
> the VFS level and the block layer and it all fits together pretty well.

We have a perfectly fine established interface for telling the block
layer what kind of IO we're submitting, and that's setting it in the
bio. Introducing a secondary and orthogonal interface for this is a
horrible idea, imho.

>> Why not just treat it like it is - readpages() is clearly just
>> read-ahead. This isn't something new I'm inventing, it's always been so.
>> Seems to me that we just need to make it explicit.
> 
> There's no reason at all why ->readpages() is exclusively for use by
> readahead!  I'm rather surprised that fsf2 is the only fs which is
> (should be) using ->readpages internally.

I totally agree, there's no reason why it is, and everybody is surprised
that that is the case. But the fact of the matter is that that is the
case, and always has been so. So I'll repeat what I said before, we need
to embrace that and make it EXPLICIT instead of side stepping the issue.

> Can we change blktrace instead?  Rename REQ_RAHEAD to REQ_READPAGES? 
> Then everything will fit together OK.

Why do you insist on working around it?! That's just creating another
assumption in another place, not fixing the actual issue.

Besides, this isn't going to be just about tracing. Yes, it'll be
awesome to actually get the right information from blktrace, since right
now nobody knows which parts are read-ahead and which ones are explicit
reads. Might be pretty darn useful to debug read-ahead issues.

The read-ahead information must be reliable, otherwise it's impossible
to introduce other dependencies on top of that. We're having a lot of
issues with termination of tasks that are stuck in issuing read-ahead.
If we can rely on the read-ahead information being there AND being
correct, then we can terminate reads early. This might not sound like
huge win, but we're talking tens of minutes for some cases. That's the
planned functional change behind this, but can't be done before we make
progress on the topic at hand.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
  2018-06-20 22:28       ` Jens Axboe
@ 2018-06-20 23:23         ` Andrew Morton
  2018-06-20 23:33           ` Jens Axboe
  2018-06-21 12:21         ` Christoph Hellwig
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2018-06-20 23:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, viro

On Wed, 20 Jun 2018 16:28:54 -0600 Jens Axboe <axboe@kernel.dk> wrote:

> >> Nothing prevents it, but it's always been the case. So we need to
> >> embrace the fact one way or another.
> > 
> > All it will take is one simple (and desirable) cleanup to f2fs and this
> > will all break.  Take that as an indication of the fragility of this
> > assumption.
> 
> The assumption has been true for a decade or more. The only problem with
> it is that we have this ->readpages() interface that is only used for
> readahead. Hence it should be ->readahead() or similar, that avoid the
> fragility argument.

There's no reason why readpages shouldn't be used for non-readahead
purposes.

> >>> And what's happening with the individual ->readpage() calls which
> >>> read_pages() does?  Do they still not get REQ_RAHEAD treatment?
> >>
> >> They should, yes.
> > 
> > hm, how does that happen.
> 
> Maybe we aren't talking about the same thing. Which individual
> ->readpage() exactly?

I forget, never mind ;)

> >>> This is kinda dirty, but we could add a readahead flag to blk_plug, set
> >>> it in read_pages(), test it down in the block layer somewhere....
> >>> That's grubby, but at least it wouldn't increase sizeof(task_struct)?
> >>
> >> That sounds way too dirty for my liking, I'd really not want to mix that
> >> in with plugging.
> > 
> > That's the wrong way of thinking about it ;) Rename blk_plug to
> > blk_state or something and view it as a communication package between
> > the VFS level and the block layer and it all fits together pretty well.
> 
> We have a perfectly fine established interface for telling the block
> layer what kind of IO we're submitting, and that's setting it in the
> bio. Introducing a secondary and orthogonal interface for this is a
> horrible idea, imho.

->readpages() callers don't have access to the bio(s).

> >> Why not just treat it like it is - readpages() is clearly just
> >> read-ahead. This isn't something new I'm inventing, it's always been so.
> >> Seems to me that we just need to make it explicit.
> > 
> > There's no reason at all why ->readpages() is exclusively for use by
> > readahead!  I'm rather surprised that fsf2 is the only fs which is
> > (should be) using ->readpages internally.
> 
> I totally agree, there's no reason why it is, and everybody is surprised
> that that is the case. But the fact of the matter is that that is the
> case, and always has been so. So I'll repeat what I said before, we need
> to embrace that and make it EXPLICIT instead of side stepping the issue.

My point is that f2fs (for example) *should* be using mpage_readpages()
right now, for its non-readahead purposes.

If we are to remove callers' ability to use readpages for non-readahead
purposes we should rename the address_space field.

> > Can we change blktrace instead?  Rename REQ_RAHEAD to REQ_READPAGES? 
> > Then everything will fit together OK.
> 
> Why do you insist on working around it?! That's just creating another
> assumption in another place, not fixing the actual issue.

All I've been told about is blktrace.

> Besides, this isn't going to be just about tracing. Yes, it'll be
> awesome to actually get the right information from blktrace, since right
> now nobody knows which parts are read-ahead and which ones are explicit
> reads. Might be pretty darn useful to debug read-ahead issues.
> 
> The read-ahead information must be reliable, otherwise it's impossible
> to introduce other dependencies on top of that. We're having a lot of
> issues with termination of tasks that are stuck in issuing read-ahead.
> If we can rely on the read-ahead information being there AND being
> correct, then we can terminate reads early. This might not sound like
> huge win, but we're talking tens of minutes for some cases. That's the
> planned functional change behind this, but can't be done before we make
> progress on the topic at hand.

Changelog material right there.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
  2018-06-20 23:23         ` Andrew Morton
@ 2018-06-20 23:33           ` Jens Axboe
  2018-06-20 23:45             ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2018-06-20 23:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, viro

On 6/20/18 5:23 PM, Andrew Morton wrote:
> On Wed, 20 Jun 2018 16:28:54 -0600 Jens Axboe <axboe@kernel.dk> wrote:
> 
>>>> Nothing prevents it, but it's always been the case. So we need to
>>>> embrace the fact one way or another.
>>>
>>> All it will take is one simple (and desirable) cleanup to f2fs and this
>>> will all break.  Take that as an indication of the fragility of this
>>> assumption.
>>
>> The assumption has been true for a decade or more. The only problem with
>> it is that we have this ->readpages() interface that is only used for
>> readahead. Hence it should be ->readahead() or similar, that avoid the
>> fragility argument.
> 
> There's no reason why readpages shouldn't be used for non-readahead
> purposes.

I feel like we're going in circles here! Yes, I agree, fact is it isn't
and never has been. So either we acknowledge that ->readpages() is is
just read-ahead by nature, or we rename it, or in some other way
recognize that that is the case. See above.

>>>>> This is kinda dirty, but we could add a readahead flag to blk_plug, set
>>>>> it in read_pages(), test it down in the block layer somewhere....
>>>>> That's grubby, but at least it wouldn't increase sizeof(task_struct)?
>>>>
>>>> That sounds way too dirty for my liking, I'd really not want to mix that
>>>> in with plugging.
>>>
>>> That's the wrong way of thinking about it ;) Rename blk_plug to
>>> blk_state or something and view it as a communication package between
>>> the VFS level and the block layer and it all fits together pretty well.
>>
>> We have a perfectly fine established interface for telling the block
>> layer what kind of IO we're submitting, and that's setting it in the
>> bio. Introducing a secondary and orthogonal interface for this is a
>> horrible idea, imho.
> 
> ->readpages() callers don't have access to the bio(s).

It most certainly does, since it's the one submitting the IO. That's how
I'm currently propagating the read vs reada information.

>>>> Why not just treat it like it is - readpages() is clearly just
>>>> read-ahead. This isn't something new I'm inventing, it's always been so.
>>>> Seems to me that we just need to make it explicit.
>>>
>>> There's no reason at all why ->readpages() is exclusively for use by
>>> readahead!  I'm rather surprised that fsf2 is the only fs which is
>>> (should be) using ->readpages internally.
>>
>> I totally agree, there's no reason why it is, and everybody is surprised
>> that that is the case. But the fact of the matter is that that is the
>> case, and always has been so. So I'll repeat what I said before, we need
>> to embrace that and make it EXPLICIT instead of side stepping the issue.
> 
> My point is that f2fs (for example) *should* be using mpage_readpages()
> right now, for its non-readahead purposes.

I do hear that point. But it's somewhat of a fallacy. I'm perfectly
happy adding ->readahead() and having all the rest of the ->readpages()
be that, and then f2fs can use ->readpages(). At the same time, I don't
want to do that if there's no current users. For all I know and care,
f2fs will stay how it is forever. We can't base decision on how to
proceed on whether or not some random file system will decide to do
something else in the future.

> If we are to remove callers' ability to use readpages for non-readahead
> purposes we should rename the address_space field.

Totally agree, and I'd be happy to do that. So how about we do that? I
rename it to ->readahead (or ->readaheadpages()?), and then it's
perfectly clear what is going on. If f2fs wants to use ->readpages() for
non-readhead, then we can add ->readpages(). Honestly, it's
disappointing that we don't have any non-reada ->readpages.

>>> Can we change blktrace instead?  Rename REQ_RAHEAD to REQ_READPAGES? 
>>> Then everything will fit together OK.
>>
>> Why do you insist on working around it?! That's just creating another
>> assumption in another place, not fixing the actual issue.
> 
> All I've been told about is blktrace.

That was my cover.

>> Besides, this isn't going to be just about tracing. Yes, it'll be
>> awesome to actually get the right information from blktrace, since right
>> now nobody knows which parts are read-ahead and which ones are explicit
>> reads. Might be pretty darn useful to debug read-ahead issues.
>>
>> The read-ahead information must be reliable, otherwise it's impossible
>> to introduce other dependencies on top of that. We're having a lot of
>> issues with termination of tasks that are stuck in issuing read-ahead.
>> If we can rely on the read-ahead information being there AND being
>> correct, then we can terminate reads early. This might not sound like
>> huge win, but we're talking tens of minutes for some cases. That's the
>> planned functional change behind this, but can't be done before we make
>> progress on the topic at hand.
> 
> Changelog material right there.

I deliberately didn't want to include that, since it'd muddy the waters
on what is a 100% standalone change.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
  2018-06-20 23:33           ` Jens Axboe
@ 2018-06-20 23:45             ` Andrew Morton
  2018-06-21  0:06               ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2018-06-20 23:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, viro

On Wed, 20 Jun 2018 17:33:47 -0600 Jens Axboe <axboe@kernel.dk> wrote:

> > If we are to remove callers' ability to use readpages for non-readahead
> > purposes we should rename the address_space field.
> 
> Totally agree, and I'd be happy to do that. So how about we do that? I
> rename it to ->readahead (or ->readaheadpages()?), and then it's
> perfectly clear what is going on.

I'm not sure I have the heart to recommend that.

akpm3:/usr/src/linux-4.18-rc1> grep -r readpages .|wc -l
233

Really, every damn one will need an edit.  I'd understand if we left it
at ->readpages and sprinkled some loud comments around the place.

> >> Besides, this isn't going to be just about tracing. Yes, it'll be
> >> awesome to actually get the right information from blktrace, since right
> >> now nobody knows which parts are read-ahead and which ones are explicit
> >> reads. Might be pretty darn useful to debug read-ahead issues.
> >>
> >> The read-ahead information must be reliable, otherwise it's impossible
> >> to introduce other dependencies on top of that. We're having a lot of
> >> issues with termination of tasks that are stuck in issuing read-ahead.
> >> If we can rely on the read-ahead information being there AND being
> >> correct, then we can terminate reads early. This might not sound like
> >> huge win, but we're talking tens of minutes for some cases. That's the
> >> planned functional change behind this, but can't be done before we make
> >> progress on the topic at hand.
> > 
> > Changelog material right there.
> 
> I deliberately didn't want to include that, since it'd muddy the waters
> on what is a 100% standalone change.

Omitting the info muddied the waters!

Shrug, I give up.  "readpages is really only for readahead" can become
another kernel wart, I guess.  Anyone who wants that capability in the
future can sit there looping on ->readpage.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
  2018-06-20 23:45             ` Andrew Morton
@ 2018-06-21  0:06               ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2018-06-21  0:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, viro

On 6/20/18 5:45 PM, Andrew Morton wrote:
> On Wed, 20 Jun 2018 17:33:47 -0600 Jens Axboe <axboe@kernel.dk> wrote:
> 
>>> If we are to remove callers' ability to use readpages for non-readahead
>>> purposes we should rename the address_space field.
>>
>> Totally agree, and I'd be happy to do that. So how about we do that? I
>> rename it to ->readahead (or ->readaheadpages()?), and then it's
>> perfectly clear what is going on.
> 
> I'm not sure I have the heart to recommend that.
> 
> akpm3:/usr/src/linux-4.18-rc1> grep -r readpages .|wc -l
> 233
> 
> Really, every damn one will need an edit.  I'd understand if we left it
> at ->readpages and sprinkled some loud comments around the place.

That would be more convenient... Embrace and extend, I'll resend the
series with some comments added that'll hopefully get the point across.

>>>> Besides, this isn't going to be just about tracing. Yes, it'll be
>>>> awesome to actually get the right information from blktrace, since right
>>>> now nobody knows which parts are read-ahead and which ones are explicit
>>>> reads. Might be pretty darn useful to debug read-ahead issues.
>>>>
>>>> The read-ahead information must be reliable, otherwise it's impossible
>>>> to introduce other dependencies on top of that. We're having a lot of
>>>> issues with termination of tasks that are stuck in issuing read-ahead.
>>>> If we can rely on the read-ahead information being there AND being
>>>> correct, then we can terminate reads early. This might not sound like
>>>> huge win, but we're talking tens of minutes for some cases. That's the
>>>> planned functional change behind this, but can't be done before we make
>>>> progress on the topic at hand.
>>>
>>> Changelog material right there.
>>
>> I deliberately didn't want to include that, since it'd muddy the waters
>> on what is a 100% standalone change.
> 
> Omitting the info muddied the waters!

I claim no ill intent!

> Shrug, I give up.  "readpages is really only for readahead" can become
> another kernel wart, I guess.  Anyone who wants that capability in the
> future can sit there looping on ->readpage.

Or do the leg work, which is a rename and adding ->readpages(). As long
as folks don't get too confused, I think we're fine commandeering
->readpages() for read-ahead now.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
  2018-06-20 22:28       ` Jens Axboe
  2018-06-20 23:23         ` Andrew Morton
@ 2018-06-21 12:21         ` Christoph Hellwig
  2018-06-21 13:52           ` Jens Axboe
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2018-06-21 12:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-fsdevel, viro

On Wed, Jun 20, 2018 at 04:28:54PM -0600, Jens Axboe wrote:
> The assumption has been true for a decade or more. The only problem with
> it is that we have this ->readpages() interface that is only used for
> readahead. Hence it should be ->readahead() or similar, that avoid the
> fragility argument.

Or just stop this whole bike shed.  ->readpages doesn't keep the pages
it reads locked, so it fundamentally is a readahead style operation.

Now if we actually want do to anything with that information is another
question that I already asked in reply to the first round of these
patches.

Especially as e.g. nvme actually sets hints based on the flag, that
if hardware actually ever looked at them might be doing the wrong
thing (fortunately for us I doubt any nvme hardware actually looks
at that flag).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
  2018-06-21 12:21         ` Christoph Hellwig
@ 2018-06-21 13:52           ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2018-06-21 13:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrew Morton, linux-fsdevel, viro

On 6/21/18 6:21 AM, Christoph Hellwig wrote:
> On Wed, Jun 20, 2018 at 04:28:54PM -0600, Jens Axboe wrote:
>> The assumption has been true for a decade or more. The only problem with
>> it is that we have this ->readpages() interface that is only used for
>> readahead. Hence it should be ->readahead() or similar, that avoid the
>> fragility argument.
> 
> Or just stop this whole bike shed.  ->readpages doesn't keep the pages
> it reads locked, so it fundamentally is a readahead style operation.
> 
> Now if we actually want do to anything with that information is another
> question that I already asked in reply to the first round of these
> patches.
> 
> Especially as e.g. nvme actually sets hints based on the flag, that
> if hardware actually ever looked at them might be doing the wrong
> thing (fortunately for us I doubt any nvme hardware actually looks
> at that flag).

In most of these cases, these types of hints don't translate uniformly
across devices. So whether we pass them on to the actual hardware or
not is a different matter. For now, as mentioned, all I care about is
getting the tracing working properly, and being able to functionally
differentiate between a read and a read-ahead read in the block layer.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-06-21 13:52 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-30 14:42 [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead Jens Axboe
2018-05-30 14:42 ` [PATCH 1/4] mpage: add argument structure for do_mpage_readpage() Jens Axboe
2018-05-30 14:42 ` [PATCH 2/4] mpage: mpage_readpages() should submit IO as read-ahead Jens Axboe
2018-05-30 14:42 ` [PATCH 3/4] btrfs: readpages() " Jens Axboe
2018-05-30 14:42 ` [PATCH 4/4] ext4: " Jens Axboe
2018-06-19 23:56 ` [PATCHSET v3 0/4] Submit ->readpages() " Andrew Morton
2018-06-20 14:07   ` Jens Axboe
2018-06-20 19:58     ` Andrew Morton
2018-06-20 20:15       ` Chris Mason
2018-06-20 22:28       ` Jens Axboe
2018-06-20 23:23         ` Andrew Morton
2018-06-20 23:33           ` Jens Axboe
2018-06-20 23:45             ` Andrew Morton
2018-06-21  0:06               ` Jens Axboe
2018-06-21 12:21         ` Christoph Hellwig
2018-06-21 13:52           ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).