linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/2] generic_file_buffered_read() refactoring & optimization
@ 2020-06-10  0:10 Kent Overstreet
  2020-06-10  0:10 ` [PATCH 1/2] fs: Break generic_file_buffered_read up into multiple functions Kent Overstreet
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Kent Overstreet @ 2020-06-10  0:10 UTC (permalink / raw)
  To: linux-kernel, akpm, viro, linux-mm, linux-fsdevel; +Cc: Kent Overstreet

This is a small patch series that's been in the bcachefs tree for a while.

In the buffered read path, we look up a page in the page cache, then copy from
that page in a loop - i.e. we interleave the data copies with each individual
page lookup. When we're doing large reads from the page cache, that adds up to
some pretty major overhead.

This just reworks generic_file_buffered_read() to use find_get_pages_contig()
and work on an array of pages. It's a pretty significant performance
improvement for large buffered reads, and doesn't regress performance on single
page reads.

As a bonus, generic_file_buffered_read() gets broken up into multiple functions
that are _somewhat_ easier to follow.
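
In outline, the control flow change looks like this (a rough sketch, not the
actual diff):

	/* before: lookups and copies interleaved, one page at a time */
	for (;;) {
		page = find_get_page(mapping, index);
		/* readahead and !PageUptodate handling ... */
		copy_page_to_iter(page, offset, nr, iter);
		put_page(page);
	}

	/* after: batch the lookups up front, then run a tight copy loop
	 * that doesn't touch any other, potentially contended, cachelines */
	do {
		nr = find_get_pages_contig(mapping, index, ARRAY_SIZE(pages), pages);
		/* readahead and !PageUptodate handling for the whole batch ... */
		for (i = 0; i < nr; i++)
			copy_page_to_iter(pages[i], offset, bytes, iter);
		for (i = 0; i < nr; i++)
			put_page(pages[i]);
	} while (iov_iter_count(iter));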

Kent Overstreet (2):
  fs: Break generic_file_buffered_read up into multiple functions
  fs: generic_file_buffered_read() now uses find_get_pages_contig

 mm/filemap.c | 486 +++++++++++++++++++++++++++++----------------------
 1 file changed, 273 insertions(+), 213 deletions(-)

-- 
2.27.0


* [PATCH 1/2] fs: Break generic_file_buffered_read up into multiple functions
  2020-06-10  0:10 [PATCH 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
@ 2020-06-10  0:10 ` Kent Overstreet
  2020-06-10  0:10 ` [PATCH 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2020-06-10  0:10 UTC (permalink / raw)
  To: linux-kernel, akpm, viro, linux-mm, linux-fsdevel; +Cc: Kent Overstreet

This is prep work for changing generic_file_buffered_read() to use
find_get_pages_contig() to batch up all the pagecache lookups.

This patch should be functionally identical to the existing code and
changes as little of the flow control as possible. More refactoring
could be done, but this patch is intended to be relatively minimal.
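
For orientation, roughly how the helpers introduced below relate:

	/*
	 * generic_file_buffered_read()
	 *   -> generic_file_buffered_read_no_cached_page()    alloc + add_to_page_cache_lru
	 *        -> generic_file_buffered_read_readpage()     ->readpage(), wait until uptodate
	 *   -> generic_file_buffered_read_pagenotuptodate()   lock, truncate/partially-uptodate checks
	 *        -> generic_file_buffered_read_readpage()
	 *   -> generic_file_buffered_read_page_ok()           copy one uptodate page to the iov_iter
	 */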

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 mm/filemap.c | 418 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 228 insertions(+), 190 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index e67fa8ab48..206d51a1c9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2051,6 +2051,210 @@ static void shrink_readahead_size_eio(struct file_ra_state *ra)
 	ra->ra_pages /= 4;
 }
 
+static int generic_file_buffered_read_page_ok(struct kiocb *iocb,
+			struct iov_iter *iter,
+			struct page *page)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+	struct inode *inode = mapping->host;
+	struct file_ra_state *ra = &iocb->ki_filp->f_ra;
+	unsigned offset = iocb->ki_pos & ~PAGE_MASK;
+	unsigned bytes, copied;
+	loff_t isize, end_offset;
+
+	BUG_ON(iocb->ki_pos >> PAGE_SHIFT != page->index);
+
+	/*
+	 * i_size must be checked after we know the page is Uptodate.
+	 *
+	 * Checking i_size after the check allows us to calculate
+	 * the correct value for "bytes", which means the zero-filled
+	 * part of the page is not copied back to userspace (unless
+	 * another truncate extends the file - this is desired though).
+	 */
+
+	isize = i_size_read(inode);
+	if (unlikely(iocb->ki_pos >= isize))
+		return 1;
+
+	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
+
+	bytes = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
+
+	/* If users can be writing to this page using arbitrary
+	 * virtual addresses, take care about potential aliasing
+	 * before reading the page on the kernel side.
+	 */
+	if (mapping_writably_mapped(mapping))
+		flush_dcache_page(page);
+
+	/*
+	 * Ok, we have the page, and it's up-to-date, so
+	 * now we can copy it to user space...
+	 */
+
+	copied = copy_page_to_iter(page, offset, bytes, iter);
+
+	iocb->ki_pos += copied;
+
+	/*
+	 * When a sequential read accesses a page several times,
+	 * only mark it as accessed the first time.
+	 */
+	if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
+		mark_page_accessed(page);
+
+	ra->prev_pos = iocb->ki_pos;
+
+	if (copied < bytes)
+		return -EFAULT;
+
+	return !iov_iter_count(iter) || iocb->ki_pos == isize;
+}
+
+static struct page *
+generic_file_buffered_read_readpage(struct file *filp,
+				    struct address_space *mapping,
+				    struct page *page)
+{
+	struct file_ra_state *ra = &filp->f_ra;
+	int error;
+
+	/*
+	 * A previous I/O error may have been due to temporary
+	 * failures, eg. multipath errors.
+	 * PG_error will be set again if readpage fails.
+	 */
+	ClearPageError(page);
+	/* Start the actual read. The read will unlock the page. */
+	error = mapping->a_ops->readpage(filp, page);
+
+	if (unlikely(error)) {
+		put_page(page);
+		return error != AOP_TRUNCATED_PAGE ? ERR_PTR(error) : NULL;
+	}
+
+	if (!PageUptodate(page)) {
+		error = lock_page_killable(page);
+		if (unlikely(error)) {
+			put_page(page);
+			return ERR_PTR(error);
+		}
+		if (!PageUptodate(page)) {
+			if (page->mapping == NULL) {
+				/*
+				 * invalidate_mapping_pages got it
+				 */
+				unlock_page(page);
+				put_page(page);
+				return NULL;
+			}
+			unlock_page(page);
+			shrink_readahead_size_eio(ra);
+			put_page(page);
+			return ERR_PTR(-EIO);
+		}
+		unlock_page(page);
+	}
+
+	return page;
+}
+
+static struct page *
+generic_file_buffered_read_pagenotuptodate(struct file *filp,
+					   struct iov_iter *iter,
+					   struct page *page,
+					   loff_t pos, loff_t count)
+{
+	struct address_space *mapping = filp->f_mapping;
+	struct inode *inode = mapping->host;
+	int error;
+
+	/*
+	 * See comment in do_read_cache_page on why
+	 * wait_on_page_locked is used to avoid unnecessarily
+	 * serialisations and why it's safe.
+	 */
+	error = wait_on_page_locked_killable(page);
+	if (unlikely(error)) {
+		put_page(page);
+		return ERR_PTR(error);
+	}
+
+	if (PageUptodate(page))
+		return page;
+
+	if (inode->i_blkbits == PAGE_SHIFT ||
+	    !mapping->a_ops->is_partially_uptodate)
+		goto page_not_up_to_date;
+	/* pipes can't handle partially uptodate pages */
+	if (unlikely(iov_iter_is_pipe(iter)))
+		goto page_not_up_to_date;
+	if (!trylock_page(page))
+		goto page_not_up_to_date;
+	/* Did it get truncated before we got the lock? */
+	if (!page->mapping)
+		goto page_not_up_to_date_locked;
+
+	if (!mapping->a_ops->is_partially_uptodate(page,
+				pos & ~PAGE_MASK, count))
+		goto page_not_up_to_date_locked;
+	unlock_page(page);
+	return page;
+
+page_not_up_to_date:
+	/* Get exclusive access to the page ... */
+	error = lock_page_killable(page);
+	if (unlikely(error)) {
+		put_page(page);
+		return ERR_PTR(error);
+	}
+
+page_not_up_to_date_locked:
+	/* Did it get truncated before we got the lock? */
+	if (!page->mapping) {
+		unlock_page(page);
+		put_page(page);
+		return NULL;
+	}
+
+	/* Did somebody else fill it already? */
+	if (PageUptodate(page)) {
+		unlock_page(page);
+		return page;
+	}
+
+	return generic_file_buffered_read_readpage(filp, mapping, page);
+}
+
+static struct page *
+generic_file_buffered_read_no_cached_page(struct kiocb *iocb,
+					  struct iov_iter *iter)
+{
+	struct file *filp = iocb->ki_filp;
+	struct address_space *mapping = filp->f_mapping;
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
+	struct page *page;
+	int error;
+
+	/*
+	 * Ok, it wasn't cached, so we need to create a new
+	 * page..
+	 */
+	page = page_cache_alloc(mapping);
+	if (!page)
+		return ERR_PTR(-ENOMEM);
+
+	error = add_to_page_cache_lru(page, mapping, index,
+				      mapping_gfp_constraint(mapping, GFP_KERNEL));
+	if (error) {
+		put_page(page);
+		return error != -EEXIST ? ERR_PTR(error) : NULL;
+	}
+
+	return generic_file_buffered_read_readpage(filp, mapping, page);
+}
+
 /**
  * generic_file_buffered_read - generic file read routine
  * @iocb:	the iocb to read
@@ -2074,29 +2278,19 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 	struct address_space *mapping = filp->f_mapping;
 	struct inode *inode = mapping->host;
 	struct file_ra_state *ra = &filp->f_ra;
-	loff_t *ppos = &iocb->ki_pos;
-	pgoff_t index;
+	size_t orig_count = iov_iter_count(iter);
 	pgoff_t last_index;
-	pgoff_t prev_index;
-	unsigned long offset;      /* offset into pagecache page */
-	unsigned int prev_offset;
 	int error = 0;
 
-	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
+	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
 		return 0;
 	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
 
-	index = *ppos >> PAGE_SHIFT;
-	prev_index = ra->prev_pos >> PAGE_SHIFT;
-	prev_offset = ra->prev_pos & (PAGE_SIZE-1);
-	last_index = (*ppos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
-	offset = *ppos & ~PAGE_MASK;
+	last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
 
 	for (;;) {
+		pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
 		struct page *page;
-		pgoff_t end_index;
-		loff_t isize;
-		unsigned long nr, ret;
 
 		cond_resched();
 find_page:
@@ -2113,8 +2307,15 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 					ra, filp,
 					index, last_index - index);
 			page = find_get_page(mapping, index);
-			if (unlikely(page == NULL))
-				goto no_cached_page;
+			if (unlikely(page == NULL)) {
+				page = generic_file_buffered_read_no_cached_page(iocb, iter);
+				if (!page)
+					goto find_page;
+				if (IS_ERR(page)) {
+					error = PTR_ERR(page);
+					goto out;
+				}
+			}
 		}
 		if (PageReadahead(page)) {
 			page_cache_async_readahead(mapping,
@@ -2124,199 +2325,36 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		if (!PageUptodate(page)) {
 			if (iocb->ki_flags & IOCB_NOWAIT) {
 				put_page(page);
-				goto would_block;
-			}
-
-			/*
-			 * See comment in do_read_cache_page on why
-			 * wait_on_page_locked is used to avoid unnecessarily
-			 * serialisations and why it's safe.
-			 */
-			error = wait_on_page_locked_killable(page);
-			if (unlikely(error))
-				goto readpage_error;
-			if (PageUptodate(page))
-				goto page_ok;
-
-			if (inode->i_blkbits == PAGE_SHIFT ||
-					!mapping->a_ops->is_partially_uptodate)
-				goto page_not_up_to_date;
-			/* pipes can't handle partially uptodate pages */
-			if (unlikely(iov_iter_is_pipe(iter)))
-				goto page_not_up_to_date;
-			if (!trylock_page(page))
-				goto page_not_up_to_date;
-			/* Did it get truncated before we got the lock? */
-			if (!page->mapping)
-				goto page_not_up_to_date_locked;
-			if (!mapping->a_ops->is_partially_uptodate(page,
-							offset, iter->count))
-				goto page_not_up_to_date_locked;
-			unlock_page(page);
-		}
-page_ok:
-		/*
-		 * i_size must be checked after we know the page is Uptodate.
-		 *
-		 * Checking i_size after the check allows us to calculate
-		 * the correct value for "nr", which means the zero-filled
-		 * part of the page is not copied back to userspace (unless
-		 * another truncate extends the file - this is desired though).
-		 */
-
-		isize = i_size_read(inode);
-		end_index = (isize - 1) >> PAGE_SHIFT;
-		if (unlikely(!isize || index > end_index)) {
-			put_page(page);
-			goto out;
-		}
-
-		/* nr is the maximum number of bytes to copy from this page */
-		nr = PAGE_SIZE;
-		if (index == end_index) {
-			nr = ((isize - 1) & ~PAGE_MASK) + 1;
-			if (nr <= offset) {
-				put_page(page);
+				error = -EAGAIN;
 				goto out;
 			}
-		}
-		nr = nr - offset;
-
-		/* If users can be writing to this page using arbitrary
-		 * virtual addresses, take care about potential aliasing
-		 * before reading the page on the kernel side.
-		 */
-		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
-
-		/*
-		 * When a sequential read accesses a page several times,
-		 * only mark it as accessed the first time.
-		 */
-		if (prev_index != index || offset != prev_offset)
-			mark_page_accessed(page);
-		prev_index = index;
-
-		/*
-		 * Ok, we have the page, and it's up-to-date, so
-		 * now we can copy it to user space...
-		 */
-
-		ret = copy_page_to_iter(page, offset, nr, iter);
-		offset += ret;
-		index += offset >> PAGE_SHIFT;
-		offset &= ~PAGE_MASK;
-		prev_offset = offset;
 
-		put_page(page);
-		written += ret;
-		if (!iov_iter_count(iter))
-			goto out;
-		if (ret < nr) {
-			error = -EFAULT;
-			goto out;
-		}
-		continue;
-
-page_not_up_to_date:
-		/* Get exclusive access to the page ... */
-		error = lock_page_killable(page);
-		if (unlikely(error))
-			goto readpage_error;
-
-page_not_up_to_date_locked:
-		/* Did it get truncated before we got the lock? */
-		if (!page->mapping) {
-			unlock_page(page);
-			put_page(page);
-			continue;
-		}
-
-		/* Did somebody else fill it already? */
-		if (PageUptodate(page)) {
-			unlock_page(page);
-			goto page_ok;
-		}
-
-readpage:
-		/*
-		 * A previous I/O error may have been due to temporary
-		 * failures, eg. multipath errors.
-		 * PG_error will be set again if readpage fails.
-		 */
-		ClearPageError(page);
-		/* Start the actual read. The read will unlock the page. */
-		error = mapping->a_ops->readpage(filp, page);
-
-		if (unlikely(error)) {
-			if (error == AOP_TRUNCATED_PAGE) {
-				put_page(page);
-				error = 0;
+			page = generic_file_buffered_read_pagenotuptodate(filp,
+					iter, page, iocb->ki_pos, iter->count);
+			if (!page)
 				goto find_page;
+			if (IS_ERR(page)) {
+				error = PTR_ERR(page);
+				goto out;
 			}
-			goto readpage_error;
-		}
-
-		if (!PageUptodate(page)) {
-			error = lock_page_killable(page);
-			if (unlikely(error))
-				goto readpage_error;
-			if (!PageUptodate(page)) {
-				if (page->mapping == NULL) {
-					/*
-					 * invalidate_mapping_pages got it
-					 */
-					unlock_page(page);
-					put_page(page);
-					goto find_page;
-				}
-				unlock_page(page);
-				shrink_readahead_size_eio(ra);
-				error = -EIO;
-				goto readpage_error;
-			}
-			unlock_page(page);
 		}
 
-		goto page_ok;
-
-readpage_error:
-		/* UHHUH! A synchronous read error occurred. Report it */
+		error = generic_file_buffered_read_page_ok(iocb, iter, page);
 		put_page(page);
-		goto out;
 
-no_cached_page:
-		/*
-		 * Ok, it wasn't cached, so we need to create a new
-		 * page..
-		 */
-		page = page_cache_alloc(mapping);
-		if (!page) {
-			error = -ENOMEM;
-			goto out;
-		}
-		error = add_to_page_cache_lru(page, mapping, index,
-				mapping_gfp_constraint(mapping, GFP_KERNEL));
 		if (error) {
-			put_page(page);
-			if (error == -EEXIST) {
+			if (error > 0)
 				error = 0;
-				goto find_page;
-			}
 			goto out;
 		}
-		goto readpage;
 	}
 
 would_block:
 	error = -EAGAIN;
 out:
-	ra->prev_pos = prev_index;
-	ra->prev_pos <<= PAGE_SHIFT;
-	ra->prev_pos |= prev_offset;
-
-	*ppos = ((loff_t)index << PAGE_SHIFT) + offset;
 	file_accessed(filp);
+	written += orig_count - iov_iter_count(iter);
+
 	return written ? written : error;
 }
 
-- 
2.27.0


* [PATCH 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig
  2020-06-10  0:10 [PATCH 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
  2020-06-10  0:10 ` [PATCH 1/2] fs: Break generic_file_buffered_read up into multiple functions Kent Overstreet
@ 2020-06-10  0:10 ` Kent Overstreet
  2020-06-10  0:47   ` Matthew Wilcox
  2020-06-10  1:38   ` Matthew Wilcox
  2020-06-10  1:36 ` [PATCH v2 1/2] fs: Break generic_file_buffered_read up into multiple functions Kent Overstreet
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Kent Overstreet @ 2020-06-10  0:10 UTC (permalink / raw)
  To: linux-kernel, akpm, viro, linux-mm, linux-fsdevel; +Cc: Kent Overstreet

Convert generic_file_buffered_read() to get pages to read from in
batches, and then copy data to userspace from many pages at once - in
particular, we now don't touch any cachelines that might be contended
while we're in the loop to copy data to userspace.

This is a performance improvement on workloads that do buffered reads
with large blocksizes, and a very large performance improvement if that
file is also being accessed concurrently by different threads.

On smaller reads (512 bytes), there's a very small performance
improvement (1%, within the margin of error).
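
The kind of workload this targets, as a minimal userspace sketch (the sizes
are illustrative, not the benchmark used here):

	#include <fcntl.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		size_t bufsize = 1 << 20;	/* "large blocksize": 1MB reads */
		char *buf = malloc(bufsize);
		ssize_t r;
		int fd;

		if (argc < 2 || !buf)
			return 1;
		fd = open(argv[1], O_RDONLY);	/* buffered - no O_DIRECT */
		if (fd < 0)
			return 1;
		/* each read() here ends up in generic_file_buffered_read() */
		while ((r = read(fd, buf, bufsize)) > 0)
			;
		return r < 0;
	}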

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 mm/filemap.c | 266 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 144 insertions(+), 122 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 206d51a1c9..0d1836081c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2051,67 +2051,6 @@ static void shrink_readahead_size_eio(struct file_ra_state *ra)
 	ra->ra_pages /= 4;
 }
 
-static int generic_file_buffered_read_page_ok(struct kiocb *iocb,
-			struct iov_iter *iter,
-			struct page *page)
-{
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &iocb->ki_filp->f_ra;
-	unsigned offset = iocb->ki_pos & ~PAGE_MASK;
-	unsigned bytes, copied;
-	loff_t isize, end_offset;
-
-	BUG_ON(iocb->ki_pos >> PAGE_SHIFT != page->index);
-
-	/*
-	 * i_size must be checked after we know the page is Uptodate.
-	 *
-	 * Checking i_size after the check allows us to calculate
-	 * the correct value for "bytes", which means the zero-filled
-	 * part of the page is not copied back to userspace (unless
-	 * another truncate extends the file - this is desired though).
-	 */
-
-	isize = i_size_read(inode);
-	if (unlikely(iocb->ki_pos >= isize))
-		return 1;
-
-	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
-
-	bytes = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
-
-	/* If users can be writing to this page using arbitrary
-	 * virtual addresses, take care about potential aliasing
-	 * before reading the page on the kernel side.
-	 */
-	if (mapping_writably_mapped(mapping))
-		flush_dcache_page(page);
-
-	/*
-	 * Ok, we have the page, and it's up-to-date, so
-	 * now we can copy it to user space...
-	 */
-
-	copied = copy_page_to_iter(page, offset, bytes, iter);
-
-	iocb->ki_pos += copied;
-
-	/*
-	 * When a sequential read accesses a page several times,
-	 * only mark it as accessed the first time.
-	 */
-	if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
-		mark_page_accessed(page);
-
-	ra->prev_pos = iocb->ki_pos;
-
-	if (copied < bytes)
-		return -EFAULT;
-
-	return !iov_iter_count(iter) || iocb->ki_pos == isize;
-}
-
 static struct page *
 generic_file_buffered_read_readpage(struct file *filp,
 				    struct address_space *mapping,
@@ -2255,6 +2194,79 @@ generic_file_buffered_read_no_cached_page(struct kiocb *iocb,
 	return generic_file_buffered_read_readpage(filp, mapping, page);
 }
 
+static int generic_file_buffered_read_get_pages(struct kiocb *iocb,
+						struct iov_iter *iter,
+						struct page **pages,
+						unsigned nr)
+{
+	struct file *filp = iocb->ki_filp;
+	struct address_space *mapping = filp->f_mapping;
+	struct file_ra_state *ra = &filp->f_ra;
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
+	pgoff_t last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
+	int i, j, ret, err = 0;
+
+	nr = min_t(unsigned long, last_index - index, nr);
+find_page:
+	if (fatal_signal_pending(current))
+		return -EINTR;
+
+	ret = find_get_pages_contig(mapping, index, nr, pages);
+	if (ret)
+		goto got_pages;
+
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		return -EAGAIN;
+
+	page_cache_sync_readahead(mapping, ra, filp, index, last_index - index);
+
+	ret = find_get_pages_contig(mapping, index, nr, pages);
+	if (ret)
+		goto got_pages;
+
+	pages[0] = generic_file_buffered_read_no_cached_page(iocb, iter);
+	err = PTR_ERR_OR_ZERO(pages[0]);
+	ret = !IS_ERR_OR_NULL(pages[0]);
+got_pages:
+	for (i = 0; i < ret; i++) {
+		struct page *page = pages[i];
+		pgoff_t pg_index = index +i;
+		loff_t pg_pos = max(iocb->ki_pos,
+				    (loff_t) pg_index << PAGE_SHIFT);
+		loff_t pg_count = iocb->ki_pos + iter->count - pg_pos;
+
+		if (PageReadahead(page))
+			page_cache_async_readahead(mapping, ra, filp, page,
+					pg_index, last_index - pg_index);
+
+		if (!PageUptodate(page)) {
+			if (iocb->ki_flags & IOCB_NOWAIT) {
+				for (j = i; j < ret; j++)
+					put_page(pages[j]);
+				ret = i;
+				err = -EAGAIN;
+				break;
+			}
+
+			page = generic_file_buffered_read_pagenotuptodate(filp,
+						iter, page, pg_pos, pg_count);
+			if (IS_ERR_OR_NULL(page)) {
+				for (j = i + 1; j < ret; j++)
+					put_page(pages[j]);
+				ret = i;
+				err = PTR_ERR_OR_ZERO(page);
+				break;
+			}
+		}
+	}
+
+	if (likely(ret))
+		return ret;
+	if (err)
+		return err;
+	goto find_page;
+}
+
 /**
  * generic_file_buffered_read - generic file read routine
  * @iocb:	the iocb to read
@@ -2275,83 +2287,93 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		struct iov_iter *iter, ssize_t written)
 {
 	struct file *filp = iocb->ki_filp;
+	struct file_ra_state *ra = &filp->f_ra;
 	struct address_space *mapping = filp->f_mapping;
 	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &filp->f_ra;
 	size_t orig_count = iov_iter_count(iter);
-	pgoff_t last_index;
-	int error = 0;
+	struct page *pages[64];
+	int i, pg_nr, error = 0;
+	bool writably_mapped;
+	loff_t isize, end_offset;
 
 	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
 		return 0;
 	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
 
-	last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
-
-	for (;;) {
-		pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
-		struct page *page;
-
+	do {
 		cond_resched();
-find_page:
-		if (fatal_signal_pending(current)) {
-			error = -EINTR;
-			goto out;
-		}
 
-		page = find_get_page(mapping, index);
-		if (!page) {
-			if (iocb->ki_flags & IOCB_NOWAIT)
-				goto would_block;
-			page_cache_sync_readahead(mapping,
-					ra, filp,
-					index, last_index - index);
-			page = find_get_page(mapping, index);
-			if (unlikely(page == NULL)) {
-				page = generic_file_buffered_read_no_cached_page(iocb, iter);
-				if (!page)
-					goto find_page;
-				if (IS_ERR(page)) {
-					error = PTR_ERR(page);
-					goto out;
-				}
-			}
-		}
-		if (PageReadahead(page)) {
-			page_cache_async_readahead(mapping,
-					ra, filp, page,
-					index, last_index - index);
+		i = 0;
+		pg_nr = generic_file_buffered_read_get_pages(iocb, iter, pages,
+							     ARRAY_SIZE(pages));
+		if (pg_nr < 0) {
+			error = pg_nr;
+			break;
 		}
-		if (!PageUptodate(page)) {
-			if (iocb->ki_flags & IOCB_NOWAIT) {
-				put_page(page);
-				error = -EAGAIN;
-				goto out;
-			}
 
-			page = generic_file_buffered_read_pagenotuptodate(filp,
-					iter, page, iocb->ki_pos, iter->count);
-			if (!page)
-				goto find_page;
-			if (IS_ERR(page)) {
-				error = PTR_ERR(page);
-				goto out;
-			}
-		}
+		/*
+		 * i_size must be checked after we know the pages are Uptodate.
+		 *
+		 * Checking i_size after the check allows us to calculate
+		 * the correct value for "nr", which means the zero-filled
+		 * part of the page is not copied back to userspace (unless
+		 * another truncate extends the file - this is desired though).
+		 */
+		isize = i_size_read(inode);
+		if (unlikely(iocb->ki_pos >= isize))
+			goto put_pages;
 
-		error = generic_file_buffered_read_page_ok(iocb, iter, page);
-		put_page(page);
+		end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
 
-		if (error) {
-			if (error > 0)
-				error = 0;
-			goto out;
+		while ((iocb->ki_pos >> PAGE_SHIFT) + pg_nr >
+		       (end_offset + PAGE_SIZE - 1) >> PAGE_SHIFT)
+			put_page(pages[--pg_nr]);
+
+		/*
+		 * Once we start copying data, we don't want to be touching any
+		 * cachelines that might be contended:
+		 */
+		writably_mapped = mapping_writably_mapped(mapping);
+
+		/*
+		 * When a sequential read accesses a page several times, only
+		 * mark it as accessed the first time.
+		 */
+		if (iocb->ki_pos >> PAGE_SHIFT !=
+		    ra->prev_pos >> PAGE_SHIFT)
+			mark_page_accessed(pages[0]);
+		for (i = 1; i < pg_nr; i++)
+			mark_page_accessed(pages[i]);
+
+		for (i = 0; i < pg_nr; i++) {
+			unsigned offset = iocb->ki_pos & ~PAGE_MASK;
+			unsigned bytes = min_t(loff_t, end_offset - iocb->ki_pos,
+					       PAGE_SIZE - offset);
+			unsigned copied;
+
+			/*
+			 * If users can be writing to this page using arbitrary
+			 * virtual addresses, take care about potential aliasing
+			 * before reading the page on the kernel side.
+			 */
+			if (writably_mapped)
+				flush_dcache_page(pages[i]);
+
+			copied = copy_page_to_iter(pages[i], offset, bytes, iter);
+
+			iocb->ki_pos += copied;
+			ra->prev_pos = iocb->ki_pos;
+
+			if (copied < bytes) {
+				error = -EFAULT;
+				break;
+			}
 		}
-	}
+put_pages:
+		for (i = 0; i < pg_nr; i++)
+			put_page(pages[i]);
+	} while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);
 
-would_block:
-	error = -EAGAIN;
-out:
 	file_accessed(filp);
 	written += orig_count - iov_iter_count(iter);
 
-- 
2.27.0


* Re: [PATCH 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig
  2020-06-10  0:10 ` [PATCH 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
@ 2020-06-10  0:47   ` Matthew Wilcox
  2020-06-10  1:08     ` Kent Overstreet
  2020-06-10  1:38   ` Matthew Wilcox
  1 sibling, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2020-06-10  0:47 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-kernel, akpm, viro, linux-mm, linux-fsdevel

On Tue, Jun 09, 2020 at 08:10:36PM -0400, Kent Overstreet wrote:
> @@ -2275,83 +2287,93 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
>  		struct iov_iter *iter, ssize_t written)
>  {
>  	struct file *filp = iocb->ki_filp;
> +	struct file_ra_state *ra = &filp->f_ra;
>  	struct address_space *mapping = filp->f_mapping;
>  	struct inode *inode = mapping->host;
> -	struct file_ra_state *ra = &filp->f_ra;
>  	size_t orig_count = iov_iter_count(iter);
> -	pgoff_t last_index;
> -	int error = 0;
> +	struct page *pages[64];

That's 512 bytes which seems like a lot of stack space.  Would 16 be
enough to see a significant fraction of the benefit?
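
(64 pointers at 8 bytes each on a 64-bit build; a trivial userspace check:)

	#include <stdio.h>

	int main(void)
	{
		printf("%zu\n", 64 * sizeof(void *));	/* 512 bytes */
		printf("%zu\n", 16 * sizeof(void *));	/* 128 bytes */
		return 0;
	}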

* Re: [PATCH 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig
  2020-06-10  0:47   ` Matthew Wilcox
@ 2020-06-10  1:08     ` Kent Overstreet
  0 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2020-06-10  1:08 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-kernel, akpm, viro, linux-mm, linux-fsdevel

On Tue, Jun 09, 2020 at 05:47:53PM -0700, Matthew Wilcox wrote:
> On Tue, Jun 09, 2020 at 08:10:36PM -0400, Kent Overstreet wrote:
> > @@ -2275,83 +2287,93 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
> >  		struct iov_iter *iter, ssize_t written)
> >  {
> >  	struct file *filp = iocb->ki_filp;
> > +	struct file_ra_state *ra = &filp->f_ra;
> >  	struct address_space *mapping = filp->f_mapping;
> >  	struct inode *inode = mapping->host;
> > -	struct file_ra_state *ra = &filp->f_ra;
> >  	size_t orig_count = iov_iter_count(iter);
> > -	pgoff_t last_index;
> > -	int error = 0;
> > +	struct page *pages[64];
> 
> That's 512 bytes which seems like a lot of stack space.  Would 16 be
> enough to see a significant fraction of the benefit?

Ah right, we do call into fs code for readahead from here. I'll switch it to
kmalloc the page array if it's more than 16.
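
In outline - and this is the shape the v2 repost below ends up with, using an
8-entry stack array rather than 16:

	struct page *page_array[8], **pages;
	unsigned nr_pages = ARRAY_SIZE(page_array);

	/* heap-allocate only when the read spans more pages than the stack
	 * array holds; fall back to the stack array if kmalloc fails */
	if (read_nr_pages > nr_pages &&
	    (pages = kmalloc_array(read_nr_pages, sizeof(void *), GFP_KERNEL)))
		nr_pages = read_nr_pages;
	else
		pages = page_array;

	/* ... read loop fills and drains pages[0..nr_pages) ... */

	if (pages != page_array)
		kfree(pages);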

* [PATCH v2 1/2] fs: Break generic_file_buffered_read up into multiple functions
  2020-06-10  0:10 [PATCH 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
  2020-06-10  0:10 ` [PATCH 1/2] fs: Break generic_file_buffered_read up into multiple functions Kent Overstreet
  2020-06-10  0:10 ` [PATCH 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
@ 2020-06-10  1:36 ` Kent Overstreet
  2020-06-10  1:36 ` [PATCH v2 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
  2020-06-30  0:12 ` Fixup patch for [PATCH 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
  4 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2020-06-10  1:36 UTC (permalink / raw)
  To: linux-kernel, akpm, viro, linux-mm, linux-fsdevel; +Cc: Kent Overstreet

This is prep work for changing generic_file_buffered_read() to use
find_get_pages_contig() to batch up all the pagecache lookups.

This patch should be functionally identical to the existing code and
changes as little of the flow control as possible. More refactoring
could be done, but this patch is intended to be relatively minimal.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 mm/filemap.c | 418 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 228 insertions(+), 190 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index e67fa8ab48..206d51a1c9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2051,6 +2051,210 @@ static void shrink_readahead_size_eio(struct file_ra_state *ra)
 	ra->ra_pages /= 4;
 }
 
+static int generic_file_buffered_read_page_ok(struct kiocb *iocb,
+			struct iov_iter *iter,
+			struct page *page)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+	struct inode *inode = mapping->host;
+	struct file_ra_state *ra = &iocb->ki_filp->f_ra;
+	unsigned offset = iocb->ki_pos & ~PAGE_MASK;
+	unsigned bytes, copied;
+	loff_t isize, end_offset;
+
+	BUG_ON(iocb->ki_pos >> PAGE_SHIFT != page->index);
+
+	/*
+	 * i_size must be checked after we know the page is Uptodate.
+	 *
+	 * Checking i_size after the check allows us to calculate
+	 * the correct value for "bytes", which means the zero-filled
+	 * part of the page is not copied back to userspace (unless
+	 * another truncate extends the file - this is desired though).
+	 */
+
+	isize = i_size_read(inode);
+	if (unlikely(iocb->ki_pos >= isize))
+		return 1;
+
+	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
+
+	bytes = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
+
+	/* If users can be writing to this page using arbitrary
+	 * virtual addresses, take care about potential aliasing
+	 * before reading the page on the kernel side.
+	 */
+	if (mapping_writably_mapped(mapping))
+		flush_dcache_page(page);
+
+	/*
+	 * Ok, we have the page, and it's up-to-date, so
+	 * now we can copy it to user space...
+	 */
+
+	copied = copy_page_to_iter(page, offset, bytes, iter);
+
+	iocb->ki_pos += copied;
+
+	/*
+	 * When a sequential read accesses a page several times,
+	 * only mark it as accessed the first time.
+	 */
+	if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
+		mark_page_accessed(page);
+
+	ra->prev_pos = iocb->ki_pos;
+
+	if (copied < bytes)
+		return -EFAULT;
+
+	return !iov_iter_count(iter) || iocb->ki_pos == isize;
+}
+
+static struct page *
+generic_file_buffered_read_readpage(struct file *filp,
+				    struct address_space *mapping,
+				    struct page *page)
+{
+	struct file_ra_state *ra = &filp->f_ra;
+	int error;
+
+	/*
+	 * A previous I/O error may have been due to temporary
+	 * failures, eg. multipath errors.
+	 * PG_error will be set again if readpage fails.
+	 */
+	ClearPageError(page);
+	/* Start the actual read. The read will unlock the page. */
+	error = mapping->a_ops->readpage(filp, page);
+
+	if (unlikely(error)) {
+		put_page(page);
+		return error != AOP_TRUNCATED_PAGE ? ERR_PTR(error) : NULL;
+	}
+
+	if (!PageUptodate(page)) {
+		error = lock_page_killable(page);
+		if (unlikely(error)) {
+			put_page(page);
+			return ERR_PTR(error);
+		}
+		if (!PageUptodate(page)) {
+			if (page->mapping == NULL) {
+				/*
+				 * invalidate_mapping_pages got it
+				 */
+				unlock_page(page);
+				put_page(page);
+				return NULL;
+			}
+			unlock_page(page);
+			shrink_readahead_size_eio(ra);
+			put_page(page);
+			return ERR_PTR(-EIO);
+		}
+		unlock_page(page);
+	}
+
+	return page;
+}
+
+static struct page *
+generic_file_buffered_read_pagenotuptodate(struct file *filp,
+					   struct iov_iter *iter,
+					   struct page *page,
+					   loff_t pos, loff_t count)
+{
+	struct address_space *mapping = filp->f_mapping;
+	struct inode *inode = mapping->host;
+	int error;
+
+	/*
+	 * See comment in do_read_cache_page on why
+	 * wait_on_page_locked is used to avoid unnecessarily
+	 * serialisations and why it's safe.
+	 */
+	error = wait_on_page_locked_killable(page);
+	if (unlikely(error)) {
+		put_page(page);
+		return ERR_PTR(error);
+	}
+
+	if (PageUptodate(page))
+		return page;
+
+	if (inode->i_blkbits == PAGE_SHIFT ||
+	    !mapping->a_ops->is_partially_uptodate)
+		goto page_not_up_to_date;
+	/* pipes can't handle partially uptodate pages */
+	if (unlikely(iov_iter_is_pipe(iter)))
+		goto page_not_up_to_date;
+	if (!trylock_page(page))
+		goto page_not_up_to_date;
+	/* Did it get truncated before we got the lock? */
+	if (!page->mapping)
+		goto page_not_up_to_date_locked;
+
+	if (!mapping->a_ops->is_partially_uptodate(page,
+				pos & ~PAGE_MASK, count))
+		goto page_not_up_to_date_locked;
+	unlock_page(page);
+	return page;
+
+page_not_up_to_date:
+	/* Get exclusive access to the page ... */
+	error = lock_page_killable(page);
+	if (unlikely(error)) {
+		put_page(page);
+		return ERR_PTR(error);
+	}
+
+page_not_up_to_date_locked:
+	/* Did it get truncated before we got the lock? */
+	if (!page->mapping) {
+		unlock_page(page);
+		put_page(page);
+		return NULL;
+	}
+
+	/* Did somebody else fill it already? */
+	if (PageUptodate(page)) {
+		unlock_page(page);
+		return page;
+	}
+
+	return generic_file_buffered_read_readpage(filp, mapping, page);
+}
+
+static struct page *
+generic_file_buffered_read_no_cached_page(struct kiocb *iocb,
+					  struct iov_iter *iter)
+{
+	struct file *filp = iocb->ki_filp;
+	struct address_space *mapping = filp->f_mapping;
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
+	struct page *page;
+	int error;
+
+	/*
+	 * Ok, it wasn't cached, so we need to create a new
+	 * page..
+	 */
+	page = page_cache_alloc(mapping);
+	if (!page)
+		return ERR_PTR(-ENOMEM);
+
+	error = add_to_page_cache_lru(page, mapping, index,
+				      mapping_gfp_constraint(mapping, GFP_KERNEL));
+	if (error) {
+		put_page(page);
+		return error != -EEXIST ? ERR_PTR(error) : NULL;
+	}
+
+	return generic_file_buffered_read_readpage(filp, mapping, page);
+}
+
 /**
  * generic_file_buffered_read - generic file read routine
  * @iocb:	the iocb to read
@@ -2074,29 +2278,19 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 	struct address_space *mapping = filp->f_mapping;
 	struct inode *inode = mapping->host;
 	struct file_ra_state *ra = &filp->f_ra;
-	loff_t *ppos = &iocb->ki_pos;
-	pgoff_t index;
+	size_t orig_count = iov_iter_count(iter);
 	pgoff_t last_index;
-	pgoff_t prev_index;
-	unsigned long offset;      /* offset into pagecache page */
-	unsigned int prev_offset;
 	int error = 0;
 
-	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
+	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
 		return 0;
 	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
 
-	index = *ppos >> PAGE_SHIFT;
-	prev_index = ra->prev_pos >> PAGE_SHIFT;
-	prev_offset = ra->prev_pos & (PAGE_SIZE-1);
-	last_index = (*ppos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
-	offset = *ppos & ~PAGE_MASK;
+	last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
 
 	for (;;) {
+		pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
 		struct page *page;
-		pgoff_t end_index;
-		loff_t isize;
-		unsigned long nr, ret;
 
 		cond_resched();
 find_page:
@@ -2113,8 +2307,15 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 					ra, filp,
 					index, last_index - index);
 			page = find_get_page(mapping, index);
-			if (unlikely(page == NULL))
-				goto no_cached_page;
+			if (unlikely(page == NULL)) {
+				page = generic_file_buffered_read_no_cached_page(iocb, iter);
+				if (!page)
+					goto find_page;
+				if (IS_ERR(page)) {
+					error = PTR_ERR(page);
+					goto out;
+				}
+			}
 		}
 		if (PageReadahead(page)) {
 			page_cache_async_readahead(mapping,
@@ -2124,199 +2325,36 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		if (!PageUptodate(page)) {
 			if (iocb->ki_flags & IOCB_NOWAIT) {
 				put_page(page);
-				goto would_block;
-			}
-
-			/*
-			 * See comment in do_read_cache_page on why
-			 * wait_on_page_locked is used to avoid unnecessarily
-			 * serialisations and why it's safe.
-			 */
-			error = wait_on_page_locked_killable(page);
-			if (unlikely(error))
-				goto readpage_error;
-			if (PageUptodate(page))
-				goto page_ok;
-
-			if (inode->i_blkbits == PAGE_SHIFT ||
-					!mapping->a_ops->is_partially_uptodate)
-				goto page_not_up_to_date;
-			/* pipes can't handle partially uptodate pages */
-			if (unlikely(iov_iter_is_pipe(iter)))
-				goto page_not_up_to_date;
-			if (!trylock_page(page))
-				goto page_not_up_to_date;
-			/* Did it get truncated before we got the lock? */
-			if (!page->mapping)
-				goto page_not_up_to_date_locked;
-			if (!mapping->a_ops->is_partially_uptodate(page,
-							offset, iter->count))
-				goto page_not_up_to_date_locked;
-			unlock_page(page);
-		}
-page_ok:
-		/*
-		 * i_size must be checked after we know the page is Uptodate.
-		 *
-		 * Checking i_size after the check allows us to calculate
-		 * the correct value for "nr", which means the zero-filled
-		 * part of the page is not copied back to userspace (unless
-		 * another truncate extends the file - this is desired though).
-		 */
-
-		isize = i_size_read(inode);
-		end_index = (isize - 1) >> PAGE_SHIFT;
-		if (unlikely(!isize || index > end_index)) {
-			put_page(page);
-			goto out;
-		}
-
-		/* nr is the maximum number of bytes to copy from this page */
-		nr = PAGE_SIZE;
-		if (index == end_index) {
-			nr = ((isize - 1) & ~PAGE_MASK) + 1;
-			if (nr <= offset) {
-				put_page(page);
+				error = -EAGAIN;
 				goto out;
 			}
-		}
-		nr = nr - offset;
-
-		/* If users can be writing to this page using arbitrary
-		 * virtual addresses, take care about potential aliasing
-		 * before reading the page on the kernel side.
-		 */
-		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
-
-		/*
-		 * When a sequential read accesses a page several times,
-		 * only mark it as accessed the first time.
-		 */
-		if (prev_index != index || offset != prev_offset)
-			mark_page_accessed(page);
-		prev_index = index;
-
-		/*
-		 * Ok, we have the page, and it's up-to-date, so
-		 * now we can copy it to user space...
-		 */
-
-		ret = copy_page_to_iter(page, offset, nr, iter);
-		offset += ret;
-		index += offset >> PAGE_SHIFT;
-		offset &= ~PAGE_MASK;
-		prev_offset = offset;
 
-		put_page(page);
-		written += ret;
-		if (!iov_iter_count(iter))
-			goto out;
-		if (ret < nr) {
-			error = -EFAULT;
-			goto out;
-		}
-		continue;
-
-page_not_up_to_date:
-		/* Get exclusive access to the page ... */
-		error = lock_page_killable(page);
-		if (unlikely(error))
-			goto readpage_error;
-
-page_not_up_to_date_locked:
-		/* Did it get truncated before we got the lock? */
-		if (!page->mapping) {
-			unlock_page(page);
-			put_page(page);
-			continue;
-		}
-
-		/* Did somebody else fill it already? */
-		if (PageUptodate(page)) {
-			unlock_page(page);
-			goto page_ok;
-		}
-
-readpage:
-		/*
-		 * A previous I/O error may have been due to temporary
-		 * failures, eg. multipath errors.
-		 * PG_error will be set again if readpage fails.
-		 */
-		ClearPageError(page);
-		/* Start the actual read. The read will unlock the page. */
-		error = mapping->a_ops->readpage(filp, page);
-
-		if (unlikely(error)) {
-			if (error == AOP_TRUNCATED_PAGE) {
-				put_page(page);
-				error = 0;
+			page = generic_file_buffered_read_pagenotuptodate(filp,
+					iter, page, iocb->ki_pos, iter->count);
+			if (!page)
 				goto find_page;
+			if (IS_ERR(page)) {
+				error = PTR_ERR(page);
+				goto out;
 			}
-			goto readpage_error;
-		}
-
-		if (!PageUptodate(page)) {
-			error = lock_page_killable(page);
-			if (unlikely(error))
-				goto readpage_error;
-			if (!PageUptodate(page)) {
-				if (page->mapping == NULL) {
-					/*
-					 * invalidate_mapping_pages got it
-					 */
-					unlock_page(page);
-					put_page(page);
-					goto find_page;
-				}
-				unlock_page(page);
-				shrink_readahead_size_eio(ra);
-				error = -EIO;
-				goto readpage_error;
-			}
-			unlock_page(page);
 		}
 
-		goto page_ok;
-
-readpage_error:
-		/* UHHUH! A synchronous read error occurred. Report it */
+		error = generic_file_buffered_read_page_ok(iocb, iter, page);
 		put_page(page);
-		goto out;
 
-no_cached_page:
-		/*
-		 * Ok, it wasn't cached, so we need to create a new
-		 * page..
-		 */
-		page = page_cache_alloc(mapping);
-		if (!page) {
-			error = -ENOMEM;
-			goto out;
-		}
-		error = add_to_page_cache_lru(page, mapping, index,
-				mapping_gfp_constraint(mapping, GFP_KERNEL));
 		if (error) {
-			put_page(page);
-			if (error == -EEXIST) {
+			if (error > 0)
 				error = 0;
-				goto find_page;
-			}
 			goto out;
 		}
-		goto readpage;
 	}
 
 would_block:
 	error = -EAGAIN;
 out:
-	ra->prev_pos = prev_index;
-	ra->prev_pos <<= PAGE_SHIFT;
-	ra->prev_pos |= prev_offset;
-
-	*ppos = ((loff_t)index << PAGE_SHIFT) + offset;
 	file_accessed(filp);
+	written += orig_count - iov_iter_count(iter);
+
 	return written ? written : error;
 }
 
-- 
2.27.0


* [PATCH v2 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig
  2020-06-10  0:10 [PATCH 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
                   ` (2 preceding siblings ...)
  2020-06-10  1:36 ` [PATCH v2 1/2] fs: Break generic_file_buffered_read up into multiple functions Kent Overstreet
@ 2020-06-10  1:36 ` Kent Overstreet
  2020-06-18  1:05   ` Andrew Morton
  2020-06-30  0:12 ` Fixup patch for [PATCH 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
  4 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2020-06-10  1:36 UTC (permalink / raw)
  To: linux-kernel, akpm, viro, linux-mm, linux-fsdevel; +Cc: Kent Overstreet

Convert generic_file_buffered_read() to get pages to read from in
batches, and then copy data to userspace from many pages at once - in
particular, we now don't touch any cachelines that might be contended
while we're in the loop to copy data to userspace.

This is a performance improvement on workloads that do buffered reads
with large blocksizes, and a very large performance improvement if that
file is also being accessed concurrently by different threads.

On smaller reads (512 bytes), there's a very small performance
improvement (1%, within the margin of error).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 mm/filemap.c | 276 +++++++++++++++++++++++++++++----------------------
 1 file changed, 155 insertions(+), 121 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 206d51a1c9..4fb0e5a238 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2051,67 +2051,6 @@ static void shrink_readahead_size_eio(struct file_ra_state *ra)
 	ra->ra_pages /= 4;
 }
 
-static int generic_file_buffered_read_page_ok(struct kiocb *iocb,
-			struct iov_iter *iter,
-			struct page *page)
-{
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &iocb->ki_filp->f_ra;
-	unsigned offset = iocb->ki_pos & ~PAGE_MASK;
-	unsigned bytes, copied;
-	loff_t isize, end_offset;
-
-	BUG_ON(iocb->ki_pos >> PAGE_SHIFT != page->index);
-
-	/*
-	 * i_size must be checked after we know the page is Uptodate.
-	 *
-	 * Checking i_size after the check allows us to calculate
-	 * the correct value for "bytes", which means the zero-filled
-	 * part of the page is not copied back to userspace (unless
-	 * another truncate extends the file - this is desired though).
-	 */
-
-	isize = i_size_read(inode);
-	if (unlikely(iocb->ki_pos >= isize))
-		return 1;
-
-	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
-
-	bytes = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
-
-	/* If users can be writing to this page using arbitrary
-	 * virtual addresses, take care about potential aliasing
-	 * before reading the page on the kernel side.
-	 */
-	if (mapping_writably_mapped(mapping))
-		flush_dcache_page(page);
-
-	/*
-	 * Ok, we have the page, and it's up-to-date, so
-	 * now we can copy it to user space...
-	 */
-
-	copied = copy_page_to_iter(page, offset, bytes, iter);
-
-	iocb->ki_pos += copied;
-
-	/*
-	 * When a sequential read accesses a page several times,
-	 * only mark it as accessed the first time.
-	 */
-	if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
-		mark_page_accessed(page);
-
-	ra->prev_pos = iocb->ki_pos;
-
-	if (copied < bytes)
-		return -EFAULT;
-
-	return !iov_iter_count(iter) || iocb->ki_pos == isize;
-}
-
 static struct page *
 generic_file_buffered_read_readpage(struct file *filp,
 				    struct address_space *mapping,
@@ -2255,6 +2194,79 @@ generic_file_buffered_read_no_cached_page(struct kiocb *iocb,
 	return generic_file_buffered_read_readpage(filp, mapping, page);
 }
 
+static int generic_file_buffered_read_get_pages(struct kiocb *iocb,
+						struct iov_iter *iter,
+						struct page **pages,
+						unsigned nr)
+{
+	struct file *filp = iocb->ki_filp;
+	struct address_space *mapping = filp->f_mapping;
+	struct file_ra_state *ra = &filp->f_ra;
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
+	pgoff_t last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
+	int i, j, ret, err = 0;
+
+	nr = min_t(unsigned long, last_index - index, nr);
+find_page:
+	if (fatal_signal_pending(current))
+		return -EINTR;
+
+	ret = find_get_pages_contig(mapping, index, nr, pages);
+	if (ret)
+		goto got_pages;
+
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		return -EAGAIN;
+
+	page_cache_sync_readahead(mapping, ra, filp, index, last_index - index);
+
+	ret = find_get_pages_contig(mapping, index, nr, pages);
+	if (ret)
+		goto got_pages;
+
+	pages[0] = generic_file_buffered_read_no_cached_page(iocb, iter);
+	err = PTR_ERR_OR_ZERO(pages[0]);
+	ret = !IS_ERR_OR_NULL(pages[0]);
+got_pages:
+	for (i = 0; i < ret; i++) {
+		struct page *page = pages[i];
+		pgoff_t pg_index = index +i;
+		loff_t pg_pos = max(iocb->ki_pos,
+				    (loff_t) pg_index << PAGE_SHIFT);
+		loff_t pg_count = iocb->ki_pos + iter->count - pg_pos;
+
+		if (PageReadahead(page))
+			page_cache_async_readahead(mapping, ra, filp, page,
+					pg_index, last_index - pg_index);
+
+		if (!PageUptodate(page)) {
+			if (iocb->ki_flags & IOCB_NOWAIT) {
+				for (j = i; j < ret; j++)
+					put_page(pages[j]);
+				ret = i;
+				err = -EAGAIN;
+				break;
+			}
+
+			page = generic_file_buffered_read_pagenotuptodate(filp,
+						iter, page, pg_pos, pg_count);
+			if (IS_ERR_OR_NULL(page)) {
+				for (j = i + 1; j < ret; j++)
+					put_page(pages[j]);
+				ret = i;
+				err = PTR_ERR_OR_ZERO(page);
+				break;
+			}
+		}
+	}
+
+	if (likely(ret))
+		return ret;
+	if (err)
+		return err;
+	goto find_page;
+}
+
 /**
  * generic_file_buffered_read - generic file read routine
  * @iocb:	the iocb to read
@@ -2275,86 +2287,108 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		struct iov_iter *iter, ssize_t written)
 {
 	struct file *filp = iocb->ki_filp;
+	struct file_ra_state *ra = &filp->f_ra;
 	struct address_space *mapping = filp->f_mapping;
 	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &filp->f_ra;
 	size_t orig_count = iov_iter_count(iter);
-	pgoff_t last_index;
-	int error = 0;
+	struct page *page_array[8], **pages;
+	unsigned nr_pages = ARRAY_SIZE(page_array);
+	unsigned read_nr_pages = ((iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT) -
+		(iocb->ki_pos >> PAGE_SHIFT);
+	int i, pg_nr, error = 0;
+	bool writably_mapped;
+	loff_t isize, end_offset;
 
 	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
 		return 0;
 	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
 
-	last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
-
-	for (;;) {
-		pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
-		struct page *page;
+	if (read_nr_pages > nr_pages &&
+	    (pages = kmalloc_array(read_nr_pages, sizeof(void *), GFP_KERNEL)))
+		nr_pages = read_nr_pages;
+	else
+		pages = page_array;
 
+	do {
 		cond_resched();
-find_page:
-		if (fatal_signal_pending(current)) {
-			error = -EINTR;
-			goto out;
-		}
 
-		page = find_get_page(mapping, index);
-		if (!page) {
-			if (iocb->ki_flags & IOCB_NOWAIT)
-				goto would_block;
-			page_cache_sync_readahead(mapping,
-					ra, filp,
-					index, last_index - index);
-			page = find_get_page(mapping, index);
-			if (unlikely(page == NULL)) {
-				page = generic_file_buffered_read_no_cached_page(iocb, iter);
-				if (!page)
-					goto find_page;
-				if (IS_ERR(page)) {
-					error = PTR_ERR(page);
-					goto out;
-				}
-			}
-		}
-		if (PageReadahead(page)) {
-			page_cache_async_readahead(mapping,
-					ra, filp, page,
-					index, last_index - index);
+		i = 0;
+		pg_nr = generic_file_buffered_read_get_pages(iocb, iter,
+							 pages, nr_pages);
+		if (pg_nr < 0) {
+			error = pg_nr;
+			break;
 		}
-		if (!PageUptodate(page)) {
-			if (iocb->ki_flags & IOCB_NOWAIT) {
-				put_page(page);
-				error = -EAGAIN;
-				goto out;
-			}
 
-			page = generic_file_buffered_read_pagenotuptodate(filp,
-					iter, page, iocb->ki_pos, iter->count);
-			if (!page)
-				goto find_page;
-			if (IS_ERR(page)) {
-				error = PTR_ERR(page);
-				goto out;
-			}
-		}
+		/*
+		 * i_size must be checked after we know the pages are Uptodate.
+		 *
+		 * Checking i_size after the check allows us to calculate
+		 * the correct value for "nr", which means the zero-filled
+		 * part of the page is not copied back to userspace (unless
+		 * another truncate extends the file - this is desired though).
+		 */
+		isize = i_size_read(inode);
+		if (unlikely(iocb->ki_pos >= isize))
+			goto put_pages;
 
-		error = generic_file_buffered_read_page_ok(iocb, iter, page);
-		put_page(page);
+		end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
 
-		if (error) {
-			if (error > 0)
-				error = 0;
-			goto out;
+		while ((iocb->ki_pos >> PAGE_SHIFT) + pg_nr >
+		       (end_offset + PAGE_SIZE - 1) >> PAGE_SHIFT)
+			put_page(pages[--pg_nr]);
+
+		/*
+		 * Once we start copying data, we don't want to be touching any
+		 * cachelines that might be contended:
+		 */
+		writably_mapped = mapping_writably_mapped(mapping);
+
+		/*
+		 * When a sequential read accesses a page several times, only
+		 * mark it as accessed the first time.
+		 */
+		if (iocb->ki_pos >> PAGE_SHIFT !=
+		    ra->prev_pos >> PAGE_SHIFT)
+			mark_page_accessed(pages[0]);
+		for (i = 1; i < pg_nr; i++)
+			mark_page_accessed(pages[i]);
+
+		for (i = 0; i < pg_nr; i++) {
+			unsigned offset = iocb->ki_pos & ~PAGE_MASK;
+			unsigned bytes = min_t(loff_t, end_offset - iocb->ki_pos,
+					       PAGE_SIZE - offset);
+			unsigned copied;
+
+			/*
+			 * If users can be writing to this page using arbitrary
+			 * virtual addresses, take care about potential aliasing
+			 * before reading the page on the kernel side.
+			 */
+			if (writably_mapped)
+				flush_dcache_page(pages[i]);
+
+			copied = copy_page_to_iter(pages[i], offset, bytes, iter);
+
+			iocb->ki_pos += copied;
+			ra->prev_pos = iocb->ki_pos;
+
+			if (copied < bytes) {
+				error = -EFAULT;
+				break;
+			}
 		}
-	}
+put_pages:
+		for (i = 0; i < pg_nr; i++)
+			put_page(pages[i]);
+	} while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);
 
-would_block:
-	error = -EAGAIN;
-out:
 	file_accessed(filp);
 	written += orig_count - iov_iter_count(iter);
 
+	if (pages != page_array)
+		kfree(pages);
+
 	return written ? written : error;
 }
 
-- 
2.27.0


* Re: [PATCH 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig
  2020-06-10  0:10 ` [PATCH 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
  2020-06-10  0:47   ` Matthew Wilcox
@ 2020-06-10  1:38   ` Matthew Wilcox
  2020-06-10  1:46     ` Kent Overstreet
  1 sibling, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2020-06-10  1:38 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-kernel, akpm, viro, linux-mm, linux-fsdevel

On Tue, Jun 09, 2020 at 08:10:36PM -0400, Kent Overstreet wrote:
> Convert generic_file_buffered_read() to get pages to read from in
> batches, and then copy data to userspace from many pages at once - in
> particular, we now don't touch any cachelines that might be contended
> while we're in the loop to copy data to userspace.
> 
> This is a performance improvement on workloads that do buffered reads
> with large blocksizes, and a very large performance improvement if that
> file is also being accessed concurrently by different threads.

Hey, you're stealing my performance improvements!

Granted, I haven't got to doing performance optimisations (certainly
not in this function), but this is one of the places where THP in the
page cache will have a useful performance improvement.

I'm not opposed to putting this in, but I may back it out as part of
the THP work because the THPs will get the same performance improvements
that you're seeing here with less code.

* Re: [PATCH 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig
  2020-06-10  1:38   ` Matthew Wilcox
@ 2020-06-10  1:46     ` Kent Overstreet
  0 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2020-06-10  1:46 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-kernel, akpm, viro, linux-mm, linux-fsdevel

On Tue, Jun 09, 2020 at 06:38:08PM -0700, Matthew Wilcox wrote:
> On Tue, Jun 09, 2020 at 08:10:36PM -0400, Kent Overstreet wrote:
> > Convert generic_file_buffered_read() to get pages to read from in
> > batches, and then copy data to userspace from many pages at once - in
> > particular, we now don't touch any cachelines that might be contended
> > while we're in the loop to copy data to userspace.
> > 
> > This is a performance improvement on workloads that do buffered reads
> > with large blocksizes, and a very large performance improvement if that
> > file is also being accessed concurrently by different threads.
> 
> Hey, you're stealing my performance improvements!

:)

> Granted, I haven't got to doing performance optimisations (certainly
> not in this function), but this is one of the places where THP in the
> page cache will have a useful performance improvement.
> 
> I'm not opposed to putting this in, but I may back it out as part of
> the THP work because the THPs will get the same performance improvements
> that you're seeing here with less code.

I'm an _enthusiastic_ supporter of the THP stuff (as you know), but my feeling
is that it's going to be a long time before hugepages are everywhere - and I
think even with the pagevec stuff generic_file_buffered_read() is somewhat
easier to read and deal with after this series than before it.

Though I could see the pagevec stuff making hugepage support a pain, so there is
that. Eh.

* Re: [PATCH v2 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig
  2020-06-10  1:36 ` [PATCH v2 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
@ 2020-06-18  1:05   ` Andrew Morton
  2020-06-19  3:20     ` [PATCH v3 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Andrew Morton @ 2020-06-18  1:05 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-kernel, viro, linux-mm, linux-fsdevel

On Tue,  9 Jun 2020 21:36:42 -0400 Kent Overstreet <kent.overstreet@gmail.com> wrote:

> Convert generic_file_buffered_read() to get pages to read from in
> batches, and then copy data to userspace from many pages at once - in
> particular, we now don't touch any cachelines that might be contended
> while we're in the loop to copy data to userspace.
> 
> This is a performance improvement on workloads that do buffered reads
> with large blocksizes, and a very large performance improvement if that
> file is also being accessed concurrently by different threads.
> 
> On smaller reads (512 bytes), there's a very small performance
> improvement (1%, within the margin of error).
> 

checkpatch goes fairly crazy over this one, mostly legitimate.
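(For reference, these complaints can be reproduced with
scripts/checkpatch.pl against the posted patch - the filename below is
hypothetical:

	./scripts/checkpatch.pl 0002-fs-generic_file_buffered_read-now-uses-find_get_pages_contig.patch
)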

> @@ -2255,6 +2194,79 @@ generic_file_buffered_read_no_cached_page(struct kiocb *iocb,
>  	return generic_file_buffered_read_readpage(filp, mapping, page);
>  }
>  
> +static int generic_file_buffered_read_get_pages(struct kiocb *iocb,
> +						struct iov_iter *iter,
> +						struct page **pages,
> +						unsigned nr)
> +{
> +	struct file *filp = iocb->ki_filp;
> +	struct address_space *mapping = filp->f_mapping;
> +	struct file_ra_state *ra = &filp->f_ra;
> +	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
> +	pgoff_t last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
> +	int i, j, ret, err = 0;
> +
> +	nr = min_t(unsigned long, last_index - index, nr);
> +find_page:
> +	if (fatal_signal_pending(current))
> +		return -EINTR;
> +
> +	ret = find_get_pages_contig(mapping, index, nr, pages);
> +	if (ret)
> +		goto got_pages;
> +
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		return -EAGAIN;
> +
> +	page_cache_sync_readahead(mapping, ra, filp, index, last_index - index);
> +
> +	ret = find_get_pages_contig(mapping, index, nr, pages);
> +	if (ret)
> +		goto got_pages;
> +
> +	pages[0] = generic_file_buffered_read_no_cached_page(iocb, iter);
> +	err = PTR_ERR_OR_ZERO(pages[0]);
> +	ret = !IS_ERR_OR_NULL(pages[0]);

what?
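(A spelled-out reading of those three lines, as a hypothetical rewrite -
PTR_ERR_OR_ZERO() and !IS_ERR_OR_NULL() are compressing three outcomes
into two assignments:

	pages[0] = generic_file_buffered_read_no_cached_page(iocb, iter);
	if (IS_ERR(pages[0])) {
		err = PTR_ERR(pages[0]);	/* hard error */
		ret = 0;
	} else if (!pages[0]) {
		err = 0;	/* lost a race with another thread; retry */
		ret = 0;
	} else {
		err = 0;
		ret = 1;	/* allocated and read one new page */
	}

v3 below makes this explicit with a separate nr_got count.)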

> +got_pages:
> +	for (i = 0; i < ret; i++) {

Comparing i with ret here just hurts my brain.  Two lines ago ret was a
boolean, now it's a scalar.

> +		struct page *page = pages[i];
> +		pgoff_t pg_index = index + i;
> +		loff_t pg_pos = max(iocb->ki_pos,
> +				    (loff_t) pg_index << PAGE_SHIFT);

hm.  I guess we can't use max_t here because we need to cast the
pgoff_t before the << to avoid overflows on 32-bit.  Perhaps this could
be cleaned up by using additional suitably typed and named locals.
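To illustrate the overflow: pgoff_t is a 32-bit unsigned long on 32-bit
kernels, and max_t(loff_t, ...) converts its arguments only after they
have been evaluated, so the shift would already have truncated offsets
at or above 4GB. A sketch of the suitably typed local Andrew suggests
(the name pg_start is hypothetical):

	/* widen to loff_t before shifting so the high bits survive */
	loff_t pg_start = (loff_t)pg_index << PAGE_SHIFT;
	loff_t pg_pos = max(iocb->ki_pos, pg_start);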

> +		loff_t pg_count = iocb->ki_pos + iter->count - pg_pos;
> +
> +		if (PageReadahead(page))
> +			page_cache_async_readahead(mapping, ra, filp, page,
> +					pg_index, last_index - pg_index);
> +
> +		if (!PageUptodate(page)) {
> +			if (iocb->ki_flags & IOCB_NOWAIT) {
> +				for (j = i; j < ret; j++)
> +					put_page(pages[j]);
> +				ret = i;
> +				err = -EAGAIN;
> +				break;
> +			}
> +
> +			page = generic_file_buffered_read_pagenotuptodate(filp,
> +						iter, page, pg_pos, pg_count);
> +			if (IS_ERR_OR_NULL(page)) {
> +				for (j = i + 1; j < ret; j++)
> +					put_page(pages[j]);
> +				ret = i;
> +				err = PTR_ERR_OR_ZERO(page);
> +				break;
> +			}
> +		}
> +	}
> +
> +	if (likely(ret))
> +		return ret;
> +	if (err)
> +		return err;
> +	goto find_page;
> +}
> +
>  /**
>   * generic_file_buffered_read - generic file read routine
>   * @iocb:	the iocb to read
> @@ -2275,86 +2287,108 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
>  		struct iov_iter *iter, ssize_t written)
>  {
>  	struct file *filp = iocb->ki_filp;
> +	struct file_ra_state *ra = &filp->f_ra;
>  	struct address_space *mapping = filp->f_mapping;
>  	struct inode *inode = mapping->host;
> -	struct file_ra_state *ra = &filp->f_ra;
>  	size_t orig_count = iov_iter_count(iter);
> -	pgoff_t last_index;
> -	int error = 0;
> +	struct page *page_array[8], **pages;
> +	unsigned nr_pages = ARRAY_SIZE(page_array);
> +	unsigned read_nr_pages = ((iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT) -
> +		(iocb->ki_pos >> PAGE_SHIFT);
> +	int i, pg_nr, error = 0;
> +	bool writably_mapped;
> +	loff_t isize, end_offset;
>  
>  	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
>  		return 0;
>  	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
>  
> -	last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
> -
> -	for (;;) {
> -		pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
> -		struct page *page;
> +	if (read_nr_pages > nr_pages &&
> +	    (pages = kmalloc_array(read_nr_pages, sizeof(void *), GFP_KERNEL)))

I agree with checkpatch!
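The complaint is the assignment buried inside the if condition. A sketch
of the checkpatch-clean form - roughly what v3 below adopts, assuming
pages starts out NULL:

	struct page *page_array[8], **pages = NULL;

	if (read_nr_pages > nr_pages)
		pages = kmalloc_array(read_nr_pages, sizeof(void *),
				      GFP_KERNEL);
	if (pages)
		nr_pages = read_nr_pages;
	else
		pages = page_array;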

> +		nr_pages = read_nr_pages;
> +	else
> +		pages = page_array;
>  
> +	do {
>  		cond_resched();
>
> ...
>

Please, can we make all this code nice to read?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 0/2] generic_file_buffered_read() refactoring & optimization
  2020-06-18  1:05   ` Andrew Morton
@ 2020-06-19  3:20     ` Kent Overstreet
  2020-06-19 12:59       ` Christoph Hellwig
  2020-06-19  3:20     ` [PATCH v3 1/2] fs: Break generic_file_buffered_read up into multiple functions Kent Overstreet
  2020-06-19  3:20     ` [PATCH v3 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
  2 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2020-06-19  3:20 UTC (permalink / raw)
  To: linux-kernel, akpm; +Cc: Kent Overstreet, viro, linux-mm, linux-fsdevel

Ok - here's a new version. I fixed the checkpatch issues, and the handling
of ret should be more readable now:

Kent Overstreet (2):
  fs: Break generic_file_buffered_read up into multiple functions
  fs: generic_file_buffered_read() now uses find_get_pages_contig

 mm/filemap.c | 497 +++++++++++++++++++++++++++++----------------------
 1 file changed, 287 insertions(+), 210 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/2] fs: Break generic_file_buffered_read up into multiple functions
  2020-06-18  1:05   ` Andrew Morton
  2020-06-19  3:20     ` [PATCH v3 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
@ 2020-06-19  3:20     ` Kent Overstreet
  2020-06-19  3:20     ` [PATCH v3 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
  2 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2020-06-19  3:20 UTC (permalink / raw)
  To: linux-kernel, akpm; +Cc: Kent Overstreet, viro, linux-mm, linux-fsdevel

This is prep work for changing generic_file_buffered_read() to use
find_get_pages_contig() to batch up all the pagecache lookups.

This patch should be functionally identical to the existing code and
changes as little of the flow control as possible. More refactoring
could be done; this patch is intended to be relatively minimal.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 mm/filemap.c | 418 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 228 insertions(+), 190 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 23a051a7ef..dc4a72042e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1975,6 +1975,210 @@ static void shrink_readahead_size_eio(struct file_ra_state *ra)
 	ra->ra_pages /= 4;
 }
 
+static int generic_file_buffered_read_page_ok(struct kiocb *iocb,
+			struct iov_iter *iter,
+			struct page *page)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+	struct inode *inode = mapping->host;
+	struct file_ra_state *ra = &iocb->ki_filp->f_ra;
+	unsigned int offset = iocb->ki_pos & ~PAGE_MASK;
+	unsigned int bytes, copied;
+	loff_t isize, end_offset;
+
+	BUG_ON(iocb->ki_pos >> PAGE_SHIFT != page->index);
+
+	/*
+	 * i_size must be checked after we know the page is Uptodate.
+	 *
+	 * Checking i_size after the check allows us to calculate
+	 * the correct value for "bytes", which means the zero-filled
+	 * part of the page is not copied back to userspace (unless
+	 * another truncate extends the file - this is desired though).
+	 */
+
+	isize = i_size_read(inode);
+	if (unlikely(iocb->ki_pos >= isize))
+		return 1;
+
+	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
+
+	bytes = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
+
+	/* If users can be writing to this page using arbitrary
+	 * virtual addresses, take care about potential aliasing
+	 * before reading the page on the kernel side.
+	 */
+	if (mapping_writably_mapped(mapping))
+		flush_dcache_page(page);
+
+	/*
+	 * Ok, we have the page, and it's up-to-date, so
+	 * now we can copy it to user space...
+	 */
+
+	copied = copy_page_to_iter(page, offset, bytes, iter);
+
+	iocb->ki_pos += copied;
+
+	/*
+	 * When a sequential read accesses a page several times,
+	 * only mark it as accessed the first time.
+	 */
+	if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
+		mark_page_accessed(page);
+
+	ra->prev_pos = iocb->ki_pos;
+
+	if (copied < bytes)
+		return -EFAULT;
+
+	return !iov_iter_count(iter) || iocb->ki_pos == isize;
+}
+
+static struct page *
+generic_file_buffered_read_readpage(struct file *filp,
+				    struct address_space *mapping,
+				    struct page *page)
+{
+	struct file_ra_state *ra = &filp->f_ra;
+	int error;
+
+	/*
+	 * A previous I/O error may have been due to temporary
+	 * failures, eg. multipath errors.
+	 * PG_error will be set again if readpage fails.
+	 */
+	ClearPageError(page);
+	/* Start the actual read. The read will unlock the page. */
+	error = mapping->a_ops->readpage(filp, page);
+
+	if (unlikely(error)) {
+		put_page(page);
+		return error != AOP_TRUNCATED_PAGE ? ERR_PTR(error) : NULL;
+	}
+
+	if (!PageUptodate(page)) {
+		error = lock_page_killable(page);
+		if (unlikely(error)) {
+			put_page(page);
+			return ERR_PTR(error);
+		}
+		if (!PageUptodate(page)) {
+			if (page->mapping == NULL) {
+				/*
+				 * invalidate_mapping_pages got it
+				 */
+				unlock_page(page);
+				put_page(page);
+				return NULL;
+			}
+			unlock_page(page);
+			shrink_readahead_size_eio(ra);
+			put_page(page);
+			return ERR_PTR(-EIO);
+		}
+		unlock_page(page);
+	}
+
+	return page;
+}
+
+static struct page *
+generic_file_buffered_read_pagenotuptodate(struct file *filp,
+					   struct iov_iter *iter,
+					   struct page *page,
+					   loff_t pos, loff_t count)
+{
+	struct address_space *mapping = filp->f_mapping;
+	struct inode *inode = mapping->host;
+	int error;
+
+	/*
+	 * See comment in do_read_cache_page on why
+	 * wait_on_page_locked is used to avoid unnecessarily
+	 * serialisations and why it's safe.
+	 */
+	error = wait_on_page_locked_killable(page);
+	if (unlikely(error)) {
+		put_page(page);
+		return ERR_PTR(error);
+	}
+
+	if (PageUptodate(page))
+		return page;
+
+	if (inode->i_blkbits == PAGE_SHIFT ||
+	    !mapping->a_ops->is_partially_uptodate)
+		goto page_not_up_to_date;
+	/* pipes can't handle partially uptodate pages */
+	if (unlikely(iov_iter_is_pipe(iter)))
+		goto page_not_up_to_date;
+	if (!trylock_page(page))
+		goto page_not_up_to_date;
+	/* Did it get truncated before we got the lock? */
+	if (!page->mapping)
+		goto page_not_up_to_date_locked;
+
+	if (!mapping->a_ops->is_partially_uptodate(page,
+				pos & ~PAGE_MASK, count))
+		goto page_not_up_to_date_locked;
+	unlock_page(page);
+	return page;
+
+page_not_up_to_date:
+	/* Get exclusive access to the page ... */
+	error = lock_page_killable(page);
+	if (unlikely(error)) {
+		put_page(page);
+		return ERR_PTR(error);
+	}
+
+page_not_up_to_date_locked:
+	/* Did it get truncated before we got the lock? */
+	if (!page->mapping) {
+		unlock_page(page);
+		put_page(page);
+		return NULL;
+	}
+
+	/* Did somebody else fill it already? */
+	if (PageUptodate(page)) {
+		unlock_page(page);
+		return page;
+	}
+
+	return generic_file_buffered_read_readpage(filp, mapping, page);
+}
+
+static struct page *
+generic_file_buffered_read_no_cached_page(struct kiocb *iocb,
+					  struct iov_iter *iter)
+{
+	struct file *filp = iocb->ki_filp;
+	struct address_space *mapping = filp->f_mapping;
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
+	struct page *page;
+	int error;
+
+	/*
+	 * Ok, it wasn't cached, so we need to create a new
+	 * page..
+	 */
+	page = page_cache_alloc(mapping);
+	if (!page)
+		return ERR_PTR(-ENOMEM);
+
+	error = add_to_page_cache_lru(page, mapping, index,
+				      mapping_gfp_constraint(mapping, GFP_KERNEL));
+	if (error) {
+		put_page(page);
+		return error != -EEXIST ? ERR_PTR(error) : NULL;
+	}
+
+	return generic_file_buffered_read_readpage(filp, mapping, page);
+}
+
 /**
  * generic_file_buffered_read - generic file read routine
  * @iocb:	the iocb to read
@@ -1998,29 +2202,19 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 	struct address_space *mapping = filp->f_mapping;
 	struct inode *inode = mapping->host;
 	struct file_ra_state *ra = &filp->f_ra;
-	loff_t *ppos = &iocb->ki_pos;
-	pgoff_t index;
+	size_t orig_count = iov_iter_count(iter);
 	pgoff_t last_index;
-	pgoff_t prev_index;
-	unsigned long offset;      /* offset into pagecache page */
-	unsigned int prev_offset;
 	int error = 0;
 
-	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
+	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
 		return 0;
 	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
 
-	index = *ppos >> PAGE_SHIFT;
-	prev_index = ra->prev_pos >> PAGE_SHIFT;
-	prev_offset = ra->prev_pos & (PAGE_SIZE-1);
-	last_index = (*ppos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
-	offset = *ppos & ~PAGE_MASK;
+	last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
 
 	for (;;) {
+		pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
 		struct page *page;
-		pgoff_t end_index;
-		loff_t isize;
-		unsigned long nr, ret;
 
 		cond_resched();
 find_page:
@@ -2037,8 +2231,15 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 					ra, filp,
 					index, last_index - index);
 			page = find_get_page(mapping, index);
-			if (unlikely(page == NULL))
-				goto no_cached_page;
+			if (unlikely(page == NULL)) {
+				page = generic_file_buffered_read_no_cached_page(iocb, iter);
+				if (!page)
+					goto find_page;
+				if (IS_ERR(page)) {
+					error = PTR_ERR(page);
+					goto out;
+				}
+			}
 		}
 		if (PageReadahead(page)) {
 			page_cache_async_readahead(mapping,
@@ -2048,199 +2249,36 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		if (!PageUptodate(page)) {
 			if (iocb->ki_flags & IOCB_NOWAIT) {
 				put_page(page);
-				goto would_block;
-			}
-
-			/*
-			 * See comment in do_read_cache_page on why
-			 * wait_on_page_locked is used to avoid unnecessarily
-			 * serialisations and why it's safe.
-			 */
-			error = wait_on_page_locked_killable(page);
-			if (unlikely(error))
-				goto readpage_error;
-			if (PageUptodate(page))
-				goto page_ok;
-
-			if (inode->i_blkbits == PAGE_SHIFT ||
-					!mapping->a_ops->is_partially_uptodate)
-				goto page_not_up_to_date;
-			/* pipes can't handle partially uptodate pages */
-			if (unlikely(iov_iter_is_pipe(iter)))
-				goto page_not_up_to_date;
-			if (!trylock_page(page))
-				goto page_not_up_to_date;
-			/* Did it get truncated before we got the lock? */
-			if (!page->mapping)
-				goto page_not_up_to_date_locked;
-			if (!mapping->a_ops->is_partially_uptodate(page,
-							offset, iter->count))
-				goto page_not_up_to_date_locked;
-			unlock_page(page);
-		}
-page_ok:
-		/*
-		 * i_size must be checked after we know the page is Uptodate.
-		 *
-		 * Checking i_size after the check allows us to calculate
-		 * the correct value for "nr", which means the zero-filled
-		 * part of the page is not copied back to userspace (unless
-		 * another truncate extends the file - this is desired though).
-		 */
-
-		isize = i_size_read(inode);
-		end_index = (isize - 1) >> PAGE_SHIFT;
-		if (unlikely(!isize || index > end_index)) {
-			put_page(page);
-			goto out;
-		}
-
-		/* nr is the maximum number of bytes to copy from this page */
-		nr = PAGE_SIZE;
-		if (index == end_index) {
-			nr = ((isize - 1) & ~PAGE_MASK) + 1;
-			if (nr <= offset) {
-				put_page(page);
+				error = -EAGAIN;
 				goto out;
 			}
-		}
-		nr = nr - offset;
-
-		/* If users can be writing to this page using arbitrary
-		 * virtual addresses, take care about potential aliasing
-		 * before reading the page on the kernel side.
-		 */
-		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
-
-		/*
-		 * When a sequential read accesses a page several times,
-		 * only mark it as accessed the first time.
-		 */
-		if (prev_index != index || offset != prev_offset)
-			mark_page_accessed(page);
-		prev_index = index;
-
-		/*
-		 * Ok, we have the page, and it's up-to-date, so
-		 * now we can copy it to user space...
-		 */
-
-		ret = copy_page_to_iter(page, offset, nr, iter);
-		offset += ret;
-		index += offset >> PAGE_SHIFT;
-		offset &= ~PAGE_MASK;
-		prev_offset = offset;
 
-		put_page(page);
-		written += ret;
-		if (!iov_iter_count(iter))
-			goto out;
-		if (ret < nr) {
-			error = -EFAULT;
-			goto out;
-		}
-		continue;
-
-page_not_up_to_date:
-		/* Get exclusive access to the page ... */
-		error = lock_page_killable(page);
-		if (unlikely(error))
-			goto readpage_error;
-
-page_not_up_to_date_locked:
-		/* Did it get truncated before we got the lock? */
-		if (!page->mapping) {
-			unlock_page(page);
-			put_page(page);
-			continue;
-		}
-
-		/* Did somebody else fill it already? */
-		if (PageUptodate(page)) {
-			unlock_page(page);
-			goto page_ok;
-		}
-
-readpage:
-		/*
-		 * A previous I/O error may have been due to temporary
-		 * failures, eg. multipath errors.
-		 * PG_error will be set again if readpage fails.
-		 */
-		ClearPageError(page);
-		/* Start the actual read. The read will unlock the page. */
-		error = mapping->a_ops->readpage(filp, page);
-
-		if (unlikely(error)) {
-			if (error == AOP_TRUNCATED_PAGE) {
-				put_page(page);
-				error = 0;
+			page = generic_file_buffered_read_pagenotuptodate(filp,
+					iter, page, iocb->ki_pos, iter->count);
+			if (!page)
 				goto find_page;
+			if (IS_ERR(page)) {
+				error = PTR_ERR(page);
+				goto out;
 			}
-			goto readpage_error;
-		}
-
-		if (!PageUptodate(page)) {
-			error = lock_page_killable(page);
-			if (unlikely(error))
-				goto readpage_error;
-			if (!PageUptodate(page)) {
-				if (page->mapping == NULL) {
-					/*
-					 * invalidate_mapping_pages got it
-					 */
-					unlock_page(page);
-					put_page(page);
-					goto find_page;
-				}
-				unlock_page(page);
-				shrink_readahead_size_eio(ra);
-				error = -EIO;
-				goto readpage_error;
-			}
-			unlock_page(page);
 		}
 
-		goto page_ok;
-
-readpage_error:
-		/* UHHUH! A synchronous read error occurred. Report it */
+		error = generic_file_buffered_read_page_ok(iocb, iter, page);
 		put_page(page);
-		goto out;
 
-no_cached_page:
-		/*
-		 * Ok, it wasn't cached, so we need to create a new
-		 * page..
-		 */
-		page = page_cache_alloc(mapping);
-		if (!page) {
-			error = -ENOMEM;
-			goto out;
-		}
-		error = add_to_page_cache_lru(page, mapping, index,
-				mapping_gfp_constraint(mapping, GFP_KERNEL));
 		if (error) {
-			put_page(page);
-			if (error == -EEXIST) {
+			if (error > 0)
 				error = 0;
-				goto find_page;
-			}
 			goto out;
 		}
-		goto readpage;
 	}
 
 would_block:
 	error = -EAGAIN;
 out:
-	ra->prev_pos = prev_index;
-	ra->prev_pos <<= PAGE_SHIFT;
-	ra->prev_pos |= prev_offset;
-
-	*ppos = ((loff_t)index << PAGE_SHIFT) + offset;
 	file_accessed(filp);
+	written += orig_count - iov_iter_count(iter);
+
 	return written ? written : error;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig
  2020-06-18  1:05   ` Andrew Morton
  2020-06-19  3:20     ` [PATCH v3 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
  2020-06-19  3:20     ` [PATCH v3 1/2] fs: Break generic_file_buffered_read up into multiple functions Kent Overstreet
@ 2020-06-19  3:20     ` Kent Overstreet
  2 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2020-06-19  3:20 UTC (permalink / raw)
  To: linux-kernel, akpm; +Cc: Kent Overstreet, viro, linux-mm, linux-fsdevel

Convert generic_file_buffered_read() to get pages to read from in
batches, and then copy data to userspace from many pages at once - in
particular, we now don't touch any cachelines that might be contended
while we're in the loop to copy data to userspace.

This is a performance improvement on workloads that do buffered reads
with large blocksizes, and a very large performance improvement if that
file is also being accessed concurrently by different threads.

On smaller reads (512 bytes), there's a very small performance
improvement (1%, within the margin of error).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 mm/filemap.c | 279 +++++++++++++++++++++++++++++----------------------
 1 file changed, 159 insertions(+), 120 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index dc4a72042e..d8bd5e9647 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1975,67 +1975,6 @@ static void shrink_readahead_size_eio(struct file_ra_state *ra)
 	ra->ra_pages /= 4;
 }
 
-static int generic_file_buffered_read_page_ok(struct kiocb *iocb,
-			struct iov_iter *iter,
-			struct page *page)
-{
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &iocb->ki_filp->f_ra;
-	unsigned int offset = iocb->ki_pos & ~PAGE_MASK;
-	unsigned int bytes, copied;
-	loff_t isize, end_offset;
-
-	BUG_ON(iocb->ki_pos >> PAGE_SHIFT != page->index);
-
-	/*
-	 * i_size must be checked after we know the page is Uptodate.
-	 *
-	 * Checking i_size after the check allows us to calculate
-	 * the correct value for "bytes", which means the zero-filled
-	 * part of the page is not copied back to userspace (unless
-	 * another truncate extends the file - this is desired though).
-	 */
-
-	isize = i_size_read(inode);
-	if (unlikely(iocb->ki_pos >= isize))
-		return 1;
-
-	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
-
-	bytes = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
-
-	/* If users can be writing to this page using arbitrary
-	 * virtual addresses, take care about potential aliasing
-	 * before reading the page on the kernel side.
-	 */
-	if (mapping_writably_mapped(mapping))
-		flush_dcache_page(page);
-
-	/*
-	 * Ok, we have the page, and it's up-to-date, so
-	 * now we can copy it to user space...
-	 */
-
-	copied = copy_page_to_iter(page, offset, bytes, iter);
-
-	iocb->ki_pos += copied;
-
-	/*
-	 * When a sequential read accesses a page several times,
-	 * only mark it as accessed the first time.
-	 */
-	if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
-		mark_page_accessed(page);
-
-	ra->prev_pos = iocb->ki_pos;
-
-	if (copied < bytes)
-		return -EFAULT;
-
-	return !iov_iter_count(iter) || iocb->ki_pos == isize;
-}
-
 static struct page *
 generic_file_buffered_read_readpage(struct file *filp,
 				    struct address_space *mapping,
@@ -2179,6 +2118,83 @@ generic_file_buffered_read_no_cached_page(struct kiocb *iocb,
 	return generic_file_buffered_read_readpage(filp, mapping, page);
 }
 
+static int generic_file_buffered_read_get_pages(struct kiocb *iocb,
+						struct iov_iter *iter,
+						struct page **pages,
+						unsigned int nr)
+{
+	struct file *filp = iocb->ki_filp;
+	struct address_space *mapping = filp->f_mapping;
+	struct file_ra_state *ra = &filp->f_ra;
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
+	pgoff_t last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
+	int i, j, nr_got, err = 0;
+
+	nr = min_t(unsigned long, last_index - index, nr);
+find_page:
+	if (fatal_signal_pending(current))
+		return -EINTR;
+
+	nr_got = find_get_pages_contig(mapping, index, nr, pages);
+	if (nr_got)
+		goto got_pages;
+
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		return -EAGAIN;
+
+	page_cache_sync_readahead(mapping, ra, filp, index, last_index - index);
+
+	nr_got = find_get_pages_contig(mapping, index, nr, pages);
+	if (nr_got)
+		goto got_pages;
+
+	pages[0] = generic_file_buffered_read_no_cached_page(iocb, iter);
+	err = PTR_ERR_OR_ZERO(pages[0]);
+	if (!IS_ERR_OR_NULL(pages[0]))
+		nr_got = 1;
+got_pages:
+	for (i = 0; i < nr_got; i++) {
+		struct page *page = pages[i];
+		pgoff_t pg_index = index + i;
+		loff_t pg_pos = max(iocb->ki_pos,
+				    (loff_t) pg_index << PAGE_SHIFT);
+		loff_t pg_count = iocb->ki_pos + iter->count - pg_pos;
+
+		if (PageReadahead(page))
+			page_cache_async_readahead(mapping, ra, filp, page,
+					pg_index, last_index - pg_index);
+
+		if (!PageUptodate(page)) {
+			if (iocb->ki_flags & IOCB_NOWAIT) {
+				for (j = i; j < nr_got; j++)
+					put_page(pages[j]);
+				nr_got = i;
+				err = -EAGAIN;
+				break;
+			}
+
+			page = generic_file_buffered_read_pagenotuptodate(filp,
+						iter, page, pg_pos, pg_count);
+			if (IS_ERR_OR_NULL(page)) {
+				for (j = i + 1; j < nr_got; j++)
+					put_page(pages[j]);
+				nr_got = i;
+				err = PTR_ERR_OR_ZERO(page);
+				break;
+			}
+		}
+	}
+
+	if (likely(nr_got))
+		return nr_got;
+	if (err)
+		return err;
+	/*
+	 * No pages and no error means we raced and should retry:
+	 */
+	goto find_page;
+}
+
 /**
  * generic_file_buffered_read - generic file read routine
  * @iocb:	the iocb to read
@@ -2199,86 +2215,109 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		struct iov_iter *iter, ssize_t written)
 {
 	struct file *filp = iocb->ki_filp;
+	struct file_ra_state *ra = &filp->f_ra;
 	struct address_space *mapping = filp->f_mapping;
 	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &filp->f_ra;
 	size_t orig_count = iov_iter_count(iter);
-	pgoff_t last_index;
-	int error = 0;
+	struct page *pages_onstack[8], **pages = NULL;
+	unsigned int nr_pages = ((iocb->ki_pos + iter->count + PAGE_SIZE - 1) >> PAGE_SHIFT) -
+		(iocb->ki_pos >> PAGE_SHIFT);
+	int i, pg_nr, error = 0;
+	bool writably_mapped;
+	loff_t isize, end_offset;
 
 	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
 		return 0;
 	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
 
-	last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
+	if (nr_pages > ARRAY_SIZE(pages_onstack))
+		pages = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL);
 
-	for (;;) {
-		pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
-		struct page *page;
+	if (!pages) {
+		pages = pages_onstack;
+		nr_pages = min_t(unsigned int, nr_pages, ARRAY_SIZE(pages_onstack));
+	}
 
+	do {
 		cond_resched();
-find_page:
-		if (fatal_signal_pending(current)) {
-			error = -EINTR;
-			goto out;
-		}
 
-		page = find_get_page(mapping, index);
-		if (!page) {
-			if (iocb->ki_flags & IOCB_NOWAIT)
-				goto would_block;
-			page_cache_sync_readahead(mapping,
-					ra, filp,
-					index, last_index - index);
-			page = find_get_page(mapping, index);
-			if (unlikely(page == NULL)) {
-				page = generic_file_buffered_read_no_cached_page(iocb, iter);
-				if (!page)
-					goto find_page;
-				if (IS_ERR(page)) {
-					error = PTR_ERR(page);
-					goto out;
-				}
-			}
-		}
-		if (PageReadahead(page)) {
-			page_cache_async_readahead(mapping,
-					ra, filp, page,
-					index, last_index - index);
+		i = 0;
+		pg_nr = generic_file_buffered_read_get_pages(iocb, iter,
+							 pages, nr_pages);
+		if (pg_nr < 0) {
+			error = pg_nr;
+			break;
 		}
-		if (!PageUptodate(page)) {
-			if (iocb->ki_flags & IOCB_NOWAIT) {
-				put_page(page);
-				error = -EAGAIN;
-				goto out;
-			}
 
-			page = generic_file_buffered_read_pagenotuptodate(filp,
-					iter, page, iocb->ki_pos, iter->count);
-			if (!page)
-				goto find_page;
-			if (IS_ERR(page)) {
-				error = PTR_ERR(page);
-				goto out;
-			}
-		}
+		/*
+		 * i_size must be checked after we know the pages are Uptodate.
+		 *
+		 * Checking i_size after the check allows us to calculate
+		 * the correct value for "nr", which means the zero-filled
+		 * part of the page is not copied back to userspace (unless
+		 * another truncate extends the file - this is desired though).
+		 */
+		isize = i_size_read(inode);
+		if (unlikely(iocb->ki_pos >= isize))
+			goto put_pages;
 
-		error = generic_file_buffered_read_page_ok(iocb, iter, page);
-		put_page(page);
+		end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
 
-		if (error) {
-			if (error > 0)
-				error = 0;
-			goto out;
+		while ((iocb->ki_pos >> PAGE_SHIFT) + pg_nr >
+		       (end_offset + PAGE_SIZE - 1) >> PAGE_SHIFT)
+			put_page(pages[--pg_nr]);
+
+		/*
+		 * Once we start copying data, we don't want to be touching any
+		 * cachelines that might be contended:
+		 */
+		writably_mapped = mapping_writably_mapped(mapping);
+
+		/*
+		 * When a sequential read accesses a page several times, only
+		 * mark it as accessed the first time.
+		 */
+		if (iocb->ki_pos >> PAGE_SHIFT !=
+		    ra->prev_pos >> PAGE_SHIFT)
+			mark_page_accessed(pages[0]);
+		for (i = 1; i < pg_nr; i++)
+			mark_page_accessed(pages[i]);
+
+		for (i = 0; i < pg_nr; i++) {
+			unsigned int offset = iocb->ki_pos & ~PAGE_MASK;
+			unsigned int bytes = min_t(loff_t, end_offset - iocb->ki_pos,
+						   PAGE_SIZE - offset);
+			unsigned int copied;
+
+			/*
+			 * If users can be writing to this page using arbitrary
+			 * virtual addresses, take care about potential aliasing
+			 * before reading the page on the kernel side.
+			 */
+			if (writably_mapped)
+				flush_dcache_page(pages[i]);
+
+			copied = copy_page_to_iter(pages[i], offset, bytes, iter);
+
+			iocb->ki_pos += copied;
+			ra->prev_pos = iocb->ki_pos;
+
+			if (copied < bytes) {
+				error = -EFAULT;
+				break;
+			}
 		}
-	}
+put_pages:
+		for (i = 0; i < pg_nr; i++)
+			put_page(pages[i]);
+	} while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);
 
-would_block:
-	error = -EAGAIN;
-out:
 	file_accessed(filp);
 	written += orig_count - iov_iter_count(iter);
 
+	if (pages != pages_onstack)
+		kfree(pages);
+
 	return written ? written : error;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/2] generic_file_buffered_read() refactoring & optimization
  2020-06-19  3:20     ` [PATCH v3 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
@ 2020-06-19 12:59       ` Christoph Hellwig
  2020-06-19 18:44         ` Kent Overstreet
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-19 12:59 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-kernel, akpm, viro, linux-mm, linux-fsdevel

After looking at v2 yesterday I noticed a few things in the structure
that I really didn't like:

 - using a struct page * return value just to communicate status codes
 - the extremely long function names
 - a generally somewhat non-intuitive split of the helpers
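To illustrate the first point with a hypothetical helper (read_helper
stands in for either series' functions): a struct page * return forces
every caller to decode three outcomes from one pointer, where a plain
int status reads directly:

	/* ERR_PTR convention: page, NULL (retry), or ERR_PTR(-err) */
	page = read_helper(filp, page);
	if (!page)
		goto find_page;
	if (IS_ERR(page)) {
		error = PTR_ERR(page);
		goto out;
	}

	/* int convention: 0, -err, or AOP_TRUNCATED_PAGE (retry) */
	error = read_helper(filp, page);
	if (error == AOP_TRUNCATED_PAGE)
		goto find_page;
	if (error)
		goto out;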

I then hacked on top of it for a bit while sitting in a telephone
conference.  Below is my result, which passes a quick xfstests run.
Note that this includes the refactoring and the batch lookup changes
as I did it on top of your series:

diff --git a/mm/filemap.c b/mm/filemap.c
index f0ae9a6308cb4d..9e0aecd99950f4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1972,273 +1972,360 @@ static void shrink_readahead_size_eio(struct file_ra_state *ra)
 	ra->ra_pages /= 4;
 }
 
-/**
- * generic_file_buffered_read - generic file read routine
- * @iocb:	the iocb to read
- * @iter:	data destination
- * @written:	already copied
- *
- * This is a generic file read routine, and uses the
- * mapping->a_ops->readpage() function for the actual low-level stuff.
- *
- * This is really ugly. But the goto's actually try to clarify some
- * of the logic when it comes to error handling etc.
- *
- * Return:
- * * total number of bytes copied, including those the were already @written
- * * negative error code if nothing was copied
- */
-ssize_t generic_file_buffered_read(struct kiocb *iocb,
-		struct iov_iter *iter, ssize_t written)
+static inline pgoff_t filemap_last_index(struct kiocb *iocb,
+		struct iov_iter *iter)
 {
-	struct file *filp = iocb->ki_filp;
-	struct address_space *mapping = filp->f_mapping;
-	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &filp->f_ra;
-	loff_t *ppos = &iocb->ki_pos;
-	pgoff_t index;
-	pgoff_t last_index;
-	pgoff_t prev_index;
-	unsigned long offset;      /* offset into pagecache page */
-	unsigned int prev_offset;
-	int error = 0;
-
-	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
-		return 0;
-	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
-
-	index = *ppos >> PAGE_SHIFT;
-	prev_index = ra->prev_pos >> PAGE_SHIFT;
-	prev_offset = ra->prev_pos & (PAGE_SIZE-1);
-	last_index = (*ppos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
-	offset = *ppos & ~PAGE_MASK;
+	return (iocb->ki_pos + iter->count + PAGE_SIZE - 1) >> PAGE_SHIFT;
+}
 
-	for (;;) {
-		struct page *page;
-		pgoff_t end_index;
-		loff_t isize;
-		unsigned long nr, ret;
+static inline unsigned long filemap_nr_pages(struct kiocb *iocb,
+		struct iov_iter *iter)
+{
+	return filemap_last_index(iocb, iter) - (iocb->ki_pos >> PAGE_SHIFT);
+}
 
-		cond_resched();
-find_page:
-		if (fatal_signal_pending(current)) {
-			error = -EINTR;
-			goto out;
-		}
+static int __filemap_read_not_uptodate(struct file *file, struct page *page)
+{
+	int error;
 
-		page = find_get_page(mapping, index);
-		if (!page) {
-			if (iocb->ki_flags & IOCB_NOWAIT)
-				goto would_block;
-			page_cache_sync_readahead(mapping,
-					ra, filp,
-					index, last_index - index);
-			page = find_get_page(mapping, index);
-			if (unlikely(page == NULL))
-				goto no_cached_page;
-		}
-		if (PageReadahead(page)) {
-			page_cache_async_readahead(mapping,
-					ra, filp, page,
-					index, last_index - index);
-		}
-		if (!PageUptodate(page)) {
-			if (iocb->ki_flags & IOCB_NOWAIT) {
-				put_page(page);
-				goto would_block;
-			}
+	error = lock_page_killable(page);
+	if (error)
+		return error;
 
+	if (!PageUptodate(page)) {
+		if (!page->mapping) {
 			/*
-			 * See comment in do_read_cache_page on why
-			 * wait_on_page_locked is used to avoid unnecessarily
-			 * serialisations and why it's safe.
+			 * invalidate_mapping_pages got it
 			 */
-			error = wait_on_page_locked_killable(page);
-			if (unlikely(error))
-				goto readpage_error;
-			if (PageUptodate(page))
-				goto page_ok;
-
-			if (inode->i_blkbits == PAGE_SHIFT ||
-					!mapping->a_ops->is_partially_uptodate)
-				goto page_not_up_to_date;
-			/* pipes can't handle partially uptodate pages */
-			if (unlikely(iov_iter_is_pipe(iter)))
-				goto page_not_up_to_date;
-			if (!trylock_page(page))
-				goto page_not_up_to_date;
-			/* Did it get truncated before we got the lock? */
-			if (!page->mapping)
-				goto page_not_up_to_date_locked;
-			if (!mapping->a_ops->is_partially_uptodate(page,
-							offset, iter->count))
-				goto page_not_up_to_date_locked;
-			unlock_page(page);
+			error = AOP_TRUNCATED_PAGE;
+		} else {
+			error = -EIO;
 		}
-page_ok:
-		/*
-		 * i_size must be checked after we know the page is Uptodate.
-		 *
-		 * Checking i_size after the check allows us to calculate
-		 * the correct value for "nr", which means the zero-filled
-		 * part of the page is not copied back to userspace (unless
-		 * another truncate extends the file - this is desired though).
-		 */
+	}
 
-		isize = i_size_read(inode);
-		end_index = (isize - 1) >> PAGE_SHIFT;
-		if (unlikely(!isize || index > end_index)) {
-			put_page(page);
-			goto out;
-		}
+	unlock_page(page);
+	if (error == -EIO)
+		shrink_readahead_size_eio(&file->f_ra);
+	return error;
+}
 
-		/* nr is the maximum number of bytes to copy from this page */
-		nr = PAGE_SIZE;
-		if (index == end_index) {
-			nr = ((isize - 1) & ~PAGE_MASK) + 1;
-			if (nr <= offset) {
-				put_page(page);
-				goto out;
-			}
-		}
-		nr = nr - offset;
+static int __filemap_read_one_page(struct file *file, struct page *page)
+{
+	int error;
 
-		/* If users can be writing to this page using arbitrary
-		 * virtual addresses, take care about potential aliasing
-		 * before reading the page on the kernel side.
-		 */
-		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
+	/*
+	 * A previous I/O error may have been due to temporary failures, e.g.
+	 * multipath errors. PG_error will be set again if readpage fails.
+	 */
+	ClearPageError(page);
 
-		/*
-		 * When a sequential read accesses a page several times,
-		 * only mark it as accessed the first time.
-		 */
-		if (prev_index != index || offset != prev_offset)
-			mark_page_accessed(page);
-		prev_index = index;
+	/*
+	 * Start the actual read. The read will unlock the page.
+	 */
+	error = file->f_mapping->a_ops->readpage(file, page);
+	if (!error && !PageUptodate(page))
+		return __filemap_read_not_uptodate(file, page);
+	return error;
+}
 
-		/*
-		 * Ok, we have the page, and it's up-to-date, so
-		 * now we can copy it to user space...
-		 */
+static int filemap_read_one_page(struct kiocb *iocb, struct iov_iter *iter,
+		struct page *page, pgoff_t index)
+{
+	struct file *file = iocb->ki_filp;
+	struct address_space *mapping = file->f_mapping;
+	loff_t pos = max(iocb->ki_pos, (loff_t)index << PAGE_SHIFT);
+	loff_t count = iocb->ki_pos + iter->count - pos;
+	int error;
 
-		ret = copy_page_to_iter(page, offset, nr, iter);
-		offset += ret;
-		index += offset >> PAGE_SHIFT;
-		offset &= ~PAGE_MASK;
-		prev_offset = offset;
+	/*
+	 * See comment in do_read_cache_page on why wait_on_page_locked is used
+	 * to avoid unnecessarily serialisations and why it's safe.
+	 */
+	error = wait_on_page_locked_killable(page);
+	if (unlikely(error))
+		goto out_put_page;
 
-		put_page(page);
-		written += ret;
-		if (!iov_iter_count(iter))
-			goto out;
-		if (ret < nr) {
-			error = -EFAULT;
-			goto out;
-		}
-		continue;
+	if (PageUptodate(page))
+		return 0;
+
+	if (mapping->host->i_blkbits == PAGE_SHIFT ||
+	    !mapping->a_ops->is_partially_uptodate)
+		goto page_not_up_to_date;
+	/* pipes can't handle partially uptodate pages */
+	if (unlikely(iov_iter_is_pipe(iter)))
+		goto page_not_up_to_date;
+	if (!trylock_page(page))
+		goto page_not_up_to_date;
+	/* Did it get truncated before we got the lock? */
+	if (!page->mapping)
+		goto page_not_up_to_date_locked;
+	if (!mapping->a_ops->is_partially_uptodate(page, pos & ~PAGE_MASK,
+			count))
+		goto page_not_up_to_date_locked;
+done:
+	unlock_page(page);
+	return 0;
 
 page_not_up_to_date:
-		/* Get exclusive access to the page ... */
-		error = lock_page_killable(page);
-		if (unlikely(error))
-			goto readpage_error;
+	/* Get exclusive access to the page ... */
+	error = lock_page_killable(page);
+	if (unlikely(error))
+		goto out_put_page;
 
 page_not_up_to_date_locked:
-		/* Did it get truncated before we got the lock? */
-		if (!page->mapping) {
-			unlock_page(page);
-			put_page(page);
+	/* Did it get truncated before we got the lock? */
+	if (!page->mapping) {
+		error = AOP_TRUNCATED_PAGE;
+		unlock_page(page);
+		goto out_put_page;
+	}
+
+	/* Did somebody else fill it already? */
+	if (PageUptodate(page))
+		goto done;
+
+	error = __filemap_read_one_page(file, page);
+	if (error)
+		goto out_put_page;
+	return 0;
+
+out_put_page:
+	put_page(page);
+	return error;
+}
+
+static int filemap_read_get_pages(struct kiocb *iocb, struct iov_iter *iter,
+		struct page **pages, unsigned long max_pages)
+{
+	struct file *file = iocb->ki_filp;
+	struct address_space *mapping = file->f_mapping;
+	unsigned long nr = min(filemap_nr_pages(iocb, iter), max_pages);
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
+	int ret;
+
+	if (fatal_signal_pending(current))
+		return -EINTR;
+
+	ret = find_get_pages_contig(mapping, index, nr, pages);
+	if (ret)
+		return ret;
+
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		return -EAGAIN;
+
+	page_cache_sync_readahead(mapping, &file->f_ra, file, index,
+				  filemap_nr_pages(iocb, iter));
+
+	ret = find_get_pages_contig(mapping, index, nr, pages);
+	if (ret)
+		return ret;
+
+	/*
+	 * Ok, it wasn't cached, so we need to create a new page..
+	 */
+	pages[0] = page_cache_alloc(mapping);
+	if (!pages[0])
+		return -ENOMEM;
+
+	ret = add_to_page_cache_lru(pages[0], mapping, index,
+			mapping_gfp_constraint(mapping, GFP_KERNEL));
+	if (ret) {
+		if (ret == -EEXIST)
+			ret = 0;
+		goto out_put_page;
+	}
+
+	ret = __filemap_read_one_page(file, pages[0]);
+	if (ret) {
+		if (ret == AOP_TRUNCATED_PAGE)
+			ret = 0;
+		goto out_put_page;
+	}
+
+	return 1;
+
+out_put_page:
+	put_page(pages[0]);
+	return ret;
+}
+
+static int filemap_read_pages(struct kiocb *iocb, struct iov_iter *iter,
+		struct page **pages, unsigned int nr_pages)
+{
+	struct file *file = iocb->ki_filp;
+	struct address_space *mapping = file->f_mapping;
+	pgoff_t last_index = filemap_last_index(iocb, iter);
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
+	unsigned int cur_page, j;
+	int err = 0;
+
+	for (cur_page = 0; cur_page < nr_pages; cur_page++, index++) {
+		struct page *page = pages[cur_page];
+
+		if (PageReadahead(page))
+			page_cache_async_readahead(mapping, &file->f_ra, file,
+					page, index, last_index - index);
+
+		if (PageUptodate(page))
 			continue;
-		}
 
-		/* Did somebody else fill it already? */
-		if (PageUptodate(page)) {
-			unlock_page(page);
-			goto page_ok;
+		if (iocb->ki_flags & IOCB_NOWAIT) {
+			put_page(page);
+			err = -EAGAIN;
+			goto out_put_pages;
 		}
 
-readpage:
-		/*
-		 * A previous I/O error may have been due to temporary
-		 * failures, eg. multipath errors.
-		 * PG_error will be set again if readpage fails.
-		 */
-		ClearPageError(page);
-		/* Start the actual read. The read will unlock the page. */
-		error = mapping->a_ops->readpage(filp, page);
+		err = filemap_read_one_page(iocb, iter, page, index);
+		if (err)
+			goto out_put_pages;
+	}
 
-		if (unlikely(error)) {
-			if (error == AOP_TRUNCATED_PAGE) {
-				put_page(page);
-				error = 0;
-				goto find_page;
-			}
-			goto readpage_error;
-		}
+	return cur_page;
 
-		if (!PageUptodate(page)) {
-			error = lock_page_killable(page);
-			if (unlikely(error))
-				goto readpage_error;
-			if (!PageUptodate(page)) {
-				if (page->mapping == NULL) {
-					/*
-					 * invalidate_mapping_pages got it
-					 */
-					unlock_page(page);
-					put_page(page);
-					goto find_page;
-				}
-				unlock_page(page);
-				shrink_readahead_size_eio(ra);
-				error = -EIO;
-				goto readpage_error;
-			}
-			unlock_page(page);
-		}
+out_put_pages:
+	for (j = cur_page + 1; j < nr_pages; j++)
+		put_page(pages[j]);
+	if (cur_page == 0) {
+		if (err == AOP_TRUNCATED_PAGE)
+			err = 0;
+		return err;
+	}
+	return cur_page;
+}
 
-		goto page_ok;
+static int filemap_do_read(struct kiocb *iocb, struct iov_iter *iter,
+		struct page **pages, unsigned long max_pages)
+{
+	struct file *filp = iocb->ki_filp;
+	struct address_space *mapping = filp->f_mapping;
+	struct file_ra_state *ra = &filp->f_ra;
+	loff_t isize, end_offset;
+	int nr_pages, i;
+	bool writably_mapped;
 
-readpage_error:
-		/* UHHUH! A synchronous read error occurred. Report it */
-		put_page(page);
-		goto out;
+	cond_resched();
+
+was_truncated:
+	nr_pages = filemap_read_get_pages(iocb, iter, pages, max_pages);
+	if (nr_pages > 0)
+		nr_pages = filemap_read_pages(iocb, iter, pages, nr_pages);
+	if (nr_pages == 0)
+		goto was_truncated;
+	if (nr_pages < 0)
+		return nr_pages;
+
+	/*
+	 * i_size must be checked after we know the pages are Uptodate.
+	 *
+	 * Checking i_size after the check allows us to calculate the correct
+	 * value for "nr_pages", which means the zero-filled part of the page is
+	 * not copied back to userspace (unless another truncate extends the
+	 * file - this is desired though).
+	 */
+	isize = i_size_read(mapping->host);
+	if (unlikely(iocb->ki_pos >= isize))
+		goto put_pages;
+
+	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
+
+	while ((iocb->ki_pos >> PAGE_SHIFT) + nr_pages >
+	       (end_offset + PAGE_SIZE - 1) >> PAGE_SHIFT)
+		put_page(pages[--nr_pages]);
+
+	/*
+	 * Once we start copying data, we don't want to be touching any
+	 * cachelines that might be contended:
+	 */
+	writably_mapped = mapping_writably_mapped(mapping);
+
+	/*
+	 * When a sequential read accesses a page several times, only mark it as
+	 * accessed the first time.
+	 */
+	if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
+		mark_page_accessed(pages[0]);
+	for (i = 1; i < nr_pages; i++)
+		mark_page_accessed(pages[i]);
+
+	for (i = 0; i < nr_pages; i++) {
+		unsigned offset = iocb->ki_pos & ~PAGE_MASK;
+		unsigned bytes = min_t(loff_t, end_offset - iocb->ki_pos,
+					       PAGE_SIZE - offset);
+		unsigned copied;
 
-no_cached_page:
 		/*
-		 * Ok, it wasn't cached, so we need to create a new
-		 * page..
+		 * If users can be writing to this page using arbitrary virtual
+		 * addresses, take care about potential aliasing before reading
+		 * the page on the kernel side.
 		 */
-		page = page_cache_alloc(mapping);
-		if (!page) {
-			error = -ENOMEM;
-			goto out;
-		}
-		error = add_to_page_cache_lru(page, mapping, index,
-				mapping_gfp_constraint(mapping, GFP_KERNEL));
-		if (error) {
-			put_page(page);
-			if (error == -EEXIST) {
-				error = 0;
-				goto find_page;
-			}
-			goto out;
-		}
-		goto readpage;
+		if (writably_mapped)
+			flush_dcache_page(pages[i]);
+
+		copied = copy_page_to_iter(pages[i], offset, bytes, iter);
+
+		iocb->ki_pos += copied;
+		ra->prev_pos = iocb->ki_pos;
+
+		if (copied < bytes)
+			return -EFAULT;
 	}
 
-would_block:
-	error = -EAGAIN;
-out:
-	ra->prev_pos = prev_index;
-	ra->prev_pos <<= PAGE_SHIFT;
-	ra->prev_pos |= prev_offset;
+put_pages:
+	for (i = 0; i < nr_pages; i++)
+		put_page(pages[i]);
+	return 0;
+}
+
+/**
+ * generic_file_buffered_read - generic file read routine
+ * @iocb:	the iocb to read
+ * @iter:	data destination
+ * @written:	already copied
+ *
+ * This is a generic file read routine, and uses the
+ * mapping->a_ops->readpage() function for the actual low-level stuff.
+ *
+ * Return:
+ * * total number of bytes copied, including those that were already @written
+ * * negative error code if nothing was copied
+ */
+ssize_t generic_file_buffered_read(struct kiocb *iocb,
+		struct iov_iter *iter, ssize_t written)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+	size_t orig_count = iov_iter_count(iter);
+	struct inode *inode = mapping->host;
+	struct page *page_array[8], **pages = NULL;
+	unsigned max_pages = filemap_nr_pages(iocb, iter);
+	int error;
 
-	*ppos = ((loff_t)index << PAGE_SHIFT) + offset;
-	file_accessed(filp);
-	return written ? written : error;
+	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
+		return 0;
+	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
+
+	if (max_pages > ARRAY_SIZE(page_array))
+		pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
+
+	if (!pages) {
+		pages = page_array;
+		max_pages = ARRAY_SIZE(page_array);
+	}
+
+	do {
+		error = filemap_do_read(iocb, iter, pages, max_pages);
+		if (error)
+			break;
+	} while (iov_iter_count(iter) && iocb->ki_pos < i_size_read(inode));
+
+	file_accessed(iocb->ki_filp);
+	written += orig_count - iov_iter_count(iter);
+
+	if (pages != page_array)
+		kfree(pages);
+
+	if (unlikely(!written))
+		return error;
+	return written;
 }
 EXPORT_SYMBOL_GPL(generic_file_buffered_read);
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/2] generic_file_buffered_read() refactoring & optimization
  2020-06-19 12:59       ` Christoph Hellwig
@ 2020-06-19 18:44         ` Kent Overstreet
  0 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2020-06-19 18:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, akpm, viro, linux-mm, linux-fsdevel

On Fri, Jun 19, 2020 at 05:59:20AM -0700, Christoph Hellwig wrote:
> After looking at v2 yesterday I noticed a few things in the structure
> that I really didn't like:
> 
>  - using a struct page * return value just to communicate status codes
>  - the extremely long function names
>  - a generally somewhat non-intuitive split of the helpers
> 
> I then hacked on top of it for a bit while sitting in a telephone
> conference.  Below is my result, which passes a quick xfstests run.
> Note that this includes the refactoring and the batch lookup changes
> as I did it on top of your series:

I like it - I can't get your patch to apply to anything, though. Do you have
it up in a git repo anywhere?

> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index f0ae9a6308cb4d..9e0aecd99950f4 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1972,273 +1972,360 @@ static void shrink_readahead_size_eio(struct file_ra_state *ra)
>  	ra->ra_pages /= 4;
>  }
>  
> -/**
> - * generic_file_buffered_read - generic file read routine
> - * @iocb:	the iocb to read
> - * @iter:	data destination
> - * @written:	already copied
> - *
> - * This is a generic file read routine, and uses the
> - * mapping->a_ops->readpage() function for the actual low-level stuff.
> - *
> - * This is really ugly. But the goto's actually try to clarify some
> - * of the logic when it comes to error handling etc.
> - *
> - * Return:
> - * * total number of bytes copied, including those the were already @written
> - * * negative error code if nothing was copied
> - */
> -ssize_t generic_file_buffered_read(struct kiocb *iocb,
> -		struct iov_iter *iter, ssize_t written)
> +static inline pgoff_t filemap_last_index(struct kiocb *iocb,
> +		struct iov_iter *iter)
>  {
> -	struct file *filp = iocb->ki_filp;
> -	struct address_space *mapping = filp->f_mapping;
> -	struct inode *inode = mapping->host;
> -	struct file_ra_state *ra = &filp->f_ra;
> -	loff_t *ppos = &iocb->ki_pos;
> -	pgoff_t index;
> -	pgoff_t last_index;
> -	pgoff_t prev_index;
> -	unsigned long offset;      /* offset into pagecache page */
> -	unsigned int prev_offset;
> -	int error = 0;
> -
> -	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
> -		return 0;
> -	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
> -
> -	index = *ppos >> PAGE_SHIFT;
> -	prev_index = ra->prev_pos >> PAGE_SHIFT;
> -	prev_offset = ra->prev_pos & (PAGE_SIZE-1);
> -	last_index = (*ppos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
> -	offset = *ppos & ~PAGE_MASK;
> +	return (iocb->ki_pos + iter->count + PAGE_SIZE - 1) >> PAGE_SHIFT;
> +}
>  
> -	for (;;) {
> -		struct page *page;
> -		pgoff_t end_index;
> -		loff_t isize;
> -		unsigned long nr, ret;
> +static inline unsigned long filemap_nr_pages(struct kiocb *iocb,
> +		struct iov_iter *iter)
> +{
> +	return filemap_last_index(iocb, iter) - (iocb->ki_pos >> PAGE_SHIFT);
> +}
>  
> -		cond_resched();
> -find_page:
> -		if (fatal_signal_pending(current)) {
> -			error = -EINTR;
> -			goto out;
> -		}
> +static int __filemap_read_not_uptodate(struct file *file, struct page *page)
> +{
> +	int error;
>  
> -		page = find_get_page(mapping, index);
> -		if (!page) {
> -			if (iocb->ki_flags & IOCB_NOWAIT)
> -				goto would_block;
> -			page_cache_sync_readahead(mapping,
> -					ra, filp,
> -					index, last_index - index);
> -			page = find_get_page(mapping, index);
> -			if (unlikely(page == NULL))
> -				goto no_cached_page;
> -		}
> -		if (PageReadahead(page)) {
> -			page_cache_async_readahead(mapping,
> -					ra, filp, page,
> -					index, last_index - index);
> -		}
> -		if (!PageUptodate(page)) {
> -			if (iocb->ki_flags & IOCB_NOWAIT) {
> -				put_page(page);
> -				goto would_block;
> -			}
> +	error = lock_page_killable(page);
> +	if (error)
> +		return error;
>  
> +	if (!PageUptodate(page)) {
> +		if (!page->mapping) {
>  			/*
> -			 * See comment in do_read_cache_page on why
> -			 * wait_on_page_locked is used to avoid unnecessarily
> -			 * serialisations and why it's safe.
> +			 * invalidate_mapping_pages got it
>  			 */
> -			error = wait_on_page_locked_killable(page);
> -			if (unlikely(error))
> -				goto readpage_error;
> -			if (PageUptodate(page))
> -				goto page_ok;
> -
> -			if (inode->i_blkbits == PAGE_SHIFT ||
> -					!mapping->a_ops->is_partially_uptodate)
> -				goto page_not_up_to_date;
> -			/* pipes can't handle partially uptodate pages */
> -			if (unlikely(iov_iter_is_pipe(iter)))
> -				goto page_not_up_to_date;
> -			if (!trylock_page(page))
> -				goto page_not_up_to_date;
> -			/* Did it get truncated before we got the lock? */
> -			if (!page->mapping)
> -				goto page_not_up_to_date_locked;
> -			if (!mapping->a_ops->is_partially_uptodate(page,
> -							offset, iter->count))
> -				goto page_not_up_to_date_locked;
> -			unlock_page(page);
> +			error = AOP_TRUNCATED_PAGE;
> +		} else {
> +			error = -EIO;
>  		}
> -page_ok:
> -		/*
> -		 * i_size must be checked after we know the page is Uptodate.
> -		 *
> -		 * Checking i_size after the check allows us to calculate
> -		 * the correct value for "nr", which means the zero-filled
> -		 * part of the page is not copied back to userspace (unless
> -		 * another truncate extends the file - this is desired though).
> -		 */
> +	}
>  
> -		isize = i_size_read(inode);
> -		end_index = (isize - 1) >> PAGE_SHIFT;
> -		if (unlikely(!isize || index > end_index)) {
> -			put_page(page);
> -			goto out;
> -		}
> +	unlock_page(page);
> +	if (error == -EIO)
> +		shrink_readahead_size_eio(&file->f_ra);
> +	return error;
> +}
>  
> -		/* nr is the maximum number of bytes to copy from this page */
> -		nr = PAGE_SIZE;
> -		if (index == end_index) {
> -			nr = ((isize - 1) & ~PAGE_MASK) + 1;
> -			if (nr <= offset) {
> -				put_page(page);
> -				goto out;
> -			}
> -		}
> -		nr = nr - offset;
> +static int __filemap_read_one_page(struct file *file, struct page *page)
> +{
> +	int error;
>  
> -		/* If users can be writing to this page using arbitrary
> -		 * virtual addresses, take care about potential aliasing
> -		 * before reading the page on the kernel side.
> -		 */
> -		if (mapping_writably_mapped(mapping))
> -			flush_dcache_page(page);
> +	/*
> +	 * A previous I/O error may have been due to temporary failures, e.g.
> +	 * multipath errors. PG_error will be set again if readpage fails.
> +	 */
> +	ClearPageError(page);
>  
> -		/*
> -		 * When a sequential read accesses a page several times,
> -		 * only mark it as accessed the first time.
> -		 */
> -		if (prev_index != index || offset != prev_offset)
> -			mark_page_accessed(page);
> -		prev_index = index;
> +	/*
> +	 * Start the actual read. The read will unlock the page.
> +	 */
> +	error = file->f_mapping->a_ops->readpage(file, page);
> +	if (!error && !PageUptodate(page))
> +		return __filemap_read_not_uptodate(file, page);
> +	return error;
> +}
>  
> -		/*
> -		 * Ok, we have the page, and it's up-to-date, so
> -		 * now we can copy it to user space...
> -		 */
> +static int filemap_read_one_page(struct kiocb *iocb, struct iov_iter *iter,
> +		struct page *page, pgoff_t index)
> +{
> +	struct file *file = iocb->ki_filp;
> +	struct address_space *mapping = file->f_mapping;
> +	loff_t pos = max(iocb->ki_pos, (loff_t)index << PAGE_SHIFT);
> +	loff_t count = iocb->ki_pos + iter->count - pos;
> +	int error;
>  
> -		ret = copy_page_to_iter(page, offset, nr, iter);
> -		offset += ret;
> -		index += offset >> PAGE_SHIFT;
> -		offset &= ~PAGE_MASK;
> -		prev_offset = offset;
> +	/*
> +	 * See comment in do_read_cache_page on why wait_on_page_locked is used
> +	 * to avoid unnecessarily serialisations and why it's safe.
> +	 */
> +	error = wait_on_page_locked_killable(page);
> +	if (unlikely(error))
> +		goto out_put_page;
>  
> -		put_page(page);
> -		written += ret;
> -		if (!iov_iter_count(iter))
> -			goto out;
> -		if (ret < nr) {
> -			error = -EFAULT;
> -			goto out;
> -		}
> -		continue;
> +	if (PageUptodate(page))
> +		return 0;
> +
> +	if (mapping->host->i_blkbits == PAGE_SHIFT ||
> +	    !mapping->a_ops->is_partially_uptodate)
> +		goto page_not_up_to_date;
> +	/* pipes can't handle partially uptodate pages */
> +	if (unlikely(iov_iter_is_pipe(iter)))
> +		goto page_not_up_to_date;
> +	if (!trylock_page(page))
> +		goto page_not_up_to_date;
> +	/* Did it get truncated before we got the lock? */
> +	if (!page->mapping)
> +		goto page_not_up_to_date_locked;
> +	if (!mapping->a_ops->is_partially_uptodate(page, pos & ~PAGE_MASK,
> +			count))
> +		goto page_not_up_to_date_locked;
> +done:
> +	unlock_page(page);
> +	return 0;
>  
>  page_not_up_to_date:
> -		/* Get exclusive access to the page ... */
> -		error = lock_page_killable(page);
> -		if (unlikely(error))
> -			goto readpage_error;
> +	/* Get exclusive access to the page ... */
> +	error = lock_page_killable(page);
> +	if (unlikely(error))
> +		goto out_put_page;
>  
>  page_not_up_to_date_locked:
> -		/* Did it get truncated before we got the lock? */
> -		if (!page->mapping) {
> -			unlock_page(page);
> -			put_page(page);
> +	/* Did it get truncated before we got the lock? */
> +	if (!page->mapping) {
> +		error = AOP_TRUNCATED_PAGE;
> +		unlock_page(page);
> +		goto out_put_page;
> +	}
> +
> +	/* Did somebody else fill it already? */
> +	if (PageUptodate(page))
> +		goto done;
> +
> +	error = __filemap_read_one_page(file, page);
> +	if (error)
> +		goto out_put_page;
> +	return 0;
> +
> +out_put_page:
> +	put_page(page);
> +	return error;
> +}
> +
> +static int filemap_read_get_pages(struct kiocb *iocb, struct iov_iter *iter,
> +		struct page **pages, unsigned long max_pages)
> +{
> +	struct file *file = iocb->ki_filp;
> +	struct address_space *mapping = file->f_mapping;
> +	unsigned long nr = min(filemap_nr_pages(iocb, iter), max_pages);
> +	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
> +	int ret;
> +
> +	if (fatal_signal_pending(current))
> +		return -EINTR;
> +
> +	ret = find_get_pages_contig(mapping, index, nr, pages);
> +	if (ret)
> +		return ret;
> +
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		return -EAGAIN;
> +
> +	page_cache_sync_readahead(mapping, &file->f_ra, file, index,
> +				  filemap_nr_pages(iocb, iter));
> +
> +	ret = find_get_pages_contig(mapping, index, nr, pages);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Ok, it wasn't cached, so we need to create a new page..
> +	 */
> +	pages[0] = page_cache_alloc(mapping);
> +	if (!pages[0])
> +		return -ENOMEM;
> +
> +	ret = add_to_page_cache_lru(pages[0], mapping, index,
> +			mapping_gfp_constraint(mapping, GFP_KERNEL));
> +	if (ret) {
> +		if (ret == -EEXIST)
> +			ret = 0;
> +		goto out_put_page;
> +	}
> +
> +	ret = __filemap_read_one_page(file, pages[0]);
> +	if (ret) {
> +		if (ret == AOP_TRUNCATED_PAGE)
> +			ret = 0;
> +		goto out_put_page;
> +	}
> +
> +	return 1;
> +
> +out_put_page:
> +	put_page(pages[0]);
> +	return ret;
> +}
> +
> +static int filemap_read_pages(struct kiocb *iocb, struct iov_iter *iter,
> +		struct page **pages, unsigned int nr_pages)
> +{
> +	struct file *file = iocb->ki_filp;
> +	struct address_space *mapping = file->f_mapping;
> +	pgoff_t last_index = filemap_last_index(iocb, iter);
> +	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
> +	unsigned int cur_page, j;
> +	int err = 0;
> +
> +	for (cur_page = 0; cur_page < nr_pages; cur_page++, index++) {
> +		struct page *page = pages[cur_page];
> +
> +		if (PageReadahead(page))
> +			page_cache_async_readahead(mapping, &file->f_ra, file,
> +					page, index, last_index - index);
> +
> +		if (PageUptodate(page))
>  			continue;
> -		}
>  
> -		/* Did somebody else fill it already? */
> -		if (PageUptodate(page)) {
> -			unlock_page(page);
> -			goto page_ok;
> +		if (iocb->ki_flags & IOCB_NOWAIT) {
> +			put_page(page);
> +			err = -EAGAIN;
> +			goto out_put_pages;
>  		}
>  
> -readpage:
> -		/*
> -		 * A previous I/O error may have been due to temporary
> -		 * failures, eg. multipath errors.
> -		 * PG_error will be set again if readpage fails.
> -		 */
> -		ClearPageError(page);
> -		/* Start the actual read. The read will unlock the page. */
> -		error = mapping->a_ops->readpage(filp, page);
> +		err = filemap_read_one_page(iocb, iter, page, index);
> +		if (err)
> +			goto out_put_pages;
> +	}
>  
> -		if (unlikely(error)) {
> -			if (error == AOP_TRUNCATED_PAGE) {
> -				put_page(page);
> -				error = 0;
> -				goto find_page;
> -			}
> -			goto readpage_error;
> -		}
> +	return cur_page;
>  
> -		if (!PageUptodate(page)) {
> -			error = lock_page_killable(page);
> -			if (unlikely(error))
> -				goto readpage_error;
> -			if (!PageUptodate(page)) {
> -				if (page->mapping == NULL) {
> -					/*
> -					 * invalidate_mapping_pages got it
> -					 */
> -					unlock_page(page);
> -					put_page(page);
> -					goto find_page;
> -				}
> -				unlock_page(page);
> -				shrink_readahead_size_eio(ra);
> -				error = -EIO;
> -				goto readpage_error;
> -			}
> -			unlock_page(page);
> -		}
> +out_put_pages:
> +	for (j = cur_page + 1; j < nr_pages; j++)
> +		put_page(pages[j]);
> +	if (cur_page == 0) {
> +		if (err == AOP_TRUNCATED_PAGE)
> +			err = 0;
> +		return err;
> +	}
> +	return cur_page;
> +}
>  
> -		goto page_ok;
> +static int filemap_do_read(struct kiocb *iocb, struct iov_iter *iter,
> +		struct page **pages, unsigned long max_pages)
> +{
> +	struct file *filp = iocb->ki_filp;
> +	struct address_space *mapping = filp->f_mapping;
> +	struct file_ra_state *ra = &filp->f_ra;
> +	loff_t isize, end_offset;
> +	int nr_pages, i;
> +	bool writably_mapped;
>  
> -readpage_error:
> -		/* UHHUH! A synchronous read error occurred. Report it */
> -		put_page(page);
> -		goto out;
> +	cond_resched();
> +
> +was_truncated:
> +	nr_pages = filemap_read_get_pages(iocb, iter, pages, max_pages);
> +	if (nr_pages > 0)
> +		nr_pages = filemap_read_pages(iocb, iter, pages, nr_pages);
> +	if (nr_pages == 0)
> +		goto was_truncated;
> +	if (nr_pages < 0)
> +		return nr_pages;
> +
> +	/*
> +	 * i_size must be checked after we know the pages are Uptodate.
> +	 *
> +	 * Checking i_size after the check allows us to calculate the correct
> +	 * value for "nr_pages", which means the zero-filled part of the page is
> +	 * not copied back to userspace (unless another truncate extends the
> +	 * file - this is desired though).
> +	 */
> +	isize = i_size_read(mapping->host);
> +	if (unlikely(iocb->ki_pos >= isize))
> +		goto put_pages;
> +
> +	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
> +
> +	while ((iocb->ki_pos >> PAGE_SHIFT) + nr_pages >
> +	       (end_offset + PAGE_SIZE - 1) >> PAGE_SHIFT)
> +		put_page(pages[--nr_pages]);
> +
> +	/*
> +	 * Once we start copying data, we don't want to be touching any
> +	 * cachelines that might be contended:
> +	 */
> +	writably_mapped = mapping_writably_mapped(mapping);
> +
> +	/*
> +	 * When a sequential read accesses a page several times, only mark it as
> +	 * accessed the first time.
> +	 */
> +	if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
> +		mark_page_accessed(pages[0]);
> +	for (i = 1; i < nr_pages; i++)
> +		mark_page_accessed(pages[i]);
> +
> +	for (i = 0; i < nr_pages; i++) {
> +		unsigned offset = iocb->ki_pos & ~PAGE_MASK;
> +		unsigned bytes = min_t(loff_t, end_offset - iocb->ki_pos,
> +					       PAGE_SIZE - offset);
> +		unsigned copied;
>  
> -no_cached_page:
>  		/*
> -		 * Ok, it wasn't cached, so we need to create a new
> -		 * page..
> +		 * If users can be writing to this page using arbitrary virtual
> +		 * addresses, take care about potential aliasing before reading
> +		 * the page on the kernel side.
>  		 */
> -		page = page_cache_alloc(mapping);
> -		if (!page) {
> -			error = -ENOMEM;
> -			goto out;
> -		}
> -		error = add_to_page_cache_lru(page, mapping, index,
> -				mapping_gfp_constraint(mapping, GFP_KERNEL));
> -		if (error) {
> -			put_page(page);
> -			if (error == -EEXIST) {
> -				error = 0;
> -				goto find_page;
> -			}
> -			goto out;
> -		}
> -		goto readpage;
> +		if (writably_mapped)
> +			flush_dcache_page(pages[i]);
> +
> +		copied = copy_page_to_iter(pages[i], offset, bytes, iter);
> +
> +		iocb->ki_pos += copied;
> +		ra->prev_pos = iocb->ki_pos;
> +
> +		if (copied < bytes)
> +			return -EFAULT;
>  	}
>  
> -would_block:
> -	error = -EAGAIN;
> -out:
> -	ra->prev_pos = prev_index;
> -	ra->prev_pos <<= PAGE_SHIFT;
> -	ra->prev_pos |= prev_offset;
> +put_pages:
> +	for (i = 0; i < nr_pages; i++)
> +		put_page(pages[i]);
> +	return 0;
> +}
> +
> +/**
> + * generic_file_buffered_read - generic file read routine
> + * @iocb:	the iocb to read
> + * @iter:	data destination
> + * @written:	already copied
> + *
> + * This is a generic file read routine, and uses the
> + * mapping->a_ops->readpage() function for the actual low-level stuff.
> + *
> + * Return:
> + * * total number of bytes copied, including those that were already @written
> + * * negative error code if nothing was copied
> + */
> +ssize_t generic_file_buffered_read(struct kiocb *iocb,
> +		struct iov_iter *iter, ssize_t written)
> +{
> +	struct address_space *mapping = iocb->ki_filp->f_mapping;
> +	size_t orig_count = iov_iter_count(iter);
> +	struct inode *inode = mapping->host;
> +	struct page *page_array[8], **pages = NULL;
> +	unsigned max_pages = filemap_nr_pages(iocb, iter);
> +	int error;
>  
> -	*ppos = ((loff_t)index << PAGE_SHIFT) + offset;
> -	file_accessed(filp);
> -	return written ? written : error;
> +	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
> +		return 0;
> +	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
> +
> +	if (max_pages > ARRAY_SIZE(page_array))
> +		pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> +
> +	if (!pages) {
> +		pages = page_array;
> +		max_pages = ARRAY_SIZE(page_array);
> +	}
> +
> +	do {
> +		error = filemap_do_read(iocb, iter, pages, max_pages);
> +		if (error)
> +			break;
> +	} while (iov_iter_count(iter) && iocb->ki_pos < i_size_read(inode));
> +
> +	file_accessed(iocb->ki_filp);
> +	written += orig_count - iov_iter_count(iter);
> +
> +	if (pages != page_array)
> +		kfree(pages);
> +
> +	if (unlikely(!written))
> +		return error;
> +	return written;
>  }
>  EXPORT_SYMBOL_GPL(generic_file_buffered_read);
>  
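
To make the EOF-trimming pass in the quoted filemap_do_read() concrete
(a worked example with made-up numbers, not from the patch): with 4 KiB
pages, i_size = 10000 and a 16 KiB read from pos 0 the batch may hold
4 pinned pages, but

	end_offset = min(10000, 0 + 16384)         = 10000
	(end_offset + PAGE_SIZE - 1) >> PAGE_SHIFT = 3

so the while loop drops pages[3], and the copy loop then clamps the
final copy to min(10000 - 8192, 4096) = 1808 bytes; the zero-filled
tail of page 2 never reaches userspace.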

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Fixup patch for [PATCH 0/2] generic_file_buffered_read() refactoring & optimization
  2020-06-10  0:10 [PATCH 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
                   ` (3 preceding siblings ...)
  2020-06-10  1:36 ` [PATCH v2 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
@ 2020-06-30  0:12 ` Kent Overstreet
  4 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2020-06-30  0:12 UTC (permalink / raw)
  To: linux-kernel, akpm, viro, linux-mm, linux-fsdevel

Andrew - fixup patch because I got a bug report where we were trying to do an
order 7 allocation here:
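
(For scale; my arithmetic, not from the bug report: the page-pointer
array is sized at one pointer per page spanned by the read, so with
4 KiB pages and 8-byte pointers

	256 MiB read -> 65536 pages -> 512 KiB array = order 7
	512-page cap ->   512 pages ->   4 KiB array = order 0

and the clamped batch still moves up to 2 MiB of data per pass.)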

-- >8 --
Subject: [PATCH] fixup! fs: generic_file_buffered_read() now uses
 find_get_pages_contig

We shouldn't try to pin too many pages at once; reads can be almost
arbitrarily big.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 mm/filemap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index d8bd5e9647..b3a2aad1b7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2220,8 +2220,9 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 	struct inode *inode = mapping->host;
 	size_t orig_count = iov_iter_count(iter);
 	struct page *pages_onstack[8], **pages = NULL;
-	unsigned int nr_pages = ((iocb->ki_pos + iter->count + PAGE_SIZE - 1) >> PAGE_SHIFT) -
-		(iocb->ki_pos >> PAGE_SHIFT);
+	unsigned int nr_pages = min_t(unsigned int, 512,
+			((iocb->ki_pos + iter->count + PAGE_SIZE - 1) >> PAGE_SHIFT) -
+			(iocb->ki_pos >> PAGE_SHIFT));
 	int i, pg_nr, error = 0;
 	bool writably_mapped;
 	loff_t isize, end_offset;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig
  2020-10-25 21:29 [PATCH v2 0/2] generic_file_buffered_read() improvements Kent Overstreet
@ 2020-10-25 21:29 ` Kent Overstreet
  0 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2020-10-25 21:29 UTC (permalink / raw)
  To: linux-kernel, akpm; +Cc: Kent Overstreet, axboe, willy, linux-fsdevel

Convert generic_file_buffered_read() to look up the pages to read from
in batches, and then copy data to userspace from many pages at once; in
particular, we no longer touch any cachelines that might be contended
while we're in the copy loop.

This is a performance improvement on workloads that do buffered reads
with large blocksizes, and a very large performance improvement if that
file is also being accessed concurrently by different threads.

On smaller reads (512 bytes), there's a very small performance
improvement (1%, within the margin of error).
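
As a minimal sketch of the new shape (a standalone illustration; the
names, the fixed batch size and the error handling are simplified, not
the patch's):

	struct page *pages[16];
	unsigned int i, nr;

	/* one batched xarray lookup instead of one lookup per page */
	nr = find_get_pages_contig(mapping, pos >> PAGE_SHIFT,
				   ARRAY_SIZE(pages), pages);

	/* tight copy loop: no tree walks or ref twiddling in between */
	for (i = 0; i < nr; i++) {
		unsigned int offset = pos & ~PAGE_MASK;
		size_t bytes = min_t(size_t, count, PAGE_SIZE - offset);
		size_t copied = copy_page_to_iter(pages[i], offset,
						  bytes, iter);

		pos += copied;
		count -= copied;
		if (copied < bytes)
			break;	/* the real code returns -EFAULT */
	}

	for (i = 0; i < nr; i++)
		put_page(pages[i]);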

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
 mm/filemap.c | 313 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 175 insertions(+), 138 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index cc0f58a249..1bf1f424cb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2168,67 +2168,6 @@ static int lock_page_for_iocb(struct kiocb *iocb, struct page *page)
 		return lock_page_killable(page);
 }
 
-static int generic_file_buffered_read_page_ok(struct kiocb *iocb,
-			struct iov_iter *iter,
-			struct page *page)
-{
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &iocb->ki_filp->f_ra;
-	unsigned int offset = iocb->ki_pos & ~PAGE_MASK;
-	unsigned int bytes, copied;
-	loff_t isize, end_offset;
-
-	BUG_ON(iocb->ki_pos >> PAGE_SHIFT != page->index);
-
-	/*
-	 * i_size must be checked after we know the page is Uptodate.
-	 *
-	 * Checking i_size after the check allows us to calculate
-	 * the correct value for "bytes", which means the zero-filled
-	 * part of the page is not copied back to userspace (unless
-	 * another truncate extends the file - this is desired though).
-	 */
-
-	isize = i_size_read(inode);
-	if (unlikely(iocb->ki_pos >= isize))
-		return 1;
-
-	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
-
-	bytes = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
-
-	/* If users can be writing to this page using arbitrary
-	 * virtual addresses, take care about potential aliasing
-	 * before reading the page on the kernel side.
-	 */
-	if (mapping_writably_mapped(mapping))
-		flush_dcache_page(page);
-
-	/*
-	 * Ok, we have the page, and it's up-to-date, so
-	 * now we can copy it to user space...
-	 */
-
-	copied = copy_page_to_iter(page, offset, bytes, iter);
-
-	iocb->ki_pos += copied;
-
-	/*
-	 * When a sequential read accesses a page several times,
-	 * only mark it as accessed the first time.
-	 */
-	if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
-		mark_page_accessed(page);
-
-	ra->prev_pos = iocb->ki_pos;
-
-	if (copied < bytes)
-		return -EFAULT;
-
-	return !iov_iter_count(iter) || iocb->ki_pos == isize;
-}
-
 static struct page *
 generic_file_buffered_read_readpage(struct kiocb *iocb,
 				    struct file *filp,
@@ -2386,6 +2325,92 @@ generic_file_buffered_read_no_cached_page(struct kiocb *iocb,
 	return generic_file_buffered_read_readpage(iocb, filp, mapping, page);
 }
 
+static int generic_file_buffered_read_get_pages(struct kiocb *iocb,
+						struct iov_iter *iter,
+						struct page **pages,
+						unsigned int nr)
+{
+	struct file *filp = iocb->ki_filp;
+	struct address_space *mapping = filp->f_mapping;
+	struct file_ra_state *ra = &filp->f_ra;
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
+	pgoff_t last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
+	int i, j, nr_got, err = 0;
+
+	nr = min_t(unsigned long, last_index - index, nr);
+find_page:
+	if (fatal_signal_pending(current))
+		return -EINTR;
+
+	nr_got = find_get_pages_contig(mapping, index, nr, pages);
+	if (nr_got)
+		goto got_pages;
+
+	if (iocb->ki_flags & IOCB_NOIO)
+		return -EAGAIN;
+
+	page_cache_sync_readahead(mapping, ra, filp, index, last_index - index);
+
+	nr_got = find_get_pages_contig(mapping, index, nr, pages);
+	if (nr_got)
+		goto got_pages;
+
+	pages[0] = generic_file_buffered_read_no_cached_page(iocb, iter);
+	err = PTR_ERR_OR_ZERO(pages[0]);
+	if (!IS_ERR_OR_NULL(pages[0]))
+		nr_got = 1;
+got_pages:
+	for (i = 0; i < nr_got; i++) {
+		struct page *page = pages[i];
+		pgoff_t pg_index = index + i;
+		loff_t pg_pos = max(iocb->ki_pos,
+				    (loff_t) pg_index << PAGE_SHIFT);
+		loff_t pg_count = iocb->ki_pos + iter->count - pg_pos;
+
+		if (PageReadahead(page)) {
+			if (iocb->ki_flags & IOCB_NOIO) {
+				for (j = i; j < nr_got; j++)
+					put_page(pages[j]);
+				nr_got = i;
+				err = -EAGAIN;
+				break;
+			}
+			page_cache_async_readahead(mapping, ra, filp, page,
+					pg_index, last_index - pg_index);
+		}
+
+		if (!PageUptodate(page)) {
+			if ((iocb->ki_flags & IOCB_NOWAIT) ||
+			    ((iocb->ki_flags & IOCB_WAITQ) && i)) {
+				for (j = i; j < nr_got; j++)
+					put_page(pages[j]);
+				nr_got = i;
+				err = -EAGAIN;
+				break;
+			}
+
+			page = generic_file_buffered_read_pagenotuptodate(iocb,
+					filp, iter, page, pg_pos, pg_count);
+			if (IS_ERR_OR_NULL(page)) {
+				for (j = i + 1; j < nr_got; j++)
+					put_page(pages[j]);
+				nr_got = i;
+				err = PTR_ERR_OR_ZERO(page);
+				break;
+			}
+		}
+	}
+
+	if (likely(nr_got))
+		return nr_got;
+	if (err)
+		return err;
+	/*
+	 * No pages and no error means we raced and should retry:
+	 */
+	goto find_page;
+}
+
 /**
  * generic_file_buffered_read - generic file read routine
  * @iocb:	the iocb to read
@@ -2406,104 +2431,116 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		struct iov_iter *iter, ssize_t written)
 {
 	struct file *filp = iocb->ki_filp;
+	struct file_ra_state *ra = &filp->f_ra;
 	struct address_space *mapping = filp->f_mapping;
 	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &filp->f_ra;
-	size_t orig_count = iov_iter_count(iter);
-	pgoff_t last_index;
-	int error = 0;
+	struct page *pages_onstack[PAGEVEC_SIZE], **pages = NULL;
+	unsigned int nr_pages = min_t(unsigned int, 512,
+			((iocb->ki_pos + iter->count + PAGE_SIZE - 1) >> PAGE_SHIFT) -
+			(iocb->ki_pos >> PAGE_SHIFT));
+	int i, pg_nr, error = 0;
+	bool writably_mapped;
+	loff_t isize, end_offset;
 
 	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
 		return 0;
 	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
 
-	last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
+	if (nr_pages > ARRAY_SIZE(pages_onstack))
+		pages = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL);
 
-	/*
-	 * If we've already successfully copied some data, then we
-	 * can no longer safely return -EIOCBQUEUED. Hence mark
-	 * an async read NOWAIT at that point.
-	 */
-	if (written && (iocb->ki_flags & IOCB_WAITQ))
-		iocb->ki_flags |= IOCB_NOWAIT;
-
-	for (;;) {
-		pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
-		struct page *page;
+	if (!pages) {
+		pages = pages_onstack;
+		nr_pages = min_t(unsigned int, nr_pages, ARRAY_SIZE(pages_onstack));
+	}
 
+	do {
 		cond_resched();
-find_page:
-		if (fatal_signal_pending(current)) {
-			error = -EINTR;
-			goto out;
-		}
 
 		/*
-		 * We can't return -EIOCBQUEUED once we've done some work, so
-		 * ensure we don't block:
+		 * If we've already successfully copied some data, then we
+		 * can no longer safely return -EIOCBQUEUED. Hence mark
+		 * an async read NOWAIT at that point.
 		 */
-		if ((iocb->ki_flags & IOCB_WAITQ) &&
-		    (written + orig_count - iov_iter_count(iter)))
+		if ((iocb->ki_flags & IOCB_WAITQ) && written)
 			iocb->ki_flags |= IOCB_NOWAIT;
 
-		page = find_get_page(mapping, index);
-		if (!page) {
-			if (iocb->ki_flags & IOCB_NOIO)
-				goto would_block;
-			page_cache_sync_readahead(mapping,
-					ra, filp,
-					index, last_index - index);
-			page = find_get_page(mapping, index);
-			if (unlikely(page == NULL)) {
-				page = generic_file_buffered_read_no_cached_page(iocb, iter);
-				if (!page)
-					goto find_page;
-				if (IS_ERR(page)) {
-					error = PTR_ERR(page);
-					goto out;
-				}
-			}
-		}
-		if (PageReadahead(page)) {
-			if (iocb->ki_flags & IOCB_NOIO) {
-				put_page(page);
-				goto out;
-			}
-			page_cache_async_readahead(mapping,
-					ra, filp, page,
-					index, last_index - index);
-		}
-		if (!PageUptodate(page)) {
-			if (iocb->ki_flags & IOCB_NOWAIT) {
-				put_page(page);
-				error = -EAGAIN;
-				goto out;
-			}
-			page = generic_file_buffered_read_pagenotuptodate(iocb,
-					filp, iter, page, iocb->ki_pos, iter->count);
-			if (!page)
-				goto find_page;
-			if (IS_ERR(page)) {
-				error = PTR_ERR(page);
-				goto out;
-			}
+		i = 0;
+		pg_nr = generic_file_buffered_read_get_pages(iocb, iter,
+							     pages, nr_pages);
+		if (pg_nr < 0) {
+			error = pg_nr;
+			break;
 		}
 
-		error = generic_file_buffered_read_page_ok(iocb, iter, page);
-		put_page(page);
+		/*
+		 * i_size must be checked after we know the pages are Uptodate.
+		 *
+		 * Checking i_size after the check allows us to calculate
> +		 * the correct value for "pg_nr", which means the zero-filled
+		 * part of the page is not copied back to userspace (unless
+		 * another truncate extends the file - this is desired though).
+		 */
+		isize = i_size_read(inode);
+		if (unlikely(iocb->ki_pos >= isize))
+			goto put_pages;
 
-		if (error) {
-			if (error > 0)
-				error = 0;
-			goto out;
+		end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
+
+		while ((iocb->ki_pos >> PAGE_SHIFT) + pg_nr >
+		       (end_offset + PAGE_SIZE - 1) >> PAGE_SHIFT)
+			put_page(pages[--pg_nr]);
+
+		/*
+		 * Once we start copying data, we don't want to be touching any
+		 * cachelines that might be contended:
+		 */
+		writably_mapped = mapping_writably_mapped(mapping);
+
+		/*
+		 * When a sequential read accesses a page several times, only
+		 * mark it as accessed the first time.
+		 */
+		if (iocb->ki_pos >> PAGE_SHIFT !=
+		    ra->prev_pos >> PAGE_SHIFT)
+			mark_page_accessed(pages[0]);
+		for (i = 1; i < pg_nr; i++)
+			mark_page_accessed(pages[i]);
+
+		for (i = 0; i < pg_nr; i++) {
+			unsigned int offset = iocb->ki_pos & ~PAGE_MASK;
+			unsigned int bytes = min_t(loff_t, end_offset - iocb->ki_pos,
+						   PAGE_SIZE - offset);
+			unsigned int copied;
+
+			/*
+			 * If users can be writing to this page using arbitrary
+			 * virtual addresses, take care about potential aliasing
+			 * before reading the page on the kernel side.
+			 */
+			if (writably_mapped)
+				flush_dcache_page(pages[i]);
+
+			copied = copy_page_to_iter(pages[i], offset, bytes, iter);
+
+			written += copied;
+			iocb->ki_pos += copied;
+			ra->prev_pos = iocb->ki_pos;
+
+			if (copied < bytes) {
+				error = -EFAULT;
+				break;
+			}
 		}
-	}
+put_pages:
+		for (i = 0; i < pg_nr; i++)
+			put_page(pages[i]);
+	} while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);
 
-would_block:
-	error = -EAGAIN;
-out:
 	file_accessed(filp);
-	written += orig_count - iov_iter_count(iter);
+
+	if (pages != pages_onstack)
+		kfree(pages);
 
 	return written ? written : error;
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-10-25 21:30 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-10  0:10 [PATCH 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
2020-06-10  0:10 ` [PATCH 1/2] fs: Break generic_file_buffered_read up into multiple functions Kent Overstreet
2020-06-10  0:10 ` [PATCH 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
2020-06-10  0:47   ` Matthew Wilcox
2020-06-10  1:08     ` Kent Overstreet
2020-06-10  1:38   ` Matthew Wilcox
2020-06-10  1:46     ` Kent Overstreet
2020-06-10  1:36 ` [PATCH v2 1/2] fs: Break generic_file_buffered_read up into multiple functions Kent Overstreet
2020-06-10  1:36 ` [PATCH v2 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
2020-06-18  1:05   ` Andrew Morton
2020-06-19  3:20     ` [PATCH v3 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
2020-06-19 12:59       ` Christoph Hellwig
2020-06-19 18:44         ` Kent Overstreet
2020-06-19  3:20     ` [PATCH v3 1/2] fs: Break generic_file_buffered_read up into multiple functions Kent Overstreet
2020-06-19  3:20     ` [PATCH v3 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
2020-06-30  0:12 ` Fixup patch for [PATCH 0/2] generic_file_buffered_read() refactoring & optimization Kent Overstreet
2020-10-25 21:29 [PATCH v2 0/2] generic_file_buffered_read() improvements Kent Overstreet
2020-10-25 21:29 ` [PATCH v2 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig Kent Overstreet
