linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Rosenberg <drosen@google.com>
To: Phillip Lougher <phillip@squashfs.org.uk>, linux-kernel@vger.kernel.org
Cc: adrien@schischi.me, Adrien Schildknecht <adriens@google.com>,
	Daniel Rosenberg <drosen@google.com>
Subject: [PATCH 5/5] Squashfs: optimize reading uncompressed data
Date: Fri, 22 Sep 2017 14:55:08 -0700	[thread overview]
Message-ID: <20170922215508.73407-6-drosen@google.com> (raw)
In-Reply-To: <20170922215508.73407-1-drosen@google.com>

From: Adrien Schildknecht <adriens@google.com>

When dealing with uncompressed data, there is no need to read a whole
block (default 128K) to get the desired page: the pages are
independent from each others.

This patch change the readpages logic so that reading uncompressed
data only read the number of pages advised by the readahead algorithm.

Moreover, if the page actor contains holes (i.e. pages that are already
up-to-date), squashfs skips the buffer_head associated to those pages.

This patch greatly improve the performance of random reads for
uncompressed files because squashfs only read what is needed. It also
reduces the number of unnecessary reads.

Signed-off-by: Adrien Schildknecht <adriens@google.com>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
---
 fs/squashfs/block.c       | 25 +++++++++++++++++++++++++
 fs/squashfs/file_direct.c | 37 ++++++++++++++++++++++++++++++-------
 2 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c
index 252dfc82ae72..37658b6e83ee 100644
--- a/fs/squashfs/block.c
+++ b/fs/squashfs/block.c
@@ -207,6 +207,22 @@ static void squashfs_bio_end_io(struct bio *bio)
 	kfree(bio_req);
 }
 
+static int bh_is_optional(struct squashfs_read_request *req, int idx)
+{
+	int start_idx, end_idx;
+	struct squashfs_sb_info *msblk = req->sb->s_fs_info;
+
+	start_idx = (idx * msblk->devblksize - req->offset) >> PAGE_SHIFT;
+	end_idx = ((idx + 1) * msblk->devblksize - req->offset + 1) >> PAGE_SHIFT;
+	if (start_idx >= req->output->pages)
+		return 1;
+	if (start_idx < 0)
+		start_idx = end_idx;
+	if (end_idx >= req->output->pages)
+		end_idx = start_idx;
+	return !req->output->page[start_idx] && !req->output->page[end_idx];
+}
+
 static int actor_getblks(struct squashfs_read_request *req, u64 block)
 {
 	int i;
@@ -216,6 +232,15 @@ static int actor_getblks(struct squashfs_read_request *req, u64 block)
 		return -ENOMEM;
 
 	for (i = 0; i < req->nr_buffers; ++i) {
+		/*
+		 * When dealing with an uncompressed block, the actor may
+		 * contains NULL pages. There's no need to read the buffers
+		 * associated with these pages.
+		 */
+		if (!req->compressed && bh_is_optional(req, i)) {
+			req->bh[i] = NULL;
+			continue;
+		}
 		req->bh[i] = sb_getblk(req->sb, block + i);
 		if (!req->bh[i]) {
 			while (--i) {
diff --git a/fs/squashfs/file_direct.c b/fs/squashfs/file_direct.c
index a978811de327..dc87f77ce11e 100644
--- a/fs/squashfs/file_direct.c
+++ b/fs/squashfs/file_direct.c
@@ -111,15 +111,38 @@ int squashfs_readpages_block(struct page *target_page,
 	struct squashfs_page_actor *actor;
 	struct inode *inode = mapping->host;
 	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
-	int file_end = (i_size_read(inode) - 1) >> PAGE_SHIFT;
+	int start_index, end_index, file_end, actor_pages, res;
 	int mask = (1 << (msblk->block_log - PAGE_SHIFT)) - 1;
-	int start_index = page_index & ~mask;
-	int end_index = start_index | mask;
-	int actor_pages, res;
 
-	if (end_index > file_end)
-		end_index = file_end;
-	actor_pages = end_index - start_index + 1;
+	/*
+	 * If readpage() is called on an uncompressed datablock, we can just
+	 * read the pages instead of fetching the whole block.
+	 * This greatly improves the performance when a process keep doing
+	 * random reads because we only fetch the necessary data.
+	 * The readahead algorithm will take care of doing speculative reads
+	 * if necessary.
+	 * We can't read more than 1 block even if readahead provides use more
+	 * pages because we don't know yet if the next block is compressed or
+	 * not.
+	 */
+	if (bsize && !SQUASHFS_COMPRESSED_BLOCK(bsize)) {
+		u64 block_end = block + msblk->block_size;
+
+		block += (page_index & mask) * PAGE_SIZE;
+		actor_pages = (block_end - block) / PAGE_SIZE;
+		if (*nr_pages < actor_pages)
+			actor_pages = *nr_pages;
+		start_index = page_index;
+		bsize = min_t(int, bsize, (PAGE_SIZE * actor_pages)
+					  | SQUASHFS_COMPRESSED_BIT_BLOCK);
+	} else {
+		file_end = (i_size_read(inode) - 1) >> PAGE_SHIFT;
+		start_index = page_index & ~mask;
+		end_index = start_index | mask;
+		if (end_index > file_end)
+			end_index = file_end;
+		actor_pages = end_index - start_index + 1;
+	}
 
 	actor = actor_from_page_cache(actor_pages, target_page,
 				      readahead_pages, nr_pages, start_index,
-- 
2.14.1.821.g8fa685d3b7-goog

      parent reply	other threads:[~2017-09-22 21:56 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-22 21:55 [PATCH 0/5] Squashfs Whitelist and Compression Threshold Daniel Rosenberg
2017-09-22 21:55 ` [PATCH 1/5] Squashfs: remove the FILE_CACHE option Daniel Rosenberg
2017-09-23  2:01   ` Phillip Lougher
2017-09-22 21:55 ` [PATCH 2/5] Squashfs: refactor page_actor Daniel Rosenberg
2017-09-22 21:55 ` [PATCH 3/5] Squashfs: replace buffer_head with BIO Daniel Rosenberg
2017-09-22 21:55 ` [PATCH 4/5] Squashfs: implement .readpages() Daniel Rosenberg
2017-09-22 21:55 ` Daniel Rosenberg [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170922215508.73407-6-drosen@google.com \
    --to=drosen@google.com \
    --cc=adrien@schischi.me \
    --cc=adriens@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=phillip@squashfs.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).