linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/6] Squashfs performance improvements
@ 2013-11-07 20:24 Phillip Lougher
  2013-11-07 20:24 ` [PATCH 1/6] Squashfs: Refactor decompressor interface and code (V2) Phillip Lougher
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Phillip Lougher @ 2013-11-07 20:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: minchan, Phillip Lougher

Hi

This patch-set is mainly a compilation of the various Squashfs
performance improvement patches sent to the mailing list over the
last month or so.  These patches do the following:

	* Add support for different decompressor implementations
	* Add support for parallel decompression, with or without
	  percpu variables
	* Add support for direct decompression into the page cache,
	  rather than using an intermediate buffer.

Patches 1, 4 and 6 are revised V2 versions following review.

Patch 2 is Minchan Kim's multi-threading patch, unmodified; it is
included here for completeness.

Patch 3 is new: it is a multi-threaded implementation which uses
percpu variables to do load-balancing across multiple cores, providing
one decompressor per core.  This implementation is intended to be
complementary to Minchan Kim's patch.

Each parallelisation implementation exhibits trade-offs between
decompression performance and CPU and memory usage.  The
multi-threading implementation without percpu variables allows up
to two simultaneous decompressions per core, which gives maximum
performance but at the expense of very high CPU usage and memory
overhead.  For many multi-core embedded systems with weak CPUs
such resource use may be prohibitive.

The percpu implementation inherently limits CPU usage to one decompression
per core, and because of the use of percpu variables it ensures
decompression is load balanced too.
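The pool discipline used by the non-percpu implementation can be sketched
in userspace C (an illustrative pthread analogue, not the kernel code; all
names below are made up for the example): streams are allocated lazily up
to a cap, and once the cap is reached callers sleep until a stream is
returned.

```c
#include <pthread.h>
#include <assert.h>

/* Hypothetical userspace sketch of an on-demand stream pool.  In the
 * kernel patch the cap is 2 * num_online_cpus() and the waiting uses a
 * wait queue; here a condition variable plays the same role. */

struct stream_pool {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	int avail;	/* streams currently sitting in the free list */
	int total;	/* streams allocated so far */
	int max;	/* cap, e.g. twice the number of cores */
};

void pool_init(struct stream_pool *p, int max)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->wait, NULL);
	p->avail = 0;
	p->total = 0;
	p->max = max;
}

/* Get a stream: reuse a free one, allocate a new one if under the cap,
 * otherwise sleep until put_stream() signals availability. */
void get_stream(struct stream_pool *p)
{
	pthread_mutex_lock(&p->lock);
	for (;;) {
		if (p->avail > 0) {
			p->avail--;
			break;
		}
		if (p->total < p->max) {
			p->total++;	/* "allocate" a new stream */
			break;
		}
		pthread_cond_wait(&p->wait, &p->lock);
	}
	pthread_mutex_unlock(&p->lock);
}

/* Return a stream to the free list and wake one waiter. */
void put_stream(struct stream_pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->avail++;
	pthread_mutex_unlock(&p->lock);
	pthread_cond_signal(&p->wait);
}
```

Patch 2 implements this idea with a kernel mutex and wait queue; the
percpu variant in patch 3 avoids the shared pool entirely by giving each
core its own stream.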

----------------------------------------------------------------
Minchan Kim (1):
      Squashfs: enhance parallel I/O

Phillip Lougher (5):
      Squashfs: Refactor decompressor interface and code (V2)
      Squashfs: add multi-threaded decompression using percpu variables
      Squashfs: Generalise paging handling in the decompressors (V2)
      Squashfs: restructure squashfs_readpage()
      Squashfs: Directly decompress into the page cache for file data (V2)

 fs/squashfs/Kconfig                     |   72 +++++++++++
 fs/squashfs/Makefile                    |    5 +
 fs/squashfs/block.c                     |   36 +++---
 fs/squashfs/cache.c                     |   28 ++++-
 fs/squashfs/decompressor.c              |   59 ++++++---
 fs/squashfs/decompressor.h              |   24 ++--
 fs/squashfs/decompressor_multi.c        |  199 +++++++++++++++++++++++++++++++
 fs/squashfs/decompressor_multi_percpu.c |  104 ++++++++++++++++
 fs/squashfs/decompressor_single.c       |   85 +++++++++++++
 fs/squashfs/file.c                      |  142 +++++++++++-----------
 fs/squashfs/file_cache.c                |   38 ++++++
 fs/squashfs/file_direct.c               |  178 +++++++++++++++++++++++++++
 fs/squashfs/lzo_wrapper.c               |   47 ++++----
 fs/squashfs/page_actor.c                |  104 ++++++++++++++++
 fs/squashfs/page_actor.h                |   86 +++++++++++++
 fs/squashfs/squashfs.h                  |   20 +++-
 fs/squashfs/squashfs_fs_sb.h            |    4 +-
 fs/squashfs/super.c                     |   10 +-
 fs/squashfs/xz_wrapper.c                |  105 +++++++++-------
 fs/squashfs/zlib_wrapper.c              |   64 ++++------
 20 files changed, 1167 insertions(+), 243 deletions(-)
 create mode 100644 fs/squashfs/decompressor_multi.c
 create mode 100644 fs/squashfs/decompressor_multi_percpu.c
 create mode 100644 fs/squashfs/decompressor_single.c
 create mode 100644 fs/squashfs/file_cache.c
 create mode 100644 fs/squashfs/file_direct.c
 create mode 100644 fs/squashfs/page_actor.c
 create mode 100644 fs/squashfs/page_actor.h

Phillip


* [PATCH 1/6] Squashfs: Refactor decompressor interface and code (V2)
  2013-11-07 20:24 [PATCH 0/6] Squashfs performance improvements Phillip Lougher
@ 2013-11-07 20:24 ` Phillip Lougher
  2013-11-07 20:24 ` [PATCH 2/6] Squashfs: enhance parallel I/O Phillip Lougher
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Phillip Lougher @ 2013-11-07 20:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: minchan, Phillip Lougher

The decompressor interface and code were written from
the point of view of single-threaded operation.  In doing
so they mixed a lot of single-threaded, implementation-specific
aspects into the decompressor code and elsewhere, which makes it
difficult to seamlessly support multiple different decompressor
implementations.

This patch does the following:

1.  It removes compressor_options parsing from the decompressor
    init() function.  This allows the decompressor init() function
    to be dynamically called to instantiate multiple decompressors,
    without the compressor options needing to be read and parsed each
    time.

2.  It moves threading and all sleeping operations out of the
    decompressors.  In doing so, it makes the decompressors
    non-blocking wrappers which only deal with interfacing with
    the decompressor implementation.

3. It splits decompressor.[ch] into decompressor generic functions
   in decompressor.[ch], and moves the single threaded
   decompressor implementation into decompressor_single.c.

The result of this patch is that Squashfs should now be able to
support multiple decompressors, by adding new decompressor_xxx.c
files with specialised implementations of the functions in
decompressor_single.c.
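Point 1 above can be sketched in userspace C (field names follow the
patch, but the surrounding code is illustrative, not the kernel
implementation): comp_opts parsing becomes a separate, optional hook,
so init() can be invoked repeatedly to create extra decompressors
without re-reading the options.

```c
#include <stddef.h>
#include <assert.h>

/* Illustrative reduction of the refactored ops table: each compressor
 * supplies an ops struct, and the comp_opts hook may be left NULL by
 * compressors that have no options (as lzo and zlib do in the patch). */

struct decompressor_ops {
	void *(*init)(void *comp_opts);
	void *(*comp_opts)(void *buff, int len);	/* optional */
	void  (*free)(void *stream);
};

/* Mirror of the squashfs_comp_opts() helper added by the patch:
 * compressors without an options hook simply get NULL back. */
void *parse_comp_opts(const struct decompressor_ops *ops,
		      void *buff, int len)
{
	return ops->comp_opts ? ops->comp_opts(buff, len) : NULL;
}

/* A made-up compressor whose "parsed options" are just the raw buffer. */
static void *dummy_opts(void *buff, int len)
{
	(void)len;
	return buff;
}

const struct decompressor_ops with_opts = { NULL, dummy_opts, NULL };
const struct decompressor_ops without_opts = { NULL, NULL, NULL };
```

The parsed options object is then handed to every init() call, which is
what lets decompressor_multi.c instantiate streams on demand.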

V2:
  * decompressor_single.c: Shorten header comment by removing GPL licence text
  * decompressor_single.c: Remove comp_opts from struct squashfs_stream,
    it's not needed in this implementation
  * Fix checkpatch.pl errors

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Minchan Kim <minchan@kernel.org>
---
 fs/squashfs/Makefile              |    2 +-
 fs/squashfs/block.c               |   11 +++--
 fs/squashfs/decompressor.c        |   47 +++++++++++++-------
 fs/squashfs/decompressor.h        |   21 +++------
 fs/squashfs/decompressor_single.c |   86 +++++++++++++++++++++++++++++++++++
 fs/squashfs/lzo_wrapper.c         |   24 +++-------
 fs/squashfs/squashfs.h            |    9 +++-
 fs/squashfs/squashfs_fs_sb.h      |    3 +-
 fs/squashfs/super.c               |   10 ++---
 fs/squashfs/xz_wrapper.c          |   89 ++++++++++++++++++++-----------------
 fs/squashfs/zlib_wrapper.c        |   50 +++++++--------------
 11 files changed, 216 insertions(+), 136 deletions(-)
 create mode 100644 fs/squashfs/decompressor_single.c

diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
index 110b047..c223c84 100644
--- a/fs/squashfs/Makefile
+++ b/fs/squashfs/Makefile
@@ -4,7 +4,7 @@
 
 obj-$(CONFIG_SQUASHFS) += squashfs.o
 squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
-squashfs-y += namei.o super.o symlink.o decompressor.o
+squashfs-y += namei.o super.o symlink.o decompressor.o decompressor_single.o
 squashfs-$(CONFIG_SQUASHFS_XATTR) += xattr.o xattr_id.o
 squashfs-$(CONFIG_SQUASHFS_LZO) += lzo_wrapper.o
 squashfs-$(CONFIG_SQUASHFS_XZ) += xz_wrapper.o
diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c
index 41d108e..4dd4025 100644
--- a/fs/squashfs/block.c
+++ b/fs/squashfs/block.c
@@ -93,7 +93,7 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 	struct buffer_head **bh;
 	int offset = index & ((1 << msblk->devblksize_log2) - 1);
 	u64 cur_index = index >> msblk->devblksize_log2;
-	int bytes, compressed, b = 0, k = 0, page = 0, avail;
+	int bytes, compressed, b = 0, k = 0, page = 0, avail, i;
 
 	bh = kcalloc(((srclength + msblk->devblksize - 1)
 		>> msblk->devblksize_log2) + 1, sizeof(*bh), GFP_KERNEL);
@@ -158,6 +158,12 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 		ll_rw_block(READ, b - 1, bh + 1);
 	}
 
+	for (i = 0; i < b; i++) {
+		wait_on_buffer(bh[i]);
+		if (!buffer_uptodate(bh[i]))
+			goto block_release;
+	}
+
 	if (compressed) {
 		length = squashfs_decompress(msblk, buffer, bh, b, offset,
 			 length, srclength, pages);
@@ -172,9 +178,6 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 		for (bytes = length; k < b; k++) {
 			in = min(bytes, msblk->devblksize - offset);
 			bytes -= in;
-			wait_on_buffer(bh[k]);
-			if (!buffer_uptodate(bh[k]))
-				goto block_release;
 			while (in) {
 				if (pg_offset == PAGE_CACHE_SIZE) {
 					page++;
diff --git a/fs/squashfs/decompressor.c b/fs/squashfs/decompressor.c
index 3f6271d..234291f 100644
--- a/fs/squashfs/decompressor.c
+++ b/fs/squashfs/decompressor.c
@@ -37,29 +37,29 @@
  */
 
 static const struct squashfs_decompressor squashfs_lzma_unsupported_comp_ops = {
-	NULL, NULL, NULL, LZMA_COMPRESSION, "lzma", 0
+	NULL, NULL, NULL, NULL, LZMA_COMPRESSION, "lzma", 0
 };
 
 #ifndef CONFIG_SQUASHFS_LZO
 static const struct squashfs_decompressor squashfs_lzo_comp_ops = {
-	NULL, NULL, NULL, LZO_COMPRESSION, "lzo", 0
+	NULL, NULL, NULL, NULL, LZO_COMPRESSION, "lzo", 0
 };
 #endif
 
 #ifndef CONFIG_SQUASHFS_XZ
 static const struct squashfs_decompressor squashfs_xz_comp_ops = {
-	NULL, NULL, NULL, XZ_COMPRESSION, "xz", 0
+	NULL, NULL, NULL, NULL, XZ_COMPRESSION, "xz", 0
 };
 #endif
 
 #ifndef CONFIG_SQUASHFS_ZLIB
 static const struct squashfs_decompressor squashfs_zlib_comp_ops = {
-	NULL, NULL, NULL, ZLIB_COMPRESSION, "zlib", 0
+	NULL, NULL, NULL, NULL, ZLIB_COMPRESSION, "zlib", 0
 };
 #endif
 
 static const struct squashfs_decompressor squashfs_unknown_comp_ops = {
-	NULL, NULL, NULL, 0, "unknown", 0
+	NULL, NULL, NULL, NULL, 0, "unknown", 0
 };
 
 static const struct squashfs_decompressor *decompressor[] = {
@@ -83,10 +83,10 @@ const struct squashfs_decompressor *squashfs_lookup_decompressor(int id)
 }
 
 
-void *squashfs_decompressor_init(struct super_block *sb, unsigned short flags)
+static void *get_comp_opts(struct super_block *sb, unsigned short flags)
 {
 	struct squashfs_sb_info *msblk = sb->s_fs_info;
-	void *strm, *buffer = NULL;
+	void *buffer = NULL, *comp_opts;
 	int length = 0;
 
 	/*
@@ -94,23 +94,40 @@ void *squashfs_decompressor_init(struct super_block *sb, unsigned short flags)
 	 */
 	if (SQUASHFS_COMP_OPTS(flags)) {
 		buffer = kmalloc(PAGE_CACHE_SIZE, GFP_KERNEL);
-		if (buffer == NULL)
-			return ERR_PTR(-ENOMEM);
+		if (buffer == NULL) {
+			comp_opts = ERR_PTR(-ENOMEM);
+			goto out;
+		}
 
 		length = squashfs_read_data(sb, &buffer,
 			sizeof(struct squashfs_super_block), 0, NULL,
-			PAGE_CACHE_SIZE, 1);
+				PAGE_CACHE_SIZE, 1);
 
 		if (length < 0) {
-			strm = ERR_PTR(length);
-			goto finished;
+			comp_opts = ERR_PTR(length);
+			goto out;
 		}
 	}
 
-	strm = msblk->decompressor->init(msblk, buffer, length);
+	comp_opts = squashfs_comp_opts(msblk, buffer, length);
 
-finished:
+out:
 	kfree(buffer);
+	return comp_opts;
+}
+
+
+void *squashfs_decompressor_setup(struct super_block *sb, unsigned short flags)
+{
+	struct squashfs_sb_info *msblk = sb->s_fs_info;
+	void *stream, *comp_opts = get_comp_opts(sb, flags);
+
+	if (IS_ERR(comp_opts))
+		return comp_opts;
+
+	stream = squashfs_decompressor_create(msblk, comp_opts);
+	if (IS_ERR(stream))
+		kfree(comp_opts);
 
-	return strm;
+	return stream;
 }
diff --git a/fs/squashfs/decompressor.h b/fs/squashfs/decompressor.h
index 330073e..6cdb20a 100644
--- a/fs/squashfs/decompressor.h
+++ b/fs/squashfs/decompressor.h
@@ -24,28 +24,21 @@
  */
 
 struct squashfs_decompressor {
-	void	*(*init)(struct squashfs_sb_info *, void *, int);
+	void	*(*init)(struct squashfs_sb_info *, void *);
+	void	*(*comp_opts)(struct squashfs_sb_info *, void *, int);
 	void	(*free)(void *);
-	int	(*decompress)(struct squashfs_sb_info *, void **,
+	int	(*decompress)(struct squashfs_sb_info *, void *, void **,
 		struct buffer_head **, int, int, int, int, int);
 	int	id;
 	char	*name;
 	int	supported;
 };
 
-static inline void squashfs_decompressor_free(struct squashfs_sb_info *msblk,
-	void *s)
+static inline void *squashfs_comp_opts(struct squashfs_sb_info *msblk,
+							void *buff, int length)
 {
-	if (msblk->decompressor)
-		msblk->decompressor->free(s);
-}
-
-static inline int squashfs_decompress(struct squashfs_sb_info *msblk,
-	void **buffer, struct buffer_head **bh, int b, int offset, int length,
-	int srclength, int pages)
-{
-	return msblk->decompressor->decompress(msblk, buffer, bh, b, offset,
-		length, srclength, pages);
+	return msblk->decompressor->comp_opts ?
+		msblk->decompressor->comp_opts(msblk, buff, length) : NULL;
 }
 
 #ifdef CONFIG_SQUASHFS_XZ
diff --git a/fs/squashfs/decompressor_single.c b/fs/squashfs/decompressor_single.c
new file mode 100644
index 0000000..c5bb694
--- /dev/null
+++ b/fs/squashfs/decompressor_single.c
@@ -0,0 +1,86 @@
+/*
+ * Copyright (c) 2013
+ * Phillip Lougher <phillip@squashfs.org.uk>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/buffer_head.h>
+
+#include "squashfs_fs.h"
+#include "squashfs_fs_sb.h"
+#include "decompressor.h"
+#include "squashfs.h"
+
+/*
+ * This file implements single-threaded decompression in the
+ * decompressor framework
+ */
+
+struct squashfs_stream {
+	void		*stream;
+	struct mutex	mutex;
+};
+
+void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
+						void *comp_opts)
+{
+	struct squashfs_stream *stream;
+	int err = -ENOMEM;
+
+	stream = kmalloc(sizeof(*stream), GFP_KERNEL);
+	if (stream == NULL)
+		goto out;
+
+	stream->stream = msblk->decompressor->init(msblk, comp_opts);
+	kfree(comp_opts);
+	if (IS_ERR(stream->stream)) {
+		err = PTR_ERR(stream->stream);
+		goto out;
+	}
+
+	mutex_init(&stream->mutex);
+	return stream;
+
+out:
+	kfree(stream);
+	return ERR_PTR(err);
+}
+
+void squashfs_decompressor_destroy(struct squashfs_sb_info *msblk)
+{
+	struct squashfs_stream *stream = msblk->stream;
+
+	if (stream) {
+		msblk->decompressor->free(stream->stream);
+		kfree(stream);
+	}
+}
+
+int squashfs_decompress(struct squashfs_sb_info *msblk,
+	void **buffer, struct buffer_head **bh, int b, int offset, int length,
+	int srclength, int pages)
+{
+	int res;
+	struct squashfs_stream *stream = msblk->stream;
+
+	mutex_lock(&stream->mutex);
+	res = msblk->decompressor->decompress(msblk, stream->stream, buffer,
+		bh, b, offset, length, srclength, pages);
+	mutex_unlock(&stream->mutex);
+
+	if (res < 0)
+		ERROR("%s decompression failed, data probably corrupt\n",
+			msblk->decompressor->name);
+
+	return res;
+}
+
+int squashfs_max_decompressors(void)
+{
+	return 1;
+}
diff --git a/fs/squashfs/lzo_wrapper.c b/fs/squashfs/lzo_wrapper.c
index 00f4dfc..75c3b57 100644
--- a/fs/squashfs/lzo_wrapper.c
+++ b/fs/squashfs/lzo_wrapper.c
@@ -37,7 +37,7 @@ struct squashfs_lzo {
 	void	*output;
 };
 
-static void *lzo_init(struct squashfs_sb_info *msblk, void *buff, int len)
+static void *lzo_init(struct squashfs_sb_info *msblk, void *buff)
 {
 	int block_size = max_t(int, msblk->block_size, SQUASHFS_METADATA_SIZE);
 
@@ -74,22 +74,16 @@ static void lzo_free(void *strm)
 }
 
 
-static int lzo_uncompress(struct squashfs_sb_info *msblk, void **buffer,
-	struct buffer_head **bh, int b, int offset, int length, int srclength,
-	int pages)
+static int lzo_uncompress(struct squashfs_sb_info *msblk, void *strm,
+	void **buffer, struct buffer_head **bh, int b, int offset, int length,
+	int srclength, int pages)
 {
-	struct squashfs_lzo *stream = msblk->stream;
+	struct squashfs_lzo *stream = strm;
 	void *buff = stream->input;
 	int avail, i, bytes = length, res;
 	size_t out_len = srclength;
 
-	mutex_lock(&msblk->read_data_mutex);
-
 	for (i = 0; i < b; i++) {
-		wait_on_buffer(bh[i]);
-		if (!buffer_uptodate(bh[i]))
-			goto block_release;
-
 		avail = min(bytes, msblk->devblksize - offset);
 		memcpy(buff, bh[i]->b_data + offset, avail);
 		buff += avail;
@@ -111,17 +105,9 @@ static int lzo_uncompress(struct squashfs_sb_info *msblk, void **buffer,
 		bytes -= avail;
 	}
 
-	mutex_unlock(&msblk->read_data_mutex);
 	return res;
 
-block_release:
-	for (; i < b; i++)
-		put_bh(bh[i]);
-
 failed:
-	mutex_unlock(&msblk->read_data_mutex);
-
-	ERROR("lzo decompression failed, data probably corrupt\n");
 	return -EIO;
 }
 
diff --git a/fs/squashfs/squashfs.h b/fs/squashfs/squashfs.h
index d126651..2e2751d 100644
--- a/fs/squashfs/squashfs.h
+++ b/fs/squashfs/squashfs.h
@@ -48,7 +48,14 @@ extern void *squashfs_read_table(struct super_block *, u64, int);
 
 /* decompressor.c */
 extern const struct squashfs_decompressor *squashfs_lookup_decompressor(int);
-extern void *squashfs_decompressor_init(struct super_block *, unsigned short);
+extern void *squashfs_decompressor_setup(struct super_block *, unsigned short);
+
+/* decompressor_xxx.c */
+extern void *squashfs_decompressor_create(struct squashfs_sb_info *, void *);
+extern void squashfs_decompressor_destroy(struct squashfs_sb_info *);
+extern int squashfs_decompress(struct squashfs_sb_info *, void **,
+	struct buffer_head **, int, int, int, int, int);
+extern int squashfs_max_decompressors(void);
 
 /* export.c */
 extern __le64 *squashfs_read_inode_lookup_table(struct super_block *, u64, u64,
diff --git a/fs/squashfs/squashfs_fs_sb.h b/fs/squashfs/squashfs_fs_sb.h
index 52934a2..9cdcf41 100644
--- a/fs/squashfs/squashfs_fs_sb.h
+++ b/fs/squashfs/squashfs_fs_sb.h
@@ -63,10 +63,9 @@ struct squashfs_sb_info {
 	__le64					*id_table;
 	__le64					*fragment_index;
 	__le64					*xattr_id_table;
-	struct mutex				read_data_mutex;
 	struct mutex				meta_index_mutex;
 	struct meta_index			*meta_index;
-	void					*stream;
+	struct squashfs_stream			*stream;
 	__le64					*inode_lookup_table;
 	u64					inode_table;
 	u64					directory_table;
diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c
index 60553a9..202df63 100644
--- a/fs/squashfs/super.c
+++ b/fs/squashfs/super.c
@@ -98,7 +98,6 @@ static int squashfs_fill_super(struct super_block *sb, void *data, int silent)
 	msblk->devblksize = sb_min_blocksize(sb, SQUASHFS_DEVBLK_SIZE);
 	msblk->devblksize_log2 = ffz(~msblk->devblksize);
 
-	mutex_init(&msblk->read_data_mutex);
 	mutex_init(&msblk->meta_index_mutex);
 
 	/*
@@ -206,13 +205,14 @@ static int squashfs_fill_super(struct super_block *sb, void *data, int silent)
 		goto failed_mount;
 
 	/* Allocate read_page block */
-	msblk->read_page = squashfs_cache_init("data", 1, msblk->block_size);
+	msblk->read_page = squashfs_cache_init("data",
+		squashfs_max_decompressors(), msblk->block_size);
 	if (msblk->read_page == NULL) {
 		ERROR("Failed to allocate read_page block\n");
 		goto failed_mount;
 	}
 
-	msblk->stream = squashfs_decompressor_init(sb, flags);
+	msblk->stream = squashfs_decompressor_setup(sb, flags);
 	if (IS_ERR(msblk->stream)) {
 		err = PTR_ERR(msblk->stream);
 		msblk->stream = NULL;
@@ -336,7 +336,7 @@ failed_mount:
 	squashfs_cache_delete(msblk->block_cache);
 	squashfs_cache_delete(msblk->fragment_cache);
 	squashfs_cache_delete(msblk->read_page);
-	squashfs_decompressor_free(msblk, msblk->stream);
+	squashfs_decompressor_destroy(msblk);
 	kfree(msblk->inode_lookup_table);
 	kfree(msblk->fragment_index);
 	kfree(msblk->id_table);
@@ -383,7 +383,7 @@ static void squashfs_put_super(struct super_block *sb)
 		squashfs_cache_delete(sbi->block_cache);
 		squashfs_cache_delete(sbi->fragment_cache);
 		squashfs_cache_delete(sbi->read_page);
-		squashfs_decompressor_free(sbi, sbi->stream);
+		squashfs_decompressor_destroy(sbi);
 		kfree(sbi->id_table);
 		kfree(sbi->fragment_index);
 		kfree(sbi->meta_index);
diff --git a/fs/squashfs/xz_wrapper.c b/fs/squashfs/xz_wrapper.c
index 1760b7d1..5d1d07c 100644
--- a/fs/squashfs/xz_wrapper.c
+++ b/fs/squashfs/xz_wrapper.c
@@ -38,38 +38,63 @@ struct squashfs_xz {
 	struct xz_buf buf;
 };
 
-struct comp_opts {
+struct disk_comp_opts {
 	__le32 dictionary_size;
 	__le32 flags;
 };
 
-static void *squashfs_xz_init(struct squashfs_sb_info *msblk, void *buff,
-	int len)
+struct comp_opts {
+	int dict_size;
+};
+
+static void *squashfs_xz_comp_opts(struct squashfs_sb_info *msblk,
+	void *buff, int len)
 {
-	struct comp_opts *comp_opts = buff;
-	struct squashfs_xz *stream;
-	int dict_size = msblk->block_size;
-	int err, n;
+	struct disk_comp_opts *comp_opts = buff;
+	struct comp_opts *opts;
+	int err = 0, n;
+
+	opts = kmalloc(sizeof(*opts), GFP_KERNEL);
+	if (opts == NULL) {
+		err = -ENOMEM;
+		goto out2;
+	}
 
 	if (comp_opts) {
 		/* check compressor options are the expected length */
 		if (len < sizeof(*comp_opts)) {
 			err = -EIO;
-			goto failed;
+			goto out;
 		}
 
-		dict_size = le32_to_cpu(comp_opts->dictionary_size);
+		opts->dict_size = le32_to_cpu(comp_opts->dictionary_size);
 
 		/* the dictionary size should be 2^n or 2^n+2^(n+1) */
-		n = ffs(dict_size) - 1;
-		if (dict_size != (1 << n) && dict_size != (1 << n) +
+		n = ffs(opts->dict_size) - 1;
+		if (opts->dict_size != (1 << n) && opts->dict_size != (1 << n) +
 						(1 << (n + 1))) {
 			err = -EIO;
-			goto failed;
+			goto out;
 		}
-	}
+	} else
+		/* use defaults */
+		opts->dict_size = max_t(int, msblk->block_size,
+							SQUASHFS_METADATA_SIZE);
 
-	dict_size = max_t(int, dict_size, SQUASHFS_METADATA_SIZE);
+	return opts;
+
+out:
+	kfree(opts);
+out2:
+	return ERR_PTR(err);
+}
+
+
+static void *squashfs_xz_init(struct squashfs_sb_info *msblk, void *buff)
+{
+	struct comp_opts *comp_opts = buff;
+	struct squashfs_xz *stream;
+	int err;
 
 	stream = kmalloc(sizeof(*stream), GFP_KERNEL);
 	if (stream == NULL) {
@@ -77,7 +102,7 @@ static void *squashfs_xz_init(struct squashfs_sb_info *msblk, void *buff,
 		goto failed;
 	}
 
-	stream->state = xz_dec_init(XZ_PREALLOC, dict_size);
+	stream->state = xz_dec_init(XZ_PREALLOC, comp_opts->dict_size);
 	if (stream->state == NULL) {
 		kfree(stream);
 		err = -ENOMEM;
@@ -103,15 +128,13 @@ static void squashfs_xz_free(void *strm)
 }
 
 
-static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void **buffer,
-	struct buffer_head **bh, int b, int offset, int length, int srclength,
-	int pages)
+static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void *strm,
+	void **buffer, struct buffer_head **bh, int b, int offset, int length,
+	int srclength, int pages)
 {
 	enum xz_ret xz_err;
 	int avail, total = 0, k = 0, page = 0;
-	struct squashfs_xz *stream = msblk->stream;
-
-	mutex_lock(&msblk->read_data_mutex);
+	struct squashfs_xz *stream = strm;
 
 	xz_dec_reset(stream->state);
 	stream->buf.in_pos = 0;
@@ -124,10 +147,6 @@ static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void **buffer,
 		if (stream->buf.in_pos == stream->buf.in_size && k < b) {
 			avail = min(length, msblk->devblksize - offset);
 			length -= avail;
-			wait_on_buffer(bh[k]);
-			if (!buffer_uptodate(bh[k]))
-				goto release_mutex;
-
 			stream->buf.in = bh[k]->b_data + offset;
 			stream->buf.in_size = avail;
 			stream->buf.in_pos = 0;
@@ -147,23 +166,12 @@ static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void **buffer,
 			put_bh(bh[k++]);
 	} while (xz_err == XZ_OK);
 
-	if (xz_err != XZ_STREAM_END) {
-		ERROR("xz_dec_run error, data probably corrupt\n");
-		goto release_mutex;
-	}
-
-	if (k < b) {
-		ERROR("xz_uncompress error, input remaining\n");
-		goto release_mutex;
-	}
-
-	total += stream->buf.out_pos;
-	mutex_unlock(&msblk->read_data_mutex);
-	return total;
+	if (xz_err != XZ_STREAM_END || k < b)
+		goto out;
 
-release_mutex:
-	mutex_unlock(&msblk->read_data_mutex);
+	return total + stream->buf.out_pos;
 
+out:
 	for (; k < b; k++)
 		put_bh(bh[k]);
 
@@ -172,6 +180,7 @@ release_mutex:
 
 const struct squashfs_decompressor squashfs_xz_comp_ops = {
 	.init = squashfs_xz_init,
+	.comp_opts = squashfs_xz_comp_opts,
 	.free = squashfs_xz_free,
 	.decompress = squashfs_xz_uncompress,
 	.id = XZ_COMPRESSION,
diff --git a/fs/squashfs/zlib_wrapper.c b/fs/squashfs/zlib_wrapper.c
index 55d918f..bb04902 100644
--- a/fs/squashfs/zlib_wrapper.c
+++ b/fs/squashfs/zlib_wrapper.c
@@ -33,7 +33,7 @@
 #include "squashfs.h"
 #include "decompressor.h"
 
-static void *zlib_init(struct squashfs_sb_info *dummy, void *buff, int len)
+static void *zlib_init(struct squashfs_sb_info *dummy, void *buff)
 {
 	z_stream *stream = kmalloc(sizeof(z_stream), GFP_KERNEL);
 	if (stream == NULL)
@@ -61,15 +61,13 @@ static void zlib_free(void *strm)
 }
 
 
-static int zlib_uncompress(struct squashfs_sb_info *msblk, void **buffer,
-	struct buffer_head **bh, int b, int offset, int length, int srclength,
-	int pages)
+static int zlib_uncompress(struct squashfs_sb_info *msblk, void *strm,
+	void **buffer, struct buffer_head **bh, int b, int offset, int length,
+	int srclength, int pages)
 {
 	int zlib_err, zlib_init = 0;
 	int k = 0, page = 0;
-	z_stream *stream = msblk->stream;
-
-	mutex_lock(&msblk->read_data_mutex);
+	z_stream *stream = strm;
 
 	stream->avail_out = 0;
 	stream->avail_in = 0;
@@ -78,10 +76,6 @@ static int zlib_uncompress(struct squashfs_sb_info *msblk, void **buffer,
 		if (stream->avail_in == 0 && k < b) {
 			int avail = min(length, msblk->devblksize - offset);
 			length -= avail;
-			wait_on_buffer(bh[k]);
-			if (!buffer_uptodate(bh[k]))
-				goto release_mutex;
-
 			stream->next_in = bh[k]->b_data + offset;
 			stream->avail_in = avail;
 			offset = 0;
@@ -94,12 +88,8 @@ static int zlib_uncompress(struct squashfs_sb_info *msblk, void **buffer,
 
 		if (!zlib_init) {
 			zlib_err = zlib_inflateInit(stream);
-			if (zlib_err != Z_OK) {
-				ERROR("zlib_inflateInit returned unexpected "
-					"result 0x%x, srclength %d\n",
-					zlib_err, srclength);
-				goto release_mutex;
-			}
+			if (zlib_err != Z_OK)
+				goto out;
 			zlib_init = 1;
 		}
 
@@ -109,29 +99,19 @@ static int zlib_uncompress(struct squashfs_sb_info *msblk, void **buffer,
 			put_bh(bh[k++]);
 	} while (zlib_err == Z_OK);
 
-	if (zlib_err != Z_STREAM_END) {
-		ERROR("zlib_inflate error, data probably corrupt\n");
-		goto release_mutex;
-	}
+	if (zlib_err != Z_STREAM_END)
+		goto out;
 
 	zlib_err = zlib_inflateEnd(stream);
-	if (zlib_err != Z_OK) {
-		ERROR("zlib_inflate error, data probably corrupt\n");
-		goto release_mutex;
-	}
-
-	if (k < b) {
-		ERROR("zlib_uncompress error, data remaining\n");
-		goto release_mutex;
-	}
+	if (zlib_err != Z_OK)
+		goto out;
 
-	length = stream->total_out;
-	mutex_unlock(&msblk->read_data_mutex);
-	return length;
+	if (k < b)
+		goto out;
 
-release_mutex:
-	mutex_unlock(&msblk->read_data_mutex);
+	return stream->total_out;
 
+out:
 	for (; k < b; k++)
 		put_bh(bh[k]);
 
-- 
1.7.10.4



* [PATCH 2/6] Squashfs: enhance parallel I/O
  2013-11-07 20:24 [PATCH 0/6] Squashfs performance improvements Phillip Lougher
  2013-11-07 20:24 ` [PATCH 1/6] Squashfs: Refactor decompressor interface and code (V2) Phillip Lougher
@ 2013-11-07 20:24 ` Phillip Lougher
  2013-11-07 20:24 ` [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables Phillip Lougher
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Phillip Lougher @ 2013-11-07 20:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: minchan, Phillip Lougher

From: Minchan Kim <minchan@kernel.org>

Squashfs currently uses only a single stream buffer for decompression,
which hurts parallel read performance, so this patch supports multiple
decompressors to enhance parallel I/O performance.

Four 1G file dd reads on a KVM machine with 2 CPUs and 4G of memory:

dd if=test/test1.dat of=/dev/null &
dd if=test/test2.dat of=/dev/null &
dd if=test/test3.dat of=/dev/null &
dd if=test/test4.dat of=/dev/null &

old : 1m39s -> new : 9s
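The serialisation this patch removes can be made visible in userspace
(a pthread analogue, not kernel code; the counter exists only to observe
the behaviour): with one stream guarded by one mutex, no matter how many
readers run concurrently, at most one "decompression" is ever in flight.

```c
#include <pthread.h>
#include <assert.h>

/* Sketch of the pre-patch behaviour: a single read_data_mutex around
 * the lone decompression stream.  max_in_flight records the largest
 * number of threads ever inside the critical section at once. */

static pthread_mutex_t read_data_mutex = PTHREAD_MUTEX_INITIALIZER;
static int in_flight, max_in_flight;

void *reader(void *arg)
{
	int i;

	(void)arg;
	for (i = 0; i < 1000; i++) {
		pthread_mutex_lock(&read_data_mutex);
		in_flight++;
		if (in_flight > max_in_flight)
			max_in_flight = in_flight;
		/* "decompress" would happen here, fully serialised */
		in_flight--;
		pthread_mutex_unlock(&read_data_mutex);
	}
	return NULL;
}
```

However many reader threads are started, max_in_flight never exceeds 1,
which is why the four concurrent dd reads above were effectively queued
behind one decompressor.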

* From v1
  * Change comp_strm with decomp_strm - Phillip
  * Change/add comments - Phillip

Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
---
 fs/squashfs/Kconfig              |   13 +++
 fs/squashfs/Makefile             |    9 +-
 fs/squashfs/decompressor_multi.c |  200 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 221 insertions(+), 1 deletion(-)
 create mode 100644 fs/squashfs/decompressor_multi.c

diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
index c70111e..1c6d340 100644
--- a/fs/squashfs/Kconfig
+++ b/fs/squashfs/Kconfig
@@ -63,6 +63,19 @@ config SQUASHFS_LZO
 
 	  If unsure, say N.
 
+config SQUASHFS_MULTI_DECOMPRESSOR
+	bool "Use multiple decompressors for handling parallel I/O"
+	depends on SQUASHFS
+	help
+	  By default Squashfs uses a single decompressor, which gives
+	  poor performance on parallel I/O workloads on multi-CPU machines,
+	  due to waiting on decompressor availability.
+
+	  If you have a parallel I/O workload and your system has enough memory,
+	  using this option may improve overall I/O performance.
+
+	  If unsure, say N.
+
 config SQUASHFS_XZ
 	bool "Include support for XZ compressed file systems"
 	depends on SQUASHFS
diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
index c223c84..dfebc3b 100644
--- a/fs/squashfs/Makefile
+++ b/fs/squashfs/Makefile
@@ -4,8 +4,15 @@
 
 obj-$(CONFIG_SQUASHFS) += squashfs.o
 squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
-squashfs-y += namei.o super.o symlink.o decompressor.o decompressor_single.o
+squashfs-y += namei.o super.o symlink.o decompressor.o
+
 squashfs-$(CONFIG_SQUASHFS_XATTR) += xattr.o xattr_id.o
 squashfs-$(CONFIG_SQUASHFS_LZO) += lzo_wrapper.o
 squashfs-$(CONFIG_SQUASHFS_XZ) += xz_wrapper.o
 squashfs-$(CONFIG_SQUASHFS_ZLIB) += zlib_wrapper.o
+
+ifdef CONFIG_SQUASHFS_MULTI_DECOMPRESSOR
+	squashfs-y		+= decompressor_multi.o
+else
+	squashfs-y		+= decompressor_single.o
+endif
diff --git a/fs/squashfs/decompressor_multi.c b/fs/squashfs/decompressor_multi.c
new file mode 100644
index 0000000..462731d
--- /dev/null
+++ b/fs/squashfs/decompressor_multi.c
@@ -0,0 +1,200 @@
+/*
+ *  Copyright (c) 2013
+ *  Minchan Kim <minchan@kernel.org>
+ *
+ *  This work is licensed under the terms of the GNU GPL, version 2. See
+ *  the COPYING file in the top-level directory.
+ */
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/buffer_head.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/cpumask.h>
+
+#include "squashfs_fs.h"
+#include "squashfs_fs_sb.h"
+#include "decompressor.h"
+#include "squashfs.h"
+
+/*
+ * This file implements multi-threaded decompression in the
+ * decompressor framework
+ */
+
+
+/*
+ * The reason for multiplying by two is that a CPU can request new I/O
+ * while it is waiting for a previous request.
+ */
+#define MAX_DECOMPRESSOR	(num_online_cpus() * 2)
+
+
+int squashfs_max_decompressors(void)
+{
+	return MAX_DECOMPRESSOR;
+}
+
+
+struct squashfs_stream {
+	void			*comp_opts;
+	struct list_head	strm_list;
+	struct mutex		mutex;
+	int			avail_decomp;
+	wait_queue_head_t	wait;
+};
+
+
+struct decomp_stream {
+	void *stream;
+	struct list_head list;
+};
+
+
+static void put_decomp_stream(struct decomp_stream *decomp_strm,
+				struct squashfs_stream *stream)
+{
+	mutex_lock(&stream->mutex);
+	list_add(&decomp_strm->list, &stream->strm_list);
+	mutex_unlock(&stream->mutex);
+	wake_up(&stream->wait);
+}
+
+void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
+				void *comp_opts)
+{
+	struct squashfs_stream *stream;
+	struct decomp_stream *decomp_strm = NULL;
+	int err = -ENOMEM;
+
+	stream = kzalloc(sizeof(*stream), GFP_KERNEL);
+	if (!stream)
+		goto out;
+
+	stream->comp_opts = comp_opts;
+	mutex_init(&stream->mutex);
+	INIT_LIST_HEAD(&stream->strm_list);
+	init_waitqueue_head(&stream->wait);
+
+	/*
+	 * We should have at least one decompressor by default,
+	 * so that if we fail to allocate a new decompressor
+	 * dynamically we can always fall back to it and the
+	 * file system keeps working.
+	 */
+	decomp_strm = kmalloc(sizeof(*decomp_strm), GFP_KERNEL);
+	if (!decomp_strm)
+		goto out;
+
+	decomp_strm->stream = msblk->decompressor->init(msblk,
+						stream->comp_opts);
+	if (IS_ERR(decomp_strm->stream)) {
+		err = PTR_ERR(decomp_strm->stream);
+		goto out;
+	}
+
+	list_add(&decomp_strm->list, &stream->strm_list);
+	stream->avail_decomp = 1;
+	return stream;
+
+out:
+	kfree(decomp_strm);
+	kfree(stream);
+	return ERR_PTR(err);
+}
+
+
+void squashfs_decompressor_destroy(struct squashfs_sb_info *msblk)
+{
+	struct squashfs_stream *stream = msblk->stream;
+	if (stream) {
+		struct decomp_stream *decomp_strm;
+
+		while (!list_empty(&stream->strm_list)) {
+			decomp_strm = list_entry(stream->strm_list.prev,
+						struct decomp_stream, list);
+			list_del(&decomp_strm->list);
+			msblk->decompressor->free(decomp_strm->stream);
+			kfree(decomp_strm);
+			stream->avail_decomp--;
+		}
+
+		WARN_ON(stream->avail_decomp);
+		kfree(stream->comp_opts);
+		kfree(stream);
+	}
+}
+
+
+static struct decomp_stream *get_decomp_stream(struct squashfs_sb_info *msblk,
+					struct squashfs_stream *stream)
+{
+	struct decomp_stream *decomp_strm;
+
+	while (1) {
+		mutex_lock(&stream->mutex);
+
+		/* There is an available decomp_stream */
+		if (!list_empty(&stream->strm_list)) {
+			decomp_strm = list_entry(stream->strm_list.prev,
+				struct decomp_stream, list);
+			list_del(&decomp_strm->list);
+			mutex_unlock(&stream->mutex);
+			break;
+		}
+
+		/*
+		 * No decompressor is available and the pool is full,
+		 * so wait for another user to release one.
+		 */
+		if (stream->avail_decomp >= MAX_DECOMPRESSOR)
+			goto wait;
+
+		/* Let's allocate new decomp */
+		decomp_strm = kmalloc(sizeof(*decomp_strm), GFP_KERNEL);
+		if (!decomp_strm)
+			goto wait;
+
+		decomp_strm->stream = msblk->decompressor->init(msblk,
+						stream->comp_opts);
+		if (IS_ERR(decomp_strm->stream)) {
+			kfree(decomp_strm);
+			goto wait;
+		}
+
+		stream->avail_decomp++;
+		WARN_ON(stream->avail_decomp > MAX_DECOMPRESSOR);
+
+		mutex_unlock(&stream->mutex);
+		break;
+wait:
+		/*
+		 * If system memory is tight, wait for another user
+		 * to release a decompressor rather than pressuring
+		 * the VM, which could cause page cache thrashing.
+		 */
+		mutex_unlock(&stream->mutex);
+		wait_event(stream->wait,
+			!list_empty(&stream->strm_list));
+	}
+
+	return decomp_strm;
+}
+
+
+int squashfs_decompress(struct squashfs_sb_info *msblk,
+	void **buffer, struct buffer_head **bh, int b, int offset, int length,
+	int srclength, int pages)
+{
+	int res;
+	struct squashfs_stream *stream = msblk->stream;
+	struct decomp_stream *decomp_stream = get_decomp_stream(msblk, stream);
+	res = msblk->decompressor->decompress(msblk, decomp_stream->stream,
+		buffer, bh, b, offset, length, srclength, pages);
+	put_decomp_stream(decomp_stream, stream);
+	if (res < 0)
+		ERROR("%s decompression failed, data probably corrupt\n",
+			msblk->decompressor->name);
+	return res;
+}
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
  2013-11-07 20:24 [PATCH 0/6] Squashfs performance improvements Phillip Lougher
  2013-11-07 20:24 ` [PATCH 1/6] Squashfs: Refactor decompressor interface and code (V2) Phillip Lougher
  2013-11-07 20:24 ` [PATCH 2/6] Squashfs: enhance parallel I/O Phillip Lougher
@ 2013-11-07 20:24 ` Phillip Lougher
  2013-11-08  2:42   ` Minchan Kim
  2013-11-07 20:24 ` [PATCH 4/6] Squashfs: Generalise paging handling in the decompressors (V2) Phillip Lougher
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Phillip Lougher @ 2013-11-07 20:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: minchan, Phillip Lougher

Add a multi-threaded decompression implementation which uses
percpu variables.

Using percpu variables has advantages and disadvantages over
implementations which do not use percpu variables.

Advantages: the nature of percpu variables ensures decompression is
load-balanced across the multiple cores.

Disadvantages: it limits decompression to one thread per core.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
---
 fs/squashfs/Kconfig                     |   57 +++++++++++++----
 fs/squashfs/Makefile                    |   10 +--
 fs/squashfs/decompressor_multi_percpu.c |  105 +++++++++++++++++++++++++++++++
 3 files changed, 152 insertions(+), 20 deletions(-)
 create mode 100644 fs/squashfs/decompressor_multi_percpu.c

diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
index 1c6d340..c92c75f 100644
--- a/fs/squashfs/Kconfig
+++ b/fs/squashfs/Kconfig
@@ -25,6 +25,50 @@ config SQUASHFS
 
 	  If unsure, say N.
 
+choice
+	prompt "Decompressor parallelisation options"
+	depends on SQUASHFS
+	help
+	  Squashfs now supports three parallelisation options for
+	  decompression.  Each one exhibits various trade-offs between
+	  decompression performance and CPU and memory usage.
+
+	  If in doubt, select "Single threaded decompression".
+
+config SQUASHFS_DECOMP_SINGLE
+	bool "Single threaded decompression"
+	help
+	  Traditionally Squashfs has used single-threaded decompression.
+	  Only one block (data or metadata) can be decompressed at any
+	  one time.  This limits CPU and memory usage to a minimum.
+
+config SQUASHFS_DECOMP_MULTI
+	bool "Use multiple decompressors for parallel I/O"
+	help
+	  By default Squashfs uses a single decompressor, which gives
+	  poor performance on parallel I/O workloads on multi-CPU
+	  machines due to waiting on decompressor availability.
+
+	  If you have a parallel I/O workload and your system has enough memory,
+	  using this option may improve overall I/O performance.
+
+	  This decompressor implementation uses up to two parallel
+	  decompressors per core.  It dynamically allocates
+	  decompressors on demand.
+
+config SQUASHFS_DECOMP_MULTI_PERCPU
+	bool "Use percpu multiple decompressors for parallel I/O"
+	help
+	  By default Squashfs uses a single decompressor, which gives
+	  poor performance on parallel I/O workloads on multi-CPU
+	  machines due to waiting on decompressor availability.
+
+	  This decompressor implementation uses a maximum of one
+	  decompressor per core.  It uses percpu variables to ensure
+	  decompression is load-balanced across the cores.
+
+endchoice
+
 config SQUASHFS_XATTR
 	bool "Squashfs XATTR support"
 	depends on SQUASHFS
@@ -63,19 +107,6 @@ config SQUASHFS_LZO
 
 	  If unsure, say N.
 
-config SQUASHFS_MULTI_DECOMPRESSOR
-	bool "Use multiple decompressors for handling parallel I/O"
-	depends on SQUASHFS
-	help
-	  By default Squashfs uses a single decompressor but it gives
-	  poor performance on parallel I/O workloads when using multiple CPU
-	  machines due to waiting on decompressor availability.
-
-	  If you have a parallel I/O workload and your system has enough memory,
-	  using this option may improve overall I/O performance.
-
-	  If unsure, say N.
-
 config SQUASHFS_XZ
 	bool "Include support for XZ compressed file systems"
 	depends on SQUASHFS
diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
index dfebc3b..5833b96 100644
--- a/fs/squashfs/Makefile
+++ b/fs/squashfs/Makefile
@@ -5,14 +5,10 @@
 obj-$(CONFIG_SQUASHFS) += squashfs.o
 squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
 squashfs-y += namei.o super.o symlink.o decompressor.o
-
+squashfs-$(CONFIG_SQUASHFS_DECOMP_SINGLE) += decompressor_single.o
+squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI) += decompressor_multi.o
+squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU) += decompressor_multi_percpu.o
 squashfs-$(CONFIG_SQUASHFS_XATTR) += xattr.o xattr_id.o
 squashfs-$(CONFIG_SQUASHFS_LZO) += lzo_wrapper.o
 squashfs-$(CONFIG_SQUASHFS_XZ) += xz_wrapper.o
 squashfs-$(CONFIG_SQUASHFS_ZLIB) += zlib_wrapper.o
-
-ifdef CONFIG_SQUASHFS_MULTI_DECOMPRESSOR
-	squashfs-y		+= decompressor_multi.o
-else
-	squashfs-y		+= decompressor_single.o
-endif
diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c
new file mode 100644
index 0000000..b5598ab
--- /dev/null
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -0,0 +1,105 @@
+/*
+ * Copyright (c) 2013
+ * Phillip Lougher <phillip@squashfs.org.uk>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/buffer_head.h>
+
+#include "squashfs_fs.h"
+#include "squashfs_fs_sb.h"
+#include "decompressor.h"
+#include "squashfs.h"
+
+/*
+ * This file implements multi-threaded decompression using percpu
+ * variables, one thread per cpu core.
+ */
+
+struct squashfs_stream {
+	void		*stream;
+};
+
+void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
+						void *comp_opts)
+{
+	struct squashfs_stream *stream;
+	struct squashfs_stream __percpu *percpu;
+	int err, cpu, cpu0;
+
+	percpu = alloc_percpu(struct squashfs_stream);
+	if (percpu == NULL) {
+		err = -ENOMEM;
+		goto out2;
+	}
+
+	for_each_possible_cpu(cpu) {
+		stream = per_cpu_ptr(percpu, cpu);
+		stream->stream = msblk->decompressor->init(msblk, comp_opts);
+		if (IS_ERR(stream->stream)) {
+			err = PTR_ERR(stream->stream);
+			goto out;
+		}
+	}
+
+	kfree(comp_opts);
+	return (__force void *) percpu;
+
+out:
+	for_each_possible_cpu(cpu0) {
+		if (cpu0 == cpu)
+			break;
+		stream = per_cpu_ptr(percpu, cpu0);
+		msblk->decompressor->free(stream->stream);
+	}
+	free_percpu(percpu);
+out2:
+	kfree(comp_opts);
+	return ERR_PTR(err);
+}
+
+void squashfs_decompressor_destroy(struct squashfs_sb_info *msblk)
+{
+	struct squashfs_stream __percpu *percpu =
+			(struct squashfs_stream __percpu *) msblk->stream;
+	struct squashfs_stream *stream;
+	int cpu;
+
+	if (msblk->stream) {
+		for_each_possible_cpu(cpu) {
+			stream = per_cpu_ptr(percpu, cpu);
+			msblk->decompressor->free(stream->stream);
+		}
+		free_percpu(percpu);
+	}
+}
+
+int squashfs_decompress(struct squashfs_sb_info *msblk,
+	void **buffer, struct buffer_head **bh, int b, int offset, int length,
+	int srclength, int pages)
+{
+	int res;
+	struct squashfs_stream __percpu *percpu =
+			(struct squashfs_stream __percpu *) msblk->stream;
+	struct squashfs_stream *stream = get_cpu_ptr(percpu);
+
+	res = msblk->decompressor->decompress(msblk, stream->stream, buffer,
+		bh, b, offset, length, srclength, pages);
+	put_cpu_ptr(stream);
+
+	if (res < 0)
+		ERROR("%s decompression failed, data probably corrupt\n",
+			msblk->decompressor->name);
+
+	return res;
+}
+
+int squashfs_max_decompressors(void)
+{
+	return num_possible_cpus();
+}
-- 
1.7.10.4



* [PATCH 4/6] Squashfs: Generalise paging handling in the decompressors (V2)
  2013-11-07 20:24 [PATCH 0/6] Squashfs performance improvements Phillip Lougher
                   ` (2 preceding siblings ...)
  2013-11-07 20:24 ` [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables Phillip Lougher
@ 2013-11-07 20:24 ` Phillip Lougher
  2013-11-08  5:29   ` Minchan Kim
  2013-11-07 20:24 ` [PATCH 5/6] Squashfs: restructure squashfs_readpage() Phillip Lougher
  2013-11-07 20:24 ` [PATCH 6/6] Squashfs: Directly decompress into the page cache for file data (V2) Phillip Lougher
  5 siblings, 1 reply; 15+ messages in thread
From: Phillip Lougher @ 2013-11-07 20:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: minchan, Phillip Lougher

Further generalise the decompressors by adding a page handler
abstraction.  This adds helpers to allow the decompressors
to access and process the output buffers in an implementation
independent manner.

This allows different types of output buffer to be passed
to the decompressors, with the implementation specific
aspects handled at decompression time, but without the
knowledge being held in the decompressor wrapper code.

This will allow the decompressors to handle Squashfs
cache buffers, and page cache pages.

This patch adds the abstraction and an implementation for
the caches.

V2: also update the code in decompressor_multi*.c

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
---
 fs/squashfs/block.c                     |   27 +++++++++-------
 fs/squashfs/cache.c                     |   28 +++++++++++++---
 fs/squashfs/decompressor.c              |   14 ++++++--
 fs/squashfs/decompressor.h              |    5 +--
 fs/squashfs/decompressor_multi.c        |    7 ++--
 fs/squashfs/decompressor_multi_percpu.c |    9 +++---
 fs/squashfs/decompressor_single.c       |    9 +++---
 fs/squashfs/lzo_wrapper.c               |   27 ++++++++++------
 fs/squashfs/page_actor.h                |   54 +++++++++++++++++++++++++++++++
 fs/squashfs/squashfs.h                  |    8 ++---
 fs/squashfs/squashfs_fs_sb.h            |    1 +
 fs/squashfs/xz_wrapper.c                |   22 +++++++------
 fs/squashfs/zlib_wrapper.c              |   24 ++++++++------
 13 files changed, 168 insertions(+), 67 deletions(-)
 create mode 100644 fs/squashfs/page_actor.h

diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c
index 4dd4025..0cea9b9 100644
--- a/fs/squashfs/block.c
+++ b/fs/squashfs/block.c
@@ -36,6 +36,7 @@
 #include "squashfs_fs_sb.h"
 #include "squashfs.h"
 #include "decompressor.h"
+#include "page_actor.h"
 
 /*
  * Read the metadata block length, this is stored in the first two
@@ -86,16 +87,16 @@ static struct buffer_head *get_block_length(struct super_block *sb,
  * generated a larger block - this does occasionally happen with compression
  * algorithms).
  */
-int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
-			int length, u64 *next_index, int srclength, int pages)
+int squashfs_read_data(struct super_block *sb, u64 index, int length,
+		u64 *next_index, struct squashfs_page_actor *output)
 {
 	struct squashfs_sb_info *msblk = sb->s_fs_info;
 	struct buffer_head **bh;
 	int offset = index & ((1 << msblk->devblksize_log2) - 1);
 	u64 cur_index = index >> msblk->devblksize_log2;
-	int bytes, compressed, b = 0, k = 0, page = 0, avail, i;
+	int bytes, compressed, b = 0, k = 0, avail, i;
 
-	bh = kcalloc(((srclength + msblk->devblksize - 1)
+	bh = kcalloc(((output->length + msblk->devblksize - 1)
 		>> msblk->devblksize_log2) + 1, sizeof(*bh), GFP_KERNEL);
 	if (bh == NULL)
 		return -ENOMEM;
@@ -111,9 +112,9 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 			*next_index = index + length;
 
 		TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n",
-			index, compressed ? "" : "un", length, srclength);
+			index, compressed ? "" : "un", length, output->length);
 
-		if (length < 0 || length > srclength ||
+		if (length < 0 || length > output->length ||
 				(index + length) > msblk->bytes_used)
 			goto read_failure;
 
@@ -145,7 +146,7 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 		TRACE("Block @ 0x%llx, %scompressed size %d\n", index,
 				compressed ? "" : "un", length);
 
-		if (length < 0 || length > srclength ||
+		if (length < 0 || length > output->length ||
 					(index + length) > msblk->bytes_used)
 			goto block_release;
 
@@ -165,8 +166,8 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 	}
 
 	if (compressed) {
-		length = squashfs_decompress(msblk, buffer, bh, b, offset,
-			 length, srclength, pages);
+		length = squashfs_decompress(msblk, bh, b, offset, length,
+			output);
 		if (length < 0)
 			goto read_failure;
 	} else {
@@ -174,19 +175,20 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 		 * Block is uncompressed.
 		 */
 		int in, pg_offset = 0;
+		void *data = squashfs_first_page(output);
 
 		for (bytes = length; k < b; k++) {
 			in = min(bytes, msblk->devblksize - offset);
 			bytes -= in;
 			while (in) {
 				if (pg_offset == PAGE_CACHE_SIZE) {
-					page++;
+					data = squashfs_next_page(output);
 					pg_offset = 0;
 				}
 				avail = min_t(int, in, PAGE_CACHE_SIZE -
 						pg_offset);
-				memcpy(buffer[page] + pg_offset,
-						bh[k]->b_data + offset, avail);
+				memcpy(data + pg_offset, bh[k]->b_data + offset,
+						avail);
 				in -= avail;
 				pg_offset += avail;
 				offset += avail;
@@ -194,6 +196,7 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 			offset = 0;
 			put_bh(bh[k]);
 		}
+		squashfs_finish_page(output);
 	}
 
 	kfree(bh);
diff --git a/fs/squashfs/cache.c b/fs/squashfs/cache.c
index af0b738..1cb70a0 100644
--- a/fs/squashfs/cache.c
+++ b/fs/squashfs/cache.c
@@ -56,6 +56,7 @@
 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
 #include "squashfs.h"
+#include "page_actor.h"
 
 /*
  * Look-up block in cache, and increment usage count.  If not in cache, read
@@ -119,9 +120,8 @@ struct squashfs_cache_entry *squashfs_cache_get(struct super_block *sb,
 			entry->error = 0;
 			spin_unlock(&cache->lock);
 
-			entry->length = squashfs_read_data(sb, entry->data,
-				block, length, &entry->next_index,
-				cache->block_size, cache->pages);
+			entry->length = squashfs_read_data(sb, block, length,
+				&entry->next_index, entry->actor);
 
 			spin_lock(&cache->lock);
 
@@ -220,6 +220,7 @@ void squashfs_cache_delete(struct squashfs_cache *cache)
 				kfree(cache->entry[i].data[j]);
 			kfree(cache->entry[i].data);
 		}
+		kfree(cache->entry[i].actor);
 	}
 
 	kfree(cache->entry);
@@ -280,6 +281,13 @@ struct squashfs_cache *squashfs_cache_init(char *name, int entries,
 				goto cleanup;
 			}
 		}
+
+		entry->actor = squashfs_page_actor_init(entry->data,
+						cache->pages, 0);
+		if (entry->actor == NULL) {
+			ERROR("Failed to allocate %s cache entry\n", name);
+			goto cleanup;
+		}
 	}
 
 	return cache;
@@ -410,6 +418,7 @@ void *squashfs_read_table(struct super_block *sb, u64 block, int length)
 	int pages = (length + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	int i, res;
 	void *table, *buffer, **data;
+	struct squashfs_page_actor *actor;
 
 	table = buffer = kmalloc(length, GFP_KERNEL);
 	if (table == NULL)
@@ -421,19 +430,28 @@ void *squashfs_read_table(struct super_block *sb, u64 block, int length)
 		goto failed;
 	}
 
+	actor = squashfs_page_actor_init(data, pages, length);
+	if (actor == NULL) {
+		res = -ENOMEM;
+		goto failed2;
+	}
+
 	for (i = 0; i < pages; i++, buffer += PAGE_CACHE_SIZE)
 		data[i] = buffer;
 
-	res = squashfs_read_data(sb, data, block, length |
-		SQUASHFS_COMPRESSED_BIT_BLOCK, NULL, length, pages);
+	res = squashfs_read_data(sb, block, length |
+		SQUASHFS_COMPRESSED_BIT_BLOCK, NULL, actor);
 
 	kfree(data);
+	kfree(actor);
 
 	if (res < 0)
 		goto failed;
 
 	return table;
 
+failed2:
+	kfree(data);
 failed:
 	kfree(table);
 	return ERR_PTR(res);
diff --git a/fs/squashfs/decompressor.c b/fs/squashfs/decompressor.c
index 234291f..ac22fe7 100644
--- a/fs/squashfs/decompressor.c
+++ b/fs/squashfs/decompressor.c
@@ -30,6 +30,7 @@
 #include "squashfs_fs_sb.h"
 #include "decompressor.h"
 #include "squashfs.h"
+#include "page_actor.h"
 
 /*
  * This file (and decompressor.h) implements a decompressor framework for
@@ -87,6 +88,7 @@ static void *get_comp_opts(struct super_block *sb, unsigned short flags)
 {
 	struct squashfs_sb_info *msblk = sb->s_fs_info;
 	void *buffer = NULL, *comp_opts;
+	struct squashfs_page_actor *actor = NULL;
 	int length = 0;
 
 	/*
@@ -99,9 +101,14 @@ static void *get_comp_opts(struct super_block *sb, unsigned short flags)
 			goto out;
 		}
 
-		length = squashfs_read_data(sb, &buffer,
-			sizeof(struct squashfs_super_block), 0, NULL,
-				PAGE_CACHE_SIZE, 1);
+		actor = squashfs_page_actor_init(&buffer, 1, 0);
+		if (actor == NULL) {
+			comp_opts = ERR_PTR(-ENOMEM);
+			goto out;
+		}
+
+		length = squashfs_read_data(sb,
+			sizeof(struct squashfs_super_block), 0, NULL, actor);
 
 		if (length < 0) {
 			comp_opts = ERR_PTR(length);
@@ -112,6 +119,7 @@ static void *get_comp_opts(struct super_block *sb, unsigned short flags)
 	comp_opts = squashfs_comp_opts(msblk, buffer, length);
 
 out:
+	kfree(actor);
 	kfree(buffer);
 	return comp_opts;
 }
diff --git a/fs/squashfs/decompressor.h b/fs/squashfs/decompressor.h
index 6cdb20a..af09853 100644
--- a/fs/squashfs/decompressor.h
+++ b/fs/squashfs/decompressor.h
@@ -27,8 +27,9 @@ struct squashfs_decompressor {
 	void	*(*init)(struct squashfs_sb_info *, void *);
 	void	*(*comp_opts)(struct squashfs_sb_info *, void *, int);
 	void	(*free)(void *);
-	int	(*decompress)(struct squashfs_sb_info *, void *, void **,
-		struct buffer_head **, int, int, int, int, int);
+	int	(*decompress)(struct squashfs_sb_info *, void *,
+		struct buffer_head **, int, int, int,
+		struct squashfs_page_actor *);
 	int	id;
 	char	*name;
 	int	supported;
diff --git a/fs/squashfs/decompressor_multi.c b/fs/squashfs/decompressor_multi.c
index 462731d..ae54675 100644
--- a/fs/squashfs/decompressor_multi.c
+++ b/fs/squashfs/decompressor_multi.c
@@ -183,15 +183,14 @@ wait:
 }
 
 
-int squashfs_decompress(struct squashfs_sb_info *msblk,
-	void **buffer, struct buffer_head **bh, int b, int offset, int length,
-	int srclength, int pages)
+int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
+	int b, int offset, int length, struct squashfs_page_actor *output)
 {
 	int res;
 	struct squashfs_stream *stream = msblk->stream;
 	struct decomp_stream *decomp_stream = get_decomp_stream(msblk, stream);
 	res = msblk->decompressor->decompress(msblk, decomp_stream->stream,
-		buffer, bh, b, offset, length, srclength, pages);
+		bh, b, offset, length, output);
 	put_decomp_stream(decomp_stream, stream);
 	if (res < 0)
 		ERROR("%s decompression failed, data probably corrupt\n",
diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c
index b5598ab..45381ce 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -79,17 +79,16 @@ void squashfs_decompressor_destroy(struct squashfs_sb_info *msblk)
 	}
 }
 
-int squashfs_decompress(struct squashfs_sb_info *msblk,
-	void **buffer, struct buffer_head **bh, int b, int offset, int length,
-	int srclength, int pages)
+int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
+	int b, int offset, int length, struct squashfs_page_actor *output)
 {
 	int res;
 	struct squashfs_stream __percpu *percpu =
 			(struct squashfs_stream __percpu *) msblk->stream;
 	struct squashfs_stream *stream = get_cpu_ptr(percpu);
 
-	res = msblk->decompressor->decompress(msblk, stream->stream, buffer,
-		bh, b, offset, length, srclength, pages);
+	res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+		offset, length, output);
 	put_cpu_ptr(stream);
 
 	if (res < 0)
diff --git a/fs/squashfs/decompressor_single.c b/fs/squashfs/decompressor_single.c
index c5bb694..2544e81 100644
--- a/fs/squashfs/decompressor_single.c
+++ b/fs/squashfs/decompressor_single.c
@@ -61,16 +61,15 @@ void squashfs_decompressor_destroy(struct squashfs_sb_info *msblk)
 	}
 }
 
-int squashfs_decompress(struct squashfs_sb_info *msblk,
-	void **buffer, struct buffer_head **bh, int b, int offset, int length,
-	int srclength, int pages)
+int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
+	int b, int offset, int length, struct squashfs_page_actor *output)
 {
 	int res;
 	struct squashfs_stream *stream = msblk->stream;
 
 	mutex_lock(&stream->mutex);
-	res = msblk->decompressor->decompress(msblk, stream->stream, buffer,
-		bh, b, offset, length, srclength, pages);
+	res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+		offset, length, output);
 	mutex_unlock(&stream->mutex);
 
 	if (res < 0)
diff --git a/fs/squashfs/lzo_wrapper.c b/fs/squashfs/lzo_wrapper.c
index 75c3b57..244b9fb 100644
--- a/fs/squashfs/lzo_wrapper.c
+++ b/fs/squashfs/lzo_wrapper.c
@@ -31,6 +31,7 @@
 #include "squashfs_fs_sb.h"
 #include "squashfs.h"
 #include "decompressor.h"
+#include "page_actor.h"
 
 struct squashfs_lzo {
 	void	*input;
@@ -75,13 +76,13 @@ static void lzo_free(void *strm)
 
 
 static int lzo_uncompress(struct squashfs_sb_info *msblk, void *strm,
-	void **buffer, struct buffer_head **bh, int b, int offset, int length,
-	int srclength, int pages)
+	struct buffer_head **bh, int b, int offset, int length,
+	struct squashfs_page_actor *output)
 {
 	struct squashfs_lzo *stream = strm;
-	void *buff = stream->input;
+	void *buff = stream->input, *data;
 	int avail, i, bytes = length, res;
-	size_t out_len = srclength;
+	size_t out_len = output->length;
 
 	for (i = 0; i < b; i++) {
 		avail = min(bytes, msblk->devblksize - offset);
@@ -98,12 +99,20 @@ static int lzo_uncompress(struct squashfs_sb_info *msblk, void *strm,
 		goto failed;
 
 	res = bytes = (int)out_len;
-	for (i = 0, buff = stream->output; bytes && i < pages; i++) {
-		avail = min_t(int, bytes, PAGE_CACHE_SIZE);
-		memcpy(buffer[i], buff, avail);
-		buff += avail;
-		bytes -= avail;
+	data = squashfs_first_page(output);
+	buff = stream->output;
+	while (data) {
+		if (bytes <= PAGE_CACHE_SIZE) {
+			memcpy(data, buff, bytes);
+			break;
+		} else {
+			memcpy(data, buff, PAGE_CACHE_SIZE);
+			buff += PAGE_CACHE_SIZE;
+			bytes -= PAGE_CACHE_SIZE;
+			data = squashfs_next_page(output);
+		}
 	}
+	squashfs_finish_page(output);
 
 	return res;
 
diff --git a/fs/squashfs/page_actor.h b/fs/squashfs/page_actor.h
new file mode 100644
index 0000000..19a66a3
--- /dev/null
+++ b/fs/squashfs/page_actor.h
@@ -0,0 +1,54 @@
+#ifndef PAGE_ACTOR_H
+#define PAGE_ACTOR_H
+/*
+ * Copyright (c) 2013
+ * Phillip Lougher <phillip@squashfs.org.uk>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+struct squashfs_page_actor {
+	void	**page;
+	int	pages;
+	int	length;
+	int	next_page;
+};
+
+static inline struct squashfs_page_actor *squashfs_page_actor_init(void **page,
+	int pages, int length)
+{
+	struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL);
+
+	if (actor == NULL)
+		return NULL;
+
+	if (length)
+		actor->length = length;
+	else
+		actor->length = pages * PAGE_CACHE_SIZE;
+	actor->page = page;
+	actor->pages = pages;
+	actor->next_page = 0;
+	return actor;
+}
+
+static inline void *squashfs_first_page(struct squashfs_page_actor *actor)
+{
+	actor->next_page = 1;
+	return actor->page[0];
+}
+
+static inline void *squashfs_next_page(struct squashfs_page_actor *actor)
+{
+	if (actor->next_page == actor->pages)
+		return NULL;
+
+	return actor->page[actor->next_page++];
+}
+
+static inline void squashfs_finish_page(struct squashfs_page_actor *actor)
+{
+	/* empty */
+}
+#endif
diff --git a/fs/squashfs/squashfs.h b/fs/squashfs/squashfs.h
index 2e2751d..6a97e63 100644
--- a/fs/squashfs/squashfs.h
+++ b/fs/squashfs/squashfs.h
@@ -28,8 +28,8 @@
 #define WARNING(s, args...)	pr_warning("SQUASHFS: "s, ## args)
 
 /* block.c */
-extern int squashfs_read_data(struct super_block *, void **, u64, int, u64 *,
-				int, int);
+extern int squashfs_read_data(struct super_block *, u64, int, u64 *,
+				struct squashfs_page_actor *);
 
 /* cache.c */
 extern struct squashfs_cache *squashfs_cache_init(char *, int, int);
@@ -53,8 +53,8 @@ extern void *squashfs_decompressor_setup(struct super_block *, unsigned short);
 /* decompressor_xxx.c */
 extern void *squashfs_decompressor_create(struct squashfs_sb_info *, void *);
 extern void squashfs_decompressor_destroy(struct squashfs_sb_info *);
-extern int squashfs_decompress(struct squashfs_sb_info *, void **,
-	struct buffer_head **, int, int, int, int, int);
+extern int squashfs_decompress(struct squashfs_sb_info *, struct buffer_head **,
+	int, int, int, struct squashfs_page_actor *);
 extern int squashfs_max_decompressors(void);
 
 /* export.c */
diff --git a/fs/squashfs/squashfs_fs_sb.h b/fs/squashfs/squashfs_fs_sb.h
index 9cdcf41..1da565c 100644
--- a/fs/squashfs/squashfs_fs_sb.h
+++ b/fs/squashfs/squashfs_fs_sb.h
@@ -50,6 +50,7 @@ struct squashfs_cache_entry {
 	wait_queue_head_t	wait_queue;
 	struct squashfs_cache	*cache;
 	void			**data;
+	struct squashfs_page_actor	*actor;
 };
 
 struct squashfs_sb_info {
diff --git a/fs/squashfs/xz_wrapper.c b/fs/squashfs/xz_wrapper.c
index 5d1d07c..c609624 100644
--- a/fs/squashfs/xz_wrapper.c
+++ b/fs/squashfs/xz_wrapper.c
@@ -32,6 +32,7 @@
 #include "squashfs_fs_sb.h"
 #include "squashfs.h"
 #include "decompressor.h"
+#include "page_actor.h"
 
 struct squashfs_xz {
 	struct xz_dec *state;
@@ -129,11 +130,11 @@ static void squashfs_xz_free(void *strm)
 
 
 static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void *strm,
-	void **buffer, struct buffer_head **bh, int b, int offset, int length,
-	int srclength, int pages)
+	struct buffer_head **bh, int b, int offset, int length,
+	struct squashfs_page_actor *output)
 {
 	enum xz_ret xz_err;
-	int avail, total = 0, k = 0, page = 0;
+	int avail, total = 0, k = 0;
 	struct squashfs_xz *stream = strm;
 
 	xz_dec_reset(stream->state);
@@ -141,7 +142,7 @@ static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void *strm,
 	stream->buf.in_size = 0;
 	stream->buf.out_pos = 0;
 	stream->buf.out_size = PAGE_CACHE_SIZE;
-	stream->buf.out = buffer[page++];
+	stream->buf.out = squashfs_first_page(output);
 
 	do {
 		if (stream->buf.in_pos == stream->buf.in_size && k < b) {
@@ -153,11 +154,12 @@ static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void *strm,
 			offset = 0;
 		}
 
-		if (stream->buf.out_pos == stream->buf.out_size
-							&& page < pages) {
-			stream->buf.out = buffer[page++];
-			stream->buf.out_pos = 0;
-			total += PAGE_CACHE_SIZE;
+		if (stream->buf.out_pos == stream->buf.out_size) {
+			stream->buf.out = squashfs_next_page(output);
+			if (stream->buf.out != NULL) {
+				stream->buf.out_pos = 0;
+				total += PAGE_CACHE_SIZE;
+			}
 		}
 
 		xz_err = xz_dec_run(stream->state, &stream->buf);
@@ -166,6 +168,8 @@ static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void *strm,
 			put_bh(bh[k++]);
 	} while (xz_err == XZ_OK);
 
+	squashfs_finish_page(output);
+
 	if (xz_err != XZ_STREAM_END || k < b)
 		goto out;
 
diff --git a/fs/squashfs/zlib_wrapper.c b/fs/squashfs/zlib_wrapper.c
index bb04902..8727cab 100644
--- a/fs/squashfs/zlib_wrapper.c
+++ b/fs/squashfs/zlib_wrapper.c
@@ -32,6 +32,7 @@
 #include "squashfs_fs_sb.h"
 #include "squashfs.h"
 #include "decompressor.h"
+#include "page_actor.h"
 
 static void *zlib_init(struct squashfs_sb_info *dummy, void *buff)
 {
@@ -62,14 +63,14 @@ static void zlib_free(void *strm)
 
 
 static int zlib_uncompress(struct squashfs_sb_info *msblk, void *strm,
-	void **buffer, struct buffer_head **bh, int b, int offset, int length,
-	int srclength, int pages)
+	struct buffer_head **bh, int b, int offset, int length,
+	struct squashfs_page_actor *output)
 {
-	int zlib_err, zlib_init = 0;
-	int k = 0, page = 0;
+	int zlib_err, zlib_init = 0, k = 0;
 	z_stream *stream = strm;
 
-	stream->avail_out = 0;
+	stream->avail_out = PAGE_CACHE_SIZE;
+	stream->next_out = squashfs_first_page(output);
 	stream->avail_in = 0;
 
 	do {
@@ -81,15 +82,18 @@ static int zlib_uncompress(struct squashfs_sb_info *msblk, void *strm,
 			offset = 0;
 		}
 
-		if (stream->avail_out == 0 && page < pages) {
-			stream->next_out = buffer[page++];
-			stream->avail_out = PAGE_CACHE_SIZE;
+		if (stream->avail_out == 0) {
+			stream->next_out = squashfs_next_page(output);
+			if (stream->next_out != NULL)
+				stream->avail_out = PAGE_CACHE_SIZE;
 		}
 
 		if (!zlib_init) {
 			zlib_err = zlib_inflateInit(stream);
-			if (zlib_err != Z_OK)
+			if (zlib_err != Z_OK) {
+				squashfs_finish_page(output);
 				goto out;
+			}
 			zlib_init = 1;
 		}
 
@@ -99,6 +103,8 @@ static int zlib_uncompress(struct squashfs_sb_info *msblk, void *strm,
 			put_bh(bh[k++]);
 	} while (zlib_err == Z_OK);
 
+	squashfs_finish_page(output);
+
 	if (zlib_err != Z_STREAM_END)
 		goto out;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/6] Squashfs: restructure squashfs_readpage()
  2013-11-07 20:24 [PATCH 0/6] Squashfs performance improvements Phillip Lougher
                   ` (3 preceding siblings ...)
  2013-11-07 20:24 ` [PATCH 4/6] Squashfs: Generalise paging handling in the decompressors (V2) Phillip Lougher
@ 2013-11-07 20:24 ` Phillip Lougher
  2013-11-08  5:55   ` Minchan Kim
  2013-11-08  6:02   ` Minchan Kim
  2013-11-07 20:24 ` [PATCH 6/6] Squashfs: Directly decompress into the page cache for file data (V2) Phillip Lougher
  5 siblings, 2 replies; 15+ messages in thread
From: Phillip Lougher @ 2013-11-07 20:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: minchan, Phillip Lougher

Restructure squashfs_readpage(), splitting it into separate
functions for datablocks, fragments and sparse blocks.

Move the memcpying (from squashfs cache entry) implementation of
squashfs_readpage_block() into file_cache.c.

This allows different implementations to be supported.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
---
 fs/squashfs/Makefile     |    2 +-
 fs/squashfs/file.c       |  142 +++++++++++++++++++++++-----------------------
 fs/squashfs/file_cache.c |   38 +++++++++++++
 fs/squashfs/squashfs.h   |    7 +++
 4 files changed, 118 insertions(+), 71 deletions(-)
 create mode 100644 fs/squashfs/file_cache.c

diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
index 5833b96..908c0d9 100644
--- a/fs/squashfs/Makefile
+++ b/fs/squashfs/Makefile
@@ -4,7 +4,7 @@
 
 obj-$(CONFIG_SQUASHFS) += squashfs.o
 squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
-squashfs-y += namei.o super.o symlink.o decompressor.o
+squashfs-y += namei.o super.o symlink.o decompressor.o file_cache.o
 squashfs-$(CONFIG_SQUASHFS_DECOMP_SINGLE) += decompressor_single.o
 squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI) += decompressor_multi.o
 squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU) += decompressor_multi_percpu.o
diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 8ca62c2..e5c9689 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -370,77 +370,15 @@ static int read_blocklist(struct inode *inode, int index, u64 *block)
 	return le32_to_cpu(size);
 }
 
-
-static int squashfs_readpage(struct file *file, struct page *page)
+/* Copy data into page cache  */
+void squashfs_copy_cache(struct page *page, struct squashfs_cache_entry *buffer,
+	int bytes, int offset)
 {
 	struct inode *inode = page->mapping->host;
 	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
-	int bytes, i, offset = 0, sparse = 0;
-	struct squashfs_cache_entry *buffer = NULL;
 	void *pageaddr;
-
-	int mask = (1 << (msblk->block_log - PAGE_CACHE_SHIFT)) - 1;
-	int index = page->index >> (msblk->block_log - PAGE_CACHE_SHIFT);
-	int start_index = page->index & ~mask;
-	int end_index = start_index | mask;
-	int file_end = i_size_read(inode) >> msblk->block_log;
-
-	TRACE("Entered squashfs_readpage, page index %lx, start block %llx\n",
-				page->index, squashfs_i(inode)->start);
-
-	if (page->index >= ((i_size_read(inode) + PAGE_CACHE_SIZE - 1) >>
-					PAGE_CACHE_SHIFT))
-		goto out;
-
-	if (index < file_end || squashfs_i(inode)->fragment_block ==
-					SQUASHFS_INVALID_BLK) {
-		/*
-		 * Reading a datablock from disk.  Need to read block list
-		 * to get location and block size.
-		 */
-		u64 block = 0;
-		int bsize = read_blocklist(inode, index, &block);
-		if (bsize < 0)
-			goto error_out;
-
-		if (bsize == 0) { /* hole */
-			bytes = index == file_end ?
-				(i_size_read(inode) & (msblk->block_size - 1)) :
-				 msblk->block_size;
-			sparse = 1;
-		} else {
-			/*
-			 * Read and decompress datablock.
-			 */
-			buffer = squashfs_get_datablock(inode->i_sb,
-								block, bsize);
-			if (buffer->error) {
-				ERROR("Unable to read page, block %llx, size %x"
-					"\n", block, bsize);
-				squashfs_cache_put(buffer);
-				goto error_out;
-			}
-			bytes = buffer->length;
-		}
-	} else {
-		/*
-		 * Datablock is stored inside a fragment (tail-end packed
-		 * block).
-		 */
-		buffer = squashfs_get_fragment(inode->i_sb,
-				squashfs_i(inode)->fragment_block,
-				squashfs_i(inode)->fragment_size);
-
-		if (buffer->error) {
-			ERROR("Unable to read page, block %llx, size %x\n",
-				squashfs_i(inode)->fragment_block,
-				squashfs_i(inode)->fragment_size);
-			squashfs_cache_put(buffer);
-			goto error_out;
-		}
-		bytes = i_size_read(inode) & (msblk->block_size - 1);
-		offset = squashfs_i(inode)->fragment_offset;
-	}
+	int i, mask = (1 << (msblk->block_log - PAGE_CACHE_SHIFT)) - 1;
+	int start_index = page->index & ~mask, end_index = start_index | mask;
 
 	/*
 	 * Loop copying datablock into pages.  As the datablock likely covers
@@ -451,7 +389,7 @@ static int squashfs_readpage(struct file *file, struct page *page)
 	for (i = start_index; i <= end_index && bytes > 0; i++,
 			bytes -= PAGE_CACHE_SIZE, offset += PAGE_CACHE_SIZE) {
 		struct page *push_page;
-		int avail = sparse ? 0 : min_t(int, bytes, PAGE_CACHE_SIZE);
+		int avail = buffer ? min_t(int, bytes, PAGE_CACHE_SIZE) : 0;
 
 		TRACE("bytes %d, i %d, available_bytes %d\n", bytes, i, avail);
 
@@ -475,11 +413,75 @@ skip_page:
 		if (i != page->index)
 			page_cache_release(push_page);
 	}
+}
+
+/* Read datablock stored packed inside a fragment (tail-end packed block) */
+static int squashfs_readpage_fragment(struct page *page)
+{
+	struct inode *inode = page->mapping->host;
+	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+	struct squashfs_cache_entry *buffer = squashfs_get_fragment(inode->i_sb,
+		squashfs_i(inode)->fragment_block,
+		squashfs_i(inode)->fragment_size);
+	int res = buffer->error;
+
+	if (res)
+		ERROR("Unable to read page, block %llx, size %x\n",
+			squashfs_i(inode)->fragment_block,
+			squashfs_i(inode)->fragment_size);
+	else
+		squashfs_copy_cache(page, buffer, i_size_read(inode) &
+			(msblk->block_size - 1),
+			squashfs_i(inode)->fragment_offset);
+
+	squashfs_cache_put(buffer);
+	return res;
+}
 
-	if (!sparse)
-		squashfs_cache_put(buffer);
+static int squashfs_readpage_sparse(struct page *page, int index, int file_end)
+{
+	struct inode *inode = page->mapping->host;
+	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+	int bytes = index == file_end ?
+			(i_size_read(inode) & (msblk->block_size - 1)) :
+			 msblk->block_size;
 
+	squashfs_copy_cache(page, NULL, bytes, 0);
 	return 0;
+}
+
+static int squashfs_readpage(struct file *file, struct page *page)
+{
+	struct inode *inode = page->mapping->host;
+	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+	int index = page->index >> (msblk->block_log - PAGE_CACHE_SHIFT);
+	int file_end = i_size_read(inode) >> msblk->block_log;
+	int res;
+	void *pageaddr;
+
+	TRACE("Entered squashfs_readpage, page index %lx, start block %llx\n",
+				page->index, squashfs_i(inode)->start);
+
+	if (page->index >= ((i_size_read(inode) + PAGE_CACHE_SIZE - 1) >>
+					PAGE_CACHE_SHIFT))
+		goto out;
+
+	if (index < file_end || squashfs_i(inode)->fragment_block ==
+					SQUASHFS_INVALID_BLK) {
+		u64 block = 0;
+		int bsize = read_blocklist(inode, index, &block);
+		if (bsize < 0)
+			goto error_out;
+
+		if (bsize == 0)
+			res = squashfs_readpage_sparse(page, index, file_end);
+		else
+			res = squashfs_readpage_block(page, block, bsize);
+	} else
+		res = squashfs_readpage_fragment(page);
+
+	if (!res)
+		return 0;
 
 error_out:
 	SetPageError(page);
diff --git a/fs/squashfs/file_cache.c b/fs/squashfs/file_cache.c
new file mode 100644
index 0000000..f2310d2
--- /dev/null
+++ b/fs/squashfs/file_cache.c
@@ -0,0 +1,38 @@
+/*
+ * Copyright (c) 2013
+ * Phillip Lougher <phillip@squashfs.org.uk>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/fs.h>
+#include <linux/vfs.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/pagemap.h>
+#include <linux/mutex.h>
+
+#include "squashfs_fs.h"
+#include "squashfs_fs_sb.h"
+#include "squashfs_fs_i.h"
+#include "squashfs.h"
+
+/* Read separately compressed datablock and memcopy into page cache */
+int squashfs_readpage_block(struct page *page, u64 block, int bsize)
+{
+	struct inode *i = page->mapping->host;
+	struct squashfs_cache_entry *buffer = squashfs_get_datablock(i->i_sb,
+		block, bsize);
+	int res = buffer->error;
+
+	if (res)
+		ERROR("Unable to read page, block %llx, size %x\n", block,
+			bsize);
+	else
+		squashfs_copy_cache(page, buffer, buffer->length, 0);
+
+	squashfs_cache_put(buffer);
+	return res;
+}
diff --git a/fs/squashfs/squashfs.h b/fs/squashfs/squashfs.h
index 6a97e63..9e1bb79 100644
--- a/fs/squashfs/squashfs.h
+++ b/fs/squashfs/squashfs.h
@@ -66,6 +66,13 @@ extern int squashfs_frag_lookup(struct super_block *, unsigned int, u64 *);
 extern __le64 *squashfs_read_fragment_index_table(struct super_block *,
 				u64, u64, unsigned int);
 
+/* file.c */
+void squashfs_copy_cache(struct page *, struct squashfs_cache_entry *, int,
+				int);
+
+/* file_xxx.c */
+extern int squashfs_readpage_block(struct page *, u64, int);
+
 /* id.c */
 extern int squashfs_get_id(struct super_block *, unsigned int, unsigned int *);
 extern __le64 *squashfs_read_id_index_table(struct super_block *, u64, u64,
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/6] Squashfs: Directly decompress into the page cache for file data (V2)
  2013-11-07 20:24 [PATCH 0/6] Squashfs performance improvements Phillip Lougher
                   ` (4 preceding siblings ...)
  2013-11-07 20:24 ` [PATCH 5/6] Squashfs: restructure squashfs_readpage() Phillip Lougher
@ 2013-11-07 20:24 ` Phillip Lougher
  2013-11-08  8:23   ` Minchan Kim
  5 siblings, 1 reply; 15+ messages in thread
From: Phillip Lougher @ 2013-11-07 20:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: minchan, Phillip Lougher

This introduces an implementation of squashfs_readpage_block()
that directly decompresses into the page cache.

This uses the previously added page handler abstraction to push
down the necessary kmap_atomic/kunmap_atomic operations on the
page cache buffers into the decompressors.  This enables
direct copying into the page cache without using the slow
kmap/kunmap calls.

The code detects when multiple threads are racing in
squashfs_readpage() to decompress the same block, and avoids
this race by falling back to using an intermediate
buffer.

This patch enhances the performance of Squashfs significantly
when multiple processes are accessing the filesystem simultaneously,
because it not only reduces memcopying but, more importantly,
eliminates the lock contention on the intermediate buffer.

Using single-threaded decompression:

        dd if=file1 of=/dev/null bs=4096 &
        dd if=file2 of=/dev/null bs=4096 &
        dd if=file3 of=/dev/null bs=4096 &
        dd if=file4 of=/dev/null bs=4096

Before:

629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s

After:

629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s

V2:
  * Update the comment, noting that failure to grab pages could be
    because this page has been VM reclaimed, while the other pages
    are still in the page cache and uptodate.
  * Make the Kconfig option a choice, making the either-or nature of
    the option more explicit, and also tidy up the ifdef in the
    Makefile.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
---
 fs/squashfs/Kconfig       |   28 +++++++
 fs/squashfs/Makefile      |    4 +-
 fs/squashfs/file_direct.c |  178 +++++++++++++++++++++++++++++++++++++++++++++
 fs/squashfs/page_actor.c  |  104 ++++++++++++++++++++++++++
 fs/squashfs/page_actor.h  |   32 ++++++++
 5 files changed, 345 insertions(+), 1 deletion(-)
 create mode 100644 fs/squashfs/file_direct.c
 create mode 100644 fs/squashfs/page_actor.c

diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
index c92c75f..3a21adf 100644
--- a/fs/squashfs/Kconfig
+++ b/fs/squashfs/Kconfig
@@ -26,6 +26,34 @@ config SQUASHFS
 	  If unsure, say N.
 
 choice
+	prompt "File decompression options"
+	depends on SQUASHFS
+	help
+	  Squashfs now supports two options for decompressing file
+	  data.  Traditionally Squashfs has decompressed into an
+	  intermediate buffer and then memcopied it into the page cache.
+	  Squashfs now supports the ability to decompress directly into
+	  the page cache.
+
+	  If unsure, select "Decompress file data into an intermediate buffer"
+
+config SQUASHFS_FILE_CACHE
+	bool "Decompress file data into an intermediate buffer"
+	help
+	  Decompress file data into an intermediate buffer and then
+	  memcopy it into the page cache.
+
+config SQUASHFS_FILE_DIRECT
+	bool "Decompress files directly into the page cache"
+	help
+	  Directly decompress file data into the page cache.
+	  Doing so can significantly improve performance because
+	  it eliminates a memcpy and it also removes the lock contention
+	  on the single buffer.
+
+endchoice
+
+choice
 	prompt "Decompressor parallelisation options"
 	depends on SQUASHFS
 	help
diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
index 908c0d9..4132520 100644
--- a/fs/squashfs/Makefile
+++ b/fs/squashfs/Makefile
@@ -4,7 +4,9 @@
 
 obj-$(CONFIG_SQUASHFS) += squashfs.o
 squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
-squashfs-y += namei.o super.o symlink.o decompressor.o file_cache.o
+squashfs-y += namei.o super.o symlink.o decompressor.o
+squashfs-$(CONFIG_SQUASHFS_FILE_CACHE) += file_cache.o
+squashfs-$(CONFIG_SQUASHFS_FILE_DIRECT) += file_direct.o page_actor.o
 squashfs-$(CONFIG_SQUASHFS_DECOMP_SINGLE) += decompressor_single.o
 squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI) += decompressor_multi.o
 squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU) += decompressor_multi_percpu.o
diff --git a/fs/squashfs/file_direct.c b/fs/squashfs/file_direct.c
new file mode 100644
index 0000000..d020d94
--- /dev/null
+++ b/fs/squashfs/file_direct.c
@@ -0,0 +1,178 @@
+/*
+ * Copyright (c) 2013
+ * Phillip Lougher <phillip@squashfs.org.uk>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/fs.h>
+#include <linux/vfs.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/pagemap.h>
+#include <linux/mutex.h>
+
+#include "squashfs_fs.h"
+#include "squashfs_fs_sb.h"
+#include "squashfs_fs_i.h"
+#include "squashfs.h"
+#include "page_actor.h"
+
+static int squashfs_read_cache(struct page *target_page, u64 block, int bsize,
+	int pages, struct page **page);
+
+/* Read separately compressed datablock directly into page cache */
+int squashfs_readpage_block(struct page *target_page, u64 block, int bsize)
+
+{
+	struct inode *inode = target_page->mapping->host;
+	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+
+	int file_end = (i_size_read(inode) - 1) >> PAGE_CACHE_SHIFT;
+	int mask = (1 << (msblk->block_log - PAGE_CACHE_SHIFT)) - 1;
+	int start_index = target_page->index & ~mask;
+	int end_index = start_index | mask;
+	int i, n, pages, missing_pages, bytes, res = -ENOMEM;
+	struct page **page;
+	struct squashfs_page_actor *actor;
+	void *pageaddr;
+
+	if (end_index > file_end)
+		end_index = file_end;
+
+	pages = end_index - start_index + 1;
+
+	page = kmalloc(sizeof(void *) * pages, GFP_KERNEL);
+	if (page == NULL)
+		goto error_out;
+
+	/*
+	 * Create a "page actor" which will kmap and kunmap the
+	 * page cache pages appropriately within the decompressor
+	 */
+	actor = squashfs_page_actor_init_special(page, pages, 0);
+	if (actor == NULL)
+		goto error_out2;
+
+	/* Try to grab all the pages covered by the Squashfs block */
+	for (missing_pages = 0, i = 0, n = start_index; i < pages; i++, n++) {
+		page[i] = (n == target_page->index) ? target_page :
+			grab_cache_page_nowait(target_page->mapping, n);
+
+		if (page[i] == NULL) {
+			missing_pages++;
+			continue;
+		}
+
+		if (PageUptodate(page[i])) {
+			unlock_page(page[i]);
+			page_cache_release(page[i]);
+			page[i] = NULL;
+			missing_pages++;
+		}
+	}
+
+	if (missing_pages) {
+		/*
+		 * Couldn't get one or more pages, this page has either
+		 * been VM reclaimed, but others are still in the page cache
+		 * and uptodate, or we're racing with another thread in
+		 * squashfs_readpage also trying to grab them.  Fall back to
+		 * using an intermediate buffer.
+		 */
+		kfree(actor);
+		return squashfs_read_cache(target_page, block, bsize, pages,
+								page);
+	}
+
+	/* Decompress directly into the page cache buffers */
+	res = squashfs_read_data(inode->i_sb, block, bsize, NULL, actor);
+	if (res < 0)
+		goto mark_errored;
+
+	/* Last page may have trailing bytes not filled */
+	bytes = res % PAGE_CACHE_SIZE;
+	if (bytes) {
+		pageaddr = kmap_atomic(page[pages - 1]);
+		memset(pageaddr + bytes, 0, PAGE_CACHE_SIZE - bytes);
+		kunmap_atomic(pageaddr);
+	}
+
+	/* Mark pages as uptodate, unlock and release */
+	for (i = 0; i < pages; i++) {
+		flush_dcache_page(page[i]);
+		SetPageUptodate(page[i]);
+		unlock_page(page[i]);
+		if (page[i] != target_page)
+			page_cache_release(page[i]);
+	}
+
+	kfree(actor);
+	kfree(page);
+
+	return 0;
+
+mark_errored:
+	/* Decompression failed, mark pages as errored.  Target_page is
+	 * dealt with by the caller
+	 */
+	for (i = 0; i < pages; i++) {
+		if (page[i] == target_page)
+			continue;
+		pageaddr = kmap_atomic(page[i]);
+		memset(pageaddr, 0, PAGE_CACHE_SIZE);
+		kunmap_atomic(pageaddr);
+		flush_dcache_page(page[i]);
+		SetPageError(page[i]);
+		unlock_page(page[i]);
+		page_cache_release(page[i]);
+	}
+
+	kfree(actor);
+error_out2:
+	kfree(page);
+error_out:
+	return res;
+}
+
+
+static int squashfs_read_cache(struct page *target_page, u64 block, int bsize,
+	int pages, struct page **page)
+{
+	struct inode *i = target_page->mapping->host;
+	struct squashfs_cache_entry *buffer = squashfs_get_datablock(i->i_sb,
+						 block, bsize);
+	int bytes = buffer->length, res = buffer->error, n, offset = 0;
+	void *pageaddr;
+
+	if (res) {
+		ERROR("Unable to read page, block %llx, size %x\n", block,
+			bsize);
+		goto out;
+	}
+
+	for (n = 0; n < pages && bytes > 0; n++,
+			bytes -= PAGE_CACHE_SIZE, offset += PAGE_CACHE_SIZE) {
+		int avail = min_t(int, bytes, PAGE_CACHE_SIZE);
+
+		if (page[n] == NULL)
+			continue;
+
+		pageaddr = kmap_atomic(page[n]);
+		squashfs_copy_data(pageaddr, buffer, offset, avail);
+		memset(pageaddr + avail, 0, PAGE_CACHE_SIZE - avail);
+		kunmap_atomic(pageaddr);
+		flush_dcache_page(page[n]);
+		SetPageUptodate(page[n]);
+		unlock_page(page[n]);
+		if (page[n] != target_page)
+			page_cache_release(page[n]);
+	}
+
+out:
+	squashfs_cache_put(buffer);
+	kfree(page);
+	return res;
+}
diff --git a/fs/squashfs/page_actor.c b/fs/squashfs/page_actor.c
new file mode 100644
index 0000000..8e754ff
--- /dev/null
+++ b/fs/squashfs/page_actor.c
@@ -0,0 +1,104 @@
+/*
+ * Copyright (c) 2013
+ * Phillip Lougher <phillip@squashfs.org.uk>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/pagemap.h>
+#include "page_actor.h"
+
+/* Implementation of page_actor for decompressing into intermediate buffer */
+static void *cache_first_page(struct squashfs_page_actor *actor)
+{
+	actor->next_page = 1;
+	return actor->buffer[0];
+}
+
+static void *cache_next_page(struct squashfs_page_actor *actor)
+{
+	if (actor->next_page == actor->pages)
+		return NULL;
+
+	return actor->buffer[actor->next_page++];
+}
+
+static void cache_finish_page(struct squashfs_page_actor *actor)
+{
+	/* empty */
+}
+
+struct squashfs_page_actor *squashfs_page_actor_init(void **buffer,
+	int pages, int length)
+{
+	struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL);
+
+	if (actor == NULL)
+		return NULL;
+
+	if (length)
+		actor->length = length;
+	else
+		actor->length = pages * PAGE_CACHE_SIZE;
+	actor->buffer = buffer;
+	actor->pages = pages;
+	actor->next_page = 0;
+
+	actor->squashfs_first_page = cache_first_page;
+	actor->squashfs_next_page = cache_next_page;
+	actor->squashfs_finish_page = cache_finish_page;
+	return actor;
+}
+
+/* Implementation of page_actor for decompressing directly into page cache */
+static void *direct_first_page(struct squashfs_page_actor *actor)
+{
+	actor->next_page = 1;
+	return actor->pageaddr = kmap_atomic(actor->page[0]);
+}
+
+static void *direct_next_page(struct squashfs_page_actor *actor)
+{
+	if (actor->pageaddr)
+		kunmap_atomic(actor->pageaddr);
+
+	if (actor->next_page == actor->pages) {
+		actor->pageaddr = NULL;
+		return NULL;
+	}
+
+	return actor->pageaddr = kmap_atomic(actor->page[actor->next_page++]);
+}
+
+static void direct_finish_page(struct squashfs_page_actor *actor)
+{
+	if (actor->pageaddr)
+		kunmap_atomic(actor->pageaddr);
+}
+
+
+struct squashfs_page_actor *squashfs_page_actor_init_special(struct page **page,
+	int pages, int length)
+{
+	struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL);
+
+	if (actor == NULL)
+		return NULL;
+
+	if (length)
+		actor->length = length;
+	else
+		actor->length = pages * PAGE_CACHE_SIZE;
+	actor->page = page;
+	actor->pages = pages;
+	actor->next_page = 0;
+	actor->pageaddr = NULL;
+
+	actor->squashfs_first_page = direct_first_page;
+	actor->squashfs_next_page = direct_next_page;
+	actor->squashfs_finish_page = direct_finish_page;
+	return actor;
+}
diff --git a/fs/squashfs/page_actor.h b/fs/squashfs/page_actor.h
index 19a66a3..22731c7 100644
--- a/fs/squashfs/page_actor.h
+++ b/fs/squashfs/page_actor.h
@@ -8,6 +8,7 @@
  * the COPYING file in the top-level directory.
  */
 
+#ifndef CONFIG_SQUASHFS_FILE_DIRECT
 struct squashfs_page_actor {
 	void	**page;
 	int	pages;
@@ -51,4 +52,35 @@ static inline void squashfs_finish_page(struct squashfs_page_actor *actor)
 {
 	/* empty */
 }
+#else
+struct squashfs_page_actor {
+	union {
+		void		**buffer;
+		struct page	**page;
+	};
+	void	*pageaddr;
+	void    *(*squashfs_first_page)(struct squashfs_page_actor *);
+	void    *(*squashfs_next_page)(struct squashfs_page_actor *);
+	void    (*squashfs_finish_page)(struct squashfs_page_actor *);
+	int	pages;
+	int	length;
+	int	next_page;
+};
+
+extern struct squashfs_page_actor *squashfs_page_actor_init(void **, int, int);
+extern struct squashfs_page_actor *squashfs_page_actor_init_special(struct page
+							 **, int, int);
+static inline void *squashfs_first_page(struct squashfs_page_actor *actor)
+{
+	return actor->squashfs_first_page(actor);
+}
+static inline void *squashfs_next_page(struct squashfs_page_actor *actor)
+{
+	return actor->squashfs_next_page(actor);
+}
+static inline void squashfs_finish_page(struct squashfs_page_actor *actor)
+{
+	actor->squashfs_finish_page(actor);
+}
+#endif
 #endif
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
  2013-11-07 20:24 ` [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables Phillip Lougher
@ 2013-11-08  2:42   ` Minchan Kim
  2013-11-14 17:04     ` Phillip Lougher
  0 siblings, 1 reply; 15+ messages in thread
From: Minchan Kim @ 2013-11-08  2:42 UTC (permalink / raw)
  To: Phillip Lougher; +Cc: linux-kernel

Hello Phillip,

On Thu, Nov 07, 2013 at 08:24:22PM +0000, Phillip Lougher wrote:
> Add a multi-threaded decompression implementation which uses
> percpu variables.
> 
> Using percpu variables has advantages and disadvantages over
> implementations which do not use percpu variables.
> 
> Advantages: the nature of percpu variables ensures decompression is
> load-balanced across the multiple cores.
> 
> Disadvantages: it limits decompression to one thread per core.

At a glance, I understand your concern, but I don't see the benefit
of making this feature a separate new config option, because we can
modify the number of decompressors per core in the future.
I don't want to create new configs SQUASHFS_DECOMP_MULTI_3,
SQUASHFS_DECOMP_MULTI_4 and so on. :)

How about this?

1. Let's make CONFIG_DECOMPRESSOR_MAX, which could be tuned by the
   admin in Kconfig.  The default is CPU * 2 or CPU.  Otherwise, we
   can put it in sysfs so the user can tune it at runtime.

2. Put the decompressor shrink logic behind a slab shrinker, so that
   if the system comes under memory pressure we can catch the event
   and free some of the decompressors; if memory pressure later
   eases, we can create new decompressors again, up to the threshold
   the user defined.  We can detect indirectly whether system memory
   is sufficient by using GFP_NOWAIT instead of GFP_KERNEL in
   get_decomp_stream's allocation.

In short, let's make decompressor_multi a dynamically tuned system,
with the user able to limit the maximum.


> 
> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
> ---
>  fs/squashfs/Kconfig                     |   57 +++++++++++++----
>  fs/squashfs/Makefile                    |   10 +--
>  fs/squashfs/decompressor_multi_percpu.c |  105 +++++++++++++++++++++++++++++++
>  3 files changed, 152 insertions(+), 20 deletions(-)
>  create mode 100644 fs/squashfs/decompressor_multi_percpu.c
> 
> diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
> index 1c6d340..c92c75f 100644
> --- a/fs/squashfs/Kconfig
> +++ b/fs/squashfs/Kconfig
> @@ -25,6 +25,50 @@ config SQUASHFS
>  
>  	  If unsure, say N.
>  
> +choice
> +	prompt "Decompressor parallelisation options"
> +	depends on SQUASHFS
> +	help
> +	  Squashfs now supports three parallelisation options for
> +	  decompression.  Each one exhibits various trade-offs between
> +	  decompression performance and CPU and memory usage.
> +
> +	  If in doubt, select "Single threaded compression"
> +
> +config SQUASHFS_DECOMP_SINGLE
> +	bool "Single threaded compression"
> +	help
> +	  Traditionally Squashfs has used single-threaded decompression.
> +	  Only one block (data or metadata) can be decompressed at any
> +	  one time.  This limits CPU and memory usage to a minimum.
> +
> +config SQUASHFS_DECOMP_MULTI
> +	bool "Use multiple decompressors for parallel I/O"
> +	help
> +	  By default Squashfs uses a single decompressor but it gives
> +	  poor performance on parallel I/O workloads when using multiple CPU
> +	  machines due to waiting on decompressor availability.
> +
> +	  If you have a parallel I/O workload and your system has enough memory,
> +	  using this option may improve overall I/O performance.
> +
> +	  This decompressor implementation uses up to two parallel
> +	  decompressors per core.  It dynamically allocates decompressors
> +	  on a demand basis.

I'm not sure it's a good idea to document how many parallel
decompressors run per core. IMO, it's an implementation detail and
the user doesn't need to know the internals.
And we could modify it in the future.

> +
> +config SQUASHFS_DECOMP_MULTI_PERCPU
> +       bool "Use percpu multiple decompressors for parallel I/O"
> +	help

Indentation.

> +	  By default Squashfs uses a single decompressor but it gives
> +	  poor performance on parallel I/O workloads when using multiple CPU
> +	  machines due to waiting on decompressor availability.
> +
> +	  This decompressor implementation uses a maximum of one
> +	  decompressor per core.  It uses percpu variables to ensure
> +	  decompression is load-balanced across the cores.
> +
> +endchoice
> +
>  config SQUASHFS_XATTR
>  	bool "Squashfs XATTR support"
>  	depends on SQUASHFS
> @@ -63,19 +107,6 @@ config SQUASHFS_LZO
>  
>  	  If unsure, say N.
>  
> -config SQUASHFS_MULTI_DECOMPRESSOR
> -	bool "Use multiple decompressors for handling parallel I/O"
> -	depends on SQUASHFS
> -	help
> -	  By default Squashfs uses a single decompressor but it gives
> -	  poor performance on parallel I/O workloads when using multiple CPU
> -	  machines due to waiting on decompressor availability.
> -
> -	  If you have a parallel I/O workload and your system has enough memory,
> -	  using this option may improve overall I/O performance.
> -
> -	  If unsure, say N.
> -
>  config SQUASHFS_XZ
>  	bool "Include support for XZ compressed file systems"
>  	depends on SQUASHFS
> diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
> index dfebc3b..5833b96 100644
> --- a/fs/squashfs/Makefile
> +++ b/fs/squashfs/Makefile
> @@ -5,14 +5,10 @@
>  obj-$(CONFIG_SQUASHFS) += squashfs.o
>  squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
>  squashfs-y += namei.o super.o symlink.o decompressor.o
> -
> +squashfs-$(CONFIG_SQUASHFS_DECOMP_SINGLE) += decompressor_single.o
> +squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI) += decompressor_multi.o
> +squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU) += decompressor_multi_percpu.o
>  squashfs-$(CONFIG_SQUASHFS_XATTR) += xattr.o xattr_id.o
>  squashfs-$(CONFIG_SQUASHFS_LZO) += lzo_wrapper.o
>  squashfs-$(CONFIG_SQUASHFS_XZ) += xz_wrapper.o
>  squashfs-$(CONFIG_SQUASHFS_ZLIB) += zlib_wrapper.o
> -
> -ifdef CONFIG_SQUASHFS_MULTI_DECOMPRESSOR
> -	squashfs-y		+= decompressor_multi.o
> -else
> -	squashfs-y		+= decompressor_single.o
> -endif
> diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c
> new file mode 100644
> index 0000000..b5598ab
> --- /dev/null
> +++ b/fs/squashfs/decompressor_multi_percpu.c
> @@ -0,0 +1,105 @@
> +/*
> + * Copyright (c) 2013
> + * Phillip Lougher <phillip@squashfs.org.uk>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/mutex.h>
> +#include <linux/slab.h>
> +#include <linux/buffer_head.h>
> +
> +#include "squashfs_fs.h"
> +#include "squashfs_fs_sb.h"
> +#include "decompressor.h"
> +#include "squashfs.h"
> +
> +/*
> + * This file implements multi-threaded decompression using percpu
> + * variables, one thread per cpu core.
> + */
> +
> +struct squashfs_stream {
> +	void		*stream;
> +};
> +
> +void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
> +						void *comp_opts)
> +{
> +	struct squashfs_stream *stream;
> +	struct squashfs_stream __percpu *percpu;
> +	int err, cpu, cpu0;
> +
> +	percpu = alloc_percpu(struct squashfs_stream);
> +	if (percpu == NULL) {
> +		err = -ENOMEM;
> +		goto out2;
> +	}
> +
> +	for_each_possible_cpu(cpu) {
> +		stream = per_cpu_ptr(percpu, cpu);
> +		stream->stream = msblk->decompressor->init(msblk, comp_opts);
> +		if (IS_ERR(stream->stream)) {
> +			err = PTR_ERR(stream->stream);
> +			goto out;
> +		}
> +	}
> +
> +	kfree(comp_opts);

squashfs_decompressor_setup() frees comp_opts.

> +	return (__force void *) percpu;
> +
> +out:
> +	for_each_possible_cpu(cpu0) {
> +		if (cpu0 == cpu)

Just a nit, so you can ignore it easily. :)

This cpu variable comparison is tricky to me.
I'm not sure what happens if a CPU hotplug occurs between the two
loops above.  I guess your code is correct, but I couldn't find such
a usage pattern in other code.

alloc_percpu's semantic is "allocate zero-filled percpu area",
so how about relying on that instead of comparing cpu indexes?

for_each_possible_cpu(cpu) {
        stream = per_cpu_ptr(percpu, cpu);
        if (stream->stream)
                msblk->decompressor->free(stream->stream);
}

It's never a hot path, and it isn't fragile against CPU hotplug
changes, IMHO.


> +			break;
> +		stream = per_cpu_ptr(percpu, cpu0);
> +		msblk->decompressor->free(stream->stream);
> +	}
> +	free_percpu(percpu);
> +out2:
> +	kfree(comp_opts);

We can throw this one away, too.


> +	return ERR_PTR(err);
> +}
> +
> +void squashfs_decompressor_destroy(struct squashfs_sb_info *msblk)
> +{
> +	struct squashfs_stream __percpu *percpu =
> +			(struct squashfs_stream __percpu *) msblk->stream;
> +	struct squashfs_stream *stream;
> +	int cpu;
> +
> +	if (msblk->stream) {
> +		for_each_possible_cpu(cpu) {
> +			stream = per_cpu_ptr(percpu, cpu);
> +			msblk->decompressor->free(stream->stream);
> +		}
> +		free_percpu(percpu);
> +	}
> +}
> +
> +int squashfs_decompress(struct squashfs_sb_info *msblk,
> +	void **buffer, struct buffer_head **bh, int b, int offset, int length,
> +	int srclength, int pages)
> +{
> +	int res;
> +	struct squashfs_stream __percpu *percpu =
> +			(struct squashfs_stream __percpu *) msblk->stream;
> +	struct squashfs_stream *stream = get_cpu_ptr(percpu);
> +
> +	res = msblk->decompressor->decompress(msblk, stream->stream, buffer,
> +		bh, b, offset, length, srclength, pages);
> +	put_cpu_ptr(stream);
> +
> +	if (res < 0)
> +		ERROR("%s decompression failed, data probably corrupt\n",
> +			msblk->decompressor->name);
> +
> +	return res;
> +}
> +
> +int squashfs_max_decompressors(void)
> +{
> +	return num_possible_cpus();
> +}
> -- 
> 1.7.10.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/6] Squashfs: Generalise paging handling in the decompressors (V2)
  2013-11-07 20:24 ` [PATCH 4/6] Squashfs: Generalise paging handling in the decompressors (V2) Phillip Lougher
@ 2013-11-08  5:29   ` Minchan Kim
  0 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2013-11-08  5:29 UTC (permalink / raw)
  To: Phillip Lougher; +Cc: linux-kernel

On Thu, Nov 07, 2013 at 08:24:23PM +0000, Phillip Lougher wrote:
> Further generalise the decompressors by adding a page handler
> abstraction.  This adds helpers to allow the decompressors
> to access and process the output buffers in an implementation
> independent manner.
> 
> This allows different types of output buffer to be passed
> to the decompressors, with the implementation specific
> aspects handled at decompression time, but without the
> knowledge being held in the decompressor wrapper code.
> 
> This will allow the decompressors to handle Squashfs
> cache buffers, and page cache pages.
> 
> This patch adds the abstraction and an implementation for
> the caches.
> 
> V2: also update the code in decompressor_multi*.c

Thanks!

> 
> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Minchan Kim <minchan@kernel.org>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/6] Squashfs: restructure squashfs_readpage()
  2013-11-07 20:24 ` [PATCH 5/6] Squashfs: restructure squashfs_readpage() Phillip Lougher
@ 2013-11-08  5:55   ` Minchan Kim
  2013-11-08  6:02   ` Minchan Kim
  1 sibling, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2013-11-08  5:55 UTC (permalink / raw)
  To: Phillip Lougher; +Cc: linux-kernel

On Thu, Nov 07, 2013 at 08:24:24PM +0000, Phillip Lougher wrote:
> Restructure squashfs_readpage() splitting it into separate
> functions for datablocks, fragments and sparse blocks.
> 
> Move the memcpying (from squashfs cache entry) implementation of
> squashfs_readpage_block into file_cache.c
> 
> This allows different implementations to be supported.
> 
> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Minchan Kim <minchan@kernel.org>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/6] Squashfs: restructure squashfs_readpage()
  2013-11-07 20:24 ` [PATCH 5/6] Squashfs: restructure squashfs_readpage() Phillip Lougher
  2013-11-08  5:55   ` Minchan Kim
@ 2013-11-08  6:02   ` Minchan Kim
  1 sibling, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2013-11-08  6:02 UTC (permalink / raw)
  To: Phillip Lougher; +Cc: linux-kernel

On Thu, Nov 07, 2013 at 08:24:24PM +0000, Phillip Lougher wrote:
> Restructure squashfs_readpage() splitting it into separate
> functions for datablocks, fragments and sparse blocks.
> 
> Move the memcpying (from squashfs cache entry) implementation of
> squashfs_readpage_block into file_cache.c
> 
> This allows different implementations to be supported.
> 
> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
> ---
>  fs/squashfs/Makefile     |    2 +-
>  fs/squashfs/file.c       |  142 +++++++++++++++++++++++-----------------------
>  fs/squashfs/file_cache.c |   38 +++++++++++++
>  fs/squashfs/squashfs.h   |    7 +++
>  4 files changed, 118 insertions(+), 71 deletions(-)
>  create mode 100644 fs/squashfs/file_cache.c
> 
> diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
> index 5833b96..908c0d9 100644
> --- a/fs/squashfs/Makefile
> +++ b/fs/squashfs/Makefile
> @@ -4,7 +4,7 @@
>  
>  obj-$(CONFIG_SQUASHFS) += squashfs.o
>  squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
> -squashfs-y += namei.o super.o symlink.o decompressor.o
> +squashfs-y += namei.o super.o symlink.o decompressor.o file_cache.c

                                                          file_cache.o

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 6/6] Squashfs: Directly decompress into the page cache for file data (V2)
  2013-11-07 20:24 ` [PATCH 6/6] Squashfs: Directly decompress into the page cache for file data (V2) Phillip Lougher
@ 2013-11-08  8:23   ` Minchan Kim
  0 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2013-11-08  8:23 UTC (permalink / raw)
  To: Phillip Lougher; +Cc: linux-kernel

On Thu, Nov 07, 2013 at 08:24:25PM +0000, Phillip Lougher wrote:
> This introduces an implementation of squashfs_readpage_block()
> that directly decompresses into the page cache.
> 
> This uses the previously added page handler abstraction to push
> down the necessary kmap_atomic/kunmap_atomic operations on the
> page cache buffers into the decompressors.  This enables
> direct copying into the page cache without using the slow
> kmap/kunmap calls.
> 
> The code detects when multiple threads are racing in
> squashfs_readpage() to decompress the same block, and avoids
> this regression by falling back to using an intermediate
> buffer.
> 
> This patch enhances the performance of Squashfs significantly
> when multiple processes are accessing the filesystem simultaneously
> because it not only reduces memcopying, but it more importantly
> eliminates the lock contention on the intermediate buffer.
> 
> Using single-thread decompression.
> 
>         dd if=file1 of=/dev/null bs=4096 &
>         dd if=file2 of=/dev/null bs=4096 &
>         dd if=file3 of=/dev/null bs=4096 &
>         dd if=file4 of=/dev/null bs=4096
> 
> Before:
> 
> 629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s
> 
> After:
> 
> 629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s
> 
> V2:
>   * update comment adding failure to grab pages could be
>     because we've been VM reclaimed, but the other pages are
>     still in the page cache and uptodate.
>   * Make Kconfig option a choice, making the either-other nature of
>     the option more explicit, and also tidying up the ifdef in the
>     Makefile
> 
> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
> ---
>  fs/squashfs/Kconfig       |   28 +++++++
>  fs/squashfs/Makefile      |    4 +-
>  fs/squashfs/file_direct.c |  178 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/squashfs/page_actor.c  |  104 ++++++++++++++++++++++++++
>  fs/squashfs/page_actor.h  |   32 ++++++++
>  5 files changed, 345 insertions(+), 1 deletion(-)
>  create mode 100644 fs/squashfs/file_direct.c
>  create mode 100644 fs/squashfs/page_actor.c
> 
> diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
> index c92c75f..3a21adf 100644
> --- a/fs/squashfs/Kconfig
> +++ b/fs/squashfs/Kconfig
> @@ -26,6 +26,34 @@ config SQUASHFS
>  	  If unsure, say N.
>  
>  choice
> +	prompt "File decompression options"
> +	depends on SQUASHFS
> +	help
> +	  Squashfs now supports two options for decompressing file
> +	  data.  Traditionally Squashfs has decompressed into an
> +	  intermediate buffer and then memcopied it into the page cache.
> +	  Squashfs now supports the ability to decompress directly into
> +	  the page cache.
> +
> +	  If unsure, select "Decompress file data into an intermediate buffer"
> +
> +config SQUASHFS_FILE_CACHE
> +	bool "Decompress file data into an intermediate buffer"
> +	help
> +	  Decompress file data into an intermediate buffer and then
> +	  memcopy it into the page cache.
> +
> +config SQUASHFS_FILE_DIRECT
> +	bool "Decompress files directly into the page cache"
> +	help
> +	  Directly decompress file data into the page cache.
> +	  Doing so can significantly improve performance because
> +	  it eliminates a mempcpy and it also removes the lock contention

                          memcpy

> +	  on the single buffer.
> +
> +endchoice
> +
> +choice
>  	prompt "Decompressor parallelisation options"
>  	depends on SQUASHFS
>  	help
> diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
> index 908c0d9..4132520 100644
> --- a/fs/squashfs/Makefile
> +++ b/fs/squashfs/Makefile
> @@ -4,7 +4,9 @@
>  
>  obj-$(CONFIG_SQUASHFS) += squashfs.o
>  squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
> -squashfs-y += namei.o super.o symlink.o decompressor.o file_cache.c
> +squashfs-y += namei.o super.o symlink.o decompressor.o
> +squashfs-$(CONFIG_SQUASHFS_FILE_CACHE) += file_cache.o
> +squashfs-$(CONFIG_SQUASHFS_FILE_DIRECT) += file_direct.o page_actor.o
>  squashfs-$(CONFIG_SQUASHFS_DECOMP_SINGLE) += decompressor_single.o
>  squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI) += decompressor_multi.o
>  squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU) += decompressor_multi_percpu.o
> diff --git a/fs/squashfs/file_direct.c b/fs/squashfs/file_direct.c
> new file mode 100644
> index 0000000..d020d94
> --- /dev/null
> +++ b/fs/squashfs/file_direct.c
> @@ -0,0 +1,178 @@
> +/*
> + * Copyright (c) 2013
> + * Phillip Lougher <phillip@squashfs.org.uk>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/vfs.h>
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +#include <linux/string.h>
> +#include <linux/pagemap.h>
> +#include <linux/mutex.h>
> +
> +#include "squashfs_fs.h"
> +#include "squashfs_fs_sb.h"
> +#include "squashfs_fs_i.h"
> +#include "squashfs.h"
> +#include "page_actor.h"
> +
> +static int squashfs_read_cache(struct page *target_page, u64 block, int bsize,
> +	int pages, struct page **page);
> +
> +/* Read separately compressed datablock directly into page cache */
> +int squashfs_readpage_block(struct page *target_page, u64 block, int bsize)
> +
> +{
> +	struct inode *inode = target_page->mapping->host;
> +	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> +
> +	int file_end = (i_size_read(inode) - 1) >> PAGE_CACHE_SHIFT;
> +	int mask = (1 << (msblk->block_log - PAGE_CACHE_SHIFT)) - 1;
> +	int start_index = target_page->index & ~mask;
> +	int end_index = start_index | mask;
> +	int i, n, pages, missing_pages, bytes, res = -ENOMEM;
> +	struct page **page;
> +	struct squashfs_page_actor *actor;
> +	void *pageaddr;
> +
> +	if (end_index > file_end)
> +		end_index = file_end;
> +
> +	pages = end_index - start_index + 1;
> +
> +	page = kmalloc(sizeof(void *) * pages, GFP_KERNEL);
> +	if (page == NULL)
> +		goto error_out;
> +
> +	/*
> +	 * Create a "page actor" which will kmap and kunmap the
> +	 * page cache pages appropriately within the decompressor
> +	 */
> +	actor = squashfs_page_actor_init_special(page, pages, 0);
> +	if (actor == NULL)
> +		goto error_out2;
> +
> +	/* Try to grab all the pages covered by the Squashfs block */
> +	for (missing_pages = 0, i = 0, n = start_index; i < pages; i++, n++) {
> +		page[i] = (n == target_page->index) ? target_page :
> +			grab_cache_page_nowait(target_page->mapping, n);
> +
> +		if (page[i] == NULL) {
> +			missing_pages++;
> +			continue;
> +		}
> +
> +		if (PageUptodate(page[i])) {
> +			unlock_page(page[i]);
> +			page_cache_release(page[i]);
> +			page[i] = NULL;
> +			missing_pages++;
> +		}
> +	}
> +
> +	if (missing_pages) {
> +		/*
> +		 * Couldn't get one or more pages, this page has either
> +		 * been VM reclaimed, but others are still in the page cache
> +		 * and uptodate, or we're racing with another thread in
> +		 * squashfs_readpage also trying to grab them.  Fall back to
> +		 * using an intermediate buffer.
> +		 */
> +		kfree(actor);
> +		return squashfs_read_cache(target_page, block, bsize, pages,
> +								page);
> +	}
> +
> +	/* Decompress directly into the page cache buffers */
> +	res = squashfs_read_data(inode->i_sb, block, bsize, NULL, actor);
> +	if (res < 0)
> +		goto mark_errored;
> +
> +	/* Last page may have trailing bytes not filled */
> +	bytes = res % PAGE_CACHE_SIZE;
> +	if (bytes) {
> +		pageaddr = kmap_atomic(page[pages - 1]);
> +		memset(pageaddr + bytes, 0, PAGE_CACHE_SIZE - bytes);
> +		kunmap_atomic(pageaddr);
> +	}
> +
> +	/* Mark pages as uptodate, unlock and release */
> +	for (i = 0; i < pages; i++) {
> +		flush_dcache_page(page[i]);
> +		SetPageUptodate(page[i]);
> +		unlock_page(page[i]);
> +		if (page[i] != target_page)
> +			page_cache_release(page[i]);
> +	}
> +
> +	kfree(actor);
> +	kfree(page);
> +
> +	return 0;
> +
> +mark_errored:
> +	/* Decompression failed, mark pages as errored.  Target_page is
> +	 * dealt with by the caller
> +	 */
> +	for (i = 0; i < pages; i++) {
> +		if (page[i] == target_page)
> +			continue;
> +		pageaddr = kmap_atomic(page[i]);
> +		memset(pageaddr, 0, PAGE_CACHE_SIZE);

Do we need the page zeroing?
If others see !PG_uptodate, they will retry the read, so I guess we don't
need it.

> +		kunmap_atomic(pageaddr);
> +		flush_dcache_page(page[i]);
> +		SetPageError(page[i]);
> +		unlock_page(page[i]);
> +		page_cache_release(page[i]);
> +	}
> +
> +	kfree(actor);
> +error_out2:
> +	kfree(page);
> +error_out:
> +	return res;
> +}
> +
> +
> +static int squashfs_read_cache(struct page *target_page, u64 block, int bsize,
> +	int pages, struct page **page)
> +{
> +	struct inode *i = target_page->mapping->host;
> +	struct squashfs_cache_entry *buffer = squashfs_get_datablock(i->i_sb,
> +						 block, bsize);
> +	int bytes = buffer->length, res = buffer->error, n, offset = 0;
> +	void *pageaddr;
> +
> +	if (res) {
> +		ERROR("Unable to read page, block %llx, size %x\n", block,
> +			bsize);
> +		goto out;
> +	}
> +
> +	for (n = 0; n < pages && bytes > 0; n++,
> +			bytes -= PAGE_CACHE_SIZE, offset += PAGE_CACHE_SIZE) {
> +		int avail = min_t(int, bytes, PAGE_CACHE_SIZE);
> +
> +		if (page[n] == NULL)
> +			continue;
> +
> +		pageaddr = kmap_atomic(page[n]);
> +		squashfs_copy_data(pageaddr, buffer, offset, avail);
> +		memset(pageaddr + avail, 0, PAGE_CACHE_SIZE - avail);
> +		kunmap_atomic(pageaddr);
> +		flush_dcache_page(page[n]);
> +		SetPageUptodate(page[n]);
> +		unlock_page(page[n]);
> +		if (page[n] != target_page)
> +			page_cache_release(page[n]);
> +	}
> +
> +out:
> +	squashfs_cache_put(buffer);

Nitpick:

Wouldn't it be better to free page in the caller rather than the callee
when the function returns an error?

> +	kfree(page);
> +	return res;
> +}
> diff --git a/fs/squashfs/page_actor.c b/fs/squashfs/page_actor.c
> new file mode 100644
> index 0000000..8e754ff
> --- /dev/null
> +++ b/fs/squashfs/page_actor.c
> @@ -0,0 +1,104 @@
> +/*
> + * Copyright (c) 2013
> + * Phillip Lougher <phillip@squashfs.org.uk>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +#include <linux/pagemap.h>
> +#include "page_actor.h"
> +
> +/* Implementation of page_actor for decompressing into intermediate buffer */
> +static void *cache_first_page(struct squashfs_page_actor *actor)
> +{
> +	actor->next_page = 1;
> +	return actor->buffer[0];
> +}
> +
> +static void *cache_next_page(struct squashfs_page_actor *actor)
> +{
> +	if (actor->next_page == actor->pages)
> +		return NULL;
> +
> +	return actor->buffer[actor->next_page++];
> +}
> +
> +static void cache_finish_page(struct squashfs_page_actor *actor)
> +{
> +	/* empty */
> +}
> +
> +struct squashfs_page_actor *squashfs_page_actor_init(void **buffer,
> +	int pages, int length)
> +{
> +	struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL);
> +
> +	if (actor == NULL)
> +		return NULL;
> +
> +	if (length)
> +		actor->length = length;
> +	else
> +		actor->length = pages * PAGE_CACHE_SIZE;
> +	actor->buffer = buffer;
> +	actor->pages = pages;
> +	actor->next_page = 0;
> +
> +	actor->squashfs_first_page = cache_first_page;
> +	actor->squashfs_next_page = cache_next_page;
> +	actor->squashfs_finish_page = cache_finish_page;
> +	return actor;
> +}
> +
> +/* Implementation of page_actor for decompressing directly into page cache */
> +static void *direct_first_page(struct squashfs_page_actor *actor)
> +{
> +	actor->next_page = 1;
> +	return actor->pageaddr = kmap_atomic(actor->page[0]);
> +}


Just my two cents:

This introduces a new rule that we must not call a blocking function while
enumerating pages with a page_actor.  A comment about that somewhere would
be helpful.

> +
> +static void *direct_next_page(struct squashfs_page_actor *actor)
> +{
> +	if (actor->pageaddr)
> +		kunmap_atomic(actor->pageaddr);
> +
> +	if (actor->next_page == actor->pages) {
> +		actor->pageaddr = NULL;
> +		return NULL;
> +	}
> +
> +	return actor->pageaddr = kmap_atomic(actor->page[actor->next_page++]);
> +}
> +
> +static void direct_finish_page(struct squashfs_page_actor *actor)
> +{
> +	if (actor->pageaddr)
> +		kunmap_atomic(actor->pageaddr);
> +}
> +
> +
> +struct squashfs_page_actor *squashfs_page_actor_init_special(struct page **page,
> +	int pages, int length)
> +{
> +	struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL);
> +
> +	if (actor == NULL)
> +		return NULL;
> +
> +	if (length)
> +		actor->length = length;
> +	else
> +		actor->length = pages * PAGE_CACHE_SIZE;
> +	actor->page = page;
> +	actor->pages = pages;
> +	actor->next_page = 0;
> +	actor->pageaddr = NULL;
> +
> +	actor->squashfs_first_page = direct_first_page;
> +	actor->squashfs_next_page = direct_next_page;
> +	actor->squashfs_finish_page = direct_finish_page;
> +	return actor;
> +}
> diff --git a/fs/squashfs/page_actor.h b/fs/squashfs/page_actor.h
> index 19a66a3..22731c7 100644
> --- a/fs/squashfs/page_actor.h
> +++ b/fs/squashfs/page_actor.h
> @@ -8,6 +8,7 @@
>   * the COPYING file in the top-level directory.
>   */
>  
> +#ifndef CONFIG_SQUASHFS_FILE_DIRECT
>  struct squashfs_page_actor {
>  	void	**page;
>  	int	pages;
> @@ -51,4 +52,35 @@ static inline void squashfs_finish_page(struct squashfs_page_actor *actor)
>  {
>  	/* empty */
>  }
> +#else
> +struct squashfs_page_actor {
> +	union {
> +		void		**buffer;
> +		struct page	**page;
> +	};
> +	void	*pageaddr;
> +	void    *(*squashfs_first_page)(struct squashfs_page_actor *);
> +	void    *(*squashfs_next_page)(struct squashfs_page_actor *);
> +	void    (*squashfs_finish_page)(struct squashfs_page_actor *);
> +	int	pages;
> +	int	length;
> +	int	next_page;
> +};
> +
> +extern struct squashfs_page_actor *squashfs_page_actor_init(void **, int, int);
> +extern struct squashfs_page_actor *squashfs_page_actor_init_special(struct page
> +							 **, int, int);
> +static inline void *squashfs_first_page(struct squashfs_page_actor *actor)
> +{
> +	return actor->squashfs_first_page(actor);
> +}
> +static inline void *squashfs_next_page(struct squashfs_page_actor *actor)
> +{
> +	return actor->squashfs_next_page(actor);
> +}
> +static inline void squashfs_finish_page(struct squashfs_page_actor *actor)
> +{
> +	actor->squashfs_finish_page(actor);
> +}
> +#endif
>  #endif

Most of thing from me are just nitpicks.
Looks great to me.

Thanks, Phillip.

Reviewed-by: Minchan Kim <minchan@kernel.org>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
  2013-11-08  2:42   ` Minchan Kim
@ 2013-11-14 17:04     ` Phillip Lougher
  2013-11-18  8:08       ` J. R. Okajima
  2013-11-19  2:12       ` Minchan Kim
  0 siblings, 2 replies; 15+ messages in thread
From: Phillip Lougher @ 2013-11-14 17:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Minchan Kim, Phillip Lougher, J. R. Okajima, Stephen Hemminger

CCing Junjiro Okajima and Stephen Hemminger

On 08/11/13 02:42, Minchan Kim wrote:
>
>Hello Phillip,
>
>On Thu, Nov 07, 2013 at 08:24:22PM +0000, Phillip Lougher wrote:
>> Add a multi-threaded decompression implementation which uses
>> percpu variables.
>>
>> Using percpu variables has advantages and disadvantages over
>> implementations which do not use percpu variables.
>>
>> Advantages: the nature of percpu variables ensures decompression is
>> load-balanced across the multiple cores.
>>
>> Disadvantages: it limits decompression to one thread per core.
>
>At a glance, I understand your concern, but I don't see the benefit of
>making this feature a separate new config option, because we can change
>the number of decompressors per core in the future.
>I don't want to create new configs SQUASHFS_DECOMP_MULTI_3,
>SQUASHFS_DECOMP_MULTI_4 and so on. :)

You misunderstand

I have been sent two multi-threaded implementations in the
past which use percpu variables:

1.  First patch set:

    http://www.spinics.net/lists/linux-fsdevel/msg34365.html

    Later in early 2011, I explained why I'd not merged the
    patches, and promised to do so when I got time

    http://www.spinics.net/lists/linux-fsdevel/msg42392.html


2.  Second patch set sent in 2011

    http://www.spinics.net/lists/linux-fsdevel/msg44111.html

So, these patches have been in my inbox, waiting until I got
time to refactor Squashfs so that they could be merged... and
I finally got to do this last month, which is why I'm merging
a combined version of both patches now.

As to why there are *two* implementations: I previously explained that
these two approaches are complementary, and merging both allows the
user to decide which method of parallelising Squashfs they want
to use.

The percpu implementation is a good approach to parallelising
Squashfs.  It is extremely simple, both in code and overhead.
The decompression hotpath simply consists of taking a percpu
variable, doing the decompression, and then a release.

Looking at code sizes:

fs/squashfs/decompressor_multi.c        |  199 +++++++++++++++++++++++++++++++
fs/squashfs/decompressor_multi_percpu.c |  104 ++++++++++++++++
fs/squashfs/decompressor_single.c       |   85 +++++++++++++

The simplicity of the percpu approach is readily apparent: at 104
lines it is only slightly larger than the single-threaded
implementation.

Personally I like both approaches, and I have no reason not to
merge both implementations I have been sent.

But what does the community think here?  Do you want the percpu
implementation?  Do you see value in having two implementations?
Feedback is appreciated.

>
>How about this?
>
>1. Let's add CONFIG_DECOMPRESSOR_MAX, which the admin can tune in
>   Kconfig, defaulting to the number of CPUs or twice that.  Or we
>   can put it in sysfs so the user can tune it at runtime.
>
>2. Add decompressor shrink logic via a slab shrinker, so that when the
>   system is under memory pressure we can catch the event and free some
>   of the decompressors, and when memory pressure later eases we can
>   create new decompressors again, up to the user-defined threshold.
>   We could detect that system memory is tight indirectly by using
>   GFP_NOWAIT rather than GFP_KERNEL in get_decomp_stream()'s allocation.

This adds extra complexity to an implementation that is already 199 lines
long (as opposed to 104 for the percpu implementation).  The whole point
of the percpu approach is to provide a simple implementation that may
suit many systems.

Phillip

>
>In short, let's make decompressor_multi a dynamically tuned system
>where the user can limit the maximum.
>
>
>>
>> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
>> ---
>>  fs/squashfs/Kconfig                     |   57 +++++++++++++----
>>  fs/squashfs/Makefile                    |   10 +--
>>  fs/squashfs/decompressor_multi_percpu.c |  105 +++++++++++++++++++++++++++++++
>>  3 files changed, 152 insertions(+), 20 deletions(-)
>>  create mode 100644 fs/squashfs/decompressor_multi_percpu.c
>>
>> diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
>> index 1c6d340..c92c75f 100644
>> --- a/fs/squashfs/Kconfig
>> +++ b/fs/squashfs/Kconfig
>> @@ -25,6 +25,50 @@ config SQUASHFS
>>
>>  	  If unsure, say N.
>>
>> +choice
>> +	prompt "Decompressor parallelisation options"
>> +	depends on SQUASHFS
>> +	help
>> +	  Squashfs now supports three parallelisation options for
>> +	  decompression.  Each one exhibits various trade-offs between
>> +	  decompression performance and CPU and memory usage.
>> +
>> +	  If in doubt, select "Single threaded compression"
>> +
>> +config SQUASHFS_DECOMP_SINGLE
>> +	bool "Single threaded compression"
>> +	help
>> +	  Traditionally Squashfs has used single-threaded decompression.
>> +	  Only one block (data or metadata) can be decompressed at any
>> +	  one time.  This limits CPU and memory usage to a minimum.
>> +
>> +config SQUASHFS_DECOMP_MULTI
>> +	bool "Use multiple decompressors for parallel I/O"
>> +	help
>> +	  By default Squashfs uses a single decompressor but it gives
>> +	  poor performance on parallel I/O workloads when using multiple CPU
>> +	  machines due to waiting on decompressor availability.
>> +
>> +	  If you have a parallel I/O workload and your system has enough memory,
>> +	  using this option may improve overall I/O performance.
>> +
>> +	  This decompressor implementation uses up to two parallel
>> +	  decompressors per core.  It dynamically allocates decompressors
>> +	  on a demand basis.
>
>I'm not sure it's a good idea to document how many parallel decompressors
>run per core.  IMO, it's an implementation detail the user doesn't need
>to know about, and we may change it in the future.
>
>> +
>> +config SQUASHFS_DECOMP_MULTI_PERCPU
>> +       bool "Use percpu multiple decompressors for parallel I/O"
>> +	help
>
>Indentation.
>
>> +	  By default Squashfs uses a single decompressor but it gives
>> +	  poor performance on parallel I/O workloads when using multiple CPU
>> +	  machines due to waiting on decompressor availability.
>> +
>> +	  This decompressor implementation uses a maximum of one
>> +	  decompressor per core.  It uses percpu variables to ensure
>> +	  decompression is load-balanced across the cores.
>> +
>> +endchoice
>> +
>>  config SQUASHFS_XATTR
>>  	bool "Squashfs XATTR support"
>>  	depends on SQUASHFS
>> @@ -63,19 +107,6 @@ config SQUASHFS_LZO
>>
>>  	  If unsure, say N.
>>
>> -config SQUASHFS_MULTI_DECOMPRESSOR
>> -	bool "Use multiple decompressors for handling parallel I/O"
>> -	depends on SQUASHFS
>> -	help
>> -	  By default Squashfs uses a single decompressor but it gives
>> -	  poor performance on parallel I/O workloads when using multiple CPU
>> -	  machines due to waiting on decompressor availability.
>> -
>> -	  If you have a parallel I/O workload and your system has enough memory,
>> -	  using this option may improve overall I/O performance.
>> -
>> -	  If unsure, say N.
>> -
>>  config SQUASHFS_XZ
>>  	bool "Include support for XZ compressed file systems"
>>  	depends on SQUASHFS
>> diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
>> index dfebc3b..5833b96 100644
>> --- a/fs/squashfs/Makefile
>> +++ b/fs/squashfs/Makefile
>> @@ -5,14 +5,10 @@
>>  obj-$(CONFIG_SQUASHFS) += squashfs.o
>>  squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
>>  squashfs-y += namei.o super.o symlink.o decompressor.o
>> -
>> +squashfs-$(CONFIG_SQUASHFS_DECOMP_SINGLE) += decompressor_single.o
>> +squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI) += decompressor_multi.o
>> +squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU) += decompressor_multi_percpu.o
>>  squashfs-$(CONFIG_SQUASHFS_XATTR) += xattr.o xattr_id.o
>>  squashfs-$(CONFIG_SQUASHFS_LZO) += lzo_wrapper.o
>>  squashfs-$(CONFIG_SQUASHFS_XZ) += xz_wrapper.o
>>  squashfs-$(CONFIG_SQUASHFS_ZLIB) += zlib_wrapper.o
>> -
>> -ifdef CONFIG_SQUASHFS_MULTI_DECOMPRESSOR
>> -	squashfs-y		+= decompressor_multi.o
>> -else
>> -	squashfs-y		+= decompressor_single.o
>> -endif
>> diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c
>> new file mode 100644
>> index 0000000..b5598ab
>> --- /dev/null
>> +++ b/fs/squashfs/decompressor_multi_percpu.c
>> @@ -0,0 +1,105 @@
>> +/*
>> + * Copyright (c) 2013
>> + * Phillip Lougher <phillip@squashfs.org.uk>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2. See
>> + * the COPYING file in the top-level directory.
>> + */
>> +
>> +#include <linux/types.h>
>> +#include <linux/mutex.h>
>> +#include <linux/slab.h>
>> +#include <linux/buffer_head.h>
>> +
>> +#include "squashfs_fs.h"
>> +#include "squashfs_fs_sb.h"
>> +#include "decompressor.h"
>> +#include "squashfs.h"
>> +
>> +/*
>> + * This file implements multi-threaded decompression using percpu
>> + * variables, one thread per cpu core.
>> + */
>> +
>> +struct squashfs_stream {
>> +	void		*stream;
>> +};
>> +
>> +void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
>> +						void *comp_opts)
>> +{
>> +	struct squashfs_stream *stream;
>> +	struct squashfs_stream __percpu *percpu;
>> +	int err, cpu, cpu0;
>> +
>> +	percpu = alloc_percpu(struct squashfs_stream);
>> +	if (percpu == NULL) {
>> +		err = -ENOMEM;
>> +		goto out2;
>> +	}
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		stream = per_cpu_ptr(percpu, cpu);
>> +		stream->stream = msblk->decompressor->init(msblk, comp_opts);
>> +		if (IS_ERR(stream->stream)) {
>> +			err = PTR_ERR(stream->stream);
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	kfree(comp_opts);
>
>squashfs_decompressor_setup() already frees comp_opts.
>
>> +	return (__force void *) percpu;
>> +
>> +out:
>> +	for_each_possible_cpu(cpu0) {
>> +		if (cpu0 == cpu)
>
>Just a nit, so feel free to ignore. :)
>
>This cpu variable comparison is tricky to me.  I'm not sure what
>happens if CPU hotplug occurs between the above two loops.  I guess
>your code handles it, but I couldn't find a similar usage in other code.
>
>alloc_percpu()'s semantic is "allocate a zero-filled percpu area",
>so how about relying on that instead of comparing cpu indices?
>
>for_each_possible_cpu(cpu) {
>        stream = per_cpu_ptr(percpu, cpu);
>        if (stream->stream)
>                msblk->decompressor->free(stream->stream);
>}
>
>It's never a hot path, and it isn't fragile against CPU hotplug, IMHO.
>
>
>> +			break;
>> +		stream = per_cpu_ptr(percpu, cpu0);
>> +		msblk->decompressor->free(stream->stream);
>> +	}
>> +	free_percpu(percpu);
>> +out2:
>> +	kfree(comp_opts);
>
>We can remove this kfree() as well.
>
>
>> +	return ERR_PTR(err);
>> +}
>> +
>> +void squashfs_decompressor_destroy(struct squashfs_sb_info *msblk)
>> +{
>> +	struct squashfs_stream __percpu *percpu =
>> +			(struct squashfs_stream __percpu *) msblk->stream;
>> +	struct squashfs_stream *stream;
>> +	int cpu;
>> +
>> +	if (msblk->stream) {
>> +		for_each_possible_cpu(cpu) {
>> +			stream = per_cpu_ptr(percpu, cpu);
>> +			msblk->decompressor->free(stream->stream);
>> +		}
>> +		free_percpu(percpu);
>> +	}
>> +}
>> +
>> +int squashfs_decompress(struct squashfs_sb_info *msblk,
>> +	void **buffer, struct buffer_head **bh, int b, int offset, int length,
>> +	int srclength, int pages)
>> +{
>> +	int res;
>> +	struct squashfs_stream __percpu *percpu =
>> +			(struct squashfs_stream __percpu *) msblk->stream;
>> +	struct squashfs_stream *stream = get_cpu_ptr(percpu);
>> +
>> +	res = msblk->decompressor->decompress(msblk, stream->stream, buffer,
>> +		bh, b, offset, length, srclength, pages);
>> +	put_cpu_ptr(stream);
>> +
>> +	if (res < 0)
>> +		ERROR("%s decompression failed, data probably corrupt\n",
>> +			msblk->decompressor->name);
>> +
>> +	return res;
>> +}
>> +
>> +int squashfs_max_decompressors(void)
>> +{
>> +	return num_possible_cpus();
>> +}
>> --
>> 1.7.10.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>
>--
>Kind regards,
>Minchan Kim
>.
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
  2013-11-14 17:04     ` Phillip Lougher
@ 2013-11-18  8:08       ` J. R. Okajima
  2013-11-19  2:12       ` Minchan Kim
  1 sibling, 0 replies; 15+ messages in thread
From: J. R. Okajima @ 2013-11-18  8:08 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: linux-kernel, Minchan Kim, Phillip Lougher, Stephen Hemminger


Phillip Lougher:
> CCing Junjiro Okajima and Stephen Hemminger

Thank you for CCing me, and sorry for my slow response.


> >> Using percpu variables has advantages and disadvantages over
> >> implementations which do not use percpu variables.
> >>
> >> Advantages: the nature of percpu variables ensures decompression is
> >> load-balanced across the multiple cores.
> >>
> >> Disadvantages: it limits decompression to one thread per core.

Honestly speaking, I don't remember the details of squashfs.  It was a
long, long time ago that I read and modified squashfs.
Anyway, I will try to reply.

Percpu is a good approach.  Obviously, as you mentioned under
disadvantages, it depends on the balance between these two things:
- How many I/Os run in parallel?
- How much does the decompression cost?
My current guess is that the latter is heavier (for performance), so I
guess percpu is good.

Is it guaranteed that the decompressor never requires any new resources?
Under heavy I/O and memory pressure, if the decompressor wants some
memory between get_cpu_ptr() and put_cpu_ptr(), and if the decompressor
is running on all other cores at the same time, does squashfs then
simply return ENOMEM because the memory shrinker cannot run on any core?
If so, we may need a rule of "no new resources for decompressing",
since users may prefer a slow but successful decompression to getting
ENOMEM.

If this mail is totally pointless, please ignore it.


J. R. Okajima

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
  2013-11-14 17:04     ` Phillip Lougher
  2013-11-18  8:08       ` J. R. Okajima
@ 2013-11-19  2:12       ` Minchan Kim
  1 sibling, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2013-11-19  2:12 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: linux-kernel, Phillip Lougher, J. R. Okajima, Stephen Hemminger

Hello Phillip,

Sorry for the late response.

On Thu, Nov 14, 2013 at 05:04:36PM +0000, Phillip Lougher wrote:
> CCing Junjiro Okijima and Stephen Hemminger
> 
> On 08/11/13 02:42, Minchan Kim wrote:
> >
> >Hello Phillip,
> >
> >On Thu, Nov 07, 2013 at 08:24:22PM +0000, Phillip Lougher wrote:
> >>Add a multi-threaded decompression implementation which uses
> >>percpu variables.
> >>
> >>Using percpu variables has advantages and disadvantages over
> >>implementations which do not use percpu variables.
> >>
> >>Advantages: the nature of percpu variables ensures decompression is
> >>load-balanced across the multiple cores.
> >>
> >>Disadvantages: it limits decompression to one thread per core.
> >
> >At a glance, I understand your concern, but I don't see the benefit
> >of making this feature a separate new config option, because we can
> >modify the number of decompressors per core in the future.
> >I don't want to create new configs SQUASHFS_DECOMP_MULTI_3,
> >SQUASHFS_DECOMP_MULTI_4 and so on. :)
> 
> You misunderstand
> 
> I have been sent two multi-threaded implementations in the
> past which use percpu variables:
> 
> 1.  First patch set:
> 
>    http://www.spinics.net/lists/linux-fsdevel/msg34365.html
> 
>    Later in early 2011, I explained why I'd not merged the
>    patches, and promised to do so when I got time
> 
>    http://www.spinics.net/lists/linux-fsdevel/msg42392.html
> 
> 
> 2.  Second patch set sent in 2011
> 
>    http://www.spinics.net/lists/linux-fsdevel/msg44111.html
> 
> So, these patches have been in my inbox, waiting until I got
> time to refactor Squashfs so that they could be merged... and
> I finally got to do this last month, which is why I'm merging
> a combined version of both patches now.
> 
> As to why have *two* implementations, I previously explained these
> two approaches are complementary, and merging both allows the
> user to decide which method of parallelising Squashfs they want
> to do.

The only benefit I see in decompressor_multi_percpu is that it limits
memory overhead, but that could also be solved by dynamic shrinking, as
I mentioned earlier.

Regarding CPU usage, I'm not sure that decompressor_multi is that bad.
If it's a real concern, we can fix it by letting the admin limit the
number of decompression streams via sysfs or something.

Decompression load balancing?  Actually, I don't understand your claim.
Could you elaborate on it a bit?
I don't see why decompressor_multi_percpu is better than
decompressor_multi.

> 
> The percpu implementation is a good approach to parallelising
> Squashfs.  It is extremely simple, both in code and overhead.

I agree about the code, but I'm not sure about the overhead.
I admit decompressor_multi is bigger, but it consists of simple
operations.  Sometimes kmalloc's cost can be high when system memory
pressure is severe, so the file system's performance would fluctuate.
If that's your concern, we could introduce min/high thresholds so that
it could guarantee some minimum speed.


> The decompression hotpath simply consists of taking a percpu
> variable, doing the decompression, and then a release.

When the decompression buffer is created dynamically, there is some
overhead compared to the percpu approach, but once it has been created,
the overhead of decompression is marginal.

> Looking at code sizes:
> 
> fs/squashfs/decompressor_multi.c        |  199 +++++++++++++++++++++++++++++++
> fs/squashfs/decompressor_multi_percpu.c |  104 ++++++++++++++++
> fs/squashfs/decompressor_single.c       |   85 +++++++++++++
> 
> The simplicity of the percpu approach is readily apparent, at 104
> lines it is only slightly larger than the single threaded
> implementation.
> 
> Personally I like both approaches, and I have no reason not to
> merge both implementations I have been sent.

Okay.  I'm not the maintainer, so I won't argue strongly against your
thoughts.  I just wanted to unify them if neither of them is a
significant win, because I thought avoiding more CONFIG options, which
have to be considered whenever a feature is added, would make the
maintainer and contributors happier.

> 
> But what does the community think here?  Do you want the percpu
> implementation?  Do you see value in having two implementations?
> Feedback is appreciated.

As I mentioned, my opinion is that we should unify them unless one of
them is a significant win.  It would be better to see some benchmark
results.

> 
> >
> >How about this?
> >
> >1. Let's add CONFIG_DECOMPRESSOR_MAX, which the admin could tune in
> >  Kconfig.  The default would be the number of CPUs, or twice that.
> >  Alternatively, we could expose it via sysfs so the user can tune
> >  it at runtime.
> >
> >2. Add decompressor shrink logic via a slab shrinker, so that if the
> >  system comes under memory pressure we can catch the event and free
> >  some of the decompressors; if memory pressure eases again later, we
> >  can create new decompressors up to the user-defined threshold.
> >  We could detect indirectly whether system memory is sufficient by
> >  using GFP_NOWAIT, not GFP_KERNEL, in get_decomp_stream's allocation.
> 
> This adds extra complexity to an implementation already 199 lines long
> (as opposed to 104 for the percpu implementation).   The whole point of
> the percpu implementation is to add a simple implementation that
> may suit many systems.

Okay, finally I agree, because I guess decompressor_multi's code
complexity might grow in the future in order to control decomp_stream's
size dynamically on many-core systems, and some systems might dislike
the memory overhead.  In that case, the per-cpu approach could work
well with a small amount of code.

From now on, I don't object.

Thanks.

> 
> Phillip
> 
> >

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-11-19  2:12 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-07 20:24 [PATCH 0/6] Squashfs performance improvements Phillip Lougher
2013-11-07 20:24 ` [PATCH 1/6] Squashfs: Refactor decompressor interface and code (V2) Phillip Lougher
2013-11-07 20:24 ` [PATCH 2/6] Squashfs: enhance parallel I/O Phillip Lougher
2013-11-07 20:24 ` [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables Phillip Lougher
2013-11-08  2:42   ` Minchan Kim
2013-11-14 17:04     ` Phillip Lougher
2013-11-18  8:08       ` J. R. Okajima
2013-11-19  2:12       ` Minchan Kim
2013-11-07 20:24 ` [PATCH 4/6] Squashfs: Generalise paging handling in the decompressors (V2) Phillip Lougher
2013-11-08  5:29   ` Minchan Kim
2013-11-07 20:24 ` [PATCH 5/6] Squashfs: restructure squashfs_readpage() Phillip Lougher
2013-11-08  5:55   ` Minchan Kim
2013-11-08  6:02   ` Minchan Kim
2013-11-07 20:24 ` [PATCH 6/6] Squashfs: Directly decompress into the page cache for file data (V2) Phillip Lougher
2013-11-08  8:23   ` Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).