* [PATCHv5 0/4] add compressing abstraction and multi stream support
@ 2014-02-13 17:43 Sergey Senozhatsky
  2014-02-13 17:43 ` [PATCHv5 1/4] zram: introduce compressing backend abstraction Sergey Senozhatsky
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2014-02-13 17:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Jerome Marchand, Nitin Gupta, linux-kernel, Sergey Senozhatsky

This patchset introduces the zcomp compression backend abstraction,
adding the ability to support compression algorithms other than LZO,
and support for multiple compression streams, making parallel
compression possible.

Diff from v4 (reviewed by Minchan Kim):
-- renamed zcomp buffer_lock; removed src len and dst len from
   compress() and decompress(); not using term `buffer' and
   `workmem' in code and documentation; define compress() and
   decompress() functions for LZO backend; not using goto's;
   do not put idle zcomp_strm to idle list tail.

Diff from v3:
-- renamed compression backend and working memory structs as requested
   by Minchan Kim; fixed several issues noted by Minchan Kim.

Sergey Senozhatsky (4):
  zram: introduce compressing backend abstraction
  zram: use zcomp compressing backends
  zram: zcomp support multi compressing streams
  zram: document max_comp_streams

 Documentation/ABI/testing/sysfs-block-zram |   9 +-
 Documentation/blockdev/zram.txt            |  20 +++-
 drivers/block/zram/Makefile                |   2 +-
 drivers/block/zram/zcomp.c                 | 155 +++++++++++++++++++++++++++++
 drivers/block/zram/zcomp.h                 |  56 +++++++++++
 drivers/block/zram/zcomp_lzo.c             |  48 +++++++++
 drivers/block/zram/zram_drv.c              | 102 ++++++++++++-------
 drivers/block/zram/zram_drv.h              |  10 +-
 8 files changed, 355 insertions(+), 47 deletions(-)
 create mode 100644 drivers/block/zram/zcomp.c
 create mode 100644 drivers/block/zram/zcomp.h
 create mode 100644 drivers/block/zram/zcomp_lzo.c

-- 
1.9.0.rc3.260.g4cf525c


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv5 1/4] zram: introduce compressing backend abstraction
  2014-02-13 17:43 [PATCHv5 0/4] add compressing abstraction and multi stream support Sergey Senozhatsky
@ 2014-02-13 17:43 ` Sergey Senozhatsky
  2014-02-18  5:10   ` Minchan Kim
  2014-02-13 17:43 ` [PATCHv5 2/4] zram: use zcomp compressing backends Sergey Senozhatsky
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Sergey Senozhatsky @ 2014-02-13 17:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Jerome Marchand, Nitin Gupta, linux-kernel, Sergey Senozhatsky

ZRAM currently calls the LZO compression algorithm directly, making it
the one and only option. Introduce the compressing backend abstraction
zcomp in order to support multiple compression algorithms with the
following set of operations:
        .create
        .destroy
        .compress
        .decompress

Schematically zram write() usually contains the following steps:
0) preparation (decompression of partial IO, etc.)
1) lock buffer_lock mutex (protects meta compress buffers)
2) compress (using meta compress buffers)
3) alloc and map zs_pool object
4) copy compressed data (from meta compress buffers) to object allocated by 3)
5) free previous pool page, assign a new one
6) unlock buffer_lock mutex

As we can see, the compressing buffers must remain untouched from 1)
to 4), because otherwise a concurrent write() could overwrite the data.
At the same time, zram_meta must be aware of a) the specific compression
algorithm's memory requirements and b) the locking necessary to protect
the compression buffers. Besides, zram holds buffer_lock through almost
the whole write() function, making parallel compression impossible. To
remove requirement a), a new struct zcomp_strm is introduced, which
contains a compress/decompress `buffer' and the compression algorithm's
`private' part, while struct zcomp implements zcomp_strm stream handling
and locking by means of get() and put() semantics, removing requirement
b) from zram_meta. zcomp ->create() and ->destroy(), respectively,
allocate and deallocate the algorithm-specific zcomp_strm `private' part.

Every zcomp has a list of idle zcomp_strm structs, a spinlock to
protect the idle list, and a wait queue, making it possible to perform
parallel compressions.
Each time zram issues a zcomp_strm_get() call, the following set of
operations is performed:
- spin lock strm_lock
- if the idle list is not empty, remove a zcomp_strm from the idle
  list, spin unlock and return the zcomp stream pointer to the caller
- if the idle list is empty, the current task adds itself to the wait
  queue; it will be awakened by a zcomp_strm_put() caller.

zcomp_strm_put():
- spin lock strm_lock
- add zcomp stream to idle list
- spin unlock, wake up sleeper

In other words, zcomp_strm_get() turns the caller into an exclusive
user of a stream and zcomp_strm_put() makes that zcomp stream available
again.

Usage examples.

To initialize compressing backend:
	comp = zcomp_create(NAME) /* NAME e.g. "lzo" */

which initialises the compressing backend if the requested algorithm is
supported.

Compress:
	zstrm = zcomp_strm_get(comp)
	zcomp_compress(comp, zstrm, src, &dst_len)
	[..] /* copy compressed data */
	zcomp_strm_put(comp, zstrm)

Decompress:
	zcomp_decompress(comp, src, src_len, dst);

Free the compressing backend and its zcomp streams:
	zcomp_destroy(comp)

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zcomp.c     | 151 +++++++++++++++++++++++++++++++++++++++++
 drivers/block/zram/zcomp.h     |  56 +++++++++++++++
 drivers/block/zram/zcomp_lzo.c |  48 +++++++++++++
 3 files changed, 255 insertions(+)
 create mode 100644 drivers/block/zram/zcomp.c
 create mode 100644 drivers/block/zram/zcomp.h
 create mode 100644 drivers/block/zram/zcomp_lzo.c

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
new file mode 100644
index 0000000..3af25b6
--- /dev/null
+++ b/drivers/block/zram/zcomp.c
@@ -0,0 +1,151 @@
+/*
+ * Copyright (C) 2014 Sergey Senozhatsky.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/wait.h>
+#include <linux/sched.h>
+
+#include "zcomp.h"
+
+extern struct zcomp_backend zcomp_lzo;
+
+static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
+{
+	comp->backend->destroy(zstrm->private);
+	free_pages((unsigned long)zstrm->buffer, 1);
+	kfree(zstrm);
+}
+
+/*
+ * allocate new zcomp_strm structure with ->private initialized by
+ * backend, return NULL on error
+ */
+static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
+{
+	struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_KERNEL);
+	if (!zstrm)
+		return NULL;
+
+	INIT_LIST_HEAD(&zstrm->list);
+	zstrm->private = comp->backend->create();
+	/*
+	 * allocate 2 pages. 1 for compressed data, plus 1 extra for the
+	 * case when compressed size is larger than the original one
+	 */
+	zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
+	if (!zstrm->private || !zstrm->buffer) {
+		zcomp_strm_free(comp, zstrm);
+		zstrm = NULL;
+	}
+	return zstrm;
+}
+
+/*
+ * get existing idle zcomp_strm or wait until other process release
+ * (zcomp_strm_put()) one for us
+ */
+struct zcomp_strm *zcomp_strm_get(struct zcomp *comp)
+{
+	struct zcomp_strm *zstrm;
+
+	while (1) {
+		spin_lock(&comp->strm_lock);
+		if (list_empty(&comp->idle_strm)) {
+			spin_unlock(&comp->strm_lock);
+			wait_event(comp->strm_wait,
+					!list_empty(&comp->idle_strm));
+			continue;
+		}
+
+		zstrm = list_entry(comp->idle_strm.next,
+				struct zcomp_strm, list);
+		list_del(&zstrm->list);
+		spin_unlock(&comp->strm_lock);
+		break;
+	}
+	return zstrm;
+}
+
+/* add zcomp_strm back to idle list and wake up waiter (if any) */
+void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm)
+{
+	spin_lock(&comp->strm_lock);
+	list_add(&zstrm->list, &comp->idle_strm);
+	spin_unlock(&comp->strm_lock);
+
+	wake_up(&comp->strm_wait);
+}
+
+int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
+		const unsigned char *src, size_t *dst_len)
+{
+	return comp->backend->compress(src, zstrm->buffer, dst_len,
+			zstrm->private);
+}
+
+int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
+		size_t src_len, unsigned char *dst)
+{
+	return comp->backend->decompress(src, src_len, dst);
+}
+
+void zcomp_destroy(struct zcomp *comp)
+{
+	struct zcomp_strm *zstrm;
+	while (!list_empty(&comp->idle_strm)) {
+		zstrm = list_entry(comp->idle_strm.next,
+				struct zcomp_strm, list);
+		list_del(&zstrm->list);
+		zcomp_strm_free(comp, zstrm);
+	}
+	kfree(comp);
+}
+
+static struct zcomp_backend *find_backend(const char *compress)
+{
+	if (sysfs_streq(compress, "lzo"))
+		return &zcomp_lzo;
+	return NULL;
+}
+
+/*
+ * search available compressors for requested algorithm.
+ * allocate new zcomp and initialize it. return NULL
+ * if requested algorithm is not supported or in case
+ * of init error
+ */
+struct zcomp *zcomp_create(const char *compress)
+{
+	struct zcomp *comp;
+	struct zcomp_backend *backend;
+	struct zcomp_strm *zstrm;
+
+	backend = find_backend(compress);
+	if (!backend)
+		return NULL;
+
+	comp = kmalloc(sizeof(struct zcomp), GFP_KERNEL);
+	if (!comp)
+		return NULL;
+
+	comp->backend = backend;
+	spin_lock_init(&comp->strm_lock);
+	INIT_LIST_HEAD(&comp->idle_strm);
+	init_waitqueue_head(&comp->strm_wait);
+
+	zstrm = zcomp_strm_alloc(comp);
+	if (!zstrm) {
+		zcomp_destroy(comp);
+		return NULL;
+	}
+	list_add(&zstrm->list, &comp->idle_strm);
+	return comp;
+}
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
new file mode 100644
index 0000000..7e05d20
--- /dev/null
+++ b/drivers/block/zram/zcomp.h
@@ -0,0 +1,56 @@
+/*
+ * Copyright (C) 2014 Sergey Senozhatsky.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ZCOMP_H_
+#define _ZCOMP_H_
+
+#include <linux/spinlock.h>
+
+struct zcomp_strm {
+	void *buffer;     /* compression/decompression buffer */
+	void *private;      /* algorithm private memory */
+	struct list_head list;
+};
+
+/* static compression backend */
+struct zcomp_backend {
+	int (*compress)(const unsigned char *src, unsigned char *dst,
+			size_t *dst_len, void *private);
+
+	int (*decompress)(const unsigned char *src, size_t src_len,
+			unsigned char *dst);
+
+	void * (*create)(void);
+	void (*destroy)(void *private);
+
+	const char *name;
+};
+
+/* dynamic per-device compression frontend */
+struct zcomp {
+	/* protect strm list */
+	spinlock_t strm_lock;
+	/* list of available strms */
+	struct list_head idle_strm;
+	wait_queue_head_t strm_wait;
+	struct zcomp_backend *backend;
+};
+
+struct zcomp *zcomp_create(const char *comp);
+void zcomp_destroy(struct zcomp *comp);
+
+struct zcomp_strm *zcomp_strm_get(struct zcomp *comp);
+void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm);
+
+int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
+		const unsigned char *src, size_t *dst_len);
+
+int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
+		size_t src_len, unsigned char *dst);
+#endif /* _ZCOMP_H_ */
diff --git a/drivers/block/zram/zcomp_lzo.c b/drivers/block/zram/zcomp_lzo.c
new file mode 100644
index 0000000..39106a7
--- /dev/null
+++ b/drivers/block/zram/zcomp_lzo.c
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2014 Sergey Senozhatsky.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/lzo.h>
+
+#include "zcomp.h"
+
+static void * lzo_create(void)
+{
+	return kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
+}
+
+static void lzo_destroy(void *private)
+{
+	kfree(private);
+}
+
+static int lzo_compress(const unsigned char *src, unsigned char *dst,
+		size_t *dst_len, void *private)
+{
+	int ret = lzo1x_1_compress(src, PAGE_SIZE, dst, dst_len, private);
+	return ret == LZO_E_OK ? 0 : ret;
+}
+
+static int lzo_decompress(const unsigned char *src, size_t src_len,
+		unsigned char *dst)
+{
+	size_t dst_len = PAGE_SIZE;
+	int ret = lzo1x_decompress_safe(src, src_len, dst, &dst_len);
+	return ret == LZO_E_OK ? 0 : ret;
+}
+
+extern struct zcomp_backend zcomp_lzo;
+struct zcomp_backend zcomp_lzo = {
+	.compress = lzo_compress,
+	.decompress = lzo_decompress,
+	.create = lzo_create,
+	.destroy = lzo_destroy,
+	.name = "lzo",
+};
-- 
1.9.0.rc3.260.g4cf525c



* [PATCHv5 2/4] zram: use zcomp compressing backends
  2014-02-13 17:43 [PATCHv5 0/4] add compressing abstraction and multi stream support Sergey Senozhatsky
  2014-02-13 17:43 ` [PATCHv5 1/4] zram: introduce compressing backend abstraction Sergey Senozhatsky
@ 2014-02-13 17:43 ` Sergey Senozhatsky
  2014-02-13 17:43 ` [PATCHv5 3/4] zram: zcomp support multi compressing streams Sergey Senozhatsky
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2014-02-13 17:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Jerome Marchand, Nitin Gupta, linux-kernel, Sergey Senozhatsky

Do not perform direct LZO compress/decompress calls;
initialise and use the zcomp LZO backend instead.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/Makefile   |  2 +-
 drivers/block/zram/zram_drv.c | 62 +++++++++++++++++++------------------------
 drivers/block/zram/zram_drv.h |  8 +++---
 3 files changed, 32 insertions(+), 40 deletions(-)

diff --git a/drivers/block/zram/Makefile b/drivers/block/zram/Makefile
index cb0f9ce..757c6a5 100644
--- a/drivers/block/zram/Makefile
+++ b/drivers/block/zram/Makefile
@@ -1,3 +1,3 @@
-zram-y	:=	zram_drv.o
+zram-y	:=	zcomp_lzo.o zcomp.o zram_drv.o
 
 obj-$(CONFIG_ZRAM)	+=	zram.o
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9baac5b..200c12a 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -29,15 +29,14 @@
 #include <linux/genhd.h>
 #include <linux/highmem.h>
 #include <linux/slab.h>
-#include <linux/lzo.h>
 #include <linux/string.h>
-#include <linux/vmalloc.h>
 
 #include "zram_drv.h"
 
 /* Globals */
 static int zram_major;
 static struct zram *zram_devices;
+static const char *default_compressor = "lzo";
 
 /* Module params (documentation at end) */
 static unsigned int num_devices = 1;
@@ -160,8 +159,6 @@ static inline int valid_io_request(struct zram *zram, struct bio *bio)
 static void zram_meta_free(struct zram_meta *meta)
 {
 	zs_destroy_pool(meta->mem_pool);
-	kfree(meta->compress_workmem);
-	free_pages((unsigned long)meta->compress_buffer, 1);
 	vfree(meta->table);
 	kfree(meta);
 }
@@ -173,22 +170,11 @@ static struct zram_meta *zram_meta_alloc(u64 disksize)
 	if (!meta)
 		goto out;
 
-	meta->compress_workmem = kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
-	if (!meta->compress_workmem)
-		goto free_meta;
-
-	meta->compress_buffer =
-		(void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
-	if (!meta->compress_buffer) {
-		pr_err("Error allocating compressor buffer space\n");
-		goto free_workmem;
-	}
-
 	num_pages = disksize >> PAGE_SHIFT;
 	meta->table = vzalloc(num_pages * sizeof(*meta->table));
 	if (!meta->table) {
 		pr_err("Error allocating zram address table\n");
-		goto free_buffer;
+		goto free_meta;
 	}
 
 	meta->mem_pool = zs_create_pool(GFP_NOIO | __GFP_HIGHMEM);
@@ -198,15 +184,10 @@ static struct zram_meta *zram_meta_alloc(u64 disksize)
 	}
 
 	rwlock_init(&meta->tb_lock);
-	mutex_init(&meta->buffer_lock);
 	return meta;
 
 free_table:
 	vfree(meta->table);
-free_buffer:
-	free_pages((unsigned long)meta->compress_buffer, 1);
-free_workmem:
-	kfree(meta->compress_workmem);
 free_meta:
 	kfree(meta);
 	meta = NULL;
@@ -280,8 +261,7 @@ static void zram_free_page(struct zram *zram, size_t index)
 
 static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
 {
-	int ret = LZO_E_OK;
-	size_t clen = PAGE_SIZE;
+	int ret = 0;
 	unsigned char *cmem;
 	struct zram_meta *meta = zram->meta;
 	unsigned long handle;
@@ -301,12 +281,12 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
 	if (size == PAGE_SIZE)
 		copy_page(mem, cmem);
 	else
-		ret = lzo1x_decompress_safe(cmem, size,	mem, &clen);
+		ret = zcomp_decompress(zram->comp, cmem, size, mem);
 	zs_unmap_object(meta->mem_pool, handle);
 	read_unlock(&meta->tb_lock);
 
 	/* Should NEVER happen. Return bio error if it does. */
-	if (unlikely(ret != LZO_E_OK)) {
+	if (unlikely(ret)) {
 		pr_err("Decompression failed! err=%d, page=%u\n", ret, index);
 		atomic64_inc(&zram->stats.failed_reads);
 		return ret;
@@ -349,7 +329,7 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 
 	ret = zram_decompress_page(zram, uncmem, index);
 	/* Should NEVER happen. Return bio error if it does. */
-	if (unlikely(ret != LZO_E_OK))
+	if (unlikely(ret))
 		goto out_cleanup;
 
 	if (is_partial_io(bvec))
@@ -374,11 +354,10 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 	struct page *page;
 	unsigned char *user_mem, *cmem, *src, *uncmem = NULL;
 	struct zram_meta *meta = zram->meta;
+	struct zcomp_strm *zstrm;
 	bool locked = false;
 
 	page = bvec->bv_page;
-	src = meta->compress_buffer;
-
 	if (is_partial_io(bvec)) {
 		/*
 		 * This is a partial IO. We need to read the full page
@@ -394,7 +373,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 			goto out;
 	}
 
-	mutex_lock(&meta->buffer_lock);
+	zstrm = zcomp_strm_get(zram->comp);
 	locked = true;
 	user_mem = kmap_atomic(page);
 
@@ -420,19 +399,18 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		goto out;
 	}
 
-	ret = lzo1x_1_compress(uncmem, PAGE_SIZE, src, &clen,
-			       meta->compress_workmem);
+	ret = zcomp_compress(zram->comp, zstrm, uncmem, &clen);
 	if (!is_partial_io(bvec)) {
 		kunmap_atomic(user_mem);
 		user_mem = NULL;
 		uncmem = NULL;
 	}
 
-	if (unlikely(ret != LZO_E_OK)) {
+	if (unlikely(ret)) {
 		pr_err("Compression failed! err=%d\n", ret);
 		goto out;
 	}
-
+	src = zstrm->buffer;
 	if (unlikely(clen > max_zpage_size)) {
 		clen = PAGE_SIZE;
 		src = NULL;
@@ -457,6 +435,8 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		memcpy(cmem, src, clen);
 	}
 
+	zcomp_strm_put(zram->comp, zstrm);
+	locked = false;
 	zs_unmap_object(meta->mem_pool, handle);
 
 	/*
@@ -475,10 +455,9 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 	atomic64_inc(&zram->stats.pages_stored);
 out:
 	if (locked)
-		mutex_unlock(&meta->buffer_lock);
+		zcomp_strm_put(zram->comp, zstrm);
 	if (is_partial_io(bvec))
 		kfree(uncmem);
-
 	if (ret)
 		atomic64_inc(&zram->stats.failed_writes);
 	return ret;
@@ -522,6 +501,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 		zs_free(meta->mem_pool, handle);
 	}
 
+	zcomp_destroy(zram->comp);
+	zram->comp = NULL;
+
 	zram_meta_free(zram->meta);
 	zram->meta = NULL;
 	/* Reset stats */
@@ -550,9 +532,18 @@ static ssize_t disksize_store(struct device *dev,
 		return -EBUSY;
 	}
 
+	zram->comp = zcomp_create(default_compressor);
+	if (!zram->comp) {
+		up_write(&zram->init_lock);
+		pr_info("Cannot initialise compressing backend\n");
+		return -EINVAL;
+	}
+
 	disksize = PAGE_ALIGN(disksize);
 	zram->meta = zram_meta_alloc(disksize);
 	if (!zram->meta) {
+		zcomp_destroy(zram->comp);
+		zram->comp = NULL;
 		up_write(&zram->init_lock);
 		return -ENOMEM;
 	}
@@ -790,6 +781,7 @@ static int create_device(struct zram *zram, int device_id)
 	}
 
 	zram->meta = NULL;
+	zram->comp = NULL;
 	return 0;
 
 out_free_disk:
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 1d5b1f5..45e04f7 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -16,9 +16,10 @@
 #define _ZRAM_DRV_H_
 
 #include <linux/spinlock.h>
-#include <linux/mutex.h>
 #include <linux/zsmalloc.h>
 
+#include "zcomp.h"
+
 /*
  * Some arbitrary value. This is just to catch
  * invalid value for num_devices module parameter.
@@ -81,17 +82,16 @@ struct zram_stats {
 
 struct zram_meta {
 	rwlock_t tb_lock;	/* protect table */
-	void *compress_workmem;
-	void *compress_buffer;
 	struct table *table;
 	struct zs_pool *mem_pool;
-	struct mutex buffer_lock; /* protect compress buffers */
 };
 
 struct zram {
 	struct zram_meta *meta;
 	struct request_queue *queue;
 	struct gendisk *disk;
+	struct zcomp *comp;
+
 	/* Prevent concurrent execution of device init, reset and R/W request */
 	struct rw_semaphore init_lock;
 	/*
-- 
1.9.0.rc3.260.g4cf525c



* [PATCHv5 3/4] zram: zcomp support multi compressing streams
  2014-02-13 17:43 [PATCHv5 0/4] add compressing abstraction and multi stream support Sergey Senozhatsky
  2014-02-13 17:43 ` [PATCHv5 1/4] zram: introduce compressing backend abstraction Sergey Senozhatsky
  2014-02-13 17:43 ` [PATCHv5 2/4] zram: use zcomp compressing backends Sergey Senozhatsky
@ 2014-02-13 17:43 ` Sergey Senozhatsky
  2014-02-13 17:43 ` [PATCHv5 4/4] zram: document max_comp_streams Sergey Senozhatsky
  2014-02-18  6:04 ` [PATCHv5 0/4] add compressing abstraction and multi stream support Minchan Kim
  4 siblings, 0 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2014-02-13 17:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Jerome Marchand, Nitin Gupta, linux-kernel, Sergey Senozhatsky

1) Extend zcomp_create() with a `num_strm' parameter. `num_strm' limits
the number of zcomp_strm structs on the compression backend's idle list
(max_comp_streams), thus making it possible to have up to
max_comp_streams parallel compress operations.

2) Introduce a zram device attribute max_comp_streams to show and
store the current device's max number of zcomp streams (num_strm).

Usage example:
        echo 4 > /sys/block/zram0/max_comp_streams

set max_comp_streams to 4

        cat /sys/block/zram0/max_comp_streams

show current max_comp_streams (default value is 1).

iozone -t 3 -R -r 16K -s 60M -I +Z
(write tests. ext4, lzo, each test performed on newly created
 zram0 device)

        test        max_comp_streams 1    max_comp_streams 3
-------------------------------------------------------------
"  Initial write "      549804.75      |       703499.52
"        Rewrite "      531435.43      |      1486276.59
" Mixed workload "     1085846.34      |      1237846.88
"   Random write "      549573.95      |      1375186.91
"         Pwrite "      515619.08      |       729404.86
"         Fwrite "     1443161.84      |      1469171.97

        test        max_comp_streams 1    max_comp_streams 3
-------------------------------------------------------------
"  Initial write "      571307.19      |       719940.72
"        Rewrite "      534797.36      |      1475686.41
" Mixed workload "     1588736.53      |      1662039.41
"   Random write "      563608.17      |      1373689.31
"         Pwrite "      586889.88      |       717255.91
"         Fwrite "     1479426.94      |      1482781.22

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zcomp.c    | 15 +++++++++------
 drivers/block/zram/zcomp.h    |  2 +-
 drivers/block/zram/zram_drv.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 drivers/block/zram/zram_drv.h |  2 +-
 4 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 3af25b6..a032f49 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -122,11 +122,12 @@ static struct zcomp_backend *find_backend(const char *compress)
  * if requested algorithm is not supported or in case
  * of init error
  */
-struct zcomp *zcomp_create(const char *compress)
+struct zcomp *zcomp_create(const char *compress, int num_strm)
 {
 	struct zcomp *comp;
 	struct zcomp_backend *backend;
 	struct zcomp_strm *zstrm;
+	int i;
 
 	backend = find_backend(compress);
 	if (!backend)
@@ -141,11 +142,13 @@ struct zcomp *zcomp_create(const char *compress)
 	INIT_LIST_HEAD(&comp->idle_strm);
 	init_waitqueue_head(&comp->strm_wait);
 
-	zstrm = zcomp_strm_alloc(comp);
-	if (!zstrm) {
-		zcomp_destroy(comp);
-		return NULL;
+	for (i = 0; i < num_strm; i++) {
+		zstrm = zcomp_strm_alloc(comp);
+		if (!zstrm) {
+			zcomp_destroy(comp);
+			return NULL;
+		}
+		list_add(&zstrm->list, &comp->idle_strm);
 	}
-	list_add(&zstrm->list, &comp->idle_strm);
 	return comp;
 }
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index 7e05d20..52899de 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -42,7 +42,7 @@ struct zcomp {
 	struct zcomp_backend *backend;
 };
 
-struct zcomp *zcomp_create(const char *comp);
+struct zcomp *zcomp_create(const char *comp, int num_strm);
 void zcomp_destroy(struct zcomp *comp);
 
 struct zcomp_strm *zcomp_strm_get(struct zcomp *comp);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 200c12a..93c5d29 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -107,6 +107,41 @@ static ssize_t mem_used_total_show(struct device *dev,
 	return sprintf(buf, "%llu\n", val);
 }
 
+static ssize_t max_comp_streams_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	long val;
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	val = zram->max_comp_streams;
+	up_read(&zram->init_lock);
+
+	return sprintf(buf, "%ld\n", val);
+}
+
+static ssize_t max_comp_streams_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	long num;
+	struct zram *zram = dev_to_zram(dev);
+
+	if (kstrtol(buf, 0, &num))
+		return -EINVAL;
+	if (num < 1)
+		return -EINVAL;
+	down_write(&zram->init_lock);
+	if (init_done(zram)) {
+		up_write(&zram->init_lock);
+		pr_info("Can't set max_comp_streams for initialized device\n");
+		return -EBUSY;
+	}
+	zram->max_comp_streams = num;
+	up_write(&zram->init_lock);
+
+	return len;
+}
+
 /* flag operations needs meta->tb_lock */
 static int zram_test_flag(struct zram_meta *meta, u32 index,
 			enum zram_pageflags flag)
@@ -503,6 +538,7 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 
 	zcomp_destroy(zram->comp);
 	zram->comp = NULL;
+	zram->max_comp_streams = 1;
 
 	zram_meta_free(zram->meta);
 	zram->meta = NULL;
@@ -532,7 +568,7 @@ static ssize_t disksize_store(struct device *dev,
 		return -EBUSY;
 	}
 
-	zram->comp = zcomp_create(default_compressor);
+	zram->comp = zcomp_create(default_compressor, zram->max_comp_streams);
 	if (!zram->comp) {
 		up_write(&zram->init_lock);
 		pr_info("Cannot initialise compressing backend\n");
@@ -695,6 +731,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
 static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
 static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
 static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
+static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
+		max_comp_streams_show, max_comp_streams_store);
 
 ZRAM_ATTR_RO(num_reads);
 ZRAM_ATTR_RO(num_writes);
@@ -719,6 +757,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_orig_data_size.attr,
 	&dev_attr_compr_data_size.attr,
 	&dev_attr_mem_used_total.attr,
+	&dev_attr_max_comp_streams.attr,
 	NULL,
 };
 
@@ -782,6 +821,7 @@ static int create_device(struct zram *zram, int device_id)
 
 	zram->meta = NULL;
 	zram->comp = NULL;
+	zram->max_comp_streams = 1;
 	return 0;
 
 out_free_disk:
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 45e04f7..8dccb22 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -99,7 +99,7 @@ struct zram {
 	 * we can store in a disk.
 	 */
 	u64 disksize;	/* bytes */
-
+	long max_comp_streams;
 	struct zram_stats stats;
 };
 #endif
-- 
1.9.0.rc3.260.g4cf525c



* [PATCHv5 4/4] zram: document max_comp_streams
  2014-02-13 17:43 [PATCHv5 0/4] add compressing abstraction and multi stream support Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2014-02-13 17:43 ` [PATCHv5 3/4] zram: zcomp support multi compressing streams Sergey Senozhatsky
@ 2014-02-13 17:43 ` Sergey Senozhatsky
  2014-02-18  6:04 ` [PATCHv5 0/4] add compressing abstraction and multi stream support Minchan Kim
  4 siblings, 0 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2014-02-13 17:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Jerome Marchand, Nitin Gupta, linux-kernel, Sergey Senozhatsky

Add max_comp_streams device attribute documentation.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/ABI/testing/sysfs-block-zram |  9 ++++++++-
 Documentation/blockdev/zram.txt            | 20 +++++++++++++++-----
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index 8aa0468..0da9ed6 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -50,7 +50,6 @@ Description:
 		The failed_reads file is read-only and specifies the number of
 		failed reads happened on this device.
 
-
 What:		/sys/block/zram<id>/failed_writes
 Date:		February 2014
 Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
@@ -58,6 +57,14 @@ Description:
 		The failed_writes file is read-only and specifies the number of
 		failed writes happened on this device.
 
+What:		/sys/block/zram<id>/max_comp_streams
+Date:		February 2014
+Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
+Description:
+		The max_comp_streams file is read-write and specifies the
+		number of backend's zcomp_strm compression streams (number of
+		concurrent compress operations).
+
 What:		/sys/block/zram<id>/notify_free
 Date:		August 2010
 Contact:	Nitin Gupta <ngupta@vflare.org>
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index b31ac5e..b5a722c 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -21,7 +21,17 @@ Following shows a typical sequence of steps for using zram.
 	This creates 4 devices: /dev/zram{0,1,2,3}
 	(num_devices parameter is optional. Default: 1)
 
-2) Set Disksize
+2) Set max number of compression streams
+	Compression backend may use up to max_comp_streams compression streams,
+	thus allowing up to max_comp_streams concurrent compression operations.
+	Examples:
+	#set max compression streams number to 3
+	echo 3 > /sys/block/zram0/max_comp_streams
+
+	#show max compression streams number
+	cat /sys/block/zram0/max_comp_streams
+
+3) Set Disksize
         Set disk size by writing the value to sysfs node 'disksize'.
         The value can be either in bytes or you can use mem suffixes.
         Examples:
@@ -38,14 +48,14 @@ There is little point creating a zram of greater than twice the size of memory
 since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
 size of the disk when not in use so a huge zram is wasteful.
 
-3) Activate:
+4) Activate:
 	mkswap /dev/zram0
 	swapon /dev/zram0
 
 	mkfs.ext4 /dev/zram1
 	mount /dev/zram1 /tmp
 
-4) Stats:
+5) Stats:
 	Per-device statistics are exported as various nodes under
 	/sys/block/zram<id>/
 		disksize
@@ -60,11 +70,11 @@ size of the disk when not in use so a huge zram is wasteful.
 		compr_data_size
 		mem_used_total
 
-5) Deactivate:
+6) Deactivate:
 	swapoff /dev/zram0
 	umount /dev/zram1
 
-6) Reset:
+7) Reset:
 	Write any positive value to 'reset' sysfs node
 	echo 1 > /sys/block/zram0/reset
 	echo 1 > /sys/block/zram1/reset
-- 
1.9.0.rc3.260.g4cf525c


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCHv5 1/4] zram: introduce compressing backend abstraction
  2014-02-13 17:43 ` [PATCHv5 1/4] zram: introduce compressing backend abstraction Sergey Senozhatsky
@ 2014-02-18  5:10   ` Minchan Kim
  2014-02-18  9:24     ` Sergey Senozhatsky
  0 siblings, 1 reply; 12+ messages in thread
From: Minchan Kim @ 2014-02-18  5:10 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Jerome Marchand, Nitin Gupta, linux-kernel

On Thu, Feb 13, 2014 at 08:43:19PM +0300, Sergey Senozhatsky wrote:
> ZRAM performs direct LZO compression algorithm calls, making it the one and
> only option. Introduce compressing backend abstraction zcomp in order to
> support multiple compression algorithms with the following set of operations:
>         .create
>         .destroy
>         .compress
>         .decompress
> 
> Schematically, zram write() usually contains the following steps:
> 0) preparation (decompression of partial IO, etc.)
> 1) lock buffer_lock mutex (protects meta compress buffers)
> 2) compress (using meta compress buffers)
> 3) alloc and map zs_pool object
> 4) copy compressed data (from meta compress buffers) to object allocated by 3)
> 5) free previous pool page, assign a new one
> 6) unlock buffer_lock mutex
> 
> As we can see, compressing buffers must remain untouched from 1) to 4),
> because, otherwise, concurrent write() can overwrite data. At the same
> time, zram_meta must be aware of a) specific compression algorithm
> memory requirements and b) necessary locking to protect compression
> buffers. Besides, zram holds buffer_lock almost through the whole write()
> function, making parallel compression impossible. To remove requirement
> a), a new struct zcomp_strm is introduced, which contains a compress/decompress
> `buffer' and a compression algorithm `private' part. Struct zcomp, in turn,
> implements zcomp_strm stream handling and locking by means of get() and put()
> semantics, removing requirement b) from zram meta. zcomp ->create() and
> ->destroy(), respectively, allocate and deallocate the algorithm-specific
> zcomp_strm `private' part.
> 
> Every zcomp has a list of idle zcomp_strm structs, a spinlock to protect the
> idle list, and a wait queue, making it possible to perform parallel compressions.
> Each time zram issues a zcomp_strm_get() call, the following set of operations
> is performed:
> - spin lock strm_lock
> - if idle list is not empty, remove zcomp_strm from idle list, spin
>   unlock and return zcomp stream pointer to caller
> - if idle list is empty, the current task adds itself to the wait queue. It
>   will be awoken by a zcomp_strm_put() caller.
> 
> zcomp_strm_put():
> - spin lock strm_lock
> - add zcomp stream to idle list
> - spin unlock, wake up sleeper
> 
> In other words, zcomp_strm_get() turns the caller into an exclusive user of a
> stream, and zcomp_strm_put() makes a particular zcomp stream available again.
> 
> Usage examples.
> 
> To initialize compressing backend:
> 	comp = zcomp_create(NAME) /* NAME e.g. "lzo" */
> 
> which initialises the compressing backend if the requested algorithm is supported.
> 
> Compress:
> 	zstrm = zcomp_strm_get(comp)
> 	zcomp_compress(comp, zstrm, src, &dst_len)
> 	[..] /* copy compressed data */
> 	zcomp_strm_put(comp, zstrm)
> 
> Decompress:
> 	zcomp_decompress(comp, src, src_len, dst);
> 
> Free the compressing backend and its zcomp streams:
> 	zcomp_destroy(comp)
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

Just minor nitpicks below.

> ---
>  drivers/block/zram/zcomp.c     | 151 +++++++++++++++++++++++++++++++++++++++++
>  drivers/block/zram/zcomp.h     |  56 +++++++++++++++
>  drivers/block/zram/zcomp_lzo.c |  48 +++++++++++++
>  3 files changed, 255 insertions(+)
>  create mode 100644 drivers/block/zram/zcomp.c
>  create mode 100644 drivers/block/zram/zcomp.h
>  create mode 100644 drivers/block/zram/zcomp_lzo.c
> 
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> new file mode 100644
> index 0000000..3af25b6
> --- /dev/null
> +++ b/drivers/block/zram/zcomp.c
> @@ -0,0 +1,151 @@
> +/*
> + * Copyright (C) 2014 Sergey Senozhatsky.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/slab.h>
> +#include <linux/wait.h>
> +#include <linux/sched.h>
> +
> +#include "zcomp.h"
> +
> +extern struct zcomp_backend zcomp_lzo;
> +
> +static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
> +{
> +	comp->backend->destroy(zstrm->private);
> +	free_pages((unsigned long)zstrm->buffer, 1);
> +	kfree(zstrm);
> +}
> +
> +/*
> + * allocate new zcomp_strm structure with ->private initialized by
> + * backend, return NULL on error
> + */
> +static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
> +{
> +	struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_KERNEL);
> +	if (!zstrm)
> +		return NULL;
> +
> +	INIT_LIST_HEAD(&zstrm->list);

No need to initialize.

> +	zstrm->private = comp->backend->create();

Please check zstrm->private for NULL and handle it here.
Otherwise, zcomp_strm_free() could pass NULL to the backend's destroy(),
so every backend would have to handle a NULL argument, which is not good
in my opinion.

> +	/*
> +	 * allocate 2 pages. 1 for compressed data, plus 1 extra for the
> +	 * case when compressed size is larger than the original one
> +	 */
> +	zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
> +	if (!zstrm->private || !zstrm->buffer) {
> +		zcomp_strm_free(comp, zstrm);
> +		zstrm = NULL;
> +	}
> +	return zstrm;
> +}
> +
> +/*
> + * get existing idle zcomp_strm or wait until other process release
> + * (zcomp_strm_put()) one for us
> + */
> +struct zcomp_strm *zcomp_strm_get(struct zcomp *comp)
> +{
> +	struct zcomp_strm *zstrm;
> +
> +	while (1) {
> +		spin_lock(&comp->strm_lock);
> +		if (list_empty(&comp->idle_strm)) {
> +			spin_unlock(&comp->strm_lock);
> +			wait_event(comp->strm_wait,
> +					!list_empty(&comp->idle_strm));
> +			continue;
> +		}
> +
> +		zstrm = list_entry(comp->idle_strm.next,
> +				struct zcomp_strm, list);
> +		list_del(&zstrm->list);
> +		spin_unlock(&comp->strm_lock);
> +		break;
> +	}
> +	return zstrm;
> +}
> +
> +/* add zcomp_strm back to idle list and wake up waiter (if any) */
> +void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm)
> +{
> +	spin_lock(&comp->strm_lock);
> +	list_add(&zstrm->list, &comp->idle_strm);
> +	spin_unlock(&comp->strm_lock);
> +
> +	wake_up(&comp->strm_wait);
> +}
> +
> +int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
> +		const unsigned char *src, size_t *dst_len)
> +{
> +	return comp->backend->compress(src, zstrm->buffer, dst_len,
> +			zstrm->private);
> +}
> +
> +int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
> +		size_t src_len, unsigned char *dst)
> +{
> +	return comp->backend->decompress(src, src_len, dst);
> +}
> +
> +void zcomp_destroy(struct zcomp *comp)
> +{
> +	struct zcomp_strm *zstrm;
> +	while (!list_empty(&comp->idle_strm)) {
> +		zstrm = list_entry(comp->idle_strm.next,
> +				struct zcomp_strm, list);
> +		list_del(&zstrm->list);
> +		zcomp_strm_free(comp, zstrm);
> +	}
> +	kfree(comp);
> +}

You made zcomp_destroy() but I couldn't find zcomp_create()?

> +
> +static struct zcomp_backend *find_backend(const char *compress)
> +{
> +	if (sysfs_streq(compress, "lzo"))

Why do you use sysfs_streq?
It's not a sysfs variable yet.

> +		return &zcomp_lzo;
> +	return NULL;
> +}
> +
> +/*
> + * search available compressors for requested algorithm.
> + * allocate new zcomp and initialize it. return NULL
> + * if requested algorithm is not supported or in case
> + * of init error
> + */
> +struct zcomp *zcomp_create(const char *compress)
> +{
> +	struct zcomp *comp;
> +	struct zcomp_backend *backend;
> +	struct zcomp_strm *zstrm;
> +
> +	backend = find_backend(compress);
> +	if (!backend)
> +		return NULL;
> +
> +	comp = kmalloc(sizeof(struct zcomp), GFP_KERNEL);
> +	if (!comp)
> +		return NULL;
> +
> +	comp->backend = backend;
> +	spin_lock_init(&comp->strm_lock);
> +	INIT_LIST_HEAD(&comp->idle_strm);
> +	init_waitqueue_head(&comp->strm_wait);
> +
> +	zstrm = zcomp_strm_alloc(comp);
> +	if (!zstrm) {
> +		zcomp_destroy(comp);
> +		return NULL;
> +	}
> +	list_add(&zstrm->list, &comp->idle_strm);
> +	return comp;
> +}
> diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
> new file mode 100644
> index 0000000..7e05d20
> --- /dev/null
> +++ b/drivers/block/zram/zcomp.h
> @@ -0,0 +1,56 @@
> +/*
> + * Copyright (C) 2014 Sergey Senozhatsky.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _ZCOMP_H_
> +#define _ZCOMP_H_
> +
> +#include <linux/spinlock.h>
> +
> +struct zcomp_strm {
> +	void *buffer;     /* compression/decompression buffer */
> +	void *private;      /* algorithm private memory */

         							It's not *memory*.
> +	struct list_head list;
> +};
> +
> +/* static compression backend */
> +struct zcomp_backend {
> +	int (*compress)(const unsigned char *src, unsigned char *dst,
> +			size_t *dst_len, void *private);
> +
> +	int (*decompress)(const unsigned char *src, size_t src_len,
> +			unsigned char *dst);
> +
> +	void * (*create)(void);
> +	void (*destroy)(void *private);
> +
> +	const char *name;
> +};
> +
> +/* dynamic per-device compression frontend */
> +struct zcomp {
> +	/* protect strm list */
> +	spinlock_t strm_lock;
> +	/* list of available strms */
> +	struct list_head idle_strm;
> +	wait_queue_head_t strm_wait;
> +	struct zcomp_backend *backend;
> +};
> +
> +struct zcomp *zcomp_create(const char *comp);
> +void zcomp_destroy(struct zcomp *comp);
> +
> +struct zcomp_strm *zcomp_strm_get(struct zcomp *comp);
> +void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm);
> +
> +int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
> +		const unsigned char *src, size_t *dst_len);
> +
> +int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
> +		size_t src_len, unsigned char *dst);
> +#endif /* _ZCOMP_H_ */
> diff --git a/drivers/block/zram/zcomp_lzo.c b/drivers/block/zram/zcomp_lzo.c
> new file mode 100644
> index 0000000..39106a7
> --- /dev/null
> +++ b/drivers/block/zram/zcomp_lzo.c
> @@ -0,0 +1,48 @@
> +/*
> + * Copyright (C) 2014 Sergey Senozhatsky.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +#include <linux/lzo.h>
> +
> +#include "zcomp.h"
> +
> +static void * lzo_create(void)
                ^
          unnecessary white space

> +{
> +	return kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
> +}
> +
> +static void lzo_destroy(void *private)
> +{
> +	kfree(private);
> +}
> +
> +static int lzo_compress(const unsigned char *src, unsigned char *dst,
> +		size_t *dst_len, void *private)
> +{
> +	int ret = lzo1x_1_compress(src, PAGE_SIZE, dst, dst_len, private);
> +	return ret == LZO_E_OK ? 0 : ret;
> +}
> +
> +static int lzo_decompress(const unsigned char *src, size_t src_len,
> +		unsigned char *dst)
> +{
> +	size_t dst_len = PAGE_SIZE;
> +	int ret = lzo1x_decompress_safe(src, src_len, dst, &dst_len);
> +	return ret == LZO_E_OK ? 0 : ret;
> +}
> +
> +extern struct zcomp_backend zcomp_lzo;
> +struct zcomp_backend zcomp_lzo = {
> +	.compress = lzo_compress,
> +	.decompress = lzo_decompress,
> +	.create = lzo_create,
> +	.destroy = lzo_destroy,
> +	.name = "lzo",
> +};
> -- 
> 1.9.0.rc3.260.g4cf525c
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv5 0/4] add compressing abstraction and multi stream support
  2014-02-13 17:43 [PATCHv5 0/4] add compressing abstraction and multi stream support Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2014-02-13 17:43 ` [PATCHv5 4/4] zram: document max_comp_streams Sergey Senozhatsky
@ 2014-02-18  6:04 ` Minchan Kim
  2014-02-18  9:20   ` Sergey Senozhatsky
  2014-02-18 19:38   ` Sergey Senozhatsky
  4 siblings, 2 replies; 12+ messages in thread
From: Minchan Kim @ 2014-02-18  6:04 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Jerome Marchand, Nitin Gupta, linux-kernel

Hello Sergey,

On Thu, Feb 13, 2014 at 08:43:18PM +0300, Sergey Senozhatsky wrote:
> This patchset introduces zcomp compression backend abstraction
> adding ability to support compression algorithms other than LZO;
> support for multi compression streams, making parallel compressions
> possible.
> 
> Diff from v4 (reviewed by Minchan Kim):
> -- renamed zcomp buffer_lock; removed src len and dst len from
>    compress() and decompress(); not using term `buffer' and
>    `workmem' in code and documentation; define compress() and
>    decompress() functions for LZO backend; not using goto's;
>    do not put idle zcomp_strm to idle list tail.
> 
> Diff from v3:
> -- renamed compression backend and working memory structs as requested
>    by Minchan Kim; fixed several issues noted by Minchan Kim.
> 
> Sergey Senozhatsky (4):
>   zram: introduce compressing backend abstraction
>   zram: use zcomp compressing backends
>   zram: zcomp support multi compressing streams
>   zram: document max_comp_streams
> 
>  Documentation/ABI/testing/sysfs-block-zram |   9 +-
>  Documentation/blockdev/zram.txt            |  20 +++-
>  drivers/block/zram/Makefile                |   2 +-
>  drivers/block/zram/zcomp.c                 | 155 +++++++++++++++++++++++++++++
>  drivers/block/zram/zcomp.h                 |  56 +++++++++++
>  drivers/block/zram/zcomp_lzo.c             |  48 +++++++++
>  drivers/block/zram/zram_drv.c              | 102 ++++++++++++-------
>  drivers/block/zram/zram_drv.h              |  10 +-
>  8 files changed, 355 insertions(+), 47 deletions(-)
>  create mode 100644 drivers/block/zram/zcomp.c
>  create mode 100644 drivers/block/zram/zcomp.h
>  create mode 100644 drivers/block/zram/zcomp_lzo.c

I reviewed the patchset and the implementation looked good to me, but
I found a severe regression in my test, which used one stream.

Base is current upstream and zcomp-multi is yours.
Lastly, zcomp-single is where I tweaked it to see the scheduling overhead
caused by sleep/wakeup events.
   
   Base                      zcomp-multi                      zcomp-single

==Initial  write             ==Initial  write              ==Initial  write
records:   5                 records:   5                  records:   5
avg:       1642424.35        avg:       699610.40          avg:       1655583.71
std:       39890.95(2.43%)   std:       232014.19(33.16%)  std:       52293.96(3.16%)
max:       1690170.94        max:       1163473.45         max:       1697164.75
min:       1568669.52        min:       573429.88          min:       1553410.23
==Rewrite  ==Rewrite         ==Rewrite
records:   5                 records:   5                  records:   5
avg:       1611775.39        avg:       501406.64          avg:       1684419.11
std:       17144.58(1.06%)   std:       15354.41(3.06%)    std:       18367.42(1.09%)
max:       1641800.95        max:       531356.78          max:       1706445.84
min:       1593515.27        min:       488817.78          min:       1655335.73
==Read     ==Read            ==Read
records:   5                 records:   5                  records:   5
avg:       2418916.31        avg:       2385573.68         avg:       2316542.26
std:       55324.98(2.29%)   std:       64475.37(2.70%)    std:       50621.10(2.19%)
max:       2517417.72        max:       2444138.89         max:       2383321.09
min:       2364690.92        min:       2263763.77         min:       2233198.12
==Re-read  ==Re-read         ==Re-read
records:   5                 records:   5                  records:   5
avg:       2351292.92        avg:       2333131.90         avg:       2336306.89
std:       73358.82(3.12%)   std:       34726.09(1.49%)    std:       74001.47(3.17%)
max:       2444053.92        max:       2396357.19         max:       2432697.06
min:       2255477.41        min:       2299239.74         min:       2231400.94
==Reverse  Read              ==Reverse  Read               ==Reverse  Read
records:   5                 records:   5                  records:   5
avg:       2383917.40        avg:       2267700.38         avg:       2328689.00
std:       51389.78(2.16%)   std:       57018.67(2.51%)    std:       44955.21(1.93%)
max:       2435560.11        max:       2346465.31         max:       2375047.77
min:       2290726.91        min:       2188208.84         min:       2251465.20
==Stride   read              ==Stride   read               ==Stride   read
records:   5                 records:   5                  records:   5
avg:       2361656.52        avg:       2292182.65         avg:       2302384.26
std:       64730.61(2.74%)   std:       34649.38(1.51%)    std:       51145.23(2.22%)
max:       2471425.77        max:       2341282.19         max:       2375936.66
min:       2279913.31        min:       2242688.59         min:       2224134.48
==Random   read              ==Random   read               ==Random   read
records:   5                 records:   5                  records:   5
avg:       2369086.95        avg:       2315431.27         avg:       2352314.50
std:       19870.36(0.84%)   std:       43020.16(1.86%)    std:       51677.02(2.20%)
max:       2391210.41        max:       2390286.89         max:       2415472.89
min:       2337092.91        min:       2266433.95         min:       2264088.05
==Mixed    workload          ==Mixed    workload           ==Mixed    workload
records:   5                 records:   5                  records:   5
avg:       1822253.03        avg:       2006455.50         avg:       1918270.29
std:       95037.10(5.22%)   std:       67792.78(3.38%)    std:       45933.99(2.39%)
max:       1934495.87        max:       2079128.36         max:       1995693.03
min:       1694223.48        min:       1914532.49         min:       1856410.41
==Random   write             ==Random   write              ==Random   write
records:   5                 records:   5                  records:   5
avg:       1626318.29        avg:       497250.78          avg:       1695582.06
std:       38550.23(2.37%)   std:       1405.42(0.28%)     std:       9211.98(0.54%)
max:       1665839.62        max:       498585.88          max:       1703808.22
min:       1562141.21        min:       494526.45          min:       1677664.94
==Pwrite   ==Pwrite          ==Pwrite
records:   5                 records:   5                  records:   5
avg:       1654641.25        avg:       581709.22          avg:       1641452.34
std:       47202.59(2.85%)   std:       9670.46(1.66%)     std:       38963.62(2.37%)
max:       1740682.36        max:       591300.09          max:       1687387.69
min:       1611436.34        min:       564324.38          min:       1570496.11
==Pread    ==Pread           ==Pread
records:   5                 records:   5                  records:   5
avg:       2308891.23        avg:       2381945.97         avg:       2347688.68
std:       192436.37(8.33%)  std:       22957.85(0.96%)    std:       31682.43(1.35%)
max:       2548482.56        max:       2415788.27         max:       2381937.58
min:       1960229.11        min:       2349188.16         min:       2292507.52


zcomp-single patch

From 99abaf9c185953c35b0e3cd12ebb90b57f3bbd50 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Tue, 18 Feb 2014 11:41:11 +0900
Subject: [PATCH] zram: add single lock

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zcomp.c | 24 +++++-------------------
 drivers/block/zram/zcomp.h |  2 +-
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index a032f49a52e2..03f4241083ac 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -56,32 +56,18 @@ struct zcomp_strm *zcomp_strm_get(struct zcomp *comp)
 {
 	struct zcomp_strm *zstrm;
 
-	while (1) {
-		spin_lock(&comp->strm_lock);
-		if (list_empty(&comp->idle_strm)) {
-			spin_unlock(&comp->strm_lock);
-			wait_event(comp->strm_wait,
-					!list_empty(&comp->idle_strm));
-			continue;
-		}
-
-		zstrm = list_entry(comp->idle_strm.next,
+	mutex_lock(&comp->strm_lock);
+	zstrm = list_entry(comp->idle_strm.next,
 				struct zcomp_strm, list);
-		list_del(&zstrm->list);
-		spin_unlock(&comp->strm_lock);
-		break;
-	}
+	list_del(&zstrm->list);
 	return zstrm;
 }
 
 /* add zcomp_strm back to idle list and wake up waiter (if any) */
 void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm)
 {
-	spin_lock(&comp->strm_lock);
 	list_add(&zstrm->list, &comp->idle_strm);
-	spin_unlock(&comp->strm_lock);
-
-	wake_up(&comp->strm_wait);
+	mutex_unlock(&comp->strm_lock);
 }
 
 int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
@@ -138,7 +124,7 @@ struct zcomp *zcomp_create(const char *compress, int num_strm)
 		return NULL;
 
 	comp->backend = backend;
-	spin_lock_init(&comp->strm_lock);
+	mutex_init(&comp->strm_lock);
 	INIT_LIST_HEAD(&comp->idle_strm);
 	init_waitqueue_head(&comp->strm_wait);
 
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index 52899de40ca5..bfcec1cf7d2e 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -35,7 +35,7 @@ struct zcomp_backend {
 /* dynamic per-device compression frontend */
 struct zcomp {
 	/* protect strm list */
-	spinlock_t strm_lock;
+	struct mutex strm_lock;
 	/* list of available strms */
 	struct list_head idle_strm;
 	wait_queue_head_t strm_wait;
-- 
1.8.5.3

As you can see, your patch badly regressed zram when it used one stream
buffer. The reason, I guessed, is the overhead of scheduling by sleep/wakeup
when others couldn't find an idle stream, so I ran an experiment with the
simple patch below to restore the old behavior, and it works well again.
The reason the old behavior was really good is that it uses a mutex with
spin_on_owner, so it could avoid frequent sleep/wakeup.

A solution I can think of is to grow the number of buffers dynamically, up
to max_buffers (e.g. 2 * number of CPUs), so we could easily grab an idle
stream for each CPU, while releasing buffers via a shrinker when memory
pressure is severe. Of course, we should keep at least one buffer for it
to work.

Another solution I can imagine is to define CONFIG_ZRAM_MULTI_STREAM and
CONFIG_ZRAM_SINGLE_STREAM (i.e. the default), so we wouldn't hurt the good
performance of existing cases, while new users who want parallel writes
could use ZRAM_MULTI, which should default to several streams rather than
one. And we could enhance it further with dynamic control, as I mentioned.

Thoughts?


> 
> -- 
> 1.9.0.rc3.260.g4cf525c
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCHv5 0/4] add compressing abstraction and multi stream support
  2014-02-18  6:04 ` [PATCHv5 0/4] add compressing abstraction and multi stream support Minchan Kim
@ 2014-02-18  9:20   ` Sergey Senozhatsky
  2014-02-19  6:14     ` Minchan Kim
  2014-02-18 19:38   ` Sergey Senozhatsky
  1 sibling, 1 reply; 12+ messages in thread
From: Sergey Senozhatsky @ 2014-02-18  9:20 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Jerome Marchand, Nitin Gupta, linux-kernel

Hello,

On (02/18/14 15:04), Minchan Kim wrote:
> Hello Sergey,
> 
> On Thu, Feb 13, 2014 at 08:43:18PM +0300, Sergey Senozhatsky wrote:
> > This patchset introduces zcomp compression backend abstraction
> > adding ability to support compression algorithms other than LZO;
> > support for multi compression streams, making parallel compressions
> > possible.
> > 
> > Diff from v4 (reviewed by Minchan Kim):
> > -- renamed zcomp buffer_lock; removed src len and dst len from
> >    compress() and decompress(); not using term `buffer' and
> >    `workmem' in code and documentation; define compress() and
> >    decompress() functions for LZO backend; not using goto's;
> >    do not put idle zcomp_strm to idle list tail.
> > 
> > Diff from v3:
> > -- renamed compression backend and working memory structs as requested
> >    by Minchan Kim; fixed several issues noted by Minchan Kim.
> > 
> > Sergey Senozhatsky (4):
> >   zram: introduce compressing backend abstraction
> >   zram: use zcomp compressing backends
> >   zram: zcomp support multi compressing streams
> >   zram: document max_comp_streams
> > 
> >  Documentation/ABI/testing/sysfs-block-zram |   9 +-
> >  Documentation/blockdev/zram.txt            |  20 +++-
> >  drivers/block/zram/Makefile                |   2 +-
> >  drivers/block/zram/zcomp.c                 | 155 +++++++++++++++++++++++++++++
> >  drivers/block/zram/zcomp.h                 |  56 +++++++++++
> >  drivers/block/zram/zcomp_lzo.c             |  48 +++++++++
> >  drivers/block/zram/zram_drv.c              | 102 ++++++++++++-------
> >  drivers/block/zram/zram_drv.h              |  10 +-
> >  8 files changed, 355 insertions(+), 47 deletions(-)
> >  create mode 100644 drivers/block/zram/zcomp.c
> >  create mode 100644 drivers/block/zram/zcomp.h
> >  create mode 100644 drivers/block/zram/zcomp_lzo.c
> 
> I reviewed patchset and implement looked good to me but
> I found severe regression in my test which was one stream.
> 
> Base is current upstream and zcomp-multi is yours.
> At last, zcom-single is where I tweaked to see schedule overhead
> cause by event.
>    
>    Base                      zcomp-multi                      zcomp-single
> 
> ==Initial  write             ==Initial  write              ==Initial  write
> records:   5                 records:   5                  records:   5
> avg:       1642424.35        avg:       699610.40          avg:       1655583.71

wow... spinlock vs mutex 3 times slower? that's ugly.
can you please provide iozone arg list?

> std:       39890.95(2.43%)   std:       232014.19(33.16%)  std:       52293.96(3.16%)
> max:       1690170.94        max:       1163473.45         max:       1697164.75
> min:       1568669.52        min:       573429.88          min:       1553410.23
> ==Rewrite  ==Rewrite         ==Rewrite
> records:   5                 records:   5                  records:   5
> avg:       1611775.39        avg:       501406.64          avg:       1684419.11
> std:       17144.58(1.06%)   std:       15354.41(3.06%)    std:       18367.42(1.09%)
> max:       1641800.95        max:       531356.78          max:       1706445.84
> min:       1593515.27        min:       488817.78          min:       1655335.73
> ==Read     ==Read            ==Read
> records:   5                 records:   5                  records:   5
> avg:       2418916.31        avg:       2385573.68         avg:       2316542.26
> std:       55324.98(2.29%)   std:       64475.37(2.70%)    std:       50621.10(2.19%)
> max:       2517417.72        max:       2444138.89         max:       2383321.09
> min:       2364690.92        min:       2263763.77         min:       2233198.12
> ==Re-read  ==Re-read         ==Re-read
> records:   5                 records:   5                  records:   5
> avg:       2351292.92        avg:       2333131.90         avg:       2336306.89
> std:       73358.82(3.12%)   std:       34726.09(1.49%)    std:       74001.47(3.17%)
> max:       2444053.92        max:       2396357.19         max:       2432697.06
> min:       2255477.41        min:       2299239.74         min:       2231400.94
> ==Reverse  Read              ==Reverse  Read               ==Reverse  Read
> records:   5                 records:   5                  records:   5
> avg:       2383917.40        avg:       2267700.38         avg:       2328689.00
> std:       51389.78(2.16%)   std:       57018.67(2.51%)    std:       44955.21(1.93%)
> max:       2435560.11        max:       2346465.31         max:       2375047.77
> min:       2290726.91        min:       2188208.84         min:       2251465.20
> ==Stride   read              ==Stride   read               ==Stride   read
> records:   5                 records:   5                  records:   5
> avg:       2361656.52        avg:       2292182.65         avg:       2302384.26
> std:       64730.61(2.74%)   std:       34649.38(1.51%)    std:       51145.23(2.22%)
> max:       2471425.77        max:       2341282.19         max:       2375936.66
> min:       2279913.31        min:       2242688.59         min:       2224134.48
> ==Random   read              ==Random   read               ==Random   read
> records:   5                 records:   5                  records:   5
> avg:       2369086.95        avg:       2315431.27         avg:       2352314.50
> std:       19870.36(0.84%)   std:       43020.16(1.86%)    std:       51677.02(2.20%)
> max:       2391210.41        max:       2390286.89         max:       2415472.89
> min:       2337092.91        min:       2266433.95         min:       2264088.05
> ==Mixed    workload          ==Mixed    workload           ==Mixed    workload
> records:   5                 records:   5                  records:   5
> avg:       1822253.03        avg:       2006455.50         avg:       1918270.29
> std:       95037.10(5.22%)   std:       67792.78(3.38%)    std:       45933.99(2.39%)
> max:       1934495.87        max:       2079128.36         max:       1995693.03
> min:       1694223.48        min:       1914532.49         min:       1856410.41
> ==Random   write             ==Random   write              ==Random   write
> records:   5                 records:   5                  records:   5
> avg:       1626318.29        avg:       497250.78          avg:       1695582.06
> std:       38550.23(2.37%)   std:       1405.42(0.28%)     std:       9211.98(0.54%)
> max:       1665839.62        max:       498585.88          max:       1703808.22
> min:       1562141.21        min:       494526.45          min:       1677664.94
> ==Pwrite   ==Pwrite          ==Pwrite
> records:   5                 records:   5                  records:   5
> avg:       1654641.25        avg:       581709.22          avg:       1641452.34
> std:       47202.59(2.85%)   std:       9670.46(1.66%)     std:       38963.62(2.37%)
> max:       1740682.36        max:       591300.09          max:       1687387.69
> min:       1611436.34        min:       564324.38          min:       1570496.11
> ==Pread    ==Pread           ==Pread
> records:   5                 records:   5                  records:   5
> avg:       2308891.23        avg:       2381945.97         avg:       2347688.68
> std:       192436.37(8.33%)  std:       22957.85(0.96%)    std:       31682.43(1.35%)
> max:       2548482.56        max:       2415788.27         max:       2381937.58
> min:       1960229.11        min:       2349188.16         min:       2292507.52
> 
> 
> zcomp-single patch
> 
> From 99abaf9c185953c35b0e3cd12ebb90b57f3bbd50 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Tue, 18 Feb 2014 11:41:11 +0900
> Subject: [PATCH] zram: add single lock
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  drivers/block/zram/zcomp.c | 24 +++++-------------------
>  drivers/block/zram/zcomp.h |  2 +-
>  2 files changed, 6 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index a032f49a52e2..03f4241083ac 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -56,32 +56,18 @@ struct zcomp_strm *zcomp_strm_get(struct zcomp *comp)
>  {
>  	struct zcomp_strm *zstrm;
>  
> -	while (1) {
> -		spin_lock(&comp->strm_lock);
> -		if (list_empty(&comp->idle_strm)) {
> -			spin_unlock(&comp->strm_lock);
> -			wait_event(comp->strm_wait,
> -					!list_empty(&comp->idle_strm));
> -			continue;
> -		}
> -
> -		zstrm = list_entry(comp->idle_strm.next,
> +	mutex_lock(&comp->strm_lock);
> +	zstrm = list_entry(comp->idle_strm.next,
>  				struct zcomp_strm, list);
> -		list_del(&zstrm->list);
> -		spin_unlock(&comp->strm_lock);
> -		break;
> -	}
> +	list_del(&zstrm->list);
>  	return zstrm;
>  }
>  
>  /* add zcomp_strm back to idle list and wake up waiter (if any) */
>  void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm)
>  {
> -	spin_lock(&comp->strm_lock);
>  	list_add(&zstrm->list, &comp->idle_strm);
> -	spin_unlock(&comp->strm_lock);
> -
> -	wake_up(&comp->strm_wait);
> +	mutex_unlock(&comp->strm_lock);
>  }

>  
>  int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
> @@ -138,7 +124,7 @@ struct zcomp *zcomp_create(const char *compress, int num_strm)
>  		return NULL;
>  
>  	comp->backend = backend;
> -	spin_lock_init(&comp->strm_lock);
> +	mutex_init(&comp->strm_lock);
>  	INIT_LIST_HEAD(&comp->idle_strm);
>  	init_waitqueue_head(&comp->strm_wait);
>  
> diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
> index 52899de40ca5..bfcec1cf7d2e 100644
> --- a/drivers/block/zram/zcomp.h
> +++ b/drivers/block/zram/zcomp.h
> @@ -35,7 +35,7 @@ struct zcomp_backend {
>  /* dynamic per-device compression frontend */
>  struct zcomp {
>  	/* protect strm list */
> -	spinlock_t strm_lock;
> +	struct mutex strm_lock;
>  	/* list of available strms */
>  	struct list_head idle_strm;
>  	wait_queue_head_t strm_wait;
> -- 
> 1.8.5.3
> 
> As you can see, your patch regressed zram badly when it used one
> stream buffer. My guess is that the overhead comes from the scheduling
> (sleep/wakeup) that happens when others can't find an idle stream, so
> I experimented with the simple patch above to restore the old behavior,
> and it works well again.

yes, could be.

> The reason the old behavior was really good is that it uses a mutex
> with spin_on_owner, so it can avoid frequent sleep/wakeup.
> 

zcomp_strm_get()
	-> mutex_lock(&comp->strm_lock);

zcomp_strm_put()
	-> mutex_unlock(&comp->strm_lock);

semantics were supposed to be the default single zstrm policy, while the
spinlock-based one was meant to be the multi zstrm policy. though, I agree,
the zstrm policy concept complicates the implementation.


> A solution I could think of is that we could grow the number of
> buffers dynamically up to max_buffers (e.g. 2 * number of CPUs) so we
> could easily grab an idle stream for each CPU, while releasing
> buffers via a shrinker when memory pressure is severe.
> Of course, we should keep at least one buffer so it keeps working.
> 

yes, we had this behaviour in v1. I'm afraid 2 * num_online_cpus() zstrms
can be tough on systems with a large number of CPUs, taking into account
workmem requirements, e.g. LZO1X_1_MEM_COMPRESS is 8192 * sizeof(unsigned short).

> Another solution I can imagine is to define CONFIG_ZRAM_MULTI_STREAM
> and CONFIG_ZRAM_SINGLE_STREAM (ie, the default), so we wouldn't break
> the good performance of the old cases, while new users who want
> parallel writes could use ZRAM_MULTI, which should default to several
> streams rather than one. And we could enhance it further with the
> dynamic control I mentioned.
>
> Thoughts?
> 

well, (my opinion) I'd rather avoid adding .config options.

we can make zram support only multi zstrm. during initialisation one (default)
zstrm is allocated, which is guaranteed to be there. now, if a write didn't get
an idle zstrm AND the current number of allocated zstrms is less than
N * online CPUs (N could be 1 or 2), it attempts to allocate a new one;
otherwise, it waits. if it failed to allocate a new one (low memory case) it
waits, and eventually will receive an idle zstrm (either one of the previously
allocated zstrms or, in the worst case, the default one, so a write is
guaranteed to complete).

on put(), we add the zstrm to the idle list, or free it if the number of online CPUs has changed.

thoughts?

	-ss
> 
> > 
> > -- 
> > 1.9.0.rc3.260.g4cf525c
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> -- 
> Kind regards,
> Minchan Kim
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv5 1/4] zram: introduce compressing backend abstraction
  2014-02-18  5:10   ` Minchan Kim
@ 2014-02-18  9:24     ` Sergey Senozhatsky
  0 siblings, 0 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2014-02-18  9:24 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Jerome Marchand, Nitin Gupta, linux-kernel

Hello,

On (02/18/14 05:10), Minchan Kim wrote:
> On Thu, Feb 13, 2014 at 08:43:19PM +0300, Sergey Senozhatsky wrote:
> > ZRAM performs direct LZO compression algorithm calls, making it the one and
> > only option. Introduce compressing backend abstraction zcomp in order to
> > support multiple compression algorithms with the following set of operations:
> >         .create
> >         .destroy
> >         .compress
> >         .decompress
> > 
> > Schematically zram write() usually contains the following steps:
> > 0) preparation (decompression of partial IO, etc.)
> > 1) lock buffer_lock mutex (protects meta compress buffers)
> > 2) compress (using meta compress buffers)
> > 3) alloc and map zs_pool object
> > 4) copy compressed data (from meta compress buffers) to object allocated by 3)
> > 5) free previous pool page, assign a new one
> > 6) unlock buffer_lock mutex
> > 
> > As we can see, compressing buffers must remain untouched from 1) to 4),
> > because, otherwise, concurrent write() can overwrite data. At the same
> > time, zram_meta must be aware of a) specific compression algorithm
> > memory requirements and b) necessary locking to protect compression
> > buffers. Besides, zram holds buffer_lock almost through the whole write()
> > function, making parallel compression impossible. To remove requirement
> > a), a new struct zcomp_strm is introduced, which contains a compress/decompress
> > `buffer' and the compression algorithm's `private' part, while struct zcomp
> > implements zcomp_strm stream handling and locking by means of get() and put()
> > semantics, removing requirement b) from zram meta. zcomp ->create() and
> > ->destroy(), respectively, allocate and deallocate the algorithm-specific
> > zcomp_strm `private' part.
> > 
> > Every zcomp has a list of idle zcomp_strm structs, a spinlock to protect the
> > idle list, and a wait queue, making it possible to perform parallel
> > compressions. Each time zram issues a zcomp_strm_get() call, the following
> > set of operations is performed:
> > - spin lock strm_lock
> > - if idle list is not empty, remove zcomp_strm from idle list, spin
> >   unlock and return zcomp stream pointer to caller
> > - if idle list is empty, current adds itself to wait queue. it will be
> >   awakened by a zcomp_strm_put() caller.
> > 
> > zcomp_strm_put():
> > - spin lock strm_lock
> > - add zcomp stream to idle list
> > - spin unlock, wake up sleeper
> > 
> > In other words, zcomp_strm_get() turns caller into exclusive user of a stream
> > and zcomp_strm_put() makes a particular zcomp stream available.
> > 
> > Usage examples.
> > 
> > To initialize compressing backend:
> > 	comp = zcomp_create(NAME) /* NAME e.g. "lzo" */
> > 
> > which initialises the compressing backend if the requested algorithm is supported.
> > 
> > Compress:
> > 	zstrm = zcomp_strm_get(comp)
> > 	zcomp_compress(comp, zstrm, src, &dst_len)
> > 	[..] /* copy compressed data */
> > 	zcomp_strm_put(comp, zstrm)
> > 
> > Decompress:
> > 	zcomp_decompress(comp, src, src_len, dst);
> > 
> > Free the compressing backend and its zcomp stream:
> > 	zcomp_destroy(comp)
> > 
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> 
> Just minor nitpicks below.

thanks, will address.

> > ---
> >  drivers/block/zram/zcomp.c     | 151 +++++++++++++++++++++++++++++++++++++++++
> >  drivers/block/zram/zcomp.h     |  56 +++++++++++++++
> >  drivers/block/zram/zcomp_lzo.c |  48 +++++++++++++
> >  3 files changed, 255 insertions(+)
> >  create mode 100644 drivers/block/zram/zcomp.c
> >  create mode 100644 drivers/block/zram/zcomp.h
> >  create mode 100644 drivers/block/zram/zcomp_lzo.c
> > 
> > diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> > new file mode 100644
> > index 0000000..3af25b6
> > --- /dev/null
> > +++ b/drivers/block/zram/zcomp.c
> > @@ -0,0 +1,151 @@
> > +/*
> > + * Copyright (C) 2014 Sergey Senozhatsky.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + */
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/string.h>
> > +#include <linux/slab.h>
> > +#include <linux/wait.h>
> > +#include <linux/sched.h>
> > +
> > +#include "zcomp.h"
> > +
> > +extern struct zcomp_backend zcomp_lzo;
> > +
> > +static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
> > +{
> > +	comp->backend->destroy(zstrm->private);
> > +	free_pages((unsigned long)zstrm->buffer, 1);
> > +	kfree(zstrm);
> > +}
> > +
> > +/*
> > + * allocate new zcomp_strm structure with ->private initialized by
> > + * backend, return NULL on error
> > + */
> > +static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
> > +{
> > +	struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_KERNEL);
> > +	if (!zstrm)
> > +		return NULL;
> > +
> > +	INIT_LIST_HEAD(&zstrm->list);
> 
> No need to initialize.
> 
> > +	zstrm->private = comp->backend->create();
> 
> Please check zstrm->private for NULL and handle it.
> Otherwise, zcomp_strm_free() could pass NULL to the backend's destroy(),
> so every backend would have to handle a NULL argument, which is not
> good in my opinion.
> 

yes, at the moment it will not cause any problems (passing NULL to kfree()
is fine), but in the general case it makes sense.

> > +	/*
> > +	 * allocate 2 pages. 1 for compressed data, plus 1 extra for the
> > +	 * case when compressed size is larger than the original one
> > +	 */
> > +	zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
> > +	if (!zstrm->private || !zstrm->buffer) {
> > +		zcomp_strm_free(comp, zstrm);
> > +		zstrm = NULL;
> > +	}
> > +	return zstrm;
> > +}
> > +
> > +/*
> > + * get existing idle zcomp_strm or wait until other process release
> > + * (zcomp_strm_put()) one for us
> > + */
> > +struct zcomp_strm *zcomp_strm_get(struct zcomp *comp)
> > +{
> > +	struct zcomp_strm *zstrm;
> > +
> > +	while (1) {
> > +		spin_lock(&comp->strm_lock);
> > +		if (list_empty(&comp->idle_strm)) {
> > +			spin_unlock(&comp->strm_lock);
> > +			wait_event(comp->strm_wait,
> > +					!list_empty(&comp->idle_strm));
> > +			continue;
> > +		}
> > +
> > +		zstrm = list_entry(comp->idle_strm.next,
> > +				struct zcomp_strm, list);
> > +		list_del(&zstrm->list);
> > +		spin_unlock(&comp->strm_lock);
> > +		break;
> > +	}
> > +	return zstrm;
> > +}
> > +
> > +/* add zcomp_strm back to idle list and wake up waiter (if any) */
> > +void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm)
> > +{
> > +	spin_lock(&comp->strm_lock);
> > +	list_add(&zstrm->list, &comp->idle_strm);
> > +	spin_unlock(&comp->strm_lock);
> > +
> > +	wake_up(&comp->strm_wait);
> > +}
> > +
> > +int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
> > +		const unsigned char *src, size_t *dst_len)
> > +{
> > +	return comp->backend->compress(src, zstrm->buffer, dst_len,
> > +			zstrm->private);
> > +}
> > +
> > +int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
> > +		size_t src_len, unsigned char *dst)
> > +{
> > +	return comp->backend->decompress(src, src_len, dst);
> > +}
> > +
> > +void zcomp_destroy(struct zcomp *comp)
> > +{
> > +	struct zcomp_strm *zstrm;
> > +	while (!list_empty(&comp->idle_strm)) {
> > +		zstrm = list_entry(comp->idle_strm.next,
> > +				struct zcomp_strm, list);
> > +		list_del(&zstrm->list);
> > +		zcomp_strm_free(comp, zstrm);
> > +	}
> > +	kfree(comp);
> > +}
> 
> You made zcom_destroy but I couldn't find zcomp_create?
> 

it's several lines below :)

> > +
> > +static struct zcomp_backend *find_backend(const char *compress)
> > +{
> > +	if (sysfs_streq(compress, "lzo"))
> 
> Why do you use sysfs_streq?
> It's not sysfs variable yet.

yep.

> > +		return &zcomp_lzo;
> > +	return NULL;
> > +}
> > +
> > +/*
> > + * search available compressors for requested algorithm.
> > + * allocate new zcomp and initialize it. return NULL
> > + * if requested algorithm is not supported or in case
> > + * of init error
> > + */
> > +struct zcomp *zcomp_create(const char *compress)

		^^^^^^^^^^^^^^^^^

here it is.

> > +{
> > +	struct zcomp *comp;
> > +	struct zcomp_backend *backend;
> > +	struct zcomp_strm *zstrm;
> > +
> > +	backend = find_backend(compress);
> > +	if (!backend)
> > +		return NULL;
> > +
> > +	comp = kmalloc(sizeof(struct zcomp), GFP_KERNEL);
> > +	if (!comp)
> > +		return NULL;
> > +
> > +	comp->backend = backend;
> > +	spin_lock_init(&comp->strm_lock);
> > +	INIT_LIST_HEAD(&comp->idle_strm);
> > +	init_waitqueue_head(&comp->strm_wait);
> > +
> > +	zstrm = zcomp_strm_alloc(comp);
> > +	if (!zstrm) {
> > +		zcomp_destroy(comp);
> > +		return NULL;
> > +	}
> > +	list_add(&zstrm->list, &comp->idle_strm);
> > +	return comp;
> > +}
> > diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
> > new file mode 100644
> > index 0000000..7e05d20
> > --- /dev/null
> > +++ b/drivers/block/zram/zcomp.h
> > @@ -0,0 +1,56 @@
> > +/*
> > + * Copyright (C) 2014 Sergey Senozhatsky.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + */
> > +
> > +#ifndef _ZCOMP_H_
> > +#define _ZCOMP_H_
> > +
> > +#include <linux/spinlock.h>
> > +
> > +struct zcomp_strm {
> > +	void *buffer;     /* compression/decompression buffer */
> > +	void *private;      /* algorithm private memory */
> 
>          							It's not a *memory*.
> 									

	ok

> > +	struct list_head list;
> > +};
> > +
> > +/* static compression backend */
> > +struct zcomp_backend {
> > +	int (*compress)(const unsigned char *src, unsigned char *dst,
> > +			size_t *dst_len, void *private);
> > +
> > +	int (*decompress)(const unsigned char *src, size_t src_len,
> > +			unsigned char *dst);
> > +
> > +	void * (*create)(void);
> > +	void (*destroy)(void *private);
> > +
> > +	const char *name;
> > +};
> > +
> > +/* dynamic per-device compression frontend */
> > +struct zcomp {
> > +	/* protect strm list */
> > +	spinlock_t strm_lock;
> > +	/* list of available strms */
> > +	struct list_head idle_strm;
> > +	wait_queue_head_t strm_wait;
> > +	struct zcomp_backend *backend;
> > +};
> > +
> > +struct zcomp *zcomp_create(const char *comp);
> > +void zcomp_destroy(struct zcomp *comp);
> > +
> > +struct zcomp_strm *zcomp_strm_get(struct zcomp *comp);
> > +void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm);
> > +
> > +int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
> > +		const unsigned char *src, size_t *dst_len);
> > +
> > +int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
> > +		size_t src_len, unsigned char *dst);
> > +#endif /* _ZCOMP_H_ */
> > diff --git a/drivers/block/zram/zcomp_lzo.c b/drivers/block/zram/zcomp_lzo.c
> > new file mode 100644
> > index 0000000..39106a7
> > --- /dev/null
> > +++ b/drivers/block/zram/zcomp_lzo.c
> > @@ -0,0 +1,48 @@
> > +/*
> > + * Copyright (C) 2014 Sergey Senozhatsky.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + */
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/slab.h>
> > +#include <linux/lzo.h>
> > +
> > +#include "zcomp.h"
> > +
> > +static void * lzo_create(void)
>                 ^
>           unnecessary white space
> 
> > +{
> > +	return kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
> > +}
> > +
> > +static void lzo_destroy(void *private)
> > +{
> > +	kfree(private);
> > +}
> > +
> > +static int lzo_compress(const unsigned char *src, unsigned char *dst,
> > +		size_t *dst_len, void *private)
> > +{
> > +	int ret = lzo1x_1_compress(src, PAGE_SIZE, dst, dst_len, private);
> > +	return ret == LZO_E_OK ? 0 : ret;
> > +}
> > +
> > +static int lzo_decompress(const unsigned char *src, size_t src_len,
> > +		unsigned char *dst)
> > +{
> > +	size_t dst_len = PAGE_SIZE;
> > +	int ret = lzo1x_decompress_safe(src, src_len, dst, &dst_len);
> > +	return ret == LZO_E_OK ? 0 : ret;
> > +}
> > +
> > +extern struct zcomp_backend zcomp_lzo;
> > +struct zcomp_backend zcomp_lzo = {
> > +	.compress = lzo_compress,
> > +	.decompress = lzo_decompress,
> > +	.create = lzo_create,
> > +	.destroy = lzo_destroy,
> > +	.name = "lzo",
> > +};
> > -- 
> > 1.9.0.rc3.260.g4cf525c
> > 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv5 0/4] add compressing abstraction and multi stream support
  2014-02-18  6:04 ` [PATCHv5 0/4] add compressing abstraction and multi stream support Minchan Kim
  2014-02-18  9:20   ` Sergey Senozhatsky
@ 2014-02-18 19:38   ` Sergey Senozhatsky
  2014-02-19  6:15     ` Minchan Kim
  1 sibling, 1 reply; 12+ messages in thread
From: Sergey Senozhatsky @ 2014-02-18 19:38 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Jerome Marchand, Nitin Gupta, linux-kernel

On (02/18/14 15:04), Minchan Kim wrote:
[..]
> 1.8.5.3
> 
> As you can see, your patch regressed zram badly when it used one
> stream buffer. My guess is that the overhead comes from the scheduling
> (sleep/wakeup) that happens when others can't find an idle stream, so
> I experimented with a simple patch to restore the old behavior, and it
> works well again. The reason the old behavior was really good is that
> it uses a mutex with spin_on_owner, so it can avoid frequent
> sleep/wakeup.
> 
> A solution I could think of is that we could grow the number of
> buffers dynamically up to max_buffers (e.g. 2 * number of CPUs) so we
> could easily grab an idle stream for each CPU, while releasing
> buffers via a shrinker when memory pressure is severe.
> Of course, we should keep at least one buffer so it keeps working.
> 
> Another solution I can imagine is to define CONFIG_ZRAM_MULTI_STREAM
> and CONFIG_ZRAM_SINGLE_STREAM (ie, the default), so we wouldn't break
> the good performance of the old cases, while new users who want
> parallel writes could use ZRAM_MULTI, which should default to several
> streams rather than one. And we could enhance it further with the
> dynamic control I mentioned.
> 
> Thoughts?
>

like the following one (this patch limits the number of streams to
num_online_cpus(); I don't mind limiting zstrms to
N * num_online_cpus(), where N could be 2)

	iozone -t 3 -R -r 16K -s 60M -I +Z

write test:
(old core i5 laptop, 2 cpus + 2 HyperThreading)

                       base            multi buffer

"  Initial write "   574938.14           716623.64
"        Rewrite "   573541.22          1499074.62
"   Random write "   587016.14          1393996.66
"         Pwrite "   595616.45           711028.70
"         Fwrite "  1482843.62          1493398.12


zcomp_strm_get() and zcomp_strm_put() were updated. zcomp now keeps
the number of allocated and still active (not freed) streams in an
atomic_t, and frees a zstrm on put() only if the current number of
streams exceeds the number of online CPUs (i.e. a CPU went offline).


not a properly `aligned' patch, just to demonstrate the idea.

thoughts?

thanks.

---

From 0e16f26480933d22339675ca938049dfb21557e2 Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Date: Tue, 11 Feb 2014 00:28:46 +0300
Subject: [PATCH] zram: multi stream compressing backend abstraction

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zcomp.c     | 174 +++++++++++++++++++++++++++++++++++++++++
 drivers/block/zram/zcomp.h     |  58 ++++++++++++++
 drivers/block/zram/zcomp_lzo.c |  48 ++++++++++++
 3 files changed, 280 insertions(+)
 create mode 100644 drivers/block/zram/zcomp.c
 create mode 100644 drivers/block/zram/zcomp.h
 create mode 100644 drivers/block/zram/zcomp_lzo.c

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
new file mode 100644
index 0000000..6bf49fc
--- /dev/null
+++ b/drivers/block/zram/zcomp.c
@@ -0,0 +1,174 @@
+/*
+ * Copyright (C) 2014 Sergey Senozhatsky.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/wait.h>
+#include <linux/sched.h>
+
+#include "zcomp.h"
+
+extern struct zcomp_backend zcomp_lzo;
+
+static struct zcomp_backend *find_backend(const char *compress)
+{
+	if (strncmp(compress, "lzo", 3) == 0)
+		return &zcomp_lzo;
+	return NULL;
+}
+
+static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
+{
+	if (zstrm->private)
+		comp->backend->destroy(zstrm->private);
+	free_pages((unsigned long)zstrm->buffer, 1);
+	kfree(zstrm);
+}
+
+/*
+ * allocate new zcomp_strm structure with ->private initialized by
+ * backend, return NULL on error
+ */
+static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
+{
+	struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_KERNEL);
+	if (!zstrm)
+		return NULL;
+
+	zstrm->private = comp->backend->create();
+	/*
+	 * allocate 2 pages. 1 for compressed data, plus 1 extra for the
+	 * case when compressed size is larger than the original one
+	 */
+	zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
+	if (!zstrm->private || !zstrm->buffer) {
+		zcomp_strm_free(comp, zstrm);
+		zstrm = NULL;
+	}
+	return zstrm;
+}
+
+/*
+ * get available idle stream or allocate a new one
+ */
+struct zcomp_strm *zcomp_strm_get(struct zcomp *comp)
+{
+	struct zcomp_strm *zstrm;
+
+	while (1) {
+		spin_lock(&comp->strm_lock);
+		if (!list_empty(&comp->idle_strm)) {
+			zstrm = list_entry(comp->idle_strm.next,
+					struct zcomp_strm, list);
+			list_del(&zstrm->list);
+			spin_unlock(&comp->strm_lock);
+			return zstrm;
+		}
+
+		if (atomic_read(&comp->num_strm) >= num_online_cpus()) {
+			spin_unlock(&comp->strm_lock);
+			wait_event(comp->strm_wait,
+					!list_empty(&comp->idle_strm));
+			continue;
+		}
+
+		atomic_inc(&comp->num_strm);
+		spin_unlock(&comp->strm_lock);
+
+		zstrm = zcomp_strm_alloc(comp);
+		if (!zstrm) {
+			atomic_dec(&comp->num_strm);
+			wait_event(comp->strm_wait,
+					!list_empty(&comp->idle_strm));
+			continue;
+		}
+		break;
+	}
+	return zstrm;
+}
+
+/*
+ * put zstrm back to idle list and wake up waiters or free it if
+ * current number of streams is above the limit
+ */
+void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm)
+{
+	spin_lock(&comp->strm_lock);
+	if (atomic_read(&comp->num_strm) <= num_online_cpus()) {
+		list_add(&zstrm->list, &comp->idle_strm);
+		spin_unlock(&comp->strm_lock);
+	} else {
+		atomic_dec(&comp->num_strm);
+		spin_unlock(&comp->strm_lock);
+		zcomp_strm_free(comp, zstrm);
+	}
+
+	wake_up(&comp->strm_wait);
+}
+
+int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
+		const unsigned char *src, size_t *dst_len)
+{
+	return comp->backend->compress(src, zstrm->buffer, dst_len,
+			zstrm->private);
+}
+
+int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
+		size_t src_len, unsigned char *dst)
+{
+	return comp->backend->decompress(src, src_len, dst);
+}
+
+void zcomp_destroy(struct zcomp *comp)
+{
+	struct zcomp_strm *zstrm;
+	while (!list_empty(&comp->idle_strm)) {
+		zstrm = list_entry(comp->idle_strm.next,
+				struct zcomp_strm, list);
+		list_del(&zstrm->list);
+		zcomp_strm_free(comp, zstrm);
+	}
+	kfree(comp);
+}
+
+/*
+ * search available compressors for requested algorithm.
+ * allocate new zcomp and initialize it. return NULL
+ * if requested algorithm is not supported or in case
+ * of init error
+ */
+struct zcomp *zcomp_create(const char *compress)
+{
+	struct zcomp *comp;
+	struct zcomp_backend *backend;
+	struct zcomp_strm *zstrm;
+
+	backend = find_backend(compress);
+	if (!backend)
+		return NULL;
+
+	comp = kmalloc(sizeof(struct zcomp), GFP_KERNEL);
+	if (!comp)
+		return NULL;
+
+	comp->backend = backend;
+	spin_lock_init(&comp->strm_lock);
+	INIT_LIST_HEAD(&comp->idle_strm);
+	init_waitqueue_head(&comp->strm_wait);
+	atomic_set(&comp->num_strm, 1);
+
+	zstrm = zcomp_strm_alloc(comp);
+	if (!zstrm) {
+		zcomp_destroy(comp);
+		return NULL;
+	}
+	list_add(&zstrm->list, &comp->idle_strm);
+	return comp;
+}
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
new file mode 100644
index 0000000..fd3871d
--- /dev/null
+++ b/drivers/block/zram/zcomp.h
@@ -0,0 +1,58 @@
+/*
+ * Copyright (C) 2014 Sergey Senozhatsky.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ZCOMP_H_
+#define _ZCOMP_H_
+
+#include <linux/spinlock.h>
+
+struct zcomp_strm {
+	void *buffer;     /* compression/decompression buffer */
+	void *private;
+	struct list_head list;
+};
+
+/* static compression backend */
+struct zcomp_backend {
+	int (*compress)(const unsigned char *src, unsigned char *dst,
+			size_t *dst_len, void *private);
+
+	int (*decompress)(const unsigned char *src, size_t src_len,
+			unsigned char *dst);
+
+	void * (*create)(void);
+	void (*destroy)(void *private);
+
+	const char *name;
+};
+
+/* dynamic per-device compression frontend */
+struct zcomp {
+	/* protect strm list */
+	spinlock_t strm_lock;
+	/* list of available strms */
+	struct list_head idle_strm;
+	wait_queue_head_t strm_wait;
+	/* number of allocated strms */
+	atomic_t num_strm;
+	struct zcomp_backend *backend;
+};
+
+struct zcomp *zcomp_create(const char *comp);
+void zcomp_destroy(struct zcomp *comp);
+
+struct zcomp_strm *zcomp_strm_get(struct zcomp *comp);
+void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm);
+
+int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
+		const unsigned char *src, size_t *dst_len);
+
+int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
+		size_t src_len, unsigned char *dst);
+#endif /* _ZCOMP_H_ */
diff --git a/drivers/block/zram/zcomp_lzo.c b/drivers/block/zram/zcomp_lzo.c
new file mode 100644
index 0000000..b7722a2
--- /dev/null
+++ b/drivers/block/zram/zcomp_lzo.c
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2014 Sergey Senozhatsky.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/lzo.h>
+
+#include "zcomp.h"
+
+static void *lzo_create(void)
+{
+	return kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
+}
+
+static void lzo_destroy(void *private)
+{
+	kfree(private);
+}
+
+static int lzo_compress(const unsigned char *src, unsigned char *dst,
+		size_t *dst_len, void *private)
+{
+	int ret = lzo1x_1_compress(src, PAGE_SIZE, dst, dst_len, private);
+	return ret == LZO_E_OK ? 0 : ret;
+}
+
+static int lzo_decompress(const unsigned char *src, size_t src_len,
+		unsigned char *dst)
+{
+	size_t dst_len = PAGE_SIZE;
+	int ret = lzo1x_decompress_safe(src, src_len, dst, &dst_len);
+	return ret == LZO_E_OK ? 0 : ret;
+}
+
+extern struct zcomp_backend zcomp_lzo;
+struct zcomp_backend zcomp_lzo = {
+	.compress = lzo_compress,
+	.decompress = lzo_decompress,
+	.create = lzo_create,
+	.destroy = lzo_destroy,
+	.name = "lzo",
+};
-- 
1.9.0.279.gdc9e3eb


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCHv5 0/4] add compressing abstraction and multi stream support
  2014-02-18  9:20   ` Sergey Senozhatsky
@ 2014-02-19  6:14     ` Minchan Kim
  0 siblings, 0 replies; 12+ messages in thread
From: Minchan Kim @ 2014-02-19  6:14 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Jerome Marchand, Nitin Gupta, linux-kernel

Hello Sergey,

On Tue, Feb 18, 2014 at 12:20:17PM +0300, Sergey Senozhatsky wrote:
> Hello,
> 
> On (02/18/14 15:04), Minchan Kim wrote:
> > Hello Sergey,
> > 
> > On Thu, Feb 13, 2014 at 08:43:18PM +0300, Sergey Senozhatsky wrote:
> > > This patchset introduces zcomp compression backend abstraction
> > > adding ability to support compression algorithms other than LZO;
> > > support for multi compression streams, making parallel compressions
> > > possible.
> > > 
> > > Diff from v4 (reviewed by Minchan Kim):
> > > -- renamed zcomp buffer_lock; removed src len and dst len from
> > >    compress() and decompress(); not using term `buffer' and
> > >    `workmem' in code and documentation; define compress() and
> > >    decompress() functions for LZO backend; not using goto's;
> > >    do not put idle zcomp_strm to idle list tail.
> > > 
> > > Diff from v3:
> > > -- renamed compression backend and working memory structs as requested
> > >    by Minchan Kim; fixed several issues noted by Minchan Kim.
> > > 
> > > Sergey Senozhatsky (4):
> > >   zram: introduce compressing backend abstraction
> > >   zram: use zcomp compressing backends
> > >   zram: zcomp support multi compressing streams
> > >   zram: document max_comp_streams
> > > 
> > >  Documentation/ABI/testing/sysfs-block-zram |   9 +-
> > >  Documentation/blockdev/zram.txt            |  20 +++-
> > >  drivers/block/zram/Makefile                |   2 +-
> > >  drivers/block/zram/zcomp.c                 | 155 +++++++++++++++++++++++++++++
> > >  drivers/block/zram/zcomp.h                 |  56 +++++++++++
> > >  drivers/block/zram/zcomp_lzo.c             |  48 +++++++++
> > >  drivers/block/zram/zram_drv.c              | 102 ++++++++++++-------
> > >  drivers/block/zram/zram_drv.h              |  10 +-
> > >  8 files changed, 355 insertions(+), 47 deletions(-)
> > >  create mode 100644 drivers/block/zram/zcomp.c
> > >  create mode 100644 drivers/block/zram/zcomp.h
> > >  create mode 100644 drivers/block/zram/zcomp_lzo.c
> > 
> > I reviewed the patchset and the implementation looked good to me,
> > but I found a severe regression in my test, which used one stream.
> > 
> > Base is the current upstream and zcomp-multi is yours.
> > Lastly, zcomp-single is where I tweaked it to see the scheduling
> > overhead caused by sleep/wakeup events.
> >    
> >    Base                      zcomp-multi                      zcomp-single
> > 
> > ==Initial  write             ==Initial  write              ==Initial  write
> > records:   5                 records:   5                  records:   5
> > avg:       1642424.35        avg:       699610.40          avg:       1655583.71
> 
> wow... spinlock vs mutex 3 times slower? that's ugly.
> can you please provide iozone arg list?

iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0

> 
> > std:       39890.95(2.43%)   std:       232014.19(33.16%)  std:       52293.96(3.16%)
> > max:       1690170.94        max:       1163473.45         max:       1697164.75
> > min:       1568669.52        min:       573429.88          min:       1553410.23
> > ==Rewrite  ==Rewrite         ==Rewrite
> > records:   5                 records:   5                  records:   5
> > avg:       1611775.39        avg:       501406.64          avg:       1684419.11
> > std:       17144.58(1.06%)   std:       15354.41(3.06%)    std:       18367.42(1.09%)
> > max:       1641800.95        max:       531356.78          max:       1706445.84
> > min:       1593515.27        min:       488817.78          min:       1655335.73
> > ==Read     ==Read            ==Read
> > records:   5                 records:   5                  records:   5
> > avg:       2418916.31        avg:       2385573.68         avg:       2316542.26
> > std:       55324.98(2.29%)   std:       64475.37(2.70%)    std:       50621.10(2.19%)
> > max:       2517417.72        max:       2444138.89         max:       2383321.09
> > min:       2364690.92        min:       2263763.77         min:       2233198.12
> > ==Re-read  ==Re-read         ==Re-read
> > records:   5                 records:   5                  records:   5
> > avg:       2351292.92        avg:       2333131.90         avg:       2336306.89
> > std:       73358.82(3.12%)   std:       34726.09(1.49%)    std:       74001.47(3.17%)
> > max:       2444053.92        max:       2396357.19         max:       2432697.06
> > min:       2255477.41        min:       2299239.74         min:       2231400.94
> > ==Reverse  Read              ==Reverse  Read               ==Reverse  Read
> > records:   5                 records:   5                  records:   5
> > avg:       2383917.40        avg:       2267700.38         avg:       2328689.00
> > std:       51389.78(2.16%)   std:       57018.67(2.51%)    std:       44955.21(1.93%)
> > max:       2435560.11        max:       2346465.31         max:       2375047.77
> > min:       2290726.91        min:       2188208.84         min:       2251465.20
> > ==Stride   read              ==Stride   read               ==Stride   read
> > records:   5                 records:   5                  records:   5
> > avg:       2361656.52        avg:       2292182.65         avg:       2302384.26
> > std:       64730.61(2.74%)   std:       34649.38(1.51%)    std:       51145.23(2.22%)
> > max:       2471425.77        max:       2341282.19         max:       2375936.66
> > min:       2279913.31        min:       2242688.59         min:       2224134.48
> > ==Random   read              ==Random   read               ==Random   read
> > records:   5                 records:   5                  records:   5
> > avg:       2369086.95        avg:       2315431.27         avg:       2352314.50
> > std:       19870.36(0.84%)   std:       43020.16(1.86%)    std:       51677.02(2.20%)
> > max:       2391210.41        max:       2390286.89         max:       2415472.89
> > min:       2337092.91        min:       2266433.95         min:       2264088.05
> > ==Mixed    workload          ==Mixed    workload           ==Mixed    workload
> > records:   5                 records:   5                  records:   5
> > avg:       1822253.03        avg:       2006455.50         avg:       1918270.29
> > std:       95037.10(5.22%)   std:       67792.78(3.38%)    std:       45933.99(2.39%)
> > max:       1934495.87        max:       2079128.36         max:       1995693.03
> > min:       1694223.48        min:       1914532.49         min:       1856410.41
> > ==Random   write             ==Random   write              ==Random   write
> > records:   5                 records:   5                  records:   5
> > avg:       1626318.29        avg:       497250.78          avg:       1695582.06
> > std:       38550.23(2.37%)   std:       1405.42(0.28%)     std:       9211.98(0.54%)
> > max:       1665839.62        max:       498585.88          max:       1703808.22
> > min:       1562141.21        min:       494526.45          min:       1677664.94
> > ==Pwrite   ==Pwrite          ==Pwrite
> > records:   5                 records:   5                  records:   5
> > avg:       1654641.25        avg:       581709.22          avg:       1641452.34
> > std:       47202.59(2.85%)   std:       9670.46(1.66%)     std:       38963.62(2.37%)
> > max:       1740682.36        max:       591300.09          max:       1687387.69
> > min:       1611436.34        min:       564324.38          min:       1570496.11
> > ==Pread    ==Pread           ==Pread
> > records:   5                 records:   5                  records:   5
> > avg:       2308891.23        avg:       2381945.97         avg:       2347688.68
> > std:       192436.37(8.33%)  std:       22957.85(0.96%)    std:       31682.43(1.35%)
> > max:       2548482.56        max:       2415788.27         max:       2381937.58
> > min:       1960229.11        min:       2349188.16         min:       2292507.52
> > 
> > 
> > zcomp-single patch
> > 
> > From 99abaf9c185953c35b0e3cd12ebb90b57f3bbd50 Mon Sep 17 00:00:00 2001
> > From: Minchan Kim <minchan@kernel.org>
> > Date: Tue, 18 Feb 2014 11:41:11 +0900
> > Subject: [PATCH] zram: add single lock
> > 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  drivers/block/zram/zcomp.c | 24 +++++-------------------
> >  drivers/block/zram/zcomp.h |  2 +-
> >  2 files changed, 6 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> > index a032f49a52e2..03f4241083ac 100644
> > --- a/drivers/block/zram/zcomp.c
> > +++ b/drivers/block/zram/zcomp.c
> > @@ -56,32 +56,18 @@ struct zcomp_strm *zcomp_strm_get(struct zcomp *comp)
> >  {
> >  	struct zcomp_strm *zstrm;
> >  
> > -	while (1) {
> > -		spin_lock(&comp->strm_lock);
> > -		if (list_empty(&comp->idle_strm)) {
> > -			spin_unlock(&comp->strm_lock);
> > -			wait_event(comp->strm_wait,
> > -					!list_empty(&comp->idle_strm));
> > -			continue;
> > -		}
> > -
> > -		zstrm = list_entry(comp->idle_strm.next,
> > +	mutex_lock(&comp->strm_lock);
> > +	zstrm = list_entry(comp->idle_strm.next,
> >  				struct zcomp_strm, list);
> > -		list_del(&zstrm->list);
> > -		spin_unlock(&comp->strm_lock);
> > -		break;
> > -	}
> > +	list_del(&zstrm->list);
> >  	return zstrm;
> >  }
> >  
> >  /* add zcomp_strm back to idle list and wake up waiter (if any) */
> >  void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm)
> >  {
> > -	spin_lock(&comp->strm_lock);
> >  	list_add(&zstrm->list, &comp->idle_strm);
> > -	spin_unlock(&comp->strm_lock);
> > -
> > -	wake_up(&comp->strm_wait);
> > +	mutex_unlock(&comp->strm_lock);
> >  }
> 
> >  
> >  int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
> > @@ -138,7 +124,7 @@ struct zcomp *zcomp_create(const char *compress, int num_strm)
> >  		return NULL;
> >  
> >  	comp->backend = backend;
> > -	spin_lock_init(&comp->strm_lock);
> > +	mutex_init(&comp->strm_lock);
> >  	INIT_LIST_HEAD(&comp->idle_strm);
> >  	init_waitqueue_head(&comp->strm_wait);
> >  
> > diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
> > index 52899de40ca5..bfcec1cf7d2e 100644
> > --- a/drivers/block/zram/zcomp.h
> > +++ b/drivers/block/zram/zcomp.h
> > @@ -35,7 +35,7 @@ struct zcomp_backend {
> >  /* dynamic per-device compression frontend */
> >  struct zcomp {
> >  	/* protect strm list */
> > -	spinlock_t strm_lock;
> > +	struct mutex strm_lock;
> >  	/* list of available strms */
> >  	struct list_head idle_strm;
> >  	wait_queue_head_t strm_wait;
> > -- 
> > 1.8.5.3
> > 
> > As you can see, your patch regressed zram badly when it used one
> > stream buffer. The reason, I guessed, is the scheduling overhead of
> > sleep/wakeup when other writers couldn't find an idle stream, so I
> > experimented with the simple patch below to restore the old behavior,
> > and it works well again.
> 
> yes, could be.
> 
> > The reason the old behavior was really good is that it uses a mutex
> > with spin_on_owner, so it could avoid frequent sleep/wakeup.
> > 
> 
> zcomp_strm_get()
> 	-> mutex_lock(&comp->strm_lock);
> 
> zcomp_strm_put()
> 	-> mutex_unlock(&comp->strm_lock);
> 
> these semantics were supposed to be the default single-zstrm policy, while
> the spinlock-based one was meant to be the multi-zstrm policy. though, I
> agree, the zstrm policy concept complicates the implementation.
> 
> 
> > A solution I could think of is that we could grow the number of
> > buffers dynamically up to max_buffers (ex, 2 * number of CPUs) so we
> > could grab an idle stream easily for each CPU, while we could release
> > buffers via a shrinker when memory pressure is severe.
> > Of course, we should keep one buffer so writes can always make progress.
> > 
> 
> yes, we had this behaviour in v1. I'm afraid 2 * num_online_cpus() zstrms
> can be tough on systems with a big number of cpus, taking into account
> workmem requirements, e.g. LZO1X_1_MEM_COMPRESS is 8192 * sizeof(unsigned short).
> 
> > Another solution I can imagine is to define CONFIG_ZRAM_MULTI_STREAM
> > and CONFIG_ZRAM_SINGLE_STREAM (ie, the default) so we wouldn't break
> > the good performance of old cases, and new users who want parallel
> > writes could use ZRAM_MULTI, which should default to several streams
> > rather than one. And we could enhance it further with dynamic control,
> > as I mentioned.
> >
> > Thoughts?
> > 
> 
> well, (my opinion) I'd rather avoid adding .config options.
> 
> we can force zram to support only multi zstrm. during initialisation one zstrm (default)
> is allocated, which is guaranteed to be there. now, if a write didn't get an idle zstrm
> AND the current number of allocated zstrms is less than N * online CPUs (N could be 1
> or 2), it attempts to allocate a new one. otherwise, it waits. if it failed to allocate
> a new one (low memory case) it waits, and will eventually receive an idle zstrm (either
> one of the previously allocated zstrms or, in the worst case, the default one.
> so we guarantee the write will complete).
> 
> on put() we add the zstrm to the idle list, or free it if the number of online CPUs has changed.

It would introduce the wait/wakeup regression again if we couldn't increase
the number of streams dynamically, which would be very likely for the
zram-swap usecase with multiple CPUs, and even frequent allocation requests
that fail are obvious overhead. Of course, we could control the congestion
with backoff, but that would require more code, so I doubt we could do it
more cleanly than separating the behaviors with another config.

Another idea is to put a single_mode flag in zcomp so that, if max_buffer
is only 1, we could use a comp->mutex_lock instead of strm_lock in
zcomp_strm_get. It might work, but it would add code useful only for the
zram-swap use case on small-memory embedded systems, so it's not a good
choice, either.

There is another zram-swap use case, from a desktop user, that I've
heard about. He has enough ram, but his usage scenario was really
extreme, so swapping sometimes happens and ends up lagging the system;
what he wanted was to make zram I/O very fast. I guess he wouldn't mind
reserving more memory for a zstream per cpu if zram could perform well
in parallel. For that case, it would be better to reserve a stream for
each CPU at zram init time, which might introduce
CONFIG_ZRAM_COMP_PERCPU rather than CONFIG_ZRAM_COMP_MULTI.

In summary, there are several usecases, and I don't think we could meet
every one of them with only *a single* config without regression, so I'd
like to separate them. But if you can handle it with no regression and
clean code, I'm okay. Otherwise, I'd like to go with multiple configs to
prevent regression for the most popular usecase these days, zram-swap.

> 
> thoughts?
> 
> 	-ss
> > 
> > > 
> > > -- 
> > > 1.9.0.rc3.260.g4cf525c
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> > -- 
> > Kind regards,
> > Minchan Kim
> > 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv5 0/4] add compressing abstraction and multi stream support
  2014-02-18 19:38   ` Sergey Senozhatsky
@ 2014-02-19  6:15     ` Minchan Kim
  0 siblings, 0 replies; 12+ messages in thread
From: Minchan Kim @ 2014-02-19  6:15 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Jerome Marchand, Nitin Gupta, linux-kernel

On Tue, Feb 18, 2014 at 10:38:18PM +0300, Sergey Senozhatsky wrote:
> On (02/18/14 15:04), Minchan Kim wrote:
> [..]
> > 1.8.5.3
> > 
> > As you can see, your patch regressed zram badly when it used one
> > stream buffer. The reason, I guessed, is the scheduling overhead of
> > sleep/wakeup when other writers couldn't find an idle stream, so I
> > experimented with the simple patch below to restore the old behavior,
> > and it works well again. The reason the old behavior was really good
> > is that it uses a mutex with spin_on_owner, so it could avoid
> > frequent sleep/wakeup.
> > 
> > A solution I could think of is that we could grow the number of
> > buffers dynamically up to max_buffers (ex, 2 * number of CPUs) so we
> > could grab an idle stream easily for each CPU, while we could release
> > buffers via a shrinker when memory pressure is severe.
> > Of course, we should keep one buffer so writes can always make progress.
> > 
> > Another solution I can imagine is to define CONFIG_ZRAM_MULTI_STREAM
> > and CONFIG_ZRAM_SINGLE_STREAM (ie, the default) so we wouldn't break
> > the good performance of old cases, and new users who want parallel
> > writes could use ZRAM_MULTI, which should default to several streams
> > rather than one. And we could enhance it further with dynamic control,
> > as I mentioned.
> > 
> > Thoughts?
> >
> 
> like the following one (this patch limits the number of streams to
> num_online_cpus(). I really don't mind limiting zstrms to
> N * num_online_cpus(), where N could be 2)
> 
> 	iozone -t 3 -R -r 16K -s 60M -I +Z
> 
> write test:
> (old core i5 laptop, 2 cpus + 2 HyperThreading)
> 
>                        base            multi buffer
> 
> "  Initial write "   574938.14           716623.64
> "        Rewrite "   573541.22          1499074.62
> "   Random write "   587016.14          1393996.66
> "         Pwrite "   595616.45           711028.70
> "         Fwrite "  1482843.62          1493398.12
> 

I guess you tested it with zram as a block device, not as zram-swap.
If we use it as zram-swap and parallel swapout happens,
write performance would drop due to frequent sleep/wakeup.

> 
> zcomp_strm_get() and zcomp_strm_put() were updated.
> zcomp keeps the number of allocated and still active (not freed)
> streams (an atomic_t). zcomp frees a zstrm only if the
> current number of streams exceeds the number of online
> cpus (a cpu went offline).
> 
> 
> not a properly `aligned' patch, just to demonstrate the idea.
> 
> thoughts?
> 
> thanks.
> 
> ---
> 
> >From 0e16f26480933d22339675ca938049dfb21557e2 Mon Sep 17 00:00:00 2001
> From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Date: Tue, 11 Feb 2014 00:28:46 +0300
> Subject: [PATCH] zram: multi stream compressing backend abstraction
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  drivers/block/zram/zcomp.c     | 174 +++++++++++++++++++++++++++++++++++++++++
>  drivers/block/zram/zcomp.h     |  58 ++++++++++++++
>  drivers/block/zram/zcomp_lzo.c |  48 ++++++++++++
>  3 files changed, 280 insertions(+)
>  create mode 100644 drivers/block/zram/zcomp.c
>  create mode 100644 drivers/block/zram/zcomp.h
>  create mode 100644 drivers/block/zram/zcomp_lzo.c
> 
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> new file mode 100644
> index 0000000..6bf49fc
> --- /dev/null
> +++ b/drivers/block/zram/zcomp.c
> @@ -0,0 +1,174 @@
> +/*
> + * Copyright (C) 2014 Sergey Senozhatsky.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/slab.h>
> +#include <linux/wait.h>
> +#include <linux/sched.h>
> +
> +#include "zcomp.h"
> +
> +extern struct zcomp_backend zcomp_lzo;
> +
> +static struct zcomp_backend *find_backend(const char *compress)
> +{
> +	if (strncmp(compress, "lzo", 3) == 0)
> +		return &zcomp_lzo;
> +	return NULL;
> +}
> +
> +static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
> +{
> +	if (zstrm->private)
> +		comp->backend->destroy(zstrm->private);
> +	free_pages((unsigned long)zstrm->buffer, 1);
> +	kfree(zstrm);
> +}
> +
> +/*
> + * allocate new zcomp_strm structure with ->private initialized by
> + * backend, return NULL on error
> + */
> +static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
> +{
> +	struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_KERNEL);
> +	if (!zstrm)
> +		return NULL;
> +
> +	zstrm->private = comp->backend->create();
> +	/*
> +	 * allocate 2 pages. 1 for compressed data, plus 1 extra for the
> +	 * case when compressed size is larger than the original one
> +	 */
> +	zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
> +	if (!zstrm->private || !zstrm->buffer) {
> +		zcomp_strm_free(comp, zstrm);
> +		zstrm = NULL;
> +	}
> +	return zstrm;
> +}
> +
> +/*
> + * get available idle stream or allocate a new one
> + */
> +struct zcomp_strm *zcomp_strm_get(struct zcomp *comp)
> +{
> +	struct zcomp_strm *zstrm;
> +
> +	while (1) {
> +		spin_lock(&comp->strm_lock);
> +		if (!list_empty(&comp->idle_strm)) {
> +			zstrm = list_entry(comp->idle_strm.next,
> +					struct zcomp_strm, list);
> +			list_del(&zstrm->list);
> +			spin_unlock(&comp->strm_lock);
> +			return zstrm;
> +		}
> +
> +		if (atomic_read(&comp->num_strm) >= num_online_cpus()) {
> +			spin_unlock(&comp->strm_lock);
> +			wait_event(comp->strm_wait,
> +					!list_empty(&comp->idle_strm));
> +			continue;
> +		}
> +
> +		atomic_inc(&comp->num_strm);
> +		spin_unlock(&comp->strm_lock);
> +
> +		zstrm = zcomp_strm_alloc(comp);
> +		if (!zstrm) {
> +			atomic_dec(&comp->num_strm);
> +			wait_event(comp->strm_wait,
> +					!list_empty(&comp->idle_strm));
> +			continue;
> +		}
> +		break;
> +	}
> +	return zstrm;
> +}
> +
> +/*
> + * put zstrm back to idle list and wake up waiters or free it if
> + * current number of streams is above the limit
> + */
> +void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm)
> +{
> +	spin_lock(&comp->strm_lock);
> +	if (atomic_read(&comp->num_strm) <= num_online_cpus()) {
> +		list_add(&zstrm->list, &comp->idle_strm);
> +		spin_unlock(&comp->strm_lock);
> +	} else {
> +		atomic_dec(&comp->num_strm);
> +		spin_unlock(&comp->strm_lock);
> +		zcomp_strm_free(comp, zstrm);
> +	}
> +
> +	wake_up(&comp->strm_wait);
> +}
> +
> +int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
> +		const unsigned char *src, size_t *dst_len)
> +{
> +	return comp->backend->compress(src, zstrm->buffer, dst_len,
> +			zstrm->private);
> +}
> +
> +int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
> +		size_t src_len, unsigned char *dst)
> +{
> +	return comp->backend->decompress(src, src_len, dst);
> +}
> +
> +void zcomp_destroy(struct zcomp *comp)
> +{
> +	struct zcomp_strm *zstrm;
> +	while (!list_empty(&comp->idle_strm)) {
> +		zstrm = list_entry(comp->idle_strm.next,
> +				struct zcomp_strm, list);
> +		list_del(&zstrm->list);
> +		zcomp_strm_free(comp, zstrm);
> +	}
> +	kfree(comp);
> +}
> +
> +/*
> + * search available compressors for requested algorithm.
> + * allocate new zcomp and initialize it. return NULL
> + * if requested algorithm is not supported or in case
> + * of init error
> + */
> +struct zcomp *zcomp_create(const char *compress)
> +{
> +	struct zcomp *comp;
> +	struct zcomp_backend *backend;
> +	struct zcomp_strm *zstrm;
> +
> +	backend = find_backend(compress);
> +	if (!backend)
> +		return NULL;
> +
> +	comp = kmalloc(sizeof(struct zcomp), GFP_KERNEL);
> +	if (!comp)
> +		return NULL;
> +
> +	comp->backend = backend;
> +	spin_lock_init(&comp->strm_lock);
> +	INIT_LIST_HEAD(&comp->idle_strm);
> +	init_waitqueue_head(&comp->strm_wait);
> +	atomic_set(&comp->num_strm, 1);
> +
> +	zstrm = zcomp_strm_alloc(comp);
> +	if (!zstrm) {
> +		zcomp_destroy(comp);
> +		return NULL;
> +	}
> +	list_add(&zstrm->list, &comp->idle_strm);
> +	return comp;
> +}
> diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
> new file mode 100644
> index 0000000..fd3871d
> --- /dev/null
> +++ b/drivers/block/zram/zcomp.h
> @@ -0,0 +1,58 @@
> +/*
> + * Copyright (C) 2014 Sergey Senozhatsky.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _ZCOMP_H_
> +#define _ZCOMP_H_
> +
> +#include <linux/spinlock.h>
> +
> +struct zcomp_strm {
> +	void *buffer;     /* compression/decompression buffer */
> +	void *private;
> +	struct list_head list;
> +};
> +
> +/* static compression backend */
> +struct zcomp_backend {
> +	int (*compress)(const unsigned char *src, unsigned char *dst,
> +			size_t *dst_len, void *private);
> +
> +	int (*decompress)(const unsigned char *src, size_t src_len,
> +			unsigned char *dst);
> +
> +	void * (*create)(void);
> +	void (*destroy)(void *private);
> +
> +	const char *name;
> +};
> +
> +/* dynamic per-device compression frontend */
> +struct zcomp {
> +	/* protect strm list */
> +	spinlock_t strm_lock;
> +	/* list of available strms */
> +	struct list_head idle_strm;
> +	wait_queue_head_t strm_wait;
> +	/* number of allocated strms */
> +	atomic_t num_strm;
> +	struct zcomp_backend *backend;
> +};
> +
> +struct zcomp *zcomp_create(const char *comp);
> +void zcomp_destroy(struct zcomp *comp);
> +
> +struct zcomp_strm *zcomp_strm_get(struct zcomp *comp);
> +void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm);
> +
> +int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
> +		const unsigned char *src, size_t *dst_len);
> +
> +int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
> +		size_t src_len, unsigned char *dst);
> +#endif /* _ZCOMP_H_ */
> diff --git a/drivers/block/zram/zcomp_lzo.c b/drivers/block/zram/zcomp_lzo.c
> new file mode 100644
> index 0000000..b7722a2
> --- /dev/null
> +++ b/drivers/block/zram/zcomp_lzo.c
> @@ -0,0 +1,48 @@
> +/*
> + * Copyright (C) 2014 Sergey Senozhatsky.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +#include <linux/lzo.h>
> +
> +#include "zcomp.h"
> +
> +static void *lzo_create(void)
> +{
> +	return kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
> +}
> +
> +static void lzo_destroy(void *private)
> +{
> +	kfree(private);
> +}
> +
> +static int lzo_compress(const unsigned char *src, unsigned char *dst,
> +		size_t *dst_len, void *private)
> +{
> +	int ret = lzo1x_1_compress(src, PAGE_SIZE, dst, dst_len, private);
> +	return ret == LZO_E_OK ? 0 : ret;
> +}
> +
> +static int lzo_decompress(const unsigned char *src, size_t src_len,
> +		unsigned char *dst)
> +{
> +	size_t dst_len = PAGE_SIZE;
> +	int ret = lzo1x_decompress_safe(src, src_len, dst, &dst_len);
> +	return ret == LZO_E_OK ? 0 : ret;
> +}
> +
> +extern struct zcomp_backend zcomp_lzo;
> +struct zcomp_backend zcomp_lzo = {
> +	.compress = lzo_compress,
> +	.decompress = lzo_decompress,
> +	.create = lzo_create,
> +	.destroy = lzo_destroy,
> +	.name = "lzo",
> +};
> -- 
> 1.9.0.279.gdc9e3eb
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-02-19  6:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-13 17:43 [PATCHv5 0/4] add compressing abstraction and multi stream support Sergey Senozhatsky
2014-02-13 17:43 ` [PATCHv5 1/4] zram: introduce compressing backend abstraction Sergey Senozhatsky
2014-02-18  5:10   ` Minchan Kim
2014-02-18  9:24     ` Sergey Senozhatsky
2014-02-13 17:43 ` [PATCHv5 2/4] zram: use zcomp compressing backends Sergey Senozhatsky
2014-02-13 17:43 ` [PATCHv5 3/4] zram: zcomp support multi compressing streams Sergey Senozhatsky
2014-02-13 17:43 ` [PATCHv5 4/4] zram: document max_comp_streams Sergey Senozhatsky
2014-02-18  6:04 ` [PATCHv5 0/4] add compressing abstraction and multi stream support Minchan Kim
2014-02-18  9:20   ` Sergey Senozhatsky
2014-02-19  6:14     ` Minchan Kim
2014-02-18 19:38   ` Sergey Senozhatsky
2014-02-19  6:15     ` Minchan Kim
