linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] promote zcache from staging
@ 2012-07-27 18:18 Seth Jennings
  2012-07-27 18:18 ` [PATCH 1/4] zsmalloc: collapse internal .h into .c Seth Jennings
                   ` (6 more replies)
  0 siblings, 7 replies; 41+ messages in thread
From: Seth Jennings @ 2012-07-27 18:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Robert Jennings,
	linux-mm, linux-kernel, devel

zcache is the remaining piece of code required to support in-kernel
memory compression.  The other two features, cleancache and frontswap,
have been promoted to mainline in 3.0 and 3.5.  This patchset
promotes zcache from the staging tree to mainline.

Based on the level of activity and contributions we're seeing from a
diverse set of people and interests, I think zcache has matured to the
point where it makes sense to promote this out of staging.

Overview
========
zcache is a backend to frontswap and cleancache that accepts pages from
those mechanisms and compresses them, leading to reduced I/O caused by
swap and file re-reads.  This is very valuable in shared storage
situations, where it reduces the load on things like SANs.  In the case
of slow backing/swap devices, zcache can also yield a performance gain.

In-Kernel Memory Compression Overview:

 swap subsystem            page cache
        +                      +
    frontswap              cleancache
        +                      +
zcache frontswap glue  zcache cleancache glue
        +                      +
        +---------+------------+
                  +
            zcache/tmem core
                  +
        +---------+------------+
        +                      +
     zsmalloc                 zbud

Everything below the frontswap/cleancache layer is currently inside the
zcache driver except for zsmalloc, which is shared between zcache and
another memory compression driver, zram.

Since zcache is dependent on zsmalloc, it is also being promoted by this
patchset.

For information on zsmalloc and the rationale behind its design and use
cases versus already existing allocators in the kernel, see:

https://lkml.org/lkml/2012/1/9/386

zsmalloc is the allocator used by zcache to store persistent pages that
come from frontswap, as opposed to zbud, which is the (internal) allocator
used for ephemeral pages from cleancache.

zsmalloc uses many fields of the page struct to create its conceptual
high-order page called a zspage.  Exactly which fields are used and for
what purpose is documented at the top of the zsmalloc .c file.  Because
zsmalloc uses struct page extensively, Andrew advised that the
promotion location be mm/:

https://lkml.org/lkml/2012/1/20/308

Some benchmarking numbers demonstrating the I/O savings that can be had
with zcache:

https://lkml.org/lkml/2012/3/22/383

Dan's presentation at LSF/MM this year on zcache:

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/LSFMM12-zcache-final.pdf

This patchset is based on next-20120727 plus the 3-part zsmalloc patchset below:

https://lkml.org/lkml/2012/7/18/353

The zsmalloc patchset is already acked and will be integrated by Greg after
3.6-rc1 is out.

Seth Jennings (4):
  zsmalloc: collapse internal .h into .c
  zsmalloc: promote to mm/
  drivers: add memory management driver class
  zcache: promote to drivers/mm/

 drivers/Kconfig                                    |    2 +
 drivers/Makefile                                   |    1 +
 drivers/mm/Kconfig                                 |   13 ++
 drivers/mm/Makefile                                |    1 +
 drivers/{staging => mm}/zcache/Makefile            |    0
 drivers/{staging => mm}/zcache/tmem.c              |    0
 drivers/{staging => mm}/zcache/tmem.h              |    0
 drivers/{staging => mm}/zcache/zcache-main.c       |    4 +-
 drivers/staging/Kconfig                            |    4 -
 drivers/staging/Makefile                           |    2 -
 drivers/staging/zcache/Kconfig                     |   11 --
 drivers/staging/zram/zram_drv.h                    |    3 +-
 drivers/staging/zsmalloc/Kconfig                   |   10 --
 drivers/staging/zsmalloc/Makefile                  |    3 -
 drivers/staging/zsmalloc/zsmalloc_int.h            |  149 --------------------
 .../staging/zsmalloc => include/linux}/zsmalloc.h  |    0
 mm/Kconfig                                         |   18 +++
 mm/Makefile                                        |    1 +
 .../zsmalloc/zsmalloc-main.c => mm/zsmalloc.c      |  133 ++++++++++++++++-
 19 files changed, 170 insertions(+), 185 deletions(-)
 create mode 100644 drivers/mm/Kconfig
 create mode 100644 drivers/mm/Makefile
 rename drivers/{staging => mm}/zcache/Makefile (100%)
 rename drivers/{staging => mm}/zcache/tmem.c (100%)
 rename drivers/{staging => mm}/zcache/tmem.h (100%)
 rename drivers/{staging => mm}/zcache/zcache-main.c (99%)
 delete mode 100644 drivers/staging/zcache/Kconfig
 delete mode 100644 drivers/staging/zsmalloc/Kconfig
 delete mode 100644 drivers/staging/zsmalloc/Makefile
 delete mode 100644 drivers/staging/zsmalloc/zsmalloc_int.h
 rename {drivers/staging/zsmalloc => include/linux}/zsmalloc.h (100%)
 rename drivers/staging/zsmalloc/zsmalloc-main.c => mm/zsmalloc.c (86%)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/4] zsmalloc: collapse internal .h into .c
  2012-07-27 18:18 [PATCH 0/4] promote zcache from staging Seth Jennings
@ 2012-07-27 18:18 ` Seth Jennings
  2012-07-27 18:18 ` [PATCH 2/4] zsmalloc: promote to mm/ Seth Jennings
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Seth Jennings @ 2012-07-27 18:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Robert Jennings,
	linux-mm, linux-kernel, devel

This patch collapses the internal zsmalloc_int.h into
the zsmalloc-main.c file.

This is done in preparation for the promotion to mm/ where
separate internal headers are discouraged.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/staging/zsmalloc/zsmalloc-main.c |  132 +++++++++++++++++++++++++-
 drivers/staging/zsmalloc/zsmalloc_int.h  |  149 ------------------------------
 2 files changed, 131 insertions(+), 150 deletions(-)
 delete mode 100644 drivers/staging/zsmalloc/zsmalloc_int.h

diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index defe350..09a9d35 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -76,9 +76,139 @@
 #include <linux/cpu.h>
 #include <linux/vmalloc.h>
 #include <linux/hardirq.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
 
 #include "zsmalloc.h"
-#include "zsmalloc_int.h"
+
+/*
+ * This must be power of 2 and greater than or equal to sizeof(link_free).
+ * These two conditions ensure that any 'struct link_free' itself doesn't
+ * span more than 1 page which avoids complex case of mapping 2 pages simply
+ * to restore link_free pointer values.
+ */
+#define ZS_ALIGN		8
+
+/*
+ * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
+ * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
+ */
+#define ZS_MAX_ZSPAGE_ORDER 2
+#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
+
+/*
+ * Object location (<PFN>, <obj_idx>) is encoded
+ * as a single (void *) handle value.
+ *
+ * Note that object index <obj_idx> is relative to system
+ * page <PFN> it is stored in, so for each sub-page belonging
+ * to a zspage, obj_idx starts with 0.
+ *
+ * This is made more complicated by various memory models and PAE.
+ */
+
+#ifndef MAX_PHYSMEM_BITS
+#ifdef CONFIG_HIGHMEM64G
+#define MAX_PHYSMEM_BITS 36
+#else /* !CONFIG_HIGHMEM64G */
+/*
+ * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
+ * be PAGE_SHIFT
+ */
+#define MAX_PHYSMEM_BITS BITS_PER_LONG
+#endif
+#endif
+#define _PFN_BITS		(MAX_PHYSMEM_BITS - PAGE_SHIFT)
+#define OBJ_INDEX_BITS	(BITS_PER_LONG - _PFN_BITS)
+#define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
+
+#define MAX(a, b) ((a) >= (b) ? (a) : (b))
+/* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
+#define ZS_MIN_ALLOC_SIZE \
+	MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+#define ZS_MAX_ALLOC_SIZE	PAGE_SIZE
+
+/*
+ * On systems with 4K page size, this gives 254 size classes! There is a
+ * trade-off here:
+ *  - Large number of size classes is potentially wasteful as free pages are
+ *    spread across these classes
+ *  - Small number of size classes causes large internal fragmentation
+ *  - Probably it's better to use specific size classes (empirically
+ *    determined). NOTE: all those class sizes must be set as multiple of
+ *    ZS_ALIGN to make sure link_free itself never has to span 2 pages.
+ *
+ *  ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
+ *  (reason above)
+ */
+#define ZS_SIZE_CLASS_DELTA	16
+#define ZS_SIZE_CLASSES		((ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE) / \
+					ZS_SIZE_CLASS_DELTA + 1)
+
+/*
+ * We do not maintain any list for completely empty or full pages
+ */
+enum fullness_group {
+	ZS_ALMOST_FULL,
+	ZS_ALMOST_EMPTY,
+	_ZS_NR_FULLNESS_GROUPS,
+
+	ZS_EMPTY,
+	ZS_FULL
+};
+
+/*
+ * We assign a page to ZS_ALMOST_EMPTY fullness group when:
+ *	n <= N / f, where
+ * n = number of allocated objects
+ * N = total number of objects zspage can store
+ * f = 1/fullness_threshold_frac
+ *
+ * Similarly, we assign zspage to:
+ *	ZS_ALMOST_FULL	when n > N / f
+ *	ZS_EMPTY	when n == 0
+ *	ZS_FULL		when n == N
+ *
+ * (see: fix_fullness_group())
+ */
+static const int fullness_threshold_frac = 4;
+
+struct size_class {
+	/*
+	 * Size of objects stored in this class. Must be multiple
+	 * of ZS_ALIGN.
+	 */
+	int size;
+	unsigned int index;
+
+	/* Number of PAGE_SIZE sized pages to combine to form a 'zspage' */
+	int pages_per_zspage;
+
+	spinlock_t lock;
+
+	/* stats */
+	u64 pages_allocated;
+
+	struct page *fullness_list[_ZS_NR_FULLNESS_GROUPS];
+};
+
+/*
+ * Placed within free objects to form a singly linked list.
+ * For every zspage, first_page->freelist gives head of this list.
+ *
+ * This must be power of 2 and less than or equal to ZS_ALIGN
+ */
+struct link_free {
+	/* Handle of next free chunk (encodes <PFN, obj_idx>) */
+	void *next;
+};
+
+struct zs_pool {
+	struct size_class size_class[ZS_SIZE_CLASSES];
+
+	gfp_t flags;	/* allocation flags used when growing pool */
+	const char *name;
+};
 
 /*
  * A zspage's class index and fullness group
diff --git a/drivers/staging/zsmalloc/zsmalloc_int.h b/drivers/staging/zsmalloc/zsmalloc_int.h
deleted file mode 100644
index 8c0b344..0000000
--- a/drivers/staging/zsmalloc/zsmalloc_int.h
+++ /dev/null
@@ -1,149 +0,0 @@
-/*
- * zsmalloc memory allocator
- *
- * Copyright (C) 2011  Nitin Gupta
- *
- * This code is released using a dual license strategy: BSD/GPL
- * You can choose the license that better fits your requirements.
- *
- * Released under the terms of 3-clause BSD License
- * Released under the terms of GNU General Public License Version 2.0
- */
-
-#ifndef _ZS_MALLOC_INT_H_
-#define _ZS_MALLOC_INT_H_
-
-#include <linux/kernel.h>
-#include <linux/spinlock.h>
-#include <linux/types.h>
-
-/*
- * This must be power of 2 and greater than of equal to sizeof(link_free).
- * These two conditions ensure that any 'struct link_free' itself doesn't
- * span more than 1 page which avoids complex case of mapping 2 pages simply
- * to restore link_free pointer values.
- */
-#define ZS_ALIGN		8
-
-/*
- * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
- * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
- */
-#define ZS_MAX_ZSPAGE_ORDER 2
-#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
-
-/*
- * Object location (<PFN>, <obj_idx>) is encoded as
- * as single (void *) handle value.
- *
- * Note that object index <obj_idx> is relative to system
- * page <PFN> it is stored in, so for each sub-page belonging
- * to a zspage, obj_idx starts with 0.
- *
- * This is made more complicated by various memory models and PAE.
- */
-
-#ifndef MAX_PHYSMEM_BITS
-#ifdef CONFIG_HIGHMEM64G
-#define MAX_PHYSMEM_BITS 36
-#else /* !CONFIG_HIGHMEM64G */
-/*
- * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
- * be PAGE_SHIFT
- */
-#define MAX_PHYSMEM_BITS BITS_PER_LONG
-#endif
-#endif
-#define _PFN_BITS		(MAX_PHYSMEM_BITS - PAGE_SHIFT)
-#define OBJ_INDEX_BITS	(BITS_PER_LONG - _PFN_BITS)
-#define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
-
-#define MAX(a, b) ((a) >= (b) ? (a) : (b))
-/* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
-#define ZS_MIN_ALLOC_SIZE \
-	MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
-#define ZS_MAX_ALLOC_SIZE	PAGE_SIZE
-
-/*
- * On systems with 4K page size, this gives 254 size classes! There is a
- * trader-off here:
- *  - Large number of size classes is potentially wasteful as free page are
- *    spread across these classes
- *  - Small number of size classes causes large internal fragmentation
- *  - Probably its better to use specific size classes (empirically
- *    determined). NOTE: all those class sizes must be set as multiple of
- *    ZS_ALIGN to make sure link_free itself never has to span 2 pages.
- *
- *  ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
- *  (reason above)
- */
-#define ZS_SIZE_CLASS_DELTA	16
-#define ZS_SIZE_CLASSES		((ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE) / \
-					ZS_SIZE_CLASS_DELTA + 1)
-
-/*
- * We do not maintain any list for completely empty or full pages
- */
-enum fullness_group {
-	ZS_ALMOST_FULL,
-	ZS_ALMOST_EMPTY,
-	_ZS_NR_FULLNESS_GROUPS,
-
-	ZS_EMPTY,
-	ZS_FULL
-};
-
-/*
- * We assign a page to ZS_ALMOST_EMPTY fullness group when:
- *	n <= N / f, where
- * n = number of allocated objects
- * N = total number of objects zspage can store
- * f = 1/fullness_threshold_frac
- *
- * Similarly, we assign zspage to:
- *	ZS_ALMOST_FULL	when n > N / f
- *	ZS_EMPTY	when n == 0
- *	ZS_FULL		when n == N
- *
- * (see: fix_fullness_group())
- */
-static const int fullness_threshold_frac = 4;
-
-struct size_class {
-	/*
-	 * Size of objects stored in this class. Must be multiple
-	 * of ZS_ALIGN.
-	 */
-	int size;
-	unsigned int index;
-
-	/* Number of PAGE_SIZE sized pages to combine to form a 'zspage' */
-	int pages_per_zspage;
-
-	spinlock_t lock;
-
-	/* stats */
-	u64 pages_allocated;
-
-	struct page *fullness_list[_ZS_NR_FULLNESS_GROUPS];
-};
-
-/*
- * Placed within free objects to form a singly linked list.
- * For every zspage, first_page->freelist gives head of this list.
- *
- * This must be power of 2 and less than or equal to ZS_ALIGN
- */
-struct link_free {
-	/* Handle of next free chunk (encodes <PFN, obj_idx>) */
-	void *next;
-};
-
-struct zs_pool {
-	struct size_class size_class[ZS_SIZE_CLASSES];
-
-	gfp_t flags;	/* allocation flags used when growing pool */
-	const char *name;
-};
-
-#endif
-- 
1.7.9.5



* [PATCH 2/4] zsmalloc: promote to mm/
  2012-07-27 18:18 [PATCH 0/4] promote zcache from staging Seth Jennings
  2012-07-27 18:18 ` [PATCH 1/4] zsmalloc: collapse internal .h into .c Seth Jennings
@ 2012-07-27 18:18 ` Seth Jennings
  2012-07-27 18:18 ` [PATCH 3/4] drivers: add memory management driver class Seth Jennings
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Seth Jennings @ 2012-07-27 18:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Robert Jennings,
	linux-mm, linux-kernel, devel

This patch promotes the slab-based zsmalloc memory allocator
from the staging tree to mm/.

zcache depends on this allocator for storing compressed RAM pages
in an efficient way under system-wide memory pressure, where
high-order (greater than 0) page allocations are very likely to
fail.

For more information on zsmalloc and its internals, read the
documentation at the top of the zsmalloc .c file.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/staging/Kconfig                            |    2 --
 drivers/staging/Makefile                           |    1 -
 drivers/staging/zcache/zcache-main.c               |    4 ++--
 drivers/staging/zram/zram_drv.h                    |    3 +--
 drivers/staging/zsmalloc/Kconfig                   |   10 ----------
 drivers/staging/zsmalloc/Makefile                  |    3 ---
 .../staging/zsmalloc => include/linux}/zsmalloc.h  |    0
 mm/Kconfig                                         |   18 ++++++++++++++++++
 mm/Makefile                                        |    1 +
 .../zsmalloc/zsmalloc-main.c => mm/zsmalloc.c      |    3 +--
 10 files changed, 23 insertions(+), 22 deletions(-)
 delete mode 100644 drivers/staging/zsmalloc/Kconfig
 delete mode 100644 drivers/staging/zsmalloc/Makefile
 rename {drivers/staging/zsmalloc => include/linux}/zsmalloc.h (100%)
 rename drivers/staging/zsmalloc/zsmalloc-main.c => mm/zsmalloc.c (99%)

diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index e3402d5..b7f7bc7 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -78,8 +78,6 @@ source "drivers/staging/zram/Kconfig"
 
 source "drivers/staging/zcache/Kconfig"
 
-source "drivers/staging/zsmalloc/Kconfig"
-
 source "drivers/staging/wlags49_h2/Kconfig"
 
 source "drivers/staging/wlags49_h25/Kconfig"
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index 3be59d0..ad74bee 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -34,7 +34,6 @@ obj-$(CONFIG_DX_SEP)            += sep/
 obj-$(CONFIG_IIO)		+= iio/
 obj-$(CONFIG_ZRAM)		+= zram/
 obj-$(CONFIG_ZCACHE)		+= zcache/
-obj-$(CONFIG_ZSMALLOC)		+= zsmalloc/
 obj-$(CONFIG_WLAGS49_H2)	+= wlags49_h2/
 obj-$(CONFIG_WLAGS49_H25)	+= wlags49_h25/
 obj-$(CONFIG_FB_SM7XX)		+= sm7xxfb/
diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
index c214977..06ce28f 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -32,9 +32,9 @@
 #include <linux/crypto.h>
 #include <linux/string.h>
 #include <linux/idr.h>
-#include "tmem.h"
+#include <linux/zsmalloc.h>
 
-#include "../zsmalloc/zsmalloc.h"
+#include "tmem.h"
 
 #ifdef CONFIG_CLEANCACHE
 #include <linux/cleancache.h>
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 572c0b1..f6d0925 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -17,8 +17,7 @@
 
 #include <linux/spinlock.h>
 #include <linux/mutex.h>
-
-#include "../zsmalloc/zsmalloc.h"
+#include <linux/zsmalloc.h>
 
 /*
  * Some arbitrary value. This is just to catch
diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
deleted file mode 100644
index 9084565..0000000
--- a/drivers/staging/zsmalloc/Kconfig
+++ /dev/null
@@ -1,10 +0,0 @@
-config ZSMALLOC
-	tristate "Memory allocator for compressed pages"
-	default n
-	help
-	  zsmalloc is a slab-based memory allocator designed to store
-	  compressed RAM pages.  zsmalloc uses virtual memory mapping
-	  in order to reduce fragmentation.  However, this results in a
-	  non-standard allocator interface where a handle, not a pointer, is
-	  returned by an alloc().  This handle must be mapped in order to
-	  access the allocated space.
diff --git a/drivers/staging/zsmalloc/Makefile b/drivers/staging/zsmalloc/Makefile
deleted file mode 100644
index b134848..0000000
--- a/drivers/staging/zsmalloc/Makefile
+++ /dev/null
@@ -1,3 +0,0 @@
-zsmalloc-y 		:= zsmalloc-main.o
-
-obj-$(CONFIG_ZSMALLOC)	+= zsmalloc.o
diff --git a/drivers/staging/zsmalloc/zsmalloc.h b/include/linux/zsmalloc.h
similarity index 100%
rename from drivers/staging/zsmalloc/zsmalloc.h
rename to include/linux/zsmalloc.h
diff --git a/mm/Kconfig b/mm/Kconfig
index d5c8019..2586b66 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -411,3 +411,21 @@ config FRONTSWAP
 	  and swap data is stored as normal on the matching swap device.
 
 	  If unsure, say Y to enable frontswap.
+
+config ZSMALLOC
+	tristate "Memory allocator for compressed pages"
+	default n
+	help
+	  zsmalloc is a slab-based memory allocator designed to store
+	  compressed RAM pages.  zsmalloc uses a memory pool that combines
+	  single pages into higher order pages by linking them together
+	  using the fields of the struct page. Allocations are then
+	  mapped through copy buffers or VM mapping, in order to reduce
+	  memory pool fragmentation and increase allocation success rate under
+	  memory pressure.
+
+	  This results in a non-standard allocator interface where
+	  a handle, not a pointer, is returned by the allocation function.
+	  This handle must be mapped in order to access the allocated space.
+
+	  If unsure, say N.
diff --git a/mm/Makefile b/mm/Makefile
index 92753e2..8a3d7bea 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -57,3 +57,4 @@ obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
 obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
 obj-$(CONFIG_CLEANCACHE) += cleancache.o
 obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
+obj-$(CONFIG_ZSMALLOC) += zsmalloc.o
diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/mm/zsmalloc.c
similarity index 99%
rename from drivers/staging/zsmalloc/zsmalloc-main.c
rename to mm/zsmalloc.c
index 09a9d35..6b20429 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/mm/zsmalloc.c
@@ -78,8 +78,7 @@
 #include <linux/hardirq.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
-
-#include "zsmalloc.h"
+#include <linux/zsmalloc.h>
 
 /*
 * This must be power of 2 and greater than or equal to sizeof(link_free).
-- 
1.7.9.5



* [PATCH 3/4] drivers: add memory management driver class
  2012-07-27 18:18 [PATCH 0/4] promote zcache from staging Seth Jennings
  2012-07-27 18:18 ` [PATCH 1/4] zsmalloc: collapse internal .h into .c Seth Jennings
  2012-07-27 18:18 ` [PATCH 2/4] zsmalloc: promote to mm/ Seth Jennings
@ 2012-07-27 18:18 ` Seth Jennings
  2012-07-31 15:31   ` Konrad Rzeszutek Wilk
  2012-07-27 18:18 ` [PATCH 4/4] zcache: promote to drivers/mm/ Seth Jennings
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 41+ messages in thread
From: Seth Jennings @ 2012-07-27 18:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Robert Jennings,
	linux-mm, linux-kernel, devel

This patch creates a new driver class under drivers/ for
memory management related drivers, like zcache.

This driver class is for drivers that don't actually enable
a hardware device, but rather augment the memory manager in some
way.

In-tree candidates for this driver class are zcache, zram, and
lowmemorykiller, all currently in staging.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/Kconfig    |    2 ++
 drivers/Makefile   |    1 +
 drivers/mm/Kconfig |    3 +++
 3 files changed, 6 insertions(+)
 create mode 100644 drivers/mm/Kconfig

diff --git a/drivers/Kconfig b/drivers/Kconfig
index ece958d..67fe7bd 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -152,4 +152,6 @@ source "drivers/vme/Kconfig"
 
 source "drivers/pwm/Kconfig"
 
+source "drivers/mm/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 5b42184..121742e 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -139,3 +139,4 @@ obj-$(CONFIG_EXTCON)		+= extcon/
 obj-$(CONFIG_MEMORY)		+= memory/
 obj-$(CONFIG_IIO)		+= iio/
 obj-$(CONFIG_VME_BUS)		+= vme/
+obj-$(CONFIG_MM_DRIVERS)	+= mm/
diff --git a/drivers/mm/Kconfig b/drivers/mm/Kconfig
new file mode 100644
index 0000000..e5b3743
--- /dev/null
+++ b/drivers/mm/Kconfig
@@ -0,0 +1,3 @@
+menu "Memory management drivers"
+
+endmenu
-- 
1.7.9.5



* [PATCH 4/4] zcache: promote to drivers/mm/
  2012-07-27 18:18 [PATCH 0/4] promote zcache from staging Seth Jennings
                   ` (2 preceding siblings ...)
  2012-07-27 18:18 ` [PATCH 3/4] drivers: add memory management driver class Seth Jennings
@ 2012-07-27 18:18 ` Seth Jennings
  2012-07-29  2:20 ` [PATCH 0/4] promote zcache from staging Minchan Kim
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Seth Jennings @ 2012-07-27 18:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Robert Jennings,
	linux-mm, linux-kernel, devel

This patch promotes the zcache driver from staging to drivers/mm/.

zcache captures swap pages via frontswap and pages that fall
out of the page cache via cleancache, compresses them in RAM,
providing a compressed RAM swap and a compressed second-chance
page cache.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/mm/Kconfig                           |   10 ++++++++++
 drivers/mm/Makefile                          |    1 +
 drivers/{staging => mm}/zcache/Makefile      |    0
 drivers/{staging => mm}/zcache/tmem.c        |    0
 drivers/{staging => mm}/zcache/tmem.h        |    0
 drivers/{staging => mm}/zcache/zcache-main.c |    0
 drivers/staging/Kconfig                      |    2 --
 drivers/staging/Makefile                     |    1 -
 drivers/staging/zcache/Kconfig               |   11 -----------
 9 files changed, 11 insertions(+), 14 deletions(-)
 create mode 100644 drivers/mm/Makefile
 rename drivers/{staging => mm}/zcache/Makefile (100%)
 rename drivers/{staging => mm}/zcache/tmem.c (100%)
 rename drivers/{staging => mm}/zcache/tmem.h (100%)
 rename drivers/{staging => mm}/zcache/zcache-main.c (100%)
 delete mode 100644 drivers/staging/zcache/Kconfig

diff --git a/drivers/mm/Kconfig b/drivers/mm/Kconfig
index e5b3743..22289c6 100644
--- a/drivers/mm/Kconfig
+++ b/drivers/mm/Kconfig
@@ -1,3 +1,13 @@
 menu "Memory management drivers"
 
+config ZCACHE
+	bool "Dynamic compression of swap pages and clean pagecache pages"
+	depends on (CLEANCACHE || FRONTSWAP) && CRYPTO=y && ZSMALLOC=y
+	select CRYPTO_LZO
+	default n
+	help
+	  Zcache uses compression and an in-kernel implementation of
+	  transcendent memory to store clean page cache pages and swap
+	  in RAM, providing a noticeable reduction in disk I/O.
+
 endmenu
diff --git a/drivers/mm/Makefile b/drivers/mm/Makefile
new file mode 100644
index 0000000..f36f509
--- /dev/null
+++ b/drivers/mm/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_ZCACHE)	+= zcache/
diff --git a/drivers/staging/zcache/Makefile b/drivers/mm/zcache/Makefile
similarity index 100%
rename from drivers/staging/zcache/Makefile
rename to drivers/mm/zcache/Makefile
diff --git a/drivers/staging/zcache/tmem.c b/drivers/mm/zcache/tmem.c
similarity index 100%
rename from drivers/staging/zcache/tmem.c
rename to drivers/mm/zcache/tmem.c
diff --git a/drivers/staging/zcache/tmem.h b/drivers/mm/zcache/tmem.h
similarity index 100%
rename from drivers/staging/zcache/tmem.h
rename to drivers/mm/zcache/tmem.h
diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/mm/zcache/zcache-main.c
similarity index 100%
rename from drivers/staging/zcache/zcache-main.c
rename to drivers/mm/zcache/zcache-main.c
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index b7f7bc7..0940d2e 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -76,8 +76,6 @@ source "drivers/staging/iio/Kconfig"
 
 source "drivers/staging/zram/Kconfig"
 
-source "drivers/staging/zcache/Kconfig"
-
 source "drivers/staging/wlags49_h2/Kconfig"
 
 source "drivers/staging/wlags49_h25/Kconfig"
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index ad74bee..6e1c491 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -33,7 +33,6 @@ obj-$(CONFIG_IPACK_BUS)		+= ipack/
 obj-$(CONFIG_DX_SEP)            += sep/
 obj-$(CONFIG_IIO)		+= iio/
 obj-$(CONFIG_ZRAM)		+= zram/
-obj-$(CONFIG_ZCACHE)		+= zcache/
 obj-$(CONFIG_WLAGS49_H2)	+= wlags49_h2/
 obj-$(CONFIG_WLAGS49_H25)	+= wlags49_h25/
 obj-$(CONFIG_FB_SM7XX)		+= sm7xxfb/
diff --git a/drivers/staging/zcache/Kconfig b/drivers/staging/zcache/Kconfig
deleted file mode 100644
index 4881839..0000000
--- a/drivers/staging/zcache/Kconfig
+++ /dev/null
@@ -1,11 +0,0 @@
-config ZCACHE
-	bool "Dynamic compression of swap pages and clean pagecache pages"
-	depends on (CLEANCACHE || FRONTSWAP) && CRYPTO=y && ZSMALLOC=y
-	select CRYPTO_LZO
-	default n
-	help
-	  Zcache doubles RAM efficiency while providing a significant
-	  performance boosts on many workloads.  Zcache uses
-	  compression and an in-kernel implementation of transcendent
-	  memory to store clean page cache pages and swap in RAM,
-	  providing a noticeable reduction in disk I/O.
-- 
1.7.9.5



* RE: [PATCH 0/4] promote zcache from staging
       [not found] <<1343413117-1989-1-git-send-email-sjenning@linux.vnet.ibm.com>
@ 2012-07-27 19:21 ` Dan Magenheimer
  2012-07-27 20:59   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 41+ messages in thread
From: Dan Magenheimer @ 2012-07-27 19:21 UTC (permalink / raw)
  To: Seth Jennings, Greg Kroah-Hartman
  Cc: Andrew Morton, Nitin Gupta, Minchan Kim, Konrad Rzeszutek Wilk,
	Dan Magenheimer, Robert Jennings, linux-mm, linux-kernel, devel

> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Subject: [PATCH 0/4] promote zcache from staging
> 
> zcache is the remaining piece of code required to support in-kernel
> memory compression.  The other two features, cleancache and frontswap,
> have been promoted to mainline in 3.0 and 3.5.  This patchset
> promotes zcache from the staging tree to mainline.
> 
> Based on the level of activity and contributions we're seeing from a
> diverse set of people and interests, I think zcache has matured to the
> point where it makes sense to promote this out of staging.

Hi Seth --

Per offline communication, I'd like to see this delayed for three
reasons:

1) I've completely rewritten zcache and will post the rewrite soon.
   The redesigned code fixes many of the weaknesses in zcache that
   make it (IMHO) unsuitable for an enterprise distro.  (Some of
   these were previously discussed on linux-mm [1].)
2) zcache is truly mm (memory management) code and the fact that
   it is in drivers at all was purely for logistical reasons
   (e.g. the only in-tree "staging" is in the drivers directory).
   My rewrite promotes it to (a subdirectory of) mm where IMHO it
   belongs.
3) Ramster heavily duplicates code from zcache.  My rewrite resolves
   this.  My soon-to-be-posted series also places the re-factored ramster
   in mm, though with some minor work zcache could go in mm and
   ramster could stay in staging.

Let's have this discussion, but unless the community decides
otherwise, please consider this a NACK.

Thanks,
Dan

[1] http://marc.info/?t=133886706700002&r=1&w=2


* Re: [PATCH 0/4] promote zcache from staging
  2012-07-27 19:21 ` Dan Magenheimer
@ 2012-07-27 20:59   ` Konrad Rzeszutek Wilk
  2012-07-27 21:42     ` Dan Magenheimer
  0 siblings, 1 reply; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-27 20:59 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Seth Jennings, Greg Kroah-Hartman, Andrew Morton, Nitin Gupta,
	Minchan Kim, Konrad Rzeszutek Wilk, Robert Jennings, linux-mm,
	linux-kernel, devel

On Fri, Jul 27, 2012 at 12:21:50PM -0700, Dan Magenheimer wrote:
> > From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> > Subject: [PATCH 0/4] promote zcache from staging
> > 
> > zcache is the remaining piece of code required to support in-kernel
> > memory compression.  The other two features, cleancache and frontswap,
> > have been promoted to mainline in 3.0 and 3.5.  This patchset
> > promotes zcache from the staging tree to mainline.
> > 
> > Based on the level of activity and contributions we're seeing from a
> > diverse set of people and interests, I think zcache has matured to the
> > point where it makes sense to promote this out of staging.
> 
> Hi Seth --
> 
> Per offline communication, I'd like to see this delayed for three
> reasons:
> 
> 1) I've completely rewritten zcache and will post the rewrite soon.
>    The redesigned code fixes many of the weaknesses in zcache that
>    makes it (IMHO) unsuitable for an enterprise distro.  (Some of
>    these previously discussed in linux-mm [1].)
> 2) zcache is truly mm (memory management) code and the fact that
>    it is in drivers at all was purely for logistical reasons
>    (e.g. the only in-tree "staging" is in the drivers directory).
>    My rewrite promotes it to (a subdirectory of) mm where IMHO it
>    belongs.
> 3) Ramster heavily duplicates code from zcache.  My rewrite resolves
>    this.  My soon-to-be-post also places the re-factored ramster
>    in mm, though with some minor work zcache could go in mm and
>    ramster could stay in staging.
> 
> Let's have this discussion, but unless the community decides
> otherwise, please consider this a NACK.

Hold on, that is rather unfair. zcache has been in staging
for quite some time - your code has not been posted. Part of
"unstaging" a driver is for folks to review the code - and you
just said "No, mine is better" without showing your goods.

There is a third option - which is to continue the promotion
of zcache from staging, get reviews, work on them, etc., and
alongside of that you can work on fixing up (or ripping out)
zcache1 with zcache2 components as they make sense. Or even
having two of them - an enterprise and an embedded version
that will eventually get merged together. There is nothing
wrong with modifying a driver once it has left staging.


* RE: [PATCH 0/4] promote zcache from staging
  2012-07-27 20:59   ` Konrad Rzeszutek Wilk
@ 2012-07-27 21:42     ` Dan Magenheimer
  2012-07-29  1:54       ` Minchan Kim
  2012-07-30 19:19       ` Seth Jennings
  0 siblings, 2 replies; 41+ messages in thread
From: Dan Magenheimer @ 2012-07-27 21:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Seth Jennings, Greg Kroah-Hartman, Andrew Morton, Nitin Gupta,
	Minchan Kim, Konrad Wilk, Robert Jennings, linux-mm,
	linux-kernel, devel

> From: Konrad Rzeszutek Wilk [mailto:konrad@darnok.org]
> Sent: Friday, July 27, 2012 3:00 PM
> Subject: Re: [PATCH 0/4] promote zcache from staging
> 
> On Fri, Jul 27, 2012 at 12:21:50PM -0700, Dan Magenheimer wrote:
> > > From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> > > Subject: [PATCH 0/4] promote zcache from staging
> > >
> > > zcache is the remaining piece of code required to support in-kernel
> > > memory compression.  The other two features, cleancache and frontswap,
> > > have been promoted to mainline in 3.0 and 3.5.  This patchset
> > > promotes zcache from the staging tree to mainline.
> > >
> > > Based on the level of activity and contributions we're seeing from a
> > > diverse set of people and interests, I think zcache has matured to the
> > > point where it makes sense to promote this out of staging.
> >
> > Hi Seth --
> >
> > Per offline communication, I'd like to see this delayed for three
> > reasons:
> >
> > 1) I've completely rewritten zcache and will post the rewrite soon.
> >    The redesigned code fixes many of the weaknesses in zcache that
> >    makes it (IMHO) unsuitable for an enterprise distro.  (Some of
> >    these previously discussed in linux-mm [1].)
> > 2) zcache is truly mm (memory management) code and the fact that
> >    it is in drivers at all was purely for logistical reasons
> >    (e.g. the only in-tree "staging" is in the drivers directory).
> >    My rewrite promotes it to (a subdirectory of) mm where IMHO it
> >    belongs.
> > 3) Ramster heavily duplicates code from zcache.  My rewrite resolves
> >    this.  My soon-to-be-post also places the re-factored ramster
> >    in mm, though with some minor work zcache could go in mm and
> >    ramster could stay in staging.
> >
> > Let's have this discussion, but unless the community decides
> > otherwise, please consider this a NACK.

Hi Konrad --
 
> Hold on, that is rather unfair. The zcache has been in staging
> for quite some time - your code has not been posted. Part of
> "unstaging" a driver is for folks to review the code - and you
> just said "No, mine is better" without showing your goods.

Sorry, I'm not trying to be unfair.  However, I don't see the point
of promoting zcache out of staging unless it is intended to be used
by real users in a real distro.  There's been a lot of discussion,
onlist and offlist, about what needs to be fixed in zcache and not
much visible progress on fixing it.  But fixing it is where I've spent
most of my time over the last couple of months.

If IBM or some other company or distro is eager to ship and support
zcache in its current form, I agree that "promote now, improve later"
is a fine approach.  But promoting zcache out of staging simply because
there is urgency to promote zsmalloc+zram out of staging doesn't
seem wise.  At a minimum, it distracts reviewers/effort from what IMHO
is required to turn zcache into an enterprise-ready kernel feature.

I can post my "goods" anytime.  In its current form it is better
than the zcache in staging (and, please remember, I wrote both so
I think I am in a good position to compare the two).
I have been waiting until I think the new zcache is feature complete
before asking for review, especially since the newest features
should demonstrate clearly why the rewrite is necessary and
beneficial.  But I can post* my current bits if people don't
believe they exist and/or don't mind reviewing non-final code.
(* Or I can put them in a publicly available git tree.)

> There is a third option - which is to continue the promotion
> of zcache from staging, get reviews, work on them, etc., and
> alongside of that you can work on fixing up (or ripping out)
> zcache1 with zcache2 components as they make sense. Or even
> having two of them - an enterprise and an embedded version
> that will eventually get merged together. There is nothing
> wrong with modifying a driver once it has left staging.

Minchan and Seth can correct me if I am wrong, but I believe
zram+zsmalloc, not zcache, is the target solution for embedded.
The limitations of zsmalloc aren't an issue for zram but they are
for zcache, and this deficiency was one of the catalysts for the
rewrite.  The issues are explained in more detail in [1],
but if any point isn't clear, I'd be happy to explain further.

However, I have limited time for this right now and I'd prefer
to spend it finishing the code. :-}

So, as I said, I am still a NACK, but if there are good reasons
to duplicate effort and pursue the "third option", let's discuss
them.

Thanks,
Dan

[1] http://marc.info/?t=133886706700002&r=1&w=2


* Re: [PATCH 0/4] promote zcache from staging
  2012-07-27 21:42     ` Dan Magenheimer
@ 2012-07-29  1:54       ` Minchan Kim
  2012-07-31 15:36         ` Konrad Rzeszutek Wilk
  2012-07-30 19:19       ` Seth Jennings
  1 sibling, 1 reply; 41+ messages in thread
From: Minchan Kim @ 2012-07-29  1:54 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Konrad Rzeszutek Wilk, Seth Jennings, Greg Kroah-Hartman,
	Andrew Morton, Nitin Gupta, Konrad Wilk, Robert Jennings,
	linux-mm, linux-kernel, devel

On Fri, Jul 27, 2012 at 02:42:14PM -0700, Dan Magenheimer wrote:
> > From: Konrad Rzeszutek Wilk [mailto:konrad@darnok.org]
> > Sent: Friday, July 27, 2012 3:00 PM
> > Subject: Re: [PATCH 0/4] promote zcache from staging
> > 
> > On Fri, Jul 27, 2012 at 12:21:50PM -0700, Dan Magenheimer wrote:
> > > > From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> > > > Subject: [PATCH 0/4] promote zcache from staging
> > > >
> > > > zcache is the remaining piece of code required to support in-kernel
> > > > memory compression.  The other two features, cleancache and frontswap,
> > > > have been promoted to mainline in 3.0 and 3.5.  This patchset
> > > > promotes zcache from the staging tree to mainline.
> > > >
> > > > Based on the level of activity and contributions we're seeing from a
> > > > diverse set of people and interests, I think zcache has matured to the
> > > > point where it makes sense to promote this out of staging.
> > >
> > > Hi Seth --
> > >
> > > Per offline communication, I'd like to see this delayed for three
> > > reasons:
> > >
> > > 1) I've completely rewritten zcache and will post the rewrite soon.
> > >    The redesigned code fixes many of the weaknesses in zcache that
> > >    makes it (IMHO) unsuitable for an enterprise distro.  (Some of
> > >    these previously discussed in linux-mm [1].)
> > > 2) zcache is truly mm (memory management) code and the fact that
> > >    it is in drivers at all was purely for logistical reasons
> > >    (e.g. the only in-tree "staging" is in the drivers directory).
> > >    My rewrite promotes it to (a subdirectory of) mm where IMHO it
> > >    belongs.
> > > 3) Ramster heavily duplicates code from zcache.  My rewrite resolves
> > >    this.  My soon-to-be-post also places the re-factored ramster
> > >    in mm, though with some minor work zcache could go in mm and
> > >    ramster could stay in staging.
> > >
> > > Let's have this discussion, but unless the community decides
> > > otherwise, please consider this a NACK.
> 
> Hi Konrad --
>  
> > Hold on, that is rather unfair. The zcache has been in staging
> > for quite some time - your code has not been posted. Part of
> > "unstaging" a driver is for folks to review the code - and you
> > just said "No, mine is better" without showing your goods.
> 
> Sorry, I'm not trying to be unfair.  However, I don't see the point
> of promoting zcache out of staging unless it is intended to be used
> by real users in a real distro.  There's been a lot of discussion,
> onlist and offlist, about what needs to be fixed in zcache and not
> much visible progress on fixing it.  But fixing it is where I've spent
> most of my time over the last couple of months.
> 
> If IBM or some other company or distro is eager to ship and support
> zcache in its current form, I agree that "promote now, improve later"
> is a fine approach.  But promoting zcache out of staging simply because
> there is urgency to promote zsmalloc+zram out of staging doesn't
> seem wise.  At a minimum, it distracts reviewers/effort from what IMHO
> is required to turn zcache into an enterprise-ready kernel feature.
> 
> I can post my "goods" anytime.  In its current form it is better
> than the zcache in staging (and, please remember, I wrote both so
> I think I am in a good position to compare the two).
> I have been waiting until I think the new zcache is feature complete
> before asking for review, especially since the newest features
> should demonstrate clearly why the rewrite is necessary and
> beneficial.  But I can post* my current bits if people don't
> believe they exist and/or don't mind reviewing non-final code.
> (* Or I can put them in a publicly available git tree.)
> 
> > There is a third option - which is to continue the promotion
> > of zcache from staging, get reviews, work on them, etc., and
> > alongside of that you can work on fixing up (or ripping out)
> > zcache1 with zcache2 components as they make sense. Or even
> > having two of them - an enterprise and an embedded version
> > that will eventually get merged together. There is nothing
> > wrong with modifying a driver once it has left staging.
> 
> Minchan and Seth can correct me if I am wrong, but I believe
> zram+zsmalloc, not zcache, is the target solution for embedded.

Not true. Some embedded devices do use zcache, though not the
original zcache but a modified one.
Anyway, even though embedded people use a modified zcache, I lean
toward Dan's view. I admit I haven't spent a lot of time looking at
zcache, but from what I saw of the code, it wasn't in good shape and
even had a bug found during code review, so I feel strongly we should
clean it up before promoting it to mm/.
So I would like to wait for Dan's posting if you guys aren't in a hurry.
(And I am not sure akpm would accept zcache in its current shape.)
My concern is about adding new features, though. I guess that could
spark a long debate and delay the promotion again, which I don't
think is what Seth wants.
I hope Dan doesn't mix the cleanup series with the new-feature series,
and posts the cleanup series as soon as possible.  Let's clean up
first, try to promote it, and add new features or change algorithms
later.


> The limitations of zsmalloc aren't an issue for zram but they are
> for zcache, and this deficiency was one of the catalysts for the
> rewrite.  The issues are explained in more detail in [1],
> but if any point isn't clear, I'd be happy to explain further.
> 
> However, I have limited time for this right now and I'd prefer
> to spend it finishing the code. :-}
> 
> So, as I said, I am still a NACK, but if there are good reasons
> to duplicate effort and pursue the "third option", let's discuss
> them.
> 
> Thanks,
> Dan
> 
> [1] http://marc.info/?t=133886706700002&r=1&w=2
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 0/4] promote zcache from staging
  2012-07-27 18:18 [PATCH 0/4] promote zcache from staging Seth Jennings
                   ` (3 preceding siblings ...)
  2012-07-27 18:18 ` [PATCH 4/4] zcache: promote to drivers/mm/ Seth Jennings
@ 2012-07-29  2:20 ` Minchan Kim
  2012-08-07 20:23 ` Seth Jennings
  2012-08-14 22:18 ` Seth Jennings
  6 siblings, 0 replies; 41+ messages in thread
From: Minchan Kim @ 2012-07-29  2:20 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Robert Jennings,
	linux-mm, linux-kernel, devel

Hi Seth,

Promoting zcache out of staging is rather controversial, as you can see
from this thread. But I believe zram is very mature and its code and
comments are clean. In addition, it has lots of real customers on the
embedded side, so IMHO it would be easier to promote it first. Of course,
that would also promote zsmalloc, which is half of what you want. What
do you think? If you agree, could you do that first?
If you'd rather not, and promoting zcache continues to be controversial,
I will do it after my vacation.

Thanks.

On Fri, Jul 27, 2012 at 01:18:33PM -0500, Seth Jennings wrote:
> zcache is the remaining piece of code required to support in-kernel
> memory compression.  The other two features, cleancache and frontswap,
> have been promoted to mainline in 3.0 and 3.5.  This patchset
> promotes zcache from the staging tree to mainline.
> 
> Based on the level of activity and contributions we're seeing from a
> diverse set of people and interests, I think zcache has matured to the
> point where it makes sense to promote this out of staging.
> 
> Overview
> ========
> zcache is a backend to frontswap and cleancache that accepts pages from
> those mechanisms and compresses them, leading to reduced I/O caused by
> swap and file re-reads.  This is very valuable in shared storage situations
> to reduce load on things like SANs.  Also, in the case of slow backing/swap
> devices, zcache can also yield a performance gain.
> 
> In-Kernel Memory Compression Overview:
> 
>  swap subsystem            page cache
>         +                      +
>     frontswap              cleancache
>         +                      +
> zcache frontswap glue  zcache cleancache glue
>         +                      +
>         +---------+------------+
>                   +
>             zcache/tmem core
>                   +
>         +---------+------------+
>         +                      +
>      zsmalloc                 zbud
> 
> Everything below the frontswap/cleancache layer is currently inside the
> zcache driver except for zsmalloc, which is shared between zcache and
> another memory compression driver, zram.
> 
> Since zcache is dependent on zsmalloc, it is also being promoted by this
> patchset.
> 
> For information on zsmalloc and the rationale behind its design and use
> cases versus already-existing allocators in the kernel:
> 
> https://lkml.org/lkml/2012/1/9/386
> 
> zsmalloc is the allocator used by zcache to store persistent pages that
> come from frontswap, as opposed to zbud, which is the (internal) allocator
> used for ephemeral pages from cleancache.
> 
> zsmalloc uses many fields of the page struct to create its conceptual
> high-order page called a zspage.  Exactly which fields are used and for
> what purpose is documented at the top of the zsmalloc .c file.  Because
> zsmalloc uses struct page extensively, Andrew advised that the
> promotion location be mm/:
> 
> https://lkml.org/lkml/2012/1/20/308
> 
> Some benchmarking numbers demonstrating the I/O saving that can be had
> with zcache:
> 
> https://lkml.org/lkml/2012/3/22/383
> 
> Dan's presentation at LSF/MM this year on zcache:
> 
> http://oss.oracle.com/projects/tmem/dist/documentation/presentations/LSFMM12-zcache-final.pdf
> 
> This patchset is based on next-20120727 + 3-part zsmalloc patchset below
> 
> https://lkml.org/lkml/2012/7/18/353
> 
> The zsmalloc patchset is already acked and will be integrated by Greg after
> 3.6-rc1 is out.
> 
> Seth Jennings (4):
>   zsmalloc: collapse internal .h into .c
>   zsmalloc: promote to mm/
>   drivers: add memory management driver class
>   zcache: promote to drivers/mm/
> 
>  drivers/Kconfig                                    |    2 +
>  drivers/Makefile                                   |    1 +
>  drivers/mm/Kconfig                                 |   13 ++
>  drivers/mm/Makefile                                |    1 +
>  drivers/{staging => mm}/zcache/Makefile            |    0
>  drivers/{staging => mm}/zcache/tmem.c              |    0
>  drivers/{staging => mm}/zcache/tmem.h              |    0
>  drivers/{staging => mm}/zcache/zcache-main.c       |    4 +-
>  drivers/staging/Kconfig                            |    4 -
>  drivers/staging/Makefile                           |    2 -
>  drivers/staging/zcache/Kconfig                     |   11 --
>  drivers/staging/zram/zram_drv.h                    |    3 +-
>  drivers/staging/zsmalloc/Kconfig                   |   10 --
>  drivers/staging/zsmalloc/Makefile                  |    3 -
>  drivers/staging/zsmalloc/zsmalloc_int.h            |  149 --------------------
>  .../staging/zsmalloc => include/linux}/zsmalloc.h  |    0
>  mm/Kconfig                                         |   18 +++
>  mm/Makefile                                        |    1 +
>  .../zsmalloc/zsmalloc-main.c => mm/zsmalloc.c      |  133 ++++++++++++++++-
>  19 files changed, 170 insertions(+), 185 deletions(-)
>  create mode 100644 drivers/mm/Kconfig
>  create mode 100644 drivers/mm/Makefile
>  rename drivers/{staging => mm}/zcache/Makefile (100%)
>  rename drivers/{staging => mm}/zcache/tmem.c (100%)
>  rename drivers/{staging => mm}/zcache/tmem.h (100%)
>  rename drivers/{staging => mm}/zcache/zcache-main.c (99%)
>  delete mode 100644 drivers/staging/zcache/Kconfig
>  delete mode 100644 drivers/staging/zsmalloc/Kconfig
>  delete mode 100644 drivers/staging/zsmalloc/Makefile
>  delete mode 100644 drivers/staging/zsmalloc/zsmalloc_int.h
>  rename {drivers/staging/zsmalloc => include/linux}/zsmalloc.h (100%)
>  rename drivers/staging/zsmalloc/zsmalloc-main.c => mm/zsmalloc.c (86%)
> 
> -- 
> 1.7.9.5
> 

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 0/4] promote zcache from staging
  2012-07-27 21:42     ` Dan Magenheimer
  2012-07-29  1:54       ` Minchan Kim
@ 2012-07-30 19:19       ` Seth Jennings
  2012-07-30 20:48         ` Dan Magenheimer
  1 sibling, 1 reply; 41+ messages in thread
From: Seth Jennings @ 2012-07-30 19:19 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Konrad Rzeszutek Wilk, Greg Kroah-Hartman, Andrew Morton,
	Nitin Gupta, Minchan Kim, Konrad Wilk, Robert Jennings, linux-mm,
	linux-kernel, devel

Dan,

I started writing inline responses to each concern but that
was adding more confusion than clarity.  I would like to
focus the discussion.

The purpose of this patchset is to discuss the inclusion of
zcache into mainline during the 3.7 merge window.  zcache
has been in staging since v2.6.39 and has been maturing with
contributions from 15 developers (5 with multiple commits)
working on improvements and bug fixes.

I want good code in the kernel, so if there are particular
areas that need attention before it's of acceptable quality
for mainline we need that discussion.  I am eager to have
customers using memory compression with zcache but before
that I want to see zcache in mainline.

We agree with Konrad that zcache should be promoted before
additional features are included.  Greg has also expressed
that he would like promotion before attempting to add
additional features [1].  Including new features now, while
in the staging tree, adds to the complexity and difficulty
of reverifying zcache and getting it accepted into mainline.

[1] https://lkml.org/lkml/2012/3/16/472

Let's have this discussion.  If there are specific issues
that need to be addressed to get this ready for mainline
let's take them one-by-one and line-by-line with patches.

Seth



* RE: [PATCH 0/4] promote zcache from staging
  2012-07-30 19:19       ` Seth Jennings
@ 2012-07-30 20:48         ` Dan Magenheimer
  2012-07-31 15:58           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 41+ messages in thread
From: Dan Magenheimer @ 2012-07-30 20:48 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Konrad Rzeszutek Wilk, Greg Kroah-Hartman, Andrew Morton,
	Nitin Gupta, Minchan Kim, Konrad Wilk, Robert Jennings, linux-mm,
	linux-kernel, devel

> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Subject: Re: [PATCH 0/4] promote zcache from staging
> 
> Dan,
> 
> I started writing inline responses to each concern but that
> was adding more confusion than clarity.  I would like to
> focus the discussion.
>   :
> Let's have this discussion.  If there are specific issues
> that need to be addressed to get this ready for mainline
> let's take them one-by-one and line-by-line with patches.

Hi Seth --

Thanks for your response and for your passion.

The first discussion I think is about whether zsmalloc is
a suitable allocator for zcache.  In its current state
in staging, zcache uses zbud for ephemeral (cleancache)
zpages and zsmalloc for persistent (frontswap) zpages.
I have raised concerns on-list that the capabilities
provided by zsmalloc are not suitable for supporting zcache
in an enterprise distro.  The author of zsmalloc concurred
and has (at least so far) not been available to enhance
zsmalloc, and you have taken a strong position that zsmalloc
needed to be "generic" (i.e. will never deliver the functionality
IMHO is necessary for zcache).  So I have rewritten zbud to
handle both kinds of zpages and, at the same time, to
resolve my stated issues.  This is the bulk of my
major rewrite... I don't think constructing and reviewing
a long series of one-by-one and line-by-line patches is
of much value here, especially since the current code is
in staging.  We can either (1) use the now-rewritten zbud, (2) wait
until someone rewrites zsmalloc, or (3) accept the deficiencies
of zcache in its current form.

The second discussion is whether ramster, as a "user" of
zcache, is relevant.  As you know, ramster is built on
top of zcache but requires a fair number of significant
changes that, due to gregkh's restriction, could not be
made directly to zcache while in staging.  In my rewrite,
I've taken a great deal of care that the "new" zcache
cleanly supports both.  While some couldn't care less about
ramster, the next step of ramster may be of more interest
to a broader part of the community.  So I am eager to
ensure that the core zcache code in zcache and ramster
doesn't need to "fork" again.  The zcache-main.c in staging/ramster
is farther along than the zcache-main.c in staging/zcache, but
IMHO my rewrite is better and cleaner than either.

Most of the rest of the cleanup, such as converting to debugfs
instead of sysfs, could be done as a sequence of one-by-one
and line-by-line patches.  I think we agree that zcache will
not be promoted unless this change is made, but IMHO constructing
and reviewing patches individually is not of much value since
the above zbud and ramster changes already result in a major
rewrite.  I think the community would benefit most from a new
solid code foundation for zcache and reviewers time (and your
time and mine) would best be spent grokking the new code than
from reviewing a very long sequence of cleanup patches.

> The purpose of this patchset is to discuss the inclusion of
> zcache into mainline during the 3.7 merge window.  zcache
> has been in staging since v2.6.39 and has been maturing with
> contributions from 15 developers (5 with multiple commits)
> working on improvements and bug fixes.
>
> I want good code in the kernel, so if there are particular
> areas that need attention before it's of acceptable quality
> for mainline we need that discussion.  I am eager to have
> customers using memory compression with zcache but before
> that I want to see zcache in mainline.

I think we are all eager to achieve the end result: real users
using zcache in real production systems.  IMHO your suggested
path will not achieve that, certainly not in the 3.7 timeframe.
The current code (IMHO) is neither suitable for promotion, nor
functionally capable of taking the beating of an enterprise distro.

> We agree with Konrad that zcache should be promoted before
> additional features are included.  Greg has also expressed
> that he would like promotion before attempting to add
> additional features [1].  Including new features now, while
> in the staging tree, adds to the complexity and difficulty
> of reverifying zcache and getting it accepted into mainline.
> 
> [1] https://lkml.org/lkml/2012/3/16/472

Zcache as submitted to staging in 2.6.39 was (and is) a working
proof-of-concept.  As you know, Greg's position created a
"catch 22"... zcache in its current state isn't good enough
to be promoted, but we can't change it substantially to resolve
its deficiencies while it is still in staging.  (Minchan
recently stated that he doesn't think it is in good enough
shape to be approved by Andrew, and I agree.)  That's why I
embarked on the rewrite.

Lastly, I'm not so much "adding new features" as ensuring the
new zcache foundation will be sufficient to support enterprise
users.  But I do now agree with Minchan (and I think with you)
that I need to post where I'm at, even if I am not 100% ready or
satisfied.  I'll try to do that by the end of the week.

Thanks,
Dan


* Re: [PATCH 3/4] drivers: add memory management driver class
  2012-07-27 18:18 ` [PATCH 3/4] drivers: add memory management driver class Seth Jennings
@ 2012-07-31 15:31   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-31 15:31 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Dan Magenheimer, Robert Jennings, linux-mm, linux-kernel, devel

On Fri, Jul 27, 2012 at 01:18:36PM -0500, Seth Jennings wrote:
> This patchset creates a new driver class under drivers/ for
> memory management related drivers, like zcache.

I was going back and forth with Dan on whether it should be in mm/
or in drivers/mm.
> 
> This driver class would be for drivers that don't actually enable
> a hardware device, but rather augment the memory manager in some
> way.
> 
> In-tree candidates for this driver class are zcache, zram, and
> lowmemorykiller, all currently in staging.

But with so many (well, three of them), I think sticking them in
drivers/mm makes more sense.
> 
> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> ---
>  drivers/Kconfig    |    2 ++
>  drivers/Makefile   |    1 +
>  drivers/mm/Kconfig |    3 +++
>  3 files changed, 6 insertions(+)
>  create mode 100644 drivers/mm/Kconfig
> 
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index ece958d..67fe7bd 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -152,4 +152,6 @@ source "drivers/vme/Kconfig"
>  
>  source "drivers/pwm/Kconfig"
>  
> +source "drivers/mm/Kconfig"
> +
>  endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index 5b42184..121742e 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -139,3 +139,4 @@ obj-$(CONFIG_EXTCON)		+= extcon/
>  obj-$(CONFIG_MEMORY)		+= memory/
>  obj-$(CONFIG_IIO)		+= iio/
>  obj-$(CONFIG_VME_BUS)		+= vme/
> +obj-$(CONFIG_MM_DRIVERS)	+= mm/
> diff --git a/drivers/mm/Kconfig b/drivers/mm/Kconfig
> new file mode 100644
> index 0000000..e5b3743
> --- /dev/null
> +++ b/drivers/mm/Kconfig
> @@ -0,0 +1,3 @@
> +menu "Memory management drivers"
> +
> +endmenu
> -- 
> 1.7.9.5


* Re: [PATCH 0/4] promote zcache from staging
  2012-07-29  1:54       ` Minchan Kim
@ 2012-07-31 15:36         ` Konrad Rzeszutek Wilk
  2012-08-06  4:49           ` Minchan Kim
  0 siblings, 1 reply; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-31 15:36 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Dan Magenheimer, Konrad Rzeszutek Wilk, Seth Jennings,
	Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Robert Jennings,
	linux-mm, linux-kernel, devel

On Sun, Jul 29, 2012 at 10:54:28AM +0900, Minchan Kim wrote:
> On Fri, Jul 27, 2012 at 02:42:14PM -0700, Dan Magenheimer wrote:
> > > From: Konrad Rzeszutek Wilk [mailto:konrad@darnok.org]
> > > Sent: Friday, July 27, 2012 3:00 PM
> > > Subject: Re: [PATCH 0/4] promote zcache from staging
> > > 
> > > On Fri, Jul 27, 2012 at 12:21:50PM -0700, Dan Magenheimer wrote:
> > > > > From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> > > > > Subject: [PATCH 0/4] promote zcache from staging
> > > > >
> > > > > zcache is the remaining piece of code required to support in-kernel
> > > > > memory compression.  The other two features, cleancache and frontswap,
> > > > > have been promoted to mainline in 3.0 and 3.5.  This patchset
> > > > > promotes zcache from the staging tree to mainline.
> > > > >
> > > > > Based on the level of activity and contributions we're seeing from a
> > > > > diverse set of people and interests, I think zcache has matured to the
> > > > > point where it makes sense to promote this out of staging.
> > > >
> > > > Hi Seth --
> > > >
> > > > Per offline communication, I'd like to see this delayed for three
> > > > reasons:
> > > >
> > > > 1) I've completely rewritten zcache and will post the rewrite soon.
> > > >    The redesigned code fixes many of the weaknesses in zcache that
> > > >    makes it (IMHO) unsuitable for an enterprise distro.  (Some of
> > > >    these previously discussed in linux-mm [1].)
> > > > 2) zcache is truly mm (memory management) code and the fact that
> > > >    it is in drivers at all was purely for logistical reasons
> > > >    (e.g. the only in-tree "staging" is in the drivers directory).
> > > >    My rewrite promotes it to (a subdirectory of) mm where IMHO it
> > > >    belongs.
> > > > 3) Ramster heavily duplicates code from zcache.  My rewrite resolves
> > > >    this.  My soon-to-be-post also places the re-factored ramster
> > > >    in mm, though with some minor work zcache could go in mm and
> > > >    ramster could stay in staging.
> > > >
> > > > Let's have this discussion, but unless the community decides
> > > > otherwise, please consider this a NACK.
> > 
> > Hi Konrad --
> >  
> > > Hold on, that is rather unfair. The zcache has been in staging
> > > for quite some time - your code has not been posted. Part of
> > > "unstaging" a driver is for folks to review the code - and you
> > > just said "No, mine is better" without showing your goods.
> > 
> > Sorry, I'm not trying to be unfair.  However, I don't see the point
> > of promoting zcache out of staging unless it is intended to be used
> > by real users in a real distro.  There's been a lot of discussion,
> > onlist and offlist, about what needs to be fixed in zcache and not
> > much visible progress on fixing it.  But fixing it is where I've spent
> > most of my time over the last couple of months.
> > 
> > If IBM or some other company or distro is eager to ship and support
> > zcache in its current form, I agree that "promote now, improve later"
> > is a fine approach.  But promoting zcache out of staging simply because
> > there is urgency to promote zsmalloc+zram out of staging doesn't
> > seem wise.  At a minimum, it distracts reviewers/effort from what IMHO
> > is required to turn zcache into an enterprise-ready kernel feature.
> > 
> > I can post my "goods" anytime.  In its current form it is better
> > than the zcache in staging (and, please remember, I wrote both so
> > I think I am in a good position to compare the two).
> > I have been waiting until I think the new zcache is feature complete
> > before asking for review, especially since the newest features
> > should demonstrate clearly why the rewrite is necessary and
> > beneficial.  But I can post* my current bits if people don't
> > believe they exist and/or don't mind reviewing non-final code.
> > (* Or I can put them in a publicly available git tree.)
> > 
> > > There is a third option - which is to continue the promotion
> > > of zcache from staging, get reviews, work on them, etc., and
> > > alongside of that you can work on fixing up (or ripping out)
> > > zcache1 with zcache2 components as they make sense. Or even
> > > having two of them - an enterprise and an embedded version
> > > that will eventually get merged together. There is nothing
> > > wrong with modifying a driver once it has left staging.
> > 
> > Minchan and Seth can correct me if I am wrong, but I believe
> > zram+zsmalloc, not zcache, is the target solution for embedded.
> 
> NOT true. Some embedded devices do use zcache, but it's not the original
> zcache; it's a modified one.

What kind of modifications? Would it make sense to post the patches
for those modifications?

> Anyway, although embedded people use a modified zcache, I am biased toward Dan.
> I admit I haven't spent a lot of time looking at zcache, but from reading the
> code, it wasn't in good shape and even had a bug found during code review,
> and I felt strongly we should clean it up before promoting it to mm/.

Do you recall what the bugs were?

> So I would like to wait for Dan's posting if you guys are not in a hurry.
> (And I am not sure akpm would allow it in the current shape of the zcache code.)
> But the concern is about adding new features. I guess there might be some
> long debate, and that could prevent promotion again.
> I think that's not what Seth wants.
> I hope Dan doesn't mix the cleanup series and the new feature series, and
> posts the cleanup series as soon as possible, so let's clean up first and
> try to promote it; later, adding new features or changing algorithms
> is desirable.
> 
> 
> > The limitations of zsmalloc aren't an issue for zram but they are
> > for zcache, and this deficiency was one of the catalysts for the
> > rewrite.  The issues are explained in more detail in [1],
> > but if any point isn't clear, I'd be happy to explain further.
> > 
> > However, I have limited time for this right now and I'd prefer
> > to spend it finishing the code. :-}
> > 
> > So, as I said, I am still a NACK, but if there are good reasons
> > to duplicate effort and pursue the "third option", let's discuss
> > them.
> > 
> > Thanks,
> > Dan
> > 
> > [1] http://marc.info/?t=133886706700002&r=1&w=2
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
> -- 
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-07-30 20:48         ` Dan Magenheimer
@ 2012-07-31 15:58           ` Konrad Rzeszutek Wilk
  2012-07-31 16:19             ` Greg Kroah-Hartman
  0 siblings, 1 reply; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-31 15:58 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Seth Jennings, Konrad Rzeszutek Wilk, Greg Kroah-Hartman,
	Andrew Morton, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, linux-kernel, devel

On Mon, Jul 30, 2012 at 01:48:29PM -0700, Dan Magenheimer wrote:
> > From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> > Subject: Re: [PATCH 0/4] promote zcache from staging
> > 
> > Dan,
> > 
> > I started writing inline responses to each concern but that
> > was adding more confusion than clarity.  I would like to
> > focus the discussion.
> >   :
> > Let's have this discussion.  If there are specific issues
> > that need to be addressed to get this ready for mainline
> > let's take them one-by-one and line-by-line with patches.
> 
> Hi Seth --
> 
> Thanks for your response and for your passion.
> 
> The first discussion I think is about whether zsmalloc is
> a suitable allocator for zcache.  In its current state
> in staging, zcache uses zbud for ephemeral (cleancache)
> zpages and zsmalloc for persistent (frontswap) zpages.

OK, but - unstaging 'zsmalloc' is a different patchset.

> I have proposed concerns on-list that the capabilities
> provided by zsmalloc are not suitable for supporting zcache
> in an enterprise distro.  The author of zsmalloc concurred

The goal is to support _both_ enterprise and embedded.

But what you are saying sounds like it does not work in an enterprise
environment, which is not my experience. If you are saying that
the code should not be integrated before it works perfectly in both
classes, well, then a lot of other code in Linux should not
have been accepted; I would call those insufficiencies "bugs".
And bugs are natural, albeit pesky.

I think what you mean by "enterprise" is that you want zcache
to be enabled by default (or at least to be confident that it
can be) for everybody, and that it works quite well under
99% of workloads, while right now it covers only 98% (or some
other number) of workloads.

> and has (at least so far) not been available to enhance
> zsmalloc, and you have taken a strong position that zsmalloc
> needed to be "generic" (i.e. will never deliver the functionality
> IMHO is necessary for zcache).  So I have rewritten zbud to
> handle both kinds of zpages and, at the same time, to
> resolve my stated issues.  This is the bulk of my

OK, so the zbud rewrite is to remove the need for zsmalloc
and use zbud2 for both persistent and ephemeral pages.

> major rewrite... I don't think constructing and reviewing
> a long series of one-by-one and line-by-line patches is
> of much value here, especially since the current code is
> in staging.  We either (1) use the now rewritten zbud (2) wait
> until someone rewrites zsmalloc (3) accept the deficiencies
> of zcache in its current form.

This sounds like a Catch-22 :-) Greg would like to have the
TODO list finished, and it seems that one of the TODOs
is to have one engine instead of two for dealing with pages,
but at the same time not to add new features.

> 
> The second discussion is whether ramster, as a "user" of
> zcache, is relevant.  As you know, ramster is built on
> top of zcache but requires a fair number of significant
> changes that, due to gregkh's restriction, could not be
> made directly to zcache while in staging.  In my rewrite,
> I've taken a great deal of care that the "new" zcache
> cleanly supports both.  While some couldn't care less about
> ramster, the next step of ramster may be of more interest
> to a broader part of the community.  So I am eager to
> ensure that the core zcache code in zcache and ramster
> doesn't need to "fork" again.  The zcache-main.c in staging/ramster
> is farther along than the zcache-main.c in staging/zcache, but
> IMHO my rewrite is better and cleaner than either.

So in short you made zcache more modular?

> 
> Most of the rest of the cleanup, such as converting to debugfs
> instead of sysfs, could be done as a sequence of one-by-one
> and line-by-line patches.  I think we agree that zcache will

Sure. Though you could do it more wholesale, as a single sysfs->debugfs patch.

> not be promoted unless this change is made, but IMHO constructing
> and reviewing patches individually is not of much value since
> the above zbud and ramster changes already result in a major
> rewrite.  I think the community would benefit most from a new
> solid code foundation for zcache and reviewers time (and your
> time and mine) would best be spent grokking the new code than
> from reviewing a very long sequence of cleanup patches.

You are ignoring the value of the testing and performance
numbers that zcache has accumulated so far. With new code those
numbers are invalidated; that is throwing away good data.
Reviewing code based on the old code (and knowing how the
old code works) is, I think, easier than trying to understand new
code from scratch; at least one has a baseline to understand it.
But that might be just my opinion. Either way I am OK
looking at brand new code or old code, but I would end up
looking at the old code to answer questions like "Huh, I wonder how
we did that previously?", at which point it might make sense
just to have the patches broken up into small segments.

> 
> > The purpose of this patchset is to discuss the inclusion of
> > zcache into mainline during the 3.7 merge window.  zcache
> > has been a staging since v2.6.39 and has been maturing with
> > contributions from 15 developers (5 with multiple commits)
> > working on improvements and bug fixes.
> >
> > I want good code in the kernel, so if there are particular
> > areas that need attention before it's of acceptable quality
> > for mainline we need that discussion.  I am eager to have
> > customers using memory compression with zcache but before
> > that I want to see zcache in mainline.
> 
> I think we are all eager to achieve the end result: real users
> using zcache in real production systems.  IMHO your suggested
> path will not achieve that, certainly not in the 3.7 timeframe.
> The current code (IMHO) is neither suitable for promotion, nor
> functionally capable of taking the beating of an enterprise distro.

So we agree that it must be fixed, but disagree on how to fix it :-)

However, there are real folks in the embedded environment who use it
(with some modifications), so please don't call them "unreal users".

> 
> > We agree with Konrad that zcache should be promoted before
> > additional features are included.  Greg has also expressed
> > that he would like promotion before attempting to add
> > additional features [1].  Including new features now, while
> > in the staging tree, adds to the complexity and difficultly
> > of reverifying zcache and getting it accepted into mainline.
> > 
> > [1] https://lkml.org/lkml/2012/3/16/472
> 
> Zcache as submitted to staging in 2.6.39 was (and is) a working
> proof-of-concept.  As you know, Greg's position created a
> "catch 22"... zcache in its current state isn't good enough
> to be promoted, but we can't change it substantially to resolve
> its deficiencies while it is still in staging.  (Minchan
> recently stated that he doesn't think it is in good enough
> shape to be approved by Andrew, and I agree.)  That's why I
> embarked on the rewrite.
> 
> Lastly, I'm not so much "adding new features" as ensuring the
> new zcache foundation will be sufficient to support enterprise
> users.  But I do now agree with Minchan (and I think with you)
> that I need to post where I'm at, even if I am not 100% ready or
> satisfied.  I'll try to do that by the end of the week.

So in my head I feel that it is OK to:
1) address the concerns zcache has before it is unstaged
2) replace the two-engine system with a one-engine system
   (and see how well it behaves)
3) convert sysfs->debugfs as needed
4) other things as needed

I think we are getting hung up on what Greg said about adding features,
and the two-engine->one-engine change could be understood as that.
I think, though, that it is part of the staging effort to clean up the
existing issues. Let's see what Greg thinks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-07-31 15:58           ` Konrad Rzeszutek Wilk
@ 2012-07-31 16:19             ` Greg Kroah-Hartman
  2012-07-31 17:51               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 41+ messages in thread
From: Greg Kroah-Hartman @ 2012-07-31 16:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Dan Magenheimer, devel, Seth Jennings, linux-mm, linux-kernel,
	Minchan Kim, Konrad Rzeszutek Wilk, Andrew Morton,
	Robert Jennings, Nitin Gupta

On Tue, Jul 31, 2012 at 11:58:43AM -0400, Konrad Rzeszutek Wilk wrote:
> So in my head I feel that it is OK to:
> 1) address the concerns zcache has before it is unstaged
> 2) replace the two-engine system with a one-engine system
>    (and see how well it behaves)
> 3) convert sysfs->debugfs as needed
> 4) other things as needed
> 
> I think we are getting hung up on what Greg said about adding features,
> and the two-engine->one-engine change could be understood as that.
> I think, though, that it is part of the staging effort to clean up the
> existing issues. Let's see what Greg thinks.

Greg has no idea, except I want to see the needed fixups happen before
new features get added.  Add the new features _after_ it is out of
staging.

greg k-h

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-07-31 16:19             ` Greg Kroah-Hartman
@ 2012-07-31 17:51               ` Konrad Rzeszutek Wilk
  2012-07-31 18:19                 ` Seth Jennings
  2012-08-06  0:38                 ` Minchan Kim
  0 siblings, 2 replies; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-31 17:51 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Dan Magenheimer, devel, Seth Jennings, linux-mm, linux-kernel,
	Minchan Kim, Konrad Rzeszutek Wilk, Andrew Morton,
	Robert Jennings, Nitin Gupta

On Tue, Jul 31, 2012 at 09:19:16AM -0700, Greg Kroah-Hartman wrote:
> On Tue, Jul 31, 2012 at 11:58:43AM -0400, Konrad Rzeszutek Wilk wrote:
> > So in my head I feel that it is OK to:
> > 1) address the concerns zcache has before it is unstaged
> > 2) replace the two-engine system with a one-engine system
> >    (and see how well it behaves)
> > 3) convert sysfs->debugfs as needed
> > 4) other things as needed
> > 
> > I think we are getting hung up on what Greg said about adding features,
> > and the two-engine->one-engine change could be understood as that.
> > I think, though, that it is part of the staging effort to clean up the
> > existing issues. Let's see what Greg thinks.
> 
> Greg has no idea, except I want to see the needed fixups happen before
> new features get added.  Add the new features _after_ it is out of
> staging.

I think we (that is me, Seth, Minchan, Dan) need to talk to have a good
understanding of what each of us thinks are fixups.

Would Monday Aug 6th at 1pm EST on irc.freenode.net channel #zcache work
for people?

> 
> greg k-h

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-07-31 17:51               ` Konrad Rzeszutek Wilk
@ 2012-07-31 18:19                 ` Seth Jennings
  2012-08-06  0:38                 ` Minchan Kim
  1 sibling, 0 replies; 41+ messages in thread
From: Seth Jennings @ 2012-07-31 18:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Greg Kroah-Hartman, Dan Magenheimer, devel, linux-mm,
	linux-kernel, Minchan Kim, Konrad Rzeszutek Wilk, Andrew Morton,
	Robert Jennings, Nitin Gupta

On 07/31/2012 12:51 PM, Konrad Rzeszutek Wilk wrote:
> Would Monday Aug 6th at 1pm EST on irc.freenode.net channel #zcache work
> for people?

I think this is a great idea!

Dan, can you post code as an RFC by tomorrow or Thursday?
We (Rob and I) have the Texas Linux Fest starting Friday.
We need time to review the code prior to chat so that we can
talk about specifics rather than generalities.

If that can be done, then we are available for the chat on
Monday.

Seth


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-07-31 17:51               ` Konrad Rzeszutek Wilk
  2012-07-31 18:19                 ` Seth Jennings
@ 2012-08-06  0:38                 ` Minchan Kim
  2012-08-06 15:24                   ` Dan Magenheimer
  2012-08-07 19:28                   ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 41+ messages in thread
From: Minchan Kim @ 2012-08-06  0:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Greg Kroah-Hartman, Dan Magenheimer, devel, Seth Jennings,
	linux-mm, linux-kernel, Konrad Rzeszutek Wilk, Andrew Morton,
	Robert Jennings, Nitin Gupta

Hi Konrad,

On Tue, Jul 31, 2012 at 01:51:42PM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 31, 2012 at 09:19:16AM -0700, Greg Kroah-Hartman wrote:
> > On Tue, Jul 31, 2012 at 11:58:43AM -0400, Konrad Rzeszutek Wilk wrote:
> > > So in my head I feel that it is OK to:
> > > 1) address the concerns zcache has before it is unstaged
> > > 2) replace the two-engine system with a one-engine system
> > >    (and see how well it behaves)
> > > 3) convert sysfs->debugfs as needed
> > > 4) other things as needed
> > > 
> > > I think we are getting hung up on what Greg said about adding features,
> > > and the two-engine->one-engine change could be understood as that.
> > > I think, though, that it is part of the staging effort to clean up the
> > > existing issues. Let's see what Greg thinks.
> > 
> > Greg has no idea, except I want to see the needed fixups happen before
> > new features get added.  Add the new features _after_ it is out of
> > staging.
> 
> I think we (that is me, Seth, Minchan, Dan) need to talk to have a good
> understanding of what each of us thinks are fixups.
> 
> Would Monday Aug 6th at 1pm EST on irc.freenode.net channel #zcache work
> for people?

1pm EST is 2am KST (Korea Standard Time), so it's not good for me. :)
I know it's hard to adjust my schedule to yours, so please talk without
me. Instead, I will write down my requirements. They are very simple and
trivial.

1) Please don't add any new features, like replacing zsmalloc with zbud.
   It's totally untested, so it needs more time to stabilize with respect
   to bugs, performance, and fragmentation.

2) Factor out common code between zcache and ramster. This should be pure
   cleanup and should not change current behavior.

3) Add lots of comments to public functions.

4) Make function/variable names clearer.

These are necessary for promotion; after promotion,
let's talk about great new features.


> 
> > 
> > greg k-h
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-07-31 15:36         ` Konrad Rzeszutek Wilk
@ 2012-08-06  4:49           ` Minchan Kim
  0 siblings, 0 replies; 41+ messages in thread
From: Minchan Kim @ 2012-08-06  4:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Dan Magenheimer, Konrad Rzeszutek Wilk, Seth Jennings,
	Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Robert Jennings,
	linux-mm, linux-kernel, devel

On Tue, Jul 31, 2012 at 11:36:04AM -0400, Konrad Rzeszutek Wilk wrote:
> On Sun, Jul 29, 2012 at 10:54:28AM +0900, Minchan Kim wrote:
> > On Fri, Jul 27, 2012 at 02:42:14PM -0700, Dan Magenheimer wrote:
> > > > From: Konrad Rzeszutek Wilk [mailto:konrad@darnok.org]
> > > > Sent: Friday, July 27, 2012 3:00 PM
> > > > Subject: Re: [PATCH 0/4] promote zcache from staging
> > > > 
> > > > On Fri, Jul 27, 2012 at 12:21:50PM -0700, Dan Magenheimer wrote:
> > > > > > From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> > > > > > Subject: [PATCH 0/4] promote zcache from staging
> > > > > >
> > > > > > zcache is the remaining piece of code required to support in-kernel
> > > > > > memory compression.  The other two features, cleancache and frontswap,
> > > > > > have been promoted to mainline in 3.0 and 3.5.  This patchset
> > > > > > promotes zcache from the staging tree to mainline.
> > > > > >
> > > > > > Based on the level of activity and contributions we're seeing from a
> > > > > > diverse set of people and interests, I think zcache has matured to the
> > > > > > point where it makes sense to promote this out of staging.
> > > > >
> > > > > Hi Seth --
> > > > >
> > > > > Per offline communication, I'd like to see this delayed for three
> > > > > reasons:
> > > > >
> > > > > 1) I've completely rewritten zcache and will post the rewrite soon.
> > > > >    The redesigned code fixes many of the weaknesses in zcache that
> > > > >    makes it (IMHO) unsuitable for an enterprise distro.  (Some of
> > > > >    these previously discussed in linux-mm [1].)
> > > > > 2) zcache is truly mm (memory management) code and the fact that
> > > > >    it is in drivers at all was purely for logistical reasons
> > > > >    (e.g. the only in-tree "staging" is in the drivers directory).
> > > > >    My rewrite promotes it to (a subdirectory of) mm where IMHO it
> > > > >    belongs.
> > > > > 3) Ramster heavily duplicates code from zcache.  My rewrite resolves
> > > > >    this.  My soon-to-be-post also places the re-factored ramster
> > > > >    in mm, though with some minor work zcache could go in mm and
> > > > >    ramster could stay in staging.
> > > > >
> > > > > Let's have this discussion, but unless the community decides
> > > > > otherwise, please consider this a NACK.
> > > 
> > > Hi Konrad --
> > >  
> > > > Hold on, that is rather unfair. The zcache has been in staging
> > > > for quite some time - your code has not been posted. Part of
> > > > "unstaging" a driver is for folks to review the code - and you
> > > > just said "No, mine is better" without showing your goods.
> > > 
> > > Sorry, I'm not trying to be unfair.  However, I don't see the point
> > > of promoting zcache out of staging unless it is intended to be used
> > > by real users in a real distro.  There's been a lot of discussion,
> > > onlist and offlist, about what needs to be fixed in zcache and not
> > > much visible progress on fixing it.  But fixing it is where I've spent
> > > most of my time over the last couple of months.
> > > 
> > > If IBM or some other company or distro is eager to ship and support
> > > zcache in its current form, I agree that "promote now, improve later"
> > > is a fine approach.  But promoting zcache out of staging simply because
> > > there is urgency to promote zsmalloc+zram out of staging doesn't
> > > seem wise.  At a minimum, it distracts reviewers/effort from what IMHO
> > > is required to turn zcache into an enterprise-ready kernel feature.
> > > 
> > > I can post my "goods" anytime.  In its current form it is better
> > > than the zcache in staging (and, please remember, I wrote both so
> > > I think I am in a good position to compare the two).
> > > I have been waiting until I think the new zcache is feature complete
> > > before asking for review, especially since the newest features
> > > should demonstrate clearly why the rewrite is necessary and
> > > beneficial.  But I can post* my current bits if people don't
> > > believe they exist and/or don't mind reviewing non-final code.
> > > (* Or I can put them in a publicly available git tree.)
> > > 
> > > > There is a third option - which is to continue the promotion
> > > > of zcache from staging, get reviews, work on them, etc., and
> > > > alongside of that you can work on fixing up (or ripping out)
> > > > zcache1 with zcache2 components as they make sense. Or even
> > > > having two of them - an enterprise and an embedded version
> > > > that will eventually get merged together. There is nothing
> > > > wrong with modifying a driver once it has left staging.
> > > 
> > > Minchan and Seth can correct me if I am wrong, but I believe
> > > zram+zsmalloc, not zcache, is the target solution for embedded.
> > 
> > NOT true. Some embedded devices do use zcache, but it's not the original
> > zcache; it's a modified one.
> 
> What kind of modifications? Would it make sense to post the patches

It's for contiguous memory allocation.
For that, it uses only cleancache, not frontswap, so it can zap ephemeral
pages without latency when it needs a big contiguous memory region.

> for those modifications?

That's another story, so let's not consider it at the moment.
After we get some cleanup done, I will revisit it.

> 
> > Anyway, although embedded people use a modified zcache, I am biased toward Dan.
> > I admit I haven't spent a lot of time looking at zcache, but from reading the
> > code, it wasn't in good shape and even had a bug found during code review,
> > and I felt strongly we should clean it up before promoting it to mm/.
> 
> Do you recall what the bugs were?

From: Minchan Kim <minchan@kernel.org>
Date: Fri, 27 Jul 2012 10:10:31 +0900
Subject: [PATCH] zcache: initialize idr

!CONFIG_FRONTSWAP doesn't initialize the idr.
This patch always initializes the idr.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/staging/zcache/zcache-main.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
index 564873f..a635ee2 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -968,7 +968,7 @@ static void zcache_put_pool(struct tmem_pool *pool)
        atomic_dec(&cli->refcount);
 }
 
-int zcache_new_client(uint16_t cli_id)
+static int zcache_new_client(uint16_t cli_id)
 {
        struct zcache_client *cli;
        int ret = -1; 
@@ -980,11 +980,11 @@ int zcache_new_client(uint16_t cli_id)
        if (cli->zspool)
                goto out;
 
+       idr_init(&cli->tmem_pools);
 #ifdef CONFIG_FRONTSWAP
        cli->zspool = zs_create_pool("zcache", ZCACHE_GFP_MASK);
        if (cli->zspool == NULL)
                goto out;
-       idr_init(&cli->tmem_pools);
 #endif
        ret = 0;
 out:
-- 
1.7.9.5


> 
> > So I would like to wait for Dan's posting if you guys are not in a hurry.
> > (And I am not sure akpm would allow it in the current shape of the zcache code.)
> > But the concern is about adding new features. I guess there might be some
> > long debate, and that could prevent promotion again.
> > I think that's not what Seth wants.
> > I hope Dan doesn't mix the cleanup series and the new feature series, and
> > posts the cleanup series as soon as possible, so let's clean up first and
> > try to promote it; later, adding new features or changing algorithms
> > is desirable.
> > 
> > 
> > > The limitations of zsmalloc aren't an issue for zram but they are
> > > for zcache, and this deficiency was one of the catalysts for the
> > > rewrite.  The issues are explained in more detail in [1],
> > > but if any point isn't clear, I'd be happy to explain further.
> > > 
> > > However, I have limited time for this right now and I'd prefer
> > > to spend it finishing the code. :-}
> > > 
> > > So, as I said, I am still a NACK, but if there are good reasons
> > > to duplicate effort and pursue the "third option", let's discuss
> > > them.
> > > 
> > > Thanks,
> > > Dan
> > > 
> > > [1] http://marc.info/?t=133886706700002&r=1&w=2
> > > 
> > 
> > -- 
> > Kind regards,
> > Minchan Kim
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* RE: [PATCH 0/4] promote zcache from staging
  2012-08-06  0:38                 ` Minchan Kim
@ 2012-08-06 15:24                   ` Dan Magenheimer
  2012-08-06 15:47                     ` Pekka Enberg
  2012-08-07 19:28                   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 41+ messages in thread
From: Dan Magenheimer @ 2012-08-06 15:24 UTC (permalink / raw)
  To: Minchan Kim, Konrad Wilk
  Cc: Greg Kroah-Hartman, devel, Seth Jennings, linux-mm, linux-kernel,
	Konrad Rzeszutek Wilk, Andrew Morton, Robert Jennings,
	Nitin Gupta

> > I think we (that is me, Seth, Minchan, Dan) need to talk to have a good
> > understanding of what each of us thinks are fixups.
> >
> > Would Monday Aug 6th at 1pm EST on irc.freenode.net channel #zcache work
> > for people?
> 
> 1pm EST is 2am KST (Korea Standard Time), so it's not good for me. :)
> I know it's hard to adjust my schedule to yours, so please talk without
> me. Instead, I will write down my requirements. They are very simple and
> trivial.
> 
> 1) Please don't add any new features, like replacing zsmalloc with zbud.
>    It's totally untested, so it needs more time to stabilize with respect
>    to bugs, performance, and fragmentation.
> 
> 2) Factor out common code between zcache and ramster. This should be pure
>    cleanup and should not change current behavior.
> 
> 3) Add lots of comments to public functions.
> 
> 4) Make function/variable names clearer.
> 
> These are necessary for promotion; after promotion,
> let's talk about great new features.

Hi Minchan --

I hope you had a great vacation!

Since we won't be able to discuss this by phone/irc, I guess I
need to reply here.

Let me first restate my opinion as author of zcache.

The zcache in staging is really a "demo" version.  It was written 21
months ago (and went into staging 16 months ago) primarily to show,
at Andrew Morton's suggestion, that frontswap and cleancache had value
in a normal standalone kernel (i.e. virtualization not required).  When
posted in early 2011, zcache was known to have some fundamental design flaws...
that's why it went into "staging".  The "demo" version in staging still has
those flaws, and the change from xvmalloc to zsmalloc makes one of those flaws
worse.  These design flaws are now fixed in the new code base I posted last
week, AND the new code base has improved factoring and comments, and the code is
properly re-merged with the zcache "fork" in ramster.

We are not talking about new "features"...  I have tried to back out the
new features from the new code base already posted and will post them separately.

So I think we have four choices:

A) Try to promote zcache as is.  (Seth's proposal)
B) Clean up zcache with no new functionality. (Minchan's proposal above)
C) New code base (in mm/tmem/) after review. (Dan's proposal)
D) New code base but retrofit as a series of patches (Konrad's suggestion)

Minchan, if we go with your proposal (B) are you volunteering
to do the work?  And if you do, doesn't it have the same issue
that it is also totally untested?  And, since (B) doesn't solve the
fundamental design issues, are you volunteering to fix those next?
And, in the meantime, doesn't this mean we have THREE versions
of zcache?

IMHO, the fastest way to get the best zcache into the kernel and
to distros and users is to throw away the "demo" version, move forward
to a new solid well-designed zcache code base, and work together to
build on it.  There's still a lot to do so I hope we can work together.

Thanks,
Dan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-08-06 15:24                   ` Dan Magenheimer
@ 2012-08-06 15:47                     ` Pekka Enberg
  2012-08-06 16:21                       ` Dan Magenheimer
  0 siblings, 1 reply; 41+ messages in thread
From: Pekka Enberg @ 2012-08-06 15:47 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Minchan Kim, Konrad Wilk, Greg Kroah-Hartman, devel,
	Seth Jennings, linux-mm, linux-kernel, Konrad Rzeszutek Wilk,
	Andrew Morton, Robert Jennings, Nitin Gupta

On Mon, Aug 6, 2012 at 6:24 PM, Dan Magenheimer
<dan.magenheimer@oracle.com> wrote:
> IMHO, the fastest way to get the best zcache into the kernel and
> to distros and users is to throw away the "demo" version, move forward
> to a new solid well-designed zcache code base, and work together to
> build on it.  There's still a lot to do so I hope we can work together.

I'm not convinced it's the _fastest way_. You're effectively
invalidating all the work done under drivers/staging so you might end up
in review limbo with your shiny new code...

AFAICT, your best bet is to first clean up zcache under drivers/staging
and get that promoted under mm/zcache.c. You can then move on to the
more controversial ramster and figure out where to put the clustering
code, etc.

                        Pekka


* RE: [PATCH 0/4] promote zcache from staging
  2012-08-06 15:47                     ` Pekka Enberg
@ 2012-08-06 16:21                       ` Dan Magenheimer
  2012-08-06 16:29                         ` Greg Kroah-Hartman
  2012-08-07  0:44                         ` Minchan Kim
  0 siblings, 2 replies; 41+ messages in thread
From: Dan Magenheimer @ 2012-08-06 16:21 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Minchan Kim, Konrad Wilk, Greg Kroah-Hartman, devel,
	Seth Jennings, linux-mm, linux-kernel, Konrad Rzeszutek Wilk,
	Andrew Morton, Robert Jennings, Nitin Gupta

> From: Pekka Enberg [mailto:penberg@kernel.org]
> Subject: Re: [PATCH 0/4] promote zcache from staging
> 
> On Mon, Aug 6, 2012 at 6:24 PM, Dan Magenheimer
> <dan.magenheimer@oracle.com> wrote:
> > IMHO, the fastest way to get the best zcache into the kernel and
> > to distros and users is to throw away the "demo" version, move forward
> > to a new solid well-designed zcache code base, and work together to
> > build on it.  There's still a lot to do so I hope we can work together.
> 
> I'm not convinced it's the _fastest way_.

<grin> I guess I meant "optimal", combining "fast" and "best".

> You're effectively
> invalidating all the work done under drivers/staging so you might end up
> in review limbo with your shiny new code...

Fixing the fundamental design flaws will sooner or later invalidate
most (or all) of the previous testing/work anyway, won't it?  Since
any kernel built with staging is "tainted" already, I feel like now
is a better time to make a major design transition.

I suppose:

 (E) replace "demo" zcache with new code base and keep it
     in staging for another cycle

is another alternative, but I think gregkh has said no to that.

Dan


* Re: [PATCH 0/4] promote zcache from staging
  2012-08-06 16:21                       ` Dan Magenheimer
@ 2012-08-06 16:29                         ` Greg Kroah-Hartman
  2012-08-06 16:38                           ` Dan Magenheimer
  2012-08-07  0:44                         ` Minchan Kim
  1 sibling, 1 reply; 41+ messages in thread
From: Greg Kroah-Hartman @ 2012-08-06 16:29 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Pekka Enberg, Minchan Kim, Konrad Wilk, devel, Seth Jennings,
	linux-mm, linux-kernel, Konrad Rzeszutek Wilk, Andrew Morton,
	Robert Jennings, Nitin Gupta

On Mon, Aug 06, 2012 at 09:21:22AM -0700, Dan Magenheimer wrote:
> I suppose:
> 
>  (E) replace "demo" zcache with new code base and keep it
>      in staging for another cycle
> 
> is another alternative, but I think gregkh has said no to that.

No I have not.  If you all feel that the existing code needs to be
dropped and replaced with a totally new version, that's fine with me.
It's forward progress, which is all that I ask for.

Hope this helps,

greg k-h


* RE: [PATCH 0/4] promote zcache from staging
  2012-08-06 16:29                         ` Greg Kroah-Hartman
@ 2012-08-06 16:38                           ` Dan Magenheimer
  0 siblings, 0 replies; 41+ messages in thread
From: Dan Magenheimer @ 2012-08-06 16:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Pekka Enberg, Minchan Kim, Konrad Wilk, devel, Seth Jennings,
	linux-mm, linux-kernel, Konrad Rzeszutek Wilk, Andrew Morton,
	Robert Jennings, Nitin Gupta

> From: Greg Kroah-Hartman [mailto:gregkh@linuxfoundation.org]
> Subject: Re: [PATCH 0/4] promote zcache from staging
> 
> On Mon, Aug 06, 2012 at 09:21:22AM -0700, Dan Magenheimer wrote:
> > I suppose:
> >
> >  (E) replace "demo" zcache with new code base and keep it
> >      in staging for another cycle
> >
> > is another alternative, but I think gregkh has said no to that.
> 
> No I have not.  If you all feel that the existing code needs to be
> dropped and replaced with a totally new version, that's fine with me.
> It's forward progress, which is all that I ask for.
> 
> Hope this helps,
> greg k-h

Hi Greg --

Cool!  I guess I mistakenly assumed that your "no new features"
requirement also implied "no fixes of fundamental design flaws". :-)

Having option (E) should make it easier to decide the best
technical solution, separate from the promotion timing and "where
does it land" question.

We'll get back to you soon...

Thanks!
Dan


* Re: [PATCH 0/4] promote zcache from staging
  2012-08-06 16:21                       ` Dan Magenheimer
  2012-08-06 16:29                         ` Greg Kroah-Hartman
@ 2012-08-07  0:44                         ` Minchan Kim
  1 sibling, 0 replies; 41+ messages in thread
From: Minchan Kim @ 2012-08-07  0:44 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Pekka Enberg, Konrad Wilk, Greg Kroah-Hartman, devel,
	Seth Jennings, linux-mm, linux-kernel, Konrad Rzeszutek Wilk,
	Andrew Morton, Robert Jennings, Nitin Gupta

Hi Dan,

On Mon, Aug 06, 2012 at 09:21:22AM -0700, Dan Magenheimer wrote:
> > From: Pekka Enberg [mailto:penberg@kernel.org]
> > Subject: Re: [PATCH 0/4] promote zcache from staging
> > 
> > On Mon, Aug 6, 2012 at 6:24 PM, Dan Magenheimer
> > <dan.magenheimer@oracle.com> wrote:
> > > IMHO, the fastest way to get the best zcache into the kernel and
> > > to distros and users is to throw away the "demo" version, move forward
> > > to a new solid well-designed zcache code base, and work together to
> > > build on it.  There's still a lot to do so I hope we can work together.
> > 
> > I'm not convinced it's the _fastest way_.
> 
> <grin> I guess I meant "optimal", combining "fast" and "best".
> 
> > You're effectively
> > invalidating all the work done under drivers/staging so you might end up
> > in review limbo with your shiny new code...
> 
> Fixing the fundamental design flaws will sooner or later invalidate
> most (or all) of the previous testing/work anyway, won't it?  Since
> any kernel built with staging is "tainted" already, I feel like now
> is a better time to make a major design transition.
> 
> I suppose:
> 
>  (E) replace "demo" zcache with new code base and keep it
>      in staging for another cycle

I go for (E). Please send your refactored code as a formal patch.
Thanks.

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 0/4] promote zcache from staging
  2012-08-06  0:38                 ` Minchan Kim
  2012-08-06 15:24                   ` Dan Magenheimer
@ 2012-08-07 19:28                   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-08-07 19:28 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, Dan Magenheimer, devel, Seth Jennings,
	linux-mm, linux-kernel, Konrad Rzeszutek Wilk, Andrew Morton,
	Robert Jennings, Nitin Gupta

[-- Attachment #1: Type: text/plain, Size: 2962 bytes --]

On Mon, Aug 06, 2012 at 09:38:16AM +0900, Minchan Kim wrote:
> Hi Konrad,
> 
> On Tue, Jul 31, 2012 at 01:51:42PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jul 31, 2012 at 09:19:16AM -0700, Greg Kroah-Hartman wrote:
> > > On Tue, Jul 31, 2012 at 11:58:43AM -0400, Konrad Rzeszutek Wilk wrote:
> > > > So in my head I feel that it is Ok to:
> > > > 1) address the concerns that zcache has before it is unstaged
> > > > 2) rip out the two-engine system with a one-engine system
> > > >    (and see how well it behaves)
> > > > 3) sysfs->debugfs as needed
> > > > 4) other things as needed
> > > > 
> > > > I think we are getting hung-up what Greg said about adding features
> > > > and the two-engine->one engine could be understood as that.
> > > > While I think that is part of a staging effort to clean up the
> > > > existing issues. Lets see what Greg thinks.
> > > 
> > > Greg has no idea, except I want to see the needed fixups happen before
> > > new features get added.  Add the new features _after_ it is out of
> > > staging.
> > 
> > I think we (that is me, Seth, Minchan, Dan) need to talk to have a good
> > understanding of what each of us thinks are fixups.
> > 
> > Would Monday Aug 6th at 1pm EST on irc.freenode.net channel #zcache work
> > for people?
> 
> 1pm EST is 2am KST (Korea Standard Time), so it's not good for me. :)
> I know it's hard to adjust my time for yours, so I'll let you talk without
> me. Instead, I will write down my requirements. They're very simple and
> trivial.

OK, Thank you.

We had a lengthy chat (full chat log attached). The summary was that
we all want to promote zcache (for various reasons), but we are hung up
on whether we are OK unstaging it when it lowers the I/O but potentially
does not give a large performance increase (when doing 'make -jN'), or
whether we want both of those characteristics. A little history: v3.3 had
swap readahead patches that dramatically decreased the number of pages
going to swap - as a result, zcache's performance is no longer amazing,
but OK.

Seth and Robert (and I surmise Minchan too) are very interested in zcache
as it lowers the amount of I/O; performance is secondary. Dan is interested
in having less I/O and providing a performance boost with workloads such as
'make -jN' - in short, less I/O and better performance. Dan would like both
before unstaging.
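
For reference, the per-run swap traffic being argued about can be
approximated from the pswpin/pswpout counters in /proc/vmstat. The sketch
below is only an illustration of the measurement approach (the
`swap_pages` helper name and the `make -j8` workload are assumptions, not
anything from the thread):

```shell
#!/bin/sh
# Sum the swap-in/swap-out page counters from /proc/vmstat.
# These counters are monotonically increasing since boot.
swap_pages() {
    awk '/^pswpin |^pswpout / { sum += $2 } END { print sum + 0 }' /proc/vmstat
}

before=$(swap_pages)
# Run the workload under test here, e.g.:  make -j8
after=$(swap_pages)

# Pages swapped in+out while the workload ran; compare this figure
# between zcache and non-zcache boots of the same kernel.
echo "swap pages moved: $((after - before))"
```

Running the same script (plus `time` on the workload) on zcache and
non-zcache boots would give both the I/O figure Seth cares about and the
wall-clock figure Dan cares about.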

The action items that came out are:
 - Seth posted some benchmarks - he is going to rerun them with v3.5
   to see how it behaves in terms of performance (make -jN benchmark).
 - Robert is going to take a swing at Minchan's requested refactoring and
   at adding comments, etc. (but after we get over the hump of agreeing on
   the next step).
 - Konrad is to rummage in his mbox for any other technical objections
   that were raised against zcache earlier, to make sure they are addressed.
 - Once Seth is finished, Konrad is going to take another swing
   at driving this discussion - either via email, IRC or conference call.


[-- Attachment #2: zcache-Aug6.log --]
[-- Type: text/plain, Size: 24123 bytes --]

Aug 06 11:38:41 *	Now talking on #zcache
Aug 06 11:38:41 *	cameron.freenode.net sets mode +n #zcache
Aug 06 11:38:41 *	cameron.freenode.net sets mode +s #zcache
Aug 06 12:38:15 *	sjennings (~sjennings@2001:470:1f0f:87d:e46c:7cd1:4974:640b) has joined #zcache
Aug 06 12:49:12 *	rcj (~rcjenn@32.97.110.59) has joined #zcache
Aug 06 12:50:26 *	djm1 (~djm1021@inet-hqmc03-o.oracle.com) has joined #zcache
Aug 06 12:50:29 <rcj>	Good morning Konrad, thanks for suggesting this.
Aug 06 12:50:32 *	darnok (~konrad@209-6-85-33.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #zcache
Aug 06 12:50:47 <djm1>	hi all, first time user for freenode so I had to register, sorry to be late
Aug 06 12:50:53 <djm1>	(this is Dan Magenheimer)
Aug 06 12:51:17 <darnok>	djm1, I think you can be just an unregister user?
Aug 06 12:51:36 <konrad>	rcj, Sure thinkg
Aug 06 12:51:51 *	darnok goes away now that the proper Konrad is in place
Aug 06 12:52:08 <djm1>	konrad/darnok: ok, we can deal with that later, for now, "call me djm1"
Aug 06 12:52:22 <konrad>	Sure. So I think we are all here right? Minchan couldn't make it
Aug 06 12:52:38 <sjennings>	i don't know of anyone else that was planning to come
**** ENDING LOGGING AT Mon Aug  6 12:52:39 2012

**** BEGIN LOGGING AT Mon Aug  6 12:52:39 2012

Aug 06 12:52:46 <rcj>	Sounds good
Aug 06 12:52:59 *	konrad nods.
Aug 06 12:53:18 <konrad>	Let me first start by saying thank you for being able to make it.
Aug 06 12:53:42 <konrad>	Sometimes it takes a whole bunch of iterations to setup a call but this worked out fine.
Aug 06 12:53:46 *	darnok has quit (Read error: Operation timed out)
Aug 06 12:54:10 <konrad>	And also for the interest in zcache/tmem/frontswap/cleanpage/future ideas.
Aug 06 12:55:11 <konrad>	I think we all want all the pieces completely out of staging (irregardlesss of how internally each company uses the technologies - we have been shipping it with our kernel - UEK2)
Aug 06 12:55:55 <rcj>	We do want it out of staging, for us we can't use it until it's out so it's a bit more critical in that way.
Aug 06 12:55:55 <konrad>	But for right now the zcache is on the plate since that can be used in numerous ways.
Aug 06 12:56:09 *	djm1 makes minor correction... we are shipping frontswap and cleancache and the xen shim in UEK2, not zcache
Aug 06 12:56:21 <konrad>	djm1, Ah thats right. 
Aug 06 12:56:33 *	konrad puts on his bigger TODO list to address that.
Aug 06 12:57:07 <djm1>	rcj: right, we can't use it while its in staging either
Aug 06 12:57:27 <djm1>	UEK2 did jump the gun on frontswap though
Aug 06 12:57:48 <konrad>	rcj, With the 'can't use it' is it more of a .. support type question? Meaning that once it gets its equal to the being almost bug-free?
Aug 06 12:58:19 <djm1>	I think "officially" anything in staging results in a tainted kernel, though I don't know how that applies to distros
Aug 06 12:58:32 <rcj>	Trying to parse that last sentence konrad 
Aug 06 12:58:35 <konrad>	rcj, I am trying here to figure out if the "unstaging" part that is critical to you is in terms of code quality or some other criteria.
Aug 06 12:59:31 *	sashal (~sasha@95-89-78-76-dynip.superkabel.de) has joined #zcache
Aug 06 12:59:33 <konrad>	rcj, As that leads directly to what is the upmost important in zcache for you guys
Aug 06 12:59:41 <sjennings>	the distros we work with only accept mainline drivers/code, not staging
Aug 06 12:59:42 <rcj>	konrad, quality is a different discussion.  Quality has to be there.
Aug 06 12:59:50 <konrad>	Hey Sasha. Let me paste to you on a side-channel what we said so far.
Aug 06 12:59:58 <sashal>	konrad, thanks
Aug 06 13:00:20 <konrad>	sashal, Hopefully your client won't ban me for pasting too fast.
Aug 06 13:00:32 <djm1>	sashal: welcome!
Aug 06 13:00:38 *	djm1 is dan.magenheimer
Aug 06 13:01:05 <sjennings>	my interest is using frontswap/cleancache/zcache for in-kernel memory compression.  no xen, no kvm
Aug 06 13:01:07 <sashal>	djm1, hey!
Aug 06 13:01:10 <konrad>	sjennings, rcj : OK, so its the 'feature' needs to be in the mainline before they will accept turning it on. Somehow I thought staging would be part of it since Fedora has some bits turned for that.
Aug 06 13:02:43 <konrad>	sjennings, rcj , djm1 : But didn't realize the tainting part of the code? Ah yes: 2564                 add_taint_module(mod, TAINT_CRAP);
Aug 06 13:03:05 <sjennings>	yes, that looks bad in the dmesg of enterprise distros
Aug 06 13:03:27 *	konrad nods
Aug 06 13:03:57 <konrad>	Then it comes down to: time-line and resources
Aug 06 13:04:19 <djm1>	konrad: well, yes, and the small matter of technical solutions too ;-)
Aug 06 13:04:28 <sjennings>	i sent out the promotion patch because we find value in zcache as it is right now
Aug 06 13:05:12 <sjennings>	it is stable and, as both Dan and I have demostrated, has measurable benefits (I/O reduction and sometime speed)
Aug 06 13:05:14 <konrad>	sjennings, OK. so the aim for getting it out in v3.6 is .. an optimistic hope not a real I-MUST-GET-IT-NOW type.
Aug 06 13:05:18 <rcj>	sjennings, and there are other contributors that are in same situation where they are using zcache as it exists. 
Aug 06 13:06:46 <sjennings>	well, it's for 3.7 now, and the promotion patch should either work or tell us what showstoppers remain the prevent promotion
Aug 06 13:07:00 <djm1>	sjennings: two things... I recall Dave Hansen stating fairly clearly a year ago that zcache was not near suitable for promoting... is he happy?
Aug 06 13:07:25 <djm1>	second, have you measured zcache/non-zcache results on a recent kernel?
Aug 06 13:08:13 <sjennings>	depends on how recent your talking about.  i posted my zcache performance number on v3.4 i think
Aug 06 13:08:30 <sjennings>	i consider that recent
Aug 06 13:09:35 *	djm1 thinks it was 3.2, not sure, but Rik Van Riel made a lot of swap subsystem improvements in the last couple of kernels... I suggest rerunning at least a couple of benchmarks (especially something swap heavy)
Aug 06 13:10:46 <sjennings>	seems like that would only make zcache, in it currently form, more useful
Aug 06 13:10:46 <konrad>	djm1, Nothing big in v3.4: konrad@phenom:~/ssd/linux$ git log --oneline  v3.4..v3.5 mm/swapfile.c
Aug 06 13:10:46 <konrad>	9b15b81 swap: fix shmem swapping when more than 8 areas
Aug 06 13:10:46 <konrad>	a3fe778 Merge tag 'stable/frontswap.v16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm
Aug 06 13:10:46 <konrad>	4b91355 memcg: fix/change behavior of shared anon at moving task
Aug 06 13:10:46 <konrad>	bde05d1 shmem: replace page if mapping excludes its zone
Aug 06 13:10:46 <konrad>	38b5faf mm: frontswap: core swap subsystem hooks and headers
Aug 06 13:12:20 <konrad>	djm1, So I think that is OK for v3.4 perf perspective?
Aug 06 13:12:36 <konrad>	sjennings, These weren't benchmark with other patches on top? Like the WasActive one that Dan came up with?
Aug 06 13:13:25 <sjennings>	no, i used the mainline + frontswap (before it was mainlined in v3.5)
Aug 06 13:13:43 <rcj>	djm1, If dhansen has specific issues he'll need to speak for them in this promotion process.
Aug 06 13:13:54 <djm1>	sjennings: you ran your benchmarks before I was at LSF/MM and I think 3.3 was released that weekend
Aug 06 13:14:07 <djm1>	https://lkml.org/lkml/2012/4/16/449
Aug 06 13:14:23 <djm1>	here's one of the Rik van Riel patches
Aug 06 13:16:42 <djm1>	sjennings: after rik's patches went in, the "non-zcache" times dropped dramatically
Aug 06 13:18:11 <sjennings>	zcache's primary benefit for me is reduced I/O
Aug 06 13:18:29 <sjennings>	just because swap on rotational media is faster now, doesn't make zcache unneeded
Aug 06 13:18:57 <sjennings>	not all linux machines use rotational swap media (i.e. phones)
Aug 06 13:18:58 <djm1>	sjennings: believe me, I am not arguing that zcache is not needed, just that we now have a lot more work/tuning to do
Aug 06 13:19:42 <rcj>	djm1, but there will always be interactions that drive work.
Aug 06 13:20:48 <rcj>	djm1, having zcache in staging or mainline is orthogonal.  In fact, in mainline a patch that causes a regression for zcache might get more attention.
Aug 06 13:23:15 <djm1>	rcj: sorry, I'm not sure I understand your point
Aug 06 13:23:59 <djm1>	I am suggesting that the benchmarks Seth and I ran and posted, if rerun today on 3.4/3.5 would look horrible
Aug 06 13:24:58 <sjennings>	how is Rik's patchset going to reduce I/O like zcache does?
Aug 06 13:25:05 <djm1>	(correction: some of them look horrible)
Aug 06 13:25:26 <sjennings>	if anything, it increases I/O be doing more aggressive readahead
Aug 06 13:25:39 <rcj>	djm1, You brought up Rik's patch improving swap performance without zcache.  You're saying that zcache needs to improve.  That's not a zcache regression.  The kernel is improving in the swap area for some cases, that's great.  So then we take a TODO to worrk on zcache swap performance or elevate the priority of that work. it's not a blocker.
Aug 06 13:25:47 <djm1>	sjennings: good question... I don't know for sure... but they dramatically increase performance
Aug 06 13:26:34 <djm1>	rcj: are you saying that zcache is useful regardless of whether it increases performance?
Aug 06 13:27:29 <djm1>	(or are you saying you have measured it on non-rotating and it DOES increase performance there, and so you don't care if whether it increases performance on rotating?)
Aug 06 13:27:56 <djm1>	(measured on 3.4/3.5...)
Aug 06 13:27:59 <rcj>	djm1, I'm saying that there are other measures as Seth is pointing out.
Aug 06 13:28:02 <konrad>	djm1, Does it matter? The goal is to have limited amount of I/Os.
Aug 06 13:28:55 *	sjennings what konrad said
Aug 06 13:28:55 <konrad>	djm1, well, let me redact that. .. and not have horrible performance .
Aug 06 13:28:58 <djm1>	konrad,rcj: to me, reduced I/O and better performance are directly correlated, unless the bottleneck is the cpu of course
Aug 06 13:30:01 <djm1>	sjennings, ok, then please rerun your benchmarks on 3.4 or 3.5, and look at whether zcache reduced I/Os vs non-zcache
Aug 06 13:31:21 <konrad>	djm1, You are thinking is that it does not? Or that it does reduce I/O but the perf overall goes down?
Aug 06 13:31:30 <sjennings>	i can do that, but those numbers don't negate our point.  this change in behavior isn't a bug in zcache
Aug 06 13:32:16 <djm1>	sjennings: ok, then let's start over from objectives: I think you said you want to promote zcache because it reduces I/O
Aug 06 13:33:37 <djm1>	I am agreeing that on 3.2 it did, but I have seen on 3.4 performance for non-zcache has improved dramatically, so I believe zcache no longer reduces I/O over non-zcache
Aug 06 13:34:53 <sjennings>	i'm i correct in understand rik's patch to make swap readahead more agressive?
Aug 06 13:35:12 <djm1>	it MAY be the case that the number of I/Os has improved dramatically for non-zcache, but the number of pages read/written for zcache is better than non-zcache
Aug 06 13:35:15 <sjennings>	*am i
Aug 06 13:36:30 <djm1>	sjennings: yes, that is one of rik's patches, there were several and I didn't look at all of them carefully
Aug 06 13:37:47 <sjennings>	we need to come back around on this though.  we want promotion.  what are the absolute showstoppers to prevent this.  that is bugs or hard numbers that demonstrate the zcache has no value now (since previous number indictated it did have value)
Aug 06 13:38:52 <rcj>	sjennings, This doesn't affect cleancache and it doesn't break frontswap, it just improves other kernel code.  There will always be work, but I don't see that gating mainline acceptance.
Aug 06 13:38:56 <konrad>	sjennings, The ones that Minchin identified were cleanups and refactor of code to make it more readable. No zcache engine replacement.
Aug 06 13:40:01 <rcj>	konrad, that sort of patch can go in anytime though.  I could send out patchess to clean-up and refactor mainline code.
Aug 06 13:40:07 <djm1>	rcj: please help me understand... you want to promote zcache because it improved performance on older kernels even if it doesn't on current kernels?
Aug 06 13:41:13 <djm1>	sjennings: the hard numbers clearly demonstrate that the existing zcache has much less value now
Aug 06 13:42:22 <rcj>	djm1, I don't see you presenting numbers though, only saying that it "may" affect performance. So send out performance patches.  Even if Rik never sent out his patches, could zcache stand performance work?  Could any part of the kernel take performance patches?  I don't see this as a gate to mainline.
Aug 06 13:42:53 <djm1>	rcj: please answer my question, you have avoided it
Aug 06 13:43:36 <konrad>	djm1, Is your thought that the answer to the performance numbers are the zcache2 different engine? Or the different algorithm for dealing with pages?
Aug 06 13:43:40 <djm1>	it is true I don't wish to shoot myself in the foot by publicly showing that zcache sucks now, but nor do I want to put my sign-off on promoting it while hiding the knowledge that it does
Aug 06 13:45:01 <djm1>	konrad: yes, I've sent many emails (offlist, and one or two on) about the design flaws of the "demo" zcache...
Aug 06 13:45:55 <konrad>	djm1, So what you are saying is that zcache2 by itself , if it was in v3.3 would show phenomal numbers, not just good.
Aug 06 13:46:25 <rcj>	djm1, I can't answer general questions unless we're talking real numbers.  The code is in public and the development should be as well.
Aug 06 13:46:26 <konrad>	djm1, And when measuring it with v3.5, it shows good  respectible numbers.
Aug 06 13:46:45 <djm1>	konrad: sadly no, my hand was forced to release the current status
Aug 06 13:47:30 <konrad>	djm1, So what you are saying is that you want to continue on working on thsi until you do get to that point
Aug 06 13:47:53 <konrad>	djm1, But that might take longer than a couple of releases and one might as well just drop in the new code.
Aug 06 13:47:58 <djm1>	rcj: ok, let me give you two numbers off the top of my head... Seth's published measurement 1700 seconds on zcache, 2000 non-zcache.
Aug 06 13:48:15 <djm1>	New measurement on non-zcache 3.5: 800 seconds
Aug 06 13:48:46 <sjennings>	did you use my machine?
Aug 06 13:48:56 <sjennings>	they aren't comparable
Aug 06 13:49:13 <sjennings>	plus i'm talking about I/O
Aug 06 13:49:29 <djm1>	sjennings: I am inviting you to use your machine and re-measure, or to admit you don't care what the new numbers are
Aug 06 13:50:08 *	konrad sighs.
Aug 06 13:50:33 <konrad>	I was hoping to have a clear define ACTION LISTs out of this and it seems we have at least two:
Aug 06 13:50:44 <konrad>	  1).   I could send out patchess to clean-up and refactor mainline code
Aug 06 13:50:59 <konrad>	 2). Run existing benchmarks against v3.5 using zcache/non-zcache
Aug 06 13:51:42 <sjennings>	doing 1 won't resolve djm1 objections though
Aug 06 13:52:35 <konrad>	djm1, So Dan, if the numbers on sjennings show no perf degradation, or his tests on v3.5 show good numbers, I believe you would be OK with the current zcache code right?
Aug 06 13:52:56 <konrad>	sjennings, True. Just going through the action items to jot them down.
Aug 06 13:54:36 <djm1>	konrad: if sjennings runs the same benchmarks as he previously published on 3.5 and zcache performance always beats non-zcache performance, that is a big step in the right direction
Aug 06 13:55:12 <djm1>	konrad: but knowing the design flaws in zcache I can fairly easily find benchmarks for which it is not the case
Aug 06 13:55:20 <konrad>	djm1, If they show horrible perf that is tried in with the generic swap system... then what? Your drop-in does not necessarily fixes it, and it might be a swap system issue. Which is why rcj proposoal to move with the unstaging as soon as possible so that we aren't behind with this is the right thing
Aug 06 13:55:52 <konrad>	djm1, OK, but some benchmarks are so synthethic that they bear no resemblence to real world scenario. 
Aug 06 13:56:58 <konrad>	djm1, But the design flaws discussion is something that can be addressed later too.
Aug 06 13:58:17 <rcj>	"Lies, damn lies, and statistics", right?  Seth was talking about IOPs being reduced and Dan's talking about # seconds to run kern-bench which are different, you can always find some benchmark that falls down.  But that doesn't mean there isn't some value still. zcache is not a panacea but does have use-cases.
Aug 06 13:58:41 <konrad>	rcj: Heh
Aug 06 13:59:19 <rcj>	And with measurements you can understand the problem and address it.  But the question about promotion isn't necessarily resolved that way.
Aug 06 13:59:58 *	rcj re-reads the second sentence and needs to restate...
Aug 06 14:00:26 <djm1>	rcj: please re-run the same benchmarks as sjennings ran in March, measure all three: seconds, pages of I/O and number of I/Os and then let's discuss further
Aug 06 14:01:27 <sjennings>	djm1, regarding the step in the right direction comment, i need to know what evidence would be required for you to drop your objection to promotion before i go through all the trouble of producing that evidence.
Aug 06 14:01:53 <sjennings>	if i can show that zcache reduces I/O during a kernel build with a 3.5 kernel, is that enouhg
Aug 06 14:02:34 <djm1>	sjennings: across the same range of -jN?
Aug 06 14:02:39 <sjennings>	sure
Aug 06 14:03:16 <djm1>	number of pages of I/O or number of I/Os? (and I ask that for a very specific reason)
Aug 06 14:03:50 <konrad>	djm1, So I am curious. If this discussion was done in v3.3 day-time - would the issue of the design of zcache be on the table?
Aug 06 14:04:04 <konrad>	djm1, Or the rewrite or the benchmarks?
Aug 06 14:05:04 <djm1>	konrad: good question... yes, we had this discussion offlist prior to v3.3 (and I think you were cc'ed)
Aug 06 14:05:35 <djm1>	at that point, zcache was "good" and I was trying to make it "better", now I fear that it sucks and I am trying to make it at least "good"
Aug 06 14:07:19 <djm1>	rcj, sjennings: I have worked in a big company for 30 years, I understand you have constraints, and I can only guess what is driving this
Aug 06 14:07:51 <djm1>	I suspect that IBM has a contractual obligation to deliver this in some embedded system which uses SSD/NVRAM
Aug 06 14:08:31 <sjennings>	djm1, our motivation is not relevant
Aug 06 14:08:41 <rcj>	sjennings, and the community doesn't care.
Aug 06 14:08:44 <djm1>	so to answer sjennings question, if zcache is truly going to be used successfully by real users, no, I don't think I want to get in the way of that
Aug 06 14:08:50 <sjennings>	this code is of value to use and others
Aug 06 14:09:03 <sjennings>	s/use/us
Aug 06 14:10:00 <sjennings>	and improvements can continue to be made
Aug 06 14:10:09 <djm1>	sjennings: restating that and ignoring what I am telling you and refusing to measure it for yourself doesn't help
Aug 06 14:11:21 <djm1>	rcj: the community most certainly does care... had Seth and I never published *amazing* zcache numbers (compared to native), we wouldn't be having this conversation
Aug 06 14:12:03 <rcj>	djm1, I was just saying that the community doesn't care about motivations
Aug 06 14:12:10 <djm1>	by promoting zcache based on those numbers, it makes me feel disingenous, if not just dirty
Aug 06 14:12:18 <konrad>	djm1, I think you made your point. You would like Seth to do some benchmarking to verify the perf numbers.
Aug 06 14:12:35 <konrad>	djm1, And if they do not work right, then address those before promoting it.
Aug 06 14:13:34 <sjennings>	but we don't agree on what "work right" is
Aug 06 14:13:43 <sjennings>	how are we defining that
Aug 06 14:14:02 <sjennings>	i define it in reduced I/O (even if runtime is longer)
Aug 06 14:15:21 <djm1>	which is why motivation is relevant
Aug 06 14:16:04 <konrad>	djm1, motivation==use case I think?
Aug 06 14:16:09 <djm1>	if this is the classic battle: "which is more important? enterprise or embedded" then let's call it that
Aug 06 14:16:55 <djm1>	konrad: yes, but "my use case runs faster so I don't care if yours run slower, just don't use it"
Aug 06 14:17:57 <konrad>	djm1, So your concern is with .. I can't seem to find in the log here
Aug 06 14:18:22 <djm1>	sjennings, rcj: so IMHO zram is the right match for you then... what is it about zcache that solves your problem better than zram?
Aug 06 14:19:42 <konrad>	djm1, Ah, speed. Workloads running faster.
Aug 06 14:23:19 *	djm1 goes afk
Aug 06 14:26:03 <konrad>	djm1, Um, hope you won't be long afk
Aug 06 14:27:02 <sjennings>	i'll post numbers on the v3.5 kernel
Aug 06 14:27:20 <konrad>	sjennings, OK.
Aug 06 14:27:31 <konrad>	sjennings, Thank you.
Aug 06 14:27:45 *	sashal has quit (Excess Flood)
Aug 06 14:28:05 *	sashal (~sasha@95-89-78-76-dynip.superkabel.de) has joined #zcache
Aug 06 14:28:21 <konrad>	sashal, Last thing said was: "<sjennings> i'll post numbers on the v3.5 kernel
Aug 06 14:28:21 <konrad>	<konrad> sjennings, OK.
Aug 06 14:28:21 <konrad>	<konrad> sjennings, Thank you."
Aug 06 14:29:00 <konrad>	sjennings, rcj , djm1 : I thought this would be only an hour but it went for an hour and half. Yikes.
Aug 06 14:29:21 <konrad>	sjennings, rcj : I hope it doesn't negatively impact you guys' schedule for today.
Aug 06 14:30:08 <konrad>	sjennings, rcj djm1 : I think it makes sense to have one more discussion about this after Seth has the opportunity to run the numbers.
Aug 06 14:30:32 <konrad>	sjennings, rcj djm1 sashal : Would this time work next week? Would a conference call be better than IRC?
Aug 06 14:30:57 <rcj>	konrad, IRC is better, this needs to be completely public.  On-list would be better.
Aug 06 14:31:31 <rcj>	konrad, I just don't want to see barriers to people being involved in a discussion.
Aug 06 14:31:55 <konrad>	rcj, Good point... I am thinking more of removing some of the barriers by having folks in one location, so to speak.
Aug 06 14:32:27 <rcj>	konrad, I understand completely.  Just trying to find a good balance.
Aug 06 14:33:27 <konrad>	sjennings, djm1, rcj,  So let me write a summary of this and also save the full log as an attachment.
Aug 06 14:33:30 <rcj>	konrad, I'm still worried by the premise of the objection.  That a non-zcache patch has improved performance somewhere else in the kernel and this poses a barrier to mainline inclusion.
Aug 06 14:34:52 <konrad>	rcj, Right. Though not for the embedded side. The thing that Andrew Morton pointed out is that he would like zcache to be enabled by default.
Aug 06 14:36:07 <konrad>	rcj, And Dan is trying to make that happen. But if enabling zcache does not benefit today's desktops.. then it should not be enabled by default.
Aug 06 14:36:37 <konrad>	rcj, And then it comes back to Andrew having a potential issue with zcache. So Dan is trying to make sure that this issue will not surface.
Aug 06 14:37:04 <konrad>	rcj, But the other side of the coin is that if it's used on the embedded side then it does not matter.
Aug 06 14:37:51 <konrad>	rcj, Which I believe is what you guys are coming from.
Aug 06 14:38:58 <rcj>	konrad, We don't know the other show-stoppers because the conversation got killed pretty early on-list.
Aug 06 14:40:37 <konrad>	rcj, Ugh. OK.
Aug 06 14:41:32 <konrad>	rcj, Let me see what I have in the mbox and if I can compile it into something similar to what Seth did for frontswap
Aug 06 14:42:08 <konrad>	Well, I have to head out, other things are showing up in my calendar
Aug 06 14:42:48 <rcj>	konrad, thanks.
Aug 06 14:43:13 <konrad>	sjennings, rcj djm1 sashal : Gotta go. I think the two TODOs are: 1). Seth run v3.5 w/ zcache and w/o zcache; 2) Konrad to rummage in his mbox to find earlier objections to zcache
Aug 06 14:43:27 *	djm1 returns, apparently too late
Aug 06 14:43:54 <rcj>	konrad, sounds good.
Aug 06 14:44:00 <konrad>	sjennings, rcj djm1 sashal : 3) Propose to continue this on the mailing list for the broader topics, and the finer-grained ones on IRC.
Aug 06 14:44:17 <konrad>	.. and also get Minchan involved here.
Aug 06 14:44:44 *	konrad is away
Aug 06 15:01:14 *	sashal has quit (Ping timeout: 240 seconds)
Aug 06 15:42:06 *	sashal (~sasha@95-89-78-76-dynip.superkabel.de) has joined #zcache
Aug 06 16:20:00 *	rcj has quit (Quit: rcj)
Aug 06 17:03:01 *	sjennings has quit (Quit: Leaving)
Aug 07 12:17:29 *	Disconnected (Connection reset by peer).

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-07-27 18:18 [PATCH 0/4] promote zcache from staging Seth Jennings
                   ` (4 preceding siblings ...)
  2012-07-29  2:20 ` [PATCH 0/4] promote zcache from staging Minchan Kim
@ 2012-08-07 20:23 ` Seth Jennings
  2012-08-07 21:47   ` Dan Magenheimer
  2012-08-09 18:50   ` Seth Jennings
  2012-08-14 22:18 ` Seth Jennings
  6 siblings, 2 replies; 41+ messages in thread
From: Seth Jennings @ 2012-08-07 20:23 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Robert Jennings,
	linux-mm, linux-kernel, devel

On 07/27/2012 01:18 PM, Seth Jennings wrote:
> Some benchmarking numbers demonstrating the I/O saving that can be had
> with zcache:
> 
> https://lkml.org/lkml/2012/3/22/383

There was concern that kernel changes external to zcache since v3.3 may
have mitigated the benefit of zcache.  So I re-ran my kernel building
benchmark and confirmed that zcache is still providing I/O and runtime
savings.

Gentoo w/ kernel v3.5 (frontswap only, cleancache disabled)
Quad-core i5-2500 @ 3.3GHz
512MB DDR3 1600MHz (limited with mem=512m on boot)
Filesystem and swap on 80GB HDD (about 58MB/s with hdparm -t)
majflt are major page faults reported by the time command
pswpin/out is the delta of pswpin/out from /proc/vmstat before and after
the make -jN

Mind the 512MB RAM vs 1GB in my previous results.  This just reduces
the number of threads required to create memory pressure and removes some
of the context switching noise from the results.

I'm also using a single HDD instead of the RAID0 in my previous results.

Each run started with:
swapoff -a
swapon -a
sync
echo 3 > /proc/sys/vm/drop_caches
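
The pswpin/pswpout deltas described above can be computed from /proc/vmstat
snapshots taken before and after the make -jN run.  A minimal sketch of that
bookkeeping (the helper name is mine, not from the original test harness):

```shell
# swap_delta BEFORE AFTER COUNTER
# Prints the difference in a /proc/vmstat counter (e.g. pswpin, pswpout)
# between two snapshot files taken before and after the workload.
swap_delta() {
    b=$(awk -v k="$3" '$1 == k { print $2 }' "$1")
    a=$(awk -v k="$3" '$1 == k { print $2 }' "$2")
    echo $((a - b))
}

# Typical use around the benchmark (run as root):
#   cp /proc/vmstat vmstat.before
#   make -j16
#   cp /proc/vmstat vmstat.after
#   swap_delta vmstat.before vmstat.after pswpout
```

The values are in pages (4KB each on this hardware), which is why the I/O
sums below can be compared directly against majflt counts.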

I/O (in pages):
	normal				zcache				change
N	pswpin	pswpout	majflt	I/O sum	pswpin	pswpout	majflt	I/O sum	%I/O
4	0	2	2116	2118	0	0	2125	2125	0%
8	0	575	2244	2819	4	4	2219	2227	21%
12	2543	4038	3226	9807	1748	2519	3871	8138	17%
16	23926	47278	9426	80630	8252	15598	9372	33222	59%
20	50307	127797	15039	193143	20224	40634	17975	78833	59%

Runtime (in seconds):
N	normal	zcache	%change
4	126	127	-1%
8	124	124	0%
12	131	133	-2%
16	189	156	17%
20	261	235	10%

%CPU utilization (out of 400% on 4 cpus)
N	normal	zcache	%change
4	254	253	0%
8	261	263	-1%
12	250	248	1%
16	173	211	-22%
20	124	140	-13%

There is a sweet spot at 16 threads, where zcache is improving runtime by
17% and reducing I/O by 59% (185MB) using 22% more CPU.

Seth


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH 0/4] promote zcache from staging
  2012-08-07 20:23 ` Seth Jennings
@ 2012-08-07 21:47   ` Dan Magenheimer
  2012-08-08 16:29     ` Seth Jennings
  2012-08-09 18:50   ` Seth Jennings
  1 sibling, 1 reply; 41+ messages in thread
From: Dan Magenheimer @ 2012-08-07 21:47 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Wilk, Robert Jennings, linux-mm, linux-kernel, devel,
	Kurt Hackel

> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Subject: Re: [PATCH 0/4] promote zcache from staging
> 
> On 07/27/2012 01:18 PM, Seth Jennings wrote:
> > Some benchmarking numbers demonstrating the I/O saving that can be had
> > with zcache:
> >
> > https://lkml.org/lkml/2012/3/22/383
> 
> There was concern that kernel changes external to zcache since v3.3 may
> have mitigated the benefit of zcache.  So I re-ran my kernel building
> benchmark and confirmed that zcache is still providing I/O and runtime
> savings.

Hi Seth --

Thanks for re-running your tests.  I have a couple of concerns and
hope that you, and other interested parties, will read all the
way through my lengthy response.

The zcache issues I have seen in recent kernels arise when zcache
gets "full".  I notice your original published benchmarks [1] include
N=24, N=28, and N=32, but these updated results do not.  Are you planning
on completing the runs?  Second, I now see the numbers I originally
published for what I thought was the same benchmark as yours are actually
an order of magnitude larger (in sec) than yours.  I didn't notice
this in March because we were focused on the percent improvement, not
the raw measurements.  Since the hardware is highly similar, I suspect
it is not a hardware difference but instead that you are compiling
a much smaller kernel.  In other words, your test case is much
smaller, and so exercises zcache much less.  My test case compiles
a full enterprise kernel... what is yours doing?

IMHO, any cache in computer science needs to be measured both
when it is not-yet-full and when it is full.  The "demo" zcache in
staging works very well before it is full and I think our benchmarking
in March and your re-run benchmarks demonstrate that.  At LSFMM, Andrea
Arcangeli pointed out that zcache, for frontswap pages, has no "writeback"
capabilities and, when it is full, it simply rejects further attempts
to put data in its cache.  He said this is unacceptable for KVM and I
agreed that it was a flaw that needed to be fixed before zcache should
be promoted.  When I tested zcache for this, I found that not only was
he right, but that zcache could not be fixed without a major rewrite.

This is one of the "fundamental flaws" of the "demo" zcache, but the new
code base allows for this to be fixed.

A second flaw is that the "demo" zcache has no concept of LRU for
either cleancache or frontswap pages, or ability to reclaim pageframes
at all for frontswap pages.  (And for cleancache, pageframe reclaim
is semi-random).  As I've noted in other threads, this may be impossible
to implement/fix with zsmalloc, and zsmalloc's author Nitin Gupta has
agreed, but the new code base implements all of this with zbud.  One
can argue that LRU is not a requirement for zcache, but a long history
of operating systems theory would suggest otherwise.

A third flaw is that the "demo" version has a very poor policy to
determine what pages are "admitted".  The demo policy does take into
account the total RAM in the system, but not current memory load
conditions.  The new code base IMHO does a better job but discussion
will be in a refereed presentation at the upcoming Plumber's meeting.
The fix for this flaw might be back-portable to the "demo" version
so is not a showstopper in the "demo" version, but fixing it is
not just a cosmetic fix.

I can add more issues to the list, but will stop here.  IMHO
the "demo" zcache is not suitable for promotion from staging,
which is why I spent over two months generating a new code base.
I, perhaps more than anyone else, would like to see zcache used,
by default, by real distros and customers, but I think it is
premature to promote it, especially the old "demo" code.

I do realize, however, that this decision is not mine alone so
defer to the community to decide.

Dan

[1] https://lkml.org/lkml/2012/3/22/383 
[2] http://lkml.indiana.edu/hypermail/linux/kernel/1203.2/02842.html 
 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-08-07 21:47   ` Dan Magenheimer
@ 2012-08-08 16:29     ` Seth Jennings
  2012-08-08 17:47       ` Dan Magenheimer
  0 siblings, 1 reply; 41+ messages in thread
From: Seth Jennings @ 2012-08-08 16:29 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Wilk, Robert Jennings, linux-mm, linux-kernel, devel,
	Kurt Hackel

On 08/07/2012 04:47 PM, Dan Magenheimer wrote:
> I notice your original published benchmarks [1] include
> N=24, N=28, and N=32, but these updated results do not.  Are you planning
> on completing the runs?  Second, I now see the numbers I originally
> published for what I thought was the same benchmark as yours are actually
> an order of magnitude larger (in sec) than yours.  I didn't notice
> this in March because we were focused on the percent improvement, not
> the raw measurements.  Since the hardware is highly similar, I suspect
> it is not a hardware difference but instead that you are compiling
> a much smaller kernel.  In other words, your test case is much
> smaller, and so exercises zcache much less.  My test case compiles
> a full enterprise kernel... what is yours doing?

I am doing a minimal kernel build for my local hardware
configuration.

With the reduction in RAM, 1GB to 512MB, I didn't need to do
test runs with >20 threads to find the peak of the benefit
curve at 16 threads.  Past that, zcache is saturated and I'd
just be burning up my disk.  I'm already swapping out about
500MB (i.e. RAM size) in the 20 thread non-zcache case.

Also, I provide the magnitude numbers (pages, seconds) just
to show my source data.  The %change numbers are the real
results as they remove build size as a factor.

> At LSFMM, Andrea
> Arcangeli pointed out that zcache, for frontswap pages, has no "writeback"
> capabilities and, when it is full, it simply rejects further attempts
> to put data in its cache.  He said this is unacceptable for KVM and I
> agreed that it was a flaw that needed to be fixed before zcache should
> be promoted.

KVM (in-tree) is not a current user of zcache.  While the
use cases of possible future zcache users should be
considered, I don't think they can be used to prevent promotion.

> A second flaw is that the "demo" zcache has no concept of LRU for
> either cleancache or frontswap pages, or ability to reclaim pageframes
> at all for frontswap pages.
...
> 
> A third flaw is that the "demo" version has a very poor policy to
> determine what pages are "admitted".
...
> 
> I can add more issues to the list, but will stop here.

All of the flaws you list do not prevent zcache from being
beneficial right now, as my results demonstrate.  Therefore,
the flaws listed are really potential improvements and can
be done in mainline after promotion.  Even if large changes
are required to make these improvements, they can be made in
mainline in an incremental and public way.

Seth


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH 0/4] promote zcache from staging
  2012-08-08 16:29     ` Seth Jennings
@ 2012-08-08 17:47       ` Dan Magenheimer
  0 siblings, 0 replies; 41+ messages in thread
From: Dan Magenheimer @ 2012-08-08 17:47 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Wilk, Robert Jennings, linux-mm, linux-kernel, devel,
	Kurt Hackel

> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]

Hi Seth --

Good discussion.  Even though we disagree, I appreciate
your enthusiasm and your good work on the kernel!

> Subject: Re: [PATCH 0/4] promote zcache from staging
> 
> On 08/07/2012 04:47 PM, Dan Magenheimer wrote:
> > I notice your original published benchmarks [1] include
> > N=24, N=28, and N=32, but these updated results do not.  Are you planning
> > on completing the runs?  Second, I now see the numbers I originally
> > published for what I thought was the same benchmark as yours are actually
> > an order of magnitude larger (in sec) than yours.  I didn't notice
> > this in March because we were focused on the percent improvement, not
> > the raw measurements.  Since the hardware is highly similar, I suspect
> > it is not a hardware difference but instead that you are compiling
> > a much smaller kernel.  In other words, your test case is much
> > smaller, and so exercises zcache much less.  My test case compiles
> > a full enterprise kernel... what is yours doing?
> 
> I am doing a minimal kernel build for my local hardware
> configuration.
> 
> With the reduction in RAM, 1GB to 512MB, I didn't need to do
> test runs with >20 threads to find the peak of the benefit
> curve at 16 threads.  Past that, zcache is saturated and I'd
> just be burning up my disk.

I think that's exactly what I said in a snippet of my response
that you deleted.  A cache needs to work well both when it
is non-full and when it is full.  You are only demonstrating
that it works well when it is non-full.  When it is
"saturated", bad things can happen.  Finding the "peak of the
benefit" is only half the work of benchmarking.

So it appears you are trying to prove your point by showing
the workloads that look good, while _not_ showing the workloads
that look bad, and then claiming you don't care about those
bad workloads anyway.

> Also, I provide the magnitude numbers (pages, seconds) just
> to show my source data.  The %change numbers are the real
> results as they remove build size as a factor.

You'll have to explain what you mean because, if I understand
correctly, this is just not true.  Different build sizes
definitely affect memory management differently, just as
different values of N (for make -jN) have an effect.

> > At LSFMM, Andrea
> > Arcangeli pointed out that zcache, for frontswap pages, has no "writeback"
> > capabilities and, when it is full, it simply rejects further attempts
> > to put data in its cache.  He said this is unacceptable for KVM and I
> > agreed that it was a flaw that needed to be fixed before zcache should
> > be promoted.
> 
> KVM (in-tree) is not a current user of zcache.  While the
> use cases of possible future zcache users should be
> considered, I don't think they can be used to prevent promotion.

That wasn't my point.  Andrea identified the flaw as an issue
of zcache.

> > A second flaw is that the "demo" zcache has no concept of LRU for
> > either cleancache or frontswap pages, or ability to reclaim pageframes
> > at all for frontswap pages.
> ...
> >
> > A third flaw is that the "demo" version has a very poor policy to
> > determine what pages are "admitted".
> ...
> >
> > I can add more issues to the list, but will stop here.
> 
> All of the flaws you list do not prevent zcache from being
> beneficial right now, as my results demonstrate.  Therefore,
> the flaws listed are really potential improvements and can
> be done in mainline after promotion.  Even if large changes
> are required to make these improvements, they can be made in
> mainline in an incremental and public way.

Your results only demonstrate that zcache is beneficial on
the workloads that you chose to present.  But using the same
workload with slightly different parameters (-jN or compiling
a larger kernel), zcache can be _detrimental_, and you've chosen
to not measure or present those cases, even though you did
measure and present some of those cases in your first benchmark
runs posted in March (on an earlier kernel).

I can only speak for myself, but this appears disingenuous to me.

Sorry, but FWIW my vote is still a NACK.  IMHO zcache needs major
work before it should be promoted, and I think we should be spending
the time fixing the known flaws rather than arguing about promoting
"demo" code.

Dan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-08-07 20:23 ` Seth Jennings
  2012-08-07 21:47   ` Dan Magenheimer
@ 2012-08-09 18:50   ` Seth Jennings
  2012-08-09 20:20     ` Dan Magenheimer
  1 sibling, 1 reply; 41+ messages in thread
From: Seth Jennings @ 2012-08-09 18:50 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Robert Jennings,
	linux-mm, linux-kernel, devel

On 08/07/2012 03:23 PM, Seth Jennings wrote:
> On 07/27/2012 01:18 PM, Seth Jennings wrote:
>> Some benchmarking numbers demonstrating the I/O saving that can be had
>> with zcache:
>>
>> https://lkml.org/lkml/2012/3/22/383
> 
> There was concern that kernel changes external to zcache since v3.3 may
> have mitigated the benefit of zcache.  So I re-ran my kernel building
> benchmark and confirmed that zcache is still providing I/O and runtime
> savings.

There was a request made to test with even greater memory pressure to
demonstrate that, at some unknown point, zcache doesn't have real
problems.  So I continued out to 32 threads:

N=4..20 is the same data as before except for the pswpin values.
I found a mistake in the way I computed pswpin that changed those
values slightly.  However, this didn't change the overall trend.

I also inverted the sign of the %change fields, since they are percent
changes vs the normal case.
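
For reference, the %change columns below are computed relative to the normal
case, so a negative value means zcache did less I/O or ran faster.  A sketch
of the calculation (the function name is mine):

```shell
# pct_change NORMAL ZCACHE
# Percent change of the zcache value relative to the normal case,
# rounded to the nearest integer; negative output means zcache improved.
pct_change() {
    awk -v n="$1" -v z="$2" 'BEGIN { printf "%.0f%%\n", (z - n) * 100 / n }'
}
```

For example, for the N=16 runtime row, `pct_change 189 156` prints -17%.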

I/O (in pages)
	normal				zcache				change
N	pswpin	pswpout	majflt	I/O sum	pswpin	pswpout	majflt	I/O sum	%I/O
4	0	2	2116	2118	0	0	2125	2125	0%
8	0	575	2244	2819	0	4	2219	2223	-21%
12	1979	4038	3226	9243	1269	2519	3871	7659	-17%
16	21568	47278	9426	78272	7770	15598	9372	32740	-58%
20	50307	127797	15039	193143	20224	40634	17975	78833	-59%
24	186278	364809	45052	596139	47406	90489	30877	168772	-72%
28	274734	777815	53112	1105661	134981	307346	63480	505807	-54%
32	988530	2002087	168662	3159279	324801	723385	140288	1188474	-62%

Runtime (in seconds)
N	normal	zcache	%change
4	126	127	1%
8	124	124	0%
12	131	133	2%
16	189	156	-17%
20	261	235	-10%
24	513	288	-44%
28	556	434	-22%
32	1463	745	-49%

%CPU utilization (out of 400% on 4 cpus)
N	normal	zcache	%change
4	254	253	0%
8	261	263	1%
12	250	248	-1%
16	173	211	22%
20	124	140	13%
24	64	114	78%
28	59	76	29%
32	23	45	96%

The ~60% I/O savings holds even out to 32 threads, at which point the
non-zcache case has 12GB of I/O and is taking 12x longer to complete.
Additionally, the runtime savings increases significantly beyond 20
threads, even though the absolute runtime is suboptimal due to the
extreme memory pressure.

Seth


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH 0/4] promote zcache from staging
  2012-08-09 18:50   ` Seth Jennings
@ 2012-08-09 20:20     ` Dan Magenheimer
  2012-08-10 18:14       ` Seth Jennings
  0 siblings, 1 reply; 41+ messages in thread
From: Dan Magenheimer @ 2012-08-09 20:20 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Wilk, Robert Jennings, linux-mm, linux-kernel, devel,
	Kurt Hackel

> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Subject: Re: [PATCH 0/4] promote zcache from staging
> 
> On 08/07/2012 03:23 PM, Seth Jennings wrote:
> > On 07/27/2012 01:18 PM, Seth Jennings wrote:
> >> Some benchmarking numbers demonstrating the I/O saving that can be had
> >> with zcache:
> >>
> >> https://lkml.org/lkml/2012/3/22/383
> >
> > There was concern that kernel changes external to zcache since v3.3 may
> > have mitigated the benefit of zcache.  So I re-ran my kernel building
> > benchmark and confirmed that zcache is still providing I/O and runtime
> > savings.
> 
> There was a request made to test with even greater memory pressure to
> demonstrate that, at some unknown point, zcache doesn't have real
> problems.  So I continued out to 32 threads:

Hi Seth --

Thanks for continuing with running the 24-32 thread benchmarks.

> Runtime (in seconds)
> N	normal	zcache	%change
> 4	126	127	1%

> threads, even though the absolute runtime is suboptimal due to the
> extreme memory pressure.

I am not in a position right now to reproduce your results or
mine (due to a house move which is limiting my time and access
to my test machines, plus two presentations later this month at
Linuxcon NA and Plumbers) but I still don't think you've really
saturated the cache, which is when the extreme memory pressure
issues will show up in zcache.  I suspect that adding more threads
to a minimal kernel compile doesn't increase the memory pressure as
much as I was seeing, so you're not seeing what I was seeing:
the zcache numbers climbing to as much as 150% WORSE than non-zcache.
In various experiments trying variations, I have seen four-fold
degradations and worse.

My test case is a kernel compile using a full OL kernel config
file, which is roughly equivalent to a RHEL6 config.  Compiling
this kernel, using similar hardware, I have never seen a runtime
less than ~800 seconds for any value of N.  I suspect that my
test case, having much more source to compile, causes each of the N
threads in a "make -jN" to have more work to do, in parallel.

Since your test harness is obviously all set up, would you be
willing to reproduce your/my non-zcache/zcache runs with a RHEL6
config file and publish the results (using a 3.5 zcache)?

IIRC, the really bad zcache results started showing up at N=24.
I also wonder if you have anything else unusual in your
test setup, such as a fast swap disk (mine is a partition
on the same rotating disk as source and target of the kernel build,
the default install for a RHEL6 system)?  Or have you disabled
cleancache?  Or have you changed any sysfs parameters or
other kernel files?  Also, whether zcache or non-zcache,
I've noticed that the runtime of this workload when swapping
can vary by as much as 30-40%, so it would be wise to take at
least three samples to ensure a statistically valid comparison.
And are you using 512M of physical memory or relying on
kernel boot parameters to reduce visible memory... and
if the latter have you confirmed with /proc/meminfo?
Obviously, I'm baffled at the difference in our observations.

While I am always willing to admit that my numbers may be wrong,
I still can't imagine why you are in such a hurry to promote
zcache when these questions are looming.  Would you care to
explain why?  It seems reckless to me, and unlike the IBM
behavior I expect, so I really wonder about the motivation.

My goal is very simple: "First do no harm".  I don't think
zcache should be enabled for distros (and users) until we can
reasonably demonstrate that running a workload with zcache
is never substantially worse than running the same workload
without zcache.  If you can tell your customer: "Yes, always enable
zcache", great!  But if you have to tell your customer: "It
depends on the workload, enable it if it works for you, disable
it otherwise", then zcache will get a bad reputation, and
will/should never be enabled in a reputable non-hobbyist distro.
I fear the "demo" zcache will get a bad reputation
so prefer to delay promotion while there is serious doubt
about whether "harm" may occur.

Last, you've never explained what problems zcache solves
for you that zram does not.  With Minchan pushing for
the promotion of zram+zsmalloc, does zram solve your problem?
Another alternative might be to promote zcache as "demozcache"
(i.e. fork it for now).

It's hard to identify a reasonable compromise when you
are just saying "Gotta promote zcache NOW!" and not
explaining the problem you are trying to solve or motivations
behind it.

OK, Seth, I think all my cards are on the table.  Where's yours?
(And, hello, is anyone else following this anyway? :-)

Thanks,
Dan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-08-09 20:20     ` Dan Magenheimer
@ 2012-08-10 18:14       ` Seth Jennings
  2012-08-15  9:38         ` Konrad Rzeszutek Wilk
  2012-08-17 22:21         ` Dan Magenheimer
  0 siblings, 2 replies; 41+ messages in thread
From: Seth Jennings @ 2012-08-10 18:14 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Wilk, Robert Jennings, linux-mm, linux-kernel, devel,
	Kurt Hackel

On 08/09/2012 03:20 PM, Dan Magenheimer wrote:
> I also wonder if you have anything else unusual in your
> test setup, such as a fast swap disk (mine is a partition
> on the same rotating disk as source and target of the kernel build,
> the default install for a RHEL6 system)?

I'm using a normal SATA HDD with two partitions, one for
swap and the other an ext3 filesystem with the kernel source.

> Or have you disabled cleancache?

Yes, I _did_ disable cleancache.  I could see where having
cleancache enabled could explain the difference in results.

> Or have you changed any sysfs parameters or
> other kernel files?

No.

> And are you using 512M of physical memory or relying on
> kernel boot parameters to reduce visible memory

Limited with mem=512M boot parameter.

> ... and
> if the latter have you confirmed with /proc/meminfo?

Yes, confirmed.

Seth


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-07-27 18:18 [PATCH 0/4] promote zcache from staging Seth Jennings
                   ` (5 preceding siblings ...)
  2012-08-07 20:23 ` Seth Jennings
@ 2012-08-14 22:18 ` Seth Jennings
  2012-08-14 23:29   ` Minchan Kim
  6 siblings, 1 reply; 41+ messages in thread
From: Seth Jennings @ 2012-08-14 22:18 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Robert Jennings,
	linux-mm, linux-kernel, devel

On 07/27/2012 01:18 PM, Seth Jennings wrote:
> zcache is the remaining piece of code required to support in-kernel
> memory compression.  The other two features, cleancache and frontswap,
> have been promoted to mainline in 3.0 and 3.5.  This patchset
> promotes zcache from the staging tree to mainline.
> 
> Based on the level of activity and contributions we're seeing from a
> diverse set of people and interests, I think zcache has matured to the
> point where it makes sense to promote this out of staging.

I am wondering if there is any more discussion to be had on
the topic of promoting zcache.  The discussion got dominated
by performance concerns, but hopefully my latest performance
metrics have alleviated those concerns for most and shown
the continuing value of zcache in both I/O and runtime savings.

I'm not saying that zcache development is complete by any
means. There are still many improvements that can be made.
I'm just saying that I believe it is stable and beneficial
enough to leave the staging tree.

Seth


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/4] promote zcache from staging
  2012-08-14 22:18 ` Seth Jennings
@ 2012-08-14 23:29   ` Minchan Kim
  0 siblings, 0 replies; 41+ messages in thread
From: Minchan Kim @ 2012-08-14 23:29 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Robert Jennings,
	linux-mm, linux-kernel, devel

Hi Seth,

On Tue, Aug 14, 2012 at 05:18:57PM -0500, Seth Jennings wrote:
> On 07/27/2012 01:18 PM, Seth Jennings wrote:
> > zcache is the remaining piece of code required to support in-kernel
> > memory compression.  The other two features, cleancache and frontswap,
> > have been promoted to mainline in 3.0 and 3.5.  This patchset
> > promotes zcache from the staging tree to mainline.
> > 
> > Based on the level of activity and contributions we're seeing from a
> > diverse set of people and interests, I think zcache has matured to the
> > point where it makes sense to promote this out of staging.
> 
> I am wondering if there is any more discussion to be had on
> the topic of promoting zcache.  The discussion got dominated
> by performance concerns, but hopefully my latest performance
> metrics have alleviated those concerns for most and shown
> the continuing value of zcache in both I/O and runtime savings.
> 
> I'm not saying that zcache development is complete by any
> means. There are still many improvements that can be made.
> I'm just saying that I believe it is stable and beneficial
> enough to leave the staging tree.
> 
> Seth

I want to do some cleanup on zcache, but I'm okay with doing that after
it is promoted, if Andrew merges it.  However, I'm not sure he won't mind
the current code quality, which includes too few comments, poor
variable/function names, and a lot of code duplicated from ramster.
Anyway, I think we should at least unify the common code between zcache
and ramster before promoting.  Otherwise, it would make refactoring hard
because we would always have to touch both sides for even a simple
cleanup.  That means a zcache contributor doing cleanup would also need
to know ramster well, which is not desirable.


> 


* Re: [PATCH 0/4] promote zcache from staging
  2012-08-10 18:14       ` Seth Jennings
@ 2012-08-15  9:38         ` Konrad Rzeszutek Wilk
  2012-08-15 14:24           ` Seth Jennings
  2012-08-17 22:21         ` Dan Magenheimer
  1 sibling, 1 reply; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-08-15  9:38 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Dan Magenheimer, Greg Kroah-Hartman, Andrew Morton, Nitin Gupta,
	Minchan Kim, Robert Jennings, linux-mm, linux-kernel, devel,
	Kurt Hackel

On Fri, Aug 10, 2012 at 01:14:01PM -0500, Seth Jennings wrote:
> On 08/09/2012 03:20 PM, Dan Magenheimer wrote:
> > I also wonder if you have anything else unusual in your
> > test setup, such as a fast swap disk (mine is a partition
> > on the same rotating disk as source and target of the kernel build,
> > the default install for a RHEL6 system)?
> 
> I'm using a normal SATA HDD with two partitions, one for
> swap and the other an ext3 filesystem with the kernel source.
> 
> > Or have you disabled cleancache?
> 
> Yes, I _did_ disable cleancache.  I could see where having
> cleancache enabled could explain the difference in results.

Why did you disable cleancache? Having both cleancache (to compress
fs data) and frontswap (to compress swap data) enabled is the goal,
yet you turned one of zcache's sources off.


* Re: [PATCH 0/4] promote zcache from staging
  2012-08-15  9:38         ` Konrad Rzeszutek Wilk
@ 2012-08-15 14:24           ` Seth Jennings
  0 siblings, 0 replies; 41+ messages in thread
From: Seth Jennings @ 2012-08-15 14:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Dan Magenheimer, Greg Kroah-Hartman, Andrew Morton, Nitin Gupta,
	Minchan Kim, Robert Jennings, linux-mm, linux-kernel, devel,
	Kurt Hackel

On 08/15/2012 04:38 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 10, 2012 at 01:14:01PM -0500, Seth Jennings wrote:
>> On 08/09/2012 03:20 PM, Dan Magenheimer wrote:
>>> I also wonder if you have anything else unusual in your
>>> test setup, such as a fast swap disk (mine is a partition
>>> on the same rotating disk as source and target of the kernel build,
>>> the default install for a RHEL6 system)?
>>
>> I'm using a normal SATA HDD with two partitions, one for
>> swap and the other an ext3 filesystem with the kernel source.
>>
>>> Or have you disabled cleancache?
>>
>> Yes, I _did_ disable cleancache.  I could see where having
>> cleancache enabled could explain the difference in results.
> 
> Why did you disable cleancache? Having both cleancache (to compress
> fs data) and frontswap (to compress swap data) enabled is the goal,
> yet you turned one of zcache's sources off.

I excluded cleancache to reduce interference/noise from the
benchmarking results. For this particular workload,
cleancache doesn't make a lot of sense since it will steal
pages that could otherwise be used for storing frontswap
pages to prevent swapin/swapout I/O.

In a test run with both enabled, I found that it didn't make
much difference under moderate to extreme memory pressure.
Both resulted in about 55% I/O reduction.  However, on light
memory pressure with 8 and 12 threads, it lowered the I/O
reduction ability of zcache to roughly 0 compared to ~20%
I/O reduction without cleancache.

In short, cleancache only had the power to harm in this
case, so I didn't enable it.

Seth



* RE: [PATCH 0/4] promote zcache from staging
  2012-08-10 18:14       ` Seth Jennings
  2012-08-15  9:38         ` Konrad Rzeszutek Wilk
@ 2012-08-17 22:21         ` Dan Magenheimer
  2012-08-17 23:33           ` Seth Jennings
  1 sibling, 1 reply; 41+ messages in thread
From: Dan Magenheimer @ 2012-08-17 22:21 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Wilk, Robert Jennings, linux-mm, linux-kernel, devel,
	Kurt Hackel

> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Subject: Re: [PATCH 0/4] promote zcache from staging
> 
> On 08/09/2012 03:20 PM, Dan Magenheimer wrote:
> > I also wonder if you have anything else unusual in your
> > test setup, such as a fast swap disk (mine is a partition
> > on the same rotating disk as source and target of the kernel build,
> > the default install for a RHEL6 system)?
> 
> I'm using a normal SATA HDD with two partitions, one for
> swap and the other an ext3 filesystem with the kernel source.
> 
> > Or have you disabled cleancache?
> 
> Yes, I _did_ disable cleancache.  I could see where having
> cleancache enabled could explain the difference in results.

Sorry to beat a dead horse, but I meant to report this
earlier in the week and got tied up by other things.

I finally got my test scaffold set up earlier this week
to try to reproduce my "bad" numbers with the RHEL6-ish
config file.

I found that with "make -j28" and "make -j32" I experienced
__DATA CORRUPTION__.  This was repeatable.

The type of error led me to believe that the problem was
due to concurrency of cleancache reclaim.  I did not try
with cleancache disabled to prove/support this theory
but it is consistent with the fact that you (Seth) have not
seen a similar problem and have disabled cleancache.

While this problem is most likely in my code and I am
suitably chagrined, it re-emphasizes the fact that
the current zcache in staging is 20-month old "demo"
code.  The proposed new zcache codebase handles concurrency
much more effectively.

I'll be away from email for a few days now.

Dan


* Re: [PATCH 0/4] promote zcache from staging
  2012-08-17 22:21         ` Dan Magenheimer
@ 2012-08-17 23:33           ` Seth Jennings
  2012-08-18 19:09             ` Dan Magenheimer
  0 siblings, 1 reply; 41+ messages in thread
From: Seth Jennings @ 2012-08-17 23:33 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Wilk, Robert Jennings, linux-mm, linux-kernel, devel,
	Kurt Hackel

On 08/17/2012 05:21 PM, Dan Magenheimer wrote:
>> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
>> Subject: Re: [PATCH 0/4] promote zcache from staging
>>
>> On 08/09/2012 03:20 PM, Dan Magenheimer wrote:
>>> I also wonder if you have anything else unusual in your
>>> test setup, such as a fast swap disk (mine is a partition
>>> on the same rotating disk as source and target of the kernel build,
>>> the default install for a RHEL6 system)?
>>
>> I'm using a normal SATA HDD with two partitions, one for
>> swap and the other an ext3 filesystem with the kernel source.
>>
>>> Or have you disabled cleancache?
>>
>> Yes, I _did_ disable cleancache.  I could see where having
>> cleancache enabled could explain the difference in results.
> 
> Sorry to beat a dead horse, but I meant to report this
> earlier in the week and got tied up by other things.
> 
> I finally got my test scaffold set up earlier this week
> to try to reproduce my "bad" numbers with the RHEL6-ish
> config file.
> 
> I found that with "make -j28" and "make -j32" I experienced
> __DATA CORRUPTION__.  This was repeatable.

I actually hit this for the first time a few hours ago when
I was running performance tests for your rewrite.  I didn't know
what to make of it yet.  The 24-thread kernel build failed
when both frontswap and cleancache were enabled.

> The type of error led me to believe that the problem was
> due to concurrency of cleancache reclaim.  I did not try
> with cleancache disabled to prove/support this theory
> but it is consistent with the fact that you (Seth) have not
> seen a similar problem and have disabled cleancache.
> 
> While this problem is most likely in my code and I am
> suitably chagrined, it re-emphasizes the fact that
> the current zcache in staging is 20-month old "demo"
> code.  The proposed new zcache codebase handles concurrency
> much more effectively.

I imagine this can be solved without rewriting the entire
codebase.  If your new code contains a fix for this, can we
just pull it as a single patch?

Seth



* RE: [PATCH 0/4] promote zcache from staging
  2012-08-17 23:33           ` Seth Jennings
@ 2012-08-18 19:09             ` Dan Magenheimer
  0 siblings, 0 replies; 41+ messages in thread
From: Dan Magenheimer @ 2012-08-18 19:09 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
	Konrad Wilk, Robert Jennings, linux-mm, linux-kernel, devel

> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Sent: Friday, August 17, 2012 5:33 PM
> To: Dan Magenheimer
> Cc: Greg Kroah-Hartman; Andrew Morton; Nitin Gupta; Minchan Kim; Konrad Wilk; Robert Jennings; linux-
> mm@kvack.org; linux-kernel@vger.kernel.org; devel@driverdev.osuosl.org; Kurt Hackel
> Subject: Re: [PATCH 0/4] promote zcache from staging
> 
> >
> > Sorry to beat a dead horse, but I meant to report this
> > earlier in the week and got tied up by other things.
> >
> > I finally got my test scaffold set up earlier this week
> > to try to reproduce my "bad" numbers with the RHEL6-ish
> > config file.
> >
> > I found that with "make -j28" and "make -j32" I experienced
> > __DATA CORRUPTION__.  This was repeatable.
> 
> I actually hit this for the first time a few hours ago when
> I was running performance tests for your rewrite.  I didn't know
> what to make of it yet.  The 24-thread kernel build failed
> when both frontswap and cleancache were enabled.
> 
> > The type of error led me to believe that the problem was
> > due to concurrency of cleancache reclaim.  I did not try
> > with cleancache disabled to prove/support this theory
> > but it is consistent with the fact that you (Seth) have not
> > seen a similar problem and have disabled cleancache.
> >
> > While this problem is most likely in my code and I am
> > suitably chagrined, it re-emphasizes the fact that
> > the current zcache in staging is 20-month old "demo"
> > code.  The proposed new zcache codebase handles concurrency
> > much more effectively.
> 
> I imagine this can be solved without rewriting the entire
> codebase.  If your new code contains a fix for this, can we
> just pull it as a single patch?

Hi Seth --

I didn't even observe this before this week, let alone fix it
as an individual bug.  The redesign takes LRU ordering into account,
handles zombie pageframes (which still have valid pointers to the
contained zbuds, and possibly valid data, so they can't be recycled
yet), and treats races and concurrency carefully.

The demo codebase is pretty dumb about concurrency, really
a hack that seemed to work.  Given the above, I guess the
hack only works _most_ of the time... when it doesn't,
data corruption can occur.

It would be an interesting challenge, but likely very
time-consuming, to fix this one bug while minimizing other
changes so that the fix could be delivered as a self-contained
incremental patch.  I suspect if you try, you will learn why
the rewrite was preferable and necessary.

(Away from email for a few days very soon now.)
Dan


end of thread, other threads:[~2012-08-18 19:10 UTC | newest]

Thread overview: 41+ messages
2012-07-27 18:18 [PATCH 0/4] promote zcache from staging Seth Jennings
2012-07-27 18:18 ` [PATCH 1/4] zsmalloc: collapse internal .h into .c Seth Jennings
2012-07-27 18:18 ` [PATCH 2/4] zsmalloc: promote to mm/ Seth Jennings
2012-07-27 18:18 ` [PATCH 3/4] drivers: add memory management driver class Seth Jennings
2012-07-31 15:31   ` Konrad Rzeszutek Wilk
2012-07-27 18:18 ` [PATCH 4/4] zcache: promote to drivers/mm/ Seth Jennings
2012-07-29  2:20 ` [PATCH 0/4] promote zcache from staging Minchan Kim
2012-08-07 20:23 ` Seth Jennings
2012-08-07 21:47   ` Dan Magenheimer
2012-08-08 16:29     ` Seth Jennings
2012-08-08 17:47       ` Dan Magenheimer
2012-08-09 18:50   ` Seth Jennings
2012-08-09 20:20     ` Dan Magenheimer
2012-08-10 18:14       ` Seth Jennings
2012-08-15  9:38         ` Konrad Rzeszutek Wilk
2012-08-15 14:24           ` Seth Jennings
2012-08-17 22:21         ` Dan Magenheimer
2012-08-17 23:33           ` Seth Jennings
2012-08-18 19:09             ` Dan Magenheimer
2012-08-14 22:18 ` Seth Jennings
2012-08-14 23:29   ` Minchan Kim
     [not found] <<1343413117-1989-1-git-send-email-sjenning@linux.vnet.ibm.com>
2012-07-27 19:21 ` Dan Magenheimer
2012-07-27 20:59   ` Konrad Rzeszutek Wilk
2012-07-27 21:42     ` Dan Magenheimer
2012-07-29  1:54       ` Minchan Kim
2012-07-31 15:36         ` Konrad Rzeszutek Wilk
2012-08-06  4:49           ` Minchan Kim
2012-07-30 19:19       ` Seth Jennings
2012-07-30 20:48         ` Dan Magenheimer
2012-07-31 15:58           ` Konrad Rzeszutek Wilk
2012-07-31 16:19             ` Greg Kroah-Hartman
2012-07-31 17:51               ` Konrad Rzeszutek Wilk
2012-07-31 18:19                 ` Seth Jennings
2012-08-06  0:38                 ` Minchan Kim
2012-08-06 15:24                   ` Dan Magenheimer
2012-08-06 15:47                     ` Pekka Enberg
2012-08-06 16:21                       ` Dan Magenheimer
2012-08-06 16:29                         ` Greg Kroah-Hartman
2012-08-06 16:38                           ` Dan Magenheimer
2012-08-07  0:44                         ` Minchan Kim
2012-08-07 19:28                   ` Konrad Rzeszutek Wilk
