* [PATCH 00/10] implement zsmalloc shrinking
@ 2014-09-11 20:53 Dan Streetman
  2014-09-11 20:53 ` [PATCH 01/10] zsmalloc: fix init_zspage free obj linking Dan Streetman
                   ` (11 more replies)
  0 siblings, 12 replies; 23+ messages in thread
From: Dan Streetman @ 2014-09-11 20:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Dan Streetman

Now that zswap can use zsmalloc as a storage pool via zpool, it will
try to shrink its zsmalloc zs_pool once it reaches its max_pool_percent
limit.  These patches implement zsmalloc shrinking.  The way the pool is
shrunk is by finding a zspage and reclaiming it, by evicting each of its
objects that is in use.

Without these patches zswap, and any other future user of zpool/zsmalloc
that attempts to shrink the zpool/zs_pool, will only get errors and will
be unable to shrink its zpool/zs_pool.  With the ability to shrink, zswap
can keep the most recent compressed pages in memory.

Note that the design of zsmalloc makes it impossible to actually find the
LRU zspage, so each class and fullness group is searched in a round-robin
method to find the next zspage to reclaim.  Each fullness group orders its
zspages in LRU order, so the oldest zspage is used for each fullness group.
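
For reference, the reclaim path this series adds looks roughly like the
condensed sketch below; the function names are the ones introduced in
patches 05, 08 and 09, and locking plus the class lookup are omitted:

/* condensed sketch of the shrink path added by this series */
int zs_shrink(struct zs_pool *pool)
{
        struct page *zspage;
        int ret;

        /* round-robin pick of the next (class, fullness group),
         * then take the oldest zspage on that list
         */
        zspage = find_lru_zspage(pool);
        if (!zspage)
                return -ENOENT;

        /* evict each in-use object via the pool owner's ops->evict()
         * callback, then free the now-empty zspage
         */
        ret = reclaim_zspage(pool, zspage);

        /* the real function returns class->pages_per_zspage, i.e. the
         * number of pages handed back to the system, on success
         */
        return ret;
}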

---

This patch set applies to linux-next.

Dan Streetman (10):
  zsmalloc: fix init_zspage free obj linking
  zsmalloc: add fullness group list for ZS_FULL zspages
  zsmalloc: always update lru ordering of each zspage
  zsmalloc: move zspage obj freeing to separate function
  zsmalloc: add atomic index to find zspage to reclaim
  zsmalloc: add zs_ops to zs_pool
  zsmalloc: add obj_handle_is_free()
  zsmalloc: add reclaim_zspage()
  zsmalloc: add zs_shrink()
  zsmalloc: implement zs_zpool_shrink() with zs_shrink()

 drivers/block/zram/zram_drv.c |   2 +-
 include/linux/zsmalloc.h      |   7 +-
 mm/zsmalloc.c                 | 314 +++++++++++++++++++++++++++++++++++++-----
 3 files changed, 290 insertions(+), 33 deletions(-)

-- 
1.8.3.1


* [PATCH 01/10] zsmalloc: fix init_zspage free obj linking
  2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
@ 2014-09-11 20:53 ` Dan Streetman
  2014-09-12  3:16   ` Seth Jennings
  2014-09-12  4:59   ` Minchan Kim
  2014-09-11 20:53 ` [PATCH 02/10] zsmalloc: add fullness group list for ZS_FULL zspages Dan Streetman
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 23+ messages in thread
From: Dan Streetman @ 2014-09-11 20:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Dan Streetman

When zsmalloc creates a new zspage, it initializes each object it contains
with a link to the next object, so that the zspage has a singly-linked list
of its free objects.  However, the logic that sets up the links is wrong,
and in the case of objects that are precisely aligned with the page boundaries
(e.g. a zspage with objects that are 1/2 PAGE_SIZE) the first object on the
next page is skipped, due to incrementing the offset twice.  The logic can be
simplified, as it doesn't need to calculate how many objects can fit on the
current page; simply checking the offset for each object is enough.

Change zsmalloc init_zspage() logic to iterate through each object on
each of its pages, checking the offset to verify the object is on the
current page before linking it into the zspage.
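
To make the offset arithmetic concrete, below is a small userspace-only
simulation of the old and new loops; PAGE_SIZE and the class size are
illustrative values, and none of this is zsmalloc code itself:

/* Simulates only the per-page offset bookkeeping of init_zspage();
 * the link setup itself is ignored.
 */
#include <stdio.h>

#define SIM_PAGE_SIZE 4096U

/* old logic: loop objs_on_page times, then advance off once more */
static unsigned int old_next_off(unsigned int off, unsigned int size)
{
        unsigned int i, objs_on_page = (SIM_PAGE_SIZE - off) / size;

        for (i = 1; i <= objs_on_page; i++)
                off += size;    /* link only set while off < SIM_PAGE_SIZE */
        return (off + size) % SIM_PAGE_SIZE;
}

/* new logic: advance off per object, stop at the page boundary */
static unsigned int new_next_off(unsigned int off, unsigned int size)
{
        while ((off += size) < SIM_PAGE_SIZE)
                ;               /* link set here */
        return off % SIM_PAGE_SIZE;
}

int main(void)
{
        unsigned int size = SIM_PAGE_SIZE / 2;  /* page-aligned objects */

        /* old prints 2048: the offset is advanced twice past the page
         * boundary.  new prints 0: the expected start of the next page.
         */
        printf("old: next page offset %u\n", old_next_off(0, size));
        printf("new: next page offset %u\n", new_next_off(0, size));
        return 0;
}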

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index c4a9157..03aa72f 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -628,7 +628,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
 	while (page) {
 		struct page *next_page;
 		struct link_free *link;
-		unsigned int i, objs_on_page;
+		unsigned int i = 1;
 
 		/*
 		 * page->index stores offset of first object starting
@@ -641,14 +641,10 @@ static void init_zspage(struct page *first_page, struct size_class *class)
 
 		link = (struct link_free *)kmap_atomic(page) +
 						off / sizeof(*link);
-		objs_on_page = (PAGE_SIZE - off) / class->size;
 
-		for (i = 1; i <= objs_on_page; i++) {
-			off += class->size;
-			if (off < PAGE_SIZE) {
-				link->next = obj_location_to_handle(page, i);
-				link += class->size / sizeof(*link);
-			}
+		while ((off += class->size) < PAGE_SIZE) {
+			link->next = obj_location_to_handle(page, i++);
+			link += class->size / sizeof(*link);
 		}
 
 		/*
@@ -660,7 +656,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
 		link->next = obj_location_to_handle(next_page, 0);
 		kunmap_atomic(link);
 		page = next_page;
-		off = (off + class->size) % PAGE_SIZE;
+		off %= PAGE_SIZE;
 	}
 }
 
-- 
1.8.3.1


* [PATCH 02/10] zsmalloc: add fullness group list for ZS_FULL zspages
  2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
  2014-09-11 20:53 ` [PATCH 01/10] zsmalloc: fix init_zspage free obj linking Dan Streetman
@ 2014-09-11 20:53 ` Dan Streetman
  2014-09-11 20:53 ` [PATCH 03/10] zsmalloc: always update lru ordering of each zspage Dan Streetman
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Dan Streetman @ 2014-09-11 20:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Dan Streetman

Move ZS_FULL into the section of fullness_group entries that are tracked in
the class fullness_lists.  Without this change, full zspages are untracked
by zsmalloc; they are only moved back onto one of the tracked lists
(ZS_ALMOST_FULL or ZS_ALMOST_EMPTY) when a zsmalloc user frees one or more
of its contained objects.

This is required for zsmalloc shrinking, which needs to be able to search
all zspages in a zsmalloc pool, to find one to shrink.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 03aa72f..fedb70f 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -159,16 +159,19 @@
 					ZS_SIZE_CLASS_DELTA + 1)
 
 /*
- * We do not maintain any list for completely empty or full pages
+ * We do not maintain any list for completely empty zspages,
+ * since a zspage is freed when it becomes empty.
  */
 enum fullness_group {
 	ZS_ALMOST_FULL,
 	ZS_ALMOST_EMPTY,
+	ZS_FULL,
+
 	_ZS_NR_FULLNESS_GROUPS,
 
 	ZS_EMPTY,
-	ZS_FULL
 };
+#define _ZS_NR_AVAILABLE_FULLNESS_GROUPS ZS_FULL
 
 /*
  * We assign a page to ZS_ALMOST_EMPTY fullness group when:
@@ -722,12 +725,12 @@ cleanup:
 	return first_page;
 }
 
-static struct page *find_get_zspage(struct size_class *class)
+static struct page *find_available_zspage(struct size_class *class)
 {
 	int i;
 	struct page *page;
 
-	for (i = 0; i < _ZS_NR_FULLNESS_GROUPS; i++) {
+	for (i = 0; i < _ZS_NR_AVAILABLE_FULLNESS_GROUPS; i++) {
 		page = class->fullness_list[i];
 		if (page)
 			break;
@@ -1013,7 +1016,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	BUG_ON(class_idx != class->index);
 
 	spin_lock(&class->lock);
-	first_page = find_get_zspage(class);
+	first_page = find_available_zspage(class);
 
 	if (!first_page) {
 		spin_unlock(&class->lock);
-- 
1.8.3.1


* [PATCH 03/10] zsmalloc: always update lru ordering of each zspage
  2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
  2014-09-11 20:53 ` [PATCH 01/10] zsmalloc: fix init_zspage free obj linking Dan Streetman
  2014-09-11 20:53 ` [PATCH 02/10] zsmalloc: add fullness group list for ZS_FULL zspages Dan Streetman
@ 2014-09-11 20:53 ` Dan Streetman
  2014-09-12  3:20   ` Seth Jennings
  2014-09-11 20:53 ` [PATCH 04/10] zsmalloc: move zspage obj freeing to separate function Dan Streetman
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Dan Streetman @ 2014-09-11 20:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Dan Streetman

Update ordering of a changed zspage in its fullness group LRU list,
even if it has not moved to a different fullness group.

This is needed by zsmalloc shrinking, which partially relies on each
class fullness group list to be kept in LRU order, so the oldest can
be reclaimed first.  Currently, LRU ordering is only updated when
a zspage changes fullness groups.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index fedb70f..51db622 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -467,16 +467,14 @@ static enum fullness_group fix_fullness_group(struct zs_pool *pool,
 	BUG_ON(!is_first_page(page));
 
 	get_zspage_mapping(page, &class_idx, &currfg);
-	newfg = get_fullness_group(page);
-	if (newfg == currfg)
-		goto out;
-
 	class = &pool->size_class[class_idx];
+	newfg = get_fullness_group(page);
+	/* Need to do this even if currfg == newfg, to update lru */
 	remove_zspage(page, class, currfg);
 	insert_zspage(page, class, newfg);
-	set_zspage_mapping(page, class_idx, newfg);
+	if (currfg != newfg)
+		set_zspage_mapping(page, class_idx, newfg);
 
-out:
 	return newfg;
 }
 
-- 
1.8.3.1


* [PATCH 04/10] zsmalloc: move zspage obj freeing to separate function
  2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
                   ` (2 preceding siblings ...)
  2014-09-11 20:53 ` [PATCH 03/10] zsmalloc: always update lru ordering of each zspage Dan Streetman
@ 2014-09-11 20:53 ` Dan Streetman
  2014-09-11 20:53 ` [PATCH 05/10] zsmalloc: add atomic index to find zspage to reclaim Dan Streetman
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Dan Streetman @ 2014-09-11 20:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Dan Streetman

Move the code that frees a zspage object out of the zs_free()
function and into its own obj_free() function.

This is required by zsmalloc shrinking, which will also need to
free objects during zspage reclaiming.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 51db622..cff8935 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -584,6 +584,21 @@ static unsigned long obj_idx_to_offset(struct page *page,
 	return off + obj_idx * class_size;
 }
 
+static void obj_free(unsigned long obj, struct page *page, unsigned long offset)
+{
+	struct page *first_page = get_first_page(page);
+	struct link_free *link;
+
+	/* Insert this object in containing zspage's freelist */
+	link = (struct link_free *)((unsigned char *)kmap_atomic(page)
+							+ offset);
+	link->next = first_page->freelist;
+	kunmap_atomic(link);
+	first_page->freelist = (void *)obj;
+
+	first_page->inuse--;
+}
+
 static void reset_page(struct page *page)
 {
 	clear_bit(PG_private, &page->flags);
@@ -1049,7 +1064,6 @@ EXPORT_SYMBOL_GPL(zs_malloc);
 
 void zs_free(struct zs_pool *pool, unsigned long obj)
 {
-	struct link_free *link;
 	struct page *first_page, *f_page;
 	unsigned long f_objidx, f_offset;
 
@@ -1069,14 +1083,8 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
 
 	spin_lock(&class->lock);
 
-	/* Insert this object in containing zspage's freelist */
-	link = (struct link_free *)((unsigned char *)kmap_atomic(f_page)
-							+ f_offset);
-	link->next = first_page->freelist;
-	kunmap_atomic(link);
-	first_page->freelist = (void *)obj;
+	obj_free(obj, f_page, f_offset);
 
-	first_page->inuse--;
 	fullness = fix_fullness_group(pool, first_page);
 	spin_unlock(&class->lock);
 
-- 
1.8.3.1


* [PATCH 05/10] zsmalloc: add atomic index to find zspage to reclaim
  2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
                   ` (3 preceding siblings ...)
  2014-09-11 20:53 ` [PATCH 04/10] zsmalloc: move zspage obj freeing to separate function Dan Streetman
@ 2014-09-11 20:53 ` Dan Streetman
  2014-09-11 20:53 ` [PATCH 06/10] zsmalloc: add zs_ops to zs_pool Dan Streetman
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Dan Streetman @ 2014-09-11 20:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Dan Streetman

Add an atomic index that allows multiple threads to concurrently and
sequentially iterate through the zspages in all classes and fullness
groups.  Add a function find_next_lru_class_fg() to find the next class
fullness group to check for a zspage.  Add a function find_lru_zspage()
which calls find_next_lru_class_fg() until a fullness group with an
available zspage is found.  This is required to implement zsmalloc pool
shrinking, which needs to be able to find a zspage to reclaim.

Since zsmalloc categorizes its zspages in arrays of fullness groups, which
are themselves inside arrays of classes, there is no (simple) way to
determine the LRU order of all of a zsmalloc pool's zspages.  But to implement
shrinking, there must be some way to select the zspage to reclaim.  This
can't be a simple iteration through all classes starting from the first one
each time, since any failure to reclaim a zspage would cause each following
reclaim attempt to target that same zspage, likely resulting in repeated
failures to shrink the pool.
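
As an illustration of the mapping used below, a single shared counter is
split into a class index and a fullness group index; the tiny userspace
program here shows the order in which (class, group) pairs are visited
(SIM_SIZE_CLASSES is an arbitrary small value, the real ZS_SIZE_CLASSES
depends on the kernel configuration):

/* Mirrors the index split in find_next_lru_class_fg() below:
 * i % ZS_SIZE_CLASSES selects the class, i / ZS_SIZE_CLASSES selects
 * the fullness group, so every class is tried for ZS_ALMOST_EMPTY
 * before any class is tried for ZS_ALMOST_FULL, and so on.
 */
#include <stdio.h>

#define SIM_SIZE_CLASSES        4       /* illustrative only */
#define SIM_NR_FG               3

static const char * const fg_name[SIM_NR_FG] = {
        "ZS_ALMOST_EMPTY", "ZS_ALMOST_FULL", "ZS_FULL"
};

int main(void)
{
        int counter, nr = SIM_SIZE_CLASSES * SIM_NR_FG;

        for (counter = 0; counter < 2 * nr; counter++) {
                int i = counter % nr;   /* wraparound of the counter */

                printf("counter %2d -> class %d, %s\n", counter,
                       i % SIM_SIZE_CLASSES, fg_name[i / SIM_SIZE_CLASSES]);
        }
        return 0;
}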

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index cff8935..a2e417b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -242,6 +242,16 @@ struct mapping_area {
 	enum zs_mapmode vm_mm; /* mapping mode */
 };
 
+/* atomic counter indicating which class/fg to reclaim from */
+static atomic_t lru_class_fg;
+/* specific order of fg we want to reclaim from */
+static enum fullness_group lru_fg[] = {
+	ZS_ALMOST_EMPTY,
+	ZS_ALMOST_FULL,
+	ZS_FULL
+};
+#define _ZS_NR_LRU_CLASS_FG (ZS_SIZE_CLASSES * ARRAY_SIZE(lru_fg))
+
 /* zpool driver */
 
 #ifdef CONFIG_ZPOOL
@@ -752,6 +762,64 @@ static struct page *find_available_zspage(struct size_class *class)
 	return page;
 }
 
+/* this simply iterates atomically through all classes,
+ * using a specific fullness group.  At the end, it starts
+ * over using the next fullness group, and so on.  The
+ * fullness groups are used in a specific order, from
+ * least to most full.
+ */
+static void find_next_lru_class_fg(struct zs_pool *pool,
+			struct size_class **class, enum fullness_group *fg)
+{
+	int i = atomic_inc_return(&lru_class_fg);
+
+	if (i >= _ZS_NR_LRU_CLASS_FG) {
+		int orig = i;
+
+		i %= _ZS_NR_LRU_CLASS_FG;
+		/* only need to try once, since if we don't
+		 * succeed whoever changed it will also try
+		 * and eventually someone will reset it
+		 */
+		atomic_cmpxchg(&lru_class_fg, orig, i);
+	}
+	*class = &pool->size_class[i % ZS_SIZE_CLASSES];
+	*fg = lru_fg[i / ZS_SIZE_CLASSES];
+}
+
+/*
+ * This attempts to find the LRU zspage, but that's not really possible
+ * because zspages are not contained in a single LRU list, they're
+ * contained inside fullness groups which are themselves contained
+ * inside classes.  So this simply iterates through the classes and
+ * fullness groups to find the next non-empty fullness group, and
+ * uses the LRU zspage there.
+ *
+ * On success, the zspage is returned with its class locked.
+ * On failure, NULL is returned.
+ */
+static struct page *find_lru_zspage(struct zs_pool *pool)
+{
+	struct size_class *class;
+	struct page *page;
+	enum fullness_group fg;
+	int tries = 0;
+
+	while (tries++ < _ZS_NR_LRU_CLASS_FG) {
+		find_next_lru_class_fg(pool, &class, &fg);
+
+		spin_lock(&class->lock);
+
+		page = class->fullness_list[fg];
+		if (page)
+			return list_prev_entry(page, lru);
+
+		spin_unlock(&class->lock);
+	}
+
+	return NULL;
+}
+
 #ifdef CONFIG_PGTABLE_MAPPING
 static inline int __zs_cpu_up(struct mapping_area *area)
 {
-- 
1.8.3.1


* [PATCH 06/10] zsmalloc: add zs_ops to zs_pool
  2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
                   ` (4 preceding siblings ...)
  2014-09-11 20:53 ` [PATCH 05/10] zsmalloc: add atomic index to find zspage to reclaim Dan Streetman
@ 2014-09-11 20:53 ` Dan Streetman
  2014-09-11 20:53 ` [PATCH 07/10] zsmalloc: add obj_handle_is_free() Dan Streetman
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Dan Streetman @ 2014-09-11 20:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Dan Streetman

Add struct zs_ops with an evict() callback function.  Add documentation
to the zs_free() function clarifying that it must not be called with a
handle after that handle has been successfully evicted; since evict()
calls into a function provided by the zs_pool creator, the creator is
responsible for ensuring this requirement.

This is required to implement zsmalloc shrinking.
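
As a usage sketch (the caller below is hypothetical, not part of this
series): a zsmalloc user that wants its pool to be shrinkable now passes
a struct zs_ops whose evict() callback writes the object out and returns
0, or -errno on failure; a user that never shrinks its pool, such as
zram in the hunk below, simply passes NULL:

/* hypothetical pool owner; my_writeback_handle() is an assumed helper
 * that decompresses the object at @handle and writes it out
 */
static int my_evict(struct zs_pool *pool, unsigned long handle)
{
        return my_writeback_handle(pool, handle);
}

static struct zs_ops my_zs_ops = {
        .evict = my_evict,
};

static struct zs_pool *my_create_pool(void)
{
        return zs_create_pool(GFP_NOIO | __GFP_HIGHMEM, &my_zs_ops);
}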

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c |  2 +-
 include/linux/zsmalloc.h      |  6 +++++-
 mm/zsmalloc.c                 | 26 ++++++++++++++++++++++++--
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index bc20fe1..31ba9c7 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -328,7 +328,7 @@ static struct zram_meta *zram_meta_alloc(u64 disksize)
 		goto free_meta;
 	}
 
-	meta->mem_pool = zs_create_pool(GFP_NOIO | __GFP_HIGHMEM);
+	meta->mem_pool = zs_create_pool(GFP_NOIO | __GFP_HIGHMEM, NULL);
 	if (!meta->mem_pool) {
 		pr_err("Error creating memory pool\n");
 		goto free_table;
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 05c2147..2c341d4 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -36,7 +36,11 @@ enum zs_mapmode {
 
 struct zs_pool;
 
-struct zs_pool *zs_create_pool(gfp_t flags);
+struct zs_ops {
+	int (*evict)(struct zs_pool *pool, unsigned long handle);
+};
+
+struct zs_pool *zs_create_pool(gfp_t flags, struct zs_ops *ops);
 void zs_destroy_pool(struct zs_pool *pool);
 
 unsigned long zs_malloc(struct zs_pool *pool, size_t size);
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a2e417b..3dc7dae 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -221,6 +221,8 @@ struct zs_pool {
 
 	gfp_t flags;	/* allocation flags used when growing pool */
 	atomic_long_t pages_allocated;
+
+	struct zs_ops *ops;
 };
 
 /*
@@ -256,9 +258,18 @@ static enum fullness_group lru_fg[] = {
 
 #ifdef CONFIG_ZPOOL
 
+static int zs_zpool_evict(struct zs_pool *pool, unsigned long handle)
+{
+	return zpool_evict(pool, handle);
+}
+
+static struct zs_ops zs_zpool_ops = {
+	.evict =	zs_zpool_evict
+};
+
 static void *zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_ops)
 {
-	return zs_create_pool(gfp);
+	return zs_create_pool(gfp, &zs_zpool_ops);
 }
 
 static void zs_zpool_destroy(void *pool)
@@ -1019,7 +1030,7 @@ fail:
  * On success, a pointer to the newly created pool is returned,
  * otherwise NULL.
  */
-struct zs_pool *zs_create_pool(gfp_t flags)
+struct zs_pool *zs_create_pool(gfp_t flags, struct zs_ops *ops)
 {
 	int i, ovhd_size;
 	struct zs_pool *pool;
@@ -1046,6 +1057,7 @@ struct zs_pool *zs_create_pool(gfp_t flags)
 	}
 
 	pool->flags = flags;
+	pool->ops = ops;
 
 	return pool;
 }
@@ -1130,6 +1142,16 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 }
 EXPORT_SYMBOL_GPL(zs_malloc);
 
+/**
+ * zs_free - Free the handle from this pool.
+ * @pool: pool containing the handle
+ * @obj: the handle to free
+ *
+ * The caller must provide a valid handle that is contained
+ * in the provided pool.  The caller must ensure this is
+ * not called after evict() has returned successfully for the
+ * handle.
+ */
 void zs_free(struct zs_pool *pool, unsigned long obj)
 {
 	struct page *first_page, *f_page;
-- 
1.8.3.1


* [PATCH 07/10] zsmalloc: add obj_handle_is_free()
  2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
                   ` (5 preceding siblings ...)
  2014-09-11 20:53 ` [PATCH 06/10] zsmalloc: add zs_ops to zs_pool Dan Streetman
@ 2014-09-11 20:53 ` Dan Streetman
  2014-09-11 20:53 ` [PATCH 08/10] zsmalloc: add reclaim_zspage() Dan Streetman
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Dan Streetman @ 2014-09-11 20:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Dan Streetman

Add function obj_handle_is_free() which scans through the entire
singly-linked list of free objects inside the provided zspage to
determine if the provided object handle is free or not.  This is
required by zspage reclaiming, which needs to evict each object
that is currently in use by the zs_pool owner, but has no other
way to determine if an object is in use.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 3dc7dae..ab72390 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -605,6 +605,33 @@ static unsigned long obj_idx_to_offset(struct page *page,
 	return off + obj_idx * class_size;
 }
 
+static bool obj_handle_is_free(struct page *first_page,
+			struct size_class *class, unsigned long handle)
+{
+	unsigned long obj, idx, offset;
+	struct page *page;
+	struct link_free *link;
+
+	BUG_ON(!is_first_page(first_page));
+
+	obj = (unsigned long)first_page->freelist;
+
+	while (obj) {
+		if (obj == handle)
+			return true;
+
+		obj_handle_to_location(obj, &page, &idx);
+		offset = obj_idx_to_offset(page, idx, class->size);
+
+		link = (struct link_free *)kmap_atomic(page) +
+					offset / sizeof(*link);
+		obj = (unsigned long)link->next;
+		kunmap_atomic(link);
+	}
+
+	return false;
+}
+
 static void obj_free(unsigned long obj, struct page *page, unsigned long offset)
 {
 	struct page *first_page = get_first_page(page);
-- 
1.8.3.1


* [PATCH 08/10] zsmalloc: add reclaim_zspage()
  2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
                   ` (6 preceding siblings ...)
  2014-09-11 20:53 ` [PATCH 07/10] zsmalloc: add obj_handle_is_free() Dan Streetman
@ 2014-09-11 20:53 ` Dan Streetman
  2014-09-11 20:54 ` [PATCH 09/10] zsmalloc: add zs_shrink() Dan Streetman
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Dan Streetman @ 2014-09-11 20:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Dan Streetman

Add function reclaim_zspage() to evict each object in use in the provided
zspage, so that it can be freed.  This is required to be able to shrink
the zs_pool.  Check in zs_free() if the handle's zspage is in the reclaim
fullness group, and if so ignore it, since it will be freed during reclaim.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index ab72390..60fd23e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -170,6 +170,7 @@ enum fullness_group {
 	_ZS_NR_FULLNESS_GROUPS,
 
 	ZS_EMPTY,
+	ZS_RECLAIM
 };
 #define _ZS_NR_AVAILABLE_FULLNESS_GROUPS ZS_FULL
 
@@ -786,6 +787,80 @@ cleanup:
 	return first_page;
 }
 
+/*
+ * This tries to reclaim all the provided zspage's objects by calling the
+ * zs_pool's ops->evict function for each object in use.  This requires
+ * the zspage's class lock to be held when calling this function.  Since
+ * the evict function may sleep, this drops the class lock before evicting
+ * any objects.  No other locks should be held when calling this function.
+ * This will return with the class lock unlocked.
+ *
+ * If there is no zs_pool->ops or ops->evict function, this returns error.
+ *
+ * This returns 0 on success, -err on failure.  On failure, some of the
+ * objects may have been freed, but not all.  On success, the entire zspage
+ * has been freed and should not be used anymore.
+ */
+static int reclaim_zspage(struct zs_pool *pool, struct page *first_page)
+{
+	struct size_class *class;
+	enum fullness_group fullness;
+	struct page *page = first_page;
+	unsigned long handle;
+	int class_idx, ret = 0;
+
+	BUG_ON(!is_first_page(first_page));
+
+	get_zspage_mapping(first_page, &class_idx, &fullness);
+	class = &pool->size_class[class_idx];
+
+	assert_spin_locked(&class->lock);
+
+	if (!pool->ops || !pool->ops->evict) {
+		spin_unlock(&class->lock);
+		return -EINVAL;
+	}
+
+	/* move the zspage into the reclaim fullness group,
+	 * so it's not available for use by zs_malloc,
+	 * and won't be freed by zs_free
+	 */
+	remove_zspage(first_page, class, fullness);
+	set_zspage_mapping(first_page, class_idx, ZS_RECLAIM);
+
+	spin_unlock(&class->lock);
+
+	might_sleep();
+
+	while (page) {
+		unsigned long offset, idx = 0;
+
+		while ((offset = obj_idx_to_offset(page, idx, class->size))
+					< PAGE_SIZE) {
+			handle = (unsigned long)obj_location_to_handle(page,
+						idx++);
+			if (obj_handle_is_free(first_page, class, handle))
+				continue;
+			ret = pool->ops->evict(pool, handle);
+			if (ret) {
+				spin_lock(&class->lock);
+				fix_fullness_group(pool, first_page);
+				spin_unlock(&class->lock);
+				return ret;
+			}
+			obj_free(handle, page, offset);
+		}
+
+		page = get_next_page(page);
+	}
+
+	free_zspage(first_page);
+
+	atomic_long_sub(class->pages_per_zspage, &pool->pages_allocated);
+
+	return 0;
+}
+
 static struct page *find_available_zspage(struct size_class *class)
 {
 	int i;
@@ -1200,6 +1275,13 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
 
 	spin_lock(&class->lock);
 
+	/* must re-check fullness after taking class lock */
+	get_zspage_mapping(first_page, &class_idx, &fullness);
+	if (fullness == ZS_RECLAIM) {
+		spin_unlock(&class->lock);
+		return; /* will be freed during reclaim */
+	}
+
 	obj_free(obj, f_page, f_offset);
 
 	fullness = fix_fullness_group(pool, first_page);
-- 
1.8.3.1


* [PATCH 09/10] zsmalloc: add zs_shrink()
  2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
                   ` (7 preceding siblings ...)
  2014-09-11 20:53 ` [PATCH 08/10] zsmalloc: add reclaim_zspage() Dan Streetman
@ 2014-09-11 20:54 ` Dan Streetman
  2014-09-11 20:54 ` [PATCH 10/10] zsmalloc: implement zs_zpool_shrink() with zs_shrink() Dan Streetman
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Dan Streetman @ 2014-09-11 20:54 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Dan Streetman

Add function zs_shrink() to implement zs_pool shrinking.  This allows
the pool owner to reduce the size of the zs_pool by one zspage, which
contains one or more struct pages.  Once the zs_pool is shrunk, the
freed pages are available for system use.

This is used by zswap to limit its total system memory usage to a
user-defined amount, while attempting to keep the most recently stored
pages compressed in memory; the oldest (or older) pages are evicted
from the zsmalloc zs_pool and written out to the swap disk.
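
For example, a pool owner that needs to free at least a given number of
pages can call this in a loop; this is essentially what the zpool shrink
implementation in the final patch of this series does:

/* sketch: try to free at least `pages` pages from `pool` */
unsigned int freed = 0;
int ret;

while (freed < pages) {
        ret = zs_shrink(pool);
        if (ret <= 0)
                break;  /* error, or nothing left to reclaim */
        freed += ret;
}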

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
---
 include/linux/zsmalloc.h |  1 +
 mm/zsmalloc.c            | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 2c341d4..07fe84d 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -45,6 +45,7 @@ void zs_destroy_pool(struct zs_pool *pool);
 
 unsigned long zs_malloc(struct zs_pool *pool, size_t size);
 void zs_free(struct zs_pool *pool, unsigned long obj);
+int zs_shrink(struct zs_pool *pool);
 
 void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 			enum zs_mapmode mm);
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 60fd23e..f769c21 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1296,6 +1296,41 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
 EXPORT_SYMBOL_GPL(zs_free);
 
 /**
+ * zs_shrink - Shrink the pool
+ * @pool: pool to shrink
+ *
+ * The pool will be shrunk by one zspage, which is some
+ * number of pages in size.  On success, the number of freed
+ * pages is returned.  On failure, the error is returned.
+ */
+int zs_shrink(struct zs_pool *pool)
+{
+	struct size_class *class;
+	enum fullness_group fullness;
+	struct page *page;
+	int class_idx, ret;
+
+	if (!pool->ops || !pool->ops->evict)
+		return -EINVAL;
+
+	/* if a page is found, the class is locked */
+	page = find_lru_zspage(pool);
+	if (!page)
+		return -ENOENT;
+
+	get_zspage_mapping(page, &class_idx, &fullness);
+	class = &pool->size_class[class_idx];
+
+	/* reclaim_zspage unlocks the class lock */
+	ret = reclaim_zspage(pool, page);
+	if (ret)
+		return ret;
+
+	return class->pages_per_zspage;
+}
+EXPORT_SYMBOL_GPL(zs_shrink);
+
+/**
  * zs_map_object - get address of allocated object from handle.
  * @pool: pool from which the object was allocated
  * @handle: handle returned from zs_malloc
-- 
1.8.3.1


* [PATCH 10/10] zsmalloc: implement zs_zpool_shrink() with zs_shrink()
  2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
                   ` (8 preceding siblings ...)
  2014-09-11 20:54 ` [PATCH 09/10] zsmalloc: add zs_shrink() Dan Streetman
@ 2014-09-11 20:54 ` Dan Streetman
  2014-09-12  3:14 ` [PATCH 00/10] implement zsmalloc shrinking Seth Jennings
  2014-09-12  5:46 ` Minchan Kim
  11 siblings, 0 replies; 23+ messages in thread
From: Dan Streetman @ 2014-09-11 20:54 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Dan Streetman

Implement the zs_zpool_shrink() function, which previously just returned
-EINVAL, by calling the zs_shrink() function to shrink the zs_pool by one
zspage.  The zs_shrink() function is called in a loop until the requested
number of pages have been reclaimed, or an error occurs.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f769c21..4937b2b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -292,7 +292,20 @@ static void zs_zpool_free(void *pool, unsigned long handle)
 static int zs_zpool_shrink(void *pool, unsigned int pages,
 			unsigned int *reclaimed)
 {
-	return -EINVAL;
+	int total = 0, ret = 0;
+
+	while (total < pages) {
+		ret = zs_shrink(pool);
+		WARN_ON(!ret);
+		if (ret <= 0)
+			break;
+		total += ret;
+		ret = 0;
+	}
+
+	if (reclaimed)
+		*reclaimed = total;
+	return ret;
 }
 
 static void *zs_zpool_map(void *pool, unsigned long handle,
-- 
1.8.3.1


* Re: [PATCH 00/10] implement zsmalloc shrinking
  2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
                   ` (9 preceding siblings ...)
  2014-09-11 20:54 ` [PATCH 10/10] zsmalloc: implement zs_zpool_shrink() with zs_shrink() Dan Streetman
@ 2014-09-12  3:14 ` Seth Jennings
  2014-09-12  5:46 ` Minchan Kim
  11 siblings, 0 replies; 23+ messages in thread
From: Seth Jennings @ 2014-09-12  3:14 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky,
	Nitin Gupta, Andrew Morton

On Thu, Sep 11, 2014 at 04:53:51PM -0400, Dan Streetman wrote:
> Now that zswap can use zsmalloc as a storage pool via zpool, it will
> try to shrink its zsmalloc zs_pool once it reaches its max_pool_percent
> limit.  These patches implement zsmalloc shrinking.  The way the pool is
> shrunk is by finding a zspage and reclaiming it, by evicting each of its
> objects that is in use.
> 
> Without these patches zswap, and any other future user of zpool/zsmalloc
> that attempts to shrink the zpool/zs_pool, will only get errors and will
> be unable to shrink its zpool/zs_pool.  With the ability to shrink, zswap
> can keep the most recent compressed pages in memory.
> 
> Note that the design of zsmalloc makes it impossible to actually find the
> LRU zspage, so each class and fullness group is searched in a round-robin
> method to find the next zspage to reclaim.  Each fullness group orders its
> zspages in LRU order, so the oldest zspage is used for each fullness group.

After a quick inspection, the code looks reasonable.  Thanks!

I do wonder if this actually works well in practice though.

Have you run any tests that overflow the zsmalloc pool?  What does
performance look like at that point?  I would expect it would be worse
than allowing the overflow pages to go straight to swap, since, in
almost every case, you would be writing back more than one page.  In
some cases, MANY more than one page (up to 255 for a full zspage in the
minimum class size).

There have always been two sticking points with shrinking in zsmalloc
(one of which you have mentioned):

1) Low LRU locality among objects in a zspage.  zsmalloc values density
over reclaim ordering so it is hard to make good reclaim selection
decisions.

2) Writeback storm. If you try to reclaim a zspage with lots of objects
(i.e. small class size in fullness group ZS_FULL) you can create a ton
of memory pressure by uncompressing objects and adding them to the swap
cache.

A few reclaim models:

- Reclaim zspage with fewest objects: 

  This reduces writeback storm but would likely reclaim more recently
  allocated zspages that contain more recently used (added) objects.

- Reclaim zspage with largest class size:

  This also reduces writeback storm as zspages with larger objects
  (poorly compressible) are written back first.  This is not LRU though.
  This is the best of the options IMHO.  I'm not saying that is it good.

- Reclaim LRU round-robin through the fullness groups (approach used):

  The LRU here is limited since, as the number of objects in the zspage
  increases, it is LRU only wrt the most recently added object in the
  zspage.  It also has high risk of a writeback storm since it will
  eventually try to reclaim from the ZS_FULL group of the minimum class
  size.

There is also the point that writing back objects might not be the best
way to reclaim from zsmalloc at all.  Maybe compaction is the way to go.
This was recently discussed on the list.

http://marc.info/?l=linux-mm&m=140917577412645&w=2

As mentioned in that thread, it would require zsmalloc to add a
layer of indirection so that the objects could be relocated without
notifying the user.  The compaction mechanism would also be "fun" to
design I imagine.  But, in my mind, compaction is really needed,
regardless of whether or not zsmalloc is capable of writeback, and would
be more beneficial.

tl;dr version:

I would really need to see some evidence (and try it myself) that this
didn't run off a cliff when you overflow the zsmalloc pool.  It seems
like additional risk and complexity to avoid LRU inversion _after_ the
pool overflows.  And by "avoid" I mean "maybe avoid" as the reclaim
selection is just slightly more LRUish than random selection.

Thanks,
Seth

> 
> ---
> 
> This patch set applies to linux-next.
> 
> Dan Streetman (10):
>   zsmalloc: fix init_zspage free obj linking
>   zsmalloc: add fullness group list for ZS_FULL zspages
>   zsmalloc: always update lru ordering of each zspage
>   zsmalloc: move zspage obj freeing to separate function
>   zsmalloc: add atomic index to find zspage to reclaim
>   zsmalloc: add zs_ops to zs_pool
>   zsmalloc: add obj_handle_is_free()
>   zsmalloc: add reclaim_zspage()
>   zsmalloc: add zs_shrink()
>   zsmalloc: implement zs_zpool_shrink() with zs_shrink()
> 
>  drivers/block/zram/zram_drv.c |   2 +-
>  include/linux/zsmalloc.h      |   7 +-
>  mm/zsmalloc.c                 | 314 +++++++++++++++++++++++++++++++++++++-----
>  3 files changed, 290 insertions(+), 33 deletions(-)
> 
> -- 
> 1.8.3.1
> 

* Re: [PATCH 01/10] zsmalloc: fix init_zspage free obj linking
  2014-09-11 20:53 ` [PATCH 01/10] zsmalloc: fix init_zspage free obj linking Dan Streetman
@ 2014-09-12  3:16   ` Seth Jennings
  2014-09-12  4:59   ` Minchan Kim
  1 sibling, 0 replies; 23+ messages in thread
From: Seth Jennings @ 2014-09-12  3:16 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky,
	Nitin Gupta, Andrew Morton

On Thu, Sep 11, 2014 at 04:53:52PM -0400, Dan Streetman wrote:
> When zsmalloc creates a new zspage, it initializes each object it contains
> with a link to the next object, so that the zspage has a singly-linked list
> of its free objects.  However, the logic that sets up the links is wrong,
> and in the case of objects that are precisely aligned with the page boundaries
> (e.g. a zspage with objects that are 1/2 PAGE_SIZE) the first object on the
> next page is skipped, due to incrementing the offset twice.  The logic can be
> simplified, as it doesn't need to calculate how many objects can fit on the
> current page; simply checking the offset for each object is enough.
> 
> Change zsmalloc init_zspage() logic to iterate through each object on
> each of its pages, checking the offset to verify the object is on the
> current page before linking it into the zspage.
> 
> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
> Cc: Minchan Kim <minchan@kernel.org>

This one stands on its own as a bugfix.

Reviewed-by: Seth Jennings <sjennings@variantweb.net>

> ---
>  mm/zsmalloc.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index c4a9157..03aa72f 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -628,7 +628,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
>  	while (page) {
>  		struct page *next_page;
>  		struct link_free *link;
> -		unsigned int i, objs_on_page;
> +		unsigned int i = 1;
>  
>  		/*
>  		 * page->index stores offset of first object starting
> @@ -641,14 +641,10 @@ static void init_zspage(struct page *first_page, struct size_class *class)
>  
>  		link = (struct link_free *)kmap_atomic(page) +
>  						off / sizeof(*link);
> -		objs_on_page = (PAGE_SIZE - off) / class->size;
>  
> -		for (i = 1; i <= objs_on_page; i++) {
> -			off += class->size;
> -			if (off < PAGE_SIZE) {
> -				link->next = obj_location_to_handle(page, i);
> -				link += class->size / sizeof(*link);
> -			}
> +		while ((off += class->size) < PAGE_SIZE) {
> +			link->next = obj_location_to_handle(page, i++);
> +			link += class->size / sizeof(*link);
>  		}
>  
>  		/*
> @@ -660,7 +656,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
>  		link->next = obj_location_to_handle(next_page, 0);
>  		kunmap_atomic(link);
>  		page = next_page;
> -		off = (off + class->size) % PAGE_SIZE;
> +		off %= PAGE_SIZE;
>  	}
>  }
>  
> -- 
> 1.8.3.1
> 

* Re: [PATCH 03/10] zsmalloc: always update lru ordering of each zspage
  2014-09-11 20:53 ` [PATCH 03/10] zsmalloc: always update lru ordering of each zspage Dan Streetman
@ 2014-09-12  3:20   ` Seth Jennings
  0 siblings, 0 replies; 23+ messages in thread
From: Seth Jennings @ 2014-09-12  3:20 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky,
	Nitin Gupta, Andrew Morton

On Thu, Sep 11, 2014 at 04:53:54PM -0400, Dan Streetman wrote:
> Update ordering of a changed zspage in its fullness group LRU list,
> even if it has not moved to a different fullness group.
> 
> This is needed by zsmalloc shrinking, which partially relies on each
> class fullness group list to be kept in LRU order, so the oldest can
> be reclaimed first.  Currently, LRU ordering is only updated when
> a zspage changes fullness groups.

Just something I saw.

fix_fullness_group() is called from zs_free(), which means that removing
an object from a zspage moves it to the front of the LRU.  Not sure if
that is what we want.  If anything that makes it a _better_ candidate
for reclaim as the zspage now contains fewer objects that we'll have
to decompress and write back.

Seth

> 
> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
> Cc: Minchan Kim <minchan@kernel.org>
> ---
>  mm/zsmalloc.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index fedb70f..51db622 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -467,16 +467,14 @@ static enum fullness_group fix_fullness_group(struct zs_pool *pool,
>  	BUG_ON(!is_first_page(page));
>  
>  	get_zspage_mapping(page, &class_idx, &currfg);
> -	newfg = get_fullness_group(page);
> -	if (newfg == currfg)
> -		goto out;
> -
>  	class = &pool->size_class[class_idx];
> +	newfg = get_fullness_group(page);
> +	/* Need to do this even if currfg == newfg, to update lru */
>  	remove_zspage(page, class, currfg);
>  	insert_zspage(page, class, newfg);
> -	set_zspage_mapping(page, class_idx, newfg);
> +	if (currfg != newfg)
> +		set_zspage_mapping(page, class_idx, newfg);
>  
> -out:
>  	return newfg;
>  }
>  
> -- 
> 1.8.3.1
> 

* Re: [PATCH 01/10] zsmalloc: fix init_zspage free obj linking
  2014-09-11 20:53 ` [PATCH 01/10] zsmalloc: fix init_zspage free obj linking Dan Streetman
  2014-09-12  3:16   ` Seth Jennings
@ 2014-09-12  4:59   ` Minchan Kim
  2014-09-12 16:43     ` Dan Streetman
  1 sibling, 1 reply; 23+ messages in thread
From: Minchan Kim @ 2014-09-12  4:59 UTC (permalink / raw)
  To: Dan Streetman
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton

On Thu, Sep 11, 2014 at 04:53:52PM -0400, Dan Streetman wrote:
> When zsmalloc creates a new zspage, it initializes each object it contains
> with a link to the next object, so that the zspage has a singly-linked list
> of its free objects.  However, the logic that sets up the links is wrong,
> and in the case of objects that are precisely aligned with the page boundaries
> (e.g. a zspage with objects that are 1/2 PAGE_SIZE) the first object on the
> next page is skipped, due to incrementing the offset twice.  The logic can be
> simplified, as it doesn't need to calculate how many objects can fit on the
> current page; simply checking the offset for each object is enough.

If objects are precisely aligned with the page boundary, pages_per_zspage
should be 1 so there is no next page.

> 
> Change zsmalloc init_zspage() logic to iterate through each object on
> each of its pages, checking the offset to verify the object is on the
> current page before linking it into the zspage.
> 
> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
> Cc: Minchan Kim <minchan@kernel.org>
> ---
>  mm/zsmalloc.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index c4a9157..03aa72f 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -628,7 +628,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
>  	while (page) {
>  		struct page *next_page;
>  		struct link_free *link;
> -		unsigned int i, objs_on_page;
> +		unsigned int i = 1;
>  
>  		/*
>  		 * page->index stores offset of first object starting
> @@ -641,14 +641,10 @@ static void init_zspage(struct page *first_page, struct size_class *class)
>  
>  		link = (struct link_free *)kmap_atomic(page) +
>  						off / sizeof(*link);
> -		objs_on_page = (PAGE_SIZE - off) / class->size;
>  
> -		for (i = 1; i <= objs_on_page; i++) {
> -			off += class->size;
> -			if (off < PAGE_SIZE) {
> -				link->next = obj_location_to_handle(page, i);
> -				link += class->size / sizeof(*link);
> -			}
> +		while ((off += class->size) < PAGE_SIZE) {
> +			link->next = obj_location_to_handle(page, i++);
> +			link += class->size / sizeof(*link);
>  		}
>  
>  		/*
> @@ -660,7 +656,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
>  		link->next = obj_location_to_handle(next_page, 0);
>  		kunmap_atomic(link);
>  		page = next_page;
> -		off = (off + class->size) % PAGE_SIZE;
> +		off %= PAGE_SIZE;
>  	}
>  }
>  
> -- 
> 1.8.3.1
> 

-- 
Kind regards,
Minchan Kim

* Re: [PATCH 00/10] implement zsmalloc shrinking
  2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
                   ` (10 preceding siblings ...)
  2014-09-12  3:14 ` [PATCH 00/10] implement zsmalloc shrinking Seth Jennings
@ 2014-09-12  5:46 ` Minchan Kim
  2014-09-12 17:05   ` Dan Streetman
  11 siblings, 1 reply; 23+ messages in thread
From: Minchan Kim @ 2014-09-12  5:46 UTC (permalink / raw)
  To: Dan Streetman
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Mel Gorman

On Thu, Sep 11, 2014 at 04:53:51PM -0400, Dan Streetman wrote:
> Now that zswap can use zsmalloc as a storage pool via zpool, it will
> try to shrink its zsmalloc zs_pool once it reaches its max_pool_percent
> limit.  These patches implement zsmalloc shrinking.  The way the pool is
> shrunk is by finding a zspage and reclaiming it, by evicting each of its
> objects that is in use.
> 
> Without these patches zswap, and any other future user of zpool/zsmalloc
> that attempts to shrink the zpool/zs_pool, will only get errors and will
> be unable to shrink its zpool/zs_pool.  With the ability to shrink, zswap
> can keep the most recent compressed pages in memory.
> 
> Note that the design of zsmalloc makes it impossible to actually find the
> LRU zspage, so each class and fullness group is searched in a round-robin
> method to find the next zspage to reclaim.  Each fullness group orders its
> zspages in LRU order, so the oldest zspage is used for each fullness group.
> 

1. Please Cc Mel, who was strongly against zswap with zsmalloc.
2. I don't think LRU stuff should be in the allocator layer.  Especially, it's really
   hard to make that work well in the zsmalloc design.
3. If you want to add another writeback, make zswap writeback sane first.
   The current implementation (zswap store -> zbud reclaim -> zswap writeback,
   even) is really ugly.
4. Don't make zsmalloc complicated without any data (benefit, regression).
   I will never ack if you don't give any numbers and a real use case.

> ---
> 
> This patch set applies to linux-next.
> 
> Dan Streetman (10):
>   zsmalloc: fix init_zspage free obj linking
>   zsmalloc: add fullness group list for ZS_FULL zspages
>   zsmalloc: always update lru ordering of each zspage
>   zsmalloc: move zspage obj freeing to separate function
>   zsmalloc: add atomic index to find zspage to reclaim
>   zsmalloc: add zs_ops to zs_pool
>   zsmalloc: add obj_handle_is_free()
>   zsmalloc: add reclaim_zspage()
>   zsmalloc: add zs_shrink()
>   zsmalloc: implement zs_zpool_shrink() with zs_shrink()
> 
>  drivers/block/zram/zram_drv.c |   2 +-
>  include/linux/zsmalloc.h      |   7 +-
>  mm/zsmalloc.c                 | 314 +++++++++++++++++++++++++++++++++++++-----
>  3 files changed, 290 insertions(+), 33 deletions(-)
> 
> -- 
> 1.8.3.1
> 

-- 
Kind regards,
Minchan Kim

* Re: [PATCH 01/10] zsmalloc: fix init_zspage free obj linking
  2014-09-12  4:59   ` Minchan Kim
@ 2014-09-12 16:43     ` Dan Streetman
  2014-09-14 23:24       ` Minchan Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Dan Streetman @ 2014-09-12 16:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Linux-MM, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton

On Fri, Sep 12, 2014 at 12:59 AM, Minchan Kim <minchan@kernel.org> wrote:
> On Thu, Sep 11, 2014 at 04:53:52PM -0400, Dan Streetman wrote:
>> When zsmalloc creates a new zspage, it initializes each object it contains
>> with a link to the next object, so that the zspage has a singly-linked list
>> of its free objects.  However, the logic that sets up the links is wrong,
>> and in the case of objects that are precisely aligned with the page boundaries
>> (e.g. a zspage with objects that are 1/2 PAGE_SIZE) the first object on the
>> next page is skipped, due to incrementing the offset twice.  The logic can be
>> simplified, as it doesn't need to calculate how many objects can fit on the
>> current page; simply checking the offset for each object is enough.
>
> If objects are precisely aligned with the page boundary, pages_per_zspage
> should be 1 so there is no next page.

ah, ok.  I wonder if it should be changed anyway so it doesn't rely on
that detail, in case that's ever changed in the future.  It's not
obvious the existing logic relies on that for correct operation.  And
this simplifies the logic too.

>
>>
>> Change zsmalloc init_zspage() logic to iterate through each object on
>> each of its pages, checking the offset to verify the object is on the
>> current page before linking it into the zspage.
>>
>> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
>> Cc: Minchan Kim <minchan@kernel.org>
>> ---
>>  mm/zsmalloc.c | 14 +++++---------
>>  1 file changed, 5 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
>> index c4a9157..03aa72f 100644
>> --- a/mm/zsmalloc.c
>> +++ b/mm/zsmalloc.c
>> @@ -628,7 +628,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
>>       while (page) {
>>               struct page *next_page;
>>               struct link_free *link;
>> -             unsigned int i, objs_on_page;
>> +             unsigned int i = 1;
>>
>>               /*
>>                * page->index stores offset of first object starting
>> @@ -641,14 +641,10 @@ static void init_zspage(struct page *first_page, struct size_class *class)
>>
>>               link = (struct link_free *)kmap_atomic(page) +
>>                                               off / sizeof(*link);
>> -             objs_on_page = (PAGE_SIZE - off) / class->size;
>>
>> -             for (i = 1; i <= objs_on_page; i++) {
>> -                     off += class->size;
>> -                     if (off < PAGE_SIZE) {
>> -                             link->next = obj_location_to_handle(page, i);
>> -                             link += class->size / sizeof(*link);
>> -                     }
>> +             while ((off += class->size) < PAGE_SIZE) {
>> +                     link->next = obj_location_to_handle(page, i++);
>> +                     link += class->size / sizeof(*link);
>>               }
>>
>>               /*
>> @@ -660,7 +656,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
>>               link->next = obj_location_to_handle(next_page, 0);
>>               kunmap_atomic(link);
>>               page = next_page;
>> -             off = (off + class->size) % PAGE_SIZE;
>> +             off %= PAGE_SIZE;
>>       }
>>  }
>>
>> --
>> 1.8.3.1
>>
>
> --
> Kind regards,
> Minchan Kim

* Re: [PATCH 00/10] implement zsmalloc shrinking
  2014-09-12  5:46 ` Minchan Kim
@ 2014-09-12 17:05   ` Dan Streetman
  2014-09-15  0:00     ` Minchan Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Dan Streetman @ 2014-09-12 17:05 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Linux-MM, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Mel Gorman

On Fri, Sep 12, 2014 at 1:46 AM, Minchan Kim <minchan@kernel.org> wrote:
> On Thu, Sep 11, 2014 at 04:53:51PM -0400, Dan Streetman wrote:
>> Now that zswap can use zsmalloc as a storage pool via zpool, it will
>> try to shrink its zsmalloc zs_pool once it reaches its max_pool_percent
>> limit.  These patches implement zsmalloc shrinking.  The way the pool is
>> shrunk is by finding a zspage and reclaiming it, by evicting each of its
>> objects that is in use.
>>
>> Without these patches zswap, and any other future user of zpool/zsmalloc
>> that attempts to shrink the zpool/zs_pool, will only get errors and will
>> be unable to shrink its zpool/zs_pool.  With the ability to shrink, zswap
>> can keep the most recent compressed pages in memory.
>>
>> Note that the design of zsmalloc makes it impossible to actually find the
>> LRU zspage, so each class and fullness group is searched in a round-robin
>> method to find the next zspage to reclaim.  Each fullness group orders its
>> zspages in LRU order, so the oldest zspage is used for each fullness group.
>>
>
> 1. Please Cc Mel, who was strongly against zswap with zsmalloc.
> 2. I don't think LRU stuff should be in the allocator layer.  Especially, it's really
>    hard to make that work well in the zsmalloc design.

I didn't add any LRU - the existing fullness group LRU ordering is
already there.  And yes, the zsmalloc design prevents any real LRU
ordering, beyond per-fullness-group LRU ordering.

> 3. If you want to add another writeback, make zswap writeback sane first.
>    The current implementation (zswap store -> zbud reclaim -> zswap writeback,
>    even) is really ugly.

why what's wrong with that?  how else can zbud/zsmalloc evict stored objects?

> 4. Don't make zsmalloc complicated without any data (benefit, regression).
>    I will never ack if you don't give any numbers and a real use case.

ok, i'll run performance tests then, but let me know if you see any
technical problems with any of the patches before then.

thanks!

>
>> ---
>>
>> This patch set applies to linux-next.
>>
>> Dan Streetman (10):
>>   zsmalloc: fix init_zspage free obj linking
>>   zsmalloc: add fullness group list for ZS_FULL zspages
>>   zsmalloc: always update lru ordering of each zspage
>>   zsmalloc: move zspage obj freeing to separate function
>>   zsmalloc: add atomic index to find zspage to reclaim
>>   zsmalloc: add zs_ops to zs_pool
>>   zsmalloc: add obj_handle_is_free()
>>   zsmalloc: add reclaim_zspage()
>>   zsmalloc: add zs_shrink()
>>   zsmalloc: implement zs_zpool_shrink() with zs_shrink()
>>
>>  drivers/block/zram/zram_drv.c |   2 +-
>>  include/linux/zsmalloc.h      |   7 +-
>>  mm/zsmalloc.c                 | 314 +++++++++++++++++++++++++++++++++++++-----
>>  3 files changed, 290 insertions(+), 33 deletions(-)
>>
>> --
>> 1.8.3.1
>>
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 01/10] zsmalloc: fix init_zspage free obj linking
  2014-09-12 16:43     ` Dan Streetman
@ 2014-09-14 23:24       ` Minchan Kim
  2014-09-15 20:58         ` [PATCH] zsmalloc: simplify " Dan Streetman
  0 siblings, 1 reply; 23+ messages in thread
From: Minchan Kim @ 2014-09-14 23:24 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Linux-MM, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton

On Fri, Sep 12, 2014 at 12:43:22PM -0400, Dan Streetman wrote:
> On Fri, Sep 12, 2014 at 12:59 AM, Minchan Kim <minchan@kernel.org> wrote:
> > On Thu, Sep 11, 2014 at 04:53:52PM -0400, Dan Streetman wrote:
> >> When zsmalloc creates a new zspage, it initializes each object it contains
> >> with a link to the next object, so that the zspage has a singly-linked list
> >> of its free objects.  However, the logic that sets up the links is wrong,
> >> and in the case of objects that are precisely aligned with the page boundaries
> >> (e.g. a zspage with objects that are 1/2 PAGE_SIZE) the first object on the
> >> next page is skipped, due to incrementing the offset twice.  The logic can be
> >> simplified, as it doesn't need to calculate how many objects can fit on the
> >> current page; simply checking the offset for each object is enough.
> >
> > If objects are precisely aligned with the page boundary, pages_per_zspage
> > should be 1 so there is no next page.
> 
> Ah, OK.  I wonder if it should be changed anyway so it doesn't rely on
> that detail, in case that's ever changed in the future.  It's not
> obvious that the existing logic relies on it for correct operation.  And
> this simplifies the logic too.

Correct the description and resend it if you want.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/10] implement zsmalloc shrinking
  2014-09-12 17:05   ` Dan Streetman
@ 2014-09-15  0:00     ` Minchan Kim
  2014-09-15 14:29       ` Dan Streetman
  0 siblings, 1 reply; 23+ messages in thread
From: Minchan Kim @ 2014-09-15  0:00 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Linux-MM, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Mel Gorman

On Fri, Sep 12, 2014 at 01:05:11PM -0400, Dan Streetman wrote:
> On Fri, Sep 12, 2014 at 1:46 AM, Minchan Kim <minchan@kernel.org> wrote:
> > On Thu, Sep 11, 2014 at 04:53:51PM -0400, Dan Streetman wrote:
> >> Now that zswap can use zsmalloc as a storage pool via zpool, it will
> >> try to shrink its zsmalloc zs_pool once it reaches its max_pool_percent
> >> limit.  These patches implement zsmalloc shrinking.  The way the pool is
> >> shrunk is by finding a zspage and reclaiming it, by evicting each of its
> >> objects that is in use.
> >>
> >> Without these patches zswap, and any other future user of zpool/zsmalloc
> >> that attempts to shrink the zpool/zs_pool, will only get errors and will
> >> be unable to shrink its zpool/zs_pool.  With the ability to shrink, zswap
> >> can keep the most recent compressed pages in memory.
> >>
> >> Note that the design of zsmalloc makes it impossible to actually find the
> >> LRU zspage, so each class and fullness group is searched in a round-robin
> >> method to find the next zspage to reclaim.  Each fullness group orders its
> >> zspages in LRU order, so the oldest zspage is used for each fullness group.
> >>
> >
> > 1. Please Cc Mel, who was strongly against zswap with zsmalloc.
> > 2. I don't think LRU stuff should be in the allocator layer. Especially, it's
> >    really hard to make it work well in the zsmalloc design.
> 
> I didn't add any LRU - the existing fullness group LRU ordering is
> already there.  And yes, the zsmalloc design prevents any real LRU

I don't think it's an LRU for reclaiming; it's just a simple linked list for
finding a free slot.

> ordering, beyond per-fullness-group LRU ordering.

Yes.

> 
> > 3. If you want to add another writeback, make zswap writeback sane first.
> >    The current implementation (zswap store -> zbud reclaim -> zswap writeback,
> >    even) is really ugly.
> 
> Why? What's wrong with that?  How else can zbud/zsmalloc evict stored objects?

You can refer to Mel's suggestions on the zswap/zsmalloc writeback problem.

http://www.spinics.net/lists/linux-mm/msg61601.html
http://lkml.iu.edu//hypermail/linux/kernel/1304.1/04334.html

I think LRU/writeback should be in an upper layer, not in the allocator itself.
Please don't force every allocator to implement it only for zswap.
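
For example, something like the rough sketch below on the zswap side
would keep the LRU entirely above the allocator.  The names here are
only illustrative - this is not existing zswap code:

#include <linux/list.h>
#include <linux/spinlock.h>

/* Hypothetical: zswap tracks its own LRU of stored entries. */
struct zswap_lru_entry {
	unsigned long handle;		/* allocator handle for the object */
	struct list_head lru;		/* newest at head, oldest at tail */
};

static LIST_HEAD(zswap_lru);
static DEFINE_SPINLOCK(zswap_lru_lock);

static void zswap_lru_add(struct zswap_lru_entry *e)
{
	spin_lock(&zswap_lru_lock);
	list_add(&e->lru, &zswap_lru);		/* most recently stored first */
	spin_unlock(&zswap_lru_lock);
}

static struct zswap_lru_entry *zswap_lru_oldest(void)
{
	struct zswap_lru_entry *e = NULL;

	spin_lock(&zswap_lru_lock);
	if (!list_empty(&zswap_lru))
		e = list_last_entry(&zswap_lru, struct zswap_lru_entry, lru);
	spin_unlock(&zswap_lru_lock);
	return e;
}

Then the writeback policy (which entry to evict, and when) lives in
zswap, and the allocator only needs a way to free handles.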

> 
> > 4. Don't make zsmalloc complicated without any data (benefit, regression).
> >    I will never ack if you don't give any numbers and a real use case.
> 
> OK, I'll run performance tests then, but let me know if you see any
> technical problems with any of the patches before then.
> 
> thanks!
> 
> >
> >> ---
> >>
> >> This patch set applies to linux-next.
> >>
> >> Dan Streetman (10):
> >>   zsmalloc: fix init_zspage free obj linking
> >>   zsmalloc: add fullness group list for ZS_FULL zspages
> >>   zsmalloc: always update lru ordering of each zspage
> >>   zsmalloc: move zspage obj freeing to separate function
> >>   zsmalloc: add atomic index to find zspage to reclaim
> >>   zsmalloc: add zs_ops to zs_pool
> >>   zsmalloc: add obj_handle_is_free()
> >>   zsmalloc: add reclaim_zspage()
> >>   zsmalloc: add zs_shrink()
> >>   zsmalloc: implement zs_zpool_shrink() with zs_shrink()
> >>
> >>  drivers/block/zram/zram_drv.c |   2 +-
> >>  include/linux/zsmalloc.h      |   7 +-
> >>  mm/zsmalloc.c                 | 314 +++++++++++++++++++++++++++++++++++++-----
> >>  3 files changed, 290 insertions(+), 33 deletions(-)
> >>
> >> --
> >> 1.8.3.1
> >>
> >
> > --
> > Kind regards,
> > Minchan Kim
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/10] implement zsmalloc shrinking
  2014-09-15  0:00     ` Minchan Kim
@ 2014-09-15 14:29       ` Dan Streetman
  0 siblings, 0 replies; 23+ messages in thread
From: Dan Streetman @ 2014-09-15 14:29 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Linux-MM, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Mel Gorman

On Sun, Sep 14, 2014 at 8:00 PM, Minchan Kim <minchan@kernel.org> wrote:
> On Fri, Sep 12, 2014 at 01:05:11PM -0400, Dan Streetman wrote:
>> On Fri, Sep 12, 2014 at 1:46 AM, Minchan Kim <minchan@kernel.org> wrote:
>> > On Thu, Sep 11, 2014 at 04:53:51PM -0400, Dan Streetman wrote:
>> >> Now that zswap can use zsmalloc as a storage pool via zpool, it will
>> >> try to shrink its zsmalloc zs_pool once it reaches its max_pool_percent
>> >> limit.  These patches implement zsmalloc shrinking.  The way the pool is
>> >> shrunk is by finding a zspage and reclaiming it, by evicting each of its
>> >> objects that is in use.
>> >>
>> >> Without these patches zswap, and any other future user of zpool/zsmalloc
>> >> that attempts to shrink the zpool/zs_pool, will only get errors and will
>> >> be unable to shrink its zpool/zs_pool.  With the ability to shrink, zswap
>> >> can keep the most recent compressed pages in memory.
>> >>
>> >> Note that the design of zsmalloc makes it impossible to actually find the
>> >> LRU zspage, so each class and fullness group is searched in a round-robin
>> >> method to find the next zspage to reclaim.  Each fullness group orders its
>> >> zspages in LRU order, so the oldest zspage is used for each fullness group.
>> >>
>> >
>> > 1. Please Cc Mel, who was strongly against zswap with zsmalloc.
>> > 2. I don't think LRU stuff should be in the allocator layer. Especially, it's
>> >    really hard to make it work well in the zsmalloc design.
>>
>> I didn't add any LRU - the existing fullness group LRU ordering is
>> already there.  And yes, the zsmalloc design prevents any real LRU
>
> I don't think it's an LRU for reclaiming; it's just a simple linked list for
> finding a free slot.

Yes, but it does happen to be in LRU order.  However, the LRU ordering
here isn't as important as being able to free the smallest number of
pages in order to shrink the zs_pool.

>
>> ordering, beyond per-fullness-group LRU ordering.
>
> Yes.
>
>>
>> > 3. If you want to add another writeback, make zswap writeback sane first.
>> >    The current implementation (zswap store -> zbud reclaim -> zswap writeback,
>> >    even) is really ugly.
>>
>> Why? What's wrong with that?  How else can zbud/zsmalloc evict stored objects?
>
> You can refer to Mel's suggestions on the zswap/zsmalloc writeback problem.

As far as I can tell, Mel's complaint was with zswap synchronously
shrinking the pool one page at a time during store(), which will block
anything trying to write a swap page - although Mel, correct me if I'm
misunderstanding it.

That's not relevant for the reclaim->evict portion of zbud and
zsmalloc.  Once zbud or zsmalloc has been asked to shrink its pool,
it must reclaim a zbud page or zspage, respectively, and to do that
it has to clear out any in-use handles from that page.  Asking the
handle owner (zswap in this case) to evict each one is the only
option I can see.  And I don't see what is wrong with that, from the
perspective of zbud/zsmalloc.

Updating zswap to preemptively shrink its pool before it fills up is
something that could be done, but that would be entirely in zswap, and
it doesn't affect how zbud or zsmalloc work w.r.t. shrinking, reclaim,
or eviction.  It's not related to adding shrinking to zsmalloc.
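
To make the callback flow concrete, here is a rough sketch of how a
pool user registers an evict callback; in practice zswap goes through
zpool rather than calling zsmalloc directly, the evict body is just a
placeholder, and the exact signatures may differ from the series:

#include <linux/zsmalloc.h>

static int zswap_zs_evict(struct zs_pool *pool, unsigned long handle)
{
	/*
	 * Look up the zswap entry that owns 'handle', write its page
	 * back to the underlying swap device, then free the handle so
	 * reclaim_zspage() can release the whole zspage.
	 */
	return 0;	/* 0 tells the allocator this handle is now free */
}

static struct zs_ops zswap_zs_ops = {
	.evict = zswap_zs_evict,
};

zswap then asks the allocator to shrink when it hits max_pool_percent,
and the allocator calls back into .evict for each in-use object of the
zspage it picks.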

>
> http://www.spinics.net/lists/linux-mm/msg61601.html
> http://lkml.iu.edu//hypermail/linux/kernel/1304.1/04334.html
>
> I think LRU/writeback should be in an upper layer, not in the allocator itself.
> Please don't force every allocator to implement it only for zswap.

zswap could maintain an LRU list, but what it can't do is know which
of its stored pages are grouped together by the allocator.  So freeing
pages in strictly LRU order would almost certainly 1) require
evicting many more pages than the allocator would have, just to shrink
the pool by one page, and 2) increase the allocator's fragmentation,
possibly quite badly depending on how much shrinking is needed.

At the point when zswap needs to start shrinking its pool, there is
clearly a lot of memory pressure, and it doesn't make sense to evict
more pages than it needs to, nor does it make sense to increase
fragmentation in the storage pool and waste memory.

>
>>
>> > 4. Don't make zsmalloc complicated without any data (benefit, regression).
>> >    I will never ack if you don't give any numbers and a real use case.
>>
>> OK, I'll run performance tests then, but let me know if you see any
>> technical problems with any of the patches before then.
>>
>> thanks!
>>
>> >
>> >> ---
>> >>
>> >> This patch set applies to linux-next.
>> >>
>> >> Dan Streetman (10):
>> >>   zsmalloc: fix init_zspage free obj linking
>> >>   zsmalloc: add fullness group list for ZS_FULL zspages
>> >>   zsmalloc: always update lru ordering of each zspage
>> >>   zsmalloc: move zspage obj freeing to separate function
>> >>   zsmalloc: add atomic index to find zspage to reclaim
>> >>   zsmalloc: add zs_ops to zs_pool
>> >>   zsmalloc: add obj_handle_is_free()
>> >>   zsmalloc: add reclaim_zspage()
>> >>   zsmalloc: add zs_shrink()
>> >>   zsmalloc: implement zs_zpool_shrink() with zs_shrink()
>> >>
>> >>  drivers/block/zram/zram_drv.c |   2 +-
>> >>  include/linux/zsmalloc.h      |   7 +-
>> >>  mm/zsmalloc.c                 | 314 +++++++++++++++++++++++++++++++++++++-----
>> >>  3 files changed, 290 insertions(+), 33 deletions(-)
>> >>
>> >> --
>> >> 1.8.3.1
>> >>
>> >
>> > --
>> > Kind regards,
>> > Minchan Kim
>>
>
> --
> Kind regards,
> Minchan Kim
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] zsmalloc: simplify init_zspage free obj linking
  2014-09-14 23:24       ` Minchan Kim
@ 2014-09-15 20:58         ` Dan Streetman
  2014-09-16  2:41           ` Minchan Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Dan Streetman @ 2014-09-15 20:58 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton, Dan Streetman

Change zsmalloc init_zspage() logic to iterate through each object on
each of its pages, checking the offset to verify the object is on the
current page before linking it into the zspage.

The current zsmalloc init_zspage free object linking code has logic
that relies on there only being one page per zspage when PAGE_SIZE
is a multiple of class->size.  It calculates the number of objects
for the current page, and iterates through all of them plus one,
to account for the assumed partial object at the end of the page.
While this currently works, the logic can be simplified to just
link the object at each successive offset until the offset is larger
than PAGE_SIZE, which does not rely on PAGE_SIZE being a multiple of
class->size.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index c4a9157..03aa72f 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -628,7 +628,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
 	while (page) {
 		struct page *next_page;
 		struct link_free *link;
-		unsigned int i, objs_on_page;
+		unsigned int i = 1;
 
 		/*
 		 * page->index stores offset of first object starting
@@ -641,14 +641,10 @@ static void init_zspage(struct page *first_page, struct size_class *class)
 
 		link = (struct link_free *)kmap_atomic(page) +
 						off / sizeof(*link);
-		objs_on_page = (PAGE_SIZE - off) / class->size;
 
-		for (i = 1; i <= objs_on_page; i++) {
-			off += class->size;
-			if (off < PAGE_SIZE) {
-				link->next = obj_location_to_handle(page, i);
-				link += class->size / sizeof(*link);
-			}
+		while ((off += class->size) < PAGE_SIZE) {
+			link->next = obj_location_to_handle(page, i++);
+			link += class->size / sizeof(*link);
 		}
 
 		/*
@@ -660,7 +656,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
 		link->next = obj_location_to_handle(next_page, 0);
 		kunmap_atomic(link);
 		page = next_page;
-		off = (off + class->size) % PAGE_SIZE;
+		off %= PAGE_SIZE;
 	}
 }
 
-- 
1.8.3.1
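
A tiny userspace model of just the offset arithmetic in the loop
above, in case it helps review; PAGE_SIZE and the class size are
stand-ins and none of the real kmap/handle code is modeled:

#include <stdio.h>

#define PAGE_SIZE 4096UL

/*
 * Mirror of the simplified loop: every further object whose offset is
 * still below PAGE_SIZE gets linked, and the leftover offset carries
 * into the next page.
 */
static unsigned long link_one_page(unsigned long class_size, unsigned long off)
{
	unsigned int i = 1;

	while ((off += class_size) < PAGE_SIZE)
		printf("  link to obj %u at offset %lu\n", i++, off);

	return off % PAGE_SIZE;	/* first object's offset on the next page */
}

int main(void)
{
	unsigned long class_size = PAGE_SIZE / 2;	/* the page-aligned case */
	unsigned long off = 0;
	int page;

	for (page = 0; page < 2; page++) {
		printf("page %d: first obj at offset %lu\n", page, off);
		off = link_one_page(class_size, off);
	}
	return 0;
}

For the half-page class this places the first object of the second
page at offset 0 rather than skipping it, even though in the current
allocator that class happens to be a single page per zspage anyway.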


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] zsmalloc: simplify init_zspage free obj linking
  2014-09-15 20:58         ` [PATCH] zsmalloc: simplify " Dan Streetman
@ 2014-09-16  2:41           ` Minchan Kim
  0 siblings, 0 replies; 23+ messages in thread
From: Minchan Kim @ 2014-09-16  2:41 UTC (permalink / raw)
  To: Dan Streetman
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Nitin Gupta,
	Seth Jennings, Andrew Morton

On Mon, Sep 15, 2014 at 04:58:50PM -0400, Dan Streetman wrote:
> Change zsmalloc init_zspage() logic to iterate through each object on
> each of its pages, checking the offset to verify the object is on the
> current page before linking it into the zspage.
> 
> The current zsmalloc init_zspage free object linking code has logic
> that relies on there only being one page per zspage when PAGE_SIZE
> is a multiple of class->size.  It calculates the number of objects
> for the current page, and iterates through all of them plus one,
> to account for the assumed partial object at the end of the page.
> While this currently works, the logic can be simplified to just
> link the object at each successive offset until the offset is larger
> than PAGE_SIZE, which does not rely on PAGE_SIZE being a multiple of
> class->size.
> 
> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
> Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-09-16  2:40 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-11 20:53 [PATCH 00/10] implement zsmalloc shrinking Dan Streetman
2014-09-11 20:53 ` [PATCH 01/10] zsmalloc: fix init_zspage free obj linking Dan Streetman
2014-09-12  3:16   ` Seth Jennings
2014-09-12  4:59   ` Minchan Kim
2014-09-12 16:43     ` Dan Streetman
2014-09-14 23:24       ` Minchan Kim
2014-09-15 20:58         ` [PATCH] zsmalloc: simplify " Dan Streetman
2014-09-16  2:41           ` Minchan Kim
2014-09-11 20:53 ` [PATCH 02/10] zsmalloc: add fullness group list for ZS_FULL zspages Dan Streetman
2014-09-11 20:53 ` [PATCH 03/10] zsmalloc: always update lru ordering of each zspage Dan Streetman
2014-09-12  3:20   ` Seth Jennings
2014-09-11 20:53 ` [PATCH 04/10] zsmalloc: move zspage obj freeing to separate function Dan Streetman
2014-09-11 20:53 ` [PATCH 05/10] zsmalloc: add atomic index to find zspage to reclaim Dan Streetman
2014-09-11 20:53 ` [PATCH 06/10] zsmalloc: add zs_ops to zs_pool Dan Streetman
2014-09-11 20:53 ` [PATCH 07/10] zsmalloc: add obj_handle_is_free() Dan Streetman
2014-09-11 20:53 ` [PATCH 08/10] zsmalloc: add reclaim_zspage() Dan Streetman
2014-09-11 20:54 ` [PATCH 09/10] zsmalloc: add zs_shrink() Dan Streetman
2014-09-11 20:54 ` [PATCH 10/10] zsmalloc: implement zs_zpool_shrink() with zs_shrink() Dan Streetman
2014-09-12  3:14 ` [PATCH 00/10] implement zsmalloc shrinking Seth Jennings
2014-09-12  5:46 ` Minchan Kim
2014-09-12 17:05   ` Dan Streetman
2014-09-15  0:00     ` Minchan Kim
2014-09-15 14:29       ` Dan Streetman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).