linux-kernel.vger.kernel.org archive mirror
* [RFC 0/6] zsmalloc support compaction
@ 2014-12-02  2:49 Minchan Kim
  2014-12-02  2:49 ` [RFC 1/6] zsmalloc: expand size class to support sizeof(unsigned long) Minchan Kim
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Minchan Kim @ 2014-12-02  2:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Nitin Gupta, Dan Streetman,
	Seth Jennings, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi, seungho1.park, Minchan Kim

Recently, there was an issue with zsmalloc fragmentation: I got a
report from Juno that fork() was failing even though there were
plenty of free pages in the system.
His investigation revealed that zram is one of the culprits behind
the heavy fragmentation, so there was no contiguous 16K page left
for the pgd needed by fork() on ARM.

This patchset implements *basic* zsmalloc compaction support,
and zram utilizes it so the admin can run
	"echo 1 > /sys/block/zram0/compact"

Ideally, the mm migration code would be aware of zram pages and
migrate them out automatically, without the admin's manual operation,
when the system runs out of contiguous pages. However, we need more
thought before adding more hooks to migrate.c. Even if we implement
that, we still need a manual trigger mode, so I hope we can enhance
the zram migration work based on these primitive functions in the future.

I have only tested it on x86, so it needs more testing on other arches.
Additionally, I should have numbers for the zsmalloc regression
caused by the indirection layer. Unfortunately, I don't have an
ARM test machine on my desk. I will get one soon and test it.
Anyway, before further work, I'd like to hear opinions.

The patchset is based on v3.18-rc6-mmotm-2014-11-26-15-45.

Thanks.

Minchan Kim (6):
  zsmalloc: expand size class to support sizeof(unsigned long)
  zsmalloc: add indrection layer to decouple handle from object
  zsmalloc: implement reverse mapping
  zsmalloc: encode alloced mark in handle object
  zsmalloc: support compaction
  zram: support compaction

 drivers/block/zram/zram_drv.c |  24 ++
 drivers/block/zram/zram_drv.h |   1 +
 include/linux/zsmalloc.h      |   1 +
 mm/zsmalloc.c                 | 596 +++++++++++++++++++++++++++++++++++++-----
 4 files changed, 552 insertions(+), 70 deletions(-)

-- 
2.0.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC 1/6] zsmalloc: expand size class to support sizeof(unsigned long)
  2014-12-02  2:49 [RFC 0/6] zsmalloc support compaction Minchan Kim
@ 2014-12-02  2:49 ` Minchan Kim
  2014-12-02  2:49 ` [RFC 2/6] zsmalloc: add indrection layer to decouple handle from object Minchan Kim
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2014-12-02  2:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Nitin Gupta, Dan Streetman,
	Seth Jennings, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi, seungho1.park, Minchan Kim

For zsmalloc compaction, we need to decouple the handle from the
object's position. For that, we need separate memory to keep the
handle, and I want to reuse zsmalloc's existing functions to
implement the indirection layer.

That requires supporting a new size class (ie, sizeof(unsigned
long)), which is what this patch does.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2021df5eb891..a806d714924c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -100,7 +100,8 @@
  * span more than 1 page which avoids complex case of mapping 2 pages simply
  * to restore link_free pointer values.
  */
-#define ZS_ALIGN		8
+#define ZS_ALIGN		(sizeof(struct link_free))
+#define ZS_HANDLE_SIZE		(sizeof(unsigned long))
 
 /*
  * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
@@ -138,11 +139,11 @@
 #define MAX(a, b) ((a) >= (b) ? (a) : (b))
 /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
 #define ZS_MIN_ALLOC_SIZE \
-	MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
-#define ZS_MAX_ALLOC_SIZE	PAGE_SIZE
+	MAX(ZS_ALIGN, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+#define ZS_MAX_ALLOC_SIZE	(PAGE_SIZE + ZS_HANDLE_SIZE)
 
 /*
- * On systems with 4K page size, this gives 255 size classes! There is a
+ * On systems with 4K page size, this gives 257 size classes! There is a
  * trader-off here:
  *  - Large number of size classes is potentially wasteful as free page are
  *    spread across these classes
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC 2/6] zsmalloc: add indrection layer to decouple handle from object
  2014-12-02  2:49 [RFC 0/6] zsmalloc support compaction Minchan Kim
  2014-12-02  2:49 ` [RFC 1/6] zsmalloc: expand size class to support sizeof(unsigned long) Minchan Kim
@ 2014-12-02  2:49 ` Minchan Kim
  2014-12-02  2:49 ` [RFC 3/6] zsmalloc: implement reverse mapping Minchan Kim
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2014-12-02  2:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Nitin Gupta, Dan Streetman,
	Seth Jennings, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi, seungho1.park, Minchan Kim

Currently, zram's handle encodes the object's location directly,
which makes it hard to support migration/compaction.

This patch adds an indirection layer to decouple the handle from the
object's location. With it, we can prepare to support migration/
compaction to address zsmalloc's fragmentation problem. It could
also let zram use movable pages in the future.

First of all, we need an indirection layer to associate a handle
with an object, so this patch introduces the logic to support
the decoupling.

The old zsmalloc procedure is as follows:

* zs_malloc -> allocate object -> return object as handle,
because the object itself is the handle.
* zs_map_object -> get object's position from handle -> map pages
* zs_unmap_object -> get object's position from handle -> unmap pages
* zs_free -> free object, because the handle itself encodes the object's location.

The new behavior is as follows:

* zs_malloc -> allocate handle -> allocate object -> associate handle
with object -> return handle, not object.
* zs_map_object -> get object from handle via indirection layer ->
get object's position from object -> map pages
* zs_unmap_object -> get object from handle via indirection layer ->
get object's position from object -> unmap pages
* zs_free -> get object from handle via indirection layer ->
free handle -> free object

As a drawback, it increases the allocator's overhead. That overhead
has not been measured yet; we might put the indirection behind a
separate config option if it turns out to be heavy.
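
To make the new flow concrete, here is a minimal userspace sketch of the
indirection (illustrative only, not the kernel code; all names are made
up for the example). The handle becomes a word-sized slot that records
where the object currently lives, every lookup pays one extra dereference
(the overhead mentioned above), and compaction only has to rewrite the
slot when it moves the object:

#include <assert.h>
#include <stdlib.h>

/* toy "object location"; in zsmalloc this is a <PFN, obj_idx> encoding */
typedef unsigned long toy_obj_t;

/* a handle is the address of a slot that records the object's location */
typedef unsigned long *toy_handle_t;

static toy_handle_t toy_malloc(toy_obj_t obj_location)
{
        toy_handle_t handle = malloc(sizeof(*handle)); /* "handle_class" alloc */

        if (!handle)
                return NULL;
        *handle = obj_location;         /* associate handle with object */
        return handle;
}

static toy_obj_t toy_handle_to_obj(toy_handle_t handle)
{
        return *handle;                 /* the extra indirection step */
}

/* compaction moves the object and only needs to update the slot */
static void toy_relocate(toy_handle_t handle, toy_obj_t new_location)
{
        *handle = new_location;         /* user-visible handle is unchanged */
}

int main(void)
{
        toy_handle_t h = toy_malloc(0x1000);

        assert(h && toy_handle_to_obj(h) == 0x1000);
        toy_relocate(h, 0x2000);        /* object moved, handle stayed valid */
        assert(toy_handle_to_obj(h) == 0x2000);
        free(h);
        return 0;
}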

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 199 +++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 157 insertions(+), 42 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a806d714924c..5f3f9119705e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -219,6 +219,7 @@ struct link_free {
 
 struct zs_pool {
 	struct size_class **size_class;
+	struct size_class *handle_class;
 
 	gfp_t flags;	/* allocation flags used when growing pool */
 	atomic_long_t pages_allocated;
@@ -243,6 +244,11 @@ struct mapping_area {
 	enum zs_mapmode vm_mm; /* mapping mode */
 };
 
+static unsigned long __zs_malloc(struct zs_pool *pool,
+			struct size_class *class, gfp_t flags);
+static void __zs_free(struct zs_pool *pool, struct size_class *class,
+			unsigned long handle);
+
 /* zpool driver */
 
 #ifdef CONFIG_ZPOOL
@@ -458,11 +464,10 @@ static void remove_zspage(struct page *page, struct size_class *class,
  * page from the freelist of the old fullness group to that of the new
  * fullness group.
  */
-static enum fullness_group fix_fullness_group(struct zs_pool *pool,
-						struct page *page)
+static enum fullness_group fix_fullness_group(struct size_class *class,
+					struct page *page)
 {
 	int class_idx;
-	struct size_class *class;
 	enum fullness_group currfg, newfg;
 
 	BUG_ON(!is_first_page(page));
@@ -472,7 +477,6 @@ static enum fullness_group fix_fullness_group(struct zs_pool *pool,
 	if (newfg == currfg)
 		goto out;
 
-	class = pool->size_class[class_idx];
 	remove_zspage(page, class, currfg);
 	insert_zspage(page, class, newfg);
 	set_zspage_mapping(page, class_idx, newfg);
@@ -569,7 +573,7 @@ static void *obj_location_to_handle(struct page *page, unsigned long obj_idx)
  * decoded obj_idx back to its original value since it was adjusted in
  * obj_location_to_handle().
  */
-static void obj_handle_to_location(unsigned long handle, struct page **page,
+static void obj_to_location(unsigned long handle, struct page **page,
 				unsigned long *obj_idx)
 {
 	*page = pfn_to_page(handle >> OBJ_INDEX_BITS);
@@ -587,6 +591,41 @@ static unsigned long obj_idx_to_offset(struct page *page,
 	return off + obj_idx * class_size;
 }
 
+static void *handle_to_addr(struct zs_pool *pool, unsigned long handle)
+{
+	struct page *page;
+	unsigned long obj_idx, off;
+	struct size_class *class;
+
+	obj_to_location(handle, &page, &obj_idx);
+	class = pool->handle_class;
+	off = obj_idx_to_offset(page, obj_idx, class->size);
+
+	return lowmem_page_address(page) + off;
+}
+
+static unsigned long handle_to_obj(struct zs_pool *pool, unsigned long handle)
+{
+	unsigned long obj;
+	unsigned long *h_addr;
+
+	h_addr = handle_to_addr(pool, handle);
+	obj = *h_addr;
+
+	return obj;
+}
+
+static unsigned long alloc_handle(struct zs_pool *pool)
+{
+	return __zs_malloc(pool, pool->handle_class,
+			pool->flags & ~__GFP_HIGHMEM);
+}
+
+static void free_handle(struct zs_pool *pool, unsigned long handle)
+{
+	__zs_free(pool, pool->handle_class, handle);
+}
+
 static void reset_page(struct page *page)
 {
 	clear_bit(PG_private, &page->flags);
@@ -968,6 +1007,24 @@ static bool can_merge(struct size_class *prev, int size, int pages_per_zspage)
 	return true;
 }
 
+static int create_handle_class(struct zs_pool *pool, int handle_size)
+{
+	struct size_class *class;
+
+	class = kzalloc(sizeof(struct size_class), GFP_KERNEL);
+	if (!class)
+		return -ENOMEM;
+
+	class->index = 0;
+	class->size = handle_size;
+	class->pages_per_zspage = 1;
+	BUG_ON(class->pages_per_zspage != get_pages_per_zspage(handle_size));
+	spin_lock_init(&class->lock);
+	pool->handle_class = class;
+
+	return 0;
+}
+
 /**
  * zs_create_pool - Creates an allocation pool to work from.
  * @flags: allocation flags used to allocate pool metadata
@@ -989,12 +1046,13 @@ struct zs_pool *zs_create_pool(gfp_t flags)
 	if (!pool)
 		return NULL;
 
+	if (create_handle_class(pool, ZS_HANDLE_SIZE))
+		goto err;
+
 	pool->size_class = kcalloc(zs_size_classes, sizeof(struct size_class *),
 			GFP_KERNEL);
-	if (!pool->size_class) {
-		kfree(pool);
-		return NULL;
-	}
+	if (!pool->size_class)
+		goto err;
 
 	/*
 	 * Iterate reversly, because, size of size_class that we want to use
@@ -1053,6 +1111,8 @@ void zs_destroy_pool(struct zs_pool *pool)
 {
 	int i;
 
+	kfree(pool->handle_class);
+
 	for (i = 0; i < zs_size_classes; i++) {
 		int fg;
 		struct size_class *class = pool->size_class[i];
@@ -1077,36 +1137,21 @@ void zs_destroy_pool(struct zs_pool *pool)
 }
 EXPORT_SYMBOL_GPL(zs_destroy_pool);
 
-/**
- * zs_malloc - Allocate block of given size from pool.
- * @pool: pool to allocate from
- * @size: size of block to allocate
- *
- * On success, handle to the allocated object is returned,
- * otherwise 0.
- * Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail.
- */
-unsigned long zs_malloc(struct zs_pool *pool, size_t size)
+static unsigned long __zs_malloc(struct zs_pool *pool,
+			struct size_class *class, gfp_t flags)
 {
 	unsigned long obj;
 	struct link_free *link;
-	struct size_class *class;
-	void *vaddr;
-
 	struct page *first_page, *m_page;
 	unsigned long m_objidx, m_offset;
-
-	if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
-		return 0;
-
-	class = pool->size_class[get_size_class_index(size)];
+	void *vaddr;
 
 	spin_lock(&class->lock);
 	first_page = find_get_zspage(class);
 
 	if (!first_page) {
 		spin_unlock(&class->lock);
-		first_page = alloc_zspage(class, pool->flags);
+		first_page = alloc_zspage(class, flags);
 		if (unlikely(!first_page))
 			return 0;
 
@@ -1117,7 +1162,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	}
 
 	obj = (unsigned long)first_page->freelist;
-	obj_handle_to_location(obj, &m_page, &m_objidx);
+	obj_to_location(obj, &m_page, &m_objidx);
 	m_offset = obj_idx_to_offset(m_page, m_objidx, class->size);
 
 	vaddr = kmap_atomic(m_page);
@@ -1128,14 +1173,54 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 
 	first_page->inuse++;
 	/* Now move the zspage to another fullness group, if required */
-	fix_fullness_group(pool, first_page);
+	fix_fullness_group(class, first_page);
 	spin_unlock(&class->lock);
 
 	return obj;
 }
+
+/**
+ * zs_malloc - Allocate block of given size from pool.
+ * @pool: pool to allocate from
+ * @size: size of block to allocate
+ *
+ * On success, handle to the allocated object is returned,
+ * otherwise 0.
+ * Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail.
+ */
+unsigned long zs_malloc(struct zs_pool *pool, size_t size)
+{
+	unsigned long obj, handle;
+	struct size_class *class;
+	unsigned long *h_addr;
+
+	if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
+		return 0;
+
+	/* allocate handle */
+	handle = alloc_handle(pool);
+	if (!handle)
+		goto out;
+
+	/* allocate obj */
+	class = pool->size_class[get_size_class_index(size)];
+	obj = __zs_malloc(pool, class, pool->flags);
+	if (!obj) {
+		__zs_free(pool, pool->handle_class, handle);
+		handle = 0;
+		goto out;
+	}
+
+	/* associate handle with obj */
+	h_addr = handle_to_addr(pool, handle);
+	*h_addr = obj;
+out:
+	return handle;
+}
 EXPORT_SYMBOL_GPL(zs_malloc);
 
-void zs_free(struct zs_pool *pool, unsigned long obj)
+static void __zs_free(struct zs_pool *pool, struct size_class *class,
+			unsigned long handle)
 {
 	struct link_free *link;
 	struct page *first_page, *f_page;
@@ -1143,38 +1228,64 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
 	void *vaddr;
 
 	int class_idx;
-	struct size_class *class;
 	enum fullness_group fullness;
 
-	if (unlikely(!obj))
+	if (unlikely(!handle))
 		return;
 
-	obj_handle_to_location(obj, &f_page, &f_objidx);
+	obj_to_location(handle, &f_page, &f_objidx);
 	first_page = get_first_page(f_page);
 
 	get_zspage_mapping(first_page, &class_idx, &fullness);
-	class = pool->size_class[class_idx];
 	f_offset = obj_idx_to_offset(f_page, f_objidx, class->size);
 
-	spin_lock(&class->lock);
+	vaddr = kmap_atomic(f_page);
 
+	spin_lock(&class->lock);
 	/* Insert this object in containing zspage's freelist */
-	vaddr = kmap_atomic(f_page);
 	link = (struct link_free *)(vaddr + f_offset);
 	link->next = first_page->freelist;
-	kunmap_atomic(vaddr);
-	first_page->freelist = (void *)obj;
+	first_page->freelist = (void *)handle;
 
 	first_page->inuse--;
-	fullness = fix_fullness_group(pool, first_page);
+	fullness = fix_fullness_group(class, first_page);
 	spin_unlock(&class->lock);
 
+	kunmap_atomic(vaddr);
+
 	if (fullness == ZS_EMPTY) {
 		atomic_long_sub(class->pages_per_zspage,
 				&pool->pages_allocated);
 		free_zspage(first_page);
 	}
 }
+
+void zs_free(struct zs_pool *pool, unsigned long handle)
+{
+	unsigned long obj;
+	struct page *first_page, *f_page;
+	unsigned long f_objidx;
+
+	int class_idx;
+	struct size_class *class;
+	enum fullness_group fullness;
+
+	if (unlikely(!handle))
+		return;
+
+	obj = handle_to_obj(pool, handle);
+	/* free handle */
+	free_handle(pool, handle);
+
+	/* free obj */
+	obj_to_location(obj, &f_page, &f_objidx);
+	first_page = get_first_page(f_page);
+
+	get_zspage_mapping(first_page, &class_idx, &fullness);
+	class = pool->size_class[class_idx];
+
+	__zs_free(pool, class, obj);
+}
 EXPORT_SYMBOL_GPL(zs_free);
 
 /**
@@ -1195,6 +1306,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 			enum zs_mapmode mm)
 {
 	struct page *page;
+	unsigned long obj;
 	unsigned long obj_idx, off;
 
 	unsigned int class_idx;
@@ -1212,7 +1324,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	 */
 	BUG_ON(in_interrupt());
 
-	obj_handle_to_location(handle, &page, &obj_idx);
+	obj = handle_to_obj(pool, handle);
+	obj_to_location(obj, &page, &obj_idx);
 	get_zspage_mapping(get_first_page(page), &class_idx, &fg);
 	class = pool->size_class[class_idx];
 	off = obj_idx_to_offset(page, obj_idx, class->size);
@@ -1237,6 +1350,7 @@ EXPORT_SYMBOL_GPL(zs_map_object);
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 {
 	struct page *page;
+	unsigned long obj;
 	unsigned long obj_idx, off;
 
 	unsigned int class_idx;
@@ -1246,7 +1360,8 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 
 	BUG_ON(!handle);
 
-	obj_handle_to_location(handle, &page, &obj_idx);
+	obj = handle_to_obj(pool, handle);
+	obj_to_location(obj, &page, &obj_idx);
 	get_zspage_mapping(get_first_page(page), &class_idx, &fg);
 	class = pool->size_class[class_idx];
 	off = obj_idx_to_offset(page, obj_idx, class->size);
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC 3/6] zsmalloc: implement reverse mapping
  2014-12-02  2:49 [RFC 0/6] zsmalloc support compaction Minchan Kim
  2014-12-02  2:49 ` [RFC 1/6] zsmalloc: expand size class to support sizeof(unsigned long) Minchan Kim
  2014-12-02  2:49 ` [RFC 2/6] zsmalloc: add indrection layer to decouple handle from object Minchan Kim
@ 2014-12-02  2:49 ` Minchan Kim
  2014-12-02  2:49 ` [RFC 4/6] zsmalloc: encode alloced mark in handle object Minchan Kim
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2014-12-02  2:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Nitin Gupta, Dan Streetman,
	Seth Jennings, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi, seungho1.park, Minchan Kim

This patch supports reverse mapping, which gets the handle from an object.
To keep a handle per object, it allocates ZS_HANDLE_SIZE bytes more than
the size the user requested and stores the handle there.
IOW, *(address mapped by zs_map_object - ZS_HANDLE_SIZE) == handle.
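
A userspace sketch of the resulting layout (illustrative only; the helper
names are invented for the example): each allocated chunk starts with a
ZS_HANDLE_SIZE header holding the handle, the mapped address points just
past it, and stepping back by ZS_HANDLE_SIZE recovers the handle:

#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ZS_HANDLE_SIZE  (sizeof(unsigned long))

/* allocate a toy object that embeds its handle in the first word */
static void *toy_alloc(size_t payload_size, unsigned long handle)
{
        unsigned char *slot = malloc(ZS_HANDLE_SIZE + payload_size);

        if (!slot)
                return NULL;
        memcpy(slot, &handle, ZS_HANDLE_SIZE); /* handle sits in front of the data */
        return slot + ZS_HANDLE_SIZE;          /* what zs_map_object would hand out */
}

/* reverse mapping: recover the handle from the mapped payload address */
static unsigned long toy_obj_to_handle(const void *mapped)
{
        unsigned long handle;

        memcpy(&handle, (const unsigned char *)mapped - ZS_HANDLE_SIZE,
               sizeof(handle));
        return handle;
}

int main(void)
{
        void *p = toy_alloc(64, 0xabcdUL);

        assert(p && toy_obj_to_handle(p) == 0xabcdUL);
        free((unsigned char *)p - ZS_HANDLE_SIZE);
        return 0;
}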

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 55 +++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 35 insertions(+), 20 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5f3f9119705e..1eec2a539f77 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -213,8 +213,12 @@ struct size_class {
  * This must be power of 2 and less than or equal to ZS_ALIGN
  */
 struct link_free {
-	/* Handle of next free chunk (encodes <PFN, obj_idx>) */
-	void *next;
+	union {
+		/* Handle of next free chunk (encodes <PFN, obj_idx>) */
+		void *next;
+		/* Handle of object allocated to user */
+		unsigned long handle;
+	};
 };
 
 struct zs_pool {
@@ -245,7 +249,9 @@ struct mapping_area {
 };
 
 static unsigned long __zs_malloc(struct zs_pool *pool,
-			struct size_class *class, gfp_t flags);
+			struct size_class *class, gfp_t flags,
+			unsigned long handle);
+
 static void __zs_free(struct zs_pool *pool, struct size_class *class,
 			unsigned long handle);
 
@@ -618,7 +624,7 @@ static unsigned long handle_to_obj(struct zs_pool *pool, unsigned long handle)
 static unsigned long alloc_handle(struct zs_pool *pool)
 {
 	return __zs_malloc(pool, pool->handle_class,
-			pool->flags & ~__GFP_HIGHMEM);
+			pool->flags & ~__GFP_HIGHMEM, 0);
 }
 
 static void free_handle(struct zs_pool *pool, unsigned long handle)
@@ -873,18 +879,22 @@ static void __zs_unmap_object(struct mapping_area *area,
 {
 	int sizes[2];
 	void *addr;
-	char *buf = area->vm_buf;
+	char *buf;
 
 	/* no write fastpath */
 	if (area->vm_mm == ZS_MM_RO)
 		goto out;
 
-	sizes[0] = PAGE_SIZE - off;
+	/* We shouldn't overwrite handle */
+	buf = area->vm_buf + ZS_HANDLE_SIZE;
+	size -= ZS_HANDLE_SIZE;
+
+	sizes[0] = PAGE_SIZE - off - ZS_HANDLE_SIZE;
 	sizes[1] = size - sizes[0];
 
 	/* copy per-cpu buffer to object */
 	addr = kmap_atomic(pages[0]);
-	memcpy(addr + off, buf, sizes[0]);
+	memcpy(addr + off + ZS_HANDLE_SIZE, buf, sizes[0]);
 	kunmap_atomic(addr);
 	addr = kmap_atomic(pages[1]);
 	memcpy(addr, buf + sizes[0], sizes[1]);
@@ -1138,7 +1148,7 @@ void zs_destroy_pool(struct zs_pool *pool)
 EXPORT_SYMBOL_GPL(zs_destroy_pool);
 
 static unsigned long __zs_malloc(struct zs_pool *pool,
-			struct size_class *class, gfp_t flags)
+		struct size_class *class, gfp_t flags, unsigned long handle)
 {
 	unsigned long obj;
 	struct link_free *link;
@@ -1168,10 +1178,19 @@ static unsigned long __zs_malloc(struct zs_pool *pool,
 	vaddr = kmap_atomic(m_page);
 	link = (struct link_free *)vaddr + m_offset / sizeof(*link);
 	first_page->freelist = link->next;
-	memset(link, POISON_INUSE, sizeof(*link));
+	link->handle = handle;
 	kunmap_atomic(vaddr);
 
 	first_page->inuse++;
+
+	if (handle) {
+		unsigned long *h_addr;
+
+		/* associate handle with obj */
+		h_addr = handle_to_addr(pool, handle);
+		*h_addr = obj;
+	}
+
 	/* Now move the zspage to another fullness group, if required */
 	fix_fullness_group(class, first_page);
 	spin_unlock(&class->lock);
@@ -1192,9 +1211,8 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 {
 	unsigned long obj, handle;
 	struct size_class *class;
-	unsigned long *h_addr;
 
-	if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
+	if (unlikely(!size || (size + ZS_HANDLE_SIZE) > ZS_MAX_ALLOC_SIZE))
 		return 0;
 
 	/* allocate handle */
@@ -1202,18 +1220,15 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	if (!handle)
 		goto out;
 
-	/* allocate obj */
+	/* allocate obj and associate it with handle */
+	size += ZS_HANDLE_SIZE;
 	class = pool->size_class[get_size_class_index(size)];
-	obj = __zs_malloc(pool, class, pool->flags);
+	obj = __zs_malloc(pool, class, pool->flags, handle);
 	if (!obj) {
-		__zs_free(pool, pool->handle_class, handle);
+		free_handle(pool, handle);
 		handle = 0;
 		goto out;
 	}
-
-	/* associate handle with obj */
-	h_addr = handle_to_addr(pool, handle);
-	*h_addr = obj;
 out:
 	return handle;
 }
@@ -1335,7 +1350,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	if (off + class->size <= PAGE_SIZE) {
 		/* this object is contained entirely within a page */
 		area->vm_addr = kmap_atomic(page);
-		return area->vm_addr + off;
+		return area->vm_addr + off + ZS_HANDLE_SIZE;
 	}
 
 	/* this object spans two pages */
@@ -1343,7 +1358,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	pages[1] = get_next_page(page);
 	BUG_ON(!pages[1]);
 
-	return __zs_map_object(area, pages, off, class->size);
+	return __zs_map_object(area, pages, off, class->size) + ZS_HANDLE_SIZE;
 }
 EXPORT_SYMBOL_GPL(zs_map_object);
 
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC 4/6] zsmalloc: encode alloced mark in handle object
  2014-12-02  2:49 [RFC 0/6] zsmalloc support compaction Minchan Kim
                   ` (2 preceding siblings ...)
  2014-12-02  2:49 ` [RFC 3/6] zsmalloc: implement reverse mapping Minchan Kim
@ 2014-12-02  2:49 ` Minchan Kim
  2014-12-02  2:49 ` [RFC 5/6] zsmalloc: support compaction Minchan Kim
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2014-12-02  2:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Nitin Gupta, Dan Streetman,
	Seth Jennings, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi, seungho1.park, Minchan Kim

For compaction, we need to look up the in-use objects in a zspage
in order to migrate them, but there is no way to distinguish them
from free objects without walking all of the free objects via
first_page->freelist, which would be heavy.

This patch encodes an 'allocated' mark in the handle's least
significant bit so compaction can identify in-use objects at small cost.
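
A small userspace sketch of the bit layout (constants reduced to the two
used here; not the kernel code): bit 0 of the word stored in the object
slot tags the object as allocated, so a scanner only has to test one bit
instead of walking first_page->freelist:

#include <assert.h>
#include <stdio.h>

#define OBJ_ALLOC_BITS  1
#define OBJ_ALLOCATED   1UL

/* pack an index into a handle-style word, keeping bit 0 free for the tag */
static unsigned long toy_encode(unsigned long idx)
{
        return idx << OBJ_ALLOC_BITS;
}

int main(void)
{
        unsigned long handle = toy_encode(42);

        handle |= OBJ_ALLOCATED;                  /* set on allocation */
        assert(handle & OBJ_ALLOCATED);           /* the test compaction relies on */
        assert((handle >> OBJ_ALLOC_BITS) == 42); /* index is still recoverable */

        handle &= ~OBJ_ALLOCATED;                 /* cleared again when freed */
        assert(!(handle & OBJ_ALLOCATED));

        printf("ok\n");
        return 0;
}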

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 1eec2a539f77..16c40081c22e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -133,7 +133,9 @@
 #endif
 #endif
 #define _PFN_BITS		(MAX_PHYSMEM_BITS - PAGE_SHIFT)
-#define OBJ_INDEX_BITS	(BITS_PER_LONG - _PFN_BITS)
+#define OBJ_ALLOCATED	1
+#define OBJ_ALLOC_BITS	1
+#define OBJ_INDEX_BITS	(BITS_PER_LONG - _PFN_BITS - OBJ_ALLOC_BITS)
 #define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
 
 #define MAX(a, b) ((a) >= (b) ? (a) : (b))
@@ -555,9 +557,6 @@ static struct page *get_next_page(struct page *page)
 
 /*
  * Encode <page, obj_idx> as a single handle value.
- * On hardware platforms with physical memory starting at 0x0 the pfn
- * could be 0 so we ensure that the handle will never be 0 by adjusting the
- * encoded obj_idx value before encoding.
  */
 static void *obj_location_to_handle(struct page *page, unsigned long obj_idx)
 {
@@ -568,22 +567,20 @@ static void *obj_location_to_handle(struct page *page, unsigned long obj_idx)
 		return NULL;
 	}
 
-	handle = page_to_pfn(page) << OBJ_INDEX_BITS;
-	handle |= ((obj_idx + 1) & OBJ_INDEX_MASK);
+	handle = page_to_pfn(page) << (OBJ_INDEX_BITS + OBJ_ALLOC_BITS);
+	handle |= (obj_idx & OBJ_INDEX_MASK) << OBJ_ALLOC_BITS;
 
 	return (void *)handle;
 }
 
 /*
- * Decode <page, obj_idx> pair from the given object handle. We adjust the
- * decoded obj_idx back to its original value since it was adjusted in
- * obj_location_to_handle().
+ * Decode <page, obj_idx> pair from the given object handle.
  */
 static void obj_to_location(unsigned long handle, struct page **page,
 				unsigned long *obj_idx)
 {
-	*page = pfn_to_page(handle >> OBJ_INDEX_BITS);
-	*obj_idx = (handle & OBJ_INDEX_MASK) - 1;
+	*page = pfn_to_page(handle >> (OBJ_INDEX_BITS + OBJ_ALLOC_BITS));
+	*obj_idx = ((handle >> OBJ_ALLOC_BITS) & OBJ_INDEX_MASK);
 }
 
 static unsigned long obj_idx_to_offset(struct page *page,
@@ -623,8 +620,21 @@ static unsigned long handle_to_obj(struct zs_pool *pool, unsigned long handle)
 
 static unsigned long alloc_handle(struct zs_pool *pool)
 {
-	return __zs_malloc(pool, pool->handle_class,
+	unsigned long handle;
+
+	handle = __zs_malloc(pool, pool->handle_class,
 			pool->flags & ~__GFP_HIGHMEM, 0);
+	/*
+	 * OBJ_ALLOCATED marks the object allocated tag so compaction
+	 * can identify it among free objects in zspage.
+	 * In addtion, on hardware platforms with physical memory
+	 * starting at 0x0 the pfn could be 0 so it ensure that the
+	 * handle will never be 0 which means fail of allocation now.
+	 */
+	if (likely(handle))
+		handle |= OBJ_ALLOCATED;
+
+	return handle;
 }
 
 static void free_handle(struct zs_pool *pool, unsigned long handle)
@@ -1259,6 +1269,7 @@ static void __zs_free(struct zs_pool *pool, struct size_class *class,
 	spin_lock(&class->lock);
 	/* Insert this object in containing zspage's freelist */
 	link = (struct link_free *)(vaddr + f_offset);
+	link->handle &= ~OBJ_ALLOCATED;
 	link->next = first_page->freelist;
 	first_page->freelist = (void *)handle;
 
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC 5/6] zsmalloc: support compaction
  2014-12-02  2:49 [RFC 0/6] zsmalloc support compaction Minchan Kim
                   ` (3 preceding siblings ...)
  2014-12-02  2:49 ` [RFC 4/6] zsmalloc: encode alloced mark in handle object Minchan Kim
@ 2014-12-02  2:49 ` Minchan Kim
  2014-12-02  2:49 ` [RFC 6/6] zram: " Minchan Kim
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2014-12-02  2:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Nitin Gupta, Dan Streetman,
	Seth Jennings, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi, seungho1.park, Minchan Kim

This patch enables zsmalloc compaction so that users can invoke it
by calling zs_compact(pool).

The migration policy is as follows (see the sketch after this list):

1. find the objects to migrate in ZS_ALMOST_EMPTY zspages
2. find free space in ZS_ALMOST_FULL zspages; if none is found, look in ZS_ALMOST_EMPTY
3. migrate the objects found in step 1 into the free space found in step 2
4. repeat steps 1-3 for each size class
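
A toy userspace model of that policy (a sketch under simplified
assumptions: a single size class, a fixed capacity per zspage, and
"almost empty" approximated as at most half full; none of the names or
numbers below come from the patch):

#include <stdio.h>

#define NR_ZSPAGES      4
#define OBJS_PER_PAGE   8       /* assumed capacity of one zspage */

int main(void)
{
        /* per-zspage in-use object counts: a mix of sparse and dense zspages */
        int inuse[NR_ZSPAGES] = { 1, 7, 2, 6 };
        unsigned long migrated = 0;
        int src, dst;

        for (src = 0; src < NR_ZSPAGES; src++) {
                /* step 1: pick a sparsely used (almost empty) source zspage */
                if (inuse[src] == 0 || inuse[src] > OBJS_PER_PAGE / 2)
                        continue;

                /* step 2: find room in other, fuller zspages */
                for (dst = 0; dst < NR_ZSPAGES && inuse[src]; dst++) {
                        if (dst == src || inuse[dst] == 0)
                                continue;
                        /* step 3: move objects (copy data, then fix up handles) */
                        while (inuse[src] && inuse[dst] < OBJS_PER_PAGE) {
                                inuse[src]--;
                                inuse[dst]++;
                                migrated++;
                        }
                }
                if (inuse[src] == 0)
                        printf("zspage %d emptied; its pages can be freed\n", src);
        }
        printf("migrated %lu objects\n", migrated);
        return 0;
}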

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/zsmalloc.h |   1 +
 mm/zsmalloc.c            | 344 ++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 330 insertions(+), 15 deletions(-)

diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 05c214760977..04ecd3fc4283 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -47,5 +47,6 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
 
 unsigned long zs_get_total_pages(struct zs_pool *pool);
+unsigned long zs_compact(struct zs_pool *pool);
 
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 16c40081c22e..304595d97610 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -227,6 +227,7 @@ struct zs_pool {
 	struct size_class **size_class;
 	struct size_class *handle_class;
 
+	rwlock_t  migrate_lock;
 	gfp_t flags;	/* allocation flags used when growing pool */
 	atomic_long_t pages_allocated;
 };
@@ -618,6 +619,24 @@ static unsigned long handle_to_obj(struct zs_pool *pool, unsigned long handle)
 	return obj;
 }
 
+static unsigned long obj_to_handle(struct zs_pool *pool,
+				struct size_class *class, unsigned long obj)
+{
+	struct page *page;
+	unsigned long obj_idx, off;
+	unsigned long handle;
+	void *addr;
+
+	obj_to_location(obj, &page, &obj_idx);
+	off = obj_idx_to_offset(page, obj_idx, class->size);
+
+	addr = kmap_atomic(page);
+	handle = *(unsigned long *)(addr + off);
+	kunmap_atomic(addr);
+
+	return handle;
+}
+
 static unsigned long alloc_handle(struct zs_pool *pool)
 {
 	unsigned long handle;
@@ -1066,6 +1085,8 @@ struct zs_pool *zs_create_pool(gfp_t flags)
 	if (!pool)
 		return NULL;
 
+	rwlock_init(&pool->migrate_lock);
+
 	if (create_handle_class(pool, ZS_HANDLE_SIZE))
 		goto err;
 
@@ -1157,20 +1178,41 @@ void zs_destroy_pool(struct zs_pool *pool)
 }
 EXPORT_SYMBOL_GPL(zs_destroy_pool);
 
-static unsigned long __zs_malloc(struct zs_pool *pool,
-		struct size_class *class, gfp_t flags, unsigned long handle)
+static unsigned long __obj_malloc(struct page *first_page,
+		struct size_class *class, unsigned long handle)
 {
 	unsigned long obj;
 	struct link_free *link;
-	struct page *first_page, *m_page;
+	struct page *m_page;
 	unsigned long m_objidx, m_offset;
 	void *vaddr;
 
+	obj = (unsigned long)first_page->freelist;
+	obj_to_location(obj, &m_page, &m_objidx);
+	m_offset = obj_idx_to_offset(m_page, m_objidx, class->size);
+
+	vaddr = kmap_atomic(m_page);
+	link = (struct link_free *)vaddr + m_offset / sizeof(*link);
+	first_page->freelist = link->next;
+	link->handle = handle;
+	kunmap_atomic(vaddr);
+
+	first_page->inuse++;
+	return obj;
+}
+
+static unsigned long __zs_malloc(struct zs_pool *pool,
+		struct size_class *class, gfp_t flags, unsigned long handle)
+{
+	struct page *first_page;
+	unsigned long obj;
+
 	spin_lock(&class->lock);
 	first_page = find_get_zspage(class);
 
 	if (!first_page) {
 		spin_unlock(&class->lock);
+		read_unlock(&pool->migrate_lock);
 		first_page = alloc_zspage(class, flags);
 		if (unlikely(!first_page))
 			return 0;
@@ -1178,21 +1220,11 @@ static unsigned long __zs_malloc(struct zs_pool *pool,
 		set_zspage_mapping(first_page, class->index, ZS_EMPTY);
 		atomic_long_add(class->pages_per_zspage,
 					&pool->pages_allocated);
+		read_lock(&pool->migrate_lock);
 		spin_lock(&class->lock);
 	}
 
-	obj = (unsigned long)first_page->freelist;
-	obj_to_location(obj, &m_page, &m_objidx);
-	m_offset = obj_idx_to_offset(m_page, m_objidx, class->size);
-
-	vaddr = kmap_atomic(m_page);
-	link = (struct link_free *)vaddr + m_offset / sizeof(*link);
-	first_page->freelist = link->next;
-	link->handle = handle;
-	kunmap_atomic(vaddr);
-
-	first_page->inuse++;
-
+	obj = __obj_malloc(first_page, class, handle);
 	if (handle) {
 		unsigned long *h_addr;
 
@@ -1225,6 +1257,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	if (unlikely(!size || (size + ZS_HANDLE_SIZE) > ZS_MAX_ALLOC_SIZE))
 		return 0;
 
+	read_lock(&pool->migrate_lock);
 	/* allocate handle */
 	handle = alloc_handle(pool);
 	if (!handle)
@@ -1240,6 +1273,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 		goto out;
 	}
 out:
+	read_unlock(&pool->migrate_lock);
 	return handle;
 }
 EXPORT_SYMBOL_GPL(zs_malloc);
@@ -1299,6 +1333,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	if (unlikely(!handle))
 		return;
 
+	read_lock(&pool->migrate_lock);
 	obj = handle_to_obj(pool, handle);
 	/* free handle */
 	free_handle(pool, handle);
@@ -1311,6 +1346,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	class = pool->size_class[class_idx];
 
 	__zs_free(pool, class, obj);
+	read_unlock(&pool->migrate_lock);
 }
 EXPORT_SYMBOL_GPL(zs_free);
 
@@ -1343,6 +1379,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 
 	BUG_ON(!handle);
 
+	read_lock(&pool->migrate_lock);
 	/*
 	 * Because we use per-cpu mapping areas shared among the
 	 * pools/users, we can't allow mapping in interrupt context
@@ -1405,6 +1442,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 		__zs_unmap_object(area, pages, off, class->size);
 	}
 	put_cpu_var(zs_map_area);
+	read_unlock(&pool->migrate_lock);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
@@ -1414,6 +1452,282 @@ unsigned long zs_get_total_pages(struct zs_pool *pool)
 }
 EXPORT_SYMBOL_GPL(zs_get_total_pages);
 
+static void zs_object_copy(unsigned long src, unsigned long dst,
+				struct size_class *class)
+{
+	struct page *s_page, *d_page;
+	unsigned long s_objidx, d_objidx;
+	unsigned long s_off, d_off;
+	void *s_addr, *d_addr;
+	int s_size, d_size, size;
+	int written = 0;
+
+	s_size = d_size = class->size;
+
+	obj_to_location(src, &s_page, &s_objidx);
+	obj_to_location(dst, &d_page, &d_objidx);
+
+	s_off = obj_idx_to_offset(s_page, s_objidx, class->size);
+	d_off = obj_idx_to_offset(d_page, d_objidx, class->size);
+
+	if (s_off + class->size > PAGE_SIZE)
+		s_size = PAGE_SIZE - s_off;
+
+	if (d_off + class->size > PAGE_SIZE)
+		d_size = PAGE_SIZE - d_off;
+
+	s_addr = kmap_atomic(s_page);
+	d_addr = kmap_atomic(d_page);
+
+	while (1) {
+		size = min(s_size, d_size);
+		memcpy(d_addr + d_off, s_addr + s_off, size);
+		written += size;
+
+		if (written == class->size)
+			break;
+
+		if (s_off + size >= PAGE_SIZE) {
+			kunmap_atomic(s_addr);
+			s_page = get_next_page(s_page);
+			BUG_ON(!s_page);
+			s_addr = kmap_atomic(s_page);
+			s_size = class->size - written;
+			s_off = 0;
+		} else {
+			s_off += size;
+			s_size -= size;
+		}
+
+		if (d_off + size >= PAGE_SIZE) {
+			kunmap_atomic(d_addr);
+			d_page = get_next_page(d_page);
+			BUG_ON(!d_page);
+			d_addr = kmap_atomic(d_page);
+			d_size = class->size - written;
+			d_off = 0;
+		} else {
+			d_off += size;
+			d_size -= size;
+		}
+	}
+
+	kunmap_atomic(s_addr);
+	kunmap_atomic(d_addr);
+}
+
+static unsigned long find_alloced_obj(struct page *page, int index,
+					struct size_class *class)
+{
+	int offset = 0;
+	unsigned long obj = 0;
+	void *addr = kmap_atomic(page);
+
+	if (!is_first_page(page))
+		offset = page->index;
+	offset += class->size * index;
+
+	while (offset < PAGE_SIZE) {
+		if (*(unsigned long *)(addr + offset) & OBJ_ALLOCATED) {
+			obj = (unsigned long)obj_location_to_handle(page,
+								index);
+			break;
+		}
+
+		offset += class->size;
+		index++;
+	}
+
+	kunmap_atomic(addr);
+	return obj;
+}
+
+struct zs_compact_control {
+	struct page *s_page; /* from page for migration */
+	int index; /* start index from @s_page for finding used object */
+	struct page *d_page; /* to page for migration */
+	unsigned long nr_migrated;
+	int nr_to_migrate;
+};
+
+static void migrate_zspage(struct zs_pool *pool, struct zs_compact_control *cc,
+				struct size_class *class)
+{
+	unsigned long used_obj, free_obj;
+	unsigned long handle;
+	struct page *s_page = cc->s_page;
+	unsigned long index = cc->index;
+	struct page *d_page = cc->d_page;
+	unsigned long *h_addr;
+	bool exit = false;
+
+	BUG_ON(!is_first_page(d_page));
+
+	while (1) {
+		used_obj = find_alloced_obj(s_page, index, class);
+		if (!used_obj) {
+			s_page = get_next_page(s_page);
+			if (!s_page)
+				break;
+			index = 0;
+			continue;
+		}
+
+		if (d_page->inuse == d_page->objects)
+			break;
+
+		free_obj = __obj_malloc(d_page, class, 0);
+
+		zs_object_copy(used_obj, free_obj, class);
+
+		obj_to_location(used_obj, &s_page, &index);
+		index++;
+
+		handle = obj_to_handle(pool, class, used_obj);
+		h_addr = handle_to_addr(pool, handle);
+		BUG_ON(*h_addr != used_obj);
+		*h_addr = free_obj;
+		cc->nr_migrated++;
+
+		/* Don't need a class->lock due to migrate_lock */
+		insert_zspage(get_first_page(s_page), class, ZS_ALMOST_EMPTY);
+
+		/*
+		 * I don't want __zs_free has return value in case of freeing
+		 * zspage for slow path so let's check page->inuse count
+		 * right before __zs_free and then exit if it is last object.
+		 */
+		if (get_first_page(s_page)->inuse == 1)
+			exit = true;
+
+		__zs_free(pool, class, used_obj);
+		if (exit)
+			break;
+
+		remove_zspage(get_first_page(s_page), class, ZS_ALMOST_EMPTY);
+	}
+
+	cc->s_page = s_page;
+	cc->index = index;
+}
+
+static struct page *alloc_target_page(struct size_class *class)
+{
+	int i;
+	struct page *page;
+
+	spin_lock(&class->lock);
+	for (i = 0; i < _ZS_NR_FULLNESS_GROUPS; i++) {
+		page = class->fullness_list[i];
+		if (page) {
+			remove_zspage(page, class, i);
+			break;
+		}
+	}
+	spin_unlock(&class->lock);
+
+	return page;
+}
+
+static void putback_target_page(struct page *page, struct size_class *class)
+{
+	int class_idx;
+	enum fullness_group currfg;
+
+	BUG_ON(!is_first_page(page));
+
+	spin_lock(&class->lock);
+	get_zspage_mapping(page, &class_idx, &currfg);
+	insert_zspage(page, class, currfg);
+	fix_fullness_group(class, page);
+	spin_unlock(&class->lock);
+}
+
+static struct page *isolate_source_page(struct size_class *class)
+{
+	struct page *page;
+
+	spin_lock(&class->lock);
+	page = class->fullness_list[ZS_ALMOST_EMPTY];
+	if (page)
+		remove_zspage(page, class, ZS_ALMOST_EMPTY);
+	spin_unlock(&class->lock);
+
+	return page;
+}
+
+static void putback_source_page(struct page *page, struct size_class *class)
+{
+	spin_lock(&class->lock);
+	insert_zspage(page, class, ZS_ALMOST_EMPTY);
+	fix_fullness_group(class, page);
+	spin_unlock(&class->lock);
+}
+
+static unsigned long __zs_compact(struct zs_pool *pool,
+				struct size_class *class)
+{
+	unsigned long nr_total_migrated = 0;
+	struct page *src_page, *dst_page;
+
+	write_lock(&pool->migrate_lock);
+	while ((src_page = isolate_source_page(class))) {
+		struct zs_compact_control cc;
+
+		BUG_ON(!is_first_page(src_page));
+
+		cc.index = 0;
+		cc.s_page = src_page;
+		cc.nr_to_migrate = src_page->inuse;
+		cc.nr_migrated = 0;
+
+		BUG_ON(0 >= cc.nr_to_migrate);
+retry:
+		dst_page = alloc_target_page(class);
+		if (!dst_page)
+			break;
+		cc.d_page = dst_page;
+
+		migrate_zspage(pool, &cc, class);
+		putback_target_page(cc.d_page, class);
+
+		if (cc.nr_migrated < cc.nr_to_migrate)
+			goto retry;
+
+		write_unlock(&pool->migrate_lock);
+		write_lock(&pool->migrate_lock);
+		nr_total_migrated += cc.nr_migrated;
+	}
+
+	if (src_page)
+		putback_source_page(src_page, class);
+
+	write_unlock(&pool->migrate_lock);
+
+	return nr_total_migrated;
+}
+
+unsigned long zs_compact(struct zs_pool *pool)
+{
+	int i;
+	unsigned long nr_migrated = 0;
+
+	for (i = 0; i < zs_size_classes; i++) {
+		struct size_class *class = pool->size_class[i];
+
+		if (!class)
+			continue;
+
+		if (class->index != i)
+			continue;
+
+		nr_migrated += __zs_compact(pool, class);
+	}
+
+	return nr_migrated;
+}
+EXPORT_SYMBOL_GPL(zs_compact);
+
 module_init(zs_init);
 module_exit(zs_exit);
 
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC 6/6] zram: support compaction
  2014-12-02  2:49 [RFC 0/6] zsmalloc support compaction Minchan Kim
                   ` (4 preceding siblings ...)
  2014-12-02  2:49 ` [RFC 5/6] zsmalloc: support compaction Minchan Kim
@ 2014-12-02  2:49 ` Minchan Kim
  2014-12-04  6:49 ` [RFC 0/6] zsmalloc " "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
  2014-12-17 23:19 ` Seth Jennings
  7 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2014-12-02  2:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Nitin Gupta, Dan Streetman,
	Seth Jennings, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi, seungho1.park, Minchan Kim

Now that zsmalloc supports compaction, zram can use it.
As a first step, this patch exports a compact knob via sysfs
so the user can trigger compaction via "echo 1 > /sys/block/zram0/compact".

Maybe we will need another knob to trigger compaction automatically
once the amount of fragmentation exceeds a given ratio, e.g.

echo "fragment_ratio" > /sys/block/zram0/compact_based_on_the_ratio

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 24 ++++++++++++++++++++++++
 drivers/block/zram/zram_drv.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 976eab6f35b9..53c110b289fc 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -247,6 +247,26 @@ static ssize_t comp_algorithm_store(struct device *dev,
 	return len;
 }
 
+static ssize_t compact_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	unsigned long nr_migrated;
+	struct zram *zram = dev_to_zram(dev);
+	struct zram_meta *meta;
+
+	down_read(&zram->init_lock);
+	if (!init_done(zram)) {
+		up_read(&zram->init_lock);
+		return -EINVAL;
+	}
+
+	meta = zram->meta;
+	nr_migrated = zs_compact(meta->mem_pool);
+	up_read(&zram->init_lock);
+	atomic64_add(nr_migrated, &zram->stats.num_migrated);
+	return len;
+}
+
 /* flag operations needs meta->tb_lock */
 static int zram_test_flag(struct zram_meta *meta, u32 index,
 			enum zram_pageflags flag)
@@ -1008,6 +1028,7 @@ static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
 		max_comp_streams_show, max_comp_streams_store);
 static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
 		comp_algorithm_show, comp_algorithm_store);
+static DEVICE_ATTR(compact, S_IWUSR, NULL, compact_store);
 
 ZRAM_ATTR_RO(num_reads);
 ZRAM_ATTR_RO(num_writes);
@@ -1017,6 +1038,7 @@ ZRAM_ATTR_RO(invalid_io);
 ZRAM_ATTR_RO(notify_free);
 ZRAM_ATTR_RO(zero_pages);
 ZRAM_ATTR_RO(compr_data_size);
+ZRAM_ATTR_RO(num_migrated);
 
 static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_disksize.attr,
@@ -1024,6 +1046,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_reset.attr,
 	&dev_attr_num_reads.attr,
 	&dev_attr_num_writes.attr,
+	&dev_attr_num_migrated.attr,
 	&dev_attr_failed_reads.attr,
 	&dev_attr_failed_writes.attr,
 	&dev_attr_invalid_io.attr,
@@ -1036,6 +1059,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_mem_used_max.attr,
 	&dev_attr_max_comp_streams.attr,
 	&dev_attr_comp_algorithm.attr,
+	&dev_attr_compact.attr,
 	NULL,
 };
 
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index b05a816b09ac..5e7a565808b9 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -84,6 +84,7 @@ struct zram_stats {
 	atomic64_t compr_data_size;	/* compressed size of pages stored */
 	atomic64_t num_reads;	/* failed + successful */
 	atomic64_t num_writes;	/* --do-- */
+	atomic64_t num_migrated;	/* no. of migrated object */
 	atomic64_t failed_reads;	/* can happen when memory is too low */
 	atomic64_t failed_writes;	/* can happen when memory is too low */
 	atomic64_t invalid_io;	/* non-page-aligned I/O requests */
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC 0/6] zsmalloc support compaction
  2014-12-02  2:49 [RFC 0/6] zsmalloc support compaction Minchan Kim
                   ` (5 preceding siblings ...)
  2014-12-02  2:49 ` [RFC 6/6] zram: " Minchan Kim
@ 2014-12-04  6:49 ` "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
  2014-12-04  7:20   ` Minchan Kim
  2014-12-04  7:21   ` "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
  2014-12-17 23:19 ` Seth Jennings
  7 siblings, 2 replies; 15+ messages in thread
From: "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)" @ 2014-12-04  6:49 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Nitin Gupta, Dan Streetman,
	Seth Jennings, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi

Hi, Minchan.

I have a question.
Can't the problem you mention be resolved by compaction?
Is there any reason that zsmalloc pages can't be moved by the compaction
run in direct reclaim?

On 2014-12-02 at 11:49 AM, Minchan Kim wrote:
> Recently, there was issue about zsmalloc fragmentation and
> I got a report from Juno that new fork failed although there
> are plenty of free pages in the system.
> His investigation revealed zram is one of the culprit to make
> heavy fragmentation so there was no more contiguous 16K page
> for pgd to fork in the ARM.
>
> This patchset implement *basic* zsmalloc compaction support
> and zram utilizes it so admin can do
> 	"echo 1 > /sys/block/zram0/compact"
>
> Actually, ideal is that mm migrate code is aware of zram pages and
> migrate them out automatically without admin's manual opeartion
> when system is out of contiguous page. Howver, we need more thinking
> before adding more hooks to migrate.c. Even though we implement it,
> we need manual trigger mode, too so I hope we could enhance
> zram migration stuff based on this primitive functions in future.
>
> I just tested it on only x86 so need more testing on other arches.
> Additionally, I should have a number for zsmalloc regression
> caused by indirect layering. Unfortunately, I don't have any
> ARM test machine on my desk. I will get it soon and test it.
> Anyway, before further work, I'd like to hear opinion.
>
> Pathset is based on v3.18-rc6-mmotm-2014-11-26-15-45.
>
> Thanks.
>
> Minchan Kim (6):
>    zsmalloc: expand size class to support sizeof(unsigned long)
>    zsmalloc: add indrection layer to decouple handle from object
>    zsmalloc: implement reverse mapping
>    zsmalloc: encode alloced mark in handle object
>    zsmalloc: support compaction
>    zram: support compaction
>
>   drivers/block/zram/zram_drv.c |  24 ++
>   drivers/block/zram/zram_drv.h |   1 +
>   include/linux/zsmalloc.h      |   1 +
>   mm/zsmalloc.c                 | 596 +++++++++++++++++++++++++++++++++++++-----
>   4 files changed, 552 insertions(+), 70 deletions(-)
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 0/6] zsmalloc support compaction
  2014-12-04  6:49 ` [RFC 0/6] zsmalloc " "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
@ 2014-12-04  7:20   ` Minchan Kim
  2014-12-04  7:29     ` "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
  2014-12-04  7:21   ` "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
  1 sibling, 1 reply; 15+ messages in thread
From: Minchan Kim @ 2014-12-04  7:20 UTC (permalink / raw)
  To: "박승호/책임연구원/SW
	Platform(연)AOT팀(seungho1.park@lge.com)"
  Cc: Andrew Morton, linux-kernel, linux-mm, Nitin Gupta,
	Dan Streetman, Seth Jennings, Sergey Senozhatsky,
	Luigi Semenzato, Jerome Marchand, juno.choi

Hey Seungho,

On Thu, Dec 04, 2014 at 03:49:21PM +0900, "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)" wrote:
> Hi, Minchan.
> 
> I have a question.
> The problem mentioned can't be resolved with compaction?
> Is there any reason that zsmalloc pages can't be moved by compaction
> operation in direct reclaim?

Currently, zsmalloc doesn't request movable pages from the page allocator
since compaction is not aware of zram pages (ie, PageZram(page)).
IOW, compaction cannot migrate zram pages at the moment.
As I described, that is the final destination we should reach, but I think
we need more thought on how to generalize the handling of such special
pages (ex, balloon, zswap, zram and so on in the future) in the compaction logic.

Now, I want to merge basic primitive functions into zsmalloc to
support zspage migration, which will be used as basic utility functions
for supporting compaction-aware zspage migration.

In addition, we still need a manual trigger to kick off compaction
for zsmalloc, like /proc/sys/vm/compact_memory.

> 
> On 2014-12-02 at 11:49 AM, Minchan Kim wrote:
> >Recently, there was issue about zsmalloc fragmentation and
> >I got a report from Juno that new fork failed although there
> >are plenty of free pages in the system.
> >His investigation revealed zram is one of the culprit to make
> >heavy fragmentation so there was no more contiguous 16K page
> >for pgd to fork in the ARM.
> >
> >This patchset implement *basic* zsmalloc compaction support
> >and zram utilizes it so admin can do
> >	"echo 1 > /sys/block/zram0/compact"
> >
> >Actually, ideal is that mm migrate code is aware of zram pages and
> >migrate them out automatically without admin's manual opeartion
> >when system is out of contiguous page. Howver, we need more thinking
> >before adding more hooks to migrate.c. Even though we implement it,
> >we need manual trigger mode, too so I hope we could enhance
> >zram migration stuff based on this primitive functions in future.
> >
> >I just tested it on only x86 so need more testing on other arches.
> >Additionally, I should have a number for zsmalloc regression
> >caused by indirect layering. Unfortunately, I don't have any
> >ARM test machine on my desk. I will get it soon and test it.
> >Anyway, before further work, I'd like to hear opinion.
> >
> >Pathset is based on v3.18-rc6-mmotm-2014-11-26-15-45.
> >
> >Thanks.
> >
> >Minchan Kim (6):
> >   zsmalloc: expand size class to support sizeof(unsigned long)
> >   zsmalloc: add indrection layer to decouple handle from object
> >   zsmalloc: implement reverse mapping
> >   zsmalloc: encode alloced mark in handle object
> >   zsmalloc: support compaction
> >   zram: support compaction
> >
> >  drivers/block/zram/zram_drv.c |  24 ++
> >  drivers/block/zram/zram_drv.h |   1 +
> >  include/linux/zsmalloc.h      |   1 +
> >  mm/zsmalloc.c                 | 596 +++++++++++++++++++++++++++++++++++++-----
> >  4 files changed, 552 insertions(+), 70 deletions(-)
> >
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 0/6] zsmalloc support compaction
  2014-12-04  6:49 ` [RFC 0/6] zsmalloc " "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
  2014-12-04  7:20   ` Minchan Kim
@ 2014-12-04  7:21   ` "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
  1 sibling, 0 replies; 15+ messages in thread
From: "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)" @ 2014-12-04  7:21 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Nitin Gupta, Dan Streetman,
	Seth Jennings, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi

Hi, Minchan.

Sorry, I got it. You made zsmalloc pages movable.
Forget it.

Seungho.

On 2014-12-04 at 3:49 PM, "박승호/책임연구원/SW Platform(연)AOT팀 
(seungho1.park@lge.com)" wrote:
> Hi, Minchan.
>
> I have a question.
> The problem mentioned can't be resolved with compaction?
> Is there any reason that zsmalloc pages can't be moved by compaction
> operation in direct reclaim?
>
> 2014-12-02 오전 11:49에 Minchan Kim 이(가) 쓴 글:
>> Recently, there was issue about zsmalloc fragmentation and
>> I got a report from Juno that new fork failed although there
>> are plenty of free pages in the system.
>> His investigation revealed zram is one of the culprit to make
>> heavy fragmentation so there was no more contiguous 16K page
>> for pgd to fork in the ARM.
>>
>> This patchset implement *basic* zsmalloc compaction support
>> and zram utilizes it so admin can do
>>     "echo 1 > /sys/block/zram0/compact"
>>
>> Actually, ideal is that mm migrate code is aware of zram pages and
>> migrate them out automatically without admin's manual opeartion
>> when system is out of contiguous page. Howver, we need more thinking
>> before adding more hooks to migrate.c. Even though we implement it,
>> we need manual trigger mode, too so I hope we could enhance
>> zram migration stuff based on this primitive functions in future.
>>
>> I just tested it on only x86 so need more testing on other arches.
>> Additionally, I should have a number for zsmalloc regression
>> caused by indirect layering. Unfortunately, I don't have any
>> ARM test machine on my desk. I will get it soon and test it.
>> Anyway, before further work, I'd like to hear opinion.
>>
>> Pathset is based on v3.18-rc6-mmotm-2014-11-26-15-45.
>>
>> Thanks.
>>
>> Minchan Kim (6):
>>    zsmalloc: expand size class to support sizeof(unsigned long)
>>    zsmalloc: add indrection layer to decouple handle from object
>>    zsmalloc: implement reverse mapping
>>    zsmalloc: encode alloced mark in handle object
>>    zsmalloc: support compaction
>>    zram: support compaction
>>
>>   drivers/block/zram/zram_drv.c |  24 ++
>>   drivers/block/zram/zram_drv.h |   1 +
>>   include/linux/zsmalloc.h      |   1 +
>>   mm/zsmalloc.c                 | 596 
>> +++++++++++++++++++++++++++++++++++++-----
>>   4 files changed, 552 insertions(+), 70 deletions(-)
>>
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 0/6] zsmalloc support compaction
  2014-12-04  7:20   ` Minchan Kim
@ 2014-12-04  7:29     ` "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
  0 siblings, 0 replies; 15+ messages in thread
From: "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)" @ 2014-12-04  7:29 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Nitin Gupta,
	Dan Streetman, Seth Jennings, Sergey Senozhatsky,
	Luigi Semenzato, Jerome Marchand, juno.choi

Hi, Minchan.

I got it. Thank you :).

Seungho.

On 2014-12-04 at 4:20 PM, Minchan Kim wrote:
> Hey Seungho,
>
> On Thu, Dec 04, 2014 at 03:49:21PM +0900, "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)" wrote:
>> Hi, Minchan.
>>
>> I have a question.
>> The problem mentioned can't be resolved with compaction?
>> Is there any reason that zsmalloc pages can't be moved by compaction
>> operation in direct reclaim?
> Currently, zsmalloc doesn't request movable page to page allocator
> since compaction is not aware of zram pages(ie, PageZram(page)).
> IOW, compaction cannot migrate zram pages at the moment.
> As I described, it's final destination we should go but I think
> we need more thinking to generalize such special pages handling
> (ex, balloon, zswap, zram and so on for future) in compaction logic.
>
> Now, I want to merge basic primitive functions in zsmalloc to
> support zspage migration, which will be used as basic utility functions
> for supporting compaction-aware zspage migration.
>
> In addition, even we need manual opeartion logic to trigger
> compaction for zsmalloc like /proc/sys/vm/compact_memory.
>
>> On 2014-12-02 at 11:49 AM, Minchan Kim wrote:
>>> Recently, there was issue about zsmalloc fragmentation and
>>> I got a report from Juno that new fork failed although there
>>> are plenty of free pages in the system.
>>> His investigation revealed zram is one of the culprit to make
>>> heavy fragmentation so there was no more contiguous 16K page
>>> for pgd to fork in the ARM.
>>>
>>> This patchset implement *basic* zsmalloc compaction support
>>> and zram utilizes it so admin can do
>>> 	"echo 1 > /sys/block/zram0/compact"
>>>
>>> Actually, ideal is that mm migrate code is aware of zram pages and
>>> migrate them out automatically without admin's manual opeartion
>>> when system is out of contiguous page. Howver, we need more thinking
>>> before adding more hooks to migrate.c. Even though we implement it,
>>> we need manual trigger mode, too so I hope we could enhance
>>> zram migration stuff based on this primitive functions in future.
>>>
>>> I just tested it on only x86 so need more testing on other arches.
>>> Additionally, I should have a number for zsmalloc regression
>>> caused by indirect layering. Unfortunately, I don't have any
>>> ARM test machine on my desk. I will get it soon and test it.
>>> Anyway, before further work, I'd like to hear opinion.
>>>
>>> Pathset is based on v3.18-rc6-mmotm-2014-11-26-15-45.
>>>
>>> Thanks.
>>>
>>> Minchan Kim (6):
>>>    zsmalloc: expand size class to support sizeof(unsigned long)
>>>    zsmalloc: add indrection layer to decouple handle from object
>>>    zsmalloc: implement reverse mapping
>>>    zsmalloc: encode alloced mark in handle object
>>>    zsmalloc: support compaction
>>>    zram: support compaction
>>>
>>>   drivers/block/zram/zram_drv.c |  24 ++
>>>   drivers/block/zram/zram_drv.h |   1 +
>>>   include/linux/zsmalloc.h      |   1 +
>>>   mm/zsmalloc.c                 | 596 +++++++++++++++++++++++++++++++++++++-----
>>>   4 files changed, 552 insertions(+), 70 deletions(-)
>>>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 0/6] zsmalloc support compaction
  2014-12-02  2:49 [RFC 0/6] zsmalloc support compaction Minchan Kim
                   ` (6 preceding siblings ...)
  2014-12-04  6:49 ` [RFC 0/6] zsmalloc " "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
@ 2014-12-17 23:19 ` Seth Jennings
  2014-12-18  1:50   ` Ganesh Mahendran
  2014-12-19  0:46   ` Minchan Kim
  7 siblings, 2 replies; 15+ messages in thread
From: Seth Jennings @ 2014-12-17 23:19 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Nitin Gupta,
	Dan Streetman, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi, seungho1.park

On Tue, Dec 02, 2014 at 11:49:41AM +0900, Minchan Kim wrote:
> Recently, there was issue about zsmalloc fragmentation and
> I got a report from Juno that new fork failed although there
> are plenty of free pages in the system.
> His investigation revealed zram is one of the culprit to make
> heavy fragmentation so there was no more contiguous 16K page
> for pgd to fork in the ARM.
> 
> This patchset implement *basic* zsmalloc compaction support
> and zram utilizes it so admin can do
> 	"echo 1 > /sys/block/zram0/compact"
> 
> Actually, ideal is that mm migrate code is aware of zram pages and
> migrate them out automatically without admin's manual opeartion
> when system is out of contiguous page. Howver, we need more thinking
> before adding more hooks to migrate.c. Even though we implement it,
> we need manual trigger mode, too so I hope we could enhance
> zram migration stuff based on this primitive functions in future.
> 
> I just tested it on only x86 so need more testing on other arches.
> Additionally, I should have a number for zsmalloc regression
> caused by indirect layering. Unfortunately, I don't have any
> ARM test machine on my desk. I will get it soon and test it.
> Anyway, before further work, I'd like to hear opinion.
> 
> Pathset is based on v3.18-rc6-mmotm-2014-11-26-15-45.

Hey Minchan, sorry it has taken a while for me to look at this.

I have prototyped this for zbud too, and I see you face some of the same
issues, some of them much worse for zsmalloc, like the large number of
objects that must be moved to reclaim a page (with zbud, the max is 1).

I see you are using zsmalloc itself for allocating the handles.  Why not
kmalloc()?  Then you wouldn't need to track the handle_class stuff and
adjust the class sizes (just in the interest of changing only what is
needed to achieve the functionality).

I used kmalloc() but that is not without issue as the handles can be
allocated from many slabs and any slab that contains a handle can't be
freed, basically resulting in the handles themselves needing to be
compacted, which they can't be because the user handle is a pointer to
them.

One way to fix this, but it would be some amount of work, is to have the
user (zswap/zbud) provide the space for the handle to zbud/zsmalloc.
The zswap/zbud layer knows the size of the device (i.e. handle space)
and could allocate a statically sized vmalloc area for holding handles
so they don't get spread all over memory.  I haven't fully explored this
idea yet.
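
Just to make the idea concrete, an untested sketch (handle_space,
alloc_handle_space() and the bitmap-based slot search are made-up names
for illustration, not an existing API; locking is left out):

#include <linux/vmalloc.h>
#include <linux/bitmap.h>

/*
 * Illustrative only: all handles live in one fixed, vmalloc'ed area
 * sized by the user up front, so handle memory never spreads across
 * slabs and the area itself never needs compacting.
 */
struct handle_space {
	unsigned long *slots;	/* one slot per possible object */
	unsigned long *bitmap;	/* which slots are in use */
	unsigned long nr_slots;
};

static struct handle_space *alloc_handle_space(unsigned long nr_slots)
{
	struct handle_space *hs = vzalloc(sizeof(*hs));

	if (!hs)
		return NULL;
	hs->nr_slots = nr_slots;
	hs->slots = vzalloc(nr_slots * sizeof(unsigned long));
	hs->bitmap = vzalloc(BITS_TO_LONGS(nr_slots) * sizeof(unsigned long));
	if (!hs->slots || !hs->bitmap) {
		vfree(hs->slots);
		vfree(hs->bitmap);
		vfree(hs);
		return NULL;
	}
	return hs;
}

/* hand out a stable pointer into the fixed area as the "handle" */
static unsigned long *get_handle(struct handle_space *hs)
{
	unsigned long idx = find_first_zero_bit(hs->bitmap, hs->nr_slots);

	if (idx >= hs->nr_slots)
		return NULL;
	set_bit(idx, hs->bitmap);
	return &hs->slots[idx];
}

static void put_handle(struct handle_space *hs, unsigned long *handle)
{
	clear_bit(handle - hs->slots, hs->bitmap);
}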

It is pretty limiting having the user trigger the compaction. Can we
have a work item that periodically does some amount of compaction?
Maybe also have something analogous to direct reclaim that, when
zs_malloc fails to secure a new page, tries to compact to get one?
I understand this is a first step.  Maybe too much.
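
Something like this rough sketch, say (zs_compact() here is a hypothetical
"compact some amount" entry point, not part of this patchset, and the 10s
interval is arbitrary):

#include <linux/workqueue.h>
#include <linux/jiffies.h>
#include <linux/zsmalloc.h>

#define ZS_COMPACT_INTERVAL	(10 * HZ)	/* arbitrary */

static struct zs_pool *bg_pool;	/* assumed to be set at pool creation */
static struct delayed_work zs_compact_work;

static void zs_compact_workfn(struct work_struct *work)
{
	zs_compact(bg_pool);		/* hypothetical API */
	schedule_delayed_work(&zs_compact_work, ZS_COMPACT_INTERVAL);
}

static void zs_start_background_compaction(struct zs_pool *pool)
{
	bg_pool = pool;
	INIT_DELAYED_WORK(&zs_compact_work, zs_compact_workfn);
	schedule_delayed_work(&zs_compact_work, ZS_COMPACT_INTERVAL);
}

The direct-reclaim analogue would then just be a call to the same entry
point from the zs_malloc() failure path before giving up.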

Also worth pointing out that the fullness groups are very coarse.
Combining the objects from a ZS_ALMOST_EMPTY zspage and a ZS_ALMOST_FULL
zspage might not result in very tight packing.  In the worst case, the
destination zspage would be slightly over 1/4 full (see
fullness_threshold_frac).

It also seems that you start with the smallest size classes first.
Seems like if we start with the biggest first, we move fewer objects and
reclaim more pages.
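
i.e. something like this (purely illustrative; zs_nr_size_classes(),
zs_class_at() and compact_size_class() are placeholder names, not the
real zsmalloc internals):

static void zs_compact_pool(struct zs_pool *pool)
{
	int i;

	/* classes assumed indexed from smallest (0) to biggest (n - 1) */
	for (i = zs_nr_size_classes(pool) - 1; i >= 0; i--)
		compact_size_class(pool, zs_class_at(pool, i));
}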

It does add a lot of code :-/  Not sure if there is any way around that
though if we want this functionality for zsmalloc.

Seth

> 
> Thanks.
> 
> Minchan Kim (6):
>   zsmalloc: expand size class to support sizeof(unsigned long)
>   zsmalloc: add indrection layer to decouple handle from object
>   zsmalloc: implement reverse mapping
>   zsmalloc: encode alloced mark in handle object
>   zsmalloc: support compaction
>   zram: support compaction
> 
>  drivers/block/zram/zram_drv.c |  24 ++
>  drivers/block/zram/zram_drv.h |   1 +
>  include/linux/zsmalloc.h      |   1 +
>  mm/zsmalloc.c                 | 596 +++++++++++++++++++++++++++++++++++++-----
>  4 files changed, 552 insertions(+), 70 deletions(-)
> 
> -- 
> 2.0.0
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 0/6] zsmalloc support compaction
  2014-12-17 23:19 ` Seth Jennings
@ 2014-12-18  1:50   ` Ganesh Mahendran
  2014-12-19  0:46   ` Minchan Kim
  1 sibling, 0 replies; 15+ messages in thread
From: Ganesh Mahendran @ 2014-12-18  1:50 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Minchan Kim, Andrew Morton, linux-kernel, Linux-MM, Nitin Gupta,
	Dan Streetman, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi, seungho1.park

2014-12-18 7:19 GMT+08:00 Seth Jennings <sjennings@variantweb.net>:
> On Tue, Dec 02, 2014 at 11:49:41AM +0900, Minchan Kim wrote:
>> Recently, there was issue about zsmalloc fragmentation and
>> I got a report from Juno that new fork failed although there
>> are plenty of free pages in the system.
>> His investigation revealed zram is one of the culprit to make
>> heavy fragmentation so there was no more contiguous 16K page
>> for pgd to fork in the ARM.
>>
>> This patchset implement *basic* zsmalloc compaction support
>> and zram utilizes it so admin can do
>>       "echo 1 > /sys/block/zram0/compact"
>>
>> Actually, ideal is that mm migrate code is aware of zram pages and
>> migrate them out automatically without admin's manual opeartion
>> when system is out of contiguous page. Howver, we need more thinking
>> before adding more hooks to migrate.c. Even though we implement it,
>> we need manual trigger mode, too so I hope we could enhance
>> zram migration stuff based on this primitive functions in future.
>>
>> I just tested it on only x86 so need more testing on other arches.
>> Additionally, I should have a number for zsmalloc regression
>> caused by indirect layering. Unfortunately, I don't have any
>> ARM test machine on my desk. I will get it soon and test it.
>> Anyway, before further work, I'd like to hear opinion.
>>
>> Pathset is based on v3.18-rc6-mmotm-2014-11-26-15-45.
>
> Hey Minchan, sorry it has taken a while for me to look at this.
>
> I have prototyped this for zbud to and I see you face some of the same
> issues, some of them much worse for zsmalloc like large number of
> objects to move to reclaim a page (with zbud, the max is 1).
>
> I see you are using zsmalloc itself for allocating the handles.  Why not
> kmalloc()?  Then you wouldn't need to track the handle_class stuff and
> adjust the class sizes (just in the interest of changing only what is
> need to achieve the functionality).
>
> I used kmalloc() but that is not without issue as the handles can be
> allocated from many slabs and any slab that contains a handle can't be
> freed, basically resulting in the handles themselves needing to be
> compacted, which they can't be because the user handle is a pointer to
> them.
>
> One way to fix this, but it would be some amount of work, is to have the
> user (zswap/zbud) provide the space for the handle to zbud/zsmalloc.
> The zswap/zbud layer knows the size of the device (i.e. handle space)
> and could allocate a statically sized vmalloc area for holding handles
> so they don't get spread all over memory.  I haven't fully explored this
> idea yet.
>
> It is pretty limiting having the user trigger the compaction. Can we
> have a work item that periodically does some amount of compaction?
> Maybe also have something analogous to direct reclaim that, when
> zs_malloc fails to secure a new page, it will try to compact to get one?
> I understand this is a first step.  Maybe too much.

Yes, the user does not know when to do the compaction.
Actually, zsmalloc's responsibility is to keep fragmentation at a low level.
How about dynamically monitoring the fragmentation and doing the compaction
when there is too much of it?
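
As a rough sketch of what I mean (the statistics helper zs_get_used_bytes()
and zs_compact() are placeholders, and the 25% threshold is arbitrary):

static void zs_maybe_compact(struct zs_pool *pool)
{
	unsigned long total = zs_get_total_pages(pool) << PAGE_SHIFT;
	unsigned long used = zs_get_used_bytes(pool);	/* placeholder stat */

	/* compact only when more than 25% of the pool is wasted */
	if (total && (total - used) * 4 > total)
		zs_compact(pool);			/* placeholder */
}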

I am working on another patch to collect statistics of zsmalloc
objects. Maybe that will
be helpful for this.

Thanks.

>
> Also worth pointing out that the fullness groups are very coarse.
> Combining the objects from a ZS_ALMOST_EMPTY zspage and ZS_ALMOST_FULL
> zspage, might not result in very tight packing.  In the worst case, the
> destination zspage would be slightly over 1/4 full (see
> fullness_threshold_frac)
>
> It also seems that you start with the smallest size classes first.
> Seems like if we start with the biggest first, we move fewer objects and
> reclaim more pages.
>
> It does add a lot of code :-/  Not sure if there is any way around that
> though if we want this functionality for zsmalloc.
>
> Seth
>
>>
>> Thanks.
>>
>> Minchan Kim (6):
>>   zsmalloc: expand size class to support sizeof(unsigned long)
>>   zsmalloc: add indrection layer to decouple handle from object
>>   zsmalloc: implement reverse mapping
>>   zsmalloc: encode alloced mark in handle object
>>   zsmalloc: support compaction
>>   zram: support compaction
>>
>>  drivers/block/zram/zram_drv.c |  24 ++
>>  drivers/block/zram/zram_drv.h |   1 +
>>  include/linux/zsmalloc.h      |   1 +
>>  mm/zsmalloc.c                 | 596 +++++++++++++++++++++++++++++++++++++-----
>>  4 files changed, 552 insertions(+), 70 deletions(-)
>>
>> --
>> 2.0.0
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 0/6] zsmalloc support compaction
  2014-12-17 23:19 ` Seth Jennings
  2014-12-18  1:50   ` Ganesh Mahendran
@ 2014-12-19  0:46   ` Minchan Kim
  2014-12-23  2:50     ` Minchan Kim
  1 sibling, 1 reply; 15+ messages in thread
From: Minchan Kim @ 2014-12-19  0:46 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Andrew Morton, linux-kernel, linux-mm, Nitin Gupta,
	Dan Streetman, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi, seungho1.park

Hey Seth,

On Wed, Dec 17, 2014 at 05:19:30PM -0600, Seth Jennings wrote:
> On Tue, Dec 02, 2014 at 11:49:41AM +0900, Minchan Kim wrote:
> > Recently, there was issue about zsmalloc fragmentation and
> > I got a report from Juno that new fork failed although there
> > are plenty of free pages in the system.
> > His investigation revealed zram is one of the culprit to make
> > heavy fragmentation so there was no more contiguous 16K page
> > for pgd to fork in the ARM.
> > 
> > This patchset implement *basic* zsmalloc compaction support
> > and zram utilizes it so admin can do
> > 	"echo 1 > /sys/block/zram0/compact"
> > 
> > Actually, ideal is that mm migrate code is aware of zram pages and
> > migrate them out automatically without admin's manual opeartion
> > when system is out of contiguous page. Howver, we need more thinking
> > before adding more hooks to migrate.c. Even though we implement it,
> > we need manual trigger mode, too so I hope we could enhance
> > zram migration stuff based on this primitive functions in future.
> > 
> > I just tested it on only x86 so need more testing on other arches.
> > Additionally, I should have a number for zsmalloc regression
> > caused by indirect layering. Unfortunately, I don't have any
> > ARM test machine on my desk. I will get it soon and test it.
> > Anyway, before further work, I'd like to hear opinion.
> > 
> > Pathset is based on v3.18-rc6-mmotm-2014-11-26-15-45.
> 
> Hey Minchan, sorry it has taken a while for me to look at this.

It's better than forever silence. Thanks, Seth.

> 
> I have prototyped this for zbud to and I see you face some of the same
> issues, some of them much worse for zsmalloc like large number of
> objects to move to reclaim a page (with zbud, the max is 1).
> 
> I see you are using zsmalloc itself for allocating the handles.  Why not
> kmalloc()?  Then you wouldn't need to track the handle_class stuff and
> adjust the class sizes (just in the interest of changing only what is
> need to achieve the functionality).

1. kmalloc's minimum object size is 8 bytes, but 4 bytes is enough to keep
   the handle (see the small example after this list).
2. Handles can pin lots of slab pages in memory.
3. It makes accounting of zsmalloc's memory usage inaccurate.
4. Creating a handle class in zsmalloc is simple.
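
A tiny illustration of point 1 (assuming a typical SLAB/SLUB configuration
where the smallest kmalloc cache is 8 bytes):

#include <linux/slab.h>
#include <linux/kernel.h>

static void handle_alloc_demo(void)
{
	u32 *handle = kmalloc(sizeof(*handle), GFP_KERNEL);	/* ask for 4 bytes */

	if (handle) {
		/* typically prints 8: half of every handle object is wasted */
		pr_info("usable size: %zu bytes\n", ksize(handle));
		kfree(handle);
	}
}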

> 
> I used kmalloc() but that is not without issue as the handles can be
> allocated from many slabs and any slab that contains a handle can't be
> freed, basically resulting in the handles themselves needing to be
> compacted, which they can't be because the user handle is a pointer to
> them.

Sure.

> 
> One way to fix this, but it would be some amount of work, is to have the
> user (zswap/zbud) provide the space for the handle to zbud/zsmalloc.
              zram?
> The zswap/zbud layer knows the size of the device (i.e. handle space)
            zram?
> and could allocate a statically sized vmalloc area for holding handles
> so they don't get spread all over memory.  I haven't fully explored this
> idea yet.

Hmm, I don't think it's a good idea.

Don't assume that the user of the allocator knows the size in advance.
In addition, do you want to populate all of the pages to keep handles in the
vmalloc area statically? It wouldn't be huge, but it depends on the user's
setup of disksize. Another question: how do you search for an empty slot for
a new handle? In the end, we need caching logic and a small allocator for
that. IMHO, it has more cons than pros compared to my current approach.

> 
> It is pretty limiting having the user trigger the compaction. Can we

Yep, as I said, we need more policy, but in this step I want to introduce
the primitive functions so we can enhance the policy as a next step.

> have a work item that periodically does some amount of compaction?

I'm not sure periodic cleanup is a good idea. I'd like to leave the decision
to the user rather than to the allocator itself. It's enough for the allocator
to expose its current status to the user.


> Maybe also have something analogous to direct reclaim that, when
> zs_malloc fails to secure a new page, it will try to compact to get one?
> I understand this is a first step.  Maybe too much.

Yep, I want to separate that enhancement out into another patchset.

> 
> Also worth pointing out that the fullness groups are very coarse.
> Combining the objects from a ZS_ALMOST_EMPTY zspage and ZS_ALMOST_FULL
> zspage, might not result in very tight packing.  In the worst case, the
> destination zspage would be slightly over 1/4 full (see
> fullness_threshold_frac)

Good point. Actually, I had noticed that.
After all of the ZS_ALMOST_EMPTY zspages have been migrated, we might also
pick out ZS_ALMOST_FULL zspages to consider.

> 
> It also seems that you start with the smallest size classes first.
> Seems like if we start with the biggest first, we move fewer objects and
> reclaim more pages.

Good idea. I will respin.
Thanks for the comment!

> 
> It does add a lot of code :-/  Not sure if there is any way around that
> though if we want this functionality for zsmalloc.
> 
> Seth
> 
> > 
> > Thanks.
> > 
> > Minchan Kim (6):
> >   zsmalloc: expand size class to support sizeof(unsigned long)
> >   zsmalloc: add indrection layer to decouple handle from object
> >   zsmalloc: implement reverse mapping
> >   zsmalloc: encode alloced mark in handle object
> >   zsmalloc: support compaction
> >   zram: support compaction
> > 
> >  drivers/block/zram/zram_drv.c |  24 ++
> >  drivers/block/zram/zram_drv.h |   1 +
> >  include/linux/zsmalloc.h      |   1 +
> >  mm/zsmalloc.c                 | 596 +++++++++++++++++++++++++++++++++++++-----
> >  4 files changed, 552 insertions(+), 70 deletions(-)
> > 
> > -- 
> > 2.0.0
> > 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 0/6] zsmalloc support compaction
  2014-12-19  0:46   ` Minchan Kim
@ 2014-12-23  2:50     ` Minchan Kim
  0 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2014-12-23  2:50 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Andrew Morton, linux-kernel, linux-mm, Nitin Gupta,
	Dan Streetman, Sergey Senozhatsky, Luigi Semenzato,
	Jerome Marchand, juno.choi, seungho1.park

On Fri, Dec 19, 2014 at 09:46:48AM +0900, Minchan Kim wrote:
> Hey Seth,
> 
> On Wed, Dec 17, 2014 at 05:19:30PM -0600, Seth Jennings wrote:
> > On Tue, Dec 02, 2014 at 11:49:41AM +0900, Minchan Kim wrote:
> > > Recently, there was issue about zsmalloc fragmentation and
> > > I got a report from Juno that new fork failed although there
> > > are plenty of free pages in the system.
> > > His investigation revealed zram is one of the culprit to make
> > > heavy fragmentation so there was no more contiguous 16K page
> > > for pgd to fork in the ARM.
> > > 
> > > This patchset implement *basic* zsmalloc compaction support
> > > and zram utilizes it so admin can do
> > > 	"echo 1 > /sys/block/zram0/compact"
> > > 
> > > Actually, ideal is that mm migrate code is aware of zram pages and
> > > migrate them out automatically without admin's manual opeartion
> > > when system is out of contiguous page. Howver, we need more thinking
> > > before adding more hooks to migrate.c. Even though we implement it,
> > > we need manual trigger mode, too so I hope we could enhance
> > > zram migration stuff based on this primitive functions in future.
> > > 
> > > I just tested it on only x86 so need more testing on other arches.
> > > Additionally, I should have a number for zsmalloc regression
> > > caused by indirect layering. Unfortunately, I don't have any
> > > ARM test machine on my desk. I will get it soon and test it.
> > > Anyway, before further work, I'd like to hear opinion.
> > > 
> > > Pathset is based on v3.18-rc6-mmotm-2014-11-26-15-45.
> > 
> > Hey Minchan, sorry it has taken a while for me to look at this.
> 
> It's better than forever silence. Thanks, Seth.
> 
> > 
> > I have prototyped this for zbud to and I see you face some of the same
> > issues, some of them much worse for zsmalloc like large number of
> > objects to move to reclaim a page (with zbud, the max is 1).
> > 
> > I see you are using zsmalloc itself for allocating the handles.  Why not
> > kmalloc()?  Then you wouldn't need to track the handle_class stuff and
> > adjust the class sizes (just in the interest of changing only what is
> > need to achieve the functionality).
> 
> 1. kmalloc minimum size : 8 byte but 4byte is enough to keep the handle
> 2. handle can pin lots of slab pages in memory
> 3. it's inaccurate for accouting memory usage of zsmalloc
> 4. Creating handle class in zsmalloc is simple.
> 
> > 
> > I used kmalloc() but that is not without issue as the handles can be
> > allocated from many slabs and any slab that contains a handle can't be
> > freed, basically resulting in the handles themselves needing to be
> > compacted, which they can't be because the user handle is a pointer to
> > them.
> 
> Sure.

One good thing about slab is that it would remove the unnecessary operations
to translate a handle into the handle's position (page, idx), so it would be
faster, although we would waste 50% of the space on a 32-bit machine.
Okay, I will check it.
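
Roughly the difference (illustrative only; obj_handle_to_location() and the
offset math are assumptions about the indirection layer, not its actual
implementation):

#include <linux/highmem.h>

/* (a) handle from slab: the handle is a pointer, one dereference */
static unsigned long handle_to_obj_slab(unsigned long handle)
{
	return *(unsigned long *)handle;
}

/* (b) handle from zsmalloc itself: the handle encodes (page, idx),
 *     so reading it needs a lookup plus a temporary mapping */
static unsigned long handle_to_obj_zs(unsigned long handle, size_t slot_size)
{
	struct page *page;
	unsigned long idx, obj;
	void *vaddr;

	obj_handle_to_location(handle, &page, &idx);
	vaddr = kmap_atomic(page);
	obj = *(unsigned long *)(vaddr + idx * slot_size);
	kunmap_atomic(vaddr);
	return obj;
}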

Thanks, Seth.


> 
> > 
> > One way to fix this, but it would be some amount of work, is to have the
> > user (zswap/zbud) provide the space for the handle to zbud/zsmalloc.
>               zram?
> > The zswap/zbud layer knows the size of the device (i.e. handle space)
>             zram?
> > and could allocate a statically sized vmalloc area for holding handles
> > so they don't get spread all over memory.  I haven't fully explored this
> > idea yet.
> 
> Hmm, I don't think it's a good idea.
> 
> Don't make an assumption that user of allocator know the size in advance.
> In addition, you want to populate all of pages to keep handle in vmalloc
> area statiscally? It wouldn't be huge but it depends on the user's setup
> of disksize. More question: How do you search empty slot for new handle?
> At last, we need caching logic and small allocator for that.
> IMHO, it has cons than pros compared current my approach.
> 
> > 
> > It is pretty limiting having the user trigger the compaction. Can we
> 
> Yeb, As I said, we need more policy but in this step, I want to introduce
> primitive functions to enhance our policy as next step.
> 
> > have a work item that periodically does some amount of compaction?
> 
> I'm not sure periodic cleanup is good idea. I'd like to pass the decision
> to the user, rather than allocator itself. It's enough for allocator
> to expose current status to the user.
> 
> 
> > Maybe also have something analogous to direct reclaim that, when
> > zs_malloc fails to secure a new page, it will try to compact to get one?
> > I understand this is a first step.  Maybe too much.
> 
> Yeb, I want to separate enhance as another patchset.
> 
> > 
> > Also worth pointing out that the fullness groups are very coarse.
> > Combining the objects from a ZS_ALMOST_EMPTY zspage and ZS_ALMOST_FULL
> > zspage, might not result in very tight packing.  In the worst case, the
> > destination zspage would be slightly over 1/4 full (see
> > fullness_threshold_frac)
> 
> Good point. Actually, I had noticed that.
> after all of ALMOST_EMPTY zspages are done to migrate, we might peek
> out ZS_ALMOST_FULL zspages to consider.
> 
> > 
> > It also seems that you start with the smallest size classes first.
> > Seems like if we start with the biggest first, we move fewer objects and
> > reclaim more pages.
> 
> Good idea. I will respin.
> Thanks for the comment!
> 
> > 
> > It does add a lot of code :-/  Not sure if there is any way around that
> > though if we want this functionality for zsmalloc.
> > 
> > Seth
> > 
> > > 
> > > Thanks.
> > > 
> > > Minchan Kim (6):
> > >   zsmalloc: expand size class to support sizeof(unsigned long)
> > >   zsmalloc: add indrection layer to decouple handle from object
> > >   zsmalloc: implement reverse mapping
> > >   zsmalloc: encode alloced mark in handle object
> > >   zsmalloc: support compaction
> > >   zram: support compaction
> > > 
> > >  drivers/block/zram/zram_drv.c |  24 ++
> > >  drivers/block/zram/zram_drv.h |   1 +
> > >  include/linux/zsmalloc.h      |   1 +
> > >  mm/zsmalloc.c                 | 596 +++++++++++++++++++++++++++++++++++++-----
> > >  4 files changed, 552 insertions(+), 70 deletions(-)
> > > 
> > > -- 
> > > 2.0.0
> > > 
> 
> -- 
> Kind regards,
> Minchan Kim
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-12-23  2:48 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-02  2:49 [RFC 0/6] zsmalloc support compaction Minchan Kim
2014-12-02  2:49 ` [RFC 1/6] zsmalloc: expand size class to support sizeof(unsigned long) Minchan Kim
2014-12-02  2:49 ` [RFC 2/6] zsmalloc: add indrection layer to decouple handle from object Minchan Kim
2014-12-02  2:49 ` [RFC 3/6] zsmalloc: implement reverse mapping Minchan Kim
2014-12-02  2:49 ` [RFC 4/6] zsmalloc: encode alloced mark in handle object Minchan Kim
2014-12-02  2:49 ` [RFC 5/6] zsmalloc: support compaction Minchan Kim
2014-12-02  2:49 ` [RFC 6/6] zram: " Minchan Kim
2014-12-04  6:49 ` [RFC 0/6] zsmalloc " "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
2014-12-04  7:20   ` Minchan Kim
2014-12-04  7:29     ` "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
2014-12-04  7:21   ` "박승호/책임연구원/SW Platform(연)AOT팀(seungho1.park@lge.com)"
2014-12-17 23:19 ` Seth Jennings
2014-12-18  1:50   ` Ganesh Mahendran
2014-12-19  0:46   ` Minchan Kim
2014-12-23  2:50     ` Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).