All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] zsmalloc improvements
@ 2012-07-02 21:15 ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-02 21:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

This patchset removes the current x86 dependency for zsmalloc
and introduces some performance improvements in the object
mapping paths.

It was meant to be a follow-on to my previous patchest

https://lkml.org/lkml/2012/6/26/540

However, this patchset differed so much in light of new performance
information that I mostly started over.

In the past, I attempted to compare different mapping methods
via the use of zcache and frontswap.  However, the nature of those
two features makes comparing mapping method efficiency difficult
since the mapping is a very small part of the overall code path.

In an effort to get more useful statistics on the mapping speed,
I wrote a microbenchmark module named zsmapbench, designed to
measure mapping speed by calling straight into the zsmalloc
paths.

https://github.com/spartacus06/zsmapbench

This exposed an interesting and unexpected result: in all
cases that I tried, copying the objects that span pages instead
of using the page table to map them, was _always_ faster.  I could
not find a case in which the page table mapping method was faster.

zsmapbench measures the copy-based mapping at ~560 cycles for a
map/unmap operation on spanned object for both KVM guest and bare-metal,
while the page table mapping was ~1500 cycles on a VM and ~760 cycles
bare-metal.  The cycles for the copy method will vary with
allocation size, however, it is still faster even for the largest
allocation that zsmalloc supports.

The result is convenient though, as mempcy is very portable :)

This patchset replaces the x86-only page table mapping code with
copy-based mapping code. It also makes changes to optimize this
new method further.

There are no changes in arch/x86 required.

Patchset is based on greg's staging-next.

Seth Jennings (4):
  zsmalloc: remove x86 dependency
  zsmalloc: add single-page object fastpath in unmap
  zsmalloc: add details to zs_map_object boiler plate
  zsmalloc: add mapping modes

 drivers/staging/zcache/zcache-main.c     |    6 +-
 drivers/staging/zram/zram_drv.c          |    7 +-
 drivers/staging/zsmalloc/Kconfig         |    4 -
 drivers/staging/zsmalloc/zsmalloc-main.c |  124 ++++++++++++++++++++++--------
 drivers/staging/zsmalloc/zsmalloc.h      |   14 +++-
 drivers/staging/zsmalloc/zsmalloc_int.h  |    6 +-
 6 files changed, 114 insertions(+), 47 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 0/4] zsmalloc improvements
@ 2012-07-02 21:15 ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-02 21:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

This patchset removes the current x86 dependency for zsmalloc
and introduces some performance improvements in the object
mapping paths.

It was meant to be a follow-on to my previous patchest

https://lkml.org/lkml/2012/6/26/540

However, this patchset differed so much in light of new performance
information that I mostly started over.

In the past, I attempted to compare different mapping methods
via the use of zcache and frontswap.  However, the nature of those
two features makes comparing mapping method efficiency difficult
since the mapping is a very small part of the overall code path.

In an effort to get more useful statistics on the mapping speed,
I wrote a microbenchmark module named zsmapbench, designed to
measure mapping speed by calling straight into the zsmalloc
paths.

https://github.com/spartacus06/zsmapbench

This exposed an interesting and unexpected result: in all
cases that I tried, copying the objects that span pages instead
of using the page table to map them, was _always_ faster.  I could
not find a case in which the page table mapping method was faster.

zsmapbench measures the copy-based mapping at ~560 cycles for a
map/unmap operation on spanned object for both KVM guest and bare-metal,
while the page table mapping was ~1500 cycles on a VM and ~760 cycles
bare-metal.  The cycles for the copy method will vary with
allocation size, however, it is still faster even for the largest
allocation that zsmalloc supports.

The result is convenient though, as mempcy is very portable :)

This patchset replaces the x86-only page table mapping code with
copy-based mapping code. It also makes changes to optimize this
new method further.

There are no changes in arch/x86 required.

Patchset is based on greg's staging-next.

Seth Jennings (4):
  zsmalloc: remove x86 dependency
  zsmalloc: add single-page object fastpath in unmap
  zsmalloc: add details to zs_map_object boiler plate
  zsmalloc: add mapping modes

 drivers/staging/zcache/zcache-main.c     |    6 +-
 drivers/staging/zram/zram_drv.c          |    7 +-
 drivers/staging/zsmalloc/Kconfig         |    4 -
 drivers/staging/zsmalloc/zsmalloc-main.c |  124 ++++++++++++++++++++++--------
 drivers/staging/zsmalloc/zsmalloc.h      |   14 +++-
 drivers/staging/zsmalloc/zsmalloc_int.h  |    6 +-
 6 files changed, 114 insertions(+), 47 deletions(-)

-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 1/4] zsmalloc: remove x86 dependency
  2012-07-02 21:15 ` Seth Jennings
@ 2012-07-02 21:15   ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-02 21:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

This patch replaces the page table assisted object mapping
method, which has x86 dependencies, with a arch-independent
method that does a simple copy into a temporary per-cpu
buffer.

While a copy seems like it would be worse than mapping the pages,
tests demonstrate the copying is always faster and, in the case of
running inside a KVM guest, roughly 4x faster.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/staging/zsmalloc/Kconfig         |    4 --
 drivers/staging/zsmalloc/zsmalloc-main.c |   99 +++++++++++++++++++++---------
 drivers/staging/zsmalloc/zsmalloc_int.h  |    5 +-
 3 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
index a5ab720..9084565 100644
--- a/drivers/staging/zsmalloc/Kconfig
+++ b/drivers/staging/zsmalloc/Kconfig
@@ -1,9 +1,5 @@
 config ZSMALLOC
 	tristate "Memory allocator for compressed pages"
-	# X86 dependency is because of the use of __flush_tlb_one and set_pte
-	# in zsmalloc-main.c.
-	# TODO: convert these to portable functions
-	depends on X86
 	default n
 	help
 	  zsmalloc is a slab-based memory allocator designed to store
diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index 10b0d60..a7a6f22 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -470,6 +470,57 @@ static struct page *find_get_zspage(struct size_class *class)
 	return page;
 }
 
+static void zs_copy_map_object(char *buf, struct page *firstpage,
+				int off, int size)
+{
+	struct page *pages[2];
+	int sizes[2];
+	void *addr;
+
+	pages[0] = firstpage;
+	pages[1] = get_next_page(firstpage);
+	BUG_ON(!pages[1]);
+
+	sizes[0] = PAGE_SIZE - off;
+	sizes[1] = size - sizes[0];
+
+	/* disable page faults to match kmap_atomic() return conditions */
+	pagefault_disable();
+
+	/* copy object to per-cpu buffer */
+	addr = kmap_atomic(pages[0]);
+	memcpy(buf, addr + off, sizes[0]);
+	kunmap_atomic(addr);
+	addr = kmap_atomic(pages[1]);
+	memcpy(buf + sizes[0], addr, sizes[1]);
+	kunmap_atomic(addr);
+}
+
+static void zs_copy_unmap_object(char *buf, struct page *firstpage,
+				int off, int size)
+{
+	struct page *pages[2];
+	int sizes[2];
+	void *addr;
+
+	pages[0] = firstpage;
+	pages[1] = get_next_page(firstpage);
+	BUG_ON(!pages[1]);
+
+	sizes[0] = PAGE_SIZE - off;
+	sizes[1] = size - sizes[0];
+
+	/* copy per-cpu buffer to object */
+	addr = kmap_atomic(pages[0]);
+	memcpy(addr + off, buf, sizes[0]);
+	kunmap_atomic(addr);
+	addr = kmap_atomic(pages[1]);
+	memcpy(addr, buf + sizes[0], sizes[1]);
+	kunmap_atomic(addr);
+
+	/* enable page faults to match kunmap_atomic() return conditions */
+	pagefault_enable();
+}
 
 static int zs_cpu_notifier(struct notifier_block *nb, unsigned long action,
 				void *pcpu)
@@ -480,18 +531,23 @@ static int zs_cpu_notifier(struct notifier_block *nb, unsigned long action,
 	switch (action) {
 	case CPU_UP_PREPARE:
 		area = &per_cpu(zs_map_area, cpu);
-		if (area->vm)
-			break;
-		area->vm = alloc_vm_area(2 * PAGE_SIZE, area->vm_ptes);
-		if (!area->vm)
-			return notifier_from_errno(-ENOMEM);
+		/*
+		 * Make sure we don't leak memory if a cpu UP notification
+		 * and zs_init() race and both call zs_cpu_up() on the same cpu
+		 */
+		if (area->vm_buf)
+			return 0;
+		area->vm_buf = (char *)__get_free_page(GFP_KERNEL);
+		if (!area->vm_buf)
+			return -ENOMEM;
+		return 0;
 		break;
 	case CPU_DEAD:
 	case CPU_UP_CANCELED:
 		area = &per_cpu(zs_map_area, cpu);
-		if (area->vm)
-			free_vm_area(area->vm);
-		area->vm = NULL;
+		if (area->vm_buf)
+			free_page((unsigned long)area->vm_buf);
+		area->vm_buf = NULL;
 		break;
 	}
 
@@ -714,22 +770,11 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle)
 	if (off + class->size <= PAGE_SIZE) {
 		/* this object is contained entirely within a page */
 		area->vm_addr = kmap_atomic(page);
-	} else {
-		/* this object spans two pages */
-		struct page *nextp;
-
-		nextp = get_next_page(page);
-		BUG_ON(!nextp);
-
-
-		set_pte(area->vm_ptes[0], mk_pte(page, PAGE_KERNEL));
-		set_pte(area->vm_ptes[1], mk_pte(nextp, PAGE_KERNEL));
-
-		/* We pre-allocated VM area so mapping can never fail */
-		area->vm_addr = area->vm->addr;
+		return area->vm_addr + off;
 	}
 
-	return area->vm_addr + off;
+	zs_copy_map_object(area->vm_buf, page, off, class->size);
+	return area->vm_buf;
 }
 EXPORT_SYMBOL_GPL(zs_map_object);
 
@@ -751,14 +796,10 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	off = obj_idx_to_offset(page, obj_idx, class->size);
 
 	area = &__get_cpu_var(zs_map_area);
-	if (off + class->size <= PAGE_SIZE) {
+	if (off + class->size <= PAGE_SIZE)
 		kunmap_atomic(area->vm_addr);
-	} else {
-		set_pte(area->vm_ptes[0], __pte(0));
-		set_pte(area->vm_ptes[1], __pte(0));
-		__flush_tlb_one((unsigned long)area->vm_addr);
-		__flush_tlb_one((unsigned long)area->vm_addr + PAGE_SIZE);
-	}
+	else
+		zs_copy_unmap_object(area->vm_buf, page, off, class->size);
 	put_cpu_var(zs_map_area);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
diff --git a/drivers/staging/zsmalloc/zsmalloc_int.h b/drivers/staging/zsmalloc/zsmalloc_int.h
index 6fd32a9..f760dae 100644
--- a/drivers/staging/zsmalloc/zsmalloc_int.h
+++ b/drivers/staging/zsmalloc/zsmalloc_int.h
@@ -110,9 +110,8 @@ enum fullness_group {
 static const int fullness_threshold_frac = 4;
 
 struct mapping_area {
-	struct vm_struct *vm;
-	pte_t *vm_ptes[2];
-	char *vm_addr;
+	char *vm_buf; /* copy buffer for objects that span pages */
+	char *vm_addr; /* address of kmap_atomic()'ed pages */
 };
 
 struct size_class {
-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 1/4] zsmalloc: remove x86 dependency
@ 2012-07-02 21:15   ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-02 21:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

This patch replaces the page table assisted object mapping
method, which has x86 dependencies, with a arch-independent
method that does a simple copy into a temporary per-cpu
buffer.

While a copy seems like it would be worse than mapping the pages,
tests demonstrate the copying is always faster and, in the case of
running inside a KVM guest, roughly 4x faster.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/staging/zsmalloc/Kconfig         |    4 --
 drivers/staging/zsmalloc/zsmalloc-main.c |   99 +++++++++++++++++++++---------
 drivers/staging/zsmalloc/zsmalloc_int.h  |    5 +-
 3 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
index a5ab720..9084565 100644
--- a/drivers/staging/zsmalloc/Kconfig
+++ b/drivers/staging/zsmalloc/Kconfig
@@ -1,9 +1,5 @@
 config ZSMALLOC
 	tristate "Memory allocator for compressed pages"
-	# X86 dependency is because of the use of __flush_tlb_one and set_pte
-	# in zsmalloc-main.c.
-	# TODO: convert these to portable functions
-	depends on X86
 	default n
 	help
 	  zsmalloc is a slab-based memory allocator designed to store
diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index 10b0d60..a7a6f22 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -470,6 +470,57 @@ static struct page *find_get_zspage(struct size_class *class)
 	return page;
 }
 
+static void zs_copy_map_object(char *buf, struct page *firstpage,
+				int off, int size)
+{
+	struct page *pages[2];
+	int sizes[2];
+	void *addr;
+
+	pages[0] = firstpage;
+	pages[1] = get_next_page(firstpage);
+	BUG_ON(!pages[1]);
+
+	sizes[0] = PAGE_SIZE - off;
+	sizes[1] = size - sizes[0];
+
+	/* disable page faults to match kmap_atomic() return conditions */
+	pagefault_disable();
+
+	/* copy object to per-cpu buffer */
+	addr = kmap_atomic(pages[0]);
+	memcpy(buf, addr + off, sizes[0]);
+	kunmap_atomic(addr);
+	addr = kmap_atomic(pages[1]);
+	memcpy(buf + sizes[0], addr, sizes[1]);
+	kunmap_atomic(addr);
+}
+
+static void zs_copy_unmap_object(char *buf, struct page *firstpage,
+				int off, int size)
+{
+	struct page *pages[2];
+	int sizes[2];
+	void *addr;
+
+	pages[0] = firstpage;
+	pages[1] = get_next_page(firstpage);
+	BUG_ON(!pages[1]);
+
+	sizes[0] = PAGE_SIZE - off;
+	sizes[1] = size - sizes[0];
+
+	/* copy per-cpu buffer to object */
+	addr = kmap_atomic(pages[0]);
+	memcpy(addr + off, buf, sizes[0]);
+	kunmap_atomic(addr);
+	addr = kmap_atomic(pages[1]);
+	memcpy(addr, buf + sizes[0], sizes[1]);
+	kunmap_atomic(addr);
+
+	/* enable page faults to match kunmap_atomic() return conditions */
+	pagefault_enable();
+}
 
 static int zs_cpu_notifier(struct notifier_block *nb, unsigned long action,
 				void *pcpu)
@@ -480,18 +531,23 @@ static int zs_cpu_notifier(struct notifier_block *nb, unsigned long action,
 	switch (action) {
 	case CPU_UP_PREPARE:
 		area = &per_cpu(zs_map_area, cpu);
-		if (area->vm)
-			break;
-		area->vm = alloc_vm_area(2 * PAGE_SIZE, area->vm_ptes);
-		if (!area->vm)
-			return notifier_from_errno(-ENOMEM);
+		/*
+		 * Make sure we don't leak memory if a cpu UP notification
+		 * and zs_init() race and both call zs_cpu_up() on the same cpu
+		 */
+		if (area->vm_buf)
+			return 0;
+		area->vm_buf = (char *)__get_free_page(GFP_KERNEL);
+		if (!area->vm_buf)
+			return -ENOMEM;
+		return 0;
 		break;
 	case CPU_DEAD:
 	case CPU_UP_CANCELED:
 		area = &per_cpu(zs_map_area, cpu);
-		if (area->vm)
-			free_vm_area(area->vm);
-		area->vm = NULL;
+		if (area->vm_buf)
+			free_page((unsigned long)area->vm_buf);
+		area->vm_buf = NULL;
 		break;
 	}
 
@@ -714,22 +770,11 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle)
 	if (off + class->size <= PAGE_SIZE) {
 		/* this object is contained entirely within a page */
 		area->vm_addr = kmap_atomic(page);
-	} else {
-		/* this object spans two pages */
-		struct page *nextp;
-
-		nextp = get_next_page(page);
-		BUG_ON(!nextp);
-
-
-		set_pte(area->vm_ptes[0], mk_pte(page, PAGE_KERNEL));
-		set_pte(area->vm_ptes[1], mk_pte(nextp, PAGE_KERNEL));
-
-		/* We pre-allocated VM area so mapping can never fail */
-		area->vm_addr = area->vm->addr;
+		return area->vm_addr + off;
 	}
 
-	return area->vm_addr + off;
+	zs_copy_map_object(area->vm_buf, page, off, class->size);
+	return area->vm_buf;
 }
 EXPORT_SYMBOL_GPL(zs_map_object);
 
@@ -751,14 +796,10 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	off = obj_idx_to_offset(page, obj_idx, class->size);
 
 	area = &__get_cpu_var(zs_map_area);
-	if (off + class->size <= PAGE_SIZE) {
+	if (off + class->size <= PAGE_SIZE)
 		kunmap_atomic(area->vm_addr);
-	} else {
-		set_pte(area->vm_ptes[0], __pte(0));
-		set_pte(area->vm_ptes[1], __pte(0));
-		__flush_tlb_one((unsigned long)area->vm_addr);
-		__flush_tlb_one((unsigned long)area->vm_addr + PAGE_SIZE);
-	}
+	else
+		zs_copy_unmap_object(area->vm_buf, page, off, class->size);
 	put_cpu_var(zs_map_area);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
diff --git a/drivers/staging/zsmalloc/zsmalloc_int.h b/drivers/staging/zsmalloc/zsmalloc_int.h
index 6fd32a9..f760dae 100644
--- a/drivers/staging/zsmalloc/zsmalloc_int.h
+++ b/drivers/staging/zsmalloc/zsmalloc_int.h
@@ -110,9 +110,8 @@ enum fullness_group {
 static const int fullness_threshold_frac = 4;
 
 struct mapping_area {
-	struct vm_struct *vm;
-	pte_t *vm_ptes[2];
-	char *vm_addr;
+	char *vm_buf; /* copy buffer for objects that span pages */
+	char *vm_addr; /* address of kmap_atomic()'ed pages */
 };
 
 struct size_class {
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 2/4] zsmalloc: add single-page object fastpath in unmap
  2012-07-02 21:15 ` Seth Jennings
@ 2012-07-02 21:15   ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-02 21:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

Improve zs_unmap_object() performance by adding a fast path for
objects that don't span pages.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/staging/zsmalloc/zsmalloc-main.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index a7a6f22..4942d41 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -774,6 +774,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle)
 	}
 
 	zs_copy_map_object(area->vm_buf, page, off, class->size);
+	area->vm_addr = NULL;
 	return area->vm_buf;
 }
 EXPORT_SYMBOL_GPL(zs_map_object);
@@ -788,6 +789,14 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	struct size_class *class;
 	struct mapping_area *area;
 
+	area = &__get_cpu_var(zs_map_area);
+	if (area->vm_addr) {
+		/* single-page object fastpath */
+		kunmap_atomic(area->vm_addr);
+		put_cpu_var(zs_map_area);
+		return;
+	}
+
 	BUG_ON(!handle);
 
 	obj_handle_to_location(handle, &page, &obj_idx);
@@ -795,11 +804,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	class = &pool->size_class[class_idx];
 	off = obj_idx_to_offset(page, obj_idx, class->size);
 
-	area = &__get_cpu_var(zs_map_area);
-	if (off + class->size <= PAGE_SIZE)
-		kunmap_atomic(area->vm_addr);
-	else
-		zs_copy_unmap_object(area->vm_buf, page, off, class->size);
+	zs_copy_unmap_object(area->vm_buf, page, off, class->size);
 	put_cpu_var(zs_map_area);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 2/4] zsmalloc: add single-page object fastpath in unmap
@ 2012-07-02 21:15   ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-02 21:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

Improve zs_unmap_object() performance by adding a fast path for
objects that don't span pages.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/staging/zsmalloc/zsmalloc-main.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index a7a6f22..4942d41 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -774,6 +774,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle)
 	}
 
 	zs_copy_map_object(area->vm_buf, page, off, class->size);
+	area->vm_addr = NULL;
 	return area->vm_buf;
 }
 EXPORT_SYMBOL_GPL(zs_map_object);
@@ -788,6 +789,14 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	struct size_class *class;
 	struct mapping_area *area;
 
+	area = &__get_cpu_var(zs_map_area);
+	if (area->vm_addr) {
+		/* single-page object fastpath */
+		kunmap_atomic(area->vm_addr);
+		put_cpu_var(zs_map_area);
+		return;
+	}
+
 	BUG_ON(!handle);
 
 	obj_handle_to_location(handle, &page, &obj_idx);
@@ -795,11 +804,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	class = &pool->size_class[class_idx];
 	off = obj_idx_to_offset(page, obj_idx, class->size);
 
-	area = &__get_cpu_var(zs_map_area);
-	if (off + class->size <= PAGE_SIZE)
-		kunmap_atomic(area->vm_addr);
-	else
-		zs_copy_unmap_object(area->vm_buf, page, off, class->size);
+	zs_copy_unmap_object(area->vm_buf, page, off, class->size);
 	put_cpu_var(zs_map_area);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
  2012-07-02 21:15 ` Seth Jennings
@ 2012-07-02 21:15   ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-02 21:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

Add information on the usage limits of zs_map_object()

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/staging/zsmalloc/zsmalloc-main.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index 4942d41..abf7c13 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -747,7 +747,12 @@ EXPORT_SYMBOL_GPL(zs_free);
  *
  * Before using an object allocated from zs_malloc, it must be mapped using
  * this function. When done with the object, it must be unmapped using
- * zs_unmap_object
+ * zs_unmap_object.
+ *
+ * Only one object can be mapped per cpu at a time. There is no protection
+ * against nested mappings.
+ *
+ * This function returns with preemption and page faults disabled.
 */
 void *zs_map_object(struct zs_pool *pool, unsigned long handle)
 {
-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
@ 2012-07-02 21:15   ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-02 21:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

Add information on the usage limits of zs_map_object()

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/staging/zsmalloc/zsmalloc-main.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index 4942d41..abf7c13 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -747,7 +747,12 @@ EXPORT_SYMBOL_GPL(zs_free);
  *
  * Before using an object allocated from zs_malloc, it must be mapped using
  * this function. When done with the object, it must be unmapped using
- * zs_unmap_object
+ * zs_unmap_object.
+ *
+ * Only one object can be mapped per cpu at a time. There is no protection
+ * against nested mappings.
+ *
+ * This function returns with preemption and page faults disabled.
 */
 void *zs_map_object(struct zs_pool *pool, unsigned long handle)
 {
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 4/4] zsmalloc: add mapping modes
  2012-07-02 21:15 ` Seth Jennings
@ 2012-07-02 21:15   ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-02 21:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

This patch improves mapping performance in zsmalloc by getting
usage information from the user in the form of a "mapping mode"
and using it to avoid unnecessary copying for objects that span
pages.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/staging/zcache/zcache-main.c     |    6 +++---
 drivers/staging/zram/zram_drv.c          |    7 ++++---
 drivers/staging/zsmalloc/zsmalloc-main.c |   29 ++++++++++++++++++-----------
 drivers/staging/zsmalloc/zsmalloc.h      |   14 +++++++++++++-
 drivers/staging/zsmalloc/zsmalloc_int.h  |    1 +
 5 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
index c9e08bb..a8aabbe 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -708,7 +708,7 @@ static unsigned long zv_create(struct zs_pool *pool, uint32_t pool_id,
 		goto out;
 	atomic_inc(&zv_curr_dist_counts[chunks]);
 	atomic_inc(&zv_cumul_dist_counts[chunks]);
-	zv = zs_map_object(pool, handle);
+	zv = zs_map_object(pool, handle, ZS_MM_WO);
 	zv->index = index;
 	zv->oid = *oid;
 	zv->pool_id = pool_id;
@@ -727,7 +727,7 @@ static void zv_free(struct zs_pool *pool, unsigned long handle)
 	uint16_t size;
 	int chunks;
 
-	zv = zs_map_object(pool, handle);
+	zv = zs_map_object(pool, handle, ZS_MM_RW);
 	ASSERT_SENTINEL(zv, ZVH);
 	size = zv->size + sizeof(struct zv_hdr);
 	INVERT_SENTINEL(zv, ZVH);
@@ -749,7 +749,7 @@ static void zv_decompress(struct page *page, unsigned long handle)
 	int ret;
 	struct zv_hdr *zv;
 
-	zv = zs_map_object(zcache_host.zspool, handle);
+	zv = zs_map_object(zcache_host.zspool, handle, ZS_MM_RO);
 	BUG_ON(zv->size == 0);
 	ASSERT_SENTINEL(zv, ZVH);
 	to_va = kmap_atomic(page);
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 706cb62..653b074 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -220,7 +220,8 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 		uncmem = user_mem;
 	clen = PAGE_SIZE;
 
-	cmem = zs_map_object(zram->mem_pool, zram->table[index].handle);
+	cmem = zs_map_object(zram->mem_pool, zram->table[index].handle,
+				ZS_MM_RO);
 
 	ret = lzo1x_decompress_safe(cmem, zram->table[index].size,
 				    uncmem, &clen);
@@ -258,7 +259,7 @@ static int zram_read_before_write(struct zram *zram, char *mem, u32 index)
 		return 0;
 	}
 
-	cmem = zs_map_object(zram->mem_pool, handle);
+	cmem = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
 	ret = lzo1x_decompress_safe(cmem, zram->table[index].size,
 				    mem, &clen);
 	zs_unmap_object(zram->mem_pool, handle);
@@ -351,7 +352,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		ret = -ENOMEM;
 		goto out;
 	}
-	cmem = zs_map_object(zram->mem_pool, handle);
+	cmem = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
 
 	memcpy(cmem, src, clen);
 
diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index abf7c13..8b0bcb6 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -484,9 +484,6 @@ static void zs_copy_map_object(char *buf, struct page *firstpage,
 	sizes[0] = PAGE_SIZE - off;
 	sizes[1] = size - sizes[0];
 
-	/* disable page faults to match kmap_atomic() return conditions */
-	pagefault_disable();
-
 	/* copy object to per-cpu buffer */
 	addr = kmap_atomic(pages[0]);
 	memcpy(buf, addr + off, sizes[0]);
@@ -517,9 +514,6 @@ static void zs_copy_unmap_object(char *buf, struct page *firstpage,
 	addr = kmap_atomic(pages[1]);
 	memcpy(addr, buf + sizes[0], sizes[1]);
 	kunmap_atomic(addr);
-
-	/* enable page faults to match kunmap_atomic() return conditions */
-	pagefault_enable();
 }
 
 static int zs_cpu_notifier(struct notifier_block *nb, unsigned long action,
@@ -754,7 +748,8 @@ EXPORT_SYMBOL_GPL(zs_free);
  *
  * This function returns with preemption and page faults disabled.
 */
-void *zs_map_object(struct zs_pool *pool, unsigned long handle)
+void *zs_map_object(struct zs_pool *pool, unsigned long handle,
+			enum zs_mapmode mm)
 {
 	struct page *page;
 	unsigned long obj_idx, off;
@@ -778,7 +773,11 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle)
 		return area->vm_addr + off;
 	}
 
-	zs_copy_map_object(area->vm_buf, page, off, class->size);
+	/* disable page faults to match kmap_atomic() return conditions */
+	pagefault_disable();
+
+	if (mm != ZS_MM_WO)
+		zs_copy_map_object(area->vm_buf, page, off, class->size);
 	area->vm_addr = NULL;
 	return area->vm_buf;
 }
@@ -795,13 +794,16 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	struct mapping_area *area;
 
 	area = &__get_cpu_var(zs_map_area);
+	/* single-page object fastpath */
 	if (area->vm_addr) {
-		/* single-page object fastpath */
 		kunmap_atomic(area->vm_addr);
-		put_cpu_var(zs_map_area);
-		return;
+		goto out;
 	}
 
+	/* no write fastpath */
+	if (area->vm_mm == ZS_MM_RO)
+		goto pfenable;
+
 	BUG_ON(!handle);
 
 	obj_handle_to_location(handle, &page, &obj_idx);
@@ -810,6 +812,11 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	off = obj_idx_to_offset(page, obj_idx, class->size);
 
 	zs_copy_unmap_object(area->vm_buf, page, off, class->size);
+
+pfenable:
+	/* enable page faults to match kunmap_atomic() return conditions */
+	pagefault_enable();
+out:
 	put_cpu_var(zs_map_area);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
diff --git a/drivers/staging/zsmalloc/zsmalloc.h b/drivers/staging/zsmalloc/zsmalloc.h
index 485cbb1..de2e8bf 100644
--- a/drivers/staging/zsmalloc/zsmalloc.h
+++ b/drivers/staging/zsmalloc/zsmalloc.h
@@ -15,6 +15,17 @@
 
 #include <linux/types.h>
 
+/*
+ * zsmalloc mapping modes
+ *
+ * NOTE: These only make a difference when a mapped object spans pages
+*/
+enum zs_mapmode {
+	ZS_MM_RW, /* normal read-write mapping */
+	ZS_MM_RO, /* read-only (no copy-out at unmap time) */
+	ZS_MM_WO /* write-only (no copy-in at map time) */
+};
+
 struct zs_pool;
 
 struct zs_pool *zs_create_pool(const char *name, gfp_t flags);
@@ -23,7 +34,8 @@ void zs_destroy_pool(struct zs_pool *pool);
 unsigned long zs_malloc(struct zs_pool *pool, size_t size);
 void zs_free(struct zs_pool *pool, unsigned long obj);
 
-void *zs_map_object(struct zs_pool *pool, unsigned long handle);
+void *zs_map_object(struct zs_pool *pool, unsigned long handle,
+			enum zs_mapmode mm);
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
 
 u64 zs_get_total_size_bytes(struct zs_pool *pool);
diff --git a/drivers/staging/zsmalloc/zsmalloc_int.h b/drivers/staging/zsmalloc/zsmalloc_int.h
index f760dae..52805176 100644
--- a/drivers/staging/zsmalloc/zsmalloc_int.h
+++ b/drivers/staging/zsmalloc/zsmalloc_int.h
@@ -112,6 +112,7 @@ static const int fullness_threshold_frac = 4;
 struct mapping_area {
 	char *vm_buf; /* copy buffer for objects that span pages */
 	char *vm_addr; /* address of kmap_atomic()'ed pages */
+	enum zs_mapmode vm_mm; /* mapping mode */
 };
 
 struct size_class {
-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 4/4] zsmalloc: add mapping modes
@ 2012-07-02 21:15   ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-02 21:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Seth Jennings, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

This patch improves mapping performance in zsmalloc by getting
usage information from the user in the form of a "mapping mode"
and using it to avoid unnecessary copying for objects that span
pages.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
---
 drivers/staging/zcache/zcache-main.c     |    6 +++---
 drivers/staging/zram/zram_drv.c          |    7 ++++---
 drivers/staging/zsmalloc/zsmalloc-main.c |   29 ++++++++++++++++++-----------
 drivers/staging/zsmalloc/zsmalloc.h      |   14 +++++++++++++-
 drivers/staging/zsmalloc/zsmalloc_int.h  |    1 +
 5 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
index c9e08bb..a8aabbe 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -708,7 +708,7 @@ static unsigned long zv_create(struct zs_pool *pool, uint32_t pool_id,
 		goto out;
 	atomic_inc(&zv_curr_dist_counts[chunks]);
 	atomic_inc(&zv_cumul_dist_counts[chunks]);
-	zv = zs_map_object(pool, handle);
+	zv = zs_map_object(pool, handle, ZS_MM_WO);
 	zv->index = index;
 	zv->oid = *oid;
 	zv->pool_id = pool_id;
@@ -727,7 +727,7 @@ static void zv_free(struct zs_pool *pool, unsigned long handle)
 	uint16_t size;
 	int chunks;
 
-	zv = zs_map_object(pool, handle);
+	zv = zs_map_object(pool, handle, ZS_MM_RW);
 	ASSERT_SENTINEL(zv, ZVH);
 	size = zv->size + sizeof(struct zv_hdr);
 	INVERT_SENTINEL(zv, ZVH);
@@ -749,7 +749,7 @@ static void zv_decompress(struct page *page, unsigned long handle)
 	int ret;
 	struct zv_hdr *zv;
 
-	zv = zs_map_object(zcache_host.zspool, handle);
+	zv = zs_map_object(zcache_host.zspool, handle, ZS_MM_RO);
 	BUG_ON(zv->size == 0);
 	ASSERT_SENTINEL(zv, ZVH);
 	to_va = kmap_atomic(page);
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 706cb62..653b074 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -220,7 +220,8 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 		uncmem = user_mem;
 	clen = PAGE_SIZE;
 
-	cmem = zs_map_object(zram->mem_pool, zram->table[index].handle);
+	cmem = zs_map_object(zram->mem_pool, zram->table[index].handle,
+				ZS_MM_RO);
 
 	ret = lzo1x_decompress_safe(cmem, zram->table[index].size,
 				    uncmem, &clen);
@@ -258,7 +259,7 @@ static int zram_read_before_write(struct zram *zram, char *mem, u32 index)
 		return 0;
 	}
 
-	cmem = zs_map_object(zram->mem_pool, handle);
+	cmem = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
 	ret = lzo1x_decompress_safe(cmem, zram->table[index].size,
 				    mem, &clen);
 	zs_unmap_object(zram->mem_pool, handle);
@@ -351,7 +352,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		ret = -ENOMEM;
 		goto out;
 	}
-	cmem = zs_map_object(zram->mem_pool, handle);
+	cmem = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
 
 	memcpy(cmem, src, clen);
 
diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index abf7c13..8b0bcb6 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -484,9 +484,6 @@ static void zs_copy_map_object(char *buf, struct page *firstpage,
 	sizes[0] = PAGE_SIZE - off;
 	sizes[1] = size - sizes[0];
 
-	/* disable page faults to match kmap_atomic() return conditions */
-	pagefault_disable();
-
 	/* copy object to per-cpu buffer */
 	addr = kmap_atomic(pages[0]);
 	memcpy(buf, addr + off, sizes[0]);
@@ -517,9 +514,6 @@ static void zs_copy_unmap_object(char *buf, struct page *firstpage,
 	addr = kmap_atomic(pages[1]);
 	memcpy(addr, buf + sizes[0], sizes[1]);
 	kunmap_atomic(addr);
-
-	/* enable page faults to match kunmap_atomic() return conditions */
-	pagefault_enable();
 }
 
 static int zs_cpu_notifier(struct notifier_block *nb, unsigned long action,
@@ -754,7 +748,8 @@ EXPORT_SYMBOL_GPL(zs_free);
  *
  * This function returns with preemption and page faults disabled.
 */
-void *zs_map_object(struct zs_pool *pool, unsigned long handle)
+void *zs_map_object(struct zs_pool *pool, unsigned long handle,
+			enum zs_mapmode mm)
 {
 	struct page *page;
 	unsigned long obj_idx, off;
@@ -778,7 +773,11 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle)
 		return area->vm_addr + off;
 	}
 
-	zs_copy_map_object(area->vm_buf, page, off, class->size);
+	/* disable page faults to match kmap_atomic() return conditions */
+	pagefault_disable();
+
+	if (mm != ZS_MM_WO)
+		zs_copy_map_object(area->vm_buf, page, off, class->size);
 	area->vm_addr = NULL;
 	return area->vm_buf;
 }
@@ -795,13 +794,16 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	struct mapping_area *area;
 
 	area = &__get_cpu_var(zs_map_area);
+	/* single-page object fastpath */
 	if (area->vm_addr) {
-		/* single-page object fastpath */
 		kunmap_atomic(area->vm_addr);
-		put_cpu_var(zs_map_area);
-		return;
+		goto out;
 	}
 
+	/* no write fastpath */
+	if (area->vm_mm == ZS_MM_RO)
+		goto pfenable;
+
 	BUG_ON(!handle);
 
 	obj_handle_to_location(handle, &page, &obj_idx);
@@ -810,6 +812,11 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	off = obj_idx_to_offset(page, obj_idx, class->size);
 
 	zs_copy_unmap_object(area->vm_buf, page, off, class->size);
+
+pfenable:
+	/* enable page faults to match kunmap_atomic() return conditions */
+	pagefault_enable();
+out:
 	put_cpu_var(zs_map_area);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
diff --git a/drivers/staging/zsmalloc/zsmalloc.h b/drivers/staging/zsmalloc/zsmalloc.h
index 485cbb1..de2e8bf 100644
--- a/drivers/staging/zsmalloc/zsmalloc.h
+++ b/drivers/staging/zsmalloc/zsmalloc.h
@@ -15,6 +15,17 @@
 
 #include <linux/types.h>
 
+/*
+ * zsmalloc mapping modes
+ *
+ * NOTE: These only make a difference when a mapped object spans pages
+*/
+enum zs_mapmode {
+	ZS_MM_RW, /* normal read-write mapping */
+	ZS_MM_RO, /* read-only (no copy-out at unmap time) */
+	ZS_MM_WO /* write-only (no copy-in at map time) */
+};
+
 struct zs_pool;
 
 struct zs_pool *zs_create_pool(const char *name, gfp_t flags);
@@ -23,7 +34,8 @@ void zs_destroy_pool(struct zs_pool *pool);
 unsigned long zs_malloc(struct zs_pool *pool, size_t size);
 void zs_free(struct zs_pool *pool, unsigned long obj);
 
-void *zs_map_object(struct zs_pool *pool, unsigned long handle);
+void *zs_map_object(struct zs_pool *pool, unsigned long handle,
+			enum zs_mapmode mm);
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
 
 u64 zs_get_total_size_bytes(struct zs_pool *pool);
diff --git a/drivers/staging/zsmalloc/zsmalloc_int.h b/drivers/staging/zsmalloc/zsmalloc_int.h
index f760dae..52805176 100644
--- a/drivers/staging/zsmalloc/zsmalloc_int.h
+++ b/drivers/staging/zsmalloc/zsmalloc_int.h
@@ -112,6 +112,7 @@ static const int fullness_threshold_frac = 4;
 struct mapping_area {
 	char *vm_buf; /* copy buffer for objects that span pages */
 	char *vm_addr; /* address of kmap_atomic()'ed pages */
+	enum zs_mapmode vm_mm; /* mapping mode */
 };
 
 struct size_class {
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
  2012-07-02 21:15 ` Seth Jennings
@ 2012-07-04  5:33   ` Minchan Kim
  -1 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-04  5:33 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

Sorry for not reviewing yet.
By urgent works, I will review this patches on next week.

I hope others can review it.
Thanks.

On 07/03/2012 06:15 AM, Seth Jennings wrote:

> This patchset removes the current x86 dependency for zsmalloc
> and introduces some performance improvements in the object
> mapping paths.
> 
> It was meant to be a follow-on to my previous patchest
> 
> https://lkml.org/lkml/2012/6/26/540
> 
> However, this patchset differed so much in light of new performance
> information that I mostly started over.
> 
> In the past, I attempted to compare different mapping methods
> via the use of zcache and frontswap.  However, the nature of those
> two features makes comparing mapping method efficiency difficult
> since the mapping is a very small part of the overall code path.
> 
> In an effort to get more useful statistics on the mapping speed,
> I wrote a microbenchmark module named zsmapbench, designed to
> measure mapping speed by calling straight into the zsmalloc
> paths.
> 
> https://github.com/spartacus06/zsmapbench
> 
> This exposed an interesting and unexpected result: in all
> cases that I tried, copying the objects that span pages instead
> of using the page table to map them, was _always_ faster.  I could
> not find a case in which the page table mapping method was faster.
> 
> zsmapbench measures the copy-based mapping at ~560 cycles for a
> map/unmap operation on spanned object for both KVM guest and bare-metal,
> while the page table mapping was ~1500 cycles on a VM and ~760 cycles
> bare-metal.  The cycles for the copy method will vary with
> allocation size, however, it is still faster even for the largest
> allocation that zsmalloc supports.
> 
> The result is convenient though, as mempcy is very portable :)
> 
> This patchset replaces the x86-only page table mapping code with
> copy-based mapping code. It also makes changes to optimize this
> new method further.
> 
> There are no changes in arch/x86 required.
> 
> Patchset is based on greg's staging-next.
> 
> Seth Jennings (4):
>   zsmalloc: remove x86 dependency
>   zsmalloc: add single-page object fastpath in unmap
>   zsmalloc: add details to zs_map_object boiler plate
>   zsmalloc: add mapping modes
> 
>  drivers/staging/zcache/zcache-main.c     |    6 +-
>  drivers/staging/zram/zram_drv.c          |    7 +-
>  drivers/staging/zsmalloc/Kconfig         |    4 -
>  drivers/staging/zsmalloc/zsmalloc-main.c |  124 ++++++++++++++++++++++--------
>  drivers/staging/zsmalloc/zsmalloc.h      |   14 +++-
>  drivers/staging/zsmalloc/zsmalloc_int.h  |    6 +-
>  6 files changed, 114 insertions(+), 47 deletions(-)
> 



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
@ 2012-07-04  5:33   ` Minchan Kim
  0 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-04  5:33 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

Sorry for not reviewing yet.
By urgent works, I will review this patches on next week.

I hope others can review it.
Thanks.

On 07/03/2012 06:15 AM, Seth Jennings wrote:

> This patchset removes the current x86 dependency for zsmalloc
> and introduces some performance improvements in the object
> mapping paths.
> 
> It was meant to be a follow-on to my previous patchest
> 
> https://lkml.org/lkml/2012/6/26/540
> 
> However, this patchset differed so much in light of new performance
> information that I mostly started over.
> 
> In the past, I attempted to compare different mapping methods
> via the use of zcache and frontswap.  However, the nature of those
> two features makes comparing mapping method efficiency difficult
> since the mapping is a very small part of the overall code path.
> 
> In an effort to get more useful statistics on the mapping speed,
> I wrote a microbenchmark module named zsmapbench, designed to
> measure mapping speed by calling straight into the zsmalloc
> paths.
> 
> https://github.com/spartacus06/zsmapbench
> 
> This exposed an interesting and unexpected result: in all
> cases that I tried, copying the objects that span pages instead
> of using the page table to map them, was _always_ faster.  I could
> not find a case in which the page table mapping method was faster.
> 
> zsmapbench measures the copy-based mapping at ~560 cycles for a
> map/unmap operation on spanned object for both KVM guest and bare-metal,
> while the page table mapping was ~1500 cycles on a VM and ~760 cycles
> bare-metal.  The cycles for the copy method will vary with
> allocation size, however, it is still faster even for the largest
> allocation that zsmalloc supports.
> 
> The result is convenient though, as mempcy is very portable :)
> 
> This patchset replaces the x86-only page table mapping code with
> copy-based mapping code. It also makes changes to optimize this
> new method further.
> 
> There are no changes in arch/x86 required.
> 
> Patchset is based on greg's staging-next.
> 
> Seth Jennings (4):
>   zsmalloc: remove x86 dependency
>   zsmalloc: add single-page object fastpath in unmap
>   zsmalloc: add details to zs_map_object boiler plate
>   zsmalloc: add mapping modes
> 
>  drivers/staging/zcache/zcache-main.c     |    6 +-
>  drivers/staging/zram/zram_drv.c          |    7 +-
>  drivers/staging/zsmalloc/Kconfig         |    4 -
>  drivers/staging/zsmalloc/zsmalloc-main.c |  124 ++++++++++++++++++++++--------
>  drivers/staging/zsmalloc/zsmalloc.h      |   14 +++-
>  drivers/staging/zsmalloc/zsmalloc_int.h  |    6 +-
>  6 files changed, 114 insertions(+), 47 deletions(-)
> 



-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
  2012-07-02 21:15 ` Seth Jennings
@ 2012-07-04 20:43   ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 62+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-04 20:43 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

On Mon, Jul 02, 2012 at 04:15:48PM -0500, Seth Jennings wrote:
> This patchset removes the current x86 dependency for zsmalloc
> and introduces some performance improvements in the object
> mapping paths.
> 
> It was meant to be a follow-on to my previous patchest
> 
> https://lkml.org/lkml/2012/6/26/540
> 
> However, this patchset differed so much in light of new performance
> information that I mostly started over.
> 
> In the past, I attempted to compare different mapping methods
> via the use of zcache and frontswap.  However, the nature of those
> two features makes comparing mapping method efficiency difficult
> since the mapping is a very small part of the overall code path.
> 
> In an effort to get more useful statistics on the mapping speed,
> I wrote a microbenchmark module named zsmapbench, designed to
> measure mapping speed by calling straight into the zsmalloc
> paths.
> 
> https://github.com/spartacus06/zsmapbench
> 
> This exposed an interesting and unexpected result: in all
> cases that I tried, copying the objects that span pages instead
> of using the page table to map them, was _always_ faster.  I could
> not find a case in which the page table mapping method was faster.

Which architecture was this under? It sounds x86-ish? Is this on
Westmere and more modern machines? What about Core2 architecture?

Oh how did it work on AMD Phenom boxes?
> 
> zsmapbench measures the copy-based mapping at ~560 cycles for a
> map/unmap operation on spanned object for both KVM guest and bare-metal,
> while the page table mapping was ~1500 cycles on a VM and ~760 cycles
> bare-metal.  The cycles for the copy method will vary with
> allocation size, however, it is still faster even for the largest
> allocation that zsmalloc supports.
> 
> The result is convenient though, as mempcy is very portable :)
> 
> This patchset replaces the x86-only page table mapping code with
> copy-based mapping code. It also makes changes to optimize this
> new method further.
> 
> There are no changes in arch/x86 required.
> 
> Patchset is based on greg's staging-next.
> 
> Seth Jennings (4):
>   zsmalloc: remove x86 dependency
>   zsmalloc: add single-page object fastpath in unmap
>   zsmalloc: add details to zs_map_object boiler plate
>   zsmalloc: add mapping modes
> 
>  drivers/staging/zcache/zcache-main.c     |    6 +-
>  drivers/staging/zram/zram_drv.c          |    7 +-
>  drivers/staging/zsmalloc/Kconfig         |    4 -
>  drivers/staging/zsmalloc/zsmalloc-main.c |  124 ++++++++++++++++++++++--------
>  drivers/staging/zsmalloc/zsmalloc.h      |   14 +++-
>  drivers/staging/zsmalloc/zsmalloc_int.h  |    6 +-
>  6 files changed, 114 insertions(+), 47 deletions(-)
> 
> -- 
> 1.7.9.5
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
@ 2012-07-04 20:43   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 62+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-04 20:43 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

On Mon, Jul 02, 2012 at 04:15:48PM -0500, Seth Jennings wrote:
> This patchset removes the current x86 dependency for zsmalloc
> and introduces some performance improvements in the object
> mapping paths.
> 
> It was meant to be a follow-on to my previous patchest
> 
> https://lkml.org/lkml/2012/6/26/540
> 
> However, this patchset differed so much in light of new performance
> information that I mostly started over.
> 
> In the past, I attempted to compare different mapping methods
> via the use of zcache and frontswap.  However, the nature of those
> two features makes comparing mapping method efficiency difficult
> since the mapping is a very small part of the overall code path.
> 
> In an effort to get more useful statistics on the mapping speed,
> I wrote a microbenchmark module named zsmapbench, designed to
> measure mapping speed by calling straight into the zsmalloc
> paths.
> 
> https://github.com/spartacus06/zsmapbench
> 
> This exposed an interesting and unexpected result: in all
> cases that I tried, copying the objects that span pages instead
> of using the page table to map them, was _always_ faster.  I could
> not find a case in which the page table mapping method was faster.

Which architecture was this under? It sounds x86-ish? Is this on
Westmere and more modern machines? What about Core2 architecture?

Oh how did it work on AMD Phenom boxes?
> 
> zsmapbench measures the copy-based mapping at ~560 cycles for a
> map/unmap operation on spanned object for both KVM guest and bare-metal,
> while the page table mapping was ~1500 cycles on a VM and ~760 cycles
> bare-metal.  The cycles for the copy method will vary with
> allocation size, however, it is still faster even for the largest
> allocation that zsmalloc supports.
> 
> The result is convenient though, as mempcy is very portable :)
> 
> This patchset replaces the x86-only page table mapping code with
> copy-based mapping code. It also makes changes to optimize this
> new method further.
> 
> There are no changes in arch/x86 required.
> 
> Patchset is based on greg's staging-next.
> 
> Seth Jennings (4):
>   zsmalloc: remove x86 dependency
>   zsmalloc: add single-page object fastpath in unmap
>   zsmalloc: add details to zs_map_object boiler plate
>   zsmalloc: add mapping modes
> 
>  drivers/staging/zcache/zcache-main.c     |    6 +-
>  drivers/staging/zram/zram_drv.c          |    7 +-
>  drivers/staging/zsmalloc/Kconfig         |    4 -
>  drivers/staging/zsmalloc/zsmalloc-main.c |  124 ++++++++++++++++++++++--------
>  drivers/staging/zsmalloc/zsmalloc.h      |   14 +++-
>  drivers/staging/zsmalloc/zsmalloc_int.h  |    6 +-
>  6 files changed, 114 insertions(+), 47 deletions(-)
> 
> -- 
> 1.7.9.5
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
  2012-07-04 20:43   ` Konrad Rzeszutek Wilk
@ 2012-07-06 15:07     ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-06 15:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

On 07/04/2012 03:43 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Jul 02, 2012 at 04:15:48PM -0500, Seth Jennings wrote:
>> This exposed an interesting and unexpected result: in all
>> cases that I tried, copying the objects that span pages instead
>> of using the page table to map them, was _always_ faster.  I could
>> not find a case in which the page table mapping method was faster.
> 
> Which architecture was this under? It sounds x86-ish? Is this on
> Westmere and more modern machines? What about Core2 architecture?
> 
> Oh how did it work on AMD Phenom boxes?

I don't have a Phenom box but I have an Athlon X2 I can try out.
I'll get this information next Monday.

--
Seth


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
@ 2012-07-06 15:07     ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-06 15:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

On 07/04/2012 03:43 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Jul 02, 2012 at 04:15:48PM -0500, Seth Jennings wrote:
>> This exposed an interesting and unexpected result: in all
>> cases that I tried, copying the objects that span pages instead
>> of using the page table to map them, was _always_ faster.  I could
>> not find a case in which the page table mapping method was faster.
> 
> Which architecture was this under? It sounds x86-ish? Is this on
> Westmere and more modern machines? What about Core2 architecture?
> 
> Oh how did it work on AMD Phenom boxes?

I don't have a Phenom box but I have an Athlon X2 I can try out.
I'll get this information next Monday.

--
Seth

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
  2012-07-06 15:07     ` Seth Jennings
@ 2012-07-09 13:58       ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-09 13:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

On 07/06/2012 10:07 AM, Seth Jennings wrote:
> On 07/04/2012 03:43 PM, Konrad Rzeszutek Wilk wrote:
>> On Mon, Jul 02, 2012 at 04:15:48PM -0500, Seth Jennings wrote:
>>> This exposed an interesting and unexpected result: in all
>>> cases that I tried, copying the objects that span pages instead
>>> of using the page table to map them, was _always_ faster.  I could
>>> not find a case in which the page table mapping method was faster.
>>
>> Which architecture was this under? It sounds x86-ish? Is this on
>> Westmere and more modern machines? What about Core2 architecture?
>>
>> Oh how did it work on AMD Phenom boxes?
> 
> I don't have a Phenom box but I have an Athlon X2 I can try out.
> I'll get this information next Monday.

Actually, I'm running some production stuff on that box, so
I rather not put testing stuff on it.  Is there any
particular reason that you wanted this information? Do you
have a reason to believe that mapping will be faster than
copy for AMD procs?

(To everyone) I'd like to get this acked before the 3.6
merge window if there are no concerns/objections.

Thanks,
Seth


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
@ 2012-07-09 13:58       ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-09 13:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

On 07/06/2012 10:07 AM, Seth Jennings wrote:
> On 07/04/2012 03:43 PM, Konrad Rzeszutek Wilk wrote:
>> On Mon, Jul 02, 2012 at 04:15:48PM -0500, Seth Jennings wrote:
>>> This exposed an interesting and unexpected result: in all
>>> cases that I tried, copying the objects that span pages instead
>>> of using the page table to map them, was _always_ faster.  I could
>>> not find a case in which the page table mapping method was faster.
>>
>> Which architecture was this under? It sounds x86-ish? Is this on
>> Westmere and more modern machines? What about Core2 architecture?
>>
>> Oh how did it work on AMD Phenom boxes?
> 
> I don't have a Phenom box but I have an Athlon X2 I can try out.
> I'll get this information next Monday.

Actually, I'm running some production stuff on that box, so
I rather not put testing stuff on it.  Is there any
particular reason that you wanted this information? Do you
have a reason to believe that mapping will be faster than
copy for AMD procs?

(To everyone) I'd like to get this acked before the 3.6
merge window if there are no concerns/objections.

Thanks,
Seth

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
  2012-07-02 21:15   ` Seth Jennings
@ 2012-07-10  2:21     ` Minchan Kim
  -1 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-10  2:21 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/03/2012 06:15 AM, Seth Jennings wrote:
> This patch replaces the page table assisted object mapping
> method, which has x86 dependencies, with a arch-independent
> method that does a simple copy into a temporary per-cpu
> buffer.
> 
> While a copy seems like it would be worse than mapping the pages,
> tests demonstrate the copying is always faster and, in the case of
> running inside a KVM guest, roughly 4x faster.
> 
> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> ---
>  drivers/staging/zsmalloc/Kconfig         |    4 --
>  drivers/staging/zsmalloc/zsmalloc-main.c |   99 +++++++++++++++++++++---------
>  drivers/staging/zsmalloc/zsmalloc_int.h  |    5 +-
>  3 files changed, 72 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
> index a5ab720..9084565 100644
> --- a/drivers/staging/zsmalloc/Kconfig
> +++ b/drivers/staging/zsmalloc/Kconfig
> @@ -1,9 +1,5 @@
>  config ZSMALLOC
>  	tristate "Memory allocator for compressed pages"
> -	# X86 dependency is because of the use of __flush_tlb_one and set_pte
> -	# in zsmalloc-main.c.
> -	# TODO: convert these to portable functions
> -	depends on X86
>  	default n
>  	help
>  	  zsmalloc is a slab-based memory allocator designed to store
> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
> index 10b0d60..a7a6f22 100644
> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> @@ -470,6 +470,57 @@ static struct page *find_get_zspage(struct size_class *class)
>  	return page;
>  }
>  
> +static void zs_copy_map_object(char *buf, struct page *firstpage,
> +				int off, int size)

firstpage is rather misleading.
As you know, we use firstpage term for real firstpage of zspage but
in case of zs_copy_map_object, it could be a middle page of zspage.
So I would like to use "page" instead of firstpage.

> +{
> +	struct page *pages[2];
> +	int sizes[2];
> +	void *addr;
> +
> +	pages[0] = firstpage;
> +	pages[1] = get_next_page(firstpage);
> +	BUG_ON(!pages[1]);
> +
> +	sizes[0] = PAGE_SIZE - off;
> +	sizes[1] = size - sizes[0];
> +
> +	/* disable page faults to match kmap_atomic() return conditions */
> +	pagefault_disable();

If I understand your intention correctly, you want to prevent calling
this function on non-atomic context. Right?
Please write down description more clearly as point of view what's happen
if we didn't.

> +
> +	/* copy object to per-cpu buffer */
> +	addr = kmap_atomic(pages[0]);
> +	memcpy(buf, addr + off, sizes[0]);
> +	kunmap_atomic(addr);
> +	addr = kmap_atomic(pages[1]);
> +	memcpy(buf + sizes[0], addr, sizes[1]);
> +	kunmap_atomic(addr);
> +}
> +
> +static void zs_copy_unmap_object(char *buf, struct page *firstpage,
> +				int off, int size)

Ditto firstpage.

Otherwise, Looks good to me.

-- 
Kind regards,
Minchan Kim



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
@ 2012-07-10  2:21     ` Minchan Kim
  0 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-10  2:21 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/03/2012 06:15 AM, Seth Jennings wrote:
> This patch replaces the page table assisted object mapping
> method, which has x86 dependencies, with a arch-independent
> method that does a simple copy into a temporary per-cpu
> buffer.
> 
> While a copy seems like it would be worse than mapping the pages,
> tests demonstrate the copying is always faster and, in the case of
> running inside a KVM guest, roughly 4x faster.
> 
> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> ---
>  drivers/staging/zsmalloc/Kconfig         |    4 --
>  drivers/staging/zsmalloc/zsmalloc-main.c |   99 +++++++++++++++++++++---------
>  drivers/staging/zsmalloc/zsmalloc_int.h  |    5 +-
>  3 files changed, 72 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
> index a5ab720..9084565 100644
> --- a/drivers/staging/zsmalloc/Kconfig
> +++ b/drivers/staging/zsmalloc/Kconfig
> @@ -1,9 +1,5 @@
>  config ZSMALLOC
>  	tristate "Memory allocator for compressed pages"
> -	# X86 dependency is because of the use of __flush_tlb_one and set_pte
> -	# in zsmalloc-main.c.
> -	# TODO: convert these to portable functions
> -	depends on X86
>  	default n
>  	help
>  	  zsmalloc is a slab-based memory allocator designed to store
> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
> index 10b0d60..a7a6f22 100644
> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> @@ -470,6 +470,57 @@ static struct page *find_get_zspage(struct size_class *class)
>  	return page;
>  }
>  
> +static void zs_copy_map_object(char *buf, struct page *firstpage,
> +				int off, int size)

firstpage is rather misleading.
As you know, we use firstpage term for real firstpage of zspage but
in case of zs_copy_map_object, it could be a middle page of zspage.
So I would like to use "page" instead of firstpage.

> +{
> +	struct page *pages[2];
> +	int sizes[2];
> +	void *addr;
> +
> +	pages[0] = firstpage;
> +	pages[1] = get_next_page(firstpage);
> +	BUG_ON(!pages[1]);
> +
> +	sizes[0] = PAGE_SIZE - off;
> +	sizes[1] = size - sizes[0];
> +
> +	/* disable page faults to match kmap_atomic() return conditions */
> +	pagefault_disable();

If I understand your intention correctly, you want to prevent calling
this function on non-atomic context. Right?
Please write down description more clearly as point of view what's happen
if we didn't.

> +
> +	/* copy object to per-cpu buffer */
> +	addr = kmap_atomic(pages[0]);
> +	memcpy(buf, addr + off, sizes[0]);
> +	kunmap_atomic(addr);
> +	addr = kmap_atomic(pages[1]);
> +	memcpy(buf + sizes[0], addr, sizes[1]);
> +	kunmap_atomic(addr);
> +}
> +
> +static void zs_copy_unmap_object(char *buf, struct page *firstpage,
> +				int off, int size)

Ditto firstpage.

Otherwise, Looks good to me.

-- 
Kind regards,
Minchan Kim


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/4] zsmalloc: add single-page object fastpath in unmap
  2012-07-02 21:15   ` Seth Jennings
@ 2012-07-10  2:25     ` Minchan Kim
  -1 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-10  2:25 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/03/2012 06:15 AM, Seth Jennings wrote:
> Improve zs_unmap_object() performance by adding a fast path for
> objects that don't span pages.
> 
> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> ---
>  drivers/staging/zsmalloc/zsmalloc-main.c |   15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
> index a7a6f22..4942d41 100644
> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> @@ -774,6 +774,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle)
>  	}
>  
>  	zs_copy_map_object(area->vm_buf, page, off, class->size);
> +	area->vm_addr = NULL;
>  	return area->vm_buf;
>  }
>  EXPORT_SYMBOL_GPL(zs_map_object);
> @@ -788,6 +789,14 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
>  	struct size_class *class;
>  	struct mapping_area *area;
>  
> +	area = &__get_cpu_var(zs_map_area);
> +	if (area->vm_addr) {
> +		/* single-page object fastpath */
> +		kunmap_atomic(area->vm_addr);
> +		put_cpu_var(zs_map_area);
> +		return;
> +	}
> +

Please locate this after below BUG_ON.
The BUG check is still effective regardless of your fast path patch.

>  	BUG_ON(!handle);
>  
>  	obj_handle_to_location(handle, &page, &obj_idx);
> @@ -795,11 +804,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
>  	class = &pool->size_class[class_idx];
>  	off = obj_idx_to_offset(page, obj_idx, class->size);
>  
> -	area = &__get_cpu_var(zs_map_area);
> -	if (off + class->size <= PAGE_SIZE)
> -		kunmap_atomic(area->vm_addr);
> -	else
> -		zs_copy_unmap_object(area->vm_buf, page, off, class->size);
> +	zs_copy_unmap_object(area->vm_buf, page, off, class->size);
>  	put_cpu_var(zs_map_area);
>  }
>  EXPORT_SYMBOL_GPL(zs_unmap_object);
> 


-- 
Kind regards,
Minchan Kim



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/4] zsmalloc: add single-page object fastpath in unmap
@ 2012-07-10  2:25     ` Minchan Kim
  0 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-10  2:25 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/03/2012 06:15 AM, Seth Jennings wrote:
> Improve zs_unmap_object() performance by adding a fast path for
> objects that don't span pages.
> 
> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> ---
>  drivers/staging/zsmalloc/zsmalloc-main.c |   15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
> index a7a6f22..4942d41 100644
> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> @@ -774,6 +774,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle)
>  	}
>  
>  	zs_copy_map_object(area->vm_buf, page, off, class->size);
> +	area->vm_addr = NULL;
>  	return area->vm_buf;
>  }
>  EXPORT_SYMBOL_GPL(zs_map_object);
> @@ -788,6 +789,14 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
>  	struct size_class *class;
>  	struct mapping_area *area;
>  
> +	area = &__get_cpu_var(zs_map_area);
> +	if (area->vm_addr) {
> +		/* single-page object fastpath */
> +		kunmap_atomic(area->vm_addr);
> +		put_cpu_var(zs_map_area);
> +		return;
> +	}
> +

Please locate this after below BUG_ON.
The BUG check is still effective regardless of your fast path patch.

>  	BUG_ON(!handle);
>  
>  	obj_handle_to_location(handle, &page, &obj_idx);
> @@ -795,11 +804,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
>  	class = &pool->size_class[class_idx];
>  	off = obj_idx_to_offset(page, obj_idx, class->size);
>  
> -	area = &__get_cpu_var(zs_map_area);
> -	if (off + class->size <= PAGE_SIZE)
> -		kunmap_atomic(area->vm_addr);
> -	else
> -		zs_copy_unmap_object(area->vm_buf, page, off, class->size);
> +	zs_copy_unmap_object(area->vm_buf, page, off, class->size);
>  	put_cpu_var(zs_map_area);
>  }
>  EXPORT_SYMBOL_GPL(zs_unmap_object);
> 


-- 
Kind regards,
Minchan Kim


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
  2012-07-02 21:15   ` Seth Jennings
@ 2012-07-10  2:35     ` Minchan Kim
  -1 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-10  2:35 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/03/2012 06:15 AM, Seth Jennings wrote:
> Add information on the usage limits of zs_map_object()
> 
> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> ---
>  drivers/staging/zsmalloc/zsmalloc-main.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
> index 4942d41..abf7c13 100644
> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> @@ -747,7 +747,12 @@ EXPORT_SYMBOL_GPL(zs_free);
>   *
>   * Before using an object allocated from zs_malloc, it must be mapped using
>   * this function. When done with the object, it must be unmapped using
> - * zs_unmap_object
> + * zs_unmap_object.
> + *
> + * Only one object can be mapped per cpu at a time. There is no protection
> + * against nested mappings.
> + *
> + * This function returns with preemption and page faults disabled.
>  */
>  void *zs_map_object(struct zs_pool *pool, unsigned long handle)
>  {
> 

The comment is good but I hope we can detect it automatically with DEBUG
option. It wouldn't be hard but it's a debug patch so not critical
until we receive some report about the bug.

The possibility for nesting is that it is used by irq context.

A uses the mapping
.
.
.
IRQ happen
	B uses the mapping in IRQ context
	.
	.
	.

Maybe we need local_irq_save/restore in zs_[un]map_object path.

-- 
Kind regards,
Minchan Kim



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
@ 2012-07-10  2:35     ` Minchan Kim
  0 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-10  2:35 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/03/2012 06:15 AM, Seth Jennings wrote:
> Add information on the usage limits of zs_map_object()
> 
> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> ---
>  drivers/staging/zsmalloc/zsmalloc-main.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
> index 4942d41..abf7c13 100644
> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> @@ -747,7 +747,12 @@ EXPORT_SYMBOL_GPL(zs_free);
>   *
>   * Before using an object allocated from zs_malloc, it must be mapped using
>   * this function. When done with the object, it must be unmapped using
> - * zs_unmap_object
> + * zs_unmap_object.
> + *
> + * Only one object can be mapped per cpu at a time. There is no protection
> + * against nested mappings.
> + *
> + * This function returns with preemption and page faults disabled.
>  */
>  void *zs_map_object(struct zs_pool *pool, unsigned long handle)
>  {
> 

The comment is good but I hope we can detect it automatically with DEBUG
option. It wouldn't be hard but it's a debug patch so not critical
until we receive some report about the bug.

The possibility for nesting is that it is used by irq context.

A uses the mapping
.
.
.
IRQ happen
	B uses the mapping in IRQ context
	.
	.
	.

Maybe we need local_irq_save/restore in zs_[un]map_object path.

-- 
Kind regards,
Minchan Kim


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
  2012-07-10  2:35     ` Minchan Kim
@ 2012-07-10 15:17       ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-10 15:17 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/09/2012 09:35 PM, Minchan Kim wrote:
> On 07/03/2012 06:15 AM, Seth Jennings wrote:
>> Add information on the usage limits of zs_map_object()
>>
>> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
>> ---
>>  drivers/staging/zsmalloc/zsmalloc-main.c |    7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
>> index 4942d41..abf7c13 100644
>> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
>> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
>> @@ -747,7 +747,12 @@ EXPORT_SYMBOL_GPL(zs_free);
>>   *
>>   * Before using an object allocated from zs_malloc, it must be mapped using
>>   * this function. When done with the object, it must be unmapped using
>> - * zs_unmap_object
>> + * zs_unmap_object.
>> + *
>> + * Only one object can be mapped per cpu at a time. There is no protection
>> + * against nested mappings.
>> + *
>> + * This function returns with preemption and page faults disabled.
>>  */
>>  void *zs_map_object(struct zs_pool *pool, unsigned long handle)
>>  {
>>
> 
> The comment is good but I hope we can detect it automatically with DEBUG
> option. It wouldn't be hard but it's a debug patch so not critical
> until we receive some report about the bug.

Yes, we could implement some detection scheme later.

> 
> The possibility for nesting is that it is used by irq context.
> 
> A uses the mapping
> .
> .
> .
> IRQ happen
> 	B uses the mapping in IRQ context
> 	.
> 	.
> 	.
> 
> Maybe we need local_irq_save/restore in zs_[un]map_object path.

I'd rather not disable interrupts since that will create
unnecessary interrupt latency for all users, even if they
don't need interrupt protection.  If a particular user uses
zs_map_object() in an interrupt path, it will be up to that
user to disable interrupts to ensure safety.

Thanks,
Seth



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
@ 2012-07-10 15:17       ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-10 15:17 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/09/2012 09:35 PM, Minchan Kim wrote:
> On 07/03/2012 06:15 AM, Seth Jennings wrote:
>> Add information on the usage limits of zs_map_object()
>>
>> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
>> ---
>>  drivers/staging/zsmalloc/zsmalloc-main.c |    7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
>> index 4942d41..abf7c13 100644
>> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
>> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
>> @@ -747,7 +747,12 @@ EXPORT_SYMBOL_GPL(zs_free);
>>   *
>>   * Before using an object allocated from zs_malloc, it must be mapped using
>>   * this function. When done with the object, it must be unmapped using
>> - * zs_unmap_object
>> + * zs_unmap_object.
>> + *
>> + * Only one object can be mapped per cpu at a time. There is no protection
>> + * against nested mappings.
>> + *
>> + * This function returns with preemption and page faults disabled.
>>  */
>>  void *zs_map_object(struct zs_pool *pool, unsigned long handle)
>>  {
>>
> 
> The comment is good but I hope we can detect it automatically with DEBUG
> option. It wouldn't be hard but it's a debug patch so not critical
> until we receive some report about the bug.

Yes, we could implement some detection scheme later.

> 
> The possibility for nesting is that it is used by irq context.
> 
> A uses the mapping
> .
> .
> .
> IRQ happen
> 	B uses the mapping in IRQ context
> 	.
> 	.
> 	.
> 
> Maybe we need local_irq_save/restore in zs_[un]map_object path.

I'd rather not disable interrupts since that will create
unnecessary interrupt latency for all users, even if they
don't need interrupt protection.  If a particular user uses
zs_map_object() in an interrupt path, it will be up to that
user to disable interrupts to ensure safety.

Thanks,
Seth


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
  2012-07-10  2:21     ` Minchan Kim
@ 2012-07-10 15:29       ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-10 15:29 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/09/2012 09:21 PM, Minchan Kim wrote:
> On 07/03/2012 06:15 AM, Seth Jennings wrote:
<snip>
>> +static void zs_copy_map_object(char *buf, struct page *firstpage,
>> +				int off, int size)
> 
> firstpage is rather misleading.
> As you know, we use firstpage term for real firstpage of zspage but
> in case of zs_copy_map_object, it could be a middle page of zspage.
> So I would like to use "page" instead of firstpage.

Accepted.

>> +{
>> +	struct page *pages[2];
>> +	int sizes[2];
>> +	void *addr;
>> +
>> +	pages[0] = firstpage;
>> +	pages[1] = get_next_page(firstpage);
>> +	BUG_ON(!pages[1]);
>> +
>> +	sizes[0] = PAGE_SIZE - off;
>> +	sizes[1] = size - sizes[0];
>> +
>> +	/* disable page faults to match kmap_atomic() return conditions */
>> +	pagefault_disable();
> 
> If I understand your intention correctly, you want to prevent calling
> this function on non-atomic context. Right?

This is moved to zs_map_object() in a later patch, but the
point is to provide uniform return conditions, regardless of
whether the object to be mapped is contained in a single
page or spans two pages.  kmap_atomic() disables page
faults, so I did it here to create symmetry.  The result is
that zs_map_object always returns with preemption and page
faults disabled.

Also, Greg already merged these patches so I'll have to
incorporate these changes as a separate patch.

Thanks,
Seth


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
@ 2012-07-10 15:29       ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-10 15:29 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/09/2012 09:21 PM, Minchan Kim wrote:
> On 07/03/2012 06:15 AM, Seth Jennings wrote:
<snip>
>> +static void zs_copy_map_object(char *buf, struct page *firstpage,
>> +				int off, int size)
> 
> firstpage is rather misleading.
> As you know, we use firstpage term for real firstpage of zspage but
> in case of zs_copy_map_object, it could be a middle page of zspage.
> So I would like to use "page" instead of firstpage.

Accepted.

>> +{
>> +	struct page *pages[2];
>> +	int sizes[2];
>> +	void *addr;
>> +
>> +	pages[0] = firstpage;
>> +	pages[1] = get_next_page(firstpage);
>> +	BUG_ON(!pages[1]);
>> +
>> +	sizes[0] = PAGE_SIZE - off;
>> +	sizes[1] = size - sizes[0];
>> +
>> +	/* disable page faults to match kmap_atomic() return conditions */
>> +	pagefault_disable();
> 
> If I understand your intention correctly, you want to prevent calling
> this function on non-atomic context. Right?

This is moved to zs_map_object() in a later patch, but the
point is to provide uniform return conditions, regardless of
whether the object to be mapped is contained in a single
page or spans two pages.  kmap_atomic() disables page
faults, so I did it here to create symmetry.  The result is
that zs_map_object always returns with preemption and page
faults disabled.

Also, Greg already merged these patches so I'll have to
incorporate these changes as a separate patch.

Thanks,
Seth

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
  2012-07-02 21:15 ` Seth Jennings
@ 2012-07-11  7:03   ` Minchan Kim
  -1 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-11  7:03 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

Hi everybody,

I realized it by Seth's mention yesterday that Greg already merged this series 
I should have hurried but last week I have no time. :(

On 07/03/2012 06:15 AM, Seth Jennings wrote:
> This patchset removes the current x86 dependency for zsmalloc
> and introduces some performance improvements in the object
> mapping paths.
> 
> It was meant to be a follow-on to my previous patchest
> 
> https://lkml.org/lkml/2012/6/26/540
> 
> However, this patchset differed so much in light of new performance
> information that I mostly started over.
> 
> In the past, I attempted to compare different mapping methods
> via the use of zcache and frontswap.  However, the nature of those
> two features makes comparing mapping method efficiency difficult
> since the mapping is a very small part of the overall code path.
> 
> In an effort to get more useful statistics on the mapping speed,
> I wrote a microbenchmark module named zsmapbench, designed to
> measure mapping speed by calling straight into the zsmalloc
> paths.
> 
> https://github.com/spartacus06/zsmapbench
> 
> This exposed an interesting and unexpected result: in all
> cases that I tried, copying the objects that span pages instead
> of using the page table to map them, was _always_ faster.  I could
> not find a case in which the page table mapping method was faster.
> 
> zsmapbench measures the copy-based mapping at ~560 cycles for a
> map/unmap operation on spanned object for both KVM guest and bare-metal,
> while the page table mapping was ~1500 cycles on a VM and ~760 cycles
> bare-metal.  The cycles for the copy method will vary with
> allocation size, however, it is still faster even for the largest
> allocation that zsmalloc supports.
> 
> The result is convenient though, as mempcy is very portable :)

Today, I tested zsmapbench in my embedded board(ARM).
tlb-flush is 30% faster than copy-based so it's always not win.
I think it depends on CPU speed/cache size.

zram is already very popular on embedded systems so I want to use
it continuously without 30% big demage so I want to keep our old approach
which supporting local tlb flush. 

Of course, in case of KVM guest, copy-based would be always bin win.
So shouldn't we support both approach? It could make code very ugly
but I think it has enough value.

Any thought?


> 
> This patchset replaces the x86-only page table mapping code with
> copy-based mapping code. It also makes changes to optimize this
> new method further.
> 
> There are no changes in arch/x86 required.
> 
> Patchset is based on greg's staging-next.
> 
> Seth Jennings (4):
>   zsmalloc: remove x86 dependency
>   zsmalloc: add single-page object fastpath in unmap
>   zsmalloc: add details to zs_map_object boiler plate
>   zsmalloc: add mapping modes
> 
>  drivers/staging/zcache/zcache-main.c     |    6 +-
>  drivers/staging/zram/zram_drv.c          |    7 +-
>  drivers/staging/zsmalloc/Kconfig         |    4 -
>  drivers/staging/zsmalloc/zsmalloc-main.c |  124 ++++++++++++++++++++++--------
>  drivers/staging/zsmalloc/zsmalloc.h      |   14 +++-
>  drivers/staging/zsmalloc/zsmalloc_int.h  |    6 +-
>  6 files changed, 114 insertions(+), 47 deletions(-)
> 


-- 
Kind regards,
Minchan Kim



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
@ 2012-07-11  7:03   ` Minchan Kim
  0 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-11  7:03 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

Hi everybody,

I realized it by Seth's mention yesterday that Greg already merged this series 
I should have hurried but last week I have no time. :(

On 07/03/2012 06:15 AM, Seth Jennings wrote:
> This patchset removes the current x86 dependency for zsmalloc
> and introduces some performance improvements in the object
> mapping paths.
> 
> It was meant to be a follow-on to my previous patchest
> 
> https://lkml.org/lkml/2012/6/26/540
> 
> However, this patchset differed so much in light of new performance
> information that I mostly started over.
> 
> In the past, I attempted to compare different mapping methods
> via the use of zcache and frontswap.  However, the nature of those
> two features makes comparing mapping method efficiency difficult
> since the mapping is a very small part of the overall code path.
> 
> In an effort to get more useful statistics on the mapping speed,
> I wrote a microbenchmark module named zsmapbench, designed to
> measure mapping speed by calling straight into the zsmalloc
> paths.
> 
> https://github.com/spartacus06/zsmapbench
> 
> This exposed an interesting and unexpected result: in all
> cases that I tried, copying the objects that span pages instead
> of using the page table to map them, was _always_ faster.  I could
> not find a case in which the page table mapping method was faster.
> 
> zsmapbench measures the copy-based mapping at ~560 cycles for a
> map/unmap operation on spanned object for both KVM guest and bare-metal,
> while the page table mapping was ~1500 cycles on a VM and ~760 cycles
> bare-metal.  The cycles for the copy method will vary with
> allocation size, however, it is still faster even for the largest
> allocation that zsmalloc supports.
> 
> The result is convenient though, as mempcy is very portable :)

Today, I tested zsmapbench in my embedded board(ARM).
tlb-flush is 30% faster than copy-based so it's always not win.
I think it depends on CPU speed/cache size.

zram is already very popular on embedded systems so I want to use
it continuously without 30% big demage so I want to keep our old approach
which supporting local tlb flush. 

Of course, in case of KVM guest, copy-based would be always bin win.
So shouldn't we support both approach? It could make code very ugly
but I think it has enough value.

Any thought?


> 
> This patchset replaces the x86-only page table mapping code with
> copy-based mapping code. It also makes changes to optimize this
> new method further.
> 
> There are no changes in arch/x86 required.
> 
> Patchset is based on greg's staging-next.
> 
> Seth Jennings (4):
>   zsmalloc: remove x86 dependency
>   zsmalloc: add single-page object fastpath in unmap
>   zsmalloc: add details to zs_map_object boiler plate
>   zsmalloc: add mapping modes
> 
>  drivers/staging/zcache/zcache-main.c     |    6 +-
>  drivers/staging/zram/zram_drv.c          |    7 +-
>  drivers/staging/zsmalloc/Kconfig         |    4 -
>  drivers/staging/zsmalloc/zsmalloc-main.c |  124 ++++++++++++++++++++++--------
>  drivers/staging/zsmalloc/zsmalloc.h      |   14 +++-
>  drivers/staging/zsmalloc/zsmalloc_int.h  |    6 +-
>  6 files changed, 114 insertions(+), 47 deletions(-)
> 


-- 
Kind regards,
Minchan Kim


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
  2012-07-10 15:29       ` Seth Jennings
@ 2012-07-11  7:27         ` Minchan Kim
  -1 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-11  7:27 UTC (permalink / raw)
  To: Seth Jennings
  Cc: devel, Dan Magenheimer, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, linux-kernel, linux-mm, Andrew Morton,
	Robert Jennings, Nitin Gupta, Peter Zijlstra

On 07/11/2012 12:29 AM, Seth Jennings wrote:
> On 07/09/2012 09:21 PM, Minchan Kim wrote:
>> On 07/03/2012 06:15 AM, Seth Jennings wrote:
> <snip>
>>> +static void zs_copy_map_object(char *buf, struct page *firstpage,
>>> +				int off, int size)
>>
>> firstpage is rather misleading.
>> As you know, we use firstpage term for real firstpage of zspage but
>> in case of zs_copy_map_object, it could be a middle page of zspage.
>> So I would like to use "page" instead of firstpage.
> 
> Accepted.
> 
>>> +{
>>> +	struct page *pages[2];
>>> +	int sizes[2];
>>> +	void *addr;
>>> +
>>> +	pages[0] = firstpage;
>>> +	pages[1] = get_next_page(firstpage);
>>> +	BUG_ON(!pages[1]);
>>> +
>>> +	sizes[0] = PAGE_SIZE - off;
>>> +	sizes[1] = size - sizes[0];
>>> +
>>> +	/* disable page faults to match kmap_atomic() return conditions */
>>> +	pagefault_disable();
>>
>> If I understand your intention correctly, you want to prevent calling
>> this function on non-atomic context. Right?
> 
> This is moved to zs_map_object() in a later patch, but the
> point is to provide uniform return conditions, regardless of
> whether the object to be mapped is contained in a single
> page or spans two pages.  kmap_atomic() disables page
> faults, so I did it here to create symmetry.  The result is

The one I want to comment out is why we should disable page fault.
ie, if we don't disable page fault, what's happen?

As I read the comment of kmap_atomic about pagefault_disable, 
it seems that for preventing reentrant bug in preemptive kernel
while it catch page fault during atomic context in non-preemptive kernel.
But I'm not sure so Ccing Peter.


> that zs_map_object always returns with preemption and page
> faults disabled.
> 
> Also, Greg already merged these patches so I'll have to
> incorporate these changes as a separate patch.
> 
> Thanks,
> Seth
> 


-- 
Kind regards,
Minchan Kim



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
@ 2012-07-11  7:27         ` Minchan Kim
  0 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-11  7:27 UTC (permalink / raw)
  To: Seth Jennings
  Cc: devel, Dan Magenheimer, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, linux-kernel, linux-mm, Andrew Morton,
	Robert Jennings, Nitin Gupta, Peter Zijlstra

On 07/11/2012 12:29 AM, Seth Jennings wrote:
> On 07/09/2012 09:21 PM, Minchan Kim wrote:
>> On 07/03/2012 06:15 AM, Seth Jennings wrote:
> <snip>
>>> +static void zs_copy_map_object(char *buf, struct page *firstpage,
>>> +				int off, int size)
>>
>> firstpage is rather misleading.
>> As you know, we use firstpage term for real firstpage of zspage but
>> in case of zs_copy_map_object, it could be a middle page of zspage.
>> So I would like to use "page" instead of firstpage.
> 
> Accepted.
> 
>>> +{
>>> +	struct page *pages[2];
>>> +	int sizes[2];
>>> +	void *addr;
>>> +
>>> +	pages[0] = firstpage;
>>> +	pages[1] = get_next_page(firstpage);
>>> +	BUG_ON(!pages[1]);
>>> +
>>> +	sizes[0] = PAGE_SIZE - off;
>>> +	sizes[1] = size - sizes[0];
>>> +
>>> +	/* disable page faults to match kmap_atomic() return conditions */
>>> +	pagefault_disable();
>>
>> If I understand your intention correctly, you want to prevent calling
>> this function on non-atomic context. Right?
> 
> This is moved to zs_map_object() in a later patch, but the
> point is to provide uniform return conditions, regardless of
> whether the object to be mapped is contained in a single
> page or spans two pages.  kmap_atomic() disables page
> faults, so I did it here to create symmetry.  The result is

The one I want to comment out is why we should disable page fault.
ie, if we don't disable page fault, what's happen?

As I read the comment of kmap_atomic about pagefault_disable, 
it seems that for preventing reentrant bug in preemptive kernel
while it catch page fault during atomic context in non-preemptive kernel.
But I'm not sure so Ccing Peter.


> that zs_map_object always returns with preemption and page
> faults disabled.
> 
> Also, Greg already merged these patches so I'll have to
> incorporate these changes as a separate patch.
> 
> Thanks,
> Seth
> 


-- 
Kind regards,
Minchan Kim


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
  2012-07-10 15:17       ` Seth Jennings
@ 2012-07-11  7:42         ` Minchan Kim
  -1 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-11  7:42 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/11/2012 12:17 AM, Seth Jennings wrote:
> On 07/09/2012 09:35 PM, Minchan Kim wrote:
>> On 07/03/2012 06:15 AM, Seth Jennings wrote:
>>> Add information on the usage limits of zs_map_object()
>>>
>>> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
>>> ---
>>>  drivers/staging/zsmalloc/zsmalloc-main.c |    7 ++++++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
>>> index 4942d41..abf7c13 100644
>>> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
>>> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
>>> @@ -747,7 +747,12 @@ EXPORT_SYMBOL_GPL(zs_free);
>>>   *
>>>   * Before using an object allocated from zs_malloc, it must be mapped using
>>>   * this function. When done with the object, it must be unmapped using
>>> - * zs_unmap_object
>>> + * zs_unmap_object.
>>> + *
>>> + * Only one object can be mapped per cpu at a time. There is no protection
>>> + * against nested mappings.
>>> + *
>>> + * This function returns with preemption and page faults disabled.
>>>  */
>>>  void *zs_map_object(struct zs_pool *pool, unsigned long handle)
>>>  {
>>>
>>
>> The comment is good but I hope we can detect it automatically with DEBUG
>> option. It wouldn't be hard but it's a debug patch so not critical
>> until we receive some report about the bug.
> 
> Yes, we could implement some detection scheme later.
> 
>>
>> The possibility for nesting is that it is used by irq context.
>>
>> A uses the mapping
>> .
>> .
>> .
>> IRQ happen
>> 	B uses the mapping in IRQ context
>> 	.
>> 	.
>> 	.
>>
>> Maybe we need local_irq_save/restore in zs_[un]map_object path.
> 
> I'd rather not disable interrupts since that will create
> unnecessary interrupt latency for all users, even if they

Agreed.
Although we guide k[un]map atomic is so fast, it isn't necessary
to force irq_[enable|disable]. Okay.

> don't need interrupt protection.  If a particular user uses
> zs_map_object() in an interrupt path, it will be up to that
> user to disable interrupts to ensure safety.

Nope. It shouldn't do that.
Any user in interrupt context can't assume that there isn't any other user using per-cpu buffer
right before interrupt happens.

The concern is that if such bug happens, it's very hard to find a bug.
So, how about adding this?

void zs_map_object(...)
{
	BUG_ON(in_interrupt());
}


> 
> Thanks,
> Seth
> 
> 


-- 
Kind regards,
Minchan Kim



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
@ 2012-07-11  7:42         ` Minchan Kim
  0 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-11  7:42 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/11/2012 12:17 AM, Seth Jennings wrote:
> On 07/09/2012 09:35 PM, Minchan Kim wrote:
>> On 07/03/2012 06:15 AM, Seth Jennings wrote:
>>> Add information on the usage limits of zs_map_object()
>>>
>>> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
>>> ---
>>>  drivers/staging/zsmalloc/zsmalloc-main.c |    7 ++++++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
>>> index 4942d41..abf7c13 100644
>>> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
>>> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
>>> @@ -747,7 +747,12 @@ EXPORT_SYMBOL_GPL(zs_free);
>>>   *
>>>   * Before using an object allocated from zs_malloc, it must be mapped using
>>>   * this function. When done with the object, it must be unmapped using
>>> - * zs_unmap_object
>>> + * zs_unmap_object.
>>> + *
>>> + * Only one object can be mapped per cpu at a time. There is no protection
>>> + * against nested mappings.
>>> + *
>>> + * This function returns with preemption and page faults disabled.
>>>  */
>>>  void *zs_map_object(struct zs_pool *pool, unsigned long handle)
>>>  {
>>>
>>
>> The comment is good but I hope we can detect it automatically with DEBUG
>> option. It wouldn't be hard but it's a debug patch so not critical
>> until we receive some report about the bug.
> 
> Yes, we could implement some detection scheme later.
> 
>>
>> The possibility for nesting is that it is used by irq context.
>>
>> A uses the mapping
>> .
>> .
>> .
>> IRQ happen
>> 	B uses the mapping in IRQ context
>> 	.
>> 	.
>> 	.
>>
>> Maybe we need local_irq_save/restore in zs_[un]map_object path.
> 
> I'd rather not disable interrupts since that will create
> unnecessary interrupt latency for all users, even if they

Agreed.
Although we guide k[un]map atomic is so fast, it isn't necessary
to force irq_[enable|disable]. Okay.

> don't need interrupt protection.  If a particular user uses
> zs_map_object() in an interrupt path, it will be up to that
> user to disable interrupts to ensure safety.

Nope. It shouldn't do that.
Any user in interrupt context can't assume that there isn't any other user using per-cpu buffer
right before interrupt happens.

The concern is that if such bug happens, it's very hard to find a bug.
So, how about adding this?

void zs_map_object(...)
{
	BUG_ON(in_interrupt());
}


> 
> Thanks,
> Seth
> 
> 


-- 
Kind regards,
Minchan Kim


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
  2012-07-11  7:03   ` Minchan Kim
@ 2012-07-11 14:00     ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-11 14:00 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/11/2012 02:03 AM, Minchan Kim wrote:
> On 07/03/2012 06:15 AM, Seth Jennings wrote:
>> zsmapbench measures the copy-based mapping at ~560 cycles for a
>> map/unmap operation on spanned object for both KVM guest and bare-metal,
>> while the page table mapping was ~1500 cycles on a VM and ~760 cycles
>> bare-metal.  The cycles for the copy method will vary with
>> allocation size, however, it is still faster even for the largest
>> allocation that zsmalloc supports.
>>
>> The result is convenient though, as mempcy is very portable :)
> 
> Today, I tested zsmapbench in my embedded board(ARM).
> tlb-flush is 30% faster than copy-based so it's always not win.
> I think it depends on CPU speed/cache size.
> 
> zram is already very popular on embedded systems so I want to use
> it continuously without 30% big demage so I want to keep our old approach
> which supporting local tlb flush. 
> 
> Of course, in case of KVM guest, copy-based would be always bin win.
> So shouldn't we support both approach? It could make code very ugly
> but I think it has enough value.
> 
> Any thought?

Thanks for testing on ARM.

I can add the pgtable assisted method back in, no problem.
The question is by which criteria are we going to choose
which method to use? By arch (i.e. ARM -> pgtable assist,
x86 -> copy, other archs -> ?)?

Also, what changes did you make to zsmapbench to measure
elapsed time/cycles on ARM?  Afaik, rdtscll() is not
supported on ARM.

Thanks,
Seth


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
@ 2012-07-11 14:00     ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-11 14:00 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/11/2012 02:03 AM, Minchan Kim wrote:
> On 07/03/2012 06:15 AM, Seth Jennings wrote:
>> zsmapbench measures the copy-based mapping at ~560 cycles for a
>> map/unmap operation on spanned object for both KVM guest and bare-metal,
>> while the page table mapping was ~1500 cycles on a VM and ~760 cycles
>> bare-metal.  The cycles for the copy method will vary with
>> allocation size, however, it is still faster even for the largest
>> allocation that zsmalloc supports.
>>
>> The result is convenient though, as mempcy is very portable :)
> 
> Today, I tested zsmapbench in my embedded board(ARM).
> tlb-flush is 30% faster than copy-based so it's always not win.
> I think it depends on CPU speed/cache size.
> 
> zram is already very popular on embedded systems so I want to use
> it continuously without 30% big demage so I want to keep our old approach
> which supporting local tlb flush. 
> 
> Of course, in case of KVM guest, copy-based would be always bin win.
> So shouldn't we support both approach? It could make code very ugly
> but I think it has enough value.
> 
> Any thought?

Thanks for testing on ARM.

I can add the pgtable assisted method back in, no problem.
The question is by which criteria are we going to choose
which method to use? By arch (i.e. ARM -> pgtable assist,
x86 -> copy, other archs -> ?)?

Also, what changes did you make to zsmapbench to measure
elapsed time/cycles on ARM?  Afaik, rdtscll() is not
supported on ARM.

Thanks,
Seth

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
  2012-07-11  7:42         ` Minchan Kim
@ 2012-07-11 14:15           ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-11 14:15 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/11/2012 02:42 AM, Minchan Kim wrote:
> On 07/11/2012 12:17 AM, Seth Jennings wrote:
>> On 07/09/2012 09:35 PM, Minchan Kim wrote:
>>> Maybe we need local_irq_save/restore in zs_[un]map_object path.
>>
>> I'd rather not disable interrupts since that will create
>> unnecessary interrupt latency for all users, even if they
> 
> Agreed.
> Although we guide k[un]map atomic is so fast, it isn't necessary
> to force irq_[enable|disable]. Okay.
> 
>> don't need interrupt protection.  If a particular user uses
>> zs_map_object() in an interrupt path, it will be up to that
>> user to disable interrupts to ensure safety.
> 
> Nope. It shouldn't do that.
> Any user in interrupt context can't assume that there isn't any other user using per-cpu buffer
> right before interrupt happens.
> 
> The concern is that if such bug happens, it's very hard to find a bug.
> So, how about adding this?
> 
> void zs_map_object(...)
> {
> 	BUG_ON(in_interrupt());
> }

I not completely following you, but I think I'm following
enough.  Your point is that the per-cpu buffers are shared
by all zsmalloc users and one user doesn't know if another
user is doing a zs_map_object() in an interrupt path.

However, I think what you are suggesting is to disallow
mapping in interrupt context.  This is a problem for zcache
as it already does mapping in interrupt context, namely for
page decompression in the page fault handler.

What do you think about making the per-cpu buffers local to
each zsmalloc pool? That way each user has their own per-cpu
buffers and don't step on each other's toes.

Thanks,
Seth


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
@ 2012-07-11 14:15           ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-11 14:15 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/11/2012 02:42 AM, Minchan Kim wrote:
> On 07/11/2012 12:17 AM, Seth Jennings wrote:
>> On 07/09/2012 09:35 PM, Minchan Kim wrote:
>>> Maybe we need local_irq_save/restore in zs_[un]map_object path.
>>
>> I'd rather not disable interrupts since that will create
>> unnecessary interrupt latency for all users, even if they
> 
> Agreed.
> Although we guide k[un]map atomic is so fast, it isn't necessary
> to force irq_[enable|disable]. Okay.
> 
>> don't need interrupt protection.  If a particular user uses
>> zs_map_object() in an interrupt path, it will be up to that
>> user to disable interrupts to ensure safety.
> 
> Nope. It shouldn't do that.
> Any user in interrupt context can't assume that there isn't any other user using per-cpu buffer
> right before interrupt happens.
> 
> The concern is that if such bug happens, it's very hard to find a bug.
> So, how about adding this?
> 
> void zs_map_object(...)
> {
> 	BUG_ON(in_interrupt());
> }

I not completely following you, but I think I'm following
enough.  Your point is that the per-cpu buffers are shared
by all zsmalloc users and one user doesn't know if another
user is doing a zs_map_object() in an interrupt path.

However, I think what you are suggesting is to disallow
mapping in interrupt context.  This is a problem for zcache
as it already does mapping in interrupt context, namely for
page decompression in the page fault handler.

What do you think about making the per-cpu buffers local to
each zsmalloc pool? That way each user has their own per-cpu
buffers and don't step on each other's toes.

Thanks,
Seth

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
  2012-07-02 21:15   ` Seth Jennings
@ 2012-07-11 18:26     ` Nitin Gupta
  -1 siblings, 0 replies; 62+ messages in thread
From: Nitin Gupta @ 2012-07-11 18:26 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Minchan Kim, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/02/2012 02:15 PM, Seth Jennings wrote:
> This patch replaces the page table assisted object mapping
> method, which has x86 dependencies, with a arch-independent
> method that does a simple copy into a temporary per-cpu
> buffer.
> 
> While a copy seems like it would be worse than mapping the pages,
> tests demonstrate the copying is always faster and, in the case of
> running inside a KVM guest, roughly 4x faster.
>
> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> ---
>  drivers/staging/zsmalloc/Kconfig         |    4 --
>  drivers/staging/zsmalloc/zsmalloc-main.c |   99 +++++++++++++++++++++---------
>  drivers/staging/zsmalloc/zsmalloc_int.h  |    5 +-
>  3 files changed, 72 insertions(+), 36 deletions(-)
> 


>  struct mapping_area {
> -	struct vm_struct *vm;
> -	pte_t *vm_ptes[2];
> -	char *vm_addr;
> +	char *vm_buf; /* copy buffer for objects that span pages */
> +	char *vm_addr; /* address of kmap_atomic()'ed pages */
>  };
>  

I think we can reduce the copying overhead by not copying an entire
compressed object to another (per-cpu) buffer. The basic idea of the
method below is to:
 - Copy only the amount of data that spills over into the next page
 - No need for a separate buffer to copy into

Currently, we store objects that split across pages as:

+-Page1-+
|	|
|	|
|-------| <-- obj-1 off: 0
|<ob1'>	|
+-------+ <-- obj-1 off: s'

+-Page2-+ <-- obj-1 off: s'
|<ob1''>|
|-------| <-- obj-1 off: obj1_size, obj-2 off: 0
|<ob2>	|
|-------| <-- obj-2 off: obj2_size
+-------+

But now we would store it as:

+-Page1-+
|	|
|-------| <-- obj-1 off: s''
|	|
|<ob1'>	|
+-------+ <-- obj-1 off: obj1_size

+-Page2-+ <-- obj-1 off: 0
|<ob1''>|
|-------| <-- obj-1 off: s'', obj-2 off: 0
|<ob2>	|
|-------| <-- obj-2 off: obj2_size
+-------+

When object-1 (ob1) is to be mapped, part (size: s'-0) of object-2 will
be swapped with ob1'. This swapping can be done in-place using simple
xor swap algorithm. So, after swap, page-1 and page-2 will look like:

+-Page1-+
|	|
|-------| <-- obj-2 off: 0
|	|
|<ob2''>|
+-------+ <-- obj-2 off: (obj1_size - s'')

+-Page2-+ <-- obj-1 off: 0
|	|
|<ob1>	|
|-------| <-- obj-1 off: obj1_size, obj-2 off: (obj1_size - s'')
|<ob2'>	|
+-------+ <-- obj-2 off: obj2_size

Now obj-1 lies completely within page-2, so can be kmap'ed as usual. On
zs_unmap_object() we would just do the reverse and restore objects as in
figure-1.

We can reduce the overhead even further by removing the need to
"restore" the object. For this, we would have to remove the assumption
in zsmalloc that the spilled part begins at offset 0 in the next page.
Not sure how feasible this would be.

Note that doing this transform is always possible since we ensure that
an object can never exceed system page size.

Thanks,
Nitin



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
@ 2012-07-11 18:26     ` Nitin Gupta
  0 siblings, 0 replies; 62+ messages in thread
From: Nitin Gupta @ 2012-07-11 18:26 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Minchan Kim, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/02/2012 02:15 PM, Seth Jennings wrote:
> This patch replaces the page table assisted object mapping
> method, which has x86 dependencies, with a arch-independent
> method that does a simple copy into a temporary per-cpu
> buffer.
> 
> While a copy seems like it would be worse than mapping the pages,
> tests demonstrate the copying is always faster and, in the case of
> running inside a KVM guest, roughly 4x faster.
>
> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> ---
>  drivers/staging/zsmalloc/Kconfig         |    4 --
>  drivers/staging/zsmalloc/zsmalloc-main.c |   99 +++++++++++++++++++++---------
>  drivers/staging/zsmalloc/zsmalloc_int.h  |    5 +-
>  3 files changed, 72 insertions(+), 36 deletions(-)
> 


>  struct mapping_area {
> -	struct vm_struct *vm;
> -	pte_t *vm_ptes[2];
> -	char *vm_addr;
> +	char *vm_buf; /* copy buffer for objects that span pages */
> +	char *vm_addr; /* address of kmap_atomic()'ed pages */
>  };
>  

I think we can reduce the copying overhead by not copying an entire
compressed object to another (per-cpu) buffer. The basic idea of the
method below is to:
 - Copy only the amount of data that spills over into the next page
 - No need for a separate buffer to copy into

Currently, we store objects that split across pages as:

+-Page1-+
|	|
|	|
|-------| <-- obj-1 off: 0
|<ob1'>	|
+-------+ <-- obj-1 off: s'

+-Page2-+ <-- obj-1 off: s'
|<ob1''>|
|-------| <-- obj-1 off: obj1_size, obj-2 off: 0
|<ob2>	|
|-------| <-- obj-2 off: obj2_size
+-------+

But now we would store it as:

+-Page1-+
|	|
|-------| <-- obj-1 off: s''
|	|
|<ob1'>	|
+-------+ <-- obj-1 off: obj1_size

+-Page2-+ <-- obj-1 off: 0
|<ob1''>|
|-------| <-- obj-1 off: s'', obj-2 off: 0
|<ob2>	|
|-------| <-- obj-2 off: obj2_size
+-------+

When object-1 (ob1) is to be mapped, part (size: s'-0) of object-2 will
be swapped with ob1'. This swapping can be done in-place using simple
xor swap algorithm. So, after swap, page-1 and page-2 will look like:

+-Page1-+
|	|
|-------| <-- obj-2 off: 0
|	|
|<ob2''>|
+-------+ <-- obj-2 off: (obj1_size - s'')

+-Page2-+ <-- obj-1 off: 0
|	|
|<ob1>	|
|-------| <-- obj-1 off: obj1_size, obj-2 off: (obj1_size - s'')
|<ob2'>	|
+-------+ <-- obj-2 off: obj2_size

Now obj-1 lies completely within page-2, so can be kmap'ed as usual. On
zs_unmap_object() we would just do the reverse and restore objects as in
figure-1.

We can reduce the overhead even further by removing the need to
"restore" the object. For this, we would have to remove the assumption
in zsmalloc that the spilled part begins at offset 0 in the next page.
Not sure how feasible this would be.

Note that doing this transform is always possible since we ensure that
an object can never exceed system page size.

Thanks,
Nitin


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
  2012-07-11  7:03   ` Minchan Kim
@ 2012-07-11 19:16     ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-11 19:16 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/11/2012 02:03 AM, Minchan Kim wrote:
> Today, I tested zsmapbench in my embedded board(ARM).
> tlb-flush is 30% faster than copy-based so it's always not win.
> I think it depends on CPU speed/cache size.

After you pointed this out, I decided to test this on my
Raspberry Pi, the only ARM system I have that is open enough
for me to work with.

I pulled some of the cycle counting stuff out of
arch/arm/kernel/perf_event_v6.c.  I've pushed that code to
the github repo.

git://github.com/spartacus06/zsmapbench.git

My results were in agreement with your findings.  I got 2040
cycles/map for the copy method and 947 cycles/map for the
page-table method.  I think memory speed is playing a big
roll in the difference.

I agree that the page-table method should be restored since
the performance difference is so significant on ARM, a
platform that benefits a lot from memory compression IMHO.

Still, the question remains how to implement the selection
logic, since not all archs that support the page-table
method will necessarily perform better with it.

Thanks,
Seth


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
@ 2012-07-11 19:16     ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-11 19:16 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/11/2012 02:03 AM, Minchan Kim wrote:
> Today, I tested zsmapbench in my embedded board(ARM).
> tlb-flush is 30% faster than copy-based so it's always not win.
> I think it depends on CPU speed/cache size.

After you pointed this out, I decided to test this on my
Raspberry Pi, the only ARM system I have that is open enough
for me to work with.

I pulled some of the cycle counting stuff out of
arch/arm/kernel/perf_event_v6.c.  I've pushed that code to
the github repo.

git://github.com/spartacus06/zsmapbench.git

My results were in agreement with your findings.  I got 2040
cycles/map for the copy method and 947 cycles/map for the
page-table method.  I think memory speed is playing a big
roll in the difference.

I agree that the page-table method should be restored since
the performance difference is so significant on ARM, a
platform that benefits a lot from memory compression IMHO.

Still, the question remains how to implement the selection
logic, since not all archs that support the page-table
method will necessarily perform better with it.

Thanks,
Seth

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
  2012-07-09 13:58       ` Seth Jennings
@ 2012-07-11 19:42         ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 62+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-11 19:42 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

>>> Which architecture was this under? It sounds x86-ish? Is this on
>>> Westmere and more modern machines? What about Core2 architecture?
>>>
>>> Oh how did it work on AMD Phenom boxes?
>>
>> I don't have a Phenom box but I have an Athlon X2 I can try out.
>> I'll get this information next Monday.
>
> Actually, I'm running some production stuff on that box, so
> I rather not put testing stuff on it.  Is there any
> particular reason that you wanted this information? Do you
> have a reason to believe that mapping will be faster than
> copy for AMD procs?

Sorry for the late response. Working on some ugly bug that is taking
more time than anticipated.
My thoughts were that these findings are based on the hardware memory
prefetcher. The Intel
machines - especially starting with Nehelem have some pretty
impressive prefetcher where
even doing in a linked list 'prefetch' on the next node is not beneficial.

Perhaps the way to leverage this is to use different modes depending
on the bulk of data?
When there is a huge amount use the old method, but for small use copy
(as it would
in theory stay in the cache longer).

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
@ 2012-07-11 19:42         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 62+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-11 19:42 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

>>> Which architecture was this under? It sounds x86-ish? Is this on
>>> Westmere and more modern machines? What about Core2 architecture?
>>>
>>> Oh how did it work on AMD Phenom boxes?
>>
>> I don't have a Phenom box but I have an Athlon X2 I can try out.
>> I'll get this information next Monday.
>
> Actually, I'm running some production stuff on that box, so
> I rather not put testing stuff on it.  Is there any
> particular reason that you wanted this information? Do you
> have a reason to believe that mapping will be faster than
> copy for AMD procs?

Sorry for the late response. Working on some ugly bug that is taking
more time than anticipated.
My thoughts were that these findings are based on the hardware memory
prefetcher. The Intel
machines - especially starting with Nehelem have some pretty
impressive prefetcher where
even doing in a linked list 'prefetch' on the next node is not beneficial.

Perhaps the way to leverage this is to use different modes depending
on the bulk of data?
When there is a huge amount use the old method, but for small use copy
(as it would
in theory stay in the cache longer).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
  2012-07-11 18:26     ` Nitin Gupta
@ 2012-07-11 20:32       ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-11 20:32 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Minchan Kim, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/11/2012 01:26 PM, Nitin Gupta wrote:
> On 07/02/2012 02:15 PM, Seth Jennings wrote:
>> This patch replaces the page table assisted object mapping
>> method, which has x86 dependencies, with a arch-independent
>> method that does a simple copy into a temporary per-cpu
>> buffer.
>>
>> While a copy seems like it would be worse than mapping the pages,
>> tests demonstrate the copying is always faster and, in the case of
>> running inside a KVM guest, roughly 4x faster.
>>
>> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
>> ---
>>  drivers/staging/zsmalloc/Kconfig         |    4 --
>>  drivers/staging/zsmalloc/zsmalloc-main.c |   99 +++++++++++++++++++++---------
>>  drivers/staging/zsmalloc/zsmalloc_int.h  |    5 +-
>>  3 files changed, 72 insertions(+), 36 deletions(-)
>>
> 
> 
>>  struct mapping_area {
>> -	struct vm_struct *vm;
>> -	pte_t *vm_ptes[2];
>> -	char *vm_addr;
>> +	char *vm_buf; /* copy buffer for objects that span pages */
>> +	char *vm_addr; /* address of kmap_atomic()'ed pages */
>>  };
>>  
> 
> I think we can reduce the copying overhead by not copying an entire
> compressed object to another (per-cpu) buffer. The basic idea of the
> method below is to:
>  - Copy only the amount of data that spills over into the next page
>  - No need for a separate buffer to copy into
> 
> Currently, we store objects that split across pages as:
> 
> +-Page1-+
> |	|
> |	|
> |-------| <-- obj-1 off: 0
> |<ob1'>	|
> +-------+ <-- obj-1 off: s'
> 
> +-Page2-+ <-- obj-1 off: s'
> |<ob1''>|
> |-------| <-- obj-1 off: obj1_size, obj-2 off: 0
> |<ob2>	|
> |-------| <-- obj-2 off: obj2_size
> +-------+
> 
> But now we would store it as:
> 
> +-Page1-+
> |	|
> |-------| <-- obj-1 off: s''
> |	|
> |<ob1'>	|
> +-------+ <-- obj-1 off: obj1_size
> 
> +-Page2-+ <-- obj-1 off: 0
> |<ob1''>|
> |-------| <-- obj-1 off: s'', obj-2 off: 0
> |<ob2>	|
> |-------| <-- obj-2 off: obj2_size
> +-------+
> 
> When object-1 (ob1) is to be mapped, part (size: s'-0) of object-2 will
> be swapped with ob1'. This swapping can be done in-place using simple
> xor swap algorithm. So, after swap, page-1 and page-2 will look like:
> 
> +-Page1-+
> |	|
> |-------| <-- obj-2 off: 0
> |	|
> |<ob2''>|
> +-------+ <-- obj-2 off: (obj1_size - s'')
> 
> +-Page2-+ <-- obj-1 off: 0
> |	|
> |<ob1>	|
> |-------| <-- obj-1 off: obj1_size, obj-2 off: (obj1_size - s'')
> |<ob2'>	|
> +-------+ <-- obj-2 off: obj2_size
> 
> Now obj-1 lies completely within page-2, so can be kmap'ed as usual. On
> zs_unmap_object() we would just do the reverse and restore objects as in
> figure-1.

Hey Nitin, thanks for the feedback.

Correct me if I'm wrong, but it seems like you wouldn't be able to map
ob2 while ob1 was mapped with this design.  You'd need some sort of
zspage level protection against concurrent object mappings.  The
code for that protection might cancel any benefit you would gain by
doing it this way.

Thanks,
Seth


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
@ 2012-07-11 20:32       ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-11 20:32 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Minchan Kim, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/11/2012 01:26 PM, Nitin Gupta wrote:
> On 07/02/2012 02:15 PM, Seth Jennings wrote:
>> This patch replaces the page table assisted object mapping
>> method, which has x86 dependencies, with a arch-independent
>> method that does a simple copy into a temporary per-cpu
>> buffer.
>>
>> While a copy seems like it would be worse than mapping the pages,
>> tests demonstrate the copying is always faster and, in the case of
>> running inside a KVM guest, roughly 4x faster.
>>
>> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
>> ---
>>  drivers/staging/zsmalloc/Kconfig         |    4 --
>>  drivers/staging/zsmalloc/zsmalloc-main.c |   99 +++++++++++++++++++++---------
>>  drivers/staging/zsmalloc/zsmalloc_int.h  |    5 +-
>>  3 files changed, 72 insertions(+), 36 deletions(-)
>>
> 
> 
>>  struct mapping_area {
>> -	struct vm_struct *vm;
>> -	pte_t *vm_ptes[2];
>> -	char *vm_addr;
>> +	char *vm_buf; /* copy buffer for objects that span pages */
>> +	char *vm_addr; /* address of kmap_atomic()'ed pages */
>>  };
>>  
> 
> I think we can reduce the copying overhead by not copying an entire
> compressed object to another (per-cpu) buffer. The basic idea of the
> method below is to:
>  - Copy only the amount of data that spills over into the next page
>  - No need for a separate buffer to copy into
> 
> Currently, we store objects that split across pages as:
> 
> +-Page1-+
> |	|
> |	|
> |-------| <-- obj-1 off: 0
> |<ob1'>	|
> +-------+ <-- obj-1 off: s'
> 
> +-Page2-+ <-- obj-1 off: s'
> |<ob1''>|
> |-------| <-- obj-1 off: obj1_size, obj-2 off: 0
> |<ob2>	|
> |-------| <-- obj-2 off: obj2_size
> +-------+
> 
> But now we would store it as:
> 
> +-Page1-+
> |	|
> |-------| <-- obj-1 off: s''
> |	|
> |<ob1'>	|
> +-------+ <-- obj-1 off: obj1_size
> 
> +-Page2-+ <-- obj-1 off: 0
> |<ob1''>|
> |-------| <-- obj-1 off: s'', obj-2 off: 0
> |<ob2>	|
> |-------| <-- obj-2 off: obj2_size
> +-------+
> 
> When object-1 (ob1) is to be mapped, part (size: s'-0) of object-2 will
> be swapped with ob1'. This swapping can be done in-place using simple
> xor swap algorithm. So, after swap, page-1 and page-2 will look like:
> 
> +-Page1-+
> |	|
> |-------| <-- obj-2 off: 0
> |	|
> |<ob2''>|
> +-------+ <-- obj-2 off: (obj1_size - s'')
> 
> +-Page2-+ <-- obj-1 off: 0
> |	|
> |<ob1>	|
> |-------| <-- obj-1 off: obj1_size, obj-2 off: (obj1_size - s'')
> |<ob2'>	|
> +-------+ <-- obj-2 off: obj2_size
> 
> Now obj-1 lies completely within page-2, so can be kmap'ed as usual. On
> zs_unmap_object() we would just do the reverse and restore objects as in
> figure-1.

Hey Nitin, thanks for the feedback.

Correct me if I'm wrong, but it seems like you wouldn't be able to map
ob2 while ob1 was mapped with this design.  You'd need some sort of
zspage level protection against concurrent object mappings.  The
code for that protection might cancel any benefit you would gain by
doing it this way.

Thanks,
Seth

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
  2012-07-11 19:42         ` Konrad Rzeszutek Wilk
@ 2012-07-11 20:48           ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-11 20:48 UTC (permalink / raw)
  To: konrad
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

On 07/11/2012 02:42 PM, Konrad Rzeszutek Wilk wrote:
>>>> Which architecture was this under? It sounds x86-ish? Is this on
>>>> Westmere and more modern machines? What about Core2 architecture?
>>>>
>>>> Oh how did it work on AMD Phenom boxes?
>>>
>>> I don't have a Phenom box but I have an Athlon X2 I can try out.
>>> I'll get this information next Monday.
>>
>> Actually, I'm running some production stuff on that box, so
>> I rather not put testing stuff on it.  Is there any
>> particular reason that you wanted this information? Do you
>> have a reason to believe that mapping will be faster than
>> copy for AMD procs?
> 
> Sorry for the late response. Working on some ugly bug that is taking
> more time than anticipated.
> My thoughts were that these findings are based on the hardware memory
> prefetcher. The Intel
> machines - especially starting with Nehelem have some pretty
> impressive prefetcher where
> even doing in a linked list 'prefetch' on the next node is not beneficial.
> 
> Perhaps the way to leverage this is to use different modes depending
> on the bulk of data?
> When there is a huge amount use the old method, but for small use copy
> (as it would
> in theory stay in the cache longer).

Not sure what you mean by "bulk" or "huge amount" but the
maximum size of mapped object is PAGE_SIZE and the typical
size more around PAGE_SIZE/2. So that is what I'm
considering.  Do you think it makes a difference with copies
that small?

Thanks,
Seth


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
@ 2012-07-11 20:48           ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-11 20:48 UTC (permalink / raw)
  To: konrad
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

On 07/11/2012 02:42 PM, Konrad Rzeszutek Wilk wrote:
>>>> Which architecture was this under? It sounds x86-ish? Is this on
>>>> Westmere and more modern machines? What about Core2 architecture?
>>>>
>>>> Oh how did it work on AMD Phenom boxes?
>>>
>>> I don't have a Phenom box but I have an Athlon X2 I can try out.
>>> I'll get this information next Monday.
>>
>> Actually, I'm running some production stuff on that box, so
>> I rather not put testing stuff on it.  Is there any
>> particular reason that you wanted this information? Do you
>> have a reason to believe that mapping will be faster than
>> copy for AMD procs?
> 
> Sorry for the late response. Working on some ugly bug that is taking
> more time than anticipated.
> My thoughts were that these findings are based on the hardware memory
> prefetcher. The Intel
> machines - especially starting with Nehelem have some pretty
> impressive prefetcher where
> even doing in a linked list 'prefetch' on the next node is not beneficial.
> 
> Perhaps the way to leverage this is to use different modes depending
> on the bulk of data?
> When there is a huge amount use the old method, but for small use copy
> (as it would
> in theory stay in the cache longer).

Not sure what you mean by "bulk" or "huge amount" but the
maximum size of mapped object is PAGE_SIZE and the typical
size more around PAGE_SIZE/2. So that is what I'm
considering.  Do you think it makes a difference with copies
that small?

Thanks,
Seth

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
  2012-07-11 20:32       ` Seth Jennings
@ 2012-07-11 22:42         ` Nitin Gupta
  -1 siblings, 0 replies; 62+ messages in thread
From: Nitin Gupta @ 2012-07-11 22:42 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Minchan Kim, Robert Jennings, linux-mm,
	devel, linux-kernel

On Wed, Jul 11, 2012 at 1:32 PM, Seth Jennings
<sjenning@linux.vnet.ibm.com> wrote:
> On 07/11/2012 01:26 PM, Nitin Gupta wrote:
>> On 07/02/2012 02:15 PM, Seth Jennings wrote:
>>> This patch replaces the page table assisted object mapping
>>> method, which has x86 dependencies, with a arch-independent
>>> method that does a simple copy into a temporary per-cpu
>>> buffer.
>>>
>>> While a copy seems like it would be worse than mapping the pages,
>>> tests demonstrate the copying is always faster and, in the case of
>>> running inside a KVM guest, roughly 4x faster.
>>>
>>> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
>>> ---
>>>  drivers/staging/zsmalloc/Kconfig         |    4 --
>>>  drivers/staging/zsmalloc/zsmalloc-main.c |   99 +++++++++++++++++++++---------
>>>  drivers/staging/zsmalloc/zsmalloc_int.h  |    5 +-
>>>  3 files changed, 72 insertions(+), 36 deletions(-)
>>>
>>
>>
>>>  struct mapping_area {
>>> -    struct vm_struct *vm;
>>> -    pte_t *vm_ptes[2];
>>> -    char *vm_addr;
>>> +    char *vm_buf; /* copy buffer for objects that span pages */
>>> +    char *vm_addr; /* address of kmap_atomic()'ed pages */
>>>  };
>>>
>>
>> I think we can reduce the copying overhead by not copying an entire
>> compressed object to another (per-cpu) buffer. The basic idea of the
>> method below is to:
>>  - Copy only the amount of data that spills over into the next page
>>  - No need for a separate buffer to copy into
>>
>> Currently, we store objects that split across pages as:
>>
>> +-Page1-+
>> |     |
>> |     |
>> |-------| <-- obj-1 off: 0
>> |<ob1'>       |
>> +-------+ <-- obj-1 off: s'
>>
>> +-Page2-+ <-- obj-1 off: s'
>> |<ob1''>|
>> |-------| <-- obj-1 off: obj1_size, obj-2 off: 0
>> |<ob2>        |
>> |-------| <-- obj-2 off: obj2_size
>> +-------+
>>
>> But now we would store it as:
>>
>> +-Page1-+
>> |     |
>> |-------| <-- obj-1 off: s''
>> |     |
>> |<ob1'>       |
>> +-------+ <-- obj-1 off: obj1_size
>>
>> +-Page2-+ <-- obj-1 off: 0
>> |<ob1''>|
>> |-------| <-- obj-1 off: s'', obj-2 off: 0
>> |<ob2>        |
>> |-------| <-- obj-2 off: obj2_size
>> +-------+
>>
>> When object-1 (ob1) is to be mapped, part (size: s'-0) of object-2 will
>> be swapped with ob1'. This swapping can be done in-place using simple
>> xor swap algorithm. So, after swap, page-1 and page-2 will look like:
>>
>> +-Page1-+
>> |     |
>> |-------| <-- obj-2 off: 0
>> |     |
>> |<ob2''>|
>> +-------+ <-- obj-2 off: (obj1_size - s'')
>>
>> +-Page2-+ <-- obj-1 off: 0
>> |     |
>> |<ob1>        |
>> |-------| <-- obj-1 off: obj1_size, obj-2 off: (obj1_size - s'')
>> |<ob2'>       |
>> +-------+ <-- obj-2 off: obj2_size
>>
>> Now obj-1 lies completely within page-2, so can be kmap'ed as usual. On
>> zs_unmap_object() we would just do the reverse and restore objects as in
>> figure-1.
>
> Hey Nitin, thanks for the feedback.
>
> Correct me if I'm wrong, but it seems like you wouldn't be able to map
> ob2 while ob1 was mapped with this design.  You'd need some sort of
> zspage level protection against concurrent object mappings.  The
> code for that protection might cancel any benefit you would gain by
> doing it this way.
>

Do you think blocking access of just one particular object (or
blocking an entire zspage, for simplicity) for a short time would be
an issue, apart from the complexity of implementing per zspage
locking?

Thanks,
Nitin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
@ 2012-07-11 22:42         ` Nitin Gupta
  0 siblings, 0 replies; 62+ messages in thread
From: Nitin Gupta @ 2012-07-11 22:42 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Minchan Kim, Robert Jennings, linux-mm,
	devel, linux-kernel

On Wed, Jul 11, 2012 at 1:32 PM, Seth Jennings
<sjenning@linux.vnet.ibm.com> wrote:
> On 07/11/2012 01:26 PM, Nitin Gupta wrote:
>> On 07/02/2012 02:15 PM, Seth Jennings wrote:
>>> This patch replaces the page table assisted object mapping
>>> method, which has x86 dependencies, with a arch-independent
>>> method that does a simple copy into a temporary per-cpu
>>> buffer.
>>>
>>> While a copy seems like it would be worse than mapping the pages,
>>> tests demonstrate the copying is always faster and, in the case of
>>> running inside a KVM guest, roughly 4x faster.
>>>
>>> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
>>> ---
>>>  drivers/staging/zsmalloc/Kconfig         |    4 --
>>>  drivers/staging/zsmalloc/zsmalloc-main.c |   99 +++++++++++++++++++++---------
>>>  drivers/staging/zsmalloc/zsmalloc_int.h  |    5 +-
>>>  3 files changed, 72 insertions(+), 36 deletions(-)
>>>
>>
>>
>>>  struct mapping_area {
>>> -    struct vm_struct *vm;
>>> -    pte_t *vm_ptes[2];
>>> -    char *vm_addr;
>>> +    char *vm_buf; /* copy buffer for objects that span pages */
>>> +    char *vm_addr; /* address of kmap_atomic()'ed pages */
>>>  };
>>>
>>
>> I think we can reduce the copying overhead by not copying an entire
>> compressed object to another (per-cpu) buffer. The basic idea of the
>> method below is to:
>>  - Copy only the amount of data that spills over into the next page
>>  - No need for a separate buffer to copy into
>>
>> Currently, we store objects that split across pages as:
>>
>> +-Page1-+
>> |     |
>> |     |
>> |-------| <-- obj-1 off: 0
>> |<ob1'>       |
>> +-------+ <-- obj-1 off: s'
>>
>> +-Page2-+ <-- obj-1 off: s'
>> |<ob1''>|
>> |-------| <-- obj-1 off: obj1_size, obj-2 off: 0
>> |<ob2>        |
>> |-------| <-- obj-2 off: obj2_size
>> +-------+
>>
>> But now we would store it as:
>>
>> +-Page1-+
>> |     |
>> |-------| <-- obj-1 off: s''
>> |     |
>> |<ob1'>       |
>> +-------+ <-- obj-1 off: obj1_size
>>
>> +-Page2-+ <-- obj-1 off: 0
>> |<ob1''>|
>> |-------| <-- obj-1 off: s'', obj-2 off: 0
>> |<ob2>        |
>> |-------| <-- obj-2 off: obj2_size
>> +-------+
>>
>> When object-1 (ob1) is to be mapped, part (size: s'-0) of object-2 will
>> be swapped with ob1'. This swapping can be done in-place using simple
>> xor swap algorithm. So, after swap, page-1 and page-2 will look like:
>>
>> +-Page1-+
>> |     |
>> |-------| <-- obj-2 off: 0
>> |     |
>> |<ob2''>|
>> +-------+ <-- obj-2 off: (obj1_size - s'')
>>
>> +-Page2-+ <-- obj-1 off: 0
>> |     |
>> |<ob1>        |
>> |-------| <-- obj-1 off: obj1_size, obj-2 off: (obj1_size - s'')
>> |<ob2'>       |
>> +-------+ <-- obj-2 off: obj2_size
>>
>> Now obj-1 lies completely within page-2, so can be kmap'ed as usual. On
>> zs_unmap_object() we would just do the reverse and restore objects as in
>> figure-1.
>
> Hey Nitin, thanks for the feedback.
>
> Correct me if I'm wrong, but it seems like you wouldn't be able to map
> ob2 while ob1 was mapped with this design.  You'd need some sort of
> zspage level protection against concurrent object mappings.  The
> code for that protection might cancel any benefit you would gain by
> doing it this way.
>

Do you think blocking access of just one particular object (or
blocking an entire zspage, for simplicity) for a short time would be
an issue, apart from the complexity of implementing per zspage
locking?

Thanks,
Nitin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
  2012-07-11 22:42         ` Nitin Gupta
@ 2012-07-12  0:23           ` Seth Jennings
  -1 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-12  0:23 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Minchan Kim, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/11/2012 05:42 PM, Nitin Gupta wrote:
> On Wed, Jul 11, 2012 at 1:32 PM, Seth Jennings
> <sjenning@linux.vnet.ibm.com> wrote:
>> On 07/11/2012 01:26 PM, Nitin Gupta wrote:
<snip>
>>> Now obj-1 lies completely within page-2, so can be kmap'ed as usual. On
>>> zs_unmap_object() we would just do the reverse and restore objects as in
>>> figure-1.
>>
>> Hey Nitin, thanks for the feedback.
>>
>> Correct me if I'm wrong, but it seems like you wouldn't be able to map
>> ob2 while ob1 was mapped with this design.  You'd need some sort of
>> zspage level protection against concurrent object mappings.  The
>> code for that protection might cancel any benefit you would gain by
>> doing it this way.
>>
> 
> Do you think blocking access of just one particular object (or
> blocking an entire zspage, for simplicity) for a short time would be
> an issue, apart from the complexity of implementing per zspage
> locking?

It would only need to prevent the mapping of the temporarily displaced
object, but I said zspage because I don't know how we would do
per-object locking.  I actually don't know how we would do zspage
locking either unless there is a lock in the struct page we can use.

Either way, I think it is a complexity I think we'd be better to avoid
for now.  I'm trying to get zsmalloc in shape to bring into mainline, so
I'm really focusing on portability first and low hanging performance
fruit second. This optimization would be more like top-of-the-tree
performance fruit :-/

However, if you want to try it out, don't let me stop you :)

Thanks,
Seth


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/4] zsmalloc: remove x86 dependency
@ 2012-07-12  0:23           ` Seth Jennings
  0 siblings, 0 replies; 62+ messages in thread
From: Seth Jennings @ 2012-07-12  0:23 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Minchan Kim, Robert Jennings, linux-mm,
	devel, linux-kernel

On 07/11/2012 05:42 PM, Nitin Gupta wrote:
> On Wed, Jul 11, 2012 at 1:32 PM, Seth Jennings
> <sjenning@linux.vnet.ibm.com> wrote:
>> On 07/11/2012 01:26 PM, Nitin Gupta wrote:
<snip>
>>> Now obj-1 lies completely within page-2, so can be kmap'ed as usual. On
>>> zs_unmap_object() we would just do the reverse and restore objects as in
>>> figure-1.
>>
>> Hey Nitin, thanks for the feedback.
>>
>> Correct me if I'm wrong, but it seems like you wouldn't be able to map
>> ob2 while ob1 was mapped with this design.  You'd need some sort of
>> zspage level protection against concurrent object mappings.  The
>> code for that protection might cancel any benefit you would gain by
>> doing it this way.
>>
> 
> Do you think blocking access of just one particular object (or
> blocking an entire zspage, for simplicity) for a short time would be
> an issue, apart from the complexity of implementing per zspage
> locking?

It would only need to prevent the mapping of the temporarily displaced
object, but I said zspage because I don't know how we would do
per-object locking.  I actually don't know how we would do zspage
locking either unless there is a lock in the struct page we can use.

Either way, I think it is a complexity I think we'd be better to avoid
for now.  I'm trying to get zsmalloc in shape to bring into mainline, so
I'm really focusing on portability first and low hanging performance
fruit second. This optimization would be more like top-of-the-tree
performance fruit :-/

However, if you want to try it out, don't let me stop you :)

Thanks,
Seth

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
  2012-07-11 14:00     ` Seth Jennings
@ 2012-07-12  1:01       ` Minchan Kim
  -1 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-12  1:01 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On Wed, Jul 11, 2012 at 09:00:30AM -0500, Seth Jennings wrote:
> On 07/11/2012 02:03 AM, Minchan Kim wrote:
> > On 07/03/2012 06:15 AM, Seth Jennings wrote:
> >> zsmapbench measures the copy-based mapping at ~560 cycles for a
> >> map/unmap operation on spanned object for both KVM guest and bare-metal,
> >> while the page table mapping was ~1500 cycles on a VM and ~760 cycles
> >> bare-metal.  The cycles for the copy method will vary with
> >> allocation size, however, it is still faster even for the largest
> >> allocation that zsmalloc supports.
> >>
> >> The result is convenient though, as mempcy is very portable :)
> > 
> > Today, I tested zsmapbench in my embedded board(ARM).
> > tlb-flush is 30% faster than copy-based so it's always not win.
> > I think it depends on CPU speed/cache size.
> > 
> > zram is already very popular on embedded systems so I want to use
> > it continuously without 30% big demage so I want to keep our old approach
> > which supporting local tlb flush. 
> > 
> > Of course, in case of KVM guest, copy-based would be always bin win.
> > So shouldn't we support both approach? It could make code very ugly
> > but I think it has enough value.
> > 
> > Any thought?
> 
> Thanks for testing on ARM.
> 
> I can add the pgtable assisted method back in, no problem.
> The question is by which criteria are we going to choose
> which method to use? By arch (i.e. ARM -> pgtable assist,
> x86 -> copy, other archs -> ?)?

I prefer your previous version __HAVE_LOCAL_FLUSH_TLB_KERNEL_RANGE.
If you didn't implement that function for x86, it simply uses memcpy
version while ARM can use tlb flush version if we add the definary.

Of course, it would be better to select best choice by testing
benchmark for all of architecture but that architecture would be
changed in future, too so we need further testing periodically.
And we will have no time then, too.
For reducing the burden, we can detect it automatically while module
is loading or booting but it tackles with booting time. :(
So, let's put it aside as further works.
At the moment, let's think simply two arch(x86, ARM) until other arch
user doesn't raise a hand for volunteering.

Yes. it could be a problem in future if other arch which support
local flush want to use memcpy but IMHO, it's very hard to kill
two bird(portability and performance) with one stone. :(

> 
> Also, what changes did you make to zsmapbench to measure
> elapsed time/cycles on ARM?  Afaik, rdtscll() is not
> supported on ARM.

I used local_clock instead of arch dependent code and makes longer test time
from 1 sec to 10 sec.

> 
> Thanks,
> Seth
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
@ 2012-07-12  1:01       ` Minchan Kim
  0 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-12  1:01 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On Wed, Jul 11, 2012 at 09:00:30AM -0500, Seth Jennings wrote:
> On 07/11/2012 02:03 AM, Minchan Kim wrote:
> > On 07/03/2012 06:15 AM, Seth Jennings wrote:
> >> zsmapbench measures the copy-based mapping at ~560 cycles for a
> >> map/unmap operation on spanned object for both KVM guest and bare-metal,
> >> while the page table mapping was ~1500 cycles on a VM and ~760 cycles
> >> bare-metal.  The cycles for the copy method will vary with
> >> allocation size, however, it is still faster even for the largest
> >> allocation that zsmalloc supports.
> >>
> >> The result is convenient though, as mempcy is very portable :)
> > 
> > Today, I tested zsmapbench in my embedded board(ARM).
> > tlb-flush is 30% faster than copy-based so it's always not win.
> > I think it depends on CPU speed/cache size.
> > 
> > zram is already very popular on embedded systems so I want to use
> > it continuously without 30% big demage so I want to keep our old approach
> > which supporting local tlb flush. 
> > 
> > Of course, in case of KVM guest, copy-based would be always bin win.
> > So shouldn't we support both approach? It could make code very ugly
> > but I think it has enough value.
> > 
> > Any thought?
> 
> Thanks for testing on ARM.
> 
> I can add the pgtable assisted method back in, no problem.
> The question is by which criteria are we going to choose
> which method to use? By arch (i.e. ARM -> pgtable assist,
> x86 -> copy, other archs -> ?)?

I prefer your previous version __HAVE_LOCAL_FLUSH_TLB_KERNEL_RANGE.
If you didn't implement that function for x86, it simply uses memcpy
version while ARM can use tlb flush version if we add the definary.

Of course, it would be better to select best choice by testing
benchmark for all of architecture but that architecture would be
changed in future, too so we need further testing periodically.
And we will have no time then, too.
For reducing the burden, we can detect it automatically while module
is loading or booting but it tackles with booting time. :(
So, let's put it aside as further works.
At the moment, let's think simply two arch(x86, ARM) until other arch
user doesn't raise a hand for volunteering.

Yes. it could be a problem in future if other arch which support
local flush want to use memcpy but IMHO, it's very hard to kill
two bird(portability and performance) with one stone. :(

> 
> Also, what changes did you make to zsmapbench to measure
> elapsed time/cycles on ARM?  Afaik, rdtscll() is not
> supported on ARM.

I used local_clock instead of arch dependent code and makes longer test time
from 1 sec to 10 sec.

> 
> Thanks,
> Seth
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
  2012-07-11 14:15           ` Seth Jennings
@ 2012-07-12  1:15             ` Minchan Kim
  -1 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-12  1:15 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On Wed, Jul 11, 2012 at 09:15:43AM -0500, Seth Jennings wrote:
> On 07/11/2012 02:42 AM, Minchan Kim wrote:
> > On 07/11/2012 12:17 AM, Seth Jennings wrote:
> >> On 07/09/2012 09:35 PM, Minchan Kim wrote:
> >>> Maybe we need local_irq_save/restore in zs_[un]map_object path.
> >>
> >> I'd rather not disable interrupts since that will create
> >> unnecessary interrupt latency for all users, even if they
> > 
> > Agreed.
> > Although we guide k[un]map atomic is so fast, it isn't necessary
> > to force irq_[enable|disable]. Okay.
> > 
> >> don't need interrupt protection.  If a particular user uses
> >> zs_map_object() in an interrupt path, it will be up to that
> >> user to disable interrupts to ensure safety.
> > 
> > Nope. It shouldn't do that.
> > Any user in interrupt context can't assume that there isn't any other user using per-cpu buffer
> > right before interrupt happens.
> > 
> > The concern is that if such bug happens, it's very hard to find a bug.
> > So, how about adding this?
> > 
> > void zs_map_object(...)
> > {
> > 	BUG_ON(in_interrupt());
> > }
> 
> I not completely following you, but I think I'm following
> enough.  Your point is that the per-cpu buffers are shared
> by all zsmalloc users and one user doesn't know if another
> user is doing a zs_map_object() in an interrupt path.

And vise versa is yes.

> 
> However, I think what you are suggesting is to disallow
> mapping in interrupt context.  This is a problem for zcache
> as it already does mapping in interrupt context, namely for
> page decompression in the page fault handler.

I don't get it.
Page fault handler isn't interrupt context.

> 
> What do you think about making the per-cpu buffers local to
> each zsmalloc pool? That way each user has their own per-cpu
> buffers and don't step on each other's toes.

Maybe, It could be a solution if you really need it in interrupt context.
But the concern is it could hurt zsmalloc's goal which is memory
space efficiency if your system has lots of CPUs.

> 
> Thanks,
> Seth
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
@ 2012-07-12  1:15             ` Minchan Kim
  0 siblings, 0 replies; 62+ messages in thread
From: Minchan Kim @ 2012-07-12  1:15 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Robert Jennings, linux-mm,
	devel, linux-kernel

On Wed, Jul 11, 2012 at 09:15:43AM -0500, Seth Jennings wrote:
> On 07/11/2012 02:42 AM, Minchan Kim wrote:
> > On 07/11/2012 12:17 AM, Seth Jennings wrote:
> >> On 07/09/2012 09:35 PM, Minchan Kim wrote:
> >>> Maybe we need local_irq_save/restore in zs_[un]map_object path.
> >>
> >> I'd rather not disable interrupts since that will create
> >> unnecessary interrupt latency for all users, even if they
> > 
> > Agreed.
> > Although we guide k[un]map atomic is so fast, it isn't necessary
> > to force irq_[enable|disable]. Okay.
> > 
> >> don't need interrupt protection.  If a particular user uses
> >> zs_map_object() in an interrupt path, it will be up to that
> >> user to disable interrupts to ensure safety.
> > 
> > Nope. It shouldn't do that.
> > Any user in interrupt context can't assume that there isn't any other user using per-cpu buffer
> > right before interrupt happens.
> > 
> > The concern is that if such bug happens, it's very hard to find a bug.
> > So, how about adding this?
> > 
> > void zs_map_object(...)
> > {
> > 	BUG_ON(in_interrupt());
> > }
> 
> I not completely following you, but I think I'm following
> enough.  Your point is that the per-cpu buffers are shared
> by all zsmalloc users and one user doesn't know if another
> user is doing a zs_map_object() in an interrupt path.

And vise versa is yes.

> 
> However, I think what you are suggesting is to disallow
> mapping in interrupt context.  This is a problem for zcache
> as it already does mapping in interrupt context, namely for
> page decompression in the page fault handler.

I don't get it.
Page fault handler isn't interrupt context.

> 
> What do you think about making the per-cpu buffers local to
> each zsmalloc pool? That way each user has their own per-cpu
> buffers and don't step on each other's toes.

Maybe, It could be a solution if you really need it in interrupt context.
But the concern is it could hurt zsmalloc's goal which is memory
space efficiency if your system has lots of CPUs.

> 
> Thanks,
> Seth
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
  2012-07-11 20:48           ` Seth Jennings
@ 2012-07-12 10:40             ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 62+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-12 10:40 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

On Wed, Jul 11, 2012 at 03:48:28PM -0500, Seth Jennings wrote:
> On 07/11/2012 02:42 PM, Konrad Rzeszutek Wilk wrote:
> >>>> Which architecture was this under? It sounds x86-ish? Is this on
> >>>> Westmere and more modern machines? What about Core2 architecture?
> >>>>
> >>>> Oh how did it work on AMD Phenom boxes?
> >>>
> >>> I don't have a Phenom box but I have an Athlon X2 I can try out.
> >>> I'll get this information next Monday.
> >>
> >> Actually, I'm running some production stuff on that box, so
> >> I rather not put testing stuff on it.  Is there any
> >> particular reason that you wanted this information? Do you
> >> have a reason to believe that mapping will be faster than
> >> copy for AMD procs?
> > 
> > Sorry for the late response. Working on some ugly bug that is taking
> > more time than anticipated.
> > My thoughts were that these findings are based on the hardware memory
> > prefetcher. The Intel
> > machines - especially starting with Nehelem have some pretty
> > impressive prefetcher where
> > even doing in a linked list 'prefetch' on the next node is not beneficial.
> > 
> > Perhaps the way to leverage this is to use different modes depending
> > on the bulk of data?
> > When there is a huge amount use the old method, but for small use copy
> > (as it would
> > in theory stay in the cache longer).
> 
> Not sure what you mean by "bulk" or "huge amount" but the
> maximum size of mapped object is PAGE_SIZE and the typical
> size more around PAGE_SIZE/2. So that is what I'm
> considering.  Do you think it makes a difference with copies
> that small?

I was thinking in terms of time. So if there are many requests coming
in at some threshold, then use one method.
> 
> Thanks,
> Seth
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/4] zsmalloc improvements
@ 2012-07-12 10:40             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 62+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-12 10:40 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Dan Magenheimer,
	Konrad Rzeszutek Wilk, Nitin Gupta, Minchan Kim, Robert Jennings,
	linux-mm, devel, linux-kernel

On Wed, Jul 11, 2012 at 03:48:28PM -0500, Seth Jennings wrote:
> On 07/11/2012 02:42 PM, Konrad Rzeszutek Wilk wrote:
> >>>> Which architecture was this under? It sounds x86-ish? Is this on
> >>>> Westmere and more modern machines? What about Core2 architecture?
> >>>>
> >>>> Oh how did it work on AMD Phenom boxes?
> >>>
> >>> I don't have a Phenom box but I have an Athlon X2 I can try out.
> >>> I'll get this information next Monday.
> >>
> >> Actually, I'm running some production stuff on that box, so
> >> I rather not put testing stuff on it.  Is there any
> >> particular reason that you wanted this information? Do you
> >> have a reason to believe that mapping will be faster than
> >> copy for AMD procs?
> > 
> > Sorry for the late response. Working on some ugly bug that is taking
> > more time than anticipated.
> > My thoughts were that these findings are based on the hardware memory
> > prefetcher. The Intel
> > machines - especially starting with Nehelem have some pretty
> > impressive prefetcher where
> > even doing in a linked list 'prefetch' on the next node is not beneficial.
> > 
> > Perhaps the way to leverage this is to use different modes depending
> > on the bulk of data?
> > When there is a huge amount use the old method, but for small use copy
> > (as it would
> > in theory stay in the cache longer).
> 
> Not sure what you mean by "bulk" or "huge amount" but the
> maximum size of mapped object is PAGE_SIZE and the typical
> size more around PAGE_SIZE/2. So that is what I'm
> considering.  Do you think it makes a difference with copies
> that small?

I was thinking in terms of time. So if there are many requests coming
in at some threshold, then use one method.
> 
> Thanks,
> Seth
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
  2012-07-12  1:15             ` Minchan Kim
@ 2012-07-12 19:54               ` Dan Magenheimer
  -1 siblings, 0 replies; 62+ messages in thread
From: Dan Magenheimer @ 2012-07-12 19:54 UTC (permalink / raw)
  To: Minchan Kim, Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Konrad Wilk, Nitin Gupta,
	Robert Jennings, linux-mm, devel, linux-kernel

> From: Minchan Kim [mailto:minchan@kernel.org]
> Subject: Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
> 
> On Wed, Jul 11, 2012 at 09:15:43AM -0500, Seth Jennings wrote:
> > On 07/11/2012 02:42 AM, Minchan Kim wrote:
> > > On 07/11/2012 12:17 AM, Seth Jennings wrote:
> > >> On 07/09/2012 09:35 PM, Minchan Kim wrote:
> > >>> Maybe we need local_irq_save/restore in zs_[un]map_object path.
> > >>
> > >> I'd rather not disable interrupts since that will create
> > >> unnecessary interrupt latency for all users, even if they
> > >
> > > Agreed.
> > > Although we guide k[un]map atomic is so fast, it isn't necessary
> > > to force irq_[enable|disable]. Okay.
> > >
> > >> don't need interrupt protection.  If a particular user uses
> > >> zs_map_object() in an interrupt path, it will be up to that
> > >> user to disable interrupts to ensure safety.
> > >
> > > Nope. It shouldn't do that.
> > > Any user in interrupt context can't assume that there isn't any other user using per-cpu buffer
> > > right before interrupt happens.
> > >
> > > The concern is that if such bug happens, it's very hard to find a bug.
> > > So, how about adding this?
> > >
> > > void zs_map_object(...)
> > > {
> > > 	BUG_ON(in_interrupt());
> > > }
> >
> > I not completely following you, but I think I'm following
> > enough.  Your point is that the per-cpu buffers are shared
> > by all zsmalloc users and one user doesn't know if another
> > user is doing a zs_map_object() in an interrupt path.
> 
> And vise versa is yes.
> 
> > However, I think what you are suggesting is to disallow
> > mapping in interrupt context.  This is a problem for zcache
> > as it already does mapping in interrupt context, namely for
> > page decompression in the page fault handler.
> 
> I don't get it.
> Page fault handler isn't interrupt context.
> 
> > What do you think about making the per-cpu buffers local to
> > each zsmalloc pool? That way each user has their own per-cpu
> > buffers and don't step on each other's toes.
> 
> Maybe, It could be a solution if you really need it in interrupt context.
> But the concern is it could hurt zsmalloc's goal which is memory
> space efficiency if your system has lots of CPUs.

Sorry to be so far behind on this thread.

For frontswap and zram, the "put" calls are not in interrupt
context.  For cleancache, the put call IS in interrupt context.
So if you want to use zsmalloc for zcache+cleancache, interrupt
context is a concern.  As discussed previously in a separate
thread though, zsmalloc will take a lot of work to support the full
needs of zcache.  So, pick your poison.

The x86 architecture is far ahead of ARM on many CPU optimizations
including fast copying.  Most x86 systems are also now 64-bit,
which means 64-bit+ data buses.  These differences probably account
for the difference in copy-vs-TLBmapping performance between (64-bit)
x86 and (32-bit ARM).  However, ARM may eventually catch up, especially
when 64-bit ARM chips are more common, so
it would be nice if zsmalloc could determine at runtime which
is faster and use it.

Dan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
@ 2012-07-12 19:54               ` Dan Magenheimer
  0 siblings, 0 replies; 62+ messages in thread
From: Dan Magenheimer @ 2012-07-12 19:54 UTC (permalink / raw)
  To: Minchan Kim, Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Konrad Wilk, Nitin Gupta,
	Robert Jennings, linux-mm, devel, linux-kernel

> From: Minchan Kim [mailto:minchan@kernel.org]
> Subject: Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
> 
> On Wed, Jul 11, 2012 at 09:15:43AM -0500, Seth Jennings wrote:
> > On 07/11/2012 02:42 AM, Minchan Kim wrote:
> > > On 07/11/2012 12:17 AM, Seth Jennings wrote:
> > >> On 07/09/2012 09:35 PM, Minchan Kim wrote:
> > >>> Maybe we need local_irq_save/restore in zs_[un]map_object path.
> > >>
> > >> I'd rather not disable interrupts since that will create
> > >> unnecessary interrupt latency for all users, even if they
> > >
> > > Agreed.
> > > Although we guide k[un]map atomic is so fast, it isn't necessary
> > > to force irq_[enable|disable]. Okay.
> > >
> > >> don't need interrupt protection.  If a particular user uses
> > >> zs_map_object() in an interrupt path, it will be up to that
> > >> user to disable interrupts to ensure safety.
> > >
> > > Nope. It shouldn't do that.
> > > Any user in interrupt context can't assume that there isn't any other user using per-cpu buffer
> > > right before interrupt happens.
> > >
> > > The concern is that if such bug happens, it's very hard to find a bug.
> > > So, how about adding this?
> > >
> > > void zs_map_object(...)
> > > {
> > > 	BUG_ON(in_interrupt());
> > > }
> >
> > I not completely following you, but I think I'm following
> > enough.  Your point is that the per-cpu buffers are shared
> > by all zsmalloc users and one user doesn't know if another
> > user is doing a zs_map_object() in an interrupt path.
> 
> And vise versa is yes.
> 
> > However, I think what you are suggesting is to disallow
> > mapping in interrupt context.  This is a problem for zcache
> > as it already does mapping in interrupt context, namely for
> > page decompression in the page fault handler.
> 
> I don't get it.
> Page fault handler isn't interrupt context.
> 
> > What do you think about making the per-cpu buffers local to
> > each zsmalloc pool? That way each user has their own per-cpu
> > buffers and don't step on each other's toes.
> 
> Maybe, It could be a solution if you really need it in interrupt context.
> But the concern is it could hurt zsmalloc's goal which is memory
> space efficiency if your system has lots of CPUs.

Sorry to be so far behind on this thread.

For frontswap and zram, the "put" calls are not in interrupt
context.  For cleancache, the put call IS in interrupt context.
So if you want to use zsmalloc for zcache+cleancache, interrupt
context is a concern.  As discussed previously in a separate
thread though, zsmalloc will take a lot of work to support the full
needs of zcache.  So, pick your poison.

The x86 architecture is far ahead of ARM on many CPU optimizations
including fast copying.  Most x86 systems are also now 64-bit,
which means 64-bit+ data buses.  These differences probably account
for the difference in copy-vs-TLBmapping performance between (64-bit)
x86 and (32-bit ARM).  However, ARM may eventually catch up, especially
when 64-bit ARM chips are more common, so
it would be nice if zsmalloc could determine at runtime which
is faster and use it.

Dan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
  2012-07-12 19:54               ` Dan Magenheimer
@ 2012-07-12 22:46                 ` Dan Magenheimer
  -1 siblings, 0 replies; 62+ messages in thread
From: Dan Magenheimer @ 2012-07-12 22:46 UTC (permalink / raw)
  To: Minchan Kim, Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Konrad Wilk, Nitin Gupta,
	Robert Jennings, linux-mm, devel, linux-kernel

> From: Dan Magenheimer
> Subject: RE: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
> 
> > From: Minchan Kim [mailto:minchan@kernel.org]
> > Subject: Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
> >
> > On Wed, Jul 11, 2012 at 09:15:43AM -0500, Seth Jennings wrote:
> > > On 07/11/2012 02:42 AM, Minchan Kim wrote:
> > > > On 07/11/2012 12:17 AM, Seth Jennings wrote:
> > > >> On 07/09/2012 09:35 PM, Minchan Kim wrote:
> > > >>> Maybe we need local_irq_save/restore in zs_[un]map_object path.
> > > >>
> > > >> I'd rather not disable interrupts since that will create
> > > >> unnecessary interrupt latency for all users, even if they
> > > >
> > > > Agreed.
> > > > Although we guide k[un]map atomic is so fast, it isn't necessary
> > > > to force irq_[enable|disable]. Okay.
> > > >
> > > >> don't need interrupt protection.  If a particular user uses
> > > >> zs_map_object() in an interrupt path, it will be up to that
> > > >> user to disable interrupts to ensure safety.
> > > >
> > > > Nope. It shouldn't do that.
> > > > Any user in interrupt context can't assume that there isn't any other user using per-cpu buffer
> > > > right before interrupt happens.
> > > >
> > > > The concern is that if such bug happens, it's very hard to find a bug.
> > > > So, how about adding this?
> > > >
> > > > void zs_map_object(...)
> > > > {
> > > > 	BUG_ON(in_interrupt());
> > > > }
> > >
> > > I not completely following you, but I think I'm following
> > > enough.  Your point is that the per-cpu buffers are shared
> > > by all zsmalloc users and one user doesn't know if another
> > > user is doing a zs_map_object() in an interrupt path.
> >
> > And vise versa is yes.
> >
> > > However, I think what you are suggesting is to disallow
> > > mapping in interrupt context.  This is a problem for zcache
> > > as it already does mapping in interrupt context, namely for
> > > page decompression in the page fault handler.
> >
> > I don't get it.
> > Page fault handler isn't interrupt context.
> >
> > > What do you think about making the per-cpu buffers local to
> > > each zsmalloc pool? That way each user has their own per-cpu
> > > buffers and don't step on each other's toes.
> >
> > Maybe, It could be a solution if you really need it in interrupt context.
> > But the concern is it could hurt zsmalloc's goal which is memory
> > space efficiency if your system has lots of CPUs.
> 
> Sorry to be so far behind on this thread.
> 
> For frontswap and zram, the "put" calls are not in interrupt
> context.  For cleancache, the put call IS in interrupt context.
> So if you want to use zsmalloc for zcache+cleancache, interrupt
> context is a concern.  As discussed previously in a separate
> thread though, zsmalloc will take a lot of work to support the full
> needs of zcache.  So, pick your poison.

Oops, correction.  Cleancache puts are not in interrupt context
but do have interrupts disabled.  That's quite different of
course.  So Minchan's BUG_ON(in_interrupt()) should be fine for
now.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
@ 2012-07-12 22:46                 ` Dan Magenheimer
  0 siblings, 0 replies; 62+ messages in thread
From: Dan Magenheimer @ 2012-07-12 22:46 UTC (permalink / raw)
  To: Minchan Kim, Seth Jennings
  Cc: Greg Kroah-Hartman, Andrew Morton, Konrad Wilk, Nitin Gupta,
	Robert Jennings, linux-mm, devel, linux-kernel

> From: Dan Magenheimer
> Subject: RE: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
> 
> > From: Minchan Kim [mailto:minchan@kernel.org]
> > Subject: Re: [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate
> >
> > On Wed, Jul 11, 2012 at 09:15:43AM -0500, Seth Jennings wrote:
> > > On 07/11/2012 02:42 AM, Minchan Kim wrote:
> > > > On 07/11/2012 12:17 AM, Seth Jennings wrote:
> > > >> On 07/09/2012 09:35 PM, Minchan Kim wrote:
> > > >>> Maybe we need local_irq_save/restore in zs_[un]map_object path.
> > > >>
> > > >> I'd rather not disable interrupts since that will create
> > > >> unnecessary interrupt latency for all users, even if they
> > > >
> > > > Agreed.
> > > > Although we guide k[un]map atomic is so fast, it isn't necessary
> > > > to force irq_[enable|disable]. Okay.
> > > >
> > > >> don't need interrupt protection.  If a particular user uses
> > > >> zs_map_object() in an interrupt path, it will be up to that
> > > >> user to disable interrupts to ensure safety.
> > > >
> > > > Nope. It shouldn't do that.
> > > > Any user in interrupt context can't assume that there isn't any other user using per-cpu buffer
> > > > right before interrupt happens.
> > > >
> > > > The concern is that if such bug happens, it's very hard to find a bug.
> > > > So, how about adding this?
> > > >
> > > > void zs_map_object(...)
> > > > {
> > > > 	BUG_ON(in_interrupt());
> > > > }
> > >
> > > I not completely following you, but I think I'm following
> > > enough.  Your point is that the per-cpu buffers are shared
> > > by all zsmalloc users and one user doesn't know if another
> > > user is doing a zs_map_object() in an interrupt path.
> >
> > And vise versa is yes.
> >
> > > However, I think what you are suggesting is to disallow
> > > mapping in interrupt context.  This is a problem for zcache
> > > as it already does mapping in interrupt context, namely for
> > > page decompression in the page fault handler.
> >
> > I don't get it.
> > Page fault handler isn't interrupt context.
> >
> > > What do you think about making the per-cpu buffers local to
> > > each zsmalloc pool? That way each user has their own per-cpu
> > > buffers and don't step on each other's toes.
> >
> > Maybe, It could be a solution if you really need it in interrupt context.
> > But the concern is it could hurt zsmalloc's goal which is memory
> > space efficiency if your system has lots of CPUs.
> 
> Sorry to be so far behind on this thread.
> 
> For frontswap and zram, the "put" calls are not in interrupt
> context.  For cleancache, the put call IS in interrupt context.
> So if you want to use zsmalloc for zcache+cleancache, interrupt
> context is a concern.  As discussed previously in a separate
> thread though, zsmalloc will take a lot of work to support the full
> needs of zcache.  So, pick your poison.

Oops, correction.  Cleancache puts are not in interrupt context
but do have interrupts disabled.  That's quite different of
course.  So Minchan's BUG_ON(in_interrupt()) should be fine for
now.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2012-07-12 22:47 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-02 21:15 [PATCH 0/4] zsmalloc improvements Seth Jennings
2012-07-02 21:15 ` Seth Jennings
2012-07-02 21:15 ` [PATCH 1/4] zsmalloc: remove x86 dependency Seth Jennings
2012-07-02 21:15   ` Seth Jennings
2012-07-10  2:21   ` Minchan Kim
2012-07-10  2:21     ` Minchan Kim
2012-07-10 15:29     ` Seth Jennings
2012-07-10 15:29       ` Seth Jennings
2012-07-11  7:27       ` Minchan Kim
2012-07-11  7:27         ` Minchan Kim
2012-07-11 18:26   ` Nitin Gupta
2012-07-11 18:26     ` Nitin Gupta
2012-07-11 20:32     ` Seth Jennings
2012-07-11 20:32       ` Seth Jennings
2012-07-11 22:42       ` Nitin Gupta
2012-07-11 22:42         ` Nitin Gupta
2012-07-12  0:23         ` Seth Jennings
2012-07-12  0:23           ` Seth Jennings
2012-07-02 21:15 ` [PATCH 2/4] zsmalloc: add single-page object fastpath in unmap Seth Jennings
2012-07-02 21:15   ` Seth Jennings
2012-07-10  2:25   ` Minchan Kim
2012-07-10  2:25     ` Minchan Kim
2012-07-02 21:15 ` [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate Seth Jennings
2012-07-02 21:15   ` Seth Jennings
2012-07-10  2:35   ` Minchan Kim
2012-07-10  2:35     ` Minchan Kim
2012-07-10 15:17     ` Seth Jennings
2012-07-10 15:17       ` Seth Jennings
2012-07-11  7:42       ` Minchan Kim
2012-07-11  7:42         ` Minchan Kim
2012-07-11 14:15         ` Seth Jennings
2012-07-11 14:15           ` Seth Jennings
2012-07-12  1:15           ` Minchan Kim
2012-07-12  1:15             ` Minchan Kim
2012-07-12 19:54             ` Dan Magenheimer
2012-07-12 19:54               ` Dan Magenheimer
2012-07-12 22:46               ` Dan Magenheimer
2012-07-12 22:46                 ` Dan Magenheimer
2012-07-02 21:15 ` [PATCH 4/4] zsmalloc: add mapping modes Seth Jennings
2012-07-02 21:15   ` Seth Jennings
2012-07-04  5:33 ` [PATCH 0/4] zsmalloc improvements Minchan Kim
2012-07-04  5:33   ` Minchan Kim
2012-07-04 20:43 ` Konrad Rzeszutek Wilk
2012-07-04 20:43   ` Konrad Rzeszutek Wilk
2012-07-06 15:07   ` Seth Jennings
2012-07-06 15:07     ` Seth Jennings
2012-07-09 13:58     ` Seth Jennings
2012-07-09 13:58       ` Seth Jennings
2012-07-11 19:42       ` Konrad Rzeszutek Wilk
2012-07-11 19:42         ` Konrad Rzeszutek Wilk
2012-07-11 20:48         ` Seth Jennings
2012-07-11 20:48           ` Seth Jennings
2012-07-12 10:40           ` Konrad Rzeszutek Wilk
2012-07-12 10:40             ` Konrad Rzeszutek Wilk
2012-07-11  7:03 ` Minchan Kim
2012-07-11  7:03   ` Minchan Kim
2012-07-11 14:00   ` Seth Jennings
2012-07-11 14:00     ` Seth Jennings
2012-07-12  1:01     ` Minchan Kim
2012-07-12  1:01       ` Minchan Kim
2012-07-11 19:16   ` Seth Jennings
2012-07-11 19:16     ` Seth Jennings

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.