linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] zsmalloc: Add Kconfig for enabling PTE method
@ 2013-02-06  2:17 Minchan Kim
  2013-02-06  2:28 ` Greg Kroah-Hartman
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Minchan Kim @ 2013-02-06  2:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-mm, linux-kernel, Minchan Kim, Andrew Morton,
	Seth Jennings, Nitin Gupta, Dan Magenheimer,
	Konrad Rzeszutek Wilk

Zsmalloc has two methods, 1) copy-based and 2) pte-based, to access
allocations that span two pages. See [1] for the history of why both
approaches are supported.

In summary, the copy-based method is 3 times faster on x86 while the
pte-based method is 6 times faster on ARM.

However, hard-coding which architectures use the pte-based method was
a bad choice. This patch removes the hard-coding and adds a new
Kconfig option to select the approach.

This patch is based on next-20130205.

[1] https://lkml.org/lkml/2012/7/11/58

* Changelog from v1
  * Fix CONFIG_PGTABLE_MAPPING in zsmalloc-main.c - Greg

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/staging/zsmalloc/Kconfig         | 12 ++++++++++++
 drivers/staging/zsmalloc/zsmalloc-main.c | 20 +++++---------------
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
index 9084565..232b3b6 100644
--- a/drivers/staging/zsmalloc/Kconfig
+++ b/drivers/staging/zsmalloc/Kconfig
@@ -8,3 +8,15 @@ config ZSMALLOC
 	  non-standard allocator interface where a handle, not a pointer, is
 	  returned by an alloc().  This handle must be mapped in order to
 	  access the allocated space.
+
+config PGTABLE_MAPPING
+        bool "Use page table mapping to access allocations that span two pages"
+        depends on ZSMALLOC
+        default n
+        help
+	  By default, zsmalloc uses a copy-based object mapping method to access
+	  allocations that span two pages. However, if a particular architecture
+	  performs VM mapping faster than copying, then you should select this.
+	  This causes zsmalloc to use page table mapping rather than copying
+	  for object mapping. You can check speed with zsmalloc benchmark[1].
+	  [1] https://github.com/spartacus06/zsmalloc
diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index 06f73a9..2c1805c 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -207,6 +207,7 @@ struct zs_pool {
 	struct size_class size_class[ZS_SIZE_CLASSES];
 
 	gfp_t flags;	/* allocation flags used when growing pool */
+
 };
 
 /*
@@ -218,19 +219,8 @@ struct zs_pool {
 #define CLASS_IDX_MASK	((1 << CLASS_IDX_BITS) - 1)
 #define FULLNESS_MASK	((1 << FULLNESS_BITS) - 1)
 
-/*
- * By default, zsmalloc uses a copy-based object mapping method to access
- * allocations that span two pages. However, if a particular architecture
- * performs VM mapping faster than copying, then it should be added here
- * so that USE_PGTABLE_MAPPING is defined. This causes zsmalloc to use
- * page table mapping rather than copying for object mapping.
-*/
-#if defined(CONFIG_ARM)
-#define USE_PGTABLE_MAPPING
-#endif
-
 struct mapping_area {
-#ifdef USE_PGTABLE_MAPPING
+#ifdef CONFIG_PGTABLE_MAPPING
 	struct vm_struct *vm; /* vm area for mapping object that span pages */
 #else
 	char *vm_buf; /* copy buffer for objects that span pages */
@@ -622,7 +612,7 @@ static struct page *find_get_zspage(struct size_class *class)
 	return page;
 }
 
-#ifdef USE_PGTABLE_MAPPING
+#ifdef CONFIG_PGTABLE_MAPPING
 static inline int __zs_cpu_up(struct mapping_area *area)
 {
 	/*
@@ -663,7 +653,7 @@ static inline void __zs_unmap_object(struct mapping_area *area,
 	flush_tlb_kernel_range(addr, end);
 }
 
-#else /* USE_PGTABLE_MAPPING */
+#else /* CONFIG_PGTABLE_MAPPING */
 
 static inline int __zs_cpu_up(struct mapping_area *area)
 {
@@ -741,7 +731,7 @@ out:
 	pagefault_enable();
 }
 
-#endif /* USE_PGTABLE_MAPPING */
+#endif /* CONFIG_PGTABLE_MAPPING */
 
 static int zs_cpu_notifier(struct notifier_block *nb, unsigned long action,
 				void *pcpu)
-- 
1.8.1.1
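
As a rough illustration of what the diff above does (this is not part of
the patch): the mapping implementation is now chosen by a Kconfig-generated
macro instead of an architecture check. The sketch below is plain userspace
C. The enum and helper names are made up for illustration, and
CONFIG_PGTABLE_MAPPING is assumed to be passed with -D on the compiler
command line rather than generated by Kconfig.

```c
#include <assert.h>

/* Hypothetical names, used only for this illustration. */
enum zs_map_method { ZS_MAP_COPY, ZS_MAP_PGTABLE };

/*
 * Compile-time selection, mirroring the #ifdef in the patch.
 * Build with -DCONFIG_PGTABLE_MAPPING to mimic CONFIG_PGTABLE_MAPPING=y;
 * without it, the copy-based default is selected.
 */
static enum zs_map_method zs_selected_method(void)
{
#ifdef CONFIG_PGTABLE_MAPPING
	return ZS_MAP_PGTABLE;
#else
	return ZS_MAP_COPY;
#endif
}

/* Human-readable name for a method, e.g. for a debug message. */
static const char *zs_map_method_name(enum zs_map_method m)
{
	assert(m == ZS_MAP_COPY || m == ZS_MAP_PGTABLE);
	return m == ZS_MAP_PGTABLE ? "pgtable" : "copy";
}
```

With this shape, an ARM build no longer gets the pte-based path
automatically; whoever configures the kernel has to enable the option.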



* Re: [PATCH v2] zsmalloc: Add Kconfig for enabling PTE method
  2013-02-06  2:17 [PATCH v2] zsmalloc: Add Kconfig for enabling PTE method Minchan Kim
@ 2013-02-06  2:28 ` Greg Kroah-Hartman
  2013-02-06  2:50   ` Minchan Kim
  2013-02-06 16:47 ` Seth Jennings
  2013-02-17  6:19 ` Ric Mason
  2 siblings, 1 reply; 8+ messages in thread
From: Greg Kroah-Hartman @ 2013-02-06  2:28 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Andrew Morton, Seth Jennings,
	Nitin Gupta, Dan Magenheimer, Konrad Rzeszutek Wilk

On Wed, Feb 06, 2013 at 11:17:08AM +0900, Minchan Kim wrote:
> diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
> index 9084565..232b3b6 100644
> --- a/drivers/staging/zsmalloc/Kconfig
> +++ b/drivers/staging/zsmalloc/Kconfig
> @@ -8,3 +8,15 @@ config ZSMALLOC
>  	  non-standard allocator interface where a handle, not a pointer, is
>  	  returned by an alloc().  This handle must be mapped in order to
>  	  access the allocated space.
> +
> +config PGTABLE_MAPPING
> +        bool "Use page table mapping to access allocations that span two pages"

No tabs?

Please also put "ZSmalloc" somewhere in the text here, otherwise it
really doesn't make much sense when seeing it in a menu.

> +        depends on ZSMALLOC
> +        default n

That's the default, so it can be dropped.

> +        help
> +	  By default, zsmalloc uses a copy-based object mapping method to access
> +	  allocations that span two pages. However, if a particular architecture
> +	  performs VM mapping faster than copying, then you should select this.
> +	  This causes zsmalloc to use page table mapping rather than copying
> +	  for object mapping. You can check speed with zsmalloc benchmark[1].
> +	  [1] https://github.com/spartacus06/zsmalloc

Care to specify exactly _what_ architectures this should be set for or
not?  That will help the distros out a lot in determining if this should
be enabled or not.

> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
> index 06f73a9..2c1805c 100644
> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> @@ -207,6 +207,7 @@ struct zs_pool {
>  	struct size_class size_class[ZS_SIZE_CLASSES];
>  
>  	gfp_t flags;	/* allocation flags used when growing pool */
> +
>  };
>  
>  /*

Why add this extra line?

thanks,

greg k-h


* Re: [PATCH v2] zsmalloc: Add Kconfig for enabling PTE method
  2013-02-06  2:28 ` Greg Kroah-Hartman
@ 2013-02-06  2:50   ` Minchan Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Minchan Kim @ 2013-02-06  2:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-mm, linux-kernel, Andrew Morton, Seth Jennings,
	Nitin Gupta, Dan Magenheimer, Konrad Rzeszutek Wilk

On Tue, Feb 05, 2013 at 06:28:54PM -0800, Greg Kroah-Hartman wrote:
> On Wed, Feb 06, 2013 at 11:17:08AM +0900, Minchan Kim wrote:
> > diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
> > index 9084565..232b3b6 100644
> > --- a/drivers/staging/zsmalloc/Kconfig
> > +++ b/drivers/staging/zsmalloc/Kconfig
> > @@ -8,3 +8,15 @@ config ZSMALLOC
> >  	  non-standard allocator interface where a handle, not a pointer, is
> >  	  returned by an alloc().  This handle must be mapped in order to
> >  	  access the allocated space.
> > +
> > +config PGTABLE_MAPPING
> > +        bool "Use page table mapping to access allocations that span two pages"
> 
> No tabs?
> 
> Please also put "ZSmalloc" somewhere in the text here, otherwise it
> really doesn't make much sense when seeing it in a menu.
> 
> > +        depends on ZSMALLOC
> > +        default n
> 
> That's the default, so it can be dropped.
> 
> > +        help
> > +	  By default, zsmalloc uses a copy-based object mapping method to access
> > +	  allocations that span two pages. However, if a particular architecture
> > +	  performs VM mapping faster than copying, then you should select this.
> > +	  This causes zsmalloc to use page table mapping rather than copying
> > +	  for object mapping. You can check speed with zsmalloc benchmark[1].
> > +	  [1] https://github.com/spartacus06/zsmalloc
> 
> Care to specify exactly _what_ architectures this should be set for or
> not?  That will help the distros out a lot in determining if this should
> be enabled or not.
> 
> > diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
> > index 06f73a9..2c1805c 100644
> > --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> > +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> > @@ -207,6 +207,7 @@ struct zs_pool {
> >  	struct size_class size_class[ZS_SIZE_CLASSES];
> >  
> >  	gfp_t flags;	/* allocation flags used when growing pool */
> > +
> >  };
> >  
> >  /*
> 
> Why add this extra line?
> 
> thanks,
> 
> greg k-h

Sorry for bothering you.
I fixed everything you pointed out.
Thanks for the review, Greg!

Here it goes.

------------------- 8< -------------------

From 506acea72916c9a12cf80290bc5cd87f4af1914d Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Wed, 6 Feb 2013 11:10:59 +0900
Subject: [PATCH v3] zsmalloc: Add Kconfig for enabling PTE method

Zsmalloc has two methods, 1) copy-based and 2) pte-based, to access
allocations that span two pages. See [1] for the history of why both
approaches are supported.

In summary, the copy-based method is 3 times faster on x86 while the
pte-based method is 6 times faster on ARM.

However, hard-coding which architectures use the pte-based method was
a bad choice. This patch removes the hard-coding and adds a new
Kconfig option to select the approach.

This patch is based on next-20130205.

[1] https://lkml.org/lkml/2012/7/11/58

* Changelog from v2
  * Add tab and drop "default n" - Greg
  * Modify description - Greg
  * Drop unnecessary extra line - Greg

* Changelog from v1
  * Fix CONFIG_PGTABLE_MAPPING in zsmalloc-main.c - Greg

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/staging/zsmalloc/Kconfig         | 13 +++++++++++++
 drivers/staging/zsmalloc/zsmalloc-main.c | 19 ++++---------------
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
index 9084565..83f9cec 100644
--- a/drivers/staging/zsmalloc/Kconfig
+++ b/drivers/staging/zsmalloc/Kconfig
@@ -8,3 +8,16 @@ config ZSMALLOC
 	  non-standard allocator interface where a handle, not a pointer, is
 	  returned by an alloc().  This handle must be mapped in order to
 	  access the allocated space.
+
+config PGTABLE_MAPPING
+	bool "Use page table mapping to access object in zsmalloc"
+	depends on ZSMALLOC
+	help
+	  By default, zsmalloc uses a copy-based object mapping method to
+	  access allocations that span two pages. However, if a particular
+	  architecture (ex, ARM) performs VM mapping faster than copying,
+	  then you should select this. This causes zsmalloc to use page table
+	  mapping rather than copying for object mapping.
+
+	  You can check speed with zsmalloc benchmark[1].
+	  [1] https://github.com/spartacus06/zsmalloc
diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index 06f73a9..aa6aac4 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -218,19 +218,8 @@ struct zs_pool {
 #define CLASS_IDX_MASK	((1 << CLASS_IDX_BITS) - 1)
 #define FULLNESS_MASK	((1 << FULLNESS_BITS) - 1)
 
-/*
- * By default, zsmalloc uses a copy-based object mapping method to access
- * allocations that span two pages. However, if a particular architecture
- * performs VM mapping faster than copying, then it should be added here
- * so that USE_PGTABLE_MAPPING is defined. This causes zsmalloc to use
- * page table mapping rather than copying for object mapping.
-*/
-#if defined(CONFIG_ARM)
-#define USE_PGTABLE_MAPPING
-#endif
-
 struct mapping_area {
-#ifdef USE_PGTABLE_MAPPING
+#ifdef CONFIG_PGTABLE_MAPPING
 	struct vm_struct *vm; /* vm area for mapping object that span pages */
 #else
 	char *vm_buf; /* copy buffer for objects that span pages */
@@ -622,7 +611,7 @@ static struct page *find_get_zspage(struct size_class *class)
 	return page;
 }
 
-#ifdef USE_PGTABLE_MAPPING
+#ifdef CONFIG_PGTABLE_MAPPING
 static inline int __zs_cpu_up(struct mapping_area *area)
 {
 	/*
@@ -663,7 +652,7 @@ static inline void __zs_unmap_object(struct mapping_area *area,
 	flush_tlb_kernel_range(addr, end);
 }
 
-#else /* USE_PGTABLE_MAPPING */
+#else /* CONFIG_PGTABLE_MAPPING */
 
 static inline int __zs_cpu_up(struct mapping_area *area)
 {
@@ -741,7 +730,7 @@ out:
 	pagefault_enable();
 }
 
-#endif /* USE_PGTABLE_MAPPING */
+#endif /* CONFIG_PGTABLE_MAPPING */
 
 static int zs_cpu_notifier(struct notifier_block *nb, unsigned long action,
 				void *pcpu)
-- 
1.8.1.1

-- 
Kind regards,
Minchan Kim


* Re: [PATCH v2] zsmalloc: Add Kconfig for enabling PTE method
  2013-02-06  2:17 [PATCH v2] zsmalloc: Add Kconfig for enabling PTE method Minchan Kim
  2013-02-06  2:28 ` Greg Kroah-Hartman
@ 2013-02-06 16:47 ` Seth Jennings
  2013-02-06 23:16   ` Minchan Kim
  2013-02-17  6:19 ` Ric Mason
  2 siblings, 1 reply; 8+ messages in thread
From: Seth Jennings @ 2013-02-06 16:47 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, linux-mm, linux-kernel, Andrew Morton,
	Nitin Gupta, Dan Magenheimer, Konrad Rzeszutek Wilk

On 02/05/2013 08:17 PM, Minchan Kim wrote:
> Zsmalloc has two methods 1) copy-based and 2) pte-based to access
> allocations that span two pages. You can see history why we supported
> two approach from [1].
> 
> In summary, copy-based method is 3 times faster in x86 while pte-based
> is 6 times faster in ARM.
> 
> But it was bad choice that adding hard coding to select architecture
> which want to use pte based method. This patch removed it and adds
> new Kconfig to select the approach.
> 
> This patch is based on next-20130205.
> 
> [1] https://lkml.org/lkml/2012/7/11/58
> 
> * Changelog from v1
>   * Fix CONFIG_PGTABLE_MAPPING in zsmalloc-main.c - Greg
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
> Cc: Nitin Gupta <ngupta@vflare.org>
> Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
> Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  drivers/staging/zsmalloc/Kconfig         | 12 ++++++++++++
>  drivers/staging/zsmalloc/zsmalloc-main.c | 20 +++++---------------
>  2 files changed, 17 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
> index 9084565..232b3b6 100644
> --- a/drivers/staging/zsmalloc/Kconfig
> +++ b/drivers/staging/zsmalloc/Kconfig
> @@ -8,3 +8,15 @@ config ZSMALLOC
>  	  non-standard allocator interface where a handle, not a pointer, is
>  	  returned by an alloc().  This handle must be mapped in order to
>  	  access the allocated space.
> +
> +config PGTABLE_MAPPING
> +        bool "Use page table mapping to access allocations that span two pages"
> +        depends on ZSMALLOC
> +        default n
> +        help
> +	  By default, zsmalloc uses a copy-based object mapping method to access
> +	  allocations that span two pages. However, if a particular architecture
> +	  performs VM mapping faster than copying, then you should select this.
> +	  This causes zsmalloc to use page table mapping rather than copying
> +	  for object mapping. You can check speed with zsmalloc benchmark[1].
> +	  [1] https://github.com/spartacus06/zsmalloc

Hmm, I'm not sure we want to include this link in the Kconfig.  While  I
don't have any plans to take that repo down, I could see it getting
stale at some point for yet-to-be-determined reasons.

Of course, without this tool (or something like it) it is hard to know
which option is better for your particular platform.

Would having this in a Documentation/ file, once one exists, be better?

Seth



* Re: [PATCH v2] zsmalloc: Add Kconfig for enabling PTE method
  2013-02-06 16:47 ` Seth Jennings
@ 2013-02-06 23:16   ` Minchan Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Minchan Kim @ 2013-02-06 23:16 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Greg Kroah-Hartman, linux-mm, linux-kernel, Andrew Morton,
	Nitin Gupta, Dan Magenheimer, Konrad Rzeszutek Wilk

On Wed, Feb 06, 2013 at 10:47:16AM -0600, Seth Jennings wrote:
> On 02/05/2013 08:17 PM, Minchan Kim wrote:
> > Zsmalloc has two methods 1) copy-based and 2) pte-based to access
> > allocations that span two pages. You can see history why we supported
> > two approach from [1].
> > 
> > In summary, copy-based method is 3 times faster in x86 while pte-based
> > is 6 times faster in ARM.
> > 
> > But it was bad choice that adding hard coding to select architecture
> > which want to use pte based method. This patch removed it and adds
> > new Kconfig to select the approach.
> > 
> > This patch is based on next-20130205.
> > 
> > [1] https://lkml.org/lkml/2012/7/11/58
> > 
> > * Changelog from v1
> >   * Fix CONFIG_PGTABLE_MAPPING in zsmalloc-main.c - Greg
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
> > Cc: Nitin Gupta <ngupta@vflare.org>
> > Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
> > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  drivers/staging/zsmalloc/Kconfig         | 12 ++++++++++++
> >  drivers/staging/zsmalloc/zsmalloc-main.c | 20 +++++---------------
> >  2 files changed, 17 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
> > index 9084565..232b3b6 100644
> > --- a/drivers/staging/zsmalloc/Kconfig
> > +++ b/drivers/staging/zsmalloc/Kconfig
> > @@ -8,3 +8,15 @@ config ZSMALLOC
> >  	  non-standard allocator interface where a handle, not a pointer, is
> >  	  returned by an alloc().  This handle must be mapped in order to
> >  	  access the allocated space.
> > +
> > +config PGTABLE_MAPPING
> > +        bool "Use page table mapping to access allocations that span two pages"
> > +        depends on ZSMALLOC
> > +        default n
> > +        help
> > +	  By default, zsmalloc uses a copy-based object mapping method to access
> > +	  allocations that span two pages. However, if a particular architecture
> > +	  performs VM mapping faster than copying, then you should select this.
> > +	  This causes zsmalloc to use page table mapping rather than copying
> > +	  for object mapping. You can check speed with zsmalloc benchmark[1].
> > +	  [1] https://github.com/spartacus06/zsmalloc
> 
> Hmm, I'm not sure we want to include this link in the Kconfig.  While  I
> don't have any plans to take that repo down, I could see it getting
> stale at some point for yet-to-be-determined reasons.
> 
> Of course, without this tool (or something like it) it is hard to know
> which option is better for your particular platform.
> 
> Would having this in a Documentation/ file, once one exists, be better?

That would be better. Then let's point to that documentation from Kconfig.
Okay, let's sort it out.

> 
> Seth
> 

-- 
Kind regards,
Minchan Kim


* Re: [PATCH v2] zsmalloc: Add Kconfig for enabling PTE method
  2013-02-06  2:17 [PATCH v2] zsmalloc: Add Kconfig for enabling PTE method Minchan Kim
  2013-02-06  2:28 ` Greg Kroah-Hartman
  2013-02-06 16:47 ` Seth Jennings
@ 2013-02-17  6:19 ` Ric Mason
  2013-02-18 18:07   ` Dan Magenheimer
  2013-02-18 18:08   ` Seth Jennings
  2 siblings, 2 replies; 8+ messages in thread
From: Ric Mason @ 2013-02-17  6:19 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Greg Kroah-Hartman, linux-mm, linux-kernel, Andrew Morton,
	Seth Jennings, Nitin Gupta, Dan Magenheimer,
	Konrad Rzeszutek Wilk

On 02/06/2013 10:17 AM, Minchan Kim wrote:
> Zsmalloc has two methods 1) copy-based and 2) pte-based to access
> allocations that span two pages. You can see history why we supported
> two approach from [1].
>
> In summary, copy-based method is 3 times faster in x86 while pte-based
> is 6 times faster in ARM.

Why is the copy-based method better on some arches and the pte-based
method better on others? What's the root cause?

>
> But it was bad choice that adding hard coding to select architecture
> which want to use pte based method. This patch removed it and adds
> new Kconfig to select the approach.
>
> This patch is based on next-20130205.
>
> [1] https://lkml.org/lkml/2012/7/11/58
>
> * Changelog from v1
>    * Fix CONFIG_PGTABLE_MAPPING in zsmalloc-main.c - Greg
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
> Cc: Nitin Gupta <ngupta@vflare.org>
> Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
> Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>   drivers/staging/zsmalloc/Kconfig         | 12 ++++++++++++
>   drivers/staging/zsmalloc/zsmalloc-main.c | 20 +++++---------------
>   2 files changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
> index 9084565..232b3b6 100644
> --- a/drivers/staging/zsmalloc/Kconfig
> +++ b/drivers/staging/zsmalloc/Kconfig
> @@ -8,3 +8,15 @@ config ZSMALLOC
>   	  non-standard allocator interface where a handle, not a pointer, is
>   	  returned by an alloc().  This handle must be mapped in order to
>   	  access the allocated space.
> +
> +config PGTABLE_MAPPING
> +        bool "Use page table mapping to access allocations that span two pages"
> +        depends on ZSMALLOC
> +        default n
> +        help
> +	  By default, zsmalloc uses a copy-based object mapping method to access
> +	  allocations that span two pages. However, if a particular architecture
> +	  performs VM mapping faster than copying, then you should select this.
> +	  This causes zsmalloc to use page table mapping rather than copying
> +	  for object mapping. You can check speed with zsmalloc benchmark[1].
> +	  [1] https://github.com/spartacus06/zsmalloc
> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
> index 06f73a9..2c1805c 100644
> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> @@ -207,6 +207,7 @@ struct zs_pool {
>   	struct size_class size_class[ZS_SIZE_CLASSES];
>   
>   	gfp_t flags;	/* allocation flags used when growing pool */
> +
>   };
>   
>   /*
> @@ -218,19 +219,8 @@ struct zs_pool {
>   #define CLASS_IDX_MASK	((1 << CLASS_IDX_BITS) - 1)
>   #define FULLNESS_MASK	((1 << FULLNESS_BITS) - 1)
>   
> -/*
> - * By default, zsmalloc uses a copy-based object mapping method to access
> - * allocations that span two pages. However, if a particular architecture
> - * performs VM mapping faster than copying, then it should be added here
> - * so that USE_PGTABLE_MAPPING is defined. This causes zsmalloc to use
> - * page table mapping rather than copying for object mapping.
> -*/
> -#if defined(CONFIG_ARM)
> -#define USE_PGTABLE_MAPPING
> -#endif
> -
>   struct mapping_area {
> -#ifdef USE_PGTABLE_MAPPING
> +#ifdef CONFIG_PGTABLE_MAPPING
>   	struct vm_struct *vm; /* vm area for mapping object that span pages */
>   #else
>   	char *vm_buf; /* copy buffer for objects that span pages */
> @@ -622,7 +612,7 @@ static struct page *find_get_zspage(struct size_class *class)
>   	return page;
>   }
>   
> -#ifdef USE_PGTABLE_MAPPING
> +#ifdef CONFIG_PGTABLE_MAPPING
>   static inline int __zs_cpu_up(struct mapping_area *area)
>   {
>   	/*
> @@ -663,7 +653,7 @@ static inline void __zs_unmap_object(struct mapping_area *area,
>   	flush_tlb_kernel_range(addr, end);
>   }
>   
> -#else /* USE_PGTABLE_MAPPING */
> +#else /* CONFIG_PGTABLE_MAPPING*/
>   
>   static inline int __zs_cpu_up(struct mapping_area *area)
>   {
> @@ -741,7 +731,7 @@ out:
>   	pagefault_enable();
>   }
>   
> -#endif /* USE_PGTABLE_MAPPING */
> +#endif /* CONFIG_PGTABLE_MAPPING */
>   
>   static int zs_cpu_notifier(struct notifier_block *nb, unsigned long action,
>   				void *pcpu)



* RE: [PATCH v2] zsmalloc: Add Kconfig for enabling PTE method
  2013-02-17  6:19 ` Ric Mason
@ 2013-02-18 18:07   ` Dan Magenheimer
  2013-02-18 18:08   ` Seth Jennings
  1 sibling, 0 replies; 8+ messages in thread
From: Dan Magenheimer @ 2013-02-18 18:07 UTC (permalink / raw)
  To: Ric Mason, Minchan Kim
  Cc: Greg Kroah-Hartman, linux-mm, linux-kernel, Andrew Morton,
	Seth Jennings, Nitin Gupta, Konrad Rzeszutek Wilk

> From: Ric Mason [mailto:ric.masonn@gmail.com]
> Sent: Saturday, February 16, 2013 11:19 PM
> To: Minchan Kim
> Cc: Greg Kroah-Hartman; linux-mm@kvack.org; linux-kernel@vger.kernel.org; Andrew Morton; Seth
> Jennings; Nitin Gupta; Dan Magenheimer; Konrad Rzeszutek Wilk
> Subject: Re: [PATCH v2] zsmalloc: Add Kconfig for enabling PTE method
> 
> On 02/06/2013 10:17 AM, Minchan Kim wrote:
> > Zsmalloc has two methods 1) copy-based and 2) pte-based to access
> > allocations that span two pages. You can see history why we supported
> > two approach from [1].
> >
> > In summary, copy-based method is 3 times faster in x86 while pte-based
> > is 6 times faster in ARM.
> 
> Why in some arches copy-based method is better and in the other arches
> pte-based is better? What's the root reason?

Minchan, if you post another version, I think these precise numbers
(of "times faster") should be removed.  The speed is very data
dependent, because the copy-based method is copying a zpage which
may vary widely in size from ~100 bytes to nearly PAGE_SIZE bytes,
a factor of 40x or more.

Please at least say "up to 3 times" or "approximately 3x faster for
an average compressed page".

Ric, the copy-based method does an extra copy of N bytes (where
N is the compressed size of a page).  The pte-based method requires
extra TLB actions.  The relative speed of TLB operations vs
copying is very architecture-dependent.  It is also probably
dependent on the specific implementation of the architecture
(e.g., x86 Sandy Bridge is likely very different from x86
Nehalem) and, as noted above, dependent on N, which is
unpredictable.

So it makes sense to have both choices, but it's not at all clear
how to select which one to use!
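
To make the "extra copy of N bytes" concrete, here is a minimal userspace
sketch of the copy-based idea for an object that straddles two pages. It
is an illustration under assumptions, not the driver code: the names
SKETCH_PAGE_SIZE, sketch_map_copy and sketch_unmap_copy are made up, the
two "pages" are ordinary char arrays, and the caller-supplied buf stands
in for the driver's copy buffer (vm_buf in the diff).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the illustration */

/*
 * "Map": gather an object that starts at offset off in page0 and spills
 * into page1 into one contiguous buffer. This is the extra copy of N bytes.
 */
static void sketch_map_copy(char *buf, const char *page0, const char *page1,
			    size_t off, size_t size)
{
	size_t first = SKETCH_PAGE_SIZE - off;	/* bytes living in page0 */

	assert(off < SKETCH_PAGE_SIZE && size <= SKETCH_PAGE_SIZE);
	assert(off + size > SKETCH_PAGE_SIZE);	/* object must span pages */

	memcpy(buf, page0 + off, first);
	memcpy(buf + first, page1, size - first);
}

/* "Unmap": scatter the (possibly modified) buffer back to both pages. */
static void sketch_unmap_copy(const char *buf, char *page0, char *page1,
			      size_t off, size_t size)
{
	size_t first = SKETCH_PAGE_SIZE - off;

	assert(off < SKETCH_PAGE_SIZE && size <= SKETCH_PAGE_SIZE);
	assert(off + size > SKETCH_PAGE_SIZE);

	memcpy(page0 + off, buf, first);
	memcpy(page1, buf + first, size - first);
}
```

The pte-based method avoids both memcpy passes by making the two pages
virtually contiguous instead, which is where the extra TLB work comes from.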



* Re: [PATCH v2] zsmalloc: Add Kconfig for enabling PTE method
  2013-02-17  6:19 ` Ric Mason
  2013-02-18 18:07   ` Dan Magenheimer
@ 2013-02-18 18:08   ` Seth Jennings
  1 sibling, 0 replies; 8+ messages in thread
From: Seth Jennings @ 2013-02-18 18:08 UTC (permalink / raw)
  To: Ric Mason
  Cc: Minchan Kim, Greg Kroah-Hartman, linux-mm, linux-kernel,
	Andrew Morton, Nitin Gupta, Dan Magenheimer,
	Konrad Rzeszutek Wilk

On 02/17/2013 12:19 AM, Ric Mason wrote:
> On 02/06/2013 10:17 AM, Minchan Kim wrote:
>> Zsmalloc has two methods 1) copy-based and 2) pte-based to access
>> allocations that span two pages. You can see history why we supported
>> two approach from [1].
>>
>> In summary, copy-based method is 3 times faster in x86 while pte-based
>> is 6 times faster in ARM.
> 
> Why in some arches copy-based method is better and in the other arches
> pte-based is better? What's the root reason?

Minchan might know more about this (or Russell King) but I'll give it
a try.

MMU designs can vary pretty significantly from arch to arch.  An
operation that is cheap on one MMU design can be expensive on another,
especially once SMP gets involved, possibly resulting in
inter-processor interrupts.

RAM speed is also a factor since the copy-method will use more memory
bandwidth.  Embedded systems typically won't have really fast memory.
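
One way to think about the trade-off is a first-order cost model: the
copy path costs roughly N bytes times a per-byte copy cost, while the
page-table path costs a roughly fixed amount for the mapping and TLB
work. The sketch below is only that model. Both parameters are
hypothetical inputs that would have to be measured on the target
machine, and the function names are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Object size (in bytes) at which the two methods cost the same under the
 * simple model copy_cost = n * copy_ns_per_byte, pte_cost = pte_fixed_ns.
 * Both inputs are assumptions to be measured, not known constants.
 */
static size_t sketch_break_even_bytes(double pte_fixed_ns,
				      double copy_ns_per_byte)
{
	assert(pte_fixed_ns >= 0.0 && copy_ns_per_byte > 0.0);
	return (size_t)(pte_fixed_ns / copy_ns_per_byte);
}

/* Nonzero if copying an n-byte object is cheaper than remapping it. */
static int sketch_copy_is_cheaper(size_t n, double pte_fixed_ns,
				  double copy_ns_per_byte)
{
	assert(pte_fixed_ns >= 0.0 && copy_ns_per_byte > 0.0);
	return (double)n * copy_ns_per_byte < pte_fixed_ns;
}
```

Slow memory raises the per-byte cost and pushes the break-even size down;
expensive TLB maintenance (for example with inter-processor interrupts)
raises the fixed cost and pushes it up.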

Seth


