All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Nitin Gupta <ngupta@vflare.org>,
	Robert Jennings <rcj@linux.vnet.ibm.com>,
	linux-mm@kvack.org, devel@driverdev.osuosl.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] zsmalloc improvements
Date: Thu, 12 Jul 2012 10:01:05 +0900	[thread overview]
Message-ID: <20120712010105.GA5503@bbox> (raw)
In-Reply-To: <4FFD86FE.1090307@linux.vnet.ibm.com>

On Wed, Jul 11, 2012 at 09:00:30AM -0500, Seth Jennings wrote:
> On 07/11/2012 02:03 AM, Minchan Kim wrote:
> > On 07/03/2012 06:15 AM, Seth Jennings wrote:
> >> zsmapbench measures the copy-based mapping at ~560 cycles for a
> >> map/unmap operation on spanned object for both KVM guest and bare-metal,
> >> while the page table mapping was ~1500 cycles on a VM and ~760 cycles
> >> bare-metal.  The cycles for the copy method will vary with
> >> allocation size, however, it is still faster even for the largest
> >> allocation that zsmalloc supports.
> >>
> >> The result is convenient though, as mempcy is very portable :)
> > 
> > Today, I tested zsmapbench in my embedded board(ARM).
> > tlb-flush is 30% faster than copy-based so it's always not win.
> > I think it depends on CPU speed/cache size.
> > 
> > zram is already very popular on embedded systems so I want to use
> > it continuously without 30% big demage so I want to keep our old approach
> > which supporting local tlb flush. 
> > 
> > Of course, in case of KVM guest, copy-based would be always bin win.
> > So shouldn't we support both approach? It could make code very ugly
> > but I think it has enough value.
> > 
> > Any thought?
> 
> Thanks for testing on ARM.
> 
> I can add the pgtable assisted method back in, no problem.
> The question is by which criteria are we going to choose
> which method to use? By arch (i.e. ARM -> pgtable assist,
> x86 -> copy, other archs -> ?)?

I prefer your previous version __HAVE_LOCAL_FLUSH_TLB_KERNEL_RANGE.
If you didn't implement that function for x86, it simply uses memcpy
version while ARM can use tlb flush version if we add the definary.

Of course, it would be better to select best choice by testing
benchmark for all of architecture but that architecture would be
changed in future, too so we need further testing periodically.
And we will have no time then, too.
For reducing the burden, we can detect it automatically while module
is loading or booting but it tackles with booting time. :(
So, let's put it aside as further works.
At the moment, let's think simply two arch(x86, ARM) until other arch
user doesn't raise a hand for volunteering.

Yes. it could be a problem in future if other arch which support
local flush want to use memcpy but IMHO, it's very hard to kill
two bird(portability and performance) with one stone. :(

> 
> Also, what changes did you make to zsmapbench to measure
> elapsed time/cycles on ARM?  Afaik, rdtscll() is not
> supported on ARM.

I used local_clock instead of arch dependent code and makes longer test time
from 1 sec to 10 sec.

> 
> Thanks,
> Seth
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID
From: Minchan Kim <minchan@kernel.org>
To: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Nitin Gupta <ngupta@vflare.org>,
	Robert Jennings <rcj@linux.vnet.ibm.com>,
	linux-mm@kvack.org, devel@driverdev.osuosl.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] zsmalloc improvements
Date: Thu, 12 Jul 2012 10:01:05 +0900	[thread overview]
Message-ID: <20120712010105.GA5503@bbox> (raw)
In-Reply-To: <4FFD86FE.1090307@linux.vnet.ibm.com>

On Wed, Jul 11, 2012 at 09:00:30AM -0500, Seth Jennings wrote:
> On 07/11/2012 02:03 AM, Minchan Kim wrote:
> > On 07/03/2012 06:15 AM, Seth Jennings wrote:
> >> zsmapbench measures the copy-based mapping at ~560 cycles for a
> >> map/unmap operation on spanned object for both KVM guest and bare-metal,
> >> while the page table mapping was ~1500 cycles on a VM and ~760 cycles
> >> bare-metal.  The cycles for the copy method will vary with
> >> allocation size, however, it is still faster even for the largest
> >> allocation that zsmalloc supports.
> >>
> >> The result is convenient though, as mempcy is very portable :)
> > 
> > Today, I tested zsmapbench in my embedded board(ARM).
> > tlb-flush is 30% faster than copy-based so it's always not win.
> > I think it depends on CPU speed/cache size.
> > 
> > zram is already very popular on embedded systems so I want to use
> > it continuously without 30% big demage so I want to keep our old approach
> > which supporting local tlb flush. 
> > 
> > Of course, in case of KVM guest, copy-based would be always bin win.
> > So shouldn't we support both approach? It could make code very ugly
> > but I think it has enough value.
> > 
> > Any thought?
> 
> Thanks for testing on ARM.
> 
> I can add the pgtable assisted method back in, no problem.
> The question is by which criteria are we going to choose
> which method to use? By arch (i.e. ARM -> pgtable assist,
> x86 -> copy, other archs -> ?)?

I prefer your previous version __HAVE_LOCAL_FLUSH_TLB_KERNEL_RANGE.
If you didn't implement that function for x86, it simply uses memcpy
version while ARM can use tlb flush version if we add the definary.

Of course, it would be better to select best choice by testing
benchmark for all of architecture but that architecture would be
changed in future, too so we need further testing periodically.
And we will have no time then, too.
For reducing the burden, we can detect it automatically while module
is loading or booting but it tackles with booting time. :(
So, let's put it aside as further works.
At the moment, let's think simply two arch(x86, ARM) until other arch
user doesn't raise a hand for volunteering.

Yes. it could be a problem in future if other arch which support
local flush want to use memcpy but IMHO, it's very hard to kill
two bird(portability and performance) with one stone. :(

> 
> Also, what changes did you make to zsmapbench to measure
> elapsed time/cycles on ARM?  Afaik, rdtscll() is not
> supported on ARM.

I used local_clock instead of arch dependent code and makes longer test time
from 1 sec to 10 sec.

> 
> Thanks,
> Seth
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2012-07-12  1:01 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-02 21:15 Seth Jennings
2012-07-02 21:15 ` Seth Jennings
2012-07-02 21:15 ` [PATCH 1/4] zsmalloc: remove x86 dependency Seth Jennings
2012-07-02 21:15   ` Seth Jennings
2012-07-10  2:21   ` Minchan Kim
2012-07-10  2:21     ` Minchan Kim
2012-07-10 15:29     ` Seth Jennings
2012-07-10 15:29       ` Seth Jennings
2012-07-11  7:27       ` Minchan Kim
2012-07-11  7:27         ` Minchan Kim
2012-07-11 18:26   ` Nitin Gupta
2012-07-11 18:26     ` Nitin Gupta
2012-07-11 20:32     ` Seth Jennings
2012-07-11 20:32       ` Seth Jennings
2012-07-11 22:42       ` Nitin Gupta
2012-07-11 22:42         ` Nitin Gupta
2012-07-12  0:23         ` Seth Jennings
2012-07-12  0:23           ` Seth Jennings
2012-07-02 21:15 ` [PATCH 2/4] zsmalloc: add single-page object fastpath in unmap Seth Jennings
2012-07-02 21:15   ` Seth Jennings
2012-07-10  2:25   ` Minchan Kim
2012-07-10  2:25     ` Minchan Kim
2012-07-02 21:15 ` [PATCH 3/4] zsmalloc: add details to zs_map_object boiler plate Seth Jennings
2012-07-02 21:15   ` Seth Jennings
2012-07-10  2:35   ` Minchan Kim
2012-07-10  2:35     ` Minchan Kim
2012-07-10 15:17     ` Seth Jennings
2012-07-10 15:17       ` Seth Jennings
2012-07-11  7:42       ` Minchan Kim
2012-07-11  7:42         ` Minchan Kim
2012-07-11 14:15         ` Seth Jennings
2012-07-11 14:15           ` Seth Jennings
2012-07-12  1:15           ` Minchan Kim
2012-07-12  1:15             ` Minchan Kim
2012-07-12 19:54             ` Dan Magenheimer
2012-07-12 19:54               ` Dan Magenheimer
2012-07-12 22:46               ` Dan Magenheimer
2012-07-12 22:46                 ` Dan Magenheimer
2012-07-02 21:15 ` [PATCH 4/4] zsmalloc: add mapping modes Seth Jennings
2012-07-02 21:15   ` Seth Jennings
2012-07-04  5:33 ` [PATCH 0/4] zsmalloc improvements Minchan Kim
2012-07-04  5:33   ` Minchan Kim
2012-07-04 20:43 ` Konrad Rzeszutek Wilk
2012-07-04 20:43   ` Konrad Rzeszutek Wilk
2012-07-06 15:07   ` Seth Jennings
2012-07-06 15:07     ` Seth Jennings
2012-07-09 13:58     ` Seth Jennings
2012-07-09 13:58       ` Seth Jennings
2012-07-11 19:42       ` Konrad Rzeszutek Wilk
2012-07-11 19:42         ` Konrad Rzeszutek Wilk
2012-07-11 20:48         ` Seth Jennings
2012-07-11 20:48           ` Seth Jennings
2012-07-12 10:40           ` Konrad Rzeszutek Wilk
2012-07-12 10:40             ` Konrad Rzeszutek Wilk
2012-07-11  7:03 ` Minchan Kim
2012-07-11  7:03   ` Minchan Kim
2012-07-11 14:00   ` Seth Jennings
2012-07-11 14:00     ` Seth Jennings
2012-07-12  1:01     ` Minchan Kim [this message]
2012-07-12  1:01       ` Minchan Kim
2012-07-11 19:16   ` Seth Jennings
2012-07-11 19:16     ` Seth Jennings

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120712010105.GA5503@bbox \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=dan.magenheimer@oracle.com \
    --cc=devel@driverdev.osuosl.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ngupta@vflare.org \
    --cc=rcj@linux.vnet.ibm.com \
    --cc=sjenning@linux.vnet.ibm.com \
    --subject='Re: [PATCH 0/4] zsmalloc improvements' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.