linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] slub: reduce overhead of slub_debug
@ 2011-06-26 19:39 Marcin Slusarz
  2011-06-28 19:32 ` Christoph Lameter
  2011-07-07 18:07 ` Pekka Enberg
  0 siblings, 2 replies; 22+ messages in thread
From: Marcin Slusarz @ 2011-06-26 19:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, Matt Mackall; +Cc: LKML, linux-mm

slub checks for poison one byte by one, which is highly inefficient
and shows up frequently as a highest cpu-eater in perf top.

Joining reads gives nice speedup:

(Compiling some project with different options)
                                 make -j12    make clean
slub_debug disabled:             1m 27s       1.2 s
slub_debug enabled:              1m 46s       7.6 s
slub_debug enabled + this patch: 1m 33s       3.2 s

check_bytes still shows up high, but not always at the top.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: linux-mm@kvack.org
---
 mm/slub.c |   36 ++++++++++++++++++++++++++++++++++--
 1 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 35f351f..a40ef2d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -557,10 +557,10 @@ static void init_object(struct kmem_cache *s, void *object, u8 val)
 		memset(p + s->objsize, val, s->inuse - s->objsize);
 }
 
-static u8 *check_bytes(u8 *start, unsigned int value, unsigned int bytes)
+static u8 *check_bytes8(u8 *start, u8 value, unsigned int bytes)
 {
 	while (bytes) {
-		if (*start != (u8)value)
+		if (*start != value)
 			return start;
 		start++;
 		bytes--;
@@ -568,6 +568,38 @@ static u8 *check_bytes(u8 *start, unsigned int value, unsigned int bytes)
 	return NULL;
 }
 
+static u8 *check_bytes(u8 *start, u8 value, unsigned int bytes)
+{
+	u64 value64;
+	unsigned int words, prefix;
+
+	if (bytes <= 16)
+		return check_bytes8(start, value, bytes);
+
+	value64 = value | value << 8 | value << 16 | value << 24;
+	value64 = value64 | value64 << 32;
+	prefix = 8 - ((unsigned long)start) % 8;
+
+	if (prefix) {
+		u8 *r = check_bytes8(start, value, prefix);
+		if (r)
+			return r;
+		start += prefix;
+		bytes -= prefix;
+	}
+
+	words = bytes / 8;
+
+	while (words) {
+		if (*(u64 *)start != value64)
+			return check_bytes8(start, value, 8);
+		start += 8;
+		words--;
+	}
+
+	return check_bytes8(start, value, bytes % 8);
+}
+
 static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
 						void *from, void *to)
 {
-- 
1.7.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-06-26 19:39 [PATCH] slub: reduce overhead of slub_debug Marcin Slusarz
@ 2011-06-28 19:32 ` Christoph Lameter
  2011-06-28 19:40   ` David Daney
  2011-07-07 18:07 ` Pekka Enberg
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Lameter @ 2011-06-28 19:32 UTC (permalink / raw)
  To: Marcin Slusarz; +Cc: Pekka Enberg, Matt Mackall, LKML, linux-mm

On Sun, 26 Jun 2011, Marcin Slusarz wrote:

> slub checks for poison one byte by one, which is highly inefficient
> and shows up frequently as a highest cpu-eater in perf top.

Ummm.. Performance improvements for debugging modes? If you need
performance then switch off debuggin.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-06-28 19:32 ` Christoph Lameter
@ 2011-06-28 19:40   ` David Daney
  2011-06-28 20:58     ` David Rientjes
  0 siblings, 1 reply; 22+ messages in thread
From: David Daney @ 2011-06-28 19:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Marcin Slusarz, Pekka Enberg, Matt Mackall, LKML, linux-mm

On 06/28/2011 12:32 PM, Christoph Lameter wrote:
> On Sun, 26 Jun 2011, Marcin Slusarz wrote:
>
>> slub checks for poison one byte by one, which is highly inefficient
>> and shows up frequently as a highest cpu-eater in perf top.
>
> Ummm.. Performance improvements for debugging modes? If you need
> performance then switch off debuggin.

There is no reason to make things gratuitously slow.  I don't know about 
the merits of this particular patch, but I must disagree with the 
general sentiment.

We have high performance tracing, why not improve this as well.

Just last week I was trying to find the cause of memory corruption that 
only occurred at very high network packet rates.  Memory allocation 
speed was definitely getting in the way of debugging.  For me, faster 
SLUB debugging would be welcome.

David Daney

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-06-28 19:40   ` David Daney
@ 2011-06-28 20:58     ` David Rientjes
  2011-06-28 21:04       ` Ben Greear
  2011-06-28 21:16       ` Dave Jones
  0 siblings, 2 replies; 22+ messages in thread
From: David Rientjes @ 2011-06-28 20:58 UTC (permalink / raw)
  To: David Daney
  Cc: Christoph Lameter, Marcin Slusarz, Pekka Enberg, Matt Mackall,
	LKML, linux-mm

On Tue, 28 Jun 2011, David Daney wrote:

> On 06/28/2011 12:32 PM, Christoph Lameter wrote:
> > On Sun, 26 Jun 2011, Marcin Slusarz wrote:
> > 
> > > slub checks for poison one byte by one, which is highly inefficient
> > > and shows up frequently as a highest cpu-eater in perf top.
> > 
> > Ummm.. Performance improvements for debugging modes? If you need
> > performance then switch off debuggin.
> 
> There is no reason to make things gratuitously slow.  I don't know about the
> merits of this particular patch, but I must disagree with the general
> sentiment.
> 
> We have high performance tracing, why not improve this as well.
> 
> Just last week I was trying to find the cause of memory corruption that only
> occurred at very high network packet rates.  Memory allocation speed was
> definitely getting in the way of debugging.  For me, faster SLUB debugging
> would be welcome.
> 

SLUB debugging is useful only to diagnose issues or test new code, nobody 
is going to be enabling it in production environment.  We don't need 30 
new lines of code that make one thing slightly faster, in fact we'd prefer 
to have as simple and minimal code as possible for debugging features 
unless you're adding more debugging coverage.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-06-28 20:58     ` David Rientjes
@ 2011-06-28 21:04       ` Ben Greear
  2011-06-28 21:10         ` David Rientjes
  2011-06-28 21:16       ` Dave Jones
  1 sibling, 1 reply; 22+ messages in thread
From: Ben Greear @ 2011-06-28 21:04 UTC (permalink / raw)
  To: David Rientjes
  Cc: David Daney, Christoph Lameter, Marcin Slusarz, Pekka Enberg,
	Matt Mackall, LKML, linux-mm

On 06/28/2011 01:58 PM, David Rientjes wrote:
> On Tue, 28 Jun 2011, David Daney wrote:
>
>> On 06/28/2011 12:32 PM, Christoph Lameter wrote:
>>> On Sun, 26 Jun 2011, Marcin Slusarz wrote:
>>>
>>>> slub checks for poison one byte by one, which is highly inefficient
>>>> and shows up frequently as a highest cpu-eater in perf top.
>>>
>>> Ummm.. Performance improvements for debugging modes? If you need
>>> performance then switch off debuggin.
>>
>> There is no reason to make things gratuitously slow.  I don't know about the
>> merits of this particular patch, but I must disagree with the general
>> sentiment.
>>
>> We have high performance tracing, why not improve this as well.
>>
>> Just last week I was trying to find the cause of memory corruption that only
>> occurred at very high network packet rates.  Memory allocation speed was
>> definitely getting in the way of debugging.  For me, faster SLUB debugging
>> would be welcome.
>>
>
> SLUB debugging is useful only to diagnose issues or test new code, nobody
> is going to be enabling it in production environment.  We don't need 30
> new lines of code that make one thing slightly faster, in fact we'd prefer
> to have as simple and minimal code as possible for debugging features
> unless you're adding more debugging coverage.

If your problem happens under load, then the overhead of slub could significantly
change the behaviour of the system.  Anything that makes it more efficient without
unduly complicating code should be a win.  That posted patch wasn't all that
complicated, and even if it has bugs, it could be fixed easily enough.

Thanks,
Ben

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-06-28 21:04       ` Ben Greear
@ 2011-06-28 21:10         ` David Rientjes
  0 siblings, 0 replies; 22+ messages in thread
From: David Rientjes @ 2011-06-28 21:10 UTC (permalink / raw)
  To: Ben Greear
  Cc: David Daney, Christoph Lameter, Marcin Slusarz, Pekka Enberg,
	Matt Mackall, LKML, linux-mm

On Tue, 28 Jun 2011, Ben Greear wrote:

> > SLUB debugging is useful only to diagnose issues or test new code, nobody
> > is going to be enabling it in production environment.  We don't need 30
> > new lines of code that make one thing slightly faster, in fact we'd prefer
> > to have as simple and minimal code as possible for debugging features
> > unless you're adding more debugging coverage.
> 
> If your problem happens under load, then the overhead of slub could
> significantly
> change the behaviour of the system.

You're talking about slub debugging and not slub in general, I assume.

> Anything that makes it more efficient
> without
> unduly complicating code should be a win.  That posted patch wasn't all that
> complicated, and even if it has bugs, it could be fixed easily enough.
> 

"Even if it has bugs"?  Ask Pekka to merge this and be on the receiving 
end of every other kernel development's flames when slub debugging no 
longer finds their problems but instead has bugs of its own.

Again, we want simple and minimal slub debugging code unless you're adding 
a new feature.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-06-28 20:58     ` David Rientjes
  2011-06-28 21:04       ` Ben Greear
@ 2011-06-28 21:16       ` Dave Jones
  1 sibling, 0 replies; 22+ messages in thread
From: Dave Jones @ 2011-06-28 21:16 UTC (permalink / raw)
  To: David Rientjes
  Cc: David Daney, Christoph Lameter, Marcin Slusarz, Pekka Enberg,
	Matt Mackall, LKML, linux-mm

On Tue, Jun 28, 2011 at 01:58:06PM -0700, David Rientjes wrote:

 > SLUB debugging is useful only to diagnose issues or test new code, nobody 
 > is going to be enabling it in production environment.  We don't need 30 
 > new lines of code that make one thing slightly faster

During five of the six months of development, Fedora kernels are built with
slub debugging forced on.  And it turns up problems every single release.

Having the impact of this be lower is very desirable. If this doesn't get
merged, I'd be tempted to carry it in Fedora regardless for this reason.

	Dave


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-06-26 19:39 [PATCH] slub: reduce overhead of slub_debug Marcin Slusarz
  2011-06-28 19:32 ` Christoph Lameter
@ 2011-07-07 18:07 ` Pekka Enberg
  2011-07-07 18:17   ` Christoph Lameter
  1 sibling, 1 reply; 22+ messages in thread
From: Pekka Enberg @ 2011-07-07 18:07 UTC (permalink / raw)
  To: Marcin Slusarz; +Cc: Christoph Lameter, Matt Mackall, LKML, rientjes, linux-mm

On Sun, 26 Jun 2011, Marcin Slusarz wrote:
> slub checks for poison one byte by one, which is highly inefficient
> and shows up frequently as a highest cpu-eater in perf top.
>
> Joining reads gives nice speedup:
>
> (Compiling some project with different options)
>                                 make -j12    make clean
> slub_debug disabled:             1m 27s       1.2 s
> slub_debug enabled:              1m 46s       7.6 s
> slub_debug enabled + this patch: 1m 33s       3.2 s
>
> check_bytes still shows up high, but not always at the top.
>
> Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Matt Mackall <mpm@selenic.com>
> Cc: linux-mm@kvack.org
> ---

Looks good to me. Christoph, David, ?

> mm/slub.c |   36 ++++++++++++++++++++++++++++++++++--
> 1 files changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 35f351f..a40ef2d 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -557,10 +557,10 @@ static void init_object(struct kmem_cache *s, void *object, u8 val)
> 		memset(p + s->objsize, val, s->inuse - s->objsize);
> }
>
> -static u8 *check_bytes(u8 *start, unsigned int value, unsigned int bytes)
> +static u8 *check_bytes8(u8 *start, u8 value, unsigned int bytes)
> {
> 	while (bytes) {
> -		if (*start != (u8)value)
> +		if (*start != value)
> 			return start;
> 		start++;
> 		bytes--;
> @@ -568,6 +568,38 @@ static u8 *check_bytes(u8 *start, unsigned int value, unsigned int bytes)
> 	return NULL;
> }
>
> +static u8 *check_bytes(u8 *start, u8 value, unsigned int bytes)
> +{
> +	u64 value64;
> +	unsigned int words, prefix;
> +
> +	if (bytes <= 16)
> +		return check_bytes8(start, value, bytes);
> +
> +	value64 = value | value << 8 | value << 16 | value << 24;
> +	value64 = value64 | value64 << 32;
> +	prefix = 8 - ((unsigned long)start) % 8;
> +
> +	if (prefix) {
> +		u8 *r = check_bytes8(start, value, prefix);
> +		if (r)
> +			return r;
> +		start += prefix;
> +		bytes -= prefix;
> +	}
> +
> +	words = bytes / 8;
> +
> +	while (words) {
> +		if (*(u64 *)start != value64)
> +			return check_bytes8(start, value, 8);
> +		start += 8;
> +		words--;
> +	}
> +
> +	return check_bytes8(start, value, bytes % 8);
> +}
> +
> static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
> 						void *from, void *to)
> {
> -- 
> 1.7.5.3
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 18:07 ` Pekka Enberg
@ 2011-07-07 18:17   ` Christoph Lameter
  2011-07-07 18:30     ` Ben Greear
                       ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Christoph Lameter @ 2011-07-07 18:17 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Marcin Slusarz, Matt Mackall, LKML, rientjes, linux-mm

On Thu, 7 Jul 2011, Pekka Enberg wrote:

> Looks good to me. Christoph, David, ?

The reason debug code is there is because it is useless overhead typically
not needed. There is no point in optimizing the code that is not run in
production environments unless there are gross performance issues that
make debugging difficult. A performance patch for debugging would have to
cause significant performance improvements. This patch does not do that
nor was there such an issue to be addressed in the first place.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 18:17   ` Christoph Lameter
@ 2011-07-07 18:30     ` Ben Greear
  2011-07-07 18:42       ` Christoph Lameter
  2011-07-07 18:30     ` Matt Mackall
  2011-07-07 18:52     ` Pekka Enberg
  2 siblings, 1 reply; 22+ messages in thread
From: Ben Greear @ 2011-07-07 18:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Marcin Slusarz, Matt Mackall, LKML, rientjes, linux-mm

On 07/07/2011 11:17 AM, Christoph Lameter wrote:
> On Thu, 7 Jul 2011, Pekka Enberg wrote:
>
>> Looks good to me. Christoph, David, ?
>
> The reason debug code is there is because it is useless overhead typically
> not needed. There is no point in optimizing the code that is not run in
> production environments unless there are gross performance issues that
> make debugging difficult. A performance patch for debugging would have to
> cause significant performance improvements. This patch does not do that
> nor was there such an issue to be addressed in the first place.

The more painful you make it, the less likely folks are to use it
in environments that actually reproduce bugs, so I think it's quite
short-sighted to reject such performance improvements out of hand.

And what if some production machine has funny crashes in a specific
work-load....wouldn't it be nice if it could enable debugging and
still perform well enough to do it's job?

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 18:17   ` Christoph Lameter
  2011-07-07 18:30     ` Ben Greear
@ 2011-07-07 18:30     ` Matt Mackall
  2011-07-07 18:52     ` Pekka Enberg
  2 siblings, 0 replies; 22+ messages in thread
From: Matt Mackall @ 2011-07-07 18:30 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, Marcin Slusarz, LKML, rientjes, linux-mm

On Thu, 2011-07-07 at 13:17 -0500, Christoph Lameter wrote:
> On Thu, 7 Jul 2011, Pekka Enberg wrote:
> 
> > Looks good to me. Christoph, David, ?
> 
> The reason debug code is there is because it is useless overhead typically
> not needed. There is no point in optimizing the code that is not run in
> production environments unless there are gross performance issues that
> make debugging difficult. A performance patch for debugging would have to
> cause significant performance improvements. This patch does not do that
> nor was there such an issue to be addressed in the first place.

Deja vu.

-- 
Mathematics is the supreme nostalgia of our time.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 18:30     ` Ben Greear
@ 2011-07-07 18:42       ` Christoph Lameter
  2011-07-07 18:54         ` Ben Greear
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Lameter @ 2011-07-07 18:42 UTC (permalink / raw)
  To: Ben Greear
  Cc: Pekka Enberg, Marcin Slusarz, Matt Mackall, LKML, rientjes, linux-mm

On Thu, 7 Jul 2011, Ben Greear wrote:

> The more painful you make it, the less likely folks are to use it
> in environments that actually reproduce bugs, so I think it's quite
> short-sighted to reject such performance improvements out of hand.
>
> And what if some production machine has funny crashes in a specific
> work-load....wouldn't it be nice if it could enable debugging and
> still perform well enough to do it's job?

Sure if there would be significant improvements that accomplish what
you claim above then that would be certainly worthwhile. Come up with
patches like that please.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 18:17   ` Christoph Lameter
  2011-07-07 18:30     ` Ben Greear
  2011-07-07 18:30     ` Matt Mackall
@ 2011-07-07 18:52     ` Pekka Enberg
  2011-07-07 18:55       ` Matt Mackall
  2011-07-07 19:12       ` Christoph Lameter
  2 siblings, 2 replies; 22+ messages in thread
From: Pekka Enberg @ 2011-07-07 18:52 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Marcin Slusarz, Matt Mackall, LKML, rientjes, linux-mm

On Thu, 7 Jul 2011, Pekka Enberg wrote:
> > Looks good to me. Christoph, David, ?

On Thu, 2011-07-07 at 13:17 -0500, Christoph Lameter wrote:
> The reason debug code is there is because it is useless overhead typically
> not needed. There is no point in optimizing the code that is not run in
> production environments unless there are gross performance issues that
> make debugging difficult. A performance patch for debugging would have to
> cause significant performance improvements. This patch does not do that
> nor was there such an issue to be addressed in the first place.

Is there something technically wrong with the patch? Quoting the patch
email:

  (Compiling some project with different options)
                                 make -j12    make clean
  slub_debug disabled:             1m 27s       1.2 s
  slub_debug enabled:              1m 46s       7.6 s
  slub_debug enabled + this patch: 1m 33s       3.2 s

  check_bytes still shows up high, but not always at the top.

That's significant enough speedup for me!

			Pekka


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 18:42       ` Christoph Lameter
@ 2011-07-07 18:54         ` Ben Greear
  0 siblings, 0 replies; 22+ messages in thread
From: Ben Greear @ 2011-07-07 18:54 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Marcin Slusarz, Matt Mackall, LKML, rientjes, linux-mm

On 07/07/2011 11:42 AM, Christoph Lameter wrote:
> On Thu, 7 Jul 2011, Ben Greear wrote:
>
>> The more painful you make it, the less likely folks are to use it
>> in environments that actually reproduce bugs, so I think it's quite
>> short-sighted to reject such performance improvements out of hand.
>>
>> And what if some production machine has funny crashes in a specific
>> work-load....wouldn't it be nice if it could enable debugging and
>> still perform well enough to do it's job?
>
> Sure if there would be significant improvements that accomplish what
> you claim above then that would be certainly worthwhile. Come up with
> patches like that please.

The patch appears to make some work loads twice as fast ('make clean'),
and it had a reasonable speedup to the 'make -j12'.  What do you
consider 'significant'?

I'm willing to do some other network-related benchmarks with his patch if
that would give it better chance of being accepted.  (I end up running
with SLUB debug quite a bit on big, heavy, workloads...so any speedup
in that would be a big help for us...)

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 18:52     ` Pekka Enberg
@ 2011-07-07 18:55       ` Matt Mackall
  2011-07-07 19:12       ` Christoph Lameter
  1 sibling, 0 replies; 22+ messages in thread
From: Matt Mackall @ 2011-07-07 18:55 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Christoph Lameter, Marcin Slusarz, LKML, rientjes, linux-mm

On Thu, 2011-07-07 at 21:52 +0300, Pekka Enberg wrote:
> On Thu, 7 Jul 2011, Pekka Enberg wrote:
> > > Looks good to me. Christoph, David, ?
> 
> On Thu, 2011-07-07 at 13:17 -0500, Christoph Lameter wrote:
> > The reason debug code is there is because it is useless overhead typically
> > not needed. There is no point in optimizing the code that is not run in
> > production environments unless there are gross performance issues that
> > make debugging difficult. A performance patch for debugging would have to
> > cause significant performance improvements. This patch does not do that
> > nor was there such an issue to be addressed in the first place.
> 
> Is there something technically wrong with the patch? Quoting the patch
> email:
> 
>   (Compiling some project with different options)
>                                  make -j12    make clean
>   slub_debug disabled:             1m 27s       1.2 s
>   slub_debug enabled:              1m 46s       7.6 s
>   slub_debug enabled + this patch: 1m 33s       3.2 s
> 
>   check_bytes still shows up high, but not always at the top.
> 
> That's significant enough speedup for me!

We're not going to make any progress on this; Christoph conveniently
forgets the counterarguments from week to week.

-- 
Mathematics is the supreme nostalgia of our time.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 18:52     ` Pekka Enberg
  2011-07-07 18:55       ` Matt Mackall
@ 2011-07-07 19:12       ` Christoph Lameter
  2011-07-07 19:21         ` David Miller
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Lameter @ 2011-07-07 19:12 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Marcin Slusarz, Matt Mackall, LKML, rientjes, linux-mm

On Thu, 7 Jul 2011, Pekka Enberg wrote:

> On Thu, 7 Jul 2011, Pekka Enberg wrote:
> > > Looks good to me. Christoph, David, ?
>
> On Thu, 2011-07-07 at 13:17 -0500, Christoph Lameter wrote:
> > The reason debug code is there is because it is useless overhead typically
> > not needed. There is no point in optimizing the code that is not run in
> > production environments unless there are gross performance issues that
> > make debugging difficult. A performance patch for debugging would have to
> > cause significant performance improvements. This patch does not do that
> > nor was there such an issue to be addressed in the first place.
>
> Is there something technically wrong with the patch? Quoting the patch
> email:
>
>   (Compiling some project with different options)
>                                  make -j12    make clean
>   slub_debug disabled:             1m 27s       1.2 s
>   slub_debug enabled:              1m 46s       7.6 s
>   slub_debug enabled + this patch: 1m 33s       3.2 s
>
>   check_bytes still shows up high, but not always at the top.
>
> That's significant enough speedup for me!

Ok. I had a different set of numbers in mind from earlier posts.

The benefit here comes from accessing memory in larger (word) chunks
instead of byte wise. This is a form of memscan() with inverse matching.

Isnt there an asm optimized version that can do this much better (there is
one for memscan())? Optimizing this in core code by codeing something as
generic as that is not that good since the arch code can deliver better
performance and it seems that this is functionality that could be useful
elsewhere.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 19:12       ` Christoph Lameter
@ 2011-07-07 19:21         ` David Miller
  2011-07-07 19:49           ` Pekka Enberg
  0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2011-07-07 19:21 UTC (permalink / raw)
  To: cl; +Cc: penberg, marcin.slusarz, mpm, linux-kernel, rientjes, linux-mm

From: Christoph Lameter <cl@linux.com>
Date: Thu, 7 Jul 2011 14:12:37 -0500 (CDT)

> On Thu, 7 Jul 2011, Pekka Enberg wrote:
> 
>> On Thu, 7 Jul 2011, Pekka Enberg wrote:
>> > > Looks good to me. Christoph, David, ?
>>
>> On Thu, 2011-07-07 at 13:17 -0500, Christoph Lameter wrote:
>> > The reason debug code is there is because it is useless overhead typically
>> > not needed. There is no point in optimizing the code that is not run in
>> > production environments unless there are gross performance issues that
>> > make debugging difficult. A performance patch for debugging would have to
>> > cause significant performance improvements. This patch does not do that
>> > nor was there such an issue to be addressed in the first place.
>>
>> Is there something technically wrong with the patch? Quoting the patch
>> email:
>>
>>   (Compiling some project with different options)
>>                                  make -j12    make clean
>>   slub_debug disabled:             1m 27s       1.2 s
>>   slub_debug enabled:              1m 46s       7.6 s
>>   slub_debug enabled + this patch: 1m 33s       3.2 s
>>
>>   check_bytes still shows up high, but not always at the top.
>>
>> That's significant enough speedup for me!
> 
> Ok. I had a different set of numbers in mind from earlier posts.
> 
> The benefit here comes from accessing memory in larger (word) chunks
> instead of byte wise. This is a form of memscan() with inverse matching.
> 
> Isnt there an asm optimized version that can do this much better (there is
> one for memscan())? Optimizing this in core code by codeing something as
> generic as that is not that good since the arch code can deliver better
> performance and it seems that this is functionality that could be useful
> elsewhere.

You're being so unreasonable, just let the optimization in, refine it
with follow-on patches.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 19:21         ` David Miller
@ 2011-07-07 19:49           ` Pekka Enberg
  2011-07-07 20:12             ` Christoph Lameter
  0 siblings, 1 reply; 22+ messages in thread
From: Pekka Enberg @ 2011-07-07 19:49 UTC (permalink / raw)
  To: David Miller; +Cc: cl, marcin.slusarz, mpm, linux-kernel, rientjes, linux-mm

On Thu, Jul 7, 2011 at 10:21 PM, David Miller <davem@davemloft.net> wrote:
> From: Christoph Lameter <cl@linux.com>
> Date: Thu, 7 Jul 2011 14:12:37 -0500 (CDT)
>
>> On Thu, 7 Jul 2011, Pekka Enberg wrote:
>>
>>> On Thu, 7 Jul 2011, Pekka Enberg wrote:
>>> > > Looks good to me. Christoph, David, ?
>>>
>>> On Thu, 2011-07-07 at 13:17 -0500, Christoph Lameter wrote:
>>> > The reason debug code is there is because it is useless overhead typically
>>> > not needed. There is no point in optimizing the code that is not run in
>>> > production environments unless there are gross performance issues that
>>> > make debugging difficult. A performance patch for debugging would have to
>>> > cause significant performance improvements. This patch does not do that
>>> > nor was there such an issue to be addressed in the first place.
>>>
>>> Is there something technically wrong with the patch? Quoting the patch
>>> email:
>>>
>>>   (Compiling some project with different options)
>>>                                  make -j12    make clean
>>>   slub_debug disabled:             1m 27s       1.2 s
>>>   slub_debug enabled:              1m 46s       7.6 s
>>>   slub_debug enabled + this patch: 1m 33s       3.2 s
>>>
>>>   check_bytes still shows up high, but not always at the top.
>>>
>>> That's significant enough speedup for me!
>>
>> Ok. I had a different set of numbers in mind from earlier posts.
>>
>> The benefit here comes from accessing memory in larger (word) chunks
>> instead of byte wise. This is a form of memscan() with inverse matching.
>>
>> Isnt there an asm optimized version that can do this much better (there is
>> one for memscan())? Optimizing this in core code by codeing something as
>> generic as that is not that good since the arch code can deliver better
>> performance and it seems that this is functionality that could be useful
>> elsewhere.
>
> You're being so unreasonable, just let the optimization in, refine it
> with follow-on patches.

I applied the patch. I think a follow up patch that moves the function
to lib/string.c with proper generic name would be in order. Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 19:49           ` Pekka Enberg
@ 2011-07-07 20:12             ` Christoph Lameter
  2011-07-08  5:23               ` Andi Kleen
  2011-07-08  5:38               ` Pekka Enberg
  0 siblings, 2 replies; 22+ messages in thread
From: Christoph Lameter @ 2011-07-07 20:12 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Miller, marcin.slusarz, mpm, linux-kernel, rientjes, linux-mm

On Thu, 7 Jul 2011, Pekka Enberg wrote:

> I applied the patch. I think a follow up patch that moves the function
> to lib/string.c with proper generic name would be in order. Thanks!

Well this is really straightforward. Hasnt seen much testing yet and
needs refinement but it would be like this:


---
 arch/x86/include/asm/string_32.h |    2 ++
 arch/x86/lib/string_32.c         |   17 +++++++++++++++++
 include/linux/string.h           |    3 +++
 lib/string.c                     |   25 +++++++++++++++++++++++++
 mm/slub.c                        |   13 ++++++-------
 5 files changed, 53 insertions(+), 7 deletions(-)

Index: linux-2.6/arch/x86/lib/string_32.c
===================================================================
--- linux-2.6.orig/arch/x86/lib/string_32.c	2011-07-07 15:03:46.000000000 -0500
+++ linux-2.6/arch/x86/lib/string_32.c	2011-07-07 15:03:56.000000000 -0500
@@ -214,6 +214,23 @@ void *memscan(void *addr, int c, size_t
 EXPORT_SYMBOL(memscan);
 #endif

+#ifdef __HAVE_ARCH_INV_MEMSCAN
+void *inv_memscan(void *addr, int c, size_t size)
+{
+	if (!size)
+		return addr;
+	asm volatile("repz; scasb\n\t"
+	    "jz 1f\n\t"
+	    "dec %%edi\n"
+	    "1:"
+	    : "=D" (addr), "=c" (size)
+	    : "0" (addr), "1" (size), "a" (c)
+	    : "memory");
+	return addr;
+}
+EXPORT_SYMBOL(memscan);
+#endif
+
 #ifdef __HAVE_ARCH_STRNLEN
 size_t strnlen(const char *s, size_t count)
 {
Index: linux-2.6/include/linux/string.h
===================================================================
--- linux-2.6.orig/include/linux/string.h	2011-07-07 15:03:46.000000000 -0500
+++ linux-2.6/include/linux/string.h	2011-07-07 15:03:56.000000000 -0500
@@ -108,6 +108,9 @@ extern void * memmove(void *,const void
 #ifndef __HAVE_ARCH_MEMSCAN
 extern void * memscan(void *,int,__kernel_size_t);
 #endif
+#ifndef __HAVE_ARCH_INV_MEMSCAN
+extern void * inv_memscan(void *,int,__kernel_size_t);
+#endif
 #ifndef __HAVE_ARCH_MEMCMP
 extern int memcmp(const void *,const void *,__kernel_size_t);
 #endif
Index: linux-2.6/lib/string.c
===================================================================
--- linux-2.6.orig/lib/string.c	2011-07-07 15:03:46.000000000 -0500
+++ linux-2.6/lib/string.c	2011-07-07 15:03:56.000000000 -0500
@@ -684,6 +684,31 @@ void *memscan(void *addr, int c, size_t
 EXPORT_SYMBOL(memscan);
 #endif

+#ifndef __HAVE_ARCH_INV_MEMSCAN
+/**
+ * memscan - Skip characters in an area of memory.
+ * @addr: The memory area
+ * @c: The byte to skip
+ * @size: The size of the area.
+ *
+ * returns the address of the first mismatch of @c, or 1 byte past
+ * the area if @c matches to the end
+ */
+void *inv_memscan(void *addr, int c, size_t size)
+{
+	unsigned char *p = addr;
+
+	while (size) {
+		if (*p != c)
+			return (void *)p;
+		p++;
+		size--;
+	}
+  	return (void *)p;
+}
+EXPORT_SYMBOL(inv_memscan);
+#endif
+
 #ifndef __HAVE_ARCH_STRSTR
 /**
  * strstr - Find the first substring in a %NUL terminated string
Index: linux-2.6/arch/x86/include/asm/string_32.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/string_32.h	2011-07-07 15:05:44.000000000 -0500
+++ linux-2.6/arch/x86/include/asm/string_32.h	2011-07-07 15:06:16.000000000 -0500
@@ -336,6 +336,8 @@ void *__constant_c_and_count_memset(void
  */
 #define __HAVE_ARCH_MEMSCAN
 extern void *memscan(void *addr, int c, size_t size);
+#define __HAVE_ARCH_INV_MEMSCAN
+extern void *inv_memscan(void *addr, int c, size_t size);

 #endif /* __KERNEL__ */

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-07-07 15:04:11.000000000 -0500
+++ linux-2.6/mm/slub.c	2011-07-07 15:10:21.000000000 -0500
@@ -559,13 +559,12 @@ static void init_object(struct kmem_cach

 static u8 *check_bytes(u8 *start, unsigned int value, unsigned int bytes)
 {
-	while (bytes) {
-		if (*start != (u8)value)
-			return start;
-		start++;
-		bytes--;
-	}
-	return NULL;
+	u8 *p = inv_memscan(start, value, bytes);
+
+	if (p == start + bytes)
+		return NULL;
+
+	return p;
 }

 static void restore_bytes(struct kmem_cache *s, char *message, u8 data,

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 20:12             ` Christoph Lameter
@ 2011-07-08  5:23               ` Andi Kleen
  2011-07-08 17:41                 ` Christoph Lameter
  2011-07-08  5:38               ` Pekka Enberg
  1 sibling, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2011-07-08  5:23 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, David Miller, marcin.slusarz, mpm, linux-kernel,
	rientjes, linux-mm

Christoph Lameter <cl@linux.com> writes:


> +#ifdef __HAVE_ARCH_INV_MEMSCAN
> +void *inv_memscan(void *addr, int c, size_t size)
> +{
> +	if (!size)
> +		return addr;
> +	asm volatile("repz; scasb\n\t"

This will just do the slow byte accesses again internally.
scasb is not normally very optimized in microcode as far
as I know.

Also rep has quite some startup overhead which makes
it a bad idea for small sizes (<16-20 or so)

I would stay with the C version. I bet that one is 
faster.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-07 20:12             ` Christoph Lameter
  2011-07-08  5:23               ` Andi Kleen
@ 2011-07-08  5:38               ` Pekka Enberg
  1 sibling, 0 replies; 22+ messages in thread
From: Pekka Enberg @ 2011-07-08  5:38 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Miller, marcin.slusarz, mpm, linux-kernel, rientjes,
	linux-mm, Andrew Morton

On Thu, Jul 7, 2011 at 11:12 PM, Christoph Lameter <cl@linux.com> wrote:
> On Thu, 7 Jul 2011, Pekka Enberg wrote:
>
>> I applied the patch. I think a follow up patch that moves the function
>> to lib/string.c with proper generic name would be in order. Thanks!
>
> Well this is really straightforward. Hasnt seen much testing yet and
> needs refinement but it would be like this:
>
>
> ---
>  arch/x86/include/asm/string_32.h |    2 ++
>  arch/x86/lib/string_32.c         |   17 +++++++++++++++++
>  include/linux/string.h           |    3 +++
>  lib/string.c                     |   25 +++++++++++++++++++++++++
>  mm/slub.c                        |   13 ++++++-------
>  5 files changed, 53 insertions(+), 7 deletions(-)
>
> Index: linux-2.6/arch/x86/lib/string_32.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/lib/string_32.c     2011-07-07 15:03:46.000000000 -0500
> +++ linux-2.6/arch/x86/lib/string_32.c  2011-07-07 15:03:56.000000000 -0500
> @@ -214,6 +214,23 @@ void *memscan(void *addr, int c, size_t
>  EXPORT_SYMBOL(memscan);
>  #endif
>
> +#ifdef __HAVE_ARCH_INV_MEMSCAN
> +void *inv_memscan(void *addr, int c, size_t size)
> +{
> +       if (!size)
> +               return addr;
> +       asm volatile("repz; scasb\n\t"
> +           "jz 1f\n\t"
> +           "dec %%edi\n"
> +           "1:"
> +           : "=D" (addr), "=c" (size)
> +           : "0" (addr), "1" (size), "a" (c)
> +           : "memory");
> +       return addr;
> +}
> +EXPORT_SYMBOL(memscan);
> +#endif
> +
>  #ifdef __HAVE_ARCH_STRNLEN
>  size_t strnlen(const char *s, size_t count)
>  {
> Index: linux-2.6/include/linux/string.h
> ===================================================================
> --- linux-2.6.orig/include/linux/string.h       2011-07-07 15:03:46.000000000 -0500
> +++ linux-2.6/include/linux/string.h    2011-07-07 15:03:56.000000000 -0500
> @@ -108,6 +108,9 @@ extern void * memmove(void *,const void
>  #ifndef __HAVE_ARCH_MEMSCAN
>  extern void * memscan(void *,int,__kernel_size_t);
>  #endif
> +#ifndef __HAVE_ARCH_INV_MEMSCAN
> +extern void * inv_memscan(void *,int,__kernel_size_t);
> +#endif
>  #ifndef __HAVE_ARCH_MEMCMP
>  extern int memcmp(const void *,const void *,__kernel_size_t);
>  #endif
> Index: linux-2.6/lib/string.c
> ===================================================================
> --- linux-2.6.orig/lib/string.c 2011-07-07 15:03:46.000000000 -0500
> +++ linux-2.6/lib/string.c      2011-07-07 15:03:56.000000000 -0500
> @@ -684,6 +684,31 @@ void *memscan(void *addr, int c, size_t
>  EXPORT_SYMBOL(memscan);
>  #endif
>
> +#ifndef __HAVE_ARCH_INV_MEMSCAN
> +/**
> + * memscan - Skip characters in an area of memory.
> + * @addr: The memory area
> + * @c: The byte to skip
> + * @size: The size of the area.
> + *
> + * returns the address of the first mismatch of @c, or 1 byte past
> + * the area if @c matches to the end
> + */
> +void *inv_memscan(void *addr, int c, size_t size)

I think this needs a better name. My suggestion is memskip().

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] slub: reduce overhead of slub_debug
  2011-07-08  5:23               ` Andi Kleen
@ 2011-07-08 17:41                 ` Christoph Lameter
  0 siblings, 0 replies; 22+ messages in thread
From: Christoph Lameter @ 2011-07-08 17:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Pekka Enberg, David Miller, marcin.slusarz, mpm, linux-kernel,
	rientjes, linux-mm

On Thu, 7 Jul 2011, Andi Kleen wrote:

> Christoph Lameter <cl@linux.com> writes:
>
>
> > +#ifdef __HAVE_ARCH_INV_MEMSCAN
> > +void *inv_memscan(void *addr, int c, size_t size)
> > +{
> > +	if (!size)
> > +		return addr;
> > +	asm volatile("repz; scasb\n\t"
>
> This will just do the slow byte accesses again internally.
> scasb is not normally very optimized in microcode as far
> as I know.
>
> Also rep has quite some startup overhead which makes
> it a bad idea for small sizes (<16-20 or so)
>
> I would stay with the C version. I bet that one is
> faster.

If the c code is such an improvement then memscan and other
implementations can be accellerated in the same way. That would be useful
in general. We can get rid of the implementation for memscan and friends
in x86 arch code.




^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2011-07-08 17:41 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-26 19:39 [PATCH] slub: reduce overhead of slub_debug Marcin Slusarz
2011-06-28 19:32 ` Christoph Lameter
2011-06-28 19:40   ` David Daney
2011-06-28 20:58     ` David Rientjes
2011-06-28 21:04       ` Ben Greear
2011-06-28 21:10         ` David Rientjes
2011-06-28 21:16       ` Dave Jones
2011-07-07 18:07 ` Pekka Enberg
2011-07-07 18:17   ` Christoph Lameter
2011-07-07 18:30     ` Ben Greear
2011-07-07 18:42       ` Christoph Lameter
2011-07-07 18:54         ` Ben Greear
2011-07-07 18:30     ` Matt Mackall
2011-07-07 18:52     ` Pekka Enberg
2011-07-07 18:55       ` Matt Mackall
2011-07-07 19:12       ` Christoph Lameter
2011-07-07 19:21         ` David Miller
2011-07-07 19:49           ` Pekka Enberg
2011-07-07 20:12             ` Christoph Lameter
2011-07-08  5:23               ` Andi Kleen
2011-07-08 17:41                 ` Christoph Lameter
2011-07-08  5:38               ` Pekka Enberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).