linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [TRIVIAL] kstrdup
@ 2003-04-18  7:58 Rusty Trivial Russell
  2003-04-18  8:10 ` Jeff Garzik
  0 siblings, 1 reply; 20+ messages in thread
From: Rusty Trivial Russell @ 2003-04-18  7:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

From:  Rusty Russell <rusty@rustcorp.com.au>

  Everyone loves reimplementing strdup.  Give them a kstrdup (basically
  from drivers/md).
  
  Rusty.
  --
    Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
  
  Name: kstrdup
  Author: Neil Brown and Rusty Russell
  Status: Trivial
  
  D: Everyone loves reimplementing strdup.  Give them a kstrdup.
  

--- trivial-2.5.67-bk8/drivers/md/dm-ioctl.c.orig	2003-04-18 17:43:57.000000000 +1000
+++ trivial-2.5.67-bk8/drivers/md/dm-ioctl.c	2003-04-18 17:43:57.000000000 +1000
@@ -14,6 +14,7 @@
 #include <linux/wait.h>
 #include <linux/blk.h>
 #include <linux/slab.h>
+#include <linux/string.h>
 
 #include <asm/uaccess.h>
 
@@ -118,14 +119,6 @@
 /*-----------------------------------------------------------------
  * Inserting, removing and renaming a device.
  *---------------------------------------------------------------*/
-static inline char *kstrdup(const char *str)
-{
-	char *r = kmalloc(strlen(str) + 1, GFP_KERNEL);
-	if (r)
-		strcpy(r, str);
-	return r;
-}
-
 static struct hash_cell *alloc_cell(const char *name, const char *uuid,
 				    struct mapped_device *md)
 {
@@ -135,7 +128,7 @@
 	if (!hc)
 		return NULL;
 
-	hc->name = kstrdup(name);
+	hc->name = kstrdup(name, GFP_KERNEL);
 	if (!hc->name) {
 		kfree(hc);
 		return NULL;
@@ -145,7 +138,7 @@
 		hc->uuid = NULL;
 
 	else {
-		hc->uuid = kstrdup(uuid);
+		hc->uuid = kstrdup(uuid, GFP_KERNEL);
 		if (!hc->uuid) {
 			kfree(hc->name);
 			kfree(hc);
@@ -270,7 +263,7 @@
 	/*
 	 * duplicate new.
 	 */
-	new_name = kstrdup(new);
+	new_name = kstrdup(new, GFP_KERNEL);
 	if (!new_name)
 		return -ENOMEM;
 
--- trivial-2.5.67-bk8/include/linux/string.h.orig	2003-04-18 17:43:57.000000000 +1000
+++ trivial-2.5.67-bk8/include/linux/string.h	2003-04-18 17:43:57.000000000 +1000
@@ -78,6 +78,8 @@
 extern void * memchr(const void *,int,__kernel_size_t);
 #endif
 
+extern char *kstrdup(const char *s, int gfp);
+
 #ifdef __cplusplus
 }
 #endif
--- trivial-2.5.67-bk8/kernel/ksyms.c.orig	2003-04-18 17:43:57.000000000 +1000
+++ trivial-2.5.67-bk8/kernel/ksyms.c	2003-04-18 17:43:57.000000000 +1000
@@ -585,6 +585,7 @@
 EXPORT_SYMBOL(strnicmp);
 EXPORT_SYMBOL(strspn);
 EXPORT_SYMBOL(strsep);
+EXPORT_SYMBOL(kstrdup);
 
 /* software interrupts */
 EXPORT_SYMBOL(tasklet_init);
--- trivial-2.5.67-bk8/lib/string.c.orig	2003-04-18 17:43:57.000000000 +1000
+++ trivial-2.5.67-bk8/lib/string.c	2003-04-18 17:43:57.000000000 +1000
@@ -22,6 +22,7 @@
 #include <linux/types.h>
 #include <linux/string.h>
 #include <linux/ctype.h>
+#include <linux/slab.h>
 
 #ifndef __HAVE_ARCH_STRNICMP
 /**
@@ -525,5 +526,12 @@
 	}
 	return NULL;
 }
-
 #endif
+
+char *kstrdup(const char *s, int gfp)
+{
+	char *buf = kmalloc(strlen(s)+1, gfp);
+	if (buf)
+		strcpy(buf, s);
+	return buf;
+}
-- 
  Don't blame me: the Monkey is driving
  File: Rusty Russell <rusty@rustcorp.com.au>: [PATCH] [TRIVIAL] kstrdup

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18  7:58 [TRIVIAL] kstrdup Rusty Trivial Russell
@ 2003-04-18  8:10 ` Jeff Garzik
  2003-04-18 16:20   ` Linus Torvalds
  2003-04-19  4:14   ` Rusty Russell
  0 siblings, 2 replies; 20+ messages in thread
From: Jeff Garzik @ 2003-04-18  8:10 UTC (permalink / raw)
  To: Rusty Trivial Russell; +Cc: Linus Torvalds, linux-kernel

Rusty Trivial Russell wrote:
> +char *kstrdup(const char *s, int gfp)
> +{
> +	char *buf = kmalloc(strlen(s)+1, gfp);
> +	if (buf)
> +		strcpy(buf, s);
> +	return buf;
> +}



You should save the strlen result to a temp var, and then s/strcpy/memcpy/

	Jeff




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18  8:10 ` Jeff Garzik
@ 2003-04-18 16:20   ` Linus Torvalds
  2003-04-18 16:56     ` Jeff Garzik
  2003-04-19  4:14   ` Rusty Russell
  1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2003-04-18 16:20 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Rusty Trivial Russell, linux-kernel


On Fri, 18 Apr 2003, Jeff Garzik wrote:
> 
> You should save the strlen result to a temp var, and then s/strcpy/memcpy/

No, you should just not do this. I don't see the point.

		Linus


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18 16:20   ` Linus Torvalds
@ 2003-04-18 16:56     ` Jeff Garzik
  2003-04-18 18:00       ` Linus Torvalds
  2003-04-18 18:00       ` Richard B. Johnson
  0 siblings, 2 replies; 20+ messages in thread
From: Jeff Garzik @ 2003-04-18 16:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rusty Trivial Russell, linux-kernel

Linus Torvalds wrote:
> On Fri, 18 Apr 2003, Jeff Garzik wrote:
> 
>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
> 
> 
> No, you should just not do this. I don't see the point.


strcpy has a test for each byte of its contents, and memcpy doesn't.
Why search 's' for NULL twice?

	Jeff




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18 16:56     ` Jeff Garzik
@ 2003-04-18 18:00       ` Linus Torvalds
  2003-04-18 18:44         ` Jeff Garzik
  2003-04-19  4:52         ` Rusty Russell
  2003-04-18 18:00       ` Richard B. Johnson
  1 sibling, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2003-04-18 18:00 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Rusty Trivial Russell, linux-kernel


On Fri, 18 Apr 2003, Jeff Garzik wrote:
> Linus Torvalds wrote:
> > On Fri, 18 Apr 2003, Jeff Garzik wrote:
> > 
> >>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
> > 
> > No, you should just not do this. I don't see the point.
> 
> strcpy has a test for each byte of its contents, and memcpy doesn't.
> Why search 's' for NULL twice?

No, my point is that kstrdup() _itself_ just shouldn't be done. I don't
see it as being worthy of kernel support. Most of the kernel string data
structures are NOT random zero-ended strings anyway: they are either
strictly limited in some ways ("ends in '\0', but limited to PATH_MAX), or
they are explicitly sized ("struct qstr").

I don't much personally like C strings. Explicit lengths tend to have a
lot of advantages, and while a lot of the standard C infrastructure is for 
zero-ended strings, they do end up being even worse in the kernel than in 
user space (think buffer overflows and limited allocations in general).

			Linus


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18 16:56     ` Jeff Garzik
  2003-04-18 18:00       ` Linus Torvalds
@ 2003-04-18 18:00       ` Richard B. Johnson
  2003-04-18 18:40         ` Jeff Garzik
  1 sibling, 1 reply; 20+ messages in thread
From: Richard B. Johnson @ 2003-04-18 18:00 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, Rusty Trivial Russell, linux-kernel

On Fri, 18 Apr 2003, Jeff Garzik wrote:

> Linus Torvalds wrote:
> > On Fri, 18 Apr 2003, Jeff Garzik wrote:
> >
> >>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
> >
> >
> > No, you should just not do this. I don't see the point.
>
>
> strcpy has a test for each byte of its contents, and memcpy doesn't.
> Why search 's' for NULL twice?
>
> 	Jeff

Because it doesn't. strcpy() is usually implimented by getting
the string-length, using the same code sequence as strlen(), then
using the same code sequence as memcpy(), but copying the null-byte
as well. The check for the null-byte is done in the length routine.

If you do a memcpy(a, b, strlen(b));, then you are making two
procedure calls and dirtying the cache twice..

A typical Intel procedure, stripped of the push/pops to save
registers is here....

		cld
		movl	SRC(%esp), %edi
		movl	%edi, %esi	# Source = ESI and EDI
		orl	$-1, %ecx
		xorl	%eax,%eax	# Zero
		repnz	scasb		# Look for the null-byte
		negl	%ecx		# ecx = string length
		incl	%ecx		# copy the null
		movl	DST(%esp), %edi	# Destination buffer EDI
		shrl	$1, %ecx	# Make word count
		pushf			# Save result
		shrl	$1, %ecx	# Make longword count
		rep	movsl		# copy longwords
		adcl	%ecx, %ecx	# Possible word count
		rep	movsw		# Copy possible last word
		popf			# Result from first shift
		adcl	%ecx, %ecx	# Possible byte-count
		rep	movsb		# Copy last byte

There are no jumps and the locality of action is good. This could
be improved in the corner cases of long, aligned buffers.

A lot of persons who are unfamiliar with tools other than 'C' think
that strcpy() is made like this:

	while(*dsp++ = *src++)
                   ;

... ahaaa .. Wouderful short portable code! Wrong, awful code, possibly
portable.


.L2:
	movl 8(%ebp),%edx	# Memory reference (destination)
	movl 12(%ebp),%ecx	# Memory reference (source)
	movb (%ecx),%al		# Get a byte
	movb %al,(%edx)		# Put it into destination buffer
	incl 12(%ebp)		# Memory reference (bump the source)
	incl 8(%ebp)		# Memory reference (bump the destination)
	testb %al,%al		# Check for zero on every byte
	jne .L4			# Why a jump to here? (should be jne .L2)
	jmp .L3
	.align 4
.L4:
	jmp .L2			# Now we go back to the beginning.
	.align 4
.L3:
.L1:
	movl %ebp,%esp
	popl %ebp
	ret


Now; this code is 'shorter', but it is very slow compared to the
smarter strcpy() as previously shown.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18 18:00       ` Richard B. Johnson
@ 2003-04-18 18:40         ` Jeff Garzik
  2003-04-18 19:24           ` H. Peter Anvin
  2003-04-18 19:29           ` Richard B. Johnson
  0 siblings, 2 replies; 20+ messages in thread
From: Jeff Garzik @ 2003-04-18 18:40 UTC (permalink / raw)
  To: root; +Cc: Linus Torvalds, Rusty Trivial Russell, linux-kernel

Richard B. Johnson wrote:
> On Fri, 18 Apr 2003, Jeff Garzik wrote:
> 
> 
>>Linus Torvalds wrote:
>>
>>>On Fri, 18 Apr 2003, Jeff Garzik wrote:
>>>
>>>
>>>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
>>>
>>>
>>>No, you should just not do this. I don't see the point.
>>
>>
>>strcpy has a test for each byte of its contents, and memcpy doesn't.
>>Why search 's' for NULL twice?
>>
>>	Jeff
> 
> 
> Because it doesn't. strcpy() is usually implimented by getting
> the string-length, using the same code sequence as strlen(), then
> using the same code sequence as memcpy(), but copying the null-byte
> as well. The check for the null-byte is done in the length routine.
> 
> If you do a memcpy(a, b, strlen(b));, then you are making two
> procedure calls and dirtying the cache twice..

Wrong, because we have to call strlen _anyway_, to provide the size to 
kmalloc.


> A typical Intel procedure, stripped of the push/pops to save
> registers is here....

That's kinda cute.  Why not submit a patch to the strcpy implementation 
in include/asm-i386/string.h?  :)  Ours is shorter, but does have a jump:
         "1:\tlodsb\n\t"
         "stosb\n\t"
         "testb %%al,%%al\n\t"
         "jne 1b"

Which is better?  I don't know; I'm still learning the performance 
eccentricities of x86 insns on various processors.


Related x86 question:  if the memory buffer is not dword-aligned, is 
'rep movsl' the best idea?  On RISC it's usually smarter to unroll the 
head of the loop to avoid unaligned accesses; but from reading x86 asm 
code in the kernel, nobody seems to care about that.  Is the 
unaligned-access penalty so small that the increased code size of the 
head-unroll is never worth it?


> A lot of persons who are unfamiliar with tools other than 'C' think
> that strcpy() is made like this:
> 
> 	while(*dsp++ = *src++)
>                    ;

In fact, that's basically the kernel's non-arch-specific implementation :)

	Jeff




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18 18:00       ` Linus Torvalds
@ 2003-04-18 18:44         ` Jeff Garzik
  2003-04-19  4:52         ` Rusty Russell
  1 sibling, 0 replies; 20+ messages in thread
From: Jeff Garzik @ 2003-04-18 18:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rusty Trivial Russell, linux-kernel

Linus Torvalds wrote:
> On Fri, 18 Apr 2003, Jeff Garzik wrote:
> 
>>Linus Torvalds wrote:
>>
>>>On Fri, 18 Apr 2003, Jeff Garzik wrote:
>>>
>>>
>>>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
>>>
>>>No, you should just not do this. I don't see the point.
>>
>>strcpy has a test for each byte of its contents, and memcpy doesn't.
>>Why search 's' for NULL twice?
> 
> 
> No, my point is that kstrdup() _itself_ just shouldn't be done. I don't

Ah, indeed :)


> see it as being worthy of kernel support. Most of the kernel string data
> structures are NOT random zero-ended strings anyway: they are either
> strictly limited in some ways ("ends in '\0', but limited to PATH_MAX), or
> they are explicitly sized ("struct qstr").

Quite true

	Jeff




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18 18:40         ` Jeff Garzik
@ 2003-04-18 19:24           ` H. Peter Anvin
  2003-04-18 23:37             ` Alan Cox
  2003-04-18 19:29           ` Richard B. Johnson
  1 sibling, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2003-04-18 19:24 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <3EA0469D.7090602@pobox.com>
By author:    Jeff Garzik <jgarzik@pobox.com>
In newsgroup: linux.dev.kernel
> 
> That's kinda cute.  Why not submit a patch to the strcpy implementation 
> in include/asm-i386/string.h?  :)  Ours is shorter, but does have a jump:
>          "1:\tlodsb\n\t"
>          "stosb\n\t"
>          "testb %%al,%%al\n\t"
>          "jne 1b"
> 
> Which is better?  I don't know; I'm still learning the performance 
> eccentricities of x86 insns on various processors.
> 

It varies from porocessor to processor, and also depends on usage.

> 
> Related x86 question:  if the memory buffer is not dword-aligned, is 
> 'rep movsl' the best idea?  On RISC it's usually smarter to unroll the 
> head of the loop to avoid unaligned accesses; but from reading x86 asm 
> code in the kernel, nobody seems to care about that.  Is the 
> unaligned-access penalty so small that the increased code size of the 
> head-unroll is never worth it?
> 

A lot of the newer x86 processors will perform this unrolling in
microcode.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18 18:40         ` Jeff Garzik
  2003-04-18 19:24           ` H. Peter Anvin
@ 2003-04-18 19:29           ` Richard B. Johnson
  2003-04-19 11:45             ` Kai Henningsen
  2003-04-19 20:16             ` Ruth Ivimey-Cook
  1 sibling, 2 replies; 20+ messages in thread
From: Richard B. Johnson @ 2003-04-18 19:29 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, Rusty Trivial Russell, linux-kernel

On Fri, 18 Apr 2003, Jeff Garzik wrote:

> Richard B. Johnson wrote:
> > On Fri, 18 Apr 2003, Jeff Garzik wrote:
> >
> >
> >>Linus Torvalds wrote:
> >>
> >>>On Fri, 18 Apr 2003, Jeff Garzik wrote:
> >>>
> >>>
> >>>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
> >>>
> >>>
> >>>No, you should just not do this. I don't see the point.
> >>
> >>
> >>strcpy has a test for each byte of its contents, and memcpy doesn't.
> >>Why search 's' for NULL twice?
> >>
> >>	Jeff
> >
> >
> > Because it doesn't. strcpy() is usually implemented by getting
> > the string-length, using the same code sequence as strlen(), then
> > using the same code sequence as memcpy(), but copying the null-byte
> > as well. The check for the null-byte is done in the length routine.
> >
> > If you do a memcpy(a, b, strlen(b));, then you are making two
> > procedure calls and dirtying the cache twice..
>
> Wrong, because we have to call strlen _anyway_, to provide the size to
> kmalloc.
>
>
> > A typical Intel procedure, stripped of the push/pops to save
> > registers is here....
>
> That's kinda cute.  Why not submit a patch to the strcpy implementation
> in include/asm-i386/string.h?  :)  Ours is shorter, but does have a jump:
>          "1:\tlodsb\n\t"
>          "stosb\n\t"
>          "testb %%al,%%al\n\t"
>          "jne 1b"
>

Years ago I did submit patches of all kinds, mostly 'asm' stuff
because I have been doing assembly for over 20 years. However,
I got tired of being shot-down in flames by persons who haven't
a clue, so I stopped doing that.

The history of the stuff in asm-i386 is full of changes, with
many bugs introduced by persons who tried so save a nanosecond
here and there. In recent times (past two years) somebody changed
the stuff back to things which were not optimum, but quite obviously
correct.

I might 'risk' sending in some patches again. Maybe.

> Which is better?  I don't know; I'm still learning the performance
> eccentricities of x86 insns on various processors.
>

The test for every byte transferred is, quite obviously, correct.
It is also, quite obviously, non optimum.

>
> Related x86 question:  if the memory buffer is not dword-aligned, is
> 'rep movsl' the best idea?  On RISC it's usually smarter to unroll the
> head of the loop to avoid unaligned accesses; but from reading x86 asm
> code in the kernel, nobody seems to care about that.  Is the
> unaligned-access penalty so small that the increased code size of the
> head-unroll is never worth it?
>

Unaligned access takes a penalty. On early i586 machines, it was
horrible, doubled the access time. On i486 and, later on i686
machines, the access times are not changed as radically. Many
of the changes, that were later reverted back, to the string
and memory 'asm' routines occurred when the 'awful' i586
came out. Of course, the unaligned access on the i586 was
still faster than the i486 with aligned access (because of
clock speeds). However, it was worth the trouble to improve
the assembly routines at that time.

>
> > A lot of persons who are unfamiliar with tools other than 'C' think
> > that strcpy() is made like this:
> >
> > 	while(*dsp++ = *src++)
> >                    ;
>
> In fact, that's basically the kernel's non-arch-specific implementation :)
>
> 	Jeff

Yep. Naive code looks so 'simple', must be "optimum", no? ;^).


Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18 19:24           ` H. Peter Anvin
@ 2003-04-18 23:37             ` Alan Cox
  0 siblings, 0 replies; 20+ messages in thread
From: Alan Cox @ 2003-04-18 23:37 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux Kernel Mailing List

On Gwe, 2003-04-18 at 20:24, H. Peter Anvin wrote:
> Followup to:  <3EA0469D.7090602@pobox.com>
> > 
> > Which is better?  I don't know; I'm still learning the performance 
> > eccentricities of x86 insns on various processors.
> > 
> 
> It varies from porocessor to processor, and also depends on usage.

Stepping to stepping, even clock speed at stepping in the case of some B
series pentiums. For large copies we hit the kernel optimised stuff like
MMX copiers, for small copies its irrelevant 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18  8:10 ` Jeff Garzik
  2003-04-18 16:20   ` Linus Torvalds
@ 2003-04-19  4:14   ` Rusty Russell
  2003-04-19  4:48     ` Jeff Garzik
  1 sibling, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2003-04-19  4:14 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, linux-kernel

In message <3E9FB2E9.9040308@pobox.com> you write:
> Rusty Trivial Russell wrote:
> > +char *kstrdup(const char *s, int gfp)
> > +{
> > +	char *buf = kmalloc(strlen(s)+1, gfp);
> > +	if (buf)
> > +		strcpy(buf, s);
> > +	return buf;
> > +}
> 
> You should save the strlen result to a temp var, and then s/strcpy/memcpy/

Completely disagree.  Write the most straightforward code possible,
and then if there proves to be a problem, optimize.  Optimizations
where there's no actual performance problem should be left to the
compiler.

Case in point: gcc-3.2 on -O2 on Intel is one instruction longer for
your version.

Hope that helps,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-19  4:14   ` Rusty Russell
@ 2003-04-19  4:48     ` Jeff Garzik
  2003-04-19  8:28       ` Rusty Russell
  2003-04-19 12:27       ` Alan Cox
  0 siblings, 2 replies; 20+ messages in thread
From: Jeff Garzik @ 2003-04-19  4:48 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Linus Torvalds, linux-kernel

Rusty Russell wrote:
> In message <3E9FB2E9.9040308@pobox.com> you write:
> 
>>Rusty Trivial Russell wrote:
>>
>>>+char *kstrdup(const char *s, int gfp)
>>>+{
>>>+	char *buf = kmalloc(strlen(s)+1, gfp);
>>>+	if (buf)
>>>+		strcpy(buf, s);
>>>+	return buf;
>>>+}
>>
>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
> 
> 
> Completely disagree.  Write the most straightforward code possible,
> and then if there proves to be a problem, optimize.  Optimizations
> where there's no actual performance problem should be left to the
> compiler.

Since the kernel does its own string ops, the compiler does not have 
enough information to deduce that further optimization is possible.


> Case in point: gcc-3.2 on -O2 on Intel is one instruction longer for
> your version.

And?  It's still slower.

	Jeff



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18 18:00       ` Linus Torvalds
  2003-04-18 18:44         ` Jeff Garzik
@ 2003-04-19  4:52         ` Rusty Russell
  1 sibling, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2003-04-19  4:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

In message <Pine.LNX.4.44.0304181055290.2950-100000@home.transmeta.com> you write:
> 
> No, my point is that kstrdup() _itself_ just shouldn't be done. I don't
> see it as being worthy of kernel support.

parport, afs, intermezzo, sunrpc, md, sound and um have their own
strdup variants.  ecard.c (ARM), xtalk.c (ia64), ide.c, blkmtd.c
(mtd), dlci.c (wan), parport (again), md (again), moctotek.c (usb),
scsiglue.c (usb), super.c (affs), inode.c (nfs), and generic.c (proc)
all open code it.

Most of them use it multuple times.

I think unifying the seven implementations, and over fifty uses, is a
minor but worthwhile goal.  It's not as widely used as it would be in
userspace, but still...

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-19  4:48     ` Jeff Garzik
@ 2003-04-19  8:28       ` Rusty Russell
  2003-04-19 12:27       ` Alan Cox
  1 sibling, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2003-04-19  8:28 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, linux-kernel

In message <3EA0D524.7010309@pobox.com> you write:
> Since the kernel does its own string ops, the compiler does not have 
> enough information to deduce that further optimization is possible.

You're right, I was measuring without the kernel string ops.
> > Case in point: gcc-3.2 on -O2 on Intel is one instruction longer for
> > your version.
> 
> And?  It's still slower.

Who gives a flying fuck?  You're doing an allocation in there
ferchissakes.  Choose the simplest option.  Jeff, if you have time to
post on this, I think you need a hobby 8)

char *__constant_kstrdup(const char *s, unsigned int size, int gfp)
{
	char *buf = kmalloc(size, gfp);
	if (likely(buf))
		memcpy(buf, s, size);
	return buf;
}

char *__kstrdup(const char *s, int gfp)
{
	return __constant_kstrdup(s, strlen(s)+1, gfp);
}

#define kstrdup(str, gfp)						\
	(__builtin_constant_p(str) && sizeof(str) != sizeof(char *) ?	\
		__constant_kstrdup(str, sizeof(str), gfp)		\
	: __kstrdup(str, gfp))

Feature list:
1) Optimizes the constant kstrdup case,
2) Doesn't multi-evaluate args (except if constant string),
3) Uses likely() to bias against the case of kmalloc failing.

OK, so I guess I need a hobby, too..
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18 19:29           ` Richard B. Johnson
@ 2003-04-19 11:45             ` Kai Henningsen
  2003-04-19 20:16             ` Ruth Ivimey-Cook
  1 sibling, 0 replies; 20+ messages in thread
From: Kai Henningsen @ 2003-04-19 11:45 UTC (permalink / raw)
  To: linux-kernel

root@chaos.analogic.com (Richard B. Johnson)  wrote on 18.04.03 in <Pine.LNX.4.53.0304181512220.22901@chaos>:

> The test for every byte transferred is, quite obviously, correct.
> It is also, quite obviously, non optimum.

Actually, that is very much not obvious.

Especially if you're familiar with architectures where every move has an  
implicit test (typically for zero and sign), and so checking for the zero  
byte during the move is quite obviously the only sane thing to do - the  
version with a count is slower, because the inner loop does more. (Those  
architectures typically don't have a REP-style prefix.)

MfG Kai

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-19  4:48     ` Jeff Garzik
  2003-04-19  8:28       ` Rusty Russell
@ 2003-04-19 12:27       ` Alan Cox
  2003-04-19 14:44         ` Bernd Eckenfels
  2003-04-20  8:05         ` Rusty Russell
  1 sibling, 2 replies; 20+ messages in thread
From: Alan Cox @ 2003-04-19 12:27 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Rusty Russell, Linus Torvalds, Linux Kernel Mailing List

On Sad, 2003-04-19 at 05:48, Jeff Garzik wrote:
> > Completely disagree.  Write the most straightforward code possible,
> > and then if there proves to be a problem, optimize.  Optimizations
> > where there's no actual performance problem should be left to the
> > compiler.
> 
> Since the kernel does its own string ops, the compiler does not have 
> enough information to deduce that further optimization is possible.
> 
> 
> > Case in point: gcc-3.2 on -O2 on Intel is one instruction longer for
> > your version.
> 
> And?  It's still slower.

You are arguing over a 1 instruction, probably sub 1 clock scheduling
matter on a call which is not used on any fast or common path. If you
shaved 1 clock off the timer handling instead you'd make a lot more
difference..


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-19 12:27       ` Alan Cox
@ 2003-04-19 14:44         ` Bernd Eckenfels
  2003-04-20  8:05         ` Rusty Russell
  1 sibling, 0 replies; 20+ messages in thread
From: Bernd Eckenfels @ 2003-04-19 14:44 UTC (permalink / raw)
  To: linux-kernel

In article <1050755240.3277.3.camel@dhcp22.swansea.linux.org.uk> you wrote:
> You are arguing over a 1 instruction, probably sub 1 clock scheduling
> matter on a call which is not used on any fast or common path. If you
> shaved 1 clock off the timer handling instead you'd make a lot more
> difference..

i think he is arguing about the performance of strcpy vs. memcpy.

Is the 1 instruction longer including the inlined code for those two?

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-18 19:29           ` Richard B. Johnson
  2003-04-19 11:45             ` Kai Henningsen
@ 2003-04-19 20:16             ` Ruth Ivimey-Cook
  1 sibling, 0 replies; 20+ messages in thread
From: Ruth Ivimey-Cook @ 2003-04-19 20:16 UTC (permalink / raw)
  To: root, Jeff Garzik; +Cc: Linus Torvalds, Rusty Trivial Russell, linux-kernel

At 20:29 18/04/2003, Richard B. Johnson wrote:
> > > A lot of persons who are unfamiliar with tools other than 'C' think
> > > that strcpy() is made like this:
> > >
> > >     while(*dsp++ = *src++)
> > >                    ;
> >
> > In fact, that's basically the kernel's non-arch-specific implementation :)
> >
> >       Jeff
>
>Yep. Naive code looks so 'simple', must be "optimum", no? ;^).


Well, actually on ARM at least it is optimal, given the likely short length 
of strings. To do any better at all you have to assume stringlength of many 
tens of words, which is unlikely. For comparison, this is what armcc 
produces for the above code:

                   strdup PROC
                   |L1.0|
000000  e4d12001          LDRB     a3,[a2],#1            ;1
000004  e4c02001          STRB     a3,[a1],#1            ;1
000008  e3520000          CMP      a3,#0                 ;1
00000c  1afffffb          BNE      |L1.0|                ;1
000010  e1a0f00e          MOV      pc,lr                 ;1


AFAICT there is no better solution.

Ruth


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [TRIVIAL] kstrdup
  2003-04-19 12:27       ` Alan Cox
  2003-04-19 14:44         ` Bernd Eckenfels
@ 2003-04-20  8:05         ` Rusty Russell
  1 sibling, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2003-04-20  8:05 UTC (permalink / raw)
  To: Alan Cox; +Cc: Jeff Garzik, Linus Torvalds, Linux Kernel Mailing List

In message <1050755240.3277.3.camel@dhcp22.swansea.linux.org.uk> you write:
> On Sad, 2003-04-19 at 05:48, Jeff Garzik wrote:
> > And?  It's still slower.
> 
> You are arguing over a 1 instruction, probably sub 1 clock scheduling
> matter on a call which is not used on any fast or common path. If you
> shaved 1 clock off the timer handling instead you'd make a lot more
> difference..

Hey, if we have 50 million machines running 2.6 for three years, this
optimization saves 100 clock cycles per boot (most kstrdups are device
init), and machines boot once a month, and average 1GHz, that's three
minutes of saved time.

I think Jeff and I should start the kstrdup sourceforge project
immediately...

Cheers,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2003-04-20  8:00 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-18  7:58 [TRIVIAL] kstrdup Rusty Trivial Russell
2003-04-18  8:10 ` Jeff Garzik
2003-04-18 16:20   ` Linus Torvalds
2003-04-18 16:56     ` Jeff Garzik
2003-04-18 18:00       ` Linus Torvalds
2003-04-18 18:44         ` Jeff Garzik
2003-04-19  4:52         ` Rusty Russell
2003-04-18 18:00       ` Richard B. Johnson
2003-04-18 18:40         ` Jeff Garzik
2003-04-18 19:24           ` H. Peter Anvin
2003-04-18 23:37             ` Alan Cox
2003-04-18 19:29           ` Richard B. Johnson
2003-04-19 11:45             ` Kai Henningsen
2003-04-19 20:16             ` Ruth Ivimey-Cook
2003-04-19  4:14   ` Rusty Russell
2003-04-19  4:48     ` Jeff Garzik
2003-04-19  8:28       ` Rusty Russell
2003-04-19 12:27       ` Alan Cox
2003-04-19 14:44         ` Bernd Eckenfels
2003-04-20  8:05         ` Rusty Russell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).