* [TRIVIAL] kstrdup
@ 2003-04-18 7:58 Rusty Trivial Russell
2003-04-18 8:10 ` Jeff Garzik
0 siblings, 1 reply; 20+ messages in thread
From: Rusty Trivial Russell @ 2003-04-18 7:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel
From: Rusty Russell <rusty@rustcorp.com.au>
Everyone loves reimplementing strdup. Give them a kstrdup (basically
from drivers/md).
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
Name: kstrdup
Author: Neil Brown and Rusty Russell
Status: Trivial
D: Everyone loves reimplementing strdup. Give them a kstrdup.
--- trivial-2.5.67-bk8/drivers/md/dm-ioctl.c.orig 2003-04-18 17:43:57.000000000 +1000
+++ trivial-2.5.67-bk8/drivers/md/dm-ioctl.c 2003-04-18 17:43:57.000000000 +1000
@@ -14,6 +14,7 @@
#include <linux/wait.h>
#include <linux/blk.h>
#include <linux/slab.h>
+#include <linux/string.h>
#include <asm/uaccess.h>
@@ -118,14 +119,6 @@
/*-----------------------------------------------------------------
* Inserting, removing and renaming a device.
*---------------------------------------------------------------*/
-static inline char *kstrdup(const char *str)
-{
- char *r = kmalloc(strlen(str) + 1, GFP_KERNEL);
- if (r)
- strcpy(r, str);
- return r;
-}
-
static struct hash_cell *alloc_cell(const char *name, const char *uuid,
struct mapped_device *md)
{
@@ -135,7 +128,7 @@
if (!hc)
return NULL;
- hc->name = kstrdup(name);
+ hc->name = kstrdup(name, GFP_KERNEL);
if (!hc->name) {
kfree(hc);
return NULL;
@@ -145,7 +138,7 @@
hc->uuid = NULL;
else {
- hc->uuid = kstrdup(uuid);
+ hc->uuid = kstrdup(uuid, GFP_KERNEL);
if (!hc->uuid) {
kfree(hc->name);
kfree(hc);
@@ -270,7 +263,7 @@
/*
* duplicate new.
*/
- new_name = kstrdup(new);
+ new_name = kstrdup(new, GFP_KERNEL);
if (!new_name)
return -ENOMEM;
--- trivial-2.5.67-bk8/include/linux/string.h.orig 2003-04-18 17:43:57.000000000 +1000
+++ trivial-2.5.67-bk8/include/linux/string.h 2003-04-18 17:43:57.000000000 +1000
@@ -78,6 +78,8 @@
extern void * memchr(const void *,int,__kernel_size_t);
#endif
+extern char *kstrdup(const char *s, int gfp);
+
#ifdef __cplusplus
}
#endif
--- trivial-2.5.67-bk8/kernel/ksyms.c.orig 2003-04-18 17:43:57.000000000 +1000
+++ trivial-2.5.67-bk8/kernel/ksyms.c 2003-04-18 17:43:57.000000000 +1000
@@ -585,6 +585,7 @@
EXPORT_SYMBOL(strnicmp);
EXPORT_SYMBOL(strspn);
EXPORT_SYMBOL(strsep);
+EXPORT_SYMBOL(kstrdup);
/* software interrupts */
EXPORT_SYMBOL(tasklet_init);
--- trivial-2.5.67-bk8/lib/string.c.orig 2003-04-18 17:43:57.000000000 +1000
+++ trivial-2.5.67-bk8/lib/string.c 2003-04-18 17:43:57.000000000 +1000
@@ -22,6 +22,7 @@
#include <linux/types.h>
#include <linux/string.h>
#include <linux/ctype.h>
+#include <linux/slab.h>
#ifndef __HAVE_ARCH_STRNICMP
/**
@@ -525,5 +526,12 @@
}
return NULL;
}
-
#endif
+
+char *kstrdup(const char *s, int gfp)
+{
+ char *buf = kmalloc(strlen(s)+1, gfp);
+ if (buf)
+ strcpy(buf, s);
+ return buf;
+}
--
Don't blame me: the Monkey is driving
File: Rusty Russell <rusty@rustcorp.com.au>: [PATCH] [TRIVIAL] kstrdup
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 7:58 [TRIVIAL] kstrdup Rusty Trivial Russell
@ 2003-04-18 8:10 ` Jeff Garzik
2003-04-18 16:20 ` Linus Torvalds
2003-04-19 4:14 ` Rusty Russell
0 siblings, 2 replies; 20+ messages in thread
From: Jeff Garzik @ 2003-04-18 8:10 UTC (permalink / raw)
To: Rusty Trivial Russell; +Cc: Linus Torvalds, linux-kernel
Rusty Trivial Russell wrote:
> +char *kstrdup(const char *s, int gfp)
> +{
> + char *buf = kmalloc(strlen(s)+1, gfp);
> + if (buf)
> + strcpy(buf, s);
> + return buf;
> +}
You should save the strlen result to a temp var, and then s/strcpy/memcpy/
Jeff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 8:10 ` Jeff Garzik
@ 2003-04-18 16:20 ` Linus Torvalds
2003-04-18 16:56 ` Jeff Garzik
2003-04-19 4:14 ` Rusty Russell
1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2003-04-18 16:20 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Rusty Trivial Russell, linux-kernel
On Fri, 18 Apr 2003, Jeff Garzik wrote:
>
> You should save the strlen result to a temp var, and then s/strcpy/memcpy/
No, you should just not do this. I don't see the point.
Linus
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 16:20 ` Linus Torvalds
@ 2003-04-18 16:56 ` Jeff Garzik
2003-04-18 18:00 ` Linus Torvalds
2003-04-18 18:00 ` Richard B. Johnson
0 siblings, 2 replies; 20+ messages in thread
From: Jeff Garzik @ 2003-04-18 16:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Rusty Trivial Russell, linux-kernel
Linus Torvalds wrote:
> On Fri, 18 Apr 2003, Jeff Garzik wrote:
>
>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
>
>
> No, you should just not do this. I don't see the point.
strcpy has a test for each byte of its contents, and memcpy doesn't.
Why search 's' for NULL twice?
Jeff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 16:56 ` Jeff Garzik
@ 2003-04-18 18:00 ` Linus Torvalds
2003-04-18 18:44 ` Jeff Garzik
2003-04-19 4:52 ` Rusty Russell
2003-04-18 18:00 ` Richard B. Johnson
1 sibling, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2003-04-18 18:00 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Rusty Trivial Russell, linux-kernel
On Fri, 18 Apr 2003, Jeff Garzik wrote:
> Linus Torvalds wrote:
> > On Fri, 18 Apr 2003, Jeff Garzik wrote:
> >
> >>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
> >
> > No, you should just not do this. I don't see the point.
>
> strcpy has a test for each byte of its contents, and memcpy doesn't.
> Why search 's' for NULL twice?
No, my point is that kstrdup() _itself_ just shouldn't be done. I don't
see it as being worthy of kernel support. Most of the kernel string data
structures are NOT random zero-ended strings anyway: they are either
strictly limited in some ways ("ends in '\0', but limited to PATH_MAX), or
they are explicitly sized ("struct qstr").
I don't much personally like C strings. Explicit lengths tend to have a
lot of advantages, and while a lot of the standard C infrastructure is for
zero-ended strings, they do end up being even worse in the kernel than in
user space (think buffer overflows and limited allocations in general).
Linus
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 16:56 ` Jeff Garzik
2003-04-18 18:00 ` Linus Torvalds
@ 2003-04-18 18:00 ` Richard B. Johnson
2003-04-18 18:40 ` Jeff Garzik
1 sibling, 1 reply; 20+ messages in thread
From: Richard B. Johnson @ 2003-04-18 18:00 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linus Torvalds, Rusty Trivial Russell, linux-kernel
On Fri, 18 Apr 2003, Jeff Garzik wrote:
> Linus Torvalds wrote:
> > On Fri, 18 Apr 2003, Jeff Garzik wrote:
> >
> >>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
> >
> >
> > No, you should just not do this. I don't see the point.
>
>
> strcpy has a test for each byte of its contents, and memcpy doesn't.
> Why search 's' for NULL twice?
>
> Jeff
Because it doesn't. strcpy() is usually implimented by getting
the string-length, using the same code sequence as strlen(), then
using the same code sequence as memcpy(), but copying the null-byte
as well. The check for the null-byte is done in the length routine.
If you do a memcpy(a, b, strlen(b));, then you are making two
procedure calls and dirtying the cache twice..
A typical Intel procedure, stripped of the push/pops to save
registers is here....
cld
movl SRC(%esp), %edi
movl %edi, %esi # Source = ESI and EDI
orl $-1, %ecx
xorl %eax,%eax # Zero
repnz scasb # Look for the null-byte
negl %ecx # ecx = string length
incl %ecx # copy the null
movl DST(%esp), %edi # Destination buffer EDI
shrl $1, %ecx # Make word count
pushf # Save result
shrl $1, %ecx # Make longword count
rep movsl # copy longwords
adcl %ecx, %ecx # Possible word count
rep movsw # Copy possible last word
popf # Result from first shift
adcl %ecx, %ecx # Possible byte-count
rep movsb # Copy last byte
There are no jumps and the locality of action is good. This could
be improved in the corner cases of long, aligned buffers.
A lot of persons who are unfamiliar with tools other than 'C' think
that strcpy() is made like this:
while(*dsp++ = *src++)
;
... ahaaa .. Wouderful short portable code! Wrong, awful code, possibly
portable.
.L2:
movl 8(%ebp),%edx # Memory reference (destination)
movl 12(%ebp),%ecx # Memory reference (source)
movb (%ecx),%al # Get a byte
movb %al,(%edx) # Put it into destination buffer
incl 12(%ebp) # Memory reference (bump the source)
incl 8(%ebp) # Memory reference (bump the destination)
testb %al,%al # Check for zero on every byte
jne .L4 # Why a jump to here? (should be jne .L2)
jmp .L3
.align 4
.L4:
jmp .L2 # Now we go back to the beginning.
.align 4
.L3:
.L1:
movl %ebp,%esp
popl %ebp
ret
Now; this code is 'shorter', but it is very slow compared to the
smarter strcpy() as previously shown.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 18:00 ` Richard B. Johnson
@ 2003-04-18 18:40 ` Jeff Garzik
2003-04-18 19:24 ` H. Peter Anvin
2003-04-18 19:29 ` Richard B. Johnson
0 siblings, 2 replies; 20+ messages in thread
From: Jeff Garzik @ 2003-04-18 18:40 UTC (permalink / raw)
To: root; +Cc: Linus Torvalds, Rusty Trivial Russell, linux-kernel
Richard B. Johnson wrote:
> On Fri, 18 Apr 2003, Jeff Garzik wrote:
>
>
>>Linus Torvalds wrote:
>>
>>>On Fri, 18 Apr 2003, Jeff Garzik wrote:
>>>
>>>
>>>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
>>>
>>>
>>>No, you should just not do this. I don't see the point.
>>
>>
>>strcpy has a test for each byte of its contents, and memcpy doesn't.
>>Why search 's' for NULL twice?
>>
>> Jeff
>
>
> Because it doesn't. strcpy() is usually implimented by getting
> the string-length, using the same code sequence as strlen(), then
> using the same code sequence as memcpy(), but copying the null-byte
> as well. The check for the null-byte is done in the length routine.
>
> If you do a memcpy(a, b, strlen(b));, then you are making two
> procedure calls and dirtying the cache twice..
Wrong, because we have to call strlen _anyway_, to provide the size to
kmalloc.
> A typical Intel procedure, stripped of the push/pops to save
> registers is here....
That's kinda cute. Why not submit a patch to the strcpy implementation
in include/asm-i386/string.h? :) Ours is shorter, but does have a jump:
"1:\tlodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b"
Which is better? I don't know; I'm still learning the performance
eccentricities of x86 insns on various processors.
Related x86 question: if the memory buffer is not dword-aligned, is
'rep movsl' the best idea? On RISC it's usually smarter to unroll the
head of the loop to avoid unaligned accesses; but from reading x86 asm
code in the kernel, nobody seems to care about that. Is the
unaligned-access penalty so small that the increased code size of the
head-unroll is never worth it?
> A lot of persons who are unfamiliar with tools other than 'C' think
> that strcpy() is made like this:
>
> while(*dsp++ = *src++)
> ;
In fact, that's basically the kernel's non-arch-specific implementation :)
Jeff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 18:00 ` Linus Torvalds
@ 2003-04-18 18:44 ` Jeff Garzik
2003-04-19 4:52 ` Rusty Russell
1 sibling, 0 replies; 20+ messages in thread
From: Jeff Garzik @ 2003-04-18 18:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Rusty Trivial Russell, linux-kernel
Linus Torvalds wrote:
> On Fri, 18 Apr 2003, Jeff Garzik wrote:
>
>>Linus Torvalds wrote:
>>
>>>On Fri, 18 Apr 2003, Jeff Garzik wrote:
>>>
>>>
>>>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
>>>
>>>No, you should just not do this. I don't see the point.
>>
>>strcpy has a test for each byte of its contents, and memcpy doesn't.
>>Why search 's' for NULL twice?
>
>
> No, my point is that kstrdup() _itself_ just shouldn't be done. I don't
Ah, indeed :)
> see it as being worthy of kernel support. Most of the kernel string data
> structures are NOT random zero-ended strings anyway: they are either
> strictly limited in some ways ("ends in '\0', but limited to PATH_MAX), or
> they are explicitly sized ("struct qstr").
Quite true
Jeff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 18:40 ` Jeff Garzik
@ 2003-04-18 19:24 ` H. Peter Anvin
2003-04-18 23:37 ` Alan Cox
2003-04-18 19:29 ` Richard B. Johnson
1 sibling, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2003-04-18 19:24 UTC (permalink / raw)
To: linux-kernel
Followup to: <3EA0469D.7090602@pobox.com>
By author: Jeff Garzik <jgarzik@pobox.com>
In newsgroup: linux.dev.kernel
>
> That's kinda cute. Why not submit a patch to the strcpy implementation
> in include/asm-i386/string.h? :) Ours is shorter, but does have a jump:
> "1:\tlodsb\n\t"
> "stosb\n\t"
> "testb %%al,%%al\n\t"
> "jne 1b"
>
> Which is better? I don't know; I'm still learning the performance
> eccentricities of x86 insns on various processors.
>
It varies from porocessor to processor, and also depends on usage.
>
> Related x86 question: if the memory buffer is not dword-aligned, is
> 'rep movsl' the best idea? On RISC it's usually smarter to unroll the
> head of the loop to avoid unaligned accesses; but from reading x86 asm
> code in the kernel, nobody seems to care about that. Is the
> unaligned-access penalty so small that the increased code size of the
> head-unroll is never worth it?
>
A lot of the newer x86 processors will perform this unrolling in
microcode.
-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 18:40 ` Jeff Garzik
2003-04-18 19:24 ` H. Peter Anvin
@ 2003-04-18 19:29 ` Richard B. Johnson
2003-04-19 11:45 ` Kai Henningsen
2003-04-19 20:16 ` Ruth Ivimey-Cook
1 sibling, 2 replies; 20+ messages in thread
From: Richard B. Johnson @ 2003-04-18 19:29 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linus Torvalds, Rusty Trivial Russell, linux-kernel
On Fri, 18 Apr 2003, Jeff Garzik wrote:
> Richard B. Johnson wrote:
> > On Fri, 18 Apr 2003, Jeff Garzik wrote:
> >
> >
> >>Linus Torvalds wrote:
> >>
> >>>On Fri, 18 Apr 2003, Jeff Garzik wrote:
> >>>
> >>>
> >>>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
> >>>
> >>>
> >>>No, you should just not do this. I don't see the point.
> >>
> >>
> >>strcpy has a test for each byte of its contents, and memcpy doesn't.
> >>Why search 's' for NULL twice?
> >>
> >> Jeff
> >
> >
> > Because it doesn't. strcpy() is usually implemented by getting
> > the string-length, using the same code sequence as strlen(), then
> > using the same code sequence as memcpy(), but copying the null-byte
> > as well. The check for the null-byte is done in the length routine.
> >
> > If you do a memcpy(a, b, strlen(b));, then you are making two
> > procedure calls and dirtying the cache twice..
>
> Wrong, because we have to call strlen _anyway_, to provide the size to
> kmalloc.
>
>
> > A typical Intel procedure, stripped of the push/pops to save
> > registers is here....
>
> That's kinda cute. Why not submit a patch to the strcpy implementation
> in include/asm-i386/string.h? :) Ours is shorter, but does have a jump:
> "1:\tlodsb\n\t"
> "stosb\n\t"
> "testb %%al,%%al\n\t"
> "jne 1b"
>
Years ago I did submit patches of all kinds, mostly 'asm' stuff
because I have been doing assembly for over 20 years. However,
I got tired of being shot-down in flames by persons who haven't
a clue, so I stopped doing that.
The history of the stuff in asm-i386 is full of changes, with
many bugs introduced by persons who tried so save a nanosecond
here and there. In recent times (past two years) somebody changed
the stuff back to things which were not optimum, but quite obviously
correct.
I might 'risk' sending in some patches again. Maybe.
> Which is better? I don't know; I'm still learning the performance
> eccentricities of x86 insns on various processors.
>
The test for every byte transferred is, quite obviously, correct.
It is also, quite obviously, non optimum.
>
> Related x86 question: if the memory buffer is not dword-aligned, is
> 'rep movsl' the best idea? On RISC it's usually smarter to unroll the
> head of the loop to avoid unaligned accesses; but from reading x86 asm
> code in the kernel, nobody seems to care about that. Is the
> unaligned-access penalty so small that the increased code size of the
> head-unroll is never worth it?
>
Unaligned access takes a penalty. On early i586 machines, it was
horrible, doubled the access time. On i486 and, later on i686
machines, the access times are not changed as radically. Many
of the changes, that were later reverted back, to the string
and memory 'asm' routines occurred when the 'awful' i586
came out. Of course, the unaligned access on the i586 was
still faster than the i486 with aligned access (because of
clock speeds). However, it was worth the trouble to improve
the assembly routines at that time.
>
> > A lot of persons who are unfamiliar with tools other than 'C' think
> > that strcpy() is made like this:
> >
> > while(*dsp++ = *src++)
> > ;
>
> In fact, that's basically the kernel's non-arch-specific implementation :)
>
> Jeff
Yep. Naive code looks so 'simple', must be "optimum", no? ;^).
Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 19:24 ` H. Peter Anvin
@ 2003-04-18 23:37 ` Alan Cox
0 siblings, 0 replies; 20+ messages in thread
From: Alan Cox @ 2003-04-18 23:37 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linux Kernel Mailing List
On Gwe, 2003-04-18 at 20:24, H. Peter Anvin wrote:
> Followup to: <3EA0469D.7090602@pobox.com>
> >
> > Which is better? I don't know; I'm still learning the performance
> > eccentricities of x86 insns on various processors.
> >
>
> It varies from porocessor to processor, and also depends on usage.
Stepping to stepping, even clock speed at stepping in the case of some B
series pentiums. For large copies we hit the kernel optimised stuff like
MMX copiers, for small copies its irrelevant
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 8:10 ` Jeff Garzik
2003-04-18 16:20 ` Linus Torvalds
@ 2003-04-19 4:14 ` Rusty Russell
2003-04-19 4:48 ` Jeff Garzik
1 sibling, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2003-04-19 4:14 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linus Torvalds, linux-kernel
In message <3E9FB2E9.9040308@pobox.com> you write:
> Rusty Trivial Russell wrote:
> > +char *kstrdup(const char *s, int gfp)
> > +{
> > + char *buf = kmalloc(strlen(s)+1, gfp);
> > + if (buf)
> > + strcpy(buf, s);
> > + return buf;
> > +}
>
> You should save the strlen result to a temp var, and then s/strcpy/memcpy/
Completely disagree. Write the most straightforward code possible,
and then if there proves to be a problem, optimize. Optimizations
where there's no actual performance problem should be left to the
compiler.
Case in point: gcc-3.2 on -O2 on Intel is one instruction longer for
your version.
Hope that helps,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-19 4:14 ` Rusty Russell
@ 2003-04-19 4:48 ` Jeff Garzik
2003-04-19 8:28 ` Rusty Russell
2003-04-19 12:27 ` Alan Cox
0 siblings, 2 replies; 20+ messages in thread
From: Jeff Garzik @ 2003-04-19 4:48 UTC (permalink / raw)
To: Rusty Russell; +Cc: Linus Torvalds, linux-kernel
Rusty Russell wrote:
> In message <3E9FB2E9.9040308@pobox.com> you write:
>
>>Rusty Trivial Russell wrote:
>>
>>>+char *kstrdup(const char *s, int gfp)
>>>+{
>>>+ char *buf = kmalloc(strlen(s)+1, gfp);
>>>+ if (buf)
>>>+ strcpy(buf, s);
>>>+ return buf;
>>>+}
>>
>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
>
>
> Completely disagree. Write the most straightforward code possible,
> and then if there proves to be a problem, optimize. Optimizations
> where there's no actual performance problem should be left to the
> compiler.
Since the kernel does its own string ops, the compiler does not have
enough information to deduce that further optimization is possible.
> Case in point: gcc-3.2 on -O2 on Intel is one instruction longer for
> your version.
And? It's still slower.
Jeff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 18:00 ` Linus Torvalds
2003-04-18 18:44 ` Jeff Garzik
@ 2003-04-19 4:52 ` Rusty Russell
1 sibling, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2003-04-19 4:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel
In message <Pine.LNX.4.44.0304181055290.2950-100000@home.transmeta.com> you write:
>
> No, my point is that kstrdup() _itself_ just shouldn't be done. I don't
> see it as being worthy of kernel support.
parport, afs, intermezzo, sunrpc, md, sound and um have their own
strdup variants. ecard.c (ARM), xtalk.c (ia64), ide.c, blkmtd.c
(mtd), dlci.c (wan), parport (again), md (again), moctotek.c (usb),
scsiglue.c (usb), super.c (affs), inode.c (nfs), and generic.c (proc)
all open code it.
Most of them use it multuple times.
I think unifying the seven implementations, and over fifty uses, is a
minor but worthwhile goal. It's not as widely used as it would be in
userspace, but still...
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-19 4:48 ` Jeff Garzik
@ 2003-04-19 8:28 ` Rusty Russell
2003-04-19 12:27 ` Alan Cox
1 sibling, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2003-04-19 8:28 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linus Torvalds, linux-kernel
In message <3EA0D524.7010309@pobox.com> you write:
> Since the kernel does its own string ops, the compiler does not have
> enough information to deduce that further optimization is possible.
You're right, I was measuring without the kernel string ops.
> > Case in point: gcc-3.2 on -O2 on Intel is one instruction longer for
> > your version.
>
> And? It's still slower.
Who gives a flying fuck? You're doing an allocation in there
ferchissakes. Choose the simplest option. Jeff, if you have time to
post on this, I think you need a hobby 8)
char *__constant_kstrdup(const char *s, unsigned int size, int gfp)
{
char *buf = kmalloc(size, gfp);
if (likely(buf))
memcpy(buf, s, size);
return buf;
}
char *__kstrdup(const char *s, int gfp)
{
return __constant_kstrdup(s, strlen(s)+1, gfp);
}
#define kstrdup(str, gfp) \
(__builtin_constant_p(str) && sizeof(str) != sizeof(char *) ? \
__constant_kstrdup(str, sizeof(str), gfp) \
: __kstrdup(str, gfp))
Feature list:
1) Optimizes the constant kstrdup case,
2) Doesn't multi-evaluate args (except if constant string),
3) Uses likely() to bias against the case of kmalloc failing.
OK, so I guess I need a hobby, too..
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 19:29 ` Richard B. Johnson
@ 2003-04-19 11:45 ` Kai Henningsen
2003-04-19 20:16 ` Ruth Ivimey-Cook
1 sibling, 0 replies; 20+ messages in thread
From: Kai Henningsen @ 2003-04-19 11:45 UTC (permalink / raw)
To: linux-kernel
root@chaos.analogic.com (Richard B. Johnson) wrote on 18.04.03 in <Pine.LNX.4.53.0304181512220.22901@chaos>:
> The test for every byte transferred is, quite obviously, correct.
> It is also, quite obviously, non optimum.
Actually, that is very much not obvious.
Especially if you're familiar with architectures where every move has an
implicit test (typically for zero and sign), and so checking for the zero
byte during the move is quite obviously the only sane thing to do - the
version with a count is slower, because the inner loop does more. (Those
architectures typically don't have a REP-style prefix.)
MfG Kai
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-19 4:48 ` Jeff Garzik
2003-04-19 8:28 ` Rusty Russell
@ 2003-04-19 12:27 ` Alan Cox
2003-04-19 14:44 ` Bernd Eckenfels
2003-04-20 8:05 ` Rusty Russell
1 sibling, 2 replies; 20+ messages in thread
From: Alan Cox @ 2003-04-19 12:27 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Rusty Russell, Linus Torvalds, Linux Kernel Mailing List
On Sad, 2003-04-19 at 05:48, Jeff Garzik wrote:
> > Completely disagree. Write the most straightforward code possible,
> > and then if there proves to be a problem, optimize. Optimizations
> > where there's no actual performance problem should be left to the
> > compiler.
>
> Since the kernel does its own string ops, the compiler does not have
> enough information to deduce that further optimization is possible.
>
>
> > Case in point: gcc-3.2 on -O2 on Intel is one instruction longer for
> > your version.
>
> And? It's still slower.
You are arguing over a 1 instruction, probably sub 1 clock scheduling
matter on a call which is not used on any fast or common path. If you
shaved 1 clock off the timer handling instead you'd make a lot more
difference..
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-19 12:27 ` Alan Cox
@ 2003-04-19 14:44 ` Bernd Eckenfels
2003-04-20 8:05 ` Rusty Russell
1 sibling, 0 replies; 20+ messages in thread
From: Bernd Eckenfels @ 2003-04-19 14:44 UTC (permalink / raw)
To: linux-kernel
In article <1050755240.3277.3.camel@dhcp22.swansea.linux.org.uk> you wrote:
> You are arguing over a 1 instruction, probably sub 1 clock scheduling
> matter on a call which is not used on any fast or common path. If you
> shaved 1 clock off the timer handling instead you'd make a lot more
> difference..
i think he is arguing about the performance of strcpy vs. memcpy.
Is the 1 instruction longer including the inlined code for those two?
Greetings
Bernd
--
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-18 19:29 ` Richard B. Johnson
2003-04-19 11:45 ` Kai Henningsen
@ 2003-04-19 20:16 ` Ruth Ivimey-Cook
1 sibling, 0 replies; 20+ messages in thread
From: Ruth Ivimey-Cook @ 2003-04-19 20:16 UTC (permalink / raw)
To: root, Jeff Garzik; +Cc: Linus Torvalds, Rusty Trivial Russell, linux-kernel
At 20:29 18/04/2003, Richard B. Johnson wrote:
> > > A lot of persons who are unfamiliar with tools other than 'C' think
> > > that strcpy() is made like this:
> > >
> > > while(*dsp++ = *src++)
> > > ;
> >
> > In fact, that's basically the kernel's non-arch-specific implementation :)
> >
> > Jeff
>
>Yep. Naive code looks so 'simple', must be "optimum", no? ;^).
Well, actually on ARM at least it is optimal, given the likely short length
of strings. To do any better at all you have to assume stringlength of many
tens of words, which is unlikely. For comparison, this is what armcc
produces for the above code:
strdup PROC
|L1.0|
000000 e4d12001 LDRB a3,[a2],#1 ;1
000004 e4c02001 STRB a3,[a1],#1 ;1
000008 e3520000 CMP a3,#0 ;1
00000c 1afffffb BNE |L1.0| ;1
000010 e1a0f00e MOV pc,lr ;1
AFAICT there is no better solution.
Ruth
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup
2003-04-19 12:27 ` Alan Cox
2003-04-19 14:44 ` Bernd Eckenfels
@ 2003-04-20 8:05 ` Rusty Russell
1 sibling, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2003-04-20 8:05 UTC (permalink / raw)
To: Alan Cox; +Cc: Jeff Garzik, Linus Torvalds, Linux Kernel Mailing List
In message <1050755240.3277.3.camel@dhcp22.swansea.linux.org.uk> you write:
> On Sad, 2003-04-19 at 05:48, Jeff Garzik wrote:
> > And? It's still slower.
>
> You are arguing over a 1 instruction, probably sub 1 clock scheduling
> matter on a call which is not used on any fast or common path. If you
> shaved 1 clock off the timer handling instead you'd make a lot more
> difference..
Hey, if we have 50 million machines running 2.6 for three years, this
optimization saves 100 clock cycles per boot (most kstrdups are device
init), and machines boot once a month, and average 1GHz, that's three
minutes of saved time.
I think Jeff and I should start the kstrdup sourceforge project
immediately...
Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2003-04-20 8:00 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-18 7:58 [TRIVIAL] kstrdup Rusty Trivial Russell
2003-04-18 8:10 ` Jeff Garzik
2003-04-18 16:20 ` Linus Torvalds
2003-04-18 16:56 ` Jeff Garzik
2003-04-18 18:00 ` Linus Torvalds
2003-04-18 18:44 ` Jeff Garzik
2003-04-19 4:52 ` Rusty Russell
2003-04-18 18:00 ` Richard B. Johnson
2003-04-18 18:40 ` Jeff Garzik
2003-04-18 19:24 ` H. Peter Anvin
2003-04-18 23:37 ` Alan Cox
2003-04-18 19:29 ` Richard B. Johnson
2003-04-19 11:45 ` Kai Henningsen
2003-04-19 20:16 ` Ruth Ivimey-Cook
2003-04-19 4:14 ` Rusty Russell
2003-04-19 4:48 ` Jeff Garzik
2003-04-19 8:28 ` Rusty Russell
2003-04-19 12:27 ` Alan Cox
2003-04-19 14:44 ` Bernd Eckenfels
2003-04-20 8:05 ` Rusty Russell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).