* [TRIVIAL] kstrdup @ 2003-04-18 7:58 Rusty Trivial Russell 2003-04-18 8:10 ` Jeff Garzik 0 siblings, 1 reply; 20+ messages in thread From: Rusty Trivial Russell @ 2003-04-18 7:58 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel From: Rusty Russell <rusty@rustcorp.com.au> Everyone loves reimplementing strdup. Give them a kstrdup (basically from drivers/md). Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. Name: kstrdup Author: Neil Brown and Rusty Russell Status: Trivial D: Everyone loves reimplementing strdup. Give them a kstrdup. --- trivial-2.5.67-bk8/drivers/md/dm-ioctl.c.orig 2003-04-18 17:43:57.000000000 +1000 +++ trivial-2.5.67-bk8/drivers/md/dm-ioctl.c 2003-04-18 17:43:57.000000000 +1000 @@ -14,6 +14,7 @@ #include <linux/wait.h> #include <linux/blk.h> #include <linux/slab.h> +#include <linux/string.h> #include <asm/uaccess.h> @@ -118,14 +119,6 @@ /*----------------------------------------------------------------- * Inserting, removing and renaming a device. *---------------------------------------------------------------*/ -static inline char *kstrdup(const char *str) -{ - char *r = kmalloc(strlen(str) + 1, GFP_KERNEL); - if (r) - strcpy(r, str); - return r; -} - static struct hash_cell *alloc_cell(const char *name, const char *uuid, struct mapped_device *md) { @@ -135,7 +128,7 @@ if (!hc) return NULL; - hc->name = kstrdup(name); + hc->name = kstrdup(name, GFP_KERNEL); if (!hc->name) { kfree(hc); return NULL; @@ -145,7 +138,7 @@ hc->uuid = NULL; else { - hc->uuid = kstrdup(uuid); + hc->uuid = kstrdup(uuid, GFP_KERNEL); if (!hc->uuid) { kfree(hc->name); kfree(hc); @@ -270,7 +263,7 @@ /* * duplicate new. */ - new_name = kstrdup(new); + new_name = kstrdup(new, GFP_KERNEL); if (!new_name) return -ENOMEM; --- trivial-2.5.67-bk8/include/linux/string.h.orig 2003-04-18 17:43:57.000000000 +1000 +++ trivial-2.5.67-bk8/include/linux/string.h 2003-04-18 17:43:57.000000000 +1000 @@ -78,6 +78,8 @@ extern void * memchr(const void *,int,__kernel_size_t); #endif +extern char *kstrdup(const char *s, int gfp); + #ifdef __cplusplus } #endif --- trivial-2.5.67-bk8/kernel/ksyms.c.orig 2003-04-18 17:43:57.000000000 +1000 +++ trivial-2.5.67-bk8/kernel/ksyms.c 2003-04-18 17:43:57.000000000 +1000 @@ -585,6 +585,7 @@ EXPORT_SYMBOL(strnicmp); EXPORT_SYMBOL(strspn); EXPORT_SYMBOL(strsep); +EXPORT_SYMBOL(kstrdup); /* software interrupts */ EXPORT_SYMBOL(tasklet_init); --- trivial-2.5.67-bk8/lib/string.c.orig 2003-04-18 17:43:57.000000000 +1000 +++ trivial-2.5.67-bk8/lib/string.c 2003-04-18 17:43:57.000000000 +1000 @@ -22,6 +22,7 @@ #include <linux/types.h> #include <linux/string.h> #include <linux/ctype.h> +#include <linux/slab.h> #ifndef __HAVE_ARCH_STRNICMP /** @@ -525,5 +526,12 @@ } return NULL; } - #endif + +char *kstrdup(const char *s, int gfp) +{ + char *buf = kmalloc(strlen(s)+1, gfp); + if (buf) + strcpy(buf, s); + return buf; +} -- Don't blame me: the Monkey is driving File: Rusty Russell <rusty@rustcorp.com.au>: [PATCH] [TRIVIAL] kstrdup ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 7:58 [TRIVIAL] kstrdup Rusty Trivial Russell @ 2003-04-18 8:10 ` Jeff Garzik 2003-04-18 16:20 ` Linus Torvalds 2003-04-19 4:14 ` Rusty Russell 0 siblings, 2 replies; 20+ messages in thread From: Jeff Garzik @ 2003-04-18 8:10 UTC (permalink / raw) To: Rusty Trivial Russell; +Cc: Linus Torvalds, linux-kernel Rusty Trivial Russell wrote: > +char *kstrdup(const char *s, int gfp) > +{ > + char *buf = kmalloc(strlen(s)+1, gfp); > + if (buf) > + strcpy(buf, s); > + return buf; > +} You should save the strlen result to a temp var, and then s/strcpy/memcpy/ Jeff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 8:10 ` Jeff Garzik @ 2003-04-18 16:20 ` Linus Torvalds 2003-04-18 16:56 ` Jeff Garzik 2003-04-19 4:14 ` Rusty Russell 1 sibling, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2003-04-18 16:20 UTC (permalink / raw) To: Jeff Garzik; +Cc: Rusty Trivial Russell, linux-kernel On Fri, 18 Apr 2003, Jeff Garzik wrote: > > You should save the strlen result to a temp var, and then s/strcpy/memcpy/ No, you should just not do this. I don't see the point. Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 16:20 ` Linus Torvalds @ 2003-04-18 16:56 ` Jeff Garzik 2003-04-18 18:00 ` Linus Torvalds 2003-04-18 18:00 ` Richard B. Johnson 0 siblings, 2 replies; 20+ messages in thread From: Jeff Garzik @ 2003-04-18 16:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: Rusty Trivial Russell, linux-kernel Linus Torvalds wrote: > On Fri, 18 Apr 2003, Jeff Garzik wrote: > >>You should save the strlen result to a temp var, and then s/strcpy/memcpy/ > > > No, you should just not do this. I don't see the point. strcpy has a test for each byte of its contents, and memcpy doesn't. Why search 's' for NULL twice? Jeff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 16:56 ` Jeff Garzik @ 2003-04-18 18:00 ` Linus Torvalds 2003-04-18 18:44 ` Jeff Garzik 2003-04-19 4:52 ` Rusty Russell 2003-04-18 18:00 ` Richard B. Johnson 1 sibling, 2 replies; 20+ messages in thread From: Linus Torvalds @ 2003-04-18 18:00 UTC (permalink / raw) To: Jeff Garzik; +Cc: Rusty Trivial Russell, linux-kernel On Fri, 18 Apr 2003, Jeff Garzik wrote: > Linus Torvalds wrote: > > On Fri, 18 Apr 2003, Jeff Garzik wrote: > > > >>You should save the strlen result to a temp var, and then s/strcpy/memcpy/ > > > > No, you should just not do this. I don't see the point. > > strcpy has a test for each byte of its contents, and memcpy doesn't. > Why search 's' for NULL twice? No, my point is that kstrdup() _itself_ just shouldn't be done. I don't see it as being worthy of kernel support. Most of the kernel string data structures are NOT random zero-ended strings anyway: they are either strictly limited in some ways ("ends in '\0', but limited to PATH_MAX), or they are explicitly sized ("struct qstr"). I don't much personally like C strings. Explicit lengths tend to have a lot of advantages, and while a lot of the standard C infrastructure is for zero-ended strings, they do end up being even worse in the kernel than in user space (think buffer overflows and limited allocations in general). Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 18:00 ` Linus Torvalds @ 2003-04-18 18:44 ` Jeff Garzik 2003-04-19 4:52 ` Rusty Russell 1 sibling, 0 replies; 20+ messages in thread From: Jeff Garzik @ 2003-04-18 18:44 UTC (permalink / raw) To: Linus Torvalds; +Cc: Rusty Trivial Russell, linux-kernel Linus Torvalds wrote: > On Fri, 18 Apr 2003, Jeff Garzik wrote: > >>Linus Torvalds wrote: >> >>>On Fri, 18 Apr 2003, Jeff Garzik wrote: >>> >>> >>>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/ >>> >>>No, you should just not do this. I don't see the point. >> >>strcpy has a test for each byte of its contents, and memcpy doesn't. >>Why search 's' for NULL twice? > > > No, my point is that kstrdup() _itself_ just shouldn't be done. I don't Ah, indeed :) > see it as being worthy of kernel support. Most of the kernel string data > structures are NOT random zero-ended strings anyway: they are either > strictly limited in some ways ("ends in '\0', but limited to PATH_MAX), or > they are explicitly sized ("struct qstr"). Quite true Jeff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 18:00 ` Linus Torvalds 2003-04-18 18:44 ` Jeff Garzik @ 2003-04-19 4:52 ` Rusty Russell 1 sibling, 0 replies; 20+ messages in thread From: Rusty Russell @ 2003-04-19 4:52 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel In message <Pine.LNX.4.44.0304181055290.2950-100000@home.transmeta.com> you write: > > No, my point is that kstrdup() _itself_ just shouldn't be done. I don't > see it as being worthy of kernel support. parport, afs, intermezzo, sunrpc, md, sound and um have their own strdup variants. ecard.c (ARM), xtalk.c (ia64), ide.c, blkmtd.c (mtd), dlci.c (wan), parport (again), md (again), moctotek.c (usb), scsiglue.c (usb), super.c (affs), inode.c (nfs), and generic.c (proc) all open code it. Most of them use it multuple times. I think unifying the seven implementations, and over fifty uses, is a minor but worthwhile goal. It's not as widely used as it would be in userspace, but still... Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 16:56 ` Jeff Garzik 2003-04-18 18:00 ` Linus Torvalds @ 2003-04-18 18:00 ` Richard B. Johnson 2003-04-18 18:40 ` Jeff Garzik 1 sibling, 1 reply; 20+ messages in thread From: Richard B. Johnson @ 2003-04-18 18:00 UTC (permalink / raw) To: Jeff Garzik; +Cc: Linus Torvalds, Rusty Trivial Russell, linux-kernel On Fri, 18 Apr 2003, Jeff Garzik wrote: > Linus Torvalds wrote: > > On Fri, 18 Apr 2003, Jeff Garzik wrote: > > > >>You should save the strlen result to a temp var, and then s/strcpy/memcpy/ > > > > > > No, you should just not do this. I don't see the point. > > > strcpy has a test for each byte of its contents, and memcpy doesn't. > Why search 's' for NULL twice? > > Jeff Because it doesn't. strcpy() is usually implimented by getting the string-length, using the same code sequence as strlen(), then using the same code sequence as memcpy(), but copying the null-byte as well. The check for the null-byte is done in the length routine. If you do a memcpy(a, b, strlen(b));, then you are making two procedure calls and dirtying the cache twice.. A typical Intel procedure, stripped of the push/pops to save registers is here.... cld movl SRC(%esp), %edi movl %edi, %esi # Source = ESI and EDI orl $-1, %ecx xorl %eax,%eax # Zero repnz scasb # Look for the null-byte negl %ecx # ecx = string length incl %ecx # copy the null movl DST(%esp), %edi # Destination buffer EDI shrl $1, %ecx # Make word count pushf # Save result shrl $1, %ecx # Make longword count rep movsl # copy longwords adcl %ecx, %ecx # Possible word count rep movsw # Copy possible last word popf # Result from first shift adcl %ecx, %ecx # Possible byte-count rep movsb # Copy last byte There are no jumps and the locality of action is good. This could be improved in the corner cases of long, aligned buffers. A lot of persons who are unfamiliar with tools other than 'C' think that strcpy() is made like this: while(*dsp++ = *src++) ; ... ahaaa .. Wouderful short portable code! Wrong, awful code, possibly portable. .L2: movl 8(%ebp),%edx # Memory reference (destination) movl 12(%ebp),%ecx # Memory reference (source) movb (%ecx),%al # Get a byte movb %al,(%edx) # Put it into destination buffer incl 12(%ebp) # Memory reference (bump the source) incl 8(%ebp) # Memory reference (bump the destination) testb %al,%al # Check for zero on every byte jne .L4 # Why a jump to here? (should be jne .L2) jmp .L3 .align 4 .L4: jmp .L2 # Now we go back to the beginning. .align 4 .L3: .L1: movl %ebp,%esp popl %ebp ret Now; this code is 'shorter', but it is very slow compared to the smarter strcpy() as previously shown. Cheers, Dick Johnson Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips). Why is the government concerned about the lunatic fringe? Think about it. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 18:00 ` Richard B. Johnson @ 2003-04-18 18:40 ` Jeff Garzik 2003-04-18 19:24 ` H. Peter Anvin 2003-04-18 19:29 ` Richard B. Johnson 0 siblings, 2 replies; 20+ messages in thread From: Jeff Garzik @ 2003-04-18 18:40 UTC (permalink / raw) To: root; +Cc: Linus Torvalds, Rusty Trivial Russell, linux-kernel Richard B. Johnson wrote: > On Fri, 18 Apr 2003, Jeff Garzik wrote: > > >>Linus Torvalds wrote: >> >>>On Fri, 18 Apr 2003, Jeff Garzik wrote: >>> >>> >>>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/ >>> >>> >>>No, you should just not do this. I don't see the point. >> >> >>strcpy has a test for each byte of its contents, and memcpy doesn't. >>Why search 's' for NULL twice? >> >> Jeff > > > Because it doesn't. strcpy() is usually implimented by getting > the string-length, using the same code sequence as strlen(), then > using the same code sequence as memcpy(), but copying the null-byte > as well. The check for the null-byte is done in the length routine. > > If you do a memcpy(a, b, strlen(b));, then you are making two > procedure calls and dirtying the cache twice.. Wrong, because we have to call strlen _anyway_, to provide the size to kmalloc. > A typical Intel procedure, stripped of the push/pops to save > registers is here.... That's kinda cute. Why not submit a patch to the strcpy implementation in include/asm-i386/string.h? :) Ours is shorter, but does have a jump: "1:\tlodsb\n\t" "stosb\n\t" "testb %%al,%%al\n\t" "jne 1b" Which is better? I don't know; I'm still learning the performance eccentricities of x86 insns on various processors. Related x86 question: if the memory buffer is not dword-aligned, is 'rep movsl' the best idea? On RISC it's usually smarter to unroll the head of the loop to avoid unaligned accesses; but from reading x86 asm code in the kernel, nobody seems to care about that. Is the unaligned-access penalty so small that the increased code size of the head-unroll is never worth it? > A lot of persons who are unfamiliar with tools other than 'C' think > that strcpy() is made like this: > > while(*dsp++ = *src++) > ; In fact, that's basically the kernel's non-arch-specific implementation :) Jeff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 18:40 ` Jeff Garzik @ 2003-04-18 19:24 ` H. Peter Anvin 2003-04-18 23:37 ` Alan Cox 2003-04-18 19:29 ` Richard B. Johnson 1 sibling, 1 reply; 20+ messages in thread From: H. Peter Anvin @ 2003-04-18 19:24 UTC (permalink / raw) To: linux-kernel Followup to: <3EA0469D.7090602@pobox.com> By author: Jeff Garzik <jgarzik@pobox.com> In newsgroup: linux.dev.kernel > > That's kinda cute. Why not submit a patch to the strcpy implementation > in include/asm-i386/string.h? :) Ours is shorter, but does have a jump: > "1:\tlodsb\n\t" > "stosb\n\t" > "testb %%al,%%al\n\t" > "jne 1b" > > Which is better? I don't know; I'm still learning the performance > eccentricities of x86 insns on various processors. > It varies from porocessor to processor, and also depends on usage. > > Related x86 question: if the memory buffer is not dword-aligned, is > 'rep movsl' the best idea? On RISC it's usually smarter to unroll the > head of the loop to avoid unaligned accesses; but from reading x86 asm > code in the kernel, nobody seems to care about that. Is the > unaligned-access penalty so small that the increased code size of the > head-unroll is never worth it? > A lot of the newer x86 processors will perform this unrolling in microcode. -hpa -- <hpa@transmeta.com> at work, <hpa@zytor.com> in private! "Unix gives you enough rope to shoot yourself in the foot." Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 19:24 ` H. Peter Anvin @ 2003-04-18 23:37 ` Alan Cox 0 siblings, 0 replies; 20+ messages in thread From: Alan Cox @ 2003-04-18 23:37 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Linux Kernel Mailing List On Gwe, 2003-04-18 at 20:24, H. Peter Anvin wrote: > Followup to: <3EA0469D.7090602@pobox.com> > > > > Which is better? I don't know; I'm still learning the performance > > eccentricities of x86 insns on various processors. > > > > It varies from porocessor to processor, and also depends on usage. Stepping to stepping, even clock speed at stepping in the case of some B series pentiums. For large copies we hit the kernel optimised stuff like MMX copiers, for small copies its irrelevant ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 18:40 ` Jeff Garzik 2003-04-18 19:24 ` H. Peter Anvin @ 2003-04-18 19:29 ` Richard B. Johnson 2003-04-19 11:45 ` Kai Henningsen 2003-04-19 20:16 ` Ruth Ivimey-Cook 1 sibling, 2 replies; 20+ messages in thread From: Richard B. Johnson @ 2003-04-18 19:29 UTC (permalink / raw) To: Jeff Garzik; +Cc: Linus Torvalds, Rusty Trivial Russell, linux-kernel On Fri, 18 Apr 2003, Jeff Garzik wrote: > Richard B. Johnson wrote: > > On Fri, 18 Apr 2003, Jeff Garzik wrote: > > > > > >>Linus Torvalds wrote: > >> > >>>On Fri, 18 Apr 2003, Jeff Garzik wrote: > >>> > >>> > >>>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/ > >>> > >>> > >>>No, you should just not do this. I don't see the point. > >> > >> > >>strcpy has a test for each byte of its contents, and memcpy doesn't. > >>Why search 's' for NULL twice? > >> > >> Jeff > > > > > > Because it doesn't. strcpy() is usually implemented by getting > > the string-length, using the same code sequence as strlen(), then > > using the same code sequence as memcpy(), but copying the null-byte > > as well. The check for the null-byte is done in the length routine. > > > > If you do a memcpy(a, b, strlen(b));, then you are making two > > procedure calls and dirtying the cache twice.. > > Wrong, because we have to call strlen _anyway_, to provide the size to > kmalloc. > > > > A typical Intel procedure, stripped of the push/pops to save > > registers is here.... > > That's kinda cute. Why not submit a patch to the strcpy implementation > in include/asm-i386/string.h? :) Ours is shorter, but does have a jump: > "1:\tlodsb\n\t" > "stosb\n\t" > "testb %%al,%%al\n\t" > "jne 1b" > Years ago I did submit patches of all kinds, mostly 'asm' stuff because I have been doing assembly for over 20 years. However, I got tired of being shot-down in flames by persons who haven't a clue, so I stopped doing that. The history of the stuff in asm-i386 is full of changes, with many bugs introduced by persons who tried so save a nanosecond here and there. In recent times (past two years) somebody changed the stuff back to things which were not optimum, but quite obviously correct. I might 'risk' sending in some patches again. Maybe. > Which is better? I don't know; I'm still learning the performance > eccentricities of x86 insns on various processors. > The test for every byte transferred is, quite obviously, correct. It is also, quite obviously, non optimum. > > Related x86 question: if the memory buffer is not dword-aligned, is > 'rep movsl' the best idea? On RISC it's usually smarter to unroll the > head of the loop to avoid unaligned accesses; but from reading x86 asm > code in the kernel, nobody seems to care about that. Is the > unaligned-access penalty so small that the increased code size of the > head-unroll is never worth it? > Unaligned access takes a penalty. On early i586 machines, it was horrible, doubled the access time. On i486 and, later on i686 machines, the access times are not changed as radically. Many of the changes, that were later reverted back, to the string and memory 'asm' routines occurred when the 'awful' i586 came out. Of course, the unaligned access on the i586 was still faster than the i486 with aligned access (because of clock speeds). However, it was worth the trouble to improve the assembly routines at that time. > > > A lot of persons who are unfamiliar with tools other than 'C' think > > that strcpy() is made like this: > > > > while(*dsp++ = *src++) > > ; > > In fact, that's basically the kernel's non-arch-specific implementation :) > > Jeff Yep. Naive code looks so 'simple', must be "optimum", no? ;^). Cheers, Dick Johnson Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips). Why is the government concerned about the lunatic fringe? Think about it. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 19:29 ` Richard B. Johnson @ 2003-04-19 11:45 ` Kai Henningsen 2003-04-19 20:16 ` Ruth Ivimey-Cook 1 sibling, 0 replies; 20+ messages in thread From: Kai Henningsen @ 2003-04-19 11:45 UTC (permalink / raw) To: linux-kernel root@chaos.analogic.com (Richard B. Johnson) wrote on 18.04.03 in <Pine.LNX.4.53.0304181512220.22901@chaos>: > The test for every byte transferred is, quite obviously, correct. > It is also, quite obviously, non optimum. Actually, that is very much not obvious. Especially if you're familiar with architectures where every move has an implicit test (typically for zero and sign), and so checking for the zero byte during the move is quite obviously the only sane thing to do - the version with a count is slower, because the inner loop does more. (Those architectures typically don't have a REP-style prefix.) MfG Kai ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 19:29 ` Richard B. Johnson 2003-04-19 11:45 ` Kai Henningsen @ 2003-04-19 20:16 ` Ruth Ivimey-Cook 1 sibling, 0 replies; 20+ messages in thread From: Ruth Ivimey-Cook @ 2003-04-19 20:16 UTC (permalink / raw) To: root, Jeff Garzik; +Cc: Linus Torvalds, Rusty Trivial Russell, linux-kernel At 20:29 18/04/2003, Richard B. Johnson wrote: > > > A lot of persons who are unfamiliar with tools other than 'C' think > > > that strcpy() is made like this: > > > > > > while(*dsp++ = *src++) > > > ; > > > > In fact, that's basically the kernel's non-arch-specific implementation :) > > > > Jeff > >Yep. Naive code looks so 'simple', must be "optimum", no? ;^). Well, actually on ARM at least it is optimal, given the likely short length of strings. To do any better at all you have to assume stringlength of many tens of words, which is unlikely. For comparison, this is what armcc produces for the above code: strdup PROC |L1.0| 000000 e4d12001 LDRB a3,[a2],#1 ;1 000004 e4c02001 STRB a3,[a1],#1 ;1 000008 e3520000 CMP a3,#0 ;1 00000c 1afffffb BNE |L1.0| ;1 000010 e1a0f00e MOV pc,lr ;1 AFAICT there is no better solution. Ruth ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-18 8:10 ` Jeff Garzik 2003-04-18 16:20 ` Linus Torvalds @ 2003-04-19 4:14 ` Rusty Russell 2003-04-19 4:48 ` Jeff Garzik 1 sibling, 1 reply; 20+ messages in thread From: Rusty Russell @ 2003-04-19 4:14 UTC (permalink / raw) To: Jeff Garzik; +Cc: Linus Torvalds, linux-kernel In message <3E9FB2E9.9040308@pobox.com> you write: > Rusty Trivial Russell wrote: > > +char *kstrdup(const char *s, int gfp) > > +{ > > + char *buf = kmalloc(strlen(s)+1, gfp); > > + if (buf) > > + strcpy(buf, s); > > + return buf; > > +} > > You should save the strlen result to a temp var, and then s/strcpy/memcpy/ Completely disagree. Write the most straightforward code possible, and then if there proves to be a problem, optimize. Optimizations where there's no actual performance problem should be left to the compiler. Case in point: gcc-3.2 on -O2 on Intel is one instruction longer for your version. Hope that helps, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-19 4:14 ` Rusty Russell @ 2003-04-19 4:48 ` Jeff Garzik 2003-04-19 8:28 ` Rusty Russell 2003-04-19 12:27 ` Alan Cox 0 siblings, 2 replies; 20+ messages in thread From: Jeff Garzik @ 2003-04-19 4:48 UTC (permalink / raw) To: Rusty Russell; +Cc: Linus Torvalds, linux-kernel Rusty Russell wrote: > In message <3E9FB2E9.9040308@pobox.com> you write: > >>Rusty Trivial Russell wrote: >> >>>+char *kstrdup(const char *s, int gfp) >>>+{ >>>+ char *buf = kmalloc(strlen(s)+1, gfp); >>>+ if (buf) >>>+ strcpy(buf, s); >>>+ return buf; >>>+} >> >>You should save the strlen result to a temp var, and then s/strcpy/memcpy/ > > > Completely disagree. Write the most straightforward code possible, > and then if there proves to be a problem, optimize. Optimizations > where there's no actual performance problem should be left to the > compiler. Since the kernel does its own string ops, the compiler does not have enough information to deduce that further optimization is possible. > Case in point: gcc-3.2 on -O2 on Intel is one instruction longer for > your version. And? It's still slower. Jeff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-19 4:48 ` Jeff Garzik @ 2003-04-19 8:28 ` Rusty Russell 2003-04-19 12:27 ` Alan Cox 1 sibling, 0 replies; 20+ messages in thread From: Rusty Russell @ 2003-04-19 8:28 UTC (permalink / raw) To: Jeff Garzik; +Cc: Linus Torvalds, linux-kernel In message <3EA0D524.7010309@pobox.com> you write: > Since the kernel does its own string ops, the compiler does not have > enough information to deduce that further optimization is possible. You're right, I was measuring without the kernel string ops. > > Case in point: gcc-3.2 on -O2 on Intel is one instruction longer for > > your version. > > And? It's still slower. Who gives a flying fuck? You're doing an allocation in there ferchissakes. Choose the simplest option. Jeff, if you have time to post on this, I think you need a hobby 8) char *__constant_kstrdup(const char *s, unsigned int size, int gfp) { char *buf = kmalloc(size, gfp); if (likely(buf)) memcpy(buf, s, size); return buf; } char *__kstrdup(const char *s, int gfp) { return __constant_kstrdup(s, strlen(s)+1, gfp); } #define kstrdup(str, gfp) \ (__builtin_constant_p(str) && sizeof(str) != sizeof(char *) ? \ __constant_kstrdup(str, sizeof(str), gfp) \ : __kstrdup(str, gfp)) Feature list: 1) Optimizes the constant kstrdup case, 2) Doesn't multi-evaluate args (except if constant string), 3) Uses likely() to bias against the case of kmalloc failing. OK, so I guess I need a hobby, too.. Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-19 4:48 ` Jeff Garzik 2003-04-19 8:28 ` Rusty Russell @ 2003-04-19 12:27 ` Alan Cox 2003-04-19 14:44 ` Bernd Eckenfels 2003-04-20 8:05 ` Rusty Russell 1 sibling, 2 replies; 20+ messages in thread From: Alan Cox @ 2003-04-19 12:27 UTC (permalink / raw) To: Jeff Garzik; +Cc: Rusty Russell, Linus Torvalds, Linux Kernel Mailing List On Sad, 2003-04-19 at 05:48, Jeff Garzik wrote: > > Completely disagree. Write the most straightforward code possible, > > and then if there proves to be a problem, optimize. Optimizations > > where there's no actual performance problem should be left to the > > compiler. > > Since the kernel does its own string ops, the compiler does not have > enough information to deduce that further optimization is possible. > > > > Case in point: gcc-3.2 on -O2 on Intel is one instruction longer for > > your version. > > And? It's still slower. You are arguing over a 1 instruction, probably sub 1 clock scheduling matter on a call which is not used on any fast or common path. If you shaved 1 clock off the timer handling instead you'd make a lot more difference.. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-19 12:27 ` Alan Cox @ 2003-04-19 14:44 ` Bernd Eckenfels 2003-04-20 8:05 ` Rusty Russell 1 sibling, 0 replies; 20+ messages in thread From: Bernd Eckenfels @ 2003-04-19 14:44 UTC (permalink / raw) To: linux-kernel In article <1050755240.3277.3.camel@dhcp22.swansea.linux.org.uk> you wrote: > You are arguing over a 1 instruction, probably sub 1 clock scheduling > matter on a call which is not used on any fast or common path. If you > shaved 1 clock off the timer handling instead you'd make a lot more > difference.. i think he is arguing about the performance of strcpy vs. memcpy. Is the 1 instruction longer including the inlined code for those two? Greetings Bernd -- eckes privat - http://www.eckes.org/ Project Freefire - http://www.freefire.org/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [TRIVIAL] kstrdup 2003-04-19 12:27 ` Alan Cox 2003-04-19 14:44 ` Bernd Eckenfels @ 2003-04-20 8:05 ` Rusty Russell 1 sibling, 0 replies; 20+ messages in thread From: Rusty Russell @ 2003-04-20 8:05 UTC (permalink / raw) To: Alan Cox; +Cc: Jeff Garzik, Linus Torvalds, Linux Kernel Mailing List In message <1050755240.3277.3.camel@dhcp22.swansea.linux.org.uk> you write: > On Sad, 2003-04-19 at 05:48, Jeff Garzik wrote: > > And? It's still slower. > > You are arguing over a 1 instruction, probably sub 1 clock scheduling > matter on a call which is not used on any fast or common path. If you > shaved 1 clock off the timer handling instead you'd make a lot more > difference.. Hey, if we have 50 million machines running 2.6 for three years, this optimization saves 100 clock cycles per boot (most kstrdups are device init), and machines boot once a month, and average 1GHz, that's three minutes of saved time. I think Jeff and I should start the kstrdup sourceforge project immediately... Cheers, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2003-04-20 8:00 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-04-18 7:58 [TRIVIAL] kstrdup Rusty Trivial Russell 2003-04-18 8:10 ` Jeff Garzik 2003-04-18 16:20 ` Linus Torvalds 2003-04-18 16:56 ` Jeff Garzik 2003-04-18 18:00 ` Linus Torvalds 2003-04-18 18:44 ` Jeff Garzik 2003-04-19 4:52 ` Rusty Russell 2003-04-18 18:00 ` Richard B. Johnson 2003-04-18 18:40 ` Jeff Garzik 2003-04-18 19:24 ` H. Peter Anvin 2003-04-18 23:37 ` Alan Cox 2003-04-18 19:29 ` Richard B. Johnson 2003-04-19 11:45 ` Kai Henningsen 2003-04-19 20:16 ` Ruth Ivimey-Cook 2003-04-19 4:14 ` Rusty Russell 2003-04-19 4:48 ` Jeff Garzik 2003-04-19 8:28 ` Rusty Russell 2003-04-19 12:27 ` Alan Cox 2003-04-19 14:44 ` Bernd Eckenfels 2003-04-20 8:05 ` Rusty Russell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).