All of lore.kernel.org
 help / color / mirror / Atom feed
* char* -> char[] replacement
@ 2009-09-21 17:34 Dr. David Alan Gilbert
  2009-09-23 14:46 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 2+ messages in thread
From: Dr. David Alan Gilbert @ 2009-09-21 17:34 UTC (permalink / raw)
  To: kernel-janitors

Hi,
  I noticed the apparently simple entry in the TODO list saying:

     From: Jeff Garzik

     1) The string form

        [const] char *foo = "blah";
           creates two variables in the final assembly output, a
           static string, and a char pointer to the static string.
           The alternate string form

        [const] char foo[] = "blah";
           is better because it declares a single variable.

        For variables marked __initdata, the "*foo" form causes only
        the pointer, not the string itself, to be dropped from the
        kernel image, which is a bug. Using the "foo[]" form with
        regular 'ole local variables also makes the assembly shorter.

A quick grep found some targets - but I'm not sure that there is
really a benefit; I was comparing the size of the .o's before and
after the change and noticed that on x86-64 gcc 4.4.1
(ubuntu karmic alpha 4:4.4.1-1ubuntu2)
the two different forms cause wildly different optimisation, and
the [] form causes gcc to build the string using instructions.

e.g. with a little standalone test:

int main(int argc, char* argv[]){
    /*char *frob="hello";*/
    char frob[]="hello";
    printf("%s\n", frob);
}

produces code that looks like:

	movl	$1819043176, (%rsp)
	movw	$111, 4(%rsp)
	call	__printf_chk

Doing the same trick to some of the error messages in ntfs/inode.c
I can see the same thing is happening for fairly large strings - e.g.

0000000000000280 <ntfs_truncate>:
     280:	55                   	push   %rbp
     281:	49 ba 20 20 4c 65 61 	mov    $0x6e697661654c2020,%r10
     288:	76 69 6e 
     28b:	48 89 e5             	mov    %rsp,%rbp
     28e:	49 b9 67 20 66 69 6c 	mov    $0x6c20656c69662067,%r9
     295:	65 20 6c 
     298:	41 57                	push   %r15
     29a:	49 b8 65 6e 67 74 68 	mov    $0x756f206874676e65,%r8
     2a1:	20 6f 75 
     2a4:	41 56                	push   %r14
     2a6:	48 be 74 20 6f 66 20 	mov    $0x6e797320666f2074,%rsi
     2ad:	73 79 6e 
     2b0:	41 55                	push   %r13
     2b2:	48 b9 63 20 77 69 74 	mov    $0x6920687469772063,%rcx
     2b9:	68 20 69 
     2bc:	41 54                	push   %r12
     2be:	53                   	push   %rbx
     2bf:	48 89 fb             	mov    %rdi,%rbx
     2c2:	48 81 ec d8 00 00 00 	sub    $0xd8,%rsp
     2c9:	48 81 eb 08 01 00 00 	sub    $0x108,%rbx
     2d0:	48 89 bd 38 ff ff ff 	mov    %rdi,-0xc8(%rbp)
     2d7:	65 48 8b 04 25 28 00 	mov    %gs:0x28,%rax
     2de:	00 00 
     2e0:	48 89 45 c8          	mov    %rax,-0x38(%rbp)
     2e4:	31 c0                	xor    %eax,%eax
     2e6:	48 8b 97 28 ff ff ff 	mov    -0xd8(%rdi),%rdx
     2ed:	48 89 75 a8          	mov    %rsi,-0x58(%rbp)
     2f1:	48 89 95 48 ff ff ff 	mov    %rdx,-0xb8(%rbp)
     2f8:	48 89 4d b0          	mov    %rcx,-0x50(%rbp)
     2fc:	4c 89 55 90          	mov    %r10,-0x70(%rbp)
     300:	4c 89 4d 98          	mov    %r9,-0x68(%rbp)
     304:	4c 89 45 a0          	mov    %r8,-0x60(%rbp)
     308:	c7 45 b8 5f 73 69 7a 	movl   $0x7a69735f,-0x48(%rbp)
     30f:	66 c7 45 bc 65 2e    	movw   $0x2e65,-0x44(%rbp)
     315:	c6 45 be 00          	movb   $0x0,-0x42(%rbp)
     319:	48 c7 c1 00 00 00 00 	mov    $0x0,%rcx
     320:	4c 8b 47 40          	mov    0x40(%rdi),%r8
     324:	48 c7 c2 00 00 00 00 	mov    $0x0,%rdx
     32b:	be 43 09 00 00       	mov    $0x943,%esi
     330:	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
     337:	e8 00 00 00 00       	callq  33c <ntfs_truncate+0xbc>

as opposed to the old:

0000000000000280 <ntfs_truncate>:
     280:	55                   	push   %rbp
     281:	48 c7 c1 00 00 00 00 	mov    $0x0,%rcx
     288:	48 89 e5             	mov    %rsp,%rbp
     28b:	48 c7 c2 00 00 00 00 	mov    $0x0,%rdx
     292:	41 57                	push   %r15
     294:	be 43 09 00 00       	mov    $0x943,%esi
     299:	41 56                	push   %r14
     29b:	41 55                	push   %r13
     29d:	41 54                	push   %r12
     29f:	53                   	push   %rbx
     2a0:	48 89 fb             	mov    %rdi,%rbx
     2a3:	48 81 ec 98 00 00 00 	sub    $0x98,%rsp
     2aa:	48 81 eb 08 01 00 00 	sub    $0x108,%rbx
     2b1:	48 89 bd 78 ff ff ff 	mov    %rdi,-0x88(%rbp)
     2b8:	48 8b 87 28 ff ff ff 	mov    -0xd8(%rdi),%rax
     2bf:	48 89 45 88          	mov    %rax,-0x78(%rbp)
     2c3:	31 c0                	xor    %eax,%eax
     2c5:	4c 8b 47 40          	mov    0x40(%rdi),%r8
     2c9:	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
     2d0:	e8 00 00 00 00       	callq  2d5 <ntfs_truncate+0x55>

This seems pretty mad - is there a good reason gcc is doing this?
Because from that it would seem that changing to char*'s to char[] is
currently growing code rather than shrinking it. Is gcc just being overly
enthusiastic for instructions with large constants?

On related questions, looking through some of the constant strings
in the code does find some oddities:

1) fs/ntfs/inode.c:2331 
  static const char *es = "  Leaving inconsistent metadata.  Unmount and run "
                "chkdsk.";

  is a bit weird (especially since some functions declare a local es).

2) security/tomoyo/common.c:tomoyo_print_double_path_acl
   I'm missing why the atmark1 and atmark2 exist at all.


Dave
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    | Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: char* -> char[] replacement
  2009-09-21 17:34 char* -> char[] replacement Dr. David Alan Gilbert
@ 2009-09-23 14:46 ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 2+ messages in thread
From: Dr. David Alan Gilbert @ 2009-09-23 14:46 UTC (permalink / raw)
  To: kernel-janitors

OK, I've spotted my mistake - but further digging is keeping me
confused about what the expected behaviour is, especially with modern gcc's

* Dr. David Alan Gilbert (linux@treblig.org) wrote:
> Hi,
>   I noticed the apparently simple entry in the TODO list saying:
> 
>      From: Jeff Garzik
> 
>      1) The string form
> 
>         [const] char *foo = "blah";
>            creates two variables in the final assembly output, a
>            static string, and a char pointer to the static string.
>            The alternate string form
> 
>         [const] char foo[] = "blah";
>            is better because it declares a single variable.
> 
>         For variables marked __initdata, the "*foo" form causes only
>         the pointer, not the string itself, to be dropped from the
>         kernel image, which is a bug. Using the "foo[]" form with
>         regular 'ole local variables also makes the assembly shorter.

<snip my original message>

I've been trying this out on a little standalone hello world.

I'd originally replaced char *foo = "blah" by char foo[] = "blah"
and been surprised at the extra code generated - but actually thinking
about it, inside a function this makes sense - every time through the
compiler has to refresh the array since you might have changed it -
so I'm not sure that substitution ever makes sense.

Life gets more interesting with const - and I don't understand why.
On gcc 4.2 if I declare const char foo[] = "blah"  then blah
ends up in .rodata.str, if you use it multiple times it gets shared.
However with 4.3.4 or 4.4.1 (all ubuntu packaged versions)
the const char foo[]="blah" case generates code to write the string
to the stack on each function call - why??? Is this a compiler bug
or is there a reason for this change?

Finally,
  static const char foo[] = "blah";

This ends up putting the constants in a .rodata section rather than
.rodata.str and doesn't share multiple uses of the same string within
the same .c file.

These were all done for local variables; with global variables it
doesn't look much better; const char foo[] does put things in .rodata
but again none-shared, as does static const char foo[].

All these tests were done on x64.

So, too summarise I'm a bit confused, I think the advise to change
  char* foo="hello" to char foo[]="hello" appears to be bogus without
the const.  With the const then I'm not sure the newer compilers are
behaving as expected.

  Sharing only seems to happen with the char* foo="hello".

Maybe in the case of initdata some of this helps, but I can't see outside
that - but I can see it causing code duplication or with the newer
compilers const char foo[] generating more code for no good reason.

Dave
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    | Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2009-09-23 14:46 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-21 17:34 char* -> char[] replacement Dr. David Alan Gilbert
2009-09-23 14:46 ` Dr. David Alan Gilbert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.