From mboxrd@z Thu Jan  1 00:00:00 1970
From: Tim Abbott <tabbott@ksplice.com>
Subject: Re: [PATCH 1/5] vmlinux.lds.h: Include *(.text.*) in TEXT_TEXT
Date: Mon, 14 Jun 2010 22:45:35 -0400 (EDT)
Message-ID: <alpine.DEB.1.10.1006142039250.1248@dr-wily.mit.edu>
References: <1276519112-11649-1-git-send-email-matt@console-pimps.org>  <alpine.DEB.1.10.1006141014360.1432@dr-wily.mit.edu>  <87y6ehxvby.fsf@linux-g6p1.site> <1276545951.5374.260.camel@mulgrave.site>  <alpine.DEB.1.10.1006141608550.1248@dr-wily.mit.edu>
 <1276556919.5374.822.camel@mulgrave.site>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: Matt Fleming <matt@console-pimps.org>, linux-arch@vger.kernel.org,
	Arnd Bergmann <arnd@arndb.de>, linux-kernel@vger.kernel.org,
	Sam Ravnborg <sam@ravnborg.org>, Michal Marek <mmarek@suse.cz>,
	Denys Vlasenko <vda.linux@googlemail.com>,
	Parisc List <linux-parisc@vger.kernel.org>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Return-path: <linux-parisc-owner@vger.kernel.org>
In-Reply-To: <1276556919.5374.822.camel@mulgrave.site>
List-ID: <linux-parisc.vger.kernel.org>
List-Id: linux-parisc.vger.kernel.org

On Mon, 14 Jun 2010, James Bottomley wrote:

> > I believe that the pattern [A-Za-z$_] matches all valid characters to 
> > start a function name (in particular, it includes "$").  If I'm missing 
> > any valid characters for the start of a function name, please correct me.
> 
> Well, our linker seems to generate annoying symbols with carets in
> them ...

The question here is: is there C code that when compiled with 
-ffunction-sections will generate an ELF section with a name that starts 
with ".text.^"?  For that to happen, you would need a function whose name 
started with "^", which isn't valid C.

The relevant namespace here is the names for ELF sections generated by 
-ffunction-sections.  These are in turn computed by the compiler from 
function names -- there's no potential conflict created by 
linker-generated symbols whose names start with a caret.  Similarly, for 
-fdata-sections, we only care about the names of C data objects, which 
also can't start with a caret.
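
To make the namespace argument concrete, here is a rough Python sketch.  It
assumes the compiler names each section ".text." plus the function name, and
that identifiers follow gcc's usual default rule (letters, digits, "_" and
"$", not starting with a digit).  C_IDENT, function_section and the sample
names are hypothetical, and fnmatchcase only approximates the linker's glob
matching.

```python
import re
from fnmatch import fnmatchcase

# Assumed identifier rule: letters, digits, "_" and "$", no leading digit.
C_IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*\Z")

# The pattern under discussion, written as a glob with a trailing "*".
PATTERN = ".text.[A-Za-z$_]*"


def function_section(name):
    """Return the assumed -ffunction-sections name, or None if `name`
    could not be a C function name in the first place."""
    if C_IDENT.match(name) is None:
        return None
    return ".text." + name


for name in ["schedule", "_helper", "$veneer", "^caret"]:
    section = function_section(name)
    if section is None:
        print(name, "-> not a valid C identifier, so no section is generated")
    elif fnmatchcase(section, PATTERN):
        print(section, "-> matched by", PATTERN)
    else:
        print(section, "-> NOT matched by", PATTERN)
```

Under these assumptions every name that can reach the compiler is covered by
the pattern, and "^caret" never produces a section at all.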

> > While one could in principle try to handle things by not renaming the 
> > .text.foo sections and instead just placing the linker script code for 
> > them all before a .text.* item in the linker script, that approach is 
> > really fragile.  I think the "text..foo" approach is a good design and I 
> > am not aware of any problems with it.
> 
> OK, but how about some actual explanation?  You've just characterised
> the current -ffunction-sections scheme that parisc has used for decades
> as "fragile"

The current parisc situation is fine.

What I was trying to draw a contrast with is supporting -fdata-sections by 
adding ".data.*" to DATA_DATA, and then trying to make sure that all the 
architecture linker scripts handle all the kernel's special data sections 
with names like ".data.foo" before the place where DATA_DATA appears in 
their linker scripts.  Most of the architecture linker scripts mention 
more than a half dozen special kernel sections with names of the form 
".data.foo", often in fairly random orders, and so it would be really 
fragile to add the constraint that these sections need to all appear above 
DATA_DATA.

Adding ".data.[A-Za-z$_]*" to DATA_DATA doesn't have this problem.
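
The ordering hazard can be simulated with a small Python sketch.  It assumes
the first-match-wins behaviour that GNU ld documents for input section
patterns, and it uses fnmatchcase as a stand-in for the linker's globbing.
The three rule lists are hypothetical, heavily simplified linker scripts,
not taken from any real architecture.

```python
from fnmatch import fnmatchcase


def assign(section, script):
    """Return the bucket of the first rule whose glob matches `section`."""
    for bucket, pattern in script:
        if fnmatchcase(section, pattern):
            return bucket
    return None


# Broad ".data.*" glob: correct only if every special section comes first.
special_first = [("page_aligned", ".data.page_aligned"),
                 ("DATA_DATA", ".data.*")]
special_last = [("DATA_DATA", ".data.*"),
                ("page_aligned", ".data.page_aligned")]

# Restricted glob plus renamed special section: order no longer matters.
restricted = [("DATA_DATA", ".data.[A-Za-z$_]*"),
              ("page_aligned", ".data..page_aligned")]

print(assign(".data.page_aligned", special_first))
print(assign(".data.page_aligned", special_last))
print(assign(".data..page_aligned", restricted))
print(assign(".data.my_var", restricted))
```

In the second script the special section is silently absorbed into
DATA_DATA, which is the kind of breakage the ordering constraint would
invite.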

If we similarly added ".text.[A-Za-z$_]*" to TEXT_TEXT, we'd presumably 
move the 4 named .text.foo sections before TEXT_TEXT; I don't think any 
other architectures would require any work.

> > Some more detailed explanation is available here:
> > <http://lkml.org/lkml/2010/2/19/365>
> 
> That's still a bit short on explanations.
>
> But if I infer from the rest, someone, somewhere broke the convention
> that all our special linux sections be called .XX.data and .XX.text to
> distinguish them from the .text.FF and .data.YY the compiler will
> generate with the relevant sectional directives? because it's been
> working OK for us for a while.

I don't know the full history here.  But prior to the ".data.foo" -> 
".data..foo" patches that were merged recently, there were a bunch of 
cross-architecture sections with these sorts of names, e.g.:

.data.page_aligned
.data.nosave
.data.read_mostly
.data.cacheline_aligned
.data.lock_aligned
.data.percpu*
.data.init_task
etc.

There were also a bunch of ".text.foo" sections on individual 
architectures, many of which currently don't support -ffunction-sections 
(sh, ia64, x86, mips, etc.).

However, there weren't any cross-architecture .text.foo sections.  
Since parisc only uses -ffunction-sections, and not -fdata-sections, the 
popular .data.foo naming scheme doesn't cause any breakage on parisc.

The only architecture that does use -fdata-sections is frv, and there 
could theoretically have been breakage there, but in practice it's likely 
nobody has written kernel code that would actually conflict, e.g. "static 
int percpu = 3;", yet.
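
As a sketch of that theoretical conflict, the Python below assumes that
-fdata-sections places an initialized object in ".data." plus its name.
data_section is a hypothetical helper, the ".data..percpu*" spelling is
simply the ".data.foo" -> ".data..foo" rename applied to the old pattern,
and fnmatchcase only approximates linker globbing.

```python
from fnmatch import fnmatchcase


def data_section(variable):
    # Assumed -fdata-sections naming for an initialized, writable object.
    return ".data." + variable


# Section assumed for: static int percpu = 3;
generated = data_section("percpu")

old_special = ".data.percpu*"    # special-section pattern before the rename
new_special = ".data..percpu*"   # the same pattern after the rename

print(generated, "caught by old pattern:", fnmatchcase(generated, old_special))
print(generated, "caught by new pattern:", fnmatchcase(generated, new_special))
```

With the old spelling the ordinary variable would be swept into the per-CPU
area; with the double-dot spelling it would not.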

> To fix the breakage, the proposal now is to name all linux special
> sections as .text..XX and .data..XX?  I can see that's more standard
> looking that XX.text and XX.data, but not necessarily that it's better.

Yes, that's the proposal.

> This then introduces a problem of matching because .text.X and .text..X
> are hard to distinguish using the linker matching scripts.

Right.  I believe that this is totally solvable with a simple linker 
script pattern, since the space of valid names for functions and data 
objects in C code is quite restricted; a pattern such as 
".data.[A-Za-z$_]*" solves this problem.

> So even if I buy the rename of the linux symbols, what about using a 
> linker defined symbol that's illegal as a function as the initial 
> separator instead of .?  So hyphen looks the obvious one ... you can 
> have all the linux special sections being .text- and .data- then we can 
> easily distinguish.

Is "." a valid first character for a function name?  I don't see the 
problem with using "." here.

Both .page_aligned.data and .data-page_aligned are valid choices (and in 
fact, the first patch series I sent on this topic about 18 months ago did 
.page_aligned.data, I think).

The main technical difference between ".data..page_aligned" and 
".page_aligned.data" in my view is that you need to be more careful when 
writing assembly files with ".page_aligned.data".

To give an example, if I run the following:

$ cat > foo.s 
.section .data-page-aligned
	.long 0
.section .data.page_aligned
	.long 1
$ gcc -c foo.s -o foo.o
$ objdump -h foo.o 
foo.o:     file format elf32-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000000  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000034  2**2
                  ALLOC
  3 .data-page-aligned 00000004  00000000  00000000  00000034  2**0
                  CONTENTS, READONLY
  4 .data.page_aligned 00000004  00000000  00000000  00000038  2**0
                  CONTENTS, ALLOC, LOAD, DATA

one can see that the .data-page-aligned section doesn't have the right 
section flags.  So I'm pretty sure the relevant assembler heuristic is 
looking for things starting with ".data.", not just ".data".
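
Written out as a Python sketch, the rule I am guessing at looks like this.
It is inferred only from the objdump output above (with the bare ".data"
case added), it is not copied from the assembler source, and
gets_default_data_flags is a hypothetical name.

```python
def gets_default_data_flags(section_name):
    """Guess whether the assembler would mark a section given without
    explicit flags as allocated, writable data.

    Assumed rule: the name is exactly ".data" or starts with ".data.".
    """
    return section_name == ".data" or section_name.startswith(".data.")


for name in [".data", ".data.page_aligned", ".data..page_aligned",
             ".data-page-aligned", ".page_aligned.data"]:
    print(name, "->", "ALLOC, DATA" if gets_default_data_flags(name)
          else "no default data flags")
```

If the guess is right, ".data..page_aligned" keeps the default flags while
".data-page-aligned" and ".page_aligned.data" do not, unless the flags are
spelled out explicitly, as in: .section .data-page-aligned, "aw"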

The kernel has a lot of code in assembly files that just does:

.section ".data"

and so there's a very real risk that folks who are doing pattern-matching 
development on assembly files will end up creating non-allocated sections 
by accident.  I've certainly made this mistake myself, and there was a bug 
of this form in arch/x86/lib/thunk.S until commit 
c6c2d7a084d14a8a701be84872aa1b77d2945f46, so I don't think I'm alone.

I also think that ".data..page_aligned" is more readable as a new name for 
the former ".data.page_aligned" than ".page_aligned.data" is, but I think 
that's a secondary consideration.  ".data.-page_aligned" would be 
technically equivalent to ".data..page_aligned", but I think it is uglier.

	-Tim Abbott