linux-kernel.vger.kernel.org archive mirror
* Why Plan 9 C compilers don't have asm("")
@ 2001-07-04  3:37 Rick Hohensee
  2001-07-04  3:36 ` Olivier Galibert
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Rick Hohensee @ 2001-07-04  3:37 UTC (permalink / raw)
  To: linux-kernel

Because it's messy and unnecessary. Break this into asmlinkbuild, 
asmlink.c, asmlink.h and asmlink.S, chmod +x asmlinkbuild, run it, and
behold a 6. 
__________________________________________________________________

#..........................................................
# asmlinkbuild

gcc -c  asmlink.S
gcc -o asmlinked asmlink.c asmlink.o
./asmlinked

cat asmlinkbuild asmlink.S asmlink.c > asmlink.post


/* ***************************************************
 asmlink.S

int bla (int ha, int hahaha, int uh) ;

That does...

        push uh
        push hahaha
        push ha

*/

.globl bla
bla:
        mov 4(%esp), %eax       /* load, don't add: %eax is undefined on entry */
        add 8(%esp), %eax
        add 12(%esp), %eax
        ret



/* ********************************************   asmlink.c */
#include "asmlink.h"


int main () {
        printf("%d\n", bla(1, 2 , 3 ) ) ;

}

_________________________________________________________________

That's with the GNU tools, without asm(), and without proper declaration
of printf, as is my tendency. I don't actually return an int either, do I?
LAAETTR.

In other words, if you know the push sequence of your C compiler's
function calls, you don't need asm("");. x86 Gcc is "push last declared
first, return in EAX". Plan 9 guys, not surprisingly, seem to prefer to
keep C as C, and asm as asm. I encountered this while trying to build
Linux 1.2.13 with current GNU tools. It breaks on changes in GNU C
asm()'s. Rather a silly thing to break on, eh?
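
As a rough illustration of the "push last declared first, return in EAX" rule, here is a small C model of the stack that bla sees. It is only a sketch: the names sim_stack, sim_push, sim_arg, sim_bla and sim_call_bla are invented for this example, and a plain array stands in for the real x86 stack.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_STACK_CELLS 16

/* Simulated stack of 4-byte cells; it grows downward like the x86 stack. */
static int32_t sim_stack[SIM_STACK_CELLS];
static int sim_sp = SIM_STACK_CELLS;    /* index of the top cell ("%esp") */

static void sim_push(int32_t v)
{
    assert(sim_sp > 0);
    sim_stack[--sim_sp] = v;
}

/* Callee side: read the cell at a byte offset from the simulated %esp. */
static int32_t sim_arg(int byte_offset)
{
    int idx = sim_sp + byte_offset / 4;
    assert(idx < SIM_STACK_CELLS);
    return sim_stack[idx];
}

/* Mirrors bla in asmlink.S: offset 0 is the return address, args follow. */
static int32_t sim_bla(void)
{
    int32_t eax = sim_arg(4);   /* ha     */
    eax += sim_arg(8);          /* hahaha */
    eax += sim_arg(12);         /* uh     */
    return eax;                 /* result travels back in "EAX" */
}

/* Caller side: push the last argument first, call, then drop the frame. */
int32_t sim_call_bla(int32_t ha, int32_t hahaha, int32_t uh)
{
    int32_t eax;

    sim_push(uh);
    sim_push(hahaha);
    sim_push(ha);
    sim_push(0);        /* stand-in for the return address pushed by call */
    eax = sim_bla();
    sim_sp += 1;        /* ret pops the return address */
    sim_sp += 3;        /* the caller removes its three arguments */
    return eax;
}
```

Calling sim_call_bla(1, 2, 3) walks through the same offsets (4, 8 and 12) that the assembly uses.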

I don't think this is much less clear than the : "=r" $0;  stuff, if at
all. This thing didn't take as long to code as it did to construct this
post. Perhaps the C-labels-in-asms optimizes better. I doubt if it's by
much, or if it's worth it.
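
For comparison, this is roughly what the same three-way add could look like with GCC extended asm. It is a hedged sketch, not code from the post: bla_inline is a made-up name, the constraints are one reasonable choice among several, and the plain C fallback is there only so the file still builds on non-x86 targets or other compilers.

```c
/* Hypothetical inline-asm counterpart of bla(); for illustration only. */
int bla_inline(int ha, int hahaha, int uh)
{
    int sum = ha;

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+&r": sum is read and written, and must not share a register
       with an input, because it is modified before uh is consumed. */
    __asm__ ("addl %1, %0\n\t"
             "addl %2, %0"
             : "+&r" (sum)
             : "r" (hahaha), "r" (uh)
             : "cc");
#else
    /* Fallback for targets where the x86 instructions do not apply. */
    sum += hahaha + uh;
#endif
    return sum;
}
```

Here the compiler picks the registers and can inline the whole thing into the caller, which is the trade-off the thread is arguing about.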

Oops. I didn't include asmlink.h in the above, except as a comment
in asmlink.S. Here it is by itself...

/* ********************************************asmlink.h*/
int bla (int ha, int hahaha, int uh) ;


Another easy win from Plan 9 that's related to this but that is not in
evidence here is that this thing on Plan 9 could build asmlinkbuild for
itself on the fly based on #pragma's in the headers that simply state what
library they are the header for. This to me is so obviously an improvement
to the usual state of affairs, an ornate system of dead-ends, as to be
depressing. The guys that wrote UNIX don't do such things to themselves
anymore.

Rick Hohensee
:; cLIeNUX /dev/tty11  11:00:14   /
:;d
ABOUT        LGPL         boot         device       log      subroutine
ABOUT.Linux  Linux        command      floppy       mounts       suite
GPL          README       configure    guest        owner        temp
H3nix        RIGHTS       dev          help         source
:; cLIeNUX /dev/tty11  22:44:25   /
:;

* Re: Why Plan 9 C compilers don't have asm("")
@ 2001-07-04 10:10 Rick Hohensee
  0 siblings, 0 replies; 36+ messages in thread
From: Rick Hohensee @ 2001-07-04 10:10 UTC (permalink / raw)
  To: linux-kernel

>Cort Dougan
>> There isn't such a crippling difference between straight-line and code
>> with unconditional branches in it with modern processors.  In fact,
>> there's very little measurable difference.
>>
>> If you're looking for something to blame hurd performance on I'd suggest
>> the entire design of Mach, not inline asm vs procedure calls.  Tossing a
>> few context switches into calls is a lot more expensive.
>
hpa
>That's not where the bulk of the penalty of a function call comes in
>(and it's a call/return, not an unconditional branch.)  The penalty
>comes in because of the additional need to obey the calling
>convention, and from the icache discontinuity.
>

call/return is two unconditional branches plus a push and a pop (is that
right?), which I think is what CD means, i.e. in terms of branch
prediction. The push/pop is a hit on old CPUs; I don't know about anything
past the 386. You're right though. The big hit is that you can't lose the
pushes that set up the args for a separately assembled function, or the
frame drop that follows it.

>Not to mention that certain things simply cannot be done that way.
>

Don't tell me that. Then I can't use my subroutine-threaded Forth
variant, in which + is a subroutine call.  ;o)

Anyway, yes it's a performance hit to not inline asms. Is it worth the
bletchery? It's worth asking that once in a while. I've looked at set_bit
both ways. Now I'm curious how it does as straight C.
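
A straight C set_bit might look something like the sketch below. The names set_bit_c, test_bit_c and BITS_PER_WORD are invented for this example, and unlike the kernel's set_bit this version makes no attempt to be atomic, so it is not a drop-in replacement.

```c
#include <limits.h>

/* Number of bits in one word of the bitmap. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Set bit nr in the bitmap starting at addr (plain C, NOT atomic). */
void set_bit_c(unsigned int nr, unsigned long *addr)
{
    addr[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* Return 1 if bit nr is set in the bitmap starting at addr, else 0. */
int test_bit_c(unsigned int nr, const unsigned long *addr)
{
    return (int)((addr[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}
```

Whether the compiler turns this into something as tight as a single bit-set instruction depends on the compiler and flags, which is exactly the open question.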

Rick Hohensee

* Re: Why Plan 9 C compilers don't have asm("")
@ 2001-07-05  3:26 Rick Hohensee
  0 siblings, 0 replies; 36+ messages in thread
From: Rick Hohensee @ 2001-07-05  3:26 UTC (permalink / raw)
  To: linux-kernel

>Now, you could probably argue that instead of inline asms we should have
>more flexibility in doing a per-callee calling convention. That would be
>good too, no question about it.
>
>                        Linus
>

Today's flamebait has been postponed. Happy July 4th. Peace.

Rick Hohensee
		www.clienux.com

* Re: Why Plan 9 C compilers don't have asm("")
@ 2001-07-06 17:24 Rick Hohensee
  2001-07-06 23:54 ` David S. Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Rick Hohensee @ 2001-07-06 17:24 UTC (permalink / raw)
  To: linux-kernel

>Cort Dougan writes:
> > I'm talking about _modern_ processors, not processors that dominate the
> > modern age.  This isn't x86.
>
>Linus mentioned Alpha specifically.  I don't see how any of the things
>he said were x86-centric in any way shape or form.
>
>All of his examples are entirely accurate on sparc64 for example, and
>to even moreso his Alpha commentary can nearly directly be applied to
>the MIPS.
>
>Calls suck ass, even on modern cpus.  I've seen several hundreds of
>

Modern? How many stacks?
There are a couple of Forth engines out there that pay the usual for a call
and get returns in zero time. Forth code, and Forth engine machine
instructions, have about twice as many calls as Linux code,
proportionately. Therefore, a return on some designs is one bit in every
instruction. Every instruction is "...and maybe do a return in parallel."
Forth engines don't have caches. They have on-chip stacks, or, in the case
of the Novix, separate buses to the stacks. Both stacks, return and data.
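
To make the "return bit in every instruction" idea concrete, here is a toy two-stack interpreter in C. It is purely illustrative: the instruction encoding, the toy_ names and the demo program are all assumptions made up for this sketch, and they do not reflect the instruction format of the Novix or any other real Forth chip.

```c
#include <assert.h>

enum toy_op { TOY_LIT, TOY_ADD, TOY_CALL, TOY_HALT };

struct toy_insn {
    enum toy_op op;
    int arg;    /* literal for TOY_LIT, target index for TOY_CALL */
    int ret;    /* the "...and maybe do a return in parallel" bit */
};

#define TOY_DEPTH 16

/* Run prog from index 0; return the top of the data stack at TOY_HALT.
   *steps receives the number of instructions executed. */
int toy_run(const struct toy_insn *prog, int *steps)
{
    int data[TOY_DEPTH], rets[TOY_DEPTH];   /* separate data/return stacks */
    int dsp = 0, rsp = 0, pc = 0;

    *steps = 0;
    for (;;) {
        const struct toy_insn in = prog[pc++];
        ++*steps;
        switch (in.op) {
        case TOY_LIT:
            assert(dsp < TOY_DEPTH);
            data[dsp++] = in.arg;
            break;
        case TOY_ADD:
            assert(dsp >= 2);
            data[dsp - 2] += data[dsp - 1];
            dsp--;
            break;
        case TOY_CALL:
            assert(rsp < TOY_DEPTH);
            rets[rsp++] = pc;       /* a call still costs an instruction */
            pc = in.arg;
            continue;
        case TOY_HALT:
            assert(dsp >= 1);
            return data[dsp - 1];
        }
        if (in.ret) {               /* the return rides along for free */
            assert(rsp > 0);
            pc = rets[--rsp];
        }
    }
}

/* 1 2 3 sum3, where sum3 is "ADD, then ADD with the return bit set". */
const struct toy_insn toy_demo[] = {
    { TOY_LIT,  1, 0 },
    { TOY_LIT,  2, 0 },
    { TOY_LIT,  3, 0 },
    { TOY_CALL, 5, 0 },
    { TOY_HALT, 0, 0 },
    { TOY_ADD,  0, 0 },     /* index 5: start of sum3 */
    { TOY_ADD,  0, 1 },     /* last add also returns */
};
```

The subroutine has no separate return instruction; the second add carries the return with it, so the call costs one instruction slot and the return costs none.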

Forth chips aren't modern in the true-multi-user sense, but if an
individual were to design such a beast they could get several of them,
hundreds maybe, on FPGAs available now. Such things are coming, because a 
Forth chip IS something an individual can design.

Rick Hohensee
		www.clienux.com

* Re: Why Plan 9 C compilers don't have asm("")
@ 2001-07-07  6:16 Rick Hohensee
  0 siblings, 0 replies; 36+ messages in thread
From: Rick Hohensee @ 2001-07-07  6:16 UTC (permalink / raw)
  To: linux-kernel

I replied to davem at length, but I think I forgot to "reply to all
recipients". The gist of it is that Forth code density is so high on Forth
hardware that things like icaches aren't as important, and the factors
involved are entirely different. For example, high-performance Forth engines
are tiny and draw negligible current. Two URLs...

	http://forth.gsfc.nasa.gov/
	http://www.mindspring.com/chipchuck/forth.html

Rick Hohensee
		www.clienux.com

[parent not found: <mailman.994629840.17424.linux-kernel2news@redhat.com>]
* Re: Why Plan 9 C compilers don't have asm("")
@ 2001-07-09  3:03 Rick Hohensee
  0 siblings, 0 replies; 36+ messages in thread
From: Rick Hohensee @ 2001-07-09  3:03 UTC (permalink / raw)
  To: linux-kernel

>Victor Yodaiken <yodaiken@fsmlabs.com>
>
>I think anywhere that you have inner loop or often used operations
>that are short assembler sequences, inline asm is a win - it's easy to
>show for example, that the Linux asm x86 macro semaphore down is three
>times as fast as a called version. I wish, however, that GCC did not use
>a horrible overly complex lisplike syntax and that there was a way to
>inline functions written in .S files.

If you can loop faster in asm, and you surely can on x86/Gcc in many
cases, that's a win, and probably quite a worthwhile one, but that's
independent of inline in the sense of "not a C call". I think that
distinction is prone to being overlooked. The longer your average loop, the
less asm("") matters, i.e. the less of a proportional hit a C stack ceremony
is. You can loop in asm and still not need asm(""), if you pay for the
stack frame. Plan 9 has about 4 string functions that are hand-coded, but
they are C-called, from what I can tell and have been told.
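
The proportionality argument can be put as simple arithmetic. The helper below is a sketch with an invented name (call_overhead_fraction), and any cycle counts fed to it are illustrative inputs, not measurements of any real CPU.

```c
/* Fraction of total time spent on the call/return ceremony when a
   separately assembled routine loops iters times inside one call.
   At least one of the two cycle costs must be non-zero. */
double call_overhead_fraction(double call_cycles,
                              double cycles_per_iter,
                              unsigned long iters)
{
    double body = cycles_per_iter * (double)iters;

    return call_cycles / (call_cycles + body);
}
```

With a made-up 10-cycle ceremony and 1 cycle per iteration, a 10-iteration loop spends half its time on the ceremony, while a 1000-iteration loop spends under 1% on it.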

Rick Hohensee
		www.clienux.com



* Re: Why Plan 9 C compilers don't have asm("")
@ 2001-07-23  4:39 Rick Hohensee
  0 siblings, 0 replies; 36+ messages in thread
From: Rick Hohensee @ 2001-07-23  4:39 UTC (permalink / raw)
  To: linux-kernel

What if Forth only had one stack?

Looking at optimizations and calling conventions, I did some gas/cpp
macros that implement "caller-hikes, callee-passes". The caller makes the
space for the callee's stack frame, but it's up to the callee to populate
it if necessary. Sometimes it isn't. In assembly the current context can
see the whole stack, and the "osimpa" macros, not all included here, make the
parent frame, the current frame, and the most recently exited child frame
3 sets of named locals. This is in conjunction with x86 RET imm16, which
does a stack frame drop for free. I got the Ackermann function, a nasty
little recursion exercise, and rather C's home court, about 50% faster
than Gcc 3.0 -O3 -fomit-frame-pointer. The Gcc version does optimize out
the two tail recursions, leaving one non-tail recursion. I beat that with
all 3 recursive calls remaining in my code, i.e. this is the first version
that worked. I stared at this monster for 2 full days looking for where I
had written "increment" instead of "decrement". Now it appears to produce
the correct results.

..........................................................................

#define cell    4
#define cells   *4
#define sM       4 (%esp)
#define sN       8 (%esp)
                                        /* some of the parent's locals */
#define pM      ((def_hike +2)  cells) (%esp)
#define pN      ((def_hike +3)  cells) (%esp)


#define def(routine,HIKE)                       \
        def_hike    =       HIKE    ;       \
        .globl routine                  ;       \
        routine:

#define fed             ret $(def_hike cells)

#define child(callee)   child_hike = callee ## _hike

#define hike(by)        subl $(by cells) , %esp

#define do(callee)              \
        hike(def_hike)   ;\
        call callee
                                /* Asmacs excerpts as pertains */
#define testsubtract    cmpl
#define ifzero          jz
#define decrement       decl
#define increment       incl
#define to              ,
#define with            ,
#define copy            movl
#define A               %eax

def(Ack,2)
testsubtract $0 with pM
ifzero                                          alpha
        testsubtract $0 with pN
        ifzero                          beta
#                               return( Ack(M - 1, Ack(M, (N - 1))) );
                copy pN to A
                decrement A
                copy A to sN
                copy pM to A
                copy A to sM
                        do(Ack)
                copy A to sN
                decrement sM
                        do(Ack)
                                fed
#                                        return( N + 1 );
alpha:  copy pN to A                                    # M=0
        increment A
                        fed
#                                       return( Ack(M - 1, 1) );
beta:   copy $1 to sN                                   # N=0
        copy  pM to A
        decrement A
        copy A to sM
                do(Ack)
                        fed

#___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___

def(main,2)                                     # known OK
        copy $2 to sM
        copy $8 to sN
        do(Ack)
                        fed



/* Rick Hohensee 2001 */


/* The Ackermann function in GNU gas with cpp macros for "asmacs"
verbosifications and "osimpa" caller-hikes, callee-passes subroutine
parameter passing

Parts of asmacs, osimpa.cpp and local renamings included in this file for
clarity.

 osimpa stuff, with locals names to reflect the C code example and osimpa
callee-passes, i.e. pM instead of Pa, sM instead of a, etc.

The full asmacs is in Janet_Reno and H3sm. osimpa isn't out yet.
*/
 
....................................................................
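
For readers who would rather trace the frame layout in C, the listing above can be modelled with an explicit array as the stack. This is only a sketch of one reading of the macros: sim_stack, sim_sp, call_ack, ack_sim and ack_via_stack are invented names, C recursion stands in for call/ret, and the sM/sN/pM/pN macros are redefined here as array accesses at the same cell offsets the post uses.

```c
#include <assert.h>

#define STACK_CELLS 16384
#define HIKE 2                      /* cells the caller reserves per call */

static int sim_stack[STACK_CELLS];
static int sim_sp = STACK_CELLS - 4;    /* top-of-stack cell; grows down */

/* Current frame and parent frame, at the offsets used in the listing. */
#define sM sim_stack[sim_sp + 1]
#define sN sim_stack[sim_sp + 2]
#define pM sim_stack[sim_sp + HIKE + 2]
#define pN sim_stack[sim_sp + HIKE + 3]

static int ack_sim(void);

/* do(Ack) followed by the callee's fed. */
static int call_ack(void)
{
    int a;

    assert(sim_sp - (HIKE + 1) >= 0);
    sim_sp -= HIKE;         /* hike: caller makes room for the callee */
    sim_sp -= 1;            /* call: slot for the return address */
    a = ack_sim();
    sim_sp += 1 + HIKE;     /* fed: ret drops the address and the hike */
    return a;
}

/* Body of Ack; the callee fills its own cells only when it calls on. */
static int ack_sim(void)
{
    int a;

    if (pM == 0)            /* alpha: return N + 1 */
        return pN + 1;
    if (pN == 0) {          /* beta: return Ack(M - 1, 1) */
        sN = 1;
        sM = pM - 1;
        return call_ack();
    }
    sN = pN - 1;            /* inner call: Ack(M, N - 1) */
    sM = pM;
    a = call_ack();
    sN = a;                 /* outer call: Ack(M - 1, inner result) */
    sM -= 1;
    return call_ack();
}

/* Entry point: plays the role of main in the listing. */
int ack_via_stack(int m, int n)
{
    sM = m;
    sN = n;
    return call_ack();
}
```

Note that the M == 0 case never writes to its own frame, which is the "sometimes it isn't" case: the space was hiked but never populated.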

I compared this to the C version on Bagley's language shootout
page by hacking that down to use

<snip>

main () {
return Ack(3,8);
}

so it just returns the low byte of the result, as does my code.
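
For reference, a plain C Ackermann function can be reconstructed from the C-style comments in the assembly listing. This is not the shootout source, which is snipped above; it is a sketch under that assumption, and low_byte is a made-up helper showing what survives when the result is returned from main as an exit status.

```c
/* Ackermann function, following the return(...) comments in the listing. */
int Ack(int M, int N)
{
    if (M == 0)
        return N + 1;
    if (N == 0)
        return Ack(M - 1, 1);
    return Ack(M - 1, Ack(M, N - 1));
}

/* Only the low 8 bits of main's return value reach the parent process. */
int low_byte(int v)
{
    return v & 0xFF;
}
```

With this definition Ack(2, 8) is 19 and Ack(3, 8) is 2045, whose low byte is 253.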

C can pick this up after the expressions are parsed. Whereas this models
"stack-array plus accumulator", that's actually less aggravation to
program directly than Forth stack manipulations (well, maybe), so I'll
probably code this on top of shasm without an expression parser. osimpa
stands for "one stack in memory plus accumulator".

Rick Hohensee
			www.clienux.com


end of thread, other threads:[~2001-07-23  4:23 UTC | newest]

Thread overview: 36+ messages
2001-07-04  3:37 Why Plan 9 C compilers don't have asm("") Rick Hohensee
2001-07-04  3:36 ` Olivier Galibert
2001-07-04  6:24   ` Cort Dougan
2001-07-04  8:03     ` H. Peter Anvin
2001-07-04 17:22     ` Linus Torvalds
2001-07-06  8:38       ` Cort Dougan
2001-07-06 18:44         ` Linus Torvalds
2001-07-06 20:02           ` Cort Dougan
2001-07-08 21:55           ` Victor Yodaiken
2001-07-08 22:28             ` Alan Cox
2001-07-09  1:22             ` Johan Kullstam
2001-07-08 22:29           ` David S. Miller
2001-07-06 11:43       ` David S. Miller
2001-07-21 22:10       ` Richard Henderson
2001-07-22  3:43         ` Linus Torvalds
2001-07-22  3:59           ` Mike Castle
2001-07-22  6:49           ` Richard Henderson
2001-07-22  7:44             ` Linus Torvalds
2001-07-22 15:53               ` Richard Henderson
2001-07-22 19:08                 ` Linus Torvalds
2001-07-04  7:15 ` pazke
2001-07-04 17:32 ` Don't feed the trooll [offtopic] " Ben LaHaise
2001-07-05  1:02 ` Michael Meissner
2001-07-05  1:54   ` Rick Hohensee
2001-07-05 16:54     ` Michael Meissner
2001-07-04 10:10 Rick Hohensee
2001-07-05  3:26 Rick Hohensee
2001-07-06 17:24 Rick Hohensee
2001-07-06 23:54 ` David S. Miller
2001-07-07  0:16   ` H. Peter Anvin
2001-07-07  0:37   ` David S. Miller
2001-07-07  6:16 Rick Hohensee
     [not found] <mailman.994629840.17424.linux-kernel2news@redhat.com>
2001-07-09  0:08 ` Pete Zaitcev
2001-07-09  0:28   ` Victor Yodaiken
2001-07-09  3:03 Rick Hohensee
2001-07-23  4:39 Rick Hohensee
