From: Jamie Lokier <jamie@shareable.org>
To: "Richard B. Johnson" <root@chaos.analogic.com>
Cc: "Ihar 'Philips' Filipau" <filia@softhome.net>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: nasm over gas?
Date: Mon, 8 Sep 2003 17:10:32 +0100
Message-ID: <20030908161032.GA26829@mail.jlokier.co.uk>
In-Reply-To: <Pine.LNX.4.53.0309080924440.32483@chaos>

Richard B. Johnson wrote:
> > > Actually it is not as simple as that.  With the instruction that uses
> > > %edi following immediately after the instruction that populates it, you
> > > cannot execute those two instructions in parallel.
> 
> With a single-CPU ix86, the only instructions that operate in
> parallel are the instructions that calculate the next address, and
> this only if you use 'leal'. However, there is an instruction
> pipeline, so many memory accesses may seem to be unrelated to the
> current execution context and therefore assumed to be 'parallel'.

That was true on the 486.  The Pentium famously executed one or two
instructions per cycle, depending on whether they are "pairable".  The
Pentium Pro and later can issue up to 3 instructions per cycle,
depending on the instruction types.  If they are the right
instructions, it will sustain that rate over multiple cycles.

Nowadays all the major x86 CPUs issue multiple instructions per clock cycle.
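As a rough C illustration of the dependency point (not code from the thread; the function names are hypothetical): with one accumulator every add needs the previous result, so the adds form a single serial chain, while two independent accumulators give a CPU that issues several instructions per cycle adds it can overlap.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each addition needs the result of the previous one,
   so the adds form a single serial dependency chain. */
unsigned long sum_serial(const unsigned int *a, size_t n)
{
    unsigned long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two accumulators: the even- and odd-indexed adds do not depend on each
   other, so they can be in flight at the same time. Unsigned addition is
   modular (associative and commutative), so the result equals sum_serial's. */
unsigned long sum_two_chains(const unsigned int *a, size_t n)
{
    unsigned long s0 = 0, s1 = 0;
    size_t i;

    assert(a != NULL || n == 0);
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* odd length: pick up the last element */
        s0 += a[i];
    return s0 + s1;
}
```

An optimizing compiler may already reassociate or vectorize the integer loop in sum_serial by itself, so any speed difference has to be measured on a given CPU rather than assumed.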

-- Jamie



> 
> > >  So the code may be slower.  The
> > > exact rules depend on the architecture of the cpu.
> > >
> >
> >    It would depend only on the CPU architecture if you had an unlimited
> > i$ (instruction cache) size.
> >    Servers with 8MB of cache: yes, it is faster.
> >    Celeron with 128k of cache: +4 bytes == higher probability of an i$
> > miss == lower performance.
> >
> > >
> > >>What gives you the impression that anyone is going to rewrite Linux in asm?
> > >>I am _only_ saying that compiler-generated asm is not 'good'. It's mediocre.
> > >>Nothing more. I am not an asm zealot.
> > >
> > >
> > > I think I would agree with that statement: most compiler-generated assembly
> > > code is mediocre in general.  At the same time I would add that most
> > > human-generated assembly is poor, and a pain to maintain.
> > >
> 
> The compiler-generated assembly is, by design, "universal" so that
> any legal 'C' statement may follow any other legal 'C' statement.
> This means that, at each sequence point, the assembly generation
> is complete. This results in a lot of code duplication, etc. A
> really good optimizer could perform a fix-up that, based upon
> the current 'C' code context, removes a lot of redundancy. Currently,
> gcc does some such optimization, such as loop unrolling.
> 
> A really good project would be an assembly-optimizer operated
> like:
> 
> 	gcc -O2 -S -o -  prog.c | optimizer | as -o prog.o -
> 
> Just make that optimizer and away you go!  I hate parsers and
> other text-based stuff, so I'm not a candidate to make one of
> these things.
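A toy version of the `optimizer` stage in the pipeline above, sketched in C under loud assumptions: it only understands gcc's AT&T `movl SRC, DST` lines, only compares adjacent lines, and only drops a `movl X, %reg` that immediately follows `movl %reg, X` (a pattern common in `-O0` output). All names here are hypothetical. It is also unsafe in general: if X is a `volatile` object or a device register the reload is required, and that is exactly the C-level context the assembly text no longer carries.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy [b, e) into out with surrounding blanks removed.
   Returns 0 if the result is empty or does not fit in cap bytes. */
static int copy_trimmed(const char *b, const char *e, char *out, size_t cap)
{
    while (b < e && (*b == ' ' || *b == '\t')) b++;
    while (e > b && (e[-1] == ' ' || e[-1] == '\t')) e--;
    size_t len = (size_t)(e - b);
    if (len == 0 || len >= cap) return 0;
    memcpy(out, b, len);
    out[len] = '\0';
    return 1;
}

/* Parse "movl SRC, DST". The operand separator is the first comma outside
   parentheses, so addresses like (%ecx,%edx,4) stay intact. */
static int parse_movl(const char *line, char *src, char *dst, size_t cap)
{
    const char *p = line, *comma = NULL, *end;
    int depth = 0;

    while (*p == ' ' || *p == '\t') p++;
    if (strncmp(p, "movl", 4) != 0 || (p[4] != ' ' && p[4] != '\t'))
        return 0;
    p += 4;
    for (end = p; *end && *end != '\n' && *end != '#'; end++) {
        if (*end == '(') depth++;
        else if (*end == ')') depth--;
        else if (*end == ',' && depth == 0 && !comma) comma = end;
    }
    if (!comma) return 0;
    return copy_trimmed(p, comma, src, cap) &&
           copy_trimmed(comma + 1, end, dst, cap);
}

/* 1 if cur merely reloads what prev just stored: prev = "movl %reg, X",
   cur = "movl X, %reg". */
static int redundant_reload(const char *prev, const char *cur)
{
    char ps[64], pd[64], cs[64], cd[64];
    if (!parse_movl(prev, ps, pd, sizeof ps) || !parse_movl(cur, cs, cd, sizeof cs))
        return 0;
    return ps[0] == '%' && strcmp(ps, cd) == 0 && strcmp(pd, cs) == 0;
}

/* Copy in to out, dropping redundant reloads. Lines are assumed to be
   shorter than 512 bytes. */
void peephole(FILE *in, FILE *out)
{
    char prev[512] = "", cur[512];

    assert(in != NULL && out != NULL);
    while (fgets(cur, sizeof cur, in)) {
        if (redundant_reload(prev, cur))
            continue;            /* keep prev: the stored value is still live */
        fputs(cur, out);
        strcpy(prev, cur);
    }
}
```

A driver would be a one-line `main` calling `peephole(stdin, stdout)`, placed between `gcc -S` and `as`.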
> 
> > > If you concentrate on the handful of places where you need to
> > > optimize, that is reasonable.  Beyond that there simply are not the
> > > developer resources to do good assembly.  And things like algorithmic
> > > transformations are an absolute nightmare in assembly, whereas they are
> > > quite simple in C.
> > >
> > > And if the average quality of the generated code bothers you enough with C,
> > > the compiler can be fixed, or another compiler can be written that
> > > does a better job, and the benefit applies to a lot more code.
> > >
> >
> >    e.g. the C-- project: something like C, where you can operate on
> > registers just like other variables. Under DOS it produced .com
> > files without any overhead: a program with only 'int main() { return 0; }'
> > was optimized to a one-byte 'ret' ;-) But of course it was not a complete
> > C implementation.
> >
> >    Sure, I would prefer to have nasm used for the kernel asm parts, but
> > obviously gas has already become the standard.
> >
> > P.S. Also, having a good macro processor for the assembler is a must: CPP
> > is terribly stupid by design. I believe gas has no preprocessor comparable
> > to masm's? I bet they are using C's cpp. This is a degradation: macros
> > are the major feature of every translator I have worked with. They can
> > save you a lot of time and make code much cleaner, more readable, and more
> > maintainable. CPP is just too dumb for asm...
> > Good old times, when people were responsible for _every_ byte of their
> > programs... Yeah... Memory/programmers are cheap nowadays...
> 
> 
> This is for information only. I certainly don't advocate
> writing everything in assembly language.
> 
> Attached is a tar file containing source and a Makefile.
> It generates two tiny programs, "hello" and "world".
> Both write "Hello world!" to standard-output. One is
> written in assembly and the other is written in 'C'.
> The one written in 'C' uses your installed shared
> runtime library as is normal for such programs. Even
> then, it is 2,948 bytes in length. The one written
> in assembly results in a complete executable that
> doesn't require any runtime support, i.e., static.
> It is only 456 bytes in length.
> 
> gcc -Wall -O4 -o hello hello.c
> strip hello
> as -o world.o world.S
> ld -o world world.o
> strip world
> ls -la hello world
> -rwxr-xr-x   1 root     root         2948 Sep  8 08:34 hello
> -rwxr-xr-x   1 root     root          456 Sep  8 08:34 world
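hello.c itself is not included in the message. As a hypothetical point of comparison, a C version could skip stdio and call `write(2)` directly, as sketched below (`say_hello` is an illustrative name, not from the attachment). Even so, a dynamically linked executable still carries an interpreter path, dynamic symbol tables, and relocation data, so it would likely stay well above 456 bytes; much of the gap comes from dynamic linking rather than from the few instructions that do the work.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Write the greeting with the write(2) wrapper instead of stdio.
   Returns the number of bytes written, or -1 on error. */
ssize_t say_hello(int fd)
{
    static const char msg[] = "Hello world!\n";

    assert(fd >= 0);
    return write(fd, msg, sizeof msg - 1);   /* 13 bytes, no trailing NUL */
}
```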
> 
> The point is that if you really need to save some application
> size, in many cases you can do the work in assembly. It is
> a very useful tool. Also, if you have critical sections of
> code you need to pipe-line for speed, you can do it in assembly
> and make sure the optimization doesn't disappear the next
> time somebody updates (improves) your tools. What you write
> in assembly is what you get.
> 
> I don't like "in-line" assembly. Sometimes you don't have
> much choice because you can't call some assembly-language
> function to perform the work. However, when you can afford
> the overhead of calling a function written in assembly, the
> following applies.
> 
> Assume you have:
> 
>  extern int funct(int one, int two, int three);
> 
> Your assembly would obtain parameters as:
> 
> one   = 0x04
> two   = 0x08
> three = 0x0c
> 
> funct:	movl	one(%esp), %eax		# Get first passed parameter
> 	movl	two(%esp), %ebx		# Get second parameter
> 	movl	three(%esp), %ecx	# Get third parameter
> 	...etc
> 
> Now, gcc requires that your function preserve the callee-saved
> registers %ebx, %esi, %edi, and %ebp, as well as any segment
> registers, so, in the case above, we need to save %ebx before we
> modify its value. To do this, we push it onto the stack.
> This will alter the stack offsets where we obtain our input
> parameters.
> 
> 
> one   = 0x08
> two   = 0x0c
> three = 0x10
> 
> funct:	pushl	%ebx			# Save callee-saved %ebx
> 	movl	one(%esp), %eax		# Get first passed parameter
> 	movl	two(%esp), %ebx		# Get second parameter
> 	movl	three(%esp), %ecx	# Get third parameter
> 	...etc
> 	popl	%ebx			# Restore %ebx
> 	ret				# Return; result is in %eax
> 
> So, we could define a macro that adjusts the offsets based
> upon the number of registers saved. I won't bother
> here.
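One way to write the offset-adjusting macro, sketched as a C preprocessor macro (the name `ARG_OFFSET` is hypothetical). It assumes the i386 cdecl layout shown above: the return address sits at `0(%esp)` on entry, each argument occupies a 32-bit slot, and each `pushl` moves every argument 4 bytes further away. The `static_assert` lines check it against the offsets listed in the message.

```c
#include <assert.h>

/* Byte offset from %esp of the n-th (1-based) 32-bit stack argument,
   after `saved` 32-bit registers have been pushed on entry. */
#define ARG_OFFSET(n, saved) (4 * ((n) + (saved)))

/* Nothing pushed: one = 0x04, two = 0x08, three = 0x0c. */
static_assert(ARG_OFFSET(1, 0) == 0x04, "one, nothing pushed");
static_assert(ARG_OFFSET(2, 0) == 0x08, "two, nothing pushed");
static_assert(ARG_OFFSET(3, 0) == 0x0c, "three, nothing pushed");

/* After pushl %ebx: one = 0x08, two = 0x0c, three = 0x10. */
static_assert(ARG_OFFSET(1, 1) == 0x08, "one, after pushl %ebx");
static_assert(ARG_OFFSET(2, 1) == 0x0c, "two, after pushl %ebx");
static_assert(ARG_OFFSET(3, 1) == 0x10, "three, after pushl %ebx");
```

A `.S` file built with `gcc -c` is run through the C preprocessor first, so a `#define` like this could be shared with assembly source there; the Makefile in the message calls `as` directly, which does not run cpp.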
> 
> In almost all cases, any value returned from the function is returned
> in the %eax register. If you need to return a 'long long', both
> %edx and %eax are used (%edx holds the high 32 bits, %eax the low
> 32 bits). Some functions may return values in the floating-point
> unit (float and double results come back in %st(0)), so when
> replacing existing 'C' code, you need to see what the convention was.
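To make the 64-bit return convention concrete, here is a small portable C model (hypothetical helper names) of how a `long long` result is split across the register pair on i386: the low 32 bits travel in %eax and the high 32 bits in %edx. It only mimics the split; it does not touch real registers.

```c
#include <assert.h>
#include <stdint.h>

/* The two 32-bit halves exactly cover one 64-bit value. */
static_assert(2 * sizeof(uint32_t) == sizeof(uint64_t), "pair holds 64 bits");

struct edx_eax { uint32_t edx, eax; };

/* Split a 64-bit result the way an i386 function hands it back. */
struct edx_eax split_u64(uint64_t v)
{
    struct edx_eax r;
    r.eax = (uint32_t)v;           /* low half  -> %eax */
    r.edx = (uint32_t)(v >> 32);   /* high half -> %edx */
    return r;
}

/* Reassemble the value as the caller sees it in %edx:%eax. */
uint64_t join_u64(struct edx_eax r)
{
    return ((uint64_t)r.edx << 32) | r.eax;
}
```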
> 
> When I write assembly-language functions I usually do it to
> replace 'C' functions that (usually) somebody else has written.
> Those 'C' functions are known to work. In other words, they
> perform the correct mathematics. However, they need to be
> sped up or they need to be pared down to a more reasonable
> size to fit in some embedded system.
> 
> Recently we had a function that calculated the RMS value of
> an array of floating-point (double) numbers. With a particular
> array size, the time necessary was something like 300 milliseconds.
> By rewriting in assembly, and using the knowledge that the
> array will never be less than 512 doubles in length and is always
> a power of two, the execution time went way down to 40 milliseconds.
> Also, you can't "cheat" with a FP unit. There are always memory
> accesses that eat valuable CPU time. You can't put temporary float
> values in registers.
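A C sketch of the kind of restructuring those two assumptions allow. This is not the code from the message (which was assembly and is not shown), and the function name is hypothetical. Because n is a power of two and at least 512, it is a multiple of 4, so the loop can be unrolled by 4 with no cleanup loop, and the four accumulators are independent dependency chains. Dividing by n, a power of two, is exact barring underflow. The function returns the mean square; the RMS is its `sqrt()` (from `<math.h>`, linked with `-lm`).

```c
#include <assert.h>
#include <stddef.h>

/* Mean of a[i]^2 over n doubles, assuming n >= 512 and n a power of two. */
double mean_square_pow2(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;

    assert(n >= 512 && (n & (n - 1)) == 0);   /* power of two, >= 512 */
    for (size_t i = 0; i < n; i += 4) {       /* n % 4 == 0: no remainder */
        s0 += a[i]     * a[i];
        s1 += a[i + 1] * a[i + 1];
        s2 += a[i + 2] * a[i + 2];
        s3 += a[i + 3] * a[i + 3];
    }
    return ((s0 + s1) + (s2 + s3)) / (double)n;
}
```

Splitting the sum into four partial sums changes the order of floating-point additions, so the result can differ from a straight serial loop in the last bits; gcc will not make that reordering on its own unless told that reassociation is acceptable (for example with `-ffast-math`).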
> 
> I strongly suggest that if you have an interest in assembly, you
> cultivate that interest. Soon almost all mundane coding will be
> performed by machine from a specification written by "Sales".
> The only "real" programming will be done by those who can make
> the interface between the hardware and the "coding machine". That's
> assembly!
> 
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
>             Note 96.31% of all statistics are fiction.
> 
