linux-kernel.vger.kernel.org archive mirror
* Using %cr2 to reference "current"
@ 2001-11-06  7:18 H. Peter Anvin
  2001-11-06  8:01 ` Robert Love
                   ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: H. Peter Anvin @ 2001-11-06  7:18 UTC (permalink / raw)
  To: linux-kernel

2.4.13-ac8 uses %cr2 rather than (%esp & 0xffffe000) to get "current".
I've been trying to figure out the point of this... writing a control
register is microcode on all the x86 implementations I know (and you
have to re-set it after every pagefault), and reading one probably is
microcoded on most (not Transmeta, but...)

On the other hand, %esp is a GPR and available to the core directly,
and plain immediates usually are as well.
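
As a rough illustration of the %esp masking, here is a user-space sketch.
It assumes the 2.4 layout of an 8 kB kernel stack aligned on an 8 kB
boundary, with the task struct at its base. THREAD_SIZE and
current_from_esp() are names made up for the example, and the stack
pointer is passed in as a plain integer rather than read from the register.

```c
#include <stdint.h>

/* Assumed for illustration: 8 kB stack, aligned on an 8 kB boundary. */
#define THREAD_SIZE 8192u

/* Mask off the low 13 bits of a (simulated) 32-bit stack pointer to
 * find the base of the stack area, where the task struct is assumed
 * to live. */
static uint32_t current_from_esp(uint32_t esp)
{
	return esp & ~(THREAD_SIZE - 1u);	/* same as esp & 0xffffe000 */
}
```

The mask is an immediate and %esp is an ordinary register, so the whole
lookup is one load-immediate plus one "and".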

Is using %cr2 really faster than the old implementation, or is there
another reason?  It seems that the alignment constraints on the stack
still remain, since the %esp solution still remains in places...

It might also be worth considering a segment-register based
implementation instead.  The reason we're not using %fs and %gs in the
kernel anymore is because of the setup slowness, but perhaps using
them (use %fs since it's much more likely to be NULL and thus faster
to restore) would be faster than using %cr2?

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06  7:18 Using %cr2 to reference "current" H. Peter Anvin
@ 2001-11-06  8:01 ` Robert Love
  2001-11-06 10:55   ` Alan Cox
  2001-11-06 14:14   ` Manfred Spraul
  2001-11-06 10:58 ` Alan Cox
  2001-11-06 17:02 ` Using %cr2 to reference "current" Linus Torvalds
  2 siblings, 2 replies; 61+ messages in thread
From: Robert Love @ 2001-11-06  8:01 UTC (permalink / raw)
  To: manfred; +Cc: linux-kernel, hpa

On Tue, 2001-11-06 at 02:18, H. Peter Anvin wrote:
> 2.4.13-ac8 uses %cr2 rather than (%esp & 0xffffe000) to get "current".
> I've been trying to figure out the point of this...  <snip>

I too am confused.  More so, the difference between hard_get_current and
get_current is confusing.  I further question things because I suspect
there is a problem: hard_get_current is commented as "for within NMI,
do_page_fault, cpu_init" but all these functions call other functions
that may very well use get_current.  How is this going to work?

Further, the preemptible kernel patch oopses with this patch (IOW, don't
use 2.4.13-ac8 + preempt-kernel, unless you remove all these bits like I
did :>).  I think it may be because of:

Manfred Spraul wrote:
> error_code:
> [...]
> -       GET_CURRENT(%ebx)
>         call *%edi
>         addl $8,%esp
> +       GET_CURRENT(%ebx)
> The pointer to current was loaded into %ebx before the call to the error
> handler, now that only happens after the call. As far as I can see the
> load before the call is not required.

this change but I am unsure.  Would Manfred or someone knowledgeable in
this mind letting me pick their brain?

	Robert Love


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06  8:01 ` Robert Love
@ 2001-11-06 10:55   ` Alan Cox
  2001-11-06 17:31     ` Michael Barabanov
  2001-11-06 14:14   ` Manfred Spraul
  1 sibling, 1 reply; 61+ messages in thread
From: Alan Cox @ 2001-11-06 10:55 UTC (permalink / raw)
  To: Robert Love; +Cc: manfred, linux-kernel, hpa

> I too am confused.  More so, the difference between hard_get_current and
> get_current is confusing.  I further question things because I suspect

hard_get_current always works
get_current assumes %cr2 is loaded correctly

> do_page_fault, cpu_init" but all these functions call other functions
> that may very well use get_current.  How is this going to work?

do_page_fault and cpu_init load %cr2

> Further, the preemptible kernel patch oopses with this patch (IOW, don't
> use 2.4.13-ac8 + preempt-kernel, unless you remove all these bits like I
> did :>).  I think it may be because of:

You must ensure that you don't pre-empt until %cr2 is loaded. Obviously this
isn't a problem with the traditional low latency patch but if you pre-empt
very early in page fault handling then I suspect you might get the odd
surprise.

The reasoning behind all this is to fix the cache pessimal nature of the x86
stack layout - we had all task structs on the same cache colour and all 
stacks aligned within pages (so every apache thread waiting at the same
point is on the same colour too and each wait queue entry on their stacks
is linked to entries all the same colour)

Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06  7:18 Using %cr2 to reference "current" H. Peter Anvin
  2001-11-06  8:01 ` Robert Love
@ 2001-11-06 10:58 ` Alan Cox
  2001-11-06 17:04   ` Linus Torvalds
  2001-11-06 17:02 ` Using %cr2 to reference "current" Linus Torvalds
  2 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2001-11-06 10:58 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

> Is using %cr2 really faster than the old implementation, or is there
> another reason?  It seems that the alignment constraints on the stack
> still remain, since the %esp solution still remains in places...

The stack is no longer aligned. We allocate two pages and disturb the stack
by up to 1.5K. We slab the task structs.

> It might also be worth considering a segment-register based
> implementation instead.  The reason we're not using %fs and %gs in the
> kernel anymore is because of the setup slowness, but perhaps using
> them (use %fs since it's much more likely to be NULL and thus faster
> to restore) would be faster than using %cr2?

It may be. Likewise it's not clear if %cr2 should hold current or a cpu ident
pointer (so you don't reload on switch of task). This needs more
benchmarking. It's in current -ac to verify the theory is correct, not the
tuning.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06  8:01 ` Robert Love
  2001-11-06 10:55   ` Alan Cox
@ 2001-11-06 14:14   ` Manfred Spraul
  1 sibling, 0 replies; 61+ messages in thread
From: Manfred Spraul @ 2001-11-06 14:14 UTC (permalink / raw)
  To: Robert Love; +Cc: linux-kernel, hpa

Robert Love wrote:
> 
> Further, the preemptible kernel patch oopses with this patch (IOW, don't
> use 2.4.13-ac8 + preempt-kernel, unless you remove all these bits like I
> did :>).  I think it may be because of:
>

Could you send me an oops?
I assume that a
	set_current(hard_get_current());
is missing somewhere.
The assumption is that get_current() is faster than hard_get_current(),
and that there are so many get_current() calls that the overhead for the
set_current() in __switch_to and do_page_fault is small.
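
A toy user-space model of that assumption might look like the following.
Everything here is a stand-in: fake_cr2 plays the role of %cr2,
fake_stack_current is the value the stack-based lookup would return, and
take_page_fault()/handle_page_fault() are invented names, not the
functions in the patch.

```c
#include <stdint.h>

/* Stand-in for %cr2: a page fault overwrites it with the fault address. */
static uintptr_t fake_cr2;

/* Stand-in for the stack-derived lookup, which always works. */
static uintptr_t fake_stack_current;

static uintptr_t hard_get_current(void) { return fake_stack_current; }
static uintptr_t get_current(void)      { return fake_cr2; }
static void set_current(uintptr_t tsk)  { fake_cr2 = tsk; }

/* The hardware side of a page fault: %cr2 now holds the fault address. */
static void take_page_fault(uintptr_t fault_address)
{
	fake_cr2 = fault_address;
}

/* Read the fault address first, then restore the cached pointer. */
static uintptr_t handle_page_fault(void)
{
	uintptr_t address = fake_cr2;

	set_current(hard_get_current());
	return address;
}
```

Between take_page_fault() and the set_current() call, get_current()
returns the fault address instead of the task pointer, so nothing that
relies on get_current() may run in that window.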

> Manfred Spraul wrote:
> > error_code:
> > [...]
> > -       GET_CURRENT(%ebx)
> >         call *%edi
> >         addl $8,%esp
> > +       GET_CURRENT(%ebx)
> > The pointer to current was loaded into %ebx before the call to the error
> > handler, now that only happens after the call. As far as I can see the
> > load before the call is not required.
> 
> this change but I am unsure.  Would Manfred or someone knowledgeable in
> this mind letting me pick their brain?
>
I would be very surprised if that's a problem: the error handlers are C
functions, and they don't expect parameters in register %ebx.

--
	Manfred

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 18:14         ` Alan Cox
@ 2001-11-06 16:55           ` Marcelo Tosatti
  2001-11-06 18:14           ` Linus Torvalds
  2001-11-07  0:00           ` Martin Dalecki
  2 siblings, 0 replies; 61+ messages in thread
From: Marcelo Tosatti @ 2001-11-06 16:55 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, linux-kernel



On Tue, 6 Nov 2001, Alan Cox wrote:

> > "get_current" interrupt safe (ie switching tasks is totally atomic, as
> > it's the one single "movl ..,%esp" instruction that does the real switch
> > as far as the kernel is concerned).
> > 
> > It does require using an order-2 allocation, which the current VM will
> > allow anyway, but which is obviously nastier than an order-1.
> 
> I've seen boxes dead in the water from 8K NFS (ie 16K order-2 allocations),
> let alone the huge memory hit. Michael's rtlinux approach looks even more
> interesting and I may have to play with that (using the TSS to ident the
> cpu)

Btw, I also want to see what intense "for-optimization" high-order
allocators are going to do to the current VM. 

Think about the possible intensive pressure (and CPU wasted) caused by,
for example, SCSI code which _always_ tries to do 1-order allocations (or
bigger?) to allocate scatter/gather tables. We want those allocations to
fall back to 0-order allocations instead of looping madly inside the VM
freeing routines.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06  7:18 Using %cr2 to reference "current" H. Peter Anvin
  2001-11-06  8:01 ` Robert Love
  2001-11-06 10:58 ` Alan Cox
@ 2001-11-06 17:02 ` Linus Torvalds
  2001-11-06 17:13   ` Benjamin LaHaise
  2 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2001-11-06 17:02 UTC (permalink / raw)
  To: linux-kernel

In article <9s82rl$k51$1@cesium.transmeta.com>,
H. Peter Anvin <hpa@zytor.com> wrote:
>
>Is using %cr2 really faster than the old implementation, or is there
>another reason?  It seems that the alignment constraints on the stack
>still remain, since the %esp solution still remains in places...

I think the _real_ issue with that patch is that %cr2 is by no means
architecturally even guaranteed to work the way the patches want it to
work. 

It's simply not a general-purpose register, and I don't see why it is
assumed to be (a) fast (b) stable and (c) writable.

I could well imagine a x86-compatible chip where %cr2 isn't even
writable.  In fact, reading the intel documentation, I see _nowhere_ a
mention of %cr2 being writable at all - it all just says "contains the
fault address". 

Similarly, there is _nothing_ that guarantees that the low bits of %cr2
are meaningful, writable, or even implemented.

Which means that the whole approach is just depending on undocumented
implementation behaviour. That's asking for trouble.

			Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 10:58 ` Alan Cox
@ 2001-11-06 17:04   ` Linus Torvalds
  2001-11-06 17:46     ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2001-11-06 17:04 UTC (permalink / raw)
  To: linux-kernel

In article <E1613vx-00005r-00@the-village.bc.nu>,
Alan Cox  <alan@lxorguk.ukuu.org.uk> wrote:
>
>It may be. Likewise it's not clear if %cr2 should hold current or a cpu ident
>pointer (so you don't reload on switch of task). This needs more
>benchmarking. It's in current -ac to verify the theory is correct, not the
>tuning.

We pretty much know the _theory_ is not correct, just by virtue of
depending on non-architected behaviour.  The only thing -ac can do is
test whether it works in practice.  Which is a totally different thing. 

Especially on x86 chips.

		Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 17:02 ` Using %cr2 to reference "current" Linus Torvalds
@ 2001-11-06 17:13   ` Benjamin LaHaise
  2001-11-06 17:49     ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Benjamin LaHaise @ 2001-11-06 17:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Tue, Nov 06, 2001 at 05:02:32PM +0000, Linus Torvalds wrote:
> Which means that the whole approach is just depending on undocumented
> implementation behaviour. That's asking for trouble.

NetWare uses it and has for a long time.

		-ben

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 10:55   ` Alan Cox
@ 2001-11-06 17:31     ` Michael Barabanov
  0 siblings, 0 replies; 61+ messages in thread
From: Michael Barabanov @ 2001-11-06 17:31 UTC (permalink / raw)
  To: Alan Cox; +Cc: Robert Love, manfred, linux-kernel, hpa, Victor Yodaiken

Here's my version of hard cpu id (RTLinux version):

extern inline int rtl_getcpuid(void)
{
        unsigned cpu;
        __asm__ (
                        "str %%ax\n\t"
                        "shr $5, %%eax\n\t"
                        "sub $3, %%eax\n\t"
                        : "=a"(cpu));
        return cpu;
}

No cr2 involved; extremely fast. This takes advantage of the fact that
TSS-CPU mapping is 1-1 in 2.4.
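
The shift and subtract can be checked with plain arithmetic. The sketch
below assumes the GDT layout those constants imply (first TSS at GDT
entry 12, four entries per CPU, 8 bytes per descriptor). The macro and
function names are made up for the example.

```c
/* Assumed GDT layout, inferred from the shift and subtract in the asm:
 * first TSS at GDT entry 12, four entries per CPU, 8 bytes per
 * descriptor, so CPU n has TSS selector (12 + 4*n) * 8. */
#define FIRST_TSS_ENTRY	12u

static unsigned tss_selector_for_cpu(unsigned cpu)
{
	return (FIRST_TSS_ENTRY + 4u * cpu) << 3;
}

/* The arithmetic done by the asm: selector >> 5, then subtract 3. */
static unsigned cpu_from_tss_selector(unsigned sel)
{
	return (sel >> 5) - 3u;
}
```

Under that layout CPU 0 has selector 96, and 96 >> 5 is 3, which is
where the "sub $3" comes from.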

Michael.

Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> > I too am confused.  More so, the difference between hard_get_current and
> > get_current is confusing.  I further question things because I suspect
> 
> hard_get_current always works
> get_current assumes %cr2 is loaded correctly
> 
> > do_page_fault, cpu_init" but all these functions call other functions
> > that may very well use get_current.  How is this going to work?
> 
> do_page_fault and cpu_init load %cr2
> 
> > Further, the preemptible kernel patch oopses with this patch (IOW, don't
> > use 2.4.13-ac8 + preempt-kernel, unless you remove all these bits like I
> > did :>).  I think it may be because of:
> 
> You must ensure that you don't pre-empt until %cr2 is loaded. Obviously this
> isn't a problem with the traditional low latency patch but if you pre-empt
> very early in page fault handling then I suspect you might get the odd
> surprise.
> 
> The reasoning behind all this is to fix the cache pessimal nature of the x86
> stack layout - we had all task structs on the same cache colour and all 
> stacks aligned within pages (so every apache thread waiting at the same
> point is on the same colour too and each wait queue entry on their stacks
> is linked to entries all the same colour)
> 
> Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 17:04   ` Linus Torvalds
@ 2001-11-06 17:46     ` Alan Cox
  2001-11-06 17:59       ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2001-11-06 17:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

> We pretty much know the _theory_ is not correct, just by virtue of
> depending on non-architected behaviour.  The only thing -ac can do is
> test whether it works in practice.  Which is a totally different thing. 

Yep

> Especially on x86 chips.

Well so far I've found one laptop that eats %cr2 on APM calls, and we have
some mystery cases. Peter's suggestion of using %fs or %gs looks more
promising at the moment


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 17:13   ` Benjamin LaHaise
@ 2001-11-06 17:49     ` Linus Torvalds
  2001-11-06 18:19       ` Alan Cox
  2001-11-06 18:42       ` Benjamin LaHaise
  0 siblings, 2 replies; 61+ messages in thread
From: Linus Torvalds @ 2001-11-06 17:49 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux-kernel


On Tue, 6 Nov 2001, Benjamin LaHaise wrote:
>
> On Tue, Nov 06, 2001 at 05:02:32PM +0000, Linus Torvalds wrote:
> > Which means that the whole approach is just depending on undocumented
> > implementation behaviour. That's asking for trouble.
>
> NetWare uses it and has for a long time.

Does anybody know if WNT uses it? Quite frankly, I don't see Intel
worrying over-much about NetWare compatibility. They've broken small OS's
before (ie older versions of SCO Xenix wouldn't boot on a Pentium MMU
because of some changes to error reporting, if I remember correctly).

That said, how expensive is loading %cr2 anyway? We can do all the same
tricks with a 16kB stack and just playing games with using the higher bits
as the "offset", ie things like

	/* Return "current" in %eax, trash %edx */
	do_get_current:
		movl $0x0003c000,%eax	// 4 bits at bit 14
		movl $-16384,%edx	// remove low 14 bits
		andl %esp,%eax
		andl %esp,%edx
		shrl $7,%eax		// color it by 128 bytes
		addl %edx,%eax
		ret

which is going to be ~5 cycles _without_ doing anything that is
undocumented (add a push/pop to not trash a register, that might be
worthwhile - it makes the function marginally slower but might make
callers happier).

Oh, and call using inline assembly, not a C call (so that gcc can take
advantage of better calling convention, and not think memory is trashed
etc). So

	static inline struct task_struct *get_current(void)
	{
		struct task_struct *tsk;
		asm("call do_get_current":"=a" (tsk)::"dx");
		return tsk;
	}

See? You don't have to play games with control registers.

(actually, entry.S seems to want the return value in %ebx, so change to
taste. Or you could have two different versions of the thing, or even
inline it for any place where that makes sense).

The above also allows you to keep fork with just one allocation, and makes
the stack larger (we steal 2kB for the coloring, but we'd use an order-2
allocation that at least SGI wants to do regardless).

The 2kB is, of course, tunable. The above is with a 128-byte cacheline and
16 colors - that may be overkill. 32-byte increments with 32 colors might
be more appropriate (I don't know what the effect of the P4 half-cacheline
thing is, I don't know if the CPU can have just a 64-byte block coherent,
or what.. But a 32-byte color is fine for _most_ CPU's).

The 32-byte by 32-color thing would just change the bitmasks to 0x0007c000
and the shift to 9 (bit 14+ shifted down to bit 5+).
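
For checking the masks and shifts, the same arithmetic can be written as
a user-space C sketch. It assumes a 16 kB stack area aligned on a 16 kB
boundary and takes the stack pointer as a plain integer. STACK_AREA and
colored_current() are names invented for the example.

```c
#include <stdint.h>

/* Hypothetical parameter: 16 kB stack area (order-2 allocation). */
#define STACK_AREA	16384u

/* Model of do_get_current: base of the 16 kB area plus a colour offset
 * taken from the address bits starting at bit 14.
 * color_bits = 4, shift = 7  -> 16 colours of 128 bytes
 * color_bits = 5, shift = 9  -> 32 colours of 32 bytes */
static uint32_t colored_current(uint32_t esp, unsigned color_bits, unsigned shift)
{
	uint32_t color_mask = ((1u << color_bits) - 1u) << 14;
	uint32_t base = esp & ~(STACK_AREA - 1u);	/* remove low 14 bits */
	uint32_t offset = (esp & color_mask) >> shift;	/* colour offset */

	return base + offset;
}
```

With 4 colour bits the mask comes out as 0x0003c000 and the largest
offset is 15 * 128 = 1920 bytes, which is the "2kB" stolen from the
stack. With 5 bits the mask is 0x0007c000 and the largest offset is
31 * 32 = 992 bytes.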

Note that there are lots of advantages to using simple regular
instructions over using "special" instructions like "move from control
register". Historically, the special instructions tend to always become
slower, while the regular instructions become faster.

I would not be surprised if "mov %cr2,%reg" will break a netburst trace
cache entity, or even cause microcode to be executed. While I _guarantee_
that all future Intel CPU's will continue to be fast at mixtures of simple
arithmetic operations like "add" and "and".

(And I bet that the likelihood of Intel speeding up shifts in the next P4
derivative is a _lot_ higher than Intel speeding up "mov %cr2,xx"..)

		Linus


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 17:46     ` Alan Cox
@ 2001-11-06 17:59       ` Linus Torvalds
  2001-11-06 18:14         ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2001-11-06 17:59 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel


On Tue, 6 Nov 2001, Alan Cox wrote:
>
> > Especially on x86 chips.
>
> Well so far I've found one laptop that eats %cr2 on APM calls, and we have
> some mystery cases.

Well, APM is going away, and it should be easy enough to work around it
(and I don't _think_ you can reasonably do the same in ACPI or SMM: SMM
will save the whole CPU state and has to do that anyway, and ACPI doesn't
actually get to touch things like %cr2).

So I'd be more nervous about future CPU's just not having the register
writable (or having only parts of it, or..)

> Peter's suggestion of using %fs or %gs looks more
> promising at the moment

The problem with using a segment register is that then you have to
save/restore it over system calls - pretty much whether the call needs it
or not. Ie you can pretty much _guarantee_ that any system call will be
slowed down by something on the order of 10-15 cycles (on a good day, some
CPU's are slower at it). Same goes for task switch etc.

Which is why I'd much rather just color using the high bits of %esp, and
spend a few more cycles inside "get_current()". I can guarantee you that
it won't slow down paths that don't even need current at all (unlike the
segment register approach), and even the paths that _do_ need current will
only be ~5 cycles slower (plus possibly the cache miss of doing the
function call, but the call-site itself will actually be slightly smaller
than the current in-lined 32-bit immediate and "andl").

Using high bits of %esp has zero impact on task-switch, and makes
"get_current" interrupt safe (ie switching tasks is totally atomic, as
it's the one single "movl ..,%esp" instruction that does the real switch
as far as the kernel is concerned).

It does require using an order-2 allocation, which the current VM will
allow anyway, but which is obviously nastier than an order-1.

		Linus


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 18:14         ` Alan Cox
  2001-11-06 16:55           ` Marcelo Tosatti
@ 2001-11-06 18:14           ` Linus Torvalds
  2001-11-06 18:31             ` Alan Cox
  2001-11-07  0:00           ` Martin Dalecki
  2 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2001-11-06 18:14 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel


On Tue, 6 Nov 2001, Alan Cox wrote:
>
> Our memory bloat is already pretty gross in 2.4 without adding 16K task
> stacks to the oversized struct page, bootmem and excess doubly linked lists.

There are some people who think that the 5kB stack we have now is too
small ;(

> I also need to try sticking a pointer to the task struct at the top of the
> stack and loading that - since that should be a cache line that isn't being
> shared around or swapped between processors

That should work fairly well, and has the advantage that you can hide more
state there if you want (ie it allows us, on demand, to move hot state of
"struct task_struct" up there).

There is a subset of "struct task_struct" that is basically completely
local to the task, and could be advantageous to move around. Things like

 - need_resched/sigpending/process attributes
 - ptrace
 - processor
 - addr_limit

are all things that we don't actually _need_ to go all the way to the task
structure to fetch, and that we mostly need to modify anyway on task
switch (ie "need_resched" and "processor" both need to be written on
task-switch anyway, and are not touched by any other CPU)

So it would basically be a small per-CPU/thread area, not just the "struct
task_struct".
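
One way to picture such an area is the sketch below. It is purely
hypothetical: the struct name, the field types and on_task_switch() are
guesses made for illustration, and only the field names come from the
list above.

```c
/* Hypothetical layout; field names follow the list above, types are guesses. */
struct task_struct;	/* the full, shared task structure */

struct thread_area {
	struct task_struct *task;	/* back pointer to the full struct */
	volatile long need_resched;
	int sigpending;
	unsigned long flags;		/* process attributes */
	unsigned long ptrace;
	int processor;
	unsigned long addr_limit;
};

/* The two fields that get written on every task switch anyway. */
static void on_task_switch(struct thread_area *next, int cpu)
{
	next->need_resched = 0;
	next->processor = cpu;
}
```

Keeping these fields next to the stack would mean the hot paths touch a
cache line that only the owning CPU writes.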

		Linus


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 17:59       ` Linus Torvalds
@ 2001-11-06 18:14         ` Alan Cox
  2001-11-06 16:55           ` Marcelo Tosatti
                             ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Alan Cox @ 2001-11-06 18:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alan Cox, linux-kernel

> "get_current" interrupt safe (ie switching tasks is totally atomic, as
> it's the one single "movl ..,%esp" instruction that does the real switch
> as far as the kernel is concerned).
> 
> It does require using an order-2 allocation, which the current VM will
> allow anyway, but which is obviously nastier than an order-1.

I've seen boxes dead in the water from 8K NFS (ie 16K order-2 allocations),
let alone the huge memory hit. Michael's rtlinux approach looks even more
interesting and I may have to play with that (using the TSS to ident the
cpu)

Our memory bloat is already pretty gross in 2.4 without adding 16K task
stacks to the oversized struct page, bootmem and excess doubly linked lists.

I also need to try sticking a pointer to the task struct at the top of the
stack and loading that - since that should be a cache line that isn't being
shared around or swapped between processors

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 17:49     ` Linus Torvalds
@ 2001-11-06 18:19       ` Alan Cox
  2001-11-09 21:52         ` Jamie Lokier
  2001-11-06 18:42       ` Benjamin LaHaise
  1 sibling, 1 reply; 61+ messages in thread
From: Alan Cox @ 2001-11-06 18:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Benjamin LaHaise, linux-kernel

> That said, how expensive is loading %cr2 anyway? We can do all the same
> tricks with a 16kB stack and just playing games with using the higher bits
> as the "offset", ie things like

So that's another 600K on my box vanished. I suspect the page faults will
outweigh it

> the stack larger (we steal 2kB for the coloring, but we'd use an order-2
> allocation that at least SGI wants to do regardless).

16K stack is serious "people who can't program" country.

> I would not be surprised if "mov %cr2,%reg" will break a netburst trace
> cache entity, or even cause microcode to be executed. While I _guarantee_
> that all future Intel CPU's will continue to be fast at mixtures of simple
> arithmetic operations like "add" and "and".

True enough, but then we can go to

	andl %%esp, %0
	movl (%%eax), %%eax

which doesn't really change the cost much, lets us colour the task structs
nicely, and lets us colour the stack somewhat by offsetting esp from the base
- and all in standard instructions
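
Read as C, the two instructions amount to masking the stack pointer and
dereferencing the result. A sketch, assuming an 8 kB stack area aligned
on an 8 kB boundary with the task pointer stored in its first word (the
lowest address). STACK_AREA, the placeholder struct and
current_via_stack_base() are names made up for the example.

```c
#include <stdint.h>

#define STACK_AREA 8192u	/* assumed: two 4 kB pages, 8 kB aligned */

struct task_struct { int pid; };	/* placeholder for the real struct */

/* andl: mask the stack pointer down to the base of the stack area.
 * movl: load the task pointer stored in the first word there. */
static struct task_struct *current_via_stack_base(uintptr_t sp)
{
	uintptr_t base = sp & ~(uintptr_t)(STACK_AREA - 1u);

	return *(struct task_struct **)base;
}
```

Because the task struct is reached through a pointer, it no longer has
to sit at a fixed offset inside the stack pages and can be allocated
(and coloured) separately.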

Alan





^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 18:14           ` Linus Torvalds
@ 2001-11-06 18:31             ` Alan Cox
  2001-11-06 22:38               ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2001-11-06 18:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alan Cox, linux-kernel

> > Our memory bloat is already pretty gross in 2.4 without adding 16K task
> > stacks to the oversized struct page, bootmem and excess doubly linked lists.
> 
> There are some people who think that the 5kB stack we have now is too
> small ;(

Yes but we don't want to let them win or next year 16K will be too small and
then they'll want 16K C++ stack objects. At the very least we should
make them have to use

	really_slow_vmalloc_and_switch_to_big_temporary_stack()
	really_slow_vfree_and_return_to_old_stack()

_and_ make them type function names that long.

Granted it's less of an issue in 2.5 because we can afford to finally make
DMA off the stack a crime (right now it's an offence but one that is violated
in too many places to be sure of killing them all off) - scsi for one does
it. 

> That should work fairly well, and has the advantage that you can hide more
> state there if you want (ie it allows us, on demand, to move hot state of
> "struct task_struct" up there).

Sweet. Now that I'd completely missed. Task private state and task
public state splitting

> So it would basically be a small per-CPU/thread area, not just the "struct
> task_struct".

Yep

Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 17:49     ` Linus Torvalds
  2001-11-06 18:19       ` Alan Cox
@ 2001-11-06 18:42       ` Benjamin LaHaise
  2001-11-06 19:09         ` H. Peter Anvin
  2001-11-06 19:16         ` Dave Jones
  1 sibling, 2 replies; 61+ messages in thread
From: Benjamin LaHaise @ 2001-11-06 18:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Tue, Nov 06, 2001 at 09:49:15AM -0800, Linus Torvalds wrote:
> That said, how expensive is loading %cr2 anyway? We can do all the same
> tricks with a 16kB stack and just playing games with using the higher bits
> as the "offset", ie things like

Here are some numbers:

read cr2 best: 11  av: 11.12
write cr2 cr2 best: 61  av: 64.42
read cr2 best: 11  av: 11.12
write cr2 cr2 best: 61  av: 65.01
read stk best: 10  av: 11.03
write cr2 stk best: 61  av: 64.95
read stk best: 10  av: 11.03
write cr2 stk best: 61  av: 65.23

Which come from insmod of the below two modules.  I didn't test writing to 
the stack register, but I expect it's similarly expensive as it affects the 
call return stack and other behind the scenes dependencies.  Suffice it to 
say that reading %cr2 is essentially free on my box (athlon mp).  Maybe 
we should use it as a pointer into a per-cpu area to avoid writing it?

		-ben

----teststk_k.c----
#define USE_STK 1
#include "testcr2_k.c"
----testcr2_k.c----
#include <linux/module.h>
#include <linux/kernel.h>
#include <asm/errno.h>
#include <linux/init.h>

static inline long long rdtsc(void)
{
        unsigned int low,high;
        __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high));
        return low + (((long long)high)<<32);
}

long dummy;

long doit(void)
{
	long long start, end;
	long val;

	start = rdtsc();
#ifdef USE_STK
#define WHICH	"stk"
	__asm__ __volatile__(
                "movl $0x0003c000,%%eax  \n" // 4 bits at bit 14
                "movl $-16384,%%edx      \n" // remove low 14 bits
                "andl %%esp,%%eax		\n"
                "andl %%esp,%%edx		\n"
                "shrl $7,%%eax           \n" // color it by 128 bytes
                "addl %%edx,%%eax		\n"
		: "=a" (val) :: "edx");
#else
#define WHICH "cr2"
        __asm__ __volatile__("movl %%cr2,%0" : "=r" (val));
#endif
	val += 100;
	dummy = val;
	end = rdtsc();

	return end - start;
}

long doit2(void)
{
	long long start, end;
	long val;

	start = rdtsc();
	val = dummy;
        __asm__ __volatile__("movl %0,%%cr2" : : "r" (val));	/* val is an input operand */
	end = rdtsc();

	return end - start;
}

int test_init (void)
{
	long min = 1000000000, av = 0;
	int i;
	for (i=0; i<100; i++) {
		long dur = doit();
		if (dur < min)
			min = dur;
		av += dur;
	}
	printk("read " WHICH " best: %ld  av: %ld.%02ld\n", min, av / 100, av % 100);

	min = 10000000;
	av = 0;
	for (i=0; i<100; i++) {
		long dur = doit2();
		if (dur < min)
			min = dur;
		av += dur;
	}
	printk("write cr2 " WHICH " best: %ld  av: %ld.%02ld\n", min, av / 100, av % 100);
	return -ENODEV;
}

void test_exit(void)
{
	return;
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
---snip---

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 18:42       ` Benjamin LaHaise
@ 2001-11-06 19:09         ` H. Peter Anvin
  2001-11-06 19:16         ` Dave Jones
  1 sibling, 0 replies; 61+ messages in thread
From: H. Peter Anvin @ 2001-11-06 19:09 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20011106134234.A27718@redhat.com>
By author:    Benjamin LaHaise <bcrl@redhat.com>
In newsgroup: linux.dev.kernel
>
> On Tue, Nov 06, 2001 at 09:49:15AM -0800, Linus Torvalds wrote:
> > That said, how expensive is loading %cr2 anyway? We can do all the same
> > tricks with a 16kB stack and just playing games with using the higher bits
> > as the "offset", ie things like
> 
> Here are some numbers:
> 
> read cr2 best: 11  av: 11.12
> write cr2 cr2 best: 61  av: 64.42
> read cr2 best: 11  av: 11.12
> write cr2 cr2 best: 61  av: 65.01
> read stk best: 10  av: 11.03
> write cr2 stk best: 61  av: 64.95
> read stk best: 10  av: 11.03
> write cr2 stk best: 61  av: 65.23
> 
> Which come from insmod of the below two modules.  I didn't test writing to 
> the stack register, but I expect it's similarly expensive as it affects the 
> call return stack and other behind the scenes dependencies.  Suffice it to 
> say that reading %cr2 is essentially free on my box (athlon mp).  Maybe 
> we should use it as a pointer into a per-cpu area to avoid writing it?
> 

You still have to write it every time you take a page fault.  You're
adding 60-odd cycles to the page fault path at least.

Not to mention any system which does microcoded reads of %cr2, which
apparently the Athlon XP doesn't.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 18:42       ` Benjamin LaHaise
  2001-11-06 19:09         ` H. Peter Anvin
@ 2001-11-06 19:16         ` Dave Jones
  2001-11-06 20:10           ` Ricky Beam
  2001-11-06 23:09           ` Alan Cox
  1 sibling, 2 replies; 61+ messages in thread
From: Dave Jones @ 2001-11-06 19:16 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Linus Torvalds, linux-kernel

On Tue, 6 Nov 2001, Benjamin LaHaise wrote:

> Here are some numbers:
> Which come from insmod of the below two modules.  I didn't test writing to
> the stack register, but I expect it's similarly expensive as it affects the
> call return stack and other behind the scenes dependencies.  Suffice it to
> say that reading %cr2 is essentially free on my box (athlon mp).  Maybe
> we should use it as a pointer into a per-cpu area to avoid writing it?

If this is done, it should perhaps be done only on certain x86s,
as some show the results go the other way. For example, the Cyrix III..

read stk best: 42  av: 42.60
read cr2 best: 61  av: 61.28

regards,

Dave.

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 19:16         ` Dave Jones
@ 2001-11-06 20:10           ` Ricky Beam
  2001-11-06 23:09           ` Alan Cox
  1 sibling, 0 replies; 61+ messages in thread
From: Ricky Beam @ 2001-11-06 20:10 UTC (permalink / raw)
  To: Dave Jones; +Cc: Benjamin LaHaise, Linux Kernel Mail List

On Tue, 6 Nov 2001, Dave Jones wrote:
>If this is done, it should perhaps be done on only on certain x86s,
>as some show the results go the other way. For example, the Cyrix III..

And for some (P150) it makes no difference...

read cr2 best: 25  av: 27.09
write cr2 cr2 best: 32  av: 34.39

read stk best: 26  av: 28.22
write cr2 stk best: 32  av: 33.04

--Ricky



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 18:31             ` Alan Cox
@ 2001-11-06 22:38               ` Linus Torvalds
  0 siblings, 0 replies; 61+ messages in thread
From: Linus Torvalds @ 2001-11-06 22:38 UTC (permalink / raw)
  To: linux-kernel

In article <E161B0f-0001Io-00@the-village.bc.nu>,
Alan Cox  <alan@lxorguk.ukuu.org.uk> wrote:
>
>> That should work fairly well, and has the advantage that you can hide more
>> state there if you want (ie it allows us, on demand, to move hot state of
>> "struct task_struct" up there).
>
>Sweet. Now that I'd completely missed. Task private state and task
>public state splitting

Yes. It would be a waste to have to bring in a cache-line into the L1
cache, and then only use 4 bytes of it. So it should make sense to set
this up somewhat like:

	struct local_task_struct {
		struct task_struct *tsk;
		.. other fields ..
	};

and then use the _exact_ existing infrastructure to get
"local_task_struct" instead of "task_struct", and let the compiler do
all the rest at a higher level. So we'd just rename "get_current()" to
"get_local_current()", and then do

	#define get_current()	(get_local_current()->tsk)

and people who want to know about the local task struct can use that.

		Linus
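
The layering described above can be tried out in user space. What follows is only a sketch under stated assumptions, not kernel code: the stack block is assumed to be 8 kB and aligned to its own size, the stack pointer is passed in as a plain integer instead of being read from %esp, and get_local_current_from(), get_current_from(), hot_state and demo_lookup_pid() are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define STACK_SIZE 8192u	/* assumption: 8 kB stack block aligned to its own size */

struct task_struct { int pid; };	/* stand-in, not the real kernel structure */

struct local_task_struct {
	struct task_struct *tsk;
	int hot_state;			/* hypothetical example of ".. other fields .." */
};

/* Mask any address inside the stack block down to the block's base. */
static struct local_task_struct *get_local_current_from(uintptr_t sp)
{
	return (struct local_task_struct *)(sp & ~(uintptr_t)(STACK_SIZE - 1));
}

/* Same layering as in the mail: the task pointer is one load off the local struct. */
#define get_current_from(sp)	(get_local_current_from(sp)->tsk)

/* Build one fake stack block and look the task up from a pointer inside it. */
static int demo_lookup_pid(void)
{
	static struct task_struct task = { 42 };
	unsigned char *block = aligned_alloc(STACK_SIZE, STACK_SIZE);
	struct local_task_struct *local;
	int pid;

	assert(block != NULL);
	local = (struct local_task_struct *)block;
	local->tsk = &task;
	local->hot_state = 0;

	/* Pretend %esp currently points somewhere in the middle of the block. */
	pid = get_current_from((uintptr_t)(block + 5000))->pid;
	free(block);
	return pid;
}
```

The point of the indirection is that the hot fields sit in the same cache line as the task pointer, so fetching "current" can bring them in with the same cache-line fill.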

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 19:16         ` Dave Jones
  2001-11-06 20:10           ` Ricky Beam
@ 2001-11-06 23:09           ` Alan Cox
  2001-11-06 23:15             ` Dave Jones
  1 sibling, 1 reply; 61+ messages in thread
From: Alan Cox @ 2001-11-06 23:09 UTC (permalink / raw)
  To: Dave Jones; +Cc: Benjamin LaHaise, Linus Torvalds, linux-kernel

> If this is done, it should perhaps be done on only on certain x86s,
> as some show the results go the other way. For example, the Cyrix III..
> 
> read stk best: 42  av: 42.60
> read cr2 best: 61  av: 61.28

Do we have many SMP Cyrix III's ?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 23:09           ` Alan Cox
@ 2001-11-06 23:15             ` Dave Jones
  0 siblings, 0 replies; 61+ messages in thread
From: Dave Jones @ 2001-11-06 23:15 UTC (permalink / raw)
  To: Alan Cox; +Cc: Benjamin LaHaise, Linus Torvalds, linux-kernel

On Tue, 6 Nov 2001, Alan Cox wrote:

> > If this is done, it should perhaps be done on only on certain x86s,
> > as some show the results go the other way. For example, the Cyrix III..
> Do we have many SMP Cyrix III's ?

I wish :)  Today no, tomorrow only VIA knows.
I just used that as an example that it may not be a win everywhere.
A better example perhaps was the P5 case Ricky posted; those, as you
know, are seen in the real world in SMP.

regards,

Dave.

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07  0:00           ` Martin Dalecki
@ 2001-11-06 23:19             ` Alan Cox
  2001-11-07  0:43               ` Martin Dalecki
  2001-11-07 14:00               ` Martin Dalecki
  0 siblings, 2 replies; 61+ messages in thread
From: Alan Cox @ 2001-11-06 23:19 UTC (permalink / raw)
  To: dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel

> If we are talking about memory bloat, let's ask a question. Is somebody
> out there working seriously on changing the default function call
> conventions on IA32

That's pure noise

On a 256MB machine you have 65536 page map entries. Those are 64 bytes each, but
it's not hard to get them down to 56 bytes (0.5MB saved) and probably to 48
bytes. We can probably also shave 8 bytes off each cached inode if not
more (the nfs changes in -ac are a big help there already) - that's typically
another 200K on a reasonable size box - and the new bootmem code can save a
chunk too

I'm not sure how much the change in function call conventions would save,
but I doubt it's as big for as little effort

Alan
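
The page map arithmetic above is easy to check. A minimal sketch, assuming 4 kB pages (which is what 65536 entries for 256MB implies); the function names are made up for the illustration:

```c
#include <assert.h>

#define PAGE_SIZE_BYTES 4096ul	/* assumption: 4 kB pages, as on IA32 */

/* Number of page map entries needed to describe mem_bytes of RAM. */
static unsigned long page_map_entries(unsigned long mem_bytes)
{
	return mem_bytes / PAGE_SIZE_BYTES;
}

/* Bytes saved when each entry shrinks from old_size to new_size bytes. */
static unsigned long page_map_saving(unsigned long mem_bytes,
				     unsigned long old_size,
				     unsigned long new_size)
{
	assert(new_size <= old_size);
	return page_map_entries(mem_bytes) * (old_size - new_size);
}
```

For 256MB this gives 65536 entries, 512K saved when going from 64 to 56 bytes per entry, and 1MB saved when going to 48 bytes.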

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 18:14         ` Alan Cox
  2001-11-06 16:55           ` Marcelo Tosatti
  2001-11-06 18:14           ` Linus Torvalds
@ 2001-11-07  0:00           ` Martin Dalecki
  2001-11-06 23:19             ` Alan Cox
  2 siblings, 1 reply; 61+ messages in thread
From: Martin Dalecki @ 2001-11-07  0:00 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, linux-kernel

Alan Cox wrote:
> 
> > "get_current" interrupt safe (ie switching tasks is totally atomic, as
> > it's the one single "movl ..,%esp" instruction that does the real switch
> > as far as the kernel is concerned).
> >
> > It does require using an order-2 allocation, which the current VM will
> > allow anyway, but which is obviously nastier than an order-1.
> 
> I've seen boxes dead in the water from 8K NFS (ie 16K order-2 allocations),
> let alone the huge memory hit. Michael's rtlinux approach looks even more
> interesting and I may have to play with that (using the TSS to ident the
> cpu)
> 
> Our memory bloat is already pretty gross in 2.4 without adding 16K task
> stacks to the oversized struct page, bootmem and excess doubly linked lists.

If we are talking about memory bloat, let's ask a question. Is somebody
out there working seriously on changing the default function call
convention on IA32 from pushing parameters on the stack to passing them
in registers throughout the kernel? The implications, especially for
I-cache pressure, seem to be quite significant, and apparently this is
precisely one of the areas where GCC has got much better. The recent
comparisons of gcc against the Intel compiler show as well that this
may be really worth it.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07  0:43               ` Martin Dalecki
@ 2001-11-07  0:27                 ` Alan Cox
  2001-11-07  0:35                 ` Jeff Garzik
  1 sibling, 0 replies; 61+ messages in thread
From: Alan Cox @ 2001-11-07  0:27 UTC (permalink / raw)
  To: dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel

> Please count the removal of the *very* sparse read_ahead array as
> well (patch went to this list a long time ago) in.
> It doesn't cost anything and saves some few pages depending on the
> number of drivers you have loaded... (Well in comparision to the above
> that's nit picking, but...) 

Sounds quite believable. Several of the hashes are oversize too it seems

> And then there is the overloaded struct inode. It would be worth
> quite a bit of memory to not overlay the private, filesystem
> specific parts but to attach them by a pointer instead, in esp.

That's what -ac has started doing. Al Viro has done the worst case ones
so far.

Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07  0:43               ` Martin Dalecki
  2001-11-07  0:27                 ` Alan Cox
@ 2001-11-07  0:35                 ` Jeff Garzik
  1 sibling, 0 replies; 61+ messages in thread
From: Jeff Garzik @ 2001-11-07  0:35 UTC (permalink / raw)
  To: dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel

Martin Dalecki wrote:
> And then there is the overloaded struct inode. It would be worth
> quite a bit of memory to not overlay the private, filesystem
> specific parts but to attach them by a pointer instead, especially
> if you make this in a way where the private part would be used
> without the public interface in drivers.

I think there are plans for several filesystems to use the generic_ip
and generic_sbp members of the unions, instead of further adding to the
unions.  

FreeVxFS is an example of one such filesystem which already does this.

-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 23:19             ` Alan Cox
@ 2001-11-07  0:43               ` Martin Dalecki
  2001-11-07  0:27                 ` Alan Cox
  2001-11-07  0:35                 ` Jeff Garzik
  2001-11-07 14:00               ` Martin Dalecki
  1 sibling, 2 replies; 61+ messages in thread
From: Martin Dalecki @ 2001-11-07  0:43 UTC (permalink / raw)
  To: Alan Cox; +Cc: dalecki, Linus Torvalds, linux-kernel

Alan Cox wrote:
> 
> > If we are talking about memory bloat, let's ask a question. Is somebody
> > out there working seriously on changing the default function call
> > conventions on IA32
> 
> Thats pure noise
> 
> On a 256Mb machine you have 65536 page map entries. Those are 64 bytes but
> its not hard to get it down to 56 bytes (.5Mb saved) and probably to 48
> bytes. We can probably also shave 8 bytes off each cached inode if not
> more (the nfs changes in -ac are a big help there already) - thats typically
> another 200K on a reasonable size box - and the new bootmem code can save a
> chunk too
> 
> Im not sure how much the code change for function call patterns would be
> but I doubt its so big for such little effort

Please count in the removal of the *very* sparse read_ahead array as
well (the patch went to this list a long time ago).
It doesn't cost anything and saves a few pages depending on the
number of drivers you have loaded... (Well, in comparison to the above
that's nit-picking, but...)

And then there is the overloaded struct inode. It would be worth
quite a bit of memory to not overlay the private, filesystem
specific parts but to attach them by a pointer instead, especially
if you do this in a way where the private part would be used
without the public interface in drivers. Currently the most ridiculous
inode layout determines the overall size in the compiled
kernel.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07 14:00               ` Martin Dalecki
@ 2001-11-07 13:38                 ` Alan Cox
  2001-11-07 14:59                   ` Martin Dalecki
                                     ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Alan Cox @ 2001-11-07 13:38 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel

> With the following options enabled we get:
> -freg-struct-return -mrtd -mregparm=3
> 
>    text    data     bss     dec     hex filename
> 1302372  260804  288080 1851256  1c3f78 vmlinux
> 
> Quite significant difference if you ask me!!!

30K is nice to have but still a scratch on the surface compared with 500K 8)

> in a saving of about 2.3% in code size. This may not sound grat in
> relative
> numbers, but for a compiler designer this would already sound hilarious
> and in
> absolute numbers it's: 29760 bytes. Not withstanding the speed
> improvement...

The obvious question is - have you tried running the kernel built like that
with any asm fixups needed ?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 23:19             ` Alan Cox
  2001-11-07  0:43               ` Martin Dalecki
@ 2001-11-07 14:00               ` Martin Dalecki
  2001-11-07 13:38                 ` Alan Cox
  1 sibling, 1 reply; 61+ messages in thread
From: Martin Dalecki @ 2001-11-07 14:00 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, linux-kernel

Alan Cox wrote:

> Im not sure how much the code change for function call patterns would be
> but I doubt its so big for such little effort

Let numbers talk to us, or allow me to quote the gloriously politically
incorrect Dave: "Numbers talk - billshit walks!":

Without register passing, we have the following size situation:

   text    data     bss     dec     hex filename
1332132  260804  288080 1881016  1cb3b8 vmlinux

With the following options enabled we get:
-freg-struct-return -mrtd -mregparm=3

   text    data     bss     dec     hex filename
1302372  260804  288080 1851256  1c3f78 vmlinux

Quite significant difference if you ask me!!!

With the following options enabled we get:
-mrtd -mregparm=3

   text    data     bss     dec     hex filename
1302404  260804  288080 1851288  1c3f98 vmlinux

Here it's just a few bytes here and there, not really
significant, because the kernel apparently doesn't
use structs as return values frequently.

With the following options enabled we get:
-mregparm=3

   text    data     bss     dec     hex filename
1303476  260804  288080 1852360  1c43c8 vmlinux

So apparently the -mrtd option is quite significant as well.

With the following options enabled we get:
-mregparm=2

   text    data     bss     dec     hex filename
1307876  260804  288080 1856760  1c54f8 vmlinux

As expected the influence here isn't too significant.

So the conclusion is that apparently the change in calling convention can
result in a saving of about 2.3% in code size. This may not sound great in
relative numbers, but for a compiler designer this would already sound
hilarious, and in absolute numbers it's 29760 bytes. Not to mention the
speed improvement...

Oh, for completeness' sake, the compiler used was:
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-99)
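
The differences between the builds can be recomputed from the text column of the size output above. A small sketch; the enum and function names are made up for the illustration, and only the byte counts come from the tables:

```c
#include <assert.h>

/* .text sizes in bytes, copied from the size output above. */
enum {
	TEXT_BASELINE     = 1332132,	/* no register passing */
	TEXT_REGPARM2     = 1307876,	/* -mregparm=2 */
	TEXT_REGPARM3     = 1303476,	/* -mregparm=3 */
	TEXT_REGPARM3_RTD = 1302404,	/* -mrtd -mregparm=3 */
	TEXT_ALL_THREE    = 1302372	/* -freg-struct-return -mrtd -mregparm=3 */
};

/* Bytes of text saved going from one build to another. */
static long text_saved(long before, long after)
{
	return before - after;
}

/* Saving in hundredths of a percent of the "before" size, truncated. */
static long saved_basis_points(long before, long after)
{
	assert(before > 0);
	return text_saved(before, after) * 10000 / before;
}
```

With these figures the full set of options saves 29760 bytes, which is roughly 2.2% of the baseline text size (about 2.3% if measured against the smaller image). Of that, -mrtd accounts for 1072 bytes and -freg-struct-return for 32 bytes, and going from -mregparm=2 to -mregparm=3 is worth 4400 bytes.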

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07 14:59                   ` Martin Dalecki
@ 2001-11-07 14:17                     ` Alan Cox
  2001-11-07 14:34                       ` Dirk Moerenhout
                                         ` (4 more replies)
  0 siblings, 5 replies; 61+ messages in thread
From: Alan Cox @ 2001-11-07 14:17 UTC (permalink / raw)
  To: dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel

> somehow encouraged by the compiler comparisions between gcc and intel's
> free compiler, which use the register passing for anything local
> to the actual code, where the speed gains are up to 20% im currently

I was under the impression Intel's compiler was profoundly non-free ?

> quite inclined to do the redo and finish the experiment.
> BTW.> It's not just asm fixpus that have to be done for this
> to work. For example all the c files with -fno-omit-frame-pointer

20% is a nice large number

Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07 14:17                     ` Alan Cox
@ 2001-11-07 14:34                       ` Dirk Moerenhout
  2001-11-07 14:54                         ` Alan Cox
  2001-11-07 14:39                       ` Intel compiler [Re: Using %cr2 to reference "current"] Sebastian Heidl
                                         ` (3 subsequent siblings)
  4 siblings, 1 reply; 61+ messages in thread
From: Dirk Moerenhout @ 2001-11-07 14:34 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

> > somehow encouraged by the compiler comparisions between gcc and intel's
> > free compiler, which use the register passing for anything local
> > to the actual code, where the speed gains are up to 20% im currently
>
> I was under the impression intels compiler was profoundly non-free ?

Thought that too until a minute ago. Went to the Intel site and read the
information.

http://developer.intel.com/software/products/eval/

Gives details about _two_ ways to get it free. The known 30 day free trial
with support but also a less known "non commercial unsupported" option. So
for non-commercial use you can use it as much as you want, you just don't
get support.

Downloading it now to play some with it :-)

Dirk Moerenhout ///// System Administrator ///// Planet Internet NV


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Intel compiler [Re: Using %cr2 to reference "current"]
  2001-11-07 14:17                     ` Alan Cox
  2001-11-07 14:34                       ` Dirk Moerenhout
@ 2001-11-07 14:39                       ` Sebastian Heidl
  2001-11-07 22:05                         ` lists
  2001-11-07 15:36                       ` Using %cr2 to reference "current" Martin Dalecki
                                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 61+ messages in thread
From: Sebastian Heidl @ 2001-11-07 14:39 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Wed, Nov 07, 2001 at 02:17:33PM +0000, Alan Cox wrote:
> > somehow encouraged by the compiler comparisions between gcc and intel's
> > free compiler, which use the register passing for anything local
> > to the actual code, where the speed gains are up to 20% im currently
> 
> I was under the impression intels compiler was profoundly non-free ?

have a look:
http://developer.intel.com/software/products/eval/


_sh_


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07 14:34                       ` Dirk Moerenhout
@ 2001-11-07 14:54                         ` Alan Cox
  2001-11-07 15:32                           ` David Howells
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2001-11-07 14:54 UTC (permalink / raw)
  To: Dirk Moerenhout; +Cc: Alan Cox, linux-kernel

> Thought that too untill a minute ago. Went to the Intel site and read the
> information.
> 
> http://developer.intel.com/software/products/eval/

> Gives details about _two_ ways to get it free. The known 30 day free trial

Seems to be non free to me

May well be non-fee non-free but it's still most definitely non-free

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07 13:38                 ` Alan Cox
@ 2001-11-07 14:59                   ` Martin Dalecki
  2001-11-07 14:17                     ` Alan Cox
  2001-11-07 20:04                   ` Using %cr2 to reference "current" Andrew Morton
  2001-11-11 13:16                   ` Martin Dalecki
  2 siblings, 1 reply; 61+ messages in thread
From: Martin Dalecki @ 2001-11-07 14:59 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, linux-kernel

Alan Cox wrote:
> 
> > With the following options enabled we get:
> > -freg-struct-return -mrtd -mregparm=3
> >
> >    text    data     bss     dec     hex filename
> > 1302372  260804  288080 1851256  1c3f78 vmlinux
> >
> > Quite significant difference if you ask me!!!
> 
> 30K is nice have but still a scratch on the surface compared with 500K 8)
> 
> > in a saving of about 2.3% in code size. This may not sound grat in
> > relative
> > numbers, but for a compiler designer this would already sound hilarious
> > and in
> > absolute numbers it's: 29760 bytes. Not withstanding the speed
> > improvement...
> 
> The obvious question is - have you tried running the kernel built like that
> with any asm fixups needed ?

A long time ago I already tried to do the fixups myself, and got
to the stage of init starting... It wasn't THAT difficult. However,
somewhat encouraged by the compiler comparisons between gcc and Intel's
free compiler, which uses register passing for anything local
to the actual code, and where the speed gains are up to 20%, I'm currently
quite inclined to redo it and finish the experiment.
BTW, it's not just asm fixups that have to be done for this
to work. For example, all the C files with -fno-omit-frame-pointer
as an additional compilation flag have to be looked at seriously
again. And of course UML makes debugging at least this part easier.

-- 
- phone: +49 214 8656 283
- job:   eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort:
ru_RU.KOI8-R

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07 14:54                         ` Alan Cox
@ 2001-11-07 15:32                           ` David Howells
  0 siblings, 0 replies; 61+ messages in thread
From: David Howells @ 2001-11-07 15:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Torvalds, bcrl, linux-kernel


Instead of using %cr2, how about giving each CPU its own GDT (the GDT doesn't
need to contain many entries)? Have one segment number point to a CPU-specific
data area that contains things like the current task pointer for that CPU, the
CPU number, etc. This same segment number will be used on all CPUs, but
will be multiplexed via the per-CPU GDTs instead.

Then you can load up a segment register with this segment on entry to the
kernel, and then make CPU data accesses relative to that.

David

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07 14:17                     ` Alan Cox
  2001-11-07 14:34                       ` Dirk Moerenhout
  2001-11-07 14:39                       ` Intel compiler [Re: Using %cr2 to reference "current"] Sebastian Heidl
@ 2001-11-07 15:36                       ` Martin Dalecki
  2001-11-08 14:08                       ` Martin Dalecki
  2001-11-13 16:49                       ` Merge BUG in 2.4.15-pre4 serial.c Martin Dalecki
  4 siblings, 0 replies; 61+ messages in thread
From: Martin Dalecki @ 2001-11-07 15:36 UTC (permalink / raw)
  To: Alan Cox; +Cc: dalecki, Linus Torvalds, linux-kernel

Alan Cox wrote:
> 
> > somehow encouraged by the compiler comparisions between gcc and intel's
> > free compiler, which use the register passing for anything local
> > to the actual code, where the speed gains are up to 20% im currently
> 
> I was under the impression intels compiler was profoundly non-free ?

Well, it's free in terms of money, read: download and "personal usage",
blah blah. This doesn't deter me from having a look at it ;-).
> 
> > quite inclined to do the redo and finish the experiment.
> > BTW.> It's not just asm fixpus that have to be done for this
> > to work. For example all the c files with -fno-omit-frame-pointer
> 
> 20% is a nice large number.

Yes, I was impressed as well, and twiddling with compiler flags
actually indicates that the calling convention stuff is one
of the main contributors to this. BTW, the speed differences
can go up to 40% for floating point. OK, this is irrelevant for
the kernel, but it shows very well that there is still
plenty of room for improvement.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07 13:38                 ` Alan Cox
  2001-11-07 14:59                   ` Martin Dalecki
@ 2001-11-07 20:04                   ` Andrew Morton
  2001-11-11 13:16                   ` Martin Dalecki
  2 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2001-11-07 20:04 UTC (permalink / raw)
  To: Alan Cox; +Cc: Martin Dalecki, linux-kernel

Alan Cox wrote:
> 
> > With the following options enabled we get:
> > -freg-struct-return -mrtd -mregparm=3
> >
> >    text    data     bss     dec     hex filename
> > 1302372  260804  288080 1851256  1c3f78 vmlinux
> >
> > Quite significant difference if you ask me!!!
> 
> 30K is nice have but still a scratch on the surface compared with 500K 8)
> 

It's a lot of L1 though.

If this sort of change breaks the ability to build with
conventional argument passing and no-omit-frame-pointer then
the happy kgdb users of this world will be most aggrieved.

-

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Intel compiler [Re: Using %cr2 to reference "current"]
  2001-11-07 14:39                       ` Intel compiler [Re: Using %cr2 to reference "current"] Sebastian Heidl
@ 2001-11-07 22:05                         ` lists
  0 siblings, 0 replies; 61+ messages in thread
From: lists @ 2001-11-07 22:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: Sebastian Heidl


 Just as another data point - a simple test, I ran intel 
 compiler on flops v2.

 Run 3 ways - gcc3, icc (v 5) and the beta 6 icc. All run
 on dual p4 with 1 Gb mem on Rh 7.2

 At least on this test the differences are quite dramatic.

  Regards,

  gene/

---------------------------------------------------------------------
Summary
------

   gcc -DUNIX -O3 -march=i686 flops2.c
   icc -xMKW -o flops2 -DUNIX -O3 flops2.c

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module        MFLOPS
                    gcc      icc 5       icc 6
               --------  ---------  ----------
     1         444.9410   439.4850    674.3180
     2         265.4815   362.3862    362.3862
     3         298.1843   604.0250   1270.6569
     4         337.7309  1224.8804   1373.8819
     5         392.7003  1138.6503   1131.7073
     6         391.7678  1334.0521   1422.2222
     7         163.5783   193.3900    193.5118
     8         395.7743  1317.3242   1372.6542

   Iterations      =  512000000  512000000  512000000
   NullTime (usec) =     0.0029     0.0000     0.0000
   MFLOPS(1)       =   275.3542   416.9120   472.8952
   MFLOPS(2)       =   264.7165   413.4297   448.2175
   MFLOPS(3)       =   339.5966   714.7146   834.5651
   MFLOPS(4)       =   362.1891  1071.8196  1367.5374

---------------------------------------------------------------------

On Wed, Nov 07, 2001 at 03:39:46PM +0100, Sebastian Heidl wrote:
> On Wed, Nov 07, 2001 at 02:17:33PM +0000, Alan Cox wrote:
> > > somehow encouraged by the compiler comparisions between gcc and intel's
> > > free compiler, which use the register passing for anything local
> > > to the actual code, where the speed gains are up to 20% im currently
> > 
> > I was under the impression intels compiler was profoundly non-free ?
> 
> have a look:
> http://developer.intel.com/software/products/eval/

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07 14:17                     ` Alan Cox
                                         ` (2 preceding siblings ...)
  2001-11-07 15:36                       ` Using %cr2 to reference "current" Martin Dalecki
@ 2001-11-08 14:08                       ` Martin Dalecki
  2001-11-13 16:49                       ` Merge BUG in 2.4.15-pre4 serial.c Martin Dalecki
  4 siblings, 0 replies; 61+ messages in thread
From: Martin Dalecki @ 2001-11-08 14:08 UTC (permalink / raw)
  To: Alan Cox; +Cc: dalecki, Linus Torvalds, linux-kernel

Alan Cox wrote:
> 
> > somehow encouraged by the compiler comparisions between gcc and intel's
> > free compiler, which use the register passing for anything local
> > to the actual code, where the speed gains are up to 20% im currently
> 
> I was under the impression intels compiler was profoundly non-free ?
> 
> > quite inclined to do the redo and finish the experiment.
> > BTW.> It's not just asm fixpus that have to be done for this
> > to work. For example all the c files with -fno-omit-frame-pointer
> 
> 20% is a nice large number

I just wanted to note that I have already got the whole fixup to
the stage where the first schedule() occurs during kernel
initialization... printk and so on all seem to work nicely ;-).
Well, there were some errors which had to be fixed to get this far.
For example, the decompress_kernel function should have the
asmlinkage attribute, and boot/compressed/misc.c should not export
anything else.

Further debugging will occur this evening...

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-06 18:19       ` Alan Cox
@ 2001-11-09 21:52         ` Jamie Lokier
  0 siblings, 0 replies; 61+ messages in thread
From: Jamie Lokier @ 2001-11-09 21:52 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, Benjamin LaHaise, linux-kernel

Alan Cox wrote:
> True enough, but then we can go to
> 
> 	andl %%esp, %0
> 	movl (%%eax), %%eax
> 
> which doesnt really change the cost much, lets us colour the task structs
> nicely, and lets us colour the stack somewhat by offseting esp from the base
> - and all in standard instructions

A variant lets you put the pointer at the top of the stack, where it can
sometimes share a cache line with the freshly pushed context:

	movl $0x1ffc,%0
	orl %esp,%0
	movl (%0), %0

This works because GCC keeps the stack aligned to 4 bytes at all times,
I believe.

Both this simple sequence, and Alan's code, suffer from the problem that
the pointer itself is not cache-coloured, but it is a lot better than
having the whole context and task state on the same colour.

This could perhaps be improved using Linus' idea of shifting upper address
bits to colour the pointer as well.
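
The address computation in the or-with-0x1ffc variant above can be checked in plain C. This is only a sketch of the arithmetic: the stack pointer is passed in as an integer instead of being read from %esp, the final load through the slot is left out, and current_slot() and slot_is_exact() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SIZE	0x2000u		/* 8 kB stack block, as the 0x1ffc constant implies */
#define SLOT_OFFSET	0x1ffcu		/* last 4-byte word of the block */

/* C equivalent of "movl $0x1ffc,%0; orl %esp,%0": address of the pointer slot. */
static uintptr_t current_slot(uintptr_t sp)
{
	return sp | SLOT_OFFSET;
}

/* The or trick only lands on the slot when sp is 4-byte aligned. */
static int slot_is_exact(uintptr_t sp)
{
	uintptr_t base = sp & ~(uintptr_t)(STACK_SIZE - 1);

	return current_slot(sp) == base + SLOT_OFFSET;
}
```

Any 4-byte aligned address inside the block maps to the same slot at the top of the block, while a misaligned one (low two bits set) does not, which is why the alignment guarantee matters.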

-- Jamie


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-11 13:16                   ` Martin Dalecki
@ 2001-11-11 13:06                     ` Keith Owens
  2001-11-12 11:28                     ` PATCH 2.4.14 mregparm=3 compilation fixes Martin Dalecki
  1 sibling, 0 replies; 61+ messages in thread
From: Keith Owens @ 2001-11-11 13:06 UTC (permalink / raw)
  To: dalecki; +Cc: linux-kernel

On Sun, 11 Nov 2001 14:16:36 +0100, 
Martin Dalecki <dalecki@evision-ventures.com> wrote:
>I have now a nice kernel at home, compiled with -mregparm=3 up
>... Patch will follow on monday

Compiling the kernel with mregparm is going to play havoc with binary
only modules (BOMs), interface mismatches all over the place.  I know
we do not support BOMs but there is a big difference between not
supporting them and having them actively destroy the kernel because of
different calling sequences.

A new feature of kbuild 2.5 is defining which CONFIG options are
critical, any change to any critical config option forces a complete
kernel rebuild.  Modutils 2.5 will also refuse to load a module if its
critical config options are different from the kernel.  The current
list of critical options is

  CONFIG_SMP
    UP modules in SMP kernel or vice versa just go splat.  This
    replaces the modversions '_smp' prefix.

  CONFIG_KBUILD_GCC_VERSION
    Inserting a module compiled with gcc 3.0.1 into a kernel compiled
    with gcc 3.0.2 is a recipe for disaster.  Kernel and module must
    be built with the same compiler.

Any changes that affect the ABI for modules must be handled via config
options and those options must be on the critical list in 2.5.

Please add CONFIG_MREGPARM with a huge warning that, until kbuild 2.5
and modutils 2.5 are available, inserting a BOM is likely to destroy a
kernel compiled with CONFIG_MREGPARM.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
  2001-11-07 13:38                 ` Alan Cox
  2001-11-07 14:59                   ` Martin Dalecki
  2001-11-07 20:04                   ` Using %cr2 to reference "current" Andrew Morton
@ 2001-11-11 13:16                   ` Martin Dalecki
  2001-11-11 13:06                     ` Keith Owens
  2001-11-12 11:28                     ` PATCH 2.4.14 mregparm=3 compilation fixes Martin Dalecki
  2 siblings, 2 replies; 61+ messages in thread
From: Martin Dalecki @ 2001-11-11 13:16 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, linux-kernel

Alan Cox wrote:
> 
> > With the following options enabled we get:
> > -freg-struct-return -mrtd -mregparm=3
> >
> >    text    data     bss     dec     hex filename
> > 1302372  260804  288080 1851256  1c3f78 vmlinux
> >
> > Quite significant difference if you ask me!!!
> 
> 30K is nice to have but still a scratch on the surface compared with 500K 8)
> 
> > in a saving of about 2.3% in code size. This may not sound great in
> > relative numbers, but for a compiler designer this would already sound
> > hilarious, and in absolute numbers it's 29760 bytes. Notwithstanding
> > the speed improvement...
> 
> The obvious question is - have you tried running the kernel built like that
> with any asm fixups needed ?

I now have a nice kernel at home, compiled with -mregparm=3, up
and running. Full interactive session, full kernel compiles, X11
and so on all work. Everything seems fine so far.

However, I still have to build an RPM-grade kernel with the full
feature set and test it. Precise benchmarking will take some time as
well. I think I will use the BYTE benchmark in particular, since parts
of it are quite "kernel intensive". A patch will follow on
Monday (if nothing comes up in between...).
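
For reference, the size figures quoted above can be rechecked with a few lines of C. This is only a sketch: the stock text size of 1332132 bytes is not stated in the quoted text, it is derived here by adding the 29760 bytes saved to the 1302372 bytes reported for the regparm build, and the function name is made up for the example.

```c
/*
 * Percentage of text size saved when going from 'before_bytes'
 * to 'after_bytes', relative to the size before the change.
 */
double saving_percent(long before_bytes, long after_bytes)
{
	return 100.0 * (double)(before_bytes - after_bytes) /
	       (double)before_bytes;
}

/* Figures from the thread; STOCK_TEXT is derived, see above. */
enum {
	REGPARM_TEXT = 1302372,
	SAVED_BYTES  = 29760,
	STOCK_TEXT   = REGPARM_TEXT + SAVED_BYTES
};
```

Under that assumption, saving_percent(STOCK_TEXT, REGPARM_TEXT) comes out at roughly 2.2%. The "about 2.3%" in the quote matches dividing the saving by the smaller, regparm-built size instead.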

^ permalink raw reply	[flat|nested] 61+ messages in thread

* PATCH 2.4.14 mregparm=3 compilation fixes
  2001-11-11 13:16                   ` Martin Dalecki
  2001-11-11 13:06                     ` Keith Owens
@ 2001-11-12 11:28                     ` Martin Dalecki
  2001-11-12 16:10                       ` Keith Owens
  2001-11-12 16:42                       ` Linus Torvalds
  1 sibling, 2 replies; 61+ messages in thread
From: Martin Dalecki @ 2001-11-12 11:28 UTC (permalink / raw)
  To: Alan Cox, Linus Torvalds, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1027 bytes --]

Hello out there!

The attached patch fixes compiling and running the kernel
with -mregparm=3 on IA32. The fixes, excluding the change
in arch/i386/Makefile, of course apply to the stock kernel
as well, so Linus, please include them in 2.4.15 - they just won't hurt...

Well, the benchmarks I intended to do (i.e. the BYTE UNIX bench)
were not quite conclusive, so I include the results here just
for reference. They were done on a PIII Celeron notebook running
at 700 MHz with 192 MB of RAM.

- regparm3.report was gathered with the patch applied.

- report was gathered without the patch applied.

Maybe someone with more time and the proper infrastructure at
hand could provide some more fine-grained tests here?

The patch itself turned out to be much smaller and simpler than
I expected. However, the space savings are quite significant,
especially for such a small change in the kernel...

BTW: the -pipe compiler option no longer gives any speed advantage
on systems where /tmp is on tmpfs!

Have fun!

[-- Attachment #2: mregparm.patch --]
[-- Type: text/plain, Size: 4136 bytes --]

diff -ur linux-2.4.14-2/arch/i386/Makefile linux-mdcki/arch/i386/Makefile
--- linux-2.4.14-2/arch/i386/Makefile	Thu Apr 12 21:20:31 2001
+++ linux-mdcki/arch/i386/Makefile	Sat Nov 10 00:07:17 2001
@@ -21,7 +21,7 @@
 LDFLAGS=-e stext
 LINKFLAGS =-T $(TOPDIR)/arch/i386/vmlinux.lds $(LDFLAGS)
 
-CFLAGS += -pipe
+CFLAGS += -freg-struct-return -mregparm=3
 
 # prevent gcc from keeping the stack 16 byte aligned
 CFLAGS += $(shell if $(CC) -mpreferred-stack-boundary=2 -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "-mpreferred-stack-boundary=2"; fi)
diff -ur linux-2.4.14-2/arch/i386/boot/compressed/misc.c linux-mdcki/arch/i386/boot/compressed/misc.c
--- linux-2.4.14-2/arch/i386/boot/compressed/misc.c	Fri Oct  5 03:42:54 2001
+++ linux-mdcki/arch/i386/boot/compressed/misc.c	Sat Nov 10 00:02:08 2001
@@ -9,6 +9,7 @@
  * High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996
  */
 
+#include <linux/linkage.h>
 #include <linux/vmalloc.h>
 #include <linux/tty.h>
 #include <asm/io.h>
@@ -304,7 +305,7 @@
 	short b;
 	} stack_start = { & user_stack [STACK_SIZE] , __KERNEL_DS };
 
-void setup_normal_output_buffer(void)
+static void setup_normal_output_buffer(void)
 {
 #ifdef STANDARD_MEMORY_BIOS_CALL
 	if (EXT_MEM_K < 1024) error("Less than 2MB of memory.\n");
@@ -320,7 +321,7 @@
 	uch *high_buffer_start; int hcount;
 };
 
-void setup_output_buffer_if_we_run_high(struct moveparams *mv)
+static void setup_output_buffer_if_we_run_high(struct moveparams *mv)
 {
 	high_buffer_start = (uch *)(((ulg)&end) + HEAP_SIZE);
 #ifdef STANDARD_MEMORY_BIOS_CALL
@@ -342,7 +343,7 @@
 	mv->high_buffer_start = high_buffer_start;
 }
 
-void close_output_buffer_if_we_run_high(struct moveparams *mv)
+static void close_output_buffer_if_we_run_high(struct moveparams *mv)
 {
 	if (bytes_out > low_buffer_size) {
 		mv->lcount = low_buffer_size;
@@ -355,7 +356,7 @@
 }
 
 
-int decompress_kernel(struct moveparams *mv, void *rmode)
+asmlinkage int decompress_kernel(struct moveparams *mv, void *rmode)
 {
 	real_mode = rmode;
 
diff -ur linux-2.4.14-2/arch/i386/kernel/bluesmoke.c linux-mdcki/arch/i386/kernel/bluesmoke.c
--- linux-2.4.14-2/arch/i386/kernel/bluesmoke.c	Thu Oct 11 18:04:57 2001
+++ linux-mdcki/arch/i386/kernel/bluesmoke.c	Sat Nov 10 02:24:25 2001
@@ -100,11 +100,11 @@
 
 /*
  *	Call the installed machine check handler for this CPU setup.
- */ 
- 
+ */
+
 static void (*machine_check_vector)(struct pt_regs *, long error_code) = unexpected_machine_check;
 
-void do_machine_check(struct pt_regs * regs, long error_code)
+asmlinkage void do_machine_check(struct pt_regs * regs, long error_code)
 {
 	machine_check_vector(regs, error_code);
 }
diff -ur linux-2.4.14-2/arch/i386/math-emu/fpu_proto.h linux-mdcki/arch/i386/math-emu/fpu_proto.h
--- linux-2.4.14-2/arch/i386/math-emu/fpu_proto.h	Wed Dec 10 02:57:09 1997
+++ linux-mdcki/arch/i386/math-emu/fpu_proto.h	Sat Nov 10 02:31:22 2001
@@ -53,7 +53,7 @@
 extern void fst_i_(void);
 extern void fstp_i(void);
 /* fpu_entry.c */
-extern void math_emulate(long arg);
+asmlinkage extern void math_emulate(long arg);
 extern void math_abort(struct info *info, unsigned int signal);
 /* fpu_etc.c */
 extern void FPU_etc(void);
diff -ur linux-2.4.14-2/include/linux/kernel.h linux-mdcki/include/linux/kernel.h
--- linux-2.4.14-2/include/linux/kernel.h	Fri Nov  9 20:11:22 2001
+++ linux-mdcki/include/linux/kernel.h	Sun Nov 11 12:35:46 2001
@@ -51,7 +51,7 @@
 extern struct notifier_block *panic_notifier_list;
 NORET_TYPE void panic(const char * fmt, ...)
 	__attribute__ ((NORET_AND format (printf, 1, 2)));
-NORET_TYPE void do_exit(long error_code)
+asmlinkage NORET_TYPE void do_exit(long error_code)
 	ATTRIB_NORET;
 NORET_TYPE void complete_and_exit(struct completion *, long)
 	ATTRIB_NORET;
diff -ur linux-2.4.14-2/kernel/sched.c linux-mdcki/kernel/sched.c
--- linux-2.4.14-2/kernel/sched.c	Fri Nov  9 19:56:42 2001
+++ linux-mdcki/kernel/sched.c	Sat Nov 10 02:07:01 2001
@@ -515,7 +515,7 @@
 #endif /* CONFIG_SMP */
 }
 
-void schedule_tail(struct task_struct *prev)
+asmlinkage void schedule_tail(struct task_struct *prev)
 {
 	__schedule_tail(prev);
 }

[-- Attachment #3: regparm3.report --]
[-- Type: text/plain, Size: 3083 bytes --]


  BYTE UNIX Benchmarks (Version 3.11)
  System -- Linux kozaczek 2.4.14-mdcki #15 nie lis 11 12:35:45 CET 2001 i686 unknown
  Start Benchmark Run: nie lis 11 14:40:32 CET 2001
   1 interactive users.
Dhrystone 2 without register variables   1263066.6 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables     1264480.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)         3179144.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)        188804.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)           190760.8 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)             188823.6 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)            189990.7 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)           182915.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)          183937.8 lps   (10 secs, 6 samples)
System Call Overhead Test                363784.1 lps   (10 secs, 6 samples)
Pipe Throughput Test                     415828.7 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test        196984.2 lps   (10 secs, 6 samples)
Process Creation Test                      3378.5 lps   (10 secs, 6 samples)
Execl Throughput Test                       619.3 lps   (9 secs, 6 samples)
File Read  (10 seconds)                  1327798.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                  138593.0 KBps  (10 secs, 6 samples)
File Copy  (10 seconds)                   19076.0 KBps  (10 secs, 6 samples)
File Read  (30 seconds)                  1337240.0 KBps  (30 secs, 6 samples)
File Write (30 seconds)                  147663.0 KBps  (30 secs, 6 samples)
File Copy  (30 seconds)                   14968.0 KBps  (30 secs, 6 samples)
C Compiler Test                             388.7 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)               1065.8 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)                562.8 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                287.0 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                146.0 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places          28902.2 lpm   (60 secs, 6 samples)
Recursion Test--Tower of Hanoi            16393.7 lps   (10 secs, 6 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Arithmetic Test (type = double)               2541.7   183937.8       72.4
Dhrystone 2 without register variables       22366.3  1263066.6       56.5
Execl Throughput Test                           16.5      619.3       37.5
File Copy  (30 seconds)                        179.0    14968.0       83.6
Pipe-based Context Switching Test             1318.5   196984.2      149.4
Shell scripts (8 concurrent)                     4.0      146.0       36.5
                                                                 =========
     SUM of  6 items                                                 435.9
     AVERAGE                                                          72.6

[-- Attachment #4: report --]
[-- Type: text/plain, Size: 3077 bytes --]


  BYTE UNIX Benchmarks (Version 3.11)
  System -- Linux kozaczek 2.4.14-2 #1 pią lis 9 22:22:10 CET 2001 i686 unknown
  Start Benchmark Run: nie lis 11 16:10:53 CET 2001
   1 interactive users.
Dhrystone 2 without register variables   1263134.8 lps   (10 secs, 6 samples)
Dhrystone 2 using register variables     1263583.6 lps   (10 secs, 6 samples)
Arithmetic Test (type = arithoh)         3177830.7 lps   (10 secs, 6 samples)
Arithmetic Test (type = register)        189076.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = short)           190665.1 lps   (10 secs, 6 samples)
Arithmetic Test (type = int)             188753.5 lps   (10 secs, 6 samples)
Arithmetic Test (type = long)            190094.2 lps   (10 secs, 6 samples)
Arithmetic Test (type = float)           182872.2 lps   (10 secs, 6 samples)
Arithmetic Test (type = double)          183902.9 lps   (10 secs, 6 samples)
System Call Overhead Test                360235.7 lps   (10 secs, 6 samples)
Pipe Throughput Test                     421456.7 lps   (10 secs, 6 samples)
Pipe-based Context Switching Test        194915.8 lps   (10 secs, 6 samples)
Process Creation Test                      3605.4 lps   (10 secs, 6 samples)
Execl Throughput Test                       608.6 lps   (9 secs, 6 samples)
File Read  (10 seconds)                  1294487.0 KBps  (10 secs, 6 samples)
File Write (10 seconds)                  138403.0 KBps  (10 secs, 6 samples)
File Copy  (10 seconds)                   19158.0 KBps  (10 secs, 6 samples)
File Read  (30 seconds)                  1278293.0 KBps  (30 secs, 6 samples)
File Write (30 seconds)                  147556.0 KBps  (30 secs, 6 samples)
File Copy  (30 seconds)                   15129.0 KBps  (30 secs, 6 samples)
C Compiler Test                             388.8 lpm   (60 secs, 3 samples)
Shell scripts (1 concurrent)               1063.2 lpm   (60 secs, 3 samples)
Shell scripts (2 concurrent)                563.1 lpm   (60 secs, 3 samples)
Shell scripts (4 concurrent)                287.4 lpm   (60 secs, 3 samples)
Shell scripts (8 concurrent)                145.7 lpm   (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places          28576.1 lpm   (60 secs, 6 samples)
Recursion Test--Tower of Hanoi            16445.3 lps   (10 secs, 6 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Arithmetic Test (type = double)               2541.7   183902.9       72.4
Dhrystone 2 without register variables       22366.3  1263134.8       56.5
Execl Throughput Test                           16.5      608.6       36.9
File Copy  (30 seconds)                        179.0    15129.0       84.5
Pipe-based Context Switching Test             1318.5   194915.8      147.8
Shell scripts (8 concurrent)                     4.0      145.7       36.4
                                                                 =========
     SUM of  6 items                                                 434.5
     AVERAGE                                                          72.4
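
Note on the INDEX VALUES tables in the two reports: the INDEX column appears to be simply RESULT divided by BASELINE. That is inferred from the numbers themselves, not taken from the benchmark's documentation. A small sketch that recomputes both sums from the rows above (the struct and function names are made up for this example):

```c
#include <stddef.h>

/* One row of an INDEX VALUES table. */
struct bench_row {
	const char *test;
	double baseline;
	double result;
};

/* Index of a single test, assumed to be result / baseline. */
double index_value(const struct bench_row *row)
{
	return row->result / row->baseline;
}

/* Sum of the (unrounded) indexes over all rows. */
double index_sum(const struct bench_row *rows, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += index_value(&rows[i]);
	return sum;
}

/* Rows copied from regparm3.report (patch applied). */
const struct bench_row regparm3_rows[] = {
	{ "Arithmetic Test (type = double)",         2541.7,  183937.8 },
	{ "Dhrystone 2 without register variables", 22366.3, 1263066.6 },
	{ "Execl Throughput Test",                     16.5,     619.3 },
	{ "File Copy  (30 seconds)",                  179.0,   14968.0 },
	{ "Pipe-based Context Switching Test",       1318.5,  196984.2 },
	{ "Shell scripts (8 concurrent)",               4.0,     146.0 },
};

/* Rows copied from report (stock 2.4.14-2). */
const struct bench_row stock_rows[] = {
	{ "Arithmetic Test (type = double)",         2541.7,  183902.9 },
	{ "Dhrystone 2 without register variables", 22366.3, 1263134.8 },
	{ "Execl Throughput Test",                     16.5,     608.6 },
	{ "File Copy  (30 seconds)",                  179.0,   15129.0 },
	{ "Pipe-based Context Switching Test",       1318.5,  194915.8 },
	{ "Shell scripts (8 concurrent)",               4.0,     145.7 },
};
```

Because the sums here use unrounded indexes, they can differ from the printed 435.9 and 434.5 in the last digit. Either way the two runs are within about one third of a percent of each other, which fits the "not quite conclusive" remark in the cover mail.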

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: PATCH 2.4.14 mregparm=3 compilation fixes
  2001-11-12 11:28                     ` PATCH 2.4.14 mregparm=3 compilation fixes Martin Dalecki
@ 2001-11-12 16:10                       ` Keith Owens
  2001-11-12 16:25                         ` Christoph Hellwig
  2001-11-12 17:56                         ` Martin Dalecki
  2001-11-12 16:42                       ` Linus Torvalds
  1 sibling, 2 replies; 61+ messages in thread
From: Keith Owens @ 2001-11-12 16:10 UTC (permalink / raw)
  To: dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel

On Mon, 12 Nov 2001 12:28:33 +0100, 
Martin Dalecki <dalecki@evision-ventures.com> wrote:
>diff -ur linux-2.4.14-2/arch/i386/Makefile linux-mdcki/arch/i386/Makefile
>--- linux-2.4.14-2/arch/i386/Makefile	Thu Apr 12 21:20:31 2001
>+++ linux-mdcki/arch/i386/Makefile	Sat Nov 10 00:07:17 2001
>@@ -21,7 +21,7 @@
> LDFLAGS=-e stext
> LINKFLAGS =-T $(TOPDIR)/arch/i386/vmlinux.lds $(LDFLAGS)
> 
>-CFLAGS += -pipe
>+CFLAGS += -freg-struct-return -mregparm=3
> 
> # prevent gcc from keeping the stack 16 byte aligned
> CFLAGS += $(shell if $(CC) -mpreferred-stack-boundary=2 -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "-mpreferred-stack-boundary=2"; fi)

Setting mregparm must be a CONFIG_ option, with a huge warning that

A) Changing CONFIG_MREGPARM requires make mrproper.

B) Loading binary only modules into a kernel compiled with mregparm is
   even more likely to destroy your kernel.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: PATCH 2.4.14 mregparm=3 compilation fixes
  2001-11-12 16:10                       ` Keith Owens
@ 2001-11-12 16:25                         ` Christoph Hellwig
  2001-11-12 17:56                         ` Martin Dalecki
  1 sibling, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2001-11-12 16:25 UTC (permalink / raw)
  To: Keith Owens, dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel

In article <4300.1005581402@ocs3.intra.ocs.com.au> you wrote:
> Setting mregparm must be a CONFIG_ option, with a huge warning that
>
> A) Changing CONFIG_MREGPARM requires make mrproper.

The above patch changes the kernel to always use mregparm - 
it should be caught by the .flags dependencies anyway.

> B) Loading binary only modules into a kernel compiled with mregparm is
>    even more likely to destroy your kernel.

Nope - people who use those are just doomed.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: PATCH 2.4.14 mregparm=3 compilation fixes
  2001-11-12 11:28                     ` PATCH 2.4.14 mregparm=3 compilation fixes Martin Dalecki
  2001-11-12 16:10                       ` Keith Owens
@ 2001-11-12 16:42                       ` Linus Torvalds
  2001-11-12 18:51                         ` Martin Dalecki
  1 sibling, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2001-11-12 16:42 UTC (permalink / raw)
  To: dalecki; +Cc: Alan Cox, linux-kernel


On Mon, 12 Nov 2001, Martin Dalecki wrote:
>
> The attached patch is fixing compilation and running
> of the kernel with -mregparm=3 on IA32. The fixes excluding
> the change in arch/i386/Makefile of course apply to the stock kernel
> as well, so Linus please include it in 2.4.15 - it just won't hurt...

I certainly won't enable it in the stock kernel, considering the bad track
record gcc has had with regparm under register pressure, but the
"asmlinkage" parts look like real fixes.

However, it's kind of sad to make some of the more timing-critical stuff
(like schedule_tail) be asmlinkage - it might be worth it to do it the
other way around, and make it FASTCALL() and change the assembly code to
pass arguments in registers. That way, the calling convention is still the
same on both regparm=3 and without, but instead of defaulting to the slow
method we'd default to the fast one..

		Linus
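
The two directions discussed above (pinning a function to stack arguments versus pinning it to register arguments) can be illustrated with GCC's regparm function attribute. This is a user-space sketch, not kernel code: the macro and function names are made up for the example, and the macros expand to nothing on targets other than 32-bit x86, where the attribute does not apply.

```c
/*
 * Hypothetical stand-ins for the two conventions.  On 32-bit x86 with
 * GCC, regparm(0) passes all arguments on the stack and regparm(3)
 * passes up to three integer arguments in registers, regardless of
 * whether the file was compiled with -mregparm=3.
 */
#if defined(__GNUC__) && defined(__i386__)
#define ARGS_ON_STACK	__attribute__((regparm(0)))
#define ARGS_IN_REGS	__attribute__((regparm(3)))
#else
#define ARGS_ON_STACK
#define ARGS_IN_REGS
#endif

/*
 * A function called from assembly that pushes its arguments: the
 * convention has to stay fixed even if the rest of the build uses
 * -mregparm=3.
 */
ARGS_ON_STACK long called_from_asm(long a, long b)
{
	return a + b;
}

/*
 * A timing-critical function: fixed to register arguments, so the
 * assembly caller would load registers instead of pushing.
 */
ARGS_IN_REGS long hot_path(long a, long b, long c)
{
	return a + b + c;
}
```

Either annotation makes the calling convention of that one function independent of the global compiler flags, which is the property both approaches share. The difference is only which side, the stack or the registers, becomes the fixed default that the assembly callers are written against.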


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: PATCH 2.4.14 mregparm=3 compilation fixes
  2001-11-12 16:10                       ` Keith Owens
  2001-11-12 16:25                         ` Christoph Hellwig
@ 2001-11-12 17:56                         ` Martin Dalecki
  1 sibling, 0 replies; 61+ messages in thread
From: Martin Dalecki @ 2001-11-12 17:56 UTC (permalink / raw)
  To: Keith Owens; +Cc: dalecki, Alan Cox, Linus Torvalds, linux-kernel

Keith Owens wrote:
> 
> On Mon, 12 Nov 2001 12:28:33 +0100,
> Martin Dalecki <dalecki@evision-ventures.com> wrote:
> >diff -ur linux-2.4.14-2/arch/i386/Makefile linux-mdcki/arch/i386/Makefile
> >--- linux-2.4.14-2/arch/i386/Makefile  Thu Apr 12 21:20:31 2001
> >+++ linux-mdcki/arch/i386/Makefile     Sat Nov 10 00:07:17 2001
> >@@ -21,7 +21,7 @@
> > LDFLAGS=-e stext
> > LINKFLAGS =-T $(TOPDIR)/arch/i386/vmlinux.lds $(LDFLAGS)
> >
> >-CFLAGS += -pipe
> >+CFLAGS += -freg-struct-return -mregparm=3
> >
> > # prevent gcc from keeping the stack 16 byte aligned
> > CFLAGS += $(shell if $(CC) -mpreferred-stack-boundary=2 -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "-mpreferred-stack-boundary=2"; fi)
> 
> Setting mregparm must be a CONFIG_ option, with a huge warning that
> 
> A) Changing CONFIG_MREGPARM requires make mrproper.
> 
> B) Loading binary only modules into a kernel compiled with mregparm is
>    even more likely to destroy your kernel.

Ehmm... In fact my feeling is that _this part_ of the
patch _should not_ be included in the mainstream kernel at all. It
should perhaps just be made the default (in 2.5 perhaps)
if it turns out that the gains in performance, code size and so on
are worth it, since I haven't encountered any problems so far, even
with a "distro RPM grade kernel" containing USB, TCP and whatnot.
GCC really got better over the last years!

So there is no real need for an option at all in my opinion.
We have enough of them already.

The REST OF THE PATCH contains only pure, clear-cut bugfixes
which should be applied STRAIGHT away. Those fixes do not influence
the current compilation output at all (with the exception of hiding
global symbols in misc.c that are not used externally). But they enable
somebody who knows what he is doing to add the above CFLAGS for his
system to gain a significant amount of free space, for example in the
PROM, or to gain a bit of performance - supposedly.

I hope this makes my intentions clear. OK?

BTW: try it out, it doesn't interfere with any module handling.
However, I just don't share your objections about binary-only modules,
because I just don't care about them... In particular, my nonexistent
interest in computer games doesn't compel me to
buy any nvidia graphics cards. A plain nice old Mach64,
which was always one of the most UNIX-friendly VGA designs ever,
is just fine for me ;-).

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: PATCH 2.4.14 mregparm=3 compilation fixes
  2001-11-12 16:42                       ` Linus Torvalds
@ 2001-11-12 18:51                         ` Martin Dalecki
  2001-11-12 20:05                           ` Corsspatch patch-2.4.15-pre2 patch-2.4.15-pre3 Martin Dalecki
  0 siblings, 1 reply; 61+ messages in thread
From: Martin Dalecki @ 2001-11-12 18:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dalecki, Alan Cox, linux-kernel

Linus Torvalds wrote:
> 
> On Mon, 12 Nov 2001, Martin Dalecki wrote:
> >
> > The attached patch is fixing compilation and running
> > of the kernel with -mregparm=3 on IA32. The fixes excluding
> > the change in arch/i386/Makefile of course apply to the stock kernel
> > as well, so Linus please include it in 2.4.15 - it just won't hurt...
> 
> I certainly won't enable it in the stock kernel, considering the bad track
> record gcc has had with regparm under register pressure, but the
> "asmlinkage" parts look like real fixes.

Yes, that was always my intention. The chunk changing the CFLAGS was
left in the patch only for reference. I had hoped
that I made this clear in my announcement, but apparently I failed ;-).

Despite this, I would like to make clear that I have already compiled
my own "RedHat 7.2" compatible kernel-RPM set with the patch applied
and haven't encountered any problems so far... Even an ORACLE DB just
started without noticing that anything had changed beneath it.
Since all this was done on my notebook, I can say that there were
no problems even with any of the "less mature" kernel parts
like USB handling, CardBus and so on
(Anybody please note: I didn't say "immature", just "less mature",
more like "fresh", no pun intended.)

Apparently GCC recently got much better at this stuff.
I'm using the RedHat-branded GCC 2.96, gcc-2.96-99...
And I reiterate that I'm just happily running a whole
kernel compiled with mregparm=3 without any anomalies so far.

> However, it's kind of sad to make some of the more timing-critical stuff
> (like schedule_tail) be asmlinkage - it might be worth it to do it the
> other way around, and make it FASTCALL() and change the assembly code to
> pass arguments in registers. That way, the calling convention is still the
> same on both regparm=3 and without, but instead of defaulting to the slow
> method we'd default to the fast one..

Yes, that's right. However, if you look closely you will notice that
asmlinkage is quite a bad name. Ideally there should be an asmlinkage
with mregparm=3 and a syslinkage macro with mregparm=0 for system call
entry points.
And then fixes are fixes, and with the current semantics my patch is
really just fixing bugs (though not "tragic" ones). So if I see
this fix applied I will make the improvements described above in 2.5
;-).
They are not difficult anyway, just a bit tedious... and they would
affect a bit more of the surrounding code, in particular the system
call declarations, and we have a lot of them already ;-).


So long...

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Corsspatch patch-2.4.15-pre2 patch-2.4.15-pre3
  2001-11-12 18:51                         ` Martin Dalecki
@ 2001-11-12 20:05                           ` Martin Dalecki
  2001-11-12 20:13                             ` BUG BUG hunt the bugs!!! patch-2.4.15-pre5 Martin Dalecki
  0 siblings, 1 reply; 61+ messages in thread
From: Martin Dalecki @ 2001-11-12 20:05 UTC (permalink / raw)
  Cc: Linus Torvalds, Alan Cox, linux-kernel

Hello out there!

Doing a cross-patch between, ehmm, pre-patches 2 and 3, I noticed
that patch-2.4.15-pre3 adds a call to sa1100_irda_init()
TWICE. This *may* work, but I don't think this is
quite what the author intended :-). So Linus/Alan, please
watch out...

It's in the file linux/net/irda/irda_device.c.
The following will be there twice after pre3:
#ifdef CONFIG_SA1100_FIR
	sa1100_irda_init();
#endif

^ permalink raw reply	[flat|nested] 61+ messages in thread

* BUG BUG hunt the bugs!!! patch-2.4.15-pre5
  2001-11-12 20:05                           ` Corsspatch patch-2.4.15-pre2 patch-2.4.15-pre3 Martin Dalecki
@ 2001-11-12 20:13                             ` Martin Dalecki
  0 siblings, 0 replies; 61+ messages in thread
From: Martin Dalecki @ 2001-11-12 20:13 UTC (permalink / raw)
  Cc: Linus Torvalds, Alan Cox, linux-kernel

Hallo out there!

Same symptom from patch-2.4.15-pre4:

diff -u --recursive --new-file v2.4.14/linux/net/irda/irda_device.c linux/net/irda/irda_device.c
--- v2.4.14/linux/net/irda/irda_device.c	Sun Sep 23 11:41:02 2001
+++ linux/net/irda/irda_device.c	Sun Nov 11 10:20:21 2001
 
bla bla bla...

@@ -124,6 +127,12 @@
 #ifdef CONFIG_WINBOND_FIR
 	w83977af_init();
 #endif
+#ifdef CONFIG_SA1100_FIR
+	sa1100_irda_init();
+#endif
+#ifdef CONFIG_SA1100_FIR
+	sa1100_irda_init();
+#endif
 #ifdef CONFIG_NSC_FIR
 	nsc_ircc_init();
 #endif
@@ -151,6 +160,12 @@
 #ifdef CONFIG_OLD_BELKIN
  	old_belkin_init();
 #endif
+#ifdef CONFIG_EP7211_IR
+ 	ep7211_ir_init();
+#endif
+#ifdef CONFIG_EP7211_IR
+ 	ep7211_ir_init();
+#endif
 	return 0;

You see the initialization done twice!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Merge BUG in 2.4.15-pre4 serial.c
  2001-11-13 16:49                       ` Merge BUG in 2.4.15-pre4 serial.c Martin Dalecki
@ 2001-11-13 16:21                         ` Russell King
  2001-11-13 17:37                           ` Martin Dalecki
  0 siblings, 1 reply; 61+ messages in thread
From: Russell King @ 2001-11-13 16:21 UTC (permalink / raw)
  To: dalecki; +Cc: Alan Cox, Linus Torvalds, linux-kernel

On Tue, Nov 13, 2001 at 05:49:24PM +0100, Martin Dalecki wrote:
> I have found the following code in serial.c around line 5565
> 
> #ifdef __i386__
> 	if (i == NR_PORTS) {
> 		for (i = 4; i < NR_PORTS; i++)
> 			if ((rs_table[i].type == PORT_UNKNOWN) &&
> 			    (rs_table[i].count == 0))
> 				break;
> 	}
> #endif
> 	if (i == NR_PORTS) {
> 		for (i = 0; i < NR_PORTS; i++)
> 			if ((rs_table[i].type == PORT_UNKNOWN) &&
> 			    (rs_table[i].count == 0))
> 				break;
> 	}
> 
> This is supposedly the result of applying some patch twice.
> Let me guess the first 8 lines of this can be deleted.

Look at it closer, in particular the for() loops.

It's basically there so that on x86, we don't normally use ttyS0-3
for pcmcia and other similar ports, unless we run out of other ports
to use.
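
The two-pass search explained above can be written out as a self-contained sketch. The table size, the PORT_16550A value and the prefer_high parameter (standing in for the #ifdef __i386__ block) are assumptions made for this example; only the loop structure is taken from the quoted code.

```c
#define NR_PORTS	8	/* illustrative table size */
#define PORT_UNKNOWN	0	/* slot has no detected UART */
#define PORT_16550A	4	/* illustrative "in use" port type */

struct port_state {
	int type;	/* PORT_UNKNOWN if nothing is configured here */
	int count;	/* number of current opens */
};

static int slot_is_free(const struct port_state *p)
{
	return p->type == PORT_UNKNOWN && p->count == 0;
}

/*
 * Pick a slot for a newly registered port.  With prefer_high set
 * (the x86 case), slots 4 and up are tried first, so ttyS0-3 stay
 * reserved for the legacy ports; only if none of those is free does
 * the search fall back to the whole table.  Returns NR_PORTS if no
 * slot is free at all.
 */
int find_free_port(const struct port_state *tbl, int prefer_high)
{
	int i = NR_PORTS;

	if (prefer_high) {
		for (i = 4; i < NR_PORTS; i++)
			if (slot_is_free(&tbl[i]))
				break;
	}
	if (i == NR_PORTS) {
		for (i = 0; i < NR_PORTS; i++)
			if (slot_is_free(&tbl[i]))
				break;
	}
	return i;
}
```

So the first loop is not a duplicate of the second: it changes which slot wins when both low and high slots are free, and the second loop only runs when the first one found nothing.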

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Merge BUG in 2.4.15-pre4 serial.c
  2001-11-07 14:17                     ` Alan Cox
                                         ` (3 preceding siblings ...)
  2001-11-08 14:08                       ` Martin Dalecki
@ 2001-11-13 16:49                       ` Martin Dalecki
  2001-11-13 16:21                         ` Russell King
  4 siblings, 1 reply; 61+ messages in thread
From: Martin Dalecki @ 2001-11-13 16:49 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, linux-kernel

I have found the following code in serial.c around line 5565

#ifdef __i386__
	if (i == NR_PORTS) {
		for (i = 4; i < NR_PORTS; i++)
			if ((rs_table[i].type == PORT_UNKNOWN) &&
			    (rs_table[i].count == 0))
				break;
	}
#endif
	if (i == NR_PORTS) {
		for (i = 0; i < NR_PORTS; i++)
			if ((rs_table[i].type == PORT_UNKNOWN) &&
			    (rs_table[i].count == 0))
				break;
	}

This is supposedly the result of applying some patch twice.
Let me guess the first 8 lines of this can be deleted.

Regards!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Merge BUG in 2.4.15-pre4 serial.c
  2001-11-13 17:37                           ` Martin Dalecki
@ 2001-11-13 16:53                             ` Russell King
  2001-11-13 18:05                               ` Martin Dalecki
  2001-11-13 17:11                             ` Alan Cox
  1 sibling, 1 reply; 61+ messages in thread
From: Russell King @ 2001-11-13 16:53 UTC (permalink / raw)
  To: dalecki; +Cc: linux-kernel

On Tue, Nov 13, 2001 at 06:37:54PM +0100, Martin Dalecki wrote:
> Artificially pushing the port numbers back doesn't make sense for me
> and makes some unknown setserial tricks necessary for irtty setup.

The key words here are "for me".

What setserial "unknown tricks" are you referring to?

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Merge BUG in 2.4.15-pre4 serial.c
  2001-11-13 17:37                           ` Martin Dalecki
  2001-11-13 16:53                             ` Russell King
@ 2001-11-13 17:11                             ` Alan Cox
  2001-11-13 18:23                               ` Martin Dalecki
  1 sibling, 1 reply; 61+ messages in thread
From: Alan Cox @ 2001-11-13 17:11 UTC (permalink / raw)
  To: dalecki; +Cc: Russell King, Alan Cox, Linus Torvalds, linux-kernel

> Well, I still think that the 8 lines can be deleted. Once again, my famous
> notebook is perfectly __i386__ and doesn't contain any devices served by
> serial.c unless I configure IrDA. Artificially pushing the port numbers
> back doesn't make sense for me and makes some unknown setserial tricks
> necessary

Renumbering everyone's serial ports by surprise seems to be a 2.5 thing

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Merge BUG in 2.4.15-pre4 serial.c
  2001-11-13 16:21                         ` Russell King
@ 2001-11-13 17:37                           ` Martin Dalecki
  2001-11-13 16:53                             ` Russell King
  2001-11-13 17:11                             ` Alan Cox
  0 siblings, 2 replies; 61+ messages in thread
From: Martin Dalecki @ 2001-11-13 17:37 UTC (permalink / raw)
  To: Russell King; +Cc: dalecki, Alan Cox, Linus Torvalds, linux-kernel

Russell King wrote:
> 
> On Tue, Nov 13, 2001 at 05:49:24PM +0100, Martin Dalecki wrote:
> > I have found the following code in serial.c around line 5565
> >
> > #ifdef __i386__
> >       if (i == NR_PORTS) {
> >               for (i = 4; i < NR_PORTS; i++)
> >                       if ((rs_table[i].type == PORT_UNKNOWN) &&
> >                           (rs_table[i].count == 0))
> >                               break;
> >       }
> > #endif
> >       if (i == NR_PORTS) {
> >               for (i = 0; i < NR_PORTS; i++)
> >                       if ((rs_table[i].type == PORT_UNKNOWN) &&
> >                           (rs_table[i].count == 0))
> >                               break;
> >       }
> >
> > This is supposedly the result of applying some patch twice.
> > Let me guess the first 8 lines of this can be deleted.
> 
> Look at it closer, in particular the for() loops.
> 
> It's basically there so that on x86, we don't normally use ttyS0-3
> for pcmcia and other similar ports, unless we run out of other ports
> to use.

Well, I still think that the 8 lines can be deleted. Once again, my famous
notebook is perfectly __i386__ and doesn't contain any devices served by
serial.c unless I configure IrDA. Pushing the port numbers artificially
behind doesn't make sense for me and makes some setserial unknown tricks
necessary for irtty setup.
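
As a minimal sketch of the search order Russell describes above, assuming
a simplified stand-in for rs_table[]: the struct, the NR_PORTS value, the
PORT_UNKNOWN value, and the function name find_free_slot() are all
hypothetical and are not taken from serial.c. The prefer_high flag plays
the role of the #ifdef __i386__ block.

```c
#define NR_PORTS     8	/* hypothetical table size for this sketch */
#define PORT_UNKNOWN 0	/* stand-in for the driver's "no UART detected" type */

struct port_slot {	/* simplified stand-in for one rs_table[] entry */
	int type;
	int count;
};

/*
 * Return the index of a free slot, or NR_PORTS if the table is full.
 * With prefer_high set, slots 4..NR_PORTS-1 are tried first, so the
 * slots behind ttyS0-3 are handed out only when nothing else is free.
 */
int find_free_slot(const struct port_slot *tbl, int prefer_high)
{
	int i = NR_PORTS;

	if (prefer_high) {
		/* First pass: skip the four legacy slots. */
		for (i = 4; i < NR_PORTS; i++)
			if (tbl[i].type == PORT_UNKNOWN && tbl[i].count == 0)
				break;
	}
	if (i == NR_PORTS) {
		/* Second pass: fall back to any free slot, legacy ones included. */
		for (i = 0; i < NR_PORTS; i++)
			if (tbl[i].type == PORT_UNKNOWN && tbl[i].count == 0)
				break;
	}
	return i;
}
```

Deleting the first pass, as proposed, would make the fallback loop the
only search, so a newly registered port would land in the lowest free
slot, which may be one of the first four.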

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Merge BUG in 2.4.15-pre4 serial.c
  2001-11-13 16:53                             ` Russell King
@ 2001-11-13 18:05                               ` Martin Dalecki
  0 siblings, 0 replies; 61+ messages in thread
From: Martin Dalecki @ 2001-11-13 18:05 UTC (permalink / raw)
  To: Russell King; +Cc: dalecki, linux-kernel

Russell King wrote:
> 
> On Tue, Nov 13, 2001 at 06:37:54PM +0100, Martin Dalecki wrote:
> > Pushing the port numbers artificially behind doesn't make sense for me
> > and makes some setserial unknown tricks necessary for irtty setup.
> 
> The key words here are "for me".
> 
> What setserial "unknown tricks" are you referring to?

I refer to the IrDA-HOWTO problems with the serial driver. I think
this may be precisely the culprit.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Merge BUG in 2.4.15-pre4 serial.c
  2001-11-13 17:11                             ` Alan Cox
@ 2001-11-13 18:23                               ` Martin Dalecki
  0 siblings, 0 replies; 61+ messages in thread
From: Martin Dalecki @ 2001-11-13 18:23 UTC (permalink / raw)
  To: Alan Cox; +Cc: dalecki, Russell King, Linus Torvalds, linux-kernel

Alan Cox wrote:
> 
> > Well I still think that the 8 lines can be deleted. Once again my famous
> > notebook is perfectly __i386__ and doesn't contain any devices served by
> > serial.c
> > unless I configure IrDA. Pushing the port numbers artificially behind
> > doesn't make sense for me and makes some setserial unknown tricks
> > necessary
> 
> Renumbering everyone's serial ports by surprise seems to be a 2.5 thing

OK, that's an argument with which I fully agree.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
@ 2002-11-10 21:23 Igor Levicki
  0 siblings, 0 replies; 61+ messages in thread
From: Igor Levicki @ 2002-11-10 21:23 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel


Hi,

>I could well imagine a x86-compatible chip where %cr2 isn't even
>writable.  In fact, reading the intel documentation, I see _nowhere_ a
>mention of %cr2 being writable at all - it all just says "contains the
>fault address". 

From the Intel System Programmers Guide:

"The control registers can be read and loaded (or modified) using the
move-to-or-from-control-registers forms of the MOV instruction. In
protected mode, the MOV instructions allow the control registers to be
read or loaded (at privilege level 0 only). This restriction means that
application programs or operating-system procedures (running at
privilege levels 1, 2, or 3) are prevented from reading or loading the
control registers. When loading the control register, reserved bits
should always be set to the values previously read."

>(I don't know what the effect of the P4 half-cacheline
>thing is, I don't know if the CPU can have just a 64-byte block coherent,
>or what..

Cache sector size is 64 bytes on Pentium 4. When the CPU reads from memory,
it reads 2 sectors x 64 bytes = one 128-byte cache line. The hardware
prefetcher fetches 2 x 128-byte cache lines = 256 bytes of memory. On a
write, the CPU always writes 64 bytes.
Now if you read 16 bytes from some address, then for example add
something to them and write them back to the same address, you will pay
a penalty when you read the next 16 bytes from that address, because you
have just trashed the 64-byte sector and you have to wait for
back-propagation.
Hope this helps.

Regards,
Igor Levicki
levicki@yubc.net
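
Taking the sizes quoted above at face value (64-byte sectors, two
sectors per 128-byte line), the sector/line arithmetic can be sketched
as follows. The macro and function names are hypothetical, chosen only
for this illustration.

```c
#include <stdint.h>

#define SECTOR_BYTES  64u	/* sector size as stated in the message above */
#define LINE_BYTES   128u	/* two sectors per cache line, as stated above */

/* Nonzero if both addresses fall in the same 64-byte sector. */
int same_sector(uintptr_t a, uintptr_t b)
{
	return a / SECTOR_BYTES == b / SECTOR_BYTES;
}

/* Nonzero if both addresses fall in the same 128-byte line. */
int same_line(uintptr_t a, uintptr_t b)
{
	return a / LINE_BYTES == b / LINE_BYTES;
}
```

For example, 0x1000 and 0x1040 sit in different sectors of the same
128-byte line, while 0x1000 and 0x1080 are in different lines.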



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Using %cr2 to reference "current"
@ 2001-11-06 22:05 Mikael Pettersson
  0 siblings, 0 replies; 61+ messages in thread
From: Mikael Pettersson @ 2001-11-06 22:05 UTC (permalink / raw)
  To: bcrl, torvalds; +Cc: linux-kernel

On Tue, 6 Nov 2001 09:49:15 -0800 (PST), Linus Torvalds wrote:
>	/* Return "current" in %eax, trash %edx */
>	do_get_current:
>		movl $0x0003c000,%eax	// 4 bits at bit 14
>		movl $-16384,%edx	// remove low 14 bits
>		andl %esp,%eax
>		andl %esp,%edx
>		shrl $7,%eax		// color it by 128 bytes
>		addl %edx,%eax
>		ret
>...
>I would not be surprised if "mov %cr2,%reg" will break a netburst trace
>cache entity, or even cause microcode to be executed. While I _guarantee_
>that all future Intel CPU's will continue to be fast at mixtures of simple
>arithmetic operations like "add" and "and".

On my Pentium 4:
- 6.30 cycles to copy %cr2 to %eax
- 1.05 cycles to compute a non-coloured current by masking %esp
- 2.31 cycles to compute a coloured current by your code above

I did some tests on using %cr2 for get_processor_id() a while ago,
but it was clearly slower (58% on P6, 20% on K6-III, 3% on P5MMX)
than *((%esp & mask)+offset), even though the latter also does a load.

/Mikael
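
For readers who prefer C to assembly, the coloured computation quoted
above can be sketched as plain integer arithmetic on a stack-pointer
value. This is only an illustration of the masks and shift in the quoted
code, not kernel code; the function names are hypothetical, and esp is
passed in as an ordinary 32-bit integer rather than read from the
register.

```c
#include <stdint.h>

/* Base of the 16 KB-aligned stack area: clear the low 14 bits of esp. */
uint32_t stack_base(uint32_t esp)
{
	return esp & ~UINT32_C(0x3fff);		/* same mask as movl $-16384 */
}

/*
 * Coloured "current": take the 4 address bits starting at bit 14 and
 * shift them down by 7, which gives an offset of 0..15 times 128 bytes,
 * then add that offset to the stack base.
 */
uint32_t current_coloured(uint32_t esp)
{
	uint32_t colour = (esp & UINT32_C(0x0003c000)) >> 7;

	return stack_base(esp) + colour;
}
```

With esp = 0xc0123abc, the base is 0xc0120000, the colour offset is
0x400, and the result is 0xc0120400. The non-coloured variant measured
above corresponds to the masking step alone.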

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2002-11-10 21:17 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-11-06  7:18 Using %cr2 to reference "current" H. Peter Anvin
2001-11-06  8:01 ` Robert Love
2001-11-06 10:55   ` Alan Cox
2001-11-06 17:31     ` Michael Barabanov
2001-11-06 14:14   ` Manfred Spraul
2001-11-06 10:58 ` Alan Cox
2001-11-06 17:04   ` Linus Torvalds
2001-11-06 17:46     ` Alan Cox
2001-11-06 17:59       ` Linus Torvalds
2001-11-06 18:14         ` Alan Cox
2001-11-06 16:55           ` Marcelo Tosatti
2001-11-06 18:14           ` Linus Torvalds
2001-11-06 18:31             ` Alan Cox
2001-11-06 22:38               ` Linus Torvalds
2001-11-07  0:00           ` Martin Dalecki
2001-11-06 23:19             ` Alan Cox
2001-11-07  0:43               ` Martin Dalecki
2001-11-07  0:27                 ` Alan Cox
2001-11-07  0:35                 ` Jeff Garzik
2001-11-07 14:00               ` Martin Dalecki
2001-11-07 13:38                 ` Alan Cox
2001-11-07 14:59                   ` Martin Dalecki
2001-11-07 14:17                     ` Alan Cox
2001-11-07 14:34                       ` Dirk Moerenhout
2001-11-07 14:54                         ` Alan Cox
2001-11-07 15:32                           ` David Howells
2001-11-07 14:39                       ` Intel compiler [Re: Using %cr2 to reference "current"] Sebastian Heidl
2001-11-07 22:05                         ` lists
2001-11-07 15:36                       ` Using %cr2 to reference "current" Martin Dalecki
2001-11-08 14:08                       ` Martin Dalecki
2001-11-13 16:49                       ` Merge BUG in 2.4.15-pre4 serial.c Martin Dalecki
2001-11-13 16:21                         ` Russell King
2001-11-13 17:37                           ` Martin Dalecki
2001-11-13 16:53                             ` Russell King
2001-11-13 18:05                               ` Martin Dalecki
2001-11-13 17:11                             ` Alan Cox
2001-11-13 18:23                               ` Martin Dalecki
2001-11-07 20:04                   ` Using %cr2 to reference "current" Andrew Morton
2001-11-11 13:16                   ` Martin Dalecki
2001-11-11 13:06                     ` Keith Owens
2001-11-12 11:28                     ` PATCH 2.4.14 mregparm=3 compilation fixes Martin Dalecki
2001-11-12 16:10                       ` Keith Owens
2001-11-12 16:25                         ` Christoph Hellwig
2001-11-12 17:56                         ` Martin Dalecki
2001-11-12 16:42                       ` Linus Torvalds
2001-11-12 18:51                         ` Martin Dalecki
2001-11-12 20:05                           ` Corsspatch patch-2.4.15-pre2 patch-2.4.15-pre3 Martin Dalecki
2001-11-12 20:13                             ` BUG BUG hunt the bugs!!! patch-2.4.15-pre5 Martin Dalecki
2001-11-06 17:02 ` Using %cr2 to reference "current" Linus Torvalds
2001-11-06 17:13   ` Benjamin LaHaise
2001-11-06 17:49     ` Linus Torvalds
2001-11-06 18:19       ` Alan Cox
2001-11-09 21:52         ` Jamie Lokier
2001-11-06 18:42       ` Benjamin LaHaise
2001-11-06 19:09         ` H. Peter Anvin
2001-11-06 19:16         ` Dave Jones
2001-11-06 20:10           ` Ricky Beam
2001-11-06 23:09           ` Alan Cox
2001-11-06 23:15             ` Dave Jones
2001-11-06 22:05 Mikael Pettersson
2002-11-10 21:23 Igor Levicki
