linux-kernel.vger.kernel.org archive mirror
* Info: NAPI performance at "low" loads
@ 2002-09-17 19:53 Manfred Spraul
  2002-09-17 20:59 ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Manfred Spraul @ 2002-09-17 19:53 UTC (permalink / raw)
  To: linux-netdev, linux-kernel

NAPI network drivers mask the rx interrupts in their interrupt handler,
and reenable them in dev->poll(). In the worst case, that happens for
every packet. I've tried to measure the overhead of that operation.
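
For reference, the pattern being measured looks roughly like this (a
sketch of the 2.5-era dev->poll() interface only; the mynic_* names are
made up, see the real drivers for the details):

/* interrupt handler: mask the rx interrupt and hand the rx work
 * over to dev->poll() */
static void mynic_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
	struct net_device *dev = dev_id;

	if (netif_rx_schedule_prep(dev)) {
		mynic_disable_rx_irq(dev);	/* one IO to the intr mask register */
		__netif_rx_schedule(dev);
	}
}

/* dev->poll(): process the rx ring, then reenable the rx interrupt */
static int mynic_poll(struct net_device *dev, int *budget)
{
	int work = mynic_rx(dev, min(*budget, dev->quota));

	*budget -= work;
	dev->quota -= work;

	if (!mynic_rx_pending(dev)) {
		netif_rx_complete(dev);
		mynic_enable_rx_irq(dev);	/* second IO, once per poll */
		return 0;
	}
	return 1;	/* more packets pending, stay on the poll list */
}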

The CPU time needed to receive 50k packets/sec:

without NAPI:	53.7 %
with NAPI:	59.9 %

50k packets/sec is the limit for NAPI; at higher packet rates the forced
mitigation kicks in and every interrupt receives more than one packet.

The CPU time was measured by busy-looping in user space; the numbers
should be accurate to within 1 %.
Summary: with my setup, the relative overhead is around 11 % (about 6
percentage points of CPU).

Could someone try to reproduce my results?

Sender:
  # sendpkt <target ip> 1 <10..50, to get a good packet rate>

Receiver:
  $ loadtest

Please disable any interrupt mitigation features of your NIC; otherwise
the mitigation will dramatically change the needed CPU time.
The sender sends ICMP echo reply packets, evenly spaced by
"memset(,,n*512)" between the syscalls.
The CPU load was measured with a user-space app that calls
"memset(,,16384)" in a tight loop and reports the number of loops per
second.
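
The measurement loop is essentially this (a sketch of the idea only,
not the actual loadtest source):

#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
	static char buf[16384];
	unsigned long loops;
	time_t start;

	for (;;) {
		start = time(NULL);
		loops = 0;
		/* burn cpu for ~5 seconds; packet processing steals cycles
		 * from this loop, so fewer loops/sec == more cpu spent in
		 * the network receive path */
		while (time(NULL) < start + 5) {
			memset(buf, 0, sizeof(buf));
			loops++;
		}
		printf("%lu loops in 5 seconds\n", loops);
	}
}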

I've used a patched tulip driver; the current NAPI driver contains a
loop that severely slows down the NIC under such loads.

The patch and my test apps are at

http://www.q-ag.de/~manfred/loadtest

hardware setup:
	Duron 700, VIA KT 133
		no IO APIC, i.e. slow 8259 XT PIC.
	Accton tulip clone, ADMtek comet.
	crossover cable
	Sender: Celeron 1.13 GHz, rtl8139

--
	Manfred



* Re: Info: NAPI performance at "low" loads
  2002-09-17 19:53 Info: NAPI performance at "low" loads Manfred Spraul
@ 2002-09-17 20:59 ` David S. Miller
  2002-09-17 21:32   ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2002-09-17 20:59 UTC (permalink / raw)
  To: manfred; +Cc: linux-netdev, linux-kernel

   From: Manfred Spraul <manfred@colorfullife.com>
   Date: Tue, 17 Sep 2002 21:53:03 +0200

   Receiver:
     $ loadtest

This appears to be x86 only, sorry I can't test this out for you as
all my boxes are sparc64.

I was actually eager to try your tests out here.

Do you really need to use x86 instructions to do what you
are doing?  There are portable pthread mutexes available.


* Re: Info: NAPI performance at "low" loads
  2002-09-17 21:32   ` Andrew Morton
@ 2002-09-17 21:26     ` David S. Miller
  2002-09-17 21:45       ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2002-09-17 21:26 UTC (permalink / raw)
  To: akpm; +Cc: manfred, linux-netdev, linux-kernel

   From: Andrew Morton <akpm@digeo.com>
   Date: Tue, 17 Sep 2002 14:32:09 -0700

   There is a similar background loadtester at
   http://www.zip.com.au/~akpm/linux/#zc .
   
   It's fairly fancy - I wrote it for measuring networking
   efficiency.  It doesn't seem to have any PCisms....
   
Thanks I'll check it out, but meanwhile I hacked up sparc
specific assembler for manfred's code :-)

   (I measured similar regression using an ancient NAPIfied
   3c59x a long time ago).
   
Well, it is due to the same problems manfred saw initially,
namely just a crappy or buggy NAPI driver implementation. :-)


* Re: Info: NAPI performance at "low" loads
  2002-09-17 20:59 ` David S. Miller
@ 2002-09-17 21:32   ` Andrew Morton
  2002-09-17 21:26     ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2002-09-17 21:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: manfred, linux-netdev, linux-kernel

"David S. Miller" wrote:
> 
>    From: Manfred Spraul <manfred@colorfullife.com>
>    Date: Tue, 17 Sep 2002 21:53:03 +0200
> 
>    Receiver:
>      $ loadtest
> 
> This appears to be x86 only, sorry I can't test this out for you as
> all my boxes are sparc64.
> 
> I was actually eager to try your tests out here.
> 
> Do you really need to use x86 instructions to do what you
> are doing?  There are portable pthread mutexes available.

There is a similar background loadtester at
http://www.zip.com.au/~akpm/linux/#zc .

It's fairly fancy - I wrote it for measuring networking
efficiency.  It doesn't seem to have any PCisms....

(I measured similar regression using an ancient NAPIfied
3c59x a long time ago).


* Re: Info: NAPI performance at "low" loads
  2002-09-17 21:45       ` Andrew Morton
@ 2002-09-17 21:39         ` David S. Miller
  2002-09-17 21:54           ` Jeff Garzik
  2002-09-17 21:58           ` Andrew Morton
  0 siblings, 2 replies; 25+ messages in thread
From: David S. Miller @ 2002-09-17 21:39 UTC (permalink / raw)
  To: akpm; +Cc: manfred, netdev, linux-kernel

   From: Andrew Morton <akpm@digeo.com>
   Date: Tue, 17 Sep 2002 14:45:08 -0700

   "David S. Miller" wrote:
   > Well, it is due to the same problems manfred saw initially,
   > namely just a crappy or buggy NAPI driver implementation. :-)
   
   It was due to additional inl()'s and outl()'s in the driver fastpath.
   
How many?  Did the implementation cache the register value in a
software state word or did it read the register each time to write
the IRQ masking bits back?

It is issues like this that make me say "crappy or buggy NAPI
implementation"

Any driver should be able to get the NAPI overhead to max out at
2 PIOs per packet.

And if the performance is really concerning, perhaps add an option to
use MEM space in the 3c59x driver too; IO instructions are a constant
cost regardless of how fast the PCI bus being used is :-)


* Re: Info: NAPI performance at "low" loads
  2002-09-17 21:26     ` David S. Miller
@ 2002-09-17 21:45       ` Andrew Morton
  2002-09-17 21:39         ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2002-09-17 21:45 UTC (permalink / raw)
  To: David S. Miller; +Cc: manfred, netdev, linux-kernel

"David S. Miller" wrote:
> 
>    From: Andrew Morton <akpm@digeo.com>
>    Date: Tue, 17 Sep 2002 14:32:09 -0700
> 
>    There is a similar background loadtester at
>    http://www.zip.com.au/~akpm/linux/#zc .
> 
>    It's fairly fancy - I wrote it for measuring networking
>    efficiency.  It doesn't seem to have any PCisms....
> 
> Thanks I'll check it out, but meanwhile I hacked up sparc
> specific assembler for manfred's code :-)
> 
>    (I measured similar regression using an ancient NAPIfied
>    3c59x a long time ago).
> 
> Well, it is due to the same problems manfred saw initially,
> namely just a crappy or buggy NAPI driver implementation. :-)

It was due to additional inl()'s and outl()'s in the driver fastpath.

Testcase was netperf Tx and Rx.  Just TCP over 100bT. AFAIK, this overhead
is intrinsic to NAPI.  Not to say that its costs outweigh its benefits,
but it's just there.

If someone wants to point me at all the bits and pieces to get a
NAPIfied 3c59x working on 2.5.current I'll retest, and generate
some instruction-level oprofiles.


* Re: Info: NAPI performance at "low" loads
  2002-09-17 21:54           ` Jeff Garzik
@ 2002-09-17 21:49             ` David S. Miller
  2002-09-18  2:11               ` Jeff Garzik
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2002-09-17 21:49 UTC (permalink / raw)
  To: jgarzik; +Cc: akpm, manfred, netdev, linux-kernel

   From: Jeff Garzik <jgarzik@mandrakesoft.com>
   Date: Tue, 17 Sep 2002 17:54:42 -0400

   David S. Miller wrote:
   > Any driver should be able to get the NAPI overhead to max out at
   > 2 PIOs per packet.
   
   Just to pick nits... my example went from 2 or 3 IOs [depending on the 
   presence/absence of a work loop] to 6 IOs.
   
I mean "2 extra PIOs" not "2 total PIOs".

I think it's doable for just about every driver; even tg3 with its
weird semaphore scheme takes 2 extra PIOs worst case with NAPI.

The semaphore I have to ACK at hw IRQ time anyway, and since
I keep a software copy of the IRQ masking register, mask and unmask
are each one PIO.
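
I.e. roughly this (illustrative only, not the actual tg3 code; the
names are made up):

/* keep a software shadow of the interrupt mask register so that
 * masking/unmasking never needs a read-modify-write of the chip */
static inline void mynic_disable_rx_irq(struct net_device *dev)
{
	struct mynic_priv *np = dev->priv;

	np->intr_mask &= ~RX_INTR;			/* update the shadow copy */
	writel(np->intr_mask, np->regs + INTR_MASK);	/* one IO, no read back */
}

static inline void mynic_enable_rx_irq(struct net_device *dev)
{
	struct mynic_priv *np = dev->priv;

	np->intr_mask |= RX_INTR;
	writel(np->intr_mask, np->regs + INTR_MASK);	/* one IO, no read back */
}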


* Re: Info: NAPI performance at "low" loads
  2002-09-17 21:39         ` David S. Miller
@ 2002-09-17 21:54           ` Jeff Garzik
  2002-09-17 21:49             ` David S. Miller
  2002-09-17 21:58           ` Andrew Morton
  1 sibling, 1 reply; 25+ messages in thread
From: Jeff Garzik @ 2002-09-17 21:54 UTC (permalink / raw)
  To: David S. Miller; +Cc: akpm, manfred, netdev, linux-kernel

David S. Miller wrote:
> Any driver should be able to get the NAPI overhead to max out at
> 2 PIOs per packet.


Just to pick nits... my example went from 2 or 3 IOs [depending on the 
presence/absence of a work loop] to 6 IOs.

Feel free to re-read my message and point out where an IO can be 
eliminated...

	Jeff





* Re: Info: NAPI performance at "low" loads
  2002-09-17 21:39         ` David S. Miller
  2002-09-17 21:54           ` Jeff Garzik
@ 2002-09-17 21:58           ` Andrew Morton
  2002-09-18  0:57             ` jamal
  1 sibling, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2002-09-17 21:58 UTC (permalink / raw)
  To: David S. Miller; +Cc: manfred, netdev, linux-kernel

"David S. Miller" wrote:
> 
>    From: Andrew Morton <akpm@digeo.com>
>    Date: Tue, 17 Sep 2002 14:45:08 -0700
> 
>    "David S. Miller" wrote:
>    > Well, it is due to the same problems manfred saw initially,
>    > namely just a crappy or buggy NAPI driver implementation. :-)
> 
>    It was due to additional inl()'s and outl()'s in the driver fastpath.
> 
> How many?  Did the implementation cache the register value in a
> software state word or did it read the register each time to write
> the IRQ masking bits back?
> 

Looks like it cached it:

-    outw(SetIntrEnb | (inw(ioaddr + 10) & ~StatsFull), ioaddr + EL3_CMD);
     vp->intr_enable &= ~StatsFull;
+    outw(vp->intr_enable, ioaddr + EL3_CMD);

> It is issues like this that make me say "crappy or buggy NAPI
> implementation"
> 
> Any driver should be able to get the NAPI overhead to max out at
> 2 PIOs per packet.
> 
> And if the performance is really concerning, perhaps add an option to
> use MEM space in the 3c59x driver too; IO instructions are a constant
> cost regardless of how fast the PCI bus being used is :-)

Yup.  But deltas are interesting.


* Re: Info: NAPI performance at "low" loads
  2002-09-17 21:58           ` Andrew Morton
@ 2002-09-18  0:57             ` jamal
  2002-09-18  1:00               ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: jamal @ 2002-09-18  0:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David S. Miller, manfred, netdev, linux-kernel



Manfred, could you please turn on MMIO (you can select it
via the kernel config) and see what the new difference looks like?

I am not so sure with that 6% difference there is no other bug lurking
there; 6% seems too large for an extra two PCI transactions per packet.
If someone could test a different NIC this would be great.
Actually what would be even better is to go something like 20 kpps,
50 kpps, 80 kpps, 100 kpps and 140 kpps and see what we get.

cheers,
jamal



* Re: Info: NAPI performance at "low" loads
  2002-09-18  0:57             ` jamal
@ 2002-09-18  1:00               ` David S. Miller
  2002-09-18  2:16                 ` Andrew Morton
  2002-09-18 17:27                 ` Eric W. Biederman
  0 siblings, 2 replies; 25+ messages in thread
From: David S. Miller @ 2002-09-18  1:00 UTC (permalink / raw)
  To: hadi; +Cc: akpm, manfred, netdev, linux-kernel

   From: jamal <hadi@cyberus.ca>
   Date: Tue, 17 Sep 2002 20:57:58 -0400 (EDT)
   
   I am not so sure with that 6% difference there is no other bug lurking
   there; 6% seems too large for an extra two PCI transactions per packet.

{in,out}{b,w,l}() operations have a fixed timing, therefore his
results don't sound that far off.

It is also one of the reasons I suspect Andrew saw such bad results
with 3c59x, but probably that is not the only reason.


* Re: Info: NAPI performance at "low" loads
  2002-09-18  2:11               ` Jeff Garzik
@ 2002-09-18  2:06                 ` David S. Miller
  2002-09-18  2:36                   ` Jeff Garzik
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2002-09-18  2:06 UTC (permalink / raw)
  To: jgarzik; +Cc: akpm, manfred, netdev, linux-kernel

   From: Jeff Garzik <jgarzik@mandrakesoft.com>
   Date: Tue, 17 Sep 2002 22:11:14 -0400
   
   You're looking at at least one extra get-irq-status too, at least in the 
   classical 10/100 drivers I'm used to seeing...
   
How so?  The number of get-irq-status reads done in the e1000 NAPI code
is the same (read the register until no interesting status bits remain
set, same as the pre-NAPI e1000 driver).

For tg3 it's a cheap memory read from the status block, not a PIO.


* Re: Info: NAPI performance at "low" loads
  2002-09-17 21:49             ` David S. Miller
@ 2002-09-18  2:11               ` Jeff Garzik
  2002-09-18  2:06                 ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Garzik @ 2002-09-18  2:11 UTC (permalink / raw)
  To: David S. Miller; +Cc: akpm, manfred, netdev, linux-kernel

David S. Miller wrote:
>    From: Jeff Garzik <jgarzik@mandrakesoft.com>
>    Date: Tue, 17 Sep 2002 17:54:42 -0400
> 
>    David S. Miller wrote:
>    > Any driver should be able to get the NAPI overhead to max out at
>    > 2 PIOs per packet.
>    
>    Just to pick nits... my example went from 2 or 3 IOs [depending on the 
>    presence/absence of a work loop] to 6 IOs.
>    
> I mean "2 extra PIOs" not "2 total PIOs".
> 
> I think it's doable for just about every driver; even tg3 with its
> weird semaphore scheme takes 2 extra PIOs worst case with NAPI.
> 
> The semaphore I have to ACK at hw IRQ time anyway, and since
> I keep a software copy of the IRQ masking register, mask and unmask
> are each one PIO.


You're looking at at least one extra get-irq-status too, at least in the 
classical 10/100 drivers I'm used to seeing...

	Jeff





* Re: Info: NAPI performance at "low" loads
  2002-09-18  1:00               ` David S. Miller
@ 2002-09-18  2:16                 ` Andrew Morton
  2002-09-18 17:27                 ` Eric W. Biederman
  1 sibling, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2002-09-18  2:16 UTC (permalink / raw)
  To: David S. Miller; +Cc: hadi, manfred, netdev, linux-kernel

"David S. Miller" wrote:
> 
>    From: jamal <hadi@cyberus.ca>
>    Date: Tue, 17 Sep 2002 20:57:58 -0400 (EDT)
> 
>    I am not so sure with that 6% difference there is no other bug lurking
>    there; 6% seems too large for an extra two PCI transactions per packet.
> 
> {in,out}{b,w,l}() operations have a fixed timing, therefore his
> results don't sound that far off.
> 
> It is also one of the reasons I suspect Andrew saw such bad results
> with 3c59x, but probably that is not the only reason.

They weren't "very bad", iirc.  Maybe a 5% increase in CPU load.

It was all a long time ago.  Will retest if someone sends URLs.


* Re: Info: NAPI performance at "low" loads
  2002-09-18  2:06                 ` David S. Miller
@ 2002-09-18  2:36                   ` Jeff Garzik
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff Garzik @ 2002-09-18  2:36 UTC (permalink / raw)
  To: David S. Miller; +Cc: akpm, manfred, netdev, linux-kernel

David S. Miller wrote:
>    From: Jeff Garzik <jgarzik@mandrakesoft.com>
>    Date: Tue, 17 Sep 2002 22:11:14 -0400
>    
>    You're looking at at least one extra get-irq-status too, at least in the 
>    classical 10/100 drivers I'm used to seeing...
>    
> How so?  The number of get-irq-status reads done in the e1000 NAPI code
> is the same (read the register until no interesting status bits remain
> set, same as the pre-NAPI e1000 driver).
> 
> For tg3 it's a cheap memory read from the status block, not a PIO.


Non-NAPI:

	get-irq-stat
	ack-irq
	get-irq-stat (omit, if no work loop)

NAPI:

	get-irq-stat
	ack-all-but-rx-irq
	mask-rx-irqs
	get-irq-stat (omit, if work loop)
	...
	ack-rx-irqs
	get-irq-stat
	unmask-rx-irqs

This is the low load / low latency case only.  The number of IOs 
decreases at higher loads [obviously :)]
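
In driver terms that maps onto something like this (a sketch with
invented register names; the real drivers differ in the details):

static void mynic_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
	struct net_device *dev = dev_id;
	long ioaddr = dev->base_addr;
	u32 status = inl(ioaddr + INTR_STATUS);		/* get-irq-stat */

	outl(status & ~RX_BITS, ioaddr + INTR_ACK);	/* ack-all-but-rx-irq */
	if ((status & RX_BITS) && netif_rx_schedule_prep(dev)) {
		outl(inl(ioaddr + INTR_MASK) & ~RX_BITS,
		     ioaddr + INTR_MASK);		/* mask-rx-irqs */
		__netif_rx_schedule(dev);
	}
	/* dev->poll() later does ack-rx-irqs, get-irq-stat and
	 * unmask-rx-irqs before returning */
}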



* Re: Info: NAPI performance at "low" loads
  2002-09-18  1:00               ` David S. Miller
  2002-09-18  2:16                 ` Andrew Morton
@ 2002-09-18 17:27                 ` Eric W. Biederman
  2002-09-18 17:50                   ` Alan Cox
  2002-09-18 20:23                   ` David S. Miller
  1 sibling, 2 replies; 25+ messages in thread
From: Eric W. Biederman @ 2002-09-18 17:27 UTC (permalink / raw)
  To: David S. Miller; +Cc: hadi, akpm, manfred, netdev, linux-kernel

"David S. Miller" <davem@redhat.com> writes:

>    From: jamal <hadi@cyberus.ca>
>    Date: Tue, 17 Sep 2002 20:57:58 -0400 (EDT)
>    
>    I am not so sure with that 6% difference there is no other bug lurking
>    there; 6% seems too large for an extra two PCI transactions per packet.
> 
> {in,out}{b,w,l}() operations have a fixed timing, therefore his
> results don't sound that far off.
????

I don't see why they should be.  If it is a PCI device the cost should
be the same as a PCI memory I/O.  The bus packets are the same.  So things
like increasing the PCI bus speed should make it take less time.

Plus I have played with calibrating the TSC with outb to port
0x80 and there was enough variation that it was unusable.  On some
newer systems it would take twice as long as on some older ones.
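
(Something along these lines -- a sketch, not my exact code; it needs
ioperm() and root, and is x86 only:)

/* build with gcc -O2 -- glibc's sys/io.h inlines want optimisation */
#include <stdio.h>
#include <sys/io.h>

static inline unsigned long long rdtsc(void)
{
	unsigned int lo, hi;
	asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
	return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
	unsigned long long t0, t1;
	int i;

	if (ioperm(0x80, 1, 1)) {
		perror("ioperm");
		return 1;
	}
	t0 = rdtsc();
	for (i = 0; i < 1000; i++)
		outb(0, 0x80);			/* the port 0x80 write */
	t1 = rdtsc();
	printf("~%llu cycles per outb\n", (t1 - t0) / 1000);
	return 0;
}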

Eric


* Re: Info: NAPI performance at "low" loads
  2002-09-18 17:27                 ` Eric W. Biederman
@ 2002-09-18 17:50                   ` Alan Cox
  2002-09-19 14:58                     ` Eric W. Biederman
  2002-09-18 20:23                   ` David S. Miller
  1 sibling, 1 reply; 25+ messages in thread
From: Alan Cox @ 2002-09-18 17:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, hadi, akpm, manfred, netdev, linux-kernel

On Wed, 2002-09-18 at 18:27, Eric W. Biederman wrote:
> Plus I have played with calibrating the TSC with outb to port
> 0x80 and there was enough variation that it was unusable.  On some
> newer systems it would take twice as long as on some older ones.

port 0x80 isn't going to PCI space.

x86 generally posts MMIO writes but not IO writes.  That's quite measurable.




* Re: Info: NAPI performance at "low" loads
  2002-09-18 17:27                 ` Eric W. Biederman
  2002-09-18 17:50                   ` Alan Cox
@ 2002-09-18 20:23                   ` David S. Miller
  2002-09-18 20:43                     ` Alan Cox
  1 sibling, 1 reply; 25+ messages in thread
From: David S. Miller @ 2002-09-18 20:23 UTC (permalink / raw)
  To: ebiederm; +Cc: hadi, akpm, manfred, netdev, linux-kernel

   From: ebiederm@xmission.com (Eric W. Biederman)
   Date: 18 Sep 2002 11:27:34 -0600

   "David S. Miller" <davem@redhat.com> writes:
   
   > {in,out}{b,w,l}() operations have a fixed timing, therefore his
   > results don't sound that far off.
   ????
   
   I don't see why they should be.  If it is a pci device the cost should
   the same as a pci memory I/O.  The bus packets are the same.  So things like
   increasing the pci bus speed should make it take less time.

The x86 processor has a well-defined timing for executing inb
etc. instructions; the timing is fixed and is independent of the
speed of the PCI bus the device is on.


* Re: Info: NAPI performance at "low" loads
  2002-09-18 20:23                   ` David S. Miller
@ 2002-09-18 20:43                     ` Alan Cox
  2002-09-18 20:46                       ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Cox @ 2002-09-18 20:43 UTC (permalink / raw)
  To: David S. Miller; +Cc: ebiederm, hadi, akpm, manfred, netdev, linux-kernel

On Wed, 2002-09-18 at 21:23, David S. Miller wrote:
> The x86 processor has a well-defined timing for executing inb
> etc. instructions; the timing is fixed and is independent of the
> speed of the PCI bus the device is on.

Earth calling Dave Miller

The inb timing depends on the PCI bus. If you want proof, set a Matrox
G400 into no-PCI-retry mode, run a large X load at it and time some inbs;
you should be able to get to about 100 milliseconds for an inb to
execute.



* Re: Info: NAPI performance at "low" loads
  2002-09-18 20:43                     ` Alan Cox
@ 2002-09-18 20:46                       ` David S. Miller
  2002-09-18 21:15                         ` Alan Cox
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2002-09-18 20:46 UTC (permalink / raw)
  To: alan; +Cc: ebiederm, hadi, akpm, manfred, netdev, linux-kernel

   From: Alan Cox <alan@lxorguk.ukuu.org.uk>
   Date: 18 Sep 2002 21:43:09 +0100
   
   The inb timing depends on the PCI bus. If you want proof, set a Matrox
   G400 into no-PCI-retry mode, run a large X load at it and time some inbs;
   you should be able to get to about 100 milliseconds for an inb to
   execute.
   
Matrox isn't using inb/outb instructions to IO space; it is being
accessed by X using MEM space, which is done using normal load and
store instructions on x86 after the card is mmap()'d into user space.


* Re: Info: NAPI performance at "low" loads
  2002-09-18 20:46                       ` David S. Miller
@ 2002-09-18 21:15                         ` Alan Cox
  2002-09-18 21:22                           ` David S. Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Cox @ 2002-09-18 21:15 UTC (permalink / raw)
  To: David S. Miller; +Cc: ebiederm, hadi, akpm, manfred, netdev, linux-kernel

On Wed, 2002-09-18 at 21:46, David S. Miller wrote:
>    From: Alan Cox <alan@lxorguk.ukuu.org.uk>
>    Date: 18 Sep 2002 21:43:09 +0100
>    
>    The inb timing depends on the PCI bus. If you want proof, set a Matrox
>    G400 into no-PCI-retry mode, run a large X load at it and time some inbs;
>    you should be able to get to about 100 milliseconds for an inb to
>    execute.
>    
> Matrox isn't using inb/outb instructions to IO space; it is being
> accessed by X using MEM space, which is done using normal load and
> store instructions on x86 after the card is mmap()'d into user space.

It doesn't matter what XFree86 is doing. That's just to load the PCI bus
and jam it up to prove the point. It'll change your inb timing.



* Re: Info: NAPI performance at "low" loads
  2002-09-18 21:15                         ` Alan Cox
@ 2002-09-18 21:22                           ` David S. Miller
  2002-09-19 15:03                             ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: David S. Miller @ 2002-09-18 21:22 UTC (permalink / raw)
  To: alan; +Cc: ebiederm, hadi, akpm, manfred, netdev, linux-kernel

   From: Alan Cox <alan@lxorguk.ukuu.org.uk>
   Date: 18 Sep 2002 22:15:27 +0100
   
   It doesn't matter what XFree86 is doing. That's just to load the PCI bus
   and jam it up to prove the point. It'll change your inb timing.
   
Understood.  Maybe a more accurate wording would be "a fixed minimum
timing".


* Re: Info: NAPI performance at "low" loads
  2002-09-18 17:50                   ` Alan Cox
@ 2002-09-19 14:58                     ` Eric W. Biederman
  0 siblings, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2002-09-19 14:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: David S. Miller, hadi, akpm, manfred, netdev, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> On Wed, 2002-09-18 at 18:27, Eric W. Biederman wrote:
> > Plus I have played with calibrating the TSC with outb to port
> > 0x80 and there was enough variation that it was unusable.  On some
> > newer systems it would take twice as long as on some older ones.
> 
> port 0x80 isn't going to PCI space.

Agreed.  It isn't going anywhere, and it takes it a while to recognize
that.
 
> x86 generally posts MMIO writes but not IO writes.  That's quite measurable.

The timing difference between posted and non-posted writes
I can see.

Eric



* Re: Info: NAPI performance at "low" loads
  2002-09-18 21:22                           ` David S. Miller
@ 2002-09-19 15:03                             ` Eric W. Biederman
  2002-09-19 15:53                               ` Alan Cox
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2002-09-19 15:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: alan, ebiederm, hadi, akpm, manfred, netdev, linux-kernel

"David S. Miller" <davem@redhat.com> writes:

>    From: Alan Cox <alan@lxorguk.ukuu.org.uk>
>    Date: 18 Sep 2002 22:15:27 +0100
>    
>    It doesn't matter what XFree86 is doing. That's just to load the PCI bus
>    and jam it up to prove the point. It'll change your inb timing.
>    
> Understood.  Maybe a more accurate wording would be "a fixed minimum
> timing".

Why?

If I do an inb to a PCI-X device running at 133 MHz it should come back
much faster than an inb from my serial port on the ISA bus.  What
is the reason for the fixed minimum timing?

Alan asserted there is a posting behavior difference, but that should
not affect reads.

What is different between mmio and pio to a pci device when doing reads
that should make mmio faster?

Eric



* Re: Info: NAPI performance at "low" loads
  2002-09-19 15:03                             ` Eric W. Biederman
@ 2002-09-19 15:53                               ` Alan Cox
  0 siblings, 0 replies; 25+ messages in thread
From: Alan Cox @ 2002-09-19 15:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, hadi, akpm, manfred, netdev, linux-kernel

On Thu, 2002-09-19 at 16:03, Eric W. Biederman wrote:
> If I do an inb to a PCI-X device running at 133 MHz it should come back
> much faster than an inb from my serial port on the ISA bus.  What
> is the reason for the fixed minimum timing?

As far as I can tell the minimum time for the inb/outb is simply the
time it takes the bus to respond. The only difference there is that for
writel rather than outl you won't wait for the write to complete on the
PCI bus; it's just dumped into the FIFO if it's empty.
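
Roughly this (a sketch in kernel context; the register offset and
struct are invented):

static void mynic_post_cmd(struct mynic_priv *np, u32 cmd)
{
	/* IO space: the CPU stalls until the PCI IO cycle completes */
	outl(cmd, np->ioaddr + REG_CMD);

	/* MEM space: the write may be posted into the bridge FIFO and the
	 * CPU carries on; read something back if you need the write to
	 * have reached the device before continuing */
	writel(cmd, np->mmio + REG_CMD);
	readl(np->mmio + REG_CMD);
}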


