linux-kernel.vger.kernel.org archive mirror
* Re: New net features for added performance
  2001-02-25  0:12     ` Andi Kleen
@ 2000-01-01  0:19       ` Pavel Machek
  2001-03-04  1:19         ` LILO error with 2.4.3-pre1 Steven J. Hill
  0 siblings, 1 reply; 37+ messages in thread
From: Pavel Machek @ 2000-01-01  0:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jeff Garzik, linux-kernel, netdev

Hi!

> > an alloc of a PKT_BUF_SZ'd skb immediately follows a free of a
> > same-sized skb.  100% of the time.
> 
> Free/Alloc gives the mm the chance to throttle it by failing, and also to 
> recover from fragmentation by packing the slabs. If you don't do it you need
> to add a hook somewhere that gets triggered on low memory situations and 
> frees the buffers.

And what? It makes the allocations longer-lived. Our MM should survive that
just fine.

-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* New net features for added performance
@ 2001-02-24 23:25 Jeff Garzik
  2001-02-24 23:48 ` Andi Kleen
                   ` (9 more replies)
  0 siblings, 10 replies; 37+ messages in thread
From: Jeff Garzik @ 2001-02-24 23:25 UTC (permalink / raw)
  To: netdev; +Cc: Linux Kernel Mailing List

Disclaimer:  This is 2.5, repeat, 2.5 material.



I've talked about the following items with a couple people on this list
in private.  I wanted to bring these up again, to see if anyone has
comments on the following suggested netdevice changes for the upcoming
2.5 development series of kernels.


1) Rx Skb recycling.  It would be nice to have skbs returned to the
driver after the net core is done with them, rather than have netif_rx
free the skb.  Many drivers pre-allocate a number of maximum-sized skbs
into which the net card DMA's data.  If netif_rx returned the SKB
instead of freeing it, the driver could simply flip the DescriptorOwned
bit for that buffer, giving it immediately back to the net card.
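
As a rough sketch of the driver side, assuming a hypothetical
netif_rx_recycle() that hands the skb back once the core is done with it
(the descriptor field names below are made up too, purely for illustration):

	skb = netif_rx_recycle(skb);
	if (skb != NULL) {
		skb->data = skb->tail = skb->head;	/* reset the data area */
		skb->len = 0;
		desc->buffer = cpu_to_le32(virt_to_bus(skb->data));
		desc->status = cpu_to_le32(DescOwned);	/* flip the DescriptorOwned bit */
	}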

Advantages:  A de-allocation immediately followed by a reallocation is
eliminated, less L1 cache pollution during interrupt handling. 
Potentially less DMA traffic between card and host.

Disadvantages?



2) Tx packet grouping.  If the net core has knowledge that more packets
will be following the current one being sent to dev->hard_start_xmit(),
it should pass that knowledge on to dev->hard_start_xmit(), either as an
estimated number yet-to-be-sent, or just as a flag that "more is
coming."

Advantages: This lets the net driver make smarter decisions about Tx
interrupt mitigation, Tx buffer queueing, etc.

Disadvantages?  Can this sort of knowledge be obtained by a netdevice
right now, without any kernel modifications?



3) Slabbier packet allocation.  Even though skb allocation is decently
fast, you are still looking at an skb buffer head grab and a kmalloc,
for each [dev_]alloc_skb call.  I was wondering if it would be possible
to create a helper function for drivers which would improve the hot-path
considerably:

	static struct sk_buff *ether_alloc_skb (int size)
	{
		/* fast path: requests that fit the pre-allocated,
		 * maximum-sized skbs come straight off the list */
		if (size <= preallocated_skb_list->skb->size) {
			struct sk_buff *skb = dequeue_skb_from_list();
			if (preallocated_size < low_water_limit)
				tasklet_schedule(&refill_skb_list);
			return skb;
		}
		/* slow path: oversized requests fall back to the allocator */
		return dev_alloc_skb(size);
	}

The skbs from this list would be allocated by a tasklet in the
background to the maximum size requested by the ethernet driver.  If you
wanted to waste even more memory, you could allocate from per-CPU
lists..
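
The background refill could be as dumb as this (the list helpers and
watermarks are made-up names, just to show the shape of it):

	static void refill_skb_list_func (unsigned long data)
	{
		while (preallocated_size < high_water_limit) {
			struct sk_buff *skb = dev_alloc_skb(max_requested_size);
			if (skb == NULL)
				break;	/* low memory, retry on the next run */
			enqueue_skb_to_list(skb);
		}
	}
	static DECLARE_TASKLET(refill_skb_list, refill_skb_list_func, 0);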

Disadvantages?  Doing this might increase cache pollution due to
increased code and data size, but I think the hot path is much improved
(dequeue a properly sized, initialized, skb-reserved'd skb off a list)
and would help mitigate the impact of sudden bursts of traffic.



-- 
Jeff Garzik       | "You see, in this world there's two kinds of
Building 1024     |  people, my friend: Those with loaded guns
MandrakeSoft      |  and those who dig. You dig."  --Blondie

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-24 23:25 New net features for added performance Jeff Garzik
@ 2001-02-24 23:48 ` Andi Kleen
  2001-02-25  0:03   ` Jeff Garzik
  2001-02-25 11:49   ` Rusty Russell
  2001-02-25  1:55 ` Michael Richardson
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 37+ messages in thread
From: Andi Kleen @ 2001-02-24 23:48 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel

Jeff Garzik <jgarzik@mandrakesoft.com> writes:

> Advantages:  A de-allocation immediately followed by a reallocation is
> eliminated, less L1 cache pollution during interrupt handling. 
> Potentially less DMA traffic between card and host.
> 
> Disadvantages?

You need a new mechanism to cope with low memory situations because the 
drivers can tie up quite a bit of memory (in fact you gave up unified
memory management). 

> 3) Slabbier packet allocation.  Even though skb allocation is decently
> fast, you are still looking at an skb buffer head grab and a kmalloc,
> for each [dev_]alloc_skb call.  I was wondering if it would be possible
> to create a helper function for drivers which would improve the hot-path
> considerably:
[...]

If you need such a horror it just means there is something wrong with slab.
Better fix slab.


4) Better support for aligned RX by only copying the header, not the whole
packet, to end up with an aligned IP header. Unless the driver knows about
all protocol lengths this means the stack needs to support "parse header
in this buffer, then switch to other buffer with computed offset for data" 

-Andi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-24 23:48 ` Andi Kleen
@ 2001-02-25  0:03   ` Jeff Garzik
  2001-02-25  0:12     ` Andi Kleen
  2001-02-25  0:13     ` New net features for added performance Jeff Garzik
  2001-02-25 11:49   ` Rusty Russell
  1 sibling, 2 replies; 37+ messages in thread
From: Jeff Garzik @ 2001-02-25  0:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

Andi Kleen wrote:
> 
> Jeff Garzik <jgarzik@mandrakesoft.com> writes:
> 
> > Advantages:  A de-allocation immediately followed by a reallocation is
> > eliminated, less L1 cache pollution during interrupt handling.
> > Potentially less DMA traffic between card and host.
> >
> > Disadvantages?
> 
> You need a new mechanism to cope with low memory situations because the
> drivers can tie up quite a bit of memory (in fact you gave up unified
> memory management).

I think you misunderstand..  netif_rx frees the skb.  In this example:

	netif_rx(skb); /* free skb of size PKT_BUF_SZ */
	skb = dev_alloc_skb(PKT_BUF_SZ)

an alloc of a PKT_BUF_SZ'd skb immediately follows a free of a
same-sized skb.  100% of the time.

It seems an obvious shortcut to me, to have __netif_rx or similar
-clear- the skb head not free it.  No changes to memory management or
additional low memory situations created by this, AFAICS.


> 4) Better support for aligned RX by only copying the header, not the whole
> packet, to end up with an aligned IP header. Unless the driver knows about
> all protocol lengths this means the stack needs to support "parse header
> in this buffer, then switch to other buffer with computed offset for data"

This requires scatter-gather hardware support, right?  If so, would this
support only exist for checksumming hardware -- like the current
zerocopy -- or would non-checksumming SG hardware like tulip be
supported too?

	Jeff


-- 
Jeff Garzik       | "You see, in this world there's two kinds of
Building 1024     |  people, my friend: Those with loaded guns
MandrakeSoft      |  and those who dig. You dig."  --Blondie

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-25  0:03   ` Jeff Garzik
@ 2001-02-25  0:12     ` Andi Kleen
  2000-01-01  0:19       ` Pavel Machek
  2001-02-25  0:13     ` New net features for added performance Jeff Garzik
  1 sibling, 1 reply; 37+ messages in thread
From: Andi Kleen @ 2001-02-25  0:12 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andi Kleen, linux-kernel, netdev

On Sat, Feb 24, 2001 at 07:03:38PM -0500, Jeff Garzik wrote:
> Andi Kleen wrote:
> > 
> > Jeff Garzik <jgarzik@mandrakesoft.com> writes:
> > 
> > > Advantages:  A de-allocation immediately followed by a reallocation is
> > > eliminated, less L1 cache pollution during interrupt handling.
> > > Potentially less DMA traffic between card and host.
> > >
> > > Disadvantages?
> > 
> > You need a new mechanism to cope with low memory situations because the
> > drivers can tie up quite a bit of memory (in fact you gave up unified
> > memory management).
> 
> I think you misunderstand..  netif_rx frees the skb.  In this example:
> 
> 	netif_rx(skb); /* free skb of size PKT_BUF_SZ */
> 	skb = dev_alloc_skb(PKT_BUF_SZ)
> 
> an alloc of a PKT_BUF_SZ'd skb immediately follows a free of a
> same-sized skb.  100% of the time.

Free/Alloc gives the mm the chance to throttle it by failing, and also to 
recover from fragmentation by packing the slabs. If you don't do it you need
to add a hook somewhere that gets triggered on low memory situations and 
frees the buffers.

> > 4) Better support for aligned RX by only copying the header, not the whole
> > packet, to end up with an aligned IP header. Unless the driver knows about
> > all protocol lengths this means the stack needs to support "parse header
> > in this buffer, then switch to other buffer with computed offset for data"
> 
> This requires scatter-gather hardware support, right?  If so, would this
> support only exist for checksumming hardware -- like the current
> zerocopy -- or would non-checksumming SG hardware like tulip be
> supported too?

It doesn't need any hardware support. In fact it is especially helpful for 
the tulip. The idea is that instead of copying the whole packet to get an
aligned header (e.g. on the alpha or other boxes where unaligned accesses
are very expensive) you just copy the first 128 bytes that probably contain
the header. For the data it doesn't matter much if it's unaligned; copy_to_user
and csum_copy_to_user can deal with that fine. 
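
Roughly like this (the 128-byte cutoff and the buffer names are only
illustrative):

	hlen = pkt_len < 128 ? pkt_len : 128;
	skb_reserve(skb, 2);			/* IP header lands on a 4-byte boundary */
	memcpy(skb_put(skb, hlen), rx_buf, hlen);
	/* everything past hlen stays in the original, unaligned rx buffer;
	 * the stack later copies/checksums it out with copy_to_user or
	 * csum_copy_to_user, which cope with misalignment fine */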


-Andi


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-25  0:03   ` Jeff Garzik
  2001-02-25  0:12     ` Andi Kleen
@ 2001-02-25  0:13     ` Jeff Garzik
  2001-02-25  0:16       ` Andi Kleen
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Garzik @ 2001-02-25  0:13 UTC (permalink / raw)
  To: Andi Kleen, linux-kernel

Jeff Garzik wrote:
> 
> Andi Kleen wrote:
> >
> > Jeff Garzik <jgarzik@mandrakesoft.com> writes:
> >
> > > Advantages:  A de-allocation immediately followed by a reallocation is
> > > eliminated, less L1 cache pollution during interrupt handling.
> > > Potentially less DMA traffic between card and host.
> > >
> > > Disadvantages?
> >
> > You need a new mechanism to cope with low memory situations because the
> > drivers can tie up quite a bit of memory (in fact you gave up unified
> > memory management).
> 
> I think you misunderstand..  netif_rx frees the skb.  In this example:
> 
>         netif_rx(skb); /* free skb of size PKT_BUF_SZ */
>         skb = dev_alloc_skb(PKT_BUF_SZ)
> 
> an alloc of a PKT_BUF_SZ'd skb immediately follows a free of a
> same-sized skb.  100% of the time.
> 
> It seems an obvious shortcut to me, to have __netif_rx or similar
> -clear- the skb head not free it.  No changes to memory management or
> additional low memory situations created by this, AFAICS.

Sorry... I should also point out that I was thinking of tulip
architecture and similar architectures, where you have a fixed number of
Skbs allocated at all times, and that number doesn't change for the
lifetime of the driver.

Clearly not all cases would benefit from skb recycling, but there are a
number of rx-ring-based systems where this would be useful, and (AFAICS)
reduce the work needed to be done by the system, and reduce the amount
of overall DMA traffic by a bit.

	Jeff



-- 
Jeff Garzik       | "You see, in this world there's two kinds of
Building 1024     |  people, my friend: Those with loaded guns
MandrakeSoft      |  and those who dig. You dig."  --Blondie

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-25  0:13     ` New net features for added performance Jeff Garzik
@ 2001-02-25  0:16       ` Andi Kleen
  0 siblings, 0 replies; 37+ messages in thread
From: Andi Kleen @ 2001-02-25  0:16 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andi Kleen, linux-kernel, netdev

On Sat, Feb 24, 2001 at 07:13:14PM -0500, Jeff Garzik wrote:
> Sorry... I should also point out that I was thinking of tulip
> architecture and similar architectures, where you have a fixed number of
> Skbs allocated at all times, and that number doesn't change for the
> lifetime of the driver.
> 
> Clearly not all cases would benefit from skb recycling, but there are a
> number of rx-ring-based systems where this would be useful, and (AFAICS)
> reduce the work needed to be done by the system, and reduce the amount
> of overall DMA traffic by a bit.

A simple way to do it currently is just to compare the new skb with the old
one. If it is the same, do a shortcut. That should usually work out when the
system has enough memory.


-Andi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-24 23:25 New net features for added performance Jeff Garzik
  2001-02-24 23:48 ` Andi Kleen
@ 2001-02-25  1:55 ` Michael Richardson
  2001-02-25  2:32 ` Jeremy Jackson
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Michael Richardson @ 2001-02-25  1:55 UTC (permalink / raw)
  To: Linux Kernel Mailing List


>>>>> "Jeff" == Jeff Garzik <jgarzik@mandrakesoft.com> writes:
    Jeff> 1) Rx Skb recycling.  It would be nice to have skbs returned to the
    Jeff> driver after the net core is done with them, rather than have netif_rx
    Jeff> free the skb.  Many drivers pre-allocate a number of maximum-sized skbs
    Jeff> into which the net card DMA's data.  If netif_rx returned the SKB
    Jeff> instead of freeing it, the driver could simply flip the DescriptorOwned
    Jeff> bit for that buffer, giving it immediately back to the net card.

    Jeff> Disadvantages?

  netif_rx() would have to copy the buffer.

  Right now, it just puts it on the queue towards the BH. For it to return
the skb would require that all processing occur inside of netif_rx() (a la BSD),
or that it copy the buffer.
 
    Jeff> 3) Slabbier packet allocation.  Even though skb allocation is decently
    Jeff> fast, you are still looking at an skb buffer head grab and a
 
  I think that if you had this, and you also returned skb's to this list on
a per device basis (change skb->free, I think) instead of to the general
pool, you could probably eliminate your request #1.

] Train travel features AC outlets with no take-off restrictions|gigabit is no[
]   Michael Richardson, Solidum Systems   Oh where, oh where has|problem  with[
]     mcr@solidum.com   www.solidum.com   the little fishy gone?|PAX.port 1100[
] panic("Just another NetBSD/notebook using, kernel hacking, security guy");  [

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-24 23:25 New net features for added performance Jeff Garzik
  2001-02-24 23:48 ` Andi Kleen
  2001-02-25  1:55 ` Michael Richardson
@ 2001-02-25  2:32 ` Jeremy Jackson
  2001-02-25  3:23   ` Chris Wedgwood
  2001-02-25  2:38 ` Noah Romer
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Jeremy Jackson @ 2001-02-25  2:32 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Linux Kernel Mailing List

Jeff Garzik wrote:

(about optimizing kernel network code for busmastering NIC's)

> Disclaimer:  This is 2.5, repeat, 2.5 material.

Related question: are there any 100Mbit NICs with cpu's onboard?
Something mainstream/affordable? (i.e. not 1G ethernet)
Just recently someone posted asking some technical question about
ARMlinux for an Intel card with 2 1G ports, 8 100M ports,
an onboard ARM cpu and 4 other uControllers... seems to me
that ultimately the networking code should go in that direction:
imagine having the *NIC* do most of this... no cache pollution problems...


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-24 23:25 New net features for added performance Jeff Garzik
                   ` (2 preceding siblings ...)
  2001-02-25  2:32 ` Jeremy Jackson
@ 2001-02-25  2:38 ` Noah Romer
  2001-03-03 23:32   ` Jes Sorensen
  2001-02-25 12:01 ` Andrew Morton
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Noah Romer @ 2001-02-25  2:38 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Linux Kernel Mailing List

On Sat, 24 Feb 2001, Jeff Garzik wrote:

> Disclaimer:  This is 2.5, repeat, 2.5 material.
[snip] 
> 1) Rx Skb recycling.  It would be nice to have skbs returned to the
> driver after the net core is done with them, rather than have netif_rx
> free the skb.  Many drivers pre-allocate a number of maximum-sized skbs
> into which the net card DMA's data.  If netif_rx returned the SKB
> instead of freeing it, the driver could simply flip the DescriptorOwned
> bit for that buffer, giving it immediately back to the net card.
> 
> Advantages:  A de-allocation immediately followed by a reallocation is
> eliminated, less L1 cache pollution during interrupt handling. 
> Potentially less DMA traffic between card and host.

This could be quite useful for the network driver I maintain (it's made
it to the -ac patch set for 2.4, but not yet into the main kernel
tarball). At the moment, it allocates 127 "buckets" (skb's under linux)
at start of day and posts them to the card. After that, it maintains a
minimum of 80 data buffers available to the card at any one time. There's
a noticeable performance hit when the driver has to reallocate new skbs
to keep above the threshold. I try to recycle as much as possible w/in the
driver (i.e. really small incoming packets get a new skb allocated for
them and the original buffer is put back on the queue), but it would be
nice to be able to recycle even more of the skbs.

> Disadvantages?

As has been pointed out, there's a certain loss of control over allocation
of memory (could check for low memory conditions before sending the skb
back to the driver, but . . .). I do see a failure to allocate all 127
skbs, occasionally, when the driver is first loaded (only way to get
around this is to reboot the system).

> 2) Tx packet grouping.  If the net core has knowledge that more packets
> will be following the current one being sent to dev->hard_start_xmit(),
> it should pass that knowledge on to dev->hard_start_xmit(), either as an
> estimated number yet-to-be-sent, or just as a flag that "more is
> coming."
> 
> Advantages: This lets the net driver make smarter decisions about Tx
> interrupt mitigation, Tx buffer queueing, etc.
>
> Disadvantages?  Can this sort of knowledge be obtained by a netdevice
> right now, without any kernel modifications?

In my experience, Tx interrupt mitigation is of little benefit. I actually
saw a performance increase of ~20% when I turned off Tx interrupt
mitigation in my driver (could have been poor implementation on my part).

--
Noah Romer              |"Calm down, it's only ones and zeros." - this message
klevin@eskimo.com       |brought to you by The Network
PGP key available       |"Time will have its say, it always does." - Celltrex
by finger or email      |from Flying to Valhalla by Charles Pellegrino


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-25  2:32 ` Jeremy Jackson
@ 2001-02-25  3:23   ` Chris Wedgwood
  2001-02-25 12:41     ` Werner Almesberger
  0 siblings, 1 reply; 37+ messages in thread
From: Chris Wedgwood @ 2001-02-25  3:23 UTC (permalink / raw)
  To: Jeremy Jackson; +Cc: Jeff Garzik, netdev, Linux Kernel Mailing List

On Sat, Feb 24, 2001 at 09:32:59PM -0500, Jeremy Jackson wrote:

    Related question: are there any 100Mbit NICs with cpu's onboard?

Yes, but the only ones I've seen to date are magic and do special
things (like VPN or hardware crypto). I'm not sure without 'magic'
requirements there is much point for 100M on modern hardware.

Not affordable, and moving some of the IP stack onto the card
(I think this is what you are alluding to) would be extremely non-trivial,
especially if you want all the components (host OS, multiple network
cards) to talk to each other asynchronously; you would also have to
deal with buggy hardware that doesn't like doing PCI-PCI transfers
and such like.

That said, it would be an extremely neat thing to do from a technical
perspective, but I don't know if you would ever get really good
performance from it.




  --cw

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-24 23:48 ` Andi Kleen
  2001-02-25  0:03   ` Jeff Garzik
@ 2001-02-25 11:49   ` Rusty Russell
  1 sibling, 0 replies; 37+ messages in thread
From: Rusty Russell @ 2001-02-25 11:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

In message <oupsnl3k5gs.fsf@pigdrop.muc.suse.de> you write:
> Jeff Garzik <jgarzik@mandrakesoft.com> writes:
> 
> > Advantages:  A de-allocation immediately followed by a reallocation is
> > eliminated, less L1 cache pollution during interrupt handling. 
> > Potentially less DMA traffic between card and host.
> > 
> > Disadvantages?
> 
> You need a new mechanism to cope with low memory situations because the 
> drivers can tie up quite a bit of memory (in fact you gave up unified
> memory management). 

Also, you still need to "clean" the skb (it can hold device and nfct
references).

Rusty.
--
Premature optmztion is rt of all evl. --DK

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-24 23:25 New net features for added performance Jeff Garzik
                   ` (3 preceding siblings ...)
  2001-02-25  2:38 ` Noah Romer
@ 2001-02-25 12:01 ` Andrew Morton
  2001-02-25 15:11   ` Jeremy Jackson
  2001-02-25 12:22 ` Werner Almesberger
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Andrew Morton @ 2001-02-25 12:01 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Linux Kernel Mailing List

Jeff Garzik wrote:
> 
>...
> 1) Rx Skb recycling.
>... 
> 2) Tx packet grouping.
>...
> 3) Slabbier packet allocation.

Let's see what the profiler says.  10 seconds of TCP xmit
followed by 10 seconds of TCP receive.  100 mbits/sec.
Kernel 2.4.2+ZC.

c0119470 do_softirq                                   97   0.7132
c020e718 ip_output                                    99   0.3694
c020a2c8 ip_route_input                              103   0.2893
c01fdc4c skb_release_data                            113   1.0089
c021312c tcp_sendmsg                                 113   0.0252
c0129c64 kmalloc                                     117   0.3953
c0112efc __wake_up_sync                              128   0.6667
c01fdd24 __kfree_skb                                 153   0.6071
c020e824 ip_queue_xmit                               154   0.1149
c011be80 del_timer                                   163   2.2639
c0222fac tcp_v4_rcv                                  173   0.1022
c010a778 handle_IRQ_event                            178   1.4833
c01127fc schedule                                    200   0.1259
c01d39f8 boomerang_rx                                332   0.2823
c024284c csum_partial_copy_generic                   564   2.2742
c01d2c84 boomerang_start_xmit                        654   0.9033
c0242b3c __generic_copy_from_user                    733  12.2167
c01d329c boomerang_interrupt                         910   0.8818
c01071f4 poll_idle                                 41813 1306.6562
00000000 total                                     48901   0.0367

7088 non-idle ticks.
153+117+113 ticks in skb/memory type functions.

So, naively, the most which can be saved here by optimising
the skb and memory usage is 5% of networking load. (1% of
system load @100 mbps)

Total device driver cost is 27% of the networking load.

All the meat is in the interrupt load.  The 3com driver
transfers about three packets per interrupt.  Here's
the system load (dual CPU):

Doing 100mbps TCP send with netperf:    14.9%
Doing 100mbps TCP receive with netperf: 23.3%

When tx interrupt mitigation is disabled we get 1.5 packets
per interrupt doing transmit:

Doing 100mbps TCP send with netperf:    16.1%
Doing 100mbps TCP receive with netperf: 24.0%

So a 2x reduction in interrupt frequency on TCP transmit has
saved 1.2% of system load. That's 8% of networking load, and,
presumably, 30% of the driver load. That all seems to make sense.


The moral?

- Tuning skb allocation isn't likely to make much difference.
- At the device-driver level the most effective thing is
  to reduce the number of interrupts.
- If we can reduce the driver cost to *zero*, we improve
  TCP efficiency by 27%.
- At the system level the most important thing is to rewrite
  applications to use sendfile(). (But Rx is more expensive
  than Tx, so even this ain't the main game).

I agree that batching skbs into hard_start_xmit() may allow
some driver optimisations.  Pass it a vector of skbs rather
than one, and let it return an indication of how many were
actually consumed.  But we'd need to go through an exercise
like the above beforehand - it may not be worth the
protocol-level trauma.
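
Something of roughly this shape, purely as a sketch of the interface rather
than a worked-out prototype:

	/* returns how many of the 'count' skbs the driver actually accepted */
	int (*hard_start_xmit_multi)(struct sk_buff **skbs, int count,
				     struct net_device *dev);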

I suspect that a thorough analysis of the best way to
use Linux networking, and then a rewrite of important
applications so they use the result of that analysis
would pay dividends.

-

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-24 23:25 New net features for added performance Jeff Garzik
                   ` (4 preceding siblings ...)
  2001-02-25 12:01 ` Andrew Morton
@ 2001-02-25 12:22 ` Werner Almesberger
  2001-03-12 15:08   ` Jes Sorensen
  2001-02-25 13:08 ` Jonathan Morton
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Werner Almesberger @ 2001-02-25 12:22 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Linux Kernel Mailing List

Jeff Garzik wrote:
> 1) Rx Skb recycling.

Sounds like a potentially useful idea. To solve the most immediate memory
pressure problems, maybe VM could provide some function that does a kfree
in cases of memory shortage, and that does nothing otherwise, so the
driver could offer to free the skb after netif_rx. You still need to go
over the list in idle periods, though.

> 2) Tx packet grouping.

Hmm, I think we need an estimate of how long a packet train you'd usually
get. A flag looks reasonably inexpensive. Estimated numbers sound like
over-engineering.

> Disadvantages?  Can this sort of knowledge be obtained by a netdevice
> right now, without any kernel modifications?

Question is what the hardware really needs. If you can change the
interrupt point easily, it's probably cheapest to do all the work in
hard_start_xmit.

> 3) Slabbier packet allocation.

Hmm, this may actually be worse during bursts: if your burst exceeds
the preallocated size, you have to perform more expensive/slower
operations (e.g. running a tasklet) to refill your cache.

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, ICA, EPFL, CH           Werner.Almesberger@epfl.ch /
/_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-25  3:23   ` Chris Wedgwood
@ 2001-02-25 12:41     ` Werner Almesberger
  2001-02-25 13:57       ` Chris Wedgwood
  0 siblings, 1 reply; 37+ messages in thread
From: Werner Almesberger @ 2001-02-25 12:41 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: netdev, Linux Kernel Mailing List

Chris Wedgwood wrote:
> That said, it would be an extemely neat thing to do from a technical
> perspective, but I don't know if you would ever get really good
> performance from it.

Well, you'd have to re-design the networking code to support NUMA
architectures, with a fairly fine granularity. I'm not sure you'd gain
anything except possibly for the forwarding fast path.

A cheaper, and probably more useful possibility is hardware assistance for
specific operations. E.g. hardware-accelerated packet classification looks
interesting. I'd also like to see hardware-assistance for shaping on other
media than ATM.

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, ICA, EPFL, CH           Werner.Almesberger@epfl.ch /
/_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-24 23:25 New net features for added performance Jeff Garzik
                   ` (5 preceding siblings ...)
  2001-02-25 12:22 ` Werner Almesberger
@ 2001-02-25 13:08 ` Jonathan Morton
  2001-02-26 23:46 ` David S. Miller
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Jonathan Morton @ 2001-02-25 13:08 UTC (permalink / raw)
  To: Jeremy Jackson, Jeff Garzik; +Cc: netdev, Linux Kernel Mailing List

At 2:32 am +0000 25/2/2001, Jeremy Jackson wrote:
>Jeff Garzik wrote:
>
>(about optimizing kernel network code for busmastering NIC's)
>
>> Disclaimer:  This is 2.5, repeat, 2.5 material.
>
>Related question: are there any 100Mbit NICs with cpu's onboard?
>Something mainstream/affordable? (i.e. not 1G ethernet)
>Just recently someone posted asking some technical question about
>ARMlinux for an Intel card with 2 1G ports, 8 100M ports,
>an onboard ARM cpu and 4 other uControllers... seems to me
>that ultimately the networking code should go in that direction:
>imagine having the *NIC* do most of this... no cache pollution problems...

Dunno, but the latest Motorola ColdFire microcontroller has Ethernet built
in.  I think it's even 100baseTX, but I could be mistaken.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-----END GEEK CODE BLOCK-----



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-25 12:41     ` Werner Almesberger
@ 2001-02-25 13:57       ` Chris Wedgwood
  0 siblings, 0 replies; 37+ messages in thread
From: Chris Wedgwood @ 2001-02-25 13:57 UTC (permalink / raw)
  To: Werner Almesberger; +Cc: netdev, Linux Kernel Mailing List

On Sun, Feb 25, 2001 at 01:41:56PM +0100, Werner Almesberger wrote:

    Well, you'd have to re-design the networking code to support NUMA
    architectures, with a fairly fine granularity. I'm not sure you'd
    gain anything except possibly for the forwarding fast path.

I'm not convinced that for a general purpose OS you would gain anything at
all; but as an intellectual exercise it's a fascinating idea.

It'd make a good PhD thesis.



  --cw

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-25 12:01 ` Andrew Morton
@ 2001-02-25 15:11   ` Jeremy Jackson
  0 siblings, 0 replies; 37+ messages in thread
From: Jeremy Jackson @ 2001-02-25 15:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jeff Garzik, netdev, Linux Kernel Mailing List

Andrew Morton wrote:

(kernel profile of TCP tx/rx)

> So, naively, the most which can be saved here by optimising
> the skb and memory usage is 5% of networking load. (1% of
> system load @100 mbps)
>

For a local tx/rx.  (open question) What happens with
a router box with netfilter and queueing?  Perhaps
this type of optimisation will help more in that case?

think about a box with 4 1G NICs being able to
route AND do QoS per conntrack connection
(ala RSVP and such)

Really what I'm looking for is something like SGI's
STP (Scheduled Transfer Protocol).  mmap your
tcp recieve buffer, and have a card smart enough
to figure out header alignment, (i.e. know header
size based on protocol number) transfer only that,
let the kernel process it, then tell the card to DMA
the data from the buffer right into process memory.
(or other NIC)

Make it possible to have the performance of a
Juniper network processor + flexibility of Linux.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-24 23:25 New net features for added performance Jeff Garzik
                   ` (6 preceding siblings ...)
  2001-02-25 13:08 ` Jonathan Morton
@ 2001-02-26 23:46 ` David S. Miller
  2001-02-27  0:07   ` Jeff Garzik
  2001-02-27  0:10   ` David S. Miller
  2001-02-26 23:48 ` David S. Miller
  2001-03-01 21:06 ` Jes Sorensen
  9 siblings, 2 replies; 37+ messages in thread
From: David S. Miller @ 2001-02-26 23:46 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Linux Kernel Mailing List


Jeff Garzik writes:
 > 1) Rx Skb recycling.
 ...
 > Advantages:  A de-allocation immediately followed by a reallocation is
 > eliminated, less L1 cache pollution during interrupt handling. 
 > Potentially less DMA traffic between card and host.
 ...
 > Disadvantages?

It simply cannot work, as Alexey stated, in normal circumstances
netif_rx() queues until the user reads the data.  This is the whole
basis of our receive packet processing model within softint/user
context.

Secondly, I can argue that skb recycling can give _worse_ cache
performance.  If the next use and access by the card to the
skb data is deferred, this gives the cpu a chance to displace those
lines in its cache naturally via displacement instead of being forced
quickly to do so when the device touches that data.

If the device forces the cache displacement, those cache lines become
empty until filled with something later (smaller utilization of total
cache contents) whereas natural displacement puts useful data into
the cache at the time of the displacement (larger utilization of total
cache contents).

It is an NT/windows driver API rubbish idea, and it is full of crap.

 > 2) Tx packet grouping.
 ...
 > Disadvantages?

See Torvalds vs. world discussion on this list about API entry points
which pass multiple pages at a time versus simpler ones which pass
only a single page at a time. :-)

 > 3) Slabbier packet allocation.
 ...
 > Disadvantages?  Doing this might increase cache pollution due to
 > increased code and data size, but I think the hot path is much improved
 > (dequeue a properly sized, initialized, skb-reserved'd skb off a list)
 > and would help mitigate the impact of sudden bursts of traffic.

I don't know what I think about this one, but my hunch is that it will
lead to worse data packing via such an allocator.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-24 23:25 New net features for added performance Jeff Garzik
                   ` (7 preceding siblings ...)
  2001-02-26 23:46 ` David S. Miller
@ 2001-02-26 23:48 ` David S. Miller
  2001-02-27  0:03   ` Andi Kleen
  2001-02-27  0:08   ` David S. Miller
  2001-03-01 21:06 ` Jes Sorensen
  9 siblings, 2 replies; 37+ messages in thread
From: David S. Miller @ 2001-02-26 23:48 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jeff Garzik, linux-kernel


Andi Kleen writes:
 > 4) Better support for aligned RX by only copying the header

Andi you can make this now:

1) Add new "post-header data pointer" field in SKB.
2) Change drivers to copy into aligned headroom as
   you mention, and they set this new post-header
   pointer as appropriate.  For normal drivers without
   alignment problem, generic code sets the pointer up
   just like it does the rest of the SKB header pointers
   now.
3) Enforce correct usage of it in all the networking :-)
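
In sk_buff terms it is just one more pointer (the name here is only
illustrative):

	unsigned char	*post_hdr;	/* NULL: payload follows the headers as usual;
					 * otherwise: payload starts here, possibly in
					 * a different buffer */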

I would definitely accept such a patch for the 2.5.x
series.  It seems to be a nice idea and I currently see
no holes in it.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-26 23:48 ` David S. Miller
@ 2001-02-27  0:03   ` Andi Kleen
  2001-02-27 19:59     ` kuznet
  2001-02-27  0:08   ` David S. Miller
  1 sibling, 1 reply; 37+ messages in thread
From: Andi Kleen @ 2001-02-27  0:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: Andi Kleen, Jeff Garzik, linux-kernel

On Mon, Feb 26, 2001 at 03:48:31PM -0800, David S. Miller wrote:
> 
> Andi Kleen writes:
>  > 4) Better support for aligned RX by only copying the header
> 
> Andi you can make this now:
> 
> 1) Add new "post-header data pointer" field in SKB.

That would imply letting the drivers parse all headers to figure out the length.
I think it's better to have a "header base" and a "data base" pointer.
The driver would just copy some standard size that likely contains all of
the header.  When you're finished with the header, use
skb->database+(skb->hdrptr-skb->hdrbase) to get the start of data.
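
I.e. something like this, with the field names purely illustrative:

	/* driver rx path: copy the first bytes into aligned headroom */
	skb->hdrbase  = skb->data;	/* start of the copied headers   */
	skb->database = rx_buf;		/* start of the original packet  */

	/* protocol code, once skb->hdrptr has been advanced past the headers */
	data = skb->database + (skb->hdrptr - skb->hdrbase);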

Or did I misunderstand you?



> 3) Enforce correct usage of it in all the networking :-)

,) -- the tricky part.


-Andi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-26 23:46 ` David S. Miller
@ 2001-02-27  0:07   ` Jeff Garzik
  2001-02-27  0:10   ` David S. Miller
  1 sibling, 0 replies; 37+ messages in thread
From: Jeff Garzik @ 2001-02-27  0:07 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Linux Kernel Mailing List

"David S. Miller" wrote:
> Jeff Garzik writes:
>  > 2) Tx packet grouping.
>  ...
>  > Disadvantages?
> 
> See Torvalds vs. world discussion on this list about API entry points
> which pass multiple pages at a time versus simpler ones which pass
> only a single page at a time. :-)

I only want to know if more are coming, not actually pass multiples..

	Jeff



-- 
Jeff Garzik       | "You see, in this world there's two kinds of
Building 1024     |  people, my friend: Those with loaded guns
MandrakeSoft      |  and those who dig. You dig."  --Blondie

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-26 23:48 ` David S. Miller
  2001-02-27  0:03   ` Andi Kleen
@ 2001-02-27  0:08   ` David S. Miller
  2001-02-27  2:53     ` Jeremy Jackson
  1 sibling, 1 reply; 37+ messages in thread
From: David S. Miller @ 2001-02-27  0:08 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jeff Garzik, linux-kernel


Andi Kleen writes:
 > Or did I misunderstand you?

What is wrong with making methods, keyed off of the ethernet protocol
ID, that can do the "I know where/how-long headers are" stuff for that
protocol?  Only cards with the problem call into this function vector
or however we arrange it, and then for those that don't have these
problems at all we can make NULL a special value for this
"post-header" pointer.

You can pick some arbitrary number, sure, that is another way to
do it.  Such a size would need to be chosen very carefully though.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-26 23:46 ` David S. Miller
  2001-02-27  0:07   ` Jeff Garzik
@ 2001-02-27  0:10   ` David S. Miller
  1 sibling, 0 replies; 37+ messages in thread
From: David S. Miller @ 2001-02-27  0:10 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Linux Kernel Mailing List


Jeff Garzik writes:
 > I only want to know if more are coming, not actually pass multiples..

Ok, then my only concern is that the path from "I know more is coming"
down to hard_start_xmit invocation is long.  It would mean passing a
new piece of state a long distance inside the stack from SKB origin to
device.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-27  0:08   ` David S. Miller
@ 2001-02-27  2:53     ` Jeremy Jackson
  0 siblings, 0 replies; 37+ messages in thread
From: Jeremy Jackson @ 2001-02-27  2:53 UTC (permalink / raw)
  To: David S. Miller; +Cc: Andi Kleen, Jeff Garzik, linux-kernel

"David S. Miller" wrote:

> Andi Kleen writes:
>  > Or did I misunderstand you?
>
> What is wrong with making methods, keyed off of the ethernet protocol
> ID, that can do the "I know where/how-long headers are" stuff for that
> protocol?  Only cards with the problem call into this function vector
> or however we arrange it, and then for those that don't have these
> problems at all we can make NULL a special value for this
> "post-header" pointer.
>

I had a dream about a NIC that would do exactly the above by itself.
The dumb cards would use the above code, and the smart ones' drivers
would overload the functions and allow the NIC to do it.

"Tell me of the waters of your homeworld, Usul" :)

Except the driver interacts differently than netif_rx... knowing the
protocol, it DMA's the header only (it knows the length then too).

(SMC's epic100's descriptors can do this, but the card can't
do the de-mux on proto id, forcing the network core to run
in the ISR so the card can finish DMA and not exhaust its
tiny memory.) The network code can
then do all the routing/netfilter/QoS stuff, and tell the card to DMA
the payload into the TX queue of another NIC (or queue the header
with a pointer to the payload in the PCI address space of the incoming
NIC heh heh) OR into the process' mmap'ed TCP receive buffer
ala SGI's STP.




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-27  0:03   ` Andi Kleen
@ 2001-02-27 19:59     ` kuznet
  0 siblings, 0 replies; 37+ messages in thread
From: kuznet @ 2001-02-27 19:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

Hello!

> > 3) Enforce correct usage of it in all the networking :-)
> 
> ,) -- the tricky part.

No tricks; IP[v6] is already enforced to be clever, and all the rest are free
to do this if they desire. And btw, the driver need not parse anything
but its internal stuff, and even aligning the eth II header can be done
in eth_type_trans().

Actually, it is possible now without changing anything but the driver.
Fortunately, I removed the stupid tulip from the alpha, so I have
no impetus to try this myself. 8)

Alexey

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-24 23:25 New net features for added performance Jeff Garzik
                   ` (8 preceding siblings ...)
  2001-02-26 23:48 ` David S. Miller
@ 2001-03-01 21:06 ` Jes Sorensen
  9 siblings, 0 replies; 37+ messages in thread
From: Jes Sorensen @ 2001-03-01 21:06 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Linux Kernel Mailing List

>>>>> "Jeff" == Jeff Garzik <jgarzik@mandrakesoft.com> writes:

Jeff> 1) Rx Skb recycling.  It would be nice to have skbs returned to
Jeff> the driver after the net core is done with them, rather than
Jeff> have netif_rx free the skb.  Many drivers pre-allocate a number
Jeff> of maximum-sized skbs into which the net card DMA's data.  If
Jeff> netif_rx returned the SKB instead of freeing it, the driver
Jeff> could simply flip the DescriptorOwned bit for that buffer,
Jeff> giving it immediately back to the net card.

Jeff> Advantages: A de-allocation immediately followed by a
Jeff> reallocation is eliminated, less L1 cache pollution during
Jeff> interrupt handling.  Potentially less DMA traffic between card
Jeff> and host.

Jeff> Disadvantages?

I already tried this with the AceNIC GigE driver some time ago, and
after Ingo came up with a per-CPU slab patch the gain was gone. I am
not sure the complexity is worth it.

Jes

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-25  2:38 ` Noah Romer
@ 2001-03-03 23:32   ` Jes Sorensen
  0 siblings, 0 replies; 37+ messages in thread
From: Jes Sorensen @ 2001-03-03 23:32 UTC (permalink / raw)
  To: Noah Romer; +Cc: Jeff Garzik, netdev, Linux Kernel Mailing List

>>>>> "Noah" == Noah Romer <klevin@eskimo.com> writes:

Noah> In my experience, Tx interrupt mitigation is of little
Noah> benefit. I actually saw a performance increase of ~20% when I
Noah> turned off Tx interrupt mitigation in my driver (could have been
Noah> poor implementation on my part).

You need to define performance increase here. TX interrupt coalescing
can still be a win in the system load department.

Jes

^ permalink raw reply	[flat|nested] 37+ messages in thread

* LILO error with 2.4.3-pre1...
  2000-01-01  0:19       ` Pavel Machek
@ 2001-03-04  1:19         ` Steven J. Hill
  2001-03-04  1:39           ` Keith Owens
                             ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Steven J. Hill @ 2001-03-04  1:19 UTC (permalink / raw)
  To: linux-kernel

Hmm, needed 2.4.3-pre1 and went to install with LILO using
'lilo -v' and got this:

   LILO version 21.4-4, Copyright (C) 1992-1998 Werner Almesberger
   'lba32' extensions Copyright (C) 1999,2000 John Coffman

   Reading boot sector from /dev/hda
   Merging with /boot/boot.b
   Boot image: /boot/vmlinuz-2.4.2
   Added linux *
   Boot image: /boot/vmlinuz-2.4.3-pre1
   Fatal: geo_comp_addr: Cylinder number is too big (1274 > 1023)

Neato. I don't have time to dig through LILO source code right
now, so here are my system specs:

	Linux Distribution: RedHat 6.2 with all latest updates
        Hard Disk: Maxtor 52049H3 (20GB) IDE
        CPU: Dual PII-266MHz
        RAM: 256MB PC100
        Result of 'fdisk /dev/hda -l':

           Disk /dev/hda: 255 heads, 63 sectors, 2491 cylinders
           Units = cylinders of 16065 * 512 bytes

             Device Boot    Start       End    Blocks   Id  System
          /dev/hda1   *         1      1513  12153141   83  Linux
          /dev/hda2          1514      1530    136552+  82  Linux swap
          /dev/hda3          1531      2491   7719232+  83  Linux

I have no idea why the 1023 limit is coming up considering 2.4.2 and
LILO were working just fine together and I have a newer BIOS that has
no problems detecting the drive properly. Go ahead, call me idiot :).

-Steve

-- 
 Steven J. Hill - Embedded SW Engineer
 Public Key: 'http://www.cotw.com/pubkey.txt'
 FPR1: E124 6E1C AF8E 7802 A815
 FPR2: 7D72 829C 3386 4C4A E17D

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: LILO error with 2.4.3-pre1...
  2001-03-04  1:19         ` LILO error with 2.4.3-pre1 Steven J. Hill
@ 2001-03-04  1:39           ` Keith Owens
  2001-03-04  2:27             ` Tom Sightler
  2001-03-04 21:32             ` Mircea Damian
  2001-03-04  2:39           ` Andre Tomt
  2001-03-04 13:35           ` Alan Cox
  2 siblings, 2 replies; 37+ messages in thread
From: Keith Owens @ 2001-03-04  1:39 UTC (permalink / raw)
  To: sjhill; +Cc: linux-kernel

On Sat, 03 Mar 2001 19:19:28 -0600, 
"Steven J. Hill" <sjhill@cotw.com> wrote:
>I have no idea why the 1023 limit is coming up considering 2.4.2 and
>LILO were working just fine together and I have a newer BIOS that has
>no problems detecting the drive properly. Go ahead, call me idiot :).

OK, you're an idiot :).  It only worked before because all the files
that lilo used just happened to be below cylinder 1024.  Your partition
goes past cyl 1024 and your new kernel is using space above 1024.  Find
a version of lilo that can cope with cyl >= 1024 (is there one?) or
move the kernel below cyl 1024.  You might need to repartition your
disk to get / all below 1024.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: LILO error with 2.4.3-pre1...
  2001-03-04  1:39           ` Keith Owens
@ 2001-03-04  2:27             ` Tom Sightler
  2001-03-04 21:32             ` Mircea Damian
  1 sibling, 0 replies; 37+ messages in thread
From: Tom Sightler @ 2001-03-04  2:27 UTC (permalink / raw)
  To: Keith Owens, sjhill; +Cc: linux-kernel


----- Original Message -----
From: "Keith Owens" <kaos@ocs.com.au>
To: <sjhill@cotw.com>
Cc: <linux-kernel@vger.kernel.org>
Sent: Saturday, March 03, 2001 8:39 PM
Subject: Re: LILO error with 2.4.3-pre1...


> On Sat, 03 Mar 2001 19:19:28 -0600,
> "Steven J. Hill" <sjhill@cotw.com> wrote:
> >I have no idea why the 1023 limit is coming up considering 2.4.2 and
> >LILO were working just fine together and I have a newer BIOS that has
> >no problems detecting the drive properly. Go ahead, call me idiot :).
>
> OK, you're an idiot :).  It only worked before because all the files
> that lilo used just happened to be below cylinder 1024.  Your partition
> goes past cyl 1024 and your new kernel is using space above 1024.

I would agree with this explanation.

> Find a version of lilo that can cope with cyl >= 1024 (is there one?)

Uh, the version he has can cope with this, see the following:

>    LILO version 21.4-4, Copyright (C) 1992-1998 Werner Almesberger
>    'lba32' extensions Copyright (C) 1999,2000 John Coffman

The lba32 extensions should take care of this, of course you have to add
'lba32' as a line in your lilo.conf before lilo actually uses them (and, I
assume, the BIOS must support the LBA extensions, but it seems most modern
ones do).
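
For example, a minimal /etc/lilo.conf along these lines (paths and labels
are just placeholders):

	lba32
	boot = /dev/hda
	prompt
	timeout = 50

	image = /boot/vmlinuz-2.4.3-pre1
	  root = /dev/hda1
	  read-only
	  label = linux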

Give that a try.  Works for me.

Later,
Tom



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: LILO error with 2.4.3-pre1...
  2001-03-04  1:19         ` LILO error with 2.4.3-pre1 Steven J. Hill
  2001-03-04  1:39           ` Keith Owens
@ 2001-03-04  2:39           ` Andre Tomt
  2001-03-04  3:32             ` Steven J. Hill
  2001-03-04 13:35           ` Alan Cox
  2 siblings, 1 reply; 37+ messages in thread
From: Andre Tomt @ 2001-03-04  2:39 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: sjhill

>    'lba32' extensions Copyright (C) 1999,2000 John Coffman
     ^^^^^^

Add lba32 as the top line in lilo.conf. Re-run lilo.

> Fatal: geo_comp_addr: Cylinder number is too big (1274 > 1023)

Before 2.4.3pre1, your kernel just happened to toss itself below cylinder
1024.

> Go ahead, call me idiot :).

Idiot. :-)

--
Regards,
Andre Tomt


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: LILO error with 2.4.3-pre1...
  2001-03-04  2:39           ` Andre Tomt
@ 2001-03-04  3:32             ` Steven J. Hill
  0 siblings, 0 replies; 37+ messages in thread
From: Steven J. Hill @ 2001-03-04  3:32 UTC (permalink / raw)
  To: Andre Tomt; +Cc: Linux Kernel Mailing List

Andre Tomt wrote:
> 
> >    'lba32' extensions Copyright (C) 1999,2000 John Coffman
>      ^^^^^^
> 
> Add lba32 as the top line in lilo.conf. Re-run lilo.
> 
> > Fatal: geo_comp_addr: Cylinder number is too big (1274 > 1023)
> 
> Before 2.4.3pre1, your kernel just happened to toss itself below cylinder
> 1024.
> 
> > Go ahead, call me idiot :).
> 
> Idiot. :-)
> 
And since Andre was the last person to email me and call me an idiot,
I will reply to his response :). Yup, that was the case and I added
'lba32' to my '/etc/lilo.conf' and things work great. I knew it was
something simple, but I just don't pay attention to LILO much anymore.
Thanks everyone.

-Steve

-- 
 Steven J. Hill - Embedded SW Engineer
 Public Key: 'http://www.cotw.com/pubkey.txt'
 FPR1: E124 6E1C AF8E 7802 A815
 FPR2: 7D72 829C 3386 4C4A E17D

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: LILO error with 2.4.3-pre1...
  2001-03-04  1:19         ` LILO error with 2.4.3-pre1 Steven J. Hill
  2001-03-04  1:39           ` Keith Owens
  2001-03-04  2:39           ` Andre Tomt
@ 2001-03-04 13:35           ` Alan Cox
  2 siblings, 0 replies; 37+ messages in thread
From: Alan Cox @ 2001-03-04 13:35 UTC (permalink / raw)
  To: sjhill; +Cc: linux-kernel

>    LILO version 21.4-4, Copyright (C) 1992-1998 Werner Almesberger
>    'lba32' extensions Copyright (C) 1999,2000 John Coffman
> 
>    Boot image: /boot/vmlinuz-2.4.3-pre1
>    Fatal: geo_comp_addr: Cylinder number is too big (1274 > 1023)
> 
> I have no idea why the 1023 limit is coming up considering 2.4.2 and
> LILO were working just fine together and I have a newer BIOS that has
> no problems detecting the drive properly. Go ahead, call me idiot :).

You need to specify the lba32 option in your config


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: LILO error with 2.4.3-pre1...
  2001-03-04  1:39           ` Keith Owens
  2001-03-04  2:27             ` Tom Sightler
@ 2001-03-04 21:32             ` Mircea Damian
  2001-03-04 23:05               ` Guest section DW
  1 sibling, 1 reply; 37+ messages in thread
From: Mircea Damian @ 2001-03-04 21:32 UTC (permalink / raw)
  To: Keith Owens; +Cc: sjhill, linux-kernel

On Sun, Mar 04, 2001 at 12:39:32PM +1100, Keith Owens wrote:
> On Sat, 03 Mar 2001 19:19:28 -0600, 
> "Steven J. Hill" <sjhill@cotw.com> wrote:
> >I have no idea why the 1023 limit is coming up considering 2.4.2 and
> >LILO were working just fine together and I have a newer BIOS that has
> >no problems detecting the drive properly. Go ahead, call me idiot :).
> 
> OK, you're an idiot :).  It only worked before because all the files
> that lilo used just happened to be below cylinder 1024.  Your partition
> goes past cyl 1024 and your new kernel is using space above 1024.  Find
> a version of lilo that can cope with cyl >= 1024 (is there one?) or
> move the kernel below cyl 1024.  You might need to repartition your
> disk to get / all below 1024.

Call me idiot too but please explain what is wrong here:

# cat /etc/lilo.conf
boot = /dev/hda
timeout = 150
vga = 4
ramdisk = 0
lba32
append = "hdc=scsi"
prompt


image = /boot/vmlinuz-2.4.2
  root = /dev/hda2
  read-only
  label = Linux

other = /dev/hda3
  label = win
  table = /dev/hda

# fdisk -l /dev/hda

Disk /dev/hda: 255 heads, 63 sectors, 1650 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1             1        17    136521   82  Linux swap
/dev/hda2            18      1165   9221310   83  Linux
/dev/hda3   *      1166      1650   3895762+   c  Win95 FAT32 (LBA)
root@taz:~# lilo -v
LILO version 21.7, Copyright (C) 1992-1998 Werner Almesberger
Linux Real Mode Interface library Copyright (C) 1998 Josh Vanderhoof
Development beyond version 21 Copyright (C) 1999-2001 John Coffman
Released 24-Feb-2001 and compiled at 18:31:02 on Mar  3 2001.

Reading boot sector from /dev/hda
Merging with /boot/boot.b
Boot image: /boot/vmlinuz-2.4.2
Added Linux *
Boot other: /dev/hda3, on /dev/hda, loader /boot/chain.b
Device 0x0300: Invalid partition table, 3rd entry
  3D address:     63/254/141 (2281229)
  Linear address: 1/0/1165 (18715725)


Mar  2 20:26:29 taz kernel: hda: IBM-DJNA-371350, ATA DISK drive 
Mar  2 20:26:29 taz kernel: hda: 26520480 sectors (13578 MB) w/1966KiB Cache, CHS=1650/255/63 


Is anybody able to explain the error?
That partition contains a valid VFAT partition with win98se installed on it (and it works fine,
ofc if I remove lilo from MBR).

-- 
Mircea Damian
E-mails: dmircea@kappa.ro, dmircea@roedu.net
WebPage: http://taz.mania.k.ro/~dmircea/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: LILO error with 2.4.3-pre1...
  2001-03-04 21:32             ` Mircea Damian
@ 2001-03-04 23:05               ` Guest section DW
  0 siblings, 0 replies; 37+ messages in thread
From: Guest section DW @ 2001-03-04 23:05 UTC (permalink / raw)
  To: Mircea Damian, Keith Owens; +Cc: sjhill, linux-kernel

On Sun, Mar 04, 2001 at 11:32:44PM +0200, Mircea Damian wrote:

> Call me idiot too but please explain what is wrong here:

What is wrong is that this is the kernel list, not the LILO list.

> root@taz:~# lilo -v
> LILO version 21.7, Copyright (C) 1992-1998 Werner Almesberger
> Device 0x0300: Invalid partition table, 3rd entry
>   3D address:     63/254/141 (2281229)
>   Linear address: 1/0/1165 (18715725)

Read the README in the LILO distribution.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: New net features for added performance
  2001-02-25 12:22 ` Werner Almesberger
@ 2001-03-12 15:08   ` Jes Sorensen
  0 siblings, 0 replies; 37+ messages in thread
From: Jes Sorensen @ 2001-03-12 15:08 UTC (permalink / raw)
  To: Werner Almesberger; +Cc: Jeff Garzik, netdev, Linux Kernel Mailing List

>>>>> "Werner" == Werner Almesberger <Werner.Almesberger@epfl.ch> writes:

Werner> Jeff Garzik wrote:
>> 3) Slabbier packet allocation.

Werner> Hmm, this may actually be worse during bursts: if your burst
Werner> exceeds the preallocated size, you have to perform more
Werner> expensive/slower operations (e.g. running a tasklet) to refill
Werner> your cache.

You may want to look at how I did this in the acenic driver. If the
watermark goes below a certain level I schedule the tasklet; if it
gets below an urgent watermark I do the allocation in the interrupt
handler itself.
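
In rough outline (the names are illustrative, not the actual acenic code):

	/* in the rx interrupt path */
	if (rx_bufs_free < URGENT_REFILL_THRESH)
		refill_rx_ring(dev);			/* critical: refill right here */
	else if (rx_bufs_free < LOW_REFILL_THRESH)
		tasklet_schedule(&refill_tasklet);	/* getting low: refill later */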

This is of course mainly useful for cards which give you deep
queues.

Jes

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2001-03-12 15:10 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-02-24 23:25 New net features for added performance Jeff Garzik
2001-02-24 23:48 ` Andi Kleen
2001-02-25  0:03   ` Jeff Garzik
2001-02-25  0:12     ` Andi Kleen
2000-01-01  0:19       ` Pavel Machek
2001-03-04  1:19         ` LILO error with 2.4.3-pre1 Steven J. Hill
2001-03-04  1:39           ` Keith Owens
2001-03-04  2:27             ` Tom Sightler
2001-03-04 21:32             ` Mircea Damian
2001-03-04 23:05               ` Guest section DW
2001-03-04  2:39           ` Andre Tomt
2001-03-04  3:32             ` Steven J. Hill
2001-03-04 13:35           ` Alan Cox
2001-02-25  0:13     ` New net features for added performance Jeff Garzik
2001-02-25  0:16       ` Andi Kleen
2001-02-25 11:49   ` Rusty Russell
2001-02-25  1:55 ` Michael Richardson
2001-02-25  2:32 ` Jeremy Jackson
2001-02-25  3:23   ` Chris Wedgwood
2001-02-25 12:41     ` Werner Almesberger
2001-02-25 13:57       ` Chris Wedgwood
2001-02-25  2:38 ` Noah Romer
2001-03-03 23:32   ` Jes Sorensen
2001-02-25 12:01 ` Andrew Morton
2001-02-25 15:11   ` Jeremy Jackson
2001-02-25 12:22 ` Werner Almesberger
2001-03-12 15:08   ` Jes Sorensen
2001-02-25 13:08 ` Jonathan Morton
2001-02-26 23:46 ` David S. Miller
2001-02-27  0:07   ` Jeff Garzik
2001-02-27  0:10   ` David S. Miller
2001-02-26 23:48 ` David S. Miller
2001-02-27  0:03   ` Andi Kleen
2001-02-27 19:59     ` kuznet
2001-02-27  0:08   ` David S. Miller
2001-02-27  2:53     ` Jeremy Jackson
2001-03-01 21:06 ` Jes Sorensen
