* performance
From: P @ 2003-06-09 16:12 UTC
  To: netfilter-devel

Hi,

I'm testing netfilter performance here on
PIII 1.2GHz based systems. With default
kernel configuration, netfilter is able
to process 85,000 pps with 125 rules (all
rules matching).

Note the application is just counting.
There is no transmitting/forwarding.

Also note the NICs are e100.

So my simple question is: are there any
tips for increasing the performance?
Hmm, actually the performance seems
optimal? Is it only taking 9 instructions
per match? 1.2*10^9/(85000*1500) = 9

thanks,
Pádraig.

* Re: performance
From: P @ 2003-06-09 16:16 UTC
  To: netfilter-devel

P@draigBrady.com wrote:
> Hi,
> 
> I'm testing netfilter performance here on
> PIII 1.2GHz based systems. With default
> kernel configuration, netfilter is able
> to process 85,000 pps with 125 rules (all
> rules matching).
> 
> Note the application is just counting.
> There is no transmitting/forwarding.
> 
> Also note the NICs are e100.
> 
> So my simple question is: are there any
> tips for increasing the performance?
> Hmm, actually the performance seems
> optimal? Is it only taking 9 instructions
> per match? 1.2*10^9/(85000*1500) = 9

I knew that couldn't be right.
That was tested on a dual 1.2GHz,
so that should be approx:
2*10^9/(85000*125) = 188 instructions per match.

I guess that's pretty optimal?
The best I could hope for after that
would be to increase the rx packet
buffer space so as to handle higher
spikes than this.

cheers,
Pádraig.

* Re[2]: performance
From: Peteris Krumins @ 2003-06-09 18:27 UTC
  To: P; +Cc: netfilter-devel

Monday, June 9, 2003, 7:16:35 PM, you wrote:

Pdc> P@draigBrady.com wrote:
>> Hi,
>> 
>> I'm testing netfilter performance here on
>> PIII 1.2GHz based systems. With default
>> kernel configuration, netfilter is able
>> to process 85,000 pps with 125 rules (all
>> rules matching).
>> 
>> Note the application is just counting.
>> There is no transmitting/forwarding.
>> 
>> Also note the NICs are e100.
>> 
>> So my simple question is: are there any
>> tips for increasing the performance?
>> Hmm, actually the performance seems
>> optimal? Is it only taking 9 instructions
>> per match? 1.2*10^9/(85000*1500) = 9

Pdc> I knew that couldn't be right.
Pdc> That was tested on a dual 1.2GHz,
Pdc> so that should be approx:
Pdc> 2*10^9/(85000*125) = 188 instructions per match.

I am afraid that is not instructions per match
but ticks per match? An instruction can take more than
one tick (clock cycle).


P.Krumins

* Re: performance
From: P @ 2003-06-10  8:49 UTC
  To: Peteris Krumins, netfilter-devel

Peteris Krumins wrote:
> Monday, June 9, 2003, 7:16:35 PM, you wrote:
> 
> Pdc> P@draigBrady.com wrote:
> 
>>>Hi,
>>>
>>>I'm testing netfilter performance here on
>>>PIII 1.2GHz based systems. With default
>>>kernel configuration, netfilter is able
>>>to process 85,000 pps with 125 rules (all
>>>rules matching).
>>>
>>>Note the application is just counting.
>>>There is no transmitting/forwarding.
>>>
>>>Also note the NICs are e100.
>>>
>>>So my simple question is: are there any
>>>tips for increasing the performance?
>>>Hmm, actually the performance seems
>>>optimal? Is it only taking 9 instructions
>>>per match? 1.2*10^9/(85000*1500) = 9
> 
> 
> Pdc> I knew that couldn't be right.
> Pdc> That was tested on a dual 1.2GHz,
> Pdc> so that should be approx:
> Pdc> 2*10^9/(85000*125) = 188 instructions per match.
> 
> I am afraid that is not instructions per match
> but ticks per match? An instruction can take more than
> one tick (clock cycle).

Yes, true. I was making assumptions.
I think the following suggests an average
cycles:instruction ratio of 1.68:
http://www.cs.berkeley.edu/~pattrsn/252S01/Lec18-dynamic3.pdf

Looking again at the system, it's actually 2 x 1.4GHz,
so the instructions per match are about:
((2.8*10^9)/1.68)/(85000*125) = 156

With the overhead associated with SMP/kernel/...
this suggests a figure closer to 120?
I'm very impressed.
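
The arithmetic above can be reproduced with a quick sketch. The function and parameter names are illustrative only, and both the summed clock rate of the two CPUs and the average CPI of 1.68 are the assumptions made in this message, not measured values.

```python
def instructions_per_match(total_hz, pps, rules, cpi):
    """Estimate instructions spent per rule match.

    total_hz: clock cycles available per second (assumed: both CPUs summed)
    pps:      packets processed per second
    rules:    rules evaluated per packet (every rule is traversed)
    cpi:      assumed average cycles per instruction
    """
    cycles_per_match = total_hz / (pps * rules)
    return cycles_per_match / cpi


estimate = instructions_per_match(total_hz=2.8e9, pps=85_000, rules=125, cpi=1.68)
print(f"{estimate:.1f} instructions per match")
```

Changing `total_hz` to a single CPU's clock rate halves the estimate, which shows how sensitive the figure is to that assumption.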

So to scale I would have to organise the rules
into chains that could be bypassed. I wonder, are
there any projects to do this automatically? Hmm..
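
As a toy illustration of why bypassable chains would help, the sketch below counts rule evaluations per packet for a flat list versus rules split into equal-sized chains, each reached through one dispatch rule. The model (uniform traffic, exactly one relevant chain per packet, hypothetical function names) is an assumption for illustration, not a description of how iptables evaluates rules internally.

```python
def flat_cost(n_rules):
    """Every packet traverses every rule in a single chain."""
    return n_rules


def chained_cost(n_rules, n_chains):
    """Average rule evaluations when rules are split into n_chains groups.

    Assumes each packet belongs to exactly one group, all groups are equally
    likely, and dispatch rules are checked in order until one of them jumps.
    """
    per_chain = n_rules / n_chains
    avg_dispatch = (n_chains + 1) / 2  # expected number of dispatch rules checked
    return avg_dispatch + per_chain


print(flat_cost(125))
print(chained_cost(125, 5))
```

Under these assumptions the gain depends entirely on how evenly traffic spreads across the chains and on how cheap the dispatch rules are.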

Pádraig.

* Re: performance
From: Harald Welte @ 2003-06-11 12:16 UTC
  To: P; +Cc: Peteris Krumins, netfilter-devel

On Tue, Jun 10, 2003 at 09:49:54AM +0100, P@draigBrady.com wrote:
 
> Yes, true. I was making assumptions.
> I think the following suggests an average
> cycles:instruction ratio of 1.68:
> http://www.cs.berkeley.edu/~pattrsn/252S01/Lec18-dynamic3.pdf
> 
> Looking again at the system, it's actually 2 x 1.4GHz,
> so the instructions per match are about:
> ((2.8*10^9)/1.68)/(85000*125) = 156
> 
> With the overhead associated with SMP/kernel/...
> this suggests a figure closer to 120?
> I'm very impressed.

I don't know what kind of weird calculation that would be.  First of
all, why the heck are you adding up the clock rates of your CPUs?  Do
you have any idea how SMP systems work?

And then, why do you multiply the pps rate by the size of the packets?
Do you think we process every byte of a packet individually?

Let's make a simple calculation for the UP case:

average clock ticks per processed packet:
1.4*10^9/85000 = 16470
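
As a rough sketch of this budget (one CPU at 1.4 GHz, as assumed above): the per-rule figure is an added illustration that divides by the 125 rules mentioned earlier in the thread and treats the whole budget as rule matching, so it is only an upper bound.

```python
CLOCK_HZ = 1.4e9   # single CPU (UP case)
PPS = 85_000       # packets per second reported in the thread
RULES = 125        # rule count from the original test (assumption for this bound)

cycles_per_packet = CLOCK_HZ / PPS
# Upper bound only: ignores driver, IRQ, skb allocation and all other per-packet work.
cycles_per_rule_upper_bound = cycles_per_packet / RULES

print(f"{cycles_per_packet:.0f} cycles per packet")
print(f"at most {cycles_per_rule_upper_bound:.0f} cycles per rule")
```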

In order to get any idea about how those clock ticks are distributed
among the various parts of kernel networking, you need to use some means
of profiling.

I advise you to dig into profiling, and exploit all general kernel
networking optimizations (like NAPI, ...) before starting to think about
optimizing iptables.

> So to scale I would have to organise the rules
> into chains that could be bypassed. I wonder, are
> there any projects to do this automatically? Hmm..

no.

> Pádraig.

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

* Re: performance
From: P @ 2003-06-12 12:04 UTC
  To: Harald Welte; +Cc: Peteris Krumins, netfilter-devel

Harald Welte wrote:
> In order to get any idea about how those clock ticks are distributed
> among the various parts of kernel networking, you need to use some means
> of profiling.

iptables is built as modules, so I can't profile it at the moment.
But this is informative. This is with 160 match-any rules in the
mangle::PREROUTING chain, after which the packets are just dropped.
The packet rate is again 85Kpps.

  14754 total                                      0.0096
  11421 default_idle                             142.7625 (77% !!)
    537 handle_IRQ_event                           3.3563
    341 eth_type_trans                             1.7760
    298 ip_rcv                                     0.4233
    238 do_gettimeofday                            1.8594
    224 netif_rx                                   0.5185
    217 add_timer_randomness                       0.9042
    196 skb_release_data                           1.3611
    180 batch_entropy_store                        1.0227
    162 alloc_skb                                  0.3375
    157 process_backlog                            0.5164
    137 kfree                                      0.7784
    126 __kmem_cache_alloc                         0.3937
    109 netif_receive_skb                          0.2004
     85 nf_hook_slow                               0.1771
     66 kmalloc                                    0.8250
     61 kfree_skbmem                               0.4766
     50 __constant_c_and_count_memset              0.3125
     36 __kfree_skb                                0.0978
     28 nf_iterate                                 0.1750
     16 get_sample_stats                           0.1250
     15 add_entropy_words                          0.0852
     14 ip_promisc_rcv_finish                      0.2917
     10 kmem_cache_free                            0.0625
      8 add_interrupt_randomness                   0.1250
      6 __generic_copy_to_user                     0.0625
      4 schedule                                   0.0030
      2 net_rx_action                              0.0057
      2 do_softirq                                 0.0089

The main question is: why all the idle time?
Note userspace is locked out at this packet rate.
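
A small sketch for turning profile output like the above into per-function shares of the total ticks. The parsing assumes the three-column layout shown (tick count, symbol, normalised load); only a few rows are repeated here, and the helper name is made up for illustration.

```python
PROFILE = """\
  14754 total                                      0.0096
  11421 default_idle                             142.7625
    537 handle_IRQ_event                           3.3563
    341 eth_type_trans                             1.7760
     85 nf_hook_slow                               0.1771
     28 nf_iterate                                 0.1750
"""


def tick_shares(text):
    """Return {symbol: fraction of total ticks} for every non-total row."""
    ticks = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ticks[fields[1]] = int(fields[0])
    total = ticks.pop("total")
    return {name: count / total for name, count in ticks.items()}


shares = tick_shares(PROFILE)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<20} {share:6.1%}")
```

On these numbers the netfilter hook functions visible in the profile (nf_hook_slow, nf_iterate) together account for under 1% of the sampled ticks, although the iptables modules themselves are not profiled here.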

system specs:
    dual 1.4GHz PIII
    e1000 nic
    kernel 2.4.20

Pádraig.
