* performance
From: P @ 2003-06-09 16:12 UTC (permalink / raw)
To: netfilter-devel
Hi,
I'm testing netfilter performance here on
PIII 1.2GHz based systems. With default
kernel configuration, netfilter is able
to process 85,000 pps with 125 rules (all
rules matching).
Note the application is just counting.
There is no transmitting/forwarding.
Also note the NICs are e100.
So my simple question is: are there any
tips for increasing the performance?
Hmm, actually the performance seems
optimal? Is it only taking 9 instructions
per match? 1.2*10^9/(85000*1500) = 9
thanks,
Pádraig.
* Re: performance
From: P @ 2003-06-09 16:16 UTC (permalink / raw)
To: netfilter-devel
P@draigBrady.com wrote:
> Hi,
>
> I'm testing netfilter performance here on
> PIII 1.2GHz based systems. With default
> kernel configuration, netfilter is able
> to process 85,000 pps with 125 rules (all
> rules matching).
>
> Note the application is just counting.
> There is no transmitting/forwarding.
>
> Also note the NICs are e100.
>
> So my simple question is: are there any
> tips for increasing the performance?
> Hmm, actually the performance seems
> optimal? Is it only taking 9 instructions
> per match? 1.2*10^9/(85000*1500) = 9
I knew that couldn't be right.
That was tested on a dual 1.2GHz,
so that should be approx:
2*10^9/(85000*125) = 188 instructions per match.
I guess that's pretty optimal?
The best I could hope for after that
would be to increase the rx packet
buffer space so as to handle higher
spikes than this.
cheers,
Pádraig.
* Re[2]: performance
From: Peteris Krumins @ 2003-06-09 18:27 UTC (permalink / raw)
To: P; +Cc: netfilter-devel
Monday, June 9, 2003, 7:16:35 PM, you wrote:
Pdc> P@draigBrady.com wrote:
>> Hi,
>>
>> I'm testing netfilter performance here on
>> PIII 1.2GHz based systems. With default
>> kernel configuration, netfilter is able
>> to process 85,000 pps with 125 rules (all
>> rules matching).
>>
>> Note the application is just counting.
>> There is no transmitting/forwarding.
>>
>> Also note the NICs are e100.
>>
>> So my simple question is: are there any
>> tips for increasing the performance?
>> Hmm, actually the performance seems
>> optimal? Is it only taking 9 instructions
>> per match? 1.2*10^9/(85000*1500) = 9
Pdc> I knew that couldn't be right.
Pdc> That was tested on a dual 1.2GHz,
Pdc> so that should be approx:
Pdc> 2*10^9/(85000*125) = 188 instructions per match.
I am afraid that is not instructions per match
but ticks per match? An instruction can take more than
one tick (clock cycle).
P.Krumins
* Re: performance
From: P @ 2003-06-10 8:49 UTC (permalink / raw)
To: Peteris Krumins, netfilter-devel
Peteris Krumins wrote:
> Monday, June 9, 2003, 7:16:35 PM, you wrote:
>
> Pdc> P@draigBrady.com wrote:
>
>>>Hi,
>>>
>>>I'm testing netfilter performance here on
>>>PIII 1.2GHz based systems. With default
>>>kernel configuration, netfilter is able
>>>to process 85,000 pps with 125 rules (all
>>>rules matching).
>>>
>>>Note the application is just counting.
>>>There is no transmitting/forwarding.
>>>
>>>Also note the NICs are e100.
>>>
>>>So my simple question is: are there any
>>>tips for increasing the performance?
>>>Hmm, actually the performance seems
>>>optimal? Is it only taking 9 instructions
>>>per match? 1.2*10^9/(85000*1500) = 9
>
>
> Pdc> I knew that couldn't be right.
> Pdc> That was tested on a dual 1.2GHz,
> Pdc> so that should be approx:
> Pdc> 2*10^9/(85000*125) = 188 instructions per match.
>
> I am afraid that is not instructions per match
> but ticks per match? An instruction can take more than
> one tick (clock cycle).
yes true. I was making assumptions.
The following suggests the average cycles:instruction
ratio is 1.68 I think:
http://www.cs.berkeley.edu/~pattrsn/252S01/Lec18-dynamic3.pdf
Looking again at the system, it's actually 2 x 1.4GHz.
So the instructions per match are about:
((2.8*10^9)/1.68)/(85000*125) = 156
With the overhead associated with SMP/kernel/...
this suggests a figure closer to 120?
I'm very impressed.
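The estimate above can be reproduced with a short sketch. The function and parameter names are illustrative only. The 1.68 cycles-per-instruction figure is the one read from the linked lecture notes, and summing the clock rates of both CPUs is this message's assumption, not a measured property of the system.

```python
def instructions_per_match(total_hz, cpi, pps, rules):
    """Rough upper bound on instructions available per rule match.

    total_hz: aggregate clock rate assumed to be available (Hz)
    cpi:      assumed average cycles per instruction
    pps:      packets processed per second
    rules:    rules evaluated per packet
    """
    cycles_per_match = total_hz / (pps * rules)
    return cycles_per_match / cpi


# Figures from this message: 2 x 1.4GHz, CPI 1.68, 85,000 pps, 125 rules.
estimate = instructions_per_match(2.8e9, 1.68, 85_000, 125)
print(f"{estimate:.1f} instructions per match")
```

The result lands between 156 and 157, which the message truncates to 156. It is an upper bound at best, since it charges every available cycle to rule matching.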
So to scale I would have to organise the rules
into chains that could be bypassed. I wonder, are
there any projects that do this automatically? Hmm..
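The idea of bypassable chains can be illustrated with a toy model that only counts rule tests. This is not iptables code: the integer "rules", the range-based selectors and the group size of 25 are all hypothetical, and the model assumes each packet matches exactly one rule, unlike the all-rules-match test described earlier in the thread.

```python
def walk_flat(rules, packet):
    """Count rules tested in a single linear chain until one matches."""
    tested = 0
    for rule in rules:
        tested += 1
        if rule == packet:
            break
    return tested


def walk_grouped(groups, packet):
    """Count rules tested when a top-level chain jumps to sub-chains.

    groups is a list of (selector, sub_rules) pairs; selector decides
    whether the packet should enter that sub-chain at all.
    """
    tested = 0
    for selector, sub_rules in groups:
        tested += 1  # the jump rule in the top-level chain
        if selector(packet):
            tested += walk_flat(sub_rules, packet)
            break
    return tested


rules = list(range(125))  # 125 hypothetical rules, one per packet class
groups = [
    (lambda p, lo=lo: lo <= p < lo + 25, rules[lo:lo + 25])
    for lo in range(0, 125, 25)
]

worst_case_packet = 124  # matches only the last rule
print(walk_flat(rules, worst_case_packet))      # rules tested, flat chain
print(walk_grouped(groups, worst_case_packet))  # rules tested, grouped chains
```

In this model the worst case drops from 125 tests to 30 (5 jump rules plus 25 rules in the selected sub-chain). The saving depends entirely on being able to write cheap selectors that partition the rule set.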
Pádraig.
* Re: performance
From: Harald Welte @ 2003-06-11 12:16 UTC (permalink / raw)
To: P; +Cc: Peteris Krumins, netfilter-devel
On Tue, Jun 10, 2003 at 09:49:54AM +0100, P@draigBrady.com wrote:
> yes true. I was making assumptions.
> The following suggests the average cycles:instruction
> ratio is 1.68 I think:
> http://www.cs.berkeley.edu/~pattrsn/252S01/Lec18-dynamic3.pdf
>
> Looking again at the system, it's actually 2 x 1.4GHz.
> So the instructions per match are about:
> ((2.8*10^9)/1.68)/(85000*125) = 156
>
> With the overhead associated with SMP/kernel/...
> this suggests a figure closer to 120?
> I'm very impressed.
I don't know what kind of weird calculation that would be. First of
all, why the heck are you adding up the clock rates of your CPUs? Do
you have any idea how SMP systems work?
And then, why do you multiply the pps rate by the size of the packets?
Do you think we process every byte of a packet individually?
Let's make a simple calculation for the UP case:
average clock ticks per processed packet:
1.4*10^9/85000 = 16470
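The uniprocessor figure above can be checked with a minimal sketch. The function name is hypothetical, and the microseconds-per-packet value is an extra derived quantity (the reciprocal of the packet rate), not something stated in the message.

```python
def per_packet_budget(cpu_hz, pps):
    """Return (clock cycles, microseconds) available per packet on one CPU."""
    cycles = cpu_hz / pps
    microseconds = 1e6 / pps
    return cycles, microseconds


# One 1.4GHz CPU handling 85,000 packets per second.
cycles, usec = per_packet_budget(1.4e9, 85_000)
print(f"{cycles:.0f} cycles, {usec:.2f} us per packet")
```

This budget covers the whole receive path (interrupt handling, skb allocation, IP input, netfilter hooks), so it says nothing yet about how much of it the rule traversal consumes.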
In order to get any idea about how those clock ticks are distributed
among the various parts of kernel networking, you need to use some means
of profiling.
I advise you to dig into profiling, and exploit all general kernel
networking optimizations (like NAPI, ...) before starting to think about
optimizing iptables.
> So to scale I would have to organise the rules
> into chains that could be bypassed. I wonder, are
> there any projects that do this automatically? Hmm..
no.
> Pádraig.
--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
* Re: performance
From: P @ 2003-06-12 12:04 UTC (permalink / raw)
To: Harald Welte; +Cc: Peteris Krumins, netfilter-devel
Harald Welte wrote:
> In order to get any idea about how those clock ticks are distributed
> among the various parts of kernel networking, you need to use some means
> of profiling.
iptables is built as modules, so I can't profile it at the moment.
But this is informative. This is with 160 match-any rules in the
mangle::PREROUTING chain, after which the packets are simply dropped.
The packet rate is again 85Kpps.
14754 total                            0.0096
11421 default_idle                   142.7625 (71% !!)
  537 handle_IRQ_event                 3.3563
  341 eth_type_trans                   1.7760
  298 ip_rcv                           0.4233
  238 do_gettimeofday                  1.8594
  224 netif_rx                         0.5185
  217 add_timer_randomness             0.9042
  196 skb_release_data                 1.3611
  180 batch_entropy_store              1.0227
  162 alloc_skb                        0.3375
  157 process_backlog                  0.5164
  137 kfree                            0.7784
  126 __kmem_cache_alloc               0.3937
  109 netif_receive_skb                0.2004
   85 nf_hook_slow                     0.1771
   66 kmalloc                          0.8250
   61 kfree_skbmem                     0.4766
   50 __constant_c_and_count_memset    0.3125
   36 __kfree_skb                      0.0978
   28 nf_iterate                       0.1750
   16 get_sample_stats                 0.1250
   15 add_entropy_words                0.0852
   14 ip_promisc_rcv_finish            0.2917
   10 kmem_cache_free                  0.0625
    8 add_interrupt_randomness         0.1250
    6 __generic_copy_to_user           0.0625
    4 schedule                         0.0030
    2 net_rx_action                    0.0057
    2 do_softirq                       0.0089
The main question is: why all the idle time?
Note that userspace is locked out at this packet rate.
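To put the listing in proportion, a small sketch can turn the tick counts into shares of the total. It assumes each line starts with a tick count followed by a symbol name, as in the listing above, and uses only a few of those lines as sample input. The function name is hypothetical.

```python
SAMPLE = """\
14754 total 0.0096
11421 default_idle 142.7625
537 handle_IRQ_event 3.3563
85 nf_hook_slow 0.1771
28 nf_iterate 0.1750
"""


def tick_shares(profile_text):
    """Map each symbol to its fraction of the 'total' tick count."""
    ticks = {}
    for line in profile_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ticks[fields[1]] = int(fields[0])
    total = ticks.pop("total")
    return {name: count / total for name, count in ticks.items()}


shares = tick_shares(SAMPLE)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {share:6.1%}")
```

Computed this way, default_idle is 11421 of 14754 ticks, roughly 77% of the listed total and somewhat above the 71% noted inline. The two netfilter symbols visible in the listing (nf_hook_slow and nf_iterate) together account for under 1%, though the iptables modules themselves are not profiled here.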
system specs:
dual 1.4GHz PIII
e1000 NIC
kernel 2.4.20
Pádraig.