linux-kernel.vger.kernel.org archive mirror
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:31   ` [ANNOUNCE] NF-HIPAC: High Performance Packet Classification Andi Kleen
@ 2002-09-26  0:29     ` David S. Miller
  2002-09-26  0:46       ` Andi Kleen
  2002-09-26  9:00       ` Roberto Nibali
  2002-09-26  1:17     ` Nivedita Singhvi
  1 sibling, 2 replies; 27+ messages in thread
From: David S. Miller @ 2002-09-26  0:29 UTC (permalink / raw)
  To: ak; +Cc: niv, linux-kernel

   From: Andi Kleen <ak@suse.de>
   Date: 26 Sep 2002 02:31:13 +0200

   "David S. Miller" <davem@redhat.com> writes:
   >    
   > In fact the exact opposite: such a suggested flow cache is about
   > as parallel as you can make it.
   
   It sounds more like it would include the FIB too.

That's the second level cache, not the top level lookup, which
is what hits 99% of the time.
   
   The current FIBs have a bit heavier locking, at least. Fine-grained
   locking of btrees is also not easy/nice.
   
Also not necessary; only the top level cache really needs to be
top performance.


* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
       [not found] ` <20020925.170336.77023245.davem@redhat.com.suse.lists.linux.kernel>
@ 2002-09-26  0:31   ` Andi Kleen
  2002-09-26  0:29     ` David S. Miller
  2002-09-26  1:17     ` Nivedita Singhvi
  0 siblings, 2 replies; 27+ messages in thread
From: Andi Kleen @ 2002-09-26  0:31 UTC (permalink / raw)
  To: David S. Miller; +Cc: niv, linux-kernel

"David S. Miller" <davem@redhat.com> writes:
>    
> In fact the exact opposite: such a suggested flow cache is about
> as parallel as you can make it.

It sounds more like it would include the FIB too.

> I don't understand why you think using the routing tables to their
> full potential would imply serialization.  If you still believe this
> you have to describe why in more detail.

I guess he's thinking of the FIB, not the routing cache.

The current FIBs have a bit heavier locking, at least. Fine-grained
locking of btrees is also not easy/nice.

-Andi



* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:46       ` Andi Kleen
@ 2002-09-26  0:44         ` David S. Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David S. Miller @ 2002-09-26  0:44 UTC (permalink / raw)
  To: ak; +Cc: niv, linux-kernel

   From: Andi Kleen <ak@suse.de>
   Date: Thu, 26 Sep 2002 02:46:45 +0200

   > Also not necessary; only the top level cache really needs to be
   > top performance.
   
   Sure, but if they were unified (that is what I understood the original
   poster wanted to do) then they would suddenly be much more performance
   critical and need fine-grained locking.
   
This can be made, if necessary.  If the toplevel flow cache lookup
table is sized appropriately, I doubt anything will be needed.
   
   P.S.: One big performance problem currently is ip_conntrack. It has a bad
   hash function and tends to have too big a working set (beyond cache size).
   Some tuning in this regard would help a lot of workloads.
   
This is a well-understood problem and a fix is in the works.
See the netfilter lists.


* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:29     ` David S. Miller
@ 2002-09-26  0:46       ` Andi Kleen
  2002-09-26  0:44         ` David S. Miller
  2002-09-26  9:00       ` Roberto Nibali
  1 sibling, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2002-09-26  0:46 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, niv, linux-kernel

On Wed, Sep 25, 2002 at 05:29:31PM -0700, David S. Miller wrote:
>    The current FIBs have a bit heavier locking, at least. Fine-grained
>    locking of btrees is also not easy/nice.
>    
> Also not necessary; only the top level cache really needs to be
> top performance.

Sure, but if they were unified (that is what I understood the original
poster wanted to do) then they would suddenly be much more performance
critical and need fine-grained locking.

-Andi

P.S.: One big performance problem currently is ip_conntrack. It has a bad
hash function and tends to have too big a working set (beyond cache size).
Some tuning in this regard would help a lot of workloads.
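
To illustrate the shape of the problem (purely a sketch, not the
ip_conntrack code nor its eventual fix): a tuple hash that mixes in a
boot-time random seed and multiplies by a large odd constant spreads
connections over the buckets far better than a naive fold:

/* Illustrative only: a seeded multiplicative hash over a conntrack-
 * style tuple.  Not the real ip_conntrack code or its fix. */
#include <stdint.h>

struct tuple {
        uint32_t saddr, daddr;
        uint16_t sport, dport;
        uint8_t  proto;
};

static uint32_t hash_seed;      /* randomized once at startup */

static uint32_t tuple_hash(const struct tuple *t, uint32_t nbuckets)
{
        uint32_t h = hash_seed;

        /* Multiplying by a large odd constant after mixing in each
         * word spreads neighbouring addresses/ports across buckets. */
        h = (h ^ t->saddr) * 2654435761u;
        h = (h ^ t->daddr) * 2654435761u;
        h = (h ^ (((uint32_t)t->sport << 16) | t->dport)) * 2654435761u;
        h = (h ^ t->proto) * 2654435761u;
        h ^= h >> 16;           /* final mix */
        return h % nbuckets;
}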



* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  1:17     ` Nivedita Singhvi
@ 2002-09-26  1:15       ` Andi Kleen
  0 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2002-09-26  1:15 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: Andi Kleen, David S. Miller, linux-kernel

On Wed, Sep 25, 2002 at 06:17:58PM -0700, Nivedita Singhvi wrote:
> Andi Kleen wrote:
> 
> > I guess he's thinking of the FIB, not the routing cache.
> 
> I was, + chain expansion, but this is just (um, cough)
> to s/he/she

I was actually thinking about the first poster in the thread (it was a 'he'
IIRC).

But thanks for the correction anyway :-)

-Andi


* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:31   ` [ANNOUNCE] NF-HIPAC: High Performance Packet Classification Andi Kleen
  2002-09-26  0:29     ` David S. Miller
@ 2002-09-26  1:17     ` Nivedita Singhvi
  2002-09-26  1:15       ` Andi Kleen
  1 sibling, 1 reply; 27+ messages in thread
From: Nivedita Singhvi @ 2002-09-26  1:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David S. Miller, linux-kernel

Andi Kleen wrote:

> I guess he's thinking of the FIB, not the routing cache.

I was, + chain expansion, but this is just (um, cough)
to s/he/she

:)
thanks,
Nivedita

> The current FIBs have a bit heavier locking at least. Fine grain locking
> btrees is also not easy/nice.
> 
> -Andi


* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:29     ` David S. Miller
  2002-09-26  0:46       ` Andi Kleen
@ 2002-09-26  9:00       ` Roberto Nibali
  2002-09-26  9:06         ` David S. Miller
                           ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Roberto Nibali @ 2002-09-26  9:00 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, niv, linux-kernel, jamal

Hello DaveM and others,

>    It sounds more like it would include the FIB too.
> 
> That's the second level cache, not the top level lookup, which
> is what hits 99% of the time.

I've done extensive testing in this field trying to achieve fast packet 
filtering with a huge set of unordered rules loaded into the kernel.

According to my findings I had reason to believe that after around 1000 
rules for ipchains and around 4800 rules for iptables the L2 cache was 
the limiting factor (of course given the slowish iptables/conntrack 
table lookup).

Those are the rule thresholds I achieved with a PIII Tualatin and 512KB 
L2 cache. With a sluggish Celeron with (I think) 128KB L2 cache I 
achieved about 1/8 of the above threshold. That's why I thought the L2 
cache plays a bigger role in this than the CPU FSB clock.

I concluded that if the ruleset to be matched exceeds what can be 
loaded into the L2 cache, we see cache thrashing, and that's why 
performance goes right to hell. I wanted to test this using oprofile 
but haven't found the correct CPU performance counter yet :).

> Also not necessary; only the top level cache really needs to be
> top performance.

I will do a new round of testing this weekend for a speech I'll be 
giving. This time I will include ipchains, iptables (of course I am 
willing to apply every interesting patch regarding hash table 
optimisation and whatnot you want me to test), nf-hipac, the OpenBSD pf 
and of course the work done by Jamal.

Dave, is the work Jamal did (and I think Werner and others did some 
too) before and mostly during OLS, and probably still now, the work 
you're referring to? Hadi showed it to me at OLS and I saw great 
potential in it.

I'm asking because the company I work for builds rather big packet 
filters (with up to 24 NICs per node) for special purpose networks. 
Because of policies and automated ruleset generation (mapping a port 
matrix onto a weighted graph and then extrapolating the ruleset with 
basic algebra: Dijkstra and all this cruft), these generate a huge set 
of rules. Two problems we're facing on a daily basis:

o we can't filter more than 13Mbit/s anymore after loading around 3000
   rules into the kernel (problem is gone with nf-hipac for example).
o we can't log all the messages we would like to because the user space
   log daemon (syslog-ng in our case, but we've tried others too) doesn't
   get enough CPU time anymore to read the buffer before it is over-
   written by the printk's again. This leads to log entry loss growing
   almost proportionally to N^2 with an increasing number of rules that
   do not match. This is the worst thing that can happen to you working
   in the security business: not having an appropriate log trace during
   a possible incident.

AFAICR Jamal modified the routing and FIB code and hacked iproute2 to 
achieve that. We spoke about this at OLS. Until I saw his code, my 
approach to testing the speed was to (don't laugh):

o blackhole everything (POLICY DROP)
o generate routing rules (selectors) for matching packets
o add routes which would allow just that specific flow into the
   according routing tables
o '-j <CHAIN>' was implemented using bounce table walking

This was just a test to see the potential speed improvement of moving 
the most simplistic things from netfilter (like raw packet filtering 
without conntrack and ports) a 'layer' down to the routing code. A lot 
of work has to be done in this field, and the filtering code is just 
about the simplest part AFAICT; but conntrack and proper n:m NAPT 
incorporated into the routing code is IMHO a tricky thing.

Best regards,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc



* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  9:00       ` Roberto Nibali
@ 2002-09-26  9:06         ` David S. Miller
  2002-09-26  9:24           ` Roberto Nibali
                             ` (2 more replies)
  2002-09-26 12:04         ` Andi Kleen
  2002-09-30 17:36         ` Bill Davidsen
  2 siblings, 3 replies; 27+ messages in thread
From: David S. Miller @ 2002-09-26  9:06 UTC (permalink / raw)
  To: ratz; +Cc: ak, niv, linux-kernel, hadi

   From: Roberto Nibali <ratz@drugphish.ch>
   Date: Thu, 26 Sep 2002 11:00:53 +0200

   Hello DaveM and others,
   
   > That's the second level cache, not the top level lookup, which
   > is what hits 99% of the time.
 ...   
   the L2 cache was the limiting factor

I'm not talking about the CPU's second level cache; I'm talking about
a second level lookup table that backs up a front end routing
hash.  A software data structure.
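
As a rough sketch of the shape (all names invented, not code from any
tree): probe the front end hash first and consult the general second
level structure only on a miss:

/* Sketch of a two-level flow lookup: a cheap front-end hash answers
 * the vast majority of lookups; misses fall through to the slower,
 * fully general second-level table and the result is promoted. */
typedef unsigned int u32;

#define FRONT_HASH_SIZE 4096                    /* power of two */

struct flow_entry {
        u32 saddr, daddr;                       /* lookup key */
        void *result;                           /* route/verdict/... */
        struct flow_entry *next;                /* hash chain */
};

extern struct flow_entry *front_hash[FRONT_HASH_SIZE];
extern void *second_level_lookup(u32 saddr, u32 daddr);  /* FIB-like */
extern void front_hash_insert(u32 saddr, u32 daddr, void *result);

void *flow_lookup(u32 saddr, u32 daddr)
{
        unsigned int h = (saddr ^ daddr) & (FRONT_HASH_SIZE - 1);
        struct flow_entry *e;
        void *result;

        for (e = front_hash[h]; e; e = e->next) /* the common path */
                if (e->saddr == saddr && e->daddr == daddr)
                        return e->result;

        result = second_level_lookup(saddr, daddr);  /* the rare miss */
        front_hash_insert(saddr, daddr, result);     /* promote */
        return result;
}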

You are talking about a lot of independent things, but I'm going
to defer my contributions until we have actual code people can
start plugging netfilter into if they want.

About using syslog to record messages: that is doomed to failure.
Implement log messages via netlink and use that to log the events
instead.


* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  9:24           ` Roberto Nibali
@ 2002-09-26  9:21             ` David S. Miller
  2002-09-26 15:13             ` James Morris
  1 sibling, 0 replies; 27+ messages in thread
From: David S. Miller @ 2002-09-26  9:21 UTC (permalink / raw)
  To: ratz; +Cc: ak, niv, linux-kernel, hadi

   From: Roberto Nibali <ratz@drugphish.ch>
   Date: Thu, 26 Sep 2002 11:24:19 +0200
   
   Fair enough. I'm looking forward to seeing this framework. Any release 
   schedules or rough plans?
   
None whatsoever, as it should be.

Franks a lot,
David S. Miller
davem@redhat.com


* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  9:06         ` David S. Miller
@ 2002-09-26  9:24           ` Roberto Nibali
  2002-09-26  9:21             ` David S. Miller
  2002-09-26 15:13             ` James Morris
  2002-09-26 10:25           ` Roberto Nibali
  2002-09-26 12:03           ` jamal
  2 siblings, 2 replies; 27+ messages in thread
From: Roberto Nibali @ 2002-09-26  9:24 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, niv, linux-kernel, hadi

> I'm not talking about the CPU's second level cache; I'm talking about
> a second level lookup table that backs up a front end routing
> hash.  A software data structure.

Doh! Sorry for my confusion; I guess I wasn't reading your posting too 
carefully. I understand the software architecture part now. Nevertheless, 
one day or another you will need to face the caching issue too, unless 
your data structure always fits entirely into the cache. Or am I 
completely off track again?

> You are talking about a lot of independent things, but I'm going
> to defer my contributions until we have actual code people can
> start plugging netfilter into if they want.

Fair enough. I'm looking forward to seeing this framework. Any release 
schedules or rough plans?

> About using syslog to record messages: that is doomed to failure.
> Implement log messages via netlink and use that to log the events
> instead.

Yes, we're doing tests in this field now (as with evlog), but it seems 
from preliminary testing that netlink transport of binary data is not 
100% reliable either. However, I will refrain from posting further 
assumptions until we've done our tests and until we can post useful 
results and facts in this field.

Thanks and cheers,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc



* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26 10:25           ` Roberto Nibali
@ 2002-09-26 10:20             ` David S. Miller
  2002-09-26 10:49               ` Roberto Nibali
  0 siblings, 1 reply; 27+ messages in thread
From: David S. Miller @ 2002-09-26 10:20 UTC (permalink / raw)
  To: ratz; +Cc: ak, niv, linux-kernel, hadi

   From: Roberto Nibali <ratz@drugphish.ch>
   Date: Thu, 26 Sep 2002 12:25:20 +0200

   <maybe stupid thought>
   Another thing would be to use netconsole to send event messages over the 

What if the netconsole packets cause events to be logged?


* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  9:06         ` David S. Miller
  2002-09-26  9:24           ` Roberto Nibali
@ 2002-09-26 10:25           ` Roberto Nibali
  2002-09-26 10:20             ` David S. Miller
  2002-09-26 12:03           ` jamal
  2 siblings, 1 reply; 27+ messages in thread
From: Roberto Nibali @ 2002-09-26 10:25 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, niv, linux-kernel, hadi

> About using syslog to record messages: that is doomed to failure.
> Implement log messages via netlink and use that to log the events
> instead.

<maybe stupid thought>
Another thing would be to use netconsole to send event messages over the 
network to a central loghost. This would eliminate the buffer overwrite 
problem unless you send more messages than the backlog queue can hold 
before the packets are processed. But you could theoretically send 
10 MB of messages per second, and those could also be stored.
</maybe stupid thought>

I will shut up now as I do not want to waste your and the others' 
precious time with my extensive schmoozing ;).

Best regards,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc



* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26 10:20             ` David S. Miller
@ 2002-09-26 10:49               ` Roberto Nibali
  0 siblings, 0 replies; 27+ messages in thread
From: Roberto Nibali @ 2002-09-26 10:49 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, niv, linux-kernel, hadi

> What if the netconsole packets cause events to be logged?

Oops! Well, you could send them via printk to the internal printk buffer, 
which then gets fetched by the local syslog, and then you can decide what 
to do, since those messages should never fill up the buffer more quickly 
than syslog is able to drain them. Actually, you should then rate-limit 
the printk messages and probably also increase the buffer size.
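
The rate limiting itself is cheap with the stock 2.4 helper
net_ratelimit(); a minimal sketch (the function and message here are
made up):

/* Sketch of rate-limited packet logging; net_ratelimit() and
 * NIPQUAD() are the stock 2.4 facilities, the rest is invented. */
#include <linux/kernel.h>
#include <linux/net.h>
#include <linux/types.h>

static void log_dropped_packet(__u32 saddr)
{
        /* net_ratelimit() returns false once the caller exceeds the
         * configured burst, so a packet flood cannot monopolize the
         * printk buffer. */
        if (net_ratelimit())
                printk(KERN_WARNING "filter: drop from %u.%u.%u.%u\n",
                       NIPQUAD(saddr));
}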

But to be honest, those are not the usual messages that fill up the 
buffer so fast that syslog is not able to read them before the 
buffer gets overwritten again. Of course you will then have a logfile 
inconsistency, and that could just as well be counted as a loss of trace.

But generally I agree that you're standing there with your pants down; 
for example, if you have a filter rule in the routing (or wherever) code 
that doesn't permit the packets to leave the machine, so they get 
dropped. Ah well ... it was worth a try.

[Hmpf, my colleague is testing this right now, but I think I can tell him 
to stop because you're always biting your own tail with this approach, one 
way or another.]

Thanks for the valuable input and best regards,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc



* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  9:06         ` David S. Miller
  2002-09-26  9:24           ` Roberto Nibali
  2002-09-26 10:25           ` Roberto Nibali
@ 2002-09-26 12:03           ` jamal
  2002-09-26 20:23             ` Roberto Nibali
  2 siblings, 1 reply; 27+ messages in thread
From: jamal @ 2002-09-26 12:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: ratz, ak, niv, linux-kernel, netdev


It would be nice if people would start ccing networking-related
discussions to netdev. I missed the first part of the discussion,
but i take it the NF-HIPAC people posted a patch. BTW, I emailed the
authors when i read the paper but never heard back.
What i wanted was for the authors to compare against one of the tc
classifiers, not iptables.

On Thu, 26 Sep 2002, David S. Miller wrote:

> You are talking about a lot of independent things, but I'm going
> to defer my contributions until we have actual code people can
> start plugging netfilter into if they want.
>

I hacked some code using the traffic control framework around OLS time;
there are a lot of ideas i haven't incorporated yet. Too many hacks, too
little time ;-> I think this is what i may have shown Roberto on my
laptop over a drink.
I probably wouldn't have put this code out if my complaints about
netfilter hadn't been ignored.
And you know what happens when you start writing poetry; I ended up
worrying about more than just the performance problems of iptables. For
example, the code i have now makes it easy to extend the path a packet
takes using simple policies.
The code i have is based around the tc framework. One thing i liked about
netfilter is the idea of targets being separate modules; so the code i
have in fact makes use of netfilter targets.
I plan on revisiting this code at some point, maybe this weekend now that
i am reminded of it ;->
Take a look:
http://www.cyberus.ca/~hadi/patches/action.DESCRIPTION

> About using syslog to record messages: that is doomed to failure.
> Implement log messages via netlink and use that to log the events
> instead.
>

Agreed, you need a netlink to syslog converter.
Netlink is king -- all the policies in the above code are netlink
controlled. All events are also netlink transported. You don't have to send
every little message you see; netlink allows you to batch, and you could
easily do a Nagle-like algorithm. The next step is a distributed version
of netlink...
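
A sketch of that batching idea (invented names; netlink_send() stands
in for whatever transport you have):

/* Nagle-style batching: accumulate small log records and flush them
 * as one message when the buffer fills or a short delay expires. */
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BATCH_MAX   8192        /* flush when this many bytes queued */
#define BATCH_DELAY 1           /* ...or after one second */

static char   batch_buf[BATCH_MAX];
static size_t batch_len;
static time_t batch_started;

extern void netlink_send(const void *buf, size_t len);  /* assumed */

static void batch_flush(void)
{
        if (batch_len) {
                netlink_send(batch_buf, batch_len);
                batch_len = 0;
        }
}

void batch_log(const void *rec, size_t len)
{
        if (len > BATCH_MAX) {                  /* oversized: send alone */
                batch_flush();
                netlink_send(rec, len);
                return;
        }
        if (batch_len + len > BATCH_MAX)
                batch_flush();                  /* no room: flush first */
        if (batch_len == 0)
                batch_started = time(NULL);     /* start the delay clock */
        memcpy(batch_buf + batch_len, rec, len);
        batch_len += len;
        if (time(NULL) - batch_started >= BATCH_DELAY)
                batch_flush();                  /* held long enough */
}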

cheers,
jamal



* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  9:00       ` Roberto Nibali
  2002-09-26  9:06         ` David S. Miller
@ 2002-09-26 12:04         ` Andi Kleen
  2002-09-26 20:49           ` Roberto Nibali
  2002-09-30 17:36         ` Bill Davidsen
  2 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2002-09-26 12:04 UTC (permalink / raw)
  To: Roberto Nibali; +Cc: David S. Miller, ak, niv, linux-kernel, jamal

On Thu, Sep 26, 2002 at 11:00:53AM +0200, Roberto Nibali wrote:
> o we can't filter more than 13Mbit/s anymore after loading around 3000
>   rules into the kernel (problem is gone with nf-hipac for example).

For iptables/ipchains you need to write hierarchical/port range rules 
in this case and try to terminate searches early.

But yes, we also found that the L2 cache is limiting here
(ip_conntrack has the same problem).

> o we can't log all the messages we would like to because the user space
>   log daemon (syslog-ng in our case, but we've tried others too) doesn't
>   get enough CPU time anymore to read the buffer before it is over-
>   written by the printk's again. This leads to log entry loss growing
>   almost proportionally to N^2 with an increasing number of rules that
>   do not match. This is the worst thing that can happen to you working
>   in the security business: not having an appropriate log trace during
>   a possible incident.

At least that is easily fixed: just increase the LOG_BUF_LEN parameter
in kernel/printk.c.
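
For reference, the 2.4-era definition looks roughly like the following
(from memory, so check your tree); keep the value a power of two, since
the buffer is indexed through LOG_BUF_MASK:

/* kernel/printk.c (2.4-era, approximate): enlarging the ring buffer
 * means raising this define; it must stay a power of two because
 * indexes are wrapped with LOG_BUF_MASK. */
#define LOG_BUF_LEN     (131072)        /* stock value is much smaller */
#define LOG_BUF_MASK    (LOG_BUF_LEN - 1)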

Alternatively, don't use the slow printk; use nfnetlink to report bad
packets and print from user space. That should scale much better.

-Andi


* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  9:24           ` Roberto Nibali
  2002-09-26  9:21             ` David S. Miller
@ 2002-09-26 15:13             ` James Morris
  2002-09-26 20:51               ` Roberto Nibali
  1 sibling, 1 reply; 27+ messages in thread
From: James Morris @ 2002-09-26 15:13 UTC (permalink / raw)
  To: Roberto Nibali; +Cc: David S. Miller, Andi Kleen, niv, linux-kernel, jamal

On Thu, 26 Sep 2002, Roberto Nibali wrote:

> Yes, we're doing tests in this field now (as with evlog), but it seems 
> from preliminary testing that netlink transport of binary data is not 
> 100% reliable either.

Non-blocking netlink delivery is reliable, although you can overrun the 
userspace socket buffer (this can be detected, however).  The fundamental 
issue remains: sending more data to userspace than can be handled.

A truly reliable transport would also involve an ack-based protocol.
Under certain circumstances (e.g. log every forwarded packet for audit
purposes), packets would need to be dropped if the logging mechanism
became overloaded.  This would in turn involve some kind of queuing
mechanism and introduce a new set of performance problems.  Reliable
logging is a challenging problem area in general, probably better suited
to dedicated hardware environments where the software can be tuned to
known system capabilities.
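
For what it's worth, the overrun is visible to a plain userspace
reader: when the kernel has to drop messages, recv() fails with
ENOBUFS. A minimal sketch (NETLINK_NFLOG assumed as the protocol,
error handling trimmed):

/* Sketch of a userspace netlink log reader that detects overruns. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void)
{
        struct sockaddr_nl addr;
        char buf[65536];
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_NFLOG);

        memset(&addr, 0, sizeof(addr));
        addr.nl_family = AF_NETLINK;
        addr.nl_groups = 1;     /* multicast group carrying the logs */
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));

        for (;;) {
                int n = recv(fd, buf, sizeof(buf), 0);

                if (n < 0 && errno == ENOBUFS) {
                        /* The kernel dropped messages because our
                         * socket buffer overran: detectable loss. */
                        fprintf(stderr, "lost log messages (ENOBUFS)\n");
                        continue;
                }
                if (n <= 0)
                        break;
                /* walk the nlmsghdr records in buf[0..n) and write
                 * them to disk here */
        }
        close(fd);
        return 0;
}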


- James
-- 
James Morris
<jmorris@intercode.com.au>




* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26 12:03           ` jamal
@ 2002-09-26 20:23             ` Roberto Nibali
  2002-09-27 13:57               ` jamal
  0 siblings, 1 reply; 27+ messages in thread
From: Roberto Nibali @ 2002-09-26 20:23 UTC (permalink / raw)
  To: jamal; +Cc: niv, linux-kernel, netdev

Hello Jamal,

[took out AK and DaveM since I know they both read netdev and this reply 
is not really of any relevance to them]

> It would be nice if people would start ccing networking related
> discussions to netdev. I missed the first part of the discussion
> but i take it the NF-HIPAC posted a patch.. BTW, I emailed the authors

Yes, your assumption is correct and sorry for missing the cc once again.

> when i read the paper but never heard back.
> What i wanted the authors was to compare against one of the tc
> classifiers not iptables.

I will contact you privately on this issue since I'm about to conduct 
tests this weekend.

> I hacked some code using the traffic control framework around OLS time;
> there are a lot of ideas i havent incorporated yet. Too many hacks, too
> little time ;-> I think this is what i may have showed Roberto on my
> laptop over a drink.

Exactly (even wearing a netfilter T-shirt).

> I probably wouldnt have put this code out if my complaints about
> netfilter werent ignored.
> And you know what happens when you start writting poetry, I ended worrying
> more than just about the performance problems of iptables; for example
> the code i have now makes it easy to extend the path a packet takes using
> simple policies.

Great, I remember some of your postings about the netfilter framework.

> The code i have is based around tc framework. One thing i liked about
> netfilter is the idea of targets being separate modules; so the code i
> have infact makes uses of netfilter targets.
> I plan on revisiting this code at some point, maybe this weekend now that
> i am reminded of it ;->

Excellent, this could make it into my test suites as well.

> Take a look:
> http://www.cyberus.ca/~hadi/patches/action.DESCRIPTION

I did; I simply didn't find the time to do anything with it.

> Agreed, you need a netlink to syslog converter.
> Netlink is king -- all the policies in the above code are netlink
> controlled. All events are also netlink transported. You dont have to send
> every little message you see; netlink allows you to batch and you could
> easily do a nagle like algorithm. Next steps are a distributed version
> of netlink..

Is there a code architecture draft somewhere?

Best regards,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc



* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26 12:04         ` Andi Kleen
@ 2002-09-26 20:49           ` Roberto Nibali
  0 siblings, 0 replies; 27+ messages in thread
From: Roberto Nibali @ 2002-09-26 20:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David S. Miller, niv, linux-kernel, jamal, netdev

> For iptables/ipchain you need to write hierarchical/port range rules 
> in this case and try to terminate searchs early.

We're still trying to find the correct mathematical functions to do 
this. Trust me, it is not so easy: mapping the port matrix and the 
network flow through many stacked packet filters and firewalls 
generates a rather complex graph (partly a bigraph; LVS-DR for example) 
with complex structures (redundancy and parallelisation). It's not as 
if we could sit down and write a fw-script for our packet filters by 
hand; the fw-script is generated through a meta-fw layer that knows 
about the surrounding network nodes.

> But yes, we also found that the L2 cache is limiting here
> (ip_conntrack has the same problem).

I think this weekend I will do my tests also measuring some CPU 
performance counters with oprofile, such as DATA_READ_MISS, 
CODE_CACHE_MISS and NONCACHEABLE_MEMORY_READS.

> At least that is easily fixed: just increase the LOG_BUF_LEN parameter
> in kernel/printk.c.

Tests showed that this only helps in peak situations; I think we should 
simply forget about printk().

> Alternatively, don't use the slow printk; use nfnetlink to report bad
> packets and print from user space. That should scale much better.

Yes, and there are a few things my colleague found out during his 
tests (actually pretty straightforward things):

1. A big log buffer is only useful for riding out peaks
2. A big log buffer doesn't help at all under high CPU load
3. The smaller the message, the better (binary logging is thus an
    advantage)
4. Logging via printk() is extremely expensive, because of the
    conversions and whatnot. A rough estimate would be 12500 clock
    cycles per log entry generated by printk(). This means that on a
    PIII/450 a log entry needs 0.000028s, which leads to the following
    observation: at 36000pps, all of which should be logged, you end
    up with a system at 100% CPU load and 0% idle (a quick check of
    this arithmetic follows the list).
5. The kernel should log a binary stream, and so should the daemon
    that fetches the data. If you want to convert the binary to a
    human-readable format, you start a low-priority process or do it
    on demand.
6. Ideally the log daemon should be preemptible so it gets a defined
    time slice to do its job.
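
A quick check of the arithmetic in point 4 (nothing kernel-specific,
just the numbers from above):

/* 12500 cycles per printk() entry on a 450 MHz PIII gives ~28
 * microseconds per entry; at 36000 entries/s that is the whole CPU. */
#include <stdio.h>

int main(void)
{
        double cycles_per_entry = 12500.0;
        double clock_hz = 450e6;                /* PIII/450 */
        double pps = 36000.0;

        double sec_per_entry = cycles_per_entry / clock_hz;
        double cpu_fraction = sec_per_entry * pps;

        printf("%.6f s per entry, %.0f%% CPU at %.0f pps\n",
               sec_per_entry, cpu_fraction * 100.0, pps);
        return 0;
}

which prints 0.000028 s per entry and 100% CPU.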

Some test results from a coworker of mine (Achim Gsell):

Max pkt rate the system can log without losing more than 1% of the messages:
----------------------------------------------------------------------------


kernel:        Linux 2.4.19-gentoo-r7 (low latency scheduling)
(cells marked "-" were not measured)

daemon:        syslog-ng (nice 0), logbufsiz=16k, pkts=10*10000, CPU=PIII/450
packet-len:    64           256          512          1024
               2873 pkt/s   3332 pkt/s   3124 pkt/s   3067 pkt/s
               1.4 Mb/s     6.6 Mb/s     12.2 Mb/s    23.9 Mb/s

daemon:        syslog-ng (nice 0), logbufsiz=16k, pkts=10*10000, CPU=PIVM/1.7
packet-len:    64           256          512          1024
               7808 pkt/s   7807 pkt/s   7806 pkt/s   -
               3.8 Mb/s     15.2 Mb/s    30.5 Mb/s    -

daemon:        cat /proc/kmsg > kernlog, logbufsiz=16k, pkts=10*10000, CPU=PIII/450
packet-len:    64           256          512          1024
               4300 pkt/s   -            -            3076 pkt/s
               2.1 Mb/s     -            -            24.0 Mb/s

daemon:        ulogd (nlbufsize=4k, qthreshold=1), pkts=10*10000, CPU=PIII/450
packet-len:    64           256          512          1024
               4097 pkt/s   -            -            4097 pkt/s
               2.0 Mb/s     -            -            32 Mb/s

daemon:        ulogd (nlbufsize=2^17 - 1, qthreshold=1), pkts=10*10000, CPU=PIII/450
packet-len:    64           256          512          1024
               6576 pkt/s   -            -            5000 pkt/s
               3.2 Mb/s     -            -            38 Mb/s

daemon:        ulogd (nlbufsize=64k, qthreshold=1), pkts=1*10000, CPU=PIII/450
packet-len:    64           256          512          1024
               -            -            -            -
               -            -            -            4.0 Mb/s

daemon:        ulogd (nlbufsize=2^17 - 1, qthreshold=50), pkts=10*10000, CPU=PIII/450
packet-len:    64           256          512          1024
               6170 pkt/s   -            -            5000 pkt/s
               3.0 Mb/s     -            -            38 Mb/s


Best regards,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc



* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26 15:13             ` James Morris
@ 2002-09-26 20:51               ` Roberto Nibali
  0 siblings, 0 replies; 27+ messages in thread
From: Roberto Nibali @ 2002-09-26 20:51 UTC (permalink / raw)
  To: James Morris; +Cc: linux-kernel, jamal

> Non-blocking netlink delivery is reliable, although you can overrun the 
> userspace socket buffer (this can be detected, however).  The fundamental 
> issue remains: sending more data to userspace than can be handled.

Agreed.

> A truly reliable transport would also involve an ack-based protocol.
> Under certain circumstances (e.g. log every forwarded packet for audit
> purposes), packets would need to be dropped if the logging mechanism
> became overloaded.  This would in turn involve some kind of queuing
> mechanism and introduce a new set of performance problems.  Reliable
> logging is a challenging problem area in general, probably better suited
> to dedicated hardware environments where the software can be tuned to
> known system capabilities.

Thanks. I think we'll find a solution that suits us best, and if we 
come up with something we'll let the community know.

Best regards,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc



* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26 20:23             ` Roberto Nibali
@ 2002-09-27 13:57               ` jamal
  0 siblings, 0 replies; 27+ messages in thread
From: jamal @ 2002-09-27 13:57 UTC (permalink / raw)
  To: Roberto Nibali; +Cc: linux-kernel, netdev



On Thu, 26 Sep 2002, Roberto Nibali wrote:

> Is there a code architecture draft somewhere?

You mean for what i posted? Don't you think i already went beyond
the classical open source model by putting out a user guide? ;-> ;->
Just ask me questions in private and i'll try to help.

cheers,
jamal




* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  9:00       ` Roberto Nibali
  2002-09-26  9:06         ` David S. Miller
  2002-09-26 12:04         ` Andi Kleen
@ 2002-09-30 17:36         ` Bill Davidsen
  2002-10-02 17:37           ` Roberto Nibali
  2 siblings, 1 reply; 27+ messages in thread
From: Bill Davidsen @ 2002-09-30 17:36 UTC (permalink / raw)
  To: Roberto Nibali; +Cc: David S. Miller, ak, niv, linux-kernel, jamal

On Thu, 26 Sep 2002, Roberto Nibali wrote:

> I've done extensive testing in this field trying to achieve fast packet 
> filtering with a huge set of unordered rules loaded into the kernel.
> 
> According to my findings I had reason to believe that after around 1000 
> rules for ipchains and around 4800 rules for iptables the L2 cache was 
> the limiting factor (of course given the slowish iptables/conntrack 
> table lookup).
> 
> Those are the rule thresholds I achieved with a PIII Tualatin and 512KB 
> L2 cache. With a sluggish Celeron with (I think) 128KB L2 cache I 
> achieved about 1/8 of the above threshold. That's why I thought the L2 
> cache plays a bigger role in this than the CPU FSB clock.
> 
> I concluded that if the ruleset to be matched exceeds what can be 
> loaded into the L2 cache, we see cache thrashing, and that's why 
> performance goes right to hell. I wanted to test this using oprofile 
> but haven't found the correct CPU performance counter yet :).
> 
> > Also not necessary; only the top level cache really needs to be
> > top performance.
> 
> I will do a new round of testing this weekend for a speech I'll be 
> giving. This time I will include ipchains, iptables (of course I am 
> willing to apply every interesting patch regarding hash table 
> optimisation and whatnot you want me to test), nf-hipac, the OpenBSD pf 
> and of course the work done by Jamal.

Look forward to any info you can provide.

I particularly like that nf-hipac can be put in and tried in one-to-one
comparison; that leaves an easy route to testing and gaining confidence in
the code.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-30 17:36         ` Bill Davidsen
@ 2002-10-02 17:37           ` Roberto Nibali
  0 siblings, 0 replies; 27+ messages in thread
From: Roberto Nibali @ 2002-10-02 17:37 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-kernel, netdev

Hi,

>>I will do a new round of testing this weekend for a speech I'll be 
>>giving. This time I will include ipchains, iptables (of course I am 
>>willing to apply every interesting patch regarding hash table 
>>optimisation and whatnot you want me to test), nf-hipac, the OpenBSD pf 
>>and of course the work done by Jamal.
>  
> Look forward to any info you can provide.

Unfortunately (as always) there were tons of delays that didn't allow me 
to finish the complete test suite as I had hoped, but I sent some 
information off-list to Jamal and the nf-hipac guys about previous 
test results. See below. I hope I can do more tests this weekend ...

> I particularly like that nf-hipac can be put in and tried in one-to-one
> comparison; that leaves an easy route to testing and gaining confidence in
> the code.

Yes, and it was very convincing after the first few tests. Some preliminary 
tests with raw TCP throughput have given me the following really cool results:

TCP RAW throughput 100Mbit/s max MTU:
-------------------------------------
ratz@laphish:~/netperf-2.2pl2 > ./netperf -H 192.168.1.141 -p 6666 -l 60
TCP STREAM TEST to 192.168.1.141
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/s

  87380  16384  16384    60.01      88.03 <------
ratz@laphish:~/netperf-2.2pl2 >


TCP RAW throughput 100Mbit/s max MTU with 10000 non-matching rules + 1 
last matching rule at the end of the FORWARD chain [iptables]:
----------------------------------------------------------------------
ratz@laphish:~/netperf-2.2pl2 > ./netperf -H 192.168.1.141 -p 6666 -l 60
TCP STREAM TEST to 192.168.1.141
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

  87380  16384  16384    60.12       3.28 <------
ratz@laphish:~/netperf-2.2pl2 >


TCP RAW throughput 100Mbit/s max MTU with 10000 non-matching rules + 1 
last matching rule at the end of the FORWARD chain [nf-hipac]:
----------------------------------------------------------------------
ratz@laphish:~/netperf-2.2pl2 > ./netperf -H 192.168.1.141 -p 6666 -l 60
TCP STREAM TEST to 192.168.1.141
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

  87380  16384  16384    60.03      85.78 <------
ratz@laphish:~/netperf-2.2pl2 >


For nf-hipac I also have some statistics:
-----------------------------------------
bloodyhell:/var/FWTEST/nf-hipac # cat /proc/net/nf-hipac
nf-hipac statistics
-------------------

Maximum available memory:          65308672 bytes

Currently used memory:             1764160 bytes

INPUT:
   - INPUT chain is empty

FORWARD:
   - Number of rules:                 10002
   - Total size:                    1033010 bytes
   - Total size (allocated):        1764160 bytes
   - Termrule size:                   80016 bytes
   - Termrule size (allocated):      320064 bytes
   - Number of btrees:                30007
     * number of u32 btrees:          10003
       + distribution of u32 btrees:
                                     [     2,      4]:   10002
                                     [ 16384,  32768]:       1
     * number of u16 btrees:          10002
       + distribution of u16 btrees:
                                     [    1,     2]:   10002
     * number of u8 btrees:           10002
       + distribution of u8 btrees:
                                     [  2,   4]:      18

OUTPUT:
   - OUTPUT chain is empty

bloodyhell:/var/FWTEST/nf-hipac #

Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc



* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:40     ` David S. Miller
@ 2002-09-26  1:09       ` Nivedita Singhvi
  0 siblings, 0 replies; 27+ messages in thread
From: Nivedita Singhvi @ 2002-09-26  1:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

"David S. Miller" wrote:

>    Well, true - we have per hashchain locks, but are we now adding to
>    the time we need to look up something on this chain because we now
>    have additional info other than the route, is what I was
>    wondering..?
> 
> That's what I meant by "extending the lookup key": consider if we
> took "next protocol, src port, dst port" into account.

Aah! Thick head <-- understanding.

thanks,
Nivedita


* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:03 ` David S. Miller
@ 2002-09-26  0:50   ` Nivedita Singhvi
  2002-09-26  0:40     ` David S. Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Nivedita Singhvi @ 2002-09-26  0:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

"David S. Miller" wrote:
> 
>    From: "Nivedita Singhvi" <niv@us.ibm.com>
>    Date: 25 Sep 2002 17:06:53 -0700
>    ...
> 
>    > Everything, from packet forwarding, to firewalling, to TCP socket
>    > packet receive, can be described with routes.  It doesn't make sense
>    > for forwarding, TCP, netfilter, and encapsulation schemes to duplicate
>    > all of this table lookup logic and in fact it's entirely superfluous.
> 
>    Are you saying combine the tables themselves?
> 
>    One of the tradeoffs would be serialization of the access, then,
>    right? i.e. much less stuff could happen in parallel? Or am I
>    completely misunderstanding your proposal?
> 
> In fact the exact opposite: such a suggested flow cache is about
> as parallel as you can make it.
> 
> Even if the per-cpu toplevel flow cache idea were not implemented and
> we used the current top-level route lookup infrastructure, it is fully
> parallelized since the toplevel hash table uses per-hashchain locks.
> Please see net/ipv4/route.c:ip_route_input() and friends.

Well, true - we have per hashchain locks, but are we now adding to
the time we need to look up something on this chain because we now 
have additional info other than the route, is what I was
wondering..?


> I don't understand why you think using the routing tables to their
> full potential would imply serialization.  If you still believe this
> you have to describe why in more detail.

thanks,
Nivedita


* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:50   ` Nivedita Singhvi
@ 2002-09-26  0:40     ` David S. Miller
  2002-09-26  1:09       ` Nivedita Singhvi
  0 siblings, 1 reply; 27+ messages in thread
From: David S. Miller @ 2002-09-26  0:40 UTC (permalink / raw)
  To: niv; +Cc: linux-kernel

   From: Nivedita Singhvi <niv@us.ibm.com>
   Date: Wed, 25 Sep 2002 17:50:11 -0700
   
   Well, true - we have per hashchain locks, but are we now adding to
   the time we need to look up something on this chain because we now 
   have additional info other than the route, is what I was
   wondering..?
   
That's what I meant by "extending the lookup key": consider if we
took "next protocol, src port, dst port" into account.


* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
@ 2002-09-26  0:06 Nivedita Singhvi
  2002-09-26  0:03 ` David S. Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Nivedita Singhvi @ 2002-09-26  0:06 UTC (permalink / raw)
  To: davem; +Cc: linux-kernel


> Such a scheme can even obviate socket lookup if implemented properly.
> It'd basically be a flow cache, much like route lookups but with an
> expanded key set and the capability to stack routes.  Such a flow
> cache could even be two level, with the top level being 100% CPU local
> on SMP (i.e. no shared cache lines).

...

> Everything, from packet forwarding, to firewalling, to TCP socket
> packet receive, can be described with routes.  It doesn't make sense
> for forwarding, TCP, netfilter, and encapsulation schemes to duplicate
> all of this table lookup logic and in fact it's entirely superfluous.

Are you saying combine the tables themselves? 

One of the tradeoffs would be serialization of the access, then,
right? i.e. much less stuff could happen in parallel? Or am I 
completely misunderstanding your proposal?

> This stackable routes idea is being worked on; watch this space over the
> next couple of weeks :-)

thanks,
Nivedita


* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:06 Nivedita Singhvi
@ 2002-09-26  0:03 ` David S. Miller
  2002-09-26  0:50   ` Nivedita Singhvi
  0 siblings, 1 reply; 27+ messages in thread
From: David S. Miller @ 2002-09-26  0:03 UTC (permalink / raw)
  To: niv; +Cc: linux-kernel

   From: "Nivedita Singhvi" <niv@us.ibm.com>
   Date: 25 Sep 2002 17:06:53 -0700
   ...
   
   > Everything, from packet forwarding, to firewalling, to TCP socket
   > packet receive, can be described with routes.  It doesn't make sense
   > for forwarding, TCP, netfilter, and encapsulation schemes to duplicate
   > all of this table lookup logic and in fact it's entirely superfluous.
   
   Are you saying combine the tables themselves? 
   
   One of the tradeoffs would be serialization of the access, then,
   right? i.e. much less stuff could happen in parallel? Or am I 
   completely misunderstanding your proposal?
   
In fact the exact opposite: such a suggested flow cache is about
as parallel as you can make it.

Even if the per-cpu toplevel flow cache idea were not implemented and
we used the current top-level route lookup infrastructure, it is fully
parallelized since the toplevel hash table uses per-hashchain locks.
Please see net/ipv4/route.c:ip_route_input() and friends.

I don't understand why you think using the routing tables to their
full potential would imply serialization.  If you still believe this
you have to describe why in more detail.


Thread overview: 27+ messages
     [not found] <3D924F9D.C2DCF56A@us.ibm.com.suse.lists.linux.kernel>
     [not found] ` <20020925.170336.77023245.davem@redhat.com.suse.lists.linux.kernel>
2002-09-26  0:31   ` [ANNOUNCE] NF-HIPAC: High Performance Packet Classification Andi Kleen
2002-09-26  0:29     ` David S. Miller
2002-09-26  0:46       ` Andi Kleen
2002-09-26  0:44         ` David S. Miller
2002-09-26  9:00       ` Roberto Nibali
2002-09-26  9:06         ` David S. Miller
2002-09-26  9:24           ` Roberto Nibali
2002-09-26  9:21             ` David S. Miller
2002-09-26 15:13             ` James Morris
2002-09-26 20:51               ` Roberto Nibali
2002-09-26 10:25           ` Roberto Nibali
2002-09-26 10:20             ` David S. Miller
2002-09-26 10:49               ` Roberto Nibali
2002-09-26 12:03           ` jamal
2002-09-26 20:23             ` Roberto Nibali
2002-09-27 13:57               ` jamal
2002-09-26 12:04         ` Andi Kleen
2002-09-26 20:49           ` Roberto Nibali
2002-09-30 17:36         ` Bill Davidsen
2002-10-02 17:37           ` Roberto Nibali
2002-09-26  1:17     ` Nivedita Singhvi
2002-09-26  1:15       ` Andi Kleen
2002-09-26  0:06 Nivedita Singhvi
2002-09-26  0:03 ` David S. Miller
2002-09-26  0:50   ` Nivedita Singhvi
2002-09-26  0:40     ` David S. Miller
2002-09-26  1:09       ` Nivedita Singhvi
