* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  [not found] <20020925.170336.77023245.davem@redhat.com.suse.lists.linux.kernel>

From: Andi Kleen @ 2002-09-26  0:31 UTC (permalink / raw)
To: David S. Miller; +Cc: niv, linux-kernel

"David S. Miller" <davem@redhat.com> writes:
>
> In fact the exact opposite, such a suggested flow cache is about
> as parallel as you can make it.

It sounds more like it would include the FIB too.

> I don't understand why you think using the routing tables to their
> full potential would imply serialization. If you still believe this
> you have to describe why in more detail.

I guess he's thinking of the FIB, not the routing cache. The current
FIBs have somewhat heavier locking, at least, and fine-grained locking
of btrees is also not easy/nice.

-Andi

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: David S. Miller @ 2002-09-26  0:29 UTC (permalink / raw)
To: ak; +Cc: niv, linux-kernel

From: Andi Kleen <ak@suse.de>
Date: 26 Sep 2002 02:31:13 +0200

> It sounds more like it would include the FIB too.

That's the second-level cache, not the top-level lookup, which is what
hits 99% of the time.

> The current FIBs have somewhat heavier locking, at least, and
> fine-grained locking of btrees is also not easy/nice.

Also not necessary; only the top-level cache really needs to be top
performance.
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: Andi Kleen @ 2002-09-26  0:46 UTC (permalink / raw)
To: David S. Miller; +Cc: ak, niv, linux-kernel

On Wed, Sep 25, 2002 at 05:29:31PM -0700, David S. Miller wrote:
> > The current FIBs have somewhat heavier locking, at least, and
> > fine-grained locking of btrees is also not easy/nice.
>
> Also not necessary, only the top level cache really needs to be
> top performance.

Sure, but if they were unified (that is what I understood the original
poster wanted to do), then they would suddenly be much more
performance-critical and would need fine-grained locking.

-Andi

P.S.: One big performance problem currently is ip_conntrack. It has a
bad hash function and tends to have too big a working set (beyond
cache size). Some tuning in this regard would help a lot of workloads.
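[Andi's P.S. about the conntrack hash can be illustrated with a toy
model. The hash functions below are invented for the demonstration and
are not the actual ip_conntrack code: an xor-fold reduced modulo a
power-of-two table only sees the low bits of the tuple, so connections
that differ only in higher address bits all land in one bucket, while
a multiply-and-take-high-bits mix spreads them.]

```python
HASHSIZE = 256  # assumed conntrack-style table size (power of two)

def weak_hash(saddr, daddr, sport, dport):
    # xor-fold then mask: only the low bits of the tuple matter
    return (saddr ^ daddr ^ sport ^ dport) % HASHSIZE

def better_hash(saddr, daddr, sport, dport):
    # mix, multiply by a large odd constant, keep the HIGH bits
    h = (saddr ^ (daddr << 1) ^ (sport << 2) ^ dport) & 0xFFFFFFFF
    h = (h * 2654435761) & 0xFFFFFFFF
    return h >> 24  # top 8 bits -> bucket in [0, 256)

def occupancy(hashfn, tuples):
    return len({hashfn(*t) for t in tuples})

# 256 clients in different /24s but with the same host byte, all
# talking to one server: a working set a real filter might well see.
tuples = [(0x0A000001 + (i << 8), 0x0A0A0A0A, 1025, 80)
          for i in range(256)]

print(occupancy(weak_hash, tuples))    # 1 bucket: total clustering
print(occupancy(better_hash, tuples))  # spread over many buckets
```

With the xor-fold, every traversal walks one long bucket chain, which
is exactly the kind of pathological working set Andi describes.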
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: David S. Miller @ 2002-09-26  0:44 UTC (permalink / raw)
To: ak; +Cc: niv, linux-kernel

From: Andi Kleen <ak@suse.de>
Date: Thu, 26 Sep 2002 02:46:45 +0200

> Sure, but if they were unified (that is what I understood the
> original poster wanted to do), then they would suddenly be much more
> performance-critical and would need fine-grained locking.

This can be done, if necessary. If the top-level flow cache lookup
table is sized appropriately, I doubt anything will be needed.

> P.S.: One big performance problem currently is ip_conntrack. It has
> a bad hash function and tends to have too big a working set (beyond
> cache size). Some tuning in this regard would help a lot of
> workloads.

This is a well-understood problem and a fix is in the works. See the
netfilter lists.
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: Roberto Nibali @ 2002-09-26  9:00 UTC (permalink / raw)
To: David S. Miller; +Cc: ak, niv, linux-kernel, jamal

Hello DaveM and others,

> > It sounds more like it would include the FIB too.
>
> That's the second level cache, not the top level lookup which
> is what hits 99% of the time.

I've done extensive testing in this field, trying to achieve fast
packet filtering with a huge set of unordered rules loaded into the
kernel.

According to my findings, I had reason to believe that after around
1000 rules for ipchains and around 4800 rules for iptables, the L2
cache was the limiting factor (of course, given the slowish
iptables/conntrack table lookup).

Those are rule thresholds I achieved with a PIII Tualatin and 512KB L2
cache. With a sluggish Celeron with, I think, 128KB L2 cache, I
achieved about 1/8 of the above threshold. That's why I thought the L2
cache plays a bigger role in this than the CPU FSB clock.

I concluded that if the ruleset to be matched exceeds the threshold of
what can be loaded into the L2 cache, we see cache thrashing, and
that's why performance goes right to hell. I wanted to test this using
oprofile but haven't found the correct CPU performance counter yet :).

> Also not necessary, only the top level cache really needs to be
> top performance.

I will do a new round of testing this weekend for a speech I'll be
giving. This time I will include ipchains, iptables (of course I am
willing to apply every interesting patch regarding hash table
optimisation and whatnot you want me to test), nf-hipac, the OpenBSD
pf, and of course the work done by Jamal.

Dave, is the work done by Jamal (and I think Werner and others did
some too) before, and mostly during, OLS, and probably now, the one
you're referring to? Hadi showed it to me at OLS and I saw great
potential in it.

I'm asking because the company I work for builds rather big packet
filters (with up to 24 NICs per node) for special-purpose networks
which, due to policies and automated ruleset generation (mapping a
port matrix into a weighted graph and then extrapolating the ruleset
with basic algebra; Dijkstra and all this cruft), generate a huge set
of rules. Two problems we're facing on a daily basis:

o we can't filter more than 13Mbit/s anymore after loading around 3000
  rules into the kernel (the problem is gone with nf-hipac, for
  example).

o we can't log all the messages we would like to, because the user
  space log daemon (syslog-ng in our case, but we've tried others too)
  doesn't get enough CPU time anymore to read the buffer before it is
  overwritten by the printk's again. This leads to a log entry loss
  almost proportional to N^2 with an increasing number of rules that
  do not match. This is the worst thing that can happen to you working
  in the security business: not having an appropriate log trace during
  a possible incident.

AFAICR, Jamal modified the routing and FIB code and hacked iproute2 to
achieve that. We spoke about this at OLS. Until I had seen his code,
my approach to testing the speed was to (don't laugh):

o blackhole everything (POLICY DROP)
o generate routing rules (selectors) for matching packets
o add routes which would allow just that specific flow into the
  according routing tables
o implement '-j <CHAIN>' using bounce table walking

This was just a test to see the potential speed improvement of moving
the most simplistic things from netfilter (like raw packet filtering
without conntrack and ports) a 'layer' down to the routing code.

A lot of work has to be done in this field, and the filtering code is
just about the simplest part AFAICT; conntrack and proper n:m NAPT
incorporated into the routing code is IMHO a tricky thing.

Best regards,
Roberto Nibali, ratz

-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc
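[Roberto's thresholds lend themselves to a back-of-the-envelope check.
The arithmetic below only restates his numbers; the implied
bytes-per-rule figure is a derived estimate, not a measured iptables
structure size.]

```python
L2_TUALATIN = 512 * 1024   # PIII Tualatin L2, bytes
L2_CELERON  = 128 * 1024   # Celeron L2 (as remembered above), bytes

IPTABLES_THRESHOLD = 4800  # rules before throughput collapsed

# If ~4800 rules saturate 512KB of L2, the implied per-rule footprint
# (rule plus counters, headers and padding) is roughly:
bytes_per_rule = L2_TUALATIN // IPTABLES_THRESHOLD
print(bytes_per_rule)  # ~109 bytes

# A cache a quarter the size would then fit about a quarter as many
# rules; the observed 1/8 hints that the working set includes more
# than the rules themselves (conntrack entries, code, skbs).
print(L2_CELERON // bytes_per_rule)
```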
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: David S. Miller @ 2002-09-26  9:06 UTC (permalink / raw)
To: ratz; +Cc: ak, niv, linux-kernel, hadi

From: Roberto Nibali <ratz@drugphish.ch>
Date: Thu, 26 Sep 2002 11:00:53 +0200

> Hello DaveM and others,
>
> > That's the second level cache, not the top level lookup which
> > is what hits 99% of the time.
>
> ... the L2 cache was the limiting factor

I'm not talking about the CPU's second-level cache; I'm talking about
a second-level lookup table that backs up a front-end routing hash. A
software data structure.

You are talking about a lot of independent things, but I'm going to
defer my contributions until we have actual code that people can start
plugging netfilter into if they want.

About using syslog to record messages: that is doomed to failure.
Implement log messages via netlink and use that to log the events
instead.
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: Roberto Nibali @ 2002-09-26  9:24 UTC (permalink / raw)
To: David S. Miller; +Cc: ak, niv, linux-kernel, hadi

> I'm not talking about the CPU's second-level cache; I'm talking
> about a second-level lookup table that backs up a front-end routing
> hash. A software data structure.

Doh! Sorry for my confusion, I guess I wasn't reading your posting too
carefully. I understand the software architecture part now.
Nevertheless, one day or another you will need to face the caching
issue too, unless your data structure will always fit entirely into
the cache, or am I completely off track again?

> You are talking about a lot of independent things, but I'm going
> to defer my contributions until we have actual code that people can
> start plugging netfilter into if they want.

Fair enough. I'm looking forward to seeing this framework. Any release
schedules or rough plans?

> About using syslog to record messages: that is doomed to failure.
> Implement log messages via netlink and use that to log the events
> instead.

Yes, we're doing tests in this field now (as with evlog), but as it
seems from preliminary testing, netlink transportation of binary data
is not 100% reliable either. However, I will refrain from posting
further assumptions until we've done our tests and can post useful
results and facts in this field.

Thanks and cheers,
Roberto Nibali, ratz

-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: David S. Miller @ 2002-09-26  9:21 UTC (permalink / raw)
To: ratz; +Cc: ak, niv, linux-kernel, hadi

From: Roberto Nibali <ratz@drugphish.ch>
Date: Thu, 26 Sep 2002 11:24:19 +0200

> Fair enough. I'm looking forward to seeing this framework. Any
> release schedules or rough plans?

None whatsoever, as it should be.

Franks a lot,
David S. Miller
davem@redhat.com
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: James Morris @ 2002-09-26 15:13 UTC (permalink / raw)
To: Roberto Nibali; +Cc: David S. Miller, Andi Kleen, niv, linux-kernel, jamal

On Thu, 26 Sep 2002, Roberto Nibali wrote:

> Yes, we're doing tests in this field now (as with evlog), but as it
> seems from preliminary testing, netlink transportation of binary
> data is not 100% reliable either.

Non-blocking netlink delivery is reliable, although you can overrun
the userspace socket buffer (this can be detected, however). The
fundamental issue remains: sending more data to userspace than can be
handled.

A truly reliable transport would also involve an ack-based protocol.
Under certain circumstances (e.g. logging every forwarded packet for
audit purposes), packets would need to be dropped if the logging
mechanism became overloaded. This would in turn involve some kind of
queueing mechanism and introduce a new set of performance problems.
Reliable logging is a challenging problem area in general, probably
better suited to dedicated hardware environments where the software
can be tuned to known system capabilities.

- James
-- 
James Morris <jmorris@intercode.com.au>
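[The behaviour James describes, dropping under overload while keeping
the loss detectable, can be sketched as a bounded queue with a drop
counter. This is a toy model of the idea only, not netlink or
netfilter code.]

```python
from collections import deque

class BoundedLogQueue:
    """Toy model of drop-but-detect logging: when the consumer falls
    behind, new events are dropped, but the loss is accounted for
    rather than silent (as with a netlink socket buffer overrun)."""

    def __init__(self, maxlen):
        self.q = deque()
        self.maxlen = maxlen
        self.dropped = 0          # overrun is detectable

    def log(self, event):
        if len(self.q) >= self.maxlen:
            self.dropped += 1     # overloaded: drop, but count it
            return False
        self.q.append(event)
        return True

    def drain(self, n):
        # consumer side: take up to n queued events
        return [self.q.popleft() for _ in range(min(n, len(self.q)))]

q = BoundedLogQueue(maxlen=4)
for i in range(10):               # producer bursts faster than consumer
    q.log(i)
print(len(q.q), q.dropped)        # 4 events kept, 6 dropped
```

An ack-based protocol on top of this would let the consumer widen or
narrow the producer's window, which is where the queueing and
performance trade-offs James mentions come in.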
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: Roberto Nibali @ 2002-09-26 20:51 UTC (permalink / raw)
To: James Morris; +Cc: linux-kernel, jamal

> Non-blocking netlink delivery is reliable, although you can overrun
> the userspace socket buffer (this can be detected, however). The
> fundamental issue remains: sending more data to userspace than can
> be handled.

Agreed.

> A truly reliable transport would also involve an ack-based protocol.
> Under certain circumstances (e.g. logging every forwarded packet for
> audit purposes), packets would need to be dropped if the logging
> mechanism became overloaded. This would in turn involve some kind of
> queueing mechanism and introduce a new set of performance problems.
> Reliable logging is a challenging problem area in general, probably
> better suited to dedicated hardware environments where the software
> can be tuned to known system capabilities.

Thanks. I think we'll find a solution that suits us best, and if we
have something, we'll let the community know.

Best regards,
Roberto Nibali, ratz

-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: Roberto Nibali @ 2002-09-26 10:25 UTC (permalink / raw)
To: David S. Miller; +Cc: ak, niv, linux-kernel, hadi

> About using syslog to record messages: that is doomed to failure.
> Implement log messages via netlink and use that to log the events
> instead.

<maybe stupid thought>
Another thing would be to use netconsole to send event messages over
the network to a central loghost. This would eliminate the buffer
overwrite problem, unless you sent more messages than the backlog
queue is able to hold before the packets are processed. But you could
theoretically send 10MB of messages per second that could also be
stored.
</maybe stupid thought>

I will shut up now, as I do not want to waste your and the others'
precious time with my extensive schmoozing ;).

Best regards,
Roberto Nibali, ratz

-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: David S. Miller @ 2002-09-26 10:20 UTC (permalink / raw)
To: ratz; +Cc: ak, niv, linux-kernel, hadi

From: Roberto Nibali <ratz@drugphish.ch>
Date: Thu, 26 Sep 2002 12:25:20 +0200

> <maybe stupid thought>
> Another thing would be to use netconsole to send event messages
> over the network to a central loghost.

What if the netconsole packets cause events to be logged?
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: Roberto Nibali @ 2002-09-26 10:49 UTC (permalink / raw)
To: David S. Miller; +Cc: ak, niv, linux-kernel, hadi

> What if the netconsole packets cause events to be logged?

Oops! Well, you could send them via printk to the internal printk
buffer, which then gets fetched by the local syslog, and then you can
decide what to do, since those messages should never fill up the
buffer as quickly as syslog is able to get them. Actually, you should
then rate-limit the printk messages and probably also increase the
buffer size.

But to be honest, those are not the usual messages that fill up the
buffer so fast that syslog is not able to read a message before the
buffer gets overwritten again. Of course you will then have a logfile
inconsistency, and this could just as well be accounted a loss of
trace. But generally I agree that you're standing there with your
pants down, for example if you have a filter rule in the routing or
wherever code that doesn't permit the packets to leave the machine and
thus to be dropped.

Ah well ... it was worth a try. [Hmpf, my colleague is testing this
right now, but I think I can tell him to stop, because you're always
biting your own tail with this approach, one way or another.]

Thanks for the valuable input and best regards,
Roberto Nibali, ratz

-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: jamal @ 2002-09-26 12:03 UTC (permalink / raw)
To: David S. Miller; +Cc: ratz, ak, niv, linux-kernel, netdev

It would be nice if people would start cc'ing networking-related
discussions to netdev. I missed the first part of the discussion, but
I take it the NF-HIPAC people posted a patch. BTW, I emailed the
authors when I read the paper but never heard back. What I wanted the
authors to do was compare against one of the tc classifiers, not
iptables.

On Thu, 26 Sep 2002, David S. Miller wrote:

> You are talking about a lot of independent things, but I'm going
> to defer my contributions until we have actual code that people can
> start plugging netfilter into if they want.

I hacked some code using the traffic control framework around OLS
time; there are a lot of ideas I haven't incorporated yet. Too many
hacks, too little time ;-> I think this is what I may have showed
Roberto on my laptop over a drink.

I probably wouldn't have put this code out if my complaints about
netfilter hadn't been ignored. And you know what happens when you
start writing poetry: I ended up worrying about more than just the
performance problems of iptables; for example, the code I have now
makes it easy to extend the path a packet takes using simple policies.

The code I have is based around the tc framework. One thing I liked
about netfilter is the idea of targets being separate modules, so the
code I have in fact makes use of netfilter targets.

I plan on revisiting this code at some point, maybe this weekend now
that I am reminded of it ;->

Take a look:
http://www.cyberus.ca/~hadi/patches/action.DESCRIPTION

> About using syslog to record messages: that is doomed to failure.
> Implement log messages via netlink and use that to log the events
> instead.

Agreed; you need a netlink-to-syslog converter. Netlink is king: all
the policies in the above code are netlink-controlled, and all events
are netlink-transported. You don't have to send every little message
you see; netlink allows you to batch, and you could easily do a
Nagle-like algorithm. Next step is a distributed version of netlink.

cheers,
jamal
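[The batching jamal mentions could look like a Nagle-style flush
policy: buffer events and emit one netlink message when the batch
fills or a deadline passes. This is a sketch of the idea only; the
class and its parameters are invented, not code from the action
patches.]

```python
class NagleBatcher:
    """Nagle-like batching of log events: flush when the batch is
    full or when enough time has passed since the first buffered
    event. The send callback would wrap the batch in one message."""

    def __init__(self, max_batch, max_delay, send, now):
        self.max_batch = max_batch
        self.max_delay = max_delay
        self.send = send          # callback taking a list of events
        self.now = now            # injected clock, for testability
        self.buf = []
        self.first_ts = None

    def add(self, event):
        if self.first_ts is None:
            self.first_ts = self.now()
        self.buf.append(event)
        if (len(self.buf) >= self.max_batch
                or self.now() - self.first_ts >= self.max_delay):
            self.flush()

    def flush(self):
        if self.buf:
            self.send(self.buf)
            self.buf, self.first_ts = [], None

sent = []
clock = [0.0]
b = NagleBatcher(max_batch=3, max_delay=0.5,
                 send=sent.append, now=lambda: clock[0])
for e in range(7):
    b.add(e)
b.flush()                 # drain the tail
print(sent)               # [[0, 1, 2], [3, 4, 5], [6]]
```

Seven events cost three sends instead of seven, which is the whole
point when each send is a syscall or a netlink message.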
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: Roberto Nibali @ 2002-09-26 20:23 UTC (permalink / raw)
To: jamal; +Cc: niv, linux-kernel, netdev

Hello Jamal,

[took out AK and DaveM since I know they both read netdev and this
reply is not really of any relevance to them]

> It would be nice if people would start cc'ing networking-related
> discussions to netdev. I missed the first part of the discussion,
> but I take it the NF-HIPAC people posted a patch. BTW, I emailed
> the authors

Yes, your assumption is correct, and sorry for missing the cc once
again.

> when I read the paper but never heard back. What I wanted the
> authors to do was compare against one of the tc classifiers, not
> iptables.

I will contact you privately on this issue since I'm about to conduct
tests this weekend.

> I hacked some code using the traffic control framework around OLS
> time; there are a lot of ideas I haven't incorporated yet. Too many
> hacks, too little time ;-> I think this is what I may have showed
> Roberto on my laptop over a drink.

Exactly (even wearing a netfilter T-shirt).

> I probably wouldn't have put this code out if my complaints about
> netfilter hadn't been ignored. And you know what happens when you
> start writing poetry: I ended up worrying about more than just the
> performance problems of iptables; for example, the code I have now
> makes it easy to extend the path a packet takes using simple
> policies.

Great, I remember some of your postings about the netfilter framework.

> The code I have is based around the tc framework. One thing I liked
> about netfilter is the idea of targets being separate modules, so
> the code I have in fact makes use of netfilter targets. I plan on
> revisiting this code at some point, maybe this weekend now that I
> am reminded of it ;->

Excellent, this could make it into my test suites as well.

> Take a look:
> http://www.cyberus.ca/~hadi/patches/action.DESCRIPTION

I did, I simply didn't find the time to do it.

> Agreed; you need a netlink-to-syslog converter. Netlink is king:
> all the policies in the above code are netlink-controlled, and all
> events are netlink-transported. You don't have to send every little
> message you see; netlink allows you to batch, and you could easily
> do a Nagle-like algorithm. Next step is a distributed version of
> netlink.

Is there a code architecture draft somewhere?

Best regards,
Roberto Nibali, ratz

-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: jamal @ 2002-09-27 13:57 UTC (permalink / raw)
To: Roberto Nibali; +Cc: linux-kernel, netdev

On Thu, 26 Sep 2002, Roberto Nibali wrote:

> Is there a code architecture draft somewhere?

You mean for what I posted? Don't you think I already went beyond the
classical open source model by putting out a user guide? ;-> ;->
Just ask me questions in private and I'll try to help.

cheers,
jamal
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: Andi Kleen @ 2002-09-26 12:04 UTC (permalink / raw)
To: Roberto Nibali; +Cc: David S. Miller, ak, niv, linux-kernel, jamal

On Thu, Sep 26, 2002 at 11:00:53AM +0200, Roberto Nibali wrote:

> o we can't filter more than 13Mbit/s anymore after loading around
>   3000 rules into the kernel (the problem is gone with nf-hipac,
>   for example).

For iptables/ipchains you need to write hierarchical/port-range rules
in this case and try to terminate searches early. But yes, we also
found that the L2 cache is limiting here (ip_conntrack has the same
problem).

> o we can't log all the messages we would like to, because the user
>   space log daemon (syslog-ng in our case, but we've tried others
>   too) doesn't get enough CPU time anymore to read the buffer
>   before it is overwritten by the printk's again. This leads to a
>   log entry loss almost proportional to N^2 with an increasing
>   number of rules that do not match. This is the worst thing that
>   can happen to you working in the security business: not having an
>   appropriate log trace during a possible incident.

At least that is easily fixed. Just increase the LOG_BUF_LEN parameter
in kernel/printk.c.

Alternatively, don't use the slow printk; use nfnetlink to report bad
packets and print from user space. That should scale much better.

-Andi
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: Roberto Nibali @ 2002-09-26 20:49 UTC (permalink / raw)
To: Andi Kleen; +Cc: David S. Miller, niv, linux-kernel, jamal, netdev

> For iptables/ipchains you need to write hierarchical/port-range
> rules in this case and try to terminate searches early.

We're still trying to find the correct mathematical functions to do
this. Trust me, it is not so easy: the mapping of the port matrix and
the network flow through many stacked packet filters and firewalls
generates a rather complex graph (partly a bigraph, LVS-DR for
example) with complex structures (redundancy and parallelisation).
It's not that we could sit down and implement a fw-script for our
packet filters; the fw-script is generated through a meta-fw layer
that knows about the surrounding network nodes.

> But yes, we also found that the L2 cache is limiting here
> (ip_conntrack has the same problem)

I think this weekend I will do my tests also measuring some CPU
performance counters with oprofile, such as DATA_READ_MISS, CODE CACHE
MISS and NONCACHEABLE_MEMORY_READS.

> At least that is easily fixed. Just increase the LOG_BUF_LEN
> parameter in kernel/printk.c.

Tests showed that this only helps in peak situations; I think we
should simply forget about printk().

> Alternatively, don't use the slow printk; use nfnetlink to report
> bad packets and print from user space. That should scale much
> better.

Yes, and there are a few things that my colleague found out during his
tests (actually pretty straightforward things):

1. A big log buffer is only useful to get over peaks.
2. A big log buffer while having high CPU load doesn't help at all.
3. The smaller the message, the better (binary logging is thus an
   advantage).
4. Logging via printk() is extremely expensive, because of the
   conversions and whatnot. A rough estimate would be 12500 clock
   cycles for a log entry generated by printk(). This means that on a
   PIII/450 a log entry needs 0.000028s, and this again leads to the
   following observation: with 36000pps which should all be logged,
   you will end up with a system having 100% CPU load and being 0%
   idle.
5. The kernel should log a binary stream, as should the daemon that
   fetches the data. If you want to convert the binary to a
   human-readable format, you start a process with low priority or do
   it on demand.
6. Ideally the log daemon should be preemptible, to get a defined
   time slice to do its job.

Some test results conducted by a coworker of mine (Achim Gsell):

Max pkt rate the system can log without losing more than 1% of the
messages:
----------------------------------------------------------------------
kernel: Linux 2.4.19-gentoo-r7 (low latency scheduling)

daemon: syslog-ng (nice 0), logbufsiz=16k, pkts=10*10000, CPU=PIII/450
packet-len:        64        256        512       1024
            2873pkt/s  3332pkt/s  3124pkt/s  3067pkt/s
             1.4 Mb/s    6.6Mb/s   12.2Mb/s   23.9Mb/s

daemon: syslog-ng (nice 0), logbufsiz=16k, pkts=10*10000, CPU=PIVM/1.7
packet-len:        64        256        512       1024
            7808pkt/s  7807pkt/s  7806pkt/s      pkt/s
             3.8 Mb/s   15.2Mb/s   30.5Mb/s       Mb/s
----------------------------------------------------------------------

daemon: cat /proc/kmsg > kernlog, logbufsiz=16k, pkts=10*10000,
CPU=PIII/450
packet-len:        64        256        512       1024
            4300pkt/s                        3076pkt/s
             2.1 Mb/s                         24.0Mb/s

daemon: ulogd (nlbufsize=4k, qthreshold=1), pkts=10*10000,
CPU=PIII/450
packet-len:        64        256        512       1024
            4097pkt/s                        4097pkt/s
             2.0 Mb/s                          32 Mb/s

daemon: ulogd (nlbufsize=2^17 - 1, qthreshold=1), pkts=10*10000,
CPU=PIII/450
packet-len:        64        256        512       1024
            6576pkt/s                        5000pkt/s
             3.2 Mb/s                          38 Mb/s

daemon: ulogd (nlbufsize=64k, qthreshold=1), pkts=1*10000,
CPU=PIII/450
packet-len:        64        256        512       1024
                pkt/s
             4.0 Mb/s

daemon: ulogd (nlbufsize=2^17 - 1, qthreshold=50), pkts=10*10000,
CPU=PIII/450
packet-len:        64        256        512       1024
            6170pkt/s                        5000pkt/s
             3.0 Mb/s                          38 Mb/s

Best regards,
Roberto Nibali, ratz

-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc
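[The printk estimate in point 4 above is internally consistent; the
arithmetic can be checked directly, only restating the figures given.]

```python
cycles_per_entry = 12_500        # estimated printk cost per log entry
cpu_hz = 450_000_000             # PIII/450

seconds_per_entry = cycles_per_entry / cpu_hz
print(seconds_per_entry)         # ~2.8e-05 s, i.e. the 0.000028s above

# Packet rate at which logging alone consumes the whole CPU:
print(cpu_hz // cycles_per_entry)  # 36000 pps -> 100% load, 0% idle
```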
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

From: Bill Davidsen @ 2002-09-30 17:36 UTC (permalink / raw)
To: Roberto Nibali; +Cc: David S. Miller, ak, niv, linux-kernel, jamal

On Thu, 26 Sep 2002, Roberto Nibali wrote:

> I've done extensive testing in this field, trying to achieve fast
> packet filtering with a huge set of unordered rules loaded into the
> kernel.
>
> According to my findings, I had reason to believe that after around
> 1000 rules for ipchains and around 4800 rules for iptables, the L2
> cache was the limiting factor (of course, given the slowish
> iptables/conntrack table lookup).
>
> Those are rule thresholds I achieved with a PIII Tualatin and 512KB
> L2 cache. With a sluggish Celeron with, I think, 128KB L2 cache, I
> achieved about 1/8 of the above threshold. That's why I thought the
> L2 cache plays a bigger role in this than the CPU FSB clock.
>
> I concluded that if the ruleset to be matched exceeds the threshold
> of what can be loaded into the L2 cache, we see cache thrashing,
> and that's why performance goes right to hell. I wanted to test
> this using oprofile but haven't found the correct CPU performance
> counter yet :).
>
> > Also not necessary, only the top level cache really needs to be
> > top performance.
>
> I will do a new round of testing this weekend for a speech I'll be
> giving. This time I will include ipchains, iptables (of course I am
> willing to apply every interesting patch regarding hash table
> optimisation and whatnot you want me to test), nf-hipac, the
> OpenBSD pf, and of course the work done by Jamal.

Look forward to any info you can provide.

I particularly like that nf-hipac can be put in and tried in a
one-to-one comparison; that leaves an easy route to testing and
gaining confidence in the code.

-- 
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-30 17:36 ` Bill Davidsen
@ 2002-10-02 17:37   ` Roberto Nibali
  0 siblings, 0 replies; 27+ messages in thread

From: Roberto Nibali @ 2002-10-02 17:37 UTC (permalink / raw)
To: Bill Davidsen; +Cc: linux-kernel, netdev

Hi,

>> I will do a new round of testing this weekend for a speech I'll be
>> giving. This time I will include ipchains, iptables (of course I am
>> willing to apply every interesting patch regarding hash table
>> optimisation and whatnot you want me to test), nf-hipac, the OpenBSD pf
>> and of course the work done by Jamal.
>
> Look forward to any info you can provide.

Unfortunately (as always) there were tons of delays that didn't allow me
to finish the complete test suite as I had hoped, but I sent some
information off-list to Jamal and the nf-hipac guys about previous test
results. See below. I hope I can do more tests this weekend ...

> I particularly like that nf-hipac can be put in and tried in one-to-one
> comparison, that leaves an easy route to testing and getting confidence in
> the code.

Yes, and it was very convincing after the first few tests. Some
preliminary tests with raw TCP throughput have given me the following
really cool results:

TCP RAW throughput, 100Mbit/s, max MTU:
---------------------------------------
ratz@laphish:~/netperf-2.2pl2 > ./netperf -H 192.168.1.141 -p 6666 -l 60
TCP STREAM TEST to 192.168.1.141
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/s

 87380  16384  16384    60.01      88.03   <------
ratz@laphish:~/netperf-2.2pl2 >

TCP RAW throughput, 100Mbit/s, max MTU, with 10000 non-matching rules
+ 1 last matching rule at the end of the FORWARD chain [iptables]:
----------------------------------------------------------------------
ratz@laphish:~/netperf-2.2pl2 > ./netperf -H 192.168.1.141 -p 6666 -l 60
TCP STREAM TEST to 192.168.1.141
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.12       3.28   <------
ratz@laphish:~/netperf-2.2pl2 >

TCP RAW throughput, 100Mbit/s, max MTU, with 10000 non-matching rules
+ 1 last matching rule at the end of the FORWARD chain [nf-hipac]:
----------------------------------------------------------------------
ratz@laphish:~/netperf-2.2pl2 > ./netperf -H 192.168.1.141 -p 6666 -l 60
TCP STREAM TEST to 192.168.1.141
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.03      85.78   <------
ratz@laphish:~/netperf-2.2pl2 >

For nf-hipac I also have some statistics:
-----------------------------------------
bloodyhell:/var/FWTEST/nf-hipac # cat /proc/net/nf-hipac
nf-hipac statistics
-------------------
Maximum available memory: 65308672 bytes
Currently used memory:     1764160 bytes

INPUT:
  - INPUT chain is empty

FORWARD:
  - Number of rules: 10002
  - Total size:             1033010 bytes
  - Total size (allocated): 1764160 bytes
  - Termrule size:              80016 bytes
  - Termrule size (allocated): 320064 bytes
  - Number of btrees: 30007
    * number of u32 btrees: 10003
      + distribution of u32 btrees:
        [     2,     4]: 10002
        [ 16384, 32768]: 1
    * number of u16 btrees: 10002
      + distribution of u16 btrees:
        [     1,     2]: 10002
    * number of u8 btrees: 10002
      + distribution of u8 btrees:
        [     2,     4]: 18

OUTPUT:
  - OUTPUT chain is empty
bloodyhell:/var/FWTEST/nf-hipac #

Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:31 ` [ANNOUNCE] NF-HIPAC: High Performance Packet Classification Andi Kleen
  2002-09-26  0:29   ` David S. Miller
@ 2002-09-26  1:17   ` Nivedita Singhvi
  2002-09-26  1:15     ` Andi Kleen
  1 sibling, 1 reply; 27+ messages in thread

From: Nivedita Singhvi @ 2002-09-26  1:17 UTC (permalink / raw)
To: Andi Kleen; +Cc: David S. Miller, linux-kernel

Andi Kleen wrote:

> I guess he's thinking of the FIB, not the routing cache.

I was, + chain expansion, but this is just (um, cough)
to s/he/she :)

thanks,
Nivedita

> The current FIBs have a bit heavier locking at least. Fine grain locking
> btrees is also not easy/nice.
>
> -Andi

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  1:17 ` Nivedita Singhvi
@ 2002-09-26  1:15   ` Andi Kleen
  0 siblings, 0 replies; 27+ messages in thread

From: Andi Kleen @ 2002-09-26  1:15 UTC (permalink / raw)
To: Nivedita Singhvi; +Cc: Andi Kleen, David S. Miller, linux-kernel

On Wed, Sep 25, 2002 at 06:17:58PM -0700, Nivedita Singhvi wrote:
> Andi Kleen wrote:
>
> > I guess he's thinking of the FIB, not the routing cache.
>
> I was, + chain expansion, but this is just (um, cough)
> to s/he/she

I was actually thinking about the first poster in the thread (it was a
'he' iirc). But thanks for the correction anyways :-)

-Andi

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
@ 2002-09-26  0:06 Nivedita Singhvi
  2002-09-26  0:03 ` David S. Miller
  0 siblings, 1 reply; 27+ messages in thread

From: Nivedita Singhvi @ 2002-09-26  0:06 UTC (permalink / raw)
To: davem; +Cc: linux-kernel

> Such a scheme can even obviate socket lookup if implemented properly.
> It'd basically be a flow cache, much like route lookups but with an
> expanded key set and the capability to stack routes. Such a flow
> cache could even be two level, with the top level being %100 cpu local
> on SMP (ie. no shared cache lines).

...

> Everything, from packet forwarding, to firewalling, to TCP socket
> packet receive, can be described with routes. It doesn't make sense
> for forwarding, TCP, netfilter, and encapsulation schemes to duplicate
> all of this table lookup logic and in fact it's entirely superfluous.

Are you saying combine the tables themselves?

One of the tradeoffs would be serialization of the access, then,
right? i.e. much less stuff could happen in parallel? Or am I
completely misunderstanding your proposal?

> This stackable routes idea being worked on, watch this space over the
> next couple of weeks :-)

thanks,
Nivedita

^ permalink raw reply	[flat|nested] 27+ messages in thread
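[The "two level, with the top level being %100 cpu local" cache quoted above can be sketched in userspace, with C thread-local storage standing in for the kernel's per-CPU data. Everything here — the names, the direct-mapped layout, the dummy second-level lookup — is a hypothetical illustration, not the proposed implementation:]

```c
/* Sketch of a two-level flow cache: a per-thread (per-"CPU") top level
 * that touches no shared cache lines and needs no locking, backed by a
 * shared, lock-protected second level that stands in for the full
 * tables.  __thread is a GCC extension. */
#include <pthread.h>

#define L1_SLOTS 64

struct flow {
    unsigned int saddr, daddr;
    int verdict;
    int valid;
};

/* Top level: thread-local, so hits never share cache lines across CPUs. */
static __thread struct flow l1[L1_SLOTS];

/* Second level: shared, hence the lock. */
static pthread_mutex_t l2_lock = PTHREAD_MUTEX_INITIALIZER;

static int slow_classify(unsigned int saddr, unsigned int daddr)
{
    int verdict;

    pthread_mutex_lock(&l2_lock);
    verdict = (int)((saddr ^ daddr) & 1u);  /* dummy stand-in decision */
    pthread_mutex_unlock(&l2_lock);
    return verdict;
}

static int classify(unsigned int saddr, unsigned int daddr)
{
    struct flow *f = &l1[(saddr ^ daddr) % L1_SLOTS];

    /* Fast path: fully CPU-local, no locks, no shared state. */
    if (f->valid && f->saddr == saddr && f->daddr == daddr)
        return f->verdict;

    /* Miss: consult the shared level, then cache the answer locally. */
    f->saddr = saddr;
    f->daddr = daddr;
    f->verdict = slow_classify(saddr, daddr);
    f->valid = 1;
    return f->verdict;
}
```

The point of the structure is exactly DaveM's answer to the serialization worry: the common-case path never takes a lock at all.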
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:06 Nivedita Singhvi
@ 2002-09-26  0:03 ` David S. Miller
  2002-09-26  0:50   ` Nivedita Singhvi
  0 siblings, 1 reply; 27+ messages in thread

From: David S. Miller @ 2002-09-26  0:03 UTC (permalink / raw)
To: niv; +Cc: linux-kernel

   From: "Nivedita Singhvi" <niv@us.ibm.com>
   Date: 25 Sep 2002 17:06:53 -0700

   ...
   > Everything, from packet forwarding, to firewalling, to TCP socket
   > packet receive, can be described with routes. It doesn't make sense
   > for forwarding, TCP, netfilter, and encapsulation schemes to duplicate
   > all of this table lookup logic and in fact it's entirely superfluous.

   Are you saying combine the tables themselves?

   One of the tradeoffs would be serialization of the access, then,
   right? i.e. Much less stuff could happen in parallel? Or am I
   completely misunderstanding your proposal?

In fact the exact opposite, such a suggested flow cache is about
as parallel as you can make it.

Even if the per-cpu toplevel flow cache idea were not implemented and
we used the current top-level route lookup infrastructure, it is fully
parallelized since the toplevel hash table uses per-hashchain locks.
Please see net/ipv4/route.c:ip_route_input() and friends.

I don't understand why you think using the routing tables to their
full potential would imply serialization. If you still believe this
you have to describe why in more detail.

^ permalink raw reply	[flat|nested] 27+ messages in thread
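[The per-hashchain locking DaveM points at in net/ipv4/route.c can be illustrated outside the kernel. A userspace pthreads sketch — all names hypothetical, not the kernel's actual rt_hash_table code — showing why lookups on different chains never contend:]

```c
/* Each hash bucket carries its own lock, so two CPUs only serialize
 * when their flows land on the same chain; the table as a whole stays
 * parallel. */
#include <pthread.h>
#include <stdlib.h>

#define HASH_BUCKETS 256

struct flow_entry {
    unsigned int saddr, daddr;
    void *result;                /* cached lookup decision */
    struct flow_entry *next;
};

struct hash_bucket {
    pthread_mutex_t lock;        /* one lock per chain, not per table */
    struct flow_entry *chain;
};

static struct hash_bucket table[HASH_BUCKETS];

static void init_table(void)
{
    for (int i = 0; i < HASH_BUCKETS; i++)
        pthread_mutex_init(&table[i].lock, NULL);
}

static unsigned int bucket_hash(unsigned int saddr, unsigned int daddr)
{
    return (saddr ^ daddr ^ (saddr >> 16)) & (HASH_BUCKETS - 1);
}

static void flow_insert(unsigned int saddr, unsigned int daddr, void *result)
{
    struct hash_bucket *b = &table[bucket_hash(saddr, daddr)];
    struct flow_entry *e = malloc(sizeof(*e));

    e->saddr = saddr;
    e->daddr = daddr;
    e->result = result;

    pthread_mutex_lock(&b->lock);    /* serializes only this one chain */
    e->next = b->chain;
    b->chain = e;
    pthread_mutex_unlock(&b->lock);
}

static void *flow_lookup(unsigned int saddr, unsigned int daddr)
{
    struct hash_bucket *b = &table[bucket_hash(saddr, daddr)];
    void *result = NULL;

    pthread_mutex_lock(&b->lock);
    for (struct flow_entry *e = b->chain; e; e = e->next) {
        if (e->saddr == saddr && e->daddr == daddr) {
            result = e->result;
            break;
        }
    }
    pthread_mutex_unlock(&b->lock);
    return result;
}
```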
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:03 ` David S. Miller
@ 2002-09-26  0:50   ` Nivedita Singhvi
  2002-09-26  0:40     ` David S. Miller
  0 siblings, 1 reply; 27+ messages in thread

From: Nivedita Singhvi @ 2002-09-26  0:50 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-kernel

"David S. Miller" wrote:
>
>    From: "Nivedita Singhvi" <niv@us.ibm.com>
>    Date: 25 Sep 2002 17:06:53 -0700
>    ...
>    > Everything, from packet forwarding, to firewalling, to TCP socket
>    > packet receive, can be described with routes. It doesn't make sense
>    > for forwarding, TCP, netfilter, and encapsulation schemes to duplicate
>    > all of this table lookup logic and in fact it's entirely superfluous.
>
>    Are you saying combine the tables themselves?
>
>    One of the tradeoffs would be serialization of the access, then,
>    right? i.e. Much less stuff could happen in parallel? Or am I
>    completely misunderstanding your proposal?
>
> In fact the exact opposite, such a suggested flow cache is about
> as parallel as you can make it.
>
> Even if the per-cpu toplevel flow cache idea were not implemented and
> we used the current top-level route lookup infrastructure, it is fully
> parallelized since the toplevel hash table uses per-hashchain locks.
> Please see net/ipv4/route.c:ip_route_input() and friends.

Well, true - we have per-hashchain locks, but are we now adding to the
time we need to look up something on this chain, because we now have
additional info other than the route, is what I was wondering...?

> I don't understand why you think using the routing tables to their
> full potential would imply serialization. If you still believe this
> you have to describe why in more detail.

thanks,
Nivedita

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:50 ` Nivedita Singhvi
@ 2002-09-26  0:40   ` David S. Miller
  2002-09-26  1:09     ` Nivedita Singhvi
  0 siblings, 1 reply; 27+ messages in thread

From: David S. Miller @ 2002-09-26  0:40 UTC (permalink / raw)
To: niv; +Cc: linux-kernel

   From: Nivedita Singhvi <niv@us.ibm.com>
   Date: Wed, 25 Sep 2002 17:50:11 -0700

   Well, true - we have per hashchain locks, but are we now adding
   the times we need to lookup something on this chain because we now
   have additional info other than the route, is what I was
   wondering..?

That's what I meant by "extending the lookup key", consider if we
took "next protocol, src port, dst port" into account.

^ permalink raw reply	[flat|nested] 27+ messages in thread
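["Extending the lookup key" as described here amounts to folding next protocol and ports into the hash alongside the addresses, so one flow-cache lookup can answer firewalling and socket demux as well as routing. A hedged sketch — the struct and the mixing function are illustrative, not a proposed kernel layout:]

```c
/* A 5-tuple flow key: same chain-locked hash table as before, but the
 * key now distinguishes flows, not just address pairs. */
struct flow_key {
    unsigned int   saddr, daddr;
    unsigned short sport, dport;
    unsigned char  protocol;     /* e.g. IPPROTO_TCP */
};

static unsigned int flow_hash(const struct flow_key *k, unsigned int buckets)
{
    unsigned int h = k->saddr ^ k->daddr;

    h ^= ((unsigned int)k->sport << 16) | k->dport;
    h ^= k->protocol;
    return h & (buckets - 1);    /* buckets must be a power of two */
}
```

Two flows that differ only in a port land on (usually) different chains, which is what keeps the extended lookup parallel rather than serialized.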
* Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
  2002-09-26  0:40 ` David S. Miller
@ 2002-09-26  1:09   ` Nivedita Singhvi
  0 siblings, 0 replies; 27+ messages in thread

From: Nivedita Singhvi @ 2002-09-26  1:09 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-kernel

"David S. Miller" wrote:

>    Well, true - we have per hashchain locks, but are we now adding
>    the times we need to lookup something on this chain because we now
>    have additional info other than the route, is what I was
>    wondering..?
>
> That's what I meant by "extending the lookup key", consider if we
> took "next protocol, src port, dst port" into account.

Aah! thick head <-- understanding.

thanks,
Nivedita

^ permalink raw reply	[flat|nested] 27+ messages in thread
end of thread, other threads:[~2002-10-02 17:31 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <3D924F9D.C2DCF56A@us.ibm.com.suse.lists.linux.kernel>
     [not found] ` <20020925.170336.77023245.davem@redhat.com.suse.lists.linux.kernel>
2002-09-26  0:31   ` [ANNOUNCE] NF-HIPAC: High Performance Packet Classification Andi Kleen
2002-09-26  0:29     ` David S. Miller
2002-09-26  0:46       ` Andi Kleen
2002-09-26  0:44         ` David S. Miller
2002-09-26  9:00       ` Roberto Nibali
2002-09-26  9:06         ` David S. Miller
2002-09-26  9:24           ` Roberto Nibali
2002-09-26  9:21             ` David S. Miller
2002-09-26 15:13               ` James Morris
2002-09-26 20:51                 ` Roberto Nibali
2002-09-26 10:25           ` Roberto Nibali
2002-09-26 10:20             ` David S. Miller
2002-09-26 10:49               ` Roberto Nibali
2002-09-26 12:03         ` jamal
2002-09-26 20:23           ` Roberto Nibali
2002-09-27 13:57             ` jamal
2002-09-26 12:04         ` Andi Kleen
2002-09-26 20:49           ` Roberto Nibali
2002-09-30 17:36         ` Bill Davidsen
2002-10-02 17:37           ` Roberto Nibali
2002-09-26  1:17     ` Nivedita Singhvi
2002-09-26  1:15       ` Andi Kleen
2002-09-26  0:06 Nivedita Singhvi
2002-09-26  0:03 ` David S. Miller
2002-09-26  0:50   ` Nivedita Singhvi
2002-09-26  0:40     ` David S. Miller
2002-09-26  1:09       ` Nivedita Singhvi