From mboxrd@z Thu Jan 1 00:00:00 1970
From: Willem de Bruijn
Subject: Re: [PATCH next] iptables: add xt_bpf match
Date: Tue, 8 Jan 2013 20:58:37 -0500
Message-ID:
References: <20121208033111.GB28114@1984> <1355089978-24463-1-git-send-email-willemb@google.com> <20130108032123.GA16502@1984>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: netfilter-devel
To: Pablo Neira Ayuso
Return-path:
Received: from mail-ie0-f177.google.com ([209.85.223.177]:65226 "EHLO mail-ie0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756217Ab3AIB7H (ORCPT ); Tue, 8 Jan 2013 20:59:07 -0500
Received: by mail-ie0-f177.google.com with SMTP id k13so1428914iea.36 for ; Tue, 08 Jan 2013 17:59:07 -0800 (PST)
In-Reply-To: <20130108032123.GA16502@1984>
Sender: netfilter-devel-owner@vger.kernel.org
List-ID:

On Mon, Jan 7, 2013 at 10:21 PM, Pablo Neira Ayuso wrote:
> Hi Willem,
>
> On Sun, Dec 09, 2012 at 04:52:58PM -0500, Willem de Bruijn wrote:
>> Support arbitrary linux socket filter (BPF) programs as iptables
>> match rules. This allows for very expressive filters, and on
>> platforms with BPF JIT appears competitive with traditional hardcoded
>> iptables rules.
>>
>> At least, on an x86_64 that achieves 40K netperf TCP_STREAM without
>> any iptables rules (40 GBps),
>>
>> inserting 100x this bpf rule gives 28K
>>
>> ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j
>>
>> (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',')
>>
>> inserting 100x this u32 rule gives 21K
>>
>> ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP
>>
>> The two are logically equivalent, as far as I can tell. Let me know
>> if my test methodology is flawed in some way. Even in cases where
>> slower, the filter adds functionality currently lacking in iptables,
>> such as access to sk_buff fields like rxhash and queue_mapping.
>>
>> Signed-off-by: Willem de Bruijn
>> ---
>> include/linux/netfilter/xt_bpf.h | 17 +++++++
>> net/netfilter/Kconfig | 9 ++++
>> net/netfilter/Makefile | 1 +
>> net/netfilter/x_tables.c | 5 +-
>> net/netfilter/xt_bpf.c | 86 ++++++++++++++++++++++++++++++++++++++
>> 5 files changed, 116 insertions(+), 2 deletions(-)
>> create mode 100644 include/linux/netfilter/xt_bpf.h
>> create mode 100644 net/netfilter/xt_bpf.c
>>
>> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h
>> new file mode 100644
>> index 0000000..23502c0
>> --- /dev/null
>> +++ b/include/linux/netfilter/xt_bpf.h
>> @@ -0,0 +1,17 @@
>> +#ifndef _XT_BPF_H
>> +#define _XT_BPF_H
>> +
>> +#include
>> +#include
>> +
>> +struct xt_bpf_info {
>> + __u16 bpf_program_num_elem;
>> +
>> + /* only used in kernel */
>> + struct sk_filter *filter __attribute__((aligned(8)));
>
> I see. You set match->userspacesize to zero in libxt_bpf to skip the
> comparison of that internal struct sk_filter *filter.
>
>> +
>> + /* variable size, based on program_num_elem */
>> + struct sock_filter bpf_program[0];
>
> While testing this I noticed:
>
> iptables -I OUTPUT -m bpf --bytecode \
> '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT
>
> Note that this works but it should not.
>
> iptables -D OUTPUT -m bpf --bytecode \
> '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,1 0 0 0' -j ACCEPT
>                                                        ^
> Mind that 1, it's a different filter, but it deletes the previous
> filter without problems here.
>
> A quick look at make_delete_mask() in iptables tells me that the
> changes you made to userspace to allow variable size matches are not
> enough to generate a sane mask (which is fundamental while looking for
> a matching rule during the deletion).

Thanks for finding this, Pablo. I completely forgot to check that.
I've never looked at that deletion code before. Will read it and
hopefully propose a simple fix in a few days.
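
To make the failure mode concrete, here is a rough model of a masked rule comparison. It is only a sketch of the idea behind a deletion mask, not the actual make_delete_mask() code: struct demo_insn, demo_build_mask(), demo_masked_equal() and demo_delete_would_match() are made-up names, and the assumption is simply that only the first userspacesize bytes of the match data are marked as significant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct sock_filter: one classic BPF instruction. */
struct demo_insn {
    uint16_t code;
    uint8_t jt;
    uint8_t jf;
    uint32_t k;
};

#define DEMO_NUM_INSNS 6

/* Mark only the first `userspacesize` bytes of the match data as significant. */
static void demo_build_mask(unsigned char *mask, size_t total, size_t userspacesize)
{
    assert(userspacesize <= total);
    memset(mask, 0x00, total);
    memset(mask, 0xFF, userspacesize);
}

/* Return 1 if a and b agree on every byte the mask marks as significant. */
static int demo_masked_equal(const unsigned char *a, const unsigned char *b,
                             const unsigned char *mask, size_t total)
{
    for (size_t i = 0; i < total; i++)
        if ((a[i] ^ b[i]) & mask[i])
            return 0;
    return 1;
}

/*
 * Compare the inserted program with one whose last instruction differs
 * (code 6 vs. code 1), using the given userspacesize for the mask.
 */
static int demo_delete_would_match(size_t userspacesize)
{
    const struct demo_insn inserted[DEMO_NUM_INSNS] = {
        {40, 0, 0, 14}, {21, 0, 3, 2048}, {48, 0, 0, 25},
        {21, 0, 1, 20}, {6, 0, 0, 96},    {6, 0, 0, 0},
    };
    struct demo_insn requested[DEMO_NUM_INSNS];
    unsigned char mask[sizeof(inserted)];

    memcpy(requested, inserted, sizeof(inserted));
    requested[DEMO_NUM_INSNS - 1].code = 1;

    demo_build_mask(mask, sizeof(mask), userspacesize);
    return demo_masked_equal((const unsigned char *)inserted,
                             (const unsigned char *)requested,
                             mask, sizeof(mask));
}
```

In this model, demo_delete_would_match(0) reports a match because no byte of the program is compared, which mirrors the -D behaviour reported above. Passing the full program size makes the differing instruction visible and the comparison fails.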
An earlier version of the patch used a statically sized struct, by the way, like xt_string does (XT_STRING_MAX_PATTERN_SIZE). If it is easier to incorporate, we can always revert to that.
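
As a rough illustration of what a statically sized layout could look like (a sketch only, not the earlier patch version itself): DEMO_BPF_MAX_NUM_INSTR, struct demo_sock_filter, struct demo_bpf_info_fixed and the helper functions are hypothetical names, and the cap of 64 instructions is an arbitrary value picked for the example. The assumption is that placing the kernel-only pointer last lets userspace compare everything before it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on program length; the value is an assumption. */
#define DEMO_BPF_MAX_NUM_INSTR 64

/* Hypothetical stand-in for struct sock_filter. */
struct demo_sock_filter {
    uint16_t code;
    uint8_t jt;
    uint8_t jf;
    uint32_t k;
};

/* Fixed-size variant: the program array has a compile-time bound. */
struct demo_bpf_info_fixed {
    uint16_t bpf_program_num_elem;
    struct demo_sock_filter bpf_program[DEMO_BPF_MAX_NUM_INSTR];

    /* only used in kernel; kept last so it can be excluded from comparison */
    void *filter __attribute__((aligned(8)));
};

/* Bytes that userspace would compare: everything before the kernel-only field. */
static size_t demo_userspacesize(void)
{
    return offsetof(struct demo_bpf_info_fixed, filter);
}

/* Return 1 if a program with num_elem instructions fits in the fixed array. */
static int demo_program_fits(unsigned int num_elem)
{
    return num_elem > 0 && num_elem <= DEMO_BPF_MAX_NUM_INSTR;
}
```

With a layout like this the compared size is a constant that covers the whole program, at the cost of rejecting programs longer than the cap and carrying unused array space in every rule.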