From mboxrd@z Thu Jan  1 00:00:00 1970
From: Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH next] iptables: add xt_bpf match
Date: Tue, 8 Jan 2013 20:58:37 -0500
Message-ID: <CA+FuTSe-t0Cougo5_7hec6obgxon=8VdcreEB4_hJB5w881bYg@mail.gmail.com>
References: <20121208033111.GB28114@1984> <1355089978-24463-1-git-send-email-willemb@google.com>
 <20130108032123.GA16502@1984>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: netfilter-devel <netfilter-devel@vger.kernel.org>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Return-path: <netfilter-devel-owner@vger.kernel.org>
Received: from mail-ie0-f177.google.com ([209.85.223.177]:65226 "EHLO
	mail-ie0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756217Ab3AIB7H (ORCPT
	<rfc822;netfilter-devel@vger.kernel.org>);
	Tue, 8 Jan 2013 20:59:07 -0500
Received: by mail-ie0-f177.google.com with SMTP id k13so1428914iea.36
        for <netfilter-devel@vger.kernel.org>; Tue, 08 Jan 2013 17:59:07 -0800 (PST)
In-Reply-To: <20130108032123.GA16502@1984>
Sender: netfilter-devel-owner@vger.kernel.org
List-ID: <netfilter-devel.vger.kernel.org>

On Mon, Jan 7, 2013 at 10:21 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Willem,
>
> On Sun, Dec 09, 2012 at 04:52:58PM -0500, Willem de Bruijn wrote:
>> Support arbitrary linux socket filter (BPF) programs as iptables
>> match rules. This allows for very expressive filters, and on
>> platforms with BPF JIT appears competitive with traditional hardcoded
>> iptables rules.
>>
>> At least, on an x86_64 that achieves 40K netperf TCP_STREAM without
>> any iptables rules (40 GBps),
>>
>> inserting 100x this bpf rule gives 28K
>>
>>     ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j
>>
>>     (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',')
>>
>> inserting 100x this u32 rule gives 21K
>>
>>     ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP
>>
>> The two are logically equivalent, as far as I can tell. Let me know
>> if my test methodology is flawed in some way. Even in cases where
>> slower, the filter adds functionality currently lacking in iptables,
>> such as access to sk_buff fields like rxhash and queue_mapping.
>>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>> ---
>>  include/linux/netfilter/xt_bpf.h |   17 +++++++
>>  net/netfilter/Kconfig            |    9 ++++
>>  net/netfilter/Makefile           |    1 +
>>  net/netfilter/x_tables.c         |    5 +-
>>  net/netfilter/xt_bpf.c           |   86 ++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 116 insertions(+), 2 deletions(-)
>>  create mode 100644 include/linux/netfilter/xt_bpf.h
>>  create mode 100644 net/netfilter/xt_bpf.c
>>
>> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h
>> new file mode 100644
>> index 0000000..23502c0
>> --- /dev/null
>> +++ b/include/linux/netfilter/xt_bpf.h
>> @@ -0,0 +1,17 @@
>> +#ifndef _XT_BPF_H
>> +#define _XT_BPF_H
>> +
>> +#include <linux/filter.h>
>> +#include <linux/types.h>
>> +
>> +struct xt_bpf_info {
>> +     __u16 bpf_program_num_elem;
>> +
>> +     /* only used in kernel */
>> +     struct sk_filter *filter __attribute__((aligned(8)));
>
> I see. You set match->userspacesize to zero in libxt_bpf to skip the
> comparison of that internal struct sk_filter *filter.
>
>> +
>> +     /* variable size, based on program_num_elem */
>> +     struct sock_filter bpf_program[0];
>
> While testing this I noticed:
>
> iptables -I OUTPUT -m bpf --bytecode   \
>         '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT
>
> Note that this works but it should not.
>
> iptables -D OUTPUT -m bpf --bytecode   \
>         '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,1 0 0 0' -j ACCEPT
>                                                                ^
> Mind that 1, it's a different filter, but it deletes the previous
> filter without problems here.
>
> A quick look at make_delete_mask() in iptables tells me that the
> changes you made to userspace to allow variable size matches are not
> enough to generate a sane mask (which is fundamental while looking for
> a matching rule during the deletion).

Thanks for finding this, Pablo. I completely forgot to test rule deletion.

I have not looked at the deletion code before. I will read it and
hopefully propose a simple fix in a few days. An earlier version of
the patch used a statically sized struct, by the way, like xt_string
does (XT_STRING_MAX_PATTERN_SIZE). If that is easier to merge, I can
always revert to that approach.
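
To make the trade-off concrete, here is a rough userspace sketch of why
a fixed-size layout helps, based on my current understanding of the
mask-based comparison (I have not verified this against
make_delete_mask() yet). All names here (sketch_bpf_info,
SKETCH_BPF_MAX_INSTR, sketch_same_rule) are made up for illustration
and are not from the patch. The cap of 64 instructions is an arbitrary
assumption. The idea: if the program sits in front of the kernel-only
pointer, userspacesize can cover it, and two rules with different
bytecode no longer compare as equal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cap, chosen only for this sketch. */
#define SKETCH_BPF_MAX_INSTR 64

/* Stand-in with the same field layout as struct sock_filter. */
struct sketch_insn {
	uint16_t code;
	uint8_t  jt;
	uint8_t  jf;
	uint32_t k;
};

/* Fixed-size variant: program first, kernel-only pointer last. */
struct sketch_bpf_info {
	uint16_t num_elem;
	struct sketch_insn program[SKETCH_BPF_MAX_INSTR];

	/* only used in kernel, excluded from comparison */
	void *filter __attribute__((aligned(8)));
};

/* Build a two-instruction program whose last opcode is last_code. */
static void sketch_fill(struct sketch_bpf_info *info, uint16_t last_code)
{
	memset(info, 0, sizeof(*info));
	info->num_elem = 2;
	info->program[0] = (struct sketch_insn){ 40, 0, 0, 14 };
	info->program[1] = (struct sketch_insn){ last_code, 0, 0, 0 };
}

/*
 * Compare only the first userspacesize bytes of the match data,
 * which is roughly what a byte mask over the rule amounts to.
 * Returns nonzero if the two rules look identical.
 */
static int sketch_same_rule(const struct sketch_bpf_info *a,
			    const struct sketch_bpf_info *b,
			    size_t userspacesize)
{
	assert(userspacesize <= sizeof(*a));
	return memcmp(a, b, userspacesize) == 0;
}
```

With userspacesize set to 0, sketch_same_rule() reports any two
programs as the same rule, which matches the -D behavior you saw. With
userspacesize set to offsetof(struct sketch_bpf_info, filter), the
bytecode takes part in the comparison while the kernel pointer is still
skipped.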