* [PATCH rfc] netfilter: two xtables matches @ 2012-12-05 19:22 Willem de Bruijn 2012-12-05 19:22 ` [PATCH 1/2] netfilter: add xt_priority xtables match Willem de Bruijn ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: Willem de Bruijn @ 2012-12-05 19:22 UTC (permalink / raw) To: netfilter-devel, netdev, edumazet, davem, kaber, pablo The second patch is more speculative and aims to be a more general workaround, as well as a performance optimization: support (preferably JIT compiled) BPF programs as iptables match rules. Potentially, the skb->priority match can be implemented by applying only the second patch and adding a new BPF_S_ANC ancillary field to Linux Socket Filters. I also wrote corresponding userspace patches to iptables. The process for submitting both kernel and user patches is not 100% clear to me. Sending the kernel bits to both netdev and netfilter-devel for initial feedback. Please correct me if you want it another way. The patches apply to net-next. ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 1/2] netfilter: add xt_priority xtables match 2012-12-05 19:22 [PATCH rfc] netfilter: two xtables matches Willem de Bruijn @ 2012-12-05 19:22 ` Willem de Bruijn 2012-12-08 0:04 ` [PATCH] [RFC] netfilter: add xt_skbuff " Willem de Bruijn 2012-12-05 19:22 ` [PATCH 2/2] netfilter: add xt_bpf " Willem de Bruijn 2012-12-05 19:28 ` [PATCH rfc] netfilter: two xtables matches Willem de Bruijn 2 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2012-12-05 19:22 UTC (permalink / raw) To: netfilter-devel, netdev, edumazet, davem, kaber, pablo; +Cc: Willem de Bruijn Add an iptables match based on the skb->priority field. This field can be set by socket option SO_PRIORITY, among others. The match supports range based matching on packet priority, with optional inversion. Before matching, a mask can be applied to the priority field to handle the case where different regions of the bitfield are reserved for unrelated uses. --- include/linux/netfilter/xt_priority.h | 13 ++++++++ net/netfilter/Kconfig | 9 ++++++ net/netfilter/Makefile | 1 + net/netfilter/xt_priority.c | 51 +++++++++++++++++++++++++++++++++ 4 files changed, 74 insertions(+), 0 deletions(-) create mode 100644 include/linux/netfilter/xt_priority.h create mode 100644 net/netfilter/xt_priority.c diff --git a/include/linux/netfilter/xt_priority.h b/include/linux/netfilter/xt_priority.h new file mode 100644 index 0000000..da9a288 --- /dev/null +++ b/include/linux/netfilter/xt_priority.h @@ -0,0 +1,13 @@ +#ifndef _XT_PRIORITY_H +#define _XT_PRIORITY_H + +#include <linux/types.h> + +struct xt_priority_info { + __u32 min; + __u32 max; + __u32 mask; + __u8 invert; +}; + +#endif /*_XT_PRIORITY_H */ diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index fefa514..c9739c6 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -1093,6 +1093,15 @@ config NETFILTER_XT_MATCH_PKTTYPE To compile it as a module, choose M here. If unsure, say N. +config NETFILTER_XT_MATCH_PRIORITY + tristate '"priority" match support' + depends on NETFILTER_ADVANCED + help + This option adds a match based on the value of the sk_buff + priority field. + + To compile it as a module, choose M here. If unsure, say N. + config NETFILTER_XT_MATCH_QUOTA tristate '"quota" match support' depends on NETFILTER_ADVANCED diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 3259697..8e5602f 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -124,6 +124,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o obj-$(CONFIG_NETFILTER_XT_MATCH_PHYSDEV) += xt_physdev.o obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o +obj-$(CONFIG_NETFILTER_XT_MATCH_PRIORITY) += xt_priority.o obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA) += xt_quota.o obj-$(CONFIG_NETFILTER_XT_MATCH_RATEEST) += xt_rateest.o obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o diff --git a/net/netfilter/xt_priority.c b/net/netfilter/xt_priority.c new file mode 100644 index 0000000..4982eee --- /dev/null +++ b/net/netfilter/xt_priority.c @@ -0,0 +1,51 @@ +/* Xtables module to match packets based on their sk_buff priority field. + * Copyright 2012 Google Inc. + * Written by Willem de Bruijn <willemb@google.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/module.h> +#include <linux/skbuff.h> + +#include <linux/netfilter/xt_priority.h> +#include <linux/netfilter/x_tables.h> + +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); +MODULE_DESCRIPTION("Xtables: priority filter match"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("ipt_priority"); +MODULE_ALIAS("ip6t_priority"); + +static bool priority_mt(const struct sk_buff *skb, + struct xt_action_param *par) +{ + const struct xt_priority_info *info = par->matchinfo; + + __u32 priority = skb->priority & info->mask; + return (priority >= info->min && priority <= info->max) ^ info->invert; +} + +static struct xt_match priority_mt_reg __read_mostly = { + .name = "priority", + .revision = 0, + .family = NFPROTO_UNSPEC, + .match = priority_mt, + .matchsize = sizeof(struct xt_priority_info), + .me = THIS_MODULE, +}; + +static int __init priority_mt_init(void) +{ + return xt_register_match(&priority_mt_reg); +} + +static void __exit priority_mt_exit(void) +{ + xt_unregister_match(&priority_mt_reg); +} + +module_init(priority_mt_init); +module_exit(priority_mt_exit); -- 1.7.7.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH] [RFC] netfilter: add xt_skbuff xtables match 2012-12-05 19:22 ` [PATCH 1/2] netfilter: add xt_priority xtables match Willem de Bruijn @ 2012-12-08 0:04 ` Willem de Bruijn 2012-12-08 3:23 ` Pablo Neira Ayuso 0 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2012-12-08 0:04 UTC (permalink / raw) To: netfilter-devel, pablo, kaber; +Cc: Willem de Bruijn Add an iptables match based on skb fields, such as mark, priority, input interface and rxhash. The match supports range based matching on one field, with optional inversion and masking. v2: switches from xt_priority to xt_skbuff. Pablo, is this what you had in mind? It doesn't perfectly duplicate the values from nftables xt_meta. Needs more testing to cover the field-specific codepaths. Tested by inserting iptables -t mangle -A PREROUTING -s $SRC -j MARK --set-mark 10 iptables -A INPUT -m skbuff --min 10 --max 10 -j TRACE The userspace tool needs work, too. For one, I just hardcoded the field_id to be skb_field_mark for this test. That's why it's missing from the command line. Signed-off-by: Willem de Bruijn <willemb@google.com> --- include/linux/netfilter/xt_skbuff.h | 33 ++++++++ net/netfilter/Kconfig | 9 ++ net/netfilter/Makefile | 1 + net/netfilter/xt_skbuff.c | 141 +++++++++++++++++++++++++++++++++++ 4 files changed, 184 insertions(+), 0 deletions(-) create mode 100644 include/linux/netfilter/xt_skbuff.h create mode 100644 net/netfilter/xt_skbuff.c diff --git a/include/linux/netfilter/xt_skbuff.h b/include/linux/netfilter/xt_skbuff.h new file mode 100644 index 0000000..10eb8d8 --- /dev/null +++ b/include/linux/netfilter/xt_skbuff.h @@ -0,0 +1,33 @@ +#ifndef _XT_SKBUFF_H +#define _XT_SKBUFF_H + +#include <linux/types.h> + +enum xt_skbuff_field_selector { + skb_field_csum = 0, + skb_field_hatype, + skb_field_iif, + skb_field_len, + skb_field_mark, + skb_field_pkt_type, + skb_field_priority, + skb_field_protocol, + skb_field_queue_mapping, + skb_field_rt_classid, + skb_field_rxhash, + skb_field_secmark, + skb_field_sk_uid, + skb_field_sk_gid, + skb_field_tstamp, + skb_field_vlan_tci, +}; + +struct xt_skbuff_info { + __u16 field_id; /* an xt_skbuff_field_selector value */ + __u8 invert; + __u64 min; + __u64 max; + __u64 mask; +}; + +#endif /*_XT_SKBUFF_H */ diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index fefa514..3a07a86 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -1093,6 +1093,15 @@ config NETFILTER_XT_MATCH_PKTTYPE To compile it as a module, choose M here. If unsure, say N. +config NETFILTER_XT_MATCH_SKBUFF + tristate '"skbuff" match support' + depends on NETFILTER_ADVANCED + help + This option adds a match based on the value of a chosen sk_buff + field. + + To compile it as a module, choose M here. If unsure, say N. + config NETFILTER_XT_MATCH_QUOTA tristate '"quota" match support' depends on NETFILTER_ADVANCED diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 3259697..9bc95e0 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -129,6 +129,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_RATEEST) += xt_rateest.o obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o obj-$(CONFIG_NETFILTER_XT_MATCH_RECENT) += xt_recent.o obj-$(CONFIG_NETFILTER_XT_MATCH_SCTP) += xt_sctp.o +obj-$(CONFIG_NETFILTER_XT_MATCH_SKBUFF) += xt_skbuff.o obj-$(CONFIG_NETFILTER_XT_MATCH_SOCKET) += xt_socket.o obj-$(CONFIG_NETFILTER_XT_MATCH_STATE) += xt_state.o obj-$(CONFIG_NETFILTER_XT_MATCH_STATISTIC) += xt_statistic.o diff --git a/net/netfilter/xt_skbuff.c b/net/netfilter/xt_skbuff.c new file mode 100644 index 0000000..5ca30eb --- /dev/null +++ b/net/netfilter/xt_skbuff.c @@ -0,0 +1,141 @@ +/* Xtables module to match packets based on sk_buff fields. + * Copyright 2012 Google Inc. + * Written by Willem de Bruijn <willemb@google.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/module.h> +#include <linux/skbuff.h> +#include <net/sock.h> + +#include <linux/netfilter/xt_skbuff.h> +#include <linux/netfilter/x_tables.h> + +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); +MODULE_DESCRIPTION("Xtables: skbuff match"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("ipt_priority"); +MODULE_ALIAS("ip6t_priority"); + +static bool skbuff_mt(const struct sk_buff *skb, + struct xt_action_param *par) +{ + const struct xt_skbuff_info *info = par->matchinfo; + u64 value; + + switch (info->field_id) { + case skb_field_csum: + if (skb->ip_summed != CHECKSUM_COMPLETE) + return false; + value = skb->csum; + break; + case skb_field_hatype: + if (!skb->dev) + return false; + value = skb->dev->type; + break; + case skb_field_iif: + value = skb->skb_iif; + break; + case skb_field_len: + value = skb->len; + break; + case skb_field_mark: + value = skb->mark; + break; + case skb_field_pkt_type: + value = skb->pkt_type; + break; + case skb_field_priority: + value = skb->priority; + break; + case skb_field_protocol: + value = skb->protocol; + break; + case skb_field_queue_mapping: + value = skb->queue_mapping; + break; + case skb_field_rt_classid: +#ifdef CONFIG_NET_CLS_ROUTE + const struct dst_entry *dst; + + rcu_read_lock(); + dst = skb_dst(skb); + if (dst) + value = dst->tclassid; + rcu_read_unlock(); + if (!dst) + return false; + break; +#else + return false; +#endif + case skb_field_rxhash: + value = skb->rxhash; + break; + case skb_field_secmark: +#ifdef CONFIG_NETWORK_SECMARK + value = skb->secmark; + break; +#else + return false; +#endif + case skb_field_sk_uid: + if (!skb->sk || !skb->sk->sk_socket || !skb->sk->sk_socket->file) + return false; + value = skb->sk->sk_socket->file->f_cred->fsuid; + break; + case skb_field_sk_gid: + if (!skb->sk || !skb->sk->sk_socket || !skb->sk->sk_socket->file) + return false; + value = skb->sk->sk_socket->file->f_cred->fsgid; + break; + case skb_field_tstamp: + value = skb->tstamp.tv64; + break; + case skb_field_vlan_tci: + value = skb->vlan_tci; + break; + default: + return false; + } + + value &= info->mask; + return (value >= info->min && value <= info->max) ^ info->invert; +} + +static int skbuff_mt_check(const struct xt_mtchk_param *par) +{ + const struct xt_skbuff_info *info = par->matchinfo; + + if (info->field_id > skb_field_vlan_tci) + return -EINVAL; + + return 0; +} + +static struct xt_match skbuff_mt_reg __read_mostly = { + .name = "skbuff", + .revision = 0, + .family = NFPROTO_UNSPEC, + .match = skbuff_mt, + .matchsize = sizeof(struct xt_skbuff_info), + .checkentry = skbuff_mt_check, + .me = THIS_MODULE, +}; + +static int __init skbuff_mt_init(void) +{ + return xt_register_match(&skbuff_mt_reg); +} + +static void __exit skbuff_mt_exit(void) +{ + xt_unregister_match(&skbuff_mt_reg); +} + +module_init(skbuff_mt_init); +module_exit(skbuff_mt_exit); -- 1.7.7.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] [RFC] netfilter: add xt_skbuff xtables match 2012-12-08 0:04 ` [PATCH] [RFC] netfilter: add xt_skbuff " Willem de Bruijn @ 2012-12-08 3:23 ` Pablo Neira Ayuso 2012-12-09 20:24 ` Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Pablo Neira Ayuso @ 2012-12-08 3:23 UTC (permalink / raw) To: Willem de Bruijn; +Cc: netfilter-devel, kaber On Fri, Dec 07, 2012 at 07:04:04PM -0500, Willem de Bruijn wrote: > Add an iptables match based on skb fields, such as mark, priority, > input interface and rxhash. The match supports range based matching > on one field, with optional inversion and masking. > > v2: switches from xt_priority to xt_skbuff. Pablo, is this what > you had in mind? It doesn't perfectly duplicate the values from > nftables xt_meta. Needs more testing to cover the field-specific > codepaths. Yes, I think this is the good direction for it. > Tested by inserting > > iptables -t mangle -A PREROUTING -s $SRC -j MARK --set-mark 10 > iptables -A INPUT -m skbuff --min 10 --max 10 -j TRACE Hm, you have to specify the field type in the iptables rule, right?i > The userspace tool needs work, too. For one, I just hardcoded the > field_id to be skb_field_mark for this test. That's why it's missing > from the command line. Please, once you're done with the user-space part, post it to the ML in order to help testing this. > Signed-off-by: Willem de Bruijn <willemb@google.com> > --- > include/linux/netfilter/xt_skbuff.h | 33 ++++++++ > net/netfilter/Kconfig | 9 ++ > net/netfilter/Makefile | 1 + > net/netfilter/xt_skbuff.c | 141 +++++++++++++++++++++++++++++++++++ > 4 files changed, 184 insertions(+), 0 deletions(-) > create mode 100644 include/linux/netfilter/xt_skbuff.h > create mode 100644 net/netfilter/xt_skbuff.c > > diff --git a/include/linux/netfilter/xt_skbuff.h b/include/linux/netfilter/xt_skbuff.h > new file mode 100644 > index 0000000..10eb8d8 > --- /dev/null > +++ b/include/linux/netfilter/xt_skbuff.h > @@ -0,0 +1,33 @@ > +#ifndef _XT_SKBUFF_H > +#define _XT_SKBUFF_H > + > +#include <linux/types.h> > + > +enum xt_skbuff_field_selector { these enums in uppercase. > + skb_field_csum = 0, > + skb_field_hatype, > + skb_field_iif, > + skb_field_len, > + skb_field_mark, > + skb_field_pkt_type, > + skb_field_priority, > + skb_field_protocol, > + skb_field_queue_mapping, > + skb_field_rt_classid, > + skb_field_rxhash, > + skb_field_secmark, > + skb_field_sk_uid, > + skb_field_sk_gid, > + skb_field_tstamp, > + skb_field_vlan_tci, Please, check if these all are worth to be exposed to user-space via iptables. > +}; > + > +struct xt_skbuff_info { > + __u16 field_id; /* an xt_skbuff_field_selector value */ > + __u8 invert; > + __u64 min; > + __u64 max; > + __u64 mask; > +}; > + > +#endif /*_XT_SKBUFF_H */ > diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig > index fefa514..3a07a86 100644 > --- a/net/netfilter/Kconfig > +++ b/net/netfilter/Kconfig > @@ -1093,6 +1093,15 @@ config NETFILTER_XT_MATCH_PKTTYPE > > To compile it as a module, choose M here. If unsure, say N. > > +config NETFILTER_XT_MATCH_SKBUFF > + tristate '"skbuff" match support' > + depends on NETFILTER_ADVANCED > + help > + This option adds a match based on the value of a chosen sk_buff > + field. > + > + To compile it as a module, choose M here. If unsure, say N. > + > config NETFILTER_XT_MATCH_QUOTA > tristate '"quota" match support' > depends on NETFILTER_ADVANCED > diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile > index 3259697..9bc95e0 100644 > --- a/net/netfilter/Makefile > +++ b/net/netfilter/Makefile > @@ -129,6 +129,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_RATEEST) += xt_rateest.o > obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o > obj-$(CONFIG_NETFILTER_XT_MATCH_RECENT) += xt_recent.o > obj-$(CONFIG_NETFILTER_XT_MATCH_SCTP) += xt_sctp.o > +obj-$(CONFIG_NETFILTER_XT_MATCH_SKBUFF) += xt_skbuff.o > obj-$(CONFIG_NETFILTER_XT_MATCH_SOCKET) += xt_socket.o > obj-$(CONFIG_NETFILTER_XT_MATCH_STATE) += xt_state.o > obj-$(CONFIG_NETFILTER_XT_MATCH_STATISTIC) += xt_statistic.o > diff --git a/net/netfilter/xt_skbuff.c b/net/netfilter/xt_skbuff.c > new file mode 100644 > index 0000000..5ca30eb > --- /dev/null > +++ b/net/netfilter/xt_skbuff.c > @@ -0,0 +1,141 @@ > +/* Xtables module to match packets based on sk_buff fields. > + * Copyright 2012 Google Inc. > + * Written by Willem de Bruijn <willemb@google.com> > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + */ > + > +#include <linux/module.h> > +#include <linux/skbuff.h> > +#include <net/sock.h> > + > +#include <linux/netfilter/xt_skbuff.h> > +#include <linux/netfilter/x_tables.h> > + > +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); > +MODULE_DESCRIPTION("Xtables: skbuff match"); > +MODULE_LICENSE("GPL"); > +MODULE_ALIAS("ipt_priority"); > +MODULE_ALIAS("ip6t_priority"); you'll have to remove this aliasing support. There was not previous priority match in mainstream. > +static bool skbuff_mt(const struct sk_buff *skb, > + struct xt_action_param *par) > +{ > + const struct xt_skbuff_info *info = par->matchinfo; > + u64 value; > + > + switch (info->field_id) { > + case skb_field_csum: > + if (skb->ip_summed != CHECKSUM_COMPLETE) > + return false; > + value = skb->csum; > + break; > + case skb_field_hatype: > + if (!skb->dev) > + return false; > + value = skb->dev->type; > + break; > + case skb_field_iif: > + value = skb->skb_iif; > + break; > + case skb_field_len: > + value = skb->len; > + break; > + case skb_field_mark: > + value = skb->mark; > + break; > + case skb_field_pkt_type: > + value = skb->pkt_type; > + break; > + case skb_field_priority: > + value = skb->priority; > + break; > + case skb_field_protocol: > + value = skb->protocol; > + break; > + case skb_field_queue_mapping: > + value = skb->queue_mapping; > + break; > + case skb_field_rt_classid: > +#ifdef CONFIG_NET_CLS_ROUTE > + const struct dst_entry *dst; > + > + rcu_read_lock(); > + dst = skb_dst(skb); > + if (dst) > + value = dst->tclassid; > + rcu_read_unlock(); > + if (!dst) > + return false; > + break; > +#else > + return false; > +#endif > + case skb_field_rxhash: > + value = skb->rxhash; > + break; > + case skb_field_secmark: > +#ifdef CONFIG_NETWORK_SECMARK > + value = skb->secmark; > + break; > +#else > + return false; > +#endif > + case skb_field_sk_uid: > + if (!skb->sk || !skb->sk->sk_socket || !skb->sk->sk_socket->file) > + return false; > + value = skb->sk->sk_socket->file->f_cred->fsuid; > + break; > + case skb_field_sk_gid: > + if (!skb->sk || !skb->sk->sk_socket || !skb->sk->sk_socket->file) > + return false; > + value = skb->sk->sk_socket->file->f_cred->fsgid; > + break; > + case skb_field_tstamp: > + value = skb->tstamp.tv64; > + break; > + case skb_field_vlan_tci: > + value = skb->vlan_tci; > + break; > + default: > + return false; > + } > + > + value &= info->mask; > + return (value >= info->min && value <= info->max) ^ info->invert; > +} > + > +static int skbuff_mt_check(const struct xt_mtchk_param *par) > +{ > + const struct xt_skbuff_info *info = par->matchinfo; > + > + if (info->field_id > skb_field_vlan_tci) Better define SKB_FIELD_MAX ? > + return -EINVAL; probably -EOPNOTSUPP is better in case we add some new skbuff field that we support. > + > + return 0; > +} > + > +static struct xt_match skbuff_mt_reg __read_mostly = { > + .name = "skbuff", > + .revision = 0, > + .family = NFPROTO_UNSPEC, > + .match = skbuff_mt, > + .matchsize = sizeof(struct xt_skbuff_info), > + .checkentry = skbuff_mt_check, > + .me = THIS_MODULE, > +}; > + > +static int __init skbuff_mt_init(void) > +{ > + return xt_register_match(&skbuff_mt_reg); > +} > + > +static void __exit skbuff_mt_exit(void) > +{ > + xt_unregister_match(&skbuff_mt_reg); > +} > + > +module_init(skbuff_mt_init); > +module_exit(skbuff_mt_exit); > -- > 1.7.7.3 > ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] [RFC] netfilter: add xt_skbuff xtables match 2012-12-08 3:23 ` Pablo Neira Ayuso @ 2012-12-09 20:24 ` Willem de Bruijn 2012-12-09 20:28 ` [PATCH] " Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2012-12-09 20:24 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: netfilter-devel, Patrick McHardy On Fri, Dec 7, 2012 at 10:23 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Fri, Dec 07, 2012 at 07:04:04PM -0500, Willem de Bruijn wrote: >> Add an iptables match based on skb fields, such as mark, priority, >> input interface and rxhash. The match supports range based matching >> on one field, with optional inversion and masking. >> >> v2: switches from xt_priority to xt_skbuff. Pablo, is this what >> you had in mind? It doesn't perfectly duplicate the values from >> nftables xt_meta. Needs more testing to cover the field-specific >> codepaths. > > Yes, I think this is the good direction for it. > >> Tested by inserting >> >> iptables -t mangle -A PREROUTING -s $SRC -j MARK --set-mark 10 >> iptables -A INPUT -m skbuff --min 10 --max 10 -j TRACE > > Hm, you have to specify the field type in the iptables rule, right?i Yes, updated in the patch I just sent. >> The userspace tool needs work, too. For one, I just hardcoded the >> field_id to be skb_field_mark for this test. That's why it's missing >> from the command line. > > Please, once you're done with the user-space part, post it to the ML > in order to help testing this. > >> Signed-off-by: Willem de Bruijn <willemb@google.com> >> --- >> include/linux/netfilter/xt_skbuff.h | 33 ++++++++ >> net/netfilter/Kconfig | 9 ++ >> net/netfilter/Makefile | 1 + >> net/netfilter/xt_skbuff.c | 141 +++++++++++++++++++++++++++++++++++ >> 4 files changed, 184 insertions(+), 0 deletions(-) >> create mode 100644 include/linux/netfilter/xt_skbuff.h >> create mode 100644 net/netfilter/xt_skbuff.c >> >> diff --git a/include/linux/netfilter/xt_skbuff.h b/include/linux/netfilter/xt_skbuff.h >> new file mode 100644 >> index 0000000..10eb8d8 >> --- /dev/null >> +++ b/include/linux/netfilter/xt_skbuff.h >> @@ -0,0 +1,33 @@ >> +#ifndef _XT_SKBUFF_H >> +#define _XT_SKBUFF_H >> + >> +#include <linux/types.h> >> + >> +enum xt_skbuff_field_selector { > > these enums in uppercase. Done. I'll send a revised patch right after this email that addresses these and the other comments. Unfortunately, I didn't think of revising the kernel module before sending the iptables patch, so that patch is against the old module. In particular, against the headerfile with lower-case enums. >> + skb_field_csum = 0, >> + skb_field_hatype, >> + skb_field_iif, >> + skb_field_len, >> + skb_field_mark, >> + skb_field_pkt_type, >> + skb_field_priority, >> + skb_field_protocol, >> + skb_field_queue_mapping, >> + skb_field_rt_classid, >> + skb_field_rxhash, >> + skb_field_secmark, >> + skb_field_sk_uid, >> + skb_field_sk_gid, >> + skb_field_tstamp, >> + skb_field_vlan_tci, > > Please, check if these all are worth to be exposed to user-space via > iptables. On second thought, iif may not be useful. Replaced with dev->ifindex, which I think will be. hatype, len, mark, pkt_type, protocol, queue_mapping and rxhash are taken from bpf ancillary, so I assume that there is a use. priority is the one field that I really want to be able to filter on myself. rt_classid, secmark, sk_uid and sk_gid are from nftables meta. csum and tstamp are new. I don't have an immediate good example of either, but they are straightforward to add. The first is at the least a source of entropy. Some of these raw skb field readings are very crude approximations of existing filters: xt_pkttype, xt_owners, xt_time and I may have missed some. I don't see that as an issue, given the few lines of code each adds to the total patch, but let me know if you want me to rip these (or any others) out. >> +}; >> + >> +struct xt_skbuff_info { >> + __u16 field_id; /* an xt_skbuff_field_selector value */ >> + __u8 invert; >> + __u64 min; >> + __u64 max; >> + __u64 mask; >> +}; >> + >> +#endif /*_XT_SKBUFF_H */ >> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig >> index fefa514..3a07a86 100644 >> --- a/net/netfilter/Kconfig >> +++ b/net/netfilter/Kconfig >> @@ -1093,6 +1093,15 @@ config NETFILTER_XT_MATCH_PKTTYPE >> >> To compile it as a module, choose M here. If unsure, say N. >> >> +config NETFILTER_XT_MATCH_SKBUFF >> + tristate '"skbuff" match support' >> + depends on NETFILTER_ADVANCED >> + help >> + This option adds a match based on the value of a chosen sk_buff >> + field. >> + >> + To compile it as a module, choose M here. If unsure, say N. >> + >> config NETFILTER_XT_MATCH_QUOTA >> tristate '"quota" match support' >> depends on NETFILTER_ADVANCED >> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile >> index 3259697..9bc95e0 100644 >> --- a/net/netfilter/Makefile >> +++ b/net/netfilter/Makefile >> @@ -129,6 +129,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_RATEEST) += xt_rateest.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_RECENT) += xt_recent.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_SCTP) += xt_sctp.o >> +obj-$(CONFIG_NETFILTER_XT_MATCH_SKBUFF) += xt_skbuff.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_SOCKET) += xt_socket.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_STATE) += xt_state.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_STATISTIC) += xt_statistic.o >> diff --git a/net/netfilter/xt_skbuff.c b/net/netfilter/xt_skbuff.c >> new file mode 100644 >> index 0000000..5ca30eb >> --- /dev/null >> +++ b/net/netfilter/xt_skbuff.c >> @@ -0,0 +1,141 @@ >> +/* Xtables module to match packets based on sk_buff fields. >> + * Copyright 2012 Google Inc. >> + * Written by Willem de Bruijn <willemb@google.com> >> + * >> + * This program is free software; you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License version 2 as >> + * published by the Free Software Foundation. >> + */ >> + >> +#include <linux/module.h> >> +#include <linux/skbuff.h> >> +#include <net/sock.h> >> + >> +#include <linux/netfilter/xt_skbuff.h> >> +#include <linux/netfilter/x_tables.h> >> + >> +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); >> +MODULE_DESCRIPTION("Xtables: skbuff match"); >> +MODULE_LICENSE("GPL"); >> +MODULE_ALIAS("ipt_priority"); >> +MODULE_ALIAS("ip6t_priority"); > > you'll have to remove this aliasing support. There was not previous > priority match in mainstream. > >> +static bool skbuff_mt(const struct sk_buff *skb, >> + struct xt_action_param *par) >> +{ >> + const struct xt_skbuff_info *info = par->matchinfo; >> + u64 value; >> + >> + switch (info->field_id) { >> + case skb_field_csum: >> + if (skb->ip_summed != CHECKSUM_COMPLETE) >> + return false; >> + value = skb->csum; >> + break; >> + case skb_field_hatype: >> + if (!skb->dev) >> + return false; >> + value = skb->dev->type; >> + break; >> + case skb_field_iif: >> + value = skb->skb_iif; >> + break; >> + case skb_field_len: >> + value = skb->len; >> + break; >> + case skb_field_mark: >> + value = skb->mark; >> + break; >> + case skb_field_pkt_type: >> + value = skb->pkt_type; >> + break; >> + case skb_field_priority: >> + value = skb->priority; >> + break; >> + case skb_field_protocol: >> + value = skb->protocol; >> + break; >> + case skb_field_queue_mapping: >> + value = skb->queue_mapping; >> + break; >> + case skb_field_rt_classid: >> +#ifdef CONFIG_NET_CLS_ROUTE >> + const struct dst_entry *dst; >> + >> + rcu_read_lock(); >> + dst = skb_dst(skb); >> + if (dst) >> + value = dst->tclassid; >> + rcu_read_unlock(); >> + if (!dst) >> + return false; >> + break; >> +#else >> + return false; >> +#endif >> + case skb_field_rxhash: >> + value = skb->rxhash; >> + break; >> + case skb_field_secmark: >> +#ifdef CONFIG_NETWORK_SECMARK >> + value = skb->secmark; >> + break; >> +#else >> + return false; >> +#endif >> + case skb_field_sk_uid: >> + if (!skb->sk || !skb->sk->sk_socket || !skb->sk->sk_socket->file) >> + return false; >> + value = skb->sk->sk_socket->file->f_cred->fsuid; >> + break; >> + case skb_field_sk_gid: >> + if (!skb->sk || !skb->sk->sk_socket || !skb->sk->sk_socket->file) >> + return false; >> + value = skb->sk->sk_socket->file->f_cred->fsgid; >> + break; >> + case skb_field_tstamp: >> + value = skb->tstamp.tv64; >> + break; >> + case skb_field_vlan_tci: >> + value = skb->vlan_tci; >> + break; >> + default: >> + return false; >> + } >> + >> + value &= info->mask; >> + return (value >= info->min && value <= info->max) ^ info->invert; >> +} >> + >> +static int skbuff_mt_check(const struct xt_mtchk_param *par) >> +{ >> + const struct xt_skbuff_info *info = par->matchinfo; >> + >> + if (info->field_id > skb_field_vlan_tci) > > Better define SKB_FIELD_MAX ? > >> + return -EINVAL; > > probably -EOPNOTSUPP is better in case we add some new skbuff field > that we support. > >> + >> + return 0; >> +} >> + >> +static struct xt_match skbuff_mt_reg __read_mostly = { >> + .name = "skbuff", >> + .revision = 0, >> + .family = NFPROTO_UNSPEC, >> + .match = skbuff_mt, >> + .matchsize = sizeof(struct xt_skbuff_info), >> + .checkentry = skbuff_mt_check, >> + .me = THIS_MODULE, >> +}; >> + >> +static int __init skbuff_mt_init(void) >> +{ >> + return xt_register_match(&skbuff_mt_reg); >> +} >> + >> +static void __exit skbuff_mt_exit(void) >> +{ >> + xt_unregister_match(&skbuff_mt_reg); >> +} >> + >> +module_init(skbuff_mt_init); >> +module_exit(skbuff_mt_exit); >> -- >> 1.7.7.3 >> ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH] netfilter: add xt_skbuff xtables match 2012-12-09 20:24 ` Willem de Bruijn @ 2012-12-09 20:28 ` Willem de Bruijn 0 siblings, 0 replies; 58+ messages in thread From: Willem de Bruijn @ 2012-12-09 20:28 UTC (permalink / raw) To: netfilter-devel, pablo, kaber; +Cc: Willem de Bruijn Add an iptables match based on skb fields, such as mark, priority, input interface and rxhash. The match supports range based matching on one field, with optional inversion and masking. v2: switches from xt_priority to xt_skbuff. Pablo, is this what you had in mind? It doesn't perfectly duplicate the values from nftables xt_meta. Needs more testing to cover the field-specific codepaths. Tested by inserting iptables -t mangle -A PREROUTING -s $SRC -j MARK --set-mark 10 iptables -A INPUT -m skbuff --min 10 --max 10 -j TRACE The userspace tool needs work, too. For one, I just hardcoded the field_id to be skb_field_mark for this test. That's why it's missing from the command line. Change-Id: Ib855409fb940acc9eccc87e2a550a875d9b19fd1 Signed-off-by: Willem de Bruijn <willemb@google.com> --- include/linux/netfilter/xt_skbuff.h | 36 +++++++++ net/netfilter/Kconfig | 9 ++ net/netfilter/Makefile | 1 + net/netfilter/xt_skbuff.c | 141 +++++++++++++++++++++++++++++++++++ 4 files changed, 187 insertions(+), 0 deletions(-) create mode 100644 include/linux/netfilter/xt_skbuff.h create mode 100644 net/netfilter/xt_skbuff.c diff --git a/include/linux/netfilter/xt_skbuff.h b/include/linux/netfilter/xt_skbuff.h new file mode 100644 index 0000000..1d7aa63 --- /dev/null +++ b/include/linux/netfilter/xt_skbuff.h @@ -0,0 +1,36 @@ +#ifndef _XT_SKBUFF_H +#define _XT_SKBUFF_H + +#include <linux/types.h> + +enum xt_skbuff_field_selector { + SKB_FIELD_CSUM = 0, + SKB_FIELD_HATYPE, + SKB_FIELD_IFINDEX, + SKB_FIELD_LEN, + SKB_FIELD_MARK, + SKB_FIELD_PKT_TYPE, + SKB_FIELD_PRIORITY, + SKB_FIELD_PROTOCOL, + SKB_FIELD_QUEUE_MAPPING, + SKB_FIELD_RT_CLASSID, + SKB_FIELD_RXHASH, + SKB_FIELD_SECMARK, + SKB_FIELD_SK_UID, + SKB_FIELD_SK_GID, + SKB_FIELD_TSTAMP, + SKB_FIELD_VLAN_TCI, + + /* must be the last element */ + SKB_FIELD_MAX, +}; + +struct xt_skbuff_info { + __u16 field_id; /* an xt_skbuff_field_selector value */ + __u8 invert; + __u64 min; + __u64 max; + __u64 mask; +}; + +#endif /*_XT_SKBUFF_H */ diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index fefa514..3a07a86 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -1093,6 +1093,15 @@ config NETFILTER_XT_MATCH_PKTTYPE To compile it as a module, choose M here. If unsure, say N. +config NETFILTER_XT_MATCH_SKBUFF + tristate '"skbuff" match support' + depends on NETFILTER_ADVANCED + help + This option adds a match based on the value of a chosen sk_buff + field. + + To compile it as a module, choose M here. If unsure, say N. + config NETFILTER_XT_MATCH_QUOTA tristate '"quota" match support' depends on NETFILTER_ADVANCED diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 3259697..9bc95e0 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -129,6 +129,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_RATEEST) += xt_rateest.o obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o obj-$(CONFIG_NETFILTER_XT_MATCH_RECENT) += xt_recent.o obj-$(CONFIG_NETFILTER_XT_MATCH_SCTP) += xt_sctp.o +obj-$(CONFIG_NETFILTER_XT_MATCH_SKBUFF) += xt_skbuff.o obj-$(CONFIG_NETFILTER_XT_MATCH_SOCKET) += xt_socket.o obj-$(CONFIG_NETFILTER_XT_MATCH_STATE) += xt_state.o obj-$(CONFIG_NETFILTER_XT_MATCH_STATISTIC) += xt_statistic.o diff --git a/net/netfilter/xt_skbuff.c b/net/netfilter/xt_skbuff.c new file mode 100644 index 0000000..bd9ce7c --- /dev/null +++ b/net/netfilter/xt_skbuff.c @@ -0,0 +1,141 @@ +/* Xtables module to match packets based on sk_buff fields. + * Copyright 2012 Google Inc. + * Written by Willem de Bruijn <willemb@google.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/module.h> +#include <linux/skbuff.h> +#include <net/sock.h> + +#include <linux/netfilter/xt_skbuff.h> +#include <linux/netfilter/x_tables.h> + +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); +MODULE_DESCRIPTION("Xtables: skbuff match"); +MODULE_LICENSE("GPL"); + +static bool skbuff_mt(const struct sk_buff *skb, + struct xt_action_param *par) +{ + const struct xt_skbuff_info *info = par->matchinfo; + u64 value; + + switch (info->field_id) { + case SKB_FIELD_CSUM: + if (skb->ip_summed != CHECKSUM_COMPLETE) + return false; + value = skb->csum; + break; + case SKB_FIELD_HATYPE: + if (!skb->dev) + return false; + value = skb->dev->type; + break; + case SKB_FIELD_IFINDEX: + if (!skb->dev) + return false; + value = skb->dev->ifindex; + break; + case SKB_FIELD_LEN: + value = skb->len; + break; + case SKB_FIELD_MARK: + value = skb->mark; + break; + case SKB_FIELD_PKT_TYPE: + value = skb->pkt_type; + break; + case SKB_FIELD_PRIORITY: + value = skb->priority; + break; + case SKB_FIELD_PROTOCOL: + value = skb->protocol; + break; + case SKB_FIELD_QUEUE_MAPPING: + value = skb->queue_mapping; + break; + case SKB_FIELD_RT_CLASSID: +#ifdef CONFIG_NET_CLS_ROUTE + const struct dst_entry *dst; + + rcu_read_lock(); + dst = skb_dst(skb); + if (dst) + value = dst->tclassid; + rcu_read_unlock(); + if (!dst) + return false; + break; +#else + return false; +#endif + case SKB_FIELD_RXHASH: + value = skb->rxhash; + break; + case SKB_FIELD_SECMARK: +#ifdef CONFIG_NETWORK_SECMARK + value = skb->secmark; + break; +#else + return false; +#endif + case SKB_FIELD_SK_UID: + if (!skb->sk || !skb->sk->sk_socket || !skb->sk->sk_socket->file) + return false; + value = skb->sk->sk_socket->file->f_cred->fsuid; + break; + case SKB_FIELD_SK_GID: + if (!skb->sk || !skb->sk->sk_socket || !skb->sk->sk_socket->file) + return false; + value = skb->sk->sk_socket->file->f_cred->fsgid; + break; + case SKB_FIELD_TSTAMP: + value = skb->tstamp.tv64; + break; + case SKB_FIELD_VLAN_TCI: + value = skb->vlan_tci; + break; + default: + return false; + } + + value &= info->mask; + return (value >= info->min && value <= info->max) ^ info->invert; +} + +static int skbuff_mt_check(const struct xt_mtchk_param *par) +{ + const struct xt_skbuff_info *info = par->matchinfo; + + if (info->field_id >= SKB_FIELD_MAX) + return -EOPNOTSUPP; + + return 0; +} + +static struct xt_match skbuff_mt_reg __read_mostly = { + .name = "skbuff", + .revision = 0, + .family = NFPROTO_UNSPEC, + .match = skbuff_mt, + .matchsize = sizeof(struct xt_skbuff_info), + .checkentry = skbuff_mt_check, + .me = THIS_MODULE, +}; + +static int __init skbuff_mt_init(void) +{ + return xt_register_match(&skbuff_mt_reg); +} + +static void __exit skbuff_mt_exit(void) +{ + xt_unregister_match(&skbuff_mt_reg); +} + +module_init(skbuff_mt_init); +module_exit(skbuff_mt_exit); -- 1.7.7.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 2/2] netfilter: add xt_bpf xtables match 2012-12-05 19:22 [PATCH rfc] netfilter: two xtables matches Willem de Bruijn 2012-12-05 19:22 ` [PATCH 1/2] netfilter: add xt_priority xtables match Willem de Bruijn @ 2012-12-05 19:22 ` Willem de Bruijn 2012-12-05 19:48 ` Pablo Neira Ayuso 2012-12-05 19:28 ` [PATCH rfc] netfilter: two xtables matches Willem de Bruijn 2 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2012-12-05 19:22 UTC (permalink / raw) To: netfilter-devel, netdev, edumazet, davem, kaber, pablo; +Cc: Willem de Bruijn A new match that executes sk_run_filter on every packet. BPF filters can access skbuff fields that are out of scope for existing iptables rules, allow more expressive logic, and on platforms with JIT support can even be faster. I have a corresponding iptables patch that takes `tcpdump -ddd` output, as used in the examples below. The two parts communicate using a variable length structure. This is similar to ebt_among, but new for iptables. Verified functionality by inserting an ip source filter on chain INPUT and an ip dest filter on chain OUTPUT and noting that ping failed while a rule was active: iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP Evaluated throughput by running netperf TCP_STREAM over loopback on x86_64. I expected the BPF filter to outperform hardcoded iptables filters when replacing multiple matches with a single bpf match, but even a single comparison to u32 appears to do better. Relative to the benchmark with no filter applied, rate with 100 BPF filters dropped to 81%. With 100 U32 filters it dropped to 55%. The difference sounds excessive to me, but was consistent on my hardware. Commands used: for i in `seq 100`; do iptables -A OUTPUT -m bpf --bytecode '4,48 0 0 9,21 0 1 20,6 0 0 96,6 0 0 0,' -j DROP; done for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done iptables -F OUTPUT for i in `seq 100`; do iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP; done for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done FYI: perf top [bpf] 33.94% [kernel] [k] copy_user_generic_string 8.92% [kernel] [k] sk_run_filter 7.77% [ip_tables] [k] ipt_do_table [u32] 22.63% [kernel] [k] copy_user_generic_string 14.46% [kernel] [k] memcpy 9.19% [ip_tables] [k] ipt_do_table 8.47% [xt_u32] [k] u32_mt 5.32% [kernel] [k] skb_copy_bits The big difference appears to be in memory copying. I have not looked into u32, so cannot explain this right now. More interestingly, at higher rate, sk_run_filter appears to use as many cycles as u32_mt (both traces have roughly the same number of events). One caveat: to work independent of device link layer, the filter expects DLT_RAW style BPF programs, i.e., those that expect the packet to start at the IP layer. --- include/linux/netfilter/xt_bpf.h | 17 +++++++ net/netfilter/Kconfig | 9 ++++ net/netfilter/Makefile | 1 + net/netfilter/x_tables.c | 5 +- net/netfilter/xt_bpf.c | 88 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 118 insertions(+), 2 deletions(-) create mode 100644 include/linux/netfilter/xt_bpf.h create mode 100644 net/netfilter/xt_bpf.c diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h new file mode 100644 index 0000000..23502c0 --- /dev/null +++ b/include/linux/netfilter/xt_bpf.h @@ -0,0 +1,17 @@ +#ifndef _XT_BPF_H +#define _XT_BPF_H + +#include <linux/filter.h> +#include <linux/types.h> + +struct xt_bpf_info { + __u16 bpf_program_num_elem; + + /* only used in kernel */ + struct sk_filter *filter __attribute__((aligned(8))); + + /* variable size, based on program_num_elem */ + struct sock_filter bpf_program[0]; +}; + +#endif /*_XT_BPF_H */ diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index c9739c6..c7cc0b8 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE If you want to compile it as a module, say M here and read <file:Documentation/kbuild/modules.txt>. If unsure, say `N'. +config NETFILTER_XT_MATCH_BPF + tristate '"bpf" match support' + depends on NETFILTER_ADVANCED + help + BPF matching applies a linux socket filter to each packet and + accepts those for which the filter returns non-zero. + + To compile it as a module, choose M here. If unsure, say N. + config NETFILTER_XT_MATCH_CLUSTER tristate '"cluster" match support' depends on NF_CONNTRACK diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 8e5602f..9f12eeb 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o # matches obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 8d987c3..26306be 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -379,8 +379,9 @@ int xt_check_match(struct xt_mtchk_param *par, if (XT_ALIGN(par->match->matchsize) != size && par->match->matchsize != -1) { /* - * ebt_among is exempt from centralized matchsize checking - * because it uses a dynamic-size data set. + * matches of variable size length, such as ebt_among, + * are exempt from centralized matchsize checking. They + * skip the test by setting xt_match.matchsize to -1. */ pr_err("%s_tables: %s.%u match: invalid size " "%u (kernel) != (user) %u\n", diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c new file mode 100644 index 0000000..07077c5 --- /dev/null +++ b/net/netfilter/xt_bpf.c @@ -0,0 +1,88 @@ +/* Xtables module to match packets using a BPF filter. + * Copyright 2012 Google Inc. + * Written by Willem de Bruijn <willemb@google.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/module.h> +#include <linux/skbuff.h> +#include <linux/ipv6.h> +#include <linux/filter.h> +#include <net/ip.h> + +#include <linux/netfilter/xt_bpf.h> +#include <linux/netfilter/x_tables.h> + +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); +MODULE_DESCRIPTION("Xtables: BPF filter match"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("ipt_bpf"); +MODULE_ALIAS("ip6t_bpf"); + +static int bpf_mt_check(const struct xt_mtchk_param *par) +{ + struct xt_bpf_info *info = par->matchinfo; + const struct xt_entry_match *match; + struct sock_fprog program; + int expected_len; + + match = container_of(par->matchinfo, const struct xt_entry_match, data); + expected_len = sizeof(struct xt_entry_match) + + sizeof(struct xt_bpf_info) + + (sizeof(struct sock_filter) * + info->bpf_program_num_elem); + + if (match->u.match_size != expected_len) { + pr_info("bpf: check failed: incorrect length\n"); + return -EINVAL; + } + + program.len = info->bpf_program_num_elem; + program.filter = info->bpf_program; + if (sk_unattached_filter_create(&info->filter, &program)) { + pr_info("bpf: check failed: parse error\n"); + return -EINVAL; + } + + return 0; +} + +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par) +{ + const struct xt_bpf_info *info = par->matchinfo; + + return SK_RUN_FILTER(info->filter, skb); +} + +static void bpf_mt_destroy(const struct xt_mtdtor_param *par) +{ + const struct xt_bpf_info *info = par->matchinfo; + sk_unattached_filter_destroy(info->filter); +} + +static struct xt_match bpf_mt_reg __read_mostly = { + .name = "bpf", + .revision = 0, + .family = NFPROTO_UNSPEC, + .checkentry = bpf_mt_check, + .match = bpf_mt, + .destroy = bpf_mt_destroy, + .matchsize = -1, /* skip xt_check_match because of dynamic len */ + .me = THIS_MODULE, +}; + +static int __init bpf_mt_init(void) +{ + return xt_register_match(&bpf_mt_reg); +} + +static void __exit bpf_mt_exit(void) +{ + xt_unregister_match(&bpf_mt_reg); +} + +module_init(bpf_mt_init); +module_exit(bpf_mt_exit); -- 1.7.7.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match 2012-12-05 19:22 ` [PATCH 2/2] netfilter: add xt_bpf " Willem de Bruijn @ 2012-12-05 19:48 ` Pablo Neira Ayuso 2012-12-05 20:10 ` Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Pablo Neira Ayuso @ 2012-12-05 19:48 UTC (permalink / raw) To: Willem de Bruijn; +Cc: netfilter-devel, netdev, edumazet, davem, kaber Hi Willem, On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote: > A new match that executes sk_run_filter on every packet. BPF filters > can access skbuff fields that are out of scope for existing iptables > rules, allow more expressive logic, and on platforms with JIT support > can even be faster. > > I have a corresponding iptables patch that takes `tcpdump -ddd` > output, as used in the examples below. The two parts communicate > using a variable length structure. This is similar to ebt_among, > but new for iptables. > > Verified functionality by inserting an ip source filter on chain > INPUT and an ip dest filter on chain OUTPUT and noting that ping > failed while a rule was active: > > iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP > iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP I like this BPF idea for iptables. I made a similar extension time ago, but it was taking a file as parameter. That file contained in BPF code. I made a simple bison parser that takes BPF code and put it into the bpf array of instructions. It would be a bit more intuitive to define a filter and we can distribute it with iptables. Let me check on my internal trees, I can put that user-space code somewhere in case you're interested. > Evaluated throughput by running netperf TCP_STREAM over loopback on > x86_64. I expected the BPF filter to outperform hardcoded iptables > filters when replacing multiple matches with a single bpf match, but > even a single comparison to u32 appears to do better. Relative to the > benchmark with no filter applied, rate with 100 BPF filters dropped > to 81%. With 100 U32 filters it dropped to 55%. The difference sounds > excessive to me, but was consistent on my hardware. Commands used: > > for i in `seq 100`; do iptables -A OUTPUT -m bpf --bytecode '4,48 0 0 9,21 0 1 20,6 0 0 96,6 0 0 0,' -j DROP; done > for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done > > iptables -F OUTPUT > > for i in `seq 100`; do iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP; done > for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done > > FYI: perf top > > [bpf] > 33.94% [kernel] [k] copy_user_generic_string > 8.92% [kernel] [k] sk_run_filter > 7.77% [ip_tables] [k] ipt_do_table > > [u32] > 22.63% [kernel] [k] copy_user_generic_string > 14.46% [kernel] [k] memcpy > 9.19% [ip_tables] [k] ipt_do_table > 8.47% [xt_u32] [k] u32_mt > 5.32% [kernel] [k] skb_copy_bits > > The big difference appears to be in memory copying. I have not > looked into u32, so cannot explain this right now. More interestingly, > at higher rate, sk_run_filter appears to use as many cycles as u32_mt > (both traces have roughly the same number of events). > > One caveat: to work independent of device link layer, the filter > expects DLT_RAW style BPF programs, i.e., those that expect the > packet to start at the IP layer. > --- > include/linux/netfilter/xt_bpf.h | 17 +++++++ > net/netfilter/Kconfig | 9 ++++ > net/netfilter/Makefile | 1 + > net/netfilter/x_tables.c | 5 +- > net/netfilter/xt_bpf.c | 88 ++++++++++++++++++++++++++++++++++++++ > 5 files changed, 118 insertions(+), 2 deletions(-) > create mode 100644 include/linux/netfilter/xt_bpf.h > create mode 100644 net/netfilter/xt_bpf.c > > diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h > new file mode 100644 > index 0000000..23502c0 > --- /dev/null > +++ b/include/linux/netfilter/xt_bpf.h > @@ -0,0 +1,17 @@ > +#ifndef _XT_BPF_H > +#define _XT_BPF_H > + > +#include <linux/filter.h> > +#include <linux/types.h> > + > +struct xt_bpf_info { > + __u16 bpf_program_num_elem; > + > + /* only used in kernel */ > + struct sk_filter *filter __attribute__((aligned(8))); > + > + /* variable size, based on program_num_elem */ > + struct sock_filter bpf_program[0]; > +}; > + > +#endif /*_XT_BPF_H */ > diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig > index c9739c6..c7cc0b8 100644 > --- a/net/netfilter/Kconfig > +++ b/net/netfilter/Kconfig > @@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE > If you want to compile it as a module, say M here and read > <file:Documentation/kbuild/modules.txt>. If unsure, say `N'. > > +config NETFILTER_XT_MATCH_BPF > + tristate '"bpf" match support' > + depends on NETFILTER_ADVANCED > + help > + BPF matching applies a linux socket filter to each packet and > + accepts those for which the filter returns non-zero. > + > + To compile it as a module, choose M here. If unsure, say N. > + > config NETFILTER_XT_MATCH_CLUSTER > tristate '"cluster" match support' > depends on NF_CONNTRACK > diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile > index 8e5602f..9f12eeb 100644 > --- a/net/netfilter/Makefile > +++ b/net/netfilter/Makefile > @@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o > > # matches > obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o > +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o > obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o > obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o > obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > index 8d987c3..26306be 100644 > --- a/net/netfilter/x_tables.c > +++ b/net/netfilter/x_tables.c > @@ -379,8 +379,9 @@ int xt_check_match(struct xt_mtchk_param *par, > if (XT_ALIGN(par->match->matchsize) != size && > par->match->matchsize != -1) { > /* > - * ebt_among is exempt from centralized matchsize checking > - * because it uses a dynamic-size data set. > + * matches of variable size length, such as ebt_among, > + * are exempt from centralized matchsize checking. They > + * skip the test by setting xt_match.matchsize to -1. > */ > pr_err("%s_tables: %s.%u match: invalid size " > "%u (kernel) != (user) %u\n", > diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c > new file mode 100644 > index 0000000..07077c5 > --- /dev/null > +++ b/net/netfilter/xt_bpf.c > @@ -0,0 +1,88 @@ > +/* Xtables module to match packets using a BPF filter. > + * Copyright 2012 Google Inc. > + * Written by Willem de Bruijn <willemb@google.com> > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + */ > + > +#include <linux/module.h> > +#include <linux/skbuff.h> > +#include <linux/ipv6.h> > +#include <linux/filter.h> > +#include <net/ip.h> > + > +#include <linux/netfilter/xt_bpf.h> > +#include <linux/netfilter/x_tables.h> > + > +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); > +MODULE_DESCRIPTION("Xtables: BPF filter match"); > +MODULE_LICENSE("GPL"); > +MODULE_ALIAS("ipt_bpf"); > +MODULE_ALIAS("ip6t_bpf"); > + > +static int bpf_mt_check(const struct xt_mtchk_param *par) > +{ > + struct xt_bpf_info *info = par->matchinfo; > + const struct xt_entry_match *match; > + struct sock_fprog program; > + int expected_len; > + > + match = container_of(par->matchinfo, const struct xt_entry_match, data); > + expected_len = sizeof(struct xt_entry_match) + > + sizeof(struct xt_bpf_info) + > + (sizeof(struct sock_filter) * > + info->bpf_program_num_elem); > + > + if (match->u.match_size != expected_len) { > + pr_info("bpf: check failed: incorrect length\n"); > + return -EINVAL; > + } > + > + program.len = info->bpf_program_num_elem; > + program.filter = info->bpf_program; > + if (sk_unattached_filter_create(&info->filter, &program)) { > + pr_info("bpf: check failed: parse error\n"); > + return -EINVAL; > + } > + > + return 0; > +} > + > +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par) > +{ > + const struct xt_bpf_info *info = par->matchinfo; > + > + return SK_RUN_FILTER(info->filter, skb); > +} > + > +static void bpf_mt_destroy(const struct xt_mtdtor_param *par) > +{ > + const struct xt_bpf_info *info = par->matchinfo; > + sk_unattached_filter_destroy(info->filter); > +} > + > +static struct xt_match bpf_mt_reg __read_mostly = { > + .name = "bpf", > + .revision = 0, > + .family = NFPROTO_UNSPEC, > + .checkentry = bpf_mt_check, > + .match = bpf_mt, > + .destroy = bpf_mt_destroy, > + .matchsize = -1, /* skip xt_check_match because of dynamic len */ > + .me = THIS_MODULE, > +}; > + > +static int __init bpf_mt_init(void) > +{ > + return xt_register_match(&bpf_mt_reg); > +} > + > +static void __exit bpf_mt_exit(void) > +{ > + xt_unregister_match(&bpf_mt_reg); > +} > + > +module_init(bpf_mt_init); > +module_exit(bpf_mt_exit); > -- > 1.7.7.3 > ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match 2012-12-05 19:48 ` Pablo Neira Ayuso @ 2012-12-05 20:10 ` Willem de Bruijn 2012-12-07 13:16 ` Pablo Neira Ayuso 0 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2012-12-05 20:10 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber On Wed, Dec 5, 2012 at 2:48 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > Hi Willem, > > On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote: >> A new match that executes sk_run_filter on every packet. BPF filters >> can access skbuff fields that are out of scope for existing iptables >> rules, allow more expressive logic, and on platforms with JIT support >> can even be faster. >> >> I have a corresponding iptables patch that takes `tcpdump -ddd` >> output, as used in the examples below. The two parts communicate >> using a variable length structure. This is similar to ebt_among, >> but new for iptables. >> >> Verified functionality by inserting an ip source filter on chain >> INPUT and an ip dest filter on chain OUTPUT and noting that ping >> failed while a rule was active: >> >> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP >> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP > > I like this BPF idea for iptables. > > I made a similar extension time ago, but it was taking a file as > parameter. That file contained in BPF code. I made a simple bison > parser that takes BPF code and put it into the bpf array of > instructions. It would be a bit more intuitive to define a filter and > we can distribute it with iptables. That's cleaner, indeed. I actually like how tcpdump operates as a code generator if you pass -ddd. Unfortunately, it generates code only for link layer types of its supported devices, such as DLT_EN10MB and DLT_LINUX_SLL. The network layer interface of basic iptables (forgetting device dependent mechanisms as used in xt_mac) is DLT_RAW, but that is rarely supported. > Let me check on my internal trees, I can put that user-space code > somewhere in case you're interested. Absolutely. I'll be happy to revise to get it in. I'm also considering sending a patch to tcpdump to make it generate code independent of the installed hardware when specifying -y. >> Evaluated throughput by running netperf TCP_STREAM over loopback on >> x86_64. I expected the BPF filter to outperform hardcoded iptables >> filters when replacing multiple matches with a single bpf match, but >> even a single comparison to u32 appears to do better. Relative to the >> benchmark with no filter applied, rate with 100 BPF filters dropped >> to 81%. With 100 U32 filters it dropped to 55%. The difference sounds >> excessive to me, but was consistent on my hardware. Commands used: >> >> for i in `seq 100`; do iptables -A OUTPUT -m bpf --bytecode '4,48 0 0 9,21 0 1 20,6 0 0 96,6 0 0 0,' -j DROP; done >> for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done >> >> iptables -F OUTPUT >> >> for i in `seq 100`; do iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP; done >> for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done >> >> FYI: perf top >> >> [bpf] >> 33.94% [kernel] [k] copy_user_generic_string >> 8.92% [kernel] [k] sk_run_filter >> 7.77% [ip_tables] [k] ipt_do_table >> >> [u32] >> 22.63% [kernel] [k] copy_user_generic_string >> 14.46% [kernel] [k] memcpy >> 9.19% [ip_tables] [k] ipt_do_table >> 8.47% [xt_u32] [k] u32_mt >> 5.32% [kernel] [k] skb_copy_bits >> >> The big difference appears to be in memory copying. I have not >> looked into u32, so cannot explain this right now. More interestingly, >> at higher rate, sk_run_filter appears to use as many cycles as u32_mt >> (both traces have roughly the same number of events). >> >> One caveat: to work independent of device link layer, the filter >> expects DLT_RAW style BPF programs, i.e., those that expect the >> packet to start at the IP layer. >> --- >> include/linux/netfilter/xt_bpf.h | 17 +++++++ >> net/netfilter/Kconfig | 9 ++++ >> net/netfilter/Makefile | 1 + >> net/netfilter/x_tables.c | 5 +- >> net/netfilter/xt_bpf.c | 88 ++++++++++++++++++++++++++++++++++++++ >> 5 files changed, 118 insertions(+), 2 deletions(-) >> create mode 100644 include/linux/netfilter/xt_bpf.h >> create mode 100644 net/netfilter/xt_bpf.c >> >> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h >> new file mode 100644 >> index 0000000..23502c0 >> --- /dev/null >> +++ b/include/linux/netfilter/xt_bpf.h >> @@ -0,0 +1,17 @@ >> +#ifndef _XT_BPF_H >> +#define _XT_BPF_H >> + >> +#include <linux/filter.h> >> +#include <linux/types.h> >> + >> +struct xt_bpf_info { >> + __u16 bpf_program_num_elem; >> + >> + /* only used in kernel */ >> + struct sk_filter *filter __attribute__((aligned(8))); >> + >> + /* variable size, based on program_num_elem */ >> + struct sock_filter bpf_program[0]; >> +}; >> + >> +#endif /*_XT_BPF_H */ >> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig >> index c9739c6..c7cc0b8 100644 >> --- a/net/netfilter/Kconfig >> +++ b/net/netfilter/Kconfig >> @@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE >> If you want to compile it as a module, say M here and read >> <file:Documentation/kbuild/modules.txt>. If unsure, say `N'. >> >> +config NETFILTER_XT_MATCH_BPF >> + tristate '"bpf" match support' >> + depends on NETFILTER_ADVANCED >> + help >> + BPF matching applies a linux socket filter to each packet and >> + accepts those for which the filter returns non-zero. >> + >> + To compile it as a module, choose M here. If unsure, say N. >> + >> config NETFILTER_XT_MATCH_CLUSTER >> tristate '"cluster" match support' >> depends on NF_CONNTRACK >> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile >> index 8e5602f..9f12eeb 100644 >> --- a/net/netfilter/Makefile >> +++ b/net/netfilter/Makefile >> @@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o >> >> # matches >> obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o >> +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o >> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c >> index 8d987c3..26306be 100644 >> --- a/net/netfilter/x_tables.c >> +++ b/net/netfilter/x_tables.c >> @@ -379,8 +379,9 @@ int xt_check_match(struct xt_mtchk_param *par, >> if (XT_ALIGN(par->match->matchsize) != size && >> par->match->matchsize != -1) { >> /* >> - * ebt_among is exempt from centralized matchsize checking >> - * because it uses a dynamic-size data set. >> + * matches of variable size length, such as ebt_among, >> + * are exempt from centralized matchsize checking. They >> + * skip the test by setting xt_match.matchsize to -1. >> */ >> pr_err("%s_tables: %s.%u match: invalid size " >> "%u (kernel) != (user) %u\n", >> diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c >> new file mode 100644 >> index 0000000..07077c5 >> --- /dev/null >> +++ b/net/netfilter/xt_bpf.c >> @@ -0,0 +1,88 @@ >> +/* Xtables module to match packets using a BPF filter. >> + * Copyright 2012 Google Inc. >> + * Written by Willem de Bruijn <willemb@google.com> >> + * >> + * This program is free software; you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License version 2 as >> + * published by the Free Software Foundation. >> + */ >> + >> +#include <linux/module.h> >> +#include <linux/skbuff.h> >> +#include <linux/ipv6.h> >> +#include <linux/filter.h> >> +#include <net/ip.h> >> + >> +#include <linux/netfilter/xt_bpf.h> >> +#include <linux/netfilter/x_tables.h> >> + >> +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); >> +MODULE_DESCRIPTION("Xtables: BPF filter match"); >> +MODULE_LICENSE("GPL"); >> +MODULE_ALIAS("ipt_bpf"); >> +MODULE_ALIAS("ip6t_bpf"); >> + >> +static int bpf_mt_check(const struct xt_mtchk_param *par) >> +{ >> + struct xt_bpf_info *info = par->matchinfo; >> + const struct xt_entry_match *match; >> + struct sock_fprog program; >> + int expected_len; >> + >> + match = container_of(par->matchinfo, const struct xt_entry_match, data); >> + expected_len = sizeof(struct xt_entry_match) + >> + sizeof(struct xt_bpf_info) + >> + (sizeof(struct sock_filter) * >> + info->bpf_program_num_elem); >> + >> + if (match->u.match_size != expected_len) { >> + pr_info("bpf: check failed: incorrect length\n"); >> + return -EINVAL; >> + } >> + >> + program.len = info->bpf_program_num_elem; >> + program.filter = info->bpf_program; >> + if (sk_unattached_filter_create(&info->filter, &program)) { >> + pr_info("bpf: check failed: parse error\n"); >> + return -EINVAL; >> + } >> + >> + return 0; >> +} >> + >> +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par) >> +{ >> + const struct xt_bpf_info *info = par->matchinfo; >> + >> + return SK_RUN_FILTER(info->filter, skb); >> +} >> + >> +static void bpf_mt_destroy(const struct xt_mtdtor_param *par) >> +{ >> + const struct xt_bpf_info *info = par->matchinfo; >> + sk_unattached_filter_destroy(info->filter); >> +} >> + >> +static struct xt_match bpf_mt_reg __read_mostly = { >> + .name = "bpf", >> + .revision = 0, >> + .family = NFPROTO_UNSPEC, >> + .checkentry = bpf_mt_check, >> + .match = bpf_mt, >> + .destroy = bpf_mt_destroy, >> + .matchsize = -1, /* skip xt_check_match because of dynamic len */ >> + .me = THIS_MODULE, >> +}; >> + >> +static int __init bpf_mt_init(void) >> +{ >> + return xt_register_match(&bpf_mt_reg); >> +} >> + >> +static void __exit bpf_mt_exit(void) >> +{ >> + xt_unregister_match(&bpf_mt_reg); >> +} >> + >> +module_init(bpf_mt_init); >> +module_exit(bpf_mt_exit); >> -- >> 1.7.7.3 >> ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match 2012-12-05 20:10 ` Willem de Bruijn @ 2012-12-07 13:16 ` Pablo Neira Ayuso 2012-12-07 16:56 ` Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Pablo Neira Ayuso @ 2012-12-07 13:16 UTC (permalink / raw) To: Willem de Bruijn Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber On Wed, Dec 05, 2012 at 03:10:13PM -0500, Willem de Bruijn wrote: > On Wed, Dec 5, 2012 at 2:48 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > Hi Willem, > > > > On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote: > >> A new match that executes sk_run_filter on every packet. BPF filters > >> can access skbuff fields that are out of scope for existing iptables > >> rules, allow more expressive logic, and on platforms with JIT support > >> can even be faster. > >> > >> I have a corresponding iptables patch that takes `tcpdump -ddd` > >> output, as used in the examples below. The two parts communicate > >> using a variable length structure. This is similar to ebt_among, > >> but new for iptables. > >> > >> Verified functionality by inserting an ip source filter on chain > >> INPUT and an ip dest filter on chain OUTPUT and noting that ping > >> failed while a rule was active: > >> > >> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP > >> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP > > > > I like this BPF idea for iptables. > > > > I made a similar extension time ago, but it was taking a file as > > parameter. That file contained in BPF code. I made a simple bison > > parser that takes BPF code and put it into the bpf array of > > instructions. It would be a bit more intuitive to define a filter and > > we can distribute it with iptables. > > That's cleaner, indeed. I actually like how tcpdump operates as a > code generator if you pass -ddd. Unfortunately, it generates code only > for link layer types of its supported devices, such as DLT_EN10MB and > DLT_LINUX_SLL. The network layer interface of basic iptables > (forgetting device dependent mechanisms as used in xt_mac) is DLT_RAW, > but that is rarely supported. Indeed, you'll have to hack on tcpdump to select the offset. In iptables the base is the layer 3 header. With that change you could use tcpdump for generate code automagically from their syntax. > > Let me check on my internal trees, I can put that user-space code > > somewhere in case you're interested. > > Absolutely. I'll be happy to revise to get it in. I'm also considering > sending a patch to tcpdump to make it generate code independent of the > installed hardware when specifying -y. I found a version of the old parser code I made: http://1984.lsi.us.es/git/nfbpf/ It interprets a filter expressed in a similar way to tcpdump -dd but it's using the BPF constants. It's quite preliminary and simple if you look at the code. Extending it to interpret some syntax similar to tcpdump -d would even make more readable the BPF filter. Time ago I also thought about taking the kernel code that checks that the filter is correct. Currently you get -EINVAL if you pass a handcrafted filter which is incorrect, so it's hard task to debug what you made wrong. It could be added to the iptables tree. Or if generic enough for BPF and the effort is worth, just provide some small library that iptables can link with and a small compiler/checker to help people develop BPF filters. Back to your xt_bpf thing, we can use the file containing the code instead: iptables -v -A INPUT -m bpf --bytecode-file filter1.bpf -j DROP iptables -v -A OUTPUT -m bpf --bytecode-file filter2.bpf -j DROP We can still allow the inlined filter via --bytecode if you want. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match 2012-12-07 13:16 ` Pablo Neira Ayuso @ 2012-12-07 16:56 ` Willem de Bruijn 2012-12-08 3:31 ` Pablo Neira Ayuso 0 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2012-12-07 16:56 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber On Fri, Dec 7, 2012 at 8:16 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Wed, Dec 05, 2012 at 03:10:13PM -0500, Willem de Bruijn wrote: >> On Wed, Dec 5, 2012 at 2:48 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: >> > Hi Willem, >> > >> > On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote: >> >> A new match that executes sk_run_filter on every packet. BPF filters >> >> can access skbuff fields that are out of scope for existing iptables >> >> rules, allow more expressive logic, and on platforms with JIT support >> >> can even be faster. >> >> >> >> I have a corresponding iptables patch that takes `tcpdump -ddd` >> >> output, as used in the examples below. The two parts communicate >> >> using a variable length structure. This is similar to ebt_among, >> >> but new for iptables. >> >> >> >> Verified functionality by inserting an ip source filter on chain >> >> INPUT and an ip dest filter on chain OUTPUT and noting that ping >> >> failed while a rule was active: >> >> >> >> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP >> >> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP >> > >> > I like this BPF idea for iptables. >> > >> > I made a similar extension time ago, but it was taking a file as >> > parameter. That file contained in BPF code. I made a simple bison >> > parser that takes BPF code and put it into the bpf array of >> > instructions. It would be a bit more intuitive to define a filter and >> > we can distribute it with iptables. >> >> That's cleaner, indeed. I actually like how tcpdump operates as a >> code generator if you pass -ddd. Unfortunately, it generates code only >> for link layer types of its supported devices, such as DLT_EN10MB and >> DLT_LINUX_SLL. The network layer interface of basic iptables >> (forgetting device dependent mechanisms as used in xt_mac) is DLT_RAW, >> but that is rarely supported. > > Indeed, you'll have to hack on tcpdump to select the offset. In > iptables the base is the layer 3 header. With that change you could > use tcpdump for generate code automagically from their syntax. > >> > Let me check on my internal trees, I can put that user-space code >> > somewhere in case you're interested. >> >> Absolutely. I'll be happy to revise to get it in. I'm also considering >> sending a patch to tcpdump to make it generate code independent of the >> installed hardware when specifying -y. > > I found a version of the old parser code I made: > > http://1984.lsi.us.es/git/nfbpf/ > > It interprets a filter expressed in a similar way to tcpdump -dd but > it's using the BPF constants. It's quite preliminary and simple if you > look at the code. > > Extending it to interpret some syntax similar to tcpdump -d would even > make more readable the BPF filter. > > Time ago I also thought about taking the kernel code that checks that > the filter is correct. Currently you get -EINVAL if you pass a > handcrafted filter which is incorrect, so it's hard task to debug what > you made wrong. > > It could be added to the iptables tree. Or if generic enough for BPF > and the effort is worth, just provide some small library that iptables > can link with and a small compiler/checker to help people develop BPF > filters. Or use pcap_compile? I went with the tcpdump output to avoid introducing a direct dependency on pcap to iptables. One possible downside I see to pcap_compile vs. developing from scratch is that it might lag in supporting the LSF ancillary data fields. > Back to your xt_bpf thing, we can use the file containing the code > instead: > > iptables -v -A INPUT -m bpf --bytecode-file filter1.bpf -j DROP > iptables -v -A OUTPUT -m bpf --bytecode-file filter2.bpf -j DROP > > We can still allow the inlined filter via --bytecode if you want. I'll add that. I'd like to keep --bytecode to able to generate the code inline using backticks. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match 2012-12-07 16:56 ` Willem de Bruijn @ 2012-12-08 3:31 ` Pablo Neira Ayuso 2012-12-08 16:02 ` Daniel Borkmann 2012-12-09 21:52 ` [PATCH next] iptables: add xt_bpf match Willem de Bruijn 0 siblings, 2 replies; 58+ messages in thread From: Pablo Neira Ayuso @ 2012-12-08 3:31 UTC (permalink / raw) To: Willem de Bruijn Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber On Fri, Dec 07, 2012 at 11:56:05AM -0500, Willem de Bruijn wrote: > On Fri, Dec 7, 2012 at 8:16 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > On Wed, Dec 05, 2012 at 03:10:13PM -0500, Willem de Bruijn wrote: > >> On Wed, Dec 5, 2012 at 2:48 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > >> > Hi Willem, > >> > > >> > On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote: > >> >> A new match that executes sk_run_filter on every packet. BPF filters > >> >> can access skbuff fields that are out of scope for existing iptables > >> >> rules, allow more expressive logic, and on platforms with JIT support > >> >> can even be faster. > >> >> > >> >> I have a corresponding iptables patch that takes `tcpdump -ddd` > >> >> output, as used in the examples below. The two parts communicate > >> >> using a variable length structure. This is similar to ebt_among, > >> >> but new for iptables. > >> >> > >> >> Verified functionality by inserting an ip source filter on chain > >> >> INPUT and an ip dest filter on chain OUTPUT and noting that ping > >> >> failed while a rule was active: > >> >> > >> >> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP > >> >> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP > >> > > >> > I like this BPF idea for iptables. > >> > > >> > I made a similar extension time ago, but it was taking a file as > >> > parameter. That file contained in BPF code. I made a simple bison > >> > parser that takes BPF code and put it into the bpf array of > >> > instructions. It would be a bit more intuitive to define a filter and > >> > we can distribute it with iptables. > >> > >> That's cleaner, indeed. I actually like how tcpdump operates as a > >> code generator if you pass -ddd. Unfortunately, it generates code only > >> for link layer types of its supported devices, such as DLT_EN10MB and > >> DLT_LINUX_SLL. The network layer interface of basic iptables > >> (forgetting device dependent mechanisms as used in xt_mac) is DLT_RAW, > >> but that is rarely supported. > > > > Indeed, you'll have to hack on tcpdump to select the offset. In > > iptables the base is the layer 3 header. With that change you could > > use tcpdump for generate code automagically from their syntax. > > > >> > Let me check on my internal trees, I can put that user-space code > >> > somewhere in case you're interested. > >> > >> Absolutely. I'll be happy to revise to get it in. I'm also considering > >> sending a patch to tcpdump to make it generate code independent of the > >> installed hardware when specifying -y. > > > > I found a version of the old parser code I made: > > > > http://1984.lsi.us.es/git/nfbpf/ > > > > It interprets a filter expressed in a similar way to tcpdump -dd but > > it's using the BPF constants. It's quite preliminary and simple if you > > look at the code. > > > > Extending it to interpret some syntax similar to tcpdump -d would even > > make more readable the BPF filter. > > > > Time ago I also thought about taking the kernel code that checks that > > the filter is correct. Currently you get -EINVAL if you pass a > > handcrafted filter which is incorrect, so it's hard task to debug what > > you made wrong. > > > > It could be added to the iptables tree. Or if generic enough for BPF > > and the effort is worth, just provide some small library that iptables > > can link with and a small compiler/checker to help people develop BPF > > filters. > > Or use pcap_compile? I went with the tcpdump output to avoid > introducing a direct dependency on pcap to iptables. One possible > downside I see to pcap_compile vs. developing from scratch is that it > might lag in supporting the LSF ancillary data fields. I suggest to put the code of that preliminary nfbpf utility into iptables to allow to read the BPF filters from a file and put them into the BPF array of instructions. I can help with that. > > Back to your xt_bpf thing, we can use the file containing the code > > instead: > > > > iptables -v -A INPUT -m bpf --bytecode-file filter1.bpf -j DROP > > iptables -v -A OUTPUT -m bpf --bytecode-file filter2.bpf -j DROP > > > > We can still allow the inlined filter via --bytecode if you want. > > I'll add that. I'd like to keep --bytecode to able to generate the > code inline using backticks. As said, I'm fine with that, but I'll be really happy if we can provide some utility to generate that code using backticks for the masses (in case they want to pass it inlined in that format). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match 2012-12-08 3:31 ` Pablo Neira Ayuso @ 2012-12-08 16:02 ` Daniel Borkmann 2012-12-09 21:52 ` [PATCH next] iptables: add xt_bpf match Willem de Bruijn 1 sibling, 0 replies; 58+ messages in thread From: Daniel Borkmann @ 2012-12-08 16:02 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: Willem de Bruijn, netfilter-devel, netdev, Eric Dumazet, David Miller, kaber On Sat, Dec 8, 2012 at 4:31 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Fri, Dec 07, 2012 at 11:56:05AM -0500, Willem de Bruijn wrote: >> On Fri, Dec 7, 2012 at 8:16 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: >> > On Wed, Dec 05, 2012 at 03:10:13PM -0500, Willem de Bruijn wrote: >> >> On Wed, Dec 5, 2012 at 2:48 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: >> >> > Hi Willem, >> >> > >> >> > On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote: >> >> >> A new match that executes sk_run_filter on every packet. BPF filters >> >> >> can access skbuff fields that are out of scope for existing iptables >> >> >> rules, allow more expressive logic, and on platforms with JIT support >> >> >> can even be faster. >> >> >> >> >> >> I have a corresponding iptables patch that takes `tcpdump -ddd` >> >> >> output, as used in the examples below. The two parts communicate >> >> >> using a variable length structure. This is similar to ebt_among, >> >> >> but new for iptables. >> >> >> >> >> >> Verified functionality by inserting an ip source filter on chain >> >> >> INPUT and an ip dest filter on chain OUTPUT and noting that ping >> >> >> failed while a rule was active: >> >> >> >> >> >> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP >> >> >> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP >> >> > >> >> > I like this BPF idea for iptables. >> >> > >> >> > I made a similar extension time ago, but it was taking a file as >> >> > parameter. That file contained in BPF code. I made a simple bison >> >> > parser that takes BPF code and put it into the bpf array of >> >> > instructions. It would be a bit more intuitive to define a filter and >> >> > we can distribute it with iptables. >> >> >> >> That's cleaner, indeed. I actually like how tcpdump operates as a >> >> code generator if you pass -ddd. Unfortunately, it generates code only >> >> for link layer types of its supported devices, such as DLT_EN10MB and >> >> DLT_LINUX_SLL. The network layer interface of basic iptables >> >> (forgetting device dependent mechanisms as used in xt_mac) is DLT_RAW, >> >> but that is rarely supported. >> > >> > Indeed, you'll have to hack on tcpdump to select the offset. In >> > iptables the base is the layer 3 header. With that change you could >> > use tcpdump for generate code automagically from their syntax. >> > >> >> > Let me check on my internal trees, I can put that user-space code >> >> > somewhere in case you're interested. >> >> >> >> Absolutely. I'll be happy to revise to get it in. I'm also considering >> >> sending a patch to tcpdump to make it generate code independent of the >> >> installed hardware when specifying -y. >> > >> > I found a version of the old parser code I made: >> > >> > http://1984.lsi.us.es/git/nfbpf/ >> > >> > It interprets a filter expressed in a similar way to tcpdump -dd but >> > it's using the BPF constants. It's quite preliminary and simple if you >> > look at the code. >> > >> > Extending it to interpret some syntax similar to tcpdump -d would even >> > make more readable the BPF filter. >> > >> > Time ago I also thought about taking the kernel code that checks that >> > the filter is correct. Currently you get -EINVAL if you pass a >> > handcrafted filter which is incorrect, so it's hard task to debug what >> > you made wrong. >> > >> > It could be added to the iptables tree. Or if generic enough for BPF >> > and the effort is worth, just provide some small library that iptables >> > can link with and a small compiler/checker to help people develop BPF >> > filters. >> >> Or use pcap_compile? I went with the tcpdump output to avoid >> introducing a direct dependency on pcap to iptables. One possible >> downside I see to pcap_compile vs. developing from scratch is that it >> might lag in supporting the LSF ancillary data fields. > > I suggest to put the code of that preliminary nfbpf utility into > iptables to allow to read the BPF filters from a file and put them > into the BPF array of instructions. I can help with that. > >> > Back to your xt_bpf thing, we can use the file containing the code >> > instead: >> > >> > iptables -v -A INPUT -m bpf --bytecode-file filter1.bpf -j DROP >> > iptables -v -A OUTPUT -m bpf --bytecode-file filter2.bpf -j DROP >> > >> > We can still allow the inlined filter via --bytecode if you want. >> >> I'll add that. I'd like to keep --bytecode to able to generate the >> code inline using backticks. > > As said, I'm fine with that, but I'll be really happy if we can > provide some utility to generate that code using backticks for the > masses (in case they want to pass it inlined in that format). If it helps, you could use "bpfc", or rip-off its code to not have a dependency; it's part of the netsniff-ng toolkit. It can be used like: bpfc examples/bpfc/arp.bpf { 0x28, 0, 0, 0x0000000c }, { 0x15, 0, 1, 0x00000806 }, { 0x6, 0, 0, 0xffffffff }, { 0x6, 0, 0, 0x00000000 }, where arp.bpf is, for instance: _main: ldh [12] jeq #0x806, keep, drop keep: ret #0xffffffff drop: ret #0 "Core" files are: src/bpf_lexer.l, src/bpf_parser.y It also supports all Linux ANC-operations that were added to the kernel (like VLAN, XOR and so on). I started but didn't have time to continue a higher-level language for that, that would translate to such an example above (which then translates again to opcodes). ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH next] iptables: add xt_bpf match 2012-12-08 3:31 ` Pablo Neira Ayuso 2012-12-08 16:02 ` Daniel Borkmann @ 2012-12-09 21:52 ` Willem de Bruijn 2013-01-08 3:21 ` Pablo Neira Ayuso 1 sibling, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2012-12-09 21:52 UTC (permalink / raw) To: netfilter-devel, pablo; +Cc: Willem de Bruijn Support arbitrary linux socket filter (BPF) programs as iptables match rules. This allows for very expressive filters, and on platforms with BPF JIT appears competitive with traditional hardcoded iptables rules. At least, on an x86_64 that achieves 40K netperf TCP_STREAM without any iptables rules (40 GBps), inserting 100x this bpf rule gives 28K ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',') inserting 100x this u32 rule gives 21K ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP The two are logically equivalent, as far as I can tell. Let me know if my test methodology is flawed in some way. Even in cases where slower, the filter adds functionality currently lacking in iptables, such as access to sk_buff fields like rxhash and queue_mapping. Signed-off-by: Willem de Bruijn <willemb@google.com> --- include/linux/netfilter/xt_bpf.h | 17 +++++++ net/netfilter/Kconfig | 9 ++++ net/netfilter/Makefile | 1 + net/netfilter/x_tables.c | 5 +- net/netfilter/xt_bpf.c | 86 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 116 insertions(+), 2 deletions(-) create mode 100644 include/linux/netfilter/xt_bpf.h create mode 100644 net/netfilter/xt_bpf.c diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h new file mode 100644 index 0000000..23502c0 --- /dev/null +++ b/include/linux/netfilter/xt_bpf.h @@ -0,0 +1,17 @@ +#ifndef _XT_BPF_H +#define _XT_BPF_H + +#include <linux/filter.h> +#include <linux/types.h> + +struct xt_bpf_info { + __u16 bpf_program_num_elem; + + /* only used in kernel */ + struct sk_filter *filter __attribute__((aligned(8))); + + /* variable size, based on program_num_elem */ + struct sock_filter bpf_program[0]; +}; + +#endif /*_XT_BPF_H */ diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index fefa514..d45720f 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE If you want to compile it as a module, say M here and read <file:Documentation/kbuild/modules.txt>. If unsure, say `N'. +config NETFILTER_XT_MATCH_BPF + tristate '"bpf" match support' + depends on NETFILTER_ADVANCED + help + BPF matching applies a linux socket filter to each packet and + accepts those for which the filter returns non-zero. + + To compile it as a module, choose M here. If unsure, say N. + config NETFILTER_XT_MATCH_CLUSTER tristate '"cluster" match support' depends on NF_CONNTRACK diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 3259697..6d6194525 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o # matches obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 8d987c3..26306be 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -379,8 +379,9 @@ int xt_check_match(struct xt_mtchk_param *par, if (XT_ALIGN(par->match->matchsize) != size && par->match->matchsize != -1) { /* - * ebt_among is exempt from centralized matchsize checking - * because it uses a dynamic-size data set. + * matches of variable size length, such as ebt_among, + * are exempt from centralized matchsize checking. They + * skip the test by setting xt_match.matchsize to -1. */ pr_err("%s_tables: %s.%u match: invalid size " "%u (kernel) != (user) %u\n", diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c new file mode 100644 index 0000000..312cb90 --- /dev/null +++ b/net/netfilter/xt_bpf.c @@ -0,0 +1,86 @@ +/* Xtables module to match packets using a BPF filter. + * Copyright 2012 Google Inc. + * Written by Willem de Bruijn <willemb@google.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/module.h> +#include <linux/skbuff.h> +#include <linux/ipv6.h> +#include <linux/filter.h> +#include <net/ip.h> + +#include <linux/netfilter/xt_bpf.h> +#include <linux/netfilter/x_tables.h> + +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); +MODULE_DESCRIPTION("Xtables: BPF filter match"); +MODULE_LICENSE("GPL"); + +static int bpf_mt_check(const struct xt_mtchk_param *par) +{ + struct xt_bpf_info *info = par->matchinfo; + const struct xt_entry_match *match; + struct sock_fprog program; + int expected_len; + + match = container_of(par->matchinfo, const struct xt_entry_match, data); + expected_len = sizeof(struct xt_entry_match) + + sizeof(struct xt_bpf_info) + + (sizeof(struct sock_filter) * + info->bpf_program_num_elem); + + if (match->u.match_size != expected_len) { + pr_info("bpf: check failed: incorrect length\n"); + return -EINVAL; + } + + program.len = info->bpf_program_num_elem; + program.filter = info->bpf_program; + if (sk_unattached_filter_create(&info->filter, &program)) { + pr_info("bpf: check failed: parse error\n"); + return -EINVAL; + } + + return 0; +} + +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par) +{ + const struct xt_bpf_info *info = par->matchinfo; + + return SK_RUN_FILTER(info->filter, skb); +} + +static void bpf_mt_destroy(const struct xt_mtdtor_param *par) +{ + const struct xt_bpf_info *info = par->matchinfo; + sk_unattached_filter_destroy(info->filter); +} + +static struct xt_match bpf_mt_reg __read_mostly = { + .name = "bpf", + .revision = 0, + .family = NFPROTO_UNSPEC, + .checkentry = bpf_mt_check, + .match = bpf_mt, + .destroy = bpf_mt_destroy, + .matchsize = -1, /* skip xt_check_match because of dynamic len */ + .me = THIS_MODULE, +}; + +static int __init bpf_mt_init(void) +{ + return xt_register_match(&bpf_mt_reg); +} + +static void __exit bpf_mt_exit(void) +{ + xt_unregister_match(&bpf_mt_reg); +} + +module_init(bpf_mt_init); +module_exit(bpf_mt_exit); -- 1.7.7.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH next] iptables: add xt_bpf match 2012-12-09 21:52 ` [PATCH next] iptables: add xt_bpf match Willem de Bruijn @ 2013-01-08 3:21 ` Pablo Neira Ayuso 2013-01-09 1:58 ` Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Pablo Neira Ayuso @ 2013-01-08 3:21 UTC (permalink / raw) To: Willem de Bruijn; +Cc: netfilter-devel Hi Willem, On Sun, Dec 09, 2012 at 04:52:58PM -0500, Willem de Bruijn wrote: > Support arbitrary linux socket filter (BPF) programs as iptables > match rules. This allows for very expressive filters, and on > platforms with BPF JIT appears competitive with traditional hardcoded > iptables rules. > > At least, on an x86_64 that achieves 40K netperf TCP_STREAM without > any iptables rules (40 GBps), > > inserting 100x this bpf rule gives 28K > > ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j > > (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',') > > inserting 100x this u32 rule gives 21K > > ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP > > The two are logically equivalent, as far as I can tell. Let me know > if my test methodology is flawed in some way. Even in cases where > slower, the filter adds functionality currently lacking in iptables, > such as access to sk_buff fields like rxhash and queue_mapping. > > Signed-off-by: Willem de Bruijn <willemb@google.com> > --- > include/linux/netfilter/xt_bpf.h | 17 +++++++ > net/netfilter/Kconfig | 9 ++++ > net/netfilter/Makefile | 1 + > net/netfilter/x_tables.c | 5 +- > net/netfilter/xt_bpf.c | 86 ++++++++++++++++++++++++++++++++++++++ > 5 files changed, 116 insertions(+), 2 deletions(-) > create mode 100644 include/linux/netfilter/xt_bpf.h > create mode 100644 net/netfilter/xt_bpf.c > > diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h > new file mode 100644 > index 0000000..23502c0 > --- /dev/null > +++ b/include/linux/netfilter/xt_bpf.h > @@ -0,0 +1,17 @@ > +#ifndef _XT_BPF_H > +#define _XT_BPF_H > + > +#include <linux/filter.h> > +#include <linux/types.h> > + > +struct xt_bpf_info { > + __u16 bpf_program_num_elem; > + > + /* only used in kernel */ > + struct sk_filter *filter __attribute__((aligned(8))); I see. You set match->userspacesize to zero in libxt_bpf to skip the comparison of that internal struct sk_filter *filter. > + > + /* variable size, based on program_num_elem */ > + struct sock_filter bpf_program[0]; While testing this I noticed: iptables -I OUTPUT -m bpf --bytecode \ '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT Note that this works but it should not. iptables -D OUTPUT -m bpf --bytecode \ '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,1 0 0 0' -j ACCEPT ^ Mind that 1, it's a different filter, but it deletes the previous filter without problems here. A quick look at make_delete_mask() in iptables tells me that the changes you made to userspace to allow variable size matches are not enough to generate a sane mask (which is fundamental while looking for a matching rule during the deletion). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next] iptables: add xt_bpf match 2013-01-08 3:21 ` Pablo Neira Ayuso @ 2013-01-09 1:58 ` Willem de Bruijn 2013-01-09 9:52 ` Pablo Neira Ayuso 0 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2013-01-09 1:58 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: netfilter-devel On Mon, Jan 7, 2013 at 10:21 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > Hi Willem, > > On Sun, Dec 09, 2012 at 04:52:58PM -0500, Willem de Bruijn wrote: >> Support arbitrary linux socket filter (BPF) programs as iptables >> match rules. This allows for very expressive filters, and on >> platforms with BPF JIT appears competitive with traditional hardcoded >> iptables rules. >> >> At least, on an x86_64 that achieves 40K netperf TCP_STREAM without >> any iptables rules (40 GBps), >> >> inserting 100x this bpf rule gives 28K >> >> ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j >> >> (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',') >> >> inserting 100x this u32 rule gives 21K >> >> ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP >> >> The two are logically equivalent, as far as I can tell. Let me know >> if my test methodology is flawed in some way. Even in cases where >> slower, the filter adds functionality currently lacking in iptables, >> such as access to sk_buff fields like rxhash and queue_mapping. >> >> Signed-off-by: Willem de Bruijn <willemb@google.com> >> --- >> include/linux/netfilter/xt_bpf.h | 17 +++++++ >> net/netfilter/Kconfig | 9 ++++ >> net/netfilter/Makefile | 1 + >> net/netfilter/x_tables.c | 5 +- >> net/netfilter/xt_bpf.c | 86 ++++++++++++++++++++++++++++++++++++++ >> 5 files changed, 116 insertions(+), 2 deletions(-) >> create mode 100644 include/linux/netfilter/xt_bpf.h >> create mode 100644 net/netfilter/xt_bpf.c >> >> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h >> new file mode 100644 >> index 0000000..23502c0 >> --- /dev/null >> +++ b/include/linux/netfilter/xt_bpf.h >> @@ -0,0 +1,17 @@ >> +#ifndef _XT_BPF_H >> +#define _XT_BPF_H >> + >> +#include <linux/filter.h> >> +#include <linux/types.h> >> + >> +struct xt_bpf_info { >> + __u16 bpf_program_num_elem; >> + >> + /* only used in kernel */ >> + struct sk_filter *filter __attribute__((aligned(8))); > > I see. You set match->userspacesize to zero in libxt_bpf to skip the > comparison of that internal struct sk_filter *filter. > >> + >> + /* variable size, based on program_num_elem */ >> + struct sock_filter bpf_program[0]; > > While testing this I noticed: > > iptables -I OUTPUT -m bpf --bytecode \ > '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT > > Note that this works but it should not. > > iptables -D OUTPUT -m bpf --bytecode \ > '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,1 0 0 0' -j ACCEPT > ^ > Mind that 1, it's a different filter, but it deletes the previous > filter without problems here. > > A quick look at make_delete_mask() in iptables tells me that the > changes you made to userspace to allow variable size matches are not > enough to generate a sane mask (which is fundamental while looking for > a matching rule during the deletion). Thanks for finding this, Pablo. I completely forgot to check that. I've never looked at that deletion code before. Will read it and hopefully propose a simple fix in a few days. An earlier version of the patch used a statically sized struct, by the way, like xt_string does (XT_STRING_MAX_PATTERN_SIZE). If it is easier to incorporate, we can always revert to that. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next] iptables: add xt_bpf match 2013-01-09 1:58 ` Willem de Bruijn @ 2013-01-09 9:52 ` Pablo Neira Ayuso 2013-01-10 0:08 ` Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Pablo Neira Ayuso @ 2013-01-09 9:52 UTC (permalink / raw) To: Willem de Bruijn; +Cc: netfilter-devel Hi Willem, On Tue, Jan 08, 2013 at 08:58:37PM -0500, Willem de Bruijn wrote: > On Mon, Jan 7, 2013 at 10:21 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > Hi Willem, > > > > On Sun, Dec 09, 2012 at 04:52:58PM -0500, Willem de Bruijn wrote: > >> Support arbitrary linux socket filter (BPF) programs as iptables > >> match rules. This allows for very expressive filters, and on > >> platforms with BPF JIT appears competitive with traditional hardcoded > >> iptables rules. > >> > >> At least, on an x86_64 that achieves 40K netperf TCP_STREAM without > >> any iptables rules (40 GBps), > >> > >> inserting 100x this bpf rule gives 28K > >> > >> ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j > >> > >> (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',') > >> > >> inserting 100x this u32 rule gives 21K > >> > >> ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP > >> > >> The two are logically equivalent, as far as I can tell. Let me know > >> if my test methodology is flawed in some way. Even in cases where > >> slower, the filter adds functionality currently lacking in iptables, > >> such as access to sk_buff fields like rxhash and queue_mapping. > >> > >> Signed-off-by: Willem de Bruijn <willemb@google.com> > >> --- > >> include/linux/netfilter/xt_bpf.h | 17 +++++++ > >> net/netfilter/Kconfig | 9 ++++ > >> net/netfilter/Makefile | 1 + > >> net/netfilter/x_tables.c | 5 +- > >> net/netfilter/xt_bpf.c | 86 ++++++++++++++++++++++++++++++++++++++ > >> 5 files changed, 116 insertions(+), 2 deletions(-) > >> create mode 100644 include/linux/netfilter/xt_bpf.h > >> create mode 100644 net/netfilter/xt_bpf.c > >> > >> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h > >> new file mode 100644 > >> index 0000000..23502c0 > >> --- /dev/null > >> +++ b/include/linux/netfilter/xt_bpf.h > >> @@ -0,0 +1,17 @@ > >> +#ifndef _XT_BPF_H > >> +#define _XT_BPF_H > >> + > >> +#include <linux/filter.h> > >> +#include <linux/types.h> > >> + > >> +struct xt_bpf_info { > >> + __u16 bpf_program_num_elem; > >> + > >> + /* only used in kernel */ > >> + struct sk_filter *filter __attribute__((aligned(8))); > > > > I see. You set match->userspacesize to zero in libxt_bpf to skip the > > comparison of that internal struct sk_filter *filter. > > > >> + > >> + /* variable size, based on program_num_elem */ > >> + struct sock_filter bpf_program[0]; > > > > While testing this I noticed: > > > > iptables -I OUTPUT -m bpf --bytecode \ > > '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT > > > > Note that this works but it should not. > > > > iptables -D OUTPUT -m bpf --bytecode \ > > '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,1 0 0 0' -j ACCEPT > > ^ > > Mind that 1, it's a different filter, but it deletes the previous > > filter without problems here. > > > > A quick look at make_delete_mask() in iptables tells me that the > > changes you made to userspace to allow variable size matches are not > > enough to generate a sane mask (which is fundamental while looking for > > a matching rule during the deletion). > > Thanks for finding this, Pablo. I completely forgot to check that. > > I've never looked at that deletion code before. Will read it and > hopefully propose a simple fix in a few days. An earlier version of > the patch used a statically sized struct, by the way, like xt_string > does (XT_STRING_MAX_PATTERN_SIZE). If it is easier to > incorporate, we can always revert to that. I prefer if this sticks to static size by now. The problem is that BPF_MAXINSNS is probably too much to allocate per rule. So you'll have to limit this to some reasonable amount of lines in the filter. Please, also check that iptables-save and iptables-restore work fine, there is also some problem with the existing code. Thanks. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next] iptables: add xt_bpf match 2013-01-09 9:52 ` Pablo Neira Ayuso @ 2013-01-10 0:08 ` Willem de Bruijn 2013-01-10 0:08 ` [PATCH next v2] " Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2013-01-10 0:08 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: netfilter-devel On Wed, Jan 9, 2013 at 4:52 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > Hi Willem, > > On Tue, Jan 08, 2013 at 08:58:37PM -0500, Willem de Bruijn wrote: >> On Mon, Jan 7, 2013 at 10:21 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: >> > Hi Willem, >> > >> > On Sun, Dec 09, 2012 at 04:52:58PM -0500, Willem de Bruijn wrote: >> >> Support arbitrary linux socket filter (BPF) programs as iptables >> >> match rules. This allows for very expressive filters, and on >> >> platforms with BPF JIT appears competitive with traditional hardcoded >> >> iptables rules. >> >> >> >> At least, on an x86_64 that achieves 40K netperf TCP_STREAM without >> >> any iptables rules (40 GBps), >> >> >> >> inserting 100x this bpf rule gives 28K >> >> >> >> ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j >> >> >> >> (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',') >> >> >> >> inserting 100x this u32 rule gives 21K >> >> >> >> ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP >> >> >> >> The two are logically equivalent, as far as I can tell. Let me know >> >> if my test methodology is flawed in some way. Even in cases where >> >> slower, the filter adds functionality currently lacking in iptables, >> >> such as access to sk_buff fields like rxhash and queue_mapping. >> >> >> >> Signed-off-by: Willem de Bruijn <willemb@google.com> >> >> --- >> >> include/linux/netfilter/xt_bpf.h | 17 +++++++ >> >> net/netfilter/Kconfig | 9 ++++ >> >> net/netfilter/Makefile | 1 + >> >> net/netfilter/x_tables.c | 5 +- >> >> net/netfilter/xt_bpf.c | 86 ++++++++++++++++++++++++++++++++++++++ >> >> 5 files changed, 116 insertions(+), 2 deletions(-) >> >> create mode 100644 include/linux/netfilter/xt_bpf.h >> >> create mode 100644 net/netfilter/xt_bpf.c >> >> >> >> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h >> >> new file mode 100644 >> >> index 0000000..23502c0 >> >> --- /dev/null >> >> +++ b/include/linux/netfilter/xt_bpf.h >> >> @@ -0,0 +1,17 @@ >> >> +#ifndef _XT_BPF_H >> >> +#define _XT_BPF_H >> >> + >> >> +#include <linux/filter.h> >> >> +#include <linux/types.h> >> >> + >> >> +struct xt_bpf_info { >> >> + __u16 bpf_program_num_elem; >> >> + >> >> + /* only used in kernel */ >> >> + struct sk_filter *filter __attribute__((aligned(8))); >> > >> > I see. You set match->userspacesize to zero in libxt_bpf to skip the >> > comparison of that internal struct sk_filter *filter. >> > >> >> + >> >> + /* variable size, based on program_num_elem */ >> >> + struct sock_filter bpf_program[0]; >> > >> > While testing this I noticed: >> > >> > iptables -I OUTPUT -m bpf --bytecode \ >> > '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT >> > >> > Note that this works but it should not. >> > >> > iptables -D OUTPUT -m bpf --bytecode \ >> > '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,1 0 0 0' -j ACCEPT >> > ^ >> > Mind that 1, it's a different filter, but it deletes the previous >> > filter without problems here. >> > >> > A quick look at make_delete_mask() in iptables tells me that the >> > changes you made to userspace to allow variable size matches are not >> > enough to generate a sane mask (which is fundamental while looking for >> > a matching rule during the deletion). >> >> Thanks for finding this, Pablo. I completely forgot to check that. >> >> I've never looked at that deletion code before. Will read it and >> hopefully propose a simple fix in a few days. An earlier version of >> the patch used a statically sized struct, by the way, like xt_string >> does (XT_STRING_MAX_PATTERN_SIZE). If it is easier to >> incorporate, we can always revert to that. > > I prefer if this sticks to static size by now. Okay. That is actually a lot simpler. > The problem is that > BPF_MAXINSNS is probably too much to allocate per rule. So you'll have > to limit this to some reasonable amount of lines in the filter. > Please, also check that iptables-save and iptables-restore work fine, > there is also some problem with the existing code. Done. I'll send updated patches right after this. Verified that they work using ## test append # fail: more rules than num_rules ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,6 0 0 0' -j ACCEPT # fail: fewer rules than num_rules ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96' -j ACCEPT # pass: correct ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT ## test delete # fail: differs ./iptables -D OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 1' -j ACCEPT # pass: same ./iptables -D OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT ## test save/restore ./iptables-save > out && cat out && ./iptables-restore < out && echo "OK" I did not retest the datapath for this revision. ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH next v2] iptables: add xt_bpf match 2013-01-10 0:08 ` Willem de Bruijn @ 2013-01-10 0:08 ` Willem de Bruijn 2013-01-10 0:15 ` [PATCH next v3] " Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2013-01-10 0:08 UTC (permalink / raw) To: netfilter-devel, pablo; +Cc: Willem de Bruijn Changes: - v2->v1: use a fixed size match structure to communicate between kernel and userspace. Support arbitrary linux socket filter (BPF) programs as iptables match rules. This allows for very expressive filters, and on platforms with BPF JIT appears competitive with traditional hardcoded iptables rules. At least, on an x86_64 that achieves 40K netperf TCP_STREAM without any iptables rules (40 GBps), inserting 100x this bpf rule gives 28K ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',') inserting 100x this u32 rule gives 21K ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP The two are logically equivalent, as far as I can tell. Let me know if my test methodology is flawed in some way. Even in cases where slower, the filter adds functionality currently lacking in iptables, such as access to sk_buff fields like rxhash and queue_mapping. --- Gconfig.xt_bpf | 1 + include/uapi/linux/netfilter/xt_bpf.h | 17 ++++++++ net/netfilter/Kconfig | 9 ++++ net/netfilter/Makefile | 1 + net/netfilter/x_tables.c | 5 +- net/netfilter/xt_bpf.c | 73 +++++++++++++++++++++++++++++++++ 6 files changed, 104 insertions(+), 2 deletions(-) create mode 100644 Gconfig.xt_bpf create mode 100644 include/uapi/linux/netfilter/xt_bpf.h create mode 100644 net/netfilter/xt_bpf.c diff --git a/Gconfig.xt_bpf b/Gconfig.xt_bpf new file mode 100644 index 0000000..dd51452 --- /dev/null +++ b/Gconfig.xt_bpf @@ -0,0 +1 @@ +CONFIG_NETFILTER_XT_MATCH_BPF=m diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h new file mode 100644 index 0000000..5dda450 --- /dev/null +++ b/include/uapi/linux/netfilter/xt_bpf.h @@ -0,0 +1,17 @@ +#ifndef _XT_BPF_H +#define _XT_BPF_H + +#include <linux/filter.h> +#include <linux/types.h> + +#define XT_BPF_MAX_NUM_INSTR 64 + +struct xt_bpf_info { + __u16 bpf_program_num_elem; + struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR]; + + /* only used in the kernel */ + struct sk_filter *filter __attribute__((aligned(8))); +}; + +#endif /*_XT_BPF_H */ diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index fefa514..d45720f 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE If you want to compile it as a module, say M here and read <file:Documentation/kbuild/modules.txt>. If unsure, say `N'. +config NETFILTER_XT_MATCH_BPF + tristate '"bpf" match support' + depends on NETFILTER_ADVANCED + help + BPF matching applies a linux socket filter to each packet and + accepts those for which the filter returns non-zero. + + To compile it as a module, choose M here. If unsure, say N. + config NETFILTER_XT_MATCH_CLUSTER tristate '"cluster" match support' depends on NF_CONNTRACK diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 3259697..6d6194525 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o # matches obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 8d987c3..26306be 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -379,8 +379,9 @@ int xt_check_match(struct xt_mtchk_param *par, if (XT_ALIGN(par->match->matchsize) != size && par->match->matchsize != -1) { /* - * ebt_among is exempt from centralized matchsize checking - * because it uses a dynamic-size data set. + * matches of variable size length, such as ebt_among, + * are exempt from centralized matchsize checking. They + * skip the test by setting xt_match.matchsize to -1. */ pr_err("%s_tables: %s.%u match: invalid size " "%u (kernel) != (user) %u\n", diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c new file mode 100644 index 0000000..1bdfab8 --- /dev/null +++ b/net/netfilter/xt_bpf.c @@ -0,0 +1,73 @@ +/* Xtables module to match packets using a BPF filter. + * Copyright 2013 Google Inc. + * Written by Willem de Bruijn <willemb@google.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/module.h> +#include <linux/skbuff.h> +#include <linux/ipv6.h> +#include <linux/filter.h> +#include <net/ip.h> + +#include <linux/netfilter/xt_bpf.h> +#include <linux/netfilter/x_tables.h> + +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); +MODULE_DESCRIPTION("Xtables: BPF filter match"); +MODULE_LICENSE("GPL"); + +static int bpf_mt_check(const struct xt_mtchk_param *par) +{ + struct xt_bpf_info *info = par->matchinfo; + struct sock_fprog program; + + program.len = info->bpf_program_num_elem; + program.filter = info->bpf_program; + if (sk_unattached_filter_create(&info->filter, &program)) { + pr_info("bpf: check failed: parse error\n"); + return -EINVAL; + } + + return 0; +} + +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par) +{ + const struct xt_bpf_info *info = par->matchinfo; + + return SK_RUN_FILTER(info->filter, skb); +} + +static void bpf_mt_destroy(const struct xt_mtdtor_param *par) +{ + const struct xt_bpf_info *info = par->matchinfo; + sk_unattached_filter_destroy(info->filter); +} + +static struct xt_match bpf_mt_reg __read_mostly = { + .name = "bpf", + .revision = 0, + .family = NFPROTO_UNSPEC, + .checkentry = bpf_mt_check, + .match = bpf_mt, + .destroy = bpf_mt_destroy, + .matchsize = sizeof(struct xt_bpf_info), + .me = THIS_MODULE, +}; + +static int __init bpf_mt_init(void) +{ + return xt_register_match(&bpf_mt_reg); +} + +static void __exit bpf_mt_exit(void) +{ + xt_unregister_match(&bpf_mt_reg); +} + +module_init(bpf_mt_init); +module_exit(bpf_mt_exit); -- 1.7.7.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH next v3] iptables: add xt_bpf match 2013-01-10 0:08 ` [PATCH next v2] " Willem de Bruijn @ 2013-01-10 0:15 ` Willem de Bruijn 2013-01-17 23:53 ` Pablo Neira Ayuso 0 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2013-01-10 0:15 UTC (permalink / raw) To: netfilter-devel, pablo; +Cc: Willem de Bruijn Changes: - v3: reverted no longer needed changes to x_tables.c - v2: use a fixed size match structure to communicate between kernel and userspace. Support arbitrary linux socket filter (BPF) programs as iptables match rules. This allows for very expressive filters, and on platforms with BPF JIT appears competitive with traditional hardcoded iptables rules. At least, on an x86_64 that achieves 40K netperf TCP_STREAM without any iptables rules (40 GBps), inserting 100x this bpf rule gives 28K ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',') inserting 100x this u32 rule gives 21K ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP The two are logically equivalent, as far as I can tell. Let me know if my test methodology is flawed in some way. Even in cases where slower, the filter adds functionality currently lacking in iptables, such as access to sk_buff fields like rxhash and queue_mapping. --- include/uapi/linux/netfilter/xt_bpf.h | 17 ++++++++ net/netfilter/Kconfig | 9 ++++ net/netfilter/Makefile | 1 + net/netfilter/xt_bpf.c | 73 +++++++++++++++++++++++++++++++++ 4 files changed, 100 insertions(+), 0 deletions(-) create mode 100644 include/uapi/linux/netfilter/xt_bpf.h create mode 100644 net/netfilter/xt_bpf.c diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h new file mode 100644 index 0000000..5dda450 --- /dev/null +++ b/include/uapi/linux/netfilter/xt_bpf.h @@ -0,0 +1,17 @@ +#ifndef _XT_BPF_H +#define _XT_BPF_H + +#include <linux/filter.h> +#include <linux/types.h> + +#define XT_BPF_MAX_NUM_INSTR 64 + +struct xt_bpf_info { + __u16 bpf_program_num_elem; + struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR]; + + /* only used in the kernel */ + struct sk_filter *filter __attribute__((aligned(8))); +}; + +#endif /*_XT_BPF_H */ diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index fefa514..d45720f 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE If you want to compile it as a module, say M here and read <file:Documentation/kbuild/modules.txt>. If unsure, say `N'. +config NETFILTER_XT_MATCH_BPF + tristate '"bpf" match support' + depends on NETFILTER_ADVANCED + help + BPF matching applies a linux socket filter to each packet and + accepts those for which the filter returns non-zero. + + To compile it as a module, choose M here. If unsure, say N. + config NETFILTER_XT_MATCH_CLUSTER tristate '"cluster" match support' depends on NF_CONNTRACK diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 3259697..6d6194525 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o # matches obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c new file mode 100644 index 0000000..1bdfab8 --- /dev/null +++ b/net/netfilter/xt_bpf.c @@ -0,0 +1,73 @@ +/* Xtables module to match packets using a BPF filter. + * Copyright 2013 Google Inc. + * Written by Willem de Bruijn <willemb@google.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/module.h> +#include <linux/skbuff.h> +#include <linux/ipv6.h> +#include <linux/filter.h> +#include <net/ip.h> + +#include <linux/netfilter/xt_bpf.h> +#include <linux/netfilter/x_tables.h> + +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); +MODULE_DESCRIPTION("Xtables: BPF filter match"); +MODULE_LICENSE("GPL"); + +static int bpf_mt_check(const struct xt_mtchk_param *par) +{ + struct xt_bpf_info *info = par->matchinfo; + struct sock_fprog program; + + program.len = info->bpf_program_num_elem; + program.filter = info->bpf_program; + if (sk_unattached_filter_create(&info->filter, &program)) { + pr_info("bpf: check failed: parse error\n"); + return -EINVAL; + } + + return 0; +} + +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par) +{ + const struct xt_bpf_info *info = par->matchinfo; + + return SK_RUN_FILTER(info->filter, skb); +} + +static void bpf_mt_destroy(const struct xt_mtdtor_param *par) +{ + const struct xt_bpf_info *info = par->matchinfo; + sk_unattached_filter_destroy(info->filter); +} + +static struct xt_match bpf_mt_reg __read_mostly = { + .name = "bpf", + .revision = 0, + .family = NFPROTO_UNSPEC, + .checkentry = bpf_mt_check, + .match = bpf_mt, + .destroy = bpf_mt_destroy, + .matchsize = sizeof(struct xt_bpf_info), + .me = THIS_MODULE, +}; + +static int __init bpf_mt_init(void) +{ + return xt_register_match(&bpf_mt_reg); +} + +static void __exit bpf_mt_exit(void) +{ + xt_unregister_match(&bpf_mt_reg); +} + +module_init(bpf_mt_init); +module_exit(bpf_mt_exit); -- 1.7.7.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-01-10 0:15 ` [PATCH next v3] " Willem de Bruijn @ 2013-01-17 23:53 ` Pablo Neira Ayuso 2013-01-18 16:48 ` Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Pablo Neira Ayuso @ 2013-01-17 23:53 UTC (permalink / raw) To: Willem de Bruijn; +Cc: netfilter-devel Hi Willem, On Wed, Jan 09, 2013 at 07:15:44PM -0500, Willem de Bruijn wrote: > Changes: > - v3: reverted no longer needed changes to x_tables.c > - v2: use a fixed size match structure to communicate between > kernel and userspace. > > Support arbitrary linux socket filter (BPF) programs as iptables > match rules. This allows for very expressive filters, and on > platforms with BPF JIT appears competitive with traditional hardcoded > iptables rules. > > At least, on an x86_64 that achieves 40K netperf TCP_STREAM without > any iptables rules (40 GBps), > > inserting 100x this bpf rule gives 28K > > ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j > > (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',') That code generated by tcpdump will not work. tcpdump generates BPF code assuming that offset 0 is the link layer header, while iptables considers that offset 0 is the network layer. More comments below: > inserting 100x this u32 rule gives 21K > > ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP > > The two are logically equivalent, as far as I can tell. Let me know > if my test methodology is flawed in some way. Even in cases where > slower, the filter adds functionality currently lacking in iptables, > such as access to sk_buff fields like rxhash and queue_mapping. > --- > include/uapi/linux/netfilter/xt_bpf.h | 17 ++++++++ > net/netfilter/Kconfig | 9 ++++ > net/netfilter/Makefile | 1 + > net/netfilter/xt_bpf.c | 73 +++++++++++++++++++++++++++++++++ > 4 files changed, 100 insertions(+), 0 deletions(-) > create mode 100644 include/uapi/linux/netfilter/xt_bpf.h > create mode 100644 net/netfilter/xt_bpf.c > > diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h > new file mode 100644 > index 0000000..5dda450 > --- /dev/null > +++ b/include/uapi/linux/netfilter/xt_bpf.h > @@ -0,0 +1,17 @@ > +#ifndef _XT_BPF_H > +#define _XT_BPF_H > + > +#include <linux/filter.h> > +#include <linux/types.h> > + > +#define XT_BPF_MAX_NUM_INSTR 64 > + > +struct xt_bpf_info { > + __u16 bpf_program_num_elem; > + struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR]; > + > + /* only used in the kernel */ > + struct sk_filter *filter __attribute__((aligned(8))); > +}; > + > +#endif /*_XT_BPF_H */ > diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig > index fefa514..d45720f 100644 > --- a/net/netfilter/Kconfig > +++ b/net/netfilter/Kconfig > @@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE > If you want to compile it as a module, say M here and read > <file:Documentation/kbuild/modules.txt>. If unsure, say `N'. > > +config NETFILTER_XT_MATCH_BPF > + tristate '"bpf" match support' > + depends on NETFILTER_ADVANCED > + help > + BPF matching applies a linux socket filter to each packet and > + accepts those for which the filter returns non-zero. > + > + To compile it as a module, choose M here. If unsure, say N. > + > config NETFILTER_XT_MATCH_CLUSTER > tristate '"cluster" match support' > depends on NF_CONNTRACK > diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile > index 3259697..6d6194525 100644 > --- a/net/netfilter/Makefile > +++ b/net/netfilter/Makefile > @@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o > > # matches > obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o > +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o > obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o > obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o > obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o > diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c > new file mode 100644 > index 0000000..1bdfab8 > --- /dev/null > +++ b/net/netfilter/xt_bpf.c > @@ -0,0 +1,73 @@ > +/* Xtables module to match packets using a BPF filter. > + * Copyright 2013 Google Inc. > + * Written by Willem de Bruijn <willemb@google.com> > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + */ > + > +#include <linux/module.h> > +#include <linux/skbuff.h> > +#include <linux/ipv6.h> > +#include <linux/filter.h> > +#include <net/ip.h> > + > +#include <linux/netfilter/xt_bpf.h> > +#include <linux/netfilter/x_tables.h> > + > +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); > +MODULE_DESCRIPTION("Xtables: BPF filter match"); > +MODULE_LICENSE("GPL"); Please, add MODULE_ALIAS("ipt_bpf"); MODULE_ALIAS("ip6t_bpf"); otherwise module auto-loading will not work. > + > +static int bpf_mt_check(const struct xt_mtchk_param *par) > +{ > + struct xt_bpf_info *info = par->matchinfo; > + struct sock_fprog program; > + > + program.len = info->bpf_program_num_elem; > + program.filter = info->bpf_program; sparse reports a warning here above. I've been trying to find a quick solution, please, have a look at it. > + if (sk_unattached_filter_create(&info->filter, &program)) { > + pr_info("bpf: check failed: parse error\n"); > + return -EINVAL; > + } > + > + return 0; > +} > + > +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par) > +{ > + const struct xt_bpf_info *info = par->matchinfo; > + > + return SK_RUN_FILTER(info->filter, skb); > +} > + > +static void bpf_mt_destroy(const struct xt_mtdtor_param *par) > +{ > + const struct xt_bpf_info *info = par->matchinfo; > + sk_unattached_filter_destroy(info->filter); > +} > + > +static struct xt_match bpf_mt_reg __read_mostly = { > + .name = "bpf", > + .revision = 0, > + .family = NFPROTO_UNSPEC, > + .checkentry = bpf_mt_check, > + .match = bpf_mt, > + .destroy = bpf_mt_destroy, > + .matchsize = sizeof(struct xt_bpf_info), > + .me = THIS_MODULE, > +}; > + > +static int __init bpf_mt_init(void) > +{ > + return xt_register_match(&bpf_mt_reg); > +} > + > +static void __exit bpf_mt_exit(void) > +{ > + xt_unregister_match(&bpf_mt_reg); > +} > + > +module_init(bpf_mt_init); > +module_exit(bpf_mt_exit); > -- > 1.7.7.3 > ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-01-17 23:53 ` Pablo Neira Ayuso @ 2013-01-18 16:48 ` Willem de Bruijn 2013-01-18 17:17 ` [PATCH next] " Willem de Bruijn 2013-01-21 13:44 ` [PATCH next v3] " Pablo Neira Ayuso 0 siblings, 2 replies; 58+ messages in thread From: Willem de Bruijn @ 2013-01-18 16:48 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: netfilter-devel On Thu, Jan 17, 2013 at 6:53 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > Hi Willem, > > On Wed, Jan 09, 2013 at 07:15:44PM -0500, Willem de Bruijn wrote: >> Changes: >> - v3: reverted no longer needed changes to x_tables.c >> - v2: use a fixed size match structure to communicate between >> kernel and userspace. >> >> Support arbitrary linux socket filter (BPF) programs as iptables >> match rules. This allows for very expressive filters, and on >> platforms with BPF JIT appears competitive with traditional hardcoded >> iptables rules. >> >> At least, on an x86_64 that achieves 40K netperf TCP_STREAM without >> any iptables rules (40 GBps), >> >> inserting 100x this bpf rule gives 28K >> >> ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j >> >> (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',') > > That code generated by tcpdump will not work. > > tcpdump generates BPF code assuming that offset 0 is the link layer > header, while iptables considers that offset 0 is the network layer. Ah, yes, of course. We discussed that earlier. I removed the statement from the commit message. Such hints belong in the libxt_bpf man page, if anywhere. To compile code right now, the little bpf compiler that I emailed before can be downloaded from http://code.google.com/p/kernel/downloads/detail?name=bpf2decimal.c I don't think that a compiler has to be shipped with iptables itself, let alone make iptables link against libraries. That said, it is not impossible to detect pcap.h in configure.ac and optionally enable a "-m bpf --string" mode that calls pcap_compile_nopcap from within libxt_bpf, so let me know if you would like me to code that up. I can also try to send a patch to tcpdump that extends compilation (`-ddd -y <type>`) to arbitrary link layer types. > More comments below: > >> inserting 100x this u32 rule gives 21K >> >> ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP >> >> The two are logically equivalent, as far as I can tell. Let me know >> if my test methodology is flawed in some way. Even in cases where >> slower, the filter adds functionality currently lacking in iptables, >> such as access to sk_buff fields like rxhash and queue_mapping. >> --- >> include/uapi/linux/netfilter/xt_bpf.h | 17 ++++++++ >> net/netfilter/Kconfig | 9 ++++ >> net/netfilter/Makefile | 1 + >> net/netfilter/xt_bpf.c | 73 +++++++++++++++++++++++++++++++++ >> 4 files changed, 100 insertions(+), 0 deletions(-) >> create mode 100644 include/uapi/linux/netfilter/xt_bpf.h >> create mode 100644 net/netfilter/xt_bpf.c >> >> diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h >> new file mode 100644 >> index 0000000..5dda450 >> --- /dev/null >> +++ b/include/uapi/linux/netfilter/xt_bpf.h >> @@ -0,0 +1,17 @@ >> +#ifndef _XT_BPF_H >> +#define _XT_BPF_H >> + >> +#include <linux/filter.h> >> +#include <linux/types.h> >> + >> +#define XT_BPF_MAX_NUM_INSTR 64 >> + >> +struct xt_bpf_info { >> + __u16 bpf_program_num_elem; >> + struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR]; >> + >> + /* only used in the kernel */ >> + struct sk_filter *filter __attribute__((aligned(8))); >> +}; >> + >> +#endif /*_XT_BPF_H */ >> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig >> index fefa514..d45720f 100644 >> --- a/net/netfilter/Kconfig >> +++ b/net/netfilter/Kconfig >> @@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE >> If you want to compile it as a module, say M here and read >> <file:Documentation/kbuild/modules.txt>. If unsure, say `N'. >> >> +config NETFILTER_XT_MATCH_BPF >> + tristate '"bpf" match support' >> + depends on NETFILTER_ADVANCED >> + help >> + BPF matching applies a linux socket filter to each packet and >> + accepts those for which the filter returns non-zero. >> + >> + To compile it as a module, choose M here. If unsure, say N. >> + >> config NETFILTER_XT_MATCH_CLUSTER >> tristate '"cluster" match support' >> depends on NF_CONNTRACK >> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile >> index 3259697..6d6194525 100644 >> --- a/net/netfilter/Makefile >> +++ b/net/netfilter/Makefile >> @@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o >> >> # matches >> obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o >> +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o >> diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c >> new file mode 100644 >> index 0000000..1bdfab8 >> --- /dev/null >> +++ b/net/netfilter/xt_bpf.c >> @@ -0,0 +1,73 @@ >> +/* Xtables module to match packets using a BPF filter. >> + * Copyright 2013 Google Inc. >> + * Written by Willem de Bruijn <willemb@google.com> >> + * >> + * This program is free software; you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License version 2 as >> + * published by the Free Software Foundation. >> + */ >> + >> +#include <linux/module.h> >> +#include <linux/skbuff.h> >> +#include <linux/ipv6.h> >> +#include <linux/filter.h> >> +#include <net/ip.h> >> + >> +#include <linux/netfilter/xt_bpf.h> >> +#include <linux/netfilter/x_tables.h> >> + >> +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); >> +MODULE_DESCRIPTION("Xtables: BPF filter match"); >> +MODULE_LICENSE("GPL"); > > Please, add > > MODULE_ALIAS("ipt_bpf"); > MODULE_ALIAS("ip6t_bpf"); Done. > otherwise module auto-loading will not work. > >> + >> +static int bpf_mt_check(const struct xt_mtchk_param *par) >> +{ >> + struct xt_bpf_info *info = par->matchinfo; >> + struct sock_fprog program; >> + >> + program.len = info->bpf_program_num_elem; >> + program.filter = info->bpf_program; > > sparse reports a warning here above. I've been trying to find a quick > solution, please, have a look at it. Thanks for catching this. Apparently, program.filter is annotated as __user. A cast made the warning disappear, and is safe as far as I can see, since it only forces the kernel to be more conservative in accessing the memory. >> + if (sk_unattached_filter_create(&info->filter, &program)) { >> + pr_info("bpf: check failed: parse error\n"); >> + return -EINVAL; >> + } >> + >> + return 0; >> +} >> + >> +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par) >> +{ >> + const struct xt_bpf_info *info = par->matchinfo; >> + >> + return SK_RUN_FILTER(info->filter, skb); >> +} >> + >> +static void bpf_mt_destroy(const struct xt_mtdtor_param *par) >> +{ >> + const struct xt_bpf_info *info = par->matchinfo; >> + sk_unattached_filter_destroy(info->filter); >> +} >> + >> +static struct xt_match bpf_mt_reg __read_mostly = { >> + .name = "bpf", >> + .revision = 0, >> + .family = NFPROTO_UNSPEC, >> + .checkentry = bpf_mt_check, >> + .match = bpf_mt, >> + .destroy = bpf_mt_destroy, >> + .matchsize = sizeof(struct xt_bpf_info), >> + .me = THIS_MODULE, >> +}; >> + >> +static int __init bpf_mt_init(void) >> +{ >> + return xt_register_match(&bpf_mt_reg); >> +} >> + >> +static void __exit bpf_mt_exit(void) >> +{ >> + xt_unregister_match(&bpf_mt_reg); >> +} >> + >> +module_init(bpf_mt_init); >> +module_exit(bpf_mt_exit); >> -- >> 1.7.7.3 >> ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH next] iptables: add xt_bpf match 2013-01-18 16:48 ` Willem de Bruijn @ 2013-01-18 17:17 ` Willem de Bruijn 2013-01-21 11:28 ` Pablo Neira Ayuso 2013-01-21 13:44 ` [PATCH next v3] " Pablo Neira Ayuso 1 sibling, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2013-01-18 17:17 UTC (permalink / raw) To: netfilter-devel, pablo; +Cc: Willem de Bruijn Changes: - v4: fixed sparse warning and module autoloading - v3: reverted no longer needed changes to x_tables.c - v2: use a fixed size match structure to communicate between kernel and userspace. Support arbitrary linux socket filter (BPF) programs as iptables match rules. This allows for very expressive filters, and on platforms with BPF JIT appears competitive with traditional hardcoded iptables rules. At least, on an x86_64 that achieves 40K netperf TCP_STREAM without any iptables rules (40 GBps), inserting 100x this bpf rule gives 28K ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j inserting 100x this u32 rule gives 21K ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP The two are logically equivalent, as far as I can tell. Let me know if my test methodology is flawed in some way. Even in cases where slower, the filter adds functionality currently lacking in iptables, such as access to sk_buff fields like rxhash and queue_mapping. --- include/uapi/linux/netfilter/xt_bpf.h | 17 +++++++ net/netfilter/Kconfig | 9 ++++ net/netfilter/Makefile | 1 + net/netfilter/xt_bpf.c | 75 +++++++++++++++++++++++++++++++++ 4 files changed, 102 insertions(+), 0 deletions(-) create mode 100644 include/uapi/linux/netfilter/xt_bpf.h create mode 100644 net/netfilter/xt_bpf.c diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h new file mode 100644 index 0000000..5dda450 --- /dev/null +++ b/include/uapi/linux/netfilter/xt_bpf.h @@ -0,0 +1,17 @@ +#ifndef _XT_BPF_H +#define _XT_BPF_H + +#include <linux/filter.h> +#include <linux/types.h> + +#define XT_BPF_MAX_NUM_INSTR 64 + +struct xt_bpf_info { + __u16 bpf_program_num_elem; + struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR]; + + /* only used in the kernel */ + struct sk_filter *filter __attribute__((aligned(8))); +}; + +#endif /*_XT_BPF_H */ diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index bb48607..4017d85 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -811,6 +811,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE If you want to compile it as a module, say M here and read <file:Documentation/kbuild/modules.txt>. If unsure, say `N'. +config NETFILTER_XT_MATCH_BPF + tristate '"bpf" match support' + depends on NETFILTER_ADVANCED + help + BPF matching applies a linux socket filter to each packet and + accepts those for which the filter returns non-zero. + + To compile it as a module, choose M here. If unsure, say N. + config NETFILTER_XT_MATCH_CLUSTER tristate '"cluster" match support' depends on NF_CONNTRACK diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index b3bbda6..a1abf87 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -99,6 +99,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o # matches obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c new file mode 100644 index 0000000..62d93f8 --- /dev/null +++ b/net/netfilter/xt_bpf.c @@ -0,0 +1,75 @@ +/* Xtables module to match packets using a BPF filter. + * Copyright 2013 Google Inc. + * Written by Willem de Bruijn <willemb@google.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/module.h> +#include <linux/skbuff.h> +#include <linux/ipv6.h> +#include <linux/filter.h> +#include <net/ip.h> + +#include <linux/netfilter/xt_bpf.h> +#include <linux/netfilter/x_tables.h> + +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); +MODULE_DESCRIPTION("Xtables: BPF filter match"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("ipt_bpf"); +MODULE_ALIAS("ip6t_bpf"); + +static int bpf_mt_check(const struct xt_mtchk_param *par) +{ + struct xt_bpf_info *info = par->matchinfo; + struct sock_fprog program; + + program.len = info->bpf_program_num_elem; + program.filter = (struct sock_filter __user *) info->bpf_program; + if (sk_unattached_filter_create(&info->filter, &program)) { + pr_info("bpf: check failed: parse error\n"); + return -EINVAL; + } + + return 0; +} + +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par) +{ + const struct xt_bpf_info *info = par->matchinfo; + + return SK_RUN_FILTER(info->filter, skb); +} + +static void bpf_mt_destroy(const struct xt_mtdtor_param *par) +{ + const struct xt_bpf_info *info = par->matchinfo; + sk_unattached_filter_destroy(info->filter); +} + +static struct xt_match bpf_mt_reg __read_mostly = { + .name = "bpf", + .revision = 0, + .family = NFPROTO_UNSPEC, + .checkentry = bpf_mt_check, + .match = bpf_mt, + .destroy = bpf_mt_destroy, + .matchsize = sizeof(struct xt_bpf_info), + .me = THIS_MODULE, +}; + +static int __init bpf_mt_init(void) +{ + return xt_register_match(&bpf_mt_reg); +} + +static void __exit bpf_mt_exit(void) +{ + xt_unregister_match(&bpf_mt_reg); +} + +module_init(bpf_mt_init); +module_exit(bpf_mt_exit); -- 1.7.7.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH next] iptables: add xt_bpf match 2013-01-18 17:17 ` [PATCH next] " Willem de Bruijn @ 2013-01-21 11:28 ` Pablo Neira Ayuso 2013-01-21 11:33 ` Pablo Neira Ayuso 2013-01-21 16:02 ` Willem de Bruijn 0 siblings, 2 replies; 58+ messages in thread From: Pablo Neira Ayuso @ 2013-01-21 11:28 UTC (permalink / raw) To: Willem de Bruijn; +Cc: netfilter-devel Hi Willem, I have applied this patch to my nf-next tree with minor changes. You'll get also another review asap for the iptables user-space part, we have more time to get that into shape anyway. On Fri, Jan 18, 2013 at 12:17:30PM -0500, Willem de Bruijn wrote: > Changes: > - v4: fixed sparse warning and module autoloading > - v3: reverted no longer needed changes to x_tables.c > - v2: use a fixed size match structure to communicate between > kernel and userspace. > > Support arbitrary linux socket filter (BPF) programs as iptables > match rules. This allows for very expressive filters, and on > platforms with BPF JIT appears competitive with traditional hardcoded > iptables rules. > > At least, on an x86_64 that achieves 40K netperf TCP_STREAM without > any iptables rules (40 GBps), > > inserting 100x this bpf rule gives 28K > > ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j I have removed part of the changelog, the performance part. This filter above to be possible confusion. That filter is using the link layer as base. I think the expressivity of BPF is enough to justify its inclusion into mainline. I also added a line notice on the artificial limitation to 64 instructions per filter. More minor glitches below. > inserting 100x this u32 rule gives 21K > > ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP > > The two are logically equivalent, as far as I can tell. Let me know > if my test methodology is flawed in some way. Even in cases where > slower, the filter adds functionality currently lacking in iptables, > such as access to sk_buff fields like rxhash and queue_mapping. > --- > include/uapi/linux/netfilter/xt_bpf.h | 17 +++++++ > net/netfilter/Kconfig | 9 ++++ > net/netfilter/Makefile | 1 + > net/netfilter/xt_bpf.c | 75 +++++++++++++++++++++++++++++++++ > 4 files changed, 102 insertions(+), 0 deletions(-) > create mode 100644 include/uapi/linux/netfilter/xt_bpf.h > create mode 100644 net/netfilter/xt_bpf.c > > diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h > new file mode 100644 > index 0000000..5dda450 > --- /dev/null > +++ b/include/uapi/linux/netfilter/xt_bpf.h > @@ -0,0 +1,17 @@ > +#ifndef _XT_BPF_H > +#define _XT_BPF_H > + > +#include <linux/filter.h> > +#include <linux/types.h> > + > +#define XT_BPF_MAX_NUM_INSTR 64 > + > +struct xt_bpf_info { > + __u16 bpf_program_num_elem; > + struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR]; > + > + /* only used in the kernel */ > + struct sk_filter *filter __attribute__((aligned(8))); > +}; > + > +#endif /*_XT_BPF_H */ > diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig > index bb48607..4017d85 100644 > --- a/net/netfilter/Kconfig > +++ b/net/netfilter/Kconfig > @@ -811,6 +811,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE > If you want to compile it as a module, say M here and read > <file:Documentation/kbuild/modules.txt>. If unsure, say `N'. > > +config NETFILTER_XT_MATCH_BPF > + tristate '"bpf" match support' > + depends on NETFILTER_ADVANCED > + help > + BPF matching applies a linux socket filter to each packet and > + accepts those for which the filter returns non-zero. ^^^ Fixed this minor glitch. > + > + To compile it as a module, choose M here. If unsure, say N. > + > config NETFILTER_XT_MATCH_CLUSTER > tristate '"cluster" match support' > depends on NF_CONNTRACK > diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile > index b3bbda6..a1abf87 100644 > --- a/net/netfilter/Makefile > +++ b/net/netfilter/Makefile > @@ -99,6 +99,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o > > # matches > obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o > +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o > obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o > obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o > obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o > diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c > new file mode 100644 > index 0000000..62d93f8 > --- /dev/null > +++ b/net/netfilter/xt_bpf.c > @@ -0,0 +1,75 @@ > +/* Xtables module to match packets using a BPF filter. > + * Copyright 2013 Google Inc. > + * Written by Willem de Bruijn <willemb@google.com> > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + */ > + > +#include <linux/module.h> > +#include <linux/skbuff.h> > +#include <linux/ipv6.h> Removed this header above, we don't need it. > +#include <linux/filter.h> > +#include <net/ip.h> Same thing. > + > +#include <linux/netfilter/xt_bpf.h> > +#include <linux/netfilter/x_tables.h> > + > +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); > +MODULE_DESCRIPTION("Xtables: BPF filter match"); > +MODULE_LICENSE("GPL"); > +MODULE_ALIAS("ipt_bpf"); > +MODULE_ALIAS("ip6t_bpf"); > + > +static int bpf_mt_check(const struct xt_mtchk_param *par) > +{ > + struct xt_bpf_info *info = par->matchinfo; > + struct sock_fprog program; > + > + program.len = info->bpf_program_num_elem; > + program.filter = (struct sock_filter __user *) info->bpf_program; > + if (sk_unattached_filter_create(&info->filter, &program)) { > + pr_info("bpf: check failed: parse error\n"); > + return -EINVAL; > + } > + > + return 0; > +} > + > +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par) > +{ > + const struct xt_bpf_info *info = par->matchinfo; > + > + return SK_RUN_FILTER(info->filter, skb); > +} > + > +static void bpf_mt_destroy(const struct xt_mtdtor_param *par) > +{ > + const struct xt_bpf_info *info = par->matchinfo; > + sk_unattached_filter_destroy(info->filter); > +} > + > +static struct xt_match bpf_mt_reg __read_mostly = { > + .name = "bpf", > + .revision = 0, > + .family = NFPROTO_UNSPEC, > + .checkentry = bpf_mt_check, > + .match = bpf_mt, > + .destroy = bpf_mt_destroy, > + .matchsize = sizeof(struct xt_bpf_info), > + .me = THIS_MODULE, > +}; > + > +static int __init bpf_mt_init(void) > +{ > + return xt_register_match(&bpf_mt_reg); > +} > + > +static void __exit bpf_mt_exit(void) > +{ > + xt_unregister_match(&bpf_mt_reg); > +} > + > +module_init(bpf_mt_init); > +module_exit(bpf_mt_exit); > -- > 1.7.7.3 > > -- > To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next] iptables: add xt_bpf match 2013-01-21 11:28 ` Pablo Neira Ayuso @ 2013-01-21 11:33 ` Pablo Neira Ayuso 2013-01-21 11:42 ` Florian Westphal 2013-01-21 16:02 ` Willem de Bruijn 1 sibling, 1 reply; 58+ messages in thread From: Pablo Neira Ayuso @ 2013-01-21 11:33 UTC (permalink / raw) To: Willem de Bruijn; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 209 bytes --] On Mon, Jan 21, 2013 at 12:28:38PM +0100, Pablo Neira Ayuso wrote: > Hi Willem, > > I have applied this patch to my nf-next tree with minor changes. This was also missing, I have applied this patch as well. [-- Attachment #2: 0001-netfilter-add-missing-xt_bpf.h-header-in-installatio.patch --] [-- Type: text/x-diff, Size: 805 bytes --] >From e7db3cbcd6508235d63ba4a31bbd1ce4fdece6e1 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso <pablo@netfilter.org> Date: Mon, 21 Jan 2013 12:30:59 +0100 Subject: [PATCH] netfilter: add missing xt_bpf.h header in installation Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> --- include/uapi/linux/netfilter/Kbuild | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild index 08f555f..8b4bd36 100644 --- a/include/uapi/linux/netfilter/Kbuild +++ b/include/uapi/linux/netfilter/Kbuild @@ -35,6 +35,7 @@ header-y += xt_TCPOPTSTRIP.h header-y += xt_TEE.h header-y += xt_TPROXY.h header-y += xt_addrtype.h +header-y += xt_bpf.h header-y += xt_cluster.h header-y += xt_comment.h header-y += xt_connbytes.h -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH next] iptables: add xt_bpf match 2013-01-21 11:33 ` Pablo Neira Ayuso @ 2013-01-21 11:42 ` Florian Westphal 2013-01-21 12:03 ` Pablo Neira Ayuso 0 siblings, 1 reply; 58+ messages in thread From: Florian Westphal @ 2013-01-21 11:42 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: netfilter-devel Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Mon, Jan 21, 2013 at 12:28:38PM +0100, Pablo Neira Ayuso wrote: > > Hi Willem, > > > > I have applied this patch to my nf-next tree with minor changes. [..] > diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild > index 08f555f..8b4bd36 100644 > --- a/include/uapi/linux/netfilter/Kbuild > +++ b/include/uapi/linux/netfilter/Kbuild > @@ -35,6 +35,7 @@ header-y += xt_TCPOPTSTRIP.h > header-y += xt_TEE.h > header-y += xt_TPROXY.h > header-y += xt_addrtype.h > +header-y += xt_bpf.h Good catch. I forgot about that too, so xt_connlabel.h is also missing. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next] iptables: add xt_bpf match 2013-01-21 11:42 ` Florian Westphal @ 2013-01-21 12:03 ` Pablo Neira Ayuso 0 siblings, 0 replies; 58+ messages in thread From: Pablo Neira Ayuso @ 2013-01-21 12:03 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 794 bytes --] On Mon, Jan 21, 2013 at 12:42:20PM +0100, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > On Mon, Jan 21, 2013 at 12:28:38PM +0100, Pablo Neira Ayuso wrote: > > > Hi Willem, > > > > > > I have applied this patch to my nf-next tree with minor changes. > > [..] > > > diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild > > index 08f555f..8b4bd36 100644 > > --- a/include/uapi/linux/netfilter/Kbuild > > +++ b/include/uapi/linux/netfilter/Kbuild > > @@ -35,6 +35,7 @@ header-y += xt_TCPOPTSTRIP.h > > header-y += xt_TEE.h > > header-y += xt_TPROXY.h > > header-y += xt_addrtype.h > > +header-y += xt_bpf.h > > Good catch. I forgot about that too, so xt_connlabel.h is also missing. Patch attached, thanks for spotting this. [-- Attachment #2: 0001-netfilter-add-missing-xt_connlabel.h-header-in-insta.patch --] [-- Type: text/x-diff, Size: 863 bytes --] >From 7a95ad28c40fec445b0e1463fd8ed638701ad94d Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso <pablo@netfilter.org> Date: Mon, 21 Jan 2013 13:02:19 +0100 Subject: [PATCH] netfilter: add missing xt_connlabel.h header in installation Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> --- include/uapi/linux/netfilter/Kbuild | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild index 8b4bd36..4111577 100644 --- a/include/uapi/linux/netfilter/Kbuild +++ b/include/uapi/linux/netfilter/Kbuild @@ -39,6 +39,7 @@ header-y += xt_bpf.h header-y += xt_cluster.h header-y += xt_comment.h header-y += xt_connbytes.h +header-y += xt_connlabel.h header-y += xt_connlimit.h header-y += xt_connmark.h header-y += xt_conntrack.h -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH next] iptables: add xt_bpf match 2013-01-21 11:28 ` Pablo Neira Ayuso 2013-01-21 11:33 ` Pablo Neira Ayuso @ 2013-01-21 16:02 ` Willem de Bruijn 1 sibling, 0 replies; 58+ messages in thread From: Willem de Bruijn @ 2013-01-21 16:02 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: netfilter-devel On Mon, Jan 21, 2013 at 6:28 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > Hi Willem, > > I have applied this patch to my nf-next tree with minor changes. Great. Thanks for cleaning up the patch, Pablo. Apologies for not spotting those glitches, myself. > You'll get also another review asap for the iptables user-space part, > we have more time to get that into shape anyway. > > On Fri, Jan 18, 2013 at 12:17:30PM -0500, Willem de Bruijn wrote: >> Changes: >> - v4: fixed sparse warning and module autoloading >> - v3: reverted no longer needed changes to x_tables.c >> - v2: use a fixed size match structure to communicate between >> kernel and userspace. >> >> Support arbitrary linux socket filter (BPF) programs as iptables >> match rules. This allows for very expressive filters, and on >> platforms with BPF JIT appears competitive with traditional hardcoded >> iptables rules. >> >> At least, on an x86_64 that achieves 40K netperf TCP_STREAM without >> any iptables rules (40 GBps), >> >> inserting 100x this bpf rule gives 28K >> >> ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j > > I have removed part of the changelog, the performance part. This > filter above to be possible confusion. That filter is using the link > layer as base. > > I think the expressivity of BPF is enough to justify its inclusion > into mainline. > > I also added a line notice on the artificial limitation to 64 > instructions per filter. > > More minor glitches below. > >> inserting 100x this u32 rule gives 21K >> >> ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP >> >> The two are logically equivalent, as far as I can tell. Let me know >> if my test methodology is flawed in some way. Even in cases where >> slower, the filter adds functionality currently lacking in iptables, >> such as access to sk_buff fields like rxhash and queue_mapping. >> --- >> include/uapi/linux/netfilter/xt_bpf.h | 17 +++++++ >> net/netfilter/Kconfig | 9 ++++ >> net/netfilter/Makefile | 1 + >> net/netfilter/xt_bpf.c | 75 +++++++++++++++++++++++++++++++++ >> 4 files changed, 102 insertions(+), 0 deletions(-) >> create mode 100644 include/uapi/linux/netfilter/xt_bpf.h >> create mode 100644 net/netfilter/xt_bpf.c >> >> diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h >> new file mode 100644 >> index 0000000..5dda450 >> --- /dev/null >> +++ b/include/uapi/linux/netfilter/xt_bpf.h >> @@ -0,0 +1,17 @@ >> +#ifndef _XT_BPF_H >> +#define _XT_BPF_H >> + >> +#include <linux/filter.h> >> +#include <linux/types.h> >> + >> +#define XT_BPF_MAX_NUM_INSTR 64 >> + >> +struct xt_bpf_info { >> + __u16 bpf_program_num_elem; >> + struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR]; >> + >> + /* only used in the kernel */ >> + struct sk_filter *filter __attribute__((aligned(8))); >> +}; >> + >> +#endif /*_XT_BPF_H */ >> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig >> index bb48607..4017d85 100644 >> --- a/net/netfilter/Kconfig >> +++ b/net/netfilter/Kconfig >> @@ -811,6 +811,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE >> If you want to compile it as a module, say M here and read >> <file:Documentation/kbuild/modules.txt>. If unsure, say `N'. >> >> +config NETFILTER_XT_MATCH_BPF >> + tristate '"bpf" match support' >> + depends on NETFILTER_ADVANCED >> + help >> + BPF matching applies a linux socket filter to each packet and >> + accepts those for which the filter returns non-zero. > ^^^ > > Fixed this minor glitch. > >> + >> + To compile it as a module, choose M here. If unsure, say N. >> + >> config NETFILTER_XT_MATCH_CLUSTER >> tristate '"cluster" match support' >> depends on NF_CONNTRACK >> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile >> index b3bbda6..a1abf87 100644 >> --- a/net/netfilter/Makefile >> +++ b/net/netfilter/Makefile >> @@ -99,6 +99,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o >> >> # matches >> obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o >> +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o >> obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o >> diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c >> new file mode 100644 >> index 0000000..62d93f8 >> --- /dev/null >> +++ b/net/netfilter/xt_bpf.c >> @@ -0,0 +1,75 @@ >> +/* Xtables module to match packets using a BPF filter. >> + * Copyright 2013 Google Inc. >> + * Written by Willem de Bruijn <willemb@google.com> >> + * >> + * This program is free software; you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License version 2 as >> + * published by the Free Software Foundation. >> + */ >> + >> +#include <linux/module.h> >> +#include <linux/skbuff.h> >> +#include <linux/ipv6.h> > > Removed this header above, we don't need it. > >> +#include <linux/filter.h> >> +#include <net/ip.h> > > Same thing. > >> + >> +#include <linux/netfilter/xt_bpf.h> >> +#include <linux/netfilter/x_tables.h> >> + >> +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>"); >> +MODULE_DESCRIPTION("Xtables: BPF filter match"); >> +MODULE_LICENSE("GPL"); >> +MODULE_ALIAS("ipt_bpf"); >> +MODULE_ALIAS("ip6t_bpf"); >> + >> +static int bpf_mt_check(const struct xt_mtchk_param *par) >> +{ >> + struct xt_bpf_info *info = par->matchinfo; >> + struct sock_fprog program; >> + >> + program.len = info->bpf_program_num_elem; >> + program.filter = (struct sock_filter __user *) info->bpf_program; >> + if (sk_unattached_filter_create(&info->filter, &program)) { >> + pr_info("bpf: check failed: parse error\n"); >> + return -EINVAL; >> + } >> + >> + return 0; >> +} >> + >> +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par) >> +{ >> + const struct xt_bpf_info *info = par->matchinfo; >> + >> + return SK_RUN_FILTER(info->filter, skb); >> +} >> + >> +static void bpf_mt_destroy(const struct xt_mtdtor_param *par) >> +{ >> + const struct xt_bpf_info *info = par->matchinfo; >> + sk_unattached_filter_destroy(info->filter); >> +} >> + >> +static struct xt_match bpf_mt_reg __read_mostly = { >> + .name = "bpf", >> + .revision = 0, >> + .family = NFPROTO_UNSPEC, >> + .checkentry = bpf_mt_check, >> + .match = bpf_mt, >> + .destroy = bpf_mt_destroy, >> + .matchsize = sizeof(struct xt_bpf_info), >> + .me = THIS_MODULE, >> +}; >> + >> +static int __init bpf_mt_init(void) >> +{ >> + return xt_register_match(&bpf_mt_reg); >> +} >> + >> +static void __exit bpf_mt_exit(void) >> +{ >> + xt_unregister_match(&bpf_mt_reg); >> +} >> + >> +module_init(bpf_mt_init); >> +module_exit(bpf_mt_exit); >> -- >> 1.7.7.3 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-01-18 16:48 ` Willem de Bruijn 2013-01-18 17:17 ` [PATCH next] " Willem de Bruijn @ 2013-01-21 13:44 ` Pablo Neira Ayuso 2013-01-22 8:46 ` Florian Westphal 1 sibling, 1 reply; 58+ messages in thread From: Pablo Neira Ayuso @ 2013-01-21 13:44 UTC (permalink / raw) To: Willem de Bruijn; +Cc: netfilter-devel, Jozsef Kadlecsik, Florian Westphal On Fri, Jan 18, 2013 at 11:48:34AM -0500, Willem de Bruijn wrote: [...] > To compile code right now, the little bpf compiler that I emailed > before can be downloaded from > http://code.google.com/p/kernel/downloads/detail?name=bpf2decimal.c > > I don't think that a compiler has to be shipped with iptables itself, > let alone make iptables link against libraries. That said, it is not > impossible to detect pcap.h in configure.ac and optionally enable a > "-m bpf --string" mode that calls pcap_compile_nopcap from within > libxt_bpf, so let me know if you would like me to code that up. I can > also try to send a patch to tcpdump that extends compilation (`-ddd -y > <type>`) to arbitrary link layer types. We have to decide if: a) we add a new hard library dependency to iptables (libpcap) for just for one single module, that is, the libxt_bpf depends on libpcap. or b) provide a separate utility to generate the BPF filter in text-based format from some utility that accepts tcpdump-like syntax. The utility can be distributed in the utils directory and it would not be mandatory to compile it if libpcap is not present. I'd like to hear pro and cons arguments from others on this. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-01-21 13:44 ` [PATCH next v3] " Pablo Neira Ayuso @ 2013-01-22 8:46 ` Florian Westphal 2013-01-22 9:46 ` Jozsef Kadlecsik 2013-01-23 15:59 ` Willem de Bruijn 0 siblings, 2 replies; 58+ messages in thread From: Florian Westphal @ 2013-01-22 8:46 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: Willem de Bruijn, netfilter-devel, Jozsef Kadlecsik, Florian Westphal Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Fri, Jan 18, 2013 at 11:48:34AM -0500, Willem de Bruijn wrote: > [...] > > To compile code right now, the little bpf compiler that I emailed > > before can be downloaded from > > http://code.google.com/p/kernel/downloads/detail?name=bpf2decimal.c > > > > I don't think that a compiler has to be shipped with iptables itself, > > let alone make iptables link against libraries. That said, it is not > > impossible to detect pcap.h in configure.ac and optionally enable a > > "-m bpf --string" mode that calls pcap_compile_nopcap from within > > libxt_bpf, so let me know if you would like me to code that up. I can > > also try to send a patch to tcpdump that extends compilation (`-ddd -y > > <type>`) to arbitrary link layer types. > > We have to decide if: > > a) we add a new hard library dependency to iptables (libpcap) for just > for one single module, that is, the libxt_bpf depends on libpcap. > > or > > b) provide a separate utility to generate the BPF filter in text-based > format from some utility that accepts tcpdump-like syntax. The utility > can be distributed in the utils directory and it would not be > mandatory to compile it if libpcap is not present. > > I'd like to hear pro and cons arguments from others on this. a) is arguably more user friendly, however, I don't think we can retain the 'text representation' for iptables-save so users would still be confronted with the compiled data at some point (i.e., they need to write down the original expression anyway to figure out what the rule they added 6 months back actually does...) I would go with b) for now; we can always move to a) later on, but not the other way around (would kill backwards compatibility). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-01-22 8:46 ` Florian Westphal @ 2013-01-22 9:46 ` Jozsef Kadlecsik 2013-01-22 10:03 ` Maciej Żenczykowski 2013-01-22 11:11 ` Pablo Neira Ayuso 2013-01-23 15:59 ` Willem de Bruijn 1 sibling, 2 replies; 58+ messages in thread From: Jozsef Kadlecsik @ 2013-01-22 9:46 UTC (permalink / raw) To: Florian Westphal; +Cc: Pablo Neira Ayuso, Willem de Bruijn, netfilter-devel On Tue, 22 Jan 2013, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > On Fri, Jan 18, 2013 at 11:48:34AM -0500, Willem de Bruijn wrote: > > [...] > > > To compile code right now, the little bpf compiler that I emailed > > > before can be downloaded from > > > http://code.google.com/p/kernel/downloads/detail?name=bpf2decimal.c > > > > > > I don't think that a compiler has to be shipped with iptables itself, > > > let alone make iptables link against libraries. That said, it is not > > > impossible to detect pcap.h in configure.ac and optionally enable a > > > "-m bpf --string" mode that calls pcap_compile_nopcap from within > > > libxt_bpf, so let me know if you would like me to code that up. I can > > > also try to send a patch to tcpdump that extends compilation (`-ddd -y > > > <type>`) to arbitrary link layer types. > > > > We have to decide if: > > > > a) we add a new hard library dependency to iptables (libpcap) for just > > for one single module, that is, the libxt_bpf depends on libpcap. > > > > or > > > > b) provide a separate utility to generate the BPF filter in text-based > > format from some utility that accepts tcpdump-like syntax. The utility > > can be distributed in the utils directory and it would not be > > mandatory to compile it if libpcap is not present. > > > > I'd like to hear pro and cons arguments from others on this. > > a) is arguably more user friendly, however, I don't think we can > retain the 'text representation' for iptables-save so users > would still be confronted with the compiled data at some point > (i.e., they need to write down the original expression anyway to > figure out what the rule they added 6 months back actually does...) > > I would go with b) for now; we can always move to a) later on, but not > the other way around (would kill backwards compatibility). Yes, let's go with b). (But from packaging point of view utils/bpf2decimal.c depending on libpcap is not much different from extensions/libxt_bpf.c depending on libpcap.) Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-01-22 9:46 ` Jozsef Kadlecsik @ 2013-01-22 10:03 ` Maciej Żenczykowski 2013-01-22 11:11 ` Pablo Neira Ayuso 1 sibling, 0 replies; 58+ messages in thread From: Maciej Żenczykowski @ 2013-01-22 10:03 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: Florian Westphal, Pablo Neira Ayuso, Willem de Bruijn, netfilter-devel With (b) the static iptables binary doesn't grow by the size of the libpcap library. I think this is a worthy goal. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-01-22 9:46 ` Jozsef Kadlecsik 2013-01-22 10:03 ` Maciej Żenczykowski @ 2013-01-22 11:11 ` Pablo Neira Ayuso 1 sibling, 0 replies; 58+ messages in thread From: Pablo Neira Ayuso @ 2013-01-22 11:11 UTC (permalink / raw) To: Jozsef Kadlecsik; +Cc: Florian Westphal, Willem de Bruijn, netfilter-devel On Tue, Jan 22, 2013 at 10:46:17AM +0100, Jozsef Kadlecsik wrote: > On Tue, 22 Jan 2013, Florian Westphal wrote: > > > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > On Fri, Jan 18, 2013 at 11:48:34AM -0500, Willem de Bruijn wrote: > > > [...] > > > > To compile code right now, the little bpf compiler that I emailed > > > > before can be downloaded from > > > > http://code.google.com/p/kernel/downloads/detail?name=bpf2decimal.c > > > > > > > > I don't think that a compiler has to be shipped with iptables itself, > > > > let alone make iptables link against libraries. That said, it is not > > > > impossible to detect pcap.h in configure.ac and optionally enable a > > > > "-m bpf --string" mode that calls pcap_compile_nopcap from within > > > > libxt_bpf, so let me know if you would like me to code that up. I can > > > > also try to send a patch to tcpdump that extends compilation (`-ddd -y > > > > <type>`) to arbitrary link layer types. > > > > > > We have to decide if: > > > > > > a) we add a new hard library dependency to iptables (libpcap) for just > > > for one single module, that is, the libxt_bpf depends on libpcap. > > > > > > or > > > > > > b) provide a separate utility to generate the BPF filter in text-based > > > format from some utility that accepts tcpdump-like syntax. The utility > > > can be distributed in the utils directory and it would not be > > > mandatory to compile it if libpcap is not present. > > > > > > I'd like to hear pro and cons arguments from others on this. > > > > a) is arguably more user friendly, however, I don't think we can > > retain the 'text representation' for iptables-save so users > > would still be confronted with the compiled data at some point > > (i.e., they need to write down the original expression anyway to > > figure out what the rule they added 6 months back actually does...) > > > > I would go with b) for now; we can always move to a) later on, but not > > the other way around (would kill backwards compatibility). > > Yes, let's go with b). (But from packaging point of view > utils/bpf2decimal.c depending on libpcap is not much different from > extensions/libxt_bpf.c depending on libpcap.) We can skip that dependency by adding an independent configure.ac and Makefile for this under iptables/utils/nfbpf. Thus, iptables itself will not depend on libpcap. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-01-22 8:46 ` Florian Westphal 2013-01-22 9:46 ` Jozsef Kadlecsik @ 2013-01-23 15:59 ` Willem de Bruijn 2013-01-23 16:21 ` Pablo Neira Ayuso 1 sibling, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2013-01-23 15:59 UTC (permalink / raw) To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel, Jozsef Kadlecsik On Tue, Jan 22, 2013 at 3:46 AM, Florian Westphal <fw@strlen.de> wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: >> On Fri, Jan 18, 2013 at 11:48:34AM -0500, Willem de Bruijn wrote: >> [...] >> > To compile code right now, the little bpf compiler that I emailed >> > before can be downloaded from >> > http://code.google.com/p/kernel/downloads/detail?name=bpf2decimal.c >> > >> > I don't think that a compiler has to be shipped with iptables itself, >> > let alone make iptables link against libraries. That said, it is not >> > impossible to detect pcap.h in configure.ac and optionally enable a >> > "-m bpf --string" mode that calls pcap_compile_nopcap from within >> > libxt_bpf, so let me know if you would like me to code that up. I can >> > also try to send a patch to tcpdump that extends compilation (`-ddd -y >> > <type>`) to arbitrary link layer types. >> >> We have to decide if: >> >> a) we add a new hard library dependency to iptables (libpcap) for just >> for one single module, that is, the libxt_bpf depends on libpcap. >> >> or >> >> b) provide a separate utility to generate the BPF filter in text-based >> format from some utility that accepts tcpdump-like syntax. The utility >> can be distributed in the utils directory and it would not be >> mandatory to compile it if libpcap is not present. >> >> I'd like to hear pro and cons arguments from others on this. > > a) is arguably more user friendly, however, I don't think we can > retain the 'text representation' for iptables-save so users > would still be confronted with the compiled data at some point > (i.e., they need to write down the original expression anyway to > figure out what the rule they added 6 months back actually does...) > > I would go with b) for now; we can always move to a) later on, but not > the other way around (would kill backwards compatibility). This sounds like the consensus (for the record, I also prefer this less disruptive approach). In that case, I can submit a revised libxt_bpf with your suggested changes right away, Pablo, and we can leave the separate userspace tool for a later commit. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-01-23 15:59 ` Willem de Bruijn @ 2013-01-23 16:21 ` Pablo Neira Ayuso 2013-01-23 16:38 ` Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Pablo Neira Ayuso @ 2013-01-23 16:21 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Florian Westphal, netfilter-devel, Jozsef Kadlecsik On Wed, Jan 23, 2013 at 10:59:28AM -0500, Willem de Bruijn wrote: > >> b) provide a separate utility to generate the BPF filter in text-based > >> format from some utility that accepts tcpdump-like syntax. The utility > >> can be distributed in the utils directory and it would not be > >> mandatory to compile it if libpcap is not present. [...] > > I would go with b) for now; we can always move to a) later on, but not > > the other way around (would kill backwards compatibility). > > This sounds like the consensus (for the record, I also prefer this less > disruptive approach). In that case, I can submit a revised libxt_bpf with your > suggested changes right away, Pablo, and we can leave the separate > userspace tool for a later commit. Either way is fine, but please we should have that utility compiler integrated in the iptables tree by when 3.9-rc1 is released. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-01-23 16:21 ` Pablo Neira Ayuso @ 2013-01-23 16:38 ` Willem de Bruijn 2013-01-23 18:56 ` Pablo Neira Ayuso 0 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2013-01-23 16:38 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, Jozsef Kadlecsik On Wed, Jan 23, 2013 at 11:21 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Wed, Jan 23 2013 at 10:59:28AM -0500, Willem de Bruijn wrote: >> >> b) provide a separate utility to generate the BPF filter in text-based >> >> format from some utility that accepts tcpdump-like syntax. The utility >> >> can be distributed in the utils directory and it would not be >> >> mandatory to compile it if libpcap is not present. > [...] >> > I would go with b) for now; we can always move to a) later on, but not >> > the other way around (would kill backwards compatibility). >> >> This sounds like the consensus (for the record, I also prefer this less >> disruptive approach). In that case, I can submit a revised libxt_bpf with your >> suggested changes right away, Pablo, and we can leave the separate >> userspace tool for a later commit. > > Either way is fine, but please we should have that utility compiler > integrated in the iptables tree by when 3.9-rc1 is released. Okay. I'll prepare a separate patch with the pcap-based utility, then. Since utils is built as part of the root make invocation, I think it's better to test for pcap.h in the root configure.ac and add a test in utils/Makefile.am to build this tool if found, as opposed to creating a separate configure.ac under utils. We can also discuss these details after the first version of the patch, of course. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-01-23 16:38 ` Willem de Bruijn @ 2013-01-23 18:56 ` Pablo Neira Ayuso 2013-02-18 3:44 ` [PATCH] utils: bpf_compile Willem de Bruijn 2013-02-18 3:52 ` [PATCH next v3] iptables: add xt_bpf match Willem de Bruijn 0 siblings, 2 replies; 58+ messages in thread From: Pablo Neira Ayuso @ 2013-01-23 18:56 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Florian Westphal, netfilter-devel, Jozsef Kadlecsik On Wed, Jan 23, 2013 at 11:38:20AM -0500, Willem de Bruijn wrote: > On Wed, Jan 23, 2013 at 11:21 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > On Wed, Jan 23 2013 at 10:59:28AM -0500, Willem de Bruijn wrote: > >> >> b) provide a separate utility to generate the BPF filter in text-based > >> >> format from some utility that accepts tcpdump-like syntax. The utility > >> >> can be distributed in the utils directory and it would not be > >> >> mandatory to compile it if libpcap is not present. > > [...] > >> > I would go with b) for now; we can always move to a) later on, but not > >> > the other way around (would kill backwards compatibility). > >> > >> This sounds like the consensus (for the record, I also prefer this less > >> disruptive approach). In that case, I can submit a revised libxt_bpf with your > >> suggested changes right away, Pablo, and we can leave the separate > >> userspace tool for a later commit. > > > > Either way is fine, but please we should have that utility compiler > > integrated in the iptables tree by when 3.9-rc1 is released. > > Okay. I'll prepare a separate patch with the pcap-based utility, then. > > Since utils is built as part of the root make invocation, I think it's > better to test for pcap.h in the root configure.ac and add a test in > utils/Makefile.am to build this tool if found, as opposed to creating > a separate configure.ac under utils. We can also discuss these > details after the first version of the patch, of course. That's fine by now, and it's way less bloat. ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH] utils: bpf_compile 2013-01-23 18:56 ` Pablo Neira Ayuso @ 2013-02-18 3:44 ` Willem de Bruijn 2013-02-20 10:38 ` Daniel Borkmann 2013-02-18 3:52 ` [PATCH next v3] iptables: add xt_bpf match Willem de Bruijn 1 sibling, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2013-02-18 3:44 UTC (permalink / raw) To: pablo, netfilter-devel; +Cc: Willem de Bruijn A BPF compiler to convert tcpudmp expressions to the decimal format accepted by the libxt_bpf. Generate a file and pass that to iptables: bpf_compile RAW 'udp dst port 9000' > test.bpf iptables -A OUTPUT -m bpf --bytecode-file test.bpf -j LOG Or pass the output directly to iptables using backticks: iptables -A INPUT -m bpf --bytecode \ "`./bpf_compile RAW 'udp dst port 9000' | tr '\n' ','`" -j LOG This utility depends on libpcap. The library is only compiled if the option --enable-pcap is explicitly passed to ./configure and libpcap is found. Tested (for review: to be removed before merging): - compilation always succeeds. - utils/bpf_compile is only built when enable_pcap is true and libpcap is found - utils/nfnl_osf is (still) only built when nfnetlink library is found - execution - tested the above two expressions and verified dmesg output - logging - iptables -L INPUT " LOG all -- anywhere anywhere match bpf 48 0 0 0,84 0 0 240,21 0 4 96,48 0 0 6,21 0 13 17,40 0 0 42,21 10 11 9000,48 0 0 0,84 0 0 240,21 0 8 64,48 0 0 9,21 0 6 17,40 0 0 6,69 4 0 8191,177 0 0 0,72 0 0 2,21 0 1 9000,6 0 0 65535,6 0 0 0, LOG level warning " - iptables-save " -A INPUT -m bpf --bytecode "19,48 0 0 0,84 0 0 240,21 0 4 96,48 0 0 6,21 0 13 17,40 0 0 42,21 10 11 9000,48 0 0 0,84 0 0 240,21 0 8 64,48 0 0 9,21 0 6 17,40 0 0 6,69 4 0 8191,177 0 0 0,72 0 0 2,21 0 1 9000,6 0 0 65535,6 0 0 0," -j LOG " --- Makefile.am | 2 -- configure.ac | 8 ++++++++ utils/Makefile.am | 14 ++++++++++++-- utils/bpf_compile.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 75 insertions(+), 4 deletions(-) create mode 100644 utils/bpf_compile.c diff --git a/Makefile.am b/Makefile.am index 6400ba4..c38d360 100644 --- a/Makefile.am +++ b/Makefile.am @@ -10,9 +10,7 @@ endif if ENABLE_LIBIPQ SUBDIRS += libipq endif -if HAVE_LIBNFNETLINK SUBDIRS += utils -endif # Depends on libxtables: SUBDIRS += extensions # Depends on extensions/libext.a: diff --git a/configure.ac b/configure.ac index 27e0b10..fe40afe 100644 --- a/configure.ac +++ b/configure.ac @@ -50,6 +50,9 @@ AC_ARG_ENABLE([devel], [enable_devel="$enableval"], [enable_devel="yes"]) AC_ARG_ENABLE([libipq], AS_HELP_STRING([--enable-libipq], [Build and install libipq])) +AC_ARG_ENABLE([pcap], + AS_HELP_STRING([--enable-pcap], [Build libpcap dependencies]), + [want_pcap="yes"]) AC_ARG_WITH([pkgconfigdir], AS_HELP_STRING([--with-pkgconfigdir=PATH], [Path to the pkgconfig directory [[LIBDIR/pkgconfig]]]), [pkgconfigdir="$withval"], [pkgconfigdir='${libdir}/pkgconfig']) @@ -93,6 +96,11 @@ PKG_CHECK_MODULES([libnfnetlink], [libnfnetlink >= 1.0], [nfnetlink=1], [nfnetlink=0]) AM_CONDITIONAL([HAVE_LIBNFNETLINK], [test "$nfnetlink" = 1]) +if test "$want_pcap" == "yes"; then +AC_CHECK_LIB(pcap, pcap_compile_nopcap, [have_libpcap="yes"]) +fi; +AM_CONDITIONAL([HAVE_LIBPCAP], [test "$have_libpcap" = "yes"]) + regular_CFLAGS="-Wall -Waggregate-return -Wmissing-declarations \ -Wmissing-prototypes -Wredundant-decls -Wshadow -Wstrict-prototypes \ -Winline -pipe"; diff --git a/utils/Makefile.am b/utils/Makefile.am index f1bbfc5..b05ff51 100644 --- a/utils/Makefile.am +++ b/utils/Makefile.am @@ -4,7 +4,17 @@ AM_CFLAGS = ${regular_CFLAGS} AM_CPPFLAGS = ${regular_CPPFLAGS} -I${top_builddir}/include \ -I${top_srcdir}/include ${libnfnetlink_CFLAGS} -sbin_PROGRAMS = nfnl_osf -pkgdata_DATA = pf.os +sbin_PROGRAMS = +pkgdata_DATA = + +if HAVE_LIBNFNETLINK +sbin_PROGRAMS += nfnl_osf +pkgdata_DATA += pf.os nfnl_osf_LDADD = -lnfnetlink +endif + +if HAVE_LIBPCAP +sbin_PROGRAMS += bpf_compile +bpf_compile_LDADD = -lpcap +endif diff --git a/utils/bpf_compile.c b/utils/bpf_compile.c new file mode 100644 index 0000000..7b3e3c0 --- /dev/null +++ b/utils/bpf_compile.c @@ -0,0 +1,55 @@ +/* + * BPF program compilation tool + * + * Generates decimal output, similar to `tcpdump -ddd ...`. + * Unlike tcpdump, will generate for any given link layer type. + * + * There is no makefile: + * compile with `gcc -Wall -o bpf2decimal bpf2decimal.c -lpcap` or similar. + * + * Written by Willem de Bruijn (willemb@google.com) + * Copyright Google, Inc. 2013 + * Licensed under the GNU General Public License version 2 (GPLv2) +*/ + +#include <pcap.h> +#include <stdio.h> + +int main(int argc, char **argv) +{ + struct bpf_program program; + struct bpf_insn *ins; + int i, dlt = DLT_RAW; + + if (argc < 2 || argc > 3) { + fprintf(stderr, "Usage: %s [link] '<program>'\n\n" + " link is a pcap linklayer type:\n" + " one of EN10MB, RAW, SLIP, ...\n\n" + "Examples: %s RAW 'tcp and greater 100'\n" + " %s EN10MB 'ip proto 47'\n'", + argv[0], argv[0], argv[0]); + return 1; + } + + if (argc == 3) { + dlt = pcap_datalink_name_to_val(argv[1]); + if (dlt == -1) { + fprintf(stderr, "Unknown datalinktype: %s\n", argv[1]); + return 1; + } + } + + if (pcap_compile_nopcap(65535, dlt, &program, argv[argc - 1], 1, + PCAP_NETMASK_UNKNOWN)) { + fprintf(stderr, "Compilation error\n"); + return 1; + } + + printf("%d\n", program.bf_len); + ins = program.bf_insns; + for (i = 0; i < program.bf_len; ++ins, ++i) + printf("%u %u %u %u\n", ins->code, ins->jt, ins->jf, ins->k); + + return 0; +} + -- 1.8.1.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] utils: bpf_compile 2013-02-18 3:44 ` [PATCH] utils: bpf_compile Willem de Bruijn @ 2013-02-20 10:38 ` Daniel Borkmann 2013-02-21 4:35 ` Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Daniel Borkmann @ 2013-02-20 10:38 UTC (permalink / raw) To: Willem de Bruijn; +Cc: pablo, netfilter-devel On 02/18/2013 04:44 AM, Willem de Bruijn wrote: > A BPF compiler to convert tcpudmp expressions to the decimal format accepted > by the libxt_bpf. [...] > --- /dev/null > +++ b/utils/bpf_compile.c > @@ -0,0 +1,55 @@ > +/* > + * BPF program compilation tool > + * > + * Generates decimal output, similar to `tcpdump -ddd ...`. > + * Unlike tcpdump, will generate for any given link layer type. > + * > + * There is no makefile: > + * compile with `gcc -Wall -o bpf2decimal bpf2decimal.c -lpcap` or similar. > + * > + * Written by Willem de Bruijn (willemb@google.com) > + * Copyright Google, Inc. 2013 > + * Licensed under the GNU General Public License version 2 (GPLv2) > +*/ > + > +#include <pcap.h> > +#include <stdio.h> > + > +int main(int argc, char **argv) > +{ > + struct bpf_program program; > + struct bpf_insn *ins; > + int i, dlt = DLT_RAW; > + > + if (argc < 2 || argc > 3) { > + fprintf(stderr, "Usage: %s [link] '<program>'\n\n" > + " link is a pcap linklayer type:\n" > + " one of EN10MB, RAW, SLIP, ...\n\n" > + "Examples: %s RAW 'tcp and greater 100'\n" > + " %s EN10MB 'ip proto 47'\n'", > + argv[0], argv[0], argv[0]); > + return 1; > + } > + > + if (argc == 3) { > + dlt = pcap_datalink_name_to_val(argv[1]); > + if (dlt == -1) { > + fprintf(stderr, "Unknown datalinktype: %s\n", argv[1]); > + return 1; > + } > + } > + > + if (pcap_compile_nopcap(65535, dlt, &program, argv[argc - 1], 1, > + PCAP_NETMASK_UNKNOWN)) { > + fprintf(stderr, "Compilation error\n"); > + return 1; > + } > + > + printf("%d\n", program.bf_len); > + ins = program.bf_insns; > + for (i = 0; i < program.bf_len; ++ins, ++i) > + printf("%u %u %u %u\n", ins->code, ins->jt, ins->jf, ins->k); Here I think you should release the internally allocated memory by adding a: pcap_freecode(&program); > + return 0; > +} > + > ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] utils: bpf_compile 2013-02-20 10:38 ` Daniel Borkmann @ 2013-02-21 4:35 ` Willem de Bruijn 2013-02-21 13:43 ` Daniel Borkmann 0 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2013-02-21 4:35 UTC (permalink / raw) To: Daniel Borkmann; +Cc: Pablo Neira Ayuso, netfilter-devel On Wed, Feb 20, 2013 at 5:38 AM, Daniel Borkmann <dborkman@redhat.com> wrote: > On 02/18/2013 04:44 AM, Willem de Bruijn wrote: >> >> A BPF compiler to convert tcpudmp expressions to the decimal format >> accepted >> by the libxt_bpf. > > > [...] > > >> --- /dev/null >> +++ b/utils/bpf_compile.c >> @@ -0,0 +1,55 @@ >> +/* >> + * BPF program compilation tool >> + * >> + * Generates decimal output, similar to `tcpdump -ddd ...`. >> + * Unlike tcpdump, will generate for any given link layer type. >> + * >> + * There is no makefile: >> + * compile with `gcc -Wall -o bpf2decimal bpf2decimal.c -lpcap` or >> similar. >> + * >> + * Written by Willem de Bruijn (willemb@google.com) >> + * Copyright Google, Inc. 2013 >> + * Licensed under the GNU General Public License version 2 (GPLv2) >> +*/ >> + >> +#include <pcap.h> >> +#include <stdio.h> >> + >> +int main(int argc, char **argv) >> +{ >> + struct bpf_program program; >> + struct bpf_insn *ins; >> + int i, dlt = DLT_RAW; >> + >> + if (argc < 2 || argc > 3) { >> + fprintf(stderr, "Usage: %s [link] '<program>'\n\n" >> + " link is a pcap linklayer >> type:\n" >> + " one of EN10MB, RAW, SLIP, >> ...\n\n" >> + "Examples: %s RAW 'tcp and greater 100'\n" >> + " %s EN10MB 'ip proto 47'\n'", >> + argv[0], argv[0], argv[0]); >> + return 1; >> + } >> + >> + if (argc == 3) { >> + dlt = pcap_datalink_name_to_val(argv[1]); >> + if (dlt == -1) { >> + fprintf(stderr, "Unknown datalinktype: %s\n", >> argv[1]); >> + return 1; >> + } >> + } >> + >> + if (pcap_compile_nopcap(65535, dlt, &program, argv[argc - 1], 1, >> + PCAP_NETMASK_UNKNOWN)) { >> + fprintf(stderr, "Compilation error\n"); >> + return 1; >> + } >> + >> + printf("%d\n", program.bf_len); >> + ins = program.bf_insns; >> + for (i = 0; i < program.bf_len; ++ins, ++i) >> + printf("%u %u %u %u\n", ins->code, ins->jt, ins->jf, >> ins->k); > > > Here I think you should release the internally allocated memory by adding a: > > pcap_freecode(&program); Thanks for catching that, Daniel. I'll hold off respinning the patch to see if there is other feedback, but will fix this in the next revision. > >> + return 0; >> +} >> + >> > ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] utils: bpf_compile 2013-02-21 4:35 ` Willem de Bruijn @ 2013-02-21 13:43 ` Daniel Borkmann 2013-03-12 15:44 ` [PATCH next] " Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Daniel Borkmann @ 2013-02-21 13:43 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Pablo Neira Ayuso, netfilter-devel On 02/21/2013 05:35 AM, Willem de Bruijn wrote: > On Wed, Feb 20, 2013 at 5:38 AM, Daniel Borkmann <dborkman@redhat.com> wrote: >> On 02/18/2013 04:44 AM, Willem de Bruijn wrote: >>> >>> A BPF compiler to convert tcpudmp expressions to the decimal format >>> accepted >>> by the libxt_bpf. >> >> [...] >> >>> --- /dev/null >>> +++ b/utils/bpf_compile.c >>> @@ -0,0 +1,55 @@ >>> +/* >>> + * BPF program compilation tool >>> + * >>> + * Generates decimal output, similar to `tcpdump -ddd ...`. >>> + * Unlike tcpdump, will generate for any given link layer type. >>> + * >>> + * There is no makefile: >>> + * compile with `gcc -Wall -o bpf2decimal bpf2decimal.c -lpcap` or >>> similar. >>> + * >>> + * Written by Willem de Bruijn (willemb@google.com) >>> + * Copyright Google, Inc. 2013 >>> + * Licensed under the GNU General Public License version 2 (GPLv2) >>> +*/ >>> + >>> +#include <pcap.h> >>> +#include <stdio.h> >>> + >>> +int main(int argc, char **argv) >>> +{ >>> + struct bpf_program program; >>> + struct bpf_insn *ins; >>> + int i, dlt = DLT_RAW; >>> + >>> + if (argc < 2 || argc > 3) { >>> + fprintf(stderr, "Usage: %s [link] '<program>'\n\n" >>> + " link is a pcap linklayer >>> type:\n" >>> + " one of EN10MB, RAW, SLIP, >>> ...\n\n" >>> + "Examples: %s RAW 'tcp and greater 100'\n" >>> + " %s EN10MB 'ip proto 47'\n'", >>> + argv[0], argv[0], argv[0]); >>> + return 1; >>> + } >>> + >>> + if (argc == 3) { >>> + dlt = pcap_datalink_name_to_val(argv[1]); >>> + if (dlt == -1) { >>> + fprintf(stderr, "Unknown datalinktype: %s\n", >>> argv[1]); >>> + return 1; >>> + } >>> + } >>> + >>> + if (pcap_compile_nopcap(65535, dlt, &program, argv[argc - 1], 1, >>> + PCAP_NETMASK_UNKNOWN)) { >>> + fprintf(stderr, "Compilation error\n"); >>> + return 1; >>> + } >>> + >>> + printf("%d\n", program.bf_len); >>> + ins = program.bf_insns; >>> + for (i = 0; i < program.bf_len; ++ins, ++i) >>> + printf("%u %u %u %u\n", ins->code, ins->jt, ins->jf, >>> ins->k); >> >> >> Here I think you should release the internally allocated memory by adding a: >> >> pcap_freecode(&program); > > Thanks for catching that, Daniel. I'll hold off respinning the patch > to see if there is other feedback, but will fix this in the next > revision. Thanks, otherwise I think the user space utility looks good. I've also just added support for this output format into bpfc (netsniff-ng Git tree), in case low-level filter devel/debugging is needed, e.g. bpfc -Di <file>. ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH next] utils: bpf_compile 2013-02-21 13:43 ` Daniel Borkmann @ 2013-03-12 15:44 ` Willem de Bruijn 2013-04-01 22:20 ` Pablo Neira Ayuso 0 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2013-03-12 15:44 UTC (permalink / raw) To: pablo, netfilter-devel, dborkman, zenczykowski; +Cc: Willem de Bruijn V2 - update build process to - succeed without bpf_compiler when --enable-bpf-compiler is not set - succeed with bpf_compiler when this flag is set and libpcap is found - fail when this flag is set but libpcap is absent - release memory allocated by pcap on exit A BPF compiler to convert tcpudmp expressions to the decimal format accepted by the libxt_bpf. Generate a file and pass that to iptables: bpf_compile RAW 'udp dst port 9000' > test.bpf iptables -A OUTPUT -m bpf --bytecode-file test.bpf -j LOG Or pass the output directly to iptables using backticks: iptables -A INPUT -m bpf --bytecode \ "`./bpf_compile RAW 'udp dst port 9000' | tr '\n' ','`" -j LOG This utility depends on libpcap. The library is only compiled if the option --enable-pcap is explicitly passed to ./configure and libpcap is found. Tested (for review: to be removed before merging): - compilation always succeeds. - utils/bpf_compile is only built when enable_pcap is true and libpcap is found - utils/nfnl_osf is (still) only built when nfnetlink library is found - execution - tested the above two expressions and verified dmesg output - logging - iptables -L INPUT " LOG all -- anywhere anywhere match bpf 48 0 0 0,84 0 0 240,21 0 4 96,48 0 0 6,21 0 13 17,40 0 0 42,21 10 11 9000,48 0 0 0,84 0 0 240,21 0 8 64,48 0 0 9,21 0 6 17,40 0 0 6,69 4 0 8191,177 0 0 0,72 0 0 2,21 0 1 9000,6 0 0 65535,6 0 0 0, LOG level warning " - iptables-save " -A INPUT -m bpf --bytecode "19,48 0 0 0,84 0 0 240,21 0 4 96,48 0 0 6,21 0 13 17,40 0 0 42,21 10 11 9000,48 0 0 0,84 0 0 240,21 0 8 64,48 0 0 9,21 0 6 17,40 0 0 6,69 4 0 8191,177 0 0 0,72 0 0 2,21 0 1 9000,6 0 0 65535,6 0 0 0," -j LOG " Signed-off-by: Willem de Bruijn <willemb@google.com> --- Makefile.am | 2 -- configure.ac | 4 ++++ utils/Makefile.am | 14 ++++++++++++-- utils/bpf_compile.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 72 insertions(+), 4 deletions(-) create mode 100644 utils/bpf_compile.c diff --git a/Makefile.am b/Makefile.am index 6400ba4..c38d360 100644 --- a/Makefile.am +++ b/Makefile.am @@ -10,9 +10,7 @@ endif if ENABLE_LIBIPQ SUBDIRS += libipq endif -if HAVE_LIBNFNETLINK SUBDIRS += utils -endif # Depends on libxtables: SUBDIRS += extensions # Depends on extensions/libext.a: diff --git a/configure.ac b/configure.ac index 27e0b10..e5cfb12 100644 --- a/configure.ac +++ b/configure.ac @@ -50,6 +50,9 @@ AC_ARG_ENABLE([devel], [enable_devel="$enableval"], [enable_devel="yes"]) AC_ARG_ENABLE([libipq], AS_HELP_STRING([--enable-libipq], [Build and install libipq])) +AC_ARG_ENABLE([bpf-compiler], + AS_HELP_STRING([--enable-bpf-compiler], [Build bpf compiler]), + [enable_bpfc="yes"]) AC_ARG_WITH([pkgconfigdir], AS_HELP_STRING([--with-pkgconfigdir=PATH], [Path to the pkgconfig directory [[LIBDIR/pkgconfig]]]), [pkgconfigdir="$withval"], [pkgconfigdir='${libdir}/pkgconfig']) @@ -88,6 +91,7 @@ AM_CONDITIONAL([ENABLE_IPV6], [test "$enable_ipv6" = "yes"]) AM_CONDITIONAL([ENABLE_LARGEFILE], [test "$enable_largefile" = "yes"]) AM_CONDITIONAL([ENABLE_DEVEL], [test "$enable_devel" = "yes"]) AM_CONDITIONAL([ENABLE_LIBIPQ], [test "$enable_libipq" = "yes"]) +AM_CONDITIONAL([ENABLE_BPFC], [test "$enable_bpfc" = "yes"]) PKG_CHECK_MODULES([libnfnetlink], [libnfnetlink >= 1.0], [nfnetlink=1], [nfnetlink=0]) diff --git a/utils/Makefile.am b/utils/Makefile.am index f1bbfc5..8b510d6 100644 --- a/utils/Makefile.am +++ b/utils/Makefile.am @@ -4,7 +4,17 @@ AM_CFLAGS = ${regular_CFLAGS} AM_CPPFLAGS = ${regular_CPPFLAGS} -I${top_builddir}/include \ -I${top_srcdir}/include ${libnfnetlink_CFLAGS} -sbin_PROGRAMS = nfnl_osf -pkgdata_DATA = pf.os +sbin_PROGRAMS = +pkgdata_DATA = + +if HAVE_LIBNFNETLINK +sbin_PROGRAMS += nfnl_osf +pkgdata_DATA += pf.os nfnl_osf_LDADD = -lnfnetlink +endif + +if ENABLE_BPFC +sbin_PROGRAMS += bpf_compile +bpf_compile_LDADD = -lpcap +endif diff --git a/utils/bpf_compile.c b/utils/bpf_compile.c new file mode 100644 index 0000000..81b59a4 --- /dev/null +++ b/utils/bpf_compile.c @@ -0,0 +1,56 @@ +/* + * BPF program compilation tool + * + * Generates decimal output, similar to `tcpdump -ddd ...`. + * Unlike tcpdump, will generate for any given link layer type. + * + * There is no makefile: + * compile with `gcc -Wall -o bpf2decimal bpf2decimal.c -lpcap` or similar. + * + * Written by Willem de Bruijn (willemb@google.com) + * Copyright Google, Inc. 2013 + * Licensed under the GNU General Public License version 2 (GPLv2) +*/ + +#include <pcap.h> +#include <stdio.h> + +int main(int argc, char **argv) +{ + struct bpf_program program; + struct bpf_insn *ins; + int i, dlt = DLT_RAW; + + if (argc < 2 || argc > 3) { + fprintf(stderr, "Usage: %s [link] '<program>'\n\n" + " link is a pcap linklayer type:\n" + " one of EN10MB, RAW, SLIP, ...\n\n" + "Examples: %s RAW 'tcp and greater 100'\n" + " %s EN10MB 'ip proto 47'\n'", + argv[0], argv[0], argv[0]); + return 1; + } + + if (argc == 3) { + dlt = pcap_datalink_name_to_val(argv[1]); + if (dlt == -1) { + fprintf(stderr, "Unknown datalinktype: %s\n", argv[1]); + return 1; + } + } + + if (pcap_compile_nopcap(65535, dlt, &program, argv[argc - 1], 1, + PCAP_NETMASK_UNKNOWN)) { + fprintf(stderr, "Compilation error\n"); + return 1; + } + + printf("%d\n", program.bf_len); + ins = program.bf_insns; + for (i = 0; i < program.bf_len; ++ins, ++i) + printf("%u %u %u %u\n", ins->code, ins->jt, ins->jf, ins->k); + + pcap_freecode(&program); + return 0; +} + -- 1.8.1.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH next] utils: bpf_compile 2013-03-12 15:44 ` [PATCH next] " Willem de Bruijn @ 2013-04-01 22:20 ` Pablo Neira Ayuso 2013-04-03 15:32 ` Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Pablo Neira Ayuso @ 2013-04-01 22:20 UTC (permalink / raw) To: Willem de Bruijn; +Cc: netfilter-devel, dborkman, zenczykowski On Tue, Mar 12, 2013 at 11:44:12AM -0400, Willem de Bruijn wrote: > V2 > - update build process to > - succeed without bpf_compiler when --enable-bpf-compiler is not set > - succeed with bpf_compiler when this flag is set and libpcap is found > - fail when this flag is set but libpcap is absent > - release memory allocated by pcap on exit Applied with changes: * renamed utility to add the `nf' prefix. * the output of the utility is now exactly what iptables -m bpf --bytecode needs, just to make it easier to everyone. Still missing a hard fail if --enable-bpf-compiler is set and libpcap is not available in the system, I'd appreciate if you send me a follow up patch to address this. You may want to have a look at how ulogd2 does this in its configure fail as reference. Thanks. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next] utils: bpf_compile 2013-04-01 22:20 ` Pablo Neira Ayuso @ 2013-04-03 15:32 ` Willem de Bruijn 2013-04-04 9:34 ` Pablo Neira Ayuso 0 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2013-04-03 15:32 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: netfilter-devel, Daniel Borkmann, Maciej Żenczykowski On Mon, Apr 1, 2013 at 6:20 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Tue, Mar 12, 2013 at 11:44:12AM -0400, Willem de Bruijn wrote: >> V2 >> - update build process to >> - succeed without bpf_compiler when --enable-bpf-compiler is not set >> - succeed with bpf_compiler when this flag is set and libpcap is found >> - fail when this flag is set but libpcap is absent >> - release memory allocated by pcap on exit > > Applied with changes: Thanks, Pablo. > * renamed utility to add the `nf' prefix. > > * the output of the utility is now exactly what iptables -m bpf --bytecode > needs, just to make it easier to everyone. > > Still missing a hard fail if --enable-bpf-compiler is set and libpcap > is not available in the system, Compilation and linking will both fail hard: if gcc -DHAVE_CONFIG_H -I. -I. -I.. -D_LARGEFILE_SOURCE=1 -D_LARGE_FILES -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DXTABLES_LIBDIR=\"/usr/local/lib/xtables\" -DXTABLES_INTERNAL -I../include -I../include -Wall -Waggregate-return -Wmissing-declarations -Wmissing-prototypes -Wredundant-decls -Wshadow -Wstrict-prototypes -Winline -pipe -g -O2 -MT nfbpf_compile.o -MD -MP -MF ".deps/nfbpf_compile.Tpo" -c -o nfbpf_compile.o nfbpf_compile.c; \ then mv -f ".deps/nfbpf_compile.Tpo" ".deps/nfbpf_compile.Po"; else rm -f ".deps/nfbpf_compile.Tpo"; exit 1; fi nfbpf_compile.c:12:18: fatal error: pcap.h: No such file or directory compilation terminated. make: *** [nfbpf_compile.o] Error 1 Do you want the configure script itself to fail before that step? > I'd appreciate if you send me a follow > up patch to address this. You may want to have a look at how ulogd2 > does this in its configure fail as reference. That skips compilation if HAVE_PCAP is false, but it does not fail hard during configure, either. At least, that's what I gather from reading and running https://git.netfilter.org/ulogd2/tree/configure.ac#n23 ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next] utils: bpf_compile 2013-04-03 15:32 ` Willem de Bruijn @ 2013-04-04 9:34 ` Pablo Neira Ayuso 0 siblings, 0 replies; 58+ messages in thread From: Pablo Neira Ayuso @ 2013-04-04 9:34 UTC (permalink / raw) To: Willem de Bruijn Cc: netfilter-devel, Daniel Borkmann, Maciej Żenczykowski On Wed, Apr 03, 2013 at 11:32:13AM -0400, Willem de Bruijn wrote: [...] > > Still missing a hard fail if --enable-bpf-compiler is set and libpcap > > is not available in the system, > > Compilation and linking will both fail hard: > > if gcc -DHAVE_CONFIG_H -I. -I. -I.. -D_LARGEFILE_SOURCE=1 > -D_LARGE_FILES -D_FILE_OFFSET_BITS=64 -D_REENTRANT > -DXTABLES_LIBDIR=\"/usr/local/lib/xtables\" -DXTABLES_INTERNAL > -I../include -I../include -Wall -Waggregate-return > -Wmissing-declarations -Wmissing-prototypes -Wredundant-decls -Wshadow > -Wstrict-prototypes -Winline -pipe -g -O2 -MT nfbpf_compile.o -MD -MP > -MF ".deps/nfbpf_compile.Tpo" -c -o nfbpf_compile.o nfbpf_compile.c; \ > then mv -f ".deps/nfbpf_compile.Tpo" ".deps/nfbpf_compile.Po"; else rm > -f ".deps/nfbpf_compile.Tpo"; exit 1; fi > nfbpf_compile.c:12:18: fatal error: pcap.h: No such file or directory > compilation terminated. > make: *** [nfbpf_compile.o] Error 1 > > Do you want the configure script itself to fail before that step? Yes. It should hard fail if --enable-bpf-compiler is set and libpcap is not available. > > I'd appreciate if you send me a follow > > up patch to address this. You may want to have a look at how ulogd2 > > does this in its configure fail as reference. > > That skips compilation if HAVE_PCAP is false, but it does not fail > hard during configure, either. At least, that's what I gather from > reading and running > > https://git.netfilter.org/ulogd2/tree/configure.ac#n23 Just pointed to that chunk of code so you can use it as base, you'll have to adapt it to achieve the behaviour we want, of course. Thanks. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-01-23 18:56 ` Pablo Neira Ayuso 2013-02-18 3:44 ` [PATCH] utils: bpf_compile Willem de Bruijn @ 2013-02-18 3:52 ` Willem de Bruijn 2013-02-24 2:15 ` Maciej Żenczykowski 1 sibling, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2013-02-18 3:52 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, Jozsef Kadlecsik On Wed, Jan 23, 2013 at 1:56 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Wed, Jan 23, 2013 at 11:38:20AM -0500, Willem de Bruijn wrote: >> On Wed, Jan 23, 2013 at 11:21 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: >> > On Wed, Jan 23 2013 at 10:59:28AM -0500, Willem de Bruijn wrote: >> >> >> b) provide a separate utility to generate the BPF filter in text-based >> >> >> format from some utility that accepts tcpdump-like syntax. The utility >> >> >> can be distributed in the utils directory and it would not be >> >> >> mandatory to compile it if libpcap is not present. >> > [...] >> >> > I would go with b) for now; we can always move to a) later on, but not >> >> > the other way around (would kill backwards compatibility). >> >> >> >> This sounds like the consensus (for the record, I also prefer this less >> >> disruptive approach). In that case, I can submit a revised libxt_bpf with your >> >> suggested changes right away, Pablo, and we can leave the separate >> >> userspace tool for a later commit. >> > >> > Either way is fine, but please we should have that utility compiler >> > integrated in the iptables tree by when 3.9-rc1 is released. >> >> Okay. I'll prepare a separate patch with the pcap-based utility, then. Just sent the patch. I'm no expert at autoconf and automake, so the build logic can conceivably be shorter, but it works for me and the logic is straightforward. I forgot to mention in the commit message which versions of the tools I used: tested on a ubuntu 12.04 with autoconf 2.68, automake 1.9.6 and libtool 2.4.2. >> Since utils is built as part of the root make invocation, I think it's >> better to test for pcap.h in the root configure.ac and add a test in >> utils/Makefile.am to build this tool if found, as opposed to creating >> a separate configure.ac under utils. We can also discuss these >> details after the first version of the patch, of course. > > That's fine by now, and it's way less bloat. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-02-18 3:52 ` [PATCH next v3] iptables: add xt_bpf match Willem de Bruijn @ 2013-02-24 2:15 ` Maciej Żenczykowski 2013-02-27 20:39 ` Willem de Bruijn 0 siblings, 1 reply; 58+ messages in thread From: Maciej Żenczykowski @ 2013-02-24 2:15 UTC (permalink / raw) To: Willem de Bruijn Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Jozsef Kadlecsik at a guess, there should be a --with/without option for it, and if --with=tool is specified build/configure should fail if support libraries are missing On Sun, Feb 17, 2013 at 7:52 PM, Willem de Bruijn <willemb@google.com> wrote: > On Wed, Jan 23, 2013 at 1:56 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: >> On Wed, Jan 23, 2013 at 11:38:20AM -0500, Willem de Bruijn wrote: >>> On Wed, Jan 23, 2013 at 11:21 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: >>> > On Wed, Jan 23 2013 at 10:59:28AM -0500, Willem de Bruijn wrote: >>> >> >> b) provide a separate utility to generate the BPF filter in text-based >>> >> >> format from some utility that accepts tcpdump-like syntax. The utility >>> >> >> can be distributed in the utils directory and it would not be >>> >> >> mandatory to compile it if libpcap is not present. >>> > [...] >>> >> > I would go with b) for now; we can always move to a) later on, but not >>> >> > the other way around (would kill backwards compatibility). >>> >> >>> >> This sounds like the consensus (for the record, I also prefer this less >>> >> disruptive approach). In that case, I can submit a revised libxt_bpf with your >>> >> suggested changes right away, Pablo, and we can leave the separate >>> >> userspace tool for a later commit. >>> > >>> > Either way is fine, but please we should have that utility compiler >>> > integrated in the iptables tree by when 3.9-rc1 is released. >>> >>> Okay. I'll prepare a separate patch with the pcap-based utility, then. > > Just sent the patch. I'm no expert at autoconf and automake, so the > build logic can conceivably be shorter, but it works for me and the > logic is straightforward. I forgot to mention in the commit message > which versions of the tools I used: tested on a ubuntu 12.04 with > autoconf 2.68, automake 1.9.6 and libtool 2.4.2. > >>> Since utils is built as part of the root make invocation, I think it's >>> better to test for pcap.h in the root configure.ac and add a test in >>> utils/Makefile.am to build this tool if found, as opposed to creating >>> a separate configure.ac under utils. We can also discuss these >>> details after the first version of the patch, of course. >> >> That's fine by now, and it's way less bloat. > -- > To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH next v3] iptables: add xt_bpf match 2013-02-24 2:15 ` Maciej Żenczykowski @ 2013-02-27 20:39 ` Willem de Bruijn 0 siblings, 0 replies; 58+ messages in thread From: Willem de Bruijn @ 2013-02-27 20:39 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Jozsef Kadlecsik On Sat, Feb 23, 2013 at 9:15 PM, Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > at a guess, there should be a --with/without option for it, and if > --with=tool is specified build/configure should fail if support > libraries are missing Agreed on the fail hard. After a brief offline discussion on --enable vs --with, will respin with optional --enable-bpf-compiler. The option is disabled by default. If it is enabled and pcap cannot be found, build fails. If no further comments, I'll respin shortly. > > On Sun, Feb 17, 2013 at 7:52 PM, Willem de Bruijn <willemb@google.com> wrote: >> On Wed, Jan 23, 2013 at 1:56 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: >>> On Wed, Jan 23, 2013 at 11:38:20AM -0500, Willem de Bruijn wrote: >>>> On Wed, Jan 23, 2013 at 11:21 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: >>>> > On Wed, Jan 23 2013 at 10:59:28AM -0500, Willem de Bruijn wrote: >>>> >> >> b) provide a separate utility to generate the BPF filter in text-based >>>> >> >> format from some utility that accepts tcpdump-like syntax. The utility >>>> >> >> can be distributed in the utils directory and it would not be >>>> >> >> mandatory to compile it if libpcap is not present. >>>> > [...] >>>> >> > I would go with b) for now; we can always move to a) later on, but not >>>> >> > the other way around (would kill backwards compatibility). >>>> >> >>>> >> This sounds like the consensus (for the record, I also prefer this less >>>> >> disruptive approach). In that case, I can submit a revised libxt_bpf with your >>>> >> suggested changes right away, Pablo, and we can leave the separate >>>> >> userspace tool for a later commit. >>>> > >>>> > Either way is fine, but please we should have that utility compiler >>>> > integrated in the iptables tree by when 3.9-rc1 is released. >>>> >>>> Okay. I'll prepare a separate patch with the pcap-based utility, then. >> >> Just sent the patch. I'm no expert at autoconf and automake, so the >> build logic can conceivably be shorter, but it works for me and the >> logic is straightforward. I forgot to mention in the commit message >> which versions of the tools I used: tested on a ubuntu 12.04 with >> autoconf 2.68, automake 1.9.6 and libtool 2.4.2. >> >>>> Since utils is built as part of the root make invocation, I think it's >>>> better to test for pcap.h in the root configure.ac and add a test in >>>> utils/Makefile.am to build this tool if found, as opposed to creating >>>> a separate configure.ac under utils. We can also discuss these >>>> details after the first version of the patch, of course. >>> >>> That's fine by now, and it's way less bloat. >> -- >> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches 2012-12-05 19:22 [PATCH rfc] netfilter: two xtables matches Willem de Bruijn 2012-12-05 19:22 ` [PATCH 1/2] netfilter: add xt_priority xtables match Willem de Bruijn 2012-12-05 19:22 ` [PATCH 2/2] netfilter: add xt_bpf " Willem de Bruijn @ 2012-12-05 19:28 ` Willem de Bruijn 2012-12-05 20:00 ` Jan Engelhardt 2 siblings, 1 reply; 58+ messages in thread From: Willem de Bruijn @ 2012-12-05 19:28 UTC (permalink / raw) To: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber, pablo Somehow, the first part of this email went missing. Not critical, but for completeness: These two patches each add an xtables match. The xt_priority match is a straighforward addition in the style of xt_mark, adding the option to filter on one more sk_buff field. I have an immediate application for this. The amount of code (in kernel + userspace) to add a single check proved quite large. On Wed, Dec 5, 2012 at 2:22 PM, Willem de Bruijn <willemb@google.com> wrote: > The second patch is more speculative and aims to be a more general > workaround, as well as a performance optimization: support > (preferably JIT compiled) BPF programs as iptables match rules. > > Potentially, the skb->priority match can be implemented by applying > only the second patch and adding a new BPF_S_ANC ancillary field to > Linux Socket Filters. > > I also wrote corresponding userspace patches to iptables. The process > for submitting both kernel and user patches is not 100% clear to me. > Sending the kernel bits to both netdev and netfilter-devel for > initial feedback. Please correct me if you want it another way. > > The patches apply to net-next. > ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches 2012-12-05 19:28 ` [PATCH rfc] netfilter: two xtables matches Willem de Bruijn @ 2012-12-05 20:00 ` Jan Engelhardt 2012-12-05 21:45 ` Willem de Bruijn 2012-12-06 5:22 ` Pablo Neira Ayuso 0 siblings, 2 replies; 58+ messages in thread From: Jan Engelhardt @ 2012-12-05 20:00 UTC (permalink / raw) To: Willem de Bruijn Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber, pablo On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote: >Somehow, the first part of this email went missing. Not critical, >but for completeness: > >These two patches each add an xtables match. > >The xt_priority match is a straighforward addition in the style of >xt_mark, adding the option to filter on one more sk_buff field. I >have an immediate application for this. The amount of code (in >kernel + userspace) to add a single check proved quite large. Hm so yeah, can't we just place this in xt_mark.c? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches 2012-12-05 20:00 ` Jan Engelhardt @ 2012-12-05 21:45 ` Willem de Bruijn 2012-12-05 21:50 ` Willem de Bruijn 2012-12-05 22:35 ` Jan Engelhardt 2012-12-06 5:22 ` Pablo Neira Ayuso 1 sibling, 2 replies; 58+ messages in thread From: Willem de Bruijn @ 2012-12-05 21:45 UTC (permalink / raw) To: Jan Engelhardt Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, Patrick McHardy, pablo On Wed, Dec 5, 2012 at 3:00 PM, Jan Engelhardt <jengelh@inai.de> wrote: > On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote: > >>Somehow, the first part of this email went missing. Not critical, >>but for completeness: >> >>These two patches each add an xtables match. >> >>The xt_priority match is a straighforward addition in the style of >>xt_mark, adding the option to filter on one more sk_buff field. I >>have an immediate application for this. The amount of code (in >>kernel + userspace) to add a single check proved quite large. > > Hm so yeah, can't we just place this in xt_mark.c? I'm happy to do so, but note that that breaks the custom of having one static struct xt_$NAME for each file xt_$NAME.[ch]. It may be reasonable, as the same issue may keep popping up as additional sk_buff fields are found useful for filtering. For instance, skb->queue_mapping could be used in conjuction with network flow classification (ethtool -N). All the ancillary data accessible from BPF likely has some use and could be ported to iptables (rxhash, pkt_type, ...). To avoid rule explosion, I considered an xt_skbuff match rule that applies the same mask operation, range and inversion tests, and takes a field id to select the sk_buff field to operate on. I think the BPF patch is a better long term solution. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches 2012-12-05 21:45 ` Willem de Bruijn @ 2012-12-05 21:50 ` Willem de Bruijn 2012-12-05 22:35 ` Jan Engelhardt 1 sibling, 0 replies; 58+ messages in thread From: Willem de Bruijn @ 2012-12-05 21:50 UTC (permalink / raw) To: Jan Engelhardt Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, Patrick McHardy, Pablo Neira Ayuso On Wed, Dec 5, 2012 at 4:45 PM, Willem de Bruijn <willemb@google.com> wrote: > On Wed, Dec 5, 2012 at 3:00 PM, Jan Engelhardt <jengelh@inai.de> wrote: >> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote: >> >>>Somehow, the first part of this email went missing. Not critical, >>>but for completeness: >>> >>>These two patches each add an xtables match. >>> >>>The xt_priority match is a straighforward addition in the style of >>>xt_mark, adding the option to filter on one more sk_buff field. I >>>have an immediate application for this. The amount of code (in >>>kernel + userspace) to add a single check proved quite large. >> >> Hm so yeah, can't we just place this in xt_mark.c? > > I'm happy to do so, but note that that breaks the custom of > having one static struct xt_$NAME for each file xt_$NAME.[ch]. > > It may be reasonable, as the same issue may keep popping up > as additional sk_buff fields are found useful for filtering. For > instance, skb->queue_mapping could be used in conjuction with > network flow classification (ethtool -N). bad example: queue_mapping is tx only. I thought of rxqueues. > All the ancillary data > accessible from BPF likely has some use and could be ported > to iptables (rxhash, pkt_type, ...). > > To avoid rule explosion, I considered an xt_skbuff match rule that > applies the same mask operation, range and inversion tests, and > takes a field id to select the sk_buff field to operate on. I think > the BPF patch is a better long term solution. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches 2012-12-05 21:45 ` Willem de Bruijn 2012-12-05 21:50 ` Willem de Bruijn @ 2012-12-05 22:35 ` Jan Engelhardt 1 sibling, 0 replies; 58+ messages in thread From: Jan Engelhardt @ 2012-12-05 22:35 UTC (permalink / raw) To: Willem de Bruijn Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, Patrick McHardy, pablo On Wednesday 2012-12-05 22:45, Willem de Bruijn wrote: >>>The xt_priority match is a straighforward addition in the style of >>>xt_mark, adding the option to filter on one more sk_buff field. I >>>have an immediate application for this. The amount of code (in >>>kernel + userspace) to add a single check proved quite large. >> >> Hm so yeah, can't we just place this in xt_mark.c? > >I'm happy to do so, but note that that breaks the custom of >having one static struct xt_$NAME for each file xt_$NAME.[ch]. The custom is long gone (just look at xt_mark.c ;-), because the module overhead is so much more than a function with an assignment/readout. >To avoid rule explosion, I considered an xt_skbuff match rule that >applies the same mask operation, range and inversion tests, and >takes a field id to select the sk_buff field to operate on. I think >the BPF patch is a better long term solution. I can't disagree. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches 2012-12-05 20:00 ` Jan Engelhardt 2012-12-05 21:45 ` Willem de Bruijn @ 2012-12-06 5:22 ` Pablo Neira Ayuso 2012-12-06 21:12 ` Willem de Bruijn 1 sibling, 1 reply; 58+ messages in thread From: Pablo Neira Ayuso @ 2012-12-06 5:22 UTC (permalink / raw) To: Jan Engelhardt Cc: Willem de Bruijn, netfilter-devel, netdev, Eric Dumazet, David Miller, kaber On Wed, Dec 05, 2012 at 09:00:36PM +0100, Jan Engelhardt wrote: > On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote: > > >Somehow, the first part of this email went missing. Not critical, > >but for completeness: > > > >These two patches each add an xtables match. > > > >The xt_priority match is a straighforward addition in the style of > >xt_mark, adding the option to filter on one more sk_buff field. I > >have an immediate application for this. The amount of code (in > >kernel + userspace) to add a single check proved quite large. > > Hm so yeah, can't we just place this in xt_mark.c? I don't feel this belongs to xt_mark at all. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches 2012-12-06 5:22 ` Pablo Neira Ayuso @ 2012-12-06 21:12 ` Willem de Bruijn 2012-12-07 7:22 ` Pablo Neira Ayuso 2012-12-07 13:20 ` Pablo Neira Ayuso 0 siblings, 2 replies; 58+ messages in thread From: Willem de Bruijn @ 2012-12-06 21:12 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: Jan Engelhardt, netfilter-devel, netdev, Eric Dumazet, David Miller, Patrick McHardy On Thu, Dec 6, 2012 at 12:22 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Wed, Dec 05, 2012 at 09:00:36PM +0100, Jan Engelhardt wrote: >> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote: >> >> >Somehow, the first part of this email went missing. Not critical, >> >but for completeness: >> > >> >These two patches each add an xtables match. >> > >> >The xt_priority match is a straighforward addition in the style of >> >xt_mark, adding the option to filter on one more sk_buff field. I >> >have an immediate application for this. The amount of code (in >> >kernel + userspace) to add a single check proved quite large. >> >> Hm so yeah, can't we just place this in xt_mark.c? > > I don't feel this belongs to xt_mark at all. Do you have other concerns, or can I resubmit as is for merging in a few days if no one raises additional issues? For this and netfilter changes in general: should these patches be against git://1984.lsi.us.es/nf-next instead of net-next? This patch likely applies cleanly there, but I haven't tried yet. Thanks. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches 2012-12-06 21:12 ` Willem de Bruijn @ 2012-12-07 7:22 ` Pablo Neira Ayuso 2012-12-07 13:20 ` Pablo Neira Ayuso 1 sibling, 0 replies; 58+ messages in thread From: Pablo Neira Ayuso @ 2012-12-07 7:22 UTC (permalink / raw) To: Willem de Bruijn Cc: Jan Engelhardt, netfilter-devel, netdev, Eric Dumazet, David Miller, Patrick McHardy On Thu, Dec 06, 2012 at 04:12:10PM -0500, Willem de Bruijn wrote: > On Thu, Dec 6, 2012 at 12:22 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > On Wed, Dec 05, 2012 at 09:00:36PM +0100, Jan Engelhardt wrote: > >> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote: > >> > >> >Somehow, the first part of this email went missing. Not critical, > >> >but for completeness: > >> > > >> >These two patches each add an xtables match. > >> > > >> >The xt_priority match is a straighforward addition in the style of > >> >xt_mark, adding the option to filter on one more sk_buff field. I > >> >have an immediate application for this. The amount of code (in > >> >kernel + userspace) to add a single check proved quite large. > >> > >> Hm so yeah, can't we just place this in xt_mark.c? > > > > I don't feel this belongs to xt_mark at all. > > Do you have other concerns, or can I resubmit as is for merging in a > few days if no one raises additional issues? > > For this and netfilter changes in general: should these patches be > against git://1984.lsi.us.es/nf-next instead of net-next? This patch > likely applies cleanly there, but I haven't tried yet. Thanks. Please, against nf-next since this has to go throuh the netfilter tree. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches 2012-12-06 21:12 ` Willem de Bruijn 2012-12-07 7:22 ` Pablo Neira Ayuso @ 2012-12-07 13:20 ` Pablo Neira Ayuso 2012-12-07 17:26 ` Willem de Bruijn 1 sibling, 1 reply; 58+ messages in thread From: Pablo Neira Ayuso @ 2012-12-07 13:20 UTC (permalink / raw) To: Willem de Bruijn Cc: Jan Engelhardt, netfilter-devel, netdev, Eric Dumazet, David Miller, Patrick McHardy On Thu, Dec 06, 2012 at 04:12:10PM -0500, Willem de Bruijn wrote: > On Thu, Dec 6, 2012 at 12:22 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > On Wed, Dec 05, 2012 at 09:00:36PM +0100, Jan Engelhardt wrote: > >> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote: > >> > >> >Somehow, the first part of this email went missing. Not critical, > >> >but for completeness: > >> > > >> >These two patches each add an xtables match. > >> > > >> >The xt_priority match is a straighforward addition in the style of > >> >xt_mark, adding the option to filter on one more sk_buff field. I > >> >have an immediate application for this. The amount of code (in > >> >kernel + userspace) to add a single check proved quite large. > >> > >> Hm so yeah, can't we just place this in xt_mark.c? > > > > I don't feel this belongs to xt_mark at all. > > Do you have other concerns, or can I resubmit as is for merging in a > few days if no one raises additional issues? In nftables we have the 'meta' extension that allows to match all skbuff fields (among other things): http://1984.lsi.us.es/git/nf-next/tree/net/netfilter/nft_meta.c?h=nf_tables8 I think it's the way to go so we stop adding small matches for each skbuff field. I don't mind the name if it's xt_skbuff or xt_meta. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches 2012-12-07 13:20 ` Pablo Neira Ayuso @ 2012-12-07 17:26 ` Willem de Bruijn 0 siblings, 0 replies; 58+ messages in thread From: Willem de Bruijn @ 2012-12-07 17:26 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: Jan Engelhardt, netfilter-devel, netdev, Eric Dumazet, David Miller, Patrick McHardy On Fri, Dec 7, 2012 at 8:20 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Thu, Dec 06, 2012 at 04:12:10PM -0500, Willem de Bruijn wrote: >> On Thu, Dec 6, 2012 at 12:22 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote: >> > On Wed, Dec 05, 2012 at 09:00:36PM +0100, Jan Engelhardt wrote: >> >> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote: >> >> >> >> >Somehow, the first part of this email went missing. Not critical, >> >> >but for completeness: >> >> > >> >> >These two patches each add an xtables match. >> >> > >> >> >The xt_priority match is a straighforward addition in the style of >> >> >xt_mark, adding the option to filter on one more sk_buff field. I >> >> >have an immediate application for this. The amount of code (in >> >> >kernel + userspace) to add a single check proved quite large. >> >> >> >> Hm so yeah, can't we just place this in xt_mark.c? >> > >> > I don't feel this belongs to xt_mark at all. >> >> Do you have other concerns, or can I resubmit as is for merging in a >> few days if no one raises additional issues? > > In nftables we have the 'meta' extension that allows to match all > skbuff fields (among other things): > > http://1984.lsi.us.es/git/nf-next/tree/net/netfilter/nft_meta.c?h=nf_tables8 > > I think it's the way to go so we stop adding small matches for each > skbuff field. > > I don't mind the name if it's xt_skbuff or xt_meta. Okay. I'll respin right now with one more field to select the skb field to match on, as a patch against tree nf-next, and will send that to netfilter-devel. ^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2013-04-04 9:34 UTC | newest] Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-12-05 19:22 [PATCH rfc] netfilter: two xtables matches Willem de Bruijn 2012-12-05 19:22 ` [PATCH 1/2] netfilter: add xt_priority xtables match Willem de Bruijn 2012-12-08 0:04 ` [PATCH] [RFC] netfilter: add xt_skbuff " Willem de Bruijn 2012-12-08 3:23 ` Pablo Neira Ayuso 2012-12-09 20:24 ` Willem de Bruijn 2012-12-09 20:28 ` [PATCH] " Willem de Bruijn 2012-12-05 19:22 ` [PATCH 2/2] netfilter: add xt_bpf " Willem de Bruijn 2012-12-05 19:48 ` Pablo Neira Ayuso 2012-12-05 20:10 ` Willem de Bruijn 2012-12-07 13:16 ` Pablo Neira Ayuso 2012-12-07 16:56 ` Willem de Bruijn 2012-12-08 3:31 ` Pablo Neira Ayuso 2012-12-08 16:02 ` Daniel Borkmann 2012-12-09 21:52 ` [PATCH next] iptables: add xt_bpf match Willem de Bruijn 2013-01-08 3:21 ` Pablo Neira Ayuso 2013-01-09 1:58 ` Willem de Bruijn 2013-01-09 9:52 ` Pablo Neira Ayuso 2013-01-10 0:08 ` Willem de Bruijn 2013-01-10 0:08 ` [PATCH next v2] " Willem de Bruijn 2013-01-10 0:15 ` [PATCH next v3] " Willem de Bruijn 2013-01-17 23:53 ` Pablo Neira Ayuso 2013-01-18 16:48 ` Willem de Bruijn 2013-01-18 17:17 ` [PATCH next] " Willem de Bruijn 2013-01-21 11:28 ` Pablo Neira Ayuso 2013-01-21 11:33 ` Pablo Neira Ayuso 2013-01-21 11:42 ` Florian Westphal 2013-01-21 12:03 ` Pablo Neira Ayuso 2013-01-21 16:02 ` Willem de Bruijn 2013-01-21 13:44 ` [PATCH next v3] " Pablo Neira Ayuso 2013-01-22 8:46 ` Florian Westphal 2013-01-22 9:46 ` Jozsef Kadlecsik 2013-01-22 10:03 ` Maciej Żenczykowski 2013-01-22 11:11 ` Pablo Neira Ayuso 2013-01-23 15:59 ` Willem de Bruijn 2013-01-23 16:21 ` Pablo Neira Ayuso 2013-01-23 16:38 ` Willem de Bruijn 2013-01-23 18:56 ` Pablo Neira Ayuso 2013-02-18 3:44 ` [PATCH] utils: bpf_compile Willem de Bruijn 2013-02-20 10:38 ` Daniel Borkmann 2013-02-21 4:35 ` Willem de Bruijn 2013-02-21 13:43 ` Daniel Borkmann 2013-03-12 15:44 ` [PATCH next] " Willem de Bruijn 2013-04-01 22:20 ` Pablo Neira Ayuso 2013-04-03 15:32 ` Willem de Bruijn 2013-04-04 9:34 ` Pablo Neira Ayuso 2013-02-18 3:52 ` [PATCH next v3] iptables: add xt_bpf match Willem de Bruijn 2013-02-24 2:15 ` Maciej Żenczykowski 2013-02-27 20:39 ` Willem de Bruijn 2012-12-05 19:28 ` [PATCH rfc] netfilter: two xtables matches Willem de Bruijn 2012-12-05 20:00 ` Jan Engelhardt 2012-12-05 21:45 ` Willem de Bruijn 2012-12-05 21:50 ` Willem de Bruijn 2012-12-05 22:35 ` Jan Engelhardt 2012-12-06 5:22 ` Pablo Neira Ayuso 2012-12-06 21:12 ` Willem de Bruijn 2012-12-07 7:22 ` Pablo Neira Ayuso 2012-12-07 13:20 ` Pablo Neira Ayuso 2012-12-07 17:26 ` Willem de Bruijn
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.