* [PATCH nf-next] netfilter: xtables: lightweight process control group matching @ 2013-10-04 18:20 Daniel Borkmann 2013-10-07 3:07 ` Gao feng ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Daniel Borkmann @ 2013-10-04 18:20 UTC (permalink / raw) To: pablo; +Cc: netfilter-devel, netdev, Tejun Heo, cgroups It would be useful e.g. in a server or desktop environment to have a facility in the notion of fine-grained "per application" or "per application group" firewall policies. Probably, users in the mobile/ embedded area (e.g. Android based) with different security policy requirements for application groups could have great benefit from that as well. For example, with a little bit of configuration effort, an admin could whitelist well-known applications, and thus block otherwise unwanted "hard-to-track" applications like [1] from a user's machine. Implementation of PID-based matching would not be appropriate as they frequently change, and child tracking would make that even more complex and ugly. Cgroups would be a perfect candidate for accomplishing that as they associate a set of tasks with a set of parameters for one or more subsystems, in our case the netfilter subsystem, which, of course, can be combined with other cgroup subsystems into something more complex. As mentioned, to overcome this constraint, such processes could be placed into one or multiple cgroups where different fine-grained rules can be defined depending on the application scenario, while e.g. everything else that is not part of that could be dropped (or vice versa), thus making life harder for unwanted processes to communicate to the outside world. So, we make use of cgroups here to track jobs and limit their resources in terms of iptables policies; in other words, limiting what they are allowed to communicate. We have similar cgroup facilities in networking for traffic classifier, and netprio cgroups. 
This feature adds a lightweight cgroup id matching in terms of network
security resp. network traffic isolation as part of netfilter's xtables
subsystem.

Minimal, basic usage example (many other iptables options can be applied
obviously):

 1) Configuring cgroups:

  mkdir /sys/fs/cgroup/net_filter
  mount -t cgroup -o net_filter net_filter /sys/fs/cgroup/net_filter
  mkdir /sys/fs/cgroup/net_filter/0
  echo 1 > /sys/fs/cgroup/net_filter/0/net_filter.fwid

 2) Configuring netfilter:

  iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP

 3) Running applications:

  ping 208.67.222.222  <pid:1799>

  echo 1799 > /sys/fs/cgroup/net_filter/0/tasks
  64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
  ...

  ping 208.67.220.220  <pid:1804>
  ping: sendmsg: Operation not permitted
  ...
  echo 1804 > /sys/fs/cgroup/net_filter/0/tasks
  64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
  ...

Of course, real-world deployments would make use of cgroups user space
toolsuite, or custom daemons dynamically moving applications from/to
net_filter cgroups.
[1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: cgroups@vger.kernel.org --- Documentation/cgroups/00-INDEX | 2 + Documentation/cgroups/net_filter.txt | 27 +++++ include/linux/cgroup_subsys.h | 5 + include/net/netfilter/xt_cgroup.h | 58 ++++++++++ include/net/sock.h | 3 + include/uapi/linux/netfilter/Kbuild | 1 + include/uapi/linux/netfilter/xt_cgroup.h | 11 ++ net/core/scm.c | 2 + net/core/sock.c | 14 +++ net/netfilter/Kconfig | 8 ++ net/netfilter/Makefile | 1 + net/netfilter/xt_cgroup.c | 182 +++++++++++++++++++++++++++++++ 12 files changed, 314 insertions(+) create mode 100644 Documentation/cgroups/net_filter.txt create mode 100644 include/net/netfilter/xt_cgroup.h create mode 100644 include/uapi/linux/netfilter/xt_cgroup.h create mode 100644 net/netfilter/xt_cgroup.c diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroups/00-INDEX index bc461b6..14424d2 100644 --- a/Documentation/cgroups/00-INDEX +++ b/Documentation/cgroups/00-INDEX @@ -20,6 +20,8 @@ memory.txt - Memory Resource Controller; design, accounting, interface, testing. net_cls.txt - Network classifier cgroups details and usages. +net_filter.txt + - Network firewalling (netfilter) cgroups details and usages. net_prio.txt - Network priority cgroups details and usages. resource_counter.txt diff --git a/Documentation/cgroups/net_filter.txt b/Documentation/cgroups/net_filter.txt new file mode 100644 index 0000000..0e21822 --- /dev/null +++ b/Documentation/cgroups/net_filter.txt @@ -0,0 +1,27 @@ +Netfilter cgroup +---------------- + +The netfilter cgroup provides an interface to aggregate jobs +to a particular netfilter tag, that can be used to apply +various iptables/netfilter policies for those jobs in order +to limit resources/abilities for network communication. + +Creating a net_filter cgroups instance creates a net_filter.fwid +file. 
The value of net_filter.fwid is initialized to 0 on +default (so only global iptables/netfilter policies apply). +You can write a unique decimal fwid tag into net_filter.fwid +file, and use that tag along with iptables' --cgroup option. + +Minimal/basic usage example: + +1) Configuring cgroup: + + mkdir /sys/fs/cgroup/net_filter + mount -t cgroup -o net_filter net_filter /sys/fs/cgroup/net_filter + mkdir /sys/fs/cgroup/net_filter/0 + echo 1 > /sys/fs/cgroup/net_filter/0/net_filter.fwid + echo [pid] > /sys/fs/cgroup/net_filter/0/tasks + +2) Configuring netfilter: + + iptables -A OUTPUT -m cgroup ! --cgroup 1 -p tcp --dport 80 -j DROP diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h index b613ffd..ef58217 100644 --- a/include/linux/cgroup_subsys.h +++ b/include/linux/cgroup_subsys.h @@ -50,6 +50,11 @@ SUBSYS(net_prio) #if IS_SUBSYS_ENABLED(CONFIG_CGROUP_HUGETLB) SUBSYS(hugetlb) #endif + +#if IS_SUBSYS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP) +SUBSYS(net_filter) +#endif + /* * DO NOT ADD ANY SUBSYSTEM WITHOUT EXPLICIT ACKS FROM CGROUP MAINTAINERS. 
*/ diff --git a/include/net/netfilter/xt_cgroup.h b/include/net/netfilter/xt_cgroup.h new file mode 100644 index 0000000..b2c702f --- /dev/null +++ b/include/net/netfilter/xt_cgroup.h @@ -0,0 +1,58 @@ +#ifndef _XT_CGROUP_H +#define _XT_CGROUP_H + +#include <linux/types.h> +#include <linux/cgroup.h> +#include <linux/hardirq.h> +#include <linux/rcupdate.h> + +#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP) +struct cgroup_nf_state { + struct cgroup_subsys_state css; + u32 fwid; +}; + +void sock_update_fwid(struct sock *sk); + +#if IS_BUILTIN(CONFIG_NETFILTER_XT_MATCH_CGROUP) +static inline u32 task_fwid(struct task_struct *p) +{ + u32 fwid; + + if (in_interrupt()) + return 0; + + rcu_read_lock(); + fwid = container_of(task_css(p, net_filter_subsys_id), + struct cgroup_nf_state, css)->fwid; + rcu_read_unlock(); + + return fwid; +} +#elif IS_MODULE(CONFIG_NETFILTER_XT_MATCH_CGROUP) +static inline u32 task_fwid(struct task_struct *p) +{ + struct cgroup_subsys_state *css; + u32 fwid = 0; + + if (in_interrupt()) + return 0; + + rcu_read_lock(); + css = task_css(p, net_filter_subsys_id); + if (css) + fwid = container_of(css, struct cgroup_nf_state, css)->fwid; + rcu_read_unlock(); + + return fwid; +} +#endif +#else /* !CONFIG_NETFILTER_XT_MATCH_CGROUP */ +static inline u32 task_fwid(struct task_struct *p) +{ + return 0; +} + +#define sock_update_fwid(sk) +#endif /* CONFIG_NETFILTER_XT_MATCH_CGROUP */ +#endif /* _XT_CGROUP_H */ diff --git a/include/net/sock.h b/include/net/sock.h index e3bf213..f7da4b4 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -387,6 +387,9 @@ struct sock { #if IS_ENABLED(CONFIG_NETPRIO_CGROUP) __u32 sk_cgrp_prioidx; #endif +#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP) + __u32 sk_cgrp_fwid; +#endif struct pid *sk_peer_pid; const struct cred *sk_peer_cred; long sk_rcvtimeo; diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild index 1749154..94a4890 100644 --- a/include/uapi/linux/netfilter/Kbuild +++ 
b/include/uapi/linux/netfilter/Kbuild @@ -37,6 +37,7 @@ header-y += xt_TEE.h header-y += xt_TPROXY.h header-y += xt_addrtype.h header-y += xt_bpf.h +header-y += xt_cgroup.h header-y += xt_cluster.h header-y += xt_comment.h header-y += xt_connbytes.h diff --git a/include/uapi/linux/netfilter/xt_cgroup.h b/include/uapi/linux/netfilter/xt_cgroup.h new file mode 100644 index 0000000..43acb7e --- /dev/null +++ b/include/uapi/linux/netfilter/xt_cgroup.h @@ -0,0 +1,11 @@ +#ifndef _UAPI_XT_CGROUP_H +#define _UAPI_XT_CGROUP_H + +#include <linux/types.h> + +struct xt_cgroup_info { + __u32 id; + __u32 invert; +}; + +#endif /* _UAPI_XT_CGROUP_H */ diff --git a/net/core/scm.c b/net/core/scm.c index b442e7e..f08672a 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -36,6 +36,7 @@ #include <net/sock.h> #include <net/compat.h> #include <net/scm.h> +#include <net/netfilter/xt_cgroup.h> #include <net/cls_cgroup.h> @@ -290,6 +291,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm) /* Bump the usage count and install the file. 
*/ sock = sock_from_file(fp[i], &err); if (sock) { + sock_update_fwid(sock->sk); sock_update_netprioidx(sock->sk); sock_update_classid(sock->sk); } diff --git a/net/core/sock.c b/net/core/sock.c index 2bd9b3f..524a376 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -125,6 +125,7 @@ #include <linux/skbuff.h> #include <net/net_namespace.h> #include <net/request_sock.h> +#include <net/netfilter/xt_cgroup.h> #include <net/sock.h> #include <linux/net_tstamp.h> #include <net/xfrm.h> @@ -1337,6 +1338,18 @@ void sock_update_netprioidx(struct sock *sk) EXPORT_SYMBOL_GPL(sock_update_netprioidx); #endif +#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP) +void sock_update_fwid(struct sock *sk) +{ + u32 fwid; + + fwid = task_fwid(current); + if (fwid != sk->sk_cgrp_fwid) + sk->sk_cgrp_fwid = fwid; +} +EXPORT_SYMBOL(sock_update_fwid); +#endif + /** * sk_alloc - All socket objects are allocated here * @net: the applicable net namespace @@ -1363,6 +1376,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority, sock_update_classid(sk); sock_update_netprioidx(sk); + sock_update_fwid(sk); } return sk; diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index 6e839b6..d276ff4 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -806,6 +806,14 @@ config NETFILTER_XT_MATCH_BPF To compile it as a module, choose M here. If unsure, say N. +config NETFILTER_XT_MATCH_CGROUP + tristate '"control group" match support' + depends on NETFILTER_ADVANCED + depends on CGROUPS + ---help--- + Socket/process control group matching allows you to match locally + generated packets based on which control group processes belong to. 
+ config NETFILTER_XT_MATCH_CLUSTER tristate '"cluster" match support' depends on NF_CONNTRACK diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index c3a0a12..12f014f 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -124,6 +124,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o obj-$(CONFIG_NETFILTER_XT_MATCH_NFACCT) += xt_nfacct.o obj-$(CONFIG_NETFILTER_XT_MATCH_OSF) += xt_osf.o obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o +obj-$(CONFIG_NETFILTER_XT_MATCH_CGROUP) += xt_cgroup.o obj-$(CONFIG_NETFILTER_XT_MATCH_PHYSDEV) += xt_physdev.o obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c new file mode 100644 index 0000000..86be16d --- /dev/null +++ b/net/netfilter/xt_cgroup.c @@ -0,0 +1,182 @@ +/* + * Xtables module to match the process control group. + * + * Might be used to implement individual "per-application" firewall + * policies (in contrast to global policies) based on control groups. + * + * (C) 2013 Daniel Borkmann <dborkman@redhat.com> + * (C) 2013 Thomas Graf <tgraf@redhat.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ + +#include <linux/skbuff.h> +#include <linux/module.h> +#include <linux/file.h> +#include <linux/cgroup.h> +#include <linux/fdtable.h> +#include <linux/netfilter/x_tables.h> +#include <linux/netfilter/xt_cgroup.h> +#include <net/netfilter/xt_cgroup.h> +#include <net/sock.h> + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>"); +MODULE_DESCRIPTION("Xtables: process control group matching"); +MODULE_ALIAS("ipt_cgroup"); +MODULE_ALIAS("ip6t_cgroup"); + +static int cgroup_mt_check(const struct xt_mtchk_param *par) +{ + struct xt_cgroup_info *info = par->matchinfo; + + if (info->invert & ~1) + return -EINVAL; + + return info->id ? 0 : -EINVAL; +} + +static bool +cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par) +{ + const struct xt_cgroup_info *info = par->matchinfo; + + if (skb->sk == NULL) + return false; + + return (info->id == skb->sk->sk_cgrp_fwid) ^ info->invert; +} + +static struct xt_match cgroup_mt_reg __read_mostly = { + .name = "cgroup", + .revision = 0, + .family = NFPROTO_UNSPEC, + .checkentry = cgroup_mt_check, + .match = cgroup_mt, + .matchsize = sizeof(struct xt_cgroup_info), + .me = THIS_MODULE, + .hooks = (1 << NF_INET_LOCAL_OUT) | + (1 << NF_INET_POST_ROUTING), +}; + +static inline struct cgroup_nf_state * +css_nf_state(struct cgroup_subsys_state *css) +{ + return css ? 
container_of(css, struct cgroup_nf_state, css) : NULL; +} + +static inline struct cgroup_nf_state *task_nf_state(struct task_struct *p) +{ + return css_nf_state(task_css(p, net_filter_subsys_id)); +} + +static struct cgroup_subsys_state * +cgroup_css_alloc(struct cgroup_subsys_state *parent_css) +{ + struct cgroup_nf_state *cs; + + cs = kzalloc(sizeof(*cs), GFP_KERNEL); + if (!cs) + return ERR_PTR(-ENOMEM); + + return &cs->css; +} + +static int cgroup_css_online(struct cgroup_subsys_state *css) +{ + struct cgroup_nf_state *cs = css_nf_state(css); + struct cgroup_nf_state *parent = css_nf_state(css_parent(css)); + + if (parent) + cs->fwid = parent->fwid; + + return 0; +} + +static void cgroup_css_free(struct cgroup_subsys_state *css) +{ + kfree(css_nf_state(css)); +} + +static int cgroup_fwid_update(const void *v, struct file *file, unsigned n) +{ + int err; + struct socket *sock = sock_from_file(file, &err); + + if (sock) + sock->sk->sk_cgrp_fwid = (u32)(unsigned long) v; + + return 0; +} + +static u64 cgroup_fwid_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + return css_nf_state(css)->fwid; +} + +static int cgroup_fwid_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 id) +{ + css_nf_state(css)->fwid = (u32) id; + + return 0; +} + +static void cgroup_attach(struct cgroup_subsys_state *css, + struct cgroup_taskset *tset) +{ + struct task_struct *p; + void *v; + + cgroup_taskset_for_each(p, css, tset) { + task_lock(p); + v = (void *)(unsigned long) task_fwid(p); + iterate_fd(p->files, 0, cgroup_fwid_update, v); + task_unlock(p); + } +} + +static struct cftype net_filter_ss_files[] = { + { + .name = "fwid", + .read_u64 = cgroup_fwid_read, + .write_u64 = cgroup_fwid_write, + }, + { } +}; + +struct cgroup_subsys net_filter_subsys = { + .name = "net_filter", + .css_alloc = cgroup_css_alloc, + .css_online = cgroup_css_online, + .css_free = cgroup_css_free, + .attach = cgroup_attach, + .subsys_id = net_filter_subsys_id, + .base_cftypes = 
net_filter_ss_files, + .module = THIS_MODULE, +}; + +static int __init cgroup_mt_init(void) +{ + int ret = cgroup_load_subsys(&net_filter_subsys); + if (ret) + goto out; + + ret = xt_register_match(&cgroup_mt_reg); + if (ret) + cgroup_unload_subsys(&net_filter_subsys); +out: + return ret; +} + +static void __exit cgroup_mt_exit(void) +{ + xt_unregister_match(&cgroup_mt_reg); + cgroup_unload_subsys(&net_filter_subsys); +} + +module_init(cgroup_mt_init); +module_exit(cgroup_mt_exit); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
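The rule validation and match logic in the patch above can be modelled in user space as a minimal sketch. The names cgroup_rule, model_sock, rule_check and rule_match are hypothetical stand-ins for xt_cgroup_info, struct sock, cgroup_mt_check() and cgroup_mt(); nothing here is kernel API. The sketch only illustrates how the invert flag interacts with the id comparison, and that packets without an owning socket never match, not even an inverted rule.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct xt_cgroup_info from the patch. */
struct cgroup_rule {
	uint32_t id;     /* fwid to compare against */
	uint32_t invert; /* 0 or 1; 1 models "! --cgroup" */
};

/* Hypothetical stand-in for the single struct sock field the match reads. */
struct model_sock {
	uint32_t fwid;   /* models sk->sk_cgrp_fwid */
};

/* Models cgroup_mt_check(): only bit 0 of invert is valid, id 0 is rejected. */
static int rule_check(const struct cgroup_rule *r)
{
	if (r->invert & ~1u)
		return -EINVAL;
	return r->id ? 0 : -EINVAL;
}

/* Models cgroup_mt(): no socket means no match, regardless of invert. */
static bool rule_match(const struct cgroup_rule *r, const struct model_sock *sk)
{
	/* The match is only ever run on rules that passed the check. */
	assert(rule_check(r) == 0);

	if (sk == NULL)
		return false;
	return (r->id == sk->fwid) ^ r->invert;
}
```

With a rule equivalent to "-m cgroup ! --cgroup 1 -j DROP", a socket tagged with fwid 1 does not match (so it is not dropped), while a socket left at the default fwid 0 does. Rejecting id 0 in the check keeps the default tag from ever being selected directly.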
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
From: Gao feng @ 2013-10-07 3:07 UTC
To: Daniel Borkmann; +Cc: pablo, netfilter-devel, netdev, Tejun Heo, cgroups

On 10/05/2013 02:20 AM, Daniel Borkmann wrote:
> +static void cgroup_attach(struct cgroup_subsys_state *css,
> +			  struct cgroup_taskset *tset)
> +{
> +	struct task_struct *p;
> +	void *v;
> +
> +	cgroup_taskset_for_each(p, css, tset) {
> +		task_lock(p);
> +		v = (void *)(unsigned long) task_fwid(p);

Shouldn't v be css_nf_state(css)->fwid?

> +		iterate_fd(p->files, 0, cgroup_fwid_update, v);
> +		task_unlock(p);
> +	}
> +}
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
From: Daniel Borkmann @ 2013-10-07 9:17 UTC
To: Gao feng; +Cc: pablo, netfilter-devel, netdev, Tejun Heo, cgroups

On 10/07/2013 05:07 AM, Gao feng wrote:
> On 10/05/2013 02:20 AM, Daniel Borkmann wrote:
>> +static void cgroup_attach(struct cgroup_subsys_state *css,
>> +			  struct cgroup_taskset *tset)
>> +{
>> +	struct task_struct *p;
>> +	void *v;
>> +
>> +	cgroup_taskset_for_each(p, css, tset) {
>> +		task_lock(p);
>> +		v = (void *)(unsigned long) task_fwid(p);
>
> Shouldn't v be css_nf_state(css)->fwid?

Nope, this is in line with net_cls and net_prio; the task has already
been moved there by the cgroup backend through cgroup_attach_task(),
so we only need to update the sk_cgrp_fwid of each of its sockets.
css is not strictly for net_filter. See also: 6a328d8c6f (cgroup:
net_cls: Rework update socket logic)

>> +		iterate_fd(p->files, 0, cgroup_fwid_update, v);
>> +		task_unlock(p);
>> +	}
>> +}
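The socket update step described in this reply can be sketched as a user-space model. All types and names here (model_sock, model_file, model_task, fwid_update, attach_task) are hypothetical; they stand in for struct sock, the file table walked by iterate_fd(), the task, cgroup_fwid_update() and the loop body of cgroup_attach(). The model assumes the task has already been moved, so its current cgroup's fwid is simply read from the task and copied into every open socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sock; fwid models sk_cgrp_fwid. */
struct model_sock {
	uint32_t fwid;
};

/* An open file is either a socket or something else (sock == NULL). */
struct model_file {
	struct model_sock *sock;
};

/* Hypothetical task: the fwid of its current cgroup plus its open files. */
struct model_task {
	uint32_t cgroup_fwid;
	struct model_file *files;
	size_t nfiles;
};

/* Models cgroup_fwid_update(): retag one file if it is a socket. */
static void fwid_update(struct model_file *f, uint32_t fwid)
{
	if (f->sock)
		f->sock->fwid = fwid;
}

/*
 * Models the per-task work in cgroup_attach(): read the fwid from the
 * task's (already updated) cgroup and push it to each open socket.
 * Returns the number of sockets that were retagged.
 */
static size_t attach_task(struct model_task *t)
{
	size_t i, retagged = 0;

	assert(t->files != NULL || t->nfiles == 0);

	for (i = 0; i < t->nfiles; i++) {
		if (t->files[i].sock)
			retagged++;
		fwid_update(&t->files[i], t->cgroup_fwid);
	}
	return retagged;
}
```

In this model the value pushed to the sockets comes from the task, not from the css argument, which is the distinction the question in this sub-thread is about.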
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
From: Gao feng @ 2013-10-07 9:42 UTC
To: Daniel Borkmann; +Cc: pablo, netfilter-devel, netdev, Tejun Heo, cgroups

On 10/07/2013 05:17 PM, Daniel Borkmann wrote:
> On 10/07/2013 05:07 AM, Gao feng wrote:
>> On 10/05/2013 02:20 AM, Daniel Borkmann wrote:
>>> +static void cgroup_attach(struct cgroup_subsys_state *css,
>>> +			  struct cgroup_taskset *tset)
>>> +{
>>> +	struct task_struct *p;
>>> +	void *v;
>>> +
>>> +	cgroup_taskset_for_each(p, css, tset) {
>>> +		task_lock(p);
>>> +		v = (void *)(unsigned long) task_fwid(p);
>>
>> Shouldn't v be css_nf_state(css)->fwid?
>
> Nope, this is in line with net_cls and net_prio; the task has already
> been moved there by the cgroup backend through cgroup_attach_task(),

Yes, these tasks have already been migrated to this cgroup.

> so we only need to update the sk_cgrp_fwid of each of its sockets.

Sorry, I still don't see in which situation css_nf_state(css)->fwid
would differ from task_fwid(p). Two threads writing the same pid to
different cgroups at the same time? That does not seem possible, since
we are protected by cgroup_mutex.

> css is not strictly for net_filter. See also: 6a328d8c6f (cgroup:
> net_cls: Rework update socket logic)
>
>>> +		iterate_fd(p->files, 0, cgroup_fwid_update, v);
>>> +		task_unlock(p);
>>> +	}
>>> +}
>
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching 2013-10-04 18:20 [PATCH nf-next] netfilter: xtables: lightweight process control group matching Daniel Borkmann 2013-10-07 3:07 ` Gao feng @ 2013-10-07 16:46 ` Tejun Heo 2013-10-08 8:05 ` Daniel Borkmann [not found] ` <1380910855-12325-1-git-send-email-dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2 siblings, 1 reply; 19+ messages in thread From: Tejun Heo @ 2013-10-07 16:46 UTC (permalink / raw) To: Daniel Borkmann; +Cc: pablo, netfilter-devel, netdev, cgroups Hello, On Fri, Oct 04, 2013 at 08:20:55PM +0200, Daniel Borkmann wrote: > It would be useful e.g. in a server or desktop environment to have > a facility in the notion of fine-grained "per application" or "per > application group" firewall policies. Probably, users in the mobile/ > embedded area (e.g. Android based) with different security policy > requirements for application groups could have great benefit from > that as well. For example, with a little bit of configuration effort, > an admin could whitelist well-known applications, and thus block > otherwise unwanted "hard-to-track" applications like [1] from a > user's machine. > > Implementation of PID-based matching would not be appropriate > as they frequently change, and child tracking would make that > even more complex and ugly. Cgroups would be a perfect candidate > for accomplishing that as they associate a set of tasks with a > set of parameters for one or more subsystems, in our case the > netfilter subsystem, which, of course, can be combined with other > cgroup subsystems into something more complex. > > As mentioned, to overcome this constraint, such processes could > be placed into one or multiple cgroups where different fine-grained > rules can be defined depending on the application scenario, while > e.g. 
everything else that is not part of that could be dropped (or > vice versa), thus making life harder for unwanted processes to > communicate to the outside world. So, we make use of cgroups here > to track jobs and limit their resources in terms of iptables > policies; in other words, limiting what they are allowed to > communicate. > > We have similar cgroup facilities in networking for traffic > classifier, and netprio cgroups. This feature adds a lightweight > cgroup id matching in terms of network security resp. network > traffic isolation as part of netfilter's xtables subsystem. I don't think the two net cgroups were a good idea and definitely don't want to continue the trend. I think this is being done backwards. Wouldn't it be more logical to implement netfilter rule to match the target cgroup paths? It really doesn't make much sense to me to add separate controllers to just tag processes. Please classify tasks in cgroup and let netfilter match the cgroups. Thanks. -- tejun ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching 2013-10-07 16:46 ` Tejun Heo @ 2013-10-08 8:05 ` Daniel Borkmann [not found] ` <5253BCAE.5060409-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Daniel Borkmann @ 2013-10-08 8:05 UTC (permalink / raw) To: Tejun Heo; +Cc: pablo, netfilter-devel, netdev, cgroups Hi Tejun, On 10/07/2013 06:46 PM, Tejun Heo wrote: > On Fri, Oct 04, 2013 at 08:20:55PM +0200, Daniel Borkmann wrote: >> It would be useful e.g. in a server or desktop environment to have >> a facility in the notion of fine-grained "per application" or "per >> application group" firewall policies. Probably, users in the mobile/ >> embedded area (e.g. Android based) with different security policy >> requirements for application groups could have great benefit from >> that as well. For example, with a little bit of configuration effort, >> an admin could whitelist well-known applications, and thus block >> otherwise unwanted "hard-to-track" applications like [1] from a >> user's machine. >> >> Implementation of PID-based matching would not be appropriate >> as they frequently change, and child tracking would make that >> even more complex and ugly. Cgroups would be a perfect candidate >> for accomplishing that as they associate a set of tasks with a >> set of parameters for one or more subsystems, in our case the >> netfilter subsystem, which, of course, can be combined with other >> cgroup subsystems into something more complex. >> >> As mentioned, to overcome this constraint, such processes could >> be placed into one or multiple cgroups where different fine-grained >> rules can be defined depending on the application scenario, while >> e.g. everything else that is not part of that could be dropped (or >> vice versa), thus making life harder for unwanted processes to >> communicate to the outside world. 
>> So, we make use of cgroups here
>> to track jobs and limit their resources in terms of iptables
>> policies; in other words, limiting what they are allowed to
>> communicate.
>>
>> We have similar cgroup facilities in networking for traffic
>> classifier, and netprio cgroups. This feature adds a lightweight
>> cgroup id matching in terms of network security resp. network
>> traffic isolation as part of netfilter's xtables subsystem.
>
> I don't think the two net cgroups were a good idea and definitely
> don't want to continue the trend. I think this is being done
> backwards. Wouldn't it be more logical to implement netfilter rule to
> match the target cgroup paths? It really doesn't make much sense to
> me to add separate controllers to just tag processes. Please classify
> tasks in cgroup and let netfilter match the cgroups.

Thanks for your feedback!

Could you elaborate on "Wouldn't it be more logical to implement netfilter
rule to match the target cgroup paths?". I don't think (or hope) you mean
some string comparison on the dentry path here? :) With our proposal, the
only code executed in the network stack's critical path to match the
cgroup is the following ...

  static bool
  cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
  {
          const struct xt_cgroup_info *info = par->matchinfo;

          if (skb->sk == NULL)
                  return false;

          return (info->id == skb->sk->sk_cgrp_fwid) ^ info->invert;
  }

... where ``info->id == skb->sk->sk_cgrp_fwid'' is the actual work, so it
is very lightweight, which is good for high loads (1Gbit/s, 10Gbit/s and
beyond), of course. Also, it would be intuitive for admins familiar with
other subsystems to just set up and use these cgroup ids in iptables. I'm
not yet quite sure what your suggestion would look like: would you need to
set up some "dummy" subgroups first, just to have a path that you can
match on?

Thanks,
Daniel
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching [not found] ` <5253BCAE.5060409-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2013-10-09 17:04 ` Tejun Heo 2013-10-09 19:12 ` Daniel Borkmann [not found] ` <20131009170409.GH22495-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 0 siblings, 2 replies; 19+ messages in thread From: Tejun Heo @ 2013-10-09 17:04 UTC (permalink / raw) To: Daniel Borkmann Cc: pablo-Cap9r6Oaw4JrovVCs/uTlw, netfilter-devel-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA Hello, On Tue, Oct 08, 2013 at 10:05:02AM +0200, Daniel Borkmann wrote: > Could you elaborate on "Wouldn't it be more logical to implement netfilter > rule to match the target cgroup paths?". I don't think (or hope) you mean > some string comparison on the dentry path here? :) With our proposal, we > have in the network stack's critical path only the following code that is > being executed here to match the cgroup ... Comparing path each time obviously doesn't make sense but you can determine the cgroup on config and hold onto the pointer while the rule exists. > ... where ``info->id == skb->sk->sk_cgrp_fwid'' is the actual work, so very > lightweight, which is good for high loads (1Gbit/s, 10Gbit/s and beyond), of > course. Also, it would be intuitive for admins familiar with other subsystems > to just set up and use these cgroup ids in iptabels. I'm not yet quite sure > how your suggestion would look like, so you would need to setup some "dummy" > subgroups first just to have a path that you can match on? Currently, it's tricky because we have multiple hierarchies to consider and there isn't an efficient way to map from task to cgroup on a specific hierarchy. I'm not sure whether we should add another mapping table in css_set or just allow using path matching on the unified hierarchy. The latter should be cleaner and easier but more restrictive. 
Anyways, it isn't manageable in the long term to keep adding controllers simply to tag tasks differently. If we want to do this, let's please work on a way to match a task's cgroup affiliation efficiently. Thanks. -- tejun ^ permalink raw reply [flat|nested] 19+ messages in thread
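The idea suggested in this reply, resolving the cgroup once when the rule is configured and holding on to the pointer while the rule exists, can be illustrated with a small user-space model. Everything in it is hypothetical: model_cgroup, path_rule and the three helper functions are not kernel or xtables API. The sketch also assumes a single hierarchy in which every task maps to exactly one cgroup, which is precisely the part this reply describes as tricky with multiple hierarchies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cgroup object, identified by its path and reference counted. */
struct model_cgroup {
	const char *path;
	int refcnt;
};

/* Hypothetical rule: stores the resolved cgroup, not the path string. */
struct path_rule {
	struct model_cgroup *cgrp; /* pinned for as long as the rule exists */
};

/* Slow path, run once at rule configuration: string lookup plus pin. */
static int path_rule_init(struct path_rule *r, struct model_cgroup *tbl,
			  size_t n, const char *path)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(tbl[i].path, path) == 0) {
			tbl[i].refcnt++;
			r->cgrp = &tbl[i];
			return 0;
		}
	}
	r->cgrp = NULL;
	return -1;
}

/* Rule removal: drop the reference taken at configuration time. */
static void path_rule_destroy(struct path_rule *r)
{
	if (r->cgrp) {
		assert(r->cgrp->refcnt > 0);
		r->cgrp->refcnt--;
		r->cgrp = NULL;
	}
}

/* Fast path, run per packet: a single pointer comparison, no strings. */
static bool path_rule_match(const struct path_rule *r,
			    const struct model_cgroup *task_cgrp)
{
	return r->cgrp != NULL && r->cgrp == task_cgrp;
}
```

Under these assumptions the per-packet cost is comparable to the fwid comparison in the original patch; the open question from the thread, how to get from a task or socket to its cgroup on a given hierarchy cheaply, is deliberately left out of the model.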
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching 2013-10-09 17:04 ` Tejun Heo @ 2013-10-09 19:12 ` Daniel Borkmann [not found] ` <20131009170409.GH22495-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 1 sibling, 0 replies; 19+ messages in thread From: Daniel Borkmann @ 2013-10-09 19:12 UTC (permalink / raw) To: Tejun Heo; +Cc: pablo, netfilter-devel, netdev, cgroups On 10/09/2013 07:04 PM, Tejun Heo wrote: > Hello, > > On Tue, Oct 08, 2013 at 10:05:02AM +0200, Daniel Borkmann wrote: >> Could you elaborate on "Wouldn't it be more logical to implement netfilter >> rule to match the target cgroup paths?". I don't think (or hope) you mean >> some string comparison on the dentry path here? :) With our proposal, we >> have in the network stack's critical path only the following code that is >> being executed here to match the cgroup ... > > Comparing path each time obviously doesn't make sense but you can > determine the cgroup on config and hold onto the pointer while the > rule exists. > >> ... where ``info->id == skb->sk->sk_cgrp_fwid'' is the actual work, so very >> lightweight, which is good for high loads (1Gbit/s, 10Gbit/s and beyond), of >> course. Also, it would be intuitive for admins familiar with other subsystems >> to just set up and use these cgroup ids in iptabels. I'm not yet quite sure >> how your suggestion would look like, so you would need to setup some "dummy" >> subgroups first just to have a path that you can match on? > > Currently, it's tricky because we have multiple hierarchies to > consider and there isn't an efficient way to map from task to cgroup > on a specific hierarchy. I'm not sure whether we should add another > mapping table in css_set or just allow using path matching on the > unified hierarchy. The latter should be cleaner and easier but more > restrictive. > > Anyways, it isn't manageable in the long term to keep adding > controllers simply to tag tasks differently. 
If we want to do this, > let's please work on a way to match a task's cgroup affiliation > efficiently. Agreed, let us solve that first, and then I go back to the netfilter module to bring netfilter and cgroups together. Thanks, Daniel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
       [not found] ` <20131009170409.GH22495-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2013-10-10 21:55 ` Daniel Borkmann
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Borkmann @ 2013-10-10 21:55 UTC
To: Tejun Heo
Cc: pablo-Cap9r6Oaw4JrovVCs/uTlw, netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
    netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel Borkmann

Hi Tejun,

On 10/09/2013 07:04 PM, Tejun Heo wrote:
> On Tue, Oct 08, 2013 at 10:05:02AM +0200, Daniel Borkmann wrote:
>> Could you elaborate on "Wouldn't it be more logical to implement netfilter
>> rule to match the target cgroup paths?". I don't think (or hope) you mean
>> some string comparison on the dentry path here? :) With our proposal, we
>> have in the network stack's critical path only the following code that is
>> being executed here to match the cgroup ...
>
> Comparing path each time obviously doesn't make sense but you can
> determine the cgroup on config and hold onto the pointer while the
> rule exists.
>
>> ... where ``info->id == skb->sk->sk_cgrp_fwid'' is the actual work, so very
>> lightweight, which is good for high loads (1Gbit/s, 10Gbit/s and beyond), of
>> course. Also, it would be intuitive for admins familiar with other subsystems
>> to just set up and use these cgroup ids in iptables. I'm not yet quite sure
>> how your suggestion would look like, so you would need to setup some "dummy"
>> subgroups first just to have a path that you can match on?
>
> Currently, it's tricky because we have multiple hierarchies to
> consider and there isn't an efficient way to map from task to cgroup
> on a specific hierarchy. I'm not sure whether we should add another
> mapping table in css_set or just allow using path matching on the
> unified hierarchy. The latter should be cleaner and easier but more
> restrictive.
>
> Anyways, it isn't manageable in the long term to keep adding
> controllers simply to tag tasks differently. If we want to do this,
> let's please work on a way to match a task's cgroup affiliation
> efficiently.

Here's a draft (!) of an alternative w/o using a new cgroup subsystem.
I've tested it and it would basically work this way as well. I've used
serial_nr as an identifier of cgroups here, as we'd actually want the
xt_cgroup_info structure as small as possible for rule sets (since they
can be large and are flat-copied to kernel). Logic in cgroup_mt() needs
to change a bit as we cannot hold css_set_lock here. Anyway, iptables
would match here against cgroup.serial (that can probably also be widely
used otherwise). The way we do it here is to cache the corresponding task
in the socket structure, and walk all cgroups belonging to that task,
comparing if serial_nr's match.

Still, I think my original patch is cleaner, more user friendly and
intuitive, and has better performance (the main work is one comparison
instead of walking all corresponding cgroups), so I'd still consider it
the better tradeoff to go with. I think netfilter is a large enough
candidate for a subsys. ;)

Thanks,
Daniel

Draft patch:

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 3561d30..3c5e953 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -399,6 +399,26 @@ struct cgroup_map_cb {
 };
 
 /*
+ * A cgroup can be associated with multiple css_sets as different tasks may
+ * belong to different cgroups on different hierarchies. In the other
+ * direction, a css_set is naturally associated with multiple cgroups.
+ * This M:N relationship is represented by the following link structure
+ * which exists for each association and allows traversing the associations
+ * from both sides.
+ */
+struct cgrp_cset_link {
+	/* the cgroup and css_set this link associates */
+	struct cgroup		*cgrp;
+	struct css_set		*cset;
+
+	/* list of cgrp_cset_links anchored at cgrp->cset_links */
+	struct list_head	cset_link;
+
+	/* list of cgrp_cset_links anchored at css_set->cgrp_links */
+	struct list_head	cgrp_link;
+};
+
+/*
  * struct cftype: handler definitions for cgroup control files
  *
  * When reading/writing to a file:
diff --git a/include/net/sock.h b/include/net/sock.h
index e3bf213..b035ba3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -406,6 +406,9 @@ struct sock {
 	__u32			sk_mark;
 	u32			sk_classid;
 	struct cg_proto		*sk_cgrp;
+#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP)
+	struct task_struct	*sk_task_cached;
+#endif
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk, int bytes);
 	void			(*sk_write_space)(struct sock *sk);
@@ -2098,6 +2101,22 @@ static inline gfp_t gfp_any(void)
 	return in_softirq() ? GFP_ATOMIC : GFP_KERNEL;
 }
 
+static inline struct task_struct *sock_task(const struct sock *sk)
+{
+#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP)
+	return sk->sk_task_cached;
+#else
+	return NULL;
+#endif
+}
+
+static inline void sock_task_set(struct sock *sk, struct task_struct *task)
+{
+#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP)
+	sk->sk_task_cached = task;
+#endif
+}
+
 static inline long sock_rcvtimeo(const struct sock *sk, bool noblock)
 {
 	return noblock ? 0 : sk->sk_rcvtimeo;
diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild
index 1749154..94a4890 100644
--- a/include/uapi/linux/netfilter/Kbuild
+++ b/include/uapi/linux/netfilter/Kbuild
@@ -37,6 +37,7 @@ header-y += xt_TEE.h
 header-y += xt_TPROXY.h
 header-y += xt_addrtype.h
 header-y += xt_bpf.h
+header-y += xt_cgroup.h
 header-y += xt_cluster.h
 header-y += xt_comment.h
 header-y += xt_connbytes.h
diff --git a/include/uapi/linux/netfilter/xt_cgroup.h b/include/uapi/linux/netfilter/xt_cgroup.h
new file mode 100644
index 0000000..c59ff53
--- /dev/null
+++ b/include/uapi/linux/netfilter/xt_cgroup.h
@@ -0,0 +1,11 @@
+#ifndef _UAPI_XT_CGROUP_H
+#define _UAPI_XT_CGROUP_H
+
+#include <linux/types.h>
+
+struct xt_cgroup_info {
+	__u64 serial_nr;
+	__u32 invert;
+};
+
+#endif /* _UAPI_XT_CGROUP_H */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2418b6e..1f9dc5b 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -357,26 +357,6 @@ static void cgroup_release_agent(struct work_struct *work);
 static DECLARE_WORK(release_agent_work, cgroup_release_agent);
 static void check_for_release(struct cgroup *cgrp);
 
-/*
- * A cgroup can be associated with multiple css_sets as different tasks may
- * belong to different cgroups on different hierarchies. In the other
- * direction, a css_set is naturally associated with multiple cgroups.
- * This M:N relationship is represented by the following link structure
- * which exists for each association and allows traversing the associations
- * from both sides.
- */
-struct cgrp_cset_link {
-	/* the cgroup and css_set this link associates */
-	struct cgroup		*cgrp;
-	struct css_set		*cset;
-
-	/* list of cgrp_cset_links anchored at cgrp->cset_links */
-	struct list_head	cset_link;
-
-	/* list of cgrp_cset_links anchored at css_set->cgrp_links */
-	struct list_head	cgrp_link;
-};
-
 /* The default css_set - used by init and its children prior to any
  * hierarchies being mounted. It contains a pointer to the root state
  * for each subsystem. Also used to anchor the list of css_sets. Not
@@ -4163,6 +4143,12 @@ static int cgroup_clone_children_write(struct cgroup_subsys_state *css,
 	return 0;
 }
 
+static u64 cgroup_serial_read(struct cgroup_subsys_state *css,
+			      struct cftype *cft)
+{
+	return css->cgroup->serial_nr;
+}
+
 static struct cftype cgroup_base_files[] = {
 	{
 		.name = "cgroup.procs",
@@ -4187,6 +4173,11 @@ static struct cftype cgroup_base_files[] = {
 		.flags = CFTYPE_ONLY_ON_ROOT,
 		.read_seq_string = cgroup_sane_behavior_show,
 	},
+	{
+		.name = "cgroup.serial",
+		.read_u64 = cgroup_serial_read,
+		.mode = S_IRUGO,
+	},
 
 	/*
 	 * Historical crazy stuff. These don't have "cgroup." prefix and
diff --git a/net/core/scm.c b/net/core/scm.c
index b442e7e..9a40ab0 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -292,6 +292,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
 		if (sock) {
 			sock_update_netprioidx(sock->sk);
 			sock_update_classid(sock->sk);
+			sock_task_set(sock->sk, current);
 		}
 		fd_install(new_fd, get_file(fp[i]));
 	}
diff --git a/net/core/sock.c b/net/core/sock.c
index 2bd9b3f..ab53afc 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1359,6 +1359,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sk->sk_prot = sk->sk_prot_creator = prot;
 		sock_lock_init(sk);
 		sock_net_set(sk, get_net(net));
+		sock_task_set(sk, current);
 		atomic_set(&sk->sk_wmem_alloc, 1);
 
 		sock_update_classid(sk);
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 6e839b6..d276ff4 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -806,6 +806,14 @@ config NETFILTER_XT_MATCH_BPF
 
 	  To compile it as a module, choose M here. If unsure, say N.
 
+config NETFILTER_XT_MATCH_CGROUP
+	tristate '"control group" match support'
+	depends on NETFILTER_ADVANCED
+	depends on CGROUPS
+	---help---
+	Socket/process control group matching allows you to match locally
+	generated packets based on which control group processes belong to.
+
 config NETFILTER_XT_MATCH_CLUSTER
 	tristate '"cluster" match support'
 	depends on NF_CONNTRACK
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index c3a0a12..12f014f 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_NFACCT) += xt_nfacct.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_OSF) += xt_osf.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_CGROUP) += xt_cgroup.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PHYSDEV) += xt_physdev.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
new file mode 100644
index 0000000..f04cba8
--- /dev/null
+++ b/net/netfilter/xt_cgroup.c
@@ -0,0 +1,79 @@
+#include <linux/skbuff.h>
+#include <linux/module.h>
+#include <linux/file.h>
+#include <linux/cgroup.h>
+#include <linux/fdtable.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_cgroup.h>
+#include <net/sock.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>");
+MODULE_DESCRIPTION("Xtables: process control group matching");
+MODULE_ALIAS("ipt_cgroup");
+MODULE_ALIAS("ip6t_cgroup");
+
+static int cgroup_mt_check(const struct xt_mtchk_param *par)
+{
+	struct xt_cgroup_info *info = par->matchinfo;
+
+	return (info->invert & ~1) ? -EINVAL : 0;
+}
+
+static bool
+cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	const struct xt_cgroup_info *info = par->matchinfo;
+	struct cgrp_cset_link *link, *link_tmp;
+	const struct sock *sk = skb->sk;
+	struct task_struct *task;
+	struct css_set *cset;
+	bool found = false;
+
+	if (sk == NULL)
+		return false;
+
+	task = sock_task(sk);
+	if (task == NULL)
+		return false;
+
+	rcu_read_lock();
+	/* XXX: read_lock(&css_set_lock); */
+	cset = task_css_set(task);
+	list_for_each_entry_safe(link, link_tmp, &cset->cgrp_links, cgrp_link) {
+		struct cgroup *cgrp = link->cgrp;
+		if (cgrp->serial_nr == info->serial_nr) {
+			found = true;
+			break;
+		}
+	}
+	/* XXX: read_unlock(&css_set_lock); */
+	rcu_read_unlock();
+
+	return found ^ info->invert;
+}
+
+static struct xt_match cgroup_mt_reg __read_mostly = {
+	.name		= "cgroup",
+	.revision	= 0,
+	.family		= NFPROTO_UNSPEC,
+	.checkentry	= cgroup_mt_check,
+	.match		= cgroup_mt,
+	.matchsize	= sizeof(struct xt_cgroup_info),
+	.me		= THIS_MODULE,
+	.hooks		= (1 << NF_INET_LOCAL_OUT) |
+			  (1 << NF_INET_POST_ROUTING),
+};
+
+static int __init cgroup_mt_init(void)
+{
+	return xt_register_match(&cgroup_mt_reg);
+}
+
+static void __exit cgroup_mt_exit(void)
+{
+	xt_unregister_match(&cgroup_mt_reg);
+}
+
+module_init(cgroup_mt_init);
+module_exit(cgroup_mt_exit);
-- 
1.8.3.1
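
[Editorial illustration, not part of the original mail.] The cost difference argued above (one integer comparison versus a walk over every cgroup the task belongs to) can be seen in a small user-space model. This is only a sketch under stated assumptions: `cgroup_model`, `task_model`, `sock_model` and the three functions are hypothetical stand-ins invented here, not kernel APIs; only the `invert` handling follows the draft's `cgroup_mt_check()` and `cgroup_mt()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct cgroup_model {
	uint64_t serial_nr;
};

struct task_model {
	const struct cgroup_model *cgroups;	/* one entry per hierarchy */
	size_t nr_cgroups;
};

struct sock_model {
	uint32_t fwid;			/* models sk_cgrp_fwid (original patch) */
	const struct task_model *task;	/* models sk_task_cached (draft) */
};

/* Same rule as cgroup_mt_check(): only 0 and 1 are valid invert values. */
static int invert_is_valid(uint32_t invert)
{
	return (invert & ~1u) == 0;
}

/* Original proposal: a single integer comparison per packet. */
static bool match_by_fwid(const struct sock_model *sk, uint32_t id,
			  uint32_t invert)
{
	return (sk->fwid == id) ^ invert;
}

/* Draft alternative: walk all cgroups of the task cached in the socket. */
static bool match_by_serial(const struct sock_model *sk, uint64_t serial_nr,
			    uint32_t invert)
{
	bool found = false;
	size_t i;

	/* Like the draft, a missing socket or task never matches,
	 * regardless of invert. */
	if (sk == NULL || sk->task == NULL)
		return false;

	for (i = 0; i < sk->task->nr_cgroups; i++) {
		if (sk->task->cgroups[i].serial_nr == serial_nr) {
			found = true;
			break;
		}
	}

	return found ^ invert;
}
```

The walk is linear in the number of hierarchies the task is attached to, which is the per-packet overhead the mail refers to; the model deliberately ignores locking, which the draft itself marks as unresolved (`XXX`).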
[parent not found: <1380910855-12325-1-git-send-email-dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
       [not found] ` <1380910855-12325-1-git-send-email-dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-10-18 23:21 ` Eric W. Biederman
       [not found]   ` <87li1qp3l8.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2013-10-18 23:21 UTC
To: Daniel Borkmann
Cc: pablo-Cap9r6Oaw4JrovVCs/uTlw, netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
    netdev-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> Implementation of PID-based matching would not be appropriate
> as they frequently change, and child tracking would make that
> even more complex and ugly. Cgroups would be a perfect candidate
> for accomplishing that as they associate a set of tasks with a
> set of parameters for one or more subsystems, in our case the
> netfilter subsystem, which, of course, can be combined with other
> cgroup subsystems into something more complex.

I am coming to this late. But two concrete suggestions.

1) process groups and sessions don't change as frequently as pids.

2) It is possible to put a set of processes in their own network
   namespace and pipe just the packets you want those processes to
   use into that network namespace. Using an ingress queueing filter
   makes that process very efficient even if you have to filter by port.

So I don't think you need cgroups to solve this problem at all.

Eric
[parent not found: <87li1qp3l8.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
       [not found] ` <87li1qp3l8.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-10-19 7:16 ` Daniel Borkmann
  2013-10-21 15:09   ` Daniel Wagner
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Borkmann @ 2013-10-19 7:16 UTC
To: Eric W. Biederman
Cc: pablo-Cap9r6Oaw4JrovVCs/uTlw, netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
    netdev-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On 10/19/2013 01:21 AM, Eric W. Biederman wrote:

> I am coming to this late. But two concrete suggestions.
>
> 1) process groups and sessions don't change as frequently as pids.
>
> 2) It is possible to put a set of processes in their own network
>    namespace and pipe just the packets you want those processes to
>    use into that network namespace. Using an ingress queueing filter
>    makes that process very efficient even if you have to filter by port.

Actually, in our case we're filtering outgoing traffic based on which
local socket it originated from, so you wouldn't need all of that
construct. Also, you wouldn't even need a priori knowledge of the
application internals regarding their use of particular ports or
protocols. I don't think that such a setup will have the same
efficiency, ease of use, and power to distinguish the application the
traffic came from in such a lightweight, protocol-independent and easy
way.
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
  2013-10-19 7:16 ` Daniel Borkmann
@ 2013-10-21 15:09 ` Daniel Wagner
       [not found]   ` <526543A2.2040901-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Wagner @ 2013-10-21 15:09 UTC
To: Daniel Borkmann, Eric W. Biederman
Cc: pablo, netfilter-devel, netdev, Tejun Heo, cgroups

Hi Daniel

On 10/19/2013 08:16 AM, Daniel Borkmann wrote:
> On 10/19/2013 01:21 AM, Eric W. Biederman wrote:
>
>> I am coming to this late. But two concrete suggestions.
>>
>> 1) process groups and sessions don't change as frequently as pids.
>>
>> 2) It is possible to put a set of processes in their own network
>>    namespace and pipe just the packets you want those processes to
>>    use into that network namespace. Using an ingress queueing filter
>>    makes that process very efficient even if you have to filter by port.
>
> Actually, in our case we're filtering outgoing traffic based on which
> local socket it originated from, so you wouldn't need all of that
> construct. Also, you wouldn't even need a priori knowledge of the
> application internals regarding their use of particular ports or
> protocols. I don't think that such a setup will have the same
> efficiency, ease of use, and power to distinguish the application the
> traffic came from in such a lightweight, protocol-independent and easy
> way.

Sorry for being late as well (and also for the stupid question).

Couldn't you use something from the LSM? I mean, you allow the
application to create the socket etc. and then later block
the traffic originating from that socket. Wouldn't it make
more sense to block early?

cheers,
daniel
[parent not found: <526543A2.2040901-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>]
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
       [not found] ` <526543A2.2040901-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>
@ 2013-10-21 15:48 ` Daniel Borkmann
  2013-10-22 7:15   ` Ni, Xun
       [not found]   ` <52654CE6.7030706-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 2 replies; 19+ messages in thread
From: Daniel Borkmann @ 2013-10-21 15:48 UTC
To: Daniel Wagner
Cc: Eric W. Biederman, pablo-Cap9r6Oaw4JrovVCs/uTlw,
    netfilter-devel-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
    Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On 10/21/2013 05:09 PM, Daniel Wagner wrote:
> On 10/19/2013 08:16 AM, Daniel Borkmann wrote:
>> On 10/19/2013 01:21 AM, Eric W. Biederman wrote:
>>
>>> I am coming to this late. But two concrete suggestions.
>>>
>>> 1) process groups and sessions don't change as frequently as pids.
>>>
>>> 2) It is possible to put a set of processes in their own network
>>>    namespace and pipe just the packets you want those processes to
>>>    use into that network namespace. Using an ingress queueing filter
>>>    makes that process very efficient even if you have to filter by port.
>>
>> Actually, in our case we're filtering outgoing traffic based on which
>> local socket it originated from, so you wouldn't need all of that
>> construct. Also, you wouldn't even need a priori knowledge of the
>> application internals regarding their use of particular ports or
>> protocols. I don't think that such a setup will have the same
>> efficiency, ease of use, and power to distinguish the application the
>> traffic came from in such a lightweight, protocol-independent and easy
>> way.
>
> Sorry for being late as well (and also for the stupid question).
>
> Couldn't you use something from the LSM? I mean, you allow the
> application to create the socket etc. and then later block
> the traffic originating from that socket. Wouldn't it make
> more sense to block early?

I gave one simple example for blocking in the commit message, that's
true, but it is not limited to that. Netfilter allows many more
scenarios/policies than just blocking, e.g. fine-grained settings for
where applications are allowed to connect/send traffic to, application
traffic marking/conntracking, application-specific packet mangling, and
so on; just think of the whole netfilter universe.
* RE: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
  2013-10-21 15:48 ` Daniel Borkmann
@ 2013-10-22 7:15 ` Ni, Xun
  2013-10-22 7:42   ` Daniel Borkmann
  2013-10-22 7:45   ` Daniel Wagner
       [not found] ` <52654CE6.7030706-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 2 replies; 19+ messages in thread
From: Ni, Xun @ 2013-10-22 7:15 UTC
To: Daniel Borkmann, Daniel Wagner
Cc: Eric W. Biederman, pablo, netfilter-devel, netdev, Tejun Heo, cgroups

Hello, Daniel:
   can all your examples block early, before doing network operations?
What's the whole netfilter universe? Can you give us clearer examples?

Thanks

On 10/21/2013 05:09 PM, Daniel Wagner wrote:
> On 10/19/2013 08:16 AM, Daniel Borkmann wrote:
>> On 10/19/2013 01:21 AM, Eric W. Biederman wrote:
>>
>>> I am coming to this late. But two concrete suggestions.
>>>
>>> 1) process groups and sessions don't change as frequently as pids.
>>>
>>> 2) It is possible to put a set of processes in their own network
>>>    namespace and pipe just the packets you want those processes to
>>>    use into that network namespace. Using an ingress queueing filter
>>>    makes that process very efficient even if you have to filter by port.
>>
>> Actually, in our case we're filtering outgoing traffic based on which
>> local socket it originated from, so you wouldn't need all of that
>> construct. Also, you wouldn't even need a priori knowledge of the
>> application internals regarding their use of particular ports or
>> protocols. I don't think that such a setup will have the same
>> efficiency, ease of use, and power to distinguish the application the
>> traffic came from in such a lightweight, protocol-independent and easy
>> way.
>
> Sorry for being late as well (and also for the stupid question).
>
> Couldn't you use something from the LSM? I mean, you allow the
> application to create the socket etc. and then later block the traffic
> originating from that socket. Wouldn't it make more sense to block
> early?

I gave one simple example for blocking in the commit message, that's
true, but it is not limited to that. Netfilter allows many more
scenarios/policies than just blocking, e.g. fine-grained settings for
where applications are allowed to connect/send traffic to, application
traffic marking/conntracking, application-specific packet mangling, and
so on; just think of the whole netfilter universe.
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
  2013-10-22 7:15 ` Ni, Xun
@ 2013-10-22 7:42 ` Daniel Borkmann
  2013-10-22 7:45 ` Daniel Wagner
  1 sibling, 0 replies; 19+ messages in thread
From: Daniel Borkmann @ 2013-10-22 7:42 UTC
To: Ni, Xun
Cc: Daniel Wagner, Eric W. Biederman, pablo, netfilter-devel, netdev,
    Tejun Heo, cgroups

On 10/22/2013 09:15 AM, Ni, Xun wrote:
> Hello, Daniel:
>    can all your examples block early, before doing network operations?
> What's the whole netfilter universe? Can you give us clearer examples?

As you can see from the code, the netfilter hooks are located in
NF_INET_LOCAL_OUT and NF_INET_POST_ROUTING.

> Thanks
> On 10/21/2013 05:09 PM, Daniel Wagner wrote:
>> On 10/19/2013 08:16 AM, Daniel Borkmann wrote:
>>> On 10/19/2013 01:21 AM, Eric W. Biederman wrote:
>>>
>>>> I am coming to this late. But two concrete suggestions.
>>>>
>>>> 1) process groups and sessions don't change as frequently as pids.
>>>>
>>>> 2) It is possible to put a set of processes in their own network
>>>>    namespace and pipe just the packets you want those processes to
>>>>    use into that network namespace. Using an ingress queueing filter
>>>>    makes that process very efficient even if you have to filter by port.
>>>
>>> Actually, in our case we're filtering outgoing traffic based on which
>>> local socket it originated from, so you wouldn't need all of that
>>> construct. Also, you wouldn't even need a priori knowledge of the
>>> application internals regarding their use of particular ports or
>>> protocols. I don't think that such a setup will have the same
>>> efficiency, ease of use, and power to distinguish the application the
>>> traffic came from in such a lightweight, protocol-independent and easy
>>> way.
>>
>> Sorry for being late as well (and also for the stupid question).
>>
>> Couldn't you use something from the LSM? I mean, you allow the
>> application to create the socket etc. and then later block the traffic
>> originating from that socket. Wouldn't it make more sense to block
>> early?
>
> I gave one simple example for blocking in the commit message, that's
> true, but it is not limited to that. Netfilter allows many more
> scenarios/policies than just blocking, e.g. fine-grained settings for
> where applications are allowed to connect/send traffic to, application
> traffic marking/conntracking, application-specific packet mangling, and
> so on; just think of the whole netfilter universe.
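
[Editorial illustration, not part of the original mail.] The reply above points at the `.hooks` field of the draft's match registration. As a rough user-space sketch of what such a mask expresses: the enum below reproduces the numeric order of the kernel's `NF_INET_*` hook constants, while `CGROUP_MT_HOOKS` and `hooks_allowed()` are hypothetical names made up for this example. The check is a simplified model of the idea that a rule may only use the match from hooks contained in the registered mask, not a copy of the kernel's validation code.

```c
#include <assert.h>
#include <stdbool.h>

/* Same numeric order as the kernel's NF_INET_* hook constants. */
enum nf_inet_hooks_model {
	HOOK_PRE_ROUTING,	/* 0 */
	HOOK_LOCAL_IN,		/* 1 */
	HOOK_FORWARD,		/* 2 */
	HOOK_LOCAL_OUT,		/* 3 */
	HOOK_POST_ROUTING,	/* 4 */
};

/* The mask the draft patch sets in .hooks: output paths only. */
#define CGROUP_MT_HOOKS \
	((1u << HOOK_LOCAL_OUT) | (1u << HOOK_POST_ROUTING))

/* True when every hook the rule is reachable from is in the mask. */
static bool hooks_allowed(unsigned int rule_hook_mask,
			  unsigned int match_hooks)
{
	return (rule_hook_mask & ~match_hooks) == 0;
}
```

Under this model a rule in the OUTPUT or POSTROUTING chain would be accepted, while one reachable from INPUT or FORWARD would be rejected, which fits the point that the match is about locally generated, outgoing traffic.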
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
  2013-10-22 7:15 ` Ni, Xun
  2013-10-22 7:42 ` Daniel Borkmann
@ 2013-10-22 7:45 ` Daniel Wagner
  1 sibling, 0 replies; 19+ messages in thread
From: Daniel Wagner @ 2013-10-22 7:45 UTC
To: Ni, Xun, Daniel Borkmann
Cc: Eric W. Biederman, pablo, netfilter-devel, netdev, Tejun Heo, cgroups

Hi Xun,

On 10/22/2013 08:15 AM, Ni, Xun wrote:
> Hello, Daniel:
>    can all your examples block early, before doing network operations?

I was referring to the Linux Security Module (LSM) framework, which
allows defining access policies for an application, e.g. which ports it
is allowed to use. If the goal is just to block those ports, you don't
have to go through half of the networking stack to figure out via an
iptables rule that this access is not allowed.

> What's the whole netfilter universe? Can you give us clearer
> examples?

I am not sure if I understood your question correctly. In case you are
asking what netfilter is, I would like to point you to the
http://www.netfilter.org/ project page.

cheers,
daniel
[parent not found: <52654CE6.7030706-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
       [not found] ` <52654CE6.7030706-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-10-22 7:36 ` Daniel Wagner
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Wagner @ 2013-10-22 7:36 UTC
To: Daniel Borkmann
Cc: Eric W. Biederman, pablo-Cap9r6Oaw4JrovVCs/uTlw,
    netfilter-devel-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
    Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On 10/21/2013 04:48 PM, Daniel Borkmann wrote:
> On 10/21/2013 05:09 PM, Daniel Wagner wrote:
>> On 10/19/2013 08:16 AM, Daniel Borkmann wrote:
>>> On 10/19/2013 01:21 AM, Eric W. Biederman wrote:
>>>
>>>> I am coming to this late. But two concrete suggestions.
>>>>
>>>> 1) process groups and sessions don't change as frequently as pids.
>>>>
>>>> 2) It is possible to put a set of processes in their own network
>>>>    namespace and pipe just the packets you want those processes to
>>>>    use into that network namespace. Using an ingress queueing filter
>>>>    makes that process very efficient even if you have to filter by
>>>>    port.
>>>
>>> Actually, in our case we're filtering outgoing traffic based on which
>>> local socket it originated from, so you wouldn't need all of that
>>> construct. Also, you wouldn't even need a priori knowledge of the
>>> application internals regarding their use of particular ports or
>>> protocols. I don't think that such a setup will have the same
>>> efficiency, ease of use, and power to distinguish the application the
>>> traffic came from in such a lightweight, protocol-independent and
>>> easy way.
>>
>> Sorry for being late as well (and also for the stupid question).
>>
>> Couldn't you use something from the LSM? I mean, you allow the
>> application to create the socket etc. and then later block
>> the traffic originating from that socket. Wouldn't it make
>> more sense to block early?
>
> I gave one simple example for blocking in the commit message,
> that's true, but it is not limited to that. Netfilter allows many
> more scenarios/policies than just blocking, e.g. fine-grained
> settings for where applications are allowed to connect/send traffic
> to, application traffic marking/conntracking, application-specific
> packet mangling, and so on; just think of the whole netfilter
> universe.

Oh, I didn't pay enough attention to the commit message. Sorry about
that. Obviously, if fine-grained settings are a must, then blocking the
write is not good enough.

cheers,
daniel
[parent not found: <cover.1382101225.git.dborkman@redhat.com>]
* [PATCH nf-next] netfilter: xtables: lightweight process control group matching [not found] <cover.1382101225.git.dborkman@redhat.com> @ 2013-10-18 13:28 ` Daniel Borkmann [not found] ` <ee0fb538d6e43e23d0488d3edd741de9c4589fb1.1382101225.git.dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Daniel Borkmann @ 2013-10-18 13:28 UTC (permalink / raw) To: pablo; +Cc: netfilter-devel, netdev, Tejun Heo, cgroups It would be useful e.g. in a server or desktop environment to have a facility in the notion of fine-grained "per application" or "per application group" firewall policies. Probably, users in the mobile/ embedded area (e.g. Android based) with different security policy requirements for application groups could have great benefit from that as well. For example, with a little bit of configuration effort, an admin could whitelist well-known applications, and thus block otherwise unwanted "hard-to-track" applications like [1] from a user's machine. Implementation of PID-based matching would not be appropriate as they frequently change, and child tracking would make that even more complex and ugly. Cgroups would be a perfect candidate for accomplishing that as they associate a set of tasks with a set of parameters for one or more subsystems, in our case the netfilter subsystem, which, of course, can be combined with other cgroup subsystems into something more complex. As mentioned, to overcome this constraint, such processes could be placed into one or multiple cgroups where different fine-grained rules can be defined depending on the application scenario, while e.g. everything else that is not part of that could be dropped (or vice versa), thus making life harder for unwanted processes to communicate to the outside world. So, we make use of cgroups here to track jobs and limit their resources in terms of iptables policies; in other words, limiting what they are allowed to communicate. 
Minimal, basic usage example (many other iptables options can be applied obviously): 1) Configuring cgroups: mkdir /sys/fs/cgroup/net_filter mount -t cgroup -o net_filter net_filter /sys/fs/cgroup/net_filter mkdir /sys/fs/cgroup/net_filter/0 echo 1 > /sys/fs/cgroup/net_filter/0/net_filter.fwid 2) Configuring netfilter: iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP 3) Running applications: ping 208.67.222.222 <pid:1799> echo 1799 > /sys/fs/cgroup/net_filter/0/tasks 64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms ... ping 208.67.220.220 <pid:1804> ping: sendmsg: Operation not permitted ... echo 1804 > /sys/fs/cgroup/net_filter/0/tasks 64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms ... Of course, real-world deployments would make use of cgroups user space toolsuite, or own custom policy daemons dynamically moving applications from/to various net_filter cgroups. Design considerations appendix: Based on the discussion from [2], [3], it seems the best tradeoff imho to make this a subsystem, here's why: netfilter is a large enough and ubiquitous subsystem, meaning it is not somewhere in a niche, and enabled/shipped on most machines. It is true that the descision making on fwid is "outsourced" to netfilter itself, but that does not necessarily need to be considered as a bad thing to delegate and reuse as much as possible. The matching performance in the critical path is just a simple comparison of fwid tags, nothing more, thus resulting in a good performance suited for high-speed networking. Moreover, by simply transfering fwids between user- and kernel space, we can have the ruleset as packed as possible, giving an optimal footprint for large rulesets using this feature. 
The alternative draft that we have proposed in [3] comes at the
cost of exposing some of the cgroups internals outside of cgroups
to make it work, a higher memory footprint for the transfer of
rules and, even worse, lower performance, as more work needs to be
done in the matching critical path, namely traversing all cgroups
a task belongs to in order to find the one of interest. Moreover,
from the usability point of view, it seems less intuitive and rather
more confusing than the approach presented here. Therefore, I consider
this design the better and less intrusive tradeoff to go with.

  [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf
  [2] http://patchwork.ozlabs.org/patch/280687/
  [3] http://patchwork.ozlabs.org/patch/282477/

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
---
 v1->v2:
  - Updated commit message, rebased
  - Applied Gao Feng's feedback from [2]

 Note: iptables part is still available in
 http://patchwork.ozlabs.org/patch/280690/

 Documentation/cgroups/00-INDEX           |   2 +
 Documentation/cgroups/net_filter.txt     |  27 +++++
 include/linux/cgroup_subsys.h            |   5 +
 include/net/netfilter/xt_cgroup.h        |  58 ++++++++++
 include/net/sock.h                       |   3 +
 include/uapi/linux/netfilter/Kbuild      |   1 +
 include/uapi/linux/netfilter/xt_cgroup.h |  11 ++
 net/core/scm.c                           |   2 +
 net/core/sock.c                          |  14 +++
 net/netfilter/Kconfig                    |   8 ++
 net/netfilter/Makefile                   |   1 +
 net/netfilter/xt_cgroup.c                | 177 +++++++++++++++++++++++++++++++
 12 files changed, 309 insertions(+)
 create mode 100644 Documentation/cgroups/net_filter.txt
 create mode 100644 include/net/netfilter/xt_cgroup.h
 create mode 100644 include/uapi/linux/netfilter/xt_cgroup.h
 create mode 100644 net/netfilter/xt_cgroup.c

diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroups/00-INDEX
index bc461b6..14424d2 100644
--- a/Documentation/cgroups/00-INDEX
+++ b/Documentation/cgroups/00-INDEX
@@ -20,6 +20,8 @@ memory.txt
 	- Memory Resource Controller; design, accounting, interface, testing.
 net_cls.txt
 	- Network classifier cgroups details and usages.
+net_filter.txt
+	- Network firewalling (netfilter) cgroups details and usages.
 net_prio.txt
 	- Network priority cgroups details and usages.
 resource_counter.txt
diff --git a/Documentation/cgroups/net_filter.txt b/Documentation/cgroups/net_filter.txt
new file mode 100644
index 0000000..22759e4
--- /dev/null
+++ b/Documentation/cgroups/net_filter.txt
@@ -0,0 +1,27 @@
+Netfilter cgroup
+----------------
+
+The netfilter cgroup provides an interface to aggregate jobs
+to a particular netfilter tag, that can be used to apply
+various iptables/netfilter policies for those jobs in order
+to limit resources/abilities for network communication.
+
+Creating a net_filter cgroups instance creates a net_filter.fwid
+file. The value of net_filter.fwid is initialized to 0 on
+default (so only global iptables/netfilter policies apply).
+You can write a unique decimal fwid tag into net_filter.fwid
+file, and use that tag along with iptables' --cgroup option.
+
+Minimal/basic usage example:
+
+1) Configuring cgroup:
+
+	mkdir /sys/fs/cgroup/net_filter
+	mount -t cgroup -o net_filter net_filter /sys/fs/cgroup/net_filter
+	mkdir /sys/fs/cgroup/net_filter/0
+	echo 1 > /sys/fs/cgroup/net_filter/0/net_filter.fwid
+	echo [pid] > /sys/fs/cgroup/net_filter/0/tasks
+
+2) Configuring netfilter:
+
+	iptables -A OUTPUT -m cgroup ! --cgroup 1 -p tcp --dport 80 -j DROP
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index b613ffd..ef58217 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -50,6 +50,11 @@ SUBSYS(net_prio)
 #if IS_SUBSYS_ENABLED(CONFIG_CGROUP_HUGETLB)
 SUBSYS(hugetlb)
 #endif
+
+#if IS_SUBSYS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP)
+SUBSYS(net_filter)
+#endif
+
 /*
  * DO NOT ADD ANY SUBSYSTEM WITHOUT EXPLICIT ACKS FROM CGROUP MAINTAINERS.
  */
diff --git a/include/net/netfilter/xt_cgroup.h b/include/net/netfilter/xt_cgroup.h
new file mode 100644
index 0000000..b2c702f
--- /dev/null
+++ b/include/net/netfilter/xt_cgroup.h
@@ -0,0 +1,58 @@
+#ifndef _XT_CGROUP_H
+#define _XT_CGROUP_H
+
+#include <linux/types.h>
+#include <linux/cgroup.h>
+#include <linux/hardirq.h>
+#include <linux/rcupdate.h>
+
+#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP)
+struct cgroup_nf_state {
+	struct cgroup_subsys_state css;
+	u32 fwid;
+};
+
+void sock_update_fwid(struct sock *sk);
+
+#if IS_BUILTIN(CONFIG_NETFILTER_XT_MATCH_CGROUP)
+static inline u32 task_fwid(struct task_struct *p)
+{
+	u32 fwid;
+
+	if (in_interrupt())
+		return 0;
+
+	rcu_read_lock();
+	fwid = container_of(task_css(p, net_filter_subsys_id),
+			    struct cgroup_nf_state, css)->fwid;
+	rcu_read_unlock();
+
+	return fwid;
+}
+#elif IS_MODULE(CONFIG_NETFILTER_XT_MATCH_CGROUP)
+static inline u32 task_fwid(struct task_struct *p)
+{
+	struct cgroup_subsys_state *css;
+	u32 fwid = 0;
+
+	if (in_interrupt())
+		return 0;
+
+	rcu_read_lock();
+	css = task_css(p, net_filter_subsys_id);
+	if (css)
+		fwid = container_of(css, struct cgroup_nf_state, css)->fwid;
+	rcu_read_unlock();
+
+	return fwid;
+}
+#endif
+#else /* !CONFIG_NETFILTER_XT_MATCH_CGROUP */
+static inline u32 task_fwid(struct task_struct *p)
+{
+	return 0;
+}
+
+#define sock_update_fwid(sk)
+#endif /* CONFIG_NETFILTER_XT_MATCH_CGROUP */
+#endif /* _XT_CGROUP_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index e3bf213..f7da4b4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -387,6 +387,9 @@ struct sock {
 #if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
 	__u32			sk_cgrp_prioidx;
 #endif
+#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP)
+	__u32			sk_cgrp_fwid;
+#endif
 	struct pid		*sk_peer_pid;
 	const struct cred	*sk_peer_cred;
 	long			sk_rcvtimeo;
diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild
index 1749154..94a4890 100644
--- a/include/uapi/linux/netfilter/Kbuild
+++ b/include/uapi/linux/netfilter/Kbuild
@@ -37,6 +37,7 @@ header-y += xt_TEE.h
 header-y += xt_TPROXY.h
 header-y += xt_addrtype.h
 header-y += xt_bpf.h
+header-y += xt_cgroup.h
 header-y += xt_cluster.h
 header-y += xt_comment.h
 header-y += xt_connbytes.h
diff --git a/include/uapi/linux/netfilter/xt_cgroup.h b/include/uapi/linux/netfilter/xt_cgroup.h
new file mode 100644
index 0000000..43acb7e
--- /dev/null
+++ b/include/uapi/linux/netfilter/xt_cgroup.h
@@ -0,0 +1,11 @@
+#ifndef _UAPI_XT_CGROUP_H
+#define _UAPI_XT_CGROUP_H
+
+#include <linux/types.h>
+
+struct xt_cgroup_info {
+	__u32 id;
+	__u32 invert;
+};
+
+#endif /* _UAPI_XT_CGROUP_H */
diff --git a/net/core/scm.c b/net/core/scm.c
index b442e7e..f08672a 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -36,6 +36,7 @@
 #include <net/sock.h>
 #include <net/compat.h>
 #include <net/scm.h>
+#include <net/netfilter/xt_cgroup.h>
 #include <net/cls_cgroup.h>
 
 
@@ -290,6 +291,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
 		/* Bump the usage count and install the file. */
 		sock = sock_from_file(fp[i], &err);
 		if (sock) {
+			sock_update_fwid(sock->sk);
 			sock_update_netprioidx(sock->sk);
 			sock_update_classid(sock->sk);
 		}
diff --git a/net/core/sock.c b/net/core/sock.c
index 2bd9b3f..524a376 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -125,6 +125,7 @@
 #include <linux/skbuff.h>
 #include <net/net_namespace.h>
 #include <net/request_sock.h>
+#include <net/netfilter/xt_cgroup.h>
 #include <net/sock.h>
 #include <linux/net_tstamp.h>
 #include <net/xfrm.h>
@@ -1337,6 +1338,18 @@ void sock_update_netprioidx(struct sock *sk)
 EXPORT_SYMBOL_GPL(sock_update_netprioidx);
 #endif
 
+#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP)
+void sock_update_fwid(struct sock *sk)
+{
+	u32 fwid;
+
+	fwid = task_fwid(current);
+	if (fwid != sk->sk_cgrp_fwid)
+		sk->sk_cgrp_fwid = fwid;
+}
+EXPORT_SYMBOL(sock_update_fwid);
+#endif
+
 /**
  *	sk_alloc - All socket objects are allocated here
  *	@net: the applicable net namespace
@@ -1363,6 +1376,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 
 		sock_update_classid(sk);
 		sock_update_netprioidx(sk);
+		sock_update_fwid(sk);
 	}
 
 	return sk;
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 6e839b6..d276ff4 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -806,6 +806,14 @@ config NETFILTER_XT_MATCH_BPF
 
 	  To compile it as a module, choose M here. If unsure, say N.
 
+config NETFILTER_XT_MATCH_CGROUP
+	tristate '"control group" match support'
+	depends on NETFILTER_ADVANCED
+	depends on CGROUPS
+	---help---
+	Socket/process control group matching allows you to match locally
+	generated packets based on which control group processes belong to.
+
 config NETFILTER_XT_MATCH_CLUSTER
 	tristate '"cluster" match support'
 	depends on NF_CONNTRACK
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index c3a0a12..12f014f 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_NFACCT) += xt_nfacct.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_OSF) += xt_osf.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_CGROUP) += xt_cgroup.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PHYSDEV) += xt_physdev.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
new file mode 100644
index 0000000..249c7ee
--- /dev/null
+++ b/net/netfilter/xt_cgroup.c
@@ -0,0 +1,177 @@
+/*
+ * Xtables module to match the process control group.
+ *
+ * Might be used to implement individual "per-application" firewall
+ * policies in contrast to global policies based on control groups.
+ *
+ * (C) 2013 Daniel Borkmann <dborkman@redhat.com>
+ * (C) 2013 Thomas Graf <tgraf@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/skbuff.h>
+#include <linux/module.h>
+#include <linux/file.h>
+#include <linux/cgroup.h>
+#include <linux/fdtable.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_cgroup.h>
+#include <net/netfilter/xt_cgroup.h>
+#include <net/sock.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
+MODULE_DESCRIPTION("Xtables: process control group matching");
+MODULE_ALIAS("ipt_cgroup");
+MODULE_ALIAS("ip6t_cgroup");
+
+static int cgroup_mt_check(const struct xt_mtchk_param *par)
+{
+	struct xt_cgroup_info *info = par->matchinfo;
+
+	if (info->invert & ~1)
+		return -EINVAL;
+
+	return info->id ? 0 : -EINVAL;
+}
+
+static bool
+cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	const struct xt_cgroup_info *info = par->matchinfo;
+
+	if (skb->sk == NULL)
+		return false;
+
+	return (info->id == skb->sk->sk_cgrp_fwid) ^ info->invert;
+}
+
+static struct xt_match cgroup_mt_reg __read_mostly = {
+	.name		= "cgroup",
+	.revision	= 0,
+	.family		= NFPROTO_UNSPEC,
+	.checkentry	= cgroup_mt_check,
+	.match		= cgroup_mt,
+	.matchsize	= sizeof(struct xt_cgroup_info),
+	.me		= THIS_MODULE,
+	.hooks		= (1 << NF_INET_LOCAL_OUT) |
+			  (1 << NF_INET_POST_ROUTING),
+};
+
+static inline struct cgroup_nf_state *
+css_nf_state(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct cgroup_nf_state, css) : NULL;
+}
+
+static struct cgroup_subsys_state *
+cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct cgroup_nf_state *cs;
+
+	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
+	if (!cs)
+		return ERR_PTR(-ENOMEM);
+
+	return &cs->css;
+}
+
+static int cgroup_css_online(struct cgroup_subsys_state *css)
+{
+	struct cgroup_nf_state *cs = css_nf_state(css);
+	struct cgroup_nf_state *parent = css_nf_state(css_parent(css));
+
+	if (parent)
+		cs->fwid = parent->fwid;
+
+	return 0;
+}
+
+static void cgroup_css_free(struct cgroup_subsys_state *css)
+{
+	kfree(css_nf_state(css));
+}
+
+static int cgroup_fwid_update(const void *v, struct file *file, unsigned n)
+{
+	int err;
+	struct socket *sock = sock_from_file(file, &err);
+
+	if (sock)
+		sock->sk->sk_cgrp_fwid = (u32)(unsigned long) v;
+
+	return 0;
+}
+
+static u64 cgroup_fwid_read(struct cgroup_subsys_state *css,
+			    struct cftype *cft)
+{
+	return css_nf_state(css)->fwid;
+}
+
+static int cgroup_fwid_write(struct cgroup_subsys_state *css,
+			     struct cftype *cft, u64 id)
+{
+	css_nf_state(css)->fwid = (u32) id;
+
+	return 0;
+}
+
+static void cgroup_attach(struct cgroup_subsys_state *css,
+			  struct cgroup_taskset *tset)
+{
+	struct cgroup_nf_state *cs = css_nf_state(css);
+	void *v = (void *)(unsigned long) cs->fwid;
+	struct task_struct *p;
+
+	cgroup_taskset_for_each(p, css, tset) {
+		task_lock(p);
+		iterate_fd(p->files, 0, cgroup_fwid_update, v);
+		task_unlock(p);
+	}
+}
+
+static struct cftype net_filter_ss_files[] = {
+	{
+		.name		= "fwid",
+		.read_u64	= cgroup_fwid_read,
+		.write_u64	= cgroup_fwid_write,
+	},
+	{ }
+};
+
+struct cgroup_subsys net_filter_subsys = {
+	.name		= "net_filter",
+	.css_alloc	= cgroup_css_alloc,
+	.css_online	= cgroup_css_online,
+	.css_free	= cgroup_css_free,
+	.attach		= cgroup_attach,
+	.subsys_id	= net_filter_subsys_id,
+	.base_cftypes	= net_filter_ss_files,
+	.module		= THIS_MODULE,
+};
+
+static int __init cgroup_mt_init(void)
+{
+	int ret = cgroup_load_subsys(&net_filter_subsys);
+	if (ret)
+		goto out;
+
+	ret = xt_register_match(&cgroup_mt_reg);
+	if (ret)
+		cgroup_unload_subsys(&net_filter_subsys);
+out:
+	return ret;
+}
+
+static void __exit cgroup_mt_exit(void)
+{
+	xt_unregister_match(&cgroup_mt_reg);
+	cgroup_unload_subsys(&net_filter_subsys);
+}
+
+module_init(cgroup_mt_init);
+module_exit(cgroup_mt_exit);
--
1.8.3.1
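
The match semantics of cgroup_mt_check() and cgroup_mt() in the patch
above can be modelled in plain userspace C. This is only an illustrative
sketch under stated assumptions: struct sketch_sock, struct
sketch_match_info, sketch_rule_valid() and sketch_match() are
hypothetical stand-ins and not kernel types or functions; only the
comparison logic is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one struct sock field the match reads. */
struct sketch_sock {
	uint32_t sk_cgrp_fwid;
};

/* Hypothetical stand-in with the same two fields as struct xt_cgroup_info. */
struct sketch_match_info {
	uint32_t id;
	uint32_t invert;
};

/* Same checks as cgroup_mt_check(): invert is 0 or 1, id is non-zero. */
static bool sketch_rule_valid(const struct sketch_match_info *info)
{
	if (info->invert & ~1u)
		return false;

	return info->id != 0;
}

/*
 * Same decision as cgroup_mt(): a packet without a socket never matches,
 * otherwise the tag comparison is XORed with the invert flag.
 */
static bool sketch_match(const struct sketch_sock *sk,
			 const struct sketch_match_info *info)
{
	if (sk == NULL)
		return false;

	return (info->id == sk->sk_cgrp_fwid) ^ info->invert;
}
```

With the rule from the usage example (`! --cgroup 1 -j DROP`, that is
id = 1 and invert = 1), a socket tagged with fwid 1 does not match and
is therefore not dropped, while a socket carrying the default fwid 0
matches. Packets without an attached socket never match, so as far as
the posted code shows, the inverted rule would not drop them either.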
* Re: [PATCH nf-next] netfilter: xtables: lightweight process control group matching
@ 2013-11-05 13:03 Daniel Borkmann
From: Daniel Borkmann @ 2013-11-05 13:03 UTC
To: pablo
Cc: netfilter-devel, netdev, Tejun Heo, cgroups

On 10/18/2013 03:28 PM, Daniel Borkmann wrote:
> It would be useful e.g. in a server or desktop environment to have
> a facility in the notion of fine-grained "per application" or "per
> application group" firewall policies. Probably, users in the mobile/
> embedded area (e.g. Android based) with different security policy
> requirements for application groups could have great benefit from
> that as well. For example, with a little bit of configuration effort,
> an admin could whitelist well-known applications, and thus block
> otherwise unwanted "hard-to-track" applications like [1] from a
> user's machine.
>
> Implementation of PID-based matching would not be appropriate
> as they frequently change, and child tracking would make that
> even more complex and ugly. Cgroups would be a perfect candidate
> for accomplishing that as they associate a set of tasks with a
> set of parameters for one or more subsystems, in our case the
> netfilter subsystem, which, of course, can be combined with other
> cgroup subsystems into something more complex.
>
> As mentioned, to overcome this constraint, such processes could
> be placed into one or multiple cgroups where different fine-grained
> rules can be defined depending on the application scenario, while
> e.g. everything else that is not part of that could be dropped (or
> vice versa), thus making life harder for unwanted processes to
> communicate to the outside world. So, we make use of cgroups here
> to track jobs and limit their resources in terms of iptables
> policies; in other words, limiting what they are allowed to
> communicate.
>
> Minimal, basic usage example (many other iptables options can be
> applied obviously):
>
>  1) Configuring cgroups:
>
>   mkdir /sys/fs/cgroup/net_filter
>   mount -t cgroup -o net_filter net_filter /sys/fs/cgroup/net_filter
>   mkdir /sys/fs/cgroup/net_filter/0
>   echo 1 > /sys/fs/cgroup/net_filter/0/net_filter.fwid
>
>  2) Configuring netfilter:
>
>   iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP
>
>  3) Running applications:
>
>   ping 208.67.222.222  <pid:1799>
>   echo 1799 > /sys/fs/cgroup/net_filter/0/tasks
>   64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
>   ...
>
>   ping 208.67.220.220  <pid:1804>
>   ping: sendmsg: Operation not permitted
>   ...
>   echo 1804 > /sys/fs/cgroup/net_filter/0/tasks
>   64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
>   ...
>
> Of course, real-world deployments would make use of cgroups user
> space toolsuite, or own custom policy daemons dynamically moving
> applications from/to various net_filter cgroups.
>
> Design considerations appendix:
>
> Based on the discussion from [2], [3], it seems the best tradeoff
> imho to make this a subsystem, here's why:
>
> netfilter is a large enough and ubiquitous subsystem, meaning it
> is not somewhere in a niche, and enabled/shipped on most machines.
> It is true that the decision making on fwid is "outsourced" to
> netfilter itself, but that does not necessarily need to be
> considered as a bad thing to delegate and reuse as much as possible.
> The matching performance in the critical path is just a simple
> comparison of fwid tags, nothing more, thus resulting in a good
> performance suited for high-speed networking. Moreover, by simply
> transferring fwids between user- and kernel space, we can have the
> ruleset as packed as possible, giving an optimal footprint for
> large rulesets using this feature. The alternative draft that we
> have proposed in [3] comes at the cost of exposing some of the
> cgroups internals outside of cgroups to make it work, at least a
> higher memory footprint for transferal of rules and even worse a
> lower performance as more work needs to be done in the matching
> critical path, that is traversing all cgroups a task belongs to
> to find the one of our interest. Moreover, from the usability
> point of view, it seems less intuitive, rather more confusing
> than the approach presented here. Therefore, I consider this design
> the better and less intrusive tradeoff to go with.

As I've provided a code proposal for both variants and a design
discussion/conclusion, are you d'accord with this patch, Tejun?

>   [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf
>   [2] http://patchwork.ozlabs.org/patch/280687/
>   [3] http://patchwork.ozlabs.org/patch/282477/
>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: cgroups@vger.kernel.org
> ---
>  v1->v2:
>   - Updated commit message, rebased
>   - Applied Gao Feng's feedback from [2]
>
>  Note: iptables part is still available in
>  http://patchwork.ozlabs.org/patch/280690/
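
How a socket ends up carrying a fwid in the proposed design can be
modelled in userspace as well. The sketch below is an assumption-laden
illustration, not kernel code: struct model_cgroup, struct model_sock,
struct model_task and the model_*() helpers are hypothetical names that
loosely mirror cgroup_css_online(), cgroup_fwid_write(),
sock_update_fwid() as called from sk_alloc(), and cgroup_attach() from
the patch under discussion. Locking, RCU, the in_interrupt() check and
fd passing via SCM_RIGHTS are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_FDS 8

/* Hypothetical model of a net_filter cgroup: just the fwid tag. */
struct model_cgroup {
	uint32_t fwid;
};

/* Hypothetical model of a socket: just the cached tag. */
struct model_sock {
	uint32_t sk_cgrp_fwid;
};

/* Hypothetical model of a task: its cgroup and its open sockets. */
struct model_task {
	struct model_cgroup *cgrp;
	struct model_sock *fds[MODEL_MAX_FDS];	/* NULL: no socket here */
};

/* Like cgroup_css_online(): a child starts with its parent's fwid. */
static void model_cgroup_online(struct model_cgroup *cg,
				const struct model_cgroup *parent)
{
	cg->fwid = parent ? parent->fwid : 0;
}

/* Like cgroup_fwid_write(): only the cgroup state changes. */
static void model_write_fwid(struct model_cgroup *cg, uint32_t id)
{
	cg->fwid = id;
}

/* Like sock_update_fwid() from sk_alloc(): tag with the creator's fwid. */
static void model_sock_create(struct model_sock *sk,
			      const struct model_task *creator)
{
	sk->sk_cgrp_fwid = creator->cgrp->fwid;
}

/* Like cgroup_attach(): retag every socket the moved task holds. */
static void model_attach(struct model_task *t, struct model_cgroup *dst)
{
	size_t i;

	t->cgrp = dst;
	for (i = 0; i < MODEL_MAX_FDS; i++)
		if (t->fds[i])
			t->fds[i]->sk_cgrp_fwid = dst->fwid;
}
```

One consequence visible in the model, and apparently in the posted code
as well: writing a new value to net_filter.fwid only changes the cgroup
state. Sockets that are already open keep their old tag until the
owning task is attached again, whereas sockets created afterwards pick
up the new value.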