netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/12] netfilter updates for net-next
@ 2014-01-05 23:12 Pablo Neira Ayuso
  2014-01-05 23:12 ` [PATCH 01/12] netfilter: avoid get_random_bytes calls Pablo Neira Ayuso
                   ` (11 more replies)
  0 siblings, 12 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

[ forgot to attach the pull request to this email and cc netdev, resending ]

Hi David,

The following patchset contains Netfilter updates for your net-next tree,
they are:

* Add full port randomization support. Some crazy researchers found a way
  to reconstruct the secure ephemeral ports that are allocated in random mode
  by sending off-path bursts of UDP packets to overrun the socket buffer of
  the DNS resolver to trigger retransmissions, then if the timing for the
  DNS resolution done by a client is larger than usual, then they conclude
  that the port that received the burst of UDP packets is the one that was
  opened. It seems a bit aggressive method to me but it seems to work for
  them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
  new NAT mode to fully randomize ports using prandom.

* Add a new classifier to x_tables based on the socket net_cls set via
  cgroups. These includes two patches to prepare the field as requested by
  Zefan Li. Also from Daniel Borkmann.

* Use prandom instead of get_random_bytes in several locations of the
  netfilter code, from Florian Westphal.

* Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
  mark, also from Florian Westphal.

* Fix compilation warning due to unused variable in IPVS, from Geert
  Uytterhoeven.

* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.

* Add IPComp extension to x_tables, from Fan Du.

* Several patches to remove dead code, by Stephen Hemminger.

* Reorder netns structure for conntrack, based on original patch from Eric
  Dumazet, from Jesper D. Brouer.

You can pull these changes from:

 git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master

Thanks!

----------------------------------------------------------------

The following changes since commit 68536053600425c24aba031c45f053d447eedd9c:

  ipv6: fix incorrect type in declaration (2013-12-12 16:14:09 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master

for you to fetch changes up to 82a37132f300ea53bdcd812917af5a6329ec80c3:

  netfilter: x_tables: lightweight process control group matching (2014-01-03 23:41:44 +0100)

----------------------------------------------------------------
Daniel Borkmann (4):
      netfilter: nf_nat: add full port randomization support
      net: net_cls: move cgroupfs classid handling into core
      net: netprio: rename config to be more consistent with cgroup configs
      netfilter: x_tables: lightweight process control group matching

Eric Leblond (1):
      netfilter: xt_CT: fix error value in xt_ct_tg_check()

Florian Westphal (2):
      netfilter: avoid get_random_bytes calls
      netfilter: ctnetlink: honor CTA_MARK_MASK when setting ctmark

Geert Uytterhoeven (1):
      ipvs: Remove unused variable ret from sync_thread_master()

Jesper Dangaard Brouer (1):
      net: reorder struct netns_ct for better cache-line usage

Valentina Giusti (1):
      netfilter: nfnetlink_queue: enable UID/GID socket info retrieval

fan.du (1):
      netfilter: add IPv4/6 IPComp extension match support

stephen hemminger (2):
      netfilter: ipset: remove unused code
      netfilter: nf_conntrack: remove dead code

 Documentation/cgroups/net_cls.txt              |    5 +
 include/linux/cgroup_subsys.h                  |    4 +-
 include/linux/netdevice.h                      |    2 +-
 include/linux/netfilter/ipset/ip_set.h         |    1 -
 include/net/cls_cgroup.h                       |   40 +++-----
 include/net/netfilter/ipv4/nf_conntrack_ipv4.h |    2 -
 include/net/netfilter/nf_conntrack_l3proto.h   |    1 -
 include/net/netns/conntrack.h                  |   33 +++----
 include/net/netprio_cgroup.h                   |   18 ++--
 include/net/sock.h                             |    2 +-
 include/uapi/linux/netfilter/Kbuild            |    2 +
 include/uapi/linux/netfilter/nf_nat.h          |   12 ++-
 include/uapi/linux/netfilter/nfnetlink_queue.h |    5 +-
 include/uapi/linux/netfilter/xt_cgroup.h       |   11 +++
 include/uapi/linux/netfilter/xt_ipcomp.h       |   16 ++++
 net/Kconfig                                    |   11 ++-
 net/core/Makefile                              |    3 +-
 net/core/dev.c                                 |    2 +-
 net/core/netclassid_cgroup.c                   |  120 ++++++++++++++++++++++++
 net/core/sock.c                                |   14 +--
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |    6 --
 net/netfilter/Kconfig                          |   19 ++++
 net/netfilter/Makefile                         |    2 +
 net/netfilter/ipset/ip_set_core.c              |   28 ------
 net/netfilter/ipvs/ip_vs_sync.c                |    5 +-
 net/netfilter/nf_conntrack_core.c              |   15 ---
 net/netfilter/nf_conntrack_netlink.c           |   12 ++-
 net/netfilter/nf_conntrack_proto.c             |    6 --
 net/netfilter/nf_nat_core.c                    |    4 +-
 net/netfilter/nf_nat_proto_common.c            |   10 +-
 net/netfilter/nfnetlink_log.c                  |    8 --
 net/netfilter/nfnetlink_queue_core.c           |   34 +++++++
 net/netfilter/nft_hash.c                       |    2 +-
 net/netfilter/xt_CT.c                          |    4 +-
 net/netfilter/xt_RATEEST.c                     |    2 +-
 net/netfilter/xt_cgroup.c                      |   71 ++++++++++++++
 net/netfilter/xt_connlimit.c                   |    2 +-
 net/netfilter/xt_hashlimit.c                   |    2 +-
 net/netfilter/xt_ipcomp.c                      |  111 ++++++++++++++++++++++
 net/netfilter/xt_recent.c                      |    2 +-
 net/sched/Kconfig                              |    1 +
 net/sched/cls_cgroup.c                         |  111 +---------------------
 42 files changed, 487 insertions(+), 274 deletions(-)
 create mode 100644 include/uapi/linux/netfilter/xt_cgroup.h
 create mode 100644 include/uapi/linux/netfilter/xt_ipcomp.h
 create mode 100644 net/core/netclassid_cgroup.c
 create mode 100644 net/netfilter/xt_cgroup.c
 create mode 100644 net/netfilter/xt_ipcomp.c

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 01/12] netfilter: avoid get_random_bytes calls
  2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
@ 2014-01-05 23:12 ` Pablo Neira Ayuso
  2014-01-05 23:41   ` Hannes Frederic Sowa
  2014-01-05 23:12 ` [PATCH 02/12] netfilter: ctnetlink: honor CTA_MARK_MASK when setting ctmark Pablo Neira Ayuso
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Florian Westphal <fw@strlen.de>

All these users need an initial seed value for jhash, prandom is
perfectly fine.  This avoids draining the entropy pool where
its not strictly required.

nfnetlink_log did not use the random value at all.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nfnetlink_log.c |    8 --------
 net/netfilter/nft_hash.c      |    2 +-
 net/netfilter/xt_RATEEST.c    |    2 +-
 net/netfilter/xt_connlimit.c  |    2 +-
 net/netfilter/xt_hashlimit.c  |    2 +-
 net/netfilter/xt_recent.c     |    2 +-
 6 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 3c4b69e..7d4254b 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -28,8 +28,6 @@
 #include <linux/proc_fs.h>
 #include <linux/security.h>
 #include <linux/list.h>
-#include <linux/jhash.h>
-#include <linux/random.h>
 #include <linux/slab.h>
 #include <net/sock.h>
 #include <net/netfilter/nf_log.h>
@@ -75,7 +73,6 @@ struct nfulnl_instance {
 };
 
 #define INSTANCE_BUCKETS	16
-static unsigned int hash_init;
 
 static int nfnl_log_net_id __read_mostly;
 
@@ -1066,11 +1063,6 @@ static int __init nfnetlink_log_init(void)
 {
 	int status = -ENOMEM;
 
-	/* it's not really all that important to have a random value, so
-	 * we can do this from the init function, even if there hasn't
-	 * been that much entropy yet */
-	get_random_bytes(&hash_init, sizeof(hash_init));
-
 	netlink_register_notifier(&nfulnl_rtnl_notifier);
 	status = nfnetlink_subsys_register(&nfulnl_subsys);
 	if (status < 0) {
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 3d3f8fc..6aae699 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -164,7 +164,7 @@ static int nft_hash_init(const struct nft_set *set,
 	unsigned int cnt, i;
 
 	if (unlikely(!nft_hash_rnd_initted)) {
-		get_random_bytes(&nft_hash_rnd, 4);
+		nft_hash_rnd = prandom_u32();
 		nft_hash_rnd_initted = true;
 	}
 
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index 370adf6..190854b 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -100,7 +100,7 @@ static int xt_rateest_tg_checkentry(const struct xt_tgchk_param *par)
 	int ret;
 
 	if (unlikely(!rnd_inited)) {
-		get_random_bytes(&jhash_rnd, sizeof(jhash_rnd));
+		jhash_rnd = prandom_u32();
 		rnd_inited = true;
 	}
 
diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index c40b269..7671e82 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -229,7 +229,7 @@ static int connlimit_mt_check(const struct xt_mtchk_param *par)
 		u_int32_t rand;
 
 		do {
-			get_random_bytes(&rand, sizeof(rand));
+			rand = prandom_u32();
 		} while (!rand);
 		cmpxchg(&connlimit_rnd, 0, rand);
 	}
diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 9ff035c..a83a35c 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -177,7 +177,7 @@ dsthash_alloc_init(struct xt_hashlimit_htable *ht,
 	/* initialize hash with random val at the time we allocate
 	 * the first hashtable entry */
 	if (unlikely(!ht->rnd_initialized)) {
-		get_random_bytes(&ht->rnd, sizeof(ht->rnd));
+		ht->rnd = prandom_u32();
 		ht->rnd_initialized = true;
 	}
 
diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
index 1e657cf..bfdc29f 100644
--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -334,7 +334,7 @@ static int recent_mt_check(const struct xt_mtchk_param *par,
 	size_t sz;
 
 	if (unlikely(!hash_rnd_inited)) {
-		get_random_bytes(&hash_rnd, sizeof(hash_rnd));
+		hash_rnd = prandom_u32();
 		hash_rnd_inited = true;
 	}
 	if (info->check_set & ~XT_RECENT_VALID_FLAGS) {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 02/12] netfilter: ctnetlink: honor CTA_MARK_MASK when setting ctmark
  2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
  2014-01-05 23:12 ` [PATCH 01/12] netfilter: avoid get_random_bytes calls Pablo Neira Ayuso
@ 2014-01-05 23:12 ` Pablo Neira Ayuso
  2014-01-05 23:12 ` [PATCH 03/12] netfilter: nfnetlink_queue: enable UID/GID socket info retrieval Pablo Neira Ayuso
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Florian Westphal <fw@strlen.de>

Useful to only set a particular range of the conntrack mark while
leaving exisiting parts of the value alone, e.g. when setting
conntrack marks via NFQUEUE.

Follows same scheme as MARK/CONNMARK targets, i.e. the mask defines
those bits that should be altered.  No mask is equal to '~0', ie.
the old value is replaced by new one.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_netlink.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 08870b8..bb322d0 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -2118,8 +2118,16 @@ ctnetlink_nfqueue_parse_ct(const struct nlattr *cda[], struct nf_conn *ct)
 			return err;
 	}
 #if defined(CONFIG_NF_CONNTRACK_MARK)
-	if (cda[CTA_MARK])
-		ct->mark = ntohl(nla_get_be32(cda[CTA_MARK]));
+	if (cda[CTA_MARK]) {
+		u32 mask = 0, mark, newmark;
+		if (cda[CTA_MARK_MASK])
+			mask = ~ntohl(nla_get_be32(cda[CTA_MARK_MASK]));
+
+		mark = ntohl(nla_get_be32(cda[CTA_MARK]));
+		newmark = (ct->mark & mask) ^ mark;
+		if (newmark != ct->mark)
+			ct->mark = newmark;
+	}
 #endif
 	return 0;
 }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 03/12] netfilter: nfnetlink_queue: enable UID/GID socket info retrieval
  2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
  2014-01-05 23:12 ` [PATCH 01/12] netfilter: avoid get_random_bytes calls Pablo Neira Ayuso
  2014-01-05 23:12 ` [PATCH 02/12] netfilter: ctnetlink: honor CTA_MARK_MASK when setting ctmark Pablo Neira Ayuso
@ 2014-01-05 23:12 ` Pablo Neira Ayuso
  2014-01-06 15:32   ` Eric Dumazet
  2014-01-05 23:12 ` [PATCH 04/12] netfilter: add IPv4/6 IPComp extension match support Pablo Neira Ayuso
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Valentina Giusti <valentina.giusti@bmw-carit.de>

Thanks to commits 41063e9 (ipv4: Early TCP socket demux) and 421b388
(udp: ipv4: Add udp early demux) it is now possible to parse UID and
GID socket info also for incoming TCP and UDP connections. Having
this info available, it is convenient to let NFQUEUE parse it in
order to improve and refine the traffic analysis in userspace.

Signed-off-by: Valentina Giusti <valentina.giusti@bmw-carit.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nfnetlink_queue.h |    5 +++-
 net/netfilter/nfnetlink_queue_core.c           |   34 ++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/linux/netfilter/nfnetlink_queue.h
index 0132bad..8dd819e 100644
--- a/include/uapi/linux/netfilter/nfnetlink_queue.h
+++ b/include/uapi/linux/netfilter/nfnetlink_queue.h
@@ -47,6 +47,8 @@ enum nfqnl_attr_type {
 	NFQA_CAP_LEN,			/* __u32 length of captured packet */
 	NFQA_SKB_INFO,			/* __u32 skb meta information */
 	NFQA_EXP,			/* nf_conntrack_netlink.h */
+	NFQA_UID,			/* __u32 sk uid */
+	NFQA_GID,			/* __u32 sk gid */
 
 	__NFQA_MAX
 };
@@ -99,7 +101,8 @@ enum nfqnl_attr_config {
 #define NFQA_CFG_F_FAIL_OPEN			(1 << 0)
 #define NFQA_CFG_F_CONNTRACK			(1 << 1)
 #define NFQA_CFG_F_GSO				(1 << 2)
-#define NFQA_CFG_F_MAX				(1 << 3)
+#define NFQA_CFG_F_UID_GID			(1 << 3)
+#define NFQA_CFG_F_MAX				(1 << 4)
 
 /* flags for NFQA_SKB_INFO */
 /* packet appears to have wrong checksums, but they are ok */
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 21258cf..d3cf12b 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -297,6 +297,31 @@ nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet,
 	return flags ? nla_put_be32(nlskb, NFQA_SKB_INFO, htonl(flags)) : 0;
 }
 
+static int nfqnl_put_sk_uidgid(struct sk_buff *skb, struct sock *sk)
+{
+	const struct cred *cred;
+
+	if (sk->sk_state == TCP_TIME_WAIT)
+		return 0;
+
+	read_lock_bh(&sk->sk_callback_lock);
+	if (sk->sk_socket && sk->sk_socket->file) {
+		cred = sk->sk_socket->file->f_cred;
+		if (nla_put_be32(skb, NFQA_UID,
+		    htonl(from_kuid_munged(&init_user_ns, cred->fsuid))))
+			goto nla_put_failure;
+		if (nla_put_be32(skb, NFQA_GID,
+		    htonl(from_kgid_munged(&init_user_ns, cred->fsgid))))
+			goto nla_put_failure;
+	}
+	read_unlock_bh(&sk->sk_callback_lock);
+	return 0;
+
+nla_put_failure:
+	read_unlock_bh(&sk->sk_callback_lock);
+	return -1;
+}
+
 static struct sk_buff *
 nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 			   struct nf_queue_entry *entry,
@@ -372,6 +397,11 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 	if (queue->flags & NFQA_CFG_F_CONNTRACK)
 		ct = nfqnl_ct_get(entskb, &size, &ctinfo);
 
+	if (queue->flags & NFQA_CFG_F_UID_GID) {
+		size +=  (nla_total_size(sizeof(u_int32_t))	/* uid */
+			+ nla_total_size(sizeof(u_int32_t)));	/* gid */
+	}
+
 	skb = nfnetlink_alloc_skb(net, size, queue->peer_portid,
 				  GFP_ATOMIC);
 	if (!skb)
@@ -484,6 +514,10 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 			goto nla_put_failure;
 	}
 
+	if ((queue->flags & NFQA_CFG_F_UID_GID) && entskb->sk &&
+	    nfqnl_put_sk_uidgid(skb, entskb->sk) < 0)
+		goto nla_put_failure;
+
 	if (ct && nfqnl_ct_put(skb, ct, ctinfo) < 0)
 		goto nla_put_failure;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 04/12] netfilter: add IPv4/6 IPComp extension match support
  2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2014-01-05 23:12 ` [PATCH 03/12] netfilter: nfnetlink_queue: enable UID/GID socket info retrieval Pablo Neira Ayuso
@ 2014-01-05 23:12 ` Pablo Neira Ayuso
  2014-01-05 23:12 ` [PATCH 05/12] ipvs: Remove unused variable ret from sync_thread_master() Pablo Neira Ayuso
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: "fan.du" <fan.du@windriver.com>

With this plugin, user could specify IPComp tagged with certain
CPI that host not interested will be DROPped or any other action.

For example:
iptables  -A INPUT -p 108 -m ipcomp --ipcompspi 0x87 -j DROP
ip6tables -A INPUT -p 108 -m ipcomp --ipcompspi 0x87 -j DROP

Then input IPComp packet with CPI equates 0x87 will not reach
upper layer anymore.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/Kbuild      |    1 +
 include/uapi/linux/netfilter/xt_ipcomp.h |   16 +++++
 net/netfilter/Kconfig                    |    9 +++
 net/netfilter/Makefile                   |    1 +
 net/netfilter/xt_ipcomp.c                |  111 ++++++++++++++++++++++++++++++
 5 files changed, 138 insertions(+)
 create mode 100644 include/uapi/linux/netfilter/xt_ipcomp.h
 create mode 100644 net/netfilter/xt_ipcomp.c

diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild
index 17c3af2..91be8ce 100644
--- a/include/uapi/linux/netfilter/Kbuild
+++ b/include/uapi/linux/netfilter/Kbuild
@@ -54,6 +54,7 @@ header-y += xt_ecn.h
 header-y += xt_esp.h
 header-y += xt_hashlimit.h
 header-y += xt_helper.h
+header-y += xt_ipcomp.h
 header-y += xt_iprange.h
 header-y += xt_ipvs.h
 header-y += xt_length.h
diff --git a/include/uapi/linux/netfilter/xt_ipcomp.h b/include/uapi/linux/netfilter/xt_ipcomp.h
new file mode 100644
index 0000000..45c7e40
--- /dev/null
+++ b/include/uapi/linux/netfilter/xt_ipcomp.h
@@ -0,0 +1,16 @@
+#ifndef _XT_IPCOMP_H
+#define _XT_IPCOMP_H
+
+#include <linux/types.h>
+
+struct xt_ipcomp {
+	__u32 spis[2];	/* Security Parameter Index */
+	__u8 invflags;	/* Inverse flags */
+	__u8 hdrres;	/* Test of the Reserved Filed */
+};
+
+/* Values for "invflags" field in struct xt_ipcomp. */
+#define XT_IPCOMP_INV_SPI	0x01	/* Invert the sense of spi. */
+#define XT_IPCOMP_INV_MASK	0x01	/* All possible flags. */
+
+#endif /*_XT_IPCOMP_H*/
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c3398cd..6d8e48b 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -1035,6 +1035,15 @@ config NETFILTER_XT_MATCH_HL
 	in the IPv6 header, or the time-to-live field in the IPv4
 	header of the packet.
 
+config NETFILTER_XT_MATCH_IPCOMP
+	tristate '"ipcomp" match support'
+	depends on NETFILTER_ADVANCED
+	help
+	  This match extension allows you to match a range of CPIs(16 bits)
+	  inside IPComp header of IPSec packets.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_MATCH_IPRANGE
 	tristate '"iprange" address range match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 394483b..398cd70 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -133,6 +133,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_ESP) += xt_esp.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_HASHLIMIT) += xt_hashlimit.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_HELPER) += xt_helper.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_HL) += xt_hl.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_IPCOMP) += xt_ipcomp.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_IPRANGE) += xt_iprange.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_IPVS) += xt_ipvs.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_LENGTH) += xt_length.o
diff --git a/net/netfilter/xt_ipcomp.c b/net/netfilter/xt_ipcomp.c
new file mode 100644
index 0000000..a4c7561
--- /dev/null
+++ b/net/netfilter/xt_ipcomp.c
@@ -0,0 +1,111 @@
+/*  Kernel module to match IPComp parameters for IPv4 and IPv6
+ *
+ *  Copyright (C) 2013 WindRiver
+ *
+ *  Author:
+ *  Fan Du <fan.du@windriver.com>
+ *
+ *  Based on:
+ *  net/netfilter/xt_esp.c
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/in.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/ip.h>
+
+#include <linux/netfilter/xt_ipcomp.h>
+#include <linux/netfilter/x_tables.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Fan Du <fan.du@windriver.com>");
+MODULE_DESCRIPTION("Xtables: IPv4/6 IPsec-IPComp SPI match");
+
+/* Returns 1 if the spi is matched by the range, 0 otherwise */
+static inline bool
+spi_match(u_int32_t min, u_int32_t max, u_int32_t spi, bool invert)
+{
+	bool r;
+	pr_debug("spi_match:%c 0x%x <= 0x%x <= 0x%x\n",
+		 invert ? '!' : ' ', min, spi, max);
+	r = (spi >= min && spi <= max) ^ invert;
+	pr_debug(" result %s\n", r ? "PASS" : "FAILED");
+	return r;
+}
+
+static bool comp_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	struct ip_comp_hdr _comphdr;
+	const struct ip_comp_hdr *chdr;
+	const struct xt_ipcomp *compinfo = par->matchinfo;
+
+	/* Must not be a fragment. */
+	if (par->fragoff != 0)
+		return false;
+
+	chdr = skb_header_pointer(skb, par->thoff, sizeof(_comphdr), &_comphdr);
+	if (chdr == NULL) {
+		/* We've been asked to examine this packet, and we
+		 * can't.  Hence, no choice but to drop.
+		 */
+		pr_debug("Dropping evil IPComp tinygram.\n");
+		par->hotdrop = true;
+		return 0;
+	}
+
+	return spi_match(compinfo->spis[0], compinfo->spis[1],
+			 ntohl(chdr->cpi << 16),
+			 !!(compinfo->invflags & XT_IPCOMP_INV_SPI));
+}
+
+static int comp_mt_check(const struct xt_mtchk_param *par)
+{
+	const struct xt_ipcomp *compinfo = par->matchinfo;
+
+	/* Must specify no unknown invflags */
+	if (compinfo->invflags & ~XT_IPCOMP_INV_MASK) {
+		pr_err("unknown flags %X\n", compinfo->invflags);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct xt_match comp_mt_reg[] __read_mostly = {
+	{
+		.name		= "ipcomp",
+		.family		= NFPROTO_IPV4,
+		.match		= comp_mt,
+		.matchsize	= sizeof(struct xt_ipcomp),
+		.proto		= IPPROTO_COMP,
+		.checkentry	= comp_mt_check,
+		.me		= THIS_MODULE,
+	},
+	{
+		.name		= "ipcomp",
+		.family		= NFPROTO_IPV6,
+		.match		= comp_mt,
+		.matchsize	= sizeof(struct xt_ipcomp),
+		.proto		= IPPROTO_COMP,
+		.checkentry	= comp_mt_check,
+		.me		= THIS_MODULE,
+	},
+};
+
+static int __init comp_mt_init(void)
+{
+	return xt_register_matches(comp_mt_reg, ARRAY_SIZE(comp_mt_reg));
+}
+
+static void __exit comp_mt_exit(void)
+{
+	xt_unregister_matches(comp_mt_reg, ARRAY_SIZE(comp_mt_reg));
+}
+
+module_init(comp_mt_init);
+module_exit(comp_mt_exit);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 05/12] ipvs: Remove unused variable ret from sync_thread_master()
  2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
                   ` (3 preceding siblings ...)
  2014-01-05 23:12 ` [PATCH 04/12] netfilter: add IPv4/6 IPComp extension match support Pablo Neira Ayuso
@ 2014-01-05 23:12 ` Pablo Neira Ayuso
  2014-01-05 23:13 ` [PATCH 06/12] netfilter: nf_nat: add full port randomization support Pablo Neira Ayuso
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Geert Uytterhoeven <geert@linux-m68k.org>

net/netfilter/ipvs/ip_vs_sync.c: In function 'sync_thread_master':
net/netfilter/ipvs/ip_vs_sync.c:1640:8: warning: unused variable 'ret' [-Wunused-variable]

Commit 35a2af94c7ce7130ca292c68b1d27fcfdb648f6b ("sched/wait: Make the
__wait_event*() interface more friendly") changed how the interruption
state is returned. However, sync_thread_master() ignores this state,
now causing a compile warning.

According to Julian Anastasov <ja@ssi.bg>, this behavior is OK:

    "Yes, your patch looks ok to me. In the past we used ssleep() but IPVS
     users were confused why IPVS threads increase the load average. So, we
     switched to _interruptible calls and later the socket polling was
     added."

Document this, as requested by Peter Zijlstra, to avoid precious developers
disappearing in this pitfall in the future.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_sync.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index f63c238..db80126 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1637,7 +1637,10 @@ static int sync_thread_master(void *data)
 			continue;
 		}
 		while (ip_vs_send_sync_msg(tinfo->sock, sb->mesg) < 0) {
-			int ret = __wait_event_interruptible(*sk_sleep(sk),
+			/* (Ab)use interruptible sleep to avoid increasing
+			 * the load avg.
+			 */
+			__wait_event_interruptible(*sk_sleep(sk),
 						   sock_writeable(sk) ||
 						   kthread_should_stop());
 			if (unlikely(kthread_should_stop()))
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 06/12] netfilter: nf_nat: add full port randomization support
  2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
                   ` (4 preceding siblings ...)
  2014-01-05 23:12 ` [PATCH 05/12] ipvs: Remove unused variable ret from sync_thread_master() Pablo Neira Ayuso
@ 2014-01-05 23:13 ` Pablo Neira Ayuso
  2014-01-05 23:13 ` [PATCH 07/12] netfilter: ipset: remove unused code Pablo Neira Ayuso
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Daniel Borkmann <dborkman@redhat.com>

We currently use prandom_u32() for allocation of ports in tcp bind(0)
and udp code. In case of plain SNAT we try to keep the ports as is
or increment on collision.

SNAT --random mode does use per-destination incrementing port
allocation. As a recent paper pointed out in [1] that this mode of
port allocation makes it possible to an attacker to find the randomly
allocated ports through a timing side-channel in a socket overloading
attack conducted through an off-path attacker.

So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
in regard to the attack described in this paper. As we need to keep
compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
selection algorithm with a simple prandom_u32() in order to mitigate
this attack vector. Note that the lfsr113's internal state is
periodically reseeded by the kernel through a local secure entropy
source.

More details can be found in [1], the basic idea is to send bursts
of packets to a socket to overflow its receive queue and measure
the latency to detect a possible retransmit when the port is found.
Because of increasing ports to given destination and port, further
allocations can be predicted. This information could then be used by
an attacker for e.g. for cache-poisoning, NS pinning, and degradation
of service attacks against DNS servers [1]:

  The best defense against the poisoning attacks is to properly
  deploy and validate DNSSEC; DNSSEC provides security not only
  against off-path attacker but even against MitM attacker. We hope
  that our results will help motivate administrators to adopt DNSSEC.
  However, full DNSSEC deployment make take significant time, and
  until that happens, we recommend short-term, non-cryptographic
  defenses. We recommend to support full port randomisation,
  according to practices recommended in [2], and to avoid
  per-destination sequential port allocation, which we show may be
  vulnerable to derandomisation attacks.

Joint work between Hannes Frederic Sowa and Daniel Borkmann.

 [1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
 [2] http://arxiv.org/pdf/1205.5190v1.pdf

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_nat.h |   12 ++++++++----
 net/netfilter/nf_nat_core.c           |    4 ++--
 net/netfilter/nf_nat_proto_common.c   |   10 ++++++----
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_nat.h b/include/uapi/linux/netfilter/nf_nat.h
index bf0cc37..1ad3659 100644
--- a/include/uapi/linux/netfilter/nf_nat.h
+++ b/include/uapi/linux/netfilter/nf_nat.h
@@ -4,10 +4,14 @@
 #include <linux/netfilter.h>
 #include <linux/netfilter/nf_conntrack_tuple_common.h>
 
-#define NF_NAT_RANGE_MAP_IPS		1
-#define NF_NAT_RANGE_PROTO_SPECIFIED	2
-#define NF_NAT_RANGE_PROTO_RANDOM	4
-#define NF_NAT_RANGE_PERSISTENT		8
+#define NF_NAT_RANGE_MAP_IPS			(1 << 0)
+#define NF_NAT_RANGE_PROTO_SPECIFIED		(1 << 1)
+#define NF_NAT_RANGE_PROTO_RANDOM		(1 << 2)
+#define NF_NAT_RANGE_PERSISTENT			(1 << 3)
+#define NF_NAT_RANGE_PROTO_RANDOM_FULLY		(1 << 4)
+
+#define NF_NAT_RANGE_PROTO_RANDOM_ALL		\
+	(NF_NAT_RANGE_PROTO_RANDOM | NF_NAT_RANGE_PROTO_RANDOM_FULLY)
 
 struct nf_nat_ipv4_range {
 	unsigned int			flags;
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 63a8154..d3f5cd6 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -315,7 +315,7 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple,
 	 * manips not an issue.
 	 */
 	if (maniptype == NF_NAT_MANIP_SRC &&
-	    !(range->flags & NF_NAT_RANGE_PROTO_RANDOM)) {
+	    !(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) {
 		/* try the original tuple first */
 		if (in_range(l3proto, l4proto, orig_tuple, range)) {
 			if (!nf_nat_used_tuple(orig_tuple, ct)) {
@@ -339,7 +339,7 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple,
 	 */
 
 	/* Only bother mapping if it's not already in range and unique */
-	if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM)) {
+	if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) {
 		if (range->flags & NF_NAT_RANGE_PROTO_SPECIFIED) {
 			if (l4proto->in_range(tuple, maniptype,
 					      &range->min_proto,
diff --git a/net/netfilter/nf_nat_proto_common.c b/net/netfilter/nf_nat_proto_common.c
index 9baaf73..83a72a2 100644
--- a/net/netfilter/nf_nat_proto_common.c
+++ b/net/netfilter/nf_nat_proto_common.c
@@ -74,22 +74,24 @@ void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto,
 		range_size = ntohs(range->max_proto.all) - min + 1;
 	}
 
-	if (range->flags & NF_NAT_RANGE_PROTO_RANDOM)
+	if (range->flags & NF_NAT_RANGE_PROTO_RANDOM) {
 		off = l3proto->secure_port(tuple, maniptype == NF_NAT_MANIP_SRC
 						  ? tuple->dst.u.all
 						  : tuple->src.u.all);
-	else
+	} else if (range->flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY) {
+		off = prandom_u32();
+	} else {
 		off = *rover;
+	}
 
 	for (i = 0; ; ++off) {
 		*portptr = htons(min + off % range_size);
 		if (++i != range_size && nf_nat_used_tuple(tuple, ct))
 			continue;
-		if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM))
+		if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL))
 			*rover = off;
 		return;
 	}
-	return;
 }
 EXPORT_SYMBOL_GPL(nf_nat_l4proto_unique_tuple);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 07/12] netfilter: ipset: remove unused code
  2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
                   ` (5 preceding siblings ...)
  2014-01-05 23:13 ` [PATCH 06/12] netfilter: nf_nat: add full port randomization support Pablo Neira Ayuso
@ 2014-01-05 23:13 ` Pablo Neira Ayuso
  2014-01-05 23:13 ` [PATCH 08/12] netfilter: nf_conntrack: remove dead code Pablo Neira Ayuso
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: stephen hemminger <stephen@networkplumber.org>

Function never used in current upstream code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter/ipset/ip_set.h |    1 -
 net/netfilter/ipset/ip_set_core.c      |   28 ----------------------------
 2 files changed, 29 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index c7174b8..0c7d01e 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -331,7 +331,6 @@ extern ip_set_id_t ip_set_get_byname(struct net *net,
 				     const char *name, struct ip_set **set);
 extern void ip_set_put_byindex(struct net *net, ip_set_id_t index);
 extern const char *ip_set_name_byindex(struct net *net, ip_set_id_t index);
-extern ip_set_id_t ip_set_nfnl_get(struct net *net, const char *name);
 extern ip_set_id_t ip_set_nfnl_get_byindex(struct net *net, ip_set_id_t index);
 extern void ip_set_nfnl_put(struct net *net, ip_set_id_t index);
 
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index bac7e01..de770ec 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -625,34 +625,6 @@ EXPORT_SYMBOL_GPL(ip_set_name_byindex);
  */
 
 /*
- * Find set by name, reference it once. The reference makes sure the
- * thing pointed to, does not go away under our feet.
- *
- * The nfnl mutex is used in the function.
- */
-ip_set_id_t
-ip_set_nfnl_get(struct net *net, const char *name)
-{
-	ip_set_id_t i, index = IPSET_INVALID_ID;
-	struct ip_set *s;
-	struct ip_set_net *inst = ip_set_pernet(net);
-
-	nfnl_lock(NFNL_SUBSYS_IPSET);
-	for (i = 0; i < inst->ip_set_max; i++) {
-		s = nfnl_set(inst, i);
-		if (s != NULL && STREQ(s->name, name)) {
-			__ip_set_get(s);
-			index = i;
-			break;
-		}
-	}
-	nfnl_unlock(NFNL_SUBSYS_IPSET);
-
-	return index;
-}
-EXPORT_SYMBOL_GPL(ip_set_nfnl_get);
-
-/*
  * Find set by index, reference it once. The reference makes sure the
  * thing pointed to, does not go away under our feet.
  *
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 08/12] netfilter: nf_conntrack: remove dead code
  2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
                   ` (6 preceding siblings ...)
  2014-01-05 23:13 ` [PATCH 07/12] netfilter: ipset: remove unused code Pablo Neira Ayuso
@ 2014-01-05 23:13 ` Pablo Neira Ayuso
  2014-01-05 23:13 ` [PATCH 09/12] netfilter: xt_CT: fix error value in xt_ct_tg_check() Pablo Neira Ayuso
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: stephen hemminger <stephen@networkplumber.org>

The following code is not used in current upstream code.
Some of this seems to be old hooks, other might be used by some
out of tree module (which I don't care about breaking), and
the need_ipv4_conntrack was used by old NAT code but no longer
called.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/ipv4/nf_conntrack_ipv4.h |    2 --
 include/net/netfilter/nf_conntrack_l3proto.h   |    1 -
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |    6 ------
 net/netfilter/nf_conntrack_core.c              |   15 ---------------
 net/netfilter/nf_conntrack_proto.c             |    6 ------
 5 files changed, 30 deletions(-)

diff --git a/include/net/netfilter/ipv4/nf_conntrack_ipv4.h b/include/net/netfilter/ipv4/nf_conntrack_ipv4.h
index 6c3d12e..981c327 100644
--- a/include/net/netfilter/ipv4/nf_conntrack_ipv4.h
+++ b/include/net/netfilter/ipv4/nf_conntrack_ipv4.h
@@ -19,6 +19,4 @@ extern struct nf_conntrack_l4proto nf_conntrack_l4proto_icmp;
 int nf_conntrack_ipv4_compat_init(void);
 void nf_conntrack_ipv4_compat_fini(void);
 
-void need_ipv4_conntrack(void);
-
 #endif /*_NF_CONNTRACK_IPV4_H*/
diff --git a/include/net/netfilter/nf_conntrack_l3proto.h b/include/net/netfilter/nf_conntrack_l3proto.h
index 3efab70..adc1fa3 100644
--- a/include/net/netfilter/nf_conntrack_l3proto.h
+++ b/include/net/netfilter/nf_conntrack_l3proto.h
@@ -87,7 +87,6 @@ int nf_ct_l3proto_register(struct nf_conntrack_l3proto *proto);
 void nf_ct_l3proto_unregister(struct nf_conntrack_l3proto *proto);
 
 struct nf_conntrack_l3proto *nf_ct_l3proto_find_get(u_int16_t l3proto);
-void nf_ct_l3proto_put(struct nf_conntrack_l3proto *p);
 
 /* Existing built-in protocols */
 extern struct nf_conntrack_l3proto nf_conntrack_l3proto_generic;
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index ecd8bec..8127dc8 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -548,9 +548,3 @@ static void __exit nf_conntrack_l3proto_ipv4_fini(void)
 
 module_init(nf_conntrack_l3proto_ipv4_init);
 module_exit(nf_conntrack_l3proto_ipv4_fini);
-
-void need_ipv4_conntrack(void)
-{
-	return;
-}
-EXPORT_SYMBOL_GPL(need_ipv4_conntrack);
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 43549eb..8824ed0 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -60,12 +60,6 @@ int (*nfnetlink_parse_nat_setup_hook)(struct nf_conn *ct,
 				      const struct nlattr *attr) __read_mostly;
 EXPORT_SYMBOL_GPL(nfnetlink_parse_nat_setup_hook);
 
-int (*nf_nat_seq_adjust_hook)(struct sk_buff *skb,
-			      struct nf_conn *ct,
-			      enum ip_conntrack_info ctinfo,
-			      unsigned int protoff);
-EXPORT_SYMBOL_GPL(nf_nat_seq_adjust_hook);
-
 DEFINE_SPINLOCK(nf_conntrack_lock);
 EXPORT_SYMBOL_GPL(nf_conntrack_lock);
 
@@ -361,15 +355,6 @@ begin:
 	return NULL;
 }
 
-struct nf_conntrack_tuple_hash *
-__nf_conntrack_find(struct net *net, u16 zone,
-		    const struct nf_conntrack_tuple *tuple)
-{
-	return ____nf_conntrack_find(net, zone, tuple,
-				     hash_conntrack_raw(tuple, zone));
-}
-EXPORT_SYMBOL_GPL(__nf_conntrack_find);
-
 /* Find a connection corresponding to a tuple. */
 static struct nf_conntrack_tuple_hash *
 __nf_conntrack_find_get(struct net *net, u16 zone,
diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
index ce30041..b65d586 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -92,12 +92,6 @@ nf_ct_l3proto_find_get(u_int16_t l3proto)
 }
 EXPORT_SYMBOL_GPL(nf_ct_l3proto_find_get);
 
-void nf_ct_l3proto_put(struct nf_conntrack_l3proto *p)
-{
-	module_put(p->me);
-}
-EXPORT_SYMBOL_GPL(nf_ct_l3proto_put);
-
 int
 nf_ct_l3proto_try_module_get(unsigned short l3proto)
 {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 09/12] netfilter: xt_CT: fix error value in xt_ct_tg_check()
  2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
                   ` (7 preceding siblings ...)
  2014-01-05 23:13 ` [PATCH 08/12] netfilter: nf_conntrack: remove dead code Pablo Neira Ayuso
@ 2014-01-05 23:13 ` Pablo Neira Ayuso
  2014-01-05 23:13 ` [PATCH 10/12] net: net_cls: move cgroupfs classid handling into core Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Eric Leblond <eric@regit.org>

If setting event mask fails then we were returning 0 for success.
This patch updates return code to -EINVAL in case of problem.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_CT.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index da35ac0..5929be6 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -211,8 +211,10 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par,
 	ret = 0;
 	if ((info->ct_events || info->exp_events) &&
 	    !nf_ct_ecache_ext_add(ct, info->ct_events, info->exp_events,
-				  GFP_KERNEL))
+				  GFP_KERNEL)) {
+		ret = -EINVAL;
 		goto err3;
+	}
 
 	if (info->helper[0]) {
 		ret = xt_ct_set_helper(ct, info->helper, par);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 10/12] net: net_cls: move cgroupfs classid handling into core
  2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
                   ` (8 preceding siblings ...)
  2014-01-05 23:13 ` [PATCH 09/12] netfilter: xt_CT: fix error value in xt_ct_tg_check() Pablo Neira Ayuso
@ 2014-01-05 23:13 ` Pablo Neira Ayuso
  2014-01-05 23:13 ` [PATCH 11/12] net: netprio: rename config to be more consistent with cgroup configs Pablo Neira Ayuso
  2014-01-05 23:13 ` [PATCH 12/12] netfilter: x_tables: lightweight process control group matching Pablo Neira Ayuso
  11 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Daniel Borkmann <dborkman@redhat.com>

Zefan Li requested [1] to perform the following cleanup/refactoring:

- Split cgroupfs classid handling into net core to better express a
  possible more generic use.

- Disable module support for cgroupfs bits as the majority of other
  cgroupfs subsystems do not have that, and seems to be not wished
  from cgroup side. Zefan probably might want to follow-up for netprio
  later on.

- By this, code can be further reduced which previously took care of
  functionality built when compiled as module.

cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so
that we are consistent with {netclassid,netprio}_cgroup naming that is
under net/core/ as suggested by Zefan.

No change in functionality, but only code refactoring that is being
done here.

 [1] http://patchwork.ozlabs.org/patch/304825/

Suggested-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/cgroup_subsys.h |    2 +-
 include/net/cls_cgroup.h      |   40 +++++---------
 net/Kconfig                   |    7 +++
 net/core/Makefile             |    1 +
 net/core/netclassid_cgroup.c  |  120 +++++++++++++++++++++++++++++++++++++++++
 net/core/sock.c               |   12 -----
 net/sched/Kconfig             |    1 +
 net/sched/cls_cgroup.c        |  111 +-------------------------------------
 8 files changed, 143 insertions(+), 151 deletions(-)
 create mode 100644 net/core/netclassid_cgroup.c

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index b613ffd..58bf94d 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -31,7 +31,7 @@ SUBSYS(devices)
 SUBSYS(freezer)
 #endif
 
-#if IS_SUBSYS_ENABLED(CONFIG_NET_CLS_CGROUP)
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_NET_CLASSID)
 SUBSYS(net_cls)
 #endif
 
diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
index 33d03b6..9cf2d5e 100644
--- a/include/net/cls_cgroup.h
+++ b/include/net/cls_cgroup.h
@@ -16,17 +16,16 @@
 #include <linux/cgroup.h>
 #include <linux/hardirq.h>
 #include <linux/rcupdate.h>
+#include <net/sock.h>
 
-#if IS_ENABLED(CONFIG_NET_CLS_CGROUP)
-struct cgroup_cls_state
-{
+#ifdef CONFIG_CGROUP_NET_CLASSID
+struct cgroup_cls_state {
 	struct cgroup_subsys_state css;
 	u32 classid;
 };
 
-void sock_update_classid(struct sock *sk);
+struct cgroup_cls_state *task_cls_state(struct task_struct *p);
 
-#if IS_BUILTIN(CONFIG_NET_CLS_CGROUP)
 static inline u32 task_cls_classid(struct task_struct *p)
 {
 	u32 classid;
@@ -41,33 +40,18 @@ static inline u32 task_cls_classid(struct task_struct *p)
 
 	return classid;
 }
-#elif IS_MODULE(CONFIG_NET_CLS_CGROUP)
-static inline u32 task_cls_classid(struct task_struct *p)
-{
-	struct cgroup_subsys_state *css;
-	u32 classid = 0;
-
-	if (in_interrupt())
-		return 0;
-
-	rcu_read_lock();
-	css = task_css(p, net_cls_subsys_id);
-	if (css)
-		classid = container_of(css,
-				       struct cgroup_cls_state, css)->classid;
-	rcu_read_unlock();
 
-	return classid;
-}
-#endif
-#else /* !CGROUP_NET_CLS_CGROUP */
 static inline void sock_update_classid(struct sock *sk)
 {
-}
+	u32 classid;
 
-static inline u32 task_cls_classid(struct task_struct *p)
+	classid = task_cls_classid(current);
+	if (classid != sk->sk_classid)
+		sk->sk_classid = classid;
+}
+#else /* !CONFIG_CGROUP_NET_CLASSID */
+static inline void sock_update_classid(struct sock *sk)
 {
-	return 0;
 }
-#endif /* CGROUP_NET_CLS_CGROUP */
+#endif /* CONFIG_CGROUP_NET_CLASSID */
 #endif  /* _NET_CLS_CGROUP_H */
diff --git a/net/Kconfig b/net/Kconfig
index d334678..7da10b8 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -245,6 +245,13 @@ config NETPRIO_CGROUP
 	  Cgroup subsystem for use in assigning processes to network priorities on
 	  a per-interface basis
 
+config CGROUP_NET_CLASSID
+	boolean "Network classid cgroup"
+	depends on CGROUPS
+	---help---
+	  Cgroup subsystem for use as general purpose socket classid marker that is
+	  being used in cls_cgroup and for netfilter matching.
+
 config NET_RX_BUSY_POLL
 	boolean
 	default y
diff --git a/net/core/Makefile b/net/core/Makefile
index b33b996..4839a27 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -22,3 +22,4 @@ obj-$(CONFIG_TRACEPOINTS) += net-traces.o
 obj-$(CONFIG_NET_DROP_MONITOR) += drop_monitor.o
 obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
 obj-$(CONFIG_NETPRIO_CGROUP) += netprio_cgroup.o
+obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
new file mode 100644
index 0000000..719efd5
--- /dev/null
+++ b/net/core/netclassid_cgroup.c
@@ -0,0 +1,120 @@
+/*
+ * net/core/netclassid_cgroup.c	Classid Cgroupfs Handling
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Thomas Graf <tgraf@suug.ch>
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/cgroup.h>
+#include <linux/fdtable.h>
+#include <net/cls_cgroup.h>
+#include <net/sock.h>
+
+static inline struct cgroup_cls_state *css_cls_state(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct cgroup_cls_state, css) : NULL;
+}
+
+struct cgroup_cls_state *task_cls_state(struct task_struct *p)
+{
+	return css_cls_state(task_css(p, net_cls_subsys_id));
+}
+EXPORT_SYMBOL_GPL(task_cls_state);
+
+static struct cgroup_subsys_state *
+cgrp_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct cgroup_cls_state *cs;
+
+	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
+	if (!cs)
+		return ERR_PTR(-ENOMEM);
+
+	return &cs->css;
+}
+
+static int cgrp_css_online(struct cgroup_subsys_state *css)
+{
+	struct cgroup_cls_state *cs = css_cls_state(css);
+	struct cgroup_cls_state *parent = css_cls_state(css_parent(css));
+
+	if (parent)
+		cs->classid = parent->classid;
+
+	return 0;
+}
+
+static void cgrp_css_free(struct cgroup_subsys_state *css)
+{
+	kfree(css_cls_state(css));
+}
+
+static int update_classid(const void *v, struct file *file, unsigned n)
+{
+	int err;
+	struct socket *sock = sock_from_file(file, &err);
+
+	if (sock)
+		sock->sk->sk_classid = (u32)(unsigned long)v;
+
+	return 0;
+}
+
+static void cgrp_attach(struct cgroup_subsys_state *css,
+			struct cgroup_taskset *tset)
+{
+	struct cgroup_cls_state *cs = css_cls_state(css);
+	void *v = (void *)(unsigned long)cs->classid;
+	struct task_struct *p;
+
+	cgroup_taskset_for_each(p, css, tset) {
+		task_lock(p);
+		iterate_fd(p->files, 0, update_classid, v);
+		task_unlock(p);
+	}
+}
+
+static u64 read_classid(struct cgroup_subsys_state *css, struct cftype *cft)
+{
+	return css_cls_state(css)->classid;
+}
+
+static int write_classid(struct cgroup_subsys_state *css, struct cftype *cft,
+			 u64 value)
+{
+	css_cls_state(css)->classid = (u32) value;
+
+	return 0;
+}
+
+static struct cftype ss_files[] = {
+	{
+		.name		= "classid",
+		.read_u64	= read_classid,
+		.write_u64	= write_classid,
+	},
+	{ }	/* terminate */
+};
+
+struct cgroup_subsys net_cls_subsys = {
+	.name			= "net_cls",
+	.css_alloc		= cgrp_css_alloc,
+	.css_online		= cgrp_css_online,
+	.css_free		= cgrp_css_free,
+	.attach			= cgrp_attach,
+	.subsys_id		= net_cls_subsys_id,
+	.base_cftypes		= ss_files,
+	.module			= THIS_MODULE,
+};
+
+static int __init init_netclassid_cgroup(void)
+{
+	return cgroup_load_subsys(&net_cls_subsys);
+}
+__initcall(init_netclassid_cgroup);
diff --git a/net/core/sock.c b/net/core/sock.c
index ab20ed9..3f15072 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1308,18 +1308,6 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
 	module_put(owner);
 }
 
-#if IS_ENABLED(CONFIG_NET_CLS_CGROUP)
-void sock_update_classid(struct sock *sk)
-{
-	u32 classid;
-
-	classid = task_cls_classid(current);
-	if (classid != sk->sk_classid)
-		sk->sk_classid = classid;
-}
-EXPORT_SYMBOL(sock_update_classid);
-#endif
-
 #if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
 void sock_update_netprioidx(struct sock *sk)
 {
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index ad1f1d8..f711a47 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -435,6 +435,7 @@ config NET_CLS_FLOW
 config NET_CLS_CGROUP
 	tristate "Control Group Classifier"
 	select NET_CLS
+	select CGROUP_NET_CLASSID
 	depends on CGROUPS
 	---help---
 	  Say Y here if you want to classify packets based on the control
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 16006c9..838fa40 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -11,109 +11,13 @@
 
 #include <linux/module.h>
 #include <linux/slab.h>
-#include <linux/types.h>
-#include <linux/string.h>
-#include <linux/errno.h>
 #include <linux/skbuff.h>
-#include <linux/cgroup.h>
 #include <linux/rcupdate.h>
-#include <linux/fdtable.h>
 #include <net/rtnetlink.h>
 #include <net/pkt_cls.h>
 #include <net/sock.h>
 #include <net/cls_cgroup.h>
 
-static inline struct cgroup_cls_state *css_cls_state(struct cgroup_subsys_state *css)
-{
-	return css ? container_of(css, struct cgroup_cls_state, css) : NULL;
-}
-
-static inline struct cgroup_cls_state *task_cls_state(struct task_struct *p)
-{
-	return css_cls_state(task_css(p, net_cls_subsys_id));
-}
-
-static struct cgroup_subsys_state *
-cgrp_css_alloc(struct cgroup_subsys_state *parent_css)
-{
-	struct cgroup_cls_state *cs;
-
-	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
-	if (!cs)
-		return ERR_PTR(-ENOMEM);
-	return &cs->css;
-}
-
-static int cgrp_css_online(struct cgroup_subsys_state *css)
-{
-	struct cgroup_cls_state *cs = css_cls_state(css);
-	struct cgroup_cls_state *parent = css_cls_state(css_parent(css));
-
-	if (parent)
-		cs->classid = parent->classid;
-	return 0;
-}
-
-static void cgrp_css_free(struct cgroup_subsys_state *css)
-{
-	kfree(css_cls_state(css));
-}
-
-static int update_classid(const void *v, struct file *file, unsigned n)
-{
-	int err;
-	struct socket *sock = sock_from_file(file, &err);
-	if (sock)
-		sock->sk->sk_classid = (u32)(unsigned long)v;
-	return 0;
-}
-
-static void cgrp_attach(struct cgroup_subsys_state *css,
-			struct cgroup_taskset *tset)
-{
-	struct task_struct *p;
-	struct cgroup_cls_state *cs = css_cls_state(css);
-	void *v = (void *)(unsigned long)cs->classid;
-
-	cgroup_taskset_for_each(p, css, tset) {
-		task_lock(p);
-		iterate_fd(p->files, 0, update_classid, v);
-		task_unlock(p);
-	}
-}
-
-static u64 read_classid(struct cgroup_subsys_state *css, struct cftype *cft)
-{
-	return css_cls_state(css)->classid;
-}
-
-static int write_classid(struct cgroup_subsys_state *css, struct cftype *cft,
-			 u64 value)
-{
-	css_cls_state(css)->classid = (u32) value;
-	return 0;
-}
-
-static struct cftype ss_files[] = {
-	{
-		.name = "classid",
-		.read_u64 = read_classid,
-		.write_u64 = write_classid,
-	},
-	{ }	/* terminate */
-};
-
-struct cgroup_subsys net_cls_subsys = {
-	.name		= "net_cls",
-	.css_alloc	= cgrp_css_alloc,
-	.css_online	= cgrp_css_online,
-	.css_free	= cgrp_css_free,
-	.attach		= cgrp_attach,
-	.subsys_id	= net_cls_subsys_id,
-	.base_cftypes	= ss_files,
-	.module		= THIS_MODULE,
-};
-
 struct cls_cgroup_head {
 	u32			handle;
 	struct tcf_exts		exts;
@@ -309,25 +213,12 @@ static struct tcf_proto_ops cls_cgroup_ops __read_mostly = {
 
 static int __init init_cgroup_cls(void)
 {
-	int ret;
-
-	ret = cgroup_load_subsys(&net_cls_subsys);
-	if (ret)
-		goto out;
-
-	ret = register_tcf_proto_ops(&cls_cgroup_ops);
-	if (ret)
-		cgroup_unload_subsys(&net_cls_subsys);
-
-out:
-	return ret;
+	return register_tcf_proto_ops(&cls_cgroup_ops);
 }
 
 static void __exit exit_cgroup_cls(void)
 {
 	unregister_tcf_proto_ops(&cls_cgroup_ops);
-
-	cgroup_unload_subsys(&net_cls_subsys);
 }
 
 module_init(init_cgroup_cls);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 11/12] net: netprio: rename config to be more consistent with cgroup configs
  2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
                   ` (9 preceding siblings ...)
  2014-01-05 23:13 ` [PATCH 10/12] net: net_cls: move cgroupfs classid handling into core Pablo Neira Ayuso
@ 2014-01-05 23:13 ` Pablo Neira Ayuso
  2014-01-05 23:13 ` [PATCH 12/12] netfilter: x_tables: lightweight process control group matching Pablo Neira Ayuso
  11 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Daniel Borkmann <dborkman@redhat.com>

While we're at it and introduced CGROUP_NET_CLASSID, lets also make
NETPRIO_CGROUP more consistent with the rest of cgroups and rename it
into CONFIG_CGROUP_NET_PRIO so that for networking, we now have
CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
option consistent among networking cgroups, but also among cgroups
CONFIG conventions in general as the vast majority has a prefix of
CONFIG_CGROUP_<SUBSYS>.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/cgroup_subsys.h |    2 +-
 include/linux/netdevice.h     |    2 +-
 include/net/netprio_cgroup.h  |   18 ++++++------------
 include/net/sock.h            |    2 +-
 net/Kconfig                   |    4 ++--
 net/core/Makefile             |    2 +-
 net/core/dev.c                |    2 +-
 net/core/sock.c               |    2 +-
 8 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 58bf94d..7b99d71 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -43,7 +43,7 @@ SUBSYS(blkio)
 SUBSYS(perf)
 #endif
 
-#if IS_SUBSYS_ENABLED(CONFIG_NETPRIO_CGROUP)
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_NET_PRIO)
 SUBSYS(net_prio)
 #endif
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5260d2e..45cf681 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1444,7 +1444,7 @@ struct net_device {
 	/* max exchange id for FCoE LRO by ddp */
 	unsigned int		fcoe_ddp_xid;
 #endif
-#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
+#if IS_ENABLED(CONFIG_CGROUP_NET_PRIO)
 	struct netprio_map __rcu *priomap;
 #endif
 	/* phy device may attach itself for hardware timestamping */
diff --git a/include/net/netprio_cgroup.h b/include/net/netprio_cgroup.h
index 099d027..dafc09f 100644
--- a/include/net/netprio_cgroup.h
+++ b/include/net/netprio_cgroup.h
@@ -13,12 +13,12 @@
 
 #ifndef _NETPRIO_CGROUP_H
 #define _NETPRIO_CGROUP_H
+
 #include <linux/cgroup.h>
 #include <linux/hardirq.h>
 #include <linux/rcupdate.h>
 
-
-#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
+#if IS_ENABLED(CONFIG_CGROUP_NET_PRIO)
 struct netprio_map {
 	struct rcu_head rcu;
 	u32 priomap_len;
@@ -27,8 +27,7 @@ struct netprio_map {
 
 void sock_update_netprioidx(struct sock *sk);
 
-#if IS_BUILTIN(CONFIG_NETPRIO_CGROUP)
-
+#if IS_BUILTIN(CONFIG_CGROUP_NET_PRIO)
 static inline u32 task_netprioidx(struct task_struct *p)
 {
 	struct cgroup_subsys_state *css;
@@ -40,9 +39,7 @@ static inline u32 task_netprioidx(struct task_struct *p)
 	rcu_read_unlock();
 	return idx;
 }
-
-#elif IS_MODULE(CONFIG_NETPRIO_CGROUP)
-
+#elif IS_MODULE(CONFIG_CGROUP_NET_PRIO)
 static inline u32 task_netprioidx(struct task_struct *p)
 {
 	struct cgroup_subsys_state *css;
@@ -56,9 +53,7 @@ static inline u32 task_netprioidx(struct task_struct *p)
 	return idx;
 }
 #endif
-
-#else /* !CONFIG_NETPRIO_CGROUP */
-
+#else /* !CONFIG_CGROUP_NET_PRIO */
 static inline u32 task_netprioidx(struct task_struct *p)
 {
 	return 0;
@@ -66,6 +61,5 @@ static inline u32 task_netprioidx(struct task_struct *p)
 
 #define sock_update_netprioidx(sk)
 
-#endif /* CONFIG_NETPRIO_CGROUP */
-
+#endif /* CONFIG_CGROUP_NET_PRIO */
 #endif  /* _NET_CLS_CGROUP_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 2ef3c3e..ef5e2be 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -395,7 +395,7 @@ struct sock {
 	unsigned short		sk_ack_backlog;
 	unsigned short		sk_max_ack_backlog;
 	__u32			sk_priority;
-#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
+#if IS_ENABLED(CONFIG_CGROUP_NET_PRIO)
 	__u32			sk_cgrp_prioidx;
 #endif
 	struct pid		*sk_peer_pid;
diff --git a/net/Kconfig b/net/Kconfig
index 7da10b8..e411046 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -238,12 +238,12 @@ config XPS
 	depends on SMP
 	default y
 
-config NETPRIO_CGROUP
+config CGROUP_NET_PRIO
 	tristate "Network priority cgroup"
 	depends on CGROUPS
 	---help---
 	  Cgroup subsystem for use in assigning processes to network priorities on
-	  a per-interface basis
+	  a per-interface basis.
 
 config CGROUP_NET_CLASSID
 	boolean "Network classid cgroup"
diff --git a/net/core/Makefile b/net/core/Makefile
index 4839a27..9628c20 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -21,5 +21,5 @@ obj-$(CONFIG_FIB_RULES) += fib_rules.o
 obj-$(CONFIG_TRACEPOINTS) += net-traces.o
 obj-$(CONFIG_NET_DROP_MONITOR) += drop_monitor.o
 obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
-obj-$(CONFIG_NETPRIO_CGROUP) += netprio_cgroup.o
+obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
 obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
diff --git a/net/core/dev.c b/net/core/dev.c
index c95d664..888a79b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2747,7 +2747,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	return rc;
 }
 
-#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
+#if IS_ENABLED(CONFIG_CGROUP_NET_PRIO)
 static void skb_update_prio(struct sk_buff *skb)
 {
 	struct netprio_map *map = rcu_dereference_bh(skb->dev->priomap);
diff --git a/net/core/sock.c b/net/core/sock.c
index 3f15072..a29735c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1308,7 +1308,7 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
 	module_put(owner);
 }
 
-#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
+#if IS_ENABLED(CONFIG_CGROUP_NET_PRIO)
 void sock_update_netprioidx(struct sock *sk)
 {
 	if (in_interrupt())
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 12/12] netfilter: x_tables: lightweight process control group matching
  2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
                   ` (10 preceding siblings ...)
  2014-01-05 23:13 ` [PATCH 11/12] net: netprio: rename config to be more consistent with cgroup configs Pablo Neira Ayuso
@ 2014-01-05 23:13 ` Pablo Neira Ayuso
  11 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Daniel Borkmann <dborkman@redhat.com>

It would be useful e.g. in a server or desktop environment to have
a facility in the notion of fine-grained "per application" or "per
application group" firewall policies. Probably, users in the mobile,
embedded area (e.g. Android based) with different security policy
requirements for application groups could have great benefit from
that as well. For example, with a little bit of configuration effort,
an admin could whitelist well-known applications, and thus block
otherwise unwanted "hard-to-track" applications like [1] from a
user's machine. Blocking is just one example, but it is not limited
to that, meaning we can have much different scenarios/policies that
netfilter allows us than just blocking, e.g. fine grained settings
where applications are allowed to connect/send traffic to, application
traffic marking/conntracking, application-specific packet mangling,
and so on.

Implementation of PID-based matching would not be appropriate
as they frequently change, and child tracking would make that
even more complex and ugly. Cgroups would be a perfect candidate
for accomplishing that as they associate a set of tasks with a
set of parameters for one or more subsystems, in our case the
netfilter subsystem, which, of course, can be combined with other
cgroup subsystems into something more complex if needed.

As mentioned, to overcome this constraint, such processes could
be placed into one or multiple cgroups where different fine-grained
rules can be defined depending on the application scenario, while
e.g. everything else that is not part of that could be dropped (or
vice versa), thus making life harder for unwanted processes to
communicate to the outside world. So, we make use of cgroups here
to track jobs and limit their resources in terms of iptables
policies; in other words, limiting, tracking, etc what they are
allowed to communicate.

In our case we're working on outgoing traffic based on which local
socket that originated from. Also, one doesn't even need to have
an a-prio knowledge of the application internals regarding their
particular use of ports or protocols. Matching is *extremly*
lightweight as we just test for the sk_classid marker of sockets,
originating from net_cls. net_cls and netfilter do not contradict
each other; in fact, each construct can live as standalone or they
can be used in combination with each other, which is perfectly fine,
plus it serves Tejun's requirement to not introduce a new cgroups
subsystem. Through this, we result in a very minimal and efficient
module, and don't add anything except netfilter code.

One possible, minimal usage example (many other iptables options
can be applied obviously):

 1) Configuring cgroups if not already done, e.g.:

  mkdir /sys/fs/cgroup/net_cls
  mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
  mkdir /sys/fs/cgroup/net_cls/0
  echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
  (resp. a real flow handle id for tc)

 2) Configuring netfilter (iptables-nftables), e.g.:

  iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP

 3) Running applications, e.g.:

  ping 208.67.222.222  <pid:1799>
  echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
  [...]
  ping 208.67.220.220  <pid:1804>
  ping: sendmsg: Operation not permitted
  [...]
  echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
  [...]

Of course, real-world deployments would make use of cgroups user
space toolsuite, or own custom policy daemons dynamically moving
applications from/to various cgroups.

  [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 Documentation/cgroups/net_cls.txt        |    5 +++
 include/uapi/linux/netfilter/Kbuild      |    1 +
 include/uapi/linux/netfilter/xt_cgroup.h |   11 +++++
 net/netfilter/Kconfig                    |   10 +++++
 net/netfilter/Makefile                   |    1 +
 net/netfilter/xt_cgroup.c                |   71 ++++++++++++++++++++++++++++++
 6 files changed, 99 insertions(+)
 create mode 100644 include/uapi/linux/netfilter/xt_cgroup.h
 create mode 100644 net/netfilter/xt_cgroup.c

diff --git a/Documentation/cgroups/net_cls.txt b/Documentation/cgroups/net_cls.txt
index 9face6b..ec18234 100644
--- a/Documentation/cgroups/net_cls.txt
+++ b/Documentation/cgroups/net_cls.txt
@@ -6,6 +6,8 @@ tag network packets with a class identifier (classid).
 
 The Traffic Controller (tc) can be used to assign
 different priorities to packets from different cgroups.
+Also, Netfilter (iptables) can use this tag to perform
+actions on such packets.
 
 Creating a net_cls cgroups instance creates a net_cls.classid file.
 This net_cls.classid value is initialized to 0.
@@ -32,3 +34,6 @@ tc class add dev eth0 parent 10: classid 10:1 htb rate 40mbit
  - creating traffic class 10:1
 
 tc filter add dev eth0 parent 10: protocol ip prio 10 handle 1: cgroup
+
+configuring iptables, basic example:
+iptables -A OUTPUT -m cgroup ! --cgroup 0x100001 -j DROP
diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild
index 91be8ce..2344f5a 100644
--- a/include/uapi/linux/netfilter/Kbuild
+++ b/include/uapi/linux/netfilter/Kbuild
@@ -39,6 +39,7 @@ header-y += xt_TEE.h
 header-y += xt_TPROXY.h
 header-y += xt_addrtype.h
 header-y += xt_bpf.h
+header-y += xt_cgroup.h
 header-y += xt_cluster.h
 header-y += xt_comment.h
 header-y += xt_connbytes.h
diff --git a/include/uapi/linux/netfilter/xt_cgroup.h b/include/uapi/linux/netfilter/xt_cgroup.h
new file mode 100644
index 0000000..43acb7e
--- /dev/null
+++ b/include/uapi/linux/netfilter/xt_cgroup.h
@@ -0,0 +1,11 @@
+#ifndef _UAPI_XT_CGROUP_H
+#define _UAPI_XT_CGROUP_H
+
+#include <linux/types.h>
+
+struct xt_cgroup_info {
+	__u32 id;
+	__u32 invert;
+};
+
+#endif /* _UAPI_XT_CGROUP_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 6d8e48b..c17902c 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -858,6 +858,16 @@ config NETFILTER_XT_MATCH_BPF
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config NETFILTER_XT_MATCH_CGROUP
+	tristate '"control group" match support'
+	depends on NETFILTER_ADVANCED
+	depends on CGROUPS
+	select CGROUP_NET_CLASSID
+	---help---
+	Socket/process control group matching allows you to match locally
+	generated packets based on which net_cls control group processes
+	belong to.
+
 config NETFILTER_XT_MATCH_CLUSTER
 	tristate '"cluster" match support'
 	depends on NF_CONNTRACK
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 398cd70..407fc23 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -143,6 +143,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_NFACCT) += xt_nfacct.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_OSF) += xt_osf.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_CGROUP) += xt_cgroup.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PHYSDEV) += xt_physdev.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
new file mode 100644
index 0000000..9a8e77e7
--- /dev/null
+++ b/net/netfilter/xt_cgroup.c
@@ -0,0 +1,71 @@
+/*
+ * Xtables module to match the process control group.
+ *
+ * Might be used to implement individual "per-application" firewall
+ * policies in contrast to global policies based on control groups.
+ * Matching is based upon processes tagged to net_cls' classid marker.
+ *
+ * (C) 2013 Daniel Borkmann <dborkman@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/skbuff.h>
+#include <linux/module.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_cgroup.h>
+#include <net/sock.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
+MODULE_DESCRIPTION("Xtables: process control group matching");
+MODULE_ALIAS("ipt_cgroup");
+MODULE_ALIAS("ip6t_cgroup");
+
+static int cgroup_mt_check(const struct xt_mtchk_param *par)
+{
+	struct xt_cgroup_info *info = par->matchinfo;
+
+	if (info->invert & ~1)
+		return -EINVAL;
+
+	return info->id ? 0 : -EINVAL;
+}
+
+static bool
+cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	const struct xt_cgroup_info *info = par->matchinfo;
+
+	if (skb->sk == NULL)
+		return false;
+
+	return (info->id == skb->sk->sk_classid) ^ info->invert;
+}
+
+static struct xt_match cgroup_mt_reg __read_mostly = {
+	.name       = "cgroup",
+	.revision   = 0,
+	.family     = NFPROTO_UNSPEC,
+	.checkentry = cgroup_mt_check,
+	.match      = cgroup_mt,
+	.matchsize  = sizeof(struct xt_cgroup_info),
+	.me         = THIS_MODULE,
+	.hooks      = (1 << NF_INET_LOCAL_OUT) |
+		      (1 << NF_INET_POST_ROUTING),
+};
+
+static int __init cgroup_mt_init(void)
+{
+	return xt_register_match(&cgroup_mt_reg);
+}
+
+static void __exit cgroup_mt_exit(void)
+{
+	xt_unregister_match(&cgroup_mt_reg);
+}
+
+module_init(cgroup_mt_init);
+module_exit(cgroup_mt_exit);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 01/12] netfilter: avoid get_random_bytes calls
  2014-01-05 23:12 ` [PATCH 01/12] netfilter: avoid get_random_bytes calls Pablo Neira Ayuso
@ 2014-01-05 23:41   ` Hannes Frederic Sowa
  2014-01-06 11:54     ` Florian Westphal
  0 siblings, 1 reply; 23+ messages in thread
From: Hannes Frederic Sowa @ 2014-01-05 23:41 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev, fw

On Mon, Jan 06, 2014 at 12:12:55AM +0100, Pablo Neira Ayuso wrote:
> From: Florian Westphal <fw@strlen.de>
> 
> All these users need an initial seed value for jhash, prandom is
> perfectly fine.  This avoids draining the entropy pool where
> its not strictly required.

Secrets protecting hash tables should be rather strong.

prandom_u32() has two seeding points at boot-up. One is at late_initcall.
Thanks to parallel boot-up this gets executed fairly early. The other one is
when the RNG nonblocking pool is fully initialized. Only after this point we
can assume prandom_u32() returns truely random values. In between, only
get_random_bytes or net_get_random_once are safe for use.

To get the impression when prandom_u32 gets truely seeded, watch out
for the message "random: nonblocking pool is initialized" in dmesg. ;)

Hmm, some of them look like good candidates for net_get_random_once. I don't
see such a problem with draining entropy pool, especially as they don't run
that early and they don't request so many random bits.

Greetings,

  Hannes



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 01/12] netfilter: avoid get_random_bytes calls
  2014-01-05 23:41   ` Hannes Frederic Sowa
@ 2014-01-06 11:54     ` Florian Westphal
  2014-01-06 12:43       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2014-01-06 11:54 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel, davem, netdev, fw

Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> On Mon, Jan 06, 2014 at 12:12:55AM +0100, Pablo Neira Ayuso wrote:
> > From: Florian Westphal <fw@strlen.de>
> > 
> > All these users need an initial seed value for jhash, prandom is
> > perfectly fine.  This avoids draining the entropy pool where
> > its not strictly required.
> 
> Secrets protecting hash tables should be rather strong.

Yes, which is why e.g. conntrack hash is not converted.

> prandom_u32() has two seeding points at boot-up. One is at late_initcall.

Yes.  None of these locations are executed via initcalls, they are all
in _checkentry (i.e., run when userspace iptables inserts a rule using
the target/match), except hashlimit where its delayed until the first
address is stored (so its even later).

> Thanks to parallel boot-up this gets executed fairly early. The other one is
> when the RNG nonblocking pool is fully initialized. Only after this point we
> can assume prandom_u32() returns truely random values. In between, only
> get_random_bytes or net_get_random_once are safe for use.

Can you elaborate?  If entropy estimate is really really low
(because we're booting up), why would get_random_bytes() be a better
choice [ i understand net_get_random_once() is for delaying
the actual random_bytes call until a later point in time where we've
hopefully collected more entropy ]

> To get the impression when prandom_u32 gets truely seeded, watch out
> for the message "random: nonblocking pool is initialized" in dmesg. ;)

It happens very very early on my machine, even before / is remounted
rw.  I would be more interested in what happens on small embedded
boxes...

> Hmm, some of them look like good candidates for net_get_random_once. I don't
> see such a problem with draining entropy pool, especially as they don't run
> that early and they don't request so many random bits.

I specifically did not use net_get_random_once once because checkentry is
not a hotpath.

I don't see why get_random_bytes use increases the security margin, especially
considering none of these hashes have periodic run-time rehashing?

But sure, if you think this change is a problem, Pablo can just revert it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 01/12] netfilter: avoid get_random_bytes calls
  2014-01-06 11:54     ` Florian Westphal
@ 2014-01-06 12:43       ` Hannes Frederic Sowa
  2014-01-06 12:58         ` Pablo Neira Ayuso
  2014-01-06 13:04         ` Florian Westphal
  0 siblings, 2 replies; 23+ messages in thread
From: Hannes Frederic Sowa @ 2014-01-06 12:43 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel, davem, netdev

Hello!

On Mon, Jan 06, 2014 at 12:54:36PM +0100, Florian Westphal wrote:
> Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> > On Mon, Jan 06, 2014 at 12:12:55AM +0100, Pablo Neira Ayuso wrote:
> > > From: Florian Westphal <fw@strlen.de>
> > > 
> > > All these users need an initial seed value for jhash, prandom is
> > > perfectly fine.  This avoids draining the entropy pool where
> > > its not strictly required.
> > 
> > Secrets protecting hash tables should be rather strong.
> 
> Yes, which is why e.g. conntrack hash is not converted.
> 
> > prandom_u32() has two seeding points at boot-up. One is at late_initcall.
> 
> Yes.  None of these locations are executed via initcalls, they are all
> in _checkentry (i.e., run when userspace iptables inserts a rule using
> the target/match), except hashlimit where its delayed until the first
> address is stored (so its even later).
> 
> > Thanks to parallel boot-up this gets executed fairly early. The other one is
> > when the RNG nonblocking pool is fully initialized. Only after this point we
> > can assume prandom_u32() returns truely random values. In between, only
> > get_random_bytes or net_get_random_once are safe for use.
> 
> Can you elaborate?  If entropy estimate is really really low
> (because we're booting up), why would get_random_bytes() be a better
> choice [ i understand net_get_random_once() is for delaying
> the actual random_bytes call until a later point in time where we've
> hopefully collected more entropy ]

I hope, I answer that below.

> > To get the impression when prandom_u32 gets truely seeded, watch out
> > for the message "random: nonblocking pool is initialized" in dmesg. ;)
> 
> It happens very very early on my machine, even before / is remounted
> rw.  I would be more interested in what happens on small embedded
> boxes...

On some of my small virtual machines (amd64) I even see this message while
login on the console (small iptables set also loaded before). In the mean
time prandom_u32() is still seeded with maybe 3 bits (I once measured it)
at the beginning and won't get a refresh until the nonblocking pool is
fully initialized. prandom_u32 will just iterate over its seed until it
is renewed whereas get_random_bytes does try to stretch (with help of the
twisted GFSR and SHA-1) the available entropy in case the nonblocking_pool
is limited, thus it is more probable to get better random results.

E.g. on an amd64 athlon x2 with two VMs:
[Mon Jan  6 13:35:40 2014] Initializing cgroup subsys cpuset
...
[Mon Jan  6 13:36:21 2014] random: nonblocking pool is initialized

I normally get the message while typing in the password on the prompt of the
serial console.

Single integers are not so much of a problem. E.g. one problem in
wireless code was, where get_random_bytes was called in a loop to fill
a structure, that did hurt: f7d8ad81ca8c44 ("mac80211: minstrels: spare
numerous useless calls to get_random_bytes").

> > Hmm, some of them look like good candidates for net_get_random_once. I don't
> > see such a problem with draining entropy pool, especially as they don't run
> > that early and they don't request so many random bits.
> 
> I specifically did not use net_get_random_once once because checkentry is
> not a hotpath.
> 
> I don't see why get_random_bytes use increases the security margin, especially
> considering none of these hashes have periodic run-time rehashing?
> 
> But sure, if you think this change is a problem, Pablo can just revert it.

I don't know if it is a real problem. Most of the time the initial seed
should be enough, but I guess get_random_bytes would still be a more
defensive choice. I would have used it. ;)

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 01/12] netfilter: avoid get_random_bytes calls
  2014-01-06 12:43       ` Hannes Frederic Sowa
@ 2014-01-06 12:58         ` Pablo Neira Ayuso
  2014-01-06 13:04         ` Florian Westphal
  1 sibling, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-06 12:58 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Florian Westphal, netfilter-devel, davem, netdev

On Mon, Jan 06, 2014 at 01:43:40PM +0100, Hannes Frederic Sowa wrote:
[...]
> > > Hmm, some of them look like good candidates for net_get_random_once. I don't
> > > see such a problem with draining entropy pool, especially as they don't run
> > > that early and they don't request so many random bits.
> > 
> > I specifically did not use net_get_random_once once because checkentry is
> > not a hotpath.
> > 
> > I don't see why get_random_bytes use increases the security margin, especially
> > considering none of these hashes have periodic run-time rehashing?
> > 
> > But sure, if you think this change is a problem, Pablo can just revert it.
> 
> I don't know if it is a real problem. Most of the time the initial seed
> should be enough, but I guess get_random_bytes would still be a more
> defensive choice. I would have used it. ;)

OK, I have reverted this patch, thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 01/12] netfilter: avoid get_random_bytes calls
  2014-01-06 12:43       ` Hannes Frederic Sowa
  2014-01-06 12:58         ` Pablo Neira Ayuso
@ 2014-01-06 13:04         ` Florian Westphal
  2014-01-06 13:06           ` Pablo Neira Ayuso
  1 sibling, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2014-01-06 13:04 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso, netfilter-devel, davem, netdev

Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> On Mon, Jan 06, 2014 at 12:54:36PM +0100, Florian Westphal wrote:
> > Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> > Can you elaborate?  If entropy estimate is really really low
> > (because we're booting up), why would get_random_bytes() be a better
> > choice [ i understand net_get_random_once() is for delaying
> > the actual random_bytes call until a later point in time where we've
> > hopefully collected more entropy ]

[..]

> On some of my small virtual machines (amd64) I even see this message while
> login on the console (small iptables set also loaded before). In the mean
> time prandom_u32() is still seeded with maybe 3 bits (I once measured it)
> at the beginning and won't get a refresh until the nonblocking pool is
> fully initialized.

I see.  In this case it indeed could be a problem; I was doing this
change with the assumption that prandom is useable at ->checkenty time.

> > I specifically did not use net_get_random_once once because checkentry is
> > not a hotpath.
> > 
> > I don't see why get_random_bytes use increases the security margin, especially
> > considering none of these hashes have periodic run-time rehashing?
> > 
> > But sure, if you think this change is a problem, Pablo can just revert it.
> 
> I don't know if it is a real problem. Most of the time the initial seed
> should be enough, but I guess get_random_bytes would still be a more
> defensive choice. I would have used it. ;)

Alright.  Given that this went into -next, I think we have a few weeks to
investigate.

I will check if the specific hash uses are problematic in their own right
(due to lack of reseed) or if they are weakened by this change only.

I'll follow up on this.

Pablo/David, if you think this needs to be fixed RIGHT NOW then please
just issue a revert for a42b99a6e329654d376b330de057eff87686d890.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 01/12] netfilter: avoid get_random_bytes calls
  2014-01-06 13:04         ` Florian Westphal
@ 2014-01-06 13:06           ` Pablo Neira Ayuso
  2014-01-06 13:07             ` Florian Westphal
  0 siblings, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-06 13:06 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, davem, netdev

On Mon, Jan 06, 2014 at 02:04:51PM +0100, Florian Westphal wrote:
> Pablo/David, if you think this needs to be fixed RIGHT NOW then please
> just issue a revert for a42b99a6e329654d376b330de057eff87686d890.

I just did it, thanks Florian.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 01/12] netfilter: avoid get_random_bytes calls
  2014-01-06 13:06           ` Pablo Neira Ayuso
@ 2014-01-06 13:07             ` Florian Westphal
  0 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2014-01-06 13:07 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, davem, netdev

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Jan 06, 2014 at 02:04:51PM +0100, Florian Westphal wrote:
> > Pablo/David, if you think this needs to be fixed RIGHT NOW then please
> > just issue a revert for a42b99a6e329654d376b330de057eff87686d890.
> 
> I just did it, thanks Florian.

Ok.  Can you resurrect the nfnetlink_log change?

That one is safe :)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/12] netfilter: nfnetlink_queue: enable UID/GID socket info retrieval
  2014-01-05 23:12 ` [PATCH 03/12] netfilter: nfnetlink_queue: enable UID/GID socket info retrieval Pablo Neira Ayuso
@ 2014-01-06 15:32   ` Eric Dumazet
  2014-01-06 16:36     ` Pablo Neira Ayuso
  2014-01-06 18:36     ` David Miller
  0 siblings, 2 replies; 23+ messages in thread
From: Eric Dumazet @ 2014-01-06 15:32 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Valentina Giusti; +Cc: netfilter-devel, davem, netdev

On Mon, 2014-01-06 at 00:12 +0100, Pablo Neira Ayuso wrote:
> From: Valentina Giusti <valentina.giusti@bmw-carit.de>
> 
> Thanks to commits 41063e9 (ipv4: Early TCP socket demux) and 421b388
> (udp: ipv4: Add udp early demux) it is now possible to parse UID and
> GID socket info also for incoming TCP and UDP connections. Having
> this info available, it is convenient to let NFQUEUE parse it in
> order to improve and refine the traffic analysis in userspace.
> 
> Signed-off-by: Valentina Giusti <valentina.giusti@bmw-carit.de>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
>  include/uapi/linux/netfilter/nfnetlink_queue.h |    5 +++-
>  net/netfilter/nfnetlink_queue_core.c           |   34 ++++++++++++++++++++++++
>  2 files changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/linux/netfilter/nfnetlink_queue.h
> index 0132bad..8dd819e 100644
> --- a/include/uapi/linux/netfilter/nfnetlink_queue.h
> +++ b/include/uapi/linux/netfilter/nfnetlink_queue.h
> @@ -47,6 +47,8 @@ enum nfqnl_attr_type {
>  	NFQA_CAP_LEN,			/* __u32 length of captured packet */
>  	NFQA_SKB_INFO,			/* __u32 skb meta information */
>  	NFQA_EXP,			/* nf_conntrack_netlink.h */
> +	NFQA_UID,			/* __u32 sk uid */
> +	NFQA_GID,			/* __u32 sk gid */
>  
>  	__NFQA_MAX
>  };
> @@ -99,7 +101,8 @@ enum nfqnl_attr_config {
>  #define NFQA_CFG_F_FAIL_OPEN			(1 << 0)
>  #define NFQA_CFG_F_CONNTRACK			(1 << 1)
>  #define NFQA_CFG_F_GSO				(1 << 2)
> -#define NFQA_CFG_F_MAX				(1 << 3)
> +#define NFQA_CFG_F_UID_GID			(1 << 3)
> +#define NFQA_CFG_F_MAX				(1 << 4)
>  
>  /* flags for NFQA_SKB_INFO */
>  /* packet appears to have wrong checksums, but they are ok */
> diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
> index 21258cf..d3cf12b 100644
> --- a/net/netfilter/nfnetlink_queue_core.c
> +++ b/net/netfilter/nfnetlink_queue_core.c
> @@ -297,6 +297,31 @@ nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet,
>  	return flags ? nla_put_be32(nlskb, NFQA_SKB_INFO, htonl(flags)) : 0;
>  }
>  
> +static int nfqnl_put_sk_uidgid(struct sk_buff *skb, struct sock *sk)
> +{
> +	const struct cred *cred;
> +
> +	if (sk->sk_state == TCP_TIME_WAIT)
> +		return 0;
> +

net/netfilter/nfnetlink_queue_core.c: In function 'nfqnl_put_sk_uidgid':
net/netfilter/nfnetlink_queue_core.c:304:35: error: 'TCP_TIME_WAIT' undeclared (first use in this function)
net/netfilter/nfnetlink_queue_core.c:304:35: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [net/netfilter/nfnetlink_queue_core.o] Error 1

Fix is obvious (#include <net/tcp_states.h>), but I have to run...




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/12] netfilter: nfnetlink_queue: enable UID/GID socket info retrieval
  2014-01-06 15:32   ` Eric Dumazet
@ 2014-01-06 16:36     ` Pablo Neira Ayuso
  2014-01-06 18:36     ` David Miller
  1 sibling, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-06 16:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Valentina Giusti, netfilter-devel, davem, netdev

On Mon, Jan 06, 2014 at 07:32:33AM -0800, Eric Dumazet wrote:
> On Mon, 2014-01-06 at 00:12 +0100, Pablo Neira Ayuso wrote:
> > From: Valentina Giusti <valentina.giusti@bmw-carit.de>
> > 
> > Thanks to commits 41063e9 (ipv4: Early TCP socket demux) and 421b388
> > (udp: ipv4: Add udp early demux) it is now possible to parse UID and
> > GID socket info also for incoming TCP and UDP connections. Having
> > this info available, it is convenient to let NFQUEUE parse it in
> > order to improve and refine the traffic analysis in userspace.
> > 
> > Signed-off-by: Valentina Giusti <valentina.giusti@bmw-carit.de>
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
[...]
> net/netfilter/nfnetlink_queue_core.c: In function 'nfqnl_put_sk_uidgid':
> net/netfilter/nfnetlink_queue_core.c:304:35: error: 'TCP_TIME_WAIT' undeclared (first use in this function)
> net/netfilter/nfnetlink_queue_core.c:304:35: note: each undeclared identifier is reported only once for each function it appears in
> make[3]: *** [net/netfilter/nfnetlink_queue_core.o] Error 1
> 
> Fix is obvious (#include <net/tcp_states.h>), but I have to run...

Just sent this patch to David, thanks for reporting.

http://patchwork.ozlabs.org/patch/307367/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/12] netfilter: nfnetlink_queue: enable UID/GID socket info retrieval
  2014-01-06 15:32   ` Eric Dumazet
  2014-01-06 16:36     ` Pablo Neira Ayuso
@ 2014-01-06 18:36     ` David Miller
  1 sibling, 0 replies; 23+ messages in thread
From: David Miller @ 2014-01-06 18:36 UTC (permalink / raw)
  To: eric.dumazet; +Cc: pablo, valentina.giusti, netfilter-devel, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 06 Jan 2014 07:32:33 -0800

> net/netfilter/nfnetlink_queue_core.c: In function 'nfqnl_put_sk_uidgid':
> net/netfilter/nfnetlink_queue_core.c:304:35: error: 'TCP_TIME_WAIT' undeclared (first use in this function)
> net/netfilter/nfnetlink_queue_core.c:304:35: note: each undeclared identifier is reported only once for each function it appears in
> make[3]: *** [net/netfilter/nfnetlink_queue_core.o] Error 1
> 
> Fix is obvious (#include <net/tcp_states.h>), but I have to run...

I just committed the following:

====================
[PATCH] netfilter: Fix build failure in nfnetlink_queue_core.c.

net/netfilter/nfnetlink_queue_core.c: In function 'nfqnl_put_sk_uidgid':
net/netfilter/nfnetlink_queue_core.c:304:35: error: 'TCP_TIME_WAIT' undeclared (first use in this function)
net/netfilter/nfnetlink_queue_core.c:304:35: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [net/netfilter/nfnetlink_queue_core.o] Error 1

Just a missing include of net/tcp_states.h

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/netfilter/nfnetlink_queue_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index d3cf12b..b5e1f82 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -29,6 +29,7 @@
 #include <linux/netfilter/nfnetlink_queue.h>
 #include <linux/list.h>
 #include <net/sock.h>
+#include <net/tcp_states.h>
 #include <net/netfilter/nf_queue.h>
 #include <net/netns/generic.h>
 #include <net/netfilter/nfnetlink_queue.h>
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-01-06 18:36 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-05 23:12 [PATCH 00/12] netfilter updates for net-next Pablo Neira Ayuso
2014-01-05 23:12 ` [PATCH 01/12] netfilter: avoid get_random_bytes calls Pablo Neira Ayuso
2014-01-05 23:41   ` Hannes Frederic Sowa
2014-01-06 11:54     ` Florian Westphal
2014-01-06 12:43       ` Hannes Frederic Sowa
2014-01-06 12:58         ` Pablo Neira Ayuso
2014-01-06 13:04         ` Florian Westphal
2014-01-06 13:06           ` Pablo Neira Ayuso
2014-01-06 13:07             ` Florian Westphal
2014-01-05 23:12 ` [PATCH 02/12] netfilter: ctnetlink: honor CTA_MARK_MASK when setting ctmark Pablo Neira Ayuso
2014-01-05 23:12 ` [PATCH 03/12] netfilter: nfnetlink_queue: enable UID/GID socket info retrieval Pablo Neira Ayuso
2014-01-06 15:32   ` Eric Dumazet
2014-01-06 16:36     ` Pablo Neira Ayuso
2014-01-06 18:36     ` David Miller
2014-01-05 23:12 ` [PATCH 04/12] netfilter: add IPv4/6 IPComp extension match support Pablo Neira Ayuso
2014-01-05 23:12 ` [PATCH 05/12] ipvs: Remove unused variable ret from sync_thread_master() Pablo Neira Ayuso
2014-01-05 23:13 ` [PATCH 06/12] netfilter: nf_nat: add full port randomization support Pablo Neira Ayuso
2014-01-05 23:13 ` [PATCH 07/12] netfilter: ipset: remove unused code Pablo Neira Ayuso
2014-01-05 23:13 ` [PATCH 08/12] netfilter: nf_conntrack: remove dead code Pablo Neira Ayuso
2014-01-05 23:13 ` [PATCH 09/12] netfilter: xt_CT: fix error value in xt_ct_tg_check() Pablo Neira Ayuso
2014-01-05 23:13 ` [PATCH 10/12] net: net_cls: move cgroupfs classid handling into core Pablo Neira Ayuso
2014-01-05 23:13 ` [PATCH 11/12] net: netprio: rename config to be more consistent with cgroup configs Pablo Neira Ayuso
2014-01-05 23:13 ` [PATCH 12/12] netfilter: x_tables: lightweight process control group matching Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).