netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* pull request (net): ipsec 2019-01-25
@ 2019-01-25  7:44 Steffen Klassert
  2019-01-25  7:44 ` [PATCH 01/10] selftests: xfrm: add block rules with adjacent/overlapping subnets Steffen Klassert
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Steffen Klassert @ 2019-01-25  7:44 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

1) Several patches to fix the fallout from the recent
   tree based policy lookup work. From Florian Westphal.

2) Fix VTI for IPCOMP for 'not compressed' IPCOMP packets.
   We need an extra IPIP handler to process these packets
   correctly. From Su Yanjun.

3) Fix validation of template and selector families for
   MODE_ROUTEOPTIMIZATION with ipv4-in-ipv6 packets.
   This can lead to a stack-out-of-bounds because
   flowi4 struct is treated as flowi6 struct.
   Fix from Florian Westphal.

4) Restore the default behaviour of the xfrm set-mark
   in the output path. This was changed accidentally
   when mark setting was extended to the input path.
   From Benedict Wong.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit d972f3dce8d161e2142da0ab1ef25df00e2f21a9:

  packet: Do not leak dev refcounts on error exit (2019-01-08 21:41:40 -0500)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master

for you to fetch changes up to e2612cd496e7b465711d219ea6118893d7253f52:

  xfrm: Make set-mark default behavior backward compatible (2019-01-16 13:10:55 +0100)

----------------------------------------------------------------
Benedict Wong (1):
      xfrm: Make set-mark default behavior backward compatible

Florian Westphal (8):
      selftests: xfrm: add block rules with adjacent/overlapping subnets
      xfrm: policy: use hlist rcu variants on inexact insert, part 2
      xfrm: policy: increment xfrm_hash_generation on hash rebuild
      xfrm: policy: delete inexact policies from inexact list on hash rebuild
      xfrm: policy: fix reinsertion on node merge
      selftests: xfrm: alter htresh to trigger move of policies to hash table
      xfrm: policy: fix infinite loop when merging src-nodes
      xfrm: refine validation of template and selector families

Su Yanjun (1):
      vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel

 net/ipv4/ip_vti.c                          |  50 ++++++++++
 net/xfrm/xfrm_policy.c                     |  63 ++++++------
 net/xfrm/xfrm_user.c                       |  13 ++-
 tools/testing/selftests/net/xfrm_policy.sh | 153 ++++++++++++++++++++++++-----
 4 files changed, 223 insertions(+), 56 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 01/10] selftests: xfrm: add block rules with adjacent/overlapping subnets
  2019-01-25  7:44 pull request (net): ipsec 2019-01-25 Steffen Klassert
@ 2019-01-25  7:44 ` Steffen Klassert
  2019-01-25  7:44 ` [PATCH 02/10] xfrm: policy: use hlist rcu variants on inexact insert, part 2 Steffen Klassert
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steffen Klassert @ 2019-01-25  7:44 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

From: Florian Westphal <fw@strlen.de>

The existing script lacks a policy pattern that triggers 'tree node
merges' in the kernel.

Consider adding policy affecting following subnet:
pol1: dst 10.0.0.0/22
pol2: dst 10.0.0.0/23 # adds to existing 10.0.0.0/22 node

-> no problems here.  But now, lets consider reverse order:
pol1: dst 10.0.0.0/24
pol2: dst 10.0.0.0/23 # CANNOT add to existing node

When second policy gets added, the kernel must check that the new node
("10.0.0.0/23") doesn't overlap with any existing subnet.

Example:
dst 10.0.0.0/24
dst 10.0.0.1/24
dst 10.0.0.0/23

When the third policy gets added, the kernel must replace the nodes for
the 10.0.0.0/24 and 10.0.0.1/24 policies with a single one and must merge
all the subtrees/lists stored in those nodes into the new node.

The existing test cases only have overlaps with a single node, so no
merging takes place (we can always remove the 'old' node and replace
it with the new subnet prefix).

Add a few 'block policies' in a pattern that triggers this, with a priority
that will make kernel prefer the 'esp' rules.

Make sure the 'tunnel ping' tests still pass after they have been added.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 tools/testing/selftests/net/xfrm_policy.sh | 109 +++++++++++++++++----
 1 file changed, 91 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/net/xfrm_policy.sh b/tools/testing/selftests/net/xfrm_policy.sh
index 8db35b99457c..b5a1b565a7e6 100755
--- a/tools/testing/selftests/net/xfrm_policy.sh
+++ b/tools/testing/selftests/net/xfrm_policy.sh
@@ -46,6 +46,58 @@ do_esp() {
     ip -net $ns xfrm policy add src $rnet dst $lnet dir fwd tmpl src $remote dst $me proto esp mode tunnel priority 100 action allow
 }
 
+# add policies with different netmasks, to make sure kernel carries
+# the policies contained within new netmask over when search tree is
+# re-built.
+# peer netns that are supposed to be encapsulated via esp have addresses
+# in the 10.0.1.0/24 and 10.0.2.0/24 subnets, respectively.
+#
+# Adding a policy for '10.0.1.0/23' will make it necessary to
+# alter the prefix of 10.0.1.0 subnet.
+# In case new prefix overlaps with existing node, the node and all
+# policies it carries need to be merged with the existing one(s).
+#
+# Do that here.
+do_overlap()
+{
+    local ns=$1
+
+    # adds new nodes to tree (neither network exists yet in policy database).
+    ip -net $ns xfrm policy add src 10.1.0.0/24 dst 10.0.0.0/24 dir fwd priority 200 action block
+
+    # adds a new node in the 10.0.0.0/24 tree (dst node exists).
+    ip -net $ns xfrm policy add src 10.2.0.0/24 dst 10.0.0.0/24 dir fwd priority 200 action block
+
+    # adds a 10.2.0.0/24 node, but for different dst.
+    ip -net $ns xfrm policy add src 10.2.0.0/24 dst 10.0.1.0/24 dir fwd priority 200 action block
+
+    # dst now overlaps with the 10.0.1.0/24 ESP policy in fwd.
+    # kernel must 'promote' existing one (10.0.0.0/24) to 10.0.0.0/23.
+    # But 10.0.0.0/23 also includes existing 10.0.1.0/24, so that node
+    # also has to be merged too, including source-sorted subtrees.
+    # old:
+    # 10.0.0.0/24 (node 1 in dst tree of the bin)
+    #    10.1.0.0/24 (node in src tree of dst node 1)
+    #    10.2.0.0/24 (node in src tree of dst node 1)
+    # 10.0.1.0/24 (node 2 in dst tree of the bin)
+    #    10.0.2.0/24 (node in src tree of dst node 2)
+    #    10.2.0.0/24 (node in src tree of dst node 2)
+    #
+    # The next 'policy add' adds dst '10.0.0.0/23', which means
+    # that dst node 1 and dst node 2 have to be merged including
+    # the sub-tree.  As no duplicates are allowed, policies in
+    # the two '10.0.2.0/24' are also merged.
+    #
+    # after the 'add', internal search tree should look like this:
+    # 10.0.0.0/23 (node in dst tree of bin)
+    #     10.0.2.0/24 (node in src tree of dst node)
+    #     10.1.0.0/24 (node in src tree of dst node)
+    #     10.2.0.0/24 (node in src tree of dst node)
+    #
+    # 10.0.0.0/24 and 10.0.1.0/24 nodes have been merged as 10.0.0.0/23.
+    ip -net $ns xfrm policy add src 10.1.0.0/24 dst 10.0.0.0/23 dir fwd priority 200 action block
+}
+
 do_esp_policy_get_check() {
     local ns=$1
     local lnet=$2
@@ -160,6 +212,41 @@ check_xfrm() {
 	return $lret
 }
 
+check_exceptions()
+{
+	logpostfix="$1"
+	local lret=0
+
+	# ping to .254 should be excluded from the tunnel (exception is in place).
+	check_xfrm 0 254
+	if [ $? -ne 0 ]; then
+		echo "FAIL: expected ping to .254 to fail ($logpostfix)"
+		lret=1
+	else
+		echo "PASS: ping to .254 bypassed ipsec tunnel ($logpostfix)"
+	fi
+
+	# ping to .253 should use use ipsec due to direct policy exception.
+	check_xfrm 1 253
+	if [ $? -ne 0 ]; then
+		echo "FAIL: expected ping to .253 to use ipsec tunnel ($logpostfix)"
+		lret=1
+	else
+		echo "PASS: direct policy matches ($logpostfix)"
+	fi
+
+	# ping to .2 should use ipsec.
+	check_xfrm 1 2
+	if [ $? -ne 0 ]; then
+		echo "FAIL: expected ping to .2 to use ipsec tunnel ($logpostfix)"
+		lret=1
+	else
+		echo "PASS: policy matches ($logpostfix)"
+	fi
+
+	return $lret
+}
+
 #check for needed privileges
 if [ "$(id -u)" -ne 0 ];then
 	echo "SKIP: Need root privileges"
@@ -270,31 +357,17 @@ do_exception ns4 10.0.3.10 10.0.3.1 10.0.1.253 10.0.1.240/28
 do_exception ns3 dead:3::1 dead:3::10 dead:2::fd  dead:2:f0::/96
 do_exception ns4 dead:3::10 dead:3::1 dead:1::fd  dead:1:f0::/96
 
-# ping to .254 should now be excluded from the tunnel
-check_xfrm 0 254
+check_exceptions "exceptions"
 if [ $? -ne 0 ]; then
-	echo "FAIL: expected ping to .254 to fail"
 	ret=1
-else
-	echo "PASS: ping to .254 bypassed ipsec tunnel"
 fi
 
-# ping to .253 should use use ipsec due to direct policy exception.
-check_xfrm 1 253
-if [ $? -ne 0 ]; then
-	echo "FAIL: expected ping to .253 to use ipsec tunnel"
-	ret=1
-else
-	echo "PASS: direct policy matches"
-fi
+# insert block policies with adjacent/overlapping netmasks
+do_overlap ns3
 
-# ping to .2 should use ipsec.
-check_xfrm 1 2
+check_exceptions "exceptions and block policies"
 if [ $? -ne 0 ]; then
-	echo "FAIL: expected ping to .2 to use ipsec tunnel"
 	ret=1
-else
-	echo "PASS: policy matches"
 fi
 
 for i in 1 2 3 4;do ip netns del ns$i;done
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 02/10] xfrm: policy: use hlist rcu variants on inexact insert, part 2
  2019-01-25  7:44 pull request (net): ipsec 2019-01-25 Steffen Klassert
  2019-01-25  7:44 ` [PATCH 01/10] selftests: xfrm: add block rules with adjacent/overlapping subnets Steffen Klassert
@ 2019-01-25  7:44 ` Steffen Klassert
  2019-01-25  7:44 ` [PATCH 03/10] xfrm: policy: increment xfrm_hash_generation on hash rebuild Steffen Klassert
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steffen Klassert @ 2019-01-25  7:44 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

From: Florian Westphal <fw@strlen.de>

This function was modeled on the 'exact' insert one, which did not use
the rcu variant either.

When I fixed the 'exact' insert I forgot to propagate this to my
development tree, so the inexact variant retained the bug.

Fixes: 9cf545ebd591d ("xfrm: policy: store inexact policies in a tree ordered by destination address")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 934492bad8e0..628b389af2ba 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -856,9 +856,9 @@ static void xfrm_policy_inexact_list_reinsert(struct net *net,
 		}
 
 		if (newpos)
-			hlist_add_behind(&policy->bydst, newpos);
+			hlist_add_behind_rcu(&policy->bydst, newpos);
 		else
-			hlist_add_head(&policy->bydst, &n->hhead);
+			hlist_add_head_rcu(&policy->bydst, &n->hhead);
 
 		/* paranoia checks follow.
 		 * Check that the reinserted policy matches at least
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 03/10] xfrm: policy: increment xfrm_hash_generation on hash rebuild
  2019-01-25  7:44 pull request (net): ipsec 2019-01-25 Steffen Klassert
  2019-01-25  7:44 ` [PATCH 01/10] selftests: xfrm: add block rules with adjacent/overlapping subnets Steffen Klassert
  2019-01-25  7:44 ` [PATCH 02/10] xfrm: policy: use hlist rcu variants on inexact insert, part 2 Steffen Klassert
@ 2019-01-25  7:44 ` Steffen Klassert
  2019-01-25  7:44 ` [PATCH 04/10] xfrm: policy: delete inexact policies from inexact list " Steffen Klassert
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steffen Klassert @ 2019-01-25  7:44 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

From: Florian Westphal <fw@strlen.de>

Hash rebuild will re-set all the inexact entries, then re-insert them.
Lookups that can occur in parallel will therefore not find any policies.

This was safe when lookups were still guarded by rwlock.
After rcu-ification, lookups check the hash_generation seqcount to detect
when a hash resize takes place.  Hash rebuild missed the needed increment.

Hash resizes and hash rebuilds cannot occur in parallel (both acquire
hash_resize_mutex), so just increment xfrm_hash_generation, like resize.

Fixes: a7c44247f704e3 ("xfrm: policy: make xfrm_policy_lookup_bytype lockless")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 628b389af2ba..d8fba27a4bfb 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1235,6 +1235,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
 	} while (read_seqretry(&net->xfrm.policy_hthresh.lock, seq));
 
 	spin_lock_bh(&net->xfrm.xfrm_policy_lock);
+	write_seqcount_begin(&xfrm_policy_hash_generation);
 
 	/* make sure that we can insert the indirect policies again before
 	 * we start with destructive action.
@@ -1334,6 +1335,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
 
 out_unlock:
 	__xfrm_policy_inexact_flush(net);
+	write_seqcount_end(&xfrm_policy_hash_generation);
 	spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
 
 	mutex_unlock(&hash_resize_mutex);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 04/10] xfrm: policy: delete inexact policies from inexact list on hash rebuild
  2019-01-25  7:44 pull request (net): ipsec 2019-01-25 Steffen Klassert
                   ` (2 preceding siblings ...)
  2019-01-25  7:44 ` [PATCH 03/10] xfrm: policy: increment xfrm_hash_generation on hash rebuild Steffen Klassert
@ 2019-01-25  7:44 ` Steffen Klassert
  2019-01-25  7:44 ` [PATCH 05/10] xfrm: policy: fix reinsertion on node merge Steffen Klassert
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steffen Klassert @ 2019-01-25  7:44 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

From: Florian Westphal <fw@strlen.de>

An xfrm hash rebuild has to reset the inexact policy list before the
policies get re-inserted: A change of hash thresholds will result in
policies to get moved from inexact tree to the policy hash table.

If the thresholds are increased again later, they get moved from hash
table to inexact tree.

We must unlink all policies from the inexact tree before re-insertion.

Otherwise 'migrate' may find policies that are in main hash table a
second time, when it searches the inexact lists.

Furthermore, re-insertion without deletion can cause elements ->next to
point back to itself, causing soft lockups or double-frees.

Reported-by: syzbot+9d971dd21eb26567036b@syzkaller.appspotmail.com
Fixes: 9cf545ebd591da ("xfrm: policy: store inexact policies in a tree ordered by destination address")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index d8fba27a4bfb..24dfd1e47cf0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -680,16 +680,6 @@ static void xfrm_hash_resize(struct work_struct *work)
 	mutex_unlock(&hash_resize_mutex);
 }
 
-static void xfrm_hash_reset_inexact_table(struct net *net)
-{
-	struct xfrm_pol_inexact_bin *b;
-
-	lockdep_assert_held(&net->xfrm.xfrm_policy_lock);
-
-	list_for_each_entry(b, &net->xfrm.inexact_bins, inexact_bins)
-		INIT_HLIST_HEAD(&b->hhead);
-}
-
 /* Make sure *pol can be inserted into fastbin.
  * Useful to check that later insert requests will be sucessful
  * (provided xfrm_policy_lock is held throughout).
@@ -1279,10 +1269,14 @@ static void xfrm_hash_rebuild(struct work_struct *work)
 	}
 
 	/* reset the bydst and inexact table in all directions */
-	xfrm_hash_reset_inexact_table(net);
-
 	for (dir = 0; dir < XFRM_POLICY_MAX; dir++) {
-		INIT_HLIST_HEAD(&net->xfrm.policy_inexact[dir]);
+		struct hlist_node *n;
+
+		hlist_for_each_entry_safe(policy, n,
+					  &net->xfrm.policy_inexact[dir],
+					  bydst_inexact_list)
+			hlist_del_init(&policy->bydst_inexact_list);
+
 		hmask = net->xfrm.policy_bydst[dir].hmask;
 		odst = net->xfrm.policy_bydst[dir].table;
 		for (i = hmask; i >= 0; i--)
@@ -1314,6 +1308,9 @@ static void xfrm_hash_rebuild(struct work_struct *work)
 		newpos = NULL;
 		chain = policy_hash_bysel(net, &policy->selector,
 					  policy->family, dir);
+
+		hlist_del_rcu(&policy->bydst);
+
 		if (!chain) {
 			void *p = xfrm_policy_inexact_insert(policy, dir, 0);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 05/10] xfrm: policy: fix reinsertion on node merge
  2019-01-25  7:44 pull request (net): ipsec 2019-01-25 Steffen Klassert
                   ` (3 preceding siblings ...)
  2019-01-25  7:44 ` [PATCH 04/10] xfrm: policy: delete inexact policies from inexact list " Steffen Klassert
@ 2019-01-25  7:44 ` Steffen Klassert
  2019-01-25  7:44 ` [PATCH 06/10] selftests: xfrm: alter htresh to trigger move of policies to hash table Steffen Klassert
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steffen Klassert @ 2019-01-25  7:44 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

From: Florian Westphal <fw@strlen.de>

"newpos" has wrong scope.  It must be NULL on each iteration of the loop.
Otherwise, when policy is to be inserted at the start, we would instead
insert at point found by the previous loop-iteration instead.

Also, we need to unlink the policy before we reinsert it to the new node,
else we can get next-points-to-self loops.

Because policies are only ordered by priority it is irrelevant which policy
is "more recent" except when two policies have same priority.
(the more recent one is placed after the older one).

In these cases, we can use the ->pos id number to know which one is the
'older': the higher the id, the more recent the policy.

So we only need to unlink all policies from the node that is about to be
removed, and insert them to the replacement node.

Fixes: 9cf545ebd591da ("xfrm: policy: store inexact policies in a tree ordered by destination address")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 24dfd1e47cf0..e691683223ee 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -823,13 +823,13 @@ static void xfrm_policy_inexact_list_reinsert(struct net *net,
 					      u16 family)
 {
 	unsigned int matched_s, matched_d;
-	struct hlist_node *newpos = NULL;
 	struct xfrm_policy *policy, *p;
 
 	matched_s = 0;
 	matched_d = 0;
 
 	list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) {
+		struct hlist_node *newpos = NULL;
 		bool matches_s, matches_d;
 
 		if (!policy->bydst_reinsert)
@@ -839,7 +839,10 @@ static void xfrm_policy_inexact_list_reinsert(struct net *net,
 
 		policy->bydst_reinsert = false;
 		hlist_for_each_entry(p, &n->hhead, bydst) {
-			if (policy->priority >= p->priority)
+			if (policy->priority > p->priority)
+				newpos = &p->bydst;
+			else if (policy->priority == p->priority &&
+				 policy->pos > p->pos)
 				newpos = &p->bydst;
 			else
 				break;
@@ -955,12 +958,11 @@ static void xfrm_policy_inexact_node_merge(struct net *net,
 						  family);
 	}
 
-	hlist_for_each_entry(tmp, &v->hhead, bydst)
-		tmp->bydst_reinsert = true;
-	hlist_for_each_entry(tmp, &n->hhead, bydst)
+	hlist_for_each_entry(tmp, &v->hhead, bydst) {
 		tmp->bydst_reinsert = true;
+		hlist_del_rcu(&tmp->bydst);
+	}
 
-	INIT_HLIST_HEAD(&n->hhead);
 	xfrm_policy_inexact_list_reinsert(net, n, family);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 06/10] selftests: xfrm: alter htresh to trigger move of policies to hash table
  2019-01-25  7:44 pull request (net): ipsec 2019-01-25 Steffen Klassert
                   ` (4 preceding siblings ...)
  2019-01-25  7:44 ` [PATCH 05/10] xfrm: policy: fix reinsertion on node merge Steffen Klassert
@ 2019-01-25  7:44 ` Steffen Klassert
  2019-01-25  7:44 ` [PATCH 07/10] xfrm: policy: fix infinite loop when merging src-nodes Steffen Klassert
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steffen Klassert @ 2019-01-25  7:44 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

From: Florian Westphal <fw@strlen.de>

... and back to inexact tree.
Repeat ping test after each htresh change: lookup results must not change.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 tools/testing/selftests/net/xfrm_policy.sh | 44 ++++++++++++++++++++--
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/net/xfrm_policy.sh b/tools/testing/selftests/net/xfrm_policy.sh
index b5a1b565a7e6..8ce54600d4d1 100755
--- a/tools/testing/selftests/net/xfrm_policy.sh
+++ b/tools/testing/selftests/net/xfrm_policy.sh
@@ -28,6 +28,19 @@ KEY_AES=0x0123456789abcdef0123456789012345
 SPI1=0x1
 SPI2=0x2
 
+do_esp_policy() {
+    local ns=$1
+    local me=$2
+    local remote=$3
+    local lnet=$4
+    local rnet=$5
+
+    # to encrypt packets as they go out (includes forwarded packets that need encapsulation)
+    ip -net $ns xfrm policy add src $lnet dst $rnet dir out tmpl src $me dst $remote proto esp mode tunnel priority 100 action allow
+    # to fwd decrypted packets after esp processing:
+    ip -net $ns xfrm policy add src $rnet dst $lnet dir fwd tmpl src $remote dst $me proto esp mode tunnel priority 100 action allow
+}
+
 do_esp() {
     local ns=$1
     local me=$2
@@ -40,10 +53,7 @@ do_esp() {
     ip -net $ns xfrm state add src $remote dst $me proto esp spi $spi_in  enc aes $KEY_AES  auth sha1 $KEY_SHA  mode tunnel sel src $rnet dst $lnet
     ip -net $ns xfrm state add src $me  dst $remote proto esp spi $spi_out enc aes $KEY_AES auth sha1 $KEY_SHA mode tunnel sel src $lnet dst $rnet
 
-    # to encrypt packets as they go out (includes forwarded packets that need encapsulation)
-    ip -net $ns xfrm policy add src $lnet dst $rnet dir out tmpl src $me dst $remote proto esp mode tunnel priority 100 action allow
-    # to fwd decrypted packets after esp processing:
-    ip -net $ns xfrm policy add src $rnet dst $lnet dir fwd tmpl src $remote dst $me proto esp mode tunnel priority 100 action allow
+    do_esp_policy $ns $me $remote $lnet $rnet
 }
 
 # add policies with different netmasks, to make sure kernel carries
@@ -370,6 +380,32 @@ if [ $? -ne 0 ]; then
 	ret=1
 fi
 
+for n in ns3 ns4;do
+	ip -net $n xfrm policy set hthresh4 28 24 hthresh6 126 125
+	sleep $((RANDOM%5))
+done
+
+check_exceptions "exceptions and block policies after hresh changes"
+
+# full flush of policy db, check everything gets freed incl. internal meta data
+ip -net ns3 xfrm policy flush
+
+do_esp_policy ns3 10.0.3.1 10.0.3.10 10.0.1.0/24 10.0.2.0/24
+do_exception ns3 10.0.3.1 10.0.3.10 10.0.2.253 10.0.2.240/28
+
+# move inexact policies to hash table
+ip -net ns3 xfrm policy set hthresh4 16 16
+
+sleep $((RANDOM%5))
+check_exceptions "exceptions and block policies after hthresh change in ns3"
+
+# restore original hthresh settings -- move policies back to tables
+for n in ns3 ns4;do
+	ip -net $n xfrm policy set hthresh4 32 32 hthresh6 128 128
+	sleep $((RANDOM%5))
+done
+check_exceptions "exceptions and block policies after hresh change to normal"
+
 for i in 1 2 3 4;do ip netns del ns$i;done
 
 exit $ret
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 07/10] xfrm: policy: fix infinite loop when merging src-nodes
  2019-01-25  7:44 pull request (net): ipsec 2019-01-25 Steffen Klassert
                   ` (5 preceding siblings ...)
  2019-01-25  7:44 ` [PATCH 06/10] selftests: xfrm: alter htresh to trigger move of policies to hash table Steffen Klassert
@ 2019-01-25  7:44 ` Steffen Klassert
  2019-01-25  7:44 ` [PATCH 08/10] vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel Steffen Klassert
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steffen Klassert @ 2019-01-25  7:44 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

From: Florian Westphal <fw@strlen.de>

With very small change to test script we can trigger softlockup due to
bogus assignment of 'p' (policy to be examined) on restart.

Previously the two to-be-merged nodes had same address/prefixlength pair,
so no erase/reinsert was necessary, we only had to append the list from
node a to b.

If prefix lengths are different, the node has to be deleted and re-inserted
into the tree, with the updated prefix length.  This was broken; due to
bogus update to 'p' this loops forever.

Add a 'restart' label and use that instead.

While at it, don't perform the unneeded reinserts of the policies that
are already sorted into the 'new' node.

A previous patch in this series made xfrm_policy_inexact_list_reinsert()
use the relative position indicator to sort policies according to age in
case priorities are identical.

Fixes: 6ac098b2a9d30 ("xfrm: policy: add 2nd-level saddr trees for inexact policies")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c                     | 15 +++++++--------
 tools/testing/selftests/net/xfrm_policy.sh |  4 ++--
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index e691683223ee..8cfd75b62396 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -886,12 +886,13 @@ static void xfrm_policy_inexact_node_reinsert(struct net *net,
 					      struct rb_root *new,
 					      u16 family)
 {
-	struct rb_node **p, *parent = NULL;
 	struct xfrm_pol_inexact_node *node;
+	struct rb_node **p, *parent;
 
 	/* we should not have another subtree here */
 	WARN_ON_ONCE(!RB_EMPTY_ROOT(&n->root));
-
+restart:
+	parent = NULL;
 	p = &new->rb_node;
 	while (*p) {
 		u8 prefixlen;
@@ -911,12 +912,11 @@ static void xfrm_policy_inexact_node_reinsert(struct net *net,
 		} else {
 			struct xfrm_policy *tmp;
 
-			hlist_for_each_entry(tmp, &node->hhead, bydst)
-				tmp->bydst_reinsert = true;
-			hlist_for_each_entry(tmp, &n->hhead, bydst)
+			hlist_for_each_entry(tmp, &n->hhead, bydst) {
 				tmp->bydst_reinsert = true;
+				hlist_del_rcu(&tmp->bydst);
+			}
 
-			INIT_HLIST_HEAD(&node->hhead);
 			xfrm_policy_inexact_list_reinsert(net, node, family);
 
 			if (node->prefixlen == n->prefixlen) {
@@ -928,8 +928,7 @@ static void xfrm_policy_inexact_node_reinsert(struct net *net,
 			kfree_rcu(n, rcu);
 			n = node;
 			n->prefixlen = prefixlen;
-			*p = new->rb_node;
-			parent = NULL;
+			goto restart;
 		}
 	}
 
diff --git a/tools/testing/selftests/net/xfrm_policy.sh b/tools/testing/selftests/net/xfrm_policy.sh
index 8ce54600d4d1..71d7fdc513c1 100755
--- a/tools/testing/selftests/net/xfrm_policy.sh
+++ b/tools/testing/selftests/net/xfrm_policy.sh
@@ -78,8 +78,8 @@ do_overlap()
     # adds a new node in the 10.0.0.0/24 tree (dst node exists).
     ip -net $ns xfrm policy add src 10.2.0.0/24 dst 10.0.0.0/24 dir fwd priority 200 action block
 
-    # adds a 10.2.0.0/24 node, but for different dst.
-    ip -net $ns xfrm policy add src 10.2.0.0/24 dst 10.0.1.0/24 dir fwd priority 200 action block
+    # adds a 10.2.0.0/23 node, but for different dst.
+    ip -net $ns xfrm policy add src 10.2.0.0/23 dst 10.0.1.0/24 dir fwd priority 200 action block
 
     # dst now overlaps with the 10.0.1.0/24 ESP policy in fwd.
     # kernel must 'promote' existing one (10.0.0.0/24) to 10.0.0.0/23.
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 08/10] vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel
  2019-01-25  7:44 pull request (net): ipsec 2019-01-25 Steffen Klassert
                   ` (6 preceding siblings ...)
  2019-01-25  7:44 ` [PATCH 07/10] xfrm: policy: fix infinite loop when merging src-nodes Steffen Klassert
@ 2019-01-25  7:44 ` Steffen Klassert
  2019-01-25  7:44 ` [PATCH 09/10] xfrm: refine validation of template and selector families Steffen Klassert
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steffen Klassert @ 2019-01-25  7:44 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

From: Su Yanjun <suyj.fnst@cn.fujitsu.com>

Recently we run a network test over ipcomp virtual tunnel.We find that
if a ipv4 packet needs fragment, then the peer can't receive
it.

We deep into the code and find that when packet need fragment the smaller
fragment will be encapsulated by ipip not ipcomp. So when the ipip packet
goes into xfrm, it's skb->dev is not properly set. The ipv4 reassembly code
always set skb'dev to the last fragment's dev. After ipv4 defrag processing,
when the kernel rp_filter parameter is set, the skb will be drop by -EXDEV
error.

This patch adds compatible support for the ipip process in ipcomp virtual tunnel.

Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/ip_vti.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index d7b43e700023..68a21bf75dd0 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -74,6 +74,33 @@ static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
 	return 0;
 }
 
+static int vti_input_ipip(struct sk_buff *skb, int nexthdr, __be32 spi,
+		     int encap_type)
+{
+	struct ip_tunnel *tunnel;
+	const struct iphdr *iph = ip_hdr(skb);
+	struct net *net = dev_net(skb->dev);
+	struct ip_tunnel_net *itn = net_generic(net, vti_net_id);
+
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
+				  iph->saddr, iph->daddr, 0);
+	if (tunnel) {
+		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
+			goto drop;
+
+		XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4 = tunnel;
+
+		skb->dev = tunnel->dev;
+
+		return xfrm_input(skb, nexthdr, spi, encap_type);
+	}
+
+	return -EINVAL;
+drop:
+	kfree_skb(skb);
+	return 0;
+}
+
 static int vti_rcv(struct sk_buff *skb)
 {
 	XFRM_SPI_SKB_CB(skb)->family = AF_INET;
@@ -82,6 +109,14 @@ static int vti_rcv(struct sk_buff *skb)
 	return vti_input(skb, ip_hdr(skb)->protocol, 0, 0);
 }
 
+static int vti_rcv_ipip(struct sk_buff *skb)
+{
+	XFRM_SPI_SKB_CB(skb)->family = AF_INET;
+	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct iphdr, daddr);
+
+	return vti_input_ipip(skb, ip_hdr(skb)->protocol, ip_hdr(skb)->saddr, 0);
+}
+
 static int vti_rcv_cb(struct sk_buff *skb, int err)
 {
 	unsigned short family;
@@ -435,6 +470,12 @@ static struct xfrm4_protocol vti_ipcomp4_protocol __read_mostly = {
 	.priority	=	100,
 };
 
+static struct xfrm_tunnel ipip_handler __read_mostly = {
+	.handler	=	vti_rcv_ipip,
+	.err_handler	=	vti4_err,
+	.priority	=	0,
+};
+
 static int __net_init vti_init_net(struct net *net)
 {
 	int err;
@@ -603,6 +644,13 @@ static int __init vti_init(void)
 	if (err < 0)
 		goto xfrm_proto_comp_failed;
 
+	msg = "ipip tunnel";
+	err = xfrm4_tunnel_register(&ipip_handler, AF_INET);
+	if (err < 0) {
+		pr_info("%s: cant't register tunnel\n",__func__);
+		goto xfrm_tunnel_failed;
+	}
+
 	msg = "netlink interface";
 	err = rtnl_link_register(&vti_link_ops);
 	if (err < 0)
@@ -612,6 +660,8 @@ static int __init vti_init(void)
 
 rtnl_link_failed:
 	xfrm4_protocol_deregister(&vti_ipcomp4_protocol, IPPROTO_COMP);
+xfrm_tunnel_failed:
+	xfrm4_tunnel_deregister(&ipip_handler, AF_INET);
 xfrm_proto_comp_failed:
 	xfrm4_protocol_deregister(&vti_ah4_protocol, IPPROTO_AH);
 xfrm_proto_ah_failed:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 09/10] xfrm: refine validation of template and selector families
  2019-01-25  7:44 pull request (net): ipsec 2019-01-25 Steffen Klassert
                   ` (7 preceding siblings ...)
  2019-01-25  7:44 ` [PATCH 08/10] vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel Steffen Klassert
@ 2019-01-25  7:44 ` Steffen Klassert
  2019-01-25  7:44 ` [PATCH 10/10] xfrm: Make set-mark default behavior backward compatible Steffen Klassert
  2019-01-27 18:32 ` pull request (net): ipsec 2019-01-25 David Miller
  10 siblings, 0 replies; 12+ messages in thread
From: Steffen Klassert @ 2019-01-25  7:44 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

From: Florian Westphal <fw@strlen.de>

The check assumes that in transport mode, the first templates family
must match the address family of the policy selector.

Syzkaller managed to build a template using MODE_ROUTEOPTIMIZATION,
with ipv4-in-ipv6 chain, leading to following splat:

BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x1db/0x1854
Read of size 4 at addr ffff888063e57aa0 by task a.out/2050
 xfrm_state_find+0x1db/0x1854
 xfrm_tmpl_resolve+0x100/0x1d0
 xfrm_resolve_and_create_bundle+0x108/0x1000 [..]

Problem is that addresses point into flowi4 struct, but xfrm_state_find
treats them as being ipv6 because it uses templ->encap_family is used
(AF_INET6 in case of reproducer) rather than family (AF_INET).

This patch inverts the logic: Enforce 'template family must match
selector' EXCEPT for tunnel and BEET mode.

In BEET and Tunnel mode, xfrm_tmpl_resolve_one will have remote/local
address pointers changed to point at the addresses found in the template,
rather than the flowi ones, so no oob read will occur.

Reported-by: 3ntr0py1337@gmail.com
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_user.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 277c1c46fe94..c6d26afcf89d 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1488,10 +1488,15 @@ static int validate_tmpl(int nr, struct xfrm_user_tmpl *ut, u16 family)
 		if (!ut[i].family)
 			ut[i].family = family;
 
-		if ((ut[i].mode == XFRM_MODE_TRANSPORT) &&
-		    (ut[i].family != prev_family))
-			return -EINVAL;
-
+		switch (ut[i].mode) {
+		case XFRM_MODE_TUNNEL:
+		case XFRM_MODE_BEET:
+			break;
+		default:
+			if (ut[i].family != prev_family)
+				return -EINVAL;
+			break;
+		}
 		if (ut[i].mode >= XFRM_MODE_MAX)
 			return -EINVAL;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 10/10] xfrm: Make set-mark default behavior backward compatible
  2019-01-25  7:44 pull request (net): ipsec 2019-01-25 Steffen Klassert
                   ` (8 preceding siblings ...)
  2019-01-25  7:44 ` [PATCH 09/10] xfrm: refine validation of template and selector families Steffen Klassert
@ 2019-01-25  7:44 ` Steffen Klassert
  2019-01-27 18:32 ` pull request (net): ipsec 2019-01-25 David Miller
  10 siblings, 0 replies; 12+ messages in thread
From: Steffen Klassert @ 2019-01-25  7:44 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

From: Benedict Wong <benedictwong@google.com>

Fixes 9b42c1f179a6, which changed the default route lookup behavior for
tunnel mode SAs in the outbound direction to use the skb mark, whereas
previously mark=0 was used if the output mark was unspecified. In
mark-based routing schemes such as Android’s, this change in default
behavior causes routing loops or lookup failures.

This patch restores the default behavior of using a 0 mark while still
incorporating the skb mark if the SET_MARK (and SET_MARK_MASK) is
specified.

Tested with additions to Android's kernel unit test suite:
https://android-review.googlesource.com/c/kernel/tests/+/860150

Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking")
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 8cfd75b62396..ba0a4048c846 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2600,7 +2600,10 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 		dst_copy_metrics(dst1, dst);
 
 		if (xfrm[i]->props.mode != XFRM_MODE_TRANSPORT) {
-			__u32 mark = xfrm_smark_get(fl->flowi_mark, xfrm[i]);
+			__u32 mark = 0;
+
+			if (xfrm[i]->props.smark.v || xfrm[i]->props.smark.m)
+				mark = xfrm_smark_get(fl->flowi_mark, xfrm[i]);
 
 			family = xfrm[i]->props.family;
 			dst = xfrm_dst_lookup(xfrm[i], tos, fl->flowi_oif,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: pull request (net): ipsec 2019-01-25
  2019-01-25  7:44 pull request (net): ipsec 2019-01-25 Steffen Klassert
                   ` (9 preceding siblings ...)
  2019-01-25  7:44 ` [PATCH 10/10] xfrm: Make set-mark default behavior backward compatible Steffen Klassert
@ 2019-01-27 18:32 ` David Miller
  10 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2019-01-27 18:32 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, netdev

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Fri, 25 Jan 2019 08:44:16 +0100

> 1) Several patches to fix the fallout from the recent
>    tree based policy lookup work. From Florian Westphal.
> 
> 2) Fix VTI for IPCOMP for 'not compressed' IPCOMP packets.
>    We need an extra IPIP handler to process these packets
>    correctly. From Su Yanjun.
> 
> 3) Fix validation of template and selector families for
>    MODE_ROUTEOPTIMIZATION with ipv4-in-ipv6 packets.
>    This can lead to a stack-out-of-bounds because
>    flowi4 struct is treated as flowi6 struct.
>    Fix from Florian Westphal.
> 
> 4) Restore the default behaviour of the xfrm set-mark
>    in the output path. This was changed accidentally
>    when mark setting was extended to the input path.
>    From Benedict Wong.
> 
> Please pull or let me know if there are problems.

Pulled, thanks Steffen!

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-01-27 18:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-25  7:44 pull request (net): ipsec 2019-01-25 Steffen Klassert
2019-01-25  7:44 ` [PATCH 01/10] selftests: xfrm: add block rules with adjacent/overlapping subnets Steffen Klassert
2019-01-25  7:44 ` [PATCH 02/10] xfrm: policy: use hlist rcu variants on inexact insert, part 2 Steffen Klassert
2019-01-25  7:44 ` [PATCH 03/10] xfrm: policy: increment xfrm_hash_generation on hash rebuild Steffen Klassert
2019-01-25  7:44 ` [PATCH 04/10] xfrm: policy: delete inexact policies from inexact list " Steffen Klassert
2019-01-25  7:44 ` [PATCH 05/10] xfrm: policy: fix reinsertion on node merge Steffen Klassert
2019-01-25  7:44 ` [PATCH 06/10] selftests: xfrm: alter htresh to trigger move of policies to hash table Steffen Klassert
2019-01-25  7:44 ` [PATCH 07/10] xfrm: policy: fix infinite loop when merging src-nodes Steffen Klassert
2019-01-25  7:44 ` [PATCH 08/10] vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel Steffen Klassert
2019-01-25  7:44 ` [PATCH 09/10] xfrm: refine validation of template and selector families Steffen Klassert
2019-01-25  7:44 ` [PATCH 10/10] xfrm: Make set-mark default behavior backward compatible Steffen Klassert
2019-01-27 18:32 ` pull request (net): ipsec 2019-01-25 David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).