* [PATCH net 0/2] wireguard fixes for 5.9-rc5
@ 2020-09-09 11:58 Jason A. Donenfeld
  2020-09-09 11:58 ` [PATCH net 1/2] wireguard: noise: take lock when removing handshake entry from table Jason A. Donenfeld
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Jason A. Donenfeld @ 2020-09-09 11:58 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, Eric Dumazet

Hi Dave,

Yesterday, Eric reported a race condition found by syzbot. This series
contains two commits, one that fixes the direct issue, and another that
addresses the more general issue, as a defense in depth.

1) The basic problem syzbot unearthed was that one particular mutation
   of handshake->entry was not protected by the handshake mutex like the
   other cases, so this patch basically just reorders a line to make
   sure the mutex is actually taken at the right point. Most of the work
   here went into making sure the race was fully understood and making a
   reproducer (which syzbot was unable to do itself, due to the rarity
   of the race).

2) Eric's initial suggestion for fixing this was taking a spinlock
   around the hash table replace function where the null ptr deref was
   happening. This doesn't address the main problem in the most precise
   possible way like (1) does, but it is a good suggestion for
   defense-in-depth, in case related issues come up in the future, and
   basically costs nothing from a performance perspective. I thought it
   aided in implementing a good general rule: all mutators of that hash
   table take the table lock. So that's part of this series as a
   companion.

Both of these contain Fixes: tags and are good candidates for stable.

Jason A. Donenfeld (2):
  wireguard: noise: take lock when removing handshake entry from table
  wireguard: peerlookup: take lock before checking hash in replace
    operation

 drivers/net/wireguard/noise.c      |  5 +----
 drivers/net/wireguard/peerlookup.c | 11 ++++++++---
 2 files changed, 9 insertions(+), 7 deletions(-)

Cc: Eric Dumazet <edumazet@google.com>

-- 
2.28.0



* [PATCH net 1/2] wireguard: noise: take lock when removing handshake entry from table
  2020-09-09 11:58 [PATCH net 0/2] wireguard fixes for 5.9-rc5 Jason A. Donenfeld
@ 2020-09-09 11:58 ` Jason A. Donenfeld
  2020-09-09 11:58 ` [PATCH net 2/2] wireguard: peerlookup: take lock before checking hash in replace operation Jason A. Donenfeld
  2020-09-09 18:33 ` [PATCH net 0/2] wireguard fixes for 5.9-rc5 David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Jason A. Donenfeld @ 2020-09-09 11:58 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, syzbot, Eric Dumazet

Eric reported that syzkaller found a race of this variety:

CPU 1                                       CPU 2
-------------------------------------------|---------------------------------------
wg_index_hashtable_replace(old, ...)       |
  if (hlist_unhashed(&old->index_hash))    |
                                           | wg_index_hashtable_remove(old)
                                           |   hlist_del_init_rcu(&old->index_hash)
                                           |     old->index_hash.pprev = NULL
  hlist_replace_rcu(&old->index_hash, ...) |
    *old->index_hash.pprev                 |
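
The interleaving above can be replayed deterministically in userspace.
What follows is only a sketch: the model_ types and helpers are
simplified, non-RCU stand-ins written for illustration, not the kernel's
hlist implementation, and the two CPUs are simulated by running their
steps in the racy order on a single thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, non-RCU stand-ins for the kernel's hlist types (assumption). */
struct model_node { struct model_node *next, **pprev; };
struct model_head { struct model_node *first; };

/* A node is "unhashed" when it has no back-pointer into a list. */
static bool model_unhashed(const struct model_node *n)
{
	return n->pprev == NULL;
}

static void model_add_head(struct model_node *n, struct model_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink the node and clear pprev, as the removal path in the diagram does. */
static void model_del_init(struct model_node *n)
{
	if (model_unhashed(n))
		return;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/*
 * Replays the racy order on one thread. Returns true when the replace
 * path would go on to write through a NULL pprev.
 */
bool model_race_hits_null_pprev(void)
{
	struct model_head bucket = { NULL };
	struct model_node old = { NULL, NULL };
	bool check_passed;

	model_add_head(&old, &bucket);
	check_passed = !model_unhashed(&old); /* CPU 1: unlocked check passes */
	model_del_init(&old);                 /* CPU 2: removal lands in the window */
	/* CPU 1 would now write through old.pprev, which is NULL. */
	return check_passed && old.pprev == NULL;
}
```

The helper stops short of the final write on purpose, so that it reports
the bad state instead of faulting.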

Syzbot wasn't actually able to reproduce this more than once or create a
reproducer, because the race window between checking "hlist_unhashed" and
calling "hlist_replace_rcu" is just so small. Adding an mdelay(5) or
similar there helps make this demonstrable using this simple script:

    #!/bin/bash
    set -ex
    trap 'kill $pid1; kill $pid2; ip link del wg0; ip link del wg1' EXIT
    ip link add wg0 type wireguard
    ip link add wg1 type wireguard
    wg set wg0 private-key <(wg genkey) listen-port 9999
    wg set wg1 private-key <(wg genkey) peer $(wg show wg0 public-key) endpoint 127.0.0.1:9999 persistent-keepalive 1
    wg set wg0 peer $(wg show wg1 public-key)
    ip link set wg0 up
    yes link set wg1 up | ip -force -batch - &
    pid1=$!
    yes link set wg1 down | ip -force -batch - &
    pid2=$!
    wait

The fundamental underlying problem is that we permit calls to
wg_index_hashtable_remove(handshake.entry) without requiring the caller
to take the handshake mutex that is intended to protect members of
handshake during mutations. The mutex is consistently taken around calls
to wg_index_hashtable_insert(handshake.entry) and
wg_index_hashtable_replace(handshake.entry), but it's missing from a
pertinent callsite of wg_index_hashtable_remove(handshake.entry). So,
this patch makes sure that mutex is taken.

The original code was a little bit funky though, in the form of:

    remove(handshake.entry)
    lock(), memzero(handshake.some_members), unlock()
    remove(handshake.entry)

The original intention of that double removal pattern outside the lock
appears to be some attempt to prevent insertions that might happen while
locks are dropped during expensive crypto operations, but actually, all
callers of wg_index_hashtable_insert(handshake.entry) take the write
lock and then explicitly check handshake.state, as they should, which
the aforementioned memzero clears, which means an insertion should
already be impossible. And regardless, the original intention was
necessarily racy, since it wasn't guaranteed that something else would
run after the unlock() instead of after the remove(). So, from a
soundness perspective, it seems positive to remove what looks like a
hack at best.
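
The argument that a cleared handshake cannot be re-inserted can be
sketched with a small single-threaded model. Everything here is
hypothetical and reduced for illustration: the struct, the model_
helpers, the integer state values, and the boolean standing in for the
lock are assumptions, not the driver's actual definitions, and the
boolean only asserts lock discipline rather than providing real mutual
exclusion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced model of a handshake (illustration only). */
struct model_handshake {
	bool write_locked; /* stands in for holding handshake->lock for write */
	int state;         /* 0 plays the role of the zeroed state */
	bool hashed;       /* stands in for entry being in the index table */
};

static void model_lock(struct model_handshake *hs)
{
	assert(!hs->write_locked);
	hs->write_locked = true;
}

static void model_unlock(struct model_handshake *hs)
{
	assert(hs->write_locked);
	hs->write_locked = false;
}

/* Post-patch shape of clear: removal and zeroing share one critical section. */
void model_clear(struct model_handshake *hs)
{
	model_lock(hs);
	hs->hashed = false; /* remove(handshake.entry) */
	hs->state = 0;      /* memzero(handshake.some_members) */
	model_unlock(hs);
}

/* Inserters take the lock and refuse to proceed unless state is as expected. */
bool model_insert(struct model_handshake *hs, int expected_state)
{
	bool ok;

	model_lock(hs);
	ok = hs->state == expected_state;
	if (ok)
		hs->hashed = true;
	model_unlock(hs);
	return ok;
}
```

In this model, an insert that expects a non-zero state succeeds before
model_clear() and is refused afterwards, which is why the second
remove() after the unlock adds nothing.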

The crash from both syzbot and from the script above is as follows:

  general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
  KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
  CPU: 0 PID: 7395 Comm: kworker/0:3 Not tainted 5.9.0-rc4-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: wg-kex-wg1 wg_packet_handshake_receive_worker
  RIP: 0010:hlist_replace_rcu include/linux/rculist.h:505 [inline]
  RIP: 0010:wg_index_hashtable_replace+0x176/0x330 drivers/net/wireguard/peerlookup.c:174
  Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 44 01 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 10 48 89 c6 48 c1 ee 03 <80> 3c 0e 00 0f 85 06 01 00 00 48 85 d2 4c 89 28 74 47 e8 a3 4f b5
  RSP: 0018:ffffc90006a97bf8 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff888050ffc4f8 RCX: dffffc0000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88808e04e010
  RBP: ffff88808e04e000 R08: 0000000000000001 R09: ffff8880543d0000
  R10: ffffed100a87a000 R11: 000000000000016e R12: ffff8880543d0000
  R13: ffff88808e04e008 R14: ffff888050ffc508 R15: ffff888050ffc500
  FS:  0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000f5505db0 CR3: 0000000097cf7000 CR4: 00000000001526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
  wg_noise_handshake_begin_session+0x752/0xc9a drivers/net/wireguard/noise.c:820
  wg_receive_handshake_packet drivers/net/wireguard/receive.c:183 [inline]
  wg_packet_handshake_receive_worker+0x33b/0x730 drivers/net/wireguard/receive.c:220
  process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
  worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
  kthread+0x3b5/0x4a0 kernel/kthread.c:292
  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/net/wireguard/noise.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/wireguard/noise.c b/drivers/net/wireguard/noise.c
index 3dd3b76790d0..c0cfd9b36c0b 100644
--- a/drivers/net/wireguard/noise.c
+++ b/drivers/net/wireguard/noise.c
@@ -87,15 +87,12 @@ static void handshake_zero(struct noise_handshake *handshake)
 
 void wg_noise_handshake_clear(struct noise_handshake *handshake)
 {
+	down_write(&handshake->lock);
 	wg_index_hashtable_remove(
 			handshake->entry.peer->device->index_hashtable,
 			&handshake->entry);
-	down_write(&handshake->lock);
 	handshake_zero(handshake);
 	up_write(&handshake->lock);
-	wg_index_hashtable_remove(
-			handshake->entry.peer->device->index_hashtable,
-			&handshake->entry);
 }
 
 static struct noise_keypair *keypair_create(struct wg_peer *peer)
-- 
2.28.0



* [PATCH net 2/2] wireguard: peerlookup: take lock before checking hash in replace operation
  2020-09-09 11:58 [PATCH net 0/2] wireguard fixes for 5.9-rc5 Jason A. Donenfeld
  2020-09-09 11:58 ` [PATCH net 1/2] wireguard: noise: take lock when removing handshake entry from table Jason A. Donenfeld
@ 2020-09-09 11:58 ` Jason A. Donenfeld
  2020-09-09 18:33 ` [PATCH net 0/2] wireguard fixes for 5.9-rc5 David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Jason A. Donenfeld @ 2020-09-09 11:58 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, Eric Dumazet

Eric's suggested fix for the previous commit's mentioned race condition
was to simply take the table->lock in wg_index_hashtable_replace(). The
table->lock of the hash table is supposed to protect the bucket heads,
not the entries, but actually, since all the mutator functions are
already taking it, it makes sense to take it for the hlist_unhashed
test too, as a defense-in-depth measure, so that the test no longer
races with deletions, regardless of what other locks are protecting
individual entries. This is sensible from a performance perspective
because, as Eric pointed out, the case of being unhashed is already the
unlikely case, so this won't add common contention. And comparing
instructions, this basically doesn't make much of a difference other
than pushing and popping %r13, used by the new `bool ret`. More
generally, I like the idea of locking consistency across table mutator
functions, and this might let me rest slightly easier at night.
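
The rule that every table mutator, including the hashed check in
replace, runs under the table lock can be sketched in userspace as
follows. This is a single-threaded model under stated assumptions: the
model_ names are hypothetical, the table has one bucket, RCU is omitted,
and the boolean lock only asserts that critical sections do not nest; it
does not provide the mutual exclusion that spin_lock_bh() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_node { struct model_node *next, **pprev; };

struct model_table {
	struct model_node *first; /* a single bucket, for brevity */
	bool locked;              /* stands in for table->lock */
};

static void model_table_lock(struct model_table *t)
{
	assert(!t->locked);
	t->locked = true;
}

static void model_table_unlock(struct model_table *t)
{
	assert(t->locked);
	t->locked = false;
}

void model_insert(struct model_table *t, struct model_node *n)
{
	model_table_lock(t);
	n->next = t->first;
	if (t->first)
		t->first->pprev = &n->next;
	t->first = n;
	n->pprev = &t->first;
	model_table_unlock(t);
}

void model_remove(struct model_table *t, struct model_node *n)
{
	model_table_lock(t);
	if (n->pprev) {
		*n->pprev = n->next;
		if (n->next)
			n->next->pprev = n->pprev;
		n->next = NULL;
		n->pprev = NULL;
	}
	model_table_unlock(t);
}

/* The hashed check sits inside the critical section, next to the replace. */
bool model_replace(struct model_table *t, struct model_node *old,
		   struct model_node *new_node)
{
	bool ret;

	model_table_lock(t);
	ret = old->pprev != NULL;
	if (!ret)
		goto out;
	new_node->next = old->next;
	new_node->pprev = old->pprev;
	*new_node->pprev = new_node;
	if (new_node->next)
		new_node->next->pprev = &new_node->next;
	old->next = NULL;
	old->pprev = NULL;
out:
	model_table_unlock(t);
	return ret;
}
```

With a real lock in place of the boolean, a removal can only run before
the check or after the replace completes, never in between.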

Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/net/wireguard/peerlookup.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireguard/peerlookup.c b/drivers/net/wireguard/peerlookup.c
index e4deb331476b..f2783aa7a88f 100644
--- a/drivers/net/wireguard/peerlookup.c
+++ b/drivers/net/wireguard/peerlookup.c
@@ -167,9 +167,13 @@ bool wg_index_hashtable_replace(struct index_hashtable *table,
 				struct index_hashtable_entry *old,
 				struct index_hashtable_entry *new)
 {
-	if (unlikely(hlist_unhashed(&old->index_hash)))
-		return false;
+	bool ret;
+
 	spin_lock_bh(&table->lock);
+	ret = !hlist_unhashed(&old->index_hash);
+	if (unlikely(!ret))
+		goto out;
+
 	new->index = old->index;
 	hlist_replace_rcu(&old->index_hash, &new->index_hash);
 
@@ -180,8 +184,9 @@ bool wg_index_hashtable_replace(struct index_hashtable *table,
 	 * simply gets dropped, which isn't terrible.
 	 */
 	INIT_HLIST_NODE(&old->index_hash);
+out:
 	spin_unlock_bh(&table->lock);
-	return true;
+	return ret;
 }
 
 void wg_index_hashtable_remove(struct index_hashtable *table,
-- 
2.28.0



* Re: [PATCH net 0/2] wireguard fixes for 5.9-rc5
  2020-09-09 11:58 [PATCH net 0/2] wireguard fixes for 5.9-rc5 Jason A. Donenfeld
  2020-09-09 11:58 ` [PATCH net 1/2] wireguard: noise: take lock when removing handshake entry from table Jason A. Donenfeld
  2020-09-09 11:58 ` [PATCH net 2/2] wireguard: peerlookup: take lock before checking hash in replace operation Jason A. Donenfeld
@ 2020-09-09 18:33 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2020-09-09 18:33 UTC (permalink / raw)
  To: Jason; +Cc: netdev, edumazet

From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Wed,  9 Sep 2020 13:58:13 +0200

> Yesterday, Eric reported a race condition found by syzbot. This series
> contains two commits, one that fixes the direct issue, and another that
> addresses the more general issue, as a defense in depth.
> 
> 1) The basic problem syzbot unearthed was that one particular mutation
>    of handshake->entry was not protected by the handshake mutex like the
>    other cases, so this patch basically just reorders a line to make
>    sure the mutex is actually taken at the right point. Most of the work
>    here went into making sure the race was fully understood and making a
>    reproducer (which syzbot was unable to do itself, due to the rarity
>    of the race).
> 
> 2) Eric's initial suggestion for fixing this was taking a spinlock
>    around the hash table replace function where the null ptr deref was
>    happening. This doesn't address the main problem in the most precise
>    possible way like (1) does, but it is a good suggestion for
>    defense-in-depth, in case related issues come up in the future, and
>    basically costs nothing from a performance perspective. I thought it
>    aided in implementing a good general rule: all mutators of that hash
>    table take the table lock. So that's part of this series as a
>    companion.
> 
> Both of these contain Fixes: tags and are good candidates for stable.

Series applied and queued up for -stable, thanks.

