bpf.vger.kernel.org archive mirror
* [bpf PATCH v2 0/6] bpf: sockmap/tls fixes
@ 2019-07-08 19:13 John Fastabend
  2019-07-08 19:13 ` [bpf PATCH v2 1/6] tls: remove close callback sock unlock/lock and flush_sync John Fastabend
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: John Fastabend @ 2019-07-08 19:13 UTC (permalink / raw)
  To: jakub.kicinski, ast, daniel; +Cc: netdev, edumazet, john.fastabend, bpf

Resolve a series of splats discovered by syzbot and an unhash
TLS issue noted by Eric Dumazet.

The main issues revolved around interaction between TLS and
sockmap tear down. TLS and sockmap could both reset sk->prot
ops creating a condition where a close or unhash op could be
called forever. A rare race condition resulting from a missing
rcu sync operation was causing a use after free. Then on the
TLS side dropping the sock lock and re-acquiring it during the
close op could hang. Finally, sockmap must be deployed before
tls for current stack assumptions to be met. This is enforced
now. A feature series can enable it.

To fix this first refactor TLS code so the lock is held for the
entire teardown operation. Then add an unhash callback to ensure
TLS can not transition from ESTABLISHED to LISTEN state. This
transition is a similar bug to the one found and fixed previously
in sockmap. Then apply three fixes to sockmap to fix up races
on tear down around map free and close. Finally, if sockmap
is destroyed before TLS we add a new ULP op update to inform
the TLS stack it should not call sockmap ops. This last one
appears to be the most commonly found issue from syzbot.

---

John Fastabend (6):
      tls: remove close callback sock unlock/lock and flush_sync
      bpf: tls fix transition through disconnect with close
      bpf: sockmap, sock_map_delete needs to use xchg
      bpf: sockmap, synchronize_rcu before free'ing map
      bpf: sockmap, only create entry if ulp is not already enabled
      bpf: sockmap/tls, close can race with map free


 include/linux/skmsg.h |    8 +++
 include/net/tcp.h     |    3 +
 include/net/tls.h     |   10 +++-
 net/core/skmsg.c      |    4 +
 net/core/sock_map.c   |   19 +++++--
 net/ipv4/tcp_ulp.c    |   13 +++++
 net/tls/tls_main.c    |  135 ++++++++++++++++++++++++++++++++++++++-----------
 net/tls/tls_sw.c      |   38 +++++++++-----
 8 files changed, 176 insertions(+), 54 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [bpf PATCH v2 1/6] tls: remove close callback sock unlock/lock and flush_sync
  2019-07-08 19:13 [bpf PATCH v2 0/6] bpf: sockmap/tls fixes John Fastabend
@ 2019-07-08 19:13 ` John Fastabend
  2019-07-08 19:14 ` [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close John Fastabend
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: John Fastabend @ 2019-07-08 19:13 UTC (permalink / raw)
  To: jakub.kicinski, ast, daniel; +Cc: netdev, edumazet, john.fastabend, bpf

The tls close() callback currently drops the sock lock, makes a
cancel_delayed_work_sync() call, and then relocks the sock. The
same pattern is used again to call strp_done().

By restructuring the code we can avoid dropping the lock and then
reacquiring it. To simplify this we do the following:

 tls_sk_proto_close
  set_bit(CLOSING)
  set_bit(SCHEDULE)
  cancel_delay_work_sync() <- cancel workqueue
  lock_sock(sk)
  ...
  release_sock(sk)
  strp_done()

Setting the CLOSING bit prevents the SCHEDULE bit from being
cleared by any workqueue items, e.g. if one happens to be
scheduled and run between when we set the SCHEDULE bit and cancel
the work. Then, because the SCHEDULE bit remains set, no new work
will be scheduled.

Then strp_done() is called after the sock work has completed.
Any outstanding work is synced and finally the ctx is freed.

Tested with net selftests and bpf selftests.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/net/tls.h  |    4 ++--
 net/tls/tls_main.c |   56 ++++++++++++++++++++++++++--------------------------
 net/tls/tls_sw.c   |   38 ++++++++++++++++++++++-------------
 3 files changed, 54 insertions(+), 44 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 63e473420b00..0a3540a6304d 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -107,9 +107,7 @@ struct tls_device {
 enum {
 	TLS_BASE,
 	TLS_SW,
-#ifdef CONFIG_TLS_DEVICE
 	TLS_HW,
-#endif
 	TLS_HW_RECORD,
 	TLS_NUM_CONFIG,
 };
@@ -162,6 +160,7 @@ struct tls_sw_context_tx {
 	int async_capable;
 
 #define BIT_TX_SCHEDULED	0
+#define BIT_TX_CLOSING		1
 	unsigned long tx_bitmask;
 };
 
@@ -361,6 +360,7 @@ void tls_sw_close(struct sock *sk, long timeout);
 void tls_sw_free_resources_tx(struct sock *sk);
 void tls_sw_free_resources_rx(struct sock *sk);
 void tls_sw_release_resources_rx(struct sock *sk);
+void tls_sw_release_strp_rx(struct tls_context *tls_ctx);
 int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		   int nonblock, int flags, int *addr_len);
 bool tls_sw_stream_read(const struct sock *sk);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index fc81ae18cc44..d63c3583d2f7 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -261,24 +261,9 @@ static void tls_ctx_free(struct tls_context *ctx)
 	kfree(ctx);
 }
 
-static void tls_sk_proto_close(struct sock *sk, long timeout)
+static void tls_sk_proto_cleanup(struct sock *sk,
+				 struct tls_context *ctx, long timeo)
 {
-	struct tls_context *ctx = tls_get_ctx(sk);
-	long timeo = sock_sndtimeo(sk, 0);
-	void (*sk_proto_close)(struct sock *sk, long timeout);
-	bool free_ctx = false;
-
-	lock_sock(sk);
-	sk_proto_close = ctx->sk_proto_close;
-
-	if (ctx->tx_conf == TLS_HW_RECORD && ctx->rx_conf == TLS_HW_RECORD)
-		goto skip_tx_cleanup;
-
-	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE) {
-		free_ctx = true;
-		goto skip_tx_cleanup;
-	}
-
 	if (!tls_complete_pending_work(sk, ctx, 0, &timeo))
 		tls_handle_open_record(sk, 0);
 
@@ -299,22 +284,37 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 #ifdef CONFIG_TLS_DEVICE
 	if (ctx->rx_conf == TLS_HW)
 		tls_device_offload_cleanup_rx(sk);
-
-	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW) {
-#else
-	{
 #endif
-		tls_ctx_free(ctx);
-		ctx = NULL;
-	}
+}
+
+static void tls_sk_proto_close(struct sock *sk, long timeout)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	long timeo = sock_sndtimeo(sk, 0);
+	void (*sk_proto_close)(struct sock *sk, long timeout);
+
+	if (ctx->tx_conf == TLS_SW)
+		tls_sw_cancel_work_tx(ctx);
+
+	lock_sock(sk);
+	sk_proto_close = ctx->sk_proto_close;
+
+	if (ctx->tx_conf == TLS_HW_RECORD && ctx->rx_conf == TLS_HW_RECORD)
+		goto skip_tx_cleanup;
+
+	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE)
+		goto skip_tx_cleanup;
+
+	tls_sk_proto_cleanup(sk, ctx, timeo);
 
 skip_tx_cleanup:
 	release_sock(sk);
+	if (ctx->rx_conf == TLS_SW)
+		tls_sw_release_strp_rx(ctx);
 	sk_proto_close(sk, timeout);
-	/* free ctx for TLS_HW_RECORD, used by tcp_set_state
-	 * for sk->sk_prot->unhash [tls_hw_unhash]
-	 */
-	if (free_ctx)
+
+	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW &&
+	    ctx->tx_conf != TLS_HW_RECORD && ctx->rx_conf != TLS_HW_RECORD)
 		tls_ctx_free(ctx);
 }
 
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index db585964b52b..3b01cd72ca6c 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2053,6 +2053,15 @@ static void tls_data_ready(struct sock *sk)
 	}
 }
 
+void tls_sw_cancel_work_tx(struct tls_context *tls_ctx)
+{
+	struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
+
+	set_bit(BIT_TX_CLOSING, &ctx->tx_bitmask);
+	set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask);
+	cancel_delayed_work_sync(&ctx->tx_work.work);
+}
+
 void tls_sw_free_resources_tx(struct sock *sk)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
@@ -2064,11 +2073,6 @@ void tls_sw_free_resources_tx(struct sock *sk)
 	if (atomic_read(&ctx->encrypt_pending))
 		crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
 
-	release_sock(sk);
-	cancel_delayed_work_sync(&ctx->tx_work.work);
-	lock_sock(sk);
-
-	/* Tx whatever records we can transmit and abandon the rest */
 	tls_tx_records(sk, -1);
 
 	/* Free up un-sent records in tx_list. First, free
@@ -2112,22 +2116,22 @@ void tls_sw_release_resources_rx(struct sock *sk)
 		write_lock_bh(&sk->sk_callback_lock);
 		sk->sk_data_ready = ctx->saved_data_ready;
 		write_unlock_bh(&sk->sk_callback_lock);
-		release_sock(sk);
-		strp_done(&ctx->strp);
-		lock_sock(sk);
 	}
 }
 
-void tls_sw_free_resources_rx(struct sock *sk)
+void tls_sw_release_strp_rx(struct tls_context *tls_ctx)
 {
-	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
 
-	tls_sw_release_resources_rx(sk);
-
+	strp_done(&ctx->strp);
 	kfree(ctx);
 }
 
+void tls_sw_free_resources_rx(struct sock *sk)
+{
+	tls_sw_release_resources_rx(sk);
+}
+
 /* The work handler to transmitt the encrypted records in tx_list */
 static void tx_work_handler(struct work_struct *work)
 {
@@ -2136,11 +2140,17 @@ static void tx_work_handler(struct work_struct *work)
 					       struct tx_work, work);
 	struct sock *sk = tx_work->sk;
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
+	struct tls_sw_context_tx *ctx;
 
-	if (!test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
+	if (unlikely(!tls_ctx))
+		return;
+
+	ctx = tls_sw_ctx_tx(tls_ctx);
+	if (test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask))
 		return;
 
+	if (!test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
+		return;
 	lock_sock(sk);
 	tls_tx_records(sk, -1);
 	release_sock(sk);



* [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
  2019-07-08 19:13 [bpf PATCH v2 0/6] bpf: sockmap/tls fixes John Fastabend
  2019-07-08 19:13 ` [bpf PATCH v2 1/6] tls: remove close callback sock unlock/lock and flush_sync John Fastabend
@ 2019-07-08 19:14 ` John Fastabend
  2019-07-10  2:45   ` Jakub Kicinski
  2019-07-08 19:14 ` [bpf PATCH v2 3/6] bpf: sockmap, sock_map_delete needs to use xchg John Fastabend
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: John Fastabend @ 2019-07-08 19:14 UTC (permalink / raw)
  To: jakub.kicinski, ast, daniel; +Cc: netdev, edumazet, john.fastabend, bpf

It is possible (via shutdown()) for TCP socks to go through the TCP_CLOSE
state via tcp_disconnect() without actually calling tcp_close(), which
would otherwise invoke the tls close callback. Because of this a user
could disconnect a socket and then put it in the LISTEN state, which
would break our assumption that sockets are always in ESTABLISHED state.

More directly: because close() can call unhash(), and unhash is
implemented by sockmap, a sockmap socket with TLS enabled can
incorrectly destroy the psock from unhash() and then call its close
handler again. But because the psock (the sockmap socket representation)
is already destroyed, we call the close handler in sk->sk_prot. However,
in some cases (the TLS BASE/BASE case) this will still point at the
sockmap close handler, resulting in the circular call and crash reported
by syzbot.

To fix both above issues implement the unhash() routine for TLS.

Fixes: 3c4d7559159bf ("tls: kernel TLS support")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/net/tls.h  |    6 +++++-
 net/tls/tls_main.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 0a3540a6304d..2a98d0584999 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -251,6 +251,8 @@ struct tls_context {
 	u8 tx_conf:3;
 	u8 rx_conf:3;
 
+	struct proto *sk_proto;
+
 	int (*push_pending_record)(struct sock *sk, int flags);
 	void (*sk_write_space)(struct sock *sk);
 
@@ -288,6 +290,8 @@ struct tls_context {
 
 	struct list_head list;
 	refcount_t refcount;
+
+	struct work_struct gc;
 };
 
 enum tls_offload_ctx_dir {
@@ -356,7 +360,7 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx);
 int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int tls_sw_sendpage(struct sock *sk, struct page *page,
 		    int offset, size_t size, int flags);
-void tls_sw_close(struct sock *sk, long timeout);
+void tls_sw_cancel_work_tx(struct tls_context *tls_ctx);
 void tls_sw_free_resources_tx(struct sock *sk);
 void tls_sw_free_resources_rx(struct sock *sk);
 void tls_sw_release_resources_rx(struct sock *sk);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index d63c3583d2f7..e8418456ee24 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -251,6 +251,32 @@ static void tls_write_space(struct sock *sk)
 	ctx->sk_write_space(sk);
 }
 
+static void tls_ctx_free_deferred(struct work_struct *gc)
+{
+	struct tls_context *ctx = container_of(gc, struct tls_context, gc);
+
+	if (ctx->rx_conf == TLS_SW)
+		tls_sw_release_strp_rx(ctx);
+
+	/* Ensure any remaining work items are completed. The sk will
+	 * already have lost its tls_ctx reference by the time we get
+	 * here so no xmit operation will actually be performed.
+	 */
+	tls_sw_cancel_work_tx(ctx);
+	kfree(ctx);
+}
+
+static void tls_ctx_free_wq(struct tls_context *ctx)
+{
+	if (!ctx)
+		return;
+
+	memzero_explicit(&ctx->crypto_send, sizeof(ctx->crypto_send));
+	memzero_explicit(&ctx->crypto_recv, sizeof(ctx->crypto_recv));
+	INIT_WORK(&ctx->gc, tls_ctx_free_deferred);
+	schedule_work(&ctx->gc);
+}
+
 static void tls_ctx_free(struct tls_context *ctx)
 {
 	if (!ctx)
@@ -287,6 +313,27 @@ static void tls_sk_proto_cleanup(struct sock *sk,
 #endif
 }
 
+static void tls_sk_proto_unhash(struct sock *sk)
+{
+	struct inet_connection_sock *icsk = inet_csk(sk);
+	long timeo = sock_sndtimeo(sk, 0);
+	struct tls_context *ctx;
+
+	if (unlikely(!icsk->icsk_ulp_data)) {
+		if (sk->sk_prot->unhash)
+			sk->sk_prot->unhash(sk);
+	}
+
+	ctx = tls_get_ctx(sk);
+	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
+		tls_sk_proto_cleanup(sk, ctx, timeo);
+	icsk->icsk_ulp_data = NULL;
+	tls_ctx_free_wq(ctx);
+
+	if (ctx->unhash)
+		ctx->unhash(sk);
+}
+
 static void tls_sk_proto_close(struct sock *sk, long timeout)
 {
 	struct tls_context *ctx = tls_get_ctx(sk);
@@ -305,6 +352,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE)
 		goto skip_tx_cleanup;
 
+	sk->sk_prot = ctx->sk_proto;
 	tls_sk_proto_cleanup(sk, ctx, timeo);
 
 skip_tx_cleanup:
@@ -606,6 +654,7 @@ static struct tls_context *create_ctx(struct sock *sk)
 	ctx->setsockopt = sk->sk_prot->setsockopt;
 	ctx->getsockopt = sk->sk_prot->getsockopt;
 	ctx->sk_proto_close = sk->sk_prot->close;
+	ctx->unhash = sk->sk_prot->unhash;
 	return ctx;
 }
 
@@ -729,20 +778,24 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
 	prot[TLS_BASE][TLS_BASE].setsockopt	= tls_setsockopt;
 	prot[TLS_BASE][TLS_BASE].getsockopt	= tls_getsockopt;
 	prot[TLS_BASE][TLS_BASE].close		= tls_sk_proto_close;
+	prot[TLS_BASE][TLS_BASE].unhash		= tls_sk_proto_unhash;
 
 	prot[TLS_SW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
 	prot[TLS_SW][TLS_BASE].sendmsg		= tls_sw_sendmsg;
 	prot[TLS_SW][TLS_BASE].sendpage		= tls_sw_sendpage;
+	prot[TLS_SW][TLS_BASE].unhash		= tls_sk_proto_unhash;
 
 	prot[TLS_BASE][TLS_SW] = prot[TLS_BASE][TLS_BASE];
 	prot[TLS_BASE][TLS_SW].recvmsg		  = tls_sw_recvmsg;
 	prot[TLS_BASE][TLS_SW].stream_memory_read = tls_sw_stream_read;
 	prot[TLS_BASE][TLS_SW].close		  = tls_sk_proto_close;
+	prot[TLS_BASE][TLS_SW].unhash		  = tls_sk_proto_unhash;
 
 	prot[TLS_SW][TLS_SW] = prot[TLS_SW][TLS_BASE];
 	prot[TLS_SW][TLS_SW].recvmsg		= tls_sw_recvmsg;
 	prot[TLS_SW][TLS_SW].stream_memory_read	= tls_sw_stream_read;
 	prot[TLS_SW][TLS_SW].close		= tls_sk_proto_close;
+	prot[TLS_SW][TLS_SW].unhash		= tls_sk_proto_unhash;
 
 #ifdef CONFIG_TLS_DEVICE
 	prot[TLS_HW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
@@ -793,6 +846,7 @@ static int tls_init(struct sock *sk)
 	tls_build_proto(sk);
 	ctx->tx_conf = TLS_BASE;
 	ctx->rx_conf = TLS_BASE;
+	ctx->sk_proto = sk->sk_prot;
 	update_sk_prot(sk, ctx);
 out:
 	return rc;



* [bpf PATCH v2 3/6] bpf: sockmap, sock_map_delete needs to use xchg
  2019-07-08 19:13 [bpf PATCH v2 0/6] bpf: sockmap/tls fixes John Fastabend
  2019-07-08 19:13 ` [bpf PATCH v2 1/6] tls: remove close callback sock unlock/lock and flush_sync John Fastabend
  2019-07-08 19:14 ` [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close John Fastabend
@ 2019-07-08 19:14 ` John Fastabend
  2019-07-08 19:14 ` [bpf PATCH v2 4/6] bpf: sockmap, synchronize_rcu before free'ing map John Fastabend
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: John Fastabend @ 2019-07-08 19:14 UTC (permalink / raw)
  To: jakub.kicinski, ast, daniel; +Cc: netdev, edumazet, john.fastabend, bpf

__sock_map_delete() may be called from a tcp event such as unhash or
close from the following trace,

  tcp_bpf_close()
    tcp_bpf_remove()
      sk_psock_unlink()
        sock_map_delete_from_link()
          __sock_map_delete()

In this case the sock lock is held but this only protects against
duplicate removals on the TCP side. If the map is free'd then we have
this trace,

  sock_map_free
    xchg()                  <- replaces map entry
    sock_map_unref()
      sk_psock_put()
        sock_map_del_link()

__sock_map_delete(), however, uses a non-atomic read, test, and
NULL-write sequence on the map entry, which can result in both paths
trying to free the same map entry.

To fix this, use xchg() in the TCP paths as well so we avoid having two
references to the same map entry.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/sock_map.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 52d4faeee18b..28702f2e9a4a 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -276,16 +276,20 @@ static int __sock_map_delete(struct bpf_stab *stab, struct sock *sk_test,
 			     struct sock **psk)
 {
 	struct sock *sk;
+	int err = 0;
 
 	raw_spin_lock_bh(&stab->lock);
 	sk = *psk;
 	if (!sk_test || sk_test == sk)
-		*psk = NULL;
+		sk = xchg(psk, NULL);
+
+	if (likely(sk))
+		sock_map_unref(sk, psk);
+	else
+		err = -EINVAL;
+
 	raw_spin_unlock_bh(&stab->lock);
-	if (unlikely(!sk))
-		return -EINVAL;
-	sock_map_unref(sk, psk);
-	return 0;
+	return err;
 }
 
 static void sock_map_delete_from_link(struct bpf_map *map, struct sock *sk,



* [bpf PATCH v2 4/6] bpf: sockmap, synchronize_rcu before free'ing map
  2019-07-08 19:13 [bpf PATCH v2 0/6] bpf: sockmap/tls fixes John Fastabend
                   ` (2 preceding siblings ...)
  2019-07-08 19:14 ` [bpf PATCH v2 3/6] bpf: sockmap, sock_map_delete needs to use xchg John Fastabend
@ 2019-07-08 19:14 ` John Fastabend
  2019-07-08 19:15 ` [bpf PATCH v2 5/6] bpf: sockmap, only create entry if ulp is not already enabled John Fastabend
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: John Fastabend @ 2019-07-08 19:14 UTC (permalink / raw)
  To: jakub.kicinski, ast, daniel; +Cc: netdev, edumazet, john.fastabend, bpf

We need a synchronize_rcu() before freeing the sockmap because any
outstanding psock references may still hold a pointer to the map, and
using that pointer after the free would trigger a use-after-free.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/sock_map.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 28702f2e9a4a..56bcabe7c2f2 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -247,6 +247,8 @@ static void sock_map_free(struct bpf_map *map)
 	raw_spin_unlock_bh(&stab->lock);
 	rcu_read_unlock();
 
+	synchronize_rcu();
+
 	bpf_map_area_free(stab->sks);
 	kfree(stab);
 }



* [bpf PATCH v2 5/6] bpf: sockmap, only create entry if ulp is not already enabled
  2019-07-08 19:13 [bpf PATCH v2 0/6] bpf: sockmap/tls fixes John Fastabend
                   ` (3 preceding siblings ...)
  2019-07-08 19:14 ` [bpf PATCH v2 4/6] bpf: sockmap, synchronize_rcu before free'ing map John Fastabend
@ 2019-07-08 19:15 ` John Fastabend
  2019-07-08 19:15 ` [bpf PATCH v2 6/6] bpf: sockmap/tls, close can race with map free John Fastabend
  2019-07-09  6:13 ` [bpf PATCH v2 0/6] bpf: sockmap/tls fixes Jakub Kicinski
  6 siblings, 0 replies; 27+ messages in thread
From: John Fastabend @ 2019-07-08 19:15 UTC (permalink / raw)
  To: jakub.kicinski, ast, daniel; +Cc: netdev, edumazet, john.fastabend, bpf

Sockmap does not currently support adding sockets after TLS has been
enabled. There never was a real use case for this so it was never
added. But we lost the test for a ULP at some point, so add it back
here and fail the socket insert if TLS is already enabled. Future work
could make sockmap support this use case; for now, fix up the bug here.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/sock_map.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 56bcabe7c2f2..1330a7442e5b 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -334,6 +334,7 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
 				  struct sock *sk, u64 flags)
 {
 	struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
+	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct sk_psock_link *link;
 	struct sk_psock *psock;
 	struct sock *osk;
@@ -344,6 +345,8 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
 		return -EINVAL;
 	if (unlikely(idx >= map->max_entries))
 		return -E2BIG;
+	if (unlikely(icsk->icsk_ulp_data))
+		return -EINVAL;
 
 	link = sk_psock_init_link();
 	if (!link)



* [bpf PATCH v2 6/6] bpf: sockmap/tls, close can race with map free
  2019-07-08 19:13 [bpf PATCH v2 0/6] bpf: sockmap/tls fixes John Fastabend
                   ` (4 preceding siblings ...)
  2019-07-08 19:15 ` [bpf PATCH v2 5/6] bpf: sockmap, only create entry if ulp is not already enabled John Fastabend
@ 2019-07-08 19:15 ` John Fastabend
  2019-07-10  2:36   ` Jakub Kicinski
  2019-07-10  2:38   ` Jakub Kicinski
  2019-07-09  6:13 ` [bpf PATCH v2 0/6] bpf: sockmap/tls fixes Jakub Kicinski
  6 siblings, 2 replies; 27+ messages in thread
From: John Fastabend @ 2019-07-08 19:15 UTC (permalink / raw)
  To: jakub.kicinski, ast, daniel; +Cc: netdev, edumazet, john.fastabend, bpf

When a map free is called and a socket is closed in parallel, we have
two paths that can potentially reset the socket prot ops: the bpf
close() path and the map free path. This creates ambiguity about which
prot ops should be used on the socket close side.

If the map_free side completes first then we want to call the
original lowest level ops. However, if the tls path runs first
we want to call the sockmap ops. Additionally, there was no locking
around prot updates in the TLS code paths, so the prot ops could
be changed multiple times, once from the TLS path and again from the
sockmap side, potentially leaving the ops pointing at either TLS or
sockmap when the psock and/or tls context have already been destroyed.

To fix this race, first only update the ops inside the callback lock
so that TLS, sockmap and the lowest level all agree on the prot state.
Second, add a ULP callback, update(), so that lower layers can
inform the upper layer when they are being removed, allowing the
upper layer to reset its prot ops.

This gets us close to allowing sockmap and tls to be stacked
in arbitrary order but will save that patch for *next trees.

Reported-by: syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com
Fixes: 02c558b2d5d6 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/skmsg.h |    8 +++++++-
 include/net/tcp.h     |    3 +++
 net/core/skmsg.c      |    4 ++--
 net/ipv4/tcp_ulp.c    |   13 +++++++++++++
 net/tls/tls_main.c    |   35 +++++++++++++++++++++++++++++------
 5 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 50ced8aba9db..e4b3fb4bb77c 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -354,7 +354,13 @@ static inline void sk_psock_restore_proto(struct sock *sk,
 	sk->sk_write_space = psock->saved_write_space;
 
 	if (psock->sk_proto) {
-		sk->sk_prot = psock->sk_proto;
+		struct inet_connection_sock *icsk = inet_csk(sk);
+		bool has_ulp = !!icsk->icsk_ulp_data;
+
+		if (has_ulp)
+			tcp_update_ulp(sk, psock->sk_proto);
+		else
+			sk->sk_prot = psock->sk_proto;
 		psock->sk_proto = NULL;
 	}
 }
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9d36cc88d043..123cac4c96f2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2102,6 +2102,8 @@ struct tcp_ulp_ops {
 
 	/* initialize ulp */
 	int (*init)(struct sock *sk);
+	/* update ulp */
+	void (*update)(struct sock *sk, struct proto *p);
 	/* cleanup ulp */
 	void (*release)(struct sock *sk);
 
@@ -2113,6 +2115,7 @@ void tcp_unregister_ulp(struct tcp_ulp_ops *type);
 int tcp_set_ulp(struct sock *sk, const char *name);
 void tcp_get_available_ulp(char *buf, size_t len);
 void tcp_cleanup_ulp(struct sock *sk);
+void tcp_update_ulp(struct sock *sk, struct proto *p);
 
 #define MODULE_ALIAS_TCP_ULP(name)				\
 	__MODULE_INFO(alias, alias_userspace, name);		\
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 93bffaad2135..6832eeb4b785 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -585,12 +585,12 @@ EXPORT_SYMBOL_GPL(sk_psock_destroy);
 
 void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
 {
-	rcu_assign_sk_user_data(sk, NULL);
 	sk_psock_cork_free(psock);
 	sk_psock_zap_ingress(psock);
-	sk_psock_restore_proto(sk, psock);
 
 	write_lock_bh(&sk->sk_callback_lock);
+	sk_psock_restore_proto(sk, psock);
+	rcu_assign_sk_user_data(sk, NULL);
 	if (psock->progs.skb_parser)
 		sk_psock_stop_strp(sk, psock);
 	write_unlock_bh(&sk->sk_callback_lock);
diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c
index 3d8a1d835471..4849edb62d52 100644
--- a/net/ipv4/tcp_ulp.c
+++ b/net/ipv4/tcp_ulp.c
@@ -96,6 +96,19 @@ void tcp_get_available_ulp(char *buf, size_t maxlen)
 	rcu_read_unlock();
 }
 
+void tcp_update_ulp(struct sock *sk, struct proto *proto)
+{
+	struct inet_connection_sock *icsk = inet_csk(sk);
+
+	if (!icsk->icsk_ulp_ops) {
+		sk->sk_prot = proto;
+		return;
+	}
+
+	if (icsk->icsk_ulp_ops->update)
+		icsk->icsk_ulp_ops->update(sk, proto);
+}
+
 void tcp_cleanup_ulp(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index e8418456ee24..4ba5476fbc5f 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -336,15 +336,17 @@ static void tls_sk_proto_unhash(struct sock *sk)
 
 static void tls_sk_proto_close(struct sock *sk, long timeout)
 {
+	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tls_context *ctx = tls_get_ctx(sk);
 	long timeo = sock_sndtimeo(sk, 0);
-	void (*sk_proto_close)(struct sock *sk, long timeout);
+
+	if (unlikely(!ctx))
+		return;
 
 	if (ctx->tx_conf == TLS_SW)
 		tls_sw_cancel_work_tx(ctx);
 
 	lock_sock(sk);
-	sk_proto_close = ctx->sk_proto_close;
 
 	if (ctx->tx_conf == TLS_HW_RECORD && ctx->rx_conf == TLS_HW_RECORD)
 		goto skip_tx_cleanup;
@@ -352,15 +354,18 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE)
 		goto skip_tx_cleanup;
 
-	sk->sk_prot = ctx->sk_proto;
 	tls_sk_proto_cleanup(sk, ctx, timeo);
 
 skip_tx_cleanup:
+	write_lock_bh(&sk->sk_callback_lock);
+	icsk->icsk_ulp_data = NULL;
+	if (sk->sk_prot->close == tls_sk_proto_close)
+		sk->sk_prot = ctx->sk_proto;
+	write_unlock_bh(&sk->sk_callback_lock);
 	release_sock(sk);
 	if (ctx->rx_conf == TLS_SW)
 		tls_sw_release_strp_rx(ctx);
-	sk_proto_close(sk, timeout);
-
+	ctx->sk_proto_close(sk, timeout);
 	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW &&
 	    ctx->tx_conf != TLS_HW_RECORD && ctx->rx_conf != TLS_HW_RECORD)
 		tls_ctx_free(ctx);
@@ -836,22 +841,39 @@ static int tls_init(struct sock *sk)
 	if (sk->sk_state != TCP_ESTABLISHED)
 		return -ENOTSUPP;
 
+	tls_build_proto(sk);
+
 	/* allocate tls context */
+	write_lock_bh(&sk->sk_callback_lock);
 	ctx = create_ctx(sk);
 	if (!ctx) {
 		rc = -ENOMEM;
 		goto out;
 	}
 
-	tls_build_proto(sk);
 	ctx->tx_conf = TLS_BASE;
 	ctx->rx_conf = TLS_BASE;
 	ctx->sk_proto = sk->sk_prot;
 	update_sk_prot(sk, ctx);
 out:
+	write_unlock_bh(&sk->sk_callback_lock);
 	return rc;
 }
 
+static void tls_update(struct sock *sk, struct proto *p)
+{
+	struct tls_context *ctx;
+
+	ctx = tls_get_ctx(sk);
+	if (likely(ctx)) {
+		ctx->sk_proto_close = p->close;
+		ctx->unhash = p->unhash;
+		ctx->sk_proto = p;
+	} else {
+		sk->sk_prot = p;
+	}
+}
+
 void tls_register_device(struct tls_device *device)
 {
 	spin_lock_bh(&device_spinlock);
@@ -872,6 +894,7 @@ static struct tcp_ulp_ops tcp_tls_ulp_ops __read_mostly = {
 	.name			= "tls",
 	.owner			= THIS_MODULE,
 	.init			= tls_init,
+	.update			= tls_update,
 };
 
 static int __init tls_register(void)



* Re: [bpf PATCH v2 0/6] bpf: sockmap/tls fixes
  2019-07-08 19:13 [bpf PATCH v2 0/6] bpf: sockmap/tls fixes John Fastabend
                   ` (5 preceding siblings ...)
  2019-07-08 19:15 ` [bpf PATCH v2 6/6] bpf: sockmap/tls, close can race with map free John Fastabend
@ 2019-07-09  6:13 ` Jakub Kicinski
  2019-07-09 15:40   ` John Fastabend
  6 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-07-09  6:13 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

On Mon, 08 Jul 2019 19:13:29 +0000, John Fastabend wrote:
> Resolve a series of splats discovered by syzbot and an unhash
> TLS issue noted by Eric Dumazet.
> 
> The main issues revolved around interaction between TLS and
> sockmap tear down. TLS and sockmap could both reset sk->prot
> ops creating a condition where a close or unhash op could be
> called forever. A rare race condition resulting from a missing
> rcu sync operation was causing a use after free. Then on the
> TLS side dropping the sock lock and re-acquiring it during the
> close op could hang. Finally, sockmap must be deployed before
> tls for current stack assumptions to be met. This is enforced
> now. A feature series can enable it.
> 
> To fix this first refactor TLS code so the lock is held for the
> entire teardown operation. Then add an unhash callback to ensure
> TLS can not transition from ESTABLISHED to LISTEN state. This
> transition is a similar bug to the one found and fixed previously
> in sockmap. Then apply three fixes to sockmap to fix up races
> on tear down around map free and close. Finally, if sockmap
> is destroyed before TLS we add a new ULP op update to inform
> the TLS stack it should not call sockmap ops. This last one
> appears to be the most commonly found issue from syzbot.

Looks like strparser is not done'd for offload?

About patch 6 - I was recently wondering about the "impossible" syzbot
report where context is not freed and my conclusion was that there
can be someone sitting at lock_sock() in tcp_close() already by the
time we start installing the ULP, so TLS's close will never get called.
The entire replacing of callbacks business is really shaky :(

Perhaps I'm rambling, I will take a close look after I get some sleep :)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 0/6] bpf: sockmap/tls fixes
  2019-07-09  6:13 ` [bpf PATCH v2 0/6] bpf: sockmap/tls fixes Jakub Kicinski
@ 2019-07-09 15:40   ` John Fastabend
  2019-07-10  0:04     ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: John Fastabend @ 2019-07-09 15:40 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

Jakub Kicinski wrote:
> On Mon, 08 Jul 2019 19:13:29 +0000, John Fastabend wrote:
> > Resolve a series of splats discovered by syzbot and an unhash
> > TLS issue noted by Eric Dumazet.
> > 
> > The main issues revolved around interaction between TLS and
> > sockmap tear down. TLS and sockmap could both reset sk->prot
> > ops creating a condition where a close or unhash op could be
> > called forever. A rare race condition resulting from a missing
> > rcu sync operation was causing a use after free. Then on the
> > TLS side dropping the sock lock and re-acquiring it during the
> > close op could hang. Finally, sockmap must be deployed before
> > tls for current stack assumptions to be met. This is enforced
> > now. A feature series can enable it.
> > 
> > To fix this first refactor TLS code so the lock is held for the
> > entire teardown operation. Then add an unhash callback to ensure
> > TLS can not transition from ESTABLISHED to LISTEN state. This
> > transition is a similar bug to the one found and fixed previously
> > in sockmap. Then apply three fixes to sockmap to fix up races
> > on tear down around map free and close. Finally, if sockmap
> > is destroyed before TLS we add a new ULP op update to inform
> > the TLS stack it should not call sockmap ops. This last one
> > appears to be the most commonly found issue from syzbot.
> 
> Looks like strparser is not done'd for offload?

Right so if rx_conf != TLS_SW then the hardware needs to do
the strparser functionality.

> 
> About patch 6 - I was recently wondering about the "impossible" syzbot
> report where context is not freed and my conclusion was that there
> can be someone sitting at lock_sock() in tcp_close() already by the
> time we start installing the ULP, so TLS's close will never get called.
> The entire replacing of callbacks business is really shaky :(

Well, replacing callbacks is the ULP model. The race we are fixing in
patch 6 is sockmap being free'd, which removes the psock and resets the
proto ops, racing with the tcp_close() path.

I don't think there is another race like you describe because tcp_set_ulp
is called from do_tcp_setsockopt, which holds the lock, and the tcp state
is checked to ensure it's ESTABLISHED. A closing sock won't be in
ESTABLISHED state so any setup will be aborted. Before patch 1 though I
definitely saw this race because we dropped the lock mid-close.
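The guard described here (the socket lock held across both the state check and the ULP install) can be sketched in plain C. This is a userspace toy model, not kernel code: toy_sock, toy_set_ulp, and the TOY_* states are illustrative stand-ins for struct sock, tcp_set_ulp(), and the TCP states, and a pthread mutex stands in for lock_sock().

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

enum toy_state { TOY_TCP_CLOSE, TOY_TCP_ESTABLISHED };

struct toy_sock {
	pthread_mutex_t lock;	/* stands in for lock_sock()/release_sock() */
	enum toy_state state;
	void *ulp_data;		/* stands in for icsk->icsk_ulp_data */
};

/* Models the guard: the lock is held for the state check and the ULP
 * install together, so a socket already past ESTABLISHED can never
 * have a ULP context installed under it. */
int toy_set_ulp(struct toy_sock *sk, void *ulp_ctx)
{
	int err = 0;

	pthread_mutex_lock(&sk->lock);
	if (sk->state != TOY_TCP_ESTABLISHED)
		err = -EOPNOTSUPP;	/* setup aborted, cf. tls_init() */
	else
		sk->ulp_data = ulp_ctx;
	pthread_mutex_unlock(&sk->lock);
	return err;
}
```

Because a closing socket has already left ESTABLISHED under the same lock, the two branches can never interleave.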

With this series I've been running those syzbot programs overnight
without issue on 4 cores. Also selftests pass in ./net/tls and ./bpf/
so I think it's stable and resolves many of the issues syzbot has been
stomping on.

> 
> Perhaps I'm rambling, I will take a close look after I get some sleep :)

Yes please do ;)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 0/6] bpf: sockmap/tls fixes
  2019-07-09 15:40   ` John Fastabend
@ 2019-07-10  0:04     ` Jakub Kicinski
  2019-07-10  2:21       ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-07-10  0:04 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

On Tue, 09 Jul 2019 08:40:14 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > Looks like strparser is not done'd for offload?  
> 
> Right so if rx_conf != TLS_SW then the hardware needs to do
> the strparser functionality.

Can I just take a stab at fixing the HW part?

Can I rebase this onto net-next?  There are a few patches from net
missing in the bpf tree.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 0/6] bpf: sockmap/tls fixes
  2019-07-10  0:04     ` Jakub Kicinski
@ 2019-07-10  2:21       ` Jakub Kicinski
  2019-07-10  3:28         ` John Fastabend
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-07-10  2:21 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

On Tue, 9 Jul 2019 17:04:59 -0700, Jakub Kicinski wrote:
> On Tue, 09 Jul 2019 08:40:14 -0700, John Fastabend wrote:
> > Jakub Kicinski wrote:  
> > > Looks like strparser is not done'd for offload?    
> > 
> > Right so if rx_conf != TLS_SW then the hardware needs to do
> > the strparser functionality.  
> 
> Can I just take a stab at fixing the HW part?
> 
> Can I rebase this onto net-next?  There are a few patches from net
> missing in the bpf tree.

I think I fixed patch 1 for offload, I need to test it a little more
and I'll send it back to you. In the meantime, let me ask some
questions about the other two :)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 6/6] bpf: sockmap/tls, close can race with map free
  2019-07-08 19:15 ` [bpf PATCH v2 6/6] bpf: sockmap/tls, close can race with map free John Fastabend
@ 2019-07-10  2:36   ` Jakub Kicinski
  2019-07-10  2:38   ` Jakub Kicinski
  1 sibling, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-07-10  2:36 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

On Mon, 08 Jul 2019 19:15:18 +0000, John Fastabend wrote:
> @@ -836,22 +841,39 @@ static int tls_init(struct sock *sk)

There is a goto out above this which has to be turned into return 0;
if out now releases the lock.

>  	if (sk->sk_state != TCP_ESTABLISHED)
>  		return -ENOTSUPP;
>  
> +	tls_build_proto(sk);
> +
>  	/* allocate tls context */
> +	write_lock_bh(&sk->sk_callback_lock);
>  	ctx = create_ctx(sk);
>  	if (!ctx) {
>  		rc = -ENOMEM;
>  		goto out;
>  	}
>  
> -	tls_build_proto(sk);
>  	ctx->tx_conf = TLS_BASE;
>  	ctx->rx_conf = TLS_BASE;
>  	ctx->sk_proto = sk->sk_prot;
>  	update_sk_prot(sk, ctx);
>  out:
> +	write_unlock_bh(&sk->sk_callback_lock);
>  	return rc;
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 6/6] bpf: sockmap/tls, close can race with map free
  2019-07-08 19:15 ` [bpf PATCH v2 6/6] bpf: sockmap/tls, close can race with map free John Fastabend
  2019-07-10  2:36   ` Jakub Kicinski
@ 2019-07-10  2:38   ` Jakub Kicinski
  2019-07-10  3:33     ` John Fastabend
  1 sibling, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-07-10  2:38 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

On Mon, 08 Jul 2019 19:15:18 +0000, John Fastabend wrote:
> @@ -352,15 +354,18 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
>  	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE)
>  		goto skip_tx_cleanup;
>  
> -	sk->sk_prot = ctx->sk_proto;
>  	tls_sk_proto_cleanup(sk, ctx, timeo);
>  
>  skip_tx_cleanup:
> +	write_lock_bh(&sk->sk_callback_lock);
> +	icsk->icsk_ulp_data = NULL;

Is ulp_data pointer now supposed to be updated under the
sk_callback_lock?

> +	if (sk->sk_prot->close == tls_sk_proto_close)
> +		sk->sk_prot = ctx->sk_proto;
> +	write_unlock_bh(&sk->sk_callback_lock);
>  	release_sock(sk);
>  	if (ctx->rx_conf == TLS_SW)
>  		tls_sw_release_strp_rx(ctx);
> -	sk_proto_close(sk, timeout);
> -
> +	ctx->sk_proto_close(sk, timeout);
>  	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW &&
>  	    ctx->tx_conf != TLS_HW_RECORD && ctx->rx_conf != TLS_HW_RECORD)
>  		tls_ctx_free(ctx);


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
  2019-07-08 19:14 ` [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close John Fastabend
@ 2019-07-10  2:45   ` Jakub Kicinski
  2019-07-10  3:39     ` John Fastabend
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-07-10  2:45 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

On Mon, 08 Jul 2019 19:14:05 +0000, John Fastabend wrote:
> @@ -287,6 +313,27 @@ static void tls_sk_proto_cleanup(struct sock *sk,
>  #endif
>  }
>  
> +static void tls_sk_proto_unhash(struct sock *sk)
> +{
> +	struct inet_connection_sock *icsk = inet_csk(sk);
> +	long timeo = sock_sndtimeo(sk, 0);
> +	struct tls_context *ctx;
> +
> +	if (unlikely(!icsk->icsk_ulp_data)) {

Is this for when sockmap is stacked on top of TLS and TLS got removed
without letting sockmap know?

> +		if (sk->sk_prot->unhash)
> +			sk->sk_prot->unhash(sk);
> +	}
> +
> +	ctx = tls_get_ctx(sk);
> +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> +		tls_sk_proto_cleanup(sk, ctx, timeo);
> +	icsk->icsk_ulp_data = NULL;

I think close only starts checking if ctx is NULL in patch 6.
Looks like some chunks of ctx checking/clearing got spread to
patch 1 and some to patch 6.

> +	tls_ctx_free_wq(ctx);
> +
> +	if (ctx->unhash)
> +		ctx->unhash(sk);
> +}
> +
>  static void tls_sk_proto_close(struct sock *sk, long timeout)
>  {
>  	struct tls_context *ctx = tls_get_ctx(sk);


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 0/6] bpf: sockmap/tls fixes
  2019-07-10  2:21       ` Jakub Kicinski
@ 2019-07-10  3:28         ` John Fastabend
  0 siblings, 0 replies; 27+ messages in thread
From: John Fastabend @ 2019-07-10  3:28 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

Jakub Kicinski wrote:
> On Tue, 9 Jul 2019 17:04:59 -0700, Jakub Kicinski wrote:
> > On Tue, 09 Jul 2019 08:40:14 -0700, John Fastabend wrote:
> > > Jakub Kicinski wrote:  
> > > > Looks like strparser is not done'd for offload?    
> > > 
> > > Right so if rx_conf != TLS_SW then the hardware needs to do
> > > the strparser functionality.  
> > 
> > Can I just take a stab at fixing the HW part?
> > 
> > Can I rebase this onto net-next?  There are a few patches from net
> > missing in the bpf tree.
> 
> I think I fixed patch 1 for offload, I need to test it a little more
> and I'll send it back to you. In the meantime, let me ask some
> questions about the other two :)

Great, thanks. When you're ready push it back and I'll retest in
my setup.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 6/6] bpf: sockmap/tls, close can race with map free
  2019-07-10  2:38   ` Jakub Kicinski
@ 2019-07-10  3:33     ` John Fastabend
  2019-07-10 19:35       ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: John Fastabend @ 2019-07-10  3:33 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

Jakub Kicinski wrote:
> On Mon, 08 Jul 2019 19:15:18 +0000, John Fastabend wrote:
> > @@ -352,15 +354,18 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
> >  	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE)
> >  		goto skip_tx_cleanup;
> >  
> > -	sk->sk_prot = ctx->sk_proto;
> >  	tls_sk_proto_cleanup(sk, ctx, timeo);
> >  
> >  skip_tx_cleanup:
> > +	write_lock_bh(&sk->sk_callback_lock);
> > +	icsk->icsk_ulp_data = NULL;
> 
> Is ulp_data pointer now supposed to be updated under the
> sk_callback_lock?

Yes, otherwise it can race with tls_update(). I didn't remove the
ulp pointer NULL set from tcp_ulp.c though. Could be done in this
patch or as a follow-up.
> 
> > +	if (sk->sk_prot->close == tls_sk_proto_close)
> > +		sk->sk_prot = ctx->sk_proto;
> > +	write_unlock_bh(&sk->sk_callback_lock);
> >  	release_sock(sk);
> >  	if (ctx->rx_conf == TLS_SW)
> >  		tls_sw_release_strp_rx(ctx);
> > -	sk_proto_close(sk, timeout);
> > -
> > +	ctx->sk_proto_close(sk, timeout);
> >  	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW &&
> >  	    ctx->tx_conf != TLS_HW_RECORD && ctx->rx_conf != TLS_HW_RECORD)
> >  		tls_ctx_free(ctx);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
  2019-07-10  2:45   ` Jakub Kicinski
@ 2019-07-10  3:39     ` John Fastabend
  2019-07-10 19:34       ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: John Fastabend @ 2019-07-10  3:39 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

Jakub Kicinski wrote:
> On Mon, 08 Jul 2019 19:14:05 +0000, John Fastabend wrote:
> > @@ -287,6 +313,27 @@ static void tls_sk_proto_cleanup(struct sock *sk,
> >  #endif
> >  }
> >  
> > +static void tls_sk_proto_unhash(struct sock *sk)
> > +{
> > +	struct inet_connection_sock *icsk = inet_csk(sk);
> > +	long timeo = sock_sndtimeo(sk, 0);
> > +	struct tls_context *ctx;
> > +
> > +	if (unlikely(!icsk->icsk_ulp_data)) {
> 
> Is this for when sockmap is stacked on top of TLS and TLS got removed
> without letting sockmap know?

Right, it's a pattern I used on the sockmap side and put here. But
I dropped the patch to let sockmap stack on top of TLS because
it was more than a fix IMO. We could probably drop this check; on
the other hand it's harmless.
> 
> > +		if (sk->sk_prot->unhash)
> > +			sk->sk_prot->unhash(sk);
> > +	}
> > +
> > +	ctx = tls_get_ctx(sk);
> > +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> > +		tls_sk_proto_cleanup(sk, ctx, timeo);
> > +	icsk->icsk_ulp_data = NULL;
> 
> I think close only starts checking if ctx is NULL in patch 6.
> Looks like some chunks of ctx checking/clearing got spread to
> patch 1 and some to patch 6.

Yeah, I thought the patches were easier to read this way but
maybe not. Could add something in the commit log.

> 
> > +	tls_ctx_free_wq(ctx);
> > +
> > +	if (ctx->unhash)
> > +		ctx->unhash(sk);
> > +}
> > +
> >  static void tls_sk_proto_close(struct sock *sk, long timeout)
> >  {
> >  	struct tls_context *ctx = tls_get_ctx(sk);
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
  2019-07-10  3:39     ` John Fastabend
@ 2019-07-10 19:34       ` Jakub Kicinski
  2019-07-10 20:04         ` Jakub Kicinski
  2019-07-11 16:35         ` John Fastabend
  0 siblings, 2 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-07-10 19:34 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

On Tue, 09 Jul 2019 20:39:24 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Mon, 08 Jul 2019 19:14:05 +0000, John Fastabend wrote:  
> > > @@ -287,6 +313,27 @@ static void tls_sk_proto_cleanup(struct sock *sk,
> > >  #endif
> > >  }
> > >  
> > > +static void tls_sk_proto_unhash(struct sock *sk)
> > > +{
> > > +	struct inet_connection_sock *icsk = inet_csk(sk);
> > > +	long timeo = sock_sndtimeo(sk, 0);
> > > +	struct tls_context *ctx;
> > > +
> > > +	if (unlikely(!icsk->icsk_ulp_data)) {  
> > 
> > Is this for when sockmap is stacked on top of TLS and TLS got removed
> > without letting sockmap know?  
> 
> > Right, it's a pattern I used on the sockmap side and put here. But
> > I dropped the patch to let sockmap stack on top of TLS because
> > it was more than a fix IMO. We could probably drop this check; on
> > the other hand it's harmless.

I feel like this code is pretty complex and I struggle to follow all the
paths, so perhaps it'd be better to drop stuff that's not necessary
to have a clearer picture.

> > > +		if (sk->sk_prot->unhash)
> > > +			sk->sk_prot->unhash(sk);
> > > +	}
> > > +
> > > +	ctx = tls_get_ctx(sk);
> > > +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> > > +		tls_sk_proto_cleanup(sk, ctx, timeo);
> > > +	icsk->icsk_ulp_data = NULL;  
> > 
> > I think close only starts checking if ctx is NULL in patch 6.
> > Looks like some chunks of ctx checking/clearing got spread to
> > patch 1 and some to patch 6.  
> 
> Yeah, I thought the patches were easier to read this way but
> maybe not. Could add something in the commit log.

Ack! Let me try to get a full grip of patches 2 and 6 and come back 
to this.

> > > +	tls_ctx_free_wq(ctx);
> > > +
> > > +	if (ctx->unhash)
> > > +		ctx->unhash(sk);
> > > +}
> > > +
> > >  static void tls_sk_proto_close(struct sock *sk, long timeout)
> > >  {
> > >  	struct tls_context *ctx = tls_get_ctx(sk);  

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 6/6] bpf: sockmap/tls, close can race with map free
  2019-07-10  3:33     ` John Fastabend
@ 2019-07-10 19:35       ` Jakub Kicinski
  2019-07-11 16:39         ` John Fastabend
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-07-10 19:35 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

On Tue, 09 Jul 2019 20:33:58 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Mon, 08 Jul 2019 19:15:18 +0000, John Fastabend wrote:  
> > > @@ -352,15 +354,18 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
> > >  	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE)
> > >  		goto skip_tx_cleanup;
> > >  
> > > -	sk->sk_prot = ctx->sk_proto;
> > >  	tls_sk_proto_cleanup(sk, ctx, timeo);
> > >  
> > >  skip_tx_cleanup:
> > > +	write_lock_bh(&sk->sk_callback_lock);
> > > +	icsk->icsk_ulp_data = NULL;  
> > 
> > Is ulp_data pointer now supposed to be updated under the
> > sk_callback_lock?  
> 
> Yes otherwise it can race with tls_update(). I didn't remove the
> ulp pointer null set from tcp_ulp.c though. Could be done in this
> patch or as a follow up.

Do we need to hold the lock in unhash, too, or is unhash called with
sk_callback_lock held?

> > > +	if (sk->sk_prot->close == tls_sk_proto_close)
> > > +		sk->sk_prot = ctx->sk_proto;
> > > +	write_unlock_bh(&sk->sk_callback_lock);
> > >  	release_sock(sk);
> > >  	if (ctx->rx_conf == TLS_SW)
> > >  		tls_sw_release_strp_rx(ctx);
> > > -	sk_proto_close(sk, timeout);
> > > -
> > > +	ctx->sk_proto_close(sk, timeout);
> > >  	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW &&
> > >  	    ctx->tx_conf != TLS_HW_RECORD && ctx->rx_conf != TLS_HW_RECORD)
> > >  		tls_ctx_free(ctx);  


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
  2019-07-10 19:34       ` Jakub Kicinski
@ 2019-07-10 20:04         ` Jakub Kicinski
  2019-07-11 16:47           ` John Fastabend
  2019-07-11 16:35         ` John Fastabend
  1 sibling, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-07-10 20:04 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

On Wed, 10 Jul 2019 12:34:17 -0700, Jakub Kicinski wrote:
> > > > +		if (sk->sk_prot->unhash)
> > > > +			sk->sk_prot->unhash(sk);
> > > > +	}
> > > > +
> > > > +	ctx = tls_get_ctx(sk);
> > > > +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> > > > +		tls_sk_proto_cleanup(sk, ctx, timeo);

Do we still need to hook into unhash? With patch 6 in place perhaps we
can just do disconnect 🥺

cleanup is going to kick off TX but also:

	if (unlikely(sk->sk_write_pending) &&
	    !wait_on_pending_writer(sk, &timeo))
		tls_handle_open_record(sk, 0);

Are we guaranteed that sk_write_pending is 0?  Otherwise
wait_on_pending_writer is hiding yet another release_sock() :(
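The hidden lock drop can be made concrete with a single-threaded sketch (our names, modeled loosely on the kernel's behavior, not actual kernel code): sk_wait_event() must release the socket lock while sleeping, so any caller of wait_on_pending_writer() implicitly opens a window in which another thread can take the lock and mutate the socket.

```c
#include <stddef.h>

/* Toy model of why wait_on_pending_writer() is dangerous on the close
 * path: its sk_wait_event() analogue drops and re-takes the socket
 * lock, so any invariant the caller established under the lock can be
 * broken by another thread in between. */
struct toy_sock {
	int lock_held;		/* 1 while the "socket lock" is held */
	int lock_was_dropped;	/* records that the lock was released mid-call */
	int write_pending;	/* stands in for sk->sk_write_pending */
};

/* sk_wait_event() analogue: must release the lock while sleeping */
void toy_sk_wait_event(struct toy_sock *sk)
{
	sk->lock_held = 0;		/* release_sock() */
	sk->lock_was_dropped = 1;	/* window where others can run */
	sk->write_pending = 0;		/* pretend the writer finished */
	sk->lock_held = 1;		/* lock_sock() again */
}

/* wait_on_pending_writer() analogue: hides the lock drop above */
void toy_wait_on_pending_writer(struct toy_sock *sk)
{
	while (sk->write_pending)
		toy_sk_wait_event(sk);
}
```

The caller sees the lock held on entry and on return, so the drop is invisible at the call site unless sk_write_pending is guaranteed to be 0.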

> > > > +	icsk->icsk_ulp_data = NULL;    
> > > 
> > > I think close only starts checking if ctx is NULL in patch 6.
> > > Looks like some chunks of ctx checking/clearing got spread to
> > > patch 1 and some to patch 6.    
> > 
> > Yeah, I thought the patches were easier to read this way but
> > maybe not. Could add something in the commit log.  
> 
> Ack! Let me try to get a full grip of patches 2 and 6 and come back 
> to this.
> 
> > > > +	tls_ctx_free_wq(ctx);
> > > > +
> > > > +	if (ctx->unhash)
> > > > +		ctx->unhash(sk);
> > > > +}
> > > > +
> > > >  static void tls_sk_proto_close(struct sock *sk, long timeout)
> > > >  {
> > > >  	struct tls_context *ctx = tls_get_ctx(sk);    


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
  2019-07-10 19:34       ` Jakub Kicinski
  2019-07-10 20:04         ` Jakub Kicinski
@ 2019-07-11 16:35         ` John Fastabend
  1 sibling, 0 replies; 27+ messages in thread
From: John Fastabend @ 2019-07-11 16:35 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

Jakub Kicinski wrote:
> On Tue, 09 Jul 2019 20:39:24 -0700, John Fastabend wrote:
> > Jakub Kicinski wrote:
> > > On Mon, 08 Jul 2019 19:14:05 +0000, John Fastabend wrote:  
> > > > @@ -287,6 +313,27 @@ static void tls_sk_proto_cleanup(struct sock *sk,
> > > >  #endif
> > > >  }
> > > >  
> > > > +static void tls_sk_proto_unhash(struct sock *sk)
> > > > +{
> > > > +	struct inet_connection_sock *icsk = inet_csk(sk);
> > > > +	long timeo = sock_sndtimeo(sk, 0);
> > > > +	struct tls_context *ctx;
> > > > +
> > > > +	if (unlikely(!icsk->icsk_ulp_data)) {  
> > > 
> > > Is this for when sockmap is stacked on top of TLS and TLS got removed
> > > without letting sockmap know?  
> > 
> > Right, it's a pattern I used on the sockmap side and put here. But
> > I dropped the patch to let sockmap stack on top of TLS because
> > it was more than a fix IMO. We could probably drop this check; on
> > the other hand it's harmless.
> 
> I feel like this code is pretty complex and I struggle to follow all the
> paths, so perhaps it'd be better to drop stuff that's not necessary
> to have a clearer picture.
> 

Sure, I can drop it and add it later when it's necessary.

> > > > +		if (sk->sk_prot->unhash)
> > > > +			sk->sk_prot->unhash(sk);
> > > > +	}
> > > > +
> > > > +	ctx = tls_get_ctx(sk);
> > > > +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> > > > +		tls_sk_proto_cleanup(sk, ctx, timeo);
> > > > +	icsk->icsk_ulp_data = NULL;  
> > > 
> > > I think close only starts checking if ctx is NULL in patch 6.
> > > Looks like some chunks of ctx checking/clearing got spread to
> > > patch 1 and some to patch 6.  
> > 
> > Yeah, I thought the patches were easier to read this way but
> > maybe not. Could add something in the commit log.
> 
> Ack! Let me try to get a full grip of patches 2 and 6 and come back 
> to this.
> 
> > > > +	tls_ctx_free_wq(ctx);
> > > > +
> > > > +	if (ctx->unhash)
> > > > +		ctx->unhash(sk);
> > > > +}
> > > > +
> > > >  static void tls_sk_proto_close(struct sock *sk, long timeout)
> > > >  {
> > > >  	struct tls_context *ctx = tls_get_ctx(sk);  

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 6/6] bpf: sockmap/tls, close can race with map free
  2019-07-10 19:35       ` Jakub Kicinski
@ 2019-07-11 16:39         ` John Fastabend
  0 siblings, 0 replies; 27+ messages in thread
From: John Fastabend @ 2019-07-11 16:39 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

Jakub Kicinski wrote:
> On Tue, 09 Jul 2019 20:33:58 -0700, John Fastabend wrote:
> > Jakub Kicinski wrote:
> > > On Mon, 08 Jul 2019 19:15:18 +0000, John Fastabend wrote:  
> > > > @@ -352,15 +354,18 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
> > > >  	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE)
> > > >  		goto skip_tx_cleanup;
> > > >  
> > > > -	sk->sk_prot = ctx->sk_proto;
> > > >  	tls_sk_proto_cleanup(sk, ctx, timeo);
> > > >  
> > > >  skip_tx_cleanup:
> > > > +	write_lock_bh(&sk->sk_callback_lock);
> > > > +	icsk->icsk_ulp_data = NULL;  
> > > 
> > > Is ulp_data pointer now supposed to be updated under the
> > > sk_callback_lock?  
> > 
> > Yes otherwise it can race with tls_update(). I didn't remove the
> > ulp pointer null set from tcp_ulp.c though. Could be done in this
> > patch or as a follow up.
> 
> Do we need to hold the lock in unhash, too, or is unhash called with
> sk_callback_lock held?
> 

We should hold the lock here. Also we should reset sk_prot, similar to
the other paths, in case we get here without a close() call. syzbot hasn't
found that path yet but I'll add some tests for it.

	write_lock_bh(...)
	icsk_ulp_data = NULL
	sk->sk_prot = ctx->sk_proto;
	write_unlock_bh(...)

Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
  2019-07-10 20:04         ` Jakub Kicinski
@ 2019-07-11 16:47           ` John Fastabend
  2019-07-11 18:32             ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: John Fastabend @ 2019-07-11 16:47 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

Jakub Kicinski wrote:
> On Wed, 10 Jul 2019 12:34:17 -0700, Jakub Kicinski wrote:
> > > > > +		if (sk->sk_prot->unhash)
> > > > > +			sk->sk_prot->unhash(sk);
> > > > > +	}
> > > > > +
> > > > > +	ctx = tls_get_ctx(sk);
> > > > > +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> > > > > +		tls_sk_proto_cleanup(sk, ctx, timeo);
> 
> Do we still need to hook into unhash? With patch 6 in place perhaps we
> can just do disconnect 🥺

?? "can just do a disconnect", not sure I folow. We still need unhash
in cases where we have a TLS socket transition from ESTABLISHED
to LISTEN state without calling close(). This is independent of if
sockmap is running or not.

Originally, I thought this would be extremely rare but I did see it
in real applications on the sockmap side so presumably it is possible
here as well.
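A toy model of why hooking unhash matters; this is our illustration, not kernel code: the ULP saves the original proto ops at setup, and its unhash must restore them and chain to the saved unhash so a socket that transitions back to LISTEN without close() ends up with sane callbacks.

```c
#include <stddef.h>

struct toy_proto;

struct toy_sock {
	const struct toy_proto *proto;	/* stands in for sk->sk_prot */
	void *ulp_data;			/* stands in for icsk->icsk_ulp_data */
};

struct toy_proto {
	void (*unhash)(struct toy_sock *sk);
};

/* base proto's unhash, playing the role of tcp_prot.unhash */
int base_unhash_calls;
void base_unhash(struct toy_sock *sk)
{
	(void)sk;
	base_unhash_calls++;
}
const struct toy_proto base_proto = { base_unhash };

struct toy_tls_ctx {
	const struct toy_proto *saved_proto;	/* sk_prot saved at ULP setup */
};

/* tls_sk_proto_unhash() analogue: drop the ULP context, restore the
 * saved proto ops, then chain to the original unhash so the socket
 * can legally re-enter LISTEN without ever calling close(). */
void toy_tls_unhash(struct toy_sock *sk)
{
	struct toy_tls_ctx *ctx = sk->ulp_data;

	sk->ulp_data = NULL;
	sk->proto = ctx->saved_proto;
	if (ctx->saved_proto->unhash)
		ctx->saved_proto->unhash(sk);
}
const struct toy_proto toy_tls_proto = { toy_tls_unhash };
```

If the ULP never hooks unhash, the disconnect path runs with the TLS callbacks still installed and the context still attached, which is the ESTABLISHED-to-LISTEN bug class described above.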

> 
> cleanup is going to kick off TX but also:
> 
> 	if (unlikely(sk->sk_write_pending) &&
> 	    !wait_on_pending_writer(sk, &timeo))
> 		tls_handle_open_record(sk, 0);
> 
> Are we guaranteed that sk_write_pending is 0?  Otherwise
> wait_on_pending_writer is hiding yet another release_sock() :(

Not seeing the path to release_sock() at the moment?

   tls_handle_open_record
     push_pending_record
      tls_sw_push_pending_record
        bpf_exec_tx_verdict

If bpf_exec_tx_verdict does a redirect we could hit a release but that
is another fix I have to get queued up shortly. I think we can fix
that in another series.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
  2019-07-11 16:47           ` John Fastabend
@ 2019-07-11 18:32             ` Jakub Kicinski
  2019-07-11 21:25               ` John Fastabend
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-07-11 18:32 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

On Thu, 11 Jul 2019 09:47:16 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Wed, 10 Jul 2019 12:34:17 -0700, Jakub Kicinski wrote:  
> > > > > > +		if (sk->sk_prot->unhash)
> > > > > > +			sk->sk_prot->unhash(sk);
> > > > > > +	}
> > > > > > +
> > > > > > +	ctx = tls_get_ctx(sk);
> > > > > > +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> > > > > > +		tls_sk_proto_cleanup(sk, ctx, timeo);  
> > 
> > Do we still need to hook into unhash? With patch 6 in place perhaps we
> > can just do disconnect 🥺  
> 
> ?? "can just do a disconnect", not sure I folow. We still need unhash
> in cases where we have a TLS socket transition from ESTABLISHED
> to LISTEN state without calling close(). This is independent of if
> sockmap is running or not.
> 
> Originally, I thought this would be extremely rare but I did see it
> in real applications on the sockmap side so presumably it is possible
> here as well.

Ugh, sorry, I meant shutdown. Instead of replacing the unhash callback
replace the shutdown callback. We probably shouldn't release the socket
lock there either, but we can sleep, so I'll be able to run the device
connection remove callback (which sleeps).

> > cleanup is going to kick off TX but also:
> > 
> > 	if (unlikely(sk->sk_write_pending) &&
> > 	    !wait_on_pending_writer(sk, &timeo))
> > 		tls_handle_open_record(sk, 0);
> > 
> > Are we guaranteed that sk_write_pending is 0?  Otherwise
> > wait_on_pending_writer is hiding yet another release_sock() :(  
> 
> Not seeing the path to release_sock() at the moment?
> 
>    tls_handle_open_record
>      push_pending_record
>       tls_sw_push_pending_record
>         bpf_exec_tx_verdict

wait_on_pending_writer
  sk_wait_event
    release_sock

> If bpf_exec_tx_verdict does a redirect we could hit a release but that
> is another fix I have to get queued up shortly. I think we can fix
> that in another series.

Ugh.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
  2019-07-11 18:32             ` Jakub Kicinski
@ 2019-07-11 21:25               ` John Fastabend
  2019-07-12  3:16                 ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: John Fastabend @ 2019-07-11 21:25 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

Jakub Kicinski wrote:
> On Thu, 11 Jul 2019 09:47:16 -0700, John Fastabend wrote:
> > Jakub Kicinski wrote:
> > > On Wed, 10 Jul 2019 12:34:17 -0700, Jakub Kicinski wrote:  
> > > > > > > +		if (sk->sk_prot->unhash)
> > > > > > > +			sk->sk_prot->unhash(sk);
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	ctx = tls_get_ctx(sk);
> > > > > > > +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> > > > > > > +		tls_sk_proto_cleanup(sk, ctx, timeo);  
> > > 
> > > Do we still need to hook into unhash? With patch 6 in place perhaps we
> > > can just do disconnect 🥺  
> > 
> > ?? "can just do a disconnect", not sure I folow. We still need unhash
> > in cases where we have a TLS socket transition from ESTABLISHED
> > to LISTEN state without calling close(). This is independent of if
> > sockmap is running or not.
> > 
> > Originally, I thought this would be extremely rare but I did see it
> > in real applications on the sockmap side so presumably it is possible
> > here as well.
> 
> Ugh, sorry, I meant shutdown. Instead of replacing the unhash callback
> replace the shutdown callback. We probably shouldn't release the socket
> lock either there, but we can sleep, so I'll be able to run the device
> connection remove callback (which sleeps).
> 

ah OK, seems doable to me. Do you want to write that on top of this
series? Or would you like to push it onto your branch and I can pull
it in, push the rest of the patches on top, and send it out? I think
if you can get to it in the next few days then it makes sense to wait.

I can't test the hardware side so probably makes more sense for
you to do it if you can.


> > > cleanup is going to kick off TX but also:
> > > 
> > > 	if (unlikely(sk->sk_write_pending) &&
> > > 	    !wait_on_pending_writer(sk, &timeo))
> > > 		tls_handle_open_record(sk, 0);
> > > 
> > > Are we guaranteed that sk_write_pending is 0?  Otherwise
> > > wait_on_pending_writer is hiding yet another release_sock() :(  
> > 
> > Not seeing the path to release_sock() at the moment?
> > 
> >    tls_handle_open_record
> >      push_pending_record
> >       tls_sw_push_pending_record
> >         bpf_exec_tx_verdict
> 
> wait_on_pending_writer
>   sk_wait_event
>     release_sock
> 

ah OK. I'll check on sk_write_pending...

> > If bpf_exec_tx_verdict does a redirect we could hit a release but that
> > is another fix I have to get queued up shortly. I think we can fix
> > that in another series.
> 
> Ugh.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
  2019-07-11 21:25               ` John Fastabend
@ 2019-07-12  3:16                 ` Jakub Kicinski
  2019-07-15 20:58                   ` John Fastabend
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-07-12  3:16 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

On Thu, 11 Jul 2019 14:25:54 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Thu, 11 Jul 2019 09:47:16 -0700, John Fastabend wrote:  
> > > Jakub Kicinski wrote:  
> > > > On Wed, 10 Jul 2019 12:34:17 -0700, Jakub Kicinski wrote:    
> > > > > > > > +		if (sk->sk_prot->unhash)
> > > > > > > > +			sk->sk_prot->unhash(sk);
> > > > > > > > +	}
> > > > > > > > +
> > > > > > > > +	ctx = tls_get_ctx(sk);
> > > > > > > > +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> > > > > > > > +		tls_sk_proto_cleanup(sk, ctx, timeo);    
> > > > 
> > > > Do we still need to hook into unhash? With patch 6 in place perhaps we
> > > > can just do disconnect 🥺    
> > > 
> > > ?? "can just do a disconnect", not sure I follow. We still need unhash
> > > in cases where we have a TLS socket transition from ESTABLISHED
> > > to LISTEN state without calling close(). This is independent of
> > > whether sockmap is running or not.
> > > 
> > > Originally, I thought this would be extremely rare but I did see it
> > > in real applications on the sockmap side so presumably it is possible
> > > here as well.  
> > 
> > Ugh, sorry, I meant shutdown. Instead of replacing the unhash callback
> > replace the shutdown callback. We probably shouldn't release the socket
> > lock either there, but we can sleep, so I'll be able to run the device
> connection remove callback (which sleeps).
> 
> ah OK, seems doable to me. Do you want to write that on top of this
> series? Or would you like to push it onto your branch, and I can pull
> it in, push the rest of the patches on top, and send it out? I think
> if you can get to it in the next few days then it makes sense to wait.

Mm.. perhaps it's easiest if we forget about HW for now and get SW
to work? Once you get the SW to 100% I can probably figure out what
to do for HW, but I feel like we've got too many moving parts ATM.

> I can't test the hardware side so probably makes more sense for
> you to do it if you can.
>
> > > > cleanup is going to kick off TX but also:
> > > > 
> > > > 	if (unlikely(sk->sk_write_pending) &&
> > > > 	    !wait_on_pending_writer(sk, &timeo))
> > > > 		tls_handle_open_record(sk, 0);
> > > > 
> > > > Are we guaranteed that sk_write_pending is 0?  Otherwise
> > > > wait_on_pending_writer is hiding yet another release_sock() :(    
> > > 
> > > Not seeing the path to release_sock() at the moment?
> > > 
> > >    tls_handle_open_record
> > >      push_pending_record
> > >       tls_sw_push_pending_record
> > >         bpf_exec_tx_verdict  
> > 
> > wait_on_pending_writer
> >   sk_wait_event
> >     release_sock
> >   
> 
> ah OK. I'll check on sk_write_pending...


* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
  2019-07-12  3:16                 ` Jakub Kicinski
@ 2019-07-15 20:58                   ` John Fastabend
  0 siblings, 0 replies; 27+ messages in thread
From: John Fastabend @ 2019-07-15 20:58 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf

Jakub Kicinski wrote:
> On Thu, 11 Jul 2019 14:25:54 -0700, John Fastabend wrote:
> > Jakub Kicinski wrote:
> > > On Thu, 11 Jul 2019 09:47:16 -0700, John Fastabend wrote:  
> > > > Jakub Kicinski wrote:  
> > > > > On Wed, 10 Jul 2019 12:34:17 -0700, Jakub Kicinski wrote:    
> > > > > > > > > +		if (sk->sk_prot->unhash)
> > > > > > > > > +			sk->sk_prot->unhash(sk);
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	ctx = tls_get_ctx(sk);
> > > > > > > > > +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> > > > > > > > > +		tls_sk_proto_cleanup(sk, ctx, timeo);    
> > > > > 
> > > > > Do we still need to hook into unhash? With patch 6 in place perhaps we
> > > > > can just do disconnect 🥺    
> > > > 
> > > > ?? "can just do a disconnect", not sure I follow. We still need unhash
> > > > in cases where we have a TLS socket transition from ESTABLISHED
> > > > to LISTEN state without calling close(). This is independent of
> > > > whether sockmap is running or not.
> > > > 
> > > > Originally, I thought this would be extremely rare but I did see it
> > > > in real applications on the sockmap side so presumably it is possible
> > > > here as well.  
> > > 
> > > Ugh, sorry, I meant shutdown. Instead of replacing the unhash callback
> > > replace the shutdown callback. We probably shouldn't release the socket
> > > lock either there, but we can sleep, so I'll be able to run the device
> > > connection remove callback (which sleeps).
> > 
> > ah OK, seems doable to me. Do you want to write that on top of this
> > series? Or would you like to push it onto your branch, and I can pull
> > it in, push the rest of the patches on top, and send it out? I think
> > if you can get to it in the next few days then it makes sense to wait.
> 
> Mm.. perhaps it's easiest if we forget about HW for now and get SW
> to work? Once you get the SW to 100% I can probably figure out what
> to do for HW, but I feel like we've got too many moving parts ATM.

Hi Jakub,

I went ahead and pushed a v3 with your patches at the front. This resolves
a set of issues for me, so I think it makes sense to push now and look
to resolve any further issues later. I'll look into the potential
close-with-pending-data issue to see whether or not it is a real problem.

Thanks,
John


end of thread, other threads:[~2019-07-15 20:58 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-08 19:13 [bpf PATCH v2 0/6] bpf: sockmap/tls fixes John Fastabend
2019-07-08 19:13 ` [bpf PATCH v2 1/6] tls: remove close callback sock unlock/lock and flush_sync John Fastabend
2019-07-08 19:14 ` [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close John Fastabend
2019-07-10  2:45   ` Jakub Kicinski
2019-07-10  3:39     ` John Fastabend
2019-07-10 19:34       ` Jakub Kicinski
2019-07-10 20:04         ` Jakub Kicinski
2019-07-11 16:47           ` John Fastabend
2019-07-11 18:32             ` Jakub Kicinski
2019-07-11 21:25               ` John Fastabend
2019-07-12  3:16                 ` Jakub Kicinski
2019-07-15 20:58                   ` John Fastabend
2019-07-11 16:35         ` John Fastabend
2019-07-08 19:14 ` [bpf PATCH v2 3/6] bpf: sockmap, sock_map_delete needs to use xchg John Fastabend
2019-07-08 19:14 ` [bpf PATCH v2 4/6] bpf: sockmap, synchronize_rcu before free'ing map John Fastabend
2019-07-08 19:15 ` [bpf PATCH v2 5/6] bpf: sockmap, only create entry if ulp is not already enabled John Fastabend
2019-07-08 19:15 ` [bpf PATCH v2 6/6] bpf: sockmap/tls, close can race with map free John Fastabend
2019-07-10  2:36   ` Jakub Kicinski
2019-07-10  2:38   ` Jakub Kicinski
2019-07-10  3:33     ` John Fastabend
2019-07-10 19:35       ` Jakub Kicinski
2019-07-11 16:39         ` John Fastabend
2019-07-09  6:13 ` [bpf PATCH v2 0/6] bpf: sockmap/tls fixes Jakub Kicinski
2019-07-09 15:40   ` John Fastabend
2019-07-10  0:04     ` Jakub Kicinski
2019-07-10  2:21       ` Jakub Kicinski
2019-07-10  3:28         ` John Fastabend
