kernel-tls-handshake.lists.linux.dev archive mirror
* [PATCH v5 0/2] Another crack at a handshake upcall mechanism
@ 2023-02-24 19:19 Chuck Lever
  2023-02-24 19:19 ` [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests Chuck Lever
  2023-02-24 19:19 ` [PATCH v5 2/2] net/tls: Add kernel APIs for requesting a TLSv1.3 handshake Chuck Lever
  0 siblings, 2 replies; 15+ messages in thread
From: Chuck Lever @ 2023-02-24 19:19 UTC (permalink / raw)
  To: kuba, pabeni, edumazet; +Cc: netdev, kernel-tls-handshake

Hi-

Here is v5 of a series to add generic support for transport layer
security handshake on behalf of kernel socket consumers (user space
consumers use a security library directly, of course). A summary of
the purpose of these patches is archived here:

https://lore.kernel.org/netdev/1DE06BB1-6BA9-4DB4-B2AA-07DE532963D6@oracle.com/

For v5, I've created a YAML spec that describes the HANDSHAKE
netlink protocol. Some simplifications were necessary to make the
protocol fit within the YAML schema. I was not able to get
multi-attr working for the remote-peerid attribute, so that has been
postponed to v6.

The socket "accept" mechanism has been replaced with something more
like "dup(2)", and we no longer rely on the DONE operation to close
the accepted file descriptor. Hopefully this clarifies error and
timeout handling as well as handshake_req lifetime.
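
As a user space illustration of why dup(2)-like semantics make
lifetime handling simpler, here is a minimal sketch. It is not part
of this series. It assumes only POSIX socketpair(2), dup(2), and
close(2), and the function name dup_survives_close() is made up for
the example. Closing the duplicate descriptor drops one reference;
the socket stays usable through the original descriptor.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo helper: returns 1 if the original descriptor is
 * still usable after a dup(2)'d copy of it has been closed, 0 if it
 * is not, and -1 if the setup itself failed.
 */
int dup_survives_close(void)
{
	int sv[2], copy, ok;
	char buf[4] = "";

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	copy = dup(sv[0]);	/* second reference to the same open socket */
	if (copy < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	close(copy);		/* drops only the duplicate reference */

	/* The original descriptor still reaches the peer. */
	ok = write(sv[0], "ping", 4) == 4 &&
	     read(sv[1], buf, sizeof(buf)) == 4 &&
	     memcmp(buf, "ping", 4) == 0;

	close(sv[0]);
	close(sv[1]);
	return ok ? 1 : 0;
}
```

In the same spirit, a handshake agent that closes its descriptor
only gives up its own reference to the socket.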

The full patch set to support SunRPC with TLSv1.3 is available in
the topic-rpc-with-tls-upcall branch here, based on net-next/main:

  https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

A user space handshake agent for TLSv1.3 to go along with the kernel
patches is available in the "netlink" branch here:

  https://github.com/oracle/ktls-utils

Enjoy your weekend!

---

Changes since v4:
- Rebased onto net-next/main
- Replaced req reference counting with ->sk_destruct
- CMD_ACCEPT now does the equivalent of a dup(2) rather than an
  accept(2)
- CMD_DONE no longer closes the user space socket endpoint
- handshake_req_cancel is now tested and working
- Added a YAML specification for the netlink upcall protocol, and
  simplified the protocol to fit the YAML schema
- Added an initial set of tracepoints

Changes since v3:
- Converted all netlink code to use Generic Netlink
- Reworked handshake request lifetime logic throughout
- Global pending list is now per-net
- On completion, return the remote's identity to the consumer

Changes since v2:
- PF_HANDSHAKE replaced with NETLINK_HANDSHAKE
- Replaced listen(2) / poll(2) with a multicast notification service
- Replaced accept(2) with a netlink operation that can return an
  open fd and handshake parameters
- Replaced close(2) with a netlink operation that can take arguments

Changes since RFC:
- Generic upcall support split away from kTLS
- Added support for TLS ServerHello
- Documentation has been temporarily removed while API churns

---

Chuck Lever (2):
      net/handshake: Create a NETLINK service for handling handshake requests
      net/tls: Add kernel APIs for requesting a TLSv1.3 handshake


 Documentation/netlink/specs/handshake.yaml | 136 +++++++
 Documentation/networking/index.rst         |   1 +
 Documentation/networking/tls-handshake.rst | 146 +++++++
 include/net/handshake.h                    |  45 +++
 include/net/net_namespace.h                |   5 +
 include/net/sock.h                         |   1 +
 include/net/tls.h                          |  27 ++
 include/trace/events/handshake.h           | 159 ++++++++
 include/uapi/linux/handshake.h             |  65 ++++
 net/Makefile                               |   1 +
 net/handshake/Makefile                     |  11 +
 net/handshake/handshake.h                  |  41 ++
 net/handshake/netlink.c                    | 341 +++++++++++++++++
 net/handshake/request.c                    | 246 ++++++++++++
 net/handshake/trace.c                      |  17 +
 net/tls/Makefile                           |   2 +-
 net/tls/tls_handshake.c                    | 423 +++++++++++++++++++++
 17 files changed, 1666 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/netlink/specs/handshake.yaml
 create mode 100644 Documentation/networking/tls-handshake.rst
 create mode 100644 include/net/handshake.h
 create mode 100644 include/trace/events/handshake.h
 create mode 100644 include/uapi/linux/handshake.h
 create mode 100644 net/handshake/Makefile
 create mode 100644 net/handshake/handshake.h
 create mode 100644 net/handshake/netlink.c
 create mode 100644 net/handshake/request.c
 create mode 100644 net/handshake/trace.c
 create mode 100644 net/tls/tls_handshake.c

--
Chuck Lever



* [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests
  2023-02-24 19:19 [PATCH v5 0/2] Another crack at a handshake upcall mechanism Chuck Lever
@ 2023-02-24 19:19 ` Chuck Lever
  2023-02-27  9:24   ` Hannes Reinecke
  2023-02-24 19:19 ` [PATCH v5 2/2] net/tls: Add kernel APIs for requesting a TLSv1.3 handshake Chuck Lever
  1 sibling, 1 reply; 15+ messages in thread
From: Chuck Lever @ 2023-02-24 19:19 UTC (permalink / raw)
  To: kuba, pabeni, edumazet; +Cc: netdev, kernel-tls-handshake

From: Chuck Lever <chuck.lever@oracle.com>

When a kernel consumer needs a transport layer security session, it
first needs a handshake to negotiate and establish a session. This
negotiation can be done in user space via one of the several
existing library implementations, or it can be done in the kernel.

No in-kernel handshake implementations yet exist. In their absence,
we add a netlink service that can:

a. Notify a user space daemon that a handshake is needed.

b. Once notified, the daemon calls the kernel back via this
   netlink service to get the handshake parameters, including an
   open socket on which to establish the session.

c. Once the handshake is complete, the daemon reports the
   session status and other information via a second netlink
   operation. This operation marks that it is safe for the
   kernel to use the open socket and the security session
   established there.
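
The three steps above can be modelled in user space as a small
state machine. This is only a sketch for illustration: the names
hs_req, hs_accept(), and hs_done() are hypothetical and do not
appear in the patch, and a plain array stands in for the kernel's
pending list.

```c
#include <assert.h>
#include <stddef.h>

enum hs_state { HS_PENDING, HS_ACCEPTED, HS_DONE };

/* Hypothetical stand-in for one queued handshake request. */
struct hs_req {
	int handler_class;
	enum hs_state state;
	int status;
};

/* Step b: hand the oldest pending request of this class to the daemon. */
struct hs_req *hs_accept(struct hs_req *reqs, size_t n, int handler_class)
{
	for (size_t i = 0; i < n; i++) {
		if (reqs[i].state != HS_PENDING ||
		    reqs[i].handler_class != handler_class)
			continue;
		reqs[i].state = HS_ACCEPTED;
		return &reqs[i];
	}
	return NULL;	/* nothing is waiting for this handler class */
}

/* Step c: record the session status; only an accepted request completes. */
int hs_done(struct hs_req *req, int status)
{
	if (!req || req->state != HS_ACCEPTED)
		return -1;
	req->state = HS_DONE;
	req->status = status;
	return 0;
}
```

A caller would invoke hs_accept() once per notification and later
hs_done() exactly once for the request it was handed; a second
hs_done() on the same request is rejected.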

The notification service uses a multicast group. Each handshake
mechanism (e.g., tlshd) adopts its own group number so that the
handshake services are completely independent of one another. The
kernel can then tell via genl_has_listeners() whether a handshake
service is active and prepared to handle a handshake request.

A new netlink operation, ACCEPT, acts like dup(2) in that it
installs a file descriptor for the kernel's open socket in the user
space daemon's fd table. If this operation is successful, the reply
carries the fd number, which can be treated as an open and ready
file descriptor.

While user space is performing the handshake, the kernel keeps its
muddy paws off the open socket. A second new netlink operation,
DONE, indicates that the user space daemon is finished with the
socket and it is safe for the kernel to use again. The operation
also indicates whether a session was established successfully.
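
For illustration, the attribute payload of a DONE request could be
laid out in user space roughly as follows. This is a sketch under
assumptions, not the agent's actual code: put_u32() and
encode_done_attrs() are hypothetical helpers, the attribute numbers
are copied from the generated uapi header in this patch, and a real
agent would still need the nlmsghdr and genlmsghdr in front of the
attributes (most likely via a netlink library).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

/* Attribute numbers as they appear in this patch's uapi header. */
enum {
	HANDSHAKE_A_DONE_STATUS = 1,
	HANDSHAKE_A_DONE_SOCKFD,
};

/*
 * Append one u32 attribute at buf + off. Returns the new offset,
 * or 0 if the attribute does not fit in cap bytes.
 */
static size_t put_u32(unsigned char *buf, size_t off, size_t cap,
		      uint16_t type, uint32_t val)
{
	struct nlattr nla = {
		.nla_len = NLA_HDRLEN + sizeof(val),
		.nla_type = type,
	};

	if (off + NLA_ALIGN(nla.nla_len) > cap)
		return 0;
	memcpy(buf + off, &nla, sizeof(nla));
	memcpy(buf + off + NLA_HDRLEN, &val, sizeof(val));
	return off + NLA_ALIGN(nla.nla_len);
}

/* Hypothetical helper: encode status and sockfd; returns bytes used or 0. */
size_t encode_done_attrs(unsigned char *buf, size_t cap,
			 uint32_t status, uint32_t sockfd)
{
	size_t off;

	off = put_u32(buf, 0, cap, HANDSHAKE_A_DONE_STATUS, status);
	if (!off)
		return 0;
	return put_u32(buf, off, cap, HANDSHAKE_A_DONE_SOCKFD, sockfd);
}
```

Each u32 attribute takes a 4-byte header plus 4 bytes of payload,
so the two attributes above occupy 16 bytes in total.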

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 Documentation/netlink/specs/handshake.yaml |  134 +++++++++++
 include/net/handshake.h                    |   45 ++++
 include/net/net_namespace.h                |    5 
 include/net/sock.h                         |    1 
 include/trace/events/handshake.h           |  159 +++++++++++++
 include/uapi/linux/handshake.h             |   63 +++++
 net/Makefile                               |    1 
 net/handshake/Makefile                     |   11 +
 net/handshake/handshake.h                  |   41 +++
 net/handshake/netlink.c                    |  340 ++++++++++++++++++++++++++++
 net/handshake/request.c                    |  246 ++++++++++++++++++++
 net/handshake/trace.c                      |   17 +
 12 files changed, 1063 insertions(+)
 create mode 100644 Documentation/netlink/specs/handshake.yaml
 create mode 100644 include/net/handshake.h
 create mode 100644 include/trace/events/handshake.h
 create mode 100644 include/uapi/linux/handshake.h
 create mode 100644 net/handshake/Makefile
 create mode 100644 net/handshake/handshake.h
 create mode 100644 net/handshake/netlink.c
 create mode 100644 net/handshake/request.c
 create mode 100644 net/handshake/trace.c

diff --git a/Documentation/netlink/specs/handshake.yaml b/Documentation/netlink/specs/handshake.yaml
new file mode 100644
index 000000000000..683a8f2df0a7
--- /dev/null
+++ b/Documentation/netlink/specs/handshake.yaml
@@ -0,0 +1,134 @@
+# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+#
+# GENL HANDSHAKE service.
+#
+# Author: Chuck Lever <chuck.lever@oracle.com>
+#
+# Copyright (c) 2023, Oracle and/or its affiliates.
+#
+
+name: handshake
+
+protocol: genetlink-c
+
+doc: Netlink protocol to request a transport layer security handshake.
+
+uapi-header: linux/net/handshake.h
+
+definitions:
+  -
+    type: enum
+    name: handler-class
+    enum-name:
+    value-start: 0
+    entries: [ none ]
+  -
+    type: enum
+    name: msg-type
+    enum-name:
+    value-start: 0
+    entries: [ unspec, clienthello, serverhello ]
+  -
+    type: enum
+    name: auth
+    enum-name:
+    value-start: 0
+    entries: [ unspec, unauth, x509, psk ]
+
+attribute-sets:
+  -
+    name: accept
+    attributes:
+      -
+        name: status
+        doc: Status of this accept operation
+        type: u32
+        value: 1
+      -
+        name: sockfd
+        doc: File descriptor of socket to use
+        type: u32
+      -
+        name: handler-class
+        doc: Which type of handler is responding
+        type: u32
+        enum: handler-class
+      -
+        name: message-type
+        doc: Handshake message type
+        type: u32
+        enum: msg-type
+      -
+        name: auth
+        doc: Authentication mode
+        type: u32
+        enum: auth
+      -
+        name: gnutls-priorities
+        doc: GnuTLS priority string
+        type: string
+      -
+        name: my-peerid
+        doc: Serial no of key containing local identity
+        type: u32
+      -
+        name: my-privkey
+        doc: Serial no of key containing optional private key
+        type: u32
+  -
+    name: done
+    attributes:
+      -
+        name: status
+        doc: Session status
+        type: u32
+        value: 1
+      -
+        name: sockfd
+        doc: File descriptor of socket that has completed
+        type: u32
+      -
+        name: remote-peerid
+        doc: Serial no of keys containing identities of remote peer
+        type: u32
+
+operations:
+  list:
+    -
+      name: ready
+      doc: Notify handlers that a new handshake request is waiting
+      value: 1
+      notify: accept
+    -
+      name: accept
+      doc: Handler retrieves next queued handshake request
+      attribute-set: accept
+      flags: [ admin-perm ]
+      do:
+        request:
+          attributes:
+            - handler-class
+        reply:
+          attributes:
+            - status
+            - sockfd
+            - message-type
+            - auth
+            - gnutls-priorities
+            - my-peerid
+            - my-privkey
+    -
+      name: done
+      doc: Handler reports handshake completion
+      attribute-set: done
+      do:
+        request:
+          attributes:
+            - status
+            - sockfd
+            - remote-peerid
+
+mcast-groups:
+  list:
+    -
+      name: none
diff --git a/include/net/handshake.h b/include/net/handshake.h
new file mode 100644
index 000000000000..08f859237936
--- /dev/null
+++ b/include/net/handshake.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generic HANDSHAKE service.
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2023, Oracle and/or its affiliates.
+ */
+
+/*
+ * Data structures and functions that are visible only within the
+ * kernel are declared here.
+ */
+
+#ifndef _NET_HANDSHAKE_H
+#define _NET_HANDSHAKE_H
+
+struct handshake_req;
+
+/*
+ * Invariants for all handshake requests for one transport layer
+ * security protocol
+ */
+struct handshake_proto {
+	int			hp_handler_class;
+	size_t			hp_privsize;
+
+	int			(*hp_accept)(struct handshake_req *req,
+					     struct genl_info *gi, int fd);
+	void			(*hp_done)(struct handshake_req *req,
+					   int status, struct nlattr **tb);
+	void			(*hp_destroy)(struct handshake_req *req);
+};
+
+extern struct handshake_req *
+handshake_req_alloc(struct socket *sock, const struct handshake_proto *proto,
+		    gfp_t flags);
+extern void *handshake_req_private(struct handshake_req *req);
+extern int handshake_req_submit(struct handshake_req *req, gfp_t flags);
+extern int handshake_req_cancel(struct socket *sock);
+
+extern struct nlmsghdr *handshake_genl_put(struct sk_buff *msg,
+					   struct genl_info *gi);
+
+#endif /* _NET_HANDSHAKE_H */
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 78beaa765c73..a0ce9de4dab1 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -188,6 +188,11 @@ struct net {
 #if IS_ENABLED(CONFIG_SMC)
 	struct netns_smc	smc;
 #endif
+
+	/* transport layer security handshake requests */
+	spinlock_t		hs_lock;
+	struct list_head	hs_requests;
+	int			hs_pending;
 } __randomize_layout;
 
 #include <linux/seq_file_net.h>
diff --git a/include/net/sock.h b/include/net/sock.h
index 573f2bf7e0de..2a7345ce2540 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -519,6 +519,7 @@ struct sock {
 
 	struct socket		*sk_socket;
 	void			*sk_user_data;
+	void			*sk_handshake_req;
 #ifdef CONFIG_SECURITY
 	void			*sk_security;
 #endif
diff --git a/include/trace/events/handshake.h b/include/trace/events/handshake.h
new file mode 100644
index 000000000000..feffcd1d6256
--- /dev/null
+++ b/include/trace/events/handshake.h
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM handshake
+
+#if !defined(_TRACE_HANDSHAKE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HANDSHAKE_H
+
+#include <linux/net.h>
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(handshake_event_class,
+	TP_PROTO(
+		const struct net *net,
+		const struct handshake_req *req,
+		const struct socket *sock
+	),
+	TP_ARGS(net, req, sock),
+	TP_STRUCT__entry(
+		__field(const void *, req)
+		__field(const void *, sock)
+		__field(unsigned int, netns_ino)
+	),
+	TP_fast_assign(
+		__entry->req = req;
+		__entry->sock = sock;
+		__entry->netns_ino = net->ns.inum;
+	),
+	TP_printk("req=%p sock=%p",
+		__entry->req, __entry->sock
+	)
+);
+#define DEFINE_HANDSHAKE_EVENT(name)				\
+	DEFINE_EVENT(handshake_event_class, name,		\
+		TP_PROTO(					\
+			const struct net *net,			\
+			const struct handshake_req *req,	\
+			const struct socket *sock		\
+		),						\
+		TP_ARGS(net, req, sock))
+
+DECLARE_EVENT_CLASS(handshake_fd_class,
+	TP_PROTO(
+		const struct net *net,
+		const struct handshake_req *req,
+		const struct socket *sock,
+		int fd
+	),
+	TP_ARGS(net, req, sock, fd),
+	TP_STRUCT__entry(
+		__field(const void *, req)
+		__field(const void *, sock)
+		__field(int, fd)
+		__field(unsigned int, netns_ino)
+	),
+	TP_fast_assign(
+		__entry->req = req;
+		__entry->sock = req->hr_sock;
+		__entry->fd = fd;
+		__entry->netns_ino = net->ns.inum;
+	),
+	TP_printk("req=%p sock=%p fd=%d",
+		__entry->req, __entry->sock, __entry->fd
+	)
+);
+#define DEFINE_HANDSHAKE_FD_EVENT(name)				\
+	DEFINE_EVENT(handshake_fd_class, name,			\
+		TP_PROTO(					\
+			const struct net *net,			\
+			const struct handshake_req *req,	\
+			const struct socket *sock,		\
+			int fd					\
+		),						\
+		TP_ARGS(net, req, sock, fd))
+
+DECLARE_EVENT_CLASS(handshake_error_class,
+	TP_PROTO(
+		const struct net *net,
+		const struct handshake_req *req,
+		const struct socket *sock,
+		int err
+	),
+	TP_ARGS(net, req, sock, err),
+	TP_STRUCT__entry(
+		__field(const void *, req)
+		__field(const void *, sock)
+		__field(int, err)
+		__field(unsigned int, netns_ino)
+	),
+	TP_fast_assign(
+		__entry->req = req;
+		__entry->sock = sock;
+		__entry->err = err;
+		__entry->netns_ino = net->ns.inum;
+	),
+	TP_printk("req=%p sock=%p err=%d",
+		__entry->req, __entry->sock, __entry->err
+	)
+);
+#define DEFINE_HANDSHAKE_ERROR(name)				\
+	DEFINE_EVENT(handshake_error_class, name,		\
+		TP_PROTO(					\
+			const struct net *net,			\
+			const struct handshake_req *req,	\
+			const struct socket *sock,		\
+			int err					\
+		),						\
+		TP_ARGS(net, req, sock, err))
+
+
+/**
+ ** Request lifetime events
+ **/
+
+DEFINE_HANDSHAKE_EVENT(handshake_submit);
+DEFINE_HANDSHAKE_ERROR(handshake_submit_err);
+DEFINE_HANDSHAKE_EVENT(handshake_cancel);
+DEFINE_HANDSHAKE_EVENT(handshake_cancel_none);
+DEFINE_HANDSHAKE_EVENT(handshake_cancel_busy);
+DEFINE_HANDSHAKE_EVENT(handshake_destruct);
+
+
+TRACE_EVENT(handshake_complete,
+	TP_PROTO(
+		const struct net *net,
+		const struct handshake_req *req,
+		const struct socket *sock,
+		int status
+	),
+	TP_ARGS(net, req, sock, status),
+	TP_STRUCT__entry(
+		__field(const void *, req)
+		__field(const void *, sock)
+		__field(int, status)
+		__field(unsigned int, netns_ino)
+	),
+	TP_fast_assign(
+		__entry->req = req;
+		__entry->sock = sock;
+		__entry->status = status;
+		__entry->netns_ino = net->ns.inum;
+	),
+	TP_printk("req=%p sock=%p status=%d",
+		__entry->req, __entry->sock, __entry->status
+	)
+);
+
+/**
+ ** Netlink events
+ **/
+
+DEFINE_HANDSHAKE_ERROR(handshake_notify_err);
+DEFINE_HANDSHAKE_FD_EVENT(handshake_cmd_accept);
+DEFINE_HANDSHAKE_ERROR(handshake_cmd_accept_err);
+DEFINE_HANDSHAKE_FD_EVENT(handshake_cmd_done);
+DEFINE_HANDSHAKE_ERROR(handshake_cmd_done_err);
+
+#endif /* _TRACE_HANDSHAKE_H */
+
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/handshake.h b/include/uapi/linux/handshake.h
new file mode 100644
index 000000000000..09fd7c37cba4
--- /dev/null
+++ b/include/uapi/linux/handshake.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/handshake.yaml */
+/* YNL-GEN uapi header */
+
+#ifndef _UAPI_LINUX_HANDSHAKE_H
+#define _UAPI_LINUX_HANDSHAKE_H
+
+#define HANDSHAKE_FAMILY_NAME		"handshake"
+#define HANDSHAKE_FAMILY_VERSION	1
+
+enum {
+	HANDSHAKE_HANDLER_CLASS_NONE,
+};
+
+enum {
+	HANDSHAKE_MSG_TYPE_UNSPEC,
+	HANDSHAKE_MSG_TYPE_CLIENTHELLO,
+	HANDSHAKE_MSG_TYPE_SERVERHELLO,
+};
+
+enum {
+	HANDSHAKE_AUTH_UNSPEC,
+	HANDSHAKE_AUTH_UNAUTH,
+	HANDSHAKE_AUTH_X509,
+	HANDSHAKE_AUTH_PSK,
+};
+
+enum {
+	HANDSHAKE_A_ACCEPT_STATUS = 1,
+	HANDSHAKE_A_ACCEPT_SOCKFD,
+	HANDSHAKE_A_ACCEPT_HANDLER_CLASS,
+	HANDSHAKE_A_ACCEPT_MESSAGE_TYPE,
+	HANDSHAKE_A_ACCEPT_AUTH,
+	HANDSHAKE_A_ACCEPT_GNUTLS_PRIORITIES,
+	HANDSHAKE_A_ACCEPT_MY_PEERID,
+	HANDSHAKE_A_ACCEPT_MY_PRIVKEY,
+
+	__HANDSHAKE_A_ACCEPT_MAX,
+	HANDSHAKE_A_ACCEPT_MAX = (__HANDSHAKE_A_ACCEPT_MAX - 1)
+};
+
+enum {
+	HANDSHAKE_A_DONE_STATUS = 1,
+	HANDSHAKE_A_DONE_SOCKFD,
+	HANDSHAKE_A_DONE_REMOTE_PEERID,
+
+	__HANDSHAKE_A_DONE_MAX,
+	HANDSHAKE_A_DONE_MAX = (__HANDSHAKE_A_DONE_MAX - 1)
+};
+
+enum {
+	HANDSHAKE_CMD_READY = 1,
+	HANDSHAKE_CMD_ACCEPT,
+	HANDSHAKE_CMD_DONE,
+
+	__HANDSHAKE_CMD_MAX,
+	HANDSHAKE_CMD_MAX = (__HANDSHAKE_CMD_MAX - 1)
+};
+
+#define HANDSHAKE_MCGRP_NONE	"none"
+
+#endif /* _UAPI_LINUX_HANDSHAKE_H */
diff --git a/net/Makefile b/net/Makefile
index 0914bea9c335..adbb64277601 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -79,3 +79,4 @@ obj-$(CONFIG_NET_NCSI)		+= ncsi/
 obj-$(CONFIG_XDP_SOCKETS)	+= xdp/
 obj-$(CONFIG_MPTCP)		+= mptcp/
 obj-$(CONFIG_MCTP)		+= mctp/
+obj-y				+= handshake/
diff --git a/net/handshake/Makefile b/net/handshake/Makefile
new file mode 100644
index 000000000000..a41b03f4837b
--- /dev/null
+++ b/net/handshake/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the Generic HANDSHAKE service
+#
+# Author: Chuck Lever <chuck.lever@oracle.com>
+#
+# Copyright (c) 2023, Oracle and/or its affiliates.
+#
+
+obj-y += handshake.o
+handshake-y := netlink.o request.o trace.o
diff --git a/net/handshake/handshake.h b/net/handshake/handshake.h
new file mode 100644
index 000000000000..366c7659ec09
--- /dev/null
+++ b/net/handshake/handshake.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generic netlink handshake service
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2023, Oracle and/or its affiliates.
+ */
+
+/*
+ * Data structures and functions that are visible only within the
+ * handshake module are declared here.
+ */
+
+#ifndef _INTERNAL_HANDSHAKE_H
+#define _INTERNAL_HANDSHAKE_H
+
+/*
+ * One handshake request
+ */
+struct handshake_req {
+	struct list_head		hr_list;
+	unsigned long			hr_flags;
+	const struct handshake_proto	*hr_proto;
+	struct socket			*hr_sock;
+
+	void				(*hr_saved_destruct)(struct sock *sk);
+};
+
+#define HANDSHAKE_F_COMPLETED	BIT(0)
+
+/* netlink.c */
+extern bool handshake_genl_inited;
+int handshake_genl_notify(struct net *net, int handler_class, gfp_t flags);
+
+/* request.c */
+void __remove_pending_locked(struct net *net, struct handshake_req *req);
+void handshake_complete(struct handshake_req *req, int status,
+			struct nlattr **tb);
+
+#endif /* _INTERNAL_HANDSHAKE_H */
diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
new file mode 100644
index 000000000000..581e382236cf
--- /dev/null
+++ b/net/handshake/netlink.c
@@ -0,0 +1,340 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Generic netlink handshake service
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2023, Oracle and/or its affiliates.
+ */
+
+#include <linux/types.h>
+#include <linux/socket.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/inet.h>
+
+#include <net/sock.h>
+#include <net/genetlink.h>
+#include <net/handshake.h>
+
+#include <uapi/linux/handshake.h>
+#include <trace/events/handshake.h>
+#include "handshake.h"
+
+static struct genl_family __ro_after_init handshake_genl_family;
+bool handshake_genl_inited;
+
+/**
+ * handshake_genl_notify - Notify handlers that a request is waiting
+ * @net: target network namespace
+ * @handler_class: target handler
+ * @flags: memory allocation control flags
+ *
+ * Returns zero on success or a negative errno if notification failed.
+ */
+int handshake_genl_notify(struct net *net, int handler_class, gfp_t flags)
+{
+	struct sk_buff *msg;
+	void *hdr;
+
+	if (!genl_has_listeners(&handshake_genl_family, net, handler_class))
+		return -ESRCH;
+
+	msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, flags);
+	if (!msg)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(msg, 0, 0, &handshake_genl_family, 0,
+			  HANDSHAKE_CMD_READY);
+	if (!hdr)
+		goto out_free;
+
+	if (nla_put_u32(msg, HANDSHAKE_A_ACCEPT_HANDLER_CLASS,
+			handler_class) < 0) {
+		genlmsg_cancel(msg, hdr);
+		goto out_free;
+	}
+
+	genlmsg_end(msg, hdr);
+	return genlmsg_multicast_netns(&handshake_genl_family, net, msg,
+				       0, handler_class, flags);
+
+out_free:
+	nlmsg_free(msg);
+	return -EMSGSIZE;
+}
+
+/**
+ * handshake_genl_put - Create a generic netlink message header
+ * @msg: buffer in which to create the header
+ * @gi: generic netlink message context
+ *
+ * Returns a ready-to-use header, or NULL.
+ */
+struct nlmsghdr *handshake_genl_put(struct sk_buff *msg, struct genl_info *gi)
+{
+	return genlmsg_put(msg, gi->snd_portid, gi->snd_seq,
+			   &handshake_genl_family, 0, gi->genlhdr->cmd);
+}
+EXPORT_SYMBOL(handshake_genl_put);
+
+static int handshake_status_reply(struct sk_buff *skb, struct genl_info *gi,
+				  int status)
+{
+	struct nlmsghdr *hdr;
+	struct sk_buff *msg;
+	int ret;
+
+	ret = -ENOMEM;
+	msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		goto out;
+	ret = -EMSGSIZE;
+	hdr = handshake_genl_put(msg, gi);
+	if (!hdr)
+		goto out_free;
+
+	ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_STATUS, status);
+	if (ret < 0)
+		goto out_free;
+
+	genlmsg_end(msg, hdr);
+	return genlmsg_reply(msg, gi);
+
+out_free:
+	nlmsg_free(msg);
+out:
+	return ret;
+}
+
+/*
+ * dup() a kernel socket for use as a user space file descriptor
+ * in the current process.
+ *
+ * Implicit argument: "current"
+ */
+static int handshake_dup(struct socket *kernsock)
+{
+	struct file *file = get_file(kernsock->file);
+	int newfd;
+
+	newfd = get_unused_fd_flags(O_CLOEXEC);
+	if (newfd < 0) {
+		fput(file);
+		return newfd;
+	}
+
+	fd_install(newfd, file);
+	return newfd;
+}
+
+static const struct nla_policy
+handshake_accept_nl_policy[HANDSHAKE_A_ACCEPT_HANDLER_CLASS + 1] = {
+	[HANDSHAKE_A_ACCEPT_HANDLER_CLASS] = { .type = NLA_U32, },
+};
+
+static int handshake_nl_accept_doit(struct sk_buff *skb, struct genl_info *gi)
+{
+	struct nlattr *tb[HANDSHAKE_A_ACCEPT_MAX + 1];
+	struct net *net = sock_net(skb->sk);
+	struct handshake_req *pos, *req;
+	int fd, err;
+
+	err = -EINVAL;
+	if (genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
+			  HANDSHAKE_A_ACCEPT_HANDLER_CLASS,
+			  handshake_accept_nl_policy, NULL))
+		goto out_status;
+	if (!tb[HANDSHAKE_A_ACCEPT_HANDLER_CLASS])
+		goto out_status;
+
+	req = NULL;
+	spin_lock(&net->hs_lock);
+	list_for_each_entry(pos, &net->hs_requests, hr_list) {
+		if (pos->hr_proto->hp_handler_class !=
+		    nla_get_u32(tb[HANDSHAKE_A_ACCEPT_HANDLER_CLASS]))
+			continue;
+		__remove_pending_locked(net, pos);
+		req = pos;
+		break;
+	}
+	spin_unlock(&net->hs_lock);
+	if (!req)
+		goto out_status;
+
+	fd = handshake_dup(req->hr_sock);
+	if (fd < 0) {
+		err = fd;
+		goto out_complete;
+	}
+	err = req->hr_proto->hp_accept(req, gi, fd);
+	if (err)
+		goto out_complete;
+
+	trace_handshake_cmd_accept(net, req, req->hr_sock, fd);
+	return 0;
+
+out_complete:
+	handshake_complete(req, -EIO, NULL);
+	fput(req->hr_sock->file);
+out_status:
+	trace_handshake_cmd_accept_err(net, req, NULL, err);
+	return handshake_status_reply(skb, gi, err);
+}
+
+static const struct nla_policy
+handshake_done_nl_policy[HANDSHAKE_A_DONE_MAX + 1] = {
+	[HANDSHAKE_A_DONE_SOCKFD] = { .type = NLA_U32, },
+	[HANDSHAKE_A_DONE_STATUS] = { .type = NLA_U32, },
+	[HANDSHAKE_A_DONE_REMOTE_PEERID] = { .type = NLA_U32, },
+};
+
+static int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *gi)
+{
+	struct nlattr *tb[HANDSHAKE_A_DONE_MAX + 1];
+	struct net *net = sock_net(skb->sk);
+	struct socket *sock = NULL;
+	struct handshake_req *req;
+	int fd, status, err;
+
+	err = genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
+			    HANDSHAKE_A_DONE_MAX, handshake_done_nl_policy,
+			    NULL);
+	if (err || !tb[HANDSHAKE_A_DONE_SOCKFD]) {
+		err = -EINVAL;
+		goto out_status;
+	}
+
+	fd = nla_get_u32(tb[HANDSHAKE_A_DONE_SOCKFD]);
+
+	err = 0;
+	sock = sockfd_lookup(fd, &err);
+	if (err) {
+		err = -EBADF;
+		goto out_status;
+	}
+
+	req = sock->sk->sk_handshake_req;
+	if (!req) {
+		err = -EBUSY;
+		goto out_status;
+	}
+
+	trace_handshake_cmd_done(net, req, sock, fd);
+
+	status = -EIO;
+	if (tb[HANDSHAKE_A_DONE_STATUS])
+		status = nla_get_u32(tb[HANDSHAKE_A_DONE_STATUS]);
+
+	handshake_complete(req, status, tb);
+	fput(sock->file);
+	return 0;
+
+out_status:
+	trace_handshake_cmd_done_err(net, req, sock, err);
+	return handshake_status_reply(skb, gi, err);
+}
+
+static const struct genl_split_ops handshake_nl_ops[] = {
+	{
+		.cmd		= HANDSHAKE_CMD_ACCEPT,
+		.doit		= handshake_nl_accept_doit,
+		.policy		= handshake_accept_nl_policy,
+		.maxattr	= HANDSHAKE_A_ACCEPT_HANDLER_CLASS,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd		= HANDSHAKE_CMD_DONE,
+		.doit		= handshake_nl_done_doit,
+		.policy		= handshake_done_nl_policy,
+		.maxattr	= HANDSHAKE_A_DONE_REMOTE_PEERID,
+		.flags		= GENL_CMD_CAP_DO,
+	},
+};
+
+static const struct genl_multicast_group handshake_nl_mcgrps[] = {
+	[HANDSHAKE_HANDLER_CLASS_NONE] = { .name = HANDSHAKE_MCGRP_NONE, },
+};
+
+static struct genl_family __ro_after_init handshake_genl_family = {
+	.hdrsize		= 0,
+	.name			= HANDSHAKE_FAMILY_NAME,
+	.version		= HANDSHAKE_FAMILY_VERSION,
+	.netnsok		= true,
+	.parallel_ops		= true,
+	.n_mcgrps		= ARRAY_SIZE(handshake_nl_mcgrps),
+	.n_split_ops		= ARRAY_SIZE(handshake_nl_ops),
+	.split_ops		= handshake_nl_ops,
+	.mcgrps			= handshake_nl_mcgrps,
+	.module			= THIS_MODULE,
+};
+
+static int __net_init handshake_net_init(struct net *net)
+{
+	spin_lock_init(&net->hs_lock);
+	INIT_LIST_HEAD(&net->hs_requests);
+	net->hs_pending	= 0;
+	return 0;
+}
+
+static void __net_exit handshake_net_exit(struct net *net)
+{
+	struct handshake_req *req;
+	LIST_HEAD(requests);
+
+	/*
+	 * This drains the net's pending list. Requests that
+	 * have been accepted and are in progress will be
+	 * destroyed when the socket is closed.
+	 */
+	spin_lock(&net->hs_lock);
+	list_splice_init(&net->hs_requests, &requests);
+	spin_unlock(&net->hs_lock);
+
+	while (!list_empty(&requests)) {
+		req = list_first_entry(&requests, struct handshake_req, hr_list);
+		list_del(&req->hr_list);
+
+		/*
+		 * Requests on this list have not yet been
+		 * accepted, so they do not have an fd to put.
+		 */
+
+		handshake_complete(req, -ETIMEDOUT, NULL);
+	}
+}
+
+static struct pernet_operations handshake_genl_net_ops = {
+	.init		= handshake_net_init,
+	.exit		= handshake_net_exit,
+};
+
+static int __init handshake_init(void)
+{
+	int ret;
+
+	ret = genl_register_family(&handshake_genl_family);
+	if (ret) {
+		pr_warn("handshake: netlink registration failed (%d)\n", ret);
+		return ret;
+	}
+
+	ret = register_pernet_subsys(&handshake_genl_net_ops);
+	if (ret) {
+		pr_warn("handshake: pernet registration failed (%d)\n", ret);
+		genl_unregister_family(&handshake_genl_family);
+	}
+
+	handshake_genl_inited = !ret;
+	return ret;
+}
+
+static void __exit handshake_exit(void)
+{
+	unregister_pernet_subsys(&handshake_genl_net_ops);
+	genl_unregister_family(&handshake_genl_family);
+}
+
+module_init(handshake_init);
+module_exit(handshake_exit);
diff --git a/net/handshake/request.c b/net/handshake/request.c
new file mode 100644
index 000000000000..1d3b8e76dd2c
--- /dev/null
+++ b/net/handshake/request.c
@@ -0,0 +1,246 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Handshake request lifetime events
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2023, Oracle and/or its affiliates.
+ */
+
+#include <linux/types.h>
+#include <linux/socket.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/inet.h>
+#include <linux/fdtable.h>
+
+#include <net/sock.h>
+#include <net/genetlink.h>
+#include <net/handshake.h>
+
+#include <uapi/linux/handshake.h>
+#include <trace/events/handshake.h>
+#include "handshake.h"
+
+/*
+ * This limit is to prevent slow remotes from causing denial of service.
+ * A ulimit-style tunable might be used instead.
+ */
+#define HANDSHAKE_PENDING_MAX (10)
+
+static void __add_pending_locked(struct net *net, struct handshake_req *req)
+{
+	net->hs_pending++;
+	list_add_tail(&req->hr_list, &net->hs_requests);
+}
+
+void __remove_pending_locked(struct net *net, struct handshake_req *req)
+{
+	net->hs_pending--;
+	list_del_init(&req->hr_list);
+}
+
+/*
+ * Return values:
+ *   %true - the request was found on @net's pending list
+ *   %false - the request was not found on @net's pending list
+ *
+ * If @req was on a pending list, it has not yet been accepted.
+ */
+static bool remove_pending(struct net *net, struct handshake_req *req)
+{
+	bool ret;
+
+	ret = false;
+
+	spin_lock(&net->hs_lock);
+	if (!list_empty(&req->hr_list)) {
+		__remove_pending_locked(net, req);
+		ret = true;
+	}
+	spin_unlock(&net->hs_lock);
+
+	return ret;
+}
+
+static void handshake_req_destroy(struct handshake_req *req, struct sock *sk)
+{
+	req->hr_proto->hp_destroy(req);
+	sk->sk_handshake_req = NULL;
+	kfree(req);
+}
+
+static void handshake_sk_destruct(struct sock *sk)
+{
+	struct handshake_req *req = sk->sk_handshake_req;
+
+	if (req) {
+		trace_handshake_destruct(sock_net(sk), req, req->hr_sock);
+		handshake_req_destroy(req, sk);
+	}
+}
+
+/**
+ * handshake_req_alloc - consumer API to allocate a request
+ * @sock: open socket on which to perform a handshake
+ * @proto: security protocol
+ * @flags: memory allocation flags
+ *
+ * Returns an initialized handshake_req or NULL.
+ */
+struct handshake_req *handshake_req_alloc(struct socket *sock,
+					  const struct handshake_proto *proto,
+					  gfp_t flags)
+{
+	struct handshake_req *req;
+
+	/* Avoid accessing uninitialized global variables later on */
+	if (!handshake_genl_inited)
+		return NULL;
+
+	req = kzalloc(sizeof(*req) + proto->hp_privsize, flags);
+	if (!req)
+		return NULL;
+
+	sock_hold(sock->sk);
+
+	INIT_LIST_HEAD(&req->hr_list);
+	req->hr_sock = sock;
+	req->hr_proto = proto;
+	return req;
+}
+EXPORT_SYMBOL(handshake_req_alloc);
+
+/**
+ * handshake_req_private - consumer API to return per-handshake private data
+ * @req: handshake arguments
+ *
+ */
+void *handshake_req_private(struct handshake_req *req)
+{
+	return (void *)(req + 1);
+}
+EXPORT_SYMBOL(handshake_req_private);
+
+/**
+ * handshake_req_submit - consumer API to submit a handshake request
+ * @req: handshake arguments
+ * @flags: memory allocation flags
+ *
+ * Return values:
+ *   %0: Request queued
+ *   %-EBUSY: A handshake is already under way for this socket
+ *   %-ESRCH: No handshake agent is available
+ *   %-EAGAIN: Too many pending handshake requests
+ *   %-ENOMEM: Failed to allocate memory
+ *   %-EMSGSIZE: Failed to construct notification message
+ *
+ * A zero return value from handshake_req_submit() means that
+ * exactly one subsequent completion callback is guaranteed.
+ *
+ * A negative return value from handshake_req_submit() means that
+ * no completion callback will be done and that @req is
+ * destroyed.
+ */
+int handshake_req_submit(struct handshake_req *req, gfp_t flags)
+{
+	struct socket *sock = req->hr_sock;
+	struct sock *sk = sock->sk;
+	struct net *net = sock_net(sk);
+	int ret;
+
+	ret = -EAGAIN;
+	if (READ_ONCE(net->hs_pending) >= HANDSHAKE_PENDING_MAX)
+		goto out_err;
+
+	ret = -EBUSY;
+	spin_lock(&net->hs_lock);
+	if (sk->sk_handshake_req || !list_empty(&req->hr_list)) {
+		spin_unlock(&net->hs_lock);
+		goto out_err;
+	}
+	req->hr_saved_destruct = sk->sk_destruct;
+	sk->sk_destruct = handshake_sk_destruct;
+	sk->sk_handshake_req = req;
+	__add_pending_locked(net, req);
+	spin_unlock(&net->hs_lock);
+
+	ret = handshake_genl_notify(net, req->hr_proto->hp_handler_class,
+				    flags);
+	if (ret) {
+		trace_handshake_notify_err(net, req, sock, ret);
+		if (remove_pending(net, req))
+			goto out_err;
+	}
+
+	trace_handshake_submit(net, req, sock);
+	return 0;
+
+out_err:
+	trace_handshake_submit_err(net, req, sock, ret);
+	handshake_req_destroy(req, sk);
+	return ret;
+}
+EXPORT_SYMBOL(handshake_req_submit);
+
+void handshake_complete(struct handshake_req *req, int status,
+			struct nlattr **tb)
+{
+	struct socket *sock = req->hr_sock;
+	struct net *net = sock_net(sock->sk);
+
+	if (!test_and_set_bit(HANDSHAKE_F_COMPLETED, &req->hr_flags)) {
+		trace_handshake_complete(net, req, sock, status);
+		req->hr_proto->hp_done(req, status, tb);
+		__sock_put(sock->sk);
+	}
+}
+
+/**
+ * handshake_req_cancel - consumer API to cancel an in-progress handshake
+ * @sock: socket on which there is an ongoing handshake
+ *
+ * XXX: Perhaps killing the user space agent might also be necessary?
+ *
+ * Request cancellation races with request completion. To determine
+ * who won, callers examine the return value from this function.
+ *
+ * Return values:
+ *   %0 - Uncompleted handshake request was canceled or not found
+ *   %-EBUSY - Handshake request already completed
+ */
+int handshake_req_cancel(struct socket *sock)
+{
+	struct handshake_req *req;
+	struct sock *sk;
+	struct net *net;
+
+	if (!sock)
+		return 0;
+
+	sk = sock->sk;
+	req = sk->sk_handshake_req;
+	net = sock_net(sk);
+
+	if (!req) {
+		trace_handshake_cancel_none(net, req, sock);
+		return 0;
+	}
+
+	if (remove_pending(net, req)) {
+		/* Request hadn't been accepted */
+		trace_handshake_cancel(net, req, sock);
+		return 0;
+	}
+	if (test_and_set_bit(HANDSHAKE_F_COMPLETED, &req->hr_flags)) {
+		/* Request already completed */
+		trace_handshake_cancel_busy(net, req, sock);
+		return -EBUSY;
+	}
+
+	__sock_put(sk);
+	trace_handshake_cancel(net, req, sock);
+	return 0;
+}
+EXPORT_SYMBOL(handshake_req_cancel);
diff --git a/net/handshake/trace.c b/net/handshake/trace.c
new file mode 100644
index 000000000000..3a5b6f29a2b8
--- /dev/null
+++ b/net/handshake/trace.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trace points for transport security layer handshakes.
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2023, Oracle and/or its affiliates.
+ */
+
+#include <linux/types.h>
+#include <net/sock.h>
+
+#include "handshake.h"
+
+#define CREATE_TRACE_POINTS
+
+#include <trace/events/handshake.h>
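
Editorial sketch, not part of the patch: in request.c, both
handshake_complete() and handshake_req_cancel() call test_and_set_bit()
on HANDSHAKE_F_COMPLETED, so whichever runs first wins and the other
backs off. The user-space model below illustrates that pattern with C11
atomics. All demo_* names are hypothetical stand-ins, and the pending
list, socket reference counting, and tracing are left out.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct handshake_req. */
struct demo_req {
	atomic_bool completed;	/* models HANDSHAKE_F_COMPLETED */
	int done_calls;		/* how many times the consumer callback ran */
	int last_status;	/* status passed to the last callback */
};

static void demo_req_init(struct demo_req *req)
{
	atomic_init(&req->completed, false);
	req->done_calls = 0;
	req->last_status = 0;
}

/* Models handshake_complete(): run the callback only if we set the flag. */
static void demo_complete(struct demo_req *req, int status)
{
	if (!atomic_exchange(&req->completed, true)) {
		req->done_calls++;
		req->last_status = status;
	}
}

/*
 * Models the tail of handshake_req_cancel() for a request that has
 * already been accepted: -EBUSY if completion won the race, else 0.
 */
static int demo_cancel(struct demo_req *req)
{
	if (atomic_exchange(&req->completed, true))
		return -EBUSY;
	return 0;
}
```

In this model a request that is cancelled first never sees its callback
run, and one that completes first reports -EBUSY to a later cancel. That
matches the return values documented for handshake_req_cancel().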




* [PATCH v5 2/2] net/tls: Add kernel APIs for requesting a TLSv1.3 handshake
  2023-02-24 19:19 [PATCH v5 0/2] Another crack at a handshake upcall mechanism Chuck Lever
  2023-02-24 19:19 ` [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests Chuck Lever
@ 2023-02-24 19:19 ` Chuck Lever
  2023-02-27  9:36   ` Hannes Reinecke
  1 sibling, 1 reply; 15+ messages in thread
From: Chuck Lever @ 2023-02-24 19:19 UTC (permalink / raw)
  To: kuba, pabeni, edumazet; +Cc: netdev, kernel-tls-handshake

From: Chuck Lever <chuck.lever@oracle.com>

To enable kernel consumers of TLS to request a TLS handshake, add
support to net/tls/ to send a handshake upcall. This patch also
acts as a template for adding handshake upcall support to other
transport layer security mechanisms.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 Documentation/netlink/specs/handshake.yaml |    4 
 Documentation/networking/index.rst         |    1 
 Documentation/networking/tls-handshake.rst |  146 ++++++++++
 include/net/tls.h                          |   27 ++
 include/uapi/linux/handshake.h             |    2 
 net/handshake/netlink.c                    |    1 
 net/tls/Makefile                           |    2 
 net/tls/tls_handshake.c                    |  423 ++++++++++++++++++++++++++++
 8 files changed, 604 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/networking/tls-handshake.rst
 create mode 100644 net/tls/tls_handshake.c

diff --git a/Documentation/netlink/specs/handshake.yaml b/Documentation/netlink/specs/handshake.yaml
index 683a8f2df0a7..c2f6bfff2326 100644
--- a/Documentation/netlink/specs/handshake.yaml
+++ b/Documentation/netlink/specs/handshake.yaml
@@ -21,7 +21,7 @@ definitions:
     name: handler-class
     enum-name:
     value-start: 0
-    entries: [ none ]
+    entries: [ none, tlshd ]
   -
     type: enum
     name: msg-type
@@ -132,3 +132,5 @@ mcast-groups:
   list:
     -
       name: none
+    -
+      name: tlshd
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 4ddcae33c336..189517f4ea96 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -36,6 +36,7 @@ Contents:
    scaling
    tls
    tls-offload
+   tls-handshake
    nfc
    6lowpan
    6pack
diff --git a/Documentation/networking/tls-handshake.rst b/Documentation/networking/tls-handshake.rst
new file mode 100644
index 000000000000..f09fc6c09580
--- /dev/null
+++ b/Documentation/networking/tls-handshake.rst
@@ -0,0 +1,146 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+In-Kernel TLS Handshake
+=======================
+
+Overview
+========
+
+Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs
+over TCP. TLS provides end-to-end data integrity and confidentiality,
+in addition to peer authentication.
+
+The kernel's kTLS implementation handles the TLS record subprotocol, but
+does not handle the TLS handshake subprotocol which is used to establish
+a TLS session. Kernel consumers can use the API described here to
+request TLS session establishment.
+
+There are several possible ways to provide a handshake service in the
+kernel. The API described here is designed to hide the details of those
+implementations so that in-kernel TLS consumers do not need to be
+aware of how the handshake gets done.
+
+
+User handshake agent
+====================
+
+As of this writing, there is no TLS handshake implementation in the
+Linux kernel. Thus, with the current implementation, a user agent is
+started in each network namespace where a kernel consumer might require
+a TLS handshake. This agent listens for events sent from the kernel
+that request a handshake on an open and connected TCP socket.
+
+The open socket is passed to user space via a netlink operation, which
+creates a socket descriptor in the agent's file descriptor table. If the
+handshake completes successfully, the user agent promotes the socket to
+use the TLS ULP and sets the session information using the SOL_TLS socket
+options. The user agent returns the socket to the kernel via a second
+netlink operation.
+
+
+Kernel Handshake API
+====================
+
+A kernel TLS consumer initiates a client-side TLS handshake on an open
+socket by invoking one of the tls_client_hello() functions. For example:
+
+.. code-block:: c
+
+  ret = tls_client_hello_x509(sock, done_func, cookie, priorities,
+                              cert, privkey);
+
+The function returns zero when the handshake request is under way. A
+zero return guarantees the callback function @done_func will be invoked
+for this socket.
+
+The function returns a negative errno if the handshake could not be
+started. A negative errno guarantees the callback function @done_func
+will not be invoked on this socket.
+
+The @sock argument is an open and connected socket. The caller must hold
+a reference on the socket to prevent it from being destroyed while the
+handshake is in progress.
+
+@done_func is a callback function that is invoked when the handshake
+has completed; @cookie is an opaque token passed back to that callback.
+The status of the handshake is returned via the callback's @status
+parameter. If the handshake failed, close and destroy the socket promptly.
+
+@priorities is a GnuTLS priorities string that controls the handshake.
+The special value TLS_DEFAULT_PRIORITIES causes the handshake to
+operate using default TLS priorities. However, the caller can use the
+string to (for example) adjust the handshake to use a restricted set
+of ciphers (say, if the kernel consumer wishes to mandate only a
+limited set of ciphers).
+
+@cert is the serial number of a key that contains a DER-format x.509
+certificate that the handshake agent presents to the remote as the local
+peer's identity.
+
+@privkey is the serial number of a key that contains a DER-format
+private key associated with the x.509 certificate.
+
+
+To initiate a client-side TLS handshake with a pre-shared key, use:
+
+.. code-block:: c
+
+  ret = tls_client_hello_psk(sock, done_func, cookie, priorities,
+                             peerid);
+
+@peerid is the serial number of a key that contains the pre-shared
+key to be used for the handshake.
+
+The other parameters are as above.
+
+
+To initiate an anonymous client-side TLS handshake use:
+
+.. code-block:: c
+
+  ret = tls_client_hello_anon(sock, done_func, cookie, priorities);
+
+The parameters are as above.
+
+The handshake agent presents no peer identity information to the
+remote during the handshake. Only server authentication is performed
+during the handshake. Thus the established session uses encryption
+only.
+
+
+Consumers that are in-kernel servers use:
+
+.. code-block:: c
+
+  ret = tls_server_hello_x509(sock, done_func, cookie, priorities);
+
+The parameters are as above. tls_server_hello_psk() takes the same ones.
+
+
+Lastly, if the consumer needs to cancel the handshake request, say,
+due to a ^C or other exigent event, the handshake core provides
+this API:
+
+.. code-block:: c
+
+  tls_handshake_cancel(sock);
+
+
+Other considerations
+--------------------
+
+While a handshake is under way, the kernel consumer must alter the
+socket's sk_data_ready callback function to ignore all incoming data.
+Once the handshake completion callback function has been invoked,
+normal receive operation can be resumed.
+
+Once a TLS session is established, the consumer must provide a buffer
+for and then examine the control message (CMSG) that is part of every
+subsequent sock_recvmsg(). Each control message indicates whether the
+received message data is TLS record data or session metadata.
+
+See tls.rst for details on how a kTLS consumer recognizes incoming
+(decrypted) application data, alerts, and handshake packets once the
+socket has been promoted to use the TLS ULP.
+
diff --git a/include/net/tls.h b/include/net/tls.h
index 154949c7b0c8..505b23992ef0 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -512,4 +512,31 @@ static inline bool tls_is_sk_rx_device_offloaded(struct sock *sk)
 	return tls_get_ctx(sk)->rx_conf == TLS_HW;
 }
 #endif
+
+#define TLS_DEFAULT_PRIORITIES		(NULL)
+
+enum {
+	TLS_NO_PEERID = 0,
+	TLS_NO_CERT = 0,
+	TLS_NO_PRIVKEY = 0,
+};
+
+typedef void	(*tls_done_func_t)(void *data, int status,
+				   key_serial_t peerid);
+
+int tls_client_hello_anon(struct socket *sock, tls_done_func_t done,
+			  void *data, const char *priorities);
+int tls_client_hello_x509(struct socket *sock, tls_done_func_t done,
+			  void *data, const char *priorities,
+			  key_serial_t cert, key_serial_t privkey);
+int tls_client_hello_psk(struct socket *sock, tls_done_func_t done,
+			 void *data, const char *priorities,
+			 key_serial_t peerid);
+int tls_server_hello_x509(struct socket *sock, tls_done_func_t done,
+			  void *data, const char *priorities);
+int tls_server_hello_psk(struct socket *sock, tls_done_func_t done,
+			 void *data, const char *priorities);
+
+int tls_handshake_cancel(struct socket *sock);
+
 #endif /* _TLS_OFFLOAD_H */
diff --git a/include/uapi/linux/handshake.h b/include/uapi/linux/handshake.h
index 09fd7c37cba4..dad8227939a1 100644
--- a/include/uapi/linux/handshake.h
+++ b/include/uapi/linux/handshake.h
@@ -11,6 +11,7 @@
 
 enum {
 	HANDSHAKE_HANDLER_CLASS_NONE,
+	HANDSHAKE_HANDLER_CLASS_TLSHD,
 };
 
 enum {
@@ -59,5 +60,6 @@ enum {
 };
 
 #define HANDSHAKE_MCGRP_NONE	"none"
+#define HANDSHAKE_MCGRP_TLSHD	"tlshd"
 
 #endif /* _UAPI_LINUX_HANDSHAKE_H */
diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
index 581e382236cf..88775f784305 100644
--- a/net/handshake/netlink.c
+++ b/net/handshake/netlink.c
@@ -255,6 +255,7 @@ static const struct genl_split_ops handshake_nl_ops[] = {
 
 static const struct genl_multicast_group handshake_nl_mcgrps[] = {
 	[HANDSHAKE_HANDLER_CLASS_NONE] = { .name = HANDSHAKE_MCGRP_NONE, },
+	[HANDSHAKE_HANDLER_CLASS_TLSHD] = { .name = HANDSHAKE_MCGRP_TLSHD, },
 };
 
 static struct genl_family __ro_after_init handshake_genl_family = {
diff --git a/net/tls/Makefile b/net/tls/Makefile
index e41c800489ac..7e56b57f14f6 100644
--- a/net/tls/Makefile
+++ b/net/tls/Makefile
@@ -7,7 +7,7 @@ CFLAGS_trace.o := -I$(src)
 
 obj-$(CONFIG_TLS) += tls.o
 
-tls-y := tls_main.o tls_sw.o tls_proc.o trace.o tls_strp.o
+tls-y := tls_handshake.o tls_main.o tls_sw.o tls_proc.o trace.o tls_strp.o
 
 tls-$(CONFIG_TLS_TOE) += tls_toe.o
 tls-$(CONFIG_TLS_DEVICE) += tls_device.o tls_device_fallback.o
diff --git a/net/tls/tls_handshake.c b/net/tls/tls_handshake.c
new file mode 100644
index 000000000000..74d32a9ca857
--- /dev/null
+++ b/net/tls/tls_handshake.c
@@ -0,0 +1,423 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Establish a TLS session for a kernel socket consumer
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2021-2023, Oracle and/or its affiliates.
+ */
+
+#include <linux/types.h>
+#include <linux/socket.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+#include <net/sock.h>
+#include <net/tls.h>
+#include <net/genetlink.h>
+#include <net/handshake.h>
+
+#include <uapi/linux/handshake.h>
+
+/*
+ * TLS priorities string passed to the GnuTLS library.
+ *
+ * Specifically for kernel TLS consumers: enable only TLS v1.3 and the
+ * ciphers that are supported by kTLS.
+ *
+ * Currently this list is generated by hand from the supported ciphers
+ * found in include/uapi/linux/tls.h.
+ */
+#define KTLS_DEFAULT_PRIORITIES \
+	"SECURE256:+SECURE128:-COMP-ALL" \
+	":-VERS-ALL:+VERS-TLS1.3:%NO_TICKETS" \
+	":-CIPHER-ALL:+CHACHA20-POLY1305:+AES-256-GCM:+AES-128-GCM:+AES-128-CCM"
+
+struct tls_handshake_req {
+	void			(*th_consumer_done)(void *data, int status,
+						    key_serial_t peerid);
+	void			*th_consumer_data;
+
+	const char		*th_priorities;
+	int			th_type;
+	int			th_auth_type;
+	key_serial_t		th_peerid;
+	key_serial_t		th_certificate;
+	key_serial_t		th_privkey;
+
+};
+
+static const char *tls_handshake_dup_priorities(const char *priorities,
+						gfp_t flags)
+{
+	const char *tp;
+
+	if (priorities != TLS_DEFAULT_PRIORITIES && strlen(priorities))
+		tp = priorities;
+	else
+		tp = KTLS_DEFAULT_PRIORITIES;
+	return kstrdup(tp, flags);
+}
+
+static struct tls_handshake_req *
+tls_handshake_req_init(struct handshake_req *req, tls_done_func_t done,
+		       void *data, const char *priorities)
+{
+	struct tls_handshake_req *treq = handshake_req_private(req);
+
+	treq->th_consumer_done = done;
+	treq->th_consumer_data = data;
+	treq->th_priorities = priorities;
+	treq->th_peerid = TLS_NO_PEERID;
+	treq->th_certificate = TLS_NO_CERT;
+	treq->th_privkey = TLS_NO_PRIVKEY;
+	return treq;
+}
+
+/**
+ * tls_handshake_destroy - callback to release a handshake request
+ * @req: handshake parameters to release
+ *
+ */
+static void tls_handshake_destroy(struct handshake_req *req)
+{
+	struct tls_handshake_req *treq = handshake_req_private(req);
+
+	kfree(treq->th_priorities);
+}
+
+/**
+ * tls_handshake_done - callback to handle a CMD_DONE request
+ * @req: handshake request that has completed
+ * @status: session status code
+ * @tb: other results of session establishment
+ *
+ * Eventually this will return information about the established
+ * session: whether it is authenticated, and if so, who the remote
+ * is.
+ */
+static void tls_handshake_done(struct handshake_req *req, int status,
+			       struct nlattr **tb)
+{
+	struct tls_handshake_req *treq = handshake_req_private(req);
+	key_serial_t peerid = TLS_NO_PEERID;
+
+	if (tb[HANDSHAKE_A_DONE_REMOTE_PEERID])
+		peerid = nla_get_u32(tb[HANDSHAKE_A_DONE_REMOTE_PEERID]);
+
+	treq->th_consumer_done(treq->th_consumer_data, status, peerid);
+}
+
+static int tls_handshake_put_accept_resp(struct sk_buff *msg,
+					 struct tls_handshake_req *treq)
+{
+	int ret;
+
+	ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MESSAGE_TYPE, treq->th_type);
+	if (ret < 0)
+		goto out;
+	ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_AUTH, treq->th_auth_type);
+	if (ret < 0)
+		goto out;
+	switch (treq->th_auth_type) {
+	case HANDSHAKE_AUTH_X509:
+		if (treq->th_certificate != TLS_NO_CERT) {
+			ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MY_PEERID,
+					  treq->th_certificate);
+			if (ret < 0)
+				goto out;
+		}
+		if (treq->th_privkey != TLS_NO_PRIVKEY) {
+			ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MY_PRIVKEY,
+					  treq->th_privkey);
+			if (ret < 0)
+				goto out;
+		}
+		break;
+	case HANDSHAKE_AUTH_PSK:
+		if (treq->th_peerid != TLS_NO_PEERID) {
+			ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MY_PEERID,
+					  treq->th_peerid);
+			if (ret < 0)
+				goto out;
+		}
+		break;
+	}
+
+	ret = nla_put_string(msg, HANDSHAKE_A_ACCEPT_GNUTLS_PRIORITIES,
+			     treq->th_priorities);
+	if (ret < 0)
+		goto out;
+
+	ret = 0;
+
+out:
+	return ret;
+}
+
+/**
+ * tls_handshake_accept - callback to construct a CMD_ACCEPT response
+ * @req: handshake parameters to return
+ * @gi: generic netlink message context
+ * @fd: file descriptor to be returned
+ *
+ * Returns zero on success, or a negative errno on failure.
+ */
+static int tls_handshake_accept(struct handshake_req *req,
+				struct genl_info *gi, int fd)
+{
+	struct tls_handshake_req *treq = handshake_req_private(req);
+	struct nlmsghdr *hdr;
+	struct sk_buff *msg;
+	int ret;
+
+	ret = -ENOMEM;
+	msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		goto out;
+	hdr = handshake_genl_put(msg, gi);
+	if (!hdr)
+		goto out_cancel;
+
+	ret = -EMSGSIZE;
+	ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_SOCKFD, fd);
+	if (ret < 0)
+		goto out_cancel;
+
+	ret = tls_handshake_put_accept_resp(msg, treq);
+	if (ret < 0)
+		goto out_cancel;
+
+	genlmsg_end(msg, hdr);
+	return genlmsg_reply(msg, gi);
+
+out_cancel:
+	genlmsg_cancel(msg, hdr);
+out:
+	return ret;
+}
+
+static const struct handshake_proto tls_handshake_proto = {
+	.hp_handler_class	= HANDSHAKE_HANDLER_CLASS_TLSHD,
+	.hp_privsize		= sizeof(struct tls_handshake_req),
+
+	.hp_accept		= tls_handshake_accept,
+	.hp_done		= tls_handshake_done,
+	.hp_destroy		= tls_handshake_destroy,
+};
+
+/**
+ * tls_client_hello_anon - request an anonymous TLS handshake on a socket
+ * @sock: connected socket on which to perform the handshake
+ * @done: function to call when the handshake has completed
+ * @data: token to pass back to @done
+ * @priorities: GnuTLS TLS priorities string, or NULL
+ *
+ * Return values:
+ *   %0: Handshake request enqueued; ->done will be called when complete
+ *   %-ESRCH: No user agent is available
+ *   %-ENOMEM: Memory allocation failed
+ */
+int tls_client_hello_anon(struct socket *sock, tls_done_func_t done,
+			  void *data, const char *priorities)
+{
+	struct tls_handshake_req *treq;
+	struct handshake_req *req;
+	gfp_t flags = GFP_NOWAIT;
+	const char *tp;
+
+	tp = tls_handshake_dup_priorities(priorities, flags);
+	if (!tp)
+		return -ENOMEM;
+
+	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
+	if (!req) {
+		kfree(tp);
+		return -ENOMEM;
+	}
+
+	treq = tls_handshake_req_init(req, done, data, tp);
+	treq->th_type = HANDSHAKE_MSG_TYPE_CLIENTHELLO;
+	treq->th_auth_type = HANDSHAKE_AUTH_UNAUTH;
+
+	return handshake_req_submit(req, flags);
+}
+EXPORT_SYMBOL(tls_client_hello_anon);
+
+/**
+ * tls_client_hello_x509 - request an x.509-based TLS handshake on a socket
+ * @sock: connected socket on which to perform the handshake
+ * @done: function to call when the handshake has completed
+ * @data: token to pass back to @done
+ * @priorities: GnuTLS TLS priorities string
+ * @cert: serial number of key containing client's x.509 certificate
+ * @privkey: serial number of key containing client's private key
+ *
+ * Return values:
+ *   %0: Handshake request enqueued; ->done will be called when complete
+ *   %-ESRCH: No user agent is available
+ *   %-ENOMEM: Memory allocation failed
+ */
+int tls_client_hello_x509(struct socket *sock, tls_done_func_t done,
+			  void *data, const char *priorities,
+			  key_serial_t cert, key_serial_t privkey)
+{
+	struct tls_handshake_req *treq;
+	struct handshake_req *req;
+	gfp_t flags = GFP_NOWAIT;
+	const char *tp;
+
+	tp = tls_handshake_dup_priorities(priorities, flags);
+	if (!tp)
+		return -ENOMEM;
+
+	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
+	if (!req) {
+		kfree(tp);
+		return -ENOMEM;
+	}
+
+	treq = tls_handshake_req_init(req, done, data, tp);
+	treq->th_type = HANDSHAKE_MSG_TYPE_CLIENTHELLO;
+	treq->th_auth_type = HANDSHAKE_AUTH_X509;
+	treq->th_certificate = cert;
+	treq->th_privkey = privkey;
+
+	return handshake_req_submit(req, flags);
+}
+EXPORT_SYMBOL(tls_client_hello_x509);
+
+/**
+ * tls_client_hello_psk - request a PSK-based TLS handshake on a socket
+ * @sock: connected socket on which to perform the handshake
+ * @done: function to call when the handshake has completed
+ * @data: token to pass back to @done
+ * @priorities: GnuTLS TLS priorities string
+ * @peerid: serial number of key containing TLS identity
+ *
+ * Return values:
+ *   %0: Handshake request enqueued; ->done will be called when complete
+ *   %-ESRCH: No user agent is available
+ *   %-ENOMEM: Memory allocation failed
+ */
+int tls_client_hello_psk(struct socket *sock, tls_done_func_t done,
+			 void *data, const char *priorities,
+			 key_serial_t peerid)
+{
+	struct tls_handshake_req *treq;
+	struct handshake_req *req;
+	gfp_t flags = GFP_NOWAIT;
+	const char *tp;
+
+	tp = tls_handshake_dup_priorities(priorities, flags);
+	if (!tp)
+		return -ENOMEM;
+
+	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
+	if (!req) {
+		kfree(tp);
+		return -ENOMEM;
+	}
+
+	treq = tls_handshake_req_init(req, done, data, tp);
+	treq->th_type = HANDSHAKE_MSG_TYPE_CLIENTHELLO;
+	treq->th_auth_type = HANDSHAKE_AUTH_PSK;
+	treq->th_peerid = peerid;
+
+	return handshake_req_submit(req, flags);
+}
+EXPORT_SYMBOL(tls_client_hello_psk);
+
+/**
+ * tls_server_hello_x509 - request a server TLS handshake on a socket
+ * @sock: connected socket on which to perform the handshake
+ * @done: function to call when the handshake has completed
+ * @data: token to pass back to @done
+ * @priorities: GnuTLS TLS priorities string
+ *
+ * Return values:
+ *   %0: Handshake request enqueued; ->done will be called when complete
+ *   %-ESRCH: No user agent is available
+ *   %-ENOMEM: Memory allocation failed
+ */
+int tls_server_hello_x509(struct socket *sock, tls_done_func_t done,
+			  void *data, const char *priorities)
+{
+	struct tls_handshake_req *treq;
+	struct handshake_req *req;
+	gfp_t flags = GFP_KERNEL;
+	const char *tp;
+
+	tp = tls_handshake_dup_priorities(priorities, flags);
+	if (!tp)
+		return -ENOMEM;
+
+	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
+	if (!req) {
+		kfree(tp);
+		return -ENOMEM;
+	}
+
+	treq = tls_handshake_req_init(req, done, data, tp);
+	treq->th_type = HANDSHAKE_MSG_TYPE_SERVERHELLO;
+	treq->th_auth_type = HANDSHAKE_AUTH_X509;
+
+	return handshake_req_submit(req, flags);
+}
+EXPORT_SYMBOL(tls_server_hello_x509);
+
+/**
+ * tls_server_hello_psk - request a server TLS handshake on a socket
+ * @sock: connected socket on which to perform the handshake
+ * @done: function to call when the handshake has completed
+ * @data: token to pass back to @done
+ * @priorities: GnuTLS TLS priorities string
+ *
+ * Return values:
+ *   %0: Handshake request enqueued; ->done will be called when complete
+ *   %-ESRCH: No user agent is available
+ *   %-ENOMEM: Memory allocation failed
+ */
+int tls_server_hello_psk(struct socket *sock, tls_done_func_t done,
+			 void *data, const char *priorities)
+{
+	struct tls_handshake_req *treq;
+	struct handshake_req *req;
+	gfp_t flags = GFP_KERNEL;
+	const char *tp;
+
+	tp = tls_handshake_dup_priorities(priorities, flags);
+	if (!tp)
+		return -ENOMEM;
+
+	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
+	if (!req) {
+		kfree(tp);
+		return -ENOMEM;
+	}
+
+	treq = tls_handshake_req_init(req, done, data, tp);
+	treq->th_type = HANDSHAKE_MSG_TYPE_SERVERHELLO;
+	treq->th_auth_type = HANDSHAKE_AUTH_PSK;
+
+	return handshake_req_submit(req, flags);
+}
+EXPORT_SYMBOL(tls_server_hello_psk);
+
+/**
+ * tls_handshake_cancel - cancel a pending handshake
+ * @sock: socket on which there is an ongoing handshake
+ *
+ * Request cancellation races with request completion. To determine
+ * who won, callers examine the return value from this function.
+ *
+ * Return values:
+ *   %0 - Uncompleted handshake request was canceled
+ *   %-EBUSY - Handshake request already completed
+ */
+int tls_handshake_cancel(struct socket *sock)
+{
+	return handshake_req_cancel(sock);
+}
+EXPORT_SYMBOL(tls_handshake_cancel);
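
Editorial sketch, not part of the patch: tls_handshake_dup_priorities()
falls back to KTLS_DEFAULT_PRIORITIES when the caller passes
TLS_DEFAULT_PRIORITIES (NULL) or an empty string, and otherwise
duplicates the caller's string. The user-space model below shows the
same selection logic, with malloc() standing in for kstrdup(). The
demo_* and DEMO_* names are hypothetical; the default string is copied
from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Same value as KTLS_DEFAULT_PRIORITIES in net/tls/tls_handshake.c. */
#define DEMO_DEFAULT_PRIORITIES \
	"SECURE256:+SECURE128:-COMP-ALL" \
	":-VERS-ALL:+VERS-TLS1.3:%NO_TICKETS" \
	":-CIPHER-ALL:+CHACHA20-POLY1305:+AES-256-GCM:+AES-128-GCM:+AES-128-CCM"

/*
 * Return a heap copy of the priorities string to send to the agent.
 * NULL or "" selects the default. The caller frees the result.
 * Returns NULL if the allocation fails.
 */
static char *demo_dup_priorities(const char *priorities)
{
	const char *tp = DEMO_DEFAULT_PRIORITIES;
	size_t len;
	char *copy;

	if (priorities != NULL && priorities[0] != '\0')
		tp = priorities;

	len = strlen(tp) + 1;		/* include the terminating NUL */
	copy = malloc(len);
	if (copy)
		memcpy(copy, tp, len);
	return copy;
}
```

Because the request always owns a private copy, the destroy callback can
free it unconditionally, as tls_handshake_destroy() does with kfree().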




* Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests
  2023-02-24 19:19 ` [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests Chuck Lever
@ 2023-02-27  9:24   ` Hannes Reinecke
  2023-02-27 14:59     ` Chuck Lever III
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-02-27  9:24 UTC (permalink / raw)
  To: Chuck Lever, kuba, pabeni, edumazet; +Cc: netdev, kernel-tls-handshake

On 2/24/23 20:19, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> When a kernel consumer needs a transport layer security session, it
> first needs a handshake to negotiate and establish a session. This
> negotiation can be done in user space via one of the several
> existing library implementations, or it can be done in the kernel.
> 
> No in-kernel handshake implementations yet exist. In their absence,
> we add a netlink service that can:
> 
> a. Notify a user space daemon that a handshake is needed.
> 
> b. Once notified, the daemon calls the kernel back via this
>     netlink service to get the handshake parameters, including an
>     open socket on which to establish the session.
> 
> c. Once the handshake is complete, the daemon reports the
>     session status and other information via a second netlink
>     operation. This operation marks that it is safe for the
>     kernel to use the open socket and the security session
>     established there.
> 
> The notification service uses a multicast group. Each handshake
> mechanism (eg, tlshd) adopts its own group number so that the
> handshake services are completely independent of one another. The
> kernel can then tell via netlink_has_listeners() whether a handshake
> service is active and prepared to handle a handshake request.
> 
> A new netlink operation, ACCEPT, acts like accept(2) in that it
> instantiates a file descriptor in the user space daemon's fd table.
> If this operation is successful, the reply carries the fd number,
> which can be treated as an open and ready file descriptor.
> 
> While user space is performing the handshake, the kernel keeps its
> muddy paws off the open socket. A second new netlink operation,
> DONE, indicates that the user space daemon is finished with the
> socket and it is safe for the kernel to use again. The operation
> also indicates whether a session was established successfully.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   Documentation/netlink/specs/handshake.yaml |  134 +++++++++++
>   include/net/handshake.h                    |   45 ++++
>   include/net/net_namespace.h                |    5
>   include/net/sock.h                         |    1
>   include/trace/events/handshake.h           |  159 +++++++++++++
>   include/uapi/linux/handshake.h             |   63 +++++
>   net/Makefile                               |    1
>   net/handshake/Makefile                     |   11 +
>   net/handshake/handshake.h                  |   41 +++
>   net/handshake/netlink.c                    |  340 ++++++++++++++++++++++++++++
>   net/handshake/request.c                    |  246 ++++++++++++++++++++
>   net/handshake/trace.c                      |   17 +
>   12 files changed, 1063 insertions(+)
>   create mode 100644 Documentation/netlink/specs/handshake.yaml
>   create mode 100644 include/net/handshake.h
>   create mode 100644 include/trace/events/handshake.h
>   create mode 100644 include/uapi/linux/handshake.h
>   create mode 100644 net/handshake/Makefile
>   create mode 100644 net/handshake/handshake.h
>   create mode 100644 net/handshake/netlink.c
>   create mode 100644 net/handshake/request.c
>   create mode 100644 net/handshake/trace.c
> 
> diff --git a/Documentation/netlink/specs/handshake.yaml b/Documentation/netlink/specs/handshake.yaml
> new file mode 100644
> index 000000000000..683a8f2df0a7
> --- /dev/null
> +++ b/Documentation/netlink/specs/handshake.yaml
> @@ -0,0 +1,134 @@
> +# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
> +#
> +# GENL HANDSHAKE service.
> +#
> +# Author: Chuck Lever <chuck.lever@oracle.com>
> +#
> +# Copyright (c) 2023, Oracle and/or its affiliates.
> +#
> +
> +name: handshake
> +
> +protocol: genetlink-c
> +
> +doc: Netlink protocol to request a transport layer security handshake.
> +
> +uapi-header: linux/net/handshake.h
> +
> +definitions:
> +  -
> +    type: enum
> +    name: handler-class
> +    enum-name:
> +    value-start: 0
> +    entries: [ none ]
> +  -
> +    type: enum
> +    name: msg-type
> +    enum-name:
> +    value-start: 0
> +    entries: [ unspec, clienthello, serverhello ]
> +  -
> +    type: enum
> +    name: auth
> +    enum-name:
> +    value-start: 0
> +    entries: [ unspec, unauth, x509, psk ]
> +
> +attribute-sets:
> +  -
> +    name: accept
> +    attributes:
> +      -
> +        name: status
> +        doc: Status of this accept operation
> +        type: u32
> +        value: 1
> +      -
> +        name: sockfd
> +        doc: File descriptor of socket to use
> +        type: u32
> +      -
> +        name: handler-class
> +        doc: Which type of handler is responding
> +        type: u32
> +        enum: handler-class
> +      -
> +        name: message-type
> +        doc: Handshake message type
> +        type: u32
> +        enum: msg-type
> +      -
> +        name: auth
> +        doc: Authentication mode
> +        type: u32
> +        enum: auth
> +      -
> +        name: gnutls-priorities
> +        doc: GnuTLS priority string
> +        type: string
> +      -
> +        name: my-peerid
> +        doc: Serial no of key containing local identity
> +        type: u32
> +      -
> +        name: my-privkey
> +        doc: Serial no of key containing optional private key
> +        type: u32
> +  -
> +    name: done
> +    attributes:
> +      -
> +        name: status
> +        doc: Session status
> +        type: u32
> +        value: 1
> +      -
> +        name: sockfd
> +        doc: File descriptor of socket that has completed
> +        type: u32
> +      -
> +        name: remote-peerid
> +        doc: Serial no of keys containing identities of remote peer
> +        type: u32
> +
> +operations:
> +  list:
> +    -
> +      name: ready
> +      doc: Notify handlers that a new handshake request is waiting
> +      value: 1
> +      notify: accept
> +    -
> +      name: accept
> +      doc: Handler retrieves next queued handshake request
> +      attribute-set: accept
> +      flags: [ admin-perm ]
> +      do:
> +        request:
> +          attributes:
> +            - handler-class
> +        reply:
> +          attributes:
> +            - status
> +            - sockfd
> +            - message-type
> +            - auth
> +            - gnutls-priorities
> +            - my-peerid
> +            - my-privkey
> +    -
> +      name: done
> +      doc: Handler reports handshake completion
> +      attribute-set: done
> +      do:
> +        request:
> +          attributes:
> +            - status
> +            - sockfd
> +            - remote-peerid
> +
> +mcast-groups:
> +  list:
> +    -
> +      name: none
> diff --git a/include/net/handshake.h b/include/net/handshake.h
> new file mode 100644
> index 000000000000..08f859237936
> --- /dev/null
> +++ b/include/net/handshake.h
> @@ -0,0 +1,45 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Generic HANDSHAKE service.
> + *
> + * Author: Chuck Lever <chuck.lever@oracle.com>
> + *
> + * Copyright (c) 2023, Oracle and/or its affiliates.
> + */
> +
> +/*
> + * Data structures and functions that are visible only within the
> + * kernel are declared here.
> + */
> +
> +#ifndef _NET_HANDSHAKE_H
> +#define _NET_HANDSHAKE_H
> +
> +struct handshake_req;
> +
> +/*
> + * Invariants for all handshake requests for one transport layer
> + * security protocol
> + */
> +struct handshake_proto {
> +	int			hp_handler_class;
> +	size_t			hp_privsize;
> +
> +	int			(*hp_accept)(struct handshake_req *req,
> +					     struct genl_info *gi, int fd);
> +	void			(*hp_done)(struct handshake_req *req,
> +					   int status, struct nlattr **tb);
> +	void			(*hp_destroy)(struct handshake_req *req);
> +};
> +
> +extern struct handshake_req *
> +handshake_req_alloc(struct socket *sock, const struct handshake_proto *proto,
> +		    gfp_t flags);
> +extern void *handshake_req_private(struct handshake_req *req);
> +extern int handshake_req_submit(struct handshake_req *req, gfp_t flags);
> +extern int handshake_req_cancel(struct socket *sock);
> +
> +extern struct nlmsghdr *handshake_genl_put(struct sk_buff *msg,
> +					   struct genl_info *gi);
> +
> +#endif /* _NET_HANDSHAKE_H */
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index 78beaa765c73..a0ce9de4dab1 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -188,6 +188,11 @@ struct net {
>   #if IS_ENABLED(CONFIG_SMC)
>   	struct netns_smc	smc;
>   #endif
> +
> +	/* transport layer security handshake requests */
> +	spinlock_t		hs_lock;
> +	struct list_head	hs_requests;
> +	int			hs_pending;
>   } __randomize_layout;
>   
>   #include <linux/seq_file_net.h>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 573f2bf7e0de..2a7345ce2540 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -519,6 +519,7 @@ struct sock {
>   
>   	struct socket		*sk_socket;
>   	void			*sk_user_data;
> +	void			*sk_handshake_req;
>   #ifdef CONFIG_SECURITY
>   	void			*sk_security;
>   #endif
> diff --git a/include/trace/events/handshake.h b/include/trace/events/handshake.h
> new file mode 100644
> index 000000000000..feffcd1d6256
> --- /dev/null
> +++ b/include/trace/events/handshake.h
> @@ -0,0 +1,159 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM handshake
> +
> +#if !defined(_TRACE_HANDSHAKE_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_HANDSHAKE_H
> +
> +#include <linux/net.h>
> +#include <linux/tracepoint.h>
> +
> +DECLARE_EVENT_CLASS(handshake_event_class,
> +	TP_PROTO(
> +		const struct net *net,
> +		const struct handshake_req *req,
> +		const struct socket *sock
> +	),
> +	TP_ARGS(net, req, sock),
> +	TP_STRUCT__entry(
> +		__field(const void *, req)
> +		__field(const void *, sock)
> +		__field(unsigned int, netns_ino)
> +	),
> +	TP_fast_assign(
> +		__entry->req = req;
> +		__entry->sock = sock;
> +		__entry->netns_ino = net->ns.inum;
> +	),
> +	TP_printk("req=%p sock=%p",
> +		__entry->req, __entry->sock
> +	)
> +);
> +#define DEFINE_HANDSHAKE_EVENT(name)				\
> +	DEFINE_EVENT(handshake_event_class, name,		\
> +		TP_PROTO(					\
> +			const struct net *net,			\
> +			const struct handshake_req *req,	\
> +			const struct socket *sock		\
> +		),						\
> +		TP_ARGS(net, req, sock))
> +
> +DECLARE_EVENT_CLASS(handshake_fd_class,
> +	TP_PROTO(
> +		const struct net *net,
> +		const struct handshake_req *req,
> +		const struct socket *sock,
> +		int fd
> +	),
> +	TP_ARGS(net, req, sock, fd),
> +	TP_STRUCT__entry(
> +		__field(const void *, req)
> +		__field(const void *, sock)
> +		__field(int, fd)
> +		__field(unsigned int, netns_ino)
> +	),
> +	TP_fast_assign(
> +		__entry->req = req;
> +		__entry->sock = req->hr_sock;
> +		__entry->fd = fd;
> +		__entry->netns_ino = net->ns.inum;
> +	),
> +	TP_printk("req=%p sock=%p fd=%d",
> +		__entry->req, __entry->sock, __entry->fd
> +	)
> +);
> +#define DEFINE_HANDSHAKE_FD_EVENT(name)				\
> +	DEFINE_EVENT(handshake_fd_class, name,			\
> +		TP_PROTO(					\
> +			const struct net *net,			\
> +			const struct handshake_req *req,	\
> +			const struct socket *sock,		\
> +			int fd					\
> +		),						\
> +		TP_ARGS(net, req, sock, fd))
> +
> +DECLARE_EVENT_CLASS(handshake_error_class,
> +	TP_PROTO(
> +		const struct net *net,
> +		const struct handshake_req *req,
> +		const struct socket *sock,
> +		int err
> +	),
> +	TP_ARGS(net, req, sock, err),
> +	TP_STRUCT__entry(
> +		__field(const void *, req)
> +		__field(const void *, sock)
> +		__field(int, err)
> +		__field(unsigned int, netns_ino)
> +	),
> +	TP_fast_assign(
> +		__entry->req = req;
> +		__entry->sock = sock;
> +		__entry->err = err;
> +		__entry->netns_ino = net->ns.inum;
> +	),
> +	TP_printk("req=%p sock=%p err=%d",
> +		__entry->req, __entry->sock, __entry->err
> +	)
> +);
> +#define DEFINE_HANDSHAKE_ERROR(name)				\
> +	DEFINE_EVENT(handshake_error_class, name,		\
> +		TP_PROTO(					\
> +			const struct net *net,			\
> +			const struct handshake_req *req,	\
> +			const struct socket *sock,		\
> +			int err					\
> +		),						\
> +		TP_ARGS(net, req, sock, err))
> +
> +
> +/**
> + ** Request lifetime events
> + **/
> +
> +DEFINE_HANDSHAKE_EVENT(handshake_submit);
> +DEFINE_HANDSHAKE_ERROR(handshake_submit_err);
> +DEFINE_HANDSHAKE_EVENT(handshake_cancel);
> +DEFINE_HANDSHAKE_EVENT(handshake_cancel_none);
> +DEFINE_HANDSHAKE_EVENT(handshake_cancel_busy);
> +DEFINE_HANDSHAKE_EVENT(handshake_destruct);
> +
> +
> +TRACE_EVENT(handshake_complete,
> +	TP_PROTO(
> +		const struct net *net,
> +		const struct handshake_req *req,
> +		const struct socket *sock,
> +		int status
> +	),
> +	TP_ARGS(net, req, sock, status),
> +	TP_STRUCT__entry(
> +		__field(const void *, req)
> +		__field(const void *, sock)
> +		__field(int, status)
> +		__field(unsigned int, netns_ino)
> +	),
> +	TP_fast_assign(
> +		__entry->req = req;
> +		__entry->sock = sock;
> +		__entry->status = status;
> +		__entry->netns_ino = net->ns.inum;
> +	),
> +	TP_printk("req=%p sock=%p status=%d",
> +		__entry->req, __entry->sock, __entry->status
> +	)
> +);
> +
> +/**
> + ** Netlink events
> + **/
> +
> +DEFINE_HANDSHAKE_ERROR(handshake_notify_err);
> +DEFINE_HANDSHAKE_FD_EVENT(handshake_cmd_accept);
> +DEFINE_HANDSHAKE_ERROR(handshake_cmd_accept_err);
> +DEFINE_HANDSHAKE_FD_EVENT(handshake_cmd_done);
> +DEFINE_HANDSHAKE_ERROR(handshake_cmd_done_err);
> +
> +#endif /* _TRACE_HANDSHAKE_H */
> +
> +#include <trace/define_trace.h>
> diff --git a/include/uapi/linux/handshake.h b/include/uapi/linux/handshake.h
> new file mode 100644
> index 000000000000..09fd7c37cba4
> --- /dev/null
> +++ b/include/uapi/linux/handshake.h
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/* Do not edit directly, auto-generated from: */
> +/*	Documentation/netlink/specs/handshake.yaml */
> +/* YNL-GEN uapi header */
> +
> +#ifndef _UAPI_LINUX_HANDSHAKE_H
> +#define _UAPI_LINUX_HANDSHAKE_H
> +
> +#define HANDSHAKE_FAMILY_NAME		"handshake"
> +#define HANDSHAKE_FAMILY_VERSION	1
> +
> +enum {
> +	HANDSHAKE_HANDLER_CLASS_NONE,
> +};
> +
> +enum {
> +	HANDSHAKE_MSG_TYPE_UNSPEC,
> +	HANDSHAKE_MSG_TYPE_CLIENTHELLO,
> +	HANDSHAKE_MSG_TYPE_SERVERHELLO,
> +};
> +
> +enum {
> +	HANDSHAKE_AUTH_UNSPEC,
> +	HANDSHAKE_AUTH_UNAUTH,
> +	HANDSHAKE_AUTH_X509,
> +	HANDSHAKE_AUTH_PSK,
> +};
> +
> +enum {
> +	HANDSHAKE_A_ACCEPT_STATUS = 1,
> +	HANDSHAKE_A_ACCEPT_SOCKFD,
> +	HANDSHAKE_A_ACCEPT_HANDLER_CLASS,
> +	HANDSHAKE_A_ACCEPT_MESSAGE_TYPE,
> +	HANDSHAKE_A_ACCEPT_AUTH,
> +	HANDSHAKE_A_ACCEPT_GNUTLS_PRIORITIES,
> +	HANDSHAKE_A_ACCEPT_MY_PEERID,
> +	HANDSHAKE_A_ACCEPT_MY_PRIVKEY,
> +
> +	__HANDSHAKE_A_ACCEPT_MAX,
> +	HANDSHAKE_A_ACCEPT_MAX = (__HANDSHAKE_A_ACCEPT_MAX - 1)
> +};
> +
> +enum {
> +	HANDSHAKE_A_DONE_STATUS = 1,
> +	HANDSHAKE_A_DONE_SOCKFD,
> +	HANDSHAKE_A_DONE_REMOTE_PEERID,
> +
> +	__HANDSHAKE_A_DONE_MAX,
> +	HANDSHAKE_A_DONE_MAX = (__HANDSHAKE_A_DONE_MAX - 1)
> +};
> +
> +enum {
> +	HANDSHAKE_CMD_READY = 1,
> +	HANDSHAKE_CMD_ACCEPT,
> +	HANDSHAKE_CMD_DONE,
> +
> +	__HANDSHAKE_CMD_MAX,
> +	HANDSHAKE_CMD_MAX = (__HANDSHAKE_CMD_MAX - 1)
> +};
> +
> +#define HANDSHAKE_MCGRP_NONE	"none"
> +
> +#endif /* _UAPI_LINUX_HANDSHAKE_H */
> diff --git a/net/Makefile b/net/Makefile
> index 0914bea9c335..adbb64277601 100644
> --- a/net/Makefile
> +++ b/net/Makefile
> @@ -79,3 +79,4 @@ obj-$(CONFIG_NET_NCSI)		+= ncsi/
>   obj-$(CONFIG_XDP_SOCKETS)	+= xdp/
>   obj-$(CONFIG_MPTCP)		+= mptcp/
>   obj-$(CONFIG_MCTP)		+= mctp/
> +obj-y				+= handshake/
> diff --git a/net/handshake/Makefile b/net/handshake/Makefile
> new file mode 100644
> index 000000000000..a41b03f4837b
> --- /dev/null
> +++ b/net/handshake/Makefile
> @@ -0,0 +1,11 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Makefile for the Generic HANDSHAKE service
> +#
> +# Author: Chuck Lever <chuck.lever@oracle.com>
> +#
> +# Copyright (c) 2023, Oracle and/or its affiliates.
> +#
> +
> +obj-y += handshake.o
> +handshake-y := netlink.o request.o trace.o
> diff --git a/net/handshake/handshake.h b/net/handshake/handshake.h
> new file mode 100644
> index 000000000000..366c7659ec09
> --- /dev/null
> +++ b/net/handshake/handshake.h
> @@ -0,0 +1,41 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Generic netlink handshake service
> + *
> + * Author: Chuck Lever <chuck.lever@oracle.com>
> + *
> + * Copyright (c) 2023, Oracle and/or its affiliates.
> + */
> +
> +/*
> + * Data structures and functions that are visible only within the
> + * handshake module are declared here.
> + */
> +
> +#ifndef _INTERNAL_HANDSHAKE_H
> +#define _INTERNAL_HANDSHAKE_H
> +
> +/*
> + * One handshake request
> + */
> +struct handshake_req {
> +	struct list_head		hr_list;
> +	unsigned long			hr_flags;
> +	const struct handshake_proto	*hr_proto;
> +	struct socket			*hr_sock;
> +
> +	void				(*hr_saved_destruct)(struct sock *sk);
> +};
> +
> +#define HANDSHAKE_F_COMPLETED	BIT(0)
> +
> +/* netlink.c */
> +extern bool handshake_genl_inited;
> +int handshake_genl_notify(struct net *net, int handler_class, gfp_t flags);
> +
> +/* request.c */
> +void __remove_pending_locked(struct net *net, struct handshake_req *req);
> +void handshake_complete(struct handshake_req *req, int status,
> +			struct nlattr **tb);
> +
> +#endif /* _INTERNAL_HANDSHAKE_H */
> diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
> new file mode 100644
> index 000000000000..581e382236cf
> --- /dev/null
> +++ b/net/handshake/netlink.c
> @@ -0,0 +1,340 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Generic netlink handshake service
> + *
> + * Author: Chuck Lever <chuck.lever@oracle.com>
> + *
> + * Copyright (c) 2023, Oracle and/or its affiliates.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/socket.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <linux/inet.h>
> +
> +#include <net/sock.h>
> +#include <net/genetlink.h>
> +#include <net/handshake.h>
> +
> +#include <uapi/linux/handshake.h>
> +#include <trace/events/handshake.h>
> +#include "handshake.h"
> +
> +static struct genl_family __ro_after_init handshake_genl_family;
> +bool handshake_genl_inited;
> +
> +/**
> + * handshake_genl_notify - Notify handlers that a request is waiting
> + * @net: target network namespace
> + * @handler_class: target handler
> + * @flags: memory allocation control flags
> + *
> + * Returns zero on success or a negative errno if notification failed.
> + */
> +int handshake_genl_notify(struct net *net, int handler_class, gfp_t flags)
> +{
> +	struct sk_buff *msg;
> +	void *hdr;
> +
> +	if (!genl_has_listeners(&handshake_genl_family, net, handler_class))
> +		return -ESRCH;
> +
> +	msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +
> +	hdr = genlmsg_put(msg, 0, 0, &handshake_genl_family, 0,
> +			  HANDSHAKE_CMD_READY);
> +	if (!hdr)
> +		goto out_free;
> +
> +	if (nla_put_u32(msg, HANDSHAKE_A_ACCEPT_HANDLER_CLASS,
> +			handler_class) < 0) {
> +		genlmsg_cancel(msg, hdr);
> +		goto out_free;
> +	}
> +
> +	genlmsg_end(msg, hdr);
> +	return genlmsg_multicast_netns(&handshake_genl_family, net, msg,
> +				       0, handler_class, flags);
> +
> +out_free:
> +	nlmsg_free(msg);
> +	return -EMSGSIZE;
> +}
> +
> +/**
> + * handshake_genl_put - Create a generic netlink message header
> + * @msg: buffer in which to create the header
> + * @gi: generic netlink message context
> + *
> + * Returns a ready-to-use header, or NULL.
> + */
> +struct nlmsghdr *handshake_genl_put(struct sk_buff *msg, struct genl_info *gi)
> +{
> +	return genlmsg_put(msg, gi->snd_portid, gi->snd_seq,
> +			   &handshake_genl_family, 0, gi->genlhdr->cmd);
> +}
> +EXPORT_SYMBOL(handshake_genl_put);
> +
> +static int handshake_status_reply(struct sk_buff *skb, struct genl_info *gi,
> +				  int status)
> +{
> +	struct nlmsghdr *hdr;
> +	struct sk_buff *msg;
> +	int ret;
> +
> +	ret = -ENOMEM;
> +	msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!msg)
> +		goto out;
> +	hdr = handshake_genl_put(msg, gi);
> +	if (!hdr)
> +		goto out_free;
> +
> +	ret = -EMSGSIZE;
> +	ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_STATUS, status);
> +	if (ret < 0)
> +		goto out_free;
> +
> +	genlmsg_end(msg, hdr);
> +	return genlmsg_reply(msg, gi);
> +
> +out_free:
> +	genlmsg_cancel(msg, hdr);
> +out:
> +	return ret;
> +}
> +
> +/*
> + * dup() a kernel socket for use as a user space file descriptor
> + * in the current process.
> + *
> + * Implicit argument: "current()"
> + */
> +static int handshake_dup(struct socket *kernsock)
> +{
> +	struct file *file = get_file(kernsock->file);
> +	int newfd;
> +
> +	newfd = get_unused_fd_flags(O_CLOEXEC);
> +	if (newfd < 0) {
> +		fput(file);
> +		return newfd;
> +	}
> +
> +	fd_install(newfd, file);
> +	return newfd;
> +}
> +
> +static const struct nla_policy
> +handshake_accept_nl_policy[HANDSHAKE_A_ACCEPT_HANDLER_CLASS + 1] = {
> +	[HANDSHAKE_A_ACCEPT_HANDLER_CLASS] = { .type = NLA_U32, },
> +};
> +
> +static int handshake_nl_accept_doit(struct sk_buff *skb, struct genl_info *gi)
> +{
> +	struct nlattr *tb[HANDSHAKE_A_ACCEPT_MAX + 1];
> +	struct net *net = sock_net(skb->sk);
> +	struct handshake_req *pos, *req;
> +	int fd, err;
> +
> +	err = -EINVAL;
> +	if (genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
> +			  HANDSHAKE_A_ACCEPT_HANDLER_CLASS,
> +			  handshake_accept_nl_policy, NULL))
> +		goto out_status;
> +	if (!tb[HANDSHAKE_A_ACCEPT_HANDLER_CLASS])
> +		goto out_status;
> +
> +	req = NULL;
> +	spin_lock(&net->hs_lock);
> +	list_for_each_entry(pos, &net->hs_requests, hr_list) {
> +		if (pos->hr_proto->hp_handler_class !=
> +		    nla_get_u32(tb[HANDSHAKE_A_ACCEPT_HANDLER_CLASS]))
> +			continue;
> +		__remove_pending_locked(net, pos);
> +		req = pos;
> +		break;
> +	}
> +	spin_unlock(&net->hs_lock);
> +	if (!req)
> +		goto out_status;
> +
> +	fd = handshake_dup(req->hr_sock);
> +	if (fd < 0) {
> +		err = fd;
> +		goto out_complete;
> +	}
> +	err = req->hr_proto->hp_accept(req, gi, fd);
> +	if (err)
> +		goto out_complete;
> +
> +	trace_handshake_cmd_accept(net, req, req->hr_sock, fd);
> +	return 0;
> +
> +out_complete:
> +	handshake_complete(req, -EIO, NULL);
> +	fput(req->hr_sock->file);
> +out_status:
> +	trace_handshake_cmd_accept_err(net, req, NULL, err);
> +	return handshake_status_reply(skb, gi, err);
> +}
> +
> +static const struct nla_policy
> +handshake_done_nl_policy[HANDSHAKE_A_DONE_MAX + 1] = {
> +	[HANDSHAKE_A_DONE_SOCKFD] = { .type = NLA_U32, },
> +	[HANDSHAKE_A_DONE_STATUS] = { .type = NLA_U32, },
> +	[HANDSHAKE_A_DONE_REMOTE_PEERID] = { .type = NLA_U32, },
> +};
> +
> +static int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *gi)
> +{
> +	struct nlattr *tb[HANDSHAKE_A_DONE_MAX + 1];
> +	struct net *net = sock_net(skb->sk);
> +	struct socket *sock = NULL;
> +	struct handshake_req *req;
> +	int fd, status, err;
> +
> +	err = genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
> +			    HANDSHAKE_A_DONE_MAX, handshake_done_nl_policy,
> +			    NULL);
> +	if (err || !tb[HANDSHAKE_A_DONE_SOCKFD]) {
> +		err = -EINVAL;
> +		goto out_status;
> +	}
> +
> +	fd = nla_get_u32(tb[HANDSHAKE_A_DONE_SOCKFD]);
> +
> +	err = 0;
> +	sock = sockfd_lookup(fd, &err);
> +	if (err) {
> +		err = -EBADF;
> +		goto out_status;
> +	}
> +
> +	req = sock->sk->sk_handshake_req;
> +	if (!req) {
> +		err = -EBUSY;
> +		goto out_status;
> +	}
> +
> +	trace_handshake_cmd_done(net, req, sock, fd);
> +
> +	status = -EIO;
> +	if (tb[HANDSHAKE_A_DONE_STATUS])
> +		status = nla_get_u32(tb[HANDSHAKE_A_DONE_STATUS]);
> +
And this makes me ever so slightly uneasy.

As 'status' is a netlink attribute it's inevitably defined as 'unsigned'.
Yet we assume that 'status' is a negative number, leaving us 
_technically_ in uncharted territory.

And that is notwithstanding the problem that we haven't even defined 
_what_ should be in the status attribute.

Reading the code I assume that it's either '0' for success or a negative 
number (ie the error code) on failure.
Which implicitly means that we _never_ set a positive number here.
So what would we lose if we declare 'status' to carry the _positive_ 
error number instead?
It would bring us in line with the actual netlink attribute definition, 
we wouldn't need to worry about possible integer overflows, yadda yadda...
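
Purely as an illustration of the positive-errno idea (a sketch, not
part of the patch): the receiving side could range-check the u32 and
negate it before handing it on. The helper name
handshake_status_to_errno() and the HS_MAX_ERRNO define are made up
for this example; 4095 mirrors the kernel's MAX_ERRNO, and falling
back to -EIO for garbage values is my assumption, chosen to match the
default the patch already uses when the attribute is absent.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: same upper bound as the kernel's MAX_ERRNO. */
#define HS_MAX_ERRNO 4095

/*
 * Hypothetical helper: map a u32 wire status (0 = success,
 * positive errno = failure) to the usual 0 / -errno convention.
 * Out-of-range values are not trusted and collapse to -EIO.
 */
static int handshake_status_to_errno(uint32_t wire_status)
{
	if (wire_status == 0)
		return 0;
	if (wire_status > HS_MAX_ERRNO)
		return -EIO;
	return -(int)wire_status;
}
```

With something like this, an agent that still sends a negative value
cast to u32 ends up as a plain -EIO instead of being reinterpreted.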

Hmm?

> +	handshake_complete(req, status, tb);
> +	fput(sock->file);
> +	return 0;
> +
> +out_status:
> +	trace_handshake_cmd_done_err(net, req, sock, err);
> +	return handshake_status_reply(skb, gi, err);
> +}
> +
> +static const struct genl_split_ops handshake_nl_ops[] = {
> +	{
> +		.cmd		= HANDSHAKE_CMD_ACCEPT,
> +		.doit		= handshake_nl_accept_doit,
> +		.policy		= handshake_accept_nl_policy,
> +		.maxattr	= HANDSHAKE_A_ACCEPT_HANDLER_CLASS,
> +		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
> +	},
> +	{
> +		.cmd		= HANDSHAKE_CMD_DONE,
> +		.doit		= handshake_nl_done_doit,
> +		.policy		= handshake_done_nl_policy,
> +		.maxattr	= HANDSHAKE_A_DONE_REMOTE_PEERID,
> +		.flags		= GENL_CMD_CAP_DO,
> +	},
> +};
> +
> +static const struct genl_multicast_group handshake_nl_mcgrps[] = {
> +	[HANDSHAKE_HANDLER_CLASS_NONE] = { .name = HANDSHAKE_MCGRP_NONE, },
> +};
> +
> +static struct genl_family __ro_after_init handshake_genl_family = {
> +	.hdrsize		= 0,
> +	.name			= HANDSHAKE_FAMILY_NAME,
> +	.version		= HANDSHAKE_FAMILY_VERSION,
> +	.netnsok		= true,
> +	.parallel_ops		= true,
> +	.n_mcgrps		= ARRAY_SIZE(handshake_nl_mcgrps),
> +	.n_split_ops		= ARRAY_SIZE(handshake_nl_ops),
> +	.split_ops		= handshake_nl_ops,
> +	.mcgrps			= handshake_nl_mcgrps,
> +	.module			= THIS_MODULE,
> +};
> +
> +static int __net_init handshake_net_init(struct net *net)
> +{
> +	spin_lock_init(&net->hs_lock);
> +	INIT_LIST_HEAD(&net->hs_requests);
> +	net->hs_pending	= 0;
> +	return 0;
> +}
> +
> +static void __net_exit handshake_net_exit(struct net *net)
> +{
> +	struct handshake_req *req;
> +	LIST_HEAD(requests);
> +
> +	/*
> +	 * This drains the net's pending list. Requests that
> +	 * have been accepted and are in progress will be
> +	 * destroyed when the socket is closed.
> +	 */
> +	spin_lock(&net->hs_lock);
> +	list_splice_init(&requests, &net->hs_requests);
> +	spin_unlock(&net->hs_lock);
> +
> +	while (!list_empty(&requests)) {
> +		req = list_first_entry(&requests, struct handshake_req, hr_list);
> +		list_del(&req->hr_list);
> +
> +		/*
> +		 * Requests on this list have not yet been
> +		 * accepted, so they do not have an fd to put.
> +		 */
> +
> +		handshake_complete(req, -ETIMEDOUT, NULL);
> +	}
> +}
> +
> +static struct pernet_operations handshake_genl_net_ops = {
> +	.init		= handshake_net_init,
> +	.exit		= handshake_net_exit,
> +};
> +
> +static int __init handshake_init(void)
> +{
> +	int ret;
> +
> +	ret = genl_register_family(&handshake_genl_family);
> +	if (ret) {
> +		pr_warn("handshake: netlink registration failed (%d)\n", ret);
> +		return ret;
> +	}
> +
> +	ret = register_pernet_subsys(&handshake_genl_net_ops);
> +	if (ret) {
> +		pr_warn("handshake: pernet registration failed (%d)\n", ret);
> +		genl_unregister_family(&handshake_genl_family);
> +	}
> +
> +	handshake_genl_inited = true;
> +	return ret;
> +}
> +
> +static void __exit handshake_exit(void)
> +{
> +	unregister_pernet_subsys(&handshake_genl_net_ops);
> +	genl_unregister_family(&handshake_genl_family);
> +}
> +
> +module_init(handshake_init);
> +module_exit(handshake_exit);
> diff --git a/net/handshake/request.c b/net/handshake/request.c
> new file mode 100644
> index 000000000000..1d3b8e76dd2c
> --- /dev/null
> +++ b/net/handshake/request.c
> @@ -0,0 +1,246 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Handshake request lifetime events
> + *
> + * Author: Chuck Lever <chuck.lever@oracle.com>
> + *
> + * Copyright (c) 2023, Oracle and/or its affiliates.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/socket.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <linux/inet.h>
> +#include <linux/fdtable.h>
> +
> +#include <net/sock.h>
> +#include <net/genetlink.h>
> +#include <net/handshake.h>
> +
> +#include <uapi/linux/handshake.h>
> +#include <trace/events/handshake.h>
> +#include "handshake.h"
> +
> +/*
> + * This limit is to prevent slow remotes from causing denial of service.
> + * A ulimit-style tunable might be used instead.
> + */
> +#define HANDSHAKE_PENDING_MAX (10)
> +
> +static void __add_pending_locked(struct net *net, struct handshake_req *req)
> +{
> +	net->hs_pending++;
> +	list_add_tail(&req->hr_list, &net->hs_requests);
> +}
> +
> +void __remove_pending_locked(struct net *net, struct handshake_req *req)
> +{
> +	net->hs_pending--;
> +	list_del_init(&req->hr_list);
> +}
> +
> +/*
> + * Return values:
> + *   %true - the request was found on @net's pending list
> + *   %false - the request was not found on @net's pending list
> + *
> + * If @req was on a pending list, it has not yet been accepted.
> + */
> +static bool remove_pending(struct net *net, struct handshake_req *req)
> +{
> +	bool ret;
> +
> +	ret = false;
> +
> +	spin_lock(&net->hs_lock);
> +	if (!list_empty(&req->hr_list)) {
> +		__remove_pending_locked(net, req);
> +		ret = true;
> +	}
> +	spin_unlock(&net->hs_lock);
> +
> +	return ret;
> +}
> +
> +static void handshake_req_destroy(struct handshake_req *req, struct sock *sk)
> +{
> +	req->hr_proto->hp_destroy(req);
> +	sk->sk_handshake_req = NULL;
> +	kfree(req);
> +}
> +
> +static void handshake_sk_destruct(struct sock *sk)
> +{
> +	struct handshake_req *req = sk->sk_handshake_req;
> +
> +	if (req) {
> +		trace_handshake_destruct(sock_net(sk), req, req->hr_sock);
> +		handshake_req_destroy(req, sk);
> +	}
> +}
> +
> +/**
> + * handshake_req_alloc - consumer API to allocate a request
> + * @sock: open socket on which to perform a handshake
> + * @proto: security protocol
> + * @flags: memory allocation flags
> + *
> + * Returns an initialized handshake_req or NULL.
> + */
> +struct handshake_req *handshake_req_alloc(struct socket *sock,
> +					  const struct handshake_proto *proto,
> +					  gfp_t flags)
> +{
> +	struct handshake_req *req;
> +
> +	/* Avoid accessing uninitialized global variables later on */
> +	if (!handshake_genl_inited)
> +		return NULL;
> +
> +	req = kzalloc(sizeof(*req) + proto->hp_privsize, flags);
> +	if (!req)
> +		return NULL;
> +
> +	sock_hold(sock->sk);
> +
> +	INIT_LIST_HEAD(&req->hr_list);
> +	req->hr_sock = sock;
> +	req->hr_proto = proto;
> +	return req;
> +}
> +EXPORT_SYMBOL(handshake_req_alloc);
> +
> +/**
> + * handshake_req_private - consumer API to return per-handshake private data
> + * @req: handshake arguments
> + *
> + */
> +void *handshake_req_private(struct handshake_req *req)
> +{
> +	return (void *)(req + 1);
> +}
> +EXPORT_SYMBOL(handshake_req_private);
> +
> +/**
> + * handshake_req_submit - consumer API to submit a handshake request
> + * @req: handshake arguments
> + * @flags: memory allocation flags
> + *
> + * Return values:
> + *   %0: Request queued
> + *   %-EBUSY: A handshake is already under way for this socket
> + *   %-ESRCH: No handshake agent is available
> + *   %-EAGAIN: Too many pending handshake requests
> + *   %-ENOMEM: Failed to allocate memory
> + *   %-EMSGSIZE: Failed to construct notification message
> + *
> + * A zero return value from handshake_request() means that
> + * exactly one subsequent completion callback is guaranteed.
> + *
> + * A negative return value from handshake_request() means that
> + * no completion callback will be done and that @req is
> + * destroyed.
> + */
> +int handshake_req_submit(struct handshake_req *req, gfp_t flags)
> +{
> +	struct socket *sock = req->hr_sock;
> +	struct sock *sk = sock->sk;
> +	struct net *net = sock_net(sk);
> +	int ret;
> +
> +	ret = -EAGAIN;
> +	if (READ_ONCE(net->hs_pending) >= HANDSHAKE_PENDING_MAX)
> +		goto out_err;
> +
> +	ret = -EBUSY;
> +	spin_lock(&net->hs_lock);
> +	if (sk->sk_handshake_req || !list_empty(&req->hr_list)) {
> +		spin_unlock(&net->hs_lock);
> +		goto out_err;
> +	}
> +	req->hr_saved_destruct = sk->sk_destruct;
> +	sk->sk_destruct = handshake_sk_destruct;
> +	sk->sk_handshake_req = req;
> +	__add_pending_locked(net, req);
> +	spin_unlock(&net->hs_lock);
> +
> +	ret = handshake_genl_notify(net, req->hr_proto->hp_handler_class,
> +				    flags);
> +	if (ret) {
> +		trace_handshake_notify_err(net, req, sock, ret);
> +		if (remove_pending(net, req))
> +			goto out_err;
> +	}
> +
> +	trace_handshake_submit(net, req, sock);
> +	return 0;
> +
> +out_err:
> +	trace_handshake_submit_err(net, req, sock, ret);
> +	handshake_req_destroy(req, sk);
> +	return ret;
> +}
> +EXPORT_SYMBOL(handshake_req_submit);
> +
> +void handshake_complete(struct handshake_req *req, int status,
> +			struct nlattr **tb)
> +{
> +	struct socket *sock = req->hr_sock;
> +	struct net *net = sock_net(sock->sk);
> +
> +	if (!test_and_set_bit(HANDSHAKE_F_COMPLETED, &req->hr_flags)) {
> +		trace_handshake_complete(net, req, sock, status);
> +		req->hr_proto->hp_done(req, status, tb);
> +		__sock_put(sock->sk);
> +	}
> +}
> +
> +/**
> + * handshake_req_cancel - consumer API to cancel an in-progress handshake
> + * @sock: socket on which there is an ongoing handshake
> + *
> + * XXX: Perhaps killing the user space agent might also be necessary?

I thought we had agreed that we would be sending a signal to the 
userspace process?
Ideally we would be sending a SIGHUP, wait for some time on the 
userspace process to respond with a 'done' message, and send a 'KILL' 
signal if we haven't received one.

Obs: Sending a KILL signal would imply that userspace is able to cope 
with children dying. Which pretty much excludes pthreads, I would think.

Guess I'll have to consult Stevens :-)

> + *
> + * Request cancellation races with request completion. To determine
> + * who won, callers examine the return value from this function.
> + *
> + * Return values:
> + *   %0 - Uncompleted handshake request was canceled or not found
> + *   %-EBUSY - Handshake request already completed

EBUSY? Wouldn't EAGAIN be more appropriate?
After all, the request is everything _but_ busy...
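
For reference, this is how I read the race, as a user-space model
(a sketch only; struct hs_req_model, model_complete() and
model_cancel() are invented names, and the pending-list check is not
locked here the way the real list is under hs_lock). Whoever flips
the COMPLETED bit first wins; the only open question is which errno
the loser of the race gets back from cancel.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for struct handshake_req; only the relevant state is kept. */
struct hs_req_model {
	atomic_flag completed;		/* models HANDSHAKE_F_COMPLETED */
	bool on_pending_list;		/* models !list_empty(&req->hr_list) */
};

/* Models handshake_complete(): true if this call ran the done callback. */
static bool model_complete(struct hs_req_model *req)
{
	return !atomic_flag_test_and_set(&req->completed);
}

/* Models handshake_req_cancel(): 0 if cancel won, -EBUSY if completion won. */
static int model_cancel(struct hs_req_model *req)
{
	if (req->on_pending_list) {
		/* Never accepted, so there is nothing to race with. */
		req->on_pending_list = false;
		return 0;
	}
	if (atomic_flag_test_and_set(&req->completed))
		return -EBUSY;
	return 0;
}
```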

> + */
> +int handshake_req_cancel(struct socket *sock)
> +{
> +	struct handshake_req *req;
> +	struct sock *sk;
> +	struct net *net;
> +
> +	if (!sock)
> +		return 0;
> +
> +	sk = sock->sk;
> +	req = sk->sk_handshake_req;
> +	net = sock_net(sk);
> +
> +	if (!req) {
> +		trace_handshake_cancel_none(net, req, sock);
> +		return 0;
> +	}
> +
> +	if (remove_pending(net, req)) {
> +		/* Request hadn't been accepted */
> +		trace_handshake_cancel(net, req, sock);
> +		return 0;
> +	}
> +	if (test_and_set_bit(HANDSHAKE_F_COMPLETED, &req->hr_flags)) {
> +		/* Request already completed */
> +		trace_handshake_cancel_busy(net, req, sock);
> +		return -EBUSY;
> +	}
> +
> +	__sock_put(sk);
> +	trace_handshake_cancel(net, req, sock);
> +	return 0;
> +}
> +EXPORT_SYMBOL(handshake_req_cancel);
> diff --git a/net/handshake/trace.c b/net/handshake/trace.c
> new file mode 100644
> index 000000000000..3a5b6f29a2b8
> --- /dev/null
> +++ b/net/handshake/trace.c
> @@ -0,0 +1,17 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Trace points for transport security layer handshakes.
> + *
> + * Author: Chuck Lever <chuck.lever@oracle.com>
> + *
> + * Copyright (c) 2023, Oracle and/or its affiliates.
> + */
> +
> +#include <linux/types.h>
> +#include <net/sock.h>
> +
> +#include "handshake.h"
> +
> +#define CREATE_TRACE_POINTS
> +
> +#include <trace/events/handshake.h>
> 
Cheers,

Hannes




* Re: [PATCH v5 2/2] net/tls: Add kernel APIs for requesting a TLSv1.3 handshake
  2023-02-24 19:19 ` [PATCH v5 2/2] net/tls: Add kernel APIs for requesting a TLSv1.3 handshake Chuck Lever
@ 2023-02-27  9:36   ` Hannes Reinecke
  2023-02-27 15:01     ` Chuck Lever III
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-02-27  9:36 UTC (permalink / raw)
  To: Chuck Lever, kuba, pabeni, edumazet; +Cc: netdev, kernel-tls-handshake

On 2/24/23 20:19, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> To enable kernel consumers of TLS to request a TLS handshake, add
> support to net/tls/ to send a handshake upcall. This patch also
> acts as a template for adding handshake upcall support to other
> transport layer security mechanisms.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   Documentation/netlink/specs/handshake.yaml |    4
>   Documentation/networking/index.rst         |    1
>   Documentation/networking/tls-handshake.rst |  146 ++++++++++
>   include/net/tls.h                          |   27 ++
>   include/uapi/linux/handshake.h             |    2
>   net/handshake/netlink.c                    |    1
>   net/tls/Makefile                           |    2
>   net/tls/tls_handshake.c                    |  423 ++++++++++++++++++++++++++++
>   8 files changed, 604 insertions(+), 2 deletions(-)
>   create mode 100644 Documentation/networking/tls-handshake.rst
>   create mode 100644 net/tls/tls_handshake.c
> 
> diff --git a/Documentation/netlink/specs/handshake.yaml b/Documentation/netlink/specs/handshake.yaml
> index 683a8f2df0a7..c2f6bfff2326 100644
> --- a/Documentation/netlink/specs/handshake.yaml
> +++ b/Documentation/netlink/specs/handshake.yaml
> @@ -21,7 +21,7 @@ definitions:
>       name: handler-class
>       enum-name:
>       value-start: 0
> -    entries: [ none ]
> +    entries: [ none, tlshd ]
>     -
>       type: enum
>       name: msg-type
> @@ -132,3 +132,5 @@ mcast-groups:
>     list:
>       -
>         name: none
> +    -
> +      name: tlshd
> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> index 4ddcae33c336..189517f4ea96 100644
> --- a/Documentation/networking/index.rst
> +++ b/Documentation/networking/index.rst
> @@ -36,6 +36,7 @@ Contents:
>      scaling
>      tls
>      tls-offload
> +   tls-handshake
>      nfc
>      6lowpan
>      6pack
> diff --git a/Documentation/networking/tls-handshake.rst b/Documentation/networking/tls-handshake.rst
> new file mode 100644
> index 000000000000..f09fc6c09580
> --- /dev/null
> +++ b/Documentation/networking/tls-handshake.rst
> @@ -0,0 +1,146 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=======================
> +In-Kernel TLS Handshake
> +=======================
> +
> +Overview
> +========
> +
> +Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs
> +over TCP. TLS provides end-to-end data integrity and confidentiality,
> +in addition to peer authentication.
> +
> +The kernel's kTLS implementation handles the TLS record subprotocol, but
> +does not handle the TLS handshake subprotocol which is used to establish
> +a TLS session. Kernel consumers can use the API described here to
> +request TLS session establishment.
> +
> +There are several possible ways to provide a handshake service in the
> +kernel. The API described here is designed to hide the details of those
> +implementations so that in-kernel TLS consumers do not need to be
> +aware of how the handshake gets done.
> +
> +
> +User handshake agent
> +====================
> +
> +As of this writing, there is no TLS handshake implementation in the
> +Linux kernel. Thus, with the current implementation, a user agent is
> +started in each network namespace where a kernel consumer might require
> +a TLS handshake. This agent listens for events sent from the kernel
> +that request a handshake on an open and connected TCP socket.
> +
> +The open socket is passed to user space via a netlink operation, which
> +creates a socket descriptor in the agent's file descriptor table. If the
> +handshake completes successfully, the user agent promotes the socket to
> +use the TLS ULP and sets the session information using the SOL_TLS socket
> +options. The user agent returns the socket to the kernel via a second
> +netlink operation.
> +
> +
> +Kernel Handshake API
> +====================
> +
> +A kernel TLS consumer initiates a client-side TLS handshake on an open
> +socket by invoking one of the tls_client_hello() functions. For example:
> +
> +.. code-block:: c
> +
> +  ret = tls_client_hello_x509(sock, done_func, cookie, priorities,
> +                              cert, privkey);
> +
> +The function returns zero when the handshake request is under way. A
> +zero return guarantees the callback function @done_func will be invoked
> +for this socket.
> +
> +The function returns a negative errno if the handshake could not be
> +started. A negative errno guarantees the callback function @done_func
> +will not be invoked on this socket.
> +
> +The @sock argument is an open and connected socket. The caller must hold
> +a reference on the socket to prevent it from being destroyed while the
> +handshake is in progress.
> +
> +@done_func and @cookie are a callback function that is invoked when the
> +handshake has completed. The success status of the handshake is returned
> +via the @status parameter of the callback function. A good practice is
> +to close and destroy the socket immediately if the handshake has failed.
> +
> +@priorities is a GnuTLS priorities string that controls the handshake.
> +The special value TLS_DEFAULT_PRIORITIES causes the handshake to
> +operate using default TLS priorities. However, the caller can use the
> +string to (for example) adjust the handshake to use a restricted set
> +of ciphers (say, if the kernel consumer wishes to mandate only a
> +limited set of ciphers).
> +
> +@cert is the serial number of a key that contains a DER format x.509
> +certificate that the handshake agent presents to the remote as the local
> +peer's identity.
> +
> +@privkey is the serial number of a key that contains a DER-format
> +private key associated with the x.509 certificate.
> +
> +
> +To initiate a client-side TLS handshake with a pre-shared key, use:
> +
> +.. code-block:: c
> +
> +  ret = tls_client_hello_psk(sock, done_func, cookie, priorities,
> +                             peerid);
> +
> +@peerid is the serial number of a key that contains the pre-shared
> +key to be used for the handshake.
> +
> +The other parameters are as above.
> +
> +
> +To initiate an anonymous client-side TLS handshake use:
> +
> +.. code-block:: c
> +
> +  ret = tls_client_hello_anon(sock, done_func, cookie, priorities);
> +
> +The parameters are as above.
> +
> +The handshake agent presents no peer identity information to the
> +remote during the handshake. Only server authentication is performed
> +during the handshake. Thus the established session uses encryption
> +only.
> +
> +
> +Consumers that are in-kernel servers use, for example:
> +
> +.. code-block:: c
> +
> +  ret = tls_server_hello_x509(sock, done_func, cookie, priorities);
> +
> +The parameters for this operation are as above.
> +
> +
> +Lastly, if the consumer needs to cancel the handshake request, say,
> +due to a ^C or other exigent event, the handshake core provides
> +this API:
> +
> +.. code-block:: c
> +
> +  tls_handshake_cancel(sock);
> +
> +
> +Other considerations
> +--------------------
> +
> +While a handshake is under way, the kernel consumer must alter the
> +socket's sk_data_ready callback function to ignore all incoming data.
> +Once the handshake completion callback function has been invoked,
> +normal receive operation can be resumed.
> +
> +Once a TLS session is established, the consumer must provide a buffer
> +for and then examine the control message (CMSG) that is part of every
> +subsequent sock_recvmsg(). Each control message indicates whether the
> +received message data is TLS record data or session metadata.
> +
> +See tls.rst for details on how a kTLS consumer recognizes incoming
> +(decrypted) application data, alerts, and handshake packets once the
> +socket has been promoted to use the TLS ULP.
> +
> diff --git a/include/net/tls.h b/include/net/tls.h
> index 154949c7b0c8..505b23992ef0 100644
> --- a/include/net/tls.h
> +++ b/include/net/tls.h
> @@ -512,4 +512,31 @@ static inline bool tls_is_sk_rx_device_offloaded(struct sock *sk)
>   	return tls_get_ctx(sk)->rx_conf == TLS_HW;
>   }
>   #endif
> +
> +#define TLS_DEFAULT_PRIORITIES		(NULL)
> +

Hmm? What is the point of this?
It's not as if we can override it later on ...

> +enum {
> +	TLS_NO_PEERID = 0,
> +	TLS_NO_CERT = 0,
> +	TLS_NO_PRIVKEY = 0,
> +};
> +
> +typedef void	(*tls_done_func_t)(void *data, int status,
> +				   key_serial_t peerid);
> +
> +int tls_client_hello_anon(struct socket *sock, tls_done_func_t done,
> +			  void *data, const char *priorities);
> +int tls_client_hello_x509(struct socket *sock, tls_done_func_t done,
> +			  void *data, const char *priorities,
> +			  key_serial_t cert, key_serial_t privkey);
> +int tls_client_hello_psk(struct socket *sock, tls_done_func_t done,
> +			 void *data, const char *priorities,
> +			 key_serial_t peerid);
> +int tls_server_hello_x509(struct socket *sock, tls_done_func_t done,
> +			  void *data, const char *priorities);
> +int tls_server_hello_psk(struct socket *sock, tls_done_func_t done,
> +			 void *data, const char *priorities);
> +
> +int tls_handshake_cancel(struct socket *sock);
> +
>   #endif /* _TLS_OFFLOAD_H */
> diff --git a/include/uapi/linux/handshake.h b/include/uapi/linux/handshake.h
> index 09fd7c37cba4..dad8227939a1 100644
> --- a/include/uapi/linux/handshake.h
> +++ b/include/uapi/linux/handshake.h
> @@ -11,6 +11,7 @@
>   
>   enum {
>   	HANDSHAKE_HANDLER_CLASS_NONE,
> +	HANDSHAKE_HANDLER_CLASS_TLSHD,
>   };
>   
>   enum {
> @@ -59,5 +60,6 @@ enum {
>   };
>   
>   #define HANDSHAKE_MCGRP_NONE	"none"
> +#define HANDSHAKE_MCGRP_TLSHD	"tlshd"
>   
>   #endif /* _UAPI_LINUX_HANDSHAKE_H */
> diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
> index 581e382236cf..88775f784305 100644
> --- a/net/handshake/netlink.c
> +++ b/net/handshake/netlink.c
> @@ -255,6 +255,7 @@ static const struct genl_split_ops handshake_nl_ops[] = {
>   
>   static const struct genl_multicast_group handshake_nl_mcgrps[] = {
>   	[HANDSHAKE_HANDLER_CLASS_NONE] = { .name = HANDSHAKE_MCGRP_NONE, },
> +	[HANDSHAKE_HANDLER_CLASS_TLSHD] = { .name = HANDSHAKE_MCGRP_TLSHD, },
>   };
>   
>   static struct genl_family __ro_after_init handshake_genl_family = {
> diff --git a/net/tls/Makefile b/net/tls/Makefile
> index e41c800489ac..7e56b57f14f6 100644
> --- a/net/tls/Makefile
> +++ b/net/tls/Makefile
> @@ -7,7 +7,7 @@ CFLAGS_trace.o := -I$(src)
>   
>   obj-$(CONFIG_TLS) += tls.o
>   
> -tls-y := tls_main.o tls_sw.o tls_proc.o trace.o tls_strp.o
> +tls-y := tls_handshake.o tls_main.o tls_sw.o tls_proc.o trace.o tls_strp.o
>   
I'd rather tack the new file on at the end, but that might be personal
preference ...

>   tls-$(CONFIG_TLS_TOE) += tls_toe.o
>   tls-$(CONFIG_TLS_DEVICE) += tls_device.o tls_device_fallback.o
> diff --git a/net/tls/tls_handshake.c b/net/tls/tls_handshake.c
> new file mode 100644
> index 000000000000..74d32a9ca857
> --- /dev/null
> +++ b/net/tls/tls_handshake.c
> @@ -0,0 +1,423 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Establish a TLS session for a kernel socket consumer
> + *
> + * Author: Chuck Lever <chuck.lever@oracle.com>
> + *
> + * Copyright (c) 2021-2023, Oracle and/or its affiliates.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/socket.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +
> +#include <net/sock.h>
> +#include <net/tls.h>
> +#include <net/genetlink.h>
> +#include <net/handshake.h>
> +
> +#include <uapi/linux/handshake.h>
> +
> +/*
> + * TLS priorities string passed to the GnuTLS library.
> + *
> + * Specifically for kernel TLS consumers: enable only TLS v1.3 and the
> + * ciphers that are supported by kTLS.
> + *
> + * Currently this list is generated by hand from the supported ciphers
> + * found in include/uapi/linux/tls.h.
> + */
> +#define KTLS_DEFAULT_PRIORITIES \
> +	"SECURE256:+SECURE128:-COMP-ALL" \
> +	":-VERS-ALL:+VERS-TLS1.3:%NO_TICKETS" \
> +	":-CIPHER-ALL:+CHACHA20-POLY1305:+AES-256-GCM:+AES-128-GCM:+AES-128-CCM"
> +
> +struct tls_handshake_req {
> +	void			(*th_consumer_done)(void *data, int status,
> +						    key_serial_t peerid);
> +	void			*th_consumer_data;
> +
> +	const char		*th_priorities;
> +	int			th_type;
> +	int			th_auth_type;
> +	key_serial_t		th_peerid;
> +	key_serial_t		th_certificate;
> +	key_serial_t		th_privkey;
> +
> +};
> +
> +static const char *tls_handshake_dup_priorities(const char *priorities,
> +						gfp_t flags)
> +{
> +	const char *tp;
> +
> +	if (priorities != TLS_DEFAULT_PRIORITIES && strlen(priorities))
See above. As TLS_DEFAULT_PRIORITIES is NULL we can leave out the first
condition.
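
A minimal sketch of that simplification, assuming a plain NULL test
replaces the comparison against the macro (the NULL test itself has to
stay, since strlen(NULL) would fault). This is user-space C with
malloc() standing in for kstrdup(); sketch_dup_priorities() and the
placeholder default string are hypothetical:

```c
/* User-space sketch only; not kernel code. */
#include <stdlib.h>
#include <string.h>

/* Placeholder standing in for KTLS_DEFAULT_PRIORITIES. */
#define SKETCH_DEFAULT_PRIORITIES "sketch-default"

/*
 * Return a heap copy of @priorities, or of the default string when
 * @priorities is NULL or empty. The caller must free() the result.
 * Returns NULL if the allocation fails.
 */
static char *sketch_dup_priorities(const char *priorities)
{
	const char *tp;
	size_t len;
	char *copy;

	if (priorities && *priorities)
		tp = priorities;
	else
		tp = SKETCH_DEFAULT_PRIORITIES;

	len = strlen(tp) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, tp, len);
	return copy;
}
```

Testing the first byte instead of calling strlen() also avoids walking
the whole string just to find out whether it is empty.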

> +		tp = priorities;
> +	else
> +		tp = KTLS_DEFAULT_PRIORITIES;
> +	return kstrdup(tp, flags);
> +}
> +
> +static struct tls_handshake_req *
> +tls_handshake_req_init(struct handshake_req *req, tls_done_func_t done,
> +		       void *data, const char *priorities)
> +{
> +	struct tls_handshake_req *treq = handshake_req_private(req);
> +
> +	treq->th_consumer_done = done;
> +	treq->th_consumer_data = data;
> +	treq->th_priorities = priorities;
> +	treq->th_peerid = TLS_NO_PEERID;
> +	treq->th_certificate = TLS_NO_CERT;
> +	treq->th_privkey = TLS_NO_PRIVKEY;
> +	return treq;
> +}
> +
> +/**
> + * tls_handshake_destroy - callback to release a handshake request
> + * @req: handshake parameters to release
> + *
> + */
> +static void tls_handshake_destroy(struct handshake_req *req)
> +{
> +	struct tls_handshake_req *treq = handshake_req_private(req);
> +
> +	kfree(treq->th_priorities);
> +}
> +
> +/**
> + * tls_handshake_done - callback to handle a CMD_DONE request
> + * @req: socket on which the handshake was performed
> + * @status: session status code
> + * @tb: other results of session establishment
> + *
> + * Eventually this will return information about the established
> + * session: whether it is authenticated, and if so, who the remote
> + * is.
> + */
> +static void tls_handshake_done(struct handshake_req *req, int status,
> +			       struct nlattr **tb)
> +{
> +	struct tls_handshake_req *treq = handshake_req_private(req);
> +	key_serial_t peerid = TLS_NO_PEERID;
> +
> +	if (tb[HANDSHAKE_A_DONE_REMOTE_PEERID])
> +		peerid = nla_get_u32(tb[HANDSHAKE_A_DONE_REMOTE_PEERID]);
> +
> +	treq->th_consumer_done(treq->th_consumer_data, status, peerid);
> +}
> +
> +static int tls_handshake_put_accept_resp(struct sk_buff *msg,
> +					 struct tls_handshake_req *treq)
> +{
> +	int ret;
> +
> +	ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MESSAGE_TYPE, treq->th_type);
> +	if (ret < 0)
> +		goto out;
> +	ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_AUTH, treq->th_auth_type);
> +	if (ret < 0)
> +		goto out;
> +	switch (treq->th_auth_type) {
> +	case HANDSHAKE_AUTH_X509:
> +		if (treq->th_certificate != TLS_NO_CERT) {
> +			ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MY_PEERID,
> +					  treq->th_certificate);
> +			if (ret < 0)
> +				goto out;
> +		}
> +		if (treq->th_privkey != TLS_NO_PRIVKEY) {
> +			ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MY_PRIVKEY,
> +					  treq->th_privkey);
> +			if (ret < 0)
> +				goto out;
> +		}
> +		break;
> +	case HANDSHAKE_AUTH_PSK:
> +		if (treq->th_peerid != TLS_NO_PEERID) {
> +			ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MY_PEERID,
> +					  treq->th_peerid);
> +			if (ret < 0)
> +				goto out;
> +		}
> +		break;
> +	}
> +
> +	ret = nla_put_string(msg, HANDSHAKE_A_ACCEPT_GNUTLS_PRIORITIES,
> +			     treq->th_priorities);
> +	if (ret < 0)
> +		goto out;
> +
> +	ret = 0;
> +
> +out:
> +	return ret;
> +}
> +
> +/**
> + * tls_handshake_accept - callback to construct a CMD_ACCEPT response
> + * @req: handshake parameters to return
> + * @gi: generic netlink message context
> + * @fd: file descriptor to be returned
> + *
> + * Returns zero on success, or a negative errno on failure.
> + */
> +static int tls_handshake_accept(struct handshake_req *req,
> +				struct genl_info *gi, int fd)
> +{
> +	struct tls_handshake_req *treq = handshake_req_private(req);
> +	struct nlmsghdr *hdr;
> +	struct sk_buff *msg;
> +	int ret;
> +
> +	ret = -ENOMEM;
> +	msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!msg)
> +		goto out;
> +	ret = -EMSGSIZE;
> +	hdr = handshake_genl_put(msg, gi);
> +	if (!hdr)
> +		goto out_cancel;
> +
> +	ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_SOCKFD, fd);
> +	if (ret < 0)
> +		goto out_cancel;
> +
> +	ret = tls_handshake_put_accept_resp(msg, treq);
> +	if (ret < 0)
> +		goto out_cancel;
> +
> +	genlmsg_end(msg, hdr);
> +	return genlmsg_reply(msg, gi);
> +
> +out_cancel:
> +	genlmsg_cancel(msg, hdr);
> +out:
> +	return ret;
> +}
> +
> +static const struct handshake_proto tls_handshake_proto = {
> +	.hp_handler_class	= HANDSHAKE_HANDLER_CLASS_TLSHD,
> +	.hp_privsize		= sizeof(struct tls_handshake_req),
> +
> +	.hp_accept		= tls_handshake_accept,
> +	.hp_done		= tls_handshake_done,
> +	.hp_destroy		= tls_handshake_destroy,
> +};
> +
> +/**
> + * tls_client_hello_anon - request an anonymous TLS handshake on a socket
> + * @sock: connected socket on which to perform the handshake
> + * @done: function to call when the handshake has completed
> + * @data: token to pass back to @done
> + * @priorities: GnuTLS TLS priorities string, or NULL
> + *
> + * Return values:
> + *   %0: Handshake request enqueued; ->done will be called when complete
> + *   %-ENOENT: No user agent is available
> + *   %-ENOMEM: Memory allocation failed
> + */
> +int tls_client_hello_anon(struct socket *sock, tls_done_func_t done,
> +			  void *data, const char *priorities)
> +{
> +	struct tls_handshake_req *treq;
> +	struct handshake_req *req;
> +	gfp_t flags = GFP_NOWAIT;
> +	const char *tp;
> +
> +	tp = tls_handshake_dup_priorities(priorities, flags);
> +	if (!tp)
> +		return -ENOMEM;
> +
> +	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
> +	if (!req) {
> +		kfree(tp);
> +		return -ENOMEM;
> +	}
> +
> +	treq = tls_handshake_req_init(req, done, data, tp);
> +	treq->th_type = HANDSHAKE_MSG_TYPE_CLIENTHELLO;
> +	treq->th_auth_type = HANDSHAKE_AUTH_UNAUTH;
> +
> +	return handshake_req_submit(req, flags);
> +}
> +EXPORT_SYMBOL(tls_client_hello_anon);
> +
> +/**
> + * tls_client_hello_x509 - request an x.509-based TLS handshake on a socket
> + * @sock: connected socket on which to perform the handshake
> + * @done: function to call when the handshake has completed
> + * @data: token to pass back to @done
> + * @priorities: GnuTLS TLS priorities string
> + * @cert: serial number of key containing client's x.509 certificate
> + * @privkey: serial number of key containing client's private key
> + *
> + * Return values:
> + *   %0: Handshake request enqueued; ->done will be called when complete
> + *   %-ENOENT: No user agent is available
> + *   %-ENOMEM: Memory allocation failed
> + */
> +int tls_client_hello_x509(struct socket *sock, tls_done_func_t done,
> +			  void *data, const char *priorities,
> +			  key_serial_t cert, key_serial_t privkey)
> +{
> +	struct tls_handshake_req *treq;
> +	struct handshake_req *req;
> +	gfp_t flags = GFP_NOWAIT;
> +	const char *tp;
> +
> +	tp = tls_handshake_dup_priorities(priorities, flags);
> +	if (!tp)
> +		return -ENOMEM;
> +
> +	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
> +	if (!req) {
> +		kfree(tp);
> +		return -ENOMEM;
> +	}
> +
> +	treq = tls_handshake_req_init(req, done, data, tp);
> +	treq->th_type = HANDSHAKE_MSG_TYPE_CLIENTHELLO;
> +	treq->th_auth_type = HANDSHAKE_AUTH_X509;
> +	treq->th_certificate = cert;
> +	treq->th_privkey = privkey;
> +
> +	return handshake_req_submit(req, flags);
> +}
> +EXPORT_SYMBOL(tls_client_hello_x509);
> +
> +/**
> + * tls_client_hello_psk - request a PSK-based TLS handshake on a socket
> + * @sock: connected socket on which to perform the handshake
> + * @done: function to call when the handshake has completed
> + * @data: token to pass back to @done
> + * @priorities: GnuTLS TLS priorities string
> + * @peerid: serial number of key containing TLS identity
> + *
> + * Return values:
> + *   %0: Handshake request enqueued; ->done will be called when complete
> + *   %-ENOENT: No user agent is available
> + *   %-ENOMEM: Memory allocation failed
> + */
> +int tls_client_hello_psk(struct socket *sock, tls_done_func_t done,
> +			 void *data, const char *priorities,
> +			 key_serial_t peerid)
> +{
> +	struct tls_handshake_req *treq;
> +	struct handshake_req *req;
> +	gfp_t flags = GFP_NOWAIT;
> +	const char *tp;
> +
> +	tp = tls_handshake_dup_priorities(priorities, flags);
> +	if (!tp)
> +		return -ENOMEM;
> +
> +	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
> +	if (!req) {
> +		kfree(tp);
> +		return -ENOMEM;
> +	}
> +
> +	treq = tls_handshake_req_init(req, done, data, tp);
> +	treq->th_type = HANDSHAKE_MSG_TYPE_CLIENTHELLO;
> +	treq->th_auth_type = HANDSHAKE_AUTH_PSK;
> +	treq->th_peerid = peerid;
> +
> +	return handshake_req_submit(req, flags);
> +}
> +EXPORT_SYMBOL(tls_client_hello_psk);
> +
> +/**
> + * tls_server_hello_x509 - request a server TLS handshake on a socket
> + * @sock: connected socket on which to perform the handshake
> + * @done: function to call when the handshake has completed
> + * @data: token to pass back to @done
> + * @priorities: GnuTLS TLS priorities string
> + *
> + * Return values:
> + *   %0: Handshake request enqueued; ->done will be called when complete
> + *   %-ENOENT: No user agent is available
> + *   %-ENOMEM: Memory allocation failed
> + */
> +int tls_server_hello_x509(struct socket *sock, tls_done_func_t done,
> +			  void *data, const char *priorities)
> +{
> +	struct tls_handshake_req *treq;
> +	struct handshake_req *req;
> +	gfp_t flags = GFP_KERNEL;
> +	const char *tp;
> +
> +	tp = tls_handshake_dup_priorities(priorities, flags);
> +	if (!tp)
> +		return -ENOMEM;
> +
> +	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
> +	if (!req) {
> +		kfree(tp);
> +		return -ENOMEM;
> +	}
> +
> +	treq = tls_handshake_req_init(req, done, data, tp);
> +	treq->th_type = HANDSHAKE_MSG_TYPE_SERVERHELLO;
> +	treq->th_auth_type = HANDSHAKE_AUTH_X509;
> +
> +	return handshake_req_submit(req, flags);
> +}
> +EXPORT_SYMBOL(tls_server_hello_x509);
> +
> +/**
> + * tls_server_hello_psk - request a server TLS handshake on a socket
> + * @sock: connected socket on which to perform the handshake
> + * @done: function to call when the handshake has completed
> + * @data: token to pass back to @done
> + * @priorities: GnuTLS TLS priorities string
> + *
> + * Return values:
> + *   %0: Handshake request enqueued; ->done will be called when complete
> + *   %-ENOENT: No user agent is available
> + *   %-ENOMEM: Memory allocation failed
> + */
> +int tls_server_hello_psk(struct socket *sock, tls_done_func_t done,
> +			 void *data, const char *priorities)
> +{
> +	struct tls_handshake_req *treq;
> +	struct handshake_req *req;
> +	gfp_t flags = GFP_KERNEL;
> +	const char *tp;
> +
> +	tp = tls_handshake_dup_priorities(priorities, flags);
> +	if (!tp)
> +		return -ENOMEM;
> +
> +	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
> +	if (!req) {
> +		kfree(tp);
> +		return -ENOMEM;
> +	}
> +
> +	treq = tls_handshake_req_init(req, done, data, tp);
> +	treq->th_type = HANDSHAKE_MSG_TYPE_SERVERHELLO;
> +	treq->th_auth_type = HANDSHAKE_AUTH_PSK;
> +
> +	return handshake_req_submit(req, flags);
> +}
> +EXPORT_SYMBOL(tls_server_hello_psk);
> +
> +/**
> + * tls_handshake_cancel - cancel a pending handshake
> + * @sock: socket on which there is an ongoing handshake
> + *
> + * Request cancellation races with request completion. To determine
> + * who won, callers examine the return value from this function.
> + *
> + * Return values:
> + *   %0 - Uncompleted handshake request was canceled
> + *   %-EBUSY - Handshake request already completed
> + */
> +int tls_handshake_cancel(struct socket *sock)
> +{
> +	return handshake_req_cancel(sock);
> +}
> +EXPORT_SYMBOL(tls_handshake_cancel);
> 
> 
> 

Cheers,

Hannes



* Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests
  2023-02-27  9:24   ` Hannes Reinecke
@ 2023-02-27 14:59     ` Chuck Lever III
  2023-02-27 15:14       ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever III @ 2023-02-27 14:59 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Chuck Lever, kuba, pabeni, edumazet, netdev, kernel-tls-handshake



> On Feb 27, 2023, at 4:24 AM, Hannes Reinecke <hare@suse.de> wrote:
> 
> On 2/24/23 20:19, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> When a kernel consumer needs a transport layer security session, it
>> first needs a handshake to negotiate and establish a session. This
>> negotiation can be done in user space via one of the several
>> existing library implementations, or it can be done in the kernel.
>> No in-kernel handshake implementations yet exist. In their absence,
>> we add a netlink service that can:
>> a. Notify a user space daemon that a handshake is needed.
>> b. Once notified, the daemon calls the kernel back via this
>>    netlink service to get the handshake parameters, including an
>>    open socket on which to establish the session.
>> c. Once the handshake is complete, the daemon reports the
>>    session status and other information via a second netlink
>>    operation. This operation marks that it is safe for the
>>    kernel to use the open socket and the security session
>>    established there.
>> The notification service uses a multicast group. Each handshake
>> mechanism (eg, tlshd) adopts its own group number so that the
>> handshake services are completely independent of one another. The
>> kernel can then tell via netlink_has_listeners() whether a handshake
>> service is active and prepared to handle a handshake request.
>> A new netlink operation, ACCEPT, acts like accept(2) in that it
>> instantiates a file descriptor in the user space daemon's fd table.
>> If this operation is successful, the reply carries the fd number,
>> which can be treated as an open and ready file descriptor.
>> While user space is performing the handshake, the kernel keeps its
>> muddy paws off the open socket. A second new netlink operation,
>> DONE, indicates that the user space daemon is finished with the
>> socket and it is safe for the kernel to use again. The operation
>> also indicates whether a session was established successfully.
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  Documentation/netlink/specs/handshake.yaml |  134 +++++++++++
>>  include/net/handshake.h                    |   45 ++++
>>  include/net/net_namespace.h                |    5
>>  include/net/sock.h                         |    1
>>  include/trace/events/handshake.h           |  159 +++++++++++++
>>  include/uapi/linux/handshake.h             |   63 +++++
>>  net/Makefile                               |    1
>>  net/handshake/Makefile                     |   11 +
>>  net/handshake/handshake.h                  |   41 +++
>>  net/handshake/netlink.c                    |  340 ++++++++++++++++++++++++++++
>>  net/handshake/request.c                    |  246 ++++++++++++++++++++
>>  net/handshake/trace.c                      |   17 +
>>  12 files changed, 1063 insertions(+)
>>  create mode 100644 Documentation/netlink/specs/handshake.yaml
>>  create mode 100644 include/net/handshake.h
>>  create mode 100644 include/trace/events/handshake.h
>>  create mode 100644 include/uapi/linux/handshake.h
>>  create mode 100644 net/handshake/Makefile
>>  create mode 100644 net/handshake/handshake.h
>>  create mode 100644 net/handshake/netlink.c
>>  create mode 100644 net/handshake/request.c
>>  create mode 100644 net/handshake/trace.c
>> diff --git a/Documentation/netlink/specs/handshake.yaml b/Documentation/netlink/specs/handshake.yaml
>> new file mode 100644
>> index 000000000000..683a8f2df0a7
>> --- /dev/null
>> +++ b/Documentation/netlink/specs/handshake.yaml
>> @@ -0,0 +1,134 @@
>> +# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>> +#
>> +# GENL HANDSHAKE service.
>> +#
>> +# Author: Chuck Lever <chuck.lever@oracle.com>
>> +#
>> +# Copyright (c) 2023, Oracle and/or its affiliates.
>> +#
>> +
>> +name: handshake
>> +
>> +protocol: genetlink-c
>> +
>> +doc: Netlink protocol to request a transport layer security handshake.
>> +
>> +uapi-header: linux/net/handshake.h
>> +
>> +definitions:
>> +  -
>> +    type: enum
>> +    name: handler-class
>> +    enum-name:
>> +    value-start: 0
>> +    entries: [ none ]
>> +  -
>> +    type: enum
>> +    name: msg-type
>> +    enum-name:
>> +    value-start: 0
>> +    entries: [ unspec, clienthello, serverhello ]
>> +  -
>> +    type: enum
>> +    name: auth
>> +    enum-name:
>> +    value-start: 0
>> +    entries: [ unspec, unauth, x509, psk ]
>> +
>> +attribute-sets:
>> +  -
>> +    name: accept
>> +    attributes:
>> +      -
>> +        name: status
>> +        doc: Status of this accept operation
>> +        type: u32
>> +        value: 1
>> +      -
>> +        name: sockfd
>> +        doc: File descriptor of socket to use
>> +        type: u32
>> +      -
>> +        name: handler-class
>> +        doc: Which type of handler is responding
>> +        type: u32
>> +        enum: handler-class
>> +      -
>> +        name: message-type
>> +        doc: Handshake message type
>> +        type: u32
>> +        enum: msg-type
>> +      -
>> +        name: auth
>> +        doc: Authentication mode
>> +        type: u32
>> +        enum: auth
>> +      -
>> +        name: gnutls-priorities
>> +        doc: GnuTLS priority string
>> +        type: string
>> +      -
>> +        name: my-peerid
>> +        doc: Serial no of key containing local identity
>> +        type: u32
>> +      -
>> +        name: my-privkey
>> +        doc: Serial no of key containing optional private key
>> +        type: u32
>> +  -
>> +    name: done
>> +    attributes:
>> +      -
>> +        name: status
>> +        doc: Session status
>> +        type: u32
>> +        value: 1
>> +      -
>> +        name: sockfd
>> +        doc: File descriptor of socket that has completed
>> +        type: u32
>> +      -
>> +        name: remote-peerid
>> +        doc: Serial no of keys containing identities of remote peer
>> +        type: u32
>> +
>> +operations:
>> +  list:
>> +    -
>> +      name: ready
>> +      doc: Notify handlers that a new handshake request is waiting
>> +      value: 1
>> +      notify: accept
>> +    -
>> +      name: accept
>> +      doc: Handler retrieves next queued handshake request
>> +      attribute-set: accept
>> +      flags: [ admin-perm ]
>> +      do:
>> +        request:
>> +          attributes:
>> +            - handler-class
>> +        reply:
>> +          attributes:
>> +            - status
>> +            - sockfd
>> +            - message-type
>> +            - auth
>> +            - gnutls-priorities
>> +            - my-peerid
>> +            - my-privkey
>> +    -
>> +      name: done
>> +      doc: Handler reports handshake completion
>> +      attribute-set: done
>> +      do:
>> +        request:
>> +          attributes:
>> +            - status
>> +            - sockfd
>> +            - remote-peerid
>> +
>> +mcast-groups:
>> +  list:
>> +    -
>> +      name: none
>> diff --git a/include/net/handshake.h b/include/net/handshake.h
>> new file mode 100644
>> index 000000000000..08f859237936
>> --- /dev/null
>> +++ b/include/net/handshake.h
>> @@ -0,0 +1,45 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Generic HANDSHAKE service.
>> + *
>> + * Author: Chuck Lever <chuck.lever@oracle.com>
>> + *
>> + * Copyright (c) 2023, Oracle and/or its affiliates.
>> + */
>> +
>> +/*
>> + * Data structures and functions that are visible only within the
>> + * kernel are declared here.
>> + */
>> +
>> +#ifndef _NET_HANDSHAKE_H
>> +#define _NET_HANDSHAKE_H
>> +
>> +struct handshake_req;
>> +
>> +/*
>> + * Invariants for all handshake requests for one transport layer
>> + * security protocol
>> + */
>> +struct handshake_proto {
>> +	int			hp_handler_class;
>> +	size_t			hp_privsize;
>> +
>> +	int			(*hp_accept)(struct handshake_req *req,
>> +					     struct genl_info *gi, int fd);
>> +	void			(*hp_done)(struct handshake_req *req,
>> +					   int status, struct nlattr **tb);
>> +	void			(*hp_destroy)(struct handshake_req *req);
>> +};
>> +
>> +extern struct handshake_req *
>> +handshake_req_alloc(struct socket *sock, const struct handshake_proto *proto,
>> +		    gfp_t flags);
>> +extern void *handshake_req_private(struct handshake_req *req);
>> +extern int handshake_req_submit(struct handshake_req *req, gfp_t flags);
>> +extern int handshake_req_cancel(struct socket *sock);
>> +
>> +extern struct nlmsghdr *handshake_genl_put(struct sk_buff *msg,
>> +					   struct genl_info *gi);
>> +
>> +#endif /* _NET_HANDSHAKE_H */
>> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
>> index 78beaa765c73..a0ce9de4dab1 100644
>> --- a/include/net/net_namespace.h
>> +++ b/include/net/net_namespace.h
>> @@ -188,6 +188,11 @@ struct net {
>>  #if IS_ENABLED(CONFIG_SMC)
>>  	struct netns_smc	smc;
>>  #endif
>> +
>> +	/* transport layer security handshake requests */
>> +	spinlock_t		hs_lock;
>> +	struct list_head	hs_requests;
>> +	int			hs_pending;
>>  } __randomize_layout;
>>    #include <linux/seq_file_net.h>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index 573f2bf7e0de..2a7345ce2540 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -519,6 +519,7 @@ struct sock {
>>    	struct socket		*sk_socket;
>>  	void			*sk_user_data;
>> +	void			*sk_handshake_req;
>>  #ifdef CONFIG_SECURITY
>>  	void			*sk_security;
>>  #endif
>> diff --git a/include/trace/events/handshake.h b/include/trace/events/handshake.h
>> new file mode 100644
>> index 000000000000..feffcd1d6256
>> --- /dev/null
>> +++ b/include/trace/events/handshake.h
>> @@ -0,0 +1,159 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM handshake
>> +
>> +#if !defined(_TRACE_HANDSHAKE_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_HANDSHAKE_H
>> +
>> +#include <linux/net.h>
>> +#include <linux/tracepoint.h>
>> +
>> +DECLARE_EVENT_CLASS(handshake_event_class,
>> +	TP_PROTO(
>> +		const struct net *net,
>> +		const struct handshake_req *req,
>> +		const struct socket *sock
>> +	),
>> +	TP_ARGS(net, req, sock),
>> +	TP_STRUCT__entry(
>> +		__field(const void *, req)
>> +		__field(const void *, sock)
>> +		__field(unsigned int, netns_ino)
>> +	),
>> +	TP_fast_assign(
>> +		__entry->req = req;
>> +		__entry->sock = sock;
>> +		__entry->netns_ino = net->ns.inum;
>> +	),
>> +	TP_printk("req=%p sock=%p",
>> +		__entry->req, __entry->sock
>> +	)
>> +);
>> +#define DEFINE_HANDSHAKE_EVENT(name)				\
>> +	DEFINE_EVENT(handshake_event_class, name,		\
>> +		TP_PROTO(					\
>> +			const struct net *net,			\
>> +			const struct handshake_req *req,	\
>> +			const struct socket *sock		\
>> +		),						\
>> +		TP_ARGS(net, req, sock))
>> +
>> +DECLARE_EVENT_CLASS(handshake_fd_class,
>> +	TP_PROTO(
>> +		const struct net *net,
>> +		const struct handshake_req *req,
>> +		const struct socket *sock,
>> +		int fd
>> +	),
>> +	TP_ARGS(net, req, sock, fd),
>> +	TP_STRUCT__entry(
>> +		__field(const void *, req)
>> +		__field(const void *, sock)
>> +		__field(int, fd)
>> +		__field(unsigned int, netns_ino)
>> +	),
>> +	TP_fast_assign(
>> +		__entry->req = req;
>> +		__entry->sock = req->hr_sock;
>> +		__entry->fd = fd;
>> +		__entry->netns_ino = net->ns.inum;
>> +	),
>> +	TP_printk("req=%p sock=%p fd=%d",
>> +		__entry->req, __entry->sock, __entry->fd
>> +	)
>> +);
>> +#define DEFINE_HANDSHAKE_FD_EVENT(name)				\
>> +	DEFINE_EVENT(handshake_fd_class, name,			\
>> +		TP_PROTO(					\
>> +			const struct net *net,			\
>> +			const struct handshake_req *req,	\
>> +			const struct socket *sock,		\
>> +			int fd					\
>> +		),						\
>> +		TP_ARGS(net, req, sock, fd))
>> +
>> +DECLARE_EVENT_CLASS(handshake_error_class,
>> +	TP_PROTO(
>> +		const struct net *net,
>> +		const struct handshake_req *req,
>> +		const struct socket *sock,
>> +		int err
>> +	),
>> +	TP_ARGS(net, req, sock, err),
>> +	TP_STRUCT__entry(
>> +		__field(const void *, req)
>> +		__field(const void *, sock)
>> +		__field(int, err)
>> +		__field(unsigned int, netns_ino)
>> +	),
>> +	TP_fast_assign(
>> +		__entry->req = req;
>> +		__entry->sock = sock;
>> +		__entry->err = err;
>> +		__entry->netns_ino = net->ns.inum;
>> +	),
>> +	TP_printk("req=%p sock=%p err=%d",
>> +		__entry->req, __entry->sock, __entry->err
>> +	)
>> +);
>> +#define DEFINE_HANDSHAKE_ERROR(name)				\
>> +	DEFINE_EVENT(handshake_error_class, name,		\
>> +		TP_PROTO(					\
>> +			const struct net *net,			\
>> +			const struct handshake_req *req,	\
>> +			const struct socket *sock,		\
>> +			int err					\
>> +		),						\
>> +		TP_ARGS(net, req, sock, err))
>> +
>> +
>> +/**
>> + ** Request lifetime events
>> + **/
>> +
>> +DEFINE_HANDSHAKE_EVENT(handshake_submit);
>> +DEFINE_HANDSHAKE_ERROR(handshake_submit_err);
>> +DEFINE_HANDSHAKE_EVENT(handshake_cancel);
>> +DEFINE_HANDSHAKE_EVENT(handshake_cancel_none);
>> +DEFINE_HANDSHAKE_EVENT(handshake_cancel_busy);
>> +DEFINE_HANDSHAKE_EVENT(handshake_destruct);
>> +
>> +
>> +TRACE_EVENT(handshake_complete,
>> +	TP_PROTO(
>> +		const struct net *net,
>> +		const struct handshake_req *req,
>> +		const struct socket *sock,
>> +		int status
>> +	),
>> +	TP_ARGS(net, req, sock, status),
>> +	TP_STRUCT__entry(
>> +		__field(const void *, req)
>> +		__field(const void *, sock)
>> +		__field(int, status)
>> +		__field(unsigned int, netns_ino)
>> +	),
>> +	TP_fast_assign(
>> +		__entry->req = req;
>> +		__entry->sock = sock;
>> +		__entry->status = status;
>> +		__entry->netns_ino = net->ns.inum;
>> +	),
>> +	TP_printk("req=%p sock=%p status=%d",
>> +		__entry->req, __entry->sock, __entry->status
>> +	)
>> +);
>> +
>> +/**
>> + ** Netlink events
>> + **/
>> +
>> +DEFINE_HANDSHAKE_ERROR(handshake_notify_err);
>> +DEFINE_HANDSHAKE_FD_EVENT(handshake_cmd_accept);
>> +DEFINE_HANDSHAKE_ERROR(handshake_cmd_accept_err);
>> +DEFINE_HANDSHAKE_FD_EVENT(handshake_cmd_done);
>> +DEFINE_HANDSHAKE_ERROR(handshake_cmd_done_err);
>> +
>> +#endif /* _TRACE_HANDSHAKE_H */
>> +
>> +#include <trace/define_trace.h>
>> diff --git a/include/uapi/linux/handshake.h b/include/uapi/linux/handshake.h
>> new file mode 100644
>> index 000000000000..09fd7c37cba4
>> --- /dev/null
>> +++ b/include/uapi/linux/handshake.h
>> @@ -0,0 +1,63 @@
>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>> +/* Do not edit directly, auto-generated from: */
>> +/*	Documentation/netlink/specs/handshake.yaml */
>> +/* YNL-GEN uapi header */
>> +
>> +#ifndef _UAPI_LINUX_HANDSHAKE_H
>> +#define _UAPI_LINUX_HANDSHAKE_H
>> +
>> +#define HANDSHAKE_FAMILY_NAME		"handshake"
>> +#define HANDSHAKE_FAMILY_VERSION	1
>> +
>> +enum {
>> +	HANDSHAKE_HANDLER_CLASS_NONE,
>> +};
>> +
>> +enum {
>> +	HANDSHAKE_MSG_TYPE_UNSPEC,
>> +	HANDSHAKE_MSG_TYPE_CLIENTHELLO,
>> +	HANDSHAKE_MSG_TYPE_SERVERHELLO,
>> +};
>> +
>> +enum {
>> +	HANDSHAKE_AUTH_UNSPEC,
>> +	HANDSHAKE_AUTH_UNAUTH,
>> +	HANDSHAKE_AUTH_X509,
>> +	HANDSHAKE_AUTH_PSK,
>> +};
>> +
>> +enum {
>> +	HANDSHAKE_A_ACCEPT_STATUS = 1,
>> +	HANDSHAKE_A_ACCEPT_SOCKFD,
>> +	HANDSHAKE_A_ACCEPT_HANDLER_CLASS,
>> +	HANDSHAKE_A_ACCEPT_MESSAGE_TYPE,
>> +	HANDSHAKE_A_ACCEPT_AUTH,
>> +	HANDSHAKE_A_ACCEPT_GNUTLS_PRIORITIES,
>> +	HANDSHAKE_A_ACCEPT_MY_PEERID,
>> +	HANDSHAKE_A_ACCEPT_MY_PRIVKEY,
>> +
>> +	__HANDSHAKE_A_ACCEPT_MAX,
>> +	HANDSHAKE_A_ACCEPT_MAX = (__HANDSHAKE_A_ACCEPT_MAX - 1)
>> +};
>> +
>> +enum {
>> +	HANDSHAKE_A_DONE_STATUS = 1,
>> +	HANDSHAKE_A_DONE_SOCKFD,
>> +	HANDSHAKE_A_DONE_REMOTE_PEERID,
>> +
>> +	__HANDSHAKE_A_DONE_MAX,
>> +	HANDSHAKE_A_DONE_MAX = (__HANDSHAKE_A_DONE_MAX - 1)
>> +};
>> +
>> +enum {
>> +	HANDSHAKE_CMD_READY = 1,
>> +	HANDSHAKE_CMD_ACCEPT,
>> +	HANDSHAKE_CMD_DONE,
>> +
>> +	__HANDSHAKE_CMD_MAX,
>> +	HANDSHAKE_CMD_MAX = (__HANDSHAKE_CMD_MAX - 1)
>> +};
>> +
>> +#define HANDSHAKE_MCGRP_NONE	"none"
>> +
>> +#endif /* _UAPI_LINUX_HANDSHAKE_H */
>> diff --git a/net/Makefile b/net/Makefile
>> index 0914bea9c335..adbb64277601 100644
>> --- a/net/Makefile
>> +++ b/net/Makefile
>> @@ -79,3 +79,4 @@ obj-$(CONFIG_NET_NCSI)		+= ncsi/
>>  obj-$(CONFIG_XDP_SOCKETS)	+= xdp/
>>  obj-$(CONFIG_MPTCP)		+= mptcp/
>>  obj-$(CONFIG_MCTP)		+= mctp/
>> +obj-y				+= handshake/
>> diff --git a/net/handshake/Makefile b/net/handshake/Makefile
>> new file mode 100644
>> index 000000000000..a41b03f4837b
>> --- /dev/null
>> +++ b/net/handshake/Makefile
>> @@ -0,0 +1,11 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +#
>> +# Makefile for the Generic HANDSHAKE service
>> +#
>> +# Author: Chuck Lever <chuck.lever@oracle.com>
>> +#
>> +# Copyright (c) 2023, Oracle and/or its affiliates.
>> +#
>> +
>> +obj-y += handshake.o
>> +handshake-y := netlink.o request.o trace.o
>> diff --git a/net/handshake/handshake.h b/net/handshake/handshake.h
>> new file mode 100644
>> index 000000000000..366c7659ec09
>> --- /dev/null
>> +++ b/net/handshake/handshake.h
>> @@ -0,0 +1,41 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Generic netlink handshake service
>> + *
>> + * Author: Chuck Lever <chuck.lever@oracle.com>
>> + *
>> + * Copyright (c) 2023, Oracle and/or its affiliates.
>> + */
>> +
>> +/*
>> + * Data structures and functions that are visible only within the
>> + * handshake module are declared here.
>> + */
>> +
>> +#ifndef _INTERNAL_HANDSHAKE_H
>> +#define _INTERNAL_HANDSHAKE_H
>> +
>> +/*
>> + * One handshake request
>> + */
>> +struct handshake_req {
>> +	struct list_head		hr_list;
>> +	unsigned long			hr_flags;
>> +	const struct handshake_proto	*hr_proto;
>> +	struct socket			*hr_sock;
>> +
>> +	void				(*hr_saved_destruct)(struct sock *sk);
>> +};
>> +
>> +#define HANDSHAKE_F_COMPLETED	BIT(0)
>> +
>> +/* netlink.c */
>> +extern bool handshake_genl_inited;
>> +int handshake_genl_notify(struct net *net, int handler_class, gfp_t flags);
>> +
>> +/* request.c */
>> +void __remove_pending_locked(struct net *net, struct handshake_req *req);
>> +void handshake_complete(struct handshake_req *req, int status,
>> +			struct nlattr **tb);
>> +
>> +#endif /* _INTERNAL_HANDSHAKE_H */
>> diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
>> new file mode 100644
>> index 000000000000..581e382236cf
>> --- /dev/null
>> +++ b/net/handshake/netlink.c
>> @@ -0,0 +1,340 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Generic netlink handshake service
>> + *
>> + * Author: Chuck Lever <chuck.lever@oracle.com>
>> + *
>> + * Copyright (c) 2023, Oracle and/or its affiliates.
>> + */
>> +
>> +#include <linux/types.h>
>> +#include <linux/socket.h>
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/skbuff.h>
>> +#include <linux/inet.h>
>> +
>> +#include <net/sock.h>
>> +#include <net/genetlink.h>
>> +#include <net/handshake.h>
>> +
>> +#include <uapi/linux/handshake.h>
>> +#include <trace/events/handshake.h>
>> +#include "handshake.h"
>> +
>> +static struct genl_family __ro_after_init handshake_genl_family;
>> +bool handshake_genl_inited;
>> +
>> +/**
>> + * handshake_genl_notify - Notify handlers that a request is waiting
>> + * @net: target network namespace
>> + * @handler_class: target handler
>> + * @flags: memory allocation control flags
>> + *
>> + * Returns zero on success or a negative errno if notification failed.
>> + */
>> +int handshake_genl_notify(struct net *net, int handler_class, gfp_t flags)
>> +{
>> +	struct sk_buff *msg;
>> +	void *hdr;
>> +
>> +	if (!genl_has_listeners(&handshake_genl_family, net, handler_class))
>> +		return -ESRCH;
>> +
>> +	msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, flags);
>> +	if (!msg)
>> +		return -ENOMEM;
>> +
>> +	hdr = genlmsg_put(msg, 0, 0, &handshake_genl_family, 0,
>> +			  HANDSHAKE_CMD_READY);
>> +	if (!hdr)
>> +		goto out_free;
>> +
>> +	if (nla_put_u32(msg, HANDSHAKE_A_ACCEPT_HANDLER_CLASS,
>> +			handler_class) < 0) {
>> +		genlmsg_cancel(msg, hdr);
>> +		goto out_free;
>> +	}
>> +
>> +	genlmsg_end(msg, hdr);
>> +	return genlmsg_multicast_netns(&handshake_genl_family, net, msg,
>> +				       0, handler_class, flags);
>> +
>> +out_free:
>> +	nlmsg_free(msg);
>> +	return -EMSGSIZE;
>> +}
>> +
>> +/**
>> + * handshake_genl_put - Create a generic netlink message header
>> + * @msg: buffer in which to create the header
>> + * @gi: generic netlink message context
>> + *
>> + * Returns a ready-to-use header, or NULL.
>> + */
>> +struct nlmsghdr *handshake_genl_put(struct sk_buff *msg, struct genl_info *gi)
>> +{
>> +	return genlmsg_put(msg, gi->snd_portid, gi->snd_seq,
>> +			   &handshake_genl_family, 0, gi->genlhdr->cmd);
>> +}
>> +EXPORT_SYMBOL(handshake_genl_put);
>> +
>> +static int handshake_status_reply(struct sk_buff *skb, struct genl_info *gi,
>> +				  int status)
>> +{
>> +	struct nlmsghdr *hdr;
>> +	struct sk_buff *msg;
>> +	int ret;
>> +
>> +	ret = -ENOMEM;
>> +	msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
>> +	if (!msg)
>> +		goto out;
>> +	hdr = handshake_genl_put(msg, gi);
>> +	if (!hdr)
>> +		goto out_free;
>> +
>> +	ret = -EMSGSIZE;
>> +	ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_STATUS, status);
>> +	if (ret < 0)
>> +		goto out_free;
>> +
>> +	genlmsg_end(msg, hdr);
>> +	return genlmsg_reply(msg, gi);
>> +
>> +out_free:
>> +	genlmsg_cancel(msg, hdr);
>> +out:
>> +	return ret;
>> +}
>> +
>> +/*
>> + * dup() a kernel socket for use as a user space file descriptor
>> + * in the current process.
>> + *
>> + * Implicit argument: "current()"
>> + */
>> +static int handshake_dup(struct socket *kernsock)
>> +{
>> +	struct file *file = get_file(kernsock->file);
>> +	int newfd;
>> +
>> +	newfd = get_unused_fd_flags(O_CLOEXEC);
>> +	if (newfd < 0) {
>> +		fput(file);
>> +		return newfd;
>> +	}
>> +
>> +	fd_install(newfd, file);
>> +	return newfd;
>> +}
>> +
>> +static const struct nla_policy
>> +handshake_accept_nl_policy[HANDSHAKE_A_ACCEPT_HANDLER_CLASS + 1] = {
>> +	[HANDSHAKE_A_ACCEPT_HANDLER_CLASS] = { .type = NLA_U32, },
>> +};
>> +
>> +static int handshake_nl_accept_doit(struct sk_buff *skb, struct genl_info *gi)
>> +{
>> +	struct nlattr *tb[HANDSHAKE_A_ACCEPT_MAX + 1];
>> +	struct net *net = sock_net(skb->sk);
>> +	struct handshake_req *pos, *req;
>> +	int fd, err;
>> +
>> +	err = -EINVAL;
>> +	if (genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
>> +			  HANDSHAKE_A_ACCEPT_HANDLER_CLASS,
>> +			  handshake_accept_nl_policy, NULL))
>> +		goto out_status;
>> +	if (!tb[HANDSHAKE_A_ACCEPT_HANDLER_CLASS])
>> +		goto out_status;
>> +
>> +	req = NULL;
>> +	spin_lock(&net->hs_lock);
>> +	list_for_each_entry(pos, &net->hs_requests, hr_list) {
>> +		if (pos->hr_proto->hp_handler_class !=
>> +		    nla_get_u32(tb[HANDSHAKE_A_ACCEPT_HANDLER_CLASS]))
>> +			continue;
>> +		__remove_pending_locked(net, pos);
>> +		req = pos;
>> +		break;
>> +	}
>> +	spin_unlock(&net->hs_lock);
>> +	if (!req)
>> +		goto out_status;
>> +
>> +	fd = handshake_dup(req->hr_sock);
>> +	if (fd < 0) {
>> +		err = fd;
>> +		goto out_complete;
>> +	}
>> +	err = req->hr_proto->hp_accept(req, gi, fd);
>> +	if (err)
>> +		goto out_complete;
>> +
>> +	trace_handshake_cmd_accept(net, req, req->hr_sock, fd);
>> +	return 0;
>> +
>> +out_complete:
>> +	handshake_complete(req, -EIO, NULL);
>> +	fput(req->hr_sock->file);
>> +out_status:
>> +	trace_handshake_cmd_accept_err(net, req, NULL, err);
>> +	return handshake_status_reply(skb, gi, err);
>> +}
>> +
>> +static const struct nla_policy
>> +handshake_done_nl_policy[HANDSHAKE_A_DONE_MAX + 1] = {
>> +	[HANDSHAKE_A_DONE_SOCKFD] = { .type = NLA_U32, },
>> +	[HANDSHAKE_A_DONE_STATUS] = { .type = NLA_U32, },
>> +	[HANDSHAKE_A_DONE_REMOTE_PEERID] = { .type = NLA_U32, },
>> +};
>> +
>> +static int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *gi)
>> +{
>> +	struct nlattr *tb[HANDSHAKE_A_DONE_MAX + 1];
>> +	struct net *net = sock_net(skb->sk);
>> +	struct socket *sock = NULL;
>> +	struct handshake_req *req;
>> +	int fd, status, err;
>> +
>> +	err = genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
>> +			    HANDSHAKE_A_DONE_MAX, handshake_done_nl_policy,
>> +			    NULL);
>> +	if (err || !tb[HANDSHAKE_A_DONE_SOCKFD]) {
>> +		err = -EINVAL;
>> +		goto out_status;
>> +	}
>> +
>> +	fd = nla_get_u32(tb[HANDSHAKE_A_DONE_SOCKFD]);
>> +
>> +	err = 0;
>> +	sock = sockfd_lookup(fd, &err);
>> +	if (err) {
>> +		err = -EBADF;
>> +		goto out_status;
>> +	}
>> +
>> +	req = sock->sk->sk_handshake_req;
>> +	if (!req) {
>> +		err = -EBUSY;
>> +		goto out_status;
>> +	}
>> +
>> +	trace_handshake_cmd_done(net, req, sock, fd);
>> +
>> +	status = -EIO;
>> +	if (tb[HANDSHAKE_A_DONE_STATUS])
>> +		status = nla_get_u32(tb[HANDSHAKE_A_DONE_STATUS]);
>> +
> And this makes me ever so slightly uneasy.
> 
> As 'status' is a netlink attribute it's inevitably defined as 'unsigned'.
> Yet we assume that 'status' is a negative number, leaving us _technically_ in uncharted territory.

Ah, that's an oversight.


> And that is notwithstanding the problem that we haven't even defined _what_ should be in the status attribute.

It's now an errno value.


> Reading the code I assume that it's either '0' for success or a negative number (ie the error code) on failure.
> Which implicitly means that we _never_ set a positive number here.
> So what would we lose if we declare 'status' to carry the _positive_ error number instead?
> It would bring us in line with the actual netlink attribute definition, we wouldn't need to worry about possible integer overflows, yadda yadda...
> 
> Hmm?

It can also be argued that errnos in user space are positive-valued,
therefore, this user space visible protocol should use a positive
errno.
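
To make the positive-errno idea concrete, here is a rough user-space
sketch of the conversion the kernel side could apply to the u32
attribute. The names demo_status_to_errno() and DEMO_MAX_ERRNO are
made up for illustration (the bound is an assumption that mirrors the
kernel's MAX_ERRNO), and mapping out-of-range values to -EIO is just
one possible policy, not something the patch does today.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed upper bound for a valid errno, mirroring the kernel's MAX_ERRNO. */
#define DEMO_MAX_ERRNO 4095u

static_assert(DEMO_MAX_ERRNO <= INT32_MAX, "errno range must fit in an int");

/*
 * Hypothetical helper: convert the unsigned status attribute sent by
 * the handler (0 on success, positive errno on failure) into the
 * negative errno convention used inside the kernel. Values outside
 * the errno range are reported as a generic I/O failure.
 */
int demo_status_to_errno(uint32_t status)
{
	if (status > DEMO_MAX_ERRNO)
		return -EIO;
	return -(int)status;
}
```

With this shape, a handler that still sends a negative value cast to
u32 lands outside the valid range and is reported as -EIO rather than
being reinterpreted through a signed cast.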


>> +	handshake_complete(req, status, tb);
>> +	fput(sock->file);
>> +	return 0;
>> +
>> +out_status:
>> +	trace_handshake_cmd_done_err(net, req, sock, err);
>> +	return handshake_status_reply(skb, gi, err);
>> +}
>> +
>> +static const struct genl_split_ops handshake_nl_ops[] = {
>> +	{
>> +		.cmd		= HANDSHAKE_CMD_ACCEPT,
>> +		.doit		= handshake_nl_accept_doit,
>> +		.policy		= handshake_accept_nl_policy,
>> +		.maxattr	= HANDSHAKE_A_ACCEPT_HANDLER_CLASS,
>> +		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
>> +	},
>> +	{
>> +		.cmd		= HANDSHAKE_CMD_DONE,
>> +		.doit		= handshake_nl_done_doit,
>> +		.policy		= handshake_done_nl_policy,
>> +		.maxattr	= HANDSHAKE_A_DONE_REMOTE_PEERID,
>> +		.flags		= GENL_CMD_CAP_DO,
>> +	},
>> +};
>> +
>> +static const struct genl_multicast_group handshake_nl_mcgrps[] = {
>> +	[HANDSHAKE_HANDLER_CLASS_NONE] = { .name = HANDSHAKE_MCGRP_NONE, },
>> +};
>> +
>> +static struct genl_family __ro_after_init handshake_genl_family = {
>> +	.hdrsize		= 0,
>> +	.name			= HANDSHAKE_FAMILY_NAME,
>> +	.version		= HANDSHAKE_FAMILY_VERSION,
>> +	.netnsok		= true,
>> +	.parallel_ops		= true,
>> +	.n_mcgrps		= ARRAY_SIZE(handshake_nl_mcgrps),
>> +	.n_split_ops		= ARRAY_SIZE(handshake_nl_ops),
>> +	.split_ops		= handshake_nl_ops,
>> +	.mcgrps			= handshake_nl_mcgrps,
>> +	.module			= THIS_MODULE,
>> +};
>> +
>> +static int __net_init handshake_net_init(struct net *net)
>> +{
>> +	spin_lock_init(&net->hs_lock);
>> +	INIT_LIST_HEAD(&net->hs_requests);
>> +	net->hs_pending	= 0;
>> +	return 0;
>> +}
>> +
>> +static void __net_exit handshake_net_exit(struct net *net)
>> +{
>> +	struct handshake_req *req;
>> +	LIST_HEAD(requests);
>> +
>> +	/*
>> +	 * This drains the net's pending list. Requests that
>> +	 * have been accepted and are in progress will be
>> +	 * destroyed when the socket is closed.
>> +	 */
>> +	spin_lock(&net->hs_lock);
>> +	list_splice_init(&net->hs_requests, &requests);
>> +	spin_unlock(&net->hs_lock);
>> +
>> +	while (!list_empty(&requests)) {
>> +		req = list_first_entry(&requests, struct handshake_req, hr_list);
>> +		list_del(&req->hr_list);
>> +
>> +		/*
>> +		 * Requests on this list have not yet been
>> +		 * accepted, so they do not have an fd to put.
>> +		 */
>> +
>> +		handshake_complete(req, -ETIMEDOUT, NULL);
>> +	}
>> +}
>> +
>> +static struct pernet_operations handshake_genl_net_ops = {
>> +	.init		= handshake_net_init,
>> +	.exit		= handshake_net_exit,
>> +};
>> +
>> +static int __init handshake_init(void)
>> +{
>> +	int ret;
>> +
>> +	ret = genl_register_family(&handshake_genl_family);
>> +	if (ret) {
>> +		pr_warn("handshake: netlink registration failed (%d)\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	ret = register_pernet_subsys(&handshake_genl_net_ops);
>> +	if (ret) {
>> +		pr_warn("handshake: pernet registration failed (%d)\n", ret);
>> +		genl_unregister_family(&handshake_genl_family);
>> +	}
>> +
>> +	handshake_genl_inited = true;
>> +	return ret;
>> +}
>> +
>> +static void __exit handshake_exit(void)
>> +{
>> +	unregister_pernet_subsys(&handshake_genl_net_ops);
>> +	genl_unregister_family(&handshake_genl_family);
>> +}
>> +
>> +module_init(handshake_init);
>> +module_exit(handshake_exit);
>> diff --git a/net/handshake/request.c b/net/handshake/request.c
>> new file mode 100644
>> index 000000000000..1d3b8e76dd2c
>> --- /dev/null
>> +++ b/net/handshake/request.c
>> @@ -0,0 +1,246 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Handshake request lifetime events
>> + *
>> + * Author: Chuck Lever <chuck.lever@oracle.com>
>> + *
>> + * Copyright (c) 2023, Oracle and/or its affiliates.
>> + */
>> +
>> +#include <linux/types.h>
>> +#include <linux/socket.h>
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/skbuff.h>
>> +#include <linux/inet.h>
>> +#include <linux/fdtable.h>
>> +
>> +#include <net/sock.h>
>> +#include <net/genetlink.h>
>> +#include <net/handshake.h>
>> +
>> +#include <uapi/linux/handshake.h>
>> +#include <trace/events/handshake.h>
>> +#include "handshake.h"
>> +
>> +/*
>> + * This limit is to prevent slow remotes from causing denial of service.
>> + * A ulimit-style tunable might be used instead.
>> + */
>> +#define HANDSHAKE_PENDING_MAX (10)
>> +
>> +static void __add_pending_locked(struct net *net, struct handshake_req *req)
>> +{
>> +	net->hs_pending++;
>> +	list_add_tail(&req->hr_list, &net->hs_requests);
>> +}
>> +
>> +void __remove_pending_locked(struct net *net, struct handshake_req *req)
>> +{
>> +	net->hs_pending--;
>> +	list_del_init(&req->hr_list);
>> +}
>> +
>> +/*
>> + * Return values:
>> + *   %true - the request was found on @net's pending list
>> + *   %false - the request was not found on @net's pending list
>> + *
>> + * If @req was on a pending list, it has not yet been accepted.
>> + */
>> +static bool remove_pending(struct net *net, struct handshake_req *req)
>> +{
>> +	bool ret;
>> +
>> +	ret = false;
>> +
>> +	spin_lock(&net->hs_lock);
>> +	if (!list_empty(&req->hr_list)) {
>> +		__remove_pending_locked(net, req);
>> +		ret = true;
>> +	}
>> +	spin_unlock(&net->hs_lock);
>> +
>> +	return ret;
>> +}
>> +
>> +static void handshake_req_destroy(struct handshake_req *req, struct sock *sk)
>> +{
>> +	req->hr_proto->hp_destroy(req);
>> +	sk->sk_handshake_req = NULL;
>> +	kfree(req);
>> +}
>> +
>> +static void handshake_sk_destruct(struct sock *sk)
>> +{
>> +	struct handshake_req *req = sk->sk_handshake_req;
>> +
>> +	if (req) {
>> +		trace_handshake_destruct(sock_net(sk), req, req->hr_sock);
>> +		handshake_req_destroy(req, sk);
>> +	}
>> +}
>> +
>> +/**
>> + * handshake_req_alloc - consumer API to allocate a request
>> + * @sock: open socket on which to perform a handshake
>> + * @proto: security protocol
>> + * @flags: memory allocation flags
>> + *
>> + * Returns an initialized handshake_req or NULL.
>> + */
>> +struct handshake_req *handshake_req_alloc(struct socket *sock,
>> +					  const struct handshake_proto *proto,
>> +					  gfp_t flags)
>> +{
>> +	struct handshake_req *req;
>> +
>> +	/* Avoid accessing uninitialized global variables later on */
>> +	if (!handshake_genl_inited)
>> +		return NULL;
>> +
>> +	req = kzalloc(sizeof(*req) + proto->hp_privsize, flags);
>> +	if (!req)
>> +		return NULL;
>> +
>> +	sock_hold(sock->sk);
>> +
>> +	INIT_LIST_HEAD(&req->hr_list);
>> +	req->hr_sock = sock;
>> +	req->hr_proto = proto;
>> +	return req;
>> +}
>> +EXPORT_SYMBOL(handshake_req_alloc);
>> +
>> +/**
>> + * handshake_req_private - consumer API to return per-handshake private data
>> + * @req: handshake arguments
>> + *
>> + */
>> +void *handshake_req_private(struct handshake_req *req)
>> +{
>> +	return (void *)(req + 1);
>> +}
>> +EXPORT_SYMBOL(handshake_req_private);
>> +
>> +/**
>> + * handshake_req_submit - consumer API to submit a handshake request
>> + * @req: handshake arguments
>> + * @flags: memory allocation flags
>> + *
>> + * Return values:
>> + *   %0: Request queued
>> + *   %-EBUSY: A handshake is already under way for this socket
>> + *   %-ESRCH: No handshake agent is available
>> + *   %-EAGAIN: Too many pending handshake requests
>> + *   %-ENOMEM: Failed to allocate memory
>> + *   %-EMSGSIZE: Failed to construct notification message
>> + *
>> + * A zero return value from handshake_req_submit() means that
>> + * exactly one subsequent completion callback is guaranteed.
>> + *
>> + * A negative return value from handshake_req_submit() means that
>> + * no completion callback will be done and that @req is
>> + * destroyed.
>> + */
>> +int handshake_req_submit(struct handshake_req *req, gfp_t flags)
>> +{
>> +	struct socket *sock = req->hr_sock;
>> +	struct sock *sk = sock->sk;
>> +	struct net *net = sock_net(sk);
>> +	int ret;
>> +
>> +	ret = -EAGAIN;
>> +	if (READ_ONCE(net->hs_pending) >= HANDSHAKE_PENDING_MAX)
>> +		goto out_err;
>> +
>> +	ret = -EBUSY;
>> +	spin_lock(&net->hs_lock);
>> +	if (sk->sk_handshake_req || !list_empty(&req->hr_list)) {
>> +		spin_unlock(&net->hs_lock);
>> +		goto out_err;
>> +	}
>> +	req->hr_saved_destruct = sk->sk_destruct;
>> +	sk->sk_destruct = handshake_sk_destruct;
>> +	sk->sk_handshake_req = req;
>> +	__add_pending_locked(net, req);
>> +	spin_unlock(&net->hs_lock);
>> +
>> +	ret = handshake_genl_notify(net, req->hr_proto->hp_handler_class,
>> +				    flags);
>> +	if (ret) {
>> +		trace_handshake_notify_err(net, req, sock, ret);
>> +		if (remove_pending(net, req))
>> +			goto out_err;
>> +	}
>> +
>> +	trace_handshake_submit(net, req, sock);
>> +	return 0;
>> +
>> +out_err:
>> +	trace_handshake_submit_err(net, req, sock, ret);
>> +	handshake_req_destroy(req, sk);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(handshake_req_submit);
>> +
>> +void handshake_complete(struct handshake_req *req, int status,
>> +			struct nlattr **tb)
>> +{
>> +	struct socket *sock = req->hr_sock;
>> +	struct net *net = sock_net(sock->sk);
>> +
>> +	if (!test_and_set_bit(HANDSHAKE_F_COMPLETED, &req->hr_flags)) {
>> +		trace_handshake_complete(net, req, sock, status);
>> +		req->hr_proto->hp_done(req, status, tb);
>> +		__sock_put(sock->sk);
>> +	}
>> +}
>> +
>> +/**
>> + * handshake_req_cancel - consumer API to cancel an in-progress handshake
>> + * @sock: socket on which there is an ongoing handshake
>> + *
>> + * XXX: Perhaps killing the user space agent might also be necessary?
> 
> I thought we had agreed that we would be sending a signal to the userspace process?

We had discussed killing the handler, but I don't think it's necessary.
I'd rather not do something that drastic unless we have no other choice.
So far my testing hasn't shown a need for killing the child process.

I'm also concerned that the kernel could reuse the handler's process ID.
handshake_req_cancel would kill something that is not a handshake agent.


> Ideally we would be sending a SIGHUP, wait for some time on the userspace process to respond with a 'done' message, and send a 'KILL' signal if we haven't received one.
> 
> Obs: Sending a KILL signal would imply that userspace is able to cope with children dying. Which pretty much excludes pthreads, I would think.
> 
> Guess I'll have to consult Stevens :-)

Basically what cancel does is atomically disarm the "done" callback.

The socket belongs to the kernel, so it will live until the kernel is
good and through with it.
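
A minimal user-space sketch of that "disarm" idea, in case it helps the
discussion. It is not the kernel code: struct demo_req and the demo_*
functions are hypothetical, and a C11 atomic_flag stands in for
test_and_set_bit() on HANDSHAKE_F_COMPLETED. Whichever path sets the
flag first wins, so the done callback runs at most once and never runs
after a successful cancel.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct handshake_req. */
struct demo_req {
	atomic_flag completed;	/* plays the role of HANDSHAKE_F_COMPLETED */
	void (*done)(struct demo_req *req, int status);
	int done_calls;		/* how many times the callback fired */
	int last_status;	/* status passed to the last callback */
};

void demo_req_init(struct demo_req *req,
		   void (*done)(struct demo_req *req, int status))
{
	atomic_flag_clear(&req->completed);
	req->done = done;
	req->done_calls = 0;
	req->last_status = 0;
}

/* Completion path: fire the callback only if the flag was still clear. */
void demo_complete(struct demo_req *req, int status)
{
	assert(req->done != NULL);
	if (!atomic_flag_test_and_set(&req->completed))
		req->done(req, status);
}

/* Cancel path: 0 if the callback was disarmed, -EBUSY if it already ran. */
int demo_cancel(struct demo_req *req)
{
	if (atomic_flag_test_and_set(&req->completed))
		return -EBUSY;
	return 0;
}

/* Example callback that only records what it was told. */
void demo_count_done(struct demo_req *req, int status)
{
	req->done_calls++;
	req->last_status = status;
}
```

Nothing here needs to signal or kill the agent: a late "done" from
user space simply finds the flag already set and is ignored.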


>> + *
>> + * Request cancellation races with request completion. To determine
>> + * who won, callers examine the return value from this function.
>> + *
>> + * Return values:
>> + *   %0 - Uncompleted handshake request was canceled or not found
>> + *   %-EBUSY - Handshake request already completed
> 
> EBUSY? Wouldn't EAGAIN be more appropriate?

I don't think EAGAIN would be appropriate at all. The situation
is that the handshake completed, so there's no need to call cancel
again. Its synonym, EWOULDBLOCK, is also not a good semantic fit.


> After all, the request is everything _but_ busy...

I'm open to suggestion.

One option is to use a boolean return value instead of an errno.
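
Roughly, the boolean variant might look like the sketch below from a
consumer's point of view. Everything in it is hypothetical (struct
demo_hs, demo_hs_cancel(), demo_teardown()), and it is single-threaded
for brevity, so the plain bool stands in for what would have to be an
atomic test-and-set in real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the request state; not the kernel structure. */
struct demo_hs {
	bool completed;		/* the done callback has already run */
	bool resources_held;	/* consumer state tied to the handshake */
};

/*
 * Hypothetical boolean variant of cancel:
 *   true  - the request was canceled; the done callback will not run
 *   false - the handshake already completed; nothing was canceled
 */
bool demo_hs_cancel(struct demo_hs *hs)
{
	if (hs->completed)
		return false;
	hs->completed = true;
	return true;
}

/* Caller-side teardown: release state only if the cancel won the race. */
void demo_teardown(struct demo_hs *hs)
{
	assert(hs != NULL);
	if (demo_hs_cancel(hs))
		hs->resources_held = false;
	/* otherwise the done callback owns the release */
}
```

The caller then only has to ask "did I win?", which sidesteps the
question of which errno best describes losing the race.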


>> + */
>> +int handshake_req_cancel(struct socket *sock)
>> +{
>> +	struct handshake_req *req;
>> +	struct sock *sk;
>> +	struct net *net;
>> +
>> +	if (!sock)
>> +		return 0;
>> +
>> +	sk = sock->sk;
>> +	req = sk->sk_handshake_req;
>> +	net = sock_net(sk);
>> +
>> +	if (!req) {
>> +		trace_handshake_cancel_none(net, req, sock);
>> +		return 0;
>> +	}
>> +
>> +	if (remove_pending(net, req)) {
>> +		/* Request hadn't been accepted */
>> +		trace_handshake_cancel(net, req, sock);
>> +		return 0;
>> +	}
>> +	if (test_and_set_bit(HANDSHAKE_F_COMPLETED, &req->hr_flags)) {
>> +		/* Request already completed */
>> +		trace_handshake_cancel_busy(net, req, sock);
>> +		return -EBUSY;
>> +	}
>> +
>> +	__sock_put(sk);
>> +	trace_handshake_cancel(net, req, sock);
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(handshake_req_cancel);
>> diff --git a/net/handshake/trace.c b/net/handshake/trace.c
>> new file mode 100644
>> index 000000000000..3a5b6f29a2b8
>> --- /dev/null
>> +++ b/net/handshake/trace.c
>> @@ -0,0 +1,17 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Trace points for transport security layer handshakes.
>> + *
>> + * Author: Chuck Lever <chuck.lever@oracle.com>
>> + *
>> + * Copyright (c) 2023, Oracle and/or its affiliates.
>> + */
>> +
>> +#include <linux/types.h>
>> +#include <net/sock.h>
>> +
>> +#include "handshake.h"
>> +
>> +#define CREATE_TRACE_POINTS
>> +
>> +#include <trace/events/handshake.h>
> Cheers,
> 
> Hannes
> 
> 
> 

--
Chuck Lever





* Re: [PATCH v5 2/2] net/tls: Add kernel APIs for requesting a TLSv1.3 handshake
  2023-02-27  9:36   ` Hannes Reinecke
@ 2023-02-27 15:01     ` Chuck Lever III
  0 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever III @ 2023-02-27 15:01 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Chuck Lever, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	open list:NETWORKING [GENERAL],
	kernel-tls-handshake



> On Feb 27, 2023, at 4:36 AM, Hannes Reinecke <hare@suse.de> wrote:
> 
> On 2/24/23 20:19, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> To enable kernel consumers of TLS to request a TLS handshake, add
>> support to net/tls/ to send a handshake upcall. This patch also
>> acts as a template for adding handshake upcall support to other
>> transport layer security mechanisms.
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  Documentation/netlink/specs/handshake.yaml |    4
>>  Documentation/networking/index.rst         |    1
>>  Documentation/networking/tls-handshake.rst |  146 ++++++++++
>>  include/net/tls.h                          |   27 ++
>>  include/uapi/linux/handshake.h             |    2
>>  net/handshake/netlink.c                    |    1
>>  net/tls/Makefile                           |    2
>>  net/tls/tls_handshake.c                    |  423 ++++++++++++++++++++++++++++
>>  8 files changed, 604 insertions(+), 2 deletions(-)
>>  create mode 100644 Documentation/networking/tls-handshake.rst
>>  create mode 100644 net/tls/tls_handshake.c
>> diff --git a/Documentation/netlink/specs/handshake.yaml b/Documentation/netlink/specs/handshake.yaml
>> index 683a8f2df0a7..c2f6bfff2326 100644
>> --- a/Documentation/netlink/specs/handshake.yaml
>> +++ b/Documentation/netlink/specs/handshake.yaml
>> @@ -21,7 +21,7 @@ definitions:
>>      name: handler-class
>>      enum-name:
>>      value-start: 0
>> -    entries: [ none ]
>> +    entries: [ none, tlshd ]
>>    -
>>      type: enum
>>      name: msg-type
>> @@ -132,3 +132,5 @@ mcast-groups:
>>    list:
>>      -
>>        name: none
>> +    -
>> +      name: tlshd
>> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
>> index 4ddcae33c336..189517f4ea96 100644
>> --- a/Documentation/networking/index.rst
>> +++ b/Documentation/networking/index.rst
>> @@ -36,6 +36,7 @@ Contents:
>>     scaling
>>     tls
>>     tls-offload
>> +   tls-handshake
>>     nfc
>>     6lowpan
>>     6pack
>> diff --git a/Documentation/networking/tls-handshake.rst b/Documentation/networking/tls-handshake.rst
>> new file mode 100644
>> index 000000000000..f09fc6c09580
>> --- /dev/null
>> +++ b/Documentation/networking/tls-handshake.rst
>> @@ -0,0 +1,146 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +=======================
>> +In-Kernel TLS Handshake
>> +=======================
>> +
>> +Overview
>> +========
>> +
>> +Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs
>> +over TCP. TLS provides end-to-end data integrity and confidentiality,
>> +in addition to peer authentication.
>> +
>> +The kernel's kTLS implementation handles the TLS record subprotocol, but
>> +does not handle the TLS handshake subprotocol which is used to establish
>> +a TLS session. Kernel consumers can use the API described here to
>> +request TLS session establishment.
>> +
>> +There are several possible ways to provide a handshake service in the
>> +kernel. The API described here is designed to hide the details of those
>> +implementations so that in-kernel TLS consumers do not need to be
>> +aware of how the handshake gets done.
>> +
>> +
>> +User handshake agent
>> +====================
>> +
>> +As of this writing, there is no TLS handshake implementation in the
>> +Linux kernel. Thus, with the current implementation, a user agent is
>> +started in each network namespace where a kernel consumer might require
>> +a TLS handshake. This agent listens for events sent from the kernel
>> +that request a handshake on an open and connected TCP socket.
>> +
>> +The open socket is passed to user space via a netlink operation, which
>> +creates a socket descriptor in the agent's file descriptor table. If the
>> +handshake completes successfully, the user agent promotes the socket to
>> +use the TLS ULP and sets the session information using the SOL_TLS socket
>> +options. The user agent returns the socket to the kernel via a second
>> +netlink operation.
>> +
>> +
>> +Kernel Handshake API
>> +====================
>> +
>> +A kernel TLS consumer initiates a client-side TLS handshake on an open
>> +socket by invoking one of the tls_client_hello() functions. For example:
>> +
>> +.. code-block:: c
>> +
>> +  ret = tls_client_hello_x509(sock, done_func, cookie, priorities,
>> +                              cert, privkey);
>> +
>> +The function returns zero when the handshake request is under way. A
>> +zero return guarantees the callback function @done_func will be invoked
>> +for this socket.
>> +
>> +The function returns a negative errno if the handshake could not be
>> +started. A negative errno guarantees the callback function @done_func
>> +will not be invoked on this socket.
>> +
>> +The @sock argument is an open and connected socket. The caller must hold
>> +a reference on the socket to prevent it from being destroyed while the
>> +handshake is in progress.
>> +
>> +@done_func and @cookie are a callback function that is invoked when the
>> +handshake has completed. The success status of the handshake is returned
>> +via the @status parameter of the callback function. A good practice is
>> +to close and destroy the socket immediately if the handshake has failed.
>> +
>> +@priorities is a GnuTLS priorities string that controls the handshake.
>> +The special value TLS_DEFAULT_PRIORITIES causes the handshake to
>> +operate using default TLS priorities. However, the caller can use the
>> +string to (for example) adjust the handshake to use a restricted set
>> +of ciphers (say, if the kernel consumer wishes to mandate only a
>> +limited set of ciphers).
>> +
>> +@cert is the serial number of a key that contains a DER format x.509
>> +certificate that the handshake agent presents to the remote as the local
>> +peer's identity.
>> +
>> +@privkey is the serial number of a key that contains a DER-format
>> +private key associated with the x.509 certificate.
>> +
>> +
>> +To initiate a client-side TLS handshake with a pre-shared key, use:
>> +
>> +.. code-block:: c
>> +
>> +  ret = tls_client_hello_psk(sock, done_func, cookie, priorities,
>> +                             peerid);
>> +
>> +@peerid is the serial number of a key that contains the pre-shared
>> +key to be used for the handshake.
>> +
>> +The other parameters are as above.
>> +
>> +
>> +To initiate an anonymous client-side TLS handshake use:
>> +
>> +.. code-block:: c
>> +
>> +  ret = tls_client_hello_anon(sock, done_func, cookie, priorities);
>> +
>> +The parameters are as above.
>> +
>> +The handshake agent presents no peer identity information to the
>> +remote during the handshake. Only server authentication is performed
>> +during the handshake. Thus the established session uses encryption
>> +only.
>> +
>> +
>> +Consumers that are in-kernel servers use:
>> +
>> +.. code-block:: c
>> +
>> +  ret = tls_server_hello(sock, done_func, cookie, priorities);
>> +
>> +The parameters for this operation are as above.
>> +
>> +
>> +Lastly, if the consumer needs to cancel the handshake request, say,
>> +due to a ^C or other exigent event, the handshake core provides
>> +this API:
>> +
>> +.. code-block:: c
>> +
>> +  handshake_cancel(sock);
>> +
>> +
>> +Other considerations
>> +--------------------
>> +
>> +While a handshake is under way, the kernel consumer must alter the
>> +socket's sk_data_ready callback function to ignore all incoming data.
>> +Once the handshake completion callback function has been invoked,
>> +normal receive operation can be resumed.
>> +
>> +Once a TLS session is established, the consumer must provide a buffer
>> +for and then examine the control message (CMSG) that is part of every
>> +subsequent sock_recvmsg(). Each control message indicates whether the
>> +received message data is TLS record data or session metadata.
>> +
>> +See tls.rst for details on how a kTLS consumer recognizes incoming
>> +(decrypted) application data, alerts, and handshake packets once the
>> +socket has been promoted to use the TLS ULP.
>> +
>> diff --git a/include/net/tls.h b/include/net/tls.h
>> index 154949c7b0c8..505b23992ef0 100644
>> --- a/include/net/tls.h
>> +++ b/include/net/tls.h
>> @@ -512,4 +512,31 @@ static inline bool tls_is_sk_rx_device_offloaded(struct sock *sk)
>>  	return tls_get_ctx(sk)->rx_conf == TLS_HW;
>>  }
>>  #endif
>> +
>> +#define TLS_DEFAULT_PRIORITIES		(NULL)
>> +
> 
> Hmm? What is the point in this?
> It's not that we can overwrite it later on ...
> 
>> +enum {
>> +	TLS_NO_PEERID = 0,
>> +	TLS_NO_CERT = 0,
>> +	TLS_NO_PRIVKEY = 0,
>> +};
>> +
>> +typedef void	(*tls_done_func_t)(void *data, int status,
>> +				   key_serial_t peerid);
>> +
>> +int tls_client_hello_anon(struct socket *sock, tls_done_func_t done,
>> +			  void *data, const char *priorities);
>> +int tls_client_hello_x509(struct socket *sock, tls_done_func_t done,
>> +			  void *data, const char *priorities,
>> +			  key_serial_t cert, key_serial_t privkey);
>> +int tls_client_hello_psk(struct socket *sock, tls_done_func_t done,
>> +			 void *data, const char *priorities,
>> +			 key_serial_t peerid);
>> +int tls_server_hello_x509(struct socket *sock, tls_done_func_t done,
>> +			  void *data, const char *priorities);
>> +int tls_server_hello_psk(struct socket *sock, tls_done_func_t done,
>> +			 void *data, const char *priorities);
>> +
>> +int tls_handshake_cancel(struct socket *sock);
>> +
>>  #endif /* _TLS_OFFLOAD_H */
>> diff --git a/include/uapi/linux/handshake.h b/include/uapi/linux/handshake.h
>> index 09fd7c37cba4..dad8227939a1 100644
>> --- a/include/uapi/linux/handshake.h
>> +++ b/include/uapi/linux/handshake.h
>> @@ -11,6 +11,7 @@
>>    enum {
>>  	HANDSHAKE_HANDLER_CLASS_NONE,
>> +	HANDSHAKE_HANDLER_CLASS_TLSHD,
>>  };
>>    enum {
>> @@ -59,5 +60,6 @@ enum {
>>  };
>>    #define HANDSHAKE_MCGRP_NONE	"none"
>> +#define HANDSHAKE_MCGRP_TLSHD	"tlshd"
>>    #endif /* _UAPI_LINUX_HANDSHAKE_H */
>> diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
>> index 581e382236cf..88775f784305 100644
>> --- a/net/handshake/netlink.c
>> +++ b/net/handshake/netlink.c
>> @@ -255,6 +255,7 @@ static const struct genl_split_ops handshake_nl_ops[] = {
>>    static const struct genl_multicast_group handshake_nl_mcgrps[] = {
>>  	[HANDSHAKE_HANDLER_CLASS_NONE] = { .name = HANDSHAKE_MCGRP_NONE, },
>> +	[HANDSHAKE_HANDLER_CLASS_TLSHD] = { .name = HANDSHAKE_MCGRP_TLSHD, },
>>  };
>>    static struct genl_family __ro_after_init handshake_genl_family = {
>> diff --git a/net/tls/Makefile b/net/tls/Makefile
>> index e41c800489ac..7e56b57f14f6 100644
>> --- a/net/tls/Makefile
>> +++ b/net/tls/Makefile
>> @@ -7,7 +7,7 @@ CFLAGS_trace.o := -I$(src)
>>    obj-$(CONFIG_TLS) += tls.o
>>  -tls-y := tls_main.o tls_sw.o tls_proc.o trace.o tls_strp.o
>> +tls-y := tls_handshake.o tls_main.o tls_sw.o tls_proc.o trace.o tls_strp.o
>>  
> I'd rather tack the new file at the end, but that might be personal preference ...
> 
>>  tls-$(CONFIG_TLS_TOE) += tls_toe.o
>>  tls-$(CONFIG_TLS_DEVICE) += tls_device.o tls_device_fallback.o
>> diff --git a/net/tls/tls_handshake.c b/net/tls/tls_handshake.c
>> new file mode 100644
>> index 000000000000..74d32a9ca857
>> --- /dev/null
>> +++ b/net/tls/tls_handshake.c
>> @@ -0,0 +1,423 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Establish a TLS session for a kernel socket consumer
>> + *
>> + * Author: Chuck Lever <chuck.lever@oracle.com>
>> + *
>> + * Copyright (c) 2021-2023, Oracle and/or its affiliates.
>> + */
>> +
>> +#include <linux/types.h>
>> +#include <linux/socket.h>
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/slab.h>
>> +
>> +#include <net/sock.h>
>> +#include <net/tls.h>
>> +#include <net/genetlink.h>
>> +#include <net/handshake.h>
>> +
>> +#include <uapi/linux/handshake.h>
>> +
>> +/*
>> + * TLS priorities string passed to the GnuTLS library.
>> + *
>> + * Specifically for kernel TLS consumers: enable only TLS v1.3 and the
>> + * ciphers that are supported by kTLS.
>> + *
>> + * Currently this list is generated by hand from the supported ciphers
>> + * found in include/uapi/linux/tls.h.
>> + */
>> +#define KTLS_DEFAULT_PRIORITIES \
>> +	"SECURE256:+SECURE128:-COMP-ALL" \
>> +	":-VERS-ALL:+VERS-TLS1.3:%NO_TICKETS" \
>> +	":-CIPHER-ALL:+CHACHA20-POLY1305:+AES-256-GCM:+AES-128-GCM:+AES-128-CCM"
>> +
>> +struct tls_handshake_req {
>> +	void			(*th_consumer_done)(void *data, int status,
>> +						    key_serial_t peerid);
>> +	void			*th_consumer_data;
>> +
>> +	const char		*th_priorities;
>> +	int			th_type;
>> +	int			th_auth_type;
>> +	key_serial_t		th_peerid;
>> +	key_serial_t		th_certificate;
>> +	key_serial_t		th_privkey;
>> +
>> +};
>> +
>> +static const char *tls_handshake_dup_priorities(const char *priorities,
>> +						gfp_t flags)
>> +{
>> +	const char *tp;
>> +
>> +	if (priorities != TLS_DEFAULT_PRIORITIES && strlen(priorities))
> See above. As TLS_DEFAULT_PRIORITIES is NULL we can leave out the first condition.

strlen() crashes if it's passed a NULL pointer.

What I'm thinking of instead is to simply remove the "priorities" argument
from tls_{client,server}_hello, and leave it as something that is between
net/tls/tls_handshake.c and tlshd.
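
To illustrate why the first condition has to stay, here is a minimal
user-space sketch of the same selection logic. The helper name
pick_priorities() is hypothetical and is not part of the patch; the two
macros are copied from the patch. The NULL comparison short-circuits,
so strlen() is never handed a NULL pointer.

```c
#include <stddef.h>
#include <string.h>

/* Both macros are copied from the patch under review. */
#define TLS_DEFAULT_PRIORITIES		(NULL)
#define KTLS_DEFAULT_PRIORITIES \
	"SECURE256:+SECURE128:-COMP-ALL" \
	":-VERS-ALL:+VERS-TLS1.3:%NO_TICKETS" \
	":-CIPHER-ALL:+CHACHA20-POLY1305:+AES-256-GCM:+AES-128-GCM:+AES-128-CCM"

/*
 * Hypothetical helper: choose the caller's string unless it is NULL
 * or empty. The NULL test must come first; "&&" then guarantees that
 * strlen() only ever sees a valid pointer.
 */
static const char *pick_priorities(const char *priorities)
{
	if (priorities != TLS_DEFAULT_PRIORITIES && strlen(priorities))
		return priorities;
	return KTLS_DEFAULT_PRIORITIES;
}
```

Dropping the NULL test would turn pick_priorities(NULL) into
strlen(NULL), which is undefined behavior.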


>> +		tp = priorities;
>> +	else
>> +		tp = KTLS_DEFAULT_PRIORITIES;
>> +	return kstrdup(tp, flags);
>> +}
>> +
>> +static struct tls_handshake_req *
>> +tls_handshake_req_init(struct handshake_req *req, tls_done_func_t done,
>> +		       void *data, const char *priorities)
>> +{
>> +	struct tls_handshake_req *treq = handshake_req_private(req);
>> +
>> +	treq->th_consumer_done = done;
>> +	treq->th_consumer_data = data;
>> +	treq->th_priorities = priorities;
>> +	treq->th_peerid = TLS_NO_PEERID;
>> +	treq->th_certificate = TLS_NO_CERT;
>> +	treq->th_privkey = TLS_NO_PRIVKEY;
>> +	return treq;
>> +}
>> +
>> +/**
>> + * tls_handshake_destroy - callback to release a handshake request
>> + * @req: handshake parameters to release
>> + *
>> + */
>> +static void tls_handshake_destroy(struct handshake_req *req)
>> +{
>> +	struct tls_handshake_req *treq = handshake_req_private(req);
>> +
>> +	kfree(treq->th_priorities);
>> +}
>> +
>> +/**
>> + * tls_handshake_done - callback to handle a CMD_DONE request
>> + * @req: socket on which the handshake was performed
>> + * @status: session status code
>> + * @tb: other results of session establishment
>> + *
>> + * Eventually this will return information about the established
>> + * session: whether it is authenticated, and if so, who the remote
>> + * is.
>> + */
>> +static void tls_handshake_done(struct handshake_req *req, int status,
>> +			       struct nlattr **tb)
>> +{
>> +	struct tls_handshake_req *treq = handshake_req_private(req);
>> +	key_serial_t peerid = TLS_NO_PEERID;
>> +
>> +	if (tb[HANDSHAKE_A_DONE_REMOTE_PEERID])
>> +		peerid = nla_get_u32(tb[HANDSHAKE_A_DONE_REMOTE_PEERID]);
>> +
>> +	treq->th_consumer_done(treq->th_consumer_data, status, peerid);
>> +}
>> +
>> +static int tls_handshake_put_accept_resp(struct sk_buff *msg,
>> +					 struct tls_handshake_req *treq)
>> +{
>> +	int ret;
>> +
>> +	ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MESSAGE_TYPE, treq->th_type);
>> +	if (ret < 0)
>> +		goto out;
>> +	ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_AUTH, treq->th_auth_type);
>> +	if (ret < 0)
>> +		goto out;
>> +	switch (treq->th_auth_type) {
>> +	case HANDSHAKE_AUTH_X509:
>> +		if (treq->th_certificate != TLS_NO_CERT) {
>> +			ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MY_PEERID,
>> +					  treq->th_certificate);
>> +			if (ret < 0)
>> +				goto out;
>> +		}
>> +		if (treq->th_privkey != TLS_NO_PRIVKEY) {
>> +			ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MY_PRIVKEY,
>> +					  treq->th_privkey);
>> +			if (ret < 0)
>> +				goto out;
>> +		}
>> +		break;
>> +	case HANDSHAKE_AUTH_PSK:
>> +		if (treq->th_peerid != TLS_NO_PEERID) {
>> +			ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MY_PEERID,
>> +					  treq->th_peerid);
>> +			if (ret < 0)
>> +				goto out;
>> +		}
>> +		break;
>> +	}
>> +
>> +	ret = nla_put_string(msg, HANDSHAKE_A_ACCEPT_GNUTLS_PRIORITIES,
>> +			     treq->th_priorities);
>> +	if (ret < 0)
>> +		goto out;
>> +
>> +	ret = 0;
>> +
>> +out:
>> +	return ret;
>> +}
>> +
>> +/**
>> + * tls_handshake_accept - callback to construct a CMD_ACCEPT response
>> + * @req: handshake parameters to return
>> + * @gi: generic netlink message context
>> + * @fd: file descriptor to be returned
>> + *
>> + * Returns zero on success, or a negative errno on failure.
>> + */
>> +static int tls_handshake_accept(struct handshake_req *req,
>> +				struct genl_info *gi, int fd)
>> +{
>> +	struct tls_handshake_req *treq = handshake_req_private(req);
>> +	struct nlmsghdr *hdr;
>> +	struct sk_buff *msg;
>> +	int ret;
>> +
>> +	ret = -ENOMEM;
>> +	msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
>> +	if (!msg)
>> +		goto out;
>> +	hdr = handshake_genl_put(msg, gi);
>> +	if (!hdr)
>> +		goto out_cancel;
>> +
>> +	ret = -EMSGSIZE;
>> +	ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_SOCKFD, fd);
>> +	if (ret < 0)
>> +		goto out_cancel;
>> +
>> +	ret = tls_handshake_put_accept_resp(msg, treq);
>> +	if (ret < 0)
>> +		goto out_cancel;
>> +
>> +	genlmsg_end(msg, hdr);
>> +	return genlmsg_reply(msg, gi);
>> +
>> +out_cancel:
>> +	genlmsg_cancel(msg, hdr);
>> +out:
>> +	return ret;
>> +}
>> +
>> +static const struct handshake_proto tls_handshake_proto = {
>> +	.hp_handler_class	= HANDSHAKE_HANDLER_CLASS_TLSHD,
>> +	.hp_privsize		= sizeof(struct tls_handshake_req),
>> +
>> +	.hp_accept		= tls_handshake_accept,
>> +	.hp_done		= tls_handshake_done,
>> +	.hp_destroy		= tls_handshake_destroy,
>> +};
>> +
>> +/**
>> + * tls_client_hello_anon - request an anonymous TLS handshake on a socket
>> + * @sock: connected socket on which to perform the handshake
>> + * @done: function to call when the handshake has completed
>> + * @data: token to pass back to @done
>> + * @priorities: GnuTLS TLS priorities string, or NULL
>> + *
>> + * Return values:
>> + *   %0: Handshake request enqueue; ->done will be called when complete
>> + *   %-ENOENT: No user agent is available
>> + *   %-ENOMEM: Memory allocation failed
>> + */
>> +int tls_client_hello_anon(struct socket *sock, tls_done_func_t done,
>> +			  void *data, const char *priorities)
>> +{
>> +	struct tls_handshake_req *treq;
>> +	struct handshake_req *req;
>> +	gfp_t flags = GFP_NOWAIT;
>> +	const char *tp;
>> +
>> +	tp = tls_handshake_dup_priorities(priorities, flags);
>> +	if (!tp)
>> +		return -ENOMEM;
>> +
>> +	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
>> +	if (!req) {
>> +		kfree(tp);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	treq = tls_handshake_req_init(req, done, data, tp);
>> +	treq->th_type = HANDSHAKE_MSG_TYPE_CLIENTHELLO;
>> +	treq->th_auth_type = HANDSHAKE_AUTH_UNAUTH;
>> +
>> +	return handshake_req_submit(req, flags);
>> +}
>> +EXPORT_SYMBOL(tls_client_hello_anon);
>> +
>> +/**
>> + * tls_client_hello_x509 - request an x.509-based TLS handshake on a socket
>> + * @sock: connected socket on which to perform the handshake
>> + * @done: function to call when the handshake has completed
>> + * @data: token to pass back to @done
>> + * @priorities: GnuTLS TLS priorities string
>> + * @cert: serial number of key containing client's x.509 certificate
>> + * @privkey: serial number of key containing client's private key
>> + *
>> + * Return values:
>> + *   %0: Handshake request enqueue; ->done will be called when complete
>> + *   %-ENOENT: No user agent is available
>> + *   %-ENOMEM: Memory allocation failed
>> + */
>> +int tls_client_hello_x509(struct socket *sock, tls_done_func_t done,
>> +			  void *data, const char *priorities,
>> +			  key_serial_t cert, key_serial_t privkey)
>> +{
>> +	struct tls_handshake_req *treq;
>> +	struct handshake_req *req;
>> +	gfp_t flags = GFP_NOWAIT;
>> +	const char *tp;
>> +
>> +	tp = tls_handshake_dup_priorities(priorities, flags);
>> +	if (!tp)
>> +		return -ENOMEM;
>> +
>> +	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
>> +	if (!req) {
>> +		kfree(tp);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	treq = tls_handshake_req_init(req, done, data, tp);
>> +	treq->th_type = HANDSHAKE_MSG_TYPE_CLIENTHELLO;
>> +	treq->th_auth_type = HANDSHAKE_AUTH_X509;
>> +	treq->th_certificate = cert;
>> +	treq->th_privkey = privkey;
>> +
>> +	return handshake_req_submit(req, flags);
>> +}
>> +EXPORT_SYMBOL(tls_client_hello_x509);
>> +
>> +/**
>> + * tls_client_hello_psk - request a PSK-based TLS handshake on a socket
>> + * @sock: connected socket on which to perform the handshake
>> + * @done: function to call when the handshake has completed
>> + * @data: token to pass back to @done
>> + * @priorities: GnuTLS TLS priorities string
>> + * @peerid: serial number of key containing TLS identity
>> + *
>> + * Return values:
>> + *   %0: Handshake request enqueue; ->done will be called when complete
>> + *   %-ENOENT: No user agent is available
>> + *   %-ENOMEM: Memory allocation failed
>> + */
>> +int tls_client_hello_psk(struct socket *sock, tls_done_func_t done,
>> +			 void *data, const char *priorities,
>> +			 key_serial_t peerid)
>> +{
>> +	struct tls_handshake_req *treq;
>> +	struct handshake_req *req;
>> +	gfp_t flags = GFP_NOWAIT;
>> +	const char *tp;
>> +
>> +	tp = tls_handshake_dup_priorities(priorities, flags);
>> +	if (!tp)
>> +		return -ENOMEM;
>> +
>> +	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
>> +	if (!req) {
>> +		kfree(tp);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	treq = tls_handshake_req_init(req, done, data, tp);
>> +	treq->th_type = HANDSHAKE_MSG_TYPE_CLIENTHELLO;
>> +	treq->th_auth_type = HANDSHAKE_AUTH_PSK;
>> +	treq->th_peerid = peerid;
>> +
>> +	return handshake_req_submit(req, flags);
>> +}
>> +EXPORT_SYMBOL(tls_client_hello_psk);
>> +
>> +/**
>> + * tls_server_hello_x509 - request a server TLS handshake on a socket
>> + * @sock: connected socket on which to perform the handshake
>> + * @done: function to call when the handshake has completed
>> + * @data: token to pass back to @done
>> + * @priorities: GnuTLS TLS priorities string
>> + *
>> + * Return values:
>> + *   %0: Handshake request enqueue; ->done will be called when complete
>> + *   %-ENOENT: No user agent is available
>> + *   %-ENOMEM: Memory allocation failed
>> + */
>> +int tls_server_hello_x509(struct socket *sock, tls_done_func_t done,
>> +			  void *data, const char *priorities)
>> +{
>> +	struct tls_handshake_req *treq;
>> +	struct handshake_req *req;
>> +	gfp_t flags = GFP_KERNEL;
>> +	const char *tp;
>> +
>> +	tp = tls_handshake_dup_priorities(priorities, flags);
>> +	if (!tp)
>> +		return -ENOMEM;
>> +
>> +	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
>> +	if (!req) {
>> +		kfree(tp);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	treq = tls_handshake_req_init(req, done, data, tp);
>> +	treq->th_type = HANDSHAKE_MSG_TYPE_SERVERHELLO;
>> +	treq->th_auth_type = HANDSHAKE_AUTH_X509;
>> +
>> +	return handshake_req_submit(req, flags);
>> +}
>> +EXPORT_SYMBOL(tls_server_hello_x509);
>> +
>> +/**
>> + * tls_server_hello_psk - request a server TLS handshake on a socket
>> + * @sock: connected socket on which to perform the handshake
>> + * @done: function to call when the handshake has completed
>> + * @data: token to pass back to @done
>> + * @priorities: GnuTLS TLS priorities string
>> + *
>> + * Return values:
>> + *   %0: Handshake request enqueue; ->done will be called when complete
>> + *   %-ENOENT: No user agent is available
>> + *   %-ENOMEM: Memory allocation failed
>> + */
>> +int tls_server_hello_psk(struct socket *sock, tls_done_func_t done,
>> +			 void *data, const char *priorities)
>> +{
>> +	struct tls_handshake_req *treq;
>> +	struct handshake_req *req;
>> +	gfp_t flags = GFP_KERNEL;
>> +	const char *tp;
>> +
>> +	tp = tls_handshake_dup_priorities(priorities, flags);
>> +	if (!tp)
>> +		return -ENOMEM;
>> +
>> +	req = handshake_req_alloc(sock, &tls_handshake_proto, flags);
>> +	if (!req) {
>> +		kfree(tp);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	treq = tls_handshake_req_init(req, done, data, tp);
>> +	treq->th_type = HANDSHAKE_MSG_TYPE_SERVERHELLO;
>> +	treq->th_auth_type = HANDSHAKE_AUTH_PSK;
>> +
>> +	return handshake_req_submit(req, flags);
>> +}
>> +EXPORT_SYMBOL(tls_server_hello_psk);
>> +
>> +/**
>> + * tls_handshake_cancel - cancel a pending handshake
>> + * @sock: socket on which there is an ongoing handshake
>> + *
>> + * Request cancellation races with request completion. To determine
>> + * who won, callers examine the return value from this function.
>> + *
>> + * Return values:
>> + *   %0 - Uncompleted handshake request was canceled
>> + *   %-EBUSY - Handshake request already completed
>> + */
>> +int tls_handshake_cancel(struct socket *sock)
>> +{
>> +	return handshake_req_cancel(sock);
>> +}
>> +EXPORT_SYMBOL(tls_handshake_cancel);
> 
> Cheers,
> 
> Hannes
> 
> 

--
Chuck Lever





* Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests
  2023-02-27 14:59     ` Chuck Lever III
@ 2023-02-27 15:14       ` Hannes Reinecke
  2023-02-27 15:39         ` Chuck Lever III
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-02-27 15:14 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Chuck Lever, kuba, pabeni, edumazet, netdev, kernel-tls-handshake

On 2/27/23 15:59, Chuck Lever III wrote:
> 
> 
>> On Feb 27, 2023, at 4:24 AM, Hannes Reinecke <hare@suse.de> wrote:
>>
>> On 2/24/23 20:19, Chuck Lever wrote:
[ .. ]
>>> +	req = sock->sk->sk_handshake_req;
>>> +	if (!req) {
>>> +		err = -EBUSY;
>>> +		goto out_status;
>>> +	}
>>> +
>>> +	trace_handshake_cmd_done(net, req, sock, fd);
>>> +
>>> +	status = -EIO;
>>> +	if (tb[HANDSHAKE_A_DONE_STATUS])
>>> +		status = nla_get_u32(tb[HANDSHAKE_A_DONE_STATUS]);
>>> +
>> And this makes me ever so slightly uneasy.
>>
>> As 'status' is a netlink attribute it's inevitably defined as 'unsigned'.
>> Yet we assume that 'status' is a negative number, leaving us _technically_ in uncharted territory.
> 
> Ah, that's an oversight.
> 
> 
>> And that is notwithstanding the problem that we haven't even defined _what_ should be in the status attribute.
> 
> It's now an errno value.
> 
> 
>> Reading the code I assume that it's either '0' for success or a negative number (ie the error code) on failure.
>> Which implicitly means that we _never_ set a positive number here.
>> So what would we lose if we declare 'status' to carry the _positive_ error number instead?
>> It would bring us in-line with the actual netlink attribute definition, we wouldn't need
>> to worry about possible integer overflows, yadda yadda...
>>
>> Hmm?
> 
> It can also be argued that errnos in user space are positive-valued,
> therefore, this user space visible protocol should use a positive
> errno.
> 
> 
Thanks.
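
For illustration, a minimal sketch of how the receiving side could turn
a positive errno carried in a u32 attribute into the negative value that
kernel code conventionally passes around. This assumes the convention
discussed above is adopted; the function name and the choice of -EIO for
out-of-range values are hypothetical. DEMO_MAX_ERRNO mirrors the value
of the kernel's MAX_ERRNO (4095).

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors the kernel's MAX_ERRNO; redefined here so the sketch builds
 * in user space. */
#define DEMO_MAX_ERRNO	4095

/*
 * Hypothetical helper: map an unsigned wire status (0 for success, or
 * a positive errno) to the kernel-style "0 or negative errno" form.
 * Values that cannot be an errno are reported as -EIO rather than
 * being cast blindly, which avoids any signed overflow question.
 */
static int handshake_status_to_errno(uint32_t status)
{
	if (status == 0)
		return 0;
	if (status > DEMO_MAX_ERRNO)
		return -EIO;
	return -(int)status;
}
```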

[ .. ]
>>> +
>>> +/**
>>> + * handshake_req_cancel - consumer API to cancel an in-progress handshake
>>> + * @sock: socket on which there is an ongoing handshake
>>> + *
>>> + * XXX: Perhaps killing the user space agent might also be necessary?
>>
>> I thought we had agreed that we would be sending a signal to the userspace process?
> 
> We had discussed killing the handler, but I don't think it's necessary.
> I'd rather not do something that drastic unless we have no other choice.
> So far my testing hasn't shown a need for killing the child process.
> 
> I'm also concerned that the kernel could reuse the handler's process ID.
> handshake_req_cancel would kill something that is not a handshake agent.
> 
Hmm? If that were the case, wouldn't we be sending the netlink message to the
wrong process, too?

And in the absence of any timeout handler: what do we do if userspace is
stuck / doesn't make forward progress?
At some point TCP will time out, and the client will close the connection,
leaving us with (potentially) broken / stuck processes. Surely we would
need to initiate some cleanup here, no?

>> Ideally we would be sending a SIGHUP, wait for some time on the userspace
>> process to respond with a 'done' message, and send a 'KILL' signal if we
>> haven't received one.
>>
>> Obs: Sending a KILL signal would imply that userspace is able to cope with
>> children dying. Which pretty much excludes pthreads, I would think.
>>
>> Guess I'll have to consult Stevens :-)
> 
> Basically what cancel does is atomically disarm the "done" callback.
> 
> The socket belongs to the kernel, so it will live until the kernel is
> good and through with it.
> 
Oh, the socket does. But the process handling the socket does not.
So even if we close the socket from the kernel there's no guarantee that
userspace will react to it.

The problem here is with using different key materials.
As the current handshake can only deal with one key at a time, the only
chance we have for trying several possible keys is to retry the handshake
with the next key.
But out of necessity we have to use the _same_ connection (as tlshd
doesn't control the socket). So we cannot close the socket, and hence we
can't notify userspace to give up the handshake attempt.
Being able to send a signal would make this simple: send SIGHUP to
userspace, and wait for the 'done' call.
If it doesn't come we can terminate all attempts.
But if we get the 'done' call we know it's safe to start with the next
attempt.
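
The retry policy described above can be sketched as a plain loop. This
is only a model of the proposal, not code from the patch: every name
below is hypothetical, and the "attempt" callback stands in for
"run one handshake, signal the agent on timeout, and wait for 'done'".

```c
#include <stddef.h>

/* Hypothetical outcomes of one handshake attempt with one key. */
enum demo_outcome {
	DEMO_OK,	/* handshake succeeded with this key */
	DEMO_FAILED,	/* agent reported 'done' with an error */
	DEMO_STUCK,	/* no 'done' arrived, even after signalling */
};

typedef enum demo_outcome (*demo_attempt_fn)(int key, void *ctx);

/*
 * Try each key in turn on the same connection. A 'done' with an error
 * means it is safe to move on to the next key; a stuck agent ends all
 * attempts. Returns the key that worked, or 0 if none did (0 is used
 * as "no key", matching TLS_NO_PEERID in the patch).
 */
static int demo_try_keys(const int *keys, size_t nkeys,
			 demo_attempt_fn attempt, void *ctx)
{
	size_t i;

	for (i = 0; i < nkeys; i++) {
		switch (attempt(keys[i], ctx)) {
		case DEMO_OK:
			return keys[i];
		case DEMO_FAILED:
			continue;
		case DEMO_STUCK:
			return 0;
		}
	}
	return 0;
}

/* Toy attempt function for experimenting: succeeds only for the key
 * stored in *ctx and treats negative keys as a stuck agent. */
static enum demo_outcome demo_attempt(int key, void *ctx)
{
	if (key < 0)
		return DEMO_STUCK;
	return key == *(int *)ctx ? DEMO_OK : DEMO_FAILED;
}
```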

> 
>>> + *
>>> + * Request cancellation races with request completion. To determine
>>> + * who won, callers examine the return value from this function.
>>> + *
>>> + * Return values:
>>> + *   %0 - Uncompleted handshake request was canceled or not found
>>> + *   %-EBUSY - Handshake request already completed
>>
>> EBUSY? Wouldn't EAGAIN be more appropriate?
> 
> I don't think EAGAIN would be appropriate at all. The situation
> is that the handshake completed, so there's no need to call cancel
> again. Its synonym, EWOULDBLOCK, is also not a good semantic fit.
> 
> 
>> After all, the request is everything _but_ busy...
> 
> I'm open to suggestion.
> 
> One option is to use a boolean return value instead of an errno.
> 
> 
Yeah, that's probably better.
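
A rough user-space model of what a boolean-returning cancel could look
like, assuming cancel and completion race on a single atomic flag. The
struct and function names are hypothetical and C11 atomics stand in for
whatever primitive the kernel code would actually use; this is not the
implementation in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a handshake request. */
struct demo_req {
	atomic_flag	finished;	/* set by whoever wins the race */
	void		(*done)(void *data, int status);
	void		*data;
};

static void demo_req_init(struct demo_req *req,
			  void (*done)(void *data, int status), void *data)
{
	atomic_flag_clear(&req->finished);
	req->done = done;
	req->data = data;
}

/* Returns true if the request was canceled before it completed. When
 * this returns true, the done callback will never be invoked. */
static bool demo_req_cancel(struct demo_req *req)
{
	return !atomic_flag_test_and_set(&req->finished);
}

/* Returns true if completion won the race and the callback ran. */
static bool demo_req_complete(struct demo_req *req, int status)
{
	if (atomic_flag_test_and_set(&req->finished))
		return false;
	req->done(req->data, status);
	return true;
}

/* Sample callback for experimenting: counts how often it is invoked. */
static void demo_count_done(void *data, int status)
{
	(void)status;
	(*(int *)data)++;
}
```

With this shape, exactly one of cancel and complete observes the flag
clear, so the caller of cancel learns from the return value alone
whether the callback can still fire.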

BTW: thanks for the tracepoints!

Cheers,

Hannes



* Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests
  2023-02-27 15:14       ` Hannes Reinecke
@ 2023-02-27 15:39         ` Chuck Lever III
  2023-02-27 17:21           ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever III @ 2023-02-27 15:39 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Chuck Lever, kuba, pabeni, edumazet, netdev, kernel-tls-handshake



> On Feb 27, 2023, at 10:14 AM, Hannes Reinecke <hare@suse.de> wrote:
> 
> On 2/27/23 15:59, Chuck Lever III wrote:
>>> On Feb 27, 2023, at 4:24 AM, Hannes Reinecke <hare@suse.de> wrote:
>>> 
>>> On 2/24/23 20:19, Chuck Lever wrote:
> [ .. ]
>>>> +	req = sock->sk->sk_handshake_req;
>>>> +	if (!req) {
>>>> +		err = -EBUSY;
>>>> +		goto out_status;
>>>> +	}
>>>> +
>>>> +	trace_handshake_cmd_done(net, req, sock, fd);
>>>> +
>>>> +	status = -EIO;
>>>> +	if (tb[HANDSHAKE_A_DONE_STATUS])
>>>> +		status = nla_get_u32(tb[HANDSHAKE_A_DONE_STATUS]);
>>>> +
>>> And this makes me ever so slightly uneasy.
>>> 
>>> As 'status' is a netlink attribute it's inevitably defined as 'unsigned'.
>>> Yet we assume that 'status' is a negative number, leaving us _technically_ in uncharted territory.
>> Ah, that's an oversight.
>>> And that is notwithstanding the problem that we haven't even defined _what_ should be in the status attribute.
>> It's now an errno value.
>>> Reading the code I assume that it's either '0' for success or a negative number (ie the error code) on failure.
>>> Which implicitly means that we _never_ set a positive number here.
>>> So what would we lose if we declare 'status' to carry the _positive_ error number instead?
>>> It would bring us in-line with the actual netlink attribute definition, we wouldn't need
>>> to worry about possible integer overflows, yadda yadda...
>>> 
>>> Hmm?
>> It can also be argued that errnos in user space are positive-valued,
>> therefore, this user space visible protocol should use a positive
>> errno.
> Thanks.
> 
> [ .. ]
>>>> +
>>>> +/**
>>>> + * handshake_req_cancel - consumer API to cancel an in-progress handshake
>>>> + * @sock: socket on which there is an ongoing handshake
>>>> + *
>>>> + * XXX: Perhaps killing the user space agent might also be necessary?
>>> 
>>> I thought we had agreed that we would be sending a signal to the userspace process?
>> We had discussed killing the handler, but I don't think it's necessary.
>> I'd rather not do something that drastic unless we have no other choice.
>> So far my testing hasn't shown a need for killing the child process.
>> I'm also concerned that the kernel could reuse the handler's process ID.
>> handshake_req_cancel would kill something that is not a handshake agent.
> Hmm? If that were the case, wouldn't we be sending the netlink message to the
> wrong process, too?

Notifications go to anyone who is listening for handshake requests
and contain nothing but the handler class number, which answers the
question "who is to respond to this notification?" It is up to those
processes to send an ACCEPT to the kernel, and then later a DONE.

So... listeners have to register to get notifications, and the
registration goes away as soon as the netlink socket is closed. That
is what the long-lived parent tlshd process does.

After notification, the handshake is driven entirely by the handshake
agent (the tlshd child process). The kernel is not otherwise sending
unsolicited netlink messages to anyone.

If you're concerned about the response messages that the kernel
sends back to the handshake agent... any new process would have to
have a netlink socket open, resolved to the HANDSHAKE family, and
it would have to recognize the message sequence ID in the response
message. Very very unlikely that all that would happen.


> And in the absence of any timeout handler: what do we do if userspace is stuck / doesn't make forward progress?
> At some point TCP will time out, and the client will close the connection.
> Leaving us with (potentially) broken / stuck processes. Surely we would need to initiate some cleanup here, no?

I'm not sure. Test and see.

In my experience, one peer or the other closes the socket, and the
other follows suit. The handshake agent hits an error when it tries
to use the socket, and exits.


>>> Ideally we would be sending a SIGHUP, wait for some time on the userspace
>>> process to respond with a 'done' message, and send a 'KILL' signal if we
>>> haven't received one.
>>> 
>>> Obs: Sending a KILL signal would imply that userspace is able to cope with
>>> children dying. Which pretty much excludes pthreads, I would think.
>>> 
>>> Guess I'll have to consult Stevens :-)
>> Basically what cancel does is atomically disarm the "done" callback.
>> The socket belongs to the kernel, so it will live until the kernel is
>> good and through with it.
> Oh, the socket does. But the process handling the socket is not.
> So even if we close the socket from the kernel there's no guarantee that userspace will react to it.

If the kernel finishes first (ie, cancels and closes the socket,
as it is supposed to) the user space endpoint is dead. I don't
think it matters what the handshake agent does at that point,
although if this happens frequently, it might amount to a
resource leak.
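
The "whoever gets there first wins" behavior can be modeled in user
space. This is only a sketch under stated assumptions: the struct and
function names are hypothetical, C11 atomics stand in for whatever
primitive the kernel code actually uses, and the boolean return follows
the suggestion made elsewhere in this thread rather than the posted
patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a handshake request; not the kernel's struct. */
struct hs_req_model {
	atomic_bool completed;	/* set by whichever path gets there first */
	int done_calls;		/* counts invocations of the "done" callback */
};

static void hs_req_model_init(struct hs_req_model *req)
{
	atomic_init(&req->completed, false);
	req->done_calls = 0;
}

/* Completion path: run the "done" callback only if cancel has not won. */
static bool hs_req_model_complete(struct hs_req_model *req)
{
	/* atomic_exchange returns the previous value of the flag. */
	if (atomic_exchange(&req->completed, true))
		return false;	/* already canceled or completed */
	req->done_calls++;	/* stands in for calling the consumer's callback */
	return true;
}

/* Cancel path: true means the callback is disarmed and will never run. */
static bool hs_req_model_cancel(struct hs_req_model *req)
{
	return !atomic_exchange(&req->completed, true);
}
```

If cancel returns true, the consumer can tear down its own state at
once. If it returns false, completion already ran (or is running) and
owns the result.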


> Problem here is with using different key materials.
> As the current handshake can only deal with one key at a time the only chance we have for several possible keys is to retry the handshake with the next key.
> But out of necessity we have to use the _same_ connection (as tlshd doesn't control the socket). So we cannot close the socket, and hence we can't notify userspace to give up the handshake attempt.
> Being able to send a signal would be simple; sending SIGHUP to userspace, and wait for the 'done' call.
> If it doesn't come we can terminate all attempts.
> But if we get the 'done' call we know it's safe to start with the next attempt.

We solve this problem by enabling the kernel to provide all those
materials to tlshd in one go.

I don't think there's a "retry" situation here. Once the handshake
has failed, the client peer has to know to try again. That would
mean retrying would have to be part of the upper layer protocol.
Does an NVMe initiator know it has to drive another handshake if
the first one fails, or does it rely on the handshake itself to
try all available identities?

We don't have a choice but to provide all the keys at once and
let the handshake negotiation deal with it.

I'm working on DONE passing multiple remote peer IDs back to the
kernel now. I don't see why ACCEPT couldn't pass multiple peer IDs
the other way.

Note that currently the handshake upcall mechanism supports only
one handshake per socket lifetime, as the handshake_req is
released by the socket's sk_destruct callback.


>>>> + *
>>>> + * Request cancellation races with request completion. To determine
>>>> + * who won, callers examine the return value from this function.
>>>> + *
>>>> + * Return values:
>>>> + *   %0 - Uncompleted handshake request was canceled or not found
>>>> + *   %-EBUSY - Handshake request already completed
>>> 
>>> EBUSY? Wouldn't EAGAIN be more appropriate?
>> I don't think EAGAIN would be appropriate at all. The situation
>> is that the handshake completed, so there's no need to call cancel
>> again. Its synonym, EWOULDBLOCK, is also not a good semantic fit.
>>> After all, the request is everything _but_ busy...
>> I'm open to suggestion.
>> One option is to use a boolean return value instead of an errno.
> Yeah, that's probably better.
> 
> BTW: thanks for the tracepoints!
> 
> Cheers,
> 
> Hannes
> 

--
Chuck Lever





* Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests
  2023-02-27 15:39         ` Chuck Lever III
@ 2023-02-27 17:21           ` Hannes Reinecke
  2023-02-27 18:10             ` Chuck Lever III
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-02-27 17:21 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Chuck Lever, kuba, pabeni, edumazet, netdev, kernel-tls-handshake

On 2/27/23 16:39, Chuck Lever III wrote:
> 
> 
>> On Feb 27, 2023, at 10:14 AM, Hannes Reinecke <hare@suse.de> wrote:
>>
>> On 2/27/23 15:59, Chuck Lever III wrote:
>>>> On Feb 27, 2023, at 4:24 AM, Hannes Reinecke <hare@suse.de> wrote:
>>>>
>>>> On 2/24/23 20:19, Chuck Lever wrote:
>> [ .. ]
>>>>> +	req = sock->sk->sk_handshake_req;
>>>>> +	if (!req) {
>>>>> +		err = -EBUSY;
>>>>> +		goto out_status;
>>>>> +	}
>>>>> +
>>>>> +	trace_handshake_cmd_done(net, req, sock, fd);
>>>>> +
>>>>> +	status = -EIO;
>>>>> +	if (tb[HANDSHAKE_A_DONE_STATUS])
>>>>> +		status = nla_get_u32(tb[HANDSHAKE_A_DONE_STATUS]);
>>>>> +
>>>> And this makes me ever so slightly uneasy.
>>>>
>>>> As 'status' is a netlink attribute it's inevitably defined as 'unsigned'.
>>>> Yet we assume that 'status' is a negative number, leaving us _technically_ in uncharted territory.
>>> Ah, that's an oversight.
>>>> And that is notwithstanding the problem that we haven't even defined _what_ should be in the status attribute.
>>> It's now an errno value.
>>>> Reading the code I assume that it's either '0' for success or a negative number (ie the error code) on failure.
>>>> Which implicitly means that we _never_ set a positive number here.
>>>> So what would we lose if we declare 'status' to carry the _positive_ error number instead?
>>>> It would bring us in-line with the actual netlink attribute definition, we wouldn't need
>>>> to worry about possible integer overflows, yadda yadda...
>>>>
>>>> Hmm?
>>> It can also be argued that errnos in user space are positive-valued,
>>> therefore, this user space visible protocol should use a positive
>>> errno.
>> Thanks.
>>
>> [ .. ]
>>>>> +
>>>>> +/**
>>>>> + * handshake_req_cancel - consumer API to cancel an in-progress handshake
>>>>> + * @sock: socket on which there is an ongoing handshake
>>>>> + *
>>>>> + * XXX: Perhaps killing the user space agent might also be necessary?
>>>>
>>>> I thought we had agreed that we would be sending a signal to the userspace process?
>>> We had discussed killing the handler, but I don't think it's necessary.
>>> I'd rather not do something that drastic unless we have no other choice.
>>> So far my testing hasn't shown a need for killing the child process.
>>> I'm also concerned that the kernel could reuse the handler's process ID.
>>> handshake_req_cancel would kill something that is not a handshake agent.
>> Hmm? If that were the case, wouldn't we be sending the netlink message to the
>> wrong process, too?
> 
> Notifications go to anyone who is listening for handshake requests
> and contain nothing but the handler class number. "Who is to respond
> to this notification". It is up to those processes to send an ACCEPT
> to the kernel, and then later a DONE.
> 
> So... listeners have to register to get notifications, and the
> registration goes away as soon as the netlink socket is closed. That
> is what the long-lived parent tlshd process does.
> 
> After notification, the handshake is driven entirely by the handshake
> agent (the tlshd child process). The kernel is not otherwise sending
> unsolicited netlink messages to anyone.
> 
> If you're concerned about the response messages that the kernel
> sends back to the handshake agent... any new process would have to
> have a netlink socket open, resolved to the HANDSHAKE family, and
> it would have to recognize the message sequence ID in the response
> message. Very very unlikely that all that would happen.
> 
> 
Yes, agree.

>> And in the absence of any timeout handler: what do we do if userspace is stuck / doesn't make forward progress?
>> At some point TCP will time out, and the client will close the connection.
>> Leaving us with (potentially) broken / stuck processes. Surely we would need to initiate some cleanup here, no?
> 
> I'm not sure. Test and see.
> 
> In my experience, one peer or the other closes the socket, and the
> other follows suit. The handshake agent hits an error when it tries
> to use the socket, and exits.
> 
> 
Hmm. Yes, if the other side closes the socket we'll have to follow suit.
I'm not sure, though, if a TLS timeout necessarily induces a connection
close. But okay, let's see how things pan out.

>>>> Ideally we would be sending a SIGHUP, wait for some time on the userspace
>>>> process to respond with a 'done' message, and send a 'KILL' signal if we
>>>> haven't received one.
>>>>
>>>> Obs: Sending a KILL signal would imply that userspace is able to cope with
>>>> children dying. Which pretty much excludes pthreads, I would think.
>>>>
>>>> Guess I'll have to consult Stevens :-)
>>> Basically what cancel does is atomically disarm the "done" callback.
>>> The socket belongs to the kernel, so it will live until the kernel is
>>> good and through with it.
>> Oh, the socket does. But the process handling the socket is not.
>> So even if we close the socket from the kernel there's no guarantee that userspace will react to it.
> 
> If the kernel finishes first (ie, cancels and closes the socket,
> as it is supposed to) the user space endpoint is dead. I don't
> think it matters what the handshake agent does at that point,
> although if this happens frequently, it might amount to a
> resource leak.
> 
> 
>> Problem here is with using different key materials.
>> As the current handshake can only deal with one key at a time
>> the only chance we have for several possible keys is to retry
>> the handshake with the next key.
>> But out of necessity we have to use the _same_ connection
>> (as tlshd doesn't control the socket). So we cannot close
>> the socket, and hence we can't notify userspace to give up the handshake attempt.
>> Being able to send a signal would be simple; sending SIGHUP to userspace, and wait for the 'done' call.
>> If it doesn't come we can terminate all attempts.
>> But if we get the 'done' call we know it's safe to start with the next attempt.
> 
> We solve this problem by enabling the kernel to provide all those
> materials to tlshd in one go.
> 
Ah. Right, that would work, too; provide all possible keys to the 
'accept' call and let the userspace agent figure out what to do with 
them. That makes life certainly easier for the kernel side.

> I don't think there's a "retry" situation here. Once the handshake
> has failed, the client peer has to know to try again. That would
> mean retrying would have to be part of the upper layer protocol.
> Does an NVMe initiator know it has to drive another handshake if
> the first one fails, or does it rely on the handshake itself to
> try all available identities?
> 
> We don't have a choice but to provide all the keys at once and
> let the handshake negotiation deal with it.
> 
> I'm working on DONE passing multiple remote peer IDs back to the
> kernel now. I don't see why ACCEPT couldn't pass multiple peer IDs
> the other way.
> 
Nope. That's not required.
DONE can only ever have one peer id (TLS 1.3 specifies that the client 
sends a list of identities, the server picks one, and sends that one 
back to the client). So for DONE we will only ever have 1 peer ID.
If we allow for several peer IDs to be present in the client ACCEPT 
message then we'd need to include the resulting peer ID in the client 
DONE, too; otherwise we'll need it for the server DONE only.

So all in all I think we should be going with the multiple IDs in the 
ACCEPT call (ie move the key id from being part of the message into an 
attribute), and have a peer id present in the DONE call for both
versions, server and client.
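
The "client sends a list, server picks one" step can be sketched as
below. This is illustrative only: the function name is hypothetical,
identities are modeled as C strings although TLS treats them as opaque
byte strings, and "first match in the client's order" is an assumed
policy. The 0-based index return mirrors how TLS 1.3 reports the
selected identity.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the 0-based index into offered[] of the first identity that
 * also appears in known[], or -1 if none matches.
 */
static int select_psk_identity(const char *const offered[], size_t n_offered,
			       const char *const known[], size_t n_known)
{
	for (size_t i = 0; i < n_offered; i++)
		for (size_t j = 0; j < n_known; j++)
			if (strcmp(offered[i], known[j]) == 0)
				return (int)i;
	return -1;	/* no common identity: the handshake cannot use a PSK */
}
```

Since only one index comes back, a single peer ID in DONE is enough to
tell the kernel which of the identities passed in ACCEPT was used.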

> Note that currently the handshake upcall mechanism supports only
> one handshake per socket lifetime, as the handshake_req is
> released by the socket's sk_destruct callback.
> 
Oh, that's fine; we'll have one socket per (nvme) connection anyway.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



* Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests
  2023-02-27 17:21           ` Hannes Reinecke
@ 2023-02-27 18:10             ` Chuck Lever III
  2023-02-28  6:58               ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever III @ 2023-02-27 18:10 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Chuck Lever, kuba, pabeni, edumazet, netdev, kernel-tls-handshake



> On Feb 27, 2023, at 12:21 PM, Hannes Reinecke <hare@suse.de> wrote:
> 
>> On 2/27/23 16:39, Chuck Lever III wrote:
>>> On Feb 27, 2023, at 10:14 AM, Hannes Reinecke <hare@suse.de> wrote:
>>> 
>>> Problem here is with using different key materials.
>>> As the current handshake can only deal with one key at a time
>>> the only chance we have for several possible keys is to retry
>>> the handshake with the next key.
>>> But out of necessity we have to use the _same_ connection
>>> (as tlshd doesn't control the socket). So we cannot close
>>> the socket, and hence we can't notify userspace to give up the handshake attempt.
>>> Being able to send a signal would be simple; sending SIGHUP to userspace, and wait for the 'done' call.
>>> If it doesn't come we can terminate all attempts.
>>> But if we get the 'done' call we know it's safe to start with the next attempt.
>> We solve this problem by enabling the kernel to provide all those
>> materials to tlshd in one go.
> Ah. Right, that would work, too; provide all possible keys to the 'accept' call and let the userspace agent figure out what to do with them. That makes life certainly easier for the kernel side.
> 
>> I don't think there's a "retry" situation here. Once the handshake
>> has failed, the client peer has to know to try again. That would
>> mean retrying would have to be part of the upper layer protocol.
>> Does an NVMe initiator know it has to drive another handshake if
>> the first one fails, or does it rely on the handshake itself to
>> try all available identities?
>> We don't have a choice but to provide all the keys at once and
>> let the handshake negotiation deal with it.
>> I'm working on DONE passing multiple remote peer IDs back to the
>> kernel now. I don't see why ACCEPT couldn't pass multiple peer IDs
>> the other way.
> Nope. That's not required.
> DONE can only ever have one peer id (TLS 1.3 specifies that the client sends a list of identities, the server picks one, and sends that one back to the client). So for DONE we will only ever have 1 peer ID.
> If we allow for several peer IDs to be present in the client ACCEPT message then we'd need to include the resulting peer ID in the client DONE, too; otherwise we'll need it for the server DONE only.
> 
> So all in all I think we should be going with the multiple IDs in the ACCEPT call (ie move the key id from being part of the message into an attribute), and have a peer id present in the DONE call for both versions, server and client.

To summarize:

---

The ACCEPT request (from tlshd) would have just the handler class
"Which handler is responding". The kernel uses that to find a
handshake request waiting for that type of handler. In our case,
"tlshd".

The ACCEPT response (from the kernel) would have the socket fd,
the handshake parameters, and zero or more peer ID key serial
numbers. (Today, just zero or one peer IDs).

There is also an errno status in the ACCEPT response, which
the kernel can use to indicate things like "no requests in that
class were found" or that the request was otherwise improperly
formed.

---

The DONE request (from tlshd) would have the socket fd (and
implicitly, the handler's PID), the session status, and zero
or one remote peer ID key serial numbers.

The DONE response (from the kernel) is an ACK. (Today it's
more than that, but that's broken and will be removed).

---

For the DONE request, the session status is one of:

0: session established -- see @peerid for authentication status
EIO: local error
EACCES: handshake rejected

For server handshake completion:

@peerid contains the remote peer ID if the session was
authenticated, or TLS_NO_PEERID if the session was not
authenticated.

status == EACCES if authentication material was present from
both peers but verification failed.

For client handshake completion:

@peerid contains the remote peer ID if authentication was
requested and the session was authenticated

status == EACCES if authentication was requested and the
session was not authenticated, or if verification failed.

(Maybe client could work like the server side, and the
kernel consumer would need to figure out if it cares
whether there was authentication).
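
A minimal sketch of how the kernel side might decode that status,
assuming the attribute carries a positive errno in a u32 as discussed
above. The function name is hypothetical, and folding unrecognized
values into -EIO is an assumption, not something the patch defines.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Map the unsigned status attribute from a DONE request to a
 * kernel-style return value: 0 on success, negative errno on failure.
 */
static int done_status_to_errno(uint32_t status)
{
	switch (status) {
	case 0:
		return 0;		/* session established */
	case EIO:
		return -EIO;		/* local error */
	case EACCES:
		return -EACCES;		/* handshake rejected */
	default:
		/* Assumption: anything else is reported as a local error. */
		return -EIO;
	}
}
```

Keeping the wire value unsigned and negating only at this boundary
avoids ever interpreting a u32 attribute as a signed number.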


Is that adequate?


--
Chuck Lever





* Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests
  2023-02-27 18:10             ` Chuck Lever III
@ 2023-02-28  6:58               ` Hannes Reinecke
  2023-02-28 14:28                 ` Chuck Lever III
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-02-28  6:58 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Chuck Lever, kuba, pabeni, edumazet, netdev, kernel-tls-handshake

On 2/27/23 19:10, Chuck Lever III wrote:
> 
> 
>> On Feb 27, 2023, at 12:21 PM, Hannes Reinecke <hare@suse.de> wrote:
>>
>>> On 2/27/23 16:39, Chuck Lever III wrote:
>>>> On Feb 27, 2023, at 10:14 AM, Hannes Reinecke <hare@suse.de> wrote:
>>>>
>>>> Problem here is with using different key materials.
>>>> As the current handshake can only deal with one key at a time
>>>> the only chance we have for several possible keys is to retry
>>>> the handshake with the next key.
>>>> But out of necessity we have to use the _same_ connection
>>>> (as tlshd doesn't control the socket). So we cannot close
>>>> the socket, and hence we can't notify userspace to give up the handshake attempt.
>>>> Being able to send a signal would be simple; sending SIGHUP to userspace, and wait for the 'done' call.
>>>> If it doesn't come we can terminate all attempts.
>>>> But if we get the 'done' call we know it's safe to start with the next attempt.
>>> We solve this problem by enabling the kernel to provide all those
>>> materials to tlshd in one go.
>> Ah. Right, that would work, too; provide all possible keys to the
>> 'accept' call and let the userspace agent figure out what to do with
>> them. That makes life certainly easier for the kernel side.
>>
>>> I don't think there's a "retry" situation here. Once the handshake
>>> has failed, the client peer has to know to try again. That would
>>> mean retrying would have to be part of the upper layer protocol.
>>> Does an NVMe initiator know it has to drive another handshake if
>>> the first one fails, or does it rely on the handshake itself to
>>> try all available identities?
>>> We don't have a choice but to provide all the keys at once and
>>> let the handshake negotiation deal with it.
>>> I'm working on DONE passing multiple remote peer IDs back to the
>>> kernel now. I don't see why ACCEPT couldn't pass multiple peer IDs
>>> the other way.
>> Nope. That's not required.
>> DONE can only ever have one peer id (TLS 1.3 specifies that the client
>> sends a list of identities, the server picks one, and sends that one back
>> to the client). So for DONE we will only ever have 1 peer ID.
>> If we allow for several peer IDs to be present in the client ACCEPT message
>> then we'd need to include the resulting peer ID in the client DONE, too;
>> otherwise we'll need it for the server DONE only.
>>
>> So all in all I think we should be going with the multiple IDs in the
>> ACCEPT call (ie move the key id from being part of the message into an
>> attribute), and have a peer id present in the DONE call for both versions,
>> server and client.
> 
> To summarize:
> 
> ---
> 
> The ACCEPT request (from tlshd) would have just the handler class
> "Which handler is responding". The kernel uses that to find a
> handshake request waiting for that type of handler. In our case,
> "tlshd".
> 
> The ACCEPT response (from the kernel) would have the socket fd,
> the handshake parameters, and zero or more peer ID key serial
> numbers. (Today, just zero or one peer IDs).
>
> There is also an errno status in the ACCEPT response, which
> the kernel can use to indicate things like "no requests in that
> class were found" or that the request was otherwise improperly
> formed.
> 
> ---
> 
> The DONE request (from tlshd) would have the socket fd (and
> implicitly, the handler's PID), the session status, and zero
> or one remote peer ID key serial numbers.
>
> The DONE response (from the kernel) is an ACK. (Today it's
> more than that, but that's broken and will be removed).
> 
> ---
> 
> For the DONE request, the session status is one of:
> 
> 0: session established -- see @peerid for authentication status
> EIO: local error
> EACCES: handshake rejected
> 
> For server handshake completion:
> 
> @peerid contains the remote peer ID if the session was
> authenticated, or TLS_NO_PEERID if the session was not
> authenticated.
> 
> status == EACCES if authentication material was present from
> both peers but verification failed.
> 
> For client handshake completion:
> 
> @peerid contains the remote peer ID if authentication was
> requested and the session was authenticated
> 
> status == EACCES if authentication was requested and the
> session was not authenticated, or if verification failed.
> 
> (Maybe client could work like the server side, and the
> kernel consumer would need to figure out if it cares
> whether there was authentication).
> 
Yes, that would be my preference. Always return @peerid
for DONE if the TLS session was established.
We might also consider returning @peerid with EACCES
to indicate the offending ID.

> 
> Is that adequate?
> 
Yes, it is.

So the only bone of contention is the timeout; as we won't
be implementing signals I still think that we should have
a 'timeout' attribute. And if only to feed the TLS timeout
parameter for gnutls ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



* Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests
  2023-02-28  6:58               ` Hannes Reinecke
@ 2023-02-28 14:28                 ` Chuck Lever III
  2023-02-28 15:48                   ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever III @ 2023-02-28 14:28 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Chuck Lever, kuba, pabeni, edumazet, netdev, kernel-tls-handshake



> On Feb 28, 2023, at 1:58 AM, Hannes Reinecke <hare@suse.de> wrote:
> 
> On 2/27/23 19:10, Chuck Lever III wrote:
>>> On Feb 27, 2023, at 12:21 PM, Hannes Reinecke <hare@suse.de> wrote:
>>> 
>>>> On 2/27/23 16:39, Chuck Lever III wrote:
>>>>> On Feb 27, 2023, at 10:14 AM, Hannes Reinecke <hare@suse.de> wrote:
>>>>> 
>>>>> Problem here is with using different key materials.
>>>>> As the current handshake can only deal with one key at a time
>>>>> the only chance we have for several possible keys is to retry
>>>>> the handshake with the next key.
>>>>> But out of necessity we have to use the _same_ connection
>>>>> (as tlshd doesn't control the socket). So we cannot close
>>>>> the socket, and hence we can't notify userspace to give up the handshake attempt.
>>>>> Being able to send a signal would be simple; sending SIGHUP to userspace, and wait for the 'done' call.
>>>>> If it doesn't come we can terminate all attempts.
>>>>> But if we get the 'done' call we know it's safe to start with the next attempt.
>>>> We solve this problem by enabling the kernel to provide all those
>>>> materials to tlshd in one go.
>>> Ah. Right, that would work, too; provide all possible keys to the
>>> 'accept' call and let the userspace agent figure out what to do with
>>> them. That makes life certainly easier for the kernel side.
>>> 
>>>> I don't think there's a "retry" situation here. Once the handshake
>>>> has failed, the client peer has to know to try again. That would
>>>> mean retrying would have to be part of the upper layer protocol.
>>>> Does an NVMe initiator know it has to drive another handshake if
>>>> the first one fails, or does it rely on the handshake itself to
>>>> try all available identities?
>>>> We don't have a choice but to provide all the keys at once and
>>>> let the handshake negotiation deal with it.
>>>> I'm working on DONE passing multiple remote peer IDs back to the
>>>> kernel now. I don't see why ACCEPT couldn't pass multiple peer IDs
>>>> the other way.
>>> Nope. That's not required.
>>> DONE can only ever have one peer id (TLS 1.3 specifies that the client
>>> sends a list of identities, the server picks one, and sends that one back
>>> to the client). So for DONE we will only ever have 1 peer ID.
>>> If we allow for several peer IDs to be present in the client ACCEPT message
>>> then we'd need to include the resulting peer ID in the client DONE, too;
>>> otherwise we'll need it for the server DONE only.
>>> 
>>> So all in all I think we should be going with the multiple IDs in the
>>> ACCEPT call (ie move the key id from being part of the message into an
>>> attribute), and have a peer id present in the DONE call for both versions,
>>> server and client.
>> To summarize:
>> ---
>> The ACCEPT request (from tlshd) would have just the handler class
>> "Which handler is responding". The kernel uses that to find a
>> handshake request waiting for that type of handler. In our case,
>> "tlshd".
>> The ACCEPT response (from the kernel) would have the socket fd,
>> the handshake parameters, and zero or more peer ID key serial
>> numbers. (Today, just zero or one peer IDs).
>> There is also an errno status in the ACCEPT response, which
>> the kernel can use to indicate things like "no requests in that
>> class were found" or that the request was otherwise improperly
>> formed.
>> ---
>> The DONE request (from tlshd) would have the socket fd (and
>> implicitly, the handler's PID), the session status, and zero
>> or one remote peer ID key serial numbers.
>> The DONE response (from the kernel) is an ACK. (Today it's
>> more than that, but that's broken and will be removed).
>> ---
>> For the DONE request, the session status is one of:
>> 0: session established -- see @peerid for authentication status
>> EIO: local error
>> EACCES: handshake rejected
>> For server handshake completion:
>> @peerid contains the remote peer ID if the session was
>> authenticated, or TLS_NO_PEERID if the session was not
>> authenticated.
>> status == EACCES if authentication material was present from
>> both peers but verification failed.
>> For client handshake completion:
>> @peerid contains the remote peer ID if authentication was
>> requested and the session was authenticated
>> status == EACCES if authentication was requested and the
>> session was not authenticated, or if verification failed.
>> (Maybe client could work like the server side, and the
>> kernel consumer would need to figure out if it cares
>> whether there was authentication).
> Yes, that would be my preference. Always return @peerid
> for DONE if the TLS session was established.

You mean if the TLS session was authenticated. The server
won't receive a remote peer identity if the client peer
doesn't authenticate.


> We might also consider returning @peerid with EACCES
> to indicate the offending ID.

I'll look into that.


>> Is that adequate?
> Yes, it is.

What about the narrow set of DONE status values? You've
recently wanted to add ENOMEM, ENOKEY, and EINVAL to
this set. My experience is that these status values are
nearly always obscured before they can get back to the
requesting user.

Can the kernel make use of ENOMEM, for example? It might
be able to retry, I suppose... retrying is not sensible
for the server side.


> So the only bone of contention is the timeout; as we won't
> be implementing signals I still think that we should have
> a 'timeout' attribute. And if only to feed the TLS timeout
> parameter for gnutls ...

I'm still not seeing the case for making it an individual
parameter for each handshake request. Maybe a config
parameter, if a short timeout is actually needed... even
then, maybe a built-in timeout is preferable to yet another
tuning knob that can be abused.

I'd like to see some testing results to determine that a
short timeout is the only way to handle corner cases.


--
Chuck Lever





* Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests
  2023-02-28 14:28                 ` Chuck Lever III
@ 2023-02-28 15:48                   ` Hannes Reinecke
  2023-02-28 16:01                     ` Chuck Lever III
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-02-28 15:48 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Chuck Lever, kuba, pabeni, edumazet, netdev, kernel-tls-handshake

On 2/28/23 15:28, Chuck Lever III wrote:
> 
> 
>> On Feb 28, 2023, at 1:58 AM, Hannes Reinecke <hare@suse.de> wrote:
>>
>> On 2/27/23 19:10, Chuck Lever III wrote:
>>>> On Feb 27, 2023, at 12:21 PM, Hannes Reinecke <hare@suse.de> wrote:
>>>>
>>>>> On 2/27/23 16:39, Chuck Lever III wrote:
>>>>>> On Feb 27, 2023, at 10:14 AM, Hannes Reinecke <hare@suse.de> wrote:
>>>>>>
>>>>>> Problem here is with using different key materials.
>>>>>> As the current handshake can only deal with one key at a time
>>>>>> the only chance we have for several possible keys is to retry
>>>>>> the handshake with the next key.
>>>>>> But out of necessity we have to use the _same_ connection
>>>>>> (as tlshd doesn't control the socket). So we cannot close
>>>>>> the socket, and hence we can't notify userspace to give up the handshake attempt.
>>>>>> Being able to send a signal would be simple; sending SIGHUP to userspace, and wait for the 'done' call.
>>>>>> If it doesn't come we can terminate all attempts.
>>>>>> But if we get the 'done' call we know it's safe to start with the next attempt.
>>>>> We solve this problem by enabling the kernel to provide all those
>>>>> materials to tlshd in one go.
>>>> Ah. Right, that would work, too; provide all possible keys to the
>>>> 'accept' call and let the userspace agent figure out what to do with
>>>> them. That makes life certainly easier for the kernel side.
>>>>
>>>>> I don't think there's a "retry" situation here. Once the handshake
>>>>> has failed, the client peer has to know to try again. That would
>>>>> mean retrying would have to be part of the upper layer protocol.
>>>>> Does an NVMe initiator know it has to drive another handshake if
>>>>> the first one fails, or does it rely on the handshake itself to
>>>>> try all available identities?
>>>>> We don't have a choice but to provide all the keys at once and
>>>>> let the handshake negotiation deal with it.
>>>>> I'm working on DONE passing multiple remote peer IDs back to the
>>>>> kernel now. I don't see why ACCEPT couldn't pass multiple peer IDs
>>>>> the other way.
>>>> Nope. That's not required.
>>>> DONE can only ever have one peer id (TLS 1.3 specifies that the client
>>>> sends a list of identities, the server picks one, and sends that one back
>>>> to the client). So for DONE we will only ever have 1 peer ID.
>>>> If we allow for several peer IDs to be present in the client ACCEPT message
>>>> then we'd need to include the resulting peer ID in the client DONE, too;
>>>> otherwise we'll need it for the server DONE only.
>>>>
>>>> So all in all I think we should be going with the multiple IDs in the
>>>> ACCEPT call (ie move the key id from being part of the message into an
>>>> attribute), and have a peer id present in the DONE call for both versions,
>>>> server and client.
>>> To summarize:
>>> ---
>>> The ACCEPT request (from tlshd) would have just the handler class
>>> "Which handler is responding". The kernel uses that to find a
>>> handshake request waiting for that type of handler. In our case,
>>> "tlshd".
>>> The ACCEPT response (from the kernel) would have the socket fd,
>>> the handshake parameters, and zero or more peer ID key serial
>>> numbers. (Today, just zero or one peer IDs).
>>> There is also an errno status in the ACCEPT response, which
>>> the kernel can use to indicate things like "no requests in that
>>> class were found" or that the request was otherwise improperly
>>> formed.
>>> ---
>>> The DONE request (from tlshd) would have the socket fd (and
>>> implicitly, the handler's PID), the session status, and zero
>>> or one remote peer ID key serial numbers.
>>> The DONE response (from the kernel) is an ACK. (Today it's
>>> more than that, but that's broken and will be removed).
>>> ---
>>> For the DONE request, the session status is one of:
>>> 0: session established -- see @peerid for authentication status
>>> EIO: local error
>>> EACCES: handshake rejected
>>> For server handshake completion:
>>> @peerid contains the remote peer ID if the session was
>>> authenticated, or TLS_NO_PEERID if the session was not
>>> authenticated.
>>> status == EACCES if authentication material was present from
>>> both peers but verification failed.
>>> For client handshake completion:
>>> @peerid contains the remote peer ID if authentication was
>>> requested and the session was authenticated
>>> status == EACCES if authentication was requested and the
>>> session was not authenticated, or if verification failed.
>>> (Maybe client could work like the server side, and the
>>> kernel consumer would need to figure out if it cares
>>> whether there was authentication).
>> Yes, that would be my preference. Always return @peerid
>> for DONE if the TLS session was established.
> 
> You mean if the TLS session was authenticated. The server
> won't receive a remote peer identity if the client peer
> doesn't authenticate.
> 
Ah, yes, forgot about that.
(PSK always 'authenticates', as the identity is what is used to
find the appropriate PSK ...)

> 
>> We might also consider returning @peerid with EACCES
>> to indicate the offending ID.
> 
> I'll look into that.
> 
> 
>>> Is that adequate?
>> Yes, it is.
> 
> What about the narrow set of DONE status values? You've
> recently wanted to add ENOMEM, ENOKEY, and EINVAL to
> this set. My experience is that these status values are
> nearly always obscured before they can get back to the
> requesting user.
> 
> Can the kernel make use of ENOMEM, for example? It might
> be able to retry, I suppose... retrying is not sensible
> for the server side.
> 
The usual problem: Retry or no retry.
Sadly, error numbers are no good indicator of that.
Maybe we should take the NVMe approach and add a _different_
attribute indicating whether this particular error status
should be retried.

> 
>> So the only bone of contention is the timeout; as we won't
>> be implementing signals I still think that we should have
>> a 'timeout' attribute. And if only to feed the TLS timeout
>> parameter for gnutls ...
> 
> I'm still not seeing the case for making it an individual
> parameter for each handshake request. Maybe a config
> parameter, if a short timeout is actually needed... even
> then, maybe a built-in timeout is preferable to yet another
> tuning knob that can be abused.
> 
The problem I see is that the kernel-side needs to make forward
progress eventually, and calling into userspace is a good recipe
for violating that principle.
Sending a timeout value as a netlink parameter has the advantage
that both sides are aware that there _is_ a timeout.
The alternative would be an unconditional wait in the kernel,
and a very real possibility of a stuck process.

> I'd like to see some testing results to determine that a
> short timeout is the only way to handle corner cases.
> 
Short timeouts are especially useful for testing and debugging;
timeout handlers are prone to issues, and hence need a really good
bashing to hash out issues.
And not having a timeout is also not a good idea, see above.

But yeah, in theory we could use a configuration timeout in tlshd.

In the end, it's _just_ another netlink attribute, which might
(or might not) be present. Which replaces a built-in value.
I hadn't thought this to be such an issue ...

Cheers,

Hannes



* Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests
  2023-02-28 15:48                   ` Hannes Reinecke
@ 2023-02-28 16:01                     ` Chuck Lever III
  0 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever III @ 2023-02-28 16:01 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Chuck Lever, kuba, pabeni, edumazet, netdev, kernel-tls-handshake



> On Feb 28, 2023, at 10:48 AM, Hannes Reinecke <hare@suse.de> wrote:
> 
> On 2/28/23 15:28, Chuck Lever III wrote:
>>> On Feb 28, 2023, at 1:58 AM, Hannes Reinecke <hare@suse.de> wrote:
>>> 
>>> On 2/27/23 19:10, Chuck Lever III wrote:
>>> 
>> What about the narrow set of DONE status values? You've
>> recently wanted to add ENOMEM, ENOKEY, and EINVAL to
>> this set. My experience is that these status values are
>> nearly always obscured before they can get back to the
>> requesting user.
>> Can the kernel make use of ENOMEM, for example? It might
>> be able to retry, I suppose... retrying is not sensible
>> for the server side.
> The usual problem: Retry or no retry.
> Sadly, error numbers are no good indicator of that.
> Maybe we should take the NVMe approach and add a _different_
> attribute indicating whether this particular error status
> should be retried.

ENOMEM is obviously temporary. The others are permanent
errors. This is handled simply via a tiny protocol
specification, which I can add near tls_handshake_done().
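
As a rough sketch only, the rule could be written down like this. The
helper name is hypothetical and nothing like it appears in the posted
patches. It just encodes "ENOMEM is temporary, everything else is
permanent" for the DONE status values discussed above:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only: classify a DONE
 * session status as worth retrying or not. Only ENOMEM is treated
 * as temporary; EIO, EACCES, ENOKEY, EINVAL and anything else are
 * treated as permanent. A status of 0 means success, so there is
 * nothing to retry.
 */
static bool handshake_status_is_transient(int status)
{
	switch (status) {
	case ENOMEM:
		return true;
	default:
		return false;
	}
}
```

A consumer would call this on a non-zero status before deciding
whether to drive another handshake. That decision still belongs to
the upper layer protocol.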


>>> So the only bone of contention is the timeout; as we won't
>>> be implementing signals I still think that we should have
>>> a 'timeout' attribute. And if only to feed the TLS timeout
>>> parameter for gnutls ...
>> I'm still not seeing the case for making it an individual
>> parameter for each handshake request. Maybe a config
>> parameter, if a short timeout is actually needed... even
>> then, maybe a built-in timeout is preferable to yet another
>> tuning knob that can be abused.
> The problem I see is that the kernel-side needs to make forward
> progress eventually, and calling into userspace is a good recipe
> for violating that principle.

That's why RPC-with-TLS uses wait-interruptible-timeout.


> Sending a timeout value as a netlink parameter has the advantage
> that both sides are aware that there _is_ a timeout.
> The alternative would be an unconditional wait in the kernel,
> and a very real possibility of a stuck process.

I'm not following you. Why isn't wait-interruptible-timeout
in the kernel adequate?


>> I'd like to see some testing results to determine that a
>> short timeout is the only way to handle corner cases.
> Short timeouts are especially useful for testing and debugging;
> timeout handlers are prone to issues, and hence need a really good
> bashing to hash out issues.
> And not having a timeout is also not a good idea, see above.

RPC-with-TLS has a timeout. The kernel is in complete control
of it. After a few seconds, the kernel abandons the handshake
attempt and closes the socket. It doesn't care what the handler
agent does at that point.
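
To make the shape of that concrete, here is a user-space analogue. It
is not the RPC-with-TLS code, and the function name and the idea of a
"done" file descriptor are assumptions made for illustration. In the
kernel the equivalent wait would presumably be something like
wait_for_completion_interruptible_timeout(). The point is only that
the waiter owns the deadline and needs no cooperation from the other
side:

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical user-space stand-in for a bounded, interruptible wait.
 * Waits until done_fd becomes readable (standing in for "the handler
 * reported DONE") or timeout_ms elapses.
 *
 * Returns 0 when done_fd is ready, -ETIMEDOUT if the deadline
 * passed, or a negative errno (including -EINTR when a signal
 * interrupts the wait).
 */
static int wait_for_handshake_done(int done_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = done_fd, .events = POLLIN };
	int ret;

	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -errno;
	if (ret == 0)
		return -ETIMEDOUT;
	return 0;
}
```

On -ETIMEDOUT the caller abandons the attempt and closes the socket.
The handler agent never has to know what the timeout value was.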


> But yeah, in theory we could use a configuration timeout in tlshd.
> 
> In the end, it's _just_ another netlink attribute, which might
> (or might not) be present. Which replaces a built-in value.
> I hadn't thought this to be such an issue ...

It's an issue because you have not identified a particular
corner case (via reproducer) where user and kernel have to
agree on exactly the same timeout value, and it might be
different per-request.

Show me one, and I will agree to add it. So far, I haven't
seen sufficient justification.


--
Chuck Lever





end of thread, other threads:[~2023-02-28 16:01 UTC | newest]

Thread overview: 15+ messages
2023-02-24 19:19 [PATCH v5 0/2] Another crack at a handshake upcall mechanism Chuck Lever
2023-02-24 19:19 ` [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests Chuck Lever
2023-02-27  9:24   ` Hannes Reinecke
2023-02-27 14:59     ` Chuck Lever III
2023-02-27 15:14       ` Hannes Reinecke
2023-02-27 15:39         ` Chuck Lever III
2023-02-27 17:21           ` Hannes Reinecke
2023-02-27 18:10             ` Chuck Lever III
2023-02-28  6:58               ` Hannes Reinecke
2023-02-28 14:28                 ` Chuck Lever III
2023-02-28 15:48                   ` Hannes Reinecke
2023-02-28 16:01                     ` Chuck Lever III
2023-02-24 19:19 ` [PATCH v5 2/2] net/tls: Add kernel APIs for requesting a TLSv1.3 handshake Chuck Lever
2023-02-27  9:36   ` Hannes Reinecke
2023-02-27 15:01     ` Chuck Lever III
