kernel-tls-handshake.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [PATCHv3 00/18] nvme: In-kernel TLS support for TCP
@ 2023-04-17 13:02 Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 01/18] nvme-keyring: register '.nvme' keyring Hannes Reinecke
                   ` (17 more replies)
  0 siblings, 18 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

Hi all,

finally I've managed to put all things together and enable in-kernel
TLS support for NVMe-over-TCP.

The patchset is based on the TLS upcall mechanism from Chuck Lever
(cf '[PATCH v7 0/2] Another crack at a handshake upcall mechanism'
posted to the linux netdev list), and requires the 'tlshd' userspace
daemon (https://github.com/oracle/ktls-utils) for the actual TLS handshake.
Changes for nvme-cli are already included in the upstream repository.

Theory of operation:
A dedicated '.nvme' keyring is created to hold the pre-shared keys (PSKs)
for the TLS handshake. Keys will have to be provisioned before TLS handshake
is attempted; that can be done with the 'nvme gen-tls-key' command for nvme-cli
(patches are already merged upstream).
After connection to the remote TCP port the client side will use the
'best' PSK (as inferred from the NVMe TCP spec) or the PSK specified
by the '--tls_key' option to nvme-cli and call the TLS userspace daemon
to initiate a TLS handshake.
The server side will then invoke the TLS userspace daemon to run the TLS
handshake.
If the TLS handshake succeeds the userspace daemon will be activating
kTLS on the socket, and control is passed back to the kernel.

To make this work I had to implement the 'read_sock()' functionality
for TLS; it seems to be holding up well enough (for me), but it really
could do with reviews from persons with more network stack knowledge.

As usual, comments and reviews are welcome.

Changes to v2:
- Included reviews from Sagi
- Removed MSG_SENDPAGE_NOTLAST
- Improved MSG_EOR handling for TLS
- Add config options NVME_TCP_TLS
  and NVME_TARGET_TCP_TLS

Changes to the original RFC:
- Add a CONFIG_NVME_TLS config option
- Use a single PSK for the TLS handshake
- Make TLS connections mandatory
- Do not peek messages for the server
- Simplify data_ready callback
- Implement read_sock() for TLS

Hannes Reinecke (18):
  nvme-keyring: register '.nvme' keyring
  nvme-keyring: define a 'psk' keytype
  nvme: add TCP TSAS definitions
  nvme-tcp: add definitions for TLS cipher suites
  nvme-keyring: implement nvme_tls_psk_default()
  net/tls: implement ->read_sock()
  net/tls: sanitize MSG_EOR handling
  nvme-tcp: do not set MSG_SENDPAGE_NOTLAST
  security/keys: export key_lookup()
  nvme/tcp: allocate socket file
  nvme-tcp: enable TLS handshake upcall
  nvme-tcp: control message handling for recvmsg()
  nvme-fabrics: parse options 'keyring' and 'tls_key'
  nvmet: make TCP sectype settable via configfs
  nvmet-tcp: allocate socket file
  nvmet-tcp: enable TLS handshake upcall
  nvmet-tcp: control messages for recvmsg()
  nvmet-tcp: add configfs attribute 'param_keyring'

 drivers/nvme/common/Kconfig    |   4 +
 drivers/nvme/common/Makefile   |   3 +-
 drivers/nvme/common/keyring.c  | 182 ++++++++++++++++++++++++++++
 drivers/nvme/host/Kconfig      |  14 +++
 drivers/nvme/host/core.c       |  33 ++++-
 drivers/nvme/host/fabrics.c    |  77 +++++++++++-
 drivers/nvme/host/fabrics.h    |   9 ++
 drivers/nvme/host/nvme.h       |   1 +
 drivers/nvme/host/tcp.c        | 175 +++++++++++++++++++++++++--
 drivers/nvme/target/Kconfig    |  14 +++
 drivers/nvme/target/configfs.c | 173 +++++++++++++++++++++++++-
 drivers/nvme/target/nvmet.h    |   1 +
 drivers/nvme/target/tcp.c      | 213 ++++++++++++++++++++++++++++++---
 include/linux/nvme-keyring.h   |  36 ++++++
 include/linux/nvme-tcp.h       |   6 +
 include/linux/nvme.h           |  10 ++
 net/tls/tls.h                  |   2 +
 net/tls/tls_device.c           |  10 ++
 net/tls/tls_main.c             |   2 +
 net/tls/tls_sw.c               |  92 ++++++++++++++
 security/keys/key.c            |   1 +
 21 files changed, 1020 insertions(+), 38 deletions(-)
 create mode 100644 drivers/nvme/common/keyring.c
 create mode 100644 include/linux/nvme-keyring.h

-- 
2.35.3


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 01/18] nvme-keyring: register '.nvme' keyring
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 02/18] nvme-keyring: define a 'psk' keytype Hannes Reinecke
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

Register a '.nvme' keyring to hold keys for TLS and DH-HMAC-CHAP and
add a new config option NVME_KEYRING.
We need a separate keyring for NVMe as the configuration is done
via individual commands (eg for configfs), and the usual per-session
or per-process keyrings can't be used.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/common/Kconfig   |  4 ++++
 drivers/nvme/common/Makefile  |  3 ++-
 drivers/nvme/common/keyring.c | 40 +++++++++++++++++++++++++++++++++++
 drivers/nvme/host/core.c      | 10 +++++++--
 include/linux/nvme-keyring.h  | 28 ++++++++++++++++++++++++
 5 files changed, 82 insertions(+), 3 deletions(-)
 create mode 100644 drivers/nvme/common/keyring.c
 create mode 100644 include/linux/nvme-keyring.h

diff --git a/drivers/nvme/common/Kconfig b/drivers/nvme/common/Kconfig
index 4514f44362dd..641b27adb047 100644
--- a/drivers/nvme/common/Kconfig
+++ b/drivers/nvme/common/Kconfig
@@ -2,3 +2,7 @@
 
 config NVME_COMMON
        tristate
+
+config NVME_KEYRING
+       bool
+       select KEYS
diff --git a/drivers/nvme/common/Makefile b/drivers/nvme/common/Makefile
index 720c625b8a52..0cbd0b0b8d49 100644
--- a/drivers/nvme/common/Makefile
+++ b/drivers/nvme/common/Makefile
@@ -4,4 +4,5 @@ ccflags-y			+= -I$(src)
 
 obj-$(CONFIG_NVME_COMMON)	+= nvme-common.o
 
-nvme-common-y			+= auth.o
+nvme-common-$(CONFIG_NVME_AUTH)	+= auth.o
+nvme-common-$(CONFIG_NVME_KEYRING) += keyring.o
diff --git a/drivers/nvme/common/keyring.c b/drivers/nvme/common/keyring.c
new file mode 100644
index 000000000000..5cf64b278119
--- /dev/null
+++ b/drivers/nvme/common/keyring.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 Hannes Reinecke, SUSE Labs
+ */
+
+#include <linux/module.h>
+#include <linux/seq_file.h>
+#include <linux/key-type.h>
+#include <keys/user-type.h>
+#include <linux/nvme.h>
+
+static struct key *nvme_keyring;
+
+key_serial_t nvme_keyring_id(void)
+{
+	return nvme_keyring->serial;
+}
+EXPORT_SYMBOL_GPL(nvme_keyring_id);
+
+int nvme_keyring_init(void)
+{
+	nvme_keyring = keyring_alloc(".nvme",
+				     GLOBAL_ROOT_UID, GLOBAL_ROOT_GID,
+				     current_cred(),
+				     (KEY_POS_ALL & ~KEY_POS_SETATTR) |
+				     (KEY_USR_ALL & ~KEY_USR_SETATTR),
+				     KEY_ALLOC_NOT_IN_QUOTA, NULL, NULL);
+	if (IS_ERR(nvme_keyring))
+		return PTR_ERR(nvme_keyring);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvme_keyring_init);
+
+void nvme_keyring_exit(void)
+{
+	key_revoke(nvme_keyring);
+	key_put(nvme_keyring);
+}
+EXPORT_SYMBOL_GPL(nvme_keyring_exit);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6c1e7d6709e0..79aa215aec76 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -25,6 +25,7 @@
 #include "nvme.h"
 #include "fabrics.h"
 #include <linux/nvme-auth.h>
+#include <linux/nvme-keyring.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -5401,12 +5402,16 @@ static int __init nvme_core_init(void)
 		result = PTR_ERR(nvme_ns_chr_class);
 		goto unregister_generic_ns;
 	}
-
-	result = nvme_init_auth();
+	result = nvme_keyring_init();
 	if (result)
 		goto destroy_ns_chr;
+	result = nvme_init_auth();
+	if (result)
+		goto keyring_exit;
 	return 0;
 
+keyring_exit:
+	nvme_keyring_exit();
 destroy_ns_chr:
 	class_destroy(nvme_ns_chr_class);
 unregister_generic_ns:
@@ -5430,6 +5435,7 @@ static int __init nvme_core_init(void)
 static void __exit nvme_core_exit(void)
 {
 	nvme_exit_auth();
+	nvme_keyring_exit();
 	class_destroy(nvme_ns_chr_class);
 	class_destroy(nvme_subsys_class);
 	class_destroy(nvme_class);
diff --git a/include/linux/nvme-keyring.h b/include/linux/nvme-keyring.h
new file mode 100644
index 000000000000..32bd264a71e6
--- /dev/null
+++ b/include/linux/nvme-keyring.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2023 Hannes Reinecke, SUSE Labs
+ */
+
+#ifndef _NVME_KEYRING_H
+#define _NVME_KEYRING_H
+
+#ifdef CONFIG_NVME_KEYRING
+
+key_serial_t nvme_keyring_id(void);
+int nvme_keyring_init(void);
+void nvme_keyring_exit(void);
+
+#else
+
+static inline key_serial_t nvme_keyring_id(void)
+{
+	return 0;
+}
+static inline int nvme_keyring_init(void)
+{
+	return 0;
+}
+static inline void nvme_keyring_exit(void) {}
+
+#endif /* !CONFIG_NVME_KEYRING */
+#endif /* _NVME_KEYRING_H */
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 02/18] nvme-keyring: define a 'psk' keytype
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 01/18] nvme-keyring: register '.nvme' keyring Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 03/18] nvme: add TCP TSAS definitions Hannes Reinecke
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

Define a 'psk' keytype to hold the NVMe TLS PSKs.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/common/keyring.c | 94 +++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/drivers/nvme/common/keyring.c b/drivers/nvme/common/keyring.c
index 5cf64b278119..494dd365052e 100644
--- a/drivers/nvme/common/keyring.c
+++ b/drivers/nvme/common/keyring.c
@@ -8,6 +8,8 @@
 #include <linux/key-type.h>
 #include <keys/user-type.h>
 #include <linux/nvme.h>
+#include <linux/nvme-tcp.h>
+#include <linux/nvme-keyring.h>
 
 static struct key *nvme_keyring;
 
@@ -17,8 +19,94 @@ key_serial_t nvme_keyring_id(void)
 }
 EXPORT_SYMBOL_GPL(nvme_keyring_id);
 
+static void nvme_tls_psk_describe(const struct key *key, struct seq_file *m)
+{
+	seq_puts(m, key->description);
+	seq_printf(m, ": %u", key->datalen);
+}
+
+static bool nvme_tls_psk_match(const struct key *key,
+			       const struct key_match_data *match_data)
+{
+	const char *match_id;
+	size_t match_len;
+
+	if (!key->description) {
+		pr_debug("%s: no key description\n", __func__);
+		return false;
+	}
+	match_len = strlen(key->description);
+	pr_debug("%s: id %s len %zd\n", __func__, key->description, match_len);
+
+	if (!match_data->raw_data) {
+		pr_debug("%s: no match data\n", __func__);
+		return false;
+	}
+	match_id = match_data->raw_data;
+	pr_debug("%s: match '%s' '%s' len %zd\n",
+		 __func__, match_id, key->description, match_len);
+	return !memcmp(key->description, match_id, match_len);
+}
+
+static int nvme_tls_psk_match_preparse(struct key_match_data *match_data)
+{
+	match_data->lookup_type = KEYRING_SEARCH_LOOKUP_ITERATE;
+	match_data->cmp = nvme_tls_psk_match;
+	return 0;
+}
+
+static struct key_type nvme_tls_psk_key_type = {
+	.name           = "psk",
+	.flags          = KEY_TYPE_NET_DOMAIN,
+	.preparse       = user_preparse,
+	.free_preparse  = user_free_preparse,
+	.match_preparse = nvme_tls_psk_match_preparse,
+	.instantiate    = generic_key_instantiate,
+	.revoke         = user_revoke,
+	.destroy        = user_destroy,
+	.describe       = nvme_tls_psk_describe,
+	.read           = user_read,
+};
+
+static struct key *nvme_tls_psk_lookup(struct key *keyring,
+		const char *hostnqn, const char *subnqn,
+		int hmac, bool generated)
+{
+	char *identity;
+	size_t identity_len = (NVMF_NQN_SIZE) * 2 + 11;
+	key_ref_t keyref;
+	key_serial_t keyring_id;
+
+	identity = kzalloc(identity_len, GFP_KERNEL);
+	if (!identity)
+		return ERR_PTR(-ENOMEM);
+
+	snprintf(identity, identity_len, "NVMe0%c%02d %s %s",
+		 generated ? 'G' : 'R', hmac, hostnqn, subnqn);
+
+	if (!keyring)
+		keyring = nvme_keyring;
+	keyring_id = key_serial(keyring);
+	pr_debug("keyring %x lookup tls psk '%s'\n",
+		 keyring_id, identity);
+	keyref = keyring_search(make_key_ref(keyring, true),
+				&nvme_tls_psk_key_type,
+				identity, false);
+	if (IS_ERR(keyref)) {
+		pr_debug("lookup tls psk '%s' failed, error %ld\n",
+			 identity, PTR_ERR(keyref));
+		kfree(identity);
+		return ERR_PTR(-ENOKEY);
+	}
+	kfree(identity);
+
+	return key_ref_to_ptr(keyref);
+}
+
 int nvme_keyring_init(void)
 {
+	int err;
+
 	nvme_keyring = keyring_alloc(".nvme",
 				     GLOBAL_ROOT_UID, GLOBAL_ROOT_GID,
 				     current_cred(),
@@ -28,12 +116,18 @@ int nvme_keyring_init(void)
 	if (IS_ERR(nvme_keyring))
 		return PTR_ERR(nvme_keyring);
 
+	err = register_key_type(&nvme_tls_psk_key_type);
+	if (err) {
+		key_put(nvme_keyring);
+		return err;
+	}
 	return 0;
 }
 EXPORT_SYMBOL_GPL(nvme_keyring_init);
 
 void nvme_keyring_exit(void)
 {
+	unregister_key_type(&nvme_tls_psk_key_type);
 	key_revoke(nvme_keyring);
 	key_put(nvme_keyring);
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 03/18] nvme: add TCP TSAS definitions
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 01/18] nvme-keyring: register '.nvme' keyring Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 02/18] nvme-keyring: define a 'psk' keytype Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 04/18] nvme-tcp: add definitions for TLS cipher suites Hannes Reinecke
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 include/linux/nvme.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 4fad4aa245fb..6e55a7a701e6 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -108,6 +108,13 @@ enum {
 	NVMF_RDMA_CMS_RDMA_CM	= 1, /* Sockets based endpoint addressing */
 };
 
+/* TSAS SECTYPE for TCP transport */
+enum {
+	NVMF_TCP_SECTYPE_NONE = 0, /* No Security */
+	NVMF_TCP_SECTYPE_TLS12 = 1, /* TLSv1.2, NVMe-oF 1.1 and NVMe-TCP 3.6.1.1 */
+	NVMF_TCP_SECTYPE_TLS13 = 2, /* TLSv1.3, NVMe-oF 1.1 and NVMe-TCP 3.6.1.1 */
+};
+
 #define NVME_AQ_DEPTH		32
 #define NVME_NR_AEN_COMMANDS	1
 #define NVME_AQ_BLK_MQ_DEPTH	(NVME_AQ_DEPTH - NVME_NR_AEN_COMMANDS)
@@ -1453,6 +1460,9 @@ struct nvmf_disc_rsp_page_entry {
 			__u16	pkey;
 			__u8	resv10[246];
 		} rdma;
+		struct tcp {
+			__u8	sectype;
+		} tcp;
 	} tsas;
 };
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 04/18] nvme-tcp: add definitions for TLS cipher suites
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (2 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 03/18] nvme: add TCP TSAS definitions Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 05/18] nvme-keyring: implement nvme_tls_psk_default() Hannes Reinecke
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 include/linux/nvme-tcp.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/nvme-tcp.h b/include/linux/nvme-tcp.h
index 75470159a194..19ca863813f1 100644
--- a/include/linux/nvme-tcp.h
+++ b/include/linux/nvme-tcp.h
@@ -18,6 +18,12 @@ enum nvme_tcp_pfv {
 	NVME_TCP_PFV_1_0 = 0x0,
 };
 
+enum nvme_tcp_tls_cipher {
+	NVME_TCP_TLS_CIPHER_INVALID     = 0,
+	NVME_TCP_TLS_CIPHER_SHA256      = 1,
+	NVME_TCP_TLS_CIPHER_SHA384      = 2,
+};
+
 enum nvme_tcp_fatal_error_status {
 	NVME_TCP_FES_INVALID_PDU_HDR		= 0x01,
 	NVME_TCP_FES_PDU_SEQ_ERR		= 0x02,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 05/18] nvme-keyring: implement nvme_tls_psk_default()
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (3 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 04/18] nvme-tcp: add definitions for TLS cipher suites Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 06/18] net/tls: implement ->read_sock() Hannes Reinecke
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

Implement a function to select the preferred PSK for TLS.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/common/keyring.c | 48 +++++++++++++++++++++++++++++++++++
 include/linux/nvme-keyring.h  |  8 ++++++
 2 files changed, 56 insertions(+)

diff --git a/drivers/nvme/common/keyring.c b/drivers/nvme/common/keyring.c
index 494dd365052e..f8d9a208397b 100644
--- a/drivers/nvme/common/keyring.c
+++ b/drivers/nvme/common/keyring.c
@@ -5,6 +5,7 @@
 
 #include <linux/module.h>
 #include <linux/seq_file.h>
+#include <linux/key.h>
 #include <linux/key-type.h>
 #include <keys/user-type.h>
 #include <linux/nvme.h>
@@ -103,6 +104,53 @@ static struct key *nvme_tls_psk_lookup(struct key *keyring,
 	return key_ref_to_ptr(keyref);
 }
 
+/*
+ * NVMe PSK priority list
+ *
+ * 'Retained' PSKs (ie 'generated == false')
+ * should be preferred to 'generated' PSKs,
+ * and SHA-384 should be preferred to SHA-256.
+ */
+struct nvme_tls_psk_priority_list {
+	bool generated;
+	enum nvme_tcp_tls_cipher cipher;
+} nvme_tls_psk_prio[] = {
+	{ .generated = false,
+	  .cipher = NVME_TCP_TLS_CIPHER_SHA384, },
+	{ .generated = false,
+	  .cipher = NVME_TCP_TLS_CIPHER_SHA256, },
+	{ .generated = true,
+	  .cipher = NVME_TCP_TLS_CIPHER_SHA384, },
+	{ .generated = true,
+	  .cipher = NVME_TCP_TLS_CIPHER_SHA256, },
+};
+
+/*
+ * nvme_tls_psk_default - Return the preferred PSK to use for TLS ClientHello
+ */
+key_serial_t nvme_tls_psk_default(struct key *keyring,
+		      const char *hostnqn, const char *subnqn)
+{
+	struct key *tls_key;
+	key_serial_t tls_key_id;
+	int prio;
+
+	for (prio = 0; prio < ARRAY_SIZE(nvme_tls_psk_prio); prio++) {
+		bool generated = nvme_tls_psk_prio[prio].generated;
+		enum nvme_tcp_tls_cipher cipher = nvme_tls_psk_prio[prio].cipher;
+
+		tls_key = nvme_tls_psk_lookup(keyring, hostnqn, subnqn,
+					      cipher, generated);
+		if (!IS_ERR(tls_key)) {
+			tls_key_id = tls_key->serial;
+			key_put(tls_key);
+			return tls_key_id;
+		}
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvme_tls_psk_default);
+
 int nvme_keyring_init(void)
 {
 	int err;
diff --git a/include/linux/nvme-keyring.h b/include/linux/nvme-keyring.h
index 32bd264a71e6..4efea9dd967c 100644
--- a/include/linux/nvme-keyring.h
+++ b/include/linux/nvme-keyring.h
@@ -8,12 +8,20 @@
 
 #ifdef CONFIG_NVME_KEYRING
 
+key_serial_t nvme_tls_psk_default(struct key *keyring,
+		const char *hostnqn, const char *subnqn);
+
 key_serial_t nvme_keyring_id(void);
 int nvme_keyring_init(void);
 void nvme_keyring_exit(void);
 
 #else
 
+static inline key_serial_t nvme_tls_psk_default(struct key *keyring,
+		const char *hostnqn, const char *subnqn)
+{
+	return 0;
+}
 static inline key_serial_t nvme_keyring_id(void)
 {
 	return 0;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 06/18] net/tls: implement ->read_sock()
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (4 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 05/18] nvme-keyring: implement nvme_tls_psk_default() Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 15:14   ` Sagi Grimberg
  2023-04-17 13:02 ` [PATCH 07/18] net/tls: sanitize MSG_EOR handling Hannes Reinecke
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke, Boris Pismenny

Implement ->read_sock() function for use with nvme-tcp.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Boris Pismenny <boris.pismenny@gmail.com>
---
 net/tls/tls.h      |  2 ++
 net/tls/tls_main.c |  2 ++
 net/tls/tls_sw.c   | 71 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 75 insertions(+)

diff --git a/net/tls/tls.h b/net/tls/tls.h
index 804c3880d028..a5bf3a9ce142 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -113,6 +113,8 @@ bool tls_sw_sock_is_readable(struct sock *sk);
 ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
 			   struct pipe_inode_info *pipe,
 			   size_t len, unsigned int flags);
+int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
+		     sk_read_actor_t read_actor);
 
 int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int tls_device_sendpage(struct sock *sk, struct page *page,
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index b32c112984dd..0b51b19aa46d 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -921,9 +921,11 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
 
 	ops[TLS_BASE][TLS_SW  ] = ops[TLS_BASE][TLS_BASE];
 	ops[TLS_BASE][TLS_SW  ].splice_read	= tls_sw_splice_read;
+	ops[TLS_BASE][TLS_SW  ].read_sock	= tls_sw_read_sock;
 
 	ops[TLS_SW  ][TLS_SW  ] = ops[TLS_SW  ][TLS_BASE];
 	ops[TLS_SW  ][TLS_SW  ].splice_read	= tls_sw_splice_read;
+	ops[TLS_SW  ][TLS_SW  ].read_sock	= tls_sw_read_sock;
 
 #ifdef CONFIG_TLS_DEVICE
 	ops[TLS_HW  ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 635b8bf6b937..827292e29f99 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2214,6 +2214,77 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 	goto splice_read_end;
 }
 
+int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
+		     sk_read_actor_t read_actor)
+{
+	struct tls_context *tls_ctx = tls_get_ctx(sk);
+	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+	struct strp_msg *rxm = NULL;
+	struct tls_msg *tlm;
+	struct sk_buff *skb;
+	ssize_t copied = 0;
+	int err, used;
+
+	if (!skb_queue_empty(&ctx->rx_list)) {
+		skb = __skb_dequeue(&ctx->rx_list);
+	} else {
+		struct tls_decrypt_arg darg;
+
+		err = tls_rx_rec_wait(sk, NULL, true, true);
+		if (err <= 0)
+			return err;
+
+		memset(&darg.inargs, 0, sizeof(darg.inargs));
+
+		err = tls_rx_one_record(sk, NULL, &darg);
+		if (err < 0) {
+			tls_err_abort(sk, -EBADMSG);
+			return err;
+		}
+
+		tls_rx_rec_done(ctx);
+		skb = darg.skb;
+	}
+
+	do {
+		rxm = strp_msg(skb);
+		tlm = tls_msg(skb);
+
+		/* read_sock does not support reading control messages */
+		if (tlm->control != TLS_RECORD_TYPE_DATA) {
+			err = -EINVAL;
+			goto read_sock_requeue;
+		}
+
+		used = read_actor(desc, skb, rxm->offset, rxm->full_len);
+		if (used <= 0) {
+			err = used;
+			goto read_sock_end;
+		}
+
+		copied += used;
+		if (used < rxm->full_len) {
+			rxm->offset += used;
+			rxm->full_len -= used;
+			if (!desc->count)
+				goto read_sock_requeue;
+		} else {
+			consume_skb(skb);
+			if (desc->count && !skb_queue_empty(&ctx->rx_list))
+				skb = __skb_dequeue(&ctx->rx_list);
+			else
+				skb = NULL;
+		}
+	} while (skb);
+
+read_sock_end:
+	return copied ? : err;
+
+read_sock_requeue:
+	__skb_queue_head(&ctx->rx_list, skb);
+	goto read_sock_end;
+}
+
 bool tls_sw_sock_is_readable(struct sock *sk)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 07/18] net/tls: sanitize MSG_EOR handling
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (5 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 06/18] net/tls: implement ->read_sock() Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 15:19   ` Sagi Grimberg
  2023-04-17 13:02 ` [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST Hannes Reinecke
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

The TLS stack is using MSG_EOR internally, so the flag cannot be
set for sendmsg()/sendpage(). But to avoid having the caller to
check whether TLS is active modify the code to clear the MSG_EOR
flag. And blank out MSG_MORE / MSG_SENDPAGE_NOTLAST, too, as they
conflict with MSG_EOR anyway.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 net/tls/tls_device.c | 10 ++++++++++
 net/tls/tls_sw.c     | 21 +++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a7cc4f9faac2..38e44e216865 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -576,6 +576,10 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	mutex_lock(&tls_ctx->tx_lock);
 	lock_sock(sk);
 
+	/* MSG_EOR conflicts with MSG_MORE, so clear both */
+	if (msg->msg_flags & MSG_EOR)
+		msg->msg_flags &= ~(MSG_EOR | MSG_MORE);
+
 	if (unlikely(msg->msg_controllen)) {
 		rc = tls_process_cmsg(sk, msg, &record_type);
 		if (rc)
@@ -604,6 +608,12 @@ int tls_device_sendpage(struct sock *sk, struct page *page,
 	if (flags & MSG_SENDPAGE_NOTLAST)
 		flags |= MSG_MORE;
 
+	/*
+	 * MSG_EOR conflicts with MSG_MORE/MSG_SENDPAGE_NOTLAST,
+	 * so clear all of them */
+	if (flags & MSG_EOR)
+		flags &= ~(MSG_EOR | MSG_SENDPAGE_NOTLAST | MSG_MORE);
+
 	mutex_lock(&tls_ctx->tx_lock);
 	lock_sock(sk);
 
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 827292e29f99..d0e6b7a04176 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -952,6 +952,11 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	int ret = 0;
 	int pending;
 
+	if (msg->msg_flags & MSG_EOR) {
+		eor = true;
+		/* MSG_EOR conflicts with MSG_MORE, so clear both */
+		msg->msg_flags &= ~(MSG_EOR | MSG_MORE);
+	}
 	if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
 			       MSG_CMSG_COMPAT))
 		return -EOPNOTSUPP;
@@ -1274,6 +1279,14 @@ static int tls_sw_do_sendpage(struct sock *sk, struct page *page,
 int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
 			   int offset, size_t size, int flags)
 {
+	/*
+	 * MSG_EOR is invalid for TLS, and conflicts
+	 * with MSG_MORE / MSG_SENDPAGE_NOTLAST.
+	 * So clear all of them.
+	 */
+	if (flags & MSG_EOR)
+		flags &= ~(MSG_MORE | MSG_SENDPAGE_NOTLAST | MSG_EOR);
+
 	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
 		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY |
 		      MSG_NO_SHARED_FRAGS))
@@ -1288,6 +1301,14 @@ int tls_sw_sendpage(struct sock *sk, struct page *page,
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	int ret;
 
+	/*
+	 * MSG_EOR is invalid for TLS, and conflicts
+	 * with MSG_MORE / MSG_SENDPAGE_NOTLAST.
+	 * So clear all of them.
+	 */
+	if (flags & MSG_EOR)
+		flags &= ~(MSG_MORE | MSG_SENDPAGE_NOTLAST | MSG_EOR);
+
 	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
 		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
 		return -EOPNOTSUPP;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (6 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 07/18] net/tls: sanitize MSG_EOR handling Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 15:10   ` Sagi Grimberg
  2023-04-17 13:02 ` [PATCH 09/18] security/keys: export key_lookup() Hannes Reinecke
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

MSG_SENDPAGE_NOTLAST was introduced with commit 35f9c09fe9c7
("tcp: tcp_sendpages() should call tcp_push() once") to fix issues
with the ->sendfile() call, and the implication seems to be that
the flag should be strictly internal to sendfile.
Hence we shouldn't be setting it on nvme-tcp, and anyway doing
so will cause TLS sending invalid data.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/tcp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 273c1f2760a4..732aa2296560 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -995,7 +995,7 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
 		if (last && !queue->data_digest && !nvme_tcp_queue_more(queue))
 			flags |= MSG_EOR;
 		else
-			flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
+			flags |= MSG_MORE;
 
 		if (sendpage_ok(page)) {
 			ret = kernel_sendpage(queue->sock, page, offset, len,
@@ -1049,7 +1049,7 @@ static int nvme_tcp_try_send_cmd_pdu(struct nvme_tcp_request *req)
 	int ret;
 
 	if (inline_data || nvme_tcp_queue_more(queue))
-		flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
+		flags |= MSG_MORE;
 	else
 		flags |= MSG_EOR;
 
@@ -1091,7 +1091,7 @@ static int nvme_tcp_try_send_data_pdu(struct nvme_tcp_request *req)
 	if (!req->h2cdata_left)
 		ret = kernel_sendpage(queue->sock, virt_to_page(pdu),
 				offset_in_page(pdu) + req->offset, len,
-				MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST);
+				MSG_DONTWAIT | MSG_MORE);
 	else
 		ret = sock_no_sendpage(queue->sock, virt_to_page(pdu),
 				offset_in_page(pdu) + req->offset, len,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 09/18] security/keys: export key_lookup()
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (7 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 10/18] nvme/tcp: allocate socket file Hannes Reinecke
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke, David Howells

For in-kernel consumers one cannot readily assign a user (eg when
running from a workqueue), so the normal key search permissions
cannot be applied.
This patch exports the 'key_lookup()' function for a simple lookup
of keys without checking for permissions.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 security/keys/key.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/keys/key.c b/security/keys/key.c
index 5c0c7df833f8..bd1b7d45df90 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -693,6 +693,7 @@ struct key *key_lookup(key_serial_t id)
 	spin_unlock(&key_serial_lock);
 	return key;
 }
+EXPORT_SYMBOL_GPL(key_lookup);
 
 /*
  * Find and lock the specified key type against removal.
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 10/18] nvme/tcp: allocate socket file
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (8 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 09/18] security/keys: export key_lookup() Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 11/18] nvme-tcp: enable TLS handshake upcall Hannes Reinecke
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

When using the TLS upcall we need to allocate a socket file such
that the userspace daemon is able to use the socket.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/tcp.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 732aa2296560..0a39743da4e4 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1321,7 +1321,9 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
 	}
 
 	noreclaim_flag = memalloc_noreclaim_save();
-	sock_release(queue->sock);
+	/* ->sock will be released by fput() */
+	fput(queue->sock->file);
+	queue->sock = NULL;
 	memalloc_noreclaim_restore(noreclaim_flag);
 
 	kfree(queue->pdu);
@@ -1495,6 +1497,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid)
 	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
 	struct nvme_tcp_queue *queue = &ctrl->queues[qid];
 	int ret, rcv_pdu_size;
+	struct file *sock_file;
 
 	mutex_init(&queue->queue_lock);
 	queue->ctrl = ctrl;
@@ -1517,6 +1520,12 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid)
 		goto err_destroy_mutex;
 	}
 
+	sock_file = sock_alloc_file(queue->sock, O_CLOEXEC, NULL);
+	if (IS_ERR(sock_file)) {
+		sock_release(queue->sock);
+		ret = PTR_ERR(sock_file);
+		goto err_sock;
+	}
 	nvme_tcp_reclassify_socket(queue->sock);
 
 	/* Single syn retry */
@@ -1638,7 +1647,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid)
 	if (queue->hdr_digest || queue->data_digest)
 		nvme_tcp_free_crypto(queue);
 err_sock:
-	sock_release(queue->sock);
+	/* ->sock will be released by fput() */
+	fput(queue->sock->file);
 	queue->sock = NULL;
 err_destroy_mutex:
 	mutex_destroy(&queue->send_mutex);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 11/18] nvme-tcp: enable TLS handshake upcall
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (9 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 10/18] nvme/tcp: allocate socket file Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 15:26   ` Sagi Grimberg
  2023-04-18  5:52   ` Daniel Wagner
  2023-04-17 13:02 ` [PATCH 12/18] nvme-tcp: control message handling for recvmsg() Hannes Reinecke
                   ` (6 subsequent siblings)
  17 siblings, 2 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

Add a fabrics option 'tls' and start the TLS handshake upcall
with the default PSK. When TLS is started the PSK key serial
number is displayed in the sysfs attribute 'tls_key'

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/Kconfig   |  14 ++++
 drivers/nvme/host/core.c    |  23 ++++++-
 drivers/nvme/host/fabrics.c |  10 +++
 drivers/nvme/host/fabrics.h |   3 +
 drivers/nvme/host/nvme.h    |   1 +
 drivers/nvme/host/tcp.c     | 127 ++++++++++++++++++++++++++++++++++--
 6 files changed, 170 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index 2f6a7f8c94e8..96a74041bb0a 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -92,6 +92,20 @@ config NVME_TCP
 
 	  If unsure, say N.
 
+config NVME_TCP_TLS
+	bool "NVMe over Fabrics TCP TLS encryption support"
+	depends on NVME_TCP
+	select NVME_COMMON
+	select NVME_KEYRING
+	select NET_HANDSHAKE
+	help
+	  Enables TLS encryption for NVMe TCP using the netlink handshake API.
+
+	  The TLS handshake daemon is availble at
+	  https://github.com/oracle/ktls-utils.
+
+	  If unsure, say N.
+
 config NVME_AUTH
 	bool "NVM Express over Fabrics In-Band Authentication"
 	depends on NVME_CORE
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 79aa215aec76..937013127c91 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3889,6 +3889,19 @@ static DEVICE_ATTR(dhchap_ctrl_secret, S_IRUGO | S_IWUSR,
 	nvme_ctrl_dhchap_ctrl_secret_show, nvme_ctrl_dhchap_ctrl_secret_store);
 #endif
 
+#ifdef CONFIG_NVME_TCP_TLS
+static ssize_t tls_key_show(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+	if (!ctrl->tls_key)
+		return 0;
+	return sysfs_emit(buf, "%08x", key_serial(ctrl->tls_key));
+}
+static DEVICE_ATTR_RO(tls_key);
+#endif
+
 static struct attribute *nvme_dev_attrs[] = {
 	&dev_attr_reset_controller.attr,
 	&dev_attr_rescan_controller.attr,
@@ -3915,6 +3928,9 @@ static struct attribute *nvme_dev_attrs[] = {
 #ifdef CONFIG_NVME_AUTH
 	&dev_attr_dhchap_secret.attr,
 	&dev_attr_dhchap_ctrl_secret.attr,
+#endif
+#ifdef CONFIG_NVME_TCP_TLS
+	&dev_attr_tls_key.attr,
 #endif
 	NULL
 };
@@ -3945,7 +3961,10 @@ static umode_t nvme_dev_attrs_are_visible(struct kobject *kobj,
 	if (a == &dev_attr_dhchap_ctrl_secret.attr && !ctrl->opts)
 		return 0;
 #endif
-
+#ifdef CONFIG_NVME_TCP_TLS
+	if (a == &dev_attr_tls_key.attr && !ctrl->opts)
+		return 0;
+#endif
 	return a->mode;
 }
 
@@ -5080,7 +5099,7 @@ static void nvme_free_ctrl(struct device *dev)
 
 	if (!subsys || ctrl->instance != subsys->instance)
 		ida_free(&nvme_instance_ida, ctrl->instance);
-
+	key_put(ctrl->tls_key);
 	nvme_free_cels(ctrl);
 	nvme_mpath_uninit(ctrl);
 	nvme_auth_stop(ctrl);
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index bbaa04a0c502..a8d6c22515ca 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -609,6 +609,7 @@ static const match_table_t opt_tokens = {
 	{ NVMF_OPT_DISCOVERY,		"discovery"		},
 	{ NVMF_OPT_DHCHAP_SECRET,	"dhchap_secret=%s"	},
 	{ NVMF_OPT_DHCHAP_CTRL_SECRET,	"dhchap_ctrl_secret=%s"	},
+	{ NVMF_OPT_TLS,			"tls"			},
 	{ NVMF_OPT_ERR,			NULL			}
 };
 
@@ -632,6 +633,7 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 	opts->hdr_digest = false;
 	opts->data_digest = false;
 	opts->tos = -1; /* < 0 == use transport default */
+	opts->tls = false;
 
 	options = o = kstrdup(buf, GFP_KERNEL);
 	if (!options)
@@ -918,6 +920,14 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 			kfree(opts->dhchap_ctrl_secret);
 			opts->dhchap_ctrl_secret = p;
 			break;
+		case NVMF_OPT_TLS:
+			if (!IS_ENABLED(CONFIG_NVME_TCP_TLS)) {
+				pr_err("TLS is not supported\n");
+				ret = -EINVAL;
+				goto out;
+			}
+			opts->tls = true;
+			break;
 		default:
 			pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
 				p);
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index dcac3df8a5f7..5db36e250e7a 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -70,6 +70,7 @@ enum {
 	NVMF_OPT_DISCOVERY	= 1 << 22,
 	NVMF_OPT_DHCHAP_SECRET	= 1 << 23,
 	NVMF_OPT_DHCHAP_CTRL_SECRET = 1 << 24,
+	NVMF_OPT_TLS		= 1 << 25,
 };
 
 /**
@@ -102,6 +103,7 @@ enum {
  * @dhchap_secret: DH-HMAC-CHAP secret
  * @dhchap_ctrl_secret: DH-HMAC-CHAP controller secret for bi-directional
  *              authentication
+ * @tls:        Start TLS encrypted connections (TCP)
  * @disable_sqflow: disable controller sq flow control
  * @hdr_digest: generate/verify header digest (TCP)
  * @data_digest: generate/verify data digest (TCP)
@@ -128,6 +130,7 @@ struct nvmf_ctrl_options {
 	int			max_reconnects;
 	char			*dhchap_secret;
 	char			*dhchap_ctrl_secret;
+	bool			tls;
 	bool			disable_sqflow;
 	bool			hdr_digest;
 	bool			data_digest;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bf46f122e9e1..fbcc0ccf2ef2 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -347,6 +347,7 @@ struct nvme_ctrl {
 	struct nvme_dhchap_key *ctrl_key;
 	u16 transaction;
 #endif
+	struct key *tls_key;
 
 	/* Power saving configuration */
 	u64 ps_max_latency_us;
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 0a39743da4e4..b80bcc5fe447 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -8,9 +8,13 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/err.h>
+#include <linux/key.h>
 #include <linux/nvme-tcp.h>
+#include <linux/nvme-keyring.h>
 #include <net/sock.h>
 #include <net/tcp.h>
+#include <net/tls.h>
+#include <net/handshake.h>
 #include <linux/blk-mq.h>
 #include <crypto/hash.h>
 #include <net/busy_poll.h>
@@ -31,6 +35,16 @@ static int so_priority;
 module_param(so_priority, int, 0644);
 MODULE_PARM_DESC(so_priority, "nvme tcp socket optimize priority");
 
+#ifdef CONFIG_NVME_TCP_TLS
+/*
+ * TLS handshake timeout
+ */
+static int tls_handshake_timeout = 10;
+module_param(tls_handshake_timeout, int, 0644);
+MODULE_PARM_DESC(tls_handshake_timeout,
+		 "nvme TLS handshake timeout in seconds (default 10)");
+#endif
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 /* lockdep can detect a circular dependency of the form
  *   sk_lock -> mmap_lock (page fault) -> fs locks -> sk_lock
@@ -146,7 +160,10 @@ struct nvme_tcp_queue {
 	struct ahash_request	*snd_hash;
 	__le32			exp_ddgst;
 	__le32			recv_ddgst;
-
+#ifdef CONFIG_NVME_TCP_TLS
+	struct completion       tls_complete;
+	int                     tls_err;
+#endif
 	struct page_frag_cache	pf_cache;
 
 	void (*state_change)(struct sock *);
@@ -1492,7 +1509,79 @@ static void nvme_tcp_set_queue_io_cpu(struct nvme_tcp_queue *queue)
 	queue->io_cpu = cpumask_next_wrap(n - 1, cpu_online_mask, -1, false);
 }
 
-static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid)
+#ifdef CONFIG_NVME_TCP_TLS
+static void nvme_tcp_tls_done(void *data, int status, key_serial_t pskid)
+{
+	struct nvme_tcp_queue *queue = data;
+	struct nvme_tcp_ctrl *ctrl = queue->ctrl;
+	int qid = nvme_tcp_queue_id(queue);
+	struct key *tls_key;
+
+	dev_dbg(ctrl->ctrl.device, "queue %d: TLS handshake done, key %x, status %d\n",
+		qid, pskid, status);
+
+	if (status) {
+		queue->tls_err = -status;
+		goto out_complete;
+	}
+
+	tls_key = key_lookup(pskid);
+	if (IS_ERR(tls_key)) {
+		dev_warn(ctrl->ctrl.device, "queue %d: Invalid key %x\n",
+			 qid, pskid);
+		queue->tls_err = -ENOKEY;
+	} else {
+		ctrl->ctrl.tls_key = tls_key;
+		queue->tls_err = 0;
+	}
+
+out_complete:
+	complete(&queue->tls_complete);
+}
+
+static int nvme_tcp_start_tls(struct nvme_ctrl *nctrl,
+			      struct nvme_tcp_queue *queue,
+			      key_serial_t pskid)
+{
+	int qid = nvme_tcp_queue_id(queue);
+	int ret;
+	struct tls_handshake_args args;
+	unsigned long tmo = tls_handshake_timeout * HZ;
+	key_serial_t keyring = nvme_keyring_id();
+
+	dev_dbg(nctrl->device, "queue %d: start TLS with key %x\n",
+		qid, pskid);
+	args.ta_sock = queue->sock;
+	args.ta_done = nvme_tcp_tls_done;
+	args.ta_data = queue;
+	args.ta_my_peerids[0] = pskid;
+	args.ta_num_peerids = 1;
+	args.ta_keyring = keyring;
+	args.ta_timeout_ms = tls_handshake_timeout * 1000;
+	queue->tls_err = -EOPNOTSUPP;
+	init_completion(&queue->tls_complete);
+	ret = tls_client_hello_psk(&args, GFP_KERNEL);
+	if (ret) {
+		dev_err(nctrl->device, "queue %d: failed to start TLS: %d\n",
+			qid, ret);
+		return ret;
+	}
+	if (wait_for_completion_timeout(&queue->tls_complete, tmo) == 0) {
+		dev_err(nctrl->device,
+			"queue %d: TLS handshake timeout\n", qid);
+		ret = -ETIMEDOUT;
+	} else {
+		dev_dbg(nctrl->device,
+			"queue %d: TLS handshake complete, error %d\n",
+			qid, queue->tls_err);
+		ret = queue->tls_err;
+	}
+	return ret;
+}
+#endif
+
+static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
+				key_serial_t pskid)
 {
 	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
 	struct nvme_tcp_queue *queue = &ctrl->queues[qid];
@@ -1616,6 +1705,14 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid)
 		goto err_rcv_pdu;
 	}
 
+#ifdef CONFIG_NVME_TCP_TLS
+	/* If PSKs are configured try to start TLS */
+	if (pskid) {
+		ret = nvme_tcp_start_tls(nctrl, queue, pskid);
+		if (ret)
+			goto err_init_connect;
+	}
+#endif
 	ret = nvme_tcp_init_connection(queue);
 	if (ret)
 		goto err_init_connect;
@@ -1759,10 +1856,22 @@ static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl,
 static int nvme_tcp_alloc_admin_queue(struct nvme_ctrl *ctrl)
 {
 	int ret;
+	key_serial_t psk_id = 0;
+
+	if (ctrl->opts->tls) {
+		psk_id = nvme_tls_psk_default(NULL,
+					      ctrl->opts->host->nqn,
+					      ctrl->opts->subsysnqn);
+		if (!psk_id) {
+			dev_err(ctrl->device, "no valid PSK found\n");
+			ret = -ENOKEY;
+			goto out_free_queue;
+		}
+	}
 
-	ret = nvme_tcp_alloc_queue(ctrl, 0);
+	ret = nvme_tcp_alloc_queue(ctrl, 0, psk_id);
 	if (ret)
-		return ret;
+		goto out_free_queue;
 
 	ret = nvme_tcp_alloc_async_req(to_tcp_ctrl(ctrl));
 	if (ret)
@@ -1778,9 +1887,15 @@ static int nvme_tcp_alloc_admin_queue(struct nvme_ctrl *ctrl)
 static int __nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
 {
 	int i, ret;
+	key_serial_t psk_id;
 
+	if (!ctrl->tls_key) {
+		dev_err(ctrl->device, "no PSK negotiated\n");
+		return -ENOKEY;
+	}
+	psk_id = key_serial(ctrl->tls_key);
 	for (i = 1; i < ctrl->queue_count; i++) {
-		ret = nvme_tcp_alloc_queue(ctrl, i);
+		ret = nvme_tcp_alloc_queue(ctrl, i, psk_id);
 		if (ret)
 			goto out_free_queues;
 	}
@@ -2689,7 +2804,7 @@ static struct nvmf_transport_ops nvme_tcp_transport = {
 			  NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
 			  NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
 			  NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_NR_POLL_QUEUES |
-			  NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE,
+			  NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE | NVMF_OPT_TLS,
 	.create_ctrl	= nvme_tcp_create_ctrl,
 };
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 12/18] nvme-tcp: control message handling for recvmsg()
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (10 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 11/18] nvme-tcp: enable TLS handshake upcall Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 15:24   ` Sagi Grimberg
  2023-04-17 13:02 ` [PATCH 13/18] nvme-fabrics: parse options 'keyring' and 'tls_key' Hannes Reinecke
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

kTLS is sending TLS ALERT messages as control messages for recvmsg().
As we can't do anything sensible with it just abort the connection
and let the userspace agent to a re-negotiation.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/tcp.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index b80bcc5fe447..eeb3f85fe259 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1352,6 +1352,11 @@ static int nvme_tcp_init_connection(struct nvme_tcp_queue *queue)
 {
 	struct nvme_tcp_icreq_pdu *icreq;
 	struct nvme_tcp_icresp_pdu *icresp;
+#ifdef CONFIG_NVME_TCP_TLS
+	char cbuf[CMSG_LEN(sizeof(char))] = {};
+	struct cmsghdr *cmsg;
+	unsigned char ctype;
+#endif
 	struct msghdr msg = {};
 	struct kvec iov;
 	bool ctrl_hdgst, ctrl_ddgst;
@@ -1389,11 +1394,27 @@ static int nvme_tcp_init_connection(struct nvme_tcp_queue *queue)
 	memset(&msg, 0, sizeof(msg));
 	iov.iov_base = icresp;
 	iov.iov_len = sizeof(*icresp);
+#ifdef CONFIG_NVME_TCP_TLS
+	msg.msg_control = cbuf;
+	msg.msg_controllen = sizeof(cbuf);
+#endif
 	ret = kernel_recvmsg(queue->sock, &msg, &iov, 1,
 			iov.iov_len, msg.msg_flags);
 	if (ret < 0)
 		goto free_icresp;
-
+#ifdef CONFIG_NVME_TCP_TLS
+	cmsg = (struct cmsghdr *)cbuf;
+	if (CMSG_OK(&msg, cmsg) &&
+	    cmsg->cmsg_level == SOL_TLS &&
+	    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
+		ctype = *((unsigned char *)CMSG_DATA(cmsg));
+		if (ctype != TLS_RECORD_TYPE_DATA) {
+			pr_err("queue %d: unhandled TLS record %d\n",
+			       nvme_tcp_queue_id(queue), ctype);
+			return -ENOTCONN;
+		}
+	}
+#endif
 	ret = -EINVAL;
 	if (icresp->hdr.type != nvme_tcp_icresp) {
 		pr_err("queue %d: bad type returned %d\n",
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 13/18] nvme-fabrics: parse options 'keyring' and 'tls_key'
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (11 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 12/18] nvme-tcp: control message handling for recvmsg() Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 14/18] nvmet: make TCP sectype settable via configfs Hannes Reinecke
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

Parse the fabrics options 'keyring' and 'tls_key' and store the
referenced keys in the options structure.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/fabrics.c | 67 ++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/fabrics.h |  6 ++++
 drivers/nvme/host/tcp.c     | 11 ++++--
 3 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index a8d6c22515ca..6c28e2f424d4 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -605,6 +605,8 @@ static const match_table_t opt_tokens = {
 	{ NVMF_OPT_NR_WRITE_QUEUES,	"nr_write_queues=%d"	},
 	{ NVMF_OPT_NR_POLL_QUEUES,	"nr_poll_queues=%d"	},
 	{ NVMF_OPT_TOS,			"tos=%d"		},
+	{ NVMF_OPT_KEYRING,		"keyring=%d"		},
+	{ NVMF_OPT_TLS_KEY,		"tls_key=%d"		},
 	{ NVMF_OPT_FAIL_FAST_TMO,	"fast_io_fail_tmo=%d"	},
 	{ NVMF_OPT_DISCOVERY,		"discovery"		},
 	{ NVMF_OPT_DHCHAP_SECRET,	"dhchap_secret=%s"	},
@@ -620,8 +622,9 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 	char *options, *o, *p;
 	int token, ret = 0;
 	size_t nqnlen  = 0;
-	int ctrl_loss_tmo = NVMF_DEF_CTRL_LOSS_TMO;
+	int ctrl_loss_tmo = NVMF_DEF_CTRL_LOSS_TMO, key_id;
 	uuid_t hostid;
+	struct key *key = NULL;
 
 	/* Set defaults */
 	opts->queue_size = NVMF_DEF_QUEUE_SIZE;
@@ -889,6 +892,66 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 			}
 			opts->tos = token;
 			break;
+		case NVMF_OPT_KEYRING:
+			if (!IS_ENABLED(CONFIG_NVME_TCP_TLS)) {
+				pr_err("TLS is not supported\n");
+				ret = -EINVAL;
+				goto out;
+			}
+			if (match_int(args, &key_id)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			if (key_id < 0) {
+				pr_err("Invalid keyring id %d\n", key_id);
+				ret = -EINVAL;
+				goto out;
+			}
+			if (!key_id) {
+				pr_debug("Using default keyring\n");
+				key_put(opts->keyring);
+				opts->keyring = NULL;
+				break;
+			}
+			key = key_lookup(key_id);
+			if (!key) {
+				pr_err("Keyring id %08x not found\n", key_id);
+				ret = -ENOKEY;
+				goto out;
+			}
+			key_put(opts->keyring);
+			opts->keyring = key;
+			break;
+		case NVMF_OPT_TLS_KEY:
+			if (!IS_ENABLED(CONFIG_NVME_TCP_TLS)) {
+				pr_err("TLS is not supported\n");
+				ret = -EINVAL;
+				goto out;
+			}
+			if (match_int(args, &key_id)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			if (key_id < 0) {
+				pr_err("Invalid key id %d\n", key_id);
+				ret = -EINVAL;
+				goto out;
+			}
+			if (!key_id) {
+				pr_debug("Using 'best' PSK\n");
+				key_put(opts->tls_key);
+				opts->tls_key = NULL;
+				break;
+			}
+			key = key_lookup(key_id);
+			if (!key) {
+				pr_err("Key id %08x not found\n", key_id);
+				ret = -ENOKEY;
+				goto out;
+			}
+			key_put(opts->tls_key);
+			opts->tls_key = key;
+			break;
 		case NVMF_OPT_DISCOVERY:
 			opts->discovery_nqn = true;
 			break;
@@ -1053,6 +1116,8 @@ static int nvmf_check_allowed_opts(struct nvmf_ctrl_options *opts,
 void nvmf_free_options(struct nvmf_ctrl_options *opts)
 {
 	nvmf_host_put(opts->host);
+	key_put(opts->keyring);
+	key_put(opts->tls_key);
 	kfree(opts->transport);
 	kfree(opts->traddr);
 	kfree(opts->trsvcid);
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 5db36e250e7a..2ff7b7168a40 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -71,6 +71,8 @@ enum {
 	NVMF_OPT_DHCHAP_SECRET	= 1 << 23,
 	NVMF_OPT_DHCHAP_CTRL_SECRET = 1 << 24,
 	NVMF_OPT_TLS		= 1 << 25,
+	NVMF_OPT_KEYRING	= 1 << 26,
+	NVMF_OPT_TLS_KEY	= 1 << 27,
 };
 
 /**
@@ -103,6 +105,8 @@ enum {
  * @dhchap_secret: DH-HMAC-CHAP secret
  * @dhchap_ctrl_secret: DH-HMAC-CHAP controller secret for bi-directional
  *              authentication
+ * @keyring:    Keyring to use for key lookups
+ * @tls_key:    TLS key for encrypted connections (TCP)
  * @tls:        Start TLS encrypted connections (TCP)
  * @disable_sqflow: disable controller sq flow control
  * @hdr_digest: generate/verify header digest (TCP)
@@ -130,6 +134,8 @@ struct nvmf_ctrl_options {
 	int			max_reconnects;
 	char			*dhchap_secret;
 	char			*dhchap_ctrl_secret;
+	struct key		*keyring;
+	struct key		*tls_key;
 	bool			tls;
 	bool			disable_sqflow;
 	bool			hdr_digest;
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index eeb3f85fe259..7714c2bdfc19 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1572,6 +1572,8 @@ static int nvme_tcp_start_tls(struct nvme_ctrl *nctrl,
 
 	dev_dbg(nctrl->device, "queue %d: start TLS with key %x\n",
 		qid, pskid);
+	if (nctrl->opts->keyring)
+		keyring = key_serial(nctrl->opts->keyring);
 	args.ta_sock = queue->sock;
 	args.ta_done = nvme_tcp_tls_done;
 	args.ta_data = queue;
@@ -1880,9 +1882,12 @@ static int nvme_tcp_alloc_admin_queue(struct nvme_ctrl *ctrl)
 	key_serial_t psk_id = 0;
 
 	if (ctrl->opts->tls) {
-		psk_id = nvme_tls_psk_default(NULL,
-					      ctrl->opts->host->nqn,
-					      ctrl->opts->subsysnqn);
+		if (ctrl->opts->tls_key)
+			psk_id = key_serial(ctrl->opts->tls_key);
+		else
+			psk_id = nvme_tls_psk_default(ctrl->opts->keyring,
+						      ctrl->opts->host->nqn,
+						      ctrl->opts->subsysnqn);
 		if (!psk_id) {
 			dev_err(ctrl->device, "no valid PSK found\n");
 			ret = -ENOKEY;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 14/18] nvmet: make TCP sectype settable via configfs
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (12 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 13/18] nvme-fabrics: parse options 'keyring' and 'tls_key' Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 13:02 ` [PATCH 15/18] nvmet-tcp: allocate socket file Hannes Reinecke
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

Add a new configfs attribute 'addr_tsas' to make the TCP sectype
settable via configfs.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/target/configfs.c | 65 ++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 907143870da5..7f826ac8b75c 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -303,6 +303,11 @@ static void nvmet_port_init_tsas_rdma(struct nvmet_port *port)
 	port->disc_addr.tsas.rdma.cms = NVMF_RDMA_CMS_RDMA_CM;
 }
 
+static void nvmet_port_init_tsas_tcp(struct nvmet_port *port, int sectype)
+{
+	port->disc_addr.tsas.tcp.sectype = sectype;
+}
+
 static ssize_t nvmet_addr_trtype_store(struct config_item *item,
 		const char *page, size_t count)
 {
@@ -325,11 +330,70 @@ static ssize_t nvmet_addr_trtype_store(struct config_item *item,
 	port->disc_addr.trtype = nvmet_transport[i].type;
 	if (port->disc_addr.trtype == NVMF_TRTYPE_RDMA)
 		nvmet_port_init_tsas_rdma(port);
+	else if (port->disc_addr.trtype == NVMF_TRTYPE_TCP)
+		nvmet_port_init_tsas_tcp(port, NVMF_TCP_SECTYPE_NONE);
 	return count;
 }
 
 CONFIGFS_ATTR(nvmet_, addr_trtype);
 
+static const struct nvmet_type_name_map nvmet_addr_tsas_tcp[] = {
+	{ NVMF_TCP_SECTYPE_NONE,	"none" },
+	{ NVMF_TCP_SECTYPE_TLS13,	"tls1.3" },
+};
+
+static const struct nvmet_type_name_map nvmet_addr_tsas_rdma[] = {
+	{ NVMF_RDMA_QPTYPE_CONNECTED,	"connected" },
+	{ NVMF_RDMA_QPTYPE_DATAGRAM,	"datagram"  },
+};
+
+static ssize_t nvmet_addr_tsas_show(struct config_item *item,
+		char *page)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	int i;
+
+	if (port->disc_addr.trtype == NVMF_TRTYPE_TCP) {
+		for (i = 0; i < ARRAY_SIZE(nvmet_addr_tsas_tcp); i++) {
+			if (port->disc_addr.tsas.tcp.sectype == nvmet_addr_tsas_tcp[i].type)
+				return sprintf(page, "%s\n", nvmet_addr_tsas_tcp[i].name);
+		}
+	} else if (port->disc_addr.trtype == NVMF_TRTYPE_RDMA) {
+		for (i = 0; i < ARRAY_SIZE(nvmet_addr_tsas_rdma); i++) {
+			if (port->disc_addr.tsas.rdma.qptype == nvmet_addr_tsas_rdma[i].type)
+				return sprintf(page, "%s\n", nvmet_addr_tsas_rdma[i].name);
+		}
+	}
+	return sprintf(page, "reserved\n");
+}
+
+static ssize_t nvmet_addr_tsas_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	int i;
+
+	if (nvmet_is_port_enabled(port, __func__))
+		return -EACCES;
+
+	if (port->disc_addr.trtype != NVMF_TRTYPE_TCP)
+		return -EINVAL;
+
+	for (i = 0; i < ARRAY_SIZE(nvmet_addr_tsas_tcp); i++) {
+		if (sysfs_streq(page, nvmet_addr_tsas_tcp[i].name))
+			goto found;
+	}
+
+	pr_err("Invalid value '%s' for tsas\n", page);
+	return -EINVAL;
+
+found:
+	nvmet_port_init_tsas_tcp(port, nvmet_addr_tsas_tcp[i].type);
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_, addr_tsas);
+
 /*
  * Namespace structures & file operation functions below
  */
@@ -1741,6 +1805,7 @@ static struct configfs_attribute *nvmet_port_attrs[] = {
 	&nvmet_attr_addr_traddr,
 	&nvmet_attr_addr_trsvcid,
 	&nvmet_attr_addr_trtype,
+	&nvmet_attr_addr_tsas,
 	&nvmet_attr_param_inline_data_size,
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 	&nvmet_attr_param_pi_enable,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 15/18] nvmet-tcp: allocate socket file
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (13 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 14/18] nvmet: make TCP sectype settable via configfs Hannes Reinecke
@ 2023-04-17 13:02 ` Hannes Reinecke
  2023-04-17 15:29   ` Sagi Grimberg
  2023-04-17 13:03 ` [PATCH 16/18] nvmet-tcp: enable TLS handshake upcall Hannes Reinecke
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

For the TLS upcall we need to allocate a socket file such
that the userspace daemon is able to use the socket.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/target/tcp.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index ed98df72c76b..fc7a9fb56a83 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -122,6 +122,7 @@ struct nvmet_tcp_cmd {
 
 enum nvmet_tcp_queue_state {
 	NVMET_TCP_Q_CONNECTING,
+	NVMET_TCP_Q_TLS_HANDSHAKE,
 	NVMET_TCP_Q_LIVE,
 	NVMET_TCP_Q_DISCONNECTING,
 };
@@ -1481,12 +1482,12 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
 	nvmet_sq_destroy(&queue->nvme_sq);
 	cancel_work_sync(&queue->io_work);
 	nvmet_tcp_free_cmd_data_in_buffers(queue);
-	sock_release(queue->sock);
+	/* ->sock will be released by fput() */
+	fput(queue->sock->file);
 	nvmet_tcp_free_cmds(queue);
 	if (queue->hdr_digest || queue->data_digest)
 		nvmet_tcp_free_crypto(queue);
 	ida_free(&nvmet_tcp_queue_ida, queue->idx);
-
 	page = virt_to_head_page(queue->pf_cache.va);
 	__page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
 	kfree(queue);
@@ -1609,15 +1610,16 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
 	return ret;
 }
 
-static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
+static void nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 		struct socket *newsock)
 {
 	struct nvmet_tcp_queue *queue;
+	struct file *sock_file;
 	int ret;
 
 	queue = kzalloc(sizeof(*queue), GFP_KERNEL);
 	if (!queue)
-		return -ENOMEM;
+		return;
 
 	INIT_WORK(&queue->release_work, nvmet_tcp_release_queue_work);
 	INIT_WORK(&queue->io_work, nvmet_tcp_io_work);
@@ -1625,15 +1627,25 @@ static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 	queue->port = port;
 	queue->nr_cmds = 0;
 	spin_lock_init(&queue->state_lock);
-	queue->state = NVMET_TCP_Q_CONNECTING;
+	if (queue->port->nport->disc_addr.tsas.tcp.sectype ==
+	    NVMF_TCP_SECTYPE_TLS13)
+		queue->state = NVMET_TCP_Q_TLS_HANDSHAKE;
+	else
+		queue->state = NVMET_TCP_Q_CONNECTING;
 	INIT_LIST_HEAD(&queue->free_list);
 	init_llist_head(&queue->resp_list);
 	INIT_LIST_HEAD(&queue->resp_send_list);
 
+	sock_file = sock_alloc_file(queue->sock, O_CLOEXEC, NULL);
+	if (IS_ERR(sock_file)) {
+		ret = PTR_ERR(sock_file);
+		goto out_free_queue;
+	}
+
 	queue->idx = ida_alloc(&nvmet_tcp_queue_ida, GFP_KERNEL);
 	if (queue->idx < 0) {
 		ret = queue->idx;
-		goto out_free_queue;
+		goto out_sock;
 	}
 
 	ret = nvmet_tcp_alloc_cmd(queue, &queue->connect);
@@ -1654,7 +1666,7 @@ static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 	if (ret)
 		goto out_destroy_sq;
 
-	return 0;
+	return;
 out_destroy_sq:
 	mutex_lock(&nvmet_tcp_queue_mutex);
 	list_del_init(&queue->queue_list);
@@ -1664,9 +1676,11 @@ static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 	nvmet_tcp_free_cmd(&queue->connect);
 out_ida_remove:
 	ida_free(&nvmet_tcp_queue_ida, queue->idx);
+out_sock:
+	fput(queue->sock->file);
 out_free_queue:
 	kfree(queue);
-	return ret;
+	pr_err("failed to allocate queue");
 }
 
 static void nvmet_tcp_accept_work(struct work_struct *w)
@@ -1683,11 +1697,7 @@ static void nvmet_tcp_accept_work(struct work_struct *w)
 				pr_warn("failed to accept err=%d\n", ret);
 			return;
 		}
-		ret = nvmet_tcp_alloc_queue(port, newsock);
-		if (ret) {
-			pr_err("failed to allocate queue\n");
-			sock_release(newsock);
-		}
+		nvmet_tcp_alloc_queue(port, newsock);
 	}
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 16/18] nvmet-tcp: enable TLS handshake upcall
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (14 preceding siblings ...)
  2023-04-17 13:02 ` [PATCH 15/18] nvmet-tcp: allocate socket file Hannes Reinecke
@ 2023-04-17 13:03 ` Hannes Reinecke
  2023-04-17 13:03 ` [PATCH 17/18] nvmet-tcp: control messages for recvmsg() Hannes Reinecke
  2023-04-17 13:03 ` [PATCH 18/18] nvmet-tcp: add configfs attribute 'param_keyring' Hannes Reinecke
  17 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:03 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

Add functions to start the TLS handshake upcall when
the TCP TSAS sectype is set to 'tls1.3' and add a config
option NVME_TARGET_TCP_TLS.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/target/Kconfig    |  14 ++++
 drivers/nvme/target/configfs.c |  63 ++++++++++++++++-
 drivers/nvme/target/nvmet.h    |   1 +
 drivers/nvme/target/tcp.c      | 126 ++++++++++++++++++++++++++++++++-
 4 files changed, 199 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/target/Kconfig b/drivers/nvme/target/Kconfig
index 79fc64035ee3..15af2b4341f6 100644
--- a/drivers/nvme/target/Kconfig
+++ b/drivers/nvme/target/Kconfig
@@ -84,6 +84,20 @@ config NVME_TARGET_TCP
 
 	  If unsure, say N.
 
+config NVME_TARGET_TCP_TLS
+	bool "NVMe over Fabrics TCP target TLS encryption support"
+	depends on NVME_TARGET_TCP
+	select NVME_COMMON
+	select NVME_KEYRING
+	select NET_HANDSHAKE
+	help
+	  Enables TLS encryption for the NVMe TCP target using the netlink handshake API.
+
+	  The TLS handshake daemon is availble at
+	  https://github.com/oracle/ktls-utils.
+
+	  If unsure, say N.
+
 config NVME_TARGET_AUTH
 	bool "NVMe over Fabrics In-band Authentication support"
 	depends on NVME_TARGET
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 7f826ac8b75c..49b407702ad5 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -15,6 +15,7 @@
 #ifdef CONFIG_NVME_TARGET_AUTH
 #include <linux/nvme-auth.h>
 #endif
+#include <linux/nvme-keyring.h>
 #include <crypto/hash.h>
 #include <crypto/kpp.h>
 
@@ -159,10 +160,15 @@ static const struct nvmet_type_name_map nvmet_addr_treq[] = {
 	{ NVMF_TREQ_NOT_REQUIRED,	"not required" },
 };
 
+static inline u8 nvmet_port_treq(struct nvmet_port *port)
+{
+	return (port->disc_addr.treq & NVME_TREQ_SECURE_CHANNEL_MASK);
+}
+
 static ssize_t nvmet_addr_treq_show(struct config_item *item, char *page)
 {
-	u8 treq = to_nvmet_port(item)->disc_addr.treq &
-		NVME_TREQ_SECURE_CHANNEL_MASK;
+	struct nvmet_port *port = to_nvmet_port(item);
+	u8 treq = nvmet_port_treq(port);
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(nvmet_addr_treq); i++) {
@@ -174,11 +180,16 @@ static ssize_t nvmet_addr_treq_show(struct config_item *item, char *page)
 	return snprintf(page, PAGE_SIZE, "\n");
 }
 
+static inline u8 nvmet_port_treq_mask(struct nvmet_port *port)
+{
+	return (port->disc_addr.treq & ~NVME_TREQ_SECURE_CHANNEL_MASK);
+}
+
 static ssize_t nvmet_addr_treq_store(struct config_item *item,
 		const char *page, size_t count)
 {
 	struct nvmet_port *port = to_nvmet_port(item);
-	u8 treq = port->disc_addr.treq & ~NVME_TREQ_SECURE_CHANNEL_MASK;
+	u8 treq = nvmet_port_treq_mask(port);
 	int i;
 
 	if (nvmet_is_port_enabled(port, __func__))
@@ -193,6 +204,23 @@ static ssize_t nvmet_addr_treq_store(struct config_item *item,
 	return -EINVAL;
 
 found:
+	if (port->disc_addr.trtype == NVMF_TRTYPE_TCP) {
+		if (!IS_ENABLED(CONFIG_NVME_TARGET_TCP_TLS)) {
+			pr_err("TLS is not supported\n");
+			return -EINVAL;
+		}
+		if (!port->keyring) {
+			pr_err("TLS keyring not configured\n");
+			return -EINVAL;
+		}
+		if (port->disc_addr.tsas.tcp.sectype != NVMF_TCP_SECTYPE_TLS13) {
+			pr_warn("cannot change TREQ when TLS is not enabled\n");
+			return -EINVAL;
+		} else if (nvmet_addr_treq[i].type == NVMF_TREQ_NOT_SPECIFIED) {
+			pr_warn("cannot set TREQ to 'not specified' when TLS is enabled\n");
+			return -EINVAL;
+		}
+	}
 	treq |= nvmet_addr_treq[i].type;
 	port->disc_addr.treq = treq;
 	return count;
@@ -371,6 +399,7 @@ static ssize_t nvmet_addr_tsas_store(struct config_item *item,
 		const char *page, size_t count)
 {
 	struct nvmet_port *port = to_nvmet_port(item);
+	u8 treq = nvmet_port_treq_mask(port);
 	int i;
 
 	if (nvmet_is_port_enabled(port, __func__))
@@ -379,6 +408,15 @@ static ssize_t nvmet_addr_tsas_store(struct config_item *item,
 	if (port->disc_addr.trtype != NVMF_TRTYPE_TCP)
 		return -EINVAL;
 
+	if (!IS_ENABLED(CONFIG_NVME_TARGET_TCP_TLS)) {
+		pr_err("TLS is not supported\n");
+		return -EINVAL;
+	}
+	if (!port->keyring) {
+		pr_err("TLS keyring not configured\n");
+		return -EINVAL;
+	}
+
 	for (i = 0; i < ARRAY_SIZE(nvmet_addr_tsas_tcp); i++) {
 		if (sysfs_streq(page, nvmet_addr_tsas_tcp[i].name))
 			goto found;
@@ -389,6 +427,16 @@ static ssize_t nvmet_addr_tsas_store(struct config_item *item,
 
 found:
 	nvmet_port_init_tsas_tcp(port, nvmet_addr_tsas_tcp[i].type);
+	if (nvmet_addr_tsas_tcp[i].type == NVMF_TCP_SECTYPE_TLS13) {
+		if (nvmet_port_treq(port) == NVMF_TREQ_NOT_SPECIFIED)
+			treq |= NVMF_TREQ_REQUIRED;
+		else
+			treq |= nvmet_port_treq(port);
+	} else {
+		/* Set to 'not specified' if TLS is not enabled */
+		treq |= NVMF_TREQ_NOT_SPECIFIED;
+	}
+	port->disc_addr.treq = treq;
 	return count;
 }
 
@@ -1795,6 +1843,7 @@ static void nvmet_port_release(struct config_item *item)
 	flush_workqueue(nvmet_wq);
 	list_del(&port->global_entry);
 
+	key_put(port->keyring);
 	kfree(port->ana_state);
 	kfree(port);
 }
@@ -1844,6 +1893,14 @@ static struct config_group *nvmet_ports_make(struct config_group *group,
 		return ERR_PTR(-ENOMEM);
 	}
 
+	if (nvme_keyring_id()) {
+		port->keyring = key_lookup(nvme_keyring_id());
+		if (IS_ERR(port->keyring)) {
+			pr_warn("NVMe keyring not available, disabling TLS\n");
+			port->keyring = NULL;
+		}
+	}
+
 	for (i = 1; i <= NVMET_MAX_ANAGRPS; i++) {
 		if (i == NVMET_DEFAULT_ANA_GRPID)
 			port->ana_state[1] = NVME_ANA_OPTIMIZED;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index dc60a22646f7..a22f817f5935 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -158,6 +158,7 @@ struct nvmet_port {
 	struct config_group		ana_groups_group;
 	struct nvmet_ana_group		ana_default_group;
 	enum nvme_ana_state		*ana_state;
+	struct key			*keyring;
 	void				*priv;
 	bool				enabled;
 	int				inline_data_size;
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index fc7a9fb56a83..ea853658430f 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -8,9 +8,12 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/err.h>
+#include <linux/key.h>
 #include <linux/nvme-tcp.h>
+#include <linux/nvme-keyring.h>
 #include <net/sock.h>
 #include <net/tcp.h>
+#include <net/handshake.h>
 #include <linux/inet.h>
 #include <linux/llist.h>
 #include <crypto/hash.h>
@@ -66,6 +69,16 @@ device_param_cb(idle_poll_period_usecs, &set_param_ops,
 MODULE_PARM_DESC(idle_poll_period_usecs,
 		"nvmet tcp io_work poll till idle time period in usecs: Default 0");
 
+#ifdef CONFIG_NVME_TARGET_TCP_TLS
+/*
+ * TLS handshake timeout
+ */
+static int tls_handshake_timeout = 30;
+module_param(tls_handshake_timeout, int, 0644);
+MODULE_PARM_DESC(tls_handshake_timeout,
+		 "nvme TLS handshake timeout in seconds (default 30)");
+#endif
+
 #define NVMET_TCP_RECV_BUDGET		8
 #define NVMET_TCP_SEND_BUDGET		8
 #define NVMET_TCP_IO_WORK_BUDGET	64
@@ -155,6 +168,8 @@ struct nvmet_tcp_queue {
 	bool			data_digest;
 	struct ahash_request	*snd_hash;
 	struct ahash_request	*rcv_hash;
+	struct key		*tls_psk;
+	struct delayed_work	tls_handshake_work;
 
 	unsigned long           poll_end;
 
@@ -1487,6 +1502,7 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
 	nvmet_tcp_free_cmds(queue);
 	if (queue->hdr_digest || queue->data_digest)
 		nvmet_tcp_free_crypto(queue);
+	key_put(queue->tls_psk);
 	ida_free(&nvmet_tcp_queue_ida, queue->idx);
 	page = virt_to_head_page(queue->pf_cache.va);
 	__page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
@@ -1501,8 +1517,12 @@ static void nvmet_tcp_data_ready(struct sock *sk)
 
 	read_lock_bh(&sk->sk_callback_lock);
 	queue = sk->sk_user_data;
-	if (likely(queue))
-		queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
+	if (queue->data_ready)
+		queue->data_ready(sk);
+	if (likely(queue) &&
+	    queue->state != NVMET_TCP_Q_TLS_HANDSHAKE)
+		queue_work_on(queue_cpu(queue), nvmet_tcp_wq,
+			      &queue->io_work);
 	read_unlock_bh(&sk->sk_callback_lock);
 }
 
@@ -1610,6 +1630,89 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
 	return ret;
 }
 
+#ifdef CONFIG_NVME_TARGET_TCP_TLS
+static void nvmet_tcp_tls_queue_reset(struct nvmet_tcp_queue *queue)
+{
+	spin_lock(&queue->state_lock);
+	if (queue->state != NVMET_TCP_Q_TLS_HANDSHAKE) {
+		pr_warn("queue %d: TLS handshake already completed\n",
+			queue->idx);
+		spin_unlock(&queue->state_lock);
+		return;
+	}
+	queue->state = NVMET_TCP_Q_CONNECTING;
+	spin_unlock(&queue->state_lock);
+
+	pr_debug("queue %d: resetting queue callbacks after TLS handshake\n",
+		 queue->idx);
+	/*
+	 * Set callbacks after handshake; TLS implementation
+	 * might have changed the socket callbacks.
+	 */
+	nvmet_tcp_set_queue_sock(queue);
+}
+
+static void nvmet_tcp_tls_handshake_done(void *data, int status,
+					 key_serial_t peerid)
+{
+	struct nvmet_tcp_queue *queue = data;
+
+	pr_debug("queue %d: TLS handshake done, key %x, status %d\n",
+		 queue->idx, peerid, status);
+	if (!status) {
+		spin_lock(&queue->state_lock);
+		queue->tls_psk = key_lookup(peerid);
+		if (IS_ERR(queue->tls_psk)) {
+			pr_warn("queue %d: TLS key %x not found\n",
+				queue->idx, peerid);
+			queue->tls_psk = NULL;
+		}
+		spin_unlock(&queue->state_lock);
+	}
+	cancel_delayed_work_sync(&queue->tls_handshake_work);
+	if (status)
+		nvmet_tcp_schedule_release_queue(queue);
+	else
+		nvmet_tcp_tls_queue_reset(queue);
+}
+
+static void nvmet_tcp_tls_handshake_timeout_work(struct work_struct *w)
+{
+	struct nvmet_tcp_queue *queue = container_of(to_delayed_work(w),
+			struct nvmet_tcp_queue, tls_handshake_work);
+
+	pr_debug("queue %d: TLS handshake timeout\n", queue->idx);
+	nvmet_tcp_schedule_release_queue(queue);
+}
+
+static int nvmet_tcp_tls_handshake(struct nvmet_tcp_queue *queue)
+{
+	int ret = -EOPNOTSUPP;
+	struct tls_handshake_args args;
+
+	if (queue->state != NVMET_TCP_Q_TLS_HANDSHAKE) {
+		pr_warn("cannot start TLS in state %d\n", queue->state);
+		return -EINVAL;
+	}
+
+	pr_debug("queue %d: TLS ServerHello\n", queue->idx);
+	args.ta_sock = queue->sock;
+	args.ta_done = nvmet_tcp_tls_handshake_done;
+	args.ta_data = queue;
+	args.ta_keyring = key_serial(queue->port->nport->keyring);
+	args.ta_timeout_ms = tls_handshake_timeout * 2 * 1024;
+
+	ret = tls_server_hello_psk(&args, GFP_KERNEL);
+	if (ret) {
+		pr_err("failed to start TLS, err=%d\n", ret);
+	} else {
+		queue_delayed_work(nvmet_wq, &queue->tls_handshake_work,
+				   tls_handshake_timeout * HZ);
+	}
+	return ret;
+}
+#endif
+
 static void nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 		struct socket *newsock)
 {
@@ -1662,6 +1765,25 @@ static void nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 	list_add_tail(&queue->queue_list, &nvmet_tcp_queue_list);
 	mutex_unlock(&nvmet_tcp_queue_mutex);
 
+#ifdef CONFIG_NVME_TARGET_TCP_TLS
+	INIT_DELAYED_WORK(&queue->tls_handshake_work,
+			  nvmet_tcp_tls_handshake_timeout_work);
+	if (queue->state == NVMET_TCP_Q_TLS_HANDSHAKE) {
+		struct sock *sk = queue->sock->sk;
+
+		/* Restore the default callbacks before starting upcall */
+		read_lock_bh(&sk->sk_callback_lock);
+		sk->sk_user_data = NULL;
+		sk->sk_data_ready = port->data_ready;
+		read_unlock_bh(&sk->sk_callback_lock);
+		if (!nvmet_tcp_tls_handshake(queue))
+			return;
+
+		/* TLS handshake failed, terminate the connection */
+		goto out_destroy_sq;
+	}
+#endif
+
 	ret = nvmet_tcp_set_queue_sock(queue);
 	if (ret)
 		goto out_destroy_sq;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 17/18] nvmet-tcp: control messages for recvmsg()
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (15 preceding siblings ...)
  2023-04-17 13:03 ` [PATCH 16/18] nvmet-tcp: enable TLS handshake upcall Hannes Reinecke
@ 2023-04-17 13:03 ` Hannes Reinecke
  2023-04-17 13:03 ` [PATCH 18/18] nvmet-tcp: add configfs attribute 'param_keyring' Hannes Reinecke
  17 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:03 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

kTLS requires control messages for recvmsg() to relay any out-of-band
TLS messages (eg TLS alerts) to the caller.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/target/tcp.c | 51 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 49 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index ea853658430f..2974abd30b45 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -13,6 +13,7 @@
 #include <linux/nvme-keyring.h>
 #include <net/sock.h>
 #include <net/tcp.h>
+#include <net/tls.h>
 #include <net/handshake.h>
 #include <linux/inet.h>
 #include <linux/llist.h>
@@ -117,6 +118,7 @@ struct nvmet_tcp_cmd {
 	u32				pdu_len;
 	u32				pdu_recv;
 	int				sg_idx;
+	char				recv_cbuf[CMSG_LEN(sizeof(char))];
 	struct msghdr			recv_msg;
 	struct bio_vec			*iov;
 	u32				flags;
@@ -1103,12 +1105,39 @@ static inline bool nvmet_tcp_pdu_valid(u8 type)
 	return false;
 }
 
+static bool nvmet_tcp_tls_record_ok(struct msghdr *msg, char *cbuf)
+{
+	struct cmsghdr *cmsg = (struct cmsghdr *)cbuf;
+	unsigned char ctype;
+
+	if (!IS_ENABLED(CONFIG_NVME_TARGET_TCP_TLS))
+		return 0;
+
+	if (CMSG_OK(msg, cmsg) &&
+	    cmsg->cmsg_level == SOL_TLS &&
+	    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
+		ctype = *((unsigned char *)CMSG_DATA(cmsg));
+		if (ctype != TLS_RECORD_TYPE_DATA) {
+			pr_err("unhandled TLS record %d\n", ctype);
+			return false;
+		}
+	}
+	return true;
+}
+
 static int nvmet_tcp_try_recv_pdu(struct nvmet_tcp_queue *queue)
 {
 	struct nvme_tcp_hdr *hdr = &queue->pdu.cmd.hdr;
 	int len;
 	struct kvec iov;
-	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
+	char cbuf[CMSG_LEN(sizeof(char))] = {};
+	struct msghdr msg = {
+#ifdef CONFIG_NVME_TARGET_TCP_TLS
+		.msg_control = cbuf,
+		.msg_controllen = sizeof(cbuf),
+#endif
+		.msg_flags = MSG_DONTWAIT
+	};
 
 recv:
 	iov.iov_base = (void *)&queue->pdu + queue->offset;
@@ -1117,6 +1146,8 @@ static int nvmet_tcp_try_recv_pdu(struct nvmet_tcp_queue *queue)
 			iov.iov_len, msg.msg_flags);
 	if (unlikely(len < 0))
 		return len;
+	if (!nvmet_tcp_tls_record_ok(&msg, cbuf))
+		return -ENOTCONN;
 
 	queue->offset += len;
 	queue->left -= len;
@@ -1176,6 +1207,9 @@ static int nvmet_tcp_try_recv_data(struct nvmet_tcp_queue *queue)
 			cmd->recv_msg.msg_flags);
 		if (ret <= 0)
 			return ret;
+		if (!nvmet_tcp_tls_record_ok(&cmd->recv_msg,
+					     cmd->recv_cbuf))
+			return -ENOTCONN;
 
 		cmd->pdu_recv += ret;
 		cmd->rbytes_done += ret;
@@ -1197,7 +1231,14 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
 {
 	struct nvmet_tcp_cmd *cmd = queue->cmd;
 	int ret;
-	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
+	char cbuf[CMSG_LEN(sizeof(char))] = {};
+	struct msghdr msg = {
+#ifdef CONFIG_NVME_TARGET_TCP_TLS
+		.msg_control = cbuf,
+		.msg_controllen = sizeof(cbuf),
+#endif
+		.msg_flags = MSG_DONTWAIT
+	};
 	struct kvec iov = {
 		.iov_base = (void *)&cmd->recv_ddgst + queue->offset,
 		.iov_len = queue->left
@@ -1207,6 +1248,8 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
 			iov.iov_len, msg.msg_flags);
 	if (unlikely(ret < 0))
 		return ret;
+	if (!nvmet_tcp_tls_record_ok(&msg, cbuf))
+		return -ENOTCONN;
 
 	queue->offset += ret;
 	queue->left -= ret;
@@ -1376,6 +1419,10 @@ static int nvmet_tcp_alloc_cmd(struct nvmet_tcp_queue *queue,
 	if (!c->r2t_pdu)
 		goto out_free_data;
 
+	if (IS_ENABLED(CONFIG_NVME_TARGET_TCP_TLS)) {
+		c->recv_msg.msg_control = c->recv_cbuf;
+		c->recv_msg.msg_controllen = sizeof(c->recv_cbuf);
+	}
 	c->recv_msg.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
 
 	list_add_tail(&c->entry, &queue->free_list);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 18/18] nvmet-tcp: add configfs attribute 'param_keyring'
  2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
                   ` (16 preceding siblings ...)
  2023-04-17 13:03 ` [PATCH 17/18] nvmet-tcp: control messages for recvmsg() Hannes Reinecke
@ 2023-04-17 13:03 ` Hannes Reinecke
  17 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 13:03 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

Add a configfs attribute to list and change the default keyring.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/target/configfs.c | 45 ++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 49b407702ad5..493f5524c922 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -309,6 +309,47 @@ static ssize_t nvmet_param_pi_enable_store(struct config_item *item,
 CONFIGFS_ATTR(nvmet_, param_pi_enable);
 #endif
 
+#ifdef CONFIG_NVME_TARGET_TCP_TLS
+static ssize_t nvmet_param_keyring_show(struct config_item *item,
+		char *page)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+
+	if (!port->keyring)
+		return sprintf(page, "\n");
+	return snprintf(page, PAGE_SIZE, "%s\n",
+			port->keyring->description);
+}
+
+static ssize_t nvmet_param_keyring_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	struct key *keyring;
+	unsigned int keyring_id;
+	int ret;
+
+	if (nvmet_is_port_enabled(port, __func__))
+		return -EACCES;
+
+	ret = kstrtou32(page, 0, &keyring_id);
+	if (ret) {
+		pr_err("Invalid keyring id '%s'\n", page);
+		return ret;
+	}
+	keyring = key_lookup(keyring_id);
+	if (IS_ERR(keyring)) {
+		pr_err("Invalid keyring '%08x'\n", keyring_id);
+		return PTR_ERR(keyring);
+	}
+	key_put(port->keyring);
+	port->keyring = keyring;
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_, param_keyring);
+#endif
+
 static ssize_t nvmet_addr_trtype_show(struct config_item *item,
 		char *page)
 {
@@ -412,6 +453,7 @@ static ssize_t nvmet_addr_tsas_store(struct config_item *item,
 		pr_err("TLS is not supported\n");
 		return -EINVAL;
 	}
+
 	if (!port->keyring) {
 		pr_err("TLS keyring not configured\n");
 		return -EINVAL;
@@ -1858,6 +1900,9 @@ static struct configfs_attribute *nvmet_port_attrs[] = {
 	&nvmet_attr_param_inline_data_size,
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 	&nvmet_attr_param_pi_enable,
+#endif
+#ifdef CONFIG_NVME_TARGET_TCP_TLS
+	&nvmet_attr_param_keyring,
 #endif
 	NULL,
 };
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST
  2023-04-17 13:02 ` [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST Hannes Reinecke
@ 2023-04-17 15:10   ` Sagi Grimberg
  2023-04-17 15:25     ` Hannes Reinecke
  0 siblings, 1 reply; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-17 15:10 UTC (permalink / raw)
  To: Hannes Reinecke, Jakub Kicinski
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake

nak from me.

> MSG_SENDPAGE_NOTLAST was introduced with commit 35f9c09fe9c7
> ("tcp: tcp_sendpages() should call tcp_push() once") to fix issues
> with the ->sendfile() call, and the implication seems to be that
> the flag should be strictly internal to sendfile.

no, the use case was to allow sendpage consumers to build larger
tso packets for large payloads (splice being the primary one), nvme-tcp
also benefits from it (as do others), just like sendfile.

> Hence we shouldn't be setting it on nvme-tcp, and anyway doing
> so will cause TLS sending invalid data.

That is something else. I don't recall the full details on the
discussion with Jakub, but it was not my intention to strike
it altogether, just remove it if !sendpage_ok.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/18] net/tls: implement ->read_sock()
  2023-04-17 13:02 ` [PATCH 06/18] net/tls: implement ->read_sock() Hannes Reinecke
@ 2023-04-17 15:14   ` Sagi Grimberg
  0 siblings, 0 replies; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-17 15:14 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Boris Pismenny

> Implement ->read_sock() function for use with nvme-tcp.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Cc: Boris Pismenny <boris.pismenny@gmail.com>

This needs a review from Jakub/Boris/John...

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 07/18] net/tls: sanitize MSG_EOR handling
  2023-04-17 13:02 ` [PATCH 07/18] net/tls: sanitize MSG_EOR handling Hannes Reinecke
@ 2023-04-17 15:19   ` Sagi Grimberg
  2023-04-17 20:10     ` Jakub Kicinski
  0 siblings, 1 reply; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-17 15:19 UTC (permalink / raw)
  To: Hannes Reinecke, Jakub Kicinski
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake


> The TLS stack is using MSG_EOR internally, so the flag cannot be
> set for sendmsg()/sendpage(). But to avoid having the caller to
> check whether TLS is active modify the code to clear the MSG_EOR
> flag. And blank out MSG_MORE / MSG_SENDPAGE_NOTLAST, too, as they
> conflict with MSG_EOR anyway.

This looks like a temporary workaround to me.

The networking folks really need to be CC'd on this (same for patch 6).

CC'ing Jakub for starters.

> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>   net/tls/tls_device.c | 10 ++++++++++
>   net/tls/tls_sw.c     | 21 +++++++++++++++++++++
>   2 files changed, 31 insertions(+)
> 
> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index a7cc4f9faac2..38e44e216865 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -576,6 +576,10 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>   	mutex_lock(&tls_ctx->tx_lock);
>   	lock_sock(sk);
>   
> +	/* MSG_EOR conflicts with MSG_MORE, so clear both */
> +	if (msg->msg_flags & MSG_EOR)
> +		msg->msg_flags &= ~(MSG_EOR | MSG_MORE);
> +
>   	if (unlikely(msg->msg_controllen)) {
>   		rc = tls_process_cmsg(sk, msg, &record_type);
>   		if (rc)
> @@ -604,6 +608,12 @@ int tls_device_sendpage(struct sock *sk, struct page *page,
>   	if (flags & MSG_SENDPAGE_NOTLAST)
>   		flags |= MSG_MORE;
>   
> +	/*
> +	 * MSG_EOR conflicts with MSG_MORE/MSG_SENDPAGE_NOTLAST,
> +	 * so clear all of them */
> +	if (flags & MSG_EOR)
> +		flags &= ~(MSG_EOR | MSG_SENDPAGE_NOTLAST | MSG_MORE);
> +
>   	mutex_lock(&tls_ctx->tx_lock);
>   	lock_sock(sk);
>   
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 827292e29f99..d0e6b7a04176 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -952,6 +952,11 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>   	int ret = 0;
>   	int pending;
>   
> +	if (msg->msg_flags & MSG_EOR) {
> +		eor = true;
> +		/* MSG_EOR conflicts with MSG_MORE, so clear both */
> +		msg->msg_flags &= ~(MSG_EOR | MSG_MORE);
> +	}
>   	if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
>   			       MSG_CMSG_COMPAT))
>   		return -EOPNOTSUPP;
> @@ -1274,6 +1279,14 @@ static int tls_sw_do_sendpage(struct sock *sk, struct page *page,
>   int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
>   			   int offset, size_t size, int flags)
>   {
> +	/*
> +	 * MSG_EOR is invalid for TLS, and conflicts
> +	 * with MSG_MORE / MSG_SENDPAGE_NOTLAST.
> +	 * So clear all of them.
> +	 */
> +	if (flags & MSG_EOR)
> +		flags &= ~(MSG_MORE | MSG_SENDPAGE_NOTLAST | MSG_EOR);
> +
>   	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
>   		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY |
>   		      MSG_NO_SHARED_FRAGS))
> @@ -1288,6 +1301,14 @@ int tls_sw_sendpage(struct sock *sk, struct page *page,
>   	struct tls_context *tls_ctx = tls_get_ctx(sk);
>   	int ret;
>   
> +	/*
> +	 * MSG_EOR is invalid for TLS, and conflicts
> +	 * with MSG_MORE / MSG_SENDPAGE_NOTLAST.
> +	 * So clear all of them.
> +	 */
> +	if (flags & MSG_EOR)
> +		flags &= ~(MSG_MORE | MSG_SENDPAGE_NOTLAST | MSG_EOR);
> +
>   	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
>   		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
>   		return -EOPNOTSUPP;

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 12/18] nvme-tcp: control message handling for recvmsg()
  2023-04-17 13:02 ` [PATCH 12/18] nvme-tcp: control message handling for recvmsg() Hannes Reinecke
@ 2023-04-17 15:24   ` Sagi Grimberg
  2023-04-17 15:26     ` Hannes Reinecke
  0 siblings, 1 reply; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-17 15:24 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake



On 4/17/23 16:02, Hannes Reinecke wrote:
> kTLS is sending TLS ALERT messages as control messages for recvmsg().
> As we can't do anything sensible with it just abort the connection
> and let the userspace agent to a re-negotiation.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>   drivers/nvme/host/tcp.c | 23 ++++++++++++++++++++++-
>   1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index b80bcc5fe447..eeb3f85fe259 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1352,6 +1352,11 @@ static int nvme_tcp_init_connection(struct nvme_tcp_queue *queue)
>   {
>   	struct nvme_tcp_icreq_pdu *icreq;
>   	struct nvme_tcp_icresp_pdu *icresp;
> +#ifdef CONFIG_NVME_TCP_TLS

I don't think you need the ifdef(s) here, right?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST
  2023-04-17 15:10   ` Sagi Grimberg
@ 2023-04-17 15:25     ` Hannes Reinecke
  2023-04-17 15:28       ` Sagi Grimberg
  0 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 15:25 UTC (permalink / raw)
  To: Sagi Grimberg, Jakub Kicinski
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake

On 4/17/23 17:10, Sagi Grimberg wrote:
> nak from me.
> 
>> MSG_SENDPAGE_NOTLAST was introduced with commit 35f9c09fe9c7
>> ("tcp: tcp_sendpages() should call tcp_push() once") to fix issues
>> with the ->sendfile() call, and the implication seems to be that
>> the flag should be strictly internal to sendfile.
> 
> no, the use case was to allow sendpage consumers to build larger
> tso packets for large payloads (splice being the primary one), nvme-tcp
> also benefits from it (as do others), just like sendfile.
> 
>> Hence we shouldn't be setting it on nvme-tcp, and anyway doing
>> so will cause TLS sending invalid data.
> 
> That is something else. I don't recall the full details on the
> discussion with Jakub, but it was not my intention to strike
> it altogether, just remove it if !sendpage_ok.
> 
But then we're running afoul with tls_sw_do_sendpage(), which uses 
MSG_SENDPAGE_NOTLAST as a record indicator. Forwarding 
MSG_SENDPAGE_NOTLAST to tls results in incorrect TLS records on the wire 
and a transmission stall.

Took me days to figure that out.

If you have a better idea, I'm all ears.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/18] nvme-tcp: enable TLS handshake upcall
  2023-04-17 13:02 ` [PATCH 11/18] nvme-tcp: enable TLS handshake upcall Hannes Reinecke
@ 2023-04-17 15:26   ` Sagi Grimberg
  2023-04-17 15:28     ` Hannes Reinecke
  2023-04-18  5:52   ` Daniel Wagner
  1 sibling, 1 reply; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-17 15:26 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake



On 4/17/23 16:02, Hannes Reinecke wrote:
> Add a fabrics option 'tls' and start the TLS handshake upcall
> with the default PSK. When TLS is started the PSK key serial
> number is displayed in the sysfs attribute 'tls_key'
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>   drivers/nvme/host/Kconfig   |  14 ++++
>   drivers/nvme/host/core.c    |  23 ++++++-
>   drivers/nvme/host/fabrics.c |  10 +++
>   drivers/nvme/host/fabrics.h |   3 +
>   drivers/nvme/host/nvme.h    |   1 +
>   drivers/nvme/host/tcp.c     | 127 ++++++++++++++++++++++++++++++++++--
>   6 files changed, 170 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
> index 2f6a7f8c94e8..96a74041bb0a 100644
> --- a/drivers/nvme/host/Kconfig
> +++ b/drivers/nvme/host/Kconfig
> @@ -92,6 +92,20 @@ config NVME_TCP
>   
>   	  If unsure, say N.
>   
> +config NVME_TCP_TLS
> +	bool "NVMe over Fabrics TCP TLS encryption support"
> +	depends on NVME_TCP
> +	select NVME_COMMON
> +	select NVME_KEYRING
> +	select NET_HANDSHAKE
> +	help
> +	  Enables TLS encryption for NVMe TCP using the netlink handshake API.
> +
> +	  The TLS handshake daemon is availble at
> +	  https://github.com/oracle/ktls-utils.
> +
> +	  If unsure, say N.
> +
>   config NVME_AUTH
>   	bool "NVM Express over Fabrics In-Band Authentication"
>   	depends on NVME_CORE
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 79aa215aec76..937013127c91 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3889,6 +3889,19 @@ static DEVICE_ATTR(dhchap_ctrl_secret, S_IRUGO | S_IWUSR,
>   	nvme_ctrl_dhchap_ctrl_secret_show, nvme_ctrl_dhchap_ctrl_secret_store);
>   #endif
>   
> +#ifdef CONFIG_NVME_TCP_TLS
> +static ssize_t tls_key_show(struct device *dev,
> +			    struct device_attribute *attr, char *buf)
> +{
> +	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
> +
> +	if (!ctrl->tls_key)
> +		return 0;
> +	return sysfs_emit(buf, "%08x", key_serial(ctrl->tls_key));
> +}
> +static DEVICE_ATTR_RO(tls_key);
> +#endif
> +
>   static struct attribute *nvme_dev_attrs[] = {
>   	&dev_attr_reset_controller.attr,
>   	&dev_attr_rescan_controller.attr,
> @@ -3915,6 +3928,9 @@ static struct attribute *nvme_dev_attrs[] = {
>   #ifdef CONFIG_NVME_AUTH
>   	&dev_attr_dhchap_secret.attr,
>   	&dev_attr_dhchap_ctrl_secret.attr,
> +#endif
> +#ifdef CONFIG_NVME_TCP_TLS
> +	&dev_attr_tls_key.attr,
>   #endif
>   	NULL
>   };
> @@ -3945,7 +3961,10 @@ static umode_t nvme_dev_attrs_are_visible(struct kobject *kobj,
>   	if (a == &dev_attr_dhchap_ctrl_secret.attr && !ctrl->opts)
>   		return 0;
>   #endif
> -
> +#ifdef CONFIG_NVME_TCP_TLS
> +	if (a == &dev_attr_tls_key.attr && !ctrl->opts)
> +		return 0;
> +#endif
>   	return a->mode;
>   }
>   
> @@ -5080,7 +5099,7 @@ static void nvme_free_ctrl(struct device *dev)
>   
>   	if (!subsys || ctrl->instance != subsys->instance)
>   		ida_free(&nvme_instance_ida, ctrl->instance);
> -
> +	key_put(ctrl->tls_key);
>   	nvme_free_cels(ctrl);
>   	nvme_mpath_uninit(ctrl);
>   	nvme_auth_stop(ctrl);
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index bbaa04a0c502..a8d6c22515ca 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -609,6 +609,7 @@ static const match_table_t opt_tokens = {
>   	{ NVMF_OPT_DISCOVERY,		"discovery"		},
>   	{ NVMF_OPT_DHCHAP_SECRET,	"dhchap_secret=%s"	},
>   	{ NVMF_OPT_DHCHAP_CTRL_SECRET,	"dhchap_ctrl_secret=%s"	},
> +	{ NVMF_OPT_TLS,			"tls"			},
>   	{ NVMF_OPT_ERR,			NULL			}
>   };
>   
> @@ -632,6 +633,7 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
>   	opts->hdr_digest = false;
>   	opts->data_digest = false;
>   	opts->tos = -1; /* < 0 == use transport default */
> +	opts->tls = false;
>   
>   	options = o = kstrdup(buf, GFP_KERNEL);
>   	if (!options)
> @@ -918,6 +920,14 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
>   			kfree(opts->dhchap_ctrl_secret);
>   			opts->dhchap_ctrl_secret = p;
>   			break;
> +		case NVMF_OPT_TLS:
> +			if (!IS_ENABLED(CONFIG_NVME_TCP_TLS)) {
> +				pr_err("TLS is not supported\n");
> +				ret = -EINVAL;
> +				goto out;
> +			}
> +			opts->tls = true;
> +			break;
>   		default:
>   			pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
>   				p);
> diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
> index dcac3df8a5f7..5db36e250e7a 100644
> --- a/drivers/nvme/host/fabrics.h
> +++ b/drivers/nvme/host/fabrics.h
> @@ -70,6 +70,7 @@ enum {
>   	NVMF_OPT_DISCOVERY	= 1 << 22,
>   	NVMF_OPT_DHCHAP_SECRET	= 1 << 23,
>   	NVMF_OPT_DHCHAP_CTRL_SECRET = 1 << 24,
> +	NVMF_OPT_TLS		= 1 << 25,
>   };
>   
>   /**
> @@ -102,6 +103,7 @@ enum {
>    * @dhchap_secret: DH-HMAC-CHAP secret
>    * @dhchap_ctrl_secret: DH-HMAC-CHAP controller secret for bi-directional
>    *              authentication
> + * @tls:        Start TLS encrypted connections (TCP)
>    * @disable_sqflow: disable controller sq flow control
>    * @hdr_digest: generate/verify header digest (TCP)
>    * @data_digest: generate/verify data digest (TCP)
> @@ -128,6 +130,7 @@ struct nvmf_ctrl_options {
>   	int			max_reconnects;
>   	char			*dhchap_secret;
>   	char			*dhchap_ctrl_secret;
> +	bool			tls;
>   	bool			disable_sqflow;
>   	bool			hdr_digest;
>   	bool			data_digest;
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index bf46f122e9e1..fbcc0ccf2ef2 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -347,6 +347,7 @@ struct nvme_ctrl {
>   	struct nvme_dhchap_key *ctrl_key;
>   	u16 transaction;
>   #endif
> +	struct key *tls_key;
>   
>   	/* Power saving configuration */
>   	u64 ps_max_latency_us;
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 0a39743da4e4..b80bcc5fe447 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -8,9 +8,13 @@
>   #include <linux/init.h>
>   #include <linux/slab.h>
>   #include <linux/err.h>
> +#include <linux/key.h>
>   #include <linux/nvme-tcp.h>
> +#include <linux/nvme-keyring.h>
>   #include <net/sock.h>
>   #include <net/tcp.h>
> +#include <net/tls.h>
> +#include <net/handshake.h>
>   #include <linux/blk-mq.h>
>   #include <crypto/hash.h>
>   #include <net/busy_poll.h>
> @@ -31,6 +35,16 @@ static int so_priority;
>   module_param(so_priority, int, 0644);
>   MODULE_PARM_DESC(so_priority, "nvme tcp socket optimize priority");
>   
> +#ifdef CONFIG_NVME_TCP_TLS
> +/*
> + * TLS handshake timeout
> + */
> +static int tls_handshake_timeout = 10;
> +module_param(tls_handshake_timeout, int, 0644);
> +MODULE_PARM_DESC(tls_handshake_timeout,
> +		 "nvme TLS handshake timeout in seconds (default 10)");
> +#endif
> +
>   #ifdef CONFIG_DEBUG_LOCK_ALLOC
>   /* lockdep can detect a circular dependency of the form
>    *   sk_lock -> mmap_lock (page fault) -> fs locks -> sk_lock
> @@ -146,7 +160,10 @@ struct nvme_tcp_queue {
>   	struct ahash_request	*snd_hash;
>   	__le32			exp_ddgst;
>   	__le32			recv_ddgst;
> -
> +#ifdef CONFIG_NVME_TCP_TLS
> +	struct completion       tls_complete;
> +	int                     tls_err;
> +#endif
>   	struct page_frag_cache	pf_cache;
>   
>   	void (*state_change)(struct sock *);
> @@ -1492,7 +1509,79 @@ static void nvme_tcp_set_queue_io_cpu(struct nvme_tcp_queue *queue)
>   	queue->io_cpu = cpumask_next_wrap(n - 1, cpu_online_mask, -1, false);
>   }
>   
> -static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid)
> +#ifdef CONFIG_NVME_TCP_TLS
> +static void nvme_tcp_tls_done(void *data, int status, key_serial_t pskid)
> +{
> +	struct nvme_tcp_queue *queue = data;
> +	struct nvme_tcp_ctrl *ctrl = queue->ctrl;
> +	int qid = nvme_tcp_queue_id(queue);
> +	struct key *tls_key;
> +
> +	dev_dbg(ctrl->ctrl.device, "queue %d: TLS handshake done, key %x, status %d\n",
> +		qid, pskid, status);
> +
> +	if (status) {
> +		queue->tls_err = -status;
> +		goto out_complete;
> +	}
> +
> +	tls_key = key_lookup(pskid);
> +	if (IS_ERR(tls_key)) {
> +		dev_warn(ctrl->ctrl.device, "queue %d: Invalid key %x\n",
> +			 qid, pskid);
> +		queue->tls_err = -ENOKEY;
> +	} else {
> +		ctrl->ctrl.tls_key = tls_key;
> +		queue->tls_err = 0;
> +	}
> +
> +out_complete:
> +	complete(&queue->tls_complete);
> +}
> +
> +static int nvme_tcp_start_tls(struct nvme_ctrl *nctrl,
> +			      struct nvme_tcp_queue *queue,
> +			      key_serial_t pskid)
> +{
> +	int qid = nvme_tcp_queue_id(queue);
> +	int ret;
> +	struct tls_handshake_args args;
> +	unsigned long tmo = tls_handshake_timeout * HZ;
> +	key_serial_t keyring = nvme_keyring_id();
> +
> +	dev_dbg(nctrl->device, "queue %d: start TLS with key %x\n",
> +		qid, pskid);
> +	args.ta_sock = queue->sock;
> +	args.ta_done = nvme_tcp_tls_done;
> +	args.ta_data = queue;
> +	args.ta_my_peerids[0] = pskid;
> +	args.ta_num_peerids = 1;
> +	args.ta_keyring = keyring;
> +	args.ta_timeout_ms = tls_handshake_timeout * 1000;
> +	queue->tls_err = -EOPNOTSUPP;
> +	init_completion(&queue->tls_complete);
> +	ret = tls_client_hello_psk(&args, GFP_KERNEL);
> +	if (ret) {
> +		dev_err(nctrl->device, "queue %d: failed to start TLS: %d\n",
> +			qid, ret);
> +		return ret;
> +	}
> +	if (wait_for_completion_timeout(&queue->tls_complete, tmo) == 0) {
> +		dev_err(nctrl->device,
> +			"queue %d: TLS handshake timeout\n", qid);
> +		ret = -ETIMEDOUT;
> +	} else {
> +		dev_dbg(nctrl->device,
> +			"queue %d: TLS handshake complete, error %d\n",
> +			qid, queue->tls_err);
> +		ret = queue->tls_err;
> +	}
> +	return ret;
> +}
> +#endif
> +
> +static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
> +				key_serial_t pskid)
>   {
>   	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
>   	struct nvme_tcp_queue *queue = &ctrl->queues[qid];
> @@ -1616,6 +1705,14 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid)
>   		goto err_rcv_pdu;
>   	}
>   
> +#ifdef CONFIG_NVME_TCP_TLS
> +	/* If PSKs are configured try to start TLS */
> +	if (pskid) {
> +		ret = nvme_tcp_start_tls(nctrl, queue, pskid);
> +		if (ret)
> +			goto err_init_connect;
> +	}
> +#endif
>   	ret = nvme_tcp_init_connection(queue);
>   	if (ret)
>   		goto err_init_connect;
> @@ -1759,10 +1856,22 @@ static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl,
>   static int nvme_tcp_alloc_admin_queue(struct nvme_ctrl *ctrl)
>   {
>   	int ret;
> +	key_serial_t psk_id = 0;
> +
> +	if (ctrl->opts->tls) {
> +		psk_id = nvme_tls_psk_default(NULL,
> +					      ctrl->opts->host->nqn,
> +					      ctrl->opts->subsysnqn);
> +		if (!psk_id) {
> +			dev_err(ctrl->device, "no valid PSK found\n");
> +			ret = -ENOKEY;
> +			goto out_free_queue;
> +		}
> +	}
>   
> -	ret = nvme_tcp_alloc_queue(ctrl, 0);
> +	ret = nvme_tcp_alloc_queue(ctrl, 0, psk_id);
>   	if (ret)
> -		return ret;
> +		goto out_free_queue;
>   
>   	ret = nvme_tcp_alloc_async_req(to_tcp_ctrl(ctrl));
>   	if (ret)
> @@ -1778,9 +1887,15 @@ static int nvme_tcp_alloc_admin_queue(struct nvme_ctrl *ctrl)
>   static int __nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
>   {
>   	int i, ret;
> +	key_serial_t psk_id;
>   
> +	if (!ctrl->tls_key) {
> +		dev_err(ctrl->device, "no PSK negotiated\n");
> +		return -ENOKEY;
> +	}

is ctrl->tls_key always set? Looks like this is the expectation.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 12/18] nvme-tcp: control message handling for recvmsg()
  2023-04-17 15:24   ` Sagi Grimberg
@ 2023-04-17 15:26     ` Hannes Reinecke
  0 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 15:26 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake

On 4/17/23 17:24, Sagi Grimberg wrote:
> 
> 
> On 4/17/23 16:02, Hannes Reinecke wrote:
>> kTLS is sending TLS ALERT messages as control messages for recvmsg().
>> As we can't do anything sensible with it just abort the connection
>> and let the userspace agent to a re-negotiation.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
>> ---
>>   drivers/nvme/host/tcp.c | 23 ++++++++++++++++++++++-
>>   1 file changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index b80bcc5fe447..eeb3f85fe259 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -1352,6 +1352,11 @@ static int nvme_tcp_init_connection(struct 
>> nvme_tcp_queue *queue)
>>   {
>>       struct nvme_tcp_icreq_pdu *icreq;
>>       struct nvme_tcp_icresp_pdu *icresp;
>> +#ifdef CONFIG_NVME_TCP_TLS
> 
> I don't think you need the ifdef(s) here, right?
> 
Leftovers, sorry.
Will be cleaning it up with the next round.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST
  2023-04-17 15:25     ` Hannes Reinecke
@ 2023-04-17 15:28       ` Sagi Grimberg
  2023-04-17 15:35         ` Hannes Reinecke
  2023-04-17 20:16         ` Jakub Kicinski
  0 siblings, 2 replies; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-17 15:28 UTC (permalink / raw)
  To: Hannes Reinecke, Jakub Kicinski
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake


>> nak from me.
>>
>>> MSG_SENDPAGE_NOTLAST was introduced with commit 35f9c09fe9c7
>>> ("tcp: tcp_sendpages() should call tcp_push() once") to fix issues
>>> with the ->sendfile() call, and the implication seems to be that
>>> the flag should be strictly internal to sendfile.
>>
>> no, the use case was to allow sendpage consumers to build larger
>> tso packets for large payloads (splice being the primary one), nvme-tcp
>> also benefits from it (as do others), just like sendfile.
>>
>>> Hence we shouldn't be setting it on nvme-tcp, and anyway doing
>>> so will cause TLS sending invalid data.
>>
>> That is something else. I don't recall the full details on the
>> discussion with Jakub, but it was not my intention to strike
>> it altogether, just remove it if !sendpage_ok.
>>
> But then we're running afoul with tls_sw_do_sendpage(), which uses 
> MSG_SENDPAGE_NOTLAST as a record indicator. Forwarding 
> MSG_SENDPAGE_NOTLAST to tls results in incorrect TLS records on the wire 
> and a transmission stall.
> 
> Took me days to figure that out.
> 
> If you have a better idea, I'm all ears.

So sendfile is unsupported over kTLS? Or is there some special
case for it in the upper layer?

Jakub?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/18] nvme-tcp: enable TLS handshake upcall
  2023-04-17 15:26   ` Sagi Grimberg
@ 2023-04-17 15:28     ` Hannes Reinecke
  2023-04-17 15:31       ` Sagi Grimberg
  0 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 15:28 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake

On 4/17/23 17:26, Sagi Grimberg wrote:
> 
> 
> On 4/17/23 16:02, Hannes Reinecke wrote:
>> Add a fabrics option 'tls' and start the TLS handshake upcall
>> with the default PSK. When TLS is started the PSK key serial
>> number is displayed in the sysfs attribute 'tls_key'
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/nvme/host/Kconfig   |  14 ++++
>>   drivers/nvme/host/core.c    |  23 ++++++-
>>   drivers/nvme/host/fabrics.c |  10 +++
>>   drivers/nvme/host/fabrics.h |   3 +
>>   drivers/nvme/host/nvme.h    |   1 +
>>   drivers/nvme/host/tcp.c     | 127 ++++++++++++++++++++++++++++++++++--
>>   6 files changed, 170 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
>> index 2f6a7f8c94e8..96a74041bb0a 100644
>> --- a/drivers/nvme/host/Kconfig
>> +++ b/drivers/nvme/host/Kconfig
>> @@ -92,6 +92,20 @@ config NVME_TCP
>>         If unsure, say N.
>> +config NVME_TCP_TLS
>> +    bool "NVMe over Fabrics TCP TLS encryption support"
>> +    depends on NVME_TCP
>> +    select NVME_COMMON
>> +    select NVME_KEYRING
>> +    select NET_HANDSHAKE
>> +    help
>> +      Enables TLS encryption for NVMe TCP using the netlink handshake 
>> API.
>> +
>> +      The TLS handshake daemon is availble at
>> +      https://github.com/oracle/ktls-utils.
>> +
>> +      If unsure, say N.
>> +
>>   config NVME_AUTH
>>       bool "NVM Express over Fabrics In-Band Authentication"
>>       depends on NVME_CORE
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 79aa215aec76..937013127c91 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -3889,6 +3889,19 @@ static DEVICE_ATTR(dhchap_ctrl_secret, S_IRUGO 
>> | S_IWUSR,
>>       nvme_ctrl_dhchap_ctrl_secret_show, 
>> nvme_ctrl_dhchap_ctrl_secret_store);
>>   #endif
>> +#ifdef CONFIG_NVME_TCP_TLS
>> +static ssize_t tls_key_show(struct device *dev,
>> +                struct device_attribute *attr, char *buf)
>> +{
>> +    struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
>> +
>> +    if (!ctrl->tls_key)
>> +        return 0;
>> +    return sysfs_emit(buf, "%08x", key_serial(ctrl->tls_key));
>> +}
>> +static DEVICE_ATTR_RO(tls_key);
>> +#endif
>> +
>>   static struct attribute *nvme_dev_attrs[] = {
>>       &dev_attr_reset_controller.attr,
>>       &dev_attr_rescan_controller.attr,
>> @@ -3915,6 +3928,9 @@ static struct attribute *nvme_dev_attrs[] = {
>>   #ifdef CONFIG_NVME_AUTH
>>       &dev_attr_dhchap_secret.attr,
>>       &dev_attr_dhchap_ctrl_secret.attr,
>> +#endif
>> +#ifdef CONFIG_NVME_TCP_TLS
>> +    &dev_attr_tls_key.attr,
>>   #endif
>>       NULL
>>   };
>> @@ -3945,7 +3961,10 @@ static umode_t 
>> nvme_dev_attrs_are_visible(struct kobject *kobj,
>>       if (a == &dev_attr_dhchap_ctrl_secret.attr && !ctrl->opts)
>>           return 0;
>>   #endif
>> -
>> +#ifdef CONFIG_NVME_TCP_TLS
>> +    if (a == &dev_attr_tls_key.attr && !ctrl->opts)
>> +        return 0;
>> +#endif
>>       return a->mode;
>>   }
>> @@ -5080,7 +5099,7 @@ static void nvme_free_ctrl(struct device *dev)
>>       if (!subsys || ctrl->instance != subsys->instance)
>>           ida_free(&nvme_instance_ida, ctrl->instance);
>> -
>> +    key_put(ctrl->tls_key);
>>       nvme_free_cels(ctrl);
>>       nvme_mpath_uninit(ctrl);
>>       nvme_auth_stop(ctrl);
>> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
>> index bbaa04a0c502..a8d6c22515ca 100644
>> --- a/drivers/nvme/host/fabrics.c
>> +++ b/drivers/nvme/host/fabrics.c
>> @@ -609,6 +609,7 @@ static const match_table_t opt_tokens = {
>>       { NVMF_OPT_DISCOVERY,        "discovery"        },
>>       { NVMF_OPT_DHCHAP_SECRET,    "dhchap_secret=%s"    },
>>       { NVMF_OPT_DHCHAP_CTRL_SECRET,    "dhchap_ctrl_secret=%s"    },
>> +    { NVMF_OPT_TLS,            "tls"            },
>>       { NVMF_OPT_ERR,            NULL            }
>>   };
>> @@ -632,6 +633,7 @@ static int nvmf_parse_options(struct 
>> nvmf_ctrl_options *opts,
>>       opts->hdr_digest = false;
>>       opts->data_digest = false;
>>       opts->tos = -1; /* < 0 == use transport default */
>> +    opts->tls = false;
>>       options = o = kstrdup(buf, GFP_KERNEL);
>>       if (!options)
>> @@ -918,6 +920,14 @@ static int nvmf_parse_options(struct 
>> nvmf_ctrl_options *opts,
>>               kfree(opts->dhchap_ctrl_secret);
>>               opts->dhchap_ctrl_secret = p;
>>               break;
>> +        case NVMF_OPT_TLS:
>> +            if (!IS_ENABLED(CONFIG_NVME_TCP_TLS)) {
>> +                pr_err("TLS is not supported\n");
>> +                ret = -EINVAL;
>> +                goto out;
>> +            }
>> +            opts->tls = true;
>> +            break;
>>           default:
>>               pr_warn("unknown parameter or missing value '%s' in ctrl 
>> creation request\n",
>>                   p);
>> diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
>> index dcac3df8a5f7..5db36e250e7a 100644
>> --- a/drivers/nvme/host/fabrics.h
>> +++ b/drivers/nvme/host/fabrics.h
>> @@ -70,6 +70,7 @@ enum {
>>       NVMF_OPT_DISCOVERY    = 1 << 22,
>>       NVMF_OPT_DHCHAP_SECRET    = 1 << 23,
>>       NVMF_OPT_DHCHAP_CTRL_SECRET = 1 << 24,
>> +    NVMF_OPT_TLS        = 1 << 25,
>>   };
>>   /**
>> @@ -102,6 +103,7 @@ enum {
>>    * @dhchap_secret: DH-HMAC-CHAP secret
>>    * @dhchap_ctrl_secret: DH-HMAC-CHAP controller secret for 
>> bi-directional
>>    *              authentication
>> + * @tls:        Start TLS encrypted connections (TCP)
>>    * @disable_sqflow: disable controller sq flow control
>>    * @hdr_digest: generate/verify header digest (TCP)
>>    * @data_digest: generate/verify data digest (TCP)
>> @@ -128,6 +130,7 @@ struct nvmf_ctrl_options {
>>       int            max_reconnects;
>>       char            *dhchap_secret;
>>       char            *dhchap_ctrl_secret;
>> +    bool            tls;
>>       bool            disable_sqflow;
>>       bool            hdr_digest;
>>       bool            data_digest;
>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>> index bf46f122e9e1..fbcc0ccf2ef2 100644
>> --- a/drivers/nvme/host/nvme.h
>> +++ b/drivers/nvme/host/nvme.h
>> @@ -347,6 +347,7 @@ struct nvme_ctrl {
>>       struct nvme_dhchap_key *ctrl_key;
>>       u16 transaction;
>>   #endif
>> +    struct key *tls_key;
>>       /* Power saving configuration */
>>       u64 ps_max_latency_us;
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index 0a39743da4e4..b80bcc5fe447 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -8,9 +8,13 @@
>>   #include <linux/init.h>
>>   #include <linux/slab.h>
>>   #include <linux/err.h>
>> +#include <linux/key.h>
>>   #include <linux/nvme-tcp.h>
>> +#include <linux/nvme-keyring.h>
>>   #include <net/sock.h>
>>   #include <net/tcp.h>
>> +#include <net/tls.h>
>> +#include <net/handshake.h>
>>   #include <linux/blk-mq.h>
>>   #include <crypto/hash.h>
>>   #include <net/busy_poll.h>
>> @@ -31,6 +35,16 @@ static int so_priority;
>>   module_param(so_priority, int, 0644);
>>   MODULE_PARM_DESC(so_priority, "nvme tcp socket optimize priority");
>> +#ifdef CONFIG_NVME_TCP_TLS
>> +/*
>> + * TLS handshake timeout
>> + */
>> +static int tls_handshake_timeout = 10;
>> +module_param(tls_handshake_timeout, int, 0644);
>> +MODULE_PARM_DESC(tls_handshake_timeout,
>> +         "nvme TLS handshake timeout in seconds (default 10)");
>> +#endif
>> +
>>   #ifdef CONFIG_DEBUG_LOCK_ALLOC
>>   /* lockdep can detect a circular dependency of the form
>>    *   sk_lock -> mmap_lock (page fault) -> fs locks -> sk_lock
>> @@ -146,7 +160,10 @@ struct nvme_tcp_queue {
>>       struct ahash_request    *snd_hash;
>>       __le32            exp_ddgst;
>>       __le32            recv_ddgst;
>> -
>> +#ifdef CONFIG_NVME_TCP_TLS
>> +    struct completion       tls_complete;
>> +    int                     tls_err;
>> +#endif
>>       struct page_frag_cache    pf_cache;
>>       void (*state_change)(struct sock *);
>> @@ -1492,7 +1509,79 @@ static void nvme_tcp_set_queue_io_cpu(struct 
>> nvme_tcp_queue *queue)
>>       queue->io_cpu = cpumask_next_wrap(n - 1, cpu_online_mask, -1, 
>> false);
>>   }
>> -static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid)
>> +#ifdef CONFIG_NVME_TCP_TLS
>> +static void nvme_tcp_tls_done(void *data, int status, key_serial_t 
>> pskid)
>> +{
>> +    struct nvme_tcp_queue *queue = data;
>> +    struct nvme_tcp_ctrl *ctrl = queue->ctrl;
>> +    int qid = nvme_tcp_queue_id(queue);
>> +    struct key *tls_key;
>> +
>> +    dev_dbg(ctrl->ctrl.device, "queue %d: TLS handshake done, key %x, 
>> status %d\n",
>> +        qid, pskid, status);
>> +
>> +    if (status) {
>> +        queue->tls_err = -status;
>> +        goto out_complete;
>> +    }
>> +
>> +    tls_key = key_lookup(pskid);
>> +    if (IS_ERR(tls_key)) {
>> +        dev_warn(ctrl->ctrl.device, "queue %d: Invalid key %x\n",
>> +             qid, pskid);
>> +        queue->tls_err = -ENOKEY;
>> +    } else {
>> +        ctrl->ctrl.tls_key = tls_key;
>> +        queue->tls_err = 0;
>> +    }
>> +
>> +out_complete:
>> +    complete(&queue->tls_complete);
>> +}
>> +
>> +static int nvme_tcp_start_tls(struct nvme_ctrl *nctrl,
>> +                  struct nvme_tcp_queue *queue,
>> +                  key_serial_t pskid)
>> +{
>> +    int qid = nvme_tcp_queue_id(queue);
>> +    int ret;
>> +    struct tls_handshake_args args;
>> +    unsigned long tmo = tls_handshake_timeout * HZ;
>> +    key_serial_t keyring = nvme_keyring_id();
>> +
>> +    dev_dbg(nctrl->device, "queue %d: start TLS with key %x\n",
>> +        qid, pskid);
>> +    args.ta_sock = queue->sock;
>> +    args.ta_done = nvme_tcp_tls_done;
>> +    args.ta_data = queue;
>> +    args.ta_my_peerids[0] = pskid;
>> +    args.ta_num_peerids = 1;
>> +    args.ta_keyring = keyring;
>> +    args.ta_timeout_ms = tls_handshake_timeout * 1000;
>> +    queue->tls_err = -EOPNOTSUPP;
>> +    init_completion(&queue->tls_complete);
>> +    ret = tls_client_hello_psk(&args, GFP_KERNEL);
>> +    if (ret) {
>> +        dev_err(nctrl->device, "queue %d: failed to start TLS: %d\n",
>> +            qid, ret);
>> +        return ret;
>> +    }
>> +    if (wait_for_completion_timeout(&queue->tls_complete, tmo) == 0) {
>> +        dev_err(nctrl->device,
>> +            "queue %d: TLS handshake timeout\n", qid);
>> +        ret = -ETIMEDOUT;
>> +    } else {
>> +        dev_dbg(nctrl->device,
>> +            "queue %d: TLS handshake complete, error %d\n",
>> +            qid, queue->tls_err);
>> +        ret = queue->tls_err;
>> +    }
>> +    return ret;
>> +}
>> +#endif
>> +
>> +static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
>> +                key_serial_t pskid)
>>   {
>>       struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
>>       struct nvme_tcp_queue *queue = &ctrl->queues[qid];
>> @@ -1616,6 +1705,14 @@ static int nvme_tcp_alloc_queue(struct 
>> nvme_ctrl *nctrl, int qid)
>>           goto err_rcv_pdu;
>>       }
>> +#ifdef CONFIG_NVME_TCP_TLS
>> +    /* If PSKs are configured try to start TLS */
>> +    if (pskid) {
>> +        ret = nvme_tcp_start_tls(nctrl, queue, pskid);
>> +        if (ret)
>> +            goto err_init_connect;
>> +    }
>> +#endif
>>       ret = nvme_tcp_init_connection(queue);
>>       if (ret)
>>           goto err_init_connect;
>> @@ -1759,10 +1856,22 @@ static int nvme_tcp_start_io_queues(struct 
>> nvme_ctrl *ctrl,
>>   static int nvme_tcp_alloc_admin_queue(struct nvme_ctrl *ctrl)
>>   {
>>       int ret;
>> +    key_serial_t psk_id = 0;
>> +
>> +    if (ctrl->opts->tls) {
>> +        psk_id = nvme_tls_psk_default(NULL,
>> +                          ctrl->opts->host->nqn,
>> +                          ctrl->opts->subsysnqn);
>> +        if (!psk_id) {
>> +            dev_err(ctrl->device, "no valid PSK found\n");
>> +            ret = -ENOKEY;
>> +            goto out_free_queue;
>> +        }
>> +    }
>> -    ret = nvme_tcp_alloc_queue(ctrl, 0);
>> +    ret = nvme_tcp_alloc_queue(ctrl, 0, psk_id);
>>       if (ret)
>> -        return ret;
>> +        goto out_free_queue;
>>       ret = nvme_tcp_alloc_async_req(to_tcp_ctrl(ctrl));
>>       if (ret)
>> @@ -1778,9 +1887,15 @@ static int nvme_tcp_alloc_admin_queue(struct 
>> nvme_ctrl *ctrl)
>>   static int __nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
>>   {
>>       int i, ret;
>> +    key_serial_t psk_id;
>> +    if (!ctrl->tls_key) {
>> +        dev_err(ctrl->device, "no PSK negotiated\n");
>> +        return -ENOKEY;
>> +    }
> 
> is ctrl->tls_key always set? Looks like this is the expectation.
> 

It needs to be set when TLS is enabled for the admin queue.
Otherwise no.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 15/18] nvmet-tcp: allocate socket file
  2023-04-17 13:02 ` [PATCH 15/18] nvmet-tcp: allocate socket file Hannes Reinecke
@ 2023-04-17 15:29   ` Sagi Grimberg
  0 siblings, 0 replies; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-17 15:29 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake



On 4/17/23 16:02, Hannes Reinecke wrote:
> For the TLS upcall we need to allocate a socket file such
> that the userspace daemon is able to use the socket.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>   drivers/nvme/target/tcp.c | 36 +++++++++++++++++++++++-------------
>   1 file changed, 23 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index ed98df72c76b..fc7a9fb56a83 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -122,6 +122,7 @@ struct nvmet_tcp_cmd {
>   
>   enum nvmet_tcp_queue_state {
>   	NVMET_TCP_Q_CONNECTING,
> +	NVMET_TCP_Q_TLS_HANDSHAKE,

I think you got some misplaced chunks in the patch.

>   	NVMET_TCP_Q_LIVE,
>   	NVMET_TCP_Q_DISCONNECTING,
>   };
> @@ -1481,12 +1482,12 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
>   	nvmet_sq_destroy(&queue->nvme_sq);
>   	cancel_work_sync(&queue->io_work);
>   	nvmet_tcp_free_cmd_data_in_buffers(queue);
> -	sock_release(queue->sock);
> +	/* ->sock will be released by fput() */
> +	fput(queue->sock->file);
>   	nvmet_tcp_free_cmds(queue);
>   	if (queue->hdr_digest || queue->data_digest)
>   		nvmet_tcp_free_crypto(queue);
>   	ida_free(&nvmet_tcp_queue_ida, queue->idx);
> -
>   	page = virt_to_head_page(queue->pf_cache.va);
>   	__page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
>   	kfree(queue);
> @@ -1609,15 +1610,16 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
>   	return ret;
>   }
>   
> -static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
> +static void nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
>   		struct socket *newsock)
>   {
>   	struct nvmet_tcp_queue *queue;
> +	struct file *sock_file;
>   	int ret;
>   
>   	queue = kzalloc(sizeof(*queue), GFP_KERNEL);
>   	if (!queue)
> -		return -ENOMEM;
> +		return;
>   
>   	INIT_WORK(&queue->release_work, nvmet_tcp_release_queue_work);
>   	INIT_WORK(&queue->io_work, nvmet_tcp_io_work);
> @@ -1625,15 +1627,25 @@ static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
>   	queue->port = port;
>   	queue->nr_cmds = 0;
>   	spin_lock_init(&queue->state_lock);
> -	queue->state = NVMET_TCP_Q_CONNECTING;
> +	if (queue->port->nport->disc_addr.tsas.tcp.sectype ==
> +	    NVMF_TCP_SECTYPE_TLS13)
> +		queue->state = NVMET_TCP_Q_TLS_HANDSHAKE;
> +	else
> +		queue->state = NVMET_TCP_Q_CONNECTING;
>   	INIT_LIST_HEAD(&queue->free_list);
>   	init_llist_head(&queue->resp_list);
>   	INIT_LIST_HEAD(&queue->resp_send_list);
>   
> +	sock_file = sock_alloc_file(queue->sock, O_CLOEXEC, NULL);
> +	if (IS_ERR(sock_file)) {
> +		ret = PTR_ERR(sock_file);
> +		goto out_free_queue;
> +	}
> +
>   	queue->idx = ida_alloc(&nvmet_tcp_queue_ida, GFP_KERNEL);
>   	if (queue->idx < 0) {
>   		ret = queue->idx;
> -		goto out_free_queue;
> +		goto out_sock;
>   	}
>   
>   	ret = nvmet_tcp_alloc_cmd(queue, &queue->connect);
> @@ -1654,7 +1666,7 @@ static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
>   	if (ret)
>   		goto out_destroy_sq;
>   
> -	return 0;
> +	return;
>   out_destroy_sq:
>   	mutex_lock(&nvmet_tcp_queue_mutex);
>   	list_del_init(&queue->queue_list);
> @@ -1664,9 +1676,11 @@ static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
>   	nvmet_tcp_free_cmd(&queue->connect);
>   out_ida_remove:
>   	ida_free(&nvmet_tcp_queue_ida, queue->idx);
> +out_sock:
> +	fput(queue->sock->file);
>   out_free_queue:
>   	kfree(queue);
> -	return ret;
> +	pr_err("failed to allocate queue");
>   }
>   
>   static void nvmet_tcp_accept_work(struct work_struct *w)
> @@ -1683,11 +1697,7 @@ static void nvmet_tcp_accept_work(struct work_struct *w)
>   				pr_warn("failed to accept err=%d\n", ret);
>   			return;
>   		}
> -		ret = nvmet_tcp_alloc_queue(port, newsock);
> -		if (ret) {
> -			pr_err("failed to allocate queue\n");
> -			sock_release(newsock);
> -		}
> +		nvmet_tcp_alloc_queue(port, newsock);
>   	}
>   }
>   

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/18] nvme-tcp: enable TLS handshake upcall
  2023-04-17 15:28     ` Hannes Reinecke
@ 2023-04-17 15:31       ` Sagi Grimberg
  2023-04-17 15:36         ` Hannes Reinecke
  0 siblings, 1 reply; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-17 15:31 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake


>>>   static int nvme_tcp_alloc_admin_queue(struct nvme_ctrl *ctrl)
>>>   {
>>>       int ret;
>>> +    key_serial_t psk_id = 0;
>>> +
>>> +    if (ctrl->opts->tls) {
>>> +        psk_id = nvme_tls_psk_default(NULL,
>>> +                          ctrl->opts->host->nqn,
>>> +                          ctrl->opts->subsysnqn);
>>> +        if (!psk_id) {
>>> +            dev_err(ctrl->device, "no valid PSK found\n");
>>> +            ret = -ENOKEY;
>>> +            goto out_free_queue;
>>> +        }
>>> +    }
>>> -    ret = nvme_tcp_alloc_queue(ctrl, 0);
>>> +    ret = nvme_tcp_alloc_queue(ctrl, 0, psk_id);
>>>       if (ret)
>>> -        return ret;
>>> +        goto out_free_queue;
>>>       ret = nvme_tcp_alloc_async_req(to_tcp_ctrl(ctrl));
>>>       if (ret)
>>> @@ -1778,9 +1887,15 @@ static int nvme_tcp_alloc_admin_queue(struct 
>>> nvme_ctrl *ctrl)
>>>   static int __nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
>>>   {
>>>       int i, ret;
>>> +    key_serial_t psk_id;
>>> +    if (!ctrl->tls_key) {
>>> +        dev_err(ctrl->device, "no PSK negotiated\n");
>>> +        return -ENOKEY;
>>> +    }
>>
>> is ctrl->tls_key always set? Looks like this is the expectation.
>>
> 
> It needs to be set when TLS is enabled for the admin queue.
> Otherwise no.

Then aren't you missing also and statement with (ctrl->opts->tls) ?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST
  2023-04-17 15:28       ` Sagi Grimberg
@ 2023-04-17 15:35         ` Hannes Reinecke
  2023-04-17 20:16         ` Jakub Kicinski
  1 sibling, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 15:35 UTC (permalink / raw)
  To: Sagi Grimberg, Jakub Kicinski
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake

On 4/17/23 17:28, Sagi Grimberg wrote:
> 
>>> nak from me.
>>>
>>>> MSG_SENDPAGE_NOTLAST was introduced with commit 35f9c09fe9c7
>>>> ("tcp: tcp_sendpages() should call tcp_push() once") to fix issues
>>>> with the ->sendfile() call, and the implication seems to be that
>>>> the flag should be strictly internal to sendfile.
>>>
>>> no, the use case was to allow sendpage consumers to build larger
>>> tso packets for large payloads (splice being the primary one), nvme-tcp
>>> also benefits from it (as do others), just like sendfile.
>>>
>>>> Hence we shouldn't be setting it on nvme-tcp, and anyway doing
>>>> so will cause TLS sending invalid data.
>>>
>>> That is something else. I don't recall the full details on the
>>> discussion with Jakub, but it was not my intention to strike
>>> it altogether, just remove it if !sendpage_ok.
>>>
>> But then we're running afoul with tls_sw_do_sendpage(), which uses 
>> MSG_SENDPAGE_NOTLAST as a record indicator. Forwarding 
>> MSG_SENDPAGE_NOTLAST to tls results in incorrect TLS records on the 
>> wire and a transmission stall.
>>
>> Took me days to figure that out.
>>
>> If you have a better idea, I'm all ears.
> 
> So sendfile is unsupported over kTLS? Or is there some special
> case for it in the upper layer?
> 
Dunno. Seems to work, at least according to commit d452d48b9f8b
("tls: prevent oversized sendfile() hangs by ignoring MSG_MORE")

Which is also where I got the notion about sendfile() from.
If MSG_SENDPAGE_NOTLAST is valid for non-sendfile users then
tls_sw_do_sendpage() has a bug.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/18] nvme-tcp: enable TLS handshake upcall
  2023-04-17 15:31       ` Sagi Grimberg
@ 2023-04-17 15:36         ` Hannes Reinecke
  0 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-17 15:36 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake

On 4/17/23 17:31, Sagi Grimberg wrote:
> 
>>>>   static int nvme_tcp_alloc_admin_queue(struct nvme_ctrl *ctrl)
>>>>   {
>>>>       int ret;
>>>> +    key_serial_t psk_id = 0;
>>>> +
>>>> +    if (ctrl->opts->tls) {
>>>> +        psk_id = nvme_tls_psk_default(NULL,
>>>> +                          ctrl->opts->host->nqn,
>>>> +                          ctrl->opts->subsysnqn);
>>>> +        if (!psk_id) {
>>>> +            dev_err(ctrl->device, "no valid PSK found\n");
>>>> +            ret = -ENOKEY;
>>>> +            goto out_free_queue;
>>>> +        }
>>>> +    }
>>>> -    ret = nvme_tcp_alloc_queue(ctrl, 0);
>>>> +    ret = nvme_tcp_alloc_queue(ctrl, 0, psk_id);
>>>>       if (ret)
>>>> -        return ret;
>>>> +        goto out_free_queue;
>>>>       ret = nvme_tcp_alloc_async_req(to_tcp_ctrl(ctrl));
>>>>       if (ret)
>>>> @@ -1778,9 +1887,15 @@ static int nvme_tcp_alloc_admin_queue(struct 
>>>> nvme_ctrl *ctrl)
>>>>   static int __nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
>>>>   {
>>>>       int i, ret;
>>>> +    key_serial_t psk_id;
>>>> +    if (!ctrl->tls_key) {
>>>> +        dev_err(ctrl->device, "no PSK negotiated\n");
>>>> +        return -ENOKEY;
>>>> +    }
>>>
>>> is ctrl->tls_key always set? Looks like this is the expectation.
>>>
>>
>> It needs to be set when TLS is enabled for the admin queue.
>> Otherwise no.
> 
> Then aren't you missing also and statement with (ctrl->opts->tls) ?
> 
Indeed, I do. Will be fixing it up.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 07/18] net/tls: sanitize MSG_EOR handling
  2023-04-17 15:19   ` Sagi Grimberg
@ 2023-04-17 20:10     ` Jakub Kicinski
  2023-04-18 10:07       ` Sagi Grimberg
  0 siblings, 1 reply; 52+ messages in thread
From: Jakub Kicinski @ 2023-04-17 20:10 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Hannes Reinecke, Christoph Hellwig, Keith Busch, linux-nvme,
	Chuck Lever, kernel-tls-handshake

On Mon, 17 Apr 2023 18:19:12 +0300 Sagi Grimberg wrote:
> > The TLS stack is using MSG_EOR internally, so the flag cannot be
> > set for sendmsg()/sendpage(). But to avoid having the caller to
> > check whether TLS is active modify the code to clear the MSG_EOR
> > flag. And blank out MSG_MORE / MSG_SENDPAGE_NOTLAST, too, as they
> > conflict with MSG_EOR anyway.  
> 
> This looks like a temporary workaround to me.
> 
> The networking folks really need to be CC'd on this (same for patch 6).

Thanks, when I said "we can support EOR" I obviously meant support 
not ignore it :(  No ack.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST
  2023-04-17 15:28       ` Sagi Grimberg
  2023-04-17 15:35         ` Hannes Reinecke
@ 2023-04-17 20:16         ` Jakub Kicinski
  2023-04-18 10:33           ` Hannes Reinecke
  1 sibling, 1 reply; 52+ messages in thread
From: Jakub Kicinski @ 2023-04-17 20:16 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Hannes Reinecke, Christoph Hellwig, Keith Busch, linux-nvme,
	Chuck Lever, kernel-tls-handshake

On Mon, 17 Apr 2023 18:28:12 +0300 Sagi Grimberg wrote:
> > But then we're running afoul with tls_sw_do_sendpage(), which uses 
> > MSG_SENDPAGE_NOTLAST as a record indicator. Forwarding 
> > MSG_SENDPAGE_NOTLAST to tls results in incorrect TLS records on the wire 
> > and a transmission stall.
> > 
> > Took me days to figure that out.
> > 
> > If you have a better idea, I'm all ears.  
> 
> So sendfile is unsupported over kTLS? Or is there some special
> case for it in the upper layer?
> 
> Jakub?

First I hear of it. 10 sec look at the code doesn't indicate that 
the author missed this case completely either.

Hannes, any more details on what's going wrong?

What incorrect TLS record have you seen? What's the sequence of sends?
Offload or no offload? Easy to repro or stars need to (mis)align?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/18] nvme-tcp: enable TLS handshake upcall
  2023-04-17 13:02 ` [PATCH 11/18] nvme-tcp: enable TLS handshake upcall Hannes Reinecke
  2023-04-17 15:26   ` Sagi Grimberg
@ 2023-04-18  5:52   ` Daniel Wagner
  2023-04-18  9:38     ` Hannes Reinecke
  1 sibling, 1 reply; 52+ messages in thread
From: Daniel Wagner @ 2023-04-18  5:52 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Sagi Grimberg, Christoph Hellwig, Keith Busch, linux-nvme,
	Chuck Lever, kernel-tls-handshake

On Mon, Apr 17, 2023 at 03:02:55PM +0200, Hannes Reinecke wrote:
 --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -609,6 +609,7 @@ static const match_table_t opt_tokens = {
>  	{ NVMF_OPT_DISCOVERY,		"discovery"		},
>  	{ NVMF_OPT_DHCHAP_SECRET,	"dhchap_secret=%s"	},
>  	{ NVMF_OPT_DHCHAP_CTRL_SECRET,	"dhchap_ctrl_secret=%s"	},
> +	{ NVMF_OPT_TLS,			"tls"			},
>  	{ NVMF_OPT_ERR,			NULL			}
>  };

Coming back to your previous discussion about this. 'tls' is always announced
but not always supported. This renders the unsupported option filtering added to
libnvme useles:

  https://github.com/linux-nvme/libnvme/issues/612

Let's assume we go ahead with this patch. The user set explicit the tls option
via 'nvme connect ... --tls' and the kernel will return EINVAL, but userland
can't really know what it was and drop --tls and retry again (or print some
useful information). It could be any other parameter provided by the user.

How is userland supposed to do any feature detection?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/18] nvme-tcp: enable TLS handshake upcall
  2023-04-18  5:52   ` Daniel Wagner
@ 2023-04-18  9:38     ` Hannes Reinecke
  2023-04-18 10:12       ` Sagi Grimberg
  0 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-18  9:38 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Sagi Grimberg, Christoph Hellwig, Keith Busch, linux-nvme,
	Chuck Lever, kernel-tls-handshake

On 4/18/23 07:52, Daniel Wagner wrote:
> On Mon, Apr 17, 2023 at 03:02:55PM +0200, Hannes Reinecke wrote:
>   --- a/drivers/nvme/host/fabrics.c
>> +++ b/drivers/nvme/host/fabrics.c
>> @@ -609,6 +609,7 @@ static const match_table_t opt_tokens = {
>>   	{ NVMF_OPT_DISCOVERY,		"discovery"		},
>>   	{ NVMF_OPT_DHCHAP_SECRET,	"dhchap_secret=%s"	},
>>   	{ NVMF_OPT_DHCHAP_CTRL_SECRET,	"dhchap_ctrl_secret=%s"	},
>> +	{ NVMF_OPT_TLS,			"tls"			},
>>   	{ NVMF_OPT_ERR,			NULL			}
>>   };
> 
> Coming back to your previous discussion about this. 'tls' is always announced
> but not always supported. This renders the unsupported option filtering added to
> libnvme useles:
> 
>    https://github.com/linux-nvme/libnvme/issues/612
> 
> Let's assume we go ahead with this patch. The user set explicit the tls option
> via 'nvme connect ... --tls' and the kernel will return EINVAL, but userland
> can't really know what it was and drop --tls and retry again (or print some
> useful information). It could be any other parameter provided by the user.
> 
> How is userland supposed to do any feature detection?
> 
And that is a good question.
I've just removed most of the #ifdef annotations on request from Sagi.
If we were going with your suggestion I would need to bring them back.

So, Sagi: What should we do?
Re-clutter and allow nvme-cli to figure out if TLS is supported?
And have it always enabled and invent other means for nvme-cli?

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 07/18] net/tls: sanitize MSG_EOR handling
  2023-04-17 20:10     ` Jakub Kicinski
@ 2023-04-18 10:07       ` Sagi Grimberg
  2023-04-18 10:24         ` Hannes Reinecke
  0 siblings, 1 reply; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-18 10:07 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Hannes Reinecke, Christoph Hellwig, Keith Busch, linux-nvme,
	Chuck Lever, kernel-tls-handshake


>>> The TLS stack is using MSG_EOR internally, so the flag cannot be
>>> set for sendmsg()/sendpage(). But to avoid having the caller to
>>> check whether TLS is active modify the code to clear the MSG_EOR
>>> flag. And blank out MSG_MORE / MSG_SENDPAGE_NOTLAST, too, as they
>>> conflict with MSG_EOR anyway.
>>
>> This looks like a temporary workaround to me.
>>
>> The networking folks really need to be CC'd on this (same for patch 6).
> 
> Thanks, when I said "we can support EOR" I obviously meant support
> not ignore it :(  No ack.

Obviously neither Hannes or I have sufficient tls knowledge to properly
support it... It needs to be done by someone who knows the
implementation.

Is this a large scope to add support for it? Because if it is, I'd
simply change nvme to clear MSG_EOR when tls is used (despite my
preference to not do special things for tls).

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/18] nvme-tcp: enable TLS handshake upcall
  2023-04-18  9:38     ` Hannes Reinecke
@ 2023-04-18 10:12       ` Sagi Grimberg
  2023-04-18 10:28         ` Hannes Reinecke
  0 siblings, 1 reply; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-18 10:12 UTC (permalink / raw)
  To: Hannes Reinecke, Daniel Wagner
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake


>>   --- a/drivers/nvme/host/fabrics.c
>>> +++ b/drivers/nvme/host/fabrics.c
>>> @@ -609,6 +609,7 @@ static const match_table_t opt_tokens = {
>>>       { NVMF_OPT_DISCOVERY,        "discovery"        },
>>>       { NVMF_OPT_DHCHAP_SECRET,    "dhchap_secret=%s"    },
>>>       { NVMF_OPT_DHCHAP_CTRL_SECRET,    "dhchap_ctrl_secret=%s"    },
>>> +    { NVMF_OPT_TLS,            "tls"            },
>>>       { NVMF_OPT_ERR,            NULL            }
>>>   };
>>
>> Coming back to your previous discussion about this. 'tls' is always 
>> announced
>> but not always supported. This renders the unsupported option 
>> filtering added to
>> libnvme useles:
>>
>>    https://github.com/linux-nvme/libnvme/issues/612
>>
>> Let's assume we go ahead with this patch. The user set explicit the 
>> tls option
>> via 'nvme connect ... --tls' and the kernel will return EINVAL, but 
>> userland
>> can't really know what it was and drop --tls and retry again (or print 
>> some
>> useful information). It could be any other parameter provided by the 
>> user.
>>
>> How is userland supposed to do any feature detection?
>>
> And that is a good question.
> I've just removed most of the #ifdef annotations on request from Sagi.
> If we were going with your suggestion I would need to bring them back.
> 
> So, Sagi: What should we do?
> Re-clutter and allow nvme-cli to figure out if TLS is supported?
> And have it always enabled and invent other means for nvme-cli?

I don't understand the issue.

userspace already has a way to know if a feature is supported by reading
the misc device.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 07/18] net/tls: sanitize MSG_EOR handling
  2023-04-18 10:07       ` Sagi Grimberg
@ 2023-04-18 10:24         ` Hannes Reinecke
  2023-04-18 10:39           ` Hannes Reinecke
  0 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-18 10:24 UTC (permalink / raw)
  To: Sagi Grimberg, Jakub Kicinski
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake

On 4/18/23 12:07, Sagi Grimberg wrote:
> 
>>>> The TLS stack is using MSG_EOR internally, so the flag cannot be
>>>> set for sendmsg()/sendpage(). But to avoid having the caller to
>>>> check whether TLS is active modify the code to clear the MSG_EOR
>>>> flag. And blank out MSG_MORE / MSG_SENDPAGE_NOTLAST, too, as they
>>>> conflict with MSG_EOR anyway.
>>>
>>> This looks like a temporary workaround to me.
>>>
>>> The networking folks really need to be CC'd on this (same for patch 6).
>>
>> Thanks, when I said "we can support EOR" I obviously meant support
>> not ignore it :(  No ack.
> 
> Obviously neither Hannes or I have sufficient tls knowledge to properly
> support it... It needs to be done by someone who knows the
> implementation.
> 
> Is this a large scope to add support for it? Because if it is, I'd
> simply change nvme to clear MSG_EOR when tls is used (despite my
> preference to not do special things for tls).

There's a rather simple patch for tls_sw. It already paces the data 
internally by an 'eor' variable, which is currently set to !MSG_MORE.
So evaluating MSG_EOR here is really trivial, and doesn't seem to cause 
any issues.

Or, at least, none which would show up with NVMe-over-TLS :-)

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/18] nvme-tcp: enable TLS handshake upcall
  2023-04-18 10:12       ` Sagi Grimberg
@ 2023-04-18 10:28         ` Hannes Reinecke
  2023-04-18 10:32           ` Sagi Grimberg
  0 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-18 10:28 UTC (permalink / raw)
  To: Sagi Grimberg, Daniel Wagner
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake

On 4/18/23 12:12, Sagi Grimberg wrote:
> 
>>>   --- a/drivers/nvme/host/fabrics.c
>>>> +++ b/drivers/nvme/host/fabrics.c
>>>> @@ -609,6 +609,7 @@ static const match_table_t opt_tokens = {
>>>>       { NVMF_OPT_DISCOVERY,        "discovery"        },
>>>>       { NVMF_OPT_DHCHAP_SECRET,    "dhchap_secret=%s"    },
>>>>       { NVMF_OPT_DHCHAP_CTRL_SECRET,    "dhchap_ctrl_secret=%s"    },
>>>> +    { NVMF_OPT_TLS,            "tls"            },
>>>>       { NVMF_OPT_ERR,            NULL            }
>>>>   };
>>>
>>> Coming back to your previous discussion about this. 'tls' is always 
>>> announced
>>> but not always supported. This renders the unsupported option 
>>> filtering added to
>>> libnvme useles:
>>>
>>>    https://github.com/linux-nvme/libnvme/issues/612
>>>
>>> Let's assume we go ahead with this patch. The user set explicit the 
>>> tls option
>>> via 'nvme connect ... --tls' and the kernel will return EINVAL, but 
>>> userland
>>> can't really know what it was and drop --tls and retry again (or 
>>> print some
>>> useful information). It could be any other parameter provided by the 
>>> user.
>>>
>>> How is userland supposed to do any feature detection?
>>>
>> And that is a good question.
>> I've just removed most of the #ifdef annotations on request from Sagi.
>> If we were going with your suggestion I would need to bring them back.
>>
>> So, Sagi: What should we do?
>> Re-clutter and allow nvme-cli to figure out if TLS is supported?
>> And have it always enabled and invent other means for nvme-cli?
> 
> I don't understand the issue.
> 
> userspace already has a way to know if a feature is supported by reading
> the misc device.
> 
Yes. But the output there is generated by evaluating the fields from the 
fabrics 'opts' structure.

So if the 'tls' option is present it assumes that TLS is supported.
With my current patchset the _option_ is always present, but the 
evaluation of the option once set by nvme-cli might return -EINVAL if 
it's not compiled in.

The request from Daniel is to have the compile option for the fabrics 
'tls' option, too, such that it doesn't show up with the misc device.
IE revert the changes which you requested to convert all the
'#ifdef CONFIG_NVME_TCP_TLS' conditionals.

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/18] nvme-tcp: enable TLS handshake upcall
  2023-04-18 10:28         ` Hannes Reinecke
@ 2023-04-18 10:32           ` Sagi Grimberg
  2023-04-18 10:33             ` Hannes Reinecke
  0 siblings, 1 reply; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-18 10:32 UTC (permalink / raw)
  To: Hannes Reinecke, Daniel Wagner
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake



On 4/18/23 13:28, Hannes Reinecke wrote:
> On 4/18/23 12:12, Sagi Grimberg wrote:
>>
>>>>   --- a/drivers/nvme/host/fabrics.c
>>>>> +++ b/drivers/nvme/host/fabrics.c
>>>>> @@ -609,6 +609,7 @@ static const match_table_t opt_tokens = {
>>>>>       { NVMF_OPT_DISCOVERY,        "discovery"        },
>>>>>       { NVMF_OPT_DHCHAP_SECRET,    "dhchap_secret=%s"    },
>>>>>       { NVMF_OPT_DHCHAP_CTRL_SECRET,    "dhchap_ctrl_secret=%s"    },
>>>>> +    { NVMF_OPT_TLS,            "tls"            },
>>>>>       { NVMF_OPT_ERR,            NULL            }
>>>>>   };
>>>>
>>>> Coming back to your previous discussion about this. 'tls' is always 
>>>> announced
>>>> but not always supported. This renders the unsupported option 
>>>> filtering added to
>>>> libnvme useles:
>>>>
>>>>    https://github.com/linux-nvme/libnvme/issues/612
>>>>
>>>> Let's assume we go ahead with this patch. The user set explicit the 
>>>> tls option
>>>> via 'nvme connect ... --tls' and the kernel will return EINVAL, but 
>>>> userland
>>>> can't really know what it was and drop --tls and retry again (or 
>>>> print some
>>>> useful information). It could be any other parameter provided by the 
>>>> user.
>>>>
>>>> How is userland supposed to do any feature detection?
>>>>
>>> And that is a good question.
>>> I've just removed most of the #ifdef annotations on request from Sagi.
>>> If we were going with your suggestion I would need to bring them back.
>>>
>>> So, Sagi: What should we do?
>>> Re-clutter and allow nvme-cli to figure out if TLS is supported?
>>> And have it always enabled and invent other means for nvme-cli?
>>
>> I don't understand the issue.
>>
>> userspace already has a way to know if a feature is supported by reading
>> the misc device.
>>
> Yes. But the output there is generated by evaluating the fields from the 
> fabrics 'opts' structure.
> 
> So if the 'tls' option is present it assumes that TLS is supported.
> With my current patchset the _option_ is always present, but the 
> evaluation of the option once set by nvme-cli might return -EINVAL if 
> it's not compiled in.
> 
> The request from Daniel is to have the compile option for the fabrics 
> 'tls' option, too, such that it doesn't show up with the misc device.
> IE revert the changes which you requested to convert all the
> '#ifdef CONFIG_NVME_TCP_TLS' conditionals.

Thanks for explaining.

Then we can add the ifdef on the opts statement alone, with a comment.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST
  2023-04-17 20:16         ` Jakub Kicinski
@ 2023-04-18 10:33           ` Hannes Reinecke
  2023-04-18 18:21             ` Jakub Kicinski
  0 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-18 10:33 UTC (permalink / raw)
  To: Jakub Kicinski, Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake

On 4/17/23 22:16, Jakub Kicinski wrote:
> On Mon, 17 Apr 2023 18:28:12 +0300 Sagi Grimberg wrote:
>>> But then we're running afoul with tls_sw_do_sendpage(), which uses
>>> MSG_SENDPAGE_NOTLAST as a record indicator. Forwarding
>>> MSG_SENDPAGE_NOTLAST to tls results in incorrect TLS records on the wire
>>> and a transmission stall.
>>>
>>> Took me days to figure that out.
>>>
>>> If you have a better idea, I'm all ears.
>>
>> So sendfile is unsupported over kTLS? Or is there some special
>> case for it in the upper layer?
>>
>> Jakub?
> 
> First I hear of it. 10 sec look at the code doesn't indicate that
> the author missed this case completely either.
> 
> Hannes, any more details on what's going wrong?
> 
> What incorrect TLS record have you seen? What's the sequence of sends?
> Offload or no offload? Easy to repro or stars need to (mis)align?
> 
Ah-ha. Found it.

Turns out to be an arguably invalid control flow on our side.

The initial 'connect' command is a command PDU with inline data.
So in our (current) flow we do a 'sendpage()' with the 
MORE/SENDPAGE_NOTLAST flag set as we have to transfer more data.
Ok so far.
But for transmitting the actual data we evaluate 'sendpage_ok()', which 
returns _false_. And then we continue to call 'sendmsg()' for the inline 
data.
This confuses the TLS software stack as sendpage() and sendmsg() are two 
distinct code paths, and we end up with a TX stall.

Solution: check 'sendpage_ok()' when we have inline data and use 
'sendmsg' for the command pdu if required.

It always pays to dig deeper if one's not exactly sure what's happening ...

Will be updating the patchset.

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/18] nvme-tcp: enable TLS handshake upcall
  2023-04-18 10:32           ` Sagi Grimberg
@ 2023-04-18 10:33             ` Hannes Reinecke
  0 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-18 10:33 UTC (permalink / raw)
  To: Sagi Grimberg, Daniel Wagner
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake

On 4/18/23 12:32, Sagi Grimberg wrote:
> 
> 
> On 4/18/23 13:28, Hannes Reinecke wrote:
>> On 4/18/23 12:12, Sagi Grimberg wrote:
>>>
>>>>>   --- a/drivers/nvme/host/fabrics.c
>>>>>> +++ b/drivers/nvme/host/fabrics.c
>>>>>> @@ -609,6 +609,7 @@ static const match_table_t opt_tokens = {
>>>>>>       { NVMF_OPT_DISCOVERY,        "discovery"        },
>>>>>>       { NVMF_OPT_DHCHAP_SECRET,    "dhchap_secret=%s"    },
>>>>>>       { NVMF_OPT_DHCHAP_CTRL_SECRET,    "dhchap_ctrl_secret=%s"    },
>>>>>> +    { NVMF_OPT_TLS,            "tls"            },
>>>>>>       { NVMF_OPT_ERR,            NULL            }
>>>>>>   };
>>>>>
>>>>> Coming back to your previous discussion about this. 'tls' is always 
>>>>> announced
>>>>> but not always supported. This renders the unsupported option 
>>>>> filtering added to
>>>>> libnvme useles:
>>>>>
>>>>>    https://github.com/linux-nvme/libnvme/issues/612
>>>>>
>>>>> Let's assume we go ahead with this patch. The user set explicit the 
>>>>> tls option
>>>>> via 'nvme connect ... --tls' and the kernel will return EINVAL, but 
>>>>> userland
>>>>> can't really know what it was and drop --tls and retry again (or 
>>>>> print some
>>>>> useful information). It could be any other parameter provided by 
>>>>> the user.
>>>>>
>>>>> How is userland supposed to do any feature detection?
>>>>>
>>>> And that is a good question.
>>>> I've just removed most of the #ifdef annotations on request from Sagi.
>>>> If we were going with your suggestion I would need to bring them back.
>>>>
>>>> So, Sagi: What should we do?
>>>> Re-clutter and allow nvme-cli to figure out if TLS is supported?
>>>> And have it always enabled and invent other means for nvme-cli?
>>>
>>> I don't understand the issue.
>>>
>>> userspace already has a way to know if a feature is supported by reading
>>> the misc device.
>>>
>> Yes. But the output there is generated by evaluating the fields from 
>> the fabrics 'opts' structure.
>>
>> So if the 'tls' option is present it assumes that TLS is supported.
>> With my current patchset the _option_ is always present, but the 
>> evaluation of the option once set by nvme-cli might return -EINVAL if 
>> it's not compiled in.
>>
>> The request from Daniel is to have the compile option for the fabrics 
>> 'tls' option, too, such that it doesn't show up with the misc device.
>> IE revert the changes which you requested to convert all the
>> '#ifdef CONFIG_NVME_TCP_TLS' conditionals.
> 
> Thanks for explaining.
> 
> Then we can add the ifdef on the opts statement alone, with a comment.

Good, will do for the next round.

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 07/18] net/tls: sanitize MSG_EOR handling
  2023-04-18 10:24         ` Hannes Reinecke
@ 2023-04-18 10:39           ` Hannes Reinecke
  2023-04-18 10:43             ` Sagi Grimberg
  0 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-18 10:39 UTC (permalink / raw)
  To: Sagi Grimberg, Jakub Kicinski
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake

On 4/18/23 12:24, Hannes Reinecke wrote:
> On 4/18/23 12:07, Sagi Grimberg wrote:
>>
>>>>> The TLS stack is using MSG_EOR internally, so the flag cannot be
>>>>> set for sendmsg()/sendpage(). But to avoid having the caller to
>>>>> check whether TLS is active modify the code to clear the MSG_EOR
>>>>> flag. And blank out MSG_MORE / MSG_SENDPAGE_NOTLAST, too, as they
>>>>> conflict with MSG_EOR anyway.
>>>>
>>>> This looks like a temporary workaround to me.
>>>>
>>>> The networking folks really need to be CC'd on this (same for patch 6).
>>>
>>> Thanks, when I said "we can support EOR" I obviously meant support
>>> not ignore it :(  No ack.
>>
>> Obviously neither Hannes or I have sufficient tls knowledge to properly
>> support it... It needs to be done by someone who knows the
>> implementation.
>>
>> Is this a large scope to add support for it? Because if it is, I'd
>> simply change nvme to clear MSG_EOR when tls is used (despite my
>> preference to not do special things for tls).
> 
> There's a rather simple patch for tls_sw. It already paces the data 
> internally by an 'eor' variable, which is currently set to !MSG_MORE.
> So evaluating MSG_EOR here is really trivial, and doesn't seem to cause 
> any issues.
> 
> Or, at least, none which would show up with NVMe-over-TLS :-)
> 
Suggestion is to change it like this (sendpage follows suit):

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 827292e29f99..7b28b11ff611 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -953,9 +953,12 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr 
*msg, size_t size)
         int pending;

         if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
-                              MSG_CMSG_COMPAT))
+                              MSG_EOR | MSG_CMSG_COMPAT))
                 return -EOPNOTSUPP;

+       if (MSG_EOR)
+               eor = true;
+
         ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
         if (ret)
                 return ret;

Cheers,

Hannes


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 07/18] net/tls: sanitize MSG_EOR handling
  2023-04-18 10:39           ` Hannes Reinecke
@ 2023-04-18 10:43             ` Sagi Grimberg
  2023-04-18 11:02               ` Hannes Reinecke
  0 siblings, 1 reply; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-18 10:43 UTC (permalink / raw)
  To: Hannes Reinecke, Jakub Kicinski
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake



On 4/18/23 13:39, Hannes Reinecke wrote:
> On 4/18/23 12:24, Hannes Reinecke wrote:
>> On 4/18/23 12:07, Sagi Grimberg wrote:
>>>
>>>>>> The TLS stack is using MSG_EOR internally, so the flag cannot be
>>>>>> set for sendmsg()/sendpage(). But to avoid having the caller to
>>>>>> check whether TLS is active modify the code to clear the MSG_EOR
>>>>>> flag. And blank out MSG_MORE / MSG_SENDPAGE_NOTLAST, too, as they
>>>>>> conflict with MSG_EOR anyway.
>>>>>
>>>>> This looks like a temporary workaround to me.
>>>>>
>>>>> The networking folks really need to be CC'd on this (same for patch 
>>>>> 6).
>>>>
>>>> Thanks, when I said "we can support EOR" I obviously meant support
>>>> not ignore it :(  No ack.
>>>
>>> Obviously neither Hannes or I have sufficient tls knowledge to properly
>>> support it... It needs to be done by someone who knows the
>>> implementation.
>>>
>>> Is this a large scope to add support for it? Because if it is, I'd
>>> simply change nvme to clear MSG_EOR when tls is used (despite my
>>> preference to not do special things for tls).
>>
>> There's a rather simple patch for tls_sw. It already paces the data 
>> internally by an 'eor' variable, which is currently set to !MSG_MORE.
>> So evaluating MSG_EOR here is really trivial, and doesn't seem to 
>> cause any issues.
>>
>> Or, at least, none which would show up with NVMe-over-TLS :-)
>>
> Suggestion is to change it like this (sendpage follows suit):
> 
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 827292e29f99..7b28b11ff611 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -953,9 +953,12 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr 
> *msg, size_t size)
>          int pending;
> 
>          if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
> -                              MSG_CMSG_COMPAT))
> +                              MSG_EOR | MSG_CMSG_COMPAT))
>                  return -EOPNOTSUPP;
> 
> +       if (MSG_EOR)

You probably meant if (msg->msg_flags & MSG_EOR)...

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 07/18] net/tls: sanitize MSG_EOR handling
  2023-04-18 10:43             ` Sagi Grimberg
@ 2023-04-18 11:02               ` Hannes Reinecke
  2023-04-18 18:28                 ` Jakub Kicinski
  0 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-18 11:02 UTC (permalink / raw)
  To: Sagi Grimberg, Jakub Kicinski
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake

On 4/18/23 12:43, Sagi Grimberg wrote:
> 
> 
> On 4/18/23 13:39, Hannes Reinecke wrote:
>> On 4/18/23 12:24, Hannes Reinecke wrote:
>>> On 4/18/23 12:07, Sagi Grimberg wrote:
>>>>
>>>>>>> The TLS stack is using MSG_EOR internally, so the flag cannot be
>>>>>>> set for sendmsg()/sendpage(). But to avoid having the caller to
>>>>>>> check whether TLS is active modify the code to clear the MSG_EOR
>>>>>>> flag. And blank out MSG_MORE / MSG_SENDPAGE_NOTLAST, too, as they
>>>>>>> conflict with MSG_EOR anyway.
>>>>>>
>>>>>> This looks like a temporary workaround to me.
>>>>>>
>>>>>> The networking folks really need to be CC'd on this (same for 
>>>>>> patch 6).
>>>>>
>>>>> Thanks, when I said "we can support EOR" I obviously meant support
>>>>> not ignore it :(  No ack.
>>>>
>>>> Obviously neither Hannes or I have sufficient tls knowledge to properly
>>>> support it... It needs to be done by someone who knows the
>>>> implementation.
>>>>
>>>> Is this a large scope to add support for it? Because if it is, I'd
>>>> simply change nvme to clear MSG_EOR when tls is used (despite my
>>>> preference to not do special things for tls).
>>>
>>> There's a rather simple patch for tls_sw. It already paces the data 
>>> internally by an 'eor' variable, which is currently set to !MSG_MORE.
>>> So evaluating MSG_EOR here is really trivial, and doesn't seem to 
>>> cause any issues.
>>>
>>> Or, at least, none which would show up with NVMe-over-TLS :-)
>>>
>> Suggestion is to change it like this (sendpage follows suit):
>>
>> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
>> index 827292e29f99..7b28b11ff611 100644
>> --- a/net/tls/tls_sw.c
>> +++ b/net/tls/tls_sw.c
>> @@ -953,9 +953,12 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr 
>> *msg, size_t size)
>>          int pending;
>>
>>          if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
>> -                              MSG_CMSG_COMPAT))
>> +                              MSG_EOR | MSG_CMSG_COMPAT))
>>                  return -EOPNOTSUPP;
>>
>> +       if (MSG_EOR)
> 
> You probably meant if (msg->msg_flags & MSG_EOR)...
> 
Of course.

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST
  2023-04-18 10:33           ` Hannes Reinecke
@ 2023-04-18 18:21             ` Jakub Kicinski
  2023-04-18 18:25               ` Hannes Reinecke
  0 siblings, 1 reply; 52+ messages in thread
From: Jakub Kicinski @ 2023-04-18 18:21 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Sagi Grimberg, Christoph Hellwig, Keith Busch, linux-nvme,
	Chuck Lever, kernel-tls-handshake, Vadim Fedorenko

On Tue, 18 Apr 2023 12:33:03 +0200 Hannes Reinecke wrote:
> Ah-ha. Found it.
> 
> Turns out to be an arguably invalid control flow on our side.
> 
> The initial 'connect' command is a command PDU with inline data.
> So in our (current) flow we do a 'sendpage()' with the 
> MORE/SENDPAGE_NOTLAST flag set as we have to transfer more data.
> Ok so far.
> But for transmitting the actual data we evaluate 'sendpage_ok()', which 
> returns _false_. And then we continue to call 'sendmsg()' for the inline 
> data.
> This confuses the TLS software stack as sendpage() and sendmsg() are two 
> distinct code paths, and we end up with a TX stall.
> 
> Solution: check 'sendpage_ok()' when we have inline data and use 
> 'sendmsg' for the command pdu if required.
> 
> It always pays to dig deeper if one's not exactly sure what's happening ...
> 
> Will be updating the patchset.

Hm, yeah, sendpage(NOT_LAST)+sendmsg() could indeed be buggy/untested.
It wasn't something we could have tested from user space, we'd need
kunit or some such :(

Adding it the backlog. And adding Vadim to CC in case he has cycles 
to take a look.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST
  2023-04-18 18:21             ` Jakub Kicinski
@ 2023-04-18 18:25               ` Hannes Reinecke
  0 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-04-18 18:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Sagi Grimberg, Christoph Hellwig, Keith Busch, linux-nvme,
	Chuck Lever, kernel-tls-handshake, Vadim Fedorenko

On 4/18/23 20:21, Jakub Kicinski wrote:
> On Tue, 18 Apr 2023 12:33:03 +0200 Hannes Reinecke wrote:
>> Ah-ha. Found it.
>>
>> Turns out to be an arguably invalid control flow on our side.
>>
>> The initial 'connect' command is a command PDU with inline data.
>> So in our (current) flow we do a 'sendpage()' with the
>> MORE/SENDPAGE_NOTLAST flag set as we have to transfer more data.
>> Ok so far.
>> But for transmitting the actual data we evaluate 'sendpage_ok()', which
>> returns _false_. And then we continue to call 'sendmsg()' for the inline
>> data.
>> This confuses the TLS software stack as sendpage() and sendmsg() are two
>> distinct code paths, and we end up with a TX stall.
>>
>> Solution: check 'sendpage_ok()' when we have inline data and use
>> 'sendmsg' for the command pdu if required.
>>
>> It always pays to dig deeper if one's not exactly sure what's happening ...
>>
>> Will be updating the patchset.
> 
> Hm, yeah, sendpage(NOT_LAST)+sendmsg() could indeed be buggy/untested.
> It wasn't something we could have tested from user space, we'd need
> kunit or some such :(
> 
> Adding it the backlog. And adding Vadim to CC in case he has cycles
> to take a look.
> 
Bear in mind that that distinction will become meaningless once dhowells 
patches for MSG_SPLICE are in, as that will get rid of sendpage() 
altogether.
(At least to my understanding.)

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 07/18] net/tls: sanitize MSG_EOR handling
  2023-04-18 11:02               ` Hannes Reinecke
@ 2023-04-18 18:28                 ` Jakub Kicinski
  0 siblings, 0 replies; 52+ messages in thread
From: Jakub Kicinski @ 2023-04-18 18:28 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Sagi Grimberg, Christoph Hellwig, Keith Busch, linux-nvme,
	Chuck Lever, kernel-tls-handshake

On Tue, 18 Apr 2023 13:02:34 +0200 Hannes Reinecke wrote:
> >> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> >> index 827292e29f99..7b28b11ff611 100644
> >> --- a/net/tls/tls_sw.c
> >> +++ b/net/tls/tls_sw.c
> >> @@ -953,9 +953,12 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr 
> >> *msg, size_t size)
> >>          int pending;
> >>
> >>          if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
> >> -                              MSG_CMSG_COMPAT))
> >> +                              MSG_EOR | MSG_CMSG_COMPAT))
> >>                  return -EOPNOTSUPP;
> >>
> >> +       if (MSG_EOR)  
> > 
> > You probably meant if (msg->msg_flags & MSG_EOR)...
> >   
> Of course.

Neat. And similarly for sendpage() and in tls_device.c, it should be
fairly small changes in all 3 places. You don't need to worry about
tls_device itself using EOR, FWIW, it's just to try to make the offload
start on a skb boundary. Adding more EORs should not confuse anything.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 17/18] nvmet-tcp: control messages for recvmsg()
  2023-03-29 13:59 ` [PATCH 17/18] nvmet-tcp: control messages for recvmsg() Hannes Reinecke
@ 2023-04-03 12:59   ` Sagi Grimberg
  0 siblings, 0 replies; 52+ messages in thread
From: Sagi Grimberg @ 2023-04-03 12:59 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig
  Cc: Keith Busch, linux-nvme, Chuck Lever, kernel-tls-handshake



On 3/29/23 16:59, Hannes Reinecke wrote:
> kTLS requires control messages for recvmsg() to relay any out-of-band
> TLS messages (eg TLS alerts) to the caller.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>   drivers/nvme/target/tcp.c | 71 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 71 insertions(+)
> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 5daf16138471..75017fd5da9f 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -12,6 +12,7 @@
>   #include <net/sock.h>
>   #include <net/tcp.h>
>   #ifdef CONFIG_NVME_TLS
> +#include <net/tls.h>
>   #include <net/handshake.h>
>   #include <linux/nvme-keyring.h>
>   #endif
> @@ -92,6 +93,7 @@ struct nvmet_tcp_cmd {
>   	u32				pdu_len;
>   	u32				pdu_recv;
>   	int				sg_idx;
> +	char				recv_cbuf[CMSG_LEN(sizeof(char))];
>   	struct msghdr			recv_msg;
>   	struct bio_vec			*iov;
>   	u32				flags;
> @@ -1086,7 +1088,18 @@ static int nvmet_tcp_try_recv_pdu(struct nvmet_tcp_queue *queue)
>   	struct nvme_tcp_hdr *hdr = &queue->pdu.cmd.hdr;
>   	int len;
>   	struct kvec iov;
> +#ifdef CONFIG_NVME_TLS

Do you need this ifdef? The recv_cbuf in the cmd is not under and ifdef.

> +	char cbuf[CMSG_LEN(sizeof(char))] = {};
> +	unsigned char ctype;
> +	struct cmsghdr *cmsg;
> +	struct msghdr msg = {
> +		.msg_control = cbuf,
> +		.msg_controllen = sizeof(cbuf),
> +		.msg_flags = MSG_DONTWAIT
> +	};
> +#else
>   	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
> +#endif
>   
>   recv:
>   	iov.iov_base = (void *)&queue->pdu + queue->offset;
> @@ -1095,6 +1108,19 @@ static int nvmet_tcp_try_recv_pdu(struct nvmet_tcp_queue *queue)
>   			iov.iov_len, msg.msg_flags);
>   	if (unlikely(len < 0))
>   		return len;
> +#ifdef CONFIG_NVME_TLS
> +	cmsg = (struct cmsghdr *)cbuf;
> +	if (CMSG_OK(&msg, cmsg) &&
> +	    cmsg->cmsg_level == SOL_TLS &&
> +	    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
> +		ctype = *((unsigned char *)CMSG_DATA(cmsg));
> +		if (ctype != TLS_RECORD_TYPE_DATA) {
> +			pr_err("queue %d unhandled TLS record %d\n",
> +				queue->idx, ctype);
> +			return -ENOTCONN;
> +		}
> +	}
> +#endif

Can we consolidate this to a helper?

Something like:
static int nvmet_tcp_check_cmsg(struct nvmet_tcp_queue *queue, struct 
cmsghdr *cmsg)
{
	if (!IS_ENABLED(CONFIG_NVME_TLS))
		return 0;

	if (CMSG_OK(&msg, cmsg) &&
	    cmsg->cmsg_level == SOL_TLS &&
	    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
		ctype = *((unsigned char *)CMSG_DATA(cmsg));
		if (ctype != TLS_RECORD_TYPE_DATA) {
			pr_err("queue %d unhandled TLS record %d\n",
				queue->idx, ctype);
			return -ENOTCONN;
		}
	}
	return 0;
}

>   
>   	queue->offset += len;
>   	queue->left -= len;
> @@ -1150,10 +1176,27 @@ static int nvmet_tcp_try_recv_data(struct nvmet_tcp_queue *queue)
>   	int ret;
>   
>   	while (msg_data_left(&cmd->recv_msg)) {
> +#ifdef CONFIG_NVME_TLS
> +		struct cmsghdr *cmsg;
> +		unsigned char ctype;
> +#endif
>   		ret = sock_recvmsg(cmd->queue->sock, &cmd->recv_msg,
>   			cmd->recv_msg.msg_flags);
>   		if (ret <= 0)
>   			return ret;
> +#ifdef CONFIG_NVME_TLS
> +		cmsg = (struct cmsghdr *)cmd->recv_cbuf;
> +		if (CMSG_OK(&cmd->recv_msg, cmsg) &&
> +		    cmsg->cmsg_level == SOL_TLS &&
> +		    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
> +			ctype = *((unsigned char *)CMSG_DATA(cmsg));
> +			if (ctype != TLS_RECORD_TYPE_DATA) {
> +				pr_err("queue %d unhandled TLS record %d\n",
> +				       queue->idx, ctype);
> +				return -ENOTCONN;
> +			}
> +		}
> +#endif
>   
>   		cmd->pdu_recv += ret;
>   		cmd->rbytes_done += ret;
> @@ -1175,7 +1218,18 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
>   {
>   	struct nvmet_tcp_cmd *cmd = queue->cmd;
>   	int ret;
> +#ifdef CONFIG_NVME_TLS
> +	char cbuf[CMSG_LEN(sizeof(char))] = {};
> +	unsigned char ctype;
> +	struct cmsghdr *cmsg;
> +	struct msghdr msg = {
> +		.msg_control = cbuf,
> +		.msg_controllen = sizeof(cbuf),
> +		.msg_flags = MSG_DONTWAIT
> +	};
> +#else
>   	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
> +#endif
>   	struct kvec iov = {
>   		.iov_base = (void *)&cmd->recv_ddgst + queue->offset,
>   		.iov_len = queue->left
> @@ -1185,6 +1239,19 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
>   			iov.iov_len, msg.msg_flags);
>   	if (unlikely(ret < 0))
>   		return ret;
> +#ifdef CONFIG_NVME_TLS
> +	cmsg = (struct cmsghdr *)cbuf;
> +	if (CMSG_OK(&msg, cmsg) &&
> +	    cmsg->cmsg_level == SOL_TLS &&
> +	    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
> +		ctype = *((unsigned char *)CMSG_DATA(cmsg));
> +		if (ctype != TLS_RECORD_TYPE_DATA) {
> +			pr_err("queue %d unhandled TLS record %d\n",
> +				queue->idx, ctype);
> +			return -ENOTCONN;
> +		}
> +	}
> +#endif
>   
>   	queue->offset += ret;
>   	queue->left -= ret;
> @@ -1354,6 +1421,10 @@ static int nvmet_tcp_alloc_cmd(struct nvmet_tcp_queue *queue,
>   	if (!c->r2t_pdu)
>   		goto out_free_data;
>   
> +#ifdef CONFIG_NVME_TLS
> +	c->recv_msg.msg_control = c->recv_cbuf;
> +	c->recv_msg.msg_controllen = sizeof(c->recv_cbuf);
> +#endif
>   	c->recv_msg.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
>   
>   	list_add_tail(&c->entry, &queue->free_list);

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 17/18] nvmet-tcp: control messages for recvmsg()
  2023-03-29 13:59 [PATCHv2 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
@ 2023-03-29 13:59 ` Hannes Reinecke
  2023-04-03 12:59   ` Sagi Grimberg
  0 siblings, 1 reply; 52+ messages in thread
From: Hannes Reinecke @ 2023-03-29 13:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

kTLS requires control messages for recvmsg() to relay any out-of-band
TLS messages (eg TLS alerts) to the caller.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/target/tcp.c | 71 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 5daf16138471..75017fd5da9f 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -12,6 +12,7 @@
 #include <net/sock.h>
 #include <net/tcp.h>
 #ifdef CONFIG_NVME_TLS
+#include <net/tls.h>
 #include <net/handshake.h>
 #include <linux/nvme-keyring.h>
 #endif
@@ -92,6 +93,7 @@ struct nvmet_tcp_cmd {
 	u32				pdu_len;
 	u32				pdu_recv;
 	int				sg_idx;
+	char				recv_cbuf[CMSG_LEN(sizeof(char))];
 	struct msghdr			recv_msg;
 	struct bio_vec			*iov;
 	u32				flags;
@@ -1086,7 +1088,18 @@ static int nvmet_tcp_try_recv_pdu(struct nvmet_tcp_queue *queue)
 	struct nvme_tcp_hdr *hdr = &queue->pdu.cmd.hdr;
 	int len;
 	struct kvec iov;
+#ifdef CONFIG_NVME_TLS
+	char cbuf[CMSG_LEN(sizeof(char))] = {};
+	unsigned char ctype;
+	struct cmsghdr *cmsg;
+	struct msghdr msg = {
+		.msg_control = cbuf,
+		.msg_controllen = sizeof(cbuf),
+		.msg_flags = MSG_DONTWAIT
+	};
+#else
 	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
+#endif
 
 recv:
 	iov.iov_base = (void *)&queue->pdu + queue->offset;
@@ -1095,6 +1108,19 @@ static int nvmet_tcp_try_recv_pdu(struct nvmet_tcp_queue *queue)
 			iov.iov_len, msg.msg_flags);
 	if (unlikely(len < 0))
 		return len;
+#ifdef CONFIG_NVME_TLS
+	cmsg = (struct cmsghdr *)cbuf;
+	if (CMSG_OK(&msg, cmsg) &&
+	    cmsg->cmsg_level == SOL_TLS &&
+	    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
+		ctype = *((unsigned char *)CMSG_DATA(cmsg));
+		if (ctype != TLS_RECORD_TYPE_DATA) {
+			pr_err("queue %d unhandled TLS record %d\n",
+				queue->idx, ctype);
+			return -ENOTCONN;
+		}
+	}
+#endif
 
 	queue->offset += len;
 	queue->left -= len;
@@ -1150,10 +1176,27 @@ static int nvmet_tcp_try_recv_data(struct nvmet_tcp_queue *queue)
 	int ret;
 
 	while (msg_data_left(&cmd->recv_msg)) {
+#ifdef CONFIG_NVME_TLS
+		struct cmsghdr *cmsg;
+		unsigned char ctype;
+#endif
 		ret = sock_recvmsg(cmd->queue->sock, &cmd->recv_msg,
 			cmd->recv_msg.msg_flags);
 		if (ret <= 0)
 			return ret;
+#ifdef CONFIG_NVME_TLS
+		cmsg = (struct cmsghdr *)cmd->recv_cbuf;
+		if (CMSG_OK(&cmd->recv_msg, cmsg) &&
+		    cmsg->cmsg_level == SOL_TLS &&
+		    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
+			ctype = *((unsigned char *)CMSG_DATA(cmsg));
+			if (ctype != TLS_RECORD_TYPE_DATA) {
+				pr_err("queue %d unhandled TLS record %d\n",
+				       queue->idx, ctype);
+				return -ENOTCONN;
+			}
+		}
+#endif
 
 		cmd->pdu_recv += ret;
 		cmd->rbytes_done += ret;
@@ -1175,7 +1218,18 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
 {
 	struct nvmet_tcp_cmd *cmd = queue->cmd;
 	int ret;
+#ifdef CONFIG_NVME_TLS
+	char cbuf[CMSG_LEN(sizeof(char))] = {};
+	unsigned char ctype;
+	struct cmsghdr *cmsg;
+	struct msghdr msg = {
+		.msg_control = cbuf,
+		.msg_controllen = sizeof(cbuf),
+		.msg_flags = MSG_DONTWAIT
+	};
+#else
 	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
+#endif
 	struct kvec iov = {
 		.iov_base = (void *)&cmd->recv_ddgst + queue->offset,
 		.iov_len = queue->left
@@ -1185,6 +1239,19 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
 			iov.iov_len, msg.msg_flags);
 	if (unlikely(ret < 0))
 		return ret;
+#ifdef CONFIG_NVME_TLS
+	cmsg = (struct cmsghdr *)cbuf;
+	if (CMSG_OK(&msg, cmsg) &&
+	    cmsg->cmsg_level == SOL_TLS &&
+	    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
+		ctype = *((unsigned char *)CMSG_DATA(cmsg));
+		if (ctype != TLS_RECORD_TYPE_DATA) {
+			pr_err("queue %d unhandled TLS record %d\n",
+				queue->idx, ctype);
+			return -ENOTCONN;
+		}
+	}
+#endif
 
 	queue->offset += ret;
 	queue->left -= ret;
@@ -1354,6 +1421,10 @@ static int nvmet_tcp_alloc_cmd(struct nvmet_tcp_queue *queue,
 	if (!c->r2t_pdu)
 		goto out_free_data;
 
+#ifdef CONFIG_NVME_TLS
+	c->recv_msg.msg_control = c->recv_cbuf;
+	c->recv_msg.msg_controllen = sizeof(c->recv_cbuf);
+#endif
 	c->recv_msg.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
 
 	list_add_tail(&c->entry, &queue->free_list);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 17/18] nvmet-tcp: control messages for recvmsg()
  2023-03-21 12:43 [RFC PATCH 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
@ 2023-03-21 12:43 ` Hannes Reinecke
  0 siblings, 0 replies; 52+ messages in thread
From: Hannes Reinecke @ 2023-03-21 12:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Chuck Lever,
	kernel-tls-handshake, Hannes Reinecke

kTLS requires control messages for recvmsg() to relay any out-of-band
TLS messages (eg TLS alerts) to the caller.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/target/tcp.c | 58 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 56 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 9b69cac84508..a69647fb2c81 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -12,6 +12,7 @@
 #include <linux/nvme-keyring.h>
 #include <net/sock.h>
 #include <net/tcp.h>
+#include <net/tls.h>
 #include <net/handshake.h>
 #include <linux/inet.h>
 #include <linux/llist.h>
@@ -88,6 +89,7 @@ struct nvmet_tcp_cmd {
 	u32				pdu_len;
 	u32				pdu_recv;
 	int				sg_idx;
+	char				recv_cbuf[CMSG_LEN(sizeof(char))];
 	struct msghdr			recv_msg;
 	struct bio_vec			*iov;
 	u32				flags;
@@ -1108,7 +1110,14 @@ static int nvmet_tcp_try_recv_pdu(struct nvmet_tcp_queue *queue)
 	struct nvme_tcp_hdr *hdr = &queue->pdu.cmd.hdr;
 	int len;
 	struct kvec iov;
-	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
+	char cbuf[CMSG_LEN(sizeof(char))] = {};
+	unsigned char ctype;
+	struct cmsghdr *cmsg;
+	struct msghdr msg = {
+		.msg_control = cbuf,
+		.msg_controllen = sizeof(cbuf),
+		.msg_flags = MSG_DONTWAIT
+	};
 
 recv:
 	iov.iov_base = (void *)&queue->pdu + queue->offset;
@@ -1117,6 +1126,17 @@ static int nvmet_tcp_try_recv_pdu(struct nvmet_tcp_queue *queue)
 			iov.iov_len, msg.msg_flags);
 	if (unlikely(len < 0))
 		return len;
+	cmsg = (struct cmsghdr *)cbuf;
+	if (CMSG_OK(&msg, cmsg) &&
+	    cmsg->cmsg_level == SOL_TLS &&
+	    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
+		ctype = *((unsigned char *)CMSG_DATA(cmsg));
+		if (ctype != TLS_RECORD_TYPE_DATA) {
+			pr_err("queue %d unhandled TLS record %d\n",
+				queue->idx, ctype);
+			return -ENOTCONN;
+		}
+	}
 
 	queue->offset += len;
 	queue->left -= len;
@@ -1172,10 +1192,24 @@ static int nvmet_tcp_try_recv_data(struct nvmet_tcp_queue *queue)
 	int ret;
 
 	while (msg_data_left(&cmd->recv_msg)) {
+		struct cmsghdr *cmsg;
+		unsigned char ctype;
+
 		ret = sock_recvmsg(cmd->queue->sock, &cmd->recv_msg,
 			cmd->recv_msg.msg_flags);
 		if (ret <= 0)
 			return ret;
+		cmsg = (struct cmsghdr *)cmd->recv_cbuf;
+		if (CMSG_OK(&cmd->recv_msg, cmsg) &&
+		    cmsg->cmsg_level == SOL_TLS &&
+		    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
+			ctype = *((unsigned char *)CMSG_DATA(cmsg));
+			if (ctype != TLS_RECORD_TYPE_DATA) {
+				pr_err("queue %d unhandled TLS record %d\n",
+				       queue->idx, ctype);
+				return -ENOTCONN;
+			}
+		}
 
 		cmd->pdu_recv += ret;
 		cmd->rbytes_done += ret;
@@ -1197,7 +1231,14 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
 {
 	struct nvmet_tcp_cmd *cmd = queue->cmd;
 	int ret;
-	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
+	char cbuf[CMSG_LEN(sizeof(char))] = {};
+	unsigned char ctype;
+	struct cmsghdr *cmsg;
+	struct msghdr msg = {
+		.msg_control = cbuf,
+		.msg_controllen = sizeof(cbuf),
+		.msg_flags = MSG_DONTWAIT
+	};
 	struct kvec iov = {
 		.iov_base = (void *)&cmd->recv_ddgst + queue->offset,
 		.iov_len = queue->left
@@ -1207,6 +1248,17 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
 			iov.iov_len, msg.msg_flags);
 	if (unlikely(ret < 0))
 		return ret;
+	cmsg = (struct cmsghdr *)cbuf;
+	if (CMSG_OK(&msg, cmsg) &&
+	    cmsg->cmsg_level == SOL_TLS &&
+	    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
+		ctype = *((unsigned char *)CMSG_DATA(cmsg));
+		if (ctype != TLS_RECORD_TYPE_DATA) {
+			pr_err("queue %d unhandled TLS record %d\n",
+				queue->idx, ctype);
+			return -ENOTCONN;
+		}
+	}
 
 	queue->offset += ret;
 	queue->left -= ret;
@@ -1376,6 +1428,8 @@ static int nvmet_tcp_alloc_cmd(struct nvmet_tcp_queue *queue,
 	if (!c->r2t_pdu)
 		goto out_free_data;
 
+	c->recv_msg.msg_control = c->recv_cbuf;
+	c->recv_msg.msg_controllen = sizeof(c->recv_cbuf);
 	c->recv_msg.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
 
 	list_add_tail(&c->entry, &queue->free_list);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2023-04-18 18:28 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-17 13:02 [PATCHv3 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
2023-04-17 13:02 ` [PATCH 01/18] nvme-keyring: register '.nvme' keyring Hannes Reinecke
2023-04-17 13:02 ` [PATCH 02/18] nvme-keyring: define a 'psk' keytype Hannes Reinecke
2023-04-17 13:02 ` [PATCH 03/18] nvme: add TCP TSAS definitions Hannes Reinecke
2023-04-17 13:02 ` [PATCH 04/18] nvme-tcp: add definitions for TLS cipher suites Hannes Reinecke
2023-04-17 13:02 ` [PATCH 05/18] nvme-keyring: implement nvme_tls_psk_default() Hannes Reinecke
2023-04-17 13:02 ` [PATCH 06/18] net/tls: implement ->read_sock() Hannes Reinecke
2023-04-17 15:14   ` Sagi Grimberg
2023-04-17 13:02 ` [PATCH 07/18] net/tls: sanitize MSG_EOR handling Hannes Reinecke
2023-04-17 15:19   ` Sagi Grimberg
2023-04-17 20:10     ` Jakub Kicinski
2023-04-18 10:07       ` Sagi Grimberg
2023-04-18 10:24         ` Hannes Reinecke
2023-04-18 10:39           ` Hannes Reinecke
2023-04-18 10:43             ` Sagi Grimberg
2023-04-18 11:02               ` Hannes Reinecke
2023-04-18 18:28                 ` Jakub Kicinski
2023-04-17 13:02 ` [PATCH 08/18] nvme-tcp: do not set MSG_SENDPAGE_NOTLAST Hannes Reinecke
2023-04-17 15:10   ` Sagi Grimberg
2023-04-17 15:25     ` Hannes Reinecke
2023-04-17 15:28       ` Sagi Grimberg
2023-04-17 15:35         ` Hannes Reinecke
2023-04-17 20:16         ` Jakub Kicinski
2023-04-18 10:33           ` Hannes Reinecke
2023-04-18 18:21             ` Jakub Kicinski
2023-04-18 18:25               ` Hannes Reinecke
2023-04-17 13:02 ` [PATCH 09/18] security/keys: export key_lookup() Hannes Reinecke
2023-04-17 13:02 ` [PATCH 10/18] nvme/tcp: allocate socket file Hannes Reinecke
2023-04-17 13:02 ` [PATCH 11/18] nvme-tcp: enable TLS handshake upcall Hannes Reinecke
2023-04-17 15:26   ` Sagi Grimberg
2023-04-17 15:28     ` Hannes Reinecke
2023-04-17 15:31       ` Sagi Grimberg
2023-04-17 15:36         ` Hannes Reinecke
2023-04-18  5:52   ` Daniel Wagner
2023-04-18  9:38     ` Hannes Reinecke
2023-04-18 10:12       ` Sagi Grimberg
2023-04-18 10:28         ` Hannes Reinecke
2023-04-18 10:32           ` Sagi Grimberg
2023-04-18 10:33             ` Hannes Reinecke
2023-04-17 13:02 ` [PATCH 12/18] nvme-tcp: control message handling for recvmsg() Hannes Reinecke
2023-04-17 15:24   ` Sagi Grimberg
2023-04-17 15:26     ` Hannes Reinecke
2023-04-17 13:02 ` [PATCH 13/18] nvme-fabrics: parse options 'keyring' and 'tls_key' Hannes Reinecke
2023-04-17 13:02 ` [PATCH 14/18] nvmet: make TCP sectype settable via configfs Hannes Reinecke
2023-04-17 13:02 ` [PATCH 15/18] nvmet-tcp: allocate socket file Hannes Reinecke
2023-04-17 15:29   ` Sagi Grimberg
2023-04-17 13:03 ` [PATCH 16/18] nvmet-tcp: enable TLS handshake upcall Hannes Reinecke
2023-04-17 13:03 ` [PATCH 17/18] nvmet-tcp: control messages for recvmsg() Hannes Reinecke
2023-04-17 13:03 ` [PATCH 18/18] nvmet-tcp: add configfs attribute 'param_keyring' Hannes Reinecke
  -- strict thread matches above, loose matches on Subject: below --
2023-03-29 13:59 [PATCHv2 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
2023-03-29 13:59 ` [PATCH 17/18] nvmet-tcp: control messages for recvmsg() Hannes Reinecke
2023-04-03 12:59   ` Sagi Grimberg
2023-03-21 12:43 [RFC PATCH 00/18] nvme: In-kernel TLS support for TCP Hannes Reinecke
2023-03-21 12:43 ` [PATCH 17/18] nvmet-tcp: control messages for recvmsg() Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).